[
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.yml",
    "content": "---\nname: \"🐛 Bug Report\"\ndescription: Report a bug\ntitle: \"(short issue description)\"\nlabels: [bug, needs-triage]\nassignees: []\nbody:\n  - type: textarea\n    id: description\n    attributes:\n      label: Describe the bug\n      description: What is the problem? Provide a clear description of your issue and the steps you took that produced it.\n    validations:\n      required: true\n  - type: textarea\n    id: modelname\n    attributes:\n      label: Model Name\n      description: Provide Model Name\n    validations:\n      required: true\n  - type: textarea\n    id: workloadtype\n    attributes:\n      label: Describe the workload type \n      description: Note the type of workload (such as Inference or Training) and any specific details about the workload configuration.\n    validations:\n      required: true\n  - type: textarea\n    id: instancetype\n    attributes:\n      label: Instance Type\n      description: |\n       Provide the AWS EC2 instance type you used to run the workload (such as `inf2.xlarge`, `trn1.32xlarge`, `trn2.48xlarge` etc.)\n    validations:\n      required: true\n  - type: textarea\n    id: release\n    attributes:\n      label: Release version \n      description: |\n        Provide the Neuron SDK release version (such as `2.25.0`) you are using, and all relevant Neuron component versions. \n        ```\n        apt list --installed | grep -i -e neuron\n        pip list | grep -i -e neuron -e torch -e transformers -e jax\n        ```\n  - type: textarea\n    id: reproduction\n    attributes:\n      label: Reproduction Steps\n      description: |\n        Provide the type of the model and links to any tutorials you may have used, as additional context.\n        Provide a self-contained, concise snippet of code that can be used to reproduce the issue.\n        For more complex issues provide a repo with the smallest sample that reproduces the bug.\n        \n        Avoid including business logic or unrelated code as it makes diagnosis more difficult.\n        The code sample should be an SSCCE. See http://sscce.org/ for details. In short, please provide a code sample that we can copy/paste, run and reproduce.\n    validations:\n      required: true\n  - type: checkboxes\n    id: regression\n    attributes:\n      label: Regression Issue\n      description: Is this as regression (did it work in a previous version and not now)?\n        If this is a regression, provide the Neuron SDK release version where this configuration worked for you.\n      options:\n        - label: Select this option if this issue appears to be a regression.\n          required: false\n  - type: textarea\n    id: solution\n    attributes:\n      label: Possible Solution\n      description: |\n        Suggest a fix or reason for the bug, if you know:\n    validations:\n      required: false\n  - type: textarea\n    id: context\n    attributes:\n      label: Logs/Context/Additional Information\n      description: |\n        Anything else that might be relevant for troubleshooting this bug. Providing context helps us come up with a solution that is most useful in the real world. When applicable, please provide HLOs and compiler commands.\n    validations:\n      required: false\n\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "content": "blank_issues_enabled: false"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/documentation.yml",
    "content": "---\nname: \"📕 Documentation Issue\"\ndescription: Report an issue in the documentation and Developer Guide\ntitle: \"(short issue description)\"\nlabels: [documentation, needs-triage]\nassignees: []\nbody:\n  - type: textarea\n    id: description\n    attributes:\n      label: Describe the issue\n      description: A clear and concise description of the issue.\n    validations:\n      required: true\n\n  - type: textarea\n    id: links\n    attributes:\n      label: Links\n      description: |\n        Include links to affected documentation page(s).\n    validations:\n      required: true\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.yml",
    "content": "---\nname: 🚀 Feature Request\ndescription: Suggest an idea for this project\ntitle: \"(short issue description)\"\nlabels: [feature-request, needs-triage]\nassignees: []\nbody:\n  - type: textarea\n    id: description\n    attributes:\n      label: Describe the feature\n      description: A clear and concise description of the feature you are proposing.\n    validations:\n      required: true\n  - type: textarea\n    id: use-case\n    attributes:\n      label: Use Case\n      description: |\n        Why do you need this feature?\n    validations:\n        required: true\n  - type: textarea\n    id: solution\n    attributes:\n      label: Proposed Solution\n      description: |\n        Provide detailed suggestions or requirements for this proposed feature. If you have them, include any reference implementation details (or even links to prototypes).\n    validations:\n      required: false\n  - type: textarea\n    id: other\n    attributes:\n      label: Other Information\n      description: |\n        Any additional details or information you can provide, including links to related content or similar issues.\n    validations:\n      required: false\n  - type: checkboxes\n    id: ack\n    attributes:\n      label: Acknowledgements\n      options:\n        - label: I may be able to implement this feature request\n          required: false\n"
  },
  {
    "path": ".github/pull_request_template.md",
    "content": "**IMPORTANT!** _If this is a documentation PR for a specific release, this PR must go the corresponding release branch_ (`release-X.XX.X`). _If it is an \"out-of-band\" doc update, the PR must go to the_ `master` _branch_.\n\n\n## Required PR information\n\nTo expedite approvals and merges for releases, provide the following information (select the `...` button to the right at the top of your PR message to edit it):\n\n> **AWS email alias**: {_your-name_}@amazon.com\n\n>**Description**: {_What this documentation change is and why you made it. If you have a corresponding Jira ticket or content plan, link it here. The more details you provide around any decisions you made when preparing the docs, the less annoying comments you'll get preparing to release it._}\n\n> **Date this must be published by**: {_If empty, we will assume the release date for the branch you're merging into._}\n\n> **Link to ReadTheDocs staging for this branch's doc changes**: https://awsdocs-neuron-staging.readthedocs-hosted.com/en/{YOUR_BRANCH_NAME_HERE}/\n\n> **Set the `docs-review-needed` label on the PR for tracking.**\n\n## Before you request approvals\n\n> Run a spelling and grammar check over your prose and make the changes it suggests. VSCode has a number of extensions (cSpell, LTeX) that you can use. You can also provide the rendered HTML for (or a cut-and-paste of) your pages to an AI and have it correct your spelling, grammar, and formatting issues. If you need an advanced prompt, contact @erickson-doug.\n\n## Approvers\n\nWe require 3-4 approvers to merge for non-trivial content changes (where a \"trivial\" change is a typo/grammar fix or a minor update to the format syntax):\n\n1. A senior+ engineer who will review your documentation for technical accuracy and clarity in communicating the technical concepts in your work\n2. A product manager for your Neuron component area who will review it for customer relevance and product/component/feature messaging\n3. The lead tech writer (@erickson-doug) who will review your work for overall doc design and quality, and perform the merge when all approvals are met\n4. (For PRs with code/notebook submissions) A QA/test engineer who can run your code and confirm the results.\n\nMake sure you get a commitment from these reviewers in advance! It's hard to get good quality doc reviews in order in the 11th hour of a release.\n\n**Note**: For trivial changes, you only need @erickson-doug's approval. 
He will merge your content once he's confirmed the fixes on staging.\n\n## Doc review checklist\n\n### Engineering reviewer checklist\n\n- [ ] I've confirmed that the contributions in this PR meet the current  [AWS Neuron writing guidelines](https://quip-amazon.com/m97CAO0kQFEU/Writing-for-AWS-Neuron).\n- [ ] I've confirmed that the documentation submitted is technically correct to the best of my knowledge.\n- [ ] I've confirmed that the documentation submitted has no spelling or grammar errors or use of internal jargon/terminology.\n- [ ] I've verified the changes render correctly on RTD (link above).\n- [ ] (If code is included) I've run tests to verify the contents of the change.\n\n---\n\n## For PRs that include code or notebook examples\n\n**MANDATORY: PR must include test run output**\n\nProvide this information for the QA reviewer in order to expedite their review.\n\n**Test run output:**\nSpecify the release version, instance size and type, OS type and test output.\n\n**For Training tutorials:**\n\n{Convergence graph for training tutorials}\n\n{Performance metrics `average_throughput`, `latency_p50`, `latency_p99` and MFU% if available}\n\nMake sure this PR contains correct classification terms (Alpha, Beta, and Stable).\n\nIf possible, provide your results or a link to them for the reviewer to check your work.\n\n## Code example/notebook content PR checklist\n\n- [ ] (If applicable) I've automated a test to safeguard my changes from regression.\n- [ ] (If applicable) I've posted test collateral to prove my change was effective and not harmful.\n- [ ] (If applicable) I've added someone from QA to the list of reviewers.  Do this if you didn't make an automated test or feel it's appropriate for another reason.\n- [ ] (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the pre-approved Amazon license list.  See https://inside.amazon.com/en/services/legal/us/OpenSource/Pages/BlessedOpenSourceLicenses.aspx.\n\nBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice."
  },
  {
    "path": ".github/stale_issue_mark_close_workflow.yml",
    "content": "name: Close inactive issues\non:\n  schedule:\n    - cron: \"30 1 * * *\"\n\njobs:\n  close-issues:\n    runs-on: ubuntu-latest\n    permissions:\n      issues: write\n      pull-requests: write\n    steps:\n      - uses: actions/stale@v5\n        with:\n          days-before-issue-stale: 30\n          days-before-issue-close: 14\n          stale-issue-label: \"stale\"\n          stale-issue-message: \"This issue is stale because it has been open for 30 days with no activity.\"\n          close-issue-message: \"This issue was closed because it has been inactive for 14 days since being marked as stale.\"\n          days-before-pr-stale: -1\n          days-before-pr-close: -1\n          repo-token: ${{ secrets.GITHUB_TOKEN }}\n"
  },
  {
    "path": ".github/workflows/acknowledge-new-issue.yml",
    "content": "name: Acknowledge New Issue\n\non:\n  issues:\n    types: [opened]\n\npermissions:\n  issues: write\n\njobs:\n  acknowledge:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Comment on issue\n        uses: actions/github-script@v7\n        with:\n          script: |\n            const creator = context.payload.issue.user.login;\n            await github.rest.issues.createComment({\n              owner: context.repo.owner,\n              repo: context.repo.repo,\n              issue_number: context.payload.issue.number,\n              body: `Hi @${creator}, Thank you for filing the issue! We will take a look and get back to you.`\n            });\n"
  },
  {
    "path": ".github/workflows/auto-label-issues.yml",
    "content": "# Auto-label issues based on content keywords\nname: auto-label-issues\n\non:\n  issues:\n    types: [opened]\n\njobs:\n  auto-label-issues:\n    runs-on: ubuntu-latest\n    permissions:\n      issues: write\n    steps:\n      - name: Analyze issue content\n        id: analyze_content\n        uses: actions/github-script@v7\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n          ISSUE_TITLE: ${{ github.event.issue.title }}\n          ISSUE_BODY: ${{ github.event.issue.body }}\n        with:\n          script: |\n            const title = process.env.ISSUE_TITLE || '';\n            const body = process.env.ISSUE_BODY || '';\n            const content = `${title} ${body}`;\n            \n            const labels = [];\n            \n            // =============================================================================\n            // LABEL CONFIGURATION - Easy to update dictionary\n            // Add keywords, typos, or synonyms to the arrays below\n            // =============================================================================\n            \n            const labelConfig = {\n              // ----- Issue Type Labels (mutually exclusive) -----\n              bug: {\n                keywords: [\n                  // Standard terms\n                  'bug', 'error', 'crash', 'fail', 'failed', 'failure', 'failing',\n                  'broken', 'exception', 'traceback', 'segfault', 'segmentation fault',\n                  // Synonyms\n                  'issue', 'problem', 'defect', 'fault', 'glitch', 'malfunction',\n                  'wrong', 'incorrect', 'unexpected',\n                  'hang', 'hanging', 'hung', 'freeze', 'frozen',\n                  'timeout', 'timed out',\n                  'oom', 'out of memory', 'memory error',\n                  'nan', 'diverge', 'diverged',\n                  // Common typos\n                  'bugg', 'bgu', 'eror', 'errror', 'crahs', 'fial', 'brokn', 'broke'\n                ],\n                patterns: [/not\\s*work/i, /doesn'?t\\s*work/i, /won'?t\\s*work/i, /can'?t\\s*work/i]\n              },\n              \n              documentation: {\n                keywords: [\n                  // Standard terms\n                  'doc', 'docs', 'documentation', 'readme',\n                  'guide', 'tutorial', 'howto', 'how-to', 'how to',\n                  'typo', 'typos', 'spelling', 'grammar',\n                  'example', 'examples', 'sample', 'samples',\n                  'instruction', 'instructions',\n                  'clarify', 'clarification', 'unclear', 'confusing',\n                  'outdated', 'out of date', 'stale',\n                  'missing documentation', 'missing docs',\n                  'broken link', 'dead link', '404',\n                  // Common typos\n                  'documention', 'documenation', 'documentaion', 'tutoral', 'toturial'\n                ],\n                patterns: [/issue\\s*on\\s*page/i, /page\\s*.*\\.html/i]\n              },\n              \n              'feature-request': {\n                keywords: [\n                  // Standard terms\n                  'feature', 'feature request', 'feature-request',\n                  'enhancement', 'improvement',\n                  'implement', 'implementation',\n                  'new feature', 'add feature',\n                  'support for', 'add support',\n                  'would be nice', 'would be great', 'would be helpful',\n                  'suggestion', 'suggest', 'proposal', 'propose',\n             
     'wishlist', 'wish list',\n                  // Common typos\n                  'feture', 'featrue', 'enchancement', 'improvment'\n                ],\n                patterns: [/add\\s+support\\s+for/i, /please\\s+add/i, /would\\s+be\\s+(nice|great|helpful)/i]\n              },\n              \n              // ----- Hardware Labels (independent - multiple can be applied) -----\n              Trn1: {\n                keywords: [\n                  'trn1', 'trn-1', 'trn 1', 'trn1n',\n                  'trn1.2xlarge', 'trn1.32xlarge', 'trn1n.32xlarge',\n                  'trainium', 'trainium1', 'trainium 1', 'trainium-1',\n                  // Common typos\n                  'tranium', 'trainuim', 'trn-1n'\n                ],\n                patterns: [/trn1n?(?:\\.[0-9]*xlarge)?/i, /trainium\\s*1?(?!\\s*2)/i]\n              },\n              \n              Trn2: {\n                keywords: [\n                  'trn2', 'trn-2', 'trn 2',\n                  'trn2.48xlarge',\n                  'trainium2', 'trainium 2', 'trainium-2',\n                  // Common typos\n                  'tranium2', 'trainuim2'\n                ],\n                patterns: [/trn2(?:\\.[0-9]*xlarge)?/i, /trainium\\s*2/i]\n              },\n              \n              Inf1: {\n                keywords: [\n                  'inf1', 'inf-1', 'inf 1',\n                  'inf1.xlarge', 'inf1.2xlarge', 'inf1.6xlarge', 'inf1.24xlarge',\n                  'inferentia', 'inferentia1', 'inferentia 1', 'inferentia-1',\n                  // Common typos\n                  'infertia', 'inferntia', 'infernita'\n                ],\n                patterns: [/inf1(?:\\.[0-9]*xlarge)?/i, /inferentia\\s*1?(?!\\s*2)/i]\n              },\n              \n              Inf2: {\n                keywords: [\n                  'inf2', 'inf-2', 'inf 2',\n                  'inf2.xlarge', 'inf2.8xlarge', 'inf2.24xlarge', 'inf2.48xlarge',\n                  'inferentia2', 'inferentia 2', 'inferentia-2',\n                  // Common typos\n                  'infertia2', 'inferntia2', 'infernita2'\n                ],\n                patterns: [/inf2(?:\\.[0-9]*xlarge)?/i, /inferentia\\s*2/i]\n              },\n              \n              // ----- Use Case Labels (independent - both can be applied) -----\n              Inference: {\n                keywords: [\n                  // Standard terms\n                  'inference', 'inferencing',\n                  'predict', 'prediction', 'predictions', 'predicting',\n                  'serving', 'serve', 'server',\n                  'batch inference', 'real-time', 'realtime',\n                  'endpoint', 'endpoints',\n                  // Common typos\n                  'infernce', 'inferance', 'prediciton', 'deploymnet'\n                ],\n                patterns: [/infer(?:ence|ring)?/i, /predict(?:ion|ing)?/i, /deploy(?:ment|ing)?/i]\n              },\n              \n              Training: {\n                keywords: [\n                  // Standard terms\n                  'training', 'train', 'trained',\n                  'fine-tune', 'finetune', 'fine tune', 'finetuning', 'fine-tuning',\n                  'pretrain', 'pre-train', 'pretraining', 'pre-training',\n                  'learning', 'learn',\n                  'gradient', 'gradients',\n                  'backward', 'backprop', 'backpropagation',\n                  'loss', 'convergence', 'converge',\n                  'epoch', 'epochs',\n                  'checkpoint', 'checkpointing',\n                  // 
Common typos\n                  'trainig', 'traning', 'trainin', 'fintune', 'finetunning'\n                ],\n                patterns: [/train(?:ing|ed)?/i, /fine[\\s-]?tun(?:e|ing)/i, /pre[\\s-]?train(?:ing)?/i]\n              }\n            };\n            \n            // =============================================================================\n            // MATCHING LOGIC\n            // =============================================================================\n            \n            function matchesLabel(config) {\n              const contentLower = content.toLowerCase();\n              // Check keywords (case-insensitive substring match)\n              for (const keyword of config.keywords) {\n                if (contentLower.includes(keyword.toLowerCase())) {\n                  return true;\n                }\n              }\n              // Check regex patterns\n              for (const pattern of config.patterns) {\n                if (pattern.test(content)) {\n                  return true;\n                }\n              }\n              return false;\n            }\n            \n            // Issue Type Labels - MUTUALLY EXCLUSIVE (priority: bug > documentation > feature-request)\n            if (matchesLabel(labelConfig.bug)) {\n              labels.push('bug');\n            } else if (matchesLabel(labelConfig.documentation)) {\n              labels.push('documentation');\n            } else if (matchesLabel(labelConfig['feature-request'])) {\n              labels.push('feature-request');\n            }\n            \n            // Hardware/Instance Type Labels - INDEPENDENT (multiple can be applied)\n            if (matchesLabel(labelConfig.Trn1)) {\n              labels.push('Trn1');\n            }\n            if (matchesLabel(labelConfig.Trn2)) {\n              labels.push('Trn2');\n            }\n            if (matchesLabel(labelConfig.Inf1)) {\n              labels.push('Inf1');\n            }\n            if (matchesLabel(labelConfig.Inf2)) {\n              labels.push('Inf2');\n            }\n            \n            // Use Case Labels - INDEPENDENT (both can be applied)\n            if (matchesLabel(labelConfig.Inference)) {\n              labels.push('Inference');\n            }\n            if (matchesLabel(labelConfig.Training)) {\n              labels.push('Training');\n            }\n            \n            core.setOutput('labels', labels.join(','));\n            core.setOutput('has_labels', labels.length > 0);\n\n      - name: Apply labels to issue\n        if: steps.analyze_content.outputs.has_labels == 'true'\n        env:\n          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n        run: |\n          IFS=',' read -ra LABELS <<< \"${{ steps.analyze_content.outputs.labels }}\"\n          for label in \"${LABELS[@]}\"; do\n            gh issue edit ${{ github.event.issue.number }} --add-label \"$label\" -R ${{ github.repository }}\n          done\n"
  },
  {
    "path": ".gitignore",
    "content": "_build/\n__pycache__/\n.venv/\n.DS_Store\nsrc/examples/pytorch/libtorch_demo.tar.gz\nsrc/neuronperf.tar.gz\n*-checkpoint.ipynb\n.idea/\n.vscode/\nnki/*/generated/\nuncommitted/"
  },
  {
    "path": ".readthedocs.yml",
    "content": "# .readthedocs.yml\n# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\n# Required\nversion: 2\n\n# Set the version of Python and other tools you might need\nbuild:\n  os: \"ubuntu-22.04\"\n  tools:\n    python: \"3.10\"\n#  jobs:\n#    pre_build:\n#      - python -m sphinx -b linkcheck . _build/linkcheck\n\n# Build documentation in the docs/ directory with Sphinx\nsphinx:\n  configuration: conf.py\n\n#conda\n#conda:\n#  file: readthedocs-environment.yml\n\n# Build documentation with MkDocs\n#mkdocs:\n#  configuration: mkdocs.yml\n\n# Optionally build your docs in additional formats such as PDF\n#formats:\n#  - pdf\n\n# Optionally set the version of Python and requirements required to build your docs\npython:\n  install:\n    - requirements: requirements.txt"
  },
  {
    "path": "CODEOWNERS",
    "content": "# This file creates codeowners for the documentation. It will allow setting code reviewers for all Pull requests to merge to the master branch \n# Each line is a file pattern followed by one or more owners.\n\n# Reference guide - https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-code-owners#example-[…]ners-file\n# Example - These owners will be the default owners for everything in\n# the repo. Unless a later match takes precedence,\n# @global-owner1 and @global-owner2 will be requested for\n# review when someone opens a pull request.\n# *       @global-owner1 @global-owner2\n\n*       @aws-maens @micwade-aws @musunita @aws-sadaf @rgrandhiamzn @eshalakhotia @jluntamazon @jeffhataws @aws-rhsoln @hannanjgaws @PrashantSaraf @aws-donkrets @aws-singhada @gsnaws @awsjoshir @sidjoshiaws @pinak-p @vikas-paliwal-aws @aarondou @mrinalks @erickson-doug @lnixaws @micwade-aws \n\nsrc/examples/mxnet/ @aws-rhsoln  @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/mxnet-neuron/  @aws-rhsoln @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/mxnet-neuron/tutorials/ @musunita @aws-rhsoln @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\n\nsrc/examples/tensorflow/  @awshaichen  @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/tensorflow-neuron/ @awshaichen @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/tensorflow-neuron/tutorials/ @musunita @awshaichen @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\n\n\nsrc/examples/pytorch/ @jluntamazon @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/pytorch-neuron/  @jluntamazon @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\nneuron-guide/neuron-frameworks/pytorch-neuron/tutorials/ @musunita @jluntamazon @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia\n\nlibraries/nxd-inference/ @huntingcarlisle @lccasagrande @lipovsek-aws @erickson-doug @eshalakhotia @pinak-p @hannanjgaws @akhil-aws @ahimsh-aws @rgrandhiamzn @yahavb @FThompsonAWS @gsnaws @sidjoshiaws @jluntamazon @musunita"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing Guidelines\n\nThank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional\ndocumentation, we greatly value feedback and contributions from our community.\n\nPlease read through this document before submitting any issues or pull requests to ensure we have all the necessary\ninformation to effectively respond to your bug report or contribution.\n\n## Reporting Bugs/Feature Requests\n\nWe welcome you to use the GitHub issue tracker to report bugs or suggest features.\n\nWhen filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already\nreported the issue. Please try to include as much information as you can. Details like these are incredibly useful:\n\n* A reproducible test case or series of steps\n* The version of our code being used\n* Any modifications you've made relevant to the bug\n* Anything unusual about your environment or deployment\n\n## Contributing Workflow (via Pull Requests)\n\nContributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:\n\n1. You are working against the latest source on the *master* branch.\n2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.\n3. You open an issue to discuss any significant work - we would hate for your time to be wasted.\n\n**Important**: Currently, local doc builds require a Python 3.9 environment. If you are on MacOS, you can install it from the terminal with `brew install python@3.9`. Add it to your working path with `brew link python@3.9` and confirm it works by running `python3.9 --version`.\n\n### Docker Build\n\nIf you don't have Python 3.9/3.10 or a compatible gcc toolchain, use the Docker workflow:\n\n```bash\n./build.sh build   # Build Docker image (first time only)\n./build.sh html    # Build HTML docs to _build/html/\n./build.sh shell   # Interactive shell for debugging\n./build.sh clean   # Remove _build/ directory\n```\n\n### Manual Build\n\nTo send us a pull request, please:\n\n1. Clone the repository locally:\n\n    ```bash\n    git clone git@github.com:YOUR-USERNAME/private-aws-neuron-sdk-staging.git\n    ```\n\n2. Install the build dependencies. This requires a Python 3.9 installation and venv:\n\n    ```bash\n    cd .. # The root folder where you have your cloned Git repos; don't run this in the repo folder but one level up or you'll have venv files in your repo folder\n    python3.9 -m venv venv && . venv/bin/activate\n    pip install -U pip\n    cd private-aws-neuron-sdk-staging\n    pip install -r requirements.txt\n    ```\n\n3. Build the documentation into HTML. This command will allow you to view the\n   rendered documentation by opening the generated `_build/html/index.html`. On first run, this will take about 15 mins. Subsequent html generations are incremental and will take less time.\n\n   Run:\n\n   ```bash\n   sphinx-build -b html . _build/html\n   ```\n\n   Or leverage the make file and run:\n\n   ```bash\n   make html\n   ```\n\n   If this doesn't work, try this command:\n\n   ```bash\n   sphinx-build -C -b html . _build/html\n   ```\n\n   For speedier builds in multiprocessor environments, run:\n\n     ```bash\n   sphinx-build -b html . 
_build/html -j auto\n   ```\n\n   **NOTE**: If you get an error for the spelling extension, like `Extension error: Could not import extension sphinxcontrib.spelling (exception: The 'enchant' C library was not found and maybe needs to be installed. See  https://pyenchant.github.io/pyenchant/install.html`, run `brew install enchant`.\n\n4. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.\n5. Rebuild the documentation with `sphinx-build -b html . _build/html`. Always ensure that the docs build without errors and that your changes look correct before pushing your changes to remote.\n    * If you encounter errors that are unclear, run the build in verbose mode with `sphinx-build -vv -b html . _build/html`.\n6. Commit your changes to your branch with a clear, scoped commit message. Bad: \"fixed stuff\". Good: \"Updated ref IDs in all containers topics\".\n7. Push your changes to remote (`git push origin`) and create a PR from your branch into `master` or the standing release branch (example: `release-2.27.0`). Answer any default questions in the pull request interface.\n    * See: [pull request guide](https://help.github.com/articles/creating-a-pull-request/).\n8. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.\n\nUpdated process documentation can be found here: [Runbook: Authoring a topic for the Neuron documentation](https://quip-amazon.com/e9B9AM7Npb17/Runbook-Authoring-a-topic-for-the-Neuron-documentation).\n\n## Updating the sitemap\n\nIf you add or remove a topic, you must recreate the sitemap. To do so:\n\n1. From a shell, `cd` to the root of this repo (`private-aws-neuron-sdk-staging`) on your local machine.\n2. Run the following command: `python3 ./_utilities/create_sitemap.py`. This will generate the sitemap as `sitemap.xml` in the root folder of the repo.\n3. Rename the `sitemap.xml` file to `sitemap1.xml`.\n4. Move the `sitemap1.xml` file to the `/static` folder, copying over the previous version.\n5. Delete the generated `sitemap.xml` file from the root (**not** from `/static`) if you did a copy instead of a move.\n6. Push a PR with the updated sitemap to remote and ask DougEric to review and approve it.\n\n## Finding contributions to work on\n\nLooking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.\n* Or, if you're so inclined, get on DougEric's Christmas card list by fixing broken links, formatting errors, removing stale topics, and fixing spelling/grammar errors.\n\n## Code of Conduct\n\nThis project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).\nFor more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact\nopensource-codeofconduct@amazon.com with any additional questions or comments.\n\n## Security issue notifications\n\nIf you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). 
Please do **not** create a public GitHub issue.\n\n## Licensing\n\nSee the [LICENSE-DOCUMENTATION](./LICENSE-DOCUMENTATION), [LICENSE-SAMPLECODE](./LICENSE-SAMPLECODE), and [LICENSE-SUMMARY-DOCS-SAMPLES](./LICENSE-SUMMARY-DOCS-SAMPLES) files for our project's licensing. We will ask you to confirm the licensing of your contribution.\n\nWe may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.\n"
  },
  {
    "path": "Dockerfile",
    "content": "FROM python:3.10-slim\n\nRUN apt-get update && apt-get install -y --no-install-recommends \\\n    make enchant-2 git pandoc \\\n    && rm -rf /var/lib/apt/lists/* \\\n    && pandoc --version\n\nCOPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv\n\nWORKDIR /docs\n\nCOPY requirements.txt .\nRUN uv pip install --system -r requirements.txt --extra-index-url=https://pypi.org/simple\n\nENTRYPOINT [\"/bin/bash\"]\n"
  },
  {
    "path": "LICENSE-DOCUMENTATION",
    "content": "*** Documentation:\n\nCreative Commons Attribution-ShareAlike 4.0 International Public License\n\nBy exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License (\"Public License\"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.\n\nSection 1 – Definitions.\n\t\n     a.\tAdapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.\n\t\n     b.\tAdapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.\n\t\n     c.\tBY-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License.\n\t\n     d.\tCopyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.\n\t\n     e.\tEffective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.\n\t\n     f.\tExceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.\n\t\n     g.\tLicense Elements means the license attributes listed in the name of a Creative Commons Public License. 
The License Elements of this Public License are Attribution and ShareAlike.\n\t\n     h.\tLicensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.\n\t\n     i.\tLicensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.\n\t\n     j.\tLicensor means the individual(s) or entity(ies) granting rights under this Public License.\n\t\n     k.\tShare means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.\n\t\n     l.\tSui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.\n\t\n     m.\tYou means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.\n\nSection 2 – Scope.\n\t\n     a.\tLicense grant.\n\t\n          1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:\n\n               A. reproduce and Share the Licensed Material, in whole or in part; and\t\n\n               B. produce, reproduce, and Share Adapted Material.\n\t\n          2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.\n\t\n          3. Term. The term of this Public License is specified in Section 6(a).\n\t\n          4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.\n\t\n          5. Downstream recipients.\n\n               A. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.\n\t\n               B. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply.\n\t\n               C. No downstream restrictions. 
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.\n\t\n          6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).\n\t\n     b.\tOther rights.\n\t\n          1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.\n\t\n          2. Patent and trademark rights are not licensed under this Public License.\n\t\n          3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties.\n\nSection 3 – License Conditions.\n\nYour exercise of the Licensed Rights is expressly made subject to the following conditions.\n\t\n     a.\tAttribution.\n\t\n          1. If You Share the Licensed Material (including in modified form), You must:\n\n               A. retain the following if it is supplied by the Licensor with the Licensed Material:\n\n                    i.\tidentification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);\n\n                    ii.\ta copyright notice;\n\n                    iii. a notice that refers to this Public License;\n\n                    iv.\ta notice that refers to the disclaimer of warranties;\n\n                    v.\ta URI or hyperlink to the Licensed Material to the extent reasonably practicable;\n\n               B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and\n\n               C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.\n\t\n          2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.\n\t\n          3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.\n\t\n     b.\tShareAlike.In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply.\n\t\n          1. The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License.\n\t\n          2. 
You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material.\n\t\n          3. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply.\n\nSection 4 – Sui Generis Database Rights.\n\nWhere the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:\n\t\n     a.\tfor the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database;\n\t\n     b.\tif You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and\n\t\n     c.\tYou must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.\nFor the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.\n\nSection 5 – Disclaimer of Warranties and Limitation of Liability.\n\t\n     a.\tUnless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.\n\t\n     b.\tTo the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.\n\t\n     c.\tThe disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.\n\nSection 6 – Term and Termination.\n\t\n     a.\tThis Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.\n\t\n     b.\tWhere Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:\n\t\n          1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or\n\t\n          2. 
upon express reinstatement by the Licensor.\n\t\n     c.\tFor the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.\n\t\n     d.\tFor the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.\n\t\n     e.\tSections 1, 5, 6, 7, and 8 survive termination of this Public License.\n\nSection 7 – Other Terms and Conditions.\n\t\n     a.\tThe Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.\n\t\n     b.\tAny arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.\n\nSection 8 – Interpretation.\n\t\n     a.\tFor the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.\n\t\n     b.\tTo the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.\n\t\n     c.\tNo term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.\n\t\n     d.\tNothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.\n"
  },
  {
    "path": "LICENSE-SAMPLECODE",
    "content": "Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify,\nmerge, publish, distribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A\nPARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\nOF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "LICENSE-SUMMARY-DOCS-SAMPLES",
    "content": "*** Documentation and Sample Code:\n\nCopyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\nThe documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.\n\nThe sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file.\n"
  },
  {
    "path": "Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = $(CURDIR)\nBUILDDIR      = _build\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile clean\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\nclean:\n\t-rm -rf $(BUILDDIR)/*\n\n"
  },
  {
    "path": "README.md",
    "content": "![neuron](./images/Site-Merch_Neuron-ML-SDK_Editorial.png)\n\n# AWS Neuron\n\n## Neuron SDK Overview\n\nAWS Neuron is a software development kit (SDK) enabling high-performance deep learning acceleration using AWS Inferentia and Trainium, AWS's custom designed machine learning accelerators. With Neuron, you can develop, profile, and deploy high-performance machine learning workloads on top of accelerated EC2 instances, e.g. Inf1 and Trn1.\n\nNeuron includes a compiler, runtime driver, as well as debug and profiling utilities with a TensorBoard plugin for visualization, and is pre-integrated into popular machine learning frameworks like Pytorch, TensorFlow and MXNet, to provide a seamless machine learning acceleration workflow.\n\n## Neuron SDK’s documentation\n\nFor full documentations including user guide, Howtos and Tutorials see [Neuron SDK’s documentation](https://awsdocs-neuron.readthedocs-hosted.com/)\n\n## Support\nIf none of the github and online resources have an answer to your question, checkout the AWS Neuron [support forum](https://forums.aws.amazon.com/forum.jspa?forumID=355).\n"
  },
  {
    "path": "_backup-setup/neuron-setup/multiframework/multi-framework-ubuntu22-neuron-dlami.rst",
    "content": ".. _setup-ubuntu22-multi-framework-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nGet Started with Neuron on Ubuntu 22 with Neuron Multi-Framework DLAMI\n======================================================================\n\nYou can quickly get started on Ubuntu 22 using the Neuron Deep Learning AMI (DLAMI). Then, start using one of the multiple frameworks or libraries that Neuron SDK supports by\nactivating the corresponding virtual environment. Each virtual environment comes pre-installed with Neuron libraries needed for you to get started. The Neuron DLAMI supports all Neuron instances (Inf1/Inf2/Trn1/Trn1n/Trn2/Trn3)\nand is updated with each Neuron SDK release. To start using the latest version of the Neuron DLAMI, use the following steps:\n\nStep 1:  Launch the instance using Neuron DLAMI\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOnce you open the `EC2 Console <https://console.aws.amazon.com/ec2>`_, select your desired AWS region and choose \"Launch Instance\". Under AMI selection select the \"Quick Start\"\nand \"Ubuntu\", choose the \"Deep Learning AMI Neuron (Ubuntu 22.04)\"(see screenshot below). Once you have selected the AMI, select the desired Neuron Instance(Inf1/Inf2/Trn1/Trn1n/Trn2/Trn3) , \nconfigure disk size and other criteria, launch the instance\n\n.. image:: /images/neuron-multi-framework-dlami-quick-start.png\n    :scale: 20%\n    :align: center\n\n\n.. note::\n  If you are looking to use the Neuron DLAMI in your cloud automation flows , Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to easily retrieve the latest DLAMI id.\n\n\n\nStep 2: Activate the desired virtual environment \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  \n\nYou can activate one of the virtual environments depending on the library or framework you are interested in:\n\n1. Get the desired virtual environment name for the framework/library by referring to :ref:`the Neuron DLAMI overview <neuron-dlami-multifw-venvs>`.\n2. Activate the virtual environment by using:\n\n  ::\n\n    source /opt/<name_of_virtual_environment>/bin/activate\n\n\nAfter you have activated the desired virtual environment , you can try out one of the tutorials listed in the corresponding framework or library training and inference section.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "_backup-setup/neuron-setup/multiframework/multi-framework-ubuntu24-neuron-dlami.rst",
    "content": ".. _setup-ubuntu24-multi-framework-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nGet Started with Neuron on Ubuntu 24 with Neuron Multi-Framework DLAMI\n======================================================================\n\nYou can quickly get started on Ubuntu 24 using the Neuron Deep Learning AMI (DLAMI). Then, start using one of the multiple frameworks or libraries that Neuron SDK supports by\nactivating the corresponding virtual environment. Each virtual environment comes pre-installed with Neuron libraries needed for you to get started. The Neuron DLAMI supports all Neuron instances (Inf2/Trn1/Trn1n/Trn2/Trn3)\nand is updated with each Neuron SDK release. To start using the latest version of the Neuron DLAMI, use the following steps:\n\nStep 1:  Launch the instance using Neuron DLAMI\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOnce you open the `EC2 Console <https://console.aws.amazon.com/ec2>`_, select your desired AWS region and choose \"Launch Instance\". Under AMI selection select the \"Quick Start\"\nand \"Ubuntu\", choose the \"Deep Learning AMI Neuron (Ubuntu 24.04)\"(see screenshot below). Once you have selected the AMI, select the desired Neuron Instance(Inf2/Trn1/Trn1n/Trn2/Trn3), \nconfigure disk size and other criteria, launch the instance\n\n.. image:: /images/neuron-multi-framework-dlami-U24-quick-start.png\n    :scale: 20%\n    :align: center\n\n\n.. note::\n  If you are looking to use the Neuron DLAMI in your cloud automation flows , Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to easily retrieve the latest DLAMI id.\n\n\n\nStep 2: Activate the desired virtual environment \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  \n\nYou can activate one of the virtual environments depending on the library or framework you are interested in:\n\n1. Get the desired virtual environment name for the framework/library by referring to :ref:`the Neuron DLAMI overview <neuron-dlami-multifw-venvs>`.\n2. Activate the virtual environment by using:\n\n  ::\n\n    source /opt/<name_of_virtual_environment>/bin/activate\n\n\nAfter you have activated the desired virtual environment , you can try out one of the tutorials listed in the corresponding framework or library training and inference section.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2-base-dlami.rst",
    "content": ".. _setup-torch-neuron-al2-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Amazon Linux 2 with DLAMI Base\n=======================================================================\n\n.. note::\n   As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-amazon-linux-2/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Amazon Linux 2) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-torch-neuron-al2.txt\n\n.. include :: /archive/torch-neuron/setup/pytorch-update-al2.rst\n\n.. include :: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2-pytorch-dlami.rst",
    "content": ".. _setup-torch-neuron-al2-pytorch-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Amazon Linux 2 with Pytorch DLAMI\n=========================================================================\n\n.. note::\n   As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Neuron Pytorch 1.13 AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-amazon-linux-2/>`_ and copy the AMI name that starts with \"Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see an exact matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Update Neuron Drivers\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=1.13.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1\n\n.. dropdown::  Get Started With Pytorch DLAMI\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 98\n            :end-line: 99\n\n.. card:: Visit PyTorch Neuron(``torch-neuron``) for Inference section\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-al2-dlami.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2.rst",
    "content": ".. _setup-torch-neuron-al2:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Amazon Linux 2\n=========================================================\n\n.. note::\n   As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Amazon Linux 2 AMI(HVM) - Kernel 5.10\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance  \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-torch-neuron-al2.txt\n\n.. include :: /archive/torch-neuron/setup/pytorch-update-al2.rst\n\n.. include :: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2023.rst",
    "content": ".. _setup-torch-neuron-al2023:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Amazon Linux 2023\n===========================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Amazon Linux 2023 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance  \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-torch-neuron-al2023.txt\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-al2023.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2023.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20-base-dlami.rst",
    "content": ".. _setup-torch-neuron-u20-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Ubuntu 20 with DLAMI Base\n==================================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Ubuntu 20.04) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-torch-neuron-u20.txt\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-u20.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20-pytorch-dlami.rst",
    "content": ".. _setup-torch-neuron-u20-pytorch-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Ubuntu 20 with Pytorch DLAMI\n=====================================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Neuron Pytorch 1.13 AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-ubuntu-20-04/>`_ and copy the AMI name that starts with \"Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see an exact matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n\n.. dropdown::  Update Neuron Drivers\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=1.13.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1\n\n\n.. dropdown::  Get Started With Pytorch DLAMI\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 101\n            :end-line: 102\n\n.. card:: PyTorch Neuron(``torch-neuron``) for Inference\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-u20-dlami.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20.rst",
    "content": ".. _setup-torch-neuron-u20:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Ubuntu 20\n====================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Ubuntu Server 20 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-torch-neuron-u20.txt\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-u20.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu22.rst",
    "content": ".. _setup-torch-neuron-u22:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuron\") Setup on Ubuntu 22\n=====================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Ubuntu Server 22 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n.. include:: /includes/setup/tab-inference-torch-neuron-u22.txt\n\n.. include:: /archive/torch-neuron/setup/pytorch-update-u22.rst\n\n.. include:: /archive/torch-neuron/setup/pytorch-install-prev-u22.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2-base-dlami.rst",
    "content": ".. _setup-torch-neuronx-al2-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Amazon Linux 2 with DLAMI Base\n=========================================================================\n\n.. note::\n    As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-amazon-linux-2/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Amazon Linux 2) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 2\n        :end-line: 3\n\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-al2.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2-pytorch-dlami.rst",
    "content": ".. _setup-torch-neuronx-al2-dlami-pytorch:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Amazon Linux 2 with DLAMI Pytorch\n===========================================================================\n\n.. note::\n    As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Check for the latest version of the `DLAMI Neuron Pytorch 1.13 AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-amazon-linux-2/>`_ and copy the AMI name that starts with \"Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see an exact matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Update Neuron Drivers\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1\n\n.. dropdown::  Get Started With Pytorch DLAMI\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 50\n            :end-line: 51\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2-dlami.rst\n\n.. 
include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2.rst",
    "content": ".. _setup-torch-neuronx-al2:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Amazon Linux 2\n=========================================================\n\n.. note::\n   As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide <eos-al2>`\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Amazon Linux 2 AMI(HVM) - Kernel 5.10\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 2\n        :end-line: 3\n\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-al2.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.rst",
    "content": ".. _setup-torch-neuronx-al2023:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Amazon Linux 2023\n============================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Amazon Linux 2023 AMI\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 239\n        :end-line: 240\n\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-al2023.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20-base-dlami.rst",
    "content": ".. _setup-torch-neuronx-ubuntu20-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Ubuntu 20 with DLAMI Base\n====================================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Ubuntu 20.04) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 5\n        :end-line: 6\n\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-u20.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20-pytorch-dlami.rst",
    "content": ".. _setup-torch-neuronx-ubuntu20-dlami-pytorch:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Ubuntu 20 with DLAMI Pytorch\n======================================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Check for the latest version of the `DLAMI Neuron Pytorch 1.13 AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-ubuntu-20-04/>`_ and copy the AMI name that starts with \"Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see an exact matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Update Neuron Drivers\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1\n\n.. dropdown::  Get Started With Pytorch DLAMI\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 53\n            :end-line: 54\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n \n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20-dlami.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20.rst",
    "content": ".. _setup-torch-neuronx-ubuntu20:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :width: 100%\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Ubuntu 20\n===================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n.. include:: /setup/install-templates/trn1-ga-warning.txt\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Ubuntu Server 20 AMI\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 5\n        :end-line: 6\n\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-u20.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.rst",
    "content": ".. _setup-torch-neuronx-ubuntu22:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Ubuntu 22\n=====================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n.. include:: /setup/install-templates/trn1-ga-warning.txt\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Ubuntu Server 22 AMI\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 242\n        :end-line: 243\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-u22.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u22.rst\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.rst"
  },
  {
    "path": "_backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu24.rst",
    "content": ".. _setup-torch-neuronx-ubuntu24:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nPyTorch Neuron (\"torch-neuronx\") Setup on Ubuntu 24\n=====================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of PyTorch Neuron (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n.. include:: /setup/install-templates/trn1-ga-warning.txt\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Ubuntu Server 24 AMI\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 299\n        :end-line: 300\n\n.. include:: /includes/setup/tab-inference-torch-neuronx-u24.txt\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u24.rst\n\n"
  },
  {
    "path": "_content-types/conceptual-deep-dive.rst",
    "content": ".. meta::\n   :description: {short description here}\n   :date_updated: {planned date of publication here}\n\n.. _{RST page ref string here}:\n\n================================================================================\nDeep dive: {concept/practice/technique name; use sentence-case, not title case!}\n================================================================================\n\n.. {SEO-friendly intro paragraph, no more than 3 sentences total.} \nThis topic explores {subjects} in depth and discusses the technical details of it from the perspective of an AWS Neuron expert. Some experience in {related subjects here} is required to understand it in full.\n\nWhat you should know before reading\n-----------------------------------\n\n.. {If there is anything the reader should know before diving into this material, note it here and provide any supporting links. This also helps LLMs training on this content have greater technical context for this subject.}\n\nBefore you start, you must be familiar with the following:\n\n- **Concept 1:** {Brief description. Link to a related topic if necessary.}\n- **Concept 2:** {Brief description. Link to a related topic if necessary.}\n\nOverview\n---------\n\n.. {Your first section, which should cover the subject from the title at a high level. If appropriate, note when this concept is applicable in Neuron components and developer workflows. Starting off with a diagram can help illustrate the concept.}\n\nPARAGRAPH 1\n\nPARAGRAPH 2\n\n.. image:: images/diagram-name.png\n   :alt: {Alt text for diagram}\n   :align: center\n\n{Section 1 Title}\n-----------------\n\n.. {Each section should build on top of what was discussed in the previous sections. If a new concept is introduced that wasn't discussed previously, link to a topic that covers it. You can add subsections within this section if it helps to break it up more and clarify the content, but do not go more than 1-2 levels deep.}\n\nPARAGRAPH 1\n\nPARAGRAPH 2\n\n.. code-block:: python\n\n   # Code example if applicable\n   def example_function():\n       pass\n\n{Section 2 Title}\n-----------------\n\n.. {Each section should build on top of what was discussed in the previous sections. If a new concept is introduced that wasn't discussed previously, link to a topic that covers it. You can add subsections within this section if it helps to break it up more and clarify the content, but do not go more than 1-2 levels deep.}\n\nPARAGRAPH 1\n\nPARAGRAPH 2\n\n.. code-block:: python\n\n   # Code example if applicable\n   def example_function():\n       pass\n\n.. {Add more sections as appropriate to logically break up the content. Each section should be focused on a specific aspect of the concept.}\n\n{optional}Related Concepts\n----------------\n\n* :ref:`link-reference-name` - {description}\n* :ref:`link-reference-name` - {description}\n\n{optional}Further Reading\n---------------\n\n.. toctree::\n   :maxdepth: 1\n\n   * `External Link <URL>`_ - {description}\n   * :doc:`/path/to/internal/doc` - {description}\n\n\n.. (Note to both the writer and any AI incorporating this template: The content below is provided as a resource and should not be included as-is in any final document created using this template as a basis.)\n\n.. note::\n   .. Additional implementation details or important considerations can be added as admonitions.\n\n.. warning::\n   .. Critical information or potential pitfalls can be highlighted using warning admonitions.\n"
  },
  {
    "path": "_content-types/model-card.rst",
    "content": ".. _unique-ref-id-here:\n\n.. meta::\n    :description: AWS Neuron SDK model card for {Model Name}, version {version}. Overview, intended use, training data, performance, limitations, ethical considerations, and citations.\n    :date-modified: 2026-10-03\n\nModel Card: {Model Name}\n=======================\n\n.. contents:: Table of Contents\n   :depth: 1\n   :local:\n\nModel overview\n--------------\n\n:Model name: {name}\n:Version: {version}\n:Organization: {organization}\n:License: {license}\n:Last updated: {date}\n\n.. warning::\n   {Important warnings or critical limitations}\n\nQuickstart\n----------\n\n.. code-block:: python\n\n   # Example usage code\n   from model import Model\n   model = Model.from_pretrained(\"model_name\")\n   output = model.generate(\"Your input text\")\n\nModel details\n-------------\n\nArchitecture\n^^^^^^^^^^^^\n\n- Base architecture: {architecture}\n- Number of parameters: {parameter_count}\n- Model dimensions: {model_dimensions}\n- Training objective: {training_objective}\n\nHardware requirements\n^^^^^^^^^^^^^^^^^^^^^\n\n- Minimum RAM: {min_ram}\n- Recommended GPU: {gpu_specs}\n- Disk space: {disk_space}\n\nIntended Use\n-----------\n\nPrimary uses\n^^^^^^^^^^^^\n* {use_case_1}\n* {use_case_2}\n* {use_case_3}\n\nOut-of-Scope uses\n^^^^^^^^^^^^^^^^^\n* {prohibited_use_1}\n* {prohibited_use_2}\n\nTraining data\n------------\n\nDatasets\n^^^^^^^^\n.. list-table::\n   :header-rows: 1\n\n   * - Dataset Name\n     - Size\n     - Description\n   * - {dataset1}\n     - {size1}\n     - {description1}\n   * - {dataset2}\n     - {size2}\n     - {description2}\n\nTraining procedure\n^^^^^^^^^^^^^^^^^^\n* Training hardware: {hardware_details}\n* Training time: {duration}\n* Training cost: {cost_estimate}\n* Carbon footprint: {carbon_impact}\n\nPerformance and limitations\n---------------------------\n\nBenchmarks\n^^^^^^^^^\n.. list-table::\n   :header-rows: 1\n\n   * - Benchmark\n     - Score\n     - Details\n   * - {benchmark1}\n     - {score1}\n     - {details1}\n   * - {benchmark2}\n     - {score2}\n     - {details2}\n\nKnown limitations\n^^^^^^^^^^^^^^^^^\n* {limitation_1}\n* {limitation_2}\n\nBias and fairness\n^^^^^^^^^^^^^^^^^\n* {bias_consideration_1}\n* {bias_consideration_2}\n\nEthical considerations\n----------------------\n\nPotential risks\n^^^^^^^^^^^^^^^\n* {risk_1}\n* {risk_2}\n\nMitigation strategies\n^^^^^^^^^^^^^^^^^^^^^\n* {strategy_1}\n* {strategy_2}\n\nModel details and notes\n----------------------\n\n{Provide detailed information about the model, its training, evaluation, and any other relevant aspects. Create the sections as needed.}\n\n{Section 1 title}\n^^^^^^^^^^^^^^^^^\n{Details for section 1.}\n\n{Section 2 title}\n^^^^^^^^^^^^^^^^^\n{Details for section 2.}\n\n{. . .}\n\nCitations\n---------\n\n.. code-block:: bibtex\n\n   @article{model_paper,\n       title={},\n       author={},\n       journal={},\n       year={}\n   }\n\nVersion history\n---------------\n\n.. list-table::\n   :header-rows: 1\n\n   * - Version\n     - Date\n     - Changes\n   * - {version1}\n     - {date1}\n     - {changes1}\n   * - {version2}\n     - {date2}\n     - {changes2}\n\nContact\n-------\n\n:Documentation Issues: {link_to_issues}\n:Support: {support_contact}\n:Website: {website_url}\n"
  },
  {
    "path": "_content-types/procedural-how-to.rst",
    "content": ".. meta::\n   :description: {short description here}\n   :date_updated: {planned date of publication here}\n\n.. _{RST page ref string here}:\n\n========================================================================\nHow to {verb phrase with specific features or models that will be used}\n========================================================================\n\nTask overview\n-------------\n\n.. {SEO-friendly intro paragraph, no more than 3 sentences total.} \nThis topic discusses how to {description of task or process here} using the AWS Neuron SDK. {Short description of what the task will accomplish.}\n\nPrerequisites\n-------------\n\n- **Prerequisite 1:** Description. Link to a related topic if necessary.\n- **Prerequisite 2:** Description. Link to a related topic if necessary.\n\nInstructions\n------------\n\n**1:** {First step; start with verb/action}\n\n.. {Describe what the user will do in this step, starting with a verb. If applicable, include any commands or code examples that illustrate the step.}\n\n.. code-block:: bash\n\n   # Command or code example\n   command --flag value\n\n.. {Additional detail if needed.}\n\n.. note::\n   .. {Optional; important information or caveats about this step}\n\n**2:** {Second step; start with verb/action}\n\n.. .. {Describe what the user will do in this step, starting with a verb. If applicable, include any commands or code examples that illustrate the step.}\n\n.. code-block:: python\n\n   # Code example if applicable\n   def example():\n       pass\n\n.. {Additional detail if needed.}\n\n.. note::\n   .. {Optional; important information or caveats about this step}\n\n.. **{More discrete steps as needed, following the same pattern as above.}**\n\n**N:** {Last step; start with verb/action}\n\n.. {Final step instructions}\n\nConfirm your work\n-----------------\n\nTo confirm you have successfully completed this task, {how to verify the task was done correctly}:\n\n.. {Provide them with a way to know they’ve done everything correctly. This could be a screenshot, command-line output, a tool to launch, or specific settings to check.}\n\n.. code-block:: bash\n\n   # Verification command if applicable\n   verify-command --check\n\nCommon issues\n-------------\n\nUh oh! Did you encounter an error or other issue while working through this task? Here are some commonly encountered issues and how to address them.\n\n.. rubric:: {Problem 1}\n\n- **Possible solution**: {detailed solution}\n\n.. rubric:: {Problem 2}\n\n- **Possible solution**: {detailed solution}\n\nRelated information\n-------------------\n\n.. toctree::\n   :maxdepth: 1\n\n   * `External Link <URL>`_ - {description}\n   * :doc:`/path/to/internal/doc` - {description}\n"
  },
  {
    "path": "_content-types/procedural-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. meta::\\n\",\n    \"    :description: {SEO-friendly short description of the tutorial. Include 'Neuron' and any keywords such as the language mode and framework.}\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: {title starting with verb}\\n\",\n    \"\\n\",\n    \"This tutorial guides you through using the AWS Neuron SDK to {description of what the reader will accomplish in this tutorial, using a specific component or framework}.\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Overview\\n\",\n    \"\\n\",\n    \"{Briefly summarize the purpose and outcome of this end-to-end tutorial}.\\n\",\n    \"{State what users will learn or achieve by completing the tutorial}.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Before you start\\n\",\n    \"\\n\",\n    \"To successfully complete this tutorial, you must have completed the following steps in advance:\\n\\n\",\n    \"- Downloaded and installed the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html) for {component}.\\n\",\n    \"- {prerequisite 2 description here. If the user must read a topic in advance or perform any complex preparations, provide a link to a topic or download}\\n\",\n    \"- {prerequisite 3 description here}\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Setup\\n\",\n    \"\\n\",\n    \"{Describe any initial local setup required before starting the tutorial.}\\n\",\n    \"{Include any code-specific installation, configuration, or environment setup steps.}\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Example setup command (Remove these comments and add the CLI commands, env variable declarations, or other operations for the user to prepare their environment.)\\n\",\n    \"# pip install package_name\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Tutorial steps\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 1: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\\n\",\n    \"\\n\",\n    \"{Describe the first main step. 
Provide code, commands, or configuration as needed.}\\n\",\n    \"\\n\",\n    \"{Optional} {Add any important notes, caveats, or warnings for this step.}\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(\\\"Code goes here!\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 2: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\\n\",\n    \"\\n\",\n    \"{Describe the second main step.}\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(\\\"Code goes here!\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 3: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\\n\",\n    \"\\n\",\n    \"Describe the third main step.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(\\\"Code goes here!\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step N: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\\n\",\n    \"\\n\",\n    \"Describe the last main step.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(\\\"Code goes here!\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\nCode completed. Now, let's run it...\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run the code\\n\",\n    \"\\n\",\n    \"To run this code, {action to take to run the code}:\\n\",\n    \"Include commands, expected outputs, or checks to perform.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Example verification command\\n\",\n    \"# python foo.py\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"If your code works, you will see output like this:\\n\\n\",\n    \"```\\n\",\n    \"Loading glorp inhume logic...Done!\\n\",\n    \"Configuring extubation channel instances...Done!\\n\\n\",\n    \"1111 | 2222 | 3333\\n\",\n    \"4444 | 5555 | 6666\\n\\n\",\n    \"Average glorps inhumed and extubated: 420\\n\",\n    \"Time to max glorp: 8 seconds\\n\",\n    \"```\\n\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\nCongratulations! You now know how to {goal of tutorial}. 
If your code did not run or did not produce similar results, see the [Common issues](#Common-issues) section below.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Common issues\\n\",\n    \"\\n\",\n    \"Here are some common errors and mistakes you can make when developing code using the approach in this tutorial, and how you may be able to address them:\\n\\n\",\n    \"- {describe error, symptoms, and possible solution}\\n\",\n    \"- {describe error, symptoms, and possible solution}\"\n   ]\n  },\n\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## (Optional) Next steps\\n\",\n    \"\\n\",\n    \"{Suggest what users might want to do next after completing the tutorial.\\n\",\n    \"Link to related topics or advanced guides.}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Related topics\\n\",\n    \"\\n\",\n    \"- [Related topic 1](link_here)\\n\",\n    \"- [Related topic 2](link_here)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.5\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "_content-types/reference-kernel-api.rst",
    "content": ".. meta::\n    :description: API reference for the {kernel-name} kernel included in the NKI Library .\n    :date-modified: MM/DD/YYYY\n\n.. currentmodule:: {kernel namespace}.{kernel module path}\n\nRMSNorm-Quant Kernel API Reference\n==================================\n\nThis topic provides the API reference for the ``{kernel name}`` kernel. The kernel performs optional RMS normalization followed by quantization to ``fp8``.\n\nThe kernel supports:\n\n* {feature 1}\n* {feature 2}\n* {feature 3}\n* ... {more features as needed}\n\nBackground\n-----------\n\nThe ``{kernel}`` kernel ... {description of kernel functionality based in sources}\n\nFor detailed information about the mathematical operations and implementation details, refer to the :doc:`{kernel name} Kernel Design Specification </nki/library/specs/{kernel-spec-doc-file-link}>`.\n\nAPI Reference\n--------------\n\n{kernel argument class name}\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:class:: {kernel argument class name}\n\n   {kernel name} Kernel arguments.\n\n   .. py:attribute:: {attribute-1}\n      :type: {attribute-1-type}\n\n      {description from docstring}\n\n   .. py:attribute:: {attribute-1}\n      :type: {attribute-1-type}\n\n      {description from docstring}\n\n    {more attributes as needed}\n\n   .. py:method:: {method syntax} -> {return type}\n\n      {description from docstring}\n\n   .. py:method:: {method syntax} -> {return type}\n\n      {description from docstring}\n\n   **Raises**:\n\n   * **{exception-1}** – {when exception is raised}\n   * **{exception-1}** – {when exception is raised}\n\n{kernel API function name in code}\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: rmsnorm_quant_kernel(hidden: nt.tensor, ln_w: nt.tensor, kargs: RmsNormQuantKernelArgs)\n\n   {definition of method used to instantiate or invoke kernel here, from source docstrings}\n   \n   {params and types with descriptions from source docstrings}\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **{optimization-or-feature}**: {description}\n2. **{optimization-or-feature}**: {description}\n3. **{optimization-or-feature}**: {description}\n\nExample\n--------\n\nThe following is a simple example of how to use the ``{kernel}`` kernel:\n\n.. code-block:: python\n\n   # Code here -- need usage example in pedagogical style.\n\nSee Also\n--------\n\n* :doc:`{kernel} </nki/library/specs/{link-to-kernel-spec}>`\n\n"
  },
  {
    "path": "_content-types/release-notes-templates/compiler.rst",
    "content": ".. _neuron-2-XX-0-compiler:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK compiler component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: Neuron Compiler release notes\n====================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/containers.rst",
    "content": ".. _neuron-2-XX-0-dlc:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning Containers (DLC) component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.0: Neuron Deep Learning Containers release notes\n====================================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/dlami.rst",
    "content": ".. _neuron-2-XX-0-dlami:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning AWS Machine Images (DLAMIs) component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: Neuron Deep Learning AWS Machine Images release notes\n============================================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.X release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/index.rst",
    "content": ".. _neuron-2-XX-0-whatsnew:\n.. _latest-neuron-release:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X release notes\n===================================\n\n**Date of release**: Month Day, 2026\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch support <nx-pytorch>\n   JAX support <nx-jax>\n   NxD Inference <nxd-inference>\n   NxD Training <nxd-training>\n   NxD Core <nxd-core>\n   Neuron Compiler <compiler>\n   NKI <nki>\n   Neuron Runtime <runtime>\n   Developer tools <tools>\n   Deep Learning AMIs <dlami>\n   Deep Learning Containers <containers>\n   Release artifacts <../releasecontent>\n\nWhat's new?\n-----------\n\nAWS and Annapurna Labs are excited to bring you release version 2.XX.X of the Neuron SDK! In this release you'll find improvements to...\n\n* . . .\n* . . .\n* . . .\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\nRelease highlights\n------------------\n\nVersion 2.XX.X brings some exciting new features! HYPE TEXT HERE\n\nHIGHLIGHT 1\n^^^^^^^^^^^\nHYPE TEXT HERE \n\n* TALKING POINT 1\n* TALKING POINT 2\n* . . .\n\nUSE CASE DESCRIPTION HERE\n\nFor more details, see :doc:`DOC LINK </release-notes/path/to/page>`\n\nHIGHLIGHT 2\n^^^^^^^^^^^\nHYPE TEXT HERE \n\n* TALKING POINT 1\n* TALKING POINT 2\n* . . .\n\nUSE CASE DESCRIPTION HERE\n\nFor more details, see :doc:`DOC LINK </release-notes/path/to/page>`\n\nHIGHLIGHT 3\n^^^^^^^^^^^\nHYPE TEXT HERE \n\n* TALKING POINT 1\n* TALKING POINT 2\n* . . .\n\nUSE CASE DESCRIPTION HERE\n\nFor more details, see :doc:`DOC LINK </release-notes/path/to/page>`\n\nOther important changes\n^^^^^^^^^^^^^^^^^^^^^^^\n\nThis release also includes the following improvements\n\n* . . . LINK TO COMPONENT RELEASE NOTE PAGE\n* . . . LINK TO COMPONENT RELEASE NOTE PAGE\n* . . . LINK TO COMPONENT RELEASE NOTE PAGE\n* . . . LINK TO COMPONENT RELEASE NOTE PAGE\n\nComponent release notes\n-----------------------\n\nSelect a card below to review detailed release notes for each component of the Neuron SDK version 2.XX.X. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n.. grid:: 1 1 2 2\n        :gutter: 2\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-pytorch\n                :link-type: ref\n\n                **PyTorch support** 2.XX.0 release notes\n                ^^^\n                Neuron features and solutions that support the PyTorch ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-jax\n                :link-type: ref\n\n                **JAX support** 2.XX.0 release notes\n                ^^^\n                Neuron features and solutions that support the JAX ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-nxd-training\n                :link-type: ref\n\n                **NxD Training** 2.XX.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model training.\n                +++\n                Supports: ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. 
grid-item-card:: \n                :link: neuron-2-XX-0-nxd-inference\n                :link-type: ref\n\n                **NxD Inference** 2.XX.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n        \n        .. grid-item-card::\n                :link: neuron-2-XX-0-nxd-core\n                :link-type: ref\n\n                **NxD Core** 2.XX.0 release notes\n                ^^^\n                Common features and tools for Neuron-based training and inference.\n                +++\n                Supports: ``Trn1`` / ``Trn1n``, ``Trn2``\n         \n        .. grid-item-card:: \n                :link: neuron-2-XX-0-compiler\n                :link-type: ref\n\n                **Neuron Compiler** 2.XX.0 release notes\n                ^^^\n                The Neuron compiler for AWS Trainium and Inferentia, and its libraries and tools.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-nki\n                :link-type: ref\n\n                **Neuron Kernel Interface (NKI)** 2.XX.0 release notes\n                ^^^\n                Neuron's Python-based programming interface for developing and optimizing Neuron kernels.\n                +++\n                Supports: ``Inf2``, ``Trn1``, ``Trn1n``\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-runtime\n                :link-type: ref\n\n                **Neuron Runtime** 2.XX.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-tools\n                :link-type: ref\n\n                **Neuron Developer Tools** 2.XX.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n\n        .. grid-item-card:: \n                :link: neuron-2-XX-0-dlami\n                :link-type: ref\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.XX.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n \n        .. grid-item-card:: \n                :link: neuron-2-XX-0-dlc\n                :link-type: ref\n\n                **Neuron Deep Learning Containers (DLCs)** 2.XX.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n        .. 
grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n        \n                **Neuron 2.XX.0 release artifacts**\n                ^^^\n                The libraries and packages updated in this release.\n\nSupport announcements\n---------------------\n\nThis section covers end-of-support announcements and the features, tools, and APIs that reach end of support in this release.\n\nEnd-of-support announcements\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n*An \"end-of-support (EoS)\" announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!*\n\n* END-OF-SUPPORT ANNOUNCEMENT 1 (link to announcement here)\n* . . .\n\nEnding support in 2.XX.X\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n\"End of support\" means that AWS Neuron no longer supports the feature, tool, or API indicated in the note as of this release.\n\n* ENDING SUPPORT ANNOUNCEMENT 1 (link to announcement here)\n* . . .\n\nPrevious releases\n-----------------\n\n* :doc:`Neuron 2.27.0 </release-notes/prev/2.27.0/index>`\n* :doc:`Neuron 2.26.0 </release-notes/prev/2.26.0/index>`\n* :doc:`Neuron 2.25.0 </release-notes/prev/2.25.0/index>`\n* :doc:`Earlier releases </release-notes/prev/rn>`\n\n* :ref:`prev-rn`\n* :ref:`pre-release-content`\n* :ref:`prev-n1-rn`\n"
  },
  {
    "path": "_content-types/release-notes-templates/nki.rst",
    "content": ".. _neuron-2-XX-0-nki:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron Kernel Interface (NKI) component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.25.0: Neuron Kernel Interace (NKI) release notes\n=================================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.25.0:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/nx-jax.rst",
    "content": ".. _neuron-2-XX-0-jax:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK JAX support component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: JAX support release notes\n================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nReleased versions\n-----------------\n* ``0.6.1.1.0.*``\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.25.0:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . .\n"
  },
  {
    "path": "_content-types/release-notes-templates/nx-pytorch.rst",
    "content": ".. _neuron-2-XX-0-pytorch:\n\n.. meta::\n   :description: The official release notes for AWS Neuron SDK PyTorch support, version X.XX.0. Release date: XX/XX/XXXX.\n\nAWS Neuron SDK X.XX.0: PyTorch support release notes\n====================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nReleased versions\n-----------------\n\n* ... \n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\nHere's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . .\n"
  },
  {
    "path": "_content-types/release-notes-templates/nxd-core.rst",
    "content": ".. _neuron-2-XX-0-nxd-core:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Core component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: NxD Core release notes\n=============================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/nxd-inference.rst",
    "content": ".. _neuron-2-XX-0-nxd-inference:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Transformers for Inference component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: NxD Inference release notes\n==================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n* \nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/nxd-training.rst",
    "content": ".. _neuron-2-XX-0-nxd-training:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Training component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.25.0: NxD Training release notes\n=================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.25.0:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/runtime.rst",
    "content": ".. _neuron-2-XX-0-runtime:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Runtime component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: Neuron Runtime release notes\n===================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_content-types/release-notes-templates/tools.rst",
    "content": ".. _neuron-2-XX-0-tools:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Developer Tools component, version X.XX.0. Release date: XX/XX/2026.\n\nAWS Neuron SDK 2.XX.X: Developer Tools release notes\n====================================================\n\n**Date of release**: Month Day, 2026\n\n.. contents:: In this release\n   :local:\n   :depth: 1\n\n* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nFeature 1\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 2\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nFeature 3\n^^^^^^^^^\n\nUSER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.\n* . . .\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* CHANGE DESCRIPTION SENTENCE. NOTE HWHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.\n* . . .\n\nBug fixes\n---------\n\n Here's what we fixed in 2.XX.X:\n\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* SHORT SENTENCE DESCRIBING BUG FIX.\n* . . .\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.\n* . . ."
  },
  {
    "path": "_ext/archive.py",
    "content": "# This file creates a downloadable archive from each directory listed in src_dirs.\n# You can modify or add additional archive_handler functions here to create additional archives.\n\nimport os, tarfile\n\ndef archive_handler(app):\n    old_cwd = os.getcwd()\n    src_dirs = ['src/examples/pytorch', 'src']\n    target_dirs = ['libtorch_demo', 'neuronperf']\n    archive_names = [name + '.tar.gz' for name in target_dirs]\n\n    for src_dir, target_dir, archive_name in zip(src_dirs, target_dirs, archive_names):\n        os.chdir(src_dir)\n\n        try:\n            os.remove(archive_name)\n        except OSError:\n            pass\n\n        with tarfile.open(archive_name, 'w:gz') as tar:\n            tar.add(target_dir)\n\n        os.chdir(old_cwd)\n\ndef setup(app):\n    app.connect('builder-inited', archive_handler)\n\n    return {\n        'version': '1.0',\n        'parallel_read_safe': True,\n        'parallel_write_safe': True,\n    }\n"
  },
  {
    "path": "_ext/df_tables.py",
    "content": "import os\nfrom docutils.parsers.rst import Directive, directives\nfrom docutils.parsers.rst.directives.tables import CSVTable\n\nclass DFTable(CSVTable):\n    CSVTable.option_spec['df-arg'] = directives.unchanged\n    df = None\n\n    def __init__(self, name, arguments, options, content, lineno,\n                 content_offset, block_text, state, state_machine):\n\n        super().__init__(name, arguments, options, content, lineno,\n                 content_offset, block_text, state, state_machine)\n\n    def get_csv_data(self):\n        return self.df.to_csv(index=False).splitlines(), None\n\n    def run(self):\n        source_file_name = self.state_machine.document.attributes[\"source\"]\n        dirname = os.path.abspath(os.path.dirname(source_file_name))\n        os.chdir(dirname)\n\n        code = \"\\n\".join(map(str, self.content))\n        ns = {}\n\n        try:\n            exec(\"\\n\".join( [\"import numpy as np\", \"import pandas as pd\", ]), ns)\n\n            variable_name = \"df\"\n            if self.options.get(\"df-var\"):\n                variable_name = self.options.get(\"df-var\")\n\n            exec(code, ns)\n            self.df = ns[variable_name]\n            \n        except Exception as e:\n            raise self.error(str(e))\n\n        return super().run()\n\n    \ndef setup(app):\n    setup.app = app\n    setup.config = app.config\n    setup.confdir = app.confdir\n    app.add_directive(\"df-table\", DFTable)\n\n    metadata = {\n        \"parallel_read_safe\": True,\n        \"parallel_write_safe\": True,\n        \"version\": 0.1,\n    }\n    return metadata"
  },
  {
    "path": "_ext/local_documenter.py",
    "content": "import os\nimport sys\n\nfrom sphinx.ext.autodoc import ModuleDocumenter, FunctionDocumenter\n\n\nclass LocalModuleDocumenter(ModuleDocumenter):\n    \"\"\"\n    Provides identical functionality to \"automodule\", but allows the module\n    function names to be overridden with the \"module-name\" option.\n\n    This also allows local python files to be documented as if they were\n    imported from an actual package by temporarily adding the directory of the\n    RST file to the python path.\n    \"\"\"\n    option_spec = dict(ModuleDocumenter.option_spec)\n    option_spec['module-name'] = lambda x = None: x\n\n    def import_object(self, *args):\n        \"\"\"Find modules local to the RST document directory\"\"\"\n        local = os.path.join(self.env.app.srcdir, os.path.dirname(self.env.docname))\n        sys.path.append(local)\n        result = super().import_object(*args)\n        sys.path.remove(local)\n        return result\n\n    def get_module_members(self):\n        \"\"\"Add module name override to local files\"\"\"\n        members = super().get_module_members()\n        name = self.options.module_name\n        if name is not None:\n            for member in members.values():\n                if callable(member.object):\n                    setattr(member.object, 'module_name_override', name)\n        return members\n\n\nclass LocalFunctionDocumenter(FunctionDocumenter):\n    def format_name(self) -> str:\n        \"\"\"Apply module name override to local functions\"\"\"\n        # Use overridden module path if it is provided\n        if hasattr(self.object, 'module_name_override'):\n            self.objpath = self.object.module_name_override.split('.') + [self.objpath[-1]]\n        return super().format_name()\n\n\ndef setup(app):\n    app.add_autodocumenter(LocalFunctionDocumenter)\n    app.add_autodocumenter(LocalModuleDocumenter)\n"
  },
  {
    "path": "_ext/neuron_tag.py",
    "content": "import os\n\nfrom docutils import nodes\nfrom docutils.statemachine import ViewList\n\nfrom sphinx.util.docutils import SphinxDirective\nfrom sphinx.util.nodes import nested_parse_with_titles\n\n\n# =============================================================================\n# Legacy add/clear lists (used only for files NOT handled by explicit overrides)\n# =============================================================================\n\n# These lists use substring matching via in_list(). They apply ONLY when no\n# explicit_override was set. As more paths get explicit overrides, entries\n# here become dead code. Kept for backward compatibility with paths not yet\n# explicitly overridden.\n\nadd_inf1_tag = [\n    'about-neuron/arch',\n    'archive/mxnet-neuron',\n    'about-neuron/announcements/index',\n    'archive/tensorflow/tensorflow-neuron/',\n]\n\nadd_trn1_tag = [\n    'frameworks/neuron-customops/',\n    'neuron-customops/',\n    'frameworks/torch/inference-torch-neuronx',\n    'libraries/nemo-megatron/',\n    'libraries/nxd-training/',\n]\n\nadd_trn2_tag = [\n    'libraries/nxd-training/',\n    'about-neuron/models/',\n]\n\nadd_trn3_tag = [\n    'about-neuron/arch/neuron-hardware/neuron-core-v4',\n    'about-neuron/arch/neuron-hardware/trn3-arch',\n]\n\nadd_neuronx_tag = [\n    'frameworks/torch/torch-neuronx/',\n    'archive/tensorflow/tensorflow-neuronx/',\n    'frameworks/torch/inference-torch-neuronx/',\n    'libraries/neuronx-distributed/',\n    'libraries/nxd-training',\n    'setup/tensorflow-neuronx',\n]\n\nclear_inf1_tag = [\n    'about-neuron/arch/neuron-features/neuron-caching',\n    'about-neuron/arch/neuron-features/eager-debug-mode',\n    'about-neuron/arch/neuron-features/collective-communication-operations',\n    'about-neuron/arch/neuron-features/dynamic-shapes',\n    'about-neuron/arch/neuron-features/control-flow',\n    'about-neuron/arch/neuron-features/custom-c++-operators',\n    'about-neuron/arch/neuron-features/collective-communication',\n    'about-neuron/arch/neuron-features/rounding-modes',\n    'about-neuron/arch/neuron-hardware/trn1-arch',\n    'about-neuron/arch/neuron-hardware/inf2-arch',\n    'about-neuron/arch/neuron-hardware/inferentia2',\n    'about-neuron/arch/neuron-hardware/trainium',\n    'about-neuron/arch/neuron-hardware/neuron-core-v2',\n    'about-neuron/arch/neuron-hardware/trn2-arch',\n    'about-neuron/arch/neuron-hardware/trn3-arch',\n    'about-neuron/arch/neuron-hardware/neuron-core-v3',\n    'about-neuron/arch/neuron-hardware/neuron-core-v4',\n    'about-neuron/benchmarks/trn1-performance',\n    'about-neuron/benchmarks/trn1/',\n    'about-neuron/benchmarks/inf2/inf2-performance',\n    'about-neuron/faq/training/',\n    'about-neuron/models/inference-inf2-trn1-samples',\n    'about-neuron/models/training-trn1-samples',\n    'about-neuron/models/training-inference-trn2-samples',\n    'about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision',\n    'about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron',\n    'about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note',\n    'about-neuron/calculator/neuron-calculator',\n    'about-neuron/announcements/neuron2.x/dlami-pytorch-introduce',\n    'about-neuron/announcements/neuron2.x/sm-training-trn1-introduce',\n    'about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1',\n    'devflows/training',\n    'devflows/inference/byoc-hosting-devflow-inf2',\n    'compiler/neuronx-cc/',\n    
'about-neuron/appnotes/perf/neuronx-cc/',\n    'frameworks/torch/torch-neuronx/',\n    'frameworks/torch/training',\n    'frameworks/torch/inference-torch-neuronx',\n    'archive/tensorflow/tensorflow-neuronx/',\n    'archive/tensorflow/tensorflow-neuronx-inference',\n    'frameworks/torch/torch-neuronx/transformers-neuronx/readme',\n    'release-notes/neuron-cc/index',\n    'release-notes/runtime/aws-neuronx-collectives/',\n    'release-notes/torch/torch-neuronx/',\n    'release-notes/torch/transformers-neuronx/index',\n    'release-notes/tensorflow/tensorflow-neuronx/',\n    'release-notes/compiler/neuronx-cc/',\n    'tools/tutorials/tutorial-tensorboard-scalars-mnist',\n    'tools/tutorials/tutorial-neuron-monitor-mnist',\n    'tools/tensorboard/getting-started-tensorboard-neuronx-plugin',\n    'tools/neuron-sys-tools/nccom-test',\n    'setup/torch-neuronx',\n    'setup/tensorflow-neuronx',\n    'setup/neuron-setup/tensorflow/neuronx/',\n    'setup/neuron-setup/pytorch/neuronx/',\n    'nki/',\n    'frameworks/jax/',\n    'libraries/nxd-training/',\n    '/release-notes/components/nki',\n    '/release-notes/components/nki-lib',\n    '/release-notes/components/compiler'\n]\n\nclear_inf2_tag = [\n    'frameworks/torch/torch-neuronx/training',\n    'frameworks/torch/training',\n    'archive/torch-neuron/inference-torch-neuron',\n    'archive/tensorflow/tensorflow-neuron-inference',\n    'frameworks/jax/',\n    'about-neuron/arch/neuron-hardware/trn1-arch',\n    'about-neuron/arch/neuron-hardware/trainium',\n    'about-neuron/arch/neuron-hardware/trn2-arch',\n    'about-neuron/arch/neuron-hardware/trn3-arch',\n    'about-neuron/arch/neuron-hardware/neuron-core-v3',\n    'about-neuron/arch/neuron-hardware/neuron-core-v4',\n    'about-neuron/arch/neuron-features/logical-neuroncore-config',\n    'about-neuron/benchmarks/trn1/trn1-inference-performance',\n    'about-neuron/benchmarks/trn1/trn1-training-performance',\n    'about-neuron/models/training-trn1-samples',\n    'about-neuron/models/training-inference-trn2-samples',\n    'about-neuron/announcements/neuron2.x/announce-neuron-trn2',\n    'neuronx-distributed/nxd-training',\n    'libraries/nxd-training/',\n    'tools/neuron-sys-tools/nccom-test',\n    'release-notes/runtime/aws-neuronx-collectives/',\n]\n\nclear_trn1_tag = [\n    'about-neuron/arch/neuron-hardware/inf2-arch',\n    'about-neuron/arch/neuron-hardware/inferentia2',\n    'about-neuron/arch/neuron-hardware/trn2-arch',\n    'about-neuron/arch/neuron-hardware/trn3-arch',\n    'about-neuron/arch/neuron-hardware/trainium2',\n    'about-neuron/arch/neuron-hardware/neuron-core-v3',\n    'about-neuron/arch/neuron-hardware/neuron-core-v4',\n    'about-neuron/benchmarks/inf2/inf2-performance',\n    'about-neuron/models/training-inference-trn2-samples',\n]\n\nclear_trn2_tag = [\n    'archive/tensorflow/',\n    'libraries/transformers-neuronx/',\n    'about-neuron/arch/neuron-hardware/trn1-arch',\n    'about-neuron/arch/neuron-hardware/trainium',\n    'about-neuron/arch/neuron-hardware/neuron-core-v2',\n    'about-neuron/arch/neuron-hardware/neuron-core-v4',\n    'about-neuron/arch/neuron-hardware/trn3-arch',\n    'about-neuron/benchmarks/',\n    'about-neuron/benchmarks/trn1/',\n    'about-neuron/benchmarks/inf2/inf2-performance',\n    'about-neuron/models/inference-inf2-trn1-samples',\n    'about-neuron/models/training-trn1-samples',\n    'neuron-customops/programming-guide/custom-c++-operators-devguide'\n]\n\nclear_trn3_tag = [\n    'archive/tensorflow/',\n    
'libraries/transformers-neuronx/',\n    'about-neuron/arch/neuron-hardware/trn1-arch',\n    'about-neuron/arch/neuron-hardware/trainium',\n    'about-neuron/arch/neuron-hardware/neuron-core-v2',\n    'about-neuron/arch/neuron-hardware/neuron-core-v3',\n    'about-neuron/benchmarks/',\n    'about-neuron/benchmarks/trn1/',\n    'about-neuron/benchmarks/inf2/inf2-performance',\n    'about-neuron/models/inference-inf2-trn1-samples',\n    'about-neuron/models/training-trn1-samples',\n    'libraries/neuronx-distributed/context_parallelism_overview',\n    'about-neuron/appnotes/',\n    'neuron-customops/programming-guide/custom-c++-operators-devguide'\n]\n\n# Neuron 1.x / NeuronCore v1 era content — clear all non-Inf1 tags\nclear_nc_v2_tag = [\n    'tools/tutorials/tutorial-neuron-check-model',\n    'tools/tutorials/tutorial-neuron-gatherinfo',\n    'tools/tutorials/getting-started-tensorboard-neuron-plugin',\n    'tools/tensorboard/getting-started-tensorboard-neuron-plugin',\n    'tools/helper-tools/tutorial-neuron-check-model',\n    'tools/helper-tools/tutorial-neuron-gatherinfo',\n    'about-neuron/appnotes/neuron-cc/mixed-precision',\n    'about-neuron/appnotes/perf/neuron-cc/',\n    'about-neuron/appnotes/neuron1x/',\n    'about-neuron/appnotes/torch-neuron/',\n    'about-neuron/arch/neuron-hardware/inf1-arch',\n    'about-neuron/arch/neuron-hardware/inferentia',\n    'about-neuron/arch/neuron-hardware/neuron-core-v1',\n    'about-neuron/arch/neuron-features/neuroncore-pipeline',\n    'about-neuron/announcements/neuron1.x/',\n    'about-neuron/quick-start/mxnet-neuron',\n    'about-neuron/benchmarks/inf1/',\n    'about-neuron/faq/inference/',\n    'about-neuron/models/inference-inf1-samples',\n    'containers/dlc-then-ec2-devflow',\n    'containers/dlc-then-ecs-devflow',\n    'containers/dlc-then-eks-devflow',\n    'containers/container-sm-hosting-devflow',\n    'containers/rn',\n    'containers/tutorials/k8s-neuron-scheduler',\n    'compiler/neuron-cc/',\n    'release-notes/mxnet-neuron/',\n    'release-notes/torch/torch-neuron/',\n    'release-notes/tensorflow/tensorflow-neuron/',\n    'release-notes/compiler/neuron-cc/',\n    'release-notes/neuron1/',\n    'archive/torch-neuron/',\n    'archive/torch-neuron/inference-torch-neuron',\n    'archive/tensorflow/tensorflow-neuron/',\n    'archive/tensorflow/tensorflow-neuron-inference',\n    'archive/mxnet-neuron/',\n    'setup/tensorflow-neuron',\n    'setup/torch-neuron',\n    'setup/mxnet-neuron',\n    'setup/neuron-setup/pytorch/neuron/',\n    'setup/neuron-setup/mxnet/neuron/ubuntu/',\n    'setup/neuron-setup/mxnet/neuron/amazon-linux/',\n    'setup/neuron-setup/tensorflow/neuron/ubuntu/',\n    'setup/neuron-setup/tensorflow/neuron/amazon-linux/',\n]\n\n# Top-level directories used for initial tag assignment\nNEURON1_DIRS = ['n1']\nCOMMON_DIRS = [\n    'tools', 'neuron-runtime', 'release-notes', 'containers', 'compiler',\n    'frameworks', 'src', 'about-neuron', 'setup', 'devflows', 'dlami', 'libraries',\n]\n\nTEXT_TEMPLATE = '**This document is relevant for**: '\n\n\n# =============================================================================\n# Hardware architecture page map (exact docname → instance list)\n# =============================================================================\n\nHW_ARCH_MAP = {\n    'about-neuron/arch/neuron-hardware/inf1-arch': ['Inf1'],\n    'about-neuron/arch/neuron-hardware/inf2-arch': ['Inf2'],\n    'about-neuron/arch/neuron-hardware/inferentia': ['Inf1'],\n    
'about-neuron/arch/neuron-hardware/inferentia2': ['Inf2'],\n    'about-neuron/arch/neuron-hardware/neuron-core-v1': ['Inf1'],\n    'about-neuron/arch/neuron-hardware/neuron-core-v2': ['Inf2', 'Trn1'],\n    'about-neuron/arch/neuron-hardware/neuron-core-v3': ['Trn2'],\n    'about-neuron/arch/neuron-hardware/neuron-core-v4': ['Trn3'],\n    'about-neuron/arch/neuron-hardware/trainium': ['Trn1'],\n    'about-neuron/arch/neuron-hardware/trainium2': ['Trn2'],\n    'about-neuron/arch/neuron-hardware/trainium3': ['Trn3'],\n    'about-neuron/arch/neuron-hardware/trn1-arch': ['Trn1'],\n    'about-neuron/arch/neuron-hardware/trn2-arch': ['Trn2'],\n    'about-neuron/arch/neuron-hardware/trn3-arch': ['Trn3'],\n}\n\n# NxD Core training-specific pages (no Inf2)\nNXD_CORE_TRAINING_PAGES = [\n    'libraries/neuronx-distributed/index-training',\n    'libraries/neuronx-distributed/developer-guide-training',\n    'libraries/neuronx-distributed/api-reference-guide-training',\n    'libraries/neuronx-distributed/tp_developer_guide',\n    'libraries/neuronx-distributed/pp_developer_guide',\n    'libraries/neuronx-distributed/ptl_developer_guide',\n    'libraries/neuronx-distributed/save_load_developer_guide',\n    'libraries/neuronx-distributed/activation_memory_reduction',\n    'libraries/neuronx-distributed/activation_memory_reduction_developer_guide',\n    'libraries/neuronx-distributed/standard_mixed_precision',\n    'libraries/neuronx-distributed/tensor_parallelism_overview',\n    'libraries/neuronx-distributed/pipeline_parallelism_overview',\n    'libraries/neuronx-distributed/lora_finetune_developer_guide',\n    'libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide',\n    'libraries/neuronx-distributed/context_parallelism_overview',\n]\n\n\ndef _in_list(cur_file, file_list):\n    \"\"\"Return True if any entry in file_list is a substring of cur_file.\"\"\"\n    return any(entry in cur_file for entry in file_list)\n\n\ndef _splitall(path):\n    \"\"\"Split a path into all its components.\"\"\"\n    parts = []\n    while True:\n        head, tail = os.path.split(path)\n        if head == path:\n            parts.insert(0, head)\n            break\n        elif tail == path:\n            parts.insert(0, tail)\n            break\n        else:\n            path = head\n            parts.insert(0, tail)\n    return parts, len(parts)\n\n\ndef _get_explicit_override(cur_file):\n    \"\"\"Return (instances, True) if cur_file has an explicit CSV-based override,\n    or (None, False) otherwise.\n\n    Rules are evaluated top-to-bottom. 
More specific paths must come BEFORE\n    broader paths so they take precedence (first match wins).\n    \"\"\"\n\n    # --- Libraries -----------------------------------------------------------\n\n    # NxD Core = Inf2, Trn1, Trn2 (default for all neuronx-distributed pages)\n    if cur_file.startswith('libraries/neuronx-distributed/'):\n        result = ['Inf2', 'Trn1', 'Trn2']\n        # Training-specific pages drop Inf2\n        if cur_file in NXD_CORE_TRAINING_PAGES:\n            result = ['Trn1', 'Trn2']\n        if cur_file.startswith('libraries/neuronx-distributed/tutorials/training') or \\\n           cur_file.startswith('libraries/neuronx-distributed/tutorials/finetune'):\n            result = ['Trn1', 'Trn2']\n        return result, True\n\n    if cur_file.startswith('libraries/transformers-neuronx/'):\n        return ['Inf2', 'Trn1'], True\n\n    if cur_file.startswith('libraries/nxd-training/'):\n        return ['Trn1', 'Trn2'], True\n\n    # vLLM must come before general nxd-inference\n    if cur_file.startswith('libraries/nxd-inference/vllm/'):\n        return ['Trn2', 'Trn3'], True\n\n    if cur_file.startswith('libraries/nxd-inference/'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    if cur_file.startswith('libraries/nemo-megatron/'):\n        return ['Trn1', 'Trn2'], True\n\n    # --- NKI -----------------------------------------------------------------\n\n    if cur_file.startswith('nki/'):\n        return ['Trn2', 'Trn3'], True\n\n    # --- CustomOps -----------------------------------------------------------\n\n    if cur_file.startswith('neuron-customops/'):\n        return ['Inf2', 'Trn1'], True\n\n    # --- Frameworks ----------------------------------------------------------\n\n    if cur_file.startswith('frameworks/jax/'):\n        return ['Trn2', 'Trn3'], True\n\n    # TensorFlow NeuronX (must come before TensorFlow Neuron check)\n    if 'tensorflow/tensorflow-neuronx' in cur_file:\n        return ['Inf2', 'Trn1'], True\n\n    # TensorFlow Neuron (Inf1)\n    if 'tensorflow/tensorflow-neuron' in cur_file and 'neuronx' not in cur_file:\n        return ['Inf1'], True\n\n    # TorchNeuron native PyTorch (must come before torch-neuronx check)\n    if 'torch/pytorch-native' in cur_file:\n        return ['Trn2', 'Trn3'], True\n\n    # PyTorch NeuronX (Torch/XLA)\n    if 'torch/torch-neuronx' in cur_file:\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    # PyTorch NeuronX top-level pages (not in torch-neuronx/ subdir)\n    if cur_file in ['frameworks/torch/inference-torch-neuronx',\n                     'frameworks/torch/training-torch-neuronx',\n                     'frameworks/torch/training',\n                     'frameworks/torch/inference']:\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    # PyTorch Neuron (Inf1)\n    if 'torch/torch-neuron' in cur_file and 'neuronx' not in cur_file:\n        return ['Inf1'], True\n\n    if cur_file == 'archive/torch-neuron/inference-torch-neuron':\n        return ['Inf1'], True\n\n    # MXNet\n    if 'mxnet-neuron' in cur_file:\n        return ['Inf1'], True\n\n    # --- Neuron Runtime ------------------------------------------------------\n\n    # Collectives (more specific, must come before general runtime)\n    if cur_file.startswith('neuron-runtime/about/collectives') or \\\n       cur_file in ['neuron-runtime/explore/internode-collective-comm',\n                     'neuron-runtime/explore/intranode-collective-comm',\n                     'neuron-runtime/explore/compute-comm-overlap']:\n        return ['Trn1', 
'Trn2', 'Trn3'], True\n\n    if cur_file.startswith('neuron-runtime/'):\n        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True\n\n    # --- Compiler ------------------------------------------------------------\n\n    if cur_file.startswith('compiler/error-codes/'):\n        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True\n\n    if cur_file == 'compiler/neuron-cc' or cur_file.startswith('compiler/neuron-cc/'):\n        return ['Inf1'], True\n\n    if cur_file == 'compiler/neuronx-cc' or cur_file.startswith('compiler/neuronx-cc/'):\n        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True\n\n    if cur_file == 'neuron-customops/programming-guide' or cur_file.startswith('neuron-customops/programming-guide'):\n        return ['Inf2', 'Trn1'], True\n\n    # --- Setup ---------------------------------------------------------------\n\n    if cur_file.startswith('setup/install-templates/inf1/'):\n        return ['Inf1'], True\n    if cur_file.startswith('setup/install-templates/inf2/'):\n        return ['Inf2'], True\n    if cur_file.startswith('setup/install-templates/trn1/') or \\\n       cur_file == 'setup/install-templates/launch-trn1-dlami':\n        return ['Trn1'], True\n\n    if cur_file in ['setup/setup-neuron', 'setup/torch-neuron', 'setup/torch-neuron-ubuntu20']:\n        return ['Inf1'], True\n\n    if cur_file.startswith('setup/neuron-setup/pytorch/neuronx/'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n    if cur_file.startswith('setup/neuron-setup/tensorflow/neuronx/'):\n        return ['Inf2', 'Trn1'], True\n    if cur_file.startswith('setup/neuron-setup/pytorch/neuron/'):\n        return ['Inf1'], True\n    if cur_file.startswith('setup/neuron-setup/tensorflow/neuron/'):\n        return ['Inf1'], True\n\n    if cur_file == 'setup/jax-neuronx':\n        return ['Trn2', 'Trn3'], True\n    if cur_file == 'setup/torch-neuronx':\n        return ['Inf2', 'Trn1', 'Trn2'], True\n    if cur_file == 'setup/tensorflow-neuronx':\n        return ['Inf2', 'Trn1'], True\n    if cur_file == 'setup/tensorflow-neuron':\n        return ['Inf1'], True\n\n    return None, False\n\n\ndef _get_page_override(cur_file):\n    \"\"\"Return (instances, True) for page-specific overrides that don't fit\n    neatly into _get_explicit_override (devflows, containers, tools, about-neuron, etc.).\n    \"\"\"\n\n    # --- Devflows ------------------------------------------------------------\n\n    if cur_file == 'devflows/inference/byoc-hosting-devflow-inf2':\n        return ['Inf2'], True\n    if cur_file == 'devflows/inference/ec2-then-ec2-devflow-inf2':\n        return ['Inf2'], True\n    if cur_file == 'devflows/parallelcluster-flows':\n        return ['Trn1', 'Trn2'], True\n\n    if cur_file.startswith('devflows/training/batch/') or \\\n       cur_file.startswith('devflows/training/ec2/') or \\\n       cur_file.startswith('devflows/training/parallelcluster/') or \\\n       cur_file.startswith('devflows/training/sm-devflow/'):\n        return ['Trn1', 'Trn2', 'Trn3'], True\n\n    if cur_file.startswith('devflows/plugins/npd'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    # --- Containers ----------------------------------------------------------\n\n    # OCI Hooks\n    if 'tutorial-oci-hook' in cur_file:\n        return ['Inf1', 'Inf2', 'Trn1', 'Trn2'], True\n\n    # DRA\n    if cur_file == 'containers/neuron-dra' or cur_file.startswith('containers/files/'):\n        return ['Trn2', 'Trn3'], True\n\n    if cur_file == 'containers/how-to/how-to-ultraserver':\n        return ['Trn2', 'Trn3'], True\n\n    # 
DLC quickstarts\n    if cur_file == 'containers/get-started/quickstart-configure-deploy-dlc':\n        return ['Trn2', 'Trn3'], True\n    if cur_file == 'containers/get-started/quickstart-pytorch-inference-dlc':\n        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True\n\n    # Inf1-era container content\n    if cur_file == 'containers/tutorial-docker-runtime1.0':\n        return ['Inf1'], True\n    if cur_file == 'containers/container-deployment-flows' or \\\n       cur_file.startswith('containers/docker-example/inference/') or \\\n       cur_file.startswith('containers/docker-example/v1/') or \\\n       cur_file == 'containers/ec2-then-ec2-devflow' or \\\n       cur_file == 'containers/neo-then-hosting-devflow':\n        return ['Inf1'], True\n\n    # Container training/inference tutorials and docker examples\n    if cur_file.startswith('containers/docker-example/training/'):\n        return ['Trn1', 'Trn2', 'Trn3'], True\n    if cur_file.startswith('containers/tutorials/inference/'):\n        return ['Inf1'], True\n    if cur_file.startswith('containers/tutorials/training/'):\n        return ['Trn1', 'Trn2', 'Trn3'], True\n\n    # Neuron Monitor Container\n    if cur_file == 'containers/tutorials/k8s-neuron-monitor':\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    # Node Problem Detector\n    if cur_file.startswith('containers/tutorials/k8s-neuron-problem-detector'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n\n    # --- Tools ---------------------------------------------------------------\n\n    # TensorBoard plugin (End Of Support)\n    if cur_file.startswith('tools/tensorboard/getting-started-tensorboard-neuronx') or \\\n       cur_file == 'tools/tutorials/tutorial-tensorboard-scalars-mnist' or \\\n       cur_file == 'tools/tutorials/torch-neuronx-profiling-with-tb':\n        return ['Inf2', 'Trn1'], True\n\n    # --- Announcements -------------------------------------------------------\n\n    if cur_file.startswith('about-neuron/announcements/'):\n        return [], True\n\n    # --- Hardware architecture -----------------------------------------------\n\n    if cur_file in HW_ARCH_MAP:\n        return HW_ARCH_MAP[cur_file], True\n\n    # --- Arch features -------------------------------------------------------\n\n    if cur_file == 'about-neuron/arch/neuron-features/custom-c++-operators':\n        return ['Inf2', 'Trn1'], True\n    if cur_file == 'about-neuron/arch/neuron-features/logical-neuroncore-config':\n        return ['Trn2', 'Trn3'], True\n\n    # --- Appnotes ------------------------------------------------------------\n\n    if cur_file == 'about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference':\n        return ['Inf2', 'Trn1', 'Trn2'], True\n    if cur_file == 'about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training':\n        return ['Trn1', 'Trn2'], True\n    if cur_file.startswith('about-neuron/appnotes/torch-neuronx/'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n    if cur_file.startswith('about-neuron/appnotes/transformers-neuronx/'):\n        return ['Inf2', 'Trn1'], True\n    if cur_file == 'about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision':\n        return ['Trn1', 'Trn2', 'Trn3'], True\n    if cur_file.startswith('about-neuron/appnotes/neuron1x/'):\n        return ['Inf1'], True\n\n    # --- Benchmarks ----------------------------------------------------------\n\n    if cur_file == 'about-neuron/benchmarks/index':\n        return ['Inf1', 'Inf2', 'Trn1', 'Trn2', 'Trn3'], True\n\n    # --- Quick-start 
---------------------------------------------------------\n\n    if cur_file == 'about-neuron/quick-start/tensorflow-neuron':\n        return ['Inf1'], True\n    if cur_file in ['about-neuron/quick-start/torch-neuron',\n                     'about-neuron/quick-start/torch-neuron-tab-training']:\n        return ['Inf1'], True\n\n    if cur_file.startswith('about-neuron/quick-start/tab-inference-torch-neuronx'):\n        return ['Inf2', 'Trn1', 'Trn2'], True\n    if cur_file.startswith('about-neuron/quick-start/tab-inference-torch-neuron') and 'neuronx' not in cur_file:\n        return ['Inf1'], True\n    if cur_file.startswith('about-neuron/quick-start/tab-inference-tensorflow-neuronx'):\n        return ['Inf2', 'Trn1'], True\n    if cur_file.startswith('about-neuron/quick-start/tab-inference-tensorflow-neuron') and 'neuronx' not in cur_file:\n        return ['Inf1'], True\n\n    return None, False\n\n\nclass NeuronTag(SphinxDirective):\n\n    def run(self):\n        cur_file = self.env.docname\n        path_split, path_len = _splitall(cur_file)\n\n        # Landing page gets no tag\n        if path_split[0] == 'index':\n            return self._render('')\n\n        # Step 1: Assign default instances based on top-level directory\n        return_instances = []\n        if path_split[0] in NEURON1_DIRS:\n            return_instances = ['Inf1']\n        elif path_split[0] in COMMON_DIRS:\n            return_instances = ['Inf1', 'Inf2', 'Trn1', 'Trn2', 'Trn3']\n\n        # Step 2: Check explicit overrides (CSV-based, highest priority)\n        explicit_override = False\n\n        result, matched = _get_explicit_override(cur_file)\n        if matched:\n            return_instances = result\n            explicit_override = True\n\n        if not explicit_override:\n            result, matched = _get_page_override(cur_file)\n            if matched:\n                return_instances = result\n                explicit_override = True\n\n        # Step 3: Directory-based inference/training heuristic\n        if not explicit_override:\n            if path_len >= 2:\n                parent_dir = path_split[path_len - 2]\n                if parent_dir == 'inference':\n                    return_instances = ['Inf1']\n                elif parent_dir == 'training':\n                    return_instances = ['Trn1', 'Trn2', 'Trn3']\n\n        # Step 4: Legacy add/clear tag lists (only for non-overridden files)\n        if not explicit_override:\n            if _in_list(cur_file, add_trn1_tag):\n                if 'Trn1' not in return_instances:\n                    return_instances.extend(['Trn1', 'Trn2', 'Trn3', 'Inf2'])\n\n            if _in_list(cur_file, add_trn2_tag):\n                if 'Trn2' not in return_instances:\n                    return_instances.extend(['Trn2', 'Trn3'])\n\n            if _in_list(cur_file, add_trn3_tag):\n                if 'Trn3' not in return_instances:\n                    return_instances.append('Trn3')\n\n            if _in_list(cur_file, add_neuronx_tag):\n                if 'Trn1' not in return_instances:\n                    return_instances.extend(['Trn1', 'Trn2', 'Trn3', 'Inf2'])\n\n            if _in_list(cur_file, add_inf1_tag):\n                if 'Inf1' not in return_instances:\n                    return_instances.append('Inf1')\n\n            if _in_list(cur_file, clear_nc_v2_tag):\n                for tag in ['Trn1', 'Trn2', 'Trn3', 'Inf2']:\n                    if tag in return_instances:\n                        return_instances.remove(tag)\n\n            
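# The clear_* lists are applied after the add_* lists above, so a clear\n            # always wins when a page appears in both an add list and a clear list.\n            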
if _in_list(cur_file, clear_trn1_tag):\n                if 'Trn1' in return_instances:\n                    return_instances.remove('Trn1')\n\n            if _in_list(cur_file, clear_trn2_tag):\n                if 'Trn2' in return_instances:\n                    return_instances.remove('Trn2')\n\n            if _in_list(cur_file, clear_trn3_tag):\n                if 'Trn3' in return_instances:\n                    return_instances.remove('Trn3')\n\n            if _in_list(cur_file, clear_inf1_tag):\n                if 'Inf1' in return_instances:\n                    return_instances.remove('Inf1')\n\n            if _in_list(cur_file, clear_inf2_tag):\n                if 'Inf2' in return_instances:\n                    return_instances.remove('Inf2')\n\n        # Step 5: Generate output\n        return_instances = sorted(set(return_instances))\n        if return_instances:\n            text = TEXT_TEMPLATE + ', '.join('``' + i + '``' for i in return_instances)\n        else:\n            text = ''\n\n        return self._render(text)\n\n    def _render(self, text):\n        \"\"\"Parse RST text and return docutils nodes.\"\"\"\n        rst = ViewList()\n        rst.append(text, \"neuron-tag\", 1)\n        node = nodes.section()\n        node.document = self.state.document\n        nested_parse_with_titles(self.state, rst, node)\n        return node.children\n\n\ndef setup(app):\n    app.add_directive(\"neuron-tag\", NeuronTag)\n    return {\n        'version': '0.2',\n        'parallel_read_safe': True,\n        'parallel_write_safe': True,\n    }\n"
  },
  {
    "path": "_ext/release-notes-automation-spec.md",
    "content": "# Release Notes Review Automation Specification\n\n## Overview\n\nThis specification defines a GitHub Action that automatically reviews release notes files in pull requests using Amazon Q CLI to ensure they meet quality standards defined in the release notes writing guidelines.\n\n## Purpose\n\nAutomate the review of release notes changes to:\n- Ensure consistency and quality across all release notes\n- Catch common issues before human review\n- Provide immediate feedback to PR authors\n- Reduce manual review burden on documentation team\n\n## Scope\n\n### In Scope\n- PRs labeled with \"release-notes\"\n- RST files under `/release-notes/components/` directory\n- Files that have been modified in the PR (not just added to context)\n- Automated review using Q CLI with release notes guidelines\n- Posting review feedback as PR comments\n\n### Out of Scope\n- Release notes files outside `/release-notes/components/`\n- Non-RST files\n- PRs without the \"release-notes\" label\n- Manual approval/rejection of PRs (action only provides feedback)\n\n## Requirements\n\n### Functional Requirements\n\n#### FR1: PR Detection and Filtering\n- **FR1.1**: Action triggers on pull request events (opened, synchronize, labeled)\n- **FR1.2**: Action only runs when PR has \"release-notes\" label\n- **FR1.3**: Action identifies all changed RST files in `/release-notes/components/` directory\n\n#### FR2: File Analysis\n- **FR2.1**: Action reads content of each changed RST file\n- **FR2.2**: Action loads release notes guidelines from `_ext/release-notes-context.md`\n- **FR2.3**: Action processes files individually to provide file-specific feedback\n\n#### FR3: Q CLI Integration\n- **FR3.1**: Action invokes Amazon Q CLI with appropriate context\n- **FR3.2**: Action provides Q CLI with:\n  - Release notes guidelines from `_ext/release-notes-context.md`\n  - Content of the changed RST file\n  - Instruction to review against guidelines\n- **FR3.3**: Action captures Q CLI output for each file\n\n#### FR4: Review Feedback\n- **FR4.1**: Action formats Q CLI feedback into readable PR comment\n- **FR4.2**: Action posts comment to PR with review results\n- **FR4.3**: Comment includes:\n  - List of files reviewed\n  - Issues found per file (using format from guidelines)\n  - Suggested improvements\n  - Link to full guidelines document\n- **FR4.4**: If no issues found, action posts positive confirmation\n\n#### FR5: Error Handling\n- **FR5.1**: Action handles Q CLI failures gracefully\n- **FR5.2**: Action reports when no RST files are found in scope\n- **FR5.3**: Action logs errors for debugging without failing the PR\n\n### Non-Functional Requirements\n\n#### NFR1: Performance\n- Action completes review within 5 minutes for typical PRs (1-5 files)\n- Action processes files in parallel when possible\n\n#### NFR2: Security\n- Action uses GitHub secrets for Q CLI credentials\n- Action has read-only access to repository\n- Action has write access only to PR comments\n\n#### NFR3: Maintainability\n- Action configuration is version controlled in `.github/workflows/`\n- Action uses official Q CLI container/action when available\n- Action logic is simple and well-documented\n\n## User Stories\n\n### US1: Automatic Review Trigger\n**As a** documentation contributor  \n**I want** the review action to run automatically when I label my PR  \n**So that** I get immediate feedback without manual intervention\n\n**Acceptance Criteria:**\n- Action triggers when \"release-notes\" label is added\n- Action runs on subsequent 
commits to labeled PR\n- Action does not run on PRs without the label\n\n### US2: Targeted File Review\n**As a** documentation contributor  \n**I want** only my changed release notes files to be reviewed  \n**So that** I get relevant feedback without noise from unchanged files\n\n**Acceptance Criteria:**\n- Only files in `/release-notes/components/*.rst` are reviewed\n- Only files modified in the PR are analyzed\n- Files in other directories are ignored\n\n### US3: Clear Feedback\n**As a** documentation contributor  \n**I want** clear, actionable feedback on my release notes  \n**So that** I know exactly what to improve\n\n**Acceptance Criteria:**\n- Feedback follows the format specified in guidelines\n- Each issue includes: original text, problem, example rewrite, action items\n- Feedback is posted as a PR comment\n- Comment includes link to full guidelines\n\n### US4: No False Failures\n**As a** documentation contributor  \n**I want** the action to provide feedback without blocking my PR  \n**So that** I can address issues without being blocked by automation\n\n**Acceptance Criteria:**\n- Action never fails the PR check\n- Action always succeeds even if issues are found\n- Issues are reported as comments, not check failures\n\n## Technical Design\n\n### GitHub Action Workflow\n\n**File Location:** `.github/workflows/release-notes-review.yml`\n\n**Trigger Events:**\n```yaml\non:\n  pull_request:\n    types: [opened, synchronize, labeled]\n    paths:\n      - 'release-notes/components/**/*.rst'\n```\n\n**Workflow Steps:**\n\n1. **Check Label**\n   - Verify PR has \"release-notes\" label\n   - Exit gracefully if label not present\n\n2. **Get Changed Files**\n   - Use GitHub API to get list of changed files\n   - Filter for `release-notes/components/**/*.rst`\n   - Exit if no matching files found\n\n3. **Setup Q CLI**\n   - Install/configure Amazon Q CLI\n   - Authenticate using GitHub secrets\n\n4. **Load Guidelines**\n   - Read `_ext/release-notes-context.md`\n   - Prepare as context for Q CLI\n\n5. **Review Each File**\n   - For each changed RST file:\n     - Read file content\n     - Invoke Q CLI with prompt:\n       ```\n       Review the following release notes file against the guidelines provided.\n       \n       Guidelines: [content from release-notes-context.md]\n       \n       File: [filename]\n       Content: [file content]\n       \n       Provide feedback using the review format specified in the guidelines.\n       Focus on: customer visibility, documentation links, impact clarity, \n       specific conditions, and actionable information.\n       ```\n     - Capture Q CLI response\n\n6. **Format Feedback**\n   - Combine all file reviews into single comment\n   - Format as markdown with sections per file\n   - Include summary at top\n\n7. **Post Comment**\n   - Post formatted feedback as PR comment\n   - Include link to guidelines\n   - Tag PR author\n\n### Q CLI Prompt Template\n\n```markdown\nYou are reviewing release notes for the AWS Neuron SDK. Review the following \nfile against the release notes writing guidelines.\n\nGUIDELINES:\n[Full content of _ext/release-notes-context.md]\n\nFILE TO REVIEW: {filename}\n\nCONTENT:\n{file_content}\n\nINSTRUCTIONS:\n1. Review the content against all guidelines\n2. Identify issues using the review format from the guidelines\n3. For each issue, provide:\n   - Issue number and title\n   - Original text\n   - Problem description\n   - Phrasing problem (if applicable)\n   - Example rewrite\n   - Specific action items\n4. 
If no issues found, state \"No issues found - release notes meet guidelines\"\n\nFocus especially on:\n- Customer-visible language (no internal code names)\n- Documentation URLs for all new features\n- Specific conditions (not vague language)\n- Clear impact statements\n- Proper categorization (breaking changes vs bug fixes)\n- Migration guidance for breaking changes\n```\n\n### Comment Format Template\n\n```markdown\n## 🤖 Release Notes Review\n\nThis PR modifies {count} release notes file(s). Here's the automated review:\n\n### Files Reviewed\n- ✅ `release-notes/components/file1.rst` - {issue_count} issue(s)\n- ✅ `release-notes/components/file2.rst` - No issues found\n\n---\n\n### 📝 Review Feedback\n\n#### File: `release-notes/components/file1.rst`\n\n[Q CLI feedback for file1]\n\n---\n\n#### File: `release-notes/components/file2.rst`\n\n[Q CLI feedback for file2]\n\n---\n\n### 📚 Resources\n\n- [Release Notes Writing Guidelines](_ext/release-notes-context.md)\n- Need help? Tag @documentation-team\n\n---\n\n*This is an automated review. Please address the feedback and request human \nreview when ready.*\n```\n\n## Implementation Notes\n\n### GitHub Action Configuration\n\n**Required Secrets:**\n- `Q_CLI_TOKEN` or equivalent for Q CLI authentication\n\n**Required Permissions:**\n```yaml\npermissions:\n  contents: read\n  pull-requests: write\n```\n\n**Environment:**\n- Ubuntu latest runner\n- Node.js 18+ (if using JavaScript action)\n- Python 3.9+ (if using Python script)\n\n### Q CLI Integration Options\n\n**Option 1: Direct CLI Invocation**\n```bash\nq chat --prompt-file prompt.txt --context-file guidelines.md\n```\n\n**Option 2: Q CLI GitHub Action** (if available)\n```yaml\n- uses: aws/q-cli-action@v1\n  with:\n    prompt: ${{ steps.prepare.outputs.prompt }}\n    context: ${{ steps.prepare.outputs.context }}\n```\n\n**Option 3: API Integration** (if Q provides API)\n```python\nimport q_cli\nresponse = q_cli.chat(prompt=prompt, context=guidelines)\n```\n\n## Testing Strategy\n\n### Unit Tests\n- Test file filtering logic\n- Test prompt generation\n- Test comment formatting\n\n### Integration Tests\n- Test with sample PR containing valid release notes\n- Test with sample PR containing issues\n- Test with PR without \"release-notes\" label\n- Test with PR modifying non-component files\n\n### Manual Testing\n- Create test PR with intentional issues\n- Verify action triggers correctly\n- Verify feedback is accurate and helpful\n- Verify comment formatting is readable\n\n## Success Criteria\n\n1. **Automation Works**: Action runs on 100% of labeled PRs\n2. **Accurate Detection**: Action correctly identifies changed RST files\n3. **Useful Feedback**: 80%+ of PR authors find feedback helpful\n4. **No False Blocks**: Action never blocks valid PRs\n5. **Performance**: Action completes within 5 minutes\n6. 
**Reliability**: Action succeeds 95%+ of the time\n\n## Future Enhancements\n\n### Phase 2 (Optional)\n- Support for reviewing other release notes files (not just components)\n- Severity levels for issues (critical, warning, suggestion)\n- Auto-fix suggestions as code suggestions\n- Integration with PR review status\n- Metrics dashboard for common issues\n\n### Phase 3 (Optional)\n- Pre-commit hook for local review\n- VS Code extension for real-time feedback\n- Training mode to help new contributors learn guidelines\n- Historical analysis of release notes quality trends\n\n## Dependencies\n\n- GitHub Actions infrastructure\n- Amazon Q CLI availability and access\n- Repository write access for bot account\n- `_ext/release-notes-context.md` guidelines file\n\n## Risks and Mitigations\n\n| Risk | Impact | Mitigation |\n|------|--------|------------|\n| Q CLI unavailable | High | Graceful failure with manual review fallback |\n| Q CLI rate limits | Medium | Implement retry logic and rate limiting |\n| False positives | Medium | Continuous refinement of guidelines and prompts |\n| Action performance | Low | Parallel processing and caching |\n| Cost of Q CLI usage | Low | Monitor usage and set budget alerts |\n\n## Rollout Plan\n\n1. **Phase 1**: Implement basic action with manual trigger\n2. **Phase 2**: Enable automatic trigger on label\n3. **Phase 3**: Gather feedback and refine prompts\n4. **Phase 4**: Expand to other release notes files if successful\n\n## Maintenance\n\n- **Owner**: Documentation team\n- **Review Frequency**: Quarterly\n- **Update Triggers**: \n  - Changes to release notes guidelines\n  - Q CLI updates\n  - User feedback on accuracy\n  - GitHub Actions platform changes\n"
  },
  {
    "path": "_ext/release-notes-context.md",
    "content": "# Release Notes Writing Guidelines\n\n## Core Principles\n\n### Answer Three Questions for Every Item\n\n- **What?** — What feature/API is affected?\n- **When?** — Under what conditions does this occur?\n- **So what?** — What is the impact on the user?\n\n### All Content Must Be:\n\n- **Customer-visible** - Written from the customer's perspective about capabilities they can use\n- **Documented** - If documentation doesn't exist, exclude the feature. All new features must include documentation URLs.\n- **Actionable** - Include workarounds, timelines, or how to check if affected\n\n## DO:\n\n- **Write in customer-visible terms** - Describe what customers can now do, not how it was implemented\n- **State the impact clearly** - Use concrete language about what happens to users\n- **Be specific about conditions** - Replace vague phrases with precise conditions\n- **Quantify performance improvements** - Provide specific before/after metrics (e.g., \"improved from 2.164x to 3.654x speedup\") and state the conditions that trigger these improvements (e.g., \"for batch I/O operations with 1024 ops at 10KB\")\n- **Explain the impact of wrong defaults** - When fixing incorrect default values, state what the wrong default was and what impact it had on users\n- **Specify what was missing** - When fixing \"missing\" items, list what was missing and confirm they are now documented\n- **Describe previous behavior for bugs** - Always explain what the incorrect behavior was before the fix\n- **Categorize breaking changes correctly** - If a bug fix changes API behavior (e.g., renaming a parameter), list it under Breaking Changes, not Bug Fixes\n- **Provide actionable information** - Include workarounds if available, fix timelines if known, or how users can check if they're affected\n- **Provide migration guidance for breaking changes** - Tell users what they should do when behavior changes, with before/after examples\n- **Link to documentation** - Every feature must have corresponding documentation with URL\n- **Include documentation URLs for all new features** - If no URL exists, either create documentation first or remove the feature from release notes\n- **Use standard terminology** - Use terms your audience already knows\n- **Use clear, descriptive sentences** - Transform technical phrases into customer-understandable language\n- **Focus on customer-visible results** - Describe what customers will see, not internal mechanics\n- **Drop unnecessary words** - Remove \"when specified,\" \"may,\" \"is in progress\" when they add no value\n- **Remove empty sections** - Don't include placeholder text like \"None in this release\"\n- **Verify accuracy** - Check version numbers, dates, and technical details\n- **Run IP scanner** - Catch any internal code name leaks before publishing\n- **Use active voice** - Write \"The system ignores the parameter\" instead of \"The parameter is ignored\"\n- **Define abbreviations on first use** - Write \"time to first token (TTFT)\" before using \"TTFT\"\n- **Remove temporal qualifiers** - Replace \"for now\" with specific timelines or remove entirely\n- **Provide concrete examples** - Include calculation examples for complex parameters\n\n## DO NOT:\n\n- **Include internal code names** - Remove references like \"TRN3PDS\", \"Mariana\", \"Penguin\"\n- **Document undocumented features** - If documentation doesn't exist, exclude the feature\n- **Include features without documentation URLs** - Every new feature must have a documentation link\n- **List unreleased 
features** - Only include features available to customers\n- **Include internal-only metrics** - Remove metrics useful only internally\n- **Document bugs never released** - Only include fixes for publicly released issues\n- **Use internal API names** - Unless they're part of the public API\n- **Include debug variables** - Remove environment variables meant only for internal use\n- **Use vague language** - Avoid \"in certain cases,\" \"some patterns,\" \"may sometimes\"\n- **Use ambiguous phrasing** - Avoid phrases like \"Fixed dynamic for loop\" that could mean multiple things\n- **Leave impacts unexplained** - Don't just say \"fixed wrong default\" without explaining what the impact was\n- **Mix breaking changes with bug fixes** - Parameter renames or behavior changes belong in Breaking Changes, not Bug Fixes\n- **Create heavy noun chains** - Break up complex phrases (e.g., \"dtype override was ignored during reshape\" not \"reshape dtype override not being applied\")\n- **Write without context** - Every change needs metrics, conditions, or migration guidance\n- **Use hedging language** - Replace \"may result in\" with \"results in\" when deterministic\n- **Focus on internal implementation** - Avoid phrases like \"internally uses\" or internal platform identifiers\n- **Use passive voice without clear subject** - Avoid constructions where the actor is unclear\n- **Reference undefined versions** - Don't use \"V0\" or \"V1\" without defining them\n\n## Impact Statements\n\n| Avoid | Prefer |\n|-------|--------|\n| \"incorrectly interpret\" | \"produces incorrect results\" |\n| \"not being applied\" | \"is ignored\" |\n| \"failing check\" | \"crashes with validation error\" |\n| \"may incorrectly interpret tensor shapes\" | \"can produce incorrect results when transposing tensors\" |\n\n## Conditions - Be Specific\n\n| Avoid | Prefer |\n|-------|--------|\n| \"in certain cases\" | \"when reduction axis is not the last dimension\" |\n| \"some patterns\" | \"multi-dimensional transposes with more than 2 axes\" |\n| \"may sometimes\" | \"consistently occurs when...\" |\n| \"for now\" | \"Support is planned for version X.X.X\" or remove entirely |\n| \"small inputs\" | \"inputs under 512 tokens\" |\n| \"low batch sizes\" | \"batch sizes of 4 or less\" |\n\n## Phrasing Examples\n\n### Bug Fixes:\n\n| Avoid | Prefer |\n|-------|--------|\n| \"Fixed bug in nrt_vnc_usage_find_internal\" | \"Improved error handling to return a clear error instead of asserting during nrt_init\" |\n| \"Fixed dynamic for loop incorrectly incrementing the loop induction variable\" | \"Fixed: dynamic for loops now correctly increment the loop counter. Previously, the counter incremented incorrectly, causing [specific impact]\" |\n| \"Fixed reshape dtype override not being applied when specified\" | \"Fixed a bug where specifying a data type override during a reshape operation was ignored\" |\n| \"Fixed reshape of shared/private HBM tensors failing partition size check\" | \"Fixed a bug where reshaping tensors stored in shared or private HBM incorrectly failed the partition size check\" |\n| \"Fixed incorrect default value for on_false_value\" | \"Fixed incorrect default value for on_false_value in nki.isa.range_select. 
Previously defaulted to [X], now correctly defaults to [Y], which [impact]\" |\n\n### Performance Improvements:\n\n| Avoid | Prefer |\n|-------|--------|\n| \"Optimized zero-copy operations by enabling descriptor merging\" | \"Enhanced zero-copy operation performance: Write performance improved from 2.164x to 3.654x speedup for batch I/O operations(1_Batch_1024_Ops_10_KBs)\" |\n| \"Optimized mesh AllGather on TP8 configurations using destination routing\" | \"Optimized mesh AllGather: [X]% performance improvement on TP8 configurations when [specific conditions]\" |\n\n### New Features:\n\n| Avoid | Prefer |\n|-------|--------|\n| \"Added support for TRN3PDS platform\" | \"Added support for [public instance type name] with optimized topology configurations for distributed training. See [documentation URL]\" |\n| \"Added IOCTL to lookup Neuron device/HBM for a given virtual address\" | \"Added capability to lookup Neuron device for a given virtual address, enabling frameworks to identify which device holds a tensor. See [documentation link] for API details\" |\n\n### Known Issues:\n\n| Avoid | Prefer |\n|-------|--------|\n| \"may incorrectly interpret tensor shapes in certain multi-dimensional transpose patterns\" | \"can produce incorrect results when transposing tensors with certain multi-dimensional shapes\" |\n| \"Training, Inference, and Penguin kernels compilation and execution validation is in progress\" | Remove entirely (internal project name and not customer-actionable) |\n| \"Chunked prefill is not supported on Neuron for now\" | \"Chunked prefill is not supported. If you attempt to enable it with DISABLE_NEURON_CUSTOM_SCHEDULER='1', the system will fail to start with an error. Use standard prefill mode instead.\" |\n\n## Breaking Changes Checklist\n\nWhen documenting breaking changes, always include:\n\n1. **What changed** - The specific API, parameter, or behavior\n2. **Why it's breaking** - What will stop working\n3. **Migration path** - What users should do instead\n4. **Example (if helpful)** - Show old vs. new usage\n\n### Example:\n\n**Breaking:** NumPy synonyms (e.g., `np.add` for `nl.add`) are no longer accepted in NKI API calls.\n\n**Migration:** Replace all NumPy function calls with their NKI equivalents:\n- Replace `np.add(x, y)` with `nl.add(x, y)`\n- Replace `np.multiply(x, y)` with `nl.multiply(x, y)`\n\nAlways explain:\n- Why is this breaking?\n- What was the previous behavior?\n- What is the workaround or migration effort?\n\n## Quick Template\n\n```\n[Fixed/Known Issue]: [API/Feature] [impact] when [specific conditions]. [Optional: Workaround or timeline.]\n```\n\n### Example:\n\n```\nFixed: nki.isa.dma_copy causes a runtime timeout when copying FP32 from SBUF to BF16 in HBM with indirect addressing. Workaround: cast to BF16 in SBUF before copying.\n```\n\n## Quality Checks Before Publishing\n\n1. **No internal names** - Run IP scanner to catch code name leaks\n2. **Customer value** - Each item explains why customers should care\n3. **Documentation links** - New features link to relevant docs with URLs\n4. **Documentation exists** - Verify all features are documented before including; if no documentation URL exists, remove the feature from release notes\n5. **Accuracy** - Technical details are correct and verifiable\n6. **Clarity** - Phrasing is clear and professional\n7. **Completeness** - Previous behavior and migration paths explained\n8. **Impact explained** - Bug fixes describe what was broken and what the impact was\n9. 
**Active voice** - Sentences use active voice with clear subjects\n10. **Abbreviations defined** - All abbreviations spelled out on first use\n11. **No vague language** - All conditions and impacts are specific and quantified\n12. **Examples provided** - Complex parameters include calculation examples\n\n## Key Principles\n\n### All content must be:\n\n- **Customer-visible** (not internal implementation details)\n- **Documented with URLs** (if docs don't exist, exclude it)\n- **Impactful** (explain value, not just what changed)\n\n### Every bug fix must answer:\n\n- What was broken?\n- What was the impact?\n- What works now?\n\n### Every new feature must include:\n\n- Documentation URL\n- Customer benefit\n- Usage guidance or examples\n\n## How to Review Release Notes\n\nWhen reviewing release notes against these guidelines, provide feedback in the following format:\n\n### Issue [Number]: [Brief Issue Title]\n\n**Original Text:**\n```\n[Exact text from the release notes]\n```\n\n**Problem:**\n[Description of the content/completeness issue]\n\n**Phrasing Problem:**\n[Description of the language/clarity issue, if applicable]\n\n**Example Rewrite:**\n```\n[Suggested improved version showing correct phrasing and content]\n```\n\n**Action:**\n- [Specific action item 1]\n- [Specific action item 2]\n\n## Review Process:\n\n1. **Extract original text** - Include the exact text being reviewed\n2. **Identify problems** - Separate content issues from phrasing issues\n3. **Provide examples** - Show how to rewrite the text correctly\n4. **List actions** - Give specific, actionable steps to fix each issue\n5. **Check documentation** - Verify URLs exist for all new features; if not, recommend removal\n6. **Verify completeness** - Ensure all three questions (What? When? So what?) are answered\n7. **Check phrasing** - Identify vague language, passive voice, undefined terms, internal references\n8. **Validate breaking changes** - Ensure migration guidance and before/after examples are included\n"
  },
  {
    "path": "_ext/sphinx_plotly_directive.py",
    "content": "\"\"\"\nCODE FROM: https://github.com/harupy/sphinx-plotly-directive\nLICENSE: MIT\n\nBased on: https://matplotlib.org/3.1.3/devel/plot_directive.html\n\nA directive for including a Plotly figure in a Sphinx document\n================================================================\n\nBy default, in HTML output, `plot` will include a .png file with a link to a\nhigh-res .png and .pdf.  In LaTeX output, it will include a .pdf.\n\nThe source code for the plot may be included in one of three ways:\n\n1. **A path to a source file** as the argument to the directive::\n\n     .. plot:: path/to/plot.py\n\n   When a path to a source file is given, the content of the\n   directive may optionally contain a caption for the plot::\n\n     .. plot:: path/to/plot.py\n\n        The plot's caption.\n\n   Additionally, one may specify the name of a function to call (with\n   no arguments) immediately after importing the module::\n\n     .. plot:: path/to/plot.py plot_function1\n\n2. Included as **inline content** to the directive::\n\n     .. plotly::\n\n        import plotly.express as px\n        px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])\n\n3. Using **doctest** syntax::\n\n     .. plotly::\n\n        A plotting example:\n        >>> import plotly.express as px\n        >>> px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])\n\n4. Using the `fig-vars` option. In the example below, `fig1` and `fig2` will be\n   rendered::\n\n     .. plotly::\n        :fig-vars: fig1, fig2\n\n        import plotly.express as px\n        fig1 = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])\n        fig2 = px.scatter(x=[4, 3, 2, 1, 0], y=[0, 1, 4, 9, 16])\n\nOptions\n-------\n\nThe ``plotly`` directive supports the following options:\n\n    format : {'python', 'doctest'}\n        The format of the input.\n\n    include-source : bool\n        Whether to display the source code. The default can be changed\n        using the `plot_include_source` variable in :file:`conf.py`.\n\n    encoding : str\n        If this source file is in a non-UTF8 or non-ASCII encoding, the\n        encoding must be specified using the ``:encoding:`` option.  The\n        encoding will not be inferred using the ``-*- coding -*-`` metacomment.\n\n    context : bool or str\n        If provided, the code will be run in the context of all previous plot\n        directives for which the ``:context:`` option was specified.  This only\n        applies to inline code plot directives, not those run from files. If\n        the ``:context: reset`` option is specified, the context is reset\n        for this and future plots, and previous figures are closed prior to\n        running the code. ``:context: close-figs`` keeps the context but closes\n        previous figures before running the code.\n\n    nofigs : bool\n        If specified, the code block will be run, but no figures will be\n        inserted.  This is usually useful with the ``:context:`` option.\n\n    caption : str\n        If specified, the option's argument will be used as a caption for the\n        figure. This overwrites the caption given in the content, when the plot\n        is generated from a file.\n\n    iframe-width\n        The width of the iframe in which a plotly figure is rendered. The default can be changed\n        using the `plotly_iframe_width` variable in :file:`conf.py`.\n\n    iframe-height\n        The height of the iframe in which a plotly figure is rendered. 
The default can be changed\n        using the `plotly_iframe_height` variable in :file:`conf.py`.\n\nAdditionally, this directive supports all of the options of the `image`\ndirective, except for *target* (since plot will add its own target).  These\ninclude *alt*, *height*, *width*, *scale*, *align* and *class*.\n\nConfiguration options\n---------------------\n\nThe plot directive has the following configuration options:\n\n    plotly_include_source\n        Default value for the include-source option\n\n    plotly_html_show_source_link\n        Whether to show a link to the source in HTML.\n\n    plotly_pre_code\n        Code that should be executed before each plot. If not specified or None\n        it will default to a string containing::\n\n            import numpy as np\n            import plotly\n            import plotly.graph_objects as go\n            import plotly.express as px\n\n    plotly_basedir\n        Base directory, to which ``plot::`` file names are relative\n        to.  (If None or empty, file names are relative to the\n        directory where the file containing the directive is.)\n\n    plotly_formats\n        File formats to generate. List of tuples or strings::\n\n            [(suffix, dpi), suffix, ...]\n\n        that determine the file format and the DPI. For entries whose\n        DPI was omitted, sensible defaults are chosen. When passing from\n        the command line through sphinx_build the list should be passed as\n        suffix:dpi,suffix:dpi, ...\n\n    plotly_html_show_formats\n        Whether to show links to the files in HTML.\n\n    plotly_working_directory\n        By default, the working directory will be changed to the directory of\n        the example, so the code can get at its data files, if any.  Also its\n        path will be added to `sys.path` so it can import any helper modules\n        sitting beside it.  This configuration option can be used to specify\n        a central directory (also added to `sys.path`) where data files and\n        helper modules for all code are located.\n\n    plotly_iframe_width\n        The width of the iframe in which a plotly figure is rendered. The default is \"100%\".\n\n    plotly_iframe_height\n        The height of the iframe in which a plotly figure is rendered. 
The default is \"500px\".\n\n    plotly_template\n        Provide a customized template for preparing restructured text.\n\"\"\"\n\nimport copy\nimport itertools\nimport os\nimport re\nimport shutil\nimport textwrap\nimport traceback\nfrom os.path import relpath\nfrom pathlib import Path\n\nimport jinja2  # Sphinx dependency.\nfrom docutils.parsers.rst import Directive, directives\nfrom docutils.parsers.rst.directives.images import Image\n\nimport re\nimport textwrap\n\nimport plotly\n\n\nINDENT_SPACES = \" \" * 3\n\n\ndef save_plotly_figure(fig, path):\n    r\"\"\"\n    Save a Plotly figure.\n    Parameters\n    ----------\n    fig : plotly figure\n        A plotly figure to save.\n    path : str\n        A file path.\n    Returns\n    -------\n    None\n    Examples\n    --------\n    >>> import plotly.express as px\n    >>> import tempfile\n    >>> fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])\n    >>> path = tempfile.NamedTemporaryFile(suffix=\".html\").name\n    >>> save_plotly_figure(fig, path)\n    \"\"\"\n    fig_html = plotly.offline.plot(fig, output_type=\"div\", include_plotlyjs=\"cdn\", auto_open=False)\n    with open(path, \"w\") as f:\n        f.write(fig_html)\n\n\ndef assign_last_line_into_variable(code, variable_name):\n    r\"\"\"\n    Save a Plotly figure.\n    Parameters\n    ----------\n    code : str\n        A string representing code.\n    name : str\n        A variable name.\n    Returns\n    -------\n    str\n        Mew code.\n    Examples\n    --------\n    >>> code = \"a = 1\\nfunc(a)\"\n    >>> new_code = assign_last_line_into_variable(code, \"b\")\n    >>> print(new_code)\n    a = 1\n    b = func(a)\n    \"\"\"\n    lines = code.split(\"\\n\")\n    for idx in range(len(lines) - 1, -1, -1):\n        if lines[idx].strip() != \"\":\n            lines[idx] = \"{} = \".format(variable_name) + lines[idx]\n            break\n    return \"\\n\".join(lines)\n\n\ndef create_directive_block(name, arguments, options, content):\n    r\"\"\"\n    Create a directive block.\n    Parameters\n    ----------\n    name : str\n        A directive name.\n    arguments : list of str\n        Arguments of the directive.\n    option : dict\n        Option of the directive.\n    content : list of str\n        Content of the directive.\n    Returns\n    -------\n    str\n        A directive block.\n    Examples\n    --------\n    >>> block = create_directive_block(\n    ...     \"plotly\",\n    ...     [\"f1\", \"f2\"],\n    ...     {\"a\": 0, \"b\": 1},\n    ...     [\"l1\", \"l2\"],\n    ... )\n    >>> print(block)\n    .. plotly:: f1 f2\n       :a: 0\n       :b: 1\n    <BLANKLINE>\n       l1\n       l2\n    \"\"\"\n    header = \".. {}:: \".format(name) + \" \".join(arguments)\n    code = \"\\n\".join(map(str, content))\n\n    lines = [header]\n\n    if len(options.items()) > 0:\n\n        def process_value(v):\n            if isinstance(v, list):\n                return \", \".join(v)\n            return v\n\n        options_block = \"\\n\".join(\":{}: {}\".format(k, process_value(v)) for k, v in options.items())\n        lines.append(textwrap.indent(options_block, INDENT_SPACES))\n\n    lines.append(\"\")\n    lines.append(textwrap.indent(code, INDENT_SPACES))\n\n    return \"\\n\".join(lines)\n\n\ndef create_code_block(code, language=None):\n    return \"\\n\".join(\n        [\n            \".. 
code-block::{}\".format(\" \" + language if language else \"\"),\n            \"\",\n            textwrap.indent(code.strip(), INDENT_SPACES),\n            \"\",\n        ]\n    )\n\n\ndef strip_last_line(code):\n    r\"\"\"\n    Strips the last line of the give code block\n    Parameters\n    ----------\n    code : str\n        Code to strip\n    Returns\n    -------\n    str:\n        Stripped code\n    Examples\n    --------\n    >>> strip_last_line(\"a\")\n    ''\n    >>> strip_last_line(\"a\\nb\")\n    'a'\n    >>> strip_last_line(\"a\\nb\\nc\")\n    'a\\nb'\n    \"\"\"\n    return \"\\n\".join(code.strip().split(\"\\n\")[:-1])\n\n\ndef ends_with_show(code):\n    r\"\"\"\n    Returns True if the last line of the given code block ends with `show()`\n    Parameters\n    ----------\n    code : str\n        Code that may contain a line that looks like `fig.show()`\n    Returns\n    -------\n    str:\n        Variable name of the object that calls `show()`\n    Examples\n    --------\n    >>> ends_with_show(\"fig.show()\")  # simple\n    True\n    >>> ends_with_show(\"fig.show(1, a=2)\")  # show with arguments\n    True\n    >>> ends_with_show(\"fig = dummy\\nfig.show()\\n\")  # multiline\n    True\n    >>> ends_with_show(\"foo\")  # doesn't contains `show`\n    False\n    \"\"\"\n    # TODO: Use a more strict regular expression\n    pattern = r\"^(.+)\\.show\\(.*\\)$\"\n    match = re.search(pattern, code.strip().split(\"\\n\")[-1], flags=re.DOTALL)\n    return bool(match)\n\n\n# -----------------------------------------------------------------------------\n# Registration hook\n# -----------------------------------------------------------------------------\n\n\ndef _option_boolean(arg):\n    if not arg or not arg.strip():\n        # no argument given, assume used as a flag\n        return True\n    elif arg.strip().lower() in (\"no\", \"0\", \"false\"):\n        return False\n    elif arg.strip().lower() in (\"yes\", \"1\", \"true\"):\n        return True\n    else:\n        raise ValueError('\"%s\" unknown boolean' % arg)\n\n\ndef _option_context(arg):\n    if arg in [None, \"reset\", \"close-figs\"]:\n        return arg\n    raise ValueError(\"Argument should be None or 'reset' or 'close-figs'\")\n\n\ndef _option_format(arg):\n    return directives.choice(arg, (\"python\", \"doctest\"))\n\n\ndef _option_fig_vars(arg):\n    return [x.strip() for x in arg.split(\",\")]\n\n\ndef mark_plot_labels(app, document):\n    \"\"\"\n    To make plots referenceable, we need to move the reference from the\n    \"htmlonly\" (or \"latexonly\") node to the actual figure node itself.\n    \"\"\"\n    for name, explicit in document.nametypes.items():\n        if not explicit:\n            continue\n        labelid = document.nameids[name]\n        if labelid is None:\n            continue\n        node = document.ids[labelid]\n        if node.tagname in (\"html_only\", \"latex_only\"):\n            for n in node:\n                if n.tagname == \"figure\":\n                    sectname = name\n                    for c in n:\n                        if c.tagname == \"caption\":\n                            sectname = c.astext()\n                            break\n\n                    node[\"ids\"].remove(labelid)\n                    node[\"names\"].remove(name)\n                    n[\"ids\"].append(labelid)\n                    n[\"names\"].append(name)\n                    document.settings.env.labels[name] = (\n                        document.settings.env.docname,\n                        
labelid,\n                        sectname,\n                    )\n                    break\n\n\nclass PlotlyDirective(Directive):\n    \"\"\"The ``.. plotly::`` directive, as documented in the module's docstring.\"\"\"\n\n    has_content = True\n    required_arguments = 0\n    optional_arguments = 2\n    final_argument_whitespace = False\n    option_spec = {\n        \"alt\": directives.unchanged,\n        \"height\": directives.length_or_unitless,\n        \"width\": directives.length_or_percentage_or_unitless,\n        \"scale\": directives.nonnegative_int,\n        \"align\": Image.align,\n        \"class\": directives.class_option,\n        \"include-source\": _option_boolean,\n        \"format\": _option_format,\n        \"context\": _option_context,\n        \"nofigs\": directives.flag,\n        \"encoding\": directives.encoding,\n        \"caption\": directives.unchanged,\n        \"fig-vars\": _option_fig_vars,\n        \"iframe-width\": directives.unchanged,\n        \"iframe-height\": directives.unchanged,\n    }\n\n    def run(self):\n        \"\"\"Run the plot directive.\"\"\"\n        try:\n            return run(\n                self.arguments,\n                self.content,\n                self.options,\n                self.state_machine,\n                self.state,\n                self.lineno,\n            )\n        except Exception as e:\n            raise self.error(str(e))\n\n\ndef setup(app):\n    setup.app = app\n    setup.config = app.config\n    setup.confdir = app.confdir\n    app.add_directive(\"plotly\", PlotlyDirective)\n    app.add_config_value(\"plotly_pre_code\", None, True)\n    app.add_config_value(\"plotly_include_source\", False, True)\n    app.add_config_value(\"plotly_html_show_source_link\", True, True)\n    app.add_config_value(\"plotly_formats\", [\"html\"], True)\n    app.add_config_value(\"plotly_basedir\", None, True)\n    app.add_config_value(\"plotly_html_show_formats\", True, True)\n    app.add_config_value(\"plotly_working_directory\", None, True)\n    app.add_config_value(\"plotly_iframe_width\", \"100%\", True)\n    app.add_config_value(\"plotly_iframe_height\", \"500px\", True)\n    app.add_config_value(\"plotly_template\", None, True)\n\n    app.add_config_value(\"plotly_include_directive_source\", None, False)\n\n    app.connect(\"doctree-read\", mark_plot_labels)\n\n    metadata = {\n        \"parallel_read_safe\": True,\n        \"parallel_write_safe\": True,\n        \"version\": 0.1,\n    }\n    return metadata\n\n\n# -----------------------------------------------------------------------------\n# Doctest handling\n# -----------------------------------------------------------------------------\n\n\ndef contains_doctest(text):\n    try:\n        # check if it's valid Python as-is\n        compile(text, \"<string>\", \"exec\")\n        return False\n    except SyntaxError:\n        pass\n    r = re.compile(r\"^\\s*>>>\", re.M)\n    m = r.search(text)\n    return bool(m)\n\n\ndef unescape_doctest(text):\n    \"\"\"\n    Extract code from a piece of text, which contains either Python code\n    or doctests.\n    \"\"\"\n    if not contains_doctest(text):\n        return text\n\n    code = \"\"\n    for line in text.split(\"\\n\"):\n        m = re.match(r\"^\\s*(>>>|\\.\\.\\.) 
(.*)$\", line)\n        if m:\n            code += m.group(2) + \"\\n\"\n        elif line.strip():\n            code += \"# \" + line.strip() + \"\\n\"\n        else:\n            code += \"\\n\"\n    return code\n\n\ndef split_code_at_show(text):\n    \"\"\"Split code at plt.show().\"\"\"\n    parts = []\n    is_doctest = contains_doctest(text)\n\n    part = []\n    for line in text.split(\"\\n\"):\n        if (not is_doctest and line.strip() == \"plt.show()\") or (\n            is_doctest and line.strip() == \">>> plt.show()\"\n        ):\n            part.append(line)\n            parts.append(\"\\n\".join(part))\n            part = []\n        else:\n            part.append(line)\n    if \"\\n\".join(part).strip():\n        parts.append(\"\\n\".join(part))\n    return parts\n\n\n# -----------------------------------------------------------------------------\n# Template\n# -----------------------------------------------------------------------------\n\n\nTEMPLATE = \"\"\"\n{% if directive_source %}\nSource:\n\n{{ directive_source }}\n\nOutput:\n{% endif %}\n{{ source_code }}\n\n.. only:: html\n\n   {% if source_link or (html_show_formats and not multi_figure) %}\n   (\n   {%- if source_link -%}\n   `Source code <{{ source_link }}>`__\n   {%- endif -%}\n   {%- if html_show_formats and not multi_figure -%}\n     {%- for fig in figures -%}\n       {%- for fmt in fig.formats -%}\n         {%- if source_link or not loop.first -%}, {% endif -%}\n         `{{ fmt }} <{{ dest_dir }}/{{ fig.basename }}.{{ fmt }}>`__\n       {%- endfor -%}\n     {%- endfor -%}\n   {%- endif -%}\n   )\n   {% endif %}\n\n   {% for fig in figures %}\n   .. raw:: html\n      {% for option in options -%}\n      {{ option }}\n      {% endfor %}\n\n       <iframe src=\"{{ fig.basename }}.{{ default_fmt }}\" width=\"{{ iframe_width }}\"\n        height=\"{{ iframe_height }}\" frameborder=\"0\"></iframe>\n\n   {% if html_show_formats and multi_figure -%}\n     (\n     {%- for fmt in fig.formats -%}\n     {%- if not loop.first -%}, {% endif -%}\n     `{{ fmt }} <{{ dest_dir }}/{{ fig.basename }}.{{ fmt }}>`__\n     {%- endfor -%}\n     )\n   {%- endif -%}\n\n      {{ caption }}\n   {% endfor %}\n\n.. only:: not html\n\n   {% for fig in figures %}\n   .. raw:: html\n      {% for option in options -%}\n      {{ option }}\n      {% endfor %}\n\n       <iframe src=\"{{ fig.basename }}.{{ default_fmt }}\" width=\"{{ iframe_width }}\"\n        height=\"{{ iframe_height }}\" frameborder=\"0\"></iframe>\n\n      {{ caption }}\n   {% endfor %}\n\n\"\"\"\n\nexception_template = \"\"\"\n.. only:: html\n\n   [`source code <%(linkdir)s/%(basename)s.py>`__]\n\nException occurred rendering plot.\n\n\"\"\"\n\n# the context of the plot for all directives specified with the\n# :context: option\nplot_context = dict()\n\n\nclass FigureFile:\n    def __init__(self, basename, dirname):\n        self.basename = basename\n        self.dirname = dirname\n        self.formats = []\n\n    def filename(self, format):\n        return os.path.join(self.dirname, \"%s.%s\" % (self.basename, format))\n\n    def filenames(self):\n        return [self.filename(fmt) for fmt in self.formats]\n\n\ndef out_of_date(original, derived):\n    \"\"\"\n    Return whether *derived* is out-of-date relative to *original*, both of\n    which are full file paths.\n    \"\"\"\n    return not os.path.exists(derived) or (\n        os.path.exists(original) and os.stat(derived).st_mtime < os.stat(original).st_mtime\n    )\n\n\nclass PlotError(RuntimeError):\n    pass\n\n\ndef run_code(code, code_path, ns=None, function_name=None, fig_vars=None):\n    \"\"\"\n    Run the given code in a namespace, calling the function given by\n    name if function_name is not None, and return the resulting figures.\n    \"\"\"\n\n    # Change the working directory to the directory of the example, so\n    # it can get at its data files, if any.\n    pwd = os.getcwd()\n    if setup.config.plotly_working_directory is not None:\n        try:\n            os.chdir(setup.config.plotly_working_directory)\n        except OSError as err:\n            raise OSError(\n                str(err) + \"\\n`plotly_working_directory` option in \"\n                \"Sphinx configuration file must be a valid \"\n                \"directory path\"\n            ) from err\n        except TypeError as err:\n            raise TypeError(\n                str(err) + \"\\n`plotly_working_directory` option in \"\n                \"Sphinx configuration file must be a string or \"\n                \"None\"\n            ) from err\n    elif code_path is not None:\n        dirname = os.path.abspath(os.path.dirname(code_path))\n        os.chdir(dirname)\n\n    try:\n        code = unescape_doctest(code)\n        if ns is None:\n            ns = {}\n        if not ns:\n            if setup.config.plotly_pre_code is None:\n                exec(\n                    \"\\n\".join(\n                        [\n                            \"import numpy as np\",\n                            \"import plotly\",\n                            \"import plotly.graph_objects as go\",\n                            \"import plotly.express as px\",\n                        ]\n                    ),\n                    ns,\n                )\n            else:\n                exec(str(setup.config.plotly_pre_code), ns)\n        if \"__main__\" in code:\n            ns[\"__name__\"] = \"__main__\"\n\n        variable_name = \"fig\"\n\n        if ends_with_show(code):\n            exec(strip_last_line(code), ns)\n            figs = [ns[fig_var] for fig_var in fig_vars] if fig_vars else [ns[variable_name]]\n        elif function_name is not None:\n            exec(code, ns)\n            exec(assign_last_line_into_variable(function_name + \"()\", variable_name), ns)\n            figs = [ns[variable_name]]\n        elif fig_vars:\n            exec(code, ns)\n            figs = [ns[fig_var] for fig_var in fig_vars]\n        else:\n            exec(assign_last_line_into_variable(code, variable_name), ns)\n            figs = [ns[variable_name]]\n\n    except (Exception, SystemExit) as err:\n        raise PlotError(traceback.format_exc()) from err\n    finally:\n        os.chdir(pwd)\n\n    return figs\n\n\ndef get_plot_formats(config):\n    default_dpi = {\"html\": 0}\n    formats = []\n    plot_formats = config.plotly_formats\n    for fmt in plot_formats:\n        if isinstance(fmt, str):\n            if \":\" in fmt:\n                suffix, dpi = fmt.split(\":\")\n                formats.append((str(suffix), int(dpi)))\n            else:\n                formats.append((fmt, default_dpi.get(fmt, 80)))\n        elif isinstance(fmt, (tuple, list)) and len(fmt) == 2:\n            formats.append((str(fmt[0]), int(fmt[1])))\n        else:\n            raise PlotError('invalid image format \"%r\" in plot_formats' % fmt)\n    return formats\n\n\ndef render_figures(\n    code,\n    code_path,\n    output_dir,\n    output_base,\n    context,\n    function_name,\n    config,\n    context_reset=False,\n    close_figs=False,\n    fig_vars=None,\n):\n    \"\"\"\n    Run a plotly script and save the figures in *output_dir*.\n\n    Save the figures under *output_dir* with file names derived from\n    *output_base*.\n    \"\"\"\n    formats = get_plot_formats(config)\n\n    # -- Try to determine if all images already exist\n\n    code_pieces = split_code_at_show(code)\n\n    # Look for single-figure output files first\n    all_exists = True\n    fig = FigureFile(output_base, output_dir)\n    for format, dpi in formats:\n        if out_of_date(code_path, fig.filename(format)):\n            all_exists = False\n            break\n        fig.formats.append(format)\n\n    if all_exists:\n        return [(code, [fig])]\n\n    # Then look for multi-figure output files\n    results = []\n    all_exists = True\n    for i, code_piece in enumerate(code_pieces):\n        figures = []\n        for j in itertools.count():\n            if len(code_pieces) > 1:\n                fig = FigureFile(\"%s_%02d_%02d\" % (output_base, i, j), output_dir)\n            else:\n                fig = FigureFile(\"%s_%02d\" % (output_base, j), output_dir)\n            for fmt, dpi in formats:\n                if out_of_date(code_path, fig.filename(fmt)):\n                    all_exists = False\n                    break\n                fig.formats.append(fmt)\n\n            # assume that if we have one, we have them all\n            if not all_exists:\n                all_exists = j > 0\n                break\n            figures.append(fig)\n        if not all_exists:\n            break\n        results.append((code_piece, figures))\n\n    if all_exists:\n        return results\n\n    # We didn't find the files, so build them\n\n    results = []\n    if context:\n        ns = plot_context\n    else:\n        ns = {}\n\n    if context_reset:\n        plot_context.clear()\n\n    # Unlike matplotlib, plotly figures do not need to be closed, so the\n    # close_figs option requires no per-piece cleanup here.\n    for i, code_piece in enumerate(code_pieces):\n        fig_objects = run_code(code_piece, code_path, ns, function_name, fig_vars)\n\n        figures = []\n        for j, fig_obj in enumerate(fig_objects):\n            if len(fig_objects) == 1 and len(code_pieces) == 1:\n                fig = FigureFile(output_base, output_dir)\n            elif len(code_pieces) == 1:\n                fig = FigureFile(\"%s_%02d\" % (output_base, j), output_dir)\n            else:\n                fig = FigureFile(\"%s_%02d_%02d\" % (output_base, i, j), output_dir)\n            figures.append(fig)\n            for fmt, dpi in formats:\n                try:\n                    save_plotly_figure(fig_obj, fig.filename(fmt))\n                except Exception as err:\n                    raise PlotError(traceback.format_exc()) from err\n                fig.formats.append(fmt)\n\n        results.append((code_piece, figures))\n\n    return results\n\n\ndef run(arguments, content, options, state_machine, state, lineno):\n    document = state_machine.document\n    config = document.settings.env.config\n    nofigs = \"nofigs\" in options\n\n    formats = get_plot_formats(config)\n    default_fmt = formats[0][0]\n\n    options_copy = copy.deepcopy(options)\n\n    options.setdefault(\"include-source\", config.plotly_include_source)\n    options.setdefault(\"iframe-width\", config.plotly_iframe_width)\n    options.setdefault(\"iframe-height\", config.plotly_iframe_height)\n    keep_context = \"context\" in options\n    context_opt = None if not keep_context else options[\"context\"]\n\n    rst_file = document.attributes[\"source\"]\n    rst_dir = os.path.dirname(rst_file)\n\n    if len(arguments):\n        if not config.plotly_basedir:\n            source_file_name = os.path.join(setup.app.builder.srcdir, directives.uri(arguments[0]))\n        else:\n            source_file_name = os.path.join(\n                setup.confdir, config.plotly_basedir, directives.uri(arguments[0])\n            )\n\n        # If there is content, it will be passed as a caption.\n        caption = \"\\n\".join(content)\n\n        # Enforce unambiguous use of captions.\n        if \"caption\" in options:\n            if caption:\n                raise ValueError(\n                    \"Caption specified in both content and options. Please remove ambiguity.\"\n                )\n            # Use caption option\n            caption = options[\"caption\"]\n\n        # If the optional function name is provided, use it\n        if len(arguments) == 2:\n            function_name = arguments[1]\n        else:\n            function_name = None\n\n        code = Path(source_file_name).read_text(encoding=\"utf-8\")\n        output_base = os.path.basename(source_file_name)\n    else:\n        source_file_name = rst_file\n        code = textwrap.dedent(\"\\n\".join(map(str, content)))\n        counter = document.attributes.get(\"_plot_counter\", 0) + 1\n        document.attributes[\"_plot_counter\"] = counter\n        base, ext = os.path.splitext(os.path.basename(source_file_name))\n        output_base = \"%s-%d.py\" % (base, counter)\n        function_name = None\n        caption = options.get(\"caption\", \"\")\n\n    base, source_ext = os.path.splitext(output_base)\n    if source_ext in (\".py\", \".rst\", \".txt\"):\n        output_base = base\n    else:\n        source_ext = \"\"\n\n    # ensure that LaTeX includegraphics doesn't choke on foo.bar.pdf filenames\n    output_base = output_base.replace(\".\", \"-\")\n\n    # is it in doctest format?\n    is_doctest = contains_doctest(code)\n    if \"format\" in options:\n        if options[\"format\"] == \"python\":\n            is_doctest = False\n        else:\n            is_doctest = True\n\n    # determine output directory name fragment\n    source_rel_name = relpath(source_file_name, setup.confdir)\n    source_rel_dir = os.path.dirname(source_rel_name)\n    while source_rel_dir.startswith(os.path.sep):\n        source_rel_dir = source_rel_dir[1:]\n\n    # build_dir: where to place output files (temporarily)\n    build_dir = os.path.join(\n        os.path.dirname(setup.app.doctreedir), \"plot_directive\", source_rel_dir\n    )\n    # get rid of .. in paths, also changes pathsep\n    # see note in Python docs for warning about symbolic links on Windows.\n    # need to compare source and dest paths at end\n    build_dir = os.path.normpath(build_dir)\n\n    if not os.path.exists(build_dir):\n        os.makedirs(build_dir)\n\n    # output_dir: final location in the builder's directory\n    dest_dir = os.path.abspath(os.path.join(setup.app.builder.outdir, source_rel_dir))\n    if not os.path.exists(dest_dir):\n        os.makedirs(dest_dir)\n\n    # how to link to files from the RST file\n    dest_dir_link = os.path.join(relpath(setup.confdir, rst_dir), source_rel_dir).replace(\n        os.path.sep, \"/\"\n    )\n    try:\n        build_dir_link = relpath(build_dir, rst_dir).replace(os.path.sep, \"/\")\n    except ValueError:\n        # on Windows, relpath raises ValueError when path and start are on\n        # different mounts/drives\n        build_dir_link = build_dir\n    source_link = dest_dir_link + \"/\" + output_base + source_ext\n\n    # make figures\n    try:\n        results = render_figures(\n            code,\n            source_file_name,\n            build_dir,\n            output_base,\n            keep_context,\n            function_name,\n            config,\n            context_reset=context_opt == \"reset\",\n            close_figs=context_opt == \"close-figs\",\n            fig_vars=options.get(\"fig-vars\"),\n        )\n        errors = []\n    except PlotError as err:\n        reporter = state.memo.reporter\n        sm = reporter.system_message(\n            2,\n            \"Exception occurred in plotting {}\\n from {}:\\n{}\".format(\n                output_base, source_file_name, err\n            ),\n            line=lineno,\n        )\n        results = [(code, [])]\n        errors = [sm]\n\n    # Properly indent the caption\n    caption = \"\\n\".join(\"      \" + line.strip() for line in caption.split(\"\\n\"))\n\n    # generate output restructuredtext\n    total_lines = []\n    for j, (code_piece, figures) in enumerate(results):\n        if options[\"include-source\"]:\n            if is_doctest:\n                lines = [\"\", *code_piece.splitlines()]\n            else:\n                lines = [\n                    \".. code-block:: python\",\n                    \"\",\n                    *textwrap.indent(code_piece, \"    \").splitlines(),\n                ]\n            source_code = \"\\n\".join(lines)\n        else:\n            source_code = \"\"\n\n        if nofigs:\n            figures = []\n\n        opts = [\n            \":%s: %s\" % (key, val)\n            for key, val in options.items()\n            if key in (\"alt\", \"height\", \"width\", \"scale\", \"align\", \"class\")\n        ]\n\n        # Not-None src_link signals the need for a source link in the generated\n        # html\n        if j == 0 and config.plotly_html_show_source_link:\n            src_link = source_link\n        else:\n            src_link = None\n\n        if config.plotly_include_directive_source:\n            directive_source = create_directive_block(\"plotly\", arguments, options_copy, content)\n            directive_source = create_code_block(directive_source, \"text\")\n        else:\n            directive_source = \"\"\n\n        result = jinja2.Template(config.plotly_template or TEMPLATE).render(\n            directive_source=directive_source,\n            default_fmt=default_fmt,\n            dest_dir=dest_dir_link,\n            build_dir=build_dir_link,\n            source_link=src_link,\n            multi_figure=len(figures) > 1,\n            options=opts,\n            figures=figures,\n            iframe_width=options[\"iframe-width\"],\n            iframe_height=options[\"iframe-height\"],\n            source_code=source_code,\n            html_show_formats=config.plotly_html_show_formats and len(figures),\n            caption=caption,\n        )\n\n        total_lines.extend(result.split(\"\\n\"))\n        total_lines.append(\"\")\n\n    if total_lines:\n        state_machine.insert_input(total_lines, source=source_file_name)\n\n    # copy image files to builder's output directory, if necessary\n    Path(dest_dir).mkdir(parents=True, exist_ok=True)\n\n    for code_piece, figures in results:\n        for fig in figures:\n            for fn in fig.filenames():\n                destfig = os.path.join(dest_dir, os.path.basename(fn))\n                if fn != destfig:\n                    shutil.copyfile(fn, destfig)\n\n    # copy script (if necessary)\n    Path(dest_dir, output_base + source_ext).write_text(\n        unescape_doctest(code) if source_file_name == rst_file else code,\n        encoding=\"utf-8\",\n    )\n\n    return errors"
  },
  {
    "path": "_ext/symlink.py",
    "content": "from docutils import nodes\nfrom docutils.parsers.rst import Directive, directives\n\nimport os, sys\n\ndef remove_symlink_handler(app, exception):\n    dst = './src'\n    \n    if os.path.exists(dst):\n        if os.path.isdir(dst):\n            if os.path.islink(dst):\n                 os.unlink(dst)\n            else:\n                shutil.rmtree(dst)\n        else:\n            if os.path.islink(dst):\n                os.unlink(dst)\n            else:\n                os.remove(dst)\n\ndef setup(app):\n    app.connect('build-finished', remove_symlink_handler)\n    src = '../src'\n    dst = './src'\n\n    # This creates a symbolic link on python in tmp directory\n\n    if os.path.exists(dst):\n        if os.path.isdir(dst):\n            if os.path.islink(dst):\n                 os.unlink(dst)\n            else:\n                shutil.rmtree(dst)\n        else:\n            if os.path.islink(dst):\n                os.unlink(dst)\n            else:\n                os.remove(dst)\n\n    os.symlink(src, dst)\n\n    return {\n        'version': '1.0',\n        'parallel_read_safe': True,\n        'parallel_write_safe': True,\n    }"
  },
  {
    "path": "_static/css/custom.css",
    "content": ".xxtable-smaller-font-size p, strong  {\n    font-size:0.9em; \n}\n\n.ablog-post-title p {\n    font-size:0.9em;     \n}\n\n.ablog-post p {\n    font-size:0.9em;     \n}\n\n.sphinx-design-class-title-small {\n    font-size:0.9em;     \n}\n\n.sphinx-design-class-title-med {\n    font-size:1em;     \n}\n\n.sphinx-design-class-body-small {\n    font-size:0.9em;     \n}\n\n\nh1{font-size:2em;}\nh2{font-size:1.5em;}\nh3{font-size:1.3em;}\nh4{font-size:1.2em;}\ndiv.topic  {\n    font-size:0.85em; \n}\nli.toctree-l1 {\n    font-size:0.95em; \n}\nth , tr, td {\n    white-space: normal !important;\n}\nth {\n    font-size:0.90em; \n}\n\n.ff th , tr, td{\n    font-size:0.90em; \n    white-space: normal !important;\n}\n\n.ff div.section.p  {\n    font-size:0.8em; \n}\n\nhr {\n    border-color: #0000DD;\n    height: 2px;\n}\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "_static/css/custom.css.new",
    "content": "\n.table-smaller-font-size p, strong  {\n    font-size: 90%;  \n}\n\n  \n\ntd, th , tr {\n    white-space: normal !important;\n}\n\n/* Fixes the size of the RTD flyout */\n/* .rst-versions {\n    width: 320px !important;\n} */\n\n/* Content area color */\n.wy-nav-content {\n    background: #ffffff;\n}\n\n/* Scroll Bar*/\n.wy-side-scroll {\n    width: auto;\n    overflow-y: auto;\n    margin-top: 0px;\n}\n\n/* width of the side panel */\n.wy-nav-side {\n    width: 320px;\n}\n\n/* content section full screen */\n.wy-nav-content {\n   max-width: none; \n}\n\n/* set color of left side bar */\n.wy-nav-side,.wy-side-nav-search,.wy-nav-top {\n    /*background: #0079c1; /*005eb8 */\n   background: #ffffff;\n}\n\n/* Change caption color to be more legible */\n.wy-menu > .caption > span.caption-text {\n   color: #000000;\n   font-size: 20px;\n}\n\n/* Change the version color to match caption color */\n.wy-side-nav-search>div.version {\n   color: #000000;\n}\n\n/* Get rid of that ugly yellow highlight color and replace with something more appealing to the eye */\n.highlight .hll {\n   background-color: #ffffff;\n}\n/* \n@media screen and (max-width: 768px) {\n    .wy-nav-content-wrap {\n        margin-left: 0px;\n    }\n    .wy-nav-side {\n        width: 500px;\n    }\n} */\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "_templates/recentposts.html",
    "content": "{% if ablog %}\n<h3>\n  <a href=\"{{ pathto(ablog.blog_path) }}\">{{ gettext('Recent Posts') }}</a>\n</h3>\n<ul>\n  {% set pcount = 1 %} {% for recent in ablog.recent(10, pagename) %}\n  <li>\n    <a href=\"{{ pathto(recent.docname) }}{{ anchor(recent) }}\"\n      >{{ recent.title }}</a\n    >\n  </li>\n  {% endfor %}\n</ul>\n{% endif %}\n"
  },
  {
    "path": "_templates/search-field.html",
    "content": "<form id=\"search-form\" class=\"bd-search d-flex align-items-center\" action=\"{{ pathto('search') }}\" method=\"get\"> \n  <i class=\"fa-solid fa-magnifying-glass\"></i>\n  <input type=\"search\"\n         class=\"form-control\"\n         name=\"q\"\n         placeholder='Default Search'\n         aria-label='Default Search'\n         autocomplete=\"off\"\n         autocorrect=\"off\"\n         autocapitalize=\"off\"\n         spellcheck=\"false\"/>\n  <span class=\"search-button__kbd-shortcut\"><kbd class=\"kbd-shortcut__modifier\">Ctrl</kbd>+<kbd>K</kbd></span>\n</form>\n\n<div class=\"search-engine-toggle\">\n  <span class=\"toggle-label\"><b>Search Engine:</b></span>\n  <span class=\"toggle-option\" id=\"default-option\" style=\"font-weight: bold; color: #3cba54;\">Default</span> \n  <label class=\"switch\">\n    <input type=\"checkbox\" id=\"search-type\">\n    <span class=\"slider round\"></span>\n  </label>\n  <span class=\"toggle-option\" id=\"google-option\">Google</span>\n</div>\n\n\n<script>\n  const form = document.getElementById('search-form');\n  const searchType = document.getElementById('search-type');\n  const searchInput = form.querySelector('input[type=\"search\"]');\n  const defaultOption = document.getElementById('default-option');\n  const googleOption = document.getElementById('google-option');\n\n  // Load the saved state from localStorage\n  const savedState = localStorage.getItem('searchType');\n  if (savedState === 'google') {\n    searchType.checked = true;\n    form.action = \"{{ pathto('search-results') }}\";\n    defaultOption.style.fontWeight = 'normal';\n    defaultOption.style.color = 'initial';\n    googleOption.style.fontWeight = 'bold';\n    googleOption.style.color = '#2196F3';\n    searchInput.placeholder = \"Google Search\";\n    searchInput.setAttribute(\"aria-label\", \"Google Search\");\n  } else {\n    // Default state (or if savedState is null)\n    searchType.checked = false; \n    form.action = \"{{ pathto('search') }}\";\n    defaultOption.style.fontWeight = 'bold';\n    defaultOption.style.color = '#3cba54';\n    googleOption.style.fontWeight = 'normal';\n    googleOption.style.color = 'initial';\n    searchInput.placeholder = \"Default Search\";\n    searchInput.setAttribute(\"aria-label\", \"Default Search\"); \n  }\n\n  searchType.addEventListener('change', () => {\n    if (searchType.checked) {\n      const query = form.elements['q'].value; \n      form.action = \"{{ pathto('search-results') }}\";\n\n      // Bold \"Google\" in blue, unbold \"Default\" in default color\n      defaultOption.style.fontWeight = 'normal'; \n      defaultOption.style.color = 'initial'; \n      googleOption.style.fontWeight = 'bold'; \n      googleOption.style.color = '#2196F3'; \n\n      searchInput.placeholder = \"Google Search\";\n      searchInput.setAttribute(\"aria-label\", \"Google Search\");\n      localStorage.setItem('searchType', 'google');\n    } else {\n      form.action = \"{{ pathto('search') }}\"; \n\n      // Bold \"Default\" in green, unbold \"Google\" in default color\n      defaultOption.style.fontWeight = 'bold'; \n      defaultOption.style.color = '#3cba54'; \n      googleOption.style.fontWeight = 'normal';\n      googleOption.style.color = 'initial';\n\n      searchInput.placeholder = \"Default Search\";\n      searchInput.setAttribute(\"aria-label\", \"Default Search\"); \n      localStorage.setItem('searchType', 'default');\n    }\n  });\n</script>\n\n<style>\n.switch {\n  position: relative;\n  display: 
inline-block;\n  width: 35px; \n  height: 18px; \n}\n\n.switch input { \n  opacity: 0;\n  width: 0;\n  height: 0;\n}\n\n.slider {\n  position: absolute;\n  cursor: pointer;\n  top: 0;\n  left: 0;\n  right: 0;\n  bottom: 0;\n  background-color: #3cba54;\n  -webkit-transition: .4s;\n  transition: .4s;\n}\n\n.slider:before {\n  position: absolute;\n  content: \"\";\n  height: 14px; \n  width: 14px; \n  left: 2px; \n  bottom: 2px;\n  background-color: white;\n  -webkit-transition: .4s;\n  transition: .4s;\n}\n\ninput:checked + .slider {\n  background-color: #2196F3;\n}\n\ninput:focus + .slider {\n  box-shadow: 0 0 1px #2196F3;\n}\n\ninput:checked + .slider:before {\n  -webkit-transform: translateX(19px); \n  -ms-transform: translateX(19px);\n  transform: translateX(19px);\n}\n\n.slider.round {\n  border-radius: 34px;\n}\n\n.slider.round:before {\n  border-radius: 50%;\n}\n\n.search-engine-toggle {\n  display: flex;\n  align-items: center; \n  font-size: 80%; \n}\n\n.toggle-label {\n  margin-right: 5px; \n}\n\n.toggle-option {\n  margin-right: 5px; \n}\n\n#google-option {\n  margin-left: 5px; \n}\n</style>\n"
  },
  {
    "path": "_templates/search-google.html",
    "content": "{%- extends \"page.html\" %}\n{# Over-ride the body to be custom search structure we want #}\n{% block docs_body %}\n  <div class=\"bd-search-container\">\n    <h1>{{ _(\"Search\") }}</h1>\n    <noscript>\n      <div class=\"admonition error\">\n        <p class=\"admonition-title\">{% trans %}Error{% endtrans %}</p>\n        <p>{% trans %}Please activate JavaScript to enable the search\n          functionality.{% endtrans %}</p>\n      </div>\n    </noscript>\n\n    <h2>{{ _('Search Results') }}</h2> \n\n    <script async src=\"https://cse.google.com/cse.js?cx=657ffaecc36684ee1\">\n    </script>\n    <div class=\"gcse-searchresults-only\"></div> \n    <div id=\"search-results\"></div>\n  </div>\n<script>\n\n// Activate the search field on page load\nlet searchForm2 = document.getElementById('search-form');\nlet searchFormInput2 = searchForm2.querySelector('input[type=\"search\"]');\n\nif (searchFormInput2) {\n  searchFormInput2.focus();\n  searchFormInput2.select();\n  console.log(\"[PST]: Set focus on search field.\");\n  searchFormInput2.value = localStorage.getItem('lastSearchQuery');\n} \n\n</script>\n\n{% endblock docs_body %}\n{# Below sections just re-create the behavior of Sphinx default search #}\n{# Page metadata #}\n{%- block htmltitle -%}\n  <title>{{ _(\"Search\") }} - {{ title or docstitle }}</title>\n{%- endblock htmltitle -%}\n{# Manually include the search JS that Sphinx includes #}\n{% block scripts -%}\n  {{ super() }}\n{%- endblock scripts %}"
  },
  {
    "path": "_templates/search.html",
    "content": "{%- extends \"page.html\" %}\n{# Over-ride the body to be custom search structure we want #}\n{% block docs_body %}\n  <div class=\"bd-search-container\">\n    <h1>{{ _(\"Search\") }}</h1>\n    <noscript>\n      <div class=\"admonition error\">\n        <p class=\"admonition-title\">{% trans %}Error{% endtrans %}</p>\n        <p>{% trans %}Please activate JavaScript to enable the search functionality.{% endtrans %}</p>\n      </div>\n    </noscript>\n    <div id=\"search-results\"></div>\n  </div>\n  <script>\n// Activate the search field on page load\nlet searchForm3 = document.getElementById('search-form');\nlet searchFormInput3 = searchForm3.querySelector('input[type=\"search\"]');\n\n\nif (searchFormInput3) {\n  searchFormInput3.focus();\n  searchFormInput3.select();\n  console.log(\"[PST]: Set focus on search field3.\");\n  searchFormInput3.value = localStorage.getItem('lastSearchQuery');\n} \n  </script>\n{% endblock docs_body %}\n{# Below sections just re-create the behavior of Sphinx default search #}\n{# Page metadata #}\n{%- block htmltitle -%}\n  <title>{{ _(\"Search\") }} - {{ title or docstitle }}</title>\n{%- endblock htmltitle -%}\n{# Manually include the search JS that Sphinx includes #}\n{% block scripts -%}\n  {{ super() }}\n  <script src=\"{{ pathto('_static/searchtools.js', 1) }}\"></script>\n  <script src=\"{{ pathto('_static/language_data.js', 1) }}\"></script>\n  <script src=\"{{ pathto('searchindex.js', 1) }}\"></script>\n{%- endblock scripts %}"
  },
  {
    "path": "_utilities/JIRA_SETUP_QUICKSTART.md",
    "content": "# Jira Integration Quick Start\n\n## Prerequisites Check\n\nRun these commands to verify you have everything installed:\n\n```bash\n# Check AWS CLI\naws --version\n\n# Check ada credentials tool\nada --version\n\n# Check Python 3\npython3 --version\n\n# Check if uvx is available (for MCP server)\nuvx --version\n```\n\nIf any are missing, install them:\n```bash\n# AWS CLI\nbrew install awscli\n\n# ada credentials tool\ntoolbox install ada\n\n# uv (includes uvx)\nbrew install uv\n```\n\n## One-Time Setup\n\n### 1. Configure Ada Credentials\n\n```bash\nada credentials setup\n```\n\nWhen prompted:\n- **Account**: 621547421844\n- **Role**: Admin\n- **Profile name**: kaena\n\n### 2. Add Kaena Profile to AWS Config\n\n```bash\necho '[profile kaena]\ncredential_process='$HOME'/.toolbox/bin/ada credentials print --profile=kaena' >> ~/.aws/config\n```\n\n### 3. Run the Setup Script\n\n```bash\ncd /path/to/aws-neuron-sdk-staging\nchmod +x _utilities/setup_jira_token.sh\n./_utilities/setup_jira_token.sh\n```\n\nThis script will:\n- Fetch the Jira API token from AWS Secrets Manager\n- Update your MCP configuration with the token\n- Verify everything is set up correctly\n\n### 4. Restart Kiro\n\nAfter running the setup script, restart Kiro CLI to load the new MCP server.\n\n## Using Jira in Kiro\n\nOnce set up, you can use Kiro Powers to interact with Jira:\n\n```bash\n# In Kiro CLI, check available powers\nkiro powers list\n\n# Look for Atlassian/Jira related tools\n```\n\n## Manual Verification\n\nTo manually verify the setup worked:\n\n```bash\n# Check MCP config has Jira server\ncat ~/.kiro/settings/mcp.json | grep -A 10 atlassian-jira\n\n# Test AWS Secrets Manager access\nexport AWS_PROFILE=kaena\naws secretsmanager get-secret-value \\\n    --secret-id NKI_JIRA_API_TOKEN \\\n    --region us-west-2 \\\n    --query SecretString \\\n    --output text\n```\n\n## Troubleshooting\n\n### \"Error: Failed to fetch Jira API token\"\n\n1. Verify ada credentials are set up:\n   ```bash\n   ada credentials list\n   ```\n\n2. Check AWS profile is configured:\n   ```bash\n   cat ~/.aws/config | grep -A 2 kaena\n   ```\n\n3. Test AWS access:\n   ```bash\n   export AWS_PROFILE=kaena\n   aws sts get-caller-identity\n   ```\n\n### \"MCP server not loading\"\n\n1. Check uvx is installed:\n   ```bash\n   uvx --version\n   ```\n\n2. Manually test the MCP server:\n   ```bash\n   uvx mcp-server-atlassian\n   ```\n\n3. Check Kiro MCP logs (location varies by installation)\n\n## What's Next\n\nAfter setup, you can:\n- Query NKI Jira tickets\n- Create new tickets\n- Update ticket status\n- Search and filter tickets\n- Generate reports\n\nSee the full guide at `.kiro/steering/jira.md` for detailed usage examples.\n"
  },
  {
    "path": "_utilities/add_meta.py",
    "content": "#!/usr/bin/env python3\n\"\"\"Add missing .. meta:: blocks with :description:, :keywords:, and :date-modified: to .rst files.\"\"\"\n\nimport os\nimport re\nimport sys\nfrom pathlib import Path\n\nTODAY = \"2026-03-13\"\n\n# Map file paths to sensible descriptions/keywords based on content\ndef infer_meta(filepath: str, content: str) -> dict:\n    \"\"\"Infer description and keywords from file path and content.\"\"\"\n    rel = filepath.replace(\"frameworks/\", \"\")\n    \n    # Extract title from RST\n    title = \"\"\n    lines = content.split(\"\\n\")\n    title_chars = set(\"=-~^\\\"'`#*+_.\")\n    for i, line in enumerate(lines):\n        stripped = line.rstrip()\n        if (len(stripped) >= 3 and len(set(stripped)) == 1 \n            and stripped[0] in title_chars and i > 0):\n            candidate = lines[i-1].strip()\n            if candidate and not candidate.startswith(\"..\"):\n                title = candidate\n                break\n    \n    # Build description from title or path\n    if title:\n        desc = f\"{title} - AWS Neuron SDK documentation\"\n    else:\n        desc = f\"AWS Neuron SDK documentation for {os.path.basename(filepath).replace('.rst', '').replace('-', ' ')}\"\n    \n    # Build keywords from path components\n    kw_parts = set()\n    if \"torch\" in rel:\n        kw_parts.update([\"PyTorch\", \"AWS Neuron\"])\n    if \"neuronx\" in rel:\n        kw_parts.update([\"torch-neuronx\", \"Trainium\", \"Inferentia\"])\n    if \"jax\" in rel:\n        kw_parts.update([\"JAX\", \"AWS Neuron\", \"JAX NeuronX\"])\n    if \"training\" in rel.lower():\n        kw_parts.add(\"training\")\n    if \"inference\" in rel.lower():\n        kw_parts.add(\"inference\")\n    if \"setup\" in rel.lower() or \"install\" in rel.lower() or \"update\" in rel.lower():\n        kw_parts.add(\"setup\")\n    if \"tutorial\" in rel.lower():\n        kw_parts.add(\"tutorials\")\n    if \"api\" in rel.lower():\n        kw_parts.add(\"API reference\")\n    if \"profil\" in rel.lower():\n        kw_parts.add(\"profiling\")\n    if \"troubleshoot\" in rel.lower():\n        kw_parts.add(\"troubleshooting\")\n    if \"debug\" in rel.lower():\n        kw_parts.add(\"debugging\")\n    if not kw_parts:\n        kw_parts.update([\"AWS Neuron\", \"machine learning\"])\n    \n    keywords = \", \".join(sorted(kw_parts))\n    \n    return {\"description\": desc, \"keywords\": keywords}\n\n\ndef has_meta_field(content: str, field: str) -> bool:\n    \"\"\"Check if a .. meta:: block contains a specific field.\"\"\"\n    return bool(re.search(rf\"^\\s+:{field}:\", content, re.MULTILINE))\n\n\ndef process_file(filepath: str, dry_run: bool = False):\n    \"\"\"Process a single .rst file to ensure it has complete meta block.\"\"\"\n    with open(filepath, \"r\", encoding=\"utf-8\", errors=\"replace\") as f:\n        content = f.read()\n    \n    # Skip include-only fragments (no title, very short)\n    if len(content.strip()) < 50:\n        print(f\"  SKIP (fragment): {filepath}\")\n        return False\n    \n    has_meta = \".. 
meta::\" in content\n    has_desc = has_meta_field(content, \"description\")\n    has_kw = has_meta_field(content, \"keywords\")\n    has_date = has_meta_field(content, \"date-modified\")\n    \n    if has_desc and has_kw and has_date:\n        print(f\"  OK (complete): {filepath}\")\n        return False\n    \n    meta = infer_meta(filepath, content)\n    \n    if has_meta:\n        # Meta block exists but missing fields — add them\n        missing = []\n        if not has_desc:\n            missing.append(f\"   :description: {meta['description']}\")\n        if not has_kw:\n            missing.append(f\"   :keywords: {meta['keywords']}\")\n        if not has_date:\n            missing.append(f\"   :date-modified: {TODAY}\")\n        \n        insert_text = \"\\n\".join(missing)\n        \n        # Find the end of the existing meta block (last line starting with :field:)\n        lines = content.split(\"\\n\")\n        meta_start = -1\n        meta_last_field = -1\n        for i, line in enumerate(lines):\n            if line.strip() == \".. meta::\":\n                meta_start = i\n            elif meta_start >= 0 and re.match(r\"\\s+:\\w\", line):\n                meta_last_field = i\n            elif meta_start >= 0 and meta_last_field >= 0 and not line.strip().startswith(\":\") and not (line.strip() and not line[0].isspace()):\n                break\n        \n        if meta_last_field >= 0:\n            lines.insert(meta_last_field + 1, insert_text)\n            new_content = \"\\n\".join(lines)\n        else:\n            # Fallback: insert after .. meta:: line\n            new_content = content.replace(\".. meta::\", f\".. meta::\\n{insert_text}\", 1)\n    else:\n        # No meta block at all — add one at the top (after any labels)\n        lines = content.split(\"\\n\")\n        insert_idx = 0\n        \n        # Skip leading labels (.. _label:) and blank lines\n        for i, line in enumerate(lines):\n            stripped = line.strip()\n            if stripped.startswith(\".. _\") and stripped.endswith(\":\"):\n                insert_idx = i + 1\n            elif stripped == \"\" and i <= insert_idx + 1:\n                insert_idx = i + 1\n            else:\n                break\n        \n        meta_block = (\n            f\"\\n.. 
meta::\\n\"\n            f\"   :description: {meta['description']}\\n\"\n            f\"   :keywords: {meta['keywords']}\\n\"\n            f\"   :date-modified: {TODAY}\\n\\n\"\n        )\n        \n        lines.insert(insert_idx, meta_block)\n        new_content = \"\\n\".join(lines)\n    \n    action = \"UPDATE\" if has_meta else \"ADD\"\n    fields = []\n    if not has_desc: fields.append(\"description\")\n    if not has_kw: fields.append(\"keywords\")\n    if not has_date: fields.append(\"date-modified\")\n    print(f\"  {action} ({', '.join(fields)}): {filepath}\")\n    \n    if not dry_run:\n        with open(filepath, \"w\", encoding=\"utf-8\") as f:\n            f.write(new_content)\n    \n    return True\n\n\ndef main():\n    import argparse\n    parser = argparse.ArgumentParser(description=\"Add meta blocks to .rst files\")\n    parser.add_argument(\"directory\", default=\"frameworks\", nargs=\"?\")\n    parser.add_argument(\"--dry-run\", action=\"store_true\", help=\"Show what would change without writing\")\n    args = parser.parse_args()\n    \n    root = Path(args.directory)\n    rst_files = sorted(root.rglob(\"*.rst\"))\n    \n    print(f\"Scanning {len(rst_files)} .rst files in {root}/:\")\n    changed = 0\n    for f in rst_files:\n        if process_file(str(f), dry_run=args.dry_run):\n            changed += 1\n    \n    print(f\"\\n{'Would change' if args.dry_run else 'Changed'} {changed} file(s).\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "_utilities/audit_frameworks.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nAudit script for the /frameworks directory of the AWS Neuron SDK documentation.\n\nDetects orphaned pages (not referenced by any toctree, :doc:, :ref:, or\n.. include:: directive) and stale pages (containing outdated references).\n\nUsage:\n    python3 _utilities/audit_frameworks.py --root . --output audit-report.md\n\"\"\"\n\nimport argparse\nimport os\nimport re\nfrom pathlib import Path\n\n\n# ---------------------------------------------------------------------------\n# Reference extraction helpers\n# ---------------------------------------------------------------------------\n\n# Regex patterns for RST directives and roles\nTOCTREE_BLOCK_RE = re.compile(r\"^\\.\\.\\s+toctree::\", re.MULTILINE)\nDOC_ROLE_RE = re.compile(r\":doc:`(?:[^<`]*<)?(/[^>`]+|[^>`/][^>`]*)`\")\nREF_ROLE_RE = re.compile(r\":ref:`(?:[^<`]*<)?([^>`]+)`\")\nINCLUDE_RE = re.compile(r\"^\\.\\.\\s+include::\\s+(.+)$\", re.MULTILINE)\nLABEL_RE = re.compile(r\"^\\.\\.\\s+_([a-zA-Z0-9_-]+)\\s*:\", re.MULTILINE)\n\n\ndef _resolve_path(ref: str, referencing_file: Path, root: Path) -> str | None:\n    \"\"\"Resolve a toctree/doc/include reference to a repo-relative path.\"\"\"\n    ref = ref.strip()\n    if not ref:\n        return None\n\n    # Absolute path (starts with /)\n    if ref.startswith(\"/\"):\n        resolved = ref.lstrip(\"/\")\n    else:\n        # Relative to the directory of the referencing file\n        ref_dir = referencing_file.parent.relative_to(root)\n        resolved = str(ref_dir / ref)\n\n    # Normalise (collapse ..)\n    resolved = os.path.normpath(resolved)\n    return resolved\n\n\ndef _resolve_to_files(base: str, root: Path) -> list[str]:\n    \"\"\"Given a resolved base path, return candidate file paths that exist.\"\"\"\n    candidates = []\n    # Direct file match (already has extension)\n    if (root / base).is_file():\n        candidates.append(base)\n        return candidates\n\n    # Try common extensions\n    for ext in (\".rst\", \".ipynb\", \".txt\"):\n        p = base + ext\n        if (root / p).is_file():\n            candidates.append(p)\n\n    # Could be a directory with index.rst\n    idx = os.path.join(base, \"index.rst\")\n    if (root / idx).is_file():\n        candidates.append(idx)\n\n    return candidates\n\n\ndef extract_toctree_entries(content: str, filepath: Path, root: Path) -> set[str]:\n    \"\"\"Extract all file paths referenced in toctree directives.\"\"\"\n    referenced: set[str] = set()\n    lines = content.split(\"\\n\")\n    i = 0\n    while i < len(lines):\n        if TOCTREE_BLOCK_RE.match(lines[i]):\n            # Skip toctree options (lines starting with : or blank within indent)\n            i += 1\n            # Skip blank lines and option lines\n            while i < len(lines):\n                stripped = lines[i].strip()\n                if stripped == \"\" or stripped.startswith(\":\"):\n                    i += 1\n                    continue\n                break\n            # Now read toctree entries (indented non-empty lines)\n            while i < len(lines):\n                line = lines[i]\n                stripped = line.strip()\n                if stripped == \"\":\n                    i += 1\n                    continue\n                # Check if still indented (part of toctree body)\n                if line[0] in (\" \", \"\\t\"):\n                    # Entry may have a title: \"Title <path>\" or just \"path\"\n                    entry = stripped\n                    m = 
re.match(r\".*<(.+)>\", entry)\n                    if m:\n                        entry = m.group(1).strip()\n                    # Resolve the path\n                    resolved = _resolve_path(entry, filepath, root)\n                    if resolved:\n                        for f in _resolve_to_files(resolved, root):\n                            referenced.add(f)\n                    i += 1\n                else:\n                    break\n        else:\n            i += 1\n    return referenced\n\n\ndef extract_doc_refs(content: str, filepath: Path, root: Path) -> set[str]:\n    \"\"\"Extract all file paths referenced via :doc: roles.\"\"\"\n    referenced: set[str] = set()\n    for m in DOC_ROLE_RE.finditer(content):\n        ref = m.group(1).strip()\n        resolved = _resolve_path(ref, filepath, root)\n        if resolved:\n            for f in _resolve_to_files(resolved, root):\n                referenced.add(f)\n    return referenced\n\n\ndef extract_include_refs(content: str, filepath: Path, root: Path) -> set[str]:\n    \"\"\"Extract all file paths referenced via .. include:: directives.\"\"\"\n    referenced: set[str] = set()\n    for m in INCLUDE_RE.finditer(content):\n        ref = m.group(1).strip()\n        resolved = _resolve_path(ref, filepath, root)\n        if resolved:\n            for f in _resolve_to_files(resolved, root):\n                referenced.add(f)\n    return referenced\n\n\ndef extract_ref_labels(content: str) -> set[str]:\n    \"\"\"Extract all :ref: label targets from content.\"\"\"\n    return set(m.group(1) for m in REF_ROLE_RE.finditer(content))\n\n\ndef extract_label_definitions(content: str) -> set[str]:\n    \"\"\"Extract all label definitions (.. _label:) from content.\"\"\"\n    return set(m.group(1) for m in LABEL_RE.finditer(content))\n\n\n# ---------------------------------------------------------------------------\n# Orphan detection\n# ---------------------------------------------------------------------------\n\ndef find_all_framework_files(root: Path) -> tuple[set[str], set[str], set[str]]:\n    \"\"\"Find all .rst, .ipynb, and .txt files under frameworks/.\n\n    Returns (rst_files, ipynb_files, txt_files) as repo-relative paths.\n    \"\"\"\n    rst_files: set[str] = set()\n    ipynb_files: set[str] = set()\n    txt_files: set[str] = set()\n    fw_dir = root / \"frameworks\"\n    if not fw_dir.is_dir():\n        return rst_files, ipynb_files, txt_files\n    for p in fw_dir.rglob(\"*\"):\n        if not p.is_file():\n            continue\n        rel = str(p.relative_to(root))\n        if \"__pycache__\" in rel:\n            continue\n        if p.suffix == \".rst\":\n            rst_files.add(rel)\n        elif p.suffix == \".ipynb\":\n            ipynb_files.add(rel)\n        elif p.suffix == \".txt\":\n            txt_files.add(rel)\n    return rst_files, ipynb_files, txt_files\n\n\ndef collect_all_references(root: Path) -> tuple[set[str], set[str], set[str]]:\n    \"\"\"Scan ALL .rst and .txt files in the repo to collect references.\n\n    Returns (toctree_and_doc_refs, include_refs, ref_labels_used).\n    We scan the entire repo (not just /frameworks) so that references\n    from root index.rst, setup/, about-neuron/, etc. 
are captured.\n    \"\"\"\n    toctree_doc_refs: set[str] = set()\n    include_refs: set[str] = set()\n    ref_labels_used: set[str] = set()\n\n    # Directories to skip entirely\n    skip_dirs = {\"_build\", \".git\", \"venv\", \".venv\", \"__pycache__\", \".kiro\",\n                 \".vscode\", \".github\", \"node_modules\", \"_backup-rn\"}\n\n    for ext in (\"*.rst\", \"*.txt\"):\n        for p in root.rglob(ext):\n            # Skip files in excluded directories\n            rel = str(p.relative_to(root))\n            parts = Path(rel).parts\n            if any(part in skip_dirs for part in parts):\n                continue\n            try:\n                content = p.read_text(encoding=\"utf-8\", errors=\"replace\")\n            except Exception:\n                continue\n\n            toctree_doc_refs |= extract_toctree_entries(content, p, root)\n            toctree_doc_refs |= extract_doc_refs(content, p, root)\n            include_refs |= extract_include_refs(content, p, root)\n            ref_labels_used |= extract_ref_labels(content)\n\n    return toctree_doc_refs, include_refs, ref_labels_used\n\n\ndef build_label_to_file_map(root: Path) -> dict[str, str]:\n    \"\"\"Build a mapping from :ref: label -> repo-relative file path.\n\n    Only scans files under frameworks/ since we only need to know\n    which framework files are referenced via :ref:.\n    \"\"\"\n    label_map: dict[str, str] = {}\n    fw_dir = root / \"frameworks\"\n    if not fw_dir.is_dir():\n        return label_map\n    for p in fw_dir.rglob(\"*.rst\"):\n        rel = str(p.relative_to(root))\n        try:\n            content = p.read_text(encoding=\"utf-8\", errors=\"replace\")\n        except Exception:\n            continue\n        for label in extract_label_definitions(content):\n            label_map[label] = rel\n    return label_map\n\n\ndef detect_orphans(root: Path) -> list[dict]:\n    \"\"\"Detect orphaned pages under /frameworks.\n\n    Returns a list of dicts with keys: path, type, reason, action.\n    \"\"\"\n    rst_files, ipynb_files, txt_files = find_all_framework_files(root)\n    toctree_doc_refs, include_refs, ref_labels_used = collect_all_references(root)\n    label_map = build_label_to_file_map(root)\n\n    # Files referenced via :ref: labels\n    ref_referenced_files: set[str] = set()\n    for label in ref_labels_used:\n        if label in label_map:\n            ref_referenced_files.add(label_map[label])\n\n    # All referenced content files (rst + ipynb)\n    all_content_refs = toctree_doc_refs | ref_referenced_files\n    # All referenced include files (txt)\n    all_include_refs = include_refs\n\n    orphans: list[dict] = []\n\n    # Check .rst and .ipynb files against toctree/doc/ref references\n    for f in sorted(rst_files | ipynb_files):\n        if f not in all_content_refs and f not in all_include_refs:\n            ext = Path(f).suffix\n            orphans.append({\n                \"path\": f,\n                \"type\": ext,\n                \"reason\": \"Not in any toctree or cross-reference\",\n                \"action\": \"Delete\",\n            })\n\n    # Check .txt files against include references only\n    for f in sorted(txt_files):\n        if f not in all_include_refs:\n            orphans.append({\n                \"path\": f,\n                \"type\": \".txt (include fragment)\",\n                \"reason\": \"Not referenced by any .. 
include:: directive\",\n                \"action\": \"Delete\",\n            })\n\n    return orphans\n\n\n# ---------------------------------------------------------------------------\n# Stale page detection\n# ---------------------------------------------------------------------------\n\n# Staleness indicator patterns\nSTALE_OS_RE = re.compile(\n    r\"Ubuntu\\s+18\\.04|Ubuntu\\s+20\\.04|Amazon\\s+Linux\\s+2(?!\\s*023)(?!\\s*\\d{3})\\b\",\n    re.IGNORECASE,\n)\nSTALE_PYTHON_RE = re.compile(\n    r\"Python\\s+3\\.[0-9](?!\\d)\\b\",  # matches Python 3.0 through 3.9\n)\nSTALE_SDK_RE = re.compile(r\"Neuron\\s+SDK\\s+2\\.(\\d+)\")\nTORCH_NEURON_SETUP_RE = re.compile(\n    r\"torch-neuron.*(?:setup|install|update)\",\n    re.IGNORECASE,\n)\nNEURON_CC_RE = re.compile(r\"\\bneuron-cc\\b\")\n\n\ndef _check_stale_python(content: str) -> list[str]:\n    \"\"\"Find references to Python versions below 3.10.\"\"\"\n    indicators = []\n    for m in STALE_PYTHON_RE.finditer(content):\n        ver_str = m.group(0)\n        # Extract minor version\n        minor = int(ver_str.split(\".\")[-1])\n        if minor < 10:\n            indicators.append(ver_str)\n    return list(set(indicators))\n\n\ndef _check_stale_sdk(content: str) -> list[str]:\n    \"\"\"Find references to Neuron SDK versions older than 2.20.\"\"\"\n    indicators = []\n    for m in STALE_SDK_RE.finditer(content):\n        ver = int(m.group(1))\n        if ver < 20:\n            indicators.append(m.group(0))\n    return list(set(indicators))\n\n\ndef _check_stale_os(content: str) -> list[str]:\n    \"\"\"Find references to unsupported OS versions.\"\"\"\n    return list(set(m.group(0) for m in STALE_OS_RE.finditer(content)))\n\n\ndef _check_torch_neuron_unsupported_os(content: str) -> list[str]:\n    \"\"\"Flag torch-neuron setup/update instructions for unsupported OS.\"\"\"\n    indicators = []\n    if TORCH_NEURON_SETUP_RE.search(content):\n        os_refs = _check_stale_os(content)\n        if os_refs:\n            indicators.append(\n                f\"torch-neuron setup/update with unsupported OS: {', '.join(os_refs)}\"\n            )\n    return indicators\n\n\ndef _check_neuron_cc(content: str) -> list[str]:\n    \"\"\"Flag deprecated neuron-cc references.\"\"\"\n    if NEURON_CC_RE.search(content):\n        return [\"References deprecated neuron-cc compiler\"]\n    return []\n\n\ndef detect_stale_pages(root: Path) -> list[dict]:\n    \"\"\"Detect stale pages under /frameworks.\n\n    Returns a list of dicts with keys: path, indicators, recommendation.\n    \"\"\"\n    stale: list[dict] = []\n    fw_dir = root / \"frameworks\"\n    if not fw_dir.is_dir():\n        return stale\n\n    for p in fw_dir.rglob(\"*\"):\n        if not p.is_file():\n            continue\n        if p.suffix not in (\".rst\", \".txt\"):\n            continue\n        rel = str(p.relative_to(root))\n        try:\n            content = p.read_text(encoding=\"utf-8\", errors=\"replace\")\n        except Exception:\n            continue\n\n        indicators: list[str] = []\n        indicators.extend(_check_stale_os(content))\n        indicators.extend(_check_stale_python(content))\n        indicators.extend(_check_stale_sdk(content))\n        indicators.extend(_check_torch_neuron_unsupported_os(content))\n        indicators.extend(_check_neuron_cc(content))\n\n        if indicators:\n            # Determine recommendation\n            is_archival = (\n                \"mxnet-neuron/\" in rel\n                or \"tensorflow/\" in rel\n                or 
(\"torch-neuron/\" in rel and \"torch-neuronx/\" not in rel)\n            )\n            if is_archival:\n                rec = \"Will be archived\"\n            else:\n                rec = \"Update or archive\"\n            stale.append({\n                \"path\": rel,\n                \"indicators\": \"; \".join(sorted(set(indicators))),\n                \"recommendation\": rec,\n            })\n\n    return sorted(stale, key=lambda x: x[\"path\"])\n\n\n# ---------------------------------------------------------------------------\n# Report generation\n# ---------------------------------------------------------------------------\n\ndef generate_report(orphans: list[dict], stale: list[dict]) -> str:\n    \"\"\"Generate the audit report as Markdown.\"\"\"\n    lines: list[str] = []\n    lines.append(\"# Frameworks Audit Report\\n\")\n\n    # Orphaned pages\n    lines.append(\"## Orphaned Pages\\n\")\n    if orphans:\n        lines.append(\"| File Path | Type | Reason | Action |\")\n        lines.append(\"|---|---|---|---|\")\n        for o in orphans:\n            lines.append(\n                f\"| {o['path']} | {o['type']} | {o['reason']} | {o['action']} |\"\n            )\n    else:\n        lines.append(\"No orphaned pages detected.\\n\")\n\n    lines.append(\"\")\n\n    # Stale pages\n    lines.append(\"## Stale Pages\\n\")\n    if stale:\n        lines.append(\"| File Path | Staleness Indicators | Recommendation |\")\n        lines.append(\"|---|---|---|\")\n        for s in stale:\n            lines.append(\n                f\"| {s['path']} | {s['indicators']} | {s['recommendation']} |\"\n            )\n    else:\n        lines.append(\"No stale pages detected.\\n\")\n\n    lines.append(\"\")\n    return \"\\n\".join(lines)\n\n\n# ---------------------------------------------------------------------------\n# CLI\n# ---------------------------------------------------------------------------\n\ndef main():\n    parser = argparse.ArgumentParser(\n        description=\"Audit /frameworks for orphaned and stale pages.\"\n    )\n    parser.add_argument(\n        \"--root\",\n        default=\".\",\n        help=\"Repository root directory (default: current directory)\",\n    )\n    parser.add_argument(\n        \"--output\",\n        default=\"audit-report.md\",\n        help=\"Output file path for the audit report (default: audit-report.md)\",\n    )\n    args = parser.parse_args()\n\n    root = Path(args.root).resolve()\n    print(f\"Auditing frameworks under: {root}\")\n\n    orphans = detect_orphans(root)\n    print(f\"Found {len(orphans)} orphaned page(s).\")\n\n    stale = detect_stale_pages(root)\n    print(f\"Found {len(stale)} stale page(s).\")\n\n    report = generate_report(orphans, stale)\n    output_path = Path(args.output)\n    if not output_path.is_absolute():\n        output_path = root / output_path\n    output_path.write_text(report, encoding=\"utf-8\")\n    print(f\"Audit report written to: {output_path}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "_utilities/check_urls.sh",
    "content": "#!/bin/bash\n\n# Output file\noutput_file=\"url_check_results.txt\"\n\n# Initialize counters\ntotal=0\nworking=0\nnot_found=0\nother=0\n\n# Create output file with header\necho \"URL Status Check Results\" > $output_file\necho \"=========================\" >> $output_file\necho \"\" >> $output_file\n\n# Read each URL from the file\nwhile read url; do\n  # Skip empty lines\n  if [ -z \"$url\" ]; then\n    continue\n  fi\n  \n  # Increment total counter\n  ((total++))\n  \n  # Print progress\n  echo \"Checking $total: $url\"\n  \n  # Use curl to check the URL status\n  status_code=$(curl -s -o /dev/null -w \"%{http_code}\" \"$url\")\n  \n  # Check status code\n  if [ \"$status_code\" -eq 200 ]; then\n    echo \"✓ WORKING: $url\" >> $output_file\n    ((working++))\n  elif [ \"$status_code\" -eq 404 ]; then\n    echo \"✗ NOT FOUND (404): $url\" >> $output_file\n    ((not_found++))\n  else\n    echo \"? OTHER STATUS ($status_code): $url\" >> $output_file\n    ((other++))\n  fi\n  \n  # Small delay to avoid overwhelming the server\n  sleep 0.1\n  \ndone < old-nki-apis.txt\n\n# Write summary\necho \"\" >> $output_file\necho \"\" >> $output_file\necho \"Summary\" >> $output_file\necho \"=======\" >> $output_file\necho \"Total URLs checked: $total\" >> $output_file\necho \"Working URLs: $working\" >> $output_file\necho \"Not found (404) URLs: $not_found\" >> $output_file\necho \"Other status URLs: $other\" >> $output_file\n\necho \"URL check completed. Results saved to $output_file\"\n"
  },
  {
    "path": "_utilities/create_sitemap.py",
    "content": "# v1.0 by dougeric 2025-09-30\n# Script to create sitemap.xml for Sphinx-generated docs; must be run at the root of the docs repo with venv\n\nimport os\nfrom pathlib import Path\nfrom datetime import datetime\n\ndef create_sitemap(root_dir, base_url):\n    \"\"\"\n    This function generates a sitemap.xml file for the given root directory and base URL.\n    It recursively scans all .rst files in the root directory, excluding those in directories\n    starting with \"_\". For each .rst file, it calculates the last modification time, converts\n    the .rst path to the corresponding HTML path, and adds a <url> entry to the sitemap in the\n    format required by Google Search Console.\n    \"\"\"\n    sitemap = ['<?xml version=\"1.0\" encoding=\"UTF-8\"?>',\n               '<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">']\n    \n    for path in Path(root_dir).rglob('*.rst'):\n        # Skip directories starting with \"_\"\n        if any(part.startswith('_') for part in path.parts):\n            continue\n            \n        # Convert .rst path to expected html path\n        rel_path = path.relative_to(root_dir)\n        html_path = str(rel_path).replace('.rst', '.html')\n        \n        # Get file modification time\n        mod_time = datetime.fromtimestamp(os.path.getmtime(path))\n        \n        sitemap.append(f'  <url>')\n        sitemap.append(f'    <loc>{base_url}/{html_path}</loc>')\n        sitemap.append(f'    <lastmod>{mod_time.strftime(\"%Y-%m-%d\")}</lastmod>')\n        sitemap.append(f'  </url>')\n    \n    sitemap.append('</urlset>')\n    return '\\n'.join(sitemap)\n\n# Call the function and write the result to sitemap.xml\nsitemap_content = create_sitemap('./', 'https://awsdocs-neuron.readthedocs-hosted.com/en/latest')\nwith open('sitemap.xml', 'w') as f:\n    f.write(sitemap_content)\nprint(\"\\nsitemap.xml has been created.\\n\")"
  },
  {
    "path": "_utilities/format_build_logs.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nFormat Sphinx Build Logs\n\nThis script checks for Python 3.9 and pip, creates a virtual environment,\nruns sphinx-build, and formats the build log as Markdown with separate\nsections for errors and warnings.\n\"\"\"\n\nimport os\nimport sys\nimport subprocess\nimport re\nimport datetime\nimport platform\nimport shutil\nfrom collections import Counter\nfrom pathlib import Path\n\ndef check_python_version():\n    \"\"\"Check if Python 3.9 is installed.\"\"\"\n    python_version = sys.version_info\n    \n    if python_version.major != 3 or python_version.minor != 9:\n        print(\"Error: Python 3.9 is required.\")\n        \n        if platform.system() == \"Darwin\":  # macOS\n            print(\"To install Python 3.9 on macOS, visit: https://www.python.org/downloads/release/python-3913/\")\n            print(\"Or use Homebrew: brew install python@3.9\")\n        elif platform.system() == \"Windows\":\n            print(\"To install Python 3.9 on Windows, visit: https://www.python.org/downloads/release/python-3913/\")\n        else:\n            print(\"Please install Python 3.9 from: https://www.python.org/downloads/release/python-3913/\")\n            \n        sys.exit(1)\n    \n    return True\n\ndef check_pip_installed():\n    \"\"\"Check if pip is installed.\"\"\"\n    try:\n        subprocess.run([sys.executable, \"-m\", \"pip\", \"--version\"], \n                      check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n        return True\n    except subprocess.CalledProcessError:\n        print(\"Error: pip is not installed.\")\n        print(\"Please install pip: https://pip.pypa.io/en/stable/installation/\")\n        sys.exit(1)\n\ndef find_repo_root():\n    \"\"\"Find the root of the private-aws-neuron-sdk-staging repo.\"\"\"\n    # Start with the current directory\n    current_dir = Path.cwd()\n    \n    # Check if we're already in the repo root\n    if current_dir.name == \"private-aws-neuron-sdk-staging\":\n        return current_dir\n    \n    # Check parent directory\n    parent_dir = current_dir.parent\n    if parent_dir.name == \"private-aws-neuron-sdk-staging\":\n        return parent_dir\n    \n    # Look for the repo in the current directory\n    for item in current_dir.iterdir():\n        if item.is_dir() and item.name == \"private-aws-neuron-sdk-staging\":\n            return item\n    \n    # Look for the repo in the parent directory\n    for item in parent_dir.iterdir():\n        if item.is_dir() and item.name == \"private-aws-neuron-sdk-staging\":\n            return item\n    \n    print(\"Error: Repository 'private-aws-neuron-sdk-staging' not found on local machine.\")\n    sys.exit(1)\n\ndef setup_venv(repo_parent):\n    \"\"\"Create and activate a Python 3.9 virtual environment.\"\"\"\n    venv_path = repo_parent / \"venv\"\n    \n    # Create venv if it doesn't exist\n    if not venv_path.exists():\n        print(f\"Creating virtual environment at {venv_path}...\")\n        try:\n            subprocess.run([sys.executable, \"-m\", \"venv\", str(venv_path)], check=True)\n        except subprocess.CalledProcessError as e:\n            print(f\"Error creating virtual environment: {e}\")\n            sys.exit(1)\n    \n    # Determine the path to the activate script\n    if platform.system() == \"Windows\":\n        activate_script = venv_path / \"Scripts\" / \"activate.bat\"\n        activate_cmd = str(activate_script)\n    else:\n        activate_script = venv_path / \"bin\" / \"activate\"\n        
activate_cmd = f\"source {activate_script}\"\n    \n    print(f\"Virtual environment created at {venv_path}\")\n    print(f\"To activate manually, run: {activate_cmd}\")\n    \n    return venv_path\n\ndef get_venv_python(venv_path):\n    \"\"\"Get the path to the Python executable in the virtual environment.\"\"\"\n    if platform.system() == \"Windows\":\n        return venv_path / \"Scripts\" / \"python.exe\"\n    else:\n        return venv_path / \"bin\" / \"python\"\n\ndef get_venv_pip(venv_path):\n    \"\"\"Get the path to the pip executable in the virtual environment.\"\"\"\n    if platform.system() == \"Windows\":\n        return venv_path / \"Scripts\" / \"pip.exe\"\n    else:\n        return venv_path / \"bin\" / \"pip\"\n\ndef install_requirements(repo_root, venv_pip):\n    \"\"\"Install requirements from requirements.txt.\"\"\"\n    requirements_file = repo_root / \"requirements.txt\"\n    \n    if not requirements_file.exists():\n        print(f\"Error: requirements.txt not found at {requirements_file}\")\n        sys.exit(1)\n    \n    print(\"Installing requirements...\")\n    try:\n        subprocess.run([\n            str(venv_pip), \"install\", \"-r\", str(requirements_file),\n            \"--extra-index-url=https://pypi.org/simple\"\n        ], check=True)\n    except subprocess.CalledProcessError as e:\n        print(f\"Error installing requirements: {e}\")\n        sys.exit(1)\n    \n    print(\"Requirements installed successfully.\")\n\ndef run_sphinx_build(repo_root, venv_path):\n    \"\"\"Run sphinx-build and capture the output.\"\"\"\n    sphinx_build_path = venv_path / \"bin\" / \"sphinx-build\"\n    if platform.system() == \"Windows\":\n        sphinx_build_path = venv_path / \"Scripts\" / \"sphinx-build.exe\"\n    \n    if not sphinx_build_path.exists():\n        print(f\"Error: sphinx-build not found at {sphinx_build_path}\")\n        sys.exit(1)\n    \n    print(\"Running sphinx-build...\")\n    \n    # Create a log file to capture output\n    log_file_path = repo_root / \"sphinx_build_output.log\"\n    \n    try:\n        # Run sphinx-build with output redirected to both terminal and log file\n        with open(log_file_path, 'w') as log_file:\n            process = subprocess.Popen(\n                [str(sphinx_build_path), \"-b\", \"html\", \".\", \"_build/html\", \"-w\", \"warnings.txt\"],\n                cwd=str(repo_root),\n                stdout=subprocess.PIPE,\n                stderr=subprocess.STDOUT,\n                text=True,\n                bufsize=1\n            )\n            \n            # Capture output in real-time\n            output = []\n            for line in process.stdout:\n                print(line, end='')  # Print to terminal\n                log_file.write(line)  # Write to log file\n                output.append(line)\n                \n            process.wait()\n            \n            if process.returncode != 0:\n                print(f\"sphinx-build exited with code {process.returncode}\")\n        \n        # Also read the warnings.txt file if it exists\n        warnings_file = repo_root / \"warnings.txt\"\n        if warnings_file.exists():\n            with open(warnings_file, 'r') as f:\n                warnings_content = f.read()\n                output.append(\"\\n--- WARNINGS FILE CONTENT ---\\n\")\n                output.append(warnings_content)\n        \n        return ''.join(output)\n    except Exception as e:\n        print(f\"Error running sphinx-build: {e}\")\n        sys.exit(1)\n\ndef 
parse_build_log(log_text):\n    \"\"\"Parse the build log to extract errors and warnings.\"\"\"\n    # Save raw log for debugging\n    with open(\"raw_build_log.txt\", \"w\") as f:\n        f.write(log_text)\n    \n    # Check if warnings.txt exists and use it directly\n    warnings_file = Path(\"warnings.txt\")\n    if warnings_file.exists():\n        print(f\"Found warnings.txt file with direct warnings from Sphinx\")\n        with open(warnings_file, 'r') as f:\n            warnings_content = f.read()\n            \n        # Parse warnings.txt which has format: path:line: WARNING: message\n        warnings = []\n        for line in warnings_content.split('\\n'):\n            if not line.strip():\n                continue\n                \n            # Try to match the standard format first\n            match = re.match(r'(.*?):(\\d+): WARNING: (.*)', line)\n            if match:\n                file_path, line_num, message = match.groups()\n                warnings.append({\n                    'file': file_path,\n                    'line': line_num,\n                    'message': message.strip()\n                })\n                print(f\"Standard format match: file={file_path}, line={line_num}, message={message[:50]}...\")\n            else:\n                # Check for the \"document isn't included in any toctree\" pattern\n                # Format: /path/to/file.rst: WARNING: document isn't included in any toctree\n                toctree_match = re.match(r'(.*?): WARNING: (document isn\\'t included in any toctree.*)', line)\n                if toctree_match:\n                    file_path, message = toctree_match.groups()\n                    warnings.append({\n                        'file': file_path,\n                        'line': '0',  # No line number in this format\n                        'message': message.strip()\n                    })\n                    print(f\"Toctree match: file={file_path}, message={message[:50]}...\")\n                else:\n                    # If no match, just add as unknown\n                    warnings.append({\n                        'file': 'unknown',\n                        'line': '0',\n                        'message': line.strip()\n                    })\n                    print(f\"No match: message={line[:50]}...\")\n    else:\n        print(\"No warnings.txt file found, parsing log output directly\")\n        warnings = []\n        lines = log_text.split('\\n')\n        i = 0\n        while i < len(lines):\n            line = lines[i].strip()\n            \n            # Skip empty lines\n            if not line:\n                i += 1\n                continue\n                \n            # Check for the \"document isn't included in any toctree\" pattern\n            # Format: /path/to/file.rst: WARNING: document isn't included in any toctree\n            toctree_match = re.match(r'(.*?): WARNING: (document isn\\'t included in any toctree.*)', line)\n            if toctree_match:\n                file_path, message = toctree_match.groups()\n                warnings.append({\n                    'file': file_path,\n                    'line': '0',  # No line number in this format\n                    'message': message.strip()\n                })\n                i += 1\n                continue\n                \n            # Check for warnings in the raw message\n            # This is for warnings that are already in the log as complete messages\n            raw_warning_match = re.match(r'(.*?): WARNING: (.*)', 
line)\n            if raw_warning_match:\n                file_path, message = raw_warning_match.groups()\n                warnings.append({\n                    'file': file_path,\n                    'line': '0',  # No line number in this format\n                    'message': message.strip()\n                })\n                i += 1\n                continue\n            \n            # Check for standard format: path:line: WARNING: message\n            std_match = re.match(r'(.*?):(\\d+): WARNING: (.*)', line)\n            if std_match:\n                file_path, line_num, message = std_match.groups()\n                warnings.append({\n                    'file': file_path,\n                    'line': line_num,\n                    'message': message.strip()\n                })\n                i += 1\n                continue\n                \n            # Check for alternative format: WARNING: message (path:line)\n            alt_match = re.match(r'WARNING: (.*?) \\((.*?):(\\d+)\\)', line)\n            if alt_match:\n                message, file_path, line_num = alt_match.groups()\n                warnings.append({\n                    'file': file_path,\n                    'line': line_num,\n                    'message': message.strip()\n                })\n                i += 1\n                continue\n                \n            # Check for simple warnings that start with \"WARNING:\"\n            if line.startswith(\"WARNING:\"):\n                message = line[8:].strip()  # Remove \"WARNING: \" prefix\n                \n                # Collect continuation lines\n                i += 1\n                while i < len(lines) and lines[i].strip() and not lines[i].strip().startswith((\"WARNING:\", \"ERROR:\")):\n                    message += \" \" + lines[i].strip()\n                    i += 1\n                    \n                warnings.append({\n                    'file': 'unknown',\n                    'line': '0',\n                    'message': message\n                })\n                continue\n                \n            i += 1\n    \n    # Debug: Print the first few warnings to see what's being parsed\n    print(f\"Parsed {len(warnings)} warnings\")\n    for i, warning in enumerate(warnings[:5]):\n        print(f\"Warning {i+1}: file={warning['file']}, line={warning['line']}, message={warning['message'][:50]}...\")\n    \n    # Debug: Print the warning categories\n    categories = categorize_issues(warnings)\n    print(f\"Warning categories: {categories}\")\n    \n    # Regular expressions for errors\n    error_pattern = re.compile(r'(.*?):(\\d+): (?:ERROR|SEVERE): (.*?)(?:\\n|$)')\n    \n    errors = []\n    lines = log_text.split('\\n')\n    for line in lines:\n        error_match = error_pattern.search(line)\n        if error_match:\n            file_path, line_num, message = error_match.groups()\n            errors.append({\n                'file': file_path,\n                'line': line_num,\n                'message': message.strip()\n            })\n    \n    return errors, warnings\n\ndef categorize_issues(issues):\n    \"\"\"Categorize issues by type.\"\"\"\n    categories = Counter()\n    \n    for issue in issues:\n        # Extract the main category from the message\n        message = issue['message'].lower()\n        \n        if \"undefined label\" in message:\n            categories[\"Undefined Label\"] += 1\n        elif \"unknown document\" in message:\n            categories[\"Unknown Document\"] += 1\n        elif \"duplicate 
label\" in message:\n            categories[\"Duplicate Label\"] += 1\n        elif \"image file not found\" in message:\n            categories[\"Missing Image\"] += 1\n        elif \"toctree contains reference to nonexisting document\" in message:\n            categories[\"Missing Document\"] += 1\n        elif \"document isn't included in any toctree\" in message:\n            categories[\"Document Not in TOC\"] += 1\n        else:\n            categories[\"Other\"] += 1\n    \n    return categories\n\ndef format_markdown(errors, warnings, build_time):\n    \"\"\"Format the build log as Markdown.\"\"\"\n    timestamp = datetime.datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n    \n    error_categories = categorize_issues(errors)\n    warning_categories = categorize_issues(warnings)\n    \n    markdown = f\"# Sphinx Build Log - {timestamp}\\n\\n\"\n    \n    # Build summary\n    markdown += \"## Build Summary\\n\\n\"\n    markdown += f\"- **Build Time**: {build_time:.2f} seconds\\n\"\n    markdown += f\"- **Total Errors**: {len(errors)}\\n\"\n    markdown += f\"- **Total Warnings**: {len(warnings)}\\n\\n\"\n    \n    # Error categories\n    if error_categories:\n        markdown += \"### Error Categories\\n\\n\"\n        for category, count in error_categories.most_common():\n            markdown += f\"- **{category}**: {count}\\n\"\n        markdown += \"\\n\"\n    \n    # Warning categories\n    if warning_categories:\n        markdown += \"### Warning Categories\\n\\n\"\n        for category, count in warning_categories.most_common():\n            markdown += f\"- **{category}**: {count}\\n\"\n        markdown += \"\\n\"\n    \n    # Errors section\n    markdown += \"## Errors\\n\\n\"\n    if errors:\n        for i, error in enumerate(errors, 1):\n            # Format the file path to be more readable\n            file_path = error['file']\n            if file_path.startswith('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):\n                file_path = file_path[len('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):]\n            \n            # Create a more readable header with file and line info\n            if error['file'] != 'unknown':\n                markdown += f\"### Error {i}: {file_path} (line {error['line']})\\n\\n\"\n            else:\n                markdown += f\"### Error {i}\\n\\n\"\n                \n            markdown += f\"```\\n{error['message']}\\n```\\n\\n\"\n    else:\n        markdown += \"No errors found.\\n\\n\"\n    \n    # Warnings section\n    markdown += \"## Warnings\\n\\n\"\n    if warnings:\n        for i, warning in enumerate(warnings, 1):\n            # Format the file path to be more readable\n            file_path = warning['file']\n            if file_path.startswith('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):\n                file_path = file_path[len('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):]\n            \n            # Create a more readable header with file and line info\n            if warning['file'] != 'unknown':\n                if warning['line'] != '0':\n                    markdown += f\"### Warning {i}: {file_path} (line {warning['line']})\\n\\n\"\n                else:\n                    markdown += f\"### Warning {i}: {file_path}\\n\\n\"\n            else:\n                markdown += f\"### Warning {i}\\n\\n\"\n                \n            # Don't include the file path in the message if it's already in the header\n            message = warning['message']\n            if warning['file'] 
!= 'unknown' and message.startswith(warning['file']):\n                # Remove the file path from the message\n                message = message[len(warning['file'])+2:] # +2 for \": \"\n            \n            markdown += f\"```\\n{message}\\n```\\n\\n\"\n    else:\n        markdown += \"No warnings found.\\n\\n\"\n    \n    return markdown\n\ndef main():\n    \"\"\"Main function.\"\"\"\n    print(\"Checking Python version...\")\n    check_python_version()\n    \n    print(\"Checking pip installation...\")\n    check_pip_installed()\n    \n    print(\"Finding repository root...\")\n    repo_root = find_repo_root()\n    repo_parent = repo_root.parent\n    \n    print(f\"Repository found at: {repo_root}\")\n    \n    print(\"Setting up virtual environment...\")\n    venv_path = setup_venv(repo_parent)\n    venv_python = get_venv_python(venv_path)\n    venv_pip = get_venv_pip(venv_path)\n    \n    print(f\"Changing directory to {repo_root}...\")\n    os.chdir(str(repo_root))\n    \n    print(\"Installing requirements...\")\n    install_requirements(repo_root, venv_pip)\n    \n    print(\"Running sphinx-build...\")\n    start_time = datetime.datetime.now()\n    build_log = run_sphinx_build(repo_root, venv_path)\n    end_time = datetime.datetime.now()\n    build_time = (end_time - start_time).total_seconds()\n    \n    print(\"Parsing build log...\")\n    errors, warnings = parse_build_log(build_log)\n    \n    print(\"Formatting build log as Markdown...\")\n    markdown = format_markdown(errors, warnings, build_time)\n    \n    # Write the formatted log to a file\n    timestamp = datetime.datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n    output_file = repo_root / f\"build-log-{timestamp}.md\"\n    \n    with open(output_file, \"w\") as f:\n        f.write(markdown)\n    \n    print(f\"Build log written to {output_file}\")\n    print(f\"Found {len(errors)} errors and {len(warnings)} warnings.\")\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "_utilities/inject_archive_meta.py",
    "content": "#!/usr/bin/env python3\n\"\"\"Inject noindex/nofollow meta directives and deprecation banners into archived .rst files.\"\"\"\n\nimport os\nimport re\nimport sys\n\nMETA_BLOCK = \"\"\".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\"\"\"\n\nWARNING_TEMPLATE = \"\"\"\n.. warning::\n\n   This document is archived. {framework} is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\"\"\"\n\n# Default for backward compatibility\nWARNING_BLOCK = WARNING_TEMPLATE.format(framework=\"MXNet\")\n\n\ndef find_title_end(lines):\n    \"\"\"Find the line index after the RST title underline.\n    \n    RST titles look like:\n        Title Text\n        ==========\n    \n    or with overline:\n        ==========\n        Title Text\n        ==========\n    \n    Returns the index of the line AFTER the title underline, or -1 if not found.\n    \"\"\"\n    title_chars = set('=-~^\"\\'`#*+_.')\n    i = 0\n    while i < len(lines):\n        line = lines[i].rstrip()\n        # Check if this line is an underline (all same char, at least 3 chars)\n        if len(line) >= 3 and len(set(line)) == 1 and line[0] in title_chars:\n            # Check if previous line is text (title) - this is an underline\n            if i > 0 and lines[i-1].strip() and not (len(set(lines[i-1].rstrip())) == 1 and lines[i-1].rstrip()[0] in title_chars):\n                return i + 1\n            # Check if next line is text and line after that is underline (overline pattern)\n            if i + 2 < len(lines) and lines[i+1].strip():\n                next_next = lines[i+2].rstrip()\n                if len(next_next) >= 3 and len(set(next_next)) == 1 and next_next[0] in title_chars:\n                    return i + 3\n        i += 1\n    return -1\n\n\ndef inject_meta_and_warning(filepath, framework=\"MXNet\"):\n    \"\"\"Inject meta block at top and warning after title in an RST file.\"\"\"\n    with open(filepath, 'r') as f:\n        content = f.read()\n    \n    # Skip if already has noindex meta\n    if ':noindex:' in content:\n        print(f\"  SKIP (already has meta): {filepath}\")\n        return\n    \n    warning_block = WARNING_TEMPLATE.format(framework=framework)\n    \n    lines = content.split('\\n')\n    \n    # Separate any leading labels (.. _label:) and blank lines\n    # These need to stay before the meta block\n    label_lines = []\n    content_start = 0\n    for i, line in enumerate(lines):\n        stripped = line.strip()\n        if stripped.startswith('.. _') and stripped.endswith(':'):\n            label_lines.append(line)\n            content_start = i + 1\n        elif stripped == '' and all(l.strip().startswith('.. 
_') for l in lines[:i] if l.strip()):\n            label_lines.append(line)\n            content_start = i + 1\n        else:\n            break\n    \n    # Build the content after labels\n    remaining_lines = lines[content_start:]\n    remaining_content = '\\n'.join(remaining_lines)\n    \n    # Find title end in remaining content\n    title_end = find_title_end(remaining_lines)\n    \n    if title_end >= 0:\n        # Insert warning after title\n        before_title = '\\n'.join(remaining_lines[:title_end])\n        after_title = '\\n'.join(remaining_lines[title_end:])\n        \n        new_remaining = before_title + '\\n' + warning_block + after_title\n    else:\n        # No title found, just add warning at the start of content\n        print(f\"  WARNING: No title found in {filepath}\")\n        new_remaining = warning_block + remaining_content\n    \n    # Reconstruct: labels + meta + content with warning\n    label_section = '\\n'.join(label_lines) + '\\n' if label_lines else ''\n    new_content = label_section + META_BLOCK + new_remaining\n    \n    # Ensure file ends with newline\n    if not new_content.endswith('\\n'):\n        new_content += '\\n'\n    \n    with open(filepath, 'w') as f:\n        f.write(new_content)\n    \n    print(f\"  OK: {filepath}\")\n\n\ndef main():\n    import argparse\n    parser = argparse.ArgumentParser(description='Inject archive meta into .rst files')\n    parser.add_argument('archive_dir', nargs='?', default='archive/mxnet-neuron',\n                        help='Directory containing .rst files to process')\n    parser.add_argument('--framework', default='MXNet',\n                        help='Framework name for the deprecation warning (e.g., MXNet, TensorFlow)')\n    args = parser.parse_args()\n\n    archive_dir = args.archive_dir\n    framework = args.framework\n    \n    rst_files = []\n    for root, dirs, files in os.walk(archive_dir):\n        for fname in files:\n            if fname.endswith('.rst'):\n                rst_files.append(os.path.join(root, fname))\n    \n    rst_files.sort()\n    print(f\"Processing {len(rst_files)} .rst files in {archive_dir}:\")\n    \n    for filepath in rst_files:\n        inject_meta_and_warning(filepath, framework=framework)\n    \n    print(f\"\\nDone. Processed {len(rst_files)} files.\")\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "_utilities/metadata_schema.yaml",
    "content": "# Metadata Schema for AWS Neuron SDK Setup Documentation\n# This schema defines the structured metadata fields used in setup documentation pages\n\nmetadata_fields:\n  # Core identification fields\n  description:\n    type: string\n    required: true\n    description: \"SEO and AI agent description of the page content\"\n    example: \"Install PyTorch Neuron using AWS Deep Learning AMI on Inf2, Trn1, Trn2, Trn3\"\n  \n  keywords:\n    type: array[string]\n    required: true\n    description: \"Comma-separated search terms for discoverability\"\n    example: \"pytorch, neuron, dlami, installation, inf2, trn1, trn2, trn3\"\n  \n  date-modified:\n    type: date\n    required: true\n    format: \"YYYY-MM-DD\"\n    description: \"ISO 8601 date of last modification\"\n    example: \"2026-03-02\"\n  \n  content-type:\n    type: enum\n    required: true\n    values:\n      - navigation-hub\n      - framework-setup-hub\n      - installation-guide\n      - troubleshooting\n      - legacy-guide\n    description: \"Type of documentation page\"\n  \n  # Setup-specific fields\n  framework:\n    type: enum\n    required_for: [installation-guide, framework-setup-hub]\n    values:\n      - pytorch\n      - jax\n      - tensorflow\n      - mxnet\n    description: \"ML framework being documented\"\n    validation: \"Must match parent directory name\"\n  \n  instance-types:\n    type: array[enum]\n    required_for: [installation-guide, framework-setup-hub, navigation-hub]\n    values:\n      - inf1\n      - inf2\n      - trn1\n      - trn2\n      - trn3\n    description: \"Supported AWS instance types\"\n    validation: \"Cannot mix inf1 with inf2/trn1/trn2/trn3\"\n  \n  installation-method:\n    type: enum\n    required_for: [installation-guide]\n    values:\n      - dlami\n      - manual\n      - container\n    description: \"Installation approach documented\"\n  \n  os:\n    type: array[enum]\n    required_for: [installation-guide]\n    values:\n      - ubuntu-24.04\n      - ubuntu-22.04\n      - al2023\n      - rocky-9\n    description: \"Supported operating systems\"\n  \n  python-versions:\n    type: array[string]\n    required: false\n    description: \"Supported Python versions\"\n    example: \"3.10, 3.11, 3.12\"\n  \n  status:\n    type: enum\n    required: false\n    values:\n      - current\n      - beta\n      - legacy\n      - deprecated\n    description: \"Status of the documented feature/hardware\"\n    validation: \"Must be 'legacy' when instance-types contains only inf1\"\n  \n  # AI agent hints\n  task:\n    type: string\n    required: false\n    description: \"Task-based description for AI agents\"\n    example: \"Install PyTorch on Trn1 using DLAMI\"\n  \n  prerequisites:\n    type: array[string]\n    required: false\n    description: \"List of required knowledge/resources\"\n  \n  estimated-time:\n    type: string\n    required: false\n    description: \"Estimated completion time\"\n    example: \"5 minutes\"\n\n# Validation Rules\nvalidation_rules:\n  - rule: \"inf1_separation\"\n    description: \"inf1 cannot be mixed with inf2, trn1, trn2, or trn3\"\n    check: \"If 'inf1' in instance-types, then len(instance-types) == 1\"\n    error_message: \"Cannot mix inf1 with other instance types\"\n  \n  - rule: \"framework_directory_match\"\n    description: \"framework metadata must match parent directory\"\n    check: \"framework value must equal parent directory name\"\n    error_message: \"Framework '{framework}' does not match directory '{directory}'\"\n  \n  - rule: 
\"legacy_status_for_inf1\"\n    description: \"Pages with only inf1 must have legacy status\"\n    check: \"If instance-types == ['inf1'], then status == 'legacy'\"\n    error_message: \"Inf1-only pages must have status: legacy\"\n  \n  - rule: \"legacy_directory_location\"\n    description: \"Legacy content must be in legacy-inf1 directory\"\n    check: \"If status == 'legacy', then path contains '/legacy-inf1/'\"\n    warning_message: \"Legacy content should be in /setup/legacy-inf1/ directory\"\n  \n  - rule: \"installation_guide_completeness\"\n    description: \"Installation guides must have complete metadata\"\n    check: \"If content-type == 'installation-guide', then framework, instance-types, installation-method, and os must be present\"\n    error_message: \"Installation guide missing required metadata: {missing_fields}\"\n  \n  - rule: \"content_type_requirements\"\n    description: \"Each content type has specific required fields\"\n    requirements:\n      navigation-hub: [description, keywords, instance-types, content-type]\n      framework-setup-hub: [description, keywords, framework, instance-types, content-type]\n      installation-guide: [description, keywords, framework, instance-types, installation-method, os, content-type]\n      troubleshooting: [description, keywords, content-type]\n      legacy-guide: [description, keywords, instance-types, status, content-type]\n\n# Usage Examples\nexamples:\n  installation_guide:\n    description: \"Install PyTorch Neuron using AWS DLAMI on Inf2, Trn1, Trn2, Trn3\"\n    keywords: \"pytorch, neuron, dlami, installation, inf2, trn1, trn2, trn3\"\n    framework: \"pytorch\"\n    instance-types: \"inf2, trn1, trn2, trn3\"\n    installation-method: \"dlami\"\n    os: \"ubuntu-24.04, ubuntu-22.04, al2023\"\n    content-type: \"installation-guide\"\n    date-modified: \"2026-03-02\"\n  \n  framework_hub:\n    description: \"Install PyTorch for AWS Neuron on Inf2, Trn1, Trn2, Trn3 instances\"\n    keywords: \"pytorch, neuron, installation, trn1, trn2, trn3, inf2\"\n    framework: \"pytorch\"\n    instance-types: \"inf2, trn1, trn2, trn3\"\n    content-type: \"framework-setup-hub\"\n    date-modified: \"2026-03-02\"\n  \n  legacy_guide:\n    description: \"Legacy installation guide for AWS Inferentia 1 (Inf1) instances\"\n    keywords: \"neuron, inf1, legacy, installation, inferentia\"\n    instance-types: \"inf1\"\n    status: \"legacy\"\n    content-type: \"legacy-guide\"\n    date-modified: \"2026-03-02\"\n"
  },
  {
    "path": "_utilities/migrate_setup_content.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nSetup Content Migration Script\n\nMaps old setup file paths to new framework-first paths and generates\na migration report. This script does NOT move files — it produces a\nreport of what references exist and where they should point.\n\nUsage:\n    python3 _utilities/migrate_setup_content.py [--dry-run] [--fix]\n\nOptions:\n    --dry-run   Show what would be changed without modifying files (default)\n    --fix       Apply changes to files\n\"\"\"\n\nimport argparse\nimport os\nimport re\nimport sys\nfrom pathlib import Path\n\n# Old path → new path mapping\nPATH_MAP = {\n    \"/setup/torch-neuronx\": \"/setup/pytorch/index\",\n    \"/setup/jax-neuronx\": \"/setup/jax/index\",\n    \"/setup/tensorflow-neuronx\": \"/frameworks/tensorflow/index\",\n    \"/setup/setup-neuronx\": \"/setup/index\",\n    \"/setup/setup-neuron\": \"/setup/index\",\n    \"/setup/mxnet-neuron\": \"/archive/mxnet-neuron/index\",\n}\n\n# External URL mapping (for hardcoded URLs in tutorials)\nURL_MAP = {\n    \"setup/torch-neuronx.html\": \"setup/pytorch/index.html\",\n    \"setup/jax-neuronx.html\": \"setup/jax/index.html\",\n}\n\n# Directories to scan\nSCAN_DIRS = [\n    \"about-neuron\",\n    \"frameworks\",\n    \"libraries\",\n    \"tools\",\n    \"compiler\",\n    \"containers\",\n    \"devflows\",\n    \"release-notes\",\n    \"setup\",\n    \"nki\",\n    \"dlami\",\n]\n\n# Directories to skip\nSKIP_DIRS = {\"_build\", \".git\", \"__pycache__\", \".venv\", \"node_modules\"}\n\n\ndef find_rst_files(base_dir: str) -> list[Path]:\n    \"\"\"Find all .rst files in scan directories.\"\"\"\n    files = []\n    for scan_dir in SCAN_DIRS:\n        dir_path = Path(base_dir) / scan_dir\n        if dir_path.exists():\n            for rst_file in dir_path.rglob(\"*.rst\"):\n                if not any(skip in rst_file.parts for skip in SKIP_DIRS):\n                    files.append(rst_file)\n    return sorted(files)\n\n\ndef find_references(content: str, file_path: Path) -> list[dict]:\n    \"\"\"Find old setup path references in file content.\"\"\"\n    refs = []\n\n    # Match :doc: references\n    for old_path, new_path in PATH_MAP.items():\n        pattern = re.compile(\n            rf\":doc:`([^`]*<)?{re.escape(old_path)}(>)?`\", re.IGNORECASE\n        )\n        for match in pattern.finditer(content):\n            line_num = content[: match.start()].count(\"\\n\") + 1\n            refs.append(\n                {\n                    \"file\": str(file_path),\n                    \"line\": line_num,\n                    \"old\": match.group(0),\n                    \"old_path\": old_path,\n                    \"new_path\": new_path,\n                    \"type\": \"doc_ref\",\n                }\n            )\n\n    # Match :ref: references to old labels\n    old_labels = {\n        \"setup-torch-neuronx\": \"pytorch-setup\",\n        \"setup-jax-neuronx\": \"jax-setup\",\n        \"setup-tensorflow-neuronx\": \"tensorflow-setup\",\n    }\n    for old_label, new_label in old_labels.items():\n        pattern = re.compile(rf\":ref:`([^`]*<)?{re.escape(old_label)}(>)?`\")\n        for match in pattern.finditer(content):\n            line_num = content[: match.start()].count(\"\\n\") + 1\n            refs.append(\n                {\n                    \"file\": str(file_path),\n                    \"line\": line_num,\n                    \"old\": match.group(0),\n                    \"old_label\": old_label,\n                    \"new_label\": new_label,\n                    
\"type\": \"ref_label\",\n                }\n            )\n\n    # Match hardcoded URLs\n    for old_url, new_url in URL_MAP.items():\n        if old_url in content:\n            line_num = content[: content.index(old_url)].count(\"\\n\") + 1\n            refs.append(\n                {\n                    \"file\": str(file_path),\n                    \"line\": line_num,\n                    \"old_url\": old_url,\n                    \"new_url\": new_url,\n                    \"type\": \"url\",\n                }\n            )\n\n    return refs\n\n\ndef apply_fix(file_path: Path, refs: list[dict]) -> bool:\n    \"\"\"Apply reference fixes to a file.\"\"\"\n    content = file_path.read_text()\n    modified = False\n\n    for ref in refs:\n        if ref[\"type\"] == \"doc_ref\":\n            old = ref[\"old_path\"]\n            new = ref[\"new_path\"]\n            new_content = content.replace(old, new)\n            if new_content != content:\n                content = new_content\n                modified = True\n        elif ref[\"type\"] == \"url\":\n            old = ref[\"old_url\"]\n            new = ref[\"new_url\"]\n            new_content = content.replace(old, new)\n            if new_content != content:\n                content = new_content\n                modified = True\n\n    if modified:\n        file_path.write_text(content)\n    return modified\n\n\ndef main():\n    parser = argparse.ArgumentParser(description=\"Setup content migration script\")\n    parser.add_argument(\n        \"--fix\", action=\"store_true\", help=\"Apply changes (default is dry-run)\"\n    )\n    args = parser.parse_args()\n\n    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n    rst_files = find_rst_files(base_dir)\n\n    print(f\"Scanning {len(rst_files)} .rst files...\")\n    print()\n\n    all_refs = []\n    for rst_file in rst_files:\n        content = rst_file.read_text()\n        refs = find_references(content, rst_file)\n        all_refs.extend(refs)\n\n    if not all_refs:\n        print(\"No old setup references found. Migration complete.\")\n        return\n\n    # Group by file\n    by_file = {}\n    for ref in all_refs:\n        by_file.setdefault(ref[\"file\"], []).append(ref)\n\n    print(f\"Found {len(all_refs)} references in {len(by_file)} files:\")\n    print()\n\n    for file_path, refs in sorted(by_file.items()):\n        print(f\"  {file_path}:\")\n        for ref in refs:\n            if ref[\"type\"] == \"doc_ref\":\n                print(f\"    L{ref['line']}: {ref['old_path']} → {ref['new_path']}\")\n            elif ref[\"type\"] == \"ref_label\":\n                print(f\"    L{ref['line']}: {ref['old_label']} → {ref['new_label']}\")\n            elif ref[\"type\"] == \"url\":\n                print(f\"    L{ref['line']}: {ref['old_url']} → {ref['new_url']}\")\n        print()\n\n    if args.fix:\n        fixed_count = 0\n        for file_path, refs in by_file.items():\n            if apply_fix(Path(file_path), refs):\n                fixed_count += 1\n                print(f\"  ✓ Fixed: {file_path}\")\n        print(f\"\\nFixed {fixed_count} files.\")\n    else:\n        print(\"Dry run — no files modified. Use --fix to apply changes.\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "_utilities/old-nki-apis.txt",
    "content": "https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.benchmark.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.profile.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.baremetal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.simulate_kernel.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.mod_alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.auto_alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.mod_alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.auto_alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.skip_middle_end_transformations.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.enable_stack_allocator.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.force_auto_alloc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.tensor.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.load.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.store.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.load_transpose2d.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.atomic_rmw.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.copy.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.broadcast_to.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.empty_like.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.zeros_like.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.ones.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.full.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rand.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.random_seed.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.shared_constant.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.shared_identity_matrix.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.arange.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mgrid.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.expand_dims.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.where.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gather_flattened.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.lan
guage.all_reduce.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.par_dim.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.spmd_dim.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.nc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.device_print.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.loop_reduce.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.fp32.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.add.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.subtract.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.multiply.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.divide.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.power.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.maximum.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.minimum.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.max.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.min.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mean.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.var.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sum.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.prod.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.all.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.abs.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.negative.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sign.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.trunc.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.floor.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.ceil.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mod.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.fmod.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.exp.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.log.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.cos.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sin.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.tan.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.tanh.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.arctan.html\nhttps://awsdocs-neuron.readthedocs-
hosted.com/en/v2.26.1/nki/api/generated/nki.language.sqrt.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rsqrt.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sigmoid.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.relu.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_dx.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_apprx_tanh.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_apprx_sigmoid.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.silu.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.silu_dx.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.erf.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.erf_dx.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.softplus.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mish.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.square.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.softmax.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rms_norm.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.dropout.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.matmul.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.transpose.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.reciprocal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_and.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_or.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_xor.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.invert.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.left_shift.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.right_shift.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.equal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.not_equal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.greater.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.greater_equal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.less.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.less_equal.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_and.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logic
al_or.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_xor.html\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_not.html\n\n"
  },
  {
    "path": "_utilities/setup_jira_token.sh",
    "content": "#!/bin/bash\n# Setup script to fetch Jira API token from AWS Secrets Manager\n# and configure it for the Atlassian MCP server\n\nset -e\n\necho \"Setting up Jira API token...\"\n\n# Check if AWS CLI is available\nif ! command -v aws &> /dev/null; then\n    echo \"Error: AWS CLI is not installed\"\n    echo \"Install with: brew install awscli\"\n    exit 1\nfi\n\n# Check if ada is available\nif ! command -v ada &> /dev/null; then\n    echo \"Error: ada credentials tool is not installed\"\n    echo \"Install with: toolbox install ada\"\n    exit 1\nfi\n\n# Set AWS profile to kaena\nexport AWS_PROFILE=kaena\n\necho \"Fetching Jira API token from AWS Secrets Manager...\"\nJIRA_TOKEN=$(aws secretsmanager get-secret-value \\\n    --secret-id NKI_JIRA_API_TOKEN \\\n    --region us-west-2 \\\n    --query SecretString \\\n    --output text 2>&1)\n\nif [ $? -ne 0 ]; then\n    echo \"Error: Failed to fetch Jira API token\"\n    echo \"Make sure you have:\"\n    echo \"  1. Run 'ada credentials setup' with account 621547421844, role Admin, profile kaena\"\n    echo \"  2. Added kaena profile to ~/.aws/config with ada credential_process\"\n    echo \"  3. Have IAM permissions to access the secret\"\n    echo \"\"\n    echo \"Error details:\"\n    echo \"$JIRA_TOKEN\"\n    exit 1\nfi\n\necho \"✓ Successfully fetched Jira API token\"\n\n# Update the MCP config with the actual token\nMCP_CONFIG=\"$HOME/.kiro/settings/mcp.json\"\n\nif [ ! -f \"$MCP_CONFIG\" ]; then\n    echo \"Error: MCP config not found at $MCP_CONFIG\"\n    exit 1\nfi\n\n# Create a temporary file with the token substituted\npython3 << EOF\nimport json\nimport os\n\nconfig_path = os.path.expanduser('$MCP_CONFIG')\nwith open(config_path, 'r') as f:\n    config = json.load(f)\n\n# Update the Jira API token\nif 'atlassian-jira' in config['mcpServers']:\n    config['mcpServers']['atlassian-jira']['env']['JIRA_API_TOKEN'] = '''$JIRA_TOKEN'''\n    \n    with open(config_path, 'w') as f:\n        json.dump(config, f, indent=2)\n    \n    print(\"✓ Updated MCP configuration with Jira API token\")\nelse:\n    print(\"Error: atlassian-jira server not found in MCP config\")\n    exit(1)\nEOF\n\necho \"\"\necho \"Setup complete! You can now use Jira tools in Kiro.\"\necho \"\"\necho \"To use Jira MCP tools:\"\necho \"  1. Restart Kiro CLI\"\necho \"  2. Use Jira tools through the MCP server\"\necho \"\"\necho \"Example queries:\"\necho \"  - Search for NKI tickets\"\necho \"  - Get ticket details\"\necho \"  - Create new tickets\"\n"
  },
  {
    "path": "about-neuron/amazonq-getstarted.rst",
    "content": "\n.. image:: /images/q-logo.png\n       :scale: 30%\n       :alt: Amazon Q\n       :align: left\n       :target: https://aws.amazon.com/q/\n\n.. _amazon-q-dev:\n\nAsk Amazon AI helper tools\n===========================\n\nUse Kiro, Quick, and Amazon Q in the AWS console as your Neuron Experts for general Neuron technical guidance and to jumpstart your NKI kernel developement.\n\n\n.. card:: Ask Q on AWS apps and websites\n            :link: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-on-aws.html\n\n.. card:: Ask Kiro IDE\n            :link: https://kiro.dev/\n\n.. card:: Ask Kiro CLI\n            :link: https://kiro.dev/cli\n\n.. card:: Ask Quick\n            :link: https://aws.amazon.com/quick/\n\n.. card:: Guidelines for Quality Results\n            :link: amazon-q-dev-guidelines\n            :link-type: ref\n\n.. _amazon-q-dev-guidelines:\n\nGuidelines for Quality Results\n------------------------------\n\n1. Be Specific: Clearly state the task, desired output, and any\n   constraints.\n2. Provide Context: Mention specific versions, strategies, and any relevant performance requirements.\n3. Request Complete Code: Ask for full implementations including\n   imports, decorators, and main functions. Remember to always review and test the generated code before using it in\n   production.\n4. Ask for Explanations: Request comments or separate explanations for\n   complex parts of the code.\n5. Iterate: If the initial response isn’t satisfactory, refine your\n   prompt based on the output. If you encounter issues or inaccuracies, consider rephrasing your\n   prompt or breaking down complex tasks into smaller, more specific\n   questions.\n6. Fact check: Use Q as a starting point and supplement its output with official documentation, AWS NKI Samples repository, and your own expertise.\n\nExample Prompts\n~~~~~~~~~~~~~~~~~\n\n.. note::\n   Amazon AI helper tools may not be fully synched with the latest Neuron features. Therefore, they may not always produce optimal or fully accurate results.\n\n1. “Explain the key features and benefits of AWS Neuron Kernel Interface (NKI).”\n2. \"How do different parallelism strategies (data, pipeline, tensor) affect training performance on Neuron?\"\n3. “What are the best practices for optimizing matrix multiplication operations using Neuron Kernel Interface (NKI)?”\n4. “Provide complete Neuron Kernel Interface (NKI) code for a matrix multiplication kernel, including imports, decorators, and explanations of key optimizations. Focus on efficient tiling and data movement strategies.”\n"
  },
  {
    "path": "about-neuron/announcements/index.rst",
    "content": ".. _announcements-main:\n\nAnnouncements\n=============\n\nThis page will be replaced by ABlog. It's here to make sure it's in the TOC.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announce-eol-mx-before-1-5.rst",
    "content": ".. post:: May 01, 2023 01:00\n    :language: en\n    :tags: announce-eol mxnet-neuron\n\n.. _announce-eol-mxnet-before-1-5:\n\nAnnouncing end of support for ``mxnet-neuron`` versions 1.5\n-----------------------------------------------------------\n\n:ref:`Neuron release 2.10 <neuron-2.10.0-whatsnew>` will be the last release that will include ``mxnet-neuron`` versions 1.5. Future Neuron releases will not include ``mxnet-neuron`` versions 1.5\n\nCurrent users of those versions are advised to migrate to latest ``mxnet-neuron`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announce-eol-pt-1-5.rst",
    "content": ".. post:: Mar 25, 2022\n    :language: en\n    :tags: announce-eol torch-neuron\n\n.. _announce-eol-pt-1-5:\n\nAnnouncing end of support for torch-neuron version 1.5 starting with Neuron 1.19.0 release\n------------------------------------------------------------------------------------------\n\nStarting with *Neuron 1.19.0* release, *torch-neuron version 1.5* will no longer be supported. Last release of *torch-neuron version 1.5* will be issued\nas part of *Neuron 1.18.0* release. Current users of those versions are advised to migrate to latest *torch-neuron* version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announce-eol-pt-before-1-8.rst",
    "content": ".. post:: Nov 22, 2022\n    :language: en\n    :tags: announce-eol torch-neuron\n\n.. _announce-eol-pt-before-1-8:\n\nAnnouncing end of support for ``torch-neuron`` versions 1.7 and 1.8\n-------------------------------------------------------------------\n\n:ref:`Neuron release 2.5 <neuron-2.5.0-whatsnew>` will be the last release that will include ``torch-neuron`` versions 1.7 and 1.8. Future Neuron releases will not include ``torch-neuron`` versions 1.7 and 1.8.\n\nCurrent users of those versions are advised to migrate to latest ``torch-neuron`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-5.rst",
    "content": ".. post:: Nov 22, 2022 01:00\n    :language: en\n    :tags: announce-eol tensorflow-neuron\n\n.. _announce-eol-tf-before-2-5:\n\nAnnouncing end of support for ``tensorflow-neuron`` versions 2.5 and 2.6\n------------------------------------------------------------------------\n\n:ref:`Neuron release 2.5 <neuron-2.5.0-whatsnew>` will be the last release that will include ``tensorflow-neuron`` versions 2.5 and 2.6. Future Neuron releases will not include ``tensorflow-neuron`` versions 2.5 and 2.6.\n\nCurrent users of those versions are advised to migrate to latest ``tensorflow-neuron`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-7.rst",
    "content": ".. post:: May 01, 2023 01:00\n    :language: en\n    :tags: announce-eol tensorflow-neuron\n\n.. _announce-eol-tf-before-2-7:\n\nAnnouncing end of support for ``tensorflow-neuron`` versions 2.7\n----------------------------------------------------------------\n\n:ref:`Neuron release 2.10 <neuron-2.10.0-whatsnew>` will be the last release that will include ``tensorflow-neuron`` versions 2.7. Future Neuron releases will not include ``tensorflow-neuron`` versions 2.7\n\nCurrent users of those versions are advised to migrate to latest ``tensorflow-neuron`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/announcements.rst",
    "content": ".. post:: Feb 17, 2022\n    :language: en\n    :tags: announcements\n\n.. _prev-announcements:\n\nPrevious Announcements\n======================\n\n.. contents::  Table of contents\n\t:local:\n\t:depth: 1\n\n.. _maintenance_tf21_tf24:\n\n02/17/2022 - tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4 enter maintenance mode\n------------------------------------------------------------------------------------\n\nStarting with *Neuron 1.17.2* release, *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* are entering maintenance mode.  Future releases of \n*tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will address critical security issues only. Current users of those versions are advised to migrate to \nlatest *tensorflow-neuron* version.\n\n10/27/2021 - Introducing Neuron Runtime 2.x (libnrt.so)  \n-------------------------------------------------------\n\nStarting with *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and is replaced by *Neuron Runtime 2.x*, a shared library named (``libnrt.so``). For more information on Runtime 1.x see  :ref:`Neuron Runtime 1.x enters maintenance mode <maintenance_rtd>`.\n\nFor more information please see :ref:`introduce-libnrt`.\n\n.. _maintenance_rtd:\n\n10/27/2021 - Neuron Runtime 1.x (``neuron-rtd``) enters maintenance mode\n------------------------------------------------------------------------\n\nStarting with *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and replaced \nwith *Neuron Runtime 2.x*, a shared library named ``libnrt.so``. \nFuture releases of *Neuron Runtime 1.x* (``neuron-rtd``) will address critical bug fixes and security issues only. Previous releases of \n*Neuron Runtime 1.x* (``neuron-rtd``) will continue to be available via ``rpm`` and ``deb`` packages.\n\nFor more information please see:\n\n\t* :ref:`introduce-libnrt`\n\t* :ref:`install-guide-index`\n\t* :ref:`neuron-maintenance-policy`\n\n\n.. _maintenance_mxnet_1_5:\n\n10/27/2021 - Neuron support for *Apache MXNet 1.5* enters maintenance mode\n--------------------------------------------------------------------------\n\nStarting *Neuron release 1.16.0*,  Neuron support for *MXNet 1.5* is entering maintenance mode.\nFuture releases of Neuron supporting *MXNet 1.5*  will address critical bug fixes and security issues only.\nPrevious releases of *Apache MXNet 1.5* will continue to be available via ``pip`` packages.\n\nCurrent users of *MXNet Neuron 1.5* can migrate their applications to *MXNet Neuron 1.8*, for more information \nabout MXNet Neuron support and how to upgrade to latest *MXNet Neuron 1.8*, please see visit :ref:`neuron-mxnet`.\n\n\n.. _maintenance_neuron-cli:\n\n10/27/2021 - ``neuron-cli`` enters maintenance mode\n---------------------------------------------------\n\nStarting *Neuron release 1.16.0*, with the introduction of *Neuron Runtime 2.x*, ``neuron-cli`` is entering maintenance mode. ``neuron-cli`` \nfunctionality will be available only if *Neuron Runtime 1.x* (``neuron-rtd``) is being used by the application. If the application is using \n*Neuron Runtime 2.x* shared library(``libnrt.so``), ``neuron-cli`` functionality will not be available.\n\n\nIf you have used ``neuron-cli`` in previous releases, and you are migrating to\nnewer Neuron releases where applications require *Neuron Runtime 2.x* shared library, please see the below :ref:`neuron-cli-mntnce-faq`.\nFuture releases of ``neuron-cli`` will address \ncritical bug fixes and security issues only. 
\n\n.. _eol-ncg:\n\n10/27/2021 - End of support for NeuronCore Groups (NCG)\n-------------------------------------------------------\n\nBefore the introduction of *Neuron Runtime 2.x*, NeuronCore Group (NCG) was used by Neuron Runtime 1.x \nto define an execution group of one or more NeuronCores where models can be loaded and executed. It also provided separation between processes.\n\nWith the introduction of *Neuron Runtime 2.x*, the strict separation of NeuronCores into groups is no longer needed and NeuronCore Groups (NCG) is \ndeprecated. *Neuron Runtime 2.x* enables each process to own a set of NeuronCores, and within each process, Neuron Runtime 2.x supports loading and \nexecuting multiple models on separate, different, or overlapping sets of NeuronCores.\n\nPlease note that the ``NEURONCORE_GROUP_SIZES`` environment variable is in the process of being :ref:`deprecated <eol-ncgs-env>`, and for a transition period \n``NEURONCORE_GROUP_SIZES`` can be used to preserve the old NeuronCore Group behavior. The frameworks will internally convert ``NEURONCORE_GROUP_SIZES`` to \nuse the runtime's new mode of mapping models to NeuronCores.\n\nFor more information, see the details about ``NEURON_RT_VISIBLE_CORES`` at :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt`.\n\n\n.. _eol-ncgs-env:\n\n10/27/2021 - Announcing end of support for ``NEURONCORE_GROUP_SIZES``\n---------------------------------------------------------------------\n\nThe ``NEURONCORE_GROUP_SIZES`` environment variable is in the process of being deprecated; future Neuron releases may no longer support\nthe ``NEURONCORE_GROUP_SIZES`` environment variable. Please start\nusing ``NEURON_RT_VISIBLE_CORES`` instead.\n\nSee :ref:`eol-ncg`, :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.\n\n
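For most applications the change is limited to the environment variable set when the process is launched. The following is a minimal sketch of the migration, assuming a single-process application started from the shell; the script name and core range are illustrative only:\n\n.. code-block:: bash\n\n    # Deprecated: request one group of 4 NeuronCores for this process\n    NEURONCORE_GROUP_SIZES=4 python my_app.py\n\n    # Replacement: pin this process to NeuronCores 0 through 3 explicitly\n    NEURON_RT_VISIBLE_CORES=0-3 python my_app.py\n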
\n\n.. _neuron-cli-mntnce-faq:\n\nFrequently Asked Questions (FAQ)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIs there another tool that provides the same functionality as ``neuron-cli list-model``?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYes, please see :ref:`neuron-ls-ug` or :ref:`neuron-monitor-ug`.\n\nIs there another tool that provides the same functionality as ``neuron-cli create-ncg``, ``neuron-cli destroy-ncg``, and ``neuron-cli list-ncg``?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo, these functionalities are no longer needed with *Neuron Runtime 2.x*. NeuronCore Groups (NCG) :ref:`is deprecated <eol-ncg>` and the ``NEURONCORE_GROUP_SIZES`` environment variable :ref:`is in the process of being deprecated <eol-ncgs-env>`. Please start using ``NEURON_RT_VISIBLE_CORES`` instead. See :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.\n\nIs there another tool that provides the same functionality as ``neuron-cli reset``?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo, this functionality is no longer needed with *Neuron Runtime 2.x*. Before introducing ``libnrt.so``, in certain cases after an application \ncrashed, models had to be unloaded manually by calling ``neuron-cli reset``.\n\nWith ``libnrt.so``, applications run in the context of the ``libnrt.so`` shared library, and when an application exits, the Neuron driver frees all resources associated with the application.\n\n\nFor more information please see:\n\n\t* :ref:`introduce-libnrt`\n\t* :ref:`neuron-tools`\n\t* :ref:`install-guide-index`\n\t* :ref:`neuron-maintenance-policy`\n\n\n.. _eol-conda-packages:\n\n05/28/2021 - End of support for Neuron Conda packages in Deep Learning AMI starting Neuron 1.14.0\n-------------------------------------------------------------------------------------------------\n\n05/28/2021 - Starting with Neuron SDK 1.14.0, we will no longer support conda packages to install the Neuron SDK framework in DLAMI, and we will no longer update the conda packages used to install the Neuron SDK framework (Neuron conda packages) with new versions.\n\nStarting with Neuron SDK 1.14.0, pip packages (Neuron pip packages) will be used to install the Neuron SDK framework in the DLAMI conda environment. To upgrade the Neuron SDK framework, DLAMI users should use pip upgrade commands instead of conda update commands. Instructions are available in this blog and in the Neuron SDK documentation (:ref:`setup-guide-index`).\n\n\nStarting with Neuron SDK 1.14.0, run one of the following commands to upgrade to the latest Neuron framework of your choice:\n\n* To upgrade PyTorch Neuron:\n\n.. code-block::\n\n    source activate aws_neuron_pytorch_p36\n    pip install --upgrade torch-neuron neuron-cc[tensorflow] torchvision --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n* To upgrade TensorFlow Neuron:\n\n.. code-block::\n\n   source activate aws_neuron_tensorflow_p36\n   pip install --upgrade tensorflow-neuron neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n* To upgrade MXNet Neuron:\n\n.. code-block::\n\n   source activate aws_neuron_mxnet_p36\n   pip install --upgrade mxnet-neuron neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com\n\nFor more information please check the `blog <https://aws.amazon.com/blogs/developer/neuron-conda-packages-eol/>`__.\n\n\n\n.. _eol-ubuntu16:\n\n05/01/2021 - End of support for Ubuntu 16 starting Neuron 1.14.0\n----------------------------------------------------------------\n\nUbuntu 16.04 entered end of life phase officially in April 2021 (see https://ubuntu.com/about/release-cycle) and will not receive any public software or security updates. Starting with Neuron SDK 1.14.0, Ubuntu 16 is no longer supported for Neuron. Users who are using Ubuntu 16 are requested to migrate to Ubuntu 18 or Amazon Linux 2.\n\nCustomers who choose to upgrade libc on Ubuntu 16 to work with Neuron v1.13.0 (or higher versions) are highly discouraged from doing so, since Ubuntu 16 will no longer receive public security updates.\n\n.. _eol-classic-tensorboard:\n\n05/01/2021 - End of support for classic TensorBoard-Neuron starting Neuron 1.13.0 and introducing Neuron Plugin for TensorBoard \n-------------------------------------------------------------------------------------------------------------------------------\n\nStarting with Neuron SDK 1.13.0, we are introducing :ref:`Neuron Plugin for TensorBoard <neuron-plugin-tensorboard>` and we will no longer support classic TensorBoard-Neuron. Users are required to migrate to the Neuron Plugin for TensorBoard.\n
\nStarting with Neuron SDK 1.13.0, if you are using TensorFlow-Neuron within the DLAMI Conda environment, attempting to run ``tensorboard`` with the existing version of TensorBoard will fail. Please update the TensorBoard version before installing the Neuron plugin by running ``pip install TensorBoard --force-reinstall``. For installation instructions, see :ref:`neuron-plugin-tensorboard`.\n\nUsers who are using Neuron SDK releases before 1.13.0 can find the classic TensorBoard-Neuron documentation at `Neuron 1.12.2 documentation <https://awsdocs-neuron.readthedocs-hosted.com/en/1.12.2/neuron-guide/neuron-tools/getting-started-tensorboard-neuron.html>`__.\n\n\nFor more information, see :ref:`neuron-tensorboard-rn` and :ref:`neuron-plugin-tensorboard`.\n\n.. _eol_python_3_5:\n\n02/24/2021 - End of support for Python 3.5 \n-------------------------------------------\n\nAs Python 3.5 reached end-of-life in October 2020, and many packages including TorchVision and Transformers have\nstopped support for Python 3.5, we will stop supporting Python 3.5 for frameworks, starting with\nPyTorch-Neuron version :ref:`neuron-torch-11170` in this release. You can continue to use older versions with Python 3.5.\n\n\n11/17/2020 - End of support for ONNX \n------------------------------------\n\nONNX support is limited, and from this version onwards we are not\nplanning to add any additional capabilities to ONNX. We recommend\nrunning models in TensorFlow, PyTorch or MXNet for best performance and\nsupport.\n\n\n07/16/2020 - End of support for PyTorch 1.3 \n--------------------------------------------\n\nStarting with this release, we are ending support for PyTorch 1.3 and migrating to PyTorch 1.5.1. Customers are advised to migrate to PyTorch 1.5.1.\n\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/eol-ncgs-env_2.rst",
    "content": ".. post:: Mar 25, 2022\n    :language: en\n    :tags: announce-eol\n\n\nAnnouncing end of support for ``NEURONCORE_GROUP_SIZES`` starting with Neuron 1.20.0 release\n--------------------------------------------------------------------------------------------\n\nStarting with Neuron SDK 1.20.0, ``NEURONCORE_GROUP_SIZES`` environment variable will no longer be supported. Setting \n``NEURONCORE_GROUP_SIZES`` environment variable will no longer affect applications behavior.\nCurrent customers using ``NEURONCORE_GROUP_SIZES`` environment variable are advised to use ``NEURON_RT_VISIBLE_CORES`` environment variable  or ``NEURON_RT_NUM_CORES`` environment variable instead.\n\nSee :ref:`eol-ncg`, :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron1.x/eol-pt-15.rst",
    "content": ".. post:: Apr 29, 2022\n    :language: en\n    :tags: eol\n\n.. _eol-pt-15:\n\n\nEnd of support for torch-neuron version 1.5\n-------------------------------------------\n\nStarting with *Neuron 1.19.0* release, *torch-neuron 1.5* will no longer be supported, and  \nno further releases of *torch-neuron version 1.5* will be issued.  Current users of torch-neuron version 1.5 are advised to migrate to \nlatest *torch-neuron* version."
  },
  {
    "path": "about-neuron/announcements/neuron1.x/eol-tf-21-24.rst",
    "content": ".. post:: Mar 25, 2022\n    :language: en\n    :tags: eol\n\n.. _eol-tf-21-24:\n\nEnd of support for tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4\n--------------------------------------------------------------------\n\nStarting with *Neuron 1.18.0* release, *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will no longer be supported, and  \nno further releases of *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will be issued.  Current users of those versions are advised to migrate to \nlatest *tensorflow-neuron* version."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-component-change.rst",
    "content": ".. post:: December 21, 2023\n    :language: en\n    :tags: announce-name-change, neuron-component\n\n.. _announce-component-name-change:\n\nAnnouncing Name Change for Neuron Components\n---------------------------------------------\n\nStarting with :ref:`Neuron release 2.16 <neuron-2.16.0-whatsnew>`, the name of the following Neuron components will change as follows:\n\n======================= =================== ====================\nPackage name            Current Name        New Name\n======================= =================== ====================\ntorch-neuronx           PyTorch Neuron      PyTorch NeuronX\ntensorflow-neuronx      TensorFlow Neuron   TensorFlow NeuronX\nneuronx-cc              Neuron Compiler     NeuronX Compiler\naws-neuronx-runtime-lib Neuron Runtime      NeuronX Runtime\ntensorflow-neuronx      Transformers Neuron Transformers NeuronX\nneuronx-distributed     Neuron Distributed  NeuronX Distributed\n======================= =================== ====================\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-correction-neuron-driver-support-inf1.rst",
    "content": ".. post:: March 12, 2026\n    :language: en\n    :tags: announce-correction-neuron-driver-inf1, neuron-driver-version, inf1\n\n.. _announce-correction-neuron-driver-inf1-support:\n\n\nCorrection: Neuron Driver support for Inf1 — version 2.24 (not 2.21)\n---------------------------------------------------------------------\n\nWe are correcting a previous announcement regarding last Neuron Driver version to support Inf1. The last supported version is 2.24\n\nNeuron driver versions above 2.24 only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types).\nFor ``Inf1`` instance users, only Neuron driver version 2.24 will remain supported with regular security patches.\n\nAs part of this correction, Neuron Driver version **2.24.13.0** has been released as a patch for ``Inf1`` users, adding compatibility with Linux kernel 6.18.\n\n``Inf1`` instance users are advised to pin the Neuron driver version to ``2.24.*`` in their installation script:\n\nFor Ubuntu:\n\n.. code-block:: bash\n\n    sudo apt-get install aws-neuronx-dkms=2.24.* -y\n\nFor Amazon Linux 2 / Amazon Linux 2023:\n\n.. code-block:: bash\n\n    sudo yum install aws-neuronx-dkms-2.24.* -y\n\nRefer to the :ref:`Neuron Driver release notes <runtime_rn>` for more details.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-deprecation-containers-rtd.rst",
    "content": ".. post:: December 20, 2023\n    :language: en\n    :tags: announce-deprecating-containers, runtime-rtd\n\n.. _announce-update-containers:\n\nAnnouncing end-of-support for Neuron Containers with Runtime 1.x\n-----------------------------------------------------------------\n\n:ref:`Neuron release 2.3 <announce-neuron-rtd-eol>` was the last release to support Neuron Runtime 1.x (neuron-rtd).\nCurrent users of Neuron DLC/DLAMI with Neuron Runtime 1.x are required to :ref:`update their image <neuron_containers>` to support latest Neuron Runtime versions. For instructions, see the :ref:`Setup Guide <setup-guide-index>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-deprecation-nxd-path-trace-api.rst",
    "content": ".. post:: September 18, 2025\n    :language: en\n    :tags: announce-deprecation-nxd-path-trace-api, al2\n\n.. _announce-deprecation-nxd-path-trace-api:\n\nAnnouncing the deprecation of the NeuronX Deep Learning Inference API path_trace function\n-----------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.26.0 <neuron-2-26-0-whatsnew>` is the last release supporting ``parallel_model_trace``. This NxD Inference function will be deprecated in the next version of the Neuron SDK in favor of the ``ModelBuilder.trace()`` method, which provides a more robust and flexible approach for tracing and compiling models for Neuron devices,  enabling more advanced features such as weight layout optimization support, as well as other quality-of-life and stability improvements for SPMD tracing.\n\nFor customers directly invoking ``parallel_model_trace``, they can now use ModelBuilderV2 APIs. For more details on these APIS, see :ref:`nxd-core-model-builder-v2`. For customers that are directly using models in NxDI, there is  no impact since NxDI models are already built on MBv1 which has no issues."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-deprecation-transformer-flag.rst",
    "content": ".. post:: September 15, 2023\n    :language: en\n    :tags: announce-end-of-support, transformer-flag \n\n.. _announce-end-of-support-transformer-flag:\n\nAnnouncing end-of-support for ``--model-type=transformer-inference`` compiler flag\n-----------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.14 <neuron-2.14.0-whatsnew>`, the ``--model-type=transformer-inference`` compiler flag is deprecated.\n\nNeuron SDK users using the ``--model-type=transformer-inference`` compiler flag are highly encouraged to migrate to the ``--model-type=transformer`` compiler flag.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eol-megatron-lm.rst",
    "content": ".. post:: Aug 8, 2023\n    :language: en\n    :tags: announce-eol, trn1, trn1n\n\n.. _announce-eol-megatronlm:\n\nAnnouncing end of support for AWS Neuron reference for Megatron-LM \n-------------------------------------------------------------------\n\n:ref:`Neuron release 2.12 <neuron-2.12.0-whatsnew>` will be the last release that will include support for `AWS Neuron reference for Megatron-LM <https://github.com/aws-neuron/aws-neuron-reference-for-megatron-lm>`_. Future releases will not include Neuron support for Megatron-LM.\n\nCurrent Neuron Megatron-LM users are advised to migrate to `AWS Neuron reference for NeMo Megatron <https://github.com/aws-neuron/neuronx-nemo-megatron>`_ or `Neuron Distributed <https://github.com/aws-neuron/neuronx-distributed>`_.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eol-python-3-7.rst",
    "content": ".. post:: Jul 26, 2023 10:00\n    :language: en\n    :tags: announce-eol, python37\n\n.. _announce-eol-python37:\n\nAnnouncing end of support for ``Python 3.7`` \n---------------------------------------------\n\n:ref:`Neuron release 2.12 <neuron-2.12.0-whatsnew>` will be the last release that will include support for ``Python 3.7`` . Future Neuron releases will not include support for ``Python 3.7``\n\nCurrent users using ``Python 3.7`` are advised to migrate to latest supported Python version. (``Python 3.10`` )"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eol-ubuntu-18.rst",
    "content": ".. post:: Jul 13, 2023 11:00\n    :language: en\n    :tags: announce-eol, ubuntu18\n\n.. _announce-eol-ubuntu18:\n\nAnnouncing end of support for ``Ubuntu 18`` \n-------------------------------------------\n\n:ref:`Neuron release 2.12 <neuron-2.12.0-whatsnew>` will be the last release that will include support for ``Ubuntu 18`` . Future Neuron releases will not include support for ``Ubuntu 18``\n\nCurrent users using ``Ubuntu 18`` are advised to migrate to ``Ubuntu 20`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-al2.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-eos-al2, al2\n\n.. _announce-eos-al2:\n\nAnnouncing end of support for Neuron Runtime support of Amazon Linux 2 (AL2)\n------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>` will be the last release that will include Neuron Runtime support for ``Amazon Linux 2`` . Future Neuron releases will not include Neuron Runtime support for ``Amazon Linux 2``.\n\nCurrent users using ``Amazon Linux 2`` are advised to migrate to Amazon Linux 2023 (AL2023) or Ubuntu 20/22.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-beta-pytorch-neuroncore-placement-apis.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-no-longer-support-pytorch-neuroncore-placement\n\n.. _announce-no-longer-support-beta-pytorch-neuroncore-placement-apis:\n\nAnnouncing end of support for Beta PyTorch NeuronCore Placement APIs starting next release \n--------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.24 <neuron-2-24-0-whatsnew>` is the last release to support the Beta PyTorch NeuronCore Placement APIs. \n\nCustomers using Beta PyTorch NeuronCore Placement APIs are recommended to migrate to using generally available (GA) PyTorch Neuron Core Placement APIs. Please refer to the :ref:`PyTorch Neuron documentation <torch_neuronx_core_placement_api>` for guidance on using the supported functionality. Any models using the beta APIs will need to be updated to use the generally available APIs.\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-bf16-vars.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-no-longer-support-xla-env-vars\n\n.. _announce-eos-longer-support-xla-bf16-vars:\n\nAnnouncing end of support XLA_USE_BF16 and XLA_DOWNCAST_BF16 environment variables starting next release\n---------------------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.24 <neuron-2-24-0-whatsnew>` will be the last release to support the following environment variables:\n\n- XLA_USE_BF16\n- XLA_DOWNCAST_BF16\n\n**I currently utilize these environment variables in my model code. What do I do?**\n\nCustomers are recommended to migrate to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to convert their model to BF16 format. For detailed migration guidance, please refer to :ref:`migration_from_xla_downcast_bf16`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-block-dimension-nki.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-eos-block-dimension-nki\n\n.. _announce-eos-block-dimension-nki:\n\nAnnouncing end of support for NKI block dimension starting next release\n--------------------------------------------------------------------------\n\n:ref:`Neuron release 2.24 <neuron-2-24-0-whatsnew>` will be the last release to include support for the NKI block dimension in NKI tensor creation routines. Starting with this release, using the block dimension will generate EOS warnings. In the next release (Neuron Release 2.25), these warnings will be upgraded to errors.\n\nCustomers are recommended to refer to the :ref:`nki_block_dimension_migration_guide` for detailed instructions on updating their code.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-dlami-ubuntu-22-04.rst",
    "content": ".. post:: December 18, 2025\n    :language: en\n    :tags: announce-eos-dlami-ubuntu-22-04\n\n.. _announce-eos-dlami-ubuntu-22-04:\n\nAnnouncing End of Support for Ubuntu 22.04 single framework DLAMIs for PyTorch and JAX in future release\n========================================================================================================\n\nUbuntu 22.04 single framework DLAMIs for PyTorch and JAX will reach end of support in a future release. Customers are advised to use multi-framework or previously released DLAMIs for Ubuntu 22.04."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-dlami.rst",
    "content": ".. post:: April 24, 2024\n    :language: en\n    :tags: announce-eos-dlami, neuron-dlami\n\n.. _announce-eos-dlami:\n\nAnnouncing end of support for Neuron Release 2.18.0 Deep Learning AMIs \n------------------------------------------------------------------------\n\nWe are announcing end of support for :ref:`Neuron release 2.18.0 <neuron-2.18.0-whatsnew>` Deep Learning AMIs. DLAMIs released between March 26,2024 (2024-03-26) and April 10, 2024 (2024-04-10) were shipped without the audit package. The following are the affected DLAMIs:\n\n- Deep Learning AMI Neuron (Ubuntu 22.04) 20240401\n- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240328\n- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240402\n- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240409\n- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240328\n- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240402\n- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240409\n- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240328\n- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240402\n- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240409\n- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240328\n- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240402\n- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240409\n- Deep Learning Base Neuron AMI (Amazon Linux 2) 20240401\n- Deep Learning Base Neuron AMI (Amazon Linux 2) 20240408\n- Deep Learning Base Neuron AMI (Ubuntu 20.04) 20240401\n- Deep Learning Base Neuron AMI (Ubuntu 20.04) 20240408\n\nCurrent users of the above :ref:`Neuron release 2.18 <neuron-2.18.0-whatsnew>` Deep Learning AMIs are required to upgrade to the latest DLAMIs in order to consume those with the audit package installed. For instructions to upgrade to the latest AMI, see the :ref:`DLAMI User Guide <neuron-dlami-overview>` or find the specific DLAMI image id for the latest Neuron release with :ref:`SSM parameters <ssm-parameter-neuron-dlami>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-inf1-virtual-environments.rst",
    "content": ".. post:: December 18, 2025\n    :language: en\n    :tags: announce-eos-inf1-virtual-environments\n\n.. _announce-eos-inf1-virtual-environments:\n\nNeuron no longer supports Inf1 virtual environments and AMIs starting with Neuron 2.27\n======================================================================================\n\nStarting with Neuron release 2.27, Neuron no longer supports Inf1 virtual environments and AMIs. If you are a customer who is currently using Inf1 virtual environments or AMIs, use Neuron DLAMIs with Neuron version 2.26.1 or earlier."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-jax-neuronx-nki-call.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-eos-jax-neuronx-features\n\n.. _announce-eos-jax-neuronx-features-2:\n\nAnnouncing end of support for ``jax_neuronx.nki_call`` API in ``jax-neuronx`` from  starting next release\n------------------------------------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>`, Neuron will end support for ``jax_neuronx.nki_call`` API in ``jax-neuronx`` package.\n\nFor a full list of features that require ``jax-neuronx``, please see :ref:`jax-neuron-known-issues`. \n\nCustomers using ``jax_neuronx.nki_call`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-megatronlm-2-13.rst",
    "content": ".. post:: Aug 28, 2023\n    :language: en\n    :tags: announce-eos, trn1, trn1n\n\n.. _announce-eos-megatronlm:\n\nAWS Neuron reference for Megatron-LM no longer supported\n----------------------------------------------------------\n\n:ref:`Neuron release 2.13 <neuron-2.13.0-whatsnew>` no longer includes support for `AWS Neuron reference for Megatron-LM <https://github.com/aws-neuron/aws-neuron-reference-for-megatron-lm>`_.\n\nCurrent Neuron Megatron-LM users are required to migrate to `AWS Neuron reference for NeMo Megatron <https://github.com/aws-neuron/neuronx-nemo-megatron>`_ or `Neuron Distributed <https://github.com/aws-neuron/neuronx-distributed>`_.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-mllama-checkpoint.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-eos-mllama-checkpoint\n\n.. _announce-eos-mllama-checkpoint:\n\nAnnouncing end of support for mllama 3.2 Meta Checkpoint API starting next release\n--------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>` will be the last release to include support for the mllama 3.2 Meta checkpoint API. In the next release (Neuron 2.24), Neuron will end support.\n\nAll previously converted checkpoints will continue to function without disruption. Customers' existing workflows and converted models remain fully operational. For new checkpoint conversions, the HuggingFace solution provides equivalent functionality. Customers are recommended to use HuggingFace's official conversion script, available here:\n`Hugging Face Conversion Script <https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/convert_mllama_weights_to_hf.py>`_\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-multiframework-dlamis-inf1.rst",
    "content": ".. post:: April 24, 2024\n    :language: en\n    :tags: announce-eos-dlamis-inf1, dlami-inf1\n\n.. _announce-update-multiframework-dlami:\n\nAnnouncing end of support for Neuron virtual environments in AWS Deep Learning AMI (Amazon Linux 2)\n----------------------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.18.2 <neuron-2.18.0-whatsnew>` will be the last release that will include support for the following virtual environments in AWS Deep Learning AMI (Amazon Linux 2):\n\n- ``aws_neuron_pytorch_p38: PyTorch 1.13, Python 3.8``\n- ``aws_neuron_tensorflow2_p38: TensorFlow 2.10, Python 3.8``\n\nFuture releases will not include Neuron support for these virtual environments.\n\nCurrent users of Neuron virtual environments in `AWS Deep Learning AMI (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-amazon-linux-2/>`_ are required to migrate to the `Neuron multi framework DLAMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-ubuntu-22-04/>`_.\n\nTo see a list of Neuron supported virtual environments, please refer to :ref:`Neuron Multi Framework DLAMI User Guide <neuron-dlami-overview>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-nemo.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-eos-nemo-megatron\n\n.. _announce-eos-nnm:\n\nAnnouncing end of support for Neuron support for NeMo Megatron starting next release\n-------------------------------------------------------------------------------------\n\nStarting with  Neuron Release 2.23, Neuron will end support for :ref:`NeMo Megatron <nemo-megatron-index>`. \n\nWe recommend all users of :ref:`NeMo Megatron <nemo-megatron-index>` to migrate their training workloads to :ref:`NxD Training <nxd-training-overview>`. Please refer to :ref:`Neuron NeMo Megatron to NeuronX Distributed Training Migration Guide <nxdt_developer_guide_migration_nnm_nxdt>` for guidance."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neuron-det.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-eos-neuron-det\n\n.. _announce-eos-neuron-det:\n\nAnnouncing end of support for Neuron DET tool starting next release\n-------------------------------------------------------------------\n\n:ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>` will be the last release to support the Neuron Distributed Event Tracing (NDET/neuron-det) tool.\n\nWe recommend all customers using the NDET tool for debugging runtime hangs/issues in large-scale settings transition to the Neuron Profiler 2.0. This tool offers the same runtime function level traces with improved ease of use and optimized performance. For more information on Neuron Profiler 2.0, please refer to the :ref:`neuron-profiler-2-0-guide`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neuron-driver-support-inf1.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-eos-neuron-driver-2.21-version, neuron-driver-version, inf1\n\n.. _announce-upcoming-neuron-driver-2.21-version support changes for inf1 instance:\n\n\nUpcoming changes to Neuron driver 2.21 support for Inf1 starting Neuron 2.26 release\n------------------------------------------------------------------------------------\n\n.. note::\n\n   This announcement has been superseded. The correct last supported Neuron driver version for ``Inf1`` is **2.24**, not 2.21. See :ref:`announce-correction-neuron-driver-inf1-support` for details.\n\nStarting with Neuron Release 2.26, Neuron driver versions above 2.21 will only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types). \nFor ``Inf1`` instance users, Neuron driver versions <  2.21 will remain supported with regular security patches. \n\n``Inf1`` instance users are advised to pin the Neuron driver version to ``2.21.*`` in their installation script. \nRefer to the :ref:`Neuron Driver release [2.22.2.0] <runtime_rn>` for detailed instructions on pinning the Neuron Driver.  \n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-2.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-eos-neuron-profiler\n\n.. _announce-eos-neuron-profiler-2:\n\nNeuron Explorer Replaces Neuron Profiler, Starting with Neuron 2.29\n-------------------------------------------------------------------\n\nStarting with Neuron 2.29, **Neuron Profiler and Profiler 2.0 (UI and CLI) will reach end of support** and be replaced by Neuron Explorer. If you are currently using the Neuron Profiler, migrate to Neuron Explorer before the Neuron 2.29 release.\n\nFor migration guidance, see the :doc:`/tools/neuron-explorer/migration-faq`.\n\nWhat is Neuron Explorer?\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron Explorer is the next-generation suite of tools, guiding developers through their development journey on Trainium. It enables ML performance engineers to:\n\n* **Trace execution end-to-end** — from source code down to hardware operations.\n* **Analyze model behavior at every layer of the stack** — with detailed breakdowns per operation, per core, and per device.\n* **Profile distributed workloads** — with native support for multi-node and multi-worker analysis at scale.\n\nFor more details, see :doc:`/tools/neuron-explorer/index`.\n\nHow does this impact current Neuron Profiler users?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. important::\n\n    Neuron strongly recommends migrating to Neuron Explorer **before** the Neuron 2.29 release.\n\nThere are two things to be aware of when migrating:\n\n* **Existing NTFF profile files are supported**, but must be reprocessed before they can be viewed in the Neuron Explorer UI.\n* **New features require new profiles.** To access the full set of Neuron Explorer capabilities, you must recapture your profiles using the updated tooling.\n\nFor detailed migration steps, see the :doc:`/tools/neuron-explorer/migration-faq` and the :ref:`Neuron Explorer FAQ <neuron-explorer-faq>`.\n\nWhat happens to Neuron Profiler after Neuron 2.29?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAfter Neuron 2.29, Neuron Profiler will:\n\n* **No longer receive** bug fixes, feature updates, or technical support.\n* **No longer be distributed** as part of the Neuron SDK.\n\nIf you need to continue using Neuron Profiler temporarily, you must pin your environment to Neuron 2.28 or earlier. This is **not recommended**, as you will not receive any SDK updates or security fixes.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-v230.rst",
    "content": ".. post:: March 31, 2026\n    :language: en\n    :tags: announce-eos-neuron-profiler\n\n.. _announce-eos-neuron-profiler-v230:\n\nNeuron Explorer Replaces Neuron Profiler, Starting with Neuron 2.30.0\n----------------------------------------------------------------------\n\nStarting with Neuron 2.30.0, Neuron Profiler and Profiler 2.0 (UI and CLI) will reach end of support and be replaced by Neuron Explorer. If you are currently using the Neuron Profiler, migrate to Neuron Explorer before the Neuron 2.30.0 release.\n\nFor migration guidance, see the :doc:`/tools/neuron-explorer/migration-faq`.\n\nWhat is Neuron Explorer?\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron Explorer is the next-generation suite of tools, guiding developers through their development journey on Trainium. It enables ML performance engineers to:\n\n* **Trace execution end-to-end** — from source code down to hardware operations.\n* **Analyze model behavior at every layer of the stack** — with detailed breakdowns per operation, per core, and per device.\n* **Profile distributed workloads** — with native support for multi-node and multi-worker analysis at scale.\n\nFor more details, see :doc:`/tools/neuron-explorer/index`.\n\nHow does this impact current Neuron Profiler users?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. important::\n\n    Neuron strongly recommends migrating to Neuron Explorer **before** the Neuron 2.30.0 release.\n\nThere are two things to be aware of when migrating:\n\n* **Existing NTFF profile files are supported**, but must be reprocessed before they can be viewed in the Neuron Explorer UI.\n* **New features require new profiles.** To access the full set of Neuron Explorer capabilities, you must recapture your profiles using the updated tooling.\n\nFor detailed migration steps, see the :doc:`/tools/neuron-explorer/migration-faq` and the :ref:`Neuron Explorer FAQ <neuron-explorer-faq>`.\n\nWhat happens to Neuron Profiler after Neuron 2.30.0?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAfter Neuron 2.30.0, Neuron Profiler will:\n\n* **No longer receive** bug fixes, feature updates, or technical support.\n* **No longer be distributed** as part of the Neuron SDK.\n\nIf you need to continue using Neuron Profiler temporarily, you must pin your environment to Neuron 2.28 or earlier. This is **not recommended**, as you will not receive any SDK updates or security fixes.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announce-eos-neuron-profiler\n\n.. _announce-eos-neuron-profiler:\n\nEnd of Support for Neuron Profiler and Neuron Profiler 2.0 UI and CLI coming in a future Neuron release\n--------------------------------------------------------------------------------------------------------\n\nWhat's changing\n^^^^^^^^^^^^^^^^\nNeuron will end support for the legacy Neuron Profiler and Neuron Profiler 2.0 UI and CLI tools in a coming release (planned for v2.29.0). We launched Neuron Explorer in Neuron SDK 2.27, replacing these tools with a unified developer experience that will include device and system profiling in a single view, eager mode support, enhanced memory profiling, improved visualization capabilities, as well as support for the full developer lifecycle.\n\nWhy are we making this change\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nConsolidating to Neuron Explorer allows us to focus development efforts on a single, modern profiling solution while providing you with enhanced features and a better user experience.\n\nHow does this impact you\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you are currently using the legacy Neuron Profiler UI or CLI, please do the following before Neuron 2.29:\n\n* Begin using Neuron Explorer (available since Neuron 2.27). See https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/get-started.html#\n* Reprocess your existing NTFF files for the new UI: see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-profile-workload.html\n\nNote: Neuron Explorer is backwards compatible with existing Profiler NTFF files, but they must be reprocessed to view in the new UI. For new features (eager mode, memory viewer, certain NKI tools), you'll need to recapture profiles.\n\nAfter Neuron 2.29.0 releases (planned):\n\n* Legacy UI will no longer receive bug fixes, updates, or technical support\n* To continue using legacy UI, you must pin to the last version that supports it (not recommended)\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neurondevice-version.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-eos-neuron-device-version, neuron-device-version\n\n.. _announce-eos-neuron-device-version:\n\nAnnouncing end of support for 'neuron-device-version' field in neuron-monitor\n-------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>` will be the last release to include the field 'neuron-device-version' in neuron-monitor.\n\nIn future releases, customers who are using the field 'neuron-device-version' will instead need to use 'instance_type' field in the 'instance_info' section and the 'neuroncore_version' field to obtain neuron device information.\n\nPlease see :ref:`neuron-monitor-ug` for more details.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-neurondevice.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-eos-neuron-device, neuron-device\n\n.. _announce-eos-neurondevice:\n\nAnnouncing end of support for 'neurondevice' resource name in Neuron Device K8s plugin\n----------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>` will be the last release to include resource name 'neurondevice'. \n\nNeuron device plugin is a Neuron Software component that gets installed in Kubernetes environment. The resource name 'neurondevice' enables customers to allocate devices to the Neuron K8s container.\n\nIn future releases, we will rename resource name 'neurondevice' to 'neuron' to maintain consistency. Customers who are using the resource name 'neurondevice' in their YAML file will need to update to use 'neuron'.\n\nPlease see :ref:`k8s-neuron-device-plugin` for more details.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-nxd-examples.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-eos-nxd-examples\n\n.. _announce-eos-nxd-examples:\n\nAnnouncing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release\n--------------------------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>` will be the last release to include NxD Core repository inference examples under the NxD Core repository: https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference. Starting with :ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>`, the models and modules in NxD Core inference examples are now available through NxD Inference package. We recommend customers to update their applications to use examples from the NxD Inference repository. See :ref:`nxdi-overview`\n\nIn Neuron Release 2.22, the NxD Core inference samples will only reside under the NxD Inference repository. Current users are advised to start using samples/tutorials under the NxD Inference repository: https://github.com/aws-neuron/neuronx-distributed-inference.\n\nI currently utilize an inference sample from the NxD Core repository in my model code. What do I do?\n======================================================================================================\n\nIf your applications depend on the inference examples from NxD Core, we recommend that you update your code to use the new NxD Inference package. With NxD Inference, you can import and use these models and modules in your applications. Any models compiled with inference code from the NxD Core repository will need to be re-compiled. Please refer to the :ref:`nxd-examples-migration-guide` for guidance.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-nxdt-nxd-core-training.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-eos-nxdt\n\n.. _announce-eos-nxdt-nxd-core-training:\n\nAnnouncing end of support for NxDT and NxD Core Training APIs starting with Neuron SDK release 2.29 (PyTorch 2.10)\n-------------------------------------------------------------------------------------------------------------------\n\nNeuron SDK release 2.28 (PyTorch 2.9) will be the last release to include the NeuronX Distributed Training (NxDT) library. Starting with Neuron SDK release 2.29 (PyTorch 2.10), the use of NxD Core training APIs and the PyTorch/XLA package for training will no longer be supported.\n\nHow does this impact you?\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nExisting NxDT/NxD Core users should stay on Neuron SDK 2.28 (PyTorch 2.9) until ready to migrate to native PyTorch on Neuron. Native PyTorch on Neuron uses standard distributed primitives (DTensor, FSDP, DDP). A migration guide will be published in a coming release.\n\nSee :doc:`Native PyTorch on Neuron Overview </frameworks/torch/pytorch-native-overview>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-probuf.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-eos-probuf, probuf\n\n.. _announce-eos-probuf319:\n\nAnnouncing end of support for Probuf versions <= 3.19 for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries \n------------------------------------------------------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>` will be the last release that will include Probuf <= 3.19 support for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries. Future Neuron releases will not include Probuf <= 3.19 support for PyTorch NeuronX.\n\nCurrent PyTorch NeuronX, NeuronX Distributed, or Transformers NeuronX users using Probuf <= 3.19 are advised to migrate to latest supported Probuf version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pt-versions.rst",
    "content": ".. post:: December 20, 2023\n    :language: en\n    :tags: announce-eos-pt, pt-versions\n\n.. _announce-eos_pytorch110:\n\nAnnouncing End of Support for PyTorch Neuron version 1.10\n-----------------------------------------------------------\n\n:ref:`Neuron release 2.16 <neuron-2.16.0-whatsnew>` will be the last release that will include support for PyTorch Neuron version 1.10. Future Neuron releases will not include support for PyTorch Neuron version 1.10.\n\nCurrent users of PyTorch Neuron version 1.10 are advised to migrate to latest supported PyTorch Neuron version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pt2.rst",
    "content": ".. post:: December 20, 2023\n    :language: en\n    :tags: announce-eos-pt-two, pt-versions-two\n\n.. _announce-eos_pytorch2:\n\nAnnouncing End of Support for PyTorch NeuronX version 2.0 (beta)\n-----------------------------------------------------------------\n\n:ref:`Neuron release 2.16 <neuron-2.16.0-whatsnew>` will be the last release that will include beta support for PyTorch NeuronX version 2.0 (beta). Future Neuron releases will not include support for PyTorch NeuronX version 2.0.\n\nCurrent users of PyTorch NeuronX version 2.0 are advised to upgrade to PyTorch NeuronX 2.1 (beta).\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-python38.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-python-eos\n\n.. _announce-python-eos:\n\nAnnouncing end of support for Python 3.8 in future releases\n-----------------------------------------------------------\n\nDue to Python 3.8 reaching its end-of-life status, future Neuron releases will no longer include support for this version.\n\n=========================\nHow does this impact me?\n=========================\n\nI currently use Python 3.8.\n============================\n\nTo avoid security issues and bugs, current users of Python 3.8 are advised to migrate to a Neuron supported Python version (3.9, 3.10, or 3.11) as Neuron will no longer support Python 3.8. For a list of supported Python versions according to Neuron package, please see :ref:`latest-neuron-release-artifacts`.\n\nI currently use Ubuntu 20, which has Python 3.8 as the default version. Am I affected?\n=======================================================================================\n\nAlthough Python 3.8 is the default version of Ubuntu 20.04, Neuron will continue to support Ubuntu 20.04 until April 2025, due to extended standard support of Python 3.8 in Ubuntu 20. Please see the :ref:`sdk-maintenance-policy` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-1-3.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-eos-pytorch-version\n\n.. _announce-eos-pytorch-eos-113:\n\nAnnouncing end of support for PyTorch 1.13 starting next release\n----------------------------------------------------------------\n\n:ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>` is the last release to support PyTorch 1.13, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIS) for Trn1, Trn2, and Inf2 instances.\n\nWe recommend that all customers using torch-neuron 1.13, related DLCs, and DLAMIS on Trn2, Trn1, and Inf2 instances upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`.\n\nPlease note that PyTorch 1.13 will continue to be supported for Inf1 instances."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-9.rst",
    "content": ".. post:: August 28, 2023\n    :language: en\n    :tags: announce-eol, torch-neuron \n\n.. _announce-eol-pytorch19:\n\nAnnouncing end of support for ``torch-neuron`` version 1.9 \n-----------------------------------------------------------\n\n:ref:`Neuron release 2.13 <neuron-2.13.0-whatsnew>` will be the last release that will include support for ``torch-neuron`` version 1.9. Future Neuron releases will not include support for ``torch-neuron`` version 1.9.\n\nCurrent users of ``torch-neuron`` version 1.9 are advised to migrate to latest supported ``torch-neuron`` version.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-1.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-eos-pytorch-version\n\n.. _announce-eos-pytorch-2-1:\n\nAnnouncing end of support for PyTorch 2.1 starting next release\n---------------------------------------------------------------\n\n:ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>` is the last release to support PyTorch 2.1, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIS).\n\nWe recommend that all customers using PyTorch 2.1, related DLCs, and DLAMIS upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8-v229.rst",
    "content": ".. post:: March 31, 2026\n    :language: en\n    :tags: announce-eos-pytorch-version\n\n.. _announce-eos-pytorch-2-7-2-8-v229:\n\nNeuron no longer supports PyTorch versions 2.7 and 2.8 starting with Neuron 2.29\n----------------------------------------------------------------------------------\n\nStarting with Neuron 2.29, Neuron no longer supports PyTorch versions 2.7 and 2.8. We recommend that all customers upgrade to the latest supported PyTorch version.\n\nCustomers currently using PyTorch versions 2.7 and 2.8 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-eos-pytorch-version\n\n.. _announce-eos-pytorch-2-7-2-8:\n\nAnnouncing end of support for PyTorch versions 2.7 and 2.8 starting next release\n---------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.28 <whats-new-2026-02-26-v2_28>` is the last release to support PyTorch versions 2.7 and 2.8. Future Neuron releases will not include support for PyTorch versions 2.7 and 2.8.\n\nCurrent users of PyTorch version 2.7 or 2.8 are advised to upgrade to PyTorch 2.9. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-pytorch-profiling-api.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announce-eos-pytorch-profling-api\n\n.. _announce-eos-pytorch-profling-api:\n\nEnd of Support for PyTorch Experimental Profiling API starting in a future release\n------------------------------------------------------------------------------------\n\nWhat's changing\n^^^^^^^^^^^^^^^^\n\nNeuron will end support for the ``torch_neuronx.experimental.profiler.profile`` API in a future release of Neuron (planned for v2.29.0). This experimental API will be replaced by native PyTorch profiling support using the standard ``torch.profiler.profile()`` API.\n\nHow does this impact you\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you are using ``torch_neuronx.experimental.profiler.profile,`` before April/May 2026:\n\n* Update your code to use native PyTorch profiling API:\n\n.. code-block:: python\n\n    # Before (Experimental API)\n    from torch_neuronx.experimental import profiler\n    with profiler.profile(output_path=\"/tmp/profile\") as prof:\n        output = model(input)\n\n    # After (Native API)\n    import torch.profiler\n    with torch.profiler.profile(\n        activities=[torch.profiler.ProfilerActivity.NEURON],\n        on_trace_ready=torch.profiler.tensorboard_trace_handler(\"/tmp/profile\")\n    ) as prof:\n        output = model(input)\n\nAfter Neuron 2.29.0 releases (planned):\n\n* Experimental API will no longer be supported\n* To continue using the experimental API, you must pin to Neuron SDK 2.28 or earlier (not recommended)\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-tensorboard-tools.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announce-eos-tensorboard-tools\n\n.. _announce-eos-tensorboard-tools:\n\nAnnouncing End of Support for TensorBoard Plugin for Neuron Profiler in Neuron 2.27\n-----------------------------------------------------------------------------------\n\nNeuron 2.27 will be the last release to support TensorBoard Plugin. Future Neuron releases will not include the support for TensorBoard plugin. All customers using the TensorBoard plugin to visualize and analyze model performance are recommended to migrate to Neuron Explorer. \n\nTo begin using Neuron Explorer (available since Neuron 2.27) for profiling, see :doc:`the Neuron Explorer documentation </tools/neuron-explorer/index>`. Neuron Explorer was introduced with :doc:`the release of the AWS Neuron SDK version 2.27.0 </release-notes/prev/2.27.0/index>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-tensorflow-2-8-9.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-tensorflow-versions-eos\n\n.. _announce-tfx-2-8-9-eos:\n\nAnnouncing end of support for Tensorflow 2.8 and 2.9 starting next release\n----------------------------------------------------------------------------\n\nStarting with Neuron Release 2.23, Neuron will end support for TensorFlow 2.8 and 2.9. Future Neuron releases will not include support for Tensorflow-Neuron 2.8 and 2.9 versions.\n\nCurrent users of those versions are advised to migrate to latest TensorFlow version (2.10). For a list of supported versions, please see :ref:`latest-neuron-release-artifacts`.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-tensorflow-inf2.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-eos-tensorflow\n\n.. _announce-eos-tensorflow-inf2:\n\nAnnouncing end of support for TensorFlow for Inferentia2 (Inf2) starting with Neuron 2.29\n------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.28 <whats-new-2026-02-26-v2_28>` is the last release to support TensorFlow for Inferentia2 (``Inf2``). Future Neuron releases will not include support for TensorFlow for ``Inf2`` instance users.\n\nCurrent Inf2 instance users are advised to use the latest PyTorch version 2.9. For a list of supported PyTorch versions, see :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-tensorflow1-x.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-tensorflow-eos, tf-versions-1-x\n\n.. _announce-tfx-eos:\n\nAnnouncing end of support for Tensorflow-Neuron 1.x\n-----------------------------------------------------\n\n:ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>` will be the last release to support Tensorflow-Neuron 1.x. \nFuture Neuron releases will not include support for Tensorflow-Neuron 1.x versions. Current users of those versions are advised to migrate to latest tensorflow-neuron version 2.10.1.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-torch-neuron.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: announce-torch-neuron-eos, torch-neuron\n\n.. _announce-torch-neuron-eos:\n\nAnnouncing maintenance mode for torch-neuron 1.9 and 1.10 versions \n---------------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.20 <neuron-2-20-2-whatsnew>`, torch-neuron 1.9 and 1.10 versions will enter maintenance mode.\nFuture Neuron releases will not include support for torch-neuron 1.9 and 1.10 versions. Current users of torch-neuron 1.9 and 1.10 versions are advised to migrate to torch-neuron 1.13.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-torch-neuronx-nki-jit.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-eos-torch-neuronx-nki-jit\n\n.. _announce-eos-torch-neuronx-nki-jit:\n\nAnnouncing end of support for ``torch_neuronx.nki_jit`` API in ``torch-neuronx`` starting next release\n---------------------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>` will be the last release to include support for ``torch_neuronx.nki_jit`` API in ``torch-neuronx`` package.\n\nCustomers using ``torch_neuronx.nki_jit`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-u20-dlamis.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-u20-dlami-dlc-eos\n\n.. _announce-u20-dlami-dlc-eos:\n\nAnnouncing end of support for Ubuntu20 DLCs and DLAMIs\n------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>`, AWS Neuron will begin phasing out support for Ubuntu20 Deep Learning Containers (DLCs) and Deep Learning AMIs (DLAMIs). Neuron 2.21 will be the last release to provide bug fixes, and by Neuron 2.22, these offerings will no longer be available.\n\nWe recommend that all customers using Ubuntu20 DLCs and DLAMIs migrate to newer versions based on Ubuntu22 or Amazon Linux 2023. For customers who need to continue using Ubuntu20, you can create custom AMIs based on the Ubuntu20 base image and install Neuron components manually. Please see :ref:`container-faq` and :ref:`neuron-dlami-overview`. \n\nPlease note that this does not affect support for the base Ubuntu20 operating system, which will continue to receive updates as per our standard support policy. For more information, please see :ref:`sdk-maintenance-policy`\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-eos-xla-bf16.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-eos-xla-bf\n\n.. _announce-eos-xla-bf:\n\nAnnouncing end of support for XLA_USE_BF16 and XLA_DOWNCAST_BF16 starting next release\n----------------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>`, Neuron will begin phasing out support for the ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` environment variables. In this release, usage of these variables will trigger warnings. Neuron will end support in a subsequent release, aligned with the torch-xla maintenance schedule.\n\nCustomers are recommended to migrate to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to convert their model to BF16 format. For detailed migration guidance, please refer to :ref:`migration_from_xla_downcast_bf16`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eol-nemo-arg.rst",
    "content": ".. post:: Oct 26, 2023\n    :language: en\n    :tags: announce-intent-end-of-support-nemo-arg, nemo-arg\n\n.. _announce-intent-deprecate-nemo-arg:\n\nAnnouncing End of Support for ``nemo`` option-argument\n-------------------------------------------------------\n\n:ref:`Neuron release 2.15 <neuron-2.15.0-whatsnew>` will be the last release that will include support for ``nemo`` option-argument in the existing `--distribution_strategy` :ref:`compiler option <neuron-compiler-cli-reference-guide>`. Future releases will not include Neuron support for ``nemo`` option-argument.\nUsers are advised to migrate to the new ``llm-training`` option-argument.\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eos-opt.rst",
    "content": ".. post:: Oct 26, 2023\n    :language: en\n    :tags: announce-intent-eos-opt, opt\n\n.. _announce-intent-eos-opt:\n\nAnnouncing End Of Support for OPT example in Transformers NeuronX\n------------------------------------------------------------------\n\n:ref:`Neuron release 2.15 <neuron-2.15.0-whatsnew>` will be the last release that will include OPT example in Transformers NeuronX.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eos-pt-version.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-eos-pt-two-five\n\n.. _announce-eos_pytorch25:\n\nAnnouncing End of Support for PyTorch NeuronX version 2.5 starting next release\n---------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.24 <neuron-2-24-0-whatsnew>` will be the last release that will include support for PyTorch NeuronX version 2.5. Future Neuron releases will not include support for PyTorch NeuronX version 2.5.\n\nCurrent users of PyTorch NeuronX version 2.5 are advised to upgrade to PyTorch NeuronX 2.6 or 2.7. Please see release artifacts for more details on supported versions.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eos-pt2-6.rst",
    "content": ".. post:: September 18, 2025\n    :language: en\n    :tags: announce-eos-pt2-6\n\n.. _announce-eos_pt2-6:\n\nAnnouncing End of Support for PyTorch NeuronX version 2.6 starting next release\n---------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.26 <neuron-2-26-0-whatsnew>` will be the last release that will include support for PyTorch NeuronX version 2.6. Future Neuron releases will not include support for PyTorch NeuronX version 2.6. Current users of PyTorch NeuronX version 2.6 are advised to upgrade to PyTorch NeuronX 2.7 or 2.8. See :ref:`Neuron release artifacts <latest-neuron-release-artifacts>` for more details on supported versions."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eos-tensorflow-tutorial-inf.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-eos-tensorflow-tutorial\n\n.. _announce-eos-tensorflow-tutorial:\n\nAnnouncing End of Support for Tensorflow Neuron Inf1 SSD300 tutorial starting next release\n--------------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.24 <neuron-2-24-0-whatsnew>` will be the last release that will include support for :ref:`Tensorflow Neuron Inf1 SSD300 <tensorflow-ssd300>` tutorial. Future Neuron releases will not include support for :ref:`Tensorflow Neuron Inf1 SSD300 <tensorflow-ssd300>` tutorial due to security issues.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-eos-tnx.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-eos-tnx\n\n.. _announce-eos-tnx:\n\nAnnouncing end of support for Transformers NeuronX library starting in Neuron 2.26 release\n--------------------------------------------------------------------------------------------\n\nStarting from :ref:`Neuron Release 2.24 <neuron-2-24-0-whatsnew>`, Transformers NeuronX library is in maintenance mode. ``transformers-neuronx`` releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for ``transformers-neuronx``.\n\nCurrent users of ``transformers-neuronx`` are advised to migrate to :ref:`NeuronX Distributed Inference <nxdi-overview>`.\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-intent-maintenance-tnx.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-transformers-neuronx-maintenance, tnx\n\n.. _announce-tnx-maintenance:\n\nAnnouncing maintenance mode for Transformers NeuronX library starting next release\n------------------------------------------------------------------------------------\n\nStarting from Neuron release 2.24, Transformers NeuronX library is entering maintenance mode. Future releases of ``transformers-neuronx`` will address critical security issues only and we will gradually end support. \n\nCurrent users of ``transformers-neuronx`` are advised to migrate to :ref:`NeuronX Distributed Inference <nxdi-overview>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-maintenance-mxnet.rst",
    "content": ".. post:: June 28, 2024\n    :language: en\n    :tags: announce-mxnet-maintenance, mxnet\n\n.. _announce-mxnet-maintenance:\n\nNeuron support for MxNet enters maintenance mode\n---------------------------------------------------\n\nStarting with :ref:`Neuron release 2.19 <neuron-2.19.0-whatsnew>`, Neuron support for MxNet (``mxnet-neuron``) is entering maintenance mode.\n\nFuture releases of ``mxnet-neuron`` will address critical security issues only and we will gradually end support. Current users of ``mxnet-neuron`` are advised to migrate to PyTorch NeuronX or TensorFlow NeuronX.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-maintenance-nxdi-nxd-core-inference.rst",
    "content": ".. post:: March 31, 2026\n    :language: en\n    :tags: announce-maintenance-nxdi\n\n.. _announce-maintenance-nxdi-nxd-core-inference:\n\nAnnouncing maintenance mode for NxD Inference and NxD Core Inference APIs starting next release\n-----------------------------------------------------------------------------------------------\n\nStarting with Neuron 2.30.0, NxD Inference library and NxD Core Inference APIs are entering maintenance mode. Future releases will address critical security issues only and we will gradually end support.\n\nWe are actively investing in an enhanced vLLM Neuron plugin that will not require a separate NxD Inference library. More information about the vLLM Neuron plugin enhancements and migration guidance will be shared in the upcoming release.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-maintenance-nxdt-nxd-core-training.rst",
    "content": ".. post:: March 31, 2026\n    :language: en\n    :tags: announce-maintenance-nxdt\n\n.. _announce-maintenance-nxdt-nxd-core-training:\n\nAnnouncing maintenance mode for NxDT and NxD Core Training APIs starting next release\n-------------------------------------------------------------------------------------\n\nStarting with Neuron 2.30.0, NxDT and NxD Core Training APIs are entering maintenance mode. Future releases will address critical security issues only and we will gradually end support.\n\nHow does this impact you?\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nExisting NxDT/NxD Core users should stay on Neuron 2.28 and PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.30.0 and PyTorch 2.10. A migration guide will be published in a coming release.\n\nSee :doc:`/frameworks/torch/pytorch-native-overview` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-maintenance-tf.rst",
    "content": ".. post:: April 1, 2024\n    :language: en\n    :tags: announce-tensorflow-maintenance, tf-versions\n\n.. _announce-tfx-maintenance:\n\nTensorflow-Neuron 1.x enters maintenance mode\n-----------------------------------------------\n\nStarting with :ref:`Neuron release 2.18 <neuron-2.18.0-whatsnew>`, Tensorflow-Neuron 1.x is entering maintenance mode. Future releases of Tensorflow-Neuron 1.x will address critical security issues only and we will gradually end support. Current users of those versions are advised to migrate to latest tensorflow-neuron version 2.10.1.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-moving-samples.rst",
    "content": ".. post:: December 20, 2023\n    :language: en\n    :tags: announce-moving-nxd-samples, nxd-samples\n\n.. _announce-moving-samples:\n\nAnnouncing end-of-support for NeuronX Distributed Training Samples in Neuron Samples Repository \n------------------------------------------------------------------------------------------------\n\n:ref:`Neuron release 2.16 <neuron-2.16.0-whatsnew>` will be the last release to include support for NeuronX Distributed Training Samples (Llama-2, GPT-NeoX 20B, and GPT-NeoX 6.9B) under the `AWS Neuron Samples Github repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training>`_.\n\nIn future releases, NeuronX Distributed samples will reside under the `NeuronX Distributed Github repository <https://github.com/aws-neuron/neuronx-distributed>`_. Current users are advised to start using samples under the NeuronX Distributed repository for all NeuronX Distributed tutorials.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-nki-library-namespace-changes-2-28.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-nki-library-changes\n\n.. _announce-nki-library-namespace-changes-2-28:\n\nNKI Library namespace changes starting with Neuron 2.28\n--------------------------------------------------------\n\nStarting with Neuron 2.28, the open source repository namespace has changed from ``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between the open source repository and the shipped version. If customers want to add or modify NKI Library kernels, they can build and install them to replace the default implementation without changing model imports.\n\nSee :ref:`NKI Library <nkl_home>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-nki-namespace-migration.rst",
    "content": ".. post:: March 31, 2026\n    :language: en\n    :tags: announce-nki-namespace\n\n.. _announce-nki-namespace-migration:\n\nAnnouncing NKI Library Kernel Migration to New nki.* Namespace starting Neuron 2.29\n------------------------------------------------------------------------------------\n\nStarting with Neuron 2.29, all NKI Library kernels are migrated to the new ``nki.*`` namespace.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs that improve usability and performance. This transition ensures consistency across all NKI kernels and allows us to focus development efforts on a single, modern namespace.\n\nSee the :doc:`/nki/deep-dives/nki-migration-guide` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-neuron-det.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-no-longer-support-neuron-det\n\n.. _announce-no-longer-support-neuron-det:\n\nNeuron no longer includes support for Neuron DET tool starting with this release \n---------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.22 <neuron-2.22.0-whatsnew>`, Neuron no longer supports the Neuron Distributed Event Tracing (NDET/neuron-det) tool.\n\nWe recommend customers transition to the Neuron Profiler 2.0 for debugging runtime hangs and issues in large-scale settings. This tool offers the same runtime function level traces with improved ease of use and optimized performance. For more information about the Neuron Profiler 2.0, see :ref:`neuron-profiler-2-0-guide`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-nxd-examples.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-eol-nxd-examples\n\n.. _announce-eol-nxd-examples:\n\nAnnouncing migration of NxD Core inference examples from NxD Core repository to NxD Inference repository starting this release\n==================================================================================================================================\n\n\nStarting with :ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>`, the following models and modules in NxD Core inference examples are now only available through NxD Inference package:\n\n- Llama\n- Mixtral\n- DBRX\n\n\nI currently utilize one of the mentioned inference samples from the NxD Core repository in my model code. What do I do?\n------------------------------------------------------------------------------------------------------------------------\n\nFor customers who want to deploy models out of the box, please use the NxD Inference model hub, which is the recommended option. With NxD Inference, you can import and use these models and modules in your applications. \nCustomers will need to update their applications to use examples under the NxD Inference repository: https://github.com/aws-neuron/neuronx-distributed-inference.\nAny models compiled with inference code from the NxD Core repository will need to be re-compiled. Please refer to the :ref:`nxd-examples-migration-guide` for guidance and see :ref:`nxdi-overview` for more information.\n\nI would like to continue using NxD Core. What do I do?\n--------------------------------------------------------\n\nFor customers who want to continue using NxD Core without NxD Inference, please refer to the Llama3.2 1B sample as a reference implementation: https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-113.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-no-longer-support-pytorch-version\n\n.. _announce-no-longer-support-pytorch-113:\n\nNeuron no longer supports PyTorch 1.13 starting this release\n-------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.22 <neuron-2.22.0-whatsnew>`, Neuron no longer supports PyTorch 1.13, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIS) for Trn1, Trn2, and Inf2 instances.\n\nWe recommend that all customers using PyTorch 1.13, related DLCs, and DLAMIS on Trn2, Trn1, and Inf2 instances upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`.\n\nPlease note that PyTorch 1.13 will continue to be supported for Inf1 instances.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-1.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-no-longer-support-pytorch-version\n\n.. _announce-no-longer-support-pytorch-2-1:\n\nNeuron no longer supports PyTorch 2.1 starting this release\n------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.22 <neuron-2.22.0-whatsnew>`, Neuron no longer includes support for PyTorch 2.1, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIS).\n\nWe recommend that all customers using PyTorch 2.1, related DLCs, and DLAMIS upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-7-2-8.rst",
    "content": ".. post:: March 30, 2026\n    :language: en\n    :tags: announce-no-longer-support-pytorch-version\n\n.. _announce-no-longer-support-pytorch-2-7-2-8:\n\nNeuron no longer supports PyTorch versions 2.7 and 2.8 starting with Neuron 2.29\n----------------------------------------------------------------------------------\n\nStarting with Neuron 2.29, Neuron no longer supports PyTorch versions 2.7 and 2.8. We recommend that all customers upgrade to the latest supported PyTorch version.\n\nCustomers currently using PyTorch versions 2.7 and 2.8 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-tensorflow-inf2.rst",
    "content": ".. post:: March 30, 2026\n    :language: en\n    :tags: announce-no-longer-support-tensorflow\n\n.. _announce-no-longer-support-tensorflow-inf2:\n\nNeuron no longer supports TensorFlow for Inferentia2 (Inf2) starting with Neuron 2.29\n---------------------------------------------------------------------------------------\n\nStarting with Neuron 2.29, Neuron no longer supports TensorFlow for Inferentia2 (Inf2). Current Inf2 instance users are advised to use the latest PyTorch version 2.9. For a list of supported PyTorch versions, see :doc:`/release-notes/releasecontent`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-longer-support-u20-dlc-dlami.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-u20-dlami-dlc-no-longer-support\n\n.. _announce-u20-dlami-dlc-eos:\n\nNeuron no longer includes support for Ubuntu20 DLCs and DLAMIs starting this release\n-------------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.22 <neuron-2.22.0-whatsnew>`, Neuron no longer includes offerings for Ubuntu20 Deep Learning Containers (DLCs) and Deep Learning AMIs (DLAMIs). \n\nCustomers using Ubuntu20 DLCs and DLAMIs should migrate to newer versions based on Ubuntu22 or Amazon Linux 2023. For customers who need to continue using Ubuntu20, you can create custom AMIs based on the Ubuntu20 base image and install Neuron components manually. Please see :ref:`container-faq` and :ref:`neuron-dlami-overview`. \n\nPlease note that this does not affect support for the base Ubuntu20 operating system, which will continue to receive updates as per our standard support policy. For more information, please see :ref:`sdk-maintenance-policy`\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-al2.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: end-support-al2\n\n.. _eos-al2:\n\nNeuron Runtime no longer supports Amazon Linux 2 (AL2)\n========================================================\n\nStarting from :ref:`Neuron release 2.20 <neuron-2.20-whatsnew>`, the Neuron Runtime (``aws-neuronx-runtime-lib``) will no longer support Amazon Linux 2 (AL2). \nThe Neuron Driver (``aws-neuronx-dkms``) is now the only Neuron package that supports Amazon Linux 2. \nHowever, the Neuron Driver requires Linux kernel 5.10 or higher. Since default AL2 AMIs ship with kernel 4.14, you must upgrade your AL2 kernel to 5.10+ before installing driver versions 2.18 and later, or migrate to Amazon Linux 2023 or Ubuntu which include compatible kernels by default.\n\nThis change introduces the following constraint:\n\nCustomers cannot run their full Neuron-powered applications natively on an AL2-based Amazon Machine Image (AMI). To leverage Neuron functionality on an AL2 AMI, customers must containerize their applications using a Neuron supported container with non-AL2 Linux distribution (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.) and then deploy those containers on an AL2-based AMI that has the Neuron Driver (``aws-neuronx-dkms``) installed.\n\nHow does this impact me?\n------------------------\n\n**I have an AL2 DLAMI**\n\nIf you are using one of the following Amazon\nLinux 2 DLAMIs, please migrate to a supported DLAMI (e.g., Ubuntu 22.04, Amazon Linux 2023 (AL2023), etc.). Please see :ref:`neuron-dlami-overview` for\na list of all supported DLAMIs to migrate to.\n\n+-----------------+------------------+-----------------------------------------------------------+\n|    Framework    | Operating System |                        DLAMI Name                         |\n+=================+==================+===========================================================+\n|  PyTorch 1.13   |  Amazon Linux 2  |  Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2)   |\n+-----------------+------------------+-----------------------------------------------------------+\n| TensorFlow 2.10 |  Amazon Linux 2  | Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) |\n+-----------------+------------------+-----------------------------------------------------------+\n\n**I am using my own AL2 Container**\n\nIf you using your own AL2 Container, please migrate to a Neuron supported container with non-AL2 Linux distribution (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.)\n\n**I am using a base AL2 DLAMI**\n\nIf you are using a base Amazon Linux 2 DLAMI, please ensure the Neuron Driver (``aws-neuronx-dkms``) is the only Neuron package installed. Please use non AL2 (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.) containers to run your Neuron applications.\n\n.. note::\n   Neuron does not supports Linux kernel versions < 5.10. Customers using\n   Linux kernel versions < 5.10 must migrate to >= 5.10.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-device-version.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: eos-neuron-device, neuron-device-\n\n.. _eos-neurondevice:\n\n'neurondevice' resource name in Neuron Device K8s plugin no longer supported\n------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.20 <neuron-2-20-2-whatsnew>`, Neuron no longer supports resource name 'neurondevice'. \n\nNeuron device plugin is a Neuron Software component that gets installed in Kubernetes environment. The resource name 'neurondevice' enables customers to allocate devices to the Neuron K8s container.\n\nIn this release, we renamed resource name 'neurondevice' to 'neuron' to maintain consistency. Customers who are using the resource name 'neurondevice' in their YAML file need to update to use 'neuron'.\n\nPlease see :ref:`k8s-neuron-device-plugin` for more details.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-jax-neuronx-nki-call.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: \n\n.. _announce-eos-jax-neuronx-features:\n\nNeuron no longer supports ``jax_neuronx.nki_call`` API in ``jax-neuronx`` starting this release\n-------------------------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>` no longer supports ``jax_neuronx.nki_call`` API in ``jax-neuronx`` package.\n\nFor a full list of features that require ``jax-neuronx``, please see :ref:`jax-neuron-known-issues`. \n\nCustomers using ``jax_neuronx.nki_call`` API will need to switch invocations to directly call functions annotated with ``@nki.jit``.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-llama3-2-checkpoint.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-no-longer-support-llama-checkpoint\n\n.. _announce-no-longer-support-llama-32-meta-checkpoint:\n\nAnnouncing end of support for Llama 3.2 Meta checkpoint\n---------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.24 <neuron-2-24-0-whatsnew>`, the mllama 3.2 Meta checkpoint API is no longer be supported.\n\n**I currently use the mllama 3.2 Meta checkpoint in my applications. What do I do?**\n\nAll previously converted checkpoints will continue to function without disruption. Customers' existing workflows and converted models remain fully operational. For new checkpoint conversions, customers are advised to use the Hugging Face solution which provides equivalent functionality. Hugging Face's official conversion script is available here:\n`HuggingFace Conversion Script <https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/convert_mllama_weights_to_hf.py>`_\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-nemo-megatron.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-no-support-nemo-megatron\n\n.. _announce-no-support-nemo-megatron:\n\nNeuron no longer supports NeMo Megatron starting this release\n---------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.23 <neuron-2.23.0-whatsnew>`, Neuron no longer supports :ref:`NeMo Megatron <nemo-megatron-index>`. \n\nAll users of :ref:`nemo-megatron-index` are requested to migrate their training workloads to :ref:`NxD Training <nxd-training-overview>`. Please refer to :ref:`Neuron NeMo Megatron to NeuronX Distributed Training Migration Guide <nxdt_developer_guide_migration_nnm_nxdt>` for guidance.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-neurondevice.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: eos-neuron-device-version, neuron-device-version\n\n.. _eos-neuron-device-version:\n\n'neuron-device-version' field in neuron-monitor no longer supported\n--------------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.20 <neuron-2-20-2-whatsnew>`, Neuron no longer supports the field 'neuron-device-version' in neuron-monitor.\n\nCustomers who are using the field 'neuron-device-version' will instead need to use 'instance_type' field in the 'instance_info' section and the 'neuroncore_version' field to obtain neuron device information.\n\nPlease see :ref:`neuron-monitor-ug` for more details.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-nki-jit-torch.rst",
    "content": ".. post:: June 24, 2025\n    :language: en\n    :tags: announce-no-longer-support-nki-jit\n\n.. _announce-no-longer-support-nki-jit:\n\nNeuron no longer supports nki_jit API in PyTorch Neuron starting this release\n--------------------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.24 <neuron-2-24-0-whatsnew>`, ``torch_neuronx.nki_jit`` API in ``torch-neuronx`` package is no longer supported.\n\n**I currently use nki_jit in my PyTorch models. What do I do?**\n\nCustomers using ``torch_neuronx.nki_jit`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-tensorboard-plugin.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-no-support-tensorboard\n\n.. _announce-no-support-tensorboard-plugin:\n\nNeuron no longer supports TensorBoard Plugin for Neuron Profiler starting with Neuron 2.28\n-------------------------------------------------------------------------------------------\n\nStarting with Neuron 2.28, Neuron no longer supports TensorBoard Plugin for Neuron Profiler. All customers using TensorBoard Plugin to visualize and analyze model performance are recommended to migrate to Neuron Explorer.\n\nTo start using Neuron Explorer (available since Neuron 2.27) to profile your workloads, please see the :doc:`Neuron Explorer Getting Started guide </tools/neuron-explorer/get-started>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-tensorflow1-x.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: no-support-tensorflow-eos, tf-versions-1-x-no-support\n\n.. _announce-tfx-no-support:\n\nTensorflow-Neuron 1.x no longer supported\n------------------------------------------\n\nStarting with :ref:`Neuron release 2.20 <neuron-2-20-2-whatsnew>`, Neuron no longer supports Tensorflow-Neuron 1.x. \nCurrent users of those versions are advised to migrate to latest tensorflow-neuron version 2.10.1. Please see :ref:`TensorFlow Neuron <tensorflow-neuron-main>` for more details.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-tensorflow2-10.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announce-no-support-tensorflow2-10\n\n.. _announce-no-support-tensorflow2-10:\n\nNeuron no longer supports tensorflow_2_10 single framework DLAMI and virtual environment in multi-framework DLAMIs starting with Neuron 2.27\n----------------------------------------------------------------------------------------------------------------------------------------------\n\nStarting with the release of Neuron 2.27.0, the ``tensorflow_2_10`` single framework Deep Learning AMI (DLAMI) and the TensorFlow 2.10 virtual environment in multi-framework DLAMIs are no longer supported.\n\nUsers are advised to use previously released DLAMIs for TensorFlow 2.10 support, or migrate to newer supported TensorFlow versions. For more information on supported versions, refer to :doc:`the list of current Neuron-supported package and library versions </release-notes/releasecontent>`.\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-tf-versions.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-no-support-tensorflow-eos\n\n.. _announce-no-support-tensorflow-eos:\n\nNeuron no longer supports Tensorflow 2.8 and 2.9 starting this release\n-----------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>`, Neuron no longer supports for TensorFlow-Neuron 2.8 and 2.9 versions. \n\nCurrent users of those versions are advised to migrate to latest TensorFlow version (2.10). For a list of supported versions, please see :ref:`latest-neuron-release-artifacts`."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-torch-neuron-versions.rst",
    "content": ".. post:: December 20, 2024\n    :language: en\n    :tags: announce-no-support-torch-neuron\n\n.. _announce-no-support-torch-neuron:\n\nPyTorch Neuron versions 1.9 and 1.10 no longer supported\n----------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.21 <neuron-2.21.0-whatsnew>`, Neuron no longer supports torch-neuron 1.9 and 1.10 versions. Current users of torch-neuron 1.9 and 1.10 versions are advised to migrate to the latest torch-neuron supported version. Please see :ref:`latest-neuron-release-artifacts`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-ubuntu-20-base.rst",
    "content": ".. post:: May 15, 2025\n    :language: en\n    :tags: announce-u20-base-no-support\n\n.. _announce-u20-base-no-support:\n\nNeuron no longer supports base Ubuntu 20 operating system starting this release\n--------------------------------------------------------------------------------\n\n:ref:`Neuron Release 2.23 <neuron-2.23.0-whatsnew>` no longer includes support for base Ubuntu 20.04 operating system. \n\nCustomers using Ubuntu 20.04 are required to migrate their workloads to Ubuntu 22.04 or another supported operating system. Please refer to :ref:`neuron-dlami-overview` for guidance on Neuron supported operating systems. \n\nFor more information on the Neuron operating system support policy, please see :ref:`sdk-maintenance-policy`."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-no-support-vllm-v0.rst",
    "content": ".. post:: February 26, 2026\n    :language: en\n    :tags: announce-no-support-vllm\n\n.. _announce-no-support-vllm-v0:\n\nNeuron no longer supports vLLM V0 starting with Neuron 2.28\n------------------------------------------------------------\n\nStarting with Neuron 2.28 release, vLLM V0 will no longer be supported. This includes the vLLM V0 Neuron forks in the AWS Neuron `upstreaming-to-vllm GitHub repo <https://github.com/aws-neuron/upstreaming-to-vllm>`__ and vLLM V0-based Neuron Inference Deep Learning Containers.\n\nCustomers are recommended to use vLLM V1-based inference containers as documented in the :doc:`vLLM V1 user guide </libraries/nxd-inference/developer_guides/vllm-user-guide-v1>`. Additionally, Neuron will be updating existing vLLM-based tutorials to use vLLM V1 in the coming release.\n\nSee :ref:`vLLM on Neuron <nxdi-vllm-user-guide-v1>` for more information on vLLM V1 support.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-nxdi-changes.rst",
    "content": ".. post:: December 19, 2025\n    :language: en\n    :tags: announce-nxdi-changes\n\n.. _announce-nxdi-changes:\n\nAnnouncing changes to NxDI in the upcoming releases\n====================================================\n\nAs part of our transition to native PyTorch support, we are simplifying NxDI to provide a more streamlined developer experience.\n\n**What's changing:**\n\nIn the upcoming releases, we will introduce NxDI v2 that will not use NxDI ModelBuilder APIs. Instead, it will use ``torch.compile`` for model compilation. We will also simplify the NxDI APIs for modeling to align with native PyTorch primitives.\n\n**Timeline and migration:**\n\nWhile we introduce these changes, we will maintain both NxDI v1 and NxDI v2 simultaneously to ensure a smooth migration path for our customers. We will provide detailed migration guidance, timelines, and updated documentation as we approach the transition. More information about the migration path and specific release dates will be shared in the next release (Neuron 2.28).\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-package-change.rst",
    "content": ".. post:: September 16, 2024\n    :language: en\n    :tags: announce-nxdcore, neuron-component-nxdcore\n\n.. _announce-component-name-change-nxdcore:\n\nAnnouncing Name Change for Neuron Component \n---------------------------------------------\n\nStarting with :ref:`Neuron release 2.20 <neuron-2-20-2-whatsnew>`, the name of the following Neuron component will change as follows:\n\n======================= ======================= ============================ ==================\nPackage name            Current Name             New Name                     Abbreviation\n======================= ======================= ============================ ==================\nneuronx-distributed     NeuronX Distributed      NeuronX Distributed Core     NxD Core\n======================= ======================= ============================ ==================\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-python38-no-longer-support.rst",
    "content": ".. post:: April 3, 2025\n    :language: en\n    :tags: announce-python-version-no-longer-support\n\n.. _announce-python-no-longer-support:\n\nNeuron no longer includes Python 3.8 support starting this release\n-------------------------------------------------------------------\n\nStarting with :ref:`Neuron Release 2.22 <neuron-2.22.0-whatsnew>`, Neuron no longer includes support for Python 3.8 as it has its reached end-of-life status.\n\n=========================\nHow does this impact me?\n=========================\n\nI currently use Python 3.8.\n============================\n\nTo avoid security issues and bugs, current users of Python 3.8 are advised to migrate to a Neuron supported Python version (3.9, 3.10, or 3.11) as Neuron no longer supports Python 3.8. For a list of supported Python versions according to Neuron package, please see :ref:`latest-neuron-release-artifacts`.\n\nI currently use Ubuntu 20, which has Python 3.8 as the default version. Am I affected?\n=======================================================================================\n\nAlthough Python 3.8 is the default version of Ubuntu 20.04, Neuron will continue to support Ubuntu 20.04 until April 2025, due to extended standard support of Python 3.8 in Ubuntu 20. Please see the :ref:`sdk-maintenance-policy` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announce-transition-pytorch-trainium.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announce-transition-pytorch-trainium\n\n.. _announce-transition-pytorch-trainium:\n\nAnnouncing Transition to PyTorch Native Support for AWS Trainium in the Next Neuron Release Supporting PyTorch 2.10\n------------------------------------------------------------------------------------------------------------------------\n\nStarting with the introduction of Neuron support for PyTorch 2.10, AWS Neuron will begin a transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 will be the last version based on PyTorch/XLA.\n\nWhat's changing\n^^^^^^^^^^^^^^^^\n\n* If you are using PyTorch 2.9, it will be the last version of it that uses the PyTorch/XLA backend in Neuron.\n* For PyTorch 2.10 and later users, Neuron will provide Native PyTorch support via TorchNeuron.\n\nCustomers using PyTorch/XLA-based training should migrate to native PyTorch with TorchNeuron, which provides:\n\n* Native PyTorch eager execution mode\n* Standard distributed primitives (DTensor, FSDP, DDP)\n* ``torch.compile`` support\n* Compatibility with frameworks like TorchTitan (PyTorch Training Library)\n\nFor more information about native PyTorch on Neuron and migration guidance, see :doc:`Native PyTorch for AWS Trainium </frameworks/torch/pytorch-native-overview>`.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-end-of-support-neuronxcc-nki.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-end-of-support-neuronxcc-nki\n\n.. _announcement-end-of-support-neuronxcc-nki:\n\nAnnouncing End of Support for neuronxcc.nki Namespace Starting with Neuron 2.28\n--------------------------------------------------------------------------------\n\nNeuron 2.27 will be the last to include support for the neuronxcc.nki.* namespace. Starting with Neuron 2.28, this namespace will no longer be supported.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs. \n\nExisting kernels using ``neuronxcc.nki.*`` must migrate to the new nki.* namespace. A kernel migration guide is available in the Neuron 2.27 documentation.\n\nSee :doc:`the NKI Kernel Migration Guide </nki/deep-dives/nki-migration-guide>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-end-of-support-nxdt-nxd-core.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-end-of-support-nxdt-nxd-core\n\n.. _announcement-end-of-support-nxdt-nxd-core:\n\nAnnouncing End of Support for NxDT and NxD Core Training APIs Starting with PyTorch 2.10\n-----------------------------------------------------------------------------------------\n\nNeuron support for PyTorch 2.9 will be the last to include the NeuronX Distributed Training (NxDT) libraries, NxD Core training APIs, and PyTorch/XLA for training. Starting with Neuron support for PyTorch 2.10, these components will no longer be supported.\n\nHow does this impact you\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nExisting NxDT/NxD Core users should stay on PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.28 and PyTorch 2.10. A migration guide will be published in a coming release.\n\nSee :doc:`Native PyTorch on Neuron Overview </frameworks/torch/pytorch-native-overview>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-end-of-support-parallel-model-trace.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-end-of-support-parallel-model-trace\n\n.. _announcement-end-of-support-parallel-model-trace:\n\nNeuron no longer supports parallel_model_trace API starting with Neuron 2.27\n-----------------------------------------------------------------------------\n\nStarting with the Neuron 2.27 release, the :ref:`parallel_model_trace API <nxd_tracing>` is no longer supported for inference. We introduced the :doc:`Model Builder V2 API </libraries/neuronx-distributed/model_builder_v2_api_reference>` in Neuron 2.25 as an alternative to the tracing API, and it is now the default API in Neuron for model tracing.\n\nCustomers can migrate to the Model Builder V2 API by following the reference `Llama-3.2-1B inference sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`__.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-end-of-support-pytorch-2-6.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-end-of-support-pytorch-2-6\n\n.. _announcement-end-of-support-pytorch-2-6:\n\nNeuron no longer supports PyTorch 2.6 starting with Neuron 2.27\n---------------------------------------------------------------\n\nStarting with Neuron 2.27, Neuron no longer supports PyTorch 2.6. We recommend that all customers using PyTorch 2.6 to upgrade to the latest supported PyTorch version.\n\nCustomers currently using PyTorch 2.6 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :doc:`the list of current Neuron-supported package and library versions </release-notes/releasecontent>`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-end-of-support-vllm-v0.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-end-of-support-vllm-v0\n\n.. _announcement-end-of-support-vllm-v0:\n\nAnnouncing End of Support for vLLM V0 starting with Neuron 2.28\n----------------------------------------------------------------\n\nNeuron Release 2.27 will be the last release to support vLLM V0. In Neuron 2.27 release, vLLM V1 support is introduced for Neuron using the ``vllm-neuron`` plugin. Review the sources in the `Neuron vLLM GitHub Repository <https://github.com/vllm-project/vllm-neuron>`__.\n\nStarting with the Neuron 2.28 release, vLLM V0 will not be supported. Support will be dropped for vLLM V0 Neuron forks of the `upstreaming-to-vllm <https://github.com/aws-neuron/upstreaming-to-vllm/>`__ Neuron GitHub repo, along with vLLM V0-based Neuron Inference Deep Learning Containers.\n\nCustomers should migrate to vLLM V1 using the :doc:`vLLM V1 user guide </libraries/nxd-inference/developer_guides/vllm-user-guide-v1>`. Customers are recommended to start using vLLM V1 based inference containers that are released with Neuron v2.27.0. We plan to update the existing vLLM-based tutorials to use vLLM V1 in the coming release.\n\nSee :doc:`vLLM on Neuron </libraries/nxd-inference/vllm/index>` for more information on vLLM V1.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-nki-library-kernel-migration.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-nki-library-kernel-migration\n\n.. _announcement-nki-library-kernel-migration:\n\nAnnouncing NKI Library Kernel Migration to New nki.* Namespace in Neuron 2.28\n------------------------------------------------------------------------------\n\nSome NKI Library kernels currently use the legacy ``neuronxcc.nki.*`` namespace. Starting with Neuron 2.28, all NKI Library kernels will migrate to the new ``nki.*`` namespace.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs that improve usability and performance. This transition ensures consistency across all NKI kernels and allows us to focus development efforts on a single, modern namespace.\n\nSee :doc:`the NKI Kernel Migration Guide </nki/deep-dives/nki-migration-guide>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-nki-library-namespace-changes.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-nki-library-namespace-changes\n\n.. _announcement-nki-library-namespace-changes:\n\nAnnouncing NKI Library Namespace Changes in Neuron 2.28\n--------------------------------------------------------\n\nNKI Library kernels are published in the `NKI Library GitHub repository <https://github.com/aws-neuron/nki-library>`__. In Neuron 2.27, these kernels are also shipped as part of neuronx-cc using the nkilib.* namespace. To avoid namespace conflicts when customers use kernels from the open source repository, the repository uses the ``nkilib_standalone.nkilib.*`` namespace.\n\nStarting with Neuron 2.28 the open source repository namespace will change from ``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between the open source repository and the shipped version.\n\nSee :doc:`NKI Library </nki/library/index>` for more information.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/announcement-python-3-9-eol.rst",
    "content": ".. post:: December 16, 2025\n    :language: en\n    :tags: announcement-python-3-9-eol\n\n.. _announcement-python-3-9-eol:\n\nNeuron no longer supports Python 3.9 starting with Neuron version 2.27\n-----------------------------------------------------------------------\n\nStarting with Neuron Release 2.27, Neuron no longer includes support for Python 3.9 as it has reached its end-of-life status.\n\nIf you currently use Python 3.9, you are advised to migrate to a Neuron supported Python version (3.10, 3.11 or 3.12) to avoid security issues and bugs.\n\nFor a list of supported Python versions according to Neuron package, refer to :doc:`the list of current Neuron-supported package and library versions </release-notes/releasecontent>`."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/dlami-neuron-2.10.rst",
    "content": ".. post:: May 02, 2023 11:00\n    :language: en\n    :tags: dlami, pytorch, trn1, inf2, inf1\n\n.. _announce-dlc-sm-neuron-2.9.1:\n\nAWS Deep Learning AMIs now available with Neuron 2.10 version\n-------------------------------------------------------------\n\nWe are happy to announce that the following Deep Learning AMIs are now available with latest Neuron Version 2.10. These DLAMIs now support\nall the Neuron EC2 instances including Inf1, Inf2, Trn1/Trn1n.\n\nYou can access the AMIs at the following URLs\n\n* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-ubuntu-20-04/>`__\n* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-amazon-linux-2/>`__\n* `AWS Deep Learning AMI Base Neuron (Ubuntu 20.04) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/>`__\n* `AWS Deep Learning AMI Base Neuron (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-amazon-linux-2/>`__\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/dlami-neuron-2.12.rst",
    "content": ".. post:: July 26, 2023 11:00\n    :language: en\n    :tags: dlami, pytorch, trn1, inf2, inf1\n\n.. _announce-dlami-neuron-2.12:\n\nAWS Deep Learning AMIs now available with Neuron 2.12 version\n-------------------------------------------------------------\n\nWe are happy to announce that the following Deep Learning AMIs are now available with latest Neuron Version 2.12. \n\nYou can see more about the AMIs at the following URLs\n\n* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-ubuntu-20-04/>`__\n* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-13-amazon-linux-2/>`__\n* `AWS Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-tensorflow-2-10-ubuntu-20-04/>`__\n* `AWS Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-tensorflow-2-10-amazon-linux-2/>`__\n* `AWS Deep Learning AMI Base Neuron (Ubuntu 20.04) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/>`__\n* `AWS Deep Learning AMI Base Neuron (Amazon Linux 2) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-amazon-linux-2/>`__\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.rst",
    "content": ".. post:: Nov 02, 2022 00:01\n    :language: en\n    :tags: dlami, pytorch\n\n.. _announce-dlami-neuron-pytorch:\n\nIntroducing AWS Deep Learning AMI Neuron PyTorch\n------------------------------------------------\n\nWe are happy to announce that Deep Learning AMI (DLAMI) with pre-installed PyTorch Neuron (``torch-neuronx``) is now available, for more information see:\n\n* `AWS Deep Learning AMI Neuron PyTorch 1.11 \\(Amazon Linux 2\\) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-11-amazon-linux-2/>`_\n\n* `AWS Deep Learning AMI Neuron PyTorch 1.11 \\(Ubuntu 20.04\\) <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-neuron-pytorch-1-11-ubuntu-20-04/>`_\n\nThe Neuron Setup Guide will be updated soon to include the DLAMI PyTorch Neuron."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/end-of-support-pt2.rst",
    "content": ".. post:: February 2, 2024\n    :language: en\n    :tags: eos-pt-two, pt-two\n\n.. _eos_pytorch2:\n\nPyTorch NeuronX version 2.0 (Beta) no longer supported\n-------------------------------------------------------\n\n:ref:`Neuron release 2.17 <neuron-2.17.0-whatsnew>` no longer supports PyTorch NeuronX version 2.0 (Beta). \n\nCurrent users of PyTorch NeuronX version 2.0 are advised to migrate to PyTorch NeuronX 2.1 (Beta).\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/github-changes.rst",
    "content": ".. post:: Oct 10, 2022 02:00\n    :language: en\n    :tags: github\n\n.. _announce-aws-neuron-github-org:\n\nIntroducing New Neuron GitHub Repositories\n------------------------------------------\n\nStarting with Neuron release 2.3, Neuron Github repositories will be migrated\nto the new `AWS Neuron GitHub Organization <https://github.com/aws-neuron>`_. \n\nThe new AWS Neuron GitHub Organization will include the `Neuron SDK GitHub <https://github.com/aws-neuron/aws-neuron-sdk>`_ repository and will include the following additional new GitHub repositories:\n\n.. list-table:: AWS Neuron GitHub Organization \n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * \t- New GitHub repository\n     \t- Description\n\n   * \t- `AWS Neuron Samples <https://github.com/aws-neuron/aws-neuron-samples>`_\n     \t- Repository that hosts examples and scripts used in the Neuron documentation tutorials\n\n   * \t- `AWS Neuron Reference for Megatron-LM <https://github.com/aws-neuron/aws-neuron-reference-for-megatron-lm>`_\n     \t- Repository that hosts Neuron support for Megatron-LM\n\n   * \t- `AWS Neuron Samples for AWS ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`_\n     \t- Repository that hosts Neuron support for AWS ParallelCluster\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/gpg-expiration.rst",
    "content": ".. post:: Nov 10, 2022 00:01\n    :language: en\n    :tags: dlami, pytorch\n\n.. _announce-dlami-neuron-pytorch:\n\nNeuron GPG key for Ubuntu installation has expired\n--------------------------------------------------\n\nGPG, or GNU Privacy Guard, is a public key cryptography implementation. This allows for the secure transmission of information between parties and can be used to verify that the origin of a message is genuine.\n\nThe GPG key for the Neuron repository (https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB) is installed on the Ubuntu (Canonical) server, the key was uploaded originally with an expiry date of three (3) years, which has expired on 11/10/22.\n\nPlease see :ref:`gpg_key_update` for instructions how to update the Neuron repository GPG keys."
  },
  {
    "path": "about-neuron/announcements/neuron2.x/neuron-rtd-eol.rst",
    "content": ".. post:: Oct 10, 2022 01:00\n    :language: en\n    :tags: eol, neuron2.x\n\n.. _announce-neuron-rtd-eol:\n\nAnnouncing Neuron Runtime 1.x (``neuron-rtd``) end-of-support\n-------------------------------------------------------------\n\nStarting with Neuron release 2.3, Neuron components like Neuron System Tools\nand Neuron Driver will no longer support Neuron Runtime 1.x.\n\nIn addition, starting with Neuron release 2.3, the `AWS Neuron Runtime Proto GitHub <https://github.com/aws-neuron/aws-neuron-runtime-proto>`_  and `AWS Neuron Driver GitHub <https://github.com/aws-neuron/aws-neuron-driver>`_ repositories will no longer be supported.\n\nWhy are we removing support for Neuron Runtime 1.x?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron Runtime 1.x (``neuron-rtd``) entered :ref:`maintenance mode <maintenance_rtd>` when Neuron 1.16.0 \nwas released. While Neuron components like Neuron Driver and Neuron System Tools continued to support \nNeuron Runtime 1.x in addition to supporting Neuron Runtime 2.x, Neuron supported frameworks (e.g. PyTorch Neuron,\nTensorFlow Neuron, and MXNet Neuron) stopped supporting Neuron Runtime 1.x starting with Neuron 1.16.0. \nFor detailed information see :ref:`introduce-libnrt`.\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/neuron2-intro.rst",
    "content": ".. post:: Oct 10, 2022 04:00\n    :language: en\n    :tags: neuron2.x\n\n.. _neuron2-intro:\n\nIntroducing the first release of Neuron 2.x enabling EC2 Trn1 General Availability (GA)\n=======================================================================================\n\nNeuron release 2.3 is the first release of Neuron 2.x that enables GA of the new EC2 Trn1 instances.\nNeuron release 2.3 extends the latest release of Neuron 1.x (Neuron 1.19.2), adding support for Deep Learning training on the AWS Trainium chips.\n\nStarting with Neuron release 2.3, developers can run Deep Learning training workloads on Trn1 instances, saving training costs by up to \n50% over equivalent GPU-based EC2 instances, while achieving the highest training performance in the AWS cloud for popular NLP models.  Neuron 2.x introduces new capabilities and major architectural updates to support training neural-networks with the Trn1 instances. \n\nIn addition, starting with this release, Neuron introduces new packages, renames several packages, \nand updates Neuron installation and update instructions. This release also ends support for Neuron Runtime 1.x.\n\nMore about the release\n----------------------\n\n.. include:: /release-notes/templates/n2.x-trn1-ga-quick.txt\n\n\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/neuron230-packages-changes.rst",
    "content": ".. post:: Oct 10, 2022 03:00\n    :language: en\n    :tags: neuron2.x\n\n.. _neuron-packages-changes:\n\nIntroducing Packaging and installation changes\n----------------------------------------------\n\nStarting with Neuron release 2.3, Neuron introduces changes in Neuron packages and installation instructions.\n\n.. contents::  Table of contents\n   :local:\n   :depth: 2\n\n.. _neuron-new-packages:\n\nNew Neuron packages\n^^^^^^^^^^^^^^^^^^^\n\nStarting with Neuron release 2.3, Neuron introduces the following new packages:\n\n.. list-table:: New Neuron packages\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - New Package\n     - Package Type\n     - Description\n     - Supported Instances \n     \n       (At the time of releasing Neuron release 2.3)\n\n   * - ``torch-neuronx``\n     - .whl (pip)\n     - PyTorch Neuron package using `PyTorch XLA <https://pytorch.org/xla>`_ \n     - Trn1\n\n   * - ``neuronx-cc``\n     - .whl (pip)\n     - Neuron Compiler with XLA front-end\n     - Trn1\n\n   * - ``aws-neuronx-runtime-lib``\n     - .deb (apt), .rpm (yum)\n     - Neuron Runtime library\n     - Trn1\n\n   * - ``aws-neuronx-collective``\n     - .deb (apt), .rpm (yum)\n     - Collective Communication library          \n     - Trn1\n\n   * - ``aws-neuronx-tools``\n     - .deb (apt), .rpm (yum)\n     - Neuron System Tools\n     - Trn1\n\n.. note::\n\n   In next releases ``aws-neuronx-tools`` and ``aws-neuronx-runtime-lib`` will add support for Inf1.\n\nWhy are we introducing new Neuron packages?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo add Neuron support for training neural-networks, Neuron 2.x introduces new capabilities and major architectural updates. For example, Neuron adds support for Collective Communication Operations, in :ref:`new packages <neuron-new-packages>` such as ``aws-neuron-collective``. \n\nIn addition, some of those updates and new capabilities are not backward compatible, for example the Pytorch Neuron package that adds support for training neural-networks uses `PyTorch XLA <https://pytorch.org/xla>`_ as a backend. To reduce the possibility of customers using features that are not backward compatible, the new capabilities are introduced in new Neuron packages. For example, PyTorch Neuron and Neuron Compiler will support  different packages for Inf1 and for Trn1: ``torch-neuron`` and ``neuron-cc`` will support Inf1 instances, and ``torch-neuronx`` and ``neuronx-cc`` will support Trn1 instances.\n\n.. _neuron-packages-renaming:\n\nRenamed Neuron Packages\n^^^^^^^^^^^^^^^^^^^^^^^\n\nStarting with Neuron release 2.3, the following  Neuron packages will change names: \n\n\n.. 
list-table:: Neuron package with changed names\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size   \n\n   * - New name\n     - Old name (deprecated package)\n     - Package Type\n     - Description\n     - Supported Instances \n\n   * - ``aws-neuronx-oci-hooks``\n     - ``aws-neuron-runtime-base``\n     - .deb (apt), .rpm (yum)\n     - OCI Hooks support\n     - Trn1, Inf1\n\n   * - ``aws-neuronx-dkms``\n     - ``aws-neuron-dkms``\n     - .deb (apt), .rpm (yum)\n     - Neuron Driver\n     - Trn1, Inf1     \n\n\n\n\n   * - ``aws-neuronx-k8-plugin``\n     - ``aws-neuron-k8-plugin``\n     - .deb (apt), .rpm (yum)\n     - Neuron Kubernetes plugin\n     - Trn1, Inf1\n\n   * - ``aws-neuronx-k8-scheduler``\n     - ``aws-neuron-k8-scheduler``\n     - .deb (apt), .rpm (yum)\n     - Neuron Scheduler plugin\n     - Trn1, Inf1\n\nWhy are we changing package names?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo avoid situations where customers may accidentally install Neuron packages with features that are not backward compatible, we have introduced additional packages with different names for the same Neuron component. \n\n.. _neuron-installation-instruction-change:\n\nUpdated installation and update instructions \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStarting with Neuron release 2.3, Neuron installation and update instructions will include pinning of the major version of the Neuron package. For example, to install latest Neuron tools package, call ``sudo apt-get install aws-neuronx-tools=2.*`` and to install latest PyTorch Neuron package for Trn1, call ``pip install torch-neuronx==1.11.0.1.*``. \n\n\nWhy are we changing installation and update instructions?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Neuron installation and update instructions now guide customers to pin the major version of the different Neuron packages as mentioned in :ref:`neuron-installation-instruction-change`. This is done to future-proof instructions for new, backwards-incompatible major version releases.\n\n.. note:: The change of the installation and update instructions will not include instruction to install or update ``torch-neuron`` and ``neuron-cc``.\n\nWhat do I need to do?\n~~~~~~~~~~~~~~~~~~~~~\n\nPlease follow the :ref:`Neuron setup guide <setup-guide-index>` to update to latest Neuron releases.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/neuron250-packages-changes.rst",
    "content": ".. post:: Nov 22, 2022 03:00\n    :language: en\n    :tags: neuron2.x\n\n.. _neuron250-packages-changes:\n\nIntroducing Neuron packaging and installation changes for Inf1 customers\n------------------------------------------------------------------------\n\nStarting with :ref:`Neuron release 2.5 <neuron-2.5.0-whatsnew>`, Neuron introduces changes in Neuron packages and installation instructions for Inf1, the following  Neuron packages will change names: \n\n\n.. list-table:: Neuron package with changed names for Inf1\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size   \n\n   * - New name\n     - Old name (deprecated package)\n     - Package Type\n     - Description\n     - Supported Instances \n\n   * - ``aws-neuronx-tools``\n     - ``aws-neuron-tools``\n     - .deb (apt), .rpm (yum)\n     - System Tools\n     - Trn1, Inf1\n\n   * - ``aws-neuronx-dkms``\n     - ``aws-neuron-dkms``\n     - .deb (apt), .rpm (yum)\n     - Neuron Driver\n     - Trn1, Inf1     \n\n   * - ``aws-neuronx-k8-plugin``\n     - ``aws-neuron-k8-plugin``\n     - .deb (apt), .rpm (yum)\n     - Neuron Kubernetes plugin\n     - Trn1, Inf1\n\n   * - ``aws-neuronx-k8-scheduler``\n     - ``aws-neuron-k8-scheduler``\n     - .deb (apt), .rpm (yum)\n     - Neuron Scheduler plugin\n     - Trn1, Inf1\n\n   * - ``tensorflow-model-server-neuronx``\n     - ``tensorflow-model-server-neuron``\n     - .deb (apt), .rpm (yum)\n     - tensorflow-model-server\n     - Trn1, Inf1\n\n\nPlease follow the :ref:`Neuron setup guide <setup-guide-index>` to update to latest Neuron releases.\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/release-neuron2.4.rst",
    "content": ""
  },
  {
    "path": "about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1.rst",
    "content": ".. post:: Apr 26, 2023 11:00\n    :language: en\n    :tags: sagemaker, pytorch, trn1, inf2\n\n.. _announce-dlc-sm-neuron-2.9.1:\n\nPyTorch 1.13 Deep Learning Container for Inf2 & Trn1/Trn1n now available for SageMaker \n--------------------------------------------------------------------------------------\n\nWe are happy to announce that an updated Deep Learning Container that supports PyTorch 1.13 and Neuron 2.9.1 versions is now available for Sagemaker Training.\n\nFor more information see `Neuron Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers>`_\n\n\n"
  },
  {
    "path": "about-neuron/announcements/neuron2.x/sm-training-trn1-introduce.rst",
    "content": ".. post:: Nov 03, 2022 00:01\n    :language: en\n    :tags: sagemaker, pytorch, trn1\n\n.. _announce-dlami-neuron-pytorch:\n\nAmazon SageMaker now supports Trn1 training jobs\n------------------------------------------------\n\nWe are happy to announce that Amazon SageMaker now supports running training jobs on ml.trn1 instance types.\n\nFor more information see `Distributed Training with PyTorch Neuron on Trn1 instances <https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#distributed-training-with-pytorch-neuron-on-trn1-instances>`_\n\nThe Neuron Developer Flows section will be updated soon.\n"
  },
  {
    "path": "about-neuron/appnotes/index.rst",
    "content": ".. _neuron-appnotes-index:\n.. _neuron-appnotes:\n\n.. meta::\n   :description: AWS Neuron SDK application notes for support announcements, performance optimization, migration guides, and framework-specific implementations.\n   :date-modified: 2025-10-03\n\nNeuron application notes\n========================\n\n.. toctree:: \n   :maxdepth: 2\n   :hidden:\n\n   Neuron Runtime Library <neuron1x/introducing-libnrt>\n   Performance <perf/neuron-cc/performance-tuning>\n   Parallel execution <perf/neuron-cc/parallel-ncgs>\n   PyTorch for Neuron <torch-neuron/index>\n   PyTorch for NeuronX <torch-neuronx/index>\n\nApplication notes provide specific documentation for support announcements, migration guides, performance optimization techniques, and framework-specific implementations for AWS Neuron SDK components.\n\n\nFramework integration\n---------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: torch-neuron-r-cnn-app-note\n      :link-type: ref\n\n      **PyTorch Neuron (Inf1)**\n      ^^^\n      R-CNN implementation and optimization techniques for PyTorch on ``Inf1``\n\n   .. grid-item-card::\n      :link: torch-neuronx-graph-partitioner-app-note\n      :link-type: ref\n\n      **PyTorch NeuronX Graph Partitioner**\n      ^^^\n      Advanced graph partitioning strategies for distributed training and inference\n\n   .. grid-item-card::\n      :link: torch-neuronx-dataparallel-app-note\n      :link-type: ref\n\n      **Data Parallel Inference on Torch NeuronX**\n      ^^^\n      Guide to using ``torch.neuronx.DataParallel`` for scalable inference on ``Inf1``\n\n   .. grid-item-card::\n      :link: torch-neuron-dataparallel-app-note\n      :link-type: ref\n\n      **Data Parallel Inference on Torch Neuron**\n      ^^^\n      Guide to using ``torch.neuron.DataParallel`` for scalable inference on ``Inf1``\n\n   .. grid-item-card::\n      :link: migration_from_xla_downcast_bf16\n      :link-type: ref\n\n      **Migrate from XLA_USE_BF16/XLA_DOWNCAST_BF16**\n      ^^^\n      Guide to migrating from deprecated XLA environment variables to recommended PyTorch mixed-precision options on NeuronX\n\n   .. grid-item-card::\n      :link: introduce-pytorch-2-9\n      :link-type: ref\n\n      **PyTorch 2.9 Support**\n      ^^^\n      New features and migration guide for PyTorch 2.9 on Neuron\n\n\n\n\n\n"
  },
  {
    "path": "about-neuron/appnotes/mxnet-neuron/flex-eg.rst",
    "content": ".. _flexeg:\n\nFlexible Execution Group (FlexEG) in Neuron-MXNet\n=================================================\n\nIntroduction\n------------\n\nInf1 instances are available with a different number of Inferentia\nchips, each Inferentia chip is combined of 4 NeuronCores and an Inf1\ninstance includes 4 to 64 NeuronCores depending on the instance size.\nWith Neuron Runtime 1.x (neuron-rtd server), NeuronCores could be\ncombined into NeuronCore Groups (NCG),\nwhich were basic scheduling units of compiled neural network in Neuron.\nCreation of desired sized NCGs was done at the start of the application\nand could not be modified afterwards.\n\nStarting with Neuron SDK 1.16.0, and with the introduction of Neuron\nRuntime 2.x, MXNet Neuron 1.8 introduces Flexible Execution Groups\n(FlexEG) feature. With FlexEG, you do not have to create NCGs at the\nstart of the process, instead you will set the index of the first\nNeuronCore you want to load models onto, and FlexEG feature will enable\nthe flexibility of loading models onto any available NeuronCore on the\ninf1 instance starting from the first NeuronCore you set. This guide\nwill show you how to efficiently utilize NeuronCores using FlexEG\nfeature in NeuronMXNet.\n\nFlexEG\n------\n\nWith the introduction of FlexEG, you don’t need to create NCGs and can\nload models onto a group of consecutive NeuronCores by providing the\nindex of the first NeuronCore in the group. Neuron runtime takes care of\nfiguring out the number of NeuronCores required for the given compiled\nmodel and loads the model using the required number of cores\n(sequentially starting with the NeuronCore index provided by the user).\n\nFor example, assuming that you have an Inf1.6xl machine and there are 4\nmodels A, B, C, D compiled to 2, 4, 3, and 4 NeuronCores respectively,\nyou can map any model to any core by context\n``mx.neuron(neuron_core_index)`` where ``neuron_core_index`` is the\nNeuronCore index (0,1,2,3,4 … ).\n\nIn the example below, you map model A to ``mx.neuron(0)`` context, model\nB to ``mx.neuron(2)`` context, model C to ``mx.neuron(6)`` context and\nmodel D to ``mx.neuron(9)`` context. \n\n.. figure:: /images/mx_FlexEG_arch_1.png\n   :scale: 80 %\n\nThe above configuration is achieved by using application code similar to\nbelow:\n\n.. 
\n.. code :: python\n\n   # Load models (MXNet)\n   # loaded into the 2 cores starting with core 0\n   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)\n   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n   # loaded into the 4 cores starting with core 2\n   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)\n   model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n   # loaded into the 3 cores starting with core 6\n   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)\n   model2 = sym.bind(ctx=mx.neuron(6), args=args, aux_states=aux, grad_req='null')\n   # loaded into the 4 cores starting with core 9\n   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)\n   model3 = sym.bind(ctx=mx.neuron(9), args=args, aux_states=aux, grad_req='null')\n\n   # run inference by simply calling the loaded model\n   results0 = model0.forward(data=inputs0)\n   results1 = model1.forward(data=inputs1)\n   results2 = model2.forward(data=inputs2)\n   results3 = model3.forward(data=inputs3)\n\nSince there is no NCG creation at the start of the process, you can load the same four models in a different configuration by changing the context being used for inference. For example, you could map model C to the ``mx.neuron(0)`` context, model A to the ``mx.neuron(3)`` context, model D to the ``mx.neuron(5)`` context and model B to the ``mx.neuron(9)`` context.\n\n.. figure:: /images/mx_FlexEG_arch_2.png\n   :scale: 80 %\n\nMigration from NeuronCore Groups to FlexEG\n------------------------------------------\n\nNeuronCore Groups are defined by setting the environment variable ``NEURONCORE_GROUP_SIZES`` with a comma-separated list of the number of cores in each group. In this mode of operation, the specified numbers of NeuronCores are grouped together, and each group forms a single entity.\n\nThe ``NEURONCORE_GROUP_SIZES`` environment variable is set at runtime:\n\n.. code :: bash\n\n   #!/bin/bash\n   export NEURONCORE_GROUP_SIZES=2,4,3,4\n   python your_neuron_application.py\n\nNeuronCore Groups are created once at the start of the application and cannot be modified or re-created for as long as the application process runs. The above flow creates 4 NeuronCore Groups with 2, 4, 3, and 4 NeuronCores respectively. In order to get the same configuration as the example from before, you map model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(1)`` context, model C to the ``mx.neuron(2)`` context and model D to the ``mx.neuron(3)`` context.\n\n.. figure:: /images/mx_FlexEG_arch_1.png\n   :scale: 80 %\n\nThis can be achieved programmatically as shown below:\n
\n.. code :: python\n\n   import os\n\n   # Set the NeuronCore group sizes before loading any model\n   os.environ['NEURONCORE_GROUP_SIZES']='2,4,3,4'\n\n   # Load models (MXNet)\n   # loaded into the first group of NC0-NC1\n   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)\n   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n   # loaded into the second group of NC2-NC5\n   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)\n   model1 = sym.bind(ctx=mx.neuron(1), args=args, aux_states=aux, grad_req='null')\n   # loaded into the third group of NC6-NC8\n   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)\n   model2 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n   # loaded into the fourth group of NC9-NC12\n   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)\n   model3 = sym.bind(ctx=mx.neuron(3), args=args, aux_states=aux, grad_req='null')\n\n   # run inference by simply calling the loaded model\n   results0 = model0.forward(data=inputs0)\n   results1 = model1.forward(data=inputs1)\n   results2 = model2.forward(data=inputs2)\n   results3 = model3.forward(data=inputs3)\n\nComparing this to FlexEG, we see that in the case of NCGs the neuron context requires the index of the execution group, while in FlexEG the neuron context requires the index of the first NeuronCore on which the model is to be loaded and executed. For example, with ``NEURONCORE_GROUP_SIZES='2,4,3,4'``, ``ctx=mx.neuron(1)`` loads the model on execution group 1, which is the second NCG and has 4 NeuronCores.\n\nBest practices when using FlexEG\n--------------------------------\n\nFlexEG gives the user the most flexibility in terms of accessing cores and loading models onto specific cores. With this, users can effortlessly load and execute new models on NeuronCores without closing the application. This section outlines some best practices to keep in mind while using FlexEG.\n\nChoosing starting core\n~~~~~~~~~~~~~~~~~~~~~~\n\nFlexEG tries to use the required number of cores (based on the input model) starting with the core index provided by the user. If the system does not have the required number of cores available starting from that index, the model load will fail. For example, consider a model X which needs 2 cores and an inf1.xlarge machine with 4 NeuronCores (NeuronCore indexes 0, 1, 2 and 3). As the model needs 2 cores, valid start indexes for this model are 0, 1, and 2. However, if the user gives 3 as the neuron context, there are not 2 cores available starting from core 3, so the load will fail, as sketched below.\n
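\nThe following is a minimal sketch of this behavior, using the same placeholder checkpoint convention as the snippets above (``mx_model_x_file`` is a hypothetical checkpoint prefix for the 2-core model X):\n\n.. code :: python\n\n   import mxnet as mx\n\n   # inf1.xlarge exposes 4 NeuronCores (indexes 0-3); model X is assumed\n   # to be compiled for 2 NeuronCores.\n   sym, args, aux = mx.model.load_checkpoint(mx_model_x_file, 0)\n\n   # Valid: cores 2 and 3 are available starting from index 2.\n   model_x = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n\n   # Invalid: only core 3 remains starting from index 3, so this load fails.\n   # model_x = sym.bind(ctx=mx.neuron(3), args=args, aux_states=aux, grad_req='null')\n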
\nPerformance vs. Flexibility tradeoff\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhile using the data parallel mode of operation (where models are executed in parallel), for optimal performance the user should make sure that the models are not sharing any cores. That is because a NeuronCore executes one model at a time: when two or more models are executed on the same core (assuming that they are already loaded), the core executes the first model, stops it, starts the second model and then executes it. This is called model switching; it involves additional overhead and prevents the models from executing in parallel. For example, assume that you have an Inf1.6xl machine and there are 4 models A, B, C, D compiled to 2, 4, 3, and 4 NeuronCores respectively. Loading model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(2)`` context, model C to the ``mx.neuron(6)`` context and model D to the ``mx.neuron(9)`` context is a good configuration, because no two models share NeuronCores and thus they can be executed in parallel. However, loading model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(2)`` context, model C to the ``mx.neuron(5)`` context and model D to the ``mx.neuron(9)`` context is not a good configuration, as models B and C share NeuronCore 5 and thus cannot be executed in parallel.\n\n.. figure:: /images/mx_FlexEG_arch_bad.png\n   :scale: 80 %\n"
  },
  {
    "path": "about-neuron/appnotes/neuron-cc/mixed-precision.rst",
    "content": ".. _neuron-cc-training-mixed-precision:\n\nMixed precision and performance-accuracy tuning (``neuron-cc``)\n===============================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nThe Neuron Compiler supports machine learning models with FP32,\nFP16 and BF16 (Bfloat16) tensors and operators. The Neuron hardware supports a\nmix of 32 and 16 bit datatypes.\nThe available auto-cast methods and their performance / accuracy trade-offs\nare explained in this document.\n\nNeuron Hardware\n-------------------\n\nThe Neuron hardware supports matrix multiplication using FP16 or BF16 on its Matmult Engine, and\naccumulations using FP32.\nSimilarly, operators such as activations or vector operations\nare supported using FP16, BF16 and FP32.\nNeuron supports tensor transpose in two ways - by fast matrix\nmultiplication in FP16/BF16 or by slower byte-by-byte data movements.\n\n\nPerformance-accuracy tradeoffs for models trained in FP32\n---------------------------------------------------------\n\nModels that are trained using FP32 data types can be deployed on Neuron\nthrough ahead of time compilation using the :ref:`Neuron Compiler <neuron_cli>`.\n\n\n.. important::\n    **By default**, the Neuron Compiler disables auto-casting and uses the data types defined within the model.\n    This provides the best accuracy for FP32 trained models, but does not provide the best performance.\n\nweights and operations to BF16**. Only partial sums are left in FP32. The default, casting will generate the highest\nperformance for a FP32 trained model.\n\nUsing the ``--fast-math`` CLI option, you can choose the right \ntradeoff between performance and accuracy. The tradeoff usually is between achieving high performance or optimal accuracy, and decision what settings to use will be application specific.\n\nIt is recommended that the you start with compiling the model to achieve the high performance (default), you can then \ntest the accuracy of the application and, if needed, try the next higher precision casting option until the desired \naccuracy and performance are achieved. A typical flow can be:\n\n1. You can compile without options (default) or with ``--fast-math all`` which will optimize for performance.\n\n2. If accuracy is not sufficient you can try ``--fast-math fp32-cast-matmult``  \n\n3. If accuracy is not sufficient you can try ``--fast-math fp32-cast-matmult no-fast-relayout``\n\n4. If accuracy is not sufficient you can try ``--fast-math none`` which will optimize for accuracy .\n\n \nBetween step 2 and step 3, and between step 3 and step 4 you have additional options that can provide different level of accuracy and which are explained in the below section.\n\nNote that compiler has to preserve the input/output (i/o) tensor types requested by Framework, therefore no casting is done on the i/o tensors. Additional speedup can be obtained by casting them in the Framework prior compilation.\n\nTo learn how to use compiler command line interface (CLI) options with your application's framework, please see :ref:`torch_neuron_trace_api`, :ref:`tensorflow-ref-neuron-compile-api` and :ref:`tensorflow-ref-neuron-tracing-api`.\n\n\nCompiler casting options\n------------------------\n\n``--fast-math`` option\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``--fast-math`` option is intended to replace the ``--fp32-cast`` option. It is recommended to\nto start using or migrating to ``--fast-math`` option. 
The ``--fast-math`` option provides the same level of functionality as the ``--fp32-cast`` option, in addition to the following:\n\n* The ``--fast-math`` option introduces the ``no-fast-relayout`` option to enable a lossless transpose operation. This was not possible with the ``--fp32-cast`` option.\n* The ``--fast-math`` option provides finer control than the ``--fp32-cast`` option. The transpose operation and the cast operation are controlled independently:\n\n    - ``no-fast-relayout`` and ``fast-relayout`` provide control for the transpose operation.\n    - The ``fp32-cast-*`` options provide control for casting.\n\nSee the detailed list of the options in :doc:`/compiler/neuron-cc/command-line-reference`.\n"
  },
  {
    "path": "about-neuron/appnotes/neuron1x/important-neuronx-dkms.txt",
    "content": ".. important ::\n\n   Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name has been changed to ``aws-neuronx-dkms``. See :ref:`neuron2-intro`\n"
  },
  {
    "path": "about-neuron/appnotes/neuron1x/introducing-libnrt.rst",
    "content": ".. _introduce-libnrt:\n\nIntroducing Neuron Runtime 2.x (libnrt.so)  \n==========================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we changing?\n---------------------\n\nStarting with the *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and is being replaced by *Neuron Runtime 2.x*, a shared library named (``libnrt.so``). For more information on Runtime 1.x see :ref:`maintenance_rtd`.\n\nUpgrading to ``libnrt.so`` simplifies the Neuron installation and upgrade process, introduces new capabilities for allocating NeuronCores \nto applications, streamlines container creation, and deprecates tools that are no longer needed.\n\nThis document describes the capabilities of *Neuron Runtime 2.x* in detail, provides information needed for successful installation and upgrade, \nand provides information needed for successful upgrade of Neuron applications using *Neuron Runtime 1.x* (included in releases before *Neuron 1.16.0*)\nto *Neuron Runtime 2.x* (included in releases *Neuron 1.16.0* or newer).\n\n.. _introduce-libnrt-why:\n\nWhy are we making this change?\n------------------------------\n\nBefore *Neuron 1.16.0*, Neuron Runtime was delivered as a daemon (``neuron-rtd``), and communicated with Neuron framework extensions through a ``gRPC`` interface. \n``neuron-rtd`` was packaged as an ``rpm`` or ``debian`` package (``aws-neuron-runtime``) and required a separate installation step.\n\nStarting with *Neuron 1.16.0*, *Neuron Runtime 2.x* is delivered as a shared\nlibrary (``libnrt.so``) and is directly linked to Neuron framework extensions.\n``libnrt.so`` is packaged and installed as part of the Neuron framework extensions\n(e.g. TensorFlow Neuron, PyTorch Neuron or MXNet Neuron), and does not require a\nseparate installation step. Installing Neuron Runtime as part of the Neuron\nframework extensions simplifies installation and improves the user experience.\nIn addition, since ``libnrt.so`` is directly linked to the Neuron framework\nextensions, faster communication between the Neuron Runtime and\nNeuron Frameworks is enabled by eliminating the ``gRPC`` interface overhead.\n\nFor more information see :ref:`introduce-libnrt-how-sdk` and :ref:`neuron-migrating-apps-neuron-to-libnrt`.\n\n\n.. _libnrt-neuron-cmponents:\n\n.. _introduce-libnrt-how-sdk:\n\nHow will this change affect the Neuron SDK?\n-------------------------------------------\n\nNeuron Driver\n^^^^^^^^^^^^^\n\nUse the latest Neuron Driver. For successful installation and upgrade to *Neuron 1.16.0* or newer, \nyou must install or upgrade to Neuron Driver (``aws-neuron-dkms``) *version 2.1.5.0* or newer. Neuron applications using *Neuron 1.16.0* will fail if \nthey do not detect *Neuron Driver version 2.1.5.0* or newer. For installation and upgrade instructions see :ref:`install-guide-index`.\n\n\n.. include:: ./important-neuronx-dkms.txt\n\nTo see details of Neuron component versions please see :ref:`latest-neuron-release-artifacts`.\n\n.. 
important ::\n\n   For successful installation or update to Neuron 1.16.0 and newer from previous releases:\n      * Stop the Neuron Runtime 1.x daemon (``neuron-rtd``) by running: ``sudo systemctl stop neuron-rtd``\n      * Uninstall ``neuron-rtd`` by running: ``sudo apt remove aws-neuron-runtime`` or ``sudo dnf remove aws-neuron-runtime``\n      * Install or upgrade to the latest Neuron Driver (``aws-neuron-dkms``) by following the :ref:`install-guide-index` instructions.\n      * Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name has been changed to ``aws-neuronx-dkms``. See :ref:`neuron2-intro`.\n\n\nNeuron Runtime\n^^^^^^^^^^^^^^\n\n* Installation\n  Starting from *Neuron 1.16.0*, Neuron releases will no longer include the ``aws-neuron-runtime`` packages, and Neuron Runtime will be part of the Neuron framework extension of choice (TensorFlow Neuron, PyTorch Neuron or MXNet Neuron). Installing any Neuron framework package will install the Neuron Runtime library (``libnrt.so``).\n\n      * For installation and upgrade instructions, see :ref:`install-guide-index`.\n\n* Configuring *Neuron Runtime*\n   Before *Neuron 1.16.0*, *Neuron Runtime 1.x* was configured in configuration files (e.g. /opt/aws/neuron/config/neuron-rtd.config). Starting from *Neuron 1.16.0*, *Neuron Runtime 2.x* can be configured through environment variables. See :ref:`nrt-configuration` for details.\n\n* Starting and Stopping *Neuron Runtime*\n   Before introducing ``libnrt.so``, ``neuron-rtd`` ran as a daemon that communicated through a ``gRPC`` interface. Whenever ``neuron-rtd`` took ownership of a Neuron device, it continued owning that device until it was stopped. This created the need to stop ``neuron-rtd`` in certain cases. With the introduction of ``libnrt.so``, *Neuron Runtime* runs inside the context of the application. With *Neuron Runtime 2.x*, the act of starting and stopping a Neuron application causes ``libnrt.so`` to automatically claim or release ownership of the required Neuron devices.\n\n* NeuronCore Groups (NCG) end-of-support\n   Before the introduction of *Neuron Runtime 2.x*, a NeuronCore Group (NCG) was used to define an execution group of one or more NeuronCores where models could be loaded and executed. It also provided separation between processes.\n\n   With the introduction of *Neuron Runtime 2.x*, strict separation of NeuronCores into groups is no longer necessary and NeuronCore Groups (NCG) have been deprecated. See :ref:`eol-ncg` for more information.\n\n* Running multiple *Neuron Runtimes*\n   Before the introduction of ``libnrt.so``, it was necessary to run multiple ``neuron-rtd`` daemons to allocate Neuron devices for each ``neuron-rtd``, using configuration files. After the introduction of ``libnrt.so``, it is no longer necessary to run multiple ``neuron-rtd`` daemons to allocate Neuron devices to a specific Neuron application. With ``libnrt.so``, NeuronCores (a Neuron device includes multiple NeuronCores) are allocated to a particular application by using the ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables, for example:\n\n   .. code ::\n\n      NEURON_RT_VISIBLE_CORES=0-3 myapp1.py\n      NEURON_RT_VISIBLE_CORES=4-11 myapp2.py\n\n   Or\n\n   .. code ::\n\n      NEURON_RT_NUM_CORES=3 myapp1.py &\n      NEURON_RT_NUM_CORES=4 myapp2.py &\n\n   See :ref:`nrt-configuration` for details.\n
\n* Logging\n   Similar to *Neuron Runtime 1.x*, *Neuron Runtime 2.x* logs into syslog (verbose logging). To make debugging easier, *Neuron Runtime 2.x* also logs into the console (error-only logging). Refer to :ref:`nrt-configuration` to see how to increase or decrease logging verbosity.\n\n* Multi-process access to NeuronCores\n    With the introduction of ``libnrt.so``, it is no longer possible to load models from multiple processes on the same NeuronCore. A NeuronCore can only be accessed by a single process. Instead, you can load models on a specific NeuronCore using multiple threads from the same process.\n\n    .. note::\n\n      For optimal performance of multi-model execution, each NeuronCore executes a single model.\n\n* Neuron Runtime architecture\n    *Neuron Runtime 2.x* is delivered as a shared library (``libnrt.so``) and is directly linked to Neuron framework extensions. ``libnrt.so`` is packaged and installed as part of Neuron framework extensions (e.g. TensorFlow Neuron, PyTorch Neuron, or MXNet Neuron), and does not require a separate installation step. Installing Neuron Runtime as part of the Neuron framework extensions simplifies installation and improves the user experience. In addition, since ``libnrt.so`` is directly linked to Neuron framework extensions, it enables faster communication between Neuron Runtime and Neuron Frameworks by eliminating ``gRPC`` interface overhead.\n\n\nNeuron framework extensions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStarting from *Neuron 1.16.0*, Neuron framework extensions (TensorFlow Neuron, PyTorch Neuron, or MXNet Neuron) are packaged together with ``libnrt.so``. The ``aws-neuron-dkms`` driver version 2.1.5.0 or newer is required for proper operation. The ``neuron-rtd`` daemon that was installed in previous releases no longer works starting with Neuron 1.16.0.\n\nTo see details of Neuron component versions, see :ref:`latest-neuron-release-artifacts`.\n\n.. important::\n\n   Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name has been changed to ``aws-neuronx-dkms``. See :ref:`neuron2-intro`.\n\nTensorFlow model server\n^^^^^^^^^^^^^^^^^^^^^^^\n\nStarting from *Neuron 1.16.0*, the TensorFlow Neuron model server is packaged together with ``libnrt.so`` and expects ``aws-neuron-dkms`` *version 2.1.5.0* or newer for proper operation.\n\n.. note::\n\n   The TensorFlow Neuron model server included in *Neuron 1.16.0* runs from the directory in which it was installed and will not run properly if copied to a different location, due to its dependency on ``libnrt.so``.\n\n.. include:: ./important-neuronx-dkms.txt\n\nNeuron tools\n^^^^^^^^^^^^\n\n* ``neuron-cli`` - Starting from *Neuron 1.16.0*, ``neuron-cli`` enters maintenance mode. See :ref:`maintenance_neuron-cli` for more information.\n* ``neuron-top`` - Starting from *Neuron 1.16.0*, ``neuron-top`` has a new user interface. See :ref:`neuron-top-ug` for more information.\n* ``neuron-monitor`` - ``neuron-monitor`` was updated to support Neuron Runtime 2.x (``libnrt.so``).\n\n  * See :ref:`neuron-monitor-ug` for an updated user guide of ``neuron-monitor``.\n  * See the neuron-monitor upgrade notes for a list of changes between *Neuron Monitor 2.x* and *Neuron Monitor 1.0*.\n  * See the neuron-monitor backward compatibility notes for instructions on using *Neuron Monitor 2.x* with *Neuron Runtime 1.x* (``neuron-rtd``).\n\n\n.. 
_introduce-libnrt-how-user:\n\nHow will this change affect me?\n-------------------------------\n\nNeuron installation and upgrade\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\nAs explained in \":ref:`libnrt-neuron-cmponents`\", starting from *Neuron 1.16.0*, ``libnrt.so`` requires the latest Neuron Driver (``aws-neuron-dkms``). \nIn addition, it is no longer necessary to install ``aws-neuron-runtime``. To install Neuron or to upgrade to latest Neuron version, follow the \ninstallation and upgrade instructions below:\n\n* PyTorch Neuron\n   * :ref:`install-neuron-pytorch`.\n   * :ref:`update-neuron-pytorch`.\n\n* TensorFlow Neuron\n   * :ref:`install-neuron-tensorflow`.\n   * :ref:`update-neuron-tensorflow`.\n\n* MXNet Neuron\n   * :ref:`install-neuron-mxnet`.\n   * :ref:`update-neuron-mxnet`.\n\n\n.. include:: ./important-neuronx-dkms.txt\n\n\n.. _neuron-migrating-apps-neuron-to-libnrt:\n\nMigrate your application to Neuron Runtime 2.x (libnrt.so) \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor a successful migration from previous releases of your application to *Neuron 1.16.0* or newer, make sure you perform the following:\n\n#. Prerequisite\n    Read  \":ref:`libnrt-neuron-cmponents`\".\n\n#. Make sure you are not using *Neuron Runtime 1.x* (``aws-neuron-runtime``)   \n    * Remove any code that installs ``aws-neuron-runtime`` from any CI/CD scripts.\n    * Stop ``neuron-rtd`` by running ``sudo systemctl stop neuron-rtd``\n    * Uninstall ``neuron-rtd`` by running ``sudo apt remove aws-neuron-runtime`` or ``sudo dnf remove aws-neuron-runtime``\n\n\n#. Upgrade to your Neuron Framework of choice:\n    * :ref:`update-neuron-pytorch`.\n    * :ref:`update-neuron-tensorflow`.\n    * :ref:`update-neuron-mxnet`.\n\n\n#. If you have code that starts and/or stops ``neuron-rtd``\n    Remove any code that starts or stops ``neuron-rtd`` from any CI/CD scripts.\n       \n\n\n#. Application running multiple ``neuron-rtd``\n    If your application runs multiple processes and requires running multiple ``neuron-rtd`` daemons:\n\n    * Remove the code that runs multiple ``neuron-rtd`` daemons.\n    * Instead of allocating Neuron devices to ``neuron-rtd`` through configuration files, use ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables to\n      allocate NeuronCores. See :ref:`nrt-configuration` for details.\n\n    If you application uses ``NEURONCORE_GROUP_SIZES``, see the next item.\n\n\n    .. note::\n\n      ``NEURON_RT_VISIBLE_CORES`` and ``NEURON_RT_NUM_CORES`` environment variables enable you to allocate NeuronCores to an application. Allocating NeuronCores improves application granularity, because Neuron devices include multiple NeuronCores.\n\n#. Application running multiple processes using ``NEURONCORE_GROUP_SIZES``\n    * Consider using ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables instead of ``NEURONCORE_GROUP_SIZES``, which is being deprecated.  
See :ref:`nrt-configuration` for details.\n\n    * If you are using TensorFlow Neuron (``tensorflow-neuron (TF2.x)``) and you are replacing ``NEURONCORE_GROUP_SIZES=AxB`` which enables auto multicore replication, see the new API :ref:`tensorflow-ref-auto-replication-python-api` for usage and documentation.\n   \n    * The behavior of your application will remain the same as before if you do not set ``NEURON_RT_VISIBLE_CORES`` and do not set ``NEURON_RT_NUM_CORES``.\n\n    * If you are considering migrating to ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES``:\n\n      * ``NEURON_RT_VISIBLE_CORES`` takes precedence over ``NEURON_RT_NUM_CORES``.\n\n      * If you are migrating to ``NEURON_RT_VISIBLE_CORES``:\n\n         * For TensorFlow applications or PyTorch applications make sure that ``NEURONCORE_GROUP_SIZES`` is unset, or that ``NEURONCORE_GROUP_SIZES`` allocates the same or smaller number of NeuronCores as allocated by ``NEURON_RT_VISIBLE_CORES``.\n         * For MXNet applications, setting ``NEURONCORE_GROUP_SIZES`` and ``NEURON_RT_VISIBLE_CORES`` environment variables at the same time is not supported. Use ``NEURON_RT_VISIBLE_CORES`` only.\n         * See :ref:`nrt-configuration` for more details on how to use ``NEURON_RT_VISIBLE_CORES``.\n\n\n      * If you are migrating to ``NEURON_RT_NUM_CORES``:\n\n         * Make sure that ``NEURONCORE_GROUP_SIZES`` is unset.\n         * See :ref:`nrt-configuration` for more details on how to use ``NEURON_RT_NUM_CORES``.\n\n\n#. Application running multiple processes accessing the same NeuronCore\n    If  your application accesses the same NeuronCore from multiple processes, this is no longer possible with ``libnrt.so``.\n    Instead, modify your application to access the same NeuronCore from multiple threads.\n\n    .. note::\n\n      Optimal performance of multi-model execution is achieved when each NeuronCore executes a single model.\n\n\n#. Neuron Tools\n    * If you are using Neuron Monitor, see the neuron-monitor upgrade notes for details.\n    * If you are using ``neuron-cli`` remove any call to ``neuron-cli``. For more information, see :ref:`maintenance_neuron-cli`.\n\n\n\n#. Containers\n    If your application is running within a container, and it previously executed ``neuron-rtd`` within the container, you need\n    to re-build your container, so it will not include or install ``aws-neuron-runtime``. See :ref:`neuron-containers` for details.\n\n\n\nTroubleshooting\n---------------\n\nApplication fails to start\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDescription\n~~~~~~~~~~~\n\nStarting with the *Neuron 1.16.0* release, Neuron Runtime (``libnrt.so``) requires *Neuron Driver 2.0* or greater (``aws-neuron-dkms``). Neuron Runtime requires the Neuron Driver (``aws-neuron-dkms`` package) to access Neuron devices. \n\nIf ``aws-neuron-dkms`` is not installed, the application will fail with an error message on the console and syslog similar to the following:\n\n.. code::\n\n   NRT:nrt_init      Unable to determine Neuron Driver version. Please check aws-neuron-dkms package is installed.\n\nIf an old ``aws-neuron-dkms`` is installed, the application will fail with an error message on the console and syslog similar to the following:\n\n.. code::\n\n   NRT:nrt_init      This runtime requires Neuron Driver version 2.0 or greater. Please upgrade aws-neuron-dkms package.\n\n\n\n\nSolution\n~~~~~~~~\n\nFollow the installation steps in :ref:`install-guide-index` to install ``aws-neuron-dkms``.\n\n.. 
include:: ./important-neuronx-dkms.txt\n\n\nApplication fails to start although I installed latest ``aws-neuron-dkms``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDescription\n~~~~~~~~~~~\n\nStarting from the *Neuron 1.16.0* release, Neuron Runtime (``libnrt.so``) requires *Neuron Driver 2.0* or greater (``aws-neuron-dkms``). If an old ``aws-neuron-dkms`` is installed,  the application will fail. You may try to install ``aws-neuron-dkms`` and still face application failure, because the ``aws-neuron-dkms`` installation failed as a result of ``neuron-rtd`` daemon that was still running.\n\n\nSolution\n~~~~~~~~\n\n* Stop ``neuron-rtd`` by running: ``sudo systemctl stop neuron-rtd``\n* Uninstall ``neuron-rtd`` by running: ``sudo apt remove aws-neuron-runtime`` or sudo ``dnf remove aws-neuron-runtime``\n* Install ``aws-neuron-dkms`` by following steps in :ref:`install-guide-index`\n\n.. include:: ./important-neuronx-dkms.txt\n\n\nApplication unexpected behavior when upgrading to release *Neuron 1.16.0* or newer \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDescription\n~~~~~~~~~~~\n\nWhen upgrading to release *Neuron 1.16.0* or newer from previous releases, the OS may include two different versions of \n*Neuron Runtime*: the ``libnrt.so`` shared library and ``neuron-rtd`` daemon. This can happen if the user did not stop ``neuron-rtd`` daemon\nor did not make sure to uninstall the existing Neuron version before upgrade. \nIn this case the user application may behave unexpectedly.\n\nSolution\n~~~~~~~~\n\nIf the OS includes two different versions of *Neuron Runtime*, ``libnrt.so`` shared library and ``neuron-rtd`` daemon:\n\n   * Before running applications that use ``neuron-rtd``, restart ``neuron-rtd`` by calling ``sudo systemctl restart neuron-rtd``.\n   * Before running applications linked with ``libnrt.so``, stop ``neuron-rtd`` by calling ``sudo systemctl stop neuron-rtd``.\n\n\nApplication unexpected behavior when downgrading to releases before *Neuron 1.6.0* (from *Neuron 1.16.0* or newer)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDescription\n~~~~~~~~~~~\n\nWhen upgrading to release *Neuron 1.16.0* or newer from previous releases, and then downgrading back to releases before *Neuron 1.6.0*, \nthe OS may include two different versions of *Neuron Runtime*: the ``libnrt.so`` shared library and ``neuron-rtd`` daemon. This can happen \nif the user did not make sure to uninstall the existing Neuron version before the upgrade or downgrade.\nIn this case the user application may behave unexpectedly.\n\nSolution\n~~~~~~~~\n\nIf the OS include two different versions of *Neuron Runtime*, ``libnrt.so`` shared library and ``neuron-rtd`` daemon:\n\n   * Before running applications that use ``neuron-rtd``, restart ``neuron-rtd`` by calling ``sudo systemctl restart neuron-rtd``.\n   * Before running applications linked with ``libnrt.so``, stop ``neuron-rtd`` by calling ``sudo systemctl stop neuron-rtd``.\n\n\n\nNeuron Core is in use\n^^^^^^^^^^^^^^^^^^^^^\n\nDescription\n~~~~~~~~~~~\n\nA Neuron Core cannot be shared between two applications. If an application\nstarted using a Neuron Core all other applications trying to use the\nNeuronCore will fail during runtime initialization with the following\nmessage in the console and in syslog:\n\n.. 
code:: bash\n\n   ERROR   NRT:nrt_allocate_neuron_cores               NeuronCore(s) not available - Requested:nc1-nc1 Available:0\n\nSolution\n~~~~~~~~\n\nTerminate the the process using NeuronCore and then try launching the application.\n\nFrequently Asked Questions (FAQ)\n--------------------------------\n\nDo I need to recompile my model to run it with Neuron Runtime 2.x (``libnrt.so``)?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNo. \n\nDo I need to change my application launch command?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNo.\n\n\nCan ``libnrt.so`` and ``neuron-rtd`` co-exist in the same environment?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAlthough we recommend upgrading to the latest Neuron release, we understand that for a transition period you may continue using ``neuron-rtd`` for old releases. If you are using Neuron Framework (PyTorch,TensorFlow or MXNet) from releases before *Neuron 1.16.0*: \n\n* Install the latest Neuron Driver (``aws-neuron-dkms``) \n\n.. include:: ./important-neuronx-dkms.txt\n\n* For development, we recommend using different environments for Neuron Framework (PyTorch,TensorFlow or MXNet) from releases before *Neuron 1.16.0* and for Neuron \n  Framework (PyTorch,TensorFlow or MXNet) from *Neuron 1.16.0* and newer. If that is not possible, make sure to stop ``neuron-rtd`` before executing models using\n  Neuron Framework (PyTorch,TensorFlow or MXNet) from *Neuron 1.16.0* and newer.\n\n* For deployment, when you are ready to upgrade, upgrade to Neuron Framework (PyTorch,TensorFlow or MXNet) from *Neuron 1.16.0* and newer. \n  See :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.\n\n\n.. warning ::\n\n   Executing models using Neuron Framework (PyTorch,TensorFlow or MXNet) from *Neuron 1.16.0* and newer in an environment where ``neuron-rtd`` is running may cause\n   undefined behavior. Make sure to stop ``neuron-rtd`` before executing models using Neuron Framework (PyTorch,TensorFlow or MXNet) from *Neuron 1.16.0* and newer.\n\nAre there Neuron framework versions that will not support Neuron Runtime 2.x (``libnrt.so``)?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAll supported PyTorch Neuron and TensorFlow framework extensions, in addition to Neuron MXnet 1.8.0 framework extensions support Neuron Runtime 2.x.\n\nNeuron MxNet 1.5.1 does not support Neuron Runtime 2.x (``libnrt.so``) and has now entered maintenance mode. See :ref:`maintenance_mxnet_1_5` for details.\n"
  },
  {
    "path": "about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision.rst",
    "content": ".. _neuronx-cc-training-mixed-precision:\n\nMixed Precision and Performance-accuracy Tuning (``neuronx-cc``)\n================================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nThe Neuron Compiler supports machine learning models with FP32, TF32, FP16 and BF16 (Bfloat16) tensors and operators. The Neuron hardware supports a mix of 32, 16, and 8 bit datatypes. This guide explains how to apply the available auto-cast methods and their performance / accuracy trade-offs when compiling a model with Neuron.\n\n.. note:: Neuron Compiler support for INT8 is planned for a future Neuron SDK release. See `Neuron Compiler: Enable Neuron INT8 support <https://github.com/aws/aws-neuron-sdk/issues/36>`_ for details.\n\nNeuron Hardware\n---------------\n\nThe Neuron v2 hardware supports matrix multiplication using FP16, BF16, TF32, and FP32 on its matrix multiply (\"matmult\") engine, and accumulations using FP32. Operators such as activations or vector operations are supported using FP32, TF32, FP16, and BF16. Supporting FP16 and BF16 allows Neuron to have significantly higher performance than executing everything as FP32.\n\n\nPerformance-accuracy tradeoffs\n------------------------------\n\n**By default**, the Neuron Compiler will **automatically cast FP32 matrix multiplication operations to BF16**. The remaining operations are performed in the data type specified by the model. The Neuron Compiler provides CLI options that direct the compiler to cast to other data types, thereby giving the ability to choose an accuracy-to-performance tradeoff in model execution. Deciding what CLI settings to use will be application specific and may require some experimentation. See :ref:`Neuron Compiler CLI Reference Guide<neuron-compiler-cli-reference-guide>` for details.\n\n\nWhat is the difference between  Data Types?\n-------------------------------------------\n\nThe NeuronCore v2 support multiple data types (see :ref:`NeuronCore v2 Data Types<neuron-data-types-v2>`). 
Each data type provides benefits and drawbacks due to its dynamic range and numeric precision.\n\n+------+-----------+----------+--------------------------------------------------------+---------------------------------------------------+\n| Type | Minimum   | Maximum  | Strength                                               | Weakness                                          |\n+======+===========+==========+========================================================+===================================================+\n| FP16 | -65504    | 65504    |\tNumeric Precision, High granularity, Mid-range numbers | Low range, medium precision                       |\n+------+-----------+----------+--------------------------------------------------------+---------------------------------------------------+\n| BF16 | -3.40E+38 | 3.40E+38 |\tDynamic Range, Extremely small/large numbers           | Low precision                                     |\n+------+-----------+----------+--------------------------------------------------------+---------------------------------------------------+\n| TF32 | -3.40E+38 | 3.40E+38 |\tDynamic Range, Extremely small/large numbers           | Medium precision                                  |\n+------+-----------+----------+--------------------------------------------------------+---------------------------------------------------+\n| FP32 | -3.40E+38 | 3.40E+38 | N/A                                                    | Larger model size, potentially slower computation |\n+------+-----------+----------+--------------------------------------------------------+---------------------------------------------------+\n\n* FP16 provides a high density of representable values that are neither extremely small or extremely large. The density of representable values within the range is approximately an order of magnitude greater than BF16.\n\n  * Conversion from FP32 to FP16 will perform well when values are relatively small but non-extreme (either very small or very large).\n  * Conversion from FP32 to FP16 will perform badly if the original FP32 values are outside of the range of FP16. This will produce inf/-inf values and may result in NaN depending on the operation.\n\n* BF16 provides a wider range of representable values which includes both very small and very large values. However, the overall density of representable values is usually lower than FP16 for more non-extreme values. The range is nearly identical to the range of FP32 but because the number of bits is halved, this means the individual values are sparse.\n\n  * Conversion from FP32 to BF16 will perform well when the values are well-distributed throughout the range. Since BF16 covers the entire FP32 range, this means each original value can map to a relatively close downcast value.\n  * Conversion from FP32 to BF16 will perform badly when fine granularity is needed. Since BF16 granularity is sacrificed for greater range it will almost always map worse to values that are within the FP16 range.\n\nShould I downcast operations to smaller Data Types?\n---------------------------------------------------\n\nThis choice here is driven entirely by accuracy vs performance tradeoff. Casting operations to smaller 16-bit data types will provide a significant performance benefit but may end up sacrificing accuracy.\n\nThe compiler uses BF16 casting **by default** for matrix multiplication operations. 
Should I downcast operations to smaller data types?\n---------------------------------------------------\n\nThis choice is driven entirely by the accuracy-versus-performance tradeoff. Casting operations to smaller 16-bit data types provides a significant performance benefit but may sacrifice accuracy.\n\nThe compiler uses BF16 casting **by default** for matrix multiplication operations. The speedup from casting gives a significant performance boost, and the wider range of representable values in BF16 provides more safety than FP16 when the possible numeric range of input values is unknown.\n\nThe Neuron Compiler's ``--auto-cast`` and ``--auto-cast-type`` CLI options direct the compiler to perform alternate casting operations. See the detailed list of the options in :ref:`Neuron v2 Compiler CLI Reference Guide<neuron-compiler-cli-reference-guide>`. If these options are not provided, the compiler applies the default behavior described above, which is equivalent to ``--auto-cast matmult --auto-cast-type bf16``.\n\n
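For example, when training with PyTorch NeuronX, additional compiler options are typically passed through the ``NEURON_CC_FLAGS`` environment variable. The snippet below is a minimal, illustrative sketch only; the framework developer guide referenced at the end of this document remains the authoritative reference for configuring compiler options.\n\n.. code:: python\n\n   import os\n\n   # Illustrative only: ask the Neuron compiler to cast matrix-multiplication\n   # operations to FP16. Set this before the framework triggers compilation.\n   os.environ['NEURON_CC_FLAGS'] = '--auto-cast matmult --auto-cast-type fp16'\n\n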
The option combinations to consider in a typical flow are listed below; performance *increases* and accuracy *decreases* as you move down the table.\n\n.. list-table::\n   :header-rows: 1\n\n   * - Compiler autocast options\n     - Effect\n     - Performance\n     - Accuracy\n   * - ``--auto-cast none``\n     - Disables all auto-casting, using the data types defined within the model\n     - Lowest performance\n     - Highest accuracy\n   * - ``--auto-cast matmult --auto-cast-type tf32``\n     - Casts FP32 matrix-multiplication operations to TF32\n     -\n     -\n   * - ``--auto-cast all --auto-cast-type tf32``\n     - Balance of performance, dynamic range, and precision\n     -\n     -\n   * - ``--auto-cast matmult --auto-cast-type fp16``\n     - Casts FP32 matrix-multiplication operations to FP16\n     -\n     -\n   * - ``--auto-cast all --auto-cast-type fp16``\n     - Best performance at the expense of dynamic range\n     -\n     -\n   * - ``--auto-cast matmult --auto-cast-type bf16`` (default)\n     - Best performance at the expense of precision\n     -\n     -\n   * - ``--auto-cast all --auto-cast-type bf16``\n     - Best performance at the expense of precision\n     - Highest performance\n     - Lowest accuracy\n\nNote that the compiler has to preserve the input/output (I/O) tensor types requested by the framework; therefore, no casting is done on the I/O tensors. Additional speedup can be obtained by casting them in the framework prior to compilation.\n\nTo learn how to configure the compiler options from within your application’s framework, please see:\n\n* :ref:`Developer Guide for Training with PyTorch Neuron <pytorch-neuronx-programming-guide>`\n"
  },
  {
    "path": "about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference.rst",
    "content": ".. _introduce-nxd-inference:\n\nIntroducing NeuronX Distributed (NxD) Inference\n=================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nWhat are we introducing?\n------------------------\n\n\nStarting with the Neuron SDK 2.21 release, we are introducing NxD Inference, an open-source PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. NxD Inference is designed for optimized inference, enabling quick onboarding of PyTorch models with minimal changes. It features a modular architecture that facilitates easy integration of HuggingFace PyTorch models and is compatible with serving engines like vLLM.\n\nPlease see :ref:`nxdi-index` for NxD Inference overview and documentation.\n\n\nHow can I install NxD Inference library?\n-----------------------------------------\nPlease refer to :ref:`nxdi-setup` for installation instructions.\n\n\nI am currently using the Transformers NeuronX library for inference. How does the NxD Inference library affect me?\n--------------------------------------------------------------------------------------------------------------------\n\nIf you are using Transformers NeuronX (TNx) in production, you can continue doing so. However, if you are planning to onboard new models to Neuron for inference, NxD Inference offers several advantages to consider.\n\nNxD Inference is designed to enable easy on-boarding of PyTorch models and comes with new features and enhanced support:\n\n* **Hardware Support**: While TNx is not supported on Trn2, NxD Inference supports all platforms (Trn1, Inf2, and Trn2)\n* **Simplified interface**: To simplify model development with NxD Inference, you write modeling code using PyTorch with standard Python, rather than using PyHLO as in TNx.\n* **Easy Migration**: NxD Inference was designed to provide seamless migration from TNx, especially if you are using it with vLLM. You can migrate your existing TNx inference scripts using the :ref:`migration guide <nxdi_migrate_from_tnx>`\n* **Enhanced Capabilities**: NxD Inference offers more comprehensive support for MoE models and multimodal models (Llama 3.2) compared to TNx\n* **Future Development**: New inference features and support for advanced model architectures (like multi-modality/video models) will be focused on NxD Inference\n\n\n\nI am currently using vLLM with Transformers NeuronX library for inference. Does NxD Inference library support vLLM ?\n---------------------------------------------------------------------------------------------------------------------\n\nYes, NxD Inference library supports vLLM inference engine.  Neuron vLLM integration in 2.21 release will start supporting both NxD Inference and Transformers NeuronX libraries.  
To use vLLM with NxD Inference library, you can refer to the :ref:`nxdi-vllm-user-guide-v1`.\n\n\n\nWhat features and models are available in Transformers NeuronX (TNx) but not yet in NeuronX Distributed Inference?\n-------------------------------------------------------------------------------------------------------------------\n\nWhile NxD Inference supports most features and models available in TNx, there are some differences in current support that users should be aware of.\n\n**Features that are not yet supported in NxD Inference**: The following TNx features aren't supported yet in the NxD Inference library.\n\n* Multi-Node Inference support\n\n\n**Models not part of NxD Inference Model Hub**: The following models are included in Transformers NeuronX but not currently in NxD Inference library:\n\n* Bloom\n* GPT2\n* GPT-J\n* GPT-NEOX\n\nIf you need to use these models with NxD Inference, we encourage you to follow the :ref:`onboarding models developer guide <nxdi-onboarding-models>`. The onboarding process in NxD Inference is more straightforward compared to TNx due to its PyTorch-based architecture.\n\n\nI currently use Hugging Face TGI serving engine for deploying and serving Large Language Models (LLMs) on Neuron. How does NxD Inference library affect me?\n-----------------------------------------------------------------------------------------------------------------------------------------------------------\n\nIf you are currently using Hugging Face TGI serving engine to deploy models on Neuron, the introduction of NxD Inference library will not have any impact and you can continue to use your existing inference workloads. Hugging Face TGI integrates with Neuron SDK Inference libraries in a way that abstracts the underlying library for the users.\n\n\n\nI am new to Neuron and have inference workloads, what library should I use?\n----------------------------------------------------------------------------\n\nWe recommend you use NxD Inference for your model inference workloads. To learn how to get started using NxD Inference, see the :ref:`nxdi-index` documentation\n\n\n\n\n\n\n\n\nAdditional Resources\n--------------------\n\n* :ref:`nxdi-index`\n* :ref:`nxdi-overview`\n* :ref:`nxd-inference_rn`\n"
  },
  {
    "path": "about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training.rst",
    "content": ".. _introduce-nxd-training:\n\nIntroducing NxD Training\n===================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the Neuron 2.20 release, we are introducing NxD Training. \nIn doing so, we are expanding NeuronX Distributed library (previously called NxD that will now be called NxD Core) to \nNxD Training with data science/engineering modules, and end to end examples. NxD Training is a PyTorch based \ndistributed training library that enables customers to train large-scale models. Some key distributed strategies \nsupported by NxD Training include 3D-parallelism (data parallelism, tensor parallelism and pipeline parallelism) and \nZeRO-1 (where optimizer states are partitioned across workers). \n \nNxD Training supports model training workflows like pretraining, supervised finetuning (SFT) and parameter efficient \nfinetuning (PEFT) using Low-Rank Adapter (LoRA) techniques [#f1]_. For developers, NxD Training offers both API level access \nthrough NxD Core and PyTorch Lightning and an intuitive interface via YAML based configuration files. NxD Training \noffers a flexible approach that enables customers to leverage only the functionalities that align with their unique \nworkflows and seamlessly integrate their machine learning training software at the appropriate level within NxD Training, \nensuring a user experience tailored to their specific requirements. This is a beta preview version of NxD Training  \nand feedback from the developer community is strongly encouraged for upcoming releases.\n\n\n\n.. _how-nxd-core-user-affected:\n\nI currently use NeuronX Distributed (NxD Core). How does NxD Training release affect me?\n---------------------------------------------------------------------------------------------------------------\n\nExisting NxD Core customers can continue to use NxD Core APIs available under NxD Training. If workflows based on NxD Core \nmeet your needs, you do not need to do anything different with NxD Training’s introduction. NxD Core APIs and \nfunctionalities for NxD Core continue to be available to you as before. You can choose to \n:ref:`install NxD Core only <neuronx_distributed_setup>` and skip all subsequent installation steps for \nNxD Training. However, NxD Training has additional support for YAML based configuration, a model hub and integration with \nPyTorch Lightning. If these capabilities are of interest to you, you may choose to evaluate and start using NxD Training. \n\n.. _should_nnm_usage_continue:\n\nShould the current Neuron NeMo Megatron (NNM) users continue to use NNM?\n------------------------------------------------------------------------------------------------\n\nNxD Training offers same capabilities as Neuron NeMo Megatron (NNM). Additionally, NNM \nwill go into maintenance mode in the next release. If you are currently using NNM, the introduction of NxD Training \ntoolkit means that you should start evaluating NxD Training for your training needs. With its YAML interface, NxD \nTraining is very close in terms of usability to NNM and NeMo. Migrating from NNM to NxD Training  \nshould involve a relatively minor effort and instructions for doing so are provided \n:ref:`here <nxdt_developer_guide_migration_nnm_nxdt>`.\n\n.. 
_what_to_use_as_new_user:\n\nI am new to Neuron and have training workloads, what toolkits or libraries should I use?\n----------------------------------------------------------------------------------------\n\nIf you are starting with Neuron and looking for solutions to your model pretraining or finetuning needs, then NxD Training \nis the recommended toolkit for you. Please start from :ref:`NxD Training page <nxdt>` for overview, \ninstallation and usage instructions.\n\n\nAdditional Resources\n------------------------\n\nMultiple NxD Training resources on getting started, using it and getting required support are listed below. If you encounter issues \nor have product related questions, please refer to FAQs and troubleshooting guides. Additionally, please feel free to reach out to us \nusing resources in Support section.\n\n:ref:`How to get started <neuron-quickstart>`\n\n:ref:`Release notes <neuron-2.19.0-whatsnew>`\n\n:ref:`Main section <nxdt>`\n\n:ref:`Troubleshooting <nxdt_known_issues>` \n\n:ref:`Support <neuron-quickstart>`\n\n.. [#f1] Supported through NxD Core."
  },
  {
    "path": "about-neuron/appnotes/perf/neuron-cc/parallel-ncgs.rst",
    "content": ".. _parallel-exec-ncgs:\n\nParallel Execution using NEURON_RT_NUM_CORES\n===============================================\n\n.. important ::\n  ``NEURONCORE_GROUP_SIZES`` will no longer be supported starting with the Neuron 1.19.0 release. If your application uses ``NEURONCORE_GROUP_SIZES``\n  see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details.\n\n\nIntroduction\n------------\n\nInf1 instances are available with a different number of Inferentia\nchips. Each Inferentia chip consists of 4 NeuronCores and an Inf1\ninstance includes 4 to 64 NeuronCores, depending on the size of the instance.\nThis guide shows you how to load one or more compiled models into\ndifferent consecutive groups of NeuronCores using your framework of choice.\n\nData Parallel Execution\n-----------------------\n\nIn PyTorch and TensorFlow, the same compiled model can run in parallel on an Inf1 instance by loading it multiple times, up to the total number of NeuronCores specified in NEURON_RT_NUM_CORES or NEURON_RT_VISIBLE_CORES. For more information about NEURON_RT_NUM_CORES and NEURON_RT_VISIBLE_CORES, refer to :ref:`Neuron Runtime Configuration <nrt-configuration>`.\n\n\nRunning multiple models using single process\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo run multiple models using a single process, set the environment\nvariable ``NEURON_RT_NUM_CORES`` with a list of the\nnumber of cores in each group, separated by commas.\n\nYou can set the ``NEURON_RT_NUM_CORES`` environment variable at runtime:\n\n.. code :: bash\n\n   #!/bin/bash\n   NEURON_RT_NUM_CORES=13 python your_neuron_application.py\n\nOr from within the Python process running your models (NOTE: You can\nonly set it once in the same process at the beginning of the script):\n\n.. code :: bash\n\n    #!/usr/bin/env python\n    import os\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='13'\n\n    # Load models and run inferences ...\n\nThe following examples allow you to load 4 models into 4 groups of NeuronCores\nwithin one process. For example, if there are 4 models A, B, C, D\ncompiled to 2, 4, 3, and 4 NeuronCores respectively, directly load\nthe models A, B, C, D in sequence within your TensorFlow or PyTorch\nNeuron process. This example requires an inf1.6xlarge instance with 16\nNeuronCores, as the total number of NeuronCores within the NeuronCore\nGroups is 13.\n\n\n\nIn MXNet, mapping from models to NeuronCores is controlled by\ncontext ``mx.neuron(neuron_core_index)`` where ``neuron_core_index`` is the NeuronCore\nindex at the start of the group. In the example above, map model A to ``mx.neuron(0)``\ncontext, model B to ``mx.neuron(2)`` context, model C to\n``mx.neuron(6)`` context and model D to ``mx.neuron(9)`` context. For\nfurther details, refer to :ref:`Flexible Execution Group (FlexEG) in Neuron-MXNet<flexeg>`.\n\nFor PyTorch\n\nSee :ref:`Data Parallel Inference on Torch Neuron<torch-neuron-dataparallel-app-note>` for more details.\n\nFor Tensorflow\n\n.. 
For TensorFlow:\n\n.. code :: python\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='13'\n\n    # Load models (TF2)\n    model0 = tf.keras.models.load_model(model0_file) # loaded into the first group of NC0-NC1\n    model1 = tf.keras.models.load_model(model1_file) # loaded into the second group of NC2-NC5\n    model2 = tf.keras.models.load_model(model2_file) # loaded into the third group of NC6-NC8\n    model3 = tf.keras.models.load_model(model3_file) # loaded into the fourth group of NC9-NC12\n\n    # run inference by simply calling the loaded model\n    results0 = model0(inputs0)\n    results1 = model1(inputs1)\n    results2 = model2(inputs2)\n    results3 = model3(inputs3)\n\n\nFor MXNet 2.x:\n\n.. code :: python\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='13'\n\n    # Load models (MXNet)\n    # loaded into the first group of NC0-NC1\n    sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)\n    model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n    # loaded into the second group of NC2-NC5\n    sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)\n    model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n    # loaded into the third group of NC6-NC8\n    sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)\n    model2 = sym.bind(ctx=mx.neuron(6), args=args, aux_states=aux, grad_req='null')\n    # loaded into the fourth group of NC9-NC12\n    sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)\n    model3 = sym.bind(ctx=mx.neuron(9), args=args, aux_states=aux, grad_req='null')\n\n    # run inference by simply calling the loaded model\n    results0 = model0.forward(data=inputs0)\n    results1 = model1.forward(data=inputs1)\n    results2 = model2.forward(data=inputs2)\n    results3 = model3.forward(data=inputs3)\n\nYou can identify the NeuronCores used by each application with the ``neuron-top`` command\nline tool. For more information about the neuron-top user interface, see :ref:`Neuron Top User Guide <neuron-top-ug>`.\n\n.. code :: bash\n\n   $ neuron-top\n\n.. figure:: /images/multi_1core_models_multi_processes.png\n   :scale: 80 %\n\nRunning multiple models using multiple processes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can also run multiple models in parallel processes by setting\n``NEURON_RT_NUM_CORES`` per process:\n\n.. code :: bash\n\n   $ NEURON_RT_NUM_CORES=2 python your_1st_neuron_application.py\n   $ NEURON_RT_NUM_CORES=2 python your_2nd_neuron_application.py\n\nThe first process automatically selects a set of 2 unused\nNeuronCores for its new group. The second process automatically selects\nanother set of 2 unused NeuronCores for its new group.\n\n.. figure:: /images/multi_2cores_models_multi_processes.png\n   :scale: 80 %\n\nRunning multiple models on the same NeuronCore group\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can load more than one model in a NeuronCore group within one\nprocess. The Neuron runtime handles switching from one model to the\nnext within the NeuronCore group when the next model is run within\nthe application. In TensorFlow or PyTorch, simply load the additional\nmodels after the initial models have been loaded, to fill the\nNeuronCore groups associated with the process.\n\nFor PyTorch:\n\n.. 
code :: python\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='2'\n\n    # Load models (PT)\n    model0 = torch.jit.load(model0_file) # loaded into the first group of NC0-NC1\n    model1 = torch.jit.load(model1_file) # loaded into the first group of NC0-NC1\n\n    # run inference by simply calling the loaded model\n    results0 = model0(inputs0)\n    results1 = model1(inputs1)\n\nFor TensorFlow 2.x:\n\n.. code :: python\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='2'\n\n    # Load models (TF2)\n    model0 = tf.keras.models.load_model(model0_file) # loaded into the first group of NC0-NC1\n    model1 = tf.keras.models.load_model(model1_file) # loaded into the first group of NC0-NC1\n\n    # run inference by simply calling the loaded model\n    results0 = model0(inputs0)\n    results1 = model1(inputs1)\n\nIn MXNet, use context ``mx.neuron(neuron_core_index)`` and use the\nsame NeuronCore start index for the additional models.\n\n.. code :: python\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='2'\n\n    # Load models (MXNet)\n    # loaded into the first group of NC0-NC1\n    sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)\n    model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n    # loaded into the first group of NC0-NC1\n    sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)\n    model1 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n\n    # run inference by simply calling the loaded model\n    results0 = model0.forward(data=inputs0)\n    results1 = model1.forward(data=inputs1)\n\nThe total ``NEURON_RT_NUM_CORES`` across all processes cannot exceed\nthe number of NeuronCores available on the instance. For example,\non an inf1.xlarge with default configurations where the total number of\nNeuronCores visible to TensorFlow-Neuron is 4, you can launch one\nprocess with ``NEURON_RT_NUM_CORES=2`` (pipelined) and another\nprocess with ``NEURON_RT_NUM_CORES=2`` (data-parallel).\n\nExamples using ``NEURON_RT_NUM_CORES`` include:\n\n* :ref:`PyTorch example </src/examples/pytorch/resnet50.ipynb>`\n* :ref:`MXNet example </src/examples/mxnet/resnet50_neuroncore_groups.ipynb>`\n\n\nAuto Model Replication in TensorFlow Neuron (``tensorflow-neuron``) (Beta)\n----------------------------------------------------------------------------------\n\nRefer to the following API documentation to see how to perform automatic replication on\nmultiple cores. Note auto-replication will only work on models compiled with pipeline size 1:\nvia ``--neuroncore-pipeline-cores=1``. If automatic replication is not enabled, the model will default to \nreplicate on up to 4 cores.\n\nPython API (TF 2.x only):\n\n:ref:`tensorflow-ref-auto-replication-python-api`\n\nCLI API (TF 1.x and TF 2.x):\n\n:ref:`tensorflow-ref-auto-replication-cli-api`\n\n\nAuto Model Replication (Being Deprecated)\n-----------------------------------------\n\nThe Auto Model Replication feature in TensorFlow-Neuron enables you to\nload the model once and the data parallel replication will occur\nautomatically. This reduces framework memory usage, as the same model is not loaded multiple times. This feature is beta and\navailable in TensorFlow-Neuron only.\n\nTo enable Auto Model Replication, set NEURONCORE_GROUP_SIZES to Nx1,\nwhere N is the desired replication count (the number of NeuronCore\ngroups, each group has size 1). 
For example, NEURONCORE_GROUP_SIZES=8x1\nwould automatically replicate the single-NeuronCore model 8 times.\n\n.. code :: python\n\n       os.environ['NEURONCORE_GROUP_SIZES'] = '4x1'\n\nor\n\n.. code :: bash\n\n   NEURONCORE_GROUP_SIZES=4x1 python3 application.py\n\nWhen NEURONCORE_GROUP_SIZES is not set, the default is 4x1, where a\nsingle-NeuronCore model is replicated 4 times on any size of inf1 machine.\n\nThis feature is only available for models compiled with\nneuroncore-pipeline-cores set to 1 (default).\n\nYou will still need to use threads in the scaffolding code, to feed the\nloaded replicated model instance, to achieve high throughput.\n\nExample of auto model replication: :ref:`/src/examples/tensorflow/openpose_demo/openpose.ipynb`\n\n\nFAQ\n---\n\nCan I mix data parallel and NeuronCore Pipelines?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYes. You can compile the model using the neuroncore-pipeline-cores option.\nThis tells the compiler to set compilation to the specified number of\ncores for :ref:`neuroncore-pipeline`.\nThe Neuron Compiler returns a NEFF that fits within this limit. See\nthe :ref:`neuron-compiler-cli-reference`\nfor instructions on how to use this option.\n\nFor example, on an inf1.2xlarge, you can load two model instances, each\ncompiled with neuroncore-pipeline-cores set to 2, so they can run\nin parallel. The model instances can be loaded from different saved\nmodels or from the same saved model.\n\nCan I have a mix of multiple models in one Neuroncore group and single model in another one Neuroncore group?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCurrently, you can do this in MXNet, by setting up two Neuroncore groups, then loading,\nfor example, multiple models in one NCG, using context mx.neuron(0), and\nloading a single model in the second NCG, using context mx.neuron(2). You can\nalso load a single model in the first NCG and multiple models in the\nsecond NCG. For example:\n\n.. code :: python\n\n\n    # Set Environment\n    os.environ['NEURON_RT_NUM_CORES']='6'\n\n    # Load models (MXNet)\n    # loaded into the first group of NC0-NC1\n    sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)\n    model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')\n    # loaded into the second group of NC2-NC5\n    sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)\n    model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n    # loaded into the second group of NC2-NC5\n    sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)\n    model2 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n    # loaded into the second group of NC2-NC5\n    sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)\n    model3 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')\n\n    # run inference by simply calling the loaded model\n    results0 = model0.forward(data=inputs0)\n    results1 = model1.forward(data=inputs1)\n    results2 = model2.forward(data=inputs2)\n    results3 = model3.forward(data=inputs3)\n\nLoading multiple models in one NCG and a single model in another NCG is\ncurrently not supported in TensorFlow and PyTorch.\n"
  },
  {
    "path": "about-neuron/appnotes/perf/neuron-cc/performance-tuning.rst",
    "content": ".. _appnote-performance-tuning:\n\nPerformance Tuning\n==================\n\n.. important ::\n  NeuronCore Groups (NCG) have been deprecated. See :ref:`eol-ncg` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more details.\n\nThis guide is intended to provide the reader with an in-depth\nunderstanding of how to optimize neural network performance on\nInferentia for both throughput and latency. For simplicity, the guide\nuses the TensorFlow and ResNet-50 models as teaching examples to show how\nto choose between different compile-time optimizations (e.g., Batching and\nNeuronCore Pipeline), as well as model-serving optimizations (e.g.,\nmulti-threading and dynamic-batching) to improve inference performance.\n\nThe following guides are considered to be prerequisites for this tutorial:\n\n-  :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`\n-  TensorFlow Serving NeuronCore Group\n-  :ref:`neuron-batching`\n-  :ref:`neuroncore-pipeline`\n\nBatching and pipelining (technical background)\n----------------------------------------------\n\nNeuron provides developers with various performance optimization features.\n\nTwo of the most widely used features are batching and pipelining. Both\ntechniques aim to keep the data close to the compute engines, but they achieve\nthis data locality in different ways. In batching it is achieved by loading\nthe data into an on-chip cache and reusing it multiple times for multiple\ndifferent model-inputs, while in pipelining it is achieved by caching all\nmodel parameters into the on-chip cache across multiple NeuronCores and\nstreaming the calculation across them.\n\nAs a general rule of thumb, batching is preferred for applications that\naim to optimize throughput and cost at the expense of latency, while\npipelining is preferred for applications with a high-throughput\nrequirement under a strict latency budget.\n\nCompiling for batching optimization\n-----------------------------------\n\nTo enable batching optimization, the model must first be compiled\nfor a target batch-size. This is done by specifying the batch size in\nthe input tensor's batch dimension during compilation. Users are\nencouraged to evaluate multiple batch size, in order to determine the\noptimal latency/throughput deployment-point, which is application-dependent.\n\nFor example, the code snippet below enables batching on a ResNet50\nmodel, with a batch-size of 5:\n\n.. code:: python\n\n   import numpy as np\n   import tensorflow.neuron as tfn\n\n   # To change the batch size, change the first dimension in example_input\n   batch_size = 5\n   example_input = np.zeros([batch_size,224,224,3], dtype='float16')\n\n   tfn.saved_model.compile(\"rn50_fp16\",\n                           \"rn50_fp16_compiled/1\",\n                           model_feed_dict={'input_1:0': example_input },\n                           dynamic_batch_size=True)\n\n.. note::\n\n   Depending on the size of the neural network, Neuron has a maximum\n   batch size that works optimally on Inferentia. 
Compiling for pipeline optimization\n-----------------------------------\n\nIn NeuronCore Pipeline mode, Neuron stores the model parameters in\nthe Inferentia devices' local caches and streams inference requests across\nthe available NeuronCores, as specified by the\n``--neuroncore-pipeline-cores`` compiler argument. For example, to\ncompile the model to fit a pipeline size of four Inferentia devices (16\nNeuronCores) available in the inf1.6xlarge instance size:\n\n.. code:: python\n\n   import numpy as np\n   import tensorflow.neuron as tfn\n\n   compiler_args = ['--neuroncore-pipeline-cores', '16']\n   example_input = np.zeros([1,224,224,3], dtype='float16')\n   tfn.saved_model.compile(\"rn50_fp16\",\n                           \"rn50_fp16_compiled/1\",\n                           model_feed_dict={'input_1:0': example_input },\n                           compiler_args=compiler_args)\n\nThe minimum number of NeuronCores needed to run a compiled model can be\nfound using the Neuron Check Model tool. See :ref:`neuron_check_model`.\n\nModel-serving inference optimizations\n-------------------------------------\n\nTo fully realize the maximum throughput of the compiled model\n(for either batching or pipelining), users need to launch multiple host\nCPU threads to feed inputs into the Neuron pipeline. The number of\nthreads needs to be larger than the specified maximum number of\nNeuronCores.\n\nAdditionally, dynamic batching can be used to process a larger\nclient-side inference batch-size; the framework automatically breaks\nup the user batch into smaller batch sizes to match the compiled\nbatch-size. This technique increases the achievable throughput by hiding\nthe framework-to-neuron overhead, and amortizing it over a larger batch\nsize. To use dynamic batching, set ``dynamic_batch_size=True`` during\ncompilation and send a larger inference batch size (user inference\nbatch size) that is a multiple of the compiled batch size.\n\nBoth methods can be applied together if this improves\nperformance. However, multi-threading is always needed as a first step\nto achieve high throughput. You need to experiment to find\noptimal settings for your application; a rough sketch of the\nmulti-threaded feeding pattern follows.\n\n
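The exact serving stack differs by framework; purely as an illustration (a hypothetical sketch that assumes a loaded Neuron model callable as ``model`` and a list of preprocessed ``batches``), the feeding pattern looks like this:\n\n.. code:: python\n\n   from concurrent.futures import ThreadPoolExecutor\n\n   # Use more feeder threads than NeuronCores so the hardware is never starved.\n   NUM_THREADS = 8\n\n   def infer(batch):\n       return model(batch)  # `model` is the compiled and loaded Neuron model\n\n   with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:\n       results = list(pool.map(infer, batches))\n\n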
By default the framework sets the number of outstanding inference\nrequests to the total number of NeuronCores plus three. This can be\nchanged by setting the NEURON_MAX_NUM_INFERS environment variable. For\nexample, if the compiled model includes CPU partitions (e.g., if the Neuron compiler decides that some operations are more efficient to execute on CPU),\nthe number of threads needs to be increased to account for the\nadditional compute performed on the CPU. Note that the available\ninstance host memory size needs to be taken into consideration to prevent\nout-of-memory errors. As above, you need to experiment in order to find\nthe optimal settings for your application.\n\n.. note::\n\n   By default the framework allocates a NeuronCore Group size to\n   match the size of the compiled model. The size of the model is the\n   number of NeuronCores limit passed to the compiler during compilation\n   (``--neuroncore-pipeline-cores`` option). For more information see the\n   TensorFlow Serving NeuronCore Group documentation.\n\nOther considerations\n--------------------\n\nMixed Precision\n~~~~~~~~~~~~~~~\n\nYou can find more information about performance and accuracy trade-offs\nin :ref:`neuron-cc-training-mixed-precision`.\n\n\nOperator support\n~~~~~~~~~~~~~~~~\n\nThe Neuron Compiler maintains an evolving list of supported operators\nfor each framework: :ref:`neuron-supported-operators`\n\nAWS Neuron handles unsupported operators by partitioning the graph into\nsubgraphs and executing them on different targets (e.g., NeuronCore\npartition, CPU partition). If the entire model can run on Inferentia\n(i.e., all operators are supported), then it will be compiled into\na single subgraph, which will be executed by a NeuronCore Group.\n\nDebug\n~~~~~\n\nYou can examine the post-compiled model to view the compilation results\nusing the Neuron plugin for TensorBoard.\nSee :ref:`tensorboard-plugin-visualize-graph`.\n\nResNet-50 optimization example\n------------------------------\n\nFor an example demonstrating the concepts described here, see\n:ref:`/src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb`\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuron/bucketing-app-note.rst",
    "content": ".. _bucketing_app_note:\n\nRunning inference on variable input shapes with bucketing\n=========================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nWith Inferentia, the shape of every input must be fixed at compile time. For\napplications that require multiple input sizes, we recommend using padding or\nbucketing techniques. Padding requires you to compile your model with the\nlargest expected input size and pad every input to this maximum size. If the\nperformance of your model using padding is not within your targets, you can\nconsider implementing bucketing.\n\nThis guide introduces bucketing, a technique to run inference on inputs with\nvariable shapes on Inferentia. The following sections explain how bucketing can\nimprove the performance of inference workloads on Inferentia. It covers an\noverview of how bucketing works and provides examples of using bucketing in\n:ref:`computer vision <bucketing_example_cv>` and\n:ref:`natural language processing<bucketing_example_nlp>` applications.\n\nApplications that benefit from bucketing\n----------------------------------------\n\nBucketing refers to compiling your model multiple times with different target\ninput shapes to create “bucketed models.\" :ref:`creating_buckets` provides an\noverview on selecting the input shapes that you use to create bucketed models. At\ninference time, each input is padded until its shape matches the next largest\nbucket shape. The padded input is then passed into the corresponding bucketed model\nfor inference. By compiling the same model with multiple different input shapes,\nthe amount of input padding is reduced compared to padding every input to the\nmaximum size in your dataset. This technique minimizes the compute overhead\nand improves inference performance compared to padding every image to the\nmaximum shape in your dataset.\n\nBucketing works best when multiple different bucketed models are created to efficiently\ncover the full range of input shapes. You can fine-tune the model performance\nby experimenting with different bucket sizes that correspond to the\ndistribution of input shapes in your dataset.\n\nBucketing can only be used if there is an upper bound on the shape of the\ninputs. If necessary, an upper bound on the input shape can be enforced using\nresizing and other forms of preprocessing.\n\n.. _num_buckets:\n\nThe upper bound on the number of bucketed models that you use is dictated by the\ntotal size of the compiled bucketed models. Each Inferentia chip has 8GB of\nDRAM, or 2GB of DRAM per NeuronCore. An inf1.xlarge and inf1.2xlarge have\n1 Inferentia chip, an inf1.6xlarge has 4 Inferentia chips, and an inf1.24xlarge\nhas 16 Inferentia chips. Thus, you should limit the total size of all bucketed\nmodels to around 8GB per Inferentia chip or 2GB per NeuronCore.\nThe following formula provides an approximation for the number of\ncompiled bucketed models you can fit on each NeuronCore:\n\n::\n\n    number-of-buckets = round(10^9 / number-of-weights-in-model)\n\nWe recommend using :ref:`neuron-top <neuron-top-ug>` to monitor the\nmemory usage on your inf1 instance as you load multiple bucketed models.\n\nImplementing bucketing\n-----------------------\n\nImplementing bucketing consists of two main parts: creating multiple bucketed\nmodels at compile-time and running inference using the bucketed models on (padded)\ninputs. 
The following sections describe how to implement bucketing to run\ninference in applications that have variable input shapes.\n\n.. _creating_buckets:\n\nCreating bucketed models\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBefore running inference, models should be compiled for different input shapes\nthat are representative of the input dataset. The input shapes that are used\nto compile the models determine the bucket shapes that are used during inference.\nThe bucket shapes should be chosen to minimize the amount of padding on each new input.\nAdditionally, there should always be a bucket that’s large enough to handle the\nmaximum input shape in the dataset. The limit on the number of compiled bucketed\nmodels that can be used is described in this :ref:`section<num_buckets>`.\n\n\nRunning inference with bucketing\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAt inference time, each input should be padded to match the size of the next\nlargest bucket, such that the height and width (or sequence length) of the\npadded input equals the size of the bucket. Then, the padded input should\nbe passed into the corresponding bucket for inference. If necessary, it’s\nimportant to remove and/or crop any aberrant predictions that occur in the\npadded region. For example, in object detection applications, bounding box\npredictions that occur in the padded regions should be removed to avoid\nerroneous predictions. \n\n.. _bucketing_examples:\n\nExamples\n--------\n\nThe following sections provide examples of applying the bucketing technique\nto run inference in applications that have variable input shapes.\n\n.. _bucketing_example_cv:\n\nComputer vision bucketing\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs an example of implementing bucketing for computer vision models, consider an\napplication where the height and width of images in dataset are uniformly\ndistributed between `[400, 400]` and `[800, 800]`. Given that every input\nshape between `[400, 400]` and `[800, 800]` is equally likely, it could\nmake sense to create bucketed models that divide up the range of input shapes into\nequally sized chunks. For example, we could create bucketed models for the input shapes\n`[500, 500]`, `[600, 600]`, `[700, 700]`, and `[800, 800]`. \n\nAs an example of running inference with bucketing, let’s assume that we created\nbucketed models for the input shapes `[500, 500]`, `[600, 600]`, `[700, 700]`, and\n`[800, 800]`. If we receive an input with shape `[640, 640]`, we would\npad the input to the next largest bucket, `[700, 700]`, and use this bucket\nfor inference. If we receive an input with shape `[440, 540]`, we would\nneed to pad the input to the bucket size, `[600, 600]`, and use this bucket\nfor inference.\n\nAs another example of creating bucketed models, consider a computer vision\napplication where the dataset is not uniformly distributed. As before, let’s\nassume the input shapes range between `[400, 400]` to `[800, 800]`. Now, let’s\nassume the data shape distribution is bimodal, such that `[540, 540]` and\n`[720, 720]` are the two most common input shapes. In this example, it might\nmake sense to create bucketed models for input shapes `[540, 540]`, `[720, 720]`, and\n`[800, 800]` to target the most common shapes while still including the\nentire range of input shapes.\n\n\nEnd-to-end computer vision bucketing example\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this example, we run inference in a computer vision application that has\nvariable shaped images that range in shape from `[400, 400]` to\n`[800, 800]`. 
We create bucketed models for the input shapes `[500, 500]`,\n`[600, 600]`, `[700, 700]`, and `[800, 800]` to handle the variable input\nshapes.\n\n.. code-block:: python\n\n    import numpy as np\n    import torch\n    from torchvision import models\n    import torch_neuron\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Define the bucket sizes that will be used for compilation and inference\n    bucket_sizes = [(500, 500), (600, 600), (700, 700), (800, 800)]\n\n    # Create the bucketed models by compiling a model for each bucket size\n    buckets = {}\n    for bucket_size in bucket_sizes:\n        # Create an example input that is the desired bucket size\n        h, w = bucket_size\n        image = torch.rand([1, 3, h, w])\n\n        # Compile with the example input to create the bucketed model\n        model_neuron = torch.neuron.trace(model, image)\n\n        # Run a warm up inference to load the model into Inferentia memory\n        model_neuron(image)\n\n        # Add the bucketed model based on its bucket size\n        buckets[bucket_size] = model_neuron\n\n\n    def get_bucket_and_pad_image(image):\n        # Determine which bucket size to use\n        oh, ow = image.shape[-2:]\n        target_bucket = None\n        for bucket_size in bucket_sizes:\n            # Choose a bucket that's larger in both the height and width dimensions\n            if oh <= bucket_size[0] and ow <= bucket_size[1]:\n                target_bucket = bucket_size\n                break\n\n        # Pad the image to match the size of the bucket\n        h_delta = target_bucket[0] - oh\n        w_delta = target_bucket[1] - ow\n\n        b_pad = h_delta  # Bottom padding\n        l_pad = 0  # Left padding\n        t_pad = 0  # Top padding\n        r_pad = w_delta  # Right padding\n\n        # Pad the height and width of the image\n        padding_amounts = (l_pad, r_pad, t_pad, b_pad)\n        image_padded = torch.nn.functional.pad(image, padding_amounts, value=0)\n\n        return image_padded, target_bucket\n\n\n    # Run inference on inputs with different shapes\n    for _ in range(10):\n        # Create an image with a random height and width in range [400, 400] to [800, 800]\n        h = int(np.random.uniform(low=400, high=800))\n        w = int(np.random.uniform(low=400, high=800))\n        image = torch.rand(1, 3, h, w)\n\n        # Determine bucket and pad the image\n        image_padded, target_bucket = get_bucket_and_pad_image(image)\n\n        # Use the corresponding bucket to run inference\n        output = buckets[target_bucket](image_padded)\n\n\n.. _bucketing_example_nlp:\n\nNatural language processing bucketing\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs an example of implementing bucketing for natural language processing models,\nconsider an application where the lengths of tokenized sequences in a dataset are\nuniformly distributed between 0 and 128 tokens. Given that every tokenized sequence\nlength between 0 and 128 is equally likely, it might make sense to create\nbucketed models that divide up the range of tokenized sequence lengths into equally sized\nchunks. For example, we could create bucketed models for tokenized sequence lengths 64\nand 128.\n\nAs an example of running inference with bucketing, let's assume that we created\nbucketed models for the input tokenized sequence lengths 64 and 128. 
If we receive a\ntokenized sequence with length 55, we would need to pad it to the bucket size\n64 and use this bucket for inference. If we receive a tokenized sequence with\nlength 112, we would need to pad it to the bucket size 128 and use this bucket\nfor inference.\n\nEnd-to-end natural language processing bucketing example\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this example, we run inference in a natural language processing application\nthat has variable length tokenized sequences that range from 0 to 128. We\ncreate bucketed models for lengths 64 and 128 to handle the variable input lengths.\n\n.. code-block:: python\n\n    import numpy as np\n    import torch\n    from transformers import AutoTokenizer, AutoModelForSequenceClassification\n    import torch_neuron\n\n    # Build tokenizer and model\n    tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\n    model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n    model.eval()\n\n    # Define the bucket sizes that will be used for compilation and inference\n    bucket_sizes = [64, 128]\n\n    # Create the bucketed models by compiling a model for each bucket size\n    buckets = {}\n    for bucket_size in bucket_sizes:\n        # Setup some example inputs\n        sequence_0 = \"The company HuggingFace is based in New York City\"\n        sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n\n        # Create an example input that is the desired bucket size\n        paraphrase = tokenizer.encode_plus(sequence_0,\n                                        sequence_1,\n                                        max_length=bucket_size,\n                                        padding='max_length',\n                                        truncation=True,\n                                        return_tensors=\"pt\")\n\n        # Convert example inputs to a format that is compatible with TorchScript tracing\n        example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\n\n        # Compile with the example input to create the bucketed model\n        model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)\n\n        # Run a warm up inference to load the model into Inferentia memory\n        model_neuron(*example_inputs_paraphrase)\n\n        # Add the bucketed model based on its bucket size\n        buckets[bucket_size] = model_neuron\n\n\n    def get_bucket_and_pad_paraphrase(paraphrase):\n        # Determine which bucket size to use\n        inputs = paraphrase['input_ids']\n        attention = paraphrase['attention_mask']\n        token_type = paraphrase['token_type_ids']\n        paraphrase_len = inputs.shape[1]\n        target_bucket = None\n        for bucket_size in bucket_sizes:\n            if paraphrase_len <= bucket_size:\n                target_bucket = bucket_size\n                break\n\n        # Pad the paraphrase to match the size of the bucket\n        delta = target_bucket - paraphrase_len\n        zeros = torch.zeros([1, delta], dtype=torch.long)\n        inputs = torch.cat([inputs, zeros], dim=1)\n        attention = torch.cat([attention, zeros], dim=1)\n        token_type = torch.cat([token_type, zeros], dim=1)\n\n        paraphrase_padded = inputs, attention, token_type\n        return paraphrase_padded, target_bucket\n\n\n    # Create two sample sequences\n    sequence_0 = (\"The only other bear similar in size to the 
polar bear is the \"\n                  \"Kodiak bear, which is a subspecies of the brown bear. Adult male \"\n                  \"polar bears weigh 350–700 kg and measure 2.4–3 meters in total \"\n                  \"length. All bears are short-tailed, the polar bear's tail is \"\n                  \"relatively the shortest amongst living bears.\")\n    sequence_1 = (\"Around the Beaufort Sea, however, mature males reportedly \"\n                  \"average 450 kg. Adult females are roughly half the size of males \"\n                  \"and normally weigh 150–250 kg, measuring 1.8–2.4 meters in length. \"\n                  \"The legs are stocky and the ears and tail are small.\")\n\n    # Run inference on inputs with different shapes\n    # We create the variable shapes by randomly cropping the sequences\n    for _ in range(10):\n        # Get random sequence lengths between 0 and 128\n        paraphrase_len = int(np.random.uniform(128))\n\n        # Crop the paraphrase\n        paraphrase_cropped = tokenizer.encode_plus(sequence_0,\n                                        sequence_1,\n                                        max_length=paraphrase_len,\n                                        padding='max_length',\n                                        truncation=True,\n                                        return_tensors=\"pt\")\n\n        # Determine bucket and pad the paraphrase\n        paraphrase_padded, target_bucket = get_bucket_and_pad_paraphrase(paraphrase_cropped)\n\n        # Use the corresponding bucket to run inference\n        output = buckets[target_bucket](*paraphrase_padded)\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuron/index.rst",
    "content": ".. _torch-neuron-appnotes:\n\nPyTorch Neuron Application Notes\n=================================\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   bucketing-app-note\n   rcnn-app-note\n   torch-neuron-dataparallel-app-note\n\nThis section contains application notes specific to PyTorch Neuron (``torch-neuron``) for ``Inf1`` instances. These guides cover advanced optimization techniques, implementation patterns, and best practices for deploying PyTorch models on AWS Inferentia.\n\nApplication Notes\n-----------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: bucketing-app-note\n      :link-type: doc\n\n      **Dynamic Batching with Bucketing**\n      ^^^\n      Optimize inference performance using dynamic batching and bucketing strategies\n\n   .. grid-item-card::\n      :link: rcnn-app-note\n      :link-type: doc\n\n      **R-CNN Implementation Guide**\n      ^^^\n      Comprehensive guide for implementing and optimizing R-CNN models on Inferentia\n\n   .. grid-item-card::\n      :link: torch-neuron-dataparallel-app-note\n      :link-type: doc\n\n      **Data Parallel Inference**\n      ^^^\n      Scale inference workloads using ``torch.neuron.DataParallel`` for multi-core execution"
  },
  {
    "path": "about-neuron/appnotes/torch-neuron/rcnn-app-note.rst",
    "content": ".. _torch-neuron-r-cnn-app-note:\n\nRunning R-CNNs on Inf1\n======================\n\nThis application note demonstrates how to compile and run\n`Detectron2 <https://github.com/facebookresearch/detectron2>`__-based\nR-CNNs on Inf1. It also provides guidance on how to use profiling to\nimprove performance of R-CNN models on Inf1.\n\n.. contents:: Table of contents\n   :local:\n\n\nR-CNN Model Overview\n--------------------\n\nRegion-based CNN (R-CNN) models are commonly used for object detection\nand image segmentation tasks. A typical R-CNN architecture consists\nof the following components:\n\n-  **Backbone:** The backbone extracts features from input images. In\n   some models the backbone is a Feature Pyramid Network (FPN), which\n   uses a top-down architecture with lateral connections to build an\n   in-network feature pyramid from a single-scale input. The backbone is\n   commonly a ResNet or Vision Transformer based network.\n-  **Region Proposal Network (RPN):** The RPN predicts region proposals\n   with a wide range of scales and aspect ratios. RPNs are constructed\n   using convolutional layers and anchor boxes, which that serve as references\n   for multiple scales and aspect ratios.\n-  **Region of Interest (RoI):** The RoI component is used to resize the\n   extracted features of varying size to the same size so that\n   they can be consumed by a fully connected layer. RoI Align is\n   typically used instead of RoI Pooling, because RoI Align provides\n   better alignment.\n\nThe `Detectron2 <https://github.com/facebookresearch/detectron2>`__\nlibrary provides many popular PyTorch R-CNN implementations, including\nR-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. This application note \nfocuses on the Detectron2 R-CNN models.\n\nR-CNN Limitations and Considerations on Inferentia (NeuronCore-v1)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nR-CNN models may have limitations and considerations on Inferentia\n(NeuronCore-v1). See the Model Architecture Fit Guidelines\nfor more information. These limitations are not\napplicable to NeuronCore-v2.\n\nRequirements\n------------\n\nThe process described in this application note is intended to be run on an ``inf1.2xlarge``. In practice,\nR-CNN models can be run on any Inf1 instance size.\n\nVerify that this Jupyter notebook is running the Python kernel\nenvironment that was set up according to the `PyTorch Installation\nGuide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install.html>`__.\nSelect the kernel from the “Kernel -> Change Kernel” option at\nthe top of the Jupyter notebook page.\n\nInstallation\n------------\n\nThis process requires the following pip packages:\n\n- ``torch==1.11.0``\n- ``torch-neuron``\n- ``neuron-cc``\n- ``opencv-python``\n- ``pycocotools``\n- ``torchvision==0.12.0``\n- ``detectron2==0.6``\n\nThe following section explains how to build ``torchvision`` from source and install\nthe ``Detectron2`` package. It also reinstalls the Neuron packages, to ensure\nversion compatibility.\n\nThe ``torchvision`` ``roi_align_kernel.cpp`` kernel is modified to\nuse OMP threading for a multi-threaded inference on the CPU. This significantly\nimproves the performance of RoI Align kernels on Inf1: OMP threading\nleads to a RoI Align latency reduction two to three times larger than the default\n``roi_align_kernel.cpp`` kernel configuration.\n\n.. 
code:: ipython3\n\n    # Install python3.7-dev for pycocotools (a Detectron2 dependency)\n    !sudo apt install python3.7-dev -y\n    \n    # Install Neuron packages\n    !pip uninstall -y torchvision\n    !pip install --force-reinstall \"protobuf==3.20.1\" ninja opencv-python\n    !pip install --force-reinstall torch-neuron==1.11.0.* neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n    # Change cuda to 10.2 for Detectron2\n    !sudo rm /usr/local/cuda\n    !sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda\n    \n    # Install Torchvision 0.12.0 from source\n    !git clone -b release/0.12 https://github.com/pytorch/vision.git\n    \n    # Update the RoI Align kernel to use OMP multithreading\n    with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'r') as file:\n        content = file.read()\n    \n    # Enable OMP Multithreading and set the number of threads to 4\n    old = \"// #pragma omp parallel for num_threads(32)\"\n    new = \"#pragma omp parallel for num_threads(4)\"\n    content = content.replace(old, new)\n    \n    # Re-write the file\n    with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'w') as file:\n        file.write(content)\n    \n    # Build Torchvision with OMP threading\n    !cd vision && CFLAGS=\"-fopenmp\" python setup.py bdist_wheel\n    %pip install vision/dist/*.whl\n    \n    # Install Detectron2 release v0.6\n    !python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'\n\nCompiling an R-CNN for Inf1\n---------------------------\n\nBy default, R-CNN models are not compilable on Inf1, because they cannot\nbe traced with ``torch.jit.trace``, which is a requisite for inference\non Inf1. The following section demonstrates techniques for compiling a\nDetectron2 R-CNN model for inference on Inf1.\n\nSpecifically, this section explains how to create a standard Detectron2 R-CNN model,\nusing a ResNet-101 backbone. It demonstrates how to use profiling to\nidentify the most compute-intensive parts of the R-CNN that need to be\ncompiled for accelerated inference on Inf1. It then explains how to\nmanually extract and compile the ResNet backbone (the dominant compute\ncomponent) and inject the compiled backbone back into the full model, for\nimproved performance.\n\nCreate a Detectron2 R-CNN Model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCreate a Detectron2 R-CNN model using the\n``COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml`` pretrained weights and\nconfig file. Download a sample image from the COCO dataset and\nrun an example inference.\n\n.. code:: ipython3\n\n    from detectron2 import model_zoo\n    from detectron2.engine import DefaultPredictor\n    from detectron2.config import get_cfg\n    \n    def get_model():\n    \n        # Configure the R-CNN model\n        CONFIG_FILE = \"COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml\"\n        WEIGHTS_FILE = \"COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml\"\n        cfg = get_cfg()\n        cfg.merge_from_file(model_zoo.get_config_file(CONFIG_FILE))\n        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(WEIGHTS_FILE)\n        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5\n        cfg.MODEL.DEVICE = 'cpu'  # Send to CPU for Neuron Tracing\n    \n        # Create the R-CNN predictor wrapper\n        predictor = DefaultPredictor(cfg)\n        return predictor\n\n.. 
code:: ipython3\n\n    import os\n    import urllib.request\n    \n    # Define a function to get a sample image\n    def get_image():\n        filename = 'input.jpg'\n        if not os.path.exists(filename):\n            url = \"http://images.cocodataset.org/val2017/000000439715.jpg\"\n            urllib.request.urlretrieve(url, filename)\n        return filename\n\n.. code:: ipython3\n\n    import time\n    import cv2\n    \n    # Create an R-CNN model\n    predictor = get_model()\n    \n    # Get a sample image from the COCO dataset\n    image_filename = get_image()\n    image = cv2.imread(image_filename)\n    \n    # Run inference and print inference latency\n    start = time.time()\n    outputs = predictor(image)\n    print(f'Inference time: {(time.time() - start):0.3f} s')\n\nProfile the Model\n~~~~~~~~~~~~~~~~~\n\nUse the `PyTorch\nProfiler <https://pytorch.org/docs/stable/profiler.html>`__ to identify\nwhich operators contribute the most to the model’s runtime on CPU.\nIdeally, you can compile these compute-intensive operators onto Inf1 for\naccelerated inference.\n\n.. code:: ipython3\n\n    import torch.autograd.profiler as profiler\n    \n    with profiler.profile(record_shapes=True) as prof:\n        with profiler.record_function(\"model_inference\"):\n            predictor(image)\n    print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=30))\n\nWe see that convolution operators (``aten::convolution``) contribute the\nmost to inference time. By compiling these convolution operators to\nInf1, you can improve the performance of the R-CNN model. Print the\nR-CNN model architecture to see which layers contain the\n``aten::convolution`` operators:\n\n.. code:: ipython3\n\n    print(predictor.model)\n\nNote that the ResNet FPN backbone\n(`predictor.model.backbone <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/backbone/fpn.py>`__ L17-L162)\ncontains the majority of convolution operators in the model. The RPN\n(`predictor.model.proposal_generator <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/proposal_generator/rpn.py>`__ L181-L533)\nalso contains several convolutions. Based on this,\ncompile the ResNet backbone and RPN onto Inf1 to maximize performance.\n\nCompiling the ResNet backbone to Inf1\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis section demonstrates how to compile the ResNet backbone to\nInf1 and use it for inference.\n\nExtract the backbone by accessing it with\n``predictor.model.backbone``. Compile the backbone using\n``strict=False``, because the backbone outputs a dictionary. Use a\nfixed input shape (``800 x 800``) for compilation, as all inputs will be resized to this shape during inference. This\nsection also defines a basic preprocessing function (mostly derived from\nthe Detectron2 R-CNN\n`DefaultPredictor <https://github.com/facebookresearch/detectron2/blob/45b3fcea6e76bf7a351e54e01c7d6e1a3a0100a5/detectron2/engine/defaults.py>`__\nmodule L308-L318) that reshapes inputs to ``800 x 800``.\n\nCreate a ``NeuronRCNN`` wrapper to inject the\ncompiled backbone back into the model by dynamically replacing the\n``predictor.model.backbone`` attribute with the compiled model.\n\n.. 
code:: ipython3\n\n    import torch\n    import torch_neuron \n    \n    example = torch.rand([1, 3, 800, 800])\n    \n    # Use `with torch.no_grad():` to avoid a jit tracing issue in the ResNet backbone\n    with torch.no_grad():\n        neuron_backbone = torch_neuron.trace(predictor.model.backbone, example, strict=False)\n    \n    backbone_filename = 'backbone.pt'\n    torch.jit.save(neuron_backbone, backbone_filename)\n\n.. code:: ipython3\n\n    from detectron2.modeling.meta_arch.rcnn import GeneralizedRCNN\n    from torch.jit import ScriptModule\n\n    class NeuronRCNN(torch.nn.Module):\n        \"\"\"\n        Creates a `NeuronRCNN` wrapper that injects the compiled backbone into\n        the R-CNN model. It also stores the `size_divisibility` attribute from\n        the original backbone.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN, neuron_backbone: ScriptModule) -> None:\n            super().__init__()\n    \n            # Keep track of the backbone variables\n            size_divisibility = model.backbone.size_divisibility\n    \n            # Load and inject the compiled backbone\n            model.backbone = neuron_backbone\n    \n            # Set backbone variables\n            setattr(model.backbone, 'size_divisibility', size_divisibility)\n    \n            self.model = model\n    \n        def forward(self, x):\n            return self.model(x)\n\n.. code:: ipython3\n\n    # Create the R-CNN with the compiled backbone\n    neuron_rcnn = NeuronRCNN(predictor.model, neuron_backbone)\n    neuron_rcnn.eval()\n\n    # Print the R-CNN architecture to verify the backbone is now the\n    # `neuron_backbone` (shows up as `RecursiveScriptModule`)\n    print(neuron_rcnn)\n\n.. code:: ipython3\n\n    def preprocess(original_image, predictor):\n        \"\"\"\n        A basic preprocessing function that sets the input height=800 and \n        input width=800. The function is derived from the preprocessing\n        steps in the Detectron2 `DefaultPredictor` module.\n        \"\"\"\n    \n        height, width = original_image.shape[:2]\n        resize_func = predictor.aug.get_transform(original_image)\n        resize_func.new_h = 800 # Override height\n        resize_func.new_w = 800 # Override width\n        image = resize_func.apply_image(original_image)\n        image = torch.as_tensor(image.astype(\"float32\").transpose(2, 0, 1))\n        inputs = {\"image\": image, \"height\": height, \"width\": width}\n        return inputs\n\n.. code:: ipython3\n\n    # Get a resized input using the sample image\n    inputs = preprocess(image, get_model())\n    \n    # Run inference and print inference latency\n    start = time.time()\n    for _ in range(10):\n        outputs = neuron_rcnn([inputs])[0]\n    print(f'Inference time: {((time.time() - start)/10):0.3f} s')\n\n.. code:: ipython3\n\n    with profiler.profile(record_shapes=True) as prof:\n        with profiler.record_function(\"model_inference\"):\n            neuron_rcnn([inputs])\n    print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=30))\n\nBy running the backbone on Inf1, the overall runtime is already\nsignificantly improved. The count and runtime of ``aten::convolution``\noperators is also decreased. 
We now see a ``neuron::forward_v2``\noperator that is the compiled backbone.\n\nOptimize the R-CNN model\n------------------------\n\nCompiling the RPN\n~~~~~~~~~~~~~~~~~\n\nExamine the profiling and note that there are still several\n``aten::convolution``, ``aten::linear``, and ``aten::addmm`` operators\nthat significantly contribute to the model’s overall latency. By\ninspecting the model's architecture and code, we can determine that the\nmajority of these operators are contained in the RPN module\n(`predictor.model.proposal_generator <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/proposal_generator/rpn.py>`__ L181-L533).\n\nTo improve the model's performance, extract the RPN Head and\ncompile it on Inf1 to increase the number of operators running\non Inf1. You need to compile the RPN Head, because the RPN Anchor Generator\ncontains objects that are not traceable with ``torch.jit.trace``.\n\nThe RPN Head contains five layers that run inference on multiple resized\ninputs. To compile the RPN Head, create a list of tensors\nthat contain the input (“``features``”) shapes used by RPN Head on\neach layer. These tensor shapes can be determined by printing the input\nshapes in the RPN Head ``forward`` function\n(``predictor.model.proposal_generator.rpn_head.forward``).\n\nCreate a new ``NeuronRCNN`` wrapper that injects both the\ncompiled backbone and RPN Head into the R-CNN model.\n\n.. code:: ipython3\n\n    import math\n    \n    input_shape = [1, 3, 800, 800] # Overall input shape at inference time\n    \n    # Create the list example of RPN inputs using the resizing logic from the RPN Head\n    features = list()\n    for i in [0, 1, 2, 3, 4]:\n        ratio = 1 / (4 * 2**i)\n        x_i_h = math.ceil(input_shape[2] * ratio)\n        x_i_w = math.ceil(input_shape[3] * ratio)\n        feature = torch.zeros(1, 256, x_i_h, x_i_w)\n        features.append(feature)\n\n.. code:: ipython3\n\n    # Extract and compile the RPN Head\n    neuron_rpn_head = torch_neuron.trace(predictor.model.proposal_generator.rpn_head, [features])\n    rpn_head_filename = 'rpn_head.pt'\n    torch.jit.save(neuron_rpn_head, rpn_head_filename)\n\n.. code:: ipython3\n\n    class NeuronRCNN(torch.nn.Module):\n        \"\"\"\n        Creates a wrapper that injects the compiled backbone and RPN Head\n        into the R-CNN model.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN, neuron_backbone: ScriptModule, neuron_rpn_head: ScriptModule) -> None:\n            super().__init__()\n    \n            # Keep track of the backbone variables\n            size_divisibility = model.backbone.size_divisibility\n    \n            # Inject the compiled backbone\n            model.backbone = neuron_backbone\n    \n            # Set backbone variables\n            setattr(model.backbone, 'size_divisibility', size_divisibility)\n    \n            # Inject the compiled RPN Head\n            model.proposal_generator.rpn_head = neuron_rpn_head\n    \n            self.model = model\n    \n        def forward(self, x):\n            return self.model(x)\n\n.. code:: ipython3\n\n    # Create the R-CNN with the compiled backbone and RPN Head\n    predictor = get_model()\n    neuron_rcnn = NeuronRCNN(predictor.model, neuron_backbone, neuron_rpn_head)\n    neuron_rcnn.eval()\n\n    # Print the R-CNN architecture to verify the compiled modules show up\n    print(neuron_rcnn)\n\n.. 
code:: ipython3\n\n    # Run inference and print inference latency\n    start = time.time()\n    for _ in range(10):\n        outputs = neuron_rcnn([inputs])[0]\n    print(f'Inference time: {((time.time() - start)/10):0.3f} s')\n\n.. code:: ipython3\n\n    with profiler.profile(record_shapes=True) as prof:\n        with profiler.record_function(\"model_inference\"):\n            neuron_rcnn([inputs])\n    print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=30))\n\nBy running the compiled backbone and RPN Head on Inf1, overall\nruntime is improved. Once again, the number and runtime of\n``aten::convolution`` operators is also decreased. There are now two\n``neuron::forward_v2`` operators, which correspond to the compiled\nbackbone and RPN Head.\n\nFusing the Backbone and RPN Head\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIt is usually preferable to compile fewer independent models\n(“subgraphs”) on Inf1. Combining models and compiling them as a single\nsubgraph enables the Neuron compiler to perform additional optimizations\nand reduces I/O data transfer between CPU and NeuronCores between\neach subgraph.\n\nIn this section, the ResNet backbone and RPN Head are \"fused\" into a\nsingle model to compile on Inf1. Create the\n``NeuronFusedBackboneRPNHead`` wrapper as a compilable model that\ncontains both the ResNet backbone\n(`predictor.model.backbone <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/backbone/fpn.py>`__ L17-L162)\nand RPN Head\n(`predictor.model.proposal_generator <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/proposal_generator/rpn.py>`__ L181-L533).\nOutput the ``features`` to be used downstream by the RoI\nHeads. Compile this ``NeuronFusedBackboneRPNHead`` wrapper as\n``neuron_backbone_rpn``, then create a separate ``BackboneRPN``\nwrapper to inject the ``neuron_backbone_rpn`` in place of\nthe original backbone and RPN Head. Copy the remainder of the\nRPN ``forward`` code\n(`predictor.model.proposal_generator.forward <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/proposal_generator/rpn.py>`__ L431-L480)\nto create a “fused” backbone + RPN module. Lastly, re-write the\n``NeuronRCNN`` wrapper to use the fused backbone + RPN module. The\n``NeuronRCNN`` wrapper also uses the ``predictor.model`` ``forward``\ncode to re-write the rest of the R-CNN model forward function.\n\n.. code:: ipython3\n\n    class NeuronFusedBackboneRPNHead(torch.nn.Module):\n        \"\"\"\n        Wrapper to compile the fused ResNet backbone and RPN Head.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.backbone = model.backbone\n            self.rpn_head = model.proposal_generator.rpn_head\n            self.in_features = model.proposal_generator.in_features\n    \n        def forward(self, x):\n            features = self.backbone(x)\n            features_ = [features[f] for f in self.in_features]\n            return self.rpn_head(features_), features\n\n.. 
code:: ipython3\n\n    # Create the wrapper with the combined backbone and RPN Head\n    predictor = get_model()\n    backbone_rpn_wrapper = NeuronFusedBackboneRPNHead(predictor.model)\n    backbone_rpn_wrapper.eval()\n    \n    # Compile the wrapper\n    example = torch.rand([1, 3, 800, 800])\n    \n    with torch.no_grad():\n        neuron_backbone_rpn_head = torch_neuron.trace(\n            backbone_rpn_wrapper, example, strict=False)\n    \n    backbone_rpn_filename = 'backbone_rpn.pt'\n    torch.jit.save(neuron_backbone_rpn_head, backbone_rpn_filename)\n\n.. code:: ipython3\n\n    class BackboneRPN(torch.nn.Module):\n        \"\"\"\n        Wrapper that uses the compiled `neuron_backbone_rpn` instead\n        of the original backbone and RPN Head. We copy the remainder\n        of the RPN `forward` code (`predictor.model.proposal_generator.forward`)\n        to create a \"fused\" backbone + RPN module.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.backbone_rpn_head = NeuronFusedBackboneRPNHead(model)\n            self._rpn = model.proposal_generator\n            self.in_features = model.proposal_generator.in_features\n    \n        def forward(self, images):\n            preds, features = self.backbone_rpn_head(images.tensor)\n            features_ = [features[f] for f in self.in_features]\n            pred_objectness_logits, pred_anchor_deltas = preds\n            anchors = self._rpn.anchor_generator(features_)\n    \n            # Transpose the Hi*Wi*A dimension to the middle:\n            pred_objectness_logits = [\n                # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)\n                score.permute(0, 2, 3, 1).flatten(1)\n                for score in pred_objectness_logits\n            ]\n            pred_anchor_deltas = [\n                # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)\n                x.view(x.shape[0], -1, self._rpn.anchor_generator.box_dim,\n                       x.shape[-2], x.shape[-1])\n                .permute(0, 3, 4, 1, 2)\n                .flatten(1, -2)\n                for x in pred_anchor_deltas\n            ]\n    \n            proposals = self._rpn.predict_proposals(\n                anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes\n            )\n            return proposals, features\n\n.. code:: ipython3\n\n    class NeuronRCNN(torch.nn.Module):\n        \"\"\"\n        Wrapper that uses the fused backbone + RPN module and re-writes\n        the rest of the R-CNN `model` `forward` function.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n    \n            # Use the fused Backbone + RPN\n            self.backbone_rpn = BackboneRPN(model)\n    \n            self.roi_heads = model.roi_heads\n    \n            self.preprocess_image = model.preprocess_image\n            self._postprocess = model._postprocess\n    \n        def forward(self, batched_inputs):\n            images = self.preprocess_image(batched_inputs)\n            proposals, features = self.backbone_rpn(images)\n            results, _ = self.roi_heads(images, features, proposals, None)\n            return self._postprocess(results, batched_inputs, images.image_sizes)\n\n.. 
code:: ipython3\n\n    # Create the new NeuronRCNN wrapper with the combined backbone and RPN Head\n    predictor = get_model()\n    neuron_rcnn = NeuronRCNN(predictor.model)\n    neuron_rcnn.eval()\n\n    # Inject the Neuron compiled models\n    neuron_rcnn.backbone_rpn.backbone_rpn_head = neuron_backbone_rpn_head\n\n    # Print the R-CNN architecture to verify the compiled modules show up\n    print(neuron_rcnn)\n\n.. code:: ipython3\n\n    # Run inference and print inference latency\n    start = time.time()\n    for _ in range(10):\n        outputs = neuron_rcnn([inputs])[0]\n    print(f'Inference time: {((time.time() - start)/10):0.3f} s')\n\n.. code:: ipython3\n\n    with profiler.profile(record_shapes=True) as prof:\n        with profiler.record_function(\"model_inference\"):\n            neuron_rcnn([inputs])\n    print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=30))\n\nBy running the fused backbone + RPN Head on Inf1, overall runtime is\nimproved even more. We now see a single ``neuron::forward_v2`` operator with\na lower runtime than the previous combined runtime of the two separate\n``neuron::forward_v2`` operators.\n\nCompiling the RoI Heads\n~~~~~~~~~~~~~~~~~~~~~~~\n\nThis section describes how to extract and compile part of the RoI Heads module\n(`predictor.model.roi_heads <https://github.com/facebookresearch/detectron2/blob/v0.6/detectron2/modeling/roi_heads/roi_heads.py>`__ L530-L778), so that most of the remaining ``aten::linear`` and ``aten::addmm``\noperators run on Inf1. The entire RoI Heads module cannot be extracted, because\nit contains unsupported operators. So you need to create a\n``NeuronBoxHeadBoxPredictor`` wrapper that extracts specific parts of\nthe ``roi_heads`` for compilation. The example input for compilation is\nthe shape of the input into the ``self.roi_heads.box_head.forward``\nfunction. Write another wrapper, ``ROIHead``, that combines the\ncompiled ``roi_heads`` into the rest of the RoI module. The\n``_forward_box`` and ``forward`` functions are from the\n``predictor.model.roi_heads`` module. Lastly, re-write the ``NeuronRCNN``\nwrapper to use the optimized RoI Heads wrapper as well as the fused\nbackbone + RPN module.\n\n.. code:: ipython3\n\n    class NeuronBoxHeadBoxPredictor(torch.nn.Module):\n        \"\"\"\n        Wrapper that extracts the RoI Box Head and Box Predictor\n        for compilation.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.roi_heads = model.roi_heads\n    \n        def forward(self, box_features):\n            box_features = self.roi_heads.box_head(box_features)\n            predictions = self.roi_heads.box_predictor(box_features)\n            return predictions\n\n.. code:: ipython3\n\n    # Create the NeuronBoxHeadBoxPredictor wrapper\n    predictor = get_model()\n    box_head_predictor = NeuronBoxHeadBoxPredictor(predictor.model)\n    box_head_predictor.eval()\n\n    # Compile the wrapper\n    example = torch.rand([1000, 256, 7, 7])\n    neuron_box_head_predictor = torch_neuron.trace(box_head_predictor, example)\n\n    roi_head_filename = 'box_head_predictor.pt'\n    torch.jit.save(neuron_box_head_predictor, roi_head_filename)\n\n.. code:: ipython3\n\n    class ROIHead(torch.nn.Module):\n        \"\"\"\n        Wrapper that combines the compiled `roi_heads` into the\n        rest of the RoI module. 
The `_forward_box` and `forward`\n        functions are from the `predictor.model.roi_heads` module.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.roi_heads = model.roi_heads\n            self.neuron_box_head_predictor = NeuronBoxHeadBoxPredictor(model)\n    \n        def _forward_box(self, features, proposals):\n            features = [features[f] for f in self.roi_heads.box_in_features]\n            box_features = self.roi_heads.box_pooler(\n                features, [x.proposal_boxes for x in proposals])\n            predictions = self.neuron_box_head_predictor(box_features)\n            pred_instances, _ = self.roi_heads.box_predictor.inference(\n                predictions, proposals)\n            return pred_instances\n    \n        def forward(self, images, features, proposals, targets=None):\n            pred_instances = self._forward_box(features, proposals)\n            pred_instances = self.roi_heads.forward_with_given_boxes(\n                features, pred_instances)\n            return pred_instances, {}\n\n.. code:: ipython3\n\n    class NeuronRCNN(torch.nn.Module):\n        \"\"\"\n        Wrapper that uses the fused backbone + RPN module and the optimized RoI\n        Heads wrapper\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n    \n            # Create fused Backbone + RPN\n            self.backbone_rpn = BackboneRPN(model)\n    \n            # Create Neuron RoI Head\n            self.roi_heads = ROIHead(model)\n    \n            # Define pre and post-processing functions\n            self.preprocess_image = model.preprocess_image\n            self._postprocess = model._postprocess\n    \n        def forward(self, batched_inputs):\n            images = self.preprocess_image(batched_inputs)\n            proposals, features = self.backbone_rpn(images)\n            results, _ = self.roi_heads(images, features, proposals, None)\n            return self._postprocess(results, batched_inputs, images.image_sizes)\n\n.. code:: ipython3\n\n    # Initialize an R-CNN on CPU\n    predictor = get_model()\n\n    # Create the Neuron R-CNN on CPU\n    neuron_rcnn = NeuronRCNN(predictor.model)\n    neuron_rcnn.eval()\n\n    # Inject the Neuron compiled models\n    neuron_rcnn.backbone_rpn.backbone_rpn_head = neuron_backbone_rpn_head\n    neuron_rcnn.roi_heads.neuron_box_head_predictor = neuron_box_head_predictor\n\n.. code:: ipython3\n\n    # Run inference and print inference latency\n    start = time.time()\n    for _ in range(10):\n        outputs = neuron_rcnn([inputs])[0]\n    print(f'CPU Inference time: {((time.time() - start)/10):0.3f} s')\n\n.. code:: ipython3\n\n    with profiler.profile(record_shapes=True) as prof:\n        with profiler.record_function(\"model_inference\"):\n            neuron_rcnn([inputs])\n    print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=30))\n\nAlthough the overall latency did not change significantly, running more\nof the model on Inf1 instead of CPU frees up CPU resources when\nmultiple models are running in parallel.\n\nEnd-to-end Compilation and Inference\n------------------------------------\n\nThis section provides standalone code that compiles and runs an\noptimized Detectron2 R-CNN on Inf1. Most of the code in this section is\nfrom the previous sections in this application note and is\nconsolidated here for easy deployment. 
This section has the following\nmain components:\n\n- Preprocessing and compilation functions\n- Wrappers that extract the R-CNN ResNet backbone, RPN Head, and RoI\n   Head for compilation on Inf1.\n- A ``NeuronRCNN`` wrapper that creates an optimized end-to-end\n   Detectron2 R-CNN model for inference on Inf1\n- Benchmarking code that runs parallelized inference for optimized\n   throughput on Inf1\n\nBenchmarking\n~~~~~~~~~~~~\n\nThe benchmarking section explains how to load multiple optimized RCNN models and\nrun them in parallel, to maximize throughput.\n\nUse the beta NeuronCore placement API,\n``torch_neuron.experimental.neuron_cores_context()``, to ensure all\ncompiled models in an optimized RCNN model are loaded onto the same\nNeuronCore. Note that the functionality and API of\n``torch_neuron.experimental.neuron_cores_context()`` might change in\nfuture releases.\n\nDefine a simple benchmark function that loads four optimized RCNN\nmodels onto four separate NeuronCores, runs multithreaded inference, and\ncalculates the corresponding latency and throughput. Benchmark\nvarious numbers of loaded models, to show the impact of parallelism.\n\nNote that throughput increases (at the cost of latency) when more\nmodels are run in parallel on Inf1. Increasing the number of worker\nthreads also improves throughput.\n\nOther improvements\n~~~~~~~~~~~~~~~~~~\n\nThere are many additional optimizations that can be applied to RCNN\nmodels on Inf1 depending on the application:\n\nFor latency sensitive applications:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n-  Each of the five layers in the RPN head can be parallelized to\n   decrease overall latency.\n-  The number of OMP Threads can be increased in the ROI Align kernel.\n   Both of these optimizations improve latency, at the cost of\n   decreasing throughput.\n\nFor throughput sensitive applications:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n-  The input batch size can be increased to improve NeuronCore\n   utilization.\n\n.. 
code:: ipython3\n\n    import time\n    import os\n    import urllib.request\n    from typing import Any, Union, Callable\n    \n    import cv2\n    import numpy as np\n    from concurrent.futures import ThreadPoolExecutor\n    \n    import torch\n    import torch_neuron\n    \n    from detectron2 import model_zoo\n    from detectron2.engine import DefaultPredictor\n    from detectron2.config import get_cfg\n    from detectron2.modeling.meta_arch.rcnn import GeneralizedRCNN\n    \n    \n    # -----------------------------------------------------------------------------\n    # Helper functions\n    # -----------------------------------------------------------------------------\n    \n    def get_model():\n    \n        # Configure the R-CNN model\n        CONFIG_FILE = \"COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml\"\n        WEIGHTS_FILE = \"COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml\"\n        cfg = get_cfg()\n        cfg.merge_from_file(model_zoo.get_config_file(CONFIG_FILE))\n        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(WEIGHTS_FILE)\n        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5\n        cfg.MODEL.DEVICE = 'cpu'  # Send to CPU for Neuron Tracing\n    \n        # Create the R-CNN predictor wrapper\n        predictor = DefaultPredictor(cfg)\n        return predictor\n    \n    \n    def get_image():\n    \n        # Get a sample image\n        filename = 'input.jpg'\n        if not os.path.exists(filename):\n            url = \"http://images.cocodataset.org/val2017/000000439715.jpg\"\n            urllib.request.urlretrieve(url, filename)\n        return filename\n    \n    \n    def preprocess(original_image, predictor):\n        \"\"\"\n        A basic preprocessing function that sets the input height=800 and \n        input width=800. The function is derived from the preprocessing\n        steps in the Detectron2 `DefaultPredictor` module.\n        \"\"\"\n    \n        height, width = original_image.shape[:2]\n        resize_func = predictor.aug.get_transform(original_image)\n        resize_func.new_h = 800 # Override height\n        resize_func.new_w = 800 # Override width\n        image = resize_func.apply_image(original_image)\n        image = torch.as_tensor(image.astype(\"float32\").transpose(2, 0, 1))\n        inputs = {\"image\": image, \"height\": height, \"width\": width}\n        return inputs\n    \n    \n    # -----------------------------------------------------------------------------\n    # Neuron modules\n    # -----------------------------------------------------------------------------\n    \n    class NeuronFusedBackboneRPNHead(torch.nn.Module):\n        \"\"\"\n        Wrapper to compile the fused ResNet backbone and RPN Head.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.backbone = model.backbone\n            self.rpn_head = model.proposal_generator.rpn_head\n            self.in_features = model.proposal_generator.in_features\n    \n        def forward(self, x):\n            features = self.backbone(x)\n            features_ = [features[f] for f in self.in_features]\n            return self.rpn_head(features_), features\n    \n    \n    class BackboneRPN(torch.nn.Module):\n        \"\"\"\n        Wrapper that uses the compiled `neuron_backbone_rpn` instead\n        of the original backbone and RPN Head. 
We copy the remainder\n        of the RPN `forward` code (`predictor.model.proposal_generator.forward`)\n        to create a \"fused\" backbone + RPN module.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.backbone_rpn_head = NeuronFusedBackboneRPNHead(model)\n            self._rpn = model.proposal_generator\n            self.in_features = model.proposal_generator.in_features\n    \n        def forward(self, images):\n            preds, features = self.backbone_rpn_head(images.tensor)\n            features_ = [features[f] for f in self.in_features]\n            pred_objectness_logits, pred_anchor_deltas = preds\n            anchors = self._rpn.anchor_generator(features_)\n    \n            # Transpose the Hi*Wi*A dimension to the middle:\n            pred_objectness_logits = [\n                # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)\n                score.permute(0, 2, 3, 1).flatten(1)\n                for score in pred_objectness_logits\n            ]\n            pred_anchor_deltas = [\n                # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)\n                x.view(x.shape[0], -1, self._rpn.anchor_generator.box_dim,\n                       x.shape[-2], x.shape[-1])\n                .permute(0, 3, 4, 1, 2)\n                .flatten(1, -2)\n                for x in pred_anchor_deltas\n            ]\n    \n            proposals = self._rpn.predict_proposals(\n                anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes\n            )\n            return proposals, features\n    \n    \n    class NeuronBoxHeadBoxPredictor(torch.nn.Module):\n        \"\"\"\n        Wrapper that extracts the RoI Box Head and Box Predictor\n        for compilation.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.roi_heads = model.roi_heads\n    \n        def forward(self, box_features):\n            box_features = self.roi_heads.box_head(box_features)\n            predictions = self.roi_heads.box_predictor(box_features)\n            return predictions\n    \n    \n    class ROIHead(torch.nn.Module):\n        \"\"\"\n        Wrapper that combines the compiled `roi_heads` into the\n        rest of the RoI module. 
The `_forward_box` and `forward`\n        functions are from the `predictor.model.roi_heads` module.\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n            self.roi_heads = model.roi_heads\n            self.neuron_box_head_predictor = NeuronBoxHeadBoxPredictor(model)\n    \n        def _forward_box(self, features, proposals):\n            features = [features[f] for f in self.roi_heads.box_in_features]\n            box_features = self.roi_heads.box_pooler(\n                features, [x.proposal_boxes for x in proposals])\n            predictions = self.neuron_box_head_predictor(box_features)\n            pred_instances, _ = self.roi_heads.box_predictor.inference(\n                predictions, proposals)\n            return pred_instances\n    \n        def forward(self, images, features, proposals, targets=None):\n            pred_instances = self._forward_box(features, proposals)\n            pred_instances = self.roi_heads.forward_with_given_boxes(\n                features, pred_instances)\n            return pred_instances, {}\n    \n    \n    class NeuronRCNN(torch.nn.Module):\n        \"\"\"\n        Wrapper that uses the fused backbone + RPN module and the optimized RoI\n        Heads wrapper\n        \"\"\"\n    \n        def __init__(self, model: GeneralizedRCNN) -> None:\n            super().__init__()\n    \n            # Create fused Backbone + RPN\n            self.backbone_rpn = BackboneRPN(model)\n    \n            # Create Neuron RoI Head\n            self.roi_heads = ROIHead(model)\n    \n            # Define pre and post-processing functions\n            self.preprocess_image = model.preprocess_image\n            self._postprocess = model._postprocess\n    \n        def forward(self, batched_inputs):\n            images = self.preprocess_image(batched_inputs)\n            proposals, features = self.backbone_rpn(images)\n            results, _ = self.roi_heads(images, features, proposals, None)\n            return self._postprocess(results, batched_inputs, images.image_sizes)\n    \n    \n    # -----------------------------------------------------------------------------\n    # Compilation functions\n    # -----------------------------------------------------------------------------\n    \n    def compile(\n        model: Union[Callable, torch.nn.Module],\n        example_inputs: Any,\n        filename: str,\n        **kwargs\n    ) -> torch.nn.Module:\n        \"\"\"\n        Compiles the model for Inf1 if it doesn't already exist and saves it as the provided filename. 
\n        \n        model: A module or function which defines a torch model or computation.\n        example_inputs: An example set of inputs which will be passed to the\n            `model` during compilation.\n        filename: Name of the compiled model\n        kwargs: Extra `torch_neuron.trace` kwargs\n        \"\"\"\n    \n        if not os.path.exists(filename):\n            with torch.no_grad():\n                compiled_model = torch_neuron.trace(model, example_inputs, **kwargs)\n            torch.jit.save(compiled_model, filename)\n    \n    \n    # -----------------------------------------------------------------------------\n    # Benchmarking function\n    # -----------------------------------------------------------------------------\n    \n    def benchmark(backbone_rpn_filename, roi_head_filename, inputs, \n                  n_models=4, batch_size=1, n_threads=4, iterations=200):\n        \"\"\"\n        A simple benchmarking function that loads `n_models` optimized\n        models onto separate NeuronCores, runs multithreaded inference,\n        and calculates the corresponding latency and throughput.\n        \"\"\"\n    \n        # Load models\n        models = list()\n        for i in range(n_models):\n            with torch_neuron.experimental.neuron_cores_context(i):\n                # Create the RCNN with the fused backbone + RPN Head and compiled RoI Heads\n                # Initialize an R-CNN on CPU\n                predictor = get_model()\n\n                # Create the Neuron R-CNN on CPU\n                neuron_rcnn = NeuronRCNN(predictor.model)\n                neuron_rcnn.eval()\n\n                # Inject the Neuron compiled models\n                neuron_rcnn.backbone_rpn.backbone_rpn_head = torch.jit.load(backbone_rpn_filename)\n                neuron_rcnn.roi_heads.neuron_box_head_predictor = torch.jit.load(roi_head_filename)\n\n                models.append(neuron_rcnn)\n    \n        # Warmup\n        for _ in range(8):\n            for model in models:\n                model([inputs])\n    \n        latencies = []\n    \n        # Thread task\n        def task(i):\n            start = time.time()\n            models[i]([inputs])\n            finish = time.time()\n            latencies.append((finish - start) * 1000)\n    \n        begin = time.time()\n        with ThreadPoolExecutor(max_workers=n_threads) as pool:\n            for i in range(iterations):\n                pool.submit(task, i % n_models)\n        end = time.time()\n    \n        # Compute metrics\n        boundaries = [50, 95, 99]\n        names = [f'Latency P{i} (ms)' for i in boundaries]\n        percentiles = np.percentile(latencies, boundaries)\n        duration = end - begin\n    \n        # Display metrics\n        results = {\n            'Samples': iterations,\n            'Batch Size': batch_size,\n            'Models': n_models,\n            'Threads': n_threads,\n            'Duration (s)': end - begin,\n            'Throughput (inf/s)': (batch_size * iterations) / duration,\n            **dict(zip(names, percentiles)),\n        }\n    \n        print('-' * 80)\n        pad = max(map(len, results))\n        for key, value in results.items():\n            if isinstance(value, float):\n                print(f'{key + \":\" :<{pad + 1}} {value:0.3f}')\n            else:\n                print(f'{key + \":\" :<{pad + 1}} {value}')\n        print()\n    \n    \n    if __name__ == \"__main__\":\n    \n        # Create and compile the combined backbone and RPN Head wrapper\n        
backbone_rpn_filename = 'backbone_rpn.pt'\n        predictor = get_model()\n        backbone_rpn_wrapper = NeuronFusedBackboneRPNHead(predictor.model)\n        backbone_rpn_wrapper.eval()\n        example = torch.rand([1, 3, 800, 800])\n        compile(backbone_rpn_wrapper, example, backbone_rpn_filename, strict=False)\n\n        # Create and compile the RoI Head wrapper\n        roi_head_filename = 'box_head_predictor.pt'\n        predictor = get_model()\n        box_head_predictor = NeuronBoxHeadBoxPredictor(predictor.model)\n        box_head_predictor.eval()\n        example = torch.rand([1000, 256, 7, 7])\n        compile(box_head_predictor, example, roi_head_filename)\n\n        # Download a sample image from the COCO dataset and read it\n        image_filename = get_image()\n        image = cv2.imread(image_filename)\n        inputs = preprocess(image, get_model())\n    \n        # Benchmark the Neuron R-CNN model for various numbers of loaded models\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=1, n_threads=1)\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=1, n_threads=2)\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=2, n_threads=2)\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=2, n_threads=4)\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=4, n_threads=4)\n        benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=4, n_threads=8)\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuron/torch-neuron-dataparallel-app-note.rst",
    "content": ".. _torch-neuron-dataparallel-app-note:\n\nData Parallel Inference on Torch Neuron\n=======================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nThis guide introduces :func:`torch.neuron.DataParallel`, a Python API that\nimplements data parallelism on :class:`~torch.jit.ScriptModule` models created by the\n:doc:`Trace API </archive/torch-neuron/api-compilation-python-api>`.\nThe following sections explain how data parallelism can improve the performance of\ninference workloads on Inferentia, including how :func:`torch.neuron.DataParallel`\nuses dynamic batching to run inference on variable input sizes. It covers an\noverview of the :func:`torch.neuron.DataParallel` module and provides a few\n:ref:`example data parallel applications <data_paraellel_examples>`.\n\nData parallel inference\n-------------------------\n\nData Parallelism is a form of parallelization across multiple devices or cores,\nreferred to as nodes. Each node contains the same model and parameters, but\ndata is distributed across the different nodes. By distributing the\ndata across multiple nodes, data parallelism reduces the total\nexecution time of large batch size inputs compared to sequential execution.\nData parallelism works best for smaller models in latency sensitive\napplications that have large batch size requirements.\n\n\ntorch.neuron.DataParallel\n-------------------------\n\nTo fully leverage the Inferentia hardware, we want to use all available\nNeuronCores. An inf1.xlarge and inf1.2xlarge have four NeuronCores, an\ninf1.6xlarge has 16 NeuronCores, and an inf1.24xlarge has 64 NeuronCores.\nFor maximum performance on Inferentia hardware, we can use\n:func:`torch.neuron.DataParallel` to utilize all available NeuronCores.\n\n:func:`torch.neuron.DataParallel` implements data parallelism at the module\nlevel by replicating the Neuron model on all available NeuronCores\nand distributing data across the different cores for parallelized inference.\nThis function is analogous to :class:`~torch.nn.DataParallel` in PyTorch.\n:func:`torch.neuron.DataParallel` requires PyTorch >= 1.8.\n\nThe following sections provide an overview of some of the features\nof :func:`torch.neuron.DataParallel` that enable maximum performance on\nInferentia.\n\nNeuronCore selection\n^^^^^^^^^^^^^^^^^^^^\n\nBy default, DataParallel will try to use all NeuronCores allocated to the\ncurrent process to fully saturate the Inferentia hardware for maximum performance.\nIt is more efficient to make the batch dimension divisible by the number of\nNeuronCores. This will ensure that NeuronCores are not left idle during\nparallel inference and the Inferentia hardware is fully utilized.\n\nIn some applications, it is advantageous to use a subset of the\navailable NeuronCores for DataParallel inference. DataParallel has a\n``device_ids`` argument that accepts a list of :obj:`int` or ``'nc:#'``\nthat specify the NeuronCores to use for parallelization. See\n:ref:`Specifying NeuronCores <dataparallel_example_specify_ncs>`\nfor an example of how to use ``device_ids`` argument.\n\nBatch dim\n^^^^^^^^^\n\nDataParallel accepts a ``dim`` argument that denotes the batch dimension used\nto split the input data for distributed inference. By default,\nDataParalell splits the inputs on ``dim = 0`` if the ``dim`` argument is not\nspecified. 
For applications with a non-zero batch dim, the ``dim`` argument\ncan be used to specify the inference-time input batch dimension.\n:ref:`DataParallel with dim != 0 <data_paraellel_examples>` provides an\nexample of data parallel inference on inputs with batch dim = 2.\n\n.. _dynamic_batching_description:\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\nBatch size has a direct impact on model performance. The Inferentia chip is optimized\nto run with small batch sizes. This means that a Neuron compiled model can outperform\na GPU model, even when running single-digit batch sizes.\n\nAs a general best practice, we recommend optimizing your model's throughput by\ncompiling the model with a small batch size and gradually increasing it to\nfind the peak throughput on Inferentia.\n\nDynamic batching is a feature that allows you to use tensor batch sizes that the\nNeuron model was not originally compiled against. This is necessary because the\nunderlying Inferentia hardware will always execute inferences with the batch\nsize used during compilation. Fixed batch size execution allows tuning the\ninput batch size for optimal performance. For example, batch size 1 may be\nbest suited for an ultra-low latency on-demand inference application, while\nbatch size > 1 can be used to maximize throughput for offline inferencing.\nDynamic batching is implemented by slicing large input tensors into chunks\nthat match the batch size used during the :func:`torch_neuron.trace` compilation call.\n\nThe :func:`torch.neuron.DataParallel` class automatically enables dynamic batching on\neligible models. This allows us to run inference in applications that have\ninputs with a variable batch size without needing to recompile the model. See\n:ref:`Dynamic batching <dataparallel_example_dynamic_batching>` for an example\nof how DataParallel can be used to run inference on inputs with a dynamic batch\nsize without needing to recompile the model.\n\nDynamic batching using small batch sizes can result in sub-optimal throughput\nbecause it involves slicing tensors into chunks and iteratively sending data\nto the hardware. Using a larger batch size at compilation time can use the\nInferentia hardware more efficiently in order to maximize throughput. You can\ntest the tradeoff between individual request latency and total throughput by\nfine-tuning the input batch size.\n\nDynamic batching in the DataParallel module can be disabled using the\n``disable_dynamic_batching()`` function as follows:\n\n.. code-block:: python\n\n   >>> model_parallel = torch.neuron.DataParallel(model_neuron)\n   >>> model_parallel.disable_dynamic_batching()\n\nIf dynamic batching is disabled, the compile-time batch size must be equal to\nthe inference-time batch size divided by the number of NeuronCores.\n:ref:`DataParallel with dim != 0 <dataparallel_example_dim_neq_zero>` and\n:ref:`Dynamic batching disabled <dataparallel_example_disable_dynamic_batching>`\nprovide examples of running DataParallel inference with dynamic batching\ndisabled.\n\n\nPerformance optimizations\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe DataParallel module has a ``num_workers`` attribute that can be used to\nspecify the number of worker threads used for multithreaded inference. By\ndefault, ``num_workers = 2 * number of NeuronCores``. This value can be\nfine-tuned to optimize DataParallel performance.\n\nDataParallel has a ``split_size`` attribute that dictates the size of the input\nchunks that are distributed to each NeuronCore. 
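\n\nAs a minimal sketch (``model_neuron`` is again a placeholder, the values shown are illustrative, and it is assumed that both attributes can be assigned directly on the module as described in this section):\n\n.. code-block:: python\n\n   >>> model_parallel = torch.neuron.DataParallel(model_neuron)\n   >>> model_parallel.num_workers = 8   # default is 2 * number of NeuronCores\n   >>> model_parallel.split_size = 4    # match the compile-time batch size\n\n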
By default,\n``split_size = max(1, input.shape[dim] // number of NeuronCores)``. This value\ncan be modified to optimally match the inference input chunk size with the\ncompile-time batch size.\n\n.. _data_paraellel_examples:\n\nExamples\n--------\n\nThe following sections provide example usages of the\n:func:`torch.neuron.DataParallel` module.\n\n\n.. _dataparallel_example_default:\n\nDefault usage\n^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-default.rst\n\n.. _dataparallel_example_specify_ncs:\n\nSpecifying NeuronCores\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst\n\n\n.. _dataparallel_example_dim_neq_zero:\n\nDataParallel with dim != 0\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst\n\n\n.. _dataparallel_example_dynamic_batching:\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst\n\n\n.. _dataparallel_example_disable_dynamic_batching:\n\nDynamic batching disabled\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst\n\n\nFull tutorial with torch.neuron.DataParallel\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor an end-to-end tutorial that uses DataParallel, see the\n:ref:`PyTorch Resnet Tutorial </src/examples/pytorch/resnet50.ipynb>`.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/index.rst",
    "content": ".. _torch-neuronx-appnotes:\n\nPyTorch NeuronX Application Notes\n==================================\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   introducing-pytorch-2-6\n   introducing-pytorch-2-7\n   introducing-pytorch-2-8\n   introducing-pytorch-2-9\n   introducing-pytorch-2-x\n   migration-from-xla-downcast-bf16\n   torch-neuronx-dataparallel-app-note\n   torch-neuronx-graph-partitioner-app-note\n\nThis section contains application notes specific to PyTorch NeuronX (``torch-neuronx``) for ``Trn1`` and ``Inf2`` instances. These guides cover PyTorch version migrations, advanced features, optimization techniques, and best practices for training and inference on AWS Trainium and Inferentia2.\n\nPyTorch Version Support\n-----------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: introducing-pytorch-2-9\n      :link-type: doc\n\n      **PyTorch 2.9 Support**\n      ^^^\n      New features and migration guide for PyTorch 2.9 on Neuron\n\n   .. grid-item-card::\n      :link: introducing-pytorch-2-8\n      :link-type: doc\n\n      **PyTorch 2.8 Support**\n      ^^^\n      New features and migration guide for PyTorch 2.8 on Neuron\n\n   .. grid-item-card::\n      :link: introducing-pytorch-2-7\n      :link-type: doc\n\n      **PyTorch 2.7 Support**\n      ^^^\n      Features and improvements introduced with PyTorch 2.7 support\n\n   .. grid-item-card::\n      :link: introducing-pytorch-2-x\n      :link-type: doc\n\n      **PyTorch 2.x Overview**\n      ^^^\n      General guide to PyTorch 2.x series support and features\n\nAdvanced Features\n-----------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: torch-neuronx-graph-partitioner-app-note\n      :link-type: doc\n\n      **Graph Partitioner**\n      ^^^\n      Advanced graph partitioning strategies for distributed training and inference\n\n   .. grid-item-card::\n      :link: torch-neuronx-dataparallel-app-note\n      :link-type: doc\n\n      **Data Parallel Inference**\n      ^^^\n      Scale inference workloads using ``torch.neuronx.DataParallel`` for multi-core execution\n\n   .. grid-item-card::\n      :link: migration-from-xla-downcast-bf16\n      :link-type: doc\n\n      **XLA Migration Guide**\n      ^^^\n      Migrate from deprecated XLA environment variables to PyTorch mixed-precision options\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-6.rst",
    "content": ".. _introduce-pytorch-2-6:\n\nIntroducing PyTorch 2.6 Support\n===============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the :ref:`Neuron 2.23 <neuron-2.23.0-whatsnew>` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.6.\n\n:ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.6 for Amazon Linux 2023 and Ubuntu 22.04. Note that PyTorch NeuronX 2.6 is supported on Python 3.9, 3.10, and 3.11.\n\nReview :ref:`migration guide <migrate_to_pytorch_2.6>` for possible changes to training scripts. No code changes are required for inference scripts.\n\n\n.. _how-pytorch-2.6-different:\n\nHow is PyTorch NeuronX 2.6 different compared to PyTorch NeuronX 2.5?\n---------------------------------------------------------------------\n\nPyTorch NeuronX 2.6 uses Torch-XLA 2.6 which has improved support for Automatic Mixed Precision and buffer aliasing. Additionally:\n\n* Reintroduced ``XLA_USE_32BIT_LONG`` to give customers the flexibility to use INT32 for their workloads. This flag was removed in v2.5.\n* Added xm.xla_device_kind() to return the XLA device kind string ('NC_v2' for Trainium1, 'NC_v3' and 'NC_v3d' for Trainium2). See :ref:`logical-neuroncore-config` for more info.\n\nSee `Torch-XLA 2.6 release <https://github.com/pytorch/xla/releases/tag/v2.6.0>`__ for a full list.\n\nSee :ref:`migrate_to_pytorch_2.6` for changes needed to use PyTorch NeuronX 2.6.\n\n.. note::\n\n   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.\n\n.. _install_pytorch_neuron_2.6:\n\nHow can I install PyTorch NeuronX 2.6?\n--------------------------------------------\n\nTo install PyTorch NeuronX 2.6, follow the :ref:`setup-torch-neuronx` guides for Amazon Linux 2023 and Ubuntu 22.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` for Ubuntu 22.04 with a pre-installed virtual environment for PyTorch NeuronX 2.6 that you can use to get started. PyTorch NeuronX 2.6 can be installed using the following:\n\n.. code::\n\n    python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.6.* torchvision\n\n.. note::\n\n   PyTorch NeuronX 2.6 is currently available for Python 3.9, 3.10, 3.11.\n\n.. _migrate_to_pytorch_2.6:\n\nMigrate your application to PyTorch 2.6\n---------------------------------------\n\nFirst, install the PyTorch NeuronX 2.6 as described above in :ref:`installation guide <install_pytorch_neuron_2.6>`\n\n\nMigrating training scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo migrate the training scripts from PyTorch NeuronX 2.5 to PyTorch NeuronX 2.6, implement the following changes: \n\n.. note::\n\n    ``xm`` below refers to ``torch_xla.core.xla_model``, ``xr`` refers to ``torch_xla.runtime``, and ``xmp`` refers to ``torch_xla.distributed.xla_multiprocessing``\n\n* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used) and will be removed in an upcoming release. Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to convert model to BF16 format. (see :ref:`migration_from_xla_downcast_bf16`)\n* The functions ``xm.xrt_world_size()``, ``xm.get_ordinal()``, and ``xm.get_local_ordinal()`` are deprecated (warnings are shown when used). 
Switch to ``xr.world_size()``, ``xr.global_ordinal()``, and ``xr.local_ordinal()`` respectively as replacements.\n* The default behavior of ``torch.load`` parameter ``weights_only`` is changed from ``False`` to ``True``. Setting ``weights_only`` to ``True`` may cause issues with pickling custom objects.\n* If using ``xmp.spawn``, the ``nprocs`` argument is limited to 1 or None since v2.1. Previously, passing a value > 1 would result in a warning. In torch-xla 2.6, passing a value > 1 will result in an error with an actionable message to use ``NEURON_NUM_DEVICES`` to set the number of NeuronCores to use.\n\nSee :ref:`v2.5 migration guide <migrate_to_pytorch_2_5>` for additional changes needed if you are migrating from PyTorch NeuronX 2.1.\n\nMigrating inference scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThere are no code changes required in the inference scripts.\n\n\nTroubleshooting and Known Issues\n--------------------------------\n\nTensor split on second dimension of 2D array not working\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\n\nLower BERT pretraining performance with torch-neuronx 2.6 compared to torch-neuronx 2.5\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is ~10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known regression in the torch-xla library https://github.com/pytorch/xla/issues/9037 and may affect other models with high graph tracing overhead. To work around this issue, build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see :ref:`pytorch-neuronx-install-cxx11` for C++11 ABI version):\n\n.. code:: bash\n\n   # Setup build env (make sure you are in a python virtual env). Replace \"apt\" with \"yum\" on AL2023.\n   sudo apt install cmake\n   pip install yapf==0.30.0\n   wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n   sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n   # Clone repos\n   git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n   cd pytorch/\n   git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n   _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel\n   # The pip wheel will be present in ./dist\n   cd xla/\n   CXX_ABI=0 python setup.py bdist_wheel\n   # The pip wheel will be present in ./dist and can be installed instead of the torch-xla released in pypi.org\n\nLower BERT pretraining performance when switch to using ``model.to(torch.bfloat16)``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. 
As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 and 2.6 although there will be end-of-support warnings (as noted below).\n\n\nWarning \"XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEnvironment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\n\nWARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.world_size instead.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a warning that ``torch_xla.core.xla_model.xrt_world_size()`` will be removed in a future release. Switch to using ``torch_xla.runtime.world_size`` instead.\n\n\nWARNING:torch_xla.core.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a warning that ``torch_xla.core.xla_model.get_ordinal()`` will be removed in a future release. Switch to using ``torch_xla.runtime.global_ordinal`` instead.\n\nWARNING:torch_xla.core.xla_model.get_local_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.local_ordinal instead.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\n.. warning::\n    ``torch_xla.core.xla_model.get_local_ordinal()`` will be removed in a future release. Use ``torch_xla.runtime.local_ordinal`` instead.\n    \n\n\nSocket Error: Socket failed to bind\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.6, there must be a socket available for both torchrun and the ``init_process_group`` to bind. By default, both \nwill be set to use unused sockets. If you plan to use a ``MASTER_PORT`` environment variable then this error may occur if the port you set it to\nis already in use.\n\n.. code:: \n\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:2.600 (errno: 98 - Address already in use).\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n    [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.\n    RuntimeError: The server socket has failed to listen on any local network address. \n    The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n\nTo resolve the issue, ensure you are setting ``MASTER_PORT`` to a port value that is not used anywhere else in your scripts. Otherwise,\nyou can leave ``MASTER_PORT`` unset and torchrun will set the default port for you.\n\n\n``AttributeError: module 'torch' has no attribute 'xla'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.6, training scripts might fail during activation checkpointing with the error shown below.\n\n.. 
code::\n\n    AttributeError: module 'torch' has no attribute 'xla'\n\n\nThe solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing.\nRefer to the pytorch/xla discussion regarding this `issue <https://github.com/pytorch/xla/issues/5766>`_.\nAlso set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to ``XLA currently does not support use_reentrant==False`` error.\nFor more details on checkpointing, refer to the `documentation <https://pytorch.org/docs/stable/checkpoint.html>`_.\n\n\nError ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhile using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error \"Attempted to access the data pointer on an invalid python storage\". This is a known `issue <https://github.com/huggingface/transformers/issues/27778>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.\n\n\n``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\ntorch-xla version 2.6+ now requires the ``libcrypt.so.1`` shared library. Currently, Amazon Linux 2023 includes ``libcrypt.so.2`` shared library by default so you may see ``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` when using torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1`` on Amazon Linux 2023, run the following installation command (see also https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more context):\n\n.. code::\n\n   sudo dnf install libxcrypt-compat\n\n\n``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nIn PyTorch 2.6, users might face the error shown below due to incompatible ``libneuronxla`` and ``torch-neuronx`` versions being installed.\n\n.. code::\n\n    FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'\n\nCheck that the version of ``libneuronxla`` that supports PyTorch NeuronX 2.6 is ``2.2.*``. If not, then uninstall ``libneuronxla`` using ``pip uninstall libneuronxla`` and then reinstall the packages following the :ref:`installation guide <install_pytorch_neuron_2.6>`.\n\n\n``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running Neuron Parallel Compile with HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. 
Data-dependent operations would result in undefined behavior with Neuron Parallel Compile trial execution (execute empty graphs with zero outputs). To work around this error, disable compute_metrics when NEURON_EXTRACT_GRAPHS_ONLY is set to 1:\n\n.. code:: python\n\n   compute_metrics=None if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\") else compute_metrics\n\nCompiler assertion error when running Stable Diffusion training\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith PyTorch 2.6 (torch-neuronx), you may encounter the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you want to run Stable Diffusion training, disable gradient accumulation in torch-neuronx 2.6 by keeping the `default gradient accumulation steps of 1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/stable_diffusion/run.py#L20>`__.\n\n.. code:: bash\n\n    ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception:\n    too many partition dims! {{0,+,960}[10],+,10560}[10]\n\n\nFrequently Asked Questions (FAQ)\n--------------------------------\n\nDo I need to recompile my models with PyTorch 2.6?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes.\n\nDo I need to update my scripts for PyTorch 2.6?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nSee the :ref:`migration guide <migrate_to_pytorch_2.6>`\n\nWhat environment variables will be changed with PyTorch NeuronX 2.6 ?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\nWhat features will be missing with PyTorch NeuronX 2.6?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch NeuronX 2.6 has all of the supported features in PyTorch NeuronX 2.5, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.\n\nCan I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.6?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes, NeuronX Distributed, Transformers NeuronX, and AWS Neuron Reference for NeMo Megatron libraries will work with PyTorch NeuronX 2.6.\n\nCan I still use PyTorch 2.5 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.5 is supported for releases 2.21/2.22/2.23 and will reach end-of-life in a future release. Additionally, the CVE `CVE-2025-32434 <https://github.com/advisories/GHSA-53q9-r3pm-6pq6>`_ affects PyTorch version 2.5. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.\n\nCan I still use PyTorch 2.1 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.1 is supported for release 2.21 and has reached end-of-life in release 2.22. Additionally, the CVEs `CVE-2024-31583 <https://github.com/advisories/GHSA-pg7h-5qx3-wjr3>`_ and `CVE-2024-31580 <https://github.com/advisories/GHSA-5pcm-hx3q-hm94>`_ affect PyTorch versions 2.1 and earlier.  We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-7.rst",
    "content": ".. _introduce-pytorch-2-7:\n\nIntroducing PyTorch 2.7 Support\n===============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the :ref:`Neuron 2.24 <neuron-2-24-0-whatsnew>` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.7.\n\n:ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.7 for Amazon Linux 2023 and Ubuntu 22.04. Note that PyTorch NeuronX 2.7 is supported on Python 3.9, 3.10, and 3.11.\n\nReview :ref:`migration guide <migrate_to_pytorch_2.7>` for possible changes to training scripts. No code changes are required for inference scripts.\n\n\n.. _how-pytorch-2.7-different:\n\nHow is PyTorch NeuronX 2.7 different compared to PyTorch NeuronX 2.5?\n---------------------------------------------------------------------\n\nPyTorch NeuronX 2.7 uses Torch-XLA v2.7 and PyTorch v2.7 which have C++11 ABI enabled by default. \n\nAdditionally, Torch-XLA v2.7 includes a fix for the training performance issue https://github.com/pytorch/xla/issues/9037.\n\nSee `Torch-XLA 2.7 release <https://github.com/pytorch/xla/releases/tag/v2.7.0>`__ for a full list.\n\nSee :ref:`migrate_to_pytorch_2.7` for changes needed to use PyTorch NeuronX 2.7.\n\n.. note::\n\n   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.\n\n.. _install_pytorch_neuron_2.7:\n\nHow can I install PyTorch NeuronX 2.7?\n--------------------------------------------\n\nTo install PyTorch NeuronX 2.7, follow the :ref:`setup-torch-neuronx` guides for Amazon Linux 2023 and Ubuntu 22.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` for Ubuntu 22.04 with a pre-installed virtual environment for PyTorch NeuronX 2.7 that you can use to get started. PyTorch NeuronX 2.7 can be installed using the following:\n\n.. code::\n\n    python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.7.* torchvision\n\n.. note::\n\n   PyTorch NeuronX 2.7 is currently available for Python 3.9, 3.10, 3.11.\n\n.. _migrate_to_pytorch_2.7:\n\nMigrate your application to PyTorch 2.7\n---------------------------------------\n\nFirst, install the PyTorch NeuronX 2.7 as described above in :ref:`installation guide <install_pytorch_neuron_2.7>`\n\n\nMigrating training scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo migrate the training scripts from PyTorch NeuronX 2.5/2.6 to PyTorch NeuronX 2.7, implement the following changes: \n\n.. note::\n\n    ``xm`` below refers to ``torch_xla.core.xla_model``, ``xr`` refers to ``torch_xla.runtime``, and ``xmp`` refers to ``torch_xla.distributed.xla_multiprocessing``\n\n* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used) and will be removed in an upcoming release. Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to convert model to BF16 format. (see :ref:`migration_from_xla_downcast_bf16`)\n* The functions ``xm.xrt_world_size()``, ``xm.get_ordinal()``, and ``xm.get_local_ordinal()`` are deprecated and removed so there are errors when used. Switch to ``xr.world_size()``, ``xr.global_ordinal()``, and ``xr.local_ordinal()`` respectively as replacements.\n* The default behavior of ``torch.load`` parameter ``weights_only`` is changed from ``False`` to ``True``. 
Setting ``weights_only`` to ``True`` may cause issues with pickling custom objects.\n* If using ``xmp.spawn``, the ``nprocs`` argument is limited to 1 or None since v2.1. Previously, passing a value > 1 would result in a warning. In torch-xla 2.6+, passing a value > 1 will result in an error with an actionable message to use ``NEURON_NUM_DEVICES`` to set the number of NeuronCores to use.\n\nSee :ref:`v2.6 migration guide <migrate_to_pytorch_2.6>` for additional changes needed if you are migrating from PyTorch NeuronX 2.5.\nSee :ref:`v2.5 migration guide <migrate_to_pytorch_2_5>` for additional changes needed if you are migrating from PyTorch NeuronX 2.1.\n\nMigrating inference scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThere are no code changes required in the inference scripts.\n\n\nTroubleshooting and Known Issues\n--------------------------------\n\nUsing the latest torch-xla v2.7 may result in increase in host memory usage\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\nTypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdamW now has an additional argument “decoupled_weight_decay” which defaults to False. If you get “TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'” with NeuronX Distributed, update to the latest version.\n\n\nTensor split on second dimension of 2D array not working\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\nLower BERT pretraining performance when switch to using ``model.to(torch.bfloat16)``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 and 2.7 although there will be end-of-support warnings (as noted below).\n\n\nWarning \"XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEnvironment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.xrt_world_size()`` is removed in torch-xla version 2.7. 
Switch to using ``torch_xla.runtime.world_size()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.*.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_ordinal()`` is removed in torch-xla version 2.7. Switch to using ``torch_xla.runtime.global_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.*.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'get_local_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_local_ordinal()`` is removed in torch-xla version 2.7. Switch to using ``torch_xla.runtime.local_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.*.\n\n\nSocket Error: Socket failed to bind\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.7, there must be a socket available for both torchrun and the ``init_process_group`` to bind. By default, both \nwill be set to use unused sockets. If you plan to use a ``MASTER_PORT`` environment variable then this error may occur if the port you set it to\nis already in use.\n\n.. code:: \n\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:2.700 (errno: 98 - Address already in use).\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n    [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.\n    RuntimeError: The server socket has failed to listen on any local network address. \n    The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n\nTo resolve the issue, if you are setting ``MASTER_PORT``, ensure that the port you're setting it to is not used anywhere else in your scripts. Otherwise,\nyou can leave ``MASTER_PORT`` unset and torchrun will set the default port for you.\n\n\n``AttributeError: module 'torch' has no attribute 'xla'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.7, training scripts might fail during activation checkpointing with the error shown below.\n\n.. code::\n\n    AttributeError: module 'torch' has no attribute 'xla'\n\n\nThe solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing.\nRefer to the pytorch/xla discussion regarding this `issue <https://github.com/pytorch/xla/issues/5766>`_.\nAlso set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to ``XLA currently does not support use_reentrant==False`` error.\nFor more details on checkpointing, refer the `documentation <https://pytorch.org/docs/stable/checkpoint.html>`_.\n\n\nError ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhile using HuggingFace Transformers Trainer API to train (i.e. 
:ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error \"Attempted to access the data pointer on an invalid python storage\". This is a known `issue <https://github.com/huggingface/transformers/issues/27778>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.\n\n\n``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\ntorch-xla version 2.5+ now requires the ``libcrypt.so.1`` shared library. Currently, Amazon Linux 2023 includes ``libcrypt.so.2`` shared library by default so you may see `ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` when using torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1`` on Amazon Linux 2023, run the following installation command (see also https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more context):\n\n.. code::\n\n   sudo dnf install libxcrypt-compat\n\n\n``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nIn PyTorch 2.7, users might face the error shown below due to incompatible ``libneuronxla`` and ``torch-neuronx`` versions being installed.\n\n.. code::\n\n    FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'\n\nCheck that the version of ``libneuronxla`` that supports PyTorch NeuronX 2.7 is ``2.2.*``. If not, then uninstall ``libneuronxla`` using ``pip uninstall libneuronxla`` and then reinstall the packages following the installation guide :ref:`installation guide <install_pytorch_neuron_2.7>`\n\n\n``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running Neuron Parallel Compile with HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations would result in undefined behavior with Neuron Parallel Compile trial execution (execute empty graphs with zero outputs). To work around this error, disable compute_metrics when NEURON_EXTRACT_GRAPHS_ONLY is set to 1:\n\n.. code:: python\n\n   compute_metrics=None if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\") else compute_metrics\n\nCompiler assertion error when running Stable Diffusion training\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith PyTorch 2.7 (torch-neuronx), you may encounter the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you want to run Stable Diffusion training, disable gradient accumulation in torch-neuronx 2.7 by keeping the `default gradient accumulation steps of 1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/stable_diffusion/run.py#L20>`__.\n\n.. 
code:: bash\n\n    ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception:\n    too many partition dims! {{0,+,960}[10],+,10560}[10]\n\n\nFrequently Asked Questions (FAQ)\n--------------------------------\n\nDo I need to recompile my models with PyTorch 2.7?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes.\n\nDo I need to update my scripts for PyTorch 2.7?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nSee the :ref:`migration guide <migrate_to_pytorch_2.7>`\n\nWhat environment variables will be changed with PyTorch NeuronX 2.7 ?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\nWhat features will be missing with PyTorch NeuronX 2.7?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch NeuronX 2.7 has all of the supported features in PyTorch NeuronX 2.6, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.\n\nCan I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.7?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes, NeuronX Distributed and Transformers NeuronX are supported by PyTorch NeuronX 2.7.  AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.\n\nCan I still use PyTorch 2.6 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.6 is supported since release 2.23.\n\nCan I still use PyTorch 2.5 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.5 is supported for releases 2.21 to 2.24 and will reach end-of-life in a future release. Additionally, the CVE `CVE-2025-32434 <https://github.com/advisories/GHSA-53q9-r3pm-6pq6>`_ affects PyTorch version 2.5. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.\n\nCan I still use PyTorch 2.1 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.1 is supported for release 2.21 and has reached end-of-life in release 2.22. Additionally, the CVEs `CVE-2024-31583 <https://github.com/advisories/GHSA-pg7h-5qx3-wjr3>`_ and `CVE-2024-31580 <https://github.com/advisories/GHSA-5pcm-hx3q-hm94>`_ affect PyTorch versions 2.1 and earlier.  We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-8.rst",
    "content": ".. _introduce-pytorch-2-8:\n\nIntroducing PyTorch 2.8 Support\n===============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the :ref:`Neuron 2.26 <neuron-2-26-0-whatsnew>` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.8.\n\n:ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.8 for Ubuntu 22.04. Note that PyTorch NeuronX 2.8 is supported on Python 3.10 and 3.11, with 3.12+ support coming in a future release.\n\nReview :ref:`migration guide <migrate_to_pytorch_2.8>` for possible changes to training scripts. No code changes are required for inference scripts.\n\n\n.. _how-pytorch-2.8-different:\n\nHow is PyTorch NeuronX 2.8 different compared to PyTorch NeuronX 2.7?\n---------------------------------------------------------------------\n\nSee `Torch-XLA 2.8 release <https://github.com/pytorch/xla/releases/tag/v2.8.0>`__ for a full list of changes.\n\nSee :ref:`migrate_to_pytorch_2.8` for changes needed to use PyTorch NeuronX 2.8.\n\n.. note::\n\n   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.\n\n.. _install_pytorch_neuron_2.8:\n\nHow can I install PyTorch NeuronX 2.8?\n--------------------------------------------\n\nTo install PyTorch NeuronX 2.8, follow the :ref:`setup-torch-neuronx` guides for Ubuntu 22.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` for Ubuntu 22.04 with a pre-installed virtual environment for PyTorch NeuronX 2.8 that you can use to get started. PyTorch NeuronX 2.8 can be installed using the following:\n\n.. code::\n\n    python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.8.* torchvision\n\n.. note::\n\n   PyTorch NeuronX 2.8 is currently available for Python 3.10 and 3.11, with 3.12+ support coming in a future release.\n\n.. note::\n\n   To use Amazon Linux 2023, you will need to install Python 3.10 or 3.11 to use PyTorch NeuronX 2.8.\n\n.. _migrate_to_pytorch_2.8:\n\nMigrate your application to PyTorch 2.8\n---------------------------------------\n\nFirst, install the PyTorch NeuronX 2.8 as described above in :ref:`installation guide <install_pytorch_neuron_2.8>`\n\n\nMigrating training scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThere are no code changes required in the training scripts to move from PyTorch NeuronX 2.7 to PyTorch NeuronX 2.8.\n\nSee :ref:`v2.7 migration guide <migrate_to_pytorch_2.7>` for additional changes needed if you are migrating from PyTorch NeuronX 2.6.\nSee :ref:`v2.6 migration guide <migrate_to_pytorch_2.6>` for additional changes needed if you are migrating from PyTorch NeuronX 2.5.\n\nMigrating inference scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThere are no code changes required in the inference scripts.\n\n\nTroubleshooting and Known Issues\n--------------------------------\n\n[v2.8] Lower BERT/LLaMA performance with torch-xla 2.8.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing the publicly released version of torch-xla 2.8.0 from public PyPI repositories would result in lower performance for models like BERT and LLaMA (https://github.com/pytorch/xla/issues/9605). 
To fix this, switch to using the updated torch-xla version 2.8.1 from public PyPI repositories.\n\nUsing the latest torch-xla 2.7/2.8 may result in increase in host memory usage\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing torch-xla 2.7/2.8 may result in an increase in host memory usage compared to torch-xla 2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\nTypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdamW now has an additional argument ``decoupled_weight_decay`` which defaults to False. If you get ``TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'`` with NeuronX Distributed, update to the latest version.\n\n\nTensor split on second dimension of 2D array not working\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\nLower BERT pretraining performance when switch to using ``model.to(torch.bfloat16)``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 to 2.8 although there will be end-of-support warnings (as noted below).\n\n\nDeprecationWarning: Use torch_xla.device instead\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a end-of-support warning when using ``torch_xla.core.xla_model.xla_device()``. Switch to ``torch_xla.device()`` instead.\n\nDeprecationWarning: Use torch_xla.sync instead\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a end-of-support warning when using ``torch_xla.core.xla_model.mark_step()``. Switch to ``torch_xla.sync()`` instead.\n\nWarning \"XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEnvironment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.xrt_world_size()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.world_size()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... 
does not have the attribute 'get_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_ordinal()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.global_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'get_local_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_local_ordinal()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.local_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\n\nSocket Error: Socket failed to bind\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.1+ including 2.8, there must be a socket available for both torchrun and the ``init_process_group`` to bind. By default, both \nwill be set to use unused sockets. If you plan to use a ``MASTER_PORT`` environment variable then this error may occur if the port you set it to\nis already in use.\n\n.. code:: \n\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:2.700 (errno: 98 - Address already in use).\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n    [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.\n    RuntimeError: The server socket has failed to listen on any local network address. \n    The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n\nTo resolve the issue, if you are setting ``MASTER_PORT``, ensure that the port you're setting it to is not used anywhere else in your scripts. Otherwise,\nyou can leave ``MASTER_PORT`` unset and torchrun will set the default port for you.\n\n\n``AttributeError: module 'torch' has no attribute 'xla'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.8, training scripts might fail during activation checkpointing with the error shown below.\n\n.. code::\n\n    AttributeError: module 'torch' has no attribute 'xla'\n\n\nThe solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing.\nRefer to the pytorch/xla discussion regarding this `issue <https://github.com/pytorch/xla/issues/5766>`_.\nAlso set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to ``XLA currently does not support use_reentrant==False`` error.\nFor more details on checkpointing, refer the `documentation <https://pytorch.org/docs/stable/checkpoint.html>`_.\n\n\nError ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhile using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error \"Attempted to access the data pointer on an invalid python storage\". 
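\n\nIf you are pinned to an older ``transformers`` release, upgrading past the fixed version noted below, for example with ``python -m pip install --upgrade 'transformers>=4.37.3'`` (illustrative; adjust to your environment), is one way to resolve it.\n\n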
This is a known `issue <https://github.com/huggingface/transformers/issues/27778>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.\n\n``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running Neuron Parallel Compile with HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations would result in undefined behavior with Neuron Parallel Compile trial execution (execute empty graphs with zero outputs). To work around this error, disable compute_metrics when NEURON_EXTRACT_GRAPHS_ONLY is set to 1:\n\n.. code:: python\n\n   compute_metrics=None if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\") else compute_metrics\n\nCompiler assertion error when running Stable Diffusion training\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith PyTorch 2.8 (torch-neuronx), you may encounter the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you want to run Stable Diffusion training, disable gradient accumulation in torch-neuronx 2.8 by keeping the `default gradient accumulation steps of 1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/stable_diffusion/run.py#L20>`__.\n\n.. code:: bash\n\n    ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception:\n    too many partition dims! {{0,+,960}[10],+,10560}[10]\n\n\nFrequently Asked Questions (FAQ)\n--------------------------------\n\nDo I need to recompile my models with PyTorch 2.8?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes.\n\nDo I need to update my scripts for PyTorch 2.8?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nSee the :ref:`migration guide <migrate_to_pytorch_2.8>`\n\nWhat environment variables will be changed with PyTorch NeuronX 2.8 ?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\nWhat features will be missing with PyTorch NeuronX 2.8?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch NeuronX 2.8 has all of the supported features in PyTorch NeuronX 2.7, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.\n\nCan I use Neuron Distributed libraries with PyTorch NeuronX 2.8?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes, NeuronX Distributed libraries are supported by PyTorch NeuronX 2.8. Transformers NeuronX has reached end-of-support in release 2.26. 
AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.\n\nCan I still use PyTorch 2.7 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.7 is supported since release 2.24.\n\nCan I still use PyTorch 2.6 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.6 is supported since release 2.23.\n\nCan I still use PyTorch 2.5 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.5 reached end-of-support in release 2.25.\n\nCan I still use Amazon Linux 2023?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes. You will need to install Python 3.10 or 3.11 to use PyTorch NeuronX 2.8.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-9.rst",
    "content": ".. _introduce-pytorch-2-9:\n\nIntroducing PyTorch 2.9 Support\n===============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the :ref:`Neuron 2.27 <neuron-2-27-0-whatsnew>` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.9.\n\nPyTorch NeuronX 2.9 adds support for AWS Trainium 3 (Trn3) instances, in addition to existing support for Trainium (Trn2/Trn1/Trn1n) and Inferentia (Inf2) instances.\n\n:ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.9 for Ubuntu 24.04. Note that PyTorch NeuronX 2.9 is supported on Python 3.10, 3.11 and 3.12.\n\nReview :ref:`migration guide <migrate_to_pytorch_2.9>` for possible changes to training scripts. No code changes are required for inference scripts.\n\n\n.. _how-pytorch-2.9-different:\n\nHow is PyTorch NeuronX 2.9 different compared to PyTorch NeuronX 2.8?\n---------------------------------------------------------------------\n\nSee `Torch-XLA 2.9 release <https://github.com/pytorch/xla/releases/tag/v2.9.0>`__ for a full list of changes.\n\nSee :ref:`migrate_to_pytorch_2.9` for changes needed to use PyTorch NeuronX 2.9.\n\n.. note::\n\n   Torch Dynamo (torch.compile) support in Neuron will be available in a future release.\n\n.. _install_pytorch_neuron_2.9:\n\nHow can I install PyTorch NeuronX 2.9?\n--------------------------------------------\n\nTo install PyTorch NeuronX 2.9, follow the :ref:`setup-torch-neuronx` guides for Ubuntu 24.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` for Ubuntu 24.04 with a pre-installed virtual environment for PyTorch NeuronX 2.9 that you can use to get started. PyTorch NeuronX 2.9 can be installed using the following:\n\n.. code::\n\n    python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.9.* torchvision\n\n.. note::\n\n   PyTorch NeuronX 2.9 is currently available for Python 3.10, 3.11 and 3.12.\n\n.. note::\n\n   To use Amazon Linux 2023, you will need to install Python 3.10, 3.11 or 3.12 to use PyTorch NeuronX 2.9. See `Amazon Linux 2023 Python documentation <https://docs.aws.amazon.com/linux/al2023/ug/python.html>`_ for installation instructions.\n\n.. _migrate_to_pytorch_2.9:\n\nMigrate your application to PyTorch 2.9\n---------------------------------------\n\nFirst, install the PyTorch NeuronX 2.9 as described above in :ref:`installation guide <install_pytorch_neuron_2.9>`\n\n\nMigrating training scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThere are no code changes required in the training scripts to move from PyTorch NeuronX 2.8 to PyTorch NeuronX 2.9.\n\nSee :ref:`v2.8 migration guide <migrate_to_pytorch_2.8>` for additional changes needed if you are migrating from PyTorch NeuronX 2.7.\nSee :ref:`v2.7 migration guide <migrate_to_pytorch_2.7>` for additional changes needed if you are migrating from PyTorch NeuronX 2.6.\n\nMigrating inference scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThere are no code changes required in the inference scripts.\n\n\nTroubleshooting and Known Issues\n--------------------------------\n\nGLIBC compatibility issue on Amazon Linux 2023\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running PyTorch NeuronX 2.9 on Amazon Linux 2023, you may encounter the following error:\n\n.. 
code::\n\n    ImportError: /usr/lib64/libm.so.6: version `GLIBC_2.35' not found (required by /opt/conda/lib/python3.12/site-packages/_XLAC.cpython-312-x86_64-linux-gnu.so)\n\nThis occurs because the PyTorch NeuronX 2.9 binaries require GLIBC 2.35, but Amazon Linux 2023 ships with an older version of GLIBC. Use Ubuntu 24.04 AMI instead, which has the required GLIBC version. Follow the :ref:`setup-torch-neuronx` installation guide for Ubuntu 24.04.\n\nUsing the latest torch-xla 2.7/2.8/2.9 may result in increase in host memory usage\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing the latest torch-xla v2.7/2.8/2.9 may result in an increase in host memory usage compared to torch-xla v2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\nTypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdamW now has an additional argument ``decoupled_weight_decay`` which defaults to False. If you get ``TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'`` with NeuronX Distributed, update to the latest version.\n\n\nTensor split on second dimension of 2D array not working\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\nLower BERT pretraining performance when switch to using ``model.to(torch.bfloat16)``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 through 2.9 although there will be end-of-support warnings (as noted below).\n\n\nDeprecationWarning: Use torch_xla.device instead\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a end-of-support warning when using ``torch_xla.core.xla_model.xla_device()``. Switch to ``torch_xla.device()`` instead.\n\nDeprecationWarning: Use torch_xla.sync instead\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a end-of-support warning when using ``torch_xla.core.xla_model.mark_step()``. Switch to ``torch_xla.sync()`` instead.\n\nWarning \"XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEnvironment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\n\nAttributeError: <module 'torch_xla.core.xla_model' ... 
does not have the attribute 'xrt_world_size'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.xrt_world_size()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.world_size()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_ordinal()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.global_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\nAttributeError: <module 'torch_xla.core.xla_model' ... does not have the attribute 'get_local_ordinal'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is an error that ``torch_xla.core.xla_model.get_local_ordinal()`` was removed since torch-xla version 2.7+. Switch to using ``torch_xla.runtime.local_ordinal()`` instead. If using Hugging Face transformers/accelerate libraries, use transformers==4.53.* and accelerate==1.7.* or newer.\n\n\nSocket Error: Socket failed to bind\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.1+ including 2.9, there must be a socket available for both torchrun and the ``init_process_group`` to bind. By default, both \nwill be set to use unused sockets. If you plan to use a ``MASTER_PORT`` environment variable then this error may occur if the port you set it to\nis already in use.\n\n.. code:: \n\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:2.700 (errno: 98 - Address already in use).\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n    [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.\n    RuntimeError: The server socket has failed to listen on any local network address. \n    The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n\nTo resolve the issue, if you are setting ``MASTER_PORT``, ensure that the port you're setting it to is not used anywhere else in your scripts. Otherwise,\nyou can leave ``MASTER_PORT`` unset and torchrun will set the default port for you.\n\n\n``AttributeError: module 'torch' has no attribute 'xla'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.9, training scripts might fail during activation checkpointing with the error shown below.\n\n.. code::\n\n    AttributeError: module 'torch' has no attribute 'xla'\n\n\nThe solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing.\nRefer to the pytorch/xla discussion regarding this `issue <https://github.com/pytorch/xla/issues/5766>`_.\nAlso set ``use_reentrant=True`` while calling the torch_xla checkpoint function. 
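\n\nA minimal sketch of this pattern is shown below (``block`` and ``hidden`` are illustrative placeholders, not names from the tutorials):\n\n.. code:: python\n\n   import torch\n   import torch_xla\n   from torch_xla.utils.checkpoint import checkpoint as xla_checkpoint\n\n   device = torch_xla.device()\n   block = torch.nn.Linear(16, 16).to(device)                       # stand-in for a transformer block\n   hidden = torch.randn(4, 16, device=device, requires_grad=True)   # stand-in for activations\n   # Recompute activations during backward instead of storing them\n   out = xla_checkpoint(block, hidden, use_reentrant=True)\n\n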
Failure to do so will lead to ``XLA currently does not support use_reentrant==False`` error.\nFor more details on checkpointing, refer the `documentation <https://pytorch.org/docs/stable/checkpoint.html>`_.\n\n\nError ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhile using HuggingFace Transformers Trainer API to train (i.e. :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error \"Attempted to access the data pointer on an invalid python storage\". This is a known `issue <https://github.com/huggingface/transformers/issues/27778>`_ and has been fixed in the version ``4.37.3`` of HuggingFace Transformers.\n\n``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running Neuron Parallel Compile with HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations would result in undefined behavior with Neuron Parallel Compile trial execution (execute empty graphs with zero outputs). To work around this error, disable compute_metrics when NEURON_EXTRACT_GRAPHS_ONLY is set to 1:\n\n.. code:: python\n\n   compute_metrics=None if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\") else compute_metrics\n\n\nFrequently Asked Questions (FAQ)\n--------------------------------\n\nDo I need to recompile my models with PyTorch 2.9?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes.\n\nDo I need to update my scripts for PyTorch 2.9?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nSee the :ref:`migration guide <migrate_to_pytorch_2.9>`\n\nWhat environment variables will be changed with PyTorch NeuronX 2.9 ?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\nWhat features will be missing with PyTorch NeuronX 2.9?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch NeuronX 2.9 has all of the supported features in PyTorch NeuronX 2.8, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.\n\nCan I use Neuron Distributed libraries with PyTorch NeuronX 2.9?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes, NeuronX Distributed libraries are supported by PyTorch NeuronX 2.9. Transformers NeuronX has reached end-of-support in release 2.26. AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.\n\nCan I still use PyTorch 2.8 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.8 is supported since release 2.26.\n\nCan I still use PyTorch 2.7 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.7 is supported since release 2.24.\n\n.. 
note::\n\n   PyTorch NeuronX 2.7 supports Python 3.10, and 3.11. Python 3.12 is not supported for PyTorch 2.7 and earlier versions.\n\nCan I still use PyTorch 2.6 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.6 has reached end-of-support since release 2.27.\n\nCan I still use Amazon Linux 2023?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes. You will need to install Python 3.10, 3.11 or 3.12 to use PyTorch NeuronX 2.9.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-x.rst",
    "content": ".. _introduce-pytorch-2-5:\n\nIntroducing PyTorch 2.5 Support\n===============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat are we introducing?\n------------------------\n\nStarting with the :ref:`Neuron 2.21 <neuron-2.21.0-whatsnew>` release, customers will be able to upgrade to ``PyTorch NeuronX(torch-neuronx)`` supporting ``PyTorch 2.5``.\n\n:ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.5 for Amazon Linux 2023 and Ubuntu 22. Note that PyTorch NeuronX 2.5 does not support Python 3.8 which is default in Ubuntu 20. To use Ubuntu 20, customers will need to install Python 3.9+.\n\nPlease review :ref:`migration guide <migrate_to_pytorch_2_5>` for possible changes to training scripts. No code changes are required for inference scripts.\n\n\n.. _how-pytorch-2-5-different:\n\nHow is PyTorch NeuronX 2.5 different compared to PyTorch NeuronX 2.1?\n---------------------------------------------------------------------\n\nPyTorch NeuronX 2.5 uses Torch-XLA 2.5 which has improved support for eager debug mode, Automatic Mixed Precission, PJRT device auto-detection, FP8, and others. See `Torch-XLA 2.5 release <https://github.com/pytorch/xla/releases/tag/v2.5.0>`__ for a full list.\n\nSee :ref:`migrate_to_pytorch_2_5` for changes needed to use PyTorch NeuronX 2.5.\n\n.. note::\n\n   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.\n\n.. _install_pytorch_neuron_2_5:\n\nHow can I install PyTorch NeuronX 2.5?\n--------------------------------------------\n\nTo install PyTorch NeuronX 2.5 please follow the :ref:`setup-torch-neuronx` guides for Amazon Linux 2023 and Ubuntu 22 AMI. Please also refer to the Neuron multi-framework DLAMI :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` for Ubuntu 22 with a pre-installed virtual environment for PyTorch NeuronX 2.5 that you can use to get started. PyTorch NeuronX 2.5 can be installed using the following:\n\n.. code::\n\n    python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.5.* torchvision\n\n.. note::\n\n   PyTorch NeuronX 2.5 is currently available for Python 3.9, 3.10, 3.11.\n\n.. _migrate_to_pytorch_2_5:\n\nMigrate your application to PyTorch 2.5\n---------------------------------------\n\nPlease make sure you have first installed the PyTorch NeuronX 2.5 as described above in :ref:`installation guide <install_pytorch_neuron_2_5>`\n\n\nMigrating training scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo migrate the training scripts from PyTorch NeuronX 2.1 to PyTorch NeuronX 2.5, implement the following changes: \n\n.. note::\n\n    ``xm`` below refers to ``torch_xla.core.xla_model`` and ``xr`` refers to ``torch_xla.runtime``\n\n* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to convert model to BF16 format. (see :ref:`migration_from_xla_downcast_bf16`)\n* The ``torch_xla.experimental.pjrt`` module which was replaced by ``torch_xla.runtime`` in Torch-XLA 2.1, has been removed in Torch-XLA 2.5. Users should now utilize the ``torch_xla.runtime`` module as a replacement.\n* ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime.\n* ``xm.all_reduce`` no longer operates in-place for single tensors. To fix this, please convert the single tensor to an array (e.g.. 
``[single_tensor]``) or assign the output of ``xm.all_reduce`` to a variable.\n* The functions ``xm.xrt_world_size()``, ``xm.xla_model.get_ordinal()``, and ``xm.xla_model.get_local_ordinal()`` are deprecated (warning when used). Please switch to ``xr.world_size``, ``xr.global_ordinal``, and ``xr.local_ordinal`` respectively as replacements.\n* ``torch_xla.experimental.xla_sharding`` is now replaced by ``torch_xla.distributed.spmd.xla_sharding``.\n* Class ``ZeroRedundancyOptimizer`` now has two new arguments that replaces the optional boolean argument ``coalesce_cc``:\n    * ``bucket_cap_mb_all_gather`` (int, Optional): Number of MegaBytes of the tensor bucket to fill before doing all-gather. Default: 0 (disable  all gather coalescing).\n    * ``bucket_cap_mb_reduce_scatter`` (int, Optional): Number of MegaBytes of the tensor bucket to fill before doing reduce-scatter. Default: 0 (disable reduce scatter coalescing).\n\nMigrating inference scripts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThere are no code changes required in the inference scripts.\n\n\nTroubleshooting and Known Issues\n--------------------------------\n\nNeuronx-Distributed Training Llama 3.1 70B 8-node tutorial failed with OSError when the Neuron Cache is placed on FSx mount\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nCurrently, the Neuronx-Distributed Training Llama 3.1 70B 8-node tutorial failed with OSError (Errno 61) when the Neuron Cache is placed on FSx mount:\n\n.. code:: bash\n\n    [rank197]: RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: RunNeuronCCImpl: error condition !(error != 400): <class 'OSError'>: [Errno 61] No data available: '/fsxl/neuron_cache/neuronxcc-2.16.372.0+4a9b2326/MODULE_3540044791706521849+4eb52b03/model.neff' -> '/tmp/tmpx7bvfpmm/model.neff'\n\nWe found that the error is due to FSx failing during file copy when there are multiple readers (13 workers fail to copy out of 256). This issue doesn’t affect simpler models like BERT.\n\nTo work-around the issue, please use the shared NFS mount (/home directory on a Parallel Cluster) instead of FSx to store Neuron Cache. This will be fixed in an upcoming release.\n\nRunning in-place update operations (e.g. all_reduce) on 0-dimensional tensors result in buffer aliasing errors in torch 2.5 and earlier\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTorch's lazy tensor core has a feature where 0-dimensional tensors are stored in a device cache, so scalar constant values can be transferred once and then reused. The values in the device cache are supposed to be marked read-only and never participate in parameter aliasing. 
However, due to a bug in torch-xla 2.5 (`#8499 <https://github.com/pytorch/xla/issues/8499>`_), sometimes the read-only flag can be dropped, allowing these tensors to be donated, resulting in aliasing errors later when the cached value is used again.\n\nA work-around is to avoid using 0-dimensional tensors by changing them to 1-d tensors of length 1 (`example <https://github.com/aws-neuron/neuronx-nemo-megatron/pull/36/commits/0b2354666508ac75cb6150083211fa6823864ebe>`_).\nIf modifying library code is not possible, disable XLA parameter aliasing by setting the environment variable ``XLA_ENABLE_PARAM_ALIASING=0``.\n\nTensor split on second dimension of 2D array not working\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors don't have the expected data (https://github.com/pytorch/xla/issues/8640). The work-around is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another work-around is to use ``torch.tensor_split``.\n\nImport torch_xla crashed with ``TypeError: must be called with a dataclass type or instance`` with torch-xla 2.5 and torch 2.5.1+cpu (CPU flavor)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen using torch 2.5.1+cpu (CPU flavor) on python 3.10, importing torch_xla crashes with ``TypeError: must be called with a dataclass type or instance`` due to the installed triton version 3.2.0 (https://github.com/pytorch/xla/issues/8560). To work around the issue, please remove the installed triton package, downgrade to triton==3.1.0, or use the regular torch 2.5.1 (GPU flavor).\n\nCertain sequence of operations with ``xm.save()`` could corrupt tensors\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen using the ``xm.save`` function to save tensors, please use ``xm.mark_step()`` before ``xm.save`` to avoid the error described in https://github.com/pytorch/xla/issues/8422 where parameter aliasing could corrupt other tensor values. This issue will be fixed in a future release.\n\n(Here ``xm`` is ``torch_xla.core.xla_model`` following PyTorch/XLA convention)\n\nLower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, BERT pretraining performance is ~11% lower when switching to ``model.to(torch.bfloat16)`` as part of the migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a work-around to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which still works in torch-neuronx 2.5 and 2.6 although there will be end-of-support warnings (as noted below).\n\nWarning \"XLA_DOWNCAST_BF16 will be deprecated after the 2.5 release, please downcast your model directly\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEnvironment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\n\nWARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. 
Use torch_xla.runtime.world_size instead.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a warning that ``torch_xla.core.xla_model.xrt_world_size()`` will be removed in a future release. Please switch to using ``torch_xla.runtime.world_size`` instead.\n\n\nWARNING:torch_xla.core.xla_model.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis is a warning that ``torch_xla.core.xla_model.xla_model.get_ordinal()`` will be removed in a future release. Please switch to using ``torch_xla.runtime.global_ordinal`` instead.\n\n\nAttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn Torch-XLA 2.5, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime.\nSee `commit PR <https://github.com/pytorch/xla/commit/d6fb5391d09578c8804b1331a5e7a4f72bf981db>`__.\n\n\nSocket Error: Socket failed to bind\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.5, there needs to be a socket available for both torchrun and ``init_process_group`` to bind to. Both of these, by default,\nwill use unused sockets. If you set the ``MASTER_PORT`` environment variable, this error may occur if the port you set is\nalready in use.\n\n.. code::\n\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use).\n    [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n    [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.\n    RuntimeError: The server socket has failed to listen on any local network address. \n    The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).\n\nTo resolve the issue, if you are setting ``MASTER_PORT``, please ensure that the port you set is not used anywhere else in your scripts. Otherwise,\nyou can leave ``MASTER_PORT`` unset, and torchrun will set the default port for you.\n\n\n``AttributeError: module 'torch' has no attribute 'xla'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn PyTorch 2.5, training scripts might fail during activation checkpointing with the error shown below.\n\n.. code::\n\n    AttributeError: module 'torch' has no attribute 'xla'\n\n\nThe solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping PyTorch modules for activation checkpointing.\nRefer to the pytorch/xla discussion regarding this `issue <https://github.com/pytorch/xla/issues/5766>`_.\nAlso set ``use_reentrant=True`` while calling the torch_xla checkpoint function. 
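\n\nAs an illustration only, a minimal sketch of this pattern (the module and variable names below are hypothetical and not taken from a specific tutorial) could look like the following:\n\n.. code:: python\n\n   import torch\n   from torch_xla.utils.checkpoint import checkpoint\n\n   class CheckpointedBlock(torch.nn.Module):\n       def __init__(self, layer):\n           super().__init__()\n           self.layer = layer\n\n       def forward(self, hidden_states):\n           # Call the torch_xla checkpoint helper (not torch.utils.checkpoint.checkpoint)\n           # and keep use_reentrant=True, as required on XLA.\n           return checkpoint(self.layer, hidden_states, use_reentrant=True)\n\n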
Omitting ``use_reentrant=True`` will lead to an ``XLA currently does not support use_reentrant==False`` error.\nFor more details on checkpointing, refer to the `documentation <https://pytorch.org/docs/stable/checkpoint.html>`_.\n\n\nError ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhile using the HuggingFace Transformers Trainer API to train (for example, the :ref:`HuggingFace Trainer API fine-tuning tutorial<torch-hf-bert-finetune>`), you may see the error \"Attempted to access the data pointer on an invalid python storage\". This is a known `issue <https://github.com/huggingface/transformers/issues/27578>`_ and has been fixed in version ``4.37.3`` of HuggingFace Transformers.\n\n``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\ntorch-xla version 2.5+ now requires the ``libcrypt.so.1`` shared library. Currently, Amazon Linux 2023 includes the ``libcrypt.so.2`` shared library by default, so you may see ``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` when using torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1`` on Amazon Linux 2023, please run the following installation command (see also https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more context):\n\n.. code::\n\n   sudo dnf install libxcrypt-compat\n\n\n``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nIn PyTorch 2.5, users might face the error shown below due to incompatible ``libneuronxla`` and ``torch-neuronx`` versions being installed.\n\n.. code::\n\n    FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'\n\nCheck that the version of ``libneuronxla`` that supports PyTorch NeuronX 2.5 is ``2.1.*``. If not, then uninstall ``libneuronxla`` using ``pip uninstall libneuronxla`` and then reinstall the packages following the :ref:`installation guide <install_pytorch_neuron_2_5>`.\n\n\nGlibC error on Amazon Linux 2\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nIf using Torch-NeuronX 2.5 on Amazon Linux 2, you will see the GlibC error below. Please switch to a newer supported OS such as Ubuntu 22 or Amazon Linux 2023.\n\n.. code:: bash\n\n   ImportError: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /tmp/debug/_XLAC.cpython-38-x86_64-linux-gnu.so)\n\n``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running Neuron Parallel Compile with the HF Trainer API, you may see the error ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to a data-dependent operation in evaluation metrics computation. Data-dependent operations result in undefined behavior with Neuron Parallel Compile trial execution (which executes empty graphs with zero outputs). 
To work around this error, please disable ``compute_metrics`` when ``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:\n\n.. code:: python\n\n   compute_metrics=None if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\") else compute_metrics\n\nCompiler assertion error when running Stable Diffusion training\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, with PyTorch 2.5 (torch-neuronx), we are seeing the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you would like to run Stable Diffusion training with Neuron SDK release 2.21/2.22, please disable gradient accumulation in torch-neuronx 2.5.\n\n.. code:: bash\n\n    ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception:\n    too many partition dims! {{0,+,960}[10],+,10560}[10]\n\n\nFrequently Asked Questions (FAQ)\n-----------------------------------\n\nDo I need to recompile my models with PyTorch 2.5?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes.\n\nDo I need to update my scripts for PyTorch 2.5?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPlease see the :ref:`migration guide <migrate_to_pytorch_2_5>`.\n\nWhat environment variables will be changed with PyTorch NeuronX 2.5?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)\n\nWhat features will be missing with PyTorch NeuronX 2.5?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch NeuronX 2.5 now has most of the features supported in PyTorch NeuronX 2.1, with known issues listed above and unsupported features listed in :ref:`pytorch_rn`.\n\nCan I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.5?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nYes, the NeuronX Distributed, Transformers NeuronX, and AWS Neuron Reference for NeMo Megatron libraries will work with PyTorch NeuronX 2.5.\n\nCan I still use PyTorch 2.1 version?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nPyTorch 2.1 is supported for release 2.21 and will reach end-of-life in a future release. Additionally, the CVEs `CVE-2024-31583 <https://github.com/advisories/GHSA-pg7h-5qx3-wjr3>`_ and `CVE-2024-31580 <https://github.com/advisories/GHSA-5pcm-hx3q-hm94>`_ affect PyTorch versions 2.1 and earlier. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/migration-from-xla-downcast-bf16.rst",
    "content": ".. _migration_from_xla_downcast_bf16:\n\nMigration From ``XLA_USE_BF16``/``XLA_DOWNCAST_BF16``\n=====================================================\n\nIntroduction\n------------\n\nThe environmental variables ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16`` were created to provide an easy cast-to-bf16 option before automatic mixed-precision or ``model.to(torch.bfloat16)`` as available in Torch-XLA. Now that both automatic mixed precision and ``model.to(torch.bfloat16)`` are available in Torch-XLA,  ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16`` are redundant and can be replaced with these options as a more familiar experience as on other platforms such as CPUs and GPUs. Using them in Torch-XLA 2.5+ would cause warnings to be displayed about their end-of-support. While they are still functional, their functionality will be removed in a future release (Torch-XLA 2.8) so the recommended changes below are available as replacement.\n\nNeuronX Distributed Training has been updated to use some of the options below. Please see :ref:`standard_mixed_precision` for more information.\n\nThe changes recommended below can best be made to scripts running with Torch-XLA 2.5+. The same recommendations are also available in :ref:`pytorch-neuronx-programming-guide`.\n\n.. note::\n\n    This guide recommends the options below as replacement for ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16``. Do not set ``XLA_USE_BF16=1`` or ``XLA_DOWNCAST_BF16=1`` when using the options below on Neuron devices. Using them will override the per-operator precision settings provided by the options and thus cause more operators to execute in bfloat16.\n\nFull BF16 with stochastic rounding enabled\n------------------------------------------\n\nPreviously, on torch-neuronx 2.1 and earlier, the environmental variables ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` provided full casting to BF16 with stochastic rounding enabled by default. These environmental variables are deprecated in torch-neuronx 2.5, although still functional with warnings. To replace ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` with stochastic rounding on Neuron, set ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1`` and use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type BF16 as follows:\n\n.. code:: python\n\n    os.environ[\"NEURON_RT_STOCHASTIC_ROUNDING_EN\"] = \"1\"\n\n    # model is created\n    model.to(torch.bfloat16)\n\nStochastic rounding is needed to enable faster convergence for full BF16 model.\n\nIf the loss is to be kept in FP32, initialize it with ``dtype=torch.float`` as follows:\n\n.. code:: python\n\n    running_loss = torch.zeros(1, dtype=torch.float).to(device)\n\nSimilarly, if the optimizer states are to be kept in FP32, convert the gradients to FP32 before optimizer computations:\n\n.. code:: python\n\n    grad = p.grad.data.float()\n\nFor a full example, please see the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`, which has been updated to use ``torch.nn.Module.to`` instead of ``XLA_DOWNCAST_BF16``.\n\nBF16 in GPU-compatible mode without stochastic rounding enabled\n---------------------------------------------------------------\n\nFull BF16 training in GPU-compatible mode would enable faster convergence without the need for stochastic rounding, but would require a FP32 copy of weights/parameters to be saved and used in the optimizer. 
To enable BF16 in GPU-compatible mode without stochastic rounding enabled, use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type bfloat16 as follows, without setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``:\n\n.. code:: python\n\n    # model is created\n    model.to(torch.bfloat16)\n\nIn the initializer of the optimizer, for example AdamW, you can add code like the following code snippet to make an FP32 copy of the weights:\n\n.. code:: python\n\n        # keep a copy of weights in highprec\n        self.param_groups_highprec = []\n        for group in self.param_groups:\n            params = group['params']\n            param_groups_highprec = [p.data.float() for p in params]\n            self.param_groups_highprec.append({'params': param_groups_highprec})\n\nFrom then on, you can use the usual gradients but update the FP32 copy of the weights instead:\n\n.. code:: python\n\n        for group, group_highprec in zip(self.param_groups, self.param_groups_highprec):\n            for p, p_highprec in zip(group['params'], group_highprec['params']):\n                # convert gradients to FP32 before computing exponential average\n                grad = p.grad.data.float()\n\n                # compute the exponential average and denominator using grad\n                ...\n\n                # Update FP32 copy of weights\n                p_highprec.data.addcdiv_(exponential_avg, denominator, value=-step_size)\n\n\nIn the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`, this mode can be enabled by passing the ``--optimizer=AdamW_FP32ParamsCopy`` option to ``dp_bert_large_hf_pretrain_hdf5.py`` and setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0`` (or leaving it unset).\n\nBF16 automatic mixed precision using PyTorch Autocast\n-----------------------------------------------------\n\nBy default, the compiler automatically casts internal FP32 operations to\nBF16. You can disable this and allow PyTorch's BF16 automatic mixed precision function (``torch.autocast``) to\ncast certain operations to BF16.\n\nTo enable PyTorch's BF16 mixed-precision, first turn off the Neuron\ncompiler auto-cast:\n\n.. code:: python\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--auto-cast=none\"\n\nNext, per the recommendation from the official PyTorch `torch.autocast documentation <https://pytorch.org/docs/stable/amp.html#autocasting>`__, place only\nthe forward-pass of the training step in the ``torch.autocast`` scope with the ``xla`` device type:\n\n.. code:: python\n\n   with torch.autocast(dtype=torch.bfloat16, device_type='xla'):\n       # forward pass\n\nThe device type is XLA because we are using PyTorch-XLA's autocast backend. The PyTorch-XLA `autocast mode source code <https://github.com/pytorch/xla/blob/master/torch_xla/csrc/autocast_mode.cpp>`_ lists which operations are cast to lower-precision BF16 (\"lower precision fp cast policy\" section), which are maintained in FP32 (\"fp32 cast policy\"), and which are promoted to the widest input types (\"promote\" section).\n\n.. note::\n\n   If an operation is not part of any policy in the `autocast mode source code <https://github.com/pytorch/xla/blob/master/torch_xla/csrc/autocast_mode.cpp>`_, the data type of the inputs will be used for the computation of the operation.\n\n\nThe following shows the original training code snippet:\n\n.. code:: python\n\n   def train_loop_fn(train_loader):\n       for i, data in enumerate(train_loader):\n           inputs = data[0]\n           labels = data[3]\n           outputs = model(inputs, labels=labels)\n           loss = outputs.loss / flags.grad_acc_steps\n           loss.backward()\n           optimizer.step()\n           xm.mark_step()\n\nThe following shows the training loop modified to use BF16 autocast:\n\n.. code:: python\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--auto-cast=none\"\n\n   def train_loop_fn(train_loader):\n       for i, data in enumerate(train_loader):\n           torch.cuda.is_bf16_supported = lambda: True\n           with torch.autocast(dtype=torch.bfloat16, device_type='xla'):\n               inputs = data[0]\n               labels = data[3]\n               outputs = model(inputs, labels=labels)\n           loss = outputs.loss / flags.grad_acc_steps\n           loss.backward()\n           optimizer.step()\n           xm.mark_step()\n\nFor a full example of BF16 mixed-precision, see :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`.\n\nSee the official PyTorch documentation for more details about\n`torch.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`__.\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note.rst",
    "content": ".. _torch-neuronx-dataparallel-app-note:\n\nData Parallel Inference on torch_neuronx\n=======================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nThis guide introduces :func:`torch_neuronx.DataParallel`, a Python API that\nimplements data parallelism on :class:`~torch.jit.ScriptModule` models created by the\n:ref:`torch_neuronx_trace_api`.\nThe following sections explain how data parallelism can improve the performance of\ninference workloads on Inferentia, including how :func:`torch_neuronx.DataParallel`\nuses dynamic batching to run inference on variable input sizes. It covers an\noverview of the :func:`torch_neuronx.DataParallel` module and provides a few\n:ref:`example data parallel applications <data_parallel_examples_torch_neuronx>`.\n\nData parallel inference\n-------------------------\n\nData Parallelism is a form of parallelization across multiple devices or cores,\nreferred to as nodes. Each node contains the same model and parameters, but\ndata is distributed across the different nodes. By distributing the\ndata across multiple nodes, data parallelism reduces the total\nexecution time of large batch size inputs compared to sequential execution.\nData parallelism works best for smaller models in latency sensitive\napplications that have large batch size requirements.\n\n\ntorch_neuronx.DataParallel\n-------------------------\n\nTo fully leverage the Inferentia hardware, we want to use all available\nNeuronCores. An inf2.xlarge and inf2.8xlarge have two NeuronCores, an\ninf2.24xlarge has 12 NeuronCores, and an inf2.48xlarge has 24 NeuronCores.\nFor maximum performance on Inferentia hardware, we can use\n:func:`torch_neuronx.DataParallel` to utilize all available NeuronCores.\n\n:func:`torch_neuronx.DataParallel` implements data parallelism at the module\nlevel by replicating the Neuron model on all available NeuronCores\nand distributing data across the different cores for parallelized inference.\nThis function is analogous to :class:`~torch.nn.DataParallel` in PyTorch.\n:func:`torch_neuronx.DataParallel` requires PyTorch >= 1.8.\n\nThe following sections provide an overview of some of the features\nof :func:`torch_neuronx.DataParallel` that enable maximum performance on\nInferentia.\n\nNeuronCore selection\n^^^^^^^^^^^^^^^^^^^^\n\nBy default, DataParallel will try to use all NeuronCores allocated to the\ncurrent process to fully saturate the Inferentia hardware for maximum performance.\nIt is more efficient to make the batch dimension divisible by the number of\nNeuronCores. This will ensure that NeuronCores are not left idle during\nparallel inference and the Inferentia hardware is fully utilized.\n\nIn some applications, it is advantageous to use a subset of the\navailable NeuronCores for DataParallel inference. DataParallel has a\n``device_ids`` argument that accepts a list of :obj:`int` or ``'nc:#'``\nthat specify the NeuronCores to use for parallelization. See\n:ref:`Specifying NeuronCores <dataparallel_example_specify_ncs_torch_neuronx>`\nfor an example of how to use ``device_ids`` argument.\n\nBatch dim\n^^^^^^^^^\n\nDataParallel accepts a ``dim`` argument that denotes the batch dimension used\nto split the input data for distributed inference. By default,\nDataParalell splits the inputs on ``dim = 0`` if the ``dim`` argument is not\nspecified. 
For applications with a non-zero batch dim, the ``dim`` argument\ncan be used to specify the inference-time input batch dimension.\n:ref:`DataParallel with dim != 0 <dataparallel_example_dim_neq_zero_torch_neuronx>` provides an\nexample of data parallel inference on inputs with batch dim = 2.\n\n.. _dynamic_batching_description_torch_neuronx:\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\nBatch size has a direct impact on model performance. The Inferentia chip is optimized\nto run with small batch sizes. This means that a Neuron-compiled model can outperform\na GPU model, even when running single-digit batch sizes.\n\nAs a general best practice, we recommend optimizing your model's throughput by\ncompiling the model with a small batch size and gradually increasing it to\nfind the peak throughput on Inferentia.\n\nDynamic batching is a feature that allows you to use tensor batch sizes that the\nNeuron model was not originally compiled against. This is necessary because the\nunderlying Inferentia hardware will always execute inferences with the batch\nsize used during compilation. Fixed batch size execution allows tuning the\ninput batch size for optimal performance. For example, batch size 1 may be\nbest suited for an ultra-low latency on-demand inference application, while\nbatch size > 1 can be used to maximize throughput for offline inferencing.\nDynamic batching is implemented by slicing large input tensors into chunks\nthat match the batch size used during the :func:`torch_neuronx.trace` compilation call.\n\nThe :func:`torch_neuronx.DataParallel` class automatically enables dynamic batching on\neligible models. This allows us to run inference in applications that have\ninputs with a variable batch size without needing to recompile the model. See\n:ref:`Dynamic batching <dataparallel_example_dynamic_batching_torch_neuronx>` for an example\nof how DataParallel can be used to run inference on inputs with a dynamic batch\nsize without needing to recompile the model.\n\nDynamic batching using small batch sizes can result in sub-optimal throughput\nbecause it involves slicing tensors into chunks and iteratively sending data\nto the hardware. Using a larger batch size at compilation time can use the\nInferentia hardware more efficiently in order to maximize throughput. You can\ntest the tradeoff between individual request latency and total throughput by\nfine-tuning the input batch size.\n\nDynamic batching in the DataParallel module can be disabled using the\n``disable_dynamic_batching()`` function as follows:\n\n.. code-block:: python\n\n   >>> model_parallel = torch_neuronx.DataParallel(model_neuron)\n   >>> model_parallel.disable_dynamic_batching()\n\nIf dynamic batching is disabled, the compile-time batch size must be equal to\nthe inference-time batch size divided by the number of NeuronCores.\n:ref:`DataParallel with dim != 0 <dataparallel_example_dim_neq_zero_torch_neuronx>` and\n:ref:`Dynamic batching disabled <dataparallel_example_disable_dynamic_batching_torch_neuronx>`\nprovide examples of running DataParallel inference with dynamic batching\ndisabled.\n\n\nPerformance optimizations\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe DataParallel module has a ``num_workers`` attribute that can be used to\nspecify the number of worker threads used for multithreaded inference. By\ndefault, ``num_workers = 2 * number of NeuronCores``. 
This value can be\nfine tuned to optimize DataParallel performance.\n\nDataParallel has a ``split_size`` attribute that dictates the size of the input\nchunks that are distributed to each NeuronCore. By default,\n``split_size = max(1, input.shape[dim] // number of NeuronCores)``. This value\ncan be modified to optimally match the inference input chunk size with the\ncompile-time batch size.\n\n.. _data_parallel_examples_torch_neuronx:\n\nExamples\n--------\n\nThe following sections provide example usages of the\n:func:`torch_neuronx.DataParallel` module.\n\n\n.. _dataparallel_example_default_torch_neuronx:\n\nDefault usage\n^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.rst\n\n.. _dataparallel_example_specify_ncs_torch_neuronx:\n\nSpecifying NeuronCores\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.rst\n\n\n.. _dataparallel_example_dim_neq_zero_torch_neuronx:\n\nDataParallel with dim != 0\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.rst\n\n\n.. _dataparallel_example_dynamic_batching_torch_neuronx:\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.rst\n\n\n.. _dataparallel_example_disable_dynamic_batching_torch_neuronx:\n\nDynamic batching disabled\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.rst\n\n"
  },
  {
    "path": "about-neuron/appnotes/torch-neuronx/torch-neuronx-graph-partitioner-app-note.rst",
    "content": ".. _torch-neuronx-graph-partitioner-app-note:\n\nGraph Partitioner on torch_neuronx\n=======================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nThis guide introduces the graph partitioner for torch-neuronx.\nThe following sections explain the purpose of the graph partitioner,\nhow it works, and go over a few examples.\n\nThe Purpose of the Graph Partitioner\n------------------------------------\n\nWhile ``neuronx-cc`` is very sophisticated and can compile most operators,\nthere are some operator configurations that are not supported by the compiler.\nUsually in a model that contains unsupported operators, these are only a few\noperators while the supported parts of the model can benefit from the acceleration\nbenefits that Neuron offers. With this in mind, we developed a graph partitioner\nthat will partition out unsupported operators to be executed on CPU, while \ncompiling and executing the supported operators on Neuron.\n\nHow it Works\n------------\n\nDetermining Unsupported Operators\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOperator support is determined by the ``neuronx-cc`` compiler frontend. This is done\nbecause this gives us more flexibility than a static list. This is evident\nin cases where a specific operator configuration is supported but another\nconfiguration is not supported. For example, we support the square root operator,\nbut do not support it with a ``C64`` data type for example.\n\nTo check operator support, we use the :func:`torch_neuronx.analyze` API, which\nqueries the compiler for device placement: Neuron or CPU, which gives the graph\npartitioner a base graph to start partitioning.\n\nThe below image shows the flow of the graph partitioner:\n\n|torch-neuronx-graph-partitioner-flow-diagram|\n\n.. |torch-neuronx-graph-partitioner-flow-diagram| image:: /images/torch-neuronx-graph-partitioner-flow-diagram.png\n\nCustomizability\n^^^^^^^^^^^^^^^\n\nThe graph partitioner has a wide range of customizability\nfor a variety of situations. The customization options include:\n\n1. **Minimum Operator Support:** Only partition the model if a minimum percentage of operators are supported.\n2. **Minimum Subgraph Size:** The minimum number of operators in any given subgraph. This can be useful if having compute chokepoints with single operator subgraphs is not desired.\n3. **Maximum Subgraph Count:** The maximum number of subgraphs. Too many subgraphs can fragment the computation graph causing performance degredation.\n4. **Ops to Partition:** Additional operators to partition to CPU beyond the unsupported operators. This can be useful to suggest to the graph partitioner to partition to create a more balanced graph.\n\nFurthermore, compiler flags/args can be passed into all Neuron subgraphs through the graph partitioner.\n\nFor the API Reference, visit :func:`torch_neuronx.trace` and :class:`torch_neuronx.PartitionerConfig`\n\n.. note::\n  Dynamic batching has a case-by-case support with partitioned\n  models, because it is highly dependent on how the\n  final partition scheme looks like.\n\nExamples\n--------\n\nThe following sections provide example usages of the graph partitioner.\n\nDefault Usage\n^^^^^^^^^^^^^\n\nThe below model is a simple MLP model with sorted log softmax output.\nThe sort operator, ``torch.sort()`` or ``aten::sort``, is not supported\nby ``neuronx-cc`` at this time, so the graph partitioner will partition\nout the sort operator to CPU.\n\n.. 
code-block:: python\n\n  import torch\n  import torch_neuronx\n  import torch.nn as nn\n\n  import logging\n  \n  # adjust logger level to see what the partitioner is doing\n  logger = logging.getLogger(\"Neuron\")\n\n  class MLP(nn.Module):\n      def __init__(\n          self, input_size=28 * 28, output_size=10, layers=[4096, 2048]\n      ):\n          super(MLP, self).__init__()\n          self.fc1 = nn.Linear(input_size, layers[0])\n          self.fc2 = nn.Linear(layers[0], layers[1])\n          self.fc3 = nn.Linear(layers[1], output_size)\n          self.relu = nn.ReLU()\n\n      def forward(self, x):\n          f1 = self.fc1(x)\n          r1 = self.relu(f1)\n          f2 = self.fc2(r1)\n          r2 = self.relu(f2)\n          f3 = self.fc3(r2)\n          out = torch.log_softmax(f3, dim=1)\n          sort_out,_ = torch.sort(out)\n          return sort_out\n\n  n = MLP()\n  n.eval()\n\n  inputs = torch.rand(32,784)\n\n  # Configure the graph partitioner with the default values\n  partitioner_config = torch_neuronx.PartitionerConfig()\n\n  # Trace a neural network with graph partitioner enabled\n  neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config)\n\n  # Run inference on the partitioned model\n  output = neuron_net(inputs)\n\n\nSpecifying requirements\n^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example is very similar to the previous example, but\nhas two differences. The unsupported sort operator is sandwiched\nbetween the ReLU activation function after the first linear layer\nand the second linear layer. The second difference is that we are\nspecifying a max subgraph count of 2.\n\n.. code-block:: python\n\n  import torch\n  import torch_neuronx\n  import torch.nn as nn\n\n  import logging\n  \n  # adjust logger level to see what the partitioner is doing\n  logger = logging.getLogger(\"Neuron\")\n\n  class MLP(nn.Module):\n      def __init__(\n          self, input_size=28 * 28, output_size=10, layers=[4096, 2048]\n      ):\n          super(MLP, self).__init__()\n          self.fc1 = nn.Linear(input_size, layers[0])\n          self.fc2 = nn.Linear(layers[0], layers[1])\n          self.fc3 = nn.Linear(layers[1], output_size)\n          self.relu = nn.ReLU()\n\n      def forward(self, x):\n          f1 = self.fc1(x)\n          r1 = self.relu(f1)\n          sort_r1,_ = torch.sort(r1)\n          f2 = self.fc2(sort_r1)\n          r2 = self.relu(f2)\n          f3 = self.fc3(r2)\n          out = torch.log_softmax(f3, dim=1)\n          return out\n\n  n = MLP()\n  n.eval()\n\n  inputs = torch.rand(32,784)\n\n  # Configure the graph partitioner with the default values\n  partitioner_config = torch_neuronx.PartitionerConfig(max_subgraph_count=2)\n\n  # This trace will fail since the min_subgraph_size requirement can't be satisfied by the graph partitioner\n  neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config)\n\nOutput:\n\n.. code-block::\n\n    ValueError: The partitioner has found 3 subgraphs which exceeds the specified max subgraph count of 2.\n\n\nThis example fails because the sort operator placement generates 3 subgraphs, which is more than 2.\n\nSpecifying additional operators to partition\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example shows a situation where we want to partition out\nthe log_softmax operator despite it being supported. We also specify\nan 80% support percentage threshold.\n\n.. 
code-block:: python\n\n  import torch\n  import torch_neuronx\n  import torch.nn as nn\n\n  import logging\n  \n  # adjust logger level to see what the partitioner is doing\n  logger = logging.getLogger(\"Neuron\")\n  logger.setLevel(logging.INFO)\n\n  class MLP(nn.Module):\n      def __init__(\n          self, input_size=28 * 28, output_size=10, layers=[4096, 2048]\n      ):\n          super(MLP, self).__init__()\n          self.fc1 = nn.Linear(input_size, layers[0])\n          self.fc2 = nn.Linear(layers[0], layers[1])\n          self.fc3 = nn.Linear(layers[1], output_size)\n          self.relu = nn.ReLU()\n\n      def forward(self, x):\n          f1 = self.fc1(x)\n          r1 = self.relu(f1)\n          f2 = self.fc2(r1)\n          r2 = self.relu(f2)\n          f3 = self.fc3(r2)\n          out = torch.log_softmax(f3, dim=1)\n          sort_out,_ = torch.sort(out)\n          return sort_out\n\n  n = MLP()\n  n.eval()\n\n  inputs = torch.rand(32,784)\n\n  # Configure the graph partitioner with the default values\n  partitioner_config = torch_neuronx.PartitionerConfig(min_operator_percentage_threshold=0.8,ops_to_partition=set([\"aten::log_softmax\"]))\n\n  # This trace succeeds\n  neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config)\n\nKey Output logs:\n\n.. code-block::\n\n    ...\n    Neuron: The following operations are currently supported:\n    Neuron: aten::linear\n    Neuron: aten::relu\n    Neuron: aten::log_softmax\n    Neuron: The following operations are currently not supported:\n    Neuron: aten::sort, unsup.py(28): <stack_trace>\n    ...\n    Neuron: 85.71% of arithmetic operations (6 of 7) are supported\n    Neuron: Num Partitions: 2\n\n    Neuron: Creating Partition #1 for device: Device.NEURON\n    Neuron: The following operators will be included in this partition:\n    Neuron: prim::GetAttr:9\n    Neuron: aten::linear:3\n    Neuron: aten::relu:2\n    ...\n    Neuron: Creating Partition #2 for device: Device.CPU\n    Neuron: The following operators will be included in this partition:\n    Neuron: prim::Constant:4\n    Neuron: aten::sort:1\n    Neuron: aten::log_softmax:1\n\n\nNotice that we still report that ``aten::log_softmax`` is still supported, but also\nreport that ``aten::log_softmax`` is in Partition #2 which is for ``Device.CPU``."
  },
  {
    "path": "about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.rst",
    "content": ".. _neuron_llm_inference:\n\nGenerative LLM inference with Neuron\n====================================\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nBackground\n----------\n\nLarge Language Models (LLMs) generate human-like text through a\nprocess known as generative inference. Fundamentally, given an input prompt, generative LLM\ninference generates text outputs, by\niteratively predicting the next token in a sequence.\n\nThese models typically take a sequence of integers as input, which\nrepresent a sequence of tokens (words/subwords), and generate a\nprediction for the next token to be emitted. Below is a simple example\nthat illustrates this in code:\n\n.. code-block:: python\n\n    # Vocabulary of tokens the model can parse. The position of each token in the \n    # vocabulary is used as the token_id (an integer representing that token)\n    vocab = [\"having\", \"I\", \"fun\", \"am\", \"learning\", \".\", \"Neuron\"]\n\n    # input token_ids: list of integers that represent the input tokens in this\n    # case: \"I\", \"am\", \"having\", \"fun\"\n    input_token_ids = [1, 3, 0, 2] \n                                   \n\n    # The LLM gets a vector of input token_ids, and generates a probability-distribution\n    # for what the output token_id should be (with a probability score for each token_id\n    # in the vocabulary)\n    output = LLM(input_token_ids) \n                                  \n\n    # by taking argmax on the output, we effectively perform a 'greedy sampling' process,\n    # i.e. we choose the token_id with the highest probability. Other sampling techniques\n    # also exist, e.g. Top-K. By choosing a probabilistic sampling method we enable the model\n    # to generate different outputs when called multiple times with the same input.\n    next_token_id = np.argmax(output) \n\n\n    # map the token_id back into an output token\n    next_token = vocab[next_token_id] \n\n\nTo generate entire sentences, the application iteratively invokes the\nLLM to generate the next token's prediction, and at each iteration we\nappend the predicted token back into the input:\n\n\n.. code-block:: python\n\n   def generate(input_token_ids, n_tokens_to_generate):\n      for _ in range(n_tokens_to_generate): # decode loop\n          output = LLM(input_token_ids) # model forward pass\n      \n          next_token_id = np.argmax(output) # greedy sampling\n      \n          if (next_token_id == EOS_TOK_ID)\n              break # break if generated End Of Sentence (EOS)\n      \n          # append the prediction to the input, and continue to the next out_token\n          input_token_ids.append(int(next_token_id)) \n\n      return input_token_ids[-n_tokens_to_generate :] # only return generated token_ids\n\n   input_token_ids = [1, 3] # \"I\" \"am\"\n   output_token_ids = generate(input_tokens_ids, 4) # output_token_ids = [0, 2, 4, 6]\n   output_tokens = [vocab[i] for i in output_token_ids] # \"having\" \"fun\" \"learning\" “Neuron”\n\n\nThis process, of predicting a future value (regression) and adding\nit back into the input (auto), is sometimes referred to as\nautoregression. 
For more details, Jay Mody’s \\ `GPT in 60 Lines of\nNumPy <https://jaykmody.com/blog/gpt-from-scratch/>`__\\  is an\nexcellent writeup on GPTs (Generative Pre-trained Transformers).\n\n\nPerformance optimizations\n-------------------------\n\nThe sheer size of state-of-the-art LLMs, as well as the sequential\nnature of text generation, poses multiple challenges for efficient\ngenerative LLM deployment.\n\nFirst, the model is typically sharded across multiple devices, in order to fit the model\nin device memory. This creates communication overhead and complexity among devices.\nSecondly, certain deployments have strict application-level latency bounds, thus requiring\nsubstantial latency optimizations. This is especially challenging, due to the sequential nature\nof token-by-token generation. Finally, generating one token at a time often leads to poor \ndevice utilization, due to low arithmetic intensity, which can be improved via batching (see :ref:`what_batch_size_to_use`).\n\nThe Neuron SDK provides several built-in\noptimizations, allowing you to extract optimal performance when\ndeploying LLM models, including:\n\nKV-caching:\n^^^^^^^^^^^\n\nThe `transformers-neuronx <https://github.com/aws-neuron/transformers-neuronx>`__\nlibrary implements KV-cache optimization, which saves compute\nresources by reusing previously calculated SelfAttention key-value\npairs, instead of recalculating them for each generated token.\n\nTo illustrate this concept, see the\ninner workings of the MaskedSelfAttention operator in the figure below.\n\nAt each token generation step, the Query vector of a single current token is multiplied by the Key vectors of all \nprevious tokens in the sequence to create attention scores and these scores are further multiplied by the Value\nvectors of all previous tokens.\n\n\n.. image:: /images/masked-self-attention-operator.png\n\n\nThe core idea behind this optimization is that instead of re-computing the Key and Value vectors\nfor all previous tokens at each token generation step, Neuron can perform only incremental\ncomputation for the current token and re-use previously computed Key/Value vectors from the KV-cache. \nThe Key/Value vector of the current token is also appended to the KV-cache, for the next token generation step.\n\n\n\n.. image:: /images/kv-cache-optimization.png\n\n\n\nNote that the first token in the\noutput sequence is unique in two ways:\n\n.. 
container::\n\n   -  No KV-cache is available at this point.\n   -  Neuron needs to compute the entire KV-cache for <input_len> tokens (the\n      input prompt), rather than one incremental KV-cache entry.\n\nThis means that first-token latency is typically higher\nthan the following tokens.\n\nModel sharding:\n^^^^^^^^^^^^^^^\n\nNeuron enables you to shard the model across devices via Tensor\nParallelism, Pipeline Parallelism (coming soon), or a combination of the two (coming soon).\n\nTensor Parallelism shards each layer across multiple devices,\nenabling you to achieve the optimal latency.\n\nPipeline Parallelism places different layers on different devices and\ncreates a pipeline between them (as the name suggests) and is\nuseful mainly when optimizing throughput and/or cost-per-inference.\n\nTo find the optimal Tensor/Pipeline parallelism configuration for your\nmodel, see the :ref:`model_partitioning` section.\n \nComputation/communication overlap:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe Neuron compiler automatically fuses Collective Communication\nprimitives (e.g., AllReduce) with the following computation (e.g.,\nGEMM) in the compute graph. This helps minimize any overhead caused by sharding the\nmodel across devices.\n\nCompact data-types:\n^^^^^^^^^^^^^^^^^^^\nNeuron supports INT8 and FP8 (coming soon), which can significantly reduce the model's memory bandwidth and capacity requirements. \nThis is especially useful for Generative LLM inference, which is typically memory-bound. Therefore, using a compact data-type can improve the overall\nLLM inference performance with lower latency and higher throughput.\n\n\nBucketing:\n^^^^^^^^^^\nThe transformers-neuronx library automatically uses bucketing to process the input prompt and output tokens. Bucketing makes\nit possible to handle variable sequence lengths, without requiring support for dynamic shapes. Using multiple progressively \nlarger buckets helps minimize the portion of the KV-cache that needs to be read for each token.\n\n.. _model_partitioning:\n\nModel partitioning\n------------------\n\nHow many NeuronCores do I need?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTransformer models are typically defined via a hyper-parameter configuration, such\nas the following:\n\n.. code-block:: python\n\n   {\n    \"n_vocab\": 50257, # number of tokens in our vocabulary\n    \"n_ctx\": 2048, # maximum possible sequence length of the input\n    \"n_embd\": 9216, # embedding dimension (determines the \"width\" of the network)\n    \"n_head\": 72, # number of attention heads (n_embd must be divisible by n_head)\n    \"n_layer\": 64 # number of layers (determines the \"depth\" of the network)\n   }\n\nTo determine the number of NeuronCores needed to fit the model,\nperform the following calculation:\n\n.. code-block:: python\n\n   weight_mem_footprint = 12 x <n_layer> x <n_embd>^2 x <dtype-size> \n   KV_cache_mem_footprint = <batch-size> x <n_layer> x <n_ctx> x <n_embd> x 2 x <dtype-size>\n   # <dtype-size> is 2 for BF16/FP16, or 1 for FP8/INT8\n\n   mem_footprint = weight_mem_footprint + KV_cache_mem_footprint\n\n\nAnd from here, determining the number of NeuronCores is straightforward:\n\n\n.. code-block:: python\n\n   num_neuron_cores = ceil_to_closest_supported_size (mem_footprint / <NC-HBM-capacity>, <instance-type>) # 16GiB per Inferentia2/Trainium1 NeuronCore\n\n\n\nFor example, when running OPT-66B on Inf2, with a batch-size of 16, \nthe number of required NeuronCores can be computed as follows.\n\n\n.. 
code-block:: python\n\n   # OPT-66B example (BF16, Inf2)\n   # n_layer=64, n_ctx=2048, n_embd=9216, batch=16\n   weight_mem_footprint = 12 x 64 x 9216^2 x 2 = 121.5 GiB\n   KV_cache_mem_footprint = 16 x 64 x 2048 x 9216 x 2 x 2 = 72 GiB \n\n   mem_footprint = 121.5GiB + 72GiB = 193.5 GiB\n\n   num_neuron_cores = ceil_to_closest_supported_size (193.5GiB / 16GiB, Inf2)\n                    = ceil_to_closest_supported_size (12.1) = 24\n                    ## Currently, the Neuron runtime supports tensor-parallelism degrees 2, 8, and 32 on Trn1\n                    ## and supports tensor-parallelism degrees 2, 4, 8, 12 and 24 on Inf2.\n\n\nUse the :ref:`neuron_calculator` to compute the number of cores needed for a custom hyper-parameter configuration.\n\nWhich parallelism technique should I use?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTensor parallelism improves latency, at the expense of increased\nintra-layer communication. Thus, as a general rule, it is recommended to use\nthe smallest tensor parallelism degree that meets your latency\nrequirement and then use pipeline/data parallelism from that point on.\n\nIf latency is not a major concern in your application (e.g., model evaluation)\nand the primary goal is to maximize throughput (i.e., minimize total cost per token),\nthen it is most efficient to use pipeline parallelism and increase the batch-size\nas much as possible.\n\n\n.. _what_batch_size_to_use:\n\nWhat batch-size should I use?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDue to the serial token generation nature of generative LLM inference,\nthis workload tends to be extremely memory bound. This means that\nthroughput (and thus cost per inference) improves significantly by\nbatching.\n\nAs a general rule, we recommend increasing the batch-size to the\nmaximum amount that fits within the latency budget (up to batch=256.\nA larger batch-size typically does not help with performance.)\n\nNote that the KV-cache grows linearly with the batch-size and can\ngrow until it runs out of memory (typically referred to as\nOOM). If the latency budget allows, we recommend increasing the\nbatch-size to the maximum value that does not result in OOM.\n\nUsers may also consider pipelining the model beyond what is necessary\nto fit model parameters / KV-cache on devices, in order to free up\ndevice-memory space and thus allow the batch-size to increase\nwithout causing OOM issues.\n\n\n"
  },
  {
    "path": "about-neuron/arch/glossary.rst",
    "content": ".. _neuron_hw_glossary:\n\nNeuron Glossary\n===============\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nTerms\n-----\n\nNeuron Devices (Accelerated Machine Learning chips)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n      \n\n   * - Term\n     - Description\n\n   * - .. glossary::\n          Inferentia\n     - AWS first generation accelerated machine learning chip supporting inference only\n\n   * - .. glossary::\n          Trainium/Inferentia2\n     - AWS second generation accelerated machine learning chip supporting training and inference\n\n   * - .. glossary::\n          Trainium2\n     - AWS second generation accelerated machine learning chip supporting training and inference\n\n   * - .. glossary::\n          Neuron Device\n     - Accelerated machine learning chip (e.g. Inferentia or Trainium)\n\nNeuron powered Instances\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n      \n\n   * - Term\n     - Description\n\n\n   * - .. glossary::\n          Inf1\n     - Inferentia powered accelerated compute EC2 instance\n\n   * - .. glossary::\n          Trn1\n     - Trainium powered accelerated compute EC2 instance\n\n   * - .. glossary::\n          Inf2\n     - Inferentia2 powered accelerated compute EC2 instance\n\n   * - .. glossary::\n          Trn2\n     - Trainium2 powered accelerated compute EC2 instance\n\n\nNeuronCore terms\n^^^^^^^^^^^^^^^^\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n      \n\n   * - Term\n     - Description\n\n\n   * - .. glossary::\n          NeuronCore\n     - The machine learning compute cores within Inferentia/Trainium\n\n   * - .. glossary::\n          NeuronCore-v1\n     - Neuron Core within Inferentia\n\n   * - .. glossary::\n          NeuronCore-v2\n     - Neuron Core within Trainium1/Inferentia2\n\n   * - .. glossary::\n          NeuronCore-v3\n     - Neuron Core within Trainium2\n\n   * - .. glossary::\n          Tensor Engine\n     - 2D systolic array (within the NeuronCore), used for matrix computations\n\n   * - .. glossary::\n          Scalar Engine\n     - A scalar-engine within each NeuronCore, which can accelerate element-wise operations (e.g. GELU, ReLU, reciprocal, etc)\n\n   * - .. glossary::\n          Vector Engine\n     - A vector-engine with each NeuronCore, which can accelerate spatial operations (e.g. layerNorm, TopK, pooling, etc)\n\n   * - .. glossary::\n          GPSIMD Engine\n     - Embedded General Purpose SIMD cores, within each NeuronCore, to accelerate custom-operators\n\n   * - .. glossary::\n          Sync Engine\n     - The SP engine, which is integrated inside NeuronCore. Used for synchronization and DMA triggering.\n\n   * - .. glossary::\n          Collective Communication Engine\n     - Dedicated engine for collective communication, allows for overlapping computation and communication\n\n   * - .. glossary::\n          High Bandwidth Memory\n     - `High Bandwidth Memory <https://en.wikipedia.org/wiki/High_Bandwidth_Memory>`_, used as device memory for NeuronCore-v2 and beyond.\n   \n   * - .. glossary::\n          State Buffer\n     - The main software-managed on-chip memory in NeuronCore-v1 and beyond.\n\n   * - .. glossary::\n          Partial Sum Buffer\n     - A second software-managed on-chip memory in NeuronCore-v1 and beyond, with near-memory accumulation support for TensorE output data.\n    \n   * - .. 
glossary::\n          NeuronLink\n     - Interconnect between NeuronCores\n\n   * - .. glossary::\n          NeuronLink-v1\n     - Interconnect between NeuronCores in Inferentia device\n\n   * - .. glossary::\n          NeuronLink-v2\n     - Interconnect between NeuronCores in Trainium1/Inferentia2 device\n\n   * - .. glossary::\n          NeuronLink-v3\n     - Interconnect between NeuronCores in Trainium2 device\n\nNeuron SDK terms\n^^^^^^^^^^^^^^^^\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n      \n\n   * - Term\n     - Description\n\n\n   * - .. glossary::\n          Neuron Kernel Interface\n     - A bare-metal language and compiler for directly programming Neuron devices available on AWS Trainium/Inferentia2 and beyond devices.\n\n\nAbbreviations\n-------------\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n      \n\n   * - Abbreviation\n     - Description\n\n   * - .. glossary::\n          NxD Core\n     - NeuronX Distributed Core Library\n\n   * - .. glossary::\n          NxD Training\n     - NeuronX Distributed Training Library\n\n   * - .. glossary::\n          NxD Inference\n     - NeuronX Distributed Inference Library\n\n   * - .. glossary::\n          NC\n     - Neuron Core\n\n   * - .. glossary::\n          NeuronCore\n     - Neuron Core\n     \n   * - .. glossary::\n          ND\n     - Neuron Device\n\n   * - .. glossary::\n          NeuronDevice\n     - Neuron Device\n\n   * - .. glossary::\n          TensorE\n     - Tensor Engine\n\n   * - .. glossary::\n          ScalarE\n     - Scalar Engine\n\n   * - .. glossary::\n          VectorE\n     - Vector Engine\n\n   * - .. glossary::\n          GpSimdE\n     - GpSimd Engine\n\n   * - .. glossary::\n          CCE\n     - Collective Communication Engine\n\n   * - .. glossary::\n          HBM\n     - High Bandwidth Memory\n\n   * - .. glossary::\n          SBUF\n     - State Buffer \n\n   * - .. glossary::\n          PSUM\n     - Partial Sum Buffer\n\n   * - .. glossary::\n          FP32\n     - Float32\n\n   * - .. glossary::\n          TF32\n     - TensorFloat32\n\n   * - .. glossary::\n          FP16\n     - Float16\n\n   * - .. glossary::\n          BF16\n     - Bfloat16\n\n   * - .. glossary::\n          cFP8\n     - Configurable Float8\n\n   * - .. glossary::\n          RNE\n     - Round Nearest Even\n\n   * - .. glossary::\n          SR\n     - Stochastic Rounding\n\n   * - .. glossary::\n          NKI\n     - Neuron Kernel Interface\n\n   * - .. glossary::\n          CustomOps\n     - Custom Operators\n\n   * - .. glossary::\n          RT\n     - Neuron Runtime\n\n   * - .. glossary::\n          DP\n     - Data Parallel\n\n   * - .. glossary::\n          DPr\n     - Data Parallel degree\n\n   * - .. glossary::\n          TP\n     - Tensor Parallel\n\n   * - .. glossary::\n          TPr\n     - Tensor Parallel degree\n\n   * - .. glossary::\n          PP\n     - Pipeline Parallel\n\n   * - .. glossary::\n          PPr\n     - Pipeline Parallel degree\n"
  },
  {
    "path": "about-neuron/arch/index.rst",
    "content": ".. _neuron-architecture-index:\n\n.. meta::\n   :description: Explore the hardware architecture of AWS Neuron instances, including EC2 Trn and Inf instance types, AWS Inferentia and Trainium chips, and NeuronCore processing units. Learn about system specifications, memory hierarchies, interconnect topologies, and architectural considerations for machine learning workloads.\n   :date-modified: 2025-10-03\n\nAWS Neuron architecture guides\n==============================\n\nReview and understand the hardware architecture of AWS Neuron instances, including AWS Elastic Compute Cloud (EC2) ``Trn`` and ``Inf`` instance types, AWS Inferentia and Trainium chips, and NeuronCore processing units. The documentation covers system specifications, memory hierarchies, interconnect topologies, and architectural considerations for machine learning workloads.\n\nAbout Neuron Hardware\n----------------------\n\nAWS Neuron hardware consists of custom-designed machine learning accelerators optimized for deep learning workloads. This section covers the architecture and capabilities of AWS Inferentia and Trainium chips, their NeuronCore processing units, and the EC2 instances that host them.\n\nTrainium Architecture\n----------------------\n\n.. grid:: 2\n   :gutter: 2\n\n   .. grid-item-card:: AWS Trainium3\n      :link: neuron-hardware/trainium3\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Third-generation training accelerator chip\n\n   .. grid-item-card:: AWS Trainium2\n      :link: neuron-hardware/trainium2\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Second-generation training accelerator chip\n\n   .. grid-item-card:: AWS Trainium\n      :link: neuron-hardware/trainium\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      First-generation training accelerator chip\n\nInferentia Architecture\n------------------------\n\n.. grid:: 2\n   :gutter: 2\n\n   .. grid-item-card:: AWS Inferentia2\n      :link: neuron-hardware/inferentia2\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Second-generation inference accelerator chip\n\n   .. grid-item-card:: AWS Inferentia\n      :link: neuron-hardware/inferentia\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      First-generation inference accelerator chip\n\nNeuronCore Architecture\n------------------------\n\nNeuronCores are fully-independent heterogenous compute-units that power Tranium, Tranium2, Inferentia, and Inferentia2 chips.\n\n.. grid:: 2\n   :gutter: 2\n\n   .. grid-item-card:: NeuronCore v4\n      :link: neuron-hardware/neuron-core-v4\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Processing unit architecture for Trainium3\n\n   .. grid-item-card:: NeuronCore v3\n      :link: neuron-hardware/neuron-core-v3\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Processing unit architecture for Trainium2\n\n   .. grid-item-card:: NeuronCore v2\n      :link: neuron-hardware/neuron-core-v2\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Processing unit architecture for Inferentia2 and Trainium\n\n\n   .. 
grid-item-card:: NeuronCore v1\n      :link: neuron-hardware/neuron-core-v1\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Processing unit architecture for Inferentia\n\nNeuron AWS EC2 Platform Architecture\n-------------------------------------\n\nOverviews of the AWS Inf and Trn instance and UltraServer architectures.\n\n.. grid:: 2\n   :gutter: 2\n\n   .. grid-item-card:: Inf1 Architecture\n      :link: neuron-hardware/inf1-arch\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Inf1 instance architecture and specifications\n\n   .. grid-item-card:: Inf2 Architecture\n      :link: neuron-hardware/inf2-arch\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Inf2 instance architecture and specifications\n\n   .. grid-item-card:: Trn1 Architecture\n      :link: neuron-hardware/trn1-arch\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Trn1 instance architecture and specifications\n\n   .. grid-item-card:: Trn2 Architecture\n      :link: neuron-hardware/trn2-arch\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Trn2 instance architecture and specifications\n\n   .. grid-item-card:: Trn3 Architecture\n      :link: neuron-hardware/trn3-arch\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Trn3 instance architecture and specifications\n\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   AWS Inferentia <neuron-hardware/inferentia>\n   AWS Inferentia2 <neuron-hardware/inferentia2>\n   AWS Trainium <neuron-hardware/trainium>\n   AWS Trainium2 <neuron-hardware/trainium2>\n   AWS Trainium3 <neuron-hardware/trainium3>\n   NeuronCore v1 <neuron-hardware/neuron-core-v1>\n   NeuronCore v2 <neuron-hardware/neuron-core-v2>\n   NeuronCore v3 <neuron-hardware/neuron-core-v3>\n   NeuronCore v4 <neuron-hardware/neuron-core-v4>\n   Inf1 Architecture <neuron-hardware/inf1-arch>\n   Inf2 Architecture <neuron-hardware/inf2-arch>\n   Trn1 Architecture <neuron-hardware/trn1-arch>\n   Trn2 Architecture <neuron-hardware/trn2-arch>\n   Trn3 Architecture <neuron-hardware/trn3-arch>\n"
  },
  {
    "path": "about-neuron/arch/neuron-features/custom-c++-operators.rst",
    "content": ".. _feature-custom-c++-operators:\n\nNeuron Custom C++ Operators\n===========================\n\n.. include:: /neuron-customops/customops-intro.txt\n\n\nFor more details see :ref:`neuron_c++customops`"
  },
  {
    "path": "about-neuron/arch/neuron-features/data-types.rst",
    "content": ".. _neuron-data-types:\n\nData Types\n==========\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nInferentia and Trainium NeuronDevices include different NeuronCore versions, which support different data-types. This section describes what data-types are supported in each NeuronCore version.\n\nNeuronCore v1 Data Types\n------------------------\n\nNeuron Data-Types\n^^^^^^^^^^^^^^^^^\n\nNeuron enables developers to choose from multiple data-types. The\nsupported data-types are FP32, FP16, and BF16. Developers can\ntrain their models on their platform of choice (e.g. EC2 P3 instances),\nand then easily move their trained models to EC2 Inf1 for execution.\n\n.. raw:: html\n\n  <style type=\"text/css\">table, td, th { border: 1px solid black; padding: 5px; }\n  </style>\n  <table style=\"table-layout: fixed; width: 50%; border-spacing:0px;\">\n  \t<tbody>\n  \t\t<tr>\n  \t\t\t<th width=\"20%\">Data Type</th>\n  \t\t\t<th width=\"10%\">S</th>\n         <th colspan=\"8\">Range</th>\n  \t\t\t<th colspan=\"23\">Precision</th>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>FP32</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"23\">23 bits</td>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>BF16</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n  \t\t\t<td style=\"border-right: 0px\" colspan=\"13\" />\n  \t\t\t<td colspan=\"3\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"7\">7 bits</td>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>FP16</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td colspan=\"3\" />\n         <td bgcolor=\"#AFEFA9\" colspan=\"5\">5 bits</td>\n         <td colspan=\"13\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"10\">10 bits</td>\n  \t\t</tr>\n  \t</tbody>\n  </table>\n  <p/>\n\nFP16/BF16 models\n~~~~~~~~~~~~~~~~\n\nModels natively trained in FP16/BF16 will be executed in their trained\ndata-types. This is a straightforward migration from the training\nplatform to Inf1.\n\nFP32 models\n~~~~~~~~~~~\n\nNeuron SDK supports **automatic model conversion** from FP32 to BF16 by\ndefault. This capability allows developers to train their models using\nFP32 format for the highest accuracy, and achieve performance benefits\nwithout having to worry about low-precision training (e.g. no need for\nloss-scaling during training). ML models are typically robust to FP32 to\nBF16 conversion, with minimal to no impact on accuracy. The conversion\naccuracy is model dependent; therefore, users are encouraged to\nbenchmark the accuracy of the auto-converted model against the original\nFP32 trained model.\n\nWhen the compiler is supplied with an unmodified FP32 model input it\nwill automatically compile the model to run as BF16 on Inferentia. During\ninference the FP32 input data will be auto-converted internally by\nInferentia to BF16 and the output will be converted back to FP32\ndata-type. For explicit FP16 inferencing, either use an FP16 trained\nmodel, or use an external tool (like AMP) to make the explicit\nconversions.\n\n.. 
_neuron-data-types-v2:\n\nNeuronCore v2 Data Types\n------------------------\n\nThe NeuronCore v2 supports the following data types:\n\n* 32 and 16-bit Floating Point (FP32 / FP16)\n* TensorFloat-32 (TF32)\n* Brain Floating Point (BFloat16)\n* 8-bit Floating point with configurable range and precision (cFP8)\n* Unsigned 8-bit integer (UINT8)\n\nThe layout for these is as follows:\n\n.. raw:: html\n\n  <style type=\"text/css\">table, td, th { border: 1px solid black; padding: 5px; }\n  </style>\n  <table style=\"table-layout: fixed; width: 50%; border-spacing:0px;\">\n  \t<tbody>\n  \t\t<tr>\n  \t\t\t<th width=\"20%\">Data Type</th>\n  \t\t\t<th width=\"10%\">S</th>\n         <th colspan=\"8\">Range</th>\n  \t\t\t<th colspan=\"23\">Precision</th>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>FP32</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n         <td bgcolor=\"#FAC49E\" colspan=\"23\">23 bits</td>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>TF32</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n  \t\t\t<td colspan=\"13\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"10\">10 bits</td>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>BF16</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n  \t\t\t<td style=\"border-right: 0px\" colspan=\"13\" />\n  \t\t\t<td colspan=\"3\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"7\">7 bits</td>\n  \t\t</tr>\n  \t\t<tr>\n  \t\t\t<td>FP16</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td colspan=\"3\" />\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"5\">5 bits</td>\n  \t\t\t<td colspan=\"13\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"10\">10 bits</td>\n      </tr>\n      <tr>\n  \t\t\t<td>FP8_e5m2</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td colspan=\"3\" />\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"5\">5 bits</td>\n         <td style=\"border-right: 0px\" colspan=\"18\" />\n         <td colspan=\"3\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"2\">2 bits</td>\n  \t\t</tr>\n      <tr>\n  \t\t\t<td>FP8_e4m3</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td style=\"border-right: 0px\" colspan=\"3\" />\n         <td colspan=\"1\" />\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"4\">4 bits</td>\n         <td style=\"border-right: 0px\" colspan=\"20\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"3\">3 bits</td>\n  \t\t</tr>\n      <tr>\n  \t\t\t<td>FP8_e3m4</td>\n  \t\t\t<td bgcolor=\"#ad3bff\">1</td>\n         <td style=\"border-right: 0px\" colspan=\"4\" />\n         <td colspan=\"1\" />\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"3\">3 bits</td>\n         <td style=\"border-right: 0px\" colspan=\"19\" />\n  \t\t\t<td bgcolor=\"#FAC49E\" colspan=\"4\">4 bits</td>\n  \t\t</tr>\n      <tr>\n  \t\t\t<td>UINT8</td>\n  \t\t\t<td colspan=\"1\" />\n  \t\t\t<td bgcolor=\"#AFEFA9\" colspan=\"8\">8 bits</td>\n         <td colspan=\"23\" />\n  \t\t</tr>\n  </table>\n  <p/>\n\n\n\nModel Type Conversion\n^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron SDK supports automatic model conversion from FP32 to BF16 by default. This capability allows developers to train their models using FP32 format for the highest accuracy, and then achieve run-time performance benefits without having to worry about low-precision training (e.g. no need for loss-scaling during training). ML models are typically robust to FP32 to BF16 conversion, with minimal to no impact on accuracy. 
Since conversion accuracy is model dependent, users are encouraged to benchmark the accuracy of the auto-converted model against the original FP32 trained model.\n\nSee :ref:`Mixed Precision and Performance-accuracy Tuning for Training<neuronx-cc-training-mixed-precision>` for more details on supported data types and their properties.\n\nThe Neuron compiler offers the ``--auto-cast`` and ``--auto-cast-type`` options to specify automatic casting of FP32 tensors to other data types to address performance and accuracy tradeoffs. See the :ref:`Neuron Compiler CLI Reference Guide<neuron-compiler-cli-reference-guide>` for a description of these options.\n\n\nNeuronCore v2 Rounding Modes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBecause floating point values are represented by a finite number of bits, they cannot represent all real numbers accurately. Floating point calculations whose results exceed the precision of their data type must be rounded. By default, the NeuronCore v2 uses the Round Nearest, ties to Even (RNE) rounding scheme. It also provides a new Stochastic Rounding mode. When Stochastic Rounding is enabled, the hardware rounds the floating point value up or down with a probability proportional to its distance from the two nearest representable values. This can lead to improved model convergence. Use the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN`` to select a rounding mode.
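\n\nAs a hedged illustration of the controls above (the flag values shown are examples only, not a recommendation), both settings can be applied from a Python script before compilation is triggered:\n\n.. code:: python\n\n   import os\n\n   # Illustrative sketch: request BF16 auto-casting from the Neuron compiler via\n   # NEURON_CC_FLAGS, and enable stochastic rounding at runtime. Check the Neuron\n   # Compiler CLI Reference Guide for the exact option values for your release.\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --auto-cast=all --auto-cast-type=bf16'\n   os.environ['NEURON_RT_STOCHASTIC_ROUNDING_EN'] = '1'\n"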
  },
  {
    "path": "about-neuron/arch/neuron-features/index.rst",
    "content": ".. _neuron-features-index:\n\nNeuron Features\n===============\nNeuron features provide insights into Neuron capabilities that enable high-performance and improve usability of developing and deploying deep learning acceleration on top of Inferentia and Trainium based instances.\n\n.. grid:: 2\n      :gutter: 2\n\n      .. grid-item-card:: Custom C++ operators\n            :link: custom-c++-operators\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Framework for implementing custom operators in C++ to extend Neuron's built-in operation support.\n\n      .. grid-item-card:: Data types\n            :link: data-types\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Supported numerical data types including FP32, FP16, BF16, and INT8 for efficient model execution.\n\n      .. grid-item-card:: Logical NeuronCore configuration\n            :link: logical-neuroncore-config\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Configuration options for grouping and managing NeuronCores as logical units for workload distribution.\n\n      .. grid-item-card:: Neuron persistent cache\n            :link: neuron-caching\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Persistent caching system for compiled models to reduce compilation time across sessions.\n\n      .. grid-item-card:: NeuronCore batching\n            :link: neuroncore-batching\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Batching strategies to maximize throughput by processing multiple inputs simultaneously on NeuronCores.\n\n      .. grid-item-card:: NeuronCore pipeline\n            :link: neuroncore-pipeline\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Pipeline execution model that overlaps computation and data movement for improved performance.\n\n      .. grid-item-card:: Rounding modes\n            :link: rounding-modes\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n            Configurable numerical rounding modes for controlling precision and accuracy in computations. \n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Custom C++ operators <custom-c++-operators>\n    Data types <data-types>\n    Logical NeuronCore configuration <logical-neuroncore-config>\n    Neuron persistent cache <neuron-caching>\n    NeuronCore batching <neuroncore-batching>\n    NeuronCore pipeline <neuroncore-pipeline>\n    Rounding modes <rounding-modes>\n\n\n\n    \n"
  },
  {
    "path": "about-neuron/arch/neuron-features/logical-neuroncore-config.rst",
    "content": ".. _logical-neuroncore-config:\n\n################################\nLogical NeuronCore configuration\n################################\n\nLogical NeuronCore configuration (LNC) is a set of compiler and runtime settings for instances powered by AWS Trainium2 that \ndetermines the number of NeuronCores exposed to your machine learning (ML) applications. LNC configuration works by combining \nthe compute and memory resources of multiple physical NeuronCores into a single logical NeuronCore. You can configure these settings \nto reduce the number of worker process needed for training and deployment of large-scale models. \n\n.. important::\n\n   LNC can only be set to **1** or **2**. These are the only supported values. On Trn2, each chip has 8 physical NeuronCores. With LNC=2 (default), these are grouped into 4 Logical NeuronCores. With LNC=1, all 8 physical cores are treated as individual logical NeuronCores. LNC applies only to Trn2 and Trn3 instances.\n\n\n\n.. contents:: Concepts\n    :depth: 1\n    :local:\n    :backlinks: none\n\n===================\nLogical NeuronCores\n===================\n\nA logical NeuronCore is a grouping of physical NeuronCores that the Neuron Compiler, Neuron Runtime, Neuron Tools, and Frameworks \nhandle as a single unified NeuronCore. Every Trainium2 device contains eight physical NeuronCore-v3. \n\n=============================\nCompiler and runtime settings\n=============================\n \nLNC configuration is controlled with the following runtime and compiler settings:\n\n| **Neuron Runtime**\n| The ``NEURON_LOGICAL_NC_CONFIG`` runtime environment variable controls how many physical NeuronCores are grouped to make up a logical NeuronCore.\n\n\n| **Neuron compiler flags** \n| The ``--logical-nc-config`` or ``-lnc`` command-line options control the degree of model sharding the compiler performs on an input graph. You must compile your Models to use the LNC configuration set by the Neuron Runtime environment variable. AWS Neuron currently doesn't support setting the compiler flag to a different LNC configuration than the Neuron Runtime environment variable. \n\n=================================\nLogical NeuronCore configurations\n=================================\n\nAWS Neuron supports the following Logical NeuronCore configurations:\n\n.. tab-set::\n\n    .. tab-item:: LNC = 2\n\n        A Logical NeuronCore configuration (LNC) of two is the default setting on Trainium2 devices. It combines two physical \n        NeuronCore-v3 into a logical NeuronCore with the software id ``NC_V3d``. When you set Logical NeuronCore configuration to \n        two, it directs Trainium2 devices to expose four ``NC_v3d`` to your machine learning applications. On this setting, \n        a ``Trn2.48xlarge`` instance presents 64 available NeuronCores. The folowing high-level diagram shows a ``Trn2.48xlarge`` \n        instance, connected in a 2D torus topology, with the Logical NeuronCore configuration set to two.\n\n        .. image:: /images/architecture/Trn2/trn2_lnc2.png\n            :align: center\n            :width: 750\n        |\n\n        Trainium2 devices contain four 24GB HBM banks. Each bank is shared by two physical NeuronCore-v3. \n        When LNC=2, the two physical NeuronCores share a single address space. Workers on each of the \n        two physical NeuronCores can access tensors and perform local collective operations without \n        accessing the network. 
The following diagram shows how a logical NeuronCore is presented to the \n        software under this configuration.\n\n        .. image:: /images/architecture/NeuronCore/lnc_2.png\n            :align: center\n            :width: 450\n        |\n\n        To set the Logical NeuronCore configuration to two, use the following runtime and compiler flag combination:\n\n        | **Runtime environment variable:**\n        | ``NEURON_LOGICAL_NC_CONFIG`` = 2\n\n        | **Compiler flag:**\n        | ``-lnc`` = 2 \n        |\n\n    .. tab-item:: LNC = 1\n\n        When you set the Logical NeuronCore configuration to one, it assigns each physical NeuronCore-v3 to a single logical \n        NeuronCore with the software id ``NC_v3``. This directs Trainium2 devices to expose eight ``NC_v3`` to your machine learning \n        applications. On this setting, a ``Trn2.48xlarge`` instance presents 128 available NeuronCores. \n        The following high-level diagram shows a ``Trn2.48xlarge`` instance, connected in a 2D torus topology, \n        with the Logical NeuronCore configuration set to one.\n\n        .. image:: /images/architecture/Trn2/trn2_lnc1.png\n            :align: center\n            :width: 750\n        |\n\n        Trainium2 devices contain four 24GB HBM banks. Each bank is shared by two physical NeuronCore-v3. \n        When the Logical NeuronCore configuration is set to one, both physical NeuronCores have access to the entire 24GB HBM bank. The following \n        diagram shows how logical NeuronCores are presented to the software under this configuration.\n\n        .. image:: /images/architecture/NeuronCore/lnc_1.png\n            :align: center\n            :width: 475\n        |\n\n        To set the Logical NeuronCore configuration to one, use the following runtime and compiler flag combination:\n\n        | **Runtime environment variable:**\n        | ``NEURON_LOGICAL_NC_CONFIG`` = 1\n\n        | **Compiler flag:**\n        | ``-lnc`` = 1\n        |\n
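\n=======\nExample\n=======\n\nThe following is a minimal sketch (assuming a ``torch-neuronx`` style flow in which ``NEURON_CC_FLAGS`` forwards extra options to the Neuron compiler) of selecting LNC=1 for both the runtime and the compiler. As noted above, the two settings must match.\n\n.. code:: python\n\n   import os\n\n   # Illustrative only: expose each physical NeuronCore-v3 as its own logical NeuronCore.\n   os.environ['NEURON_LOGICAL_NC_CONFIG'] = '1'\n   # Compile the model for the same LNC setting.\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --logical-nc-config=1'\n"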
  },
  {
    "path": "about-neuron/arch/neuron-features/neuron-caching.rst",
    "content": ".. _neuron-caching:\n\nNeuron Persistent Cache\n=======================\n\nPyTorch Neuron (``torch-neuronx``) uses ``torch-xla``, and ``torch-xla`` operates in lazy mode. In other words, every operation in training script\nis recorded in a graph. The graph is executed only when the results are requested by \nthe user when they use ``print`` or ``xm.mark_step``.  Requesting results tells \n``torch-xla`` that the recorded graph needs to be executed. \n\nBefore executing the graph on a Neuron device, ``torch-xla`` would call Neuron Compiler (``neuronx-cc``) to compile the graph into Neuron specific \ngraph. Then the graph is executed on the NeuronCores. Compiling the graph involves \nrunning optimizations that can make use of the NeuronCores efficiently. Running these \noptimizations can be expensive and can result in long compile times. To save the \nusers from compiling these graphs at every iteration, ``torch-xla`` maintains an \nin-memory cache called Just in Time (JIT) cache. When the user re-runs the same graph (eg. 2nd \niteration of the training run), torch-xla would check in this JIT cache and re-use \nthe cached compilation result, thereby avoiding the wait times.\n\nSince the JIT cache is an in-memory cache, it needs to be constructed every time the training script is \nrun. Hence, if the user re-runs the training script, a new JIT cache is created. This causes a compilation for the first training graph.\nTo avoid such  compilations across training runs, PyTorch Neuron (``torch-neuronx``) has built an on-disk \n``Neuron Persistent Cache``. Since this cache is on-disk, its persistent across training runs. So \nnow, when a graph is compiled for the fist time, the compilation result is saved in \n``Neuron Persistent Cache``. When the user re-runs the training script, since the JIT cache is not \nready, it would send the graph for compilation. PyTorch Neuron (``torch-neuronx``) would then check if \nthe compiled result is present in the ``Neuron Persistent Cache``, if yes, it would return with the \ncompiled result. This on-disk cache thereby avoids compilations across training runs. \nThis cache is enabled by default for Neuron's PyTorch/XLA flow (training) as well as\ntransformers-neuronx LLM inference package.\nThe default cache path is the directory ``/var/tmp/neuron-compile-cache``.\n\nLook at the diagram below on the end to end flow:\n\n|Image:|\n\nAs seen from the diagram, the operations are recorded in a graph in lazy mode and only \nwhen a mark_step is hit, the graph is executed. Before execution, the graph passes through\ntwo caches to check if we have compiled the graph sometime in the past. If yes, we reuse \nthe compilation result and execute with it. This avoid duplicate compilations.\nOne thing to note, both JIT cache and Neuron Cache are complementary to each other.\nJIT cache prevents duplicate compilation within a run and Neuron Cache prevents duplicate \ncompilations across training runs. For example, within a training script, we have a training \nloop that iterates through the dataset. The first iteration would trace a unique graph \nand the following iteration would trace a graph that is similar to the first one. In this case,\nthe subsequent iterations would hit the JIT cache and reuse the result. However, to save \nusers from compiling for the first iteration graph, ``Neuron Persistent Cache`` would be used. In this case,\nthe very first time when the script is run, the ``Neuron Persistent Cache`` would be updated. 
Going forward \nwhen we re-run the training script, compilation results from ``Neuron Persistent Cache`` would be used.\n\nTo better understand how ``Neuron Persistent Cache`` works, consider the example below:\n\n.. code:: python\n\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   device = xm.xla_device()\n   t1 = torch.randn(3, 3).to(device)\n   t2 = t1 / 0.5\n   x = t2.cpu()\n\nRunning the above example produces the following logs:\n\n.. code:: bash\n\n   2023-08-25 21:51:36.000433: INFO ||NCC_WRAPPER||: Compile cache path: /var/tmp/neuron-compile-cache\n   .\n   Compiler status PASS\n\nRe-running the above script would fetch the graph from the \nneuron cache and you would see logs as follows:\n\n.. code:: bash\n\n   2023-08-25 21:52:23.000451: INFO ||NCC_WRAPPER||: Compile cache path: /var/tmp/neuron-compile-cache\n   2023-08-25 21:52:23.000453: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.8.0.25+a3ad0f342/MODULE_198775565831884870+d41d8cd9/model.neff. Exiting with a successfully compiled graph.\n\nAs you can see, the next run picks the compiled graph from\ncache, thereby saving the compilation time.\nThe cache uses hash of the Neuron compiler flags and XLA graph as the\nkey. If the Neuron compiler version or XLA graph changes, you will see\nrecompilation. Examples of changes that would cause XLA graph change\ninclude:\n\n-  Model type and size\n-  Batch size\n-  Optimizer and optimizer hyperparameters\n-  Location of xm.mark_step()\n\nTo keep cache size small and to enable weights/parameters updates without recompilation, \nonly the compute graphs are cached when using transformers-neuronx (weights/parameters are inputs to the compute graphs) and \ntraining flow using torch-neuronx's XLA  (weights/parameters are inputs and outputs of the compute graphs). \nNote that this caching mechanism doesn't apply to the torch-neuronx trace API where the weights/parameters are frozen and converted to constants, \nthen compiled together with the compute operations (traced graphs with frozen weights/parameters are not cached).\n\nAll compilation results are saved in the cache. To disable the cache, you \ncan pass ``--no_cache`` option via NEURON_CC_FLAGS:\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --no_cache'\n\nThe default cache path is the directory ``/var/tmp/neuron-compile-cache``.\nTo change the cache's location, pass ``cache_dir=<cache_url>``\noption via ``NEURON_CC_FLAGS`` or ``NEURON_COMPILE_CACHE_URL=<cache_url>`` environment variables:\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --cache_dir=<cache URL>'\n\n.. code:: python\n\n   os.environ['NEURON_COMPILE_CACHE_URL'] = '<cache_URL>'\n\nThe cache URL specified using ``--cache_dir`` is prioritized over that specified using ``NEURON_COMPILE_CACHE_URL`` if both are set.\nIf ``<cache_url>`` starts with ``s3://``, it will use the AWS S3 URL as the cache location, provided that the corresponding S3 bucket exists and is both readable and writeable.\n\nYou can change the verbose level of the compiler by adding ``log_level`` to either ``WARNING``, ``INFO``\nor ``ERROR``. This can be done as follows:\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --log_level=INFO'\n\nA graph compilation can fail because of a compilation error or an environment issue (for example, compilation is interrupted by ctrl-C). 
The graph is marked as failed, and subsequent reruns will encounter a message like the one below:\n\n.. code:: bash\n\n    INFO ||NCC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.8.0.25+a3ad0f342/MODULE_12486829708343293975+d41d8cd9/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation. \n\nTo retry compilation,\nadd ``--retry_failed_compilation`` to the ``NEURON_CC_FLAGS`` environment variable. When the script is rerun, all previously failed compilations are recompiled and fresh results are saved in the cache.\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --retry_failed_compilation'\n\nNote that all flags demonstrated above are parsed by a tool called ``neuron_cc_wrapper``, which is a wrapper over the Neuron Compiler CLI that provides the caching mechanism. These flags are not passed on to the Neuron Compiler CLI.\n\n.. |Image:| image:: ./images/NeuronCaching.png\n"
  },
  {
    "path": "about-neuron/arch/neuron-features/neuroncore-batching.rst",
    "content": ".. _neuron-batching:\n\nNeuron Batching\n===============\n\nBatching refers to the process of grouping multiple samples together,\nand processing them as a group (i.e. passing them together through the\nneural network). Batching is typically used as an optimization for\nimproving throughput at the expense of higher latency (and potentially\nhigher memory footprint). Batching considerations are slightly different\nbetween inference and training workloads, and we thus cover them\nseparately below.\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\nBatching in inference workloads\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhat is batched inference?\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe concept of batched inference is conceptually illustrated below, with\na single NeuronCore performing batched computation of a 3 layer neural\nnetwork with a batch-size of 4. The NeuronCore reads the parameters for\na certain layer from the external memory, and then performs the\ncorresponding computations for all 4 inference-requests, before reading\nthe next set of parameters (thus, performing more compute for every\nparameter read from memory). \n\n.. image:: /images/batched-inference.png\n\n\nWhat are the benefits of batched Inference?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor inference, batching is typically used as a trade-off knob between\nthroughput and latency: higher batch-size typically leads to better\nhardware utilization and thus higher throughput, but at the same time\nbatching requires to perform more computation until getting the first\nresults, and hence leads to higher latency. \n\n\n.. image:: /images/tradeoffs.png\n\nTo understand why batching tends to improve throughput (up to a certain max\nvalue), it is useful to consider an intuitive visual performance-model\ncalled ‘the roofline model’, which provides with a theoretical bound on\nthe system’s performance: \n\n\n.. image:: /images/memoryvscompute.png\n\nThe X-axis indicates the\narithmetic intensity (AI) of the workload, which is the ratio between\nthe number of operations and the number of bytes read-from/written-to\nmemory. The Y-axis indicates the theoretical extractable performance.\nFor small(large) AI values, the workload is expected to be\nmemory(compute) bound. For inference workloads, AI is often approximated\nby dividing the model’s number of operations by its memory footprint\n(#params x dtype_size). To a first order approximate, the AI value is\nlinearly dependent on the batch-size, which means that the workloads\nperformance (throughput) is expected to increase with the batch-size. To\nunderstand this more intuitively, for a larger batch size, Neuron can\nbetter amortize the cost of reading parameters from the external memory,\nand thus improve the overall hardware efficiency. It should be noted\nthat while the roofline model can be very useful, it is not perfectly\naccurate (e.g. it doesn’t take into account spill/fills from/to on-chip\nSRAM memories), and thus users are encouraged to use it as a tool for\n**estimating** the optimal batch-size for their workloads.\n\nHow to determine the optimal batch-size for inference workloads?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe optimal batch size is dependent on the application-level\nrequirements: some applications require strict latency guarantees (in\nwhich case, check out the :ref:`neuroncore-pipeline`\ntechnology), while other applications strictly aim to maximize\nthroughput. 
We thus encourage our users to try out multiple batch-sizes,\nand compare performance between them. A good starting point for batch-size\nexploration can be identified using the roofline model: we can choose a\nbatch-size that achieves an Arithmetic Intensity which is at the edge of\nthe compute-bound region. By doing that, we aim to achieve max\nthroughput with a minimal batch-size, and thus minimal impact on\nlatency. \n\n.. image:: /images/memoryvscompute2.png\n\n\nThis can be expressed via the following\nequation:\n\n::\n\n   batch-size(Inference) = ceiling[0.5 x (<NeuronDevice PeakFLOPS> / <NeuronDevice MemBW>) / (<model FLOPs> / (<#model-dense-params> x <dtype_size>))]\n\n(for\nNeuronDevice PeakFLOPS and MemBW, see the :ref:`trainium-arch`, :ref:`inferentia-arch` and :ref:`inferentia2-arch` pages).\n\nFor example, a BF16 BERT-Large model, with a sequence length of 128,\nwill have the following approximated batch sizes:\n\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1    \n    :align: left\n    \n\n    *   - Model\n        - NeuronDevice\n        - Peak TFLOPS (BF16)\n        - MemBW (GB/sec)\n        - Model GFLOPs\n        - Model Dense Params (Millions)\n        - Data-type size (BF16)\n        - Approximated optimal batch-size\n\n    *   - BERT-Large (SeqLen=128)\n        - Inferentia\n        - 64\n        - 50\n        - 77.3\n        - 302\n        - 2\n        - 6\n\n    *   - BERT-Large (SeqLen=128)\n        - Trainium\n        - 210\n        - 820\n        - 77.3\n        - 302\n        - 2\n        - 2\n\n    *   - ResNet-50\n        - Inferentia\n        - 64\n        - 50\n        - 7.8\n        - 25\n        - 2\n        - 5\n\n    *   - ResNet-50\n        - Trainium\n        - 210\n        - 820\n        - 7.8\n        - 25\n        - 2\n        - 1\n\nWe recommend evaluating multiple batch sizes and comparing the\nperformance between them, in order to determine the optimal\nlatency/throughput deployment point.\n\nHow to set the batch-size?\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron compiler takes a model and its sample input, as inputs for\nthe compilation process. For example, the code snippet below will\ncompile a model with a batch-size of 4:\n\n.. code::\n\n   import torch\n   import torch_neuron\n   from torchvision import models\n\n   # Load the model and set it to evaluation mode\n   model = models.resnet50(pretrained=True)\n   model.eval()\n\n   # Compile with an example input of batch size 4\n   image = torch.rand([4, 3, 224, 224])\n\n   model_neuron = torch.neuron.trace(model, image, dynamic_batch_size=True)\n\n   # Execute with a batch of 12 images\n   batch = torch.rand([12, 3, 224, 224])\n   results = model_neuron(batch)\n\nFor ahead-of-time compiled inference graphs (i.e. Inf1), dynamic\nbatching can be used (as shown in the above code snippet) to process a\nlarger client-side inference batch-size, and allow the framework to\nautomatically break up the user-batch (12 in our case) into smaller\nbatch sizes, to match the compiled batch-size (4 in our case). This\ntechnique increases the achievable throughput by hiding the\nframework-to-neuron overhead, and amortizing it over a larger batch\nsize.\n\n.. seealso::\n\n  - :ref:`torch-neuronx-dynamic-batching` in ``torch-neuronx``\n  - :ref:`tensorflow-neuronx-special-flags` in ``tensorflow-neuronx``.\n\n\nBatching in training workloads\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUnlike inference workloads, training is inherently an offline process,\nand thus doesn’t have latency requirements. 
This means that training is\nalmost always batched to some degree.\n\nBatch-size naming\n^^^^^^^^^^^^^^^^^\n\nFor distributed processing, defining the batch size depends on the observation\nlevel. There are multiple terms you should be aware of when running\na distributed training job, especially global batch size (GBS) and micro-batch.\nKnowing the batch size in advance is crucial for precompiling the computational\ngraph and for setting the hyperparameters.\n\n  micro-batch size\n    Smallest unit of the number of samples getting processed in a single step\n    in the accelerator. For very large models, it is frequently chosen to be 1.\n\n  gradient accumulation\n    Process of iterating over a micro-batch multiple times and summing up the gradients \n    before an optimizer update. This can happen in a dedicated loop for gradient \n    accumulation or as part of multiple iterations of samples in pipeline parallelism.\n    See :ref:`pp_developer_guide` for more details on pipeline parallelism.\n\n  data-parallel size (or DP degree)\n    Number of model replicas that process different portions of data in parallel.\n    Each replica maintains a complete copy of the model while processing unique\n    data chunks, after which their gradients are synchronized for the optimizer update. \n    See :ref:`neuron_hw_glossary` for more details.\n\n  global batch-size\n    Number of total samples used for an update of the optimizer.\n    This includes all the respective gradients that get added up from\n    data-parallel processing or gradient accumulation.\n    :literal:`global batch size = micro_batch_size * data_parallel_size * gradient_accumulation_steps`\n\n  mini-batch or replica-batch size\n    Number of samples that contribute to a gradient within one data-parallel rank.\n    A mini-batch gradient is obtained by aggregating multiple\n    micro-batch gradients within or without a pipeline (aka. gradient accumulation).\n    :literal:`mini_batch_size = micro_batch_size * gradient_accumulation_steps`\n\n  worker batch\n    The portion of mini-batch samples processed by a worker.\n    The idea behind a worker batch is that one worker (node) might have a subset of the dp-degrees \n    and we care about how much data gets tackled by this worker.\n\nHow to determine the optimal batch-size for training workloads?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDetermining the optimal batch-size for training workloads can be a\nnon-trivial task. In most cases, we’d want to choose the largest\nbatch-size that we can get away with.\n\nThe most dominant factor for determining the optimal batch-size in\ntraining workloads is memory footprint: training workloads have higher\nmemory footprint compared to inference, as they require saving more\ntensors aside from the model parameters, such as gradients, intermediate\nactivations (passed between forward-pass and backward-pass), and\noptimizer-state. If the batch-size is increased beyond a certain point,\none can run out of device memory (indicated by an ‘Out of device memory’\nerror, typically abbreviated as OOM).\n\nTo estimate the memory footprint of a model, we look at the different\ncontributors:\n\n1. Weights and gradients:\n\n   1. typically 2B each, thus 4B per parameter\n\n2. Optimizer state:\n\n   1. typically 4B - 12B per parameter\n\n3. Intermediate activations:\n\n   1. sum of all tensor sizes for forward pass\n   2. 
for example, for a transformer neural network, this is roughly\n      16 x <batch_size> x <num_layers> x <sequence_length> x <hidden_size> x <dtype_size>,\n      which comes to roughly 100MB x <batch_size> for a BERT-Large-sized model\n      (24 layers, sequence length 128, hidden size 1024, 2-byte data types)\n\n\nFor training workloads, determining the optimal batch size can be a\nlittle more tricky, due to two reasons:\n\n1. *Higher memory footprint:* Training workloads have higher memory\n   footprint compared to inference, as they require saving more tensors\n   aside from the model parameters, such as gradients,\n   intermediate-state and optimizer-state. If the batch-size is\n   increased too much, one can run out of device memory (indicated by an\n   ‘Out of memory’ error, typically abbreviated as OOM).\n2. *Arithmetic intensity estimation:* Arithmetic intensity is harder to\n   estimate in training workloads, compared to inference workloads, as\n   the majority of the external memory accesses are due to reads/writes of\n   intermediate activation state (rather than parameters), which\n   requires lower level familiarity with the model to estimate\n   correctly.\n\nA good first-order approximation for the optimal batch-size in a training\nworkload is the largest one that can fit in the device’s memory (i.e.\nwon’t lead to an OOM error):\n\n::\n\n   batch-size(Training) = 0.6 x (<TP-Rank> x <PP-Rank> x <NeuronCore MemoryCapacity>) / (<#model-dense-params> x <model-state-bytes-per-parameter>)\n\nNote that TP-rank stands for Tensor-Parallelism rank, i.e. how many\nNeuronCores participate in a single Tensor-Parallelism group. Similarly,\nPP-rank stands for Pipeline-Parallelism rank, i.e. how many NeuronCores\nparticipate in a single Pipeline-Parallelism group.\n\nFor example, for BERT-Large Ph1 training, with a model-state of 4B per\nparameter (2B weights, 2B gradients), and TP-rank = PP-rank = 1, the\napproximated optimal per-NeuronCore training batch-size would be:\n\n::\n\n   batch-size(Training/Trainium) = 0.6 x (1 x 1 x 16e+9) / (300e+6 x 4) = 8\n
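\nAs a hedged worked example (plain Python; the values are taken from the BERT-Large case above and are illustrative, not a sizing recommendation), the approximation can be evaluated as follows:\n\n.. code:: python\n\n   # First-order estimate of the per-NeuronCore training batch-size.\n   tp_rank = 1                   # Tensor-Parallelism degree\n   pp_rank = 1                   # Pipeline-Parallelism degree\n   memory_per_core = 16e9        # bytes of device memory per NeuronCore\n   dense_params = 300e6          # number of model parameters\n   state_bytes_per_param = 4     # 2B weights + 2B gradients\n\n   batch_size = 0.6 * (tp_rank * pp_rank * memory_per_core) / (dense_params * state_bytes_per_param)\n   print(int(batch_size))        # prints 8\n"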
  },
  {
    "path": "about-neuron/arch/neuron-features/neuroncore-pipeline.rst",
    "content": ".. _neuroncore-pipeline:\n\nNeuronCore Pipeline\n===================\n\nThe Neuron software feature referred to as a NeuronCore Pipeline refers\nto the process of sharding a compute-graph across multiple NeuronCores,\ncaching the model parameters in each core’s on-chip memory (cache), and\nthen streaming inference requests across the cores in a pipelined\nmanner. Based on the number of NeuronCores selected, the model might get\nseamlessly sharded across up-to 16 Inferentia devices (i.e. 64\nNeuronCores). This enables users to optimize for both throughput and\nlatency, as it enables the NeuronCores to process neural-networks with\nlocally cached data and avoid the cost of accessing external memory.\n|Image:|\n\nOne benefit to this approach is that NeuronCore Pipeline can typically\nhit maximal hardware efficiency without the need for batching (e.g.\nBERT, ResNet50).\n\nFor maximal performance, users should choose an instance-size that can\ncache the entire model by using sufficient NeuronCores. Inf1 instance\ntypes have different number of Inferentia devices, each of which has 4\nNeuronCores, as shown here\nhttps://aws.amazon.com/ec2/instance-types/inf1/\n\nTo enable the NeuronCore Pipeline optimization, the compiler should be\ninvoked with the following flags: ``--neuroncore-pipeline-cores N``. The\nnumber of NeuronCores is typically chosen to be the minimal number that\ncan fit the entire model, which is currently done through a\ntrial-and-error process (compiling to different number of cores and\nlooking for compilation success/failure message). This process will be\nautomated in the future. A simple formula to help define the number of\nNeuronCores that may be an appropriate choice is\n\n::\n\n   neuroncore-pipeline-cores = 4 * round( number-of-weights-in-model/(2 * 10^7) ) \n\nThis allocates a set of NeuronCores based on the size of the given\nmodel's weights and normalizes to multiples of 4 so it uses full\nInferentias.\n\nThe code snippet below shows how to compile a model with NeuronCore\nPipeline for 16 NeuronCores (instance size inf1.6xlarge).\n\n::\n\n   import numpy as np\n   import tensorflow.neuron as tfn\n\n   example_input = np.zeros([1,224,224,3], dtype='float16')\n   tfn.saved_model.compile(\"rn50_fp16\",\n                           \"rn50_fp16_compiled/1\",\n                           model_feed_dict={'input_1:0' : example_input },\n                           compiler_args = ['--neuroncore-pipeline-cores', '16'])\n\n.. |Image:| image:: ./images/NeuronCorePipelining.png\n"
  },
  {
    "path": "about-neuron/arch/neuron-features/rounding-modes.rst",
    "content": ".. _neuron-rounding-modes:\n\nNeuron Rounding Modes\n=====================\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 1\n\n\n.. _neuron-rounding-mode-rne:\n\nRound Nearest, ties to Even (RNE)\n---------------------------------\n\nWhen the exact result of a floating point operation cannot be exactly\nrepresented as a floating point value, it must be rounded. The IEEE\n754-2008 standard defines the default rounding mode to be ‘Round\nNearest, ties to Even’ (RNE for short). Under this scheme, numbers are\nrounded to the nearest representable value, and in case of a ‘tie’ (i.e.\nthe number is exactly between the two nearest representable values)\nnumbers will be rounded to the nearest even number.\n\nAll NeuronCore generations support the RNE rounding scheme, which is the\nmost commonly used rounding scheme for Machine Learning workloads. Below\nis an illustration of the RNE rounding scheme: \n\n.. image:: /images/rne1.png\n    :width: 700\n\n.. image:: /images/rne2.png\n    :width: 700\n\n.. image:: /images/rne3.png\n    :width: 700\n\n.. _neuron-rounding-mode-sr:\n\n\nStochastic Rounding (SR)\n------------------------\n\nOne downside of the RNE rounding scheme (and other rounding schemes\ndescribed in the IEEE 754-2008 standard), is that when adding floating\npoint values of significantly different magnitudes, rounding can squash\nsmall values and prevent them from accumulating over time. \n\nTo improve this, starting from the second generation of the NeuronCore\n(NeuronCore-v2), customers can choose between the RNE rounding scheme\ndescribed above, and a second rounding scheme called ‘Stochastic\nRounding’ (SR for short). Stochastic rounding prevents the computation\nprecision-loss described above, by performing the rounding operations in\na probabilistic manner, according to the relative distance from the two\nnearest representable values, as illustrated below: \n\n.. image:: /images/sr.png\n    :width: 700\n\n\nBy performing the rounding in a probabilistic manner, this scheme allows\nfor small increments to accumulate over time, even when added to numbers\nof significantly higher magnitude, which leads to more precise results\nwhen performing large floating point computations (as done for machine\nlearning).\n\n\nQuick Tests \n-----------\n\nAs an example, we examine the code-snippet below:\n\n::\n\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   device = xm.xla_device()\n   \n   a = torch.tensor(1024.0).half().to(device)\n   \n   for i in range(2048) :\n      a = (a + 0.5)\n      xm.mark_step()\n   \n   print(a)\n\n\nThis code shows that rounding can significantly impact the calculation’s precision over time.\nTo use standard RNE rounding, use the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0``.\nTo enable stochastic rounding, use the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``.\n\nNOTE: Stochastic rounding mode is enabled by default in PyTorch-Neuron when XLA_USE_BF16=1.\n\nThe first test continues to show 1024 due to RNE rounding after each addition, and the second test shows result that is mostly in line with expectation.\n\n::\n\n   $ NEURON_RT_STOCHASTIC_ROUNDING_EN=0 python3 rounding_mode_test.py\n   \n   tensor(1024., device='xla:1', dtype=torch.float16)\n   \n   $ NEURON_RT_STOCHASTIC_ROUNDING_EN=1 python3 rounding_mode_test.py\n   \n   tensor(2056., device='xla:1', dtype=torch.float16)\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/inf1-arch.rst",
    "content": ".. _aws-inf1-arch:\n\nAmazon EC2 Inf1 Architecture\n==============================\n\nOn this page, we provide an architectural overview of the Amazon EC2 Inf1\ninstance and the corresponding :ref:`Inferentia <inferentia-arch>` NeuronChips that power\nthem (:ref:`Inferentia <inferentia-arch>` chips from here on).\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. _inf1-arch:\n\nInf1 Architecture\n-----------------\n\nThe EC2 Inf1 instance is powered by 16 :ref:`Inferentia <inferentia-arch>` chips, allowing\ncustomers to choose between four instance sizes:\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1    \n    :align: left\n    \n\n    *   - Instance size\n        - # of Inferentia chips\n        - vCPUs\n        - Host Memory (GiB)\n        - FP16/BF16 TFLOPS\n        - INT8 TOPS\n        - Device Memory (GiB)\n        - Device Memory bandwidth (GiB/sec)\n        - NeuronLink-v1 chip-to-chip bandwidth (GiB/sec/chip)\n        - EFA bandwidth (Gbps)\n\n    *   - Inf1.xlarge\n        - 1\n        - 4\n        - 8\n        - 64\n        - 128\n        - 8\n        - 50\n        - N/A\n        - up-to 25\n\n\n    *   - Inf1.2xlarge\n        - 1\n        - 8\n        - 16\n        - 64\n        - 128\n        - 8\n        - 50\n        - N/A\n        - up-to 25\n\n    *   - Inf1.6xlarge\n        - 4\n        - 24\n        - 48\n        - 256\n        - 512\n        - 32\n        - 200\n        - 32\n        - 25\n\n    *   - Inf1.24xlarge\n        - 16\n        - 96\n        - 192\n        - 1024\n        - 2048\n        - 128\n        - 800\n        - 32\n        - 100\n\n\n\nInf1 offers a direct chip-to-chip interconnect called NeuronLink-v1,\nwhich enables co-optimizing latency and throughput via the :ref:`Neuron Core Pipeline <neuroncore-pipeline>` technology. \n\n.. image:: /images/inf1-server-arch.png\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/inf2-arch.rst",
    "content": ".. _aws-inf2-arch:\n\nAmazon EC2 Inf2 Architecture\n=============================\n\nOn this page we provide an architectural overview of the Amazon EC2 Inf2\ninstances and the corresponding Inferentia2 NeuronChips that power\nthem (Inferentia2 chips from here on).\n\nInf2 Architecture\n-----------------\n\nThe EC2 Inf2 instance is powered by up to 12 :ref:`Inferentia2 chips <inferentia2-arch>`, and allows\ncustomers to choose between four instance sizes:\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1    \n    :align: left\n\n    *   - Instance size\n        - # of Inferentia2 chips\n        - vCPUs\n        - Host Memory (GiB)\n        - FP8/FP16/BF16/TF32 TFLOPS\n        - FP32 TFLOPS\n        - Device Memory (GiB)\n        - Instance Memory Bandwidth (GiB/sec)\n        - NeuronLink-v2 chip-to-chip (GiB/sec/chip)\n\n    *   - Inf2.xlarge\n        - 1\n        - 4\n        - 16\n        - 190\n        - 47.5\n        - 32\n        - 820\n        - N/A\n\n    *   - Inf2.8xlarge\n        - 1\n        - 32\n        - 128\n        - 190\n        - 47.5\n        - 32\n        - 820\n        - N/A\n\n    *   - Inf2.24xlarge\n        - 6\n        - 96\n        - 384\n        - 1140\n        - 285\n        - 192\n        - 4920\n        - 192\n\n    *   - Inf2.48xlarge\n        - 12\n        - 192\n        - 768\n        - 2280\n        - 570\n        - 384\n        - 9840\n        - 192\n\n\nInf2 offers a low-latency, high-bandwidth chip-to-chip interconnect\ncalled NeuronLink-v2, which enables high-performance collective communication operations (e.g., AllReduce and AllGather).\n\nThis allows sharding large models across Inferentia2 chips (e.g., via\nTensor Parallelism), thus optimizing latency and throughput. This\ncapability is especially useful when deploying Large Generative Models.\n\n.. image:: /images/inf2-topology.png\n\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/inferentia.rst",
    "content": ".. _inferentia-arch:\n\n\nInferentia Architecture\n-----------------------\n\nAt the heart of each Inf1 instance are sixteen Inferentia chips, each with four :ref:`NeuronCore-v1 <neuroncores-v1-arch>`, as depicted\nbelow:\n\n.. image:: /images/inferentia-neurondevice.png\n\n\n\nEach Inferentia chip consists of:\n\n+---------------+-------------------------------------------+\n| Compute       | Four                                      |  \n|               | :ref:`NeuronCore-v1 <neuroncores-v1-arch>`|   \n|               | cores, delivering 128 INT8 TOPS and 64    |   \n|               | FP16/BF16 TFLOPS                          |  \n+---------------+-------------------------------------------+\n| Device Memory | 8GiB of device DRAM memory (for storing   |  \n|               | parameters and intermediate state), with  | \n|               | 50 GiB/sec of bandwidth                   | \n+---------------+-------------------------------------------+\n| NeuronLink    | Enables co-optimization of latency and    |   \n|               | throughput via the :ref:`Neuron Core      |\n|               | Pipeline <neuroncore-pipeline>`           |  \n|               | technology                                |  \n+---------------+-------------------------------------------+\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/inferentia2.rst",
    "content": ".. _inferentia2-arch:\n\nInferentia2 Architecture\n------------------------\n\nAt the heart of each Inf2 instance are up to twelve Inferentia2 chips (each with two :ref:`NeuronCore-v2 <neuroncores-v2-arch>` cores). Inferentia2 is the second\ngeneration AWS purpose-built Machine Learning inference accelerator. The Inferentia2 chip architecture is depicted below: \n\n.. image:: /images/inferentia2.png\n\n\nEach Inferentia2 chip consists of:\n\n+----------------------------------+----------------------------------+\n| Compute                          | Two :ref:`NeuronCore-v2          |\n|                                  | <neuroncores-v2-arch>`           |\n|                                  | cores, delivering 380 INT8 TOPS, |\n|                                  | 190 FP16/BF16/cFP8/TF32 TFLOPS,  |\n|                                  | and 47.5 FP32 TFLOPS.            |\n+----------------------------------+----------------------------------+\n| Device Memory                    | 32GiB of high-bandwidth device   |                                  \n|                                  | memor (HBM) (for storing model   |                                  \n|                                  | state), with 820 GiB/sec of      |                                  \n|                                  | bandwidth.                       |\n+----------------------------------+----------------------------------+\n| Data Movement                    | 1 TB/sec of DMA bandwidth, with  |\n|                                  | inline memory                    |\n|                                  | compression/decompression.       |\n+----------------------------------+----------------------------------+\n| NeuronLink                       | NeuronLink-v2 for                |                                  \n|                                  | chip-to-chip interconnect        |                                  \n|                                  | enables high-performance         |                                  \n|                                  | collective compute for           |                                  \n|                                  | co-optimization of latency and   |                                  \n|                                  | throughput.                      |\n+----------------------------------+----------------------------------+\n| Programmability                  | Inferentia2 supports dynamic     |\n|                                  | shapes and control flow, via ISA |\n|                                  | extensions of NeuronCore-v2 and  |\n|                                  | :ref:`custom-operators           |\n|                                  | <feature-custom-c++-operators>`  |\n|                                  | via the deeply embedded GPSIMD   |\n|                                  | engines.                         |\n+----------------------------------+----------------------------------+\n\nFor a more detailed description of all the hardware engines, see :ref:`NeuronCore-v2 <neuroncores-v2-arch>`."
  },
  {
    "path": "about-neuron/arch/neuron-hardware/neuron-core-v1.rst",
    "content": ".. _neuroncores-v1-arch:\n\n\nNeuronCore-v1 Architecture\n--------------------------\n\nNeuronCore-v1 is the first generation NeuronCore engine, powering\nthe Inferentia chips. Each NeuronCore-v1 is a fully-independent\nheterogenous compute-unit, with three main engines (Tensor/Vector/Scalar\nEngines), and on-chip software-managed SRAM memory, for\nmaximizing data locality (compiler managed, for maximum data locality\nand optimized data prefetch).\n\n.. image:: /images/nc-v1.png\n\n\nThe ScalarEngine is optimized for scalar computations, in which every\nelement of the output is dependent on one element of the input, e.g.,\nnon-linearities such as GELU, SIGMOID, or EXP. The ScalarEngine is highly\nparallelized, and can process 512 floating point operations per cycle.\nIt can handle various data types, including FP16, BF16, FP32, INT8,\nINT16, and INT32. \n\nThe VectorEngine is optimized for vector computations,\nin which every element of the output is dependent on multiple input\nelements. Examples include ‘axpy’ operations (Z=aX+Y), Layer\nNormalization, Pooling operations, and many more. The VectorEngine is\nalso highly parallelized, and can perform 256 floating point operations\nper cycle. It can handle various data-types, including FP16, BF16, FP32,\nINT8, INT16, and INT32.\n\nThe TensorEngine is based on a power-optimized systolic array, which is\nhighly optimized for tensor computations (e.g., GEMM, CONV, Reshape,\nTranspose), and supports mixed-precision computations (FP16/BF16/INT8\ninputs, FP32/INT32 outputs). Each NeuronCore-v1 TensorEngine delivers 16\nTFLOPS of FP16/BF16 tensor computations."
  },
  {
    "path": "about-neuron/arch/neuron-hardware/neuron-core-v2.rst",
    "content": ".. _neuroncores-v2-arch:\n\nNeuronCore-v2 Architecture\n--------------------------\n\nNeuronCore-v2 is the second generation of the NeuronCore engine,\npowering the Trainium chips. Each NeuronCore-v2 is a\nfully-independent heterogenous compute-unit, with 4 main engines\n(Tensor/Vector/Scalar/GPSIMD Engines), and on-chip\nsoftware-managed SRAM memory, for maximizing data locality (compiler\nmanaged, for maximum data locality and optimized data prefetch).\n\n\n.. image:: /images/nc-v2.png\n\nJust like in NeuronCore-v1, The ScalarEngine is optimized for\nscalar-computations, in which every element of the output is dependent\non one element of the input. The ScalarEngine is highly parallelized,\nand delivers 2.9 TFLOPS of FP32 computations (3x speedup\nrelative to NeuronCore-v1). The NeuronCore-v2 ScalarEngine can handle\nvarious data types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16,\nand INT32. \n\nThe VectorEngine is optimized for vector computations, in\nwhich every element of the output is dependent on multiple input\nelements. Examples include ‘axpy’ operations (Z=aX+Y), Layer\nNormalization, Pooling operations, and many more. The VectorEngine is\nalso highly parallelized, and delivers 2.3 TFLOPS of FP32 computations \n(10x speedup vs. NeuronCore-v1). The NeuronCore-v2\nVectorEngine can handle various data-types, including cFP8, FP16, BF16,\nTF32, FP32, INT8, INT16 and INT32.\n\nThe TensorEngine is based on a power-optimized systolic-array, which is\nhighly optimized for tensor computations (e.g., GEMM, CONV, \nTranspose), and supports mixed-precision computations (cFP8 / FP16 /\nBF16 / TF32 / FP32 / INT8 inputs, FP32 / INT32 outputs). Each\nNeuronCore-v2 TensorEngine delivers over 90 TFLOPS of FP16/BF16 tensor\ncomputations (6x speedup from NeuronCore-v1). \n\nNeuronCore-v2 also introduces a new engine called the\nGPSIMD-Engine, which consists of eight fully-programmable 512-bit wide \nvector processors, which can execute general purpose C-code and access the \nembedded on-chip SRAM memory. With these cores, customers can implement \ncustom operators and execute them directly on the NeuronCores.\n\nNeuronCore-v2 also adds support for control flow, dynamic shapes, and\nprogrammable :ref:`rounding mode <neuron-rounding-modes>` (RNE & Stochastic-rounding).\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/neuron-core-v3.rst",
    "content": ".. _neuroncores-v3-arch:\n\nNeuronCore-v3 Architecture\n--------------------------\n\nNeuronCore-v3 is the third-generation NeuronCore that powers Trainium2 chips. It is a fully-independent heterogenous compute \nunit consisting of 4 main engines: Tensor, Vector, Scalar, and GPSIMD, with on-chip software-managed SRAM memory to maximize data \nlocality and optimize data prefetch. The following diagram shows a high-level overview of the NeuronCore-V3 architecture.\n\n.. image:: /images/architecture/NeuronCore/nc-v3.png\n    :align: center\n    :width: 250\n|\nNeuronCore-v3 is made up of the following components:\n\nOn-chip SRAM \n\"\"\"\"\"\"\"\"\"\"\"\"\nEach NeuronCore-v3 has a total of 28MB of on-chip SRAM. NeuronCore-v3 on-chip SRAM is software-managed to maximize data locality \nand optimize data prefetch. \n\nTensor Engine\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nTensor engines are based on a power-optimized systolic array. They are highly optimized for tensor computations such as GEMM, CONV, and \nTranspose. Tensor Engines support mixed-precision computations, including cFP8, FP16, BF16, TF32, and FP32 inputs and outputs. \nA NeuronCore-v3 Tensor Engine delivers 158 cFP8 TFLOPS, and 79 BF16/FP16/TF32 TFLOPS of tensor computations. \n\nLike NeuronCore-v2, NeuronCore-v3 supports control flow, dynamic shapes, and programmable rounding mode (RNE & Stochastic-rounding). \nNeuronCore-v3 also supports adjustable exponent biasing for the cFP8 data type.\n   \nThe NeuronCore-v3 Tensor Engine also supports Structured Sparsity, delivering up to 316 TFLOPS of cFP8/FP16/BF16/TF32 \ncompute. This is useful when one of the input tensors to matrix multiplication exhibits a M:N sparsity pattern, where only M elements \nout of every N contiguous elements are non-zero. NeuronCore-v3 supports several sparsity patterns, including 4:16, 4:12, 4:8, 2:8, \n2:4, 1:4, and 1:2. \n\nVector Engine\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nOptimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include \naxpi operations (Z=aX+Y), Layer Normalization, and Pooling operations. \n\nVector Engines are highly parallelized, and deliver a total of 1 TFLOPS of FP32 computations. NeuronCore-v3 Vector Engines can handle \nvarious data-types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32. \n\nScalar engine\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nOptimized for scalar computations in which every element of the output is dependent on one element of the input. Scalar Engines are \nhighly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. NeuronCore-v3 Scalar Engines support multiple data \ntypes, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.\n\nGPSIMD engine\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nEach GPSIMD engine consists of eight fully-programmable 512-bit wide vector processors. They can execute general purpose C-code and \naccess the embedded on-chip SRAM, allowing you to implement custom operators and execute them directly on the NeuronCores.\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/neuron-core-v4.rst",
    "content": ".. meta::\n    :description: \"NeuronCore-v4 architecture overview and components.\"\n    :date-modified: 12/02/2025\n\n\n.. _neuroncores-v4-arch:\n\nNeuronCore-v4 Architecture\n===========================\n\nNeuronCore-v4 is the fourth-generation NeuronCore that powers Trainium3 chips. It is a fully-independent heterogenous compute unit consisting of 4 main engines: Tensor, Vector, Scalar, and GPSIMD, with on-chip software-managed SRAM memory to maximize data locality and optimize data prefetch. \n\nThe following diagram shows a high-level overview of the NeuronCore-v4 architecture.\n\n.. image:: /images/architecture/trn3/neuroncore-v4.png\n    :align: center\n\nLike previous generations of NeuronCore, NeuronCore-v4 supports control flow, dynamic shapes, and programmable rounding mode (RNE & Stochastic-rounding). NeuronCore-v4 is made up of the following components:\n\nOn-chip SRAM\n-------------\n\nEach NeuronCore-v4 has a total of 32MiB of on-chip SRAM. The on-chip SRAM is software-managed to maximize data locality and optimize data prefetch. NeuronCore-v4 SRAM also introduces a new near-memory accumulation feature, which allows DMA engines to perform a read-add-write operation into existing SRAM data via a single transfer. \n\nTensor Engine\n--------------\n\nTensor engines are based on a power-optimized systolic array. They are highly optimized for tensor computations such as GEMM, CONV, and Transpose. Tensor Engines support mixed-precision computations, including MXFP8/MXFP4, FP16, BF16, TF32, and FP32 inputs. The output data type can either be FP32 or BF16. A NeuronCore-v4 Tensor Engine delivers 315 MXFP8/MXFP4 TFLOPS, where MXFP8/MXFP4 are OCP (Open Compute Project) compliant data type formats. MXFP4 data types are converted to MXFP8 before Tensor Engine computation logic, using any arbitrary programmer-defined mapping. Besides quantized data types, a NeuronCore-v4 Tensor Engine also delivers 79 BF16/FP16/TF32 and 20 FP32 TFLOPS of tensor computations. \n\nThe NeuronCore-v4 Tensor Engine also supports Structured Sparsity, delivering up to 315 TFLOPS of FP16/BF16/TF32 compute. This is useful when one of the input tensors to matrix multiplication exhibits a M:N sparsity pattern, where only M elements out of every N contiguous elements are non-zero. NeuronCore-v4 supports several sparsity patterns, including 4:16, 4:12, 4:8, 2:8, 2:4, 1:4, and 1:2.\n\n\nVector Engine\n----------------\n\nOptimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include axpi operations (Z=aX+Y), Layer Normalization, and Pooling operations.\n\nVector Engines are highly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. NeuronCore-v3 Vector Engines can handle various data-types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32. \n\nIn addition, NeuronCore-v4 Vector Engine supports two new features:\n\n1. Data quantization into MXFP8 data type formats from BF16/FP16, which is particularly useful for online data quantization in between MLP (multi-layer perceptron) layers. \n2. Fast exponential functional evaluation, at 4x higher throughput than exponential on Scalar Engine, which is particularly useful in self attention acceleration.\n\n\n\nScalar Engine\n---------------\n\nOptimized for scalar computations in which every element of the output is dependent on one element of the input. Scalar Engines are highly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. 
NeuronCore-v4 Scalar Engines support multiple data types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.\n\nGPSIMD Engine\n-------------\n\nEach GPSIMD engine consists of eight fully-programmable 512-bit wide vector processors. They can execute general purpose C/C++ code and access the embedded on-chip SRAM, allowing you to implement custom operators and execute them directly on the NeuronCores.\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trainium.rst",
    "content": ".. _trainium-arch:\n\n\nTrainium Architecture\n----------------------\n\nAt the heart of the Trn1 instance are 16 x Trainium chips (each Trainium include 2 x :ref:`NeuronCore-v2 <neuroncores-v2-arch>`). Trainium is the second\ngeneration purpose-built Machine Learning accelerator from AWS. The\nTrainium chip architecture is depicted below:\n\n.. image:: /images/trainium-neurondevice.png\n\n\nEach Trainium chip consists of:\n\n+----------------------------------+----------------------------------+\n| Compute                          | Two :ref:`NeuronCore-v2          |\n|                                  | <neuroncores-v2-arch>`           |\n|                                  | delivering 380 INT8 TOPS,        |\n|                                  | 190 FP16/BF16/cFP8/TF32 TFLOPS,  |\n|                                  | and 47.5 FP32 TFLOP.             |\n+----------------------------------+----------------------------------+\n| Device Memory                    | 32 GiB of device memory (for     |                                  \n|                                  | storing model state), with 820   |                                  \n|                                  | GiB/sec of bandwidth.            |             \n+----------------------------------+----------------------------------+\n| Data Movement                    | 1 TB/sec of DMA bandwidth, with  |\n|                                  | inline memory                    |\n|                                  | compression/decompression.       |\n+----------------------------------+----------------------------------+\n| NeuronLink                       | NeuronLink-v2 for                |\n|                                  | chip-to-chip interconnect        |\n|                                  | enables efficient scale-out      |\n|                                  | training, as well as memory      |\n|                                  | pooling between the different    |\n|                                  | Trainium chips.                  |\n+----------------------------------+----------------------------------+\n| Programmability                  | Trainium supports dynamic shapes |\n|                                  | and control flow, via ISA        |\n|                                  | extensions of NeuronCore-v2. In  |\n|                                  | addition, Trainium also allows   |\n|                                  | for user-programmable            |\n|                                  | :ref:`rounding mode              |\n|                                  | <neuron-rounding-modes>`         |\n|                                  | (Round Nearest Even Stochastic   |\n|                                  | Rounding), and custom operators  |\n|                                  | via the deeply embedded GPSIMD   |\n|                                  | engines.                         |\n+----------------------------------+----------------------------------+\n\n\nFor a detailed description of all the hardware engines, see :ref:`NeuronCore-v2 <neuroncores-v2-arch>`\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trainium2.rst",
    "content": ".. _trainium2-arch:\n\n######################\nTrainium2 Architecture\n######################\n\nTrainium2 is the third generation, purpose-built Machine Learning chip from AWS. Every Trainium2 chip contains eight NeuronCore-V3 cores. Beginning with Trainium2, AWS Neuron adds support for Logical \nNeuronCore Configuration (LNC), which lets you combine the compute and memory resources of multiple physical NeuronCores into a \nsingle logical NeuronCore. The following diagram shows the architecture overview of a Trainium2 chip.\n\n.. image:: /images/architecture/Trainium2/trainium2.png\n    :align: center\n    :width: 400\n    \n===========================\nTrainium2 chip components\n===========================\n\nEach Trainium2 chip consists of the following components:\n\n+----------------------------------+-----------------------------------------------------+\n| Compute                          | Eight NeuronCore-v3 that collectively deliver:      |\n|                                  |                                                     |\n|                                  | * 1,299 FP8 TFLOPS                                  | \n|                                  | * 667 BF16/FP16/TF32 TFLOPS                         |\n|                                  | * 2,563 FP8/FP16/BF16/TF32 sparse TFLOPS            |\n|                                  | * 181 FP32 TFLOPS                                   |\n|                                  |                                                     |\n+----------------------------------+-----------------------------------------------------+\n| Device Memory                    | 96 GiB of device memory with 2.9 TB/sec of          |\n|                                  | bandwidth.                                          |             \n+----------------------------------+-----------------------------------------------------+\n| Data Movement                    | 3.5 TB/sec of DMA bandwidth, with inline            |\n|                                  | memory compression and decompression.               |\n+----------------------------------+-----------------------------------------------------+\n| NeuronLink                       | NeuronLink-v3 for chip-to-chip interconnect         |\n|                                  | provides 1.28 TB/sec bandwidth per chip. It allows  |\n|                                  | for efficient scale-out training and inference, as  |\n|                                  | well as memory pooling between Trainium2 chips.     |\n+----------------------------------+-----------------------------------------------------+\n| Programmability                  | Trainium2 supports dynamic shapes and control flow  |\n|                                  | via NeuronCore-v3 ISA extensions. Trainium2 also    |\n|                                  | allows for user-programmable                        |\n|                                  | :ref:`rounding mode <neuron-rounding-modes>`        |\n|                                  | (Round Nearest Even or Stochastic Rounding), and    |\n|                                  | custom operators via deeply embedded GPSIMD engines.|\n+----------------------------------+-----------------------------------------------------+\n| Collective communication         | 16 CC-Cores orchestrate collective communication    |\n|                                  | among Trainium2 chips within and across instances.  
|\n+----------------------------------+-----------------------------------------------------+     \n\n==================================\nTrainium2 performance improvements\n==================================\n\nThe following set of tables offer a comparison between Trainium and Trainium2 chips. \n \nCompute\n\"\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1 \n    :stub-columns: 1    \n    :align: left\n      \n    *   - \n        - Trainium\n        - Trainium2\n        - Improvement factor\n    \n    *   - FP8 (TFLOPS)\n        - 191\n        - 1299\n        - 6.7x\n    *   - BF16/FP16/TF32 (TFLOPS)\n        - 191\n        - 667\n        - 3.4x\n    *   - FP32 (TFLOPS)\n        - 48\n        - 181\n        - 3.7x\n    *   - FP8/FP16/BF16/TF32 Sparse (TFLOPS)\n        - Not applicable\n        - 2563 \n        - Not applicable\n\nMemory\n\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1 \n    :stub-columns: 1    \n    :align: left\n      \n    *   - \n        - Trainium\n        - Trainium2\n        - Improvement factor\n    \n    *   - HBM Capacity (GiB)\n        - 32\n        - 96\n        - 3x\n    *   - HBM Bandwidth (TB/sec)\n        - 0.8\n        - 2.9\n        - 3.6x\n    *   - SBUF Capacity (MiB)\n        - 48\n        - 224\n        - 4.7x\n    *   - Memory Pool Size\n        - Up to 16 chips\n        - Up to 64 chips\n        - 4x\n\nInterconnect\n\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1 \n    :stub-columns: 1    \n    :align: left\n      \n    *   - \n        - Trainium\n        - Trainium2\n        - Improvement factor\n    \n    *   - Inter-chip Interconnect (GB/sec/chip)\n        - 384\n        - 1280\n        - 3.3x\n\nData movement\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n.. list-table::\n    :widths: auto\n    :header-rows: 1 \n    :stub-columns: 1    \n    :align: left\n      \n    *   - \n        - Trainium\n        - Trainium2\n        - Improvement factor\n    \n    *   - CC Cores\n        - 6\n        - 16\n        - 3.3x\n    *   - DMA barriers\n        - Write-after-write\n        - Strong-order-write\n        - \\>1x (Benefit DMA-size dependent)\n    *   - SBUF memory layout\n        - Row-major\n        - Row-major, Col-major-2B, Col-major-4B\n        - Not applicable\n\n====================\nAdditional resources\n====================\n\nFor a detailed description of NeuronCore-v3 hardware engines, instances powered by AWS Trainium2, and Logical NeuronCore configuration, see the following resources:\n\n* :ref:`NeuronCore-v3 architecture <neuroncores-v3-arch>`\n* :ref:`Amazon EC2 Trn2 architecture <aws-trn2-arch>`\n* :ref:`Logical NeuronCore configuration <logical-neuroncore-config>`\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trainium3.rst",
    "content": ".. meta::\n    :description: \"Neuron Trainium3 (Trn3) architecture overview.\"\n    :date-modified: 12/02/2025\n\n.. _trainium3-arch:\n\nTrainium3 Architecture\n=======================\n\nTrainium3 is the fourth-generation purpose-built Machine Learning chip from AWS. A Trainium3 device contains eight NeuronCore-v4 cores. Similar to Trainium2, AWS Neuron adds support for Logical NeuronCore Configuration (LNC), which lets you combine the compute and memory resources of multiple physical NeuronCores into a single logical NeuronCore. The following diagram shows the architecture overview of a Trainium3 chip.\n\n.. image:: /images/architecture/trn3/neuroncore-v4-overview.png\n    :align: center\n\nNeuronCore-v4\n--------------\n\nEach Trainium3 chip consists of the following components:\n\n.. list-table::\n    :widths: auto\n    :header-rows: 0\n    :stub-columns: 1\n    :align: left\n\n    *   - Compute\n        - Eight NeuronCore-v4 cores that collectively deliver:\n\n          * 2,517 MXFP8/MXFP4 TFLOPS\n          * 671 BF16/FP16/TF32 TFLOPS\n          * 2,517 FP16/BF16/TF32 sparse TFLOPS\n          * 183 FP32 TFLOPS\n\n    *   - Device memory\n        - 144 GiB of device memory, with 4.9 TB/sec of bandwidth.\n\n    *   - Data movement\n        - 4.9 TB/sec of DMA bandwidth, with inline computation.\n\n    *   - NeuronLink\n        - NeuronLink-v4 for device-to-device interconnect provides 2.56 TB/sec bandwidth per device. It enables efficient scale-out training, as well as memory pooling between the different Trainium3 devices.\n\n    *   - Programmability\n        - Trainium3 supports dynamic shapes and control flow, via ISA extensions of NeuronCore-v4. Trainium3 also allows for user-programmable rounding mode (Round Nearest Even or Stochastic Rounding), and custom operators via the deeply embedded GPSIMD engines.\n\n    *   - Collective communication\n        - 16 CC-Cores orchestrate collective communication among Trainium3 devices, both within a server and across servers.\n\nTrainium3 performance improvements\n-----------------------------------\n\nThe following set of tables offer a comparison between Trainium2 and Trainium3 chips.\n\nCompute\n\"\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   -\n        - Trainium2\n        - Trainium3\n        - Improvement factor\n\n    *   - MXFP4 (TFLOPS)\n        - Not applicable\n        - 2517\n        - -\n    *   - FP8 (TFLOPS)\n        - 1299\n        - 2517\n        - 2x\n    *   - BF16/FP16/TF32 (TFLOPS)\n        - 667\n        - 671\n        - 1x\n    *   - FP32 (TFLOPS)\n        - 181\n        - 183\n        - 1x\n\nMemory\n\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   -\n        - Trainium2\n        - Trainium3\n        - Improvement factor\n\n    *   - HBM Capacity (GiB)\n        - 96\n        - 144\n        - 1.5x\n    *   - HBM Bandwidth (TB/sec)\n        - 2.9\n        - 4.9\n        - 1.7x\n    *   - SBUF Capacity (MiB)\n        - 224\n        - 256\n        - 1.14x\n\nInterconnect\n\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   -\n        - Trainium2\n        - Trainium3\n        - Improvement factor\n\n    *   - Inter-chip Interconnect (GB/sec/chip)\n        - 1280\n        - 2560\n        - 2x\n\nData movement\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   -\n        - Trainium2\n        - Trainium3\n        - Improvement factor\n\n    *   - DMA Bandwidth (TB/sec)\n        - 3.5\n        - 4.9\n        - 1.4x\n\nAdditional resources\n----------------------\n\nFor a detailed description of NeuronCore-v4 hardware engines, instances powered by AWS Trainium3, and Logical NeuronCore configuration, see the following resources:\n\n* :ref:`NeuronCore-v4 architecture <neuroncores-v4-arch>`\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trn1-arch.rst",
    "content": ".. _aws-trn1-arch:\n\nAmazon EC2 Trn1/Trn1n Architecture\n===================================\n\nOn this page, we provide an architectural overview of the AWS Trn1/Trn1n\ninstances, and the corresponding :ref:`Trainium <trainium-arch>` NeuronChips that power them\n(Trainium chips from here on).\n\n.. contents::  Table of contents\n   :local:\n   :depth: 2\n\n.. _trn1-arch:\n\nTrn1/Trn1n Architecture\n-----------------------\n\nAn EC2 Trn1/Trn1n instance is powered by up to 16 :ref:`Trainium <trainium-arch>` chips.\n\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1    \n    :align: left\n      \n\n    *   - Instance size\n        - # of Trainium chips\n        - vCPUs\n        - Host Memory (GiB)\n        - FP8/FP16/BF16/TF32 TFLOPS\n        - FP32 TFLOPS\n        - Device Memory (GiB)\n        - Device Memory Bandwidth (GiB/sec)\n        - EFA bandwidth (Gbps)\n\n    *   - Trn1.2xlarge\n        - 1\n        - 8\n        - 32\n        - 190\n        - 47.5\n        - 32\n        - 820\n        - N/A\n        - up-to 25 \n\n    *   - Trn1.32xlarge\n        - 16\n        - 128\n        - 512\n        - 3,040\n        - 760\n        - 512\n        - 13,120\n        - 384\n        - 800\n\n    *   - Trn1n.32xlarge\n        - 16\n        - 128\n        - 512\n        - 3,040\n        - 760\n        - 512\n        - 13,120\n        - 768\n        - 1,600\n\n\nThe Trn1.2xlarge instance size allows customers to train their models on\na single Trainium chip, which is useful for small model training, as\nwell as for model experimentation. The Trn1.32xlarge and Trn1n.32xlarge instance size come\nwith a high-bandwidth and low-latency NeuronLink-v2 chip-to-chip\ninterconnect, which utilizes a 2D Torus topology. This is useful for\ncollective communication between the Trainium chips during scale-out\ntraining, as well as for pooling the memory capacity of all Trainium\nchips, making it directly addressable from each of the chips.\n\nIn a Trn1/Trn1n server, the Trainium chips are connected in a 2D Torus topology, as depicted below:\n\n.. image:: /images/trn1-topology.png\n\nThe Trn1/Trn1n instances are also available in an EC2 UltraCluster, which\nenables customers to scale Trn1/Trn1n instances to over 100,000 Trainium\nchips, and leverage the AWS-designed non-blocking petabit-scale EFA\nnetworking infrastructure.\n\n.. image:: /images/ultracluster-1.png\n\n\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trn2-arch.rst",
    "content": ".. _aws-trn2-arch:\n\n############################\nAmazon EC2 Trn2 Architecture\n############################\n\nTrn2 is an Amazon EC2 accelerated computing instance, purpose built for high-performance deep learning training and inference. This page provides \nan architecture overview of the trn2.48xlarge and trn2u.48xlarge instances, and Trn2 UltraServer.\n\n.. contents::  Topics\n   :local:\n   :depth: 2\n\n.. _trn2-arch:\n\nTrn2 instance sizes\n===================\n\nTrn2 instances and UltraServers are available in the following sizes and configurations:\n\n* trn2.48xlarge\n* trn2u.48xlarge\n* Trn2 UltraServer\n\n.. _trn2-instance:\n\ntrn2.48xlarge / trn2u.48xlarge\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nTrn2 instances are powered by 16 Trainium2 chips connected using a high-bandwidth, low-latency NeuronLink-v3 \nchip-to-chip interconnect. The NeuronLink-v3 chip-to-chip interconnect enables collective communication between Trainium2 \nchips during distributed training and inference. It also allows for the pooling of memory resources from all 16 Trainium2 chips.  \n\nIn a trn2.48xlarge or trn2u.48xlarge instance, 16 Trainium2 chips are connected using a 4x4, 2D Torus topology. The following diagram shows the \nintra-instance connections of a trn2.48xlarge or trn2u.48xlarge instance\n\n.. image:: /images/architecture/Trn2/trn2.48xlarge.png\n    :align: center\n    :width: 650\n|\n\n.. _trn2-ultraserver: \n\nTrn2 UltraServer\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nA Trn2 UltraServer comprises four trn2u.48xlarge instances connected together via the NeuronLink-v3 chip-to-chip interconnect. \nThis allows for a total of 64 Trainium2 chips to be interconnected within a Trn2 UltraServer. Trainium2 chips with the same \ncoordinates in each Trn2 instance are connected in a ring topology. The following figure shows the inter-instance ring connection \nbetween Trainium2 chips.\n\n.. image:: /images/architecture/Trn2/u-trn2x64.png\n    :align: center\n    :width: 650\n|\nTrn2 instance specifications \n============================\n\nThe following table shows the performance metrics for Trainium2 based instances.\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1    \n    :align: left\n      \n\n    *   - Perfomance specification\n        - trn2.48xlarge / trn2u.48xlarge\n        - Trn2 UltraServer\n    *   - # of Trainium2 chips\n        - 16\n        - 64\n    *   - vCPUs\n        - 192\n        - 768\n    *   - Host Memory (GiB)\n        - 2,048\n        - 8,192\n    *   - FP8 PFLOPS\n        - 20.8\n        - 83.2\n    *   - FP16/BF16/TF32 PFLOPS\n        - 10.7\n        - 42.8\n    *   - FP8/FP16/BF16/TF32 Sparse PFLOPS\n        - 41\n        - 164\n    *   - FP32 PFLOPS\n        - 2.9\n        - 11.6\n    *   - Device Memory (GiB)\n        - 1,536\n        - 6,144\n    *   - Device Memory Bandwidth (TB/sec)\n        - 46.4\n        - 185.6\n    *   - Intra-instance NeuronLink-v3 bandwidth (GB/sec/chip)\n        - 1,024\n        - 1,024\n    *   - Inter-instance NeuronLink-v3 bandwidth (GB/sec/chip)\n        - Not applicable\n        - 256\n    *   - EFAv3 bandwidth (Gbps)\n        - 3,200\n        - 3,200\n\n"
  },
  {
    "path": "about-neuron/arch/neuron-hardware/trn3-arch.rst",
    "content": ".. _aws-trn3-arch:\n\n###############################\nAmazon EC2 Trn3 Architecture\n###############################\n\nAmazon EC2 **Trn3** instances are accelerated computing instances powered by Trainium3 AI chips, purpose-built for high-performance deep learning training and inference. Trn3 is available in two UltraServer scale-up configurations: Gen1 with 64 Trainium3 chips per UltraServer, and Gen2 with 144 chips per UltraServer. Both configurations use NeuronSwitch-v1 interconnect technology to enable all-to-all connectivity between chips, especially optimized for workloads that leverage all-to-all communication patterns, such as Mixture of Experts models and autoregressive inference serving.\n\n=====================\nTrn3 Gen1 UltraServer\n=====================\n\nThe EC2 Trn3 Gen1 UltraServers deliver 161 PetaFLOPS of dense MXFP8 compute, 314 TB/s of HBM bandwidth, and 9TB of HBM capacity. Each UltraServer consists of four servers with 16 Trainium3 devices per server. Therefore, the UltraServer integrates a total of 64 Trainium3 devices into a single scale-up domain, interconnected via our latest-generation NeuronLink-v4 and the newly introduced NeuronSwitch-v1. The chip-to-chip topology features an all-to-all connectivity design, replacing the previous 2D-torus architecture. This all-to-all topology is optimized for workloads that require efficient all-to-all communication patterns or ultra-low latency collectives, including Mixture of Experts models and autoregressive inference serving. The following diagram illustrates the Trn3 Gen1 UltraServer connectivity.\n\n.. image:: /images/architecture/trn3/trn3-ultraserver-gen1.png\n    :align: center\n\n\n=====================\nTrn3 Gen2 UltraServer\n=====================\n\nThe EC2 Trn3 Gen2 UltraServers deliver 362 PetaFLOPS of dense MXFP8 compute, 706 TB/s of HBM bandwidth, and 20TB of HBM capacity. Each UltraServer consists of 36 servers with 4 Trainium3 devices per server. Trainium3 devices within the same server are connected via a first-level NeuronSwitch-v1, while devices across servers are connected via two second-level NeuronSwitch-v1 and NeuronLink-v4. Therefore, the UltraServer integrates 144 Trainium3 devices into a single scale-up domain. Like Gen1, the chip-to-chip topology features an all-to-all connectivity design optimized for Mixture of Experts models and autoregressive inference serving. The following diagram illustrates the Trn3 Gen2 UltraServer connectivity.\n\n.. image:: /images/architecture/trn3/trn3-ultraserver-gen2.png\n    :align: center\n\n==========================================\nTrn3 Gen1/Gen2 UltraServer specifications\n==========================================\n\nThe following table shows the performance metrics for Tranium3 based instances.\n\n.. 
list-table::\n   :header-rows: 2\n   :stub-columns: 1\n   :widths: 30 20 20\n\n   * - \n     - Trn3 Gen1 UltraServer\n     - Trn3 Gen2 UltraServer\n   * - Configuration\n     - \n     - \n   * - # of Trainium3 devices\n     - 64\n     - 144\n   * - Host vCPUs\n     - 768\n     - 2304\n   * - Host Memory (GiB)\n     - 8,192\n     - 27,648\n   * - **Compute**\n     - \n     - \n   * - MXFP8/MXFP4 TFLOPS\n     - 161,088\n     - 362,448\n   * - FP16/BF16/TF32 TFLOPS\n     - 42,944\n     - 96,624\n   * - FP32 TFLOPS\n     - 11,712\n     - 26,352\n   * - **Memory**\n     - \n     - \n   * - Device Memory (GiB)\n     - 9,216\n     - 20,736\n   * - Device Memory Bandwidth (TB/sec)\n     - 313.6\n     - 705.6\n   * - **Interconnect**\n     - \n     - \n   * - NeuronLink-v4 bandwidth (GiB/sec/device)\n     - 2,048\n     - 2,048\n   * - EFA bandwidth (Gbps)\n     - 12,800\n     - 28,800\n  \n============================================\nTrn3 UltraServer Connectivity and Networking\n============================================\n\nTrn3 UltraServers use a PCIe switch-based interconnect architecture for all chip-to-chip communication, both within and across servers. This replaces the point-to-point NeuronLink topology used in previous generations (Trn1, Trn2) with a switched fabric that enables flexible, all-to-all connectivity across the entire UltraServer domain.\n\nIntra-server connectivity\n-------------------------\n\nEach server (sled) contains 4 Trainium3 chips connected through an intra-server PCIe switch. Each chip provides four PCIe Gen6 x8 links to this switch, delivering a total of 256 GB/s of bidirectional bandwidth between chips within the same server. This local switch enables low-latency communication for operations like tensor parallelism and data-parallel gradient synchronization within a server.\n\nInter-server connectivity\n-------------------------\n\nAll servers within a rack are connected through inter-server PCIe switches. Each Trainium3 chip provides five PCIe Gen6 x8 links to the inter-server switch, delivering 320 GB/s of bidirectional bandwidth per chip for cross-server communication. This enables collective operations such as all-reduce and all-gather to span all servers in a rack without requiring host CPU involvement.\n\nInter-rack connectivity\n-----------------------\n\nFor multi-rack configurations, Trainium3 chips in corresponding positions across racks are connected via dedicated direct PCIe links. Each chip provides two PCIe Gen6 x8 links for inter-rack communication, delivering 128 GB/s of bidirectional bandwidth per chip between racks. This direct-link design avoids additional switch hops for cross-rack traffic.\n\nBandwidth summary\n-----------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 30 40\n\n   * - Connectivity level\n     - Bandwidth per chip\n     - Link configuration\n   * - Intra-server (within sled)\n     - 256 GB/s\n     - 4 × PCIe Gen6 x8 via intra-server switch\n   * - Inter-server (within rack)\n     - 320 GB/s\n     - 5 × PCIe Gen6 x8 via inter-server switch\n   * - Inter-rack\n     - 128 GB/s\n     - 2 × PCIe Gen6 x8 direct links\n\nRouting and address-based switching\n------------------------------------\n\nUnlike Trn1 and Trn2, where NeuronLink connections are point-to-point and require no intermediate routing, Trn3's PCIe switch fabric uses address-based routing to direct transactions to the correct destination chip. 
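The following Python sketch is purely illustrative (the field widths, bit positions, and helper names are assumptions, not the documented Trainium3 address map); it shows the general idea of packing a destination identity into the upper bits of an address and decoding it again at a switch:\n\n.. code-block:: python\n\n    # Assumed field widths for illustration only.\n    RACK_BITS, SERVER_BITS, CHIP_BITS = 4, 6, 2\n    LOCAL_ADDR_BITS = 48  # assumed per-chip local address space\n\n    def encode_address(rack, server, chip, local_offset):\n        # Pack the destination identity into the upper bits of the outbound address.\n        dest = (rack << (SERVER_BITS + CHIP_BITS)) | (server << CHIP_BITS) | chip\n        return (dest << LOCAL_ADDR_BITS) | local_offset\n\n    def route(address):\n        # Switch-style decode: recover (rack, server, chip) from the upper bits,\n        # analogous to BAR address matching selecting an output port.\n        dest = address >> LOCAL_ADDR_BITS\n        chip = dest & ((1 << CHIP_BITS) - 1)\n        server = (dest >> CHIP_BITS) & ((1 << SERVER_BITS) - 1)\n        rack = dest >> (CHIP_BITS + SERVER_BITS)\n        return rack, server, chip\n\n    assert route(encode_address(3, 17, 2, 0x1000)) == (3, 17, 2)\n\n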
Each Trainium3 chip in the system is identified by a tuple of (rack, server, chip), and this identity is encoded in the upper bits of the PCIe address used for outbound transactions. The PCIe switches use BAR (Base Address Register) address matching to determine the correct output port for each transaction.\n\nThis routing is transparent to ML workloads. The Neuron Runtime and compiler handle all address encoding and switch configuration automatically. From the developer's perspective, collective operations and direct memory access between chips work the same way as on previous Trainium generations.\n\nSemaphore-based synchronization\n-------------------------------\n\nTrn3 uses hardware semaphores to synchronize data transfers across the switched fabric. When a chip writes data to a remote chip's HBM, a follow-up semaphore write signals completion to the receiving chip. The system guarantees that data and its associated semaphore always traverse the same physical path through the switch fabric, ensuring correct ordering without additional software synchronization overhead.\n"
  },
  {
    "path": "about-neuron/benchmarks/index.rst",
    "content": ".. _benchmark:\n\n.. meta::\n   :description: Explore AWS Neuron performance benchmarks for Inf1, Inf2, and Trn1 instances. Find detailed inference and training performance data across NLP, CV, and recommender models to optimize your machine learning workloads.\n   :date-modified: 2025-10-03\n\nNeuron performance\n==================\n\nThe Neuron performance pages provide comprehensive benchmarks and performance data for AWS Neuron SDK across different Trainium and Inferentia instance types. These benchmarks cover various open-source models for Natural Language Processing (NLP), Computer Vision (CV), and Recommender systems. Each benchmark includes detailed setup instructions and reproducible test configurations to help you evaluate performance for your specific use cases.\n\nInference performance\n---------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: appnote-performance-benchmark\n      :link-type: ref\n\n      **Inf1 Inference Performance**\n      ^^^\n      Comprehensive inference benchmarks for ``Inf1`` instances across NLP, CV, and recommender models\n\n   .. grid-item-card::\n      :link: inf2-performance\n      :link-type: ref\n\n      **Inf2 Inference Performance**\n      ^^^\n      Latest inference performance data for ``Inf2`` instances with improved throughput and latency metrics\n\n   .. grid-item-card::\n      :link: trn1-inference-performance\n      :link-type: ref\n\n      **Trn1 Inference Performance**\n      ^^^\n      Inference benchmarks for ``Trn1`` instances showcasing versatile training and inference capabilities\n\nTraining performance\n--------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: trn1-training-performance\n      :link-type: ref\n\n      **Trn1 Training Performance**\n      ^^^\n      Training performance benchmarks for ``Trn1`` instances with distributed training metrics and scalability data\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   inf1/index\n   inf2/inf2-performance\n   trn1/trn1-inference-performance\n   trn1/trn1-training-performance\n\n\n"
  },
  {
    "path": "about-neuron/benchmarks/inf1/data.csv",
    "content": "Name,Model,Model details,Framework,Application Type,Run Mode,Inst. Type,Num. Cores,Batch Size,Avg Throughput (/sec),Max Throughput,Threads,Ops in Inferentia,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Latency P100 (ms),Neuron Version,Application,Tutorial\r\n\"YOLOv4-PT(fp32,b1,c4)\",YOLO v4,fp32,PyTorch 1.13,Real Time,Data Parallel,inf1.2xlarge,4,1,180.2,,8,,40.1,,,52,,2.15.0,CV,:ref:`Evaluate YOLO v4 on Inferentia </src/examples/pytorch/yolo_v4.ipynb>`\r\n\"Resnet50-PT(fp32,b5,c4)\",Resnet-50,fp32,PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,5,923,,4,,22,,,23,,2.15.0,CV,:ref:`Resnet50 model for Inferentia </src/examples/pytorch/resnet50.ipynb>`\r\n\"Resnet50-TF(fp16,b5,c4)\",Resnet-50,fp16,Tensorflow 1.15,Batch,Data Parallel,inf1.xlarge,4,10,2207,,8,,17.8,,,22.7,,2.12.0,CV,:ref:`ResNet-50 optimization example </src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb>`\r\n\"OpenPose-TF(fp16,b1,c4)\",OpenPose,fp16,Tensorflow 1.15,Real Time,Data Parallel,inf1.xlarge,4,1,57.5,,4,,60.3,,,67.4,,2.12.0,CV,:ref:`Running OpenPose on Inferentia </src/examples/tensorflow/openpose_demo/openpose.ipynb>`\r\n\"BERT-base-PT(fp32,b6,c4)\",BERT base,\"fp32, bert-base-cased-finetuned-mrpc, sequence-length=128\",PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,6,966,,4,,21,,,22,,2.15.0,NLP,:ref:`HuggingFace Pretrained BERT </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>`\r\n\"BERT-base-PT(fp32,b1,c16)\",BERT base,\"fp32, bert-base-uncased, sequence-length=128\",PyTorch 1.13,Real Time,Model Pipeline,inf1.6xlarge,16,1,1988.8,,12,,6,,,6.3,,2.15.0,NLP,:ref:`Using NeuronCore Pipeline </src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`\r\n\"BERT-base-TF(fp32,b128,c16)\",BERT base,\"fp32, distilbert-base-uncased-finetuned-sst-2-english, sequence-length=128\",Tensorflow 2.8,Batch,Data Parallel,inf1.6xlarge,16,16,2114.8,,,,30.1,,,33,,2.15.0,NLP,:ref:`HuggingFace distilBERT with Tensorflow2 </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`"
  },
  {
    "path": "about-neuron/benchmarks/inf1/index.rst",
    "content": ".. _appnote-performance-benchmark:\n\nInf1 Inference Performance\n===========================\n\n.. important::\n\n   The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder <https://github.com/aws-neuron/aws-neuron-sdk/tree/master/archive/src/benchmark/pytorch>`_.\n\n.. contents:: Table of contents\n   :local:\n\nThe following tables contain the reference inference performance for models in the tutorials. Follow the links on each row to replicate similar results in your own environment. Refer to :ref:`ec2-then-ec2-setenv` documentation to create a new environment based on the latest Neuron release.\n\n*Last update: September 16th, 2024*\n\n\n.. _NLP:\n\nEncoder Models\n--------------\n.. tab-set::\n\n   .. tab-item:: Throughput optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('throughput_data_encoder.csv')\n         df_prices = pd.read_csv('instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ]\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n         int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)']\n         df[int_cols] = df[int_cols].round(0).astype('int',copy=True)\n\n   .. tab-item:: Latency optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('latency_data_encoder.csv')\n         df_prices = pd.read_csv('instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ]\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n         int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)']\n         df[int_cols] = df[int_cols].round(0).astype('int',copy=True)\n\n\n.. note::\n    Throughput and latency numbers in this table were computed using* NeuronPerf_. To reproduce these results, install NeuronPerf and run the provided scripts.*\n\n.. _NeuronPerf: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuronperf/index.html\n\nConvolutional Neural Networks (CNN) Models\n------------------------------------------\n\n.. df-table::\n   :header-rows: 1\n\n   df = pd.read_csv('throughput_data_cnn.csv')\n   df_prices = pd.read_csv('instance_prices.csv')\n   df = pd.merge(df,df_prices,on='Inst. Type').query('`Application`==\"CV\"')\n\n   df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n   cols_to_show = ['Model', 'Tutorial', 'Framework', 'Inst. 
Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ]\n   df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']).groupby('Model').head(2)\n\n   int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)']\n   df[int_cols] = df[int_cols].round(0).astype('int',copy=True)\n\n.. note::\n    Throughput and latency numbers in this table were generated using Neuron Tutorials.\n\n.. note::\n   **Cost per 1M inferences** is calculated using US East (N. Virginia) RI-Effective hourly rate.\n\n   **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n"
  },
  {
    "path": "about-neuron/benchmarks/inf1/instance_prices.csv",
    "content": "Inst. Type,RI-Effective hourly rate\ninf1.xlarge,0.110\ninf1.2xlarge,0.174\ninf1.6xlarge,0.567\ninf1.24xlarge,2.269\n"
  },
  {
    "path": "about-neuron/benchmarks/inf1/latency_data_encoder.csv",
    "content": "Model,Scripts,Source,Framework,Inst. Type,Num Cores,Seq. Length,Avg Throughput (/sec),Max Throughput,Threads,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,N Models,Workers per Model,Model details\r\nBERT base (bert-base-cased),:compile-pt:`Compile <bert-base-cased>` + :benchmark-pt:`Benchmark <bert-base-cased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,125.7,,8,7.9,,,8.0,Real Time,2.20.0,Data Parallel,1,1,1,\"fp32, sequence-length=128\"\r\nBERT base (bert-base-uncased),:compile-pt:`Compile <bert-base-uncased>` + :benchmark-pt:`Benchmark <bert-base-uncased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,284.7,,8,10.5,,,10.7,Real Time,2.20.0,Data Parallel,3,1,1,\"fp32, sequence-length=128\"\r\nDistilBERT base (distilbert-base-uncased-finetuned-sst-2-english),:compile-pt:`Compile <distilbert-base-uncased-finetuned-sst-2-english>` + :benchmark-pt:`Benchmark <distilbert-base-uncased-finetuned-sst-2-english>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,593.4,,8,10.0,,,10.7,Real Time,2.20.0,Data Parallel,5,1,1,\"fp32, sequence-length=128\"\r\nDistilBERT base (distilbert-base-uncased),:compile-pt:`Compile <distilbert-base-uncased>` + :benchmark-pt:`Benchmark <distilbert-base-uncased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,538.2,,8,11.1,,,11.5,Real Time,2.20.0,Data Parallel,6,1,1,\"fp32, sequence-length=128\"\r\nDistilRoBERTa base (distilroberta-base),:compile-pt:`Compile <distilroberta-base>` + :benchmark-pt:`Benchmark <distilroberta-base>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,417.0,,8,7.0,,,7.8,Real Time,2.20.0,Data Parallel,3,1,1,\"fp32, sequence-length=128\"\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf1/throughput_data_cnn.csv",
    "content": "Name,Model,Model details,Framework,Application Type,Run Mode,Inst. Type,Num. Cores,Batch Size,Avg Throughput (/sec),Max Throughput,Threads,Ops in Inferentia,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Latency P100 (ms),Neuron Version,Application,Tutorial\r\n\"YOLOv4-PT(fp32,b1,c4)\",YOLO v4,fp32,PyTorch 1.13,Real Time,Data Parallel,inf1.2xlarge,4,1,180.3,,8,,40.0,,,50.8,,2.20.0,CV,:ref:`Evaluate YOLO v4 on Inferentia </src/examples/pytorch/yolo_v4.ipynb>`\r\n\"Resnet50-PT(fp32,b5,c4)\",Resnet-50,fp32,PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,5,921.5,,4,,21.6,,,22.9,,2.20.0,CV,:ref:`Resnet50 model for Inferentia </src/examples/pytorch/resnet50.ipynb>`\r\n\"Resnet50-TF(fp16,b5,c4)\",Resnet-50,fp16,Tensorflow 1.15,Batch,Data Parallel,inf1.xlarge,4,10,2207,,8,,17.8,,,22.7,,2.12.0,CV,:ref:`ResNet-50 optimization example </src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb>`\r\n\"OpenPose-TF(fp16,b1,c4)\",OpenPose,fp16,Tensorflow 1.15,Real Time,Data Parallel,inf1.xlarge,4,1,57.5,,4,,60.3,,,67.4,,2.12.0,CV,:ref:`Running OpenPose on Inferentia </src/examples/tensorflow/openpose_demo/openpose.ipynb>`\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf1/throughput_data_encoder.csv",
    "content": "Model,Scripts,Source,Framework,Inst. Type,Num Cores,Seq. Length,Avg Throughput (/sec),Max Throughput,Threads,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,N Models,Workers per Model,Model details\r\nBERT base (bert-base-cased),:compile-pt:`Compile <bert-base-cased>` + :benchmark-pt:`Benchmark <bert-base-cased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1095.4,,8,58.3,,,65.0,Batch,2.20.0,Data Parallel,8,4,2,\"fp32, sequence-length=128\"\r\nBERT base (bert-base-uncased),:compile-pt:`Compile <bert-base-uncased>` + :benchmark-pt:`Benchmark <bert-base-uncased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1180.7,,8,40.6,,,45.0,Batch,2.20.0,Data Parallel,6,4,2,\"fp32, sequence-length=128\"\r\nDistilBERT base (distilbert-base-uncased-finetuned-sst-2-english),:compile-pt:`Compile <distilbert-base-uncased-finetuned-sst-2-english>` + :benchmark-pt:`Benchmark <distilbert-base-uncased-finetuned-sst-2-english>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1875.3,,8,33.7,,,54.1,Batch,2.20.0,Data Parallel,8,4,2,\"fp32, sequence-length=128\"\r\nDistilBERT base (distilbert-base-uncased),:compile-pt:`Compile <distilbert-base-uncased>` + :benchmark-pt:`Benchmark <distilbert-base-uncased>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1876.7,,8,33.7,,,53.2,Batch,2.20.0,Data Parallel,8,4,2,\"fp32, sequence-length=128\"\r\nDistilRoBERTa base (distilroberta-base),:compile-pt:`Compile <distilroberta-base>` + :benchmark-pt:`Benchmark <distilroberta-base>`,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1512.9,,8,15.0,,,25.9,Batch,2.20.0,Data Parallel,6,4,1,\"fp32, sequence-length=128\"\r\nBERT base,:ref:`HuggingFace Pretrained BERT </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>`,,PyTorch 1.13,inf1.xlarge,,,1056,,,20,,,21,Batch,2.20.0,Data Parallel,4,,,\"fp32, bert-base-cased-finetuned-mrpc, sequence-length=128\"\r\nBERT base,:ref:`Using NeuronCore Pipeline </src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`,,PyTorch 1.13,inf1.6xlarge,,,2009.1,,,5.9,,,6.3,Real Time,2.20.0,Model Pipeline,1,,,\"fp32, bert-base-uncased, sequence-length=128\"\r\nBERT base,:ref:`HuggingFace distilBERT with Tensorflow2 </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`,,Tensorflow 2.10,inf1.6xlarge,,,2123.4,,,30.0,,,32.2,Batch,2.20.0,Data Parallel,16,,,\"fp32, distilbert-base-uncased-finetuned-sst-2-english, sequence-length=128\"\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/inf2-performance.rst",
    "content": ".. _inf2-performance:\n\nInf2 Inference Performance\n==========================\n\n.. important::\n\n   The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder <https://github.com/aws-neuron/aws-neuron-sdk/tree/master/archive/src/benchmark/pytorch>`_.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n*Last update: Feb 26th, 2026*\n\n.. _inf2_inference_perf:\n\nEncoder Models\n--------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_encoder.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n            df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n            cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/second)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Sequence Length', 'Model Data Type','Compilation Autocast Data Type', 'OS Type']\n            df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n            df['Throughput (inference/second)'] = df['Throughput (inference/second)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\n    .. tab-item:: Latency optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_encoder.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n            df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n            cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/second)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Sequence Length', 'Model Data Type','Compilation Autocast Data Type', 'OS Type']\n            df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n            df['Throughput (inference/second)'] = df['Throughput (inference/second)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\nEncoder-Decoder Models\n----------------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_encoder_decoder.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Scripts','Framework', 'Inst. 
Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree',\t'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n            df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         Only for Encoder-Decoder\n\n         **Throughput (tokens/second)** counts both input and output tokens\n\n         **Latency per Token** counts both input and output tokens\n\n\n    .. tab-item:: Latency optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_encoder_decoder.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree',\t'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n            df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         **Throughput (tokens/second)** counts both input and output tokens\n\n         **Latency per Token** counts both input and output tokens\n        \n\nVision Transformers Models\n--------------------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_vision_transformers.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\n    .. tab-item:: Latency optimized\n\n        .. 
df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_vision_transformers.csv')\n\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework','Inst. Type','Task', 'Throughput (inference/sec)','Latency P50 (ms)','Latency P99 (ms)','Cost per 1M images','Application Type','Neuron Version','Run Mode','Batch Size','Model Data Type', 'Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\nConvolutional Neural Networks (CNN) Models\n------------------------------------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_vision_cnn.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\n    .. tab-item:: Latency optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_vision_cnn.csv')\n\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework','Inst. Type','Task', 'Throughput (inference/sec)','Latency P50 (ms)','Latency P99 (ms)','Cost per 1M images','Application Type','Neuron Version','Run Mode','Batch Size','Model Data Type', 'Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\nStable Diffusion Models\n-----------------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. 
df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_vision_sd.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         **Cost per 1M images** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n\n\n    .. tab-item:: Latency optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_vision_sd.csv')\n\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework','Inst. Type','Task', 'Throughput (inference/sec)','Latency P50 (ms)','Latency P99 (ms)','Cost per 1M images','Application Type','Neuron Version','Run Mode','Batch Size','Model Data Type', 'Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         **Cost per 1M images** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n\nDiffusion Transformer Models\n----------------------------\n\n.. tab-set::\n\n    .. tab-item:: Throughput optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('throughput_data_vision_dit.csv')\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework', 'Inst. 
Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model Data Type','Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         **Cost per 1M images** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n\n\n    .. tab-item:: Latency optimized\n\n        .. df-table::\n            :header-rows: 1\n\n            df = pd.read_csv('latency_data_vision_dit.csv')\n\n            df_prices = pd.read_csv('inf2_instance_prices.csv')\n            df = pd.merge(df,df_prices,on='Inst. Type')\n\n            df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n            cols_to_show = ['Model','Image Size','Scripts','Framework','Inst. Type','Task', 'Throughput (inference/sec)','Latency P50 (ms)','Latency P99 (ms)','Cost per 1M images','Application Type','Neuron Version','Run Mode','Batch Size','Model Data Type', 'Compilation Autocast Data Type']\n            df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])\n\n            df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n            int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n            df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n        .. note::\n         **Cost per 1M images** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n\n\n\n\n.. note::\n\n      See :ref:`neuron_hw_glossary` for abbreviations and terms\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/inf2_instance_prices.csv",
    "content": "Inst. Type,RI-Effective hourly rate\nInf2.xlarge,0.328\nInf2.48xlarge,5.608\nInf2.24xlarge,2.804\nInf2.8xlarge,0.850\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,144.98,29.47,42.05,7.41,7.68,Real Time,2.18.1,Tensor Parallel,24,1,8192,128,8064,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,133.63,209.37,232.02,7.47,7.57,Real Time,2.18.1,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,161.67,25,25.88,6.42,6.58,Real Time,2.18.1,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,153.58,101.81,110.6,6.5,6.6,Real Time,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,int8\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.84,745.49,749.48,34.67,35.06,Real Time,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.81,312.86,322.56,33.81,34.13,Real Time,2.18.1,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.16,310.18,315.23,33.14,34.29,Real Time,2.18.1,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.82,80,100.47,32.47,33.03,Real Time,2.18.1,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.9,99.37,142.62,32.48,32.86,Real Time,2.18.1,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,31.28,77.81,78.52,32.2,33.02,Real Time,2.18.1,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,156.1281689,27.63772011,33.7741375,6.46972656,7.07960129,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,145.1665497,29.20985222,33.39338303,7.34019279,7.80153275,Real Time,2.18.0,Tensor Parallel,24,1,8192,128,8064,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,112.520024,25.85077286,26.89838409,9.16552544,9.33074951,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,97.41527724,333.7800503,340.9907818,10.17355919,10.37788391,Real Time,2.18.0,Tensor 
Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,73.16747525,994.1797257,999.7954369,13.49759102,13.97609711,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.06356,76.59531,77.12364,32.89557,33.42032,Real Time,2.18.0,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.92419,96.4396,98.47379,33.13422,33.45966,Real Time,2.18.0,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.07017,76.33042,86.52544,33.15115,34.0786,Real Time,2.18.0,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.426,277.01592,280.12586,33.73241,34.01256,Real Time,2.18.0,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.91353,275.96617,284.77097,34.81936,35.43973,Real Time,2.18.0,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.32725,810.43696,814.87799,34.90329,35.14242,Real Time,2.18.0,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,149.7363908,27.34160423,29.20722961,6.86240196,7.07960129,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,81.7034129,557.9631329,562.8581047,7.86566734,11.64746284,Real Time,2.18.0,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,95.99325977,539.5913124,557.1010113,10.32972336,10.61367989,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,112.7050057,27.0178318,33.24627876,9.12380219,9.38177109,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,97.52121418,338.6683464,340.4603005,10.15138626,10.55026054,Real Time,2.18.0,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,73.67826681,989.4962311,1000.655413,13.43631744,13.85569572,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_encoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/second),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type\r\nalbert-base-v2,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),2119.78480993,0.93722343,1.00183487,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),1998.20950133,0.99897385,1.04045868,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nbert-large-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),738.64502335,2.69365311,2.77733803,Real Time,2.25.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\ndistilbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),3401.96550351,0.57864189,0.67734718,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\ngoogle/electra-base-discriminator,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),2020.45540243,0.9958744,1.04618073,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nroberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),1989.26102482,0.99945068,1.09100342,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nroberta-large,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),738.88441011,2.69317627,2.77304649,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nxlm-roberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.5,Inf2.48xlarge,Raw Output (AutoModelForMaskedLM),48.80198341,40.66610336,51.05760336,Real Time,2.22.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\n\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_encoder_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type\r\nt5-3b,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,Inf2.24xlarge,Text Generation,108.18,9.25,9.26,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\ngoogle/flan-t5-xl,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,Inf2.24xlarge,Text Generation,117.6,8.5,8.53,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_vision.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\ndeepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark <perceiver-multimodal>`,PyTorch 1.13.1,Inf2.xlarge,Multimodal Autoencoding,0.83,1250,1271,Real Time,2.18.0,Data Parallel,1,FP32,None\ndeepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ndeepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ndeepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ngoogle/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark <hf-google-vit>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,709.468,1.406,1.431,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nopenai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,163.444,6.113,6.143,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nopenai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,61.812,16.172,16.216,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nresnet18,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1385.04,0.72,0.75,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nresnet34,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1187.64,0.83,0.88,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nresnet50,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1044.93,0.95,0.98,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nresnet101,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,882.61,1.13,1.15,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nresnet152,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,736.91,1.35,1.39,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark <sd_15_512>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.421,2369.6,2406.8,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark <sd2_512>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.549,1794.5,2103.7,Real Time,2.17.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\nStable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark <sd2_768>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.188,5306.7,5368.6,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark <sd2_inpainting>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.15,6701.4,6737.4,Real Time,2.17.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\nStable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_1024>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.073,13431.7,15739.0,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_and_refiner>`,PyTorch 
1.13.1,Inf2.8xlarge,Image Generation,0.078,12651.9,15053.9,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nUNet,224x224,:benchmark-pt:`Benchmark <unet>`,PyTorch 1.13.1,Inf2.xlarge,Image Segmentation,420.16,2.37,2.41,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nvgg11,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,524.10,1.90,1.96,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\nvgg16,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,435.54,2.29,2.33,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_vision_cnn.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\nresnet18,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,1669.796,0.596,0.613,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nresnet34,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,1394.211,0.718,0.726,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nresnet50,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,1218.875,0.83,0.846,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nresnet101,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,994.691,1.007,1.024,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nresnet152,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,837.784,1.185,1.219,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nUNet,224x224,:benchmark-pt:`Benchmark <unet>`,PyTorch 2.5,Inf2.xlarge,Image Segmentation,447.094,2.232,2.253,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nvgg11,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 2.5,Inf2.xlarge,Image Classification,629.189,1.59,1.605,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nvgg16,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 2.5,Inf2.xlarge,Image Classification,508.665,1.956,1.995,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_vision_dit.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\nPixArt Alpha,256x256,:benchmark-pt:`Benchmark <pixart_alpha>`,PyTorch 2.1,Inf2.xlarge,Image Generation,1.975,502.587,537.258,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Alpha,512x512,:benchmark-pt:`Benchmark <pixart_alpha>`,PyTorch 2.1,Inf2.xlarge,Image Generation,0.565,1769.756,1775.697,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Sigma,256x256,:benchmark-pt:`Benchmark <pixart_sigma>`,PyTorch 2.1,Inf2.xlarge,Image Generation,1.86,540.832,548.41,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Sigma,512x512,:benchmark-pt:`Benchmark <pixart_sigma>`,PyTorch 2.1,Inf2.xlarge,Image Generation,0.543,1841.882,1850.683,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_vision_sd.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\nStable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark <sd_15_512>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.494,2023.741,2031.705,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark <sd2_512>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.596,1679.805,1685.442,Real Time,2.21.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\r\nStable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark <sd2_768>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.187,5337.509,5357.361,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark <sd2_inpainting>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.133,7546.004,7550.984,Real Time,2.21.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\r\nStable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_1024>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.083,12048.659,12102.431,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_and_refiner>`,PyTorch 2.5,Inf2.8xlarge,Image Generation,0.095,10546.45,10704.566,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/inf2/latency_data_vision_transformers.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\ndeepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark <perceiver-multimodal>`,PyTorch 2.5,Inf2.xlarge,Multimodal Autoencoding,0.853,1170.045,1232.056,Real Time,2.21.0,Data Parallel,1,FP32,None\r\ndeepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ndeepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ndeepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ngoogle/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark <hf-google-vit>`,PyTorch 2.5,Inf2.xlarge,Image Classification,746.139,1.322,1.378,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nopenai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 2.5,Inf2.xlarge,Image Classification,161.047,6.213,6.246,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nopenai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 2.5,Inf2.xlarge,Image Classification,73.261,13.643,13.685,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,649.17,68.95,99.28,15.22,15.48,Batch,2.18.1,Tensor Parallel,24,8,8192,128,8064,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,521.96,1992.59,2016.73,15.31,15.64,Batch,2.18.1,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,859.09,66.02,75.73,10.45,10.76,Batch,2.18.1,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,759.15,823.53,832.84,10.5,11.02,Batch,2.18.1,Tensor Parallel,24,8,4096,2048,2048,FP16,Matmult-BF16,int8\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.84,745.49,749.48,34.67,35.06,Batch,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.81,312.86,322.56,33.81,34.13,Batch,2.18.1,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.16,310.18,315.23,33.14,34.29,Batch,2.18.1,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.82,80,100.47,32.47,33.03,Batch,2.18.1,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.9,99.37,142.62,32.48,32.86,Batch,2.18.1,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,31.28,77.81,78.52,32.2,33.02,Batch,2.18.1,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,725.82805,77.36206,87.27574,12.10523,13.05699,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,577.97078,80.11794,89.68878,16.39295,17.81178,Batch,2.18.0,Tensor Parallel,24,8,8192,128,8064,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,589.88712,108.80947,113.89017,14.89663,15.79142,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,351.75817,7083.72855,7158.32424,20.9856,21.80099,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample 
<meta-llama-2-13b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,178.56973,5141.32094,5160.92515,21.70897,22.74466,Batch,2.18.0,Tensor Parallel,24,4,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.06356,76.59531,77.12364,32.89557,33.42032,Batch,2.18.0,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.92419,96.4396,98.47379,33.13422,33.45966,Batch,2.18.0,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.07017,76.33042,86.52544,33.15115,34.0786,Batch,2.18.0,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.426,277.01592,280.12586,33.73241,34.01256,Batch,2.18.0,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.91353,275.96617,284.77097,34.81936,35.43973,Batch,2.18.0,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.32725,810.43696,814.87799,34.90329,35.14242,Batch,2.18.0,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,761.88605,77.62027,86.62724,11.63864,12.49599,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,450.37555,4740.11564,4783.75316,16.54649,17.52925,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,411.04655,11085.12306,11125.86117,18.01157,19.9585,Batch,2.18.0,Tensor Parallel,24,8,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,546.51472,115.81421,121.49906,15.87224,17.21263,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,333.24073,7115.97776,7231.01234,22.26758,23.81206,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,Inf2.48xlarge,Text Generation,178.79017,5136.61623,5192.58666,21.6732,22.73154,Batch,2.18.0,Tensor Parallel,24,4,16384,8192,8192,FP16,Matmult-BF16,bf16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_encoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/second),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type\r\nalbert-base-v2,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),3147.09984049,5.0675869,5.27883291,Batch,2.25.0,Data Parallel,8,128,FP32,Matmult-BF16,U22\r\nbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,Inf2.xlarge,Raw Output (AutoModel),2674.18956433,5.97381591,6.17100715,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22\r\nbert-large-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.5,Inf2.xlarge,Raw Output (AutoModel),950.0496231,8.41140747,8.84652853,Batch,2.21.0,Data Parallel,4,128,FP32,Matmult-BF16,U22\r\ndistilbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,Inf2.xlarge,Raw Output (AutoModel),5307.87660777,6.01053237,6.23083114,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\ngoogle/electra-base-discriminator,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),2889.75325068,11.02411747,11.97555304,Batch,2.25.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\nroberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),2920.37954741,5.42390347,5.82957506,Batch,2.25.0,Data Parallel,8,128,FP32,Matmult-BF16,U22\r\nroberta-large,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),962.70185508,8.31007957,8.60977411,Batch,2.25.0,Data Parallel,4,128,FP32,Matmult-BF16,U22\r\nxlm-roberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.5,Inf2.48xlarge,Raw Output (AutoModelForMaskedLM),51.13695938,625.66077709,694.93403673,Batch,2.22.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_encoder_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type\r\nt5-3b,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,Inf2.24xlarge,Text Generation,111.92,8.97,8.98,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\ngoogle/flan-t5-xl,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,Inf2.24xlarge,Text Generation,117.61,8.51,8.53,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_vision.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\ndeepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark <perceiver-multimodal>`,PyTorch 1.13.1,Inf2.xlarge,Multimodal Autoencoding,0.83,1250,1271,Real Time,2.18.0,Data Parallel,1,FP32,None\ndeepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ndeepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ndeepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\ngoogle/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark <hf-google-vit>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1632.359,4.716,5.902,Batch,2.14.0,Data Parallel,2,FP32,Matmult-BF16\nopenai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,5178.833,48.973,57.002,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16\nopenai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,200.997,78.331,92.452,Batch,2.14.0,Data Parallel,4,FP32,Matmult-BF16\nresnet18,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,6635.04,4.80,4.88,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16\nresnet34,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,4848.72,6.56,6.66,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16\nresnet50,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,4269.12,7.49,7.55,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16\nresnet101,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,3066.24,83.38,83.56,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16\nresnet152,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,2323.20,110.06,110.21,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16\nStable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark <sd_15_512>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.421,2369.6,2406.8,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark <sd2_512>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.549,1794.5,2103.7,Real Time,2.17.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\nStable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark <sd2_768>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.188,5306.7,5368.6,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark <sd2_inpainting>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.15,6701.4,6737.4,Real Time,2.17.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\nStable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_1024>`,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.073,13431.7,15739.0,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nStable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_and_refiner>`,PyTorch 1.13.1,Inf2.8xlarge,Image 
Generation,0.078,12651.9,15053.9,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16\nUNet,224x224,:benchmark-pt:`Benchmark <unet>`,PyTorch 1.13.1,Inf2.xlarge,Image Segmentation,866.96,18.37,18.86,Batch,2.14.0,Data Parallel,4,FP32,Matmult-BF16\nvgg11,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,3955.20,64.15,64.24,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16\nvgg16,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1964.16,16.27,16.35,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_vision_cnn.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\nresnet18,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,6949.174,4.587,4.659,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16\r\nresnet34,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,5158.607,6.18,6.251,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16\r\nresnet50,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,4393.304,7.283,7.331,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16\r\nresnet101,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,3164.991,80.818,80.938,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16\r\nresnet152,224x224,:benchmark-pt:`Benchmark <resnet>`,PyTorch 2.5,Inf2.xlarge,Image Classification,2449.875,104.406,104.531,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16\r\nUNet,224x224,:benchmark-pt:`Benchmark <unet>`,PyTorch 2.5,Inf2.xlarge,Image Segmentation,1010.803,15.818,15.875,Batch,2.21.0,Data Parallel,4,FP32,Matmult-BF16\r\nvgg11,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 2.5,Inf2.xlarge,Image Classification,4734.402,54.044,54.09,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16\r\nvgg16,224x224,:benchmark-pt:`Benchmark <vgg>`,PyTorch 2.5,Inf2.xlarge,Image Classification,2161.392,14.77,14.832,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_vision_dit.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\nPixArt Alpha,256x256,:benchmark-pt:`Benchmark <pixart_alpha>`,PyTorch 2.1,Inf2.xlarge,Image Generation,1.975,502.587,537.258,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Alpha,512x512,:benchmark-pt:`Benchmark <pixart_alpha>`,PyTorch 2.1,Inf2.xlarge,Image Generation,0.565,1769.756,1775.697,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Sigma,256x256,:benchmark-pt:`Benchmark <pixart_sigma>`,PyTorch 2.1,Inf2.xlarge,Image Generation,1.86,540.832,548.41,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\nPixArt Sigma,512x512,:benchmark-pt:`Benchmark <pixart_sigma>`,PyTorch 2.1,Inf2.xlarge,Image Generation,0.543,1841.882,1850.683,Real Time,2.20,Data Parallel,1,\"\"\"FP32, BF16\"\"\",Matmult-BF16\n"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_vision_sd.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\nStable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark <sd_15_512>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.494,2023.741,2031.705,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark <sd2_512>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.596,1679.805,1685.442,Real Time,2.21.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\r\nStable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark <sd2_768>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.187,5337.509,5357.361,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark <sd2_inpainting>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.133,7546.004,7550.984,Real Time,2.21.0,Data Parallel,1,\"FP32, BF16\",Matmult-BF16\r\nStable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_1024>`,PyTorch 2.5,Inf2.xlarge,Image Generation,0.083,12048.659,12102.431,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16\r\nStable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark <sdxl_base_and_refiner>`,PyTorch 2.5,Inf2.8xlarge,Image Generation,0.095,10546.45,10704.566,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/inf2/throughput_data_vision_transformers.csv",
    "content": "Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type\r\ndeepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark <perceiver-multimodal>`,PyTorch 2.5,Inf2.xlarge,Multimodal Autoencoding,0.853,1170.045,1232.056,Real Time,2.21.0,Data Parallel,1,FP32,None\r\ndeepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ndeepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ndeepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark <perceiver-vision>`,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16\r\ngoogle/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark <hf-google-vit>`,PyTorch 2.5,Inf2.xlarge,Image Classification,1955.406,4.087,4.125,Batch,2.21.0,Data Parallel,2,FP32,Matmult-BF16\r\nopenai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 2.5,Inf2.xlarge,Image Classification,6509.83,135.806,136.003,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16\r\nopenai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark <hf-openai-clip>`,PyTorch 2.5,Inf2.xlarge,Image Classification,285.938,113.117,115.940,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16"
  },
  {
    "path": "about-neuron/benchmarks/trn1/latency_data_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,157.25202,17.09,21.62,7.03,7.16,Real Time,2.18.1,Tensor Parallel,32,1,8192,128,8064,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,140.50031,153.02,159.13,7.04,7.13,Real Time,2.18.1,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,178.18923,14.75,22.94,5.86,6,Real Time,2.18.1,Tensor Parallel,32,1,4096,128,3968,FP16,Matmult-BF16,int8\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,37.70379,547,553.89,26.2,26.79,Real Time,2.18.1,Tensor Parallel,32,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,40.63808,53.2,59.5,24.48,26.17,Real Time,2.18.1,Tensor Parallel,32,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,40.80995,52.53,52.79,26.48,24.22,Real Time,2.18.1,Tensor Parallel,32,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,161.7081305,13.32402229,14.1210556,6.69956207,6.84595108,Real Time,2.18.0,Tensor Parallel,32,1,8192,128,8064,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,60.43330245,864.1381264,865.9124374,9.84406471,10.14947891,Real Time,2.18.0,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,31.3990051,2367.928505,2369.139671,13.40842247,15.76948166,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,39.28574,53.91026,54.9469,25.18129,26.58272,Real Time,2.18.0,Tensor Parallel,32,1,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,39.17668,81.882,98.77896,25.26712,25.7585,Real Time,2.18.0,Tensor Parallel,32,1,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,39.16379,57.75213,64.75568,25.44856,26.1333,Real Time,2.18.0,Tensor Parallel,32,1,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,38.09518,232.47981,239.02893,26.03793,26.17574,Real Time,2.18.0,Tensor Parallel,32,1,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,37.70947,236.78207,241.14895,26.62468,27.02999,Real Time,2.18.0,Tensor 
Parallel,32,1,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,36.78021,690.95588,695.91761,26.85046,27.04263,Real Time,2.18.0,Tensor Parallel,32,1,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,trn1.32xlarge,Text Generation,49.55890938,1322.874308,1325.857162,9.89246368,10.18333435,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,60.21552741,868.635416,870.9816933,9.86456871,10.24436951,Real Time,2.18.0,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,31.37781421,2372.928381,2375.921965,13.3998394,13.79013062,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/latency_data_encoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type\r\nalbert-base-v2,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2321.97758889,0.85997581,0.9086132,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.8,trn1.2xlarge,Raw Output (AutoModel),2085.45272427,0.94294548,1.02853775,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nbert-large-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),747.48212826,2.66885757,2.73442268,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\ndistilbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3672.38478861,0.54264069,0.58531761,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\ngoogle/electra-base-discriminator,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2127.07474023,0.93317032,0.9958744,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nroberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2094.37288172,0.95796585,1.00588799,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nroberta-large,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),747.58300171,2.66981125,2.73323059,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\nxlm-roberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.32xlarge,Raw Output (AutoModelForMaskedLM),46.89836990,42.62268543,44.11746978,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/latency_data_encoder_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type\r\nt5-3b,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,trn1.32xlarge,Text Generation,110.23,9.07,9.12,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\ngoogle/flan-t5-xl,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,trn1.32xlarge,Text Generation,120.29,8.31,8.34,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/throughput_data_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,933.50053,55.16,61.47,9.95,10.1,Batch,2.18.1,Tensor Parallel,32,8,8192,128,8064,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,770.16291,1265.95,1292.94,10.04,10.33,Batch,2.18.1,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,int8\r\nLlama-3-8B,:llama-sample:`Sample <meta-llama-3-8b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,1142.69582,49.05,52.79,7.65,7.94,Batch,2.18.1,Tensor Parallel,32,8,4096,128,3968,FP16,Matmult-BF16,int8\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,120.3614,1661.12,1672.71,32.33,33.27,Batch,2.18.1,Tensor Parallel,32,4,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,140.51039,129.86,132.03,28.38,29.11,Batch,2.18.1,Tensor Parallel,32,4,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-3-70B,:llama-sample:`Sample <meta-llama-3-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,138.01357,130.37,130.48,28.08,28.53,Batch,2.18.1,Tensor Parallel,32,4,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-7b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,917.2452652,66.4024353,70.63961029,10.09511948,10.46204567,Batch,2.18.0,Tensor Parallel,32,8,8192,128,8064,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,371.7031,6668.70475,6689.8005,19.85741,21.0557,Batch,2.18.0,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nLlama-2-13b,:llama-sample:`Sample <meta-llama-2-13b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,184.28337,4628.44729,4635.24675,21.09194,22.3856,Batch,2.18.0,Tensor Parallel,32,4,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,141.45357,156.84581,158.41317,26.72362,30.16973,Batch,2.18.0,Tensor Parallel,32,4,256,128,128,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,143.42503,270.15853,270.55573,26.9084,27.90999,Batch,2.18.0,Tensor Parallel,32,4,512,256,256,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,145.12799,156.68869,161.41367,27.21453,30.60174,Batch,2.18.0,Tensor Parallel,32,4,1152,128,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,133.25056,1478.64008,1479.77638,28.55039,29.49882,Batch,2.18.0,Tensor Parallel,32,4,2048,1024,1024,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,129.27628,1478.84846,1482.93161,31.67439,32.01842,Batch,2.18.0,Tensor 
Parallel,32,4,3072,1024,2048,FP16,Matmult-BF16,bf16\r\nLlama-2-70b,:llama-sample:`Sample <llama-70b-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,120.62953,2722.03422,2730.95036,31.78978,33.2315,Batch,2.18.0,Tensor Parallel,32,4,4096,2048,2048,FP16,Matmult-BF16,bf16\r\nMistral-7B-Instruct-v0.2,:llama-sample:`Sample <mistralai-Mistral-7b-Instruct-v0.2>`,Transformers NeuronX,trn1.32xlarge,Text Generation,484.5773,8614.85291,8630.24068,15.43713,15.9421,Batch,2.18.0,Tensor Parallel,32,8,16384,8192,8192,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,370.97736,6625.1595,6628.26467,19.91653,20.94936,Batch,2.18.0,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,bf16\r\nCodeLlama-13b-hf,:llama-sample:`Sample <codellama-13b-16k-sampling>`,Transformers NeuronX,trn1.32xlarge,Text Generation,184.17898,4626.17469,4630.66864,21.09528,22.16578,Batch,2.18.0,Tensor Parallel,32,4,16384,8192,8192,FP16,Matmult-BF16,bf16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/throughput_data_encoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type\r\nalbert-base-v2,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3442.53392946,9.28854942,9.35173273,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\nbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3421.56625089,9.34481621,9.41992044,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\nbert-large-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),1104.43610458,7.24101067,7.29799271,Batch,2.27.0,Data Parallel,4,128,FP32,Matmult-BF16,U22\r\ndistilbert-base-uncased,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),6369.44180331,5.00988960,5.09214401,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\ngoogle/electra-base-discriminator,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3425.55803570,9.32765007,9.45640087,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\nroberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3378.10764201,9.46044921,9.53317165,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22\r\nroberta-large,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),1123.90475943,14.23048973,14.30106163,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22\r\nxlm-roberta-base,:benchmark-pt:`Benchmark <inf2>`,PyTorch 2.9,trn1.32xlarge,Raw Output (AutoModelForMaskedLM),46.68898543,342.50581264,350.86465597,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/throughput_data_encoder_decoder.csv",
    "content": "Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type\r\nt5-3b,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,trn1.32xlarge,Text Generation,116.29,8.58,8.66,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\ngoogle/flan-t5-xl,`Tutorial <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`_,NeuronX Distributed,trn1.32xlarge,Text Generation,122.52,8.16,8.19,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/training_data_decoder.csv",
    "content": "Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Sequence Length, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type.\r\nLlama-3.1-8B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,32,TP=32 DP=32 PP=1 ZeRO-1,1,1024,AdamW,8192,47.95,strong scaling,2.24.0,`NeuronX Distributed <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.rst>`_,2.7.0.2.8.6896,U22\r\nLlama-3.1-70B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,32,TP=32 DP=4 PP=8,1,1024,AdamW,8192,7.94,strong scaling,2.24.0,`NeuronX Distributed <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/libraries/neuronx-distributed/tutorials/training_llama_tp_pp.rst>`_,2.7.0.2.8.6896,U22\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/training_data_encoder.csv",
    "content": "Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Sequence Length, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type.\r\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),16,1048576,Lamb,128,57407.9207,weak scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22\r\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,FP32,16,[32xNC(DP)] x 16Nodes(DP),8,1048576,Lamb,128,32362.6714,weak scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22\r\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,16384,AdamW,128,3826.6103,strong scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22\r\n\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/training_data_vision_transformers.csv",
    "content": "Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type.\r\nHuggingFace ViT-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,BF16,1,[32xNC(DP)],64,2048,AdamW,6587.25,weak scaling,2.25.0,`ViT-Base Fine-tuning Example <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/vit.ipynb>`_,2.7.0.2.9.0, U22\r\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/trn1-inference-performance.rst",
    "content": ".. _trn1-inference-performance:\n\nTrn1/Trn1n Inference Performance\n================================\n\n.. important::\n\n   The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder <https://github.com/aws-neuron/aws-neuron-sdk/tree/master/archive/src/benchmark/pytorch>`_.\n\n.. contents:: Table of contents\n   :local:\n\n\n*Last update:  Feb 26th, 2026*\n\n\n.. _NLP:\n\nEncoder Models\n--------------\n\n.. tab-set::\n\n   .. tab-item:: Throughput optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('throughput_data_encoder.csv')\n         df_prices = pd.read_csv('trn1_instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n         cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size','Sequence Length', 'Model Data Type','Compilation Autocast Data Type','OS Type']\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n         df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n\n   .. tab-item:: Latency optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('latency_data_encoder.csv')\n         df_prices = pd.read_csv('trn1_instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n\n         cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size','Sequence Length', 'Model Data Type','Compilation Autocast Data Type','OS Type']\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n\n         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True)\n         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']\n         df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\nEncoder-Decoder Models\n----------------------\n\n.. tab-set::\n\n   .. tab-item:: Throughput optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('throughput_data_encoder_decoder.csv')\n         df_prices = pd.read_csv('trn1_instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n         cols_to_show = ['Model','Scripts','Framework', 'Inst. 
Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree',        'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type']\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n         df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True)\n         int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']\n         df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n      .. note::\n         Only for Encoder-Decoder\n\n         **Throughput (tokens/second)** counts both input and output tokens\n\n         **Latency per Token** counts both input and output tokens\n\n         Applicable to all models\n\n         **Cost per 1M inferences** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n\n\n   .. tab-item:: Latency optimized\n\n      .. df-table::\n         :header-rows: 1\n\n         df = pd.read_csv('latency_data_encoder_decoder.csv')\n         df_prices = pd.read_csv('trn1_instance_prices.csv')\n         df = pd.merge(df,df_prices,on='Inst. Type')\n         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)\n         cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree',        'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type']\n         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])\n         df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True)\n         int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']\n         df[int_cols] = df[int_cols].round(2).astype('float',copy=True)\n\n      .. note::\n\n         Only for Encoder-Decoder\n\n         **Throughput (tokens/second)** counts both input and output tokens\n\n         **Latency per Token** counts both input and output tokens\n\n\n      .. note::\n\n         **Cost per 1M inferences** is calculated using RI-Effective hourly rate.\n\n         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/trn1-training-performance.rst",
    "content": ".. _trn1-training-performance:\n\nTrn1/Trn1n Training Performance\n===============================\n\nThis section provides benchmark results for training various deep learning models on AWS Trn1 and Trn1n instances powered by AWS Trainium chips. The benchmarks cover a range of model architectures, including encoder models, decoder models, and vision transformers, demonstrating the performance capabilities of Trn1/Trn1n instances for different training workloads.\n\n**Last update: February 19th, 2026**\n\n.. contents:: Table of contents\n   :local:\n\n.. _NLP:\n\nEncoder Models\n--------------\n\n.. csv-table::\n   :file: training_data_encoder.csv\n   :header-rows: 1\n\n\nDecoder Models\n--------------\n\n.. csv-table::\n   :file: training_data_decoder.csv\n   :header-rows: 1\n\n.. note::\n   **TP (Tensor Parallel), PP (Pipeline Parallel) and DP (Data Parallel) Topology** configuration refers to the degrees of 3D Parallelism (How the model and data is sharded across NeuronCores).\n\n   TP and PP are specified in the run script and DP is calculated by dividing **world size** (Number of nodes/instances * Number of neuron cores per instance) by TP * PP degrees.\n\n   For example, ``TP = 4, PP = 4`` and`` Number of instances is 32 (trn1.32xlarge)``. The world size will be: ``32(num instances) * 32(Neuron Cores per instance) = 1024``. Now, ``DP degree = 1024 (World size)/ 4 (TP) * 4 (PP) = 64``\n\n   For more information on batch sizes please refer to :ref:`neuron-batching`\n\n\nVision Transformer Models\n--------------------------\n\n.. csv-table::\n   :file: training_data_vision_transformers.csv\n   :header-rows: 1\n\n.. note::\n   Read more about strong vs weak scaling here :ref:`neuron-training-faq`\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/trn1_instance_prices.csv",
    "content": "Inst. Type,RI-Effective hourly rate\ntrn1.2xlarge,0.512\ntrn1.32xlarge,8.197\n"
  },
  {
    "path": "about-neuron/benchmarks/trn1/trn1_trn1n_nlp_data.csv",
    "content": "Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Performance [seq/sec],MFU[%],ComputeCostPerToken(Tflops),Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type.\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),16,1048576,Lamb,53069,25.83,,weak scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20\nHuggingFace BERT-Large Ph2 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),2,524288,Lamb,7507,15.5,,weak scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16/AMP,16,[32xNC(DP)] x 16Nodes(DP),16,16384,AdamW,24518.47,,,strong scaling,2.14.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.11.0, U20\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,FP32,16,[32xNC(DP)] x 16Nodes(DP),8,1048576,Lamb,28432,13.83,,weak scaling,2.14.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,16384,AdamW,3530,27.49,,strong scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20\nHuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,65536,Lamb,3733,29.07,,strong scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20\nGPT3-23B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=32 PP=4,1,1024,AdamW,100,29.65,289,strong scaling,2.15.0,`nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_,1.13.1.1.12.0, U20\nGPT3-46B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=16 PP=8,1,1024,AdamW,47.2,27.7,578,strong scaling,2.15.0,`nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_,1.13.1.1.12.0, U20\nGPT3-175B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=32 DP=4 PP=8,1,1024,AdamW,12.7,33.14,2197,strong scaling,2.13.0,`nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_,1.13.1.1.10.0, U20\nLlama2-7B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=4 PP=4,1,1024,AdamW,82,14.8,336,strong scaling,2.15.0,`nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_,1.13.1.1.12.0, U20\nLlama2-13B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=4 PP=4,1,1024,AdamW,60,20.7,336,strong scaling,2.15.0,`nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_,1.13.1.1.12.0, U20\nLlama2-7B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,16,TP=8 DP=64,1,1024,AdamW,81,30.8,,strong scaling,2.15.0,`neuronx-distributed <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/libraries/neuronx-distributed/tutorials/training_llama2_7b.rst>`_,1.13.1.1.12.0, U20\nHuggingFace ViT-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],64,2048,AdamW,5232.78,,,weak scaling,2.17.0,`ViT-Base Fine-tuning Example 
<https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/vit.ipynb>`_,1.13.1.1.13.0, U20\nHuggingFace CLIP-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],80,2560,AdamW,5152.76,,,weak scaling,2.17.0,`CLIP-Base Fine-tuning <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_contrastive_image_text/CLIPBase.ipynb>`_,1.13.1.1.13.0, U20\nHuggingFace Vision-Perceiver-Conv fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],4,128,AdamW,423.32,,,weak scaling,2.17.0,`Vision Perceiver Conv Fine-tuning <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/VisionPerceiverConv.ipynb>`_,1.13.1.1.13.1, U20\nHuggingFace Language-Perceiver fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],20,640,AdamW,1407.02,,,weak scaling,2.17.0,`Language Perceiver Fine-tuning <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/LanguagePerceiver.ipynb>`_,1.13.1.1.13.1, U20\n"
  },
  {
    "path": "about-neuron/beta-participation.rst",
    "content": ".. meta::\n    :description: Information about participating in the AWS Neuron SDK beta program.\n    :date-modified: 12/19/2025\n\nParticipate in the AWS Neuron SDK Beta Program\n===============================================\n\nAWS Neuron SDK users can participate in our beta program to get early access to new features and improvements. By joining the beta program, you can provide valuable feedback that helps us enhance the AWS Neuron SDK for everyone.\n\nCurrently, we are taking requests to join our Beta program for the new Neuron Kernel Interface and its associated features. If you are interested in participating, `fill out this online form <https://pulse.aws/survey/NZU6MQGW?p=0>`__ and we'll get back to you! Read more about the new NKI features `here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html>`__.\n\n.. admonition:: Disclaimer\n\n   Beta features are not recommended for production workloads. They may contain bugs or incomplete functionality. Use them at your own risk and provide feedback to help us improve."
  },
  {
    "path": "about-neuron/calculator/neuron-calculator.rst",
    "content": ".. _neuron_calculator:\n\nNeuron Calculator\n=================\n\n.. raw:: html\n\n       <link href=\"https://cdnjs.cloudflare.com/ajax/libs/choices.js/1.1.6/styles/css/choices.min.css\" rel=\"stylesheet\">\n\n    <script src=\"https://cdnjs.cloudflare.com/ajax/libs/choices.js/1.1.6/choices.min.js\"></script>\n\n\n            <script>\n                require.config({\n                   paths: {\n                        mathjs: 'https://cdnjs.cloudflare.com/ajax/libs/mathjs/11.8.0/math.min'\n                        }\n                    });\n            </script>\n\n \n\n\n\n        <div class=\"container\">\n\n            <div id=\"neuron-calculator-select\" class=\"form-group row\">\n                <label for=\"formSelect\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Select the Calculator</label>\n                <div class=\"col-sm-8\">\n\n                    <select class=\"form-control\" id=\"formSelect\" name=\"formSelect\">\n                        <option value=\"compute-core-form\"> NeuronCores needed for LLM Inference</option>\n                    </select>\n\n                </div>\n            </div>\n\n\n       <form id=\"compute-core-form\" method=\"get\" onkeydown=\"return event.key != 'Enter';\"  onsubmit=\"submitComputeCoreForm(); return false;\"> \n\n            <h2> Number of NeuronCores needed for LLM Inference</h2>\n            <span style=\"margin-bottom:5px;\">Please enter model configurations (You can enter multiple values of each hyperparameter. Press enter after adding each value in the text field) </span>\n\n            \n                 <div class=\"form-group row\">\n                    <label for=\"model-select\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Model: </label>\n                   <div class=\"col-sm-8\">\n\n                        <select class=\"form-control\" id=\"model-select\" style=\"width:100%;\">\n                            <option value=\"custom-llm-model\"> Custom LLM Model </option>\n\n                            <optgroup label=\"Sample Model Configuration\" class=\"font-weight-bold\" >\n                                <option value=\"opt-66b\"> opt-66b </option>\n                                <option value=\"meta-llama/llama-2-7b\"> meta-llama/Llama-2-7b </option>\n                                <option value=\"meta-llama/llama-2-13b\"> meta-llama/Llama-2-13b </option>\n                            </optgroup>\n                        </select>\n                    </div>\n                \n              </div>\n\n\n                 <div class=\"form-group row\">\n                    <label for=\"instance-type\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Instance Type: </label>\n                        <div class=\"col-sm-8\">\n                    <select class=\"form-control\" id=\"instance-type\" >\n                        <option value=\"Inf2\"> Inf2 </option>\n                        <option value=\"Trn1\"> Trn1 </option>\n                    </select>\n                </div>\n            </div>\n\n\n            <div class=\"form-group row\">\n                <label for=\"data-type\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Data Type: </label>\n                <div class=\"col-sm-8\">\n                    <select class=\"form-control\" id=\"data-type\" >\n                        <option value=\"BF16 / FP16\" selected> BF16 / FP16 </option>\n                    </select>\n                </div>\n            </div>\n\n\n             <div class=\"form-group row\">\n            
    <label for=\"batch-size\" title=\"Enter the Batch Size\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Batch Size:\n                 <!-- <i class=\"fa fa-info-circle\" aria-hidden=\"true\"> </i> -->\n                 </label>\n                <div class=\"col-sm-8\">\n                    <input type=\"text\" class=\"form-control\" id=\"batch-size\" placeholder=\"\" >        \n                </div>\n            </div>\n\n       \n\n            <div class=\"form-group row\">\n                <label for=\"max-sequence-length\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Max Sequence Length:</label>\n                <div class=\"col-sm-8\">\n                    <input type=\"text\" class=\"form-control\" id=\"max-sequence-length\" placeholder=\"\" >\n                </div>\n            </div>\n\n\n            <div class=\"form-group row\">\n                <label for=\"num-embeddings\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Embedding Dimension:</label>\n                <div class=\"col-sm-8\">\n                    <input type=\"text\" class=\"form-control\" id=\"num-embeddings\" placeholder=\"\" >\n                </div>\n            </div>\n\n            <div class=\"form-group row\">\n                <label for=\"num-attention-heads\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Number of Attention Heads:</label>\n                <div class=\"col-sm-8\">\n                    <input type=\"text\" class=\"form-control\" id=\"num-attention-heads\" placeholder=\"\" >\n                </div>\n            </div>\n\n\n            <div class=\"form-group row\">\n                <label for=\"num-layers\" class=\"col-sm-3 col-form-label fw-bold text-end\"> Number of Layers:</label>\n                <div class=\"col-sm-8\">\n                    <input type=\"text\" name=\"num-layers\" class=\"form-control\" id=\"num-layers\" >              \n                </div>\n            </div>\n    \n        \n            <div class=\"form-group row\">\n                <div class=\"form-check\">\n                    <input type=\"checkbox\" style=\"width:20px;height:20px;margin-left:5px;\" name=\"num-attention-heads-divisible\" class=\"form-check-input\" id=\"num-attention-heads-divisible\" >              \n                    <label for=\"num-attention-heads-divisible\" class=\"form-check-label\" style=\"margin-left:35px;\"> Tensor Parallel Degree Constraint (Flexible tensor parallelism (TP) is not supported for certain models like GPT-J and GPT-NeoX in transformers-neuronx. Checking this box will flag a TP degree as invalid if the number of attention heads is not divisible by it.) </label>\n                </div>\n            </div>\n\n\n\n            <div id=\"warningMessage\" class=\"alert alert-warning text-danger\" style=\"display:none;\"> Invalid model configurations entered. Each text field accepts multiple values. 
Please press Enter after adding a new value to the text field.</div>\n\n\n            <div id=\"submit-button-row\" class=\"form-group row\">\n                <div class=\"col-sm-9 offset-sm-3 text-center\">\n                   <div class=\"mt-3\">\n                        <button  id=\"submit-button\" type=\"submit\" class=\"btn btn-primary ml-25\"> Submit</button>\n                    </div>\n                </div>\n            </div>        \n\n\n\n        </div>\n    \n    </form>\n\n      \n        <div id=\"calculator-result\" style=\"margin-bottom:50px;\"> </div>\n\n        <table id=\"results-table\" class=\"table\" style=\"display:none;\">\n            <thead>\n                <tr>\n                    <th> Batch Size </th>\n                    <th> Max Seq Length </th>\n                    <th> Embedding Dimension </th>\n                    <th> Num Attention Heads </th>\n                    <th> Num Layers </th>\n                    <th> Memory Footprint (GB)</th>\n                    <th> TP Degree(NeuronCores) </th>\n                    <th> Instances Recommended </th>\n                </tr>\n            </thead>\n            <tbody id=\"results-body\">\n            </tbody>\n       </table>\n\n\n     \n\n        <div id=\"reset-button-row\" class=\"form-group row\" style=\"display:none;margin-bottom:50px;\">\n            \n               <div class=\"col-sm-9 offset-sm-2 text-center\">\n                <div class=\"mt-3\">\n                    <button  id=\"edit-button\"  class=\"btn btn-primary ml-15 mr-3\"> Edit Model Configuration</button>  \n                    <button  id=\"reset-button\"  class=\"btn btn-primary ml-25\"> Reset Calculator</button>  \n\n                    <!--<button  id=\"reset-button\"  class=\"btn btn-primary ml-25\"> Reset Calculator</button> -->\n\n                </div>\n             </div>\n            \n        </div>  \n\n\n        <style>\n           .choices__list--multiple .choices__item {\n                background-color: #6c757d;\n                color: #ffffff;\n           }\n          \n          .table {\n                border-top: 1px solid black;\n                margin-top: 20px;\n                margin-bottom: 20px;\n          } \n          .green-row {\n             background-color: #c8e6c9 ;\n             }\n\n           .table-row {\n              border-bottom: 1px solid black;\n              } \n\n\n           </style>\n\n.. 
raw:: html\n\n    \n    <script>\n\n        var numLayersField ;\n        var batchSizeField;\n        var maxSequenceLengthField;\n        var numEmbeddingDimensionField;\n        var numAttentionHeadsField;\n\n\n        var modelSelectSavedHTML = \"\";\n        var modelSelectSavedValue = \"\";\n        var instanceTypeSavedHTML = \"\";\n        var instanceTypeSavedValue = \"\";\n        var dataTypeSavedHTML = \"\";\n        var dataTypeSavedValue = \"\";\n\n\n        document.addEventListener('DOMContentLoaded', function() {\n\n            numLayersField = new Choices('#num-layers' , { \n               maxItemCount: 8,\n               valueField: 'id',\n               labelField: 'title',\n               searchField: 'title',\n               shouldSort: false ,\n               searchEnabled: false ,\n               create: true ,\n               removeItemButton:true,\n               duplicateItems: false,\n            });\n\n            batchSizeField = new Choices('#batch-size' , { \n               maxItemCount: 8,\n               valueField: 'id',\n               labelField: 'title',\n               searchField: 'title',\n               shouldSort: false ,\n               searchEnabled: false ,\n               create: true ,\n               removeItemButton:true,\n               duplicateItems: false,\n            });\n\n            maxSequenceLengthField = new Choices('#max-sequence-length' , { \n               maxItemCount: 8,\n               valueField: 'id',\n               labelField: 'title',\n               searchField: 'title',\n               shouldSort: false ,\n               searchEnabled: false ,\n               create: true ,\n               removeItemButton:true,\n               duplicateItems: false,\n            });\n\n            numEmbeddingDimensionField = new Choices('#num-embeddings' , { \n               maxItemCount: 8,\n               valueField: 'id',\n               labelField: 'title',\n               searchField: 'title',\n               shouldSort: false ,\n               searchEnabled: false ,\n               create: true ,\n               removeItemButton:true,\n               duplicateItems: false,\n            });\n\n\n            numAttentionHeadsField = new Choices('#num-attention-heads' , { \n               maxItemCount: 8,\n               valueField: 'id',\n               labelField: 'title',\n               searchField: 'title',\n               shouldSort: false ,\n               searchEnabled: false ,\n               create: true ,\n               removeItemButton:true,\n               duplicateItems: false,\n            });\n\n        });\n\n\n        function modelSelectOnChangeHandler() {\n             var modelSelected=$(this).val();\n                if(modelSelected=='opt-66b'){\n\n                    batchSizeField.clearStore();\n                    batchSizeField.setValue([{value: '16', label: '16'},]);\n\n                    maxSequenceLengthField.clearStore();\n                    maxSequenceLengthField.setValue([{value: '2048', label: '2048'},]);\n\n                    \n                    numEmbeddingDimensionField.clearStore();\n                    numEmbeddingDimensionField.setValue([{value: '9216', label: '9216'},]);\n\n\n                    numLayersField.clearStore();\n                    numLayersField.setValue([{value: '64', label: '64'},]);\n\n\n                    numAttentionHeadsField.clearStore();\n                    numAttentionHeadsField.setValue([{value: '72', label: '72'},]);\n\n                }\n                
else if(modelSelected == 'meta-llama/llama-2-7b'){\n\n                    batchSizeField.clearStore();\n                    batchSizeField.setValue([{value: '16', label: '16'},]);\n\n                    maxSequenceLengthField.clearStore();\n                    maxSequenceLengthField.setValue([{value: '4096', label: '4096'},]);\n\n                    \n                    numEmbeddingDimensionField.clearStore();\n                    numEmbeddingDimensionField.setValue([{value: '4096', label: '4096'},]);\n\n\n                    numLayersField.clearStore();\n                    numLayersField.setValue([{value: '32', label: '32'},]);\n\n\n                    numAttentionHeadsField.clearStore();\n                    numAttentionHeadsField.setValue([{value: '32', label: '32'},]);\n\n                }\n                else if(modelSelected == 'meta-llama/llama-2-13b'){\n\n                    batchSizeField.clearStore();\n                    batchSizeField.setValue([{value: '16', label: '16'},]);\n\n                    maxSequenceLengthField.clearStore();\n                    maxSequenceLengthField.setValue([{value: '4096', label: '4096'},]);\n\n                    \n                    numEmbeddingDimensionField.clearStore();\n                    numEmbeddingDimensionField.setValue([{value: '5120', label: '5120'},]);\n\n\n                    numLayersField.clearStore();\n                    numLayersField.setValue([{value: '40', label: '40'},]);\n\n\n                    numAttentionHeadsField.clearStore();\n                    numAttentionHeadsField.setValue([{value: '40', label: '40'},]);\n\n                }\n                else if(modelSelected=='custom-llm-model')\n                {\n                    batchSizeField.clearStore();\n                    maxSequenceLengthField.clearStore();\n                    numEmbeddingDimensionField.clearStore();\n                    numLayersField.clearStore();\n                    numAttentionHeadsField.clearStore();\n\n\n                }\n                else if(modelSelected=='import-hf-model')\n                {\n                    var hfDivLabel= document.getElementById('hf-model-url-label');\n                    hfDivLabel.style.display = 'block';\n\n                    var hfDiv= document.getElementById('hf-model-url-input-field');\n                    hfDiv.style.display = 'block';\n\n                    var hfDivButton= document.getElementById('hf-model-url-import-button');\n                    hfDivButton.style.display = 'block';\n                  \n                    document.getElementById(\"hf-model-url-import-button\").addEventListener(\"click\",function() { processHFImport(); } );\n\n                }\n\n\n        }\n\n\n        $(document).ready(function() {\n\n   \n            $('#formSelect').on('change',function() {\n                var form=$(this).val();\n                if(form=='compute-core-form'){\n                    $('#compute-core-form').show();\n                   // $('#batch-size-form').hide();\n                }\n                //else if(form=='batch-size-form'){\n                //    $('#batch-size-form').show();\n                //    $('#compute-core-form').hide();\n                //}\n            });\n            \n\n            $('#model-select').on('change',modelSelectOnChangeHandler);\n\n\n\n\n\n     $('#compute-core-form').show();\n            $('#batch-size-form').hide();\n\n        });\n                function processHFImport() {\n                        var hfModelURL= $(\"#hf-model-url\").val();\n     
                   var hfModelJSONURL = hfModelURL + \"/raw/main/config.json\";\n\n                        var xhr = new XMLHttpRequest();\n                        xhr.open('GET',hfModelJSONURL,true);\n\n                        xhr.onload = function() {\n                          if(xhr.status==200)\n                          {\n                                var data = JSON.parse(xhr.responseText);\n\n                                var numLayersVal = -1;\n                                var numEmbeddingsVal = -1;\n\n                                if ('n_layer' in data)\n                                {\n                                    numLayersVal = data.n_layer;\n                                }\n\n\n                                if ('hidden_size' in data)\n                                {\n                                    numEmbeddingsVal = data.hidden_size;\n                                }\n\n\n                                if(numLayersVal > -1)\n                                {\n                                    numLayersField.clearStore();\n                                    numLayersField.setValue([{value: numLayersVal, label: numLayersVal},]);\n                                }\n\n\n                                if(numEmbeddingsVal > -1)\n                                {\n                                    numEmbeddingDimensionField.clearStore();\n                                    numEmbeddingDimensionField.setValue([{value: numEmbeddingsVal, label: numEmbeddingsVal},]);\n                                }\n\n\n\n                          }\n                        };\n\n                        xhr.send();    \n\n                }\n        \n                function submitComputeCoreForm() {\n\n\n                    require(['mathjs'], function(math) {\n\n                 \n\n\n                    batchSizeVals = batchSizeField.getValue(true);\n                    maxSequenceLengthVals = maxSequenceLengthField.getValue(true);\n                    numEmbeddingDimensionVals = numEmbeddingDimensionField.getValue(true);\n                    numAttentionHeadsVals = numAttentionHeadsField.getValue(true);\n\n\n                    var numAttentionHeadsDivisibleField = document.getElementById(\"num-attention-heads-divisible\");\n                    attentionHeadsConstraint = false;\n\n                    if(numAttentionHeadsDivisibleField.checked)\n                    {\n                        attentionHeadsConstraint = true;\n                    }\n\n\n                    numLayersVals = numLayersField.getValue(true);\n\n                    const dataTypeSelected= $(\"#data-type\").val();\n                    const dTypeSize = math.bignumber(2);\n                    \n                    const instanceTypeSelected = $(\"#instance-type\").val();\n\n                    const modelSelected= $(\"#model-select\").val();\n\n                    //var hfModelURL= $(\"#hf-model-url\").val();\n\n                    var calculatorResultStr = '';\n\n                    var resultsTable = document.getElementById('results-table');\n                    var resultsBody = document.getElementById('results-body');\n                    var warningMessage = document.getElementById('warningMessage')\n\n                \n                   var inf2Cores = {'Inf2.xlarge':2 , 'Inf2.8xlarge':2 , 'Inf2.24xlarge':12 , 'Inf2.48xlarge':24 };\n                   var inf2Keys = Object.keys(inf2Cores);\n\n                   var trn1Cores = { 'Trn1.2xlarge':2 , 'Trn1.32xlarge':32};\n            
       var trn1Keys = Object.keys(trn1Cores);\n             \n\n\n                    if(batchSizeVals=== null || numEmbeddingDimensionVals=== null || maxSequenceLengthVals=== null || numLayersVals=== null || (batchSizeVals.length === 0) || (maxSequenceLengthVals.length === 0) || (numEmbeddingDimensionVals.length === 0) || (numAttentionHeadsVals.length === 0) || (numLayersVals.length  === 0) )\n                    {\n                         event.preventDefault();\n                         warningMessage.style.display = 'block';\n                         return false;\n                    }\n\n\n                    rowBackgroundColor = \"#f5f5f5\" ;\n\n                    for(let i=0; i<batchSizeVals.length;  i++) {\n                        for(let j=0; j<maxSequenceLengthVals.length;  j++) {\n                            for(let k=0; k<numEmbeddingDimensionVals.length;  k++) {\n                                for(let m=0; m<numAttentionHeadsVals.length;  m++) {\n                                    for(let l=0; l<numLayersVals.length;  l++) {\n                                        \n                                           \n                                          rowBackgroundColor = (rowBackgroundColor === \"#f5f5f5\") ? \"#e0e0e0\" : \"#f5f5f5\"\n                                        \n\n                                           batchSize = math.bignumber(parseInt(batchSizeVals[i]));\n                                           maxSequenceLength = math.bignumber(parseInt(maxSequenceLengthVals[j]));\n                                           numEmbeddings = math.bignumber(parseInt(numEmbeddingDimensionVals[k]));\n                                           numLayers = math.bignumber(parseInt(numLayersVals[l]));\n                                           \n                                           numAttentionHeads =  math.bignumber(parseInt(numAttentionHeadsVals[m]));\n\n\n                                           weightMemFootPrintBytes = math.multiply(12,numLayers,math.pow(numEmbeddings,2),dTypeSize);\n                                           weightMemFootPrintGB = math.divide(weightMemFootPrintBytes,math.pow(1024,3))\n\n\n                                           kvCacheMemFootPrintBytes = math.multiply(batchSize,numLayers,maxSequenceLength,numEmbeddings,2,dTypeSize);\n                                           kvCacheMemFootPrintGB = math.divide(kvCacheMemFootPrintBytes,math.pow(1024,3))\n\n                                          \n                                           memFootPrintGB = math.add(weightMemFootPrintGB,kvCacheMemFootPrintGB);\n                                           memFootPrintGBRounded = math.ceil(memFootPrintGB)\n\n                                           numCoresCeiled = math.ceil(math.divide(memFootPrintGB,16));\n\n\n                                            if(isNaN(batchSize) || isNaN(numEmbeddings) || isNaN(maxSequenceLength) || isNaN(numLayers) || batchSize<=0 || numEmbeddings<=0 || numAttentionHeads<=0 || maxSequenceLength<=0 || numLayers<=0 )\n                                            {\n                                                event.preventDefault();\n                                                warningMessage.style.display = 'block';\n                                                return false;\n                                            }\n                                            else\n                                            {\n                                                warningMessage.style.display = 
'none';\n\n                                            }\n\n\n\n                                            var neuronCoresNeeded = -1\n\n                                            var tensorParallelDegreesSupported = [];\n                                            if (instanceTypeSelected == 'Trn1') {\n                                                tensorParallelDegreesSupported = [2,8,32];\n                                            }\n                                            else if (instanceTypeSelected == 'Inf2') {\n                                                tensorParallelDegreesSupported = [2,4,8,12,24];\n                                            }\n\n\n                                            //alert(\"tensor parallel degrees supported on instance:\" + tensorParallelDegreesSupported);\n\n                                            //alert(\"num cores ceiled:\" + numCoresCeiled)\n\n\n                                            var tpDegreesPossible = [];\n\n                                            for (let p=0; p < tensorParallelDegreesSupported.length; p++) {\n                                                if(numCoresCeiled <= tensorParallelDegreesSupported[p]) {\n                                                    neuronCoresNeeded = tensorParallelDegreesSupported[p];\n                                                    \n                                                    for(let q=p ; q < tensorParallelDegreesSupported.length; q++) {\n                                                          var curPossibleTPDegree = tensorParallelDegreesSupported[q]\n                                                          tpDegreesPossible.push(curPossibleTPDegree);\n                                                    }\n\n                                                    break;\n                                                }\n                                            }\n\n\n                                            //alert(\"tp degrees possible:\" + tpDegreesPossible)\n\n                                            if(tpDegreesPossible.length == 0)\n                                            {\n\n                                                var row = document.createElement('tr');\n                                                row.style.backgroundColor = rowBackgroundColor\n\n                                                var batchSizeCell = document.createElement('td');\n                                                var maxSequenceLengthCell = document.createElement('td');\n                                                var numEmbeddingsCell = document.createElement('td');\n                                                var numAttentionHeadsCell = document.createElement('td');\n                                                var numLayersCell = document.createElement('td');\n                                                var memoryFootprintCell = document.createElement('td');\n                                                var numCoresCell = document.createElement('td');\n                                                var instancesSupportedCell = document.createElement('td');\n\n\n                                                batchSizeCell.textContent =  batchSize;\n                                                maxSequenceLengthCell.textContent =  maxSequenceLength;\n                                                numEmbeddingsCell.textContent =  numEmbeddings;\n                                                numAttentionHeadsCell.textContent = 
numAttentionHeads;\n                                                numLayersCell.textContent =  numLayers;\n\n                                                memoryFootprintCell.textContent = memFootPrintGBRounded\n\n\n                                                numCoresCell.textContent = \"N/A\";\n\n                                                instancesSupportedCell.style.color = 'red' ;\n\n                                                instancesSupportedCellStr = \"Does not fit in Single Instance. Multiple Instances needed\"\n\n\n                                                var rawHtmlElement = document.createElement('div');\n                                                rawHtmlElement.innerHTML = instancesSupportedCellStr\n                                                var sphinxHTMLString = '\\n\\n    ' + rawHtmlElement.outerHTML;\n                                                instancesSupportedCell.innerHTML = sphinxHTMLString\n\n                                                row.classList.add('table-row');\n\n\n                                                row.appendChild(batchSizeCell);\n                                                row.appendChild(maxSequenceLengthCell);\n                                                row.appendChild(numEmbeddingsCell);\n                                                row.appendChild(numAttentionHeadsCell);\n                                                row.appendChild(numLayersCell);\n                                                row.appendChild(memoryFootprintCell);\n                                                row.appendChild(numCoresCell);\n                                                row.appendChild(instancesSupportedCell);\n\n                                                //alert(row.innerHTML)\n\n                                                resultsBody.appendChild(row);\n\n\n\n                                            }\n                                            \n                                            \n                                    \n                                            for (let q=0;q<tpDegreesPossible.length; q++)\n                                            {\n                                                var row = document.createElement('tr');\n\n                                                row.style.backgroundColor = rowBackgroundColor\n\n                                                var batchSizeCell = document.createElement('td');\n                                                var maxSequenceLengthCell = document.createElement('td');\n                                                var numEmbeddingsCell = document.createElement('td');\n                                                var numAttentionHeadsCell = document.createElement('td');\n                                                var numLayersCell = document.createElement('td');\n                                                var memoryFootprintCell = document.createElement('td');\n                                                var numCoresCell = document.createElement('td');\n                                                var instancesSupportedCell = document.createElement('td');\n\n                                                batchSizeCell.textContent =  batchSize;\n                                                maxSequenceLengthCell.textContent =  maxSequenceLength;\n                                                numEmbeddingsCell.textContent =  numEmbeddings;\n                                      
          numAttentionHeadsCell.textContent = numAttentionHeads;\n                                                numLayersCell.textContent =  numLayers;\n\n                                                memoryFootprintCell.textContent = memFootPrintGBRounded\n\n                                                var instancesSupportedCellStr = \"\";\n\n                                                var tpDegree = tpDegreesPossible[q]\n                                                if(neuronCoresNeeded>0 && (!attentionHeadsConstraint || (numAttentionHeads % tpDegree === 0)))\n                                                {\n                                                    numCoresCell.textContent = tpDegree\n\n                                                    if(instanceTypeSelected === \"Inf2\")\n                                                    {\n                                                    for(var p=0; p<inf2Keys.length ; p++) {\n                                                    var inf2InstanceSize = inf2Keys[p];\n                                                    var instanceCores = inf2Cores[inf2InstanceSize]\n                                                    if(instanceCores>=tpDegree)\n                                                    {\n                                                            if(instancesSupportedCellStr.length >0)  instancesSupportedCellStr += \"<br>\";\n                                                            instancesSupportedCellStr += inf2InstanceSize\n                                                    } \n                                                    }\n                                                    }\n                                                    else if(instanceTypeSelected === \"Trn1\")\n                                                    {\n                                                    for(var p=0; p<trn1Keys.length ; p++) {\n                                                            var trn1InstanceSize = trn1Keys[p];\n                                                            var instanceCores = trn1Cores[trn1InstanceSize]\n                                                            if(instanceCores>=tpDegree)\n                                                            {\n                                                                    if(instancesSupportedCellStr.length >0)  instancesSupportedCellStr += \"<br>\";\n                                                                    instancesSupportedCellStr += trn1InstanceSize\n                                                            } \n                                                        }\n                                                    } \n\n\n                                                    \n                                                    if(instancesSupportedCellStr.length>0)\n                                                    {\n                                                        if(tpDegreesPossible.length>1)\n                                                        {\n                                                            if(q === 0)\n                                                            {\n                                                                numCoresCell.textContent = numCoresCell.textContent + \"(Min NeuronCores Reqd)\";\n                                                            }\n                                                            else if(q === (tpDegreesPossible.length-1))\n    
                                                        {\n                                                                numCoresCell.textContent = numCoresCell.textContent + \"(Best Latency)\";\n                                                            }\n                                                            else\n                                                            {\n                                                                numCoresCell.textContent = numCoresCell.textContent\n\n                                                            }\n                                                        }\n                                                        else\n                                                        {\n                                                            numCoresCell.textContent = numCoresCell.textContent + \"(Min NeuronCores Reqd)\";\n\n                                                        }\n                                                    }\n                                                    else\n                                                    {\n\n                                                        numCoresCell.textContent = numCoresCell.textContent;\n\n                                                    }\n\n                                                    instancesSupportedCell.textContent = \"Inf2 or Trn1 instances\"\n                                                // row.classList.add('green-row')\n                                                }\n                                                else if((neuronCoresNeeded>0 && attentionHeadsConstraint && (numAttentionHeads % tpDegree != 0)))\n                                                {\n\n                                                    numCoresCell.textContent = tpDegree\n\n                                                    instancesSupportedCell.style.color = 'red' ;\n\n\n                                                    instancesSupportedCellStr = \"TP degree not supported. Number of attention heads must be divisible by TP degree.\"\n                                                    // row.classList.add('red-row')\n\n\n                                                }\n                                                else{\n\n                                                    numCoresCell.textContent = \"N/A\";\n\n                                                    instancesSupportedCell.style.color = 'red' ;\n\n\n                                                    instancesSupportedCellStr = \"Does not fit in Single Instance. 
Multiple Instances needed\"\n                                                    // row.classList.add('red-row')\n\n                                                }\n\n\n                                                var rawHtmlElement = document.createElement('div');\n                                                rawHtmlElement.innerHTML = instancesSupportedCellStr\n                                                var sphinxHTMLString = '\\n\\n    ' + rawHtmlElement.outerHTML;\n                                                instancesSupportedCell.innerHTML = sphinxHTMLString\n\n                                                row.classList.add('table-row');\n\n\n                                                row.appendChild(batchSizeCell);\n                                                row.appendChild(maxSequenceLengthCell);\n                                                row.appendChild(numEmbeddingsCell);\n                                                row.appendChild(numAttentionHeadsCell);\n                                                row.appendChild(numLayersCell);\n                                                row.appendChild(memoryFootprintCell);\n                                                row.appendChild(numCoresCell);\n                                                row.appendChild(instancesSupportedCell);\n\n                                                //alert(row.innerHTML)\n\n                                                resultsBody.appendChild(row);\n                                            }\n\n                                        }\n\n                                    }\n                                }\n                            }\n                        }\n\n\n\n                    $('#submit-button-row').hide();\n                    $('#neuron-calculator-select').hide();\n\n                    var calculatorResult = document.getElementById('calculator-result');\n                    calculatorResult.innerHTML = \"For more details on how the number of NeuronCores is computed, please refer to the <a href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.html#how-many-neuroncores-do-i-need'>LLM Inference App Note</a>\";\n                    //$('#calculator-result').replaceWith(\"For more details on how the number of min NeuronCores are computed, please refer to the <a href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.html#how-many-neuroncores-do-i-need'>LLM Inference App Note</a>\");\n\n                    \n                    resultsTable.style.display = 'block';\n                     \n          \n\n                    $('#reset-button-row').show();\n\n\n                    batchSizeField.disable();\n                    maxSequenceLengthField.disable();\n                    numEmbeddingDimensionField.disable();\n                    numLayersField.disable();\n                    numAttentionHeadsField.disable();\n\n                    var numAttentionHeadsDivisibleField = document.getElementById(\"num-attention-heads-divisible\");\n                    numAttentionHeadsDivisibleField.disabled = true;\n\n                    const choiceCloseButtons = document.querySelectorAll('.choices__button');\n                    choiceCloseButtons.forEach(button => { button.disabled=true;} );\n\n\n\n                    modelSelectSavedHTML =  
document.getElementById(\"model-select\").outerHTML;\n                    modelSelectSavedValue = modelSelected\n                    instanceTypeSavedHTML =  document.getElementById(\"instance-type\").outerHTML;\n                    instanceTypeSavedValue = instanceTypeSelected\n                    dataTypeSavedHTML =  document.getElementById(\"data-type\").outerHTML;\n                    dataTypeSavedValue = dataTypeSelected\n\n                   // $('#hf-model-url').replaceWith('<span id=\"hf-model-url\" class=\"readonly-text\" style=\"margin-top:5px;display:flex;\">' + hfModelURL + '</span>');\n                    $('#model-select').replaceWith('<span id=\"model-select\" class=\"readonly-text\" style=\"margin-top:5px;display:flex;\">' + modelSelected + '</span>');\n                    $('#instance-type').replaceWith('<span id=\"instance-type\" class=\"readonly-text\" style=\"margin-top:5px;display:flex;\" >' + instanceTypeSelected + '</span>');\n                     $('#data-type').replaceWith('<span id=\"data-type\" class=\"readonly-text\" style=\"margin-top:5px;display:flex;\" >' + dataTypeSelected + '</span>');\n                  \n\n                  \n\n\n                    });\n\n                    return false;\n                \n    }\n\n\n\n        \n\n         function editNeuronCalculatorConfiguration() {\n                    batchSizeField.enable();\n                    maxSequenceLengthField.enable();\n                    numEmbeddingDimensionField.enable();\n                    numLayersField.enable();\n                    numAttentionHeadsField.enable();\n\n                    var numAttentionHeadsDivisibleField = document.getElementById(\"num-attention-heads-divisible\");\n                    numAttentionHeadsDivisibleField.disabled = false;\n\n                    const choiceCloseButtons = document.querySelectorAll('.choices__button');\n                    choiceCloseButtons.forEach(button => { button.disabled=false;} );\n\n\n                    $('#reset-button-row').hide();\n\n                    $('#submit-button-row').show();\n                    $('#neuron-calculator-select').show();\n\n\n                    document.getElementById('model-select').outerHTML = modelSelectSavedHTML;\n                    document.getElementById('model-select').value = modelSelectSavedValue;\n\n            \n                    document.getElementById('model-select').addEventListener(\"change\",modelSelectOnChangeHandler)\n\n\n\n                    document.getElementById('instance-type').outerHTML = instanceTypeSavedHTML;\n                    document.getElementById('instance-type').value = instanceTypeSavedValue;\n\n                    document.getElementById('data-type').outerHTML = dataTypeSavedHTML;\n                    document.getElementById('data-type').value = dataTypeSavedValue;\n\n                    var calculatorResult = document.getElementById('calculator-result');\n                    calculatorResult.innerHTML = \"\";\n\n                    var resultsTable = document.getElementById('results-table');\n\n                    for (var i=resultsTable.rows.length-1; i>0; i--) \n                    {\n                        resultsTable.deleteRow(i);\n                    }\n                    resultsTable.style.display = 'none';\n                 \n\n\n           \n         }\n\n\n         function resetNeuronCalculator() {\n\n            location.reload();\n\n\n         }\n\n          document.getElementById(\"edit-button\").addEventListener(\"click\",function() { 
editNeuronCalculatorConfiguration(); } );\n          document.getElementById(\"reset-button\").addEventListener(\"click\",function() { resetNeuronCalculator(); } );\n\n\n        $(function() {\n            $('[data-toggle=\"tooltip\"]').tooltip();\n            }\n        );\n\n    </script>\n\n"
  },
  {
    "path": "about-neuron/faq/contributing-faq.rst",
    "content": ".. _contribute-faq:\n\nContributing Guidelines FAQs\n============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhether it's\na bug report, new feature, correction, or additional documentation, we\ngreatly value feedback and contributions from our community.\n\nPlease read through this document before submitting any issues or pull\nrequests to ensure we have all the necessary information to effectively\nrespond to your bug report or contribution.\n\nHow to reporting Bugs/Feature Requests\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe welcome you to use the GitHub issue tracker to report bugs or suggest\nfeatures.\n\nWhen filing an issue, please check existing open, or recently closed,\nissues to make sure somebody else hasn't already reported the issue.\nPlease try to include as much information as you can. Details like these\nare incredibly useful:\n\n-  A reproducible test case or series of steps\n-  The version of our code being used\n-  Any modifications you've made relevant to the bug\n-  Anything unusual about your environment or deployment\n\nContributing via Pull Requests\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nContributions via pull requests are much appreciated. Before sending us\na pull request, please ensure that:\n\n1. You are working against the latest source on the *master* branch.\n2. You check existing open, and recently merged, pull requests to make\n   sure someone else hasn't addressed the problem already.\n3. You open an issue to discuss any significant work - we would hate for\n   your time to be wasted.\n\nTo send us a pull request, please:\n\n1. Fork the repository.\n2. Modify the source; please focus on the specific change you are\n   contributing. If you also reformat all the code, it will be hard for\n   us to focus on your change.\n3. Ensure local tests pass.\n4. Commit to your fork using clear commit messages.\n5. Send us a pull request, answering any default questions in the pull\n   request interface.\n6. Pay attention to any automated CI failures reported in the pull\n   request, and stay involved in the conversation.\n\nGitHub provides additional document on `forking a\nrepository <https://help.github.com/articles/fork-a-repo/>`__ and\n`creating a pull\nrequest <https://help.github.com/articles/creating-a-pull-request/>`__.\n\nHow to find contributions to work on\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nLooking at the existing issues is a great way to find something to\ncontribute on. As our projects, by default, use the default GitHub issue\nlabels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix),\nlooking at any 'help wanted' issues is a great place to start.\n\nWhat is the code of conduct\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis project has adopted the `Amazon Open Source Code of\nConduct <https://aws.github.io/code-of-conduct>`__. 
For more information\nsee the `Code of Conduct\nFAQ <https://aws.github.io/code-of-conduct-faq>`__ or contact\nopensource-codeofconduct@amazon.com with any additional questions or\ncomments.\n\nHow to report a security issue\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you discover a potential security issue in this project, we ask that\nyou notify AWS/Amazon Security via our `vulnerability reporting\npage <http://aws.amazon.com/security/vulnerability-reporting/>`__.\nPlease do **not** create a public GitHub issue.\n\nWhat is the licensing\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nSee the `LICENSE-DOCUMENTATION <https://github.com/aws/aws-neuron-sdk/blob/master/LICENSE-DOCUMENTATION>`_ \nand `LICENSE-SUMMARY-DOCS-SAMPLES <https://github.com/aws/aws-neuron-sdk/blob/master/LICENSE-SUMMARY-DOCS-SAMPLES>`_ files\nfor our project's licensing. We will ask you to confirm the licensing of\nyour contribution.\n\nWe may ask you to sign a `Contributor License Agreement\n(CLA) <http://en.wikipedia.org/wiki/Contributor_License_Agreement>`__\nfor larger changes.\n"
  },
  {
    "path": "about-neuron/faq/index.rst",
    "content": ".. _neuron_faq:\n\nOther Neuron FAQs\n=================\n\nFrequently asked questions about AWS Neuron SDK, covering general topics, inference, training, ONNX support, and contributing guidelines. \n\n.. note::\n    This content may not be be up to date as of 2026, and often pertains to older or now-unsupported platforms and components.\n\nGeneral FAQs\n-------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuron2-intro-faq\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron 2.x Introduction FAQ**\n      ^^^\n      Common questions about Neuron 2.x and Trn1 general availability\n\n   .. grid-item-card::\n      :link: onnx-faq\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **ONNX FAQ**\n      ^^^\n      Using ONNX models with AWS Neuron\n\n   .. grid-item-card::\n      :link: contributing-faq\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Contributing Guidelines FAQ**\n      ^^^\n      How to report bugs, request features, and contribute to Neuron\n\nInference FAQs\n---------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: inference/neuron-faq\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Inference with Neuron FAQ**\n      ^^^\n      Common questions about running inference workloads on AWS Neuron\n\n   .. grid-item-card::\n      :link: inference/trouble-shooting-faq\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Troubleshooting for Inf1 FAQ**\n      ^^^\n      Debugging and troubleshooting inference issues on Inf1 instances\n\nTraining FAQs\n-------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: training/neuron-training\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Training with Neuron FAQ**\n      ^^^\n      Common questions about training models on Trainium instances\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Neuron 2.x Introduction FAQ <neuron2-intro-faq>\n   ONNX FAQ <onnx-faq>\n   Contributing Guidelines FAQ <contributing-faq>\n   Inference with Neuron FAQ <inference/neuron-faq>\n   Troubleshooting for Inf1 FAQ <inference/trouble-shooting-faq>\n   Training with Neuron FAQ <training/neuron-training>\n"
  },
  {
    "path": "about-neuron/faq/inference/neuron-faq.rst",
    "content": ".. _neuron-f1-faq:\n\nInference with Neuron - FAQ\n---------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhat ML model types and operators are supported by AWS Neuron?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAWS Neuron includes a compiler that converts your trained machine\nlearning models to a binary object for execution. The Neuron\ncompiler supports many commonly used machine learning operators used in computer vision, natural language processing, recommender engines and more. A list of supported ML operators and supported inputs are in :ref:`neuron-supported-operators` .\n\nIt's important to mention that to get good performance doesn't require all of the model operators to run on the chip. In many cases, some of the operators will continue to run on the instance CPUs, like the case of embeddings or image pre-processing, and will still provide a compelling end to end performance. We call this approach auto-partitioning, where the Neuron compiler optimizes the model execution based on operators that are most suitable to run on the CPU or the chip.\n\nFor the latest model architecture support, please refer to the model architecture fit and performance pages.\n\nWhy is a compiler needed, and how do I use it?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron compiler converts a model from a framework level Neural Network\ngraph, with operators like convolution and pooling, into a\nNeuron Device-specific instruction set, builds the schedule for\nexecution of these instructions, and converts the model parameters into\nformat that the neuron device can consume. The supported input formats include\nTensorFlow, PyTorch, and MXNet. The output from the\ncompiler is a Neuron Executable File Format (NEFF) artifact. The NEFF\ncontains a combination of binary code, the model parameters, and\nadditional meta-data needed by the Neuron runtime and profiler.\n\nI am using a ML framework today – what will change for me to use this?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo use Inferentia within the Inf1 instances, the developer needs to perform one-time compilation\nof the pre-trained model to generate a NEFF, and use this as the inference\nmodel in fleet of Inf1 instances.\n\n-  :doc:`TensorFlow Neuron </archive/tensorflow/index>`\n-  :ref:`neuron-pytorch`\n-  :ref:`neuron-mxnet`\n\nWhat is a NeuronCore Pipeline? How do I take advantage of it?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA NeuronCore Pipeline is a unique technique to shard a specific Neural\nNetwork across multiple NeuronCores, to take advantage of the large\non-chip cache instead of moving data in and out of external memory. The result is an increased throughput and reduce latency\ntypically important for real-time inference applications. All Inf1 instances support it, and the Inf1\ninstances with multiple Inferentia accelerators, such as inf1.6xlarge or\ninf1.24xlarge support it thanks to the fast chip-to-chip interconnect.\n\nDevelopers can choose to use NeuronCore Pipeline mode during compile\nstage, with an opt-in flag. :ref:`neuron-cc` provides further details.\n\nNeuronCores, NeuronCore Groups and NeuronCore Pipelines: What do they do?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEach Inferentia chip has four compute engines called NeuronCores. 
A\nNeuronCore Group is a way to aggregate NeuronCores to increase hardware\nutilization and assign models with the right compute sizing for a\nspecific application. If you want to run multiple models in parallel,\nyou can assign different models to separate NeuronCore Groups. A model\ncompiled to use multiple NeuronCores in a NeuronCore Pipeline can be\nassigned to a NeuronCore Group with enough NeuronCores to load into.\nFinally, it is also possible for sets of Inferentia devices to be mapped\nto separate Neuron Runtimes. The :ref:`neuron-features-index` section has more\ninformation and examples.\n\nCan I use TensorFlow networks from tfhub.dev as-is? If not, what should I do?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYes. Models in this format can be imported into TensorFlow, either as a standard\nmodel-server, in which case it appears as a simple command line utility,\nor via the Python-based TensorFlow environment. The primary additional\nstep needed is to compile the model into Inferentia NEFF format.\n"
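The NeuronCore Pipeline opt-in mentioned on this page can be sketched as follows. This is a minimal, illustrative example rather than the definitive workflow: it assumes a torch-neuron (Inf1) environment, uses a torchvision ResNet-50 purely as a placeholder model, and passes the ``--neuroncore-pipeline-cores`` argument through to the Neuron compiler to request sharding across 4 NeuronCores.

.. code-block:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Placeholder model and input; substitute your own traceable PyTorch model.
   model = models.resnet50(pretrained=True)
   model.eval()
   example_input = torch.rand(1, 3, 224, 224)

   # Opt in to NeuronCore Pipeline at compile time by asking the compiler to
   # shard the model across (for example) 4 NeuronCores.
   neuron_model = torch_neuron.trace(
       model,
       example_inputs=[example_input],
       compiler_args=["--neuroncore-pipeline-cores", "4"],
   )
   neuron_model.save("resnet50_pipeline_neuron.pt")

The compiled model can then be loaded into a NeuronCore Group that has at least as many NeuronCores as the pipeline requested.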
  },
  {
    "path": "about-neuron/faq/inference/trouble-shooting-faq.rst",
    "content": ".. _trouble-shooting-inf1-faq:\n\nTroubleshooting for Inf1 - FAQ\n==============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nPerformance is not what I expect it to be, what's the next step?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPlease check our performance optimization section on performance\ntuning and other notes on how to use pipelining and batching to improve\nperformance.\n\nDo I need to worry about size of model and size of inferentia memory? what problems can I expect to have?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nErrors like this will be logged and can be found as shown\n:ref:`neuron_gatherinfo`.\n\nHow can I debug / profile my inference request?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSee :ref:`neuron-plugin-tensorboard`\n\n\nHow to report Bug/Feature Requests\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe welcome you to use the Neuron GitHub issue tracker to report bugs or suggest\nfeatures.\n\nWhen filing an issue, please check existing open, or recently closed,\nissues to make sure somebody else hasn't already reported the issue.\nPlease try to include as much information as you can. Details like these\nare incredibly useful:\n\n-  A reproducible test case or series of steps\n-  The version of our code being used\n-  Any modifications you've made relevant to the bug\n-  Anything unusual about your environment or deployment\n"
  },
  {
    "path": "about-neuron/faq/neuron2-intro-faq.rst",
    "content": ".. _neuron2-intro-faq:\n\nNeuron 2.x Introduction at Trn1 GA - FAQ\n----------------------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n.. include:: /release-notes/templates/n2.x-trn1-ga-faq.txt\n"
  },
  {
    "path": "about-neuron/faq/onnx-faq.rst",
    "content": ".. _onnx-faq:\n\nONNX FAQ\n---------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nCan I use ONNX models with Neuron ? If not, what should I do?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAWS Neuron does not directly support compilation of models in the ONNX file format. The recommended way to compile a model that is in the ONNX file format is to first convert the model to PyTorch using a publicly available tool\nlike  `onnx2pytorch <https://github.com/ToriML/onnx2pytorch>`_ . Once the ONNX model is converted to PyTorch, it can then be compiled with the :func:`torch_neuron.trace` function to produce a model that can run on Neuron.\n\n\n"
  },
  {
    "path": "about-neuron/faq/roadmap-faq.rst",
    "content": ".. _neuron_roadmap_faq:\n\nRoadmap FAQ\n===========\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nWhy did you build this?\n~~~~~~~~~~~~~~~~~~~~~~~\n\nA: We know that our customers are making decisions and plans based on\nwhat we are developing, and we want to provide them with the right\nvisibility to what we are working on, as well as the opportunity to\nprovide direct feedback.\n\nWhat do the roadmap categories mean?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  **Roadmap Requests** - Requests we received and we are considering to add to the roadmap, \n   this is a great phase to give us feedback and let us know if you need this feature as well.\n-  **Working on it** - In progress, we might still be\n   working through the implementation details, or scoping stuff out.\n   This is a great phase to give us feedback as to how you want to see\n   something implemented. We’ll benefit from your specific use cases\n   here.\n-  **Completed** - Feature complete and supported by Neuron.\n\n\nWhy are there no dates on your roadmap?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA: We are not providing exact target dates for releases because we\nprioritize operational excellence, security and quality over hitting a\nspecific date. If you have an urgent need for a feature, please contact\nus directly at aws-neuron-support@amazon.com.\n\nIs everything on the roadmap?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA: We are focusing on upgrades for existing features, as well as\nbuilding new features. We will keep adding features and capabilities to\nthis roadmap as time progresses.\n\nHow can I provide feedback or ask for more information?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA: When in doubt, please create an issue or post a question on the `AWS\nNeuron support\nforum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`__.\n\nHow can I request a feature be added to the roadmap?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA: We encourage you to open an issue. All community-submitted issues\nwill be reviewed by the roadmap maintainers.\n\nCan I \"+1\" existing issues?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA:We strongly encourage you to do so, as it helps us understand which\nissues will have the widest impact. You can navigate to the issue\ndetails page and add a reaction (thumbs up). There are six types of\nreactions supported (thumbs down “-1”, confused, heart, watching, laugh,\nhooray, and thumbs up +1)."
  },
  {
    "path": "about-neuron/faq/training/neuron-training.rst",
    "content": ".. _neuron-training-faq:\n\nTraining with Neuron - FAQ\n==========================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nCompute\n-------\n\nHow do I get started with training my model on Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOnce you select your machine learning framework, you can get started here: :ref:`docs-quick-links`\n\n\nHow do I setup EFA for multi-node training?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFor setting up EFA that is needed for multi-node training, please see :ref:`setup-trn1-multi-node-execution`\n\n\nHow do I know if I can train my models with Trainium?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe aim to support a broad set of models and distribution libraries. We continuously add more capabilities and enable new features via Neuron SDK releases and suggest you will follow our public roadmap and join our slack and email lists.\n\nHow should I size Trainium NeuronCores vs GPUs?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor simplicity, you should consider each NeuronCore within your instances as an independent deep learning compute engine, the equivalent of a GPU. As point of comparison, a trn1.32xlarge has 32 NeuronCores, and their max performance is 40% higher than of P4d for BF16/FP16/FP8, 2.5X faster for TF32, and 5X faster for FP32. Each NeuronCore is independent and connected to the rest of the NeuronCores within the instance via NeuronLink, and across instances with EFA. Each NeuronCore has also full access to the accelerator memory in the instance, which helps scale large models across NeuronCores using various collective compute ops techniques. \n\nWhat are the time to train advantages of Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhile the answer is largely model dependent, training performance on Trn1 is fast due thanks for multiple system wide optimizations working in concert. Dependent on the data type, you should expect between 1.4-5X higher throughput on Trn1 as compared to the latest GPUs instances (P4d). For distributed workloads, 800Gbps EFA gives customers lower latency, and 2x the throughput as compared to P4d. (a Trn1n 1.6Tb option is coming soon). Each Trainium also has a dedicated collective compute (CC) engine, which enables running the CC ops in parallel to the NeuronCores compute. This enables another 10-15% acceleration of the overall workload. Finally, stochastic rounding enables running at half precision speeds (BF16) while maintaining accuracy at near full precision, this is not only simplifying model development (no need for mixed precision) it also helps the loss function converge faster and reduce memory footprint.\n\nWhat are some of the training performance results for Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThey are great! please refer to the :ref:`benchmark` page for open-source model performance results. We encourage you to try it for your own models/application.\n\nCan I use CUDA libraries with AWS Trainium?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAWS Trainium and Neuron are plugged into popular frameworks, and is automatically optimizing model deployment on Neuron devices like Inferentia and Trainium. The Neuron SDK automatically optimizes for Trainium without using closed source dependencies like Nvidia CUDA, not requiring any application level code changes to accelerate models. We believe this intentional approach allows developers freedom of choice with their code and models. 
If you have application dependencies on CUDA (or other 3rd party closed source artifacts), you will need to strip them out, and from that point the Neuron compiler will take the model as is and optimize it at the hardware level. \n\n\nNetworking\n----------\n\nWhat’s important to know about the networking in Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTrn1 instances have the fastest EFA in AWS; clocked at 800Gbps, they enable more collective communication as compared to other training instances, which is important if your training job spans across multiple servers. You should also expect lower latency as we streamline the communication path between the dedicated collective communication engine on Trainium, and the AWS Nitro EFA NICs.\n\nHow does Trainium accelerate collective communication operations?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTrainium introduces a dedicated collective compute engine that runs in parallel to the compute cores (aka NeuronCores). This improves convergence time of intermediate steps as the communication happens in parallel to the compute. This capability, in addition to the faster and optimized EFA, results in better scalability and faster time to train, as compared to other training instances in AWS.\n\nWhat does Strong/Weak Scaling mean?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo enable strong scaling, we optimized Trainium to be efficient at small batch sizes. Compared to GPUs, Trn1 maintains high efficiency even for small batch sizes. This allows you to scale-out to thousands of devices without increasing the global mini-batch size at the same rate, which in turn leads to faster end-to-end training convergence.\n\nIn a weak scaling setup, we show the optimal throughput with sufficiently large batch size per Trainium. The large batch size is set to leverage the high core utilization so that the overall end-to-end training will be fast. This setup also enables a large global batch size as it scales with the total number of nodes in the cluster.\n\nUsability\n---------\n\nWhat has AWS done to improve usability of Trainium?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStochastic rounding enables running at half precision speeds (BF16) while maintaining accuracy at near full precision. This of course helps the loss function converge faster and reduces the memory footprint, but equally important, it simplifies model development, as you can write your model in FP32, and Neuron/Trainium will auto-cast the model to BF16, and execute it with SR enabled. There is no need to lose accuracy with pure BF16 runs, and more importantly no need for experimenting with mixed precision strategies to find the optimal settings.\n\nEager debug mode provides a convenient utility to step through the code and evaluate operator correctness as part of your model creation/debug. For more details, please refer to the Neuron documentation.\n\nWhat other AWS services work with Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTrn1 via its Neuron SDK supports Amazon ECS, EKS, ParallelCluster, Batch, and Amazon SageMaker. 
Customers can also choose to run in a Neuron container within their self-managed container orchestration service (e.g., Kubernetes and Ray).\n\nWhat tools are available to develop models with Trn1?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running training, evaluation, or inference workloads, you can use Neuron 2.x CLI tools such as neuron-ls and neuron-top to get insights into NeuronCore and NeuronDevice performance and memory utilization, topology, and host vCPU performance and memory utilization. In addition, the Neuron Plugin for TensorBoard provides a standard GUI that enables profiling and debugging of models. TensorBoard views include:\n\n- Model overview: provides a summary of the model and the utilization on the Host and NeuronDevice\n- Operators’ view: provides a breakdown of ML framework and HLO operators on both Host and NeuronDevice\n- Code trace view: shows a timeline of the model execution at the framework and HLO operators level \n- Hardware trace view: shows a timeline of the model execution at the level of hardware (Host, NeuronDevice, Data Transfer)\n- Topology view: shows the NeuronDevices topology within an instance\n\n\nHow will compile time impact my workflow?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe understand compilation is a new step with Trainium, but as long as the overall time to train and cost to train is optimized, the compilation impact on these two metrics is minimized. To further help reduce compilation time impact on usability, Neuron supports a persistent cache, where artifacts that have not changed since the last run can be reused, skipping compilation altogether. For developing and experimenting with new models, you can use the eager debug mode, which compiles (and caches) op-by-op, enabling quick evaluation without compiling large models. We are also working on a Neuron model analyzer (see the Neuron roadmap) that will recommend optimized hyperparameters, skipping full compilation per experiment.\n"
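To make the getting-started answer in this FAQ concrete, here is a minimal training-loop sketch for a torch-neuronx / torch-xla environment on a Trn1 instance. The toy linear model, random data, and hyperparameters are placeholders; the model is written in plain FP32, relying on the auto-cast behavior described in the Usability section.

.. code-block:: python

   import torch
   import torch_xla.core.xla_model as xm

   # The XLA device maps to a NeuronCore on a Trn1 instance.
   device = xm.xla_device()

   # Placeholder model and optimizer; any FP32 PyTorch model works here.
   model = torch.nn.Linear(1024, 1024).to(device)
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
   loss_fn = torch.nn.MSELoss()

   for step in range(10):
       inputs = torch.rand(32, 1024).to(device)    # placeholder training data
       targets = torch.rand(32, 1024).to(device)

       optimizer.zero_grad()
       loss = loss_fn(model(inputs), targets)
       loss.backward()
       optimizer.step()
       xm.mark_step()    # cut the lazily built graph so it is compiled and executed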
  },
  {
    "path": "about-neuron/faq.rst",
    "content": ".. _neuron_faq:\n\n.. meta::\n   :description: Frequently Asked Questions (FAQ) about the AWS Neuron SDK, including topics on Neuron 2.x, training, inference, runtime, compiler, containers, and ONNX support.\n   :date-modified: 2025-10-03\n\nNeuron FAQ\n==========\n\nThis topic provides links to frequently asked questions (FAQs) about the AWS Neuron SDK, organized by Neuron component.\n\nNeuron 2.x FAQ\n--------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuron2-intro-faq\n      :link-type: ref\n\n      **Neuron 2.x Introduction FAQ**\n      ^^^\n      Common questions about Neuron 2.x features and migration\n\nTraining-specific FAQ\n---------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuron-training-faq\n      :link-type: ref\n\n      **Neuron Training FAQ**\n      ^^^\n      Frequently asked questions about training models on Neuron\n\nInference-specific FAQ\n----------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuron-f1-faq\n      :link-type: ref\n\n      **Neuron F1 FAQ**\n      ^^^\n      Questions about F1 instance inference capabilities\n\n   .. grid-item-card::\n      :link: trouble-shooting-inf1-faq\n      :link-type: ref\n\n      **Inf1 Troubleshooting FAQ**\n      ^^^\n      Common ``Inf1`` instance issues and solutions\n\n   .. grid-item-card::\n      :link: neuronperf_faq\n      :link-type: ref\n\n      **NeuronPerf FAQ**\n      ^^^\n      Performance benchmarking tool questions\n\nNeuron Runtime FAQ\n------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuron-runtime-faq\n      :link-type: ref\n\n      **Neuron Runtime FAQ**\n      ^^^\n      Runtime configuration and execution questions\n\nNeuron Compiler FAQ\n-------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: neuronx_compiler_faq\n      :link-type: ref\n\n      **NeuronX Compiler FAQ**\n      ^^^\n      Questions about the NeuronX compiler for Trn1/Inf2\n\n   .. grid-item-card::\n      :link: neuron_compiler_faq\n      :link-type: ref\n\n      **Neuron Compiler FAQ**\n      ^^^\n      Questions about the Neuron compiler for Inf1\n\nNeuron DLCs FAQ\n---------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: container-faq\n      :link-type: ref\n\n      **Neuron Containers FAQ**\n      ^^^\n      Container deployment and configuration questions\n\nSupport\n-------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: contribute-faq\n      :link-type: ref\n\n      **Contribute FAQ**\n      ^^^\n      Questions about contributing to the Neuron project\n"
  },
  {
    "path": "about-neuron/index.rst",
    "content": ".. _about-neuron:\n\nAbout the AWS Neuron SDK\n========================\n\nAWS Neuron is a software development kit (SDK) enabling high-performance deep learning acceleration using AWS Inferentia and Trainium, AWS's custom designed machine learning accelerators. It enables you to develop, profile, and deploy high-performance machine learning workloads on AWS Inferentia and Trainium instances. \n\nThe AWS Neuron SDK includes:\n\n* **Neuron Compiler** - Compiles high-level, framework-based models for optimal performance on Neuron devices\n* **Neuron Kernel Interface (NKI)** - Provides direct compiler access to Neuron device capabilities\n* **Neuron Runtime** - Executes compiled models on Neuron devices\n* **ML Framework integration** - Deep support for PyTorch and JAX\n* **Training and inference libraries** - Distributable training and inference libraries for large-scale models\n* **Deployment support** - Integration with AWS services like SageMaker, EC2, EKS, and ECS\n* **Developer tools** - Profiling, monitoring, and debugging utilities\n\nFor a full list of AWS Neuron features, see :ref:`what-is-neuron`.\n\n.. admonition:: Join our Beta program\n\n   Get early access to new Neuron features and tools! `Fill out this form and apply to join our Beta program <https://pulse.aws/survey/NZU6MQGW?p=0>`__.\n\nWhat is \"NeuronX\"?\n------------------\n\n\"NeuronX\" refers to the next-generation AWS Neuron SDK, which provides enhanced capabilities for both inference and training on AWS Inferentia and Trainium instances. NeuronX includes:\n\n* Support for the latest versions of PyTorch and JAX\n* Advanced compiler optimizations for improved performance\n* Enhanced distributed training libraries for large-scale models\n* Improved profiling and debugging tools\n* Ongoing feature development and support for new instance types\n\nCatch up on the latest Neuron news\n-----------------------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: /about-neuron/whats-new\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **What's New in Neuron**\n      ^^^\n      Read about the latest releases and features of the Neuron SDK\n\n\n\n\nLearn about AWS Neuron\n----------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: /about-neuron/what-is-neuron\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **What is AWS Neuron?**\n      ^^^\n      Short overview of the AWS Neuron SDK and its components\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n   \n   .. grid-item-card::\n      :link: /about-neuron/arch/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron architecture**\n      ^^^\n      Understand the Neuron hardware and software architecture\n\n   .. grid-item-card::\n      :link: /about-neuron/arch/neuron-features/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron features**\n      ^^^\n      Overviews of model development features provided by Neuron\n\n   .. grid-item-card::\n      :link: /frameworks/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Supported ML frameworks**\n      ^^^\n      Neuron support for popular ML frameworks including PyTorch and JAX\n\n   .. grid-item-card::\n      :link: /libraries/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **NeuronX distributed (NxD) libraries**\n      ^^^\n      NeuronX distributed libraries for training and inference\n\n   .. 
grid-item-card::\n      :link: /nki/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron Kernel Interface (NKI)**\n      ^^^\n      NKI is a low-level interface for custom, bare-metal kernel development\n\n   .. grid-item-card::\n      :link: /compiler/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron Compiler**\n      ^^^\n      The Neuron compiler optimizes models for Neuron hardware\n\n   .. grid-item-card::\n      :link: /neuron-runtime/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron Runtime**\n      ^^^\n      Runtime for executing compiled models on Neuron devices\n\n   .. grid-item-card::\n      :link: /tools/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **Neuron developer tools**\n      ^^^\n      Tools for profiling, debugging, and monitoring Neuron applications\n\n   .. grid-item-card::\n      :link: /dlami/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **AWS Neuron Deep Learning AMIs**\n      ^^^\n      Deploy the Neuron SDK on EC2 instances with pre-installed Amazon Machine Images (AMIs)\n\n   .. grid-item-card::\n      :link: /containers/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      **AWS Neuron Deep Learning Containers**\n      ^^^\n      Deploy the Neuron SDK using pre-built Docker deep learning containers (DLCs)\n\nResources\n---------\n\n* :ref:`Setup Guide <setup-guide-index>`\n* :ref:`Release Notes <neuron_release_notes>`\n* :ref:`Neuron FAQ <neuron_faq>`\n* :doc:`Older Neuron FAQs <faq/index>`\n\nSupport\n-------\n\n* :doc:`Neuron Open Source GitHub Repos </about-neuron/oss/index>`\n* :ref:`AWS Neuron SDK maintenance policy <sdk-maintenance-policy>`\n\n.. _contact-us:\n\nContact us\n----------\n\nFor support, submit a request with AWS Neuron `GitHub issues <https://github.com/aws/aws-neuron-sdk/issues>`_ or visit the `Neuron AWS forums <https://forums.aws.amazon.com/forum.jspa?forumID=355>`_ for an answer. \n\nIf you want to request a feature or report a critical issue, you can contact us directly at ``aws-neuron-support@amazon.com``.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   App Notes <appnotes/index>\n   Ask Amazon AI helper tools </about-neuron/amazonq-getstarted>\n   Benchmarks </about-neuron/benchmarks/index>\n   Beta Participation </about-neuron/beta-participation>\n   Model Samples </about-neuron/models/index>\n   Neuron FAQ <faq>\n   Neuron Features </about-neuron/arch/neuron-features/index>\n   Open Source </about-neuron/oss/index>\n   SDK Maintenance Policy <sdk-policy>\n   Security <security>\n   Term Glossary <arch/glossary>\n   Troubleshooting <troubleshooting>\n   What is AWS Neuron? <what-is-neuron>\n   Older Neuron FAQs <faq/index>\n"
  },
  {
    "path": "about-neuron/models/index.rst",
    "content": ".. _model_samples_tutorials:\n\nModel samples and tutorials\n===========================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Training on Trn1 </about-neuron/models/training-trn1-samples>\n    Inference on Inf2/Trn1/Trn2 </about-neuron/models/inference-inf2-trn1-samples>    \n    Inference on Inf1 </about-neuron/models/inference-inf1-samples>\n    \nThis section gives you the consolidated list of code samples and tutorials published by AWS Neuron across documentation \nand various GitHub repositories.\n\n.. card:: Training on Trn1\n    :link: model_samples_training_trn1\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Inference on Inf2, Trn1 and Trn2\n    :link: model_samples_inference_inf2_trn1\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Inference on Inf1\n    :link: model_samples_inference_inf1\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\nFor links to individual GitHub sample repositories, see :ref:`neuron-github-samples`\n"
  },
  {
    "path": "about-neuron/models/inference-inf1-samples.rst",
    "content": ".. _model_samples_inference_inf1:\n\nInference Samples/Tutorials (Inf1)\n==================================\n\n.. important::\n\n   The samples linked on this page have been archived and are provided for historical reference only. They are not tested with recent versions of the Neuron SDK.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n   \n.. _encoder_model_samples_inference_inf1:\n \nEncoders \n--------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - bert-base-cased-finetuned-mrpc\n     - torch-neuron\n     - * HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert.ipynb>`\n       * `BertBaseCased Inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/bertbasecased/BertBaseCased.ipynb>`_\n       * Bert TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve>`\n       * Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial :ref:`[html] </src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>` :pytorch-neuron-src:`[notebook] <byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>`\n\n   * - bert-base-uncased\n     - torch-neuron\n     - * NeuronCore Pipeline tutorial :ref:`[html] </src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>` :pytorch-neuron-src:`[notebook] <pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`\n\n   * - bert-large-uncased\n     - torch-neuron\n     - * `BertLargeUncased Inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/bertlargeuncased/BertLargeUncased.ipynb>`_\n   \n   * - roberta-base\n     - torch-neuron\n     - * `Roberta-Base inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/robertabase/RobertaBase.ipynb>`_\n\n   * - distilbert-base-uncased-finetuned-sst-2-english\n     - tensorflow-neuron \n     - * Tensorflow 2.x - HuggingFace Pipelines distilBERT with Tensorflow2 Neuron :ref:`[html] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` :github:`[notebook] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`\n    \n   * - gluon bert\n     - mxnet-neuron \n     - * MXNet 1.8: Using data parallel mode tutorial :ref:`[html] </src/examples/mxnet/data_parallel/data_parallel_tutorial.ipynb>` :mxnet-neuron-src:`[notebook] <data_parallel/data_parallel_tutorial.ipynb>`\n\n\n\n.. _vision_transformer_model_samples_inference_inf1:\n\nVision Transformers  \n-------------------\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - ssd\n     - torch-neuron\n     - * `Inference of SSD model on inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/ssd/SSD300VGG16.ipynb>`_\n \n\n   * - TrOCR\n     - torch-neuron\n     - * `TrOCR inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/trocr/TrOCR.ipynb>`_\n\n    \n   * - vgg\n     - torch-neuron\n     - * `VGG inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/vgg/VGG.ipynb>`_\n\n\n   * - google/vit-base-patch16-224\n     - torch-neuron\n     - * `ViT model inference on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/vit/ViT.ipynb>`_\n\n\n\n.. _cnn_model_samples_inference_inf1:\n\nConvolutional Neural Networks(CNN)\n----------------------------------\n\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - EfficientNet\n     - torch-neuron\n     - * `EfficientNet model inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/efficientnet/EfficientNet.ipynb>`_\n\n   * - GFL (MMDetection)\n     - torch-neuron\n     - * `GFL (MMDetection) inference on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/gfl_mmdet/GFL.ipynb>`_\n\n   * - HRNet\n     - torch-neuron\n     - * `HRNET - Pose Estimation <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/hrnet/HRnet.ipynb>`_\n\n   * - MarianMT\n     - torch-neuron\n     - * HuggingFace MarianMT tutorial :ref:`[html] </src/examples/pytorch/transformers-marianmt.ipynb>` :pytorch-neuron-src:`[notebook] <transformers-marianmt.ipynb>`\n       * `Inference of Pre-trained MarianMT model on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/marianmt/MarianMT.ipynb>`_\n\n   * - Detectron2 R-CNN \n     - torch-neuron\n     - * `R-CNN inference on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/rcnn/Rcnn.ipynb>`_\n\n   * - resnet\n     - torch-neuron\n     - * `Inference of Pre-trained Resnet model (18,34,50,101,152) on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/resnet/Resnet.ipynb>`_\n       * ResNet-50 tutorial :ref:`[html] </src/examples/pytorch/resnet50.ipynb>` :pytorch-neuron-src:`[notebook] <resnet50.ipynb>`\n\n   * - resnet\n     - tensorflow-neuron\n     - * Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] </src/examples/tensorflow/tensorflow_serving_tutorial.rst>`\n   \n   * - resnet\n     - mxnet-neuron\n     - * ResNet-50 tutorial :ref:`[html] </src/examples/mxnet/resnet50/resnet50.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50/resnet50.ipynb>`\n       * Getting started with Gluon tutorial :ref:`[html] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>` :github:`[notebook] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>`\n       * NeuronCore Groups tutorial :ref:`[html] 
</src/examples/mxnet/resnet50_neuroncore_groups.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50_neuroncore_groups.ipynb>`\n    \n\n   * - Resnext\n     - torch-neuron\n     - * `Inference of Resnext model on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/resnext/Resnext.ipynb>`_\n\n\n   * - Yolov4\n     - torch-neuron \n     - * PyTorch YOLOv4 tutorial :ref:`[html] </src/examples/pytorch/yolo_v4.ipynb>` :pytorch-neuron-src:`[notebook] <yolo_v4.ipynb>`\n\n   * - Yolov5\n     - torch-neuron\n     - * `Inference of Yolov5 on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/yolov5/Yolov5.ipynb>`_\n\n\n   * - Yolov6\n     - torch-neuron \n     - * `Inference of Yolov6 on Inf1 instances <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/yolov6/Yolov6.ipynb>`_\n\n\n   * - Yolov7\n     - torch-neuron\n     - * `Inference of Yolov7 model on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/archive/torch-neuron/inference/yolov7>`_\n\n   * - Yolof\n     - torch-neuron\n     - * `Inference of Yolof model on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuron/inference/yolof_detectron2/YoloF.ipynb>`_\n\n   * - fairseq\n     - torch-neuron\n     - * `Inference of fairseq model on Inf1 <https://github.com/aws-neuron/aws-neuron-samples-staging/tree/master/archive/torch-neuron/inference/fairseq>`_\n\n   * - unet\n     - tensorflow-neuron\n     - * `Unet - Tensorflow 2.x tutorial <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/tensorflow-neuron/inference/unet>`_\n\n\n\n.. _vision_model_samples_inference_inf1:\n\nVision\n------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - craft-pytorch\n     - torch-neuron\n     - * `CRAFT model inference on Inf1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/archive/torch-neuron/inference/craft>`_\n\n   \n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "about-neuron/models/inference-inf2-trn1-samples.rst",
    "content": ".. _model_samples_inference_inf2_trn1:\n\nInference Samples/Tutorials (Inf2/Trn1/Trn2)\n============================================\n\n.. important::\n\n   Some samples linked on this page have been archived and are provided for historical reference only. They are not tested with recent versions of the Neuron SDK. For the latest inference tutorials, refer to :ref:`NxD Inference Tutorials <nxdi-tutorials-index>`.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\n.. _encoder_model_samples_inference_inf2_trn1:\n \nEncoders \n--------\n\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - bert-base-cased-finetuned-mrpc\n     - torch-neuronx\n     - * :ref:`BERT TorchServe tutorial <pytorch-tutorials-torchserve-neuronx>`\n       * HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>`\n       * `LibTorch C++ Tutorial for HuggingFace Pretrained BERT <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-libtorch.html#pytorch-tutorials-libtorch>`_\n       * `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/inference/inf2-bert-on-sagemaker/inf2_bert_sagemaker.ipynb>`_\n\n\n   * - bert-base-cased-finetuned-mrpc\n     - neuronx-distributed\n     - * :ref:`tp_inference_tutorial`\n\n\n   * - bert-base-uncased\n     - torch-neuronx\n     - * `HuggingFace Pretrained BERT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_bert_inference_on_trn1.ipynb>`_\n\n   * - distilbert-base-uncased\n     - torch-neuronx\n     - * `HuggingFace Pretrained DistilBERT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_distilbert_Inference_on_trn1.ipynb>`_\n\n\n   * - roberta-base\n     - tensorflow-neuronx\n     - * HuggingFace Roberta-Base :ref:`[html]</src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>` :github:`[notebook] </src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>`\n\n\n   * - roberta-large\n     - torch-neuronx\n     - * `HuggingFace Pretrained RoBERTa Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_roberta_inference_on_frn1.ipynb>`_\n\n\n\n.. _decoder_model_samples_inference_inf2_trn1:\n\nDecoders\n--------\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - gpt2\n     - torch-neuronx\n     - * `HuggingFace Pretrained GPT2 Feature Extraction on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_gpt2_feature_extraction_on_trn1.ipynb>`_\n  \n   * - meta-llama/Llama-3.3-70B\n     - neuronx-distributed-inference\n     - * :ref:`nxdi-trn2-llama3.3-70b-tutorial`\n       * :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb`\n       * :ref:`nxdi-sd-inference-tutorial`\n\n   * - meta-llama/Llama-3.1-8b\n     - transformers-neuronx\n     - * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 32k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-32k-sampling.ipynb>`_\n       * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 128k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-128k-sampling.ipynb>`_\n       * `Run meta-llama/Meta-Llama-3.1-8B autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3.1-8b-sampling.ipynb>`_\n   \n   * - meta-llama/Llama-3.1-70b\n     - transformers-neuronx\n     - * `Run Hugging Face Llama 3.1 70B autoregressive sampling on Trn1 with 64k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-64k-sampling.ipynb>`_\n       * `Run Hugging Face meta-llama/Meta-Llama-3.1-70B autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3.1-70b-sampling.ipynb>`_\n\n   * - meta-llama/Llama-3.1-70b-Instruct\n     - transformers-neuronx\n     - * `Run Hugging Face Llama-3.1-70B-Instruct + Llama-3.2-1B-Instruct Speculative Decoding on Trn1 with transformers-neuronx and vLLM <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-speculative-decoding.ipynb>`_\n       * `Run Hugging Face Llama-3.1-70B-Instruct EAGLE Speculative Decoding on Trn1 with transformers-neuronx and vLLM <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-eagle-speculative-decoding.ipynb>`_\n\n   * - meta-llama/Llama-3.1-405b\n     - neuronx-distributed-inference\n     - * :ref:`Tutorial for deploying Llama-3.1-405B on Trn2 <nxdi-trn2-llama3.1-405b-tutorial>`\n       * :ref:`nxdi-trn2-llama3.1-405b-speculative-tutorial`\n   \n   * - meta-llama/Llama-3.1-405b\n     - transformers-neuronx\n     - * `Run Hugging Face Llama 3.1 405B autoregressive sampling on Trn1/Trn1n with 16k sequence length <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-405b-multinode-16k-sampling.ipynb>`_\n\n   * - meta-llama/Llama-3-8b\n     - transformers-neuronx\n     - * `Run Hugging Face meta-llama/Llama-3-8b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3-8b-sampling.ipynb>`_\n\n   * - 
meta-llama/Llama-3-70b\n     - transformers-neuronx\n     - * `Run Hugging Face meta-llama/Llama-3-70b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3-70b-sampling.ipynb>`_\n\n   * - meta-llama/Llama-2-13b\n     - transformers-neuronx\n     - * `Run Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`_\n\n   * - meta-llama/Llama-2-70b\n     - transformers-neuronx\n     - * `Run Hugging Face meta-llama/Llama-2-70b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-70b-sampling.ipynb>`_\n       *  `Run speculative sampling on Meta Llama models [Beta] <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/speculative_sampling.ipynb>`_\n\n   * - meta-llama/Llama-3.2-1B-Instruct\n     - neuronx-distributed\n     - * `Run meta-llama/Llama-3.2-1B-Instruct on Inf2 and Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`_\n\n   * - meta-llama/codellama-13b\n     - neuronx-distributed\n     - * `Run meta-llama/codellama-13b-16k-sampling <https://github.com/aws-neuron/aws-neuron-samples/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_\n\n   * - mistralai/Mistral-7B-Instruct-v0.1\n     - transformers-neuronx\n     - * :ref:`Run Mistral-7B-Instruct-v0.1 autoregressive sampling on Inf2 & Trn1 <mistral_gqa_code_sample>`\n\n   * - mistralai/Mistral-7B-Instruct-v0.2\n     - transformers-neuronx\n     - * `Run Hugging Face mistralai/Mistral-7B-Instruct-v0.2 autoregressive sampling on Inf2 & Trn1 [Beta] <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/mistralai-Mistral-7b-Instruct-v0.2.ipynb>`_\n\n   * - Mixtral-8x7B-v0.1\n     - transformers-neuronx\n     - * `Run Hugging Face mistralai/Mixtral-8x7B-v0.1 autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/mixtral-8x7b-sampling.ipynb>`_\n\n   * - Mixtral-8x7B\n     - neuronx-distributed\n     - * `Mixtral inference with NeuronX Distributed on Inf2 & Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/mixtral>`_\n\n\n   * - DBRX\n     - neuronx-distributed\n     - * `DBRX inference with NeuronX Distributed on Inf2 & Trn1 <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/dbrx>`_  \n\n   * - codellama/CodeLlama-13b-hf\n     - transformers-neuronx\n     - * `Run Hugging Face codellama/CodeLlama-13b-hf autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_\n\n.. _encoder_decoder_model_samples_inference_inf2_trn1:\n\nEncoder-Decoders  \n----------------\n\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - t5-large\n     - * torch-neuronx\n       * optimum-neuron\n     - * T5 inference tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/t5-inference-tutorial.ipynb>`\n\n   * - t5-3b\n     - neuronx-distributed\n     - * T5 inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`\n\n   * - google/flan-t5-xl\n     - neuronx-distributed\n     - * flan-t5-xl inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`\n\n\n\n.. _vision_transformer_model_samples_inference_inf2_trn1:\n\nVision Transformers  \n-------------------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - google/vit-base-patch16-224\n     - torch-neuronx\n     - * `HuggingFace Pretrained ViT Inference on Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_vit_inference_on_inf2.ipynb>`_\n\n   * - clip-vit-base-patch32\n     - torch-neuronx\n     - * `HuggingFace Pretrained CLIP Base Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_clip_base_inference_on_inf2.ipynb>`_\n\n\n   * - clip-vit-large-patch14\n     - torch-neuronx\n     - * `HuggingFace Pretrained CLIP Large Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_clip_large_inference_on_inf2.ipynb>`_\n\n\n\n.. _cnn_model_samples_inference_inf2_trn1:\n\nConvolutional Neural Networks(CNN)\n----------------------------------\n\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - resnet50\n     - torch-neuronx\n     - * `Torchvision Pretrained ResNet50 Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/tv_pretrained_resnet50_inference_on_trn1.ipynb>`_\n       *  Torchvision ResNet50 tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>`\n\n   * - resnet50\n     - tensorflow-neuronx\n     - * :ref:`tensorflow-servingx-neuronrt-visible-cores`\n\n   * - unet\n     - torch-neuronx\n     - * `Pretrained UNet Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/pretrained_unet_inference_on_trn1.ipynb>`_\n\n   * - vgg\n     - torch-neuronx\n     - * `Torchvision Pretrained VGG Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/tv_pretrained_vgg_inference_on_trn1.ipynb>`_\n\n\n.. _sd_model_samples_inference_inf2_trn1:\n\nStable Diffusion\n----------------\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - stable-diffusion-v1-5\n     - torch-neuronx\n     - * `HuggingFace Stable Diffusion 1.5 (512x512) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd15_512_inference.ipynb>`_\n\n   * - stable-diffusion-2-1-base\n     - torch-neuronx\n     - * `HuggingFace Stable Diffusion 2.1 (512x512) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_512_inference.ipynb>`_\n\n   * - stable-diffusion-2-1\n     - torch-neuronx\n     - * `HuggingFace Stable Diffusion 2.1 (768x768) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_768_inference.ipynb>`_\n       * `Deploy & Run Stable Diffusion on SageMaker and Inferentia2 <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/inference/stable-diffusion/StableDiffusion2_1.ipynb>`_\n\n   * - stable-diffusion-xl-base-1.0\n     - torch-neuronx\n     - * `HuggingFace Stable Diffusion XL 1.0 (1024x1024) Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sdxl_base_1024_inference.ipynb>`_\n       * `HuggingFace Stable Diffusion XL 1.0 Base and Refiner (1024x1024) Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sdxl_base_and_refiner_1024_inference.ipynb>`_\n\n   * - stable-diffusion-2-inpainting\n     - torch-neuronx\n     - * `stable-diffusion-2-inpainting model Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/archive/torch-neuronx/inference/hf_pretrained_sd2_inpainting_936_624_inference.ipynb>`_\n\n\n.. _diffusion_transformers_samples_inference_inf2_trn1:\n\nDiffusion Transformers\n----------------------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - pixart-alpha\n     - torch-neuronx\n     - * `HuggingFace PixArt Alpha (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_pixart_alpha_inference_on_inf2.ipynb>`_\n\n   * - pixart-sigma\n     - torch-neuronx\n     - * `HuggingFace PixArt Sigma (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_pixart_sigma_inference_on_inf2.ipynb>`_\n\n   \n\n.. _audio_model_samples_inference_inf2_trn1:\n\nAudio\n-----\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n       \n   * - wav2vec2-conformer\n     - torch-neuronx\n     - * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Rotary Position Embeddings Inference on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_rope_inference_on_inf2.ipynb>`_\n       * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Relative Position Embeddings Inference on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_relpos_inference_on_inf2.ipynb>`_\n\n\n\n.. _multi_modal_model_samples_inference_inf2_trn1:\n\nMulti Modal\n-----------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n       \n\n   * - multimodal-perceiver\n     - torch-neuronx\n     - * `HuggingFace Multimodal Perceiver Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_perceiver_multimodal_inference.ipynb>`_\n\n\n   * - language-perceiver\n     - torch-neuronx\n     - * `HF Pretrained Perceiver Language Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_perceiver_language_inference.ipynb>`_\n\n\n   * - vision-perceiver-conv\n     - torch-neuronx\n     - * `HF Pretrained Perceiver Image Classification Inference on Trn1 / Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_perceiver_vision_inference.ipynb>`_\n\n\n\n \n\n\n\n\n\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "about-neuron/models/training-trn1-samples.rst",
    "content": ".. _model_samples_training_trn1:\n\nTraining Samples/Tutorials (Trn1/Trn1n)\n=======================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\n.. _encoder_model_samples_training_trn1:\n \nEncoders \n--------\n\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - bert-base-cased\n     - torch-neuronx\n     - * `Fine-tune a \"bert-base-cased\" PyTorch model for Text Classification  <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/BertBaseCased.ipynb>`_\n       * `How to fine-tune a \"bert base cased\" PyTorch model with AWS Trainium (Trn1 instances) for Sentiment Analysis <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_sentiment_analysis/01-hf-single-neuron.ipynb>`_\n    \n   * - bert-base-uncased\n     - torch-neuronx\n     - * `Fine-tune a \"bert-base-uncased\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/BertBaseUncased.ipynb>`_\n       * `Fine tuning BERT base model from HuggingFace on Amazon SageMaker <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/blob/master/training/trn1-bert-fine-tuning-on-sagemaker/bert-base-uncased-amazon-polarity.ipynb>`_\n   \n   * - bert-large-cased\n     - torch-neuronx\n     - * `Fine-tune a \"bert-large-cased\" PyTorch model  <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/BertLargeCased.ipynb>`_\n    \n   * - bert-large-uncased\n     - torch-neuronx\n     - * :ref:`hf-bert-pretraining-tutorial`\n       * `Launch Bert Large Phase 1 pretraining job on Parallel Cluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/dp-bert-launch-job.md>`_\n       * `Launch a Multi-Node PyTorch Neuron Training Job on Trainium Using TorchX and EKS <https://github.com/aws-neuron/aws-neuron-eks-samples/tree/master/dp_bert_hf_pretrain#tutorial-launch-a-multi-node-pytorch-neuron-training-job-on-trainium-using-torchx-and-eks>`_\n       * :ref:`torch-hf-bert-finetune`\n       * `Fine-tune a \"bert-large-uncased\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/BertLargeCased.ipynb>`_\n       \n\n   * - roberta-base\n     - tensorflow-neuronx\n     - * `Fine-tune a \"roberta-base\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/RobertaBase.ipynb>`_\n\n\n   * - roberta-large\n     - torch-neuronx\n     - * `Fine-tune a \"roberta-large\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/RobertaLarge.ipynb>`_\n\n  \n   * - xlm-roberta-base\n     - torch-neuronx\n     - * `Fine-tune a \"xlm-roberta-base\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/XlmRobertaBase.ipynb>`_\n\n\n   * - alberta-base-v2\n     - torch-neuronx\n     - * `Fine-tune a \"alberta-base-v2\" PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/AlbertBase.ipynb>`_\n\n\n   * - distilbert-base-uncased\n     - torch-neuronx\n     - * `Fine-tune a \"distilbert-base-uncased\" 
PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/DistilbertBaseUncased.ipynb>`_\n\n\n   * - camembert-base\n     - torch-neuronx\n     - * `Fine-tune a \"camembert-base PyTorch model <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/CamembertBase.ipynb>`_\n\n   * - cl-tohoku/bert-base-japanese-whole-word-masking\n     - torch-neuronx\n     - * `Fine-tuning & Deployment Hugging Face BERT Japanese model\t<https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_bert_jp/bert-jp-tutorial.ipynb>`_\n\n\n\n\n.. _decoder_model_samples_training_trn1:\n\n\nDecoders\n--------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n   \n   * - gpt-2\n     - torch-neuronx\n     - * `How to run training jobs for \"gpt2\" PyTorch model with AWS Trainium <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_language_modeling/gpt2/gpt2.ipynb>`_\n       * :ref:`zero1-gpt2-pretraining-tutorial`\n   \n   \n   * - gpt-3\n     - neuronx-nemo-megatron\n     - * `Launch a GPT-3 23B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n       * `Launch a GPT-3 46B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n       * `Launch a GPT-3 175B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n\n\n   * - GPT-NEOX-20B\n     - neuronx-distributed\n     - * :ref:`gpt_neox_20b_tp_zero1_tutorial`\n       * `Training GPT-NEOX 20B model using neuronx-distributed\t <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain>`_\n       * `Pre-train GPT Neox 20b on Wikicorpus dataset using Neuronx Distributed library <https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow/blob/master/examples/neuronx-distributed/gpt_neox_20b/README.md>`_\n\n   \n   * - GPT-NEOX-6.9B\n     - neuronx-distributed\n     - * :ref:`gpt_neox_tp_zero1_tutorial`\n       * `Training GPT-NEOX 6.9B model using neuronx-distributed\t\t <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain>`_\n       * `Pre-train GPT Neox 6.9b on Wikicorpus dataset using Neuronx Distributed library <https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow/blob/master/examples/neuronx-distributed/gpt_neox_6.9b/README.md#pre-train-gpt-neox-69b-on-wikicorpus-dataset-using-neuronx-distributed-library>`_\n\n   * - meta-llama/Llama-3.1-70b\n     - neuronx-distributed\n     - * :ref:`llama2_tp_pp_tutorial`\n\n   * - meta-llama/Llama-3.1-8b\n     - neuronx-distributed\n     - * :ref:`llama2_7b_tp_zero1_tutorial`\n\n   * - meta-llama/Llama-3-70b\n     - neuronx-distributed\n     - * :ref:`llama2_tp_pp_tutorial`\n\n   * - meta-llama/Llama-3-8b\n     - nxd-training\n     - * :ref:`hf_llama3_8B_pretraining`\n       * :ref:`hf_llama3_8B_SFT`\n\n   * - meta-llama/Llama-3-8b\n 
    - neuronx-distributed\n     - * :ref:`Training Llama3 8B Model with Tensor Parallelism and ZeRO-1 Optimizer <llama2_7b_tp_zero1_tutorial>`\n       * :ref:`Tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed <llama3_8b_tp_ptl_lora_finetune_tutorial>`\n            \n   * - meta-llama/Llama-2-7b\n     - neuronx-distributed\n     - * :ref:`llama2_7b_tp_zero1_tutorial`\n       * `Training Llama2 7B Model with AWS Batch and Trainium <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/aws-batch/llama2/README.md>`_\n       * :ref:`llama2_7b_tp_zero1_ptl_finetune_tutorial`\n       * `Pre-train Llama2-7B on Wikicorpus dataset using Neuronx Distributed library <https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow/blob/master/examples/neuronx-distributed/llama2_7b/README.md>`_\n\n   * - meta-llama/Llama-2-13b\n     - neuronx-distributed\n     - * :ref:`llama2_tp_pp_tutorial`\n\n   * - meta-llama/Llama-2-70b\n     - neuronx-distributed\n     - * :ref:`llama2_tp_pp_tutorial`\n\n   * - codegen25-7b-mono\n     - neuronx-distributed\n     - * :ref:`codegen25_7b_tp_zero1_tutorial`\n\n   * - meta-llama/Llama-2\n     - neuronx-nemo-megatron\n     - * `Launch a Llama-2-7B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_\n       * `Launch a Llama-2-13B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_\n       * `Launch a Llama-2-70B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_\n\n   * - Mistral-7B\n     - neuronx-nemo-megatron\n     - * `Training Mistral-7B <https://github.com/aws-neuron/neuronx-nemo-megatron/blob/main/nemo/examples/nlp/language_modeling/test_mistral.sh>`_\n\n.. _encoder_decoder_model_samples_training_trn1:\n\nEncoder-Decoders  \n----------------\n\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - t5-small\n     - * torch-neuronx\n       * optimum-neuron\n     - * :ref:`torch-hf-t5-finetune`\n\n   * - facebook/bart-large\n     - * torch-neuronx\n     - * `How to fine-tune a \"Bart-Large\" PyTorch model with AWS Trainium (trn1 instances) <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_summarization/BartLarge.ipynb>`_\n\n\n.. _vision_transformer_model_samples_training_trn1:\n\nVision Transformers  \n-------------------\n\n.. 
list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - google/vit-base-patch16-224-in21k\n     - torch-neuronx\n     - * `Fine-tune a pretrained HuggingFace vision transformer PyTorch model  <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/vit.ipynb>`_\n\n    \n   * - openai/clip-vit-base-patch32\n     - torch-neuronx\n     - * `Fine-tune a pretrained HuggingFace CLIP-base PyTorch model with AWS Trainium  <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_contrastive_image_text/CLIPBase.ipynb>`_\n\n\n   * - openai/clip-vit-large-patch14\n     - torch-neuronx\n     - * `Fine-tune a pretrained HuggingFace CLIP-large PyTorch model with AWS Trainium <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_contrastive_image_text/CLIPLarge.ipynb>`_\n\n\n\n.. _sd_model_samples_training_trn1:\n\nStable Diffusion\n----------------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n       \n\n   * - stabilityai/stable-diffusion-2-1-base\n     - torch-neuronx\n     - * [Beta] `Train stabilityai/stable-diffusion-2-1-base with AWS Trainium (trn1 instances) <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/stable_diffusion/>`_\n\n\n   * - runwayml/stable-diffusion-v1-5\n     - torch-neuronx\n     - * [Beta] `Train runwayml/stable-diffusion-v1-5 with AWS Trainium (trn1 instances) <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/stable_diffusion/>`_\n  \n\n\n.. _multi_modal_model_samples_training_trn1:\n\nMulti Modal\n-----------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n       \n\n   * - language-perceiver\n     - torch-neuronx\n     - * `How to fine-tune a \"language perceiver\" PyTorch model with AWS Trainium (trn1 instances) <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/LanguagePerceiver.ipynb>`_\n\n\n   * - vision-perceiver-conv\n     - torch-neuronx\n     - * `How to fine-tune a pretrained HuggingFace Vision Perceiver Conv <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/VisionPerceiverConv.ipynb>`_\n\n\n\n.. _cnn_model_samples_training_trn1:\n\nConvolutional Neural Networks(CNN)\n----------------------------------\n\n.. list-table::\n   :widths: 20 15 45 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Model\n     - Frameworks/Libraries\n     - Samples and Tutorials\n\n   * - resnet50\n     - torch-neuronx\n     - `How to fine-tune a pretrained ResNet50 Pytorch model with AWS Trainium (trn1 instances) using NeuronSDK <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/resnet50>`_\n"
  },
  {
    "path": "about-neuron/monitoring-tools.rst",
    "content": ".. _monitoring_tools:\n\nMonitoring Tools\n=================\n\n.. toctree:: \n    :maxdepth: 1\n\n    Neuron-Monitor User Guide </tools/neuron-sys-tools/neuron-monitor-user-guide>\n    Neuron-Top User Guide </tools/neuron-sys-tools/neuron-top-user-guide>\n    Neuron-LS User Guide </tools/neuron-sys-tools/neuron-ls>\n    Neuron-Sysfs User Guide </tools/neuron-sys-tools/neuron-sysfs-user-guide>\n    NCCOM-TEST User Guide </tools/neuron-sys-tools/nccom-test>\n    What's New </release-notes/components/dev-tools>\n"
  },
  {
    "path": "about-neuron/news-and-blogs/CONTRIBUTING.md",
    "content": "# Contributing to AWS Neuron News and Blogs\n\nThank you for your interest in sharing content about AWS Neuron, Trainium, and Inferentia! This page collects external articles, blog posts, tutorials, and news to help the community discover valuable content.\n\n## How to Add Your Article\n\n### Quick Steps\n\n1. **Fork the repository** on GitHub\n2. **Edit the data file**: `about-neuron/news-and-blogs/news-and-blogs.yaml`\n3. **Add your article** following the format below\n4. **Submit a pull request** with your changes\n\n### Article Entry Format\n\nAdd your article to the appropriate section in `news-and-blogs.yaml`:\n\n```yaml\n- title: \"Your Article Title\"\n  url: \"https://example.com/your-article\"\n  description: \"A brief 1-2 sentence description of your article content.\"\n  author: \"Your Name or Organization\"\n  author_url: \"https://your-website.com\"  # Optional for featured articles\n  date: \"YYYY-MM-DD\"  # Publication date\n  category: \"blog\"  # Options: blog, news, tutorial, case-study, benchmark\n  locale: \"en-US\"  # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR)\n  featured: false  # Set to true only if approved by AWS Neuron team\n  icon: \"📝\"  # Optional emoji icon for featured articles\n```\n\n### Sections\n\n- **`featured_articles`**: Highlighted content (requires AWS Neuron team approval)\n- **`all_articles`**: All community and official content\n\n### Categories\n\nChoose the most appropriate category for your content:\n\n- **`blog`**: Technical blog posts and articles\n- **`news`**: News announcements and press releases\n- **`tutorial`**: Step-by-step guides and how-tos\n- **`case-study`**: Customer success stories and use cases\n- **`benchmark`**: Performance benchmarks and comparisons\n\n### Locale Codes\n\nSpecify the language and region of your article using standard locale codes:\n\n**Common Locales:**\n- `en-US` - English (United States) 🇺🇸\n- `en-GB` - English (United Kingdom) 🇬🇧\n- `ja-JP` - Japanese 🇯🇵\n- `zh-CN` - Chinese (Simplified) 🇨🇳\n- `zh-TW` - Chinese (Traditional) 🇹🇼\n- `ko-KR` - Korean 🇰🇷\n- `de-DE` - German 🇩🇪\n- `fr-FR` - French 🇫🇷\n- `es-ES` - Spanish (Spain) 🇪🇸\n- `es-MX` - Spanish (Mexico) 🇲🇽\n- `pt-BR` - Portuguese (Brazil) 🇧🇷\n- `it-IT` - Italian 🇮🇹\n- `nl-NL` - Dutch 🇳🇱\n- `ru-RU` - Russian 🇷🇺\n- `ar-SA` - Arabic 🇸🇦\n- `hi-IN` - Hindi 🇮🇳\n\nA flag emoji will be automatically displayed next to your article based on the locale. If your locale isn't in the list, a 🌐 globe icon will be shown.\n\n### Example Entry\n\n```yaml\nall_articles:\n  - title: \"Building Large Language Models on AWS Trainium\"\n    url: \"https://example.com/llm-trainium-guide\"\n    description: \"A comprehensive guide to training and deploying LLMs using AWS Trainium instances with practical code examples.\"\n    author: \"Jane Developer\"\n    date: \"2026-01-15\"\n    category: \"tutorial\"\n    locale: \"en-US\"\n    featured: false\n```\n\n### Guidelines\n\n1. **Content must be relevant** to AWS Neuron, Trainium, or Inferentia\n2. **Provide accurate information** - ensure URLs work and descriptions are clear\n3. **Use proper formatting** - follow YAML syntax exactly\n4. **One article per pull request** - makes review easier\n5. **Include context** in your PR description about why this content is valuable\n\n### Featured Articles\n\nTo request your article be featured:\n\n1. Add it to `all_articles` first with `featured: false`\n2. In your pull request, explain why it should be featured\n3. 
AWS Neuron team will review and may promote it to `featured_articles`\n\nFeatured articles should be:\n- High-quality, in-depth content\n- Particularly valuable to the community\n- Recent (typically within the last 6 months)\n\n### Review Process\n\n1. Submit your pull request\n2. AWS Neuron team will review within 5-7 business days\n3. May request changes or clarifications\n4. Once approved, your article will appear on the next documentation build\n\n### Questions?\n\n- Open an issue in the repository\n- Contact your AWS Neuron support representative\n- Email: aws-neuron-support@amazon.com\n\n## Content Guidelines\n\n### What to Include\n\n✅ Technical tutorials and guides  \n✅ Performance benchmarks and analysis  \n✅ Customer success stories  \n✅ Integration guides with other tools  \n✅ Best practices and optimization tips  \n✅ Conference talks and presentations  \n✅ Research papers using Neuron/Trainium/Inferentia  \n\n### What Not to Include\n\n❌ Marketing content without technical substance  \n❌ Broken or paywalled links  \n❌ Content unrelated to AWS Neuron ecosystem  \n❌ Duplicate submissions  \n❌ Self-promotional content without value to community  \n\n## Technical Details\n\nThis page uses:\n- **Sphinx** with `sphinxcontrib.datatemplates` extension\n- **YAML** for data storage\n- **Jinja2** templates for rendering\n- **sphinx-design** for grid layouts\n\nThe system is fully static - no backend required. All content is rendered at build time.\n\n## License\n\nBy contributing, you agree that your contributions will be licensed under the same license as this project. See the repository LICENSE files for details.\n"
  },
  {
    "path": "about-neuron/news-and-blogs/JIRA-INTEGRATION-DESIGN.md",
    "content": "# Jira Integration Design for News & Blogs\n\n## Overview\n\nThis document describes a design for populating the `news-and-blogs.yaml` file from Jira tickets, allowing contributors to submit article links via Jira instead of direct pull requests.\n\n## Design Goals\n\n1. **Simple for contributors**: Submit a Jira ticket with article metadata\n2. **Automated**: Minimal manual intervention to add articles to YAML\n3. **Quality control**: Review process before articles appear on the site\n4. **Compatible**: Works with existing Sphinx build process\n5. **No backend required**: Leverages existing CI/CD infrastructure\n\n## Architecture\n\n### Option 1: GitHub Actions + Jira API (Recommended)\n\n```\nJira Ticket Created → GitHub Action Triggered → Parse Ticket → Update YAML → Create PR\n```\n\n**Components:**\n\n1. **Jira Ticket Template**: Custom issue type \"News Article Submission\"\n2. **GitHub Action**: Runs on schedule (e.g., hourly) or webhook\n3. **Python Script**: Fetches approved tickets, generates YAML entries\n4. **Automated PR**: Creates pull request with new articles\n\n**Workflow:**\n\n```yaml\n# .github/workflows/sync-jira-articles.yml\nname: Sync Jira Articles to YAML\n\non:\n  schedule:\n    - cron: '0 */6 * * *'  # Every 6 hours\n  workflow_dispatch:  # Manual trigger\n\njobs:\n  sync-articles:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v3\n      \n      - name: Set up Python\n        uses: actions/setup-python@v4\n        with:\n          python-version: '3.9'\n      \n      - name: Install dependencies\n        run: pip install jira pyyaml\n      \n      - name: Fetch and process Jira tickets\n        env:\n          JIRA_URL: ${{ secrets.JIRA_URL }}\n          JIRA_USER: ${{ secrets.JIRA_USER }}\n          JIRA_TOKEN: ${{ secrets.JIRA_TOKEN }}\n        run: python scripts/sync_jira_articles.py\n      \n      - name: Create Pull Request\n        uses: peter-evans/create-pull-request@v5\n        with:\n          commit-message: 'Add articles from Jira'\n          title: 'Add news articles from Jira submissions'\n          body: 'Automated PR from Jira article submissions'\n          branch: jira-articles-sync\n```\n\n### Option 2: Jira Automation + Webhook\n\n```\nJira Ticket Approved → Webhook to GitHub → GitHub Action → Update YAML → Create PR\n```\n\n**Advantages:**\n- Real-time updates when tickets are approved\n- No polling required\n- More efficient\n\n**Setup:**\n1. Configure Jira Automation rule\n2. Trigger on status change to \"Approved\"\n3. 
Send webhook to GitHub repository dispatch endpoint\n\n### Option 3: Manual Script (Simplest)\n\n```\nDeveloper runs script → Fetches approved tickets → Updates YAML → Commits changes\n```\n\n**Use case:** Lower volume, manual review preferred\n\n## Jira Ticket Structure\n\n### Custom Fields Required\n\n```\nIssue Type: News Article Submission\n\nFields:\n- Article Title (text, required)\n- Article URL (URL, required)\n- Description (text area, required)\n- Author Name (text, required)\n- Author URL (URL, optional)\n- Publication Date (date, required)\n- Category (dropdown: blog|news|tutorial|case-study|benchmark)\n- Locale (dropdown: en-US|ja-JP|zh-CN|ko-KR|de-DE|fr-FR|es-ES|pt-BR)\n- Keywords (labels or multi-select)\n- Featured (checkbox)\n- Icon (text, optional, for featured articles)\n\nStatus Workflow:\n- Submitted → Under Review → Approved → Published → Rejected\n```\n\n### Example Jira Ticket\n\n```\nTitle: Add Karakuri AWS Trainium Tutorial\n\nFields:\n- Article Title: AWS Trainium: 50 Exercises\n- Article URL: https://zenn.dev/karakuri_blog/articles/5ccedeee1beb08\n- Description: Learn how to build LLMs for Trainium accelerators...\n- Author Name: Karakuri\n- Author URL: https://about.karakuri.ai/\n- Publication Date: 2026-02-19\n- Category: tutorial\n- Locale: ja-JP\n- Keywords: trainium, llm, training, tutorial\n- Featured: Yes\n- Icon: 🚀\n```\n\n## Implementation Script\n\n### `scripts/sync_jira_articles.py`\n\n```python\n#!/usr/bin/env python3\n\"\"\"\nSync approved Jira article submissions to news-and-blogs.yaml\n\"\"\"\n\nimport os\nimport yaml\nfrom jira import JIRA\nfrom datetime import datetime\n\n# Configuration\nJIRA_URL = os.environ.get('JIRA_URL')\nJIRA_USER = os.environ.get('JIRA_USER')\nJIRA_TOKEN = os.environ.get('JIRA_TOKEN')\nYAML_FILE = 'about-neuron/news-and-blogs/news-and-blogs.yaml'\n\n# JQL to find approved, unpublished articles (tickets with no labels yet are included)\nJQL_QUERY = 'project = NEURON AND issuetype = \"News Article Submission\" AND status = \"Approved\" AND (labels IS EMPTY OR labels != \"published\")'\n\ndef connect_jira():\n    \"\"\"Connect to Jira instance\"\"\"\n    return JIRA(server=JIRA_URL, basic_auth=(JIRA_USER, JIRA_TOKEN))\n\ndef fetch_approved_articles(jira):\n    \"\"\"Fetch approved article submissions from Jira\"\"\"\n    issues = jira.search_issues(JQL_QUERY, maxResults=100)\n    articles = []\n    \n    for issue in issues:\n        article = {\n            'title': issue.fields.customfield_10001,  # Article Title\n            'url': issue.fields.customfield_10002,     # Article URL\n            'description': issue.fields.customfield_10003,  # Description\n            'author': issue.fields.customfield_10004,  # Author Name\n            'date': issue.fields.customfield_10006,    # Publication Date\n            'category': issue.fields.customfield_10007.value,  # Category\n            'locale': issue.fields.customfield_10008.value,    # Locale\n            'keywords': [label for label in issue.fields.labels if label != 'published'],  # labels are plain strings\n            'featured': bool(issue.fields.customfield_10009),  # Featured checkbox\n        }\n        \n        # Optional fields\n        if issue.fields.customfield_10005:  # Author URL\n            article['author_url'] = issue.fields.customfield_10005\n        \n        if issue.fields.customfield_10010:  # Icon (for featured)\n            article['icon'] = issue.fields.customfield_10010\n        \n        articles.append({\n            'article': article,\n            'issue_key': issue.key\n        })\n    \n    return articles\n\ndef 
load_yaml():\n    \"\"\"Load existing YAML file\"\"\"\n    with open(YAML_FILE, 'r', encoding='utf-8') as f:\n        return yaml.safe_load(f)\n\ndef article_exists(data, url):\n    \"\"\"Check if article URL already exists in YAML\"\"\"\n    all_urls = [a['url'] for a in data.get('featured_articles', [])]\n    all_urls.extend([a['url'] for a in data.get('all_articles', [])])\n    return url in all_urls\n\ndef add_articles_to_yaml(data, new_articles):\n    \"\"\"Add new articles to appropriate sections\"\"\"\n    added_keys = []\n    \n    for item in new_articles:\n        article = item['article']\n        \n        # Skip if already exists\n        if article_exists(data, article['url']):\n            print(f\"Skipping duplicate: {article['title']}\")\n            continue\n        \n        # Add to appropriate section\n        if article.get('featured', False):\n            data['featured_articles'].append(article)\n        else:\n            data['all_articles'].append(article)\n        \n        added_keys.append(item['issue_key'])\n        print(f\"Added: {article['title']} ({item['issue_key']})\")\n    \n    return added_keys\n\ndef save_yaml(data):\n    \"\"\"Save updated YAML file\"\"\"\n    with open(YAML_FILE, 'w', encoding='utf-8') as f:\n        yaml.dump(data, f, allow_unicode=True, sort_keys=False, default_flow_style=False)\n\ndef mark_as_published(jira, issue_keys):\n    \"\"\"Add 'published' label to Jira tickets and transition to Published status\"\"\"\n    for key in issue_keys:\n        issue = jira.issue(key)\n        \n        # Add published label\n        labels = issue.fields.labels\n        if 'published' not in labels:\n            labels.append('published')\n            issue.update(fields={'labels': labels})\n        \n        # Transition to Published status (adjust transition ID as needed)\n        try:\n            jira.transition_issue(issue, 'Published')\n        except Exception as e:\n            print(f\"Could not transition {key}: {e}\")\n\ndef main():\n    print(\"Connecting to Jira...\")\n    jira = connect_jira()\n    \n    print(\"Fetching approved articles...\")\n    new_articles = fetch_approved_articles(jira)\n    \n    if not new_articles:\n        print(\"No new articles to add.\")\n        return\n    \n    print(f\"Found {len(new_articles)} approved articles\")\n    \n    print(\"Loading existing YAML...\")\n    data = load_yaml()\n    \n    print(\"Adding articles to YAML...\")\n    added_keys = add_articles_to_yaml(data, new_articles)\n    \n    if added_keys:\n        print(\"Saving YAML...\")\n        save_yaml(data)\n        \n        print(\"Marking Jira tickets as published...\")\n        mark_as_published(jira, added_keys)\n        \n        print(f\"Successfully added {len(added_keys)} articles!\")\n    else:\n        print(\"No new articles added (all were duplicates)\")\n\nif __name__ == '__main__':\n    main()\n```\n\n## Setup Instructions\n\n### 1. Configure Jira\n\n1. Create custom issue type \"News Article Submission\"\n2. Add custom fields (see structure above)\n3. Configure workflow: Submitted → Under Review → Approved → Published\n4. Create Jira API token for automation user\n\n### 2. Configure GitHub Secrets\n\nAdd these secrets to your GitHub repository:\n\n```\nJIRA_URL: https://your-company.atlassian.net\nJIRA_USER: automation@your-company.com\nJIRA_TOKEN: <api-token>\n```\n\n### 3. Add GitHub Action\n\nCreate `.github/workflows/sync-jira-articles.yml` with the workflow above.\n\n### 4. 
Install Dependencies\n\nAdd to `requirements.txt`:\n```\njira==3.5.0\nPyYAML==6.0\n```\n\n### 5. Test\n\n1. Create a test Jira ticket\n2. Approve it\n3. Run workflow manually: Actions → Sync Jira Articles → Run workflow\n4. Verify PR is created with new article\n\n## Alternative: Simpler Webhook Approach\n\nIf you want something lighter without Jira API polling:\n\n### Jira Automation Rule\n\n```\nTrigger: Issue transitioned to \"Approved\"\nCondition: Issue type = \"News Article Submission\"\nAction: Send web request\n\nURL: https://api.github.com/repos/aws-neuron/aws-neuron-sdk/dispatches\nMethod: POST\nHeaders:\n  Authorization: Bearer ${GITHUB_TOKEN}\n  Accept: application/vnd.github.v3+json\nBody:\n{\n  \"event_type\": \"jira-article-approved\",\n  \"client_payload\": {\n    \"issue_key\": \"{{issue.key}}\",\n    \"title\": \"{{issue.customfield_10001}}\",\n    \"url\": \"{{issue.customfield_10002}}\",\n    \"description\": \"{{issue.customfield_10003}}\",\n    \"author\": \"{{issue.customfield_10004}}\",\n    \"date\": \"{{issue.customfield_10006}}\",\n    \"category\": \"{{issue.customfield_10007}}\",\n    \"locale\": \"{{issue.customfield_10008}}\"\n  }\n}\n```\n\nThe GitHub Action then receives the webhook and processes the payload directly, without any Jira API calls.\n\n## Maintenance\n\n### Regular Tasks\n\n1. **Monitor failed syncs**: Check GitHub Action logs\n2. **Review PRs**: Automated PRs should still be reviewed before merge\n3. **Clean up Jira**: Archive old Published tickets\n4. **Update mappings**: If custom field IDs change, update the script\n\n### Troubleshooting\n\n**Articles not syncing:**\n- Check Jira API credentials\n- Verify custom field IDs match\n- Check JQL query returns expected tickets\n\n**Duplicate articles:**\n- Script checks URL before adding\n- Manually remove duplicates from YAML if needed\n\n**Formatting issues:**\n- Validate YAML after sync: `python -c \"import yaml; yaml.safe_load(open('about-neuron/news-and-blogs/news-and-blogs.yaml'))\"`\n- Check for special characters in descriptions\n\n## Security Considerations\n\n1. **API Tokens**: Store in GitHub Secrets, never commit\n2. **Permissions**: Use dedicated Jira service account with minimal permissions\n3. **Validation**: Sanitize all input from Jira before adding to YAML\n4. **Review**: Always review automated PRs before merging\n\n## Cost & Complexity\n\n| Approach | Setup Time | Maintenance | Cost |\n|----------|-----------|-------------|------|\n| GitHub Actions + Jira API | 4-6 hours | Low | Free (GitHub Actions) |\n| Webhook + GitHub Actions | 2-3 hours | Very Low | Free |\n| Manual Script | 1-2 hours | Medium | Free |\n\n## Recommendation\n\n**For production use**: Start with **Option 3 (Manual Script)** to validate the workflow, then upgrade to **Option 1 (GitHub Actions)** once the process is proven and volume increases.\n\n**For high volume**: Use **Option 2 (Webhook)** for real-time updates.\n\n## Future Enhancements\n\n1. **Validation**: Add URL validation, duplicate detection in Jira\n2. **Preview**: Generate preview of how article will appear\n3. **Scheduling**: Support future publication dates\n4. **Analytics**: Track article submissions and approval rates\n5. **Notifications**: Notify submitters when articles are published\n6. **Bulk import**: Support CSV upload for multiple articles\n"
  },
  {
    "path": "about-neuron/news-and-blogs/README.md",
    "content": "# AWS Neuron News and Blogs System\n\nThis directory contains a dynamic, community-driven news and blogs page for AWS Neuron, Trainium, and Inferentia content.\n\n## Overview\n\nThe system allows external contributors to add links to relevant articles, blog posts, and news through a simple YAML data file, without requiring any backend infrastructure.\n\n## Architecture\n\n```\nabout-neuron/news-and-blogs/\n├── index.rst                    # Main page (uses datatemplate directives)\n├── news-and-blogs.yaml          # Data file with all article metadata\n├── featured-articles.tmpl       # Jinja2 template for featured section\n├── all-articles.tmpl            # Jinja2 template for all articles section\n├── CONTRIBUTING.md              # Contribution guidelines\n└── README.md                    # This file\n```\n\n## How It Works\n\n1. **Data Storage**: Article metadata is stored in `news-and-blogs.yaml`\n2. **Templating**: Jinja2 templates (`*.tmpl`) define how articles are rendered\n3. **Rendering**: Sphinx's `datatemplates` extension processes the YAML and templates at build time\n4. **Output**: Static HTML with grid cards using `sphinx-design`\n\n## Key Features\n\n- ✅ **No backend required** - fully static site generation\n- ✅ **Easy contributions** - edit a YAML file and submit a PR\n- ✅ **Version controlled** - all changes tracked in Git\n- ✅ **Automated rendering** - Sphinx handles everything at build time\n- ✅ **Responsive design** - uses sphinx-design grid system\n- ✅ **Maintainable** - clear separation of data, templates, and content\n\n## Adding New Articles\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed instructions.\n\nQuick example:\n\n```yaml\nall_articles:\n  - title: \"My Article Title\"\n    url: \"https://example.com/article\"\n    description: \"Brief description\"\n    author: \"Author Name\"\n    date: \"2026-01-15\"\n    category: \"blog\"\n    featured: false\n```\n\n## Modifying Templates\n\nTemplates use Jinja2 syntax and have access to the YAML data structure.\n\n### Featured Articles Template (`featured-articles.tmpl`)\n\nRenders articles from the `featured_articles` section with:\n- Large cards with borders\n- Icons and bold titles\n- Author attribution with links\n- Publication dates\n\n### All Articles Template (`all-articles.tmpl`)\n\nRenders articles from the `all_articles` section with:\n- 2-column grid on desktop, 1-column on mobile\n- Simple card layout\n- Title and description\n\n## Customization\n\n### Adding New Fields\n\n1. Add field to YAML entries:\n   ```yaml\n   - title: \"Article\"\n     new_field: \"value\"\n   ```\n\n2. Update template to use it:\n   ```jinja\n   {{ article.new_field }}\n   ```\n\n### Changing Layout\n\nEdit the grid directive in templates:\n```rst\n.. grid:: 1 1 2 3  # 1 col mobile, 1 tablet, 2 desktop, 3 wide\n   :gutter: 2\n```\n\n### Adding Filters/Sorting\n\nYou can add Jinja2 filters in templates:\n\n```jinja\n{% for article in all_articles | sort(attribute='date', reverse=True) %}\n  {# Sorted by date, newest first #}\n{% endfor %}\n```\n\n## Dependencies\n\nRequired Sphinx extensions (already in `conf.py`):\n- `sphinxcontrib.datatemplates` - YAML data processing\n- `sphinx_design` - Grid card layouts\n\n## Testing Locally\n\n1. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n2. Build documentation:\n   ```bash\n   sphinx-build -b html . _build/html\n   ```\n\n3. 
View the page:\n   ```bash\n   open _build/html/about-neuron/news-and-blogs/index.html\n   ```\n\n## Troubleshooting\n\n### Template Not Found Error\n\nEnsure templates are in the same directory as `index.rst` or add the directory to `templates_path` in `conf.py`.\n\n### YAML Parse Error\n\nValidate your YAML:\n```bash\npython -c \"import yaml; yaml.safe_load(open('news-and-blogs.yaml'))\"\n```\n\n### Articles Not Rendering\n\nCheck that:\n1. YAML file is in the same directory as `index.rst`\n2. Template files exist and have correct names\n3. YAML structure matches template expectations\n\n## Future Enhancements\n\nPossible improvements:\n- Add category filtering/grouping\n- Add search functionality\n- Add RSS feed generation\n- Add automatic link checking\n- Add article metadata validation\n- Sort by date automatically\n- Add pagination for large lists\n\n## Support\n\nFor questions or issues:\n- Open a GitHub issue\n- Contact AWS Neuron support team\n- See main repository CONTRIBUTING.md\n"
  },
  {
    "path": "about-neuron/news-and-blogs/article-template.yaml",
    "content": "# Article Entry Template\n# \n# Copy this template and fill in your article details.\n# Then add it to the appropriate section in news-and-blogs.yaml\n#\n# For featured articles (requires AWS Neuron team approval):\n# Add to the 'featured_articles' section with featured: true and an icon\n#\n# For regular articles:\n# Add to the 'all_articles' section with featured: false\n\n# TEMPLATE - Copy everything below this line\n# ============================================\n\n- title: \"Your Article Title Here\"\n  url: \"https://your-website.com/path-to-article\"\n  description: \"A clear, concise description of your article in 1-2 sentences. Explain what readers will learn or discover.\"\n  author: \"Your Name or Organization Name\"\n  author_url: \"https://your-website.com\"  # Optional: Your website or profile URL (for featured articles)\n  date: \"YYYY-MM-DD\"  # Publication date in YYYY-MM-DD format (e.g., 2026-01-27)\n  category: \"blog\"  # Choose ONE: blog, news, tutorial, case-study, benchmark\n  locale: \"en-US\"  # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR)\n  keywords: [\"keyword1\", \"keyword2\", \"keyword3\"]  # List of 3-10 relevant keywords for filtering/search\n  featured: false  # Set to false unless approved by AWS Neuron team\n  icon: \"📝\"  # Optional: Single emoji for featured articles (e.g., 🚀 📊 🎯 💡 ⚡)\n\n# ============================================\n# FIELD DESCRIPTIONS\n# ============================================\n#\n# title (required):\n#   - Clear, descriptive title of your article\n#   - Keep under 100 characters\n#   - Use title case\n#\n# url (required):\n#   - Full HTTPS URL to your article\n#   - Must be publicly accessible\n#   - Should not require login or paywall\n#\n# description (required):\n#   - Brief summary of article content\n#   - 20-500 characters recommended\n#   - Focus on what readers will learn\n#   - Avoid marketing language\n#\n# author (required):\n#   - Your name or organization\n#   - Will be displayed as attribution\n#\n# author_url (optional):\n#   - Link to your website or profile\n#   - Only used in featured articles\n#   - Must be valid HTTPS URL\n#\n# date (required):\n#   - Publication date in YYYY-MM-DD format\n#   - Use the date the article was published\n#   - Not the date you're adding it here\n#\n# category (required):\n#   - blog: Technical blog posts and articles\n#   - news: News announcements and press releases\n#   - tutorial: Step-by-step guides and how-tos\n#   - case-study: Customer success stories and use cases\n#   - benchmark: Performance benchmarks and comparisons\n#\n# locale (required):\n#   - Language and region code in format: language-REGION\n#   - Examples: en-US (English-US), ja-JP (Japanese), zh-CN (Chinese-Simplified)\n#   - Common codes: en-US, en-GB, ja-JP, zh-CN, zh-TW, ko-KR, de-DE, fr-FR, \n#     es-ES, es-MX, pt-BR, pt-PT, it-IT, nl-NL, ru-RU, ar-SA, hi-IN\n#   - A flag emoji will be displayed based on the locale\n#   - Unknown locales will display 🌐 globe icon\n#\n# keywords (required):\n#   - List of 3-10 relevant keywords for filtering and search\n#   - Use lowercase, hyphenated format (e.g., \"machine-learning\", \"pytorch\")\n#   - Include technology names, topics, and key concepts\n#   - Examples: [\"trainium\", \"inference\", \"pytorch\", \"llm\", \"optimization\"]\n#   - Keywords help users find your article through filtering and search\n#\n# featured (required):\n#   - true: Article appears in featured section (requires approval)\n#   
- false: Article appears in all articles section\n#   - Most submissions should use false\n#\n# icon (optional):\n#   - Single emoji character\n#   - Only used for featured articles\n#   - Examples: 🚀 📊 🎯 💡 ⚡ 🔥 ✨ 🌟 📈 🛠️\n#\n# ============================================\n# EXAMPLES\n# ============================================\n\n# Example 1: Tutorial Article\n- title: \"Getting Started with PyTorch on AWS Trainium\"\n  url: \"https://example.com/pytorch-trainium-tutorial\"\n  description: \"A comprehensive guide to training PyTorch models on AWS Trainium instances, including setup, optimization tips, and common pitfalls to avoid.\"\n  author: \"Jane Developer\"\n  date: \"2026-01-15\"\n  category: \"tutorial\"\n  locale: \"en-US\"\n  keywords: [\"pytorch\", \"trainium\", \"training\", \"tutorial\", \"getting-started\"]\n  featured: false\n\n# Example 2: Benchmark Article\n- title: \"BERT Inference Performance: Inferentia2 vs GPU Comparison\"\n  url: \"https://example.com/bert-benchmark\"\n  description: \"Detailed performance comparison of BERT inference on AWS Inferentia2 vs leading GPU instances, including cost analysis and throughput metrics.\"\n  author: \"ML Performance Lab\"\n  date: \"2026-01-20\"\n  category: \"benchmark\"\n  locale: \"en-US\"\n  keywords: [\"inferentia\", \"bert\", \"benchmark\", \"performance\", \"gpu-comparison\"]\n  featured: false\n\n# Example 3: Case Study\n- title: \"How Acme Corp Reduced ML Training Costs by 60% with Trainium\"\n  url: \"https://example.com/acme-case-study\"\n  description: \"Learn how Acme Corp migrated their large language model training to AWS Trainium and achieved significant cost savings while maintaining performance.\"\n  author: \"Acme Corp Engineering Team\"\n  date: \"2026-01-25\"\n  category: \"case-study\"\n  locale: \"en-US\"\n  keywords: [\"trainium\", \"cost-optimization\", \"llm\", \"case-study\", \"migration\"]\n  featured: false\n\n# Example 4: Featured Article (requires approval)\n- title: \"Advanced Optimization Techniques for Neuron Compiler\"\n  url: \"https://example.com/neuron-optimization\"\n  description: \"Deep dive into advanced compiler optimization techniques for AWS Neuron, with practical examples and performance improvements.\"\n  author: \"AWS Neuron Team\"\n  author_url: \"https://aws.amazon.com/machine-learning/neuron/\"\n  date: \"2026-01-27\"\n  category: \"blog\"\n  locale: \"en-US\"\n  keywords: [\"neuron\", \"compiler\", \"optimization\", \"performance\", \"advanced\"]\n  featured: true\n  icon: \"⚡\"\n\n# Example 5: Japanese Article\n- title: \"AWS Trainiumで大規模言語モデルを訓練する\"\n  url: \"https://example.jp/trainium-llm-training\"\n  description: \"AWS Trainiumを使用して大規模言語モデルを効率的に訓練する方法を詳しく解説します。\"\n  author: \"日本のMLエンジニア\"\n  date: \"2026-01-20\"\n  category: \"tutorial\"\n  locale: \"ja-JP\"\n  keywords: [\"trainium\", \"llm\", \"training\", \"japanese\", \"tutorial\"]\n  featured: false\n"
  },
  {
    "path": "about-neuron/news-and-blogs/index.rst",
    "content": ".. meta::\n    :description: Links to external news and blog articles about AWS Neuron and Trainium/Inferentia ML accelerators.\n    :date-modified: 02/26/2026\n\n.. _neuron-news:\n\nAWS Neuron News and Blogs\n=========================\n\nStay up to date with the latest news, announcements, and technical blog posts about AWS Neuron, AWS Trainium, and AWS Inferentia. Discover customer success stories, performance benchmarks, best practices, and deep dives into machine learning acceleration on AWS.\n\n----\n\nFeatured Articles\n-----------------\n\nRead recent blogs and technical content about Neuron, Trainium, and Inferentia from AWS subject matter experts and our highly experienced customers.\n\n.. datatemplate:yaml:: news-and-blogs.yaml\n\n   .. grid:: 1\n      :gutter: 2\n   {% for article in data.featured_articles %}\n   {% if article.locale == 'en-US' %}{% set flag = '🇺🇸' %}{% set locale_name = 'English' %}{% elif article.locale == 'ja-JP' %}{% set flag = '🇯🇵' %}{% set locale_name = 'Japanese' %}{% elif article.locale == 'zh-CN' %}{% set flag = '🇨🇳' %}{% set locale_name = 'Chinese' %}{% elif article.locale == 'ko-KR' %}{% set flag = '🇰🇷' %}{% set locale_name = 'Korean' %}{% else %}{% set flag = '🌐' %}{% set locale_name = 'Unknown' %}{% endif %}\n\n      .. grid-item-card::\n         :class-card: sd-border-2\n         :link: {{ article.url }}\n\n         {{ article.icon }} **{{ article.title }}**\n         ^^^\n         {{ article.description }}\n         +++\n         **Published on**: {{ article.date }} | {{ flag }} ({{ locale_name }}) | Content by `{{ article.author }} <{{ article.author_url }}>`__\n   {% endfor %}\n\n.. note::\n   \n   This page is regularly updated with new content. Bookmark it to stay informed about the latest developments in AWS Neuron, Trainium, and Inferentia.\n \n**For the full list of featured articles and posts, go to the :ref:`News & Blogs <all-articles>` section of this page.**\n\n.. _all-articles:\n\nNews & Blogs \n-------------\n\nExplore the latest news, press releases, and industry coverage about AWS Neuron, Trainium, and Inferentia.\n\n.. raw:: html\n\n   <div style=\"margin-bottom: 20px;\">\n     <label for=\"locale-filter\" style=\"font-weight: bold; margin-right: 10px;\">Filter by language:</label>\n     <select id=\"locale-filter\" style=\"padding: 5px 10px; font-size: 14px; border: 1px solid #ccc; border-radius: 4px;\">\n       <option value=\"en-US\" selected>🇺🇸 English</option>\n       <option value=\"ja-JP\">🇯🇵 Japanese</option>\n       <option value=\"ko-KR\">🇰🇷 Korean</option>\n       <option value=\"zh-CN\">🇨🇳 Chinese</option>\n       <option value=\"all\">All languages</option>\n     </select>\n   </div>\n\n.. datatemplate:yaml:: news-and-blogs.yaml\n\n   .. grid:: 1 1 2 2\n      :gutter: 2\n      :class-container: articles-grid news-blogs-grid\n   {% for article in data.all_articles|sort(attribute='date', reverse=True) %}\n   {% if article.locale == 'en-US' %}{% set flag = '🇺🇸' %}{% set locale_name = 'English' %}{% elif article.locale == 'ja-JP' %}{% set flag = '🇯🇵' %}{% set locale_name = 'Japanese' %}{% elif article.locale == 'zh-CN' %}{% set flag = '🇨🇳' %}{% set locale_name = 'Chinese' %}{% elif article.locale == 'ko-KR' %}{% set flag = '🇰🇷' %}{% set locale_name = 'Korean' %}{% else %}{% set flag = '🌐' %}{% set locale_name = 'Unknown' %}{% endif %}\n\n      .. 
grid-item-card::\n         :link: {{ article.url }}\n         :class-card: sd-border-1 article-card\n         :class-body: article-locale-{{ article.locale }}\n\n         **{{ article.title }}**\n         ^^^\n         {{ article.description }}\n         +++\n         **Published on**: {{ article.date }} | {{ flag }} ({{ locale_name }})\n   {% endfor %}\n\n.. raw:: html\n\n   <script>\n   (function() {\n     'use strict';\n     \n     function initFilter() {\n       const filter = document.getElementById('locale-filter');\n       const articlesGrid = document.querySelector('.news-blogs-grid');\n       \n       if (!filter) {\n         console.error('Filter dropdown not found!');\n         return;\n       }\n       \n       if (!articlesGrid) {\n         console.error('News & Blogs grid not found!');\n         return;\n       }\n       \n       console.log('Filter and News & Blogs grid found successfully');\n       \n       // Get all article cards\n       const articleCards = Array.from(articlesGrid.querySelectorAll('.sd-col'));\n       console.log('Total article cards found:', articleCards.length);\n       \n       // Extract locale from each card\n       const cardLocales = articleCards.map((card, index) => {\n         const body = card.querySelector('[class*=\"article-locale-\"]');\n         if (!body) {\n           console.warn('Card', index, 'has no locale class');\n           return 'UNKNOWN';\n         }\n         \n         const classes = body.className.split(' ');\n         const localeClass = classes.find(c => c.startsWith('article-locale-'));\n         \n         if (!localeClass) {\n           console.warn('Card', index, 'has no article-locale- class');\n           return 'UNKNOWN';\n         }\n         \n         // Convert \"article-locale-ja-jp\" to \"JA-JP\"\n         const locale = localeClass.replace('article-locale-', '').toUpperCase();\n         console.log('Card', index, 'locale:', locale);\n         return locale;\n       });\n       \n       // Function to apply filter\n       function applyFilter(selectedLocale) {\n         console.log('=== Applying filter:', selectedLocale, '===');\n         \n         let visibleCount = 0;\n         \n         articleCards.forEach((card, index) => {\n           const cardLocale = cardLocales[index];\n           const shouldShow = (selectedLocale === 'ALL' || cardLocale === selectedLocale);\n           \n           if (shouldShow) {\n             card.style.setProperty('display', 'flex', 'important');\n             card.style.setProperty('visibility', 'visible', 'important');\n             visibleCount++;\n             console.log('Showing card', index, '(', cardLocale, ')');\n           } else {\n             card.style.setProperty('display', 'none', 'important');\n             card.style.setProperty('visibility', 'hidden', 'important');\n             console.log('Hiding card', index, '(', cardLocale, ')');\n           }\n         });\n         \n         console.log('Total visible cards:', visibleCount);\n         \n         // Remove existing \"no results\" message\n         const existingMsg = document.querySelector('.no-results-message');\n         if (existingMsg) {\n           existingMsg.remove();\n         }\n         \n         // Show \"no results\" message if needed\n         if (visibleCount === 0) {\n           const noResultsMsg = document.createElement('div');\n           noResultsMsg.className = 'no-results-message';\n           noResultsMsg.style.cssText = 'padding: 20px; text-align: center; color: #666; 
font-style: italic; margin-top: 20px;';\n           noResultsMsg.textContent = 'No articles found for the selected language.';\n           articlesGrid.parentElement.appendChild(noResultsMsg);\n         }\n       }\n       \n       // Add change event listener\n       filter.addEventListener('change', function(e) {\n         const selectedLocale = e.target.value.toUpperCase(); // Convert to uppercase for comparison\n         applyFilter(selectedLocale);\n       });\n       \n       // Apply initial filter on page load (English by default)\n       const initialLocale = filter.value.toUpperCase();\n       applyFilter(initialLocale);\n       \n       console.log('Filter initialized successfully!');\n     }\n     \n     // Initialize when DOM is ready\n     if (document.readyState === 'loading') {\n       document.addEventListener('DOMContentLoaded', initFilter);\n     } else {\n       initFilter();\n     }\n   })();\n   </script>\n\n.. important::\n\n   AWS and Neuron provide links to external articles and posts to help you discover them, but do not commission or own any content not created by AWS employees. This list is curated based on internal and customer recommendations. \n\n**Want to add your article?** Go to `https://github.com/aws-neuron/aws-neuron-sdk <https://github.com/aws-neuron/aws-neuron-sdk>`_, edit ``about-neuron/news-and-blogs/news-and-blogs.yaml`` to add your submission, and submit a pull request. \n\n\n\n"
  },
  {
    "path": "about-neuron/news-and-blogs/news-and-blogs.yaml",
    "content": "# AWS Neuron News and Blogs Data File\n# \n# This file contains metadata for external articles, blog posts, and news about\n# AWS Neuron, Trainium, and Inferentia.\n#\n# To contribute a new article:\n# 1. Add a new entry to the appropriate section below\n# 2. Follow the existing format exactly\n# 3. Submit a pull request with your changes\n#\n# Entry format:\n#   - title: \"Article Title\"\n#     url: \"https://example.com/article\"\n#     description: \"Brief description of the article content\"\n#     author: \"Author Name or Organization\"\n#     date: \"YYYY-MM-DD\"\n#     category: \"blog|news|tutorial|case-study|benchmark\"\n#     locale: \"en-US\"  # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR)\n#     keywords: [\"keyword1\", \"keyword2\", \"keyword3\"]  # List of relevant keywords for filtering\n#     featured: true|false  # Set to true for featured articles section\n\nfeatured_articles:\n  - title: \"AWS Trainium: 50 Exercises\"\n    url: \"https://zenn.dev/karakuri_blog/articles/5ccedeee1beb08\"\n    description: \"Learn how to build LLMs for Trainium accelerators with this rich 50-lesson guide from customer Karakuri.\"\n    author: \"Karakuri\"\n    author_url: \"https://about.karakuri.ai/\"\n    date: \"2026-02-19\"\n    category: \"tutorial\"\n    locale: \"en-US\"\n    keywords: [\"trainium\", \"llm\", \"training\", \"tutorial\", \"japanese\"]\n    featured: true\n    icon: \"🚀\"\n\n  - title: \"Cost-effective AI image generation with PixArt-Sigma inference on AWS Trainium and AWS Inferentia\"\n    url: \"https://aws.amazon.com/blogs/machine-learning/cost-effective-ai-image-generation-with-pixart-sigma-inference-on-aws-trainium-and-aws-inferentia/\"\n    description: \"Learn how to use AWS Trainium and Inferentia to deploy a PixArt-Sigma diffusion transformer model.\"\n    author: \"AWS Neuron Team\"\n    author_url: \"https://aws.amazon.com/machine-learning/neuron/\"\n    date: \"2026-02-19\"\n    category: \"blog\"\n    locale: \"en-US\"\n    keywords: [\"inferentia\", \"trainium\", \"inference\", \"diffusion\", \"image-generation\"]\n    featured: true\n    icon: \"📊\"\n\nall_articles:\n  # Japanese Articles\n  - title: \"AWS Neuron 関連記事まとめ\"\n    url: \"https://zenn.dev/tosshi/articles/36f3615e26c323\"\n    description: \"AWS Neuron エコシステムに関する自身が作成した一連の技術記事のインデックス\"\n    author: \"littlemex\"\n    date: \"2026-02-20\"\n    category: \"blog\"\n    locale: \"ja-JP\"\n    keywords: [\"trainium\", \"neuron\", \"collective-communication\", \"architecture\", \"japanese\"]\n    featured: false\n\n  - title: \"【AWS re:Invent 2025 速報】AWS 自社設計 AIチップ AWS Trainium3 の全貌\"\n    url: \"https://zenn.dev/aws_japan/articles/06808526d5c75f\"\n    description: \"AWS re:Invent 2025で発表されたAWS Trainium3カスタムAIチップの完全な概要をお届けします。\"\n    author: \"AWS Japan\"\n    date: \"2025-12-06\"\n    category: \"news\"\n    locale: \"ja-JP\"\n    keywords: [\"trainium3\", \"reinvent\", \"announcement\", \"ai-chip\"]\n    featured: false\n\n  - title: \"【AWS Trainium 50本ノック #0】はじめに\"\n    url: \"https://zenn.dev/karakuri_blog/articles/77d93c40b27b60\"\n    description: \"AWS Trainium 50本ノックシリーズの紹介 - 入門ガイド。\"\n    author: \"Karakuri\"\n    date: \"2025-11-18\"\n    category: \"tutorial\"\n    locale: \"ja-JP\"\n    keywords: [\"trainium\", \"tutorial\", \"getting-started\", \"series\"]\n    featured: false\n\n  - title: \"「Syn Pro」開発レポート：AWS TrainiumとRFTによる高性能日本語LLMの実現\"\n    url: \"https://zenn.dev/karakuri_blog/articles/b923acfc86083b\"\n    description: 
\"AWS TrainiumとRFTを使用した高性能日本語LLMの構築に関する開発レポート。\"\n    author: \"Karakuri\"\n    date: \"2025-10-24\"\n    category: \"case-study\"\n    locale: \"ja-JP\"\n    keywords: [\"trainium\", \"llm\", \"japanese\", \"rft\", \"case-study\"]\n    featured: false\n\n  - title: \"AWS Inferentia2 + Llama 3.2 にできること\"\n    url: \"https://zenn.dev/exwzd/articles/20250930-inferentia-llama\"\n    description: \"AWS Inferentia2とLlama 3.2モデルでできることを紹介します。\"\n    author: \"exwzd\"\n    date: \"2025-09-30\"\n    category: \"blog\"\n    locale: \"ja-JP\"\n    keywords: [\"inferentia2\", \"llama\", \"capabilities\", \"inference\"]\n    featured: false\n\n  - title: \"AWS Inferentia2とvLLMでLlama 3.2の推論サーバーを構築する手順\"\n    url: \"https://zenn.dev/exwzd/articles/20250827_inferentia_compile\"\n    description: \"AWS Inferentia2とvLLMを使用してLlama 3.2推論サーバーを構築するステップバイステップガイド。\"\n    author: \"exwzd\"\n    date: \"2025-08-28\"\n    category: \"tutorial\"\n    locale: \"ja-JP\"\n    keywords: [\"inferentia2\", \"vllm\", \"llama\", \"inference\", \"tutorial\"]\n    featured: false\n\n  - title: \"【開催報告】Neuron Community – Vol.2\"\n    url: \"https://aws.amazon.com/jp/blogs/news/neuron-community-vol-2/\"\n    description: \"Neuron Community Vol.2の開催報告。\"\n    author: \"AWS Japan\"\n    date: \"2025-07-24\"\n    category: \"news\"\n    locale: \"ja-JP\"\n    keywords: [\"community\", \"event\", \"neuron\", \"japan\"]\n    featured: false\n\n  - title: \"KARAKURI VL - 日本語コンピュータユースに特化した視覚言語モデル\"\n    url: \"https://zenn.dev/karakuri_blog/articles/28c73f2ada797a\"\n    description: \"日本語コンピュータユースに特化したビジョン言語モデルKARAKURI VLの紹介。\"\n    author: \"Karakuri\"\n    date: \"2025-07-11\"\n    category: \"blog\"\n    locale: \"ja-JP\"\n    keywords: [\"vision-language\", \"japanese\", \"multimodal\", \"karakuri\"]\n    featured: false\n\n  - title: \"LLM-jp Chatbot Arenaを試験運用しました\"\n    url: \"https://llm-jp.nii.ac.jp/ja/blog/blog-836/\"\n    description: \"LLM-jp Chatbot Arenaの試験運用に関するレポート。\"\n    author: \"LLM-jp\"\n    date: \"2025-05-12\"\n    category: \"blog\"\n    locale: \"ja-JP\"\n    keywords: [\"llm\", \"chatbot\", \"arena\", \"japanese\"]\n    featured: false\n\n  - title: \"【開催報告】Neuron Community – Day One\"\n    url: \"https://aws.amazon.com/jp/blogs/news/neuron-community-day-one/\"\n    description: \"初回Neuron Community Dayの開催報告。\"\n    author: \"AWS Japan\"\n    date: \"2025-04-14\"\n    category: \"news\"\n    locale: \"ja-JP\"\n    keywords: [\"community\", \"event\", \"neuron\", \"japan\"]\n    featured: false\n\n  - title: \"EKS Auto Mode でサクッと機械学習用インスタンスを利用してみる。 AWS 独自設計チップ搭載の Trainium と Inferentia を使ってみた！\"\n    url: \"https://dev.classmethod.jp/articles/eks-auto-mode-gpu-aws-trainium-inferentia/\"\n    description: \"EKS Auto Modeを使用してMLインスタンスを簡単に利用する方法。AWS TrainiumとInferentiaチップの活用ガイド。\"\n    author: \"Classmethod\"\n    date: \"2025-01-02\"\n    category: \"tutorial\"\n    locale: \"ja-JP\"\n    keywords: [\"eks\", \"trainium\", \"inferentia\", \"kubernetes\", \"tutorial\"]\n    featured: false\n\n  # Korean Articles\n  - title: \"Nota AI가 제안하는 AWS Inferentia에서 다양한 LLM 모델 양자화 최적화기법 사용하기\"\n    url: \"https://aws.amazon.com/ko/blogs/tech/llm-model-quantization-techniques-for-aws-inferentia-by-nota-ai/\"\n    description: \"Nota AI가 제안하는 AWS Inferentia에서 LLM 모델 양자화 최적화 기법.\"\n    author: \"Nota AI / AWS Korea\"\n    date: \"2026-01-20\"\n    category: \"blog\"\n    locale: \"ko-KR\"\n    keywords: [\"inferentia\", \"quantization\", \"llm\", \"optimization\", \"nota-ai\"]\n    featured: false\n\n  - title: \"Nota AI가 제안하는 
Transformer 모델을 AWS Inferentia/Trainium에 손쉽게 배포하는 방법\"\n    url: \"https://aws.amazon.com/ko/blogs/tech/tips-for-using-transformer-models-on-aws-inf-and-trn/\"\n    description: \"Nota AI가 제안하는 AWS Inferentia/Trainium에서 Transformer 모델을 쉽게 배포하는 방법.\"\n    author: \"Nota AI / AWS Korea\"\n    date: \"2025-04-09\"\n    category: \"blog\"\n    locale: \"ko-KR\"\n    keywords: [\"transformer\", \"deployment\", \"inferentia\", \"trainium\", \"nota-ai\"]\n    featured: false\n\n  - title: \"콜드스타트 추천 문제를 AWS Trainium과 vLLM으로 해결하는 자동화 전략\"\n    url: \"https://blog.a-cloud.co.kr/2025/07/25/%EC%BD%9C%EB%93%9C%EC%8A%A4%ED%83%80%ED%8A%B8-%EC%B6%94%EC%B2%9C-%EB%AC%B8%EC%A0%9C%EB%A5%BC-aws-trainium%EA%B3%BC-vllm%EC%9C%BC%EB%A1%9C-%ED%95%B4%EA%B2%B0%ED%95%98%EB%8A%94-%EC%9E%90%EB%8F%99/\"\n    description: \"AWS Trainium과 vLLM을 사용하여 콜드 스타트 추천 문제를 해결하는 자동화 전략.\"\n    author: \"A-Cloud\"\n    date: \"2025-07-25\"\n    category: \"blog\"\n    locale: \"ko-KR\"\n    keywords: [\"trainium\", \"vllm\", \"cold-start\", \"recommendations\", \"automation\"]\n    featured: false\n\n  - title: \"DeepSeek-R1 모델 AWS 출시\"\n    url: \"https://aws.amazon.com/ko/blogs/korea/deepseek-r1-models-now-available-on-aws/\"\n    description: \"AWS에서 DeepSeek-R1 모델을 사용할 수 있게 되었습니다.\"\n    author: \"AWS Korea\"\n    date: \"2025-02-05\"\n    category: \"news\"\n    locale: \"ko-KR\"\n    keywords: [\"deepseek\", \"r1\", \"model\", \"launch\", \"aws\"]\n    featured: false\n\n  # Chinese Articles\n  - title: \"使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型（一）\"\n    url: \"https://aws.amazon.com/cn/blogs/china/deploying-the-deepseek-r1-distillation-model-using-amazon-inferentia2/\"\n    description: \"使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型（第一部分）。\"\n    author: \"AWS China\"\n    date: \"2025-02-12\"\n    category: \"tutorial\"\n    locale: \"zh-CN\"\n    keywords: [\"inferentia2\", \"deepseek\", \"r1\", \"deployment\", \"distillation\"]\n    featured: false\n\n  - title: \"使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型（二）\"\n    url: \"https://aws.amazon.com/cn/blogs/china/deploying-the-deepseek-r1-distillation-model-using-amazon-inferentia2-part-two/\"\n    description: \"使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型（第二部分）。\"\n    author: \"AWS China\"\n    date: \"2025-02-14\"\n    category: \"tutorial\"\n    locale: \"zh-CN\"\n    keywords: [\"inferentia2\", \"deepseek\", \"r1\", \"deployment\", \"distillation\"]\n    featured: false\n\n  - title: \"Bytedance processes billions of daily videos using their multimodal video understanding models on AWS Inferentia2\"\n    url: \"https://aws.amazon.com/blogs/machine-learning/bytedance-processes-billions-of-daily-videos-using-their-multimodal-video-understanding-models-on-aws-inferentia2/\"\n    description: \"How Bytedance processes billions of daily videos using multimodal models on AWS Inferentia2.\"\n    author: \"AWS\"\n    date: \"2025-02-26\"\n    category: \"case-study\"\n    locale: \"en-US\"\n    keywords: [\"inferentia2\", \"bytedance\", \"video\", \"multimodal\", \"case-study\"]\n    featured: false\n\n  - title: \"基于 HAMi 实现亚马逊云科技 Trainium 与 Inferentia 核心级共享与策略性拓扑调度\"\n    url: \"https://aws.amazon.com/cn/blogs/china/achieve-trainium-and-inferentia-core-level-sharing-and-strategic-topology-scheduling/\"\n    description: \"基于 HAMi 实现亚马逊云科技 Trainium 与 Inferentia 核心级共享与策略性拓扑调度。\"\n    author: \"AWS China\"\n    date: \"2025-11-06\"\n    category: \"blog\"\n    locale: \"zh-CN\"\n    keywords: [\"trainium\", \"inferentia\", 
\"hami\", \"scheduling\", \"topology\"]\n    featured: false\n\n  # Red Hat / AWS Neuron Collaboration\n  - title: \"Red Hat to Deliver Enhanced AI Inference Across AWS\"\n    url: \"https://www.redhat.com/en/about/press-releases/red-hat-deliver-enhanced-ai-inference-across-aws\"\n    description: \"Red Hat and AWS expand collaboration to power enterprise-grade generative AI using Red Hat AI Inference Server on AWS Inferentia2 and Trainium3.\"\n    author: \"Red Hat\"\n    date: \"2025-12-02\"\n    category: \"news\"\n    locale: \"en-US\"\n    keywords: [\"red-hat\", \"inferentia2\", \"trainium3\", \"vllm\", \"openshift\", \"inference\", \"collaboration\"]\n    featured: false\n\n  - title: \"Run cost-effective AI workloads on OpenShift with AWS Neuron Operator\"\n    url: \"https://developers.redhat.com/articles/2025/12/02/cost-effective-ai-workloads-openshift-aws-neuron-operator\"\n    description: \"How to use the AWS Neuron Operator to run LLM inference with vLLM on AWS AI chips in Red Hat OpenShift.\"\n    author: \"Red Hat\"\n    date: \"2025-12-02\"\n    category: \"tutorial\"\n    locale: \"en-US\"\n    keywords: [\"red-hat\", \"openshift\", \"neuron-operator\", \"vllm\", \"inferentia\", \"trainium\", \"kubernetes\"]\n    featured: false\n\n  - title: \"AWS Neuron Operator for AI Chips on AWS — GitHub Releases\"\n    url: \"https://github.com/awslabs/operator-for-ai-chips-on-aws/releases\"\n    description: \"Open-source AWS Neuron Operator for Kubernetes and Red Hat OpenShift, enabling native support for AWS Inferentia and Trainium accelerators.\"\n    author: \"AWS\"\n    date: \"2025-12-02\"\n    category: \"news\"\n    locale: \"en-US\"\n    keywords: [\"neuron-operator\", \"kubernetes\", \"openshift\", \"open-source\", \"inferentia\", \"trainium\"]\n    featured: false\n\n  - title: \"Red Hat AI Inference Server — vLLM Neuron Container Image (RHEL 9)\"\n    url: \"https://catalog.redhat.com/en/software/containers/rhaiis/vllm-neuron-rhel9/698c42b20b626d81c97abd7f\"\n    description: \"Certified container image for the Red Hat AI Inference Server with vLLM optimized for AWS Inferentia and Trainium accelerators via the AWS Neuron SDK. Provides enterprise-grade, high-performance LLM inference serving on RHEL 9, enabling production deployment of generative AI models on AWS AI chips through Red Hat OpenShift or Podman.\"\n    author: \"Red Hat\"\n    date: \"2025-12-02\"\n    category: \"news\"\n    locale: \"en-US\"\n    keywords: [\"red-hat\", \"vllm\", \"neuron\", \"inferentia\", \"trainium\", \"container\", \"rhel9\", \"inference\", \"openshift\"]\n    featured: true\n\n"
  },
  {
    "path": "about-neuron/news-and-blogs/validate_articles.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nValidation script for news-and-blogs.yaml\n\nThis script validates the structure and content of article entries\nto ensure they meet the required format before submission.\n\nUsage:\n    python validate_articles.py\n\"\"\"\n\nimport sys\nfrom pathlib import Path\nfrom datetime import datetime\nimport re\n\ntry:\n    import yaml\nexcept ImportError:\n    print(\"Error: PyYAML is required. Install with: pip install pyyaml\")\n    sys.exit(1)\n\n\nVALID_CATEGORIES = {'blog', 'news', 'tutorial', 'case-study', 'benchmark'}\nREQUIRED_FIELDS = {'title', 'url', 'description', 'author', 'date', 'category', 'locale', 'keywords'}\nOPTIONAL_FIELDS = {'featured', 'author_url', 'icon'}\nALL_FIELDS = REQUIRED_FIELDS | OPTIONAL_FIELDS\n\n# Valid locale codes\nVALID_LOCALES = {\n    'en-US', 'en-GB', 'en-CA', 'en-AU', 'en-NZ', 'en-IE', 'en-IN', 'en-SG', 'en-ZA',\n    'ja-JP', 'zh-CN', 'zh-TW', 'zh-HK', 'ko-KR', 'th-TH', 'vi-VN', 'id-ID', 'ms-MY', 'fil-PH',\n    'de-DE', 'fr-FR', 'es-ES', 'es-MX', 'es-AR', 'pt-BR', 'pt-PT', 'it-IT', 'nl-NL', 'pl-PL',\n    'ru-RU', 'tr-TR', 'sv-SE', 'da-DK', 'no-NO', 'fi-FI', 'cs-CZ', 'hu-HU', 'ro-RO', 'el-GR',\n    'uk-UA', 'ar-SA', 'ar-AE', 'ar-EG', 'he-IL', 'fa-IR', 'hi-IN', 'bn-BD', 'ur-PK', 'sw-KE'\n}\n\n\ndef validate_url(url):\n    \"\"\"Validate URL format\"\"\"\n    url_pattern = re.compile(\n        r'^https?://'  # http:// or https://\n        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\\.)+[A-Z]{2,6}\\.?|'  # domain\n        r'localhost|'  # localhost\n        r'\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})'  # or IP\n        r'(?::\\d+)?'  # optional port\n        r'(?:/?|[/?]\\S+)$', re.IGNORECASE)\n    return url_pattern.match(url) is not None\n\n\ndef validate_date(date_str):\n    \"\"\"Validate date format (YYYY-MM-DD)\"\"\"\n    try:\n        datetime.strptime(date_str, '%Y-%m-%d')\n        return True\n    except ValueError:\n        return False\n\n\ndef validate_article(article, index, section):\n    \"\"\"Validate a single article entry\"\"\"\n    errors = []\n    warnings = []\n    \n    # Check for required fields\n    missing_fields = REQUIRED_FIELDS - set(article.keys())\n    if missing_fields:\n        errors.append(f\"Missing required fields: {', '.join(missing_fields)}\")\n    \n    # Check for unknown fields\n    unknown_fields = set(article.keys()) - ALL_FIELDS\n    if unknown_fields:\n        warnings.append(f\"Unknown fields (will be ignored): {', '.join(unknown_fields)}\")\n    \n    # Validate title\n    if 'title' in article:\n        if not article['title'] or not isinstance(article['title'], str):\n            errors.append(\"Title must be a non-empty string\")\n        elif len(article['title']) > 200:\n            warnings.append(f\"Title is very long ({len(article['title'])} chars). Consider shortening.\")\n    \n    # Validate URL\n    if 'url' in article:\n        if not validate_url(article['url']):\n            errors.append(f\"Invalid URL format: {article['url']}\")\n    \n    # Validate description\n    if 'description' in article:\n        if not article['description'] or not isinstance(article['description'], str):\n            errors.append(\"Description must be a non-empty string\")\n        elif len(article['description']) < 20:\n            warnings.append(\"Description is very short. Consider adding more detail.\")\n        elif len(article['description']) > 500:\n            warnings.append(f\"Description is very long ({len(article['description'])} chars). 
Consider shortening.\")\n    \n    # Validate author\n    if 'author' in article:\n        if not article['author'] or not isinstance(article['author'], str):\n            errors.append(\"Author must be a non-empty string\")\n    \n    # Validate author_url (optional)\n    if 'author_url' in article:\n        if article['author_url'] and not validate_url(article['author_url']):\n            errors.append(f\"Invalid author_url format: {article['author_url']}\")\n    \n    # Validate date\n    if 'date' in article:\n        if not validate_date(str(article['date'])):\n            errors.append(f\"Invalid date format: {article['date']}. Use YYYY-MM-DD\")\n        else:\n            article_date = datetime.strptime(str(article['date']), '%Y-%m-%d')\n            if article_date > datetime.now():\n                warnings.append(f\"Date is in the future: {article['date']}\")\n    \n    # Validate category\n    if 'category' in article:\n        if article['category'] not in VALID_CATEGORIES:\n            errors.append(f\"Invalid category: {article['category']}. Must be one of: {', '.join(VALID_CATEGORIES)}\")\n    \n    # Validate locale\n    if 'locale' in article:\n        if not isinstance(article['locale'], str):\n            errors.append(\"Locale must be a string\")\n        elif article['locale'] not in VALID_LOCALES:\n            warnings.append(f\"Locale '{article['locale']}' not in standard list. Will display with 🌐 globe icon. Common locales: en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR\")\n    \n    # Validate keywords\n    if 'keywords' in article:\n        if not isinstance(article['keywords'], list):\n            errors.append(\"Keywords must be a list\")\n        elif len(article['keywords']) == 0:\n            warnings.append(\"Keywords list is empty. Consider adding relevant keywords for better filtering\")\n        else:\n            for i, keyword in enumerate(article['keywords']):\n                if not isinstance(keyword, str):\n                    errors.append(f\"Keyword at index {i} must be a string\")\n                elif len(keyword.strip()) == 0:\n                    warnings.append(f\"Keyword at index {i} is empty or whitespace\")\n            if len(article['keywords']) > 10:\n                warnings.append(f\"Article has {len(article['keywords'])} keywords. 
Consider limiting to 5-10 most relevant keywords\")\n    \n    # Validate featured\n    if 'featured' in article:\n        if not isinstance(article['featured'], bool):\n            errors.append(\"Featured must be true or false (boolean)\")\n        if section == 'all_articles' and article['featured']:\n            warnings.append(\"Article marked as featured but in all_articles section\")\n    \n    # Validate icon (optional)\n    if 'icon' in article:\n        if not isinstance(article['icon'], str) or len(article['icon']) > 10:\n            warnings.append(\"Icon should be a short string (emoji recommended)\")\n    \n    return errors, warnings\n\n\ndef main():\n    \"\"\"Main validation function\"\"\"\n    yaml_file = Path(__file__).parent / 'news-and-blogs.yaml'\n    \n    if not yaml_file.exists():\n        print(f\"❌ Error: {yaml_file} not found\")\n        return 1\n    \n    print(f\"Validating {yaml_file}...\\n\")\n    \n    try:\n        with open(yaml_file, 'r', encoding='utf-8') as f:\n            data = yaml.safe_load(f)\n    except yaml.YAMLError as e:\n        print(f\"❌ YAML Parse Error: {e}\")\n        return 1\n    \n    if not isinstance(data, dict):\n        print(\"❌ Error: YAML file must contain a dictionary\")\n        return 1\n    \n    total_errors = 0\n    total_warnings = 0\n    \n    # Validate featured_articles section\n    if 'featured_articles' in data:\n        print(\"📌 Validating featured_articles section...\")\n        if not isinstance(data['featured_articles'], list):\n            print(\"❌ Error: featured_articles must be a list\")\n            total_errors += 1\n        else:\n            for i, article in enumerate(data['featured_articles'], 1):\n                errors, warnings = validate_article(article, i, 'featured_articles')\n                if errors or warnings:\n                    print(f\"\\n  Article #{i}: {article.get('title', 'NO TITLE')}\")\n                    for error in errors:\n                        print(f\"    ❌ Error: {error}\")\n                        total_errors += 1\n                    for warning in warnings:\n                        print(f\"    ⚠️  Warning: {warning}\")\n                        total_warnings += 1\n        print()\n    \n    # Validate all_articles section\n    if 'all_articles' in data:\n        print(\"📚 Validating all_articles section...\")\n        if not isinstance(data['all_articles'], list):\n            print(\"❌ Error: all_articles must be a list\")\n            total_errors += 1\n        else:\n            for i, article in enumerate(data['all_articles'], 1):\n                errors, warnings = validate_article(article, i, 'all_articles')\n                if errors or warnings:\n                    print(f\"\\n  Article #{i}: {article.get('title', 'NO TITLE')}\")\n                    for error in errors:\n                        print(f\"    ❌ Error: {error}\")\n                        total_errors += 1\n                    for warning in warnings:\n                        print(f\"    ⚠️  Warning: {warning}\")\n                        total_warnings += 1\n        print()\n    \n    # Summary\n    print(\"=\" * 60)\n    if total_errors == 0 and total_warnings == 0:\n        print(\"✅ Validation passed! 
No errors or warnings found.\")\n        return 0\n    else:\n        print(f\"Validation complete:\")\n        if total_errors > 0:\n            print(f\"  ❌ {total_errors} error(s) found - must be fixed\")\n        if total_warnings > 0:\n            print(f\"  ⚠️  {total_warnings} warning(s) found - should be reviewed\")\n        \n        if total_errors > 0:\n            print(\"\\n❌ Validation FAILED - please fix errors before submitting\")\n            return 1\n        else:\n            print(\"\\n✅ Validation PASSED - warnings are optional to fix\")\n            return 0\n\n\nif __name__ == '__main__':\n    sys.exit(main())\n"
  },
  {
    "path": "about-neuron/oss/index.rst",
    "content": ".. meta::\n    :description: GitHub repositories for AWS Neuron open source components, libraries, and tools.\n    :date-modified: 12/02/2025\n\nNeuron Open Source Repositories and Contribution\n===================================================\n\nAWS Neuron provides open source code and samples for some of its components, libraries, and tools under the Apache 2.0 license. The current public repositories open to contribution at this time are listed below.\n\nNeuron Open Source GitHub Repositories\n---------------------------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: \n      :class-body: sphinx-design-class-title-small\n \n      **TorchNeuron PyTorch Extension Open Source**\n      ^^^\n      Source code for the Neuron Native PyTorch extension and the TorchNeuron library that implements it for AWS Trainium.\n\n      * Neuron GitHub source repository: https://github.com/aws-neuron/torch-neuronx\n\n   .. grid-item-card:: \n      :class-body: sphinx-design-class-title-small\n \n      **Neuron Kernel Library Open Source**\n      ^^^\n      Source code and specifications for the pre-built kernels that ship with the NKI Library .\n\n      * Neuron GitHub source repository: https://github.com/aws-neuron/nki-library\n  \n   .. grid-item-card:: \n      :class-body: sphinx-design-class-title-small\n \n      **vLLM for Neuron Open Source**\n      ^^^\n      Source code for the vLLM integrations with Neuron, supporting AWS Trainium and Inferentia.\n\n      * Neuron GitHub source repository: https://github.com/vllm-project/vllm-neuron\n      * **Note**: Released under vLLM project license (`LICENSE <https://github.com/vllm-project/vllm-neuron/blob/main/LICENSE>`__). \n  \n   .. grid-item-card:: \n      :class-body: sphinx-design-class-title-small\n \n      **NKI Samples**\n      ^^^\n      Full code examples that support NKI kernel development.\n\n      * Neuron GitHub source repository: https://github.com/aws-neuron/nki-samples\n\nHow to Contribute to Neuron Open Source\n----------------------------------------\n\nContributions via pull requests are appreciated! Before sending us a pull request, please ensure that:\n\n1. You are working against the latest source on the `main`` branch.\n2. You check existing open and recently merged pull requests and GitHub Issues to make sure someone else hasn't addressed the problem already.\n3. You open a GitHub Issue for the repo to discuss any significant work.\n\nTo send us a pull request:\n\n1. Fork the repository.\n2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.\n3. Ensure local tests pass.\n4. Commit to your fork using clear commit messages.\n5. Send us a pull request, answering any default questions in the pull request interface.\n6. 
Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.\n\nGitHub provides documentation on `forking a repository <https://help.github.com/articles/fork-a-repo/>`_ and `creating a pull request <https://help.github.com/articles/creating-a-pull-request/>`_.\n\nFor specific details on licenses and contributing to each OSS repo, review the ``CONTRIBUTING.md`` pages linked below:\n\n* Contribute to TorchNeuron: https://github.com/aws-neuron/torch-neuronx/blob/main/CONTRIBUTING.md\n* Contribute to the NKI Library: https://github.com/aws-neuron/nki-library/blob/main/CONTRIBUTING.md\n* Contribute to the NKI samples: https://github.com/aws-neuron/nki-samples/blob/main/CONTRIBUTING.md\n\n.. Re-add this when available: * Contribute to vLLM Neuron: https://github.com/vllm-project/vllm-neuron/blob/main/CONTRIBUTING.md\n
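\nAs a quick illustration of the pull request steps above, the shell sketch below shows one possible sequence; the repository, branch, and user names are placeholders, so substitute the repo you are contributing to:\n\n.. code-block:: bash\n\n   # Fork the target repo (for example, aws-neuron/torch-neuronx) in the GitHub UI, then:\n   git clone https://github.com/YOUR_GITHUB_USERNAME/torch-neuronx.git\n   cd torch-neuronx\n   git checkout -b my-focused-change main\n\n   # Edit the source, keeping the change focused, and run the repo's local tests.\n\n   git add <files you changed>\n   git commit -m \"Short, clear description of the change\"\n   git push origin my-focused-change\n\n   # Open a pull request against the upstream main branch in the GitHub UI and\n   # answer the default questions in the pull request template.\n"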
  },
  {
    "path": "about-neuron/profiling-tools.rst",
    "content": ".. _profiling-tools:\n\nProfiling Tools\n================\n\n.. toctree:: \n    :maxdepth: 1\n\n    Neuron Profiler User Guide </tools/profiler/neuron-profile-user-guide>\n    Neuron Profiler 2.0 (Beta) User Guide </tools/profiler/neuron-profiler-2-0-beta-user-guide>\n    What's New </release-notes/components/dev-tools>\n\n\n"
  },
  {
    "path": "about-neuron/quick-start/_specs/REFACTORING_NOTES.md",
    "content": "# Quick-Start Refactoring Notes\n\n## Summary\n\nThe quick-start documentation has been restructured with a modern, task-based information architecture. The new structure eliminates the need for .txt includes in the primary quickstart paths.\n\n## New Structure (No .txt includes)\n\n### Primary Quickstarts (Self-contained)\n- `index.rst` - Main landing page with decision tree\n- `training-quickstart.rst` - Complete training workflow (no includes)\n- `inference-quickstart.rst` - Complete inference workflow (no includes)\n\nThese files follow the procedural-quickstart template and contain all content inline. No external includes required.\n\n### Supporting Pages\n- `docs-quicklinks.rst` - Quick navigation links\n- `github-samples.rst` - GitHub repository links\n\n## Legacy Structure (Uses .txt includes)\n\n### Legacy Quick-Start Pages (Inf1 only)\n- `torch-neuron.rst` - Uses tab-inference-torch-neuronx.txt and tab-inference-torch-neuron.txt\n- `tensorflow-neuron.rst` - Uses tab-inference-tensorflow-neuronx.txt and tab-inference-tensorflow-neuron.rst\n- `mxnet-neuron.rst` - Uses tab-inference-mxnet-neuron.txt\n\nThese legacy pages:\n- Target Inf1 instances (NeuronCore v1)\n- Use .txt includes that reference `/src/helperscripts/installationScripts/python_instructions.txt`\n- Are de-emphasized in the new navigation (under \"Legacy\" section)\n- Are preserved for backward compatibility and existing links\n\n### .txt Include Files (Legacy only)\nAll .txt files in this directory are used exclusively by the legacy quick-start pages:\n- `tab-inference-torch-neuronx*.txt` (various OS versions)\n- `tab-inference-torch-neuron*.txt` (various OS versions)\n- `tab-inference-tensorflow-neuronx*.txt` (various OS versions)\n- `tab-inference-tensorflow-neuron*.txt` (various OS versions)\n- `tab-inference-mxnet-neuron*.txt` (various OS versions)\n- `select-framework-note.txt`\n\n## Design Decision\n\n**Why not refactor legacy files?**\n1. They target deprecated Inf1 hardware\n2. They're not prominently featured in new navigation\n3. Refactoring would require updating installation script references\n4. Risk of breaking existing external links\n5. New users are directed to the new self-contained quickstarts\n\n**Why are new quickstarts self-contained?**\n1. Easier to maintain (all content in one place)\n2. Better for AI/LLM context retrieval\n3. Follows modern docs-as-code best practices\n4. Clearer for human readers (no jumping between files)\n5. Follows the procedural-quickstart template structure\n\n## Migration Path\n\nFor users currently using legacy quick-starts:\n- Inf1 users: Continue using legacy pages (torch-neuron.rst, etc.)\n- New projects: Use new quickstarts (training-quickstart.rst, inference-quickstart.rst)\n- Inf2/Trn1/Trn2/Trn3 users: Use new quickstarts\n\n## Future Cleanup\n\nWhen Inf1 support is fully deprecated:\n1. Archive legacy quick-start pages to `/archive/quick-start/`\n2. Remove .txt include files\n3. Update any remaining cross-references\n4. Update neuron_tag.py to remove special handling\n"
  },
  {
    "path": "about-neuron/quick-start/docs-quicklinks.rst",
    "content": ".. _docs-quick-links:\n\nNeuron Quick Links\n==================\n\n.. grid:: 2\n        :gutter: 2\n\n        .. grid-item-card:: Overview\n                \n                * :ref:`neuron-quickstart`\n                * :ref:`amazon-q-dev`\n                * :ref:`model_samples_tutorials`\n                * :ref:`benchmark`\n                * :ref:`neuron_release_notes`\n                * :ref:`announcements-main`\n\n        .. grid-item-card:: ML frameworks\n                \n                * :ref:`pytorch-neuronx-main`\n                * :ref:`jax-neuron-main`\n                * :ref:`tensorflow-neuron-main`\n                * :doc:`MXNet Neuron (archived) </archive/mxnet-neuron/index>`\n\n        .. grid-item-card:: ML libraries\n\n                * :ref:`nxdt`\n                * :ref:`NxD Inference <nxdi-index>`\n                * :ref:`neuronx-distributed-index`\n                * :ref:`transformers_neuronx_readme`\n                * :ref:`nemo-megatron-index`\n\n        .. grid-item-card:: User Guides\n                \n                * :ref:`neuron_runtime`\n                * :ref:`neuron_cc`\n                * :ref:`Neuron Kernel Interface (NKI) (beta) <neuron-nki>`\n                * :ref:`Neuron Custom C++ Operators (beta) <neuron_c++customops>`\n                * :ref:`monitoring_tools`\n                * :ref:`profiling-tools`\n                * :ref:`setup-guide-index`\n                * :ref:`neuron-dlami-overview`\n                * :ref:`neuron_containers`\n                * :ref:`neuron-devflows`\n\n        .. grid-item-card:: Learn AWS Neuron\n\n                * :ref:`neuron-architecture-index`\n                * :ref:`neuron-features-index`\n                * :ref:`neuron-appnotes-index`\n                * :ref:`neuron_faq`\n                * :ref:`general-troubleshooting`\n\n        .. grid-item-card:: About AWS Neuron\n\n                * :ref:`neuron_release_notes`\n\n"
  },
  {
    "path": "about-neuron/quick-start/github-samples.rst",
    "content": ".. _neuron-github-samples:\n\nNeuron GitHub Samples\n=====================\n\n.. grid:: 2\n\n        .. dropdown::  Training Samples for ``Trn1``\n                :class-title: sphinx-design-class-title-small\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n                :open:\n\n                * `PyTorch Neuron (torch-neuronx) samples for Trn1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#training>`_\n                * `Nemo Megatron for Neuron for Trn1 <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n                * `AWS Neuron samples for ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`_\n                * `AWS Neuron samples for EKS <https://github.com/aws-neuron/aws-neuron-eks-samples>`_\n                * `AWS Neuron samples for SageMaker <https://github.com/aws-neuron/aws-neuron-sagemaker-samples>`_\n                * `AWS Neuron samples for Batch <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/aws-batch/>`_\n\n\n        .. dropdown::  Inference Samples for ``Inf2 & Trn1``\n                :class-title: sphinx-design-class-title-small\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n                :open:\n\n                * `PyTorch Neuron (torch-neuronx) samples for Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#inference>`_\n                * `Transformers Neuron (transformers-neuronx)  samples <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx>`_\n                * `AWS Neuron samples for SageMaker <https://github.com/aws-neuron/aws-neuron-sagemaker-samples>`_\n\n        .. dropdown::   Inference Samples for ``Inf1``\n                :class-title: sphinx-design-class-title-small\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n                :open:\n\n                * `PyTorch Neuron (torch-neuron) samples for Inf1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuron>`_\n                * `TensorFlow Neuron (tensorflow-neuron) samples for Inf1 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/tensorflow-neuron>`_\n\n        \n"
  },
  {
    "path": "about-neuron/quick-start/index.rst",
    "content": ".. meta::\n   :description: Get started quickly with AWS Neuron SDK for PyTorch, JAX, and TensorFlow on Inferentia and Trainium\n   :keywords: neuron, quickstart, getting started, pytorch, jax, tensorflow, inferentia, trainium, training, inference\n   :instance-types: inf2, trn1, trn2, trn3\n   :content-type: navigation-hub\n   :date-modified: 2026-03-03\n\n.. _neuron-quickstart:\n\nGet Started with AWS Neuron\n============================\n\nGet up and running with AWS Neuron SDK in minutes. These quickstarts guide you through your first training or inference workload on Inferentia and Trainium instances.\n\n.. note::\n   \n   **First time using AWS Neuron?** These quickstarts assume you have:\n   \n   - An active AWS account with EC2 access\n   - Basic familiarity with your chosen ML framework (PyTorch, JAX, or TensorFlow)\n   - SSH access to launch and connect to EC2 instances\n   \n   For detailed installation instructions, see the :doc:`Setup Guide </setup/index>`.\n\nChoose Your Path\n----------------\n\nSelect the quickstart that matches your use case:\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: 🚀 Training Quickstart\n      :link: training-quickstart\n      :link-type: ref\n      :class-card: sd-border-2\n      \n      Train your first model on Trainium\n      \n      - Launch a Trn1 instance\n      - Run a PyTorch training script\n      - Monitor training progress\n      \n      **Time**: ~15 minutes\n      \n      :bdg-primary:`Trn1` :bdg-primary:`Trn2` :bdg-primary:`Trn3`\n\n   .. grid-item-card:: 🎯 Inference Quickstart\n      :link: inference-quickstart\n      :link-type: ref\n      :class-card: sd-border-2\n      \n      Run your first inference on Inferentia\n      \n      - Launch an Inf2 instance\n      - Load a pre-compiled model\n      - Run predictions\n      \n      **Time**: ~10 minutes\n      \n      :bdg-success:`Inf2` :bdg-success:`Trn1`\n\nSpecialized Quickstarts\n-----------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: 💬 LLM Serving with vLLM\n      :class-card: sd-border-1\n      \n      Deploy large language models for production inference\n      \n      - :doc:`Online serving </libraries/nxd-inference/vllm/quickstart-vllm-online-serving>` (OpenAI-compatible API)\n      - :doc:`Offline batch inference </libraries/nxd-inference/vllm/quickstart-vllm-offline-serving>`\n      \n      **Time**: ~20 minutes\n      \n      :bdg-info:`Inf2` :bdg-info:`Trn1`\n\n   .. grid-item-card:: 🤖 Amazon AI helper tools\n      :link: amazon-q-dev\n      :link-type: ref\n      :class-card: sd-border-1\n      \n      Use AI-powered code assistance for Neuron development\n      \n      - Get code suggestions\n      - Debug Neuron applications\n      - Optimize performance\n      \n      **Time**: ~5 minutes\n\nFramework-Specific Guides\n-------------------------\n\nNeed framework-specific setup instructions?\n\n.. grid:: 1 1 3 3\n   :gutter: 2\n\n   .. grid-item-card:: PyTorch\n      :link: /setup/pytorch/index\n      :link-type: doc\n      :class-card: sd-border-1\n      :class-body: sphinx-design-class-title-small\n      \n      PyTorch 2.9+ setup\n\n   .. grid-item-card:: JAX\n      :link: /setup/jax/index\n      :link-type: doc\n      :class-card: sd-border-1\n      :class-body: sphinx-design-class-title-small\n      \n      JAX 0.7+ setup\n\n   .. 
grid-item-card:: TensorFlow\n      :link: /archive/tensorflow/index\n      :link-type: doc\n      :class-card: sd-border-1\n      :class-body: sphinx-design-class-title-small\n      \n      TensorFlow 2.x setup\n\nAdditional Resources\n--------------------\n\n- :doc:`/about-neuron/models/index` - Pre-tested model samples and tutorials\n- :doc:`/devflows/ec2-flows` - Detailed EC2 deployment workflows\n- :doc:`/containers/index` - Use Deep Learning Containers\n- :doc:`docs-quicklinks` - Quick links to all Neuron documentation\n- :doc:`github-samples` - GitHub sample repositories\n\nLegacy Quick-Start Pages (Inf1)\n--------------------------------\n\n.. warning::\n   \n   The following pages are for legacy Inf1 instances only. For new projects, use the quickstarts above for Inf2, Trn1, Trn2, or Trn3.\n\n- :doc:`torch-neuron` - PyTorch on Inf1\n- :doc:`tensorflow-neuron` - TensorFlow on Inf1\n- :doc:`mxnet-neuron` - MXNet on Inf1\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n   \n   training-quickstart\n   inference-quickstart\n   /libraries/nxd-inference/vllm/quickstart-vllm-online-serving\n   /libraries/nxd-inference/vllm/quickstart-vllm-offline-serving\n   /about-neuron/amazonq-getstarted\n   docs-quicklinks\n   github-samples\n   torch-neuron\n   tensorflow-neuron\n   mxnet-neuron\n"
  },
  {
    "path": "about-neuron/quick-start/inference-quickstart.rst",
    "content": ".. meta::\n   :description: Run your first inference workload on AWS Inferentia with PyTorch and Neuron SDK\n   :keywords: neuron, inference, quickstart, pytorch, inferentia, inf2, getting started\n   :instance-types: inf2, trn1\n   :content-type: quickstart\n   :date-modified: 2026-03-03\n\n.. _inference-quickstart:\n\nQuickstart: Run Inference on Inferentia\n========================================\n\nThis quickstart guides you through running your first PyTorch inference workload on AWS Inferentia. You'll launch an Inf2 instance, compile a model for Neuron, and run predictions. When you complete this quickstart, you'll understand the basic workflow for deploying models on Inferentia.\n\n**This quickstart is for**: ML engineers and developers deploying inference workloads\n\n**Time to complete**: ~10 minutes\n\nPrerequisites\n-------------\n\nBefore you begin, ensure you have:\n\n- An AWS account with EC2 launch permissions\n- AWS CLI configured with your credentials\n- SSH key pair for EC2 access\n- Basic familiarity with PyTorch\n- Terminal access (Linux, macOS, or WSL on Windows)\n\nStep 1: Launch an Inferentia instance\n--------------------------------------\n\nIn this step, you will launch an Inf2 instance using the AWS Deep Learning AMI.\n\nLaunch an Inf2.xlarge instance with the latest Deep Learning AMI:\n\n.. code-block:: bash\n\n   aws ec2 run-instances \\\n       --image-id resolve:ssm:/aws/service/deep-learning-base-neuron/ubuntu-22-04/latest \\\n       --instance-type inf2.xlarge \\\n       --key-name YOUR_KEY_NAME \\\n       --security-group-ids YOUR_SECURITY_GROUP \\\n       --subnet-id YOUR_SUBNET_ID\n\n.. note::\n   \n   Replace ``YOUR_KEY_NAME``, ``YOUR_SECURITY_GROUP``, and ``YOUR_SUBNET_ID`` with your values.\n   \n   Alternatively, launch the instance through the `EC2 Console <https://console.aws.amazon.com/ec2/>`_.\n\nConnect to your instance via SSH:\n\n.. code-block:: bash\n\n   ssh -i YOUR_KEY.pem ubuntu@YOUR_INSTANCE_IP\n\nVerify Neuron devices are available:\n\n.. code-block:: bash\n\n   neuron-ls\n\nYou should see output showing available NeuronCores:\n\n.. code-block:: text\n\n   +--------+--------+--------+---------+\n   | NEURON | NEURON | NEURON |   PCI   |\n   | DEVICE | CORES  | MEMORY |   BDF   |\n   +--------+--------+--------+---------+\n   | 0      | 2      | 32 GB  | 00:1e.0 |\n   +--------+--------+--------+---------+\n\nStep 2: Set up your environment\n--------------------------------\n\nIn this step, you will create a Python virtual environment and install PyTorch with Neuron support.\n\nCreate and activate a virtual environment:\n\n.. code-block:: bash\n\n   python3 -m venv neuron_env\n   source neuron_env/bin/activate\n\nInstall PyTorch Neuron and dependencies:\n\n.. code-block:: bash\n\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nVerify the installation:\n\n.. code-block:: bash\n\n   python -c \"import torch; import torch_neuronx; print(f'PyTorch: {torch.__version__}')\"\n\nYou should see output confirming PyTorch is installed:\n\n.. code-block:: text\n\n   PyTorch: 2.9.0+cpu\n\nStep 3: Compile a model for Neuron\n-----------------------------------\n\nIn this step, you will create a simple model and compile it for Neuron inference.\n\nCreate a file named ``compile_model.py``:\n\n.. 
code-block:: python\n\n   import torch\n   import torch.nn as nn\n   import torch_neuronx\n   \n   # Simple neural network\n   class SimpleNet(nn.Module):\n       def __init__(self):\n           super().__init__()\n           self.fc1 = nn.Linear(784, 128)\n           self.fc2 = nn.Linear(128, 10)\n           self.relu = nn.ReLU()\n       \n       def forward(self, x):\n           x = self.relu(self.fc1(x))\n           return self.fc2(x)\n   \n   # Create model and set to eval mode\n   model = SimpleNet()\n   model.eval()\n   \n   # Create example input\n   example_input = torch.randn(1, 784)\n   \n   # Trace and compile for Neuron\n   print(\"Compiling model for Neuron...\")\n   neuron_model = torch_neuronx.trace(model, example_input)\n   \n   # Save compiled model\n   neuron_model.save('simple_net_neuron.pt')\n   print(\"Model compiled and saved to simple_net_neuron.pt\")\n\nRun the compilation script:\n\n.. code-block:: bash\n\n   python compile_model.py\n\nYou should see compilation progress and success message:\n\n.. code-block:: text\n\n   Compiling model for Neuron...\n   INFO:Neuron:Compiling function _NeuronGraph$1 with neuronx-cc\n   INFO:Neuron:Compilation successful\n   Model compiled and saved to simple_net_neuron.pt\n\n.. note::\n   \n   Model compilation happens once. The compiled model (``simple_net_neuron.pt``) can be reused for inference without recompiling.\n\nStep 4: Run inference\n----------------------\n\nIn the final step, you will load the compiled model and run predictions.\n\nCreate a file named ``run_inference.py``:\n\n.. code-block:: python\n\n   import torch\n   import torch_neuronx\n   \n   # Load compiled model\n   print(\"Loading compiled model...\")\n   neuron_model = torch.jit.load('simple_net_neuron.pt')\n   \n   # Create sample input\n   sample_input = torch.randn(1, 784)\n   \n   # Run inference\n   print(\"Running inference...\")\n   with torch.no_grad():\n       output = neuron_model(sample_input)\n   \n   # Get prediction\n   predicted_class = output.argmax(dim=1).item()\n   print(f\"Predicted class: {predicted_class}\")\n   print(f\"Output logits: {output[0][:5].tolist()}\")  # Show first 5 logits\n   \n   # Run multiple inferences to measure throughput\n   print(\"\\nRunning 100 inferences...\")\n   import time\n   start = time.time()\n   \n   with torch.no_grad():\n       for _ in range(100):\n           output = neuron_model(sample_input)\n   \n   elapsed = time.time() - start\n   throughput = 100 / elapsed\n   print(f\"Throughput: {throughput:.2f} inferences/second\")\n   print(f\"Latency: {elapsed/100*1000:.2f} ms per inference\")\n\nRun the inference script:\n\n.. code-block:: bash\n\n   python run_inference.py\n\nYou should see inference results:\n\n.. code-block:: text\n\n   Loading compiled model...\n   Running inference...\n   Predicted class: 7\n   Output logits: [0.123, -0.456, 0.789, -0.234, 0.567]\n   \n   Running 100 inferences...\n   Throughput: 245.67 inferences/second\n   Latency: 4.07 ms per inference\n\nMonitor Neuron device utilization in another terminal:\n\n.. code-block:: bash\n\n   neuron-top\n\nThis shows real-time NeuronCore utilization and inference metrics.\n\nConfirmation\n------------\n\nCongratulations! You've successfully run inference on AWS Inferentia. 
You should have:\n\n- ✅ Launched an Inf2 instance with Neuron SDK\n- ✅ Installed PyTorch with Neuron support\n- ✅ Compiled a model for Neuron inference\n- ✅ Ran predictions and measured throughput\n- ✅ Monitored inference with Neuron tools\n\nIf you encountered any issues, see the **Common issues** section below.\n\nCommon issues\n-------------\n\n**Issue**: ``ModuleNotFoundError: No module named 'torch_neuronx'``\n\n**Solution**: Ensure you activated the virtual environment and installed packages:\n\n.. code-block:: bash\n\n   source neuron_env/bin/activate\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n**Issue**: ``RuntimeError: No Neuron devices found``\n\n**Solution**: Verify you're on an Inferentia instance and devices are visible:\n\n.. code-block:: bash\n\n   neuron-ls\n\nIf no devices appear, check instance type and driver installation.\n\n**Issue**: Compilation takes a long time\n\n**Solution**: Model compilation is a one-time cost. For this simple model, compilation should take 1-2 minutes. Larger models take longer but only need to be compiled once. The compiled model can be saved and reused.\n\n**Issue**: Lower throughput than expected\n\n**Solution**: This quickstart uses a small model and batch size for demonstration. For production workloads:\n\n- Use larger batch sizes (e.g., 4, 8, 16)\n- Enable dynamic batching\n- Use multiple NeuronCores in parallel\n- See :doc:`/frameworks/torch/torch-neuronx/programming-guide/inference/index` for optimization techniques\n\nClean up\n--------\n\nTo avoid ongoing charges, terminate your instance when finished:\n\n.. code-block:: bash\n\n   # From your local machine\n   aws ec2 terminate-instances --instance-ids YOUR_INSTANCE_ID\n\nOr use the EC2 Console to terminate the instance.\n\nNext steps\n----------\n\nNow that you've completed this quickstart, explore more advanced inference topics:\n\n- :doc:`/frameworks/torch/torch-neuronx/programming-guide/inference/index` - Comprehensive inference guide\n- :doc:`/libraries/nxd-inference/index` - Production inference with NeuronX Distributed\n- :doc:`/libraries/nxd-inference/vllm/quickstart-vllm-online-serving` - Deploy LLMs with vLLM\n- :doc:`/about-neuron/models/index` - Pre-tested model samples\n- :doc:`/tools/neuron-explorer/index` - Profile and optimize inference performance\n\nFurther reading\n---------------\n\n- :doc:`/setup/pytorch/index` - Detailed PyTorch installation options\n- :doc:`/devflows/ec2-flows` - EC2 deployment workflows\n- :doc:`/frameworks/torch/index` - Complete PyTorch Neuron documentation\n- :doc:`/compiler/index` - Understanding Neuron compilation\n"
  },
  {
    "path": "about-neuron/quick-start/mxnet-neuron.rst",
    "content": ".. _mxnet_quick_start:\n\n\nGet Started with Apache MXNet Neuron\n=====================================\n\nThis page provide links that will assist you to quickly start with :doc:`MXNet Neuron </archive/mxnet-neuron/index>` (supporting inference only).\n\n.. note::\n  Below instructions are for Ubuntu20, if you looking for complete setup instructions for different platforms, please :ref:`Check Here. <setup-guide-index>`\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /setup/install-templates/launch-instance.txt\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 5\n        :end-line: 6\n\t\t\n.. include:: /includes/setup/tab-inference-mxnet-neuron.txt"
  },
  {
    "path": "about-neuron/quick-start/tab-inference-tensorflow-neuron.rst",
    "content": ".. dropdown::  Install TensorFlow Neuron (``tensorflow-neuron``)\n        :class-title: drop-down-class-title-small\n        :class-body: drop-down-class-body-small\n        :animate: fade-in\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=compiler_framework\n\n.. dropdown::  Get Started with Inference (``Inf1``)\n       :class-title: sphinx-design-class-title-small\n       :class-body: sphinx-design-class-body-small\n       :animate: fade-in\n\n        :ref:`ResNet-50 </src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb>`\n\n.. card:: Visit TensorFlow Neuron section for more\n        :class-body: sphinx-design-class-body-small\n        :link: tensorflow-neuron-main\n        :link-type: ref"
  },
  {
    "path": "about-neuron/quick-start/tensorflow-neuron.rst",
    "content": ".. _tensorflow_quick_start:\n\nGet Started with TensorFlow Neuron\n==================================\n\nThis page provide links that will assist you to quickly start with :ref:`tensorflow-neuron-main`.\n\n\n.. note::\n  Below instructions are for Ubuntu20, if you looking for complete setup instructions for different platforms, please :ref:`Check Here. <setup-guide-index>`\n\n.. _tensorflow_quick_start_inference:\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /setup/install-templates/launch-instance.txt\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 5\n        :end-line: 6\n\n\n.. tab-set::\n\n   .. tab-item:: tensorflow-neuronx (``Trn1, Inf2``)\n\n        .. include:: /includes/setup/tab-inference-tensorflow-neuronx.txt\n\n   .. tab-item:: tensorflow-neuron (``Inf1``)\n\n        .. include:: /includes/setup/tab-inference-tensorflow-neuron.rst"
  },
  {
    "path": "about-neuron/quick-start/torch-neuron-tab-training.rst",
    "content": "\n.. dropdown::  Launch Trn1 Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /setup/install-templates/launch-instance.txt\n\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. code:: bash\n\n        # Configure Linux for Neuron repository updates\n\n        sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n        [neuron]\n        name=Neuron YUM Repository\n        baseurl=https://yum.repos.neuron.amazonaws.com\n        enabled=1\n        metadata_expire=0\n        EOF\n        sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n        # Update OS packages\n        sudo dnf update -y\n\n        # Install git\n        sudo dnf install git -y\n\n\n        # Install OS headers\n        sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n        # Remove preinstalled packages and Install Neuron Driver and Runtime\n        sudo dnf remove aws-neuron-dkms -y\n        sudo dnf remove aws-neuronx-dkms -y\n        sudo dnf remove aws-neuronx-oci-hook -y\n        sudo dnf remove aws-neuronx-runtime-lib -y\n        sudo dnf remove aws-neuronx-collectives -y\n        sudo dnf install aws-neuronx-dkms-2.*  -y\n        sudo dnf install aws-neuronx-oci-hook-2.*  -y\n        sudo dnf install aws-neuronx-runtime-lib-2.*  -y\n        sudo dnf install aws-neuronx-collectives-2.*  -y\n\n        # Install EFA Driver(only required for multi-instance training)\n        curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n        wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n        cat aws-efa-installer.key | gpg --fingerprint\n        wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n        tar -xvf aws-efa-installer-latest.tar.gz\n        cd aws-efa-installer && sudo bash efa_installer.sh --yes\n        cd\n        sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n        # Remove pre-installed package and Install Neuron Tools\n        sudo dnf remove aws-neuron-tools  -y\n        sudo dnf remove aws-neuronx-tools  -y\n        sudo dnf install aws-neuronx-tools-2.*  -y\n\n        export PATH=/opt/aws/neuron/bin:$PATH\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. code:: bash\n\n        # Install Python venv and activate Python virtual environment to install\n        # Neuron pip packages.\n        python3.7 -m venv aws_neuron_venv_pytorch\n        source aws_neuron_venv_pytorch/bin/activate\n        pip install -U pip\n\n        # Install wget, awscli\n        pip install wget\n        pip install awscli\n\n        # Install Neuron packages\n        pip install torch-neuronx==1.13.0.1.* --extra-index-url=https://pip.repos.neuron.amazonaws.com\n        pip install neuronx-cc==2.* --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n\n.. dropdown::  Run Tutorial\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    :ref:`neuronx-mlp-training-tutorial`\n\n\n.. 
card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: pytorch-neuronx-main\n    :link-type: ref\n"
  },
  {
    "path": "about-neuron/quick-start/torch-neuron.rst",
    "content": ".. _torch_quick_start:\n\nGet Started with PyTorch Neuron\n===============================\n\nThis page provide links that will assist you to quickly start with :ref:`pytorch-neuronx-main` for both Inference and Training.\n\n.. note::\n  Below instructions are for Ubuntu20, if you looking for complete setup instructions for different platforms, please :ref:`Check Here. <setup-guide-index>`\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /setup/install-templates/launch-instance.txt\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 5\n        :end-line: 6\n\n.. tab-set::\n\n   .. tab-item:: torch-neuronx (``Trn1, Inf2``)\n\n        .. include:: /includes/setup/tab-inference-torch-neuronx.txt\n\n\n   .. tab-item:: torch-neuron (``Inf1``)\n\n        .. include:: /includes/setup/tab-inference-torch-neuron.txt"
  },
  {
    "path": "about-neuron/quick-start/training-quickstart.rst",
    "content": ".. meta::\n   :description: Train your first model on AWS Trainium with PyTorch and Neuron SDK\n   :keywords: neuron, training, quickstart, pytorch, trainium, trn1, getting started\n   :instance-types: trn1, trn2, trn3\n   :content-type: quickstart\n   :date-modified: 2026-03-03\n\n.. _training-quickstart:\n\nQuickstart: Train a Model on Trainium\n======================================\n\nThis quickstart guides you through training your first PyTorch model on AWS Trainium. You'll launch a Trn1 instance, install Neuron SDK, and run a simple training script. When you complete this quickstart, you'll understand the basic workflow for training models with Neuron.\n\n**This quickstart is for**: ML engineers and data scientists new to AWS Trainium\n\n**Time to complete**: ~15 minutes\n\nPrerequisites\n-------------\n\nBefore you begin, ensure you have:\n\n- An AWS account with EC2 launch permissions\n- AWS CLI configured with your credentials\n- SSH key pair for EC2 access\n- Basic familiarity with PyTorch\n- Terminal access (Linux, macOS, or WSL on Windows)\n\nStep 1: Launch a Trainium instance\n-----------------------------------\n\nIn this step, you will launch a Trn1 instance using the AWS Deep Learning AMI.\n\nFirst, launch a Trn1.2xlarge instance with the latest Deep Learning AMI:\n\n.. code-block:: bash\n\n   aws ec2 run-instances \\\n       --image-id resolve:ssm:/aws/service/deep-learning-base-neuron/ubuntu-22-04/latest \\\n       --instance-type trn1.2xlarge \\\n       --key-name YOUR_KEY_NAME \\\n       --security-group-ids YOUR_SECURITY_GROUP \\\n       --subnet-id YOUR_SUBNET_ID\n\n.. note::\n   \n   Replace ``YOUR_KEY_NAME``, ``YOUR_SECURITY_GROUP``, and ``YOUR_SUBNET_ID`` with your values.\n   \n   Alternatively, launch the instance through the `EC2 Console <https://console.aws.amazon.com/ec2/>`_.\n\nOnce the instance is running, connect via SSH:\n\n.. code-block:: bash\n\n   ssh -i YOUR_KEY.pem ubuntu@YOUR_INSTANCE_IP\n\nVerify Neuron devices are available:\n\n.. code-block:: bash\n\n   neuron-ls\n\nYou should see output showing available NeuronCores:\n\n.. code-block:: text\n\n   +--------+--------+--------+---------+\n   | NEURON | NEURON | NEURON |   PCI   |\n   | DEVICE | CORES  | MEMORY |   BDF   |\n   +--------+--------+--------+---------+\n   | 0      | 2      | 32 GB  | 00:1e.0 |\n   | 1      | 2      | 32 GB  | 00:1f.0 |\n   +--------+--------+--------+---------+\n\nStep 2: Set up your environment\n--------------------------------\n\nIn this step, you will create a Python virtual environment and install PyTorch with Neuron support.\n\nCreate and activate a virtual environment:\n\n.. code-block:: bash\n\n   python3 -m venv neuron_env\n   source neuron_env/bin/activate\n\nInstall PyTorch Neuron and dependencies:\n\n.. code-block:: bash\n\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nVerify the installation:\n\n.. code-block:: bash\n\n   python -c \"import torch; import torch_neuronx; print(f'PyTorch: {torch.__version__}')\"\n\nYou should see output confirming PyTorch is installed:\n\n.. code-block:: text\n\n   PyTorch: 2.9.0+cpu\n\nStep 3: Create a training script\n---------------------------------\n\nIn this step, you will create a simple PyTorch training script that uses Neuron acceleration.\n\nCreate a file named ``train_simple.py``:\n\n.. 
code-block:: python\n\n   import torch\n   import torch.nn as nn\n   import torch.optim as optim\n   import torch_neuronx\n   \n   # Simple neural network\n   class SimpleNet(nn.Module):\n       def __init__(self):\n           super().__init__()\n           self.fc1 = nn.Linear(784, 128)\n           self.fc2 = nn.Linear(128, 10)\n           self.relu = nn.ReLU()\n       \n       def forward(self, x):\n           x = self.relu(self.fc1(x))\n           return self.fc2(x)\n   \n   # Create model and move to Neuron device\n   model = SimpleNet().to('neuron')\n   criterion = nn.CrossEntropyLoss()\n   optimizer = optim.SGD(model.parameters(), lr=0.01)\n   \n   # Generate dummy training data\n   batch_size = 32\n   num_batches = 100\n   \n   print(\"Starting training...\")\n   model.train()\n   \n   for batch_idx in range(num_batches):\n       # Create dummy batch\n       inputs = torch.randn(batch_size, 784).to('neuron')\n       targets = torch.randint(0, 10, (batch_size,)).to('neuron')\n       \n       # Training step\n       optimizer.zero_grad()\n       outputs = model(inputs)\n       loss = criterion(outputs, targets)\n       loss.backward()\n       optimizer.step()\n       \n       if batch_idx % 10 == 0:\n           print(f\"Batch {batch_idx}/{num_batches}, Loss: {loss.item():.4f}\")\n   \n   print(\"Training complete!\")\n\nThis script creates a simple neural network, moves it to the Neuron device, and trains it on synthetic data.\n\nStep 4: Run training\n---------------------\n\nIn the final step, you will run the training script and monitor its progress.\n\nExecute the training script:\n\n.. code-block:: bash\n\n   python train_simple.py\n\nYou should see training progress output:\n\n.. code-block:: text\n\n   Starting training...\n   Batch 0/100, Loss: 2.3156\n   Batch 10/100, Loss: 2.2845\n   Batch 20/100, Loss: 2.2534\n   ...\n   Training complete!\n\nMonitor Neuron device utilization in another terminal:\n\n.. code-block:: bash\n\n   neuron-top\n\nThis shows real-time NeuronCore utilization, memory usage, and other metrics.\n\nConfirmation\n------------\n\nCongratulations! You've successfully trained your first model on AWS Trainium. You should have:\n\n- ✅ Launched a Trn1 instance with Neuron SDK\n- ✅ Installed PyTorch with Neuron support\n- ✅ Created and ran a training script on Neuron devices\n- ✅ Monitored training with Neuron tools\n\nIf you encountered any issues, see the **Common issues** section below.\n\nCommon issues\n-------------\n\n**Issue**: ``ModuleNotFoundError: No module named 'torch_neuronx'``\n\n**Solution**: Ensure you activated the virtual environment and installed packages:\n\n.. code-block:: bash\n\n   source neuron_env/bin/activate\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n**Issue**: ``RuntimeError: No Neuron devices found``\n\n**Solution**: Verify you're on a Trainium instance and devices are visible:\n\n.. code-block:: bash\n\n   neuron-ls\n\nIf no devices appear, check instance type and driver installation.\n\n**Issue**: Training is slower than expected\n\n**Solution**: This quickstart uses a small model for demonstration. For production workloads:\n\n- Use larger batch sizes\n- Enable XLA compilation with ``torch.compile()``\n- See :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide` for optimization techniques\n\nClean up\n--------\n\nTo avoid ongoing charges, terminate your instance when finished:\n\n.. 
code-block:: bash\n\n   # From your local machine\n   aws ec2 terminate-instances --instance-ids YOUR_INSTANCE_ID\n\nOr use the EC2 Console to terminate the instance.\n\nNext steps\n----------\n\nNow that you've completed this quickstart, explore more advanced training topics:\n\n- :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide` - Comprehensive training guide\n- :doc:`/libraries/nxd-training/index` - Distributed training with NeuronX Distributed\n- :doc:`/about-neuron/models/index` - Pre-tested model samples\n- :doc:`/tools/neuron-explorer/index` - Profile and optimize training performance\n\nFurther reading\n---------------\n\n- :doc:`/setup/pytorch/index` - Detailed PyTorch installation options\n- :doc:`/devflows/ec2-flows` - EC2 deployment workflows\n- :doc:`/frameworks/torch/index` - Complete PyTorch Neuron documentation\n"
  },
  {
    "path": "about-neuron/quick-start/user-guide-quickstart.rst",
    "content": ".. _userguide-quickstart:\n\nUser Guide Quick Start\n======================\n\n* :ref:`setup-guide-index`\n* :ref:`Neuron Containers <neuron-containers>`\n* :ref:`neuron-devflows`\n\n"
  },
  {
    "path": "about-neuron/sdk-policy.rst",
    "content": ".. _sdk-maintenance-policy:\n.. _neuron-maintenance-policy:\n\nNeuron Software Maintenance policy\n==================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\nOverview\n--------\n\nThis document outlines software maintenance policy for AWS Neuron\nSoftware Development Kit (SDK), Neuron Components, both extension and\nstandalone components, supported model classes, features, APIs, DLAMIs\nand DLCs, and dependency software. AWS Neuron is the SDK for Amazon EC2\n`Inferentia <https://aws.amazon.com/machine-learning/inferentia/>`__ and\nAmazon EC2\n`Trainium <https://aws.amazon.com/machine-learning/trainium/>`__ based\ninstances purpose-built for deep learning. Neuron integrates with\npopular Machine Learning (ML) frameworks like PyTorch, JAX, and\nTensorFlow and includes a compiler, runtime, driver, profiling tools,\nand libraries to support high performance training of generative AI\nmodels on Trainium and Inferentia powered instances.\n\nThis document addresses Neuron Software life-cycle and the Neuron SDK\nrelease versioning.\n\n.. _neuron-software-definitions:\n\nNeuron Software Definitions\n---------------------------\n\nNeuron Software refers to the complete set of software elements\nprovided by AWS Neuron, including:\n\nNeuron SDK\n~~~~~~~~~~\n\nThe core software development kit that enables users to build, train,\nand deploy machine learning models on Inferentia and Trainium based\ninstances. The Neuron SDK encompasses the entire set of components,\nfeatures, APIs, and other elements that are bundled together and made\navailable in a particular version of the Neuron SDK release.\n\nNeuron components\n~~~~~~~~~~~~~~~~~\n\nNeuron components refer to any packages or libraries within the Neuron\nSDK that offer specific functionality. These components are typically\naccessible through PIP, RPM, or Debian packages for easy installation\nand usage. There are two main categories of Neuron components: Neuron\nextension components and Neuron standalone components.\n\nNeuron extension components\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron extension components are components that integrate Neuron support\ninto open source machine learning frameworks, libraries or tools\nenhancing their functionality and extending their capabilities as\nnecessary. When referring to Neuron extension components, we are also\nreferring to the parts of the open source machine learning framework or\nlibrary that are supported by Neuron. The software life-cycle of the\nopen source machine learning frameworks, libraries or tools that are\nextended by Neuron is managed and maintained by their respective\ncommunities or the vendors responsible for those specific components.\nExamples for Neuron extension components are:\n\n-  **Third party ML Library**: Examples include Neuron Nemo Megatron.\n-  **Third party ML Framework**: Examples include PyTorch NeuronX and\n   TensorFlow Neuron.\n\nNeuron standalone components\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron standalone components are self-contained components within the\nNeuron SDK. Examples of such components are Neuron Compiler, Neuron\nTools and Neuron Runtime.\n\nNeuron Model Classes\n~~~~~~~~~~~~~~~~~~~~\n\nA Neuron supported model class is tightly coupled with a specific Neuron\nextension component (e.g. PyTorch NeuronX) or Neuron library (e.g.\nNeuronX Distributed) and the workload type (e.g. 
Training or Inference).\nFor example a model can be supported at Beta level in PyTorch NeuronX\nfor training and Stable level in PyTorch NeuronX for inference.\n\nNeuron features\n~~~~~~~~~~~~~~~\n\nA Neuron feature refers to any functionality or attribute that is part\nof the Neuron SDK, whether it belongs to the entire Neuron SDK or to one\nof its specific components.\n\nNeuron APIs\n~~~~~~~~~~~\n\nA Neuron API refers to any API, CLI, environment variables, or flag that\nbelong to to the entire Neuron SDK or to one the Neuron components. A\nNeuron API allows developers to interact with and leverage the\ncapabilities of the Neuron SDK and its components.\n\nExamples include :ref:`Neuron Trace API <torch_neuron_trace_api>` and :ref:`Neuron Compiler flags <neuron-compiler-cli-reference-guide>`\n\nDependency software components\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nExternal software components or frameworks that the Neuron\nSDK and its components rely on for proper functioning and compatibility,\nsuch as language runtimes or operating systems.\n\nThe software life-cycle of the dependency software components, is\nmanaged and maintained by their respective communities or the vendors\nresponsible for those specific dependency software components. The\nfollowing terms are examples of underlying dependency software\ncomponents:\n\n-  **Operating System (OS)**: Examples include Ubuntu 22 and Amazon\n   Linux 2023\n-  **Language Runtime**: Examples include Python 3.10\n\nNeuron Deep Learning AMIs and Deep Learning Containers\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n:ref:`Neuron Deep Learning AMIs\n(DLAMIs) <neuron-dlami-overview>`\nand :ref:`Neuron Deep Learning Containers\n(DLCs) <neuron_containers>` are pre-configured Amazon Machine Images and Docket container that\ncome with the Neuron SDK and necessary dependencies pre-installed,\nproviding a ready-to-use environment for machine learning development.\n\n.. _neuron-software-lifecycle:\n\nNeuron Software Life-cycle\n--------------------------\n\nThe typical life-cycle for Neuron software consists of several phases, though not all phases are applicable to every type of Neuron software. The phases are as follows:\n\n-  **Developer Preview or Beta** (these terms are used interchangeably in\n   Neuron collaterals)\n-  **Release Candidate (RC)**\n-  **General Availability (GA) or Stable** (these terms are used\n   interchangeably in Neuron collaterals)\n-  **Maintenance**\n-  **End-of-Support (EOS)**\n\nThe following table outlines the details for each phase for Neuron software:\n\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n|                               | Description                                                                                                          | Comments                                         |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n| Developer Preview (Beta)      | In this phase, Neuron Software is not supported, should not be used in production environments,                      |                                                  |\n|                               | and is meant for early access and feedback purposes only. 
It is possible for future releases                         |                                                  |\n|                               | to introduce breaking changes.                                                                                       |                                                  |\n|                               | See :ref:`Neuron Software Classification <sdk-classification>` for more information                                  |                                                  |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n| Release Candidate (RC)        | Once AWS identifies a release to be a stable product, it may be marked as a Release Candidate (RC).                  | This phase applies only to Neuron SDK            |\n|                               | This phase is usually short and during it AWS will provide for Neuron Software on an as-needed basis.                | and Neuron components                            |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n| General Availability (Stable) | During this phase, AWS releases :ref:`regular <neuron-regular-updates>` updates for the Neuron Software based         |                                                  |\n|                               | on a predefined release cadence of the Neuron SDK or provides :ref:`maintenance updates <neuron-maintenance-updates>`|                                                  |\n|                               | for Neuron Software on an as-needed basis.                                                                           |                                                  |\n|                               | See :ref:`Neuron Software Classification <sdk-classification>` for more information                                  |                                                  |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n| Maintenance                   | During the maintenance phase, AWS will provide :ref:`maintenance updates <neuron-maintenance-updates>`               | This phase does not apply to Dependency Software |\n|                               | for Neuron Software on an as-needed basis. Any new PIP, RPM, and Debian packages for the Neuron                      | Components, Neuron DLCs,                         |\n|                               | Software, as well as updated versions of the Neuron DLAMIs and Neuron DLCs, will be released                         | Neuron DLAMIs, Neuron Features and APIs          |\n|                               | only when deemed necessary by the AWS Neuron team.                                                                   |                                                  |\n|                               | Users can expect updates to be less frequent compared to :ref:`regular <neuron-regular-updates>`                     |                                                  |\n|                               | as the focus will be on addressing critical issues and ensuring the stability of the software.                       
|                                                  |\n|                               |                                                                                                                      |                                                  |\n|                               | Maintenance Announcement: AWS will make a public :ref:`announcement <neuron-communication>` at least one month       |                                                  |\n|                               | before the Neuron Software enters Maintenance phase.                                                                 |                                                  |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n| End of Support (EOS)          | When Neuron Software reaches the end of its support lifecycle, it will no longer receive                             |                                                  |\n|                               | :ref:`regular <neuron-regular-updates>` updates and :ref:`maintenance updates <neuron-maintenance-updates>`          |                                                  |\n|                               | (including security updates). While AWS will continue to provide access to all previously released                   |                                                  |\n|                               | PIP, RPM, and Debian packages for the Neuron Software, as well as earlier versions of the Neuron DLAMIs              |                                                  |\n|                               | and Neuron DLCs, it's important to note that these older versions will not receive any updates or support.           |                                                  |\n|                               | Customers can still use these resources at their own discretion, but it is highly recommended to upgrade             |                                                  |\n|                               | to the latest available versions                                                                                     |                                                  |\n|                               |                                                                                                                      |                                                  |\n|                               | End of Support Announcement: AWS will make a public :ref:`announcement <neuron-communication>` at least one month    |                                                  |\n|                               | before a Neuron Software enters End of Support.                                                                      |                                                  |\n+-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+\n\n.. _neuron-regular-updates:\n\nNeuron Software Regular Updates\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRegular updates for Neuron Software address the following areas: new\nfeatures, feature improvements, performance enhancements, bug\nresolution, security vulnerability fixes, upgrades to Neuron dependency\nsoftware components and upgrades to Neuron extension components. 
To\nhandle these regular updates, AWS will release a new version of the\nNeuron SDK, incrementing the minor version (the second digit in the\nversion number) for a minor release or incrementing the major version\n(the first digit in the version number) for a major release when\nsignificant changes that break compatibility are introduced. It's\nimportant to note that any bug-fixes or security issues in regular\nupdates are not applied retroactively to previous versions of the Neuron\nSDK. To benefit from these updates, users must adopt the latest release.\n\nFor more information see:\n\n-  :ref:`Neuron DLAMIs and DLCs Updates <neuron-dlami-dlc-updates>`\n-  :ref:`Neuron Extension Components Updates <neuron-extension-components-updates>`\n-  :ref:`Neuron Software Versioning <neuron-software-versioning>`\n\n**Neuron SDK Installation and Update instructions**\nTo install and update to the latest Neuron packages, customers need to pin the major\nversion of the Neuron package. For example, to install latest Neuron\ntools package, call ``sudo apt-get install aws-neuronx-tools=2.*`` and\nto install latest PyTorch Neuron package for Trn1, call\n``pip install torch-neuronx==2.1.0.1.*``. This is done to future-proof\ninstructions for new, backwards-incompatible major version releases.\n\n.. _neuron-maintenance-updates:\n\nNeuron Software Maintenance Updates\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nMaintenance updates for Neuron Software address three key areas:\nresolving bugs, fixing security vulnerabilities, and upgrading\ndependency software components. At AWS discretion, additional critical\nfeatures or performance enhancement may also be included. To handle\nthese maintenance updates, AWS will release a new version of the Neuron\nSDK, incrementing the patch number (the last digit in the version\nnumber) to indicate a patch release. Major or minor releases may also\ncontain maintenance updates. It's important to note that these\nmaintenance updates are not applied retroactively to previous versions\nof the Neuron SDK. To take advantage of these updates, users must adopt\nthe latest patch release.\n\nFor more information see:\n\n-  :ref:`Neuron DLAMIs and DLCs Updates <neuron-dlami-dlc-updates>`\n-  :ref:`Neuron Extension Components Updates <neuron-extension-components-updates>`\n-  :ref:`Neuron Software Versioning <neuron-software-versioning>`\n\n.. _neuron-dlami-dlc-updates:\n\nNeuron DLAMIs and DLCs Updates\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAWS will address :ref:`regular <neuron-regular-updates>` updates, life-cycle changes, maintenance\nupdates, and security issues related to any third-party software\nincluded in the Neuron DLAMI or DLCs by releasing new versions of the\nNeuron DLAMI or DLCs. However, updates won't be applied retroactively to\nolder versions of the Neuron DLAMI or DLCs. Instead, users will need to\nuse the new versions to get the latest updates. Generally, Neuron DLAMIs and Deep Learning Containers (DLCs) will support one latest LTS Linux Distribution version (Ubuntu, Amazon Linux, and Rocky9), with exceptions. Neuron Base DLAMIs (which come pre-installed with Neuron driver, EFA, and Neuron tools) will support the two latest versions of LTS Linux Distributions.\n\n\nFor more information see:\n\n-  :ref:`Neuron Extension Components Updates <neuron-extension-components-updates>`\n-  :ref:`Neuron Software Versioning <neuron-software-versioning>`\n\n.. 
_neuron-extension-components-updates:\n\nNeuron Extension Components Updates\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen a new version of an open source ML framework (e.g. PyTorch) is\nsupported by a Neuron extension component (e.g., PyTorch NeuronX), the\nNeuron extension component for the latest supported ML framework version\nwill become the default for installation. If users wish to use a Neuron\nextension component for an earlier supported ML framework version, they\nwill need to explicitly specify the desired version during installation.\nAfter upgrading a Neuron extension component to support a newer version\nof an ML framework, AWS will continue to provide :ref:`regular updates <neuron-regular-updates>`\nfor the Neuron extension component that supports the earlier ML\nframework version for a minimum of 6 months. After the 6 months period,\nthe Neuron extension component for the earlier supported ML framework\nversion may transition into a maintenance mode. In the maintenance mode,\nupdates for the older Neuron extension component versions will be\nprovided on an as-needed basis, focusing on critical bug fixes and\nsecurity patches. For more information see: :ref:`Neuron extension component versioning <neuron-extension-components-versioning>`\n\n.. _neuron-communication:\n\nCommunication methods\n~~~~~~~~~~~~~~~~~~~~~\n\nNeuron software classification and lifecycle announcements are\ncommunicated as follows:\n\n-  Neuron SDK documentation under\n   `Announcements <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html>`__\n\nTo see the list of available Neuron SDK versions and supported\ndependency software components versions:\n\n-  Neuron SDK documentation under `Release\n   Content <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/releasecontent.html#latest-neuron-release-artifacts>`__\n-  Neuron SDK documentation under `What’s\n   New <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html#neuron-whatsnew>`__\n\n.. _neuron-software-versioning:\n\nNeuron Software Versioning\n--------------------------\n\nNeuron SDK Documentation Versioning\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron SDK documentation is versioned and maps to the corresponding\nNeuron SDK version. Users can switch to earlier versions of the Neuron\nSDK documentation by selecting the version from the dropdown in bottom\nleft portion of the side bar.\n\nNeuron SDK Versioning\n~~~~~~~~~~~~~~~~~~~~~\n\nThe AWS SDK release versions are in the form of ``[A.B.C]`` where\n``(A)`` represents the major version, ``(B)`` represents\nthe minor version, and ``(C)`` represents the patch version.\n\n.. _neuron-extension-components-versioning:\n\nNeuron extension components Versioning\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron extension components versioning (like PyTorch NeuronX) is in the\nform ``[X.Y.Z].[A.B.C]``, where ``[X.Y.Z]`` represents the\nthird party component’s major (``X``), minor (``Y``), and patch\n(``Z``) versions and ``[A.B.C]`` represents the Neuron extension\ncomponents (``A``), minor (``B``), and patch (``C``)\nversions.\n\nNeuron Standalone Component Versioning\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron Component versioning (except of Neuron extension components like\nPyTorch NeuronX) is in the form ``[A.B.C.D]``, where ``A``\nrepresents the major version, ``B`` represents the minor version,\nand ``C.D`` represents the patch version.\n\n.. 
_neuron-releases-types:\n\nNeuron Software Release Types\n-----------------------------\n\nMajor release\n~~~~~~~~~~~~~~~~~\n\nIncreasing the major version indicates that the Neuron software\nunderwent significant and substantial changes in an incompatible manner.\nApplications need to be updated in order for them to work with the\nnewest SDK version. It is important to update major versions carefully\nand in accordance with the upgrade guidelines provided by AWS. After\nincreasing the major version, the Neuron software may not maintain\ncompatibility with previous supported versions of :ref:`Neuron\nRuntime <nrt-api-guide>`, :ref:`Neuron Compiler <neuron_cc>`, and\n:ref:`NEFF <neff-format>`.\n\nMinor release\n~~~~~~~~~~~~~~~~~\n\nIncreasing the minor version indicates that the Neuron software added\nfunctionality in a backwards compatible manner.\n\nPatch release\n~~~~~~~~~~~~~~~~~\n\nIncreasing the patch version indicates that the Neuron software\nadded backward compatible bug or security fixes. A bug fix is defined as\nan internal change that fixes incorrect behavior.\n\nPre-releases\n~~~~~~~~~~~~~~~~\n\n-  **Developer Preview (Beta)**: During this phase, the Neuron software\n   is not supported, should not be used in production environments, and\n   is meant for early access and feedback purposes only. It is possible\n   for future releases to introduce breaking changes. In the case of a\n   Developer Preview (Beta) release, the minor version will include a\n   lower case ``b`` along with a (Beta) tag.\n-  **Release Candidate (RC)**: Once Neuron identifies a release to be a\n   stable product, it may mark it as a Release Candidate. Release\n   Candidates are ready for GA release unless significant bugs emerge,\n   and will receive full AWS Neuron support. In the case of a RC\n   release, the minor version will include a lower case ``rc``\n   along with a (RC) tag.\n\n.. _sdk-classification:\n\nNeuron Software Classification\n------------------------------\n\nThis section explains the Neuron software classification for APIs,\nlibraries, packages, features, and Neuron supported model classes\nmentioned in the Neuron documentation.\n\nNeuron SDK and Neuron components\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n+-----------------+-----------------+-----------------+-------------+\n|                 | Testing         | Features        | Performance |\n+=================+=================+=================+=============+\n| Developer       | Basic           | Minimal Viable  |             |\n| Preview (Beta)  |                 | Product (MVP) \\*|             |\n+-----------------+-----------------+-----------------+-------------+\n| Release         | Basic           | Minimal Viable  | Tested      |\n| Candidate (RC)  |                 | Product (MVP)\\* |             |\n+-----------------+-----------------+-----------------+-------------+\n| GA (Stable)     | Standard        | Incremental     | Tested      |\n|                 | Product Testing | additions or    |             |\n|                 |                 | changes         |             |\n|                 |                 | in new releases |             |\n+-----------------+-----------------+-----------------+-------------+\n\n\\* A minimum viable product (MVP) for a Neuron Component contains just\nenough features to be usable by early customers who can then provide\nfeedback for future development. MVP can be different per use case\nand depends on the specific package/library of interest. 
Please note\nthat in many cases, an MVP can also represent an advanced level of\nfeatures.\n\n.. _neuron-apis-classification:\n\nNeuron APIs\n~~~~~~~~~~~\n\n+----------------------+----------------------+----------------------+\n|                      | API Contract         | API Backward         |\n|                      |                      | Compatibility        |\n+======================+======================+======================+\n|       Alpha          |   Unstable and       |    No                |\n|                      |   undocumented       |                      |\n+----------------------+----------------------+----------------------+\n| Developer Preview    | Major changes may    |    No                |\n| (Beta)               | happen               |                      |\n+----------------------+----------------------+----------------------+\n| GA (Stable)          | Incremental changes  | Yes \\*               |\n|                      | in new releases      |                      |\n|                      | (without breaking    |                      |\n|                      | the API contract)    |                      |\n+----------------------+----------------------+----------------------+\n\n\\* In certain cases, when necessary, AWS may introduce API changes that may break compatibility, with notice provided ahead of time.\n\n.. _neuron-features-classification:\n\nNeuron Features\n~~~~~~~~~~~~~~~\n\n+-----------------+-----------------+------------------------+-------------+\n|                 | Testing         | Functionality          | Performance |\n+=================+=================+========================+=============+\n|                 | No formal       | Partial funcitonality  | Not tested  |\n|     Alpha       | testing done    | with limited set of    | or          |\n|                 |                 | core capabilities,     | evaluated   |\n|                 |                 | far from Minium Viable |             |\n|                 |                 | Product (MVP) \\*       |             |\n+-----------------+-----------------+------------------------+-------------+\n| Developer       | Basic           | Minimum Viable         |             |\n| Preview (Beta)  |                 | Product (MVP) \\*       |             |\n+-----------------+-----------------+------------------------+-------------+\n| GA (Stable)     | Standard        | Incremental            | Tested      |\n|                 | Product Testing | additions or changes   |             |\n|                 |                 | in new releases        |             |\n+-----------------+-----------------+------------------------+-------------+\n\n\\* A minimum viable product (MVP) for a Neuron Feature contains just\nenough functionality to be usable by early customers who can then\nprovide feedback for future development. MVP can be different per use\ncase and depends on the specific feature of interest. Please note\nthat in many cases, an MVP can also represent an advanced level of\nfunctionality.\n\n.. 
_neuron-models-classification:\n\nNeuron Supported Model Classes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n+----------------------+----------------------+----------------------+\n|                      | Accuracy /           | Throughput / Latency |\n|                      | Convergence          |                      |\n+======================+======================+======================+\n| Developer Preview    | Validated            | Tested               |\n| (Beta)               |                      |                      |\n+----------------------+----------------------+----------------------+\n| GA (Stable)          | Validated            | Tested               |\n+----------------------+----------------------+----------------------+\n"
  },
  {
    "path": "about-neuron/security.rst",
    "content": ".. meta::\n    :description: Security disclosures and notification for the AWS Neuron SDK.\n    :date-modified: 01/27/2026\n\n.. _security:\n\nNeuron Security Disclosures\n===========================\n\nIf you think you've found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions here\n(https://aws.amazon.com/security/vulnerability-reporting/) or email AWS\nsecurity directly (`mailto:aws-security@amazon.com <mailto:aws-security@amazon.com>`__).\n\nImportant Security Information for Trainium Hardware\n-----------------------------------------------------\n\nTrainium hardware is designed to optimize performance for machine learning workloads. To deliver high performance, applications with access to Trainium devices have unrestricted access to instance physical memory.\n\nWhat this means for your deployment:\n\n* Instance-level isolation is maintained: AWS EC2 ensures Trainium devices cannot access physical memory of other EC2 instances.\n* As a best practice to prevent unrestricted access to host physical memory by any user/application, we recommend implementing a permission model where:\n\n   * A dedicated system group owns the device nodes\n   * Only explicitly authorized users are added to this group\n   * Device permissions prevent access by users outside the group\n  \nCustomer responsibility: Ensure that only trusted applications have access to Tranium devices on Trainium instances. For more information, see `the AWS Shared Responsibility Model <https://aws.amazon.com/compliance/shared-responsibility-model/>`__.\n\nExample Implementation Steps\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe steps below are an example you can follow to implement a security group using udev rules:\n\n1. Create a dedicated security group (in this example, ``neuron``): ``sudo groupadd -r neuron``\n\n2. Add authorized users to that security group: ``sudo usermod -aG neuron {username-to-add-here}``, repeat for each user\n\n3. Configure udev rules. Create a udev rule to automatically set correct ownership and permissions when Trainium (neuron) devices are detected.\n\n   Create the file ``/etc/udev/rules.d/neuron-udev.rules`` with the following content:\n    \n   .. code-block:: shell\n\n      # Neuron device access control\n      # Only members of the 'neuron' group can access 'neuron' devices.\n\n      SUBSYSTEM==\"neuron*\", KERNEL==\"neuron*\", GROUP=\"neuron\", MODE=\"0660\"\n\n4. Apply the configuration:\n\n   ``sudo udevadm control —-reload``\n   ``sudo udevadm trigger —-subsystem-match=neuron``\n\n5. Verify the configuration:\n\n    ``ls -l /dev/neuron*``\n\n    Expected output:\n\n    ``crw-rw---- 1 root neuron 239, 0 Jan 9 15:58 /dev/neuron0``\n\n"
  },
  {
    "path": "about-neuron/troubleshooting.rst",
    "content": ".. _general-troubleshooting:\n\nTroubleshooting Guide\n=====================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nTraining Only Troubleshooting\n-----------------------------\n\n* :ref:`PyTorch Neuron for Training <pytorch-neuron-traning-troubleshooting>`\n\n\nInference Only Troubleshooting\n------------------------------\n\n* :ref:`PyTorch Neuron for Inference <pytorch-neuron-inference-troubleshooting>`\n* :ref:`NeuronPerf <neuronperf_troubleshooting>`\n* :ref:`MXNet Neuron <mxnet_troubleshooting_guide>`\n\n\nRuntime Troubleshooting\n------------------------------\n\n* :ref:`Neuron Runtime Troubleshooting on Inf1 and Trn1 <nrt-troubleshooting>`\n\n\nContainers Troubleshooting\n--------------------------\n\n* :ref:`Containers <container-troubleshooting>`\n\n\nSetup Troubleshooting\n---------------------\n\n* :ref:`neuron-setup-troubleshooting`\n"
  },
  {
    "path": "about-neuron/what-is-neuron.rst",
    "content": ".. _what-is-neuron:\n\n.. meta::\n   :description: AWS Neuron is a software development kit for high-performance machine learning on AWS Inferentia and Trainium, enabling developers to compile, optimize, and deploy deep learning models at scale.\n\nWhat is AWS Neuron?\n===================\n\nAWS Neuron is the software stack for running deep learning and generative AI workloads on AWS Trainium and AWS Inferentia. Built on an open source foundation, Neuron enables developers to build, deploy and explore natively with PyTorch and JAX frameworks and with ML libraries such as Hugging Face, vLLM, PyTorch Lightning, and others without modifying your code.  It includes a compiler, runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging. Neuron supports your end-to-end machine learning (ML) development lifecycle from building and deploying deep learning and AI models, optimizing to achieve highest performance and lowest cost, and getting deeper insights into model behavior.\n\nNeuron enables rapid experimentation, production scale training of frontier models, low level performance optimization through the Neuron Kernel Interface (NKI) for custom kernels, cost optimized inference deployment for agentic AI and reinforcement learning workloads, and comprehensive profiling and debugging with Neuron Explorer.\n\nFor more details, see the detailed documentation under :ref:`About the AWS Neuron SDK <about-neuron>`.\n\nWho is AWS Neuron for?\n-----------------------\n\n* **ML engineers** can use Neuron's vLLM integration to migrate their models to Trainium for improved performance and without code modifications. They can\n* **Performance engineers** can use NKI and our Developer Tools to create new ML kernels and optimize existing ones.\n* **ML researchers** can use their existing PyTorch experience and ecosystem tools to experiment freely on Trainium using our native PyTorch implementatio, without having to learn new frameworks or APIs\n\nWhat is AWS Neuron used for?\n-----------------------------\n\n**Research and Development**: Neuron provides native PyTorch execution on Trainium with full Eager mode compatibility. The stack supports standard distributed training patterns including FSDP, DDP, and DTensor for model sharding across devices and nodes. torch.compile integration enables graph optimization, while existing frameworks like TorchTitan and HuggingFace Transformers run without code modifications. JAX support includes XLA compilation targeting Inferentia and Trainium hardware. \n\n**Production Inference**: Neuron implements vLLM V1 API compatibility on Trainium and Inferentia with optimizations for large-scale inference workloads. The runtime supports Expert Parallelism for MoE models, disaggregated inference architectures, and speculative decoding. Optimized kernels from the NKI Library provide hardware-specific implementations. Training workflows integrate with HuggingFace Optimum Neuron, PyTorch Lightning, and TorchTitan, with seamless deployment through standard vLLM interfaces. \n\n**Performance Engineering**: Neuron Kernel Interface (NKI) provides direct access to Trainium instruction set architecture with APIs for memory management, execution scheduling, and low-level kernel development. The NKI Compiler, built on MLIR, offers full visibility into the compilation pipeline from high-level operations to hardware instructions. The NKI Library contains optimized kernel implementations with source code and performance benchmarks. 
Neuron Explorer enables comprehensive profiling from application code to hardware execution, supporting both single-node and distributed workload analysis with detailed performance metrics and optimization recommendations.\n\nAWS Neuron Core Components\n----------------------------\n\n**vLLM**\n    Neuron enables production inference deployment with standard frameworks and APIs on Trainium and Inferentia. Use Neuron's vLLM integration with standard APIs to deliver high-performance model serving with optimized kernels from the NKI Library. \n\n    It provides:\n\n    * **Standard vLLM APIs**: Full compatibility with vLLM V1 APIs, enabling customers to use familiar vLLM interfaces on Neuron hardware without code changes\n    * **Advanced Inference Features**: Support for Expert Parallelism for MoE models, disaggregated inference for flexible deployment architectures, and speculative decoding for improved latency\n    * **Optimized Performance**: Pre-optimized kernels from the NKI Library for peak performance across dense, MoE, and multimodal models\n    * **Open Source**: Source code released under the vLLM project organization with source code on GitHub, enabling community contributions\n\n**Native PyTorch**\n    Neuron provides native integration with PyTorch, enabling researchers and ML developers to run existing code unchanged on Trainium. Train models with familiar workflows and tools, from pre-training to post-training with reinforcement learning, while leveraging Trainium's performance and cost advantages for both experimentation and production scale training.\n\n    It provides:\n\n    * **Native Device Support**: Neuron registers as a native device type in PyTorch with standard device APIs like ``torch.tensor([1,2,3], device='neuron')`` and ``.to('neuron')``\n    * **Standard Distributed Training APIs**: Support for FSDP, DTensor, DDP, tensor parallelism, context parallelism, and distributed checkpointing\n    * **Eager Mode Execution**: Immediate operation execution for interactive development and debugging in notebook environments\n    * **torch.compile Integration**: Support for ``torch.compile`` for optimized performance\n    * **Open Source**: Released as an open source package on GitHub under Apache 2.0, enabling community contributions.  \n\n**Neuron Kernel Interface (NKI)**\n    For performance engineers seeking maximum hardware efficiency, Neuron provides complete control through the Neuron Kernel Interface (NKI), with direct access to the NeuronISA (NISA) instruction set, memory allocation, and execution scheduling. Developers can create new operations not available in standard frameworks and optimize performance critical code with custom kernels. 
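\n\n    As a rough illustration, the sketch below shows what a minimal NKI element-wise tensor addition kernel can look like. It follows the public NKI getting-started examples; the module paths, the ``nki.jit`` decorator, and the ``nl.load``/``nl.store`` signatures are assumptions that may differ between NKI versions.\n\n    .. code-block:: python\n\n        import neuronxcc.nki as nki\n        import neuronxcc.nki.language as nl\n\n        @nki.jit\n        def add_kernel(a_input, b_input):\n            # Allocate the kernel output in device HBM (assumes the inputs fit in a single on-chip tile).\n            c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)\n            # Load both operands into on-chip memory, add them, and store the result back to HBM.\n            a_tile = nl.load(a_input)\n            b_tile = nl.load(b_input)\n            nl.store(c_output, value=a_tile + b_tile)\n            return c_output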
\n\n    It includes:\n\n    * The NKI Compiler, built on MLIR, which provides greater transparency into the kernel compilation process\n    * The NKI Library, which provides pre-built kernels you can use to optimize the performance of your models\n\n**Neuron Tools**\n    Debug and profiling utilities including:\n    \n    * Neuron Monitor for real-time performance monitoring\n    * Neuron Explorer, built on the Neuron Profiler (``neuron-profile``), for detailed performance analysis\n\n    Neuron Explorer provides:\n\n    * **Hierarchical Profiling**: Top-down visualization from framework layers through HLO operators to hardware instructions, enabling developers to understand execution at any level of the stack\n    * **Code Linking**: Direct navigation between PyTorch, JAX, and NKI source code and the performance timeline, with automatic annotations showing metrics for specific code lines\n    * **IDE Integration**: VSCode extension for profile visualization and analysis directly within the development environment\n    * **Device Profiling**: Unified interface for a comprehensive view of system-wide metrics and device-specific execution details\n\n**Neuron Compiler**\n    Optimizes machine learning models for AWS Inferentia and Trainium chips, converting models from popular frameworks into efficient executable formats.\n\n**Neuron Runtime**\n    Manages model execution on Neuron devices, handling memory allocation, scheduling, and inter-chip communication for maximum throughput.\n\n**AWS DLAMIs and DLCs**\n    Orchestrate and deploy your models using AWS Deep Learning Amazon Machine Images (DLAMIs) and Deep Learning Containers (DLCs).\n\n    Neuron DLAMIs come pre-configured with the Neuron SDK, popular frameworks, and helpful libraries, allowing you to quickly begin training and running inference on AWS Trainium and Inferentia. Or, quickly deploy models using pre-configured AWS Neuron Deep Learning Containers (Neuron DLCs) with optimized frameworks for AWS Trainium and Inferentia.\n\nSupported Hardware\n------------------\n\n**AWS Inferentia**\n    Purpose-built for high-performance inference workloads:\n    \n    * ``Inf1`` instances - First-generation Inferentia chips\n    * ``Inf2`` instances - Second-generation with improved performance and efficiency\n\n**AWS Trainium**\n    Designed for distributed training of large models:\n    \n    * ``Trn1`` instances - High-performance training acceleration\n    * ``Trn1n`` instances - Enhanced networking for large-scale distributed training\n    * ``Trn2`` instances - Next-generation Trainium with superior performance\n    * ``Trn2`` UltraServer - High-density Trainium servers for massive training workloads\n    * ``Trn3`` UltraServer - The next generation of Trainium servers for massive training workloads\n\n\nHow do I get more information?\n------------------------------\n\n* Review the comprehensive documentation and follow the tutorials on this site\n* Check the Neuron GitHub repositories for code examples. 
GitHub repos include:\n\n  * `Neuron SDK code samples <https://github.com/aws-neuron/aws-neuron-samples>`_\n  * `Neuron NKI ML kernel samples <https://github.com/aws-neuron/nki-samples>`_\n  * `Neuron container configurations <https://github.com/aws-neuron/deep-learning-containers>`_\n  * `Helm charts for Kubernetes deployment <https://github.com/aws-neuron/neuron-helm-charts>`_\n  * `NeuronX Distributed Core library sources <https://github.com/aws-neuron/neuronx-distributed>`_\n  * `NeuronX Distributed Training library sources <https://github.com/aws-neuron/neuronx-distributed-training>`_\n  * `NeuronX Distributed Inference library sources <https://github.com/aws-neuron/neuronx-distributed-inference>`_\n  * `Linux kernel driver sources <https://github.com/aws-neuron/aws-neuron-driver>`_\n  * `Neuron workshop model samples <https://github.com/aws-neuron/neuron-workshops>`_\n\n* Visit the `AWS Neuron support forum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`_ for community assistance\n"
  },
  {
    "path": "about-neuron/whats-new.rst",
    "content": ".. _main_whats-new:\n\n.. meta::\n    :description: Blog posts for the latest features and updates for the AWS Neuron SDK\n    :date-modified: 03/13/2026\n\nWhat's New in the AWS Neuron SDK\n================================\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   Release Notes </release-notes/index>\n\n*Explore detailed posts about the latest releases, updates, and upcoming changes to the AWS Neuron SDK.*\n\n.. grid:: 1\n    :gutter: 2\n\n    .. grid-item-card:: Neuron Release Notes\n        :link: /release-notes/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        **Latest release**: 2.29.0 (04/09/2026)\n\n----\n\n.. _whats-new-2026-04-02-v2_29:\n\nAWS Neuron SDK 2.29.0: NKI Exits Beta, CPU Simulator, and Expanded NKI Library\n-------------------------------------------------------------------------------\n\n**Posted on**: April 09, 2026\n\nToday we are releasing AWS Neuron SDK 2.29.0. This release brings NKI 0.3.0 out of Beta into Stable, featuring the new NKI Standard Library and an experimental CPU Simulator for local kernel development without Trainium hardware. The NKI Library adds 7 new experimental kernels including Conv1D, a Transformer TKG megakernel, and fused communication-compute primitives, along with improvements to existing attention, MLP, and MoE kernels. NxD Inference delivers performance gains for Qwen2 VL, Qwen3 VL, and Flux.1 models. Neuron Runtime introduces new APIs for collective stream management and network proxy tuning. Neuron Explorer is now out of Beta and Stable, with full Device widget support in the System Trace Viewer and availability on the VS Code Extension Marketplace. The Neuron Driver adds support for new Trn3 Gen2 Ultraserver configurations.\n\n\nNeuron Kernel Interface (NKI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAWS Neuron SDK 2.29.0 introduces NKI 0.3.0, the latest update to the Neuron Kernel Interface. NKI 0.3.0 is now out of Beta and Stable. It features the NKI Standard Library (``nki-stdlib``), which provides developer-visible code for all NKI APIs and native language objects (such as ``NkiTensor``). This release provides new exposed Trainium capabilities and features in the NKI API and introduces ``nki.language`` APIs.\n\n**NKI CPU Simulator (Experimental)**: NKI 0.3.0 includes a CPU Simulator, which executes NKI kernels entirely on CPU and allows for a fast development cycle on inexpensive CPUs and compute instances to validate kernel correctness, using standard Python step-by-step debugging tools and instrumentation to print results for every line of kernel code. Activate it with ``NKI_SIMULATOR=1`` or use ``nki.simulate(kernel)``.\n\n**New Language APIs (Experimental)**: Introduced ``nki.language`` high-level convenience wrappers including ``nl.load``, ``nl.store``, ``nl.copy``, ``nl.matmul``, ``nl.transpose``, and ``nl.softmax``. \n\n**New ISA and Hardware Features**: Added the ability to set DMA priority of DMA operations and collectives operations for Trn3 (NeuronCore-v4). A dedicated ``nki.isa.exponential`` instruction is optimized for vectorising exponents (``exp``) with VectorE. Matmul accumulation control is added via the ``accumulate`` parameter on ``nc_matmul`` and ``nc_matmul_mx``. Variable-length all-to-all collectives are now available via ``nki.collectives.all_to_all_v``.\n\n**Breaking Changes**: NKI 0.3.0 includes several API breaking changes that improve correctness and consistency. 
All kernels must be updated to NKI 0.3.0; mixing with Beta 2 kernels in the same model is not supported. For the full list of changes and migration examples, see the :doc:`NKI 0.3.0 Update Guide </nki/migration/nki-0-3-0-update-guide>`.\n\nFor more details, see :ref:`nki-2-29-0-rn`.\n\n\nNKI Library\n^^^^^^^^^^^\n\n**New Experimental Kernels (7 added)**: Conv1D provides 1D convolution with stride, padding, dilation, bias, activation fusion, and LNC sharding. Transformer TKG is a multi-layer transformer forward pass megakernel for token generation. Fine-Grained All-Gather and FGCC (All-Gather + Matmul) enable ring-based communication with compute overlap on Trn2. SBUF-to-SBUF All-Gather provides two variants for small and large tensors. Top-K Reduce supports MoE output gathering with LNC sharding. Dynamic Elementwise Add handles runtime-variable M-dimension tiling. The ``find_nonzero_indices`` subkernel is promoted from experimental to core.\n\n**Key Improvements to Existing Kernels**: Attention CTE increases max batch size from 32 to 512 and max sequence length from 36,864 to 131,072 with sequence packing support. Attention Block TKG adds fused QK-norm before RoPE and KVDP attention sharding. MLP adds BufferManager support and MXFP4/MXFP8 quantization paths. MoE TKG introduces a dynamic all-expert algorithm with ``block_size``. QKV adds flexible weight layout support. PyTorch reference implementations are added for 22 kernels.\n\n**Breaking Changes**: Multiple kernel signatures have changed with new parameters inserted mid-signature; callers using positional arguments must switch to keyword arguments. ``SbufManager`` is renamed to ``BufferManager``. MoE TKG replaces boolean sharding flags with ``LNCShardingStrategy`` enum. For the full list of breaking changes, see :ref:`nki-lib-2-29-0-rn`.\n\nFor more details, see :ref:`nki-lib-2-29-0-rn`.\n\n\nInference Updates\n^^^^^^^^^^^^^^^^^\n\n**NxD Inference 0.9.17155**: Qwen2 VL gains vision data parallelism with 7% QPS improvement for image-heavy workloads. Qwen3 VL adds text-model sequence parallelism with 2.2x QPS throughput improvement. Flux.1 adds CFG parallelism with 19% end-to-end latency improvement and 23% instance throughput improvement.\n\n**vLLM Neuron Plugin 0.5.0**: Updated alongside NxD Inference with model performance improvements.\n\n**Hardware Support Change**: NxD Inference no longer supports Trn1/Inf2. Only Trn2 and newer hardware is supported. Pin to Neuron SDK 2.28 for Trn1/Inf2 support.\n\nFor more details, see :ref:`nxd-inference-2-29-0-rn`.\n\n\nRuntime and Driver\n^^^^^^^^^^^^^^^^^^\n\n**Neuron Runtime Library 2.31**: New ``nrt_cc_create_stream`` API creates a collective stream to be used by host-initiated collectives, replacing the previous environment variable approach. New ``nrt_get_attached_efa_bdf`` API returns the BDF string of the EFA device for optimal network interface selection. New environment variables ``NEURON_RT_ONE_THREAD_PER_CORE`` (up to 2x improvement in collective communication latency) and ``NEURON_RT_RANKS_PER_NETWORK_PROXY`` provide fine-grained control over network proxy threading. RDMA support extends to Trn3. Collectives XU gains profiling support, context caching with up to 90% performance improvement, and removal of the 512 queue set instance limit. The async API version is bumped from 2.x to 3.0; applications using the async API must be recompiled.\n\n**Neuron Driver 2.27**: Adds support for new Trn3 Gen2 Ultraserver configurations: US3 (2-node), US4 (4-node), US16 (4-node), and US18 (4-node). 
Top-level DMA reset support is added during TPB reset on Trn3 and later platforms.\n\n**Neuron Collectives 2.31**: EFA device processing is restructured to per-stream granularity for improved stability. Fixed incorrect interface selection in multi-ultraserver collectives and crash on channel initialization failures.\n\nFor more details, see :ref:`runtime-2-29-0-rn`.\n\n\nNeuron Explorer\n^^^^^^^^^^^^^^^\n\nNeuron Explorer is now out of Beta and Stable. The System Trace Viewer now supports the full suite of Device widgets, enabling multi-device profile analysis across all linked Device Profiles within a single System Profile. The Summary Viewer includes system-level profile data for both system and device profiles. New System Timeline HBM Usage shows device HBM usage with memory allocation breakdown by category. Box Selection Summary enables viewing aggregated device profile information for a selected region in the trace viewer. Neuron Explorer for VS Code is now available on the Visual Studio Code Extension Marketplace and Open VSX, enabling simpler installation and automatic updates.\n\nFor more details, see :ref:`dev-tools-2-29-0-rn`.\n\n\nPyTorch Framework\n^^^^^^^^^^^^^^^^^\n\nPyTorch 2.7 and 2.8 have reached end of support starting with this release. Use PyTorch 2.9 on Ubuntu 24.04. Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron.\n\nFor more details, see :ref:`pytorch-2-29-0-rn`.\n\n\nEnd of Support and Migration Notices\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Effective this release:**\n\n* PyTorch 2.7 and 2.8 have reached end of support. Pin to Neuron SDK 2.28 if required.\n* NeuronX Distributed Training (NxDT) and NxD Core training APIs reach end of support; DLCs and DLAMI virtual environments pinned to SDK 2.28.0.\n* ``neuron-profile analyze`` subcommand is no longer supported. Migrate to Neuron Explorer.\n* Ubuntu 22.04 Multi-Framework DLAMI is no longer published. Use Ubuntu 24.04.\n\n**Hardware support:**\n\n* NxD Inference no longer supports Trn1/Inf2. Pin to Neuron SDK 2.28 for continued support.\n\n**NKI namespace migration:**\n\n* Removal of ``neuronxcc.nki.*`` namespace postponed to a future release. Both ``neuronxcc.nki.*`` and ``nki.*`` namespaces continue to work. Migration to ``nki.*`` is encouraged.\n\n**Effective with PyTorch 2.10 support:**\n\n* PyTorch/XLA will be replaced by TorchNeuron.\n\n* Read the :doc:`Neuron 2.29.0 component release notes </release-notes/2.29.0>` for specific Neuron component improvements and details.\n\n----\n\n.. _whats-new-2026-03-13-v2_28_1:\n\nAWS Neuron SDK 2.28.1 Patch Available\n--------------------------------------\n\n**Posted on**: March 13, 2026\n\nAWS Neuron provides a patch version, 2.28.1, to address a Neuron Driver compatibility issue with Linux kernel 6.18. \n\n.. _whats-new-2026-02-26-v2_28:\n\nAWS Neuron SDK 2.28.0: Enhanced Profiling, Vision Language Models, and Expanded NKI Capabilities\n--------------------------------------------------------------------------------------------------\n\n**Posted on**: February 26, 2026\n\nToday we are releasing AWS Neuron SDK 2.28.0. This release enhances Neuron Explorer with system profiling, Tensor Viewer, and Database Viewer for comprehensive performance analysis. NxD Inference adds support for Qwen2/Qwen3 VL vision language models, Flux.1 inpainting capabilities, and Eagle3 speculative decoding. 
The NKI Library expands with 9 new kernels including RoPE, MoE operations, and experimental kernels for attention and cross entropy. NKI (Beta 2) introduces LNC multi-core support with intra-LNC collectives and new APIs. Kubernetes users gain Neuron DRA Driver support for advanced resource allocation.\n\n\nDeveloper Tools and Profiling\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Neuron Explorer Enhancements** - Added system profiling support with drill-down navigation to device profiles. New Tensor Viewer helps identify memory bottlenecks by displaying tensor names, shapes, sizes, and memory usage. Database Viewer provides an interactive interface for querying profiling data using SQL or natural language. Profile Manager now supports tag-based organization and search. A migration guide from Neuron Profiler/Profiler 2.0 is now available.\n\n**nccom-test Improvements** - Enhanced data integrity checks use pseudo-random data patterns for better corruption detection. Added support for ``alltoallv`` collective operation for benchmarking variable-sized all-to-all communication patterns.\n\nFor more details, see :ref:`dev-tools-2-28-0-rn`.\n\nInference Updates\n^^^^^^^^^^^^^^^^^\n\n**NxD Inference 0.8.16251** - Added support for vision language models including Qwen2 VL (Qwen2-VL-7B-Instruct) and Qwen3 VL (Qwen3-VL-8B-Thinking) for processing text and image inputs (Beta). Pixtral model support improved with batch size 32 and sequence length 10240 on Trn2 with vLLM V1. Flux.1 model gains new functionality for in-paint, out-paint, canny edge detection, and depth-based image generation (Beta).\n\n**vLLM Neuron Plugin 0.4.1** - Multi-LoRA serving enhancements enable streaming LoRA adapters via vLLM's ``load_adapter`` API with dynamic runtime loading. Users can now run the base model alone when multi-LoRA serving is enabled. Added Eagle3 speculative decoding support for Llama 3.1 8B. Updated to support vLLM v0.13.0 and PyTorch 2.9.\n\nFor more details, see :ref:`nxd-inference-2-28-0-rn`.\n\nNKI Library\n^^^^^^^^^^^\n\n**9 New Kernels** - The NKI Library expands from 7 to 16 documented kernel APIs. New core kernels include RoPE (Rotary Position Embedding), Router Top-K (expert selection for MoE), MoE CTE (Context Encoding), MoE TKG (Token Generation), and Cumsum. New experimental kernels include Attention Block TKG (fused attention for token generation), Cross Entropy (forward and backward passes), Depthwise Conv1D, and Blockwise MM Backward (for MoE training).\n\n**Enhanced Quantization Support** - Existing kernels receive FP8 and MX quantization support across QKV, MLP, and Output Projection kernels. QKV kernel adds fused FP8 KV cache quantization and block-based KV cache layout. MLP kernel adds gate/up projection clamping and fp16 support for TKG mode. Attention CTE kernel adds strided Q slicing for context parallelism.\n\n**Improved Utilities** - TensorView gains ``rearrange`` method for dimension reordering and ``has_dynamic_access`` for runtime-dependent addressing checks. SbufManager provides hierarchical tree-formatted allocation logging with new query methods for SBUF utilization. New utilities include ``rmsnorm_mx_quantize_tkg``, ``interleave_copy``, ``LncSubscriptable``, and ``TreeLogger``.\n\nFor more details, see :ref:`nki-lib-2-28-0-rn`.\n\nNeuron Kernel Interface (NKI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**NKI Beta 2 (0.2.0)** - This release includes LNC multi-core support for LNC=2, enabling kernels to leverage multiple NeuronCores within a logical NeuronCore. 
The compiler now tracks ``shared_hbm`` tensors and canonicalizes LNC kernel outputs. Users can declare tensors private to a single NeuronCore using ``private_hbm`` memory type.\n\n**New nki.collectives Module** - Enables collective communication across multiple NeuronCores with operations including ``all_reduce``, ``all_gather``, ``reduce_scatter``, ``all_to_all``, ``collective_permute`` variants, and ``rank_id``.\n\n**New APIs and Features** - New ``nki.isa`` APIs include ``nonzero_with_count`` for sparse computation and ``exponential`` for element-wise operations. New ``float8_e4m3fn`` dtype supports FP8 workloads. Language features include ``no_reorder`` blocks for instruction ordering control, ``__call__`` special method support, ``tensor.view`` method for reshaping, and shared constants as string arguments.\n\n**API Improvements** - ``dma_transpose`` now supports indirect addressing, ``dma_copy`` adds the ``unique_indices`` parameter, and ``register_alloc`` accepts optional tensor arguments for pre-filling. The compiler no longer truncates diagnostic output.\n\nFor more details, see :ref:`nki-2-28-0-rn`.\n\nKubernetes Support\n^^^^^^^^^^^^^^^^^^\n\n**Neuron DRA Driver** - Introduced Neuron Dynamic Resource Allocation (DRA) Driver enabling advanced resource allocation using the Kubernetes DRA API for flexible and efficient Neuron device management. The DRA API provides topology-aware scheduling, atomic resource allocation, and per-workload configuration. Neuron Helm Charts now include DRA Driver support.\n\nFor more details, see :ref:`containers-2-28-0-rn`.\n\nPyTorch Framework (torch-neuronx)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Transition to Native PyTorch Support** - Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 is the last version using PyTorch/XLA. Users will need to update their scripts when upgrading to PyTorch 2.10 or later. See :ref:`native-pytorch-trainium` for migration guidance.\n\nFor more details, see :ref:`pytorch-2-28-0-rn`.\n\n* Read the :doc:`Neuron 2.28.1 component release notes </release-notes/prev/2.28.1>` for specific Neuron component improvements and details.\n\n.. _whats-new-2025-12-19-v2_27:\n\nAWS Neuron SDK 2.27.0: Trainium3 Support, Enhanced NKI, and Unified Profiling with Neuron Explorer\n---------------------------------------------------------------------------------------------------\n\n**Posted on**: December 19, 2025\n\nToday we are releasing AWS Neuron SDK 2.27.0. This release adds support for Trainium3 (``Trn3``) instances. Enhanced NKI with new NKI Compiler introduces the ``nki.*`` namespace with updated APIs and language constructs. The NKI Library provides pre-optimized kernels for common model operations including attention, MLP, and normalization. Neuron Explorer delivers a unified profiling suite with AI-driven optimization recommendations. vLLM V1 integration is now available through the vLLM-Neuron Plugin. Deep Learning Containers and AMIs are updated with vLLM V1, PyTorch 2.9, JAX 0.7, Ubuntu 24.04, and Python 3.12.\n\nIn addition to this release, we are introducing new capabilities and features in private beta access (see Private Beta Access section). 
We are also announcing our transition to PyTorch native support starting with PyTorch 2.10 in Neuron 2.28, plans to simplify NxDI in upcoming releases, and other important updates.\n\nNeuron Kernel Interface (NKI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**NKI Compiler** - The new ``nki.*`` namespace replaces the legacy ``neuronxcc.nki.*`` namespace. Top-level kernel functions now require the ``@nki.jit`` annotation. Neuron 2.27 supports both namespaces side by side; the legacy namespace will be removed in Neuron 2.28. A kernel migration guide is available in the documentation.\n\nFor more details, see :ref:`neuron-2-27-0-nki`.\n\nNKI Library\n^^^^^^^^^^^\n\nThe NKI Library provides pre-optimized kernels: Attention CTE, Attention TKG, MLP, Output Projection CTE, Output Projection TKG, QKV, and RMSNorm-Quant. Kernels are accessible via the ``nkilib.*`` namespace in neuronx-cc or from the GitHub repository.\n\nFor more details, see :ref:`neuron-2-27-0-nkilib`.\n\nDeveloper Tools\n^^^^^^^^^^^^^^^\n\n**Neuron Explorer** - A suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. This release features improved performance and user experience for device profiling, with four core viewers to provide insights into model performance:\n\n* **Hierarchy Viewer**: Visualizes model structure and component interactions\n* **AI Recommendation Viewer**: Delivers AI-driven optimization recommendations\n* **Source Code Viewer**: Links profiling data directly to source code\n* **Summary Viewer**: Displays high-level performance metrics\n\nNeuron Explorer is available through UI, CLI, and VSCode IDE integration. Existing NTFF files are compatible but require reprocessing for new features.\n\nNew tutorials cover profiling NKI kernels, multi-node training jobs, and vLLM inference workloads. The ``nccom-test`` tool now includes fine-grained collective communication support.\n\nFor more details, see :ref:`neuron-2-27-0-tools`.\n\nInference Updates\n^^^^^^^^^^^^^^^^^\n\n**vLLM V1** - The vLLM-Neuron Plugin enables vLLM V1 integration for inference workloads. vLLM V0 support ends in Neuron 2.28.\n\n**NxD Inference** - Model support expands with beta releases of Qwen3 MoE (Qwen3-235B-A22B) for multilingual text and Pixtral (Pixtral-Large-Instruct-2411) for image understanding. Both models use HuggingFace checkpoints and are supported on ``Trn2`` and ``Trn3`` instances.\n\nFor more details, see :ref:`neuron-2-27-0-nxd-inference`.\n\nNeuron Graph Compiler\n^^^^^^^^^^^^^^^^^^^^^\n\nDefault accuracy settings are now optimized for precision. The ``--auto-cast`` flag defaults to ``none`` (previously ``matmul``), and ``--enable-mixed-precision-accumulation`` is enabled by default. FP32 models may see performance impacts; restore previous behavior with ``--auto-cast=matmul`` and ``--disable-mixed-precision-accumulation``. Python 3.10 or higher is now required.\n\nFor more details, see :ref:`neuron-2-27-0-compiler`.\n\nRuntime Improvements\n^^^^^^^^^^^^^^^^^^^^\n\n**Neuron Runtime Library 2.29** adds support for Trainium3 (``Trn3``) instances and delivers performance improvements for Collectives Engine overhead, NeuronCore branch overhead, NEFF program startup, and all-gather latency.\n\nFor more details, see :ref:`neuron-2-27-0-runtime`.\n\nDeep Learning AMIs and Containers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Platform Updates** - All DLCs are updated to Ubuntu 24.04 and Python 3.12. 
DLAMIs add Ubuntu 24.04 support for base, single framework, and multi-framework configurations.\n\n**Framework Updates**:\n\n* vLLM V1 single framework DLAMI and multi-framework virtual environments\n* PyTorch 2.9 single framework DLAMIs and multi-framework virtual environments (Amazon Linux 2023, Ubuntu 22.04, Ubuntu 24.04)\n* JAX 0.7 single framework DLAMI and multi-framework virtual environments\n\n**New Container** - The ``pytorch-inference-vllm-neuronx`` 0.11.0 DLC provides a complete vLLM inference environment with PyTorch 2.8 and all dependencies.\n\nFor more details, see :ref:`neuron-2-27-0-dlami` and :ref:`neuron-2-27-0-dlc`.\n\n\nEnd of Support and Migration Notices\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Effective this release:**\n\n* :ref:`announcement-python-3-9-eol`\n* :ref:`announcement-end-of-support-pytorch-2-6`\n* :ref:`announce-no-support-tensorflow2-10`\n* :ref:`announce-eos-inf1-virtual-environments`\n* :ref:`announcement-end-of-support-parallel-model-trace`\n* :ref:`announce-eos-tensorboard-tools`\n\n**Effective Neuron 2.28:**\n\n* :ref:`announcement-end-of-support-neuronxcc-nki`\n* :ref:`announcement-nki-library-namespace-changes`\n* :ref:`announcement-nki-library-kernel-migration`\n* :ref:`announcement-end-of-support-vllm-v0`\n\n**Effective with PyTorch 2.10 support:**\n\n* :ref:`announce-transition-pytorch-trainium`\n* :ref:`announcement-end-of-support-nxdt-nxd-core`\n\n**Future Releases:**\n\n* :ref:`announce-nxdi-changes`\n* :ref:`announce-eos-dlami-ubuntu-22-04`\n* :ref:`announce-eos-pytorch-profling-api`\n* :ref:`announce-eos-neuron-profiler`\n\nDetailed Release Notes\n^^^^^^^^^^^^^^^^^^^^^^^\n\n* Read the :doc:`Neuron 2.27.0 component release notes </release-notes/prev/2.27.0/index>` for specific Neuron component improvements and details.\n\n----\n\n.. _whats-new-2025-12-02-riv:\n\nAWS Neuron Expands with Trainium3, Native PyTorch, Faster NKI, and Open Source at re:Invent 2025\n------------------------------------------------------------------------------------------------\n\n**Posted on**: 12/02/2025\n\n.. image:: /images/NeuronStandalone_white_small.png\n   :alt: AWS Neuron Logo\n   :align: right\n   :width: 120px\n\nAt re:Invent 2025, AWS Neuron introduces support for `Trainium3 UltraServer <https://aws.amazon.com/ai/machine-learning/trainium/>`__ with expanded open source components and enhanced developer experience. These updates enable standard frameworks to run unchanged on Trainium, removing barriers for researchers to experiment and innovate. 
For developers requiring deeper control, the enhanced Neuron Kernel Interface (NKI) provides direct access to hardware-level optimizations, enabling customers to scale AI workloads with improved performance.\n\n**Expanded capabilities and enhancements include**:\n\n* :doc:`Trainium3 UltraServer support </about-neuron/arch/neuron-hardware/trn3-arch>`: Enabling customers to scale AI workloads with improved performance\n* :doc:`Native PyTorch support </frameworks/torch/pytorch-native-overview>`: Standard PyTorch runs unchanged on Trainium without platform-specific modifications\n* :doc:`Enhanced Neuron Kernel Interface (NKI) </nki/get-started/about/index>` with open source :doc:`NKI Compiler </nki/deep-dives/nki-compiler>`: Improved programming capabilities with direct access to Trainium hardware instructions and fine-grained optimization control, compiler built on MLIR\n* :doc:`NKI Library </nki/library/index>`: Open source collection of optimized, ready-to-use kernels for common ML operations\n* :doc:`Neuron Explorer </tools/neuron-explorer/index>`: Tools suite to support developers and performance engineers in their performance optimization journey from framework operations to hardware instructions\n* :doc:`Neuron DRA for Kubernetes </containers/neuron-dra>`: Kubernetes-native resource management eliminating custom scheduler extensions\n* :doc:`Expanded open source components </about-neuron/oss/index>`: Open sourcing more components including NKI Compiler, Native PyTorch, NKI Library, and more released under Apache 2.0\n\n\nAI development requires rapid experimentation, hardware optimization, and production scale workloads. These updates enable researchers to experiment with novel architectures using familiar workflows, ML developers to build AI applications using standard frameworks, and performance engineers to optimize workloads using low-level hardware optimization.\n\n.. admonition:: Looking to try out our Beta features?\n\n   Submit your beta access request through `this form <https://pulse.aws/survey/NZU6MQGW?p=0>`__ and the Neuron Product team will get back to you.\n\nNative PyTorch Support\n^^^^^^^^^^^^^^^^^^^^^^\n\n**Private Preview**\n\nAWS Neuron now natively supports PyTorch through TorchNeuron, an open source native PyTorch backend for Trainium. TorchNeuron integrates with PyTorch through the PrivateUse1 device backend mechanism, registering Trainium as a native device alongside other backends and allowing researchers and ML developers to run their code without modifications.\n\nTorchNeuron provides eager mode execution for interactive development and debugging, native distributed APIs including FSDP and DTensor for distributed training, and torch.compile support for optimization. TorchNeuron enables compatibility with minimal code changes with ecosystem tools like TorchTitan and HuggingFace Transformers.\n\nUse TorchNeuron to run your PyTorch research and training workloads on Trainium without platform-specific code changes.\n\n**Learn more**: :doc:`documentation </frameworks/torch/pytorch-native-overview>`, and `TorchNeuron GitHub repository <https://github.com/aws-neuron/torch-neuronx>`__.\n\n**Access**: Contact your AWS account team for access.\n\n\nEnhanced NKI\n^^^^^^^^^^^^\n\n**Public Preview**\n\nThe enhanced Neuron Kernel Interface (NKI) provides developers with complete hardware control through advanced APIs for fine-grained scheduling and allocation. 
The enhanced NKI enables instruction-level programming, memory allocation control, and execution scheduling with direct access to the Trainium ISA. \n\nWe are also releasing the NKI Compiler as open source under Apache 2.0, built on MLIR to enable transparency and collaboration with the broader compiler community. NKI integrates with PyTorch and JAX, enabling developers to use custom kernels within their training workflows.\n\nUse Enhanced NKI to innovate and build optimized kernels on Trainium. Explore the NKI Compiler source code to inspect and contribute to the MLIR-based compilation pipeline. \n\n.. note::\n  The NKI Compiler source code is currently in **Private Preview**, while the NKI programming interface is in **Public Preview**.\n\n**Learn more**: :doc:`NKI home page </nki/index>` and :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`.\n\nNKI Library\n^^^^^^^^^^^\n\n**Public Preview**\n\nThe NKI Library provides an open source collection of optimized, ready-to-use kernels for common ML operations. The library includes kernels for dense transformer operations, MoE-specific operations, and attention mechanisms, all with complete source code, documentation, and benchmarks.\n\nUse NKI Library kernels directly in your models to improve performance, or explore the implementations as reference for best practices of performance optimizations on Trainium.\n\n**Learn more**: `GitHub repository <https://github.com/aws-neuron/nki-library>`__ and :doc:`API documentation </nki/library/api/index>`.\n\n\nNeuron Explorer\n^^^^^^^^^^^^^^^\n\n**Public Preview**\n\nNeuron Explorer is a tools suite that supports developers and performance engineers in their performance optimization journey. It provides capabilities to inspect and optimize code from framework operations down to hardware instructions with hierarchical profiling, source code linking, IDE integration, and AI-powered recommendations for optimization insights.\n\nUse Neuron Explorer to understand and optimize your model performance on Trainium, from high-level framework operations to low-level hardware execution.\n\n**Learn more**: :doc:`Neuron Explorer documentation </tools/neuron-explorer/index>`.\n\n\nKubernetes-Native Resource Management with Neuron DRA\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Private Preview**\n\nNeuron Dynamic Resource Allocation (DRA) provides Kubernetes-native resource management for Trainium, eliminating custom scheduler extensions. DRA enables topology-aware scheduling using the default Kubernetes scheduler, atomic UltraServer allocation, and flexible per-workload configuration.\n\nNeuron DRA supports EKS, SageMaker HyperPod, and UltraServer configurations. The driver is open source with container images in AWS ECR public gallery.\n\nUse Neuron DRA to simplify Kubernetes resource management for your Trainium workloads with native scheduling and topology-aware allocation.\n\n**Learn more**: :doc:`Neuron DRA documentation </containers/neuron-dra>`.\n\n**Access**: Contact your AWS account team to participate in the Private Preview.\n\n\nResources and Additional Information\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor more information visit the `AWS Trainium official page <https://aws.amazon.com/ai/machine-learning/trainium/>`__, the :doc:`AWS Neuron Documentation </index>`, and :doc:`the AWS Neuron GitHub repositories </about-neuron/oss/index>`.\n\n\n\n\n"
  },
  {
    "path": "archive/helper-tools/index.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nHelper Tools\n============\n\n.. toctree:: \n    :maxdepth: 1\n\n        \n    Check Model </archive/helper-tools/tutorial-neuron-check-model>\n    GatherInfo </archive/helper-tools/tutorial-neuron-gatherinfo>"
  },
  {
    "path": "archive/helper-tools/tutorial-neuron-check-model.rst",
    "content": ".. _neuron_check_model:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuron Check Model\n^^^^^^^^^^^^^^^^^^\n\nOverview\n========\n\nNeuron Check Model tool provides user with basic information about the compiled and uncompiled model's operations\nwithout the use of TensorBoard-Neuron. For additional visibility into the models, please see :ref:`neuron-plugin-tensorboard`.\n\nNeuron Check Model tool scans the user's uncompiled model and provides a table of the operations within the uncompiled\nmodel. By default, the table shows each operation type and number of instances of that type within model, and whether\nthe type is supported in Neuron. If --show_names option is specified, the table shows each operation by name and\nwhether the type of that operation is supported in Neuron.\n\nIf the model is already compiled, the tool also provides the table of operations as for uncompiled model. The table\ninclude the Neuron subgraph type and number of instances of that type, along with operations that have not been\ncompiled to Neuron. Additionally, the tool displays a message showing the minimum number of NeuronCores required to run the\nmodel, followed by another table which shows the list of Neuron subgraphs by name and the number of pipelined\nNeuronCores used by each subgraph. More information about NeuronCore pipeline can be found in\n:ref:`neuroncore-pipeline`. If --expand_subgraph option is specified, the operations within each subgraph are\nprinted below the subgraph information.\n\nNeuron Check Model tool is currently available for TensorFlow and MXNet. To check PT model, please use\ntorch.neuron.analyze_model function as shown in PyTorch-Neuron Getting Started tutorial :ref:`/src/examples/pytorch/resnet50.ipynb`\n\nTensorFlow-Neuron Check Model\n=============================\n\nThe following example shows how to run TensorFlow-Neuron Check Model tool with TensorFlow ResNet50 tutorial.\n\n1. Start with the TensorFlow ResNet50 tutorial at :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb` and do the first three steps of the\ntutorial. Please stay in the Python environment that you setup during the tutorial.\n\n2. Install needed tensorflow_hub package and download the tool:\n\n::\n\n    pip install tensorflow_hub\n    wget https://raw.githubusercontent.com/aws/aws-neuron-sdk/master/src/neuron-gatherinfo/tf_neuron_check_model.py\n    python tf_neuron_check_model.py -h\n\n::\n\n    usage: tf_neuron_check_model.py [-h] [--show_names] [--expand_subgraph]\n                                    model_path\n\n    positional arguments:\n      model_path         a TensorFlow SavedModel directory (currently supporting\n                         TensorFlow v1 SaveModel only).\n\n    optional arguments:\n      -h, --help         show this help message and exit\n      --show_names       list operation by name instead of summarizing by type\n                         (caution: this option will generate many lines of output\n                         for a large model).\n      --expand_subgraph  show subgraph operations.\n\n3. 
After step 3 of the TensorFlow ResNet50 tutorial, you can check the uncompiled model to see Neuron supported operations (currently supporting TensorFlow v1 SaveModel only):\n\n::\n\n    $ python tf_neuron_check_model.py ws_resnet50/resnet50/\n\n    * The following table shows the supported and unsupported operations within this uncompiled model.\n    * Each line shows an operation type, the number of instances of that type within model,\n    * and whether the type is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp',\n     'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2',\n     'MergeV2Checkpoints', 'RestoreV2']\n\n    Op Type           Num Instances   Neuron Supported ?\n    -------           -------------   ------------------\n    Pad               2               Yes\n    RandomUniform     54              Yes\n    Sub               54              Yes\n    Mul               54              Yes\n    Add               54              Yes\n    Conv2D            53              Yes\n    BiasAdd           54              Yes\n    FusedBatchNormV3  53              Yes\n    Relu              49              Yes\n    MaxPool           1               Yes\n    AddV2             16              Yes\n    Fill              56              Yes\n    Mean              1               Yes\n    MatMul            1               Yes\n    Softmax           1               Yes\n    Pack              1               Yes\n\n    * Total inference operations: 504\n    * Total Neuron supported inference operations: 504\n    * Percent of total inference operations supported by Neuron: 100.0\n\n4. You can also check the compiled model to see the number of pipeline NeuronCores for each subgraph:\n\n::\n\n    $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/\n\n    * Found 1 Neuron subgraph(s) (NeuronOp(s)) in this compiled model.\n    * Use this tool on the original uncompiled model to see Neuron supported operations.\n    * The following table shows all operations, including Neuron subgraphs.\n    * Each line shows an operation type, the number of instances of that type within model,\n    * and whether the type is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp',\n     'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2',\n     'MergeV2Checkpoints', 'RestoreV2']\n\n    Op Type   Num Instances   Neuron Supported ?\n    -------   -------------   ------------------\n    NeuronOp  1               Yes\n\n    * Please run this model on Inf1 instance with at least 1 NeuronCore(s).\n    * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\n    * (and subgraph operations if --expand_subgraph is used):\n\n    Subgraph Name                                                                 Num Pipelined NeuronCores\n    -------------                                                                 -------------------------\n    conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733  1\n\n5. 
When showing subgraph information, you can use --expand_subgraph to show operation types in each subgraph:\n\n::\n\n    $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/ --expand_subgraph\n\n    (output truncated to show subgraph information only)\n\n    Subgraph Name                                                                 Num Pipelined NeuronCores\n    -------------                                                                 -------------------------\n    conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733  1\n         Op Type         Num Instances\n         -------         -------------\n         MatMul          1\n         Relu            49\n         Add             16\n         FusedBatchNorm  53\n         BiasAdd         54\n         Conv2D          53\n         Pad             2\n         Mean            1\n         MaxPool         1\n         Softmax         1\n\n6. Use --show_names to see full operation names (caution: this option will generate many lines of output for a large model):\n\n::\n\n    $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/ --show_names\n\n    * Found 1 Neuron subgraph(s) (NeuronOp(s)) in this compiled model.\n    * Use this tool on the original uncompiled model to see Neuron supported operations.\n    * The following table shows all operations, including Neuron subgraphs.\n    * Each line shows an operation name and whether the type of that operation is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp',\n     'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2',\n     'MergeV2Checkpoints', 'RestoreV2']\n\n    Op Name                                                                       Op Type   Neuron Supported ?\n    -------                                                                       -------   ------------------\n    conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733  NeuronOp  Yes\n\n    * Please run this model on Inf1 instance with at least 1 NeuronCore(s).\n    * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\n    * (and subgraph operations if --expand_subgraph is used):\n\n    Subgraph Name                                                                 Num Pipelined NeuronCores\n    -------------                                                                 -------------------------\n    conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733  1\n\n\nMXNet-Neuron Check Model\n=========================\n\nThe following example shows how to run MXNet-Neuron Check Model tool with MXNet ResNet50 tutorial.\n\n1. Start with the MXNet ResNet50 tutorial at :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb` and do the first three steps of the tutorial.\nPlease stay in the Python environment that you setup during the tutorial.\n\n2. 
Download the tool:\n\n::\n\n    wget https://raw.githubusercontent.com/aws/aws-neuron-sdk/master/src/neuron-gatherinfo/mx_neuron_check_model.py\n    python mx_neuron_check_model.py -h\n\n::\n\n    usage: mx_neuron_check_model.py [-h] [--show_names] [--expand_subgraph]\n                                    model_path\n\n    positional arguments:\n      model_path         path prefix to MXNet model (the part before -symbol.json)\n\n    optional arguments:\n      -h, --help         show this help message and exit\n      --show_names       list operation by name instead of summarizing by type\n                         (caution: this option will generate many lines of output\n                         for a large model).\n      --expand_subgraph  show subgraph operations.\n\n3. After step 3 of MXNet ResNet50 tutorial, you can check the uncompiled model to see Neuron supported operations:\n\n::\n\n    $ python mx_neuron_check_model.py resnet-50\n\n    * The following table shows the supported and unsupported operations within this uncompiled model.\n    * Each line shows an operation type, the number of instances of that type within model,\n    * and whether the type is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['null']\n\n    Op Type         Num Instances   Neuron Supported ?\n    -------         -------------   ------------------\n    BatchNorm       51              Yes\n    Convolution     53              Yes\n    Activation      50              Yes\n    Pooling         2               Yes\n    elemwise_add    16              Yes\n    Flatten         1               Yes\n    FullyConnected  1               Yes\n    SoftmaxOutput   1               No\n\n    * Total inference operations: 175\n    * Total Neuron supported inference operations: 174\n    * Percent of total inference operations supported by Neuron: 99.4\n\n4. You can also check the compiled model to see the number of pipeline NeuronCores for each subgraph:\n\n::\n\n    $ python mx_neuron_check_model.py resnet-50_compiled\n\n    * Found 1 Neuron subgraph(s) (_neuron_subgraph_op(s)) in this compiled model.\n    * Use this tool on the original uncompiled model to see Neuron supported operations.\n    * The following table shows all operations, including Neuron subgraphs.\n    * Each line shows an operation type, the number of instances of that type within model,\n    * and whether the type is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['null']\n\n    Op Type              Num Instances   Neuron Supported ?\n    -------              -------------   ------------------\n    _neuron_subgraph_op  1               Yes\n    SoftmaxOutput        1               No\n\n    * Please run this model on Inf1 instance with at least 1 NeuronCore(s).\n    * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\n    * (and subgraph operations if --expand_subgraph is used):\n\n    Subgraph Name         Num Pipelined NeuronCores\n    -------------         -------------------------\n    _neuron_subgraph_op0  1\n\n5. 
When showing subgraph information, you can use --expand_subgraph to show operation types in each subgraph:\n\n::\n\n    $ python mx_neuron_check_model.py resnet-50_compiled --expand_subgraph\n\n    (output truncated to show subgraph information only)\n\n    Subgraph Name         Num Pipelined NeuronCores\n    -------------         -------------------------\n    _neuron_subgraph_op0  1\n         Op Type         Num Instances\n         -------         -------------\n         BatchNorm       51\n         Convolution     53\n         Activation      50\n         Pooling         2\n         elemwise_add    16\n         Flatten         1\n         FullyConnected  1\n\n6. Use --show_names to see full operation names (caution: this option will generate many lines of output for a large model):\n\n::\n\n    $ python mx_neuron_check_model.py resnet-50_compiled --show_names\n\n    * Found 1 Neuron subgraph(s) (_neuron_subgraph_op(s)) in this compiled model.\n    * Use this tool on the original uncompiled model to see Neuron supported operations.\n    * The following table shows all operations, including Neuron subgraphs.\n    * Each line shows an operation name and whether the type of that operation is supported in Neuron.\n    * Some operation types are excluded from table because they are no-operations or training-related operations:\n     ['null']\n\n    Op Name               Op Type              Neuron Supported ?\n    -------               -------              ------------------\n    _neuron_subgraph_op0  _neuron_subgraph_op  Yes\n    softmax               SoftmaxOutput        No\n\n    * Please run this model on Inf1 instance with at least 1 NeuronCore(s).\n    * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\n    * (and subgraph operations if --expand_subgraph is used):\n\n    Subgraph Name         Num Pipelined NeuronCores\n    -------------         -------------------------\n    _neuron_subgraph_op0  1\n"
  },
  {
    "path": "archive/helper-tools/tutorial-neuron-gatherinfo.rst",
    "content": ".. _neuron_gatherinfo:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nUsing Neuron GatherInfo Tool to collect debug and support information\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOverview\n========\n\nThe Neuron GatherInfo tool ``neuron-gatherinfo.py`` can assist in\nautomating the collection and packaging of information from Neuron SDK\ntools that is useful to both user and AWS for issue resolution. The tool\ngathers log files and other system information. If being used to supply\nthat info to AWS, the tool will redact proprietary and confidential\ninformation. The GatherInfo tool is supplied in source code form -\navailable here: :github:`Neuron Gatherinfo </src/neuron-gatherinfo/neuron-gatherinfo.py>`\n\nThe tool enables developers to gather compiler and inference/runtime\nlogs. Additionally, the common usage is from within one of the supported\nML frameworks that have been integrated with Neuron, and information can\nbe captured from those compile/runtime environments using the\nframeworks.\n\nSteps Overview:\n~~~~~~~~~~~~~~~\n\n1. Obtain a copy of neuron-gatherinfo.py from\n   :github:`Neuron Gatherinfo </src/neuron-gatherinfo/neuron-gatherinfo.py>`\n2. Install into a location in your $PATH or into a location from where\n   you can launch the script\n3. Use with compile and/or runtime environments\n\nNeuron-CC information gathering\n-------------------------------\n\nStep 1: Re-run the compile steps for your workload with increased verbosity or debug levels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  For TensorFlow-Neuron, change the Python code as shown. Note that\n   ‘compiler-workdir’ is expected to be an empty directory to prevent\n   files from other runs from interfering with the information\n   gathering. The call to the compile function has to be augmented with\n   the **verbose** and the \\**compiler_workdir \\**arguments. In\n   addition, please capture the stdout messages into a file (for\n   example, by redirecting the stdout to a file)\n\n::\n\n   tfn.saved_model.compile(model_dir, compiled_model_dir, compiler_args=['--verbose', '2', '--pipeline', 'compile',  'SaveTemps'], compiler_workdir='./compiler-workdir')\n\n-  For Neuron Apache MXNet, add compiler arguments as shown below and run the\n   compilation process from an empty workdir:\n\n::\n\n   import mxnet as mx\n   import os\n\n   from packaging import version\n   mxnet_version = version.parse(mx.__version__)\n   if mxnet_version >= version.parse(\"1.8\"):\n      import mx_neuron as neuron\n   else: \n      from mxnet.contrib import neuron\n\n   ...\n   os.environ['SUBGRAPH_INFO'] = '1'\n   compile_args = { '--verbose' : 2, '--pipeline' : 'compile', 'flags' : ['SaveTemps'] }\n   csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs=inputs, **compile_args)\n\n.. 
_step-2-run-neuron-gatherinfopy-to-gather-information-to-share:\n\nStep 2: Run neuron-gatherinfo.py to gather information to share\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe output result will be a tar.gz file.\n\nNeuron Runtime information gathering\n------------------------------------\n\nStep 1: EXECUTE inference steps for your workload with increased verbosity or debug levels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn the case of runtime information, the tool **neuron-dump.py** is used\nby \\**neuron-gatherinfo.py \\**to gather that information. Make sure that\nyou have the neuron tools package (aws-neuron-tools) installed.\n\n.. _step-2-run-neuron-gatherinfopy-to-gather-information-to-share-1:\n\nStep 2: Run neuron-gatherinfo.py to gather information to share\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe output result will be a tar.gz file.\n\nTool Usage Reference\n====================\n\nRun neuron-gatherinfo.py using the “—help“ option:\n\n::\n\n   bash $ ~/bin/neuron-gatherinfo.py --help\n   usage: neuron-gatherinfo.py [-h] [--additionalfileordir ADDFLDIR] [-c CCDIR]\n                               [-i] [-f FILTERFILE] [-m] -o OUTDIR [-r RTDIR] -s\n                               STDOUT [-v]\n\n       Usage: /home/user/bin/neuron-gatherinfo.py [options]\n       This program is used to gather information from this system for analysis\n       and debugging\n\n\n   optional arguments:\n     -h, --help            show this help message and exit\n     --additionalfileordir ADDFLDIR\n                           Additional file or directory that the user wants to\n                           provide in the archive. The user can sanitize this\n                           file or directory before sharing\n     -c CCDIR, --compileroutdir CCDIR\n                           Location of the neuron-cc generated files\n     -i, --include         By default, only the lines containing (grep) patterns\n                           like 'nrtd|neuron|kernel:' from the syslog are copied.\n                           Other lines are excluded. Using this option allows the\n                           timestamp section of other lines to be included. The\n                           rest of the contents of the line itself are elided.\n                           Providing the timestamp section may provide time\n                           continuity while viewing the copied syslog file\n     -f FILTERFILE, --filter FILTERFILE\n     -m, --modeldata       By using this option, the entire compiler work\n                           directory's contents will be included (excluding the\n                           .pb files, unless an additional option is used). This\n                           would include model information, etc. The files that\n                           are included, by default, are these: graph_def.neuron-\n                           cc.log, all_metrics.csv, hh-tr-operand-\n                           tensortensor.json\n     -o OUTDIR, --out OUTDIR\n                           The output directory where all the files and other\n                           information will be stored. The output will be stored\n                           as an archive as well as the actual directory where\n                           all the contents are copied. This will allow a simple\n                           audit of the files, if necessary. 
*** N O T E ***:\n                           Make sure that this directory has enough space to hold\n                           the files and resulting archive\n     -r RTDIR, --runtimeoutdir RTDIR\n                           Location of the neuron runtime generated files\n     -s STDOUT, --stdout STDOUT\n                           The file where the stdout of the compiler run was\n                           saved\n     -v, --verbose         Verbose mode displays commands executed and any\n                           additional information which may be useful in\n                           debugging the tool itself\n\nExamples\n========\n\nExample 1: no ML model information gathered (default behavior)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn this case, the tool will archive just the default information\ngathering:\n\n::\n\n   bash $ sudo ~/bin/neuron-gatherinfo.py   -o compile-and-run-info-for-debugging-no-model-info  -i --verbose  -s stdout-from-compile_resnet50.out -c compiler-workdir\n\n   Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lscpu.txt\n   Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lshw.txt\n   Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lspci.txt\n   Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-cc.txt\n   Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-ls.txt\n   <SNIP>\n       ******\n       Archive created at:\n           /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo.tar.gz\n       From directory:\n           /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo\n       ******\n\n\n.. 
_example-2--model-ml-information-gathered-using-the-modeldata-option:\n\nExample 2: ML model information gathered using the ``--modeldata`` option\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn this case, the tool will archive the compiler work directory in\naddition to the default information gathering:\n\n::\n\n   bash $ sudo ~/bin/neuron-gatherinfo.py   -o compile-and-run-info-for-debugging  -i --verbose  -s stdout-from-compile_resnet50.out -c compiler-workdir --modeldata\n\n   <SNIP>\n   Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lscpu.txt\n   Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lshw.txt\n   Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lspci.txt\n   Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-cc.txt\n   Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-ls.txt\n   <SNIP>\n\n       ******\n       Archive created at:\n           /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo.tar.gz\n       From directory:\n           /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo\n       ******\n\n\n       **************************\n       Based on your command line option, we're also packaging these files:\n\n           graph_def.neuron-cc.log\n           all_metrics.csv\n           hh-tr-operand-tensortensor.json\n\n       And this directory: /home/user/tutorials-3/compiler-workdir\n\n       **************************\n\n"
  },
  {
    "path": "archive/index.rst",
    "content": ".. meta::\n   :description: Archived AWS Neuron SDK documentation\n   :keywords: AWS Neuron SDK, archived tutorials, legacy documentation\n   :date-modified: 12-02-2025\n\n=====================================\nArchived AWS Neuron SDK documentation\n=====================================\n\n.. note::\n\n    This page contains archived tutorials and other documentation for older versions of the AWS Neuron SDK.\n    These pages are no longer actively maintained and may reference unsupported features or deprecated APIs. They are provided as-is and may not reflect the current state of the AWS Neuron SDK.\n\nOverview\n--------\n\nThe following content has been archived for reference purposes. For the latest documentation and guides, visit the `AWS Neuron SDK documentation <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/>`_.\n\nArchived feature docs\n---------------------\n\n.. list-table::\n   :header-rows: 1\n\n   * - Feature\n     - Last release supported\n     - Date archived\n   * - :doc:`tensorboard/getting-started-tensorboard-neuron-plugin`\n     - Neuron 2.27.0\n     - Archived on: 12/2/2025\n   * - :doc:`neuronperf/index`\n     - Neuron 2.27.0\n     - Archived on: 12/2/2025\n   * - :doc:`helper-tools/index`\n     - Neuron 2.27.0\n     - Archived on: 12/2/2025\n   * - :doc:`transformers-neuronx/index`\n     - Neuron 2.25.0\n     - Archived on: 9/15/2025\n   * - :doc:`MXNet Neuron Setup Guides <mxnet-neuron/index>`\n     - Neuron 2.27.0\n     - Archived on: 3/30/2026\n   * - :doc:`mxnet-neuron/index`\n     - Neuron 2.16.0\n     - Archived on: 3/11/2026\n   * - :doc:`tensorflow/index`\n     - Neuron 2.22.0\n     - Archived on: 3/11/2026\n   * - :doc:`torch-neuron/index`\n     - Neuron 2.22.0\n     - Archived on: 3/11/2026\n\n\nArchived tutorials\n------------------\n\n.. list-table::\n   :header-rows: 1\n\n   * - Tutorial\n     - Last release supported\n     - Date archived\n   * - :doc:`tutorials/finetune_t5`\n     - Neuron 2.24.0\n     - Archived on: 7/31/2025\n   * - :doc:`tutorials/ssd300_demo/ssd300_demo`\n     - Neuron 2.24.0\n     - Archived on: 7/31/2025\n   * - :doc:`tutorials/megatron_gpt_pretraining`\n     - Neuron 2.25.0\n     - Archived on: 7/31/2025\n   * - :doc:`tutorials/finetuning_llama2_7b_ptl`\n     - Neuron 2.26.0\n     - Archived on: 8/25/2025\n   * - :doc:`tutorials/training_llama2_tp_pp_ptl`\n     - Neuron 2.26.0\n     - Archived on: 8/25/2025\n   * - :doc:`tutorials/training_codegen25_7b`\n     - Neuron 2.26.0\n     - Archived on: 8/25/2025\n   * - :doc:`tutorials/gpt3_neuronx_nemo_megatron_pretraining`\n     - Neuron 2.26.0\n     - Archived on: 8/25/2025\n   * - :doc:`tutorials/multinode-training-model-profiling`\n     - Neuron 2.29.0\n     - Archived on: 3/30/2026\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    tutorials/finetune_t5\n    tutorials/ssd300_demo/ssd300_demo\n    tutorials/megatron_gpt_pretraining\n    tutorials/training-gpt-neox-20b\n    tutorials/finetuning_llama2_7b_ptl\n    tutorials/training_llama2_tp_pp_ptl\n    tutorials/training_codegen25_7b\n    tutorials/multinode-training-model-profiling\n    tutorials/training-gpt-neox\n    tensorboard/getting-started-tensorboard-neuron-plugin\n    neuronperf/index\n    helper-tools/index\n    transformers-neuronx/index\n    mxnet-neuron/index\n    tensorflow/index\n    torch-neuron/index\n\nAccessing Archived Content\n--------------------------\n\nEach tutorial listed above corresponds to a specific version or feature set of the Neuron SDK that has since been superseded. 
Use these resources for historical context or migration guidance.\n\n.. warning::\n\n    Archived tutorials may not be compatible with current Neuron SDK releases. Exercise caution when following instructions from these documents.\n"
  },
  {
    "path": "archive/mxnet-neuron/api-compilation-python-api.rst",
    "content": ".. _ref-mxnet-neuron-compilation-python-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nNeuron Apache MXNet Compilation Python API\n=======================================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe MXNet-Neuron compilation Python API provides a method to compile\nmodel graph for execution on Inferentia.\n\n\nDescription\n-----------\n\nWithin the graph or subgraph, the compile method selects and sends\nNeuron-supported operations to Neuron-Compiler for compilation and saves\nthe compiled artifacts in the graph. Uncompilable operations are kept as\noriginal operations for framework execution.\n\nThe compiled graph can be saved using the MXNet save_checkpoint and\nserved using MXNet Model Serving. Please see\n:ref:`mxnet-neuron-model-serving` for more information about exporting\nto saved model and serving using MXNet Model Serving.\n\nOptions can be passed to Neuron compiler via the compile function. For\nexample, the “\\ ``--neuroncore-pipeline-cores``\\ ” option directs Neuron compiler\nto compile each subgraph to fit in the specified number of NeuronCores.\nThis number can be less than the total available NeuronCores on an Inf1\ninstance. See :ref:`neuron-compiler-cli-reference` for more information\nabout compiler options.\n\nFor debugging compilation, use SUBGRAPH_INFO=1 environment setting before\ncalling the compilation script. The extract subgraphs are preserved as hidden\nfiles in the run directory. For more information, see :ref:`neuron_gatherinfo`\n\n**MXNet 1.5**\n-------------\n\nMethod\n------\n\n\n.. code:: python\n\n  from mxnet.contrib import neuron\n  neuron.compile(sym, args, aux, inputs, **compile_args)\n\n\nArguments\n---------\n\n-  **sym** - Symbol object loaded from symbol.json file\n-  **args** - args/params dictionary loaded from params file\n-  **aux** - aux/params dictionary loaded from params file\n-  **inputs** - a dictionary with key/value mappings for input name to\n   input numpy arrays\n-  **kwargs** (optional) - a dictionary with key/value mappings for\n   MXNet-Neuron compilation and Neuron Compiler options.\n\n   -  For example, to limit the number of NeuronCores per subgraph, use\n      ``compile_args={'--neuroncore-pipeline-cores' : N}`` where N is an integer\n      representing the maximum number of NeuronCores per subgraph.\n   -  Additional compiler flags can be passed using\n      ``'flags' : [<flags>]`` where is a comma separated list of\n      strings. See :ref:`neuron_gatherinfo` for example of passing debug\n      flags to compiler.\n   -  Advanced option to exclude node names:\n      ``compile_args={'excl_node_names' : [<node names>]}`` where is a\n      comma separated list of node name strings.\n\nReturns\n-------\n\n-  **sym** - new partitioned symbol\n-  **args** - modified args/params\n-  **auxs** - modified aux/params\n\nExample Usage: Compilation\n--------------------------\n\nThe following is an example usage of the compilation, with default\ncompilation arguments:\n\n\n.. code:: python\n\n  from mxnet.contrib import neuron\n  ...\n  neuron.compile(sym, args, aux, inputs={'data' : img})\n\n\n\n**MXNet 1.8**\n-------------\n\n\nMethod\n------\n\n\n.. 
.. code:: python\n\n  import mx_neuron as neuron\n  neuron.compile(obj, args=None, aux=None, inputs=None, **compile_args)\n\n\nArguments\n---------\n\n-  **obj** - Symbol object loaded from symbol.json file or gluon.HybridBlock object\n-  **args** (optional) - args/params dictionary loaded from params file. Only needed in case of Symbol object\n-  **aux** (optional) - aux/params dictionary loaded from params file. Only needed in case of Symbol object\n-  **inputs** - a dictionary with key/value mappings for input name to\n   input numpy arrays.\n-  **kwargs** (optional) - a dictionary with key/value mappings for\n   MXNet-Neuron compilation and Neuron Compiler options.\n\n   -  For example, to limit the number of NeuronCores per subgraph, use\n      ``compile_args={'--neuroncore-pipeline-cores' : N}`` where N is an integer\n      representing the maximum number of NeuronCores per subgraph.\n   -  Additional compiler flags can be passed using\n      ``'flags' : [<flags>]`` where ``<flags>`` is a comma-separated list of\n      strings. See :ref:`neuron_gatherinfo` for an example of passing debug\n      flags to the compiler.\n   -  Advanced option to exclude node names:\n      ``compile_args={'excl_node_names' : [<node names>]}`` where ``<node names>`` is a\n      comma-separated list of node name strings.\n   -  work_dir: relative or absolute path for storing compiler artifacts (including params and jsons) generated\n      during compilation when SUBGRAPH_INFO=1.\n\nReturns\n-------\n- **(sym, args, auxs)** - for symbol object as input. sym, args and auxs are the new partitioned symbol, modified args/params and modified aux/params respectively.\n- **(obj)** - for gluon.HybridBlock object as input. obj is the partitioned and optimized gluon.HybridBlock object for the Neuron backend.\n\n\nExample Usage: Compilation\n--------------------------\n\nThe following is an example usage of the compilation, with default\ncompilation arguments for a symbol object:\n\n\n.. code:: python\n\n  import mx_neuron as neuron\n  ...\n  neuron.compile(sym, args, aux, inputs={'data' : img})\n\n\nThe following is an example usage of the compilation, with default\ncompilation arguments for a gluon.HybridBlock object (only supported in MXNet-Neuron 1.8):\n\n.. code:: python\n\n  import mx_neuron as neuron\n  ...\n  neuron.compile(obj, inputs={'data' : img})\n\n
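Given the method signature above, Neuron compiler options whose names are not valid Python identifiers can be forwarded by expanding a dictionary of options. The following is a minimal sketch (the value ``2`` for ``--neuroncore-pipeline-cores`` is only illustrative, and ``sym``, ``args``, ``aux`` and ``img`` are assumed to be defined as in the examples above):\n\n.. code:: python\n\n  import mx_neuron as neuron\n\n  # Forward Neuron compiler options through the keyword arguments\n  compile_args = {'--neuroncore-pipeline-cores': 2}\n  csym, cargs, cauxs = neuron.compile(sym, args, aux, inputs={'data' : img}, **compile_args)\n\n\nExample Usage: Extract Compilation Statistics\n---------------------------------------------\n\nTo extract operation counts, insert the following code after the compile\nstep (assume csym is the compiled MXNet symbol):\n\n.. 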
code:: python\n\n   import json\n\n   # Return list of nodes from MXNet symbol\n   def sym_nodes(sym):\n     return json.loads(sym.tojson())['nodes']\n\n   # Return number of operations in node list  \n   def count_ops(graph_nodes):\n     return len([x['op'] for x in graph_nodes if x['op'] != 'null'])\n\n   # Return triplet of compile statistics\n   # - count of operations in symbol database\n   # - number of Neuron subgraphs\n   # - number of operations compiled to Neuron runtime  \n   def get_compile_stats(sym):\n     cnt = count_ops(sym_nodes(sym))\n     neuron_subgraph_cnt = 0\n     neuron_compiled_cnt = 0\n     for g in sym_nodes(sym):\n       if g['op'] == '_neuron_subgraph_op':\n         neuron_subgraph_cnt += 1\n         for sg in g['subgraphs']:\n           neuron_compiled_cnt += count_ops(sg['nodes'])\n     return (cnt, neuron_subgraph_cnt, neuron_compiled_cnt)\n\n   original_cnt = count_ops(sym_nodes(sym))\n   post_compile_cnt, neuron_subgraph_cnt, neuron_compiled_cnt = get_compile_stats(csym)\n   print(\"INFO:mxnet: Number of operations in original model: \", original_cnt)\n   print(\"INFO:mxnet: Number of operations in compiled model: \", post_compile_cnt)\n   print(\"INFO:mxnet: Number of Neuron subgraphs in compiled model: \", neuron_subgraph_cnt)\n   print(\"INFO:mxnet: Number of operations placed on Neuron runtime: \", neuron_compiled_cnt)\n\n.. code:: bash\n\n   INFO:mxnet: Number of operations in original model:  67\n   INFO:mxnet: Number of operations in compiled model:  4\n   INFO:mxnet: Number of Neuron subgraphs in compiled model:  2\n   INFO:mxnet: Number of operations placed on Neuron runtime:  65\n"
  },
  {
    "path": "archive/mxnet-neuron/api-reference-guide.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAPI Reference Guide (mxnet-neuron)\n==================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /archive/mxnet-neuron/api-compilation-python-api\n\n\n.. include:: /archive/mxnet-neuron/api-reference-guide.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/api-reference-guide.txt",
    "content": "* :ref:`ref-mxnet-neuron-compilation-python-api`"
  },
  {
    "path": "archive/mxnet-neuron/developer-guide.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nDeveloper Guide\n===============\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /about-neuron/appnotes/mxnet-neuron/flex-eg\n\n\n.. include:: /archive/mxnet-neuron/developer-guide.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/developer-guide.txt",
    "content": "* :ref:`flexeg`"
  },
  {
    "path": "archive/mxnet-neuron/ec2-then-ec2-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/ec2-then-ec2-devflow.rst\n"
  },
  {
    "path": "archive/mxnet-neuron/index.rst",
    "content": "Neuron Apache MXNet Release Notes\n==============================================\n\n.. toctree::\n   :maxdepth: 1\n\n   /release-notes/archive/mxnet-neuron\n"
  },
  {
    "path": "archive/mxnet-neuron/inference-mxnet-neuron.rst",
    "content": ".. _inference-mxnet-neuron:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInference (mxnet-neuron) (maintenance)\n=======================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Tutorials </archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron>\n    API Reference Guide </archive/mxnet-neuron/api-reference-guide>\n    Developer Guide  </archive/mxnet-neuron/developer-guide>\n    Misc  </archive/mxnet-neuron/misc-mxnet-neuron>\n\n\n.. include:: inference-mxnet-neuron.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/inference-mxnet-neuron.txt",
    "content": ".. card:: Setup  (``mxnet-neuron``)\n            :link: setup-mxnet-neuron\n            :link-type: ref\n            :class-body: sphinx-design-class-title-small\n\n.. dropdown::  Tutorials\n        :class-title: sphinx-design-class-title-small\n        :animate: fade-in\n\n        .. include:: /archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt\n\n\n.. dropdown::  API Reference Guide\n        :class-title: sphinx-design-class-title-small\n        :class-body: sphinx-design-class-body-small\n        :animate: fade-in\n\n        .. include:: /archive/mxnet-neuron/api-reference-guide.txt\n\n\n.. dropdown::  Developer Guide\n        :class-title: sphinx-design-class-title-small\n        :class-body: sphinx-design-class-body-small\n        :animate: fade-in\n\n        .. include:: /archive/mxnet-neuron/developer-guide.txt\n\n\n.. dropdown::  Misc\n        :class-title: sphinx-design-class-title-small\n        :class-body: sphinx-design-class-body-small\n        :animate: fade-in\n\n        .. include:: /archive/mxnet-neuron/misc-mxnet-neuron.txt"
  },
  {
    "path": "archive/mxnet-neuron/misc-mxnet-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMisc (mxnet-neuron)\n===================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /archive/mxnet-neuron/troubleshooting-guide\n    What's New </release-notes/archive/mxnet-neuron>\n    /release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-mxnet\n\n\n.. include:: /archive/mxnet-neuron/misc-mxnet-neuron.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/misc-mxnet-neuron.txt",
    "content": "* :ref:`mxnet_troubleshooting_guide`\n* :ref:`What's New <mxnet-neuron-rn>`\n* :ref:`neuron-cc-ops-mxnet`"
  },
  {
    "path": "archive/mxnet-neuron/mxnet-neuron-setup.rst",
    "content": ".. _mxnet-setup:\n\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMXNet Neuron Setup\n==================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: mxnet-neuron-setup.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/mxnet-neuron-setup.txt",
    "content": "\n.. card:: MxNet Neuron (``mxnet-neuron``) Setup for  Inf1 Instances\n            :link: setup-mxnet-neuron\n            :link-type: ref\n            :class-body: sphinx-design-class-title-small"
  },
  {
    "path": "archive/mxnet-neuron/neo-then-hosting-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/neo-then-hosting-devflow.rst\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst",
    "content": ".. _mxnet-neuron-install-prev-al2:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous MXNet Neuron Releases for Amazon Linux (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.17.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.16.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-install-prev-al2023.rst",
    "content": ".. _mxnet-neuron-install-prev-al2023:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous MXNet Neuron Releases for Amazon Linux 2023 (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst",
    "content": "\n.. Install previous MXNet Neuron releases for Ubuntu 20.04 - archived\n\nUse the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-install-prev-u22.rst",
    "content": ".. _mxnet-neuron-install-prev-u22:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous MXNet Neuron Releases for Ubuntu 22 (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-install.rst",
    "content": ".. _install-neuron-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron\n=====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-al2-base-dlami.rst",
    "content": ".. _setup-mxnet-neuron-al2-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Amazon Linux 2\n=========================================================\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-amazon-linux-2/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Amazon Linux 2) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-al2.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-al2.rst",
    "content": ".. _setup-mxnet-neuron-al2:\n\n.. include:: /setup/install-templates/al2-python.rst\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Amazon Linux 2\n======================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n.. include:: /setup/install-templates/al2-python.rst\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Amazon Linux 2 AMI(HVM) - Kernel 5.10\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-al2.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-al2023.rst",
    "content": ".. _setup-mxnet-neuron-al2023:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Amazon Linux 2023\n=========================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Amazon Linux 2023 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-al2023.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2023.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20-base-dlami.rst",
    "content": ".. _setup-mxnet-neuron-u20-base-dlami:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Ubuntu 20\n================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instance sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Check for the latest version of the `DLAMI Base AMI <https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/>`_ and copy the AMI name that starts with \"Deep Learning Base Neuron AMI (Ubuntu 20.04) <latest_date>\" from \"AMI Name:\" section\n    * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-u20.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20.rst",
    "content": ".. _setup-mxnet-neuron-u20:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Ubuntu 20\n=================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Ubuntu Server 20 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-u20.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-neuron-ubuntu22.rst",
    "content": ".. _setup-mxnet-neuron-u22:\n\n.. card:: Select a Different Framework or Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\nMXNet Neuron (\"mxnet-neuron\") Setup on Ubuntu 22\n=================================================\n\n\n.. contents:: Table of contents\n\t:local:\n\t:depth: 2\n\n\nGet Started with Latest Release of MXNet Neuron (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provide links that will assist you to quickly start with a fresh installation of :ref:`install-neuron-mxnet`.\n\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n    * Select Ubuntu Server 20 AMI\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance \n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n.. include:: /includes/setup/tab-inference-mxnet-neuron-u22.txt\n\n.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u22.rst"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-update-u20.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. mxnet-neuron-u20-update:\n\nUpdate to latest MXNet Neuron  (``mxnet-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: MXNet 1.8.0\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n\n    .. tab-item:: MXNet 1.5.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/mxnet-update.rst",
    "content": ".. _update-neuron-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest MXNet Neuron\n===============================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. 
tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.14.2-mxnet-install.rst",
    "content": ".. _install-neuron-1.14.2-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron (Neuron 1.14.2)\n======================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.15.0-mxnet-install.rst",
    "content": ".. _install-neuron-1.15.0-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron (Neuron 1.15.0)\n======================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.15.1-mxnet-install.rst",
    "content": ".. _install-neuron-1.15.1-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron (Neuron 1.15.1)\n======================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.15.2-mxnet-install.rst",
    "content": ".. _install-neuron-1.15.2-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron (Neuron 1.15.2)\n======================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.16.3-mxnet-install.rst",
    "content": ".. _install-neuron-1.16.3-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron\n=====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.17.2-mxnet-install.rst",
    "content": ".. _install-neuron-1.17.2-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron\n=====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.18.0-mxnet-install.rst",
    "content": ".. _install-neuron-1.18.0-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron\n=====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/prev-releases/neuron-1.19.0-mxnet-install.rst",
    "content": ".. _install-neuron-1.19.0-mxnet:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall MXNet Neuron\n=====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. 
tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n\n\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1\n\n"
  },
  {
    "path": "archive/mxnet-neuron/setup/setup-inference",
    "content": "Setup Guide for Inf1\n====================\n\n.. toctree::\n    :maxdepth: 1\n\n    Fresh install </archive/mxnet-neuron/setup/mxnet-install>\n    Update to latest release </archive/mxnet-neuron/setup/mxnet-update>\n    Install previous releases </archive/mxnet-neuron/setup/mxnet-install-prev>"
  },
  {
    "path": "archive/mxnet-neuron/troubleshooting-guide.rst",
    "content": ".. _mxnet_troubleshooting_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTroubleshooting Guide for Neuron Apache MXNet \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nInference Runtime Error\n=======================\n\nOut-of-memory error when calling Symbol API bind() too many times\n-----------------------------------------------------------------\n\n.. important ::\n\n  ``NEURONCORE_GROUP_SIZES`` will no longer be supported starting Neuron 1.19.0 release if your application is using ``NEURONCORE_GROUP_SIZES`` please \n  see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details.\n\nIf you see out-of-memory error when using Symbol API's bind() function, please ensure that the bind() function is\ncalled once for each desired model instance. For example, on inf1.xlarge, use Symbol API to create 4 parallel \ninstances of a model that was compiled to 1 NeuronCore (--neuroncore-pipeline-cores=1), each is bound to an \ndifferent mx.neuron(i) context where i is the NeuronCore Group index ranging from 0 to 3. Then use 4 threads to feed\nthe 4 instances in parallel. For example:\n\n.. code:: python\n\n    NUM_PARALLEL = 4\n    os.environ['NEURONCORE_GROUP_SIZES'] = ','.join('1' for _ in range(NUM_PARALLEL))\n       \n    data_iter = []\n    for i in range(NUM_PARALLEL):\n        data_iter.append(mx.io.ImageRecordIter(\n            path_imgrec=recfile_base, data_shape=(3, 224, 224), batch_size=1,            \n            prefetch_buffer=1,\n            num_parts=NUM_PARALLEL, part_index=i))\n\n    sym, args, auxs = mx.model.load_checkpoint('resnet-50_compiled', 0)\n\n    exec_list = []\n    for i in range(NUM_PARALLEL):\n        exec = sym.bind(ctx=mx.neuron(i), args=args, aux_states=auxs, grad_req='null')\n        exec_list.append(exec)\n\n    def single_thread_infer(i):\n        for batch in data_iter[i]:\n            img = batch.data[0]\n            label = batch.label\n            feed_dict = {'data': img}\n            exe = exec_list[i]\n            exe.copy_params_from(feed_dict)\n            exe.forward()\n            out = exe.outputs[0]\n\n    future_list = []\n    with futures.ThreadPoolExecutor(max_workers=NUM_PARALLEL) as executor:\n        for i in range(NUM_PARALLEL):\n            future_list.append(executor.submit(single_thread_infer, i))\n\n\nInference crashed with MXNetError: InferShapeKeyword argument name xyz not found\n--------------------------------------------------------------------------------\n\nIf you see MXNetError:\n\n.. code:: bash\n\n    mxnet.base.MXNetError: [11:55:39] src/c_api/c_api_symbolic.cc:508: InferShapeKeyword argument name xyz not found.\"\n\nThis is followed by a list of \"Candidate arguments\". This list shows all the input argument names that the model knows about, and 'xyz' is not in the list. 
To fix this, remove entry xyz from the feed dictionary.\n\n\nInference crashed at mx.nd.waitall() with MXNetError: Check failed: bin.dtype() == mshadow::kUint8\n----------------------------------------------------------------------------------------------------\n\nThis error can occur when executing the Symbol API's forward function followed by mx.nd.waitall(); the MXNetError exception reports 'Check failed: bin.dtype() == mshadow::kUint8'.\n\n\nInference crashed with NRTD error 1002\n--------------------------------------\n\nDuring inference, the user may encounter an error with details \"[NRTD:infer_wait] error: 1002\":\n\n.. code:: bash\n\n    mxnet.base.MXNetError: [11:26:56] src/operator/subgraph/neuron/./neuron_util.h:1175: Check failed: rsp_wait.status().code() == 0 || rsp_wait.status().code() == 1003: Failed\n    Infer Wait with Neuron-RTD Error. Neuron-RTD Status Code: 1002, details: \"[NRTD:infer_wait] error: 1002\n    \"\n\nRuntime errors are listed in the Neuron Runtime return codes documentation. In particular, 1002 means that some invalid input has been submitted to infer, e.g. some input tensors are missing or input tensor sizes are incorrect. Please examine /var/log/syslog to see more details on the error. For example, you may see:\n\n.. code::\n\n    Oct 30 19:13:39 ip-172-31-93-131 nrtd[1125]: [TDRV:io_queue_prepare_input_nonhugetlb] Unexpected input size, for data00, expected: 2097152, received: 33554432\n\nThis means that the input tensor size is larger than what the model was compiled for (i.e. the example input tensor shapes passed during compilation).\n\n\nMulti-Model Server\n==================\n\n\nFailed to create NEURONCORE Group with GRPC Error. Status Error: 14, Error message: \"Connect Failed\"\n----------------------------------------------------------------------------------------------------\n\nNOTE: This error only applies to MXNet 1.5.\n\nIf the client is unable to start workers and you get a message that MMS is unable to create a NeuronCore Group,\nplease check that Neuron RTD is running (neuron-rtd process).\n\n.. code:: json\n\n    {\n    \"code\": 500,\n    \"type\": \"InternalServerException\",\n    \"message\": \"Failed to start workers\"\n    }\n\n.. code:: bash\n\n    2019-10-23 19:56:23,187 [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [19:56:23] src/operator/subgraph/inferentia/./inferentia_util.h:218: Check failed: status.ok() Failed to create NeuronCore Group with GRPC Error. Status Error: 14, Error message: \"Connect Failed\"\n\nMultiple MMS workers die with \"Backend worker process die.\" message\n--------------------------------------------------------------------\n\n.. important ::\n\n  ``NEURONCORE_GROUP_SIZES`` will no longer be supported starting with the Neuron 1.19.0 release. If your application is using ``NEURONCORE_GROUP_SIZES``, please\n  see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details.\n\nIf you run inference with MMS and get multiple \"Backend worker process die\" messages, please ensure that the number of workers (\"initial_workers\") passed when loading the model is less than or equal to the number of NeuronCores available divided by the number of NeuronCores required by the model.\n\n.. 
code:: bash\n\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Backend worker process die.\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File \"/usr/local/lib/python3.6/site-packages/mxnet/symbol/symbol.py\", line 1524, in simple_bind\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ctypes.byref(exe_handle)))\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File \"/usr/local/lib/python3.6/site-packages/mxnet/base.py\", line 252, in check_call\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise MXNetError(py_str(_LIB.MXGetLastError()))\n    com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mxnet.base.MXNetError: [00:26:32] src/operator/subgraph/neuron/./neuron_util.h:221: Check failed: 0 == create_eg_rsp.status().code() Failed to create NeuronCore Group with KRTD Error. KRTD Status Code: 4, details: \"\"\n\nAs indicated in :ref:`appnote-performance-tuning`, for greater flexibility, users can use NEURONCORE_GROUP_SIZES to specify the groupings of NeuronCores into Neuron devices, each device consisting of one or more NeuronCores. Each worker takes one device. The total number of NeuronCores taken by all the workers should be less than or equal to the total number of NeuronCores visible to neuron-rtd. This situation should be considered at full load (MMS scales up to max_workers). Additionally, to properly assign a model to a Neuron device, the environment variable NEURONCORE_GROUP_SIZES must be specified within the model server class (i.e. mxnet_model_service.py in the example above). For example, add the following line within mxnet_model_service.py for a model compiled to 1 NeuronCore:\n\n.. code:: python\n\n    os.environ['NEURONCORE_GROUP_SIZES'] = '1'\n\nMore information about the max_worker limit setting can be found in the `MMS Management API Documentation`_. For example, to run up to 4 workers on inf1.xlarge, where 4 NeuronCores are available by default to Neuron-RTD, set max_worker to 4:\n\n.. _MMS Management API Documentation: https://github.com/awslabs/multi-model-server/blob/master/docs/management_api.md#user-content-scale-workers\n\n.. code:: bash\n\n    curl -v -X PUT \"http://localhost:8081/models/squeezenet_v1.1_compiled?min_worker=1&max_worker=4\"\n\nMMS throws a \"mxnet.base.MXNetError: array::at\" error\n-----------------------------------------------------\n\nIf you see \"mxnet.base.MXNetError: array::at\" when running MMS, please check that the NDArray/Gluon API is not used, as it is not supported in MXNet-Neuron.\nIf you would like to use the NDArray or Gluon API, please upgrade to MXNet 1.8.\n\n.. 
code:: bash\n\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - array::at\n    [INFO ] W-9000-squeezenet_v1.1_compiled com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 30\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File \"/tmp/models/6606fa046f68a34df87f15362a7a2d9a49749878/model_handler.py\", line 82, in handle\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     data = self.inference(data)\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File \"/tmp/models/6606fa046f68a34df87f15362a7a2d9a49749878/mxnet_model_service.py\", line 153, in inference\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     d.wait_to_read()\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File \"/home/user/regression_venv_p3.6/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py\", line 1819, in wait_to_read\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     check_call(_LIB.MXNDArrayWaitToRead(self.handle))\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File \"/home/user/regression_venv_p3.6/lib/python3.6/site-packages/mxnet/base.py\", line 253, in check_call\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise MXNetError(py_str(_LIB.MXGetLastError()))\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mxnet.base.MXNetError: array::at\n    [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Invoking custom service failed.\n\nMXNet Model Server is not able to clean up Neuron RTD states after model is unloaded\n--------------------------------------------------------------------------------------\n\nNOTE: This issue is resolved in version 1.5.1.1.1.88.0, released 11/17/2020, and only applies to MXNet 1.5.\n\nMXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server. Restarting the model server may fail with a \"Failed to create NEURONCORE_GROUP\" error:\n\n.. code:: bash\n\n    mxnet.base.MXNetError: [00:26:59] src/operator/subgraph/neuron/./neuron_util.h:348: Check failed:    0 == create_eg_rsp.status().code(): Failed to create NEURONCORE_GROUP with Neuron-RTD Error. Neuron-RTD Status Code: 9, details: \"\"\n\nThe workaround is to run ``/opt/aws/neuron/bin/neuron-cli reset`` to clear Neuron RTD states after all models are unloaded and the server is shut down, before restarting the model server.\n\nPipeline mode is not able to execute inference requests in parallel\n--------------------------------------------------------------------\n\nIf you see that multiple executors in a neuron pipeline setup (one model compiled for more than one NeuronCore using the ``--neuroncore-pipeline-cores`` option during compilation) are not running in parallel, please set the following MXNet environment variable before inference so that MXNet can execute the CPU ops in parallel; otherwise execution is sequential and stalls the executors.\n\n``MXNET_CPU_WORKER_NTHREADS`` is used to do that. 
Setting its value to the number of ``__subgraph_opt_neuroncore__`` entries in the compiled model json will ensure that all the executors (threads) can run in parallel.\n\n\nFeatures only in MXNet-Neuron 1.5\n---------------------------------\n- Shared memory for IFMaps transfer to neuron runtime (has higher performance compared to GRPC mode)\n- Neuron profiling using MXNet\n\nFeatures only in MXNet-Neuron 1.8\n---------------------------------\n- Gluon API support\n- Library mode neuron runtime\n"
  },
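For the pipeline-parallelism workaround described in the archived troubleshooting guide above, a minimal sketch of applying the setting is shown below. The value ``4`` is an assumption for illustration; use the number of parallel executors for your own compiled model (for example, the count of ``__subgraph_opt_neuroncore__`` entries in the compiled model json), and set the variable before MXNet runs any inference.

.. code:: python

   import os

   # Assumed executor count for illustration; match it to your compiled model.
   NUM_EXECUTORS = 4

   # Must be set before MXNet executes the graph so that the CPU ops feeding
   # the pipeline executors run in parallel instead of sequentially.
   os.environ['MXNET_CPU_WORKER_NTHREADS'] = str(NUM_EXECUTORS)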
  {
    "path": "archive/mxnet-neuron/tutorials/mxnet-tutorial-setup.rst",
    "content": ".. _mxnet-tutorial-setup:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMXNet Tutorial Setup\n====================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n#. Launch an Inf1.6xlarge Instance:\n    .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n\n#. Set up a development environment:\n    * Enable or install MXNet-Neuron: :ref:`install-neuron-mxnet`.\n    \n\n#. Run tutorial in Jupyter notebook:\n    * Follow instruction at :ref:`Setup Jupyter notebook <setup-jupyter-notebook-steps-troubleshooting>` to:\n    \n      #. Start the Jupyter Notebook on the instance\n      #. Run the Jupyter Notebook from your local browser\n\n    * Connect to the instance from the terminal, clone the Neuron Github repository to the Inf1 instance and then change the working directory to the tutorial directory:\n\n      .. code::\n\n        git clone https://github.com/aws/aws-neuron-sdk.git\n        cd aws-neuron-sdk/src/examples/mxnet\n\n    * Locate the tutorial notebook file (.ipynb file) under ``aws-neuron-sdk/src/examples/mxnet``\n    * From your local browser, open the tutorial notebook from the menu and follow the instructions.\n"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorial-model-serving.rst",
    "content": ".. _mxnet-neuron-model-serving:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTutorial: Neuron Apache MXNet Model Serving\n=============================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThis MXNet Neuron Model Serving (MMS) example is adapted from the MXNet\nvision service example which uses pretrained squeezenet to perform image\nclassification:\nhttps://github.com/awslabs/multi-model-server/tree/master/examples/mxnet_vision.\n\nBefore starting this example, please ensure that Neuron-optimized MXNet\nversion mxnet-neuron is installed along with Neuron Compiler.\n\nWarning\n*******\nIf you are using MXNet-1.5, please note that MXNet-1.5 entered maintenance mode and require Neuron Runtime 1.x, please see :ref:`maintenance_mxnet_1_5`.\nTo setup development environment for MXNet-1.5 see installation instructions at :ref:`mxnet-setup`.\n\nIf using DLAMI, you can activate the environment aws_neuron_mxnet_p36\nand skip the installation part in the first step below.\n\n1. First, install Java runtime and multi-model-server:\n\n.. code:: bash\n\n   cd ~/\n   # sudo dnf -y install -q jre # for AL2023\n   sudo apt-get install -y -q default-jre  # for Ubuntu\n   pip install multi-model-server\n\nDownload the example code:\n\n.. code:: bash\n\n   git clone https://github.com/awslabs/multi-model-server\n   cd ~/multi-model-server/examples/mxnet_vision\n\n2. Compile ResNet50 model to Inferentia target by saving the following\n   Python script to compile_resnet50.py and run\n   “\\ ``python compile_resnet50.py``\\ ”\n\n.. code:: python\n\n\n   from packaging import version\n   import numpy as np\n   import mxnet as mx\n   \n   mxnet_version = version.parse(mx.__version__)\n   if mxnet_version >= version.parse(\"1.8\"):\n      import mx_neuron as neuron\n   else: \n      from mxnet.contrib import neuron\n\n   path='http://data.mxnet.io/models/imagenet/'\n   mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\n   mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\n   mx.test_utils.download(path+'synset.txt')\n\n   nn_name = \"resnet-50\"\n\n   #Load a model\n   sym, args, auxs = mx.model.load_checkpoint(nn_name, 0)\n\n   #Define compilation parameters\n   #  - input shape and dtype\n   inputs = {'data' : mx.nd.zeros([1,3,224,224], dtype='float32') }\n\n   # compile graph to inferentia target\n   csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs)\n\n   # save compiled model\n   mx.model.save_checkpoint(nn_name + \"_compiled\", 0, csym, cargs, cauxs)\n\n3. Prepare signature file ``signature.json`` to configure the input name\n   and shape:\n\n.. code:: json\n\n   {\n     \"inputs\": [\n       {\n         \"data_name\": \"data\",\n         \"data_shape\": [\n           1,\n           3,\n           224,\n           224\n         ]\n       }\n     ]\n   }\n\n4. Prepare ``synset.txt`` which is a list of names for ImageNet\n   prediction classes:\n\n.. code:: bash\n\n   curl -O https://s3.amazonaws.com/model-server/model_archive_1.0/examples/squeezenet_v1.1/synset.txt\n\n5. Create custom service class following template in\n   model_server_template folder:\n\n.. 
code:: bash\n\n   cp -r ../model_service_template/* .\n\nEdit ``mxnet_model_service.py`` to use the appropriate context. \n\nMake the following change:\n\n.. code:: bash\n\n   from packaging import version\n   \n   mxnet_version = version.parse(mx.__version__)\n   if mxnet_version >= version.parse(\"1.8\"):\n      import mx_neuron as neuron\n   self.mxnet_ctx = mx.neuron()\n\nComment out the existing context set:\n\n.. code:: bash\n\n   #self.mxnet_ctx = mx.cpu() if gpu_id is None else mx.gpu(gpu_id)\n\nAlso, comment out unnecessary data copy for model_input in\n``mxnet_model_service.py``:\n\n.. code:: bash\n\n   #model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]\n\n6. Package the model with model-archiver:\n\n.. code:: bash\n\n   cd ~/multi-model-server/examples\n   model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle\n\n7. Start MXNet Model Server (MMS) and load model using RESTful API.\n   Please ensure that Neuron RTD is running with default settings (see\n   Neuron Runtime Getting Started):\n\n.. code:: bash\n\n   cd ~/multi-model-server/\n   multi-model-server --start --model-store examples\n   # Pipe to log file if you want to keep a log of MMS\n   curl -v -X POST \"http://localhost:8081/models?initial_workers=1&max_workers=1&synchronous=true&url=resnet-50_compiled.mar\"\n   sleep 10 # allow sufficient time to load model\n\nEach worker requires a NeuronCore group that can accommodate the compiled\nmodel. Additional workers can be added by increasing max_workers\nconfiguration as long as there are enough NeuronCores available. Use\n``neuron-top`` to see which models are loaded on specific NeuronCores.\n\n8. Test inference using an example image:\n\n.. code:: bash\n\n   curl -O https://raw.githubusercontent.com/awslabs/multi-model-server/master/docs/images/kitten_small.jpg\n   curl -X POST http://127.0.0.1:8080/predictions/resnet-50_compiled -T kitten_small.jpg\n\nYou will see the following output:\n\n.. code:: bash\n\n   [\n     {\n       \"probability\": 0.6375716328620911,\n       \"class\": \"n02123045 tabby, tabby cat\"\n     },\n     {\n       \"probability\": 0.1692783385515213,\n       \"class\": \"n02123159 tiger cat\"\n     },\n     {\n       \"probability\": 0.12187337130308151,\n       \"class\": \"n02124075 Egyptian cat\"\n     },\n     {\n       \"probability\": 0.028840631246566772,\n       \"class\": \"n02127052 lynx, catamount\"\n     },\n     {\n       \"probability\": 0.019691042602062225,\n       \"class\": \"n02129604 tiger, Panthera tigris\"\n     }\n   ]\n\n9. To cleanup after test, issue a delete command via RESTful API and\n   stop the model server:\n\n.. code:: bash\n\n   curl -X DELETE http://127.0.0.1:8081/models/resnet-50_compiled\n\n   multi-model-server --stop\n"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorials-mxnet-computervision.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nComputer Vision Tutorials (``mxnet-neuron``)\n============================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n* ResNet-50 tutorial :ref:`[html] </src/examples/mxnet/resnet50/resnet50.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50/resnet50.ipynb>`\n* Model Serving tutorial :ref:`[html] <mxnet-neuron-model-serving>`\n* Getting started with Gluon tutorial :ref:`[html] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>` :github:`[notebook] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>`\n"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTutorials  (``mxnet-neuron``)\n=============================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Computer Vision Tutorials </archive/mxnet-neuron/tutorials/tutorials-mxnet-computervision>\n    Natural Language Processing (NLP) Tutorials </archive/mxnet-neuron/tutorials/tutorials-mxnet-nlp>\n    Utilizing Neuron Capabilities Tutorials </archive/mxnet-neuron/tutorials/tutorials-mxnet-utilizing-neuron-capabilities>\n\n\n.. include:: /archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt\n"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt",
    "content": ".. tab-set::\n\n    .. tab-item:: Computer Vision Tutorials\n                :name:\n\n                * ResNet-50 tutorial :ref:`[html] </src/examples/mxnet/resnet50/resnet50.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50/resnet50.ipynb>`\n                * Model Serving tutorial :ref:`[html] <mxnet-neuron-model-serving>`\n                * Getting started with Gluon tutorial :ref:`[html] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>` :github:`[notebook] </src/examples/mxnet/mxnet-gluon-tutorial.ipynb>`\n\n\n    .. tab-item:: Natural Language Processing (NLP) Tutorials\n                :name:\n\n                * MXNet 1.8: Using data parallel mode tutorial :ref:`[html] </src/examples/mxnet/data_parallel/data_parallel_tutorial.ipynb>` :mxnet-neuron-src:`[notebook] <data_parallel/data_parallel_tutorial.ipynb>`\n\n\n    .. tab-item:: Utilizing Neuron Capabilities Tutorials\n                :name:\n\n                * NeuronCore Groups tutorial :ref:`[html] </src/examples/mxnet/resnet50_neuroncore_groups.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50_neuroncore_groups.ipynb>`\n\n\n.. note::\n\n        To use Jupyter Notebook see:\n\n        * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n        * :ref:`running-jupyter-notebook-as-script`"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorials-mxnet-nlp.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nNatural Language Processing (NLP) Tutorials (``mxnet-neuron``)\n==============================================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n* MXNet 1.8: Using data parallel mode tutorial :ref:`[html] </src/examples/mxnet/data_parallel/data_parallel_tutorial.ipynb>` :mxnet-neuron-src:`[notebook] <data_parallel/data_parallel_tutorial.ipynb>`\n\n"
  },
  {
    "path": "archive/mxnet-neuron/tutorials/tutorials-mxnet-utilizing-neuron-capabilities.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUtilizing Neuron Capabilities Tutorials (``mxnet-neuron``)\n==========================================================\n\n.. warning::\n\n   This document is archived. MXNet is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n* NeuronCore Groups tutorial :ref:`[html] </src/examples/mxnet/resnet50_neuroncore_groups.ipynb>` :mxnet-neuron-src:`[notebook] <resnet50_neuroncore_groups.ipynb>`\n\n"
  },
  {
    "path": "archive/neuronperf/index.rst",
    "content": ".. _neuronperf:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n=================\nNeuronPerf (Beta)\n=================\n\nNeuronPerf is a lightweight Python library with a simple API that enables fast measurements of performance when running models using Neuron.\n\n.. _neuronperf_quickstart:\n\nNeuronPerf Quickstart\n---------------------\n\nTo install NeuronPerf in your Neuron environment, execute:\n\n.. code:: bash\n\n  $ pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n\nRefer to the :ref:`neuronperf_examples` and :ref:`neuronperf_user_guide` to get started.\n\n\n.. _neuronperf_user_guide:\n\nNeuronPerf User Guide\n---------------------\n\n.. toctree::\n   :maxdepth: 1\n\n   Overview <neuronperf_overview>\n   Terminology <neuronperf_terminology>\n   Examples <neuronperf_examples>\n   Benchmark Guide <neuronperf_benchmark_guide>\n   Evaluate Guide <neuronperf_evaluate_guide>\n   Compile Guide <neuronperf_compile_guide>\n   Model Index Guide <neuronperf_model_index_guide>\n\n\nNeuronPerf API Reference\n------------------------\n\n.. toctree::\n   :maxdepth: 1\n\n   API <neuronperf_api>\n   Framework Notes <neuronperf_framework_notes>\n\n\nFAQ\n---\n\n.. toctree::\n   :maxdepth: 1\n\n   FAQ <neuronperf_faq>\n\n\nTroubleshooting\n---------------\n\n.. toctree::\n   :maxdepth: 1\n\n   Troubleshooting <neuronperf_troubleshooting>\n\n\nRelease Notes\n-------------\n\n.. toctree::\n   :maxdepth: 1\n\n   rn\n\n\n\n\n\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_api.rst",
    "content": ".. _neuronperf_api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf API\n==============\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. note::\n    Due to a bug in Sphinx, some of the type annotations may be incomplete. \n\n.. py:function:: compile(compile_fn, model, inputs, batch_sizes: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, performance_levels: Union[str, List[int]] = None, models_dir: str = \"models\", filename: str = None, compiler_args: dict = None, verbosity: int = 1, *args, **kwargs) -> str:\n\n    Compiles the provided model with each provided example input, pipeline size, and performance level.\n    Any additional compiler_args passed will be forwarded to the compiler on every invocation.\n\n    :param model: The model to compile.\n    :param list inputs: A list of example inputs.\n    :param batch_sizes: A list of batch sizes that correspond to the example inputs.\n    :param pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.\n    :param performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default).  See :ref:`neuron-cc-training-mixed-precision`.\n    :param str models_dir: The directory where compilation artifacts will be stored.\n    :param str model_name: An optional model name tag to apply to compiled artifacts.\n    :param str filename: The name of the model index to write out. If not provided, a name will be generated and returned.\n    :param dict compiler_args: Additional compiler arguments to be forwarded with every compilation.\n    :param int verbosity: 0 = error, 1 = info, 2 = debug\n    :return: A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.\n    :rtype: str\n\n.. _neuronperf_api_benchmark:\n\n\n.. py:function:: benchmark(load_fn: Callable[[str, int], Any], model_filename: str, inputs: Any, batch_sizes: Union[int, List[int]] = None, duration: float = BENCHMARK_SECS, n_models: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, cast_modes: Union[str, List[str]] = None, workers_per_model: Union[int, None] = None, env_setup_fn: Callable[[int, Dict], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, verbosity: int = 1, multiprocess: bool = True, multiinterpreter: bool = False, return_timers: bool = False, device_type: str = \"neuron\") -> List[Dict]:\n\n    Benchmarks the model index or individiual model using the provided inputs.\n    If a model index is provided, additional fields such as ``pipeline_sizes`` and\n    ``performance_levels`` can be used to filter the models to benchmark. The default\n    behavior is to benchmark all configurations in the model index.\n\n    :param load_fn: A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. ``neuronperf.torch.benchmark``).\n    :param str model_filename: A path to a model index from compile or path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. 
``MyModelClass``).\n    :param list inputs: A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.\n    :param batch_sizes: A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.\n    :param float duration: The number of seconds to benchmark each model.\n    :param n_models: The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from ``device_type``, instance size, or other environment state.\n    :param pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.\n    :param performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See :ref:`neuron-cc-training-mixed-precision`.\n    :param workers_per_model: The number of workers to use per model loaded. If ``None``, this is automatically selected.\n    :param env_setup_fn: A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.\n    :param setup_fn: A function that receives the benchmarker id, config, and model to perform last minute configuration before inference.\n    :param preprocess_fn: A custom preprocessing function to perform on each input before inference.\n    :param postprocess_fn: A custom postprocessing function to perform on each input after inference.\n    :param bool multiprocess: When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.\n    :param bool multiinterpreter: When True, benchmarking is performed in a new python interpreter per model. All parameters must be serializable. Overrides multiprocess.\n    :param bool return_timers: When True, the return of this function is a list of tuples ``(config, results)`` with detailed information. This can be converted to reports with ``get_reports(results)``.\n    :param float stats_interval: Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.\n    :param str device_type: This will be set automatically to one of the ``SUPPORTED_DEVICE_TYPES``.\n    :param float cost_per_hour: The price of this device / hour. Used to estimate cost / 1 million infs in reports.\n    :param str model_name: A friendly name for the model to use in reports.\n    :param str model_class_name: Internal use.\n    :param str model_class_file: Internal use.\n    :param int verbosity: 0 = error, 1 = info, 2 = debug\n    :return: A list of benchmarking results.\n    :rtype: list[dict]\n\n\n.. py:function:: get_reports(results)\n\n   Summarizes and combines the detailed results from ``neuronperf.benchmark``, when run with ``return_timers=True``. One report dictionary is produced per model configuration benchmarked. The list of reports can be fed directly to other reporting utilities, such as ``neuronperf.write_csv``.\n\n   :param list[tuple] results: The list of results from ``neuronperf.benchmark``.\n   :param list[int] batch_sizes: The batch sizes that correspond to the `inputs` provided to ``compile`` and ``benchmark``. Used to correct throughput values in the reports.\n   :return: A list of dictionaries that summarize the results for each model configuration.\n   :rtype: list[dict]\n\n.. 
py:function:: print_reports(reports, cols=SUMMARY_COLS, sort_by=\"throughput_peak\", reverse=False)\n\n    Print a report to the terminal.\n    Example of default behavior:\n\n    >>> neuronperf.print_reports(reports)\n    throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size  workers_per_model batch_size model_filename\n    329.667        6.073          6.109          1        1              2                 1          models/model_b1_p1_83bh3hhs.pt\n\n    :param reports: Results from `get_reports`.\n    :param cols: The columns in the report to be displayed.\n    :param sort_by: Sort the cols by the specified key.\n    :param reverse: Sort order.\n\n.. py:function:: write_csv(reports: list[dict], filename: str = None, cols=REPORT_COLS)\n\n    Write benchmarking reports to CSV file.\n\n    :param list[dict] reports: Results from `neuronperf.get_reports`.\n    :param str filename: Filename to write. If not provided, generated from model_name in report and current timestamp.\n    :param list[str] cols: The columns in the report to be kept.\n    :return: The filename written.\n    :rtype: str\n\n.. py:function:: write_json(reports: list[dict], filename: str = None)\n\n    Writes benchmarking reports to a JSON file.\n\n\t:param list[dict] reports: Results from `neuronperf.get_reports`.\n\t:param str filename: Filename to write. If not provided, generated from model_name in report and current timestamp.\n\t:return: The filename written.\n\t:rtype: str\n\n\n.. py:function:: model_index.append(*model_indexes: Union[str, dict]) -> dict:\n\n    Appends the model indexes non-destructively into a new model index, without\n    modifying any of the internal data.\n\n    This is useful if you have benchmarked multiple related models and wish to\n    combine their respective model indexes into a single index.\n\n    Model name will be taken from the first index provided.\n    Duplicate configs will be filtered.\n\n    :param model_indexes: Model indexes or paths to model indexes to combine.\n    :return: A new dictionary representing the combined model index.\n    :rtype: dict\n\n\n.. py:function:: model_index.copy(old_index: Union[str, dict], new_index: str, new_dir: str) -> str:\n\n    Copy an index to a new location. Will rename ``old_index``\n    to ``new_index`` and copy all model files into ``new_dir``,\n    updating the index paths.\n\n    This is useful for pulling individual models out of a pool.\n\n    Returns the path to the new index.\n\n\n.. py:function:: model_index.create(filename, input_idx=0, batch_size=1, pipeline_size=1, cast_mode=DEFAULT_CAST, compile_s=None)\n\n    Create a new model index from a pre-compiled model.\n\n    :param str filename: The path to the compiled model.\n    :param int input_idx: The index in your inputs that this model should be run on.\n    :param int batch_size: The batch size at compilation for this model.\n    :param int pipeline_size: The pipeline size used at compilation for this model.\n    :param str cast_mode: The casting option this model was compiled with.\n    :param float compile_s: Seconds spent compiling.\n    :return: A new dictionary representing a model index.\n    :rtype: dict\n\n\n.. py:function:: model_index.delete(filename: str):\n\n    Deletes the model index and all associated models referenced by the index.\n\n\n.. 
py:function:: model_index.filter(index: Union[str, dict], **kwargs) -> dict:\n\n    Filters provided model index on provided criteria and returns a new index.\n    Each kwarg is a standard (k, v) pair, where k is treated as a filter name\n    and v may be one or more values used to filter model configs.\n\n\n.. py:function:: model_index.load(filename) -> dict:\n\n    Load a NeuronPerf model index from a file.\n\n\n.. py:function:: model_index.move(old_index: str, new_index: str, new_dir: str) -> str:\n\n    This is the same as ``copy`` followed by ``delete`` on the old index.\n\n\n.. py:function:: model_index.save(model_index, filename: str = None, root_dir=None) -> str:\n\n    Save a NeuronPerf model index to a file.\n\n\n"
  },
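As a quick orientation to the NeuronPerf API documented above, the following is a minimal sketch of benchmarking an already compiled model and saving the reports. The filenames and input shape are placeholders, and only calls described in this reference are used (``benchmark`` via the framework subpackage, ``print_reports``, and ``write_csv``).

.. code:: python

   import torch

   import neuronperf as npf
   import neuronperf.torch  # framework subpackage that supplies the load function

   # Placeholder example input matching the shape the model was compiled for.
   inputs = [torch.zeros(1, 3, 224, 224)]

   # 'model_neuron_b1.pt' is a placeholder path to a compiled model artifact.
   reports = npf.torch.benchmark('model_neuron_b1.pt', inputs, batch_sizes=[1], duration=30)

   # Summarize in the terminal and keep a CSV copy of the reports.
   npf.print_reports(reports)
   npf.write_csv(reports, 'benchmark_results.csv')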
  {
    "path": "archive/neuronperf/neuronperf_benchmark_guide.rst",
    "content": ".. _neuronperf_benchmark_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n==========================\nNeuronPerf Benchmark Guide\n==========================\n\nThe call to ``neuronperf[torch/tensorflow/mxnet/cpu].benchmark`` is used to measure your model performance. It will choose reasonable defaults if none are provided, and will return back reports that summarize the benchmarking results.\n\nWhat is the default behavior of ``benchmark``?\n----------------------------------------------\n\nThat will depend how you provided your model and how your model was compiled.\n\nThe two most common ways to provide your model are:\n\n#. Provide the path to your compiled model\n#. Provide the path to a model index from ``neuronperf.compile`` (a JSON file)\n\n\nData Parallel\n~~~~~~~~~~~~~\n\nYour model is benchmarked on provided ``inputs`` in 4 different configurations:\n   #. A single model on 1 NeuronCore with one worker (min. latency)\n   #. A single model on 1 NeuronCore with two workers (max. throughput / NC)\n   #. ``MAX`` models on ``MAX`` NeuronCores with one worker (min. latency + max. instance usage)\n   #. ``MAX`` models on ``MAX`` NeuronCores with two workers (max. throughput + max. instance usage)\n\nThe value ``MAX`` is automatically determined by your instance size. If it can't be identified, those configurations will be skipped.\n\nThe primary benefit of (3) and (4) is to verify that your model scales well at maximum instance usage.\n\n.. note::\n\n   If you provided the path to a model index from ``compile``:\n      * Your input parameters to ``benchmark`` (``batch_sizes``, etc.) are treated as filters on the index\n      * Each remaining model configuration is benchmarked as described in (1)\n\n\nPipeline\n~~~~~~~~\n\nPipeline mode is active when using a Neuron device and ``pipeline_sizes > 1``. The same behavior as described in Data Parallel applies, except that only one worker configuration is executed: the optimal number of workers for your pipeline size, unless manually overridden.\n\n\nParameters\n----------\n\nBelow are some useful and common parameters to tweak. Please see the :ref:`neuronperf_api` for full details.\n\n* ``n_models`` controls how many models to load. The default behavior is ``n_models=[1, MAX]``.\n* ``workers_per_model`` controls how many worker threads will be feeding inputs to each model. The default is automatically determined.\n* ``pipeline_sizes`` tells the benchmarker how many cores are needed for your model so that each model instance can be loaded properly. Default is 1.\n* ``duration`` controls how long to run each configuration.\n* ``batch_sizes`` is used to inform the benchmarker of your input shape so that throughput can be computed correctly.\n\nAlmost all NeuronPerf behaviors are controllable via arguments found in the :ref:`neuronperf_api`. This guide attempts to provide some context and examples for those arguments.\n\nInputs\n------\n\nModels accept one or more inputs to operate on. Since NeuronPerf needs to support multiple inputs for multiple models, as well as multi-input models, there are some details that may need your attention. See the :ref:`neuronperf_framework_notes` for details.\n\nMulti-input Models\n~~~~~~~~~~~~~~~~~~\n\nIf your model accepts multiple inputs, you must provide them in a ``tuple``. For example, suppose you have a model like this:\n\n.. 
code:: python\n\n\n\tclass Model(torch.nn.Module):\n\t\tdef forward(self, x, y, z):\n\t\t\t...\n\t\t\treturn output\n\n\nIn order for NeuronPerf to pass along your multiple inputs correctly, you should provide them as a ``tuple``:\n\n.. code:: python\n\n\tinputs = (x, y, z)\n\tnpf.torch.benchmark(model_filename, inputs, ...)\n\nIf you are compiling and/or benchmarking multiple models, you can pass different-sized inputs as a list of tuples:\n\n.. code:: python\n\n\tinputs = [(x1, y1, z1), (x2, y2, z2), ...]\n\tnpf.torch.benchmark(model_filename, inputs, ...)\n\n\nPreprocessing and Postprocessing\n--------------------------------\n\nMany models have additional preprocessing and postprocessing steps that may add non-negligible overhead to inference time. NeuronPerf supports these use cases through the use of custom functions.\n\nPreprocessing\n~~~~~~~~~~~~~\n\nRecall that NeuronPerf expects (or wraps) each model input into a ``tuple``. These tuples will be unpacked before calling your model.\n\nHere is an example for a model with one input. The example multiplies the input by 5 before inference.\n\n.. code:: python\n\n    def preprocess_fn(x):\n        return x * 5\n\n    ...\n\n    # Benchmark with custom preprocessing function\n    reports = npf.torch.benchmark(\n            filename,\n            inputs,\n            ...,\n            preprocess_fn = preprocess_fn,\n    )\n\nOr if your model expects multiple inputs:\n\n.. code:: python\n\n    def preprocess_fn(x, y, z):\n        return x / 255, y / 255, z / 255\n\n    ...\n\n    # Benchmark with custom preprocessing function\n    reports = npf.torch.benchmark(\n            filename,\n            inputs,\n            ...,\n            preprocess_fn = preprocess_fn,\n    )\n\nPostprocessing\n~~~~~~~~~~~~~~\n\nPostprocessing is almost identical to preprocessing, except that your function will receive whatever the output of your model is, exactly as returned without modification. There are no type guarantees.\n\n.. code:: python\n\n   def postprocess_fn(x):\n      return x.argmax()\n\n   ...\n\n   # Benchmark with custom postprocessing function\n   reports = npf.torch.benchmark(\n         filename,\n         inputs,\n         ...,\n         postprocess_fn = postprocess_fn,\n   )\n\nMinimal Latency\n---------------\n\nSuppose you are interested in the minimal latency achievable with your model. In this case, there is no need for more than one worker to execute at a time. We can manually specify the number of workers to use. See :ref:`neuronperf_worker_threads` below.\n\n\n.. _neuronperf_worker_threads:\n\nWorker Threads\n--------------\n\nThe argument ``workers_per_model`` controls the number of worker threads that are trying to prepare and load examples onto a single NeuronCore at a time. Therefore, a value of 1 corresponds to 1 thread / model. If ``n_models=16``, then there would be 16 worker threads, one per model. This number is selected based upon whether you are using DataParallel (i.e. ``pipeline_sizes == 1``), or Pipeline Mode (``pipeline_sizes != 1``).\n\nBy default, NeuronPerf will try multiple combinations of model copies and workers. You may be interested in controlling this manually.\n\n.. code:: python\n\n   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., workers_per_model=1)\n\n\nYou may also pass a list, as with other parameters:\n\n.. 
code:: python\n\n   workers_per_model = [1, 2] # Same as the default for data parallel\n   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., workers_per_model=workers_per_model)\n\nWith the default number of :ref:`neuronperf_model_copies`, a call to ``print_reports`` might look like this:\n\n.. code:: bash\n\n   throughput_avg latency_ms_p50 latency_ms_p99 n_models       pipeline_size  workers_per_model batch_size     model_filename\n   307.25         3.251          3.277          1              1              1                 1              models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt\n   2746.0         5.641          6.82           16             1              1                 1              models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt\n   329.5          6.053          6.108          1              1              2                 1              models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt\n   2809.0         10.246         12.52          16             1              2                 1              models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt\n\n\n.. _neuronperf_model_copies:\n\nModel Copies\n------------\n\nBy default, NeuronPerf will benchmark two settings for ``n_models``:\n   1. A single copy\n   2. The maximum number of copies for your instance size\n\nYou can override this behavior by passing ``n_models`` to ``benchmark``, as shown below:\n\n.. code:: python\n\n   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., n_models=6)\n\nor\n\n.. code:: python\n\n   n_models = list(range(1, 10))\n   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., n_models=n_models)\n\n.. _neuronperf_pipeline_mode:\n\nPipeline Mode\n-------------\n\nBy default, NeuronPerf will assume you intend to use DataParallel, with two exceptions:\n\n* You compiled your model using NeuronPerf for pipeline mode\n* You constructed a model index that uses pipeline mode\n\nYou can also manually tell NeuronPerf that your model was compiled for pipeline mode. It is similar to how other arguments are passed.\n\n.. code:: python\n\n   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., pipeline_sizes=2)\n\nIf you are passing multiple models in an index, then you should pass a list for ``pipeline_sizes``.\n\n.. code:: python\n\n   reports = npf.torch.benchmark('model_index.json', ..., pipeline_sizes=[1, 2, 3])\n\n\nDuration\n--------\n\nNeuronPerf will benchmark each configuration specified for 60 seconds by default. You can control the duration by passing ``duration`` (in seconds).\n\n.. code:: python\n\n   reports = npf.torch.benchmark('model_index.json', ..., duration=10)\n\n.. warning::\n\n   If you make the duration too short, it may expire before all models are loaded and have had time to execute.\n\n\nCustom Datasets (Beta)\n----------------------\n\nCurrently, only PyTorch supports custom datasets, and the interface is subject to change. If you provide a custom dataset, it will be fully executed on each loaded model copy. So if you provide ``n_models=2``, your dataset will be run through twice in parallel.\n\nTo use this API, call ``benchmark`` passing a ``torch.utils.data.Dataset`` to ``inputs``. You can easily create your own ``Dataset`` by implementing the interface, or use one of the available datasets. For example:\n\n.. 
code:: python\n\n   import torchvision\n   from torchvision.transforms import ToTensor\n\n   dataset = torchvision.datasets.FashionMNIST(\n      root=\"data\",\n      train=False,\n      download=True,\n      transform=ToTensor()\n   )\n\n   reports = npf.torch.benchmark('model_index.json', inputs=dataset, batch_sizes=[8], preprocess_fn=lambda x: x[0], loop_dataset=False)\n\n.. note::\n\n   The ``preprocess_fn`` is required here to extract the image input from the ``(image, label)`` tuple produced by the dataset. If the dataset is too short to produce stable runtime measurements, you can set ``loop_dataset=True`` to rerun the dataset until the requested duration elapses.\n\nResults\n-------\n\nViewing and Saving\n~~~~~~~~~~~~~~~~~~\n\nThere are currently three ways to view results.\n\n- ``neuronperf.print_reports(...)``\n   - Dump abbreviated results in your terminal\n- ``neuronperf.write_csv(...)``\n   - Store metrics of interest as CSV\n- ``neuronperf.write_json(...)``\n   - Store everything as JSON\n\nSee the :ref:`neuronperf_api` for full details.\n\nFull Timing Results\n~~~~~~~~~~~~~~~~~~~\n\nNeuronPerf automatically combines and summarizes the detailed timing information collected during benchmarking. If you wish to receive everything back yourself, you can use:\n\n.. code:: python\n\n   results = npf.torch.benchmark('model_index.json', ..., return_timers=True)\n\nIf you later wish to produce reports the same way that NeuronPerf does internally, you can call:\n\n.. code:: python\n\n   reports = npf.get_reports(results)\n\nVerbosity\n---------\n\nVerbosity is an integer, currently one of ``{0, 1, 2}``, where:\n\n* 0 = SILENT\n* 1 = INFO (default)\n* 2 = VERBOSE / DEBUG\n\nExample:\n\n.. code:: python\n\n   reports = npf.torch.benchmark(..., n_models=1, duration=5, verbosity=2)\n\n.. code:: bash\n\n   DEBUG:neuronperf.benchmarking - Cast mode was not specified, assuming default.\n   INFO:neuronperf.benchmarking - Benchmarking 'resnet50.json', ~5 seconds remaining.\n   DEBUG:neuronperf.benchmarking - Running model config: {'model_filename': 'models/model_b1_p1_83bh3hhs.pt', 'device_type': 'neuron', 'input_idx': 0, 'batch_size': 1, 'n_models': 1, 'workers_per_model': 2, 'pipeline_size': 1, 'cast_mode': None, 'multiprocess': True, 'multiinterpreter': False, 'start_dts': '20211111-062818', 'duration': '5'}\n   DEBUG:neuronperf.benchmarking - Benchmarker 0 started.\n   DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 0 started.\n   DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 1 started.\n   DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 0 finished after 738 inferences.\n   DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 1 finished after 738 inferences.\n   DEBUG:neuronperf.benchmarking - Benchmarker 0 finished.\n   throughput_avg latency_ms_p50 latency_ms_p99 n_models       pipeline_size  workers_per_model batch_size     model_filename\n   329.667        6.073          6.109          1              1              2                 1              models/model_b1_p1_83bh3hhs.pt\n\n\nInternal Process Model\n----------------------\n\nFor each model loaded (see :ref:`neuronperf_model_copies`), a process is spawned. Each process may use multiple threads (see :ref:`neuronperf_worker_threads`). The threads will continue to load examples and keep the hardware busy.\n\nNeuronPerf spawns processes slightly differently between frameworks. For PyTorch and Apache MXNet, processes are forked. 
For Tensorflow/Keras, a fresh interpreter is launched, and benchmarkers are serialized and run as a script.\n\nIf you suspect you are having trouble due to the way processes are managed, you have two mechanisms of control:\n\n.. code:: python\n\n   reports = npf.torch.benchmark(..., multiprocess=False)\n\nDefault is ``True``, and ``False`` will disable multiprocessing and run everything inside a single parent process. This may not work for all frameworks beyond the first model configuration, because process teardown is used to safely deallocate models from the hardware. It is not recommended to benchmark this way.\n\n\n.. code:: python\n\n   reports = npf.torch.benchmark(..., multiinterpreter=True)\n\nThis flag controls whether a fresh interpreter is used instead of forking. Defaults to ``False`` except with Tensorflow/Keras.\n\n\n.. _npf-cpu-gpu:\n\nBenchmark on CPU or GPU\n-----------------------\n\nWhen benchmarking on CPU or GPU, the API is slightly different. With CPU or GPU, there is no compiled model to benchmark, so instead we need to directly pass a reference to the model class that will be instantiated.\n\n.. note::\n\n   GPU benchmarking is currently only available for PyTorch.\n\nCPU:\n\n.. code:: python\n\n   cpu_reports = npf.cpu.benchmark(YourModelClass, ...)\n\nGPU:\n\n.. code:: python\n\n   gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type=\"gpu\")\n\n\nYour model class will be instantiated in a subprocess, so there are some things to keep in mind.\n\n* Your model class must be defined at the top level inside a Python module\n   * i.e. don't place your model class definition inside a function or other nested scope\n* If your model class has special Python module dependencies, consider importing them inside your class ``__init__``\n* If your model class expects constructor arguments, wrap your class so that it has no constructor arguments\n\n\nExample of a wrapped model class for CPU/GPU benchmarking:\n\n.. code:: python\n\n   class ModelWrapper(torch.nn.Module):\n      def __init__(self):\n         super().__init__()\n         from transformers import AutoModelForSequenceClassification\n         model_name = \"bert-base-cased\"\n         self.bert = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n         self.add_module(model_name, self.bert)\n\n      def forward(self, *inputs):\n         return self.bert(*inputs)\n\n\n   reports = npf.torch.benchmark(ModelWrapper, inputs, device_type=\"gpu\")\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_compile_guide.rst",
    "content": ".. _neuronperf_compile_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n========================\nNeuronPerf Compile Guide\n========================\n\nIf you wish to compile multiple configurations at once, NeuronPerf provides a simplified and uniform API across frameworks. The output is a model index that tracks the artifacts produces, and can be passed directly to the :ref:`benchmark <neuronperf_api_benchmark>` routine for a streamlined end-to-end process. This may be useful if you wish to test multiple configurations of your model on Neuron hardware.\n\nYou can manually specify the model index filename by passing ``filename``, or let NeuronPerf generate one and return it for you. Compiled artifacts will be placed in a local ``models`` directory.\n\nHow does ``compile`` know which instance type to compile for?\n-------------------------------------------------------------\n\nNeuronPerf will assume that the instance type your are currently on is also the compile target. However, you may compile on a non-Neuron instance or choose to target a different instance type. In the case, you can pass ``compiler_target`` to the ``compile`` call.\n\nFor example:\n\n.. code:: python\n\n   import neuronperf as npf\n   import neuronperf.torch\n\n   npf.torch.compile(model, inputs)  # compile for current instance type\n   npf.torch.compile(model, inputs, compiler_target=\"inf2\")  # compile for inf2\n\n\n\nCompiling multiple variants\n---------------------------\n\nIf you provide multiple pipeline sizes, batch sizes, and/or cast modes, NeuronPerf will compile all of them.\n\n.. code:: python\n\n   # Select a few batch sizes and pipeline configurations to test\n   batch_sizes = [1, 5, 10]\n   pipeline_sizes = [1, 2, 4]\n\n   # Construct example inputs\n   example_inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float16) for batch_size in batch_sizes]\n\n   # Compile all configurations\n   index = npf.torch.compile(\n      model,\n      example_inputs,\n      batch_sizes=batch_sizes,\n      pipeline_sizes=pipeline_sizes,\n   )\n\n\nIf you wished to benchmark specific subsets of configurations, you could compile the specific configurations independently and later combine the results into a single index, as shown below.\n\n.. code:: python\n\n   # Compile with pipeline size 1 and vary batch dimension\n   batch_index = npf.torch.compile(\n      model,\n      example_inputs,\n      batch_sizes=batch_sizes,\n      pipeline_sizes=1,\n   )\n\n   # Compile with batch size 1 and vary pipeline dimension\n   pipeline_index = npf.torch.compile(\n      model,\n      example_inputs[0],\n      batch_sizes=1,\n      pipeline_sizes=pipeline_sizes,\n   )\n\n   index = npf.model_index.append(batch_index, pipeline_index)\n   npf.model_index.save(index, 'model_index.json')\n\nThe ``compile`` function supports ``batch_sizes``, ``pipeline_sizes``, ``cast_modes``, and custom ``compiler_args``. If there is an error during compilation for a requested configuration, it will be logged and compilation will continue onward without terminating. (This is to support long-running compile jobs with many configurations.)\n\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_evaluate_guide.rst",
    "content": ".. _neuronperf_evaluate_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n==========================\nNeuronPerf Evaluate Guide\n==========================\n\nNeuronPerf has a new API for evaluating model accuracy on Neuron hardware. This API is currently only available for PyTorch.\n\nYou can access the API through standard ``benchmark()`` by passing an additional kwarg, ``eval_metrics``.\n\nFor example:\n\n.. code:: python\n\n    reports = npf.torch.benchmark(\n        model_index_or_path,\n        dataset,\n        n_models=1,\n        workers_per_model=2,\n        duration=0,\n        eval_metrics=['accuracy', 'precision']\n    )\n\n\nIn this example, we fix ``n_models`` and ``n_workers`` because replicating the same model will not impact accuracy. We also set ``duration=0`` to allow benchmarking to run untimed through all dataset examples.\n\nBecause this call can be tedious to type, a convenience function is provided:\n\n.. code:: python\n\n    reports = npf.torch.evaluate(model_index_or_path, dataset, metrics=['accuracy', 'precision'])\n\n\n.. note:\n\n    Please note that ``eval_metrics`` becomes ``metrics`` when using ``evaluate``.\n\nThe ``dataset`` can be any iterable object that produces ``tuple(*INPUTS, TARGET)``.\n\nIf ``TARGET`` does not appear in the last column for your dataset, you can customize this by passing ``eval_target_col``.\n\nFor example:\n\n.. code:: python\n\n    reports = npf.torch.evaluate(model_index_or_path, dataset, metrics='accuracy', eval_target_col=1)\n\n\nYou can list the currently available metrics.\n\n.. code:: python\n\n    >>> npf.list_metrics()                                                                                 │·····\n    Name                     Description                                                                   │·····\n    Accuracy                 (TP + TN) / (TP + TN + FP + FN)                                               │·····\n    TruePositiveRate         TP / (TP + FN)                                                                │·····\n    Sensitivity              Alias for TruePositiveRate                                                    │·····\n    Recall                   Alias for TruePositiveRate                                                    │·····\n    Hit Rate                 Alias for TruePositiveRate                                                    │·····\n    TrueNegativeRate         TN / (TN + FP)                                                                │·····\n    Specificity              Alias for TrueNegativeRate                                                    │·····\n    Selectivity              Alias for TrueNegativeRate                                                    │·····\n    PositivePredictiveValue  TP / (TP + FP)                                                                │·····\n    Precision                Alias for PositivePredictiveValue                                             │·····\n    NegativePredictiveValue  TN / (TN + FN)                                                                │·····\n    FalseNegativeRate        FN / (FN + TP)                                                                │·····\n    FalsePositiveRate        FP / (FP + TN)                                                                │·····\n    FalseDiscoveryRate       FP / (FP + TN)                   
\nYou can list the currently available metrics.\n\n.. code:: python\n\n    >>> npf.list_metrics()\n    Name                     Description\n    Accuracy                 (TP + TN) / (TP + TN + FP + FN)\n    TruePositiveRate         TP / (TP + FN)\n    Sensitivity              Alias for TruePositiveRate\n    Recall                   Alias for TruePositiveRate\n    Hit Rate                 Alias for TruePositiveRate\n    TrueNegativeRate         TN / (TN + FP)\n    Specificity              Alias for TrueNegativeRate\n    Selectivity              Alias for TrueNegativeRate\n    PositivePredictiveValue  TP / (TP + FP)\n    Precision                Alias for PositivePredictiveValue\n    NegativePredictiveValue  TN / (TN + FN)\n    FalseNegativeRate        FN / (FN + TP)\n    FalsePositiveRate        FP / (FP + TN)\n    FalseDiscoveryRate       FP / (FP + TP)\n    FalseOmissionRate        FN / (FN + TN)\n    PositiveLikelihoodRatio  TPR / FPR\n    NegativeLikelihoodRatio  FNR / TNR\n    PrevalenceThreshold      sqrt(FPR) / (sqrt(FPR) + sqrt(TPR))\n    ThreatScore              TP / (TP + FN + FP)\n    F1Score                  2TP / (2TP + FN + FP)\n    MeanAbsoluteError        sum(|y - x|) / n\n    MeanSquaredError         sum((y - x)^2) / n\n\n\nNew metrics may appear in the list after importing a submodule. For example, ``import neuronperf.torch`` will register a new ``topk`` metric.\n\nCustom Metrics\n--------------\n\nSimple Variants\n===============\n\nIf you wish to register a metric that is a slight tweak of an existing metric with different ``init`` args, you can use ``register_metric_from_existing()``:\n\n.. code:: python\n\n    npf.register_metric_from_existing(\"topk\", \"topk_3\", k=3)\n\nThis example registers a new metric ``topk_3`` from existing metric ``topk``, passing ``k=3`` at ``init`` time.\n\n\nNew Metrics\n===========\n\nYou can register your own metrics using ``register_metric()``.\n\nYour metrics must extend ``BaseEvalMetric``:\n\n.. code:: python\n\n    from abc import ABC, abstractmethod\n    from typing import Any, Iterable\n\n    class BaseEvalMetric(ABC):\n        \"\"\"\n        Abstract base class BaseEvalMetric from which other metrics inherit.\n        \"\"\"\n\n        @abstractmethod\n        def process_record(self, output: Any = None, target: Any = None) -> None:\n            \"\"\"Process an individual record and return the result.\"\"\"\n            pass\n\n        @staticmethod\n        def aggregate(metrics: Iterable[\"BaseEvalMetric\"]) -> Any:\n            \"\"\"Combine a sequence of metrics into a single result.\"\"\"\n            raise NotImplementedError\n\nFor example:\n\n.. code:: python\n\n    import neuronperf as npf\n\n    class MyCustomMetric(npf.BaseEvalMetric):\n        def __init__(self):\n            super().__init__()\n            self.passing = 0\n            self.processed = 0\n\n        def process_record(self, outputs, target):\n            self.processed += 1\n            if outputs == target:\n                self.passing += 1\n\n        @staticmethod\n        def aggregate(metrics):\n            passing = 0\n            processed = 0\n            for metric in metrics:\n                passing += metric.passing\n                processed += metric.processed\n            return passing / processed if processed else 0\n\n\n    npf.register_metric(\"MyCustomMetric\", MyCustomMetric)\n\n\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_examples.rst",
    "content": ".. _neuronperf_examples:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf Examples\n===================\n\nThis page walks through several examples of using NeuronPerf, starting with the simplest way---using a compiled model. We will also see how we can use NeuronPerf to perform a hyperparameter search, and manage the artifacts produced, as well as our results.\n\nBenchmark a Compiled Model\n--------------------------\n\nThis example assumes you have already compiled your model for Neuron and saved it to disk.\nYou will need to adapt the batch size, input shape, and filename for your model.\n\n.. code:: python\n\n   import torch  # or tensorflow, mxnet\n\n   import neuronperf as npf\n   import neuronperf.torch  # or tensorflow, mxnet\n\n   # Construct dummy inputs\n   batch_sizes = 1\n   input_shape = (batch_sizes, 3, 224, 224)\n   inputs = torch.ones(input_shape)  # or numpy array for TF, MX\n\n   # Benchmark and save results\n   reports = npf.torch.benchmark(\"your_model_file.pt\", inputs, batch_sizes)\n   npf.print_reports(reports)\n   npf.write_json(reports)\n\n\n.. code:: bash\n\n   INFO:neuronperf.benchmarking - Benchmarking 'your_model_file.pt', ~8.0 minutes remaining.\n   throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename\n   296766.5          0.003             0.003             1                 1                 1                 1                 your_model_file.pt\n   3616109.75        0.005             0.008             24                1                 1                 1                 your_model_file.pt\n   56801.0           0.035             0.04              1                 1                 2                 1                 your_model_file.pt\n   3094419.4         0.005             0.051             24                1                 2                 1                 your_model_file.pt\n\n\nLet's suppose you only wish to test two specific configurations. You wish to benchmark  1 model and 1 worker thread, and also with 2 worker threads for 15 seconds each. The call to ``benchmark`` becomes:\n\n.. code:: python\n\n   reports = npf.torch.benchmark(filename, inputs, batch_sizes, n_models=1, workers_per_model=[1, 2], duration=15)\n\nYou can also add a custom model name to reports.\n\n.. code:: python\n\n   reports = npf.torch.benchmark(..., model_name=\"MyFancyModel\")\n\nSee the :ref:`neuronperf_benchmark_guide` for further details.\n\n\nBenchmark a Model from Source\n-----------------------------\n\nIn this example, we define, compile, and benchmark a simple (dummy) model using PyTorch.\n\nWe'll assume you already have a PyTorch model compiled for Neuron with the filename ``model_neuron_b1.pt``. Furthermore, let's assume the model was traced with a batch size of 1, and has an input shape of (3, 224, 224).\n\n.. literalinclude:: test_simple_pt.py\n    :language: python\n    :caption: :download:`test_simple_pt.py <test_simple_pt.py>`\n    :linenos:\n\n\n.. 
code:: bash\n\n   (aws_neuron_pytorch_p36) ubuntu@ip-172-31-11-122:~/tmp$ python test_simple_pt.py\n   INFO:neuronperf.benchmarking - Benchmarking 'model_neuron_b1.pt', ~8.0 minutes remaining.\n   throughput_avg    latency_ms_p50    latency_ms_p99    n_models          pipeline_size     workers_per_model batch_size        model_filename\n   296766.5          0.003             0.003             1                 1                 1                 1                 model_neuron_b1.pt\n   3616109.75        0.005             0.008             24                1                 1                 1                 model_neuron_b1.pt\n   56801.0           0.035             0.04              1                 1                 2                 1                 model_neuron_b1.pt\n   3094419.4         0.005             0.051             24                1                 2                 1                 model_neuron_b1.pt\n\nCompile and Benchmark a Model\n-----------------------------\n\nHere is an end-to-end example of compiling and benchmarking a ResNet-50 model from ``torchvision``.\n\n.. literalinclude:: test_resnet50_pt.py\n    :language: python\n    :caption: :download:`test_resnet50_pt.py <test_resnet50_pt.py>`\n    :linenos:\n\n\nBenchmark on CPU or GPU\n-----------------------\n\nWhen benchmarking on CPU or GPU, the API is slightly different. With CPU or GPU, there is no compiled model to benchmark, so instead we need to directly pass a reference to the model class that will be instantiated.\n\n.. note::\n\n   GPU benchmarking is currently only available for PyTorch.\n\nCPU:\n\n.. code:: python\n\n   cpu_reports = npf.cpu.benchmark(YourModelClass, ...)\n\nGPU:\n\n.. code:: python\n\n   gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type=\"gpu\")\n\n\nPlease refer to :ref:`npf-cpu-gpu` for details and an example of providing your model class.\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_faq.rst",
    "content": ".. _neuronperf_faq:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf FAQ\n==============\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhen should I use NeuronPerf?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen you want to measure the highest achievable performance for your model with Neuron.\n\nWhen should I **not** use NeuronPerf?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen measuring end-to-end performance that includes your network serving stack. Instead, your should compare your e2e numbers to those obtained by NeuronPerf to optimize your serving overhead.\n\n\nWhich frameworks does NeuronPerf support?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSee :ref:`neuronperf_framework_notes`.\n\nWhich Neuron instance types does NeuronPerf support?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nPyTorch and TensorFlow support all instance types.\nMXNet support is limited to inf1.\n\n\nWhat is the secret to obtaining the best numbers?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThere is no secret sauce. NeuronPerf follows best practices.\n\nWhat are the \"best practices\" that NeuronPerf uses?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- These vary slightly by framework and how your model was compiled\n- For a model compiled for a single NeuronCore (DataParallel):\n\n\t- To maximize throughput, for ``N`` models, use ``2 * N`` worker threads\n\t- To minimize latency, use 1 worker thread per model\n- Use a new Python process for each model to avoid GIL contention\n- Ensure you benchmark long enough for your numbers to stabilize\n- Ignore outliers at the start and end of inference benchmarking\n\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_framework_notes.rst",
    "content": ".. _neuronperf_framework_notes:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n==========================\nNeuronPerf Framework Notes\n==========================\n\nPyTorch\n=======\n\n  * Requires: ``torch-neuron`` or ``torch-neuronx``\n\t- Versions: 1.7.x, 1.8.x, 1.9.x, 1.10.x, 1.11.x, 1.12.x, 1.13.x\n  * Input to ``compile``: ``torch.nn.Module``\n  * Model inputs: ``Any``.\n\n\nTensorFlow 1.x\n==============\n\n  * Requires: ``tensorflow-neuron``\n  \t- Versions: All\n  * Input to ``compile``: Path to uncompiled model dir from ``saved_model.simple_save``\n  * Model inputs: Tensors must be provided as ``numpy.ndarray``\n\n.. note::\n\n\tAlthough TensorFlow *tensors* must be ``ndarray``, this doesn't stop you from wrapping them inside of data structures that traverse process boundaries safely. For example, you can still pass an input ``dict`` like ``{'input_0': np.zeros((2, 1))}``.\n\nTensorFlow 2.x\n==============\n\n  * Requires: ``tensorflow-neuron`` or ``tensorflow-neuronx``\n  \t- Versions: All\n  * Input to ``compile``: ``tf.keras.Model``\n  * Model inputs: Tensors must be provided as ``numpy.ndarray``\n\n.. note::\n\n\tAlthough TensorFlow *tensors* must be ``ndarray``, this doesn't stop you from wrapping them inside of data structures that traverse process boundaries safely. For example, you can still pass an input ``dict`` like ``{'input_0': np.zeros((2, 1))}``.\n\nApache MXNet\n=============\n\n  * Requires: ``mxnet-neuron``\n  \t- Versions 1.5, 1.8\n  * Input to ``compile``: ``tuple(sym, args, aux)``\n  * Inputs: Tensors must be provided as ``mxnet.ndarray`` or ``numpy.ndarray``\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_install.rst",
    "content": ".. _neuronperf_install:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf Install\n==================\n\nActivate your Neuron environment, and execute:\n\n.. code:: bash\n\n  $ pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n"
  },
  {
    "path": "archive/neuronperf/neuronperf_model_index_guide.rst",
    "content": ".. _neuronperf_model_index_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n============================\nNeuronPerf Model Index Guide\n============================\n\nA **model index** is a JSON file that tracks information about one or more compiled models. You can generate them using ``compile``, by using the API described here, or you may create them manually in a text editor.\n\nAfter a call to ``compile`` you may notice that you now have a ``models`` directory. You will also spot a new file named something like ``model_83b3raj2.json`` in your local directory, if you didn't provide a ``filename`` yourself.\n\nA model index is not intended to be opaque; you should feel free to open, inspect, and modify it yourself. It contains some information about the artifacts that were compiled. Individual models referenced by the index can be handed to ``benchmark`` directly along with an example input, or you may pass the entire index as in the basic example above. Here is an example index:\n\n.. code:: bash\n\n   python3 -m json.tool model_index.json\n\n.. code:: json\n\n   {\n       \"version\": \"0.0.0.0+0bc220a\",\n       \"model_configs\": [\n           {\n               \"filename\": \"models/model_b1_p1_38793jda.pt\",\n               \"input_idx\": 0,\n               \"batch_size\": 1,\n               \"pipeline_size\": 1,\n               \"compile_s\": 5.32\n           }\n       ]\n   }\n\nAn index is useful for keeping track of your compiled artifacts and their parameters. The advantages of using ``neuronperf.[torch/tensorflow/mxnet].compile`` are clearer when we wish to compile multiple variants of our model and benchmark all of them at the same time. All of the model artifacts and the index can be destroyed using ``model_index.delete('model_index.json')``.\n\nBenchmarking\n============\n\nWhen benchmarking with an index, there are some important details to keep in mind. If you originally built the index using a set of inputs, the model index has associated the ``inputs`` with the compiled models by their positional index.\n\nFor example:\n\n.. code:: python\n\n   batch_sizes = [1, 2]\n   inputs = [torch.zeros((b, 100)) for b in batch_sizes]\n\nHere, ``inputs[0]`` corresponds to batch size 1. Therefore, the model index will contain a reference to input 0 for that model. When you call ``benchmark``, you must pass inputs with the same shape in the same positions as at compile time.\n\n.. note::\n\n   It's only necessary that there is an input with the correct shape at``inputs[input_index]``. The example data itself is not important.\n\n\nWorking with Indexes\n--------------------\n\nThe API detail below describes utilities for working with indexes. An ``index`` can be either a loaded index (JSON) or the path to an index (it will be loaded automatically).\n\nCreating\n========\n\n.. code:: python\n\n   index = neuronperf.model_index.create('/path/to/model', batch_size=1)\n   filename = neuronperf.model_index.save(index)\n\nOnce you have an index, you can pass its path directly to ``benchmark``. You can also pass a custom filename instead:\n\n.. code:: python\n\n   index = neuronperf.model_index.create('/path/to/model', batch_size=1)\n   neuronperf.model_index.save(index, 'my_index.json')\n\nAppending\n=========\n\nIf **multiple models use the same inputs**, you can append them together. 
\nWorking with Indexes\n--------------------\n\nThe API details below describe utilities for working with indexes. An ``index`` can be either a loaded index (JSON) or the path to an index (it will be loaded automatically).\n\nCreating\n========\n\n.. code:: python\n\n   index = neuronperf.model_index.create('/path/to/model', batch_size=1)\n   filename = neuronperf.model_index.save(index)\n\nOnce you have an index, you can pass its path directly to ``benchmark``. You can also pass a custom filename instead:\n\n.. code:: python\n\n   index = neuronperf.model_index.create('/path/to/model', batch_size=1)\n   neuronperf.model_index.save(index, 'my_index.json')\n\nAppending\n=========\n\nIf **multiple models use the same inputs**, you can append them together. For example, if you have the same batch size with multiple pipeline sizes, the inputs are the same, but the model changes.\n\n.. code:: python\n\n   pipeline_sizes = [1, 2, 3, 4]\n   indexes = [neuronperf.model_index.create(f'/path/to/model_p{p}', pipeline_size=p, batch_size=5) for p in pipeline_sizes]\n   index = neuronperf.model_index.append(*indexes)\n   neuronperf.model_index.save(index, 'my_index.json')\n\nFiltering\n=========\n\nYou can construct a new model index that is filtered by some parameter. For example, to get a new index with only batch sizes [1, 2], you could do:\n\n.. code:: python\n\n   new_index = neuronperf.model_index.filter(index, batch_sizes=[1, 2])\n\nYou can also benchmark a subset of a model index by passing only the subset parameters of interest, but remember to ensure you provide the correct number of inputs for the index (even if some are not used).\n\nFor example, if you have an index with models at ``batch_sizes = [1, 2, 3]``, but only wish to benchmark batch size 2:\n\n.. code:: python\n\n   batch_sizes = [1, 2, 3]\n   inputs = [torch.zeros((b, 100)) for b in batch_sizes]\n   reports = neuronperf.torch.benchmark('model_index.json', inputs, batch_sizes=2)\n\nCopying\n=======\n\nYou can copy an index to a new location with ``neuronperf.model_index.copy(index, new_index_name, new_index_dir)``. This is mostly useful in combination with ``filter``/``append``.\n\nDeleting\n========\n\nIf you wish to keep your compiled models, just delete the model index file yourself. If you want to delete your model index and all associated artifacts, use:\n\n.. code:: python\n\n   neuronperf.model_index.delete('my_index.json')"
  },
  {
    "path": "archive/neuronperf/neuronperf_overview.rst",
    "content": ".. _neuronperf_overview:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\n===================\nNeuronPerf Overview\n===================\n\nNeuronPerf is a lightweight Python library that can help you easily benchmark your models with Neuron hardware.\n\nNeuronPerf supports Neuron releases for PyTorch, Tensorflow, and MXNet. It is used internally by the Neuron team to generate performance benchmarking numbers.\n\nWhen interacting with NeuronPerf, you will typically import the base package along with one of the submodule wrappers, for example:\n\n.. code:: python\n\n\timport neuronperf\n\timport neuronperf.torch\n\nYou may then benchmark and/or compile one or more models with NeuronPerf. For example,\n\n.. code:: python\n\n\treports = neuronperf.torch.benchmark(model, inputs, ...)\n\nThe ``compile`` and ``benchmark`` methods must be accessed through one of the supported framework submodules.\n\nBenchmarking\n============\n\nAll NeuronPerf ``benchmark`` calls require a minimum of two arguments:\n\n\t1. A filename\n\t2. Inputs\n\nThe filename may refer to:\n\n\t1. A Neuron-compiled model (e.g. ``my_model.pt``)\n\t2. A :ref:`Model Index <neuronperf_model_index_guide>`.\n\nA Model Index is useful for benchmarking more than one model in a single session.\n\nCompiling\n=========\n\nNeuronPerf also provides a standard interface to all Neuron frameworks through the ``compile`` API.\n\n.. code:: python\n\n\tmodel_index = neuronperf.torch.compile(model, inputs, ...)\n\nThis is completely optional. You may use the standard compilation guides for supported frameworks.\n\nNext Steps\n==========\n\nTake a look at the simple :ref:`neuronperf_examples`, :ref:`neuronperf_benchmark_guide`, :ref:`neuronperf_compile_guide`, and :ref:`neuronperf_api`."
  },
  {
    "path": "archive/neuronperf/neuronperf_terminology.rst",
    "content": ".. _neuronperf_terminology:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf Terminology\n======================\n\n  * Model Inputs\n    - An individual input or ``list`` of inputs\n    - Example: ``inputs = [(torch.ones((batch_size, 5))) for batch_size in batch_sizes]``\n    - Each input is associated with the ``batch_sizes`` specified, in the same order\n    - Each input is fed individually to a corresponding model\n    - If an input is provided as a ``tuple``, it will be destructured to ``model(*input)`` to support multiple args\n    - See :ref:`neuronperf_framework_notes` for framework-specific requirements\n  * Latency\n  \t- Time to execute a single ``model(input)``\n  \t- Typically measured in milliseconds\n  * Model\n   \t- Your data model; varies by framework. See :ref:`neuronperf_framework_notes`\n  \t- Models may be wrapped by submodules (``torch``, ``tensorflow``, ``mxnet``) as callables\n  * Model Index\n  \t- A JSON file that tracks compiled model artifacts\n  * Model Inputs\n  \t- A ``tuple`` of inputs passed to a model, i.e. a single complete example\n  \t- Example: ``input = (torch.ones((5, 3, 224, 224)),)``\n  * Throughput\n  \t- Inferences / second"
  },
  {
    "path": "archive/neuronperf/neuronperf_troubleshooting.rst",
    "content": ".. _neuronperf_troubleshooting:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuronPerf Troubleshooting\n==========================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nCompilation issues\n^^^^^^^^^^^^^^^^^^\n\nModel fails to compile\n~~~~~~~~~~~~~~~~~~~~~~\n\nPlease `file a bug <https://github.com/aws/aws-neuron-sdk/issues>`_ with as much information as possible.\n\nBenchmarking Issues\n^^^^^^^^^^^^^^^^^^^\n\nBenchmarking terminates early with errors\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Scroll up and read the output. Most likely causes are:\n   - invalid input shapes or\n   - not enough memory to load the requested number of model copies on the device. Try passing ``n_models=1`` to ``benchmark`` again to test for memory issues.\n\nOther Issues or Feature Requests\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nPlease file a bug on `Github <https://github.com/aws/aws-neuron-sdk/issues>`_."
  },
  {
    "path": "archive/neuronperf/rn.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nWhat's New\n==========\n\n.. toctree::\n   :maxdepth: 1\n\n   /release-notes/components/dev-tools\n"
  },
  {
    "path": "archive/neuronperf/setup.cfg",
    "content": "[aliases]\n# Define this so we don't resolve to the wrong setuptools 'test' entrypoint when\n# invoking brazil-build test.\ntest = brazil_test\n"
  },
  {
    "path": "archive/neuronperf/setup.py",
    "content": "import collections\nimport os\nimport subprocess\n\nfrom setuptools import find_packages, setup\n\n# Read __version__.py\nversion_py = os.path.join(\"src\", \"neuronperf\", \"__version__.py\")\nwith open(version_py, \"rt\") as fp:\n    lines = fp.readlines()\nmeta = collections.OrderedDict()\nfor line in lines:\n    key, value = line.split(\"=\")\n    meta[key.strip()] = value.strip()[1:-1]\n\n# Extract fields for packaging\nTITLE = meta[\"__title__\"]\nAUTHOR = meta[\"__author__\"]\nDESCRIPTION = meta[\"__description__\"]\nVERSION = os.getenv(\"BRAZIL_PACKAGE_VERSION\", \"0.0.0.0\")\nLICENSE = meta[\"__license__\"]\n\n# Compute release version and write back meta info for consistency.\nGIT_SHA = os.environ.get(\"BRAZIL_PACKAGE_CHANGE_ID\")\nif GIT_SHA:\n    GIT_SHA = GIT_SHA.strip()[:9]\nelse:\n    # This is probably a local build. Try to attach something meaningful.\n    try:\n        GIT_SHA = subprocess.check_output([\"git\", \"rev-parse\", \"--short\", \"HEAD\"]).decode().strip()\n    except:\n        GIT_SHA = \"0\" * 9\nVERSION = \"{}+{}\".format(VERSION.strip(), GIT_SHA)\nmeta[\"__version__\"] = VERSION\nwith open(version_py, \"wt\") as fp:\n    for k, v in meta.items():\n        fp.write('{} = \"{}\"\\n'.format(k, v))\n\n\nsetup(\n    name=TITLE,\n    version=VERSION,\n    description=DESCRIPTION,\n    author=AUTHOR,\n    license=LICENSE,\n    classifiers=[\n        \"Development Status :: 4 - Beta\",\n        \"Intended Audience :: Developers\",\n        \"Topic :: Scientific/Engineering :: Artificial Intelligence\",\n        \"License :: Other/Proprietary License\",\n        \"Programming Language :: Python :: 3.6\",\n    ],\n    keywords=\"aws neuron\",\n    packages=find_packages(where=\"src\", exclude=(\"test\",)),\n    install_requires=[\"dill==0.3.4\", \"numpy\", \"psutil==5.9.0\"],\n    python_requires=\">=3.6\",\n    package_dir={\"\": \"src\"},\n    data_files=[],\n    package_data={\"\": [\"py.typed\"]},\n)\n"
  },
  {
    "path": "archive/neuronperf/test_resnet50_pt.py",
    "content": "import torch\nimport torch_neuron\n\nimport neuronperf as npf\nimport neuronperf.torch\n\nfrom torchvision import models\n\n\n# Load a pretrained ResNet50 model\nmodel = models.resnet50(pretrained=True)\n\n# Select a few batch sizes to test\nfilename = 'resnet50.json'\nbatch_sizes = [5, 6, 7]\n\n# Construct example inputs\ninputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) for batch_size in batch_sizes]\n\n# Compile\nnpf.torch.compile(\n\tmodel, \n\tinputs, \n\tbatch_sizes=batch_sizes, \n\tfilename=filename,\n)\n\n# Benchmark\nreports = npf.torch.benchmark(filename, inputs)\n\n# View and save results\nnpf.print_reports(reports)\nnpf.write_csv(reports, 'resnet50_results.csv')\nnpf.write_json(reports, 'resnet50_results.json')\n"
  },
  {
    "path": "archive/neuronperf/test_simple_pt.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf as npf\nimport neuronperf.torch\n\n\n# Define a simple model\nclass Model(torch.nn.Module):\n    def forward(self, x):\n        x = x * 3\n        return x + 1\n\n\n# Instantiate\nmodel = Model()\nmodel.eval()\n\n# Define some inputs\nbatch_sizes = [1]\ninputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes]\n\n# Compile for Neuron\nmodel_neuron = torch.neuron.trace(model, inputs)\nmodel_neuron.save(\"model_neuron_b1.pt\")\n\n# Benchmark\nreports = npf.torch.benchmark(\"model_neuron_b1.pt\", inputs, batch_sizes)\n\n# View and save results\nnpf.print_reports(reports)\nnpf.write_csv(reports, \"model_neuron_b1.csv\")\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/bert-base-cased_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"bert-base-cased\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = neuronperf.torch.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            neuronperf.print_reports(reports)\n            neuronperf.write_csv(reports)\n            neuronperf.write_json(reports)"
  },
  {
    "path": "archive/src/benchmark/pytorch/bert-base-cased_compile.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"bert-base-cased\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            neuronperf.torch.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/bert-base-uncased_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"bert-base-uncased\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = neuronperf.torch.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            neuronperf.print_reports(reports)\n            neuronperf.write_csv(reports)\n            neuronperf.write_json(reports)"
  },
  {
    "path": "archive/src/benchmark/pytorch/bert-base-uncased_compile.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"bert-base-uncased\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            neuronperf.torch.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilbert-base-uncased-finetuned-sst-2-english_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased-finetuned-sst-2-english\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = neuronperf.torch.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            neuronperf.print_reports(reports)\n            neuronperf.write_csv(reports)\n            neuronperf.write_json(reports)"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilbert-base-uncased-finetuned-sst-2-english_compile.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased-finetuned-sst-2-english\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            neuronperf.torch.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilbert-base-uncased_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased\"]\nsequence_lengths = [128]\nbatch_sizes = [9]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = neuronperf.torch.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            neuronperf.print_reports(reports)\n            neuronperf.write_csv(reports)\n            neuronperf.write_json(reports)"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilbert-base-uncased_compile.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased\"]\nsequence_lengths = [128]\nbatch_sizes = [9]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            neuronperf.torch.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilroberta-base_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilroberta-base\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = neuronperf.torch.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            neuronperf.print_reports(reports)\n            neuronperf.write_csv(reports)\n            neuronperf.write_json(reports)"
  },
  {
    "path": "archive/src/benchmark/pytorch/distilroberta-base_compile.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf\nimport neuronperf.torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Add to these lists or change as needed\nmodel_names = [\"distilroberta-base\"]\nsequence_lengths = [128]\nbatch_sizes = [6]\npipeline_sizes = [1]\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n    paraphrase = tokenizer.encode_plus(\n        sequence_0,\n        sequence_1,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"pt\",\n    )\n    inputs = (\n        torch.cat([paraphrase[\"input_ids\"]] * batch_size, 0),\n        torch.cat([paraphrase[\"attention_mask\"]] * batch_size, 0),\n    )\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            neuronperf.torch.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/hf-google-vit_benchmark.py",
    "content": "import torch\nimport neuronperf\nimport neuronperf.torch\nimport torch_neuronx\n\nfrom PIL import Image\nimport requests\nfrom transformers import ViTImageProcessor, ViTForImageClassification\n\ndef benchmark(batch_size):\n    feature_extractor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')\n    model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224', torchscript=True)\n    model.eval()\n\n    url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw)\n    inputs = feature_extractor(images=image, return_tensors=\"pt\")\n    inputs = inputs['pixel_values'].repeat([batch_size, 1, 1, 1])\n    example = (inputs,)\n\n    traced = torch_neuronx.trace(model, example, compiler_args=\"--model-type=transformer\")\n    filename = 'model.pt'\n    torch.jit.save(traced, filename)\n    reports = neuronperf.torch.benchmark(filename, [example], batch_sizes=[batch_size])\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    neuronperf.print_reports(reports)\n    neuronperf.write_csv(reports)\n    neuronperf.write_json(reports)\n\nif __name__ == '__main__':\n    # Use batch_size = 1 for best latency, batch_size = 2 for best throughput\n    benchmark(batch_size=2)"
  },
  {
    "path": "archive/src/benchmark/pytorch/hf-openai-clip_benchmark.py",
    "content": "import torch\nimport neuronperf\nimport neuronperf.torch\nimport torch_neuronx\nimport os\n\nfrom torchvision.datasets import CIFAR100\nfrom transformers import CLIPProcessor, CLIPModel\n\ndef benchmark(model_name, batch_size):\n    # Build the model, preprocessor, and dataset\n    cifar100 = CIFAR100(root=os.path.expanduser(\"~/.cache\"), download=True, train=False)\n    processor = CLIPProcessor.from_pretrained(model_name)\n    model = CLIPModel.from_pretrained(model_name, return_dict=False)\n\n    # Prepare a sample input\n    image = cifar100[0][0]\n    text = []\n    for c in cifar100.classes:\n        text.append(f'a photo of a {c}')\n\n    inputs = processor(text=text, images=image, return_tensors=\"pt\", padding=True)\n    image = inputs['pixel_values']\n    # (b, c, h, w)\n    image = image.repeat(batch_size, 1, 1, 1)\n    inputs = (inputs['input_ids'], image)\n\n    # Trace the model\n    model.eval()\n    traced = torch_neuronx.trace(model, inputs, compiler_args='--enable-saturate-infinity')\n    filename = 'model.pt'\n    torch.jit.save(traced, filename)\n    reports = neuronperf.torch.benchmark(filename, [inputs], batch_sizes=[batch_size])\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    neuronperf.print_reports(reports)\n    neuronperf.write_csv(reports)\n    neuronperf.write_json(reports)\n\nif __name__ == '__main__':\n    # Recommended batch sizes for throughput\n    # openai/clip-vit-base-patch32: 64\n    # openai/clip-vit-large-patch14: 4\n    model_name = 'openai/clip-vit-base-patch32'\n    batch_size = 64\n    benchmark(model_name, batch_size)"
  },
  {
    "path": "archive/src/benchmark/pytorch/hf_pretrained_wav2vec2_conformer_relpos_benchmark.py",
    "content": "import torch\nimport torch_neuronx\nfrom datasets import load_dataset\nfrom transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC\nimport neuronperf as npf\nimport neuronperf.torch\n\nBATCH_SIZE = 1\ndef benchmark():\n    processor = Wav2Vec2Processor.from_pretrained(\"facebook/wav2vec2-conformer-rel-pos-large-960h-ft\")\n    model = Wav2Vec2ConformerForCTC.from_pretrained(\"facebook/wav2vec2-conformer-rel-pos-large-960h-ft\")\n    model.eval()\n\n    # take the first entry in the dataset as our input\n    ds = load_dataset(\"patrickvonplaten/librispeech_asr_dummy\", \"clean\", split=\"validation\", trust_remote_code=True)\n    inputs = processor(ds[0][\"audio\"][\"array\"], return_tensors=\"pt\", padding=\"longest\", sampling_rate=16_000).input_values\n    inputs = inputs.repeat([BATCH_SIZE, 1])\n    example = (inputs,)\n\n    traced = torch_neuronx.trace(model, example, compiler_args='--model-type=transformer')\n    filename = 'model.pt'\n    torch.jit.save(traced, filename)\n    \n    model_neuron = torch.jit.load(filename)\n    output = model_neuron(inputs)\n    print(f\"output is {output}\")\n\n    reports = neuronperf.torch.benchmark(filename, [example], multiprocess=False, batch_sizes=[BATCH_SIZE])\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    neuronperf.print_reports(reports)\n    neuronperf.write_csv(reports)\n    neuronperf.write_json(reports)\n\nif __name__ == '__main__':\n    benchmark()\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/hf_pretrained_wav2vec2_conformer_rope_benchmark.py",
    "content": "import torch\nimport torch_neuronx\nfrom datasets import load_dataset\nfrom transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC\nimport neuronperf as npf\nimport neuronperf.torch\n\nBATCH_SIZE = 1\ndef benchmark():\n    processor = Wav2Vec2Processor.from_pretrained(\"facebook/wav2vec2-conformer-rope-large-960h-ft\")\n    model = Wav2Vec2ConformerForCTC.from_pretrained(\"facebook/wav2vec2-conformer-rope-large-960h-ft\")\n    model.eval()\n\n    # take the first entry in the dataset as our input\n    ds = load_dataset(\"patrickvonplaten/librispeech_asr_dummy\", \"clean\", split=\"validation\", trust_remote_code=True)\n    inputs = processor(ds[0][\"audio\"][\"array\"], return_tensors=\"pt\", padding=\"longest\", sampling_rate=16_000).input_values\n    inputs = inputs.repeat([BATCH_SIZE, 1])\n    example = (inputs,)\n\n    traced = torch_neuronx.trace(model, example, compiler_args='--model-type=transformer')\n    filename = 'model.pt'\n    torch.jit.save(traced, filename)\n    \n    model_neuron = torch.jit.load(filename)\n    output = model_neuron(inputs)\n    print(f\"output is {output}\")\n\n    reports = neuronperf.torch.benchmark(filename, [example], multiprocess=False, batch_sizes=[BATCH_SIZE])\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    neuronperf.print_reports(reports)\n    neuronperf.write_csv(reports)\n    neuronperf.write_json(reports)\n\nif __name__ == '__main__':\n    benchmark()\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/inf2_benchmark.py",
    "content": "# primary Script used for inf2 Benchmarking\n\nimport torch\nimport neuronperf\nimport neuronperf.torch\nimport torch_neuronx\nfrom transformers import (\n    AutoModel, AutoModelForSequenceClassification  # Any other model class respective to the model we want to infer on\n)\n\nclass GPT2Neuron(torch.nn.Module):\n    def __init__(self, model) -> None:\n        super().__init__()\n        self.model = model\n\n    def forward(self, input_ids, attention_mask):\n        return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=False)\n\ndef benchmark(model_name, batch_size, sequence_length):\n    model = AutoModel.from_pretrained(model_name, torchscript=True)\n    if 'gpt2' in model_name:\n        model = GPT2Neuron(model)\n    model.eval()\n\n    example = (\n        torch.zeros(batch_size, sequence_length, dtype=torch.int),  # input_ids\n        torch.zeros(batch_size, sequence_length, dtype=torch.int),  # attention_mask\n    )\n\n    traced = torch_neuronx.trace(model, example)\n    filename = 'model.pt'\n    torch.jit.save(traced, filename)\n    reports = neuronperf.torch.benchmark(filename, [example])\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    neuronperf.print_reports(reports)\n    neuronperf.write_csv(reports)\n    neuronperf.write_json(reports)\n\nif __name__ == '__main__':\n    # benchmark(model_name, batch_size, sequence_length)\n    # Below are a few examples -\n    # benchmark('bert-base-cased', 16, 128)\n    # benchmark('bert-base-uncased', 4, 128)\n    # benchmark('gpt2', 16, 256)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/opt_benchmark.py",
    "content": "import os\nimport neuronperf as npf\nimport torch\nfrom transformers import AutoTokenizer\n\n\"\"\"\nRun the sample at this link to get the split model state_dict (opt-13b-split):\nhttps://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-13b-sampling.ipynb\n\nMake sure transformers is installed\n\nChange the variables below for opt30b or opt66b models\n\"\"\"\n\n\nBATCH_SIZE = 2\nTP_DEGREE = 2\nSEQ_LEN = 2048\nTOKENIZER = AutoTokenizer.from_pretrained(\"facebook/opt-13b\")\nMODEL_DIR = \"./opt-13b-split\"\n\n\nclass Wrapper(torch.nn.Module):\n    def __init__(self, filename):\n        super().__init__()\n        from transformers_neuronx.opt.model import OPTForSampling\n        self.neuron_model = OPTForSampling.from_pretrained(\n            filename, batch_size=BATCH_SIZE, tp_degree=TP_DEGREE, amp=\"f16\"\n        )\n        self.neuron_model.to_neuron()\n\n    def forward(self, *inputs):\n        return self.neuron_model.sample(torch.concat(inputs), sequence_length=SEQ_LEN)\n\n# Custom load to let our Wrapper class handle things\ndef load_fn(filename, **kwargs):\n    return Wrapper(filename)\n\n# NeuronPerf can't see tp_degree at the moment, so just expose all cores\ndef env_setup_fn(*_):\n    del os.environ[\"NEURON_RT_VISIBLE_CORES\"]\n\ndef preprocess_fn(inputs):\n    return [TOKENIZER.encode(text, return_tensors=\"pt\") for text in inputs]\n\ndef postprocess_fn(outputs):\n    return [TOKENIZER.decode(seq) for seq in outputs]\n\ndef benchmark():\n    inputs = [\"Hello, I'm a language model,\"] * BATCH_SIZE\n    reports = npf.benchmark(\n        load_fn,\n        MODEL_DIR,\n        [inputs],  # treat batch as 1 input and let Wrapper handle batching\n        batch_sizes=1,  # ^\n        n_models=1,  # only load 1 copy of model\n        max_infers=5,\n        max_duration=0,  # sampling can take a while, so let's not timeout\n        workers_per_model=1,  # no bottleneck on model inputs, so 1 is fine\n        env_setup_fn=env_setup_fn,\n        preprocess_fn=preprocess_fn,\n        postprocess_fn=postprocess_fn,\n    )\n    \n    # grab the only report (we only benchmarked 1 config)\n    report = reports[0]\n    \n    # let's update throughput to be tokens / second and add a new record\n    new_tokens = sum(SEQ_LEN - len(TOKENIZER.encode(i)) for i in inputs)\n    tokens_per_s = round(new_tokens / (report[\"latency_ms_avg\"] / 1000), 2)\n    report[\"throughput_avg\"] = report[\"tokens_per_s\"] = tokens_per_s\n    \n    # display and save results\n    npf.print_report(report)\n    print(f\"Results saved to: {npf.write_json(report)}\")\n\n\nif __name__ == \"__main__\":\n    benchmark()\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/perceiver-multimodal_benchmark.py",
    "content": "import base64\nimport os\nimport ssl\nimport re\nfrom urllib import request\nimport time\nimport random\nfrom tqdm import tqdm\nimport numpy as np\nimport math\n\nfrom typing import Optional, Tuple, Union\nfrom transformers import PerceiverForMultimodalAutoencoding\nfrom transformers.modeling_outputs import BaseModelOutputWithCrossAttentions\nfrom transformers.models.perceiver.modeling_perceiver import PerceiverBasicDecoder, PerceiverClassifierOutput\nfrom transformers.models.perceiver.modeling_perceiver import restructure\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E pipeline models.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n\n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n    \nclass MultimodalPerceiverWrapper(nn.Module):\n    def __init__(self, perceiver_model, nchunks, image_chunk_size, audio_chunk_size):\n        super().__init__()\n        self.perceiver_model = perceiver_model\n        self.nchunks = nchunks\n        self.image_chunk_size = image_chunk_size\n        self.audio_chunk_size = audio_chunk_size\n    \n    def forward(self, inputs: torch.FloatTensor,\n        neuron_decoder,\n        attention_mask: Optional[torch.FloatTensor] = None,\n        head_mask: Optional[torch.FloatTensor] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        return_dict: Optional[bool] = None):\n\n\n        output_attentions = output_attentions if 
output_attentions is not None else self.perceiver_model.config.output_attentions\n        output_hidden_states = (\n            output_hidden_states if output_hidden_states is not None else self.perceiver_model.config.output_hidden_states\n        )\n        return_dict = return_dict if return_dict is not None else self.perceiver_model.config.use_return_dict\n        \n        if self.perceiver_model.input_preprocessor is not None:\n            inputs, modality_sizes, inputs_without_pos = self.perceiver_model.input_preprocessor(inputs)\n        else:\n            modality_sizes = None\n            inputs_without_pos = None\n            if inputs.size()[-1] != self.perceiver_model.config.d_model:\n                raise ValueError(\n                    f\"Last dimension of the inputs: {inputs.size()[-1]} doesn't correspond to config.d_model:\"\n                    f\" {self.perceiver_model.config.d_model}. Make sure to set config.d_model appropriately.\"\n                )\n\n        batch_size, seq_length, _ = inputs.size()\n        device = inputs.device\n\n        # If no attention mask is provided, make them all ones\n        if attention_mask is None:\n            attention_mask = torch.ones((batch_size, seq_length), device=device)\n        # Make the attention mask broadcastable to [batch_size, num_heads, seq_length, seq_length]\n        extended_attention_mask = self.perceiver_model.invert_attention_mask(attention_mask)\n\n        head_mask = self.perceiver_model.get_head_mask(head_mask, self.perceiver_model.config.num_blocks * self.perceiver_model.config.num_self_attends_per_block)\n        embedding_output = self.perceiver_model.embeddings(batch_size=batch_size)\n\n        encoder_outputs = self.perceiver_model.encoder(\n            embedding_output,\n            attention_mask=None,\n            head_mask=head_mask,\n            inputs=inputs,\n            inputs_mask=extended_attention_mask,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n        )\n        sequence_output = encoder_outputs[0]\n\n        logits = None\n        reconstruction = {}\n        for chunk_idx in tqdm(range(self.nchunks)):\n            subsampled_output_points = {\n            'image': torch.arange(\n                self.image_chunk_size * chunk_idx, self.image_chunk_size * (chunk_idx + 1)).to(device),\n            'audio': torch.arange(\n                self.audio_chunk_size * chunk_idx, self.audio_chunk_size * (chunk_idx + 1)).to(device),\n            'label': None,\n            }\n            \n            logits = neuron_decoder(sequence_output, extended_attention_mask, \n                                             inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points)\n\n            reconstruction['label'] = logits['label']\n            if 'image' not in reconstruction:\n                reconstruction['image'] = logits['image']\n                reconstruction['audio'] = logits['audio']\n            else:\n                reconstruction['image'] = torch.cat(\n                    [reconstruction['image'], logits['image']], dim=1)\n                reconstruction['audio'] = torch.cat(\n                    [reconstruction['audio'], logits['audio']], dim=1)\n            \n            del logits\n\n        return reconstruction\n\ndef custom_model_forward(\n        self,\n        nchunks,\n        image_chunk_size,\n        audio_chunk_size,\n        neuron_decoder,\n        
inputs: Optional[torch.Tensor] = None,\n        attention_mask: Optional[torch.Tensor] = None,\n        head_mask: Optional[torch.Tensor] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        return_dict: Optional[bool] = None,\n    ) -> Union[Tuple, PerceiverClassifierOutput]:\n\n        return_dict = return_dict if return_dict is not None else self.config.use_return_dict\n\n        perceiver_wrapper = MultimodalPerceiverWrapper(self.perceiver, nchunks, image_chunk_size, audio_chunk_size)\n        outputs = perceiver_wrapper(\n            inputs,\n            neuron_decoder,\n            attention_mask=attention_mask,\n            head_mask=head_mask,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n        )\n        return outputs\n\n\ndef custom_decoder_query(self, inputs, modality_sizes=None, inputs_without_pos=None, subsampled_points=None):\n    if self.position_encoding_type == \"none\":  # Queries come from elsewhere\n        raise ValueError(\"You cannot construct decoder queries when position_encoding_type is set to none\")\n    if subsampled_points is not None:\n        # subsampled_points are the indices if the inputs would be flattened\n        # however, the inputs aren't flattened, that's why we use unravel_index\n        # to get the indices for the unflattened array\n        # unravel_index returns a tuple (x_idx, y_idx, ...)\n        # stack to get the [n, d] tensor of coordinates\n\n        def unravel_indices(indices, shape):\n            coord = []\n\n            for dim in reversed(shape):\n                coord.append(indices % dim)\n                indices = indices // dim\n\n            coord = torch.stack(coord[::-1], dim=-1)\n\n            return coord\n\n        pos = unravel_indices(subsampled_points, self.output_index_dims)\n\n        batch_size = inputs.shape[0]\n        # Map these coordinates to [-1, 1]\n        pos = -1 + 2 * pos / torch.tensor(self.output_index_dims)[None, :]\n        pos = torch.broadcast_to(pos[None], [batch_size, pos.shape[0], pos.shape[1]])\n        # Construct the position encoding.\n        if self.position_encoding_type == \"trainable\":\n            pos_emb = self.output_position_encodings(batch_size)\n        elif self.position_encoding_type == \"fourier\":\n            pos_emb = self.output_position_encodings(\n                self.output_index_dims, batch_size=batch_size, device=inputs.device, dtype=inputs.dtype, pos=pos\n            )\n\n        # Optionally project them to a target dimension.\n        pos_emb = self.positions_projection(pos_emb)\n        pos_emb = torch.reshape(pos_emb, [pos_emb.shape[0], -1, pos_emb.shape[-1]])\n    else:\n        batch_size = inputs.shape[0]\n        index_dims = inputs.shape[2:]\n\n        # Construct the position encoding.\n        if self.position_encoding_type == \"trainable\":\n            pos_emb = self.output_position_encodings(batch_size)\n        elif self.position_encoding_type == \"fourier\":\n            pos_emb = self.output_position_encodings(\n                index_dims, batch_size, device=inputs.device, dtype=inputs.dtype\n            )\n\n        # Optionally project them to a target dimension.\n        pos_emb = self.positions_projection(pos_emb)\n\n    if self.concat_preprocessed_input:\n        if inputs_without_pos is None:\n            raise ValueError(\"Value is required for inputs_without_pos if 
concat_preprocessed_input is True\")\n        pos_emb = torch.cat([inputs_without_pos, pos_emb], dim=-1)\n\n    return pos_emb\n\n\n# Define wrapper for tracing encoder\nclass EncoderWrapper(nn.Module):\n    def __init__(self, encoder):\n        super().__init__()\n        self.encoder = encoder\n    \n    def forward(self, embedding_output, inputs, extended_attention_mask):\n        output = self.encoder(embedding_output, inputs=inputs, inputs_mask=extended_attention_mask)\n        return output\n\nclass NeuronEncoder(nn.Module):\n    def __init__(self, encoder_wrapper):\n       super().__init__()\n       self.encoder_wrapper = encoder_wrapper\n    \n    def forward(self,\n        hidden_states: torch.Tensor,\n        attention_mask: Optional[torch.FloatTensor] = None,\n        head_mask: Optional[torch.FloatTensor] = None,\n        inputs: Optional[torch.FloatTensor] = None,\n        inputs_mask: Optional[torch.FloatTensor] = None,\n        output_attentions: Optional[bool] = False,\n        output_hidden_states: Optional[bool] = False,\n        return_dict: Optional[bool] = True):\n\n        last_hidden_states = self.encoder_wrapper(hidden_states, inputs, inputs_mask)['last_hidden_state']\n        return BaseModelOutputWithCrossAttentions(last_hidden_state=last_hidden_states)\n\n\n# Define wrapper for tracing decoder\nclass DecoderWrapper(nn.Module):\n    def __init__(self, decoder, decoder_query_audio, decoder_query_image, decoder_query_label, output_postprocessor):\n        super().__init__()\n        self.decoder = decoder\n        self.decoder_query_audio = decoder_query_audio\n        self.decoder_query_image = decoder_query_image\n        self.decoder_query_label = decoder_query_label\n        self.output_postprocessor = output_postprocessor\n        self.num_query_channels = decoder.num_query_channels\n    \n    def forward(self, z, query_mask,\n                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,\n                image_input, image_input_without_pos, image_subsampled_point, image_padding,\n                label_input, label_input_without_pos, label_padding):\n        audio_query = self.decoder_query_audio(inputs=audio_input, inputs_without_pos=audio_input_without_pos, subsampled_points=audio_subsampled_point)\n        image_query = self.decoder_query_image(inputs=image_input, inputs_without_pos=image_input_without_pos, subsampled_points=image_subsampled_point)\n        label_query = self.decoder_query_label(inputs=label_input, inputs_without_pos=label_input_without_pos)\n\n        def embed(x, pos):\n            x = torch.reshape(x, [x.shape[0], np.prod(x.shape[1:-1]), x.shape[-1]])\n            pos = torch.broadcast_to(pos, [x.shape[0], x.shape[1], self.num_query_channels - x.shape[2]])\n            return torch.cat([x, pos], dim=2)\n\n        audio_padded = embed(audio_query, audio_padding)\n        image_padded = embed(image_query, image_padding)\n        label_padded = embed(label_query, label_padding)\n\n        decoder_query = torch.cat([audio_padded, image_padded, label_padded], dim=1)\n        logits = self.decoder(decoder_query, z, query_mask).logits\n        \n        output_modality_sizes = {\"audio\": audio_subsampled_point.shape[0],\n                                 \"image\": image_subsampled_point.shape[0],\n                                 \"label\": 1}\n        logits = self.output_postprocessor(logits, modality_sizes=output_modality_sizes)\n        return logits\n\nclass NeuronDecoder(nn.Module):\n    def __init__(self, 
decoder_wrapper):\n        super().__init__()\n        self.decoder_wrapper = decoder_wrapper\n        self.modalities = decoder_wrapper.decoder.modalities\n        self.padding = decoder_wrapper.decoder.padding\n\n    def forward(self, z, query_mask, inputs, modality_sizes, inputs_without_pos=None, subsampled_points=None, output_attentions=False):\n        # Partition the flat inputs among the different modalities\n        inputs = restructure(modality_sizes, inputs)\n\n        assert(subsampled_points is not None)\n        assert(inputs_without_pos is not None)\n\n        for modality, decoder in self.modalities.items():\n            if modality == \"audio\":\n                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding = inputs[modality], inputs_without_pos[modality], subsampled_points[modality].to(torch.float32), self.padding[modality]\n            elif modality == \"image\":\n                image_input, image_input_without_pos, image_subsampled_point, image_padding = inputs[modality], inputs_without_pos[modality], subsampled_points[modality].to(torch.float32), self.padding[modality]\n            else:\n                # label doesn't have subsampled point\n                label_input, label_input_without_pos, label_padding = inputs[modality], inputs_without_pos[modality], self.padding[modality]\n\n        assert(audio_input_without_pos is not None)\n        assert(audio_subsampled_point is not None)\n        assert(image_input_without_pos is not None)\n        assert(image_subsampled_point is not None)\n        assert(label_input_without_pos is not None)\n\n        output = self.decoder_wrapper(z, query_mask, \n                                        audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,\n                                        image_input, image_input_without_pos, image_subsampled_point, image_padding,\n                                        label_input, label_input_without_pos, label_padding)\n        return output\n\n\n# -- Load compiled models --\nmodel = PerceiverForMultimodalAutoencoding.from_pretrained(\"deepmind/multimodal-perceiver\", \n                                                                  low_cpu_mem_usage=True)\n\nPerceiverForMultimodalAutoencoding.forward = custom_model_forward\nPerceiverBasicDecoder.decoder_query = custom_decoder_query\n\nCOMPILER_WORKDIR_ROOT=\"perceiver_multimodal_compile_dir\"\nCOMPILER_WORKDIR_DECODER = os.path.join(COMPILER_WORKDIR_ROOT, \"decoder\")\nCOMPILER_WORKDIR_ENCODER = os.path.join(COMPILER_WORKDIR_ROOT, \"encoder\")\n\n\n# load saved encoder from disk\nencoder_fname = os.path.join(COMPILER_WORKDIR_ENCODER, 'model.pt')\nneuron_encoder = NeuronEncoder(EncoderWrapper(model.perceiver.encoder))\nneuron_encoder.encoder_wrapper = torch.jit.load(encoder_fname)\nmodel.perceiver.encoder = neuron_encoder\n\n# load saved decoder from disk\ndecoder_fname = os.path.join(COMPILER_WORKDIR_DECODER, 'model.pt')\nneuron_decoder = NeuronDecoder(DecoderWrapper(model.perceiver.decoder, model.perceiver.decoder.modalities['audio'].decoder_query, \\\n                                              model.perceiver.decoder.modalities['image'].decoder_query, model.perceiver.decoder.modalities['label'].decoder_query, \\\n                                              model.perceiver.output_postprocessor))\nneuron_decoder.decoder_wrapper = torch.jit.load(decoder_fname)\n\n\n\n# Inference function\ndef autoencode_video(images, audio, nchunks, image_chunk_size, audio_chunk_size):\n    
input_image = torch.from_numpy(np.moveaxis(images, -1, 2)).to(torch.float32)\n    input_audio = torch.from_numpy(audio).to(torch.float32)\n    input_label = torch.zeros((images.shape[0], 700))\n\n    inputs = {'image': input_image, 'audio': input_audio, 'label':input_label}\n\n    reconstruction = {}\n    with torch.no_grad():\n        reconstruction = model(nchunks, image_chunk_size, audio_chunk_size, neuron_decoder, inputs=inputs)\n\n    # reshape image and audio modalities back to original shape\n    reconstruction['image'] = torch.reshape(reconstruction['image'], images.shape)\n    reconstruction['audio'] = torch.reshape(reconstruction['audio'], audio.shape)\n    return reconstruction\n\n# Generate random image for benchmarking\nAUDIO_SAMPLES_PER_PATCH = 16\nimage = np.random.random(size=(1, 16, 224, 224, 3))\naudio = np.random.random(size=(1, 30720, 1))\n\nnchunks = 128\nimage_chunk_size = np.prod(image.shape[1:-1]) // nchunks\naudio_chunk_size = audio.shape[1] // AUDIO_SAMPLES_PER_PATCH // nchunks\n\nn_runs = 20\nmodel_inputs = (image, audio, nchunks, image_chunk_size, audio_chunk_size)\nbenchmark(n_runs, \"perceiver-multimodal\", autoencode_video, model_inputs)"
  },
  {
    "path": "archive/src/benchmark/pytorch/perceiver-multimodal_compile.py",
    "content": "import base64\nimport os\nimport ssl\nimport re\nfrom urllib import request\nimport time\nimport random\nfrom tqdm import tqdm\nimport numpy as np\n\nfrom typing import Optional, Tuple, Union\nfrom transformers import PerceiverForMultimodalAutoencoding\nfrom transformers.modeling_outputs import BaseModelOutputWithCrossAttentions\nfrom transformers.models.perceiver.modeling_perceiver import PerceiverBasicDecoder, PerceiverClassifierOutput\nfrom transformers.models.perceiver.modeling_perceiver import restructure\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nclass MultimodalPerceiverWrapper(nn.Module):\n    def __init__(self, perceiver_model, nchunks, image_chunk_size, audio_chunk_size):\n        super().__init__()\n        self.perceiver_model = perceiver_model\n        self.nchunks = nchunks\n        self.image_chunk_size = image_chunk_size\n        self.audio_chunk_size = audio_chunk_size\n    \n    def forward(self, inputs: torch.FloatTensor,\n        neuron_decoder,\n        attention_mask: Optional[torch.FloatTensor] = None,\n        head_mask: Optional[torch.FloatTensor] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        return_dict: Optional[bool] = None):\n\n\n        output_attentions = output_attentions if output_attentions is not None else self.perceiver_model.config.output_attentions\n        output_hidden_states = (\n            output_hidden_states if output_hidden_states is not None else self.perceiver_model.config.output_hidden_states\n        )\n        return_dict = return_dict if return_dict is not None else self.perceiver_model.config.use_return_dict\n        \n        if self.perceiver_model.input_preprocessor is not None:\n            inputs, modality_sizes, inputs_without_pos = self.perceiver_model.input_preprocessor(inputs)\n        else:\n            modality_sizes = None\n            inputs_without_pos = None\n            if inputs.size()[-1] != self.perceiver_model.config.d_model:\n                raise ValueError(\n                    f\"Last dimension of the inputs: {inputs.size()[-1]} doesn't correspond to config.d_model:\"\n                    f\" {self.perceiver_model.config.d_model}. 
Make sure to set config.d_model appropriately.\"\n                )\n\n        batch_size, seq_length, _ = inputs.size()\n        device = inputs.device\n\n        # If no attention mask is provided, make them all ones\n        if attention_mask is None:\n            attention_mask = torch.ones((batch_size, seq_length), device=device)\n        # Make the attention mask broadcastable to [batch_size, num_heads, seq_length, seq_length]\n        extended_attention_mask = self.perceiver_model.invert_attention_mask(attention_mask)\n\n        head_mask = self.perceiver_model.get_head_mask(head_mask, self.perceiver_model.config.num_blocks * self.perceiver_model.config.num_self_attends_per_block)\n        embedding_output = self.perceiver_model.embeddings(batch_size=batch_size)\n\n        encoder_outputs = self.perceiver_model.encoder(\n            embedding_output,\n            attention_mask=None,\n            head_mask=head_mask,\n            inputs=inputs,\n            inputs_mask=extended_attention_mask,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n        )\n        sequence_output = encoder_outputs[0]\n\n        logits = None\n        reconstruction = {}\n        for chunk_idx in tqdm(range(self.nchunks)):\n            subsampled_output_points = {\n            'image': torch.arange(\n                self.image_chunk_size * chunk_idx, self.image_chunk_size * (chunk_idx + 1)).to(device),\n            'audio': torch.arange(\n                self.audio_chunk_size * chunk_idx, self.audio_chunk_size * (chunk_idx + 1)).to(device),\n            'label': None,\n            }\n            \n            logits = neuron_decoder(sequence_output, extended_attention_mask, \n                                             inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points)\n\n            reconstruction['label'] = logits['label']\n            if 'image' not in reconstruction:\n                reconstruction['image'] = logits['image']\n                reconstruction['audio'] = logits['audio']\n            else:\n                reconstruction['image'] = torch.cat(\n                    [reconstruction['image'], logits['image']], dim=1)\n                reconstruction['audio'] = torch.cat(\n                    [reconstruction['audio'], logits['audio']], dim=1)\n            \n            del logits\n\n        return reconstruction\n\ndef custom_model_forward(\n        self,\n        nchunks,\n        image_chunk_size,\n        audio_chunk_size,\n        neuron_decoder,\n        inputs: Optional[torch.Tensor] = None,\n        attention_mask: Optional[torch.Tensor] = None,\n        head_mask: Optional[torch.Tensor] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        return_dict: Optional[bool] = None,\n    ) -> Union[Tuple, PerceiverClassifierOutput]:\n\n        return_dict = return_dict if return_dict is not None else self.config.use_return_dict\n\n        perceiver_wrapper = MultimodalPerceiverWrapper(self.perceiver, nchunks, image_chunk_size, audio_chunk_size)\n        outputs = perceiver_wrapper(\n            inputs,\n            neuron_decoder,\n            attention_mask=attention_mask,\n            head_mask=head_mask,\n            output_attentions=output_attentions,\n            output_hidden_states=output_hidden_states,\n            return_dict=return_dict,\n        )\n        return outputs\n\n\ndef 
custom_decoder_query(self, inputs, modality_sizes=None, inputs_without_pos=None, subsampled_points=None):\n    if self.position_encoding_type == \"none\":  # Queries come from elsewhere\n        raise ValueError(\"You cannot construct decoder queries when position_encoding_type is set to none\")\n    if subsampled_points is not None:\n        # subsampled_points are the indices if the inputs would be flattened\n        # however, the inputs aren't flattened, that's why we use unravel_index\n        # to get the indices for the unflattened array\n        # unravel_index returns a tuple (x_idx, y_idx, ...)\n        # stack to get the [n, d] tensor of coordinates\n\n        def unravel_indices(indices, shape):\n            coord = []\n\n            for dim in reversed(shape):\n                coord.append(indices % dim)\n                indices = indices // dim\n\n            coord = torch.stack(coord[::-1], dim=-1)\n\n            return coord\n\n        pos = unravel_indices(subsampled_points, self.output_index_dims)\n\n        batch_size = inputs.shape[0]\n        # Map these coordinates to [-1, 1]\n        pos = -1 + 2 * pos / torch.tensor(self.output_index_dims)[None, :]\n        pos = torch.broadcast_to(pos[None], [batch_size, pos.shape[0], pos.shape[1]])\n        # Construct the position encoding.\n        if self.position_encoding_type == \"trainable\":\n            pos_emb = self.output_position_encodings(batch_size)\n        elif self.position_encoding_type == \"fourier\":\n            pos_emb = self.output_position_encodings(\n                self.output_index_dims, batch_size=batch_size, device=inputs.device, dtype=inputs.dtype, pos=pos\n            )\n\n        # Optionally project them to a target dimension.\n        pos_emb = self.positions_projection(pos_emb)\n        pos_emb = torch.reshape(pos_emb, [pos_emb.shape[0], -1, pos_emb.shape[-1]])\n    else:\n        batch_size = inputs.shape[0]\n        index_dims = inputs.shape[2:]\n\n        # Construct the position encoding.\n        if self.position_encoding_type == \"trainable\":\n            pos_emb = self.output_position_encodings(batch_size)\n        elif self.position_encoding_type == \"fourier\":\n            pos_emb = self.output_position_encodings(\n                index_dims, batch_size, device=inputs.device, dtype=inputs.dtype\n            )\n\n        # Optionally project them to a target dimension.\n        pos_emb = self.positions_projection(pos_emb)\n\n    if self.concat_preprocessed_input:\n        if inputs_without_pos is None:\n            raise ValueError(\"Value is required for inputs_without_pos if concat_preprocessed_input is True\")\n        pos_emb = torch.cat([inputs_without_pos, pos_emb], dim=-1)\n\n    return pos_emb\n\n\n# Define wrapper for tracing encoder\nclass EncoderWrapper(nn.Module):\n    def __init__(self, encoder):\n        super().__init__()\n        self.encoder = encoder\n    \n    def forward(self, embedding_output, inputs, extended_attention_mask):\n        output = self.encoder(embedding_output, inputs=inputs, inputs_mask=extended_attention_mask)\n        return output\n\nclass NeuronEncoder(nn.Module):\n    def __init__(self, encoder_wrapper):\n       super().__init__()\n       self.encoder_wrapper = encoder_wrapper\n    \n    def forward(self,\n        hidden_states: torch.Tensor,\n        attention_mask: Optional[torch.FloatTensor] = None,\n        head_mask: Optional[torch.FloatTensor] = None,\n        inputs: Optional[torch.FloatTensor] = None,\n        inputs_mask: 
Optional[torch.FloatTensor] = None,\n        output_attentions: Optional[bool] = False,\n        output_hidden_states: Optional[bool] = False,\n        return_dict: Optional[bool] = True):\n\n        last_hidden_states = self.encoder_wrapper(hidden_states, inputs, inputs_mask)['last_hidden_state']\n        return BaseModelOutputWithCrossAttentions(last_hidden_state=last_hidden_states)\n\n\n# Define wrapper for tracing decoder\nclass DecoderWrapper(nn.Module):\n    def __init__(self, decoder, decoder_query_audio, decoder_query_image, decoder_query_label, output_postprocessor):\n        super().__init__()\n        self.decoder = decoder\n        self.decoder_query_audio = decoder_query_audio\n        self.decoder_query_image = decoder_query_image\n        self.decoder_query_label = decoder_query_label\n        self.output_postprocessor = output_postprocessor\n        self.num_query_channels = decoder.num_query_channels\n    \n    def forward(self, z, query_mask,\n                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,\n                image_input, image_input_without_pos, image_subsampled_point, image_padding,\n                label_input, label_input_without_pos, label_padding):\n        audio_query = self.decoder_query_audio(inputs=audio_input, inputs_without_pos=audio_input_without_pos, subsampled_points=audio_subsampled_point)\n        image_query = self.decoder_query_image(inputs=image_input, inputs_without_pos=image_input_without_pos, subsampled_points=image_subsampled_point)\n        label_query = self.decoder_query_label(inputs=label_input, inputs_without_pos=label_input_without_pos)\n\n        def embed(x, pos):\n            x = torch.reshape(x, [x.shape[0], np.prod(x.shape[1:-1]), x.shape[-1]])\n            pos = torch.broadcast_to(pos, [x.shape[0], x.shape[1], self.num_query_channels - x.shape[2]])\n            return torch.cat([x, pos], dim=2)\n\n        audio_padded = embed(audio_query, audio_padding)\n        image_padded = embed(image_query, image_padding)\n        label_padded = embed(label_query, label_padding)\n\n        decoder_query = torch.cat([audio_padded, image_padded, label_padded], dim=1)\n        logits = self.decoder(decoder_query, z, query_mask).logits\n        \n        output_modality_sizes = {\"audio\": audio_subsampled_point.shape[0],\n                                 \"image\": image_subsampled_point.shape[0],\n                                 \"label\": 1}\n        logits = self.output_postprocessor(logits, modality_sizes=output_modality_sizes)\n        return logits\n\nclass NeuronDecoder(nn.Module):\n    def __init__(self, decoder_wrapper):\n        super().__init__()\n        self.decoder_wrapper = decoder_wrapper\n        self.modalities = decoder_wrapper.decoder.modalities\n        self.padding = decoder_wrapper.decoder.padding\n\n    def forward(self, z, query_mask, inputs, modality_sizes, inputs_without_pos=None, subsampled_points=None, output_attentions=False):\n        # Partition the flat inputs among the different modalities\n        inputs = restructure(modality_sizes, inputs)\n\n        assert(subsampled_points is not None)\n        assert(inputs_without_pos is not None)\n\n        for modality, decoder in self.modalities.items():\n            if modality == \"audio\":\n                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding = inputs[modality], inputs_without_pos[modality], subsampled_points[modality].to(torch.float32), self.padding[modality]\n            elif modality == 
\"image\":\n                image_input, image_input_without_pos, image_subsampled_point, image_padding = inputs[modality], inputs_without_pos[modality], subsampled_points[modality].to(torch.float32), self.padding[modality]\n            else:\n                # label doesn't have subsampled point\n                label_input, label_input_without_pos, label_padding = inputs[modality], inputs_without_pos[modality], self.padding[modality]\n\n        assert(audio_input_without_pos is not None)\n        assert(audio_subsampled_point is not None)\n        assert(image_input_without_pos is not None)\n        assert(image_subsampled_point is not None)\n        assert(label_input_without_pos is not None)\n\n        output = self.decoder_wrapper(z, query_mask, \n                                        audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,\n                                        image_input, image_input_without_pos, image_subsampled_point, image_padding,\n                                        label_input, label_input_without_pos, label_padding)\n        return output\n\n\nmodel = PerceiverForMultimodalAutoencoding.from_pretrained(\"deepmind/multimodal-perceiver\", \n                                                                   low_cpu_mem_usage=True)\nCOMPILER_WORKDIR_ROOT=\"perceiver_multimodal_compile_dir\"\n\nPerceiverForMultimodalAutoencoding.forward = custom_model_forward\nPerceiverBasicDecoder.decoder_query = custom_decoder_query\n\n\n# --- Compile Encoder ---\n# Define sample inputs for tracing encoder\nembedding_output = torch.randn(1, 784, 512)\nsample_inputs = torch.randn(1, 52097, 704)\nextended_attention_mask = torch.zeros(1, 1, 1, 52097)\n\n# Wrap and trace the encoder, save the traced encoder\nCOMPILER_WORKDIR_ENCODER = os.path.join(COMPILER_WORKDIR_ROOT, \"encoder\")\nneuron_encoder = NeuronEncoder(EncoderWrapper(model.perceiver.encoder))\n\n# You might see a warning from trace about unused input - these are safe to ignore.\nprint(\"Compiling Encoder...\")\nneuron_encoder.encoder_wrapper = torch_neuronx.trace(\n  neuron_encoder.encoder_wrapper,\n  (embedding_output, sample_inputs, extended_attention_mask),\n  compiler_workdir=COMPILER_WORKDIR_ENCODER,\n  compiler_args=[f\"--temp-dir={COMPILER_WORKDIR_ENCODER}\", \"--auto-cast=none\"] # --auto-cast=none is needed to avoid numerical error.\n)\n\n# Save compiled encoder\nencoder_fname = os.path.join(COMPILER_WORKDIR_ENCODER, 'model.pt')\ntorch.jit.save(neuron_encoder.encoder_wrapper, encoder_fname)\n\n\n# --- Compile Decoder ---\n# Define sample inputs for tracing decoder\nz = torch.randn(1, 784, 512)\nquery_mask = torch.zeros(1, 1, 1, 52097)\n\naudio_input = torch.randn(1, 1920, 704)\naudio_input_without_pos = torch.randn(1, 1920, 16)\naudio_subsampled_point = torch.arange(0, 15, dtype=torch.float32) # 15 = 1920/128\naudio_padding = torch.randn(1, 641)\n\nimage_input = torch.randn(1, 50176, 704)\nimage_input_without_pos = torch.randn(1, 50176, 48)\nimage_subsampled_point = torch.arange(0, 6272, dtype=torch.float32) # 6272 = 224*224*16/128\nimage_padding = torch.randn(1, 831)\n\nlabel_input = torch.randn(1, 1, 704)\nlabel_input_without_pos = torch.randn(1, 1, 700)\nlabel_padding = torch.randn(1, 2)\n\n# Wrap and trace the decoder, save the traced decoder\nCOMPILER_WORKDIR_DECODER = os.path.join(COMPILER_WORKDIR_ROOT, \"decoder\")\nneuron_decoder = NeuronDecoder(DecoderWrapper(model.perceiver.decoder, model.perceiver.decoder.modalities['audio'].decoder_query, \\\n                               
               model.perceiver.decoder.modalities['image'].decoder_query, model.perceiver.decoder.modalities['label'].decoder_query, \\\n                                              model.perceiver.output_postprocessor))\n\n# You might see a warning from trace about unused input - these are safe to ignore.\nprint(\"Compiling decoder...\")\nneuron_decoder.decoder_wrapper = torch_neuronx.trace(\n   neuron_decoder.decoder_wrapper,\n   (z, query_mask, audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,\n        image_input, image_input_without_pos, image_subsampled_point, image_padding,\n        label_input, label_input_without_pos, label_padding),\n   compiler_workdir=COMPILER_WORKDIR_DECODER,\n   compiler_args=[f\"--temp-dir={COMPILER_WORKDIR_DECODER}\", \"--auto-cast=none\"] # --auto-cast=none is needed to avoid numerical error.\n)\n\n# Save compiled decoder\ndecoder_fname = os.path.join(COMPILER_WORKDIR_DECODER, 'model.pt')\ntorch.jit.save(neuron_decoder.decoder_wrapper, decoder_fname)\n\nprint(\"Done\")"
  },
  {
    "path": "archive/src/benchmark/pytorch/perceiver-vision_benchmark.py",
    "content": "import torch\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodels_list = [\n    (\"PerceiverForImageClassificationLearned\", \"deepmind/vision-perceiver-learned\"),\n    (\"PerceiverForImageClassificationFourier\", \"deepmind/vision-perceiver-fourier\"),\n    (\"PerceiverForImageClassificationConvProcessing\", \"deepmind/vision-perceiver-conv\"),\n]\nbatch_sizes = [1]\nn_models = [1, 2]\nworkers_per_model = [1, 2] # optimized for latency or throughput\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for class_name, pretrained_name in models_list:\n        model_name = pretrained_name.split(\"/\")[1]\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Benchmark\n        print(\"Benchmarking {}\".format(filename))\n        reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model) \n\n        # View and save results\n        print(\"======== {} ========\".format(filename))\n        npf.print_reports(reports)\n        npf.write_csv(reports)\n        npf.write_json(reports)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/perceiver-vision_compile.py",
    "content": "import torch\nimport transformers  # ==4.32.0\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodels_list = [\n    (\"PerceiverForImageClassificationLearned\", \"deepmind/vision-perceiver-learned\"),\n    (\"PerceiverForImageClassificationFourier\", \"deepmind/vision-perceiver-fourier\"),\n    (\"PerceiverForImageClassificationConvProcessing\", \"deepmind/vision-perceiver-conv\"),\n]\nbatch_sizes = [1]\npipeline_sizes = [1]\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for class_name, pretrained_name in models_list:\n        model_name = pretrained_name.split(\"/\")[1]\n\n        model = getattr(transformers, class_name).from_pretrained(pretrained_name)\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Compile\n        print(\"Compiling {}\".format(filename))\n        npf.torch.compile(\n            model,\n            inputs,\n            batch_sizes=batch_sizes,\n            pipeline_sizes=pipeline_sizes,\n            filename=filename,\n            model_name=model_name,\n        )"
  },
  {
    "path": "archive/src/benchmark/pytorch/pixart_alpha_benchmark.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\nos.environ[\"NEURON_CUSTOM_SILU\"] = \"1\"\n\nimport copy\nimport diffusers\nimport math\nimport numpy as npy\nimport time\nimport torch\nimport torch_neuronx\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom diffusers import PixArtAlphaPipeline\nfrom diffusers import Transformer2DModel\nfrom IPython.display import clear_output\nfrom matplotlib import image as mpimg\nfrom matplotlib import pyplot as plt\nfrom torch import nn\n\nimport torch\nfrom torch import nn\nfrom transformers.models.t5.modeling_t5 import T5EncoderModel\nfrom diffusers import Transformer2DModel\n\n# Define datatype\nDTYPE = torch.bfloat16\n\n# Specialized benchmarking class for PixArt models.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E PixArt performance,\n# because the top-level PixArt pipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n  if not isinstance(model_inputs, tuple):\n    model_inputs = (model_inputs,)\n  \n  warmup_run = model(*model_inputs)\n\n  latency_collector = LatencyCollector()\n  # can't use register_forward_pre_hook or register_forward_hook because PixArt pipeline is not a torch.nn.Module\n  \n  for _ in range(n_runs):\n    latency_collector.pre_hook()\n    res = model(*model_inputs)\n    latency_collector.hook()\n  \n  p0_latency_ms = latency_collector.percentile(0) * 1000\n  p50_latency_ms = latency_collector.percentile(50) * 1000\n  p90_latency_ms = latency_collector.percentile(90) * 1000\n  p95_latency_ms = latency_collector.percentile(95) * 1000\n  p99_latency_ms = latency_collector.percentile(99) * 1000\n  p100_latency_ms = latency_collector.percentile(100) * 1000\n\n  report_dict = dict()\n  report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n  report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n  report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n  report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n  report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n  report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n  report = f'RESULT FOR {test_name}:'\n  for key, value in report_dict.items():\n    report += f' {key}={value}'\n  print(report)\n\nclass LatencyCollector:\n  def __init__(self):\n    self.start = None\n    self.latency_list = []\n\n  def pre_hook(self, *args):\n    self.start = time.time()\n\n  def hook(self, *args):\n    self.latency_list.append(time.time() - self.start)\n\n  def percentile(self, percent):\n    latency_list = self.latency_list\n    pos_float = len(latency_list) * percent / 100\n    max_pos = len(latency_list) - 1\n    pos_floor = min(math.floor(pos_float), max_pos)\n    pos_ceil = min(math.ceil(pos_float), max_pos)\n    latency_list = sorted(latency_list)\n    return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\nclass InferenceTextEncoderWrapper(nn.Module):\n  def __init__(self, dtype, t: T5EncoderModel, seqlen: int):\n    super().__init__()\n    self.dtype = dtype\n    self.device = t.device\n    self.t = t\n  def forward(self, text_input_ids, attention_mask=None):\n    return [self.t(text_input_ids, attention_mask)['last_hidden_state'].to(self.dtype)]\n\nclass InferenceTransformerWrapper(nn.Module):\n  def __init__(self, transformer: Transformer2DModel):\n    super().__init__()\n    
self.transformer = transformer\n    self.config = transformer.config\n    self.dtype = transformer.dtype\n    self.device = transformer.device\n  def forward(self, hidden_states, encoder_hidden_states=None, timestep=None, \n              encoder_attention_mask=None, added_cond_kwargs=None,\n              return_dict=False):\n    output = self.transformer(\n      hidden_states, \n      encoder_hidden_states, \n      timestep, \n      encoder_attention_mask)\n    return output\n\nclass SimpleWrapper(nn.Module):\n  def __init__(self, model):\n    super().__init__()\n    self.model = model\n  def forward(self, x):\n    output = self.model(x)\n    return output\n\n# --- Load all compiled models and benchmark pipeline ---\ndef get_pipe(resolution, dtype):\n  if resolution == 256:\n    transformer: Transformer2DModel = Transformer2DModel.from_pretrained(\n      \"PixArt-alpha/PixArt-XL-2-256x256\", \n      subfolder=\"transformer\", \n      torch_dtype=dtype)\n    return PixArtAlphaPipeline.from_pretrained(\n      \"PixArt-alpha/PixArt-XL-2-512x512\", \n      transformer=transformer, \n      torch_dtype=dtype)\n  elif resolution == 512:\n    return PixArtAlphaPipeline.from_pretrained(\n      \"PixArt-alpha/PixArt-XL-2-512x512\", \n      torch_dtype=dtype)\n  else:\n    raise Exception(f\"Unsupported resolution {resolution} for PixArt Alpha\")\n\nCOMPILER_WORKDIR_ROOT = 'pixart_alpha_compile_dir'\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntransformer_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'transformer/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\n# Select the desired resolution (256 or 512)\nresolution = 256\n# resolution = 512\n\npipe = get_pipe(resolution, DTYPE)\nseqlen = 120\n\n_neuronTextEncoder = InferenceTextEncoderWrapper(DTYPE, pipe.text_encoder, seqlen)\n_neuronTextEncoder.t = torch.jit.load(text_encoder_filename)\npipe.text_encoder = _neuronTextEncoder\nassert pipe._execution_device is not None\n\ndevice_ids = [0, 1]\n_neuronTransformer = InferenceTransformerWrapper(pipe.transformer)\n_neuronTransformer.transformer = torch_neuronx.DataParallel(torch.jit.load(transformer_filename), device_ids, set_dynamic_batching=False)\npipe.transformer = _neuronTransformer\npipe.vae.decoder = SimpleWrapper(torch.jit.load(decoder_filename))\npipe.vae.post_quant_conv = SimpleWrapper(torch.jit.load(post_quant_conv_filename))\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"pixart_alpha\", pipe, prompt)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/pixart_sigma_benchmark.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\nos.environ[\"NEURON_CUSTOM_SILU\"] = \"1\"\n\nimport copy\nimport diffusers\nimport math\nimport numpy as npy\nimport time\nimport torch\nimport torch_neuronx\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom diffusers import PixArtSigmaPipeline\nfrom IPython.display import clear_output\nfrom matplotlib import image as mpimg\nfrom matplotlib import pyplot as plt\nfrom torch import nn\n\nimport torch\nfrom torch import nn\nfrom transformers.models.t5.modeling_t5 import T5EncoderModel\nfrom diffusers import Transformer2DModel\n\n# Define datatype\nDTYPE = torch.bfloat16\n\n# Specialized benchmarking class for PixArt models.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E PixArt performance,\n# because the top-level PixArt pipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n  if not isinstance(model_inputs, tuple):\n    model_inputs = (model_inputs,)\n  \n  warmup_run = model(*model_inputs)\n\n  latency_collector = LatencyCollector()\n  # can't use register_forward_pre_hook or register_forward_hook because PixArt pipeline is not a torch.nn.Module\n  \n  for _ in range(n_runs):\n    latency_collector.pre_hook()\n    res = model(*model_inputs)\n    latency_collector.hook()\n  \n  p0_latency_ms = latency_collector.percentile(0) * 1000\n  p50_latency_ms = latency_collector.percentile(50) * 1000\n  p90_latency_ms = latency_collector.percentile(90) * 1000\n  p95_latency_ms = latency_collector.percentile(95) * 1000\n  p99_latency_ms = latency_collector.percentile(99) * 1000\n  p100_latency_ms = latency_collector.percentile(100) * 1000\n\n  report_dict = dict()\n  report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n  report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n  report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n  report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n  report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n  report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n  report = f'RESULT FOR {test_name}:'\n  for key, value in report_dict.items():\n    report += f' {key}={value}'\n  print(report)\n\nclass LatencyCollector:\n  def __init__(self):\n    self.start = None\n    self.latency_list = []\n\n  def pre_hook(self, *args):\n    self.start = time.time()\n\n  def hook(self, *args):\n    self.latency_list.append(time.time() - self.start)\n\n  def percentile(self, percent):\n    latency_list = self.latency_list\n    pos_float = len(latency_list) * percent / 100\n    max_pos = len(latency_list) - 1\n    pos_floor = min(math.floor(pos_float), max_pos)\n    pos_ceil = min(math.ceil(pos_float), max_pos)\n    latency_list = sorted(latency_list)\n    return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\nclass InferenceTextEncoderWrapper(nn.Module):\n  def __init__(self, dtype, t: T5EncoderModel, seqlen: int):\n    super().__init__()\n    self.dtype = dtype\n    self.device = t.device\n    self.t = t\n  def forward(self, text_input_ids, attention_mask=None):\n    return [self.t(text_input_ids, attention_mask)['last_hidden_state'].to(self.dtype)]\n\nclass InferenceTransformerWrapper(nn.Module):\n  def __init__(self, transformer: Transformer2DModel):\n    super().__init__()\n    self.transformer = transformer\n    self.config = 
transformer.config\n    self.dtype = transformer.dtype\n    self.device = transformer.device\n  def forward(self, hidden_states, encoder_hidden_states=None, timestep=None, \n              encoder_attention_mask=None, added_cond_kwargs=None,\n              return_dict=False):\n    output = self.transformer(\n      hidden_states, \n      encoder_hidden_states, \n      timestep, \n      encoder_attention_mask)\n    return output\n\nclass SimpleWrapper(nn.Module):\n  def __init__(self, model):\n    super().__init__()\n    self.model = model\n  def forward(self, x):\n    output = self.model(x)\n    return output\n\n# --- Load all compiled models and benchmark pipeline ---\ndef get_pipe(resolution, dtype):\n  if resolution == 256:\n    transformer = Transformer2DModel.from_pretrained(\n      \"PixArt-alpha/PixArt-Sigma-XL-2-256x256\", \n      subfolder='transformer', \n      torch_dtype=dtype,\n    )\n    return PixArtSigmaPipeline.from_pretrained(\n      \"PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers\",\n      transformer=transformer,\n      torch_dtype=dtype,\n    )\n  elif resolution == 512:\n    transformer = Transformer2DModel.from_pretrained(\n      \"PixArt-alpha/PixArt-Sigma-XL-2-512-MS\",\n      subfolder='transformer', \n      torch_dtype=dtype,\n    )\n    return PixArtSigmaPipeline.from_pretrained(\n      \"PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers\",\n      transformer=transformer,\n      torch_dtype=dtype,\n    )\n  else:\n    raise Exception(f\"Unsupported resolution {resolution} for PixArt Sigma\")\n\nCOMPILER_WORKDIR_ROOT = 'pixart_sigma_compile_dir'\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntransformer_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'transformer/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\n# Select the desired resolution (256 or 512)\nresolution = 256\n# resolution = 512\n\npipe = get_pipe(resolution, DTYPE)\nseqlen = 300\n\n_neuronTextEncoder = InferenceTextEncoderWrapper(DTYPE, pipe.text_encoder, seqlen)\n_neuronTextEncoder.t = torch.jit.load(text_encoder_filename)\npipe.text_encoder = _neuronTextEncoder\nassert pipe._execution_device is not None\n\ndevice_ids = [0, 1]\n_neuronTransformer = InferenceTransformerWrapper(pipe.transformer)\n_neuronTransformer.transformer = torch_neuronx.DataParallel(torch.jit.load(transformer_filename), device_ids, set_dynamic_batching=False)\npipe.transformer = _neuronTransformer\npipe.vae.decoder = SimpleWrapper(torch.jit.load(decoder_filename))\npipe.vae.post_quant_conv = SimpleWrapper(torch.jit.load(post_quant_conv_filename))\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"pixart_sigma\", pipe, prompt)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/resnet50_benchmark.py",
    "content": "import torch\nimport torch.neuron\n\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_name = \"resnet50\"\nbatch_sizes = [1, 6]\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n    filename = f\"{model_name}.json\"\n\n    # Benchmark\n    print(\"Benchmarking {}\".format(filename))\n    reports = npf.torch.benchmark(filename, inputs)\n\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    npf.print_reports(reports)\n    npf.write_csv(reports)\n    npf.write_json(reports)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/resnet50_compile.py",
    "content": "import torch\nimport torch.neuron\nimport torchvision\n\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_name = \"resnet50\"\nbatch_sizes = [1, 6]\npipeline_sizes = [1]\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    model = torchvision.models.resnet50(pretrained=True)\n    inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n    filename = f\"{model_name}.json\"\n\n    # Compile\n    print(\"Compiling {}\".format(filename))\n    npf.torch.compile(\n        model,\n        inputs,\n        batch_sizes=batch_sizes,\n        pipeline_sizes=pipeline_sizes,\n        filename=filename,\n        model_name=model_name,\n    )\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/resnet_benchmark.py",
    "content": "import torch\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_names = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]\nbatch_sizes = [1, 8, 64]\nn_models = [1, 2]\nworkers_per_model = [1, 2] # optimized for latency or throughput\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Benchmark\n        print(\"Benchmarking {}\".format(filename))\n        reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model) \n\n        # View and save results\n        print(\"======== {} ========\".format(filename))\n        npf.print_reports(reports)\n        npf.write_csv(reports)\n        npf.write_json(reports)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/resnet_compile.py",
    "content": "import torch\nimport torchvision\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_names = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]\nbatch_sizes = [1, 8, 64]\npipeline_sizes = [1]\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        model = getattr(torchvision.models, model_name)(pretrained=True)\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Compile\n        print(\"Compiling {}\".format(filename))\n        npf.torch.compile(\n            model,\n            inputs,\n            batch_sizes=batch_sizes,\n            pipeline_sizes=pipeline_sizes,\n            filename=filename,\n            model_name=model_name,\n        )"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_512_benchmark.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nimport time\nimport math\n\n# Define datatype\nDTYPE = torch.bfloat16\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, 
return_dict=False):\n        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n    \ndef decode_latents(self, latents):\n    latents = latents.to(torch.float)\n    latents = 1 / self.vae.config.scaling_factor * latents\n    image = self.vae.decode(latents).sample\n    image = (image / 2 + 0.5).clamp(0, 1)\n    image = image.cpu().permute(0, 2, 3, 1).float().numpy()\n    return image\n\nStableDiffusionPipeline.decode_latents = decode_latents\n\n# --- Load all compiled models and benchmark pipeline ---\nCOMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512'\nmodel_id = \"stabilityai/stable-diffusion-2-1-base\"\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\npipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)\n\n# Load the compiled UNet onto two neuron cores.\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\ndevice_ids = [0,1]\npipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\nclass NeuronTypeConversionWrapper(nn.Module):\n    def __init__(self, network):\n        super().__init__()\n        self.network = network\n\n    def forward(self, x):\n        return self.network(x.float())\n\n# Load other compiled models onto a single neuron core.\npipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)\npipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)\npipe.vae.decoder = NeuronTypeConversionWrapper(torch.jit.load(decoder_filename))\npipe.vae.post_quant_conv = NeuronTypeConversionWrapper(torch.jit.load(post_quant_conv_filename))\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"stable_diffusion_512\", pipe, prompt)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_512_compile.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nimport copy\nfrom diffusers import StableDiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n# Compatibility for diffusers<0.18.0\nfrom packaging import version\nimport diffusers\ndiffusers_version = version.parse(diffusers.__version__)\nuse_new_diffusers = diffusers_version >= version.parse('0.18.0')\nif use_new_diffusers:\n    from diffusers.models.attention_processor import Attention\nelse:\n    from diffusers.models.cross_attention import CrossAttention\n\n# Define datatype\nDTYPE = torch.bfloat16\n\n# Have to do this double wrapper trick to compile the unet, because\n# of the special UNet2DConditionOutput output type.\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n    \nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n\n# Optimized attention\ndef get_attention_scores(self, query, key, attn_mask):       \n    dtype = query.dtype\n\n    if self.upcast_attention:\n        query = query.float()\n        key = key.float()\n\n    # Check for square matmuls\n    if(query.size() == key.size()):\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = attention_scores.softmax(dim=1).permute(0,2,1)\n        attention_probs = attention_probs.to(dtype)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = attention_scores.softmax(dim=-1)\n        attention_probs = attention_probs.to(dtype)\n        \n    return attention_probs\n\n# In the original badbmm the bias is all zeros, so only apply scale\ndef custom_badbmm(a, b):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * 0.125\n    return scaled\n\n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512'\n\n# Model ID for SD version pipeline\nmodel_id = \"stabilityai/stable-diffusion-2-1-base\"\n\n# --- Compile UNet and save ---\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n# Replace original cross-attention module 
with custom cross-attention module for better performance\nif use_new_diffusers:\n    Attention.get_attention_scores = get_attention_scores\nelse:\n    CrossAttention.get_attention_scores = get_attention_scores\n\n# Apply double wrapper to deal with custom return type\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n# Only keep the model being compiled in RAM to minimze memory pressure\nunet = copy.deepcopy(pipe.unet.unetwrap)\ndel pipe\n\n# Compile unet - FP32\nsample_1b = torch.randn([1, 4, 64, 64], dtype=DTYPE)\ntimestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)\nexample_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n    compiler_args=[\"--model-type=unet-inference\", \"--enable-fast-loading-neuron-binaries\"]\n)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\n\n\n\n# --- Compile CLIP text encoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\ntext_encoder = copy.deepcopy(pipe.text_encoder)\ndel pipe\n\n# Apply the wrapper to deal with custom return type\ntext_encoder = NeuronTextEncoder(text_encoder)\n\n# Compile text encoder\n# This is used for indexing a lookup table in torch.nn.Embedding,\n# so using random numbers may give errors (out of range).\nemb = torch.tensor([[49406, 18376,   525,  7496, 49407,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0]])\ntext_encoder_neuron = torch_neuronx.trace(\n        text_encoder.neuron_text_encoder, \n        emb, \n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n        compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n        )\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(text_encoder_neuron)\n\n# Save the compiled text encoder\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch.jit.save(text_encoder_neuron, text_encoder_filename)\n\n# delete unused objects\ndel text_encoder\ndel text_encoder_neuron\n\n\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# Compile vae decoder\ndecoder_in = torch.randn([1, 4, 64, 64], dtype=torch.float32)\ndecoder_neuron = torch_neuronx.trace(\n    decoder, \n    decoder_in, \n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),\n    
compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(decoder_neuron)\n\n# Save the compiled vae decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\ndel decoder_neuron\n\n\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# # Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 64, 64], dtype=torch.float32)\npost_quant_conv_neuron = torch_neuronx.trace(\n    post_quant_conv, \n    post_quant_conv_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(post_quant_conv_neuron)\n\n# # Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\ndel post_quant_conv_neuron\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_768_benchmark.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nimport time\nimport math\n\n# Define datatype\nDTYPE = torch.float32\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, 
return_dict=False):\n        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n    \n    \n# --- Load all compiled models and run pipeline ---\nCOMPILER_WORKDIR_ROOT = 'sd2_compile_dir_768'\nmodel_id = \"stabilityai/stable-diffusion-2-1\"\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\npipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)\n\n# Load the compiled UNet onto two neuron cores.\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\ndevice_ids = [0,1]\npipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\n# Load other compiled models onto a single neuron core.\npipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)\npipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)\npipe.vae.decoder = torch.jit.load(decoder_filename)\npipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"stable_diffusion_768\", pipe, prompt)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_768_compile.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nimport copy\nfrom diffusers import StableDiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n# Compatibility for diffusers<0.18.0\nfrom packaging import version\nimport diffusers\ndiffusers_version = version.parse(diffusers.__version__)\nuse_new_diffusers = diffusers_version >= version.parse('0.18.0')\nif use_new_diffusers:\n    from diffusers.models.attention_processor import Attention\nelse:\n    from diffusers.models.cross_attention import CrossAttention\n\n# Define datatype\nDTYPE = torch.float32\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n    \nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n\n# Optimized attention\ndef get_attention_scores(self, query, key, attn_mask):       \n    dtype = query.dtype\n\n    if self.upcast_attention:\n        query = query.float()\n        key = key.float()\n\n    # Check for square matmuls\n    if(query.size() == key.size()):\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = attention_scores.softmax(dim=1).permute(0,2,1)\n        attention_probs = attention_probs.to(dtype)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = attention_scores.softmax(dim=-1)\n        attention_probs = attention_probs.to(dtype)\n        \n    return attention_probs\n\n# In the original badbmm the bias is all zeros, so only apply scale\ndef custom_badbmm(a, b):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * 0.125\n    return scaled\n\n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sd2_compile_dir_768'\n\n# Model ID for SD version pipeline\nmodel_id = \"stabilityai/stable-diffusion-2-1\"\n\n# --- Compile UNet and save ---\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n# Replace original cross-attention module with custom cross-attention module for better performance\nif use_new_diffusers:\n    Attention.get_attention_scores = 
get_attention_scores\nelse:\n    CrossAttention.get_attention_scores = get_attention_scores\n\n# Apply double wrapper to deal with custom return type\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n# Only keep the model being compiled in RAM to minimze memory pressure\nunet = copy.deepcopy(pipe.unet.unetwrap)\ndel pipe\n\n# Compile unet\nsample_1b = torch.randn([1, 4, 96, 96], dtype=DTYPE)\ntimestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)\nexample_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n    compiler_args=[\"--model-type=unet-inference\", \"--enable-fast-loading-neuron-binaries\"]\n)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\n\n\n\n# --- Compile CLIP text encoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\ntext_encoder = copy.deepcopy(pipe.text_encoder)\ndel pipe\n\n# Apply the wrapper to deal with custom return type\ntext_encoder = NeuronTextEncoder(text_encoder)\n\n# Compile text encoder\n# This is used for indexing a lookup table in torch.nn.Embedding,\n# so using random numbers may give errors (out of range).\nemb = torch.tensor([[49406, 18376,   525,  7496, 49407,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0]])\ntext_encoder_neuron = torch_neuronx.trace(\n        text_encoder.neuron_text_encoder, \n        emb, \n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n        compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n        )\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(text_encoder_neuron)\n\n# Save the compiled text encoder\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch.jit.save(text_encoder_neuron, text_encoder_filename)\n\n# delete unused objects\ndel text_encoder\ndel text_encoder_neuron\n\n\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# Compile vae decoder\ndecoder_in = torch.randn([1, 4, 96, 96], dtype=DTYPE)\ndecoder_neuron = torch_neuronx.trace(\n    decoder, \n    decoder_in, \n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),\n    compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(decoder_neuron)\n\n# Save the compiled vae 
decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\ndel decoder_neuron\n\n\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimize memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 96, 96], dtype=DTYPE)\npost_quant_conv_neuron = torch_neuronx.trace(\n    post_quant_conv, \n    post_quant_conv_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(post_quant_conv_neuron)\n\n# Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\ndel post_quant_conv_neuron\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_inpainting_benchmark.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch_neuronx\nimport os\nfrom diffusers import StableDiffusionInpaintPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nfrom diffusers.models.attention_processor import Attention\n\nimport argparse\nimport copy\n\ntorch.manual_seed(0)\n\ndef parse_argsuments():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--prompt', type=str, default='Face of a yellow cat, high resolution, sitting on a park bench', help=\"user input for text to image use case\")\n    parser.add_argument('--target_dir', type=str, default='./sd21_inpainting_512_neuron', help=\"directory to save neuron compield model\")\n    args=parser.parse_args()\n    return args\n\n# Have to do this double wrapper trick to compile the unet, because\n# of the special UNet2DConditionOutput output type.\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample, timestep.bfloat16().expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n# Optimized attention\ndef get_attention_scores(self, query, key, attn_mask):       \n    dtype = query.dtype\n\n    if self.upcast_attention:\n        query = query.float()\n        key = key.float()\n\n    # Check for square matmuls\n    if(query.size() == key.size()):\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0,2,1)\n        attention_probs = attention_probs.to(dtype)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)\n        attention_probs = attention_probs.to(dtype)\n        \n    return attention_probs\n\ndef custom_badbmm(a, b):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * 0.125\n    return scaled\n\ninputs=parse_argsuments()\nprint(inputs.target_dir)\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = inputs.target_dir\n\ndef trace_vae_encoder(model_id, height, width):\n    # Only keep the model being compiled in RAM to minimze memory pressure\n    pipe = 
StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n    vae_encoder = copy.deepcopy(pipe.vae.encoder)\n    del pipe\n\n    sample_input = torch.randn([1, 3, height, width])\n    vae_encoder_neuron = torch_neuronx.trace(\n            vae_encoder, \n            sample_input, \n            compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder'),\n            )\n\n    # Save the compiled text encoder\n    vae_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder/model.pt')\n    torch.jit.save(vae_encoder_neuron, vae_encoder_filename)\n\n    # delete unused objects\n    del vae_encoder\n    del vae_encoder_neuron\n\ndef trace_unet(model_id, height, width):\n    # --- Compile UNet and save ---\n    DTYPE = torch.bfloat16\n    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n    # Replace original cross-attention module with custom cross-attention module for better performance\n    Attention.get_attention_scores = get_attention_scores\n\n    # Apply double wrapper to deal with custom return type\n    pipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n    # Only keep the model being compiled in RAM to minimze memory pressure\n    unet = copy.deepcopy(pipe.unet.unetwrap)\n    del pipe\n\n    sample_1b = torch.randn([1, 9, height, width], dtype=DTYPE)\n    timestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))\n    encoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)\n    example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b\n\n    unet_neuron = torch_neuronx.trace(\n        unet,\n        example_inputs,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n        compiler_args=[\"--model-type=unet-inference\", \"--verbose=info\"],\n    )\n\n    # save compiled unet\n    unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\n    torch.jit.save(unet_neuron, unet_filename)\n\n    # delete unused objects\n    del unet\n    del unet_neuron\n    \n\ndef main():\n    \n    model_id = \"stabilityai/stable-diffusion-2-inpainting\"\n    height = 624\n    width = 936\n\n    trace_unet(model_id, height // 8, width // 8)\n    trace_vae_encoder(model_id, height, width)\n\n    # Only keep the model being compiled in RAM to minimze memory pressure\n    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n    text_encoder = copy.deepcopy(pipe.text_encoder)\n    del pipe\n    # Apply the wrapper to deal with custom return type\n    text_encoder = NeuronTextEncoder(text_encoder)\n\n    # Compile text encoder\n    # This is used for indexing a lookup table in torch.nn.Embedding,\n    # so using random numbers may give errors (out of range).\n    emb = torch.tensor([[49406, 18376,   525,  7496, 49407,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0]])\n    text_encoder_neuron = torch_neuronx.trace(\n            text_encoder.neuron_text_encoder, \n            emb, \n            
compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n            )\n\n    # Save the compiled text encoder\n    text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\n    torch.jit.save(text_encoder_neuron, text_encoder_filename)\n\n    # delete unused objects\n    del text_encoder\n    del text_encoder_neuron\n\n    # --- Compile VAE decoder and save ---\n\n    # Only keep the model being compiled in RAM to minimze memory pressure\n    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n    decoder = copy.deepcopy(pipe.vae.decoder)\n    del pipe\n\n    # Compile vae decoder\n    decoder_in = torch.randn([1, 4, height // 8, width // 8])\n    decoder_neuron = torch_neuronx.trace(\n        decoder, \n        decoder_in, \n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),\n        compiler_args=[\"--verbose\", \"info\"]\n    )\n\n    # Save the compiled vae decoder\n    decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\n    torch.jit.save(decoder_neuron, decoder_filename)\n\n    # delete unused objects\n    del decoder\n    del decoder_neuron\n    \n    # --- Compile VAE post_quant_conv and save ---\n\n    # Only keep the model being compiled in RAM to minimze memory pressure\n    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n    post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\n    del pipe\n\n    # Compile vae post_quant_conv\n    post_quant_conv_in = torch.randn([1, 4, height // 8 , width // 8])\n    post_quant_conv_neuron = torch_neuronx.trace(\n        post_quant_conv, \n        post_quant_conv_in,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n        compiler_args=[\"--verbose\", \"info\"]\n    )\n\n    # Save the compiled vae post_quant_conv\n    post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n    torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n    # delete unused objects\n    del post_quant_conv\n    del post_quant_conv_neuron\n    \n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd2_inpainting_inference.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch_neuronx\nimport os\nimport time\nfrom diffusers import StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nfrom diffusers.models.attention_processor import Attention\n\nimport threading\nimport argparse\nimport sys\nimport copy\nimport PIL\nimport math\n\ntorch.manual_seed(0)\n\ndef parse_argsuments():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--prompt', type=str, default='Face of a yellow cat, high resolution, sitting on a park bench', help=\"user input for text to image use case\")\n    parser.add_argument('--target_dir', type=str, default='./sd21_inpainting_512_neuron', help=\"directory to save neuron compield model\")\n    args=parser.parse_args()\n    return args\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\nDTYPE = torch.bfloat16\n\n# Have to do this double wrapper trick to compile the unet, because\n# of the special UNet2DConditionOutput output type.\nclass UNetWrap(nn.Module):\n    def __init__(self, 
unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, timestep_cond=None, added_cond_kwargs=None, cross_attention_kwargs=None, return_dict=False):\n        sample = self.unetwrap(sample.to(dtype=DTYPE), timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states.to(dtype=DTYPE))[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n# Optimized attention\ndef get_attention_scores(self, query, key, attn_mask):       \n    dtype = query.dtype\n\n    if self.upcast_attention:\n        query = query.float()\n        key = key.float()\n\n    # Check for square matmuls\n    if(query.size() == key.size()):\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0,2,1)\n        attention_probs = attention_probs.to(dtype)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2)\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)\n        attention_probs = attention_probs.to(dtype)\n        \n    return attention_probs\n\ndef custom_badbmm(a, b):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * 0.125\n    return scaled\n\n\ndef main():\n    \n    inputs=parse_argsuments()\n    print(inputs.target_dir)\n\n    # For saving compiler artifacts\n    COMPILER_WORKDIR_ROOT = inputs.target_dir\n\n    model_id = \"stabilityai/stable-diffusion-2-inpainting\"\n    \n    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n    \n    text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\n    unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\n    vae_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder/model.pt')\n    decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\n    post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\n    # Load the compiled UNet onto two neuron cores.\n    pipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n    device_ids = [0,1]\n    pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\n    # Load other compiled models onto a single neuron core.\n    pipe.text_encoder = 
NeuronTextEncoder(pipe.text_encoder)\n    pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)\n    pipe.vae.encoder = torch.jit.load(vae_encoder_filename)\n    pipe.vae.decoder = torch.jit.load(decoder_filename)\n    pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\n    \n    height = 624\n    width = 936\n    base_image = PIL.Image.open('sd2_inpainting_photo.png')\n    mask = PIL.Image.open('sd2_inpainting_mask.png')\n    image = pipe(prompt=inputs.prompt, image=base_image, mask_image=mask, height=height, width=width).images[0]\n    image.save(\"sd2_inpainting_output.png\")\n    \n    n_runs = 10\n    benchmark(n_runs, \"stable_diffusion_inpainting\", pipe, (inputs.prompt, base_image, mask, None, height, width))\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd_15_512_benchmark.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport copy\nimport time\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import StableDiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nimport time\nimport math\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n    \nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):\n        sample = 
self.unetwrap(sample, timestep.float().expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = torch.float32\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n\nclass NeuronSafetyModelWrap(nn.Module):\n    def __init__(self, safety_model):\n        super().__init__()\n        self.safety_model = safety_model\n\n    def forward(self, clip_inputs):\n        return list(self.safety_model(clip_inputs).values())\n\n\n\n# # For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sd_1_5_fp32_512_compile_workdir'\n\n# Model ID for SD version pipeline\nmodel_id = \"runwayml/stable-diffusion-v1-5\"\n\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\nsafety_model_neuron_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model/model.pt')\n\n\n# Load the compiled UNet onto two neuron cores.\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\ndevice_ids = [0,1]\npipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\n# Load other compiled models onto a single neuron core.\npipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)\npipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)\npipe.vae.decoder = torch.jit.load(decoder_filename)\npipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\npipe.safety_checker.vision_model = NeuronSafetyModelWrap(torch.jit.load(safety_model_neuron_filename))\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"stable_diffusion_15_512\", pipe, prompt)"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd_15_512_compile.py",
    "content": "import os\nos.environ[\"NEURON_FUSE_SOFTMAX\"] = \"1\"\n\nimport copy\nimport time\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import StableDiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\n# Compatibility for diffusers<0.18.0\nfrom packaging import version\nimport diffusers\ndiffusers_version = version.parse(diffusers.__version__)\nuse_new_diffusers = diffusers_version >= version.parse('0.18.0')\nif use_new_diffusers:\n    from diffusers.models.attention_processor import Attention\nelse:\n    from diffusers.models.cross_attention import CrossAttention\n\n\ndef get_attention_scores(self, query, key, attn_mask):    \n    dtype = query.dtype\n\n    if self.upcast_attention:\n        query = query.float()\n        key = key.float()\n\n    if(query.size() == key.size()):\n        attention_scores = cust_badbmm(\n            key,\n            query.transpose(-1, -2),\n            self.scale\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0,2,1)\n        attention_probs = attention_probs.to(dtype)\n\n    else:\n        attention_scores = cust_badbmm(\n            query,\n            key.transpose(-1, -2),\n            self.scale\n        )\n\n        if self.upcast_softmax:\n            attention_scores = attention_scores.float()\n\n        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)\n        attention_probs = attention_probs.to(dtype)\n        \n    return attention_probs\n\ndef cust_badbmm(a, b, scale):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * scale\n    return scaled\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):\n        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)\n        return out_tuple\n    \nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):\n        sample = self.unetwrap(sample, timestep.float().expand((sample.shape[0],)), encoder_hidden_states)[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = torch.float32\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask = None):\n        return [self.neuron_text_encoder(emb)['last_hidden_state']]\n\n\nclass NeuronSafetyModelWrap(nn.Module):\n    def __init__(self, safety_model):\n        super().__init__()\n        self.safety_model = safety_model\n\n    def forward(self, clip_inputs):\n        return list(self.safety_model(clip_inputs).values())\n\n\n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sd_1_5_fp32_512_compile_workdir'\n\n# Model ID for SD version pipeline\nmodel_id = \"runwayml/stable-diffusion-v1-5\"\n\n\n# --- Compile CLIP text encoder and save ---\n\n# Only keep the model 
being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\ntext_encoder = copy.deepcopy(pipe.text_encoder)\ndel pipe\n\n# Apply the wrapper to deal with custom return type\ntext_encoder = NeuronTextEncoder(text_encoder)\n\n# Compile text encoder\n# This is used for indexing a lookup table in torch.nn.Embedding,\n# so using random numbers may give errors (out of range).\nemb = torch.tensor([[49406, 18376,   525,  7496, 49407,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0]])\n\nwith torch.no_grad():\n    start_time = time.time()\n    text_encoder_neuron = torch_neuronx.trace(\n            text_encoder.neuron_text_encoder, \n            emb, \n            compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n            compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n            )\n    text_encoder_neuron_compile_time = time.time() - start_time\n    print('text_encoder_neuron_compile_time:', text_encoder_neuron_compile_time)\n\n# Save the compiled text encoder\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch_neuronx.async_load(text_encoder_neuron)\ntorch.jit.save(text_encoder_neuron, text_encoder_filename)\n\n# delete unused objects\ndel text_encoder\ndel text_encoder_neuron\ndel emb\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# Compile vae decoder\ndecoder_in = torch.randn([1, 4, 64, 64])\nwith torch.no_grad():\n    start_time = time.time()\n    decoder_neuron = torch_neuronx.trace(\n        decoder, \n        decoder_in, \n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),\n        compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n    )\n    vae_decoder_compile_time = time.time() - start_time\n    print('vae_decoder_compile_time:', vae_decoder_compile_time)\n\n# Save the compiled vae decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch_neuronx.async_load(decoder_neuron)\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\ndel decoder_in\ndel decoder_neuron\n\n# --- Compile UNet and save ---\n\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n\n# Replace original cross-attention module with custom cross-attention module for better performance\nif use_new_diffusers:\n    Attention.get_attention_scores = get_attention_scores\nelse:\n    CrossAttention.get_attention_scores = get_attention_scores\n\n# Apply double wrapper to deal with custom return type\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n# Only keep the model being compiled in RAM to minimze memory pressure\nunet = copy.deepcopy(pipe.unet.unetwrap)\ndel pipe\n\n# Compile unet - FP32\nsample_1b = torch.randn([1, 4, 64, 64])\ntimestep_1b = 
torch.tensor(999).float().expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 768])\nexample_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b\n\nwith torch.no_grad():\n    start_time = time.time()\n    unet_neuron = torch_neuronx.trace(\n        unet,\n        example_inputs,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n        compiler_args=[\"--model-type=unet-inference\", \"--enable-fast-loading-neuron-binaries\"]\n    )\n    unet_compile_time = time.time() - start_time\n    print('unet_compile_time:', unet_compile_time)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\ndel sample_1b\ndel timestep_1b\ndel encoder_hidden_states_1b\n\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 64, 64])\nwith torch.no_grad():\n    start_time = time.time()\n    post_quant_conv_neuron = torch_neuronx.trace(\n        post_quant_conv, \n        post_quant_conv_in,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n        compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n    )\n    vae_post_quant_conv_compile_time = time.time() - start_time\n    print('vae_post_quant_conv_compile_time:', vae_post_quant_conv_compile_time)\n\n# Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch_neuronx.async_load(post_quant_conv_neuron)\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\n\n\n\n# --- Compile safety checker and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\nsafety_model = copy.deepcopy(pipe.safety_checker.vision_model)\ndel pipe\n\nclip_input = torch.randn([1, 3, 224, 224])\nwith torch.no_grad():\n    start_time = time.time()\n    safety_model = torch_neuronx.trace(\n        safety_model, \n        clip_input,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model'),\n        compiler_args=[\"--enable-fast-loading-neuron-binaries\"]\n    )\n    safety_model_compile_time = time.time() - start_time\n    print('safety_model_compile_time:', safety_model_compile_time)\n\n# Save the compiled safety checker\nsafety_model_neuron_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model/model.pt')\ntorch_neuronx.async_load(safety_model)\ntorch.jit.save(safety_model, safety_model_neuron_filename)\n\n# delete unused objects\ndel safety_model\n\nprint('Total compile time:', text_encoder_neuron_compile_time + vae_decoder_compile_time + unet_compile_time + vae_post_quant_conv_compile_time + safety_model_compile_time)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd_4x_upscaler_benchmark.py",
    "content": "import os\n\nimport time\nimport requests\nimport copy\nimport math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch_neuronx\nimport numpy as np\n\nfrom PIL import Image\nfrom io import BytesIO\n\nimport diffusers\nfrom diffusers import StableDiffusionUpscalePipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(\n        self,\n        sample,\n        timestep,\n        encoder_hidden_states,\n        class_labels,\n        cross_attention_kwargs=None,\n    ):\n        out_tuple = self.unet(\n            sample, timestep, encoder_hidden_states, class_labels, return_dict=False\n        )\n        return out_tuple\n\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(\n        self,\n        sample,\n        timestep,\n        encoder_hidden_states,\n        class_labels,\n        cross_attention_kwargs=None,\n        return_dict=False,\n    ):\n        sample = self.unetwrap(\n            sample,\n            timestep.float().expand((sample.shape[0],)),\n            encoder_hidden_states,\n            class_labels,\n        )[0]\n        return UNet2DConditionOutput(sample=sample)\n\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask=None):\n        return [self.neuron_text_encoder(emb)[\"last_hidden_state\"]]\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency 
P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\n\n\n\n# --- Load all compiled models ---\nCOMPILER_WORKDIR_ROOT = 'stable_diffusion_upscaler_fp32'\nmodel_id = \"stabilityai/stable-diffusion-x4-upscaler\"\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\npipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n\n# Load the compiled UNet onto two neuron cores.\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\ndevice_ids = [0,1]\npipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\n# Load other compiled models onto a single neuron core.\npipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)\npipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)\npipe.vae.decoder = torch.jit.load(decoder_filename)\npipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\n\n# Run pipeline\nprompt = [\"a white cat\"]\nurl = \"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png\"\nresponse = requests.get(url)\nlow_res_img = Image.open(BytesIO(response.content)).convert(\"RGB\")\nlow_res_img = low_res_img.resize((128, 128))\nupscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]\nos.makedirs(\"misc\", exist_ok=True)\nupscaled_image.save(\"upsampled_cat.png\")\n\n# Benchmark\nn_runs = 20\nbenchmark(n_runs, \"stable_diffusion_512\", pipe, (prompt, low_res_img))\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sd_4x_upscaler_compile.py",
    "content": "import os\n\nimport requests\nimport copy\nimport math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch_neuronx\n\nfrom PIL import Image\nfrom io import BytesIO\n\nimport diffusers\nfrom diffusers import StableDiffusionUpscalePipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nfrom packaging import version\n\ndef apply_neuron_attn_override(\n    diffusers_pkg, get_attn_scores_func, neuron_scaled_dot_product_attention\n):\n    diffusers_version = version.parse(diffusers_pkg.__version__)\n    use_new_diffusers = diffusers_version >= version.parse(\"0.18.0\")\n    if use_new_diffusers:\n        diffusers_pkg.models.attention_processor.Attention.get_attention_scores = (\n            get_attn_scores_func\n        )\n    else:\n        diffusers_pkg.models.cross_attention.CrossAttention.get_attention_scores = (\n            get_attn_scores_func\n        )\n\n    # If Pytorch 2 is available, a F.scaled_dot_product_attention will be used, so we need to\n    # monkey patch that too to be Neuron optimized attention\n    if hasattr(F, \"scaled_dot_product_attention\"):\n        F.scaled_dot_product_attention = neuron_scaled_dot_product_attention\n\n\ndef get_attention_scores_neuron(self, query, key, attn_mask):\n    if query.size() == key.size():\n        attention_scores = cust_badbmm(key, query.transpose(-1, -2), self.scale)\n        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)\n\n    else:\n        attention_scores = cust_badbmm(query, key.transpose(-1, -2), self.scale)\n        attention_probs = attention_scores.softmax(dim=-1)\n\n    return attention_probs\n\n\ndef cust_badbmm(a, b, scale):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * scale\n    return scaled\n\n\ndef neuron_scaled_dot_product_attention(\n    query, key, value, attn_mask=None, dropout_p=None, is_causal=None\n):\n    orig_shape = None\n    if len(query.shape) == 4:\n        orig_shape = query.shape\n\n        def to3d(x):\n            return x.reshape(-1, x.shape[2], x.shape[3])\n\n        query, key, value = map(to3d, [query, key, value])\n\n    if query.size() == key.size():\n        attention_scores = torch.bmm(key, query.transpose(-1, -2)) * (\n            1 / math.sqrt(query.size(-1))\n        )\n        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)\n\n    else:\n        attention_scores = torch.bmm(query, key.transpose(-1, -2)) * (\n            1 / math.sqrt(query.size(-1))\n        )\n        attention_probs = attention_scores.softmax(dim=-1)\n\n    attn_out = torch.bmm(attention_probs, value)\n\n    if orig_shape:\n        attn_out = attn_out.reshape(\n            orig_shape[0], orig_shape[1], attn_out.shape[1], attn_out.shape[2]\n        )\n\n    return attn_out\n\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(\n        self,\n        sample,\n        timestep,\n        encoder_hidden_states,\n        class_labels,\n        cross_attention_kwargs=None,\n    ):\n        out_tuple = self.unet(\n            sample, timestep, encoder_hidden_states, class_labels, return_dict=False\n        )\n        return out_tuple\n\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.device = unetwrap.unet.device\n\n    def forward(\n        
self,\n        sample,\n        timestep,\n        encoder_hidden_states,\n        class_labels,\n        cross_attention_kwargs=None,\n        return_dict=False,\n    ):\n        sample = self.unetwrap(\n            sample,\n            timestep.float().expand((sample.shape[0],)),\n            encoder_hidden_states,\n            class_labels,\n        )[0]\n        return UNet2DConditionOutput(sample=sample)\n\n\nclass NeuronTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.neuron_text_encoder = text_encoder\n        self.config = text_encoder.config\n        self.dtype = text_encoder.dtype\n        self.device = text_encoder.device\n\n    def forward(self, emb, attention_mask=None):\n        return [self.neuron_text_encoder(emb)[\"last_hidden_state\"]]\n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'stable_diffusion_upscaler_fp32'\n\n# Model ID for SD version pipeline\nmodel_id = \"stabilityai/stable-diffusion-x4-upscaler\"\n\n# --- Compile CLIP text encoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionUpscalePipeline.from_pretrained(\nmodel_id, torch_dtype=torch.float32\n)\ntext_encoder = copy.deepcopy(pipe.text_encoder)\ndel pipe\n\n# Apply the wrapper to deal with custom return type\ntext_encoder = NeuronTextEncoder(text_encoder)\n\n# Compile text encoder\n# This is used for indexing a lookup table in torch.nn.Embedding,\n# so using random numbers may give errors (out of range).\nemb = torch.tensor([[49406, 18376,   525,  7496, 49407,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n        0,     0,     0,     0,     0,     0,     0]])\n\ntext_encoder_neuron = torch_neuronx.trace(\n        text_encoder.neuron_text_encoder,\n        emb,\n        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n        )\n\n# Save the compiled text encoder\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch.jit.save(text_encoder_neuron, text_encoder_filename)\n\n# delete unused objects\ndel text_encoder\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = StableDiffusionUpscalePipeline.from_pretrained(\nmodel_id, torch_dtype=torch.float32\n)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# # Compile vae decoder\ndecoder_in = torch.randn([1, 4, 128, 128])\ndecoder_neuron = torch_neuronx.trace(\n    decoder,\n    decoder_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),\n)\n\n# Save the compiled vae decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\n\n# --- Compile UNet and save ---\n\npipe = StableDiffusionUpscalePipeline.from_pretrained(\nmodel_id, torch_dtype=torch.float32\n)\n\n# Replace original cross-attention module with custom cross-attention module for better performance\napply_neuron_attn_override(\ndiffusers, get_attention_scores_neuron, 
neuron_scaled_dot_product_attention\n)\n\n# Apply double wrapper to deal with custom return type\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n# Only keep the model being compiled in RAM to minimize memory pressure\nunet = copy.deepcopy(pipe.unet.unetwrap)\ndel pipe\n\n# Compile unet - FP32\nsample_1b = torch.randn([1, 7, 128, 128])\ntimestep_1b = torch.tensor(999).float().expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 1024])\nclass_labels = torch.tensor([20])\nexample_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b, class_labels\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n    compiler_args=[\"--model-type=unet-inference\"]\n)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimize memory pressure\npipe = StableDiffusionUpscalePipeline.from_pretrained(\nmodel_id, torch_dtype=torch.float32\n)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 128, 128])\npost_quant_conv_neuron = torch_neuronx.trace(\n    post_quant_conv,\n    post_quant_conv_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n)\n\n\n# Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sdxl_base_1024_benchmark.py",
    "content": "import os\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import DiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\nfrom transformers.models.clip.modeling_clip import CLIPTextModelOutput\n\nimport time\nimport math\n\n# Define datatype\nDTYPE = torch.float32\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n \n    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):\n        out_tuple = self.unet(sample,\n                              timestep,\n                              encoder_hidden_states,\n                              added_cond_kwargs={\"text_embeds\": text_embeds, \"time_ids\": time_ids},\n                              return_dict=False)\n        return out_tuple\n    \n    \nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = 
unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.add_embedding = unetwrap.unet.add_embedding\n        self.device = unetwrap.unet.device\n \n    def forward(self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample,\n                               timestep.to(dtype=DTYPE).expand((sample.shape[0],)),\n                               encoder_hidden_states,\n                               added_cond_kwargs[\"text_embeds\"],\n                               added_cond_kwargs[\"time_ids\"])[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass TextEncoderOutputWrapper(nn.Module):\n    def __init__(self, traceable_text_encoder, original_text_encoder):\n        super().__init__()\n        self.traceable_text_encoder = traceable_text_encoder\n        self.config = original_text_encoder.config\n        self.dtype = original_text_encoder.dtype\n        self.device = original_text_encoder.device\n\n    def forward(self, text_input_ids, output_hidden_states=True):\n        out_tuple = self.traceable_text_encoder(text_input_ids)\n        return CLIPTextModelOutput(text_embeds=out_tuple[0], last_hidden_state=out_tuple[1], hidden_states=out_tuple[2])\n    \nclass TraceableTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.text_encoder = text_encoder\n\n    def forward(self, text_input_ids):\n        out_tuple = self.text_encoder(text_input_ids, output_hidden_states=True, return_dict=False)\n        return out_tuple\n\n\n    \n# --- Load all compiled models and run pipeline ---\nCOMPILER_WORKDIR_ROOT = 'sdxl_base_compile_dir_1024'\nmodel_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntext_encoder_2_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n# Load the compiled UNet onto two neuron cores.\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\ndevice_ids = [0,1]\npipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)\n\n# Load other compiled models onto a single neuron core.\npipe.vae.decoder = torch.jit.load(decoder_filename)\npipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\npipe.text_encoder = 
TextEncoderOutputWrapper(torch.jit.load(text_encoder_filename), pipe.text_encoder)\npipe.text_encoder_2 = TextEncoderOutputWrapper(torch.jit.load(text_encoder_2_filename), pipe.text_encoder_2)\n\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\nn_runs = 20\nbenchmark(n_runs, \"stable_diffusion_1024\", pipe, prompt)\n\n\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sdxl_base_1024_compile.py",
    "content": "import os\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch_neuronx\n\nimport math\nimport copy\nimport diffusers\nfrom diffusers import DiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\nfrom diffusers.models.attention_processor import Attention\nfrom transformers.models.clip.modeling_clip import CLIPTextModelOutput\n\nfrom packaging import version\n\ndef apply_neuron_attn_override(\n    diffusers_pkg, get_attn_scores_func, neuron_scaled_dot_product_attention\n):\n    diffusers_version = version.parse(diffusers_pkg.__version__)\n    use_new_diffusers = diffusers_version >= version.parse(\"0.18.0\")\n    if use_new_diffusers:\n        diffusers_pkg.models.attention_processor.Attention.get_attention_scores = (\n            get_attn_scores_func\n        )\n    else:\n        diffusers_pkg.models.cross_attention.CrossAttention.get_attention_scores = (\n            get_attn_scores_func\n        )\n\n    # If Pytorch 2 is available, a F.scaled_dot_product_attention will be used, so we need to\n    # monkey patch that too to be Neuron optimized attention\n    if hasattr(F, \"scaled_dot_product_attention\"):\n        F.scaled_dot_product_attention = neuron_scaled_dot_product_attention\n\n# Define datatype\nDTYPE = torch.float32\n\n# Optimized attention\ndef get_attention_scores_neuron(self, query, key, attn_mask):    \n    if query.size() == key.size():\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2),\n            self.scale\n        )\n        attention_probs = attention_scores.softmax(dim=1).permute(0,2,1)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2),\n            self.scale\n        )\n        attention_probs = attention_scores.softmax(dim=-1)\n  \n    return attention_probs\n \ndef custom_badbmm(a, b, scale):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * scale\n    return scaled\n \ndef neuron_scaled_dot_product_attention(\n    query, key, value, attn_mask=None, dropout_p=None, is_causal=None\n):\n    orig_shape = None\n    if len(query.shape) == 4:\n        orig_shape = query.shape\n\n        def to3d(x):\n            return x.reshape(-1, x.shape[2], x.shape[3])\n\n        query, key, value = map(to3d, [query, key, value])\n\n    if query.size() == key.size():\n        attention_scores = torch.bmm(key, query.transpose(-1, -2)) * (\n            1 / math.sqrt(query.size(-1))\n        )\n        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)\n\n    else:\n        attention_scores = torch.bmm(query, key.transpose(-1, -2)) * (\n            1 / math.sqrt(query.size(-1))\n        )\n        attention_probs = attention_scores.softmax(dim=-1)\n\n    attn_out = torch.bmm(attention_probs, value)\n\n    if orig_shape:\n        attn_out = attn_out.reshape(\n            orig_shape[0], orig_shape[1], attn_out.shape[1], attn_out.shape[2]\n        )\n\n    return attn_out\n\n# Replace original cross-attention module with custom cross-attention module for better performance\napply_neuron_attn_override(\n    diffusers, get_attention_scores_neuron, neuron_scaled_dot_product_attention\n)\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n\n    def forward(\n        self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None\n    ):\n        out_tuple = self.unet(\n            sample,\n            
timestep,\n            encoder_hidden_states,\n            added_cond_kwargs={\"text_embeds\": text_embeds, \"time_ids\": time_ids},\n            return_dict=False,\n        )\n        return out_tuple\n\n\nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.add_embedding = unetwrap.unet.add_embedding\n        self.device = unetwrap.unet.device\n\n    def forward(\n        self,\n        sample,\n        timestep,\n        encoder_hidden_states,\n        added_cond_kwargs=None,\n        return_dict=False,\n        cross_attention_kwargs=None,\n    ):\n        sample = self.unetwrap(\n            sample,\n            timestep.float().expand((sample.shape[0],)),\n            encoder_hidden_states,\n            added_cond_kwargs[\"text_embeds\"],\n            added_cond_kwargs[\"time_ids\"],\n        )[0]\n        return UNet2DConditionOutput(sample=sample)\n\nclass TextEncoderOutputWrapper(nn.Module):\n    def __init__(self, traceable_text_encoder, original_text_encoder):\n        super().__init__()\n        self.traceable_text_encoder = traceable_text_encoder\n        self.config = original_text_encoder.config\n        self.dtype = original_text_encoder.dtype\n        self.device = original_text_encoder.device\n\n    def forward(self, text_input_ids, output_hidden_states=True):\n        out_tuple = self.traceable_text_encoder(text_input_ids)\n        return CLIPTextModelOutput(text_embeds=out_tuple[0], last_hidden_state=out_tuple[1], hidden_states=out_tuple[2])\n    \nclass TraceableTextEncoder(nn.Module):\n    def __init__(self, text_encoder):\n        super().__init__()\n        self.text_encoder = text_encoder\n\n    def forward(self, text_input_ids):\n        out_tuple = self.text_encoder(text_input_ids, output_hidden_states=True, return_dict=False)\n        return out_tuple\n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sdxl_base_compile_dir_1024'\n\n# Model ID for SD XL version pipeline\nmodel_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\n\n\n# --- Compile Text Encoders and save ---\n\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n\n# Apply wrappers to make text encoders traceable\ntraceable_text_encoder = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder))\ntraceable_text_encoder_2 = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder_2))\n\ndel pipe\n\ntext_input_ids_1 = torch.tensor([[49406,   736,  1615, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n         49407, 49407, 49407, 49407, 49407, 49407, 49407]])\n\n\ntext_input_ids_2 = torch.tensor([[49406,   736,  1615, 49407,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,  
   0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n             0,     0,     0,     0,     0,     0,     0]])\n\n\n# Text Encoder 1\nneuron_text_encoder = torch_neuronx.trace(\n    traceable_text_encoder,\n    text_input_ids_1,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n)\n\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch.jit.save(neuron_text_encoder, text_encoder_filename)\n\n\n# Text Encoder 2\nneuron_text_encoder_2 = torch_neuronx.trace(\n    traceable_text_encoder_2,\n    text_input_ids_2,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2'),\n)\n\ntext_encoder_2_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2/model.pt')\ntorch.jit.save(neuron_text_encoder_2, text_encoder_2_filename)\n\n\n\n# --- Compile Text Encoders and save ---\n\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)\n\n# Apply wrappers to make text encoders traceable\ntraceable_text_encoder = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder))\ntraceable_text_encoder_2 = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder_2))\n\ndel pipe\n\ntext_input_ids_1 = torch.tensor([[49406,   736,  1615, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,\n        49407, 49407, 49407, 49407, 49407, 49407, 49407]])\n\n\ntext_input_ids_2 = torch.tensor([[49406,   736,  1615, 49407,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n            0,     0,     0,     0,     0,     0,     0]])\n\n\n# Text Encoder 1\nneuron_text_encoder = torch_neuronx.trace(\n    traceable_text_encoder,\n    text_input_ids_1,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),\n)\n\ntext_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')\ntorch.jit.save(neuron_text_encoder, text_encoder_filename)\n\n# Text Encoder 2\nneuron_text_encoder_2 = torch_neuronx.trace(\n    traceable_text_encoder_2,\n    text_input_ids_2,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2'),\n)\n\ntext_encoder_2_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2/model.pt')\ntorch.jit.save(neuron_text_encoder_2, text_encoder_2_filename)\n\n\n\n# --- Compile UNet and save ---\n\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\n\n\n# Replace original cross-attention module with custom cross-attention module for better 
performance\nAttention.get_attention_scores = get_attention_scores_neuron\n\n# Apply double wrapper to deal with custom return type\npipe.unet = NeuronUNet(UNetWrap(pipe.unet))\n\n# Only keep the model being compiled in RAM to minimze memory pressure\nunet = copy.deepcopy(pipe.unet.unetwrap)\ndel pipe\n\n# Compile unet - FP32\nsample_1b = torch.randn([1, 4, 128, 128], dtype=DTYPE)\ntimestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 2048], dtype=DTYPE)\nadded_cond_kwargs_1b = {\"text_embeds\": torch.randn([1, 1280], dtype=DTYPE),\n                        \"time_ids\": torch.randn([1, 6], dtype=DTYPE)}\nexample_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b, added_cond_kwargs_1b[\"text_embeds\"], added_cond_kwargs_1b[\"time_ids\"],)\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),\n    compiler_args=[\"--model-type=unet-inference\"]\n)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\n\n\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# Compile vae decoder\ndecoder_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)\ndecoder_neuron = torch_neuronx.trace(\n    decoder, \n    decoder_in, \n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder')\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(decoder_neuron)\n\n# Save the compiled vae decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\ndel decoder_neuron\n\n\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimze memory pressure\npipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)\npost_quant_conv_neuron = torch_neuronx.trace(\n    post_quant_conv, \n    post_quant_conv_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(post_quant_conv_neuron)\n\n# Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\ndel post_quant_conv_neuron\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sdxl_base_and_refiner_1024_benchmark.py",
    "content": "import os\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nfrom diffusers import DiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\n\nimport time\nimport math\n\n# Define datatype\nDTYPE = torch.float32\n\n# Specialized benchmarking class for stable diffusion.\n# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,\n# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.\n# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a\n# traced Torchscript.\ndef benchmark(n_runs, test_name, model, model_inputs):\n    if not isinstance(model_inputs, tuple):\n        model_inputs = (model_inputs,)\n    \n    warmup_run = model(*model_inputs)\n\n    latency_collector = LatencyCollector()\n    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module\n    \n    for _ in range(n_runs):\n        latency_collector.pre_hook()\n        res = model(*model_inputs)\n        latency_collector.hook()\n    \n    p0_latency_ms = latency_collector.percentile(0) * 1000\n    p50_latency_ms = latency_collector.percentile(50) * 1000\n    p90_latency_ms = latency_collector.percentile(90) * 1000\n    p95_latency_ms = latency_collector.percentile(95) * 1000\n    p99_latency_ms = latency_collector.percentile(99) * 1000\n    p100_latency_ms = latency_collector.percentile(100) * 1000\n\n    report_dict = dict()\n    report_dict[\"Latency P0\"] = f'{p0_latency_ms:.1f}'\n    report_dict[\"Latency P50\"]=f'{p50_latency_ms:.1f}'\n    report_dict[\"Latency P90\"]=f'{p90_latency_ms:.1f}'\n    report_dict[\"Latency P95\"]=f'{p95_latency_ms:.1f}'\n    report_dict[\"Latency P99\"]=f'{p99_latency_ms:.1f}'\n    report_dict[\"Latency P100\"]=f'{p100_latency_ms:.1f}'\n\n    report = f'RESULT FOR {test_name}:'\n    for key, value in report_dict.items():\n        report += f' {key}={value}'\n    print(report)\n\nclass LatencyCollector:\n    def __init__(self):\n        self.start = None\n        self.latency_list = []\n\n    def pre_hook(self, *args):\n        self.start = time.time()\n\n    def hook(self, *args):\n        self.latency_list.append(time.time() - self.start)\n\n    def percentile(self, percent):\n        latency_list = self.latency_list\n        pos_float = len(latency_list) * percent / 100\n        max_pos = len(latency_list) - 1\n        pos_floor = min(math.floor(pos_float), max_pos)\n        pos_ceil = min(math.ceil(pos_float), max_pos)\n        latency_list = sorted(latency_list)\n        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]\n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n \n    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):\n        out_tuple = self.unet(sample,\n                              timestep,\n                              encoder_hidden_states,\n                              added_cond_kwargs={\"text_embeds\": text_embeds, \"time_ids\": time_ids},\n                              return_dict=False)\n        return out_tuple\n    \n    \nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        
self.add_embedding = unetwrap.unet.add_embedding\n        self.device = unetwrap.unet.device\n \n    def forward(self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample,\n                               timestep.to(dtype=DTYPE).expand((sample.shape[0],)),\n                               encoder_hidden_states,\n                               added_cond_kwargs[\"text_embeds\"],\n                               added_cond_kwargs[\"time_ids\"])[0]\n        return UNet2DConditionOutput(sample=sample)\n    \n# Helper function to run both refiner and base pipes and return the final image\ndef run_refiner_and_base(base, refiner, prompt, n_steps=40, high_noise_frac=0.8, generator=None):\n    image = base(\n        prompt=prompt,\n        num_inference_steps=n_steps,\n        denoising_end=high_noise_frac,\n        output_type=\"latent\",\n        generator=generator,\n    ).images\n\n    image = refiner(\n        prompt=prompt,\n        num_inference_steps=n_steps,\n        denoising_start=high_noise_frac,\n        image=image,\n    ).images[0]\n\n    return image\n    \n    \n# --- Load all compiled models and run pipeline ---\nCOMPILER_WORKDIR_ROOT = 'sdxl_base_and_refiner_compile_dir_1024'\nbase_model_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\nrefiner_model_id = \"stabilityai/stable-diffusion-xl-refiner-1.0\"\n\nunet_base_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base/model.pt')\nunet_refiner_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner/model.pt')\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\n\n# ------- Load base -------\npipe_base = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)\n\n# Load the compiled UNet onto two neuron cores.\npipe_base.unet = NeuronUNet(UNetWrap(pipe_base.unet))\ndevice_ids = [0,1]\npipe_base.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_base_filename), device_ids, set_dynamic_batching=False)\n\n# Load other compiled models onto a single neuron core.\npipe_base.vae.decoder = torch.jit.load(decoder_filename)\npipe_base.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)\n\n\n# ------- Load refiner -------\n# refiner shares text_encoder_2 and vae with the base\npipe_refiner = DiffusionPipeline.from_pretrained(\n    refiner_model_id,\n    text_encoder_2=pipe_base.text_encoder_2,\n    vae=pipe_base.vae,\n    torch_dtype=torch.float32,\n    low_cpu_mem_usage=True,\n)\n\n# Refiner - load the compiled UNet onto two neuron cores.\npipe_refiner.unet = NeuronUNet(UNetWrap(pipe_refiner.unet))\ndevice_ids = [0,1]\npipe_refiner.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_refiner_filename), device_ids, set_dynamic_batching=False)\n\n\n\n# Define how many steps and what % of steps to run on each expert (80/20) here\nn_steps = 40\nhigh_noise_frac = 0.8\n\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\ninputs = (pipe_base, pipe_refiner, prompt, n_steps, high_noise_frac, torch.manual_seed(0),)\n\nn_runs = 50\nbenchmark(n_runs, \"stable_diffusion_1024\", run_refiner_and_base, inputs)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/sdxl_base_and_refiner_1024_compile.py",
    "content": "import os\n\nimport torch\nimport torch.nn as nn\nimport torch_neuronx\n\nimport copy\nfrom diffusers import DiffusionPipeline\nfrom diffusers.models.unet_2d_condition import UNet2DConditionOutput\nfrom diffusers.models.attention_processor import Attention\n\n# Define datatype\nDTYPE = torch.float32\n\n# Optimized attention\ndef get_attention_scores_neuron(self, query, key, attn_mask):    \n    if query.size() == key.size():\n        attention_scores = custom_badbmm(\n            key,\n            query.transpose(-1, -2),\n            self.scale\n        )\n        attention_probs = attention_scores.softmax(dim=1).permute(0,2,1)\n\n    else:\n        attention_scores = custom_badbmm(\n            query,\n            key.transpose(-1, -2),\n            self.scale\n        )\n        attention_probs = attention_scores.softmax(dim=-1)\n  \n    return attention_probs\n \n\ndef custom_badbmm(a, b, scale):\n    bmm = torch.bmm(a, b)\n    scaled = bmm * scale\n    return scaled\n \n\nclass UNetWrap(nn.Module):\n    def __init__(self, unet):\n        super().__init__()\n        self.unet = unet\n \n    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):\n        out_tuple = self.unet(sample,\n                              timestep,\n                              encoder_hidden_states,\n                              added_cond_kwargs={\"text_embeds\": text_embeds, \"time_ids\": time_ids},\n                              return_dict=False)\n        return out_tuple\n    \n    \nclass NeuronUNet(nn.Module):\n    def __init__(self, unetwrap):\n        super().__init__()\n        self.unetwrap = unetwrap\n        self.config = unetwrap.unet.config\n        self.in_channels = unetwrap.unet.in_channels\n        self.add_embedding = unetwrap.unet.add_embedding\n        self.device = unetwrap.unet.device\n \n    def forward(self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None):\n        sample = self.unetwrap(sample,\n                               timestep.expand((sample.shape[0],)),\n                               encoder_hidden_states,\n                               added_cond_kwargs[\"text_embeds\"],\n                               added_cond_kwargs[\"time_ids\"])[0]\n        return UNet2DConditionOutput(sample=sample)\n    \n\n# For saving compiler artifacts\nCOMPILER_WORKDIR_ROOT = 'sdxl_base_and_refiner_compile_dir_1024'\n\n# Model IDs for SD XL version pipeline\nbase_model_id = \"stabilityai/stable-diffusion-xl-base-1.0\"\nrefiner_model_id = \"stabilityai/stable-diffusion-xl-refiner-1.0\"\n\n# All components we compile in this script:\n# 1. unet (base, in fp32)\n# 2. unet (refiner, in fp32)\n# 3. vae.decoder (base & refiner)\n# 4. 
vae.post_quant_conv (base & refiner)\n\n# --- Compile UNet in fp32 (base) and save ---\n\npipe_base = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)\n\n# Replace original cross-attention module with custom cross-attention module for better performance\nAttention.get_attention_scores = get_attention_scores_neuron\n\n# Apply double wrapper to deal with custom return type\npipe_base.unet = NeuronUNet(UNetWrap(pipe_base.unet))\n\n# Only keep the model being compiled in RAM to minimize memory pressure\nunet = copy.deepcopy(pipe_base.unet.unetwrap)\ndel pipe_base\n\n# Compile unet - fp32 (note these tensors are cast to fp32 in UNetWrap)\nsample_1b = torch.randn([1, 4, 128, 128])\ntimestep_1b = torch.tensor(999).float().expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 2048])\nadded_cond_kwargs_1b = {\"text_embeds\": torch.randn([1, 1280]),\n                        \"time_ids\": torch.randn([1, 6])}\nexample_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b, added_cond_kwargs_1b[\"text_embeds\"], added_cond_kwargs_1b[\"time_ids\"],)\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base'),\n    compiler_args=[\"--model-type=unet-inference\"]\n)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\n\n\n\n# --- Compile UNet in fp32 (refiner) and save ---\n\npipe_refiner = DiffusionPipeline.from_pretrained(refiner_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)\n\n# Replace original cross-attention module with custom cross-attention module for better performance\nAttention.get_attention_scores = get_attention_scores_neuron\n\n# Apply double wrapper to deal with custom return type\npipe_refiner.unet = NeuronUNet(UNetWrap(pipe_refiner.unet))\n\n# Only keep the model being compiled in RAM to minimize memory pressure\nunet = copy.deepcopy(pipe_refiner.unet.unetwrap)\ndel pipe_refiner\n\n# Compile unet - fp32 - some input shapes are different from base\nsample_1b = torch.randn([1, 4, 128, 128])\ntimestep_1b = torch.tensor(999).float().expand((1,))\nencoder_hidden_states_1b = torch.randn([1, 77, 1280])\nadded_cond_kwargs_1b = {\"text_embeds\": torch.randn([1, 1280]),\n                        \"time_ids\": torch.randn([1, 5])}\nexample_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b, added_cond_kwargs_1b[\"text_embeds\"], added_cond_kwargs_1b[\"time_ids\"],)\n\nunet_neuron = torch_neuronx.trace(\n    unet,\n    example_inputs,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner'),\n    compiler_args=[\"--model-type=unet-inference\"]\n)\n\n# Enable asynchronous and lazy loading to speed up model load\ntorch_neuronx.async_load(unet_neuron)\ntorch_neuronx.lazy_load(unet_neuron)\n\n# save compiled unet\nunet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner/model.pt')\ntorch.jit.save(unet_neuron, unet_filename)\n\n# delete unused objects\ndel unet\ndel unet_neuron\n\n\n\n# --- Compile VAE decoder and save ---\n\n# Only keep the model being compiled in RAM to minimize memory pressure\npipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)\ndecoder = copy.deepcopy(pipe.vae.decoder)\ndel pipe\n\n# 
Compile vae decoder\ndecoder_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)\ndecoder_neuron = torch_neuronx.trace(\n    decoder, \n    decoder_in, \n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder')\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(decoder_neuron)\n\n# Save the compiled vae decoder\ndecoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')\ntorch.jit.save(decoder_neuron, decoder_filename)\n\n# delete unused objects\ndel decoder\ndel decoder_neuron\n\n\n\n# --- Compile VAE post_quant_conv and save ---\n\n# Only keep the model being compiled in RAM to minimize memory pressure\npipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)\npost_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)\ndel pipe\n\n# Compile vae post_quant_conv\npost_quant_conv_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)\npost_quant_conv_neuron = torch_neuronx.trace(\n    post_quant_conv, \n    post_quant_conv_in,\n    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),\n)\n\n# Enable asynchronous loading to speed up model load\ntorch_neuronx.async_load(post_quant_conv_neuron)\n\n# Save the compiled vae post_quant_conv\npost_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')\ntorch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)\n\n# delete unused objects\ndel post_quant_conv\ndel post_quant_conv_neuron"
  },
  {
    "path": "archive/src/benchmark/pytorch/unet_benchmark.py",
    "content": "import torch\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_name = \"UNet\"\nbatch_sizes = [1, 4]\nn_models = [1, 2]\nworkers_per_model = [1, 2] # optimized for latency or throughput\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n    filename = f\"{model_name}.json\"\n\n    # Benchmark\n    print(\"Benchmarking {}\".format(filename))\n    reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model) \n\n    # View and save results\n    print(\"======== {} ========\".format(filename))\n    npf.print_reports(reports)\n    npf.write_csv(reports)\n    npf.write_json(reports)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/unet_compile.py",
    "content": "import torch\n\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_name = \"UNet\"\nbatch_sizes = [1, 4]\npipeline_sizes = [1]\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\nif __name__ == \"__main__\":\n    # UNet Implementation from https://github.com/milesial/Pytorch-UNet\n    # load the model\n    model = torch.hub.load('milesial/Pytorch-UNet', 'unet_carvana', pretrained=False)\n    # load the weights\n    state_dict = torch.hub.load_state_dict_from_url('https://github.com/milesial/Pytorch-UNet/releases/download/v3.0/unet_carvana_scale0.5_epoch2.pth', map_location=\"cpu\")\n    model.load_state_dict(state_dict)\n\n    inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n    filename = f\"{model_name}.json\"\n\n    # Compile\n    print(\"Compiling {}\".format(filename))\n    npf.torch.compile(\n        model,\n        inputs,\n        batch_sizes=batch_sizes,\n        pipeline_sizes=pipeline_sizes,\n        filename=filename,\n        model_name=model_name,\n    )"
  },
  {
    "path": "archive/src/benchmark/pytorch/vgg_benchmark.py",
    "content": "import torch\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_names = [\"vgg11\", \"vgg16\"]\nbatch_sizes = [1, 8, 64]\nn_models = [1, 2]\nworkers_per_model = [1, 2] # optimized for latency or throughput\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Benchmark\n        print(\"Benchmarking {}\".format(filename))\n        reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model) \n\n        # View and save results\n        print(\"======== {} ========\".format(filename))\n        npf.print_reports(reports)\n        npf.write_csv(reports)\n        npf.write_json(reports)\n"
  },
  {
    "path": "archive/src/benchmark/pytorch/vgg_compile.py",
    "content": "import torch\nimport torchvision\nimport neuronperf as npf\nimport neuronperf.torch\n\n# Add to these lists or change as needed\nmodel_names = [\"vgg11\", \"vgg16\"]\nbatch_sizes = [1, 8, 64]\npipeline_sizes = [1]\n\n\ndef get_batch(batch_size):\n    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        model = getattr(torchvision.models, model_name)(pretrained=True)\n        inputs = [get_batch(batch_size) for batch_size in batch_sizes]\n        filename = f\"{model_name}.json\"\n\n        # Compile\n        print(\"Compiling {}\".format(filename))\n        npf.torch.compile(\n            model,\n            inputs,\n            batch_sizes=batch_sizes,\n            pipeline_sizes=pipeline_sizes,\n            filename=filename,\n            model_name=model_name,\n        )"
  },
  {
    "path": "archive/tensorboard/getting-started-tensorboard-neuron-plugin.rst",
    "content": ".. _neuron-plugin-tensorboard:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This page for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n   :date-modified: 12-02-2025\n\nNeuron Plugin for TensorBoard (Inf1)\n====================================\n\n.. contents:: Table of Contents\n  :local:\n  :depth: 2\n\n\nOverview\n--------\n\nThis guide is for developers who want to better understand how their\nmodel is executed using Neuron SDK through TensorBoard.\n\nThe Neuron plugin for TensorBoard provides metrics to the performance of machine learning tasks accelerated using the Neuron SDK. It is\ncompatible with TensorBoard versions 1.15 and higher. It provides visualizations and profiling results for graphs executed on NeuronCores.\n\n.. note::\n\n    The following information is compatible with Neuron SDK for Inf1.  For a walkthrough on the latest version, please check out the guide\n    :ref:`neuronx-plugin-tensorboard`.\n\n.. note:: \n\n   Graph visualization is currently only supported for TensorFlow-Neuron.  Support\n   for MXNet-Neuron and PyTorch-Neuron visualization will be added in a future\n   release.\n\n\nCompile the neural network\n--------------------------\n\n3. Refer to the following guides on how to compile a graph using Neuron SDK.\n\n- TensorFlow-Neuron\n   - :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`\n- PyTorch-Neuron:\n   - \"Compile model for Neuron\" in `PyTorch-Neuron Resnet50 Tutorial`_\n- MXNet-Neuron:\n   - :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`\n\nEnable profiling \n-----------------\n\nIn this step, we enable Neuron profile data collection and collect results\nfrom executing an inference.\n\n4.1. To start profiling the neural network and collect inference traces, create a\ndirectory where profile data will be dumped and set the ``NEURON_PROFILE`` environment\nvariable.  In this example, we will assume this directory is ``$HOME/profile``\n\n.. code:: bash\n\n   mkdir -p $HOME/profile\n   export NEURON_PROFILE=$HOME/profile\n\n4.2. Ensure Neuron Tools are executable by setting the ``PATH`` environment variable.\n\n.. code:: bash\n\n   export PATH=/opt/aws/neuron/bin:$PATH\n\n4.3. Execute inference!\n\n.. note::\n\n   Please run the inference script outside of Jupyter notebook.  Profiling in\n   Jupyter notebook is not supported at this time.\n\n.. note::\n\n   Please ensure the inference script executes only one inference, as profiling\n   results are currently only supported for a single inference.\n\nFor more info on how to execute inference, refer to the following guides:\n\n- TensorFlow-Neuron\n   - :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`\n- PyTorch-Neuron\n   - \"Run inference on Single Core\" in :ref:`/src/examples/pytorch/resnet50.ipynb`\n- MXNet-Neuron\n   - :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`\n\n4.4. Check if profiling results were successfully saved.  In the directory\npointed to by ``NEURON_PROFILE`` environment variable set in Step 4.1, there\nshould be at least two files, one with the ``.neff`` extension and one with the\n``.ntff`` extension.  For TensorFlow-Neuron users, the graph file (``.pb``) will\nalso be in this directory.\n\n.. code:: bash\n\n   ls $NEURON_PROFILE\n\nLaunch TensorBoard\n------------------\n\nIn this step, we will process the Neuron profile data and launch TensorBoard.\n\n5.1. Install the Neuron plugin for Tensorboard.\n\n.. 
include:: /setup/install-templates/inf1/tensorboard-plugin-neuron-pip-install.rst\n\n5.2. After collecting the raw profile data, we need to post-process it to create the\nlog files used by the Neuron plugin.  This can be done when launching TensorBoard\nby passing an extra flag ``--run_neuron_profiler``.  Using this flag will create the\ndirectory specified by ``--logdir`` and populate it with Neuron plugin data.  Please\nnote that the ``NEURON_PROFILE`` environment variable set in Step 4.1 must still point\nto the same directory as before.\n\n.. code:: bash\n\n   tensorboard --logdir results --run_neuron_profiler\n\n.. note::\n\n   If using TensorBoard >= 2.5, please use the ``--load_fast=false`` option when launching.\n   ``tensorboard --logdir results --run_neuron_profiler --load_fast=false``\n\n5.3. After you see the following message, TensorBoard is ready to use.  By default,\nTensorBoard will be launched at ``localhost:6006`` on the Deployment Instance.\n\n::\n\n   ...\n   Running neuron-profile\n   Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all\n   TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)\n\nView results in TensorBoard\n---------------------------\n\nIn this step, we will view the Neuron plugin for TensorBoard from a browser on your local\ndevelopment machine.\n\n6.1. Connect to the Deployment Instance while enabling port forwarding.  In this example, we\nassume TensorBoard has been launched using the default address ``localhost:6006`` on the\nDeployment Instance.\n\n.. code:: bash\n\n   # if Ubuntu-based AMI\n   ssh -i <PEM key file> ubuntu@<instance DNS> -L 6006:localhost:6006\n\n   # if AL2-based AMI\n   ssh -i <PEM key file> ec2-user@<instance DNS> -L 6006:localhost:6006\n\n6.2. In a browser, visit |tensorboard_address|.\n\n6.3. In the top navigation bar, switch from ``Graphs`` to ``Neuron``.  If it does not show up,\nplease wait a while and refresh the page while the plugin loads.  If the issue persists, check\nthe ``Inactive`` dropdown list on the right and look for ``Neuron``.\n\n|image1|\n\n6.4. If TensorBoard fails to find the generated logs, you will see the following message:\n\n|image10|\n\n\nIn this case, please check the console output on the Deployment Instance where TensorBoard was\nlaunched for any warnings or error messages, and make sure the version of the ``aws-neuron-tools``\npackage is compatible.\n\n\n.. _tensorboard-plugin-visualize-graph:\n\nVisualize graphs executed on Neuron\n-----------------------------------\n\n.. _tensorboard-plugin-graph-device:\n\nShow how the graph was partitioned to run on NeuronCores\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo view how the graph was partitioned to run on NeuronCores, select \"Device\" under \"Graph Color\nSchemes\" in the left navigation bar.\n\n|image2|\n\nEach operator will be colored according to the device used.  In this example, light blue indicates\nan operator was executed on CPU, and orange indicates the operator was executed on NeuronCores.\nOperators that are white may have been optimized by the Neuron compiler and fused into another\noperation.\n\n.. _tensorboard-plugin-graph-time:\n\nInspect which operators consume the most time\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can also view how long each operator took by changing to the \"Compute time\" color scheme.\n\n|image3|\n\nThis view will show time taken by each layer and will be colored according to how much relative\ntime the layer took to compute. 
A lighter shade of red means that a relatively small portion of\ncompute time was spent in this layer, while a darker red shows that more compute time was used.\n\n.. _tensorboard-plugin-graph-supported-ops:\n\nCheck out Neuron supported operators for each framework\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe \"Compatibility\" color scheme allows you to better understand what operators are currently\nsupported by the Neuron compiler - green for compatible ops, red for incompatible ops, and yellow\nfor subgraphs that contain both compatible and incompatible ops.\n\n|image4|\n\n.. _tensorboard-plugin-graph-filter-device:\n\nFilter view by device\n^^^^^^^^^^^^^^^^^^^^^\n\nAdditionally, you can choose to filter by CPU and NeuronCores, which will only color ops that\nmatch the selected device(s).\n\n|image5|\n\nExpand/collapse subgraphs and view operator details\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEach rectangular node in the graph represents a subgraph that can be expanded or collapsed by\nclicking on the name.  Operators will be represented by ellipses, and can be clicked to reveal\nmore information on that operator, such as inputs and execution device.\n\n|image11|\n\nThe ``Expand All`` and ``Collapse All`` buttons can be used to expand or collapse every subgraph.\nWhen using these features, the positioning of the graph may change when redrawing the new graph.\nTry using the ``Reset Position`` button and zoom out by scrolling if the graph appears to be missing.\n\n.. _tensorboard-plugin-view-profile:\n\nViewing the Neuron profile data\n-------------------------------\n\nOn the right side of the Neuron plugin, information on the profiled inference will be displayed.\n\n.. _tensorboard-plugin-profile-summary:\n\nSee performance summary\n^^^^^^^^^^^^^^^^^^^^^^^\n\nFirst is the \"Neuron Performance Summary,\" which gives a quick overview of how Neuron executed the graph,\nincluding information on the number of NeuronCores and both on-NeuronCore time and on-CPU time.\n\n|image6|\n\n.. _tensorboard-plugin-profile-nc:\n\nGet a breakdown of time spent per NeuronCore\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNext, the \"Neuron Execution\" section will give more details on how a graph was partitioned for Neuron.\nEach entry in the table will show the order it was executed in, what type of device was used, the compute\ntime (in microseconds), and the percentage of total time spent.  To dive deeper into subgraphs, you can\ncheck the \"Show Details\" box to display the breakdown per NeuronCore.\n\n|image7|\n\n.. _tensorboard-plugin-profile-op:\n\nGet a breakdown of time spent per operator\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe \"Op Time Table\" section shows the cycle count per operator, much like the \"Compute time\" coloring\nfor graph visualization.  This table can be sorted by clicking the column names, and searched using the \nprovided text box in the top right corner. Due to Neuron compiler optimizations, some of the compute may\nnot be associated with any specific operator and will be categorized as ``unknown``.  Additionally, time\nspent moving data to and from NeuronCores will fall under ``(ND_ENGINE_LOAD)``.\n\n|image8|\n\n\n\n.. |image1| image:: /images/tb-plugin-img1.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image2| image:: /images/tb-plugin-img2.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image3| image:: /images/tb-plugin-img3.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. 
|image4| image:: /images/tb-plugin-img4.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image5| image:: /images/tb-plugin-img5.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image6| image:: /images/tb-plugin-img6.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image7| image:: /images/tb-plugin-img7.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image8| image:: /images/tb-plugin-img8.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image9| image:: /images/tb-plugin-img9.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image10| image:: /images/tb-plugin-img10.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image11| image:: /images/tb-plugin-img11.png\n  :height: 2826\n  :width: 5341\n  :scale: 10%\n.. _PyTorch-Neuron Resnet50 Tutorial: ../../src/examples/pytorch/resnet50.ipynb\n.. |tensorboard_address| raw:: html\n\n   <a href=\"http://localhost:6006\" target=\"_blank\">localhost:6006</a>\n"
  },
  {
    "path": "archive/tensorflow/index.rst",
    "content": ".. _tensorflow-neuron-main:\n.. _tensorflow-neuron:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow Neuron\n=================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nTensorFlow Neuron unlocks high-performance and cost-effective deep learning acceleration on AWS Trainium-based and Inferentia-based Amazon EC2 instances.\n\nTensorFlow Neuron enables native TensorFlow models to be accelerated on Neuron devices, so you can use your existing framework application and get started easily with minimal code changes.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /archive/tensorflow/tensorflow-setup\n\n.. toctree::\n    :maxdepth: 2\n    :hidden:\n\n    Inference (Inf2 & Trn1)  </archive/tensorflow/tensorflow-neuronx-inference>\n    Inference (Inf1)  </archive/tensorflow/tensorflow-neuron-inference>    \n\n.. card:: Tensorflow NeuronX for Inference on ``Inf2`` & ``Trn1`` / ``Trn1n``\n    :link: inference-tensorflow-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Tensorflow Neuron for Inference on ``Inf1``\n    :link: inference-tensorflow-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n"
  },
  {
    "path": "archive/tensorflow/setup-legacy-inf1-tensorflow.rst",
    "content": ".. meta::\n   :description: Legacy TensorFlow installation guide for AWS Inferentia 1 (Inf1) instances\n   :keywords: tensorflow, neuron, inf1, legacy, installation, tensorflow-neuron\n   :framework: tensorflow\n   :instance-types: inf1\n   :status: legacy\n   :content-type: legacy-guide\n   :date-modified: 2026-03-30\n\nTensorFlow on Inf1 (legacy)\n=============================\n\n.. warning::\n   \n   **Legacy hardware**: Inf1 instances use NeuronCore v1 with TensorFlow 2.x (``tensorflow-neuron``).\n   \n   For new projects, use **Inf2, Trn1, Trn2, or Trn3** with PyTorch 2.9+ or JAX 0.7+.\n   See :ref:`setup-guide-index` for current setup options.\n\n.. note::\n   \n   TensorFlow support for Inf2 has reached end of support as of Neuron SDK 2.29.\n   See :ref:`announce-eos-tensorflow-inf2` for details.\n\nSetup instructions\n------------------\n\nFor complete Inf1 TensorFlow setup instructions, see the original setup guides:\n\n- :doc:`/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update` - TensorFlow Neuron setup and updates\n- :doc:`/archive/tensorflow/tensorflow-neuron-inference` - Inference on Inf1\n\nThe setup guides cover:\n\n- Ubuntu 20, Ubuntu 22, and Amazon Linux 2 installation\n- DLAMI-based installation\n- Manual pip installation\n- TensorFlow 2.10.1, 2.9.3, and 2.8.4 versions\n\nVerification\n------------\n\nAfter installation, verify with:\n\n.. code-block:: python\n   \n   import tensorflow as tf\n   import tensorflow_neuron\n   \n   print(f\"TensorFlow version: {tf.__version__}\")\n\n.. code-block:: bash\n   \n   neuron-ls\n\nNext steps\n----------\n\n- :doc:`/archive/tensorflow/tensorflow-neuron-inference` - Inference tutorials for Inf1\n- :ref:`setup-guide-index` - Current setup options (Inf2, Trn1, Trn2, Trn3)\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/additional-examples.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAdditional Examples (``tensorflow-neuron``)\n===========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/tensorflow-neuron/inference>\n\n\n.. include:: /archive/tensorflow/tensorflow-neuron/additional-examples.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/additional-examples.txt",
    "content": "* `AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/archive/tensorflow-neuron/inference>`_\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-auto-replication-api.rst",
    "content": ".. _tensorflow-ref-auto-replication-python-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuron``) Auto Multicore Replication (Beta)\n===================================================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe Neuron auto multicore replication Python API enables modifying TensorFlow 2.x\ntraced models so that they can be automatically replicated across multiple cores.\nFor Tensorflow-Serving models and TensorFlow 1.x models, see :ref:`tensorflow-ref-auto-replication-cli-api`\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nTensorFlow 2.x (``tensorflow-neuron TF2.x``) Auto Multicore Replication Python API (Beta)\n-----------------------------------------------------------------------------------------------------------\n\nMethod\n^^^^^^\n\n``tensorflow.neuron.auto_multicore``\n\nDescription\n^^^^^^^^^^^\n\nConverts an existing AWS-Neuron-optimized ``keras.Model`` and returns an auto-replication tagged\nAWS-Multicore-Neuron-optimized  ``keras.Model`` that can execute on AWS Machine Learning Accelerators.\nLike the traced model, the returned ``keras.Model`` will support inference only. Attributes or\nvariables held by the original function or ``keras.Model`` will be dropped.\n\nThe auto model replication feature in TensorFlow-Neuron enables you to\ncreate a model once and the model parallel replication would happen\nautomatically. The desired number of cores can be less than the total available NeuronCores\non an Inf1 instance but not less than 1. This reduces framework memory usage as you are not\nloading the same model multiple times manually. Calls to the returned model will execute the call\non each core in a round-robin fashion.\n\nThe returned ``keras.Model`` can be exported as SavedModel and served using\nTensorFlow Serving. Please see the TensorFlow Serving documentation for more\ninformation about exporting to saved model and serving using TensorFlow\nServing.\n\nNote that the automatic replication will only work on models compiled with pipeline size 1:\nvia ``--neuroncore-pipeline-cores=1``. If auto replication is not enabled, the model will default to\nreplicate on up to 4 cores.\n\nSee  :ref:`neuron-compiler-cli-reference` for more information about compiler options.\n\nArguments\n^^^^^^^^^\n\n-   **func:** The ``keras.Model`` or function to be traced.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. 
Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n-   **num_cores:** The desired number of cores where the model will be automatically\n    replicated across\n\nReturns\n^^^^^^^\n\n-  An AWS-Multicore-Neuron-optimized ``keras.Model``.\n\n\nExample Python API Usage for TF2.x traced models:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code :: python\n\n        input0 = tf.keras.layers.Input(3)\n        dense0 = tf.keras.layers.Dense(3)(input0)\n        inputs = [input0]\n        outputs = [dense0]\n        model = tf.keras.Model(inputs=inputs, outputs=outputs)\n        input0_tensor = tf.random.uniform([1, 3])\n        model_neuron = tfn.trace(model, input0_tensor)\n\n        num_cores = 4\n        multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores)\n        multicore_model(input0_tensor)\n\nExample Python API Usage for TF2.x saved models:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code :: python\n\n        from tensorflow.python import saved_model\n\n        input0_tensor = tf.random.uniform([1, 3])\n        num_cores = 4\n        reload_model = saved_model.load(model_dir)\n        multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)\n\n.. _tensorflow-ref-auto-replication-cli-api:\n\nTensorFlow Neuron 2.x (``tensorflow-neuron``) Auto Multicore Replication CLI (Beta)\n---------------------------------------------------------------------------------------------------------------\n\nThe Neuron auto multicore replication CLI  enables modifying TensorFlow 1.x and Tensorflow 2.x\ntraced saved models so that they can be automatically replicated across multiple cores. By performing\nthis call on Tensorflow Saved Models, we can support both Tensorflow-Serving and Tensorflow 1.x\nwithout significant modifications to the code. Note that the python API does not support Tensorflow 1.x.\n\nMethod\n^^^^^^\n\n``tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR``\n\nArguments\n^^^^^^^^^\n\n-   **MODEL_DIR:** The directory of a saved AWS-Neuron-optimized ``keras.Model``.\n-   **NUM_CORES:** The desired number of cores where the model will be automatically\n    replicated across\n-   **NEW_MODEL_DIR:** The directory of where the AWS-Multicore-Neuron-optimized\n    ``keras.Model`` will be saved\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-compilation-python-api.rst",
    "content": ".. _tensorflow-ref-neuron-compile-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 1.x (``tensorflow-neuron``) Compilation API\n=======================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe Neuron compilation API for TensorFlow 1.x enables compilation of saved\nmodel to an Inferentia target.\n\nMethod\n------\n\n``tensorflow.neuron.saved_model.compile``\n\nDescription\n-----------\n\nWithin the graph or subgraph, the compile method selects and send\nNeuron-supported operations to Neuron-Compiler for compilation and saves\nthe compiled artifacts in the graph. Uncompilable operations are kept as\noriginal operations for framework execution.\n\nThe compiled graph can be exported to saved model and served using\nTensorFlow Serving. Please see the TensorFlow Serving documentation for more\ninformation about exporting to saved model and serving using TensorFlow\nServing.\n\nOptions can be passed to Neuron compiler via the compile function. For\nexample, the “\\ ``--neuroncore-pipeline-cores``\\ ” option directs Neuron\ncompiler to compile each subgraph to fit in the specified number of\nNeuronCores. This number can be less than the total available\nNeuronCores on an Inf1 instance. See :ref:`neuron-compiler-cli-reference`\nfor more information about compiler options.\n\nArguments\n---------\n\n-  **model_dir:** The path of the original ``SavedModel``.\n-  **new_model_dir:** The path to which the Neuron-optimized\n   ``SavedModel`` will be stored.\n-  **batch_size:** (Optional) Positive integer representing batch size\n   used in inference. The default value is 1.\n-  **model_shape_feed_dict:** (Optional) Dictionary {str: list} used for\n   inferring tensor shapes. Keys should match model input names. Values\n   are lists of positive integers representing model input tensor\n   shapes.\n-  **model_feed_dict:** (Optional) Dictionary {str: numpy.array} used\n   for inference. Useful for inferring tensor shapes. Keys should match\n   model input names. Values are numpy arrays that can be fed as inputs\n   to the ``SavedModel``.\n-  **tags:** (Optional) Iterable of strings to identify the required\n   ``MetaGraphDef``. These should correspond to the tags used when\n   saving the variables using the ``SavedModel`` ``save()`` API. Default\n   is to use the first ``tag_set`` available in the ``SavedModel``.\n-  **signature_def_key:** (Optional) String specifying the\n   ``signature_def`` to use. Default is to use 'serving_default' or the\n   first ``signature_def`` corresponding to ``tags``.\n-  **minimum_segment_size:** (Optional) Integer indicating the minimum\n   number of operations in an NeuronOp.\n-  **no_fuse_ops:** (Optional) None or iterable of strings (unordered)\n   representing names of operations that are forcibly placed on CPU.\n-  **compiler_args:** (Optional) List of strings representing neuron-cc\n   compiler arguments. Note that these arguments apply to all subgraphs\n   generated by whitelist partitioning. For example, use\n   ``compiler_args=['--neuroncore-pipeline-cores', '4']`` to set number\n   of NeuronCores per subgraph to 4. 
See :ref:`neuron-compiler-cli-reference`\n   for more information about compiler options.\n-  **compiler_workdir:** (Optional) String representing work directory\n   of the neuron-cc compiler.\n\nReturns\n-------\n\n-  Dictionary with operator counts before/after optimization.\n-  Operator count statistics are displayed to show original count,\n   post-optimization count, and the number placed on Neuron runtime. For\n   example:\n\n::\n\n   INFO:tensorflow:Number of operations in TensorFlow session: 3978\n   INFO:tensorflow:Number of operations after tf.neuron optimizations: 555\n   INFO:tensorflow:Number of operations placed on Neuron runtime: 554\n\nExample Usage\n-------------\n\n.. code:: python\n\n   import shutil\n   import tensorflow.neuron as tfn\n   saved_model_path = \"<saved model path>\"\n   compiled_saved_model_path = \"<compiled saved model path>\"\n   shutil.rmtree(compiled_saved_model_path, ignore_errors=True)\n   tfn.saved_model.compile(saved_model_path, compiled_saved_model_path)\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-reference-guide.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAPI Reference Guide (``tensorflow-neuron``)\n===========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /archive/tensorflow/tensorflow-neuron/api-tracing-python-api\n    /archive/tensorflow/tensorflow-neuron/api-tfn-analyze-model-api\n    /archive/tensorflow/tensorflow-neuron/api-auto-replication-api\n\n\n.. include:: /archive/tensorflow/tensorflow-neuron/api-reference-guide.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-reference-guide.txt",
    "content": "* :ref:`tensorflow-ref-neuron-tracing-api`\n* :ref:`tensorflow-ref-neuron-analyze_model-api`\n* :ref:`tensorflow-ref-auto-replication-python-api`"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-tfn-analyze-model-api.rst",
    "content": ".. _tensorflow-ref-neuron-analyze_model-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuron``) analyze_model API\n========================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nMethod\n------\n\n``tensorflow.neuron.analyze_model``\n\nDescription\n-----------\n\nAnalyzes a ``keras.Model`` or a Python callable that can be decorated by\n``tf.function`` for it's compatibility with Neuron. It displays supported \nvs. unsupported operators in the model as well as percentages and counts of \neach operator and returns a dictionary with operator statistics.\n\nArguments\n---------\n\n-   **func:** The ``keras.Model`` or function to be analyzed.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n\nReturns\n-------\n\n-  A results ``dict`` with these keys: ``'percent_supported', 'supported_count', \n  'total_count', 'supported_operators', 'unsupported_operators', 'operators', \n  'operator_count'``.\n\nExample Usage\n-------------\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    model = tf.keras.Model(inputs=[input0], outputs=[dense0])\n    example_inputs = tf.random.uniform([1, 3])\n    results = tfn.analyze_model(model, example_inputs)\n    print(results)\n\n    # expected output\n    '''\n    BiasAdd\n\tMatMul\n\t100.00% of all operations (2 of 2) are supported\n\t{'percent_supported': 100.0, 'supported_count': 2, 'total_count': 2, \n\t'supported_operators': {'BiasAdd', 'MatMul'}, 'unsupported_operators': [], \n\t'operators': ['BiasAdd', 'MatMul'], 'operator_count': {'MatMul': 1, 'BiasAdd': 1}}\n\t'''\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/api-tracing-python-api.rst",
    "content": ".. _tensorflow-ref-neuron-tracing-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuron``) Tracing API\n===================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe Neuron tracing API enables tracing TensorFlow 2.x models for deployment\non AWS Machine Learning Accelerators.\n\nMethod\n------\n\n``tensorflow.neuron.trace``\n\nDescription\n-----------\n\nTrace a ``keras.Model`` or a Python callable that can be decorated by\n``tf.function``, and return an AWS-Neuron-optimized ``keras.Model`` that\ncan execute on AWS Machine Learning Accelerators. Tracing is ideal for\n``keras.Model`` that accepts a list of ``tf.Tensor`` objects and returns\na list of ``tf.Tensor`` objects. It is expected that users will provide\nexample inputs, and the ``trace`` function will execute ``func``\nsymbolically and convert it to a ``keras.Model``.\n\nThe returned ``keras.Model`` will support inference only. Attributes or\nvariables held by the original function or ``keras.Model`` will be dropped.\n\nThe returned ``keras.Model`` can be exported as SavedModel and served using\nTensorFlow Serving. Please see the TensorFlow Serving documentation for more\ninformation about exporting to saved model and serving using TensorFlow\nServing.\n\nThe returned ``keras.Model`` has an ``.on_neuron_ratio`` attribute\nwhich shows the percentage of ops mapped to neuron hardware. This calculation\nignores PlaceholerOp, IdentityOp, ReadVariableOp and NoOp.\n\nOptions can be passed to Neuron compiler via the environment variable\n``NEURON_CC_FLAGS``. For example, the syntax\n``env NEURON_CC_FLAGS=\"--neuroncore-pipeline-cores=4\"`` directs Neuron\ncompiler to compile each subgraph to fit in the specified number of\nNeuronCores. This number can be less than the total available NeuronCores\non an Inf1 instance. See  :ref:`neuron-compiler-cli-reference` for more\ninformation about compiler options.\n\nArguments\n---------\n\n-   **func:** The ``keras.Model`` or function to be traced.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. 
Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n-   **subgraph_builder_function:** (Optional) A callable with signature\n\n    ``subgraph_builder_function(node : NodeDef) -> bool``\n    (``NodeDef`` is defined in tensorflow/core/framework/node_def.proto)\n\n    that is used as a call-back function to determine which part of\n    the tensorflow GraphDef given by tracing ``func`` will be placed on\n    Machine Learning Accelerators.\n\n    If ``subgraph_builder_function`` is not provided, then ``trace`` will\n    automatically place operations on Machine Learning Accelerators or\n    on CPU to maximize the execution efficiency.\n\n    If it is provided, and ``subgraph_builder_function(node)`` returns\n    ``True``, and placing ``node`` on Machine Learning Accelerators\n    will not cause deadlocks during execution, then ``trace`` will place\n    ``node`` on Machine Learning Accelerators. If\n    ``subgraph_builder_function(node)`` returns ``False``, then ``trace``\n    will place ``node`` on CPU.\n\nSpecial Flags\n-------------\n\nThese are flags that get passed directly to the Neuron tracing API\n(rather than the Neuron Compiler). The flags are still passed\nvia the environment variable ``NEURON_CC_FLAGS``.\n\n-   **workdir:** example usage - ``NEURON_CC_FLAGS='--workdir ./artifacts'``\n    will create a folder named artifacts in the current directory and\n    save artifacts that can be used for debug.\n-   **dynamic-batch-size:** example usage -\n    ``NEURON_CC_FLAGS='--dynamic-batch-size'`` A flag to allow Neuron graphs to\n    consume variable sized batches of data. Dynamic sizing is restricted to the\n    0th dimension of a tensor.\n-   **extract-weights (Beta):** example usage -\n    ``NEURON_CC_FLAGS='--extract-weights inf1.2xlarge'`` will reduce the compiled\n    model's protobuf size by taking the weights out of the protobuf.\n    Useful for compiling large models that would exceed the 2GB protobuf\n    size limit. This feature is in beta. Model performance is not\n    guaranteed and the flag does not work in combination with\n    ``--neuroncore-pipeline-cores``, ``--dynamic-batch-size``, models with\n    multiple NEFFs, and models that are 4GB or greater. \n    Compiles models for different neuron instances depending on the instance type passed.\n    Supports all inf1 instance types.\n\nReturns\n-------\n\n-  An AWS-Neuron-optimized ``keras.Model``.\n\n\nExample Usage\n-------------\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    model = tf.keras.Model(inputs=[input0], outputs=[dense0])\n    example_inputs = tf.random.uniform([1, 3])\n    model_neuron = tfn.trace(model, example_inputs)  # trace\n    # check to see how much of the model was compiled successfully\n    print(model_neuron.on_neuron_ratio) \n\n    model_dir = './model_neuron'\n    model_neuron.save(model_dir)\n    model_neuron_reloaded = tf.keras.models.load_model(model_dir)\n\n\nExample Usage with Manual Device Placement Using ``subgraph_builder_function``\n------------------------------------------------------------------------------\n\n.. 
code:: python\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    reshape0 = tf.keras.layers.Reshape([1, 3])(dense0)\n    output0 = tf.keras.layers.Dense(2)(reshape0)\n    model = tf.keras.Model(inputs=[input0], outputs=[output0])\n    example_inputs = tf.random.uniform([1, 3])\n\n    def subgraph_builder_function(node):\n        return node.op == 'MatMul'\n\n    model_neuron = tfn.trace(\n        model, example_inputs,\n        subgraph_builder_function=subgraph_builder_function,\n    )\n\n.. important ::\n\n    Although the old API ``tensorflow.neuron.saved_model.compile`` is still available under tensorflow-neuron 2.x,\n    it supports only the limited capabilities of ``tensorflow.neuron.trace`` and will be deprecated in future releases.\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/dlc-then-ec2-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/dlc-then-ec2-devflow.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/dlc-then-ecs-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/dlc-then-ecs-devflow.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/dlc-then-eks-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/dlc-then-eks-devflow.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/ec2-then-ec2-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/ec2-then-ec2-devflow.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMisc (``tensorflow-neuron``)\n============================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron-v2                  \n    /archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops\n\n\n.. include:: /archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt",
    "content": "* :ref:`tensorflow-neuron-rn-v2`\n* :ref:`tensorflow-ref-neuron-accelerated-ops`"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/neo-then-hosting-devflow.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n.. include:: /devflows/inference/neo-then-hosting-devflow.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.14.2-tensorflow-install.rst",
    "content": ".. _install-neuron-1.14.2-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron (Neuron 1.14.2)\n======================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.0-tensorflow-install.rst",
    "content": ".. _install-neuron-1.15.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron (Neuron 1.15.0)\n======================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.1-tensorflow-install.rst",
    "content": ".. _install-neuron-1.15.1-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron (Neuron 1.15.1)\n=========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.2-tensorflow-install.rst",
    "content": ".. _install-neuron-1.15.2-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron (Neuron 1.15.2)\n=========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2\n\n\n   .. tab-item:: TensorFlow 2.3.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.16.3-tensorflow-install.rst",
    "content": ".. _install-neuron-1.16.3-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n\n   .. 
tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.0-tensorflow-install.rst",
    "content": ".. _install-neuron-1.17.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n\n   .. 
tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.1-tensorflow-install.rst",
    "content": ".. _install-neuron-1.17.1-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. 
tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install.rst",
    "content": ".. _install-neuron-1.17.2-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n \n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. 
tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.5.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.4.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3\n\n\n   .. tab-item:: TensorFlow 2.3.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4\n\n\n   .. tab-item:: TensorFlow 2.2.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3\n\n\n   .. tab-item:: TensorFlow 2.1.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4      \n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.18.0-tensorflow-install.rst",
    "content": ".. _install-neuron-1.18.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n \n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.19.0-tensorflow-install.rst",
    "content": ".. _install-neuron-1.19.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n\n\n \n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n\n\n \n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n\n   .. 
tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5    \n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: TensorFlow 2.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0\n\n\n\n\n   .. tab-item:: TensorFlow 2.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1\n\n\n\n\n\n   .. tab-item:: TensorFlow 2.6.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3\n\n\n   .. tab-item:: TensorFlow 2.5.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5   \n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2023.rst",
    "content": ".. _tensorflow-neuron-install-prev-al2023:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous Tensorflow Neuron Releases for Ubuntu (``tensorflow-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n    \n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u20.rst",
    "content": ".. _tensorflow-neuron-install-prev-u20:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous Tensorflow Neuron Releases for Ubuntu (``tensorflow-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n    \n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u22.rst",
    "content": ".. _tensorflow-neuron-install-prev-u20:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous Tensorflow Neuron Releases for Ubuntu (``tensorflow-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev.rst",
    "content": ".. _install-prev-neuron-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall previous TensorFlow Neuron releases\n===========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. toctree::\n   :maxdepth: 1\n\n   Neuron 1.19.0 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.19.0-tensorflow-install>\n   Neuron 1.18.0 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.18.0-tensorflow-install>\n   Neuron 1.17.2 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install>\n   Neuron 1.17.1 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.1-tensorflow-install>\n   Neuron 1.17.0 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.0-tensorflow-install>\n   Neuron 1.16.3 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.16.3-tensorflow-install>\n   Neuron 1.15.2 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.2-tensorflow-install>\n   Neuron 1.15.1 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.1-tensorflow-install>\n   Neuron 1.15.0 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.0-tensorflow-install>\n   Neuron 1.14.2 </archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.14.2-tensorflow-install>\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-install.rst",
    "content": ".. _install-neuron-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow Neuron\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.8.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. 
tab-item:: TensorFlow 2.8.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.8.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u20.rst",
    "content": "\n.. _tensorflow-neuron-u20-update:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: TensorFlow 2.10.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n\n    .. tab-item:: TensorFlow 2.9.3\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n\n    .. tab-item:: TensorFlow 2.8.4\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u22.rst",
    "content": "\n.. _tensorflow-neuron-u20-update:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: TensorFlow 2.10.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n\n    .. tab-item:: TensorFlow 2.9.3\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n\n    .. tab-item:: TensorFlow 2.8.4\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/setup/tensorflow-update.rst",
    "content": ".. _update-neuron-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest TensorFlow Neuron\n===============================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.8.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n         \n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.8.4\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n   .. tab-item:: TensorFlow 2.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.9.3\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n   .. tab-item:: TensorFlow 2.8.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 2.7.4\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: TensorFlow 1.15.5\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops.rst",
    "content": ".. _tensorflow-ref-neuron-accelerated-ops:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuron``) Accelerated (``torch-neuron``) Python APIs and Graph Ops\n======================================================================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThis page lists TensorFlow 2.x Python APIs and graph operators that are\naccelerated by AWS Neuron. The lists are not exhaustive. TensorFlow 2.x Python\nAPIs or graph operators that are not listed here may still be accelerated if\nthey are composed of accelerated primitives, or they will be executed on CPU\nwithout significant acceleration. The TensorFlow Neuron integration contains\nan automatic operator-device-placement mechanism that strives to maximize\nthe execution efficiency of your deep learning models on AWS Machine Learning\nASIC instances.\n\nAccelerated Python APIs\n--------------------------------\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|   Module      |   Accelerated Python API          |                       Comments                            |\n+===============+===================================+===========================================================+\n|   ``tf``      | ``tf.abs``                        |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.add``                        |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.add_n``                      |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.broadcast_static_shape``     |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.cast``                       |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.constant``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.convert_to_tensor``          |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.cumsum``                     | ``axis`` must be a compile-time constant.                 
|\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.einsum``                     |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.erf``                        |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.exp``                        |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.identity``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.matmul``                     | Uses float16/bfloat16 matmul with float32 accumulation.   |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.maximum``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.minimum``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.multiply``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.negative``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.range``                      | ``start``, ``limit`` and ``delta`` arguments must be      |\n|               |                                   | compile-time constants.                                   |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.realdiv``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reciprocal``                 |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_all``                 | ``axis`` must be a compile-time constant.                 |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_any``                 | ``axis`` must be a compile-time constant.                 |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_max``                 | ``axis`` must be a compile-time constant.       
          |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_min``                 | ``axis`` must be a compile-time constant.                 |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_prod``                | ``axis`` must be a compile-time constant.                 |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reduce_sum``                 | ``axis`` must be a compile-time constant.                 |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.reshape``                    | ``shape`` argument must be a compile-time constant.       |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.rsqrt``                      |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.scalar_mul``                 |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.shape``                      |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.shape_n``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.sigmoid``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.size``                       |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.slice``                      | ``size`` must be a compile-time constant. In addition,    |\n|               |                                   |                                                           |\n|               |                                   | either ``begin`` must be a compile-time constant or       |\n|               |                                   |                                                           |\n|               |                                   | ``size`` must be non-negative.                            
|\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.sqrt``                       |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.square``                     |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.squared_difference``         |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.squeeze``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.stack``                      |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.stop_gradient``              |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.strided_slice``              |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.tanh``                       |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.tensordot``                  |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.to_bfloat16``                |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.to_float``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.truediv``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n| ``tf.layers`` | ``tf.layers.batch_normalization`` |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.layers.dense``               |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.layers.flatten``             |                                                           
|\n+---------------+-----------------------------------+-----------------------------------------------------------+\n| ``tf.nn``     | ``tf.nn.batch_normalization``     |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.bias_add``                |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.dropout``                 | Always treated as ``tf.identity`` during inference.       |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.fused_batch_norm``        |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.leaky_relu``              |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.relu``                    |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.relu6``                   |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.relu_layer``              |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n|               | ``tf.nn.softmax``                 |                                                           |\n+---------------+-----------------------------------+-----------------------------------------------------------+\n\nAccelerated graph operators\n--------------------------------\n.. code:: python\n\n    Add\n    AddN\n    AddV2\n    BatchMatMul\n    BatchMatMulV2\n    BiasAdd\n    Cast\n    Const\n    Cumsum\n    Einsum\n    Erf\n    Exp\n    ExpandDims\n    FusedBatchNorm\n    FusedBatchNormV2\n    FusedBatchNormV3\n    Greater\n    Identity\n    LeakyRelu\n    MatMul\n    Max\n    Maximum\n    Minimum\n    Mean\n    Mul\n    Neg\n    Pack\n    RealDiv\n    Relu\n    Relu6\n    Reshape\n    Rsqrt\n    Sigmoid\n    Softmax\n    Split\n    SplitV\n    Sqrt\n    Square\n    SquaredDifference\n    Squeeze\n    StridedSlice\n    Sub\n    Sum\n    Tanh\n    Transpose\n    Unpack\n\n\nThe lists share many commonalities with `Available TensorFlow Ops <https://cloud.google.com/tpu/docs/tensorflow-ops>`_. Portions of this page are modifications based on work created and `shared by Google <https://developers.google.com/terms/site-policies>`_ and used according to terms described in the `Creative Commons 4.0 Attribution License <https://creativecommons.org/licenses/by/4.0/>`_.\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tf2_faq.rst",
    "content": ".. _tf2_faq:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x FAQ\n===================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nHow do I get started with TensorFlow?\n-------------------------------------\n\nThe easiest entry point is the tutorials offered by the AWS Neuron team. For beginners, the :ref:`HuggingFace DistilBERT Tutorial </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` is a good place to start.\n\nWhat TensorFlow versions are supported by Neuron?\n-------------------------------------------------\n\nThe AWS Neuron provide well-tested tensorflow-neuron packages that work with a range of tensorflow official releases, as long as the version of tensorflow-neuron matches that of tensorflow. For example, you may install ``tensorflow-neuron==2.3.3.1.0.9999.0`` on top of ``tensorflow==2.3.3`` and expect them to work together.\n\nCurrently, tensorflow-neuron can work with tensorflow versions 2.1.4, 2.2.3, 2.3.3, 2.4.2, 2.5.0.\n\nIn a fresh Python environment, ``pip install tensorflow-neuron`` would bring in the highest version (2.5.0 as of 07/13/2021), which then pulls ``tensorflow==2.5.0`` into the current environment.\n\nIf you already have a particular version of tensorflow 2.x installed, then it is recommended to pay attention to the precise version of tensorflow-neuron and only install the desired one. For example, in an existing Python environment with ``tensorflow==2.3.3`` installed, you may install tensorflow-neuron by pip install ``tensorflow-neuron==2.3.3``, which will reuse the existing tensorflow installation.\n\nWhat operators are supported?\n-----------------------------\n\nDue to fundamental backend design changes in the TensorFlow 2.x framework, the concept of \"supported graph operators\" is no longer well-defined. Please refer to :ref:`Accelerated Python APIs and graph operators <tensorflow-ref-neuron-accelerated-ops>` for a guide to the set of TensorFlow 2.x Python APIs and graph operators that can be accelerated by Neuron.\n\nHow do I compile my model?\n--------------------------\n\nIt is achieved by a new public API called tfn.trace, which resembles the compilation API of AWS PyTorch Neuron integration. Programmatically, customers would be able to execute the following code.\n\n.. code::\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    ...\n    model = tf.keras.Model(inputs=inputs, outputs=outputs)\n    model_neuron = tfn.trace(model, example_inputs)\n    model_neuron.save('./model_neuron_dir')\n    ...\n    model_loaded = tf.saved_model.load('./model_dir')\n    predict_func = model_loaded['serving_default']\n    model_loaded_neuron = tfn.trace(predict_func, example_inputs2)\n    model_loaded_neuron.save('./model_loaded_neuron_dir')\n    ...\n\nHow do I deploy my model?\n-------------------------\n\nPython tensorflow\n^^^^^^^^^^^^^^^^^\n\nPre-compiled models can be saved and reloaded back into a Python environment using regular tensorflow model loading APIs, as long as tensorflow-neuron is installed.\n\n.. 
code::\n\n    import tensorflow as tf\n\n    model = tf.keras.models.load_model('./model_loaded_neuron_dir')\n    example_inputs = ...\n    output = model(example_inputs)\n\ntensorflow-serving\n^^^^^^^^^^^^^^^^^^\n\nPre-compiled models can be saved into SavedModel format via tensorflow SavedModel APIs\n\n.. code::\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    ...\n    model = tf.keras.Model(inputs=inputs, outputs=outputs)\n    model_neuron = tfn.trace(model, example_inputs)\n    tf.saved_model.save(model_neuron, './model_neuron_dir/1')\n\nThe generated SavedModel './model_neuron_dir' can be loaded into tensorflow-model-server-neuron, which can be installed through apt or yum based on the type of the operating system. For example, on Ubuntu 18.04 LTS the following command installs and launches a tensorflow-model-server-neuron on a pre-compiled SavedModel.\n\n.. code::\n\n    sudo apt install tensorflow-model-server-neuron\n    # --model_base_path needs to be an absolute path\n    tensorflow_model_server_neuron --model_base_path=$(pwd)/model_neuron_dir\n\nWhere can I find tutorials and examples ?\n-----------------------------------------\n\n:ref:`HuggingFace DistilBERT Tutorial </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` is a good place to start.\n\n\nHow to debug or profile my model?\n---------------------------------\n\n:ref:`AWS Neuron TensorBoard integration <neuron-plugin-tensorboard>` provides visibility into what is happening inside of the Neuron runtime, and allows a more fine-grained (but also more hardware-awared) reasoning on where to improve the performance of machine learning applications.\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.rst",
    "content": ".. _tensorflow-bert-demo:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n[Broken] Running TensorFlow BERT-Large with AWS Neuron\n=============================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThis example shows a Neuron compatible BERT-Large implementation that is\nfunctionally equivalent to open source BERT-Large model. This demo uses\nTensorFlow-Neuron, BERT-Large weights fine tuned for MRPC and also shows\nthe performance achieved by the Inf1 instance. For users who want to use\npublic BERT SavedModels please also follow the steps described :ref:`using-public-bert-savedmodels`.\n\nLaunch EC2 instances\n--------------------\n\nFor this demo, launch two EC2 instances :\n\n-  a c5.4xlarge instance for compiling the BERT-Large Model and\n-  an inf1.xlarge instance for running inference\n\nFor both of these instances choose the latest Ubuntu 18 Deep Learning\nAMI (DLAMI).\n\n.. _compiling-neuron-compatible-bert-large:\n\nCompiling Neuron compatible BERT-Large\n--------------------------------------\n\nFirst connect to a c5.4xlarge instance and update tensorflow-neuron and\nneuron-cc\n\nUpdate compilation EC2 instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUpdate to the latest neuron software by executing the instructions at :ref:`install-neuron-tensorflow`.\n\nNote: if your tensorflow-neuron version on the inference instance is\nlower than 1.15.0.1.0.1333.0, you will need to run this demo on\ninf1.2xlarge instead of inf1.xlarge.\n\nCompile open source BERT-Large saved model using Neuron compatible BERT-Large implementation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron software works with TensorFlow saved models. Users should bring\ntheir own BERT-Large saved model for this section. This demo will run\ninference for the MRPC task and the saved model should be fine tuned for\nMRPC. Users who need additional help to fine-tune the model for MRPC or\nto create a saved model can refer to :ref:`bert-tensorflow-demo-appendix1`.\n\nIn the same environment and directory bert_demo scripts, run the\nfollowing :\n\n.. code:: bash\n\n   git clone https://github.com/aws/aws-neuron-sdk\n   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/\n   export BERT_LARGE_SAVED_MODEL=\"/path/to/user/bert-large/savedmodel\"\n   pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n   pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com\n   python bert_model.py --input_saved_model $BERT_LARGE_SAVED_MODEL --output_saved_model ./bert-saved-model-neuron --batch_size=6 --aggressive_optimizations\n\nThis compiles BERT-Large pointed to by $BERT_LARGE_SAVED_MODEL for an\ninput size of 128 and batch size of 6. The compilation output is stored\nin bert-saved-model-neuron. Copy this to your Inf1 instance for\ninferencing.\n\nThe bert_model.py script encapsulates all the steps necessary for this\nprocess. 
For details on what is done by bert_model.py please refer to\n:ref:`bert-tensorflow-demo-appendix2`.\n\nRunning the inference demo\n--------------------------\n\nConnect to your inf1.xlarge instance and update tensorflow-neuron,\naws-neuron-runtime and aws-neuron-tools.\n\nUpdate inference EC2 instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUpdate to the latest neuron software by executing the instructions at :ref:`install-neuron-tensorflow`.\n\nLaunching the BERT-Large demo server\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCopy the compiled model (bert-saved-model-neuron) from your c5.4xlarge\nto your inf1.xlarge instance. Place the model in the same directory as\nthe bert_demo scripts. Then from the same conda environment launch the\nBERT-Large demo server :\n\n.. code:: bash\n\n   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/\n   pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n   python bert_server.py --dir bert-saved-model-neuron --batch 6 --parallel 4\n\nThis loads 4 BERT-Large models, one into each of the 4 NeuronCores found\nin an inf1.xlarge instance. For each of the 4 models, the BERT-Large\ndemo server opportunistically stitches together asynchronous requests\ninto batch 6 requests. When there are insufficient pending requests, the\nserver creates dummy requests for batching.\n\nWait for the bert_server to finish loading the BERT-Large models to\nInferentia memory. When it is ready to accept requests it will print the\ninferences per second once every second. This reflects the number of\nreal inferences only. Dummy requests created for batching are not\ncredited to inferentia performance. Once the inferences are done you can send\na keyboard interrupt to print out the average throughput of your run.\n\nSending requests to server from multiple clients\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWait until the bert demo server is ready to accept requests. Then on the\nsame inf1.xlarge instance, launch a separate linux terminal. From the\nbert_demo directory execute the following commands :\n\n.. code:: bash\n\n   source activate aws_neuron_tensorflow_p36\n   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/\n   for i in {1..96}; do python bert_client.py --cycle 128 & done\n\nThis spins up 96 clients, each of which sends 128 inference requests.\n\n\nPrinting latency metrics\n~~~~~~~~~~~~~~~~~~~~~~~~\nAfter all your requests have been sent to your server you can\nrun the following command:\n\n.. code:: bash\n\n    python latency_printer.py\n\n.. _using-public-bert-savedmodels:\n\nUsing public BERT SavedModels\n-----------------------------\n\nWe are now providing a compilation script that has better compatibility\nwith various flavors of BERT SavedModels generated from\nhttps://github.com/google-research/bert. Here are the current\nlimitations:\n\n1. You did not change\n   `modeling.py <https://github.com/google-research/bert/blob/master/modeling.py>`__\n2. BERT SavedModel is generated using ``estimator.export_saved_model``\n3. BERT SavedModel uses fixed sequence length 128 (you may check by\n   ``saved_model_cli show --dir /path/to/user/bert/savedmodel --all``)\n4. ``neuron-cc`` version is at least 1.0.12000.0\n5. ``aws-neuron-runtime`` version is at least 1.0.7000.0\n6. The ``--batch_size`` argument specified in this script is at most 4\n\nExample usage is shown below:\n\n.. 
code:: bash\n\n   export BERT_LARGE_SAVED_MODEL=\"/path/to/user/bert-large/savedmodel\"\n   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/\n   python bert_no_model.py --input_saved_model $BERT_LARGE_SAVED_MODEL --output_saved_model ./bert-saved-model-neuron --batch_size=1\n\n.. _bert-tensorflow-demo-appendix1:\n\nAppendix 1\n----------\n\nUsers who need help finetuning BERT-Large for MRPC and creating a saved\nmodel may follow the instructions here.\n\nConnect to the c5.4xlarge compilation EC2 instance you started above and\ndownload these three items :\n\n1. clone `this <https://github.com/google-research/bert>`__ github repo.\n2. download GLUE data as described\n   `here <https://github.com/google-research/bert#user-content-sentence-and-sentence-pair-classification-tasks>`__.\n   Do not run the finetuning command.\n3. download a desired pre-trained BERT-Large checkpoint from\n   `here <https://github.com/google-research/bert#user-content-pre-trained-models>`__.\n   This is the model we will fine tune.\n\nNext edit run_classifier.py in the cloned bert repo to apply the patch\ndescribed in the following git diff.\n\n::\n\n   diff --git a/run_classifier.py b/run_classifier.py\n   index 817b147..c9426bc 100644\n   --- a/run_classifier.py\n   +++ b/run_classifier.py\n   @@ -955,6 +955,18 @@ def main(_):\n            drop_remainder=predict_drop_remainder)\n\n        result = estimator.predict(input_fn=predict_input_fn)\n   +    features = {\n   +        \"input_ids\": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='input_ids'),\n   +        \"input_mask\": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='input_mask'),\n   +        \"segment_ids\": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='segment_ids'),\n   +        \"label_ids\": tf.placeholder(shape=[None], dtype=tf.int32, name='label_ids'),\n   +        \"is_real_example\": tf.placeholder(shape=[None], dtype=tf.int32, name='is_real_example'),\n   +    }\n   +    serving_input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)\n   +    estimator._export_to_tpu = False  ## !!important to add this\n   +    estimator.export_saved_model(\n   +        export_dir_base='./bert_classifier_saved_model',\n   +        serving_input_receiver_fn=serving_input_fn)\n\n        output_predict_file = os.path.join(FLAGS.output_dir, \"test_results.tsv\")\n        with tf.gfile.GFile(output_predict_file, \"w\") as writer:\n\nNOTE : Users who are interested may refer to this\n`link <https://github.com/google-research/bert/issues/146#issuecomment-569138476>`__\nfor additional background information on the patch but it is not\nnecessary for running this demo.\n\nThen from the bert_demo directory run the following :\n\n.. code:: bash\n\n   source activate aws_neuron_tensorflow_p36\n   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/\n   export BERT_REPO_DIR=\"/path/to/cloned/bert/repo/directory\"\n   export GLUE_DIR=\"/path/to/glue/data/directory\"\n   export BERT_BASE_DIR=\"/path/to/pre-trained/bert-large/checkpoint/directory\"\n   ./tune_save.sh\n\nThe a saved model will be created in\n$BERT_REPO_DIR/bert-saved-model/*random_number*/. Where, *random_number*\nis a random number generated for every run. Use this saved model to\ncontinue with the rest of the demo.\n\n.. _bert-tensorflow-demo-appendix2:\n\nAppendix 2\n----------\n\nFor all BERT variants, we currently need to augment the standard Neuron\ncompilation process for performance tuning. 
In the future, we intend to\nautomate this tuning process. This would allow users to use the standard\nNeuron compilation process, which requires only a one line change in\nuser source code. The standard compilation process is described :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`.\n\nThe augmented Neuron compilation process is encapsulated by the\nbert_model.py script, which performs the following things :\n\n1. Define a Neuron compatible implementation of BERT-Large. For\n   inference, this is functionally equivalent to the open source\n   BERT-Large. The changes needed to create a Neuron compatible\n   BERT-Large implementation is described in :ref:`bert-tensorflow-demo-appendix3`.\n2. Extract BERT-Large weights from the open source saved model pointed\n   to by --input_saved_model and associates it with the Neuron\n   compatible model\n3. Invoke TensorFlow-Neuron to compile the Neuron compatible model for\n   Inferentia using the newly associated weights\n4. Finally, the compiled model is saved into the location given by\n   --output_saved_model\n\n.. _bert-tensorflow-demo-appendix3:\n\nAppendix 3\n----------\n\nThe Neuron compatible implementation of BERT-Large is functionally\nequivalent to the open source version when used for inference. However,\nthe detailed implementation does differ and here are the list of changes\n:\n\n1. Data Type Casting : If the original BERT-Large an FP32 model,\n   bert_model.py contains manually defined cast operators to enable\n   mixed-precision. FP16 is used for multi-head attention and\n   fully-connected layers, and fp32 everywhere else. This will be\n   automated in a future release.\n2. Remove Unused Operators: A model typically contains training\n   operators that are not used in inference, including a subset of the\n   reshape operators. Those operators do not affect inference\n   functionality and have been removed.\n3. Reimplementation of Selected Operators : A number of operators\n   (mainly mask operators), has been reimplemented to bypass a known\n   compiler issue. This will be fixed in a planned future release.\n4. Manually Partition Embedding Ops to CPU : The embedding portion of\n   BERT-Large has been partitioned manually to a subgraph that is\n   executed on the host CPU, without noticable performance impact. In\n   near future, we plan to implement this through compiler\n   auto-partitioning without the need for user intervention.\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/glue_mrpc_dev.tsv",
    "content": "﻿Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n1\t1355540\t1355592\tHe said the foodservice pie business doesn 't fit the company 's long-term growth strategy .\t\" The foodservice pie business does not fit our long-term growth strategy .\n0\t2029631\t2029565\tMagnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .\tHis wife said he was \" 100 percent behind George Bush \" and looked forward to using his years of training in the war .\n0\t487993\t487952\tThe dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .\tThe dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .\n1\t1989515\t1989458\tThe AFL-CIO is waiting until October to decide if it will endorse a candidate .\tThe AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .\n0\t1783137\t1782659\tNo dates have been set for the civil or the criminal trial .\tNo dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .\n1\t3039165\t3039036\tWal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed .\tIt has also said it would review all of its domestic employees more than 1 million to ensure they have legal status .\n0\t1490811\t1490840\tWhile dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell .\tThe Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s .\n1\t426112\t426210\tThis integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET.\tIBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net.\n1\t1439663\t1439808\tThe top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 .\tFor residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent .\n1\t3147370\t3147525\tThe results appear in the January issue of Cancer , an American Cancer Society journal , being published online today .\tThe results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday .\n1\t3300040\t3299992\tThe delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\tBin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\n0\t524136\t524119\t\" Sanitation is poor ... there could be typhoid and cholera , \" he said .\t\" Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . \"\n0\t969512\t969295\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\tThe technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 .\n1\t1685339\t1685429\tThe only announced Republican to replace Davis is Rep. 
Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall .\tSo far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall .\n1\t1967578\t1967664\tThe decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July .\tScotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July .\n1\t2047034\t2046820\tUnable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California .\tThe judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California .\n1\t2046630\t2046644\tThe decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing .\tThe decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget .\n0\t2221603\t2221633\tIn midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 .\tThe Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 .\n1\t129995\t129864\tMorgan Stanley raised its rating on the beverage maker to \" overweight \" from \" equal-weight \" saying in part that pricing power with its bottlers should improve in 2004 .\tMorgan Stanley raised its rating on the company to \" overweight \" from \" equal-weight , \" saying the beverage maker 's pricing power with bottlers should improve in 2004 .\n0\t919683\t919782\tThe pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 .\tThe British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 .\n0\t970740\t971209\tFriday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 .\tStanford ( 46-15 ) has a team full of such players this season .\n1\t2745055\t2745022\tLast month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion .\tAt the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion .\n0\t2199097\t2199072\tThe driver , Eugene Rogers , helped to remove children from the bus , Wood said .\tAt the accident scene , the driver was \" covered in blood \" but helped to remove children , Wood said .\n1\t1609290\t1609098\tONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader .\tTens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader .\n1\t1597193\t1597119\tSaddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. 
soldiers .\tHussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers .\n1\t2758944\t2758975\tIts closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean .\tIts closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean .\n0\t2584416\t2584653\tCooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo .\tLee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad .\n1\t86007\t86373\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . \"\t\" Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , \" Graham said .\n1\t1602860\t1602844\tHe said they lied on a sworn affidavit that requires them to list prior marriages .\tMorgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages .\n1\t1201306\t1201329\tThe association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes .\tThe Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes .\n0\t461779\t461815\tWith these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 .\tFunny Cide is looking to become horse racing 's first Triple Crown winner in a generation .\n1\t1438666\t1438643\tIntel was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel , \" spokesman Chuck Mulloy said .\tIntel spokesman Chuck Mulloy said the company was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel . 
\"\n1\t3261484\t3261306\tMr Annan also warned the US should not use the war on terror as an excuse to suppress \" long-cherished freedoms \" .\tAnnan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress \" long-cherished \" freedoms .\n1\t1277539\t1277527\tAt community colleges , tuition will jump to $ 2,800 from $ 2,500 .\tCommunity college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent .\n1\t3035788\t3035918\tHe made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol .\tThough Dean made a point of saying during the debate that the Confederate flag is a racist symbol .\n0\t132553\t132725\tBush wanted \" to see an aircraft landing the same way that the pilots saw an aircraft landing , \" White House press secretary Ari Fleischer said yesterday .\tOn Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing .\n0\t2259788\t2259747\tOn Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office .\tPalestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise .\n0\t2307064\t2307235\tThe civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 .\tThe civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health .\n1\t3046488\t3046824\tPer-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning .\tWorkplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 .\n1\t86020\t86007\t\" Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , \" Mr. Graham said .\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . 
\"\n0\t1100998\t1100441\tSARS has killed about 800 people and affected more than 8400 since being detected in China in November .\tSARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia .\n1\t2268396\t2268480\tAuthorities had no evidence to suggest the two incidents were connected .\tThere was no immediate evidence that the two incidents were connected , police said .\n0\t1984039\t1983986\t\" Jeremy 's a good guy , \" Barber said , adding : \" Jeremy is living the dream life of the New York athlete .\tHe also said Shockey is \" living the dream life of a New York athlete .\n0\t2697659\t2697747\tRatliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death .\tPeterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial .\n0\t2175939\t2176090\tAfter losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 .\tIn midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 .\n1\t886618\t886456\tRumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals .\tRumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals .\n1\t588637\t588864\tConsumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 .\tConsumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April .\n0\t2252795\t2252970\tHe has no immediate plans for television advertising , believing it is unnecessary this early .\tA Lieberman aide said there were no immediate plans for television advertising .\n1\t1756329\t1756394\t\" I think it happened very quickly , \" Houston Police Department homicide investigator Phil Yochum said of the crime .\t\" I think it happened very quickly , \" said Investigator Phil Yochum of the Houston Police Department 's homicide division .\n1\t1673112\t1673068\tUnited issued a statement saying it will \" work professionally and cooperatively with all its unions . \"\tSenior vice president Sara Fields said the airline \" will work professionally and cooperatively with all our unions . \"\n1\t2357324\t2357271\t\" But they never climb out of the pot of beer again . \"\tIt 's just that they never climb out of the beer again . 
\"\n1\t780408\t780363\tChief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected .\tBryant has said that hike had a greater effect on demand than officials expected .\n1\t821523\t821385\tRobert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD .\tNCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection .\n1\t2304696\t2304863\tHP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell .\tHPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell .\n1\t2531749\t2531607\tChirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family .\tChirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family .\n1\t3180014\t3179967\tThe charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country .\tThe government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries .\n1\t726966\t726945\tIn the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points .\tIt has a margin of error of plus or minus three to four percentage points .\n1\t2638861\t2638982\tMr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities .\tClinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities .\n1\t2495223\t2495307\t\" This decision is clearly incorrect , \" FTC Chairman Timothy Muris said in a written statement .\tThe decision is \" clearly incorrect , \" FTC Chairman Tim Muris said .\n1\t55187\t54831\tProsecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building .\tProsecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building .\n0\t2763381\t2763517\tTerri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years .\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\n1\t1990975\t1991132\tSecretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday .\tU.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens .\n1\t2204353\t2204418\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . \"\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . 
\"\n1\t60122\t60445\tThat would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\tThe inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\n1\t961836\t962243\tPeopleSoft also said its board had officially rejected Oracle 's offer .\tThursday morning , PeopleSoft 's board rejected the Oracle takeover offer .\n0\t3140260\t3140288\tThe Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday .\tThe Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 .\n1\t1720166\t1720115\tCortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest .\tCortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest .\n1\t2573262\t2573319\t\" The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , \" Mr Howard said .\t\" The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . \"\n0\t1353356\t1353174\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" Fraley said , adding that 18 countries have adopted biotechnology .\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" said Robert Fraley , Monsanto 's executive vice president .\n1\t2738677\t2738741\tThe rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study .\tThe study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s .\n1\t1638813\t1639087\tWe acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .\tRather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" .\n1\t1605350\t1605425\tTrans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat .\tTrans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat .\n1\t2494149\t2494073\tHowever , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market .\tA 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market .\n1\t3023029\t3023229\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\tPeterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner .\n1\t1351550\t1351155\tCarlson on Tuesday said he would not recuse himself from the case .\tService officials said Carlson refused to recuse himself from the case .\n1\t981185\t981234\tThe program will grow to include ports in Dubai , Turkey and Malaysia , among others .\tThe program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. 
Ridge said .\n0\t2111629\t2111786\tMcCabe said he was considered a witness , not a suspect .\t\" He is not considered a suspect , \" McCabe said .\n1\t655498\t655391\tThe woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health .\tThe woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health .\n1\t533823\t533909\tHe added that those \" are not solely American principles , nor are they exclusively Western . \"\t\" These are not solely American principles nor are they exclusively Western , \" Rumsfeld said .\n1\t581592\t581570\t\" If we don 't march into Tehran , I think we will be in pretty good shape , \" he said .\t\" As long as we don 't march on Tehran , I think we are going to be in pretty good shape , \" he said .\n0\t1010655\t1010430\tOn Saturday , a 149mph serve against Agassi equalled Rusedski 's world record .\tOn Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi .\n1\t2241925\t2242066\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently .\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate .\n1\t2796978\t2797024\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thai Prime Minister Thaksin Shinawatra told business leaders .\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thaksin said .\n0\t101746\t101775\tDanbury prosecutor Warren Murray could not be reached for comment Monday .\tProsecutors could not be reached for comment after the legal papers were obtained late Monday afternoon .\n1\t327839\t327748\tWittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business .\tWittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company .\n0\t2988297\t2988555\tShattered Glass , \" starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters .\t\" Shattered Glass \" _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters .\n1\t2217613\t2217659\tHe was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston .\tHe was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife .\n0\t2128530\t2128455\tHowever , EPA officials would not confirm the 20 percent figure .\tOnly in the past few weeks have officials settled on the 20 percent figure .\n1\t2208376\t2208198\tUniversity of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , \" Our fundamental values haven 't changed .\t\" Our fundamental values haven 't changed , \" Mary Sue Coleman , president of the university , said in a statement in Ann Arbor .\n1\t1980654\t1980641\tThe first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs .\tThe first products will likely be dongles costing $ 100 to $ 150 
that will establish connections between consumer electronics devices and PCs .\n0\t589579\t589557\tHowever , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co .\tLapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda .\n1\t1636060\t1635946\tMichel , who remains in the government , denied that US pressure had provoked the government 's move .\tMichel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move .\n1\t1630585\t1630657\tSome of the computers also are used to send spam e-mail messages to drum up traffic to the sites .\tSome are also used to send spam e-mail messages to boost traffic to the sites .\n0\t447728\t447699\tIndonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations .\tIndonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied .\n1\t1606495\t1606619\tBush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease .\tPresident Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic .\n1\t1550897\t1550977\tLater this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions .\tThis fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence .\n0\t490376\t490490\tThe reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday .\tStocks slipped at the open after the euro hit record highs against the dollar .\n1\t3084554\t3084612\tSales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros .\tSales rose 37 per cent year-on-year to 1.76bn , beating expectations .\n1\t315647\t315778\tIf the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back .\tIf the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change .\n1\t3428298\t3428362\tRobert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus .\tWalsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night .\n1\t2523564\t2523358\tThe Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box .\tThe µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box .\n1\t2079200\t2079131\tU.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America .\tU.S. 
stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America .\n1\t818091\t817811\tThe company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results .\tThe company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July .\n1\t1580638\t1580663\t\" I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . \"\tI stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , \" Blair said .\n0\t1919740\t1919926\t\" I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , \" Parrish said .\t\" I don 't know whether the person I 'm talking to now may end up being someone else , \" Parrish said .\n1\t2748287\t2748550\t\" I think it 's going to be a close vote , but I think the grant proposal is going to win , \" McConnell said .\t\" I think it 's going to be a close vote , but I think the grant proposal 's going to win , \" said Sen. Mitch McConnell , assistant majority leader .\n1\t3394891\t3394775\tTwenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins .\tTwenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through .\n0\t2963943\t2963880\tOne , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday .\tHer 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition .\n0\t1865364\t1865251\tThe United States finally relented during President Bush 's visit to Africa earlier this month .\tDuring President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase .\n1\t263690\t263819\t\" There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , \" he said .\tHe also said there is no conscious policy by the United States to move the value of the dollar .\n1\t283751\t283290\tIt 's the first such drill since the September 11 terrorist attacks on New York and Washington .\tIt is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks .\n1\t2517014\t2516995\tMyanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said .\tMyanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday .\n1\t1330643\t1330622\tAccording to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\tThe Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\n1\t3111452\t3111428\tIn an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites .\tIn an unusual move that critics contend could disrupt millions of Web sites , the U.S. 
Patent and Trademark Office is reconsidering a patent affecting Internet pages .\n0\t1167835\t1167651\tKansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year .\tStatistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year .\n0\t1423836\t1423708\tA European Union spokesman said the Commission was consulting EU member states \" with a view to taking appropriate action if necessary \" on the matter .\tLaos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter .\n1\t2090911\t2091154\tWaiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades .\tBut waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades .\n1\t2265271\t2265152\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States .\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market .\n1\t3062202\t3062308\tBy skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is \" less predictable \" than for those obtained in the United States .\tBy skirting the FDA 's oversight , Eagan said the quality of the imported drugs is \" less predictable \" than U.S. drugs .\n1\t2155514\t2155377\tHe said : \" For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . \"\t\" For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , \" Dyke said .\n1\t1552068\t1551928\tThree such vigilante-style attacks forced the hacker organizer , who identified himself only as \" Eleonora [ 67 ] , \" to extend the contest until 7 p.m. 
EST Sunday .\tThree such vigilante-style attacks forced the hacker organiser , who identified himself only as \" Eleonora67 ] , \" to extend the contest until 8am ( AEST ) today .\n1\t936978\t937500\tEric Gagne pitched a perfect ninth for his 23rd save in as many opportunities .\tGagne struck out two in a perfect ninth inning for his 23rd save .\n0\t985015\t984975\tOne way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday .\tJust about everything about \" Harry Potter and the Order of the Phoenix \" will set records .\n1\t1430357\t1430425\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" said Josh Lichter , a meteorologist with the Houston-Galveston weather office .\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" Lichter said .\n1\t3039310\t3039413\tToday , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks .\tOn Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 .\n1\t34513\t34742\tPolice say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States .\tMr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US .\n1\t368067\t368018\tChiron already has nearly 20 percent acceptances from PowderJect 's shareholders .\tChiron has acceptances from holders of nearly 20 percent of PowderJect shares .\n0\t611663\t611716\tErnst & Young has denied any wrongdoing and plans to fight the allegations .\tErnst & Young has denied the SEC 's claims , and called its recommendations \" irresponsible \" .\n1\t98432\t98657\tThe attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence .\tThe attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence .\n1\t3039007\t3038845\tNo company employee has received an individual target letter at this time .\tShe said no company official had received \" an individual target letter at this time . \"\n1\t1708040\t1708062\tSecond-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share .\tThe second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share .\n0\t1757264\t1757375\tHe allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement .\tThe two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement .\n1\t383417\t383558\tWorldwide , more than 50 million people have seen \" Les Miz , \" with gross receipts of $ 1.8 billion .\tWorldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion .\n0\t2766112\t2766084\tIn fiction : Edward P. Jones ( \" The Known World \" ) and Scott Spencer ( \" A Ship Made of Paper \" ) .\tThe fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper .\n1\t1261116\t1261234\t\" Overwhelmingly the Windows brand really resonated with them . 
\"\t\" Windows was the part of the experience that really resonated with people . \"\n1\t3028143\t3028234\tThe Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes .\tThe Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year .\n0\t249699\t249623\tVivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing .\tDuring difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing .\n0\t3448488\t3448449\tThe Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months .\tThe Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 .\n1\t2749322\t2749663\tThe Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission .\tThe Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission .\n0\t2204592\t2204588\tSun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition .\tThe vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) .\n1\t2889005\t2888954\tProsecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings .\tProsecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings .\n0\t1657632\t1657619\tThe Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today .\tGoodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment .\n0\t555617\t555528\tThe 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t2396937\t2396818\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the Fed said in a statement accompanying the unanimous decision .\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the policy-setting Federal Open Market Committee said .\n0\t2339738\t2339771\t\" It is bad for Symbian , \" said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein .\t\" Motorola has displayed clear disloyalty \" to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London .\n0\t1616174\t1616206\tBob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling .\tBob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment .\n1\t635783\t635802\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that there could be another downgrade if Southcorp breached any of its banking covenants .\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that 
there could be a rating downgrade if Southcorp did breach any banking covenants .\n1\t3444633\t3444733\tHe added : ``I 've never heard of more reprehensiblebehaviour by a doctor .\tThe Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor .\n1\t555553\t555528\tBroomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t1112021\t1111925\tOther staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue .\tSome E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue .\n0\t2749410\t2749625\tPresident Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday .\tPresident Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday .\n1\t1629064\t1629043\tAn episode is declared when the ozone reaches .20 parts per million parts of air for one hour .\tA Stage 1 episode is declared when ozone levels reach 0.20 parts per million .\n1\t789691\t789665\t\" He may not have been there , \" the defence official said on Thursday .\t\" He may not have been there , \" said a defence official speaking on condition of anonymity .\n1\t844421\t844679\tThe U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence .\tThe troops - whose mandate is to protect U.N. 
installations and personnel - can only fire in self-defense and have been unable to stem the violence .\n1\t58540\t58567\tNorth American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight .\tNorth American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight .\n1\t781439\t781461\tXerox itself paid a $ 10 million fine last year to settle similar SEC charges .\tXerox itself previously paid a $ 10-million penalty to settle the SEC accusations .\n1\t1909579\t1909408\t\" This deal makes sense for both companies , \" said National Chief Executive Brian Halla .\t\" This deal makes sense for both companies , \" Halla said in a prepared statement .\n0\t787432\t787464\tThe blasts killed two people and injured more than 150 others .\tThe Atlanta Olympic Games attack killed one woman and injured more than 100 other people .\n0\t52758\t52343\tMorrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service .\tAt the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her .\n1\t1675025\t1675047\tSpansion products are to be available from both AMD and Fujitsu , AMD said .\tSpansion Flash memory solutions are available worldwide from AMD and Fujitsu .\n1\t2131318\t2131372\tAbout 1,500 police will be deployed for the visit .\tAround 1,500 police are to be deployed at Niigata for the ferry 's visit .\n1\t325763\t325928\tGamarekian told The News she remembers only the woman 's first name - and refused to reveal it .\tShe told the New York Daily News she remembers only the intern 's first name , which she refused to reveal .\n1\t2638975\t2638855\tOne of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding .\tOne of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding .\n1\t2198694\t2198937\tA nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year .\tA nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 .\n1\t1825432\t1825301\tA man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday .\tThe Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said .\n1\t2906104\t2906322\tThey were being held Sunday in the Camden County Jail on $ 100,000 bail .\tThey remained in Camden County Jail on Sunday on $ 100,000 bail .\n1\t722278\t722383\tMs Stewart , the chief executive , was not expected to attend .\tMs Stewart , 61 , its chief executive officer and chairwoman , did not attend .\n0\t101747\t101777\tChristina 's aunt , Shelley Riling , said the defense 's claims were preposterous .\tChristina 's aunt , Shelley Riling , said she will address the court .\n1\t2224884\t2224819\tThe Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights .\tThe Justice Department on Aug. 19 sanctioned the Oct. 
7 date for recall election , saying it would not affect voting rights .\n0\t977938\t978162\tLord Falconer hailed the changes as \" a new beginning as far as the courts , Crown Prosecution Service and police are concerned \" .\t\" It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . \"\n0\t1015010\t1014963\tGE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t1513190\t1513246\tAt least 27 US troops have been killed in hostile fire since Bush 's statement .\tAt least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 .\n1\t2385348\t2385394\tA recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday .\tA recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday .\n1\t2317018\t2317252\tNovember 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 .\tNovember 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 .\n0\t1831696\t1831660\tThe agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies .\tThe agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify .\n1\t1528383\t1528083\tZulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards .\tWitness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards .\n1\t917965\t918315\tFor the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase .\tFor the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral .\n0\t3218713\t3218830\tQ : Can I buy coverage for prescription drugs right away ?\tCongress has added a new benefit - an option to buy insurance coverage for prescription drugs .\n1\t221079\t221003\tThe airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers .\tThe airline has the option to buy 380 more , split evenly between the two manufacturers .\n1\t2546175\t2546198\tDr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions .\tDr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function .\n0\t799346\t799268\tThe chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion .\tThe chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue .\n0\t2673104\t2673130\tAll patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea .\tSymptoms of the E. 
coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping .\n1\t1354501\t1354476\tFederal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc .\tFederal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream .\n1\t3070979\t3070949\tEnvironmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK .\tEnvironmental campaigners used the eclipse to highlight the surge in light pollution across Britain .\n0\t1264509\t1264471\tAvailable July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems .\tThe OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers .\n1\t103280\t103431\tJustice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use .\tJustice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot .\n0\t110731\t110648\tBut Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic .\tBillups scored 77 points in the final two games of the first-round series against the Magic .\n1\t2274844\t2274714\tKelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war .\tHe killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq .\n0\t1050307\t1050144\tAnd it 's going to be a wild ride , \" said Allan Hoffenblum , a Republican consultant .\tNow the rest is just mechanical , \" said Allan Hoffenblum , a Republican consultant .\n1\t2810634\t2810670\tWhile the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each .\tInstead of one long operation to separate the twins , Goodrich and Dr. 
David Staffenberg plan about three , with several weeks between each .\n1\t3073773\t3073779\tLay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination .\tLay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination .\n0\t261202\t260995\tThe WHO experts didn 't say how many cases in Hebei were in rural areas .\tHebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas .\n1\t1824224\t1824209\tNearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours .\tMutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired .\n1\t548867\t548785\tIn three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th .\tIn the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list .\n0\t2796658\t2796682\tAbout two hours later , his body , wrapped in a blanket , was found dumped a few blocks away .\tThen his body was dumped a few blocks away , found in a driveway on Argyle Road .\n1\t1808166\t1808434\tColumbia broke up over Texas upon re-entry on Feb. 1 .\tColumbia broke apart in the skies above Texas on Feb. 1 .\n1\t853475\t853342\tA year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs .\tWithin two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs .\n0\t977772\t977804\tThe Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign .\tFalconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign .\n1\t577854\t578500\tCindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents .\tShe started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents .\n1\t2829194\t2829229\tThe two are not related , but have referred to each other as father and son .\tHe 's not related to Malvo , but the two have referred to each other as father and son .\n1\t2074182\t2074668\tGibson said last month in a press statement that \" neither I nor my film are anti-Semitic .\tGibson said in a June statement that he and his film are not anti-Semitic .\n0\t2758265\t2758282\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates .\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them .\n1\t1958079\t1958143\tThe Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data .\tThe blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 .\n1\t544217\t544325\tThe vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council .\tThe vote for mayor followed City Council elections that gave Kurds the 
largest block of votes on the 30-seat council .\n1\t2385288\t2385256\tLarge swells and dangerous surf already were being felt along sections of the coast .\tAlready large swells and dangerous surf have arrived along the mid-Atlantic .\n0\t2324708\t2325028\tBased on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent .\tLabor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent .\n1\t2139506\t2139427\t\" We will work with the board to ensure a smooth transition . \"\tHe said federal regulators would work with the corporation to ensure a \" smooth transition . \"\n1\t2965576\t2965701\tGasps could be heard in the courtroom when the photo was displayed .\tGasps could be heard as the photo was projected onto the screen .\n1\t2931098\t2931144\tGilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter .\tQuarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said .\n0\t644788\t644816\t\" I had one bad stretch of holes that put me out of contention to win , \" Woods said .\t\" I had one bad stretch of holes that put me out of contention , \" Woods said , referring to his 42 on the front nine Saturday .\n0\t2551891\t2551563\tThe poll had a margin of error of plus or minus 2 percentage points .\tIt had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday .\n1\t1089053\t1089297\tSen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic .\tSen. Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered .\n1\t3435735\t3435717\tThe broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 .\tThe Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 .\n0\t1954\t2142\tWatertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country .\tAlong with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday .\n1\t3400796\t3400822\tThat is evident from their failure , three times in a row , to get a big enough turnout to elect a president .\tThree times in a row , they failed to get a big _ enough turnout to elect a president .\n1\t1220668\t1220801\tWe firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . \"\tWe firmly believe that we have an absolute right to use the common word ' spike ' to name our network .\n1\t1889954\t1889847\tSources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE .\tLate last week , sources told Reuters cable TV company Comcast Corp. 
CMCSA.O also was looking at buying VUE assets .\n1\t315785\t315653\tBut MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found .\tMTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said .\n0\t1521034\t1520582\tWhite , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman .\tWhite , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke .\n1\t2083598\t2083810\tAbout 10 percent of high school and 16 percent of elementary students must be proficient at math .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t1910610\t1910455\tThe legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company .\tThe legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company .\n1\t3113791\t3113782\tThe European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached .\tThe European Commission is expected to issue its decision in the case next spring — unless a settlement is reached .\n1\t3214517\t3214483\t\" So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , \" she told jurors .\t\" Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , \" Ms. 
Richardson declared .\n0\t2083612\t2083810\tTwenty percent of Latino students and 23 percent of black students performed at proficient or higher .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t661390\t661218\tHe is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama .\tHe is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama .\n1\t1269572\t1269682\tThe men were remanded in custody and are due to appear again before court on July 8 .\tThey were remanded in custody and will appear in court again on July 8 .\n1\t1095780\t1095652\t\" No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , \" Wheeler said in a statement .\tNo matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday .\n1\t116294\t116332\tThe Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 .\tThe Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 .\n1\t941617\t941673\tHe said his hatred for such people grew from these discussions and had helped convince him violence was the answer .\tHis hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea .\n1\t2640607\t2640576\t\" There is no need for one deadline for all to create the ASEAN Economic Community , \" Thaksin said .\tThus , he said , there did not have to one deadline to create the economic community .\n1\t3310210\t3310286\tThe announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said .\tThe broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said .\n1\t3376093\t3376101\tThe additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes .\tThe donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 .\n1\t1549586\t1549609\tLeon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville .\tThe dead man , Leon Williams , was found in his third-floor apartment .\n1\t460211\t460445\tThe player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\tHe failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\n1\t1196962\t1197061\tBut Virgin wants to operate Concorde on routes to New York , Barbados and Dubai .\tBranson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai .\n0\t862804\t862715\tHe tried to fight off officers and was taken to a hospital after a police dog bit him but was later released .\tCruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. 
Steve Dixon said .\n1\t1726935\t1726879\tThe announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs .\tEconomists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs .\n0\t331980\t332110\tAsked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission .\tAsked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : \" Of course they may not go .\n1\t173879\t173832\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar .\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates .\n0\t2834988\t2835026\tIran has until the end of the month to satisfy the agency it has no plans for nuclear weapons .\tThe Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities .\n1\t2587300\t2587243\tHer father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will .\tHer father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will .\n0\t554905\t554627\tClaire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee .\tOne by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee .\n1\t1912524\t1912648\tCitigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group .\tCitigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business .\n1\t3255597\t3255668\t\" They 've been in the stores for over six weeks , \" says Carney .\tThe quarterlies usually stay in stores for between six to eight weeks , \" Carney added .\n1\t629316\t629289\tLet me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community .\t\" The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , \" he said .\n1\t54181\t53570\tRidge said no actual explosives or other harmful substances will be used .\tRidge said no real explosives or harmful devices will be used in the exercise .\n1\t723557\t724115\tThus far , Stewart 's company appears ready to stand behind her .\tFor now , the company 's management appears to be standing behind Stewart .\n0\t2607718\t2607708\tBut late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement .\tBut late yesterday , the campaign and the state Democratic Party said there would be no news conference .\n1\t753858\t753890\tThere 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t587009\t586969\tAnother $ 100-million in savings will come from management layoffs and pay cuts .\tThe airline expects to save another $ 100-million a year through management 
layoffs and pay cuts .\n1\t308567\t308525\tHe called on Prime Minister John Howard to establish a royal commission on child sex abuse .\tThe Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse .\n0\t665419\t665612\t\" We think that the United States of America should support the free speech of all groups , \" Mr. White said , objecting to Mr. Olson 's recommendation .\tWe think that the United States of America should support the free speech of all groups , he said .\n1\t2763517\t2763576\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\tThe tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years .\n0\t3107118\t3107136\tAfter 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries .\tAfter 18 months , the atorvastatin patients had no change in the plaque in their arteries .\n1\t780604\t780466\tToll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail .\tToll last week offered to buy the company for NZ75c a share , or $ NZ158 million .\n0\t1989213\t1989116\t\" This child was literally neglected to death , \" Armstrong County District Attorney Scott Andreassi said .\tArmstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen .\n1\t1462409\t1462504\tWal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday .\tWal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday .\n1\t260952\t260924\tMetro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported .\tSubway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said .\n1\t1224743\t1225510\tIn the undergraduate case , Rehnquist said the use of race was not \" narrowly tailored \" to achieve the university 's asserted interest in diversity .\tRehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity .\n0\t3329379\t3329416\tSP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) .\tThe firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) .\n1\t2362761\t2362698\tA landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said .\tIn central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said .\n0\t1465073\t1464854\tThey will help draft a plan to attack obesity that Kraft will implement over three to four years .\tThe team will help draft a plan by the end of the year to attack obesity .\n1\t195728\t196099\tBut that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion .\tSuch an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion .\n1\t2587767\t2587673\tIn the 
clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs .\tIn Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs .\n0\t1490044\t1489975\tCorixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market .\tShares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 .\n1\t958161\t957782\tCommittee approval , expected today , would set the stage for debate on the Senate floor beginning Monday .\tThat would clear the way for debate in the full Senate beginning on Monday .\n1\t1033204\t1033365\tO 'Brien was charged with leaving the scene of a fatal accident , a felony .\tBishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident .\n0\t2996241\t2996734\tTom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning .\tBethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday .\n0\t2015389\t2015410\tThe Calgary woman , who is in her twenties , donated blood on Aug. 7 .\tThe woman -- who has no symptoms of illness -- donated blood Aug. 7 .\n1\t221515\t221509\tQuattrone lawyer John W. Keker said his client is innocent .\tIn a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent .\n0\t2283737\t2283794\tIn the weeks leading up to the execution , several Florida officials received anonymous threatening letters .\tSeveral Florida officials connected to the case have received threatening letters , accompanied by rifle bullets .\n1\t2826681\t2826474\tThe disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday .\tThe fight over online music sales was disclosed in documents made available Monday by the court .\n1\t2249237\t2249305\tParson was charged with intentionally causing and attempting to cause damage to protected computers .\tParson is charged with one count of intentionally causing damage to a protected computer .\n1\t389239\t389299\t\" The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , \" the judge said .\t\" The court and the public need to know more of the defendants ' seemingly massive fraud , \" he said .\n1\t2652187\t2652218\tThe U.S. 
Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users .\tThe high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users .\n1\t2945693\t2945847\tThe IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts .\tThe IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts .\n1\t2065523\t2065836\t\" More than 70,000 men and women from bases in Southern California were deployed in Iraq .\tIn all , more than 70,000 troops based in Southern California were deployed to Iraq .\n1\t2222998\t2223097\tBP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange .\tBP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange .\n1\t2561999\t2561941\tBecause of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 .\tIncluding the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 .\n0\t2324704\t2325023\tFriday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate .\tU.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery .\n1\t2336453\t2336545\tFederal Emergency Management Administration designated $ 20 million to establish the registry .\tThe registry was launched with $ 20 million from the Federal Emergency Management Agency .\n1\t720572\t720486\tBREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday .\tCases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time .\n1\t1605818\t1605806\t\" It was never our intention to sell the product , \" said Health Minister Anne McClellan , a skeptic of medical marijuana use .\t\" It was never the intention of us to sell product , \" federal Health Minister Anne McLellan said yesterday in Edmonton .\n0\t2440680\t2440474\tGM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses .\tThey cover more than 300,000 UAW workers and 500,000 retirees and spouses .\n0\t726399\t726078\tRosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , \" Breyer said to tumultuous cheers in the courtroom .\t\" Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . \"\n1\t533903\t533818\t\" We are committed to helping the Iraqi people get on the path to a free society , \" Rumsfeld said in a speech to the Council on Foreign Relations .\t\" We are committed to helping the Iraqi people get on the path to a free society , \" he said .\n1\t1166473\t1166857\tMr. 
Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money .\tYoung said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money .\n1\t144089\t143697\tThe 12-nation currency has risen by 33 percent against the dollar over the past 15 months .\tThe euro is up 9 percent against the dollar in the past six weeks .\n1\t3439854\t3439874\tIn February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing .\tThe officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges .\n1\t3464314\t3464302\tI was surprised it turned out me talking and the president just listening .\t\" I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . \"\n1\t2008984\t2009175\tThe state 's House delegation currently consists of 17 Democrats and 15 Republicans .\tDemocrats hold a 17-15 edge in the state 's U.S. House delegation .\n0\t816867\t816831\tFreddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board .\tHe replaces Leland Brendsel , 61 , who retired as chairman and chief executive .\n1\t192285\t192327\tWe 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting .\t\" We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . \"\n1\t2688145\t2688162\tIn that position , Elias will report to Joe Tucci , president and CEO of EMC .\tAs executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive .\n1\t3294207\t3294290\tBut with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made .\tBut with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made .\n0\t205100\t205145\tA pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote .\tMiodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent .\n0\t3242051\t3241897\tMr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board .\tKerkorian and Tracinda had also tried to take over Chrysler in 1995 .\n0\t1076861\t1077018\tGlover spoke at a news conference that included about 20 relatives of the victims .\tAbout 20 family members of the victims were invited to the news conference .\n1\t2095803\t2095786\tDrax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe .\tDrax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe .\n1\t2112330\t2112376\tBut I would rather be talking about high standards than low standards . 
\"\t\" I would rather be talking about positive numbers rather than negative .\n1\t3389318\t3389271\tIt was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew .\tIt was still not known exactly how many people were on the plane , which could carry 141 passengers and crew .\n1\t698948\t698933\tThe market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March .\tThe market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March .\n1\t539585\t539355\tWitnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew .\tWitnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew .\n1\t684848\t684557\tAs Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times .\tAs he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted \" Takbir ! \" , or \" Proclaim ! \" , a religious rallying cry .\n1\t347017\t347002\tIn hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet .\tIn hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty .\n1\t1592037\t1592076\tIn a statement , Lee said he \" no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . \"\tSpike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture \" Spike TV , \" according to a statement read in court Tuesday .\n0\t3013483\t3013540\tSingapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries .\tHAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia .\n1\t2020252\t2020081\tThe worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about .\tThe worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July .\n0\t2614947\t2614904\tThe premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 .\tThe premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 .\n0\t1744257\t1744378\tIn the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion .\tIn the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share .\n0\t1119721\t1119714\tSony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning .\tIts capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning .\n1\t1186754\t1187056\tAmazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history .\tAmazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history 
.\n1\t2842562\t2842582\tThe show 's closure affected third-quarter earnings per share by a penny .\tThe company said this impacted earnings by a penny a share .\n0\t431076\t431242\tAfter the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances .\tThe committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd .\n1\t1393764\t1393984\tIt 's been a busy couple of days for security gurus assigned to keep their companies safe and sound .\tIt 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound .\n0\t2916199\t2916164\tLu reclined in a soft chair wearing a woolly coat near the blackened capsule .\t\" It 's great to be back home , \" said Lu , dressed in a woolly coat near the blackened capsule .\n1\t2530671\t2530542\tGov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 .\tAfter Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs .\n1\t219064\t218969\t\" It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , \" he said .\t\" It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , \" Parsons told reporters at NASA headquarters .\n0\t2377289\t2377259\tEstonia 's place in the European mainstream and safeguard its independence regained in 1991 .\tEstonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 .\n0\t2110220\t2110199\tFranklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center .\tA county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center .\n0\t1864253\t1863810\tPolice suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs .\tNobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs .\n0\t3150803\t3150839\tDuring this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations .\tDuring the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states .\n0\t969381\t969512\tThe technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 .\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\n1\t271891\t271839\tSony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots .\tIt also features a 4.5 in back-lit LCD screen and memory expansion facilities .\n0\t2829648\t2829613\tClinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill .\tTwo Democrats , Sen. 
Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans .\n1\t886904\t887158\tSome of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit .\tSome of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit .\n0\t2632692\t2632767\tWal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County .\tAt least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County .\n1\t2240399\t2240149\tCintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process .\tCintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process .\n1\t805457\t805985\tThe opposition would resort to rolling mass action \" at strategic times of our choice and without warning to the dictatorship , \" he said .\t\" From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , \" he said .\n1\t2896308\t2896334\tFederal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 .\tHe said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 .\n1\t2110775\t2110924\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering .\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario .\n1\t1762569\t1762526\tHester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers .\tHester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers .\n0\t2706154\t2706185\tThe other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said .\tAfter the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said .\n1\t1057995\t1057778\tThe hearing , expected to last a week , will determine whether Akbar faces a court-martial .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n1\t1386884\t1386857\tHe said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed .\tHe said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed .\n1\t3093023\t3092996\tSpeaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried .\tBrigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney .\n1\t1661381\t1661317\t\" Close co-operation between our law enforcement agencies , close 
co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . \"\tClose cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said .\n0\t2926039\t2925982\tThe mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks .\tThe parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month .\n0\t637168\t637447\tWe strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community .\tMcBride characterized Novell 's move as \" a desperate measure to curry favor with the Linux community . \"\n1\t696677\t696932\tAfter more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday .\tAfter more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion .\n1\t3122429\t3122305\tMr Russell , 46 , a coal miner from Brisbane , said : \" They are obviously hurting , so we are basically going over there to help them . \"\t\" They are obviously hurting so we are basically going over there to help them , \" Russell , 46 , said .\n1\t1348909\t1348954\tThe New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years .\tThe former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on .\n0\t162203\t162101\tIt does not affect the current Windows Media Player 9.0 Series .\tWindows Media Player has had security problems before .\n0\t71501\t71627\tThe seizure took place at 4 a.m. on March 18 , just hours before the first American air assault .\tThe time was about 4 a.m. on March 18 , just hours before the first pinpoint missiles rained down on the capital .\n1\t2907762\t2907649\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively .\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent .\n1\t2167771\t2167744\tIn May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown .\tLast May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood .\n1\t3320577\t3320553\t\" I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , \" he said .\t\" If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . 
\"\n1\t849291\t849442\tIBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets .\tIBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets .\n0\t763948\t763991\tCosta 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final .\tCosta will play Juan Carlos Ferrero next in a rematch of last year 's final .\n1\t1908763\t1908744\tA former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year .\tA former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics .\n0\t1876120\t1876059\tThyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat .\tThyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too .\n1\t518089\t518133\tJudge Craig Doran said it wasn 't his role to determine if Hovan was \" an evil man \" but maintained that \" he has committed an evil act . \"\tJudge Craig Doran said he couldn 't determine if Hovan was \" an evil man \" but said he \" has committed an evil act . \"\n0\t224932\t224868\tThe Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange .\tShares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading .\n1\t1771131\t1771091\tIt also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip .\tThe S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip .\n0\t2728425\t2728251\tIt decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency .\tIt decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status .\n0\t953733\t953537\tAltria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser .\tIts shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser .\n1\t349215\t349241\tIt will be followed in November by a third movie , \" The Matrix Revolutions . \"\tThe film is the second of a trilogy , which will wrap up in November with \" The Matrix Revolutions . 
\"\n1\t2919853\t2919804\tMassachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading .\tState and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal .\n1\t954526\t954607\tHe is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise .\tHe is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard .\n1\t69773\t69792\tCisco pared spending to compensate for sluggish sales .\tIn response to sluggish sales , Cisco pared spending .\n0\t2823575\t2823513\tThe study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said .\tThe study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research .\n1\t2455942\t2455978\tMy decision today is not based on any one event . \"\tGovernor Rowland said his decision was \" not based on any one event . \"\n1\t131979\t131957\tNelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death .\tNelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death .\n0\t2010705\t2010779\t\" The government elements who have been causing trouble are still in place .\tThe government elements who have been causing trouble are still in place , they are attacking us . \"\n1\t54142\t53641\tNext Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\tAround the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\n1\t1015249\t1015204\tWal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations .\tWal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. 
posted May sales that fell below Wall Street 's modest expectations .\n0\t753928\t753890\tThe patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t3022833\t3023029\tPeterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying .\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\n0\t751520\t751373\tSPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems .\tThe DirectBand network was developed with the assistance of SCA Data Systems .\n0\t218848\t218851\tHe replaces Ron Dittemore , who announced his resignation in April .\tDittemore announced his plans to resign on April 23 .\n1\t3181118\t3181443\tDetectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended .\tShortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development .\n1\t515581\t515752\tThey were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches .\tHe said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches .\n1\t347022\t347003\tTaiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket .\tTaiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April .\n1\t3311600\t3311633\tMr. Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty .\tRowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan .\n0\t3439114\t3439084\tRoss Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue .\tRoss Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal .\n0\t487951\t488007\tThe euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session .\tThe euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session .\n0\t314997\t315030\tOn the stand Wednesday , she said she was referring only to the kissing .\tOn the stand Wednesday , she testified that she was referring to the kissing before the alleged rape .\n0\t4733\t4557\tGarner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader .\tThe group has already met several times and Gen. 
Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader .\n1\t2820371\t2820525\tBlair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union .\tBlair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week .\n1\t801552\t801516\t\" There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , \" Baker said .\t\" There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills .\n1\t1704987\t1705268\tCharles O. Prince , 53 , was named as Mr. Weill 's successor .\tMr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor .\n1\t396041\t396188\tOfficials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\tCanadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\n0\t1014983\t1014963\tGE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t2320654\t2320666\tThe Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\tThe Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\n1\t1057876\t1057778\tThe hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n0\t2116843\t2116883\tIn the United States , heart attacks kill about 460,000 year , in Canada about 80,000 .\tIn the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health .\n1\t1461629\t1461781\tNinety-five percent of international cargo to the United States is carried by ship .\tShips carry 95 percent of international cargo to the United States .\n0\t374015\t374162\t\" It 's a major victory for Maine , and it 's a major victory for other states .\tThe Maine program could be a model for other states .\n1\t2493369\t2493428\tNews that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street .\tNews that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday .\n1\t490355\t490378\tThey note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher .\tAfter several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery .\n1\t2691044\t2691264\tMost economists had expected a more dire report , with many anticipating the fifth month of job losses in six months .\tMost economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September .\n1\t1831453\t1831491\tBut software license revenues , a measure financial analysts watch closely , 
decreased 21 percent to $ 107.6 million .\tLicense sales , a key measure of demand , fell 21 percent to $ 107.6 million .\n1\t2380695\t2380822\tKing , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters .\tStephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation .\n1\t2577517\t2577531\tThe Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission .\tThe natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC .\n1\t3267026\t3266930\tThe steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned .\tThe U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned .\n1\t360875\t360943\tBusiness Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday .\tBusinessWeek Online has learned that the settlement could come as early as Monday , May 19 .\n1\t162632\t162653\tOnly one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site .\tOnly one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site .\n1\t1128884\t1128865\tShares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 .\tSince the initial takeover offer , Salix shares have risen about 35 percent .\n1\t3264732\t3264648\tThe jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself .\tThe quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself .\n1\t1721433\t1721267\tIt 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season .\tIt 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season .\n0\t146112\t146127\tThe broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 .\tThe technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 .\n1\t389117\t389052\tThe company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States .\tMcDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States .\n1\t872784\t872834\tGregory Parseghian , a former investment banker , was appointed chief executive .\tGreg Parseghian was appointed the new chief executive .\n0\t2977500\t2977547\tTheir contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. 
Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 .\t\" It has outraged the membership , \" said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 .\n1\t3107137\t3107119\tBut plaque volume increased by 2.7 percent in pravastatin patients .\tThe volume of plaque in Pravachol patients ' arteries rose by 3 % .\n1\t1619244\t1619274\tToday in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores .\tTomorrow the book , kept under wraps by G. P. Putnam 's Sons since its inception , will appear in bookstores .\n0\t3061836\t3062031\tThe S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points .\tOn the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points .\n1\t485999\t486011\tEx-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' .\tIn Soviet times the Beatles ' music \" was considered propaganda of an alien ideology .\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/mrpc.proto",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nsyntax = \"proto3\";\n\npackage mrpc;\n\nservice mrpc {\n    rpc paraphrase (TextPair) returns (YesNo) {}\n}\n\nmessage TextPair {\n    bytes text_a = 1;\n    bytes text_b = 2;\n}\n\nmessage YesNo {\n    bytes message = 1;\n    bytes prediction = 2;\n}\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/index.rst",
    "content": ".. _tensorflow-tutorials:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow Tutorials\n====================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nBefore running a tutorial\n-------------------------\n\nYou will run the tutorials on an inf1.6xlarge instance running Deep Learning AMI (DLAMI) to enable both compilation and deployment (inference) on the same instance. In a production environment we encourage you to try different instance sizes to optimize to your specific deployment needs.\n\nFollow instructions at :ref:`tensorflow-tutorial-setup` before running a TensorFlow tutorial on Inferentia. We recommend new users start with the ResNet-50 tutorial.\n\n\n.. toctree::\n   :hidden:\n\n   /archive/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup\n\n.. _tensorflow-nlp:\n\nNatural Language Processing\n---------------------------\n\n*  Tensorflow 2.x - HuggingFace DistilBERT with Tensorflow2 Neuron :ref:`[html] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` :github:`[notebook] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`\n\n.. toctree::\n   :hidden:\n\n   /archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo\n   /src/examples/tensorflow/huggingface_bert/huggingface_bert\n\n.. _tensorflow-utilize-neuron:\n\nUtilizing Neuron Capabilities\n-----------------------------\n\n*  Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] </src/examples/tensorflow/tensorflow_serving_tutorial.rst>`\n\n.. toctree::\n   :hidden:\n\n   /src/examples/tensorflow/tensorflow_serving_tutorial.rst\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/k8s_bert_demo/Dockerfile.tfserving_example",
    "content": "From ubuntu:16.04\nRUN apt-get update\nRUN apt-get install -y wget apt-transport-https ca-certificates awscli\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com xenial main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\nRUN apt-get update\nRUN apt-get install -y tensorflow-model-server-neuron"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.rst",
    "content": ".. _tensorflow-tutorial-setup:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow Tutorial Setup\n=========================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n#. Launch an Inf1.6xlarge Instance:\n    .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n\n#. Set up a development environment:\n    * Enable or install TensorFlow-Neuron: :ref:`install-neuron-tensorflow`.\n    \n#. Run tutorial in Jupyter notebook:\n    * Follow instruction at :ref:`Setup Jupyter notebook <setup-jupyter-notebook-steps-troubleshooting>` to:\n    \n      #. Start the Jupyter Notebook on the instance\n      #. Run the Jupyter Notebook from your local browser\n\n    * Connect to the instance from the terminal, clone the Neuron Github repository to the Inf1 instance and then change the working directory to the tutorial directory:\n\n      .. code::\n\n        git clone https://github.com/aws/aws-neuron-sdk.git\n        cd aws-neuron-sdk/src/examples/tensorflow\n\n    * Locate the tutorial notebook file (.ipynb file) under ``aws-neuron-sdk/src/examples/tensorflow``\n    * From your local browser, open the tutorial notebook from the menu and follow the instructions.\n\n    \n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTutorials  (``tensorflow-neuron``)\n===================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Natural Language Processing (NLP) Tutorials </archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-nlp>\n    Utilizing Neuron Capabilities Tutorials </archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities>\n\n\n.. include:: /archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt",
    "content": ".. tab-set::\n                            \n    .. tab-item:: Natural Language Processing (NLP) Tutorials\n        \n        *  Tensorflow 2.x - HuggingFace Pipelines distilBERT with Tensorflow2 Neuron :ref:`[html] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` :github:`[notebook] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`\n\n                            \n        \n    .. tab-item:: Utilizing Neuron Capabilities Tutorials\n        \n        *  Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] </src/examples/tensorflow/tensorflow_serving_tutorial.rst>`\n\n.. note::\n\n    To use Jupyter Notebook see:\n\n    * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n    * :ref:`running-jupyter-notebook-as-script` "
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-nlp.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nNatural Language Processing (NLP) Tutorials (``tensorflow-neuron``)\n===================================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n*  Tensorflow 2.x - HuggingFace DistilBERT with Tensorflow2 Neuron :ref:`[html] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>` :github:`[notebook] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`\n\n.. toctree::\n    :hidden:\n\n    /archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo\n    /src/examples/tensorflow/huggingface_bert/huggingface_bert\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUtilizing Neuron Capabilities Tutorials (``tensorflow-neuron``)\n===============================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n*  Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] <tensorflow-serving-neuronrt-visible-cores>`\n\n.. note::\n\n   To use Jupyter Notebook see:\n\n   * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n   * :ref:`running-jupyter-notebook-as-script` \n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron-inference.rst",
    "content": ".. _inference-tensorflow-neuron:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInference on Inf1 (``tensorflow-neuron``)\n=========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Tutorials </archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron>\n    Additional Examples  </archive/tensorflow/tensorflow-neuron/additional-examples>\n    API Reference Guide  </archive/tensorflow/tensorflow-neuron/api-reference-guide>\n    Misc  </archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron>\n\n\n.. include:: tensorflow-neuron-inference.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuron-inference.txt",
    "content": ".. card:: Setup  (``tensorflow-neuron``)\n            :class-body: sphinx-design-class-title-small\n\n            See :doc:`TensorFlow Neuron setup </archive/tensorflow/index>`.\n\n\n.. dropdown::  Tutorials (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-med\n    :animate: fade-in\n                \n    .. include:: /archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt\n\n\n.. dropdown::  Additional Examples (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-med\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /archive/tensorflow/tensorflow-neuron/additional-examples.txt\n\n\n.. dropdown::  API Reference Guide (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-med\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /archive/tensorflow/tensorflow-neuron/api-reference-guide.txt\n\n\n.. dropdown::  Misc (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-med\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n                \n    .. include:: /archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt\n\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/api-reference-guide.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAPI Reference Guide (``tensorflow-neuronx``)\n===========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api\n    /archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api\n    /archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api\n\n\n.. include:: /archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt",
    "content": "* :ref:`tfneuronx-ref-neuron-tracing-api`\n* :ref:`tf-neuronx-ref-auto-replication-python-api`\n* :ref:`tf-neuronx-ref-analyze-model-api`"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMisc (``tensorflow-neuronx``)\n============================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /release-notes/archive/tensorflow/tensorflow-neuronx/tensorflow-neuronx\n\n\n\n.. include:: /archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt\n    \n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt",
    "content": "* :ref:`tensorflow-neuronx-release-notes`"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/index.rst",
    "content": ".. _tensorflow-neuron-setup:\n.. _tensorflow-neuronx-main:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow Setup Guide for Inf2 & Trn1\n======================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n\n    Fresh install </archive/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install>\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.rst",
    "content": ".. _install-neuronx-2.8.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Tensorflow Neuron (Neuron 2.8.0)\n========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.0\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 AMI\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n            .. tab-item:: Ubuntu 20 AMI\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.rst",
    "content": ".. _install-neuronx-2.9.0-tensorflow:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Tensorflow Neuron (Neuron 2.9.0)\n========================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.0\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 AMI\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n            .. tab-item:: Ubuntu 20 AMI\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2.rst",
    "content": ".. _tensorflow-neuronx-install-prev-al2:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous TensorFlow Neuron Releases for Amazon Linux (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.17.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.16.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2023.rst",
    "content": ".. _tensorflow-neuronx-install-prev-al2023:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous TensorFlow NeuronX Releases for Amazon Linux 2023 (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u20.rst",
    "content": ".. _tensorflow-neuronx-install-prev-u20:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous TensorFlow Neuron Releases for Ubuntu (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u22.rst",
    "content": ".. _tensorflow-neuronx-install-prev-u20:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous TensorFlow Neuron Releases for Ubuntu (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n    \n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.rst",
    "content": ".. _install-tensorflow-neuronx:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall TensorFlow 2.x (``tensorflow-neuronx``)\n===============================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 13\n                    :end-line: 16\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 32\n                    :end-line: 33\n\n            .. tab-item:: Ubuntu 20\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 19\n                    :end-line: 22\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 35\n                    :end-line: 36\n\n    .. tab-item:: Tensorflow 2.9.3\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 13\n                    :end-line: 16\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 74\n                    :end-line: 75\n\n            .. tab-item:: Ubuntu 20\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 19\n                    :end-line: 22\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 77\n                    :end-line: 78\n\n    .. tab-item:: Tensorflow 2.8.4\n\n      .. tab-set::\n\n            .. tab-item:: Amazon Linux 2\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 13\n                    :end-line: 16\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 80\n                    :end-line: 81\n\n            .. tab-item:: Ubuntu 20\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 19\n                    :end-line: 22\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 83\n                    :end-line: 84\n\n    .. tab-item:: Tensorflow 2.7.4\n\n      .. tab-set::\n\n            .. tab-item:: Amazon Linux 2\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 13\n                    :end-line: 16\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 86\n                    :end-line: 87\n\n            .. tab-item:: Ubuntu 20\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 19\n                    :end-line: 22\n\n                .. 
include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 89\n                    :end-line: 90\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2-dlami.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. tensorflow-neuronx-al2-update:\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 122\n            :end-line: 123\n\n\n    .. tab-item:: Tensorflow 2.9.3\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 125\n            :end-line: 126\n\n\n    .. tab-item:: Tensorflow 2.8.4\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 128\n            :end-line: 129\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. tensorflow-neuronx-al2-update:\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 56\n            :end-line: 57\n\n\n    .. tab-item:: Tensorflow 2.9.3\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 62\n            :end-line: 63\n\n\n    .. tab-item:: Tensorflow 2.8.4\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 68\n            :end-line: 69\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20-dlami.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. tensorflow-neuronx-u20-update:\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 131\n            :end-line: 132\n\n\n    .. tab-item:: Tensorflow 2.9.3\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 134\n            :end-line: 135\n\n\n    .. tab-item:: Tensorflow 2.8.4\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 137\n            :end-line: 138\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. tensorflow-neuronx-u20-update:\n\nUpdate to latest TensorFlow NeuronX  (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 59\n            :end-line: 60\n\n\n    .. tab-item:: Tensorflow 2.9.3\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 65\n            :end-line: 66\n\n\n    .. tab-item:: Tensorflow 2.8.4\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 71\n            :end-line: 72\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u22.rst",
    "content": "\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n.. tensorflow-neuronx-u22-update:\n\nUpdate to latest TensorFlow Neuron  (``tensorflow-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: Tensorflow 2.10.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami --category=compiler_framework\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api.rst",
    "content": ".. _tf-neuronx-ref-auto-replication-python-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuronx``) Auto Multicore Replication (Beta)\n===========================================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe Neuron auto multicore replication Python API enables modifying TensorFlow 2.x\nmodels trace by ```tensorflow_neuronx.trace``` so that they can be automatically replicated across multiple cores.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nTensorFlow 2.x (``tensorflow-neuron TF2.x``) Auto Multicore Replication Python API (Beta)\n-------------------------------------------------------------------------------------------\n\nMethod\n^^^^^^\n\n``tensorflow.neuron.auto_multicore``\non models traced by\n``tensorflow_neuronx.trace``\n\nDescription\n^^^^^^^^^^^\n\nConverts an existing AWS-Neuron-optimized ``keras.Model`` and returns an auto-replication tagged\nAWS-Multicore-Neuron-optimized  ``keras.Model`` that can execute on AWS Machine Learning Accelerators.\nLike the traced model, the returned ``keras.Model`` will support inference only. Attributes or\nvariables held by the original function or ``keras.Model`` will be dropped.\n\nThe auto model replication feature in TensorFlow-Neuron enables you to\ncreate a model once and the model parallel replication would happen\nautomatically. The desired number of cores can be less than the total available NeuronCores\non an trn1 or inf2 instance but not less than 1. This reduces framework memory usage as you are not\nloading the same model multiple times manually. Calls to the returned model will execute the call\non each core in a round-robin fashion.\n\nThe returned ``keras.Model`` can be exported as SavedModel and served using\nTensorFlow Serving. Please see the TensorFlow Serving documentation for more\ninformation about exporting to saved model and serving using TensorFlow\nServing.\n\nNote that the automatic replication will only work on models compiled with pipeline size 1:\nvia ``--neuroncore-pipeline-cores=1``. If auto replication is not enabled, the model will default to\nreplicate on up to 4 cores.\n\nSee  :ref:`neuron-compiler-cli-reference-guide` for more information about compiler options.\n\nArguments\n^^^^^^^^^\n\n-   **func:** The ``keras.Model`` or function to be traced.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. 
Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n-   **num_cores:** The desired number of cores where the model will be automatically\n    replicated across\n\nReturns\n^^^^^^^\n\n-  An AWS-Multicore-Neuron-optimized ``keras.Model``.\n\n\nExample Python API Usage for TF2.x traced models:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code :: python\n\n        import tensorflow as tf\n        import tensorflow.neuron as tfn\n        import tensorflow_neuronx as tfnx\n\n        input0 = tf.keras.layers.Input(3)\n        dense0 = tf.keras.layers.Dense(3)(input0)\n        inputs = [input0]\n        outputs = [dense0]\n        model = tf.keras.Model(inputs=inputs, outputs=outputs)\n        input0_tensor = tf.random.uniform([1, 3])\n        model_neuron = tfnx.trace(model, input0_tensor)\n\n        # a trn1.2xlarge has 2 neuron cores\n        num_cores = 2\n        multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores)\n        multicore_model(input0_tensor)\n\nExample Python API Usage for TF2.x saved models:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code :: python\n\n        from tensorflow.python import saved_model\n\n        input0_tensor = tf.random.uniform([1, 3])\n        num_cores = 4\n        reload_model = saved_model.load(model_dir)\n        multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)\n\n.. _tensorflow-ref-auto-replication-cli-api-neuronx:\n\nTensorFlow Neuron TF2.x (``tensorflow-neuronx TF2.x``) Auto Multicore Replication CLI (Beta)\n---------------------------------------------------------------------------------------------------------------\n\nThe Neuron auto multicore replication CLI  enables modifying Tensorflow 2.x\ntraced saved models so that they can be automatically replicated across multiple cores. By performing\nthis call on Tensorflow Saved Models, we can support Tensorflow-Serving\nwithout significant modifications to the code.\n\nMethod\n^^^^^^\n\n``tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR``\n\nArguments\n^^^^^^^^^\n\n-   **MODEL_DIR:** The directory of a saved AWS-Neuron-optimized ``keras.Model``.\n-   **NUM_CORES:** The desired number of cores where the model will be automatically\n    replicated across\n-   **NEW_MODEL_DIR:** The directory of where the AWS-Multicore-Neuron-optimized\n    ``keras.Model`` will be saved\n\nExample CLI Usage for Tensorflow-Serving saved models:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code :: python\n\n        tf-neuron-auto-multicore ./resnet --num_cores 8 --new_model_dir ./modified_resnet\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api.rst",
    "content": ".. _tfneuronx-ref-neuron-tracing-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuronx``) Tracing API\n====================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe Neuron tracing API enables tracing TensorFlow 2.x models for deployment\non trn1 and inf2 AWS machine learning accelerators.\n\nMethod\n------\n\n``tensorflow_neuronx.trace``\n\nDescription\n-----------\n\nTrace a ``keras.Model`` or a Python callable that can be decorated by\n``tf.function``, and return an AWS-Neuron-optimized ``keras.Model`` that\ncan execute on trn1 and inf2 AWS machine learning accelerators. Tracing is\nideal for ``keras.Model`` that accepts a list of ``tf.Tensor`` objects and\nreturns a list of ``tf.Tensor`` objects. It is expected that users will\nprovide example inputs, and the ``trace`` function will execute ``func``\nsymbolically and convert it to a ``keras.Model``.\n\nThe returned ``keras.Model`` will support inference only. Attributes or\nvariables held by the original function or ``keras.Model`` will be dropped.\n\nThe returned ``keras.Model`` can be exported as SavedModel and served using\nTensorFlow Serving. Please see the TensorFlow Serving documentation for more\ninformation about exporting to saved model and serving using TensorFlow\nServing.\n\nThe returned ``keras.Model`` has an ``.on_neuron_ratio`` attribute\nwhich shows the percentage of ops mapped to neuron hardware. This calculation\nignores PlaceholerOp, IdentityOp, ReadVariableOp and NoOp.\n\nOptions can be passed to Neuron compiler via the environment variable\n``NEURON_CC_FLAGS``. For example, the syntax\n``env NEURON_CC_FLAGS=\"--workdir ./artifacts\"`` directs the Neuron compiler to dump artifacts\nin the artifacts directory for debugging. See :ref:`neuron-compiler-cli-reference-guide` for more\ninformation about compiler options.\n\nArguments\n---------\n\n-   **func:** The ``keras.Model`` or function to be traced.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. 
Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n-   **subgraph_builder_function:** (Optional) A callable with signature\n\n    ``subgraph_builder_function(node : NodeDef) -> bool``\n    (``NodeDef`` is defined in tensorflow/core/framework/node_def.proto)\n\n    that is used as a call-back function to determine which part of\n    the tensorflow GraphDef given by tracing ``func`` will be placed on\n    Machine Learning Accelerators.\n\n    If ``subgraph_builder_function`` is not provided, then ``trace`` will\n    automatically place operations on Machine Learning Accelerators or\n    on CPU to maximize the execution efficiency.\n\n    If it is provided, and ``subgraph_builder_function(node)`` returns\n    ``True``, and placing ``node`` on Machine Learning Accelerators\n    will not cause deadlocks during execution, then ``trace`` will place\n    ``node`` on Machine Learning Accelerators. If\n    ``subgraph_builder_function(node)`` returns ``False``, then ``trace``\n    will place ``node`` on CPU.\n\n.. _tensorflow-neuronx-special-flags:\n\nSpecial Flags\n-------------\n\nThese are flags that get passed directly to the Neuron tracing API\n(rather than the Neuron Compiler). The flags are still passed\nvia the environment variable ``NEURON_CC_FLAGS``.\n\n-   **workdir:** example usage - ``NEURON_CC_FLAGS='--workdir ./artifacts'``\n    will create a folder named artifacts in the current directory and\n    save artifacts that can be used for debug.\n-   **dynamic-batch-size:** example usage -\n    ``NEURON_CC_FLAGS='--dynamic-batch-size'`` A flag to allow Neuron graphs to\n    consume variable sized batches of data. Dynamic sizing is restricted to the\n    0th dimension of a tensor.\n-   **extract-weights (Beta):** example usage - \n    ``NEURON_CC_FLAGS='--extract-weights trn1.2xlarge'`` will reduce the compiled\n    model's protobuf size by taking the weights out of the protobuf.\n    Useful for compiling large models that would exceed the 2GB protobuf\n    size limit. This feature is in beta. Model performance is not\n    guaranteed and the flag does not work in combination with\n    ``--neuroncore-pipeline-cores``, ``--dynamic-batch-size``, models with\n    multiple NEFFs, and models that are 16GB or greater. \n    Compiles models for different neuron instances depending on the instance type passed.\n    Supports all trn1 and inf2 instance types except for trn1n.\n\nReturns\n-------\n\n-  An AWS-Neuron-optimized ``keras.Model``.\n\n\nExample Usage\n-------------\n\n.. 
code:: python\n\n    import tensorflow as tf\n    import tensorflow_neuronx as tfnx\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    model = tf.keras.Model(inputs=[input0], outputs=[dense0])\n    example_inputs = tf.random.uniform([1, 3])\n    model_neuron = tfnx.trace(model, example_inputs)  # trace\n    # check to see how much of the model was compiled successfully\n    print(model_neuron.on_neuron_ratio) \n\n    model_dir = './model_neuron'\n    model_neuron.save(model_dir)\n    model_neuron_reloaded = tf.keras.models.load_model(model_dir)\n\n\nExample Usage with Manual Device Placement Using ``subgraph_builder_function``\n------------------------------------------------------------------------------\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow_neuronx as tfnx\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    reshape0 = tf.keras.layers.Reshape([1, 3])(dense0)\n    output0 = tf.keras.layers.Dense(2)(reshape0)\n    model = tf.keras.Model(inputs=[input0], outputs=[output0])\n    example_inputs = tf.random.uniform([1, 3])\n\n    def subgraph_builder_function(node):\n        return node.op == 'MatMul'\n\n    model_neuron = tfnx.trace(\n        model, example_inputs,\n        subgraph_builder_function=subgraph_builder_function,\n    )\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api.rst",
    "content": ".. _tf-neuronx-ref-analyze-model-api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorFlow 2.x (``tensorflow-neuronx``) analyze_model API\n==========================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nMethod\n------\n\n``tensorflow_neuronx.analyze_model``\n\nDescription\n-----------\n\nAnalyzes a ``keras.Model`` or a Python callable that can be decorated by\n``tf.function`` for it's compatibility with Neuron. It displays supported \nvs. unsupported operators in the model as well as percentages and counts of \neach operator and returns a dictionary with operator statistics.\n\nArguments\n---------\n\n-   **func:** The ``keras.Model`` or function to be analyzed.\n-   **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of\n    ``tf.Tensor`` objects for tracing the function. When ``example_inputs``\n    is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect\n    ``func`` to have calling signature ``func(example_inputs)``. Otherwise,\n    the expectation is that inference on ``func`` is done by calling\n    ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``,\n    or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``.\n    The case where ``func`` accepts mixed positional and keyword arguments\n    is currently unsupported.\n\nReturns\n-------\n\n-  A results ``dict`` with these keys: ``'percent_supported'``, ``'supported_count'``, ``'total_count'``, ``'supported_operators'``, ``'unsupported_operators'``, ``'operators'``, ``'operator_count'``.\n\nExample Usage\n-------------\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow_neuron as tfnx\n\n    input0 = tf.keras.layers.Input(3)\n    dense0 = tf.keras.layers.Dense(3)(input0)\n    model = tf.keras.Model(inputs=[input0], outputs=[dense0])\n    example_inputs = tf.random.uniform([1, 3])\n    results = tfnx.analyze_model(model, example_inputs)\n    print(results)\n\n    # expected output\n    '''\n    BiasAdd\n\tMatMul\n\t100.00% of all operations (2 of 2) are supported\n\t{'percent_supported': 100.0, 'supported_count': 2, 'total_count': 2, \n\t'supported_operators': {'BiasAdd', 'MatMul'}, 'unsupported_operators': [], \n\t'operators': ['BiasAdd', 'MatMul'], 'operator_count': {'MatMul': 1, 'BiasAdd': 1}}\n\t'''\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores.rst",
    "content": ".. _tensorflow-servingx-neuronrt-visible-cores:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUsing NEURON_RT_VISIBLE_CORES with TensorFlow Serving\n=====================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nTensorFlow serving allows customers to scale-up inference workloads\nacross a network. TensorFlow Neuron Serving uses the same API as normal\nTensorFlow Serving with two differences: (a) the saved model must be\ncompiled for neuron and (b) the entry point is a different binary\nnamed ``tensorflow_model_server_neuronx``. Follow the steps below \nto install the package using apt-get or dnf. This will be pre-installed in a future release.\n\nInstall TensorFlow Model Server and Serving API\n-----------------------------------------------\n\nFollow the steps in the TensorFlow NeuronX installation guide.\n\nThen ensure you install using either apt-get or dnf.\n\n.. code:: bash\n\n  sudo apt-get install tensorflow-model-server-neuronx\n\nor\n\n.. code:: bash\n\n  sudo dnf install tensorflow-model-server-neuronx\n\nAlso, you would need TensorFlow Serving API (use --no-deps to prevent\ninstallation of regular tensorflow).\n\n.. code:: bash\n\n   pip install --no-deps tensorflow_serving_api\n\nFor the example image preprocessing using Keras preprocessing, the\nPython Imaging Library Pillow is required:\n\n.. code:: bash\n\n   pip install pillow\n\nTo workaround h5py issue https://github.com/aws/aws-neuron-sdk/issues/220:\n\n.. code:: bash\n\n   pip install \"h5py<3.0.0\"\n\n\nExport and Compile Saved Model\n------------------------------\n\nThe following example shows graph construction followed by the addition\nof Neuron compilation step before exporting to saved model.\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow_neuronx as tfnx\n    import numpy as np\n\n    tf.keras.backend.set_learning_phase(0)\n    tf.keras.backend.set_image_data_format('channels_last')\n    image_sizes = [224, 224]\n    model = tf.keras.applications.ResNet50(weights='imagenet')\n    example_inputs = tf.random.uniform([1, *image_sizes, 3], dtype=tf.float32)\n\n    model_neuron = tfnx.trace(model, example_inputs)\n    # run the model once to define the forward pass and allow for saving\n    model_neuron(example_inputs)\n    tf.keras.models.save_model(model_neuron, './resnet50_neuron/1')\n\n\n\nServing Saved Model\n-------------------\n\nUser can now serve the saved model with the\ntensorflow_model_server_neuron binary. To utilize multiple NeuronCores,\nit is recommended to launch multiple tensorflow model servers that\nlisten to the same gRPC port:\n\n.. 
code:: bash\n\n   export NEURON_RT_VISIBLE_CORES=0  # important to set this environment variable before launching model servers\n   tensorflow_model_server_neuron --model_name=resnet50_neuron \\\n        --model_base_path=$(pwd)/resnet50_neuron/ --port=8500\n\n   # then to run another server on a different neuron core open another\n   # window and run this, except this time set NEURON_RT_VISIBLE_CORES=1\n   # you can keep doing this up to the number of Neuron Cores on your machine\n\n   export NEURON_RT_VISIBLE_CORES=1\n   tensorflow_model_server_neuron --model_name=resnet50_neuron \\\n        --model_base_path=$(pwd)/resnet50_neuron/ --port=8500\n\nThe compiled model is staged in neuron DRAM by the server to prepare\nfor inference.\n\nGenerate inference requests to the model server\n-----------------------------------------------\n\nNow run inferences via GRPC as shown in the following sample client\ncode:\n\n.. code:: python\n\n    import numpy as np\n    import grpc\n    import tensorflow as tf\n    from tensorflow.keras.preprocessing import image\n    from tensorflow.keras.applications.resnet50 import preprocess_input\n    from tensorflow_serving.apis import predict_pb2\n    from tensorflow_serving.apis import prediction_service_pb2_grpc\n    from tensorflow.keras.applications.resnet50 import decode_predictions\n\n    tf.keras.backend.set_image_data_format('channels_last')\n\n    if __name__ == '__main__':\n        channel = grpc.insecure_channel('localhost:8500')\n        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n        img_file = tf.keras.utils.get_file(\n            \"./kitten_small.jpg\",\n            \"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\")\n        img = image.load_img(img_file, target_size=(224, 224))\n        img_array = preprocess_input(image.img_to_array(img)[None, ...])\n        request = predict_pb2.PredictRequest()\n        request.model_spec.name = 'resnet50_neuron'\n        request.inputs['input_1'].CopyFrom(\n            tf.make_tensor_proto(img_array, shape=img_array.shape))\n        result = stub.Predict(request)\n        prediction = tf.make_ndarray(result.outputs['output_1'])\n        print(decode_predictions(prediction))\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.rst",
    "content": ".. _inference-tensorflow-neuronx-tutorials:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTutorials  (``tensorflow-neuronx``)\n===================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n  \n    HuggingFace Roberta-Base </src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>\n    /archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores\n\n\n.. include:: /archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt",
    "content": "* HuggingFace Roberta-Base :ref:`[html]</src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>` :github:`[notebook] </src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb>`\n* :ref:`tensorflow-servingx-neuronrt-visible-cores`\n\n.. note::\n    To use Jupyter Notebook see:\n\n    * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n    * :ref:`running-jupyter-notebook-as-script`"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx-inference.rst",
    "content": ".. _inference-tensorflow-neuronx:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInference on Inf2 & Trn1/Trn1n (``tensorflow-neuronx``)\n=======================================================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Tutorials </archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx>\n    API Reference Guide  </archive/tensorflow/tensorflow-neuronx/api-reference-guide>\n    Misc  </archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx>\n\n\n.. include:: tensorflow-neuronx-inference.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-neuronx-inference.txt",
    "content": ".. card:: Setup  (``tensorflow-neuronx``)\n            :class-body: sphinx-design-class-title-small\n\n            See :doc:`TensorFlow NeuronX setup </archive/tensorflow/index>`.\n            \n            \n.. dropdown::  Tutorials (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-med\n    :animate: fade-in\n\n    .. include:: /archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt\n\n\n.. dropdown::  API Reference Guide (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-med\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt\n\n\n.. dropdown::  Misc (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-med\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n                \n    .. include:: /archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt"
  },
  {
    "path": "archive/tensorflow/tensorflow-setup.rst",
    "content": ".. _tf-setup:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTensorflow Neuron Setup\n=======================\n\n.. warning::\n\n   This document is archived. TensorFlow is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: tensorflow-setup.txt\n"
  },
  {
    "path": "archive/tensorflow/tensorflow-setup.txt",
    "content": ".. card:: Tensorflow Neuron (``tensorflow-neuronx``) Setup for  Inf2, Trn1/Trn1n Instances\n            :class-body: sphinx-design-class-title-small\n\n            See :doc:`TensorFlow NeuronX setup </archive/tensorflow/index>`.\n\n\n.. card:: Tensorflow Neuron (``tensorflow-neuron``) Setup for Inf1 Instances\n            :class-body: sphinx-design-class-title-small\n\n            See :doc:`TensorFlow Neuron setup </archive/tensorflow/index>`.\n"
  },
  {
    "path": "archive/torch-neuron/additional-examples-inference-torch-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAdditional Examples (``torch-neuron``)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuron/inference>\n\n\n\n.. include:: /archive/torch-neuron/additional-examples-inference-torch-neuron.txt\n"
  },
  {
    "path": "archive/torch-neuron/additional-examples-inference-torch-neuron.txt",
    "content": "* `AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/archive/torch-neuron/inference>`_\n"
  },
  {
    "path": "archive/torch-neuron/api-compilation-python-api.rst",
    "content": ".. _torch_neuron_trace_api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch-Neuron trace python API\n================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe PyTorch-Neuron trace Python API provides a method to generate\nPyTorch models for execution on Inferentia, which can be serialized as\nTorchScript. It is analogous to :func:`torch.jit.trace` function in PyTorch.\n\n.. py:function:: torch_neuron.trace(model, example_inputs, **kwargs)\n\n    The :func:`torch_neuron.trace` method sends operations to\n    the Neuron-Compiler (``neuron-cc``) for compilation and embeds compiled\n    artifacts in a TorchScript graph.\n\n    Compilation can be done on any EC2 machine with sufficient memory and\n    compute resources. c5.4xlarge or larger is recommended.\n\n    Options can be passed to Neuron compiler via the compile function. See\n    :ref:`neuron-compiler-cli-reference`\n    for more information about compiler options.\n\n    This function partitions nodes into operations that are supported\n    by Neuron and operations which are not. Operations which are not supported\n    by Neuron are run on CPU. Graph partitioning can be controlled by the\n    ``subgraph_builder_function``, ``minimum_segment_size``, and ``fallback``\n    parameters (See below). By default all supported operations are compiled and\n    run on Neuron.\n\n    The compiled graph can be saved using the :func:`torch.jit.save` function and\n    restored using :func:`torch.jit.load` function for inference on Inf1 instances.\n    During inference, the previously compiled artifacts will be loaded into\n    the Neuron Runtime for inference execution.\n\n    *Required Arguments*\n\n    :arg ~torch.nn.Module,callable model: The functions that that will be run with\n       ``example_inputs`` arguments. The arguments and return types must compatible\n       with :func:`torch.jit.trace`. When a :class:`~torch.nn.Module` is passed\n       to :func:`torch_neuron.trace`, only the :func:`~torch.nn.Module.forward`\n       method is run and traced.\n    :arg tuple example_inputs: A tuple of example inputs that will be passed to\n       the ``model`` while tracing. The resulting trace can be run with inputs\n       of different types and shapes assuming the traced operations support\n       those types and shapes. This parameter may also be a single\n       :class:`torch.Tensor` in which case it is automatically wrapped in a\n       ``tuple``.\n\n    *Optional Keyword Arguments*\n\n    :keyword list[str] compiler_args: List of strings representing\n       ``neuron-cc`` compiler arguments. Note that these arguments apply to all\n       subgraphs generated by allowlist partitioning. For example, use\n       :code:`compiler_args=['--neuroncore-pipeline-cores', '4']` to set number\n       of NeuronCores per subgraph to 4. See :ref:`neuron-compiler-cli-reference`\n       for more information about compiler options.\n    :keyword int compiler_timeout: Timeout in seconds for waiting\n       ``neuron-cc`` to complete. Exceeding this timeout will cause a\n       ``subprocess.TimeoutExpired`` exception.\n    :keyword str compiler_workdir: Work directory used by\n       ``neuron-cc``. 
Useful for debugging and/or inspecting ``neuron-cc``\n       logs/IRs.\n    :keyword callable subgraph_builder_function: A function which is evaluated\n       on each node during graph partitioning. This takes in a torch graph\n       operator node and returns a :class:`bool` value of whether\n       it should be included in the fused Neuron graph or not. By default the\n       partitioner selects all operators which are supported by Neuron.\n    :keyword int minimum_segment_size: A parameter used during partitioning.\n       This specifies the minimum number of graph nodes which should be compiled\n       into a Neuron graph (default= :code:`2`). If the number of nodes is smaller\n       than this size, the operations will run on CPU.\n    :keyword float single_fusion_ratio_threshold: A parameter used during\n        partitioning. During partitioning, if a single partition contains a\n        fraction of operations greater than this threshold, only one graph\n        partition will be compiled (default= :code:`0.6`). This is used to\n        avoid compiling many small Neuron graphs. To force compilation of all\n        graphs to Neuron (even when they are very small), a value of ``1.0``\n        can be used.\n    :keyword bool fallback: A function parameter to turn off graph partitioning.\n       Indicates whether to attempt to fall back to CPU operations if an\n       operation is not supported by Neuron. By default this is ``True``. If\n       this is set to ``False`` and an operation is not supported by Neuron,\n       this will fail compilation and raise an ``AttributeError``.\n    :keyword bool dynamic_batch_size: A flag to allow Neuron graphs to consume\n       variable sized batches of data. Dynamic sizing is restricted to the 0th\n       dimension of a tensor.\n    :keyword list optimizations: A list of :class:`~torch_neuron.Optimization`\n        passes to apply to the model.\n    :keyword bool separate_weights: A flag to enable compilation of models with \n        over 1.9GB of constant parameters. By default this flag is ``False``. \n        If this is set to ``True`` and the compiler version is not new enough \n        to support the flag, this will raise an ``NotImplementedError``.\n    :keyword \\*\\*kwargs: All other keyword arguments will be forwarded directly to\n       :func:`torch.jit.trace`. This supports flags like ``strict=False``\n       in order to allow dictionary outputs.\n\n    :returns: The traced :class:`~torch.jit.ScriptModule` with embedded\n       compiled neuron sub-graphs. Operations in this module will run on Neuron\n       unless they are not supported by Neuron or manually partitioned to run\n       on CPU.\n\n       Note that in ``torch<1.8`` This would return a\n       :class:`~torch.jit.ScriptFunction` if the input was function type.\n    :rtype: ~torch.jit.ScriptModule, ~torch.jit.ScriptFunction\n\n\n.. py:class:: torch_neuron.Optimization\n\n    A set of optimization passes that can be applied to the model.\n\n    .. py:attribute:: FLOAT32_TO_FLOAT16\n\n        A post-processing pass that converts all :attr:`torch.float32` tensors\n        to :attr:`torch.float16` tensors. The advantage to this\n        optimization pass is that input/output tensors will be type cast.\n        This reduces the amount of data that will be copied to and from\n        Inferentia hardware. 
The resulting traced model will accept both\n        :attr:`torch.float32` and :attr:`torch.float16` inputs where the\n        model used :attr:`torch.float32` inputs during tracing. It is only\n        beneficial to enable this optimization if the throughput of a\n        model is highly dependent upon data transfer speed. This optimization is\n        not recommended if the final application will use :attr:`torch.float32`\n        inputs since the :attr:`torch.float16` type cast will occur on CPU\n        during inference.\n\n\nExample Usage\n-------------\n\nFunction Compilation\n~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    def foo(x, y):\n        return 2 * x + y\n\n    # Run `foo` with the provided inputs and record the tensor operations\n    traced_foo = torch.neuron.trace(foo, (torch.rand(3), torch.rand(3)))\n\n    # `traced_foo` can now be run with the TorchScript interpreter or saved\n    # and loaded in a Python-free environment\n    torch.jit.save(traced_foo, 'foo.pt')\n    traced_foo = torch.jit.load('foo.pt')\n\nModule Compilation\n~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    import torch.nn as nn\n\n    class Net(nn.Module):\n        def __init__(self):\n            super(Net, self).__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n    n = Net()\n    n.eval()\n\n    inputs = torch.rand(1, 1, 3, 3)\n\n    # Trace a specific method and construct `ScriptModule` with\n    # a single `forward` method\n    neuron_forward = torch.neuron.trace(n.forward, inputs)\n\n    # Trace a module (implicitly traces `forward`) and constructs a\n    # `ScriptModule` with a single `forward` method\n    neuron_net = torch.neuron.trace(n, inputs)\n\nPre-Trained Model Compilation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nThe following is an example usage of the compilation Python API, with\ndefault compilation arguments, using a pretrained :class:`torch.nn.Module`:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image)\n\n\n.. _compiling-models-with-kwargs:\n\nCompiling models with torch.jit.trace kwargs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nThis example uses the :code:`strict=False` flag to compile a model with\ndictionary outputs. Similarly, any other keyword argument of\n:func:`torch.jit.trace` can be passed directly to\n:func:`torch_neuron.trace` so that it is passed to the underlying trace call.\n\n.. 
code-block:: python\n\n    import torch\n    import torch_neuron\n    import torch.nn as nn\n\n    class Model(nn.Module):\n        def __init__(self):\n            super(Model, self).__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            return {'conv': self.conv(x) + 1}\n\n    model = Model()\n    model.eval()\n\n    inputs = torch.rand(1, 1, 3, 3)\n\n    # use the strict=False kwarg to compile a model with dictionary outputs\n    # the model output format does not change\n    model_neuron = torch.neuron.trace(model, inputs, strict=False)\n\n\nDynamic Batching\n~~~~~~~~~~~~~~~~\nThis example uses the optional :code:`dynamic_batch_size` option in order to\nsupport variable sized batches at inference time.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input of batch size 1\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image, dynamic_batch_size=True)\n\n    # Execute with a batch of 7 images\n    batch = torch.rand([7, 3, 224, 224])\n    results = model_neuron(batch)\n\n\nManual Partitioning\n~~~~~~~~~~~~~~~~~~~\nThe following example uses the optional :code:`subgraph_builder_function`\nparameter to ensure that only a specific convolution layer is compiled to\nNeuron. The remaining operations are executed on CPU.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    import torch.nn as nn\n\n    class ExampleConvolutionLayer(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n    class Model(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.layer = ExampleConvolutionLayer()\n\n        def forward(self, x):\n            return self.layer(x) * 100\n\n    def subgraph_builder_function(node) -> bool:\n        \"\"\"Select if the node will be included in the Neuron graph\"\"\"\n\n        # Node names are tuples of Module names.\n        if 'ExampleConvolutionLayer' in node.name:\n            return True\n\n        # Ignore all operations not in the example convolution layer\n        return False\n\n    model = Model()\n    model.eval()\n\n    inputs = torch.rand(1, 1, 3, 3)\n\n    # Log output shows that `aten::_convolution` and `aten::add` are compiled\n    # but `aten::mul` is not. This will seamlessly switch between Neuron/CPU\n    # execution in a single graph.\n    neuron_model = torch_neuron.trace(\n        model,\n        inputs,\n        subgraph_builder_function=subgraph_builder_function\n    )\n\n\nSeparate Weights\n~~~~~~~~~~~~~~~~\nThis example uses the optional :code:`separate_weights` option in order to\nsupport compilation of models greater than 1.9GB.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    #the models' output format does not change\n    model_neuron = torch.neuron.trace(model, image, separate_weights=True)\n"
  },
  {
    "path": "archive/torch-neuron/api-core-placement.rst",
    "content": ".. _torch_core_placement_api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch Neuron (``torch-neuron``) Core Placement API\n=====================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. automodule:: placement\n    :module-name: torch_neuron.experimental\n    :members:\n\n\n"
  },
  {
    "path": "archive/torch-neuron/api-reference-guide-torch-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nAPI Reference Guide (``torch-neuron``)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    PyTorch Neuron trace Python API </archive/torch-neuron/api-compilation-python-api>\n    torch.neuron.DataParallel API </archive/torch-neuron/api-torch-neuron-dataparallel-api>\n    /archive/torch-neuron/api-core-placement\n\n\n.. include:: /archive/torch-neuron/api-reference-guide-torch-neuron.txt\n"
  },
  {
    "path": "archive/torch-neuron/api-reference-guide-torch-neuron.txt",
    "content": "* :ref:`PyTorch Neuron trace Python API <torch_neuron_trace_api>`\n* :ref:`torch.neuron.DataParallel API <api_torch_neuron_dataparallel_api>`\n* :ref:`torch_core_placement_api`"
  },
  {
    "path": "archive/torch-neuron/api-torch-neuron-dataparallel-api.rst",
    "content": ".. _api_torch_neuron_dataparallel_api:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\ntorch.neuron.DataParallel API\n=============================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe :func:`torch.neuron.DataParallel` Python API implements data parallelism on\n:class:`~torch.jit.ScriptModule` models created by the\n:ref:`torch_neuron_trace_api`.\nThis function is analogous to :class:`~torch.nn.DataParallel` in PyTorch.\nThe :ref:`torch-neuron-dataparallel-app-note` application note provides an\noverview of how :func:`torch.neuron.DataParallel` can be used to improve\nthe performance of inference workloads on Inferentia.\n\n.. py:function:: torch.neuron.DataParallel(model, device_ids=None, dim=0)\n\n    Applies data parallelism by replicating the model on\n    available NeuronCores and distributing data across the different\n    NeuronCores for parallelized inference.\n\n    By default, DataParallel will use all available NeuronCores\n    allocated for the current process for parallelism. DataParallel will\n    apply parallelism on ``dim=0`` if ``dim`` is not specified.\n\n    DataParallel automatically enables\n    :ref:`dynamic batching <dynamic_batching_description>` on\n    eligible models if ``dim=0``. Dynamic batching can be dsiabled using\n    :func:`torch.neuron.DataParallel.disable_dynamic_batching`.\n    If dynamic batching is not enabled, the batch size at compilation-time must\n    be equal to the batch size at inference-time divided by the number of\n    NeuronCores being used. Specifically, the following must be true when\n    dynamic batching is disabled:\n    ``input.shape[dim] / len(device_ids) == compilation_input.shape[dim]``.\n    DataParallel will throw a warning if dynamic batching cannot be enabled.\n\n    DataParallel will try load all of a model’s NEFFs onto\n    a single NeuronCore, only if all of the NEFFs can fit on a single\n    NeuronCore. DataParallel does not currently support models that\n    have been compiled with :ref:`neuroncore-pipeline`.\n\n    :func:`torch.neuron.DataParallel` requires PyTorch >= 1.8.\n\n    *Required Arguments*\n\n    :arg ~torch.jit.ScriptModule model: Model created by the\n        :ref:`torch_neuron_trace_api`\n        to be parallelized.\n\n    *Optional Arguments*\n\n    :arg list device_ids: List of :obj:`int` or ``'nc:#'`` that specify the\n        NeuronCores to use for parallelization (default: all NeuronCores).\n        Refer to the :ref:`device_ids note <device_ids_note>` for a description\n        of how ``device_ids`` indexing works.\n    :arg int dim: Dimension along which the input tensor is scattered across\n        NeuronCores (default ``dim=0``).\n\n    *Attributes*\n\n    :arg int num_workers: Number of worker threads used for\n        multithreaded inference (default: ``2 * number of NeuronCores``).\n    :arg int split_size: Size of the input chunks\n        (default: ``max(1, input.shape[dim] // number of NeuronCores)``).\n\n\n.. py:function:: torch.neuron.DataParallel.disable_dynamic_batching()\n\n    Disables automatic dynamic batching on the DataParallel module. 
See\n    :ref:`Dynamic batching disabled <dataparallel_example_disable_dynamic_batching_api>`\n    for example of how DataParallel can be used with dynamic batching disabled.\n    Use as follows:\n\n        >>> model_parallel = torch.neuron.DataParallel(model_neuron)\n        >>> model_parallel.disable_dynamic_batching()\n\n.. _device_ids_note:\n\n.. note::\n\n    ``device_ids`` uses per-process NeuronCore granularity and zero-based\n    indexing. Per-process granularity means that each Python process \"sees\"\n    its own view of the world. Specifically, this means that ``device_ids``\n    only \"sees\" the NeuronCores that are allocated for the current process.\n    Zero-based indexing means that each Python process will index its\n    allocated NeuronCores starting at 0, regardless of the \"global\" index of\n    the NeuronCores. Zero-based indexing makes it possible to redeploy the exact\n    same code unchanged in different process. This behavior is analogous to\n    the ``device_ids`` argument in the PyTorch\n    :class:`~torch.nn.DataParallel` function.\n\n    As an example, assume DataParallel is run on an inf1.6xlarge, which\n    contains four Inferentia chips each of which contains four NeuronCores:\n\n    * If ``NEURON_RT_VISIBLE_CORES`` is not set, a single process can access\n      all 16 NeuronCores. Thus specifying ``device_ids=[\"nc:0\"]`` will\n      correspond to chip0:core0 and ``device_ids=[\"nc:14\"]`` will correspond\n      to chip3:core2.\n\n    * However, if two processes are launched where: process 1 has\n      ``NEURON_RT_VISIBLE_CORES=0-6`` and process 2 has\n      ``NEURON_RT_VISIBLE_CORES=7-15``, ``device_ids=[\"nc:14\"]``\n      cannot be specified in either process. Instead, chip3:core2 can only be\n      accessed in process 2. Additionally, chip3:core2 is specified in process 2\n      with ``device_ids=[\"nc:7\"]``. Furthermore, in process 1,\n      ``device_ids=[\"nc:0\"]`` would correspond to chip0:core0; in process 2\n      ``device_ids=[\"nc:0\"]`` would correspond to chip1:core3.\n\n\nExamples\n--------\n\nThe following sections provide example usages of the\n:func:`torch.neuron.DataParallel` module.\n\nDefault usage\n^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-default.rst\n\nSpecifying NeuronCores\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst\n\nDataParallel with dim != 0\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst\n\n.. _dataparallel_example_disable_dynamic_batching_api:\n\nDynamic batching disabled\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst\n\nFull tutorial with torch.neuron.DataParallel\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor an end-to-end tutorial that uses DataParallel, see the\n:ref:`PyTorch Resnet Tutorial </src/examples/pytorch/resnet50.ipynb>`.\n"
  },
  {
    "path": "archive/torch-neuron/developer-guide-torch-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nDeveloper Guide (``torch-neuron``)\n==================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    Running Inference on Variable Input Shapes with Bucketing </about-neuron/appnotes/torch-neuron/bucketing-app-note>\n    Data Parallel Inference on PyTorch Neuron </about-neuron/appnotes/torch-neuron/torch-neuron-dataparallel-app-note>\n    /archive/torch-neuron/guides/torch-lstm-support\n    /archive/torch-neuron/guides/core-placement/torch-core-placement\n\n\n.. include:: /archive/torch-neuron/developer-guide-torch-neuron.txt\n"
  },
  {
    "path": "archive/torch-neuron/developer-guide-torch-neuron.txt",
    "content": "* :ref:`Running Inference on Variable Input Shapes with Bucketing <bucketing_app_note>`\n* :ref:`Data Parallel Inference on PyTorch Neuron <torch-neuron-dataparallel-app-note>`\n* :ref:`torch_neuron_lstm_support`\n* :ref:`torch_neuron_core_placement_guide`"
  },
  {
    "path": "archive/torch-neuron/guides/core-placement/torch-core-placement.rst",
    "content": ".. _torch_neuron_core_placement_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch Neuron (``torch-neuron``) Core Placement\n================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThis programming guide describes the available techniques and APIs to be able\nto allocate NeuronCores to a process and place models onto specific NeuronCores.\nIn order of precedence, the current recommendation is to use the following\nplacement techniques:\n\n1. For most regular models, default core placement should be used in\n   conjunction with ``NEURON_RT_NUM_CORES`` (:ref:`torch_placement_default`)\n2. For more specific core placement for NeuronCore Pipelined models, then\n   ``NEURONCORE_GROUP_SIZES`` should be used (:ref:`torch_placement_ncg`).\n3. Finally, for even more granular control, then the beta\n   explicit placement APIs may be used (:ref:`torch_placement_explicit`).\n\n.. contents:: Table of Contents\n    :depth: 3\n\nThe following guide will assume a machine with 8 NeuronCores:\n\n- NeuronCores will use the notation ``nc0``, ``nc1``, etc.\n- NeuronCore Groups will use the notation ``ncg0``, ``ncg1`` etc.\n- Models will use the notation ``m0``, ``m1`` etc.\n\nNeuronCores, NeuronCore Groups, and model allocations will be displayed in\nthe following format:\n\n.. raw:: html\n    :file: images/0-0-legend.svg\n\nNote that the actual cores that are visible to the process can be adjusted\naccording to the :ref:`nrt-configuration`.\n\nNeuronCore Pipeline\n-------------------\n\nA key concept to understand the intent behind certain core placement strategies\nis NeuronCore Pipelining (See :ref:`neuroncore-pipeline`). NeuronCore Pipelining\nallows a model to be automatically split into pieces and executed on different\nNeuronCores.\n\nFor most models only 1 NeuronCore will be required for execution. A model will\n**only** require more than one NeuronCore when using NeuronCore Pipeline.\nWhen model pipelining is enabled, the model is split between multiple\nNeuronCores and data is transferred between them. For example, if the compiler\nflag ``--neuroncore-pipeline-cores 4`` is used, this splits the model into\n4 pieces to be executed on 4 separate NeuronCores.\n\n.. _torch_placement_default:\n\nDefault Core Allocation & Placement\n-----------------------------------\n\nThe most basic requirement of an inference application is to be able to place a\nsingle model on a single NeuronCore. More complex applications may use multiple\nNeuronCores or even multiple processes each executing different models. The\nimportant thing to note about designing an inference application is that a\nsingle NeuronCore will always be allocated to a single process. *Processes do\nnot share NeuronCores*. Different configurations can be used to ensure that\nan application process has enough NeuronCores allocated to execute its model(s):\n\n- Default: A process will attempt to take ownership of **all NeuronCores**\n  visible on the instance. 
This should be used when an instance is only running\n  a single inference process since no other process will be allowed to take\n  ownership of any NeuronCores.\n- ``NEURON_RT_NUM_CORES``: Specify the **number of NeuronCores** to allocate\n  to the process. This places no restrictions on which NeuronCores will be used,\n  however, the resulting NeuronCores will always be contiguous. This should be\n  used in multi-process applications where each process should only use a subset\n  of NeuronCores.\n- ``NEURON_RT_VISIBLE_CORES``: Specifies exactly **which NeuronCores** are\n  allocated to the process by index. Similar to ``NEURON_RT_NUM_CORES``, this\n  can be used in multi-process applications where each process should only use a\n  subset of NeuronCores. This provides more fined-grained controls over the\n  exact NeuronCores that are allocated to a given process.\n- ``NEURONCORE_GROUP_SIZES``: Specifies a number of **NeuronCore Groups** which\n  are allocated to the process. This is described in more detail in the\n  :ref:`torch_placement_ncg` section.\n\nSee the :ref:`nrt-configuration` for more environment variable details.\n\nExample: Default\n^^^^^^^^^^^^^^^^\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1\n\n\n.. raw:: html\n    :file: images/0-1-default-2.svg\n\nWith no environment configuration, the process will take ownership of all\nNeuronCores. In this example, only two of the NeuronCores are used by the\nprocess and the remaining are allocated but left idle.\n\n\nExample: ``NEURON_RT_NUM_CORES``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES = '2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1\n\n.. raw:: html\n    :file: images/0-2-default-rt-num-cores.svg\n\nSince there is no other process on the instance, only the first 2 NeuronCores\nwill be acquired by the process. Models load in a simple linear order to the\nleast used NeuronCores.\n\n\nExample: ``NEURON_RT_VISIBLE_CORES``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_VISIBLE_CORES = '4-5'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc4\n    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc5\n\n\n.. raw:: html\n    :file: images/0-3-default-rt-visible-cores.svg\n\nUnlike ``NEURON_RT_NUM_CORES``, setting the visible NeuronCores allows the\nprocess to take control of a specific contiguous set. This allows an application\nto have a more fine-grained control of where models will be placed.\n\n\nExample: Overlapping Models\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_VISIBLE_CORES = '0-1'\n\n**Python Script**:\n\n.. 
code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1\n    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1\n\n.. raw:: html\n    :file: images/0-4-default-overlap-model-2.svg\n\n.. raw:: html\n    :file: images/0-4-default-overlap.svg\n\nThis shows how models may share NeuronCores, but the default model placement\nwill attempt to evenly distribute NeuronCore usage rather than overlapping all\nmodels on a single NeuronCore.\n\n\nExample: Multiple Processes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n    m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc1\n\n\nIn this example, if the script is run **twice**, the following allocations\nwill be made:\n\n.. raw:: html\n    :file: images/0-5-default-multiprocess.svg\n\nNote that each process will take ownership of as many NeuronCores as specified\nby the ``NEURON_RT_NUM_CORES`` configuration.\n\n
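As a sketch of how the two processes might be launched in practice (the\n``inference.py`` script name is a stand-in for the script above), each process\ncan be given its own environment so that the runtime allocates a separate pair\nof NeuronCores per process:\n\n.. code-block:: python\n\n    import os\n    import subprocess\n\n    # Launch the inference script twice; each process acquires 2 NeuronCores\n    env = dict(os.environ, NEURON_RT_NUM_CORES='2')\n    workers = [\n        subprocess.Popen(['python', 'inference.py'], env=env)\n        for _ in range(2)\n    ]\n    for worker in workers:\n        worker.wait()\n\n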
.. _torch_placement_ncg:\n\nNEURONCORE_GROUP_SIZES\n----------------------\n\n.. important::\n\n    Explicit core placement should only be used when a specific\n    performance goal is required. By default ``torch-neuron`` places models on\n    the **least used** NeuronCores. This should be optimal for most\n    applications.\n\n    Secondly, ``NEURONCORE_GROUP_SIZES`` will be deprecated in a future\n    release and should be avoided in favor of newer placement methods.\n    Use ``NEURON_RT_NUM_CORES`` or ``NEURON_RT_VISIBLE_CORES`` with default\n    placement if possible (See :ref:`torch_placement_default`).\n\n\nIn the current release of the Neuron SDK, the most well-supported method of placing\nmodels onto specific NeuronCores is to use the ``NEURONCORE_GROUP_SIZES``\nenvironment variable. This will define a set of \"NeuronCore Groups\" for the\napplication process.\n\nNeuronCore Groups are *contiguous sets of NeuronCores* that are allocated to\na given process. Creating groups allows an application to ensure that a\nmodel has a defined set of NeuronCores that will always be allocated to it.\n\nNote that NeuronCore Groups *can* be used to allocate non-pipelined models\n(those requiring exactly 1 NeuronCore) to specific NeuronCores but this is\nnot the primary intended use. The intended use of NeuronCore Groups is to\nensure pipelined models (those requiring >1 NeuronCore) have exclusive access\nto a specific set of contiguous NeuronCores.\n\nIn the cases where models are being used *without* NeuronCore Pipeline, the\ngeneral recommendation is to use default placement\n(See :ref:`torch_placement_default`).\n\nThe following section demonstrates how ``NEURONCORE_GROUP_SIZES`` can be used\nand the issues that may arise.\n\nExample: Single NeuronCore Group\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn the example where one model requires 4 NeuronCores, the correct environment\nconfiguration would be:\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURONCORE_GROUP_SIZES='4'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt')  # Loads to nc0-nc3\n\n\n.. raw:: html\n    :file: images/1-ncg-4.svg\n\nThis is the most basic usage of a NeuronCore Group. The environment setup\ncauses the process to take control of 4 NeuronCores and then the script loads\na model compiled with a NeuronCore Pipeline size of 4 to the first group.\n\n\nExample: Multiple NeuronCore Groups\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith more complicated configurations, the intended use of\n``NEURONCORE_GROUP_SIZES`` is to create 1 Group per model with the correct size\nto ensure that the models are placed on the intended NeuronCores. Similarly, the\nenvironment would need to be configured to create a NeuronCore Group for each\nmodel:\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURONCORE_GROUP_SIZES='3,4,1'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2\n    m1 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt')  # Loads to nc3-nc6\n    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc7\n\n\n\n\n.. raw:: html\n    :file: images/2-ncg-3-4-1.svg\n\n\nIssue: Overlapping Models with Differing Model Sizes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen multiple models are loaded to a single NeuronCore Group, this can cause\nunintended inefficiencies. A single model is only intended to span a single\nNeuronCore Group. Applications with many models of varying sizes can be\nrestricted by NeuronCore Group configurations since the optimal model\nlayout may require more fine-grained control.\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURONCORE_GROUP_SIZES='2,2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1\n    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3\n    m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n    m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc2\n    m4 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc0\n\n\n.. raw:: html\n    :file: images/3-models-m4-0-warning.svg\n\n.. raw:: html\n    :file: images/3-models-m2-0-m3-2.svg\n\n.. raw:: html\n    :file: images/3-ncg-2-2.svg\n\n\nHere ``NEURONCORE_GROUP_SIZES`` does not generate an optimal layout\nbecause placement strictly follows the layout of NeuronCore Groups. A\npotentially better layout would be to place ``m4`` onto ``nc1``. In this\ncase, since a pipelined model will not be able to have exclusive access to a set\nof NeuronCores, the default NeuronCore placement (no NeuronCore Groups\nspecified) would distribute the models more evenly.\n\nAlso note here that this is an example of where the order of model loads\naffects which model is assigned to which NeuronCore Group. If the order of the\nload statements is changed, models may be assigned to different NeuronCore\nGroups.\n\n\nIssue: Incompatible Model Sizes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAnother problem occurs when attempting to place a model which does not fit\nevenly into a single group:\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURONCORE_GROUP_SIZES='2,2'\n\n**Python Script**:\n\n.. 
code-block:: python\n\n    import torch\n    import torch_neuron\n\n    m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1\n    m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3\n    m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2\n\n\n.. raw:: html\n    :file: images/4-models-m2-0-2-warning.svg\n\n.. raw:: html\n    :file: images/3-ncg-2-2.svg\n\n\nThe model will be placed *across* NeuronCore Groups since there is no obvious\ngroup to assign the model to according to the environment variable\nconfiguration. Depending on the individual model and application requirements,\nthe placement here may not be optimal.\n\n\nIssue: Multiple Model Copies\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIt is common in inference serving applications to use multiple replicas of a\nsingle model across different NeuronCores. This allows the hardware to be fully\nutilized to maximize throughput. In this scenario, when using NeuronCore\nGroups, the only way to replicate a model on multiple NeuronCores is to create a\n*new model* object. In the example below, 4 model loads are performed to place\na model in each NeuronCore Group.\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURONCORE_GROUP_SIZES='2,2,2,2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    models = list()\n    for _ in range(4):\n        model = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')\n        models.append(model)\n\n\n.. raw:: html\n    :file: images/3-ncg-2-2-2-2-copies.svg\n\n\nThe largest consequence of this type of model allocation is that the application\ncode is responsible for routing inference requests to models. There are a\nvariety of ways to implement this switching, but in all cases the routing\nlogic needs to be implemented in the application code.\n\n
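For illustration, a minimal sketch of such routing logic (assuming the\n``models`` list from the script above) could simply cycle through the replicas\nin a round-robin fashion:\n\n.. code-block:: python\n\n    import itertools\n\n    # Round-robin iterator over the replicated model handles\n    replicas = itertools.cycle(models)\n\n    def infer(example):\n        # Dispatch each request to the next replica in turn\n        model = next(replicas)\n        return model(example)\n\n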
Issue Summary\n^^^^^^^^^^^^^\n\nThe use of ``NEURONCORE_GROUP_SIZES`` has the following problems:\n\n- **Variable Sized Models**: Models which require crossing NeuronCore Group\n  boundaries may be placed poorly. This means the group configuration limits\n  the size of the models that can be loaded.\n- **Model Load Order**: Models are loaded to NeuronCore Groups greedily. This\n  means that the order of model loads can negatively affect\n  application performance by causing unintentional overlap.\n- **Implicit Placement**: NeuronCore Groups cannot be explicitly chosen in the\n  application code.\n- **Manual Replication**: Loading multiple copies of a model to different\n  NeuronCore Groups requires that multiple model handles are used.\n\n\n.. _torch_placement_explicit:\n\nExplicit Core Placement\n-------------------------------------\n\nTo address the limitations of ``NEURONCORE_GROUP_SIZES``, a new set of APIs has\nbeen added which allows specific NeuronCores to be chosen by the application\ncode. These can be found in the :ref:`torch_neuron_core_placement_api` documentation.\n\n\nExample: Manual Core Selection\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe most direct usage of the placement APIs is to manually select the\nstart NeuronCore that each model is loaded to. This will automatically use as\nmany NeuronCores as are necessary for that model (1 for most models, >1 for\nNeuronCore Pipeline models).\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='4'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    # NOTE: Order of loads does NOT matter\n\n    with torch_neuron.experimental.neuron_cores_context(2):\n        m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc2-nc3\n\n    with torch_neuron.experimental.neuron_cores_context(0):\n        m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt')  # Loads to nc0-nc2\n\n    with torch_neuron.experimental.neuron_cores_context(0):\n        m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads to nc0-nc1\n\n    with torch_neuron.experimental.neuron_cores_context(3):\n        m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads to nc3\n\n\n.. raw:: html\n    :file: images/5-models-m2-0-2-m3-3.svg\n\n.. raw:: html\n    :file: images/5-placement.svg\n\n\nNote that this directly solves the ``NEURONCORE_GROUP_SIZES`` issues of:\n\n- **Variable Sized Models**: Since models are placed directly on the\n  NeuronCores requested by the application, there is no disconnect\n  between the model sizes and NeuronCore Group sizes.\n- **Model Load Order**: Since the NeuronCores are explicitly selected, there is\n  no need to be careful about the order in which models are loaded since they\n  can be placed deterministically regardless of the load order.\n- **Implicit Placement**: Similarly, explicit placement means there is no chance\n  that a model will end up being allocated to an incorrect NeuronCore Group.\n\n\nExample: Automatic Multicore\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing explicit core placement, it is possible to replicate a model to multiple\nNeuronCores simultaneously. This means that a single model object within Python\ncan utilize all available NeuronCores (or NeuronCores allocated to the process).\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='8'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    with torch_neuron.experimental.multicore_context():\n        m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt')  # Loads replications to nc0-nc7\n\n\n.. raw:: html\n    :file: images/6-multicore.svg\n\n\nThis addresses the last ``NEURONCORE_GROUP_SIZES`` issue of:\n\n- **Manual Replication**: Since models can be automatically replicated to\n  multiple NeuronCores, applications no longer need to implement\n  routing logic or perform multiple loads.\n\nA secondary benefit of this API is that the exact same loading logic can be used\non an ``inf1.xlarge`` or an ``inf1.6xlarge``. In either case, it will use all\nof the NeuronCores that are visible to the process. This means that no special\nlogic needs to be coded for different instance types.\n\n\nExample: Explicit Replication\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nReplication is also possible with the\n:func:`~torch_neuron.experimental.neuron_cores_context` API. The number of\nreplications is chosen by ``replications = floor(nc_count / cores_per_model)``.\n\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='8'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=4):\n        m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt')  # Loads replications to nc2-nc5\n\n\n.. raw:: html\n    :file: images/7-replication.svg\n"
  },
  {
    "path": "archive/torch-neuron/guides/torch-lstm-support.rst",
    "content": ".. _torch_neuron_lstm_support:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nDeveloper Guide - PyTorch Neuron (``torch-neuron``) |LSTM| Support\n==================================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nThe `torch-neuron` package can support |LSTM| operations and yield\nhigh performance on both fixed-length and variable-length sequences. Most\nnetwork configurations can be supported, with the exception of those that\nrequire |PackedSequence| usage outside of |LSTM| or |pad_packed_sequence|\noperations. Neuron must guarantee that the shapes can remain fixed throughout\nthe network.\n\nThe following sections describe which scenarios can and cannot be supported.\n\nSupported Usage\n---------------\n\nFixed-Length Sequences\n~~~~~~~~~~~~~~~~~~~~~~\n\nIn normal usage of an |LSTM|, the inputs and outputs are expected to be a fixed\nsize sequence length. This is the most basic usage of an |LSTM| but may not be\napplicable to applications where the input sequence length may vary.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs):\n            output, (ht, ct) = self.lstm(inputs)\n            return output, (ht, ct)\n\n    # Example Inputs\n    seq_len, batch_size, input_size = 5, 2, 3\n    inputs = torch.rand(seq_len, batch_size, input_size)\n\n    # Trace\n    torch_neuron.trace(Network(), (inputs,))\n\n\nPacked Input, Padded Output, *Pre-Sorted* Inputs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA common usage of an |LSTM| is when the input sequence sizes vary according\nto an input sequence lengths (such as tokens).\n\nFor example, the following sentences could result in two different\nsequence lengths after tokenization:\n\n.. code-block:: python\n\n    # Input\n    text = [\n       'Hello, sailor',\n       'Example',\n    ]\n\n    # ... Tokenization ...\n\n    # Result\n    tokens = [\n        [101, 7592, 1010, 11803, 102],\n        [101, 2742,  102,     0,   0],\n    ]\n    lengths = [5, 3]\n\nBecause the lengths are different, the final |LSTM| state will be dependent upon\nthe lengths of each sequence in the batch. Torch provides a way to deal with\nthese types of sequences by densely packing batches into a |PackedSequence|. The\nmost common way this is constructed is by using the |pack_padded_sequence|\nutility function prior to feeding inputs into the |LSTM|.\n\nPacking the above sequences would result in the following data and batch\nsize tensors.\n\n.. code-block:: python\n\n    data = [101, 101, 7592, 2742, 1010, 102, 11803, 102]\n    batch_sizes = [2, 2, 2, 1, 1]\n\n\nIn addition to correctly computing final |LSTM| state, using a packed\nsequence instead of a padded sequence also improves model performance on CPU.\nOn Neuron, where computation is fixed to the maximum length ahead of time,\n**this is does not improve performance**.\n\nWhen an |LSTM| is processing a |PackedSequence|, it must do so in a descending\nsorted length order. 
To ensure that sequences are sorted, |pack_padded_sequence|\nprovides an ``enforce_sorted`` flag. When ``enforce_sorted`` is ``True``, the\ninput is *already expected* to contain sequences sorted by length in a\ndecreasing order along the batch dimension. Note that this must be enforced in\nthe application-level code but is only relevant when batch size > 1.\n\nThe following network can compile successfully because the input and output\nto the network are guaranteed to be a fixed shape. The input shape is expected\nto be a padded tensor and the output tensor is expected to be padded to the\nmaximum sequence length using the |pad_packed_sequence| function call:\n\n.. code-block:: python\n    :emphasize-lines: 14\n\n    import torch\n    import torch_neuron\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs, lengths):\n            packed_input = torch.nn.utils.rnn.pack_padded_sequence(\n                inputs,\n                lengths=lengths,\n                enforce_sorted=True,\n            )\n            packed_result, (ht, ct) = self.lstm(packed_input)\n            padded_result, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_result)\n            return padded_result, ht, ct\n\n    # Example Inputs\n    seq_len, batch_size, input_size = 5, 2, 3\n    inputs = torch.rand(seq_len, batch_size, input_size)\n    lengths = torch.tensor([seq_len] * batch_size)\n\n    # Trace\n    torch_neuron.trace(Network(), (inputs, lengths))\n\n\nPacked Input, Padded Output, *Unsorted* Inputs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen ``enforce_sorted`` is ``False``, the input will be sorted unconditionally.\nThis causes some CPU overhead on Neuron because unsupported operators will be\ninserted into the graph such as ``aten::sort`` and ``aten::scatter_``. The\n``aten::lstm`` operation can still be supported, but it will be less efficient\nthan when ``enforce_sorted`` is ``True``.\n\nThe following code is able to be traced, but results in the sorting\noperations running on CPU. This is not problematic in this case because the\n``aten::sort`` and ``aten::scatter_`` are executed on CPU at the very beginning\nof the graph just prior to Neuron execution.\n\nLike the previous example, the call to |pad_packed_sequence| ensures that the\noutput is a fixed-shape based on the maximum sequence length.\n\n.. 
code-block:: python\n    :emphasize-lines: 14\n\n    import torch\n    import torch_neuron\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs, lengths):\n            packed_input = torch.nn.utils.rnn.pack_padded_sequence(\n                inputs,\n                lengths=lengths,\n                enforce_sorted=False,\n            )\n            packed_result, (ht, ct) = self.lstm(packed_input)\n            padded_result, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_result)\n            return padded_result, ht, ct\n\n    # Example Inputs\n    seq_len, batch_size, input_size = 5, 2, 3\n    inputs = torch.rand(seq_len, batch_size, input_size)\n    lengths = torch.tensor([seq_len] * batch_size)\n\n    # Trace\n    trace = torch_neuron.trace(Network(), (inputs, lengths))\n\n\nPacked Inputs, Final Hidden & Cell State Only\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen **only** the final |LSTM| hidden & cell state is used, it does not\nmatter if the inputs are packed or unpacked since these state\ntensors will not vary in size.\n\n.. code-block:: python\n    :emphasize-lines: 16,17\n\n    import torch\n    import torch_neuron\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs, lengths):\n            packed_input = torch.nn.utils.rnn.pack_padded_sequence(\n                inputs,\n                lengths=lengths,\n                enforce_sorted=True,\n            )\n            packed_output, (ht, ct) = self.lstm(packed_input)\n            return ht, ct\n\n    # Example Inputs\n    seq_len, batch_size, input_size = 5, 2, 3\n    inputs = torch.rand(seq_len, batch_size, input_size)\n    lengths = torch.tensor([seq_len] * batch_size)\n\n    # Trace\n    trace = torch_neuron.trace(Network(), (inputs, lengths))\n\nNote that when the ``packed_output`` is unused, it does not need to be passed\nto |pad_packed_sequence| for the |LSTM| to be compiled.\n\nUnsupported Usage\n-----------------\n\nNeuron does not support the use of a |PackedSequence| outside of the |LSTM|\noperation and the |pad_packed_sequence| operation. This is because the shape of\na |PackedSequence| can vary depending on the input data. This is incompatible\nwith the Neuron restriction that all tensor sizes must be known at compilation\ntime. When a |PackedSequence| is used only by an |LSTM| or |pad_packed_sequence|\noperation, Neuron *can guarantee* the size of the intermediary tensors by\npadding on behalf of the application.\n\nThis means that if the |PackedSequence| is either used by a different operation\nor returned from the network, either all of the |LSTM| operations will be\nexecuted on CPU or the network compilation will fail.\n\n\n|PackedSequence| Returned\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following is unsupported because the |PackedSequence| result of the |LSTM|\nis returned by the network:\n\n.. 
code-block:: python\n    :emphasize-lines: 14\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs, lengths):\n            packed_input = torch.nn.utils.rnn.pack_padded_sequence(\n                inputs,\n                lengths=lengths,\n                enforce_sorted=False,\n            )\n            packed_result, (ht, ct) = self.lstm(packed_input)\n            return packed_result.data, ht, ct\n\n\n**Behavior**: In this case, compilation fails and the following warning is\ngenerated:\n\n.. code-block:: text\n\n    Operator \"aten::lstm\" consuming a PackedSequence input can only be supported when its corresponding PackedSequence output is unused or unpacked using \"aten::_pad_packed_input\". Found usage by \"prim::Return\"\n\n\n**Resolution**: To avoid this error, the ``packed_result`` should be padded\nprior to being returned from the network by using |pad_packed_sequence|.\n\n\nInvalid |PackedSequence| Usage\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following is unsupported because the |PackedSequence| result of the |LSTM|\nis used by a non-LSTM operator:\n\n.. code-block:: python\n    :emphasize-lines: 14\n\n    class Network(torch.nn.Module):\n\n        def __init__(self):\n            super().__init__()\n            self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7)\n\n        def forward(self, inputs, lengths):\n            packed_input = torch.nn.utils.rnn.pack_padded_sequence(\n                inputs,\n                lengths=lengths,\n                enforce_sorted=False,\n            )\n            packed_result, (ht, ct) = self.lstm(packed_input)\n            return torch.max(packed_result.data)\n\n**Behavior**: In this case, compilation fails and the following warning is\ngenerated:\n\n.. code-block:: text\n\n    Operator \"aten::lstm\" consuming a PackedSequence input can only be supported when its corresponding PackedSequence output is unused or unpacked using \"aten::_pad_packed_input\". Found usage by \"aten::max\"\n\n**Resolution**: To avoid this error, the ``packed_result`` should be padded\nprior to being used in :func:`~torch.max` by using |pad_packed_sequence|.\n\n\n.. |LSTM| replace:: :class:`~torch.nn.LSTM`\n.. |PackedSequence| replace:: :class:`~torch.nn.utils.rnn.PackedSequence`\n.. |pack_padded_sequence| replace:: :func:`~torch.nn.utils.rnn.pack_padded_sequence`\n.. |pad_packed_sequence| replace:: :func:`~torch.nn.utils.rnn.pad_packed_sequence`\n"
  },
  {
    "path": "archive/torch-neuron/index.rst",
    "content": ".. _torch-neuron-main:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch Neuron (torch-neuron) — Archived\n==========================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer actively developed.\n   For new workloads, use TorchNeuron Native or torch-neuronx.\n   See :doc:`/frameworks/torch/index` for current PyTorch support.\n\nPyTorch Neuron (``torch-neuron``) was the original PyTorch integration for AWS Inferentia (Inf1) instances.\nThis package supported inference workloads on NeuronCores v1 architecture.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nAPI Reference\n-------------\n\n.. toctree::\n   :maxdepth: 1\n\n   api-reference-guide-torch-neuron\n   api-compilation-python-api\n   api-core-placement\n   api-torch-neuron-dataparallel-api\n\nDeveloper Guide\n---------------\n\n.. toctree::\n   :maxdepth: 1\n\n   developer-guide-torch-neuron\n   troubleshooting-guide\n\nTutorials\n---------\n\n.. toctree::\n   :maxdepth: 1\n\n   tutorials/tutorials-inference-torch-neuron\n\nSetup\n-----\n\n.. toctree::\n   :maxdepth: 1\n\n   setup/pytorch-install\n   setup/pytorch-update\n\nMisc\n----\n\n.. toctree::\n   :maxdepth: 1\n\n   additional-examples-inference-torch-neuron\n   misc-inference-torch-neuron\n"
  },
  {
    "path": "archive/torch-neuron/inference-torch-neuron.rst",
    "content": ".. _inference-torch-neuron:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-13\n\nInference with ``torch-neuron`` (Inf1)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer actively developed.\n   For new workloads, use TorchNeuron Native or torch-neuronx.\n   See :doc:`/frameworks/torch/index` for current PyTorch support.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Tutorials </archive/torch-neuron/tutorials/tutorials-inference-torch-neuron>\n    Additional Examples  </archive/torch-neuron/additional-examples-inference-torch-neuron>\n    API Reference Guide </archive/torch-neuron/api-reference-guide-torch-neuron>\n    Developer Guide   </archive/torch-neuron/developer-guide-torch-neuron>\n    Misc  </archive/torch-neuron/misc-inference-torch-neuron>\n\n\n.. card:: Setup  (``torch-neuron``)\n            :link: setup-torch-neuron\n            :link-type: ref\n            :class-body: sphinx-design-class-title-small\n\n\n.. dropdown:: Tutorials  (``torch-neuron``)\n\t:class-title: sphinx-design-class-title-small\n\t:animate: fade-in\n\t:name: torch-neuronx-training-tutorials\n\n\t.. include:: /archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt\n\n\n.. dropdown::  Additional Examples (``torch-neuron``)\n\t:class-title: sphinx-design-class-title-small\n\t:class-body: sphinx-design-class-body-small\n\t:animate: fade-in\n\t\n\t.. include:: /archive/torch-neuron/additional-examples-inference-torch-neuron.txt\n\n\n.. dropdown:: API Reference Guide (``torch-neuron``)\n\t:class-title: sphinx-design-class-title-small\n\t:animate: fade-in\n\n\t.. include:: /archive/torch-neuron/api-reference-guide-torch-neuron.txt\n\n\n.. dropdown:: Developer Guide (``torch-neuron``)\n\t:class-title: sphinx-design-class-title-small\n\t:animate: fade-in\n\n\t.. include:: /archive/torch-neuron/developer-guide-torch-neuron.txt\n\n.. dropdown:: Misc (``torch-neuron``)\n\t:class-title: sphinx-design-class-title-small\n\t:animate: fade-in\n\n\t* :ref:`neuron-cc-ops-pytorch`\n\t* :ref:`pytorch-neuron-inference-troubleshooting`\n\t* :ref:`pytorch-neuron-rn`\n"
  },
  {
    "path": "archive/torch-neuron/misc-inference-torch-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nMisc (``torch-neuron``)\n=======================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n\t:maxdepth: 1\n\t:hidden:\n\n\t/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch\n\t/archive/torch-neuron/troubleshooting-guide\n\t/release-notes/components/pytorch\n\n\n.. include:: /archive/torch-neuron/misc-inference-torch-neuron.txt\n"
  },
  {
    "path": "archive/torch-neuron/misc-inference-torch-neuron.txt",
    "content": "* :ref:`neuron-cc-ops-pytorch`\n* :ref:`pytorch-neuron-inference-troubleshooting`\n* :ref:`pytorch-neuron-rn`"
  },
  {
    "path": "archive/torch-neuron/placement.py",
    "content": "\"\"\"\n\n\n.. warning::\n\n    The following functionality is beta and **will not be supported** in\n    future releases of the Neuron SDK. This module serves only as a preview for\n    future functionality. In future releases, equivalent functionality may\n    be moved directly to the :code:`torch_neuron` module and will no longer be\n    available in the :code:`torch_neuron.experimental` module.\n\nFunctions which enable placement of :class:`torch.jit.ScriptModule` to specific\nNeuronCores. Two sets of functions are provided which can be used\ninterchangeably but have different performance characteristics and advantages:\n\n- The :func:`~torch_neuron.experimental.multicore_context` &\n  :func:`~torch_neuron.experimental.neuron_cores_context` functions are context\n  managers that allow a model to be placed on a given NeuronCore at\n  :func:`torch.jit.load` time. These functions are the most efficient way of\n  loading a model since the model is loaded directly to a NeuronCore. The\n  alternative functions described below require that a model is unloaded from\n  one core and then reloaded to another.\n- The :func:`~torch_neuron.experimental.set_multicore` &\n  :func:`~torch_neuron.experimental.set_neuron_cores` functions allow a model\n  that has already been loaded to a NeuronCore to be moved to a different\n  NeuronCore. This functionality is less efficient than directly loading a model\n  to a NeuronCore within a context manager but allows device placement to be\n  fully dynamic at runtime. This is analogous to the :meth:`torch.nn.Module.to`\n  function for device placement.\n\n.. important::\n\n    A prerequisite to enable placement functionality is that\n    the loaded :class:`torch.jit.ScriptModule` has already been compiled with\n    the :func:`torch_neuron.trace` API. Attempting to place a regular\n    :class:`torch.nn.Module` onto a NeuronCore prior to compilation will do\n    nothing.\n\"\"\"\nimport contextlib\n\n\ndef set_neuron_cores(trace: 'torch.jit.ScriptModule', start_nc: int=-1, nc_count: int=-1):\n    \"\"\"\n    Set the NeuronCore start/count for all Neuron subgraphs in a torch Module.\n\n    This will unload the model from an existing NeuronCore if it is already\n    loaded.\n\n    *Requires Torch 1.8+*\n\n    Arguments:\n        trace: A torch module which contains one or more Neuron subgraphs.\n        start_nc: The starting NeuronCore index where the Module is placed. The\n            value ``-1`` automatically loads to the optimal NeuronCore (least\n            used). Note that this index is always relative to NeuronCores\n            visible to this process.\n        nc_count: The number of NeuronCores to use. The value ``-1`` will load\n            a model to exactly the number of cores required by that model (1 for\n            most models, >1 when using NeuronCore Pipeline). If ``nc_count``\n            is greater than the number of NeuronCores required by the\n            model, the model will be replicated across multiple\n            NeuronCores. 
``(replications = floor(nc_count / cores_per_model))``\n\n    Raises:\n        RuntimeError: If the Neuron runtime cannot be initialized.\n        ValueError: If the ``nc_count`` is an invalid number of NeuronCores.\n\n    Examples:\n\n        *Single Load*: Move a model to the first visible NeuronCore after\n        loading.\n\n        >>> model = torch.jit.load('example_neuron_model.pt')\n        >>> torch_neuron.experimental.set_neuron_cores(model, start_nc=0, nc_count=1)\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 0\n\n        *Multiple Core Replication*: Replicate a model to 2 NeuronCores after\n        loading. This allows a single :class:`torch.jit.ScriptModule` to\n        use multiple NeuronCores by running round-robin executions.\n\n        >>> model = torch.jit.load('example_neuron_model.pt')\n        >>> torch_neuron.experimental.set_neuron_cores(model, start_nc=2, nc_count=2)\n        >>> model(example) # Executes on NeuronCore 2\n        >>> model(example) # Executes on NeuronCore 3\n        >>> model(example) # Executes on NeuronCore 2\n\n        *Multiple Model Load*: Move and pin 2 models to separate NeuronCores.\n        This causes each :class:`torch.jit.ScriptModule` to always execute on\n        a specific NeuronCore.\n\n        >>> model1 = torch.jit.load('example_neuron_model.pt')\n        >>> torch_neuron.experimental.set_neuron_cores(model1, start_nc=2)\n        >>> model2 = torch.jit.load('example_neuron_model.pt')\n        >>> torch_neuron.experimental.set_neuron_cores(model2, start_nc=0)\n        >>> model1(example) # Executes on NeuronCore 2\n        >>> model1(example) # Executes on NeuronCore 2\n        >>> model2(example) # Executes on NeuronCore 0\n        >>> model2(example) # Executes on NeuronCore 0\n    \"\"\"\n\n\ndef set_multicore(trace: 'torch.jit.ScriptModule'):\n    \"\"\"\n    Loads all Neuron subgraphs in a torch Module to all visible NeuronCores.\n\n    This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule`\n    to multiple NeuronCores without requiring multiple calls to\n    :func:`torch.jit.load`. This allows a single\n    :class:`torch.jit.ScriptModule` to use multiple NeuronCores for\n    concurrent threadsafe inferences. Executions use a round-robin strategy\n    to distribute across NeuronCores.\n\n    This will unload the model from an existing NeuronCore if it is already\n    loaded.\n\n    *Requires Torch 1.8+*\n\n    Arguments:\n        trace: A torch module which contains one or more Neuron subgraphs.\n\n    Raises:\n        RuntimeError: If the Neuron runtime cannot be initialized.\n\n    Examples:\n\n        *Multiple Core Replication*: Move a model across all visible\n        NeuronCores after loading. 
This allows a single\n        :class:`torch.jit.ScriptModule` to use all NeuronCores by\n        running round-robin executions.\n\n        >>> model = torch.jit.load('example_neuron_model.pt')\n        >>> torch_neuron.experimental.set_multicore(model)\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 1\n        >>> model(example) # Executes on NeuronCore 2\n    \"\"\"\n\n\n@contextlib.contextmanager\ndef neuron_cores_context(start_nc: int=-1, nc_count: int=-1):\n    \"\"\"\n    A context which sets the NeuronCore start/count for all Neuron subgraphs.\n\n    Any calls to :func:`torch.jit.load` will cause any underlying Neuron\n    subgraphs to load to the specified NeuronCores within this context.\n    This context manager only needs to be used during the model load.\n    After loading, inferences do not need to occur in this context in order\n    to use the correct NeuronCores.\n\n    Note that this context is *not* threadsafe. Using multiple core placement\n    contexts from multiple threads may not correctly place models.\n\n    Arguments:\n        start_nc: The starting NeuronCore index where the Module is placed. The\n            value ``-1`` automatically loads to the optimal NeuronCore (least\n            used). Note that this index is always relative to NeuronCores\n            visible to this process.\n        nc_count: The number of NeuronCores to use. The value ``-1`` will load\n            a model to exactly the number of cores required by that model (1 for\n            most models, >1 when using NeuronCore Pipeline). If ``nc_count``\n            is greater than the number of NeuronCores required by the\n            model, the model will be replicated across multiple\n            NeuronCores. ``(replications = floor(nc_count / cores_per_model))``\n\n    Raises:\n        RuntimeError: If the Neuron runtime cannot be initialized.\n        ValueError: If the ``nc_count`` is an invalid number of NeuronCores.\n\n    Examples:\n\n        *Single Load*: Directly load a model from disk to the first visible\n        NeuronCore.\n\n        >>> with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=1):\n        >>>     model = torch.jit.load('example_neuron_model.pt')\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 0\n\n        *Multiple Core Replication*: Directly load a model from disk to 2\n        NeuronCores. This allows a single :class:`torch.jit.ScriptModule` to\n        use multiple NeuronCores by running round-robin executions.\n\n        >>> with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=2):\n        >>>     model = torch.jit.load('example_neuron_model.pt')\n        >>> model(example) # Executes on NeuronCore 2\n        >>> model(example) # Executes on NeuronCore 3\n        >>> model(example) # Executes on NeuronCore 2\n\n        *Multiple Model Load*: Directly load 2 models from disk and pin them to\n        separate NeuronCores. 
This causes each :class:`torch.jit.ScriptModule`\n        to always execute on a specific NeuronCore.\n\n        >>> with torch_neuron.experimental.neuron_cores_context(start_nc=2):\n        >>>     model1 = torch.jit.load('example_neuron_model.pt')\n        >>> with torch_neuron.experimental.neuron_cores_context(start_nc=0):\n        >>>     model2 = torch.jit.load('example_neuron_model.pt')\n        >>> model1(example) # Executes on NeuronCore 2\n        >>> model1(example) # Executes on NeuronCore 2\n        >>> model2(example) # Executes on NeuronCore 0\n        >>> model2(example) # Executes on NeuronCore 0\n    \"\"\"\n\n\n@contextlib.contextmanager\ndef multicore_context():\n    \"\"\"\n    A context which loads all Neuron subgraphs to all visible NeuronCores.\n\n    This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule`\n    to multiple NeuronCores without requiring multiple calls to\n    :func:`torch.jit.load`. This allows a single\n    :class:`torch.jit.ScriptModule` to use multiple NeuronCores for\n    concurrent threadsafe inferences. Executions use a round-robin strategy\n    to distribute across NeuronCores.\n\n    Any calls to :func:`torch.jit.load` will cause any underlying Neuron\n    subgraphs to load to the specified NeuronCores within this context.\n    This context manager only needs to be used during the model load.\n    After loading, inferences do not need to occur in this context in order\n    to use the correct NeuronCores.\n\n    Note that this context is *not* threadsafe. Using multiple core placement\n    contexts from multiple threads may not correctly place models.\n\n    Raises:\n        RuntimeError: If the Neuron runtime cannot be initialized.\n\n    Examples:\n\n        *Multiple Core Replication*: Directly load a model to all visible\n        NeuronCores. This allows a single  :class:`torch.jit.ScriptModule`\n        to use all NeuronCores by running round-robin executions.\n\n        >>> with torch_neuron.experimental.multicore_context():\n        >>>     model = torch.jit.load('example_neuron_model.pt')\n        >>> model(example) # Executes on NeuronCore 0\n        >>> model(example) # Executes on NeuronCore 1\n        >>> model(example) # Executes on NeuronCore 2\n    \"\"\"\n"
  },
  {
    "path": "archive/torch-neuron/setup/index.rst",
    "content": ".. _setup-torch-neuron-archived:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nSetup Guide for Inf1\n====================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n.. toctree::\n\t:maxdepth: 1\n\n\tFresh install </archive/torch-neuron/setup/pytorch-install>\n\tUpdate to latest release </archive/torch-neuron/setup/pytorch-update>\n\tInstall previous releases </archive/torch-neuron/setup/pytorch-install-prev>\n\t/archive/torch-neuron/setup/pytorch-install-cxx11\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.14.2-pytorch-install.rst",
    "content": ".. _install-neuron-1.14.2-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.14.2)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents::\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.15.0-pytorch-install.rst",
    "content": ".. _install-neuron-1.15.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.15.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.15.1-pytorch-install.rst",
    "content": ".. _install-neuron-1.15.1-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.15.1)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.15.2-pytorch-install.rst",
    "content": ".. _install-neuron-1.15.2-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.15.2)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.16.1-pytorch-install.rst",
    "content": ".. _install-neuron-1.16.1-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.16.1)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.1\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.1 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.16.2-pytorch-install.rst",
    "content": ".. _install-neuron-1.16.2-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.16.2)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.2\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.2\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.2\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.2 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.16.3-pytorch-install.rst",
    "content": ".. _install-neuron-1.16.3-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.16.3)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.3\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.3\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.3\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.3\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.16.3 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.17.2-pytorch-install.rst",
    "content": ".. _install-neuron-1.17.2-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.17.2)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.17.2\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.17.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.17.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.17.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.17.2 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.18.0-pytorch-install.rst",
    "content": ".. _install-neuron-1.18.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.18.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.18.0\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.18.0\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.10.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.18.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.18.0\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.7.1\n\n\n   .. tab-item:: PyTorch 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.18.0 --framework-version=pytorch-1.5.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-1.19.0-pytorch-install.rst",
    "content": ".. _install-neuron-1.19.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 1.19.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.19.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.19.0\n\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.19.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.19.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=1.19.0 --framework-version=pytorch-1.7.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-2.3.0-pytorch-install.rst",
    "content": ".. _install-neuron-2.3.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 2.3.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.3.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.3.0\n\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. 
tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. 
tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.3.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.3.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.3.0 --framework-version=pytorch-1.7.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-2.4.0-pytorch-install.rst",
    "content": ".. _install-neuron-2.4.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 2.4.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.4.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.4.0\n\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. 
tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. 
tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.4.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.4.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.4.0 --framework-version=pytorch-1.7.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/prev-releases/neuron-2.5.0-pytorch-install.rst",
    "content": ".. _install-neuron-2.5.0-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (Neuron 2.5.0)\n======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 \n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 \n   \n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n\n\n \n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 \n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 \n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 \n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 \n\n   \n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 \n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 \n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 \n\n         .. 
tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 \n   \n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.11.0\n\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.10.2\n\n\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. 
tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.9.1\n\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.8.1\n\n\n\n   .. tab-item:: PyTorch 1.7.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. 
tab-item:: Amazon Linux AMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux  --neuron-version=2.5.0 --framework-version=pytorch-1.7.1\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-cxx11.rst",
    "content": ".. _pytorch-install-cxx11:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall with support for cxx11 ABI\n==================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. warning::\n\n    The intended user of this guide is using a custom built version of\n    ``torch`` or compiling a non-python application which must be built using\n    the cxx11 ABI.\n\n    *Most applications do not require this specialized distribution.*\n\n    For regular installation instructions see: :ref:`Fresh install <install-neuron-pytorch>`\n\nThe standard ``torch-neuron`` packages (which are normally installed according\nto the :ref:`Fresh install <install-neuron-pytorch>` guide) are compiled with\nthe pre-cxx11 ABI and linked against the pre-cxx11 ``libtorch``. These\ncompilation options ensure that the ``torch-neuron`` ABI matches the *publicly*\nreleased version of the ``torch`` package that is installed from the default\nPyPI index.\n\nTo support applications with specific ABI requirements, Neuron distributes\npackages which are linked against the cxx11 version of\n``libtorch``. These ``torch-neuron`` packages are built using the\n``-D_GLIBCXX_USE_CXX11_ABI=1`` compilation flag.\n\nThe only difference between these packages and the standard packages\nis the torch plugin library contained within the package. This is the\n``libtorchneuron.so`` library located in the ``torch_neuron/lib/`` package\ndirectory. All other libraries and python files within the packages are\nidentical. This means that these cxx11-compatible packages are drop-in\nreplacements in environments that are incompatible with the standard releases of\n``torch-neuron``. Behavior is identical whether compiling models or executing\ninferences.\n\nInstallation\n^^^^^^^^^^^^\n\nAll versions of the library are available to download from the following pip\nindex:\n\n::\n\n    https://pip.repos.neuron.amazonaws.com/cxx11\n\n\nTo install a wheel, it is recommended to use the ``--no-deps`` flag since\nversions of ``torch`` compiled using the cxx11 ABI are not distributed on this\nindex.\n\n::\n\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuron --no-deps\n\n\nSpecific versions of ``torch-neuron`` with cxx11 ABI support can be installed\njust like standard versions of ``torch-neuron``.\n\n::\n\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 \"torch-neuron>=1.8\" --no-deps\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 \"torch-neuron==1.9.1\" --no-deps\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 \"torch-neuron<1.10\" --no-deps\n\n.. important::\n\n    This pip index does not include a distribution of ``torch`` compiled with\n    the new cxx11 ABI. The intent of this index is *only* to provide Neuron SDK\n    wheels.\n\n    The version of ``torch`` that is distributed on the default PyPI index is\n    compiled with the old pre-cxx11 ABI.\n\n    If a cxx11 ``torch-neuron`` package is installed *with* dependencies\n    using the *default* PyPI index, then the installed version of ``torch`` will\n    be using the pre-cxx11 ABI and ``torch-neuron`` will be using the cxx11\n    ABI. 
This ABI mismatch will lead to errors in both python usage and at link\n    time for non-python applications.\n\nFAQ\n^^^\n\nWhen should I use a cxx11 torch-neuron wheel?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDistributions compiled with the new cxx11 ABI should only be used in the\nfollowing cases:\n\n    1. You have built your own version of ``torch`` which uses the new cxx11 ABI and\n       need a corresponding version of ``torch-neuron`` that is compatible.\n    2. You are compiling an application against a ``libtorch``\n       which uses the cxx11 ABI and would like to include\n       ``libtorchneuron.so`` as well. Torch distributes these cxx11 ``libtorch``\n       libraries with a ``libtorch-cxx11`` prefix.\n\n        Example:\n\n        ::\n\n            https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcpu.zip\n\n\nCan I download a library/header zip file similar to the torch distribution?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCurrently ``torch-neuron`` does not distribute a bundled library ``.zip`` with\nonly library/header files.\n\nThe recommended alternative when compiling ``libtorchneuron.so`` into a\nnon-python application is to install the ``torch-neuron`` wheel using ``pip``\naccording to the installation instructions. Then use the ``libtorchneuron.so``\nlibrary from within the python ``site-packages`` directory.\n\nA second alternative to isolate the package contents from a python environment\nis to download the wheel and unpack the contents:\n\n.. code:: bash\n\n    pip download --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuron --no-deps\n    wheel unpack torch_neuron-*.whl\n\nIf the exact version of the ``torch-neuron`` package is known and no\npython/pip is available in the build environment, an alternative is to fetch the\npackage file directly and ``unzip`` the wheel:\n\n.. code::\n\n    wget https://pip.repos.neuron.amazonaws.com/cxx11/torch-neuron/torch_neuron-<VERSION>-py3-none-any.whl\n    unzip torch_neuron-<VERSION>-py3-none-any.whl\n\n\n.. _pytorch-cxx11-versioning:\n\nHow can I know which ABI torch-neuron is using?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPackages which use the pre-cxx11 ABI have no local identifier and use the\nfollowing version scheme:\n\n::\n\n    <torch version>.<neuron version>\n\nPackages which use the cxx11 ABI have a ``+cxx11`` local identifier and use the\nfollowing version scheme:\n\n::\n\n    <torch version>.<neuron version>+cxx11\n\n\nThis allows the ABI to be validated by inspecting the local identifier\n(or version suffix).\n\nExample:\n::\n\n    1.8.1.0.0.0.0+cxx11\n    1.9.1.0.0.0.0+cxx11\n    1.10.2.0.0.0.0+cxx11\n\n\nHow can I know which ABI torch is using?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``torch`` python package provides an API that allows you to check if\nthe underlying ``libtorch`` was compiled with the cxx11 ABI:\n\n.. code:: python\n\n    import torch\n    torch.compiled_with_cxx11_abi()  # True/False\n\nCurrently ``torch-neuron`` does not have an equivalent API. If the cxx11 ABI was\nused, it will be visible in the version string (See :ref:`pytorch-cxx11-versioning`).\n\n
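A minimal sketch that combines the two checks above is shown below. This helper is\nnot part of the ``torch-neuron`` API; it assumes Python 3.8+ for\n``importlib.metadata`` and relies only on the ``+cxx11`` version suffix described\nin :ref:`pytorch-cxx11-versioning`:\n\n.. code:: python\n\n    import torch\n    from importlib.metadata import version\n\n    # True when the installed libtorch was built with the cxx11 ABI\n    torch_cxx11 = torch.compiled_with_cxx11_abi()\n\n    # cxx11 builds of torch-neuron carry a \"+cxx11\" local version identifier\n    neuron_cxx11 = version(\"torch-neuron\").endswith(\"+cxx11\")\n\n    assert torch_cxx11 == neuron_cxx11, \"torch and torch-neuron ABIs do not match\"\n\n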
\nTroubleshooting\n^^^^^^^^^^^^^^^\n\nWhat python errors could I see if I mix ABI versions?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUsing a version of ``torch`` compiled with the cxx11 ABI will trigger an error\nin the python interpreter when importing a version of ``torch-neuron`` using\nthe old (pre-cxx11) ABI from the standard index. This will manifest as an\nerror when the ``import torch_neuron`` statement is executed.\n\n::\n\n    Traceback (most recent call last):\n      File \"/python3.7/site-packages/torch_neuron/__init__.py\", line 64, in <module>\n        _register_extension()\n      File \"/python3.7/site-packages/torch_neuron/__init__.py\", line 60, in _register_extension\n        torch.ops.load_library(neuron_op_filename)\n      File \"/python3.7/site-packages/torch/_ops.py\", line 110, in load_library\n        ctypes.CDLL(path)\n      File \"/python3.7/ctypes/__init__.py\", line 364, in __init__\n        self._handle = _dlopen(self._name, mode)\n    OSError: /python3.7/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_\n\n\nSimilarly, using the standard pre-cxx11 version of ``torch`` with the cxx11\nversion of ``torch-neuron`` will also cause an error upon import.\n\n::\n\n    Traceback (most recent call last):\n      File \"/python3.7/site-packages/torch_neuron/__init__.py\", line 79, in <module>\n        _register_extension()\n      File \"/python3.7/site-packages/torch_neuron/__init__.py\", line 75, in _register_extension\n        torch.ops.load_library(neuron_op_filename)\n      File \"/python3.7/site-packages/torch/_ops.py\", line 110, in load_library\n        ctypes.CDLL(path)\n      File \"/python3.7/ctypes/__init__.py\", line 364, in __init__\n        self._handle = _dlopen(self._name, mode)\n    OSError: /python3.7/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE\n\n\nIn either of these cases, the remedy is to ensure that the ABI of the ``torch``\ndistribution matches the ABI of the ``torch-neuron`` distribution.\n\nWhat compiler/linking errors could I see if I mix ABI versions?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you link an application which uses the old (pre-cxx11) ABI\n``libtorchneuron.so`` with a cxx11 version of ``torch``, this will trigger a\nlink error.\n\n::\n\n    libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::string)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::string const&) const'\n    libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::string)'\n    libtorchneuron.so: undefined reference to `c10::DeviceTypeName(c10::DeviceType, bool)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::string const&)'\n    libtorchneuron.so: undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::string>()'\n    libtorchneuron.so: undefined 
reference to `c10::Warning::warn(c10::SourceLocation const&, std::string const&, bool)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(std::string, std::string, void const*)'\n    libtorchneuron.so: undefined reference to `c10::detail::infer_schema::make_function_schema(std::string&&, std::string&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)'\n    libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString(c10::FunctionSchema const&)'\n\n\nSimilarly, an error will also occur in the opposite scenario where the\ncxx11 ``libtorchneuron.so`` library is used with the pre-cxx11 ``libtorch``:\n\n::\n\n    libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void const*)'\n    libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString[abi:cxx11](c10::FunctionSchema const&)'\n    libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::type_info const&, std::type_info const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::DeviceTypeName[abi:cxx11](c10::DeviceType, bool)'\n    libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: 
undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >()'\n    libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'\n    libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'\n\n\nIn either of these cases, the remedy is to ensure that the ABI of the\n``libtorch`` distribution matches the ABI of the ``libtorchneuron.so``\ndistribution.\n\nThe ``torch`` ABI must match the ``torch-neuron`` ABI or an error will occur.\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-prev-al2.rst",
    "content": ".. _pytorch-neuron-install-prev-al2:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall Previous PyTorch Neuron Releases for Amazon Linux (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n   :maxdepth: 1\n\n\nThis section will assist you in installing previous Neuron releases.\n\n.. tab-set::\n\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.17.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.16.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-prev-al2023.rst",
    "content": "\n.. _pytorch-neuron-install-prev-al2023:\n\n.. Install previous PyTorch Neuron releases for Amazon Linux 2023 - archived\n\nUse the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-prev-u20.rst",
    "content": "\n.. _pytorch-neuron-install-prev-u20:\n\n.. Install previous PyTorch Neuron releases for Ubuntu 20.04 - archived\n\nUse the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-prev-u22.rst",
    "content": "\n.. _pytorch-neuron-install-prev-u22:\n\n.. Install previous PyTorch Neuron releases for Ubuntu 22.04 - archived\n\nUse the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install-prev.rst",
    "content": ".. _install-prev-neuron-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall previous PyTorch Neuron releases (``torch-neuron``)\n============================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. toctree::\n   :maxdepth: 1\n\n   Neuron 2.5.0 </archive/torch-neuron/setup/prev-releases/neuron-2.5.0-pytorch-install>\n   Neuron 2.4.0 </archive/torch-neuron/setup/prev-releases/neuron-2.4.0-pytorch-install>\n   Neuron 2.3.0 </archive/torch-neuron/setup/prev-releases/neuron-2.3.0-pytorch-install>\n   Neuron 1.19.0 </archive/torch-neuron/setup/prev-releases/neuron-1.19.0-pytorch-install>\n   Neuron 1.18.0 </archive/torch-neuron/setup/prev-releases/neuron-1.18.0-pytorch-install>\n   Neuron 1.17.2 </archive/torch-neuron/setup/prev-releases/neuron-1.17.2-pytorch-install>\n   Neuron 1.16.3 </archive/torch-neuron/setup/prev-releases/neuron-1.16.3-pytorch-install>\n   Neuron 1.16.2 </archive/torch-neuron/setup/prev-releases/neuron-1.16.2-pytorch-install>\n   Neuron 1.16.1 </archive/torch-neuron/setup/prev-releases/neuron-1.16.1-pytorch-install>\n   Neuron 1.15.2 </archive/torch-neuron/setup/prev-releases/neuron-1.15.2-pytorch-install>\n   Neuron 1.15.1 </archive/torch-neuron/setup/prev-releases/neuron-1.15.1-pytorch-install>\n   Neuron 1.15.0 </archive/torch-neuron/setup/prev-releases/neuron-1.15.0-pytorch-install>\n   Neuron 1.14.2 </archive/torch-neuron/setup/prev-releases/neuron-1.14.2-pytorch-install>\n\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-install.rst",
    "content": ".. _install-neuron-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nInstall PyTorch Neuron (``torch-neuron``)\n=========================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. 
tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.12.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.10.2\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update-al2-dlami.rst",
    "content": ".. _pytorch-neuron-al2-update:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest PyTorch Neuron (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provides instructions to help you update to the latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=dlami-framework\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update-al2023.rst",
    "content": "\n.. _pytorch-neuron-al2023-update:\n\n.. Update PyTorch Neuron (torch-neuron) on Amazon Linux 2023 - archived\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update-u20-dlami.rst",
    "content": "\n.. _pytorch-neuron-u20-update:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest PyTorch Neuron  (``torch-neuron``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nIf you already have a previous Neuron release installed, this section provide links that will assist you to update to latest Neuron release.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=dlami-framework\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update-u20.rst",
    "content": "\n.. _pytorch-neuron-u20-update:\n\n.. Update PyTorch Neuron (torch-neuron) on Ubuntu 20.04 - archived\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update-u22.rst",
    "content": "\n.. _pytorch-neuron-u22-update:\n\n.. Update PyTorch Neuron (torch-neuron) on Ubuntu 22.04 - archived\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /setup/install-templates/inf1/note-setup-general.rst\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/setup/pytorch-update.rst",
    "content": ".. _update-neuron-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUpdate to latest PyTorch Neuron (``torch-neuron``)\n==================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. include:: /setup/install-templates/inf1/note-setup-cntr.rst\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/develop_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. tab-set::\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nCompile on compute instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/compile_mode.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n\nDeploy on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /setup/install-templates/inf1/deploy_mode.rst\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.13.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 DLAMI Base\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n         .. tab-item:: Amazon Linux 2 DLAMI Base\n\n            .. 
include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n"
  },
  {
    "path": "archive/torch-neuron/torch-neuron-dataparallel-example-default.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nThe default DataParallel use mode will replicate the model\non all available NeuronCores in the current process. The inputs will be split\non ``dim=0``.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image)\n\n    # Create the DataParallel module\n    model_parallel = torch.neuron.DataParallel(model_neuron)\n\n    # Create a batched input\n    batch_size = 5\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)\n"
  },
  {
    "path": "archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nIn this example we run DataParallel inference using four NeuronCores and\n``dim = 2``. Because ``dim != 0``, dynamic batching is not enabled.\nConsequently, the DataParallel inference-time batch size must be four times the\ncompile-time batch size. DataParallel will generate a warning that dynamic\nbatching is disabled because ``dim != 0``.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n\n    # Create an example model\n    class Model(torch.nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.conv = torch.nn.Conv2d(3, 3, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n    model = Model()\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 8, 8])\n    model_neuron = torch.neuron.trace(model, image)\n\n    # Create the DataParallel module using 4 NeuronCores and dim = 2\n    model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2, 3], dim=2)\n\n    # Create a batched input\n    # Note that image_batched.shape[dim] / len(device_ids) == image.shape[dim]\n    batch_size = 4 * 8\n    image_batched = torch.rand([1, 3, batch_size, 8])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)\n"
  },
  {
    "path": "archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nIn the following example, we use\n:func:`torch.neuron.DataParallel.disable_dynamic_batching` to disable dynamic\nbatching. We provide an example of a batch size that will not work when dynamic\nbatching is disabled as well as an example of a batch size that does work when\ndynamic batching is disabled.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image)\n\n    # Create the DataParallel module and use 4 NeuronCores\n    model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2, 3], dim=0)\n\n    # Disable dynamic batching\n    model_parallel.disable_dynamic_batching()\n\n    # Create a batched input (this won't work)\n    batch_size = 8\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # This will fail because dynamic batching is disabled and\n    # image_batched.shape[dim] / len(device_ids) != image.shape[dim]\n    # output = model_parallel(image_batched)\n\n    # Create a batched input (this will work)\n    batch_size = 4\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # This will work because\n    # image_batched.shape[dim] / len(device_ids) == image.shape[dim]\n    output = model_parallel(image_batched)\n"
  },
  {
    "path": "archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nIn the following example, we use the :func:`torch.neuron.DataParallel` module\nto run inference using several different batch sizes without recompiling the\nNeuron model.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image)\n\n    # Create the DataParallel module\n    model_parallel = torch.neuron.DataParallel(model_neuron)\n\n    # Create batched inputs and run inference on the same model\n    batch_sizes = [2, 3, 4, 5, 6]\n    for batch_size in batch_sizes:\n        image_batched = torch.rand([batch_size, 3, 224, 224])\n\n        # Run inference with a batched input\n        output = model_parallel(image_batched)\n"
  },
  {
    "path": "archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\nThe following example uses the ``device_ids`` argument to use the first three\nNeuronCores for DataParallel inference.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch.neuron.trace(model, image)\n\n    # Create the DataParallel module, run on the first three NeuronCores\n    # Equivalent to model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2])\n    model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=['nc:0', 'nc:1', 'nc:2'])\n\n    # Create a batched input\n    batch_size = 5\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)\n"
  },
  {
    "path": "archive/torch-neuron/troubleshooting-guide.rst",
    "content": ".. _pytorch-neuron-inference-troubleshooting:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTroubleshooting Guide for PyTorch Neuron (``torch-neuron``)\n===========================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\nPatching PyTorch version 1.13 for CVEs\n--------------------------------------\n\nPyTorch version 1.13 has the following CVEs:\n- CVE-2025-32434\n- CVE-2024-31580\n- CVE-2024-31583\n\nTo patch PyTorch version 1.13, run the following on a CPU instance with Ubuntu 22 AMI (it takes 30 minutes on a c5.4xlarge):\n\n::\n\n    git clone --recursive https://github.com/pytorch/pytorch -b v1.13.1\n    cd pytorch\n    git cherry-pick b5c3a17c2c207ebefcb85043f0cf94be9b2fef81\n    git cherry-pick 9c7071b0e324f9fb68ab881283d6b8d388a4bcd2\n    wget https://github.com/user-attachments/files/22013116/patch_v113.txt\n    git apply patch_v113.txt\n\nTo build the pip wheel, see `build steps <https://github.com/pytorch/pytorch/tree/v1.13.1?tab=readme-ov-file#from-source>`_. A condensed version is provided below.\n\nInstall Miniconda by following `installation steps <https://www.anaconda.com/docs/getting-started/miniconda/install#linux-2>`_ and run the following commands:\n\n::\n\n    source ~/miniconda3/bin/activate\n    conda create --name conda_py39 python=3.9\n    conda activate conda_py39\n    conda install astunparse numpy==1.19.5 ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses\n    conda install mkl mkl-include# CUDA only: Add LAPACK support for the GPU if needed\n    conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo\n    sudo apt install cmake g++\n\n    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-\"$(dirname $(which conda))/../\"}\n    PYTORCH_BUILD_VERSION=1.13.2 PYTORCH_BUILD_NUMBER=1 python setup.py bdist_wheel\n    # the PyTorch pip wheel will be in dist directory\n\n\nGeneral Torch-Neuron issues\n---------------------------\n\nIf you see an error about \"Unknown builtin op: neuron::forward_1\" like below, please ensure that import line \"import torch_neuron\" (to register the Neuron custom operation) is in the inference script before using torch.jit.load.\n\n::\n\n   Unknown builtin op: neuron::forward_1.\n   Could not find any similar ops to neuron::forward_1. This op may not exist or may not be currently supported in TorchScript.\n\n\nTorchVision related issues\n--------------------------\n\nIf you encounter an error like below, it is because latest torchvision\nversion >= 0.7 is not compatible with Torch-Neuron 1.5.1. Please\ndowngrade torchvision to version 0.6.1:\n\n::\n\n   E   AttributeError: module 'torch.jit' has no attribute '_script_if_tracing'                                                                                      \n\n\n2GB protobuf limit related issues\n---------------------------------\n\nIf you encounter an error like below, it is because the model size is larger than 2GB.\nTo compile such large models, use the :ref:`separate_weights=True <torch_neuron_trace_api>` flag. 
Note,\nensure that you have the latest version of compiler installed to support this flag.\nYou can upgrade neuron-cc using \n:code:`python3 -m pip install neuron-cc[tensorflow] -U --force --extra-index-url=https://pip.repos.neuron.amazonaws.com`\n\n::\n\n   E google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'\n\n\n\n\ntorch.jit.trace issues\n----------------------\nThe :doc:`Trace API </archive/torch-neuron/api-compilation-python-api>`\nuses the PyTorch :func:`torch.jit.trace` function to generate\n:class:`~torch.jit.ScriptModule` models for execution on Inferentia. Due to that,\nto execute your PyTorch model on Inferentia it must be torch-jit-traceable,\notherwise you need to make sure your model is torch-jit-traceable. You can try\nmodifying your underlying PyTorch model code to make it traceable. If it's not\npossible to change your model code, you can :ref:`write a wrapper around your\nmodel <wrapping-non-traceable-models>` that makes it torch-jit-traceable to\ncompile it for Inferentia.\n\nPlease visit :func:`torch.jit.trace` to review the properties that a model must\nhave to be torch-jit-traceable. The PyTorch-Neuron trace API\n:func:`torch_neuron.trace` accepts :code:`**kwargs` for :func:`torch.jit.trace`.\nFor example, you can use the :code:`strict=False` flag to\n:ref:`compile models with dictionary outputs <compiling-models-with-kwargs>`.\n\n\n.. _wrapping-non-traceable-models:\n\nCompiling models with outputs that are not torch-jit-traceable\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nTo enable compilation of models with non torch-jit-traceable outputs, you can\nuse a technique that involves writing a wrapper that converts the model's\noutput into a form that is torch-jit-traceable. You can then compile the\nwrapped model for Inferentia using :func:`torch_neuron.trace`.\n\n\nThe following example uses a wrapper to compile a model with non\ntorch-jit-traceable outputs. This model cannot be compiled for Inferentia in\nits current form because it outputs a list of tuples and tensors, which is not\ntorch-jit-traceable.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    import torch.nn as nn\n\n    class Model(nn.Module):\n        def __init__(self):\n            super(Model, self).__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            a = self.conv(x) + 1\n            b = self.conv(x) + 2\n            c = self.conv(x) + 3\n            # An output that is a list of tuples and tensors is not torch-traceable\n            return [(a, b), c]\n\n    model = Model()\n    model.eval()\n\n    inputs = torch.rand(1, 1, 3, 3)\n\n    # Try to compile the model\n    model_neuron = torch.neuron.trace(model, inputs) # ERROR: This cannot be traced, we must change the output format\n\n\nTo compile this model for Inferentia, we can write a wrapper around the model\nto convert its outputs into a tuple of tensors, which is torch-jit-traceable.\n\n.. code-block:: python\n\n    class NeuronCompatibilityWrapper(nn.Module):\n        def __init__(self):\n            super(NeuronCompatibilityWrapper, self).__init__()\n            self.model = Model()\n\n        def forward(self, x):\n            out = self.model(x)\n            # An output that is a tuple of tuples and tensors is torch-jit-traceable\n            return tuple(out)\n\nNow, we can successfully compile the model for Inferentia using the\n:code:`NeuronCompatibilityWrapper` wrapper as follows:\n\n.. 
code-block:: python\n\n    model = NeuronCompatibilityWrapper()\n    model.eval()\n\n    # Compile the traceable wrapped model\n    model_neuron = torch.neuron.trace(model, inputs)\n\nIf the model's outputs must be in the original form, a second wrapper can be\nused to transform the outputs after compilation for Inferentia. The following\nexample uses the :code:`OutputFormatWrapper` wrapper to convert the compiled\nmodel's output back into the original form of a list of tuples and tensors.\n\n.. code-block:: python\n\n    class OutputFormatWrapper(nn.Module):\n        def __init__(self):\n            super(OutputFormatWrapper, self).__init__()\n            self.traceable_model = NeuronCompatibilityWrapper()\n\n        def forward(self, x):\n            out = self.traceable_model(x)\n            # Return the output in the original format of Model()\n            return list(out)\n\n    model = OutputFormatWrapper()\n    model.eval()\n\n    # Compile the traceable wrapped model\n    model.traceable_model = torch.neuron.trace(model.traceable_model, inputs)\n\n\nCompiling a submodule in a model that is not torch-jit-traceable\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following example shows how to compile a submodule that is part of a non\ntorch-jit-traceable model. In this example, the top-level model :code:`Outer`\nuses a dynamic flag, which is not torch-jit-traceable. However, the\nsubmodule :code:`Inner` is torch-jit-traceable and can be compiled for\nInferentia.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuron\n    import torch.nn as nn\n\n    class Inner(nn.Module) :\n        def __init__(self):\n            super().__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n\n    class Outer(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.inner = Inner()\n\n        def forward(self, x, add_offset: bool = False):\n            base = self.inner(x)\n            if add_offset:\n                return base + 1\n            return base\n\n    model = Outer()\n    inputs = torch.rand(1, 1, 3, 3)\n\n    # Compile the traceable wrapped submodule\n    model.inner = torch.neuron.trace(model.inner, inputs)\n\n    # TorchScript the model for serialization\n    script = torch.jit.script(model)\n    torch.jit.save(script, 'model.pt')\n\n    loaded = torch.jit.load('model.pt')\n\nAlternatively, for usage scenarios in which the model configuration is static\nduring inference, the dynamic flags can be hardcoded in a wrapper to make\nthe model torch-jit-traceable and enable compiling the entire model for Inferentia.\nIn this example, we assume the :code:`add_offset` flag is always\n:code:`True` during inference, so we can hardcode this conditional path in the\n:code:`Static` wrapper to remove the dynmaic behavior and compile the entire\nmodel for Inferentia.\n\n.. code-block:: python\n\n    class Static(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.outer = Outer()\n\n        def forward(self, x):\n            # hardcode `add_offset=True`\n            output = self.outer(x, add_offset=True)\n            return output\n\n    model = Static()\n\n    # We can now compile the entire model because `add_offset=True` is hardcoded in the Static wrapper\n    model_neuron = torch.neuron.trace(model, inputs)\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/neuroncore_pipeline_pytorch.rst",
    "content": ".. _pytorch-tutorials-neuroncore-pipeline-pytorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUsing NeuronCore Pipeline with PyTorch Tutorial\n================================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\n\nOverview\n--------\n\nIn this tutorial we will benchmark latency of a Hugging Face Transformers model deployed in model pipeline paralle mode using the NeuronCore Pipeline feature. We will compare the results with the usual data parallel (multi-worker) deployment. We compile a pretrained BERT base model and run the benchmarking locally.\n\nTo enable faster enviroment setup, We will run both compilation and deployment (inference) on an single inf1.6xlarge instance. You can take similar steps to recreate the benchmark on other instance sizes, such as inf1.xlarge.\n\nIf you already have an Inf1 instance environment ready, this tutorial is availabe as a Jupyter notebook at :pytorch-neuron-src:`neuroncore_pipeline_pytorch.ipynb <pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>` and instructions can be viewed at: \n\n.. toctree::\n   :maxdepth: 1\n\n   /src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb\n\nInstructions of how to setup the environment and run the tutorial are available in the next sections.\n\n.. _pytorch-neuroncore-pipeline-pytorch-env-setup:\n\nSetup The Environment \n---------------------\n\nLaunch an Inf1 instance by following the below steps, please make sure to choose an inf1.6xlarge instance.\n\n.. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n\n\n.. _pytorch-neuroncore-pipeline-pytorch-run-tutorial:\n\nRun The Tutorial\n----------------\n\nAfter connecting to the instance from the terminal, clone the Neuron Github repository to the EC2 instance and then change the working directory to the tutorial directory:\n\n.. code::\n\n  git clone https://github.com/aws/aws-neuron-sdk.git\n  cd aws-neuron-sdk/src/examples/pytorch\n  \n\n\nThe Jupyter notebook is available as a file with the name :pytorch-neuron-src:`neuroncore_pipeline_pytorch.ipynb <pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`, you can either run the Jupyter notebook from a browser or run it as a script from terminal:\n\n\n* **Running tutorial from browser**\n\n  * First setup and launch the Jupyter notebook on your local browser by following instructions at :ref:`Running Jupyter Notebook Browser`\n  * Open the Jupyter notebook from the menu and follow the instructions\n  \n  \nYou can also view the Jupyter notebook at:\n\n.. toctree::\n   :maxdepth: 1\n\n   /src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb\n\n\n.. _pytorch-neuroncore-pipeline-pytorch-cleanup-instances:\n\nClean up your instance/s\n------------------------\n\nAfter you've finished with the instance/s that you created for this tutorial, you should clean up by terminating the instance/s, please follow instructions at `Clean up your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-clean-up-your-instance>`_.\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/pytorch-tutorial-setup.rst",
    "content": ".. _pytorch-tutorial-setup:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch Tutorial Setup\n======================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n#. Launch an Inf1.6xlarge Instance:\n    .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n\n#. Set up a development environment:\n    * Enable or install PyTorch-Neuron: :ref:`install-neuron-pytorch`.\n      \n\n#. Run tutorial in Jupyter notebook:\n    * Follow instruction at :ref:`Setup Jupyter notebook <setup-jupyter-notebook-steps-troubleshooting>` to:\n    \n      #. Start the Jupyter Notebook on the instance\n      #. Run the Jupyter Notebook from your local browser\n\n    * Connect to the instance from the terminal, clone the Neuron Github repository to the Inf1 instance and then change the working directory to the tutorial directory:\n\n      .. code::\n\n        git clone https://github.com/aws/aws-neuron-sdk.git\n        cd aws-neuron-sdk/src/examples/pytorch\n\n    * Locate the tutorial notebook file (.ipynb file) under ``aws-neuron-sdk/src/examples/pytorch``\n    * From your local browser, open the tutorial notebook from the menu and follow the instructions.\n\n    \n"
  },
  {
    "path": "archive/torch-neuron/tutorials/transformers-marianmt.rst",
    "content": ".. _pytorch-tutorials-marianmt:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nPyTorch HuggingFace MarianMT Tutorial\n=====================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nIn this tutorial you will compile and deploy the `HuggingFace MarianMT <https://huggingface.co/transformers/v4.0.1/model_doc/marian.html>`_ model for sequence-to-seqeunce language translation on an Inf1 instance.\n\nTo enable faster environment setup, you will run the tutorial on an inf1.6xlarge instance to enable both compilation and deployment (inference) on the same instance.\n\nIn a production environment we encourage you to try different instance sizes to optimize to your specific deployment needs.\n\nIf you have already launched an Inf1 instance and have Neuron pytorch DLAMI environment ready, tutorial is available as a Jupyter notebook at :pytorch-neuron-src:`transformers-marianmt.ipynb <transformers-marianmt.ipynb>` and instructions can be viewed at:\n\n.. toctree::\n   :maxdepth: 1\n\n   /src/examples/pytorch/transformers-marianmt.ipynb\n\nInstructions of how to setup Neuron pytorch environment and run the tutorial as a Jupyter notebook are available in the next sections.\n\n.. _pytorch-marianmt-env-setup:\n\nSetup The Environment\n---------------------\n\nLaunch an Inf1 instance by following the below steps, please make sure to choose an inf1.6xlarge instance.\n\n.. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n\n\n.. _pytorch-marianmt-run-tutorial:\n\nRun The Tutorial\n----------------\n\nAfter connecting to the instance from the terminal, clone the Neuron Github repository to the EC2 instance and then change the working directory to the tutorial directory:\n\n.. code::\n\n  git clone https://github.com/aws/aws-neuron-sdk.git\n  cd aws-neuron-sdk/src/examples/pytorch\n\nThe Jupyter notebook is available as a file with the name :pytorch-neuron-src:`transformers-marianmt.ipynb <transformers-marianmt.ipynb>` that you can run from browser:\n\n* **Running tutorial from browser**\n\n  * First setup and launch the Jupyter notebook on your local browser by following instructions at :ref:`Running Jupyter Notebook Browser`\n  * Open the Jupyter notebook from the menu and follow the instructions\n\nYou can also view the Jupyter notebook at:\n\n.. toctree::\n   :maxdepth: 1\n\n   /src/examples/pytorch/transformers-marianmt.ipynb\n\n\n.. _marianmt-cleanup-instances:\n\nClean up your instance/s\n------------------------\n\nAfter you've finished with the instance/s that you created for this tutorial, you should clean up by terminating the instance/s, please follow instructions at `Clean up your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-clean-up-your-instance>`_.\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorial-libtorch.rst",
    "content": ".. _pytorch-tutorials-libtorch:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nLibTorch C++ Tutorial\n=========================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nThis tutorial demonstrates the use of `LibTorch <https://pytorch.org/cppdocs/installing.html>`_ with Neuron, the SDK for Amazon Inf1, Inf2 and Trn1 instances. By the end of this tutorial, you will understand how to write a native C++ application that performs inference on EC2 Inf1, Inf2 and Trn1 instances. We will use an inf1.6xlarge and a pretrained BERT-Base model to determine if one sentence is a paraphrase of another.\n\nVerify that this tutorial is running in a virtual environement that was set up according to the `Torch-Neuronx Installation Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>` or `Torch-Neuron Installation Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron.html#setup-torch-neuron>`\n\nNotes\n-----\n\nThe tutorial has been tested on Inf1, Inf2 and Trn1 instances on ubuntu instances.\n\n\nRun the tutorial\n----------------\n\nThis tutorial is self contained.  It produces similar output to :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert.ipynb>`.\n\nNote:  The tutorial will use about 8.5 GB of disk space. Ensure you have sufficient space before beginning.\n\nRight-click and copy :download:`this link address to the tutorial archive</src/examples/pytorch/libtorch_demo.tar.gz>`.\n\n.. code:: bash\n\n  wget <paste archive URL>\n  tar xvf libtorch_demo.tar.gz\n\nYour directory tree should now look like this:\n\n::\n\n  libtorch_demo\n  ├── bert_neuronx\n  │   ├── compile.py\n  │   └── detect_instance.py\n  ├── clean.sh\n  ├── core_count\n  │   ├── build.sh\n  │   └── main.cpp\n  ├── example_app\n  │   ├── build.sh\n  │   ├── core_count.hpp\n  │   ├── example_app.cpp\n  │   ├── README.txt\n  │   ├── utils.cpp\n  │   └── utils.hpp\n  ├── neuron.patch\n  ├── run_tests.sh\n  ├── setup.sh\n  ├── tokenizer.json\n  └── tokenizers_binding\n      ├── build_python.sh\n      ├── build.sh\n      ├── remote_rust_tokenizer.h\n      ├── run_python.sh\n      ├── run.sh\n      ├── tokenizer.json\n      ├── tokenizer_test\n      ├── tokenizer_test.cpp\n      └── tokenizer_test.py\n\nThis tutorial uses the `HuggingFace Tokenizers <https://github.com/huggingface/tokenizers>`_ library implemented in Rust.\nInstall Cargo, the package manager for the Rust programming language.\n\n\n +----------------------------------+----------------------------------+\n | Ubuntu                           | Amazon Linux 2023                |\n +----------------------------------+----------------------------------+\n | .. code-block:: bash             | .. 
code-block:: bash             |\n |                                  |                                  |\n |    sudo apt install -y cargo     |    sudo dnf install -y cargo     |\n +----------------------------------+----------------------------------+\n\n\nRun the setup script to download additional depdendencies and build the app. (This may take a few minutes to complete.)\n\n.. literalinclude:: tutorial_source_instructions/run_libtorch.sh\n   :language: bash\n   :lines: 6-7\n\n::\n\n  ...\n  + PATH_NEURON_LIB=/opt/aws/neuron/lib/\n  + g++ utils.cpp example_app.cpp -o ../example-app -O2 -D_GLIBCXX_USE_CXX11_ABI=0 -I../libtorch/include -L../tokenizers_binding/lib -L/opt/aws/neuron/lib/ -L../libtorch/lib -Wl,-rpath,libtorch/lib -Wl,-rpath,tokenizers_binding/lib -Wl,-rpath,/opt/aws/neuron/lib/ -ltokenizers -ltorchneuron -ltorch_cpu -lc10 -lpthread -lnrt\n  ~/libtorch_demo\n  Successfully completed setup\n\n.. _libtorch-benchmark:\n\nBenchmark\n---------\n\nThe setup script should have compiled and saved a PyTorch model compiled for neuron (bert_neuron_b6.pt).  Run the provided sanity tests to ensure everything is working properly.\n\n.. literalinclude:: tutorial_source_instructions/run_libtorch.sh\n   :language: bash\n   :lines: 10\n\n::\n\n  Running tokenization sanity checks.\n\n  None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n  Tokenizing: 100%|██████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 15021.69it/s]\n  Python took 0.67 seconds.\n  Sanity check passed.\n  Begin 10000 timed tests.\n  ..........\n  End timed tests.\n  C++ took 0.226 seconds.\n\n  Tokenization sanity checks passed.\n  Running end-to-end sanity check.\n\n  The company HuggingFace is based in New York City\n  HuggingFace's headquarters are situated in Manhattan\n  not paraphrase: 10%\n  paraphrase: 90%\n\n  The company HuggingFace is based in New York City\n  Apples are especially bad for your health\n  not paraphrase: 94%\n  paraphrase: 6%\n\n  Sanity check passed.\n\nFinally, run the example app directly to benchmark the BERT model.\n\n.. note::\n\n  You can safely ignore the warning about ``None of PyTorch, Tensorflow >= 2.0, ...``. This occurs because the test runs in a small virtual environment that doesn't require the full frameworks.\n\n.. literalinclude:: tutorial_source_instructions/run_libtorch.sh\n   :language: bash\n   :lines: 13\n\n::\n\n  Getting ready................\n  Benchmarking................\n  Completed 32000 operations in 43 seconds => 4465.12 pairs / second\n  \n  ====================\n  Summary information:\n  ====================\n  Batch size = 6\n  Num neuron cores = 16\n  Num runs per neuron core = 2000\n\n**Congratulations!** By now you should have successfully built and used a native C++ application with LibTorch.\n\nTroubleshooting\n---------------\n\n* In the event of SIGBUS errors you may have insufficient disk space for the creation of temporary model files at runtime.  Consider clearing space or mounting additional disk storage.\n* In the event of a neuron runtime failure, confirm that the Neuron kernel module is loaded using ``sudo modprobe neuron``.\n\n.. _libtorch-cleanup:\n\n\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorial-torchserve.rst",
    "content": ".. _pytorch-tutorials-torchserve:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nBERT TorchServe Tutorial\n========================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nThis tutorial demonstrates the use of `TorchServe <https://pytorch.org/serve>`_ with Neuron, the SDK for Amazon Inf1 instances. By the end of this tutorial, you will understand how TorchServe can be used to serve a model backed by EC2 Inf1 instances. We will use a pretrained BERT-Base model to determine if one sentence is a paraphrase of another.\n\nVerify that this tutorial is running in a virtual environement that was set up according to the `Torch-Neuronx Installation Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>` or `Torch-Neuron Installation Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron.html#setup-torch-neuron>`\n\n.. _torchserve-compile:\n\n\nRun the tutorial\n----------------\n\nOpen a terminal, log into your remote instance, and activate a Pytorch virtual environment setup (see the :ref:`Pytorch Installation Guide <install-neuron-pytorch>`). To complete this tutorial, you will need a compiled BERT model. If you have already completed the HuggingFace Pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert.ipynb>` then you already have the necessary file. Otherwise, you can setup your environment as shown below and then run :download:`trace_bert_neuron.py </src/examples/pytorch/torchserve/trace_bert_neuron.py>` to obtain a traced BERT model.\n\n\nYou should now have a compiled ``bert_neuron_b6.pt`` file, which is required going forward.\n\nOpen a shell on the instance you prepared earlier, create a new directory named ``torchserve``. Copy your compiled model from the previous tutorial into this new directory.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n   :language: bash\n   :lines: 4-6\n\n::\n\n  bert_neuron_b6.pt\n\nPrepare a new Python virtual environment with the necessary Neuron and TorchServe components. Use a virtual environment to keep (most of) the various tutorial components isolated from the rest of the system in a controlled way.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n   :language: bash\n   :lines: 8\n\nInstall the system requirements for TorchServe.\n\n.. tab-set::\n\n   .. tab-item:: Amazon Linux 2023 DLAMI Base\n\n      .. code-block:: bash\n\n        sudo dnf install jq java-11-amazon-corretto-headless\n        sudo alternatives --config java\n        sudo alternatives --config javac\n\n   .. tab-item:: Ubuntu 20 DLAMI Base\n\n      .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n        :language: bash\n        :lines: 10\n\n\n.. code:: bash\n\n  java -version\n\n::\n\n  openjdk version \"11.0.17\" 2022-10-18\n  OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04)\n  OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing)\n\n.. 
code:: bash\n\n  javac -version\n\n::\n\n  javac 11.0.17\n\nVerify that TorchServe is now available.\n\n.. code:: bash\n\n  torchserve --version\n\n::\n\n  TorchServe Version is 0.7.0\n\n\n.. _torchserve-setup:\n\nSetup TorchServe\n----------------\n\nDuring this tutorial you will need to download a few files onto your instance. The simplest way to accomplish this is to paste the download links provided above each file into a ``wget`` command. (We don't provide the links directly because they are subject to change.) For example, right-click and copy the download link for ``config.json`` shown below.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/config.json\n    :language: JSON\n    :caption: :download:`config.json </src/examples/pytorch/torchserve/config.json>`\n\n\nNow execute the following in your shell:\n\n.. code:: bash\n\n  wget <paste link here>\n  ls\n\n::\n\n  bert_neuron_b6.pt  config.json\n\nDownload the `custom handler script <https://pytorch.org/serve/custom_service.html>`_ that will eventually respond to inference requests.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/handler_bert.py\n    :language: python\n    :caption: :download:`handler_bert.py </src/examples/pytorch/torchserve/handler_bert.py>`\n    :linenos:\n\nNext, we need to associate the handler script with the compiled model using ``torch-model-archiver``. Run the following commands in your terminal:\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 12-16\n\n.. note::\n\n  If you modify your model or a dependency, you will need to rerun the archiver command with the ``-f`` flag appended to update the archive.\n\nThe result of the above will be a ``mar`` file inside the ``model_store`` directory.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 18\n\n::\n\n  bert-max_length128-batch_size6.mar\n\nThis file is essentially an archive associated with a fixed version of your model along with its dependencies (e.g. the handler code).\n\n.. note::\n\n  The version specified in the ``torch-model-archiver`` command can be appended to REST API requests to access a specific version of your model. For example, if your model was hosted locally on port 8080 and named \"bert\", the latest version of your model would be available at ``http://localhost:8080/predictions/bert``, while version 1.0 would be accessible at ``http://localhost:8080/predictions/bert/1.0``. We will see how to perform inference using this API in Step 6.\n\nCreate a `custom config <https://pytorch.org/serve/configuration.html>`_ file to set some parameters. This file will be used to configure the server at launch when we run ``torchserve --start``.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/torchserve.config\n    :language: properties\n    :caption: :download:`torchserve.config </src/examples/pytorch/torchserve/torchserve.config>`\n\n.. note::\n\n  This will cause TorchServe to bind on all interfaces. For security in real-world applications, you’ll probably want to use port 8443 and `enable SSL <https://pytorch.org/serve/configuration.html#enable-ssl>`_.\n\n\n.. _torchserve-run:\n\nRun TorchServe\n--------------\n\nIt's time to start the server. Typically we'd want to launch this in a separate console, but for this demo we’ll just redirect output to a file.\n\n.. 
literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 20\n\nVerify that the server seems to have started okay.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 22\n\n::\n\n  {\n    \"status\": \"Healthy\"\n  }\n\n.. note::\n\n  If you get an error when trying to ping the server, you may have tried before the server was fully launched. Check ``torchserve.log`` for details.\n\nUse the Management API to instruct TorchServe to load our model.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 24-26\n\n::\n\n  {\n    \"status\": \"Model \\\"bert-max_length128-batch_size6\\\" Version: 1.0 registered with 4 initial workers\"\n  }\n\n.. note::\n\n  Any additional attempts to configure the model after the initial curl request will cause the server to return a 409 error. You’ll need to stop/start/configure the server to realize any changes.\n\nThe ``MAX_BATCH_DELAY`` is a timeout value that determines how long to wait before processing a partial batch. This is why the handler code needs to check the batch dimension and potentially add padding. TorchServe will instantiate the number of model handlers indicated by ``INITIAL_WORKERS``, so this value controls how many models we will load onto Inferentia in parallel. This tutorial was performed on an inf1.xlarge instance (one Inferentia chip), so there are four NeuronCores available. If you want to control worker scaling more dynamically, `see the docs <https://pytorch.org/serve/management_api.html#scale-workers>`_.\n\n.. warning::\n  If you attempt to load more models than NeuronCores available, one of two things will occur. Either the extra models will fit in device memory but performance will suffer, or you will encounter an error on your initial inference. You shouldn't set ``INITIAL_WORKERS`` above the number of NeuronCores. However, you may want to use fewer cores if you are using the :ref:`neuroncore-pipeline` feature.\n\nIt looks like everything is running successfully at this point, so it's time for an inference.\n\nCreate the ``infer_bert.py`` file below on your instance.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/infer_bert.py\n    :language: python\n    :caption: :download:`infer_bert.py </src/examples/pytorch/torchserve/infer_bert.py>`\n    :linenos:\n\nThis script will send a ``batch_size`` number of requests to our model. In this example, we are using a model that estimates the probability that one sentence is a paraphrase of another. The script sends positive examples in the first half of the batch and negative examples in the second half.\n\nExecute the script in your terminal.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 28\n\n::\n\n  1 ['paraphrase']\n  3 ['not paraphrase']\n  4 ['not paraphrase']\n  0 ['paraphrase']\n  5 ['not paraphrase']\n  2 ['paraphrase']\n\nWe can see that the first three threads (0, 1, 2) all report ``paraphrase``, as expected. If we instead modify the script to send an incomplete batch and then wait for the timeout to expire, the excess padding results will be discarded.\n\n\n.. _torchserve-benchmark:\n\nBenchmark TorchServe\n--------------------\n\nWe've seen how to perform a single batched inference, but how many inferences can we process per second? A separate upcoming tutorial will document performance tuning to maximize throughput. 
In the meantime, we can still perform a simple naïve stress test. The code below will spawn 64 worker threads, with each thread repeatedly sending a full batch of data to process. A separate thread will periodically print throughput and latency measurements.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/benchmark_bert.py\n    :language: python\n    :caption: :download:`benchmark_bert.py </src/examples/pytorch/torchserve/benchmark_bert.py>`\n    :linenos:\n\nRun the benchmarking script.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 30\n\n::\n\n  pid 28523: current throughput 0.0, latency p50=0.000 p90=0.000\n  pid 28523: current throughput 617.7, latency p50=0.092 p90=0.156\n  pid 28523: current throughput 697.3, latency p50=0.082 p90=0.154\n  pid 28523: current throughput 702.8, latency p50=0.081 p90=0.149\n  pid 28523: current throughput 699.1, latency p50=0.085 p90=0.147\n  pid 28523: current throughput 703.8, latency p50=0.083 p90=0.148\n  pid 28523: current throughput 699.3, latency p50=0.083 p90=0.148\n  ...\n\n**Congratulations!** By now you should have successfully served a batched model over TorchServe.\n\nYou can now shutdown torchserve.\n\n.. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 32\n\n\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorial_source_instructions/run_libtorch.sh",
    "content": "#!/bin/bash\nset -eExuo\n#Run the setup script\ncd aws-neuron-sdk/src/examples/pytorch\nsudo apt install -y cargo \ncd libtorch_demo\nchmod +x setup.sh && ./setup.sh\n\n#Run sanity checks\n./run_tests.sh bert_neuron_b6.pt\n\n#Benchmark\n./example-app bert_neuron_b6.pt"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh",
    "content": "#!/bin/bash\nset -eExuo\ncd aws-neuron-sdk/src/examples/pytorch\ncd torchserve\npython trace_bert_neuronx.py\nls\n\npip install transformers==4.52.* torchserve==0.7.0 torch-model-archiver==0.7.0 captum==0.6.0\n\nsudo apt install openjdk-11-jdk -y\n\nmkdir model_store\nMAX_LENGTH=$(jq '.max_length' config.json)\nBATCH_SIZE=$(jq '.batch_size' config.json)\nMODEL_NAME=bert-max_length$MAX_LENGTH-batch_size$BATCH_SIZE\ntorch-model-archiver --model-name \"$MODEL_NAME\" --version 1.0 --serialized-file ./bert_neuron_b6.pt --handler \"./handler_bert_neuronx.py\" --extra-files \"./config.json\" --export-path model_store\n\nls model_store\n\ntorchserve --start --ncs --model-store model_store --ts-config torchserve.config 2>&1 >torchserve.log\nsleep 10\ncurl http://127.0.0.1:8080/ping\n\nMAX_BATCH_DELAY=5000 # ms timeout before a partial batch is processed\nINITIAL_WORKERS=2 # Number from table above\ncurl -X POST \"http://localhost:8081/models?url=$MODEL_NAME.mar&batch_size=$BATCH_SIZE&initial_workers=$INITIAL_WORKERS&max_batch_delay=$MAX_BATCH_DELAY\"\n\npython infer_bert.py\n\npython benchmark_bert.py\n\ntorchserve --stop\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nTutorials for Inference with torch-neuron (Inf1)\n====================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    Computer Vision Tutorials </archive/torch-neuron/tutorials/tutorials-torch-neuron-computervision>\n    Natural Language Processing (NLP) Tutorials </archive/torch-neuron/tutorials/tutorials-torch-neuron-nlp>\n    Utilizing Neuron Capabilities Tutorials </archive/torch-neuron/tutorials/tutorials-utilizing-neuron-capabilities>\n\n\n\n.. include:: /archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt",
    "content": ".. tab-set::\n\n    .. tab-item:: Computer Vision Tutorials\n\n\n        * ResNet-50 tutorial :ref:`[html] </src/examples/pytorch/resnet50.ipynb>` :pytorch-neuron-src:`[notebook] <resnet50.ipynb>`\n        * PyTorch YOLOv4 tutorial :ref:`[html] </src/examples/pytorch/yolo_v4.ipynb>` :pytorch-neuron-src:`[notebook] <yolo_v4.ipynb>`\n\n\n\n    .. tab-item:: Natural Language Processing (NLP) Tutorials\n\n\n        * HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert.ipynb>`\n        * HuggingFace pretrained BERT tutorial with shared weights :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>`\n        * Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial :ref:`[html] </src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>` :pytorch-neuron-src:`[notebook] <byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>`\n        * LibTorch C++ tutorial :ref:`[html] <pytorch-tutorials-libtorch>`\n        * TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve>`\n        * HuggingFace MarianMT tutorial :ref:`[html] </src/examples/pytorch/transformers-marianmt.ipynb>` :pytorch-neuron-src:`[notebook] <transformers-marianmt.ipynb>`\n\n\n\n    .. tab-item:: Utilizing Neuron Capabilities Tutorials\n\n\n        * BERT TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve>`\n        * NeuronCore Pipeline tutorial :ref:`[html] </src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>` :pytorch-neuron-src:`[notebook] <pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`\n\n\n\n.. note::\n\n    To use Jupyter Notebook see:\n\n    * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n    * :ref:`running-jupyter-notebook-as-script`\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorials-torch-neuron-computervision.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nComputer Vision Tutorials (``torch-neuron``)\n============================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n\n* ResNet-50 tutorial :ref:`[html] </src/examples/pytorch/resnet50.ipynb>` :pytorch-neuron-src:`[notebook] <resnet50.ipynb>`\n* PyTorch YOLOv4 tutorial :ref:`[html] </src/examples/pytorch/yolo_v4.ipynb>` :pytorch-neuron-src:`[notebook] <yolo_v4.ipynb>`\n\n\n\n\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorials-torch-neuron-nlp.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nNatural Language Processing (NLP) Tutorials (``torch-neuron``)\n==============================================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n* HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert.ipynb>`\n* HuggingFace pretrained BERT tutorial with shared weights :ref:`[html] </src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>` :pytorch-neuron-src:`[notebook] <bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>`\n* Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial :ref:`[html] </src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>` :pytorch-neuron-src:`[notebook] <byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>`\n* LibTorch C++ tutorial :ref:`[html] <pytorch-tutorials-libtorch>`\n* TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve>`\n* HuggingFace MarianMT tutorial :ref:`[html] </src/examples/pytorch/transformers-marianmt.ipynb>` :pytorch-neuron-src:`[notebook] <transformers-marianmt.ipynb>`\n\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   /src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb\n   /src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb\n   /src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb\n   tutorial-libtorch\n   tutorial-torchserve\n   transformers-marianmt\n"
  },
  {
    "path": "archive/torch-neuron/tutorials/tutorials-utilizing-neuron-capabilities.rst",
    "content": ".. meta::\n   :noindex:\n   :nofollow:\n   :description: This content is archived and no longer maintained.\n   :date-modified: 2026-03-11\n\nUtilizing Neuron Capabilities Tutorials\n=======================================\n\n.. warning::\n\n   This document is archived. torch-neuron (Inf1) is no longer officially supported\n   by the AWS Neuron SDK. It is provided for reference only. For current\n   framework support, see :doc:`/frameworks/index`.\n\n\n* BERT TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve>`\n* NeuronCore Pipeline tutorial :ref:`[html] </src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>` :pytorch-neuron-src:`[notebook] <pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb>`\n\n\n.. toctree::\n\t:hidden:\n\n\ttutorial-torchserve\n\t/src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb\n"
  },
  {
    "path": "archive/transformers-neuronx/api-reference-guide.rst",
    "content": ""
  },
  {
    "path": "archive/transformers-neuronx/api-reference-guide.txt",
    "content": ""
  },
  {
    "path": "archive/transformers-neuronx/developer-guide.rst",
    "content": ".. _tn_developer_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\n\nTransformers Neuron Developer Guide (``transformers-neuronx``)\n==============================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /archive/transformers-neuronx/transformers-neuronx-developer-guide\n    /archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching\n\n\n.. include:: /libraries/transformers-neuronx/developer-guide.txt\n"
  },
  {
    "path": "archive/transformers-neuronx/developer-guide.txt",
    "content": "* :ref:`transformers_neuronx_developer_guide`"
  },
  {
    "path": "archive/transformers-neuronx/index.rst",
    "content": ".. _transformers_neuronx_archive_readme:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\n\nTransformers NeuronX (``transformers-neuronx``)\n==============================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Setup </archive/transformers-neuronx/setup/index>\n    Developer Guide  </archive/transformers-neuronx/developer-guide>\n    Tutorials  </archive/transformers-neuronx/transformers-neuronx-tutorials>\n    Misc  </archive/transformers-neuronx/transformers-neuronx-misc>\n\n\n.. include:: /archive/transformers-neuronx/transformers-neuronx.txt\n"
  },
  {
    "path": "archive/transformers-neuronx/setup/index.rst",
    "content": ".. _transformers-neuronx-setup:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\nTransformers NeuronX Setup (``transformers-neuronx``)\n=====================================================\n\nIf you already have setup your environment to run PyTorch NeuronX, you just need to install Transformers NeuronX library using\nthe following instruction.\n\n.. code-block::\n\n   pip install transformers-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n\nIf you are starting from scratch, Neuron Multi Framework DLAMI is recommended as it comes pre-installed with Transformers NeuronX virtual environment.\nYou can refer to the :ref:`instructions to launch a Neuron instance using Multi Framework DLAMI <setup-ubuntu22-multi-framework-dlami>`\n\n\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-api-reference.rst",
    "content": ""
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching.rst",
    "content": ".. _transformers_neuronx_developer_guide_for_cb:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\nTransformers NeuronX (``transformers-neuronx``) Developer Guide for Continuous Batching\n=======================================================================================\n\nTransformers NeuronX is integrated with vLLM to enable continuous batching for high-throughput \nLLM serving and inference. This guide aims to help users get started with continuous batching for\nTransformers NeuronX and vLLM by providing:\n\n- :ref:`Transformers NeuronX <cb-tnx-overview>` An overview of Transformers NeuronX.\n- :ref:`cb-overview` The continuous batching procedure implemented by Transformers NeuronX and vLLM.\n- :ref:`cb-install` Installation and usage instructions for Transformers NeuronX and vLLM.\n- :ref:`cb-release-221-features` A showcase of new features in Transformers NeuronX and vLLM.\n- :ref:`cb-faq`\n\n.. _cb-tnx-overview:\n\nTransformers NeuronX (``transformers-neuronx``)\n-----------------------------------------------\n\nTransformers NeuronX for Trn1 and Inf2 is a software package that enables\nPyTorch users to perform large language model (LLM) :ref:`performant inference <neuron_llm_inference>` on\nsecond-generation Neuron hardware (See: :ref:`NeuronCore-v2 <neuroncores-v2-arch>`).\nThe :ref:`Neuron performance page <inf2-performance>` lists expected inference performance for commonly used Large Language Models.\n\n.. _cb-overview:\n\nContinuous Batching with Transformers NeuronX and vLLM\n------------------------------------------------------\n\nTransformers NeuronX implements the following operational flow with vLLM for continuous batching support:\n\n1. Context encode multiple prompts using virtual dynamic batching.\n2. Decode all sequences simultaneously until a sequence generates an EOS token.\n3. Evict the finished sequence and insert a new prompt encoding.\n4. Resume the decoding process, repeating steps 2 and 3 until all sequences are decoded.\n\n.. _cb-supported-model-architectures:\n\nSupported Model Architectures\n-----------------------------\n\nTransformers NeuronX supports continuous batching for models compatible with the following Hugging Face classes:\n\n- ``LlamaForCausalLM``\n- ``MistralForCausalLM``\n\n.. _cb-install:\n\nInstall vLLM and Get Started with Offline Inference\n---------------------------------------------------\n\nNeuron maintains a fork of vLLM (v0.6.2) that contains the necessary changes to support inference with Transformers NeuronX.\nNeuron is working with the vLLM community to upstream these changes to make them available in a future version.\n\nInstall vLLM\n^^^^^^^^^^^^\n\nFirst install ``neuronx-cc`` and the ``transformers-neuronx`` packages. Then install the vLLM fork from source:\n\n.. code-block:: bash\n\n    git clone -b v0.6.x-neuron https://github.com/aws-neuron/upstreaming-to-vllm.git\n    cd upstreaming-to-vllm\n    pip install -r requirements-neuron.txt\n    VLLM_TARGET_DEVICE=\"neuron\" && pip install -e .\n\n.. note::\n\n    Please note the vLLM ``pip`` package from PyPI is not compatible with Neuron. To work with Neuron, install vLLM using the source as outlined above.\n\n.. note::\n\n    The current supported version of Pytorch for Neuron installs ``triton`` version ``2.1.0``. This is incompatible with ``vllm >= 0.5.3``. You may see an error ``cannot import name 'default_dump_dir...``. 
To work around this, run ``pip install --upgrade triton==3.0.0`` after installing the vLLM wheel.\n\nIf Neuron packages are detected correctly in the installation process, ``vllm-0.1.dev2830+g22c56ee.neuron216`` will be installed (The ``neuron`` version depends on the installed\n``neuronx-cc`` version).\n\nRun Offline Batched Inference with Transformers NeuronX and vLLM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn the following example we demonstrate how to perform continuous batching with a Llama model.\n\n.. note::\n\n    Since Llama models are gated, please accept the Llama Community License Agreement and request access to the model.\n    Then use a Hugging Face user access token to download the model.\n\n.. code-block:: python\n\n    from vllm import LLM, SamplingParams\n    \n    # Sample prompts.\n    prompts = [\n        \"Hello, my name is\",\n        \"The president of the United States is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n    # Create a sampling params object.\n    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)\n\n    # Create an LLM.\n    llm = LLM(\n        model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n        max_num_seqs=8,\n        # The max_model_len and block_size arguments are required to be same as max sequence length,\n        # when targeting neuron device. Currently, this is a known limitation in continuous batching\n        # support in transformers-neuronx.\n        max_model_len=128,\n        block_size=128,\n        # The device can be automatically detected when AWS Neuron SDK is installed.\n        # The device argument can be either unspecified for automated detection, or explicitly assigned.\n        device=\"neuron\",\n        tensor_parallel_size=2)\n\n    # Generate texts from the prompts. The output is a list of RequestOutput objects\n    # that contain the prompt, generated text, and other information.\n    outputs = llm.generate(prompts, sampling_params)\n    # Print the outputs.\n    for output in outputs:\n        prompt = output.prompt\n        generated_text = output.outputs[0].text\n        print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\nRun the API Server\n^^^^^^^^^^^^^^^^^^\nTo run the OpenAI-compatible API server in vLLM, run either command below:\n\n.. code-block:: bash\n\n    vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8\n\n.. code-block:: bash\n\n    python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8\n\n.. 
_cb-release-221-features:\n\nNew Features in Neuron Release 2.21\n-----------------------------------\n\nNeuron's vLLM integration with Transformers NeuronX is tested using a public fork of vLLM v0.6.2.\nNew features and enhancements introduced in this fork will be described below.\nNeuron's intent is to upstream these features to vLLM as soon as possible after release.\nPrior to upstreaming, these features can be accessed in the AWS Neuron GitHub\nrepository https://github.com/aws-neuron/upstreaming-to-vllm/tree/v0.6.x-neuron.\n\n**Neuron Release 2.21 Features for the v0.6.2 vLLM Neuron Fork**\n\n- :ref:`Sequence bucketing <cb-sequence-bucketing>` configuration for context encoding and token generation.\n- :ref:`Granular NeuronConfig control <cb-neuron-config-override>` in vLLM entrypoints.\n- Inference support for :ref:`speculative decoding <cb-speculative-decoding>`.\n- Inference support for :ref:`EAGLE speculative decoding <cb-eagle-speculative-decoding>`.\n\n**Neuron Release 2.20 Features**\n\n- Multi-node inference support for larger models. Example scripts are included in `vLLM <https://github.com/vllm-project/vllm/commit/e5a3c0904799ec8e04e25ac25e66024004a61533>`_ .\n- Direct loading of Hugging Face-compatible checkpoints without creation of a ``-split`` directory.\n\n.. _cb-sequence-bucketing:\n\nSequence Bucketing\n^^^^^^^^^^^^^^^^^^\nTo configure buckets, set the following environment variables. Refer to the `developer guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#bucketing>`_\nfor details on how to configure the values. These environment variables need to be set before starting the vLLM server or instantiating the ``LLM`` object.\n\n- ``NEURON_CONTEXT_LENGTH_BUCKETS``:  Bucket sizes for context encoding.\n- ``NEURON_TOKEN_GEN_BUCKETS``: Bucket sizes for token generation.\n\nFor example: ``export NEURON_CONTEXT_LENGTH_BUCKETS=\"128,512,1024\"``\n\n\n.. _cb-neuron-config-override:\n\nNeuronConfig Override\n^^^^^^^^^^^^^^^^^^^^^\nThe default ``NeuronConfig`` in vLLM uses the latest optimizations from the Neuron SDK. However, you can override the default values or add a new configuration from the `developer guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#>`_ by setting the ``override_neuron_config`` parameter while creating the ``LLM`` object.\n\n.. code-block:: python\n\n    llm = LLM(\n        model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n        max_num_seqs=8,\n        max_model_len=128,\n        block_size=128\n        device=\"neuron\",\n        tensor_parallel_size=32,\n        #Override or update the NeuronConfig\n        override_neuron_config={\"shard_over_sequence\":True})\n\nWhile standing up the API server, set the ``override-neuron-config`` argument. For example:\n\n.. code-block:: bash\n\n    python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8 --override-neuron-config {\\\"shard_over_sequence\\\":\\\"True\\\"}\n\n\n.. _cb-quantization:\n\nQuantization\n^^^^^^^^^^^^\nTo use `int8 weight storage <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#int8-weight-storage-support>`_ ,\nset the environment variable ``NEURON_QUANT_DTYPE`` to ``s8``.\n\n\n.. 
_cb-speculative-decoding:\n\nSpeculative Decoding\n^^^^^^^^^^^^^^^^^^^^\nSpeculative decoding is a token generation optimization technique that\nuses a small draft model to generate ``K`` tokens autoregressively and a\nlarger target model to determine which draft tokens to accept, all in a combined forward pass.\nFor more information on speculative decoding, please see `[Leviathan, 2023] <https://arxiv.org/abs/2211.17192>`_ and `[Chen et al., 2023] <https://arxiv.org/pdf/2302.01318>`_.\n\nSpeculative decoding is now available for inference with Transformers NeuronX and vLLM:\n\n.. code-block:: python\n\n    from vllm import LLM, SamplingParams\n\n    # Sample prompts.\n    prompts = [\n        \"Hello, my name is\",\n        \"The president of the United States is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n    # Create a sampling params object.\n    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)\n\n    # Create an LLM.\n    llm = LLM(\n        model=\"meta-llama/Meta-Llama-3.1-70B-Instruct\",\n        speculative_model=\"meta-llama/Llama-3.2-1B-Instruct\",\n        # The max_model_len, speculative_max_model_len, and block_size arguments are required to be same as max sequence length,\n        # when targeting neuron device. Currently, this is a known limitation in continuous batching\n        # support in transformers-neuronx.\n        max_model_len=128,\n        block_size=128,\n        speculative_max_model_len=128,\n        dtype=\"bfloat16\",\n        max_num_seqs=4,\n        num_speculative_tokens=4,\n        # The device can be automatically detected when AWS Neuron SDK is installed.\n        # The device argument can be either unspecified for automated detection, or explicitly assigned.\n        device=\"neuron\",\n        tensor_parallel_size=32,\n        use_v2_block_manager=True,\n    )\n\n    outputs = llm.generate(prompts, sampling_params)\n    # Print the outputs.\n    for output in outputs:\n        prompt = output.prompt\n        generated_text = output.outputs[0].text\n        print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\n.. note::\n\n    Please ensure that the selected target and draft model are from the same model family. For example, if the target model is an instruction-tuned Llama model,\n    the draft model must also be a lower-capacity instruction-tuned Llama model.\n\n.. _cb-eagle-speculative-decoding:\n\nEAGLE Speculative Decoding\n^^^^^^^^^^^^^^^^^^^^^^^^^^\nExtrapolation Algorithm for Greater Language-model Efficiency (EAGLE) extends the speculative decoding\ntechnique described above by:\n\n- Utilizing a specially trained EAGLE draft model that predicts feature outputs through an Autoregression Head and next token outputs through an LM Head.\n- Reducing sampling uncertainty by using the next autoregressively sampled token and a current feature map as draft model inputs.\n\nFor more information on EAGLE, please see `[Li et al., 2024] <https://arxiv.org/pdf/2401.15077>`_\n\nEAGLE speculative decoding can be applied without changes to the speculative decoding code sample above. Transformers NeuronX and vLLM will recognize\na draft model as an EAGLE draft when ``is_eagle: True`` is set in the model's Hugging Face ``config.json`` file.\n\n\n.. _cb-faq:\n\nFrequently Asked Questions\n--------------------------\n\n**Is PagedAttention supported in the vLLM integration?**\n\nNo, PagedAttention is not currently supported. It will be supported in a future Neuron release.\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-developer-guide.rst",
    "content": ".. _transformers_neuronx_developer_guide:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\nTransformers NeuronX (``transformers-neuronx``) Developer Guide\n================================================================\n\nTransformers NeuronX for Trn1 and Inf2 is a software package that enables\nPyTorch users to perform large language model (LLM) :ref:`performant inference <neuron_llm_inference>` on\nsecond-generation Neuron hardware (See: :ref:`NeuronCore-v2 <neuroncores-v2-arch>`).The :ref:`Neuron performance page <inf2-performance>` lists expected inference performance for commonly used Large Language Models.\n\n\nIntroduction\n------------\n\nThe `Transformers NeuronX repository <https://github.com/aws-neuron/transformers-neuronx>`_\ncontains the source code of the AWS Neuron Transformers integration project.\nAs it stands now, it mainly serves the purpose of\nrunning transformer decoder inference (autoregressive sampling)\nworkflows on the Neuron platform.\n\nNote: This project is **actively** in development. The Neuron team is\nstill heavily modifying the Neuron optimized module classes. The\nfunctionality provided in this repository will not maintain long-term\nAPI stability until version >= 1.0.0. For applications willing to reuse\ncode from this repository, we recommend treating the Neuron optimized\nmodule implementations as samples, and pin the version of the main\nlibrary package ``torch-neuronx`` to avoid breaking interface changes as\nnew features are developed.\n\n\n\nCheckpoint compatibility with HuggingFace Transformers\n------------------------------------------------------\n\n``transformers-neuronx`` is checkpoint-compatible with HuggingFace\nTransformers. While the Neuron team reimplemented some HuggingFace\nTransformers models from scratch for the purpose of maximizing the\nexecution efficiency of transformer decoders on Neuron, the\nimplementations are done with maximizing compatibility in mind, meaning\none can train transformer decoder models, say GPT2, using the standard\nHuggingFace Transformers library, and then construct an\ninference-optimized decoder model using transformers-neuronx's\n``GPT2ForSampling`` class. If training was done with other libraries\nsuch as MegatronLM, then it is still possible to convert the obtained\ncheckpoint to the standard HuggingFace Transformers checkpoint format,\nand then move on to transformers-neuronx's optimized decoder\nimplementations.\n\n\nNeuron optimized transformer decoders implemented in XLA High Level Operations (HLO)\n------------------------------------------------------------------------------------\n\nDue to the stateful nature of the autoregressive sampling computation,\nan efficient implementation of autoregressive sampling using the Neuron\nSDK requires rewriting the model forward function into a pure-function\ncomputation running on fixed-shape tensors. Furthermore, we want the\npure-function computation be implemented in a compiled language so that\nthe Neuron compiler can perform extensive code analysis and\noptimization. We chose XLA High Level Operations (HLO) as the compiled\nlanguage for implementing Neuron optimized transformer decoder classes.\nThe source code of these classes contains Python functions written in a\nsyntax called \"PyHLO\", name of a Neuron internal tool for\nwriting/compiling the HLO language in Python. 
As an example, a \"language\nmodel head\" implemented in PyHLO may look like the following.\n\n::\n\n   class LmHeadHlo:\n\n       ...\n\n       def lm_head(self, scribe):\n           dtype = self.dtype\n           hidden_size = self.hidden_size\n           n_active_tokens = self.n_active_tokens\n           batch_size = self.batch_size\n           vocab_size = self.vocab_size\n           hidden = dtype[hidden_size, n_active_tokens, batch_size].Parameter(parameter_number=0)\n           weight = dtype[hidden_size, vocab_size].Parameter(parameter_number=1)\n           rhs_size = n_active_tokens * batch_size\n           hidden = dtype[hidden_size, rhs_size].Reshape(hidden)\n           dot_dims = dict(lhs_contracting_dimensions=[0], rhs_contracting_dimensions=[0])\n           logits = dtype[vocab_size, rhs_size].Dot(weight, hidden, dot_dimension_numbers=dot_dims)\n           return dtype[vocab_size, n_active_tokens, batch_size].Reshape(logits)\n\n       ...\n\nThe ``transformers_neuronx.compiler.compile_py_func`` function can\nconvert the Python ``lm_head`` function into ``HloModuleProto``, a valid\ninput format for the ``neuronx-cc`` compiler.\n\n\nTensor-parallelism support\n--------------------------\n\nFor transformer decoders used in large language models,\ntensor-parallelism is necessary as it provides a way to shard the\nmodels' large weight matrices onto multiple NeuronCores, and having\nNeuronCores working on the same matrix multiply operation\ncollaboratively. transformers-neuronx's tensor-parallelism support makes\nheavy use of collective operations such as all-reduce, which is\nsupported natively by the Neuron runtime.\n\nThere are some principles for setting tensor-parallelism degree (number\nof NeuronCores participating in sharded matrix multiply operations) for\nNeuron-optimized transformer decoder models.\n\n1. The number of attention heads needs to be divisible by the\n   tensor-parallelism degree.\n2. The total data size of model weights and key-value caches needs to be\n   smaller than 16 GB times the tensor-parallelism degree.\n3. Currently, the Neuron runtime supports tensor-parallelism degrees 1,\n   2, 8, and 32 on Trn1 and supports tensor-parallelism degrees 1, 2, 4,\n   8, and 24 on Inf2.\n\nSome examples:\n\n1. ``facebook/opt-13b`` has 40 attention heads, and when running at\n   batch size 1 and float16 precision the model requires ~29 GB memory,\n   therefore a ``trn1.2xlarge`` with 32 GB device memory is sufficient.\n2. ``facebook/opt-30b`` has 56 attention heads, and at batch size 1 and\n   float16 precision the model requires ~66 GB memory, therefore it can\n   run on 8 NeuronCores on one ``trn1.32xlarge`` using 128 GB device\n   memory.\n3. ``gpt2-xl`` has 25 attention heads and requires ~4 GB memory at\n   bfloat16 precision. It runs without tensor-parallelism only.\n\n\nFeatures\n--------\n\n\nCompile-time Configurations\n---------------------------\n\nTransformers Neuron models support a variety of compile-time configurations\nthat can be used to tune model performance. All models support the following\nconfigurations:\n\n- ``batch_size``: The batch size to compile a model for. Once the batch size has\n  been set, this is the only size that is supported at inference time. Neuron\n  uses ahead-of-time compilation to achieve high performance which requires\n  that the compiled artifact shapes must be known at compilation time.\n- ``n_positions``: The maximum number of positions (or sequence length) to allow\n  during generation. 
This parameter directly controls the width of the KV\n  cache. This parameter should be set to the maximum expected sequence length\n  for the end application.\n- ``tp_degree``: This parameter controls the number of tensor parallel shards to\n  split the model into. Each shard will execute on a separate NeuronCore. To\n  minimize latency, it is recommended to set the tensor parallelism to be\n  equal to the number of NeuronCores that are available on an instance.\n- ``amp``: This allows a models weights and compute to be cast to a different\n  type. The options are; ``'bf16'``, ``'f16'``, or ``'f32'``. For\n  models trained in ``float32``, the 16-bit mixed precision options\n  (``'bf16'``, ``'f16'``) generally provide sufficient accuracy while\n  significantly improving performance.\n- ``context_length_estimate``: This parameter controls the maximum sequence\n  length of the prompt/context handling compute graph. This parameter is\n  not supported in ``GPTNeoXForSampling`` and ``GPTJForSampling``.\n\n.. code-block:: python\n\n    from transformers_neuronx import NeuronAutoModelForCausalLM\n\n    model = NeuronAutoModelForCausalLM.from_pretrained(\n        'gpt2',                      # Uses the GPT2 checkpoint from https://huggingface.co/gpt2\n        batch_size=1,                # Allow inference with batch size 1 inputs\n        n_positions=128,             # Allow a maximum size of 128 prompt & output tokens\n        tp_degree=2,                 # Shard the model weights & compute across 2 NeuronCores\n        amp='f16',                   # Downcast the weights & compute to float16\n        context_length_estimate=64,  # Build an optimized context encoding network for a maximum prompt size of 64\n    )\n    model.to_neuron() # Load/compile the model\n\n\n\nCheckpoint support and automatic model selection\n------------------------------------------------\n\n*New in release 2.18*\n\nTransformers Neuron now supports a greater variety of checkpoints including\nolder pytorch binary checkpoints and newer `safetensors`_ checkpoints. For\nimproved load speed and reduced host memory consumption, it is recommended to\nalways use ``safetensors`` by default. Both regular and sharded variants of\ncheckpoints are supported. It is no longer recommended to use the\n``save_pretrained_split`` function which was used in older Transformers Neuron\nexamples.\n\nIn addition to supporting standard checkpoint formats, Transformers Neuron\nprovides an AutoModel class ``NeuronAutoModelForCausalLM`` which can be\nused to load the correct model without explicitly importing the\narchitecture-specific class.\n\n.. _safetensors: https://github.com/huggingface/safetensors\n\n.. code-block:: python\n\n    from transformers_neuronx import NeuronAutoModelForCausalLM\n\n    # Loads: https://huggingface.co/bigscience/bloom-560m\n    bloom = NeuronAutoModelForCausalLM.from_pretrained('bigscience/bloom-560m')\n    bloom.to_neuron()\n\n    # Loads: https://huggingface.co/openlm-research/open_llama_3b_v2\n    llama = NeuronAutoModelForCausalLM.from_pretrained('openlm-research/open_llama_3b_v2')\n    llama.to_neuron()\n\n    # This is equivalent to the following:\n    from transformers_neuronx import BloomForSampling\n    model = BloomForSampling.from_pretrained('bigscience/bloom-560m')\n    model.to_neuron()\n\n    from transformers_neuronx import LlamaForSampling\n    llama = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b_v2')\n    llama.to_neuron()\n\n\n.. 
note::\n\n    Advanced features of huggingface hub access are not supported. This\n    includes private repositories which require access tokens and branches.\n\n    In order to support more advanced repository downloads, please download the\n    model to a local directory and load it from there.\n\n\n\nHugging Face generate() API support\n-----------------------------------\n\nTransformers Neuron models support the Hugging Face `generate() <https://huggingface.co/docs/transformers/v4.28.1/en/main_classes/text_generation#transformers.GenerationMixin.generate>`__\nAPI via the ``HuggingFaceGenerationModelAdapter`` adapter class. In the following example we\ndemonstrate how to run sampling with temperature using the ``GPT2`` model:\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer, AutoConfig\n    from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting, HuggingFaceGenerationModelAdapter\n\n    # Create and compile the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2')\n    model.to_neuron()\n\n    # Use the `HuggingFaceGenerationModelAdapter` to access the generate API\n    config = AutoConfig.from_pretrained('gpt2')\n    model = HuggingFaceGenerationModelAdapter(config, model)\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('gpt2')\n    tokenizer.pad_token_id = tokenizer.eos_token_id\n    tokenizer.padding_side = 'left'\n    text = \"Hello, I'm a language model,\"\n    encoded_input = tokenizer(text, return_tensors='pt', padding=True)\n\n    # Run inference using temperature\n    with torch.inference_mode():\n        model.reset_generation()\n        generated_sequence = model.generate(\n            input_ids=encoded_input.input_ids,\n            attention_mask=encoded_input.attention_mask,\n            do_sample=True,\n            max_length=256,\n            temperature=0.7,\n        )\n\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\nNote: As the Hugging Face generation API can expand the input's batch dimension\nbased on different generation configurations, we need to compile the neuron\nmodel with different compile batch_size compared to the run time batch_size\n(batch dimension of inputs to generation API).\n- if ``do_sample=True``, ``compile_batch_size = runtime_batch_size x num_return_sequences x beam_size``\n- otherwise, ``compile_batch_size = runtime_batch_size x num_return_sequences``\n\n\n\nNeuron Persistent Cache\n------------------------\n\nThe Neuron Persistent Cache is now enabled for Transformers Neuron by default.\nModel artifacts which have been compiled once will be cached and reused on\nsuccessive runs when possible. Model artifacts will only be reused when\ncompiling with the same compiler version (neuronx-cc), model configurations,\nand compiler flags. It also includes other features (i.e. using an S3 bucket as\nthe cache backend). For more detailed information, see the\n:ref:`Persistent cache documentation <neuron-caching>`\n\n\n.. _int8_weight_storage_support:\n\n\nint8 weight storage support\n---------------------------\n\nTransformers Neuron supports int8 weight storage for the ``GPT2`` model class.\nint8 weight storage can be used to reduce memory bandwidth usage to improve\nmodel performance. int8 weight storage support for additional model classes\nwill be added in an upcoming release. 
In the following example we demonstrate\nhow to apply int8 weight storage to the ``GPT2`` model via the\n``QuantizationConfig`` and ``NeuronConfig`` configs:\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting, NeuronConfig, QuantizationConfig\n\n    # Set the weight storage config use int8 quantization and bf16 dequantization\n    neuron_config = NeuronConfig(\n        quant=QuantizationConfig(quant_dtype='s8', dequant_dtype='bf16'),\n    )\n\n    # Create and compile the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained(\n        'gpt2',\n        amp='bf16', # NOTE: When using quantization, amp type must match dequant type\n        neuron_config=neuron_config\n    )\n    model.to_neuron()\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('gpt2')\n    text = \"Hello, I'm a language model,\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\n\nParallel Input Prompt Context Encoding\n--------------------------------------\n\nTransformers Neuron supports parallel input prompt context encoding for the ``GPT2``\nmodel class. Parallel context encoding can be used to significantly reduce\nthe latency of the input prompt context encoding before the autoregressive\ndecoder token generation loop. Parallel context encoding support for additional\nmodel classes will be added in an upcoming release.\n\nThe ``GPT2ForSamplingWithContextBroadcasting`` class has a ``context_length_estimate``\nvariable that determines the number of input prompt tokens that will be processed in\nparallel. For optimal results, this should be set to a power of 2 that is\nclosest to the most frequently seen input prompt length.\nIn the following example we demonstrate how to apply parallel context encoding\nto the ``GPT2`` model via the ``GPT2ForSamplingWithContextBroadcasting`` class.\nIn this example, we set the ``context_length_estimate`` to be 128, which is\nthe closest power of 2 the length of the input prompt (97 tokens).\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting\n\n    # Create and compile the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained(\n        'gpt2',\n        context_length_estimate=256 # Create an optimized network which handles prompts up to 256 tokens\n    )\n    model.to_neuron()\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('gpt2')\n    text = \"Hello, I'm a generative AI language model. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is powered by large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, drive unprecedented levels of productivity, and transform your business. 
\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256)\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\n\nThe ``GPT2ForSamplingWithContextBroadcasting`` class can also process\nan input prompt that has a different batch size from the batch size of the\nautoregressive decoder output. For example, an input prompt with batch size = 1 can\nbe used to produce an output of batch size = 5 to generate multiple suggestions\nfor the same input prompt. The input prompt batch size can be specified using\nthe ``prompt_batch_size`` argument and the autoregressive decoder output batch\nsize can be specified using the ``batch_size`` argument. In the following example\nwe demonstrate how to apply parallel context encoding to the ``GPT2`` model\nto generate 5 outputs for a single input.\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting\n\n    # Create and compile the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained(\n        'gpt2',\n        prompt_batch_size=1, # This allows prompt and output batch to vary\n        batch_size=5,\n        context_length_estimate=256\n    )\n    model.to_neuron()\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('gpt2')\n    text = \"Hello, I'm a generative AI language model. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is powered by large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, drive unprecedented levels of productivity, and transform your business. \"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256)\n\n    for i, output in enumerate(generated_sequence):\n        print('-' * 50)\n        print(f'Batch {i} output:')\n        print(tokenizer.decode(output))\n\n\nSerialization support\n---------------------\n\nTransformers NeuronX supports model serialization (model saving and loading) for\nall models except the ``GPTJForSampling`` and ``GPTNeoXForSampling``` model\nclasses. In the following example we demonstrate how to save and load\nthe compiled artifacts for the ``GPT2`` model:\n\n.. 
code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting\n\n    # Create and compile the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2')\n    model.to_neuron()\n\n    # Save the compiled Neuron model\n    model.save('gpt2-compiled-artifacts')\n\n    # Load the Neuron model\n    model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2')\n    # Load the compiled Neuron artifacts\n    model.load('gpt2-compiled-artifacts')\n    # Since prior artifacts are loaded, this skips compilation\n    model.to_neuron()\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('gpt2')\n    text = \"Hello, I'm a language model,\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\nTransformers NeuronX also supports the serialization of presharded weights.\nThis reduces future model load time by saving a transformed and sharded\nset of weights as a new safetensors checkpoint. When this checkpoint is loaded,\nsharding and transformations normally done by Transformers NeuronX will be skipped,\nreducing model load time significantly. The saving of presharded weights is only\navailable when ``on_device_embedding`` is true. In the following example we\ndemonstrate how to save and load presharded weights along with compiled artifacts on a Llama model:\n\n.. code-block:: python\n\n    from transformers_neuronx import LlamaForSampling\n    from transformers_neuronx import NeuronConfig\n    from transformers import AutoTokenizer\n\n    neuron_config = NeuronConfig(on_device_embedding=True)\n\n    # Create and compile the Neuron model\n    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n    model_neuron.to_neuron()\n\n    # save the presharded weights and compiled artifacts to a directory\n    model_neuron.save('llama-artifacts', sharded_weights=True)\n\n    del model_neuron\n\n    # use the presharded checkpoint to reduce model load time\n    model_neuron_presharded = LlamaForSampling.from_pretrained('llama-artifacts', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n\n    # load in the compiled artifcats to skip compilation\n    model_neuron_presharded.load('llama-artifacts')\n    model_neuron_presharded.to_neuron()\n\nCPU Compilation Support\n-----------------------\n\nTransformers NeuronX now supports compilation on CPU. \nCPU compilation is compatible with model serialization and presharding weights, \nand is available for all models except the GPTJForSampling and GPTNeoXForSampling \nmodel classes. To compile on CPU, the initial call to to_neuron() is replaced with \ncpu_compile(). In the following example we demonstrate how to compile on CPU for \nthe LLaMA model:\n\n.. 
code-block:: python\n\n    from transformers_neuronx import LlamaForSampling\n    from transformers_neuronx import NeuronConfig\n    from transformers import AutoTokenizer\n\n    neuron_config = NeuronConfig(on_device_embedding=True)\n\n    # Create and compile the model on CPU\n    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n    model_neuron.cpu_compile() # instead of model_neuron.to_neuron()\n\n    # save the weights and compiled artifacts to a directory\n    model_neuron.save('llama-artifacts')\n\nTo use the saved artifacts generated by CPU compilation on a Neuron device: \n\n.. code-block:: python\n    \n    from transformers_neuronx import LlamaForSampling\n    from transformers_neuronx import NeuronConfig\n    from transformers import AutoTokenizer\n\n    neuron_config = NeuronConfig(on_device_embedding=True)\n\n    # use the presharded checkpoint to reduce model load time\n    model_neuron_presharded = LlamaForSampling.from_pretrained('llama-artifacts', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n\n    # load in the compiled artifacts to skip compilation\n    model_neuron_presharded.load('llama-artifacts')\n\n    # now, use CPU compiled artifacts to run the model\n    model_neuron_presharded.to_neuron()\n\n\nCompilation worker count support\n--------------------------------\n\nTransformers-neuronx supports providing compilation worker count for all models. This setting controls how many workers will execute HLO graph compilation tasks in parallel. A lower setting reduces CPU memory utilization when compiling a model, but increases the compilation time. This setting is useful to prevent out of CPU memory errors when compiling large models. By default, the number of workers used is equal to the total HLO graphs required for compilation. Compilation worker count integrates with both CPU compilation flow using ``cpu_compile()`` and neuron device compilation flow using ``to_neuron()``.\nTo set the compilation worker count, use the ``compilation_worker_count`` argument in ``NeuronConfig``. The following sample shows how to compile the graphs one by one.\n\n.. code-block:: python\n\n    neuron_config = NeuronConfig(compilation_worker_count=1)\n\n\nGrouped-query attention (GQA) support [Beta]\n---------------------------------------------\n\nTransformers Neuron supports grouped-query attention (GQA) models for\n``Llama`` and ``Mistral`` model classes.\nThere are multiple sharding strategies for K/V cache, in order to satisfy different constraints.\n\n- ``GQA.SHARD_OVER_HEADS`` distributes K/V caches along head dimension. This can be only used when K/V heads is multiple of tensor-parallelism degree. This is the default configuration.\n- ``GQA.SHARD_OVER_BATCH`` distributes K/V caches along batch dimension. This can be only used when batch size is multiple of tensor-parallelism degree. This can be useful for large-batch inference.\n- ``GQA.REPLICATED_HEADS`` replicates K/V heads. This can be used when neither batch size nor K/V heads can be divisible by tensor-parallelism degree. This can be useful for low-latency small-batch inference.\n- ``GQA.ALL_GATHER_HEADS`` evenly splits the K/V heads across all NeuronCores. This is optimized for large-batch inference of GQA model without replication.\n\n.. 
_mistral_gqa_code_sample:\n\nIn the following example we demonstrate how to configure these distributed inference strategies and\nperform inference with the ``Mistral`` model:\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import MistralForSampling, GQA, NeuronConfig\n\n    # Set sharding strategy for GQA to be shard over heads\n    neuron_config = NeuronConfig(\n        group_query_attention=GQA.SHARD_OVER_HEADS\n    )\n\n    # Create and compile the Neuron model\n    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16', neuron_config=neuron_config)\n    model_neuron.to_neuron()\n\n    # Get a tokenizer and exaple input\n    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')\n    text = \"[INST] What is your favourite condiment? [/INST]\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\n\nRepeated Ngram Filtering\n------------------------\n\nRepeated Ngram Filtering reduces redundant ngram phrases within the generated text. It uses the same API as `HuggingFace API for NoRepeatedNGram <https://huggingface.co/docs/transformers/v4.38.2/en/internal/generation_utils#transformers.NoRepeatNGramLogitsProcessor>`__. Set the parameter no_repeat_ngram_size to the size of ngram phrases to be filtered and pass it to the sampling function as in the example ``model.sample(inputs_ids, no_repeat_ngram_size=3)``\n\n\nOn-device sampling support [Beta]\n--------------------------------------\n\nTransformers-neuronx supports on-device sampling for all models except Mixtral models. The features\ncan be enabled by setting ``on_device_generation`` in ``NeuronConfig`` to an instance of ``GenerationConfig``.\n\nIn the following example, we demonstrate how to use on-device generation for a ``Llama`` model using\n``top_k``, ``top_p``, ``top_p_min_tokens`` and ``temperature``.\n\n\nTop-K on-device sampling support [Beta]\n---------------------------------------\nTransformers Neuron supports Top-K Sampling on-device for all models except Mixtral models.\nIn the following example, we demonstrate how to use on-device Top-K for the ``Llama`` model via\nthe ``GenerationConfig`` and ``NeuronConfig`` configs.\n\n.. 
code-block:: python\n\n    import torch\n    from transformers_neuronx import LlamaForSampling\n    from transformers_neuronx.config import NeuronConfig, GenerationConfig\n    from transformers import AutoTokenizer\n\n    neuron_config = NeuronConfig(\n        on_device_generation=GenerationConfig(max_length=128, top_k=10, top_p=0.9, top_p_min_tokens=1, temperature=0.9, do_sample=True)\n    )\n\n    # Create and compile the Neuron model\n    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n    model_neuron.to_neuron()\n\n    # Get a tokenizer and exaple input\n    tokenizer = AutoTokenizer.from_pretrained('openlm-research/open_llama_3b')\n    text = \"Hello, I'm a language model,\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128, top_k=10)\n        print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\nBy default, transformers-neuronx uses the same, fixed sampling parameters for all sequences across all invocations\nof the model when on-device generation is enabled. It is possible to provide new sampling parameters per\nmodel invocation by enabling the ``dynamic`` feature in the ``GenerationConfig``. It is also possible to provide\ndifferent sampling parameters for each sequence in the batch by using the ``per_batch_line`` feature.\nWhen using this feature, it is recommended to limit the number of tokens that are considered during\nsampling across all sequences by setting ``global_top_k`` to a reasonably low number e.g. 250 to prevent\npoor performance when computing ``top_p`` tokens over a large vocabulary without any prior filtering. When using\n``per_batch_line``, ``top_k``, ``top_p``, ``top_p_min_tokens`` and ``temperature`` accept lists with value per\nsequence in the batch.\n\n\nIn the following example, we demonstrate how to use the ``dynamic`` and ``per_batch_line`` features together.\n\n.. 
code-block:: python\n\n    import torch\n    from transformers_neuronx import LlamaForSampling\n    from transformers_neuronx.config import NeuronConfig, GenerationConfig\n    from transformers import AutoTokenizer\n\n    batch_size = 2\n    generation_config = GenerationConfig(\n            max_length=128, dynamic=True, per_batch_line=True, do_sample=True,\n            top_k=[1] * batch_size,\n            top_p=[1.0] * batch_size,\n            top_p_min_tokens=[1] * batch_size,\n            temperature=[1.0] * batch_size,\n            global_top_k=256\n        )\n\n    neuron_config = NeuronConfig(\n        on_device_generation=generation_config\n    )\n\n    # Create and compile the Neuron model\n    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=2, tp_degree=8, n_positions=128, neuron_config=neuron_config)\n    model_neuron.to_neuron()\n\n    # Get a tokenizer and exaple input\n    tokenizer = AutoTokenizer.from_pretrained('openlm-research/open_llama_3b')\n    tokenizer.pad_token = tokenizer.eos_token\n    text = [\"Hello, I'm a language model,\", \"Hello, I'm also a language model,\"]\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128)\n        print([tokenizer.decode(tok) for tok in generated_sequence])\n\n        # Use different settings for each sequence in the batch\n        # Supported because we use `generation_config.per_batch_line = True`\n        generation_config.top_k = [1, 20]\n        generation_config.top_p = [1.0, 0.9]\n        generation_config.top_p_min_tokens = [1, 1]\n        generation_config.temperature = [1.0, 0.9]\n\n        # Update the generation configuration dynamically\n        # Supported because we use `generation_config.dynamic = True`\n        model_neuron.update_generation_config(generation_config)\n\n        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128)\n        print([tokenizer.decode(tok) for tok in generated_sequence])\n\n\n\nRunning inference with multiple models\n--------------------------------------\n\nMultiple transformers-neuronx models can be loaded at the same time as long\nas the total number of consumed NeuronCores is less than or equal to the total\nnumber of NeuronCores on the instance. For example, three tp-degree=8 models can be\nloaded and run in parallel on an inf2.48xlarge which has 24 NeuronCores. The\n``NEURON_RT_NUM_CORES`` and ``NEURON_RT_VISIBLE_CORES`` environment variables\ncan be used to allocate the necessary number of NeuronCores to each process\nto run multiple transformers-neuronx models in parallel. See the\n:ref:`torch_neuronx_core_placement_guide` section for additional information\nabout how to use these environment variables.\n\nIt is important to notice that when multiple models are used on a single instance,\nthe number of threads should be reduced to avoid race condition on host side.\nAssume the neuron instance (i.e. trn1) has 192 CPU cores.\nIf one of the models keeps all CPU cores busy, there would be significant performance\ndegradation in the rest of models. As a result, the number of threads for each model\nshould be limited to part of available cores. To do this, ``OMP_NUM_THREADS`` environment\nvariable can be set. 
For example, if there are 192 CPU cores available and four tp-degree=8\nmodels are used, one can export OMP_NUM_THREADS=48 to avoid race condition.\n\n\nStreamer\n----------------------------\n\nLLMs generate tokens in auto-regressive loop. A model.sample call waits till\nthe end of full sequence generation before returning the generated response.\nIt is possible to output an output token as soon as it is generated. To do this,\na streamer object can be used. Streamer is an object which has 2 methods: put and end.\nThere are several predefined streamer in transformers library such as TextIteratorStreamer.\nThe following example shows how to define a streamer and use it in transformers-neuronx:\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer\n    from transformers_neuronx import MistralForSampling, GQA\n\n    import transformers\n    from time import time\n\n    # Create a custom streamer inherited from transformers.generation.streamers.BaseStreamer\n    class CustomStreamer(transformers.generation.streamers.BaseStreamer):\n        def __init__(self) -> None:\n            self.reset()\n\n        def reset(self):\n            self.token_latencies = []\n            self.iter = 0\n            self.now = time()\n\n        def put(self, tokens):\n            now = time()\n            token_latency = now - self.now\n            print(f\"Iteration {self.iter:4d}: Latency [s] {token_latency:6.3f} -- Token {tokens}\")\n            self.now = now\n            self.iter += 1\n            self.token_latencies.append(token_latency)\n\n\n        def end(self):\n            print(\"First 10 token latencies:\", self.token_latencies[:10])\n\n\n    # Create and compile the Neuron model\n    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16')\n    model_neuron.to_neuron()\n\n    # Get a tokenizer and exaple input\n    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')\n    text = \"[INST] What is your favourite condiment? [/INST]\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    streamer = CustomStreamer()\n    # Run inference\n    with torch.inference_mode():\n        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None, streamer=streamer)\n\n\nStopping Criteria\n------------------\nWe can define custom stopping criteria to stop autoregressive loop. For example, if\nwe want to limit autoregressive loop after 0.5s, we can define and use stopping criteria\nclass as follows:\n\n\n.. code-block:: python\n\n    import torch\n    import transformers\n    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer\n    from transformers_neuronx import MistralForSampling, GQA, NeuronConfig\n    from transformers_neuronx.stopping_criteria import StoppingCriteria, StoppingCriteriaList\n\n    from time import time\n    from typing import List, Optional, Callable\n\n\n    class MaxTimeCriteria(StoppingCriteria):\n        \"\"\"\n        This class can be used to stop generation whenever the full generation exceeds some amount of time. By default, the\n        time will start being counted when you initialize this function. 
You can override this by passing an\n        `initial_timestamp`.\n\n        Args:\n            max_time (`float`):\n                The maximum allowed time in seconds for the generation.\n            initial_timestamp (`float`, *optional*, defaults to `time()`):\n                The start of the generation allowed time.\n        \"\"\"\n\n        def __init__(self, max_time: float, initial_timestamp: Optional[float] = None):\n            self.max_time = max_time\n            self.initial_timestamp = time() if initial_timestamp is None else initial_timestamp\n\n        def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:\n            dt = time() - self.initial_timestamp\n            end_condition = dt > self.max_time\n            if end_condition:\n                print(\"Stopping!\")\n            return end_condition\n\n    # Create a streamer. This can also be a custom streamer inherited from transformers.generation.streamers.BaseStreamer\n    class CustomStreamer(transformers.generation.streamers.BaseStreamer):\n        def __init__(self) -> None:\n            self.reset()\n\n        def reset(self):\n            self.token_latencies = []\n            self.iter = 0\n            self.now = time()\n\n        def put(self, tokens):\n            now = time()\n            token_latency = now - self.now\n            print(f\"Iteration {self.iter:4d}: Latency [s] {token_latency:6.3f} -- Token {tokens}\")\n            self.now = now\n            self.iter += 1\n            self.token_latencies.append(token_latency)\n\n\n        def end(self):\n            pass\n\n    # Create and compile the Neuron model\n    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16')\n    model_neuron.to_neuron()\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')\n    text = \"[INST] What is your favourite condiment? [/INST]\"\n    encoded_input = tokenizer(text, return_tensors='pt')\n\n    # Add stopping criteria to stop after 0.5 seconds\n    stopping_criteria_list = StoppingCriteriaList([MaxTimeCriteria(0.5)])\n    streamer = CustomStreamer()\n\n    # Run inference\n    with torch.inference_mode():\n        model_neuron.sample(input_ids=encoded_input.input_ids, sequence_length=256, stopping_criteria_list=stopping_criteria_list, streamer=streamer)\n\n\nSpeculative sampling [Beta]\n---------------------------\n\nTransformers Neuron supports speculative sampling for the ``Llama`` and ``GPT2``\nmodel classes. In speculative sampling, we use a smaller draft model to speculate future tokens.\nThese are then sent to the larger target model, which accepts or rejects these tokens.\nFor more detailed information, see the original proposal by\nDeepMind titled `Accelerating Large Language Model Decoding with Speculative Sampling <https://arxiv.org/abs/2302.01318>`__.\nOur implementation for speculative sampling is lossless. In addition to standalone draft models,\nwe also support `Eagle draft models <https://github.com/SafeAILab/EAGLE>`__.\nCurrently we only support Eagle v1.\n\nIn the following example, we demonstrate how to perform speculative sampling using the ``Llama`` model.\nIn this example, we are performing multinomial sampling.\n\n.. 
code-block:: python\n\n    import torch\n    from transformers import LlamaTokenizer\n    from transformers_neuronx import NeuronAutoModelForCausalLM, NeuronConfig, GenerationConfig\n    from transformers_neuronx.fused_speculation import FusedSpeculativeDecoder\n\n    # Specify path to draft and target\n    draft = '/home/ubuntu/Llama-2-7b-chat-hf'\n    target = '/home/ubuntu/Llama-2-70b-chat-hf'\n\n    # Specify generation parameters\n    gen_kwargs = {\n        \"top_k\": 50,\n        \"top_p\": 0.9,\n        \"do_sample\": True,\n        \"temperature\": 0.7,\n    }\n\n    # Load draft model\n    draft_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(\n            draft,\n            n_positions=1024,\n            batch_size=1,\n            tp_degree=32,\n            amp='bf16',\n            neuron_config=NeuronConfig(\n                padding_side=\"right\",\n                attention_layout=\"BSH\",\n                collectives_layout=\"BSH\",\n                on_device_embedding=True,\n                on_device_generation=GenerationConfig(**gen_kwargs),\n                ),\n            )\n    draft_neuron_model.to_neuron()\n    # Load target model\n    target_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(\n            target,\n            n_positions=1024,\n            batch_size=1,\n            tp_degree=32,\n            amp='bf16',\n            neuron_config=NeuronConfig(\n                padding_side=\"right\",\n                attention_layout=\"BSH\",\n                collectives_layout=\"BSH\",\n                on_device_embedding=True,\n                on_device_generation=GenerationConfig(**gen_kwargs),\n                ),\n            )\n    target_neuron_model.to_neuron()\n\n    # Compile the speculative sampling model\n    # Here we set the speculation length to 4\n    fsd = FusedSpeculativeDecoder(\n            draft_neuron_model,\n            target_neuron_model,\n            4,\n            )\n    fsd.to_neuron()\n\n    # Initialize tokenizer and text prompt\n    tokenizer = LlamaTokenizer.from_pretrained(target)\n    prompt = \"Hello, I'm a generative AI language model.\"\n    inputs = tokenizer(prompt, return_tensors=\"pt\")\n\n    # Call speculative sampling on given input\n    response = fsd.sample(\n        input_ids=inputs.input_ids,\n        attention_mask=inputs.attention_mask,\n        sequence_length=30,\n    )\n\n    # Decode the response\n    generated_text = tokenizer.decode(response[0])\n    print(f\"\\nDecoded tokens: {generated_text}\")\n\nThe following sample shows how to enable EAGLE speculation.\nTo get the EAGLE draft model to work, manually copy the LM head\nweights from the target model to the draft model. Additionally,\nyou need to rename the keys in the draft model's ``state_dict``\nto match those in the target model.\n\n
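The snippet below is a rough, illustrative sketch of this checkpoint preparation. It assumes single-file checkpoints named ``pytorch_model.bin`` and a simple ``model.`` prefix mismatch between the draft and target ``state_dict`` keys; the actual file layout and key names depend on the specific checkpoints you use.\n\n.. code-block:: python\n\n    # Illustrative sketch only -- the file names and key names are assumptions.\n    import torch\n\n    draft_path = '/home/ubuntu/EAGLE-llama2-chat-70B'\n    target_path = '/home/ubuntu/Llama-2-70b-chat-hf'\n\n    draft_state = torch.load(f'{draft_path}/pytorch_model.bin')\n    target_state = torch.load(f'{target_path}/pytorch_model.bin')\n\n    # 1. Copy the LM head weights from the target model into the draft model\n    draft_state['lm_head.weight'] = target_state['lm_head.weight']\n\n    # 2. Rename draft keys so they follow the target model's naming convention\n    renamed_state = {}\n    for key, value in draft_state.items():\n        if key == 'lm_head.weight' or key.startswith('model.'):\n            renamed_state[key] = value\n        else:\n            renamed_state[f'model.{key}'] = value\n\n    torch.save(renamed_state, f'{draft_path}/pytorch_model.bin')\n\nWith the draft checkpoint prepared, the EAGLE speculation sample itself looks like the following:\n\n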
.. code-block:: python\n\n    import torch\n    from transformers import LlamaTokenizer\n    from transformers_neuronx import NeuronAutoModelForCausalLM, NeuronConfig, GenerationConfig\n    from transformers_neuronx.fused_speculation import FusedSpeculativeDecoder\n\n    # Specify path to draft and target\n    # The Eagle draft model can be downloaded from the Eagle website\n    draft = '/home/ubuntu/EAGLE-llama2-chat-70B'\n    target = '/home/ubuntu/Llama-2-70b-chat-hf'\n\n    # Specify generation parameters\n    gen_kwargs = {\n        \"top_k\": 50,\n        \"top_p\": 0.9,\n        \"do_sample\": True,\n        \"temperature\": 0.7,\n    }\n\n    # Load draft model\n    draft_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(\n            draft,\n            n_positions=1024,\n            batch_size=1,\n            tp_degree=32,\n            amp='bf16',\n            neuron_config=NeuronConfig(\n                is_eagle_draft=True,\n                has_pre_attention_norm=False,\n                # Need the above two configs for Eagle\n                padding_side=\"right\",\n                attention_layout=\"BSH\",\n                collectives_layout=\"BSH\",\n                on_device_embedding=True,\n                on_device_generation=GenerationConfig(**gen_kwargs),\n                ),\n            )\n    draft_neuron_model.to_neuron()\n    # Load target model\n    target_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(\n            target,\n            n_positions=1024,\n            batch_size=1,\n            tp_degree=32,\n            amp='bf16',\n            neuron_config=NeuronConfig(\n                is_eagle_target=True,\n                # Need the above config for Eagle\n                padding_side=\"right\",\n                attention_layout=\"BSH\",\n                collectives_layout=\"BSH\",\n                on_device_embedding=True,\n                on_device_generation=GenerationConfig(**gen_kwargs),\n                ),\n            )\n    target_neuron_model.to_neuron()\n\n    # Compile the speculative sampling model\n    # Here we set the speculation length to 4\n    fsd = FusedSpeculativeDecoder(\n            draft_neuron_model,\n            target_neuron_model,\n            4,\n            )\n    fsd.to_neuron()\n\n    # The rest is the same as in the previous example\n\n\nQKV Weight Fusion\n--------------------------------------\n\nConcatenating a model's query, key and value weight matrices often achieves better performance because larger matrices allow\nfor more efficient data movement and compute. QKV weight fusion can be enabled by setting ``fuse_qkv=True`` in the ``NeuronConfig``:\n\n.. code-block:: python\n\n    neuron_config = NeuronConfig(fuse_qkv=True)\n\n\nAttention Layout\n--------------------------------------\n\nThe intermediate tensor layouts in a model's attention layer can impact the\ncompiler's optimization opportunities and thus can impact a model's performance.\nUsing ``(batch, sequence, hidden)`` (or ``BSH``) layout for attention often\nachieves better performance since it can enable better overlapping of compute\nwith collectives and can reduce transposes. We intend to enable ``BSH``\nattention by default in a future release. For now, ``BSH`` attention layout can\nbe enabled by setting ``attention_layout=\"BSH\"`` in the ``NeuronConfig``:\n\n.. 
code-block:: python\n\n    neuron_config = NeuronConfig(attention_layout=\"BSH\")\n\n\nBucketing\n------------------\nLLM inference is a generative process that can produce variable-length sequences.\nThis poses a problem since the Neuron compiler produces executables which expect statically shaped inputs and outputs.\nTo make LLMs work with different shapes, transformers_neuronx generates buckets\nand applies padding wherever it is required.\n\nThere are at least two sets of buckets for each LLM inference that can be set by the user:\n1) Context encoding (pre-fill) buckets and 2) output token generation buckets.\n\n\n**Token generation buckets**\n\nIn token generation, tokens are generated iteratively.\nAt each token position, the transformer only needs to attend to the previous tokens.\nBut in a naive implementation with static shapes, attention would run over the entire KV cache (the full sequence length).\nTo solve this problem, we use token generation buckets.\nToken generation buckets determine the attention lengths.\nFor instance, if the max sequence length is 1024 tokens and the current token\nis at position 120, there is no need to attend to all 1024 tokens in the current step.\nWe can use token generation buckets to attend to different portions of the KV cache.\nBy default, token generation buckets which are powers of 2 starting from 128\ntokens are used (i.e. 128, 256, 512, up to the sequence length). In the example above,\nbucket 128 would be used for position 120, which reduces the wasted compute significantly.\nUsers can change these buckets by setting a list for ``n_positions`` (see the example below).\nOtherwise, if a number is given for ``n_positions`` (sequence length) instead of a list,\nthen the power-of-2 buckets starting from 128 will be used.\nThe last bucket would be ``n_positions`` (the sequence length), even if it is not a power of 2.\n\n**Context encoding buckets**\n\nThe prompt tokens can be processed in parallel.\nAs a result, we need to set the bucket sizes for different estimated lengths of\ninput prompts. We can specify these context bucket sizes using the ``context_length_estimate`` argument.\nIn general, it is better for all the buckets to be multiples of 256 tokens.\nBut adding too many buckets would increase device memory consumption and add extra latency\nfor bucket switching.\nUsually, the powers of 2 starting from 128 tokens are used for\ncontext encoding buckets. If the total sequence length (``n_positions``) is beyond 2048\ntokens, it is desirable to add extra buckets in multiples of 512 or 1024 tokens.\nIt is not recommended to add buckets in multiples of 256 tokens or smaller for context buckets beyond 2k, to avoid bucket switching latency.\nAt runtime, the smallest bucket which fits the input context will be used.\nBy default, the context encoding buckets are set to half of the output-token buckets.\nAdding extra context buckets reduces wasted compute and improves performance.\nHowever, the extra executables consume additional device memory.\n\nNotice that the default output token generation buckets work well for a wide range\nof applications. 
However, the ideal context encoding buckets depend on the specific use case.\nFor instance, if all the requests have a context length of about 1500 +/- 500 tokens,\nadding more buckets close to 1500 might improve context encoding time.\nIn this example, adding buckets of 1024, 1280, 1536, 1792, and 2048 tokens (spaced 256 tokens apart) could help.\nMoreover, the largest context encoding bucket should be larger than the largest context length.\nOtherwise, the performance would degrade significantly.\n\n\nTo set context encoding and token generation buckets manually:\n\n.. code-block:: python\n\n    context_length_estimate = [1024, 1280, 1536, 1792, 2048]    # The best context estimate depends on the use case\n    n_positions = [128, 256, 512, 1024, 2048, 3072]             # Usually default buckets are appropriate\n\n    model = NeuronAutoModelForCausalLM.from_pretrained(\n        'gpt2',\n        batch_size=1,\n        n_positions=n_positions,\n        tp_degree=2,\n        amp='f16',\n        context_length_estimate=context_length_estimate,\n    )\n\n\n\n\nMulti-node inference support (TP/PP)\n---------------------------------------\n\nPrerequisite: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html\n\nWhen models are too large to fit on a single node, Transformers NeuronX multi-node inference (tensor parallel and pipeline parallel) can be used to shard model weights across multiple Neuron instances (only supported on Trn1 and Trn1n). Single-node inference code can easily be extended to multi-node inference.\n\nNote that Transformers NeuronX currently doesn't support multi-node Tensor Parallel and Pipeline Parallel at the same time; when Pipeline Parallel is used, the Tensor Parallel group has to be within a node (TP<=32 on Trn1/Trn1n).\n\nIn the sections below, we first outline the sample code for single-node execution and then provide instructions to migrate the code to multi-node tensor parallel or multi-node pipeline parallel. To start with, the code below is a single-node script that runs the open_llama_3b model with a tensor parallel degree of 32.\n\n.. code-block:: python\n\n    import torch\n    from transformers import AutoTokenizer, AutoConfig\n    from transformers_neuronx import LlamaForSampling, HuggingFaceGenerationModelAdapter\n\n    # Create and compile the Neuron model\n    model = LlamaForSampling.from_pretrained(\"openlm-research/open_llama_3b\", tp_degree=32)\n    model.to_neuron()\n\n    # Use the `HuggingFaceGenerationModelAdapter` to access the generate API\n    config = AutoConfig.from_pretrained(\"openlm-research/open_llama_3b\")\n    model = HuggingFaceGenerationModelAdapter(config, model)\n\n    # Get a tokenizer and example input\n    tokenizer = AutoTokenizer.from_pretrained(\"openlm-research/open_llama_3b\")\n    tokenizer.pad_token_id = tokenizer.eos_token_id\n    tokenizer.padding_side = 'left'\n    text = \"Hello, I'm a language model,\"\n    encoded_input = tokenizer(text, return_tensors='pt', padding=True)\n\n    # Run inference using temperature\n    with torch.inference_mode():\n        model.reset_generation()\n        generated_sequence = model.generate(\n            input_ids=encoded_input.input_ids,\n            attention_mask=encoded_input.attention_mask,\n            do_sample=True,\n            max_length=256,\n            temperature=0.7,\n        )\n\n    print([tokenizer.decode(tok) for tok in generated_sequence])\n\nCommand line:\n\n.. 
code-block:: bash\n\n    python3 multi_node_dev_example.py\n\n**Multi-Node Tensor Parallel**\n\nCompared to single-node tensor parallel, multi-node tensor parallel shards the model weights in the same way but uses more cores across nodes. It also requires that each node's ``model.forward()`` receive exactly the same input; otherwise there will be unexpected behavior (runtime failures, wrong output).\n\nConfigurations (environment variables to be configured on each node):\n\n- ``NEURON_RT_ROOT_COMM_ID``: the master node's ``<IP address>:<port>``\n- ``NEURON_RANK_ID``: rank of the node, 0 means master node\n- ``NEURON_LOCAL_TP``: the local tensor parallel degree on each node\n\nExample:\n\nChange the single-node script to use ``tp=64`` (2 nodes). Set ``torch.manual_seed`` to ensure the sampling loop running on each node samples the same token as the next input.\n\n\nNode 1 command line:\n\n.. code-block:: bash\n\n    NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=0 NEURON_LOCAL_TP=32 python3 multi_node_dev_example.py\n\nNode 2 command line (same as Node 1 but set ``NEURON_RANK_ID`` as 1):\n\n.. code-block:: bash\n\n    NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=1 NEURON_LOCAL_TP=32 python3 multi_node_dev_example.py\n\nYou can also refer to the `Tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-405b-multinode-16k-sampling.ipynb>`__ to run the Llama 3.1 405B multi-node 16k sampling tutorial with multi-node tensor parallel.\n\n**Multi-Node Pipeline Parallel**\n\nWhile the weight tensors are sharded with tensor parallelism, you can also use pipeline parallelism to partition the layers across different nodes; the intermediate (hidden) tensors are transferred from one pipeline stage (node) to the next. The final output is sent from the last pipeline stage back to the first pipeline stage.\n\nCompared to multi-node tensor parallel, on non-zero ranks the ``model.forward`` in pipeline parallel falls back to a while loop and blocks on the input broadcast from the master.\n\nConfigurations (environment variables to be configured on each node):\n\n- ``NEURON_RT_ROOT_COMM_ID``: the master node's ``<IP address>:<port>``\n- ``CPU_COMM_ID``: similar to ``NEURON_RT_ROOT_COMM_ID``, but must be set with a different port\n- ``NEURON_RANK_ID``: rank of the node, 0 means master node\n- ``NEURON_PP_STAGES``: number of pipeline stages (nodes)\n\nExample:\n\nKeep the original single-node script with tp=32.\n\nNode 1 command line:\n\n.. code-block:: bash\n\n    NEURON_PP_STAGES=2 CPU_COMM_ID=10.1.201.64:8989 NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=0 python3 multi_node_dev_example.py\n\nNode 2 command line (same as Node 1 but set ``NEURON_RANK_ID`` as 1):\n\n.. code-block:: bash\n\n    NEURON_PP_STAGES=2 CPU_COMM_ID=10.1.201.64:8989 NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=1 python3 multi_node_dev_example.py\n\n\nLong Sequence length support up to 128k\n---------------------------------------\n**Flash Attention**\n\nWith the integration of the FlashAttention kernel, developers can use longer sequence lengths for LLAMA models. The Flash Attention kernel is automatically used when the input sequence length is greater than 8k without any additional configuration. 
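As a rough, illustrative sketch (the model path, ``tp_degree``, and other parameters below are assumptions, not a tested configuration), simply requesting a long sequence length when loading the model is enough to pick up the kernel:\n\n.. code-block:: python\n\n    from transformers_neuronx import LlamaForSampling\n\n    # Sequence lengths above 8k automatically use the Flash Attention kernel\n    model = LlamaForSampling.from_pretrained('meta-llama/Meta-Llama-3-8B', batch_size=1, tp_degree=32, n_positions=32768, amp='bf16')\n    model.to_neuron()\n\n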
Refer to the `Tutorial <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb>`__ for usage of a 32k sequence length on a variation of the LLAMA3-8B model.\n\n**Flash Decoding**\n\nFlash Decoding (FD) is a technique that significantly speeds up attention during inference, especially for long-context\ntasks in large language models (LLMs) with GQA.\n\n.. image:: ./flash_decoding.gif\n   :alt: Flash Decoding\n   :width: 800px\n   :align: center\n\nWith the integration of FD, developers can achieve faster inference with larger sequence lengths\nand batch sizes by reducing KV cache replication.\nRefer to the `Tutorial <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-128k-sampling.ipynb>`__ on flash decoding usage for 128k sequence length sampling. Flash decoding\ncan be enabled by setting the flag ``shard_over_sequence=True`` in the ``NeuronConfig``:\n\n.. code-block:: python\n\n    neuron_config = NeuronConfig(shard_over_sequence=True)\n\n\nNote that you can skip the first AllGather introduced by flash decoding at the cost of duplicating the Q weights. This is only recommended for relatively small models (e.g. 3B, 8B) and large batch sizes.\n\n.. code-block:: python\n\n    neuron_config = NeuronConfig(shard_over_sequence=True, duplicate_q_weight_sos=True)\n\n**Known limitations and FAQs**\n\n- Flash decoding is expected to have performance degradation (PTL) for smaller sequence and batch sizes. We recommend flash decoding when **batch-size x sequence length > 16k**.\n- Flash decoding support is not enabled for the following features:\n\n - Speculative Decoding\n - Multi Head Attention (MHA) models\n\n\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-misc.rst",
    "content": ".. _transformers-neuronx-misc:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\nMisc (``transformers-neuronx``)\n===============================\n\n\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-misc.txt",
    "content": "* :ref:`transformers-neuronx-rn`"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-tutorials.rst",
    "content": ".. _transformers_neuronx_tutorials:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This topic is currently archived and not maintained. It is provided for reference only.\n\nTransformers NeuronX Tutorials \n===============================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>\n    Hugging Face facebook/opt-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-13b-sampling.ipynb>\n    Hugging Face facebook/opt-30b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-30b-sampling.ipynb>\n    Hugging Face facebook/opt-66b autoregressive sampling on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-66b-sampling.ipynb>\n\n\n.. include:: /libraries/transformers-neuronx/transformers-neuronx-tutorials.txt\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx-tutorials.txt",
    "content": "* `Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`_\n* `Hugging Face facebook/opt-13b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-13b-sampling.ipynb>`_\n* `Hugging Face facebook/opt-30b autoregressive sampling on Inf2 & Trn1 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-30b-sampling.ipynb>`_\n* `Hugging Face facebook/opt-66b autoregressive sampling on Inf2 <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-66b-sampling.ipynb>`_\n\n\n"
  },
  {
    "path": "archive/transformers-neuronx/transformers-neuronx.txt",
    "content": ".. dropdown::  Setup  (``transformers-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/transformers-neuronx/setup/index.rst\n\n\n.. dropdown::  Developer Guide  (``transformers-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/transformers-neuronx/developer-guide.txt\n\n\n.. dropdown::  Tutorials  (``transformers-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/transformers-neuronx/transformers-neuronx-tutorials.txt\n\n\n.. dropdown::  Misc  (``transformers-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/transformers-neuronx/transformers-neuronx-misc.txt\n"
  },
  {
    "path": "archive/tutorials/finetune_t5.rst",
    "content": ".. _torch-hf-t5-finetune:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nFine-tune T5 model on Trn1\n================================\n\n.. note:: \n   This page was archived on 7/31/2025.\n\n\nIn this tutorial, we show how to fine-tune a Hugging Face (HF) T5 model \nusing HF trainer API. This example fine-tunes a `T5 model for\na text-summarization <https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization>`__ task on CNN/DailyMail dataset.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. include:: /frameworks/torch/torch-neuronx/tutorials/note-performance.txt\n\nSetup and compilation\n---------------------\n\nBefore running the tutorial please follow the installation instructions at:\n\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`\n\nPlease set the storage of instance to *512GB* or more if you also want to run through the BERT pretraining and GPT pretraining tutorials.\n\nFor all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:\n\n.. code:: shell\n\n   source ~/aws_neuron_venv_pytorch/bin/activate\n\nFirst we install a recent version of HF transformers, scikit-learn and evaluate packages in our environment as well as download the source matching the installed version. In this example, we chose version 4.26.0 and the text summarization example from HF transformers source:\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_setup_code.sh\n   :language: shell\n   :lines: 5-9\n\nSingle-worker training\n----------------------\n\nWe will run text-summarization fine-tuning task following the example in\nREADME.md located in the path\n`~/transformers/examples/pytorch/summarization.`\n\nWe use full BF16 casting using `XLA_USE_BF16=1` to enable best\nperformance. First, paste the following script into your terminal to\ncreate a “run.sh” file and change it to executable:\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh\n   :language: shell\n   :lines: 7-46\n\nWe optionally precompile the model and training script using\n`neuron\\_parallel\\_compile <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/pytorch-neuron-parallel-compile.html?highlight=neuron_parallel_compile>`__ to warm up the persistent graph cache (Neuron\nCache) such that the actual run has fewer compilations (faster run\ntime):\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh\n   :language: shell\n   :lines: 49\n\nNote: For these auto-regressive models, do not run the\n``predict_with_generate`` method when doing the precompile step. This is\nbecause the ``neuron_parallel_compile`` utility will run the training\nscript in graph extraction mode and no actual execution of the graph\nwill be done. Hence, the outputs at each step are invalid. Since the\nauto-regressive generation at each step is dependent on output of\nprevious step, the generate step would fail since the outputs from\nprevious steps are invalid.\n\nPrecompilation is optional and only needs to be done once unless\nhyperparameters such as batch size are modified. After the optional\nprecompilation, the actual run will be faster with minimal additional\ncompilations.\n\n.. 
literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh\n   :language: shell\n   :lines: 51\n\nIf precompilation was not done, the first execution of ./run.sh will be\nslower due to serial compilations. Rerunning the same script a second\ntime would show quicker execution as the compiled graphs will be already\ncached in persistent cache.\n\nRunning the above script will run the T5-small fine-tuning on a single\nprocess.\n\n**Note:** As you may have noticed, we are not running the\n``predict_with_generate`` as part of training. This is because,\n``predict_with_generate`` requires auto-regressive sampling where the\ninputs to the decoder are created by appending outputs of previous\nsteps. This causes the inputs to the decoder to change shape and thereby\nresulting in a new graph. In other words, the current ``generate`` api\nprovided by HF transformers leads to repeated compilations. We are working on\nbuilding a Neuron friendly version of ``generate`` api and it will be\nmade available as part of future release. This will enable us to run\n``predict_with_generate`` as part of training script.\n\nAs a workaround, we can run the ``predict_with_generate`` on CPU after\nthe model is trained. Once training is completed, a trained checkpoint\nwould be saved. We can load the trained model and run the\n``predict_with_generate`` to compute the final accuracy.\n\nTo do so, in run_summarization.py, add the following before ``transformers`` get imported.\nThis can be done by adding the below lines before all the ``imports``:\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh\n   :language: python\n   :lines: 55-59\n\nYou can now run the following and it should run the predict method on CPU device.\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh\n   :language: shell\n   :lines: 67-78\n\nNote: To run on CPU, we need to make sure that NEURON\\_NUM\\_DEVICES is\nset to 0. This will make sure no xla\\_devices are created and the\ntrainer would use the default device (CPU).\n\n.. _multi_worker_training:\n\nMulti-worker Training\n---------------------\n\nThe above script will run one worker on one NeuronCore. To run on\nmultiple cores, first add these lines to top of run\\_summarization.py to disable\nDistributed Data Parallel (DDP) when using torchrun (see Known issues\nand limitations section below):\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_modify_run_summarization_code.sh\n   :language: python\n   :lines: 8-10\n\nThen launch the run\\_summarization.py script with torchrun using\n--nproc\\_per\\_node=N option to specify the number of workers (N=2 for\ntrn1.2xlarge, and N=2, 8, or 32 for trn1.32xlarge). The following\nexample runs 2 workers. Paste the following script into your terminal to\ncreate a “run\\_2w.sh” file and change it to executable:\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 7-46\n\nAgain, we optionally precompile the model and training script using\nneuron\\_parallel\\_compile to warm up the persistent graph cache (Neuron\nCache), ignoring the results from this precompile run as it is only for\nextracting and compiling the XLA graphs:\n\n.. 
literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 49\n\nPrecompilation is optional and only needs to be done once unless\nhyperparameters such as batch size are modified. After the optional\nprecompilation, the actual run will be faster with minimal additional\ncompilations.\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 51\n\nDuring the run, you will notice that the “Total train batch size” is now\n8 and the “Total optimization steps” is now half the number for single-worker\ntraining. Also, if you open ``neuron-top`` in a separate terminal,\nyou should see 2 cores being utilized.\n\nTo train the T5-large model, you can set the ``model_name_or_path`` argument to ``t5-large``.\nPlease note that currently running ``t5-large`` on a trn1-2xl machine can result in ``HOST OOM`` during\ncompilation. Hence, it is recommended that you run ``t5-large`` model training on a trn1-32xl machine.\n\nOn a trn1-32xl machine, you can create a run_32w.sh on the terminal using the following commands:\n\n.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_32_worker_training_code.sh\n   :language: shell\n   :lines: 7-46\n\nYou can now follow the same steps as listed above. This script would run a t5-large model by launching the training script\nusing 32 data-parallel workers.\n\n\n.. _t5_known_issues:\n\nKnown issues and limitations\n----------------------------\n\nThe following are currently known issues:\n\n-  Long compilation times: this can be alleviated with the\n   ``neuron_parallel_compile`` tool, which extracts graphs from a short trial run and\n   compiles them in parallel ahead of the actual run, as shown above.\n- T5-Large compilation causing processes to get killed on trn1-2xl: It is recommended\n  to run ``t5-large`` model training on a trn1-32xl machine, as it avoids CPU OOM and also provides\n  faster training by making use of 32 data-parallel workers.\n"
  },
  {
    "path": "archive/tutorials/finetuning_llama2_7b_ptl.rst",
    "content": ".. _llama2_7b_tp_zero1_ptl_finetune_tutorial:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nFine-tuning Llama2 7B with tensor parallelism and ZeRO-1 optimizer using Neuron PyTorch-Lightning\n=================================================================================================\n\nThis tutorial shows how to fine-tune Llama2 7B with tensor parallelism and ZeRO-1 using Neuron PyTorch-Lightning APIs. For pre-training information and additional context, see the Llama2 7B Tutorial\nand :ref:`Neuron PT-Lightning Developer Guide <ptl_developer_guide>`. \n\n\nSetting up the environment\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor this experiment, we will use AWS ParallelCluster with at least four trn1.32xlarge compute nodes.\nTo set up a cluster and prepare it for use, see `Train your model on ParallelCluster <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`__.\nTo set up the packages on the head node of the cluster, see\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`.\n\nInstall the ``neuronx-distributed`` package inside the virtual environment using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n\nNext, download the scripts for fine-tuning.\n\n\n1. Create a directory to hold the experiments.\n\n.. code:: ipython3\n\n   mkdir -p ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl\n   cd ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl\n\n2. Download training scripts for the experiments.\n\n.. code:: ipython3\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/data_module.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/module_llama.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/tp_zero1_llama2_7b_hf_finetune_ptl.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/tp_zero1_llama2_7b_hf_finetune_ptl.sh\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/finetune_config/config.json\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lr.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/modeling_llama_nxd.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements.txt\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements_ptl.txt\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/training_utils.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/convert_checkpoints.py\n\n3. Install the additional requirements and give the right permissions to the shell script.\n\n.. 
code:: ipython3\n\n   python3 -m pip install -r requirements.txt\n   python3 -m pip install -r requirements_ptl.txt  # Currently we're supporting Lightning version 2.4.0\n   python3 -m pip install optimum-neuron==0.0.18 nltk  # Additional dependencies for evaluation\n   python3 -m pip install --no-warn-conflicts transformers==4.32.1   # Pin transformers version 4.32.1\n   chmod +x tp_zero1_llama2_7b_hf_finetune_ptl.sh\n\nDownload the Llama2-7B pre-trained checkpoint from HuggingFace.\n\n\n1. Create a Python script ``get_model.py`` with the following lines:\n\n.. code:: ipython3\n\n   import torch\n   from transformers.models.llama.modeling_llama import LlamaForCausalLM\n   model = LlamaForCausalLM.from_pretrained(\"NousResearch/Llama-2-7b-hf\")\n   torch.save(model.state_dict(), \"llama-7b-hf-pretrained.pt\")\n\n2. Run the download script and conversion script to pull and convert the checkpoint. Note that the conversion script requires a large amount of memory, so log in to a compute node to run it:\n\n.. code:: ipython3\n\n   ssh compute1-dy-training-0-1\n   source ~/aws_neuron_venv_pytorch/bin/activate\n   cd ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl\n   python3 get_model.py\n   python3 convert_checkpoints.py --tp_size 8 --convert_from_full_model --config config.json --input_dir llama-7b-hf-pretrained.pt --output_dir llama7B-pretrained/pretrained_weight\n\n3. (Optional) If you are loading the checkpoint from a different directory, set the checkpoint path by adding the following flag to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:\n\n   * ``--pretrained_ckpt``.\n\n   This provides the path to the pre-trained checkpoint to be loaded.\n\nThen, set the dataset for the fine-tuning job. In this example, we will use Dolly, which is an open source dataset\nof instruction-following records on categories outlined in the InstructGPT paper, including brainstorming, classification,\nclosed QA, generation, information extraction, open QA, and summarization.\n\n.. code-block:: json\n\n   {\n     \"instruction\": \"Alice's parents have three daughters: Amy, Jessy, and what's the name of the third daughter?\",\n     \n     \"context\": \"\",\n     \n     \"response\": \"The name of the third daughter is Alice\"\n   }\n\nConfigure the following flags in ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:\n\n.. code:: ipython3\n\n   --data_dir \"databricks/databricks-dolly-15k\" \\\n   --task \"open_qa\"\n\nAt this point, you are all set to start fine-tuning.\n\nRunning fine-tuning\n^^^^^^^^^^^^^^^^^^^\n\nBy this step, the cluster is all set up for running experiments.\nBefore running training, first pre-compile the graphs using the :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\nRun the command below:\n\n.. code:: ipython3\n\n   sbatch --exclusive \\\n   --nodes 1 \\\n   --wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama2_7b_hf_finetune_ptl.sh\"\n\nThis script uses a tensor-parallel size of 8.\nThis automatically sets the zero-1 sharding degree to 4 (32 workers / tensor_parallel_size).\n\n`Note`: You can use any number of nodes in this case by adjusting the number of nodes in the above\nSlurm command accordingly. Also, the number of nodes used in the parallel_compile command should be the same as the number used in the actual\ntraining run. This is because, as the number of nodes changes, the data-parallel degree changes too. This\nresults in more workers participating in operations like `gradient all-reduce`, which results in new graphs getting\ncreated. 
\n\nAfter the graphs are compiled, you can run training and observe how the loss goes down.\nBefore the actual fine-tuning starts, we need to prepare the dataset:\n\n.. code:: ipython3\n\n   python3 -c \"import nltk; nltk.download('punkt')\"\n\nTo run the training, run the above command without ``neuron_parallel_compile``:\n\n.. code:: ipython3\n\n   sbatch --exclusive \\\n   --nodes 1 \\\n   --wrap=\"srun bash $(pwd)/tp_zero1_llama2_7b_hf_finetune_ptl.sh\"\n\nAt the end of fine-tuning, run evaluation once with a test data split by generating sentences and calculating ROUGE scores.\nThe final evaluation results and ROUGE score are then printed in your terminal.\n\n\nCheckpointing\n^^^^^^^^^^^^^^\n\nTo enable checkpoint saving, add the following flags to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:\n\n* ``--save_checkpoint`` Enables checkpoint saving.\n* ``--checkpoint_freq`` Number of steps between saving checkpoints.\n* ``--checkpoint_dir`` Directory to save the checkpoint to.\n* ``--num_kept_checkpoint`` Number of checkpoints to keep. Older checkpoints are deleted. Set to -1 to keep all saved checkpoints.\n* ``--save_load_xser`` Saves and loads with torch_xla serialization to reduce save and load time. We recommend enabling xser for significantly faster save and load times. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa.\n\nTo enable checkpoint loading, add the following flags to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:\n\n* ``--resume_ckpt`` Resumes training from a saved checkpoint.\n* ``--load_step`` The step to retrieve the checkpoint from.\n* ``--checkpoint_dir`` Directory to load the checkpoint from.\n* ``--save_load_xser`` Saves and loads with torch_xla serialization to reduce save and load time. We recommend enabling xser for significantly faster save and load times. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa.\n"
  },
  {
    "path": "archive/tutorials/gpt3_neuronx_nemo_megatron_pretraining.rst",
    "content": ".. _gpt3_neuronx_nemo_megatron_pretraining:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nLaunch a GPT-3 pretraining job using neuronx-nemo-megatron\n==========================================================\n\nArchived tutorials for gpt3 pretraining using neuronx-nemo-megatron\n  * `Launch a GPT-3 23B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n  * `Launch a GPT-3 46B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n  * `Launch a GPT-3 175B pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_"
  },
  {
    "path": "archive/tutorials/megatron_gpt_pretraining.rst",
    "content": ".. _megatron_gpt_pretraining:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nMegatron GPT Pretraining\n========================\n\n.. note:: \n   This page was archived on 7/31/2025.\n\nIn this example, we will compile and train a Megatron GPT model on a single instance or\non multiple instances using ParallelCluster with the NxD Training library.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nParallelCluster Setup\n^^^^^^^^^^^^^^^^^^^^^\n\nIn this example, we will use 8 instances with ParallelCluster,\nplease follow the instructions here to create a cluster:\n`Train your model on ParallelCluster\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`_\n\nParallelCluster automates the creation of trn1 clusters,\nand provides the SLURM job management system for scheduling and managing distributed training jobs.\nPlease note that the home directory on your ParallelCluster\nhead node will be shared with all of the worker nodes via NFS.\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nOnce you have launched a trn1 instance or ParallelCluster,\nplease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuronx.html#setup-torch-neuronx>`_.\n\nNext, we will need to install NxD Training and its dependencies.\nPlease see the following installation guide for installing NxD Training:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`\n\n\nDownload the dataset\n--------------------\n\nThis tutorial makes use of a preprocessed Wikipedia dataset that is stored in S3.\nThe dataset can be downloaded to your cluster or instance by running\nthe following commands on the head node or your trn1 instance:\n\n.. code-block:: bash\n\n    export DATA_DIR=~/examples_datasets/gpt2\n    mkdir -p ${DATA_DIR} && cd ${DATA_DIR}\n    wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json\n    wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt .  --no-sign-request\n\n\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. 
In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``megatron_gpt_config``. The default config here is a 6.7B parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage for the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see a message similar to this:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nThen, you know your compilation has successfully completed.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\nIf you are using ParallelCluster, then you will need to update the ``conf/megatron_gpt_config.yaml``\nwith\n\n.. code-block:: yaml\n\n    num_nodes: 8\n\nThen to run the compile job:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nOnce you have launched the precompilation job, run the squeue command to view the\nSLURM job queue on your cluster. If you have not recently run a job on your cluster,\nit may take 4-5 minutes for the requested trn1.32xlarge nodes to be launched and initialized.\nOnce the job is running, squeue should show output similar to the following:\n\n.. code-block:: bash\n\n    JOBID  PARTITION  NAME      USER    ST  TIME  NODES NODELIST(REASON)\n    10     compute1   wrap      ubuntu  R   5:11  8     compute1-dy-queue1-i1-[0-7]\n\nYou can view the output of the precompilation job by examining the file named\n``slurm-ZZ.out``,\nwhere ZZ represents the JOBID of your job in the squeue output above.\n\n.. code-block:: bash\n\n    tail -f slurm-10.out\n\nOnce the precompilation job is complete, just like the above output\nyou should see a message similar to the following in the logs:\n\n.. 
code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nAt this point, you can press ``CTRL-C`` to exit the tail command.\n\nTraining the model\n------------------\n\nThe pre-training job is launched almost exactly the same way as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start pre-training.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    ./train.sh\n\nIf you are using ParallelCluster:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nAs outlined above, you can again use the ``squeue`` command to view the job queue,\nand also monitor the job in the same way with the ``tail`` command to see the training logs.\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. code-block:: bash\n\n    Epoch 0:   0%|          | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]\n    Epoch 0:   0%|          | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]\n    Epoch 0:   0%|          | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/``.\nOnce you have identified the directory, cd into it, and then launch TensorBoard:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node:\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during an ongoing training job,\nfirst SSH into one of your compute nodes from the head node (if using ParallelCluster), and then run ``neuron-top``:\n\n.. 
code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with NxD Training, please see:\n:ref:`NxD Training Known Issues <nxdt_known_issues>`\n\nFor ParallelCluster issues see:\n`AWS ParallelCluster Troubleshooting <https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting-v3.html>`_\n"
  },
  {
    "path": "archive/tutorials/multinode-training-model-profiling.rst",
    "content": ".. meta::\n    :description: Learn how to use Neuron Explorer to analyze performance during multi-node training on AWS Trainium instances with SLURM job scheduling\n    :date-modified: 12/02/2025\n\nProfiling Multi-Node Training Jobs with Neuron Explorer\n========================================================\n\nThis tutorial demonstrates how to use Neuron Explorer to analyze performance during multi-node training on AWS Trainium instances. We will run a scaled-down version of the :doc:`NxD Training Llama3 8B tutorial </libraries/nxd-training/tutorials/hf_llama3_8B_pretraining>` across 2 nodes, capture performance traces, and visualize them using Perfetto. we will run training across 2 nodes with reduced steps and layers so that compilation and profiling complete quickly.\n\nPrerequisites\n-------------\n\n* Access to a multi-node Trainium cluster (4 nodes in this example)\n* Neuron SDK installed and configured along with :doc:`NxD Training library installation </libraries/nxd-training/general/installation_guide>`\n* Review of the :doc:`NxD Training Llama3 8B tutorial </libraries/nxd-training/tutorials/hf_llama3_8B_pretraining>`\n* Familiarity with SLURM job scheduling\n\nSetup and Configuration\n-----------------------\n\nStep 1: Initial Setup\n~~~~~~~~~~~~~~~~~~~~~~\n\nA. Download the dataset script:\n\n.. code-block:: bash\n\n    # Download get_dataset.py\n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/get_dataset.py\n\nB. Create a directory for dataset and get the corresponding config file -\n\n.. code-block:: bash\n\n    mkdir ~/examples_datasets/ && cd ~/examples_datasets/\n\n    # Download config.json \n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json ~/\n\nC. Get the tokenizer using the following code snippet -\n\n.. code-block:: python\n\n    # tokenizer.py\n    from huggingface_hub import login\n    from transformers import AutoTokenizer\n\n    login(token='YourHuggingFaceToken')\n\n    tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')\n\n    tokenizer.save_pretrained(\".\")\n\n.. code-block:: bash\n\n    python3 tokenizer.py\n\nD. Run the get_dataset.py -\n\n.. code-block:: bash\n\n    python3 ~/get_dataset.py --llama-version 3\n\nE. Clone neuronx-distributed-training git repo\n\n.. code-block:: bash\n\n    cd ~\n    git clone https://github.com/aws-neuron/neuronx-distributed-training.git\n    cd ~/neuronx-distributed-training/examples\n\nStep 2: Modify the Configuration Files\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUpdate the training configuration to minimize runtime while still generating useful profiling data:\n\n1. In ``hf_llama3_8B_config.yaml``, make the following changes:\n\n.. code-block:: yaml\n\n    max_steps: 5             # Run only 5 steps for faster turnaround\n    num_layers: 2            # Reduce model depth to 2 layers\n    num_nodes: 2             # Run only 2 nodes\n    global_batch_size: 32    # Set a relatively smaller GBS to avoid large trace volume\n\nThese changes ensure the job compiles and runs quickly while still exercising the profiler.\n\n2. In ``train.sh``, set the configuration file name:\n\n.. 
code-block:: bash\n\n    CONF_FILE=hf_llama3_8B_config\n\nThis ensures the job runs with your modified config.\n\nStep 3: Compile the Model\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBefore training, the model must be compiled into Neuron Executable Files (NEFFs). To do this:\n\n.. code-block:: bash\n\n    export COMPILE=1 \n    export CONF_FILE=hf_llama3_8B_config\n\n    sbatch --exclusive \\\n        --nodes=2 \\\n        --cpus-per-task=128 \\\n        --wrap=\"srun ./train.sh\"\n\n* ``COMPILE=1`` tells the script to run in compile-only mode.\n* ``--nodes=2`` requests 2 Trainium nodes for compilation.\n* ``srun ./train.sh`` launches the job via Slurm across the allocated nodes.\n\n.. note::\n   The first compilation may take some time depending on the model size. Once compiled, NEFFs are cached for reuse in later training runs.\n\nStep 4: Run the Training Job with Profiling Enabled\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNow that compilation is done, we can run the training job while enabling Neuron Explorer:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    export CONF_FILE=hf_llama3_8B_config\n\n    NEURON_RT_INSPECT_DEVICE_PROFILE=1 NEURON_RT_INSPECT_ENABLE=1 \\\n    NEURON_RT_INSPECT_OUTPUT_DIR=./output \\\n    sbatch --exclusive \\\n        --nodes=2 \\\n        --cpus-per-task=128 \\\n        --wrap=\"srun ./train.sh\"\n\nHere's what's happening:\n\n* ``COMPILE=0``: Use precompiled NEFFs instead of recompiling.\n* ``NEURON_RT_INSPECT_ENABLE=1``: Turns on runtime inspection for profiling.\n* ``NEURON_RT_INSPECT_OUTPUT_DIR=./output``: All profiler logs will be saved into the ``./output`` directory.\n* Slurm runs the job across 2 nodes with 128 CPUs per task.\n\nAt the end of this step, you should see an output directory containing runtime inspection logs from each node.\n\nStep 5: Generate a Perfetto Profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron Explorer produces raw trace data. To visualize it, convert the logs into a Perfetto compatible trace file:\n\n1. Run the Neuron Explorer CLI:\n\n.. code-block:: bash\n\n    neuron-profile view -d ./output --output-format perfetto\n\nThis command consolidates the logs and generates a Perfetto compatible trace file.\n\nStep 6: Visualize in Perfetto\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n1. Download the generated trace file to your local machine.\n2. Open the Perfetto UI.\n3. Drag and drop the trace file into the browser window.\n\nYou'll now see a timeline view of your training job, including kernel execution, operator scheduling, and activity across NeuronCores. This visualization helps you identify compute vs. memory bottlenecks, idle time, and overall efficiency of the training job.\n\nStep 7: Understanding the System Level Profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOnce the profile is loaded in Perfetto, you'll see both nodes (2 in our case) along with their workers, listed on the left-hand side as process IDs (PIDs). Each worker captures the same trace, so expanding any one of them will give you the information you need. The key runtime event to focus on is the Neuron Runtime API call named ``nc_exec_running``. This API is responsible for executing a Neuron Executable File (NEFF) on the NeuronCores.\n\nIf you hover over or click on one of these calls, Perfetto will display details about which NEFF is being executed. While you may see other runtime API calls, our primary interest is in ``nc_exec_running`` since it directly represents the model execution on Neuron hardware.\n\n.. 
image:: /tools/profiler/images/multinode-training-1.png\n\nIn the example trace shown, the calls to ``nc_exec_running`` appear back-to-back with no significant delays in between. This indicates that, at a system level, the runtime is efficiently dispatching work to the NeuronCores. The ``model_name`` field in the arguments section displays the name of the NEFF being executed by the corresponding ``nc_exec_running`` call.\n\nStep 8: Linking to device level profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNow that we can read the NEFF name from the ``nc_exec_running`` API call, we can visualize the device-level profile for that NEFF, which shows how the model performs on a given NeuronCore. To do this, navigate to the compiler cache directory on your Trainium cluster (if you are following this tutorial, it is set by ``compiler_cache_url`` in the config YAML file), locate the module directory that matches the NEFF name, and you will see artifacts like the following:\n\n.. code-block:: text\n\n    ├── compile_flags.json\n    ├── model.done\n    ├── model.hlo_module.pb\n    └── model.neff
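\n\nTo drill down from the system-level trace into that NEFF, you can feed the NEFF and the corresponding device trace (NTFF) captured under your ``NEURON_RT_INSPECT_OUTPUT_DIR`` back into ``neuron-profile``. The sketch below uses placeholder paths, and the ``-n``/``-s`` options are as described in the Neuron profiler documentation; check ``neuron-profile view --help`` for the exact options available in your SDK version:\n\n.. code-block:: bash\n\n    # Placeholder paths -- substitute your compiler cache module directory\n    # and the device trace file produced under ./output\n    cd /path/to/compiler_cache/<module_directory>\n    neuron-profile view -n model.neff -s /path/to/output/<device_trace>.ntff\n"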
  },
  {
    "path": "archive/tutorials/nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/adamw_fp32_optim_params.py ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/get_dataset.py ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/requirements.txt ./\npython3 -m pip install -r requirements.txt\n\npython3 get_dataset.py\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_dp_gpt_neox_20b_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/tp_dp_gpt_neox_20b_hf_pretrain.sh\"\n"
  },
  {
    "path": "archive/tutorials/nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain/\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/adamw_fp32_optim_params.py ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/get_dataset.py ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/requirements.txt ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py ./\nln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/utils.py ./\npython3 -m pip install -r requirements.txt\n\npython3 get_dataset.py\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_dp_gpt_neox_6.9b_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--wrap=\"srun bash $(pwd)/tp_dp_gpt_neox_6.9b_hf_pretrain.sh\"\n"
  },
  {
    "path": "archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_13b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/lightning\nchmod +x run_llama_13b_tp_pp_ptl.sh\nmkdir 13B_config\ncp ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain/13B_config_llama2/config.json ./13B_config\n\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama_13b_tp_pp_ptl.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama_13b_tp_pp_ptl.sh\"\n"
  },
  {
    "path": "archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/lightning\nchmod +x run_llama_70b_tp_pp_ptl.sh\nmkdir 70B_config\ncp ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain/70B_config_llama2/config.json ./70B_config\n\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama_70b_tp_pp_ptl.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama_70b_tp_pp_ptl.sh\"\n"
  },
  {
    "path": "archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain\nchmod +x tp_zero1_llama2_7B_hf_pretrain.sh\nln -sf 7B_config_llama2/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh\"\n"
  },
  {
    "path": "archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/lightning\nln -sf ~/neuronx-distributed/examples/training/llama/get_dataset.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/lr.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/modeling_llama_nxd.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/requirements.txt ./\nln -sf ~/neuronx-distributed/examples/training/llama/requirements_ptl.txt ./\nln -sf ~/neuronx-distributed/examples/training/llama/training_utils.py ./\n\npython3 -m pip install -r requirements.txt\npython3 -m pip install -r requirements_ptl.txt  # Currently we're supporting Lightning version 2.1.0"
  },
  {
    "path": "archive/tutorials/ssd300_demo/requirements.txt",
    "content": "numpy>1.18.5\ntensorflow_neuron==1.15.5.2.8.9.0\nneuron_cc==1.13.5.0\ntensorflow-serving-api==1.15.0\ntorch>=1.0,<2.0\ntorchvision<1.0\nmatplotlib<4.0\nCython<0.29\npycocotools==2.0.1\ntensorflow-serving-api==1.15.0\n"
  },
  {
    "path": "archive/tutorials/ssd300_demo/ssd300_demo.rst",
    "content": ".. _tensorflow-ssd300:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nRunning SSD300 with AWS Neuron\n==============================\n\n.. note:: \n   This page was archived on 7/31/2025.\n\n*Update 11/16: The model checkpoint\nlink*\\ https://api.ngc.nvidia.com/v2/models/nvidia/ssdpyt_fp32/versions/1/files/nvidia_ssdpyt_fp32_20190225.pt\\ *is\ncurrently broken and the AWS Neuron team is working on providing an\nalternative source.*\n\n\nThis demo shows a Neuron compatible SSD300 implementation that is\nfunctionally equivalent to open source SSD300 model. This demo uses\nTensorFlow-Neuron, PyTorch SSD300 model and checkpoint\n(https://pytorch.org/hub/nvidia_deeplearningexamples_ssd/) and also\nshows the performance achieved by the Inf1 instance.\n\nTable of Contents\n-----------------\n\n1. Launch EC2 instance and update AWS Neuron SDK software\n2. Generating Neuron compatible SSD300 TensorFlow SavedModel\n\n   -  Convert open source PyTorch SSD300 model and checkpoint into\n      Neuron compatible SSD300 TensorFlow SavedModel\n\n3. Evaluate the generated SSD300 TensorFlow SavedModel for both accuracy\n   and performance\n\n   -  Running threaded inference through the COCO 2017 validation\n      dataset\n\nLaunch EC2 instances and update tensorflow-neuron and neuron-cc\n---------------------------------------------------------------\n\nFor this demo, launch one inf1.xlarge EC2 instance. We recommend using\nthe latest Ubuntu 18 Deep Learning AMI (DLAMI).\n\nPlease configure your ubuntu16/ubuntu18/yum repo following the steps in\nthe :ref:`install-neuron-tensorflow` in order to install\n``tensorflow-model-server-neuron``.\n\nGenerating Neuron compatible SSD300 TensorFlow SavedModel\n---------------------------------------------------------\n\nFirst connect to your inf1.xlarge instance\n\nCompile open source PyTorch SSD300 model and checkpoint into Neuron compatible SSD300 TensorFlow SavedModel\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn the same directory ssd300_demo, run the following:\n\n1. Create venv and install dependencies\n\n.. code:: bash\n\n   sudo apt update\n   sudo apt install g++ python3-dev python3-venv unzip\n   sudo apt install tensorflow-model-server-neuron\n   python3 -m venv env\n   source ./env/bin/activate\n   pip install pip setuptools --upgrade\n   pip install -r ./requirements.txt --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n2. Clone NVIDIA's DeepLearningExamples repo that contains PyTorch\n   SSD300.\n\n.. code:: bash\n\n   git clone https://github.com/NVIDIA/DeepLearningExamples.git\n   cd DeepLearningExamples\n   git checkout a644350589f9abc91b203f73e686a50f5d6f3e96\n   cd ..\n\n3. Download PyTorch SSD300 checkpoint file.\n\n.. code:: bash\n\n   curl -LO https://api.ngc.nvidia.com/v2/models/nvidia/ssdpyt_fp32/versions/1/files/nvidia_ssdpyt_fp32_20190225.pt\n\n4. Download COCO 2017 validation set and annotations.\n\n.. code:: bash\n\n   curl -LO http://images.cocodataset.org/zips/val2017.zip\n   unzip ./val2017.zip\n   curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n   unzip ./annotations_trainval2017.zip\n\n5. Convert PyTorch SSD300 model and checkpoint into a Neuron-compatible\n   TensorFlow SavedModel.\n\n.. 
code:: bash\n\n   python ssd300_model.py --torch_checkpoint=./nvidia_ssdpyt_fp32_20190225.pt --output_saved_model=./ssd300_tf_neuron/1\n\nThis converts the PyTorch SSD300 model and checkpoint to a Neuron-compatible\nTensorFlow SavedModel using tensorflow-neuron and neuron-cc. The\ncompilation output is stored in ``./ssd300_tf_neuron``.\n\n6. Launch the ``tensorflow-model-server-neuron`` gRPC server at the default\n   port 8500 in the background.\n\n.. code:: bash\n\n   tensorflow_model_server_neuron --model_base_path=$(pwd)/ssd300_tf_neuron &\n\n7. From the client, evaluate the Neuron-compatible TensorFlow SavedModel for\n   both accuracy and performance. Note that this client by default\n   assumes a ``tensorflow-model-server-neuron`` listening at\n   ``localhost:8500``. On inf1.xlarge, the expected throughput is 100\n   images/second once the server is fully warmed up, and the expected\n   mean average precision (mAP) is 0.253.\n\n.. code:: bash\n\n   python ssd300_evaluation_client.py --val2017=./val2017 --instances_val2017_json=./annotations/instances_val2017.json\n\n8. After running the demo, please clean up the resources allocated in the Neuron\n   runtime by gracefully killing the ``tensorflow_model_server_neuron``\n   process, e.g.:\n\n.. code:: bash\n\n   killall tensorflow_model_server_neuron\n"
  },
  {
    "path": "archive/tutorials/ssd300_demo/ssd300_detection.py",
    "content": "import argparse\nimport json\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nfrom PIL import Image\nimport matplotlib.pyplot as plt\nimport matplotlib.patches as patches\nimport tensorflow as tf\nimport tensorflow.neuron as tfn\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image', required=True, help='Path to image that is to be detected. Support jpeg and png format.')\n    parser.add_argument('--image_with_detections', required=True, help='Path to save image after detection (with bounding boxes drawn). Png format.')\n    parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel')\n    parser.add_argument('--score_threshold', type=float, default=0.15, help='Minimum required score for drawing a bounding box')\n    parser.add_argument('--instances_val2017_json', default=None, help='Json file that contains labeling information')\n    parser.add_argument('--save_results', default=None)\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if not args.disable_version_check:\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.0.1.0.1333.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. Please upgrade '\n                'by \"pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n\n    with open(args.image, 'rb') as f:\n        img_jpg_bytes = f.read()\n    model_feed_dict = {'batch_image': [img_jpg_bytes]}\n\n    predictor = tf.contrib.predictor.from_saved_model(args.saved_model)\n    results = predictor(model_feed_dict)\n    if args.save_results is not None:\n        np.savez(args.save_results, **results)\n    boxes_np = results['boxes']\n    scores_np = results['scores']\n    classes_np = results['classes']\n\n    if args.instances_val2017_json is not None:\n        with open(args.instances_val2017_json) as f:\n            annotate_json = json.load(f)\n        label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\n\n    plt.switch_backend('agg')\n    fig, ax = plt.subplots(1)\n    ax.imshow(Image.open(args.image).convert('RGB'))\n\n    wanted = scores_np[0] > args.score_threshold\n    for xywh, label_no_bg in zip(boxes_np[0][wanted], classes_np[0][wanted]):\n        rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\n        ax.add_patch(rect)\n        rx, ry = rect.get_xy()\n        rx = rx + rect.get_width() / 2.0\n        if args.instances_val2017_json is not None:\n            ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\n                        ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\n    plt.savefig(args.image_with_detections)\n    plt.close(fig)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "archive/tutorials/ssd300_demo/ssd300_evaluation.py",
    "content": "import argparse\nimport os\nimport json\nimport glob\nfrom concurrent import futures\nimport time\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nimport tensorflow.neuron as tfn\nfrom pycocotools.cocoeval import COCOeval\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection\n\n\ndef get_val_dataset(val_annotate, val_coco_root):\n    dboxes = dboxes300_coco()\n    val_trans = SSDTransformer(dboxes, (300, 300), val=True)\n    val_coco = COCODetection(val_coco_root, val_annotate, val_trans)\n    return val_coco\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel')\n    parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset')\n    parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information')\n    parser.add_argument('--num_sessions', type=int, default=1, help='Number of tensorflow sessions')\n    parser.add_argument('--num_threads', type=int, default=4, help='Number of threads')\n    parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput')\n    parser.add_argument('--save_results', default=None)\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if not args.disable_version_check:\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.0.1.0.1333.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. 
Please upgrade '\n                'by \"pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n    predictor_list = [tf.contrib.predictor.from_saved_model(args.saved_model) for _ in range(args.num_sessions)]\n\n    val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017)\n    inv_map = {v: k for k, v in val_dataset.label_map.items()}\n    model_feed_dict_list = []\n    for img_id in val_dataset.img_keys:\n        img_path = os.path.join(args.val2017, val_dataset.images[img_id][0])\n        with open(img_path, 'rb') as f:\n            img_jpg_bytes = f.read()\n        model_feed_dict_list.append({'batch_image': [img_jpg_bytes]})\n\n    latency_list = []\n    throughput_list = []\n    def predict(pred, model_feed_dict):\n        start = time.time()\n        result = pred(model_feed_dict)\n        latency_list.append(time.time() - start)\n        return result\n\n    def performance():\n        last_num_infer = len(latency_list)\n        while len(latency_list) < len(model_feed_dict_list):\n            current_num_infer = len(latency_list)\n            throughput = (current_num_infer - last_num_infer) / args.throughput_interval\n            throughput_list.append(throughput)\n            p50 = 0.0\n            p90 = 0.0\n            if latency_list:\n                p50 = np.percentile(latency_list, 50)\n                p90 = np.percentile(latency_list, 90)\n            print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90))\n            last_num_infer = current_num_infer\n            time.sleep(args.throughput_interval)\n\n    executor = futures.ThreadPoolExecutor(max_workers=(args.num_sessions*args.num_threads)+1)\n    performance_future = executor.submit(performance)\n    eval_futures = []\n    for idx, model_feed_dict in enumerate(model_feed_dict_list):\n        eval_fut = executor.submit(predict, predictor_list[idx%len(predictor_list)], model_feed_dict)\n        eval_futures.append(eval_fut)\n    waited_results = []\n    for idx, eval_fut in enumerate(eval_futures):\n        if idx % 100 == 0:\n            print('evaluating image {}/{}'.format(idx, len(eval_futures)))\n        waited_results.append(eval_fut.result())\n    eval_results = []\n    for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)):\n        boxes = results['boxes']\n        for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]):\n            res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]]  # +1 to account for background\n            eval_results.append(res)\n    performance_future.result()\n\n    coco_gt = COCO(annotation_file=args.instances_val2017_json)\n    coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32))\n    coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')\n    coco_eval.evaluate()\n    coco_eval.accumulate()\n    coco_eval.summarize()\n    if args.save_results is not None:\n        np.save(args.save_results, coco_eval.stats)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "archive/tutorials/ssd300_demo/ssd300_evaluation_client.py",
    "content": "import argparse\nimport os\nimport json\nimport glob\nfrom concurrent import futures\nimport time\nimport subprocess\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nimport grpc\nfrom tensorflow_serving.apis import predict_pb2\nfrom tensorflow_serving.apis import prediction_service_pb2_grpc\nfrom pycocotools.cocoeval import COCOeval\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection\n\n\ndef get_val_dataset(val_annotate, val_coco_root):\n    dboxes = dboxes300_coco()\n    val_trans = SSDTransformer(dboxes, (300, 300), val=True)\n    val_coco = COCODetection(val_coco_root, val_annotate, val_trans)\n    return val_coco\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--server_address', default='localhost:8500', help='tensorflow-model-server-neuron grpc address')\n    parser.add_argument('--model_name', default='default', help='Serving model name')\n    parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset')\n    parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information')\n    parser.add_argument('--num_threads', type=int, default=4, help='Number of threads')\n    parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput')\n    parser.add_argument('--save_results', default=None)\n    args = parser.parse_args()\n\n    channel = grpc.insecure_channel(args.server_address)\n    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n\n    val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017)\n    inv_map = {v: k for k, v in val_dataset.label_map.items()}\n    request_list = []\n    for img_id in val_dataset.img_keys:\n        img_path = os.path.join(args.val2017, val_dataset.images[img_id][0])\n        with open(img_path, 'rb') as f:\n            img_jpg_bytes = f.read()\n        data = np.array([img_jpg_bytes], dtype=object)\n        data = tf.contrib.util.make_tensor_proto(data, shape=data.shape)\n        request = predict_pb2.PredictRequest()\n        request.model_spec.name = args.model_name\n        request.inputs['batch_image'].CopyFrom(data)\n        request_list.append(request)\n\n    latency_list = []\n    throughput_list = []\n    def predict(request):\n        start = time.time()\n        result = stub.Predict(request).outputs\n        latency_list.append(time.time() - start)\n        return result\n\n    def performance():\n        last_num_infer = len(latency_list)\n        while len(latency_list) < len(request_list):\n            current_num_infer = len(latency_list)\n            throughput = (current_num_infer - last_num_infer) / args.throughput_interval\n            throughput_list.append(throughput)\n            p50 = 0.0\n            p90 = 0.0\n            if latency_list:\n                p50 = np.percentile(latency_list, 50)\n                p90 = np.percentile(latency_list, 90)\n            print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90))\n            last_num_infer = current_num_infer\n            time.sleep(args.throughput_interval)\n\n    executor = 
futures.ThreadPoolExecutor(max_workers=args.num_threads+1)\n    performance_future = executor.submit(performance)\n    eval_futures = []\n    for idx, request in enumerate(request_list):\n        eval_fut = executor.submit(predict, request)\n        eval_futures.append(eval_fut)\n    waited_results = []\n    for idx, eval_fut in enumerate(eval_futures):\n        if idx % 100 == 0:\n            print('evaluating image {}/{}'.format(idx, len(eval_futures)))\n        waited_results.append(eval_fut.result())\n    eval_results = []\n    for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)):\n        results = {key: tf.make_ndarray(value) for key, value in results.items()}\n        boxes = results['boxes']\n        for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]):\n            res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]]  # +1 to account for background\n            eval_results.append(res)\n    performance_future.result()\n\n    coco_gt = COCO(annotation_file=args.instances_val2017_json)\n    coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32))\n    coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')\n    coco_eval.evaluate()\n    coco_eval.accumulate()\n    coco_eval.summarize()\n    if args.save_results is not None:\n        np.save(args.save_results, coco_eval.stats)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "archive/tutorials/ssd300_demo/ssd300_model.py",
    "content": "import sys\nimport os\nimport argparse\nimport time\nimport itertools\nfrom functools import partial\nfrom collections import Counter\nimport json\nimport shutil\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nfrom tensorflow.core.framework import attr_value_pb2\nimport tensorflow.neuron as tfn\nimport torch\n\n\ndef decode_jpeg_resize(input_tensor, image_size):\n    # decode jpeg\n    tensor = tf.image.decode_png(input_tensor, channels=3)\n\n    # resize\n    decoded_shape = tf.shape(tensor)\n    tensor = tf.cast(tensor, tf.float32)\n    decoded_shape_hw = decoded_shape[0:2]\n    decoded_shape_hw_float32 = tf.cast(decoded_shape_hw, tf.float32)\n    tensor = tf.image.resize(tensor, image_size)\n\n    # normalize\n    tensor -= np.array([0.485, 0.456, 0.406]).astype(np.float32) * 255.0\n    return tensor, decoded_shape_hw_float32[::-1]\n\n\ndef preprocessor(input_tensor, image_size):\n    with tf.name_scope('Preprocessor'):\n        tensor, bbox_scale_hw = tf.map_fn(\n            partial(decode_jpeg_resize, image_size=image_size), input_tensor,\n            dtype=(tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n    return tensor, bbox_scale_hw\n\n\ndef tf_Conv2d(input_tensor, module, first_conv=False):\n    np_dtype = input_tensor.dtype.as_numpy_dtype\n    kernel_np = module.weight.detach().numpy().transpose([2, 3, 1, 0])\n    if first_conv:\n        kernel_np /= (np.array([0.229, 0.224, 0.225]).astype(np.float32) * 255.0)[:, np.newaxis]\n    kernel = tf.constant(kernel_np.astype(np_dtype))\n    if any(module.padding):\n        pad_h, pad_w = module.padding\n        padding = [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]]\n        input_tensor = tf.pad(input_tensor, padding)\n    stride_h, stride_w = module.stride\n    tensor = tf.nn.conv2d(input_tensor, kernel, strides=[1, stride_h, stride_w, 1], padding='VALID')\n    if module.bias is not None:\n        bias = tf.constant(module.bias.detach().numpy().astype(np_dtype))\n        tensor = tf.nn.bias_add(tensor, bias)\n    return tensor\n\ndef tf_BatchNorm2d(input_tensor, module):\n    def _norm_np(ts):\n        return ts.astype(input_tensor.dtype.as_numpy_dtype)\n    mean = _norm_np(module.running_mean.detach().numpy())\n    offset = _norm_np(module.bias.detach().numpy())\n    inv_std = np.sqrt(module.running_var.detach().numpy() + module.eps)\n    scale_inv_std = _norm_np(module.weight.detach().numpy() / inv_std)\n    return scale_inv_std * (input_tensor - mean) + offset\n\ndef tf_MaxPool2d(input_tensor, module):\n    pad = module.padding\n    tensor = tf.pad(input_tensor, [[0, 0], [pad, pad], [pad, pad], [0, 0]])\n    return tf.nn.max_pool2d(tensor, ksize=module.kernel_size, strides=module.stride, padding='VALID')\n\ndef tf_Bottleneck(input_tensor, module):\n    tensor = tf_Conv2d(input_tensor, module.conv1)\n    tensor = tf_BatchNorm2d(tensor, module.bn1)\n    tensor = tf.nn.relu(tensor)\n    tensor = tf_Conv2d(tensor, module.conv2)\n    tensor = tf_BatchNorm2d(tensor, module.bn2)\n    tensor = tf.nn.relu(tensor)\n    tensor = tf_Conv2d(tensor, module.conv3)\n    tensor = tf_BatchNorm2d(tensor, module.bn3)\n    if module.downsample is not None:\n        input_tensor = tf_Conv2d(input_tensor, module.downsample[0])\n        input_tensor = tf_BatchNorm2d(input_tensor, module.downsample[1])\n    return tf.nn.relu(input_tensor + tensor)\n\ndef tf_SequentialBottleneck(tensor, seq, resnet):\n    with tf.name_scope('{}.Sequential'.format(seq)):\n  
      for idx, module in enumerate(resnet[seq]):\n            with tf.name_scope('{}.BasicBlock'.format(idx)):\n                tensor = tf_Bottleneck(tensor, module)\n    return tensor\n\ndef tf_bbox_view(detection_feed, modules, ndim):\n    results = []\n    for idx, (tensor, mod) in enumerate(zip(detection_feed, modules)):\n        with tf.name_scope('branch{}'.format(idx)):\n            tensor = tf_Conv2d(tensor, mod)\n            tensor = tf.transpose(tensor, [0, 3, 1, 2])\n            tensor = tf.cast(tensor, tf.float32)\n\n            shape = tensor.shape.as_list()\n            batch_size = -1 if shape[0] is None else shape[0]\n            new_shape = [batch_size, ndim, np.prod(shape[1:]) // ndim]\n            results.append(tf.reshape(tensor, new_shape))\n    tensor = tf.concat(results, axis=-1)\n    return tensor\n\n\ndef tf_feature_extractor(input_tensor, resnet):\n    with tf.name_scope('FeatureExtractor'):\n        with tf.name_scope('0.Conv2d'):\n            tensor = tf_Conv2d(input_tensor, resnet[0], first_conv=True)\n        with tf.name_scope('1.BatchNorm2d'):\n            tensor = tf_BatchNorm2d(tensor, resnet[1])\n        with tf.name_scope('2.ReLU'):\n            tensor = tf.nn.relu(tensor)\n        with tf.name_scope('3.MaxPool2d'):\n            tensor = tf_MaxPool2d(tensor, resnet[3])\n        tensor = tf_SequentialBottleneck(tensor, 4, resnet)\n        tensor = tf_SequentialBottleneck(tensor, 5, resnet)\n        tensor = tf_SequentialBottleneck(tensor, 6, resnet)\n        tensor = tf.cast(tensor, tf.float16)\n    return tensor\n\n\ndef tf_box_predictor(tensor, ssd300_torch):\n    with tf.name_scope('BoxPredictor'):\n        detection_feed = [tensor]\n        for idx, block in enumerate(ssd300_torch.additional_blocks):\n            with tf.name_scope('{}.Sequential'.format(idx)):\n                tensor = tf_Conv2d(tensor, block[0])\n                tensor = tf_BatchNorm2d(tensor, block[1])\n                tensor = tf.nn.relu(tensor)\n                tensor = tf_Conv2d(tensor, block[3])\n                tensor = tf_BatchNorm2d(tensor, block[4])\n                tensor = tf.nn.relu(tensor)\n                detection_feed.append(tensor)\n        with tf.name_scope('Boxes'):\n            loc = tf_bbox_view(detection_feed, ssd300_torch.loc, ndim=4)\n        with tf.name_scope('Probabilities'):\n            conf = tf_bbox_view(detection_feed, ssd300_torch.conf, ndim=ssd300_torch.label_num)\n    return loc, conf\n\n\n@tfn.fuse(batch_size=1, dynamic_batch_size=True)\ndef tf_ssd300(input_tensor, ssd300_torch):\n    with tf.name_scope('SSD300'):\n        tensor = tf_feature_extractor(input_tensor, ssd300_torch.feature_extractor.feature_extractor)\n        loc, conf = tf_box_predictor(tensor, ssd300_torch)\n    return loc, conf\n\n\ndef scale_back_batch(bboxes_in, scores_in, scale_xy, scale_wh, dboxes_xywh):\n    \"\"\"\n        Do scale and transform from xywh to ltrb\n        suppose input Nx4xnum_bbox Nxlabel_numxnum_bbox\n    \"\"\"\n    with tf.name_scope('ScaleBackBatch'):\n        bboxes_in = tf.transpose(bboxes_in, [0, 2, 1])\n        scores_in = tf.transpose(scores_in, [0, 2, 1])\n\n        bboxes_xy = bboxes_in[:, :, :2]\n        bboxes_wh = bboxes_in[:, :, 2:]\n        bboxes_xy *= scale_xy\n        bboxes_wh *= scale_wh\n\n        bboxes_xy = bboxes_xy * dboxes_xywh[:, :, 2:] + dboxes_xywh[:, :, :2]\n        bboxes_wh = tf.exp(bboxes_wh) * dboxes_xywh[:, :, 2:]\n\n        bboxes_wh_half = 0.5 * bboxes_wh\n        bboxes_lt = bboxes_xy - bboxes_wh_half\n        
bboxes_rb = bboxes_xy + bboxes_wh_half\n\n        bboxes_in = tf.concat([bboxes_lt, bboxes_rb], axis=-1)\n\n        return bboxes_in, tf.nn.softmax(scores_in, axis=-1)\n\ndef select_nms_outputs(input_tensors):\n    boxes_xywh, scores, classes, valid_detections = input_tensors\n    return boxes_xywh[:valid_detections], scores[:valid_detections], classes[:valid_detections]\n\ndef postprocessor(ploc_ts, plabel_ts, bbox_scale_hw_ts, scale_xy, scale_wh, dboxes_xywh):\n    with tf.name_scope('Postprocessor'):\n        ploc_ts = tf.cast(ploc_ts, tf.float32)\n        plabel_ts = tf.cast(plabel_ts, tf.float32)\n        bboxes_ts, probs_ts = scale_back_batch(ploc_ts, plabel_ts, scale_xy, scale_wh, dboxes_xywh)\n        bboxes_ts = bboxes_ts[:, :, tf.newaxis, :]\n        probs_ts = probs_ts[:, :, 1:]\n        nms_outputs = tf.image.combined_non_max_suppression(\n            bboxes_ts,\n            probs_ts,\n            max_output_size_per_class=200,\n            max_total_size=200,\n            iou_threshold=0.5,\n            score_threshold=0.05,\n            pad_per_class=False,\n            clip_boxes=False,\n            name='CombinedNonMaxSuppression',\n        )\n        nmsed_boxes_x0y0x1y1, nmsed_scores, nmsed_classes, valid_detections = nms_outputs\n        nmsed_boxes_x0y0 = nmsed_boxes_x0y0x1y1[..., :2]\n        nmsed_boxes_x1y1 = nmsed_boxes_x0y0x1y1[..., 2:]\n        bbox_scale_hw_ts = bbox_scale_hw_ts[:, tf.newaxis, :]\n        nmsed_boxes_xy = nmsed_boxes_x0y0 * bbox_scale_hw_ts\n        nmsed_boxes_wh = (nmsed_boxes_x1y1 - nmsed_boxes_x0y0) * bbox_scale_hw_ts\n        nmsed_boxes_xywh = tf.concat([nmsed_boxes_xy, nmsed_boxes_wh], axis=-1)\n        nmsed_boxes_xywh, nmsed_scores, nmsed_classes = tf.map_fn(\n            select_nms_outputs, (nmsed_boxes_xywh, nmsed_scores, nmsed_classes, valid_detections),\n            dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n    return nmsed_boxes_xywh, nmsed_scores, nmsed_classes\n\n\nclass DefaultBoxes(object):\n\n    def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios,\n                 scale_xy=0.1, scale_wh=0.2):\n\n        self.feat_size = feat_size\n        self.fig_size = fig_size\n\n        self.scale_xy_ = scale_xy\n        self.scale_wh_ = scale_wh\n\n        # According to https://github.com/weiliu89/caffe\n        # Calculation method slightly different from paper\n        self.steps = steps\n        self.scales = scales\n\n        fk = fig_size/np.array(steps)\n        self.aspect_ratios = aspect_ratios\n\n        self.default_boxes = []\n        # size of feature and number of feature\n        for idx, sfeat in enumerate(self.feat_size):\n\n            sk1 = scales[idx]/fig_size\n            sk2 = scales[idx+1]/fig_size\n            sk3 = np.sqrt(sk1*sk2)\n            all_sizes = [(sk1, sk1), (sk3, sk3)]\n\n            for alpha in aspect_ratios[idx]:\n                w, h = sk1*np.sqrt(alpha), sk1/np.sqrt(alpha)\n                all_sizes.append((w, h))\n                all_sizes.append((h, w))\n            for w, h in all_sizes:\n                for i, j in itertools.product(range(sfeat), repeat=2):\n                    cx, cy = (j+0.5)/fk[idx], (i+0.5)/fk[idx]\n                    self.default_boxes.append((cx, cy, w, h))\n\n        self.dboxes = np.array(self.default_boxes)\n        self.dboxes = self.dboxes.clip(min=0, max=1)\n        # For IoU calculation\n        self.dboxes_ltrb = self.dboxes.copy()\n        self.dboxes_ltrb[:, 0] = self.dboxes[:, 0] - 0.5 * 
self.dboxes[:, 2]\n        self.dboxes_ltrb[:, 1] = self.dboxes[:, 1] - 0.5 * self.dboxes[:, 3]\n        self.dboxes_ltrb[:, 2] = self.dboxes[:, 0] + 0.5 * self.dboxes[:, 2]\n        self.dboxes_ltrb[:, 3] = self.dboxes[:, 1] + 0.5 * self.dboxes[:, 3]\n\n    @property\n    def scale_xy(self):\n        return self.scale_xy_\n\n    @property\n    def scale_wh(self):\n        return self.scale_wh_\n\n    def __call__(self, order=\"ltrb\"):\n        if order == \"ltrb\": return self.dboxes_ltrb\n        if order == \"xywh\": return self.dboxes\n\n\ndef dboxes300_coco():\n    figsize = 300\n    feat_size = [38, 19, 10, 5, 3, 1]\n    steps = [8, 16, 32, 64, 100, 300]\n    # use the scales here: https://github.com/amdegroot/ssd.pytorch/blob/master/data/config.py\n    scales = [21, 45, 99, 153, 207, 261, 315]\n    aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]\n    dboxes = DefaultBoxes(figsize, feat_size, steps, scales, aspect_ratios)\n    return dboxes\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--torch_checkpoint', required=True, help='Path to PyTorch SSD300 model checkpoint')\n    parser.add_argument('--output_saved_model', required=True, help='Output TensorFlow SavedModel that runs on Inferentia')\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if os.path.exists(args.output_saved_model):\n        raise OSError('SavedModel dir {} already exists'.format(args.output_saved_model))\n\n    if not args.disable_version_check:\n        neuroncc_version = LooseVersion(pkg_resources.get_distribution('neuron-cc').version)\n        if neuroncc_version < LooseVersion('1.0.18000'):\n            raise RuntimeError(\n                'neuron-cc version {} is too low for this demo. Please upgrade '\n                'by \"pip install -U neuron-cc --index-url=https://pip.repos.neuron.amazonaws.com\"'.format(neuroncc_version))\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.3.1.0.1900.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. 
Please upgrade '\n                'by \"pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n\n    sys.path.append(os.getcwd())\n    from DeepLearningExamples.PyTorch.Detection.SSD.src import model as torch_ssd300_model\n    ssd300_torch = torch_ssd300_model.SSD300()\n    ckpt = torch.load(args.torch_checkpoint, map_location=torch.device('cpu'))\n    ssd300_torch.load_state_dict(ckpt['model'])\n    ssd300_torch.eval()\n\n    input_tensor = tf.placeholder(tf.string, [None])\n    image_tensor, bbox_scale_hw_tensor = preprocessor(input_tensor, [300, 300])\n\n    dboxes = dboxes300_coco()\n    dboxes_xywh = dboxes(order=\"xywh\")[np.newaxis, ...]\n\n    ploc_tensor, plabel_tensor = tf_ssd300(image_tensor, ssd300_torch)\n    boxes_tensor, scores_tensor, classes_tensor = postprocessor(\n        ploc_tensor, plabel_tensor, bbox_scale_hw_tensor, dboxes.scale_xy, dboxes.scale_wh, dboxes_xywh)\n    outputs = {\n        'boxes': boxes_tensor,\n        'scores': scores_tensor,\n        'classes': classes_tensor,\n    }\n\n    sess = tf.Session()\n    try:\n        sess.run(outputs)\n    except:\n        pass\n\n    for op in sess.graph.get_operations():\n        if op.type == 'NeuronOp':\n            if not op.get_attr('executable'):\n                raise AttributeError(\n                    'Neuron executable (neff) is empty. Please check neuron-cc is installed and working properly '\n                    '(\"pip install neuron-cc --force --index-url=https://pip.repos.neuron.amazonaws.com\" '\n                    'to force reinstall neuron-cc).')\n            model_config = op.node_def.attr['model_config'].list\n            if model_config.i:\n                model_config.i[0] = 1\n            else:\n                model_config.i.extend([1, 1, 1, 10])\n            op._set_attr('model_config', attr_value_pb2.AttrValue(list=model_config))\n    tf.saved_model.simple_save(sess, args.output_saved_model, {'batch_image': input_tensor}, outputs)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "archive/tutorials/training-gpt-neox-20b.rst",
    "content": ".. _gpt_neox_20b_tp_zero1_tutorial:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently unsupported and not maintained. It is provided for reference only.\n\nTraining GPT-NeoX 20B with Tensor Parallelism and ZeRO-1 Optimizer \n=========================================================================================\n\nIn this section, we showcase to pretrain a GPT-NeoX 20B model by using the sequence parallel optimization\nof tensor parallelism in the ``neuronx-distributed`` package. Please refer to the `Neuron Samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain>`__ to view the files in this tutorial.\n\nThis GPT-NeoX 20B tutorial differs from the :ref:`GPT-NeoX 6.9B tutorial<gpt_neox_tp_zero1_tutorial>` in the following ways:\n\n* sequence parallel optimization has been applied\n* parallel cross entropy has been applied\n* the model size has been increased from 6.9B to 20B\n* the TP degree has been increased from 8 to 32\n\nSetting up environment is same as the :ref:`GPT-NeoX 6.9B tutorial<gpt_neox_tp_zero1_tutorial>`.\n\n**Let’s download the scripts for pretraining:**\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh\n   :language: shell\n   :lines: 4-8\n\nNext let’s download and pre-process the dataset:\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh\n   :language: shell\n   :lines: 10\n\nAt this point, you are all set to start training.\n\n**Running training**\n\nWe first pre-compile the graphs using the ``neuron_parallel_compile``.\nLet’s run the command below:\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh\n   :language: shell\n   :lines: 14-17\n\nThis script uses a tensor-parallel size of 32.\nThis will automatically set the zero-1 sharding degree to 4 (4 * 32 workers / tensor_parallel_size).\nOnce the graphs are compiled we can now run training and observe our loss goes down.\nTo run the training, we just the above command but without ``neuron_parallel_compile``.\n\n.. 
literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh\n   :language: shell\n   :lines: 19-22\n\n\n**Sequence Parallel**\n\nWe made the following model-level modifications to enable sequence parallelism (a minimal sketch of these changes follows at the end of this page):\n\n* turn on ``sequence_parallel_enabled`` of ``ColumnParallelLinear`` and ``RowParallelLinear``\n  in ``GPTNeoXAttention`` and ``GPTNeoXMLP``;\n* replace torch ``LayerNorm`` in ``GPTNeoXLayer`` and ``GPTNeoXModel`` with the neuronx-distributed ``LayerNorm``\n  with ``sequence_parallel_enabled`` turned on;\n* dimension transposition of intermediate states in the forward function of ``GPTNeoXAttention``;\n* dimension transposition and collective communication of intermediate states in the forward function of ``GPTNeoXModel``.\n\nAt the training script level, we enable:\n\n* all-reduce of sequence parallel gradients at the gradient accumulation boundary.\n\nPlease check `modeling_gpt_neox_nxd.py <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py>`__ and `tp_dp_gpt_neox_20b_hf_pretrain.py <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py>`__ for details.\n\n\n**Parallel Cross Entropy**\n\nTo enable parallel cross entropy, we made the following model-level modifications:\n\n* replace ``CrossEntropyLoss`` with the neuronx-distributed ``parallel_cross_entropy`` in the forward\n  function of ``GPTNeoXForCausalLM``;\n* use ``ColumnParallelLinear`` for the ``embed_out`` layer in ``GPTNeoXForCausalLM``.\n\nPlease check ``modeling_gpt_neox_nxd.py`` for details.\n
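\nThe snippet below is a minimal sketch of the two changes described above, not the tutorial's actual code; it assumes the neuronx-distributed model-parallel state has already been initialized, and the sizes are illustrative. See ``modeling_gpt_neox_nxd.py`` for the real implementation.\n\n.. code-block:: python\n\n    from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear\n    from neuronx_distributed.parallel_layers.loss_functions import parallel_cross_entropy\n\n    hidden_size = 6144  # GPT-NeoX 20B hidden size, for illustration\n\n    # Sequence parallel: shard activations along the sequence dimension by\n    # constructing the parallel linear layers with sequence_parallel_enabled=True.\n    qkv_proj = ColumnParallelLinear(\n        hidden_size,\n        3 * hidden_size,\n        gather_output=False,\n        sequence_parallel_enabled=True,\n    )\n\n    # Parallel cross entropy: replaces CrossEntropyLoss and operates directly on\n    # vocab-parallel logits produced by a ColumnParallelLinear embed_out layer.\n    # loss = parallel_cross_entropy(logits, labels)\n"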
  },
  {
    "path": "archive/tutorials/training-gpt-neox.rst",
    "content": ".. _gpt_neox_tp_zero1_tutorial:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This documentation for the AWS Neuron SDK is currently unsupported and not maintained. It is provided for reference only.\n\nTraining GPT-NeoX 6.9B with Tensor Parallelism and ZeRO-1 Optimizer\n=========================================================================================\n\nIn this section, we showcase to pretrain a GPT-NeoX 6.9B model by using tensor parallelism\nand zero-1 optimizer in the ``neuronx-distributed`` package. Please refer to the `Neuron Samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain>`__ to view the files in this tutorial.\n\n**Setting up environment:**\n                       \n\nFor this experiment, we will use a ParallelCluster with at least four trn1-32xl compute nodes.\n`Train your model on ParallelCluster <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`__\nintroduces how to setup and use a ParallelCluster.\nWe need first to create and activate a python virtual env on the head node of the ParallelCluster.\nNext follow the instructions mentioned here:\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>` to install neuron python packages.\n\nWe also need to install and clone the ``neuronx-distributed`` package using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n   git clone git@github.com:aws-neuron/neuronx-distributed.git\n\nLet’s download the scripts for pretraining.\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh\n   :language: shell\n   :lines: 4-10\n\nNext let’s download and pre-process the dataset:\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh\n   :language: shell\n   :lines: 12\n\nAt this point, you are all set to start training.\n\n**Running training**\n                \n\nWe first pre-compile the graphs using the ``neuron_parallel_compile``.\nLet’s run the command below:\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh\n   :language: shell\n   :lines: 16-18\n\nThis script uses a tensor-parallel size of 8.\nThis will automatically set the zero-1 sharding degree to 16 (4 * 32 workers / tensor_parallel_size).\nOnce the graphs are compiled we can now run training and observe our loss goes down.\nTo run the training, we just the above command but without ``neuron_parallel_compile``.\n\n.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh\n   :language: shell\n   :lines: 20-22\n\n**ZeRO-1 Optimizer**\n                \n\nThe training script uses ZeRO-1 optimizer, where the optimizer states are partitioned across\nthe ranks so that each rank updates only its partition.\nBelow shows the code snippet of using ZeRO-1 optimizer in training script:\n\n.. code:: ipython3\n\n   from neuronx_distributed.optimizer import NeuronZero1Optimizer\n\n   optimizer = NeuronZero1Optimizer(\n        optimizer_grouped_parameters,\n        AdamW_FP32OptimParams,\n        lr=flags.lr,\n        pin_layout=False,\n        sharding_groups=parallel_state.get_data_parallel_group(as_list=True),\n        grad_norm_groups=parallel_state.get_tensor_model_parallel_group(as_list=True),\n    )\n"
  },
  {
    "path": "archive/tutorials/training_codegen25_7b.rst",
    "content": ".. _codegen25_7b_tp_zero1_tutorial:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nTraining CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer \n==============================================================================================\n\nIn this tutorial, we showcase how to pretrain a CodeGen2.5 7B model for program synthesis. Since Codegen2.5's architecture is identical to the one of Llama2, you may want to take a look at our `Llama2 tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html>`__ first.\n\nAfter setting up the environment and installing ``neuronx-distributed``, we need to download a data set containing source code (in this case Java code) and then preprocess and tokenize it to match the code-infill format (more about this below). Use the following commands to download the required files. Note, that we reuse our llama2 training files.\n\n.. code:: bash\n\n   mkdir -p ~/examples/tp_zero1_codegen25_7b_hf_pretrain\n   cd ~/examples/tp_zero1_codegen25_7b_hf_pretrain\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/modeling_llama_nxd.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/tp_zero1_llama_hf_pretrain.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/logger.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/tp_zero1_codegen25_7b_hf_pretrain.sh\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/get_dataset_infill.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/get_dataset_infill.sh\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/requirements.txt\n   chmod +x tp_zero1_codegen25_7b_hf_pretrain.sh\n   chmod +x get_dataset_infill.sh\n   python3 -m pip install -r requirements.txt\n\nData Preprocessing and Tokenization\n------------------------------------\n\nTo tokenize the data, we will use the CodeGen2.5 tokenizer from the HuggingFace repository. Download it by cloning the repository.\n\n.. code:: bash\n\n   cd ~/examples\n   git clone https://huggingface.co/Salesforce/codegen25-7b-mono\n   cd codegen25-7b-mono\n   rm config.json # Need to use our config.json for some Trainium-specific settings\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/config.json\n   cd ..\n\nThis tutorial makes use of a clean JAVA subset of the TheStack corpus and we preprocess it to fit the infill-format.\nThe infill format samples a random number of spans and formats the input the following way:\n\n.. 
code:: Python\n\n   def count_words(filename: str) -> Dict[str, int]:\n      \"\"\"Count the number of occurrences of each word in the file.\"\"\"\n      with open(filename, 'r') as f:\n         word_counts = {}\n         for line in f:\n            if word in word_counts:\n                  for word in line.split():\n                     word_counts[word] += 1\n            else:\n                  word_counts[word] = 1\n      return word_counts\n\nbecomes \n\n.. code:: Python\n\n   def count_words(filename: str) -> Dict[str, int]:\n      \"\"\"Count the number of occurrences of each word in the file.\"\"\"\n      with open(filename, 'r') as f:\n            <mask_1> in word_counts:\n                  for word in line.split():\n                        word_counts[word] += 1\n               else:\n                  word_counts[word] = 1\n      return word_counts<|endoftext|><sep>\n      <mask_1>word_counts = {}\n            for line in f:\n                  if word <eom>\n\nFor each span, we introduce two ``<mask_X>`` tokens. One signals the model that a span is missing at this position, and one (at the end of the code) which is followed by the original code span. Lastly, each span is suffixed with an end of mask (``<eom>``) token. \nYou can preprocess and tokenize the dataset by running:\n\n.. code:: bash\n\n   cd ~/examples/tp_zero1_codegen25_7b_hf_pretrain\n   ./get_dataset_infill.sh\n\nThis will preprocess and store the data in your home directory at ``~/example_datasets/bigcode-stack-java_tokenized_infill``.\n\nStarting Training\n-----------------\nAt this point, you are all set to start training.\n\nPer default, we use a tensor parallel degree of 8, a global batch size of 256, and train for 10k steps. Feel free to change these settings in the ``tp_zero1_codegen25_7b_hf_pretrain.sh`` script.\n\nWe first pre-compile the graphs using the ``neuron_parallel_compile``. Let’s run the command below:\n\n.. code:: Python\n\n   sbatch --exclusive \\\n   --nodes 1 \\\n   --wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_codegen25_7b_hf_pretrain.sh\"\n\nOnce the graphs are compiled we can run training and observe our loss going down. \nTo do so, we run the same command omitting ``neuron_parallel_compile``.\n\n.. code:: Python\n\n   sbatch --exclusive \\\n   --nodes 1 \\\n   --wrap=\"srun bash $(pwd)/tp_zero1_codegen25_7b_hf_pretrain.sh\"\n\n\nHappy training!\n"
  },
  {
    "path": "archive/tutorials/training_llama2_tp_pp_ptl.rst",
    "content": ".. _llama2_tp_pp_ptl_tutorial:\n\n.. meta::\n   :noindex:\n   :nofollow:\n   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.\n\nTraining Llama-2-7B/13B/70B using Tensor Parallelism and Pipeline Parallelism with Neuron PyTorch-Lightning\n============================================================================================================\n\nIn this section, we showcase to pretrain a Llama2 7B/13B/70B with Tensor Parallelism and Pipeline Parallel using Neuron PyTorch-Lightning APIs, please refer to  the Llama2 13B/70B Tutorial\nand the Neuron PT-Lightning Developer Guide for more context.\n\n\nSetting up environment:\n^^^^^^^^^^^^^^^^^^^^^^^\n                       \nFor this experiment, we will use AWS ParallelCluster with at least four trn1.32xlarge compute nodes(at least 32 nodes are needed for 13B/70B model size).\n`Train your model on ParallelCluster <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`__\nintroduces how to setup and use a ParallelCluster.\nTo setup the packages on the headnode of the ParallelCluster, follow the instructions mentioned here:\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`.\n\nWe also need to install the ``neuronx-distributed`` package inside the virtual env using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n   git clone git@github.com:aws-neuron/neuronx-distributed.git\n\nLet’s download the scripts for pretraining:\n\n\n1. Navigate to a directory to hold our experiments\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh\n   :language: shell\n   :lines: 4\n\n2. Link the training scripts for our experiments\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh\n   :language: shell\n   :lines: 5-10\n\nIf you want to pre-train Llama 7B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh\n   :language: shell\n   :lines: 5-8\n\nIf you want to pre-train Llama 13B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_13b.sh\n   :language: shell\n   :lines: 5-8\n\nIf you want to pre-train Llama 70B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh\n   :language: shell\n   :lines: 5-8\n\n3. Installing the additional requirements and giving the right permissions to our shell script\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh\n   :language: shell\n   :lines: 12-13\n\n\nNext, we tokenize our dataset. \n``Note``: To tokenize the data, we must request the tokenizer from `HuggingFace` and `Meta` by following \nthe instructions at the following link: `HuggingFace Llama 2 7B Model <https://huggingface.co/meta-llama/Llama-2-7b>`__ .\nUse of the Llama 2 model is governed by the Meta license. In order to download the model weights and tokenizer, please \nvisit the above website and accept their License before requesting access. 
After access has been granted, \nyou may use the download scripts provided by Meta to download the model weights and tokenizer to your cluster.\n\nOnce you have downloaded the tokenizer and model weights, you can copy the ``tokenizer.model`` to the ``~/examples/llama2_lightning`` directory.\n\nNext, let’s download and pre-process the dataset:\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh\n   :language: shell\n   :lines: 13\n\n``Note``: In case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/examples/llama2_lightning'. Use `repo_type` argument if needed.`` \nThis could be because of a stale cache. Try deleting the cache using: \n\n.. code:: ipython3\n\n   sudo rm -rf /home/ubuntu/.cache/\n\n\nAt this point, you are all set to start training.\n\nTraining Llama2-7B with Tensor Parallelism\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBy this step, the ParallelCluster is all set up for running experiments. \nBefore we run training, we first pre-compile the graphs using the :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\nLet’s run the command below:\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh\n   :language: shell\n   :lines: 17-20\n\nThis script uses a tensor-parallel size of 8.\nThis will automatically set the zero-1 sharding degree to 16 (4 * 32 workers / tensor_parallel_size). \n\n``Note``: You can use any number of nodes in this case; you would just need to adjust the number of nodes in the above \nslurm command accordingly. Also, the number of nodes used in the parallel_compile command should be the same as in the actual \ntraining run. This is because, as the number of nodes changes, the data-parallel degree changes too. This results \nin more workers participating in operations like `gradient all-reduce`, which results in new graphs getting \ncreated. \n\nOnce the graphs are compiled, we can run training and observe the loss going down.\nTo run the training, we just run the above command but without ``neuron_parallel_compile``.\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh\n   :language: shell\n   :lines: 22-25\n\nTraining Llama2-13B/70B with Tensor Parallelism and Pipeline Parallelism\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nHere we use ``Llama70B`` as an example. To run 13B, simply change the script from ``run_llama_70b_tp_pp.sh`` to ``run_llama_13B_tp_pp.sh``.\nBefore we run training, we first pre-compile the graphs using the :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\nLet’s run the command below:\n\nPre-compiling\n\n.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh\n   :language: shell\n   :lines: 17-20\n\nThis script uses a tensor-parallel size of 8 and a pipeline-parallel size of 8.\nTo run the training, we just use the above command but without ``neuron_parallel_compile``.\n\n.. 
literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh\n   :language: shell\n   :lines: 22-25\n\n\nCheckpointing:\n^^^^^^^^^^^^^^\n\nTo enable checkpoint saving, add the following flags to ``run_llama_7b_tp_ptl.sh`` / ``run_llama_13b_tp_pp.sh`` / ``run_llama_70B_tp_pp.sh``:\n\n* ``--save_checkpoint`` Add this flag to enable checkpoint saving\n* ``--checkpoint_freq`` Number of steps between checkpoint saves\n* ``--checkpoint_dir`` Directory to save the checkpoint to\n* ``--num_kept_checkpoint`` Number of checkpoints to keep; older checkpoints will be deleted automatically. Set to -1 to keep all saved checkpoints\n* ``--save_load_xser`` Save/load with torch-xla serialization to reduce saving time; it's recommended to enable xser for significantly faster save/load. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa. \n\nTo enable checkpoint loading, add the following flags to ``run_llama_7b_tp_ptl.sh`` / ``run_llama_13b_tp_pp.sh`` / ``run_llama_70B_tp_pp.sh``:\n\n* ``--resume_ckpt`` \n* ``--load_step`` Step to retrieve the checkpoint from\n* ``--checkpoint_dir`` Directory to load the checkpoint from\n* ``--save_load_xser`` Save/load with torch-xla serialization to reduce saving time; it's recommended to enable xser for significantly faster save/load. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa. \n"
  },
  {
    "path": "archive/tutorials/tutorial_source_code/t5_finetuning/t5_finetuning_32_worker_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo pipefail\n\ncd ~/transformers/examples/pytorch/summarization\n\n# Create run 32 worker script\ntee run_32w.sh > /dev/null <<EOF\n#!/bin/bash\nset -eExuo\nif [ \\$NEURON_PARALLEL_COMPILE == \"1\" ]\nthen\n    XLA_USE_BF16=1 torchrun --nproc_per_node=32 ./run_summarization.py \\\n    --model_name_or_path t5-large \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 4 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --max_steps 100 \\\n    --max_eval_samples 100 \\\n    --gradient_accumulation_steps=11 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nelse\n    XLA_USE_BF16=1 torchrun --nproc_per_node=32 ./run_summarization.py \\\n    --model_name_or_path t5-large \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 4 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --gradient_accumulation_steps=11 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nfi\nEOF\n\nchmod +x run_32w.sh\n\n# Precompile and run training\nneuron_parallel_compile ./run_32w.sh\n\n./run_32w.sh"
  },
  {
    "path": "archive/tutorials/tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo pipefail\n\ncd ~/transformers/examples/pytorch/summarization\n\n# Create run 2 worker script\ntee run_2w.sh > /dev/null <<EOF\n#!/bin/bash\nset -eExuo\nif [ \\$NEURON_PARALLEL_COMPILE == \"1\" ]\nthen\n    XLA_USE_BF16=1 torchrun --nproc_per_node=2 ./run_summarization.py \\\n    --model_name_or_path t5-small \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 32 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --max_steps 100 \\\n    --max_eval_samples 100 \\\n    --gradient_accumulation_steps=32 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nelse\n    XLA_USE_BF16=1 torchrun --nproc_per_node=2 ./run_summarization.py \\\n    --model_name_or_path t5-small \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 32 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --gradient_accumulation_steps=32 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nfi\nEOF\n\nchmod +x run_2w.sh\n\n# Precompile and run training\nneuron_parallel_compile ./run_2w.sh\n\n./run_2w.sh"
  },
  {
    "path": "archive/tutorials/tutorial_source_code/t5_finetuning/t5_finetuning_setup_code.sh",
    "content": "#!/bin/bash\nset -eExuo\n\n# Install packages and clone transformers\nexport HF_VER=4.26.0\npip install -U transformers==$HF_VER datasets evaluate scikit-learn rouge_score pandas==1.4.0\ncd ~/\ngit clone https://github.com/huggingface/transformers --branch v$HF_VER\ncd ~/transformers/examples/pytorch/summarization"
  },
  {
    "path": "archive/tutorials/tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo pipefail\n\ncd ~/transformers/examples/pytorch/summarization\n\n# Create run.sh file\ntee run.sh > /dev/null <<EOF\n#!/bin/bash\nset -eExuo\nif [ \\$NEURON_PARALLEL_COMPILE == \"1\" ]\nthen\n    XLA_USE_BF16=1 python3 ./run_summarization.py \\\n    --model_name_or_path t5-small \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 32 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --max_steps 100 \\\n    --max_eval_samples 100 \\\n    --gradient_accumulation_steps=32 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nelse\n    XLA_USE_BF16=1 python3 ./run_summarization.py \\\n    --model_name_or_path t5-small \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_train \\\n    --do_eval \\\n    --source_prefix \"summarize: \" \\\n    --max_source_length 512 \\\n    --per_device_train_batch_size 32 \\\n    --per_device_eval_batch_size 4 \\\n    --overwrite_output_dir \\\n    --pad_to_max_length \\\n    --gradient_accumulation_steps=32 \\\n    --output_dir /tmp/tst-summarization |& tee log_run\nfi\nEOF\n\nchmod +x run.sh\n\n# Run precompilation and training\nneuron_parallel_compile ./run.sh\n\n./run.sh\n\n# Insert code into run summarization in order to predict with generate\ntee temp_run_summarization.py > /dev/null <<EOF\nimport libneuronxla\n# Disable configuring xla env\ndef _configure_env():\n    pass\nlibneuronxla.configure_environment = _configure_env\nEOF\n\ncat run_summarization.py >> temp_run_summarization.py\nmv temp_run_summarization.py run_summarization.py\nchmod +x run_summarization.py\n\n# Run run summarization to predict without generate\nNEURON_NUM_DEVICES=0 python3 ./run_summarization.py \\\n    --model_name_or_path <CHECKPOINT_DIR> \\\n    --dataset_name cnn_dailymail \\\n    --dataset_config \"3.0.0\" \\\n    --do_predict \\\n    --predict_with_generate \\\n    --source_prefix \"summarize: \" \\\n    --per_device_eval_batch_size 4 \\\n    --max_source_length 512 \\\n    --pad_to_max_length \\\n    --no_cuda \\\n    --output_dir /tmp/tst-summarization |& tee log_run"
  },
  {
    "path": "archive/tutorials/tutorial_source_code/t5_finetuning/t5_modify_run_summarization_code.sh",
    "content": "#!/bin/bash\nset -eExuo pipefail\n\ncd ~/transformers/examples/pytorch/summarization\n\n# Insert code into run summarization to disable DDP for torchrun\ntee temp_run_summarization.py > /dev/null <<EOF\n# Disable DDP for torchrun\nfrom transformers import __version__, Trainer\nTrainer._wrap_model = lambda self, model, training=True, dataloader=None: model\nEOF\n\ncat run_summarization.py >> temp_run_summarization.py\nmv temp_run_summarization.py run_summarization.py\nchmod +x run_summarization.py"
  },
  {
    "path": "audit-report.md",
    "content": "# Frameworks Audit Report\n\n## Orphaned Pages\n\n| File Path | Type | Reason | Action |\n|---|---|---|---|\n| frameworks/mxnet-neuron/container-sm-hosting-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/dlc-then-ec2-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/dlc-then-ecs-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/refman.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/rn.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/setup/mxnet-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/setup/mxnet-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/setup/mxnet-update-u22.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/tutorials/bert_mxnet/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/mxnet-neuron/tutorials/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/inference.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/container-sm-hosting-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/dlc-then-k8s-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/refman.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/rn.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/setup/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/tf1_faq.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v4_demo/code.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v4_demo/yolo_v4_demo.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-update.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuronx/tensorflow-neuron-quickstart.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuronx/tensorflow-neuron-supported-operators.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuronx/tutorials/inference/tensorflow-neuronx-serving-tutorial.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/inference.rst | .rst | Not in any toctree or 
cross-reference | Delete |\n| frameworks/torch/torch-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuron/setup/pytorch-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuron/tutorials/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/note-setup-general.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.3.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.4.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.5.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.6.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/pytorch-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/pytorch-update.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/setup-inference.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/setup/setup-training.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/tutorials/inference/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/torch-neuronx/tutorials/training/index.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/torch/training.rst | .rst | Not in any toctree or cross-reference | Delete |\n| frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/uncased_L-24_H-1024_A-16.vocab.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/dropdown-neuron-setup.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/tab-inference-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/tab-training-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/torch-neuronx/api-reference-guide/training/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/torch-neuronx/programming-guide/inference/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/torch-neuronx/programming-guide/training/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |\n| frameworks/torch/torch-neuronx/setup/install-templates/pytorch-dev-install.txt | .txt (include fragment) | Not referenced by any .. 
include:: directive | Delete |\n\n## Stale Pages\n\n| File Path | Staleness Indicators | Recommendation |\n|---|---|---|\n| frameworks/mxnet-neuron/misc-mxnet-neuron.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/mxnet-neuron/misc-mxnet-neuron.txt | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/mxnet-neuron/rn.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/mxnet-neuron/setup/mxnet-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/mxnet-neuron/setup/mxnet-update.rst | Amazon Linux 2 | Will be archived |\n| frameworks/mxnet-neuron/tutorials/bert_mxnet/index.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/api-compilation-python-api.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/refman.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/rn.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-update.rst | Amazon Linux 2 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/tf1_faq.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/tf2_faq.rst | Ubuntu 18.04 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-update.rst | Amazon Linux 2 | Will be archived |\n| frameworks/torch/dropdown-neuron-setup.txt | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.rst | References deprecated neuron-cc compiler | Update or archive |\n| frameworks/torch/inference-torch-neuron.txt | References deprecated neuron-cc compiler | Update or archive |\n| frameworks/torch/torch-neuron/api-compilation-python-api.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/torch/torch-neuron/misc-inference-torch-neuron.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/torch/torch-neuron/misc-inference-torch-neuron.txt | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/torch/torch-neuron/setup/pytorch-install.rst | Amazon Linux 2 | Will be archived |\n| frameworks/torch/torch-neuron/setup/pytorch-update.rst | Amazon Linux 2 | Will be archived |\n| frameworks/torch/torch-neuron/troubleshooting-guide.rst | References deprecated neuron-cc compiler | Will be archived |\n| frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace.rst | References deprecated neuron-cc compiler | Update or archive |\n| frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.rst | References deprecated 
neuron-cc compiler | Update or archive |\n| frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.rst | Ubuntu 20.04 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.4.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.6.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.7.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.8.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.9.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/setup/pytorch-update.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |\n| frameworks/torch/torch-neuronx/training-troubleshooting.rst | Ubuntu 18.04; torch-neuron setup/update with unsupported OS: Ubuntu 18.04 | Update or archive |\n"
  },
  {
    "path": "build.sh",
    "content": "#!/bin/bash\n# build.sh - Docker + uv workflow for Neuron docs\n\nset -e\nIMAGE_NAME=\"neuron-docs\"\n\ncase \"${1:-build}\" in\n  build)\n    docker build -t \"$IMAGE_NAME\" .\n    ;;\n  html)\n    docker run --rm -v \"$(pwd):/docs\" \"$IMAGE_NAME\" -c \"sphinx-build -b html . _build/html -j auto\"\n    ;;\n  shell)\n    docker run --rm -it -v \"$(pwd):/docs\" \"$IMAGE_NAME\"\n    ;;\n  clean)\n    rm -rf _build\n    ;;\n  *)\n    echo \"Usage: $0 {build|html|shell|clean}\"\n    exit 1\n    ;;\nesac\n"
  },
  {
    "path": "compiler/error-codes/EARG001.rst",
    "content": ".. _error-code-earg001:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EARG001.\n\nNCC_EARG001\n===========\n\n**Error message**: This error occurs when you attempt to use a Logical Neuron Core (LNC) configuration that is not supported by the target Neuron architecture.\n\nFor example, a trn1 instance running the following code will run into this error:\n\n.. code-block:: python\n\n   traced_model = torch_neuronx.trace(\n      model,\n      input,\n      compiler_args=['--lnc', '2']  # ERROR: lnc=2 not supported on trn1\n   )\n\nOn trn1, only lnc=1 is supported.\n\nPhysical Neuron Core:\n\n- Actual hardware compute unit on the chip\n\n- Has dedicated compute resources, memory, etc.\n\nLogical Neuron Core:\n\n- Software abstraction grouping multiple physical cores\n\n- Controlled via the NEURON_LOGICAL_NC_CONFIG environment variable or the --lnc flag (when using neuronx-cc directly)\n\nFor more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/device-memory.html#logical-neuron-cores\n"
  },
  {
    "path": "compiler/error-codes/EBIR023.rst",
    "content": ".. _error-code-ebir023:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EBIR023.\n\nNCC_EBIR023\n===========\n\n**Error message**: MLP kernel intermediate size exceeds the maximum supported value of 4096.\n\nConsider tiling large intermediate tensors in your kernel to stay within the supported limit, or increase tensor parallelism to shard the intermediate dimension across more cores.\n"
  },
  {
    "path": "compiler/error-codes/EBVF030.rst",
    "content": ".. _error-code-ebvf030:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EBVF030.\n\nNCC_EBVF030\n===========\n\n**Error message**: The number of instructions generated exceeds the limit.\n\nConsider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/EHCA005.rst",
    "content": ".. _error-code-ehca005:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EHCA005.\n\nNCC_EHCA005\n===========\n\n**Error message**: The compiler encountered a custom call instruction with a target name that is not recognized.\n\nThe Neuron compiler currently recognizes the following custom call targets:\n\n - AwsNeuronErf\n - AwsNeuronGelu\n - AwsNeuronGeluApprxTanh\n - AwsNeuronGeluBackward\n - AwsNeuronSilu\n - AwsNeuronSiluBackward\n - AwsNeuronRmsNorm\n - AwsNeuronSoftmax\n - AwsNeuronSoftmaxBackward\n - AwsNeuronCollectiveMatmul\n - AwsNeuronIntMatmult\n - AwsNeuronArgMax\n - AwsNeuronArgMin\n - AwsNeuronTopK\n - AwsNeuronDropoutMaskV1\n - AwsNeuronCustomNativeKernel\n - AwsNeuronCustomOp\n - AwsNeuronDevicePrint\n - ResizeNearest\n - ResizeBilinear\n - ResizeNearestGrad\n - AwsNeuronLNCShardingConstraint\n - AwsNeuronTransferWithStaticRing\n - AwsNeuronModuleMarkerStart-Forward\n - AwsNeuronModuleMarkerStart-Backward\n - AwsNeuronModuleMarkerEnd-Forward\n - AwsNeuronModuleMarkerEnd-Backward\n - NeuronBoundaryMarker-Start\n - NeuronBoundaryMarker-End\n\nErroneous code example:\n\n.. code-block:: python\n\n    def lowering(ctx, x_val):\n        result_type = ir.RankedTensorType(x_val.type)\n        # This target name will not be recognized by HandleCustomCall\n        return hlo.CustomCallOp(\n            [result_type],\n            [x_val],\n            call_target_name=\"UNRECOGNIZED_TARGET\",\n            has_side_effect=ir.BoolAttr.get(False),\n        ).results\n\nUse a supported custom call target:\n\n.. code-block:: python\n\n    def lowering(ctx, x_val):\n        result_type = ir.RankedTensorType(x_val.type)\n        return hlo.CustomCallOp(\n            [result_type],\n            [x_val],\n            call_target_name=\"AwsNeuronSilu\",\n            has_side_effect=ir.BoolAttr.get(False),\n            backend_config=ir.StringAttr.get(\"\"),\n            api_version=ir.IntegerAttr.get(ir.IntegerType.get_signless(32), 2),\n        ).results\n"
  },
  {
    "path": "compiler/error-codes/EOOM001.rst",
    "content": ".. _error-code-eoom001:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EOOM001.\n\nNCC_EOOM001\n===========\n\n**Error message**: The combined memory needed for the model tensors exceeds the high-bandwidth memory limit.\n\nThe memory usage consists of:\n\n- I/O tensors: Input and output activation tensors\n- Internal allocations: Scratchpad memory for intermediate computations\n- SBUF spills: Data that cannot fit in on-chip SBUF memory and must spill to HBM\n\nThere are several ways to potentially fix this issue.\n\n1. Simply reduce the batch/tensor size if possible\n2. Utilize pipeline/tensor parallelism via neuronx-distributed\n\nShort snippet of tensor parallelism:\n\n.. code-block:: python\n\n    class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):\n        def __init__(self, config, position_embedding_type=None):\n            super().__init__(config, position_embedding_type)\n            self.query = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.key = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.value = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            # Since we shard the number of attention heads across tensor parallel\n            # ranks, each rank would have a subset of heads, hence, we update\n            # the num_attention_heads here.\n            tp_size = parallel_state.get_tensor_parallel_size()\n            self.num_attention_heads = self.num_attention_heads // tp_size\n            self.all_head_size = self.all_head_size // tp_size\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/EOOM002.rst",
    "content": ".. _error-code-eoom002:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EOOM002.\n\nNCC_EOOM002\n===========\n\n**Error message**: The combined memory needed for the model tensors exceeds the high-bandwidth memory limit.\n\nThe memory usage consists of:\n\n- I/O tensors: Input and output activation tensors\n- Internal allocations: Scratchpad memory for intermediate computations\n- SBUF spills: Data that cannot fit in on-chip SBUF memory and must spill to HBM\n\nThere are several ways to potentially fix this issue.\n\n1. Simply reduce the batch/tensor size if possible\n2. Utilize pipeline/tensor parallelism via neuronx-distributed\n\nShort snippet of tensor parallelism:\n\n.. code-block:: python\n\n    class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):\n        def __init__(self, config, position_embedding_type=None):\n            super().__init__(config, position_embedding_type)\n            self.query = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.key = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.value = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            # Since we shard the number of attention heads across tensor parallel\n            # ranks, each rank would have a subset of heads, hence, we update\n            # the num_attention_heads here.\n            tp_size = parallel_state.get_tensor_parallel_size()\n            self.num_attention_heads = self.num_attention_heads // tp_size\n            self.all_head_size = self.all_head_size // tp_size\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/ESFH002.rst",
    "content": ".. _error-code-esfh002:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESFH002.\n\nNCC_ESFH002\n===========\n\n**Error message**: The compiler encountered a unsigned 64-bit integer constant with a value that cannot be safely converted to 32-bit representation. \n\nThe Neuron hardware operates on 32-bit or narrower data types and attempts to convert 64-bit integers to 32-bit. 64-bit constants that exceed the 32-bit range and cannot be safely converted will fail compilation. Try to use uint32 for constants when possible and restructure code to avoid large constants.\n\nErroneous code example:\n\n.. code-block:: python\n\n   @jax.jit\n   def foo():\n      # direct uint64 constant in arithmetic operation\n      x = jnp.array([1, 2, 3], dtype=jnp.uint64)\n      # large constant that exceeds uint32 max\n      large_constant = jnp.uint64(5_000_000_000)\n      return x + large_constant\n\nUse uint32 for constants when possible:\n\n.. code-block:: python\n\n   @jax.jit\n   def test():\n      x = jnp.array([1, 2, 3], dtype=jnp.uint32)\n      large_constant = jnp.uint32(5_000_000_000)\n      return x + large_constant"
  },
  {
    "path": "compiler/error-codes/ESPP004.rst",
    "content": ".. _error-code-espp004:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESPP004.\n\nNCC_ESPP004\n===========\n\n**Error message**: The compiler encountered a data type that is not supported for code generation.\n\nErroneous code example:\n\n.. code-block:: python\n\n    import numpy as np\n    import jax.numpy as jnp\n    import jax\n    from jax._src import dtypes\n    from jax._src.lax import lax as lax_internal\n\n    # float4_e2m1fn type not supported\n    dtype = np.dtype(dtypes.float4_e2m1fn)\n    val = lax_internal._convert_element_type(0, dtype, weak_type=False)\n\nUse a supported data type:\n\n.. code-block:: python\n\n    import numpy as np\n    import jax.numpy as jnp\n    import jax\n    from jax._src import dtypes\n    from jax._src.lax import lax as lax_internal\n\n    # float4_e2m1fn type not supported\n    dtype = jnp.bfloat16\n    val = lax_internal._convert_element_type(0, dtype, weak_type=False)\n\nMore information on supported data types https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html\n"
  },
  {
    "path": "compiler/error-codes/ESPP047.rst",
    "content": ".. _error-code-espp047:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESPP047.\n\nNCC_ESPP047\n===========\n\n**Error message**: The compiler found usage of an unsupported 8-bit floating-point data type.\n\nErroneous code example:\n\n.. code-block:: python\n\n    class Model(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.linear1 = nn.Linear(10, 20)\n            self.linear2 = nn.Linear(20, 10)\n\n        def forward(self, x):\n            x = self.linear1(x)\n            x = torch.relu(x)\n            x = self.linear2(x)\n            return x\n\n    # Unsupported 8-bit floating-point data type being used here\n    input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fn)\n\n\nTo fix this error:\n\n.. code-block:: python\n\n    class Model(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.linear1 = nn.Linear(10, 20)\n            self.linear2 = nn.Linear(20, 10)\n\n        def forward(self, x):\n            x = self.linear1(x)\n            x = torch.relu(x)\n            x = self.linear2(x)\n            return x\n\n    input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fn)\n    # Convert to a supported type\n    input_tensor = input_tensor.to(torch.float16)\n"
  },
  {
    "path": "compiler/error-codes/EUOC002.rst",
    "content": ".. _error-code-euoc002:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EUOC002.\n\nNCC_EUOC002\n===========\n\n**Error message**: An unsupported operator was used.\n\nTry using alternative operators from the full list of supported operators via `neuronx-cc list-operators --framework XLA` to workaround the limitation.\n\nBefore:\n\n.. code-block:: python\n\n    class Model(torch.nn.Module):\n        def forward(self, A, b):\n            return torch.triangular_solve(b, A)\n\nPossible workaround:\n\n.. code-block:: python\n\n    class Model(torch.nn.Module):\n        def forward(self, A, b):\n            # Although slower than triangular_solve, this is mathematically equivalent\n            A_inv = torch.inverse(A)\n            return A_inv @ b\n"
  },
  {
    "path": "compiler/error-codes/EVRF001.rst",
    "content": ".. _error-code-evrf001:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF001.\n\nNCC_EVRF001\n===========\n\n**Error message**: An unsupported operator was used.\n\nTry using alternative operators from the full list of supported operators via `neuronx-cc list-operators --framework XLA` to workaround the limitation.\n\nBefore:\n\n.. code-block:: python\n\n    class Model(torch.nn.Module):\n        def forward(self, A, b):\n            return torch.triangular_solve(b, A)\n\n\nPossible workaround:\n\n.. code-block:: python\n\n    class Model(torch.nn.Module):\n        def forward(self, A, b):\n            # Although slower than triangular_solve, this is mathematically equivalent\n            A_inv = torch.inverse(A)\n            return A_inv @ b\n"
  },
  {
    "path": "compiler/error-codes/EVRF004.rst",
    "content": ".. _error-code-evrf004:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF004.\n\nNCC_EVRF004\n===========\n\n**Error message**: Complex data types are not supported on the Neuron device.\n\nYou cannot use complex data types (such as ``complex64``, ``complex128``, and others) on the Neuron device directly. \n\nOne fix is to offload complex operations to CPU, like so:\n\n.. code-block:: python\n\n    x = torch.tensor([1+2j, 3+4j], dtype=torch.complex64).to('cpu')\n\n.. note::\n\n   Since data transfer between CPU and device is expensive, this is best used when complex operations are rare.\n\nYou can also address this error by manually emulating complex tensors using real and imaginary parts:\n\n.. code-block:: python\n\n    real = x.real\n    imag = x.imag\n    ...\n    # (a + bi) * (c + di)\n    real_out = a_real * b_real - a_imag * b_imag\n    imag_out = a_real * b_imag + a_imag * b_real\n"
  },
  {
    "path": "compiler/error-codes/EVRF005.rst",
    "content": ".. _error-code-evrf005:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF005.\n\nNCC_EVRF005\n===========\n\n**Error message**: The compiler found usage of F8E4M3FNUZ, F8E4M3B11FNUZ, or F8E5M2FNUZ data type which is not supported.\n\nErroneous code example:\n\n.. code-block:: python\n\n    class Model(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.linear1 = nn.Linear(10, 20)\n            self.linear2 = nn.Linear(20, 10)\n        def forward(self, x):\n            x = self.linear1(x)\n            x = torch.relu(x)\n            x = self.linear2(x)\n            return x\n    input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fnuz)\n\nTo fix this error:\n\n.. code-block:: python\n    \n    class Model(nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.linear1 = nn.Linear(10, 20)\n            self.linear2 = nn.Linear(20, 10)\n        def forward(self, x):\n            x = self.linear1(x)\n            x = torch.relu(x)\n            x = self.linear2(x)\n            return x\n    input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fnuz)\n    # Convert to a supported type\n    input_tensor = input_tensor.to(torch.float16)\n\n* More information on supported data types: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html\n"
  },
  {
    "path": "compiler/error-codes/EVRF006.rst",
    "content": ".. _error-code-evrf006:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF006.\n\nNCC_EVRF006\n===========\n\nThe compiler encountered a RNGBitGenerator operation using a random number generation algorithm other than RNG_DEFAULT.\n-----------------------------------------------------------------------------------------------------------------------\n\nEnsure that you are using standard JAX/PyTorch random APIs and not explicity specifying an RNG algorithm.\n"
  },
  {
    "path": "compiler/error-codes/EVRF007.rst",
    "content": ".. _error-code-evrf007:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF007.\n\nNCC_EVRF007\n===========\n\n**Error message**: The number of instructions generated exceeds the limit.\n\nConsider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/EVRF009.rst",
    "content": ".. _error-code-evrf009:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF009.\n\nNCC_EVRF009\n===========\n\n**Error message**: The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.\n\nThere are several ways to potentially fix this issue.\n\n1. Simply reduce the batch/tensor size if possible\n2. Utilize pipeline/tensor parallelism via neuronx-distributed\n\nShort snippet of tensor parallelism:\n\n.. code-block:: python\n\n    class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):\n        def __init__(self, config, position_embedding_type=None):\n            super().__init__(config, position_embedding_type)\n\n            self.query = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.key = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.value = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            # Since we shard the number of attention heads across tensor parallel\n            # ranks, each rank would have a subset of heads, hence, we update\n            # the num_attention_heads here.\n            tp_size = parallel_state.get_tensor_parallel_size()\n            self.num_attention_heads = self.num_attention_heads // tp_size\n            self.all_head_size = self.all_head_size // tp_size\n\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/EVRF010.rst",
    "content": ".. _error-code-evrf010:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF010.\n\nNCC_EVRF010\n===========\n\n**Error message**: The compiler encountered simultaneous use of input and kernel dilation, which is not supported.\n\nErroneous code example:\n\n.. code-block:: python\n\n    x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)\n    kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)\n\n    result = lax.conv_general_dilated(\n        x,\n        kernel,\n        window_strides=(1, 1),\n        padding=((2, 2), (2, 2)),\n        lhs_dilation=(2, 2), # input dilation\n        rhs_dilation=(2, 2), # kernel dilation\n        dimension_numbers=('NHWC', 'HWIO', 'NHWC')\n    )\n\n\nIf possible, use only only input or kernel dilation:\n\n.. code-block:: python\n\n    x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)\n    kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)\n\n    result = lax.conv_general_dilated(\n        x,\n        kernel,\n        window_strides=(1, 1),\n        padding=((2, 2), (2, 2)),\n        lhs_dilation=(1, 1), # no input dilation\n        rhs_dilation=(2, 2),\n        dimension_numbers=('NHWC', 'HWIO', 'NHWC')\n    )\n\nOr apply dilation manually and apply convolution to the remainder.\n"
  },
  {
    "path": "compiler/error-codes/EVRF011.rst",
    "content": ".. _error-code-evrf011:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF011.\n\nNCC_EVRF011\n===========\n\n**Error message**: The compiler encountered strided convolution combined with dilated input, which is not supported.\n\nErroneous code example:\n\n.. code-block:: python\n\n    x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)\n    kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)\n\n    result = lax.conv_general_dilated(\n        x,\n        kernel,\n        window_strides=(2, 2),    # strided convolution\n        padding=((2, 2), (2, 2)),\n        lhs_dilation=(2, 2),      # and dilated input\n        rhs_dilation=(1, 1),\n        dimension_numbers=('NHWC', 'HWIO', 'NHWC')\n    )\n\n\nIf possible, remove stride or input dilation:\n\n.. code-block:: python\n\n    x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)\n    kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)\n\n    result = lax.conv_general_dilated(\n        x, kernel,\n        window_strides=(2, 2),  \n        padding=((2, 2), (2, 2)),\n        lhs_dilation=(1, 1),    # remove input dilation\n        rhs_dilation=(1, 1),\n        dimension_numbers=('NHWC', 'HWIO', 'NHWC')\n    )\n\n\nOr apply upsampling and downsampling separately.\n"
  },
  {
    "path": "compiler/error-codes/EVRF013.rst",
    "content": ".. _error-code-evrf013:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF013.\n\nNCC_EVRF013\n===========\n\n**Error message**: TopK does not support int32 or int64 input tensors.\n\nErroneous code example:\n\n.. code-block:: python\n\n    def forward(self, x):\n        # assume x is an integer tensor\n        # error: cannot call TopK on integer dtypes\n        k = 5\n        values, indices = torch.topk(x, k=k, dim=-1)\n        return values, indices\n\n\nTo fix this error, you can cast your tensor to a supported floating point dtype.\n\n.. code-block:: python\n\n    def forward(self, x):\n        x = x.float()\n        k = 5\n        values, indices = torch.topk(x, k=k, dim=-1)\n        return values, indices\n"
  },
  {
    "path": "compiler/error-codes/EVRF015.rst",
    "content": ".. _error-code-evrf015:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF015.\n\nNCC_EVRF015\n===========\n\n**Error message**: The compiler encountered a custom call instruction with a target name that is not recognized.\n\nThe Neuron compiler currently recognizes the following custom call targets:\n\n - AwsNeuronErf\n - AwsNeuronGelu\n - AwsNeuronGeluApprxTanh\n - AwsNeuronGeluBackward\n - AwsNeuronSilu\n - AwsNeuronSiluBackward\n - AwsNeuronRmsNorm\n - AwsNeuronSoftmax\n - AwsNeuronSoftmaxBackward\n - AwsNeuronCollectiveMatmul\n - AwsNeuronIntMatmult\n - AwsNeuronArgMax\n - AwsNeuronArgMin\n - AwsNeuronTopK\n - AwsNeuronDropoutMaskV1\n - AwsNeuronCustomNativeKernel\n - AwsNeuronCustomOp\n - AwsNeuronDevicePrint\n - ResizeNearest\n - ResizeBilinear\n - ResizeNearestGrad\n - AwsNeuronLNCShardingConstraint\n - AwsNeuronTransferWithStaticRing\n - AwsNeuronModuleMarkerStart-Forward\n - AwsNeuronModuleMarkerStart-Backward\n - AwsNeuronModuleMarkerEnd-Forward\n - AwsNeuronModuleMarkerEnd-Backward\n - NeuronBoundaryMarker-Start\n - NeuronBoundaryMarker-End\n\nErroneous code example:\n\n.. code-block:: python\n\n    def lowering(ctx, x_val):\n        result_type = ir.RankedTensorType(x_val.type)\n        # This target name will not be recognized by HandleCustomCall\n        return hlo.CustomCallOp(\n            [result_type],\n            [x_val],\n            call_target_name=\"UNRECOGNIZED_TARGET\",\n            has_side_effect=ir.BoolAttr.get(False),\n        ).results\n\n\nUse a supported custom call target:\n\n.. code-block:: python\n\n    def lowering(ctx, x_val):\n        result_type = ir.RankedTensorType(x_val.type)\n        return hlo.CustomCallOp(\n            [result_type],\n            [x_val],\n            call_target_name=\"AwsNeuronSilu\",\n            has_side_effect=ir.BoolAttr.get(False),\n            backend_config=ir.StringAttr.get(\"\"),\n            api_version=ir.IntegerAttr.get(ir.IntegerType.get_signless(32), 2),\n        ).results\n\n"
  },
  {
    "path": "compiler/error-codes/EVRF016.rst",
    "content": ".. _error-code-evr016:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVR016.\n\nNCC_EVRF016\n===========\n\nThe NCC_EVRF016 error is raised when the Neuron compiler detects that you are trying to use an integer or boolean type with one of the restricted reduction functions.\n\n**Error message**: The scatter-reduce operation cannot perform reduction logic if the data being scattered or the destination tensor is using an integer or boolean data type.\n\nThe hardware instructions used on the Neuron device for these specific scatter-and-reduce functions are optimized for and limited to floating-point arithmetic. When the compiler detects that you are trying to use an integer or boolean type with one of the restricted reduction functions, it stops the compilation process to prevent a hardware crash or incorrect calculation.\n\n**Example of the error**\n\nThe following example shows the **NCC\\_EVRF016** error because the :code:`input_tensor` is defined using an integer data type (:code:`torch.int32`) while being used with a reduction function (:code:`reduce='sum'`) in the :code:`scatter_reduce_` operation.\n\n.. code-block:: python\n\n    def forward(self, input_tensor, indices_tensor, src_tensor):\n        output = input_tensor.clone()\n        \n        output.scatter_reduce_(\n            dim=1,\n            index=indices_tensor,\n            src=src_tensor,\n            reduce='sum',\n        )\n        return output\n\n    # ERROR: using integer dtype with scatter-reduce\n    input_tensor = torch.zeros(BATCH_SIZE, DIM_SIZE, dtype=torch.int32)\n    ...\n\n**How to fix**\n\nTo fix this error, you must cast your input and source tensors to a floating-point data type (e.g., torch.float32 or torch.bfloat16).\n\n.. code-block:: python\n\n    def forward(self, input_tensor, indices_tensor, src_tensor):\n        output = input_tensor.clone()\n        \n        output.scatter_reduce_(\n            dim=1,\n            index=indices_tensor,  \n            src=src_tensor,        \n            reduce='sum',\n        )\n        return output\n\n    # FIXED: changed to float32\n    # now works with scatter-reduce\n    input_tensor = torch.zeros(BATCH_SIZE, DIM_SIZE, dtype=torch.float32)\n    ...\n"
  },
  {
    "path": "compiler/error-codes/EVRF017.rst",
    "content": ".. _error-code-evrf017:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF017.\n\nNCC_EVRF017\n===========\n\n**Error message**: The compiler encountered a reduce-window operation with base dilation (input dilation) greater than 1, which is not supported.\n\nErroneous code example:\n\n.. code-block:: python\n\n    result = lax.reduce_window(\n        x, -jnp.inf, lax.max,\n        window_dimensions=(1, 1, 1, 1),\n        window_strides=(1, 1, 1, 1),\n        padding='VALID',\n        base_dilation=(1, 2, 1, 1) # ERROR: applying base dilation of 2 in dimension 1\n    )\n\nIf possible, change base dilation to be all 1s:\n\n.. code-block:: python\n\n    result = lax.reduce_window(\n        x, -jnp.inf, lax.max,\n        window_dimensions=(1, 1, 1, 1),\n        window_strides=(1, 1, 1, 1),\n        padding='VALID',\n        base_dilation=(1, 1, 1, 1) # FIXED: all values are 1 (no dilation)\n    )\n\nOr consider manual dilation if necessary.\n"
  },
  {
    "path": "compiler/error-codes/EVRF018.rst",
    "content": ".. _error-code-evrf018:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF018.\n\nNCC_EVRF018\n===========\n\n**Error message**: The compiler encountered a reduce-window operation with window dilation greater than 1, which is not supported.\n\nErroneous code example:\n\n.. code-block:: python\n\n    result = lax.reduce_window(\n        jnp.ones((1, 4, 4, 1)), -jnp.inf, lax.max,\n        window_dimensions=(1, 2, 2, 1),\n        window_strides=(1, 1, 1, 1),\n        padding='VALID',\n        window_dilation=(1, 2, 2, 1) # 2 is greater than 1\n    )\n\n\nIf possible, remove window_dilation or change values to be all 1s:\n\n.. code-block:: python\n\n    result = lax.reduce_window(\n        jnp.ones((1, 4, 4, 1)), -jnp.inf, lax.max,\n        window_dimensions=(1, 2, 2, 1),\n        window_strides=(1, 1, 1, 1),\n        padding='VALID',\n        window_dilation=(1, 1, 1, 1) \n    )\n\nOr consider manual dilation if necessary.\n"
  },
  {
    "path": "compiler/error-codes/EVRF019.rst",
    "content": ".. _error-code-evrf019:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF019.\n\nNCC_EVRF019\n===========\n\n**Error message**: The compiler encountered a reduce-window operation with more or less than 2 operands. Support for reduce_window is available for exactly one input tensor and one initial value for reduction.\n\nErroneous code example:\n\n.. code-block:: python\n\n    # reduce-window operation with more or less than 2 operands is not supported\n    # 4 operands are being provided instead of 2\n    lax.reduce_window(\n        (x, x),               # ERROR: a tuple of two input tensors\n        (-jnp.inf, jnp.inf),  # ERROR: a tuple of two initial values\n        lambda a, b: (jnp.maximum(a[0], b[0]), jnp.minimum(a[1], b[1])),\n        window_dimensions=(1, 2, 2, 1),\n        window_strides=(1, 2, 2, 1),\n        padding='VALID'\n    )\n\n\nIf possible, split multi-operand reduce_window with multiple single-operand reduce_window operations.\n\n.. code-block:: python\n\n    # For max pooling\n    # 2 operands are correctly being provided\n    max_pool = lax.reduce_window(\n        x,         # FIXED: a single input tensor\n        -jnp.inf,  # FIXED: a single initial value\n        lax.max,\n        window_dimensions=(1, 2, 2, 1),\n        window_strides=(1, 2, 2, 1),\n        padding='VALID'\n    )\n    \n    # For min pooling\n    # 2 operands are correctly being provided\n    min_pool = lax.reduce_window(\n        x,        # FIXED: a single input tensor\n        jnp.inf,  # FIXED: a single initial value\n        lax.min,\n        window_dimensions=(1, 2, 2, 1),\n        window_strides=(1, 2, 2, 1),\n        padding='VALID'\n    )\n"
  },
  {
    "path": "compiler/error-codes/EVRF022.rst",
    "content": ".. _error-code-evrf022:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF022.\n\nNCC_EVRF022\n===========\n\n**Error message**: Shift-right-arithmetic operation on non 32-bit inputs is not supported. Cast the first argument's data type to be S32, U32, or F32.\n\nErroneous code example:\n\n.. code-block:: python\n\n    def forward(self, input, other):\n        return torch.bitwise_right_shift(input, other)\n    \n    # This will be the first argument and must be 32-bit\n    input = torch.tensor([16, 32, 64], dtype=torch.int16)\n    # The second argument can be non 32-bit\n    other = torch.tensor([1, 2, 3], dtype=torch.int16)\n\n\nTo fix this error:\n\n.. code-block:: python\n\n    def forward(self, input, other):\n        return torch.bitwise_right_shift(input, other)\n\n    # Correctly setting the first argument to be 32-bit\n    input = torch.tensor([16, 32, 64], dtype=torch.int32)\n    other = torch.tensor([1, 2, 3], dtype=torch.int16)\n"
  },
  {
    "path": "compiler/error-codes/EVRF031.rst",
    "content": ".. _error-code-evrf031:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF031.\n\nNCC_EVRF031\n===========\n\n**Error message**: The compiler encountered a scatter out-of-bounds error. The indices created via iota instruction contain values that are beyond the size of the operand dimension.\n\nErroneous code example:\n\n.. code-block:: python\n\n    # size 3 in dimension 0\n    operand = jnp.zeros((3, 4), dtype=jnp.float32)\n\n    # iota generates indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n    indices = lax.iota(jnp.int32, 10) # ERROR: size 10 > operand dimension 3\n    indices = indices.reshape(10, 1)\n\n    updates = jnp.ones((10, 4), dtype=jnp.float32) # ERROR: 10 updates but operand only has 3 rows\n\n    result = lax.scatter(\n        operand,\n        indices, # ERROR: index values in [0, 10) but operand dimension only allows indices in [0, 3)\n        updates,\n        lax.ScatterDimensionNumbers(\n        update_window_dims=(1,),\n        inserted_window_dims=(0,),\n        scatter_dims_to_operand_dims=(0,)\n        )\n    )\n\n\nEnsure that the iota size matches the operand dimension size:\n\n.. code-block:: python\n\n    N = 3\n    D = 4\n    operand = jnp.zeros((N, D), dtype=jnp.float32)\n\n    # FIXED: match iota size to operand dimension\n    indices = lax.iota(jnp.int32, N) # size N is same as operand dimension\n    indices = indices.reshape(N, 1)\n\n    # FIXED: updates size matches operand dimension\n    updates = jnp.ones((N, D), dtype=jnp.float32)\n\n    result = lax.scatter(\n        operand,\n        indices, # FIXED: indices now in valid range [0, 3)\n        updates,\n        lax.ScatterDimensionNumbers(\n        update_window_dims=(1,),\n        inserted_window_dims=(0,),\n        scatter_dims_to_operand_dims=(0,)\n        )\n    )\n"
  },
  {
    "path": "compiler/error-codes/EXSP001.rst",
    "content": ".. _error-code-exsp001:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EXSP001.\n\nNCC_EXSP001\n===========\n\nThe combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit. \n------------------------------------------------------------------------------------------------------\n\nThere are several ways to potentially fix this issue.\n\n1. Simply reduce the batch/tensor size if possible\n2. Utilize pipeline/tensor parallelism via neuronx-distributed\n\nShort snippet of tensor parallelism:\n\n.. code-block:: python\n\n    class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):\n        def __init__(self, config, position_embedding_type=None):\n            super().__init__(config, position_embedding_type)\n\n            self.query = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.key = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            self.value = ColumnParallelLinear(config.hidden_size,\n                                            self.all_head_size,\n                                            gather_output=False)\n            # Since we shard the number of attention heads across tensor parallel\n            # ranks, each rank would have a subset of heads, hence, we update\n            # the num_attention_heads here.\n            tp_size = parallel_state.get_tensor_parallel_size()\n            self.num_attention_heads = self.num_attention_heads // tp_size\n            self.all_head_size = self.all_head_size // tp_size\n\n\nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/EXTP004.rst",
    "content": ".. _error-code-extp004:\n\n.. meta::\n   :description: AWS Neuron SDK Graph Compiler error code documentation for error EXTP004.\n\nNCC_EXTP004\n===========\n\n**Error message**: The number of instructions generated exceeds the limit.\n\nConsider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n        \nFor more information: \n\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide\n- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html\n"
  },
  {
    "path": "compiler/error-codes/index.rst",
    "content": ".. meta::\n    :description: \"Neuron Compiler error code documentation home.\"\n    :date-modified: 12/02/2025\n\n.. _ncc-errors-home:\n\nNeuron Compiler Error Codes\n============================\n\nThis page lists the error codes you can encounter while developing with the Neuron Compiler. For more details on any individual error, click the link for that error code in the table below.\n\n.. list-table::\n   :header-rows: 1\n\n   * - Error Code\n     - Error Message\n     - Recommendation\n   * - :ref:`NCC_EARG001 <error-code-earg001>`\n     - Unsupported Logical Neuron Core (LNC) configuration.\n     - You attempted to use a Logical Neuron Core configuration that is not supported by the target Neuron architecture.\n   * - :ref:`NCC_EBIR023 <error-code-ebir023>`\n     - MLP kernel intermediate size exceeds the maximum supported value of 4096.\n     - Consider tiling large intermediate tensors in your kernel to stay within the supported limit, or increase tensor parallelism to shard the intermediate dimension across more cores.\n   * - :ref:`NCC_EBVF030 <error-code-ebvf030>`\n     - The number of instructions generated exceeds the limit.\n     - Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n   * - :ref:`NCC_EHCA005 <error-code-ehca005>`\n     - The compiler encountered a custom call instruction with a target name that is not recognized.\n     - Use a supported custom call target from the list of recognized targets.\n   * - :ref:`NCC_EOOM001 <error-code-eoom001>`\n     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.\n     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.\n   * - :ref:`NCC_EOOM002 <error-code-eoom002>`\n     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.\n     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.\n   * - :ref:`NCC_ESFH002 <error-code-esfh002>`\n     - The compiler encountered a unsigned 64-bit integer constant with a value that cannot be safely converted to 32-bit representation.\n     - Try to use uint32 for constants when possible and restructure code to avoid large constants.\n   * - :ref:`NCC_ESPP004 <error-code-espp004>`\n     - The compiler encountered a data type that is not supported for code generation.\n     - Use a supported data type as listed in the Neuron documentation.\n   * - :ref:`NCC_ESPP047 <error-code-espp047>`\n     - Unsupported 8-bit floating-point data type.\n     - The compiler found usage of an unsupported 8-bit floating-point data type. 
Convert to a supported type like torch.float16.\n   * - :ref:`NCC_EUOC002 <error-code-euoc002>`\n     - An unsupported operator was used.\n     - Try using alternative operators from the full list of supported operators via ``neuronx-cc list-operators --framework XLA`` to work around the limitation.\n   * - :ref:`NCC_EVRF001 <error-code-evrf001>`\n     - An unsupported operator was used.\n     - Try using alternative operators from the full list of supported operators to work around the limitation.\n   * - :ref:`NCC_EVRF004 <error-code-evrf004>`\n     - Complex data types are not supported on the Neuron device.\n     - You cannot use complex data types (such as ``complex64``, ``complex128``, and others) on the Neuron device directly.\n   * - :ref:`NCC_EVRF005 <error-code-evrf005>`\n     - Unsupported F8E4M3FNUZ, F8E4M3B11FNUZ, or F8E5M2FNUZ data type.\n     - The compiler found usage of unsupported 8-bit floating-point data types. Convert to a supported type like torch.float16.\n   * - :ref:`NCC_EVRF006 <error-code-evrf006>`\n     - The compiler encountered a RNGBitGenerator operation using a random number generation algorithm other than RNG_DEFAULT.\n     - Ensure that you are using standard JAX/PyTorch random APIs and not explicitly specifying an RNG algorithm.\n   * - :ref:`NCC_EVRF007 <error-code-evrf007>`\n     - The number of instructions generated exceeds the limit.\n     - Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n   * - :ref:`NCC_EVRF009 <error-code-evrf009>`\n     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.\n     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.\n
   * - :ref:`NCC_EVRF010 <error-code-evrf010>`\n     - The compiler encountered simultaneous use of input and kernel dilation, which is not supported.\n     - If possible, use only input or kernel dilation, not both simultaneously.\n   * - :ref:`NCC_EVRF011 <error-code-evrf011>`\n     - The compiler encountered strided convolution combined with dilated input, which is not supported.\n     - If possible, remove stride or input dilation, or apply upsampling and downsampling separately.\n   * - :ref:`NCC_EVRF013 <error-code-evrf013>`\n     - TopK does not support integer input tensors (int32, int64).\n     - The TopK operation cannot be performed on integer data types.\n   * - :ref:`NCC_EVRF015 <error-code-evrf015>`\n     - The compiler encountered a custom call instruction with a target name that is not recognized.\n     - Use a supported custom call target from the list of recognized targets.\n   * - :ref:`NCC_EVRF016 <error-code-evrf016>`\n     - The scatter-reduce operation cannot perform reduction logic if the data being scattered or the destination tensor is using an integer or boolean data type.\n     - Cast your input and source tensors to a floating-point data type (e.g., torch.float32 or torch.bfloat16).\n   * - :ref:`NCC_EVRF017 <error-code-evrf017>`\n     - Reduce-window operation with base dilation greater than 1 is not supported.\n     - Change base dilation to be all 1s or consider manual dilation if necessary.\n   * - :ref:`NCC_EVRF018 <error-code-evrf018>`\n     - Reduce-window operation with window dilation greater than 1 is not supported.\n     - Remove window_dilation or change values to be all 1s, or consider manual dilation if necessary.\n   * - :ref:`NCC_EVRF019 <error-code-evrf019>`\n     - The compiler encountered a reduce-window operation with more or less than 2 operands.\n     - If possible, split multi-operand reduce_window with multiple single-operand reduce_window operations.\n
   * - :ref:`NCC_EVRF022 <error-code-evrf022>`\n     - Shift-right-arithmetic operation on non 32-bit inputs is not supported. Cast the first argument's data type to be S32, U32, or F32.\n     - The first argument of the shift must use a 32-bit data type. Cast it to int32, uint32, or float32.\n   * - :ref:`NCC_EVRF031 <error-code-evrf031>`\n     - The compiler encountered a scatter out-of-bounds error.\n     - Ensure that the iota size matches the operand dimension size.\n   * - :ref:`NCC_EXSP001 <error-code-exsp001>`\n     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.\n     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.\n   * - :ref:`NCC_EXTP004 <error-code-extp004>`\n     - The number of instructions generated exceeds the limit.\n     - Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.\n\n.. toctree::\n    :hidden:\n    :maxdepth: 1\n\n    EARG001\n    EBIR023\n    EBVF030\n    EHCA005\n    EOOM001\n    EOOM002\n    ESFH002\n    ESPP004\n    ESPP047\n    EUOC002\n    EVRF001\n    EVRF004\n    EVRF005\n    EVRF006\n    EVRF007\n    EVRF009\n    EVRF010\n    EVRF011\n    EVRF013\n    EVRF015\n    EVRF016\n    EVRF017\n    EVRF018\n    EVRF019\n    EVRF022\n    EVRF031\n    EXSP001\n    EXTP004\n"
  },
  {
    "path": "compiler/index.rst",
    "content": ".. _neuron_cc:\n\nNeuron Graph Compiler\n======================\n\nThe Neuron Graph Compiler is a sophisticated compilation system that transforms Machine Learning models from various frameworks (TensorFlow, MXNet, PyTorch, XLA HLO) into highly optimized code for AWS Neuron accelerators. It performs deep analysis of model structure, applies hardware-specific optimizations, and generates executable code tailored for maximum performance on Neuron hardware.\n\nThe Neuron compiler is available in two versions to support different AWS ML accelerator architectures:\n \n* **neuronx-cc**: The newer XLA-based compiler supporting NeuronCores v2 architecture (Trn1, Inf2, Trn1n, Trn2). This compiler leverages the XLA (Accelerated Linear Algebra) framework to provide advanced optimizations for modern ML workloads.\n* **neuron-cc**: The TVM-based compiler supporting NeuronCores v1 architecture (Inf1). This compiler uses the TVM (Tensor Virtual Machine) framework as its foundation.\n\nKey capabilities of the Neuron Graph Compiler include:\n\n* **Performance optimization**: Intelligently converts FP32 operations to more efficient formats (BF16/FP16/TF32/FP8) with configurable precision-performance tradeoffs. By default, the compiler automatically casts FP32 matrix multiplication operations to BF16 for optimal performance while maintaining accuracy.\n\n* **Model-specific optimizations**: Provides specialized optimizations for different model architectures:\n  * **Generic**: Applies general optimizations suitable for all model types\n  * **Transformer**: Implements specific optimizations for transformer-based architectures like BERT, GPT, and other attention-based models\n  * **U-Net**: Applies specialized memory optimizations for U-Net architectures to prevent performance-impacting data transfers\n\n* **Distributed training support**: Enables efficient large language model (LLM) training through distribution strategies that shard parameters, gradients, and optimizer states across data-parallel workers.\n\n* **Advanced memory management**: Optimizes memory usage for large models through techniques like model sharding across multiple NeuronCores, with configurable logical NeuronCore settings to control sharding degree.\n\n* **Optimization levels**: Provides multiple optimization levels (1-3) to balance compilation time against runtime performance, allowing users to choose the appropriate tradeoff for their workflow.\n\n* **Mixed precision support**: Offers fine-grained control over precision and performance through auto-casting options, supporting multiple numeric formats (FP32, TF32, FP16, BF16, FP8) with different strengths in dynamic range and numeric precision.\n\nThe compilation process is typically transparent to users, as the compiler is invoked automatically within ML frameworks through Neuron Framework plugins. Models are analyzed, optimized, and compiled into a NEFF file (Neuron Executable File Format), which is then loaded by the :doc:`Neuron Runtime </neuron-runtime/index>` for execution on Neuron devices.\n\n.. grid:: 1 \n   :gutter: 3\n\n   .. grid-item-card:: Neuron Graph Compiler Component Release Notes\n      :link: /release-notes/components/compiler\n      :link-type: doc\n\n      Review the Neuron Graph Compiler release notes for all versions of the Neuron SDK.\n\n.. tab-set::\n\n   .. tab-item:: Neuron Graph Compiler (neuronx-cc) for Trn1 & Inf2\n\n      .. grid:: 1 \n         :gutter: 3\n\n         .. 
grid-item-card:: CLI Reference Guide\n            :link: neuron-compiler-cli-reference-guide\n            :link-type: ref\n\n            Neuron Compiler CLI Reference Guide\n\n         .. grid-item-card:: Graph Compiler Developer Guide\n            :link: neuronx-cc-training-mixed-precision\n            :link-type: ref\n\n            Mixed precision training guide\n\n         .. grid-item-card:: Graph Compiler Error Code Reference\n            :link: ncc-errors-home\n            :link-type: ref\n\n            Error code reference\n\n         .. grid-item-card:: How To Convolute Kernels in UNet Training Models\n            :link: implement-convolution-kernels-unet\n            :link-type: ref\n\n            Learn how to modify UNet training models to use convolution kernels with the AWS Neuron SDK. \n\n         .. grid-item-card:: Graph Compiler FAQ\n            :link: neuronx_compiler_faq\n            :link-type: ref\n\n            Frequently asked questions\n\n\n   .. tab-item:: Neuron Graph Compiler (neuron-cc) for Inf1\n\n      .. grid:: 1 \n         :gutter: 3\n\n         .. grid-item-card:: Graph Compiler API Reference Guide\n            :link: neuron-compiler-cli-reference\n            :link-type: ref\n\n            Neuron Compiler CLI Reference\n\n         .. grid-item-card:: Graph Compiler Developer Guide\n            :link: neuron-cc-training-mixed-precision\n            :link-type: ref\n\n            Mixed precision training guide\n\n         .. grid-item-card:: Graph Compiler FAQ\n            :link: neuron_compiler_faq\n            :link-type: ref\n\n            Frequently asked questions\n\n\n.. toctree::\n    :maxdepth: 2\n    :hidden:\n\n    /compiler/neuronx-cc\n    /compiler/neuron-cc\n    Error Codes </compiler/error-codes/index>\n    Release Notes </release-notes/components/compiler>\n"
  },
  {
    "path": "compiler/neuron-cc/api-reference-guide.rst",
    "content": "API Reference Guide\n===================\n\n.. toctree::\n    :maxdepth: 1\n\n    /compiler/neuron-cc/command-line-reference"
  },
  {
    "path": "compiler/neuron-cc/command-line-reference.rst",
    "content": ".. _neuron-compiler-cli-reference:\n\nNeuron compiler CLI Reference Guide (``neuron-cc``)\n===================================================\n\nThis document describes the command line interface of the Neuron\ncompiler. This reference is not relevant for applications that run\nneuron-cc from within a machine learning framework (TensorFlow-Neuron\nfor example) since these options are passed from the framework directly\nto neuron-cc.\n\nUsing neuron-cc on the command line may be desirable for applications\nthat do not use a framework, or customize existing frameworks. It is\nalso possible to supply CLI commands to the framework as options to be\npassed through to the compiler.\n\nUsage\n--------\n\nOptional parameters are shown in square brackets. See the individual\nframework guides for the correct syntax.\n\n.. _neuron_cli:\n\n.. rubric:: Neuron Compiler CLI\n\n.. program:: neuron-cc\n\n.. option:: neuron-cc [options] <command> [parameters]\n\nCommon options for the Neuron CLI:\n\n    - :option:`--verbose` (string) default=“WARN”:\n\n        Valid values:\n\n        -  :option:`DEBUG`\n        -  :option:`INFO`\n        -  :option:`WARN`\n        -  :option:`ERROR`\n\n\n\nUse :option:`neuron-cc <command> --help` for information on a specific command.\n\nAvailable Commands:\n~~~~~~~~~~~~~~~~~~~\n\n-  :option:`compile`\n-  :option:`list-operators`\n\n\n.. option:: neuron-cc compile [parameters]\n\n    Compile a model for use on the AWS Inferentia Machine Learning Accelerator.\n\n    .. code-block::\n\n        neuron-cc compile <file names> --framework <value> --io-config <value> [--neuroncore-pipeline-cores <value>] [--enable-saturate-infinity] [--enable-fast-loading-neuron-binaries] [--enable-fast-context-switch] [--fp32-cast cast-method] [--fast-math cast-method] [--output <value>]\n\n    **Compile Parameters:**\n\n    - :option:`<file names>`: Input containing model specification. The number\n      of arguments required varies between frameworks:\n\n        -  **TENSORFLOW**: A local filename or URI of a TensorFlow Frozen\n           GraphDef (.pb); or the name of a local directory containing a\n           TensorFlow SavedModel.\n\n           See\n           https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto\n           for the associated .proto schema for TensorFlow Frozen GraphDefs. See\n           https://www.tensorflow.org/guide/saved_model for more information on\n           the SavedModel format.\n\n        -  **MXNET**: List of local filenames or URIs where input architecture\n           .json file and parameter .param file are stored. These contains\n           information related to the architecture of your graph and associated\n           parameters, respectively.\n\n\n    - :option:`--framework` (string): Framework in which the model was trained.\n\n      Valid values:\n\n        - :option:`TENSORFLOW`\n        - :option:`MXNET`\n        - :option:`XLA`\n\n    - :option:`--neuroncore-pipeline-cores` (int) (default=1): Number of neuron cores\n      to be used in \"NeuronCore Pipeline\" mode. This is different from data\n      parallel deployment (same model on multiple neuron cores). Refer to\n      Runtime/Framework documentation for data parallel deployment options.\n\n      Compile for the given number of\n      neuron cores so as to leverage NeuronCore Pipeline mode.\n\n      .. 
note::\n        This is not used to define the number of Neuron Cores to be used in a data\n        parallel deployment (ie the same model on multiple Neuron Cores). That\n        is a runtime/framework configuration choice.\n\n    - :option:`--output` (string) (default=“out.neff”): Filename where compilation\n      output (NEFF archive) will be recorded.\n\n    - :option:`--io-config` (string): Configuration containing the names and shapes\n      of input and output tensors.\n\n      The io-config can be specified as a local filename, a URI, or a string\n      containing the io-config itself.\n\n      The io-config must be formatted as a JSON object with two members\n      “inputs” and “outputs”. “inputs” is an object mapping input tensor names\n      to an array of shape and data type. “outputs” is an array of output\n      tensor names. Consider the following example:\n\n      .. code-block:: json\n\n        {\n         \"inputs\": {\n            \"input0:0\": [[1,100,100,3], \"float16\"],\n            \"input1:0\": [[1,100,100,3], \"float16\"]\n         },\n         \"outputs\": [\"output:0\"]\n        }\n\n    - :option:`--enable-saturate-infinity` : Convert +/- infinity values to MAX/MIN_FLOAT for certain computations that have a high risk of generating Not-a-Number (NaN) values. There is a potential performance impact during model execution when this conversion is enabled.\n\n\n    - :option:`--enable-fast-loading-neuron-binaries` : Write the compilation\n      output (NEFF archive) in uncompressed format which results\n      in faster loading of the archive during inference.\n\n    - :option:`--enable-fast-context-switch` : Optimize for faster model switching\n      rather than inference latency. This results in overall faster system\n      performance when your application switches between models frequently\n      on the same neuron core (or set of cores). The optimization\n      triggered by this option for example defers loading some weight\n      constants until the start of inference.\n\n    - :option:`--fast-math` : Controls tradeoff between performance and accuracy for fp32 operators. See more suggestions on how to use this option with the below arguments in :ref:`neuron-cc-training-mixed-precision`.\n\n\n        - ``all`` (Default): enables all optimizations that improve performance. This option can potentially lower precision/accuracy.\n\n        - ``none`` : Disables all optimizations that improve performance. This option will provide best precision/accuracy.\n\n        - Tensor transpose options\n\n            - ``fast-relayout``: Only enables fast relayout optimization to improve performance by using the matrix multiplier for tensor transpose. The data type used for the transpose is either FP16 or BF16, which is controlled by the ``fp32-cast-xxx`` keyword.\n\n            - ``no-fast-relayout``: Disables fast relayout optimization which ensures that tensor transpose is bit-accurate (lossless) but slightly slower.\n\n\n        - Casting options\n\n            - ``fp32-cast-all`` (Default): Cast all FP32 operators to BF16 to achieve highest performance and preserve dynamic range. Same as setting ``--fp32-cast all``.\n\n            - ``fp32-cast-all-fp16``: Cast all FP32 operators to FP16 to achieve speed up and increase precision versus BF16. Same setting as ``--fp32-cast all-fp16``.\n\n            - ``fp32-cast-matmult``: Only cast FP32 operators that use Neuron Matmult engine to BF16 while using FP16 for matmult-based transpose to get better accuracy. 
Same as setting ``--fp32-cast matmult``.\n\n            - ``fp32-cast-matmult-bf16``: Cast only FP32 operators that use Neuron Matmult engine (including matmult-based transpose) to BF16 to preserve dynamic range. Same as setting ``--fp32-cast matmult-bf16``.\n\n            - ``fp32-cast-matmult-fp16``: Cast only FP32 operators that use Neuron Matmult engine (including matmult-based transpose) to fp16 to better preserve precision. Same as setting ``--fp32-cast matmult-fp16``.\n\n\n\n        .. important ::\n\n            * ``all`` and ``none`` are mutually exclusive\n\n            * ``all`` is equivalent to using ``fp32-cast-all fast-relayout`` (best performance)\n\n            * ``none`` is equivalent to using ``fp32-cast-matmult-bf16 no-fast-relayout`` (best accuracy)\n\n            * ``fp32-cast-*`` options are mutually exclusive\n\n            * ``fast-relayout`` and ``no-fast-relayout`` are mutually exclusive\n\n            * The ``fp32-cast-*`` and ``*-fast-relayout`` options will overwrite the default behavior in ``all`` and ``none``.\n\n            * For backward compatibility, the ``--fp32-cast`` option has higher priority over ``--fast-math``. It will overwrite the FP32 casting options in any of the ``--fast-math`` options if ``--fp32-cast`` option is present explicitly.\n\n\n    - :option:`--fp32-cast` : Refine the automatic casting of fp32 tensors. This is being replaced by a newer --fast-math.\n\n        .. important ::\n\n            * ``--fp32-cast`` option is being deprecated and ``--fast-math`` will replace it in future releases.\n\n            * ``--fast-math`` is introducing the ``no-fast-relayout`` option to enable lossless transpose operation.\n\n\n        The ``--fp32-cast`` is an interface for controlling the performance and accuracy tradeoffs. Many of the ``--fast-math`` values invoke (override) it.\n\n        - ``all`` (default): Cast all FP32 operators to BF16 to achieve speed up and preserve dynamic range.\n\n        - ``matmult``: Cast only FP32 operators that use Neuron Matmult engine to BF16 while using fp16 for matmult-based transpose to get better accuracy.\n\n        - ``matmult-fp16``: Cast only FP32 operators that use Neuron Matmult engine (including matmult-based transpose) to fp16 to better preserve precision.\n\n        - ``matmult-bf16``: Cast only FP32 operators that use Neuron Matmult engine (including matmult-based transpose) to BF16 to preserve dynamic range.\n\n        - ``all-fp16``: Cast all FP32 operators to FP16 to achieve speed up and better preserve precision.\n\n\n\n\n    **Log Levels:**\n\n        Logs at levels “trace”, “debug”, and “info” will be written to STDOUT.\n\n        Logs at levels “warn”, “error”, and “fatal” will be written to STDERR.\n\n    **Exit Status**\n\n        **0** - Compilation succeeded\n\n        **>0** - An error occurred during compilation.\n\n    **Examples**\n\n\n        Compiling a saved TensorFlow model:\n\n        .. code-block:: shell\n\n           neuron-cc compile test_graph_tfmatmul.pb --framework TENSORFLOW --io-config test_graph_tfmatmul.config\n\n        Compiling a MXNet model:\n\n        .. code-block:: shell\n\n           neuron-cc compile lenet-symbol.json lenet-0001.params --framework MXNET --neuroncore-pipeline-cores 2 --output file.neff\n\n        Compiling an XLA HLO:\n\n        .. code-block:: shell\n\n           neuron-cc compile bert-model.hlo --framework XLA  --output file.neff\n\n.. _neuron-cc-list-operators:\n\n.. option:: neuron-cc list-operators [parameters]\n\n    .. 
_description-1:\n\n        Returns a newline ('n') separated list of operators supported by the NeuronCore.\n\n        -  **TENSORFLOW**: Operators will be formatted according to the value\n           passed to the associated REGISTER_OP(“OperatorName”) macro.\n\n           See https://www.tensorflow.org/guide/create_op#define_the_op_interface\n           for more information regarding operator registration in TensorFlow.\n\n        -  **MXNET**: Operator names will be formatted according to the value\n           passed to the associated NNVM_REGISTER_OP(operator_name) macro.\n\n        -  **XLA**: Operator names will be formatted according to the value used by XLA compiler in XlaBuilder.\n\n           See https://www.tensorflow.org/xla/operation_semantics for more information regarding XLA operator semantics in XLA interface.\n\n    .. code-block:: shell\n\n        neuron-cc list-operators --framework <value>\n\n    .. _options-1:\n\n    - :option:`--framework` (string): Framework in which the operators were\n      registered.\n\n      Valid values:\n\n        - :option:`TENSORFLOW`\n        - :option:`MXNET`\n        - :option:`XLA`\n\n    **Exit Status**\n\n    **0** - Call succeeded\n\n    **>0** - An error occurred\n\n    **Example**\n\n    .. code-block:: shell\n\n       $ neuron-cc list-operators --framework TENSORFLOW\n       AddN\n       AdjustContrastv2\n       CheckNumbers\n       ...\n"
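As an illustration of the casting and relayout controls described under ``--fast-math`` above, a compile invocation might combine them as follows (a sketch reusing the example model files from this page; check ``neuron-cc compile --help`` for the exact syntax accepted by your compiler version):\n\n.. code-block:: shell\n\n   neuron-cc compile test_graph_tfmatmul.pb --framework TENSORFLOW --io-config test_graph_tfmatmul.config --fast-math fp32-cast-matmult no-fast-relayout --output out.neff\n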
  },
  {
    "path": "compiler/neuron-cc/developer-guide.rst",
    "content": "Developer Guide\n===================\n\n.. toctree::\n    :maxdepth: 1\n    \n    /about-neuron/appnotes/neuron-cc/mixed-precision"
  },
  {
    "path": "compiler/neuron-cc/faq.rst",
    "content": ".. _neuron_compiler_faq:\n\nNeuron Compiler FAQ (``neuron-cc``)\n===================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhere can I compile to Neuron?\n---------------------------------\n\nThe one-time compilation step from the standard framework-level model to\nNEFF binary may be performed on any EC2 instance or even\non-premises.\n\nWe recommend using a high-performance compute server of choice (C5 or\nz1d instance types), for the fastest compile times and ease of use with\na prebuilt `DLAMI <https://aws.amazon.com/machine-learning/amis/>`__.\nDevelopers can also install Neuron in their own environments; this\napproach may work well for example when building a large fleet for\ninference, allowing the model creation, training and compilation to be\ndone in the training fleet, with the NEFF files being distributed by a\nconfiguration management application to the inference fleet.\n\nMy current Neural Network is based on FP32, how can I use it with Neuron?\n-------------------------------------------------------------------------\n\nDevelopers who want to train their models in FP32 for best accuracy can\ncompile and deploy them with Neuron. The Neuron compiler automatically converts\nFP32 to internally supported datatypes, such as FP16 or BF16.\nYou can find more details about FP32 data type support\nand performance and accuracy tuning\nin :ref:`neuron-cc-training-mixed-precision`.\nThe Neuron compiler preserves the application interface - FP32 inputs and outputs.\nTransferring such large tensors may become a bottleneck for your application.\nTherefore, you can improve execution time by casting the inputs and outputs to\nFP16 or BF16 in the ML framework prior to compilation for Inferentia.\n\nWhat are some of the important compiler defaults I should be aware of?\n-----------------------------------------------------------------------\n\nThe compiler compiles the input graph for a single NeuronCore by default. Using the :option:`--neuroncore-pipeline-cores` option directs the compiler to\npartition so as to run on a specified number of NeuronCores. This number can\nbe less than the total available NeuronCores on an instance.\nSee :ref:`inferentia-arch` for more information on NeuronCores.\n\nWhich operators does Neuron support?\n---------------------------------------\n\nsee :ref:`neuron-supported-operators`.\n\nYou can also use the \"neuron-cc list-operators\" command on the cli to list the\noperators. See :ref:`neuron-cc-list-operators`\n\nIf your model contains operators missing from the above list, and you can't reach your performance goals, please\npost a message on the Neuron developer forum or open a github issue to let us know.\n\nAny operators that Neuron doesn't support?\n---------------------------------------------\n\nModels with control-flow and dynamic shapes are not supported. You will\nneed to partition the model using the framework prior to compilation.\nSee the :ref:`neuron-cc`.\n\nWill I need to recompile again if I updated runtime/driver version?\n----------------------------------------------------------------------\n\nThe compiler and runtime are committed to maintaining compatibility for\nmajor version releases with each other. The versioning is defined as\nmajor.minor, with compatibility for all versions with the same major\nnumber. If the versions mismatch, an error notification is logged and\nthe load will fail. 
This will then require the model to be recompiled.\n\nI have a NEFF binary, how can I tell which compiler version generated it?\n---------------------------------------------------------------------------\n\nWe will bring a utility out to help with this soon.\n\nHow long does it take to compile?\n------------------------------------\n\nIt depends on the model and its size and complexity, but this generally\ntakes a few minutes.\n"
  },
  {
    "path": "compiler/neuron-cc.rst",
    "content": ".. _neuron-cc-index:\n\nNeuron Compiler for Inf1\n========================\n\n.. toctree::\n    :maxdepth: 1\n\n    API Reference Guide </compiler/neuron-cc/api-reference-guide>\n    CLI Reference </compiler/neuron-cc/command-line-reference>\n    Developer Guide </compiler/neuron-cc/developer-guide>\n    FAQ </compiler/neuron-cc/faq>"
  },
  {
    "path": "compiler/neuronx-cc/api-reference-guide/index.rst",
    "content": ".. _neuron-compiler-cli-reference-guide:\n\nNeuron Compiler CLI Reference Guide (``neuronx-cc``)\n====================================================\n\nThis document describes the command line interface of the Neuron Compiler.\n\nThis reference is not relevant for applications that run the Neuron Compiler from within a machine learning framework (:ref:`PyTorch-Neuron <pytorch-neuronx-programming-guide>` for example) since these options are passed from the framework directly to the compiler. Using the compiler command line may be desirable for applications that do not use a framework or customize existing frameworks. It is also possible to specify compiler options within the framework which will forward these options to the compiler using :ref:`NEURON_CC_FLAGS <pytorch-neuronx-envvars>`.\n\n.. contents:: Table of Contents\n  :local:\n  :depth: 3\n\nUsage\n-----\n\n*Optional parameters are shown in square brackets.*\n\n.. _neuron_cli:\n\n.. rubric:: Neuron Compiler Command-Line Interface\n\n.. program:: neuronx-cc\n\n.. option:: neuronx-cc <command> [parameters]\n\nAvailable Commands\n------------------\n\n-  ``compile``\n-  ``list-operators``\n\nCommon parameters for the Neuron CLI:\n\n- ``--help``: Display a usage message of compiler options.\n    Use ``neuronx-cc <command> --help`` for information on a specific command.\n\n\n.. _neuronx-cc-compile:\n\n'compile' Command\n-----------------\n\n.. option:: neuronx-cc compile [parameters]\n\n  .. _description-1:\n\n  Compile a model for use on the AWS Machine Learning Accelerator.\n\n\n  .. code-block:: shell\n\n     neuronx-cc compile <model_files>\n     --framework <framework_name>\n     --target <instance_family>\n     [--model-type <model>]\n     [--auto-cast <cast_mode>]\n     [--auto-cast-type <data_type>]\n     [--distribution-strategy <distribution_type>]\n     [--logical-nc-config <shard_degree>], or [-lnc <shard_degree>]\n     [--optlevel <opt_level>], or [-O <opt_level>]\n     [--enable-mixed-precision-accumulation]\n     [--enable-saturate-infinity]\n     [--enable-fast-context-switch]\n     [--enable-fast-loading-neuron-binaries]\n     [--logfile <filename>]\n     [--output <filename>]\n     [--verbose <level>]\n\n\n  Parameters\n  ~~~~~~~~~~\n\n  - ``<model_files>``: Input containing model specification.\n      The number of arguments required varies between frameworks:\n\n      - **XLA**: A local filename of a HLO file (hlo.pb) generated via XLA. See `hlo.proto <https://github.com/tensorflow/tensorflow/blob/73c8e20101ae93e9f5ff0b58f68be0b70eca44c5/tensorflow/compiler/xla/service/hlo.proto>`_ for the .proto description and `inspect-compiled-programs <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/index.md#user-content-inspect-compiled-programs>`_ for more information on how to generate such files.\n\n  - ``--framework <framework_name>``: Framework used to generate training model.\n\n    Valid values:\n\n    - ``XLA``\n\n  - ``--target <instance_family>``: Name of the Neuron instance family on which the compiled model will be run.\n\n    Valid values:\n\n    - ``inf2``\n    - ``trn1``\n    - ``trn1n``\n    - ``trn2``\n\n  - ``--model-type <model>``: Permit the compiler to attempt model-specific optimizations based upon type of model being compiled. 
(Default: ``generic``)\n\n    Valid values:\n\n    - ``generic``: Perform optimizations applicable to all types of inference and training models.\n    - ``transformer``: Perform optimizations specific to `Transformer <https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)>`_ models. \n    - ``unet-inference``: Perform optimizations specific to certain `U-Net <https://en.wikipedia.org/wiki/U-Net>`_ model architectures when performing inference. U-Net models often have certain structures that result in excessive performance-impacting data transfers; this option allows the compiler to apply additional memory optimizations to prevent these data transfers and also allows the compiler to map larger normalization operators which would otherwise not successfully execute.\n\n  - ``--auto-cast <cast_mode>``: Controls how the compiler makes tradeoffs between performance and accuracy for FP32 operations. (Default: ``none``)\n\n    Valid values:\n\n    - ``none``: (default) Leave all data types as defined in the model. Do not apply auto-casting data type optimizations.\n    - ``matmult``: Only cast FP32 operations that use the Neuron matrix-multiplication engine.\n    - ``all``: Cast all FP32 operations to achieve highest performance. This option can potentially lower precision/accuracy.\n\n    A more complete discussion on how to use this option and its arguments is in :ref:`Mixed Precision and Performance-accuracy Tuning for Training <neuronx-cc-training-mixed-precision>`.\n\n    .. note:: If the ``--auto-cast`` option is specified, the ``--auto-cast-type`` compiler flag can be optionally set to define which lower-precision data type the compiler should use.\n\n  - ``--auto-cast-type <data_type>``: When auto-cast mode is enabled, cast the FP32 operators to the lower-precision data type specified by this option. (Default: ``bf16``)\n\n    Valid values:\n\n    - ``bf16``: Cast the FP32 operations selected via the ``--auto-cast`` option to BF16 to achieve highest performance and preserve dynamic range.\n    - ``fp16``: Cast the FP32 operations selected via the ``--auto-cast`` option to FP16 to achieve improved performance relative to FP32 and increased precision relative to BF16.\n    - ``tf32``: Cast the FP32 operations selected via the ``--auto-cast`` option to TensorFloat-32.\n    - ``fp8_e4m3``: Cast the FP32 operations selected via the ``--auto-cast`` option to a signed 8-bit floating point represented as a 4-bit exponent and 3-bit mantissa. \n\n\n    .. note:: If multiple competing options are specified then the option right-most on the command line will supercede previous options.\n\n  - ``--distribution-strategy <distribution_type>``: Permit the compiler to attempt model-specific optimizations based upon type of model being compiled. (Default: ``generic``)\n\n    Valid values:\n\n    - ``llm-training``: Enable the compiler to perform optimizations applicable to large language model (LLMS) training runs that  shard parameters, gradients, and optimizer states across data-parallel workers. This is equivalent to the previously documented option argument value of ``NEMO``, which will be deprecated in a future release.\n\n  - ``--logical-nc-config <shard_degree>``: Instructs the compiler to shard the input graph across physical NeuronCore accelerators. Possible numeric values are {1, 2}. 
(Only available on trn2; Default: ``2``)\n\n    Valid values:\n\n    - ``1``: instructs the compiler to shard the input graph across 1 physical NeuronCore, i.e., do not perform any input graph sharding.\n    - ``2``: [default on trn2] instructs the compiler to shard the input graph across 2 physical NeuronCores.\n\n  - ``--optlevel <opt_level>``: Specify the level of optimization the compiler should perform. Possible numeric values are {1, 2, 3}. (Default: ``2``)\n\n    Valid values:\n\n    - ``1``: enables the core performance optimizations in the compiler, while also minimizing compile time.\n    - ``2``: [default] provides the best balance between model performance and compile time.\n    - ``3``: may provide additional model execution performance but may incur longer compile times and higher host memory usage during model compilation.\n\n    .. note:: This option supersedes and deprecates the ``--enable-experimental-O1`` option introduced in an earlier release.\n\n  - ``--enable-mixed-precision-accumulation``: **Enabled by default**. Performs intermediate calculations of accumulation operators (such as softmax and layernorm) in FP32 and casts the result to the model-designated datatype. This improves the operator's resulting accuracy.\n\n  - ``--disable-mixed-precision-accumulation``: Disables mixed precision accumulation, which is enabled by default. Disabling it may improve performance at the cost of reduced accuracy for certain operators.\n
  - ``--enable-saturate-infinity``: Convert +/- infinity values to MAX/MIN_FLOAT for compiler-introduced matrix-multiply transpose computations that have a high risk of generating Not-a-Number (NaN) values. There is a potential performance impact during model execution when this conversion is enabled. (Only needed on trn1; while the trn2 compiler will accept this flag for compatibility reasons, it has no effect on the compilation.)\n\n  - ``--enable-fast-context-switch``: Optimize for faster model switching rather than execution latency.\n      This option will defer loading some weight constants until the start of model execution. This results in overall faster system performance when your application switches between models frequently on the same Neuron Core (or set of cores).\n\n  - ``--enable-fast-loading-neuron-binaries``: Save the compilation output file in an uncompressed format.\n      This creates executable files which are larger in size but faster for the Neuron Runtime to load into memory during model execution.\n\n  - ``--logfile <filename>``: Filename where compiler writes log messages. (Default: “log-neuron-cc.txt”).\n\n  - ``--output <filename>``: Filename where compilation output (NEFF archive) will be recorded. (Default: “file.neff”)\n\n  - ``--verbose <level>``: Specify the level of output produced by the compiler. (Default: ``warning``)\n\n    Valid values:\n\n    - ``info``: Informational messages regarding the progress of model compilation (written to stdout).\n    - ``warning``: Diagnostic messages that report model code that is not inherently erroneous but may be risky or suggest there may have been an error (written to stderr).\n    - ``error``: The compiler detected a condition that prevents it from completing the compilation successfully (written to stderr).\n    - ``critical``: The compiler encountered an unrecoverable error and terminates immediately (written to stderr).\n    - ``debug``: Extensive information regarding the compiler's internal execution phases (written to stdout).\n\n  *Example*:\n    Compiling an XLA HLO:\n\n    .. code-block:: shell\n\n      neuronx-cc compile bert-model.hlo --framework XLA --target trn1 --model-type transformer --output bert.neff\n
\n.. _neuronx-cc-list-operators:\n\n'list-operators' Command\n------------------------\n\n.. option:: neuronx-cc list-operators [parameters]\n\n  .. _description-1:\n\n  Returns a newline (‘\\\\n’) separated list of operators supported by the Neuron Compiler.\n\n  .. code-block:: shell\n\n    neuronx-cc list-operators\n    --framework <value>\n\n  Parameters\n  ~~~~~~~~~~\n\n  - ``--framework <framework_name>``: Framework in which the operators were registered.\n\n    Valid values:\n\n    - ``XLA``: Operator names will be formatted according to the value used by XLA compiler in XlaBuilder.\n\n\n  *Example*:\n\n  .. code-block:: shell\n\n    neuronx-cc list-operators --framework XLA\n    ...\n\n\nCompiler Exit Statuses\n----------------------\n\n- **0**: Compilation succeeded\n- **>0**: An error occurred during compilation.\n"
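When compiling through a framework rather than invoking ``neuronx-cc`` directly, the same options can be forwarded with the ``NEURON_CC_FLAGS`` environment variable mentioned above (a sketch; the exact set of flags depends on your model and workflow):\n\n.. code-block:: shell\n\n   export NEURON_CC_FLAGS=\"--model-type transformer -O1\"\n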
  },
  {
    "path": "compiler/neuronx-cc/developer-guide.rst",
    "content": ".. meta::\n   :description: Developer guides for the Neuron Compiler (neuronx-cc), including mixed precision training, performance tuning, and custom kernel implementation for AWS Trainium and Inferentia.\n   :keywords: neuronx-cc, Neuron Compiler, mixed precision, BF16, FP16, TF32, auto-cast, convolution kernels, UNet, performance optimization, Trainium, Inferentia\n\nDeveloper Guide\n===================\n\nLearn how to optimize your models with the Neuron Compiler (neuronx-cc). These guides cover mixed precision training, performance-accuracy tuning, and custom kernel implementations for AWS Trainium and Inferentia instances.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Mixed Precision and Performance-Accuracy Tuning\n      :link: /about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision\n      :link-type: doc\n\n      Learn how to use FP32, TF32, FP16, and BF16 data types with the Neuron Compiler's auto-cast options to balance performance and accuracy. Understand the tradeoffs between different data types and how to configure compiler settings for optimal model execution.\n\n   .. grid-item-card:: How to Use Convolution Kernels in UNet Training Models\n      :link: /compiler/neuronx-cc/how-to-convolution-in-unet\n      :link-type: doc\n\n      Modify UNet training models to use custom convolution kernels with NKI (Neuron Kernel Interface). This implementation helps avoid out-of-memory errors when training convolution-heavy models on Trainium instances.\n\n.. toctree::\n    :hidden:\n    :maxdepth: 1\n    \n    /about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision\n    /compiler/neuronx-cc/how-to-convolution-in-unet"
  },
  {
    "path": "compiler/neuronx-cc/faq.rst",
    "content": ".. _neuronx_compiler_faq:\n\nNeuron Compiler FAQ (``neuronx-cc``)\n====================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhere can I compile to Neuron?\n---------------------------------\n\nThe one-time compilation step from the standard framework-level model to\nNEFF binary may be performed on any EC2 instance or even\non-premises.\n\nWe recommend using a high-performance compute server of choice (C5 or\nz1d instance types), for the fastest compile times and ease of use with\na prebuilt `DLAMI <https://aws.amazon.com/machine-learning/amis/>`__.\nDevelopers can also install Neuron in their own environments; this\napproach may work well for example when building a large fleet for\ninference, allowing the model creation, training and compilation to be\ndone in the training fleet, with the NEFF files being distributed by a\nconfiguration management application to the inference fleet.\n\n.. _neuron-vs-neuronx:\n\nWhat is the difference between ``neuron-cc`` and ``neuronx-cc``?\n----------------------------------------------------------------\n\n* ``neuron-cc`` is the Neuron Compiler with TVM front-end, ``neuron-cc`` supports only :ref:`neuroncores-v1-arch`.\n* ``neuronx-cc`` is the Neuron Compiler with XLA front-end, ``neuronx-cc`` currently supports \n  :ref:`neuroncores-v2-arch`, ``neuronx-cc`` support of :ref:`neuroncores-v1-arch` is currently a \n  :ref:`Roadmap Item <neuron_roadmap>`.\n\nShould I use ``neuron-cc`` or ``neuronx-cc``?\n---------------------------------------------\n\nSee :ref:`neuron-vs-neuronx`\n\nMy current neural network is based on FP32, how can I use it with Neuron?\n-------------------------------------------------------------------------\n\nDevelopers who want to train their models in FP32 for best accuracy can\ncompile and deploy them with Neuron. The Neuron compiler automatically converts\nFP32 to internally supported datatypes, such as FP16 or BF16.\nYou can find more details about FP32 data type support\nand performance and accuracy tuning\nin :ref:`neuronx-cc-training-mixed-precision` or :ref:`neuron-cc-training-mixed-precision`.\nThe Neuron compiler preserves the application interface - FP32 inputs and outputs.\nTransferring such large tensors may become a bottleneck for your application.\nTherefore, you can improve execution time by casting the inputs and outputs to\nFP16 or BF16 in the ML framework prior to compilation.\n\nWhich operators does Neuron support?\n---------------------------------------\n\nYou can use the ``neuronx-cc list-operators`` command on the cli to list the operators. See :ref:`neuron-compiler-cli-reference-guide`.\n\nTo request support for new operators, open an issue on our `GitHub forum <https://github.com/aws/aws-neuron-sdk/issues/new>`_.\n\nAny operators that Neuron Compiler doesn't support?\n---------------------------------------------------\n\nModels with control-flow and dynamic shapes are not supported now. You will\nneed to partition the model using the framework prior to compilation.\n\n.. note::\n\n  Starting with :ref:`neuroncores-v2-arch` Neuron supports control-flow and dynamic shapes.\n\n  Stay tuned and follow the :ref:`Neuron Roadmap <neuron_roadmap>`.\n\nWill I need to recompile again if I updated runtime/driver version?\n----------------------------------------------------------------------\n\nThe compiler and runtime are committed to maintaining compatibility for\nmajor version releases with each other. 
The versioning is defined as\nmajor.minor, with compatibility for all versions with the same major\nnumber. If the versions mismatch, an error notification is logged and\nthe load will fail. This will then require the model to be recompiled.\n\nI have a NEFF binary, how can I tell which compiler version generated it?\n---------------------------------------------------------------------------\n\nWe will bring a utility out to help with this soon.\n\nHow long does it take to compile?\n------------------------------------\n\nIt depends on the model and its size and complexity, but this generally\ntakes a few minutes.\n\nWhy is my model producing different results compared to CPU/GPU?\n------------------------------------------------------------------\n\n:ref:`neuroncores-v2-arch` supports multiple casting modes for floating point numbers, each with\nassociated implications for performance and accuracy. The default casting mode\nis a pragmatic balance between performance and accuracy; however, on some models\nit may result in loss of precision.\n\nSee the :option:`--auto-cast` and :option:`--auto-cast-type` options in :ref:`neuron-compiler-cli-reference-guide` for details on how to adjust the casting mode.\n\nDo you support model *<insert model type>*?\n---------------------------------------------\n\n``neuronx-cc`` has explicit support for select model families using the :option:`--model-type` option, though many other model types are supported. You can also inspect supported operators using the :option:`list-operators` sub-command. See the :ref:`neuron-compiler-cli-reference-guide` for details.\nMore generally, support for new operators and models is continually being added. See our :ref:`neuron_roadmap` for details.\n"
  },
  {
    "path": "compiler/neuronx-cc/how-to-convolution-in-unet.rst",
    "content": ".. meta::\n   :description: Learn how to modify UNet training models to use convolution kernels with AWS Neuron SDK\n   :date_updated: 2025-09-09\n\n.. _implement-convolution-kernels-unet:\n\n=======================================================\nHow to Use Convolution Kernels in UNet Training Models\n=======================================================\n\nTask overview\n-------------\nThis topic discusses how to modify UNet training models to use convolution kernels with the AWS Neuron SDK. This implementation helps avoid out-of-memory errors seen when performing training on the convolution-heavy UNet model.\n\nPrerequisites\n-------------\n- AWS Neuron SDK 2.26 or later: Required for kernel implementation support\n- trn1.32xlarge instance: Needed for model training  \n- Existing UNet implementation: Base model to be modified\n- PyTorch-Neuron environment: Required for neural network operations\n\nInstructions\n------------\n\n**1: Import required dependencies**\n\n.. code-block:: python\n\n   import torch\n   import torch.nn as nn\n   import torch.nn.functional as F\n   from torch.autograd import Function\n   import neuronxcc.nki as nki\n   import neuronxcc.nki.language as nl\n   from neuronxcc.nki._private_kernels.conv import conv2d_dw_fb01_io01_01bf_rep_nhwc_Pcinh\n\n**2: Create the convolution wrapper function**\n\n.. code-block:: python\n\n   @nki.jit\n   def conv_wrap(img_ref, filter_ref, out_shape):\n       out_arr = nl.ndarray(shape=out_shape, dtype=img_ref.dtype, buffer=nl.hbm)\n       conv2d_dw_fb01_io01_01bf_rep_nhwc_Pcinh(img_ref, filter_ref, out_arr, **{\n           'input': img_ref.shape,\n           'filter': filter_ref.shape, \n           'output': out_shape,\n           'in_perm': [0, 1, 2, 3],\n           'kern_perm': [0, 1, 2, 3],\n           'out_perm': [0, 1, 2, 3],\n           'stride': (1, 1),\n           'padding': ((1, 1), (1, 1))})\n       return out_arr\n\n**3: Implement the custom Conv2d module**\n\n.. code-block:: python\n\n   class BwdConv2dWithKernel(nn.Module):\n       def __init__(self, in_channels, out_channels, kernel_size, padding, bias):\n           super().__init__()\n           assert padding == 1\n           assert bias == False\n           self.in_channels = in_channels\n           self.out_channels = out_channels\n           self.kernel_size = kernel_size\n           self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size, kernel_size))\n           nn.init.kaiming_uniform_(self.weight, a=0.0, mode='fan_in', nonlinearity='leaky_relu')\n\n**4: Replace standard convolutions in the UNet model**\n\n.. code-block:: python\n\n   class DoubleConvWithKernel(nn.Module):\n       def __init__(self, in_channels, out_channels, mid_channels=None):\n           super().__init__()\n           if not mid_channels:\n               mid_channels = out_channels\n           self.double_conv = nn.Sequential(\n               BwdConv2dWithKernel(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),\n               nn.BatchNorm2d(mid_channels),\n               nn.ReLU(inplace=True),\n               BwdConv2dWithKernel(mid_channels, out_channels, kernel_size=3, padding=1, bias=False),\n               nn.BatchNorm2d(out_channels),\n               nn.ReLU(inplace=True)\n           )\n\n**5: Update the UNet model initialization**\n\n.. 
code-block:: python\n\n   def __init__(self, n_channels, n_classes, bilinear=False):\n       super().__init__()\n       self.n_channels = n_channels\n       self.n_classes = n_classes\n       self.bilinear = bilinear\n       self.inc = (DoubleConvWithKernel(n_channels, 64))\n       # ... rest of initialization\n\nConfirm your work\n-----------------\n\nTo confirm successful implementation, verify that the training output resembles the following:\n\n.. code-block:: text\n\n   Training Device=xla:0 Epoch=1 Step=20 Loss=0.30803\n   Training Device=xla:0 Epoch=2 Step=560 Loss=0.01826\n\nCheck for:\n\n- No out-of-memory errors during execution\n- Decreasing loss values across epochs\n\nCommon issues\n-------------\n\n.. rubric:: Memory Errors\n\n- Solution: Verify all standard convolutions are replaced with BwdConv2dWithKernel implementations\n\n.. rubric:: Compilation Errors\n\n- Solution: Confirm Neuron SDK version is 2.26 or later\n\n.. rubric:: Kernel Errors\n\n- Solution: Use the kernel for supported configurations. The kernel will error out in unsupported scenarios.\n\nRelated information\n-------------------\n\n- `UNet training sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/unet_image_segmentation>`_ - Sample UNet training implementation\n"
  },
  {
    "path": "compiler/neuronx-cc.rst",
    "content": ".. _neuronx-cc-index:\n\nNeuronX Compiler for Trn1 & Inf2\n=================================\n\n.. toctree::\n    :maxdepth: 1\n\n    API Reference Guide </compiler/neuronx-cc/api-reference-guide/index>\n    How-to: Convolution </compiler/neuronx-cc/how-to-convolution-in-unet>\n    Developer Guide </compiler/neuronx-cc/developer-guide>\n    FAQ </compiler/neuronx-cc/faq>"
  },
  {
    "path": "conf.py",
    "content": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common options. For a full\n# list see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\n\n# -- Path setup --------------------------------------------------------------\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n\nimport datetime\nimport os\nimport sys\n\nsys.path.append(os.path.abspath(\"./_ext\"))\nsys.path.append(os.path.abspath(\"./nki/api\"))\nsys.path.append(os.path.abspath(\"./nki/_ext\"))\nsys.path.append(os.path.abspath(\"./frameworks/torch/torch-neuron/\"))\nsys.path.append(os.path.abspath(\"./_static\"))\n\n\n# get environment variables\ndef get_env_vars_from_gh():\n    project_name = os.environ.get(\"GIT_PROJECT_NAME\", \"aws-neuron-sdk\")\n    branch_name = os.environ.get(\"GIT_BRANCH_NAME\", \"master\")\n    branch_name = \"master\" if branch_name == \"latest\" else branch_name\n\n    return project_name, branch_name\n\n\ndef get_env_vars_from_rtd():\n    branch_name = os.environ.get(\"READTHEDOCS_VERSION_NAME\", \"master\")\n    branch_name = \"master\" if branch_name == \"latest\" else branch_name\n\n    project_name = \"aws-neuron-sdk\"\n    if os.environ.get(\"READTHEDOCS_PROJECT\") == \"awsdocs-neuron-staging\":\n        project_name = \"private-aws-neuron-sdk-staging\"\n\n    return project_name, branch_name\n\n\ndef get_env_vars():\n    \"\"\"Configure project and branch names based on environment\"\"\"\n    if os.environ.get(\"READTHEDOCS\") == \"True\":\n        return get_env_vars_from_rtd()\n    return get_env_vars_from_gh()\n\n\nproject_name, branch_name = get_env_vars()\n# -- Project information -----------------------------------------------------\n\nproject = \"AWS Neuron\"\ncopyright = \"{}, Amazon.com\".format(datetime.datetime.now().year)\nauthor = \"AWS\"\nmaster_doc = \"index\"\nhtml_title = \"AWS Neuron Documentation\"\n\n# -- General configuration ---------------------------------------------------\n\n# Add any Sphinx extension module names here, as strings. 
They can be\n# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom\n# ones.\nextensions = [\n    \"sphinxcontrib.contentui\",\n    \"nbsphinx\",\n    \"sphinx.ext.extlinks\",\n    \"sphinx.ext.intersphinx\",\n    \"sphinx_plotly_directive\",\n    \"df_tables\",\n    \"sphinxcontrib.programoutput\",\n    \"neuron_tag\",\n    \"sphinx_design\",\n    \"ablog\",\n    \"sphinx.ext.viewcode\",\n    \"sphinx.ext.napoleon\",\n    \"sphinx.ext.autodoc\",\n    \"sphinx.ext.autosummary\",\n    \"local_documenter\",\n    \"archive\",\n    \"sphinx_copybutton\",\n    \"nki_directives\",\n    \"sphinxcontrib.googleanalytics\",\n    \"sphinxcontrib.datatemplates\",\n    \"sphinxcontrib.spelling\",\n    \"sphinx_tabs.tabs\",\n]\n\n\nhtml_sidebars = {\n    \"**\": [\n        \"navbar-logo.html\",\n        \"search-field.html\",\n        \"sbt-sidebar-nav.html\",\n    ],\n    \"about-neuron/announcements/*\": [\n        \"navbar-logo.html\",\n        \"search-field.html\",\n        \"ablog/postcard.html\",\n        \"ablog/recentposts.html\",\n        \"ablog/tagcloud.html\",\n        \"ablog/categories.html\",\n        \"ablog/archives.html\",\n        \"sbt-sidebar-nav.html\",\n    ],\n}\n\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = [\n    \"_templates\",\n    \"nki/_templates/\",\n    \"_content-types/\",\n    \"libraries/nxd-inference/_templates\",\n]\n\n# List of patterns, relative to source directory, that match files and\n# directories to ignore when looking for source files.\n# This pattern also affects html_static_path and html_extra_path.\nexclude_patterns = ['_build', '_backup-rn', '_backup-setup', '_content-types','**.ipynb_checkpoints','.venv','_utilities', 'nki/_templates']\nhtml_extra_path = ['static']\n\n# remove bash/python/ipython/jupyter prompts and continuations\ncopybutton_prompt_text = r\">>> |\\.\\.\\. 
|\\$ |In \\[\\d*\\]: | {2,5}\\.\\.\\.: | {5,8}: \"\ncopybutton_prompt_is_regexp = True\n\n# nbsphinx_allow_errors = True\nnbsphinx_execute = \"never\"\n\nhtml_logo = \"images/Site-Merch_Neuron-ML-SDK_Editorial.png\"\n\nnapoleon_google_docstring = True\n\n# Turn on figure/table numbering\nnumfig = True\n\n# -- autodoc/autosummary options -------------------------------------------------\n\nautosummary_generate = True  # Turn on sphinx.ext.autosummary\n\n\n# -- more options -------------------------------------------------\n\n\nprojectblob = project_name + \"/blob/\" + branch_name\nprojecttree = project_name + \"/tree/\" + branch_name\n\nextlinks = {\n    \"mxnet-neuron\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/neuron-guide/neuron-frameworks/mxnet-neuron/%s\",\n        \"\",\n    ),\n    \"pytorch-neuron\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/neuron-guide/neuron-frameworks/pytorch-neuron/%s\",\n        \"\",\n    ),\n    \"tensorflow-neuron\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/neuron-guide/neuron-frameworks/tensorflow-neuron/%s\",\n        \"\",\n    ),\n    \"neuron-deploy\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/neuron-deploy/%s\",\n        \"\",\n    ),\n    \"neuron-tools-tree\": (\n        \"https://github.com/aws-neuron/\" + projecttree + \"/neuron-guide/neuron-tools/%s\",\n        \"\",\n    ),\n    \"mxnet-neuron-src\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/src/examples/mxnet/%s\",\n        \"\",\n    ),\n    \"pytorch-neuron-src\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/src/examples/pytorch/%s\",\n        \"\",\n    ),\n    \"tensorflow-neuron-src\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/src/examples/tensorflow/%s\",\n        \"\",\n    ),\n    \"neuron-gatherinfor-src\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/src/examples/neuron-gatherinfo/%s\",\n        \"\",\n    ),\n    \"neuron-monitor-src\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/src/examples/neuron-monitor/%s\",\n        \"\",\n    ),\n    \"compile-pt\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/archive/src/benchmark/pytorch/%s_compile.py\",\n        \"\",\n    ),\n    \"benchmark-pt\": (\n        \"https://github.com/aws-neuron/\" + projectblob + \"/archive/src/benchmark/pytorch/%s_benchmark.py\",\n        \"\",\n    ),\n    \"llama-sample\": (\n        \"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/%s.ipynb\",\n        \"\",\n    ),\n    'github':(f'https://github.com/aws-neuron/{project_name}/blob/{branch_name}/%s', '')\n}\n\n\nintersphinx_mapping = {\n    \"python\": (\"https://docs.python.org/3\", None),\n    \"numpy\": (\"https://numpy.org/doc/stable/\", None),\n    \"torch\": (\"https://pytorch.org/docs/master/\", None),\n    \"transformers\": (\"https://huggingface.co/docs/transformers/master/en/\", None),\n}\n\n# -- Options for Theme  -------------------------------------------------\n\ntop_banner_message = \"<b>Neuron 2.29.0 is released!</b> Check the <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html'>What's New</a> and <a class='reference internal' style='color:white;' href='https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html'>Release Notes</a> for more 
details.\"\n\nhtml_theme = \"sphinx_book_theme\"\nhtml_theme_options = {\n    \"repository_url\": \"https://github.com/aws-neuron/\" + project_name,\n    \"use_issues_button\": True,\n    \"use_repository_button\": True,\n    \"use_download_button\": True,\n    \"use_fullscreen_button\": True,\n    \"use_edit_page_button\": True,\n    \"home_page_in_toc\": False,\n    \"repository_branch\": branch_name,\n    \"announcement\": top_banner_message,\n    # \"navbar_persistent\": [],\n}\n\nhtml_additional_pages = {\n    \"search-google\": \"search-google.html\",\n}\n\nhtml_context = {\n    # ...\n    \"default_mode\": \"light\"\n}\n\n# The theme to use for HTML and HTML Help pages.  See the documentation for\n# a list of builtin themes.\n#\n# html_theme = 'sphinx_rtd_theme'\n\n# html_theme_options = {\n#\n#    'navigation_depth': 3\n# }\n\n\n# html_theme = \"pydata_sphinx_theme\"\n# html_theme_options = {\n#   \"use_edit_page_button\": True,\n# }\n\n# html_context = {\n#    \"github_url\": \"https://github.com\",\n#    \"github_user\": \"aws-neuron\",\n#    \"github_repo\": \"private-aws-neuron-sdk-staging\",\n#    \"github_version\": \"master\",\n#    \"doc_path\": \"/\",\n# }\n\n# -- Options for HTML output -------------------------------------------------\n\nhtml_css_files = [\"css/custom.css\", \"styles/sphinx-book-theme.css\"]\n\n# def setup(app):\n#   app.add_css_file('css/custom.css')\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\nhtml_static_path = [\"_static\"]\n\nplotly_include_source = False\nplotly_html_show_source_link = False\nplotly_html_show_formats = False\nplotly_include_directive_source = False\n\n\n# -- ABlog config -------------------------------------------------\nblog_path = \"about-neuron/announcements/index\"\nblog_post_pattern = \"about-neuron/appnotes/*.rst\"\nblog_feed_length = 5\nfontawesome_included = True\npost_show_prev_next = False\npost_auto_image = 1\npost_auto_excerpt = 2\nexecution_show_tb = \"READTHEDOCS\" in os.environ\n\n# --- Google Analytics Sphinx extension ---\n\ngoogleanalytics_id = \"G-2Q13EGB80H\"\n\n# --- for neuron-tag directive ---\n\nrst_prolog = \"\"\"\n\n.. neuron-tag::\n\n\n\"\"\"\n\nrst_epilog = \"\"\"\n\n.. neuron-tag::\n\n\"\"\"\n\n# Exclude private github from linkcheck. 
Readthedocs only exposes the ssh-agent to the 'checkout' build step, which is too early for the linkchecker to run.\nlinkcheck_ignore = [\n    r\"http://localhost:\\d+/\",\n    r\"https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.html\",\n    r\"https://github\\.com/aws-neuron/private-aws-neuron-sdk-staging/\",\n    r\"https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.html\",\n    r\"https://awsdocs-neuron-staging.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html#install-tensorflow-neuronx\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#inference\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#training\",\n    r\"https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers\",\n    r\"https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker\",\n    r\"https://github.com/awslabs/multi-model-server/blob/master/docs/management_api.md\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/dp_bert_hf_pretrain/run_dp_bert_large_hf_pretrain_bf16_s128.sh\",\n    r\" https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist.py\",\n    r\"https://github.com/pytorch/xla/blob/v1.10.0/TROUBLESHOOTING.md\",\n    r\"https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/saved_model.md\",\n    r\"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/index.md\",\n    r\"https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist.py\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb\",\n    r\"https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb\",\n    r\"https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md\",\n    r\"https://github.com/pytorch/PiPPy/blob/main/pippy/IR.py#L697\",\n    r\"https://github.com/pytorch/pytorch/blob/main/torch/fx/_symbolic_trace.py#L241\",\n    r\"https://github.com/pytorch/xla/blob/master/torch_xla/utils/checkpoint.py#L129\",\n    r\"https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/parallel_layers/layer_norm.py#L32\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py#L273C1-L289C55\",\n    r\"https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html#pytorch-neuronx-install\",\n    r\"https://github.com/google-research/bert#user-content-pre-trained-models\",\n    r\"https://github.com/google-research/bert#user-content-sentence-and-sentence-pair-classification-tasks\",\n    r\"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-retirement.html\",\n    r\"https://repost.aws/knowledge-center/eventbridge-notification-scheduled-events\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py\",\n    
r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py\",\n    r\"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb\",\n]\nlinkcheck_exclude_documents = [\n    r\"src/examples/.*\",\n    \"about-neuron/announcements/neuron1.x/announcements\",\n    r\"release-notes/.*\",\n    r\"containers/.*\",\n]\nnitpicky = False\n\n"
  },
  {
    "path": "containers/container-deployment-flows.rst",
    "content": ".. _container-deployment-flows:\n\nContainer Deployment Flows\n==========================\n\nYou can also choose one of the following combinations for running the neuron container:\n\n.. toctree::\n   :maxdepth: 1\n   \n   dlc-then-ec2-devflow\n   dlc-then-ecs-devflow\n   dlc-then-eks-devflow\n   container-sm-hosting-devflow\n"
  },
  {
    "path": "containers/container-sm-hosting-devflow.rst",
    "content": ".. _containers-byoc-hosting-devflow:\n\n.. include:: /devflows/inference/byoc-hosting-devflow.rst\n"
  },
  {
    "path": "containers/developerflows.rst",
    "content": "Containers - Developer Flows\n============================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /containers/dlc-then-ec2-devflow\n    /containers/dlc-then-ecs-devflow\n    /containers/dlc-then-eks-devflow\n    /containers/container-sm-hosting-devflow\n    /containers/dlc-then-customize-devflow\n\n\n\n.. include:: /containers/developerflows.txt\n"
  },
  {
    "path": "containers/developerflows.txt",
    "content": ".. tab-set:: \n\n    .. tab-item:: Inference\n    \n        * :ref:`containers-dlc-then-ec2-devflow`\n        * :ref:`containers-dlc-then-ecs-devflow`\n        * :ref:`containers-dlc-then-eks-devflow`\n        * :ref:`containers-byoc-hosting-devflow`\n        * :ref:`containers-dlc-then-customize-devflow`\n"
  },
  {
    "path": "containers/dlc-then-customize-devflow.rst",
    "content": ".. _containers-dlc-then-customize-devflow:\n\n.. include:: /devflows/dlc-then-customize-devflow.rst\n"
  },
  {
    "path": "containers/dlc-then-ec2-devflow.rst",
    "content": ".. _containers-dlc-then-ec2-devflow:\n\n.. include:: /devflows/inference/dlc-then-ec2-devflow.rst"
  },
  {
    "path": "containers/dlc-then-ecs-devflow.rst",
    "content": ".. _containers-dlc-then-ecs-devflow:\n\n.. include:: /devflows/inference/dlc-then-ecs-devflow.rst"
  },
  {
    "path": "containers/dlc-then-eks-devflow.rst",
    "content": ".. _containers-dlc-then-eks-devflow:\n\n.. include:: /devflows/inference/dlc-then-eks-devflow.rst"
  },
  {
    "path": "containers/dlc-then-k8s-devflow.rst",
    "content": ".. _containers-dlc-then-k8s-devflow:\n\n\n.. include:: /devflows/inference/dlc-then-k8s-devflow.rst"
  },
  {
    "path": "containers/docker-example/Dockerfile.device-plugin",
    "content": "FROM amazonlinux:2 \n\nRUN echo $'[neuron] \\n\\\nname=Neuron YUM Repository \\n\\\nbaseurl=https://yum.repos.neuron.amazonaws.com \\n\\\nenabled=1' > /etc/yum.repos.d/neuron.repo\n\nRUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\nRUN dnf install -y aws-neuron-k8-plugin\nRUN dnf install -y tar gzip\n\nENV PATH=\"/opt/aws/neuron/bin/k8s-neuron-device-plugin:${PATH}\"\n\nCMD k8s-neuron-device-plugin\n"
  },
  {
    "path": "containers/docker-example/index.rst",
    "content": "Example: Run containerized neuron application\n=============================================\n\nIntroduction:\n-------------\n\nWith this example you will learn how to run a Neuron application using\ndocker containers.\n\nPrerequisites:\n--------------\n\n-  Please ensure the steps from the guide on :ref:`tensorflow-serving`\n   were completed successfully before continuing.\n\nSteps:\n------\n\nStep 1: Start neuron-rtd container:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou may choose to use the following neuron-rtd image:\n[790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:latest], or\nbuild your own image as shown in :ref:`neuron-runtime-dockerfile`.\n\nRun neuron-rtd container as shown below. A volume must be mounted to\n:/sock where neuron-rtd will open a UDS socket. The application can\ninteract with runtime using this socket.\n\n.. code:: bash\n\n   aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 790709498068.dkr.ecr.us-east-1.amazonaws.com\n \n   docker pull 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.1.1402.0\n   docker tag 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.1.1402.0 neuron-rtd\n   mkdir /tmp/neuron_rtd_sock\n   chmod o+rwx /tmp/neuron_rtd_sock\n   docker run --device=/dev/neuron0 --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd\n   \n   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n   If using older version of neuorn(below 1.1):\n   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n   docker pull 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.0.9592.0\n   docker tag 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.0.9592.0 neuron-rtd\n   mkdir /tmp/neuron_rtd_sock\n   chmod o+rwx /tmp/neuron_rtd_sock\n   docker run --env AWS_NEURON_VISIBLE_DEVICES=\"0\" --cap-add SYS_ADMIN --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd\n\nStep 2: Start application (tensorflow serving) container:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBuild tensorflow-model-server-neuron image using provided example\ndockerfile :ref:`tensorflow-model-server-neuron-dockerfile`.\n\nRun assuming a compiled saved model was stored in s3:///my_model/\n\n.. code:: bash\n\n\n   # Note: the neuron-rtd socket directory must be mounted and pointed at using environment variable.\n   #       TensorFlow serving will use that socket to talk to Neuron-rtd\n   docker run --env NEURON_RTD_ADDRESS=unix:/sock/neuron.sock \\\n              -v /tmp/neuron_rtd_sock/:/sock \\\n              -p 8501:8501 \\\n              -p 8500:8500 \\\n              --env MODEL_BASE_PATH=s3://<my-bucket>/my_model/ \\\n              --env MODEL_NAME=my_model\n              tensorflow-model-server-neuron\n\nStep 3: Verify by running an inference!\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs shown in :ref:`tensorflow-serving`\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-inference",
    "content": "# Example pytorch neuron container\n# To build:\n#    docker build . -f Dockerfile.pt -t neuron-container:pytorch\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    docker run -it --device=/dev/neuron0 neuron-container:pytorch\n\nFROM ubuntu:24.04\n\nLABEL maintainer=\" \"\n\nRUN apt-get update -y \\\n && apt-get install -y --no-install-recommends \\\n    gnupg2 \\\n    wget \\\n    python3-pip \\\n    python3-setuptools \\\n    && cd /usr/local/bin \\\n    && pip3 --no-cache-dir install --upgrade pip \\\n    && rm -rf /var/lib/apt/lists/* \\\n    && apt-get clean\n\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com bionic main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\n# Installing Neuron Tools\nRUN apt-get update -y && apt-get install -y \\\n    aws-neuronx-tools\n\n# Sets up Path for Neuron tools\nENV PATH=\"/opt/bin/:/opt/aws/neuron/bin:${PATH}\"\n\n# Include framework tensorflow-neuron or torch-neuronx and compiler (compiler not needed for inference)\nRUN pip3 install \\\n    torch-neuronx \\\n    --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n# Include your APP dependencies here.\n# RUN ...\n\n# Define the entrypoint script that has some application code (if needed) and executes the docker run command\n# For example you can use something like below\n# COPY dockerd-libmode-entrypoint.sh /opt/bin/dockerd-entrypoint.sh\n# RUN chmod +x /opt/bin/dockerd-entrypoint.sh\n# ENTRYPOINT [\"/opt/bin/dockerd-entrypoint.sh\"]\n\nCMD [\"neuron-top\"]\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-inference-dlc",
    "content": "FROM ubuntu:24.04\n\n#SDK 1.17.1 has version 1. We skipped 1.18.0.\nLABEL dlc_major_version=\"2\"\nLABEL maintainer=\"Amazon AI\"\nLABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true\n\nARG PYTHON=python3.7\nARG PYTHON_VERSION=3.7.10\nARG TS_VERSION=0.5.2\nARG MAMBA_VERSION=4.12.0-0\n\n# See http://bugs.python.org/issue19846\nENV LANG C.UTF-8\nENV LD_LIBRARY_PATH /lib/x86_64-linux-gnu:/opt/conda/lib/:$LD_LIBRARY_PATH\nENV PATH /opt/conda/bin:$PATH\nENV SAGEMAKER_SERVING_MODULE sagemaker_pytorch_serving_container.serving:main\nENV TEMP=/home/model-server/tmp\n\nRUN apt-get update \\\n && apt-get install -y --no-install-recommends software-properties-common \\\n && add-apt-repository ppa:openjdk-r/ppa \\\n && apt-get update \\\n && apt-get install -y --no-install-recommends \\\n    build-essential \\\n    apt-transport-https \\\n    ca-certificates \\\n    cmake \\\n    curl \\\n    emacs \\\n    git \\\n    jq \\\n    libgl1-mesa-glx \\\n    libglib2.0-0 \\\n    libsm6 \\\n    libxext6 \\\n    libxrender-dev \\\n    openjdk-11-jdk \\\n    vim \\\n    wget \\\n    unzip \\\n    zlib1g-dev \\\n    libcap-dev \\\n    gpg-agent \\\n && rm -rf /var/lib/apt/lists/* \\\n && rm -rf /tmp/tmp* \\\n && apt-get clean\n\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com bionic main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\nRUN apt-get update \\\n && apt-get install -y \\\n    aws-neuron-tools \\\n && rm -rf /var/lib/apt/lists/* \\\n && rm -rf /tmp/tmp* \\\n && apt-get clean\n\n\n# https://github.com/docker-library/openjdk/issues/261 https://github.com/docker-library/openjdk/pull/263/files\nRUN keytool -importkeystore -srckeystore /etc/ssl/certs/java/cacerts -destkeystore /etc/ssl/certs/java/cacerts.jks -deststoretype JKS -srcstorepass changeit -deststorepass changeit -noprompt; \\\n    mv /etc/ssl/certs/java/cacerts.jks /etc/ssl/certs/java/cacerts; \\\n    /var/lib/dpkg/info/ca-certificates-java.postinst configure;\n\nRUN curl -L -o ~/mambaforge.sh https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-x86_64.sh \\\n && chmod +x ~/mambaforge.sh \\\n && ~/mambaforge.sh -b -p /opt/conda \\\n && rm ~/mambaforge.sh \\\n && /opt/conda/bin/conda update conda \\\n && /opt/conda/bin/conda install -c conda-forge -y \\\n    python=$PYTHON_VERSION \\\n    cython \\\n    mkl-include \\\n    mkl \\\n    parso \\\n    scipy \\\n    typing \\\n    # Below 2 are included in miniconda base, but not mamba so need to install\n    conda-content-trust \\\n    charset-normalizer \\\n && /opt/conda/bin/conda clean -ya\n\nRUN conda install -c conda-forge \\\n    opencv \\\n    scikit-learn \\\n    pandas \\\n    h5py \\\n    requests \\\n && conda clean -ya \\\n && pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \\\n && ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \\\n && pip install packaging==20.4 \\\n    enum-compat==0.0.3 \\\n    numpy==1.20.3 \\\n    ipython \\\n    # pyOpenSSL requires cryptography>=2.3, but all versions <3.3 have vulnerabilities\n    \"cryptography>=3.3.2\"\n\nRUN pip install --no-cache-dir -U \\\n    scipy \\\n    six \\\n    # install PyYAML>=5.4 to avoid conflict with latest awscli\n    \"pyYAML>=5.4,<5.5\" \\\n    \"pillow>=8.3\" \\\n    \"awscli<2\" \\\n    boto3\n\nRUN pip install neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com \\\n 
&& pip install \"torch-neuron>=1.10.2,<1.10.3\" --extra-index-url https://pip.repos.neuron.amazonaws.com \\\n && pip install torchserve==$TS_VERSION \\\n && pip install --no-deps --no-cache-dir -U torchvision==0.11.3 \\\n # Install TF 1.15.5 to override neuron-cc[tensorflow]'s installation of tensorflow==1.15.0\n && pip install -U tensorflow==1.15.5 \\\n && pip install torch-model-archiver==$TS_VERSION\n\nRUN useradd -m model-server \\\n && mkdir -p /home/model-server/tmp /opt/ml/model \\\n && chown -R model-server /home/model-server /opt/ml/model\n\nCOPY torchserve-neuron.sh /usr/local/bin/entrypoint.sh\nCOPY config.properties /home/model-server\n\nRUN chmod +x /usr/local/bin/dockerd-entrypoint.py \\\n && chmod +x /usr/local/bin/neuron-monitor.sh \\\n && chmod +x /usr/local/bin/entrypoint.sh\n\nADD https://raw.githubusercontent.com/aws/deep-learning-containers/master/src/deep_learning_container.py /usr/local/bin/deep_learning_container.py\n\nRUN chmod +x /usr/local/bin/deep_learning_container.py\n\nRUN pip install --no-cache-dir \"sagemaker-pytorch-inference==2.0.8\"\n\nRUN HOME_DIR=/root \\\n && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \\\n && unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \\\n && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \\\n && chmod +x /usr/local/bin/testOSSCompliance \\\n && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \\\n && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \\\n && rm -rf ${HOME_DIR}/oss_compliance*\n\nRUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.10/license.txt -o /license.txt\n\nEXPOSE 8080 8081\n\nCMD [\"/usr/local/bin/entrypoint.sh\"]\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-inference-dlc.rst",
    "content": ".. _inference-dlc-dockerfile:\n\nDLC sample Dockerfile for Application Container\n==============================================\n\n.. literalinclude:: Dockerfile-inference-dlc\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-libmode",
    "content": "# Example pytorch neuron container\n# To build:\n#    docker build . -f Dockerfile.pt -t neuron-container:pytorch\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    docker run -it --device=/dev/neuron0 neuron-container:pytorch\n\nFROM ubuntu:24.04\n\nLABEL maintainer=\" \"\n\nRUN apt-get update -y \\\n && apt-get install -y --no-install-recommends \\\n    gnupg2 \\\n    wget \\\n    python3-pip \\\n    python3-setuptools \\\n    && cd /usr/local/bin \\\n    && pip3 --no-cache-dir install --upgrade pip \\\n    && rm -rf /var/lib/apt/lists/* \\\n    && apt-get clean\n\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com bionic main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\n# Installing Neuron Tools\nRUN apt-get update -y && apt-get install -y \\\n    aws-neuron-tools\n\n# Sets up Path for Neuron tools\nENV PATH=\"/opt/bin/:/opt/aws/neuron/bin:${PATH}\"\n\n# Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference)\nRUN pip3 install \\\n    torch-neuron \\\n    --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n# Include your APP dependencies here.\n# RUN ...\n\n# Define the entrypoint script that has some application code (if needed) and executes the docker run command\n# For example you can use something like below\n# COPY dockerd-libmode-entrypoint.sh /opt/bin/dockerd-entrypoint.sh\n# RUN chmod +x /opt/bin/dockerd-entrypoint.sh\n# ENTRYPOINT [\"/opt/bin/dockerd-entrypoint.sh\"]\n\nCMD [\"neuron-top\"]\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-libmode.rst",
    "content": ".. _libmode-dockerfile:\n\nDockerfile for Application Container\n====================================\n\n.. literalinclude:: Dockerfile-inference\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile-tf-serving.rst",
    "content": ".. _tensorflow-model-server-neuron-dockerfile:\n\ntensorflow-model-server-neuron Dockerfile\n=========================================\n\n.. literalinclude:: Dockerfile.tf-serving\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile.mxnet-serving",
    "content": "# To build:\n#    docker build . -f Dockerfile.mxnet-serving -t mxnet-model-server-neuron\n\nFROM amazonlinux:2\n\nENV PYTHONUNBUFFERED TRUE\n\nRUN dnf install -y gcc-c++\nRUN dnf install -y python3-devel\nRUN dnf install -y java-1.8.0-openjdk\nRUN dnf install -y curl\nRUN cd /tmp \\\n    && curl -O https://bootstrap.pypa.io/get-pip.py \\\n    && python3 get-pip.py\n\nRUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1\nRUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1\nRUN pip install mxnet-neuron --index-url=https://pip.repos.neuron.amazonaws.com\nRUN pip install multi-model-server\n\n\nRUN useradd -m model-server \\\n    && mkdir -p /home/model-server/tmp\n\nCOPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh\n\nRUN mkdir -p /home/model-server/tmp/models/\n#copy your model\nCOPY mxnet_model/resnet-50_compiled.mar  /home/model-server/tmp/models/\n\nRUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \\\n    && chown -R model-server /home/model-server\n\nEXPOSE 8080 8081\n\nUSER model-server\nWORKDIR /home/model-server\nENV TEMP=/home/model-server/tmp\nENTRYPOINT [\"/usr/local/bin/dockerd-entrypoint.sh\"]\nCMD [\"serve\"]\n"
  },
  {
    "path": "containers/docker-example/inference/Dockerfile.tf-serving",
    "content": "# Example tensorflow-model-server-neuron dockerfile.\n\n# Note: tensorflow_model_server_neuron must be pointed at the model location and name using MODEL_BASE_PATH and\n# MODEL_NAME env variables. MODEL_BASE_PATH may be an s3 location.\n\n# To build:\n#    docker build . -f Dockerfile.tf-serving -t tensorflow-model-server-neuron\n\n\nFROM amazonlinux:2\n\n\n# Expose ports for gRPC and REST\nEXPOSE 8500 8501\n\nENV MODEL_BASE_PATH=/models \\\n    MODEL_NAME=model\n\nRUN echo $'[neuron] \\n\\\nname=Neuron YUM Repository \\n\\\nbaseurl=https://yum.repos.neuron.amazonaws.com \\n\\\nenabled=1' > /etc/yum.repos.d/neuron.repo\n\nRUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\nRUN dnf install -y tensorflow-model-server-neuron\nRUN mkdir -p /root/models/\n#copy your model\nCOPY tf_model/  /root/models/\nRUN ls -la /root/models/*\n\nCMD [\"/bin/sh\", \"-c\", \"/usr/local/bin/tensorflow_model_server_neuron --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=/root/models/${MODEL_NAME}\"]\n"
  },
  {
    "path": "containers/docker-example/inference/config-properties.rst",
    "content": ".. _torchserve-config-properties:\n\nTorchserve config.properties example\n====================================\n\n.. literalinclude:: config.properties\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/config.properties",
    "content": "vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError\nmodel_store=/opt/ml/model\nload_models=ALL\ninference_address=http://0.0.0.0:8080\nmanagement_address=http://0.0.0.0:8081\n# management_address=unix:/tmp/management.sock\n# number_of_netty_threads=0\n# netty_client_threads=0\n# default_response_timeout=120\n# default_workers_per_model=0\n# job_queue_size=100\n# async_logging=false\n# number_of_gpu=1\n# cors_allowed_origin\n# cors_allowed_methods\n# cors_allowed_headers\n# keystore=src/test/resources/keystore.p12\n# keystore_pass=changeit\n# keystore_type=PKCS12\n# private_key_file=src/test/resources/key.pem\n# certificate_file=src/test/resources/certs.pem\n# max_response_size=6553500\n# max_request_size=6553500\n# blacklist_env_vars=\n# decode_input_request=false\n# enable_envvars_config=false\n"
  },
  {
    "path": "containers/docker-example/inference/dockerd-libmode-entrypoint.rst",
    "content": ".. _dockerd-libmode-entrypoint:\n\nDocker Entrypoint Example - Application container\n=================================================\n\n.. literalinclude:: dockerd-libmode-entrypoint.sh\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/dockerd-libmode-entrypoint.sh",
    "content": "#!/bin/bash\nif [[ \"$1\" = \"serve\" ]]; then\n  # Start your application here!\n  # e.g: 'python my_server_app.py'\nelse\n    eval \"$@\"\nfi\n\n# prevent docker exit\ntail -f /dev/null\n"
  },
  {
    "path": "containers/docker-example/inference/torchserve-neuron.rst",
    "content": ".. _torchserve-neuron:\n\nTorchserve Example\n==================\n\n.. literalinclude:: torchserve-neuron.sh\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/inference/torchserve-neuron.sh",
    "content": "#!/bin/bash\n\nMODEL_STORE=/opt/ml/model\nTS_CONFIG=/home/model-server/config.properties\nMODEL_PATH=\"\"\n\nwhile getopts \":m:t:\" opt; do\n  case $opt in\n    m) MODEL_PATH=\"$OPTARG\"\n    ;;\n    t) TS_CONFIG=\"$OPTARG\"\n    ;;\n    \\?) echo \"Invalid option -$OPTARG\" >&2\n    ;;\n  esac\ndone\n\nprintf \"Model path: %s\\n\" \"$MODEL_PATH\"\nprintf \"TS_CONFIG: %s\\n\" \"$TS_CONFIG\"\n# Start the Model Server\nif [[ -z \"$MODEL_PATH\" ]]; then\n  torchserve --start --ts-config /home/model-server/config.properties --model-store /opt/ml/model &\nelse\n  torchserve --start --ts-config $TS_CONFIG --models $MODEL_PATH &\nfi\nstatus=$?\nif [ $status -ne 0 ]; then\n  echo \"Failed to start TF Model Server: $status\"\n  exit $status\nfi"
  },
  {
    "path": "containers/docker-example/training/Dockerfile-training-dlc",
    "content": "# Example pytorch neuron container\n# To build:\n#    docker build . -f Dockerfile.pt -t neuron-container:pytorch\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    docker run -it --net=host --device=/dev/neuron0 neuron-container:pytorch\n\n# You can find the latest Pytorch Training Image here - https://gallery.ecr.aws/neuron/pytorch-training-neuronx\nFROM public.ecr.aws/neuron/pytorch-training-neuronx:2.9.0-neuronx-py310-sdk2.27.0-ubuntu24.04\nRUN mkdir -p /opt/ml\nCOPY model.py /opt/ml/model.py\nCOPY mlp_train.py /opt/ml/mlp_train.py "
  },
  {
    "path": "containers/docker-example/training/Dockerfile-trainium-dlc.rst",
    "content": ".. _trainium-dlc-dockerfile:\n\nDockerfile for Application Container\n====================================\n\n.. literalinclude:: Dockerfile-training-dlc\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/training/mlp.rst",
    "content": ".. _mlp-train:\n\nSimple MLP train script\n========================\n\nSave the following contents as mlp_train.py\n\n.. literalinclude:: mlp_train.py\n   :linenos:\n\n\nSave the following contents as model.py\n\n.. literalinclude:: model.py\n   :linenos:"
  },
  {
    "path": "containers/docker-example/training/mlp_train.py",
    "content": "import os\nimport time\nimport torch\nfrom model import MLP\n\nfrom torchvision.datasets import mnist\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor\n\n# XLA imports\nimport torch_xla.core.xla_model as xm\n\n# Global constants\nEPOCHS = 4\nWARMUP_STEPS = 2\nBATCH_SIZE = 32\n\n# Load MNIST train dataset\ntrain_dataset = mnist.MNIST(root='./MNIST_DATA_train',\n                            train=True, download=True, transform=ToTensor())\n\ndef main():\n    # Prepare data loader\n    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE)\n\n    # Fix the random number generator seeds for reproducibility\n    torch.manual_seed(0)\n\n    # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)\n    device = 'xla'\n\n    # Move model to device and declare optimizer and loss function\n    model = MLP().to(device)\n    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n    loss_fn = torch.nn.NLLLoss()\n\n    # Run the training loop\n    print('----------Training ---------------')\n    model.train()\n    for epoch in range(EPOCHS):\n        start = time.time()\n        for idx, (train_x, train_label) in enumerate(train_loader):\n            optimizer.zero_grad()\n            train_x = train_x.view(train_x.size(0), -1)\n            train_x = train_x.to(device)\n            train_label = train_label.to(device)\n            output = model(train_x)\n            loss = loss_fn(output, train_label)\n            loss.backward()\n            optimizer.step()\n            xm.mark_step() # XLA: collect ops and run them in XLA runtime\n            if idx < WARMUP_STEPS: # skip warmup iterations\n                start = time.time()\n\n    # Compute statistics for the last epoch\n    interval = idx - WARMUP_STEPS # skip warmup iterations\n    throughput = interval / (time.time() - start)\n    print(\"Train throughput (iter/sec): {}\".format(throughput))\n    print(\"Final loss is {:0.4f}\".format(loss.detach().to('cpu')))\n\n    # Save checkpoint for evaluation\n    os.makedirs(\"checkpoints\", exist_ok=True)\n    checkpoint = {'state_dict': model.state_dict()}\n    # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu\n    # This can prevent \"XRT memory handle not found\" at end of test.py execution\n    xm.save(checkpoint,'checkpoints/checkpoint.pt')\n\n    print('----------End Training ---------------')\n\nif __name__ == '__main__':\n    main()"
  },
  {
    "path": "containers/docker-example/training/model.py",
    "content": "import torch.nn as nn\nimport torch.nn.functional as F\n\n# Declare 3-layer MLP for MNIST dataset\nclass MLP(nn.Module):\n  def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]):\n      super(MLP, self).__init__()\n      self.fc1 = nn.Linear(input_size, layers[0])\n      self.fc2 = nn.Linear(layers[0], layers[1])\n      self.fc3 = nn.Linear(layers[1], output_size)\n\n  def forward(self, x):\n      x = F.relu(self.fc1(x))\n      x = F.relu(self.fc2(x))\n      x = self.fc3(x)\n      return F.log_softmax(x, dim=1)"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile-app-rt-diff.rst",
    "content": ".. _app-rt-diff-dockerfile:\n\nDockerfile with Application and Runtime in different Container\n==============================================================\n\n.. literalinclude:: Dockerfile.app-rt-diff\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile-app-rt-same.rst",
    "content": ".. _app-rt-same-dockerfile:\n\nDockerfile with Application and Runtime in same Container\n=========================================================\n\n.. literalinclude:: Dockerfile.torch-neuron\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile-neuron-rtd.rst",
    "content": ".. _neuron-runtime-dockerfile:\n\nNeuron Runtime Dockerfile\n=========================\n\n.. literalinclude:: Dockerfile.neuron-rtd\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile-torch-neuron.rst",
    "content": ".. _torch-neuron-dockerfile:\n\ntorch-neuron Dockerfile\n=======================\n\n.. literalinclude:: Dockerfile.torch-neuron\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile.app-rt-diff",
    "content": "# Example pytorch neuron container\n# To build:\n#    docker build . -f Dockerfile.pt -t neuron-container:pytorch\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    sudo service neuron-rtd stop\n#    docker run -it --device=/dev/neuron0 -v /run/:/run --cap-add IPC_LOCK neuron-container:pytorch\n\nFROM ubuntu:18.04\n\nLABEL maintainer=\" \"\n\nRUN apt-get update -y \\\n && apt-get install -y --no-install-recommends \\\n wget \\\n gnupg2 \\\n python3-pip \\\n python3-setuptools \\\n && cd /usr/local/bin \\\n && pip3 --no-cache-dir install --upgrade pip \\\n && rm -rf /var/lib/apt/lists/* \\\n && apt-get clean\n\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com bionic main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\n# Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference)\nRUN pip3 install \\\n    torch-neuron \\\n    --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n# Include your APP dependencies here.\n# RUN/ENTRYPOINT/CMD ...\n\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile.neuron-rtd",
    "content": "# Example neuron-rtd dockerfile.\n\n# To build:\n#    docker build . -f Dockerfile.neuron-rtd -t neuron-rtd\n\n# Note: the container must start with CAP_IPC_LOCK capability\n\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    sudo service neuron-rtd stop\n#   docker run --env AWS_NEURON_VISIBLE_DEVICES=\"0\" --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock neuron-rtd\n\n\nFROM amazonlinux:2\n\nRUN echo $'[neuron] \\n\\\nname=Neuron YUM Repository \\n\\\nbaseurl=https://yum.repos.neuron.amazonaws.com \\n\\\nenabled=1' > /etc/yum.repos.d/neuron.repo\n\nRUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\nRUN dnf install -y aws-neuron-tools\nRUN dnf install -y aws-neuron-runtime\nRUN dnf install -y tar gzip\n\nENV PATH=\"/opt/aws/neuron/bin:${PATH}\"\n\nCMD neuron-rtd -g unix:/sock/neuron.sock --log-console\n"
  },
  {
    "path": "containers/docker-example/v1/inference/Dockerfile.torch-neuron",
    "content": "# Example pytorch neuron container\n# Note: a dockerd_entrypoint.sh script is required to succesfully build this image. Place the script on the same folder as the Dockerfile\n# To build:\n#    docker build . -f Dockerfile.pt -t neuron-container:pytorch\n# To run on EC2 Inf1 instances with AWS DLAMI:\n#    sudo service neuron-rtd stop\n#    docker run -it --device=/dev/neuron0 --cap-add IPC_LOCK neuron-container:pytorch\n\nFROM ubuntu:18.04\n\nLABEL maintainer=\" \"\n\nRUN apt-get update -y \\\n && apt-get install -y --no-install-recommends \\\n    gnupg2 \\\n    wget \\\n    python3-pip \\\n    python3-setuptools \\\n    libcap-dev \\\n    && cd /usr/local/bin \\\n    && pip3 --no-cache-dir install --upgrade pip \\\n    && rm -rf /var/lib/apt/lists/* \\\n    && apt-get clean\n\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com bionic main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\n# Installing Neuron Runtime and Tools\nRUN apt-get update -y && apt-get install -y \\\n    aws-neuron-runtime \\\n    aws-neuron-tools\n\n# Sets up Path for Neuron tools\nENV PATH=\"/opt/bin/:/opt/aws/neuron/bin:${PATH}\"\n\n# Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference)\nRUN pip3 install \\\n    torch-neuron \\\n    --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n# Include your APP dependencies here.\n# RUN ...\n\n# Define the entrypoint script that starts the runtime and executes the docker run command\nCOPY dockerd-entrypoint.sh /opt/bin/dockerd-entrypoint.sh\nRUN chmod +x /opt/bin/dockerd-entrypoint.sh\nENTRYPOINT [\"/opt/bin/dockerd-entrypoint.sh\"]\n\nCMD [\"neuron-top\"]\n"
  },
  {
    "path": "containers/docker-example/v1/inference/dockerd-entrypoint-app-rt-same.rst",
    "content": ".. _dockerd-entrypoint-app-rt-same:\n\nDocker Entrypoint Example - Application and Runtime in same Container\n=====================================================================\n\n.. literalinclude:: dockerd-entrypoint.sh\n   :linenos:\n"
  },
  {
    "path": "containers/docker-example/v1/inference/dockerd-entrypoint.sh",
    "content": "#!/bin/bash\nset -e\n\nwait_for_nrtd() {\n  nrtd_sock=\"/run/neuron.sock\"\n  SOCKET_TIMEOUT=300\n  is_wait=true\n  wait_time=0\n  i=1\n  sp=\"/-\\|\"\n  echo -n \"Waiting for neuron-rtd  \"\n  pid=$1\n  while $is_wait; do\n    if [ -S \"$nrtd_sock\" ]; then\n      echo \"$nrtd_sock Exist...\"\n      is_wait=false\n    else\n      sleep 1\n      wait_time=$((wait_time + 1))\n      if [ \"$wait_time\" -gt \"$SOCKET_TIMEOUT\" ]; then\n        echo \"neuron-rtd failed to start, exiting\"\n\t      cat /tmp/nrtd.log\n        exit 1\n      fi\n      printf \"\\b${sp:i++%${#sp}:1}\"\n    fi\n  done\n  cat /tmp/nrtd.log\n}\n\n# Start neuron-rtd\n/opt/aws/neuron/bin/neuron-rtd -g unix:/run/neuron.sock --log-console  >>  /tmp/nrtd.log 2>&1 &\nnrtd_pid=$!\necho \"NRTD PID: \"$nrtd_pid\"\"\n#wait for nrtd to be up (5 minutes timeout)\nwait_for_nrtd $nrtd_pid\nexport NEURON_RTD_ADDRESS=unix:/run/neuron.sock\nnrtd_present=1\n\nif [[ \"$1\" = \"serve\" ]]; then\n  # Start your application here!\n  # e.g: 'python my_server_app.py'\nelse\n    eval \"$@\"\nfi\n\n# prevent docker exit\ntail -f /dev/null\n"
  },
  {
    "path": "containers/ec2-then-ec2-devflow.rst",
    "content": ".. _containers-ec2-then-ec2-devflow:\n\n.. include:: /devflows/inference/ec2-then-ec2-devflow.rst"
  },
  {
    "path": "containers/ec2.rst",
    "content": ".. _ec2-instance:\n\nEC2 Instance\n============\n\nIntroduction\n------------\n\nUse of Neuron in containers on EC2 can be simple to achieve by following these steps\n\n    - :ref:`tutorial-docker-env-setup-for-neuron`\n    - More details on EC2 setup `can be found at <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-setup.html>`_\n\nDLC Images\n----------\n\n    - The location for DLC images for Neuron can be obtained from `here <https://github.com/aws/deep-learning-containers/blob/master/available_images.md>`_\n    - To get the list of images for neuron, the following commands can be used.\n\n      ``aws ecr list-images --registry-id 763104351884 --repository-name tensorflow-inference-neuron``\n\n      ``aws ecr list-images --registry-id 763104351884 --repository-name pytorch-inference-neuron``\n\nSetup recommendations\n---------------------\n\n    - The EC2 Inf1 instance needs to have the aws-neuron-runtime-base and aws-neruon-dkms package installed.\n    - The DLC inference container runs the framework server (like tensorflow-model-server or TorchServe) and also the neuron runtime that interacts with the neuron driver running in the host.\n    - For more details on setting up the container, check the `tensorflow <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-inference.html#deep-learning-containers-ec2-tutorials-inference-tf>`_ or `pytorch <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-inference.html#deep-learning-containers-ec2-tutorials-inference-pytorch>`_. Make sure the appropriate framework container image is used.\n\nDebug Hints\n-----------\n    - Use the docker log command to get the neuron rtd logs in the container.\n\n       ``docker logs <container-name>``\n\n    - Look for errors like the following\n        - If we see *nrtd[8]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0*, it either means that some other container is using that device or the host is running the neuron-rtd process.\n        - Check to see that host is not running neuron-rtd\n\n           ``sudo systemctl status neuron-rtd``\n"
  },
  {
    "path": "containers/faq-troubleshooting-releasenote.rst",
    "content": "Containers - FAQ, Troubleshooting & ReleaseNotes\n================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    FAQ </containers/faq>\n    troubleshooting\n    /release-notes/components/containers\n\n* :ref:`container-faq`\n* :ref:`container-troubleshooting`\n* :ref:`containers_rn`\n"
  },
  {
    "path": "containers/faq.rst",
    "content": ".. _container-faq:\n\nNeuron Containers FAQ\n=====================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 1\n\nWhere can I find DLC images\n---------------------------\n* The Inference/Training DLC images can be found `here <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#user-content-neuron-containers>`_.\n* In the `DLC release page <https://github.com/aws/deep-learning-containers/releases>`_ do a search for neuron to get the ECR repo location of specific neuron DLC release.\n\n\nWhat is OCI Neuron Hook and do we need that\n-------------------------------------------\nNeuron devices are exposed to the containers using the --device option in the docker run command.\nDocker runtime (runc) does not yet support the ALL option to expose all neuron\ndevices to the container. \n\nWith OCI neuron hook support is added to expose ALL devices to container using an environment variable,\n“AWS_NEURON_VISIBLE_DEVICES=ALL\". For more details please refer :ref:`oci neuron hook <tutorial-oci-hook>`\n\nIn Kubernetes, if we are using the device plugin version 1.7 & below, then the oci neuron hook is needed. If\nusing device plugin version >= 1.8 then oci neuron hook is not needed\n\nWhat container runtimes are supported\n-------------------------------------\nNeuron containers have been tested to work with docker, containerd, cri-o runtimes without any changes.\nIf the oci neuron hook is used then they need to be enabled in the runtime config. For more details please refer :ref:`oci neuron hook <tutorial-oci-hook>`\n\n\nHow to expose Neuron Devices to Container\n-----------------------------------------\nNeuron Device: Represents the number of Inferentia/Trainium chips in the instance. Refer :ref:`Container Devices <container-devices>` for more details\n\n\nHow to expose Neuron Cores to Container\n---------------------------------------\nNeuron Core: Represents the number of Neuron Cores in the instance. Refer :ref:`Container Cores <container-cores>` for more details. Each Inferentia1\ndevice has 4 Neuron Cores and each Inferentia2 and Trainium1 device has 2 Neuron Cores.\nWhen the devices are exposed to the containers all the cores in the device are available\nfor use in the container.  Please refer :ref:`nrt-configuration` to see how the environment variables NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES \ncan be used to assign core to containers\n\nCan Neuron Devices be shared by different Containers running in the same Host\n-----------------------------------------------------------------------------\nYes, except in Kubernetes environment where the devices cannot be shared\n\nCan Neuron Cores be shared by different Containers running in the same Host\n-----------------------------------------------------------------------------\nNo\n\nWhen would you use Neuron K8 Scheduler Extension\n-------------------------------------------------\nThe neuron cores/devices that are exposed to the container needs to be contiguous. The kubernetes device plugin\ndoes not guarantee the devices to be contiguous. The K8 Neuron Scheduler Extension takes care of \nassigning contiguous devices to the containers.\n\nHow to add EFA devices to the container\n---------------------------------------\nThe EFA devices are exposed to the container using the --device option\n\n::\n\n   --device /dev/infiniband/uverbs0 \n\nIn a Kubernetes environment, the EFA device plugin is used to detect and advertise \nthe available EFA interfaces. 
The EFA device plugin can be installed using the `Helm chart provided by Amazon EKS <https://github.com/aws/eks-charts/tree/master/stable/aws-efa-k8s-device-plugin>`_\n\n::\n\n   helm repo add eks https://aws.github.io/eks-charts\n   helm install aws-efa-k8s-device-plugin --namespace kube-system eks/aws-efa-k8s-device-plugin\n\nOnce the plugin is deployed, applications can use the resource type vpc.amazonaws.com/efa in a pod request spec\n\n::\n\n   resources:\n      limits:\n         vpc.amazonaws.com/efa: 4\n\n\nCan distributed training jobs be run without EFA devices in container\n---------------------------------------------------------------------\nNo. For distributed training jobs on Trainium, all EFA interfaces provided by trn1.32xlarge need to be\nattached to the container\n"
  },
  {
    "path": "containers/files/index-dra.rst",
    "content": ".. meta::\n    :description: Templates supporting AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes.\n    :keywords: AWS Neuron, Neuron DRA, Dynamic Resource Allocation, Kubernetes, K8s, Device Plugin\n    :date-modified: 02/05/2026\n\nAWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes: Support files\n=========================================================================\n\nThis page provides templates supporting AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes. You can view and download these files from the links below.\n\nResource Claim Specifications\n-----------------------------\n\nExample resource claim templates and pod specifications demonstrating different Neuron device allocation patterns for various workload requirements.\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 55 15\n\n   * - File Name\n     - Description\n     - Download\n   * - 1x4-connected-devices.yaml\n     - Resource claim template for allocating 4 connected Neuron devices with topology constraints for optimal performance.\n     - :download:`Download <specs/1x4-connected-devices.yaml>`\n   * - 2-node-inference-us.yaml\n     - Multi-node inference configuration for distributed workloads across 2 Trainium nodes.\n     - :download:`Download <specs/2-node-inference-us.yaml>`\n   * - 4-node-inference-us.yaml\n     - Large-scale inference setup for distributed workloads spanning 4 Trainium nodes.\n     - :download:`Download <specs/4-node-inference-us.yaml>`\n   * - all-devices.yaml\n     - Resource claim template that allocates all available Neuron devices on a trn2.48xlarge instance.\n     - :download:`Download <specs/all-devices.yaml>`\n   * - lnc-setting-trn2.yaml\n     - Logical NeuronCore configuration template optimized for Trainium2 instances.\n     - :download:`Download <specs/lnc-setting-trn2.yaml>`\n   * - specific-driver-version.yaml\n     - Example configuration for requesting specific Neuron driver versions in resource claims.\n     - :download:`Download <specs/specific-driver-version.yaml>`\n   * - us-and-lnc-config.yaml\n     - Example configuration for requesting UltraServer node with Logical NeuronCore configuration.\n     - :download:`Download <specs/us-and-lnc-config.yaml>`\n\n"
  },
  {
    "path": "containers/files/manifests/clusterrole.yaml",
    "content": "apiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n  name: neuron-dra-driver-clusterrole\nrules:\n# Required for DRA device plugin to manage ResourceSlices\n- apiGroups: [\"resource.k8s.io\"]\n  resources: [\"resourceslices\"]\n  verbs: [\"get\", \"list\", \"watch\", \"create\", \"update\", \"patch\", \"delete\"]\n# Required for DRA device plugin to read ResourceClaims\n- apiGroups: [\"resource.k8s.io\"]\n  resources: [\"resourceclaims\"]\n  verbs: [\"get\", \"list\", \"watch\"]\n# Required for DRA device plugin to read DeviceClasses\n- apiGroups: [\"resource.k8s.io\"]\n  resources: [\"deviceclasses\"]\n  verbs: [\"get\", \"list\", \"watch\"]\n  # Required to read and modify node information\n- apiGroups: [\"\"]\n  resources: [\"nodes\"]\n  verbs: [\"get\", \"list\", \"watch\", \"patch\", \"update\"]\n  # Required to modify node status\n- apiGroups: [\"\"]\n  resources: [\"nodes/status\"]\n  verbs: [\"patch\"]\n"
  },
  {
    "path": "containers/files/manifests/clusterrolebinding.yaml",
    "content": "\napiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRoleBinding\nmetadata:\n  name: neuron-dra-driver-binding\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: neuron-dra-driver-clusterrole\nsubjects:\n- kind: ServiceAccount\n  name: neuron-dra-driver-sa\n  namespace: neuron-dra-driver"
  },
  {
    "path": "containers/files/manifests/daemonset.yaml",
    "content": "apiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: neuron-dra-driver-kubelet-plugin\n  namespace: neuron-dra-driver\n  labels:\n    app: neuron-dra-driver-kubelet-plugin\nspec:\n  updateStrategy:\n    type: RollingUpdate\n    rollingUpdate:\n      maxUnavailable: 0\n      maxSurge: 1\n  selector:\n    matchLabels:\n      app: neuron-dra-driver-kubelet-plugin\n  template:\n    metadata:\n      labels:\n        app: neuron-dra-driver-kubelet-plugin\n    spec:\n      affinity:\n        nodeAffinity:\n          requiredDuringSchedulingIgnoredDuringExecution:\n            nodeSelectorTerms:\n            - matchExpressions:\n              - key: node.kubernetes.io/instance-type\n                operator: In\n                values:\n                - trn1.2xlarge\n                - trn1.32xlarge\n                - trn1n.32xlarge\n                - trn2.3xlarge\n                - trn2.48xlarge\n                - trn2n.48xlarge\n              - key: eks.amazonaws.com/compute-type\n                operator: NotIn\n                values:\n                - fargate\n                - hybrid\n                - auto\n      serviceAccountName: neuron-dra-driver-sa\n      hostNetwork: true\n      containers:\n      - name: neuron-dra-driver\n        image: NEURON_DRA_IMAGE\n        imagePullPolicy: Always\n        command: [\"k8s-neuron-dra-driver\"]\n        # args:\n        # - --v=6\n        env:\n        - name: NODE_NAME\n          valueFrom:\n            fieldRef:\n              fieldPath: spec.nodeName\n        - name: POD_UID\n          valueFrom:\n            fieldRef:\n              fieldPath: metadata.uid\n        - name: CDI_ROOT\n          value: \"/var/run/cdi\"\n        - name: KUBELET_REGISTRAR_DIRECTORY_PATH\n          value: \"/var/lib/kubelet/plugins_registry\"\n        - name: KUBELET_PLUGINS_DIRECTORY_PATH\n          value: \"/var/lib/kubelet/plugins\"\n        - name: HEALTHCHECK_PORT\n          value: \"51515\"\n        - name: NEURON_DRA_DRIVER_EMULATION_MODE\n          value: \"trn2u\"\n        resources:\n          limits:\n            cpu: 20m\n            memory: 256Mi\n          requests:\n            cpu: 10m\n            memory: 128Mi\n        securityContext:\n          privileged: true\n        volumeMounts:\n        - name: kubelet-plugins-dir\n          mountPath: /var/lib/kubelet/plugins\n        - name: kubelet-registry-dir\n          mountPath: /var/lib/kubelet/plugins_registry\n        - name: cdi-dir\n          mountPath: /var/run/cdi\n        livenessProbe:\n          grpc:\n            port: 51515\n            service: liveness\n          failureThreshold: 3\n          periodSeconds: 10\n          initialDelaySeconds: 30\n          timeoutSeconds: 5\n      volumes:\n      - name: kubelet-plugins-dir\n        hostPath:\n          path: /var/lib/kubelet/plugins\n      - name: kubelet-registry-dir\n        hostPath:\n          path: /var/lib/kubelet/plugins_registry\n      - name: cdi-dir\n        hostPath:\n          path: /var/run/cdi\n      tolerations:\n      - key: CriticalAddonsOnly\n        operator: Exists\n      - key: aws.amazon.com/neuron\n        operator: Exists\n        effect: NoSchedule\n      - key: sagemaker.amazonaws.com/node-health-status\n        operator: Equal\n        value: Unschedulable\n        effect: NoSchedule\n      # - key: \"kwok.x-k8s.io/node\"\n      #   operator: \"Exists\"\n      #   effect: \"NoSchedule\""
  },
  {
    "path": "containers/files/manifests/deviceclass.yaml",
    "content": "apiVersion: resource.k8s.io/v1beta1\nkind: DeviceClass\nmetadata:\n  name: neuron.aws.com\nspec:\n  selectors:\n  - cel:\n      expression: device.driver == \"neuron.aws.com\""
  },
  {
    "path": "containers/files/manifests/namespace.yaml",
    "content": "apiVersion: v1\nkind: Namespace\nmetadata:\n  name: neuron-dra-driver\n  labels:\n    name: neuron-dra-driver"
  },
  {
    "path": "containers/files/manifests/serviceaccount.yaml",
    "content": "apiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: neuron-dra-driver-sa\n  namespace: neuron-dra-driver"
  },
  {
    "path": "containers/files/scripts/install-dra-driver.sh",
    "content": "#!/bin/bash\n\n# Deploy Neuron DRA Driver\nset -e\n\necho \"🚀 Deploying Neuron DRA Driver...\"\n\n# Check argument\nif [ $# -ne 1 ]; then\n    echo \"Usage: $0 <image_name>\"\n    echo \"Example: $0 123456789.dkr.ecr.us-west-2.amazonaws.com/neuron-dra-driver:v1.0\"\n    exit 1\nfi\n\n# Get the script directory and set the manifests path\nSCRIPT_DIR=\"$(cd \"$(dirname \"${BASH_SOURCE[0]}\")\" && pwd)\"\nMANIFESTS_DIR=\"$SCRIPT_DIR/../../manifests\"\nDRA_IMAGE=\"$1\"\n\n# Apply all manifests in order\necho \"📝 Creating namespace...\"\nkubectl apply -f \"$MANIFESTS_DIR/namespace.yaml\"\n\necho \"🔐 Creating ServiceAccount and RBAC...\"\nkubectl apply -f \"$MANIFESTS_DIR/serviceaccount.yaml\"\nkubectl apply -f \"$MANIFESTS_DIR/clusterrole.yaml\"\nkubectl apply -f \"$MANIFESTS_DIR/clusterrolebinding.yaml\"\n\necho \"📱 Creating DeviceClass...\"\nkubectl apply -f \"$MANIFESTS_DIR/deviceclass.yaml\"\n\necho \"🔧 Deploying DRA DaemonSet...\"\n# Check if DaemonSet already exists before applying\nDAEMONSET_EXISTS=false\nif kubectl get daemonset neuron-dra-driver-kubelet-plugin -n neuron-dra-driver >/dev/null 2>&1; then\n    DAEMONSET_EXISTS=true\n    echo \"📋 DaemonSet already exists, will restart after applying...\"\nfi\n\necho \"🏷️  Using custom image: $DRA_IMAGE\"\nsed \"s|NEURON_DRA_IMAGE|$DRA_IMAGE|g\" \"$MANIFESTS_DIR/daemonset.yaml\" | kubectl apply -f -\n\n# If DaemonSet was already running, restart it to pull latest image\nif [ \"$DAEMONSET_EXISTS\" = true ]; then\n    echo \"🔄 Restarting DaemonSet to pull latest image...\"\n    kubectl rollout restart daemonset/neuron-dra-driver-kubelet-plugin -n neuron-dra-driver\n    echo \"⏳ Waiting for rollout to complete...\"\n    kubectl rollout status daemonset/neuron-dra-driver-kubelet-plugin -n neuron-dra-driver --timeout=300s\nelse\n    echo \"⏳ Waiting until pods are in a running state...\"\n    kubectl wait --for=condition=ready pod -l app=neuron-dra-driver-kubelet-plugin -n neuron-dra-driver --timeout=300s\nfi\n\necho \"✅ Deployment complete!\"\n\necho \"\"\necho \"📊 Recent logs from dra driver:\"\nkubectl logs -n neuron-dra-driver -l app=neuron-dra-driver-kubelet-plugin --tail=10\necho \"\""
  },
  {
    "path": "containers/files/specs/1x4-connected-devices.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: 1x4-connected-neurons\nspec:\n  spec:\n    devices:\n      requests:\n      - name: neurons\n        exactly:\n          deviceClassName: neuron.aws.com\n          allocationMode: ExactCount\n          count: 4\n          selectors:\n          - cel:\n              expression: \"device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\"\n      constraints:\n      - requests: [\"neurons\"]\n        matchAttribute: \"resource.aws.com/devicegroup4_id\"\n\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: pod0\n  labels:\n    app: pod\nspec:\n  containers:\n  - name: ctr0\n    image: public.ecr.aws/ubuntu/ubuntu:22.04\n    command: [\"bash\", \"-c\"]\n    args: [\"export; trap 'exit 0' TERM; sleep 9999 & wait\"]\n    resources:\n      claims:\n      - name: neurons\n  resourceClaims:\n  - name: neurons\n    resourceClaimTemplateName: 1x4-connected-neurons\n"
  },
  {
    "path": "containers/files/specs/2-node-inference-us.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: us-2-node-config\nspec:\n  spec:\n    devices:\n      requests:\n      - name: neurons\n        exactly:\n          deviceClassName: neuron.aws.com\n          selectors:\n          - cel:\n              expression: \"device.attributes['neuron.aws.com'].resourceType == 'neuron_node'\"\n          allocationMode: ExactCount\n          count: 1\n      config:\n      - requests: [\"neurons\"]\n        opaque:\n          driver: neuron.aws.com\n          parameters:\n            apiVersion: neuron.aws.com/v1\n            kind: UltraServerConfig\n            ultraserverMode: 2\n---\napiVersion: leaderworkerset.x-k8s.io/v1\nkind: LeaderWorkerSet\nmetadata:\n  name: vllm\n  annotations:\n    leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-2\nspec:\n  rolloutStrategy:\n    type: RollingUpdate\n    rollingUpdateConfiguration:\n      maxUnavailable: 1\n      maxSurge: 1\n  # Two replica groups of 2 nodes each\n  replicas: 2\n  leaderWorkerTemplate:\n    size: 2\n    restartPolicy: RecreateGroupOnPodRestart\n    leaderTemplate:\n      metadata:\n        labels:\n          role: leader\n      spec:\n        containers:\n          - name: vllm-leader\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n              - name: one-node-from-ultraserver\n        resourceClaims:\n        - name: one-node-from-ultraserver\n          resourceClaimTemplateName: us-2-node-config\n    workerTemplate:\n      metadata:\n        labels:\n          role: worker\n      spec:\n        containers:\n          - name: vllm-worker\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n              - name: one-node-from-ultraserver\n        resourceClaims:\n        - name: one-node-from-ultraserver\n          resourceClaimTemplateName: us-2-node-config\n"
  },
  {
    "path": "containers/files/specs/4-node-inference-us.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: us-4-node-config\nspec:\n  spec:\n    devices:\n      requests:\n      - name: neurons\n        exactly:\n          deviceClassName: neuron.aws.com\n          selectors:\n          - cel:\n              expression: \"device.attributes['neuron.aws.com'].resourceType == 'neuron_node'\"\n          allocationMode: ExactCount\n          count: 1\n      config:\n      - requests: [\"neurons\"]\n        opaque:\n          driver: neuron.aws.com\n          parameters:\n            apiVersion: neuron.aws.com/v1\n            kind: UltraServerConfig\n            ultraserverMode: 4\n---\napiVersion: leaderworkerset.x-k8s.io/v1\nkind: LeaderWorkerSet\nmetadata:\n  name: vllm\n  annotations:\n    leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-4\nspec:\n  rolloutStrategy:\n    type: RollingUpdate\n    rollingUpdateConfiguration:\n      maxUnavailable: 1\n      maxSurge: 1\n  # Two replica groups of 4 nodes each, i.e. two ultraservers\n  replicas: 2\n  leaderWorkerTemplate:\n    size: 4\n    restartPolicy: RecreateGroupOnPodRestart\n    leaderTemplate:\n      metadata:\n        labels:\n          role: leader\n      spec:\n        containers:\n          - name: vllm-leader\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n              - name: one-node-from-ultraserver\n        resourceClaims:\n        - name: one-node-from-ultraserver\n          resourceClaimTemplateName: us-4-node-config\n    workerTemplate:\n      metadata:\n        labels:\n          role: worker\n      spec:\n        containers:\n          - name: vllm-worker\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n              - name: one-node-from-ultraserver\n        resourceClaims:\n        - name: one-node-from-ultraserver\n          resourceClaimTemplateName: us-4-node-config\n"
  },
  {
    "path": "containers/files/specs/all-devices.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: all-neurons\nspec:\n  spec:\n    devices:\n      requests:\n        - name: neurons\n          exactly:\n            deviceClassName: neuron.aws.com\n            selectors:\n              - cel:\n                  expression: \"device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\"\n            allocationMode: All\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: pod0\n  labels:\n    app: pod\nspec:\n  containers:\n  - name: ctr0\n    image: public.ecr.aws/ubuntu/ubuntu:22.04\n    command: [\"bash\", \"-c\"]\n    args: [\"export; trap 'exit 0' TERM; sleep 9999 & wait\"]\n    resources:\n      claims:\n      - name: neurons\n  resourceClaims:\n  - name: neurons\n    resourceClaimTemplateName: all-neurons"
  },
  {
    "path": "containers/files/specs/lnc-setting-trn2.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: all-neurons-lnc-1\nspec:\n  spec:\n    devices:\n      requests:\n      - name: neurons\n        exactly:\n          deviceClassName: neuron.aws.com\n          selectors:\n          - cel:\n              expression: \"device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\"\n          allocationMode: All\n      config:\n      - requests: [\"neurons\"]\n        opaque:\n          driver: neuron.aws.com\n          parameters:\n            apiVersion: neuron.aws.com/v1\n            kind: NeuronConfig\n            logicalNeuronCore: 1\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: pod0\n  labels:\n    app: pod\nspec:\n  containers:\n  - name: ctr0\n    image: public.ecr.aws/ubuntu/ubuntu:22.04\n    command: [\"bash\", \"-c\"]\n    args: [\"export; trap 'exit 0' TERM; sleep 9999 & wait\"]\n    resources:\n      claims:\n      - name: neurons\n  resourceClaims:\n  - name: neurons\n    resourceClaimTemplateName: all-neurons-lnc-1\n"
  },
  {
    "path": "containers/files/specs/specific-driver-version.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: driver-version-neuron\nspec:\n  spec:\n    devices:\n      requests:\n      - name: neurons\n        exactly:\n          deviceClassName: neuron.aws.com\n          selectors:\n            - cel:\n                expression: \"device.attributes['neuron.aws.com'].neuronDriverVersion == '2.25.4.0'\"\n          allocationMode: All\n\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: my-app\nspec:\n  replicas: 2\n  selector:\n    matchLabels:\n      app: my-app\n  template:\n    metadata:\n      labels:\n        app: my-app\n    spec:\n      containers:\n      - name: nginx\n        image: public.ecr.aws/docker/library/nginx:alpine\n      resourceClaims:\n      - name: neurons\n        resourceClaimTemplateName: driver-version-neuron"
  },
  {
    "path": "containers/files/specs/us-and-lnc-config.yaml",
    "content": "apiVersion: resource.k8s.io/v1\nkind: ResourceClaimTemplate\nmetadata:\n  name: us-and-lnc-config\nspec:\n  spec:\n    devices:\n      requests:\n        - name: neurons\n          exactly:\n            deviceClassName: neuron.aws.com\n            selectors:\n              - cel:\n                  expression: \"device.attributes['neuron.aws.com'].resourceType == 'neuron_node'\"\n            allocationMode: ExactCount\n            count: 1\n      config:\n        - requests: [\"neurons\"]\n          opaque:\n            driver: neuron.aws.com\n            parameters:\n              apiVersion: neuron.aws.com/v1\n              kind: UltraServerConfig\n              ultraserverMode: 2\n        - requests: [\"neurons\"]\n          opaque:\n            driver: neuron.aws.com\n            parameters:\n              apiVersion: neuron.aws.com/v1\n              kind: NeuronConfig\n              logicalNeuronCore: 1\n---\napiVersion: leaderworkerset.x-k8s.io/v1\nkind: LeaderWorkerSet\nmetadata:\n  name: vllm\n  annotations:\n    leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-2\nspec:\n  rolloutStrategy:\n    type: RollingUpdate\n    rollingUpdateConfiguration:\n      maxUnavailable: 1\n      maxSurge: 1\n  # Two replica groups of 2 nodes each\n  replicas: 2\n  leaderWorkerTemplate:\n    size: 2\n    restartPolicy: RecreateGroupOnPodRestart\n    leaderTemplate:\n      metadata:\n        labels:\n          role: leader\n      spec:\n        containers:\n          - name: vllm-leader\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n                - name: one-node-from-ultraserver\n        resourceClaims:\n          - name: one-node-from-ultraserver\n            resourceClaimTemplateName: us-and-lnc-config\n    workerTemplate:\n      metadata:\n        labels:\n          role: worker\n      spec:\n        containers:\n          - name: vllm-worker\n            image: public.ecr.aws/ubuntu/ubuntu:22.04\n            command:\n              - sh\n              - -c\n              - \"sleep infinity\"\n            resources:\n              claims:\n                - name: one-node-from-ultraserver\n        resourceClaims:\n          - name: one-node-from-ultraserver\n            resourceClaimTemplateName: us-and-lnc-config\n"
  },
  {
    "path": "containers/get-started/quickstart-configure-deploy-dlc.rst",
    "content": ".. meta::\n   :description: Learn how to deploy a vLLM server using preconfigured Neuron Deep Learning Container with on Trainium and Inferentia instances.\n   :date_updated: 01/26/2026\n\n.. _quickstart_vllm_dlc_deploy:\n\nQuickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC)\n==========================================================================================\n\nThis topic guides you through deploying a vLLM server on Trainium and Inferentia instances using a Deep Learning Container preconfigured with AWS Neuron SDK artifacts. When you complete this tutorial, you will be able run a vLLM inference server on AWS Trainium and Inferentia instances.\n\nOverview\n--------\nIn this quickstart, you will pull a vLLM Docker image, configure it for Neuron devices, and start an inference server running vLLM. This process lets you deploy large language models on AWS ML accelerators for high-performance inference workloads.\n\nBefore you start\n----------------\n\nThis tutorial assumes that you have experience in the following areas:\n\n* Docker container management\n* AWS EC2 instance administration\n* Command-line interface operations\n\nPrerequisites\n-------------\n\nBefore you begin, ensure you have:\n\n* AWS Trainium or Inferentia instance access\n* Docker installed on your instance. You can set up docker environment according to :ref:`tutorial-docker-env-setup`\n* SSH access to your instance\n\nPrepare your environment\n------------------------\n\nLaunch an AWS Trainium or Inferentia instance with sufficient resources for your model requirements. We recommend using one of the base DLAMIs to launch your instance - `Neuron Base DLAMI <#>`.\n\nStep 1: Pull the vLLM Docker image\n-----------------------------------\n\nIn this step, you will download the vLLM Docker image from AWS ECR.\n\nGet the latest vLLM Docker image from Neuron's ECR public gallery `pytorch-inference-vllm-neuronx <https://gallery.ecr.aws/neuron/pytorch-inference-vllm-neuronx>`_ repository, and then get the latest published image tag and use it in the command below:\n\n.. code-block:: bash\n\n   docker pull public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:<image_tag>\n\nFor example, replace ``<image_tag>`` with an SDK 2.28.0 released DLC image tag such as ``0.13.0-neuronx-py312-sdk2.28.0-ubuntu24.04``\n\nStep 2: Start the Docker container\n-----------------------------------\n\nIn this step, you will run the container with access to Neuron devices. For this tutorial, we are using an trn1.32xlarge instance.\n\nRun the container interactively with access to Neuron devices:\n\n.. code-block:: bash\n\n   docker run -it \\\n   --device=/dev/neuron0 \\\n   --device=/dev/neuron1 \\\n   --device=/dev/neuron2 \\\n   --device=/dev/neuron3 \\\n   --device=/dev/neuron4 \\\n   --device=/dev/neuron5 \\\n   --device=/dev/neuron6 \\\n   --device=/dev/neuron7 \\\n   --device=/dev/neuron8 \\\n   --device=/dev/neuron9 \\\n   --device=/dev/neuron10 \\\n   --device=/dev/neuron11 \\\n   --device=/dev/neuron12 \\\n   --device=/dev/neuron13 \\\n   --device=/dev/neuron14 \\\n   --device=/dev/neuron15 \\\n   --cap-add SYS_ADMIN \\\n   --cap-add IPC_LOCK \\\n   -p 8080:8080 \\\n   --name <server_name> \\\n   <image_uri> \\\n   bash\n\n.. note::\n   The trn1.32xlarge instance provides 16 Neuron devices. 
Adjust the number of Neuron devices (``--device=/dev/neuronX``) based on your instance type and requirements.\n\nStep 3: Start the vLLM server\n------------------------------\n\nIn this step, you will launch the vLLM inference server inside the container.\n\nInside the container, start the vLLM inference server:\n\n.. code-block:: bash\n\n   vllm serve \\\n   --model='TinyLlama/TinyLlama-1.1B-Chat-v1.0' \\\n   --max-num-seqs=4 \\\n   --max-model-len=128 \\\n   --tensor-parallel-size=2 \\\n   --block-size=32 \\\n   --num-gpu-blocks-override=16 \\\n   --port=8080 \\\n   --additional-config='{\"override_neuron_config\":{\"enable_bucketing\":false}}'\n\n.. note::\n   **Version compatibility**: The command above is compatible with vLLM version 0.11.0 and later. If you are using an older version (such as 0.9.1), you must:\n   \n   * Replace ``--additional-config='{\"override_neuron_config\":{\"enable_bucketing\":false}}'`` with ``--override-neuron-config '{\"enable_bucketing\":false}'``\n   \n.. important::\n   * Choose the appropriate model for your use case\n   * Set ``--tensor-parallel-size`` to be less than or equal to total number of NeuronCores (or TP ranks) available from your devices, accounting for cores per device and logical core configuration\n   * Server startup typically takes 5-10 minutes\n\nStep 4: Verify server status\n-----------------------------\n\nIn this step, you will confirm the server starts successfully.\n\nWait for the server to fully initialize. You will see output showing available API routes:\n\n.. code-block:: text\n\n   INFO 08-12 00:04:47 [launcher.py:28] Available routes are:\n   INFO 08-12 00:04:47 [launcher.py:36] Route: /health, Methods: GET\n   INFO 08-12 00:04:47 [launcher.py:36] Route: /v1/chat/completions, Methods: POST\n   INFO 08-12 00:04:47 [launcher.py:36] Route: /v1/completions, Methods: POST\n\n.. note::\n   During startup, you may see warning logs similar to the following, which can be safely ignored:\n\n   .. code-block:: text\n\n      No module named 'vllm._version'\n        from .version import __version__, __version_tuple__  # isort:skip\n      WARNING [__init__.py:25] The vLLM package was not found, so its version could not be inspected. This may cause platform detection to fail.\n      INFO [__init__.py:243] Automatically detected platform neuron.\n      WARNING [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError(\"No module named 'vllm._C'\")\n\nAll complete! Now, let's confirm everything works.\n\nStep 5: Inference service confirmation\n---------------------------------------\n\nTest the API to confirm your setup works correctly.\n\nOpen a separate terminal and make an API call:\n\n.. code-block:: bash\n\n   curl http://localhost:8080/v1/chat/completions \\\n   -H \"Content-Type: application/json\" \\\n   -d '{\n     \"messages\": [\n       {\n         \"role\": \"user\",\n         \"content\": \"What is the capital of Italy?\"\n       }\n     ]\n   }'\n\nYou should receive a response similar to:\n\n.. 
code-block:: json\n\n   {\n     \"id\": \"chatcmpl-ac7551dd2f2a4be3bd2c1aabffa79b4c\",\n     \"object\": \"chat.completion\",\n     \"created\": 1754958455,\n     \"model\": \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n     \"choices\": [\n       {\n         \"index\": 0,\n         \"message\": {\n           \"role\": \"assistant\",\n           \"content\": \"The capital of Italy is Rome...\",\n           \"tool_calls\": []\n         },\n         \"finish_reason\": \"stop\"\n       }\n     ],\n     \"usage\": {\n       \"prompt_tokens\": 23,\n       \"total_tokens\": 106,\n       \"completion_tokens\": 83\n     }\n   }\n\nCongratulations! You have successfully deployed a vLLM inference server using a preconfigured Neuron DLC. If you encountered any issues, see the **Common issues** section below.\n\nAvailable API endpoints\n-----------------------\n\nThe server provides various endpoints for different use cases:\n\n* **Health Check**: ``GET /health``\n* **Chat Completions**: ``POST /v1/chat/completions``\n* **Text Completions**: ``POST /v1/completions``\n* **Models Info**: ``GET /v1/models``\n* **API Documentation**: ``GET /docs``\n\nCommon issues\n-------------\n\nDid you encounter an error while working through this tutorial? Here are common issues and solutions:\n\n- **Server won't start**: Check that you have sufficient Neuron devices allocated\n- **Connection refused**: Verify the container is running and port 8080 is properly mapped\n- **Slow performance**: Ensure your ``tensor-parallel-size`` matches your available Neuron devices\n- **Memory issues**: Consider using a larger instance type or reducing model size\n\nFor additional help, refer to the complete vLLM User Guide for NxD Inference documentation.\n\nClean up\n--------\n\nTo clean up resources after completing this tutorial:\n\n1. Stop the Docker container:\n\n   .. code-block:: bash\n\n      docker stop <server_name>\n\n2. Remove the container:\n\n   .. code-block:: bash\n\n      docker rm <server_name>\n\n3. Terminate your EC2 instance if no longer needed.\n\nNext steps\n----------\n\nNow that you've completed this tutorial, explore these related topics:\n\n* Learn more about vLLM configuration options in the vLLM User Guide for NxD Inference\n* Explore model optimization techniques for better performance\n* Set up production deployment with load balancing and monitoring\n\nFurther reading\n---------------\n\n- `vLLM User Guide for NxD Inference <#>`_ - Complete documentation for vLLM on Neuron\n- `AWS Neuron SDK Documentation <https://awsdocs-neuron.readthedocs-hosted.com/>`_ - Full Neuron SDK reference\n"
  },
  {
    "path": "containers/get-started/quickstart-pytorch-inference-dlc.rst",
    "content": ".. meta::\n   :description: Learn how to run PyTorch inference using preconfigured Neuron Deep Learning Container with Llama-2-7b on Trainium instances.\n   :date_updated: 02/17/2026\n\n.. _quickstart_pytorch_inference_dlc:\n\nQuickstart: Run PyTorch inference using Neuron Deep Learning Container (DLC)\n=============================================================================\n\nThis topic guides you through running PyTorch inference on Trainium instances using a Deep Learning Container preconfigured with AWS Neuron SDK artifacts. When you complete this tutorial, you will be able to run inference with the Llama-2-7b model on AWS Trainium instances.\n\nOverview\n--------\nIn this quickstart, you will pull a PyTorch inference Docker image, download the Llama-2-7b model from S3, and run an inference demo that compiles, validates, and benchmarks the model. This process lets you deploy large language models on AWS ML accelerators for high-performance inference workloads.\n\nBefore you start\n----------------\n\nThis tutorial assumes that you have experience in the following areas:\n\n* Docker container management\n* AWS EC2 instance administration\n* Command-line interface operations\n* AWS S3 operations\n\nPrerequisites\n-------------\n\nBefore you begin, ensure you have:\n\n* AWS Trainium instance access (trn2.48xlarge recommended)\n* Docker installed on your instance. You can set up docker environment according to :ref:`tutorial-docker-env-setup`\n* SSH access to your instance\n* AWS credentials configured with access to the model S3 bucket\n\nPrepare your environment\n------------------------\n\nLaunch an AWS Trainium instance with sufficient resources for your model requirements. We recommend using one of the base DLAMIs to launch your instance - `Neuron Base DLAMI <#>`.\n\nStep 1: Pull the PyTorch inference Docker image\n------------------------------------------------\n\nIn this step, you will download the PyTorch inference Docker image from AWS ECR.\n\nGet the latest PyTorch inference Docker image from Neuron's ECR public gallery `pytorch-inference-neuronx <https://gallery.ecr.aws/neuron/pytorch-inference-neuronx>`_ repository, and then get the latest published image tag and use it in the command below:\n\n.. code-block:: bash\n\n   docker pull public.ecr.aws/neuron/pytorch-inference-neuronx:<image_tag>\n\nFor example, replace ``<image_tag>`` with an SDK 2.28.0 released DLC image tag such as ``2.9.0-neuronx-py312-sdk2.28.0-ubuntu24.04``\n\nStep 2: Download the Llama-2-7b model\n--------------------------------------\n\nIn this step, you will download the Llama-2-7b model from HuggingFace to an S3 bucket, then copy it to your instance.\n\nFirst, download the model from HuggingFace and upload to your S3 bucket:\n\n.. code-block:: bash\n\n   # Install HuggingFace CLI if not already installed\n   pip install huggingface-hub\n\n   # Login to HuggingFace (you'll need to accept the Llama-2 license first)\n   hf auth login\n\n   # Download the model\n   hf download meta-llama/Llama-2-7b --local-dir ./Llama-2-7b\n\n   # Upload to your S3 bucket\n   aws s3 cp --recursive ./Llama-2-7b s3://your-bucket-name/models/Llama-2-7b/\n\nThen, on your Trainium instance, download the model from S3:\n\n.. note::\n   Change ``/home/ec2-user`` to ``/home/ubuntu`` if you're using an Ubuntu AMI.\n\n.. 
code-block:: bash\n\n   # Create directory for the model\n   mkdir -p /home/ec2-user/model_hf/Llama-2-7b\n\n   # Download from S3\n   aws s3 cp --recursive s3://your-bucket-name/models/Llama-2-7b/ /home/ec2-user/model_hf/Llama-2-7b/\n\n   # Verify the model downloaded successfully\n   ls /home/ec2-user/model_hf/Llama-2-7b/config.json\n\n.. note::\n   You must accept the Llama-2 license on HuggingFace before you can download the model. Visit https://huggingface.co/meta-llama/Llama-2-7b to request access.\n\nStep 3: Start the Docker container\n-----------------------------------\n\nIn this step, you will run the container with access to Neuron devices and mount the model directory. For this tutorial, we are using a trn2.48xlarge instance.\n\nRun the container interactively with access to all Neuron devices:\n\n.. code-block:: bash\n\n   docker run -it \\\n   --device=/dev/neuron0 \\\n   --device=/dev/neuron1 \\\n   --device=/dev/neuron2 \\\n   --device=/dev/neuron3 \\\n   --device=/dev/neuron4 \\\n   --device=/dev/neuron5 \\\n   --device=/dev/neuron6 \\\n   --device=/dev/neuron7 \\\n   --device=/dev/neuron8 \\\n   --device=/dev/neuron9 \\\n   --device=/dev/neuron10 \\\n   --device=/dev/neuron11 \\\n   -v /home/ec2-user/model_hf/Llama-2-7b:/root/model_hf/Llama-2-7b \\\n   --cap-add SYS_ADMIN \\\n   --cap-add IPC_LOCK \\\n   --name pytorch-inference-demo \\\n   public.ecr.aws/neuron/pytorch-inference-neuronx:<image_tag> \\\n   bash\n\n.. note::\n   The trn2.48xlarge instance provides 12 Neuron devices. Adjust the number of Neuron devices (``--device=/dev/neuronX``) based on your instance type and requirements.\n\nStep 4: Run the inference demo\n-------------------------------\n\nIn this step, you will run the inference demo script that compiles the model, checks accuracy, and benchmarks performance.\n\nInside the container, run the inference demo:\n\n.. code-block:: bash\n\n   inference_demo \\\n   --model-type llama \\\n   --task-type causal-lm \\\n   run \\\n   --model-path /root/model_hf/Llama-2-7b/ \\\n   --compiled-model-path /root/traced_model/Llama-2-7b-demo/ \\\n   --torch-dtype bfloat16 \\\n   --tp-degree 96 \\\n   --batch-size 2 \\\n   --max-context-length 32 \\\n   --seq-len 64 \\\n   --on-device-sampling \\\n   --enable-bucketing \\\n   --top-k 1 \\\n   --do-sample \\\n   --pad-token-id 2 \\\n   --prompt 'I believe the meaning of life is' \\\n   --prompt 'The color of the sky is' \\\n   --check-accuracy-mode token-matching \\\n   --benchmark\n\n.. important::\n   * The inference demo takes approximately 20 minutes to complete on a trn2.48xlarge instance\n   * The script will compile the model, validate accuracy, and run benchmarks\n   * Set ``--tp-degree`` to match the number of NeuronCores you want to use (96 for trn2.48xlarge)\n\nStep 5: Verify the results\n---------------------------\n\nIn this step, you will confirm the inference demo completed successfully and review the benchmark results.\n\nWait for the demo to complete. You will see output showing benchmark results:\n\n.. 
code-block:: text\n\n   Benchmark completed and its result is as following\n   {\n     \"e2e_model\": {\n       \"latency_ms_p50\": 8539.34,\n       \"latency_ms_p90\": 8627.43,\n       \"latency_ms_p95\": 8646.97,\n       \"latency_ms_p99\": 8652.62,\n       \"latency_ms_p100\": 8654.03,\n       \"latency_ms_avg\": 8533.13,\n       \"throughput\": 480.01\n     },\n     \"context_encoding_model\": {\n       \"latency_ms_p50\": 132.42,\n       \"latency_ms_p90\": 133.47,\n       \"latency_ms_p95\": 133.59,\n       \"latency_ms_p99\": 133.81,\n       \"latency_ms_p100\": 133.86,\n       \"latency_ms_avg\": 132.52,\n       \"throughput\": 30908.75\n     },\n     \"token_generation_model\": {\n       \"latency_ms_p50\": 7.84,\n       \"latency_ms_p90\": 8.39,\n       \"latency_ms_p95\": 8.47,\n       \"latency_ms_p99\": 8.63,\n       \"latency_ms_p100\": 28.96,\n       \"latency_ms_avg\": 7.87,\n       \"throughput\": 520434.73\n     }\n   }\n   Completed saving result to benchmark_report.json\n\n.. note::\n   You may see several red ``ERROR NRT:nrt_tensor_free`` errors at the end of the script output. These can be safely ignored - the actual benchmark results appear above these error messages.\n\nAll complete! The benchmark results are saved to ``benchmark_report.json`` in the container.\n\nUnderstanding the results\n-------------------------\n\nThe benchmark output provides three key metrics:\n\n* **e2e_model**: End-to-end model performance including context encoding and token generation\n* **context_encoding_model**: Performance of processing the input prompt\n* **token_generation_model**: Performance of generating output tokens\n\nEach metric includes:\n\n* Latency percentiles (p50, p90, p95, p99, p100) in milliseconds\n* Average latency in milliseconds\n* Throughput in tokens per second\n\nCommon issues\n-------------\n\nDid you encounter an error while working through this tutorial? Here are common issues and solutions:\n\n- **Model download fails**: Verify you have accepted the Llama-2 license on HuggingFace and have valid AWS credentials\n- **Container won't start**: Check that you have sufficient Neuron devices allocated\n- **Compilation fails**: Ensure you have enough memory and the correct PyTorch version\n- **Slow performance**: Verify your ``tp-degree`` matches your available Neuron devices\n- **Memory issues**: Consider using a larger instance type or reducing batch size\n\nFor additional help, refer to the complete NeuronX Distributed Inference documentation.\n\nClean up\n--------\n\nTo clean up resources after completing this tutorial:\n\n1. Exit the container:\n\n   .. code-block:: bash\n\n      exit\n\n2. Stop and remove the container:\n\n   .. code-block:: bash\n\n      docker stop pytorch-inference-demo\n      docker rm pytorch-inference-demo\n\n3. Remove the model files if no longer needed:\n\n   .. code-block:: bash\n\n      rm -rf /home/ec2-user/model_hf/Llama-2-7b\n\n4. 
Terminate your EC2 instance if no longer needed.\n\nNext steps\n----------\n\nNow that you've completed this tutorial, explore these related topics:\n\n* Learn more about NeuronX Distributed Inference configuration options\n* Explore different model architectures and optimization techniques\n* Set up production deployment with monitoring and logging\n\nFurther reading\n---------------\n\n- `NeuronX Distributed Inference Documentation <#>`_ - Complete documentation for inference on Neuron\n- `AWS Neuron SDK Documentation <https://awsdocs-neuron.readthedocs-hosted.com/>`_ - Full Neuron SDK reference\n- `Llama-2 Model Card <https://huggingface.co/meta-llama/Llama-2-7b>`_ - Model details and license information\n"
  },
  {
    "path": "containers/getting-started.rst",
    "content": ".. _containers-getting-started:\n\nGetting started with Neuron DLC using Docker\n============================================\n\n.. tab-set::\n\n   .. tab-item:: Training\n\n\n      .. dropdown::  Launch Trn1 Instance\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. include:: /setup/install-templates/launch-instance.txt\n\n      .. dropdown:: Install Drivers\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n\n               # Configure Linux for Neuron repository updates\n\n               sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n               [neuron]\n               name=Neuron YUM Repository\n               baseurl=https://yum.repos.neuron.amazonaws.com\n               enabled=1\n               metadata_expire=0\n               EOF\n               sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n               # Update OS packages\n               sudo dnf update -y\n\n               # Install OS headers\n               sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n               # Remove preinstalled packages and Install Neuron Driver and Runtime\n               sudo dnf remove aws-neuron-dkms -y\n               sudo dnf remove aws-neuronx-dkms -y\n               sudo dnf install aws-neuronx-dkms-2.*  -y\n\n               # Install EFA Driver(only required for multi-instance training)\n               curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n               wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n               cat aws-efa-installer.key | gpg --fingerprint\n               wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n               tar -xvf aws-efa-installer-latest.tar.gz\n               cd aws-efa-installer && sudo bash efa_installer.sh --yes\n               cd\n               sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n      .. dropdown:: Install Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               sudo dnf install -y docker.io\n               sudo usermod -aG docker $USER\n\n            Logout and log back in to refresh membership.\n\n      .. dropdown:: Verify Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               docker run hello-world\n\n            Expected result:\n\n            ::\n\n               Hello from Docker!\n               This message shows that your installation appears to be working correctly.\n\n               To generate this message, Docker took the following steps:\n               1. The Docker client contacted the Docker daemon.\n               2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n               (amd64)\n               3. The Docker daemon created a new container from that image which runs the\n               executable that produces the output you are currently reading.\n               4. 
The Docker daemon streamed that output to the Docker client, which sent it\n               to your terminal.\n\n               To try something more ambitious, you can run an Ubuntu container with:\n               $ docker run -it ubuntu bash\n\n               Share images, automate workflows, and more with a free Docker ID:\n               https://hub.docker.com/\n\n               For more examples and ideas, visit:\n               https://docs.docker.com/get-started/\n\n      .. dropdown:: Verify Neuron Component\n           :class-title: sphinx-design-class-title-small\n           :class-body: sphinx-design-class-body-small\n           :animate: fade-in\n\n           Once the environment is setup, a container can be started with\n           --device=/dev/neuron# to specify desired set of Inferentia/Trainium devices to be\n           exposed to the container. To find out the available neuron devices on\n           your instance, use the command ``ls /dev/neuron*``.\n\n           When running neuron-ls inside a container, you will only see the set of\n           exposed Trainiums. For example:\n\n           .. code:: bash\n\n             docker run --device=/dev/neuron0 neuron-test neuron-ls\n\n           Would produce the following output in trn1.32xlarge:\n\n           ::\n\n             +--------+--------+--------+---------+\n             | NEURON | NEURON | NEURON |   PCI   |\n             | DEVICE | CORES  | MEMORY |   BDF   |\n             +--------+--------+--------+---------+\n             | 0      | 2      | 32 GB  | 10:1c.0 |\n             +--------+--------+--------+---------+\n\n      .. dropdown::  Run Tutorial\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            :ref:`tutorial-training`\n\n\n   .. tab-item:: Inference\n\n\n      .. dropdown::  Launch Inf1 Instance\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. include:: /setup/install-templates/launch-inf1.txt\n\n      .. dropdown:: Install Drivers\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. 
code:: bash\n\n               # Configure Linux for Neuron repository updates\n               sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n               [neuron]\n               name=Neuron YUM Repository\n               baseurl=https://yum.repos.neuron.amazonaws.com\n               enabled=1\n               metadata_expire=0\n               EOF\n               sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n               # Update OS packages\n               sudo dnf update -y\n\n               ################################################################################################################\n               # To install or update to Neuron versions 1.19.1 and newer from previous releases:\n               # - DO NOT skip 'aws-neuron-dkms' install or upgrade step, you MUST install or upgrade to latest Neuron driver\n               ################################################################################################################\n\n               # Install OS headers\n               sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n               # Install Neuron Driver\n               sudo dnf install aws-neuron-dkms -y\n\n               ####################################################################################\n               # Warning: If Linux kernel is updated as a result of OS package update\n               #          Neuron driver (aws-neuron-dkms) should be re-installed after reboot\n               ####################################################################################\n\n      .. dropdown:: Install Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               sudo dnf install -y docker.io\n               sudo usermod -aG docker $USER\n\n            Logout and log back in to refresh membership.\n\n      .. dropdown:: Verify Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               docker run hello-world\n\n            Expected result:\n\n            ::\n\n               Hello from Docker!\n               This message shows that your installation appears to be working correctly.\n\n               To generate this message, Docker took the following steps:\n               1. The Docker client contacted the Docker daemon.\n               2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n               (amd64)\n               3. The Docker daemon created a new container from that image which runs the\n               executable that produces the output you are currently reading.\n               4. The Docker daemon streamed that output to the Docker client, which sent it\n               to your terminal.\n\n               To try something more ambitious, you can run an Ubuntu container with:\n               $ docker run -it ubuntu bash\n\n               Share images, automate workflows, and more with a free Docker ID:\n               https://hub.docker.com/\n\n               For more examples and ideas, visit:\n               https://docs.docker.com/get-started/\n\n\n      .. 
dropdown:: Verify Neuron Component\n           :class-title: sphinx-design-class-title-small\n           :class-body: sphinx-design-class-body-small\n           :animate: fade-in\n\n           Once the environment is setup, a container can be started with\n           --device=/dev/neuron# to specify desired set of Inferentia/Trainium devices to be\n           exposed to the container. To find out the available neuron devices on\n           your instance, use the command ``ls /dev/neuron*``.\n\n           When running neuron-ls inside a container, you will only see the set of\n           exposed Inferentias. For example:\n\n           .. code:: bash\n\n             docker run --device=/dev/neuron0 neuron-test neuron-ls\n\n\n           Would produce the following output in inf1.xlarge:\n\n           ::\n\n               +--------------+---------+--------+-----------+-----------+------+------+\n               |   PCI BDF    | LOGICAL | NEURON |  MEMORY   |  MEMORY   | EAST | WEST |\n               |              |   ID    | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |\n               +--------------+---------+--------+-----------+-----------+------+------+\n               | 0000:00:1f.0 |       0 |      4 | 4096 MB   | 4096 MB   |    0 |    0 |\n               +--------------+---------+--------+-----------+-----------+------+------+\n\n      .. dropdown::  Run Tutorial\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            :ref:`tutorial-infer`\n            :ref:`quickstart_vllm_dlc_deploy`\n\n\n"
  },
  {
    "path": "containers/how-to/how-to-ultraserver.rst",
    "content": ".. _containers-how-to-ultraserver:\n\nHow to schedule MPI jobs to run on Neuron UltraServer on EKS\n============================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nTrn2 UltraServers represent a sophisticated computing infrastructure designed to connect multiple Trainium instances\nthrough NeuronLinkV3 (Read more here: :ref:`aws-trn2-arch`). For many advanced and complex models, customers can use UltraServers to greatly reduce training\nand inference times compared to previous distributed job setups.\n\nThis page explains the two setups needed to properly schedule and run MPI jobs on the Neuron UltraServer on EKS:\n\n* UltraServer init script for the launcher pod\n* Affinity configuration for the worker pods\n\nHow it works\n~~~~~~~~~~~~\n\nThe UltraServer init script will:\n\n* Validate the node config and deployment of the MPI job worker pods\n* Write environment variables that are required for runtime to each MPI worker pod\n* Write a new hostfile to ``/root/ultraserver_init/new_hostfile``\n\nThe validation process includes making sure the node config is a valid number (4, 2, or 1), and that the worker pods\nare deployed correctly to UltraServer nodes. More about the how to set the node config can been found below.\n\nThe environment variables that are being written are:\n\n* NEURON_GLOBAL_TOPOID: The topology ID of the worker pod\n* NEURON_GLOBAL_TOPOID0_HOST: The FQDN of the worker pod that's the “leader” (topology ID of 0)\n* NEURON_RT_ULTRASERVER_MODE: The mode of the UltraServer node that’s passed to the Neuron runtime\n* NEURON_RT_ULTRASERVER_SERVER_ID: The server ID of the UltraServer node that’s passed to the Neuron runtime\n* NEURON_RT_ULTRASERVER_NODE_ID: The node ID of the UltraServer node that’s passed to the Neuron runtime\n\nThe affinity performs two functions:\n\n* Prevents worker pods from being scheduled together with worker pods from other jobs\n* Requires/Encourages worker pods from the same job to be scheduled together\n\nThese configurations are needed in order to properly schedule your MPI job worker pods.\n\nThe pod anti-affinity prevents scheduling your workload onto UltraServer topologies where worker pods from other jobs\nalready exist. For example, if you have an UltraServer that already has a 2-node job running on it, the pod\nanti-affinity will prevent scheduling a 4-node job on that UltraServer since 2 of the 4 nodes are already occupied.\n\nThe pod affinity will make sure that worker pods of the same job are scheduled together in the same UltraServer\ntopology. For example, if you have an 2 UltraServers with no jobs running on either of them, the pod affinity would\nmake sure that the worker pods of a 4-node job are all scheduled on the same UltraServer and not split between the two.\n\nPrerequisites\n-------------\n\n* An EKS cluster with trn2 UltraServers (:ref:`kubernetes-getting-started`)\n* Neuron Device Plugin installed on the cluster with version >= 2.26.26.0 (:ref:`tutorials/k8s-neuron-device-plugin`)\n* MPI operator installed on the cluster\n* An MPI job spec\n\nInstructions\n------------\n\nUltraServer Init Script\n~~~~~~~~~~~~~~~~~~~~~~~\n\nDownload the UltraServer init script :download:`k8s-ultraserver-init-script.sh </src/k8/k8s-ultraserver-init-script.sh>`\n\nTo use the script, either:\n- add it to your MPI job Dockerfile and build the image OR\n- create a new Dockerfile and build a new image from your MPI job image\n\nExample:\n\n.. 
code-block:: dockerfile\n\n    FROM 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob\n    COPY ultraserver-init-script.sh /tmp/\n    RUN chmod +x /tmp/ultraserver-init-script.sh\n    ENTRYPOINT [\"/tmp/ultraserver-init-script.sh\"]\n\nThen add the 2 required init containers to the launcher pod.\n\nThe first init container should utilize the /etc/mpi/discover_hosts.sh script to ensure that all worker pods are ready\nbefore continuing on to the UltraServer init script.\n\nThe second init container should use the image containing ultraserver-init-script.sh. You can specify a value for\nNEURON_ULTRASERVER_NODE_CONFIG, which determines what UltraServer node config your MPI job will use, i.e. how many\nUltraServer nodes to use. Possible values are 4, 2, and 1, and the default value is 4.\n\nExample:\n\n.. code-block:: yaml\n\n    apiVersion: kubeflow.org/v2beta1\n    kind: MPIJob\n    metadata:\n      name: &job_name <MPI-JOB-NAME>\n      namespace: default\n    spec:\n      mpiReplicaSpecs:\n        Launcher:\n          replicas: 1\n          template:\n            spec:\n              containers:\n              - name: mpitest\n                image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob\n              ...\n              initContainers:\n              - name: wait-hostfilename\n                image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob\n                command:\n                - bash\n                - -cx\n                - |\n                  if [[ $(cat /etc/mpi/discover_hosts.sh | wc -l) != 1 ]]; then\n                    date\n                    echo \"Ready\"\n                    cat /etc/mpi/discover_hosts.sh\n                  else\n                    date\n                    echo \"not ready ...\"\n                    sleep 10\n                    exit 1\n                  fi\n                  while read host; do\n                    while ! ssh $host echo $host; do\n                      date\n                      echo \"Pod $host is not up ...\"\n                      sleep 10\n                    done\n                    date\n                    echo \"Pod $host is ready\"\n                  done <<< \"$(/etc/mpi/discover_hosts.sh)\"\n                resources: {}\n                volumeMounts:\n                - mountPath: /etc/mpi\n                  name: mpi-job-config\n                - mountPath: /root/.ssh\n                  name: ssh-auth\n              - name: ultraserver-init-container\n                image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:init-container\n                env:\n                - name: NEURON_ULTRASERVER_NODE_CONFIG\n                  value: <\"4\", \"2\", OR \"1\">\n                volumeMounts:\n                - mountPath: /etc/mpi\n                  name: mpi-job-config\n                - mountPath: /root/.ssh\n                  name: ssh-auth\n                - mountPath: /root/ultraserver_init\n                  name: ultraserver-init\n              ...\n              volumes:\n              - name: ultraserver-init\n                emptyDir: {}\n\nMPI Worker Pod Affinity\n~~~~~~~~~~~~~~~~~~~~~~~\n\nSingle-node Job\n^^^^^^^^^^^^^^^\n\n2-node job\n\n.. 
code-block:: yaml\n\n    apiVersion: kubeflow.org/v2beta1\n    kind: MPIJob\n    metadata:\n      name: &job_name <MPI-JOB-NAME>\n      namespace: default\n      ...\n    spec:\n      mpiReplicaSpecs:\n        Launcher:\n          ...\n        Worker:\n          replicas: 2\n          template:\n            spec:\n              nodeSelector:\n                node.kubernetes.io/instance-type: trn2u.48xlarge\n              affinity:\n                podAntiAffinity:\n                  requiredDuringSchedulingIgnoredDuringExecution:\n                  - labelSelector:\n                      matchExpressions:\n                      - key: training.kubeflow.org/job-name\n                        operator: NotIn\n                        values:\n                        - *job_name\n                      matchLabels:\n                        training.kubeflow.org/job-role: worker\n                    topologyKey: neuron.amazonaws.com/ultraserver-server-id-2\n                podAffinity:\n                  requiredDuringSchedulingIgnoredDuringExecution:\n                  - labelSelector:\n                      matchLabels:\n                        training.kubeflow.org/job-role: worker\n                        training.kubeflow.org/job-name: *job_name\n                    topologyKey: neuron.amazonaws.com/ultraserver-server-id-2\n        ...\n\n4-node job\n\n.. code-block:: yaml\n\n    apiVersion: kubeflow.org/v2beta1\n    kind: MPIJob\n    metadata:\n      name: &job_name <MPI-JOB-NAME>\n      namespace: default\n      ...\n    spec:\n      mpiReplicaSpecs:\n        Launcher:\n          ...\n        Worker:\n          replicas: 4\n          template:\n            spec:\n              nodeSelector:\n                node.kubernetes.io/instance-type: trn2u.48xlarge\n              affinity:\n                podAntiAffinity:\n                  requiredDuringSchedulingIgnoredDuringExecution:\n                  - labelSelector:\n                      matchExpressions:\n                      - key: training.kubeflow.org/job-name\n                        operator: NotIn\n                        values:\n                        - *job_name\n                      matchLabels:\n                        training.kubeflow.org/job-role: worker\n                    topologyKey: neuron.amazonaws.com/ultraserver-server-id-4\n                podAffinity:\n                  requiredDuringSchedulingIgnoredDuringExecution:\n                  - labelSelector:\n                      matchLabels:\n                        training.kubeflow.org/job-role: worker\n                        training.kubeflow.org/job-name: *job_name\n                    topologyKey: neuron.amazonaws.com/ultraserver-server-id-4\n        ...\n\nMulti-node job\n^^^^^^^^^^^^^^\n\n.. 
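note::\n    A 16-replica job spans multiple UltraServers, so this example uses *preferred* (rather than required) pod affinity: worker pods from the same job are packed onto the same UltraServer where possible, while the required pod anti-affinity still keeps worker pods from other jobs off the UltraServers this job is using.\n\n.. 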
code-block:: yaml\n\n    apiVersion: kubeflow.org/v2beta1\n    kind: MPIJob\n    metadata:\n      name: &job_name <MPI-JOB-NAME>\n      namespace: default\n      ...\n    spec:\n      mpiReplicaSpecs:\n        Launcher:\n          ...\n        Worker:\n          replicas: 16\n          template:\n            spec:\n              nodeSelector:\n                node.kubernetes.io/instance-type: trn2u.48xlarge\n              affinity:\n                podAntiAffinity:\n                  requiredDuringSchedulingIgnoredDuringExecution:\n                  - labelSelector:\n                      matchExpressions:\n                      - key: training.kubeflow.org/job-name\n                        operator: NotIn\n                        values:\n                        - *job_name\n                      matchLabels:\n                        training.kubeflow.org/job-role: worker\n                    topologyKey: neuron.amazonaws.com/ultraserver-server-id-4\n                podAffinity:\n                  preferredDuringSchedulingIgnoredDuringExecution:\n                  - weight: 100\n                    podAffinityTerm:\n                      labelSelector:\n                        matchLabels:\n                          training.kubeflow.org/job-role: worker\n                          training.kubeflow.org/job-name: *job_name\n                      topologyKey: neuron.amazonaws.com/ultraserver-server-id-4\n        ...\n\nTo use the affinity configuration, replace <MPI-JOB-NAME> with your MPI job name and add it to your workload yaml spec.\n\nConfirm your work\n-----------------\n\nTo validate that the init container is working:\n\n.. code-block::\n\n    # Find the worker pods associated with your MPI job\n    kubectl get pods\n\n    # Get the logs of the init container\n    kubectl logs <LAUNCHER-POD-NAME> -c ultraserver-init-container\n\nYou should see logs under the init container.\n\nExample:\n\n.. code-block::\n\n    $ kubectl get pods\n    NAME                                       READY   STATUS     RESTARTS   AGE\n    demo-launcher-42lh9                        0/1     Init:0/2   0          4s\n    demo-worker-0                              1/1     Running    0          4s\n    demo-worker-1                              1/1     Running    0          4s\n    demo-worker-2                              1/1     Running    0          4s\n    demo-worker-3                              1/1     Running    0          4s\n\n    $ kubectl logs demo-launcher-42lh9 -c ultraserver-init-container\n    Using 4-node config\n    ...\n\nTo validate that the affinity configuration is working:\n\n.. 
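\n\nAs a quick spot check, you can print the UltraServer server ID label of each node that hosts one of your worker pods. The sketch below assumes a hypothetical job named ``demo`` and the 4-node config; for the 2-node config, use the ``ultraserver-server-id-2`` label instead.\n\n.. code-block::\n\n    # Print the ultraserver-server-id-4 label of every node that hosts one of the\n    # (hypothetical) job demo's worker pods. All rows should show the same server ID.\n    for node in $(kubectl get pods \\\n        -l training.kubeflow.org/job-name=demo,training.kubeflow.org/job-role=worker \\\n        -o jsonpath='{.items[*].spec.nodeName}'); do\n      kubectl get node $node -L neuron.amazonaws.com/ultraserver-server-id-4 --no-headers\n    done\n\nFor a complete view, the commands below list the worker pods and the UltraServer labels on every node:\n\n.. 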
code-block::\n\n    # Find the worker pods and the nodes they are scheduled to\n    kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName'\n\n    # Compare the labels of the nodes to the\n    kubectl get nodes \\\n        -l neuron.amazonaws.com/ultraserver-mode \\\n        -o=custom-columns='NAME:metadata.name,MODE:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-mode,ULTRASERVER_SERVER_ID_2:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-server-id-2,ULTRASERVER_NODE_ID_2:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-node-id-2,ULTRASERVER_SERVER_ID_4:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-server-id-4,ULTRASERVER_NODE_ID_4:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-node-id-4' | awk 'NR==1{print;next}{print | \"sort -k3,3 -k4,4\"}'\n\nWhen looking at the nodes used by the worker pods, they should share the same ULTRASERVER_SERVER_ID_2 or\nULTRASERVER_SERVER_ID_4 label based on which config you chose.\n\nExample when choosing a 4-node config:\n\n.. code-block::\n\n    $ kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName'\n    POD_NAME                                   NODE_NAME\n    demo-launcher-42lh9                        ip-172-32-5-227.ap-southeast-4.compute.internal\n    demo-worker-0                              ip-172-32-5-227.ap-southeast-4.compute.internal\n    demo-worker-1                              ip-172-32-11-17.ap-southeast-4.compute.internal\n    demo-worker-2                              ip-172-32-13-57.ap-southeast-4.compute.internal\n    demo-worker-3                              ip-172-32-9-4.ap-southeast-4.compute.internal\n\n    $ kubectl get nodes \\\n        -l neuron.amazonaws.com/ultraserver-mode \\\n        -o=custom-columns='NAME:metadata.name,MODE:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-mode,ULTRASERVER_SERVER_ID_2:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-server-id-2,ULTRASERVER_NODE_ID_2:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-node-id-2,ULTRASERVER_SERVER_ID_4:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-server-id-4,ULTRASERVER_NODE_ID_4:metadata.labels.neuron\\.amazonaws\\.com/ultraserver-node-id-4' | awk 'NR==1{print;next}{print | \"sort -k3,3 -k4,4\"}'\n\n    NAME                                              MODE    ULTRASERVER_SERVER_ID_2   ULTRASERVER_NODE_ID_2   ULTRASERVER_SERVER_ID_4   ULTRASERVER_NODE_ID_4\n    ip-172-32-11-17.ap-southeast-4.compute.internal   1_2_4   u5wy80u0o2saugxy          0                       bog79p1y8tetj5uu          0\n    ip-172-32-13-57.ap-southeast-4.compute.internal   1_2_4   u5wy80u0o2saugxy          1                       bog79p1y8tetj5uu          1\n    ip-172-32-5-227.ap-southeast-4.compute.internal   1_2_4   ygml2651y0lwdd46          0                       bog79p1y8tetj5uu          2\n    ip-172-32-9-4.ap-southeast-4.compute.internal     1_2_4   ygml2651y0lwdd46          1                       bog79p1y8tetj5uu          3\n\nCommon issues\n-------------\n\nInit script fails to start\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf at least one of the worker pods isn't scheduled to a node, the init script will fail to start.\n\nExample:\n\n.. 
code-block::\n\n    $ kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName'\n    POD_NAME                                   NODE_NAME\n    demo-launcher-96xsl                        ip-172-32-9-4.ap-southeast-4.compute.internal\n    demo-worker-0                              <none>\n    demo-worker-1                              <none>\n    demo-worker-2                              <none>\n    demo-worker-3                              <none>\n\n    $ kubectl logs demo-launcher-96xsl -c ultraserver-init-container\n    Error from server (BadRequest): container \"ultraserver-init-container\" in pod \"demo-launcher-96xsl\" is waiting to start: PodInitializing\n\nPossible solution: Check your pods for affinity/scheduling issues.\n\n.. code-block::\n\n    $ kubectl describe pod demo-worker-0\n    Events:\n      Type     Reason            Age    From               Message\n      ----     ------            ----   ----               -------\n      Warning  FailedScheduling  3m13s  default-scheduler  0/4 nodes are available: 4 node(s) didn't match pod affinity rules. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.\n\nRelated Information\n-------------------\n\n- :ref:`kubernetes-getting-started` - Information about how to use Neuron on EKS\n- :ref:`tutorials/k8s-neuron-device-plugin` - Information about Neuron Device Plugin\n- :ref:`aws-trn2-arch` - Information about trn2 UltraServer architecture\n- :ref:`general-troubleshooting` - Information about general troubleshooting for Neuron\n- `MPI Operator <https://github.com/kubeflow/mpi-operator>`_ - Information about MPI Operator\n- `MPI User Guide <https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/mpi/>`_ - Information about MPI jobs\n- `Kubernetes Pod Affinity <https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity>`_ - Information about pod affinity rules\n- `YAML anchors <https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/>`_ - Information about YAML anchors\n"
  },
  {
    "path": "containers/index.rst",
    "content": ".. meta::\n   :description: AWS Neuron Deep Learning Containers (DLCs) are pre-configured Docker images for training and serving models on AWS Trainium and Inferentia instances with the Neuron SDK.\n   :keywords: Neuron Containers, Deep Learning Containers, DLC, Docker, Kubernetes, EKS, ECS, AWS Neuron, Trainium, Inferentia, vLLM, Container Deployment\n   :date-modified: 01/22/2026\n\n.. _neuron_containers:\n\nNeuron Containers\n=================\n\nThis section contains the technical documentation for using AWS Neuron Deep Learning Containers (DLCs) and containerized deployments on Inferentia and Trainium instances.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Getting Started </containers/getting-started>\n    Locate Neuron DLC Images </containers/locate-neuron-dlc-image>\n    Customize DLC </containers/dlc-then-customize-devflow>\n    Neuron Plugins </containers/neuron-plugins>\n    Tutorials </containers/tutorials>\n    How-To Guides </containers/how-to/how-to-ultraserver>\n    FAQ </containers/faq>\n    DRA </containers/neuron-dra>\n    Release Notes </release-notes/components/containers>\n\nWhat are Neuron Deep Learning Containers?\n------------------------------------------\n\nAWS Neuron Deep Learning Containers (DLCs) are a set of pre-configured Docker images for training and serving models on AWS Trainium and Inferentia instances using the AWS Neuron SDK. Each DLC is optimized for specific ML frameworks and comes with all Neuron components pre-installed, enabling you to quickly deploy containerized workloads without manual setup.\n\nWith Neuron DLCs, developers can:\n\n* Deploy production-ready containers with pre-installed Neuron SDK and ML frameworks\n* Use containers across multiple deployment platforms including EC2, EKS, ECS, and SageMaker\n* Customize DLCs to fit specific project requirements\n* Leverage Neuron plugins for better observability and fault tolerance\n* Run distributed training and inference workloads with vLLM integration\n* Schedule MPI jobs on Trn2 UltraServers for improved performance\n\nNeuron DLCs support popular ML frameworks including PyTorch, TensorFlow, and JAX, and are available for both training and inference workloads on Inf1, Inf2, Trn1, Trn1n, and Trn2 instances.\n\n.. admonition:: Neuron DRA for Kubernetes\n\n   Neuron has released support for Dynamic Resource Allocation (DRA) with Kubernetes. :doc:`Read more about it here </containers/neuron-dra>`.\n\nQuickstarts\n-----------\n\n.. grid:: 1 1 2 2\n    :gutter: 3\n    \n    .. grid-item-card:: Quickstart: Deploy a DLC with vLLM\n        :link: quickstart_vllm_dlc_deploy\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Get started by configuring and deploying a Deep Learning Container with vLLM for inference. Time to complete: ~30 minutes.\n\n    .. grid-item-card:: Quickstart: Build a Custom Neuron Container\n        :link: containers-getting-started\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Learn how to build a custom Neuron container using Docker for training or inference workloads.\n\nNeuron Containers Documentation\n--------------------------------\n\n.. grid:: 1 1 2 2\n    :gutter: 3\n    \n    .. grid-item-card:: Getting Started\n        :link: containers-getting-started\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Step-by-step guide for building Neuron containers using Docker, including driver installation and container setup.\n\n    .. 
grid-item-card:: Locate Neuron DLC Images\n        :link: locate-neuron-dlc-image\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Find the right pre-configured Deep Learning Container image for your ML framework and instance type.\n\n    .. grid-item-card:: Customize Neuron DLC\n        :link: containers-dlc-then-customize-devflow\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Learn how to customize Neuron Deep Learning Containers to fit your specific project requirements.\n\n    .. grid-item-card:: Neuron Plugins\n        :link: neuron-container-plugins\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Explore Neuron plugins for containerized environments, providing better observability and fault tolerance.\n\n    .. grid-item-card:: Tutorials\n        :link: /containers/tutorials\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Hands-on tutorials for deploying containers on EC2, EKS, ECS, and other platforms with various configurations.\n\n    .. grid-item-card:: How-To: Schedule MPI Jobs on UltraServers\n        :link: containers-how-to-ultraserver\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Learn how to schedule MPI jobs to run on Neuron UltraServers in EKS for improved performance.\n\n    .. grid-item-card:: FAQ & Troubleshooting\n        :link: container-faq\n        :link-type: ref\n        :class-card: sd-rounded-3\n        \n        Frequently asked questions and solutions for common issues with Neuron containers.\n\n    .. grid-item-card:: Neuron Containers Release Notes\n        :link: /release-notes/components/containers\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Review the latest updates, new DLC images, and improvements in Neuron container releases.\n"
  },
  {
    "path": "containers/k8.rst",
    "content": ".. _self-managed-kubernetes-service:\n\nSelf Managed Kubernetes Service\n===============================\nIntroduction\n------------\nUse of Neuron in containers on a Kubernetes cluster can be simple to achieve by following :ref:`tutorial-k8s-env-setup-for-neuron`\n\nKnown Limitations\n-----------------\nScheduling on k8s cluster requires contiguous neuron device-ids.  Neuron provides a scheduler extension to solve this problem for self-managed k8 clusters.  Read more about it here: :ref:`neuron-k8-scheduler-ext`.\n"
  },
  {
    "path": "containers/kubernetes-getting-started.rst",
    "content": ".. _kubernetes-getting-started:\n\nUsing Neuron with Amazon EKS\n=============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. _tutorial-k8s-env-setup-for-neuron:\n\nEKS Setup for Neuron\n--------------------\n\nCustomers that use Kubernetes can conveniently integrate Inf/Trn instances into their workflows. This section provides step-by-step instructions for setting up an EKS cluster with Neuron support.\n\nPrerequisites\n~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-prerequisite.rst\n\nNeuron Helm Chart\n~~~~~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-neuron-helm-chart.rst\n\n.. _k8s-neuron-device-plugin:\n\nNeuron Device Plugin\n~~~~~~~~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-neuron-device-plugin.rst\n\n.. _neuron_scheduler:\n\nNeuron Scheduler Extension\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-neuron-scheduler.rst\n\nNeuron Node Problem Detector and Recovery\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.rst\n\n.. include:: /containers/tutorials/k8s-neuron-problem-detector-and-recovery.rst\n\nNeuron Monitor Daemonset\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. include:: /containers/tutorials/k8s-neuron-monitor.rst\n"
  },
  {
    "path": "containers/locate-neuron-dlc-image.rst",
    "content": ".. _locate-neuron-dlc-image:\n\nNeuron Deep Learning Containers\n===============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nAWS Deep Learning Containers (DLCs) provide a set of Docker images that are pre-installed with deep learning frameworks.\nThe containers are optimized for performance and available in Amazon Elastic Container Registry (Amazon ECR).\nDLCs make it straightforward to deploy custom ML environments in a containerized manner,\nwhile taking advantage of the portability and reproducibility benefits of containers.\n\nAWS Neuron DLCs are a set of Docker images for training and serving models on AWS Trainium and Inferentia instances using AWS Neuron SDK.\nThe sections below list all of the AWS Neuron DLCs, as well as the AWS DLCs that come pre-installed with the Neuron SDK.\n\n\nInference Containers\n--------------------\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - DLC Name\n      - DLC Link(s)\n      - Tutorial(s)\n\n    * - Neuron Inference Containers\n      - | `Neuron PyTorch Inference Containers <https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuron>`_\n        | `Neuronx PyTorch Inference Containers <https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx>`_\n        | `Neuronx PyTorch vLLM Inference Containers <https://github.com/aws-neuron/deep-learning-containers#vllm-inference-neuronx>`_\n      - | :ref:`tutorial-infer`\n        | :ref:`torchserve-neuron`\n        | :ref:`quickstart_vllm_dlc_deploy`\n\n    * - Large Model Inference (LMI)/Deep Java Library (DJL) Containers\n      - `LMI Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers>`_\n      -\n\n    * - HuggingFace Inference Containers\n      - | `HuggingFace Neuron Inference Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-inference-containers>`_\n        | `HuggingFace Neuron vLLM Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-vllm-containers>`_\n        | `HuggingFace Text Generation Inference (TGI) Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-text-generation-inference-tgi-containers>`_\n      -\n\n    * - Triton Inference Containers\n      - `NVIDIA Triton Inference Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only>`_\n      -\n\n\nTraining Containers\n-------------------\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - DLC Name\n      - DLC Link(s)\n      - Tutorial(s)\n\n    * - Neuron Training Containers\n      - | `Neuronx PyTorch Training Containers <https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx>`_\n        | `Neuronx Jax Training Containers <https://github.com/aws-neuron/deep-learning-containers#jax-training-neuronx>`_\n      - :ref:`tutorial-training`\n\n    * - HuggingFace Training Containers\n      - `HuggingFace Neuron Training Containers <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-neuron-training-containers>`_\n      -\n\n.. 
note::\n   Latest HuggingFace Neuron containers are also available on the `HuggingFace Optimum website <https://huggingface.co/docs/optimum-neuron/en/containers#available-optimum-neuron-containers>`_.\n\n\nGetting started with Neuron DLC using Docker\n----------------------------------------------\n\n:ref:`containers-getting-started`\n\n\nUsing containers on AWS services\n----------------------------------\n\n:ref:`Amazon EKS<eks_flow>`\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n:ref:`Amazon ECS<ecs_flow>`\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n:ref:`Amazon SageMaker<sagemaker_flow>`\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n:ref:`AWS Batch<aws_batch_flow>`\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\nCustomizing Neuron Deep Learning Containers\n-------------------------------------------\nDeep Learning Containers can be customized to fit your specific project needs.\nTo read more, visit :ref:`containers-dlc-then-customize-devflow`.\n"
  },
  {
    "path": "containers/neo-then-hosting-devflow.rst",
    "content": ".. include:: /devflows/inference/neo-then-hosting-devflow.rst"
  },
  {
    "path": "containers/neuron-dra.rst",
    "content": ".. meta::\n   :description: AWS Neuron Dynamic Resource Allocation (DRA) for Kubernetes\n   :keywords: AWS, Neuron, DRA, Kubernetes, Dynamic Resource Allocation\n\n.. _neuron-dra:\n\n=================================================\nAWS Neuron Dynamic Resource Allocation (DRA)\n=================================================\n\nWhat is DRA?\n------------\n\nPrior to Kubernetes 1.33, Kubernetes used device plugins for resource management. The Neuron device plugin implements the\ndevice plugin interface to allow Kubernetes scheduler to manage Neuron resources. However, the device plugin framework\nonly tracks device count—the scheduler cannot see device attributes. Due to this limitation, the framework cannot natively\nfacilitate attribute-based filtering during device selection. For example, the default Kubernetes scheduler prior to DRA cannot\nsupport allocation of connected devices without additional mechanisms such as a scheduler extension.\n\nDynamic Resource Allocation (DRA) is a new framework for advanced resource management that addresses this limitation. DRA\nenables the scheduler to see the device attributes, allowing workloads to select devices based on specific attributes and\nachieve topology aware allocation. Hardware vendors determine which attributes are published for their hardware. The AWS\nNeuron DRA driver implements the kubelet plugin for DRA for AWS Trainium instances.\n\nFor more information on DRA, refer to `Kubernetes Dynamic Resource Allocation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_.\n\nWhere can I get the Neuron DRA driver and resource templates?\n-------------------------------------------------------------------\n\nTo review and download the individual resource claim templates, visit this page: \n\n* :doc:`/containers/files/index-dra`.\n\nWhat are the benefits of using DRA over device plugin?\n-------------------------------------------------------\n\n**Reduced developer complexity**\n\nDevice plugin-based workloads use node labels along with request and limits to allocate right resources. Example:\n\n.. code-block:: yaml\n\n   Worker:\n     replicas: 4\n     template:\n       spec:\n         containers:\n         - image: <aws-account-id>.dkr.ecr.us-west-2.amazonaws.com/neuronx_nemo:latest\n           name: mpitest\n           imagePullPolicy: Always\n           resources:\n             limits:\n               aws.amazon.com/neuron: \"16\"\n               vpc.amazonaws.com/efa: \"16\"\n             requests:\n               aws.amazon.com/neuron: \"16\"\n               vpc.amazonaws.com/efa: \"16\"\n           volumeMounts:\n           - name: dshm\n             mountPath: /dev/shm\n         volumes:\n         - name: dshm\n           emptyDir:\n             medium: Memory\n\nDRA introduces ``ResourceClaim`` and ``ResourceClaimTemplates`` which provide abstraction:\n\n.. 
code-block:: yaml\n\n   Worker:\n     replicas: 4\n     template:\n       spec:\n         containers:\n         - image: <aws-account-id>.dkr.ecr.us-west-2.amazonaws.com/neuronx_nemo:latest\n           name: mpitest\n           imagePullPolicy: Always\n           resources:\n             claims:\n             - name: neurons\n           volumeMounts:\n           - name: dshm\n             mountPath: /dev/shm\n         volumes:\n         - name: dshm\n           emptyDir:\n             medium: Memory\n         resourceClaims:\n         - name: neurons\n           resourceClaimTemplateName: efa-neurons-4-devices\n\nThe ``ResourceClaimTemplate`` name is a given name and can be defined by the ML infra operators to be friendly to their developers. The RCT\ndefinition translates the name into the underlying allocation details - these are abstracted away from ML developers.\n\n**Rich interface for resource requests**\n\nWith DRA, resource requests can specify attribute-based selection. For example, RCT can follow requests, which was not possible to\ndo with device plugins without additional node labeling and extensions. This interface allows us to facilitate topology-aware scheduling.\n\n* Allocate connected neuron devices from trn2 instance type and the devices in the set need to be running specified Neuron driver version.\n* Allocate a specific set of neuron devices for my pod - I want the pod to use devices in row 1 of the topology.\n\n**Dynamic configuration**\n\nDRA allows end users to specify additional configuration for the device via RCT. The Neuron DRA driver leverages this capability to\nallow ResourceClaimTemplates to specify LNC size to be used for the allocation. An example is shown below. The end user need\nnot configure LNC via launch template while using Neuron devices with Neuron DRA driver.\n\n.. code-block:: yaml\n\n   #Template will be vended by Neuron via documentation/code repo\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     namespace: neuron-test7\n     name: lnc-neurons\n   spec:\n     spec:\n       devices:\n         requests:\n         - name: neurons\n           exactly:\n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: device.attributes['neuron.aws.com'].instanceType == \"trn2.48xlarge\"\n             allocationMode: all\n         config:\n         - opaque:\n             driver: neuron.aws.com\n             parameters:\n               apiVersion: neuron.aws.com/v1\n               kind: NeuronConfig\n               logicalNeuronCore: 1\n           requests: [\"neurons\"]\n\nPrerequisites\n-----------------------------\n\n* **Kubernetes version** - Please use K8s control plane 1.34+\n* **Instance type** - Trn2.48xlarge launched with K8s version 1.34.2+\n\nFor instructions on how to setup an EKS cluster, please refer to :ref:`prerequisites<k8s-prerequisite>`.\n\nInstallation via Helm\n---------------------\n\nConnect to your cluster from local box. The cluster should have at least one trn2.48xlarge node. \nDo not install the Neuron device plugin on the cluster! \n\nPlease confirm the cluster being used via:\n\n.. code-block:: bash\n\n   kubectl config current-context\n\nThen install the DRA driver:\n\n.. 
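\n\nBefore doing so, you can optionally confirm that the control plane exposes the DRA API group (``resource.k8s.io``); the exact set of resources listed depends on your Kubernetes version:\n\n.. code-block:: bash\n\n   kubectl api-resources --api-group=resource.k8s.io\n\nThen run the install with Helm:\n\n.. 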
code-block:: bash\n\n   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \\\n     --set \"devicePlugin.enabled=false\" --set \"npd.enabled=false\" --set \"draDriver.enabled=true\"\n\nExample 1 – Connected Neuron Devices\n--------------------------------------\n\nThis section will demonstrate how to run a workload that needs to request a subset of connected Neuron Devices from a trn2.48xlarge instance.\nBefore DRA, this use case required using Neuron Scheduler Extension. With DRA, this allocation is enabled natively.\n\n* [:download:`Download example YAML file </containers/files/specs/1x4-connected-devices.yaml>`]\n\nThe supported subsets include set of 1, 4, 8 or 16. Specifically, these are ``resource.aws.com/devicegroup1_id``, ``resource.aws.com/devicegroup4_id``,\n``resource.aws.com/devicegroup8_id``, ``resource.aws.com/devicegroup16_id`` respectively.\n\nThe sets of 4 and 8 are selected as shown in diagram below:\n\n.. image:: /containers/images/neuron-dra-connected-devices.jpeg\n   :alt: Connected Neuron Devices\n   :width: 600px\n\nTo enable a workload to consume a connected subset of Neuron Devices, first create a ``ResourceClaimTemplate`` that requests a connected set of\nNeuron devices. From the package run:\n\n.. code-block:: bash\n\n   kubectl apply -f specs/1x4-connected-devices.yaml\n\nThis workload definition (which includes the ``ResourceClaimTemplate``) is shown below for quick reference:\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: 1x4-connected-neurons\n   spec:\n     spec:\n       devices:\n         requests:\n         - name: neurons\n           exactly:\n             deviceClassName: neuron.aws.com\n             allocationMode: ExactCount\n             count: 4\n             selectors:\n             - cel:\n                 expression: \"device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\"\n         constraints:\n         - requests: [\"neurons\"]\n           matchAttribute: \"resource.aws.com/devicegroup4_id\"\n\nNext step is to reference the ``ResourceClaimTemplate`` in a pod definition as shown below:\n\n.. code-block:: yaml\n\n   ---\n   apiVersion: v1\n   kind: Pod\n   metadata:\n     name: pod0\n     labels:\n       app: pod\n   spec:\n     containers:\n     - name: ctr0\n       image: public.ecr.aws/ubuntu/ubuntu:22.04\n       command: [\"bash\", \"-c\"]\n       args: [\"export; trap 'exit 0' TERM; sleep 9999 & wait\"]\n       resources:\n         claims:\n         - name: neurons\n     resourceClaims:\n     - name: neurons\n       resourceClaimTemplateName: 1x4-connected-neurons\n\n\nDeploy the above workload using ``kubectl apply``. When the pod is running, examine the related ``ResourceClaim`` using:\n\n.. code-block:: bash\n\n   kubectl get resourceclaim -o yaml\n\nThe ``resourceclaim`` output will show the 4 Neuron Devices that were allocated to the pod. An example is shown below. These will be connected Neuron\nDevices.\n\n.. code-block:: bash\n\n   [devbox]$ kubectl get pod\n   \n   NAME   READY   STATUS    RESTARTS   AGE\n   ---------------------------------------\n   pod0   1/1     Running   0          3s\n   \n   [devbox]$ kubectl get resourceclaim\n   \n   NAME                 STATE                AGE\n   ---------------------------------------------\n   pod0-neurons-zdk76   allocated,reserved   9s\n   \n   [devbox]$ kubectl get resourceclaim pod0-neurons-zdk76 -o yaml\n\nStatus shown below:\n\n.. 
code-block:: yaml\n\n   status:\n     allocation:\n       devices:\n         results:\n         - adminAccess: null\n           device: neurondevice2\n           driver: neuron.aws.com\n           pool: ip-1-1-1-1.region.compute.internal\n           request: neurons\n         - adminAccess: null\n           device: neurondevice3\n           driver: neuron.aws.com\n           pool: ip-1-1-1-1.region.compute.internal\n           request: neurons\n         - adminAccess: null\n           device: neurondevice1\n           driver: neuron.aws.com\n           pool: ip-1-1-1-1.region.compute.internal\n           request: neurons\n         - adminAccess: null\n           device: neurondevice0\n           driver: neuron.aws.com\n           pool: ip-1-1-1-1.region.compute.internal\n           request: neurons\n\n.. note::\n   The RCT name can be simplified to communicate the intent of the allocation and abstract the allocation details away from ML developers.\n\n**Example RCT1 - \"xl\" - Allocate All 16 devices**\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: xl-trn2\n   spec:\n     spec:\n       devices:\n         requests:\n         - name: neurons\n           exactly: \n             allocationMode: ExactCount\n             count: 16\n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\n\n**Example RCT2 - large - Allocate 8 devices**\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: l-trn2\n   spec:\n     spec:\n       devices:\n         constraints:\n         - matchAttribute: resource.aws.com/devicegroup8_id\n           requests:\n           - neurons\n         requests:\n         - name: neurons\n           exactly:\n             allocationMode: ExactCount\n             count: 8\n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\n\n**Example RCT2 - 2.27-driver – Allocate 8 devices with driver version at the driver published by Neuron SDK 2.27**\n\n`Neuron 2.27.0 Runtime <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/2.27.0/runtime.html#neuron-2-27-0-runtime>`_\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: 2.27-driver-trn2\n   spec:\n     spec:\n       devices:\n         constraints:\n         - matchAttribute: resource.aws.com/devicegroup8_id\n           requests:\n           - neurons\n         requests:\n         - name: neurons\n           exactly:\n             allocationMode: ExactCount\n             count: 8\n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge' &&\n                            device.attributes['neuron.aws.com'].neuronDriverVersion == '2.25.4.0'\n\nExample 2 - Dynamic LNC config\n------------------------------\n\nThis example shows how to set LNC per workload. Earlier, overriding LNC on a Node required a node template. With DRA, workloads can\noverride default LNC via ``ResourceClaim.``\n\n* [:download:`Download example YAML file </containers/files/specs/lnc-setting-trn2.yaml>`]\n\n\nApply the following workload definition:\n\n.. 
code-block:: bash\n\n   kubectl apply -f specs/lnc-setting-trn2.yaml\n\nThis workload definition (which includes the ``ResourceClaimTemplate``) is shown below for quick reference:\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: all-neurons-lnc-1\n   spec:\n     spec:\n       devices:\n         requests:\n         - name: neurons\n           exactly:\n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: \"device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'\"\n             allocationMode: All\n         config:\n         - requests: [\"neurons\"]\n           opaque:\n             driver: neuron.aws.com\n             parameters:\n               apiVersion: neuron.aws.com/v1\n               kind: NeuronConfig\n               logicalNeuronCore: 1\n\nThen deploy a pod that references the above ``ResourceClaimTemplate`` as shown below:\n\n.. code-block:: yaml\n\n   apiVersion: v1\n   kind: Pod\n   metadata:\n     name: pod0\n     labels:\n       app: pod\n   spec:\n     containers:\n     - name: ctr0\n       image: public.ecr.aws/ubuntu/ubuntu:22.04\n       command: [\"bash\", \"-c\"]\n       args: [\"export; trap 'exit 0' TERM; sleep 9999 & wait\"]\n       resources:\n         claims:\n         - name: neurons\n     resourceClaims:\n     - name: neurons\n       resourceClaimTemplateName: all-neurons-lnc-1\n\nExample 3 – Four Node Inference on trn2u.48xlarge\n--------------------------------------------------\n\nA trn2u.48xlarge Trn2 UltraServer has 4 Trn2 nodes interconnected by Neuron Links.\n\ntrn2u.48xlarge instances can be allocated in set of 1, 2, or 4. The Neuron DRA driver can utilize 1 or more ``ResourceClaimTemplate`` definitions to convey the\ndesired size of the set. The ``ResourceClaimTemplate`` allows end users to specify \"UltraServerConfig\" to declare their intent to use all 4 nodes of\nthe UltraServer. This configuration value is passed by the Neuron DRA driver to the Neuron runtime and collectives inside the container.\n\n* [:download:`Download example YAML file </containers/files/specs/4-node-inference-us.yaml>`]\n\nExample yaml for 4-node inference on trn2u.48xlarge:\n\n.. code-block:: yaml\n\n   apiVersion: resource.k8s.io/v1\n   kind: ResourceClaimTemplate\n   metadata:\n     name: us-4-node-config\n   spec:\n     spec:\n       devices:\n         requests:\n         - name: neurons\n           exactly: \n             deviceClassName: neuron.aws.com\n             selectors:\n             - cel:\n                 expression: \"device.attributes['neuron.aws.com'].resourceType == 'neuron_node'\"\n             allocationMode: ExactCount\n             count: 1\n         config:\n         - requests: [\"neurons\"]\n           opaque:\n             driver: neuron.aws.com\n             parameters:\n               apiVersion: neuron.aws.com/v1\n               kind: UltraServerConfig\n               ultraserverMode: 4\n   ---\n   apiVersion: leaderworkerset.x-k8s.io/v1\n   kind: LeaderWorkerSet\n   metadata:\n     name: vllm\n     annotations:\n       leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-4\n   spec:\n     rolloutStrategy:\n       type: RollingUpdate\n       rollingUpdateConfiguration:\n         maxUnavailable: 1\n         maxSurge: 1\n     # Two replica groups of 4 nodes each, i.e. 
two ultraservers\n     replicas: 2\n     leaderWorkerTemplate:\n       size: 4\n       restartPolicy: RecreateGroupOnPodRestart\n       leaderTemplate:\n         metadata:\n           labels:\n             role: leader\n         spec:\n           containers:\n           - name: vllm-leader\n             image: public.ecr.aws/ubuntu/ubuntu:22.04\n             command:\n             - sh\n             - -c\n             - \"sleep infinity\"\n             resources:\n               claims:\n               - name: one-node-from-ultraserver\n           resourceClaims:\n           - name: one-node-from-ultraserver\n             resourceClaimTemplateName: us-4-node-config\n       workerTemplate:\n         metadata:\n           labels:\n             role: worker\n         spec:\n           containers:\n           - name: vllm-worker\n             image: public.ecr.aws/ubuntu/ubuntu:22.04\n             command:\n             - sh\n             - -c\n             - \"sleep infinity\"\n             resources:\n               claims:\n               - name: one-node-from-ultraserver\n           resourceClaims:\n           - name: one-node-from-ultraserver\n             resourceClaimTemplateName: us-4-node-config\n\n\nNeuron DRA Driver Attributes Reference\n---------------------------------------\n\nThe Neuron DRA driver publishes the following attributes in resource slices. These attributes can be used in ``ResourceClaimTemplate`` CEL expressions\nto filter and select specific devices for allocation.\n\nCommon Attributes\n^^^^^^^^^^^^^^^^^\n\nThese attributes are common to all Neuron instances and their devices:\n\n* ``deviceId`` - An integer value representing the ID of the Neuron device. Used to identify which device is chosen from allocation.\n* ``instanceType`` - A string value representing the EC2 instance type of the Neuron device. Used to specify devices of which instance(s) to choose for allocation.\n* ``neuronDriverVersion`` - A string value representing the Neuron driver version running on the instance. Used to claim instances with the same driver version for allocation.\n* ``draDriverVersion`` - A version value of the Neuron DRA driver version. Provides visibility on which Neuron DRA driver version published the resource slice.\n* ``resourceType`` - A string value to distinguish between devices and UltraServer nodes. For devices, this value is ``neuron_device``. For UltraServers, this value is ``neuron_node``.\n* ``networkNodeLayer1`` - A string value representing network node layer 1. Can be used during topology-aware scheduling to minimize network latency and optimize instance placement. See `EC2 Instance Topology <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/how-ec2-instance-topology-works.html>`_.\n* ``networkNodeLayer2`` - A string value representing network node layer 2. Can be used to allocate workloads to nodes on the same spine. See `EC2 Instance Topology <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/how-ec2-instance-topology-works.html>`_.\n* ``networkNodeLayer3`` - A string value representing network node layer 3. Can be used during topology-aware scheduling to minimize network latency and optimize instance placement. 
See `EC2 Instance Topology <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/how-ec2-instance-topology-works.html>`_.\n\nTrn Non-UltraServer Attributes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThese attributes are only populated for Neuron instances that have grid topology (trn) and are not UltraServers:\n\n* ``topology_x`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select a specific device or devices that belong to the same row.\n* ``topology_y`` - An integer value representing the column of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select a specific device or devices that belong to the same column.\n* ``topology4_id`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select devices that belong to the same row.\n* ``topology8_id`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than or equal to 8. Can be used to select devices that belong to the same two rows.\n\nTrn UltraServer Attributes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThese attributes are only populated for Neuron instances that have grid topology (trn) and are UltraServers:\n\n* ``capacityBlockId`` - A string value representing the ID of the capacity block that the UltraServer instance is in. See `Instance Topology API <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_InstanceTopology.html>`_.\n\nEFA-Enabled Instance Attributes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThese attributes are only populated for Neuron instances that are EFA-enabled:\n\n* ``resource.aws.com/devicegroup1_id`` - A string value representing the EFA Bus:Device:Function (BDF) corresponding to that device.\n* ``resource.aws.com/devicegroup4_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 4 get the same group ID.\n* ``resource.aws.com/devicegroup8_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 8 get the same group ID.\n* ``resource.aws.com/devicegroup16_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 16 get the same group ID.\n\nFAQs\n----\n\nCan DRA plugin co-exist with other device plugins?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDevice plugins and the DRA plugin can coexist in the same cluster, but **not** for the same node. As of now, the two mechanisms act independently. Neuron is preparing\nan upcoming feature that will allow device plugin based allocations to work with DRA, but the feature is still in alpha and not enabled on EKS.\nRef: `Extended Resource <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource>`_.\n\nIs DRA replacing Neuron Device Plugin and Scheduler Extension?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe will continue to support the Neuron Device Plugin and Scheduler Extension as long as:\n\n1. Upstream Kubernetes continues to support device plugins.\n2. EKS continues to support Kubernetes versions below 1.34 (which do not support DRA).\n\nWhat Kubernetes versions are supported?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nKubernetes control plane must be on 1.34. 
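\n\nYou can check both the control plane version and the kubelet version reported by each node with:\n\n.. code-block:: bash\n\n   kubectl version\n   kubectl get nodes   # the VERSION column shows the kubelet version per node\n\n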
For Node AMI, we support 1.34.2+. We do not support Node AMI for 1.34.0 or 1.34.1\nsince it had a regression in DRA. Upstream issue: `Kubernetes Issue #133920 <https://github.com/kubernetes/kubernetes/issues/133920>`_\n\nWhere can I learn more about how to put together RCT using CEL expressions?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo learn more about RCTs, please visit `Kubernetes Dynamic Resource Allocation <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/>`_. To learn more\nabout CEL expressions, please visit `CEL Language <https://cel.dev/>`_. Send us feedback and let us know which additional RCT examples you would like\nus to provide in the source code.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Support Files </containers/files/index-dra>\n"
  },
  {
    "path": "containers/neuron-plugins.rst",
    "content": ".. _neuron-container-plugins:\n\nNeuron Plugins for Containerized Environments\n=============================================\n\nThis section provides an overview of the Neuron infrastructure components for containerized environments. For detailed setup instructions, see :ref:`tutorial-k8s-env-setup-for-neuron`.\n\nNeuron Device Plugin\n--------------------\n\nExposes Neuron hardware resources to Kubernetes as schedulable resources (``aws.amazon.com/neuron`` and ``aws.amazon.com/neuroncore``). The device plugin discovers Neuron devices on each node, advertises them to the scheduler, and manages allocation to Pods with exclusive access.\n\nNeuron Scheduler Extension\n---------------------------\n\nProvides topology-aware scheduling for optimal Neuron device allocation. It considers device connectivity and placement to ensure efficient utilization. This component is optional and most beneficial for workloads requesting specific subsets of Neuron devices or cores.\n\nNeuron Node Problem Detector and Recovery\n------------------------------------------\n\nMonitors Neuron device health and detects hardware and software errors. When unrecoverable issues occur, it can mark nodes as unhealthy and trigger node replacement. It also publishes CloudWatch metrics under the ``NeuronHealthCheck`` namespace for monitoring.\n\nFor ECS environments, see :ref:`ecs-neuron-problem-detector-and-recovery`.\n\nNeuron Monitor\n--------------\n\nCollects and exposes metrics from Neuron devices including hardware utilization, performance counters, memory usage, and device health. Supports integration with observability platforms like Prometheus for monitoring and alerting.\n\nNeuron Dynamic Resource Allocation (DRA) Driver\n-----------------------------------------------\n\nManages Neuron hardware resources in a Kubernetes environment. It integration with Kubernetes Dynamic Resource Allocation (DRA) framework to advertise Neuron devices and their attributes. This feature cannot be used alongside Neuron device plugin for nodes of the same cluster. For more information on Neuron DRA driver, please refer to :ref:`neuron-dra`\n\n"
  },
  {
    "path": "containers/neuron_dlc_images.csv",
    "content": "Framework,Neuron Package,Job Type,Supported EC2 Instance Types,Python Version Options,ECR Public Repo URL,Image Details,Other Packages\nPyTorch 2.1.2,\"aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx\",inference,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx,torchserve\nPyTorch 2.1.2,\"aws-neuronx-tools, neuronx_distributed, torch-neuronx\",training,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-training-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx,\nPyTorch 1.13.1,\"aws-neuronx-tools, torch-neuron\",inference,inf1,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuron,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuron,torchserve\nPyTorch 1.13.1,\"aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx\",inference,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx,torchserve\nPyTorch 1.13.1,\"aws-neuronx-tools, neuronx_distributed, torch-neuronx\",training,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-training-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx,\n"
  },
  {
    "path": "containers/troubleshooting.rst",
    "content": ".. _container-troubleshooting:\n\nTroubleshooting Neuron Containers\n=================================\n\nThis document aims to provide more information on how to fix issues you\nmight encounter while using the Neuron Containers. For each\nissue we will provide an explanation of what happened and what can\npotentially correct the issue.\n\n\nIf your issue is not listed below or you have a more nuanced problem, contact\nus via `issues <https://github.com/aws/aws-neuron-sdk/issues>`__ posted\nto this repo, the `AWS Neuron developer\nforum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`__, or\nthrough AWS support.\n\nNeuron Container includes the following Neuron Components. For issues relating to \nthese components inside the container refer the individual component troubleshooting\nguides :ref:`general-troubleshooting`\n\n* Neuron Runtime/Driver\n* Pytorch/Tenosrflow/MXNet frameworks\n* Libfabric/EFA \n\nThe following are container specific issues\n\nNeuron Device Not found\n-----------------------\n\nThe neuron container expects the neuron devices to be exposed to the container as\nreferenced in :ref:`container-devices`. \n\nPlease look at the container logs to see messages like below\n\n::\n\n   2022-Sep-08 17:55:23.0768    19:19    ERROR  TDRV:tdrv_get_dev_info                       No neuron device available\n\n\nIf the above message is seen then devices are not exposed to container\n\nSolution\n''''''''\n\n* Refer :ref:`container-devices` and make sure the devices are exposed to container\n* If specific cores are being used refer :ref:`container-cores` and make sure the cores are exposed to container\n* In kubernetes environment refer :ref:`k8s-specify-devices` or :ref:`k8s-specify-cores` to make sure neuron devices/cores are there in pods container spec\n\n\nContiguous Device ID's\n-----------------------\n\nNeuron runtime expects the inferentia/trainium device id's to be contigious. If the device id's\nare not contiguous you might see error messages like below\n\n\n::\n\n   2022-Sep-08 21:52:11.0307     7:7     ERROR  TDRV:tdrv_init_mla_phase1                    Could not open the nd1\n\n::\n\n   2022-Sep-08 23:00:05.0667     8:8     ERROR   NRT:nrt_allocate_neuron_cores               Neuron cores are not contiguous\n\n\nSolution\n''''''''\n\n* In the docker run command make sure the devices specified using --device are all contiguous\n* If oci neuron hook is used and the env variable AWS_NEURON_VISIBLE_DEVICES is used then make sure the\ndevices specified are all contiguous\n* In kubernetes environment with just the neuron device plugin running there is no guarantee that\nthe devices allocated will be contiguous. Make sure to run the neuron scheduler extension as specified in :ref:`neuron-k8-scheduler-ext`"
  },
  {
    "path": "containers/tutorial-docker-runtime1.0.rst",
    "content": ".. _tutorial-docker-environment-setup-for-neuron-runtime-10:\n\nTutorial: Docker environment setup for Neuron Runtime 1.x\n=========================================================\n\nIntroduction\n------------\n\nA Neuron application can be deployed using docker containers. This\ntutorial describes how to configure docker to expose Inferentia devices\nto containers.\n\nOnce the environment is setup, a container can be started with\n*AWS_NEURON_VISIBLE_DEVICES* environment variable to specify desired set\nof Inferentia devices to be exposed to the container.\nAWS_NEURON_VISIBLE_DEVICES is a set of contiguous comma-seperated\ninferentia logical ids. To find out the available logical ids on your\ninstance, run the neuron-ls tool. For example, on inf1.6xlarge instance\nwith 4 inferentia devices, you may set AWS_NEURON_VISIBLE_DEVICES=\"2,3\"\nto expose the last two devices to a container. When running neuron-ls\ninside a container, you will only see the set of exposed Inferentias.\nFor example:\n\n.. code:: bash\n\n   docker run --env AWS_NEURON_VISIBLE_DEVICES=\"0\" neuron-test neuron-ls\n\nWould produce the following output:\n\n::\n\n   +--------------+---------+--------+-----------+-----------+------+------+\n   |   PCI BDF    | LOGICAL | NEURON |  MEMORY   |  MEMORY   | EAST | WEST |\n   |              |   ID    | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |\n   +--------------+---------+--------+-----------+-----------+------+------+\n   | 0000:00:1f.0 |       0 |      4 | 4096 MB   | 4096 MB   |    0 |    0 |\n   +--------------+---------+--------+-----------+-----------+------+------+\n\nSteps:\n------\n\nThis tutorial starts from a fresh Ubuntu Server 16.04 LTS AMI\n\"ami-08bc77a2c7eb2b1da\".\n\nStep 1: install aws-neuron-runtime-base package\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFollow the :ref:`install-guide-index` to\nsetup access to Neuron repos. Then, install the aws-neuron-runtime-base\npackage.\n\n.. code:: bash\n\n   sudo apt-get install aws-neuron-runtime-base\n\nStep 2: Make sure that the neuron-rtd service is not running\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf neuron-rtd is running on the host, stop the neuron-rtd service before\nstarting the containerized neuron-rtd. This is needed to allow\nassignment of devices to containers:\n\n.. code:: bash\n\n   sudo service neuron-rtd stop\n\nStep 3: install oci-add-hooks dependency\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n`oci-add-hooks <https://github.com/awslabs/oci-add-hooks>`__ is an OCI\nruntime with the sole purpose of injecting OCI prestart, poststart, and\npoststop hooks into a container config.json before passing along to an\nOCI compatable runtime. oci-add-hooks is used to inject a hook that\nexposes Inferentia devices to the container.\n\n.. code:: bash\n\n   sudo apt install -y golang && \\\n       export GOPATH=$HOME/go && \\\n       go get github.com/joeshaw/json-lossless && \\\n       cd /tmp/ && \\\n       git clone https://github.com/awslabs/oci-add-hooks && \\\n       cd /tmp/oci-add-hooks && \\\n       make build && \\\n       sudo cp /tmp/oci-add-hooks/oci-add-hooks /usr/local/bin/\n\n.. _step-4-setup-docker-to-use-oci-neuron-oci-runtime:\n\nStep 4: setup Docker to use oci-neuron OCI runtime.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\noci-neuron is a script representing OCI compatible runtime. It wraps\noci-add-hooks, which wraps runc. In this step, we configure docker to\npoint at oci-neuron OCI runtime. Install dockerIO:\n\n.. 
code:: bash\n\n   sudo apt install -y docker.io\n   sudo usermod -aG docker $USER\n\nLogout and log back in to refresh membership. Place daemon.json Docker\nconfiguration file supplied by Neuron SDK in default location. This file\nspecifies oci-neuron as default docker runtime:\n\n.. code:: bash\n\n   sudo cp /opt/aws/neuron/share/docker-daemon.json /etc/docker/daemon.json\n   sudo service docker restart\n\nIf the docker restart command fails, make sure to check if the docker\nsystemd service is not masked. More information on this can be found\nhere: https://stackoverflow.com/a/37640824\n\nVerify docker:\n\n.. code:: bash\n\n   docker run hello-world\n\nExpected result:\n\n::\n\n   Hello from Docker!\n   This message shows that your installation appears to be working correctly.\n\n   To generate this message, Docker took the following steps:\n   1. The Docker client contacted the Docker daemon.\n   2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n   (amd64)\n   3. The Docker daemon created a new container from that image which runs the\n   executable that produces the output you are currently reading.\n   4. The Docker daemon streamed that output to the Docker client, which sent it\n   to your terminal.\n\n   To try something more ambitious, you can run an Ubuntu container with:\n   $ docker run -it ubuntu bash\n\n   Share images, automate workflows, and more with a free Docker ID:\n   https://hub.docker.com/\n\n   For more examples and ideas, visit:\n   https://docs.docker.com/get-started/\n\nBuild a docker image using provided dockerfile :ref:`neuron-runtime-dockerfile`, and use to\nverify whitelisting:\n\n.. code:: bash\n\n   docker build . -f Dockerfile.neuron-rtd -t neuron-test\n\nThen run:\n\n.. code:: bash\n\n   docker run --env AWS_NEURON_VISIBLE_DEVICES=\"0\"  neuron-test neuron-ls\n\nExpected result:\n\n::\n\n   +--------------+---------+--------+-----------+-----------+------+------+\n   |   PCI BDF    | LOGICAL | NEURON |  MEMORY   |  MEMORY   | EAST | WEST |\n   |              |   ID    | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |\n   +--------------+---------+--------+-----------+-----------+------+------+\n   | 0000:00:1f.0 |       0 |      4 | 4096 MB   | 4096 MB   |    0 |    0 |\n   +--------------+---------+--------+-----------+-----------+------+------+\n"
  },
  {
    "path": "containers/tutorials/build-run-neuron-container.rst",
    "content": ".. _how-to-build-neuron-container:\n\nTutorial How to Build and Run a Neuron Container\n================================================\n\nIntroduction\n------------\n\nThis document explains how to build a Neuron Container using an existing Dockerfile.\n\nPre-requisites\n--------------\n#. Docker version 18 or newer is configured according to :ref:`tutorial-docker-env-setup`\n#. Inf1/Trn1 instance with available :ref:`Neuron Devices<container-devices>`\n#. If running a serving application such as tensorflow-model-server, torchserve or multi-model-server, make sure the appropriate ports that the server listens to are exposed using EXPOSE in the Dockerfile or the arguments ``-p 80:8080`` on the ``docker run`` command.\n\n.. _running-application-container:\n\nBuild and Run the Application Container\n---------------------------------------\nFollow the steps below for creating neuron application containers.\n\n- Build a docker image using provided dockerfile :ref:`libmode-dockerfile` for Inf1 and :ref:`trainium-dlc-dockerfile` for Trn1 (also for Trn1 the dockerfile needs mlp train script found here at :ref:`mlp-train`\n\n.. code:: bash\n\n   docker build . -f Dockerfile.pt -t neuron-container:pytorch\n\n- Run the container locally:\n\n.. code:: bash\n\n   docker run -it --name pt17 --device=/dev/neuron0 neuron-container:pytorch neuron-ls\n\nExpected result for Inf1:\n\n::\n\n   +--------------+---------+--------+-----------+-----------+------+------+\n   |   PCI BDF    | LOGICAL | NEURON |  MEMORY   |  MEMORY   | EAST | WEST |\n   |              |   ID    | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |\n   +--------------+---------+--------+-----------+-----------+------+------+\n   | 0000:00:1f.0 |       0 |      4 | 4096 MB   | 4096 MB   |    0 |    0 |\n   +--------------+---------+--------+-----------+-----------+------+------+\n\nExpected result for Trn1:\n\n::\n\n   +--------+--------+--------+-----------+---------+\n   | NEURON | NEURON | NEURON | CONNECTED |   PCI   |\n   | DEVICE | CORES  | MEMORY |  DEVICES  |   BDF   |\n   +--------+--------+--------+-----------+---------+\n   | 0      | 4      | 8 GB   | 1         | 00:1f.0 |\n   +--------+--------+--------+-----------+---------+\n\n\n.. note::\n\n   If instead of the --device option above if the env variable AWS_NEURON_VISIBLE_DEVICES\n   is to be used then the oci hook needs to installed by following instructions in :ref:`tutorial-oci-hook`\n\n\nImportant to know\n-----------------\n\n.. _container-devices:\n\nDevices\n^^^^^^^\n\n- The docker native way is to use --device /dev/neuron# for each of the Neuron Devices intended to be passed. When using --device option ALL/all is not supported.\n\n    .. code:: bash\n\n        docker run --device=/dev/neuron0 --device=/dev/neuron1\n\n- If you install the aws-neuronx-oci-hook package, you will have an OCI hook that also supports use of a container environment variable AWS_NEURON_VISIBLE_DEVICES=<ALL | csv of devices>, which intends to make things easier for multi device scenarios. Following are some examples. For setting up oci hook please refer :ref:`oci neuron hook <tutorial-oci-hook>`\n\n    .. code:: bash\n\n        docker run -e “AWS_NEURON_VISIBLE_DEVICES=0,1”\n        docker run -e “AWS_NEURON_VISIBLE_DEVICES=ALL”\n\n- In kubernetes environment, the neuron device plugin is used for exposing the neuron device to the containers in the pod. The number of devices can be adjusted using the *aws.amazon.com/neuron* resource in the pod specification. 
.. _container-cores:\n\nCores\n^^^^^\nEach Neuron device has multiple cores. The cores allocated to a process/container can be controlled by\nthe environment variables NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES. Please refer to :ref:`nrt-configuration` for more details.\n\n- The docker native way is to use --device /dev/neuron# for each of the Neuron Devices intended to be passed. Add --env NEURON_RT_VISIBLE_CORES=1,2 to make cores 1 and 2 visible to this container. For example, on an inf1.24xlarge with 64 cores, if we want to use cores 51 and 52, the appropriate devices and NEURON_RT_VISIBLE_CORES need to be used. With 4 cores in each device, core 51 is in device 12 and core 52 is in device 13:\n\n    .. code:: bash\n\n        docker run --device=/dev/neuron12 --device=/dev/neuron13 --env NEURON_RT_VISIBLE_CORES=51,52\n\n- In a Kubernetes environment, the Neuron device plugin is used to expose Neuron cores to the containers in a pod. The number of cores can be adjusted using the *aws.amazon.com/neuroncore* resource in the pod specification. Refer to :ref:`K8s setup <tutorial-k8s-env-setup-for-neuron>` for more details.\n\n    .. code:: yaml\n\n         resources:\n            limits:\n               aws.amazon.com/neuroncore: 1\n\n   .. note::\n\n      Only the number of cores can be specified.\n      When only the Neuron device plugin is running, the cores are not guaranteed to be\n      contiguous. Make sure to run the Neuron scheduler extension :ref:`neuron-k8-scheduler-ext`,\n      which ensures that contiguous cores are allocated to the containers.\n\n- Multiple container applications running on the same host cannot share the cores. This is similar to running multiple applications on the host.\n- In a Kubernetes environment, cores cannot be shared by multiple containers in a pod.\n"
  },
  {
    "path": "containers/tutorials/inference/index.rst",
    "content": "Containers -- Inference Tutorials\n=================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /containers/tutorials/inference/tutorial-infer\n    /containers/tutorials/inference/k8s_rn50_demo\n\n\n.. include:: /containers/tutorials/inference/index.txt"
  },
  {
    "path": "containers/tutorials/inference/index.txt",
    "content": "* :ref:`tutorial-infer`\n* :ref:`example-deploy-rn50-as-k8s-service`\n"
  },
  {
    "path": "containers/tutorials/inference/k8s_rn50_demo.rst",
    "content": ".. _example-deploy-rn50-as-k8s-service:\n\nDeploy a TensorFlow Resnet50 model as a Kubernetes service\n----------------------------------------------------------\n\nThis tutorial uses Resnet50 model as a teaching example on how to deploy an\ninference application using Kubernetes on the Inf1 instances.\n\nPrerequisite:\n^^^^^^^^^^^^^\n\n-  Please follow instructions at :ref:`tutorial-k8s-env-setup-for-neuron` to setup k8s support on your cluster.\n-  Inf1 instances as worker nodes with attached roles allowing:\n\n   -  ECR read access policy to retrieve container images from ECR:\n      **arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly**\n   -  S3 access to retrieve saved_model from within tensorflow serving\n      container.\n\nDeploy a TensorFlow Serving application image\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA trained model must be compiled to an Inferentia target before it can be deployed on Inferentia instances\\.\nTo continue, you will need a Neuron-optimized TensorFlow model saved in Amazon S3\\.\nIf you don’t already have a SavedModel, please follow the tutorial for `creating a Neuron compatible ResNet50 model <https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-tf-neuron.html>`_\nand upload the resulting SavedModel to S3\\.\n\nResNet-50 is a popular machine learning model used for image\nclassification tasks\\. For more information about compiling Neuron models, see\n`The AWS Inferentia Chip With DLAMI <https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia.html>`_\nin the AWS Deep Learning AMI Developer Guide\\.\n\nThe sample deployment manifest manages a pre-built inference serving container for TensorFlow provided by\nAWS Deep Learning Containers. Inside the container is the AWS Neuron Runtime and the TensorFlow Serving application.\nA complete list of pre-built Deep Learning Containers optimized for Neuron is maintained on GitHub under\n`Available Images <https://github.com/aws/deep-learning-containers/blob/master/available_images.md#user-content-neuron-containers>`_.\nAt start\\-up, the DLC will fetch your model from Amazon S3, launch Neuron TensorFlow Serving with the saved model,\nand wait for prediction requests\\.\n\nThe number of Neuron devices allocated to your serving application can be adjusted by changing the\n`aws.amazon.com/neuron` resource in the deployment yaml\\. Please note that communication between TensorFlow Serving\nand the Neuron runtime happens over GRPC, which requires passing the `IPC_LOCK` capability to the container.\n\n1. Create a file named `rn50_deployment.yaml` with the contents below\\. Update the region\\-code and model path to match your desired settings. The model name is for identification purposes when a client makes a request to the TensorFlow server\\. This example uses a model name to match a sample ResNet50 client script that will be used in a later step for sending prediction requests\\.\n\n.. note::\n   1. Replace the s3 bucket name in model_base_path arg in the file with the location of the where the saved model was stored in s3.\n   2. 
In the image:  add the appropriate location of the DLC tensorflow image\n\n\n::\n\n   kind: Deployment\n   apiVersion: apps/v1\n   metadata:\n     name: k8s-neuron-test\n     labels:\n       app: k8s-neuron-test\n       role: master\n   spec:\n     replicas: 2\n     selector:\n       matchLabels:\n         app: k8s-neuron-test\n         role: master\n     template:\n       metadata:\n         labels:\n           app: k8s-neuron-test\n           role: master\n       spec:\n         containers:\n           - name: k8s-neuron-test\n             image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04\n             command:\n               - /usr/local/bin/entrypoint.sh\n             args:\n               - --port=8500\n               - --rest_api_port=9000\n               - --model_name=resnet50_neuron\n               - --model_base_path=s3://${your-bucket-of-models}/resnet50_neuron/\n             ports:\n               - containerPort: 8500\n               - containerPort: 9000\n             imagePullPolicy: IfNotPresent\n             env:\n               - name: AWS_REGION\n                 value: \"us-east-1\"\n               - name: S3_USE_HTTPS\n                 value: \"1\"\n               - name: S3_VERIFY_SSL\n                 value: \"0\"\n               - name: S3_ENDPOINT\n                 value: s3.us-east-1.amazonaws.com\n               - name: AWS_LOG_LEVEL\n                 value: \"3\"\n             resources:\n               limits:\n                 cpu: 4\n                 memory: 4Gi\n                 aws.amazon.com/neuron: 1\n               requests:\n                 cpu: \"1\"\n                 memory: 1Gi\n             securityContext:\n               capabilities:\n                 add:\n                   - IPC_LOCK\n\n2. Deploy the model\\.\n\n::\n\n   kubectl apply -f rn50_deployment.yaml\n\n3. Create a file named `rn50_service.yaml` with the following contents\\. The HTTP and gRPC ports are opened for accepting prediction requests\\.\n\n::\n\n   kind: Service\n   apiVersion: v1\n   metadata:\n     name: k8s-neuron-test\n     labels:\n       app: k8s-neuron-test\n   spec:\n     type: ClusterIP\n     ports:\n       - name: http-tf-serving\n         port: 8500\n         targetPort: 8500\n       - name: grpc-tf-serving\n         port: 9000\n         targetPort: 9000\n     selector:\n       app: k8s-neuron-test\n       role: master\n\n\n4. Create a Kubernetes service for your TensorFlow model Serving application\\.\n\n::\n\n   kubectl apply -f rn50_service.yaml\n\nMake predictions against your TensorFlow Serving service\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n1. To test locally, forward the gRPC port to the `k8s-neuron-test` service\\.\n\n::\n\n   kubectl port-forward service/k8s-neuron-test 8500:8500 &\n\n2. Create a Python script called `tensorflow-model-server-infer.py` with the following content. 
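Before running it, you can also forward the TensorFlow Serving REST port (9000 in the sample deployment above) and query the model status endpoint to confirm that the model has loaded; this is an optional extra check and assumes the service and model name shown earlier\\.\n\n::\n\n   kubectl port-forward service/k8s-neuron-test 9000:9000 &\n   curl http://localhost:9000/v1/models/resnet50_neuron\n\n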
The script below runs inference via gRPC, the RPC framework used by TensorFlow Serving\\. Note that the model name in the request must match the `--model_name` passed to the server in the deployment above\\.\n\n::\n\n   import numpy as np\n   import grpc\n   import tensorflow as tf\n   from tensorflow.keras.preprocessing import image\n   from tensorflow.keras.applications.resnet50 import preprocess_input\n   from tensorflow_serving.apis import predict_pb2\n   from tensorflow_serving.apis import prediction_service_pb2_grpc\n   from tensorflow.keras.applications.resnet50 import decode_predictions\n\n   if __name__ == '__main__':\n       channel = grpc.insecure_channel('localhost:8500')\n       stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n       img_file = tf.keras.utils.get_file(\n           \"./kitten_small.jpg\",\n           \"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\")\n       img = image.load_img(img_file, target_size=(224, 224))\n       img_array = preprocess_input(image.img_to_array(img)[None, ...])\n       request = predict_pb2.PredictRequest()\n       request.model_spec.name = 'resnet50_neuron'\n       request.inputs['input'].CopyFrom(\n           tf.make_tensor_proto(img_array, shape=img_array.shape))\n       result = stub.Predict(request)\n       prediction = tf.make_ndarray(result.outputs['output'])\n       print(decode_predictions(prediction))\n\n3. Run the script to submit predictions to your service\\.\n\n::\n\n   python3 tensorflow-model-server-infer.py\n\nYour output should look like the following:\n\n::\n\n   [[(u'n02123045', u'tabby', 0.68817204), (u'n02127052', u'lynx', 0.12701613), (u'n02123159', u'tiger_cat', 0.08736559), (u'n02124075', u'Egyptian_cat', 0.063844085), (u'n02128757', u'snow_leopard', 0.009240591)]]\n"
  },
  {
    "path": "containers/tutorials/inference/tutorial-infer.rst",
    "content": ".. _tutorial-infer:\n\nRun Inference in PyTorch Neuron Container\n==========================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nThis tutorial demonstrates how to run a pytorch DLC on an inferentia instance.\n\nBy the end of this tutorial you will be able to run the inference using the container\n\nYou will use an inf1.2xlarge to test your Docker configuration for Inferentia.\n\nTo find out the available neuron devices on your instance, use the command ``ls /dev/neuron*``.\n\nSetup Environment\n-----------------\n\n1. Launch an Inf1 Instance\n\n2. Set up docker environment according to :ref:`tutorial-docker-env-setup`\n\n3. Clone the `aws-neuron/deep-learning-containers <https://github.com/aws-neuron/deep-learning-containers>`_ GitHub repository and use one of the PyTorch inference Dockerfiles found in the folders of the repo:\n\n.. code:: bash\n\n   git clone https://github.com/aws-neuron/deep-learning-containers.git\n   cd deep-learning-containers/docker/pytorch/inference/2.9.0\n\nFor additional prerequisites and setup requirements, see the `docker build prerequisites <https://github.com/aws-neuron/deep-learning-containers/blob/main/README.md#prerequisites>`_.\n\nThis tutorial requires the `torchserve entrypoint <https://github.com/aws-neuron/deep-learning-containers/blob/main/docker/common/torchserve-neuron.sh>`_ and `torchserve config.properties <https://github.com/aws-neuron/deep-learning-containers/blob/main/docker/common/config.properties>`_ which are copied over to the same parent folder as part of prerequisites.\n\nWith the files in a local directory, build the image with the following command:\n\n.. code:: bash\n\n   docker build . -f Dockerfile.neuronx -t neuron-container:pytorch\n\nRun the following command to start the container\n\n.. code:: bash\n\n   docker run -itd --name pt-cont -p 80:8080 -p 8081:8081 --device=/dev/neuron0 neuron-container:pytorch /usr/local/bin/entrypoint.sh -m 'pytorch-resnet-neuron=https://aws-dlc-sample-models.s3.amazonaws.com/pytorch/Resnet50-neuron.mar' -t /home/model-server/config.properties"
  },
  {
    "path": "containers/tutorials/k8s-default-scheduler.rst",
    "content": "This approach integrates the Neuron Scheduler Extension directly with the Kubernetes default scheduler. This method requires access to modify the default scheduler configuration.\n\n**Prerequisites**\n\nEnsure that the Neuron Device Plugin is running.\n\n**Step 1: Configure kube-scheduler**\n\nEnable the kube-scheduler to use a ConfigMap for scheduler policy. In your ``cluster.yml``, update the spec section with the following:\n\n.. code:: yaml\n\n    spec:\n      kubeScheduler:\n        usePolicyConfigMap: true\n\n**Step 2: Launch the Cluster**\n\nCreate and launch the cluster:\n\n.. code:: bash\n\n    kops create -f cluster.yml\n    kops create secret --name neuron-test-1.k8s.local sshpublickey admin -i ~/.ssh/id_rsa.pub\n    kops update cluster --name neuron-test-1.k8s.local --yes\n\n**Step 3: Install Neuron Scheduler Extension**\n\nInstall the Neuron Scheduler Extension and register it with kube-scheduler:\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \\\n        --set \"scheduler.enabled=true\" \\\n        --set \"scheduler.customScheduler.enabled=false\" \\\n        --set \"scheduler.defaultScheduler.enabled=true\" \\\n        --set \"npd.enabled=false\"\n"
  },
  {
    "path": "containers/tutorials/k8s-multiple-scheduler.rst",
    "content": "This approach deploys a separate scheduler alongside the default Kubernetes scheduler. This is useful in environments where you don't have access to modify the default scheduler configuration, such as Amazon EKS.\n\nIn this setup, a new scheduler (``my-scheduler``) is deployed with the Neuron Scheduler Extension integrated. Pods that need to run Neuron workloads specify this custom scheduler in their configuration.\n\n.. note::\n\n    Amazon EKS does not natively support modifying the default scheduler, so this multiple scheduler approach is required for EKS environments.\n\n**Prerequisites**\n\nEnsure that the Neuron Device Plugin is running.\n\n**Step 1: Install Neuron Scheduler Extension**\n\nInstall the Neuron Scheduler Extension as a custom scheduler:\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \\\n        --set \"scheduler.enabled=true\" \\\n        --set \"npd.enabled=false\"\n\n**Step 2: Verify Installation**\n\nCheck that there are no errors in the ``my-scheduler`` pod logs and that the ``k8s-neuron-scheduler`` pod is bound to a node:\n\n.. code:: bash\n\n    kubectl logs -n kube-system my-scheduler-79bd4cb788-hq2sq\n\n**Expected output:**\n\n.. code:: bash\n\n    I1012 15:30:21.629611       1 scheduler.go:604] \"Successfully bound pod to node\" pod=\"kube-system/k8s-neuron-scheduler-5d9d9d7988-xcpqm\" node=\"ip-192-168-2-25.ec2.internal\" evaluatedNodes=1 feasibleNodes=1\n\n**Step 3: Configure Pods to Use Custom Scheduler**\n\nWhen creating Pods that need to use the Neuron Scheduler Extension, specify ``my-scheduler`` as the scheduler name. Here's a sample Pod specification:\n\n.. code:: yaml\n\n    apiVersion: v1\n    kind: Pod\n    metadata:\n      name: <POD_NAME>\n    spec:\n      restartPolicy: Never\n      schedulerName: my-scheduler\n      containers:\n        - name: <POD_NAME>\n          command: [\"<COMMAND>\"]\n          image: <IMAGE_NAME>\n          resources:\n            limits:\n              cpu: \"4\"\n              memory: 4Gi\n              aws.amazon.com/neuroncore: 9\n            requests:\n              cpu: \"1\"\n              memory: 1Gi\n\n**Step 4: Verify Scheduling**\n\nAfter running a Neuron workload Pod, verify that the Neuron Scheduler successfully processed the filter and bind requests:\n\n.. code:: bash\n\n    kubectl logs -n kube-system k8s-neuron-scheduler-5d9d9d7988-xcpqm\n\n**Expected output for filter request:**\n\n.. code:: bash\n\n    2022/10/12 15:41:16 POD nrt-test-5038 fits in Node:ip-192-168-2-25.ec2.internal\n    2022/10/12 15:41:16 Filtered nodes: [ip-192-168-2-25.ec2.internal]\n    2022/10/12 15:41:16 Failed nodes: map[]\n    2022/10/12 15:41:16 Finished Processing Filter Request...\n\n**Expected output for bind request:**\n\n.. 
code:: bash\n\n    2022/10/12 15:41:16 Executing Bind Request!\n    2022/10/12 15:41:16 Determine if the pod %v is NeuronDevice podnrt-test-5038\n    2022/10/12 15:41:16 Updating POD Annotation with alloc devices!\n    2022/10/12 15:41:16 Return aws.amazon.com/neuroncore\n    2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neuroncore in node: ip-192-168-2-25.ec2.internal is [false false false false false false false false false false false false false false false false]\n    2022/10/12 15:41:16 Allocated ids for POD nrt-test-5038 are: 0,1,2,3,4,5,6,7,8\n    2022/10/12 15:41:16 Try to bind pod nrt-test-5038 in default namespace to node ip-192-168-2-25.ec2.internal with &Binding{ObjectMeta:{nrt-test-5038    8da590b1-30bc-4335-b7e7-fe574f4f5538  0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Target:ObjectReference{Kind:Node,Namespace:,Name:ip-192-168-2-25.ec2.internal,UID:,APIVersion:,ResourceVersion:,FieldPath:,},}\n    2022/10/12 15:41:16 Updating the DevUsageMap since the bind is successful!\n    2022/10/12 15:41:16 Return aws.amazon.com/neuroncore\n    2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neuroncore in node: ip-192-168-2-25.ec2.internal is [false false false false false false false false false false false false false false false false]\n    2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neurondevice in node: ip-192-168-2-25.ec2.internal is [false false false false]\n    2022/10/12 15:41:16 Allocated devices list 0,1,2,3,4,5,6,7,8 for resource aws.amazon.com/neuroncore\n    2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Allocated devices list [2] for other resource aws.amazon.com/neurondevice\n    2022/10/12 15:41:16 Return aws.amazon.com/neuroncore\n    2022/10/12 15:41:16 Succesfully updated the DevUsageMap [true true true true true true true true true false false false false false false false]  and otherDevUsageMap [true true true false] after alloc for node ip-192-168-2-25.ec2.internal\n    2022/10/12 15:41:16 Finished executing Bind Request...\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-device-plugin.rst",
    "content": "The Neuron Device Plugin is a Kubernetes device plugin that exposes Neuron hardware resources to the cluster's scheduler. It discovers available Neuron devices on each node, advertises them as allocatable resources, and manages their lifecycle. When Pods request Neuron resources, the device plugin handles the allocation and ensures exclusive access to the assigned devices. This integration enables Kubernetes to treat Neuron accelerators as first-class schedulable resources, similar to GPUs or other specialized hardware.\n\nThe device plugin registers two resource types with Kubernetes:\n\n* ``aws.amazon.com/neuroncore`` - Used for allocating individual Neuron cores to containers\n* ``aws.amazon.com/neuron`` - Used for allocating entire Neuron devices to containers (all cores belonging to the device)\n\n**Deploy Neuron Device Plugin**\n\n**Prerequisites**\n\nEnsure that all :ref:`prerequisites<k8s-prerequisite>` are satisfied before proceeding.\n\n**Installation**\n\nApply the Neuron Device Plugin as a DaemonSet on the cluster:\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \\\n        --set \"npd.enabled=false\"\n\n**Verify Installation**\n\nVerify that the Neuron Device Plugin is running:\n\n.. code:: bash\n\n    kubectl get ds neuron-device-plugin -n kube-system\n\nExpected output (example with 2 nodes in cluster):\n\n.. code:: bash\n\n    NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE\n    neuron-device-plugin   2         2         2       2            2           <none>          18h\n\n**Verify Allocatable Resources**\n\nVerify that nodes have allocatable Neuron cores:\n\n.. code:: bash\n\n    kubectl get nodes \"-o=custom-columns=NAME:.metadata.name,NeuronCore:.status.allocatable.aws\\.amazon\\.com/neuroncore\"\n\nExpected output:\n\n.. code:: bash\n\n    NAME                                          NeuronCore\n    ip-192-168-65-41.us-west-2.compute.internal   32\n    ip-192-168-87-81.us-west-2.compute.internal   32\n\nVerify that nodes have allocatable Neuron devices:\n\n.. code:: bash\n\n    kubectl get nodes \"-o=custom-columns=NAME:.metadata.name,NeuronDevice:.status.allocatable.aws\\.amazon\\.com/neuron\"\n\nExpected output:\n\n.. code:: bash\n\n    NAME                                          NeuronDevice\n    ip-192-168-65-41.us-west-2.compute.internal   16\n    ip-192-168-87-81.us-west-2.compute.internal   16\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-helm-chart.rst",
    "content": ".. _k8s-neuron-helm-chart:\n\nThe Neuron Helm Chart simplifies the deployment and management of Neuron infrastructure components on Kubernetes clusters. It provides a unified installation method for all essential Neuron components, streamlining the setup process and ensuring consistent configuration across your cluster.\n\nComponents Included\n^^^^^^^^^^^^^^^^^^^\n\nThe Neuron Helm Chart includes the following components:\n\n* Neuron Device Plugin\n* Neuron Scheduler Extension\n* :ref:`Neuron Node Problem Detector and Recovery <k8s-neuron-problem-detector-and-recovery>`\n* Neuron DRA (Dynamic Resource Allocation) Driver. Refer to :ref:`neuron-dra`.\n\nInstallation\n^^^^^^^^^^^^\n\nTo install the Neuron Helm Chart:\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart\n\nFor detailed information on configuration options, advanced deployment scenarios, and troubleshooting, please refer to the official Neuron Helm Charts repository: https://github.com/aws-neuron/neuron-helm-charts/\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-monitor.rst",
    "content": ".. _k8s-neuron-monitor:\n\nNeuron Monitor is a monitoring solution that collects and exposes metrics from Neuron devices and the Neuron runtime. It provides visibility into hardware utilization, performance counters, memory usage, and device health status. The monitor can export metrics in formats compatible with popular observability platforms like Prometheus, enabling integration with existing monitoring and alerting infrastructure. This allows operators to track Neuron device performance, identify bottlenecks, and troubleshoot issues in production environments.\n\nFor detailed information about Neuron Monitor, see the `Neuron Monitor User Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_.\n\n.. note::\n\n    Neuron Monitor does not currently support environments using the Neuron DRA (Dynamic Resource Allocation) Driver.\n\nDeploy Neuron Monitor DaemonSet\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Step 1: Download the Configuration**\n\nDownload the Neuron Monitor YAML file: :download:`k8s-neuron-monitor-daemonset.yml </src/k8/k8s-neuron-monitor-daemonset.yml>`\n\n**Step 2: Apply the Configuration**\n\nApply the Neuron Monitor YAML to create a DaemonSet on the cluster:\n\n.. code:: bash\n\n    kubectl apply -f k8s-neuron-monitor-daemonset.yml\n\n**Step 3: Verify Installation**\n\nVerify that the Neuron Monitor DaemonSet is running:\n\n.. code:: bash\n\n    kubectl get ds neuron-monitor --namespace neuron-monitor\n\nExpected output (example with 2 nodes in cluster):\n\n.. code:: bash\n\n    NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE\n    neuron-monitor   2         2         2       2            2           <none>          27h\n\n**Step 4: Get Pod Names**\n\nRetrieve the Neuron Monitor pod names:\n\n.. code:: bash\n\n    kubectl get pods --namespace neuron-monitor\n\nExpected output:\n\n.. code:: bash\n\n    NAME                   READY   STATUS    RESTARTS   AGE\n    neuron-monitor-slsxf   1/1     Running   0          17m\n    neuron-monitor-wc4f5   1/1     Running   0          17m\n\n**Step 5: Verify Prometheus Endpoint**\n\nVerify that the Prometheus metrics endpoint is available:\n\n.. code:: bash\n\n    kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- wget -q --output-document - http://127.0.0.1:8000\n\nExpected output (sample metrics):\n\n.. code:: bash\n\n    # HELP python_gc_objects_collected_total Objects collected during gc\n    # TYPE python_gc_objects_collected_total counter\n    python_gc_objects_collected_total{generation=\"0\"} 362.0\n    python_gc_objects_collected_total{generation=\"1\"} 0.0\n    python_gc_objects_collected_total{generation=\"2\"} 0.0\n    # HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC\n    # TYPE python_gc_objects_uncollectable_total counter\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.rst",
    "content": ".. _k8s-neuron-problem-detector-and-recovery-irsa:\n\nPermissions for Neuron Node Problem Detector and Recovery\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron Node Problem Detector and Recovery requires IAM roles for service accounts (IRSA) for authorization. For more information, see `IAM roles for service accounts <https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html>`__ in the Amazon EKS User Guide.\n\nThis section shows how to configure an IAM role for service accounts using the ``eksctl`` command-line tool.\n\n**Step 1: Install eksctl**\n\nInstall the ``eksctl`` CLI using the instructions at https://eksctl.io/installation/.\n\n**Step 2: Create IAM Policy**\n\nCreate an IAM policy that grants the necessary permissions for the Neuron Node Problem Detector.\n\n.. code:: json\n\n    {\n        \"Version\": \"2012-10-17\",\n        \"Statement\": [\n            {\n                \"Action\": [\n                    \"autoscaling:SetInstanceHealth\",\n                    \"autoscaling:DescribeAutoScalingInstances\"\n                ],\n                \"Effect\": \"Allow\",\n                \"Resource\": \"<arn of the Auto Scaling group corresponding to the Neuron nodes for the cluster>\"\n            },\n            {\n                \"Action\": [\n                    \"ec2:DescribeInstances\"\n                ],\n                \"Effect\": \"Allow\",\n                \"Resource\": \"*\",\n                \"Condition\": {\n                    \"ForAllValues:StringEquals\": {\n                        \"ec2:ResourceTag/aws:autoscaling:groupName\": \"<name of the Auto Scaling group corresponding to the Neuron nodes for the cluster>\"\n                    }\n                }\n            },\n            {\n                \"Action\": [\n                    \"cloudwatch:PutMetricData\"\n                ],\n                \"Effect\": \"Allow\",\n                \"Resource\": \"*\",\n                \"Condition\": {\n                    \"StringEquals\": {\n                        \"cloudwatch:Namespace\": \"NeuronHealthCheck\"\n                    }\n                }\n            }\n        ]\n    }\n\nSave the policy template above to a file named ``npd-policy.json`` (replacing the placeholder values), then run:\n\n.. code:: bash\n\n    aws iam create-policy \\\n        --policy-name NeuronProblemDetectorPolicy \\\n        --policy-document file://npd-policy.json\n\n**Step 3: Create Namespace and Service Account**\n\nCreate a dedicated namespace for the Neuron Node Problem Detector:\n\n.. code:: bash\n\n    kubectl create ns neuron-healthcheck-system\n\n**Step 4: Associate IAM Role with Service Account**\n\nUse the following script to create the service account and associate it with the IAM role:\n\n.. code:: bash\n\n    #!/bin/bash\n    CLUSTER_NAME=<eks cluster name>\n    REGION_CODE=$(aws configure get region)\n    POLICY_ARN=<policy arn for NeuronProblemDetectorPolicy>\n\n    eksctl create iamserviceaccount \\\n        --name node-problem-detector \\\n        --namespace neuron-healthcheck-system \\\n        --cluster $CLUSTER_NAME \\\n        --attach-policy-arn $POLICY_ARN \\\n        --approve \\\n        --role-name neuron-problem-detector-role-$CLUSTER_NAME \\\n        --region $REGION_CODE \\\n        --override-existing-serviceaccounts\n\n**Step 5: Verify Service Account Configuration**\n\nVerify that the service account is annotated correctly with the IAM role:\n\n.. 
code:: bash\n\n    kubectl describe sa node-problem-detector -n neuron-healthcheck-system\n\nExpected output:\n\n.. code:: bash\n\n    Name:                node-problem-detector\n    Namespace:           neuron-healthcheck-system\n    Labels:              app.kubernetes.io/managed-by=eksctl\n    Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/neuron-problem-detector-role-cluster1\n    Image pull secrets:  <none>\n    Mountable secrets:   <none>\n    Tokens:              <none>\n    Events:              <none>\n\n**Cleanup**\n\nTo remove the service account and associated IAM role, use the following command:\n\n.. code:: bash\n\n    #!/bin/bash\n    CLUSTER_NAME=<eks cluster name>\n    REGION_CODE=$(aws configure get region)\n\n    eksctl delete iamserviceaccount \\\n        --name node-problem-detector \\\n        --namespace neuron-healthcheck-system \\\n        --cluster $CLUSTER_NAME \\\n        --approve \\\n        --region $REGION_CODE\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-problem-detector-and-recovery.rst",
    "content": ".. _k8s-neuron-problem-detector-and-recovery:\n\nDeploy Neuron Node Problem Detector and Recovery\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron Node Problem Detector and Recovery is a critical resiliency component that continuously monitors the health of Neuron devices on each Kubernetes node by detecting hardware and software errors such as device failures, driver problems, and runtime errors. It integrates with the Kubernetes Node Problem Detector framework to report Neuron-specific conditions. When unrecoverable issues are detected, it can automatically remediate problems by marking nodes as unhealthy and triggering node replacement to prevent workload scheduling on faulty hardware. The component can also publish CloudWatch metrics under the ``NeuronHealthCheck`` namespace for monitoring and alerting purposes.\n\n**Requirements**\n\nBefore deploying the Neuron Node Problem Detector and Recovery, ensure the following requirements are met:\n\n* **Neuron Driver:** Version 2.15 or later\n* **Neuron Runtime:** SDK 2.18 or later\n* **Prerequisites:** All prerequisites for Kubernetes containers and the Neuron Node Problem Detector must be satisfied\n\n**Installation**\n\nInstall the Neuron Node Problem Detector and Recovery as a DaemonSet using Helm:\n\n.. note::\n\n    The installation pulls the container image from the upstream Node Problem Detector repository at ``registry.k8s.io/node-problem-detector``.\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart\n\n**Enable Node Recovery**\n\nBy default, the Neuron Node Problem Detector runs in **monitor-only mode**. To enable automatic node recovery functionality:\n\n.. code:: bash\n\n    helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \\\n        --set \"npd.nodeRecovery.enabled=true\"\n\n**Verify Installation**\n\nVerify that the Node Problem Detector pods are running:\n\n.. code:: bash\n\n    kubectl get pod -n neuron-healthcheck-system\n\nExpected output (example with 4 nodes in cluster):\n\n.. code:: bash\n\n    NAME                          READY   STATUS    RESTARTS   AGE\n    node-problem-detector-7qcrj   1/1     Running   0          59s\n    node-problem-detector-j45t5   1/1     Running   0          59s\n    node-problem-detector-mr2cl   1/1     Running   0          59s\n    node-problem-detector-vpjtk   1/1     Running   0          59s\n\n**Monitoring and Metrics**\n\nWhen an unrecoverable error occurs, the Neuron Node Problem Detector:\n\n* Publishes metrics to CloudWatch under the ``NeuronHealthCheck`` namespace\n* Updates the node's ``NodeCondition``, which can be viewed using:\n\n  .. code:: bash\n\n      kubectl describe node <node-name>\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-scheduler-flow.rst",
    "content": ".. _k8s-neuron-scheduler-flow:\n\nNeuron Scheduler Extension Flow Diagram\n---------------------------------------\n\n::\n\n\n\n\n                                                                           +----------------------------+\n                                                                           | POD Manifest               |\n                                                                           | with Request               |\n                                                                           | aws.amazon.com/neuroncore:2|\n                                                                           |                            |\n                                                                           |                            |\n                                                       2                   +-------------+--------------+\n                                            +--------------------------------+           |\n                                            |                                |           |\n                                            |                                |           | 3\n             +------------------------------+-----+                          |           |\n             |           Kubelet in INF1/TRN1 Node|                          |           |\n             |                                    +<-----------+             |           |\n             +-----+---------------------+--------+            |       +-----v-----------v--------------+\n                   |                     ^                     |       |          Kube-Scheduler        |\n                   |                     |                     |       |                                |\n                   |                     |                     |       +--^------+---------------+------+\n                 9 |                  1  |                     |          |      |               |\n                   |                     |                    8|         5|      |4              |\n                   |                     |                     |          |      |               |\n                   |                     |                     |          |      |               |6\n                   v                     |                     |          |      |               |\n             +-----+---------------------+--------+            |       +--+------v---------------v------+\n             |    neuron-device-plugin            |            +-------+       neuron|scheduler|ext     |\n             |    in INF1/TRN1 node               |                    +---------------------+----------+\n             +----+----------------------+--------+                                          |\n                  |                      |                                                   |7\n                  |                      |10                                                 |\n                  |                      |                                                   v\n                11|                      |                                         +---------+-------+\n                  |                      |                                         |POD Manifest:    |\n                  |                      |                                         |Annotation:      |\n                  |                      |                                         |NEURON_CORES:2,3 |\n                  v                      
+---------------------------------------->+                 |\n   --device=/dev/neuron1 --env NEURON_RT_VISIBLE_CORES=2,3                         |                 |\n                                                                                   |                 |\n                                                                                   +-----------------+\n\n   1. neuron-device-plugin returns the list of Neuron cores/devices to kubelet\n   2. Kubelet advertises the Core/Device list to K8s API server (in turn to kube-scheduler)\n   3. POD Request for neuron cores/devices [Kube-Scheduler picks up the POD creation request]\n   4. kube-scheduler calls the neuron-scheduler-extn filter function with list of nodes and POD Specification\n   5. neuron-scheduler-extn scans through the nodes, filters out nodes with non-contiguous\n   cores/devices, and returns the nodes that are capable of supporting the given POD specification\n   6. kube-scheduler calls the neuron-scheduler-extn bind function with pod and node\n   7. neuron-scheduler-extn updates the POD annotation with allocated neuron core/device IDs (contiguous)\n   8. neuron-scheduler-extn sends the bind request to the kubelet of the selected node\n   9. Kubelet calls the Alloc function of the neuron-device-plugin\n   10. neuron-device-plugin queries the POD Annotation for allocated core/device IDs\n   11. neuron-device-plugin exports the devices & visible cores to the container runtime\n"
  },
  {
    "path": "containers/tutorials/k8s-neuron-scheduler.rst",
    "content": "The Neuron Scheduler Extension is a Kubernetes scheduler plugin that provides intelligent, topology-aware scheduling for Neuron workloads. While the device plugin handles basic resource allocation, the scheduler extension optimizes Pod placement by considering Neuron core topology, NeuronCore-to-NeuronCore connectivity, and workload requirements. It ensures efficient utilization of Neuron devices by placing Pods on nodes where the requested Neuron cores are optimally configured. This component is optional and primarily beneficial for workloads that require specific subsets of Neuron devices or cores rather than consuming all available resources on a node.\n\nThe scheduler extension is required for scheduling Pods that request more than one Neuron core or device resource. It finds sets of directly connected devices with minimal communication latency when scheduling containers, ensuring optimal performance for multi-device workloads.\n\nFor a graphical depiction of how the Neuron Scheduler Extension works, see :ref:`k8s-neuron-scheduler-flow`.\n\n**Device Allocation by Instance Type**\n\nThe Neuron Scheduler Extension applies topology-aware scheduling rules based on instance type to ensure consistent and high performance regardless of which cores and devices are assigned to containers.\n\n**Inf1 and Inf2 Instances (Ring Topology)**\n\nDevices are connected through a ring topology with no restrictions on the number of devices requested (as long as it is fewer than the total devices on a node). When N devices are requested, the scheduler finds a node where N contiguous devices are available to minimize communication latency. It will never allocate non-contiguous devices to the same container.\n\nFor example, when a container requests 3 Neuron devices, the scheduler might assign devices 0, 1, 2 if available, but never devices 0, 2, 4 because those devices are not directly connected.\n\nThe figure below shows examples of device sets on an Inf2.48xlarge node that could be assigned to a container requesting 2 devices:\n\n|eks-inf2-device-set|\n\n**Trn1.32xlarge and Trn1n.32xlarge Instances (2D Torus Topology)**\n\nDevices are connected via a 2D torus topology. The scheduler enforces that containers request 1, 4, 8, or all 16 devices. If your container requires a different number of devices (such as 2 or 5), we recommend using an Inf2 instance instead to benefit from more flexible topology support.\n\nIf you request an invalid number of devices (such as 7), your Pod will not be scheduled and you will receive a warning:\n\n``Instance type trn1.32xlarge does not support requests for device: 7. Please request a different number of devices.``\n\nWhen requesting 4 devices, your container will be allocated one of the following device sets if available:\n\n|eks-trn1-device-set4|\n\nWhen requesting 8 devices, your container will be allocated one of the following device sets if available:\n\n|eks-trn1-device-set8|\n\n.. note::\n\n    For all instance types, requesting one or all Neuron cores or devices is always valid.\n\n**Deploy Neuron Scheduler Extension**\n\n.. tab-set::\n\n   .. tab-item:: Multiple Scheduler Approach\n\n      .. include:: /containers/tutorials/k8s-multiple-scheduler.rst\n\n   .. tab-item:: Default Scheduler Approach\n\n      .. include:: /containers/tutorials/k8s-default-scheduler.rst\n\n\n.. |eks-inf2-device-set| image:: /images/eks-inf2-device-set.png\n.. |eks-trn1-device-set4| image:: /images/eks-trn1-device-set4.png\n.. 
|eks-trn1-device-set8| image:: /images/eks-trn1-device-set8.png\n"
  },
  {
    "path": "containers/tutorials/k8s-prerequisite.rst",
    "content": ".. _k8s-prerequisite:\n\n.. meta::\n   :description: Learn how to create an Amazon EKS cluster with AWS Trainium instances (Trn1, Trn2) for machine learning workloads using AWS Neuron SDK. Step-by-step guide with eksctl and CloudFormation templates.\n   :keywords: EKS, Kubernetes, Trainium, Trn1, Trn2, Neuron, AWS, machine learning, distributed training, eksctl, CloudFormation, EFA, node group\n\nBefore setting up Neuron components on your EKS cluster, you must create an EKS cluster and add Neuron-enabled nodes. This section guides you through creating an Amazon Elastic Kubernetes Service (EKS) cluster with AWS Trainium-enabled nodes (Trn1 or Trn2 instances) using CloudFormation templates and the eksctl command-line tool. You'll configure optimized networking with Elastic Fabric Adapter (EFA) support and pre-configured Neuron components for distributed training and inference workloads.\n\nFor detailed information, refer to:\n\n* `EKS Cluster Creation Guide <https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html>`_\n* `EKS Compute Resources Guide <https://docs.aws.amazon.com/eks/latest/userguide/eks-compute.html>`_\n* `eksctl Getting Started <https://eksctl.io/getting-started/>`_\n\n**Step 1: Download Node Group Template**\n\nDownload the node group CloudFormation template for your instance type.\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. code-block:: bash\n\n         wget https://raw.githubusercontent.com/aws-neuron/aws-neuron-eks-samples/master/dp_bert_hf_pretrain/cfn/eks_trn1_ng_stack.yaml\n\n   .. tab-item:: Trn2\n\n      .. code-block:: bash\n\n         wget https://raw.githubusercontent.com/aws-neuron/aws-neuron-eks-samples/master/dp_bert_hf_pretrain/cfn/eks_trn2_ng_stack_al2023.yaml\n\n**Important template configuration information**\n\n* **Placement Group:** Optimizes network speed between nodes\n* **EFA Driver:** Installed automatically (ensure ``libfabric`` version matches between AMI and workload containers)\n* **AMI:** Uses `EKS optimized accelerated AMI <https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami>`_ with Neuron components pre-installed\n* **Instance Type:** Configured for trn1.32xlarge or trn2.48xlarge (update to your desired instance type)\n* **Kubernetes Version:** Trn1 templates use Kubernetes 1.25+, Trn2 templates use Kubernetes 1.34+ (update as needed)\n\nTrn2 LNC configuration (Optional):\n\nTrn2 instances use a default Logical NeuronCore Configuration (LNC) of ``2``. To change it to ``1``, update the ``UserData`` section of the launch template:\n\n.. code-block:: bash\n\n    --==BOUNDARY==\n    Content-Type: text/x-shellscript; charset=\"us-ascii\"\n\n    #!/bin/bash\n    set -ex\n    config_dir=/opt/aws/neuron\n    config_file=${config_dir}/logical_nc_config\n    [ -d \"$config_dir\" ] || mkdir -p \"$config_dir\"\n    [ -f \"$config_file\" ] || touch \"$config_file\"\n    if ! grep -q \"^NEURON_LOGICAL_NC_CONFIG=1$\" \"$config_file\" 2>/dev/null; then\n        printf \"NEURON_LOGICAL_NC_CONFIG=1\" >> \"$config_file\"\n    fi\n    --==BOUNDARY==--\n\n**Step 2: Create Cluster Parameter Script**\n\nCreate a bash script to capture the parameters needed for the node template:\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. 
code-block:: bash\n\n        #!/bin/bash\n\n        CLUSTER_NAME=$1\n        CLUSTER_SG=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].ResourcesVpcConfig.ClusterSecurityGroupId\")\n        VPC_ID=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].ResourcesVpcConfig.VpcId\")\n\n        cat <<EOF > cfn_params.json\n        [\n            {\n                \"ParameterKey\": \"ClusterName\",\n                \"ParameterValue\": \"$CLUSTER_NAME\"\n            },\n            {\n                \"ParameterKey\": \"ClusterControlPlaneSecurityGroup\",\n                \"ParameterValue\": \"$CLUSTER_SG\"\n            },\n            {\n                \"ParameterKey\": \"VpcId\",\n                \"ParameterValue\": \"$VPC_ID\"\n            }\n        ]\n        EOF\n\n   .. tab-item:: Trn2\n\n      .. code-block:: bash\n\n          #!/bin/bash\n\n          CLUSTER_NAME=$1\n          CLUSTER_SG=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].ResourcesVpcConfig.ClusterSecurityGroupId\")\n          VPC_ID=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].ResourcesVpcConfig.VpcId\")\n          CLUSTER_ENDPOINT=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].Endpoint\")\n          CLUSTER_SERVICE_CIDR=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].KubernetesNetworkConfig.ServiceIpv4Cidr\")\n          CLUSTER_CA=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r \".[0].CertificateAuthority.Data\")\n\n          cat <<EOF > cfn_params.json\n          [\n              {\n                  \"ParameterKey\": \"ClusterName\",\n                  \"ParameterValue\": \"$CLUSTER_NAME\"\n              },\n              {\n                  \"ParameterKey\": \"ClusterControlPlaneSecurityGroup\",\n                  \"ParameterValue\": \"$CLUSTER_SG\"\n              },\n              {\n                  \"ParameterKey\": \"VpcId\",\n                  \"ParameterValue\": \"$VPC_ID\"\n              },\n              {\n                  \"ParameterKey\": \"ClusterEndpoint\",\n                  \"ParameterValue\": \"$CLUSTER_ENDPOINT\"\n              },\n              {\n                  \"ParameterKey\": \"ClusterServiceCidr\",\n                  \"ParameterValue\": \"$CLUSTER_SERVICE_CIDR\"\n              },\n              {\n                  \"ParameterKey\": \"ClusterCertificateAuthority\",\n                  \"ParameterValue\": \"$CLUSTER_CA\"\n              }\n          ]\n          EOF\n\n\nThis script captures the cluster name, security group for control plane connectivity, and VPC ID.\n\n**Step 3: Create CloudFormation Stack**\n\nCreate the CloudFormation stack for the node group.\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. code-block:: bash\n\n         aws cloudformation create-stack \\\n             --stack-name eks-trn1-ng-stack \\\n             --template-body file://eks_trn1_ng_stack.yaml \\\n             --parameters file://cfn_params.json \\\n             --capabilities CAPABILITY_IAM\n\n   .. tab-item:: Trn2\n\n      .. code-block:: bash\n\n         aws cloudformation create-stack \\\n             --stack-name eks-trn2-ng-stack \\\n             --template-body file://eks_trn2_ng_stack_al2023.yaml \\\n             --parameters file://cfn_params.json \\\n             --capabilities CAPABILITY_IAM\n\nWait for the stack creation to complete before proceeding. You can monitor the progress in the AWS CloudFormation console.\n\n**Step 4: Determine Availability Zones**\n\nIdentify the availability zones for your cluster:\n\n.. 
code-block:: bash\n\n    aws ec2 describe-availability-zones \\\n        --region $REGION_CODE \\\n        --query \"AvailabilityZones[]\" \\\n        --filters \"Name=zone-id,Values=$1\" \\\n        --query \"AvailabilityZones[].ZoneName\" \\\n        --output text\n\n**Step 5: Generate Node Group Configuration**\n\nCreate a script named ``create_ng_yaml.sh`` to generate the node group YAML configuration. The script requires: region, availability zones, cluster name, and CloudFormation stack name.\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. code-block:: bash\n\n         #!/bin/bash\n\n         REGION_CODE=$1\n         EKSAZ1=$2\n         EKSAZ2=$3\n         CLUSTER_NAME=$4\n         STACKNAME=$5\n\n         LT_ID_TRN1=$(aws cloudformation describe-stacks --stack-name $STACKNAME \\\n                 --query \"Stacks[0].Outputs[?OutputKey=='LaunchTemplateIdTrn1'].OutputValue\" \\\n                 --output text)\n\n         cat <<EOF > trn1_nodegroup.yaml\n         apiVersion: eksctl.io/v1alpha5\n         kind: ClusterConfig\n\n         metadata:\n           name: $CLUSTER_NAME\n           region: $REGION_CODE\n           version: \"1.28\"\n\n         iam:\n           withOIDC: true\n\n         availabilityZones: [\"$EKSAZ1\",\"$EKSAZ2\"]\n\n         managedNodeGroups:\n           - name: trn1-32xl-ng1\n             launchTemplate:\n               id: $LT_ID_TRN1\n             minSize: 1\n             desiredCapacity: 1\n             maxSize: 1\n             availabilityZones: [\"$EKSAZ1\"]\n             privateNetworking: true\n             efaEnabled: true\n         EOF\n\n   .. tab-item:: Trn2\n\n      .. code-block:: bash\n\n         #!/bin/bash\n\n         REGION_CODE=$1\n         EKSAZ1=$2\n         EKSAZ2=$3\n         CLUSTER_NAME=$4\n         STACKNAME=$5\n\n         LT_ID_TRN2=$(aws cloudformation describe-stacks --stack-name $STACKNAME \\\n                 --query \"Stacks[0].Outputs[?OutputKey=='LaunchTemplateIdTrn2'].OutputValue\" \\\n                 --output text)\n\n         cat <<EOF > trn2_nodegroup.yaml\n         apiVersion: eksctl.io/v1alpha5\n         kind: ClusterConfig\n\n         metadata:\n           name: $CLUSTER_NAME\n           region: $REGION_CODE\n           version: \"1.34\"\n\n         iam:\n           withOIDC: true\n\n         availabilityZones: [\"$EKSAZ1\",\"$EKSAZ2\"]\n\n         managedNodeGroups:\n           - name: trn2-48xl-ng1\n             launchTemplate:\n               id: $LT_ID_TRN2\n             minSize: 1\n             desiredCapacity: 1\n             maxSize: 1\n             availabilityZones: [\"$EKSAZ1\"]\n             privateNetworking: true\n             efaEnabled: true\n         EOF\n\nRun the script to generate the configuration file. Update the Kubernetes version as needed for your environment.\n\nExample output:\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. code-block:: yaml\n\n         apiVersion: eksctl.io/v1alpha5\n         kind: ClusterConfig\n\n         metadata:\n           name: nemo2\n           region: us-west-2\n           version: \"1.28\"\n\n         iam:\n           withOIDC: true\n\n         availabilityZones: [\"us-west-2d\",\"us-west-2c\"]\n\n         managedNodeGroups:\n           - name: trn1-32xl-ng1\n             launchTemplate:\n               id: lt-093c222b35ea89009\n             minSize: 1\n             desiredCapacity: 1\n             maxSize: 1\n             availabilityZones: [\"us-west-2d\"]\n             privateNetworking: true\n             efaEnabled: true\n\n   .. 
tab-item:: Trn2\n\n      .. code-block:: yaml\n\n         apiVersion: eksctl.io/v1alpha5\n         kind: ClusterConfig\n\n         metadata:\n           name: nemo2\n           region: us-west-2\n           version: \"1.34\"\n\n         iam:\n           withOIDC: true\n\n         availabilityZones: [\"us-west-2d\",\"us-west-2c\"]\n\n         managedNodeGroups:\n           - name: trn2-48xl-ng1\n             launchTemplate:\n               id: lt-093c222b35ea89010\n             minSize: 1\n             desiredCapacity: 1\n             maxSize: 1\n             availabilityZones: [\"us-west-2d\"]\n             privateNetworking: true\n             efaEnabled: true\n\n**Step 6: Create Node Group**\n\nCreate the node group using the generated configuration.\n\n.. tab-set::\n\n   .. tab-item:: Trn1\n\n      .. code-block:: bash\n\n         eksctl create nodegroup -f trn1_nodegroup.yaml\n\n   .. tab-item:: Trn2\n\n      .. code-block:: bash\n\n         eksctl create nodegroup -f trn2_nodegroup.yaml\n\nWait for the nodes to reach the ``Ready`` state. Verify using:\n\n.. code-block:: bash\n\n    kubectl get nodes\n\n**Step 7: Install EFA Device Plugin (Optional)**\n\nIf you plan to run distributed training or inference jobs, install the EFA device plugin following the instructions at the `EFA device plugin repository <https://github.com/aws-samples/aws-efa-eks>`_.\n"
  },
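The node-group generation steps above can be tied together as a short run-through. This is only a usage sketch: the region, availability zones, cluster name, and stack name mirror the example values shown elsewhere in this tutorial (``nemo2``, ``us-west-2``, the Trn1 stack from Step 3) and should be replaced with your own.

.. code-block:: bash

   # Generate the node group config with the script from Step 5
   # (arguments: region, two availability zones, cluster name, CloudFormation stack name)
   bash create_ng_yaml.sh us-west-2 us-west-2d us-west-2c nemo2 eks-trn1-ng-stack

   # Create the node group from the generated file (Step 6)
   eksctl create nodegroup -f trn1_nodegroup.yaml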
  {
    "path": "containers/tutorials/k8s-setup.rst",
    "content": ".. _tutorial-k8s-env-setup-for-neuron-to-remove:\n\nKubernetes environment setup for Neuron\n=======================================\n\nIntroduction\n------------\n\nCustomers that use Kubernetes can conveniently integrate Inf1/Trn1 instances into their workflows. This tutorial will go through deploying the neuron device plugin daemonset and also how to allocate neuron cores or devices to application pods.\n\n.. dropdown:: Prerequisite\n      :class-title: sphinx-design-class-title-small\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n\n      .. include:: /containers/tutorials/k8s-prerequisite.rst\n\n.. dropdown:: Deploy Neuron Device Plugin\n      :class-title: sphinx-design-class-title-small\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n\n      .. include:: /containers/tutorials/k8s-neuron-device-plugin.rst\n\n.. dropdown:: Deploy Neuron Scheduler Extension\n      :class-title: sphinx-design-class-title-small\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n\n      .. include:: /containers/tutorials/k8s-neuron-scheduler.rst\n"
  },
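As a small illustration of the allocation mentioned in the introduction above (a sketch only; the device plugin must already be deployed, and the per-core resource name ``aws.amazon.com/neuroncore`` is assumed to be available alongside the per-device ``aws.amazon.com/neuron`` resource), a pod requests Neuron hardware through its resource limits:

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: neuron-app
   spec:
     containers:
       - name: neuron-app
         image: <your-neuron-image>   # placeholder image
         resources:
           limits:
             aws.amazon.com/neuron: 1        # request one Neuron device
             # or allocate at core granularity instead:
             # aws.amazon.com/neuroncore: 2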
  {
    "path": "containers/tutorials/training/index.rst",
    "content": "Containers -- Training Tutorials\n=================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /containers/tutorials/training/tutorial-training\n    /containers/tutorials/training/k8s_mlp_train_demo\n\n\n.. include:: /containers/tutorials/training/index.txt\n"
  },
  {
    "path": "containers/tutorials/training/index.txt",
    "content": "* :ref:`tutorial-training`\n* :ref:`example-deploy-mlp-train-pod`\n"
  },
  {
    "path": "containers/tutorials/training/k8s_mlp_train_demo.rst",
    "content": ".. _example-deploy-mlp-train-pod:\n\nDeploy a simple mlp training script as a Kubernetes job\n----------------------------------------------------------\n\nThis tutorial uses mlp train as a teaching example on how to deploy an\ntraining application using Kubernetes on the Trn1 instances. For more advanced example, please refer to `Tutorial: Launch a Multi-Node PyTorch Neuron Training Job on Trainium Using TorchX and EKS <https://github.com/aws-neuron/aws-neuron-eks-samples/tree/master/dp_bert_hf_pretrain>`__\n\nPrerequisite:\n^^^^^^^^^^^^^\n\n-  :ref:`tutorial-k8s-env-setup-for-neuron`: to setup k8s support on your cluster.\n-  Trn1 instances as worker nodes with attached roles allowing:\n\n   -  ECR read access policy to retrieve container images from ECR:\n      **arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly**\n- Have a container image that is build using :ref:`tutorial-training`\n\nDeploy a mlp training image\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n1. Create a file named `mlp_train.yaml` with the contents below\\. \n\n.. note::\n   In the image:  add the appropriate location of the image\n\n\n::\n\n  apiVersion: v1\n  kind: Pod\n  metadata:\n    name: trn1-mlp\n  spec:\n    restartPolicy: Never\n    schedulerName: default-scheduler\n    hostNetwork: true\n    nodeSelector:\n      beta.kubernetes.io/instance-type: trn1.32xlarge\n      beta.kubernetes.io/instance-type: trn1.2xlarge\n    containers:\n      - name: trn1-mlp\n        command: [\"/usr/local/bin/python3\"]\n        args:  [\"/opt/ml/mlp_train.py\"]\n        image: 647554078242.dkr.ecr.us-east-1.amazonaws.com/sunda-pt:k8s_mlp_0907\n        imagePullPolicy: IfNotPresent\n        env:\n        - name: NEURON_RT_LOG_LEVEL\n          value: \"INFO\"\n        resources:\n          limits: \n            aws.amazon.com/neuron: 2\n          requests:\n            aws.amazon.com/neuron: 2\n\n2. Deploy the pod.\n\n::\n\n   kubectl apply -f mlp_train.yaml\n\n3. Check the logs to make sure training completed\n::\n\n   kubectl logs <pod name>\n\n   Your log should have the following\n\n::\n\n  Final loss is 0.1977\n  ----------End Training ---------------\n"
  },
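To watch the training job interactively (assuming the pod name ``trn1-mlp`` from the manifest above), the usual kubectl commands apply:

.. code-block:: bash

   # Wait for the pod to be scheduled and reach the Running state
   kubectl get pod trn1-mlp -w

   # Stream the training output until the job finishes
   kubectl logs -f trn1-mlp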
  {
    "path": "containers/tutorials/training/tutorial-training.rst",
    "content": ".. _tutorial-training:\n\nRun Training in PyTorch Neuron Container\n========================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nThis tutorial demonstrates how to run a pytorch container on an trainium instance.\n\nBy the end of this tutorial you will be able to run simple mlp training using the container\n\nYou will use an trn1.2xlarge to test your Docker configuration for Trainium.\n\nTo find out the available neuron devices on your instance, use the command ``ls /dev/neuron*``.\n\nSetup Environment\n-----------------\n\n1. Launch an Trn1 Instance\n         .. include:: /setup/install-templates/launch-instance.txt\n\n2. Set up docker environment according to :ref:`tutorial-docker-env-setup`\n\n3. A sample Dockerfile for for torch-neuron can be found here :ref:`trainium-dlc-dockerfile`.\nThis dockerfile needs the mlp train script found here  :ref:`mlp-train`\n\nWith the files in a dir, build the image with the following command:\n\n.. code:: bash\n\n   docker build . -f Dockerfile.pt -t neuron-container:pytorch\n\nRun the following command to start the container\n\n.. code:: bash\n\n   docker run -it --name pt-cont --net=host --device=/dev/neuron0 neuron-container:pytorch python3 /opt/ml/mlp_train.py"
  },
  {
    "path": "containers/tutorials/tutorial-docker-env-setup.rst",
    "content": ".. _tutorial-docker-env-setup:\n\nTutorial Docker environment setup\n=================================\n\nIntroduction\n------------\n\nA Neuron application can be deployed using docker containers. This\ntutorial describes how to configure docker on Amazon Linux 2023 to expose Inferentia/Trainium devices\nto containers.\n\n\n.. tab-set::\n\n   .. tab-item:: Training\n\n        .. dropdown:: Install Drivers\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               # Configure Linux for Neuron repository updates\n\n               sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n               [neuron]\n               name=Neuron YUM Repository\n               baseurl=https://yum.repos.neuron.amazonaws.com\n               enabled=1\n               metadata_expire=0\n               EOF\n               sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n               # Update OS packages\n               sudo dnf update -y\n\n\n               # Install OS headers\n               sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n               # Remove preinstalled packages and Install Neuron Driver and Runtime\n               sudo dnf remove aws-neuron-dkms -y\n               sudo dnf remove aws-neuronx-dkms -y\n               sudo dnf install aws-neuronx-dkms-2.*  -y\n\n               # Install EFA Driver(only required for multi-instance training)\n               curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n               wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n               cat aws-efa-installer.key | gpg --fingerprint\n               wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n               tar -xvf aws-efa-installer-latest.tar.gz\n               cd aws-efa-installer && sudo bash efa_installer.sh --yes\n               cd\n               sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n        .. dropdown:: Install Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               sudo dnf install -y docker.io\n               sudo usermod -aG docker $USER\n\n            Logout and log back in to refresh membership.\n\n        .. dropdown:: Verify Docker\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            .. code:: bash\n\n               docker run hello-world\n\n            Expected result:\n\n            ::\n\n               Hello from Docker!\n               This message shows that your installation appears to be working correctly.\n\n               To generate this message, Docker took the following steps:\n               1. The Docker client contacted the Docker daemon.\n               2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n               (amd64)\n               3. The Docker daemon created a new container from that image which runs the\n               executable that produces the output you are currently reading.\n               4. 
The Docker daemon streamed that output to the Docker client, which sent it\n               to your terminal.\n\n               To try something more ambitious, you can run an Ubuntu container with:\n               $ docker run -it ubuntu bash\n\n               Share images, automate workflows, and more with a free Docker ID:\n               https://hub.docker.com/\n\n               For more examples and ideas, visit:\n               https://docs.docker.com/get-started/\n\n        .. dropdown:: Verify Neuron Component\n            :class-title: sphinx-design-class-title-small\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            Once the environment is setup, a container can be started with\n            --device=/dev/neuron# to specify desired set of Inferentia/Trainium devices to be\n            exposed to the container. To find out the available neuron devices on\n            your instance, use the command ``ls /dev/neuron*``.\n\n            When running neuron-ls inside a container, you will only see the set of\n            exposed Trainiums. For example:\n\n            .. code:: bash\n\n               docker run --device=/dev/neuron0 neuron-test neuron-ls\n\n            Would produce the following output in trn1.32xlarge:\n\n            ::\n\n               +--------+--------+--------+---------+\n               | NEURON | NEURON | NEURON |   PCI   |\n               | DEVICE | CORES  | MEMORY |   BDF   |\n               +--------+--------+--------+---------+\n               | 0      | 2      | 32 GB  | 10:1c.0 |\n               +--------+--------+--------+---------+\n\n   .. tab-item:: Inference\n\n      .. dropdown:: Install Drivers\n         :class-title: sphinx-design-class-title-small\n         :class-body: sphinx-design-class-body-small\n         :animate: fade-in\n\n         .. code:: bash\n\n            # Configure Linux for Neuron repository updates\n            sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n            [neuron]\n            name=Neuron YUM Repository\n            baseurl=https://yum.repos.neuron.amazonaws.com\n            enabled=1\n            metadata_expire=0\n            EOF\n            sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n            # Update OS packages\n            sudo dnf update -y\n\n            ################################################################################################################\n            # To install or update to Neuron versions 1.19.1 and newer from previous releases:\n            # - DO NOT skip 'aws-neuron-dkms' install or upgrade step, you MUST install or upgrade to latest Neuron driver\n            ################################################################################################################\n\n            # Install OS headers\n            sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n            # Install Neuron Driver\n            sudo dnf install aws-neuron-dkms -y\n\n            ####################################################################################\n            # Warning: If Linux kernel is updated as a result of OS package update\n            #          Neuron driver (aws-neuron-dkms) should be re-installed after reboot\n            ####################################################################################\n\n      .. 
dropdown:: Install Docker\n         :class-title: sphinx-design-class-title-small\n         :class-body: sphinx-design-class-body-small\n         :animate: fade-in\n\n         .. code:: bash\n\n            sudo dnf install -y docker.io\n            sudo usermod -aG docker $USER\n\n         Logout and log back in to refresh membership.\n\n      .. dropdown:: Verify Docker\n         :class-title: sphinx-design-class-title-small\n         :class-body: sphinx-design-class-body-small\n         :animate: fade-in\n\n         .. code:: bash\n\n            docker run hello-world\n\n         Expected result:\n\n         ::\n\n            Hello from Docker!\n            This message shows that your installation appears to be working correctly.\n\n            To generate this message, Docker took the following steps:\n            1. The Docker client contacted the Docker daemon.\n            2. The Docker daemon pulled the \"hello-world\" image from the Docker Hub.\n            (amd64)\n            3. The Docker daemon created a new container from that image which runs the\n            executable that produces the output you are currently reading.\n            4. The Docker daemon streamed that output to the Docker client, which sent it\n            to your terminal.\n\n            To try something more ambitious, you can run an Ubuntu container with:\n            $ docker run -it ubuntu bash\n\n            Share images, automate workflows, and more with a free Docker ID:\n            https://hub.docker.com/\n\n            For more examples and ideas, visit:\n            https://docs.docker.com/get-started/\n\n\n      .. dropdown:: Verify Neuron Component\n         :class-title: sphinx-design-class-title-small\n         :class-body: sphinx-design-class-body-small\n         :animate: fade-in\n\n         Once the environment is setup, a container can be started with\n         --device=/dev/neuron# to specify desired set of Inferentia/Trainium devices to be\n         exposed to the container. To find out the available neuron devices on\n         your instance, use the command ``ls /dev/neuron*``.\n\n         When running neuron-ls inside a container, you will only see the set of\n         exposed Inferentias. For example:\n\n         .. code:: bash\n\n            docker run --device=/dev/neuron0 neuron-test neuron-ls\n\n         Would produce the following output in inf1.xlarge:\n\n         ::\n\n            +--------------+---------+--------+-----------+-----------+------+------+\n            |   PCI BDF    | LOGICAL | NEURON |  MEMORY   |  MEMORY   | EAST | WEST |\n            |              |   ID    | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |\n            +--------------+---------+--------+-----------+-----------+------+------+\n            | 0000:00:1f.0 |       0 |      4 | 4096 MB   | 4096 MB   |    0 |    0 |\n            +--------------+---------+--------+-----------+-----------+------+------+\n\n"
  },
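Multiple devices can be exposed by repeating the ``--device`` flag, one flag per Neuron device (a sketch, reusing the ``neuron-test`` image name from the examples above):

.. code-block:: bash

   # Expose the first two Neuron devices to the container
   docker run --device=/dev/neuron0 --device=/dev/neuron1 neuron-test neuron-ls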
  {
    "path": "containers/tutorials/tutorial-oci-hook.rst",
    "content": ".. _tutorial-oci-hook:\n\nTutorial Docker Neuron OCI Hook Setup\n=====================================\n\nIntroduction\n------------\n\nA Neuron application can be deployed using docker containers. Neuron devices\nare exposed to the containers using the --device option in the docker run command.\nDocker runtime (runc) does not yet support the ALL option to expose all neuron\ndevices to the container. In order to do that an environment variable,\n“AWS_NEURON_VISIBLE_DEVICES=ALL\" can be used.\n\nFor the above environment variable to be used, the oci neuron hook has to be\ninstalled/configured.\n\n.. important::\n\n    The Neuron OCI Hook is currently NOT supported with AL2023 on ECS. For workarounds,\n    see the :ref:`oci-hook-workarounds` section below.\n\nInstall oci-add-hooks dependency on the Linux host\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. important::\n\n    This step should run on the Linux host and not inside the container.\n\n\n`oci-add-hooks <https://github.com/awslabs/oci-add-hooks>`__ is an OCI\nruntime with the sole purpose of injecting OCI prestart, poststart, and\npoststop hooks into a container config.json before passing along to an\nOCI compatable runtime. oci-add-hooks is used to inject a hook that\nexposes Inferentia devices to the container.\n\n.. code:: bash\n\n    sudo apt install -y golang && \\\n        export GOPATH=$HOME/go && \\\n        go get github.com/joeshaw/json-lossless && \\\n        cd /tmp/ && \\\n        git clone https://github.com/awslabs/oci-add-hooks && \\\n        cd /tmp/oci-add-hooks && \\\n        make build && \\\n        sudo cp /tmp/oci-add-hooks/oci-add-hooks /usr/local/bin/\n\nInstall the package that has oci hook software\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. important::\n\n    This step should run on the Linux host and not inside the container.\n\nFor Inf1 install the following package\n\n.. code:: bash\n\n    sudo apt-get install aws-neuron-runtime-base -y\n\nFor Trn1 install the following package\n\n.. code:: bash\n\n    sudo apt-get install aws-neuronx-oci-hook -y\n\nFor docker runtime setup Docker to use oci-neuron OCI runtime.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\noci-neuron is a script representing OCI compatible runtime. It wraps\noci-add-hooks, which wraps runc. In this step, we configure docker to\npoint at oci-neuron OCI runtime. Install dockerIO:\n\n.. code:: bash\n\n    sudo cp /opt/aws/neuron/share/docker-daemon.json /etc/docker/daemon.json\n    sudo service docker restart\n\nIf the docker restart command fails, make sure to check if the docker\nsystemd service is not masked. More information on this can be found\nhere: https://stackoverflow.com/a/37640824\n\nFor containerd runtime, setup containerd to use oci-neuron OCI runtime.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUpdate the following fields in the /etc/containerd/config.toml to configure\ncontainerd to use the neuron oci hook\n\n.. code:: bash\n\n    default_runtime_name = \"neuron\"\n    [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.neuron]\n        [plugins.\"io.containerd.grpc.v1.cri\".containerd.runtimes.neuron.options]\n            BinaryName = \"/opt/aws/neuron/bin/oci_neuron_hook_wrapper.sh\"\n\n\nAfter that restart the containerd daemon\n\n.. 
code:: bash\n\n    sudo systemctl restart containerd\n\nFor cri-o runtime, setup cri-o to use oci-neuron OCI runtime.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUpdate the following fields in the /etc/crio/crio.conf to configure\ncri-o to use the neuron oci hook\n\n.. code:: bash\n\n    default_runtime_name = \"neuron\"\n    [crio.runtime.runtimes.neuron]\n    runtime_path = \"/opt/aws/neuron/bin/oci_neuron_hook_wrapper.sh\"\n\nAfter that restart the cri-o daemon\n\n.. code:: bash\n\n    sudo systemctl restart crio\n\n.. _oci-hook-workarounds:\n\nOCI hook workarounds\n^^^^^^^^^^^^^^^^^^^^\n\n**ECS (EC2)**\n\nAdd the following to your ECS task definition:\n\n.. code:: json\n\n    \"linuxParameters\": {\n        \"devices\": [\n            {\n                \"containerPath\": \"/dev/neuron0\",\n                \"hostPath\": \"/dev/neuron0\",\n                \"permissions\": [\n                    \"read\",\n                    \"write\"\n                ]\n            },\n            {\n                \"containerPath\": \"/dev/neuron1\",\n                \"hostPath\": \"/dev/neuron1\",\n                \"permissions\": [\n                    \"read\",\n                    \"write\"\n                ]\n            },\n            ...,\n        ],\n    },\n\nThe linuxParameters parameter can be found under containerDefinitions. More information can be found here:\nhttps://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_linuxparameters.\nExpose as many Neuron devices as needed, up to the max number of devices for the specified instance.\nFor example, the trn1.32xlarge instance type contains 16 Neuron devices, so the devices that can be exposed are\n/dev/neuron0, /dev/neuron1, up to /dev/neuron15.\nTo see an example of an ECS task definition exposing Neuron devices,\nsee https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-inference-task-def.html.\n"
  },
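Once the oci-neuron runtime has been made Docker's default runtime as described above, the environment variable can take the place of per-device flags. This is a sketch only: the ``neuron-test`` image name is a placeholder, and the variable is assumed to be passed as an ordinary container environment variable with ``-e``.

.. code-block:: bash

   # Expose all Neuron devices on the host to the container via the OCI hook
   docker run -e AWS_NEURON_VISIBLE_DEVICES=ALL neuron-test neuron-ls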
  {
    "path": "containers/tutorials.rst",
    "content": ".. meta::\n   :description: Comprehensive tutorials for deploying AWS Neuron SDK in containers with Docker and Kubernetes. Learn to build Neuron containers, configure EKS clusters, deploy device plugins, and set up monitoring for Trainium and Inferentia instances.\n   :keywords: Neuron containers, Docker, Kubernetes, EKS, Trainium, Inferentia, device plugin, scheduler, monitoring, tutorials, AWS, machine learning\n\nContainers - Tutorials\n=======================\n\nLearn how to deploy and manage AWS Neuron workloads in containerized environments. These tutorials cover everything from building Docker containers with Neuron support to deploying production-ready Kubernetes clusters with device plugins, schedulers, and monitoring solutions. Whether you're running inference or training workloads on AWS Trainium or Inferentia instances, these step-by-step guides will help you configure your container infrastructure for optimal performance and reliability.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    Inference </containers/tutorials/inference/index>\n    Training </containers/tutorials/training/index>\n    /containers/tutorials/tutorial-docker-env-setup\n    /containers/tutorials/build-run-neuron-container\n    /containers/tutorials/tutorial-oci-hook\n    /containers/tutorials/k8s-setup\n    /containers/tutorials/k8s-neuron-helm-chart\n    /containers/tutorials/k8s-neuron-scheduler-flow\n    /containers/tutorials/k8s-neuron-monitor\n    /containers/tutorials/k8s-neuron-problem-detector-and-recovery\n    /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa\n\n\nGeneral Container Tutorials\n----------------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Docker Environment Setup\n      :link: /containers/tutorials/tutorial-docker-env-setup\n      :link-type: doc\n\n      Configure Docker on Amazon Linux 2023 to expose Inferentia and Trainium devices to containers. Install Neuron drivers, runtime, and configure the Docker daemon for Neuron device access.\n\n   .. grid-item-card:: Build and Run Neuron Containers\n      :link: /containers/tutorials/build-run-neuron-container\n      :link-type: doc\n\n      Learn how to build Docker images with Neuron support using provided Dockerfiles and run containerized applications on Inf1 and Trn1 instances with proper device exposure.\n\n   .. grid-item-card:: Docker Neuron OCI Hook Setup\n      :link: /containers/tutorials/tutorial-oci-hook\n      :link-type: doc\n\n      Install and configure the Neuron OCI hook to enable the AWS_NEURON_VISIBLE_DEVICES environment variable for exposing all Neuron devices to containers without explicit device flags.\n\nKubernetes Setup and Configuration\n-----------------------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Kubernetes Environment Setup\n      :link: /containers/tutorials/k8s-setup\n      :link-type: doc\n\n      Complete guide to setting up Kubernetes for Neuron, including EKS cluster creation with Trainium nodes, device plugin installation, scheduler extension setup, and resource allocation configuration.\n\n   .. grid-item-card:: Neuron Helm Chart\n      :link: /containers/tutorials/k8s-neuron-helm-chart\n      :link-type: doc\n\n      Simplify Neuron infrastructure deployment with the unified Helm chart that installs device plugins, scheduler extensions, node problem detector, and DRA driver in a single command.\n\nKubernetes Device Management\n-----------------------------\n\n.. 
grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Scheduler Flow Diagram\n      :link: /containers/tutorials/k8s-neuron-scheduler-flow\n      :link-type: doc\n\n      Visual diagram showing how the Neuron Scheduler Extension integrates with Kubernetes components to schedule Pods with Neuron resource requests.\n\nKubernetes Monitoring and Recovery\n-----------------------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Neuron Monitor\n      :link: /containers/tutorials/k8s-neuron-monitor\n      :link-type: doc\n\n      Deploy Neuron Monitor to collect and expose metrics from Neuron devices and runtime. Integrate with Prometheus for observability, performance tracking, and troubleshooting.\n\n   .. grid-item-card:: Node Problem Detector and Recovery\n      :link: /containers/tutorials/k8s-neuron-problem-detector-and-recovery\n      :link-type: doc\n\n      Monitor Neuron device health and automatically remediate issues by detecting hardware failures, driver problems, and runtime errors. Enable automatic node replacement for faulty hardware.\n\n   .. grid-item-card:: NPD Permissions (IRSA)\n      :link: /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa\n      :link-type: doc\n\n      Configure IAM roles for service accounts (IRSA) to grant the Neuron Node Problem Detector necessary permissions for Auto Scaling group operations and CloudWatch metrics.\n\n\nTraining and Inference Container Tutorials\n------------------------------------------    \n.. tab-set:: \n\n    .. tab-item:: Training\n\n        .. include:: /containers/tutorials/training/index.txt\n\n.. tab-set:: \n\n    .. tab-item:: Inference\n    \n         .. include:: /containers/tutorials/inference/index.txt"
  },
  {
    "path": "devflows/aws-batch-flows.rst",
    "content": ".. _aws_batch_flow:\n\nAWS Batch\n=========\n\n.. toctree::\n    :maxdepth: 1\n\n    /devflows/training/batch/batch-training\n\n              \n"
  },
  {
    "path": "devflows/aws-batch-flows.txt",
    "content": ".. tab-set:: \n\n    .. tab-item:: Inference\n\n        .. include:: /devflows/inference/aws-batch-flows.txt\n\n\n.. tab-set:: \n\n    .. tab-item:: Training\n\n        .. include:: /devflows/training/aws-batch-flows.txt"
  },
  {
    "path": "devflows/dlc-then-customize-devflow.rst",
    "content": ".. _dlc-then-customize-devflow:\n\nCustomize Neuron DLC\n==============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nDescription\n-----------\n\nThis guide covers how to customize and extend the Neuron Deep Learning Container (DLC) to fit your specific project needs. You can customize the DLC either by using the DLC as a base image in your Dockerfile or by modifying published Dockerfiles on GitHub.\n\nMethod 1: Using DLC as a Base Image\n-----------------\n\n1. Create a New Dockerfile. In your Dockerfile, specify the Neuron DLC as your base image using the FROM directive.\n\n2. Complete the Dockerfile. You can add additional packages, change the base environment, or any other modifications that suit your project. `AWS Batch Training <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/batch/batch-training.html#batch-training>`_ is a good example which needs customize Neuron DLC by using it as the base image. From its `Dockerfile <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/aws-batch/llama2/docker/Dockerfile>`_, we can find the customized container copies llama_batch_training.sh to the container and runs it.\n\n3. Navigate to the directory containing your Dockerfile and build your custom container.\n\nMethod 2: Modifying Published Dockerfiles\n-----------------\n\n1. Visit the `Neuron DLC Github repo <https://github.com/aws-neuron/deep-learning-containers>`_ and locate the Dockerfile for the container you wish to customize.\n\n2. Modify the Dockerfile as needed. You can add additional packages, change the base environment, or any other modifications that suit your project. For example, if you do not need to use Neuron tools in your scenario and want to make the container smaller, you can remove aws-neuronx-tools at this `line <https://github.com/aws-neuron/deep-learning-containers/blob/a969c77fdba17ff8d35f411b39ce3a9bc6368730/docker/pytorch/inference/2.1.1/Dockerfile.neuronx#L64>`_.\n\n3. Navigate to the directory containing your Dockerfile and build your custom container.\n"
  },
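A minimal sketch of Method 1 follows. The base image URI, the extra package, and the script name are placeholders; substitute the Neuron DLC image and project files you actually use.

.. code-block:: dockerfile

   # Use the Neuron DLC of your choice as the base image
   FROM <neuron-dlc-image-uri>

   # Add packages or project files on top of the base environment
   RUN pip install --no-cache-dir <your-extra-package>
   COPY my_training_script.sh /opt/ml/my_training_script.sh

   CMD ["/bin/bash", "/opt/ml/my_training_script.sh"]

Build it from the directory containing the Dockerfile, for example with ``docker build -t my-neuron-dlc:latest .``.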
  {
    "path": "devflows/ec2-flows.rst",
    "content": ".. _amazon-ec2:\n\nAmazon EC2\n==========\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Inference </devflows/inference/ec2-flows>\n    Training </devflows/training/ec2-flows>\n\n\n.. include:: /devflows/ec2-flows.txt\n"
  },
  {
    "path": "devflows/ec2-flows.txt",
    "content": ".. tab-set:: \n\n    .. tab-item:: Inference\n\n        .. include:: /devflows/inference/ec2-flows.txt\n\n.. tab-set:: \n\n    .. tab-item:: Training\n        \n        .. include:: /devflows/training/ec2-flows.txt\n"
  },
  {
    "path": "devflows/ecs-flows.rst",
    "content": ".. _ecs_flow:\n\nAmazon ECS\n==========\n\n.. toctree::\n    :maxdepth: 1\n\n    /devflows/plugins/npd-ecs-flows\n    /devflows/inference/dlc-then-ecs-devflow\n    /devflows/training/dlc-then-ecs-devflow\n\nIn this section, you'll find resources to help you use Neuron with ECS cluster, deploying inference and training workloads on Inferentia and Trainium ECS clusters.\n\n\nUsing Neuron Node Problem Detector Plugin with ECS\n--------------------------------------------------\n\nNeuron node problem detector and recovery plugin enhances resiliency by detecting and remediating errors.\nTo get started with using Neuron node problem detector plugin and recovery plugin on an ECS cluster, please refer to :ref:`ecs-neuron-problem-detector-and-recovery`.\n\n\nRunning Inference workload\n--------------------------\n\nThis guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an ECS cluster with Inferentia instances.\nFor running machine learning inference workloads on Amazon ECS using AWS Deep Learning Containers, please refer to :ref:`inference-dlc-then-ecs-devflow`.\n\n\nRunning Training workload\n-------------------------\n\nThis guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an ECS cluster with Trainium instances.\nFor running machine learning training workloads on Amazon ECS using AWS Deep Learning Containers, please refer to :ref:`training-dlc-then-ecs-devflow`.\n"
  },
  {
    "path": "devflows/eks-flows.rst",
    "content": ".. _eks_flow:\n\nAmazon EKS\n==========\n\n.. toctree::\n    :maxdepth: 1\n\n    /containers/kubernetes-getting-started\n    /devflows/inference/dlc-then-eks-devflow\n    /containers/tutorials/training/k8s_mlp_train_demo\n\n\nIn this section, you'll find resources to help you use Neuron with EKS cluster, deploying inference and training workloads on Inferentia and Trainium EKS clusters.\n\n\nEKS Setup\n------------\n\nThis guide covers setting up the Neuron device plugin, scheduler extension, node problem detector, and monitoring plugins.\nThese components enable efficient resource utilization, monitoring, and resilience when using Inferentia and Trainium instances for inference and training workloads on Kubernetes clusters.\nTo get started with using AWS Neuron and setting up the required plugins on an EKS cluster, please refer to :ref:`tutorial-k8s-env-setup-for-neuron`.\n\n\nRunning Inference workload\n--------------------------\n\nThis guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an EKS cluster with Inferentia instances.\nFor running machine learning inference workloads on Amazon EKS using AWS Deep Learning Containers, please refer to :ref:`dlc-then-eks-devflow`.\n\n\nRunning Training workload\n-------------------------\n\nThis guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an EKS cluster with Trainium instances.\nFor running machine learning training workloads on Amazon EKS using AWS Deep Learning Containers, please refer to :ref:`example-deploy-mlp-train-pod`.\n"
  },
  {
    "path": "devflows/index.rst",
    "content": ".. _neuron-devflows:\n\n.. meta::\n      :description:\n      :date-modified:\n\nAWS Workload Orchestration\n==========================\n\nAWS Neuron integrates seamlessly with various AWS compute and orchestration services to accelerate deep learning workloads. This section provides deployment patterns and best practices for running Neuron-powered applications across different AWS services, from container orchestration to high-performance computing clusters.\n\n.. grid:: 2\n   :gutter: 2\n\n   .. grid-item-card:: Amazon EKS\n      :link: /devflows/eks-flows\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Deploy Neuron workloads on Kubernetes with Amazon Elastic Kubernetes Service\n\n   .. grid-item-card:: Amazon ECS\n      :link: /devflows/ecs-flows\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Run containerized Neuron applications using Amazon Elastic Container Service\n\n   .. grid-item-card:: AWS ParallelCluster\n      :link: /devflows/parallelcluster-flows\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Set up HPC clusters for distributed training and inference workloads\n\n   .. grid-item-card:: AWS Batch\n      :link: /devflows/aws-batch-flows\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Execute batch ML jobs with automatic scaling and resource management\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/eks-flows\n    /devflows/ecs-flows\n    /devflows/parallelcluster-flows\n    /devflows/aws-batch-flows\n    Amazon SageMaker </devflows/sagemaker-flows>\n    Third-party Solutions </devflows/third-party-solutions>\n\n"
  },
  {
    "path": "devflows/inference/aws-batch-flows.rst",
    "content": "AWS Batch Flows - Inference\n===========================\n\n\n.. include:: /devflows/inference/aws-batch-flows.txt"
  },
  {
    "path": "devflows/inference/aws-batch-flows.txt",
    "content": ".. note::\n\n    AWS Batch supports Inf1.\n\n    An example of how to deploy a model with Neuron using Batch is coming soon.\n\n"
  },
  {
    "path": "devflows/inference/byoc-hosting-devflow-inf2.rst",
    "content": ".. _byoc-hosting-devflow-inf2:\n\nBring Your Own Neuron Container to Sagemaker Hosting (inf2 or trn1)\n====================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/byoc-then-hosting-dev-flow.png\n   :width: 850\n   :alt: Neuron developer flow on SageMaker Neo\n   :align: middle\n\nYou can use a SageMaker Notebook or an EC2 instance to compile models and build your own containers for deployment on SageMaker Hosting using ml.inf2 instances. In this developer flow, you provision a Sagemaker Notebook or an EC2 instance to train and compile your model to Inferentia. Then you deploy your model to SageMaker Hosting using the `SageMaker Python SDK <https://sagemaker.readthedocs.io/en/stable/index.html>`_. \n\nYou may not need to create a container to bring your own **code** to Amazon SageMaker. When you are using a framework such as TensorFlow or PyTorch that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework.\n\nFollow the steps bellow to setup your environment. Once your environment is set you'll be able to follow the `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker>`_.\n\n\n.. _byoc-hosting-setenv:\n\nSetup Environment\n-----------------\n\n1. Create a Compilation Instance:\n\tIf using an **EC2 instance for compilation only** you can use any instances to compile a model. It is recommended that you start with an c5.4xlarge instance. If using an **EC2 instance for compilation and test a model** you can use an Inf2 instance. Follow these steps to launch an Inf2 instance:\n\t\t\n\t\t.. include:: /setup/install-templates/inf2/launch-inf2-dlami.rst\n\t\n\n\tIf using an **SageMaker Notebook for compilation**, follow the instructions in `Get Started with Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html>`_ to provision the environment. \n\n\tIt is recommended that you start with an ml.c5.4xlarge instance for the compilation. Also, increase the volume size of you SageMaker notebook instance, to accomodate the models and containers built locally. A volume of 10GB is sufficient.\n\t\n\t\t.. note::\n\t\t\t\n\t\t\tTo compile the model in the SageMaker Notebook instance, you'll need to install the Neuron Compiler and Neuron Framework Extensions. Follow the `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker>`_ to install the environments.  \n\n\n2. Set up the environment to compile a model, build your own container and deploy:\n    To compile your model on EC2 or SageMaker Notebook, follow the *Set up a development environment* section on the EC2 :ref:`ec2-then-ec2-setenv` documentation.\n\n    Refer to `Adapting Your Own Inference Container <https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html>`_ documentation for information on how to bring your own containers to SageMaker Hosting.\n\n    Make sure to add the **AmazonEC2ContainerRegistryPowerUser** role to your IAM role ARN, so you're able to build and push containers from your SageMaker Notebook instance.\n\n    .. 
note::\n        The container image can be created using :ref:`how-to-build-neuron-container`.\n"
  },
  {
    "path": "devflows/inference/byoc-hosting-devflow.rst",
    "content": ".. _byoc-hosting-devflow:\n\nBring Your Own Neuron Container to Sagemaker Hosting (inf1)\n====================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/byoc-then-hosting-dev-flow.png\n   :width: 850\n   :alt: Neuron developer flow on SageMaker Neo\n   :align: middle\n\nYou can use a SageMaker Notebook or an EC2 instance to compile models and build your own containers for deployment on SageMaker Hosting using ml.inf1 instances. In this developer flow, you provision a Sagemaker Notebook or an EC2 instance to train and compile your model to Inferentia. Then you deploy your model to SageMaker Hosting using the SageMaker Python SDK. Follow the steps bellow to setup your environment. Once your environment is set you'll be able to follow the :ref:`BYOC HuggingFace pretrained BERT container to Sagemaker Tutorial </src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>` .\n\n.. _byoc-hosting-setenv:\n\nSetup Environment\n-----------------\n\n1. Create a Compilation Instance:\n\tIf using an **EC2 instance for compilation** you can use an Inf1 instance to compile and test a model. Follow these steps to launch an Inf1 instance:\n\t\t\n\t\t.. include:: /setup/install-templates/inf1/launch-inf1-ami.rst\n\t\n\n\tIf using an **SageMaker Notebook for compilation**, follow the instructions in `Get Started with Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html>`_ to provision the environment. \n\n\tIt is recommended that you start with an ml.c5.4xlarge instance for the compilation. Also, increase the volume size of you SageMaker notebook instance, to accomodate the models and containers built locally. A volume of 10GB is sufficient.\n\t\n\t\t.. note::\n\t\t\t\n\t\t\tTo compile the model in the SageMaker Notebook instance, you'll need to update the conda environments to include the Neuron Compiler and Neuron Framework Extensions. Follow the installation guide on the section :ref:`how-to-update-to-latest-Neuron-Conda-Env` to update the environments.  \n\n\n2. Set up the environment to compile a model, build your own container and deploy:\n    To compile your model on EC2 or SageMaker Notebook, follow the *Set up a development environment* section on the EC2 :ref:`ec2-then-ec2-setenv` documentation.\n\n    Refer to `Adapting Your Own Inference Container <https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html>`_ documentation for information on how to bring your own containers to SageMaker Hosting.\n\n    Make sure to add the **AmazonEC2ContainerRegistryPowerUser** role to your IAM role ARN, so you're able to build and push containers from your SageMaker Notebook instance.\n\n    .. note::\n        The container image can be created using :ref:`how-to-build-neuron-container`.\n"
  },
  {
    "path": "devflows/inference/container-sm-hosting-devflow.rst",
    "content": ".. _container-sm-hosting-devflow:\n\nDeploy on Sagemaker Hosting\n===========================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\nYou can use `Sagemaker Hosted Endpoint <https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html>`_ to do inference on Inf1 instances."
  },
  {
    "path": "devflows/inference/dev-flows.rst",
    "content": ".. _neuron1-devflows:\n.. _compilation-flow-target:\n.. _deploym-flow-target:\n\nDeveloper Flows Introduction\n============================\n\n|image|\n\n \n.. |image| image:: /images/neuron-devflow.jpg\n   :width: 500\n   :alt: Neuron developer flow\n   \nA typical Neuron developer flow includes compilation phase and then deployment (inference) on inf1 instance/s. You can develop on Neuron using one of the following combinations of developer flows:\n\n\n\n.. toctree::\n   :maxdepth: 1\n\n   ec2-then-ec2-devflow\n   ec2-then-ec2-devflow-inf2\n   neo-then-hosting-devflow\n   byoc-hosting-devflow\n   dlc-then-ec2-devflow\n   dlc-then-ecs-devflow\n   dlc-then-eks-devflow\n\n\n\n"
  },
  {
    "path": "devflows/inference/dlc-then-ec2-devflow.rst",
    "content": ".. _dlc-then-ec2-devflow:\n\nDeploy Neuron Container on EC2\n==============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/dlc-on-ec2-dev-flow.png\n   :width: 500\n   :alt: Neuron developer flow for DLC on EC2\n   :align: middle\n\nYou can use the Neuron version of the `AWS Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-inference.html>`_ to run inference on inf1 instances. In this developer flow, you provision an EC2 inf1 instance using a Deep Learming AMI (DLAMI), pull the container image with the Neuron version of the desired framework, and run the container as a server for the already compiled model. This developer flow assumes the model has already has been compiled through a :ref:`compilation developer flow <compilation-flow-target>` \n\n.. _dlc-then-ec2-setenv:\n\nSetup Environment\n-----------------\n\n1. Launch an Inf1 Instance\n\t.. include:: /setup/install-templates/inf1/launch-inf1-ami.rst\n\n2. Once you have your EC2 environment set according to :ref:`tutorial-docker-env-setup`, you can build and run a Neuron container using the :ref:`how-to-build-neuron-container` section above.\n\n.. [DLC specific flow, uncomment when DLC available] Follow the `Getting Started with Deep Learning Containers for Inference on EC2 <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ec2-tutorials-inference.html>`_ and use the appropriate DLC container.\n\n\n.. note:: \n\n\t**Prior to running the container**, make sure that the Neuron runtime on the instance is turned off, by running the command:\n\n\t.. code:: bash\n\n\t\tsudo service neuron-rtd stop\n\n\n\n"
  },
  {
    "path": "devflows/inference/dlc-then-ecs-devflow.rst",
    "content": ".. _inference-dlc-then-ecs-devflow:\n\nDeploy Neuron Container on Elastic Container Service (ECS) for Inference\n========================================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/dlc-on-ecs-dev-flow.png\n   :width: 750\n   :alt: Neuron developer flow for DLC on ECS\n   :align: middle\n\nYou can use the Neuron version of the `AWS Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-inference.html>`_ to run inference on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with inf1/inf2 instances, create a task description for your inference service and deploy it to your cluster. This developer flow assumes:\n\n1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance <ec2-then-ec2-devflow>` or through :ref:`Compilation with Sagemaker Neo <neo-then-hosting-devflow>`. \n\n2. You already set up your container to retrieve it from storage.\n\n.. _inference-dlc-then-ecs-setenv:\n\nSetup Environment\n-----------------\n\n\n1. Set up an Amazon ECS cluster:\n\tFollow the instructions on `Setting up Amazon ECS for Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-setting-up-ecs.html>`_\n\n2. Define an Inference Task:\n\tUse the instruction on the `DLC Inference on ECS Tutorial <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-inference.html>`_ to define a task and create a service for the appropriate framework.\n\n\tWhen creating tasks for inferentia instances on ECS, be aware of the considerations and requirements listed in `Working with inference workloads on Amazon ECS <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-inference.html>`_. \n\n\n3. Use the container image created using :ref:`how-to-build-neuron-container` as the ``image`` in your task definition.\n\n   .. _inference-push_to_ecr_note:\n\n   .. note::\n\n       Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image <https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html>`_ for more information.\n"
  },
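The ECR push mentioned in the note above typically looks like the following (a sketch; the account ID, region, and repository name are placeholders, and the ECR repository must already exist):

.. code-block:: bash

   # Authenticate Docker to your ECR registry
   aws ecr get-login-password --region us-west-2 | \
       docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-west-2.amazonaws.com

   # Tag the locally built Neuron container image and push it
   docker tag neuron-container:pytorch <account-id>.dkr.ecr.us-west-2.amazonaws.com/<repo-name>:latest
   docker push <account-id>.dkr.ecr.us-west-2.amazonaws.com/<repo-name>:latest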
  {
    "path": "devflows/inference/dlc-then-eks-devflow.rst",
    "content": ".. _dlc-then-eks-devflow:\n\nDeploy Neuron Container on Elastic Kubernetes Service (EKS) for Inference\n=========================================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/dlc-on-eks-dev-flow.png\n   :width: 750\n   :alt: Neuron developer flow for DLC on ECS\n   :align: middle\n\nYou can use the Neuron version of the `AWS Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-inference.html>`_ to run inference on Amazon Elastic Kubernetes Service (EKS). In this developer flow, you set up an EKS cluster with Inf1 instances, create a Kubernetes manifest for your inference service and deploy it to your cluster. This developer flow assumes:\n\n1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance <ec2-then-ec2-devflow>` or through :ref:`Compilation with Sagemaker Neo <neo-then-hosting-devflow>`. \n\n2. You already set up your container to retrieve it from storage.\n\n.. _dlc-then-eks-setenv:\n\nSetup Environment\n-----------------\n\nPlease add inferentia nodes using instructions at :ref:`tutorial-k8s-env-setup-for-neuron` . \n\nUsing the YML deployment manifest shown `in the EKS documentation for inferentia <https://docs.aws.amazon.com/eks/latest/userguide/inferentia-support.html#deploy-tensorflow-serving-application>`_, replace the `image` in the `containers` specification with the one you built using :ref:`how-to-build-neuron-container`.\n\n   .. note::\n\n     Before deploying the yaml to your EKS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image <https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html>`_ for more information.\n\n\nInference Example\n-----------------\nPlease refer to :ref:`example-deploy-rn50-as-k8s-service` run a simple inference example. Note that the container image referenced in the YML manifest is created using :ref:`how-to-build-neuron-container`.\n"
  },
  {
    "path": "devflows/inference/dlc-then-k8s-devflow.rst",
    "content": ".. _dlc-then-k8s-devflow:\n\nDeploy  Neuron Container on Kubernetes\n======================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\nUse of Neuron in containers on Kubernetes cluster can be simple to achieve by following :ref:`tutorial-k8s-env-setup-for-neuron`\n\nKnown Limitations\n-----------------\nScheduling on k8s cluster requires contiguous neuron device-ids\n"
  },
  {
    "path": "devflows/inference/ec2-flows.rst",
    "content": "EC2 Flows - Inference\n=====================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/inference/ec2-then-ec2-devflow\n    /devflows/inference/ec2-then-ec2-devflow-inf2\n\n    \n\n.. include:: /devflows/inference/ec2-flows.txt\n\n"
  },
  {
    "path": "devflows/inference/ec2-flows.txt",
    "content": "* :ref:`ec2-then-ec2-devflow`\n* :ref:`ec2-then-ec2-devflow-inf2`\n       \n"
  },
  {
    "path": "devflows/inference/ec2-then-ec2-devflow-inf2.rst",
    "content": ".. _ec2-then-ec2-devflow-inf2:\n\nCompile with Framework API and Deploy on EC2 Inf2\n=================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/ec2-then-ec2-dev-flow-inf2.png\n   :width: 500\n   :alt: Neuron developer flow on EC2\n   :align: middle\n\nYou can use a single inf2 instance as a development environment to compile and deploy Neuron models. In this developer flow, you provision an EC2 inf2 instance using a Deep Learning AMI (DLAMI) and execute the two steps of the development flow in the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in terminal. Follow the steps below to setup your environment. \n\n.. note::\n\t**Model compilation can be executed on a non-inf2 instance** for later deployment. \n\tFollow the same EC2 Developer Flow Setup using other instance families and leverage `Amazon Simple Storage Service  <https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html>`_ (S3) to share the compiled models between different instances.   \n\n.. _ec2-then-ec2-setenv:\n\nSetup Environment\n-----------------\n\n1. Launch an Inf2 Instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n    .. include:: /setup/install-templates/inf2/launch-inf2-dlami.rst\n  \n\n2. Set up a development environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n   \nEnable PyTorch-Neuron\n~~~~~~~~~~~~~~~~~~~~~\n\n.. include :: /setup/install-templates/inf2/note-setup-libnrt-warning.rst\n\n.. include:: /setup/install-templates/inf2/dlami-enable-neuron-pytorch.rst\n\n3. Set up Jupyter notebook\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo develop from a Jupyter notebook see :ref:`setup-jupyter-notebook-steps-troubleshooting`  \n\nYou can also run a Jupyter notebook as a script, first enable the ML framework Conda or Python environment of your choice and see :ref:`running-jupyter-notebook-as-script` for instructions. \n"
  },
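Sharing a compiled artifact through S3, as suggested in the note above, only needs two copies (a sketch; the bucket and file names are placeholders, and the instance profiles must allow access to the bucket):

.. code-block:: bash

   # On the compilation instance: upload the compiled model
   aws s3 cp model_neuron.pt s3://<your-bucket>/models/model_neuron.pt

   # On the Inf2 deployment instance: download it before loading
   aws s3 cp s3://<your-bucket>/models/model_neuron.pt .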
  {
    "path": "devflows/inference/ec2-then-ec2-devflow.rst",
    "content": ".. _ec2-then-ec2-devflow:\n\nCompile with Framework API and Deploy on EC2 Inf1\n=================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/ec2-then-ec2-dev-flow.png\n   :width: 500\n   :alt: Neuron developer flow on EC2\n   :align: middle\n\nYou can use a single inf1 instance as a development environment to compile and deploy Neuron models. In this developer flow, you provision an EC2 inf1 instance using a Deep Learming AMI (DLAMI) and execute the two steps of the development flow in the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in terminal. Follow the steps bellow to setup your environment. \n\n.. note::\n\t**Model compilation can be executed on a non-inf1 instance** for later deployment. \n\tFollow the same EC2 Developer Flow Setup using other instance families and leverage `Amazon Simple Storage Service  <https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html>`_ (S3) to share the compiled models between different instances.   \n\n.. _ec2-then-ec2-setenv:\n\nSetup Environment\n-----------------\n\n1. Launch an Inf1 Instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n    .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst\n  \n\n2. Set up a development environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n   \nEnable PyTorch-Neuron\n~~~~~~~~~~~~~~~~~~~~~\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. include:: /setup/install-templates/inf1/dlami-enable-neuron-pytorch.rst\n\nEnable TensorFlow-Neuron\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. include:: /setup/install-templates/inf1/dlami-enable-neuron-tensorflow.rst\n\nEnable Apache MXNet\n~~~~~~~~~~~~~~~~~~~~\n\n.. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst\n\n.. include:: /setup/install-templates/inf1/dlami-enable-neuron-mxnet.rst\n\n3. Set up Jupyter notebook\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo develop from a Jupyter notebook see :ref:`setup-jupyter-notebook-steps-troubleshooting`  \n\nYou can also run a Jupyter notebook as a script, first enable the ML framework Conda or Python environment of your choice and see :ref:`running-jupyter-notebook-as-script` for instructions. \n"
  },
  {
    "path": "devflows/inference/env-setup-text.rst",
    "content": "A typical Neuron developer flow includes compilation phase and then deployment (inference) on inf1 instance/s.\n\n\nYou can also choose one of the following combinations for compilation and deployment:\n\n\n   \n\n"
  },
  {
    "path": "devflows/inference/neo-then-hosting-devflow.rst",
    "content": ".. _neo-then-hosting-devflow:\n\nCompile with Sagemaker Neo and Deploy on Sagemaker Hosting (inf1)\n==========================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/neo-then-hosting-dev-flow.png\n   :width: 700\n   :alt: Neuron developer flow on SageMaker Neo\n   :align: middle\n\nYou can use SageMaker Neo to compile models for deployment on SageMaker Hosting using ml.inf1 instances. In this developer flow, you provision a Sagemaker Notebook instance to train, compile and deploy your model using the SageMaker Python SDK. Follow the steps bellow to setup your environment. \n\n.. _neo-then-hosting-setenv:\n\nSetup Environment\n-----------------\n\n1. Create an Amazon SageMaker Notebook Instance:\n\n\tFollow the instructions in `Get Started with Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html>`_\n\n\tThe Notebook instance created provides the required Python SDK for training, compiling and deploying models with Amazon SageMaker.\n\n2. Compile a model using the Amazon SageMaker SDK:\n\n\tRefer to `Supported Instances Types and Frameworks <https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-cloud.html>`_ for information on the framework versions currently supported by Amazon SageMaker Neo on AWS Inferentia. \n\n\tMore information about compiling and deploying models with Amazon SageMaker Neo can be found on `Use Neo to Compile a Model <https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html>`_\n\n\n\n\n\n\n"
  },
  {
    "path": "devflows/inference/parallelcluster-flows.rst",
    "content": "Parallel Cluster Flows - Inference\n===================================\n\n\n.. include:: /devflows/inference/parallelcluster-flows.txt"
  },
  {
    "path": "devflows/inference/parallelcluster-flows.txt",
    "content": ".. note::\n\n    AWS ParallelCluster support is coming soon."
  },
  {
    "path": "devflows/inference/sagemaker-flows.rst",
    "content": "Sagemaker Flows - Inference\n===========================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /devflows/inference/byoc-hosting-devflow-inf2\n    /devflows/inference/byoc-hosting-devflow \n    /devflows/inference/neo-then-hosting-devflow\n\n   \n\n.. include:: /devflows/inference/sagemaker-flows.txt"
  },
  {
    "path": "devflows/inference/sagemaker-flows.txt",
    "content": "* :ref:`byoc-hosting-devflow-inf2`\n* :ref:`byoc-hosting-devflow`\n* :ref:`neo-then-hosting-devflow`\n* `AWS Neuron Sagemaker Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-sagemaker-samples>`_\n"
  },
  {
    "path": "devflows/parallelcluster-flows.rst",
    "content": "AWS ParallelCluster\n===================\n\n.. toctree::\n    :maxdepth: 1\n\n    /devflows/training/parallelcluster-flows\n\n\n.. .. include:: /devflows/parallelcluster-flows.txt\n"
  },
  {
    "path": "devflows/parallelcluster-flows.txt",
    "content": ".. tab-set:: \n\n    .. tab-item:: Training\n            \n        .. include:: /devflows/training/parallelcluster-flows.txt\n                \n\n.. tab-set:: \n\n    .. tab-item:: Inference\n\n        .. note::\n\n            AWS ParallelCluster support is coming soon.\n\n\n"
  },
  {
    "path": "devflows/plugins/npd-ecs-flows.rst",
    "content": ".. _ecs-neuron-problem-detector-and-recovery:\n\nNeuron Problem Detector And Recovery\n====================================\n\n.. include:: /devflows/plugins/npd-ecs-flows.txt\n"
  },
  {
    "path": "devflows/plugins/npd-ecs-flows.txt",
    "content": "Neuron node problem detector and recovery artifact checks the health of Neuron devices on each ECS instance. After detecting an unrecoverable Neuron error, it triggers an instance replacement. In order to get started with Neuron node problem detector and recovery, make sure that the following requirements are satisfied:\n\n* The Neuron node problem detector and recovery requires Neuron driver 2.15+, and it requires the runtime to be at SDK 2.18 or later.\n\nCreating a Task Definition\n--------------------------\n\nConfiguration\n~~~~~~~~~~~~~\n\nThe task definition includes two containers:\n\n- **npd-container**: This container is responsible for enabling Problem detection functionality in the ECS cluster.\n- **recovery-container**: This container handles recovery operations in case of failures detected by Neuron Problem Detector.\n\nThe **recovery-container** has an environment variable called ``ENABLE_RECOVERY`` that controls whether recovery is enabled or disabled. Set the value to ``true`` to enable recovery, or ``false`` to disable it.\n\nFollow these steps to create a task definition for NPD and recovery:\n\n1. Go to the `ECS console <https://console.aws.amazon.com/ecs/>`_ and select **Task Definitions** in the navigation pane.\n2. Click **Create new Task Definition** and choose **Create new Task Definition with JSON**.\n3. Paste the task definition JSON provided, replacing the placeholders with your account-specific values.\n\n    .. code-block:: json\n\n        {\n            \"family\": \"neuron-npd-and-recovery\",\n            \"containerDefinitions\": [\n                {\n                    \"name\": \"npd\",\n                    \"image\": \"registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19\",\n                    \"cpu\": 0,\n                    \"portMappings\": [\n                        {\n                            \"name\": \"npd-80-tcp\",\n                            \"containerPort\": 80,\n                            \"hostPort\": 80,\n                            \"protocol\": \"tcp\",\n                            \"appProtocol\": \"http\"\n                        }\n                    ],\n                    \"essential\": true,\n                    \"entryPoint\": [\n                        \"/bin/sh\",\n                        \"-c\"\n                    ],\n                    \"command\": [\n                        \"echo '{\\\"plugin\\\":\\\"kmsg\\\",\\\"logPath\\\":\\\"/dev/kmsg\\\",\\\"lookback\\\":\\\"5m\\\",\\\"bufferSize\\\":10,\\\"source\\\":\\\"kernel-monitor\\\",\\\"conditions\\\":[{\\\"type\\\":\\\"NeuronHealth\\\",\\\"reason\\\":\\\"NeuronHasNoError\\\",\\\"message\\\":\\\"Neuronhasnoerror\\\"}],\\\"rules\\\":[{\\\"type\\\":\\\"permanent\\\",\\\"condition\\\":\\\"NeuronHealth\\\",\\\"reason\\\":\\\"NeuronHasError_SRAM_UNCORRECTABLE_ERROR\\\",\\\"pattern\\\":\\\".*NEURON_HW_ERR=SRAM_UNCORRECTABLE_ERROR.*\\\"},{\\\"type\\\":\\\"permanent\\\",\\\"condition\\\":\\\"NeuronHealth\\\",\\\"reason\\\":\\\"NeuronHasError_NC_UNCORRECTABLE_ERROR\\\",\\\"pattern\\\":\\\".*NEURON_HW_ERR=NC_UNCORRECTABLE_ERROR.*\\\"},{\\\"type\\\":\\\"permanent\\\",\\\"condition\\\":\\\"NeuronHealth\\\",\\\"reason\\\":\\\"NeuronHasError_HBM_UNCORRECTABLE_ERROR\\\",\\\"pattern\\\":\\\".*NEURON_HW_ERR=HBM_UNCORRECTABLE_ERROR.*\\\"},{\\\"type\\\":\\\"permanent\\\",\\\"condition\\\":\\\"NeuronHealth\\\",\\\"reason\\\":\\\"NeuronHasError_DMA_ERROR\\\",\\\"pattern\\\":\\\".*NEURON_HW_ERR=DMA_ERROR.*\\\"}]}' > /config/kernel-monitor.json && 
/node-problem-detector --v=2 --logtostderr --enable-k8s-exporter=false --config.system-log-monitor=/config/kernel-monitor.json\"\n                    ],\n                    \"environment\": [],\n                    \"mountPoints\": [],\n                    \"volumesFrom\": [],\n                    \"linuxParameters\": {\n                        \"devices\": [\n                            {\n                                \"hostPath\": \"/dev/kmsg\",\n                                \"containerPath\": \"/dev/kmsg\",\n                                \"permissions\": [\n                                    \"read\",\n                                    \"write\"\n                                ]\n                            }\n                        ]\n                    },\n                    \"privileged\": true,\n                    \"logConfiguration\": {\n                        \"logDriver\": \"awslogs\",\n                        \"options\": {\n                            \"awslogs-group\": \"/ecs/npd\",\n                            \"awslogs-create-group\": \"true\",\n                            \"awslogs-region\": \"us-west-2\",\n                            \"awslogs-stream-prefix\": \"ecs\"\n                        },\n                        \"secretOptions\": []\n                    },\n                    \"systemControls\": []\n                },\n                {\n                    \"name\": \"recovery\",\n                    \"image\": \"public.ecr.aws/neuron/neuron-node-recovery:1.3.0\",\n                    \"cpu\": 0,\n                    \"portMappings\": [],\n                    \"essential\": true,\n                    \"entryPoint\": [\n                        \"/bin/sh\",\n                        \"-c\"\n                    ],\n                    \"command\": [\n                        \"python scripts/check-health.py\"\n                    ],\n                    \"environment\": [\n                        {\n                            \"name\": \"ENABLE_RECOVERY\",\n                            \"value\": \"false\"\n                        }\n                    ],\n                    \"mountPoints\": [],\n                    \"volumesFrom\": [],\n                    \"readonlyRootFilesystem\": true,\n                    \"logConfiguration\": {\n                        \"logDriver\": \"awslogs\",\n                        \"options\": {\n                            \"awslogs-create-group\": \"true\",\n                            \"awslogs-group\": \"/ecs/recovery\",\n                            \"awslogs-region\": \"us-west-2\",\n                            \"awslogs-stream-prefix\": \"ecs\"\n                        }\n                    },\n                    \"systemControls\": []\n                }\n            ],\n            \"executionRoleArn\": \"arn:aws:iam::012345678910:role/ecsTaskExecutionRole\",\n            \"taskRoleArn\": \"arn:aws:iam::012345678910:role/ecsTaskExecutionRole\",\n            \"networkMode\": \"awsvpc\",\n            \"requiresCompatibilities\": [\n                \"EC2\"\n            ],\n            \"cpu\": \"1024\",\n            \"memory\": \"3072\",\n            \"runtimePlatform\": {\n                \"cpuArchitecture\": \"X86_64\",\n                \"operatingSystemFamily\": \"LINUX\"\n            }\n        }\n\n4. 
Review the task definition and click **Create**.\n\nFor more details on task definitions, refer to the `AWS documentation <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html>`_.\n\n.. _deploy-service:\n\nDeploying the Service\n---------------------\n\nAfter creating the task definition, follow these steps to deploy the service:\n\n1. In the ECS console, select the task definition and click **Deploy** → **Create Service**.\n2. Select your ECS cluster, set the launch type to **EC2**, and the service type to **Daemon**.\n3. Click **Create** to deploy the service.\n\nFor more details on deploying services, refer to the `AWS documentation <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/services.html>`_.\n\nPermissions\n~~~~~~~~~~~\n\nEnsure the ECS task execution role and task role have permissions to:\n\n- Publish metrics to CloudWatch\n- Read and set health status of EC2 instances in the Auto Scaling group\n\nRefer to the `AWS documentation on IAM roles for ECS tasks <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html>`_ for more information.\n\nWhen an unrecoverable error occurs, the Neuron node problem detector and recovery publishes a metric under the CloudWatch namespace ``NeuronHealthCheck``. It also reflects in ``NodeCondition`` and can be seen with ``kubectl describe node``.
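\n\nAs an alternative to the console steps in :ref:`deploy-service`, you can register the task definition and create the daemon service with the AWS CLI. The following is a minimal sketch rather than a complete deployment script; the cluster, subnet, and security group values are placeholders, and the task definition JSON is assumed to be saved locally as ``neuron-npd-and-recovery.json``:\n\n.. code-block:: shell\n\n    # Register the task definition from the JSON shown above.\n    aws ecs register-task-definition --cli-input-json file://neuron-npd-and-recovery.json\n\n    # Deploy it as a daemon service on an existing EC2-backed ECS cluster.\n    # The task uses the awsvpc network mode, so a network configuration is required.\n    aws ecs create-service \\\n        --cluster <my-cluster> \\\n        --service-name neuron-npd-and-recovery \\\n        --task-definition neuron-npd-and-recovery \\\n        --launch-type EC2 \\\n        --scheduling-strategy DAEMON \\\n        --network-configuration 'awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<security-group-id>]}'\n"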
  },
  {
    "path": "devflows/sagemaker-flows.rst",
    "content": ".. _sagemaker_flow:\n\nAmazon SageMaker\n================\n\nAmazon SageMaker is a fully managed machine learning (ML) platform that streamlines the end-to-end ML workflow at scale. AWS Neuron integrates \nwith Amazon SageMaker to provide optimized performance for ML workloads on AWS Inferentia and AWS Trainium chips.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nSageMaker JumpStart\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nUse `Amazon SageMaker JumpStart <https://aws.amazon.com/sagemaker/jumpstart/>`_ to train and deploy models using Neuron.  SageMaker JumpStart is an ML hub that accelerates model \nselection and deployment. It provides support for fine-tuning and deploying popular models such as Meta’s Llama family of models. \nUsers can customize pre-trained models with their data and easily deploy them.\n\nSageMaker HyperPod\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nUse `Amazon SageMaker HyperPod <https://aws.amazon.com/sagemaker/hyperpod/>`_ to streamline ML infrastructure setup and optimization with AWS Neuron. SageMaker HyperPod leverages \npre-configured distributed training libraries to split workloads across numerous AI accelerators, enhancing model performance. \nHyperPod ensures uninterrupted training through automatic checkpointing, fault detection, and recovery.\n\nSageMaker Training\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n`Amazon SageMaker Model Training <https://aws.amazon.com/sagemaker/train/>`_ reduces the time and cost to train and tune ML models at scale without the need to manage infrastructure.\n\nSageMaker Inference\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nWith `Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html>`_ , you can start getting predictions, or inferences, from your trained ML models. SageMaker \nprovides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs."
  },
  {
    "path": "devflows/setup/ecs-flows.rst",
    "content": "ECS Flows - Setup\n=================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/plugins/npd-ecs-flows\n\n.. include:: /devflows/setup/ecs-flows.txt"
  },
  {
    "path": "devflows/setup/ecs-flows.txt",
    "content": "* :ref:`ecs-neuron-problem-detector-and-recovery`"
  },
  {
    "path": "devflows/setup/eks-flows.rst",
    "content": "EKS - Setup\n=====================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /containers/kubernetes-getting-started\n\n\n\n.. include:: /devflows/setup/eks-flows.txt\n"
  },
  {
    "path": "devflows/setup/eks-flows.txt",
    "content": "* :ref:`kubernetes-getting-started`\n"
  },
  {
    "path": "devflows/third-party-solutions.rst",
    "content": ".. _third-party-devflow-solutions:\n\nThird-party solutions\n====================\n\nAWS Neuron integrates with multiple third-party partner solutions that alow you to run deep learning workloads on Amazon EC2 \ninstances powered by AWS Trainium and AWS Inferentia chips. The following list gives an overview of third-party solutions \nthat work with AWS Neuron.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nRay \n\"\"\"\nRay, by Anyscale, is the open source AI Compute Engine at the center of the world's most powerful AI Platforms. It precisely \norchestrates infrastructure for any distributed AI workload like data processing, model training, and serving on any accelerator at \nany scale. Ray simplifies the complexity of distributed computing, improves efficiency, lower costs, and accelerates developer \nproductivity.\n\n`Ray Train documentation <https://docs.ray.io/en/latest/train/examples/aws-trainium/llama3.html>`_\n\nDomino\n\"\"\"\"\"\"\nDomino is an open enterprise platform for data science, machine learning, and AI research. It works with an expansive list of \nindustry leading tools and technologies to enrich data science research, development, and deployment processes. Domino works with a \nwide range of data sources, languages, IDEs, tools, libraries, and publication targets.\n\n`Domino documentation <https://docs.dominodatalab.com/en/latest/user_guide/d98a6d/aws-trainium-and-inferentia-silicon-accelerators/>`_\n"
  },
  {
    "path": "devflows/training/aws-batch-flows.rst",
    "content": "AWS Batch Flows- Training\n=========================\n\n\n.. include:: /devflows/training/aws-batch-flows.txt"
  },
  {
    "path": "devflows/training/aws-batch-flows.txt",
    "content": "* :ref:`batch-training`"
  },
  {
    "path": "devflows/training/batch/batch-training.rst",
    "content": ".. _batch-training:\n\nTrain your model on AWS Batch\n=============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\nDescription\n------------\n\nAWS Batch provides a scalable and cost-effective solution for running batch computing workloads in the AWS Cloud. Integrating Trainium with AWS Batch provides an efficient and cost-effective way of training deep learning models at scale.\nOnce you configure your training job, AWS Batch effectively manages the orchestration, execution, and dynamic scaling of compute resources for your extensive machine learning workloads. To learn more about AWS Batch, see `the AWS Batch documentation <https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html>`_.\n\n\nHow does AWS Batch work with Trainium\n-------------------------------------\n\n.. image:: /images/batch-setup.png\n\n\nAs depicted in the illustration above, our workflow begins by building a ``Docker container image for Trainium`` and pushing it to Amazon Elastic Container Registry (ECR). Following this, we configure our AWS Batch environment with the required capabilities, and subsequently submit the training job.\n\nPlease follow the below mentioned steps to run your training jobs on ``AWS Batch`` with ``Trainium``.\n\n#. **Before you begin, please ensure that you have the following prerequisites completed:**\n\n   * ``AWS VPC`` with at least one ``Subnet`` and ``EFA Enabled Security Group`` (learn more about EFA-enabled security group `the AWS EFA User Guide <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security>`_). Please make sure subnet needs to be private, and the VPC needs to have a NAT gateway to allow internet connectivity for the private subnet.\n   * ``AWS ECR`` repository\n   * ``AWS CLI`` installed and configured with permissions for the above mentioned AWS resources\n   * ``Docker``\n   * ``jq``\n\n#. **Setup to start working with AWS Batch**\n\n   Connect to your EC2 instance(``x86_64-based Linux instance``) and clone the ``aws-neuron-samples`` repo. Once done, navigate to aws batch scripts directory.\n\n   .. code:: shell\n\n      cd ~/\n      git clone https://github.com/aws-neuron/aws-neuron-samples.git\n      cd ~/aws-neuron-samples/torch-neuronx/training/aws-batch/all-reduce\n\n#. **Configure resource requirements**\n\n   Update the ``build_configs_and_setup.sh`` with your environment variables. Once done, execute the bash script using the command ``./build_configs_and_setup.sh``.\n\n#. **Build the required docker image and publish it to ECR**\n\n   Run ``./build_docker_image.sh`` to build a Neuron Deep-Learning Container image using the latest Neuron packages and push this image to ECR.\n\n#. **Prepare the AWS infrastructure required to submit the batch job**\n\n   Run ``./create_resources.sh`` to create all AWS Batch resources needed for your training workload. Below is the brief description of various AWS Batch components this script will create for you -\n\n   * ``Placement Group`` enables you to influence the placement of your EC2 (Elastic Compute Cloud) instances within the AWS infrastructure.\n   * ``Launch Template`` allows you to define a set of instance configuration parameters, including the Amazon Machine Image (AMI), instance type, key pair, security groups, and other settings, in a template format.\n   * ``Compute Environment`` helps you to specify configuration that specifies the type of compute resources you want to use for your batch jobs. 
It includes details such as the EC2 instance types, the minimum and maximum number of instances, the VPC configuration, and other settings related to the compute environment.\n   * ``Job Definition`` is a blueprint that specifies how a batch job should be run. It encapsulates information about the job, such as the Docker image to be used, the command to execute within the container, the CPU and memory requirements, job dependencies, and other settings.\n   * ``Job Queue`` acts as a queueing mechanism for managing and scheduling the execution of batch computing workloads. By using job queues, AWS Batch provides a scalable and efficient way to process batch workloads, managing the allocation of resources and ensuring optimal use of compute capacity.\n\n#. **Submit the job to AWS Batch**\n\n   Run ``./submit_job.sh`` to submit a basic all-reduce job in the provisioned AWS Batch environment.\n\n#. **Monitor the AWS Batch job**\n\n   You can use Amazon CloudWatch Logs to monitor, store, and view all your logs from the AWS Batch job. To learn more about it, please see `the AWS docs on using Batch and EKS with CloudWatch <https://docs.aws.amazon.com/batch/latest/userguide/batch-eks-cloudwatch-logs.html>`_.\n\n.. note::\n    * You could run a full model training job using this setup. For example, `this sample <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/aws-batch/llama2/README.md>`_ runs the Llama2-7B tutorial on AWS Batch using the same setup.\n    * You can further tailor your ``Dockerfile`` to include any additional dependencies specific to your needs.\n    * You have the option to leverage ``trn1n.32xlarge`` instances as an alternative to ``trn1.32xlarge``. To make this transition, you only need to make adjustments to the ``launch template`` and ``job definition`` in order to accommodate the use of 16 EFA (Elastic Fabric Adapter) devices, whereas the current setup for ``trn1`` employs 8 EFA devices. Please check out `this document <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html?highlight=multi-node>`_ to start with ``trn1n.32xlarge`` for multi-node execution.
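\n\nIn addition to CloudWatch Logs, you can check the status of a submitted job from the AWS CLI. The following is a minimal sketch; the job queue name and job ID are placeholders:\n\n.. code:: shell\n\n    # List jobs in the job queue (by default, currently running jobs are shown).\n    aws batch list-jobs --job-queue <your-job-queue-name>\n\n    # Show detailed status for a specific job.\n    aws batch describe-jobs --jobs <job-id>\n"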
  },
  {
    "path": "devflows/training/dlc-then-ecs-devflow.rst",
    "content": ".. _training-dlc-then-ecs-devflow:\n\nDeploy Neuron Container on Elastic Container Service (ECS) for Training\n=======================================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/dlc-on-ecs-dev-flow.png\n   :width: 750\n   :alt: Neuron developer flow for DLC on ECS\n   :align: middle\n\nYou can use the Neuron version of the `AWS Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-training.html>`_ to run training on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with trn1 instances, create a task description for your training container and deploy it to your cluster. This developer flow assumes:\n\n1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance <ec2-training>` or through :ref:`Compilation with Sagemaker Neo <neo-then-hosting-devflow>`.\n\n2. You already set up your container to retrieve it from storage.\n\n.. _training-dlc-then-ecs-setenv:\n\nSetup Environment\n-----------------\n\n\n1. Set up an Amazon ECS cluster:\n\tFollow the instructions on `Setting up Amazon ECS for Deep Learning Containers <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-setting-up-ecs.html>`_\n\n2. Define a Training Task:\n\tUse the instruction on the `DLC Training on ECS Tutorial <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-ecs-tutorials-training.html>`_ to define a task and create a service for the appropriate framework.\n\n\tWhen creating tasks for trn1 instances on ECS, be aware of the considerations and requirements listed in `Working with training workloads on Amazon ECS <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-inference.html>`_.\n\n\n3. Use the container image created using :ref:`how-to-build-neuron-container` as the ``image`` in your task definition.\n\n   .. _training_push_to_ecr_note:\n\n   .. note::\n\n       Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image <https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html>`_ for more information.\n"
  },
  {
    "path": "devflows/training/ec2/ec2-training.rst",
    "content": ".. _ec2-training:\n\nTrain your model on EC2\n=======================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n   \nDescription\n-----------\n\n|image|\n \n.. |image| image:: /images/trn1-on-ec2-dev-flow.png\n   :width: 500\n   :alt: Neuron developer flow on EC2\n   :align: middle\n   \nYou can use a single Trn1 instance as a development environment to compile and train Neuron models. In this developer flow, you provision an EC2 Trn1 instance using a Deep Learming AMI (DLAMI) and execute the two steps of the development flow in the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in terminal. Follow the steps bellow to setup your environment.\n\nSetup Environment\n-----------------\n\n1. Launch an Trn1 Instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n    .. include:: /setup/install-templates/launch-trn1-dlami.rst\n\n2. Set up a development environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n   \nEnable PyTorch-Neuron\n~~~~~~~~~~~~~~~~~~~~~\n\n    .. include:: /frameworks/torch/torch-neuronx/setup/install-templates/pytorch-dev-install.txt\n\n3. Set up Jupyter notebook\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo develop from a Jupyter notebook see :ref:`setup-jupyter-notebook-steps-troubleshooting`  \n\nYou can also run a Jupyter notebook as a script, first enable the ML framework Conda or Python environment of your choice and see :ref:`running-jupyter-notebook-as-script` for instructions. \n"
  },
  {
    "path": "devflows/training/ec2-flows.rst",
    "content": "EC2 Flows- Training\n====================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/training/ec2/ec2-training\n\n    \n\n.. include:: /devflows/training/ec2-flows.txt"
  },
  {
    "path": "devflows/training/ec2-flows.txt",
    "content": "* :ref:`ec2-training`\n"
  },
  {
    "path": "devflows/training/parallelcluster/parallelcluster-training.rst",
    "content": ".. _parallelcluster-training:\n\nTrain your model on ParallelCluster\n===================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\nDescription\n------------\n\nThis document explains how to use AWS ParallelCluster to build HPC compute environment \nthat uses Trn1 compute nodes to run your distributed ML training job. Once the nodes are \nlaunched, we will run a training task to confirm that the nodes are working, and use \nslurm commands to check the job status. In this tutorial, we will use AWS `pcluster` command\nto run a yaml file in order to generate the cluster. As an example, we are going to launch\nmultiple Trn1.32xl nodes in our cluster.\n\nWe are going to set up our ParallelCluster infrastructure as below:\n\n.. image:: /images/vpc-setup.png\n\nAs shown in the figure above, inside a VPC, there are two subnets, a public and a private\nones. Head Node resides in the public subnet, while the compute fleet (in this case, trn1\ninstances) are in the private subnet. A Network Address Translation (NAT) gateway is also \nneeded in order for nodes in the private subnet to connect to clients outside the VPC. In \nthe next section, we are going to describe how to set up all the necessary infrastructure \nfor trn1 ParallelCluster.\n\n\nSetup environment\n-----------------\n\n1. Install prerequisite infrastructure:\n\nFollow `these setup <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/about-neuron/network/vpc-subnet-setup.md>`_ instructions to install VPC and all the necessary components for ParallelCluster. \n\n2. Install AWS ParallelCluster in a virtual environment (recommended)\n\nFollow `https://docs.aws.amazon.com/parallelcluster/latest/ug/install-v3-virtual-environment.html`\n\n3. Create and launch ParallelCluster \n\nFollow `these creating cluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/cluster-configs/trn1-16-nodes-pcluster.md>`_ instructions to launch ParallelCluster in the VPC.\n\n1. Launch training job\n\nFollow `these running training <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/dp-bert-launch-job.md>`_ instructions to submit a model training script as a slurm job.\n"
  },
  {
    "path": "devflows/training/parallelcluster-flows.rst",
    "content": "Parallel Cluster Flows- Training\n================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/training/parallelcluster/parallelcluster-training\n\n\n    \n\n.. include:: /devflows/training/parallelcluster-flows.txt"
  },
  {
    "path": "devflows/training/parallelcluster-flows.txt",
    "content": "* :ref:`parallelcluster-training`\n                \n\n"
  },
  {
    "path": "devflows/training/sagemaker-flows.rst",
    "content": "Sagemaker Flows- Training\n=========================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /devflows/training/sm-devflow/sm-training-devflow\n\n    \n\n.. include:: /devflows/training/sagemaker-flows.txt"
  },
  {
    "path": "devflows/training/sagemaker-flows.txt",
    "content": "* :ref:`sm-training-devflow`\n* `AWS Neuron Sagemaker Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-sagemaker-samples>`_\n\n"
  },
  {
    "path": "devflows/training/sm-devflow/sm-training-devflow.rst",
    "content": ".. _sm-training-devflow:\n\nTrain your model on SageMaker\n===================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\nDescription\n------------\n\nSageMaker Training helps you manage cloud computing resources in Amazon EC2, data storage services\nsuch as S3, EFS, and FSx, and security management services such as IAM and VPC. SageMaker Training \nprovides you a complete end-to-end experience of training classical ML and state-of-the-art DL models. \n\nYou can use SageMaker to train models using Trn1 instances (ml.trn1 instance types). \nIn this developer flow, you provision a SageMaker Notebook instance or SageMaker Studio to train \nyour model using the `SageMaker Python SDK <https://sagemaker.readthedocs.io/en/stable/index.html>`_.\n\nThe Amazon SageMaker Python SDK lets you launch training jobs in just a few lines of code with ease. \nAs shown in the below diagram Amazon SageMaker launches Trn1 instances, copies both data and code \nonto the instance. It then runs the training script to generate model artifacts. The trained model \nartifacts are then uploaded to S3 and finally SageMaker will terminate the provisioned instances. \nIn order to speed up the training process for successive runs you can copy the `Neuron Persistent Cache\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuron-caching.html>`_\nto S3 and then copied by future training jobs as they will leverage the cached artifacts. \n(See `Hugging Face fine tuning BERT base model on Amazon SageMaker Tutorial \n<https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/main/training/trn1-bert-fine-tuning-on-sagemaker>`_\nfor an example on how to reuse the compiled cache.)\n\n.. image:: /images/trn1-on-sm-dev-flow.png\n\n\nSetup environment\n-----------------\n\n1. Create an Amazon SageMaker Notebook Instance\n\n   Follow the instructions in `Get Started with Notebook Instances \n   <https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html>`_ or \n   `Use Amazon SageMaker Studio Notebooks <https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html>`_.\n   The Notebook instance provides the required Python SDK for training models with Amazon SageMaker.\n   Please make sure SageMaker Python SDK version is 2.116.0 or later.\n\n2. Train a model using the Amazon SageMaker SDK\n\n   Follow the instructions in `Distributed Training with PyTorch Neuron on Trn1 instances\n   <https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#distributed-training-with-pytorch-neuron-on-trn1-instances>`_.\n   You’ll be able to follow the `Hugging Face fine tuning BERT base model on Amazon SageMaker Tutorial\n   <https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/main/training/trn1-bert-fine-tuning-on-sagemaker>`_.\n\n   .. note::\n     SageMaker support for EC2 Trn1 instance is currently available only for PyTorch Estimator. \n     HuggingFace Estimator will be available in future release.\n"
  },
  {
    "path": "dlami/index.rst",
    "content": ".. meta::\n   :description: Neuron Deep Learning AMIs (DLAMIs) are pre-configured Amazon Machine Images with the Neuron SDK for easy deployment on AWS Inferentia and Trainium instances.\n   :keywords: Neuron DLAMI, Deep Learning AMI, AWS Neuron SDK, Inferentia, Trainium, PyTorch, JAX, TensorFlow, vLLM, SSM Parameters\n   :date-modified: 01/22/2026\n\n.. _neuron-dlami-overview:\n.. _setup-ubuntu22-multi-framework-dlami:\n.. _setup-ubuntu24-multi-framework-dlami:\n\nNeuron DLAMI User Guide\n=======================\n\nThis guide helps you select, configure, and deploy AWS Neuron Deep Learning AMIs (DLAMIs) for running machine learning workloads on AWS Inferentia and Trainium instances. Learn about the different DLAMI types available, pre-installed virtual environments for popular ML frameworks like PyTorch and JAX, and how to automate DLAMI deployment.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nWhat are Neuron DLAMIs?\n------------------------\n\nNeuron Deep Learning AMIs (DLAMIs) are pre-configured Amazon Machine Images that provide the easiest way to get started with the AWS Neuron SDK. Each DLAMI comes with Neuron drivers, frameworks, and libraries pre-installed, enabling you to quickly launch and run deep learning workloads on AWS Inferentia and Trainium instances without manual setup.\n\nNeuron currently supports three types of DLAMIs to meet different deployment needs:\n\n* **Multi-Framework DLAMIs**: Support multiple ML frameworks (PyTorch, JAX, vLLM) with separate virtual environments for each\n* **Single Framework DLAMIs**: Optimized for a specific framework version with focused virtual environments\n* **Base DLAMIs**: Include only Neuron drivers, EFA, and tools - ideal for containerized applications and custom builds\n\nAll Neuron DLAMIs support automated discovery through AWS Systems Manager (SSM) parameters, making them easy to integrate into cloud automation workflows and infrastructure-as-code deployments.\n\n.. note::\n  Starting with version 2.26.1, Neuron DLAMIs no longer support ``Inf1`` instance types due to an incompatibility with the Neuron driver.  \n  If you'd like to run ``Inf1`` workloads, use previous DLAMIs released up to SDK version 2.26.\n\n----\n\nNeuron Multi Framework DLAMI\n----------------------------\n\nNeuron Multi-Framework DLAMIs provide the most comprehensive environment, supporting multiple ML frameworks and libraries in isolated virtual environments. Each DLAMI is pre-installed with Neuron drivers and supports all current Neuron instance types (Inf2, Trn1, Trn1n, Trn2, Trn3). This is the recommended option for teams working with multiple frameworks or exploring different ML libraries.\n\n.. note::\n  Starting with version 2.27.1, AL2023 DLAMIs no longer support ``PyTorch 2.9+`` due to an incompatibility issue with the default GLIB.c installed on AL2023.\n  PyTorch requires GLIB.c 2.35+ and upgrading the version within AL2023 can break other system dependencies. This is the error message:\n  \n  ``ImportError: /lib64/libm.so.6: version `GLIBC_2.35' not found``\n\n  Since the latest vLLM version depends on PyTorch 2.9, we have also removed that environment from the DLAMI.\n  \n  For a workaround, use the latest Ubuntu-based AMIs instead.\n\n\nMulti Framework DLAMIs supported\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - Operating System\n      - Neuron Instances Supported\n      - DLAMI Name\n\n    * - Ubuntu 24.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning AMI Neuron (Ubuntu 24.04)\n\n.. _neuron-dlami-multifw-venvs:\n\n\nVirtual Environments pre-installed\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - Neuron Framework/Libraries supported\n      - Virtual Environment\n\n    * - PyTorch 2.9 Torch NeuronX, NxD Core (Ubuntu 24.04)\n      - /opt/aws_neuronx_venv_pytorch_2_9\n\n    * - PyTorch 2.9 NxD Training, Torch NeuronX (Ubuntu 24.04)\n      - /opt/aws_neuronx_venv_pytorch_2_9_nxd_training\n\n    * - PyTorch 2.9 NxD Inference, Torch NeuronX (Ubuntu 24.04)\n      - /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference\n\n    * - JAX 0.7 NeuronX (Ubuntu 24.04)\n      - /opt/aws_neuronx_venv_jax_0_7\n\n    * - vLLM 0.16.0 NxD Inference, Torch NeuronX (Ubuntu 24.04)\n      - /opt/aws_neuronx_venv_pytorch_inference_vllm_0_16\n\n\nWe have included a setup script that installs required dependencies for the package within the PyTorch 2.9 NxD Training virtual environment. To run this script,\nactivate the virtual environment and run ``setup_nxdt.sh`` and this will run :ref:`the setup steps here <nxdt_installation_guide>`.\n\nYou can easily get started with the multi-framework DLAMI through AWS console by following this :doc:`setup guide </setup/multiframework-dlami>`. If you are looking to \nuse the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to easily retrieve the latest DLAMI id.\n\n----\n\nNeuron Single Framework DLAMI\n-----------------------------\n\nNeuron Single Framework DLAMIs are optimized for specific framework versions, providing a streamlined environment when you know exactly which framework you'll be using. Each DLAMI is pre-installed with Neuron drivers and supports all Neuron instance types. These DLAMIs are ideal for production deployments where you want a focused, framework-specific environment. \n\n\nSingle Framework DLAMIs supported\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - Framework\n      - Operating System\n      - Neuron Instances Supported\n      - DLAMI Name\n\n    * - PyTorch 2.9\n      - Ubuntu 24.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)\n\n    * - JAX 0.7\n      - Amazon Linux 2023\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning AMI Neuron JAX 0.7 (Amazon Linux 2023)\n\n    * - JAX 0.7\n      - Ubuntu 24.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04)\n\n    * - vLLM 0.16.0\n      - Ubuntu 24.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04)\n\n\nVirtual Environments pre-installed\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - DLAMI Name\n      - Neuron Libraries supported\n      - Virtual Environment\n  \n    * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04) \n      - PyTorch 2.9 Torch NeuronX, NxD Core\n      - /opt/aws_neuronx_venv_pytorch_2_9\n\n    * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04) \n      - PyTorch 2.9 NxD Training, Torch NeuronX\n      - /opt/aws_neuronx_venv_pytorch_2_9_nxd_training\n\n    * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04) \n      - PyTorch 2.9 NxD Inference, Torch NeuronX\n      - /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference\n\n    * - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04, Amazon Linux 2023) \n      - JAX NeuronX 0.7\n      - /opt/aws_neuronx_venv_jax_0_7\n\n    * - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04) \n      - vLLM NeuronX 0.16.0\n      - /opt/aws_neuronx_venv_pytorch_inference_vllm_0_16\n\n\nGet started with the single framework DLAMI through AWS console by following one of the corresponding setup guides. If you want to\nuse the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters <ssm-parameter-neuron-dlami>` to retrieve the latest DLAMI id.\n\n----\n\nNeuron Base DLAMI\n-----------------\n\nNeuron Base DLAMIs provide a minimal foundation with only the essential components: Neuron driver, EFA (Elastic Fabric Adapter), and Neuron tools. These DLAMIs are designed for advanced users who want to build custom environments, create containerized applications, or have specific framework version requirements not covered by the pre-configured DLAMIs.\n\n\nBase DLAMIs supported\n^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - Operating System\n      - Neuron Instances Supported\n      - DLAMI Name\n\n    * - Amazon Linux 2023\n      - Inf2, Trn1, Trn1n, Trn2, Trn3 \n      - Deep Learning Base Neuron AMI (Amazon Linux 2023)\n\n    * - Ubuntu 24.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning Base Neuron AMI (Ubuntu 24.04)\n\n    * - Ubuntu 22.04\n      - Inf2, Trn1, Trn1n, Trn2, Trn3\n      - Deep Learning Base Neuron AMI (Ubuntu 22.04)\n\n\n.. _ssm-parameter-neuron-dlami:\n\n----\n\nUsing SSM Parameters for Cloud Automation\n------------------------------------------\n\nNeuron DLAMIs support AWS Systems Manager (SSM) parameters for automated DLAMI discovery and deployment. This enables you to always use the latest Neuron SDK release in your infrastructure-as-code templates, CI/CD pipelines, and auto-scaling configurations without hardcoding AMI IDs.\n\nSSM parameters provide several key benefits:\n\n* **Always up-to-date**: Automatically reference the latest DLAMI with the newest Neuron SDK release\n* **Infrastructure-as-code friendly**: Use in CloudFormation, Terraform, and other IaC tools\n* **Auto Scaling integration**: Update Auto Scaling groups without modifying launch templates\n* **Multi-region support**: Available across all AWS regions where Neuron instances are supported\n\nCurrently, SSM parameters support finding the latest DLAMI ID for each DLAMI type. 
Support for finding specific Neuron SDK version DLAMIs will be added in future releases.\n\n\nFinding specific DLAMI image id with the latest neuron release\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can find the DLAMI that supports latest Neuron SDK by using the SSM get-parameter.\n\n\n.. code-block::\n\n    aws ssm get-parameter \\\n    --region us-east-1 \\\n    --name <dlami-ssm-parameter-prefix>/latest/image_id \\\n    --query \"Parameter.Value\" \\\n    --output text\n\n\nThe SSM parameter prefix for each currently supported DLAMI can be seen below. To discover SSM parameters for older or end-of-life DLAMIs, you can filter by framework, version, or operating system using the path structure ``/aws/service/neuron/dlami/<framework>-<framework-version>/<os>``:\n\n.. code-block::\n\n      # List all Neuron DLAMI SSM parameters\n      aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron --recursive\n\n      # Filter by framework (e.g., all PyTorch 2.8 DLAMIs)\n      aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron/dlami/pytorch-2.8 --recursive\n\n      # Filter by framework and OS\n      aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron/dlami/pytorch-2.8/ubuntu-22.04 --recursive\n\n\nSSM Parameter Prefix\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n.. list-table::\n    :widths: 20 39\n    :header-rows: 1\n    :align: left\n    :class: table-smaller-font-size\n\n    * - AMI Name\n      - SSM parameter Prefix\n\n    * - Deep Learning AMI Neuron (Ubuntu 24.04)\n      - /aws/service/neuron/dlami/multi-framework/ubuntu-24.04\n\n    * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)\n      - /aws/service/neuron/dlami/pytorch-2.9/ubuntu-24.04\n\n    * - Deep Learning AMI Neuron JAX 0.7 (Amazon Linux 2023)\n      - /aws/service/neuron/dlami/jax-0.7/amazon-linux-2023\n\n    * - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04)\n      - /aws/service/neuron/dlami/jax-0.7/ubuntu-24.04\n\n    * - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04)\n      - /aws/service/neuron/dlami/pytorch-inference-vllm-0.16/ubuntu-24.04\n\n    * - Deep Learning Base Neuron AMI (Amazon Linux 2023)\n      - /aws/service/neuron/dlami/base/amazon-linux-2023\n\n    * - Deep Learning Base Neuron AMI (Ubuntu 24.04)\n      - /aws/service/neuron/dlami/base/ubuntu-24.04\n\n    * - Deep Learning Base Neuron AMI (Ubuntu 22.04)\n      - /aws/service/neuron/dlami/base/ubuntu-22.04\n\n\nFor example to find the latest DLAMI id for Multi-Framework DLAMI (Ubuntu 24.04) you can use the following:\n\n.. code-block::\n\n    aws ssm get-parameter \\\n    --region us-east-1 \\\n    --name /aws/service/neuron/dlami/multi-framework/ubuntu-24.04/latest/image_id \\\n    --query \"Parameter.Value\" \\\n    --output text\n\n\nYou can find all available parameters supported in Neuron DLAMis via CLI\n\n.. code-block::\n\n    aws ssm get-parameters-by-path \\\n    --region us-east-1 \\\n    --path /aws/service/neuron \\\n    --recursive\n\n\nYou can also view the SSM parameters supported in Neuron through AWS parameter store by selecting the \"Neuron\" service.\n\n\n\nUse SSM Parameter to launch instance directly via CLI\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nYou can use the AWS CLI to resolve the latest DLAMI ID and launch an instance in a single command. 
This is particularly useful for scripting and automation workflows.\n\nBelow is an example of launching an Inf2 instance using the multi-framework DLAMI (Ubuntu 24.04): \n\n\n.. code-block::\n\n    aws ec2 run-instances \\\n    --region us-east-1 \\\n    --image-id resolve:ssm:/aws/service/neuron/dlami/multi-framework/ubuntu-24.04/latest/image_id \\\n    --count 1 \\\n    --instance-type inf2.48xlarge \\\n    --key-name <my-key-pair> \\\n    --security-groups <my-security-group>\n\n\n\nUse SSM alias in EC2 launch templates\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nSSM Parameters can be used directly in EC2 launch templates, enabling your Auto Scaling groups to automatically use the latest AMI IDs without requiring updates to launch templates or creating new versions each time an AMI ID changes. This significantly simplifies AMI lifecycle management in production environments.\n\nFor more information, see: https://docs.aws.amazon.com/autoscaling/ec2/userguide/using-systems-manager-parameters.html\n\n----\n\nOther Resources\n---------------\n\nLearn more about AWS Deep Learning AMIs and Systems Manager:\n\n* `AWS Deep Learning AMI Developer Guide <https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html>`_\n* `AWS DLAMI Release Notes <https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html>`_\n* `AWS Systems Manager Parameter Store <https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html>`_\n* :doc:`Neuron DLAMI Release Notes </release-notes/components/dlamis>`\n"
  },
  {
    "path": "frameworks/index.rst",
    "content": ".. meta::\n   :description: ML Framework support on AWS Neuron SDK - PyTorch and JAX integration for high-performance machine learning on AWS Inferentia and Trainium.\n   :date-modified: 2026-03-12\n   :keywords: AWS Neuron, machine learning\n\n.. _frameworks-neuron-sdk:\n\nML framework support on AWS Neuron SDK\n=======================================\n\nAWS Neuron provides integration with popular machine learning frameworks, enabling you to accelerate your existing models on AWS Inferentia and Trainium with minimal code changes. Choose from our comprehensive framework support to optimize your inference and training workloads.\n\nFrameworks\n-----------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: PyTorch on AWS Neuron\n        :link: torch/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Complete PyTorch integration for both inference and training on all Neuron hardware.\n\n        * **TorchNeuron Native** - Native PyTorch backend with eager execution and ``torch.compile``\n        * **PyTorch NeuronX (torch-neuronx)** - ``Inf2``, ``Trn1``, ``Trn2`` (inference & training)\n        * See: :doc:`/frameworks/torch/pytorch-native-overview`\n\n    .. grid-item-card:: JAX on AWS Neuron\n        :link: jax/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        **Beta release**\n\n        Experimental JAX support with Neuron Kernel Interface (NKI) integration.\n\n        * **JAX NeuronX** - Neuron hardware support\n        * Research and development focus\n        * **Status**: Beta - active\n\n.. note::\n\n   Looking for TensorFlow, MXNet, or torch-neuron (Inf1) documentation? These frameworks\n   have been archived. See :doc:`/archive/index` for legacy framework documentation.\n\nHardware compatibility matrix\n-----------------------------\n\n.. list-table::\n   :header-rows: 1\n   :class: compatibility-matrix\n\n   * - Framework\n     - Inf2\n     - Trn1/Trn1n\n     - Trn2\n     - Inference\n     - Training\n   * - **torch-neuronx**\n     - ✅\n     - ✅\n     - ✅\n     - ✅\n     - ✅\n   * - **JAX NeuronX**\n     - ✅\n     - ✅\n     - N/A\n     - ✅\n     - N/A\n"
  },
  {
    "path": "frameworks/jax/api-reference-guide/index.rst",
    "content": ".. _jax-neuronx-api-reference-guide:\n\n\n.. meta::\n   :description: API Reference Guide for JAX Neuronx - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, JAX, JAX NeuronX\n   :date-modified: 2026-03-13\n\n\nAPI Reference Guide for JAX Neuronx\n====================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /frameworks/jax/api-reference-guide/neuron-envvars\n\n* :ref:`jax-neuronx-envvars`\n"
  },
  {
    "path": "frameworks/jax/api-reference-guide/neuron-envvars.rst",
    "content": ".. _jax-neuronx-envvars:\n\n\n.. meta::\n   :description: JAX NeuronX Environment Variables - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, JAX, JAX NeuronX\n   :date-modified: 2026-03-13\n\n\nJAX NeuronX Environment Variables\n======================================\n\nEnvironment variables allow modifications to JAX NeuronX behavior\nwithout requiring code change to user script. It is recommended to set\nthem in code or just before invoking the python process, such as\n``NEURON_RT_VISIBLE_CORES=8 python3 <script>`` to avoid inadvertently\nchanging behavior for other scripts. Environment variables specific to\nJAX Neuronx are:\n\n``NEURON_CC_FLAGS``\n\n-  Compiler options. Full compiler options are described in the :ref:`neuronx-cc-training-mixed-precision`.\n\n``XLA_FLAGS``\n\n- When set to ``\"--xla_dump_hlo_snapshots --xla_dump_to=<dir>\"``, this environmental variable enables dumping snapshots in ``<dir>`` directory. See :ref:`torch-neuronx-snapshotting` section for more information. The snapshotting interface for JAX and Pytorch are identical.\n- When set to ``\"--xla_dump_hlo_as_text --xla_dump_hlo_as_proto --xla_dump_to=<dir> --xla_dump_hlo_pass_re='.*'\"``, this environmental variable enables dumping HLOs in proto and text formats after each XLA pass. The dumped ``*.hlo.pb`` files are in HloProto format.\n\n``NEURON_FORCE_PJRT_PLUGIN_REGISTRATION``\n\n- When ``NEURON_FORCE_PJRT_PLUGIN_REGISTRATION=1``, the Neuron PJRT plugin will be registered in JAX regardless of the instance type.\n\n``NEURON_RUN_TRIVIAL_COMPUTATION_ON_CPU``\n\n-  When ``NEURON_RUN_TRIVIAL_COMPUTATION_ON_CPU=1``, the Neuron PJRT plugin will compile and execute \"trivial\" computations on CPU instead of Neuron cores. A \"trivial\" computation is defined as an HLO program that does not contain any collective-compute instructions. The HLO program will be compiled by the XLA CPU compiler and outputs of the computation will be allocated on Neuron cores. The following HLO instructions are considered as collective-compute instructions.\n\n    - ``all-gather``\n    - ``all-gather-done``\n    - ``all-gather-start``\n    - ``all-reduce-done``\n    - ``all-reduce-start``\n    - ``all-to-all``\n    - ``collective-permute``\n    - ``partition-id``\n    - ``replica-id``\n    - ``recv``\n    - ``recv-done``\n    - ``reduce-scatter``\n    - ``send``\n    - ``send-done``\n\n``NEURON_PJRT_PROCESSES_NUM_DEVICES``\n\n- Should be set to a comma-separated list stating the number of NeuronCores used by each worker process. It is used to construct a global device array with its size equal to the sum of the list. This gets reported to the XLA PJRT runtime when requested. Must be set for multi-process executions. It can be used in conjunction with ``NEURON_RT_VISIBLE_CORES`` to expose a limited number of NeuronCores to each worker process. If ``NEURON_RT_VISIBLE_CORES`` is not set, it should be set to available number of NeuronCores on the host. ``NEURON_PJRT_PROCESSES_NUM_DEVICES`` must be less than or equal to ``NEURON_RT_VISIBLE_CORES``.\n\n``NEURON_PJRT_PROCESS_INDEX``\n\n- An integer stating the index (or rank) of the current worker process. This is required for multi-process environments where all workers need to know information on all participating processes. Must be set for multi-process executions. 
The value should be between ``0`` and the total number of worker processes minus 1.\n\n``NEURON_RT_STOCHASTIC_ROUNDING_EN`` **[Neuron Runtime]**\n\n- When ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``, JAX Neuron will use stochastic rounding instead of\n  round-nearest-even for all internal rounding operations when casting from FP32 to a reduced precision data type (FP16, BF16, FP8, TF32).\n  This feature has been shown to improve\n  training convergence for reduced precision training jobs. \n  To switch to round-nearest-even mode, set ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0``.\n\n``NEURON_RT_STOCHASTIC_ROUNDING_SEED`` **[Neuron Runtime]**\n\n- Sets the seed for the random number generator used in stochastic rounding (see previous section). If this environment variable is not set, the seed is set to 0 by default. Please set ``NEURON_RT_STOCHASTIC_ROUNDING_SEED`` to a fixed value to ensure reproducibility between runs.\n\n``NEURON_RT_VISIBLE_CORES`` **[Neuron Runtime]**\n\n- Integer range of specific NeuronCores needed by the process (for example, 0-3 specifies NeuronCores 0, 1, 2, and 3). Use this environment variable when launching processes to limit the launched process to specific consecutive NeuronCores.\n\nAdditional Neuron runtime environment variables are described in :ref:`nrt-configuration`.\n
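\nFor illustration, a hypothetical two-worker launch on a single host with 32 NeuronCores might set the multi-process variables described above as follows. The script name, core ranges, and worker count are placeholders, not a prescribed configuration:\n\n.. code-block:: shell\n\n    # Worker 0 uses NeuronCores 0-15; worker 1 uses NeuronCores 16-31.\n    # NEURON_PJRT_PROCESSES_NUM_DEVICES lists the core count of each worker process.\n    NEURON_RT_VISIBLE_CORES=0-15 NEURON_PJRT_PROCESSES_NUM_DEVICES=16,16 NEURON_PJRT_PROCESS_INDEX=0 python3 train.py &\n    NEURON_RT_VISIBLE_CORES=16-31 NEURON_PJRT_PROCESSES_NUM_DEVICES=16,16 NEURON_PJRT_PROCESS_INDEX=1 python3 train.py\n"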
  },
  {
    "path": "frameworks/jax/index.rst",
    "content": ".. meta::\n   :description: JAX support on AWS Neuron SDK - JAX NeuronX for training and inference on Trn1, Trn2, and Inf2 instances with native JAX device integration.\n   :keywords: JAX, jax-neuronx, libneuronxla, AWS Neuron, Trainium, Inferentia, PJRT, machine learning\n   :date-modified: 01/22/2026\n\n.. _jax-neuron-main:\n\nJAX Support on Neuron\n=====================\n\nJAX running on Neuron unlocks high-performance and cost-effective deep learning acceleration on AWS Trainium-based and AWS Inferentia-based Amazon EC2 instances.\n\nThe JAX NeuronX plugin is a set of modularized JAX plugin packages that integrate AWS Trainium and Inferentia machine learning accelerators into JAX as pluggable devices using the PJRT (Plugin Runtime) mechanism. This enables native JAX device support for Neuron accelerators with minimal code changes.\n\nJAX NeuronX includes the following key components:\n\n* **libneuronxla**: Neuron's integration into JAX's runtime PJRT, built using the PJRT C-API plugin mechanism. Installing this package enables using Trainium and Inferentia natively as JAX devices.\n* **jax-neuronx**: A package containing Neuron-specific JAX features, such as the Neuron NKI JAX interface. It also serves as a meta-package for providing a tested combination of ``jax-neuronx``, ``jax``, ``jaxlib``, ``libneuronxla``, and ``neuronx-cc`` packages.\n\nKey capabilities of JAX NeuronX include:\n\n* **Native JAX device integration**: Seamless integration with JAX through the PJRT C-API plugin mechanism\n* **Flexible installation**: Choose between a production-ready meta-package or custom package combinations\n* **NKI support**: Access to Neuron Kernel Interface (NKI) through the JAX interface for custom kernel development\n* **Broad compatibility**: Support for multiple JAX and jaxlib versions through the PJRT C-API mechanism\n* **Training and inference**: Full support for both training and inference workloads on Trainium and Inferentia instances\n\n.. admonition:: Beta Release\n   :class: note\n\n   JAX NeuronX is currently in beta. Some JAX functionality may not be fully supported. We welcome your feedback and contributions.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   /frameworks/jax/setup/jax-setup\n   /frameworks/jax/setup/jax-neuronx-known-issues\n   /frameworks/jax/api-reference-guide/index\n   Release Notes </release-notes/components/jax>\n\nGet Started\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Setup Guide\n        :link: jax-neuron-setup\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Install and configure JAX NeuronX for Trn1, Trn2, and Inf2 instances.\n\n    .. grid-item-card:: Neuron Kernel Interface (NKI)\n        :link: /nki/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Learn about NKI for custom kernel development with JAX.\n\nReference\n----------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: API Reference Guide\n        :link: jax-neuronx-api-reference-guide\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Comprehensive API reference for JAX NeuronX features and environment variables.\n\n    .. grid-item-card:: Known Issues\n        :link: /frameworks/jax/setup/jax-neuronx-known-issues\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Review known issues and limitations in the current JAX NeuronX release.\n\nRelease Notes\n--------------\n\n.. 
grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: JAX NeuronX Component Release Notes\n        :link: /release-notes/components/jax\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Review the JAX NeuronX release notes for all versions of the Neuron SDK.\n"
  },
  {
    "path": "frameworks/jax/setup/jax-neuronx-known-issues.rst",
    "content": ".. _jax-neuron-known-issues:\n\n\n.. meta::\n   :description: JAX NeuronX Known Issues - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, JAX, JAX NeuronX, Trainium, setup, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nJAX NeuronX Known Issues\n------------------------\n- ``Threefry`` RNG algorithm is not completely supported. Use ``rbg`` algorithm instead. This can be configured by setting the following config option ``jax.config.update(\"jax_default_prng_impl\", \"rbg\")``\n- For JAX versions older than ``0.4.34``, caching does not work out of the box. Use the following to enable caching support,\n\n  .. code:: python\n\n    import jax\n    import jax_neuronx\n    from jax._src import compilation_cache\n\n    compilation_cache.set_cache_dir('./cache_directory')\n\n- For JAX versions older than ``0.4.34``, buffer donation does not work out of the box. Add the following snippet to your script to enable it - ``jax._src.interpreters.mlir._platforms_with_donation.append('neuron')``\n- Mesh configurations which use non-connected Neuron cores might crash during execution. You might observe compilation or Neuron runtime errors for such configurations. Device connectivity can be determined by using ``neuron-ls --topology``.\n- Not all dtypes supported by JAX work on Neuron. Check :ref:`neuron-data-types` for supported data types.\n- ``jax.random.randint`` does not produce expected distribution of randint values. Run it on CPU instead.\n- Dynamic loops are not supported for ``jax.lax.while_loop``. Only static while loops are supported.\n- ``jax.lax.cond`` is not supported.\n- Host callbacks are not supported. As a result APIs based on callbacks from ``jax.debug`` and ``jax.experimental.checkify`` are not supported.\n- ``jax.dlpack`` is not supported.\n- ``jax.experimental.sparse`` is not supported.\n- ``jax.lax.sort`` only supports comparators with LE, GE, LT and GT operations.\n- ``jax.lax.reduce_precision`` is not supported.\n- Certain operations (for example, rng weight initialization) might result in slow compilations. Try to run such operations on the CPU backend or by setting the following environment variable ``NEURON_RUN_TRIVIAL_COMPUTATION_ON_CPU=1``.\n- Neuron only supports ``float8_e4m3`` and ``float8_e5m2`` for FP8 dtypes.\n- Complex dtypes (``jnp.complex64`` and ``jnp.complex128``) are not supported.\n- Variadic reductions are not supported.\n- Out of bound access for scatter/gather operations can result in runtime errors.\n- Dot operations on int dtypes are not supported.\n- ``lax.DotAlgorithmPreset`` is not always respected. Dot operations occur in operand dtypes. This is a configurable parameter for ``jax.lax.dot`` and ``jax.lax.dot_general``.\n"
  },
  {
    "path": "frameworks/jax/setup/jax-setup.rst",
    "content": ".. _jax-neuron-setup:\n\n\n.. meta::\n   :description: JAX NeuronX plugin Setup - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, JAX, JAX NeuronX, setup\n   :date-modified: 2026-03-13\n\n\nJAX NeuronX plugin Setup\n------------------------------\n\nThe JAX NeuronX plugin is a set of modularized JAX plugin packages integrating\nAWS Trainium and Inferentia machine learning accelerators into JAX as pluggable\ndevices. It includes the following Python packages, all hosted on the AWS Neuron\npip repository.\n\n* ``libneuronxla``: A package containing Neuron's integration into JAX's runtime `PJRT <https://openxla.org/xla/pjrt_integration>`__, built using the `PJRT C-API plugin <https://github.com/openxla/xla/blob/5564a9220af230c6c194e37b37938fb40692cfc7/xla/pjrt/c/docs/pjrt_integration_guide.md>`__ mechanism. Installing this package enables using Trainium and Inferentia natively as JAX devices.\n* ``jax-neuronx``: A package containing Neuron-specific JAX features, such as the `Neuron NKI <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/nki_rn.html>`__ JAX interface. It also serves as a meta-package for providing a tested combination of the ``jax-neuronx``, ``jax``, ``jaxlib``, ``libneuronxla``, and ``neuronx-cc`` packages. Making proper use of the features provided in ``jax-neuronx`` will unleash the full potential of Trainium and Inferentia.\n\n.. include:: /setup/install-templates/trn1-ga-warning.txt\n\n.. note:: \n    JAX requires ``Python 3.10`` or newer. Ensure a supported python version is installed on your system prior to installing JAX. See https://docs.aws.amazon.com/linux/al2023/ug/python.html to install newer python versions on Amazon Linux 2023.\n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * To launch an instance, follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_. Make sure to select the correct instance type on the EC2 console.\n    * For more information about instance sizes and pricing, see `Amazon EC2 Trn1 Instances <https://aws.amazon.com/ec2/instance-types/trn1/>`_ and `Amazon EC2 Inf2 Instances <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Ubuntu Server 22 AMI.\n    * When launching a Trn1, adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance.\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    Ubuntu\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 242\n        :end-line: 243\n\n    Amazon Linux 2023\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 239\n        :end-line: 240\n\n.. dropdown::  Install the JAX NeuronX Plugin\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    We provide two methods for installing the JAX NeuronX plugin. The first is to install\n    the ``jax-neuronx`` meta-package from the AWS Neuron pip repository. 
This method provides\n    a production-ready JAX environment where ``jax-neuronx``'s major dependencies, namely\n    ``jax``, ``jaxlib``, ``libneuronxla``, and ``neuronx-cc``, have undergone thorough testing\n    by the AWS Neuron team and will have their versions pinned during installation. \n    **Note:** AL2023 requires setting the correct Python binary (Python 3.10 or newer).\n\n    .. code:: bash\n\n        python3 -m pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n    The second is to install packages ``jax``, ``jaxlib``, ``libneuronxla``,\n    and ``neuronx-cc`` separately, with ``jax-neuronx`` being an optional addition.\n    Because ``libneuronxla`` supports a broad range of ``jaxlib`` versions through\n    the PJRT C-API mechanism, this method provides flexibility when choosing\n    ``jax`` and ``jaxlib`` versions, enabling JAX users to bring the JAX NeuronX plugin\n    into their own JAX environments.\n\n    .. code:: bash\n\n        python3 -m pip install jax==0.6.2 jaxlib==0.6.2\n        python3 -m pip install jax-neuronx libneuronxla neuronx-cc==2.* --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nWe can now run some simple JAX programs on the Trainium or Inferentia\naccelerators.\n\n.. code:: bash\n\n   ~$ python3 -c 'import jax; print(jax.numpy.multiply(1, 1))'\n   Platform 'neuron' is experimental and not all JAX functionality may be correctly supported!\n   .\n   Compiler status PASS\n   1\n\nCompatibility between packages ``jaxlib`` and ``libneuronxla`` can be\ndetermined from `PJRT C-API\nversion <https://github.com/openxla/xla/blob/0d1b60216ea13b0d261d59552a0f7ef20c4f76c5/xla/pjrt/c/pjrt_c_api.h>`__.\nFor more information, see `PJRT integration\nguide <https://github.com/openxla/xla/blob/0d1b60216ea13b0d261d59552a0f7ef20c4f76c5/docs/pjrt/pjrt_integration.md>`__.\n\nTo determine compatible JAX versions, you can use the\n``libneuronxla.supported_clients`` API for querying known supported\nclient packages and their versions.\n\n.. code::\n\n   Help on function supported_clients in module libneuronxla.version:\n\n   supported_clients()\n       Return a description of supported client (jaxlib, torch-xla, etc.) versions,\n       as a list of strings formatted as `\"<package> <version> (PJRT C-API <c-api version>)\"`.\n       For example,\n       >>> import libneuronxla\n       >>> libneuronxla.supported_clients()\n       ['jaxlib 0.4.38 (PJRT C-API 0.58)', 'torch_xla 2.6.0 (PJRT C-API 0.55)', 'torch_xla 2.6.1 (PJRT C-API 0.55)', 'torch_xla 2.7.0 (PJRT C-API 0.61)']\n\nNote that the list of supported client packages and versions covers\nknown versions only and may be incomplete. More versions could be\nsupported, including Google's future ``jaxlib`` releases, assuming the\nPJRT C-API stays compatible with the current release of\n``libneuronxla``. As a result, we avoid specifying any dependency\nrelationship between ``libneuronxla`` and ``jaxlib``. This provides more\nfreedom when coordinating ``jax`` and ``libneuronxla`` installations.\n"
  },
  {
    "path": "frameworks/torch/about/index.rst",
    "content": ".. meta::\n    :description: History and evolution of PyTorch support on AWS Neuron across Inferentia and Trainium platforms\n    :keywords: pytorch, torch-neuron, torch-neuronx, torchneuron, neuron, inferentia, trainium\n    :date-modified: 02/26/2026\n\nAbout PyTorch on AWS Neuron\n===========================\n\nThis topic provides an overview of PyTorch support in Neuron for AWS ``Inf*`` (Inferentia-based) and ``Trn*`` (Trainium-based) ML platforms. \n\nThroughout the past 5 years, AWS Neuron has evolved its PyTorch support to match the capabilities and architectures of successive generations of AWS ML accelerators, delivering three distinct PyTorch implementations optimized for different hardware platforms and use cases:\n\n* **torch-neuron** (2019): Graph-based inference for Inferentia (Inf1)\n* **torch-neuronx** (2022): XLA-based training and inference for Inferentia2 (Inf2) and Trainium (Trn1/Trn2)\n* **TorchNeuron** (2025): Native PyTorch backend for Trainium (Trn2/Trn3) with eager mode and ``torch.compile``\n\nOverview\n--------\n\nAWS Neuron's PyTorch support has evolved through three major implementations, each designed to leverage the unique capabilities of AWS ML accelerators:\n\n1. **torch-neuron** (2019-2026): The original PyTorch integration for AWS Inferentia (Inf1), focused on inference workloads with a graph-based compilation approach\n2. **torch-neuronx** (2022-): An XLA-based PyTorch implementation for AWS Inferentia2 (Inf2) and Trainium (Trn1/Trn2/Trn3), supporting both training and inference with distributed computing capabilities\n3. **TorchNeuron** (2025-): A native PyTorch backend for Trainium that provides eager mode execution, ``torch.compile`` support, and standard PyTorch distributed APIs without requiring XLA\n\nEach implementation represents a significant architectural evolution, reflecting advances in both AWS ML accelerator hardware and PyTorch framework capabilities.\n\ntorch-neuron for Inf1\n---------------------\n\nThe first Neuron library supporting PyTorch, ``torch-neuron``, was initially released in December 2019 alongside the launch of AWS Inferentia. This implementation introduced PyTorch developers to AWS's purpose-built ML inference accelerators.\n\n``torch-neuron`` uses a graph-based compilation approach where PyTorch models are traced and compiled into optimized Neuron Executable File Format (NEFF) binaries. 
The library integrates with PyTorch through custom operators and provides APIs for model compilation (``torch.neuron.trace``) and execution on Inferentia NeuronCores.\n\nKey characteristics of torch-neuron:\n\n* **Target Platform**: AWS Inferentia (Inf1 instances)\n* **Primary Use Case**: Inference workloads\n* **Compilation Approach**: Ahead-of-time (AOT) graph compilation via ``torch.neuron.trace``\n* **Supported Models**: Computer vision models (ResNet, VGG, EfficientNet, YOLO variants), NLP models (BERT, RoBERTa, DistilBERT, MarianMT), and other inference-optimized architectures\n* **Integration Method**: Custom PyTorch operators and tracing API\n\nWhen to choose torch-neuron\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nChoose ``torch-neuron`` when:\n\n* Deploying inference workloads on AWS Inferentia (Inf1) instances\n* Working with models that can be traced and compiled ahead of time\n* Optimizing for inference latency and throughput on first-generation Inferentia hardware\n* Requiring compatibility with existing Inf1-based infrastructure\n\n\ntorch-neuronx for Inf2 and Trn1\n-------------------------------\n\nIn October 2022, AWS introduced Inferentia2 and Trainium, second-generation ML accelerators with enhanced capabilities for both training and inference. To support these platforms, Neuron delivered ``torch-neuronx``, a new PyTorch implementation built on PyTorch/XLA.\n\n``torch-neuronx`` represents a significant architectural shift from torch-neuron, leveraging the XLA (Accelerated Linear Algebra) compiler infrastructure to enable both training and inference workloads. This XLA-based approach provides support for dynamic shapes, control flow, distributed training primitives, and advanced parallelism strategies.\n\nKey characteristics of torch-neuronx:\n\n* **Target Platforms**: AWS Inferentia2 (Inf2 instances) and AWS Trainium (Trn1, Trn1n, Trn2, Trn3 instances)\n* **Primary Use Cases**: Both training and inference workloads\n* **Compilation Approach**: XLA-based compilation with support for dynamic shapes and control flow\n* **Distributed Computing**: Native support for data parallelism, tensor parallelism, pipeline parallelism, sequence parallelism, and Zero Redundancy Optimizer (ZeRO)\n* **Training Capabilities**: Full support for large-scale model training including LLMs (Llama, GPT, BERT families), with gradient accumulation, mixed precision training, and distributed checkpointing\n* **Inference Capabilities**: Support for large language model inference with features like continuous batching, speculative decoding, and quantization\n* **Integration Method**: PyTorch/XLA device backend (``xla`` device type)\n\nThe XLA-based architecture enables torch-neuronx to support advanced training techniques and distributed strategies that were not possible with the original torch-neuron implementation. 
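\nAs a rough illustration of the XLA device model (a hedged sketch using the standard PyTorch/XLA API that torch-neuronx builds on; the training loop, optimizer, and data loading are omitted):\n\n.. code-block:: python\n\n    import torch\n    import torch_xla.core.xla_model as xm\n\n    # torch-neuronx exposes NeuronCores through the PyTorch/XLA ``xla`` device type.\n    device = xm.xla_device()\n    model = torch.nn.Linear(16, 4).to(device)\n    loss = model(torch.randn(8, 16, device=device)).sum()\n    loss.backward()\n    xm.mark_step()  # materialize the lazily recorded graph on the device\n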
This includes support for frameworks like NeuronX Distributed (NxD) for training and inference, Transformers NeuronX for LLM inference, and integration with popular ML libraries like HuggingFace Transformers and PyTorch Lightning.\n\nWhen to choose torch-neuronx\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nChoose ``torch-neuronx`` when:\n\n* Training models on AWS Trainium (Trn1, Trn1n, Trn2) instances\n* Running inference on AWS Inferentia2 (Inf2) instances\n* Requiring distributed training capabilities with tensor parallelism, pipeline parallelism, or data parallelism\n* Working with large language models or other models requiring multi-device training\n* Needing dynamic shape support or control flow in your models\n* Using PyTorch versions 2.5 through 2.9 (XLA-based implementation)\n\n**Note**: PyTorch 2.9 is the last version of torch-neuronx based on PyTorch/XLA. Starting with PyTorch 2.10 support (planned for a future Neuron release), torch-neuronx will transition to the native PyTorch implementation (TorchNeuron).\n\n\nTorchNeuron (Native PyTorch integration)\n----------------------------------------\n\n**TorchNeuron**, the latest evolution of PyTorch support for Neuron, was announced in December 2025 at AWS re:Invent and shipped its initial version as part of Neuron release 2.27.0. While it retains the same Python package name as its predecessor (``torch-neuronx``), TorchNeuron is an entirely new native PyTorch backend developed specifically for Trainium platforms.\n\nTorchNeuron represents a fundamental architectural shift from XLA-based compilation to native PyTorch integration through the PrivateUse1 device backend mechanism. This native integration enables PyTorch code to run on Trainium with minimal modifications, supporting both eager mode execution for rapid iteration and ``torch.compile`` for production optimization.\n\nKey characteristics of TorchNeuron:\n\n* **Target Platforms**: AWS Trainium (Trn2, Trn3 instances)\n* **Primary Use Cases**: Training and inference workloads with native PyTorch workflows\n* **Execution Modes**: \n  \n  * **Eager Mode**: Immediate operation execution for interactive development and debugging\n  * **torch.compile**: Just-in-time (JIT) compilation via TorchDynamo for optimized performance\n\n* **Distributed APIs**: Native support for standard PyTorch distributed primitives:\n  \n  * Fully Sharded Data Parallel (FSDP)\n  * Distributed Tensor (DTensor)\n  * Distributed Data Parallel (DDP)\n  * Tensor Parallelism (TP)\n\n* **Integration Method**: Native PyTorch backend via PrivateUse1 mechanism (``neuron`` device type)\n* **Ecosystem Compatibility**: Works with TorchTitan, HuggingFace Transformers, and other PyTorch ecosystem tools with minimal code changes\n* **Custom Kernels**: Integration with Neuron Kernel Interface (NKI) for performance-critical operations\n* **Open Source**: Available on GitHub under Apache 2.0 license\n\nTorchNeuron's native integration eliminates the need for XLA-specific APIs and enables researchers and ML developers to use familiar PyTorch patterns. The eager mode support provides immediate feedback during development, while ``torch.compile`` delivers production-grade performance through hardware-specific optimizations.\n\nThe implementation includes Adaptive Eager Execution, which applies optimizations like operator fusion while maintaining functional accuracy and debuggability. 
This approach provides a balance between development velocity and runtime performance.\n\nWhen to choose TorchNeuron\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nChoose **TorchNeuron** (native PyTorch) when:\n\n* Training models on AWS Trainium (Trn2, Trn3) instances with PyTorch 2.10 or later\n* Requiring eager mode execution for interactive development and debugging\n* Using standard PyTorch distributed training APIs (FSDP, DTensor, DDP)\n* Working with PyTorch ecosystem tools like TorchTitan or HuggingFace Transformers\n* Needing minimal code changes to run existing PyTorch code on Trainium\n* Leveraging ``torch.compile`` for production optimization\n* Developing custom kernels with Neuron Kernel Interface (NKI)\n\n**Migration Note**: Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron. Users on PyTorch 2.9 or earlier will need to update their scripts when upgrading to PyTorch 2.10 or later. See :ref:`native-pytorch-trainium` for complete migration guidance.\n\n\nRead More\n---------\n\n**Training Resources**\n\n* :doc:`Training with torch-neuronx </frameworks/torch/training-torch-neuronx>` - Training guides and tutorials for Trainium\n* :doc:`PyTorch Neuron Programming Guide </frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide>` - Core concepts for training on Neuron\n* :doc:`NeuronX Distributed (NxD) Training </libraries/nxd-training/index>` - Distributed training library for large-scale models\n* :doc:`PyTorch Training Tutorials </frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx>` - Step-by-step training examples\n\n**Inference Resources**\n\n* :doc:`Inference with torch-neuronx </frameworks/torch/inference-torch-neuronx>` - Inference guides for Inf2 and Trn1/Trn2\n* :doc:`Inference with torch-neuron </archive/torch-neuron/inference-torch-neuron>` - Inference guides for Inf1\n* :doc:`NeuronX Distributed Inference (NxDI) </libraries/nxd-inference/index>` - Inference library for large language models\n* :ref:`torch-neuron vs torch-neuronx Comparison <torch-neuron_vs_torch-neuronx>` - Detailed comparison for inference workloads\n\n**Architecture and Hardware**\n\n* :doc:`AWS Inferentia Architecture </about-neuron/arch/neuron-hardware/inferentia>` - Inf1 hardware architecture\n* :doc:`AWS Inferentia2 Architecture </about-neuron/arch/neuron-hardware/inferentia2>` - Inf2 hardware architecture\n* :doc:`AWS Trainium Architecture </about-neuron/arch/neuron-hardware/trainium>` - Trn1 hardware architecture\n* :doc:`AWS Trainium2 Architecture </about-neuron/arch/neuron-hardware/trainium2>` - Trn2 hardware architecture\n* :doc:`AWS Trainium3 Architecture </about-neuron/arch/neuron-hardware/trainium3>` - Trn3 hardware architecture\n\n\n"
  },
  {
    "path": "frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.rst",
    "content": ".. _torch-neuron_vs_torch-neuronx:\n\n\n.. meta::\n   :description: Comparison of |torch-neuron| (|Inf1|) versus |torch-neuronx| (|Inf2| & |Trn1|) for Inference - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nComparison of |torch-neuron| (|Inf1|) versus |torch-neuronx| (|Inf2| & |Trn1|) for Inference\n============================================================================================\n\n.. warning::\n\n   ``torch-neuron`` (Inf1) has been archived and is no longer actively developed.\n   For new inference workloads, use :doc:`TorchNeuron Native </frameworks/torch/pytorch-native-overview>`\n   (recommended for Trn2/Trn3) or ``torch-neuronx`` (for Inf2/Trn1). The archived\n   torch-neuron documentation is available at :doc:`/archive/torch-neuron/index`.\n\nNeuron now supports multiple instance types for inference. The choice of\ninstance should be motivated primarily by the performance needs of the\napplication, the instance pricing, and model compatibility.\n\nIn prior releases, |torch-neuron| *only supported inference* and\n|torch-neuronx| *only supported training*. While |torch-neuron| will never\nbe updated to support training, |torch-neuronx| now supports both *inference and\ntraining*.\n\n.. note::\n\n    **Recommendation**: For new inference workloads, use\n    :doc:`TorchNeuron Native </frameworks/torch/pytorch-native-overview>` (Trn2/Trn3)\n    or |torch-neuronx| (|Inf2| & |Trn1|). |torch-neuron| (|Inf1|) is archived and\n    should only be used for existing applications that have not yet migrated.\n\n\nFramework Comparison\n--------------------\n\nExample\n~~~~~~~\n\nThe following scripts are identical model compilations performed using each\nframework. The lines that are changed are highlighted to show where the\ndifferences occur.\n\n\n.. tab-set::\n\n    .. tab-item:: torch-neuron\n\n        .. code-block:: python\n            :emphasize-lines: 3, 8\n\n            import torch\n            import torchvision\n            import torch_neuron\n\n            model = torchvision.models.resnet50(pretrained=True).eval()\n            image = torch.rand(1, 3, 224, 224)\n\n            trace = torch_neuron.trace(model, image)\n\n    .. tab-item:: torch-neuronx\n\n        .. code-block:: python\n            :emphasize-lines: 3, 8\n\n            import torch\n            import torchvision\n            import torch_neuronx\n\n            model = torchvision.models.resnet50(pretrained=True).eval()\n            image = torch.rand(1, 3, 224, 224)\n\n            trace = torch_neuronx.trace(model, image)\n\n\nHardware Features\n~~~~~~~~~~~~~~~~~\n\nThe |torch-neuron| framework supports |Inf1| instances and the |torch-neuronx|\nframework supports |Inf2| & |Trn1| instances. These instances have different\n|architectures|, networking configurations, and capabilities due to the\nNeuronCore versions used.\n\nModels compiled with |torch-neuron| produce artifacts which are *only*\ncompatible with |NeuronCore-v1|. Models compiled with |torch-neuronx| produce\nartifacts which are *only* compatible with |NeuronCore-v2|. This also means\nthat models that were previously compiled with |torch-neuron| for |Inf1| are\nnot forwards compatible with |Inf2| & |Trn1| instances. 
Likewise, models compiled\nwith |torch-neuronx| for |Inf2| & |Trn1| are not backwards compatible with |Inf1|.\n\n|NeuronCore-v2| is capable of higher throughput and lower latency than\n|NeuronCore-v1| due to more powerful compute engines and improved memory\nbandwidth. |NeuronCore-v2| can also support larger models since more\nmemory is available per NeuronCore. The hardware differences between\nNeuronCore versions means that models compiled with |torch-neuronx| will\nusually outperform models compiled with |torch-neuron|.\n\nIn cases where throughput may be similar across instance-types, instances using\n|NeuronCore-v2| tend to achieve *significantly lower* latency than instances\nusing |NeuronCore-v1|. This can enable applications that require extremely fast\nresponse time.\n\nSee the :ref:`benchmark` page for the most up-to-date performance metrics.\n\nBesides performance benefits, |NeuronCore-v2| also has more hardware\ncapabilities compared to |NeuronCore-v1|. For example, |NeuronCore-v2|\nsupports a greater variety of data types and introduces a new fully programmable\nGPSIMD-Engine.\n\nNote that ``Trn`` instance-types are optimized for training purposes. Some\n``Trn`` features (such as inter-chip networking) may be unnecessary\nfor inference applications that do not require distribution across multiple\nNeuronCores.\n\n\nSoftware Features\n~~~~~~~~~~~~~~~~~\n\nThe |torch-neuron| framework uses :func:`torch_neuron.trace` to\ncreate a TensorFlow GraphDef protobuf intermediate representation (IR) of the\nmodel compute graph. This is compiled to a binary Neuron Executable File Format\n(NEFF) with the |neuron-cc| compiler.\n\nThe |torch-neuronx| framework uses :func:`torch_neuronx.trace` with\ntorch-xla_ to create a HloModule protobuf IR of the model compute graph. This is\ncompiled to a binary executable NEFF with the |neuronx-cc| compiler.\n\nThe use of different compiler versions means that separate flags are supported\nby each framework. For example:\n\n- :ref:`neuroncore-pipeline` is supported in |neuron-cc| but is not supported\n  in |neuronx-cc|. However, this feature is much less useful when using the\n  |NeuronCore-v2| architecture due to significant memory improvements.\n- Mixed precision flags will differ across the compilers. |neuronx-cc| improves\n  the flags by making the behavior more explicit and streamlined:\n\n  - |neuron-cc-mixed-precision|\n  - |neuronx-cc-mixed-precision|\n\nSince the python graph recording methods used by the frameworks are much\ndifferent, this may lead to different levels of model support. To view the\nmodels which are known to be working, many compilation samples are provided for\neach framework:\n\n- `torch-neuron Samples`_\n- `torch-neuronx Samples`_\n\nFramework model support may also be affected by the graph partitioning feature.\nIn |torch-neuron|, the :func:`torch_neuron.trace` API provides the ability to\nfall back to CPU for operations that are not supported directly by Neuron. 
This\nfallback behavior is currently not supported by :func:`torch_neuronx.trace`,\nhowever, certain operations that were previously not well-supported\nin |torch-neuron| are now supported in |torch-neuronx| by default (e.g.\n:class:`torch.nn.Embedding`).\n\n\nFeature Summary\n~~~~~~~~~~~~~~~\n\n+-----------------------+-----------------------------+-----------------------------+\n|                       | `torch-neuron`              | `torch-neuronx`             |\n+=======================+=============================+=============================+\n| Supported Instances   | |Inf1|                      | |Inf2| & |Trn1|             |\n+-----------------------+-----------------------------+-----------------------------+\n| Inference Support     | Yes                         | Yes                         |\n+-----------------------+-----------------------------+-----------------------------+\n| Training Support      | No                          | Yes                         |\n+-----------------------+-----------------------------+-----------------------------+\n| Architecture          | |NeuronCore-v1|             | |NeuronCore-v2|             |\n+-----------------------+-----------------------------+-----------------------------+\n| Model Support         | |model-support-v1|          | |model-support-v2|          |\n+-----------------------+-----------------------------+-----------------------------+\n| Trace API             | :func:`torch_neuron.trace`  | :func:`torch_neuronx.trace` |\n+-----------------------+-----------------------------+-----------------------------+\n| NeuronCore Pipeline   | Yes                         | No                          |\n+-----------------------+-----------------------------+-----------------------------+\n| Partitioning          | Yes                         | No                          |\n+-----------------------+-----------------------------+-----------------------------+\n| IR                    | GraphDef                    | HLO                         |\n+-----------------------+-----------------------------+-----------------------------+\n| Compiler              | |neuron-cc|                 | |neuronx-cc|                |\n+-----------------------+-----------------------------+-----------------------------+\n| Samples               | `torch-neuron Samples`_     | `torch-neuronx Samples`_    |\n+-----------------------+-----------------------------+-----------------------------+\n\n\nReferences\n----------\n\nTo determine if a model is already supported in a given framework, it is\nrecommended to check the existing documentation for specific models. In order\nof reference quality, the following pages can be checked prior to compiling a\nmodel:\n\n1. :ref:`benchmark`: Models that are available here have been optimized to\n   maximize throughput and/or latency. These metrics are updated frequently as\n   improvements are made. Since metrics are published for different instance\n   types, this can provide a direct performance comparison between instances.\n   Note that the exact models and configurations may differ across instances.\n2. `Neuron GitHub Samples`_: Provides simple examples of compiling and executing\n   models. Compared to the benchmarks, this reference is only\n   intended to show *how* to run a particular model on Neuron. This only\n   validates if a framework supports a given model.\n\nIf a model does not appear in any of these references, the last option is\nto attempt to compile the model to see how it performs. 
In the case that an\nerror occurs during compilation, please file a ticket in the\n`Neuron SDK Github Issues`_.\n\n\n.. |neuron-cc-mixed-precision| replace:: :ref:`neuron-cc-training-mixed-precision`\n.. |neuronx-cc-mixed-precision| replace:: :ref:`neuronx-cc-training-mixed-precision`\n.. |Inf1| replace:: :ref:`Inf1 <aws-inf1-arch>`\n.. |Trn1| replace:: :ref:`Trn1 <aws-trn1-arch>`\n.. |Inf2| replace:: :ref:`Inf2 <aws-inf2-arch>`\n.. |architectures| replace:: architectures\n.. |NeuronCore-v1| replace:: :ref:`NeuronCore-v1 <neuroncores-v1-arch>`\n.. |NeuronCore-v2| replace:: :ref:`NeuronCore-v2 <neuroncores-v2-arch>`\n.. |neuron-cc| replace:: :ref:`neuron-cc <neuron-compiler-cli-reference>`\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n.. |torch-neuron| replace:: :ref:`torch-neuron <inference-torch-neuron>`\n.. |torch-neuronx| replace:: :ref:`torch-neuronx <inference-torch-neuronx>`\n.. |model-support-v1| replace:: Architecture Fit NeuronCore-v1\n.. |model-support-v2| replace:: Architecture Fit NeuronCore-v2\n\n.. _Neuron GitHub Samples: https://github.com/aws-neuron/aws-neuron-samples\n.. _torch-neuron Samples: https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuron\n.. _torch-neuronx Samples: https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx\n.. _torch-xla: https://github.com/pytorch/xla\n.. _Neuron SDK Github Issues: https://github.com/aws-neuron/aws-neuron-sdk/issues"
  },
  {
    "path": "frameworks/torch/index.rst",
    "content": ".. meta::\n   :description: PyTorch support on AWS Neuron SDK - TorchNeuron Native for eager execution and torch.compile on Trainium and Inferentia, with torch-neuronx XLA-based support for training and inference.\n   :keywords: PyTorch, TorchNeuron, torch-neuronx, AWS Neuron, Trainium, Inferentia, deep learning, torch.compile, eager mode\n   :date-modified: 01/22/2026\n\n.. _neuron-pytorch:\n.. _pytorch-neuronx-main:\n\nPyTorch Support on Neuron\n==========================\n\nPyTorch running on Neuron unlocks high-performance and cost-effective deep learning acceleration on AWS Trainium-based and AWS Inferentia-based Amazon EC2 instances.\n\nThe PyTorch plugin for Neuron architecture enables native PyTorch models to be accelerated on Neuron devices, so you can use your existing framework application and get started easily with minimal code changes.\n\nPyTorch Neuron support is available at three levels:\n\n* **TorchNeuron Native** *(recommended)*: The newest native PyTorch backend providing eager execution, ``torch.compile``, and standard distributed APIs (FSDP, DTensor, DDP, Tensor Parallelism) for Trainium and Inferentia. This is the recommended starting point for new workloads.\n* **PyTorch NeuronX (torch-neuronx)** *(supported)*: The XLA-based PyTorch integration supporting NeuronCores v2 architecture (Trn1, Trn2, Inf2, Trn1n). Provides full capabilities for both training and inference workloads.\n* **PyTorch Neuron (torch-neuron)** *(archived)*: The legacy PyTorch integration for NeuronCores v1 architecture (Inf1 only). This package is no longer actively developed. See :doc:`/archive/torch-neuron/index` for reference documentation.\n\n.. admonition:: Which Neuron framework for PyTorch should I select?\n\n   For help selecting a framework type for inference, see:\n   *  :doc:`/frameworks/torch/about/index`\n   *  :ref:`torch-neuron_vs_torch-neuronx`\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    About PyTorch on Neuron </frameworks/torch/about/index>\n    Native PyTorch </frameworks/torch/pytorch-native-overview>\n    PyTorch Setup </frameworks/torch/torch-setup>\n    Training </frameworks/torch/training-torch-neuronx>\n    Inference </frameworks/torch/inference-torch-neuronx>\n    torch-neuron v. torch-neuronx </frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference>\n    Release Notes </release-notes/components/pytorch>\n\nGet Started\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: TorchNeuron Native Backend Overview\n        :link: /frameworks/torch/pytorch-native-overview\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        **Recommended for new workloads** — Learn about the native PyTorch backend with eager execution, ``torch.compile`` support, and standard distributed APIs for Trainium and Inferentia.\n\n    .. grid-item-card:: Setup Guide\n        :link: /frameworks/torch/torch-setup\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Install and configure PyTorch NeuronX for your environment.\n\nTraining & Inference\n---------------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Training on Trn1 and Trn2\n        :link: training-torch-neuronx\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Train models using PyTorch NeuronX on Trainium instances.\n\n    .. 
grid-item-card:: Inference on Inf2, Trn1, and Trn2\n        :link: inference-torch-neuronx\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Deploy inference workloads using PyTorch NeuronX.\n\nRelease Notes\n--------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: PyTorch Neuron Component Release Notes\n        :link: /release-notes/components/pytorch\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Review the PyTorch Neuron release notes for all versions of the Neuron SDK.\n\n.. note::\n\n   Looking for torch-neuron (Inf1) documentation? The torch-neuron package has been\n   archived. See :doc:`/archive/torch-neuron/index` for legacy Inf1 documentation.\n"
  },
  {
    "path": "frameworks/torch/inference-torch-neuronx.rst",
    "content": ".. _inference-torch-neuronx:\n\n\n.. meta::\n   :description: Inference with ``torch-neuronx`` (Inf2 & Trn1/Trn2) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nInference with ``torch-neuronx`` (Inf2 & Trn1/Trn2)\n====================================================\n\nDeploy inference workloads using PyTorch NeuronX on Inf2, Trn1, and Trn2 instances.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Tutorials </frameworks/torch/torch-neuronx/tutorials/inference/tutorials-torch-neuronx>\n    Additional Examples </frameworks/torch/torch-neuronx/additional-examples-inference-torch-neuronx>\n    API Reference Guide </frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx>\n    Developer Guide  </frameworks/torch/torch-neuronx/programming-guide/inference/index>\n    Misc  </frameworks/torch/torch-neuronx/misc-inference-torch-neuronx>\n\nGet Started\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Setup (``torch-neuronx``)\n        :link: setup-torch-neuronx\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Install and configure PyTorch NeuronX for inference workloads on Inf2, Trn1, and Trn2 instances.\n\nTutorials\n----------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Inference Tutorials\n        :link: /frameworks/torch/torch-neuronx/tutorials/inference/tutorials-torch-neuronx\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Step-by-step tutorials including BERT, TorchServe, LibTorch C++, ResNet50, and T5 inference.\n\nReference\n----------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: API Reference Guide\n        :link: /frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Inference API reference for PyTorch NeuronX, including trace, replace weights, core placement, and data parallel APIs.\n\n    .. grid-item-card:: Developer Guide\n        :link: /frameworks/torch/torch-neuronx/programming-guide/inference/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        In-depth developer guide covering core placement, trace vs XLA, data parallelism, and auto-bucketing.\n\nAdditional Resources\n---------------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Additional Examples\n        :link: /frameworks/torch/torch-neuronx/additional-examples-inference-torch-neuronx\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        More inference examples and sample code from the AWS Neuron Samples repository.\n\n    .. grid-item-card:: Misc\n        :link: /frameworks/torch/torch-neuronx/misc-inference-torch-neuronx\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Supported operators, release notes, and additional inference resources.\n"
  },
  {
    "path": "frameworks/torch/pytorch-native-overview.rst",
    "content": ".. _native-pytorch-trainium:\n\n.. meta::\n    :description: Documentation Landing Page for TorchNeuron, the native PyTorch backend for AWS Trainium\n    :date-modified: 12/02/2025\n    :keywords: AWS Neuron, PyTorch\n\nNative PyTorch for AWS Trainium\n==================================\n\nOverview\n--------\n\n``TorchNeuron`` is an open-source PyTorch backend that provides native PyTorch framework integration for AWS Trainium. TorchNeuron provides support for eager mode, ``torch.compile``, and standard PyTorch native distributed APIs.\n\n.. image:: /images/torchneuron/pytorch-native-neuron-stack.png\n\nTorchNeuron\n------------------\n\n.. important::\n    TorchNeuron is currently only available as part of a closed Beta program. If you would like to participate, contact your AWS Neuron support representative.\n\n``TorchNeuron`` is an open-source PyTorch extension that provides new native backend integration for AWS Trainium. The implementation includes support for eager mode for rapid iteration and experimentation, ``torch.compile`` for just-in-time compilation, and standard distributed processing APIs. \nTorchNeuron enables ecosystem compatibility and supports custom kernel development through the Neuron Kernel Interface (NKI) for performance optimization and research applications.\n\nPyTorch Eager Mode \n---------------------------\n\nIn eager mode, operations are dispatched and execute immediately upon invocation. PyTorch's dispatcher routes tensor operations to the Neuron backend, which provides optimized implementations of ``ATen`` operators (core tensor operations) and distributed communication operators. These primitives execute directly on AWS Trainium hardware.\n\nAdaptive Eager Execution\n^^^^^^^^^^^^^^^^^^^^^^^^^\n``TorchNeuron`` implements *Adaptive Eager Execution* to improve performance while maintaining functional accuracy and debuggability. \nAdaptive Eager Execution applies optimizations such as operator fusion while guaranteeing identical stream order semantics and numerical accuracy. \n\ntorch.compile Support\n----------------------\n\n``TorchNeuron`` supports ``torch.compile``, enabling developers to JIT-compile some or all of their PyTorch code to improve performance on AWS Trainium. \n``TorchNeuron`` implements a custom backend for `TorchDynamo <https://docs.pytorch.org/docs/stable/torch.compiler_dynamo_overview.html>`__ that receives the forward and backward FX graphs and transforms them into optimized AWS Trainium instructions.\n\nThe ``TorchNeuron`` backend fully supports the caching mechanism provided by ``TorchDynamo``.\n\nCompilation Process\n^^^^^^^^^^^^^^^^^^^\n\nWhen ``torch.compile`` is applied to a model on AWS Trainium:\n\n1. ``TorchDynamo`` captures Python bytecode and extracts PyTorch operations into an FX Graph during the forward pass.\n2. ``AOT Autograd`` generates forward and backward graphs.\n3. The Neuron Backend receives both FX graphs and lowers them to Neuron IR.\n4. The Neuron Compiler applies hardware-specific optimizations and generates Trainium instructions for execution on the hardware.\n\nDistributed Inference and Training Support\n-------------------------------------------\n\n``TorchNeuron`` offers support for PyTorch distributed APIs, such as those included in the ``torch.distributed`` module, to support collective communications across sharded models, such as ``torch.distributed.all_reduce()``). 
\nHigher-level distributed training tools and techniques such as ``FSDP (Fully Sharded Data Parallel)`` and ``DTensor (Distributed Tensor)`` are implemented using these ``torch.distributed`` primitives to provide model parallelism and data parallelism strategies.\n\nThe Trainium backend supports the following ``torch.distributed`` APIs and techniques:\n\n* Fully Sharded Data Parallel (FSDP)\n* Distributed Tensor (DTensor)\n* Distributed Data Parallel (DDP)\n* Tensor Parallelism (TP)\n\nSupport for additional parallelism strategies such as Pipeline Parallelism (PP) will be available soon. \n\n\nNeuron Kernel Interface (NKI) Integration\n-------------------------------------------\n\n``TorchNeuron`` integrates with the ``Neuron Kernel Interface (NKI)``, enabling the development, optimization, and execution of custom operators.\n\nNKI provides fine-grained control beyond adaptive eager execution and ``torch.compile``. Developers can call performance-critical NKI kernels within training code to replace sequences of standard PyTorch operations. NKI kernels function in both eager and ``torch.compile`` modes, supporting:\n\n* Immediate execution and debugging capabilities in eager mode for rapid iteration\n* Graph-level optimizations with ``torch.compile`` for production deployment\n\nNKI kernels integrate with native PyTorch code through the ``@nki.jit`` decorator and ``@nki_op`` for custom op registration. \nTraining models that include NKI kernels requires a backward version of the custom op, implemented using the `register_autograd() <https://docs.pytorch.org/docs/stable/library.html#torch.library.register_autograd>`__ function.\n\n.. _pytorch_faqs:\n\nFAQs\n---------\n\nGetting Started FAQ\n^^^^^^^^^^^^^^^^^^^\n\n**Q: What is TorchNeuron?**\n\nTorchNeuron is an open-source native PyTorch backend for AWS Trainium that integrates through PyTorch's standard PrivateUse1 device backend mechanism. TorchNeuron supports both eager mode execution and ``torch.compile``. TorchNeuron is open source and initially available on GitHub at aws-neuron/torch-neuronx.\n\n**Q: What changes are needed to run my PyTorch code on Trainium?**\n\nRunning your PyTorch code on Trainium requires minimal changes, organized below by execution mode and common configuration:\n\nFor Eager Mode Execution:\n\nMinimal changes listed below:\n\n* Device placement: Change ``.to('cuda')`` to ``.to('neuron')``\n* ``torch.accelerator`` API: If your code uses ``torch.accelerator``, no changes are needed (automatic device detection)\n* Mixed precision: Use standard ``torch.autocast(device_type=\"neuron\")`` API with automatic datatype conversion following PyTorch CUDA conventions\n* Distributed training: Native support for FSDP, DTensor, Tensor Parallelism, and Distributed Data Parallel with no code modifications required, except for sharding configurations which depend on the number of NeuronCores (which can be different from the number of GPUs)\n* Sharding (Parallelism) Configuration: On Trainium, the unit of distribution is the NeuronCore, the heterogeneous compute unit that powers Trainium. Configure sharding strategies based on available Trainium instance and NeuronCores per Trainium chip, which depends on model and workload requirements. For some parallelism strategies like Tensor Parallelism, you need to specify how many NeuronCores are used for sharding. 
For other strategies like FSDP, no configuration changes are needed.\n\nFor ``torch.compile``:\n\nOn top of the minimal changes listed for Eager Mode, the following two changes are needed for ``torch.compile``:\n\n* Specify ``backend=\"neuron\"`` (specifically, ``@torch.compile(backend=\"neuron\")``)\n* Remove CUDA-specific parameters like ``mode=\"max-autotune-no-cudagraphs\"``\n\n**Q: What is NKI and when and how should I use it?**\n\nNKI (Neuron Kernel Interface) is TorchNeuron's kernel programming interface for creating custom operators optimized for Trainium hardware. NKI uses definition and registration patterns similar to Triton's, providing a familiar workflow for developers.\n\nWhen to use NKI:\n\n* For performance optimization requiring low-level hardware control\n* For novel research requiring operations not yet expressible in standard PyTorch\n\nHow to use NKI:\n\n* Import ``torch_neuronx``\n* Define kernels using the ``@nki.jit`` decorator for low-level hardware control\n* Register them as PyTorch operators with the ``@nki_op`` decorator for seamless integration\n* Provide explicit type signatures like ``(x: torch.Tensor) -> torch.Tensor``\n* For training, add autograd support via the ``register_autograd()`` method for custom backward passes\n\nNKI kernels work in both eager execution and ``torch.compile``, integrating seamlessly with PyTorch's custom op registration system.\n\n**Q: Do I need to import torch_neuronx when not using NKI?**\n\nNo, the ``torch_neuronx`` import is only needed when using NKI kernels (via ``nki.jit``).\n\nIn PyTorch 2.9, PyTorch introduced a feature that allows custom backends to autoload their device and thereby register their backend. TorchNeuron follows the same setup, which removes the need to import the package for device registration. For more details, see Autoloading Out-of-Tree Extension.\n\n**Q: What changes are needed to run TorchTitan on Trainium?**\n\nRunning TorchTitan on Trainium requires minimal code changes:\n\nFor Eager Mode:\n\n* Zero code changes required. TorchTitan's automatic device detection discovers Trainium hardware automatically.\n\nFor ``torch.compile``:\n\nMinimal changes required:\n\n* Specify ``backend=\"neuron\"``\n* Remove CUDA-specific parameters\n\nFor Mixed Mode (Eager + torch.compile):\n\n* When combining eager execution with components that use ``torch.compile`` (like FlexAttention), apply the ``torch.compile`` changes only to those specific components.\n\nParallelism Configuration:\n\n* Configure the sharding strategy based on your hardware. For example, set ``NGPU=64`` for 16 Trainium2 chips (4 NeuronCores per chip). On Trainium, the unit of distribution is the NeuronCore, and you must specify how many NeuronCores are used based on your model and parallelism strategy.\n\nOpen Source & Development FAQ\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Q: Will TorchNeuron be fully open source with GitHub-first development?**\n\nYes. We are setting up the infrastructure for GitHub-first development in early 2026.\n\n**Q: Will TorchNeuron have open source CI/CD and nightly benchmarks similar to other PyTorch backends?**\n\nYes, TorchNeuron will have open source CI/CD before it reaches GA, in the same way we supported and enabled PyTorch CI/CD for AArch64 on Graviton. We will provide testing and benchmarking infrastructure comparable to PyTorch CUDA, ROCm, and Intel XPU.\n\n**Q: When will TorchNeuron move from out-of-tree to in-tree in the main PyTorch repository?**\n\nOur ultimate goal is to move in-tree. 
However, we are starting out-of-tree because it is the fastest way to provide value to our customers and allows faster iteration early on. Regardless of placement, our goal is to have full CI/CD integration as part of PyTorch's CI/CD infrastructure, even while out-of-tree. We are actively in discussions with PyTorch on the best path forward.\n\nTorch.compile FAQ\n^^^^^^^^^^^^^^^^^^^\n\n**Q: Does the Neuron Backend for TorchDynamo use Inductor?**\n\nThis may evolve in future releases. For this release, the Neuron Backend for TorchDynamo provides native ``torch.compile`` support without using Inductor.\n\n**Q: What does the Neuron TorchDynamo backend generate: kernels or hardware instructions?**\n\nThe Neuron backend generates Neuron IR, which may include NKI kernels passed as custom ops in the FX graphs. The Neuron Compiler then generates Trainium instructions from the IR.\n\n**Q: Does the Neuron TorchDynamo backend support overlapping compute and communication operations?**\n\nThe overlapping functionality is supported by the Neuron Compiler itself, not by the TorchDynamo backend. \n\n**Q: When using torch.compile, does TorchNeuron support graph breaks?**\n\nYes, the Neuron TorchDynamo backend supports graph breaks. \n\n**Q: When using torch.compile, does TorchNeuron have equivalents to CUDA graphs and CUDA graph trees?**\n\nNot in the initial release. We are considering equivalent constructs for future releases. \n\n**Q: Can I compile my model using torch.compile on a compute instance without Trainium hardware?**\n\nNo. The initial release requires compilation on Trainium instances (Trn1, Trn2, or Trn3). Future releases will support compilation on non-Trainium instances.\n\nEager Mode Execution FAQ\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Q: How does eager mode work on TorchNeuron? What is Adaptive Eager Execution, and how can operations be both dispatched individually and fused for performance?**\n\nExecution Model:\n\nIn PyTorch eager mode on TorchNeuron, operations are executed immediately as they are encountered in the Python code, following the same \"define-by-run\" paradigm where each operation is dispatched one at a time through PyTorch's dispatcher to the Neuron backend.\n\nNeuron Asynchronous Execution:\n\nUnder the hood, Neuron operations are enqueued to the device asynchronously, allowing the Python interpreter to continue issuing subsequent operations while previous Neuron operations may still be executing. PyTorch automatically performs the necessary synchronization when copying data between host and devices or when accessing tensor values, making the effect of asynchronous computation transparent to the user, since each device executes operations in the order they are queued.\n\nAdaptive Eager Execution:\n\nWhen the user is not debugging or inspecting tensors, TorchNeuron introduces Adaptive Eager Execution as an optimization. In PyTorch, the dispatcher queues operations on the backend for execution while the Python code continues running ahead. This allows multiple operations to be queued up simultaneously. TorchNeuron takes advantage of this mechanism by analyzing sequences of queued operators and fusing them into single operators based on fusion heuristics. 
These fused operations are then dispatched as single operator calls, improving performance while maintaining the same execution order, numerical accuracy, and determinism as non-fused execution.\n\nDebugging and Tensor Inspection:\n\nWhenever a user wants to print a tensor, just like on any other backend, Neuron synchronizes at the operation where that tensor is needed and performs a device-to-host copy. This synchronization and copy mechanism applies the same way to fused ops when Adaptive Eager Execution is enabled.\n\nIn the context of Adaptive Eager Execution, printing operations may determine fusion boundaries. If printing occurs after an operation that would normally be fused with subsequent operations, fusion will not happen at that point to ensure the requested tensor value is available for inspection.\n\nWhen ``torch.use_deterministic_algorithms()`` or ``torch.set_deterministic_debug_mode()`` is called, TorchNeuron ensures a reproducible order of execution, and Adaptive Eager Execution optimizations are disabled.\n\n**Q: Where are the TorchNeuron kernels implemented for eager mode execution?**\n\nATen implementations and kernels are part of the Neuron backend for eager mode. Currently, TorchNeuron is an out-of-tree backend. When TorchNeuron becomes an in-tree backend, those implementations will be part of the main PyTorch repository.\n\nDistributed Training FAQ\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Q: Which FSDP implementation does TorchNeuron support: FSDP1, FSDP2, or SimpleFSDP?**\n\nFor eager mode, TorchNeuron supports all three: FSDPv1, FSDPv2, and SimpleFSDP. For torch.compile, TorchNeuron follows the PyTorch community recommendation and supports SimpleFSDP, as it is more compiler-friendly; see SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile.\n\n**Q: Does TorchNeuron support activation checkpointing?**\n\nYes. TorchNeuron supports activation checkpointing.\n\n**Q: Does TorchNeuron support passing a mixed precision policy directly to FSDP, or do I need to use the autocast API?**\n\nBoth are supported. You can pass a mixed precision policy directly to FSDP or use the autocast API; it is up to the user to decide.\n\nGeneral FAQ\n^^^^^^^^^^^^^\n\n**Q: Does TorchNeuron support native PyTorch MoE (Mixture of Experts) operations?**\n\nYes, torch native MoE ops will be supported from the first release, including ``torch._scaled_grouped_mm``, ``torch._grouped_mm``, and MoE Dispatch/Combine operations. The first TorchNeuron release already comes with GPT-OSS support that covers these operations.\n\n``torch.all_to_all_vdev_2d`` and ``torch.all_to_all_vdev_2d_offset`` (MoE Dispatch/Combine ops) will be supported in future releases.\n\n**Q: What is the timeline for supporting PyTorch Foundation libraries like torchcomms, monarch, torchforge, and torchao?**\n\nWe are actively evaluating support for these libraries now. Our goal is to support all of them over the next couple of quarters.\n\n**Q: Can NeuronCores on the same Trainium chip share HBM memory?**\n\nYes. HBM can be shared between the multiple NeuronCores on a single Trainium chip. However, depending on the Trainium generation, the available bandwidth between a NeuronCore and HBM can vary with affinity.\n\nAppendix\n^^^^^^^^\n\nWhile PyTorch historically used the ``autograd`` function style, that approach is less recommended:\n\n.. 
code-block:: python\n\n    # sin_autograd.py\n    # sine using NKI kernels, registered via torch.autograd.Function (not recommended)\n\n    import torch\n    from torch_neuronx import nki\n\n    # Declaring and implementing NKI kernels\n    @nki.jit\n    def sin_kernel(in_ptr0, out_ptr):\n        import nki.language as nl\n        \n        input_tile = nl.load(in_ptr0[0:128])\n        output_tile = nl.sin(input_tile)\n        nl.store(out_ptr[0:128], value=output_tile)\n\n    @nki.jit\n    def cos_kernel(in_ptr0, out_ptr):\n        import nki.language as nl\n        \n        input_tile = nl.load(in_ptr0[0:128])\n        output_tile = nl.cos(input_tile)\n        nl.store(out_ptr[0:128], value=output_tile)\n\n    # after this line, there is no NKI code, just native PyTorch\n\n    # Create autograd function\n    class NKI_sin(torch.autograd.Function):\n        @staticmethod\n        def forward(ctx, x):\n            ctx.save_for_backward(x)\n            output = torch.empty_like(x)\n            # Here we call the nki kernel for sin\n            sin_kernel(x, output)\n            return output\n        \n        @staticmethod\n        def backward(ctx, grad_output):\n            x = ctx.saved_tensors[0]\n            cos_result = torch.empty_like(x)\n            # Here we call the nki kernel for cos\n            cos_kernel(x, cos_result)  # cos is derivative of sin\n            return grad_output * cos_result\n\n    # User-facing function\n    def custom_sin(x):\n        \"\"\"Sin with cosine as backward pass.\"\"\"\n        return NKI_sin.apply(x)\n\n    # Test\n    if __name__ == \"__main__\":\n        x = torch.randn(128, device=\"neuron\", requires_grad=True)\n        \n        # Forward pass, which call forward() -> sin_kernel()\n        y = custom_sin(x)\n        \n        # Backward pass\n        loss = y.sum()\n        # autograd automatically calls backward() -> cos_kernel()\n        loss.backward() \n        \n        # Verify\n        expected_forward = torch.sin(x)\n        expected_grad = torch.cos(x.detach())\n        \n        print(\"Testing accuracy of sin custom op, using autograd function style\")\n        \n        assert torch.allclose(y, expected_forward, atol=1e-5)\n        assert torch.allclose(x.grad, expected_grad, atol=1e-5)\n        \n        print(\"✅ Forward: sin kernel\")\n        print(\"✅ Backward: cos kernel\")\n        print(\"✅ Gradients match!\")\n\nResources and More Information\n--------------------------------\n\n* `TorchNeuron GitHub Repository <https://github.com/aws-neuron/torch-neuronx>`__\n* `AWS Trainium Overview <https://aws.amazon.com/machine-learning/trainium/>`__\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/additional-examples-inference-torch-neuronx.rst",
    "content": "\n.. meta::\n   :description: Additional Examples (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nAdditional Examples (``torch-neuronx``)\n=======================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/>\n    Transformers Neuron GitHub samples <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx>\n\n\n* `AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/>`_\n* `Transformers Neuron GitHub samples <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx>`_"
  },
  {
    "path": "frameworks/torch/torch-neuronx/additional-examples-training.rst",
    "content": "\n.. meta::\n   :description: Additional Examples (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nAdditional Examples (``torch-neuronx``)\n=======================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    AWS Neuron Reference for Nemo Megatron GitHub Repository <https://github.com/aws-neuron/neuronx-nemo-megatron>\n    AWS Neuron Samples for EKS <https://github.com/aws-neuron/aws-neuron-eks-samples>\n    AWS Neuron Samples for AWS ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>\n    AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training>\n\n\n\n* `AWS Neuron Reference for Nemo Megatron GitHub Repository <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n* `AWS Neuron Samples for EKS <https://github.com/aws-neuron/aws-neuron-eks-samples>`_\n* `AWS Neuron Samples for AWS ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`_\n* `AWS Neuron Samples GitHub Repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training>`_\n\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-analyze.rst",
    "content": ".. _torch_neuronx_analyze_api:\n\n\n.. meta::\n   :description: PyTorch NeuronX Analyze API for Inference - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX Analyze API for Inference\n============================================================\n\n.. py:function:: torch_neuronx.analyze(func, example_inputs, compiler_workdir=None)\n\n   Checks the support of the operations in the ``func`` by checking each operator against neuronx-cc.\n\n   :arg ~torch.nn.Module,callable func: The function/module that that will be\n      run using the ``example_inputs`` arguments in order to record the\n      computation graph.\n    \n   :arg ~torch.Tensor,tuple[~torch.Tensor] example_inputs: A tuple of example\n      inputs that will be passed to the ``func`` while tracing.\n\n   :keyword str compiler_workdir: Work directory used by\n      |neuronx-cc|. This can be useful for debugging and/or inspecting\n      intermediary |neuronx-cc| outputs\n   \n   :keyword set additional_ignored_ops: A set of aten operators to not analyze. Default is an empty set.\n   \n   :keyword int max_workers: The max number of workers threads to spawn.\n      The default is ``4``.\n   \n   :keyword bool is_hf_transformers: If the model is a huggingface transformers model,\n      it is recommended to enable this option to prevent deadlocks. Default is ``False``.\n   \n   :keyword bool cleanup: Specifies whether to delete the compiler artifact directories\n      generated after running analyze. Default is ``False``.\n   \n\n   :returns: A JSON like :class:`~Dict` with the supported operators and their count, and unsupported\n      operators with the failure mode and location of the operator in the python code.\n    \n   :rtype: :class:`~Dict`\n\n\n   .. note::\n\n      This function is meant to be used as a way to evaluate operator support for the model that is intended to be traced.\n      The information can be used to modify operators that are unsupported to ones that are supported, or custom partitioning\n      of the model.\n\n      Note that this API does not return a traced model.\n      \n      Just like torch_neuronx.trace, this API can be used on any EC2 machine with sufficient memory and compute resources.\n\n\n   Examples\n   ----------\n\n   *Fully supported model*\n\n   .. code-block:: python\n\n      import json\n\n      import torch\n      import torch.nn as nn\n      import torch_neuronx\n\n      class MLP(nn.Module):\n         def __init__(self, input_size=28*28, output_size=10, layers=[120,84]):\n            super(MLP, self).__init__()\n            self.fc1 = nn.Linear(input_size, layers[0])\n            self.relu = nn.ReLU()\n            self.fc2 = nn.Linear(layers[0], layers[1])\n         def forward(self, x):\n            f1 = self.fc1(x)\n            r1 = self.relu(f1)\n            f2 = self.fc2(r1)\n            r2 = self.relu(f2)\n            f3 = self.fc3(r2)\n            return torch.log_softmax(f3, dim=1)\n    \n      model = MLP()\n      ex_input = torch.rand([32,784])\n\n      model_support = torch_neuronx.analyze(model,ex_input)\n      print(json.dumps(model_support,indent=4))\n\n   .. 
code-block::\n\n     {\n         \"torch_neuronx_version\": \"1.13.0.1.5.0\",\n         \"neuronx_cc_version\": \"2.0.0.11796a0+24a26e112\",\n         \"support_percentage\": \"100.00%\",\n         \"supported_operators\": {\n            \"aten::linear\": 3,\n         \"aten::relu\": 2,\n         \"aten::log_softmax\": 1\n         },\n         \"unsupported_operators\": []\n      }\n   \n   *Unsupported Model/Operator*\n\n   .. code-block:: python\n\n      import json\n      import torch\n      import torch_neuronx\n\n      def fft(x):\n         return torch.fft.fft(x)\n\n      model = fft\n      ex_input = torch.arange(4)\n\n      model_support = torch_neuronx.analyze(model,ex_input)\n      print(json.dumps(model_support,indent=4))\n\n   .. code-block::\n\n      {\n         \"torch_neuronx_version\": \"1.13.0.1.5.0\",\n         \"neuronx_cc_version\": \"2.0.0.11796a0+24a26e112\",\n         \"support_percentage\": \"0.00%\",\n         \"supported_operators\": {},\n         \"unsupported_operators\": [\n            {\n               \"kind\": \"aten::fft_fft\",\n               \"failureAt\": \"neuronx-cc\",\n               \"call\": \"test.py(6): fft\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch_neuronx/xla_impl/analyze.py(35): forward\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1182): _slow_forward\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch/nn/modules/module.py(1194): _call_impl\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch/jit/_trace.py(976): trace_module\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch/jit/_trace.py(759): trace\\n/home/ubuntu/testdir/venv/lib/python3.8/site-packages/torch_neuronx/xla_impl/analyze.py(302): analyze\\ntest.py(11): <module>\\n\",\n               \"opGraph\": \"graph(%x : Long(4, strides=[1], requires_grad=0, device=cpu),\\n      %neuron_4 : NoneType,\\n      %neuron_5 : int,\\n      %neuron_6 : NoneType):\\n  %neuron_7 : ComplexFloat(4, strides=[1], requires_grad=0, device=cpu) = aten::fft_fft(%x, %neuron_4, %neuron_5, %neuron_6)\\n  return (%neuron_7)\\n\"\n            }\n         ]\n      }\n   \n   **Note:** the ``failureAt`` field can either be \"neuronx-cc\" or \"Lowering to HLO\". If the field is \"neuronx-cc\", then it indicates that the provided operator configuration failed to be compiled with ``neuronx-cc``. This could either indicate that the operator configuration is unsupported, or there is a bug with that operator configuration.\n\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-async-lazy-load.rst",
    "content": ".. _torch_neuronx_lazy_async_load_api:\n\n\n.. meta::\n   :description: PyTorch NeuronX Lazy and Asynchronous Loading API - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX Lazy and Asynchronous Loading API\n===================================================\n\nThe :func:`torch_neuronx.lazy_load` and :func:`torch_neuronx.async_load` Python APIs allow\nfor more fine-grained control of loading a model onto the Neuron cores. They are designed to\nenable different load behaviours (i.e. lazy or asynchronous loading) that, in certain cases, \ncan speed up the load time. Both APIs take as input a :class:`~torch.jit.ScriptModule` model\ncreated by :ref:`torch_neuronx_trace_api`. **They should be called immediately after** :func:`torch_neuronx.trace`\n**returns, before saving the model via** :func:`torch.jit.save`\n\n.. py:function:: torch_neuronx.lazy_load(trace, enable_lazy_load=True)\n\n    Enables(or disables) lazy load behaviour on the traced Neuron ScriptModule ``trace``.\n    By default, lazy load behaviour is disabled, so this API must be called immediately after\n    :func:`torch_neuronx.trace` returns if lazy load behaviour is desired.\n\n    In this context, lazy loading means that **calling** ``torch.jit.load`` **will not immediately load\n    the model onto the Neuron core.** Instead, the model will be loaded onto the Neuron core at a later\n    time, either via a call to :ref:`torch_neuronx_dataparallel_api`, or automatically when the model's\n    ``forward`` method executes.\n\n    There are several scenarios where lazy loading is useful. For instance, if one wants to use\n    the DataParallel API to load the model onto multiple Neuron cores, typically\n    one would first call ``torch.jit.load`` to load the saved model from disk, and then call ``DataParallel``\n    on the object returned by ``torch.jit.load``. Doing this will cause redundant loading, because calling ``torch.jit.load``\n    first will by default load the model onto one Neuron core, while calling ``DataParallel`` next will\n    first unload the model from the Neuron core, and then load again according to user-specified ``device_ids``.\n    This redundant load is avoided if one enables lazy loading by calling ``torch_neuronx.lazy_load`` prior to saving\n    the model. This way, ``torch.jit.load`` will not load the model onto the Neuron core, so ``DataParallel`` can \n    directly load the model onto the desired cores.\n\n    *Required Arguments*\n\n    :arg ~torch.jit.ScriptModule trace: Model created by the\n        :ref:`torch_neuronx_trace_api`, for which lazy loading is to be enabled.\n\n    *Optional Arguments*\n\n    :arg bool enable_lazy_load: Whether to enable lazy loading, defaults to True.\n\n    Simple example usage:\n\n        >>> neuron_model = torch_neuronx.trace(model, inputs)\n        >>> torch_neuronx.lazy_load(neuron_model)\n        >>> torch.jit.save(neuron_model, \"my_model\")\n        \n        Then some time later:\n\n        >>> neuron_model = torch.jit.load(\"my_model\") # neuron_model will not be loaded onto the Neuron core until it is run or it is passed to DataParallel\n\n.. py:function:: torch_neuronx.async_load(trace, enable_async_load=True)\n    \n    Enables(or disables) asynchronous load behaviour on the traced Neuron ScriptModule ``trace``.\n    \n    By default, loading onto the Neuron core is a synchronous, blocking operation. 
This API\n    can be called immediately after :func:`torch_neuronx.trace` returns in order to make\n    loading this model onto the Neuron core a non-blocking operation. This means that when\n    a load onto the Neuron core is triggered, either through a call to ``torch.jit.load`` or\n    ``DataParallel``, a new thread is launched to perform the load, while the calling function\n    will immediately return. The load will proceed asynchronously in the background, and only\n    when it finishes will the model successfully execute. If the model's ``forward`` method is invoked\n    before the asynchronous load finishes, ``forward`` will wait until the load completes before\n    executing the model.\n\n    This API is useful when one wants to load multiple models onto the Neuron core in parallel.\n    It allows multiple calls to load different models to execute concurrently on different threads,\n    which can significantly reduce the total load time when there are multiple CPU cores on the host.\n    It is especially useful in cases where a single model pipeline has several compiled Neuron models.\n    In this case, one can enable asynchronous load on each Neuron model and load all of them in parallel.\n\n    Note that this API differs from :func:`torch_neuronx.lazy_load`. Lazy loading will\n    only delay the load onto the Neuron core from when ``torch.jit.load`` is called to some later time, \n    but when the load does occur, it is still a synchronous, blocking operation. Asynchronous loading\n    will make the load an asynchronous, non-blocking operation, but it does not delay when the load starts,\n    meaning that calling ``torch.jit.load`` will still start the load, but the load will proceed asynchronously\n    in the background.\n\n    *Required Arguments*\n\n    :arg ~torch.jit.ScriptModule trace: Model created by the\n        :ref:`torch_neuronx_trace_api`, for which asynchronous loading is to be enabled.\n\n    *Optional Arguments*\n\n    :arg bool enable_async_load: Whether to enable asynchronous loading, defaults to True.\n\n    Simple example usage:\n\n        >>> neuron_model1 = torch_neuronx.trace(model1, inputs1)\n        >>> torch_neuronx.async_load(neuron_model1)\n        >>> torch.jit.save(neuron_model1, \"my_model1\")\n\n        >>> neuron_model2 = torch_neuronx.trace(model2, inputs2)\n        >>> torch_neuronx.async_load(neuron_model2)\n        >>> torch.jit.save(neuron_model2, \"my_model2\")\n        \n        Then some time later:\n\n        >>> neuron_model1 = torch.jit.load(\"my_model1\") # neuron_model1 will start loading onto the Neuron core immediately, but the load will occur in a separate thread in the background.\n        >>> neuron_model2 = torch.jit.load(\"my_model2\") # neuron_model2 will start loading onto the Neuron core immediately, but the load will occur in a separate thread in the background.\n\n        Both neuron_model1 and neuron_model2 will load concurrently.\n        \n        >>> output1 = neuron_model1(input1) # This call will block until the asynchronous load launched above finishes.\n        >>> output2 = neuron_model2(input2) # This call will block until the asynchronous load launched above finishes.\n\n\nUsing :func:`torch_neuronx.lazy_load` and :func:`torch_neuronx.async_load` Together\n-------------------------------------------------------------------------------------\n\nYou can also enable lazy load and asynchronous load together for the same model.\nTo do so, simply call each API independently before saving the model with 
``torch.jit.save``:\n\n    >>> neuron_model = torch_neuronx.trace(model, inputs)\n    >>> torch_neuronx.lazy_load(neuron_model)\n    >>> torch_neuronx.async_load(neuron_model)\n    >>> torch.jit.save(neuron_model, \"my_model\")\n\nThis will both delay loading the model onto the Neuron core, and make the load asynchronous.\n\nFor another example usage, please refer to the `Github sample <https://github.com/aws-neuron/aws-neuron-samples/blob/master/archive/torch-neuronx/inference/hf_pretrained_sd2_512_inference.ipynb>`_ we provide for running inference on HuggingFace Stable Diffusion 2.1,\nwhere we use both ``lazy_load`` and ``async_load`` to speed up the total load time of the four Neuron models that make \nup that pipeline.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-core-placement.rst",
    "content": ".. _torch_neuronx_core_placement_api:\n\n\n.. meta::\n   :description: PyTorch NeuronX NeuronCore Placement APIs - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX NeuronCore Placement APIs\n=========================================\n\nFunctions which enable placement of :class:`torch.jit.ScriptModule` to specific\nNeuronCores. Two sets of functions are provided which can be used\ninterchangeably but have different performance characteristics and advantages:\n\n- The :func:`~torch_neuronx.multicore_context` &\n  :func:`~torch_neuronx.neuron_cores_context` functions are context\n  managers that allow a model to be placed on a given NeuronCore *only* at\n  :func:`torch.jit.load` time. These functions are the most efficient way of\n  loading a model since the model is loaded directly to a NeuronCore. The\n  alternative functions described below require that a model is unloaded from\n  one core and then reloaded to another.\n- The :func:`~torch_neuronx.set_multicore` &\n  :func:`~torch_neuronx.set_neuron_cores` functions allow a model\n  that has already been loaded to a NeuronCore to be moved to a different\n  NeuronCore. This functionality is less efficient than directly loading a model\n  to a NeuronCore within a context manager but allows device placement to be\n  fully dynamic at runtime. This is analogous to the :meth:`torch.nn.Module.to`\n  function for device placement.\n\n.. important::\n\n    A prerequisite to enable placement functionality is that\n    the loaded :class:`torch.jit.ScriptModule` has already been compiled with\n    the :func:`torch_neuronx.trace` API. Attempting to place a regular\n    :class:`torch.nn.Module` onto a NeuronCore prior to compilation will do\n    nothing.\n\n.. py:function:: torch_neuronx.set_neuron_cores(trace: torch.jit.ScriptModule, start_nc: int=-1, nc_count: int=-1)\n\n    Set the NeuronCore start/count for all Neuron subgraphs in a torch Module.\n\n    This will unload the model from an existing NeuronCore if it is already\n    loaded.\n\n    *Requires Torch 1.8+*\n\n    :arg ~torch.jit.ScriptModule trace: A torch module which contains one or more Neuron subgraphs.\n    :keyword int start_nc: The starting NeuronCore index where the Module is placed. The\n        value ``-1`` automatically loads to the optimal NeuronCore (least\n        used). Note that this index is always relative to NeuronCores\n        visible to this process.\n    :keyword int nc_count: The number of NeuronCores to use. The value ``-1``\n        will load a model to exactly one NeuronCore. If ``nc_count``\n        is greater than than one, the model will be replicated across multiple\n        NeuronCores.\n\n    :raises [RuntimeError]: If the Neuron runtime cannot be initialized.\n    :raises [ValueError]: If the ``nc_count`` is an invalid number of NeuronCores.\n\n    .. rubric:: Examples\n\n    *Single Load*: Move a model to the first visible NeuronCore after\n    loading.\n\n    .. code-block:: python\n\n        model = torch.jit.load('example_neuron_model.pt')\n        torch_neuronx.set_neuron_cores(model, start_nc=0, nc_count=1)\n\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 0\n\n    *Multiple Core Replication*: Replicate a model to 2 NeuronCores after\n    loading. 
This allows a single :class:`torch.jit.ScriptModule` to\n    use multiple NeuronCores by running round-robin executions.\n\n    .. code-block:: python\n\n        model = torch.jit.load('example_neuron_model.pt')\n        torch_neuronx.set_neuron_cores(model, start_nc=2, nc_count=2)\n\n        model(example) # Executes on NeuronCore 2\n        model(example) # Executes on NeuronCore 3\n        model(example) # Executes on NeuronCore 2\n\n    *Multiple Model Load*: Move and pin 2 models to separate NeuronCores.\n    This causes each :class:`torch.jit.ScriptModule` to always execute on\n    a specific NeuronCore.\n\n    .. code-block:: python\n\n        model1 = torch.jit.load('example_neuron_model.pt')\n        torch_neuronx.set_neuron_cores(model1, start_nc=2)\n\n        model2 = torch.jit.load('example_neuron_model.pt')\n        torch_neuronx.set_neuron_cores(model2, start_nc=0)\n\n        model1(example) # Executes on NeuronCore 2\n        model1(example) # Executes on NeuronCore 2\n        model2(example) # Executes on NeuronCore 0\n        model2(example) # Executes on NeuronCore 0\n\n\n.. py:function:: torch_neuronx.set_multicore(trace: torch.jit.ScriptModule)\n\n    Loads all Neuron subgraphs in a torch Module to all visible NeuronCores.\n\n    This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule`\n    to multiple NeuronCores without requiring multiple calls to\n    :func:`torch.jit.load`. This allows a single\n    :class:`torch.jit.ScriptModule` to use multiple NeuronCores for\n    concurrent threadsafe inferences. Executions use a round-robin strategy\n    to distribute across NeuronCores.\n\n    This will unload the model from an existing NeuronCore if it is already\n    loaded.\n\n    *Requires Torch 1.8+*\n\n    :arg ~torch.jit.ScriptModule trace: A torch module which contains one or more Neuron subgraphs.\n\n    :raises [RuntimeError]: If the Neuron runtime cannot be initialized.\n\n    .. rubric:: Examples\n\n    *Multiple Core Replication*: Move a model across all visible\n    NeuronCores after loading. This allows a single\n    :class:`torch.jit.ScriptModule` to use all NeuronCores by\n    running round-robin executions.\n\n    .. code-block:: python\n\n        model = torch.jit.load('example_neuron_model.pt')\n        torch_neuronx.set_multicore(model)\n\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 1\n        model(example) # Executes on NeuronCore 2\n\n\n.. py:function:: torch_neuronx.neuron_cores_context(start_nc: int=-1, nc_count: int=-1)\n\n    A context which sets the NeuronCore start/count for Neuron models loaded\n    with :func:`torch.jit.load`.\n\n    This context manager may only be used when loading a model with\n    :func:`torch.jit.load`. A model which has already been loaded into memory\n    will not be affected by this context manager. Furthermore, after loading the\n    model, inferences do not need to occur in this context in order to use the\n    correct NeuronCores.\n\n    Note that this context is *not* threadsafe. Using multiple core placement\n    contexts from multiple threads may not correctly place models.\n\n    :keyword int start_nc: The starting NeuronCore index where the Module is placed. The\n        value ``-1`` automatically loads to the optimal NeuronCore (least\n        used). Note that this index is always relative to NeuronCores\n        visible to this process.\n    :keyword int nc_count: The number of NeuronCores to use. 
The value ``-1``\n        will load a model to exactly one NeuronCore. If ``nc_count``\n        is greater than than one, the model will be replicated across multiple\n        NeuronCores.\n\n    :raises [RuntimeError]: If the Neuron runtime cannot be initialized.\n    :raises [ValueError]: If the ``nc_count`` is an invalid number of NeuronCores.\n\n\n    .. rubric:: Examples\n\n    *Single Load*: Directly load a model from disk to the first visible\n    NeuronCore.\n\n    .. code-block:: python\n\n        with torch_neuronx.neuron_cores_context(start_nc=0, nc_count=1):\n            model = torch.jit.load('example_neuron_model.pt')  # Load must occur within the context\n\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 0\n\n    *Multiple Core Replication*: Directly load a model from disk to 2\n    NeuronCores. This allows a single :class:`torch.jit.ScriptModule` to\n    use multiple NeuronCores by running round-robin executions.\n\n    .. code-block:: python\n\n        with torch_neuronx.neuron_cores_context(start_nc=2, nc_count=2):\n            model = torch.jit.load('example_neuron_model.pt')  # Load must occur within the context\n\n        model(example) # Executes on NeuronCore 2\n        model(example) # Executes on NeuronCore 3\n        model(example) # Executes on NeuronCore 2\n\n    *Multiple Model Load*: Directly load 2 models from disk and pin them to\n    separate NeuronCores. This causes each :class:`torch.jit.ScriptModule`\n    to always execute on a specific NeuronCore.\n\n    .. code-block:: python\n\n        with torch_neuronx.neuron_cores_context(start_nc=2):\n            model1 = torch.jit.load('example_neuron_model.pt')  # Load must occur within the context\n\n        with torch_neuronx.neuron_cores_context(start_nc=0):\n            model2 = torch.jit.load('example_neuron_model.pt')  # Load must occur within the context\n\n        model1(example) # Executes on NeuronCore 2\n        model1(example) # Executes on NeuronCore 2\n        model2(example) # Executes on NeuronCore 0\n        model2(example) # Executes on NeuronCore 0\n\n\n.. py:function:: torch_neuronx.multicore_context()\n\n    A context manager which loads models to all visible NeuronCores for Neuron\n    models loaded with :func:`torch.jit.load`.\n\n    This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule`\n    to multiple NeuronCores without requiring multiple calls to\n    :func:`torch.jit.load`. This allows a single\n    :class:`torch.jit.ScriptModule` to use multiple NeuronCores for\n    concurrent threadsafe inferences. Executions use a round-robin strategy\n    to distribute across NeuronCores.\n\n    This context manager may only be used when loading a model with\n    :func:`torch.jit.load`. A model which has already been loaded into memory\n    will not be affected by this context manager. Furthermore, after loading the\n    model, inferences do not need to occur in this context in order to use the\n    correct NeuronCores.\n\n    Note that this context is *not* threadsafe. Using multiple core placement\n    contexts from multiple threads may not correctly place models.\n\n    :raises [RuntimeError]: If the Neuron runtime cannot be initialized.\n\n    .. rubric:: Examples\n\n    *Multiple Core Replication*: Directly load a model to all visible\n    NeuronCores. 
This allows a single  :class:`torch.jit.ScriptModule`\n    to use all NeuronCores by running round-robin executions.\n\n    .. code-block:: python\n\n        with torch_neuronx.multicore_context():\n            model = torch.jit.load('example_neuron_model.pt')  # Load must occur within the context\n\n        model(example) # Executes on NeuronCore 0\n        model(example) # Executes on NeuronCore 1\n        model(example) # Executes on NeuronCore 2\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-data-parallel.rst",
    "content": ".. _torch_neuronx_dataparallel_api:\n\n\n.. meta::\n   :description: PyTorch NeuronX DataParallel API - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX DataParallel API\n==================================\n\nThe :func:`torch_neuronx.DataParallel` Python API implements data parallelism on\n:class:`~torch.jit.ScriptModule` models created by \n:ref:`torch_neuronx_trace_api`.\nThis function is analogous to :class:`~torch.nn.DataParallel` in PyTorch.\nThe :ref:`torch-neuronx-dataparallel-app-note` application note provides an\noverview of how :func:`torch_neuronx.DataParallel` can be used to improve\nthe performance of inference workloads on Inferentia.\n\n.. py:function:: torch_neuronx.DataParallel(model, device_ids=None, dim=0, set_dynamic_batching=True)\n\n    Applies data parallelism by replicating the model on\n    available NeuronCores and distributing data across the different\n    NeuronCores for parallelized inference.\n\n    By default, DataParallel will use all available NeuronCores\n    allocated for the current process for parallelism. DataParallel will\n    apply parallelism on ``dim=0`` if ``dim`` is not specified.\n\n    DataParallel automatically enables\n    :ref:`dynamic batching <dynamic_batching_description_torch_neuronx>` on\n    eligible models if ``dim=0``. Dynamic batching can be disabled using\n    :func:`torch_neuronx.DataParallel.disable_dynamic_batching`, or by setting\n    ``set_dynamic_batching=False`` when initializing the DataParallel object.\n    If dynamic batching is not enabled, the batch size at compilation-time must\n    be equal to the batch size at inference-time divided by the number of\n    NeuronCores being used. Specifically, the following must be true when\n    dynamic batching is disabled:\n    ``input.shape[dim] / len(device_ids) == compilation_input.shape[dim]``.\n\n    :func:`torch.neuron.DataParallel` requires PyTorch >= 1.8.\n\n    *Required Arguments*\n\n    :arg ~torch.jit.ScriptModule model: Model created by the\n        :ref:`torch_neuronx_trace_api` to be parallelized.\n\n    *Optional Arguments*\n\n    :arg list device_ids: List of :obj:`int` or ``'nc:#'`` that specify the\n        NeuronCores to use for parallelization (default: all NeuronCores).\n        Refer to the :ref:`device_ids note <device_ids_note_torch_neuronx>` for a description\n        of how ``device_ids`` indexing works.\n    :arg int dim: Dimension along which the input tensor is scattered across\n        NeuronCores (default ``dim=0``).\n    :arg bool set_dynamic_batching: Whether to enable dynamic batching.\n\n    *Attributes*\n\n    :arg int num_workers: Number of worker threads used for\n        multithreaded inference (default: ``2 * number of NeuronCores``).\n    :arg int split_size: Size of the input chunks\n        (default: ``max(1, input.shape[dim] // number of NeuronCores)``).\n\n\n.. py:function:: torch.neuron.DataParallel.disable_dynamic_batching()\n\n    Disables automatic dynamic batching on the DataParallel module. See\n    :ref:`Dynamic batching disabled <dataparallel_example_disable_dynamic_batching_api_torch_neuronx>`\n    for example of how DataParallel can be used with dynamic batching disabled.\n    Use as follows:\n\n        >>> model_parallel = torch_neuronx.DataParallel(model_neuron)\n        >>> model_parallel.disable_dynamic_batching()\n\n.. _device_ids_note_torch_neuronx:\n\n.. 
note::\n\n    ``device_ids`` uses per-process NeuronCore granularity and zero-based\n    indexing. Per-process granularity means that each Python process \"sees\"\n    its own view of the world. Specifically, this means that ``device_ids``\n    only \"sees\" the NeuronCores that are allocated for the current process.\n    Zero-based indexing means that each Python process will index its\n    allocated NeuronCores starting at 0, regardless of the \"global\" index of\n    the NeuronCores. Zero-based indexing makes it possible to redeploy the exact\n    same code unchanged in different process. This behavior is analogous to\n    the ``device_ids`` argument in the PyTorch\n    :class:`~torch.nn.DataParallel` function.\n\n    As an example, assume DataParallel is run on an inf2.48xlarge, which\n    contains 12 Inferentia chips each of which contains two NeuronCores:\n\n    * If ``NEURON_RT_VISIBLE_CORES`` is not set, a single process can access\n      all 24 NeuronCores. Thus specifying ``device_ids=[\"nc:0\"]`` will\n      correspond to chip0:core0 and ``device_ids=[\"nc:13\"]`` will correspond\n      to chip6:core1.\n\n    * However, if two processes are launched where: process 1 has\n      ``NEURON_RT_VISIBLE_CORES=0-11`` and process 2 has\n      ``NEURON_RT_VISIBLE_CORES=12-23``, ``device_ids=[\"nc:13\"]``\n      cannot be specified in either process. Instead, chip6:core1 can only be\n      accessed in process 2. Additionally, chip6:core1 is specified in process 2\n      with ``device_ids=[\"nc:1\"]``. Furthermore, in process 1,\n      ``device_ids=[\"nc:0\"]`` would correspond to chip0:core0; in process 2\n      ``device_ids=[\"nc:0\"]`` would correspond to chip6:core0.\n\n\nExamples\n--------\n\nThe following sections provide example usages of the\n:func:`torch_neuronx.DataParallel` module.\n\nDefault usage\n^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.rst\n\nSpecifying NeuronCores\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.rst\n\nDataParallel with dim != 0\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.rst\n\nDynamic batching\n^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.rst\n\n.. _dataparallel_example_disable_dynamic_batching_api_torch_neuronx:\n\nDynamic batching disabled\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.rst\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-replace-weights.rst",
    "content": ".. _torch_neuronx_replace_weights_api:\n\n\n.. meta::\n   :description: PyTorch Neuron (``torch-neuronx``) Weight Replacement API for Inference - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch Neuron (``torch-neuronx``) Weight Replacement API for Inference\n========================================================================\n\n.. py:function:: torch_neuronx.replace_weights(neuron_model, weights)\n\n    Replaces the weights in a Neuron Model with split weights.\n    This function will emit a warning of the supplied Neuron model does not\n    contain any separated weights.\n\n    .. warning::\n\n        The below API is only applicable for models traced with the\n        parameter ``inline_weights_to_neff=False``, which is ``True`` by\n        default. See :func:`torch_neuronx.trace` for details.\n\n    :arg ~torch.jit.RecursiveScriptModule neuron_model: A Neuron model compiled with split weights\n\n    :arg ~torch.nn.Module,Dict[str, ~torch.Tensor] weights: Either the original model with the new weights,\n        or the state_dict of a model.\n    \n    :returns: ``None``, this function performs the weight replacement inline.\n    :rtype: ``None``\n\n    .. rubric:: Examples\n\n    *Using a model*\n\n    .. code-block:: python\n\n        import torch\n        import torch_neuronx\n\n\n        class Network(torch.nn.Module):\n            def __init__(self, hidden_size=4, layers=3) -> None:\n                super().__init__()\n                self.layers = torch.nn.Sequential(\n                    *(torch.nn.Linear(hidden_size, hidden_size) for _ in range(layers)))\n\n            def forward(self, tensor):\n                return self.layers(tensor)\n    \n\n        # initialize two networks\n        network = Network()\n        network2 = Network()\n        network.eval()\n        network2.eval()\n\n        inp = torch.rand(2,4)\n\n        # trace weight separated model with first network\n        weight_separated_trace = torch_neuronx.trace(network,inp,inline_weights_to_neff=False)\n\n        # replace with weights from second network\n        torch_neuronx.replace_weights(weight_separated_trace,network2.state_dict())\n\n        # get outputs from neuron and cpu networks\n        out_network2 = network2(inp)\n        out_neuron = weight_separated_trace(inp)\n        \n        # check that they are equal\n        print(out_network2,out_neuron)\n\n\n\n    *Using safetensors*\n\n    The `safetensors`_ library is useful for storing/loading model tensors safely and quickly.\n\n    .. 
code-block:: python\n\n        import torch\n        import torch_neuronx\n\n        from safetensors import safe_open\n        from safetensors.torch import save_model\n\n\n        class Network(torch.nn.Module):\n            def __init__(self, hidden_size=4, layers=3) -> None:\n                super().__init__()\n                self.layers = torch.nn.Sequential(\n                    *(torch.nn.Linear(hidden_size, hidden_size) for _ in range(layers)))\n\n            def forward(self, tensor):\n                return self.layers(tensor)\n    \n\n        # initialize two networks\n        network = Network()\n        network2 = Network()\n        network.eval()\n        network2.eval()\n\n        inp = torch.rand(2,4)\n\n        # trace weight separated model with first network\n        weight_separated_trace = torch_neuronx.trace(network,inp,inline_weights_to_neff=False)\n\n        # save network2 weights to safetensors in the chosen output directory\n        directory = \".\"\n        safetensor_path = f\"{directory}/network2.safetensors\"\n        save_model(network2,safetensor_path)\n\n        # load safetensors from network2 into the traced weight separated model\n        tensors = {}\n        with safe_open(safetensor_path,framework=\"pt\") as f:\n            for k in f.keys():\n                tensors[k] = f.get_tensor(k)\n\n        # replace with weights from second network\n        torch_neuronx.replace_weights(weight_separated_trace,tensors)\n\n        # get outputs from neuron and cpu networks\n        out_network2 = network2(inp)\n        out_neuron = weight_separated_trace(inp)\n        \n        # compare the outputs; they should match\n        print(out_network2,out_neuron)\n\n\n.. note::\n\n    For non-safetensors models, use ``torch.load`` to load the model, and pass the model's ``state_dict`` as shown in the first example.\n\n.. _safetensors: https://huggingface.co/docs/safetensors/index\n.. _torch-xla: https://github.com/pytorch/xla\n.. _torchscript: https://pytorch.org/docs/stable/jit.html\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace.rst",
    "content": ".. _torch_neuronx_trace_api:\n\n\n.. meta::\n   :description: PyTorch NeuronX Tracing API for Inference - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX Tracing API for Inference\n===========================================\n\n.. py:function:: torch_neuronx.trace(func, example_inputs, *_, input_output_aliases={}, compiler_workdir=None, compiler_args=None, partitioner_config=None, inline_weights_to_neff=True, cpu_backend=False)\n    \n    Trace and compile operations in the ``func`` by executing it using\n    ``example_inputs``.\n\n    This function is similar to a :func:`torch.jit.trace` since it produces a\n    :class:`~torch.jit.ScriptModule` that can be saved with\n    :func:`torch.jit.save` and reloaded with :func:`torch.jit.load`. The\n    resulting module is an optimized fused graph representation of the ``func``\n    that is *only* compatible with Neuron.\n\n    Tracing a module produces a more efficient *inference-only* version of the\n    model. XLA Lazy Tensor execution should be used during training. See:\n    :ref:`trace-vs-xla-lazytensor`\n\n    .. warning::\n\n        Currently this only supports |NeuronCore-v2| type instances\n        (e.g. |trn1|, inf2). To compile models compatible with |NeuronCore-v1|\n        (e.g. |inf1|), please see :func:`torch_neuron.trace`\n\n    :arg ~torch.nn.Module,callable func: The function/module that that will be\n       run using the ``example_inputs`` arguments in order to record the\n       computation graph.\n    :arg ~torch.Tensor,tuple[~torch.Tensor] example_inputs: A tuple of example\n       inputs that will be passed to the ``func`` while tracing.\n    :keyword dict input_output_aliases: Marks input tensors as state tensors\n       which are device tensors. \n    :keyword str compiler_workdir: Work directory used by\n       |neuronx-cc|. This can be useful for debugging and/or inspecting\n       intermediary |neuronx-cc| outputs\n    :keyword str,list[str] compiler_args: List of strings representing\n       |neuronx-cc| compiler arguments. See :ref:`neuron-compiler-cli-reference-guide`\n       for more information about compiler options.\n    :keyword PartitionerConfig partitioner_config: A PartitionerConfig object,\n        which can be optionally supplied if there are unsupported ops in the model \n        that need to be partitioned out to CPU.\n    :keyword bool inline_weights_to_neff: A boolean indicating whether the weights should be\n        inlined to the NEFF. If set to False, weights will be separated from the NEFF.\n        The default is ``True``.\n    :keyword bool cpu_backend: A boolean indicating whether CPU should be used for tracing. \n        If set to True, tracing can be done completely on CPU. This keyword needs to be used with \n        the ``compiler_args`` option to set the ``--target`` flag. The default is ``False``.\n\n    :returns: The traced :class:`~torch.jit.ScriptModule` with the embedded\n       compiled Neuron graph. Operations in this module will execute on Neuron.\n    :rtype: ~torch.jit.ScriptModule\n\n    .. warning::\n\n      Behavior Change! Using ``args`` for ``kwargs`` is no longer supported starting from release 2.15.0 (``torch-neuronx==1.13.1.1.12.0``).\n      The current behavior is that a warning will be raised, but ``torch_neuronx.trace()`` will attempt to infer the keyword\n      arguments. 
This is likely to become an error in future releases, so to avoid the warning/error, assign kwargs as kwargs and\n      not args.\n\n    .. rubric:: Notes\n\n    This function records operations using `torch-xla`_ to create a HloModule\n    representation of the ``func``. This fixed graph representation is\n    compiled to the Neuron Executable File Format (NEFF) using the |neuronx-cc|\n    compiler. The NEFF binary executable is embedded into an optimized\n    :class:`~torch.jit.ScriptModule` for `torchscript`_ execution.\n\n    In contrast to a regular :func:`torch.jit.trace` that produces a graph of\n    many separate operations, tracing with Neuron produces a graph with a single\n    fused operator that is executed entirely on device. In `torchscript`_\n    this appears as a stateful ``neuron::Model`` component with an associated\n    ``neuron::forward*`` operation.\n\n    Tracing can be performed on any EC2 machine with sufficient memory and\n    compute resources, but inference can only be executed on a Neuron instance.\n\n    Unlike some devices (such as `torch-xla`_) that use\n    :meth:`~torch.Tensor.to` to move :class:`~torch.nn.parameter.Parameter` and\n    :class:`~torch.Tensor` data between CPU and device, upon loading a\n    Neuron traced :class:`~torch.jit.ScriptModule`, the model binary executable\n    is automatically moved to a NeuronCore. When the underlying\n    ``neuron::Model`` is initialized after tracing or upon\n    :func:`torch.jit.load`, it is loaded to a Neuron device without specifying\n    a device or ``map_location`` argument.\n\n    .. warning::\n\n      One small exception is models traced with ``inline_weights_to_neff=False``. For these models,\n      the NEFF is loaded onto the NeuronCore automatically, but the weights are not moved automatically. To move\n      the weights to the NeuronCore, call :func:`torch_neuronx.move_trace_to_device`. If this is not\n      done, a performance penalty is incurred per inference, because on every inference call, the weights move from CPU\n      to Neuron.\n\n    Furthermore, the Neuron traced :class:`~torch.jit.ScriptModule` expects\n    to consume CPU tensors and produces CPU tensors. The underlying operation\n    performs all data transfers to and from the Neuron device without explicit\n    data movement. This is a significant difference from the training XLA\n    device mechanics since XLA operations are no longer required to\n    be recorded after a trace. See: :ref:`pytorch-neuronx-programming-guide`\n\n    By *default*, when multiple NeuronCores are available, every Neuron traced\n    model :class:`~torch.jit.ScriptModule` within a process\n    is loaded to each available NeuronCore in round-robin order. This is\n    useful at deployment to fully utilize the Neuron hardware since it means\n    that multiple calls to :func:`torch.jit.load` will attempt to load to each\n    available NeuronCore in linear order. The default start device is chosen\n    according to the |nrt-configuration|.\n\n    A traced Neuron module has limitations that are not present in regular\n    torch modules:\n\n    - **Fixed Control Flow**: Similar to :func:`torch.jit.trace`, tracing a\n      model with Neuron statically preserves control flow (i.e.\n      ``if``/``for``/``while`` statements) and will not re-evaluate the branch\n      conditions upon inference. 
If a model result is based on data-dependent\n      control flow, the traced function may produce inaccurate results.\n    - **Fixed Input Shapes**: After a function has been traced, the resulting\n      :class:`~torch.jit.ScriptModule` will always expect to consume tensors\n      of the same shape. If the tensor shapes used at inference differs\n      from the tensor shapes used in the ``example_inputs``, this will result in\n      an error. See: |bucketing|.\n    - **Fixed Tensor Shapes**: The intermediate tensors within the\n      ``func`` must always stay the same shape for the same shaped inputs. This\n      means that certain operations which produce data-dependent\n      sized tensors are not supported. For example, :func:`~torch.nonzero`\n      produces a different tensor shape depending on the input data.\n    - **Fixed Data Types**: After a model has been traced, the input, output,\n      and intermediate data types cannot be changed without recompiling.\n    - **Device Compatibility**: Due to Neuron using a specialized compiled\n      format (NEFF), a model traced with Neuron can no longer be executed in any\n      non-Neuron environment.\n    - **Operator Support**: If an operator is unsupported by `torch-xla`_, then\n      this will throw an exception.\n\n    .. rubric:: Examples\n\n    *Function Compilation*\n\n    .. code-block:: python\n\n        import torch\n        import torch_neuronx\n        def func(x, y):\n            return 2 * x + y\n        example_inputs = torch.rand(3), torch.rand(3)\n        # Runs `func` with the provided inputs and records the tensor operations\n        trace = torch_neuronx.trace(func, example_inputs)\n        # `trace` can now be run with the TorchScript interpreter or saved\n        # and loaded in a Python-free environment\n        torch.jit.save(trace, 'func.pt')\n        # Executes on a NeuronCore\n        loaded = torch.jit.load('func.pt')\n        loaded(torch.rand(3), torch.rand(3))\n    \n    *Module Compilation*\n\n    .. code-block:: python\n\n        import torch\n        import torch_neuronx\n        import torch.nn as nn\n        class Model(nn.Module):\n            def __init__(self):\n                super().__init__()\n                self.conv = nn.Conv2d(1, 1, 3)\n            def forward(self, x):\n                return self.conv(x) + 1\n        model = Model()\n        model.eval()\n        example_inputs = torch.rand(1, 1, 3, 3)\n        # Traces the forward method and constructs a `ScriptModule`\n        trace = torch_neuronx.trace(model, example_inputs)\n        torch.jit.save(trace, 'model.pt')\n        # Executes on a NeuronCore\n        loaded = torch.jit.load('model.pt')\n        loaded(torch.rand(1, 1, 3, 3))\n\n    *Weight Separated Module*\n\n    .. 
code-block:: python\n\n        import torch\n        import torch_neuronx\n        import torch.nn as nn\n\n        class Model(nn.Module):\n\n            def __init__(self):\n                super().__init__()\n                self.conv = nn.Conv2d(1, 1, 3)\n\n            def forward(self, x):\n                return self.conv(x) + 1\n\n        model = Model()\n        model.eval()\n\n        example_inputs = torch.rand(1, 1, 3, 3)\n\n        # Traces the forward method and constructs a `ScriptModule`\n        trace = torch_neuronx.trace(model, example_inputs,inline_weights_to_neff=False)\n\n        # Model can be saved like a normally traced model\n        torch.jit.save(trace, 'model.pt')\n\n        # Executes on a NeuronCore like a normally traced model\n        loaded = torch.jit.load('model.pt')\n        torch_neuronx.move_trace_to_device(loaded,0) # necessary for performance\n        loaded(torch.rand(1, 1, 3, 3))\n    \n    *CPU Compilation*\n\n    On CPU:\n\n    .. code-block:: python\n\n        import torch\n        import torch_neuronx\n        import torch.nn as nn\n        class Model(nn.Module):\n            def __init__(self):\n                super().__init__()\n                self.conv = nn.Conv2d(1, 1, 3)\n            def forward(self, x):\n                return self.conv(x) + 1\n        model = Model()\n        model.eval()\n        example_inputs = torch.rand(1, 1, 3, 3)\n        # Traces the forward method on CPU, compiling for Trn1\n        trace = torch_neuronx.trace(model, example_inputs, compiler_args=\"--target trn1\", cpu_backend=True)\n        torch.jit.save(trace, 'model.pt')\n        # Move model.pt to a Neuron instance\n    \n    On Neuron:\n\n    .. code-block:: python\n\n      import torch\n      import torch_neuronx\n      import torch.nn as nn\n      \n      loaded = torch.jit.load('model.pt')\n      loaded(torch.rand(1, 1, 3, 3))\n    \n    .. note::\n\n      Weight Separated models can have their weights replaced via the ``torch_neuronx.replace_weights`` API.\n\n.. _torch-neuronx-device-movement:\n\nMoving a Traced Module to a Neuron Core\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. warning::\n  This function will be deprecated in a future release, and instead, :func:`torch_neuronx.experimental.set_neuron_cores` will move out of experimental and become a stable API.\n\n.. py:function:: torch_neuronx.move_trace_to_device(trace, device_id)\n\n  This function moves a model traced with :func:`torch_neuronx.trace` to a Neuron Core. Here are some reasons to use this function|colon|\n\n  1. Explicit control of device placement for models.\n    By default, the Neuron Runtime assigns NEFFs to devices in a round-robin manner, meaning it will allocate a NEFF onto Neuron Core 0, then 1, 2, and then loop around.\n  2. Allocating Weights onto the Neuron Core for Weight Separated models.\n    This is necessary for performance reasons. If this is not done, the weights would remain on CPU and would need to be moved to the device on every inference call, which is an expensive operation.\n\n  :arg ~torch.jit.ScriptModule trace: This is the torchscript model returned from :func:`torch_neuronx.trace`.\n  :arg int device_id: The Neuron Core to move the traced model to. This number must be between 0 and the maximum number of Neuron Cores on the instance minus 1. For example, a trn1.32xlarge has 32 Neuron Cores, so the acceptable values are 0-31.\n\n  :returns: Nothing, the movement of the model happens in-place. \n  :rtype: None\n\n.. 
_torch-neuronx-autobucketing:\n\nAutobucketing\n~~~~~~~~~~~~~\n\n.. note::\n  \n  See :func:`neuronx_distributed.parallel_model_trace` for the API to use the autobucketing feature along with tensor parallelism.\n\n.. py:class:: torch_neuronx.BucketModelConfig(bucket_kernel, *_, shared_state_buffer=None, shared_state_buffer_preprocessor=None, func_kwargs=None)\n\n    This object contains configuration data for how buckets are selected based on input via the ``bucket_kernel``.\n    \n    This also supports the concept of a shared buffer between bucket models. You can use this to define how the shared buffer can be manipulated to be fed as input to a bucket model via the ``shared_state_buffer_preprocessor``. Details on how these are defined are found below.\n\n    :arg callable bucket_kernel: A function that returns a new TorchScript function. The TorchScript function has been adapted to the TorchScript\n     representation using :func:`torch.jit.script`. This new function takes in a list of input tensors and outputs a list of tensors and an index tensor.\n    \n    :keyword Optional[List[torch.Tensor]] shared_state_buffer: A list of tensors that is used as the initial values for\n        a shared state for bucket models via aliasing.\n    :keyword Optional[Callable] shared_state_buffer_preprocessor: Similar to bucket_kernel, this is a function that returns a\n        new TorchScript function that has been adapted to the TorchScript representation using :func:`torch.jit.script`.\n        This new TorchScript function takes in 3 arguments: an n-dimensional integer list representing a list\n        of tensor shapes, the state_buffer list of tensors, and a tensor representing the bucket index.\n        This function outputs a reshaped state_buffer to be supplied to the bucket model. If ``shared_state_buffer_preprocessor`` is not supplied when\n        ``shared_state_buffer`` is supplied, the preprocessor returns the full ``shared_state_buffer``.\n    :keyword Optional[Union[Dict[str, Any], List[Any]]] func_kwargs: A single dictionary or a list of dictionaries that can be used\n        to supply custom arguments to the function supplied to the ``func`` argument\n        in :func:`torch_neuronx.bucket_model_trace`. If you are using a list of dictionaries,\n        verify that func_kwargs equals the bucket degree, or number of buckets.\n        By default func_kwargs is None, which means no arguments.\n    \n    :returns: The  :class:`torch_neuronx.BucketModelConfig` with the configuration defining bucket selection for inputs and shared buffers.\n    :rtype: ~torch_neuronx.BucketModelConfig\n\n.. py:function:: torch_neuronx.bucket_model_trace(func, example_inputs, bucket_config, compiler_workdir=None, compiler_args=None)\n\n    This function traces a single model with multiple ``example_inputs`` and a ``bucket_config`` object to produce a single compiled model that can take in multiple input shapes. This trace function is very similar to :func:`torch_neuronx.trace`, but it has a few key differences:\n\n    1. In this case, ``func`` does not take in a ``Model``. Instead, it takes in a function that returns a tuple containing a ``Model`` and ``input_output_aliases``. This is like :func:`neuronx_distributed.parallel_model_trace`, and is done for the same reason, which is that bucket models are traced in parallel. \n    2. Instead of taking in one input, the function takes in multiple inputs in the form of a list. For example, ``[torch.rand(128,128),torch.rand(256,256)]``. \n    3. 
The ``bucket_config`` argument is of type :func:`torch_neuronx.BucketModelConfig`, which defines how an input is mapped to a bucket. For more details, see the :func:`torch_neuronx.BucketModelConfig` API Reference. You can use this for a variety of bucketing applications, such as sequence length bucketing for language models or image resolution bucketing for computer vision models.\n\n    Apart from the aforementioned differences, the rest of the function behaves similarly to :func:`torch_neuronx.trace`. You can save the model with :func:`torch.jit.save` and load it with :func:`torch.jit.load`.\n\n    :arg ~torch.nn.Module,callable func: This is a function that returns a ``Model``\n        object and a dictionary of states, or input_output_aliases. Similar to :func:`neuronx_distributed.parallel_model_trace`, this API\n        calls this function inside each worker and runs trace against them. Note: This differs\n        from the ``torch_neuronx.trace`` where the ``torch_neuronx.trace``\n        requires a model object to be passed.\n    :arg List[Union[~torch.Tensor,tuple[~torch.Tensor]]] example_inputs: A list of possible\n        inputs to the bucket model.\n    :arg ~torch_neuronx.BucketModelConfig bucket_config: The config object that defines\n        bucket selection behavior.\n    \n    :keyword str compiler_workdir: Work directory used by\n       |neuronx-cc|. This can be useful for debugging and inspecting\n       intermediary |neuronx-cc| outputs.\n    :keyword str,list[str] compiler_args: List of strings representing\n       |neuronx-cc| compiler arguments. See :ref:`neuron-compiler-cli-reference-guide`\n       for more information about compiler options.\n\n    :returns: The traced :class:`~torch.jit.ScriptModule` with the embedded\n       compiled Neuron graphs for each bucket model. Operations in this module will execute on Neuron.\n    :rtype: ~torch.jit.ScriptModule\n\n.. warning::\n    \n  If you receive the ``Too Many Open Files`` error message, increase the ulimit via ``ulimit -n 65535``. There is\n  a limitation in torch_xla's ``xmp.spawn`` function when dealing with large amounts of data.\n  \nThe developer guide for Autobucketing is located :ref:`here <torch-neuronx-autobucketing-devguide>`, which contains an example usage of autobucketing with BERT.\n\n.. _torch-neuronx-dynamic-batching:\n\nDynamic Batching\n~~~~~~~~~~~~~~~~\n\n.. py:function:: torch_neuronx.dynamic_batch(neuron_script)\n\n    Enables a compiled Neuron model to be called with variable sized batches.\n\n    When tracing with Neuron, usually a model can only consume tensors that are the same size as the example tensor used in the :func:`torch_neuronx.trace` call. Enabling dynamic batching allows a model to consume inputs that may be either smaller or larger than the original trace-time tensor size. Internally, dynamic batching splits & pads an input batch into chunks of size equal to the original trace-time tensor size. These chunks are passed to the underlying model(s). Compared to serial inference, the expected runtime scales by ``ceil(inference_batch_size / trace_batch_size) / neuron_cores``.\n    \n    This function modifies the ``neuron_script`` network in-place. The returned result is a reference to the modified input.\n\n    Dynamic batching is only supported by chunking inputs along the 0th dimension. A network that uses a non-0 batch dimension is incompatible with dynamic batching. 
Upon inference, inputs whose shapes differ from the compile-time shape in a non-0 dimension will raise a ValueError. For example, consider a model that was traced with a single example input of size ``[2, 3, 5]``. At inference time, when dynamic batching is enabled, a batch of size ``[3, 3, 5]`` is *valid* while a batch of size ``[2, 7, 5]`` is *invalid* due to changing a non-0 dimension.\n\n    Dynamic batching is only supported when the 0th dimension is the same size for all inputs. For example, this means that dynamic batching would not be applicable to a network which consumed two inputs with shapes ``[1, 2]`` and ``[3, 2]`` since the 0th dimension is different. Similarly, at inference time, the 0th dimension batch size for all inputs must be identical, otherwise a ValueError will be raised.\n    \n    *Required Arguments*\n\n    :arg ~torch.jit.ScriptModule neuron_script: The neuron traced :class:`~torch.jit.ScriptModule` with the\n       embedded compiled neuron graph. This is the output of :func:`torch_neuronx.trace`.\n\n    :returns: The traced :class:`~torch.jit.ScriptModule` with the embedded\n       compiled neuron graph. The same type as the input, but with dynamic_batch enabled in the neuron graph.\n    :rtype: ~torch.jit.ScriptModule\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    import torch.nn as nn\n\n    class Net(nn.Module):\n        def __init__(self):\n            super(Net, self).__init__()\n            self.conv = nn.Conv2d(1, 1, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n    n = Net()\n    n.eval()\n\n    inputs = torch.rand(1, 1, 3, 3)\n    inputs_batch_8 = torch.rand(8, 1, 3, 3)\n\n    # Trace a neural network with input batch size of 1\n    neuron_net = torch_neuronx.trace(n, inputs)\n\n    # Enable the dynamic batch size feature so the traced network\n    # can consume variable sized batch inputs\n    neuron_net_dynamic_batch = torch_neuronx.dynamic_batch(neuron_net)\n\n    # Run inference on inputs with a batch size of 8,\n    # different from the batch size used in compilation (tracing)\n    output_batch_8 = neuron_net_dynamic_batch(inputs_batch_8)\n\nGraph Partitioner\n~~~~~~~~~~~~~~~~~\n\n.. py:function:: torch_neuronx.PartitionerConfig(*, trace_kwargs=None, model_support_percentage_threshold=0.5, min_subgraph_size=-1, max_subgraph_count=-1, ops_to_partition=None, analyze_parameters=None)\n\n    Allows Neuron to trace a model with unsupported operators and partition these operators to CPU.\n\n    This model will contain subgraphs of Neuron and CPU submodules, but it is executed like one model,\n    and can be saved and loaded like one model as well.\n\n    The graph partitioner is customized using this class, and is *only* enabled (it is disabled by default) by setting the ``partitioner_config``\n    keyword argument of the ``torch_neuronx.trace`` API to an instance of this class. Below are the various configuration options.\n\n    :arg Dict trace_kwargs: Used if you need to pass trace kwargs to the Neuron subgraphs, such as the\n      ``compiler_workdir`` and/or ``compiler_args``. 
The default is ``None``, corresponding to the default trace args.\n    \n    :arg float model_support_percentage_threshold: A number between 0 and 1 representing\n      the minimum percentage of operators that must be supported by Neuron.\n      If the support percentage falls below this threshold, the function will throw a ValueError.\n      Default is ``0.5`` (i.e., 50% of operators must be supported by Neuron).\n    \n    :arg int min_subgraph_size: The minimum number of operators in a subgraph.\n      Can be ``>= 1`` or ``== -1``. If ``-1``, minimum subgraph size is not checked (i.e., no minimum).\n      If ``>= 1``, each subgraph must contain at least that many operators.\n      If not, the graph partitioner will throw a ``ValueError``.\n    \n    :arg int max_subgraph_count: The maximum number of subgraphs in the partitioned model.\n      Can be ``>= 1`` or ``== -1``. If ``-1``, max subgraph count is not checked (i.e., no maximum).\n      If ``>= 1``, the partitioned model must contain at most that many subgraphs.\n      If not, the graph partitioner will throw a ``ValueError``.\n    \n    :arg Set[str] ops_to_partition: This is a set of strings of the form \"aten::<operator>\".\n      These are operators that will be partitioned to CPU regardless of Neuron support.\n      The default is ``None`` (i.e., no additional operators will be partitioned).\n\n    :arg Dict analyze_parameters: This is a dictionary of kwargs used in ``torch_neuronx.analyze()``.\n      NOTE: Not all kwargs in ``torch_neuronx.analyze()`` are supported\n      in the graph partitioner.\n      The following ``kwargs`` in analyze are supported for use in the graph partitioner:\n\n      * ``compiler_workdir``\n      * ``additional_ignored_ops``\n      * ``max_workers``\n\n      The default is ``None``, corresponding to the default analyze arguments.\n\n    :returns: The :class:`~torch_neuronx.PartitionerConfig` with the configuration for the graph partitioner.\n    :rtype: ~torch_neuronx.PartitionerConfig\n\n.. rubric:: Examples\n\n.. _graph_partitioner_example_default_usage:\n\nThis example demonstrates using the graph partitioner.\n\nThe model below is a simple MLP model with sorted log softmax output.\nThe sort operator, ``torch.sort()`` or ``aten::sort``, is not supported\nby ``neuronx-cc`` at this time, so the graph partitioner will partition\nout the sort operator to CPU.\n\n.. 
code-block:: python\n\n  import torch\n  import torch_neuronx\n  import torch.nn as nn\n\n  import logging\n  \n  # adjust the logger level to see what the partitioner is doing\n  logger = logging.getLogger(\"Neuron\")\n\n  class MLP(nn.Module):\n      def __init__(\n          self, input_size=28 * 28, output_size=10, layers=[4096, 2048]\n      ):\n          super(MLP, self).__init__()\n          self.fc1 = nn.Linear(input_size, layers[0])\n          self.fc2 = nn.Linear(layers[0], layers[1])\n          self.fc3 = nn.Linear(layers[1], output_size)\n          self.relu = nn.ReLU()\n\n      def forward(self, x):\n          f1 = self.fc1(x)\n          r1 = self.relu(f1)\n          f2 = self.fc2(r1)\n          r2 = self.relu(f2)\n          f3 = self.fc3(r2)\n          out = torch.log_softmax(f3, dim=1)\n          sort_out, _ = torch.sort(out)\n          return sort_out\n\n  n = MLP()\n  n.eval()\n\n  inputs = torch.rand(32, 784)\n\n  # Configure the graph partitioner with the default values\n  partitioner_config = torch_neuronx.PartitionerConfig()\n\n  # Trace a neural network with graph partitioner enabled\n  neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config)\n\n  # Run inference on the partitioned model\n  output = neuron_net(inputs)\n\n.. note::\n  Dynamic batching support with partitioned\n  models is handled case-by-case, because it is highly dependent on what the\n  final partition scheme looks like.\n\n.. |neuron-cc| replace:: :ref:`neuron-cc <neuron-compiler-cli-reference>`\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n.. |NeuronCore-v1| replace:: :ref:`NeuronCore-v1 <neuroncores-v1-arch>`\n.. |NeuronCore-v2| replace:: :ref:`NeuronCore-v2 <neuroncores-v2-arch>`\n\n.. |HloModule| replace:: HloModule\n\n.. |inf1| replace:: :ref:`inf1 <aws-inf1-arch>`\n.. |trn1| replace:: :ref:`trn1 <aws-trn1-arch>`\n\n.. |bucketing| replace:: :ref:`bucketing_app_note`\n.. |nrt-configuration| replace:: :ref:`nrt-configuration`\n\n.. _torch-xla: https://github.com/pytorch/xla\n.. _torchscript: https://pytorch.org/docs/stable/jit.html
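\n\nBeyond the defaults, the partitioner can also be customized. The following sketch (an illustration added for clarity, reusing the same ``n`` and ``inputs`` as the example above) raises the required Neuron support percentage and explicitly forces ``aten::sort`` onto CPU:\n\n.. code-block:: python\n\n  # Require at least 90% of operators to be supported by Neuron, and always\n  # place aten::sort on CPU regardless of Neuron support\n  partitioner_config = torch_neuronx.PartitionerConfig(\n      model_support_percentage_threshold=0.9,\n      ops_to_partition={\"aten::sort\"},\n  )\n\n  neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config)\n  output = neuron_net(inputs)\n"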
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx.rst",
    "content": "\n.. meta::\n   :description: API Reference Guide  (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nAPI Reference Guide  (``torch-neuronx``)\n========================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace\n    /frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-replace-weights\n    /frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-core-placement\n    /frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-analyze\n    /frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-data-parallel\n\n\n.. dropdown::  API Reference Guide  (``torch-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    :open:\n\n    * :ref:`torch_neuronx_trace_api`\n    * :ref:`torch_neuronx_replace_weights_api`\n    * :ref:`torch_neuronx_core_placement_api`\n    * :ref:`torch_neuronx_analyze_api`\n    * :ref:`torch_neuronx_dataparallel_api`\n    * :ref:`torch_neuronx_lazy_async_load_api`\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/torch-neuronx-profiling-api.rst",
    "content": ".. _torch-neuronx-profiling-api:\n\n\n.. meta::\n   :description: PyTorch NeuronX Profiling API - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, profiling, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX Profiling API\n===============================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nThe profiler provides a method to generate a context manager to capture\ntrace events at the operator or runtime level.\n\n.. py:function:: torch_neuronx.experimental.profiler.profile(port=9012,ms_duration=60000,neuron_tensorboard_plugin_dir=\"logs/plugins/neuron\",profile_type=\"operator\",auto_start=True,delete_working=True)\n\n   The :func:`torch_neuronx.experimental.profiler.profile` method returns a ``profile`` context manager object. This object\n   doesn't need to be used directly, as default options are set to auto capture events based on the ``profile_type``.\n\n   The context manager will wrap around the entire model\n   and training/inference loop. The context-manager is \n   backwards-compatible with the torch_xla.debug.profiler``\n\n   *Required Arguments*\n\n   None\n\n   *Optional Keyword Arguments*\n\n   :keyword int port: Port to run the profiling GRPC server on. Default is 9012.\n   :keyword int ms_duration: This defines how long the profiler will capture the\n      HLO artifacts from the model to view in the profiler. The unit is in\n      milliseconds. The default value is 60000 ms, or 1 minute.\n   :keyword str neuron_tensorboard_plugin_dir: The directory the neuron tensorboard plugin will file write to.\n      This will be ``logs/plugins/neuron`` by default/\n   :keyword str profile_type: There is “trace” and “operator”. “trace”\n      is the Torch Runtime Trace Level, while “operator” is the Model\n      Operator Trace Level. Default is \"operator\"\n   :keyword bool auto_start: If set to true, the profiler will start profiling immediately.\n      If set to false, the profiler can be set to start at a later condition.\n      Refer to ``profile.start()`` for more details. Default is ``True``.\n   :keyword bool delete_working: If set to False turns off the deletion of temporary files. Default True.\n   :keyword str traced_only: This should be set to ``True`` if profiling a model that has been traced with\n      ``torch_neuronx.trace()``. Default is ``False``.\n      \n   :returns: The traced :class:`profile`\n\n   :rtype: ~profile\n\n.. py:function:: torch_neuronx.experimental.profiler.profile.start()\n\n   The :func:`torch_neuronx.experimental.profiler.profile.start` method starts the profiler if not started (i.e when ``auto_start=False``).\n   This function does not take in any parameters, nor return anything.\n\n    *Required Arguments*\n\n   None\n\n    *Optional Keyword Arguments*\n\n   None\n\n   :returns: None\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/training/index.rst",
    "content": "\n.. meta::\n   :description: API Reference Guide for Training (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nAPI Reference Guide for Training (``torch-neuronx``) \n====================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /frameworks/torch/torch-neuronx/api-reference-guide/training/pytorch-neuron-parallel-compile\n    /frameworks/torch/torch-neuronx/api-reference-guide/training/torch-neuron-envvars\n    /about-neuron/arch/neuron-features/neuron-caching\n    /frameworks/torch/torch-neuronx/api-reference-guide/torch-neuronx-profiling-api\n\n\n.. dropdown::  API Reference Guide for Training (``torch-neuronx``) \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    :open:\n    \n    * :ref:`pytorch-neuronx-parallel-compile-cli`\n    * :ref:`neuron-caching`\n    * :ref:`pytorch-neuronx-envvars`\n    * :ref:`torch-neuronx-profiling-api`\n\n\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/training/pytorch-neuron-parallel-compile.rst",
    "content": ".. _pytorch-neuronx-parallel-compile-cli:\n\n\n.. meta::\n   :description: PyTorch NeuronX neuron_parallel_compile CLI - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX neuron_parallel_compile CLI\n=============================================\n\nPyTorch NeuronX performs just-in-time compilation of graphs during\nexecution. At every step, a graph is traced. If the traced graph varies\nfrom the previous executions, it is compiled by the neuron compiler. For\nlarge models, the compilation time for each graph can be high. Moreover,\nbecause of JIT, we would compile all these graphs sequentially, hence\nincurring huge compilation penalty.\n\nTo reduce this compilation time during execution, the ``neuron_parallel_compile``\nutility is provided as part of PyTorch Neuron installation. The\n``neuron_parallel_compile`` will extract graphs from a trial run of your script,\nperform parallel pre-compilation of the graphs, and populate the :ref:`Neuron Persistent Cache <neuron-caching>`\non disk or in AWS S3 bucket with compiled graphs.\nYour trial run should be limited to a few steps\n(eg.10-15), enough for the utility to extract the different graphs needed for\nfull execution. To run the utility:\n\n``neuron_parallel_compile <run commands>``\n\nWhere ``<run commands>`` are the commands to run a short run (i.e. 10\nsteps) to trace training loops for pre-compilation. The example for\nthe run command is ``torchrun --nproc_per_node=2 <train script>``, where\ntrain script accepts ``--steps_this_run`` option to limit number of run steps:\n\n``neuron_parallel_compile torchrun --nproc_per_node=2 <train script> --steps_this_run=10``\n\nYou may notice that the output from the model is invalid when you use\n``neuron_parallel_compile``. This is because when you initiate your training\nrun command with ``neuron_parallel_compile``, the utility will run your command\nwith environment variables that puts your training script into graph\nextraction mode. In this mode, no real execution is performed and the outputs\nare invalid. You will also see outputs similar to the following about the compile cache path and the\nextracted graphs:\n\n.. code:: bash\n\n   INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache\n   INFO ||NEURON_CC_WRAPPER||: Extracting graphs (/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_9219523464496887986+abb26765/model.hlo.pb) for ahead-of-time parallel compilation. No compilation was done.\n\nAfter the trial execution ends and the graphs are extracted, ``neuron_parallel_compile`` would launch multiple compilation processes in parallel to compile all these graphs. Compiled graphs (NEFFs) are inserted into the Neuron Persistent Cache. You will also see outputs similar to the following about the compile cache path, the list of graphs (HLOs) to be compiled, and the running statistics of compiled graphs (count of remaining graphs, locked graphs, failed graphs, done compiled graphs).\n\n.. 
code:: bash\n\n    INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache\n    INFO ||NEURON_CACHE||: Current remaining items are 5, locked are 0, failed are 0, done are 0, total is 5\n    INFO ||NEURON_PARALLEL_COMPILE||: master grab hlos to compile: ['/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_8068656800389078395+abb26765/model.hlo.pb', '/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_17109392703413819652+abb26765/model.hlo.pb', '/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_9219523464496887986+abb26765/model.hlo.pb', '/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_16969875447143373016+abb26765/model.hlo.pb', '/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_3000743782456078279+abb26765/model.hlo.pb']\n    ...\n    INFO ||NEURON_CACHE||: Current remaining items are 0, locked are 0, failed are 0, done are 5, total is 5\n\nAfter all compilations are completed, a compilation summary is shown:\n\n.. code:: bash\n\n   INFO: 2023-08-24 20:21:11.000895:  161136  INFO ||NEURON_PARALLEL_COMPILE||: {\n   INFO:     \"compilation_summary\": {\n   INFO:         \"true\": 2\n   INFO:     },\n   INFO:     \"compilation_report\": {\n   INFO:         \"/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_1970132581169579119+abb26765/model.hlo.pb\": {\n   INFO:             \"status\": true,\n   INFO:             \"retry\": 0\n   INFO:         },\n   INFO:         \"/var/tmp/neuron-compile-cache/neuronxcc-2.0.0.22266a0+a69f71e55/MODULE_16141953836240613513+abb26765/model.hlo.pb\": {\n   INFO:             \"status\": true,\n   INFO:             \"retry\": 0\n   INFO:         }\n   INFO:     }\n   INFO: }\n   INFO: 2023-08-24 20:21:11.000895:  161136  INFO ||NEURON_PARALLEL_COMPILE||: Total graphs: 2\n   INFO: 2023-08-24 20:21:11.000895:  161136  INFO ||NEURON_PARALLEL_COMPILE||: Total successful compilations: 2\n   INFO: 2023-08-24 20:21:11.000895:  161136  INFO ||NEURON_PARALLEL_COMPILE||: Total failed compilations: 0\n\nNow if you run your script (without ``neuron_parallel_compile``), it will be faster\nsince the compiled graphs are already cached.\n\n``torchrun --nproc_per_node=2 <train script>``\n\n``Note``: Except for the option to limit number of run steps (such as ``--steps_this_run``),\nthe other options of ``<run commands>`` must match between the pre-compilation and\nactual run. If this is not the case, you may see additional compilations during training\nrun because of new graphs getting generated, resulting in cache miss.\n\nThere may be additional compilations due to unreached execution paths (in case the\nexecution path is not reached in the first few steps of graph extraction), or changes\nin parameters such as number of data parallel workers.\n\nEach precompilation command or actual script execution command above can be prefixed with ``NEURON_COMPILE_CACHE_URL=<cache URL>`` or ``NEURON_CC_FLAGS=\"--cache_dir=<cache URL>\"`` to specify a different cache location than the default (with ``--cache_dir`` taking precedence over ``NEURON_COMPILE_CACHE_URL`` if both are specified). Alternatively, the cache URL can also be specify in Python code using:\n\n.. 
code:: python\n\n    os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + \" --cache_dir=<cache URL>\"\n\nYou need to specify the same cache URL for both the precompilation command (using ``neuron_parallel_compile``) and the actual script execution command if you want the previously compiled and cached graphs to be used for actual script execution.\n\nThe environment variables below are available to help modify ``neuron_parallel_compile`` behavior:\n\n``NEURON_PARALLEL_COMPILE_MAX_RETRIES``:\n\n-  Set the maximum number of retries when using :ref:`Neuron Persistent Cache <neuron-caching>` or :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\n   If set to N, the tool will try compilation N more time(s) if the first graph compilation\n   failed. Example: Set NEURON_PARALLEL_COMPILE_MAX_RETRIES=1 when precompiling on\n   trn1.2xlarge where there's limited host memory and CPU resources.\n   Default is 0.\n\n``NEURON_IGNORE_TRAINING_SCRIPT_ERROR_AND_COMPILE``:\n\n- When using :ref:`Neuron Persistent Cache <neuron-caching>` or :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`, if you want to ignore errors in the training script\n  and compile the accumulated HLO graphs, you can do so by setting this environment variable.\n  Example: If NEURON_IGNORE_TRAINING_SCRIPT_ERROR_AND_COMPILE=1 is set when using ``neuron_parallel_compile``,\n  a crash in the training script would be ignored and the graphs collected up to the crash would be\n  compiled.\n\n``NEURON_COMPILE_CACHE_URL``:\n\n-  Set the cache URL used by the :ref:`Neuron Persistent Cache <neuron-caching>` and :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\n   If it starts with ``s3://``, AWS S3 will be used as the cache backend. Otherwise, a\n   local disk cache will be used. Default is ``/var/tmp/neuron-compile-cache``.\n   If this is specified together with the ``cache_dir=<cache_url>`` option via ``NEURON_CC_FLAGS``, the ``--cache_dir`` option takes precedence.\n\n\nDebugging with Neuron Persistent Cache\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA graph compilation can fail because of a compilation error or an environment issue (for example, compilation is interrupted by ctrl-C). The graph would be marked as failed, and a subsequent rerun would encounter a message like the one below:\n\n.. code:: bash\n\n    INFO ||NCC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.8.0.25+a3ad0f342/MODULE_12486829708343293975+d41d8cd9/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation. \n\nTo retry compilation,\nadd ``--retry_failed_compilation`` to the ``NEURON_CC_FLAGS`` environment variable. This will retry the compilation even if the graph was previously marked as a failed compilation.\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --retry_failed_compilation'\n\nSee :ref:`Neuron Persistent Cache <neuron-caching>` for more information.\n\nSeparate collection and compilation commands\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor cases like finetuning, there could be multiple independent training tasks running on different nodes\nand sharing many compilation graphs in common. ``neuron_parallel_compile`` provides commands to separate\nthe graph collection and compilation phases, so users can collect all graphs across different training sessions in advance to avoid duplicate compilations.\n\nTo only collect the graphs from trial executions of training scripts into the Neuron Persistent Cache:\n\n.. 
code:: bash\n\n    neuron_parallel_compile --command collect <run_script>\n\nTo compile the graphs previously collected using the ``collect`` command and store the compiled results (NEFFs) back into the Neuron Persistent Cache (make sure to use the same neuronx-cc compiler version as during the graph collection step):\n\n.. code:: bash\n\n    neuron_parallel_compile --command compile <run_script>\n\nNote: if ``--command`` is not specified, ``neuron_parallel_compile`` will run both the collection and compilation phases by default.\n\nCache maintenance commands\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following commands are available to help maintain the cache.\n\n.. warning::\n   \n    Make sure no running process is using the cache when you use the ``clean`` or ``clear-locks`` command, because doing so can cause cache errors.\n\nTo clean cached files:\n\n.. code:: bash\n\n    # WARNING: Make sure no running process is using the cache\n    neuron_parallel_compile --command clean\n    \nTo clear file locks left behind when a ``neuron_parallel_compile`` execution was interrupted:\n\n.. code:: bash\n\n    # WARNING: Make sure no running process is using the cache\n    neuron_parallel_compile --command clear-locks\n\nEach command above can be prefixed with ``NEURON_COMPILE_CACHE_URL=<cache URL>`` or ``NEURON_CC_FLAGS=\"--cache_dir=<cache URL>\"`` to specify a different cache location than the default.\n\n.. note::\n\n   Currently there's no automatic maintenance of cache size either on disk or in S3. Please delete files (e.g., those from older compiler versions) as necessary to keep the cache size within your limit.\n\nAnalyze operations support\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe analyze command checks the support of operations within the training script by checking each operator against neuronx-cc.\nIt is only supported for PyTorch models. The output of the tool will be available as ``result.json`` within the output location.\n\n.. code:: bash\n\n    neuron_parallel_compile --command analyze python3 training_script.py\n\nOptional Arguments:\n\n    ``--analyze-output ANALYZE_OUTPUT_LOCATION``\n    Only supported for ``--command analyze``. Path to the location where the output will be persisted.\n    Default: cwd/model_analysis_result\n\n    ``--analyze-verbosity {1,2}``\n    Only supported for ``--command analyze``. Level of information to be included within the output.\n    1: add XLA operator information into the results.\n    2: add aten metadata into the results.\n    Default: 2\n\nThe tutorial for ``analyze`` can be found :ref:`here <torch-analyze-for-training-tutorial>`.\n
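\nFor example, to run analysis and write the results to a custom location (a sketch; ``training_script.py`` and the output path are placeholders):\n\n.. code:: bash\n\n    # Operator support results are written to ./analysis_results/result.json\n    neuron_parallel_compile --command analyze --analyze-output ./analysis_results python3 training_script.py\n"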
  },
  {
    "path": "frameworks/torch/torch-neuronx/api-reference-guide/training/torch-neuron-envvars.rst",
    "content": ".. _pytorch-neuronx-envvars:\n\n\n.. meta::\n   :description: PyTorch NeuronX Environment Variables - AWS Neuron SDK documentation\n   :keywords: API reference, AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nPyTorch NeuronX Environment Variables\n======================================\n\nEnvironment variables allow modifications to PyTorch NeuronX behavior\nwithout requiring code change to user script. It is recommended to set\nthem in code or just before invoking the python process, such as\n``NEURON_FRAMEWORK_DEBUG=1 python3 <script>`` to avoid inadvertently\nchanging behavior for other scripts. Environment variables specific to\nPyTorch Neuron are (beta ones are noted):\n\n``NEURON_CC_FLAGS``\n\n-  Compiler options. Full compiler options are described in the :ref:`neuronx-cc-training-mixed-precision`.\n   Additional options for the Neuron\n   Persistent Cache can be found in the :ref:`Neuron Persistent Cache <neuron-caching>` guide.\n\n``NEURON_FRAMEWORK_DEBUG``\n\n-  Enable dumping of XLA graphs in both HLO format (intermediate representation) and text form for debugging.\n\n``NEURON_EXTRACT_GRAPHS_ONLY``\n\n-  Dump the XLA graphs in HLO format (intermediate representation) and execute empty stubs with zero outputs\n   in order to allow multiple XLA graphs to be traced through a trial execution.\n   Used automatically for ahead-of-time\n   graph extraction for parallel compilation in :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`\n   tool. This environment variable can be checked in the training script\n   to prevent checking of bad outputs during trial run.\n\n``NEURON_NUM_RECENT_MODELS_TO_KEEP`` \n\n-  Keep only N number of graphs loaded in Neuron runtime for each\n   process, where N is the value this environment variable is set to.\n   Default is to keep all graphs loaded by a process.\n\n``NEURON_COMPILE_CACHE_URL``\n\n-  Set the :ref:`Neuron Persistent Cache <neuron-caching>` URL or :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\n   If starts with ``s3://``, it will use AWS S3 as cache backend. Otherwise it will use\n   local disk cache. 
Default is ``/var/tmp/neuron-compile-cache``.\n   If this is specified together with ``cache_dir=<cache_url>`` option via ``NEURON_CC_FLAGS``, the ``--cache_dir`` option takes precedence.\n\n``NEURON_PARALLEL_COMPILE_MAX_RETRIES``\n\n-  Set the maximum number of retries when using :ref:`Neuron Persistent Cache <neuron-caching>` or :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\n   If set to N, the tool will try compilation N more time(s) if the first graph compilation failed.\n   Example: Set NEURON_PARALLEL_COMPILE_MAX_RETRIES=1 when precompiling on \n   trn1.2xlarge where there's limited host memory and CPU resources.\n   Default is 0.\n\n``NEURON_IGNORE_TRAINING_SCRIPT_ERROR_AND_COMPILE`` \n\n- When using :ref:`Neuron Persistent Cache <neuron-caching>` or :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>` , if you want to ignore the error in training script\n  and compile the accumulated HLO graphs, you can do so by setting this environment variable.\n  Example: If NEURON_IGNORE_TRAINING_SCRIPT_ERROR_AND_COMPILE=1 is set when using ``neuron_parallel_compile``,\n  a crash in the training script would be ignored and the graphs collected up to the crash would be\n  compiled.\n\n``NEURON_PARALLEL_COMPILE_DUMP_RESULTS``\n\n- When set to 1, neuron_parallel_compile would report compilation time results in the final JSON output.\n\n``NEURON_FUSE_SOFTMAX``\n\n- Enable custom lowering for Softmax operation to enable compiler optimizations.\n\n``NEURON_CUSTOM_SILU``\n\n- Enable custom lowering for SILU operation to enable compiler optimizations.\n\n``NEURON_TRANSFER_WITH_STATIC_RING_OPS``\n\n- The list of torch.nn.Modules that will have all parameter input buffers marked as static to enable runtime optimizations. The default is \"Embedding,LayerNorm,Linear,Conv2d,BatchNorm2d\" for ``torch-neuronx`` 1.13/2.1, and \"Embedding\" for ``torch-neuronx`` 2.1 in SDK release 2.20, and empty for ``torch-neuronx`` 2.1+ in SDK release 2.21.\n\n``NEURONCORE_NUM_DEVICES`` **[Use only with xmp.spawn]**\n\n-  Number of NeuronCores for setting up distributed data parallel training\n   when using torch_xla.distributed.xla_multiprocessing.spawn (xmp.spawn) utility only. See `MNIST MLP training with xmp.spawn <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/mnist_mlp/train_xmp.py>`__ for example.\n   NOTE: Do not use this environment variable when using ``torchrun``, which has ``--nproc_per_node`` option instead for this purpose. ``torchrun`` is recommended for consistent experience on one instance as well as across multiple instances.\n\n``NEURON_DUMP_HLO_SNAPSHOT`` **[Beta]** **[Torch-NeuronX 1.13 only]**\n\n- Dump the inputs, outputs, and graph in HLO format of a graph execution in a snapshot file. This\n  variable can be set to ``1``, ``ON_NRT_ERROR``, ``ON_NRT_ERROR_CPU``, ``ON_NRT_ERROR_HYBRID`` to\n  dump snapshots at every iteration using CPU memory, or dump only on errors automatically using\n  device, host, and both device and host memory respectively.\n\n``NEURON_NC0_ONLY_SNAPSHOT`` **[Beta]** **[Torch-NeuronX 1.13 only]**\n\n- Dump only the snapshot associated with Neuron Core 0 when ``NEURON_NC0_ONLY_SNAPSHOT=1`` and \n  the ``NEURON_DUMP_HLO_SNAPSHOT`` flag is set.\n\n``NEURON_TRANSFER_ALL_PARAMETERS_WITH_STATIC_RING`` **[Beta]**\n\n- When set to 1, mark all parameter transfers as static to enable runtime optimizations for torch.nn modules that are wrapped as done in Megatron-LM. 
This setting is not needed if torch.nn modules are not wrapped.\n\n``BUCKET_CAP_MB`` **[PyTorch XLA <=2.1]**\n\n- If there are many small gradient tensors, such as in BERT training, small allreduce sizes can limit performance. To improve performance, you can try increasing the bucket size using ``BUCKET_CAP_MB`` environment variable, which is set to 50MB by default. For example, BERT pretraining on multiple instances can see improved performance with ``BUCKET_CAP_MB=512``. NOTE: While this is supported in PyTorch Neuron 2.5, it is recommended for users to switch to ``ALLREDUCE_GRADIENTS_BUCKET_SIZE_MB``.\n\n``ALLREDUCE_GRADIENTS_BUCKET_SIZE_MB`` **[PyTorch XLA 2.5+]**\n\n- If there are many small gradient tensors, such as in BERT training, small allreduce sizes can limit performance. To improve performance, you can try increasing the bucket size using ``ALLREDUCE_GRADIENTS_BUCKET_SIZE_MB`` environment variable, which is set to 50MB by default. For example, BERT pretraining on multiple instances can see improved performance with ``ALLREDUCE_GRADIENTS_BUCKET_SIZE_MB=512``.\n\n\n``XLA_FLAGS`` **[PyTorch XLA]** **[Torch-NeuronX 2.1+]**\n\n- When set to ``\"--xla_dump_hlo_snapshots --xla_dump_to=<dir>\"``, this environmental variable enables dumping snapshots in ``<dir>`` directory. See :ref:`torch-neuronx-snapshotting` section for more information.\n\n``XLA_USE_DUMMY_STORE`` **[PyTorch XLA]**\n\n- When set to 1 along with ``TORCH_DIST_INIT_BARRIER=0``, PJRT process group initialization will use DummyStore instead of TCPStore. This reduces the number of open file descriptors and enables scaling training up to a large number of nodes.\n\n``XLA_USE_BF16`` **[PyTorch XLA <=2.1]**\n\n- When ``XLA_USE_BF16=1``, PyTorch Neuron will automatically map both torch.float and torch.double tensors\n  to bfloat16 tensors and turn on Stochastic Rounding mode. This can both reduce memory footprint and improve performance.\n  Example: to enable bfloat16 autocasting and stochastic rounding, set XLA_USE_BF16=1 only, as\n  stochastic rounding mode is on by default when XLA_USE_BF16=1. If you would like to preserve some tensors in float32, see ``XLA_DOWNCAST_BF16`` below. NOTE: This is deprecated in PyTorch Neuron 2.5. See :ref:`migration_from_xla_downcast_bf16`.\n\n\n``XLA_DOWNCAST_BF16`` **[PyTorch XLA <=2.1]**\n\n- When ``XLA_DOWNCAST_BF16=1``, PyTorch Neuron will automatically map torch.float tensors to bfloat16 tensors, torch.double tensors\n  to float32 tensors and turn on Stochastic Rounding mode. This can both reduce memory footprint and improve performance, while preserving some tensors in float32.\n  Example: to enable float to bfloat16 and double to float autocasting and stochastic rounding, set XLA_DOWNCAST_BF16=1 only, as\n  stochastic rounding mode is on by default when XLA_DOWNCAST_BF16=1. If you want to cast both torch.float and torch.double to bfloat16, please see ``XLA_USE_BF16`` above. NOTE: This is deprecated in PyTorch Neuron 2.5. See :ref:`migration_from_xla_downcast_bf16`.\n\n``XLA_DISABLE_FUNCTIONALIZATION`` **[PyTorch XLA 2.1+]**\n\n- When ``XLA_DISABLE_FUNCTIONALIZATION=0``, PyTorch XLA will enable the functionalization feature which makes graphs more compilable by removing mutations from functions. In PyTorch XLA 2.1 functionalization causes 15% performance degradations for BERT due to missing aliasing for gradient accumulation https://github.com/pytorch/xla/issues/7174 so it is off by default (``XLA_DISABLE_FUNCTIONALIZATION=1``). 
Enabling functionalization can improve convergence for LLaMA 70B with ZeRO1 (when used with release 2.19 compiler).\n\n\n``XLA_ENABLE_PARAM_ALIASING`` **[PyTorch XLA]**\n\n- When ``XLA_ENABLE_PARAM_ALIASING=0``, PyTorch Neuron will disable parameter aliasing in HLO graphs. This can be useful for debugging. However, it would lead to increased device memory usage due to extra allocation of buffers (so a higher chance of out-of-device-memory errors) and decreased performance. When not set, parameter aliasing is enabled by default.\n\n``NEURON_RT_STOCHASTIC_ROUNDING_EN`` **[Neuron Runtime]**\n\n- When ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``, PyTorch Neuron will use stochastic rounding instead of\n  round-nearest-even for all internal rounding operations when casting from FP32 to a reduced precision data type (FP16, BF16, FP8, TF32).\n  This feature has been shown to improve\n  training convergence for reduced precision training jobs, such as when bfloat16 autocasting is\n  enabled. This is set to 1 by default by PyTorch Neuron when XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1. To switch to round-nearest-even mode, please set ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0``.\n\n``NEURON_RT_STOCHASTIC_ROUNDING_SEED`` **[Neuron Runtime]**\n\n- Sets the seed for the\n  random number generator used in stochastic rounding (see previous section). If this environment variable is not set, the seed is set to 0 by default. Please set ``NEURON_RT_STOCHASTIC_ROUNDING_SEED`` to a fixed value to ensure reproducibility between runs.\n\n``NEURON_RT_VISIBLE_CORES`` **[Neuron Runtime]**\n\n- Integer range of specific NeuronCores needed by the process (for example, 0-3 specifies NeuronCores 0, 1, 2, and 3).\n  Use this environment variable when using torchrun to limit the launched process to specific consecutive NeuronCores. To ensure best performance, multi-core jobs requiring N NeuronCores for collective communication must be placed at a NeuronCore ID that starts at a multiple of N, where N is the world size, limited to 1, 2, 8, or 32. For example, a process using 2 NeuronCores can be mapped to 2 free NeuronCores starting at NeuronCore id 0, 2, 4, 6, etc., and a process using 8 NeuronCores can be mapped to 8 free NeuronCores starting at NeuronCore id 0, 8, 16, or 24.\n\nAdditional Neuron runtime environment variables are described in the `runtime\nconfiguration\ndocumentation <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-configurable-parameters.html>`__.\n\nAdditional XLA runtime environment variables are described in the `PyTorch-XLA troubleshooting guide\n<https://github.com/pytorch/xla/blob/v1.10.0/TROUBLESHOOTING.md#user-content-environment-variables>`__.\n
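\nAs a quick illustration, several of these variables can be combined on the launch command line (a sketch; ``train.py`` is a placeholder for your training script):\n\n.. code:: bash\n\n    # Enable stochastic rounding with a fixed seed for run-to-run reproducibility\n    NEURON_RT_STOCHASTIC_ROUNDING_EN=1 NEURON_RT_STOCHASTIC_ROUNDING_SEED=0 torchrun --nproc_per_node=2 train.py\n"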
  },
  {
    "path": "frameworks/torch/torch-neuronx/misc-inference-torch-neuronx.rst",
    "content": "\n.. meta::\n   :description: Misc (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nMisc (``torch-neuronx``)\n========================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /release-notes/components/pytorch\n\n* :ref:`pytorch_rn`\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/misc-training.rst",
    "content": "\n.. meta::\n   :description: Misc (Training - torch-neuronx) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nMisc (Training - torch-neuronx)\n===============================\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators\n    /frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution\n    /frameworks/torch/torch-neuronx/training-troubleshooting\n    /release-notes/components/pytorch\n\n\n* :ref:`pytorch-neuron-supported-operators`\n* :ref:`setup-trn1-multi-node-execution`\n* :ref:`pytorch-neuron-traning-troubleshooting`\n* :ref:`pytorch_rn`"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/inference/autobucketing-dev-guide.rst",
    "content": ".. _torch-neuronx-autobucketing-devguide:\n\n\n.. meta::\n   :description: Autobucketing for Inference (torch-neuronx) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nAutobucketing for Inference (torch-neuronx)\n=============================================\n\n.. contents:: Table of Contents\n    :depth: 3\n\nIntroduction\n------------\n\nAutobucketing is a feature that enables you to use multiple bucket models. Each bucket model accepts a static input shape and a bucket kernel function. The models are then packaged into a single traced PyTorch model that can accept multiple different input shapes. \n\nThis gives you increased flexibility for inputs into Neuron models without the need to manage multiple Neuron models. The applications of this are extensive, from optimal model selection based on image resolution, to efficient sampling for token generation in language models.\n\nWhile Autobucketing offers increased flexibility, Autobucketing is also useful for latency sensitive applications since small and large inputs can be applied on small and large models respectively, based on the bucket kernel function.\n\nThis Developer Guide will discuss best practices for implementing Autobucketing for your use case. For this Developer Guide, a BERT model will be used, where we bucket on the sequence length dimension.\n\nBefore continuing, it is recommended to familiarize yourself with the Autobucketing APIs, which can be found :ref:`here <torch-neuronx-autobucketing>`.\n\nBucket Kernels\n--------------\n\nBucket kernels are user-defined functions that take in the model input as input to the function and return a tuple containing a *potentially* manipulated model input and a tensor representing the bucket index.\nAn important aspect of this function is that it must be able to be adapted to the TorchScript representation using :func:`torch.jit.script`. This is because to support saving a traced bucket model with :func:`torch.jit.save` and :func:`torch.jit.load`, you need all elements of the model to be in TorchScript.\nThe below example shows a bucket kernel that is adaptable to TorchScript in this way.\n\n.. 
code-block:: python\n\n    import torch\n    from typing import List\n\n    def sequence_length_bucket_kernel(tensor_list: List[torch.Tensor]):\n        x = tensor_list[0]\n        bucket_dim = 1\n        x_shape = x.shape\n        tensor_sequence_length = x_shape[bucket_dim]\n        batch_size = x_shape[bucket_dim - 1]\n        buckets = [128, 512]\n        idx = 0\n        num_inputs = 3\n        bucket = buckets[0]\n        reshaped_tensors: List[torch.Tensor] = []\n        bucket_idx = 0\n        for idx, bucket in enumerate(buckets):\n            if tensor_sequence_length <= bucket:\n                bucket_idx = idx\n                for tensor in tensor_list:\n                    if num_inputs == 0:\n                        break\n                    delta = bucket - tensor_sequence_length\n                    padding_shape: List[int] = [batch_size, delta]\n                    zeros = torch.zeros(padding_shape, dtype=x.dtype)\n                    reshaped_tensors.append(torch.cat([tensor, zeros], dim=bucket_dim))\n                    num_inputs -= 1\n                break\n        return reshaped_tensors, torch.tensor([bucket_idx])\n\n    def get_bucket_kernel(*_):\n        bk = torch.jit.script(sequence_length_bucket_kernel)\n        return bk\n\n\nIn the above example we define a bucket kernel that takes in an input to a transformers model, which is ``[input_ids, attention_mask, token_type_ids]``. We first obtain the first tensor in that list, since that tensor contains ``sequence_length`` as a dimension, and retrieve the ``sequence_length`` and ``batch_size``. We also define the sequence length buckets. The next major part of the code is the for loop, which first finds the matching sequence length bucket and then iterates through the tensors in the list to right-pad the tensors to the desired sequence length. After this is done, we return the padded inputs as a list of tensors and a tensor containing the bucket index. Finally, we create a function ``get_bucket_kernel`` which returns a version of the bucket kernel that has been adapted to TorchScript using :func:`torch.jit.script`. We can use this bucket kernel to pass in a tokenized input of sequence length 1-512, which is padded to the nearest bucket size rounded up.\n\nNote that we call :func:`torch.jit.script` instead of :func:`torch.jit.trace`. This\nis because we rely on control flow logic evaluating correctly for all inputs. This\nresults in certain challenges when writing compatible and accurate bucket kernels. We\ndiscuss these challenges and resolutions in the next section.\n\nTorchScript Best Practices for Bucket Kernels\n---------------------------------------------\n\nBelow are some recommendations when creating these bucket kernels:\n\n    - **Type annotate non-tensor-like data types**: Functions that have been adapted to the TorchScript representation using :func:`torch.jit.script` treat\n      variables that are defined by using another variable as tensor-like when they might not be. This can be seen when defining\n      ``padding_shape`` in the above bucket kernel.\n    - **Index selection support is limited**: Functions that have been adapted to the TorchScript representation using :func:`torch.jit.script` don't support the use of variables\n      for indexing very well. 
It could work in some scenarios, but there isn't a discernable pattern to it,\n      so for more reliable TorchScript-adapted functions relying on indexes, use an enumerated for loop or literals if possible.\n    - **Initializing variables with literals**: The Torchscript compiler often incorrectly removes\n      a variable if it finds another variable initialized with the same literal, such as ``0``. The compiler might also reuse variables initialized with a\n      literal for other operations, such as indexing or function parameters. This can cause inaccurate results for certain inputs. Therefore, always validate the\n      function by testing with the expected inputs. If the lowering does not behave as expected, you can see the lowered representation by calling ``bucket_kernel.graph``, where ``bucket_kernel`` is the return value of ``get_bucket_kernel``, and analyze the graph for inaccurate lowerings.\n    - **Use of aten functions might be necessary to guarantee correct lowering**: The TorchScript interpreter supports certain operations, such as slicing, but can\n      lower them in unexpected ways when using normal syntax. For example, with slicing, the most common way to slice is with indexing syntax such as ``tensor[:,:2,:]``. However,\n      this can cause lowering issues due to the aforementioned reasons. To mitigate this, it might be necessary to call the respective aten function directly.\n      See the below example with ``shared_state_buffer_preprocessor``.\n\nShared State Buffers\n--------------------\n\nAutobucketing supports the concept of a shared buffer between bucket models. You can use this to define how the shared buffer can be manipulated to be fed as input to a bucket model via the ``shared_state_buffer_preprocessor``.\n\nThe above recommendations also apply when defining a ``shared_state_buffer_preprocessor``.\n\nAn example where a shared buffer is useful between bucket models is maintaining a KV Cache between bucket models for LLMs.\n\nBelow is an example of a KV Cache preprocessor for Autobucketing.\n\n.. code-block:: python\n\n  def state_preprocessor(shapes_collection: List[List[List[int]]], states: List[torch.Tensor], bucket_idx_tensor: torch.Tensor)->List[torch.Tensor]:\n    bucket_idx = torch.ops.aten.Int(bucket_idx_tensor)\n    shapes = shapes_collection[bucket_idx]\n    sliced_state_tensors = []\n    \n    for i in range(len(shapes)):\n        expected_shape = shapes[i]\n        state_tensor = states[i]\n        state_tensor_shape = state_tensor.shape\n        for j,npos in enumerate(expected_shape):\n            state_tensor_dim_length = state_tensor_shape[j]\n            state_tensor = torch.ops.aten.slice(state_tensor,dim=j,start=state_tensor_dim_length-npos,end=state_tensor_dim_length)\n        sliced_state_tensors.append(state_tensor)\n    \n    return sliced_state_tensors\n  \n  def get_state_preprocessor():\n    sp = torch.jit.script(state_preprocessor)\n    return sp\n\nIn this example, we take in ``shapes_collection``, ``states``, and ``bucket_idx_tensor``. The input ``shapes_collection`` is essentially a list of expected shapes for each state tensor defined for each bucket kernel. For example, we can have ``shapes_collection = [[[1,128],[1,128]],[[1,512],[1,512]]]`` where ``shapes_collection[0][1]`` retrieves the expected shape for the second state tensor in the first bucket. The input ``states`` is the actual list of tensors in the shared buffer, which contains tensors of the largest shape. 
Finally, ``bucket_idx_tensor`` is the same tensor returned by the bucket kernel.\n\nTwo things to note is that we use two aten functions directly: ``aten::Int`` to convert the ``bucket_idx_tensor`` to an integer, and ``aten::slice`` to perform slicing given non-const or non-literal parameters.\n\n.. note::\n\n    The above shared state function is not used in the BERT example\n\nBucket Model Config\n-------------------\n\nGiven the above two examples, we can initialize a :class:`torch_neuronx.BucketModelConfig` object as follows:\n\n.. code-block:: python\n\n  import torch\n  import torch_neuronx\n\n  from typing import List\n\n  # above code\n\n  bucket_config = torch_neuronx.BucketModelConfig(get_bucket_kernel,shared_state_buffer_preprocessor=get_state_preprocessor)\n\n\nPutting it all Together\n-----------------------\n\nHere is a simple example using the BERT model:\n\n.. code-block:: python\n\n  import torch\n  import torch_neuronx\n\n  from transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n  from typing import List\n\n  def encode(tokenizer, *inputs, max_length=128, batch_size=1):\n      tokens = tokenizer.encode_plus(\n          *inputs,\n          max_length=max_length,\n          padding='max_length',\n          truncation=True,\n          return_tensors=\"pt\"\n      )\n      return (\n          torch.repeat_interleave(tokens['input_ids'], batch_size, 0),\n          torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),\n      )\n\n  def get_bert_model(*args):\n      name = \"bert-base-cased-finetuned-mrpc\"\n      model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n\n      return model,{}\n\n  def sequence_length_bucket_kernel(tensor_list: List[torch.Tensor]):\n      x = tensor_list[0]\n      bucket_dim = 1\n      x_shape = x.shape\n      tensor_sequence_length = x_shape[bucket_dim]\n      batch_size = x_shape[bucket_dim - 1]\n      buckets = [128, 512]\n      idx = 0\n      num_inputs = 3\n      bucket = buckets[0]\n      reshaped_tensors: List[torch.Tensor] = []\n      bucket_idx = 0\n      for idx, bucket in enumerate(buckets):\n          if tensor_sequence_length <= bucket:\n              bucket_idx = idx\n              for tensor in tensor_list:\n                  if num_inputs == 0:\n                      break\n                  delta = bucket - tensor_sequence_length\n                  padding_shape: List[int] = [batch_size, delta]\n                  zeros = torch.zeros(padding_shape, dtype=x.dtype)\n                  reshaped_tensors.append(torch.cat([tensor, zeros], dim=bucket_dim))\n                  num_inputs -= 1\n              break\n      return reshaped_tensors, torch.tensor([bucket_idx])\n\n  def get_bucket_kernel(*_):\n      bk = torch.jit.script(sequence_length_bucket_kernel)\n      return bk\n  \n  if __name__ == '__main__':\n\n      name = \"bert-base-cased-finetuned-mrpc\"\n\n      # Build tokenizer and model\n      tokenizer = AutoTokenizer.from_pretrained(name)\n      model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n\n      # Setup some example inputs\n      sequence_0 = \"The company HuggingFace is based in New York City\"\n      sequence_1 = \"HuggingFace is named after the huggingface emoji\"\n      sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n\n      paraphrase_s128 = encode(tokenizer, sequence_0, sequence_2)\n      paraphrase_s122 = encode(tokenizer, sequence_0, sequence_2, max_length=122)\n      \n      paraphrase_s512 = 
encode(tokenizer, sequence_0, sequence_1, max_length=512)\n      paraphrase_s444 = encode(tokenizer, sequence_0, sequence_1, max_length=444)\n\n      # Note: Run on CPU before trace. Avoids running with XLA allocated params\n      paraphrase_expected_s128 = torch.argmax(model(*paraphrase_s128)[0])\n      paraphrase_expected_s512 = torch.argmax(model(*paraphrase_s512)[0])\n      \n\n      # Trace model\n      bucket_config = torch_neuronx.BucketModelConfig(get_bucket_kernel)\n      bucket_trace_neuron = torch_neuronx.bucket_model_trace(get_bert_model, [paraphrase_s128,paraphrase_s512], bucket_config)\n\n      # Run traced model with shorter inputs to test bucket rounding\n      paraphrase_actual_s128 = torch.argmax(bucket_trace_neuron(*paraphrase_s122)[0])\n      paraphrase_actual_s512 = torch.argmax(bucket_trace_neuron(*paraphrase_s444)[0])\n      \n\n      # Compare outputs\n      assert paraphrase_expected_s128 == paraphrase_actual_s128\n      assert paraphrase_expected_s512 == paraphrase_actual_s512\n\n\nAutobucketing for Neuronx-Distributed\n-------------------------------------\n\nTo see this same example applied on Neuronx-Distributed, go to this section on the :ref:`Neuronx-Distributed Inference Developer Guide <neuronx_distributed_inference_developer_guide>`"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.rst",
    "content": ".. _torch_neuronx_core_placement_guide:\n\n\n.. meta::\n   :description: NeuronCore Allocation and Model Placement for Inference (|torch-neuronx|) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nNeuronCore Allocation and Model Placement for Inference (|torch-neuronx|)\n=========================================================================\n\nThis programming guide describes the how to allocate NeuronCores to processes\nand place models onto specific NeuronCores. The models in this guide are\nexpected to have been traced with with :func:`torch_neuronx.trace`.\n\n.. warning::\n\n    This guide is **not** applicable to NeuronCore placement using XLA\n    LazyTensor device execution. See: :ref:`trace-vs-xla-lazytensor`\n\nIn order of precedence, the recommendation is to use the following placement\ntechniques:\n\n1. For nearly all regular models, default core placement should be used to take\n   control of all cores for a single process.\n2. For applications using multiple processes, default core placement should be\n   used in conjunction with ``NEURON_RT_NUM_CORES`` (:ref:`torch_neuronx_placement_default`)\n3. For more granular control, then the beta explicit placement APIs may\n   be used (:ref:`torch_neuronx_placement_explicit`).\n\n.. contents:: Table of Contents\n    :depth: 3\n\nThe following guide will assume a machine with 8 NeuronCores:\n\n- NeuronCores will use the notation ``nc0``, ``nc1``, etc.\n- Models will use the notation ``m0``, ``m1`` etc.\n\nNeuronCores and  model allocations will be displayed in the following format:\n\n.. raw:: html\n    :file: images/0-0-legend-neuronx.svg\n\nThe actual cores that are visible to the process can be adjusted according to\nthe :ref:`nrt-configuration`.\n\nUnlike |torch-neuron| (with |neuron-cc|) instances, |torch-neuronx| (with\n|neuronx-cc|) does not support :ref:`neuroncore-pipeline`. This simplifies\nmodel core allocations since it means that model pipelines will likely not span\nacross multiple NeuronCores.\n\n.. _torch_neuronx_placement_default:\n\nDefault Core Allocation & Placement\n-----------------------------------\n\nThe most basic requirement of an inference application is to be able to place a\nsingle model on a single NeuronCore. More complex applications may use multiple\nNeuronCores or even multiple processes each executing different models. The\nimportant thing to note about designing an inference application is that a\nsingle NeuronCore will always be allocated to a single process. *Processes do\nnot share NeuronCores*. Different configurations can be used to ensure that\nan application process has enough NeuronCores allocated to execute its model(s):\n\n- Default: A process will attempt to take ownership of **all NeuronCores**\n  visible on the instance. This should be used when an instance is only running\n  a single inference process since no other process will be allowed to take\n  ownership of any NeuronCores.\n- ``NEURON_RT_NUM_CORES``: Specify the **number of NeuronCores** to allocate\n  to the process. This places no restrictions on which NeuronCores will be used,\n  however, the resulting NeuronCores will always be contiguous. This should be\n  used in multi-process applications where each process should only use a subset\n  of NeuronCores.\n- ``NEURON_RT_VISIBLE_CORES``: Specifies exactly **which NeuronCores** are\n  allocated to the process by index. 
Similar to ``NEURON_RT_NUM_CORES``, this\n  can be used in multi-process applications where each process should only use a\n  subset of NeuronCores. This provides more fine-grained control over the\n  exact NeuronCores that are allocated to a given process.\n\nSee the :ref:`nrt-configuration` for more environment variable details.\n\nExample: Default\n^^^^^^^^^^^^^^^^\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    m0 = torch.jit.load('model.pt')  # Loads to nc0\n    m1 = torch.jit.load('model.pt')  # Loads to nc1\n\n\n.. raw:: html\n    :file: images/0-1-default-2.svg\n\nWith no environment configuration, the process will take ownership of all\nNeuronCores. In this example, only two of the NeuronCores are used by the\nprocess and the remaining are allocated but left idle.\n\n\nExample: ``NEURON_RT_NUM_CORES``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    m0 = torch.jit.load('model.pt')  # Loads to nc0\n    m1 = torch.jit.load('model.pt')  # Loads to nc1\n\n.. raw:: html\n    :file: images/0-2-default-rt-num-cores.svg\n\nSince there is no other process on the instance, only the first 2 NeuronCores\nwill be acquired by the process. Models load in a simple linear order to the\nleast used NeuronCores.\n\n\nExample: ``NEURON_RT_VISIBLE_CORES``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_VISIBLE_CORES='4-5'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    m0 = torch.jit.load('model.pt')  # Loads to nc4\n    m1 = torch.jit.load('model.pt')  # Loads to nc5\n\n\n.. raw:: html\n    :file: images/0-3-default-rt-visible-cores.svg\n\nUnlike ``NEURON_RT_NUM_CORES``, setting the visible NeuronCores allows the\nprocess to take control of a specific contiguous set. This allows an application\nto have more fine-grained control over where models will be placed.\n\n\nExample: Multiple Processes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='2'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    m0 = torch.jit.load('model.pt')  # Loads to nc0\n    m1 = torch.jit.load('model.pt')  # Loads to nc1\n\n\nIn this example, if the script is run **twice**, the following allocations\nwill be made:\n\n.. raw:: html\n    :file: images/0-5-default-multiprocess.svg\n\nNote that each process will take ownership of as many NeuronCores as is\nspecified by the ``NEURON_RT_NUM_CORES`` configuration.\n\n\n.. _torch_neuronx_placement_explicit:\n\nExplicit Core Placement\n-------------------------------------\n\nThe ``torch_neuronx`` framework provides beta APIs for explicit core placement;\nthe full API reference can be found in the\n:ref:`torch_neuronx_core_placement_api` documentation.\n\n\nExample: Manual Core Selection\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe most direct usage of the placement APIs is to manually select the\nstart NeuronCore that each model is loaded to.\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='4'\n\n**Python Script**:\n\n.. 
code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    # NOTE: Order of loads does NOT matter\n    with torch_neuronx.experimental.neuron_cores_context(start_nc=3):\n        m0 = torch.jit.load('model.pt')  # Loads to nc3\n\n    with torch_neuronx.experimental.neuron_cores_context(start_nc=0, nc_count=2):\n        m1 = torch.jit.load('model.pt')  # Loads replicas to nc0 and nc1\n\n    example = torch.rand(1, 3, 224, 224)\n\n    m1(example)  # Executes on nc0\n    m1(example)  # Executes on nc1\n\n    m0(example)  # Executes on nc3\n    m0(example)  # Executes on nc3\n    m0(example)  # Executes on nc3\n\n\n.. raw:: html\n    :file: images/8-models-m0-3-m1-1-2.svg\n\n\nExample: Automatic Multicore\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUsing explicit core placement it is possible to replicate a model to multiple\nNeuronCores simultaneously. This means that a single model object within python\ncan utilize all available NeuronCores (or NeuronCores allocated to the process).\n\n**Environment Setup**:\n\n.. code-block:: bash\n\n    export NEURON_RT_NUM_CORES='8'\n\n**Python Script**:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    with torch_neuronx.experimental.multicore_context():\n        m0 = torch.jit.load('model.pt')  # Loads replicas to nc0-nc7\n\n    example = torch.rand(1, 3, 224, 224)\n\n    m0(example)  # Executes on nc0\n    m0(example)  # Executes on nc1\n\n.. raw:: html\n    :file: images/6-multicore.svg\n\nTo make full use of a model that has been loaded to multiple NeuronCores,\nmultiple threads should be used to run inferences in parallel.\n
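\nThe following is a minimal sketch of one way to do this, assuming the replicated\n``m0`` from the example above; the thread count and number of requests are\nillustrative only, and ``ThreadPoolExecutor`` is just one convenient option.\n\n.. code-block:: python\n\n    import torch\n    from concurrent.futures import ThreadPoolExecutor\n\n    example = torch.rand(1, 3, 224, 224)\n\n    # Submit many requests from a pool of threads so that calls can be\n    # dispatched to the replicas on the different NeuronCores concurrently.\n    with ThreadPoolExecutor(max_workers=8) as pool:\n        futures = [pool.submit(m0, example) for _ in range(32)]\n        results = [future.result() for future in futures]\n\n\n.. |neuron-cc| replace:: :ref:`neuron-cc <neuron-compiler-cli-reference>`\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n.. |torch-neuron| replace:: :ref:`torch-neuron <inference-torch-neuron>`\n.. |torch-neuronx| replace:: :ref:`torch-neuronx <inference-torch-neuronx>`\n.. |Inf1| replace:: :ref:`Inf1 <aws-inf1-arch>`\n.. |Trn1| replace:: :ref:`Trn1 <aws-trn1-arch>`\n"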
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/inference/index.rst",
    "content": "\n.. meta::\n   :description: Developer Guide  (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nDeveloper Guide  (``torch-neuronx``) \n====================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /frameworks/torch/torch-neuronx/programming-guide/inference/core-placement\n    /frameworks/torch/torch-neuronx/programming-guide/inference/trace-vs-xla-lazytensor\n    /about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note.rst\n    \n\n.. dropdown::  Developer Guide for Inference (``torch-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    :open:\n\n    * :ref:`torch_neuronx_core_placement_guide`\n    * :ref:`trace-vs-xla-lazytensor`\n    * :ref:`torch-neuronx-dataparallel-app-note`\n    * :ref:`torch-neuronx-autobucketing-devguide`"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/inference/trace-vs-xla-lazytensor.rst",
    "content": ".. _trace-vs-xla-lazytensor:\n\n\n.. meta::\n   :description: Comparison of Traced Inference versus XLA |LazyTensor| Inference (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nComparison of Traced Inference versus XLA |LazyTensor| Inference (``torch-neuronx``)\n=====================================================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nIntroduction\n------------\n\n\nUsing ``torch-neuronx``, there are two ways that a model can be\nexecuted for inference:\n\n- **XLA LazyTensor Inference**: A model is executed on Neuron by calling\n  :meth:`~torch.Tensor.to` to move :class:`~torch.nn.parameter.Parameter`\n  and :class:`~torch.Tensor` data using the |device|. Executing operations uses\n  torch |LazyTensor| to record, compile, and execute the graph. These are the\n  same mechanisms used for :ref:`training <pytorch-neuronx-programming-guide>`.\n\n- **(Recommended) Traced Inference**: A model is traced prior to inference\n  using the |trace| API. This trace is similar to :func:`torch.jit.trace` but\n  instead creates a Neuron-specific `TorchScript`_ artifact. This artifact\n  provides improved performance and portability compared to XLA\n  |LazyTensor| inference.\n\n\n\n\n.. _xla_lazytensor:\n\nXLA Lazy Tensor Inference Mechanics\n-----------------------------------\n\nXLA |LazyTensor| inference uses Just-In-Time (JIT) compilation for Neuron\nexecution.\n\nXLA Device execution uses the built-in ``torch-xla`` functionality with torch\n|LazyTensor| to record torch operations using the |device|. The graph of\noperations is sent to the |neuronx-cc| compiler upon calling\n``xm.mark_step()``. Finally the compiled graph is transferred to a NeuronCore\nand executed in the Neuron backend.\n\nThe initial model inference will be very slow since the model binary file in the\nNeuron Executable File Format (NEFF) will need to be generated by the compiler.\nUpon each subsequent call to a model, the application will re-execute the\npython, rebuild the graph, and check a cache to see if an existing NEFF file\nis available for the given graph before attempting to recompile.\n\nThe process of recording graph operations in python can become a bottleneck for\notherwise fast models. This overhead will always have an effect on performance\nregardless of model size but may be less noticeable on larger models. Note that\nthis XLA |LazyTensor| execution performance may improve significantly with new\ntorch features in the future.\n\nExample\n~~~~~~~\n\n.. tab-set::\n\n    .. tab-item:: Fixed Shape Example\n\n        .. 
code-block:: python\n\n            import torch\n            import torch_neuronx\n            import torch_xla.core.xla_model as xm\n\n            # Create XLA device\n            device = xm.xla_device()\n\n            # Load example model and inputs to Neuron device\n            model = torch.nn.Sequential(\n                torch.nn.Linear(784, 120),\n                torch.nn.ReLU(),\n                torch.nn.Linear(120, 10),\n                torch.nn.Softmax(dim=-1),\n            )\n            model.eval()\n            model.to(device)\n            example = torch.rand((1, 784), device=device)\n\n            # Inference\n            with torch.no_grad():\n                result = model(example)\n                xm.mark_step()  # Compilation occurs here\n                print(result.cpu())\n\n\n    .. tab-item:: Dynamic Shape Example\n\n        The following is an example of a model that dynamically changes the\n        sequence length and batch size of the input token ID tensor to trigger\n        recompilations. This kind of workflow would require padding when using\n        traced inference.\n\n        .. code-block:: python\n\n            import torch\n            import torch_neuronx\n            import torch_xla.core.xla_model as xm\n\n            # Create XLA device\n            device = xm.xla_device()\n\n            # Load example model and inputs to Neuron device\n            model = torch.nn.Sequential(\n                torch.nn.Embedding(num_embeddings=30522, embedding_dim=512),\n                torch.nn.Linear(512, 128),\n                torch.nn.ReLU(),\n                torch.nn.Linear(128, 2),\n                torch.nn.Softmax(dim=-1),\n            )\n            model.eval()\n            model.to(device)\n\n            token_ids_1 = torch.tensor([\n                [1, 28, 748, 0],\n            ]) # shape: [1, 4]\n            token_ids_2 = torch.tensor([\n                [1, 13087, 10439, 1990, 18912, 0],\n                [1, 12009, 7849, 2509, 3500, 0],\n            ])  # shape: [2, 6]\n\n            # Inference\n            with torch.no_grad():\n\n                # First compilation/inference\n                result = model(token_ids_1)\n                xm.mark_step()\n                print(result.cpu())  # shape: [1, 4, 2]\n\n                # Recompilation occurs here since token_ids_2 is a different shape. This infer\n                # would have failed if the model had been traced with shape [1, 4]\n                result = model(token_ids_2)\n                xm.mark_step()\n                print(result.cpu())  # shape: [2, 6, 2]\n\n\n\nTraced Inference Mechanics\n--------------------------\nTraced inference uses Ahead-Of-Time (AOT) compilation for Neuron execution.\n\nSimilar to XLA |LazyTensor| inference, |trace| uses the operation recording\nmechanisms provided by ``torch-xla`` to build the graph structure. This graph\nstructure is also sent to the |neuronx-cc| compiler to produce a binary (NEFF)\nthat is executable on Neuron.\n\nThe main difference is that the call to |trace| returns a *new* fully\ncompiled graph as a `TorchScript`_ Module. Upon calling this new Module, rather\nthan re-executing the python, rebuilding the graph, and checking\nthe cache for a matching model, the new Module simply executes the precompiled\ngraph that was preloaded during tracing. 
This is a significantly\nmore optimized runtime since it avoids the python operator tracing, graph\nbuilding, etc.\n\nOne disadvantage of this interface is that a model will never dynamically\nrecompile after a trace. This means that dynamic control flow is not supported\nwithin a function/module. Tensor input/output shapes are fixed to the shapes\npassed to the |trace| API. Dynamic batching and bucketing can be used to avoid\nthe pitfalls of static shapes.\n\nExample\n~~~~~~~\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    # Create example model and inputs\n    model = torch.nn.Sequential(\n        torch.nn.Linear(784, 120),\n        torch.nn.ReLU(),\n        torch.nn.Linear(120, 10),\n        torch.nn.Softmax(dim=-1),\n    )\n    model.eval()\n    example = torch.rand((1, 784))\n\n    # Create fixed model trace\n    trace = torch_neuronx.trace(model, example)\n\n    # Inference\n    result = trace(example) # No recompilation. Input shapes must not change\n    print(result)\n\n\n\nTraced Inference Advantages\n---------------------------\n\nTraced inference should be used for nearly all deployment purposes since it\nprovides some key advantages over XLA |LazyTensor| execution:\n\n- **Reduced Overhead**: There is no overhead associated with graph recording,\n  compilation, and model loading since these steps are performed only once\n  within the call to |trace|. In contrast, when using XLA |LazyTensor|\n  inference, all of these steps are performed just-in-time (with caching to\n  improve performance).\n- **Serializable**: The TorchScript Module that is produced from the |trace| API\n  is serializable using the normal :func:`torch.jit.save` function. It is able\n  to be reloaded in an inference environment with :func:`torch.jit.load`.\n  In contrast, XLA device inference does not provide a predetermined\n  serialization format that includes the pre-compiled NEFF artifacts. These\n  must be manually copied to an inference environment to be used.\n- **Reduced Dependencies**: When using the traced TorchScript Module in an\n  inference environment, it is no longer required to install the\n  |neuronx-cc| compiler. In contrast, when using the XLA |LazyTensor|\n  execution, an execution may require a recompile to successfully infer.\n- **Static & Predictable**: The resulting module produced by |trace| will\n  contain a static model that will consume a predictable amount of Neuron device\n  memory and will never require recompilation based on input changes. In\n  contrast, since XLA device inference performs just-in-time compilation, it\n  can be more difficult to predict memory utilization and the compilations\n  that may be required at inference time.\n- **C++ Usability**: If the end application is an inference platform using\n  ``libtorch``, it is easy to integrate with ``libtorchneuron`` to load\n  traced modules. 
It is not currently possible to set up an environment to use\n  torch in C++ in conjunction with Neuron XLA |LazyTensor| execution.\n\nTensor Materialization During Tracing\n---------------------------------------\n\nWhile tensor materialization is normal for the JIT workflow, it is not expected during traced inference.\nWhen it happens during a trace, the graph is compiled based on the values of the example input tensors, which can lead to unexpected program behavior.\nUse PyTorch/XLA's debugging flags to identify when unexpected tensor materialization happens, then make the appropriate code changes to avoid it.\n\n\nA common issue occurs when tensor values are evaluated during model compilation (traced inference). Consider this example:\n\n.. code-block:: python\n\n   def forward(self, tensor):\n       if tensor[0] == 1:\n           return tensor\n       else:\n           return tensor * 2\n\nWhile this code can compile and run, it may lead to unexpected behavior because:\n\n* The tensor value is being accessed during tracing (``tensor[0]``)\n* The resulting graph becomes fixed based on the tensor value available during tracing\n* Developers might incorrectly assume the condition will be evaluated dynamically during inference\n* The solution for the code above is to utilize the debugging flags below to catch the issue and modify the code\n\nSee the updated code without tensor materialization:\n\n.. code-block:: python\n\n  class TestModel(torch.nn.Module):\n      def __init__(self, flag=1):\n          super().__init__()\n          # the flag should be pre-determined based on the model configuration\n          # it should not be an input of the model during runtime\n          self.flag = flag\n\n      def forward(self, tensor):\n          if self.flag:\n              return tensor\n          else:\n              return tensor * 2\n\nDebugging Flags\n~~~~~~~~~~~~~~~~\n\nTo help catch tensor materialization issues, PyTorch/XLA provides two useful approaches (a combined usage sketch follows below):\n\n1. Enable warning messages for tensor materialization:\n\n.. code-block:: python\n\n   import os\n   os.environ['PT_XLA_DEBUG_LEVEL'] = '2'\n\n2. Disable graph execution to catch issues during development:\n\n.. code-block:: python\n\n   import torch_xla\n   torch_xla._XLAC._set_allow_execution(False)\n\n
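The following is a minimal sketch of how these two flags might be combined when tracing a model that is suspected of materializing tensors; the ``SuspectModel`` module and its input are hypothetical placeholders used only for illustration.\n\n.. code-block:: python\n\n   import os\n\n   # Set the debug level before torch_xla is imported so that\n   # materialization warnings are enabled for the whole session.\n   os.environ['PT_XLA_DEBUG_LEVEL'] = '2'\n\n   import torch\n   import torch_xla\n   import torch_neuronx\n\n   # Hypothetical module that materializes a tensor value in forward()\n   class SuspectModel(torch.nn.Module):\n       def forward(self, tensor):\n           if tensor[0, 0] == 1:  # forces evaluation of an XLA tensor\n               return tensor\n           return tensor * 2\n\n   # Optionally turn any graph execution during tracing into a hard error\n   torch_xla._XLAC._set_allow_execution(False)\n\n   model = SuspectModel().eval()\n   example = torch.rand(1, 8)\n\n   # With the flags above, the materialization in forward() should surface\n   # as a warning (debug level 2) or an error (execution disallowed).\n   trace = torch_neuronx.trace(model, example)\n\nRecommendations\n~~~~~~~~~~~~~~~~\n\nUsing these flags during development can help identify potential issues early in the development cycle. The recommended approach is to:\n\n* Use ``PT_XLA_DEBUG_LEVEL=2`` during initial development to identify potential materialization points\n* Apply ``_set_allow_execution(False)`` when you want to ensure no tensor materialization occurs during tracing\n* When you see warnings or errors related to tensor materialization, look into the code path and make appropriate changes. 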
The example above moved the flag to the ``__init__`` function which does not depend on the model input during runtime.\n\nFor more detailed debugging information, refer to the `XLA PyTorch on XLA Devices <https://github.com/pytorch/xla/blob/master/docs/source/learn/pytorch-on-xla-devices.md>`__.\n\n\nSummary\n-------\n\n+----------------+-----------------------+-------------------+\n|                | XLA Device Inference  | Traced Inference  |\n+================+=======================+===================+\n| Compilation    | JIT                   | AOT               |\n+----------------+-----------------------+-------------------+\n| Serialization  | N/A                   | `TorchScript`_    |\n+----------------+-----------------------+-------------------+\n| Performance    | Slower                | Faster            |\n+----------------+-----------------------+-------------------+\n| Dynamic        | Yes                   | No                |\n+----------------+-----------------------+-------------------+\n| C++ Usage      | No                    | Yes               |\n+----------------+-----------------------+-------------------+\n\n\n.. |LazyTensor| replace:: :ref:`Lazy Tensor <xla_lazytensor>`\n.. |trace| replace:: :func:`~torch_neuronx.trace`\n.. |device| replace:: :code:`xm.xla_device()`\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n.. _TorchScript: https://pytorch.org/docs/stable/jit.html\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/torch-neuronx-profiling-dev-guide.rst",
    "content": ".. _torch-neuronx-dev-guide:\n\n\n.. meta::\n   :description: Developer Guide for Profiling with PyTorch NeuronX - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, profiling, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nDeveloper Guide for Profiling with PyTorch NeuronX \n===================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nIntroduction\n~~~~~~~~~~~~\n\nThe Neuron PyTorch profiler is a context manager wrapping around the entire model\nand training loop. Specifically this is the context manager:\n``torch_neuronx.experimental.profiler.profile``. This is a wrapper of\nthe XLA Debug Profiler which we imported earlier as\n``import torch_xla.debug.profiler as xp``, and is backwards-compatible.\nHere are the parameters of the profiler context manager:\n\n1. ``port``: Port to run the profiling GRPC server on. Default is 9012.\n2. ``profile_type``: There is “trace” and “operator”. “trace”\n   is the Torch Runtime Trace Level, while “operator” is the Model\n   Operator Trace Level.\n3. ``ms_duration``: This defines how long the profiler will capture the\n   HLO artifacts from the model to view in the profiler. The unit is in\n   milliseconds.\n4. ``neuron_tensorboard_plugin_dir``: The directory the neuron tensorboard plugin will file write to\n   (NB: Assumes that the tensorboard logdir=\"log/\")\n5. ``delete_working``: If set to False turns off the deletion of temporary files (default True)\n\nWe move the model to the xla device *inside the context manager.* This is important,\nas this allows the profiler to collect the operations and processes from the \n``neuronx-cc`` compiler artifacts. If the model is moved to the xla device outside of\nthe context manager, the profiling won't work.\n\n.. note::\n\n   The warnings about the ``XLA_IR_DEBUG`` and ``XLA_HLO_DEBUG``\n   env vars not being set can be ignored for the most part. This warning\n   only comes into play when compiling the model for Neuron outside of the\n   profiler context manager.\n\nAfter running this script, notice a ``./logs`` directory has been\ncreated. It contains the TensorBoard logs including the\nprofiler views.\n\n\nExample used in this guide\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe will use the following code sample to describe in detail how to use the Neuron PyTorch profiling API.\n\nPrerequisites\n^^^^^^^^^^^^^\n\n1. 
Initial `Trn1 setup for PyTorch\n   (torch-neuronx) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html>`__\n   has been done\n\nEnvironment\n^^^^^^^^^^^\n\n::\n\n   #activate python virtual environment and install tensorboard_plugin_neuron\n   source ~/aws_neuron_venv_pytorch_p38/bin/activate\n   pip install tensorboard_plugin_neuronx\n\n   #create work directory for the Neuron Profiling tutorials\n   mkdir -p ~/neuron_profiling_tensorboard_examples\n   cd ~/neuron_profiling_tensorboard_examples\n\nSetup\n^^^^^\n\nCreate a new working directory:\n\n::\n   \n   mkdir simple_demo\n   cd simple_demo\n\nSave the following code as ``demo.py``:\n\n::\n\n   import os\n\n   import torch\n   import torch.nn as nn\n   import torch.nn.functional as F\n\n   # XLA imports\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   import torch_xla.debug.profiler as xp\n\n   import torch_neuronx\n   from torch_neuronx.experimental import profiler\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--cache_dir=./compiler_cache\"\n\n   # Global constants\n   EPOCHS = 10\n\n   # Declare 3-layer MLP Model\n   class MLP(nn.Module):\n     def __init__(self, input_size = 10, output_size = 2, layers = [5, 5]):\n         super(MLP, self).__init__()\n         self.fc1 = nn.Linear(input_size, layers[0])\n         self.fc2 = nn.Linear(layers[0], layers[1])\n         self.fc3 = nn.Linear(layers[1], output_size)\n\n     def forward(self, x):\n         x = F.relu(self.fc1(x))\n         x = F.relu(self.fc2(x))\n         x = self.fc3(x)\n         return F.log_softmax(x, dim=1)\n\n\n   def main():\n       # Fix the random number generator seeds for reproducibility\n       torch.manual_seed(0)\n\n       # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)\n       device = xm.xla_device()\n\n       # Start the proflier context-manager\n       with torch_neuronx.experimental.profiler.profile(\n           port=9012,\n           profile_type='trace',\n           ms_duration=15000 ) as profiler:\n\n           # IMPORTANT: the model has to be transferred to XLA within\n           # the context manager, otherwise profiling won't work\n           model = MLP().to(device)\n           optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n           loss_fn = torch.nn.NLLLoss()\n\n           # start training loop\n           print('----------Training ---------------')\n           model.train()\n           for epoch in range(EPOCHS):\n               optimizer.zero_grad()\n               train_x = torch.randn(1,10).to(device)\n               train_label = torch.tensor([1]).to(device)\n               \n               #forward\n               loss = loss_fn(model(train_x), train_label)                \n               \n               #back\n               loss.backward()    \n               optimizer.step()\n               \n               # XLA: collect ops and run them in XLA runtime\n               xm.mark_step() \n\n       print('----------End Training ---------------')\n\n   if __name__ == '__main__':\n       main()\n\nThen run it!\n\n::\n\n    python demo.py\n\n.. _Tensorboard Interface Overview:\n\nViewing the Trace on TensorBoard\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo view the TensorBoard logs, run ``tensorboard --logdir=./logs``\n\n.. 
note:: \n\n   Depending on the TensorBoard version, ``--load_fast=false`` might be an additional\n   parameter to add to view the trace.\n\nTake note of the port (usually 6006) and enter ``localhost:<port>`` into\nthe local browser (assuming port forwarding is set up properly):\n\n|tensorboard-url-image|\n\nOnce ``localhost:<port>`` is entered, verify that the\n“NEURON” view is shown:\n\n|tensorboard-NEURON-header|\n\nIf “NEURON” isn’t shown on the\ntop left hand side, select “NEURON” from the drop down on the top right\nhand side.\n\n|tensorboard-NEURON-dropdown|\n\nOn the Left Hand Side, there are two dropdown menus: Run & Tool.\n\n|tensorboard-run-tool-dropdowns|\n\nThe Run dropdown would contain the Torch Runtime\nTrace and Operator Level Trace views; however, since we only ran the\n“trace” (i.e. Torch Runtime Trace Level), we’ll only see that log.\nThe Torch Runtime Trace views are simply dates in\n``year_month_day_hour_minute_second_millisecond`` format. The Tool\nDropdown only contains the “trace” option.\n\nThe trace view should look like this:\n\n|tensorboard-run-trace-original|\n\nLet’s zoom into the following section of the trace:\n\n|tensorboard-run-trace-selected-section|\n\nAfter zooming in, the trace should look like this:\n\n|tensorboard-run-trace-selected-section-zoomed|\n\nNotice on the top, there is a ``StepMarker`` process followed by a ``NeuronDevice Execution``\nprocess. This correlates to the ``xm.mark_step()`` call which executes\nthe collected graph of our model on Neuron. For the Operator Level Trace\n(“operator”), we’ll be profiling the model operators that occur on\nNeuron. In other words, the profiler will zoom into the\n``NeuronDevice Execution`` process if the user specifies\n``profile_type='operator'``.\n\nUsing Named Blocks for the Trace\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhat we've produced so far is the default behavior of the profiler; however,\nit would be more useful to profile specific blocks of our code to narrow down\nperformance bottlenecks. To do this, use the ``xp.Trace`` context manager.\nReplace the respective code in the training loop with the following:\n\n::\n\n   ...\n   optimizer.zero_grad()\n   train_x = torch.randn(1,10).to(device)\n   train_label = torch.tensor([1]).to(device)\n\n   with xp.Trace(\"model_build\"):\n       loss = loss_fn(model(train_x), train_label)\n   with xp.Trace(\"loss_backward\"):\n       loss.backward()\n   with xp.Trace(\"optimizer_step\"):\n       optimizer.step()\n\n   # XLA: collect ops and run them in XLA runtime\n   xm.mark_step()\n   ...\n\nRun the script, and follow the same TensorBoard steps. Afterwards, the\ntrace should look like this:\n\n|tensorboard-run-trace-selected-section-zoomed-named-traces|\n\nAs seen, the ``model_build``, ``loss_backward`` and ``optimizer_step``\nsections have been profiled.\n\n.. note::\n   If you are running your training script in a docker container, to\n   view TensorBoard, you should launch the docker container using the flag\n   ``--network host``, e.g. ``docker run --network host my_image:my_tag``\n\n\n\n.. |tensorboard-url-image| image:: /images/Neuron_Profiler_Tensorboard_Url.jpg\n\n.. |tensorboard-NEURON-header| image:: /images/Neuron_Profiler_Tensorboard_Header.jpg\n\n.. |tensorboard-NEURON-dropdown| image:: /images/Neuron_Profiler_Tensorboard_Dropdown.jpg\n\n.. |tensorboard-run-tool-dropdowns| image:: /images/Neuron_Profiler_Tensorboard_Run_Tool_Dropdowns.jpg\n\n.. 
|tensorboard-run-trace-original| image:: /images/Neuron_Profiler_Runtime_Trace_Original.jpg\n\n.. |tensorboard-run-trace-selected-section| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection.jpg\n\n.. |tensorboard-run-trace-selected-section-zoomed| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed.jpg\n\n.. |tensorboard-run-trace-selected-section-zoomed-named-traces| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed_Named_Traces.jpg\n\n.. |tensorboard-operator-framework-view| image:: /images/Neuron_Profiler_T1_Op_Framework_View.png\n\n.. |tensorboard-operator-hlo-view| image:: /images/Neuron_Profiler_T1_Op_HLO_View.png\n\n.. |tensorboard-operator-trace-view| image:: /images/Neuron_Profiler_T1_Op_Trace_View.png\n\n.. |tensorboard-operator-trace-fusion-simple| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Simple.png\n\n.. |tensorboard-operator-trace-fusion-complex| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Complex.png\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/training/index.rst",
    "content": "\n.. meta::\n   :description: Developer Guide  (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nDeveloper Guide  (``torch-neuronx``)\n====================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide\n    /frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-debug\n    /frameworks/torch/torch-neuronx/programming-guide/torch-neuronx-profiling-dev-guide\n\n\n.. dropdown::  Developer Guide\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    :open:\n\n    * :ref:`pytorch-neuronx-programming-guide`\n    * :ref:`pytorch-neuronx-debug`\n    * :ref:`torch-neuronx-dev-guide`\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-debug.rst",
    "content": ".. _pytorch-neuronx-debug:\n\n\n.. meta::\n   :description: How to debug models in PyTorch NeuronX - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, debugging, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nHow to debug models in PyTorch NeuronX\n=======================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nTorch-XLA evaluates operations lazily, which means it builds a symbolic\ngraph in the background and the graph is executed in hardware only when\nthe users request (print) for the output or a mark_step is encountered.\nTo effectively debug training scripts with torch-xla, please use one of\nthe approaches mentioned below:\n\n**Printing metrics**\n~~~~~~~~~~~~~~~~~~~~\n\nTorch-xla provides a utility that records metrics of different sections\nof the code. These metrics can help figure out things like: How much\ntime is spent in compilation? How much time is spent in execution? To\ncheck the metrics:\n\n1. Import metrics: ``import torch_xla.debug.metrics as met``\n2. Print metrics at the end of the step: ``print(met.metrics_report())``\n\nPrinting metrics should produce an output that looks like this:\n\n.. code:: bash\n\n   Metric: CompileTime\n     TotalSamples: 1\n     Accumulator: 09s969ms486.408us\n     Percentiles: 1%=09s969ms486.408us; 5%=09s969ms486.408us; 10%=09s969ms486.408us; 20%=09s969ms486.408us; 50%=09s969ms486.408us; 80%=09s969ms486.408us; 90%=09s969ms486.408us; 95%=09s969ms486.408us; 99%=09s969ms486.408us\n   .....\n   Metric: ExecuteTime\n     TotalSamples: 1\n     Accumulator: 186ms062.970us\n     Percentiles: 1%=186ms062.970us; 5%=186ms062.970us; 10%=186ms062.970us; 20%=186ms062.970us; 50%=186ms062.970us; 80%=186ms062.970us; 90%=186ms062.970us; 95%=186ms062.970us; 99%=186ms062.970us\n   ....\n   Metric: TensorsGraphSize\n     TotalSamples: 1\n     Accumulator: 9.00\n     Percentiles: 1%=9.00; 5%=9.00; 10%=9.00; 20%=9.00; 50%=9.00; 80%=9.00; 90%=9.00; 95%=9.00; 99%=9.00\n   Metric: TransferFromServerTime\n     TotalSamples: 2\n     Accumulator: 010ms130.597us\n     ValueRate: 549ms937.108us / second\n     Rate: 108.372 / second\n     Percentiles: 1%=004ms948.602us; 5%=004ms948.602us; 10%=004ms948.602us; 20%=004ms948.602us; 50%=006ms181.995us; 80%=006ms181.995us; 90%=006ms181.995us; 95%=006ms181.995us; 99%=006ms181.995us\n   Metric: TransferToServerTime\n     TotalSamples: 6\n     Accumulator: 061ms698.791us\n     ValueRate: 007ms731.182us / second\n     Rate: 0.665369 / second\n     Percentiles: 1%=006ms848.579us; 5%=006ms848.579us; 10%=006ms848.579us; 20%=007ms129.666us; 50%=008ms940.718us; 80%=008ms496.166us; 90%=024ms636.413us; 95%=024ms636.413us; 99%=024ms636.413us\n   Metric: TransferToServerTransformTime\n     TotalSamples: 6\n     Accumulator: 011ms835.717us\n     ValueRate: 001ms200.844us / second\n     Rate: 0.664936 / second\n     Percentiles: 1%=108.403us; 5%=108.403us; 10%=108.403us; 20%=115.676us; 50%=167.399us; 80%=516.659us; 90%=010ms790.400us; 95%=010ms790.400us; 99%=010ms790.400us\n   .....\n   Counter: xla::_copy_from\n     Value: 7\n   Counter: xla::addmm\n     Value: 2\n   Counter: xla::empty\n     Value: 5\n   Counter: xla::t\n     Value: 2\n   ....\n   Metric: XrtCompile\n     TotalSamples: 1\n     Accumulator: 09s946ms607.609us\n     Mean: 09s946ms607.609us\n     StdDev: 000.000us\n     Percentiles: 25%=09s946ms607.609us; 50%=09s946ms607.609us; 80%=09s946ms607.609us; 90%=09s946ms607.609us; 95%=09s946ms607.609us; 99%=09s946ms607.609us\n   
Metric: XrtExecute\n     TotalSamples: 1\n     Accumulator: 176ms932.067us\n     Mean: 176ms932.067us\n     StdDev: 000.000us\n     Percentiles: 25%=176ms932.067us; 50%=176ms932.067us; 80%=176ms932.067us; 90%=176ms932.067us; 95%=176ms932.067us; 99%=176ms932.067us\n   Metric: XrtReadLiteral\n     TotalSamples: 2\n     Accumulator: 608.578us\n     Mean: 304.289us\n     StdDev: 067.464us\n     Rate: 106.899 / second\n     Percentiles: 25%=236.825us; 50%=371.753us; 80%=371.753us; 90%=371.753us; 95%=371.753us; 99%=371.753us\n\nAs seen, you can get useful information about graph compile\ntimes/execution times. You can also know which operators are present in\nthe graph, which operators are run on the CPU and which operators are run on an XLA device.\nFor example, operators that have a prefix ``aten::`` would run on the CPU, since they do not have\nxla lowering. All operators with prefix ``xla::`` would run on an XLA device. Note: aten operators\nthat do not have xla lowering would result in a graph fragmentation and might end up slowing down the\nentire execution. If you encounter such operators, create a request for operator support.\n\n**Printing tensors**\n~~~~~~~~~~~~~~~~~~~~\n\nUsers can print tensors in their script as below:\n\n.. code:: python\n\n   import os\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n\n   device = xm.xla_device()\n   input1 = torch.randn(2,10).to(device)\n   # Defining 2 linear layers\n   linear1 = torch.nn.Linear(10,30).to(device)\n   linear2 = torch.nn.Linear(30,20).to(device)\n\n   # Running forward\n   output1 = linear1(input1)\n   output2 = linear2(output1)\n   print(output2)\n\nSince torch-xla evaluates operations lazily, when you try to print\n``output2`` , the graph associated with the tensor would be evaluated.\nWhen a graph is evaluated, it is first compiled for the device and executed on\nthe selected device. Note: Each tensor would have a graph associated\nwith it and can result in graph compilations and executions. For\nexample, in the above script, if you try to print ``output1`` , a new\ngraph is cut and you would see another evaluation. To avoid multiple evaluations, you can make use of ``mark_step`` (next section).\n\n**Use mark_step**\n~~~~~~~~~~~~~~~~~\n\nTorch-XLA provides an api called ``mark_step`` which evaluates a graph\ncollected up to that point. While this is similar to printing of an output tensor\nwherein a graph is also evaluated, there is a difference. When\nan output tensor is printed, only the graph associated with that specific tensor is\nevaluated, whereas mark_step enables all the output tensors up to ``mark_step`` call to be evaluated\nin a single graph. Hence, any tensor print after ``mark_step`` would be\neffectively free of cost as the tensor values are already evaluated.\nConsider the example below:\n\n.. 
code:: python\n\n   import os\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   import torch_xla.debug.metrics as met\n\n   device = xm.xla_device()\n   input1 = torch.randn(2,10).to(device)\n   # Defining 2 linear layers\n   linear1 = torch.nn.Linear(10,30).to(device)\n   linear2 = torch.nn.Linear(30,20).to(device)\n\n   # Running forward\n   output1 = linear1(input1)\n   output2 = linear2(output1)\n   xm.mark_step()\n   print(output2)\n   print(output1)\n   # Printing the metrics to check if compilation and execution occurred\n   print(met.metrics_report())\n\nIn the printed metrics, the number of compiles and\nexecutions is only 1, even though 2 tensors are printed.\nHence, to avoid multiple graph evaluations, it is recommended that you\nvisualize tensors after a ``mark_step`` . You can also make use of the\n`add_step_closure <https://pytorch.org/xla/release/1.9/index.html#torch_xla.core.xla_model.add_step_closure>`__\napi for this purpose. With this api, you pass in the tensors that needs to\nbe visualized/printed. The added tensors would then be preserved in the\ngraph and can be printed as part of the callback function passed to the\napi. Here is a sample usage:\n`test_train_mp_mnist.py <https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist.py#L133>`__\n\n**Note:** Graph compilations can take time as the compiler optimizes the graph to run on device.\n\n**Using Eager Debug Mode**\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nEager debug mode provides a convenient utility to step through the code and evaluate operators one by one for correctness. Eager debug mode is useful to inspect your models the way you would do in eager-mode frameworks like PyTorch and Tensorflow. With Eager Debug Mode operations are executed eagerly. As soon as an operation is registered with torch-xla, it would be sent for compilation and\nexecution. Since compiling a single operation, the time spent\nwould be minimal. Moreover, the chances of hitting the framework compilation cache\nincreases as models would have repeated operations throughout.\nConsider example 1 below:\n\n.. code:: python\n\n   # Example 1\n\n   import os\n   # You need to set this env variable before importing torch-xla\n   # to run in eager debug mode.\n   os.environ[\"NEURON_USE_EAGER_DEBUG_MODE\"] = \"1\"\n\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   import torch_xla.debug.metrics as met\n\n   device = xm.xla_device()\n   input1 = torch.randn(2,10).to(device)\n   # Defining 2 linear layers\n   linear1 = torch.nn.Linear(10,30).to(device)\n   linear2 = torch.nn.Linear(30,20).to(device)\n\n   # Running forward\n   output1 = linear1(input1)\n   output2 = linear2(output1)\n\n   # Printing the metrics to check if compilation and execution occurred\n   # Here, in the metrics you should notice that the XRTCompile and XRTExecute\n   # value is non-zero, even though no tensor is printed. This is because, each\n   # operation is executed eagerly.\n   print(met.metrics_report())\n\n   print(output2)\n   print(output1)\n   # Printing the metrics to check if compilation and execution occurred.\n   # Here the XRTCompile count should be same as the previous count.\n   # In other words, printing tensors did not incur any extra compile\n   # and execution of the graph\n   print(met.metrics_report())\n\nAs seen from the above scripts, each operator is evaluated eagerly and\nthere is no extra compilation when output tensors are printed. 
Moreover, together with\nthe on-disk Neuron persistent cache, eager debug mode only incurs one time\ncompilation cost when the ops is first run. When the script is run again, the compiled ops will be\npulled from the persistent cache. Any changes you make to the\ntraining script would result in the re-compilation of only the newly\ninserted operations. This is because each operation is compiled\nindependently. Consider example 2 below:\n\n.. code:: python\n\n   # Example 2\n\n   import os\n   # You need to set this env variable before importing torch-xla\n   # to run in eager debug mode.\n   os.environ[\"NEURON_USE_EAGER_DEBUG_MODE\"] = \"1\"\n\n   import torch\n   import torch_xla\n   import torch_xla.core.xla_model as xm\n   import torch_xla.debug.metrics as met\n\n   os.environ['NEURON_CC_FLAGS'] = \"--log_level=INFO\"\n\n   device = xm.xla_device()\n   input1 = torch.randn(2,10).to(device)\n   # Defining 2 linear layers\n   linear1 = torch.nn.Linear(10,30).to(device)\n   linear2 = torch.nn.Linear(30,20).to(device)\n   linear3 = torch.nn.Linear(20,30).to(device)\n   linear4 = torch.nn.Linear(30,20).to(device)\n\n   # Running forward\n   output1 = linear1(input1)\n   output2 = linear2(output1)\n   output3 = linear3(output2)\n\n   # Note the number of compiles at this point and compare\n   # with the compiles in the next metrics print\n   print(met.metrics_report())\n\n   output4 = linear4(output3)\n   print(met.metrics_report())\n\nRunning the above example 2 script after running example 1 script, you may notice that from the start until the statement ``output2 = linear2(output1)`` ,\nall the graphs would hit the persistent cache. Executing the line\n``output3 = linear3(output2)`` would result in a new compilation for ``linear3`` layer only because the layer configuration is new.\nNow, when we run\n``output4 = linear4(output3)`` , you would observe no new compilation\nhappens. This is because the graph for ``linear4`` is same as the graph for\n``linear2`` and hence the compiled graph for ``linear2`` is reused for ``linear4`` by the framework's internal cache.\n\nEager debug mode avoids the wait times involved with tensor printing because of larger graph compilation.\nIt is designed only for debugging purposes, so when the training script is ready, please remove the ``NEURON_USE_EAGER_DEBUG_MODE`` environment\nvariable from the script in order to obtain optimal performance.\n\nBy default, in eager debug mode the\nlogging level in the Neuron compiler is set to error mode. Hence, no\nlogs would be generated unless there is an error. Before your first\nprint, if there are many operations that needs to be compiled, there\nmight be a small delay. In case you want to check the logs, switch on\nthe ``INFO`` logs for compiler using:\n\n.. code:: python\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--log_level=INFO\"\n\n**Profiling model run**\n~~~~~~~~~~~~~~~~~~~~~~~\n\nProfiling model run can help to identify different bottlenecks and\nresolve issues faster. You can profile different sections of the code to\nsee which block is the slowest. To profile model run, you can follow the\nsteps below:\n\n1. Add: ``import torch_xla.debug.profiler as xp``\n\n2. Start server. This can be done by adding the following line after\n   creating xla device: ``server = xp.start_server(9012)``\n\n3. In a separate terminal, start tensorboard. The logdir should be in\n   the same directory from which you run the script.\n\n   .. 
image:: /images/tensorboard.png\n      :alt: Image: tensorboard.png\n\n   Open the tensorboard on a browser. Go to profile section in the top\n   right. Note: you may have to install the profile plugin using:\n   ``pip install tensorboard-plugin-profile``\n\n4. When you click on the profile, it should give an option to capture\n   profile. Clicking on capture profile produces the following pop-up.\n\n   .. image:: /images/popup.png\n      :alt: Image: popup.png\n\n   In the URL enter: ``localhost:9012`` . Port in this URL should\n   be same as the one you gave when starting the server in the script.\n\n5. Once done, click capture and it should automatically load the\n   following page:\n\n   .. image:: /images/./tb_1.png\n      :alt: Image: tb_1.png\n\n6. To check the profile for different blocks of code, head to\n   ``trace_viewer`` under ``Tools`` (on the left column).\n\n   .. image:: /images/./options.png\n      :alt: Image: options.png\n\n7. It should show a profile that looks like this:\n\n   .. image:: /images/./profile_large.png\n      :alt: Image: profile_large.png\n\nNote: By default, torch-xla would time different blocks of code inside\nthe library. However, you can also profile block of code in your\nscripts. This can be done by adding the code within a ``xp.Trace``\ncontext as follows:\n\n.. code:: python\n\n   ....\n   for epoch in range(total_epochs):\n       inputs = torch.randn(1,10).to(device)\n       labels = torch.tensor([1]).to(device)\n       with xp.Trace(\"model_build\"):\n           loss = model(inputs, labels)\n       with xp.Trace(\"loss_backward\"):\n           loss.backward()\n   ....\n\nIt should produce a profile that has the ``model_build`` and\n``loss_backward`` section timed. This way you can time any block of\nscript for debugging.\n\n.. image:: /images/./profile_zoom.png\n   :alt: Image: Screen profile_zoom.png\n\nNote: If you are running your training script in a docker container, to view the\ntensorboard, you should launch the docker container using flag: ``--network host``\neg. ``docker run --network host my_image:my_tag``\n\n\n.. _torch-neuronx-snapshotting:\n\n**Snapshotting With Torch-Neuronx 2.1**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSnapshotting models can be used to dump debug information that can then be sent\nto the Neuron team. Neuron execution relies on a series of compiled graphs. Internally,\ngraph HLOs are used as an intermediate representation which is then compiled. Then, during\nexecution, the graph inputs are passed to the Neuron runtime, which produces\noutputs using the compiled graph. Snapshotting saves the inputs to a graph\nexecution, executes the graphs, saves the outputs of the execution, and then\nbundles and dumps the inputs, outputs and graph HLO in one file. This is\nillustrated here:\n\n.. image :: /images/./snapshot-diagram.png\n   :alt: Image: snapshot-diagram.png\n\nThis feature can be enabled using the following environment variables,\nwhich can be set at the beginning of your script as follows (``./dump`` is the snapshot\ndump directory that will be created):\n\n.. code:: python\n\n   ....\n   os.environ[\"XLA_FLAGS\"] = \"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n   ....\n\nThis environment variable will produce snapshots in the ``./dump``\nfolder with the extension ``.decomposed_hlo_snapshot``\nat every iteration for every process. For example, files that look like the following would\nbe produced.\n\n.. 
code:: bash\n\n   SyncTensorsGraph.27737-process000000-executable000003-device000000-execution000496.inputs.decomposed_hlo_snapshot\n\nNote that ``NEURON_FRAMEWORK_DEBUG`` does not need to be set, as in torch-neuronx 1.13. Also note that ``NEURON_DUMP_HLO_SNAPSHOT`` and ``NEURON_NC0_ONLY_SNAPSHOT`` environment variables used in torch-neuronx 1.13 are now no longer used to control snapshot dumping.\n\nSnapshots can take up a large amount of disk space. To avoid running out of disk space, you can limit the snapshoting for a certain rank, such as rank 0. The following example code would work with ``torchrun`` utility which sets the ``RANK`` environment variable for each process:\n\n.. code:: python\n\n    if os.environ.get(\"RANK\", \"0\") == \"0\":\n        os.environ[\"XLA_FLAGS\"]=\"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n\nor if not using torchrun:\n\n.. code:: python\n\n    import torch_xla.core.xla_model as xm\n\n    ....\n    if xm.is_master_ordinal():\n        os.environ[\"XLA_FLAGS\"]=\"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n    ....\n\nTorch-NeuronX 2.1+ provides a ``register_hlo_snapshot_callback`` API to allow more control over when to dump the snapshot.\nBy default, Torch-NeuronX 2.1+ includes the following callback function:\n\n.. code:: python\n\n    def _dump_hlo_snapshot_callback(name: str, addressable_device_index: int, execution_count: int) -> str:\n        return 'inputs'\n\nAs the return value is always 'inputs', the backend will always dump snapshot files containing HLO and input data only. Recognized return value keywords are 'inputs' and 'outputs'.  If the return value is an empty string '', then the backend will skip this dump. If the return value is 'inputs outputs', then the backend will dump two snapshot files for each execution, one holding inputs, and another one holding outputs.\n\nTo implement selective dumping, we can make use of the callback function's parameters name, addressable_device_index, execution_count , where:\n\n* ``name`` is a string that stands for the HLO graph's name.\n* ``addressable_device_index`` is an integer that refers to the index of the addressable Neuron device as one NEFF can load onto multiple addressable Neuron devices (NeuronCores) for SPMD executions. Note that this is not the same as the worker process rank in multi-process execution, in which ``RANK``/``xm.get_ordinal()`` or ``LOCAL_RANK``/``xm.get_local_ordinal()`` should be used. See examples above.\n* ``execution_count`` is an integer that indicates the value of an internal execution counter that increments by one for each execution of a compiled graph when HloSnapshot dumping is requested. Note that each compiled graph maintains multiple execution counters, one for each addressable device that it loads onto.\n\nFor example, the following will dump snapshot files containing outputs at execution #2 (Note that this is graph execution number, not the iteration or step; for iteration or step, see the next example):\n\n.. code:: python\n\n    def callback(name, addressable_device_index, execution_count):\n        if execution_count == 2:\n            return 'outputs'\n        else:\n            return ''\n\n    import libneuronxla\n    old_callback = libneuronxla.register_hlo_snapshot_callback(callback)\n\nCallback functions can be use to dump at a certain condition, such as when the global step count equal a value:\n\n.. 
code:: python\n\n    step = 0\n    def callback(name, addressable_device_index, execution_count):\n        if step == 5:\n            return 'inputs'\n        else:\n            return ''\n\n    import libneuronxla\n    old_callback = libneuronxla.register_hlo_snapshot_callback(callback)\n\n    ...\n    for epoch in range(EPOCHS):\n        for idx, (train_x, train_label) in enumerate(train_loader):\n            step += 1\n    ...\n\n.. note::\n\n   Snapshot dumping triggered by a runtime error such as NaN is not yet available. It will be available in a feature release.\n\n\n.. _torch-neuronx-snapshotting_1.13:\n\n**Snapshotting with Torch-Neuronx 1.13**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. note::\n\n   If you are using Torch-NeuronX 2.1, please see :ref:`torch-neuronx-snapshotting`\n\nWith Torch-Neuronx 1.13, the snapshotting feature can be enabled using the following environment variables,\nwhich can be set at the beginning of your script as follows.\n\n.. code:: python\n\n   ....\n   os.environ[\"XLA_FLAGS\"] = \" --xla_dump_to=dump\"\n   os.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\n   os.environ[\"NEURON_DUMP_HLO_SNAPSHOT\"] = \"1\"\n   ....\n\n\nThis set of environment variables will produce snapshots under the dump\nfolder with the extensions ``.hlo.snapshot.pb`` or ``.decomposed_hlo_snapshot``\nat every iteration. For example a file that looks like the following would\nbe produced.\n\n.. code:: bash\n\n   dump/module_SyncTensorsGraph.387.pid_12643.execution_7496.hlo_snapshot.pb\n\nThe dumping environment variable can be set and unset at specific\niterations as shown in the following example.\n\n.. code:: python\n\n    ....\n    for step in range(STEPS):\n        if step == 20:\n            os.environ[\"NEURON_DUMP_HLO_SNAPSHOT\"] = \"1\"\n        else:\n            os.environ.pop('NEURON_DUMP_HLO_SNAPSHOT', None)\n        train_x = torch.randn(BATCH_SIZE, 28, 28)\n        train_x = train_x.to(device)\n        loss = model(train_x)\n        loss.backward()\n        optimizer.step()\n        xm.mark_step()\n    ....\n\n\nAdditionally, we provide capabilities to snapshot graphs automatically.\nThe environment variables above can be set as follows:\n\n.. code:: python\n\n    ....\n    os.environ[\"XLA_FLAGS\"] = \" --xla_dump_to=dump\"\n    os.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\n    os.environ[\"NEURON_DUMP_HLO_SNAPSHOT\"] = \"ON_NRT_ERROR\"\n    ....\n\nWhen unexpected errors such as a graph execution producing NaNs occurs,\nsnapshots will be automatically produced and execution will be terminated.\nOccasionally, for larger models, automatic snapshotting may not capture\nsnapshots due to the device memory being exhausted. In this case, the above\nflag can be set to\n``os.environ[\"NEURON_DUMP_HLO_SNAPSHOT\"] = \"ON_NRT_ERROR_HYBRID\"``, this\nwill allocate memory for inputs on both the device and host memory.\nIn some additional cases, this may still go out of memory and may need to be\nset to ``os.environ[\"NEURON_DUMP_HLO_SNAPSHOT\"] = \"ON_NRT_ERROR_CPU\"`` to\navoid allocating any memory on the device at all for automatic snapshotting.\n\n**Snapshot FAQs:**\n--------------------\n\n**When should I use this features?**\n\nThis feature should be used when debugging errors that requires interfacing\nwith and providing debug data to the Neuron team. Snapshotting may be redundant\nand unnecessary in some situations. 
For example, when only the model weights are\nnecessary for debugging, methods such as checkpointing may be more convenient to use.\n\n**What sort of data is captured with these snapshots?**\n\nThe type of data captured by these snapshots may include model graphs in HLO form,\nweights/parameters, optimizer states, intermediate tensors and gradients.\nThis data may be considered sensitive and this should be taken into account before\nsending the data to the Neuron team.\n\n**What is the size of these snapshots?**\n\nThe size of snapshots can be significant for larger models such as GPT or BERT\nwith several GBs worth of data for larger graphs, so it is recommended to check\nthat sufficient disk space exists before using snapshotting. In addition, limiting\nthe amount of snapshots taken in a run will help to preserve disk space.\n\n**Will snapshotting add overhead to my execution?**\n\nSnapshotting does add a small overhead to the execution in most cases. This\noverhead can be significant if snapshots are dumped at every iteration. In\norder to alleviate some of this overhead, in the case that snapshotting is\nnot necessary on all cores the following environment variable can be set to\ncollect snapshots only on the first core in torch-neuronx 1.13:\n\n.. code:: python\n\n    ....\n    os.environ[\"NEURON_NC0_ONLY_SNAPSHOT\"] = \"1\"\n    ....\n\nIn torch-neuronx 2.1, use ``RANK`` environmental variable when using torchrun or ``xm.is_master_ordinal()`` to limit dumping to the first process (see above):\n\n.. code:: python\n\n    ....\n    if os.environ.get(\"RANK\", \"0\") == \"0\":\n        os.environ[\"XLA_FLAGS\"]=\"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n    ....\n\nor (not using torchrun):\n\n.. code:: python\n\n    import torch_xla.core.xla_model as xm\n\n    ....\n    if xm.is_master_ordinal():\n        os.environ[\"XLA_FLAGS\"]=\"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n    ....\n\nIn addition, checkpointing in tandem\nwith snapshotting can be useful to reduce overhead. A checkpoint close to\nthe problem iteration can be captured, then execution resumed with\nsnapshots enabled.\n\n**How can I share snapshots with the Neuron team?**\n\nThese snapshots can be shared with the Neuron team via S3 bucket.\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide.rst",
    "content": ".. _pytorch-neuronx-programming-guide:\n\n\n.. meta::\n   :description: Developer Guide for Training with PyTorch NeuronX - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nDeveloper Guide for Training with PyTorch NeuronX \n===================================================\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nTrainium is designed to speed up model training and reduce training cost. It is available on the Trn1 and Trn2 instances. On Trn1, each Trainium accelerator has two NeuronCores (default two Logical NeuronCores), which are the main neural network compute units. On Trn2, each Trainium accelerator has 8 physical NeuronCores, configured as 4 Logical NeuronCores by default (LNC=2). The only supported LNC values are 1 and 2. The examples in this guide apply to Trn1. They can be extended to run on Trn2.\n\n.. important::\n   Currently, Neuron does not support setting the number of Logical NeuronCores (LNC) to a value of 8.\n\nPyTorch NeuronX enables PyTorch users to train their models on Trainium's\nNeuronCores with little code change to their training code. It is based\non the `PyTorch/XLA software package <https://pytorch.org/xla>`__.\n\nThis guide helps you get started with single-worker training and\ndistributed training using PyTorch Neuron.\n\nPyTorch NeuronX\n----------------\n\nNeuron XLA device\n~~~~~~~~~~~~~~~~~\n\nWith PyTorch NeuronX the default XLA device is mapped to a :ref:`Logical NeuronCore<logical-neuroncore-config>`. By default, one Logical NeuronCore is configured by a process. To use the Neuron XLA device, specify\nthe device as ``xm.xla_device()`` or ``'xla'``:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n   device = xm.xla_device()\n\nor\n\n.. code:: python\n\n   device = 'xla'\n\nPyTorch models and tensors can be mapped to the device as usual:\n\n.. code:: python\n\n   model.to(device)\n   tensor.to(device)\n\nTo move tensor back to CPU, do :\n\n.. code:: python\n\n   tensor.cpu()\n\nor\n\n.. code:: python\n\n   tensor.to('cpu')\n\nPyTorch NeuronX single-worker training/evaluation quick-start\n--------------------------------------------------------------\n\nPyTorch NeuronX uses XLA to enable conversion of\nPyTorch operations to Trainium instructions. To get started on PyTorch\nNeuronX, first modify your :ref:`training script <neuronx-mlp-training-tutorial>` to\nuse XLA in the same manner as described in `PyTorch/XLA\ndocumentation <https://pytorch.org/xla>`__ and\nuse XLA device:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n\n   device = xm.xla_device()\n   # or\n   device = 'xla'\n\nThe Logical NeuronCore is mapped to an XLA device. On Trainium instance, the XLA device is automatically mapped to the first available Logical NeuronCore. You can use :ref:`NEURON_RT_VISIBLE_CORES<nrt-configuration>` to select specific Logical NeuronCore to use.\n\nBy default the above steps will enable the training or evaluation script to run on one Logical\nNeuronCore. NOTE: Each process is mapped to one NeuronCore.\n\nFinally, add ``mark_step`` at the end of the training or evaluation step to compile\nand execute the training or evaluation step:\n\n.. code:: python\n\n   xm.mark_step()\n\nThese changes can be placed in control-flows in order to keep the script\nthe same between PyTorch Neuron and CPU/GPU. 
For example, you can use an\nenvironment variable to disable XLA, which would cause the script to run\nin PyTorch native mode (using CPU on Trainium instances and GPU on GPU\ninstances):\n\n.. code:: python\n\n   device = 'cpu'\n   if not os.environ.get(\"DISABLE_XLA\", None):\n       device = 'xla'\n\n   ...\n\n       # end of training step \n       if not os.environ.get(\"DISABLE_XLA\", None):\n           xm.mark_step()\n\nMore on the need for mark_step can be found in `Understand the lazy mode in\nPyTorch Neuron <#understand-the-lazy-mode-in-pytorch-neuronx>`__.\n\nFor a full runnable example, please see the :ref:`Single-worker MLP training\non Trainium tutorial\n<neuronx-mlp-training-tutorial>`.\n\nPyTorch NeuronX multi-worker data parallel training using torchrun\n--------------------------------------------------------------------\n\nData parallel training allows you to replicate your script across\nmultiple workers, each worker processing a proportional portion of the\ndataset, in order to train faster.\n\nTo run multiple workers in a data parallel configuration, with each worker\nusing one NeuronCore, first add additional imports for the parallel\ndataloader and multi-processing utilities:\n\n::\n\n   import torch_xla.distributed.parallel_loader as pl\n\nNext, we initialize the Neuron distributed context using the XLA backend for torch.distributed:\n\n::\n\n    import torch_xla.distributed.xla_backend\n    torch.distributed.init_process_group('xla')\n\nNext, replace the ``optimizer.step()`` function call with\n``xm.optimizer_step(optimizer)``, which adds gradient synchronization\nacross workers before taking the optimizer step:\n\n::\n\n   xm.optimizer_step(optimizer)\n\nIf you're using a distributed dataloader, wrap your dataloader in\nPyTorch/XLA's ``MpDeviceLoader`` class, which provides buffering\nto hide CPU to device data load latency:\n\n::\n\n   parallel_loader = pl.MpDeviceLoader(dataloader, device)\n\nWithin the training code, use ``xm.xrt_world_size()`` to get the world size,\nand ``xm.get_ordinal()`` to get the global rank of the current process (a short sketch appears at the end of this section).\n\nThen use the `PyTorch\ntorchrun <https://pytorch.org/docs/stable/elastic/run.html#launcher-api>`__\nutility to run the script. For example, to run 32-worker data parallel\ntraining on trn1.32xlarge:\n\n``torchrun --nproc_per_node=32 <script and options>``\n\nTo run on multiple instances, make sure to use trn1.32xlarge instances\nand use all 32 NeuronCores on each instance. For example, with two instances, \non the rank-0 Trn1 host, run with --node_rank=0 using the torchrun utility:\n\n.. code:: shell\n\n    torchrun --nproc_per_node=32 --nnodes=2 --node_rank=0 --master_addr=<root IP> --master_port=<root port> <script and options>\n\nOn another Trn1 host, run with --node_rank=1:\n\n.. code:: shell\n\n    torchrun --nproc_per_node=32 --nnodes=2 --node_rank=1 --master_addr=<root IP> --master_port=<root port> <script and options>\n\nIt is important to launch the rank-0 worker with --node_rank=0 to avoid a hang.\n\nFor trn2.48xlarge, use ``--nproc_per_node=64`` for the default of 64 Logical NeuronCores (each Logical NeuronCore using two physical NeuronCores).\n\nTo train on multiple instances, it is recommended to use a ParallelCluster. 
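\n\nAs an illustration of the ``xm.xrt_world_size()`` and ``xm.get_ordinal()`` calls mentioned earlier in this section, the following minimal sketch shards a dataset across data-parallel workers using PyTorch's ``DistributedSampler``. The ``train_dataset`` and ``BATCH_SIZE`` names are placeholders for your own objects, and your script's setup may differ:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n   from torch.utils.data import DataLoader, DistributedSampler\n\n   world_size = xm.xrt_world_size()  # total number of data-parallel workers\n   rank = xm.get_ordinal()           # global rank of the current process\n\n   # Give each worker a distinct shard of the (placeholder) dataset.\n   sampler = DistributedSampler(train_dataset, num_replicas=world_size, rank=rank)\n   train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, sampler=sampler)\n\n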
For a ParallelCluster example, please see `Train a model on AWS Trn1 ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`__.\n\nMore information about torchrun can be found in the PyTorch documentation at\nhttps://pytorch.org/docs/stable/elastic/run.html#launcher-api .\n\nSee the :ref:`Multi-worker data-parallel MLP training using torchrun\ntutorial <neuronx-mlp-training-tutorial>`\nfor a full example.\n\n\nChecking the device kind\n------------------------\n\nTo find out the device kind that the application is running on, use ``torch_xla.core.xla_model.xla_device_kind()``. The returned string can be ``NC_v2`` for Trainium1, ``NC_v3`` for the Trainium2 LNC=1 configuration, or ``NC_v3d`` for the Trainium2 LNC=2 configuration. See :ref:`Logical NeuronCore<logical-neuroncore-config>` for more information about LNC configuration.\n\nExample:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n\n   devkind = xm.xla_device_kind()\n   print(devkind)\n\nOutput on trn1.32xlarge:\n\n.. code:: bash\n\n   NC_v2\n\nChecking the number of devices\n------------------------------\n\nTo find out the number of devices available on the EC2 instance, use ``torch_xla.core.xla_model.get_xla_supported_devices()``, which returns a list of devices:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n\n   devices = xm.get_xla_supported_devices()\n   print(len(devices))\n   print(devices)\n\nOutput on trn1.32xlarge:\n\n.. code:: bash\n\n    32\n    ['xla:0', 'xla:1', 'xla:2', 'xla:3', 'xla:4', 'xla:5', 'xla:6', 'xla:7', 'xla:8', 'xla:9', 'xla:10', 'xla:11', 'xla:12', 'xla:13', 'xla:14', 'xla:15', 'xla:16', 'xla:17', 'xla:18', 'xla:19', 'xla:20', 'xla:21', 'xla:22', 'xla:23', 'xla:24', 'xla:25', 'xla:26', 'xla:27', 'xla:28', 'xla:29', 'xla:30', 'xla:31']\n\n\nChecking the platform type\n--------------------------\n\nTo get the EC2 instance's platform type string, such as ``trn1``, ``inf2``, or ``trn2``, use ``torch_neuronx.utils.get_platform_target()``:\n\n.. code:: python\n\n   from torch_neuronx.utils import get_platform_target\n\n   platform = get_platform_target()\n   print(platform)\n\nOutput on trn1.32xlarge:\n\n.. code:: bash\n\n   trn1\n\n\nConversion from Distributed Data Parallel (DDP) application\n-----------------------------------------------------------\n\nDistributed Data Parallel (DDP) in the torch.distributed module is a wrapper\nto help convert single-worker training to distributed training. To\nconvert a torch.distributed Distributed Data Parallel (DDP)\napplication to PyTorch Neuron, first convert the application back to\nsingle-worker training, which simply involves removing the DDP wrapper,\nfor example ``model = DDP(model, device_ids=[rank])``. After this,\nfollow the previous section to change to multi-worker training.\n\nPyTorch NeuronX environment variables\n--------------------------------------\n\nEnvironment variables allow modifications to PyTorch Neuron behavior\nwithout requiring code changes to the user script. See :ref:`PyTorch Neuron environment variables <pytorch-neuronx-envvars>` for more details.\n\nNeuron Persistent Cache for compiled graphs\n-------------------------------------------\n\nSee :ref:`Neuron Persistent Cache for compiled graphs <neuron-caching>`\n\nNumber of graphs\n-----------------\n\nPyTorch/XLA converts PyTorch's eager mode execution to lazy-mode\ngraph-based execution. During this process, there can be multiple graphs\ncompiled and executed if there are extra mark-steps or functions with\nimplicit mark-steps. 
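For example (a minimal sketch; ``STEPS``, ``model``, ``optimizer``, and ``train_x`` are placeholders and not part of the original tutorial), an extra materialization point inside the loop, such as printing a lazy tensor, triggers an additional compilation-and-execution on top of the one initiated by ``xm.mark_step()``:\n\n.. code:: python\n\n   for step in range(STEPS):\n       optimizer.zero_grad()\n       loss = model(train_x)   # lazy: operations are only recorded here\n       loss.backward()\n       optimizer.step()\n       print(loss)             # materializes the tensor -> extra compile/execute\n       xm.mark_step()          # end-of-step compile/execute\n\nWrapping such prints in ``xm.add_step_closure()``, as described in the tips below, avoids the extra compilation-and-execution. 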
Additionally, more graphs can be generated if there\nare different execution paths taken due to control-flows.\n\nFull BF16 with stochastic rounding enabled\n------------------------------------------\n\nPreviously, on torch-neuronx 2.1 and earlier, the environment variables ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` provided full casting to BF16 with stochastic rounding enabled by default. These environment variables are deprecated in torch-neuronx 2.5, although still functional with warnings. To replace ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` with stochastic rounding on Neuron, set ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1`` and use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type BF16 as follows:\n\n.. code:: python\n\n    os.environ[\"NEURON_RT_STOCHASTIC_ROUNDING_EN\"] = \"1\"\n\n    # model is created\n    model.to(torch.bfloat16)\n\nStochastic rounding is needed to enable faster convergence for a full BF16 model.\n\nIf the loss is to be kept in FP32, initialize it with ``dtype=torch.float`` as follows:\n\n.. code:: python\n\n    running_loss = torch.zeros(1, dtype=torch.float).to(device)\n\nSimilarly, if the optimizer states are to be kept in FP32, convert the gradients to FP32 before optimizer computations:\n\n.. code:: python\n\n    grad = p.grad.data.float()\n\nFor a full example, please see the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`, which has been updated to use ``torch.nn.Module.to`` instead of ``XLA_DOWNCAST_BF16``.\n\nBF16 in GPU-compatible mode without stochastic rounding enabled\n---------------------------------------------------------------\n\nFull BF16 training in GPU-compatible mode would enable faster convergence without the need for stochastic rounding, but would require an FP32 copy of weights/parameters to be saved and used in the optimizer. To enable BF16 in GPU-compatible mode without stochastic rounding enabled, use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type bfloat16 as follows, without setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``:\n\n.. code:: python\n\n    # model is created\n    model.to(torch.bfloat16)\n\nIn the initializer of the optimizer, for example AdamW, you can add code like the following code snippet to make an FP32 copy of weights:\n\n.. code:: python\n\n        # keep a copy of weights in highprec\n        self.param_groups_highprec = []\n        for group in self.param_groups:\n            params = group['params']\n            param_groups_highprec = [p.data.float() for p in params]\n            self.param_groups_highprec.append({'params': param_groups_highprec})\n\nIn the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`, this mode can be enabled by passing the ``--optimizer=AdamW_FP32ParamsCopy`` option to ``dp_bert_large_hf_pretrain_hdf5.py`` and setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0`` (or leaving it unset).\n\n.. _automatic_mixed_precision_autocast:\n\nBF16 automatic mixed precision using PyTorch Autocast\n-----------------------------------------------------\n\nBy default, the compiler automatically casts internal FP32 operations to\nBF16. You can disable this and allow PyTorch's BF16 automatic mixed precision function (``torch.autocast``) to\ndo the casting of certain operations to operate in BF16.\n\nTo enable PyTorch's BF16 mixed-precision, first turn off the Neuron\ncompiler auto-cast:\n\n.. code:: python\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--auto-cast=none\"\n\nNext, per the recommendation from the official PyTorch `torch.autocast documentation <https://pytorch.org/docs/stable/amp.html#autocasting>`__, place only\nthe forward-pass of the training step in the ``torch.autocast`` scope with ``xla`` device type:\n\n.. code:: python\n\n   with torch.autocast(dtype=torch.bfloat16, device_type='xla'):\n       # forward pass\n\nThe device type is XLA because we are using PyTorch-XLA's autocast backend. The PyTorch-XLA `autocast mode source code <https://github.com/pytorch/xla/blob/master/torch_xla/csrc/autocast_mode.cpp>`_ lists which operations are cast to lower precision BF16 (\"lower precision fp cast policy\" section), which are maintained in FP32 (\"fp32 cast policy\"), and which are promoted to the widest input types (\"promote\" section).\n\nExample showing the original training code snippet:\n\n.. code:: python\n\n   def train_loop_fn(train_loader):\n       for i, data in enumerate(train_loader):\n           inputs = data[0]\n           labels = data[3]\n           outputs = model(inputs, labels=labels)\n           loss = outputs.loss / flags.grad_acc_steps\n           loss.backward()\n           optimizer.step()\n           xm.mark_step()\n\nThe following shows the training loop modified to use BF16 autocast:\n\n.. code:: python\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--auto-cast=none\"\n\n   def train_loop_fn(train_loader):\n       for i, data in enumerate(train_loader):\n           torch.cuda.is_bf16_supported = lambda: True\n           with torch.autocast(dtype=torch.bfloat16, device_type='xla'):\n               inputs = data[0]\n               labels = data[3]\n               outputs = model(inputs, labels=labels)\n           loss = outputs.loss / flags.grad_acc_steps\n           loss.backward()\n           optimizer.step()\n           xm.mark_step()\n\nFor a full example of BF16 mixed-precision, see :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) <hf-bert-pretraining-tutorial>`.\n\nSee the official PyTorch documentation for more details about\n`torch.autocast <https://pytorch.org/docs/stable/amp.html#autocasting>`__.\n\nTips and Best Practices\n-----------------------\n\nUnderstand the lazy mode in PyTorch NeuronX\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOne significant difference between PyTorch NeuronX and native PyTorch is\nthat the PyTorch NeuronX system runs in lazy mode while the native\nPyTorch runs in eager mode. Tensors in lazy mode are placeholders for\nbuilding the computational graph until they are materialized after the\ncompilation and evaluation are complete. The PyTorch NeuronX system\nbuilds the computational graph on the fly when you call PyTorch APIs to\nbuild the computation using tensors and operators. The computational\ngraph gets compiled and executed when ``xm.mark_step()`` is called\nexplicitly or implicitly by ``pl.MpDeviceLoader/pl.ParallelLoader``, or\nwhen you explicitly request the value of a tensor such as by calling\n``loss.item()`` or ``print(loss)``.\n\n.. _minimize-the-number-of-compilation-and-executions:\n\nMinimize the number of compilation-and-executions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor best performance, you should keep in mind the possible ways to\ninitiate compilation-and-executions as described in `Understand the lazy\nmode in PyTorch/XLA <#understand-the-lazy-mode-in-pytorch-neuronx>`__ and\nshould try to minimize the number of compilation-and-executions.\nIdeally, only one compilation-and-execution is necessary per training\niteration and is initiated automatically by\n``pl.MpDeviceLoader/pl.ParallelLoader``. The ``MpDeviceLoader`` is\noptimized for XLA and should always be used if possible for best\nperformance. During training, you might want to examine some\nintermediate results such as loss values. In such a case, the printing of\nlazy tensors should be wrapped using ``xm.add_step_closure()`` to avoid\nunnecessary compilation-and-executions.\n\nAggregate the data transfers between host CPUs and devices\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor best performance, you may try to aggregate the data transfers between host CPUs and devices.\nFor example, increasing the value of the `batches_per_execution` argument when instantiating ``MpDeviceLoader`` can help increase performance for certain workloads with frequent host-device traffic, such as ViT, as described in `a blog <https://towardsdatascience.com/ai-model-optimization-on-aws-inferentia-and-trainium-cfd48e85d5ac>`_. NOTE: Increasing the `batches_per_execution` value delays the mark-step across the number of batches specified by this value, increasing graph size, and could lead to an out-of-memory (device OOM) error.\n\nEnsure common initial weights across workers\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo achieve best accuracy during data parallel training, all workers need\nto have the same initial parameter states. This can be achieved by using\nthe same seed across the workers. In the case of the HuggingFace library,\nthe set_seed function can be used.\n(`pytorch/xla#3216 <https://github.com/pytorch/xla/issues/3216>`__).\n\nUse PyTorch/XLA's model save function\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo avoid problems with saving and loading checkpoints, make sure you use\nPyTorch/XLA's model save function to properly checkpoint your model. For\nmore information about the function, see\n`torch_xla.core.xla_model.save <https://pytorch.org/xla/release/1.9/index.html#torch_xla.core.xla_model.save>`__\nin the *PyTorch on XLA Devices* documentation.\n\nWhen training using multiple devices, ``xla_model.save`` can result in high host memory usage. If you see such high usage\ncausing the host to run out of memory, please use `torch_xla.utils.serialization.save <https://pytorch.org/xla/release/1.9/index.html#torch_xla.utils.serialization.save>`__ .\nThis saves the model in a serialized manner. When saved using the ``serialization.save`` API, the model should\nbe loaded using the ``serialization.load`` API. More information is available here: `Saving and Loading Tensors <https://pytorch.org/xla/release/1.9/index.html#saving-and-loading-xla-tensors>`__\n\n\nDebugging and troubleshooting\n-----------------------------\n\nTo debug on PyTorch Neuron, follow the :doc:`debug guide <pytorch-neuron-debug>`.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators.rst",
    "content": ".. _pytorch-neuron-supported-operators:\n\n\n.. meta::\n   :description: PyTorch Neuron (``torch-neuronx``) - Supported Operators - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nPyTorch Neuron (``torch-neuronx``) - Supported Operators\n========================================================\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nOperator support\n~~~~~~~~~~~~~~~~\n\nThe following list the aten operators supported by torch-neuronx.\n\n+----------------------------------+\n| aten::_s_where                   |\n+----------------------------------+\n| aten::_softmax                   |\n+----------------------------------+\n| aten::_softmax_backward_data     |\n+----------------------------------+\n| aten::_unsafe_view               |\n+----------------------------------+\n| aten::add                        |\n+----------------------------------+\n| aten::addcdiv\\_                  |\n+----------------------------------+\n| aten::addcmul                    |\n+----------------------------------+\n| aten::addmm                      |\n+----------------------------------+\n| aten::bernoulli\\_                |\n+----------------------------------+\n| aten::bmm                        |\n+----------------------------------+\n| aten::constant_pad_nd            |\n+----------------------------------+\n| aten::div                        |\n+----------------------------------+\n| aten::embedding                  |\n+----------------------------------+\n| aten::embedding_dense_backward   |\n+----------------------------------+\n| aten::empty                      |\n+----------------------------------+\n| aten::expand                     |\n+----------------------------------+\n| aten::fill\\_                     |\n+----------------------------------+\n| aten::index_select               |\n+----------------------------------+\n| aten::_log_softmax               |\n+----------------------------------+\n| aten::_log_softmax_backward_data |\n+----------------------------------+\n| aten::lt                         |\n+----------------------------------+\n| aten::mm                         |\n+----------------------------------+\n| aten::mul                        |\n+----------------------------------+\n| aten::native_batch_norm          |\n+----------------------------------+\n| aten::native_batch_norm_backward |\n+----------------------------------+\n| aten::neg                        |\n+----------------------------------+\n| aten::permute                    |\n+----------------------------------+\n| aten::relu                       |\n+----------------------------------+\n| aten::rsub                       |\n+----------------------------------+\n| aten::select                     |\n+----------------------------------+\n| aten::slice                      |\n+----------------------------------+\n| aten::sqrt                       |\n+----------------------------------+\n| aten::sum                        |\n+----------------------------------+\n| aten::t                          |\n+----------------------------------+\n| aten::tanh                       |\n+----------------------------------+\n| aten::tanh_backward              |\n+----------------------------------+\n| aten::threshold_backward         |\n+----------------------------------+\n| aten::transpose                  |\n+----------------------------------+\n| aten::unsqueeze                  
|\n+----------------------------------+\n| aten::view                       |\n+----------------------------------+\n| aten::zero\\_                     |\n+----------------------------------+\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/install-templates/pytorch-dev-install.txt",
    "content": "\n\n.. tab-set::\n\n   .. tab-item:: PyTorch 1.11.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu 20 AMI\n\n            .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n            .. code:: bash\n\n                # Configure Linux for Neuron repository updates\n                . /etc/os-release\n\n                sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n                deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n                EOF\n                wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n\n                # Update OS packages\n                sudo apt-get update -y\n\n                # Install git\n                sudo apt-get install git -y\n\n                # Install OS headers\n                sudo apt-get install linux-headers-$(uname -r) -y\n\n                # Remove preinstalled packages and Install Neuron Driver and Runtime\n                sudo apt-get remove aws-neuron-dkms  -y\n                sudo apt-get remove aws-neuronx-dkms  -y\n                sudo apt-get remove aws-neuronx-oci-hook  -y\n                sudo apt-get remove aws-neuronx-runtime-lib -y\n                sudo apt-get remove aws-neuronx-collectives -y\n                sudo apt-get install aws-neuronx-dkms=2.* -y\n                sudo apt-get install aws-neuronx-oci-hook=2.* -y\n                sudo apt-get install aws-neuronx-runtime-lib=2.* -y\n                sudo apt-get install aws-neuronx-collectives=2.* -y\n\n                # Install EFA Driver(only required for multi-instance training)\n\n                curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n                wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n                cat aws-efa-installer.key | gpg --fingerprint\n                wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n\n                tar -xvf aws-efa-installer-latest.tar.gz\n                cd aws-efa-installer && sudo bash efa_installer.sh --yes\n                cd\n                sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n                # Remove pre-installed package and Install Neuron Tools\n                sudo apt-get remove aws-neuron-tools  -y\n                sudo apt-get remove aws-neuronx-tools  -y\n                sudo apt-get install aws-neuronx-tools=2.* -y\n\n                export PATH=/opt/aws/neuron/bin:$PATH\n\n                # Install Python venv and activate Python virtual environment to install\n                # Neuron pip packages.\n                sudo apt install python3.8-venv\n                python3.8 -m venv aws_neuron_venv_pytorch\n                source aws_neuron_venv_pytorch/bin/activate\n                pip install -U pip\n\n                # Install wget, awscli\n                pip install wget\n                pip install awscli\n\n                # Install Python packages - Transformers package is needed for BERT\n                python -m pip install torch-neuronx==\"1.11.0.1.*\" \"neuronx-cc==2.*\" --extra-index-url \"https://pip.repos.neuron.amazonaws.com\"\n\n\n         .. tab-item:: Amazon Linux 2023 AMI\n\n            .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n            .. 
code:: bash\n\n\n                # Configure Linux for Neuron repository updates\n\n                sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n                [neuron]\n                name=Neuron YUM Repository\n                baseurl=https://yum.repos.neuron.amazonaws.com\n                enabled=1\n                metadata_expire=0\n                EOF\n                sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n                # Install OS headers\n                sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n                # Update OS packages\n                sudo dnf update -y\n\n                # Install git\n                sudo dnf install git -y\n\n                # Remove preinstalled packages and Install Neuron Driver and Runtime\n                sudo dnf remove aws-neuron-dkms -y\n                sudo dnf remove aws-neuronx-dkms -y\n                sudo dnf remove aws-neuronx-oci-hook -y\n                sudo dnf remove aws-neuronx-runtime-lib -y\n                sudo dnf remove aws-neuronx-collectives -y\n                sudo dnf install aws-neuronx-dkms-2.*  -y\n                sudo dnf install aws-neuronx-oci-hook-2.*  -y\n                sudo dnf install aws-neuronx-runtime-lib-2.*  -y\n                sudo dnf install aws-neuronx-collectives-2.*  -y\n\n                # Install EFA Driver(only required for multi-instance training)\n                curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n                wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n                cat aws-efa-installer.key | gpg --fingerprint\n                wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n                tar -xvf aws-efa-installer-latest.tar.gz\n                cd aws-efa-installer && sudo bash efa_installer.sh --yes\n                cd\n                sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n                # Remove pre-installed package and Install Neuron Tools\n                sudo dnf remove aws-neuron-tools  -y\n                sudo dnf remove aws-neuronx-tools  -y\n                sudo dnf install aws-neuronx-tools-2.*  -y\n\n                export PATH=/opt/aws/neuron/bin:$PATH\n\n                # Install Python venv and activate Python virtual environment to install\n                # Neuron pip packages.\n                python3.7 -m venv aws_neuron_venv_pytorch\n                source aws_neuron_venv_pytorch/bin/activate\n                python -m pip install -U pip\n\n                # Install wget, awscli\n                pip install wget\n                pip install awscli\n\n                # Install Python packages - Transformers package is needed for BERT\n                python -m pip install torch-neuronx==\"1.11.0.1.*\" \"neuronx-cc==2.*\" --extra-index-url \"https://pip.repos.neuron.amazonaws.com\"\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/note-setup-general.rst",
    "content": "\n.. note::\n\n  * Instructions in this page only apply to setting up Neuron components on Linux host running Ubuntu or Amazon Linux AMI.\n  * When launching a Trn1/Trn2/Trn3 instance, you must adjust your primary EBS volume size to a minimum of 512GB. \n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.7.0-pytorch-install.rst",
    "content": ".. _install-neuronx-2.7.0-pytorch:\n\n\n.. meta::\n   :description: Install PyTorch NeuronX (Neuron 2.7.0) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, Neuron 2.7.0, previous release\n   :date-modified: 2026-03-30\n\n\nInstall PyTorch NeuronX (Neuron 2.7.0)\n======================================\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.0\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n            .. tab-item:: Ubuntu 20 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.8.0-pytorch-install.rst",
    "content": ".. _install-neuronx-2.8.0-pytorch:\n\n\n.. meta::\n   :description: Install PyTorch NeuronX (Neuron 2.8.0) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, Neuron 2.8.0, previous release\n   :date-modified: 2026-03-30\n\n\nInstall PyTorch NeuronX (Neuron 2.8.0)\n======================================\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.0\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n            .. tab-item:: Ubuntu 20 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.9.0-pytorch-install.rst",
    "content": ".. _install-neuronx-2.9.0-pytorch:\n\n\n.. meta::\n   :description: Install PyTorch NeuronX (Neuron 2.9.0) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, Neuron 2.9.0, previous release\n   :date-modified: 2026-03-30\n\n\nInstall PyTorch NeuronX (Neuron 2.9.0)\n======================================\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.0\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n            .. tab-item:: Ubuntu 20 AMI\n\n                .. include :: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n                .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst",
    "content": ".. _pytorch-neuronx-install-prev-al2:\n\n\n.. meta::\n   :description: Install previous PyTorch NeuronX releases on Amazon Linux 2\n   :keywords: AWS Neuron, PyTorch, Trainium, Inferentia, setup, torch-neuronx, previous releases, Amazon Linux 2, AL2\n   :date-modified: 2026-03-30\n\n\nInstall Previous PyTorch Neuron Releases for Amazon Linux (``torch-neuronx``)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. toctree::\n   :maxdepth: 1\n\n\n\nUse the tabs below to install a specific previous Neuron SDK release of PyTorch NeuronX on Amazon Linux 2. Select the Neuron version you need.\n\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.18.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.17.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.16.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.rst",
    "content": "\n.. _pytorch-neuronx-install-prev-al2023:\n\n.. Install previous PyTorch NeuronX releases for Amazon Linux 2023\n\nUse the tabs below to install a specific previous Neuron SDK release of PyTorch NeuronX on Amazon Linux 2023. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.28.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --neuron-version=2.28.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.27.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --neuron-version=2.27.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.26.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --neuron-version=2.26.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst",
    "content": "\n.. _pytorch-neuronx-install-prev-u20:\n\n.. Install previous PyTorch NeuronX releases for Ubuntu 20.04\n\nUse the tabs below to install a specific previous Neuron SDK release of PyTorch NeuronX on Ubuntu 20.04. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.21.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.20.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.19.0\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.rst",
    "content": "\n.. _pytorch-neuronx-install-prev-u22:\n\n.. Install previous PyTorch NeuronX releases for Ubuntu 22.04\n\nUse the tabs below to install a specific previous Neuron SDK release of PyTorch NeuronX on Ubuntu 22.04. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.28.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --neuron-version=2.28.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.27.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --neuron-version=2.27.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.26.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --neuron-version=2.26.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u24.rst",
    "content": "\n.. _pytorch-neuronx-install-prev-u24:\n\n.. Install previous PyTorch NeuronX releases for Ubuntu 24.04\n\nUse the tabs below to install a specific previous Neuron SDK release of PyTorch NeuronX on Ubuntu 24.04. Select the Neuron version you need.\n\n.. tab-set::\n\n    .. tab-item:: Neuron 2.28.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --neuron-version=2.28.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n    .. tab-item:: Neuron 2.27.1\n\n        .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --neuron-version=2.27.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-install.rst",
    "content": ".. _pytorch-neuronx-install:\n\n\n.. meta::\n   :description: Install PyTorch NeuronX on AWS Trainium and Inferentia instances\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, install, DLAMI, pip\n   :date-modified: 2026-03-30\n\n\nInstall PyTorch NeuronX \n========================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nDevelop on AWS ML accelerator instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSelect the PyTorch version and AMI type tabs below to get the installation commands for your environment.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. tab-set::\n\n            .. tab-item:: Amazon Linux 2 DLAMI Base\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 13\n                    :end-line: 18\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 8\n                    :end-line: 9\n\n            .. tab-item:: Ubuntu 20 DLAMI Base\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 19\n                    :end-line: 24\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 11\n                    :end-line: 12\n\n            .. tab-item:: Amazon Linux 2 DLAMI Pytorch\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 25\n                    :end-line: 29\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 50\n                    :end-line: 51\n\n            \n\n            .. tab-item:: Ubuntu 20 DLAMI Pytorch\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 30\n                    :end-line: 35\n\n                .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                    :start-line: 53\n                    :end-line: 54\n\n            .. tab-item:: Amazon Linux 2\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 1\n                    :end-line: 3\n\n            .. tab-item:: Ubuntu 20\n\n                .. include :: /setup/install-templates/trn1/dlami-notes.rst\n                    :start-line: 4\n                    :end-line: 6\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-neuronx-install-cxx11.rst",
    "content": ".. _pytorch-neuronx-install-cxx11:\n\n\n.. meta::\n   :description: Build torch-xla from source with C++11 ABI support for PyTorch NeuronX\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, CXX11, C++11 ABI, build from source\n   :date-modified: 2026-03-30\n\n\nInstall with support for C++11 ABI\n==================================\n\n.. warning::\n\n    The intended user of this guide is using a custom built version of\n    ``torch`` and ``torch-xla`` or compiling a non-python application which must be built using\n    the C++11 ABI.\n\n    *Most applications do not require this specialized distribution.*\n\n    For regular installation instructions see: :ref:`Fresh install <pytorch-neuronx-install>`\n\nThe standard ``torch-neuronx`` packages (which are normally installed according\nto the :ref:`Fresh install <pytorch-neuronx-install>` guide) are compiled with\nthe pre-C++11 ABI and linked against the pre-C++11 ``libtorch``. These\ncompilation options ensure that the ``torch-neuronx`` ABI matches the *publicly*\nreleased version of the ``torch`` and ``torch-xla`` packages that are installed from the default\nPyPI index.\n\nTo support applications with specific ABI requirements, Neuron distributes\npackages which are linked against the C++11 version of\n``libtorch``. These ``torch-neuronx`` packages are built using the\n``-D_GLIBCXX_USE_CXX11_ABI=1`` compilation flag. \n\n.. note::\n\n    The ``libneuronxla`` packages are already built with both pre-C++11 ABI and C++11 ABI symbols so the same PIP package can be used for C++11 ABI applications.\n\nThe only difference between these packages and the standard packages\nis the torch plugin library contained within the package. This is the\n``libtorchneuron.so`` library located in the ``torch_neuronx/lib/`` package\ndirectory. All other libraries and python files within the packages are\nidentical. This means that these C++11-compatible packages are drop-in\nreplacements in environments that are incompatible with the standard releases of\n``torch-neuronx``. The behavior is identical whether compiling models, executing\ninferences or running training.\n\nInstallation\n^^^^^^^^^^^^\n\nAll versions of the library are available to download from the following pip\nindex:\n\n::\n\n    https://pip.repos.neuron.amazonaws.com/cxx11\n\n\nTo install a wheel, it is recommended to use the ``--no-deps`` flag since\nversions of ``torch`` and ``torch-xla`` compiled using the C++11 ABI are not distributed on this\nindex.\n\n::\n\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuronx --no-deps\n\n\nSpecific versions of ``torch-neuronx`` with C++11 ABI support can be installed\njust like standard versions of ``torch-neuronx``.\n\n::\n\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 \"torch-neuronx==2.5.*\" --no-deps\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 \"torch-neuronx==2.6.*\" --no-deps\n\n.. important::\n\n    This pip index does not include a distribution of ``torch`` and ``torch-xla`` compiled with\n    the new C++11 ABI. The intent of this index is *only* to provide Neuron SDK\n    wheels. 
See :ref:`pytorch-neuronx-cxx11-building-torch-xla`.\n\n    The version of ``torch`` and ``torch-xla`` that are distributed on the default PyPI index is\n    compiled with the old pre-C++11 ABI.\n\n    If a C++11 ``torch-neuronx`` package is installed *with* dependencies\n    using the *default* PyPI index, then the installed version of ``torch`` and ``torch-xla`` will\n    be using the pre-C++11 ABI and ``torch-neuronx`` will be using the C++11\n    ABI. This ABI mismatch will lead to ``undefined symbol`` errors in both Python usage and at link\n    time for non-Python applications.\n\n\n.. _pytorch-neuronx-cxx11-building-torch-xla:\n\nBuilding ``torch`` and ``torch-xla`` with C++11 ABI\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe instructions for building ``torch`` from source are at https://github.com/pytorch/pytorch#from-source\n\nThe instructions for building ``torch-xla`` from source are at https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md\n\nThe following are simplified instructions (subject to change):\n\nSetting the build environment:\n\n.. code:: bash\n\n   sudo apt install cmake\n   pip install yapf==0.30.0\n   wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n   sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n\nBuild ``torch`` (CPU only) and ``torch-xla`` wheels for version 2.5:\n\n.. code:: bash\n\n   git clone --recursive https://github.com/pytorch/pytorch --branch v2.5.1\n   cd pytorch/\n   git clone --recursive https://github.com/pytorch/xla.git --branch v2.5.1\n   _GLIBCXX_USE_CXX11_ABI=1 python setup.py bdist_wheel\n   # pip wheel will be present in ./dist\n   cd xla/\n   CXX_ABI=1 python setup.py bdist_wheel\n   # pip wheel will be present in ./dist\n\nBuild ``torch`` (CPU only) and ``torch-xla`` wheels for version 2.6:\n\n.. code:: bash\n\n   git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n   cd pytorch/\n   git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n   _GLIBCXX_USE_CXX11_ABI=1 python setup.py bdist_wheel\n   # pip wheel will be present in ./dist\n   cd xla/\n   CXX_ABI=1 python setup.py bdist_wheel\n   # pip wheel will be present in ./dist\n\nFAQ\n^^^\n\nWhen should I use a C++11 torch-neuronx wheel?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDistributions compiled with the new C++11 ABI should only be used in the\nfollowing cases:\n\n1. You have built your own version of ``torch`` and ``torch-xla`` which uses the new C++11 ABI and\n   need a corresponding version of ``torch-neuronx`` that is compatible.\n2. You are compiling an application against a ``libtorch``\n   which uses the C++11 ABI and would like to include\n   ``libtorchneuron.so`` as well. Torch distributes these C++11 ``libtorch``\n   libraries with a ``libtorch-cxx11`` prefix.\n\n    Example:\n\n    ::\n\n        https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.5.1%2Bcpu.zip\n\n\nCan I download a library/header zip file similar to the torch distribution?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCurrently ``torch-neuron`` does not distribute a bundled library ``.zip`` with\nonly library/header files.\n\nThe recommended alternative when compiling ``libtorchneuron.so`` into a\nnon-python application is to install the ``torch-neuron`` wheel using ``pip``\naccording to the installation instructions. 
Then use the ``libtorchneuron.so``\nlibrary from within the python ``site-packages`` directory.\n\nA second alternative to isolate the package contents from a python environment\nis to download the wheel and unpack the contents:\n\n.. code:: bash\n\n    pip download --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuronx --no-deps\n    wheel unpack torch_neuronx-*.whl\n\nIf the exact version of the ``torch-neuronx`` package is known and no\nPython/Pip is available in the build environment, an alternative is to fetch the\npackage file directly and ``unzip`` the wheel:\n\n.. code::\n\n    wget https://pip.repos.neuron.amazonaws.com/cxx11/torch-neuronx/torch_neuronx-<VERSION>-py3-none-any.whl\n    unzip torch_neuronx-<VERSION>-py3-none-any.whl\n\n\n.. _pytorch-neuronx-cxx11-versioning:\n\nHow can I know which ABI torch-neuronx is using?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPackages which use the pre-C++11 ABI have no local identifier and use the\nfollowing version scheme:\n\n::\n\n    <torch version>.<neuron version>\n\nPackages which use the C++11 ABI have a ``+cxx11`` local identifier and use the\nfollowing version scheme:\n\n::\n\n    <torch version>.<neuron version>+cxx11\n\n\nThis allows the ABI to be validated by inspecting the local identifier\n(or version suffix).\n\nExample:\n::\n\n    2.5.1.2.4.0+cxx11\n    2.6.1.2.4.0+cxx11\n\n\nHow can I know which ABI torch is using?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``torch`` python package provides an API that allows you to check if\nthe underlying ``libtorch`` was compiled with the C++11 ABI:\n\n.. code:: python\n\n    import torch\n    torch.compiled_with_cxx11_abi()  # True/False\n\nCurrently ``torch-neuronx`` does not have an equivalent API. If the C++11 ABI was\nused, it will be visible in the version string (See :ref:`pytorch-neuronx-cxx11-versioning`).\n\n\nTroubleshooting\n^^^^^^^^^^^^^^^\n\nWhat Python errors could I see if I mix ABI versions?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUsing a version of ``torch`` compiled with the C++11 ABI will trigger an error\nin the python interpreter when importing a version of ``torch-neuronx`` using\nthe old (pre-C++11) ABI from the standard index. 
This will manifest as an\nerror when the ``import torch_neuronx`` statement is executed.\n\n::\n\n    Traceback (most recent call last):\n      File \"/python3.9/site-packages/torch_neuron/__init__.py\", line 64, in <module>\n        _register_extension()\n      File \"/python3.9/site-packages/torch_neuron/__init__.py\", line 60, in _register_extension\n        torch.ops.load_library(neuron_op_filename)\n      File \"/python3.9/site-packages/torch/_ops.py\", line 110, in load_library\n        ctypes.CDLL(path)\n      File \"/python3.9/ctypes/__init__.py\", line 364, in __init__\n        self._handle = _dlopen(self._name, mode)\n    OSError: /python3.9/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_\n\n\nSimilarly, when using the standard pre-C++11 versions of ``torch/torch-xla`` with the C++11\nversion of ``torch-neuronx``, an error would also occur at import.\n\n::\n\n    Traceback (most recent call last):\n      File \"/python3.9/site-packages/torch_neuron/__init__.py\", line 79, in <module>\n        _register_extension()\n      File \"/python3.9/site-packages/torch_neuron/__init__.py\", line 75, in _register_extension\n        torch.ops.load_library(neuron_op_filename)\n      File \"/python3.9/site-packages/torch/_ops.py\", line 110, in load_library\n        ctypes.CDLL(path)\n      File \"/python3.9/ctypes/__init__.py\", line 364, in __init__\n        self._handle = _dlopen(self._name, mode)\n    OSError: /python3.9/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE\n\n\nIn either of these cases, the remedy is to ensure that the ABI of the ``torch`` and ``torch-xla``\ndistribution matches the ABI of the ``torch-neuronx`` distribution.\n\nWhat compiler/linking errors could I see if I mix ABI versions?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you link an application which uses the old (pre-C++11) ABI\n``libtorchneuron.so`` with a C++11 version of ``torch``, this will trigger a\nlink error.\n\n::\n\n    libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::string)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::string const&) const'\n    libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::string)'\n    libtorchneuron.so: undefined reference to `c10::DeviceTypeName(c10::DeviceType, bool)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::string const&)'\n    libtorchneuron.so: undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::string>()'\n    libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, std::string const&, bool)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::string const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(std::string, std::string, void const*)'\n    libtorchneuron.so: undefined reference 
to `c10::detail::infer_schema::make_function_schema(std::string&&, std::string&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)'\n    libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString(c10::FunctionSchema const&)'\n\n\nSimilarly, an error will also occur in the opposite scenario where the\nC++11 ``libtorchneuron.so`` library is used with the pre-C++11 ``libtorch``:\n\n::\n\n    libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'\n    libtorchneuron.so: undefined reference to `c10::Error::Error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void const*)'\n    libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString[abi:cxx11](c10::FunctionSchema const&)'\n    libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::type_info const&, std::type_info const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>)'\n    libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `c10::DeviceTypeName[abi:cxx11](c10::DeviceType, bool)'\n    libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'\n    libtorchneuron.so: undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >()'\n    libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'\n    libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'\n\n\nIn either of these cases, the remedy is to ensure that the ABI of the\n``libtorch`` distribution matches the ABI of the ``libtorchneuron.so``\ndistribution.\n\nThe ``torch`` and ``torch-xla`` ABI must match the ``torch-neuron`` ABI or an ``undefined symbol`` error will occur.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-al2-dlami.rst",
    "content": "\n.. _pytorch-neuronx-al2-dlami-update:\n\n.. Update PyTorch NeuronX on Amazon Linux 2 PyTorch DLAMI\n\nIf you already have a previous Neuron release installed on a PyTorch DLAMI, select the PyTorch version tab below to get the update commands.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 116\n            :end-line: 117\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-al2.rst",
    "content": "\n.. _pytorch-neuronx-al2-update:\n\n\n.. meta::\n   :description: Update PyTorch NeuronX to the latest release on Amazon Linux 2\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx, update, Amazon Linux 2, AL2\n   :date-modified: 2026-03-30\n\n\nUpdate to latest PyTorch NeuronX \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 14\n            :end-line: 15\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.rst",
    "content": "\n.. _pytorch-neuronx-al2023-update:\n\n.. Update PyTorch NeuronX to the latest release on Amazon Linux 2023\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.html#id2>`__.\n\n    .. tab-item:: PyTorch 2.7.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n      \n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.html#id2>`__.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-u20-dlami.rst",
    "content": "\n.. _pytorch-neuronx-ubuntu20-dlami-update:\n\n.. Update PyTorch NeuronX on Ubuntu 20.04 PyTorch DLAMI\n\nIf you already have a previous Neuron release installed on a PyTorch DLAMI, select the PyTorch version tab below to get the update commands.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.1.2\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 197\n            :end-line: 198\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 119\n            :end-line: 120\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-u20.rst",
    "content": "\n.. _pytorch-neuronx-ubuntu20-update:\n\n.. Update PyTorch NeuronX to the latest release on Ubuntu 20.04\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.1.2\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 233\n            :end-line: 234\n\n    .. tab-item:: PyTorch 1.13.1\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 17\n            :end-line: 18\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-u22.rst",
    "content": "\n.. _pytorch-neuronx-ubuntu22-update:\n\n.. Update PyTorch NeuronX to the latest release on Ubuntu 22.04\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.9.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 284\n            :end-line: 285\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22>`__.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup/pytorch-update-u24.rst",
    "content": "\n.. _pytorch-neuronx-ubuntu24-update:\n\n.. Update PyTorch NeuronX to the latest release on Ubuntu 24.04\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.9.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 293\n            :end-line: 294\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n            \n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu24.html#setup-torch-neuronx-ubuntu24>`__.\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.rst",
    "content": ".. _setup-trn1-multi-node-execution:\n\n\n.. meta::\n   :description: How to prepare trn1.32xlarge for multi-node execution - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, setup, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nHow to prepare trn1.32xlarge for multi-node execution\n=====================================================\n\nEFA is a low latency transport that is used for inter-node communication.  Multi-node jobs, such as distributed training, requires EFA to be enabled on every participating trn1/trn1n 32xlarge instance. Please note that EFA is currently not available on the smaller instances sizes and they cannot be used for running multi-node jobs.\n\ntrn1.32xlarge has 8 EFA devices, trn1n.32xlarge has 16 EFA devices.  The rest of the document will refer to trn1.32xlarge but everything in the document also applies to trn1n.32xlarge except for the different number of EFA devices.\n\n\nLaunching an instance\n^^^^^^^^^^^^^^^^^^^^^\n\nBefore launching trn1 you need to create a security group that allows EFA traffic between the instances.  Follow Step1 here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security and note the newly created security group ID.  It will be used on the next step.\n\nDetermine the region, the AMI, the key and the subnet that will be used to launch trn1.\n\nAt the moment launching Trn1 instances with EFA support from the console is not recommended. The instances must be launched using AWS CLI.  To launch trn1.32xlarge instance:\n\n\n.. code-block:: bash\n\n    export AMI=<ami>\n    export SUBNET=<subnet id>\n    export SG=<security group created on the previous step>\n    export REG=<AWS region>\n    export KEY=<the key>\n\n    aws ec2 run-instances --region ${REG} \\\n    --image-id ${AMI} --instance-type trn1.32xlarge \\\n    --key-name ${KEY} \\\n    --tag-specifications \"ResourceType=instance,Tags=[{Key=Name,Value=\\\"friendly name\\\"}]\" \\\n    --network-interfaces \\\n    \"NetworkCardIndex=0,DeviceIndex=0,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=1,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=2,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=3,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=4,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=5,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=6,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \\\n    \"NetworkCardIndex=7,DeviceIndex=1,Groups=${SG},SubnetId=${SUBNET},InterfaceType=efa\" \n\n\n\nNote that one of the cards is assigned DeviceIndex 0 and the rest are assigned DeviceIndex 1.  Cloud-init will configure instance routing to route outgoing traffic prioritized by the device index field.  I.e the outbound traffic will always egress from the interface with DeviceIndex 0.  That avoids network connectivity problems when multiple interfaces are attached to the same subnet.\n\nTo launch trn1n.32xlarge instance:\n\n.. 
code-block:: bash\n\n    export AMI=<ami>\n    export SUBNET=<subnet id>\n    export SG=<security group created on the previous step>\n    export REG=<AWS region>\n    export KEY=<the key>\n    \n    aws ec2 run-instances --region ${REG} \\\n    --image-id ${AMI} --instance-type trn1n.32xlarge \\\n    --key-name ${KEY} \\\n    --tag-specifications \"ResourceType=instance,Tags=[{Key=Name,Value=\\\"friendly name\\\"}]\" \\\n    --network-interfaces \\\n        NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=2,DeviceIndex=2,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=3,DeviceIndex=3,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=5,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=6,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=7,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=8,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=9,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=10,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=11,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=12,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=13,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=14,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa \\\n        NetworkCardIndex=15,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\n\nAssigning public IP address\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nMulti-interface instances are not assigned a public IP automatically.  If you require access to the newly launched trn1 from the Internet, you need to assign an Elastic IP to the interface with DeviceIndex = 0.  To find the right interface, either parse the output of the instance launch command or use the describe-instances command:\n\n\n.. 
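note::\n\n    The listing below shows example ``describe-instances`` output. As a shortcut (a sketch, assuming ``jq`` is installed and the instance ID is known), the interface with ``DeviceIndex`` 0 can also be selected directly:\n\n    .. code-block:: bash\n\n        # Print the ID of the network interface attached with DeviceIndex 0\n        aws ec2 describe-instances --region ${REG} --instance-ids <instance id> | jq -r '.Reservations[0].Instances[0].NetworkInterfaces[] | select(.Attachment.DeviceIndex == 0) | .NetworkInterfaceId'\n\n.. 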
code-block:: bash\n\n    $ aws ec2 describe-instances --instance-ids i-01b17afa1e6021d6c\n    {\n        \"Reservations\": [\n            {\n                \"Groups\": [],\n                \"Instances\": [\n                    {\n                        \"AmiLaunchIndex\": 0,\n                        \"ImageId\": \"ami-01257e71ecb2f431c\",\n                        \"InstanceId\": \"i-01b17afa1e6021d6c\",\n                        \"InstanceType\": \"trn1.32xlarge\",\n                        .........\n                        \"NetworkInterfaces\": [\n                            {\n                                \"Attachment\": {\n                                    \"AttachTime\": \"2023-05-19T17:37:26.000Z\",\n                                    \"AttachmentId\": \"eni-attach-03730388baedd4b96\",\n                                    \"DeleteOnTermination\": true,\n                                    \"DeviceIndex\": 0,\n                                    \"Status\": \"attached\",\n                                    \"NetworkCardIndex\": 4\n                                },\n                                \"Description\": \"\",\n                                .........\n                                \"InterfaceType\": \"efa\"\n                            },\n                            {\n                                \"Attachment\": {\n                                    \"AttachTime\": \"2023-05-19T17:37:26.000Z\",\n                                    \"AttachmentId\": \"eni-attach-0e1242371cd2532df\",\n                                    \"DeleteOnTermination\": true,\n                                    \"DeviceIndex\": 0,\n                                    \"Status\": \"attached\",\n                                    \"NetworkCardIndex\": 3\n                                },\n                                \"Description\": \"\",\n                                ................\n            \n            }\n        ]\n    }\n\n\n\nThe second entry in “NetworkInterfaces” in this example has “DeviceIndex” 0 and should be used to attach EIP.\n\n\nSoftware installation\n^^^^^^^^^^^^^^^^^^^^^\n\nThe software required for EFA operation is distributed via aws-efa-installer package.  The package is preinstalled on Neuron DLAMI.  If you’d like to install the latest or if you are using your own AMI follow these steps:\n\n.. code-block:: bash\n\n    curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz \n    wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key \n    cat aws-efa-installer.key | gpg --fingerprint \n    wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig \n    tar -xvf aws-efa-installer-latest.tar.gz \n    cd aws-efa-installer && sudo bash efa_installer.sh --yes \n    cd \n    sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\nApplication execution environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running an application make sure the following environment variables are set:\n\n.. code-block:: bash\n\n    FI_PROVIDER=efa\n    FI_EFA_USE_DEVICE_RDMA=1\n    FI_EFA_FORK_SAFE=1  # only required when running on AL2\n\nContainers\n^^^^^^^^^^\n\naws-efa-installer package must be installed on the instance.  That installs both the efa kernel module and the libraries.  The libraries must be accessible to an application running inside a container.  
This can be accomplished by either installing aws-efa-installer package inside the container or by making on the instance library installation path available inside a container.\n\nIf installing aws-efa-installer package inside a container pass the flag that disables the kernel module installation:\n\n.. code-block:: bash\n\n    sudo bash efa_installer.sh --yes --skip-kmod\n\n\nThe location of the libraries is distribution specific:\n\n.. code-block:: bash\n\n    /opt/amazon/efa/lib   # Ubuntu\n    /opt/amazon/efa/lib64 # AL2\n\n\nAppendix - trn1 instance launch example script\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: bash\n\n    #!/bin/bash\n     \n    set -e\n \n    # AWS CLI v2 Installation instructions for Linux:\n    # curl \"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip\" -o \"awscliv2.zip\"\n    # unzip awscliv2.zip\n    # sudo ./aws/install\n    # $ aws --version\n    # aws-cli/2.11.20 Python/3.11.3 Linux/5.15.0-1034-aws exe/x86_64.ubuntu.20 prompt/off\n    # Someone with AWS console admin privileges can create an access key ID and secret for this:\n    # Configure credentials: aws configure\n \n    # Search the AWS AMIs for the most recent \"Deep Learning Base Neuron AMI (Ubuntu 20.04) <Latest_Date>\"\n    # This one is 2023-05-17 - ami-01257e71ecb2f431c\n    AMI= ... # the ami\n    KEYNAME= ... # your key\n    SG= ... # the security group \n    SUBNET= ... # the subnet\n    REGION=us-west-2\n    \n    # Launch instances\n    echo \"Starting instances...\"\n    output=$(aws ec2 --region $REGION run-instances \\\n    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=_Trainium-Big}]' \\\n    --count 1 \\\n    --image-id $AMI \\\n    --instance-type trn1.32xlarge \\\n    --key-name $KEYNAME \\\n    --network-interfaces \"NetworkCardIndex=0,DeviceIndex=0,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=1,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=2,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=3,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=4,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=5,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=6,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\" \\\n    \"NetworkCardIndex=7,DeviceIndex=1,Groups=$SG,SubnetId=$SUBNET,InterfaceType=efa\")\n    \n    \n    # Parse the output to get the instance IDs\n    instance_ids=$(echo $output | jq -r .Instances[].InstanceId)\n    echo \"Got created instance IDs: $instance_ids\"\n \n    # Loop through each instance ID\n    public_ips=\"\"\n    for instance_id in $instance_ids; do\n      echo \"Waiting for instance $instance_id to be running...\"\n      aws ec2 wait instance-running --instance-ids $instance_id --region $REGION\n \n      echo \"Creating SSH public IP newtork inteface for instance $instance_id...\"\n      interface_id=\"\"\n      INSTANCE_INFO=$(aws ec2 describe-instances --region $REGION --instance-ids $instance_id)\n      OUTPUT=$(echo \"$INSTANCE_INFO\" | jq -r '.Reservations[0].Instances[0].NetworkInterfaces[] | \"\\(.Attachment.DeviceIndex),\\(.NetworkInterfaceId)\"')\n      echo $OUTPUT\n      for pair in $OUTPUT; do\n          IFS=\",\" read -r device_idx ni_id <<< $pair\n          if [ \"$device_idx\" == \"0\" ]; then\n              interface_id=$ni_id\n              break\n          
fi\n      done\n      if [ \"$interface_id\" == \"\" ]; then\n          exit -1\n      fi\n      echo $interface_id\n \n      echo \"Checking for unassociated Elastic IPs...\"\n      unassociated_eips=$(aws ec2 describe-addresses --region $REGION | jq -r '.Addresses[] | select(.AssociationId == null) | .AllocationId')\n      if [[ -z \"$unassociated_eips\" ]]; then\n          echo \"No unassociated Elastic IPs found. Allocating new Elastic IP...\"\n          eip_output=$(aws ec2 allocate-address --domain vpc --region $REGION)\n          eip_id=$(echo $eip_output | jq -r .AllocationId)\n          echo \"Allocated Elastic IP ID: $eip_id\"\n          eip_public_ip=$(echo $eip_output | jq -r .PublicIp)\n          echo \"Allocated Elastic IP Public IP: $eip_public_ip\"\n          echo \"Note that this newly allocated Elasic IP will persist even after the instance termination\"\n          echo \"If the Elastic IP is not going to be reused do not forget to delete it\"\n      else\n          # use the first unassociated Elastic IP found\n          eip_id=$(echo \"$unassociated_eips\" | head -n 1)\n          echo \"Found unassociated Elastic IP ID: $eip_id\"\n          eip_public_ip=$(aws ec2 describe-addresses --allocation-ids $eip_id --region $REGION | jq -r .Addresses[0].PublicIp)\n          echo \"Elastic IP Public IP: $eip_public_ip\"\n      fi\n      public_ips+=\"${eip_public_ip} \"\n \n      echo \"Associating Elastic IP with network interface $interface_id...\"\n      aws ec2 associate-address --allocation-id $eip_id --network-interface-id $interface_id --region $REGION\n      echo \"Associated Elastic IP with network interface.\"\n    done\n \n    echo \"The instance has been launched.\\nYou can now SSH into $public_ips with key $KEYNAME.\\n\"\n\n.. note:: if you face connectivity issues after launching trn1\\\\trn1n 32xlarge instance on Ubuntu, please follow the troubleshooting instructions mentioned :ref:`here. <trn1_ubuntu_troubleshooting>`\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.rst",
    "content": "\n.. meta::\n   :description: AWS Neuron SDK documentation for torch neuronx dataparallel example default\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nThe default DataParallel use mode will replicate the model\non all available NeuronCores in the current process. The inputs will be split\non ``dim=0``.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch_neuronx.trace(model, image)\n\n    # Create the DataParallel module\n    model_parallel = torch_neuronx.DataParallel(model_neuron)\n\n    # Create a batched input\n    batch_size = 5\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)"
  },
  {
    "path": "frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.rst",
    "content": "\n.. meta::\n   :description: AWS Neuron SDK documentation for torch neuronx dataparallel example dim neq zero\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nIn this example we run DataParallel inference using two NeuronCores and\n``dim = 2``. Because ``dim != 0``, dynamic batching is not enabled.\nConsequently, the DataParallel inference-time batch size must be two times the\ncompile-time batch size.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n\n    # Create an example model\n    class Model(torch.nn.Module):\n        def __init__(self):\n            super().__init__()\n            self.conv = torch.nn.Conv2d(3, 3, 3)\n\n        def forward(self, x):\n            return self.conv(x) + 1\n\n    model = Model()\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 8, 8])\n    model_neuron = torch_neuronx.trace(model, image)\n\n    # Create the DataParallel module using 2 NeuronCores and dim = 2\n    model_parallel = torch_neuronx.DataParallel(model_neuron, device_ids=[0, 1], dim=2)\n\n    # Create a batched input\n    # Note that image_batched.shape[dim] / len(device_ids) == image.shape[dim]\n    batch_size = 2 * 8\n    image_batched = torch.rand([1, 3, batch_size, 8])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)"
  },
  {
    "path": "frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.rst",
    "content": "\n.. meta::\n   :description: AWS Neuron SDK documentation for torch neuronx dataparallel example disable dynamic batching\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nIn the following example, we use\n:func:`torch_neuronx.DataParallel.disable_dynamic_batching` to disable dynamic\nbatching. We provide an example of a batch size that will not work when dynamic\nbatching is disabled as well as an example of a batch size that does work when\ndynamic batching is disabled.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch_neuronx.trace(model, image)\n\n    # Create the DataParallel module and use 2 NeuronCores\n    model_parallel = torch_neuronx.DataParallel(model_neuron, device_ids=[0, 1], dim=0)\n\n    # Disable dynamic batching\n    model_parallel.disable_dynamic_batching()\n\n    # Create a batched input (this won't work)\n    batch_size = 4\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # This will fail because dynamic batching is disabled and\n    # image_batched.shape[dim] / len(device_ids) != image.shape[dim]\n    # output = model_parallel(image_batched)\n\n    # Create a batched input (this will work)\n    batch_size = 2\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # This will work because\n    # image_batched.shape[dim] / len(device_ids) == image.shape[dim]\n    output = model_parallel(image_batched)\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.rst",
    "content": "\n.. meta::\n   :description: AWS Neuron SDK documentation for torch neuronx dataparallel example dynamic batching\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nIn the following example, we use the :func:`torch_neuronx.DataParallel` module\nto run inference using several different batch sizes without recompiling the\nNeuron model.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch_neuronx.trace(model, image)\n\n    # Create the DataParallel module\n    model_parallel = torch_neuronx.DataParallel(model_neuron)\n\n    # Create batched inputs and run inference on the same model\n    batch_sizes = [2, 3, 4, 5, 6]\n    for batch_size in batch_sizes:\n        image_batched = torch.rand([batch_size, 3, 224, 224])\n\n        # Run inference with a batched input\n        output = model_parallel(image_batched)"
  },
  {
    "path": "frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.rst",
    "content": "\n.. meta::\n   :description: AWS Neuron SDK documentation for torch neuronx dataparallel example specify ncs\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx\n   :date-modified: 2026-03-13\n\n\nThe following example uses the ``device_ids`` argument to use the first three\nNeuronCores for DataParallel inference.\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    from torchvision import models\n\n    # Load the model and set it to evaluation mode\n    model = models.resnet50(pretrained=True)\n    model.eval()\n\n    # Compile with an example input\n    image = torch.rand([1, 3, 224, 224])\n    model_neuron = torch_neuronx.trace(model, image)\n\n    # Create the DataParallel module, run on the first two NeuronCores\n    # Equivalent to model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1])\n    model_parallel = torch_neuronx.DataParallel(model_neuron, device_ids=['nc:0', 'nc:1'])\n\n    # Create a batched input\n    batch_size = 5\n    image_batched = torch.rand([batch_size, 3, 224, 224])\n\n    # Run inference with a batched input\n    output = model_parallel(image_batched)"
  },
  {
    "path": "frameworks/torch/torch-neuronx/training-troubleshooting.rst",
    "content": ".. _pytorch-neuron-traning-troubleshooting:\n\n\n.. meta::\n   :description: PyTorch Neuron (``torch-neuronx``) for Training Troubleshooting Guide - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, troubleshooting\n   :date-modified: 2026-03-13\n\n\nPyTorch Neuron (``torch-neuronx``) for Training Troubleshooting Guide\n=====================================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nThis document shows common issues users may encounter while using\nPyTorch-Neuron and provides guidance how to resolve or work-around them.\n\nGeneral Troubleshooting\n-----------------------\n\nFor setting up EFA that is needed for multi-node training, please see :ref:`setup-trn1-multi-node-execution`\n\n\nFor XLA-related troubleshooting notes see :ref:`How to debug models in PyTorch\nNeuron <pytorch-neuronx-debug>`\nand `PyTorch-XLA troubleshooting\nguide <https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md>`__.\n\nIf your multi-worker training run is interrupted, you may need to kill\nall the python processes (WARNING: this kills all python processes and\nreload the driver):\n\n.. code:: bash\n\n   killall -9 python\n   killall -9 python3\n   sudo rmmod neuron; sudo modprobe neuron\n\nTo turn on RT debug:\n\n.. code:: python\n\n   os.environ[\"NEURON_RT_LOG_LEVEL\"] = \"INFO\"\n\nTo turn on Neuron NCCL debug:\n\n.. code:: python\n\n   os.environ[\"NCCL_DEBUG\"] = \"WARN\"\n   os.environ[\"NCCL_DEBUG_SUBSYS\"] = \"ALL\"\n\nIf some process crashed during training, you can enable core dumps using ``ulimit`` command:\n\n.. code:: bash\n\n   ulimit -S -c unlimited\n\nTo see the type of signals that would cause core dumps, see https://www.man7.org/linux/man-pages/man7/signal.7.html.\n\nNote that core dumps take significant amount of storage, so make sure there is enough free disk space before enabling core dumps.\n\nOn Ubuntu, if Apport is not running, core dump file name is by default \"core\" in the local directory. To change file location and name format, modify ``/proc/sys/kernel/core_pattern`` (see https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#core-pattern for pattern info). For example, to dump to /tmp with executable filename and process ID:\n\n.. code:: bash\n\n   echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern\n\nFor containers, install appropriate dependencies during docker build (\"apt-get update && apt-get -y install build-essential gdb\") and start the container with ``--ulimit core=-1`` to enable core dump and ``-v /tmp/:/tmp/`` to ensure core dumps to ``/tmp`` are preserved when container is stopped or deleted. Dependencies can also be installed after container is started.\n\nOn Ubuntu, core dumps can also handled by Apport which is disabled by default. To enable Apport, run ``sudo service apport start``. The ``/proc/sys/kernel/core_pattern`` is updated by Apport service. After a crash, look in ``/var/log/apport.log`` for the core dump file name, which should be in located in ``/var/lib/apport/coredump/``.\n\nOnce you have the core dump, you can use gdb to debug further (for Python applications, <executable> is ``python`` or ``python3``):\n\n.. code:: bash\n\n   gdb <executable> <core file>\n\nIf some process (i.e. XRT server) is killed due to out-of-memory on host (i.e. 
you see ``Out of memory: Killed process <PID>`` in ``/var/log/syslog`` or in the output of ``dmesg``), there won't be any core dump generated. However, you can change it to kernel panic mode to trigger a core dump by setting ``/proc/sys/vm/panic_on_oom`` to a value of 1 on the host or from inside a container.\n\nOn the host, where you need ``sudo`` (this change will be reflected inside the container also):\n\n.. code:: bash\n\n   echo 1 | sudo tee /proc/sys/vm/panic_on_oom\n\nFrom inside a container, where ``sudo`` doesn't work (this change will be reflected on the host also):\n\n.. code:: bash\n    \n   echo 1 > /proc/sys/vm/panic_on_oom\n\n\nPossible Error Conditions\n-------------------------\n\nEager debug mode fails with \"urllib3.exceptions.URLSchemeUnknown: Not supported URL scheme http+unix\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running with eager debug mode (NEURON_USE_EAGER_DEBUG_MODE=1) using ``torch-neuronx`` and ``neuronx-cc`` from releases 2.19.1 and 2.20, you may see the following error:\n\n.. code:: bash\n\n   urllib3.exceptions.URLSchemeUnknown: Not supported URL scheme http+unix\n\nThis error is due to ``requests`` version >= 2.32. While ``neuronx-cc`` pins the ``requests`` package version to be less than 2.32, installing other packages like ``transformers`` could bring in a newer version of ``requests``.  To work around this, you can pin ``requests`` to version 2.31.0 with the following command, which also includes ``urllib3`` pinning due to a related issue noted in the next section:\n\n.. code:: bash\n\n   pip install requests==2.31.0 urllib3==1.26.20\n\nEager debug mode fails with \"TypeError: HTTPConnection.request() got an unexpected keyword argument 'chunked'\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running with eager debug mode (NEURON_USE_EAGER_DEBUG_MODE=1) using ``torch-neuronx`` and ``neuronx-cc`` from releases 2.19.1 and 2.20, you may see the following error:\n\n.. code:: bash\n\n   TypeError: HTTPConnection.request() got an unexpected keyword argument 'chunked'\n\nThis error is due to ``urllib3`` version >= 2.*, which can be pulled in as a dependency of ``requests`` < 2.32.  To work around this, you can pin ``urllib3`` to version 1.26.20 with the following command (which also includes ``requests`` pinning due to a related issue noted in the previous section):\n\n.. code:: bash\n\n   pip install requests==2.31.0 urllib3==1.26.20\n\n\nNon-Fatal Error OpKernel ('op: \"TPU*\" device_type: \"CPU\"')\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDuring execution using PyTorch Neuron, you may see these non-fatal error messages:\n\n.. code:: bash\n\n    E tensorflow/core/framework/op_kernel.cc:1676] OpKernel ('op: \"TPURoundRobin\" device_type: \"CPU\"') for unknown op: TPURoundRobin\n    E tensorflow/core/framework/op_kernel.cc:1676] OpKernel ('op: \"TpuHandleToProtoKey\" device_type: \"CPU\"') for unknown op: TpuHandleToProtoKey\n\nThey don't affect the operation of PyTorch Neuron and can be ignored.\n\nXLA runtime error: \"Invalid argument: Cannot assign a device for operation\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code:: bash\n\n    RuntimeError: tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:490 : Check failed: session->session()->Run(session_work->feed_inputs, session_work->outputs_handles, &outputs) == ::tensorflow::Status::OK() (INVALID_ARGUMENT: Cannot assign a device for operation XRTAllocateFromTensor: {{node XRTAllocateFromTensor}} was explicitly assigned to /job:localservice/replica:0/task:0/device:TPU:0 but available devices are [ /job:localservice/replica:0/task:0/device:CPU:0, /job:localservice/replica:0/task:0/device:TPU_SYSTEM:0, /job:localservice/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.\n\t [[XRTAllocateFromTensor]] vs. OK)\n      *** Begin stack trace ***\n         tensorflow::CurrentStackTrace()\n\n         xla::util::MultiWait::Complete(std::function<void ()> const&)\n\n         clone\n      *** End stack trace ***\n\nThe above error indicates that the framework was not able to initialize the neuron runtime. If you get\nthe above error, check for the following:\n\n1. No other process is taking the neuron cores. If yes, you may have to kill that process.\n\n2. If no process is running, try reloading the driver using ``sudo rmmod neuron; sudo modprobe neuron``\n\n\nError: “Could not start gRPC server”\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you get “Could not start gRPC server” error, please check if there\nare any leftover python processes from a previous interrupted run and\nterminate them before restarting run.\n\n.. code:: bash\n\n   E0207 17:22:12.592127280   30834 server_chttp2.cc:40]        {\"created\":\"@1644254532.592081429\",\"description\":\"No address added out of total 1 resolved\",\"file\":\"external/com_github_grpc_grpc/src/core/ext/t\n   ransport/chttp2/server/chttp2_server.cc\",\"file_line\":395,\"referenced_errors\":[{\"created\":\"@1644254532.592078907\",\"description\":\"Failed to add any wildcard listeners\",\"file\":\"external/com_github_grpc_grpc/s\n   rc/core/lib/iomgr/tcp_server_posix.cc\",\"file_line\":342,\"referenced_errors\":[{\"created\":\"@1644254532.592072626\",\"description\":\"Unable to configure socket\",\"fd\":10,\"file\":\"external/com_github_grpc_grpc/src/c\n   ore/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":216,\"referenced_errors\":[{\"created\":\"@1644254532.592068939\",\"description\":\"Address already in use\",\"errno\":98,\"file\":\"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":189,\"os_error\":\"Address already in use\",\"syscall\":\"bind\"}]},{\"created\":\"@1644254532.592078512\",\"description\":\"Unable to configure socket\"\n   ,\"fd\":10,\"file\":\"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":216,\"referenced_errors\":[{\"created\":\"@1644254532.592077123\",\"description\":\"Address already in\n    use\",\"errno\":98,\"file\":\"external/com_github_grpc_grpc/src/core/lib/iomgr/tcp_server_utils_posix_common.cc\",\"file_line\":189,\"os_error\":\"Address already in use\",\"syscall\":\"bind\"}]}]}]}\n   2022-02-07 17:22:12.592170: E tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:545] Unknown: Could not start gRPC server\n\n\nFailed compilation result in the cache\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAll compilation results are by default saved in ``Neuron Persistent Cache``. If the Neuron Compiler\nfails to compile a graph, we save the failed result in the cache. 
The reason for doing so is that, if the user tries to run the same script again, we want them to error out early rather than wait for the compilation to progress and hit the error at a later stage. However, there could be certain cases in which a failed compilation is due to some environment issue. One possible reason for failure is that the process went out of memory during compilation. This can happen if you are running multiple processes in parallel such that not enough memory is available for compilation of the graph. Failures due to such reasons can be easily mitigated by re-running the compilation. In case you want to retry a failed compilation, you can do so by passing ``--retry_failed_compilation``\nas follows:\n\n.. code:: python\n\n   os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --retry_failed_compilation'\n\nThis would retry the compilation and would replace a failed result in the cache with a\nsuccessful compilation result.\n\n\nCompilation errors when placing NeuronCache home directory on NFS/EFS/FSx mounted drive\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCurrently, the NeuronCache default root directory is /var/tmp, which is local to the instance you are running on. You can modify the location of the NeuronCache root directory using ``NEURON_CC_FLAGS='--cache_dir=<root dir>'``.  However, when the NeuronCache directory is placed in a directory that is part of an NFS-mounted drive shared among multiple instances, you may encounter file errors such as file not found, file corruption, or KeyError when running multi-instance training:\n\n.. code:: bash\n\n    KeyError: 'neff_cache2/neuron-compile-cache/USER_neuroncc-1.0.48875.0+7437fbf18/MODULE_7223055628515330524/MODULE_0_SyncTensorsGraph.14_7223055628515330524_compute1-dy-training-2-1-e859998e-3035-5df63dab5ce63'\n\nThis is a result of limitations to file locking on NFS. EFS and FSx also exhibit similar limitations. The workaround is to set up separate NeuronCache root directories for each worker instance, such as ``NEURON_CC_FLAGS=\"--cache_dir=$HOME/neuron_cache/bert/`hostname`\"``, where the home directory is shared among worker instances as in ParallelCluster.\n\nConsider the use case of a ParallelCluster with SLURM cluster management. The home directory of the head node is shared via NFS with worker instances. Also, SLURM would terminate the idle worker instances when the cluster is configured as a dynamic auto-scaling cluster, and the default cache in the terminated worker instance's /var/tmp is deleted. So to persist the cache across runs separated by a cluster idle period, we use the workaround above to create separate NeuronCache root directories for each worker instance. For example, see `BERT ParallelCluster script <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/dp_bert_hf_pretrain/run_dp_bert_large_hf_pretrain_bf16_s128.sh#L42>`__.\n\n\nCompilation error: “Expect ap datatype to be of type float32 float16 bfloat16 uint8”\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf an XLA example fails to run because of failed compilation and one of\nthe error messages is “Expect ap datatype to be of type float32 float16\nbfloat16 uint8”, then please set the environment variable\n``XLA_USE_32BIT_LONG=1`` in your script:\n\n.. code:: python\n\n    os.environ['XLA_USE_32BIT_LONG'] = '1'\n\n.. 
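note::\n\n    The variable can also be exported in the shell before launching the training script, instead of being set in Python (a sketch; the script name here is a placeholder):\n\n    .. code:: bash\n\n        export XLA_USE_32BIT_LONG=1\n        python train.py\n\n.. 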
code:: bash\n\n   11/18/2021 04:51:25 PM WARNING 34567 [StaticProfiler]: matmul-based transposes inserted by penguin takes up 93.66 percent of all matmul computation\n   terminate called after throwing an instance of 'std::runtime_error'\n     what():  === BIR verification failed ===\n   Reason: Expect ap datatype to be of type float32 float16 bfloat16 uint8\n   Instruction: I-545-0\n   Opcode: Matmult\n   Input index: 0\n   Argument AP:\n   Access Pattern: [[1,8],[1,1],[1,1]]\n   Offset: 0\n   Memory Location: {compare.85-t604_i0}@SB<0,0>(8x2)#Internal DebugInfo: <compare.85||uint16||UNDEF||[8, 1, 1]>\n\nNeuronCore(s) not available - Requested:1 Available:0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen you see \"NeuronCore(s) not available\" please terminate processes\nthat may be holding the NeuronCores and terminate any neuron-top\nsessions that are running. Also check if someone else is using the\nsystem. Then do \"sudo rmmod neuron; sudo modprobe neuron\" to reload the\ndriver.\n\n.. code:: bash\n\n   2021-Nov-15 15:21:28.0231 7245:7245 ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:nc1-nc1 Available:0\n   2021-11-15 15:21:28.231864: F ./tensorflow/compiler/xla/service/neuron/neuron_runtime.h:1037] Check failed: status == NRT_SUCCESS NEURONPOC : nrt_init failed. Status = 1\n\nOften when you run multi-worker training, there can be many python\nprocesses leftover after a run is interrupted. To kill all python\nprocesses, run the follow (WARNING: this kills all python processes on\nthe system) then reload the driver:\n\n.. code:: bash\n\n   killall -9 python\n   killall -9 python3\n   sudo rmmod neuron; sudo modprobe neuron\n\nTDRV error \"TDRV:exec_consume_infer_status_notification\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see TDRV error \"TDRV:exec_consume_infer_status_notification\", try reloading the driver using ``sudo modprobe -r neuron; sudo modprobe neuron;``.\n\n.. code:: bash\n\n    2022-Mar-10 18:51:19.07392022-Mar-10 18:51:19.0739 17821:17931 ERROR  TDRV:exec_consume_infer_status_notifications  17822:18046 ERROR  TDRV:exec_consume_infer_status_notifications Unexpected number of CC notifications:  mod->cc_op_count=1, cc_start_cnt=0, cc_end_cnt=0Unexpected number of CC notifications:  mod->cc_op_count=1, cc_start_cnt=0, cc_end_cnt=0\n\n    2022-Mar-10 18:51:19.07392022-Mar-10 18:51:19.0739 17821:17931 ERROR  TDRV:exec_consume_infer_status_notifications  17822:18046 ERROR  TDRV:exec_consume_infer_status_notifications (NON-FATAL, Ignoring) inference timeout (180000 ms) on Neuron Device 0 NC 0, waiting for cc status notifications.\n\n    (NON-FATAL, Ignoring) inference timeout (180000 ms) on Neuron Device 0 NC 1, waiting for cc status notifications.\n\nTDRV error \"TDRV:tdrv_one_tmpbuf_reserve  Number of ONE TMPBUF pages requested exceeded the max number of pages allowed (requested: <N>, max allowed: 16).\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see the TDRV error \"TDRV:tdrv_one_tmpbuf_reserve  Number of ONE TMPBUF pages requested exceeded the max number of pages allowed (requested: <N>, max allowed: 16)\", it maybe due to model tensors requiring more device memory then available. A solution is to try training with a smaller data batch size.\n\n.. 
code:: bash\n\n    ERROR  TDRV:tdrv_one_tmpbuf_reserve                 Number of ONE TMPBUF pages requested exceeded the max number of pages allowed (requested: 28, max allowed: 16).\n    ERROR  TDRV:copy_and_stage_mr                       Failed to reserve one tmpbuf memory\n    ERROR  TDRV:kbl_model_add                           copy_and_stage_mr() error\n    W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = \"UNAVAILABLE: Socket closed\" and grpc_error_string = \"{\"created\":\"@1669183391.155135683\",\"description\":\"Error received from peer ipv4:172.31.58.24:43941\",\"file\":\"external/com_github_grpc_grpc/src/core/lib/surface/call.cc\",\"file_line\":1056,\"grpc_message\":\"Socket closed\",\"grpc_status\":14}\", maybe retrying the RPC\n\n\nCould not open the ndX, close device failed, TDRV not initialized\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see error messages stating “Could not open the ndX” (where X is\nan integer from 0..15), please run ``neuron-ls`` and ensure that you are\nable to see all 16 Neuron devices in the output. If one or more devices\nare missing please report the issue to aws-neuron-support@amazon.com with the instance ID and a screen capture of ``neuron-ls`` output.\n\n::\n\n   2021-Nov-11 15:33:20.0161  7912:7912  ERROR  TDRV:tdrv_init_mla_phase1                    Could not open the nd0\n   2021-Nov-11 15:33:20.0161  7912:7912  ERROR  TDRV:tdrv_destroy_one_mla                    close device failed\n   2021-Nov-11 15:33:20.0161  7912:7912  ERROR  TDRV:tdrv_destroy                            TDRV not initialized\n   2021-Nov-11 15:33:20.0161  7912:7912  ERROR   NRT:nrt_init                                Failed to initialize devices, error:1\n   2021-11-11 15:33:20.161331: F ./tensorflow/compiler/xla/service/neuron/neuron_runtime.h:1033] Check failed: status == NRT_SUCCESS NEURONPOC : nrt_init failed. Status = 1\n\nMultiworker execution hangs during NCCL init\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen your multi-worker execution hangs during NCCL init, you can try to\nreserve the port used by environment variable ``NEURON_RT_ROOT_COMM_ID``\nby (here we use host:port localhost:48620 as an example but you can use\nany free port and root node’s host IP):\n\n.. code:: bash\n\n   sudo sysctl -w net.ipv4.ip_local_reserved_ports=48620\n\nThen set the environment variable ``NEURON_RT_ROOT_COMM_ID`` in your\nscript:\n\n.. code:: python\n\n   os.environ[\"NEURON_RT_ROOT_COMM_ID\"] = \"localhost:48620\"\n\n.. _nrt-init-error-one-or-more-engines-are-running-please-restart-device-by-reloading-driver:\n\nNRT init error “One or more engines are running. Please restart device by reloading driver”\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see an error stating “One or more engines are running. Please\nrestart device by reloading driver” please follow the instruction and\nreload the driver using\n“\\ ``sudo modprobe -r neuron; sudo modprobe neuron;``\\ ”.\n\n.. code:: bash\n\n   2021-Nov-15 20:23:27.0280 3793:3793 ERROR TDRV:tpb_eng_init_hals_v2 CRITICAL HW ERROR: One or more engines are running. Please restart device by reloading driver:\n   sudo modprobe -r neuron; sudo modprobe neuron;\n   2021-Nov-15 20:23:27.0280 3793:3793 ERROR TDRV:tdrv_init_one_mla_phase2 nd0 nc0 HAL init failed. 
error:1\n\nNRT error “ERROR TDRV:kbl_model_add Attempting to load an incompatible model!”\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see an NRT error “ERROR TDRV:kbl_model_add Attempting to load an\nincompatible model!”, this means that the compiler neuronx-cc used to\ncompile the model is too old. See the installation instructions to update to the\nlatest compiler.\n\nNRT error \"ERROR HAL:aws_hal_sprot_config_remap_entry SPROT remap destination address must be aligned size\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see an NRT error \"ERROR HAL:aws_hal_sprot_config_remap_entry SPROT remap\ndestination address must be aligned size\", please check the kernel version and upgrade it\nto the distribution's latest kernel.\n\nFor example, on Ubuntu 18.04.6 LTS, the kernel version 4.15.0-66-generic is\nknown to cause this error when running the MLP tutorial. This is due to a known\nbug in the kernel in aligned memory allocation. To fix this issue, please\nupgrade your kernel to the latest version (e.g. 4.15.0-171-generic):\n\n.. code:: shell\n\n    uname -a\n    sudo apt-get update\n    sudo apt-get upgrade\n    sudo apt-get dist-upgrade\n\nPlease reboot after the upgrade.  Use \"uname -a\" to check the kernel version again after the reboot.\n\nNCCL warning : \"NCCL WARN Timeout waiting for RX (waited 120 sec) - retrying\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen running multi-worker training, if a graph has a collective communication operator like an\n``all_reduce``, it requires all the workers involved in the collective communication to load the\ngraph in the runtime at approximately the same time. If any of the workers doesn't load the graph\nwithin a 120 sec window from the first model load by any worker, you would see warnings\nlike ``NCCL WARN Timeout waiting for RX (waited 120 sec) - retrying``. When you see such warnings,\ncheck for the following in the log messages:\n\n1. One of the workers is compiling a graph: In multi-worker training, there is a chance that\neach worker builds a slightly different graph. This would result in a cache miss and can result\nin compilation. Since the compilations during a training run are serialized, the first worker\ncan compile and load the graph with collective communication. It would then wait for 120 secs\nfor other workers to join. If they don't show up because they are compiling their own graphs,\nthe first worker would start throwing a warning message as above. The warning in this case is\n``non-fatal`` and would go away once all workers have compiled their respective graphs and then loaded\nthem. To identify this scenario, look for ``No candidate found under ....`` logs around the warning.\nYou should also see ``.....`` which indicates compilation is in progress.\n\n2. Server on one of the nodes crashed: In distributed training across multiple nodes, if the server on one\nnode crashes, the workers on other nodes would keep waiting on model load and you would see the above\n``timeout`` logs on those nodes. 
To identify if the server crashed, check if you see the following\nerror on any of the nodes:\n\n::\n\n   `RPC failed with status = \"UNAVAILABLE: Socket closed\" and grpc_error_string = \"{\"created\":\"@1664146011.016500243\",\"description\":\"Error received from peer ipv4:10.1.24.109:37379\",\"file\":\"external/com_github_grpc_grpc/src/core/lib/surface/call.cc\",\"file_line\":1056,\"grpc_message\":\"Socket closed\",\"grpc_status\":14}\", maybe retrying the RPC`\n\nIf you see the above error, then it means there is a server crash and you need to cancel the\ntraning run.\n\nRPC error: \"RPC failed with status = 'UNAVAILABLE: Socket closed'\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhen you see the above error, it means that the xrt server crashed. When you see such an error, look for\nthe following:\n\n1. Check for any error logs before the ``RPC error``. That should indicate the root cause of server crash.\n   Note: The actual error log might be buried because of all the ``RPC error`` logs that swamp the logs.\n\n2. Sometimes the server can crash because of host OOM. This can happen when we are loading and saving checkpoints.\n   In such cases, you only see ``RPC errors`` and no other log. You can check if any instance is going out of memory\n   by using tools like `dmesg <https://man7.org/linux/man-pages/man1/dmesg.1.html>`_\n\nError \"Assertion \\`listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed\" followed by 'RPC failed with status = \"UNAVAILABLE: Connection reset by peer\"'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe error \"Assertion \\`listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed\" is intermittent and occurs when using glibc 2.26. To find out the glibc version you have, you can run ``ldd --version``. The workaround is to use Ubuntu 20 where glibc is 2.27.\n\n.. code:: bash\n\n   INFO: Inconsistency detected by ld.so: ../elf/dl-tls.c: 488: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!\n   INFO: 2022-10-03 02:16:04.488054: W tensorflow/core/distributed_runtime/rpc/grpc_remote_master.cc:157] RPC failed with status = \"UNAVAILABLE: Connection reset by peer\" and grpc_error_string = \"{\"created\":\"@1664763364.487962663\",\"description\":\"Error received from peer ipv4:10.0.9.150:41677\",\"file\":\"external/com_github_grpc_grpc/src/core/lib/surface/call.cc\",\"file_line\":1056,\"grpc_message\":\"Connection reset by peer\",\"grpc_status\":14}\", maybe retrying the RPC\n\nRPC connection error: \"RPC failed with status = UNAVAILABLE: Connection reset by peer\" not preceded by any error\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThis error may not be preceded by another error like shown in the previous section.\nIn this case, the RPC connection error usually happens when we do distributed training across multiple nodes. When you see such error, please\nwait for a few minutes. It might be because some node is taking time to setup and hence the other node is not\nable to connect to it just yet. Once, all nodes are up, training should resume.\n\nRuntime errors \"Missing infer_status notification\" followed by \"inference timeout\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you get a timeout error like below:\n\n.. 
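note::\n\n    The log below shows the timeout error. As a sketch of the remedy described in this section, the execution timeout can be raised in the environment before launching training (the value here is only an example; the unit is seconds):\n\n    .. code:: bash\n\n        export NEURON_RT_EXEC_TIMEOUT=1800\n\n.. 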
code:: bash\n\n    ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:4)\n    ERROR  TDRV:exec_consume_infer_status_notifications (FATAL-RT-UNDEFINED-STATE) inference timeout (600000 ms) on Neuron Device 4 NC 1, waiting for execution completion notification\n\nIt maybe due to long graph execution time causing synchronization delays\nexceeding the default timeout. Please try increasing the timeout to\nlarger value using ``NEURON_RT_EXEC_TIMEOUT`` (unit in seconds) and\nsee if the problem is resolved.\n\nProtobuf Error \"TypeError: Descriptors cannot not be created directly.\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you install torch-neuronx after neuronx-cc, you may get the Protobuf error \"TypeError: Descriptors cannot not be created directly.\". To fix this, please reinstall neuronx-cc using \"pip install --force-reinstall neuronx-cc\".\n\n.. code:: bash\n\n    Traceback (most recent call last):\n      File \"./run_glue.py\", line 570, in <module>\n        main()\n      File \"./run_glue.py\", line 478, in main\n        data_collator=data_collator,\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/transformers/trainer.py\", line 399, in __init__\n        callbacks, self.model, self.tokenizer, self.optimizer, self.lr_scheduler\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/transformers/trainer_callback.py\", line 292, in __init__\n        self.add_callback(cb)\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/transformers/trainer_callback.py\", line 309, in add_callback\n        cb = callback() if isinstance(callback, type) else callback\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/transformers/integrations.py\", line 390, in __init__\n        from torch.utils.tensorboard import SummaryWriter  # noqa: F401\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/torch/utils/tensorboard/__init__.py\", line 10, in <module>\n        from .writer import FileWriter, SummaryWriter  # noqa: F401\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/torch/utils/tensorboard/writer.py\", line 9, in <module>\n        from tensorboard.compat.proto.event_pb2 import SessionLog\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/tensorboard/compat/proto/event_pb2.py\", line 17, in <module>\n        from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/tensorboard/compat/proto/summary_pb2.py\", line 17, in <module>\n        from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/tensorboard/compat/proto/tensor_pb2.py\", line 16, in <module>\n        from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/tensorboard/compat/proto/resource_handle_pb2.py\", line 16, in <module>\n        from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2\n      File 
\"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py\", line 42, in <module>\n        serialized_options=None, file=DESCRIPTOR),\n      File \"/home/ec2-user/aws_neuron_venv_pytorch_p37_exp/lib64/python3.7/site-packages/google/protobuf/descriptor.py\", line 560, in __new__\n        _message.Message._CheckCalledFromGeneratedFile()\n    TypeError: Descriptors cannot not be created directly.\n    If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.\n    If you cannot immediately regenerate your protos, some other possible workarounds are:\n     1. Downgrade the protobuf package to 3.20.x or lower.\n     2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).\n\nTDRV error \"Timestamp program stop timeout\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you see TDRV error \"Timestamp program stop timeout\", i.e. when rerunning a training script after it was interrupted, try first reloading the driver using ``sudo modprobe -r neuron; sudo modprobe neuron;`` (make sure neuron-top and/or neuron-monitor are not running).\n\n.. code:: bash\n\n    2022-Aug-31 04:59:21.0546 117717:117717 ERROR  TDRV:tsync_wait_eng_stop                     nd0 nc0 Timestamp program stop timeout (1000 ms)\n    2022-Aug-31 04:59:21.0546 117717:117717 ERROR  TDRV:tsync_wait_nc_stop                      nd0 nc0 Error while waiting for timestamp program to end on TPB eng 0\n    2022-Aug-31 04:59:21.0546 117717:117717 ERROR  TDRV:tsync_timestamps_finish                 nd0 nc0 Failed to stop neuron core\n    2022-Aug-31 04:59:21.0546 117717:117717 ERROR  TDRV:tdrv_tsync_timestamps                   nd0 nc0 Failed to end timestamp sync programs\n    2022-Aug-31 04:59:22.0768 117717:117717 ERROR  TDRV:tdrv_destroy                            TDRV not initialized\n    2022-Aug-31 04:59:22.0768 117717:117717 ERROR   NRT:nrt_init                                Failed to initialize devices, error:5\n\nCompiler error \"module 'numpy' has no attribute 'asscalar'\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen you have a newer version of numpy in the Python environment, compilations may fail with the \"error module 'numpy' has no attribute 'asscalar'\".\nPlease note the neuronx-cc has the following dependency on numpy \"numpy<=1.20.0,>=1.13.3\". To workaround this error, please do \"pip install --force-reinstall neuronx-cc\" to reinstall neuronx-cc with the proper dependencies.\n\n.. 
code:: bash\n\n   ERROR 227874 [neuronx-cc]: ***************************************************************\n   ERROR 227874 [neuronx-cc]:  An Internal Compiler Error has occurred\n   ERROR 227874 [neuronx-cc]: ***************************************************************\n   ERROR 227874 [neuronx-cc]:\n   ERROR 227874 [neuronx-cc]: Error message:  module 'numpy' has no attribute 'asscalar'\n   ERROR 227874 [neuronx-cc]:\n   ERROR 227874 [neuronx-cc]: Error class:    AttributeError\n   ERROR 227874 [neuronx-cc]: Error location: Unknown\n   ERROR 227874 [neuronx-cc]: Version information:\n   ERROR 227874 [neuronx-cc]:   NeuronX Compiler version 2.1.0.76+2909d26a2\n   ERROR 227874 [neuronx-cc]:\n   ERROR 227874 [neuronx-cc]:   HWM version 2.1.0.7-64eaede08\n   ERROR 227874 [neuronx-cc]:   NEFF version Dynamic\n   ERROR 227874 [neuronx-cc]:   TVM not available\n   ERROR 227874 [neuronx-cc]:   NumPy version 1.23.3\n   ERROR 227874 [neuronx-cc]:   MXNet not available\n   ERROR 227874 [neuronx-cc]:\n\nImport errors 'generic_type: type \"IrValue\" is already registered!' or 'generic_type: type \"XlaBuilder\" is already registered!'\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen you encounter a PyTorch import error 'import _XLAC ... generic_type: type \"IrValue\" is already registered!' or 'import _XLAC ... generic_type: type \"XlaBuilder\" is already registered!', please check that TensorFlow and/or JAX are not installed in the Python environment. If they are installed, please uninstall them.\n\nImport error \"import _XLAC ImportError: <>/site-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen you encounter a PyTorch import error \"import _XLAC ImportError: <>/site-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol\" during execution, please check:\n    1. TensorFlow and/or JAX are not installed in the Python environment. If they are installed, please uninstall them.\n    2. The installed PyTorch (torch) package major/minor versions match the installed torch-neuronx package's major/minor versions (i.e. 1.11). If they don't match, please install the version of PyTorch that matches torch-neuronx.\n\n.. code:: bash\n\n    Traceback (most recent call last):\n      File \"/opt/ml/mlp_train.py\", line 11, in <module>\n        import torch_xla.core.xla_model as xm\n      File \"/usr/local/lib/python3.8/site-packages/torch_xla/__init__.py\", line 117, in <module>\n        import _XLAC\n    ImportError: /usr/local/lib/python3.8/site-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl7stridesEv\n\nNaNs seen with transformers version >= 4.21.0 when running HF BERT fine-tuning or pretraining with XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen running the HuggingFace BERT (any size) fine-tuning tutorial or pretraining tutorial with transformers version >= 4.21.0 and using XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, you will see NaNs in the loss immediately at the first step. More details on the issue can be found at `pytorch/xla#4152 <https://github.com/pytorch/xla/issues/4152>`_. 
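\n\nOne workaround (described in the next paragraph) is a one-line override of ``get_parameter_dtype`` so that model parameters are reported as BFloat16. The following is only a minimal sketch of where such an override could go in a training script; it assumes the script already imports ``torch`` and ``transformers``:\n\n.. code:: python\n\n   import torch\n   import transformers\n\n   # Workaround for NaN loss with transformers >= 4.21.0 when XLA_USE_BF16=1 or\n   # XLA_DOWNCAST_BF16=1 is set (see pytorch/xla#4152): report parameter dtype as bfloat16.\n   transformers.modeling_utils.get_parameter_dtype = lambda x: torch.bfloat16\n\n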
The workaround is to use transformers version 4.20.0 or earlier (the tutorials currently recommend version 4.15.0) or to add ``transformers.modeling_utils.get_parameter_dtype = lambda x: torch.bfloat16`` to the Python script.\n\n\n.. _trn1_ubuntu_troubleshooting:\n\nNetwork Connectivity Issue on trn1/trn1n 32xlarge with Ubuntu\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Description**\n\nUbuntu distributions have network connectivity issues when multiple interfaces are connected to the same subnet. trn1/trn1n 32xlarge comes with 8/16 network interfaces. (To launch trn1/trn1n with 8/16 interfaces, please follow :ref:`here <setup-trn1-multi-node-execution>`)\n\nAWS publishes a package that installs a helper service to address the issue. This service runs at startup, creates the appropriate netplan files, updates the netplan and the instance networking, and terminates.\n\nNote that the following fix is only required on instances launched using generic Ubuntu AMIs. Neuron AMIs and instances launched via ParallelCluster do not require the fix.\n\n**Patch to fix networking on a multi-interface instance**\n\n.. code:: bash\n\n    wget -O /tmp/aws-ubuntu-eni-helper.deb 'https://github.com/aws-samples/aws-efa-nccl-baseami-pipeline/blob/master/nvidia-efa-ami_base/networking/aws-ubuntu-eni-helper_0.3-1_all.deb?raw=true'\n    sudo apt install /tmp/aws-ubuntu-eni-helper.deb -y\n    sudo systemctl enable aws-ubuntu-eni-helper.service\n    sudo systemctl start aws-ubuntu-eni-helper.service\n\n\n**How to apply the patch?**\n\nThe following steps can be followed to resolve this issue:\n\n* Launch a trn1.32xl from the AWS console (it starts with a ``single interface`` and does not suffer from the multi-interface issue)\n* Apply the patch on this newly launched single-interface instance\n* Create a new AMI from this instance\n* Launch an 8 or 16 interface instance using that AMI.\n\n.. note::\n    The patch installs and enables the service but does not run it. This is intentional. The service will run at startup when the AMI is used to launch a multi-interface instance.\n\n**FAQs**\n\n.. note::\n  The Neuron DLAMI has the patch installed; users are encouraged to launch instances using the DLAMI, which does not require any fix. Please refer to the :ref:`Set Up Guide <setup-guide-index>` to learn how to launch an instance using the DLAMI.\n\n\n\n\"Too many open files\" when running training job\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWhen running large model training with several workers, you may see errors like the following.\n\n.. code:: bash\n\n\t2023-Jun-14 19:05:29.0312 4112959:4113326 [23] bootstrap.cc:106 CCOM WARN Call to accept failed : Too many open files\n\t2023-Jun-14 19:05:29.0312 4112959:4113263 [14] include/socket.h:438 CCOM WARN Net : Socket creation failed : Too many open files\n\t2023-Jun-14 19:05:29.0312 4112959:4113326 ERROR   ENC:ncclBootstrapRecv                       failed neuronBootstrapRecv request to NCCL\n\t2023-Jun-14 19:05:29.0312 4112959:4113249 [12] bootstrap.cc:106 CCOM WARN Call to accept failed : Too many open files\n\t2023-Jun-14 19:05:29.0312 4112959:4113263 ERROR   ENC:ncclBootstrapSend                       failed neuronBootstrapSend request to NCCL2023-Jun-14 19:05:29.03122023-Jun-14 19:05:29.0312 4112959:4113270 [15] bootstrap.cc:106 CCOM WARN Call to accept failed : Too many open files\n\nThis can happen when the default OS limits are low. 
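\n\nTo see the limits currently in effect for your shell session, you can, for example, run:\n\n.. code:: bash\n\n\tulimit -Sn    # current soft limit on open files\n\tulimit -Hn    # current hard limit on open files\n\n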
The hard and soft limits can be set on the OS using the following commands or by manually opening and setting the limits.\n\n.. code:: bash\n\n\tsudo sed -i 'H;1h;$!d;x;/hard  *nofile/!s/$/\\n* hard nofile 65536/' /etc/security/limits.conf\n\tsudo sed -i 'H;1h;$!d;x;/soft  *nofile/!s/$/\\n* soft nofile 65536/' /etc/security/limits.conf\n\tsudo sed -i 's/^#*\\(\\*\\|\\s*\\*\\)\\s*soft\\s*nofile\\s*[0-9]\\+$/\\1 soft nofile 65536/' /etc/security/limits.conf\n\tsudo sed -i 's/^#*\\(\\*\\|\\s*\\*\\)\\s*hard\\s*nofile\\s*[0-9]\\+$/\\1 hard nofile 65536/' /etc/security/limits.conf\n\tsudo sed -i 's/^#*\\(\\*\\|\\s*\\*\\)\\s*soft\\s*nofile\\s*[0-9]\\+$/\\1 soft nofile 65536/' /etc/security/limits.d/01_efa.conf || true\n\tsudo sed -i 's/^#*\\(\\*\\|\\s*\\*\\)\\s*hard\\s*nofile\\s*[0-9]\\+$/\\1 hard nofile 65536/' /etc/security/limits.d/01_efa.conf || true\n\nThe `01_efa.conf` file is created as part of the EFA installation and needs to be updated. If the EFA driver is not installed, the file `01_efa.conf` doesn't exist and the sed commands will fail with `No such file or directory`. If there are other files under `limits.d` with file limits, they need to be updated as well.\n\n\"undefined symbol\"\n^^^^^^^^^^^^^^^^^^^^\nTo maintain compatibility with the packages vended publicly in PyPI, AWS Neuron Python packages contain binary extensions that are compiled with the pre-2011 libstdc++ application binary interface (ABI). If a custom version of a package - such as `torch` - is compiled using a modern compiler, it can result in \"undefined symbol\" errors due to mismatches between that package and the AWS Neuron packages.\n\nTo support this situation, we provide alternative versions of AWS Neuron packages that are compiled according to the newer 2011 ABI. For information on how to use these packages, see :ref:`pytorch-install-cxx11`.\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx.rst",
    "content": ".. _pytorch-tutorials-torchserve-neuronx:\n\n\n.. meta::\n   :description: BERT TorchServe Tutorial - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx, tutorials\n   :date-modified: 2026-03-13\n\n\nBERT TorchServe Tutorial\n========================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\nThis tutorial demonstrates the use of `TorchServe <https://pytorch.org/serve>`_ with Neuron, the SDK for EC2 Inf2 and Trn1 instances. By the end of this tutorial, you will understand how TorchServe can be used to serve a model backed by EC2 Inf2/Trn1 instances. We will use a pretrained BERT-Base model to determine if one sentence is a paraphrase of another.\n\n.. _torchserve-compile-nx:\n\n\nRun the tutorial\n----------------\n\nOpen a terminal, log into your remote instance, and activate a Pytorch virtual environment setup (see the:ref:`Install PyTorch Neuron <setup-torch-neuronx>`). To complete this tutorial, you will also need a compiled BERT model. You can run :download:`trace_bert_neuronx.py </src/examples/pytorch/torchserve/trace_bert_neuronx.py>` to obtain a traced BERT model.\n\nYou should now have a compiled ``bert_neuron_b6.pt`` file, which is required going forward.\n\nOpen a shell on the instance you prepared earlier, create a new directory named ``torchserve``. Copy your compiled model from the previous tutorial into this new directory.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n   :language: bash\n   :lines: 4-6\n\n::\n\n  bert_neuron_b6.pt\n\nPrepare a new Python virtual environment with the necessary Neuron and TorchServe components. Use a virtual environment to keep (most of) the various tutorial components isolated from the rest of the system in a controlled way.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n   :language: bash\n   :lines: 8\n\nInstall the system requirements for TorchServe.\n\n.. tab-set::\n\n   .. tab-item:: Amazon Linux 2023 DLAMI Base\n\n      .. code-block:: bash\n\n        sudo dnf -y install jq java-11-amazon-corretto-headless\n        sudo alternatives --config java\n        sudo alternatives --config javac\n\n   .. tab-item:: Ubuntu 20 DLAMI Base\n\n      .. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n        :language: bash\n        :lines: 10\n\n\n.. code:: bash\n\n  java -version\n\n::\n\n  openjdk version \"11.0.17\" 2022-10-18\n  OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04)\n  OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing)\n\n.. code:: bash\n\n  javac -version\n\n::\n\n  javac 11.0.17\n\nVerify that TorchServe is now available.\n\n.. code:: bash\n\n  torchserve --version\n\n::\n\n  TorchServe Version is 0.7.0\n\n\n.. _torchserve-setup-nx:\n\nSetup TorchServe\n----------------\n\nDuring this tutorial you will need to download a few files onto your instance. The simplest way to accomplish this is to paste the download links provided above each file into a ``wget`` command. (We don't provide the links directly because they are subject to change.) For example, right-click and copy the download link for ``config.json`` shown below.\n\n.. 
literalinclude:: /src/examples/pytorch/torchserve/config.json\n    :language: JSON\n    :caption: :download:`config.json </src/examples/pytorch/torchserve/config.json>`\n\n\nNow execute the following in your shell:\n\n.. code:: bash\n\n  wget <paste link here>\n  ls\n\n::\n\n  bert_neuron_b6.pt  config.json\n\nDownload the `custom handler script <https://pytorch.org/serve/custom_service.html>`_ that will eventually respond to inference requests.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/handler_bert_neuronx.py\n    :language: python\n    :caption: :download:`handler_bert_neuronx.py </src/examples/pytorch/torchserve/handler_bert_neuronx.py>`\n    :linenos:\n\nNext, we need to associate the handler script with the compiled model using ``torch-model-archiver``. Run the following commands in your terminal:\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 12-16\n\n.. note::\n\n  If you modify your model or a dependency, you will need to rerun the archiver command with the ``-f`` flag appended to update the archive.\n\nThe result of the above will be a ``mar`` file inside the ``model_store`` directory.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 18\n\n::\n\n  bert-max_length128-batch_size6.mar\n\nThis file is essentially an archive associated with a fixed version of your model along with its dependencies (e.g. the handler code).\n\n.. note::\n\n  The version specified in the ``torch-model-archiver`` command can be appended to REST API requests to access a specific version of your model. For example, if your model was hosted locally on port 8080 and named \"bert\", the latest version of your model would be available at ``http://localhost:8080/predictions/bert``, while version 1.0 would be accessible at ``http://localhost:8080/predictions/bert/1.0``. We will see how to perform inference using this API in Step 6.\n\nCreate a `custom config <https://pytorch.org/serve/configuration.html>`_ file to set some parameters. This file will be used to configure the server at launch when we run ``torchserve --start``.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/torchserve.config\n    :language: properties\n    :caption: :download:`torchserve.config </src/examples/pytorch/torchserve/torchserve.config>`\n\n.. note::\n\n  This will cause TorchServe to bind on all interfaces. For security in real-world applications, you’ll probably want to use port 8443 and `enable SSL <https://pytorch.org/serve/configuration.html#enable-ssl>`_.\n\n\n.. _torchserve-run-nx:\n\nRun TorchServe\n--------------\n\nIt's time to start the server. Typically we'd want to launch this in a separate console, but for this demo we’ll just redirect output to a file.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 20\n\nVerify that the server seems to have started okay.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 22\n\n::\n\n  {\n    \"status\": \"Healthy\"\n  }\n\n.. note::\n\n  If you get an error when trying to ping the server, you may have tried before the server was fully launched. 
Check ``torchserve.log`` for details.\n\nUse the Management API to instruct TorchServe to load our model.\n\nFirst, determine the number of NeuronCores available based on your instance size.\n\n.. tab-set::\n\n   .. tab-item:: Inf2\n\n      .. list-table::\n        :header-rows: 1\n\n        * - Instance Size\n          - # of NeuronCores\n        * - xlarge\n          - 2\n        * - 8xlarge\n          - 2\n        * - 24xlarge\n          - 12\n        * - 48xlarge\n          - 24\n\n   .. tab-item:: Trn1\n\n      .. list-table::\n        :header-rows: 1\n\n        * - Instance Size\n          - # of NeuronCores\n        * - 2xlarge\n          - 2\n        * - 32xlarge\n          - 32\n\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 24-26\n\n::\n\n  {\n    \"status\": \"Model \\\"bert-max_length128-batch_size6\\\" Version: 1.0 registered with X initial workers\"\n  }\n\n\n.. warning::\n  You shouldn't set ``INITIAL_WORKERS`` above the number of NeuronCores. If you attempt to load more models than NeuronCores available, one of two things will occur. Either the extra models will fit in device memory but performance will suffer, or you will encounter an error on your initial inference. However, you may want to use fewer cores if you are using the :ref:`neuroncore-pipeline` feature.\n\n\n.. note::\n\n  Any additional attempts to configure the model after the initial curl request will cause the server to return a 409 error. You’ll need to stop/start/configure the server to realize any changes.\n\nThe ``MAX_BATCH_DELAY`` is a timeout value that determines how long to wait before processing a partial batch. This is why the handler code needs to check the batch dimension and potentially add padding. TorchServe will instantiate the number of model handlers indicated by ``INITIAL_WORKERS``, so this value controls how many models we will load onto Inferentia in parallel. If you want to control worker scaling more dynamically, `see the docs <https://pytorch.org/serve/management_api.html#scale-workers>`_.\n\nIt looks like everything is running successfully at this point, so it's time for an inference.\n\nCreate the ``infer_bert.py`` file below on your instance.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/infer_bert.py\n    :language: python\n    :caption: :download:`infer_bert.py </src/examples/pytorch/torchserve/infer_bert.py>`\n    :linenos:\n\nThis script will send a ``batch_size`` number of requests to our model. In this example, we are using a model that estimates the probability that one sentence is a paraphrase of another. The script sends positive examples in the first half of the batch and negative examples in the second half.\n\nExecute the script in your terminal.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 28\n\n::\n\n  1 ['paraphrase']\n  3 ['not paraphrase']\n  4 ['not paraphrase']\n  0 ['paraphrase']\n  5 ['not paraphrase']\n  2 ['paraphrase']\n\nWe can see that the first three threads (0, 1, 2) all report ``paraphrase``, as expected. If we instead modify the script to send an incomplete batch and then wait for the timeout to expire, the excess padding results will be discarded.\n\n\n.. _torchserve-benchmark-nx:\n\nBenchmark TorchServe\n--------------------\n\nWe've seen how to perform a single batched inference, but how many inferences can we process per second? 
A separate upcoming tutorial will document performance tuning to maximize throughput. In the meantime, we can still perform a simple naïve stress test. The code below will spawn 64 worker threads, with each thread repeatedly sending a full batch of data to process. A separate thread will periodically print throughput and latency measurements.\n\n.. literalinclude:: /src/examples/pytorch/torchserve/benchmark_bert.py\n    :language: python\n    :caption: :download:`benchmark_bert.py </src/examples/pytorch/torchserve/benchmark_bert.py>`\n    :linenos:\n\nRun the benchmarking script.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 30\n\n::\n\n  pid 1214554: current throughput 0.0, latency p50=0.000 p90=0.000\n  pid 1214554: current throughput 713.9, latency p50=0.071 p90=0.184\n  pid 1214554: current throughput 737.9, latency p50=0.071 p90=0.184\n  pid 1214554: current throughput 731.6, latency p50=0.068 p90=0.192\n  pid 1214554: current throughput 732.2, latency p50=0.070 p90=0.194\n  pid 1214554: current throughput 733.9, latency p50=0.070 p90=0.187\n  pid 1214554: current throughput 739.3, latency p50=0.071 p90=0.184\n  ...\n\n.. note::\n\n  Your throughput numbers may differ from these based on instance type and size.\n\n**Congratulations!** By now you should have successfully served a batched model over TorchServe.\n\nYou can now shutdown torchserve.\n\n.. literalinclude:: /archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh\n    :language: bash\n    :lines: 32\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/inference/tutorials-torch-neuronx.rst",
    "content": ".. _inference-torch-neuronx-tutorials:\n\n\n.. meta::\n   :description: Tutorials for Inference (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, inference, torch-neuronx, tutorials\n   :date-modified: 2026-03-13\n\n\nTutorials for Inference (``torch-neuronx``)\n===========================================\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb\n    /frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx\n    /archive/torch-neuron/tutorials/tutorial-libtorch\n    /src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb\n    /src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb\n\n\n* HuggingFace pretrained BERT tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb>`\n* TorchServe tutorial :ref:`[html] <pytorch-tutorials-torchserve-neuronx>`\n* LibTorch C++ tutorial (for torch-neuron and torch-neuronx) :ref:`[html] <pytorch-tutorials-libtorch>`\n* Torchvision ResNet50 tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>`\n* T5 inference tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/t5-inference-tutorial.ipynb>`\n\n.. note::\n\n        To use Jupyter Notebook see:\n\n        * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n        * :ref:`running-jupyter-notebook-as-script`\n\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/note-performance.txt",
    "content": ".. note::\n\n    Logs used in tutorials do not present latest performance numbers\n\n    For latest performance numbers visit :ref:`benchmark`"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/analyze_for_training.rst",
    "content": ".. _torch-analyze-for-training-tutorial:\n\n\n.. meta::\n   :description: Analyze for Training Tutorial - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nAnalyze for Training Tutorial\n==============================\n\nThis tutorial explains how to analyze a model for training support using via ``torch-neuronx``.\n\n.. note::\n    For analyzing models for inference support via ``torch-neuronx``, please refer to :ref:`torch_neuronx.analyze() <torch_neuronx_analyze_api>`\n\nSetup\n-----\n\nFor this tutorial we'll be using two scripts: ``supported.py`` and ``unsupported.py``. Create these files by copy pasting the below code to their respective files.\n\n``supported.py``\n\n.. literalinclude:: tutorial_source_code/analyze_training/analyze_training_code.sh\n   :language: python\n   :lines: 3-42\n\n``unsupported.py``\n\n.. literalinclude:: tutorial_source_code/analyze_training/analyze_training_code.sh\n   :language: python\n   :lines: 46-74\n\nRunning ``analyze`` via ``neuron_parallel_compile``\n---------------------------------------------------\n\nTo analyze a model, we supply the training script to the ``analyze`` command, which is shipped with ``neuron_parallel_compile``.\nThe command is:\n\n.. literalinclude:: tutorial_source_code/analyze_training/analyze_training_code.sh\n   :language: bash\n   :lines: 78\n\nThis will generate a lot of output showing a lot of compilation statuses.\nHere's a snippet of the output when running the above command. \n\n.. code:: shell\n\n    .2023-05-25 00:43:43.000394:  776642  INFO ||ANALYZE||: Compiling /tmp/model_analyis_graphs/compare_7841189860629745939_23.hlo.pb using following command: neuronx-cc compile --target=trn1 --framework XLA /tmp/model_analyis_graphs/compare_7841189860629745939_23.hlo.pb --verbose=35 --query-compute-placement \n    2023-05-25 00:43:43.000418:  776642  INFO ||ANALYZE||: Compiling /tmp/model_analyis_graphs/multiply_15640857564712679356_53.hlo.pb using following command: neuronx-cc compile --target=trn1 --framework XLA /tmp/model_analyis_graphs/multiply_15640857564712679356_53.hlo.pb --verbose=35 --query-compute-placement \n    .\n    Compiler status PASS\n    2023-05-25 00:43:43.000549:  776642  INFO ||ANALYZE||: Compiling /tmp/model_analyis_graphs/subtract_1927104012014828209_49.hlo.pb using following command: neuronx-cc compile --target=trn1 --framework XLA /tmp/model_analyis_graphs/subtract_1927104012014828209_49.hlo.pb --verbose=35 --query-compute-placement \n    ...\n    Compiler status PASS\n\n\nThe analysis report will be generated as a JSON file.\nThe location of the report is shown as the last log entry:\n\n.. code:: shell\n\n    2023-05-25 00:43:49.000252:  776642  INFO ||ANALYZE||: Removing existing report /home/ubuntu/analyze_for_training/model_analysis_result/result.json\n    2023-05-25 00:43:49.000252:  776642  INFO ||ANALYZE||: Model analysis completed. Report - /home/ubuntu/analyze_for_training/model_analysis_result/result.json\n\n.. note::\n\n    Note that if a report is already present in the specified path, ``analyze`` will remove/overwrite it.\n\nThe report generated running the above command looks like:\n\n.. 
code:: json\n\n    {\n        \"torch_neuronx_version\": \"1.13.0.1.6.1\",\n        \"neuronx_cc_version\": \"2.5.0.28+1be23f232\",\n        \"support_percentage\": \"100.00%\",\n        \"supported_operators\": {\n            \"aten\": {\n                \"aten::permute\": 8,\n                \"aten::add\": 8,\n                \"aten::mul\": 8,\n                \"aten::expand\": 18,\n                \"aten::mm\": 10,\n                \"aten::mse_loss_backward\": 12,\n                \"aten::relu\": 3,\n                \"aten::threshold_backward\": 4,\n                \"aten::squeeze\": 4,\n                \"aten::view\": 4,\n                \"aten::pow\": 2,\n                \"aten::mse_loss\": 2,\n                \"aten::tanh\": 2\n            }\n        },\n        \"unsupported_operators\": {\n            \"aten\": []\n        }\n    }\n\n.. note::\n\n    Note that the ``torch_neuronx`` and ``neuronx_cc`` versions may be different from this example\n\nUnderstanding ``analyze`` report for Unsupported Models\n-------------------------------------------------------\n\nDefault Verbosity\n~~~~~~~~~~~~~~~~~\n\nLet's run ``analyze`` for ``unsupported.py``\n\n.. literalinclude:: tutorial_source_code/analyze_training/analyze_training_code.sh\n   :language: bash\n   :lines: 80\n\nHere is the report generated by the above command:\n\n.. code:: json\n\n    {\n        \"torch_neuronx_version\": \"1.13.0.1.6.1\",\n        \"neuronx_cc_version\": \"2.5.0.28+1be23f232\",\n        \"support_percentage\": \"60.00%\",\n        \"supported_operators\": {\n            \"aten\": {\n                \"aten::add\": 2,\n                \"aten::mul\": 1\n            }\n        },\n        \"unsupported_operators\": {\n            \"aten\": [\n                {\n                    \"kind\": \"aten::mul\",\n                    \"failureAt\": \"neuronx-cc\",\n                    \"call\": \"test2_unsup.py 24\"\n                }\n            ]\n        }\n    }\n\nIn the list of unsupported operators we are provided the specific aten op that failed, and where that operator is in the training script.\n\nOne thing to notice is that the ``support_percentage`` doesn't exactly add up. This is because the ``support_percentage`` is calculated based on the supported number of XLA/HLO instructions (explained more in the next section). To see the specific XLA/HLO op lowerings, use the flag ``--analyze-verbosity 1``, as the default is ``2``.\n\nThe last thing is that a specific aten operator can be supported and unsupported simultaneously. In our example, this can be seen with ``aten::mul``. This is due to the configuration of the aten op. The below section will describe what went wrong with the ``aten::mul`` op.\n\nLower Level Verbosity\n~~~~~~~~~~~~~~~~~~~~~\n\nLet's run again with lower verbosity level:\n\n.. literalinclude:: tutorial_source_code/analyze_training/analyze_training_code.sh\n   :language: bash\n   :lines: 82\n\nThe report looks like:\n\n.. 
code:: json\n\n    {\n        \"torch_neuronx_version\": \"1.13.0.1.6.1\",\n        \"neuronx_cc_version\": \"2.5.0.28+1be23f232\",\n        \"support_percentage\": \"60.00%\",\n        \"supported_operators\": {\n            \"aten\": {\n                \"aten::mul\": 1,\n                \"aten::add\": 2\n            },\n            \"xla\": [\n                \"f32[] multiply(f32[], f32[])\",\n                \"f32[4]{0} broadcast(f32[]), dimensions={}\",\n                \"f32[4]{0} add(f32[4]{0}, f32[4]{0})\"\n            ]\n        },\n        \"unsupported_operators\": {\n            \"aten\": [\n                {\n                    \"kind\": \"aten::mul\",\n                    \"failureAt\": \"neuronx-cc\",\n                    \"call\": \"test2_unsup.py 24\"\n                }\n            ],\n            \"xla\": [\n                {\n                    \"hlo_instruction\": \"c64[4]{0} convert(f32[4]{0})\",\n                    \"aten_op\": \"aten::mul\"\n                },\n                {\n                    \"hlo_instruction\": \"c64[4]{0} multiply(c64[4]{0}, c64[4]{0})\",\n                    \"aten_op\": \"aten::mul\"\n                }\n            ]\n        }\n    }\n\nThis report provides both the aten operator and the failed XLA/HLO instructions. There will be more HLO instructions than aten ops since an aten op generally lowers to multiple HLO instructions. As a result, the ``support_percentage`` field doesn't exactly line up with the aten operator count, but does line up the XLA/HLO instruction count. This level of verbosity is intended for use when you have the ability to modify the model's HLO lowering, or generally have insight into the HLO lowering.\n\nAs mentioned before, the ``aten::mul`` op appears to be both supported and unsupported. This is because the compiler does not support a specific configuration of ``aten::mul``, which can be seen more clearly with the HLO lowering. In the above example, the ``aten::mul`` operator is unsupported since at least one parameter provided was a complex type (``C64``), which is unsupported by ``neuronx-cc``.\n\nThis concludes the tutorial. The API for ``analyze`` can be found within :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/bert.rst",
    "content": ".. _hf-bert-pretraining-tutorial:\n\n\n.. meta::\n   :description: Hugging Face BERT Pretraining Tutorial (Data-Parallel) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nHugging Face BERT Pretraining Tutorial (Data-Parallel)\n======================================================\n\n.. important::\n   Neuron will stop supporting XLA-based training support in a future release. For now, this tutorial is provided strictly for reference.\n\nThis tutorial explains how to run Hugging Face BERT-Large model\npretraining on Trainium using PyTorch Neuron and data-parallel mode.\n\nThe Hugging Face BERT pretraining example demonstrates the steps\nrequired to perform single-node, multi-accelerator PyTorch model\ntraining using the new AWS EC2 Trn1 (Trainium) instances and the AWS\nNeuron SDK. This tutorial is an adaptation of an existing `BERT\nexample <https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/run_pretraining.py>`__\nwith the following important characteristics:\n\n-  Framework: PyTorch/XLA\n-  Model: Hugging Face BertForPreTraining\n-  Optimizer: AdamW, LAMB (Layerwise Adaptive Moments optimizer)\n-  Scheduler: Hugging Face's get_linear_schedule_with_warmup\n-  Allreduce occurs before optimizer step, after gradient accumulations\n   (following DeepSpeed's Smart Gradient Accumulation)\n-  Training data types: Float32, full BFloat16 and Stochastic Rounding (SR), full BFloat16 with fp32 copy of weights, PyTorch Autocast (Automatic Mixed Precision or AMP)\n\nAs done in the original BERT paper, BERT pretraining happens in two\nphases. In the first phase (phase 1) BERT maximum sequence length is fixed\nat 128 tokens, while in phase 2 it is fixed at 512 tokens.\n\nNeuron provides access to Trainium devices through an extension of PyTorch/XLA - a library that includes the familiar PyTorch interface along with XLA-specific additions. For additional details\nrelating to PyTorch/XLA, please refer to the `official PyTorch/XLA\ndocumentation <https://pytorch.org/xla/>`__.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 3\n\n\n.. include:: ../note-performance.txt\n\n\nPhase 1 BFloat16 BERT-Large pretraining with AdamW and stochastic rounding\n--------------------------------------------------------------------------\n\n\nSetting up the training environment on trn1.32xlarge\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe BERT training script ``dp_bert_large_hf_pretrain_hdf5.py`` (`source <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/dp_bert_hf_pretrain/dp_bert_large_hf_pretrain_hdf5.py>`_)\ncan run on a Trainium instance (trn1.32xlarge) that contains the\nappropriate Neuron runtime and Python dependencies.\n\nFirst, on a trn1.32xlarge instance, follow the installation instructions at:\n\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`\n\nPlease set the storage of instance to *512GB* or more if you intent to run multiple experiments and save many checkpoints.\n\nFor all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:\n\n.. 
code:: shell\n\n   source ~/aws_neuron_venv_pytorch/bin/activate\n\nNext, clone the `AWS Neuron Samples repository <https://github.com/aws-neuron/aws-neuron-samples/>`_ and install requirements in the BERT tutorial directory ``aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain`` (`directory link <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/dp_bert_hf_pretrain>`_):\n\n.. code:: bash\n\n   cd ~/\n   git clone https://github.com/aws-neuron/aws-neuron-samples.git\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_setup_code.sh\n   :language: shell\n   :lines: 5\n\n\nDownloading tokenized and sharded dataset files\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo download the tokenized and sharded dataset files needed for this tutorial, please run the following commands:\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_setup_code.sh\n   :language: shell\n   :lines: 8-16\n\n``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128`` will now have the tokenized and sharded dataset files for phase 1 pretraining and ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512`` for phase 2 pretraining.\n\nNumber of workers\n~~~~~~~~~~~~~~~~~~\n\nYou will be using torchrun (`PyTorch's Elastic Launch <https://pytorch.org/docs/stable/elastic/run.html>`__) to run some of the commands in this tutorial. When running the training script, you can configure the number of\nNeuronCores to use for training by using torchrun's ``--nproc_per_node`` option. In this tutorial, we use 32 NeuronCores on trn1.32xlarge.\n\n.. note::\n\n    Currently the Neuron Runtime only supports 1 and 2 worker configurations on trn1.2xlarge and 1, 2, 8, and 32-worker configurations on trn1.32xlarge.\n\n.. _bf16_sr_phase1:\n\nBFloat16 and stochastic rounding in phase 1\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPhase 1 pretraining performance can be increased by using BFloat16 casting\nand stochastic rounding. BFloat16 casting and stochastic rounding can be enabled by moving the model to BFloat16 using the ``model.to(torch.bfloat16)`` expression in the training code and setting the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``; both are done in the BERT pretraining example ``dp_bert_large_hf_pretrain_hdf5.py`` by default. Also in the BERT pretraining example, the loss is kept in FP32 to ensure a smooth loss curve when loss averaging is used. We also preserve the optimizer states in FP32 using a modified HuggingFace AdamW implementation in order to match FP32 loss with BFloat16.\nTo achieve maximum performance while maintaining loss\nconvergence characteristics, we are using a batch size of 16 and\ngradient accumulation microsteps of 32 to maintain a global batch size of 16384 for phase 1.\nThe batch size and gradient accumulation microstep changes can be set by\nlaunching the BERT pretraining script ``dp_bert_large_hf_pretrain_hdf5.py`` with\ncommand-line arguments ``--batch_size=16 --grad_accum_usteps=32``, as seen in the following steps.\n\nAnother option with BFloat16 using PyTorch AutoCast (Automatic Mixed Precision or AMP) is covered in :ref:`amp-sr-phase1`.\n\n.. note::\n\n   ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated starting in torch-xla 2.1, and their usage would result in warnings. They will become no-operations in torch-xla 2.6. 
Please switch to using ``model.to(torch.bfloat16)`` or AMP.\n\nPre-compilation\n~~~~~~~~~~~~~~~~\n\nPyTorch Neuron evaluates operations lazily during execution of the training loops, which means it builds a symbolic\ngraph in the background and the graph is executed in hardware only when the tensor is printed, transferred to CPU, or ``xm.mark_step()`` is encountered (``xm.mark_step()`` is implicitly called by ``pl.MpDeviceLoader/pl.ParallelLoader``). During execution of the training loops, PyTorch Neuron can build multiple graphs depending on the number of conditional paths taken. For BERT-Large pretraining, PyTorch Neuron builds multiple unique graphs that should be compiled before running on the NeuronCores. PyTorch Neuron will compile those graphs only if they are not in the XLA in-memory cache or the persistent cache. To reduce the compilation time of these graphs, you can pre-compile those graphs using the utility ``neuron_parallel_compile`` (provided by the ``libneuronxla`` package, a transitive dependency of ``torch-neuronx``) as shown:\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_precompilation_code.sh\n   :language: shell\n   :lines: 5-10\n\nThis command performs a fast trial run of the training script to build\ngraphs and then does parallel compilations on those graphs using multiple processes of the Neuron Compiler before\npopulating the on-disk persistent cache with compiled graphs. This helps make\nthe actual training run faster because the compiled graphs will be loaded from the persistent cache.\nCurrently it takes ~13 minutes to compile the BERT-Large model training step using the pre-compilation script (compared to ~40 minutes if not using the pre-compilation script).\nNote that the command above specifies 32 NeuronCores for trn1.32xlarge via the ``--nproc_per_node`` option.\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s128.sh`` is provided in the same BERT tutorial directory for convenience and you can simply run the script using ``neuron_parallel_compile ./run_dp_bert_large_hf_pretrain_bf16_s128.sh`` to start the precompilation.\n\nThe pretokenized dataset is expected to be at ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128/`` by default (see above for downloading instructions) and can be changed via the ``--data_dir`` option.\n\n.. note::\n\n    The trial run during pre-compilation currently outputs invalid loss numbers. Please disregard them.\n\n.. note::\n\n    The command after ``neuron_parallel_compile`` should match the actual run command, except for the option ``--steps_this_run`` which shortens the trial run just enough to allow the tool to build all the graphs needed for the actual run.\n\n\nIf you interrupt\nthe run and restart the execution without changing model configurations or training hyperparameters, the new run will detect the cached\ngraphs in the persistent cache (on-disk) and reload the compiled graphs for\nexecution, avoiding any recompilation time.\n\nChanges made to the BERT model configuration (layers, hidden\nsize, attention heads in the get_model function), batch size (using the\n``--batch_size`` option), optimizer or number of workers may trigger\ngraph recompilation. It is best to rerun the pre-compilation step above if these changes are made.\n\nYou can adjust the following hyperparameters without changing the model\nand causing recompilation:\n\n-  Number of global steps to run (``--steps_this_run`` option)\n-  Learning rate (``--lr`` option)\n-  Gradient accumulation steps > 1 (``--grad_accum_usteps`` option). 
If\n   1 then there's no gradient accumulation and the graphs change causing\n   recompilation.\n\nInitiating a Training Job\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAfter running the pre-compilation step, continue\nwith the actual phase 1 pretraining by running the following\nset of commands to launch 32 data parallel distributed training workers on trn1.32xlarge:\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_training_code.sh\n   :language: shell\n   :lines: 5-9\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s128.sh`` is provided in the same BERT tutorial directory for convenience and you can simply run the script to start the training.\n\n\nThe following messages indicate that the Neuron Runtime is initializing:\n\n.. code:: bash\n\n   Using Neuron Runtime\n   Using Neuron Runtime\n   Using Neuron Runtime\n   Using Neuron Runtime\n   Using Neuron Runtime\n   ...\n\nA few moments later, you will see the Training Configuration and Model\nConfiguration in the output:\n\n.. code:: bash\n\n   --------TRAINING CONFIG----------\n   Namespace(batch_size=16, data_dir='~/examples_datasets/\n   bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128/', debug=False,\n   enable_pt_autocast=False, grad_accum_usteps=32, local_rank=0, lr=0.0004,\n   max_pred_len=20, max_steps=28125, metrics_file='/tmp/test_dict.json',\n   minimal_ckpt=False, num_ckpts_to_keep=1, output_dir='./output',\n   phase1_end_step=28125, phase2=False, resume_ckpt=False, resume_step=-1,\n   seed=12349, seq_len=128, shards_per_ckpt=1, steps_this_run=28125, warmup_steps=2000)\n\n.. code:: bash\n\n   --------MODEL CONFIG----------\n   BertConfig {\n   \"_name_or_path\": \"bert-large-uncased\",\n   \"architectures\": [\n   \"BertForMaskedLM\"\n   ],\n   \"attention_probs_dropout_prob\": 0.1,\n   \"classifier_dropout\": null,\n   \"gradient_checkpointing\": false,\n   \"hidden_act\": \"gelu\",\n   \"hidden_dropout_prob\": 0.1,\n   \"hidden_size\": 1024,\n   \"initializer_range\": 0.02,\n   \"intermediate_size\": 4096,\n   \"layer_norm_eps\": 1e-12,\n   \"max_position_embeddings\": 512,\n   \"model_type\": \"bert\",\n   \"num_attention_heads\": 16,\n   \"num_hidden_layers\": 24,\n   \"pad_token_id\": 0,\n   \"position_embedding_type\": \"absolute\",\n   \"transformers_version\": \"4.15.0\",\n   \"type_vocab_size\": 2,\n   \"use_cache\": true,\n   \"vocab_size\": 30522\n   }\n\nAs the worker processes begin training on the BERT dataset, you will\nbegin to see training metrics and the learning rate logged to the\nconsole approximately every training step. The metrics include\naverage_loss, step_loss, and throughput:\n\n.. 
code:: bash\n\n    LOG Thu Sep 29 22:30:10 2022 - (0, 78) step_loss : 9.1875  learning_rate : 1.56e-05  throughput : 2873.14\n    LOG Thu Sep 29 22:30:16 2022 - (0, 79) step_loss : 8.9375  learning_rate : 1.58e-05  throughput : 2878.09\n    LOG Thu Sep 29 22:30:22 2022 - (0, 80) step_loss : 9.0000  learning_rate : 1.60e-05  throughput : 2875.31\n    LOG Thu Sep 29 22:30:27 2022 - (0, 81) step_loss : 9.0000  learning_rate : 1.62e-05  throughput : 2877.35\n    LOG Thu Sep 29 22:30:33 2022 - (0, 82) step_loss : 8.8750  learning_rate : 1.64e-05  throughput : 2872.55\n    LOG Thu Sep 29 22:30:39 2022 - (0, 83) step_loss : 9.0000  learning_rate : 1.66e-05  throughput : 2876.17\n    LOG Thu Sep 29 22:30:44 2022 - (0, 84) step_loss : 9.1250  learning_rate : 1.68e-05  throughput : 2872.48\n    LOG Thu Sep 29 22:30:50 2022 - (0, 85) step_loss : 9.0000  learning_rate : 1.70e-05  throughput : 2873.39\n\nBy default, the training script will store all output files under\n``~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain/output``. The output files consist of\nthe following:\n\n-  PyTorch model checkpoint files, with names containing the global step\n   of the checkpoint (ckpt_2000.pt, ckpt_4000.pt, etc.). Currently, the\n   training script saves a checkpoint after every dataset shard.\n   The frequency of saving checkpoint can be reduced by increasing the number of\n   dataset shards per checkpoint, using option ``--shards_per_ckpt``.\n   Furthermore, the number of checkpoints kept at a given time is limited by ``--num_ckpts_to_keep`` option (currently default to 1).\n\n-  TensorBoard log files (each training run will store its logs in a\n   subdirectory with prefix ``neuron_tblogs_``).\n\nMonitoring Progress of the Training Job\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUsing a single Trn1 instance with 32 NeuronCores, the current BERT\nphase 1 pretraining will finish in about 45 hours. During this time, you will\nsee the average loss metric begin at about 11.2 and ultimately converge to about 1.4.\n\nMonitoring Training Job Progress using neuron-top\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWith the training job still running, launch a second SSH connection into\nthe trn1 instance, and use the ``neuron-top`` command to examine the\naggregate NeuronCore utilization. If you have not modified the ``--nproc_per_node`` option\nin the run command, you should observe that\nall 32 NeuronCores are participating in the training job, with\nutilization fluctuating around 80%.\n\nMonitoring Training Job Progress using TensorBoard\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe demo includes TensorBoard-compatible logging, which allows the\nlearning rate and training metrics to be monitored in real-time. By\ndefault, the training script logs metrics to the following TensorBoard\nlog directory ``~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain/output/neuron_tblogs_<date/time>_<training configs>``.\n\nIn order to view your training metrics in TensorBoard, first run the\nfollowing commands in your SSH session:\n\n.. code:: bash\n\n   cd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\n   tensorboard --logdir ./output\n\nOnce running, open a new SSH connection to the instance and port-forward\nTCP port 6006 (ex: ``ssh -L 6006:127.0.0.1:6006 user_name@remote_ip``). 
Once the tunnel is\nestablished, TensorBoard can then be accessed via web browser at the\nfollowing URL: `http://localhost:6006 <http://localhost:6006/>`__.\nPlease note that you will not be able to access TensorBoard if you\ndisconnect your port-forwarding SSH session to the Trainium instance.\n\n.. image:: tensorboard.png\n   :alt: Image: tensorboard.png\n\n\nFinishing the tutorial\n~~~~~~~~~~~~~~~~~~~~~~~\n\nOnce you are ready, there are a couple of options for finishing\nthe BERT pretraining demo:\n\n1. **Allow the training script to run to completion**. If you would like\n   to observe the training script run to completion, it is recommended\n   to launch the training script from a terminal multiplexer such as\n   ``tmux`` or ``screen``, and then detach the session so that the\n   training script can run in the background. With this approach, you\n   can safely let the training script run unattended, without risk of an\n   SSH disconnection causing the training job to stop running.\n2. **Stop the training job early**. To stop the training job early,\n   press CTRL-C in the terminal window in which you launched the\n   training script. In some cases, if you manually cancel a job using\n   CTRL-C and then later want to run the job again, you might first need\n   to execute ``sudo rmmod neuron; sudo modprobe neuron`` in order to\n   reload/reset the Neuron driver.\n\nPhase 1 BERT-Large pretraining with Layerwise Adaptive Moments based optimizer (LAMB)\n----------------------------------------------------------------------------------------\nSometimes, to reduce the training wall time, you can use a higher learning rate and a larger global batch size. The approach is discussed in `LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76 MINUTES <https://arxiv.org/pdf/1904.00962.pdf>`__. Trainium supports LAMB, and in this tutorial, we use the publicly available XLA-friendly LAMB implementation from https://github.com/rwightman/pytorch-image-models/blob/master/timm/optim/lamb.py.\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_lamb_training_code.sh\n   :language: shell\n   :lines: 5-12\n\nThe command-line argument ``--optimizer LAMB`` is needed; otherwise, the default optimizer AdamW will be used. In addition, you need to use a set of hyper-parameters that supports the larger global batch size (GBS). In this case, we have 64k as GBS for LAMB and use a set of hyper-params similar to https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/README.md. Given the higher GBS of LAMB compared to AdamW, it takes fewer steps (roughly 7k) to achieve a similar level of accuracy as AdamW, which takes more than 28k steps. You can also use different data types on top of LAMB. Below is an example using BFloat16 and stochastic rounding.\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_lamb_bf16_training_code.sh\n   :language: shell\n   :lines: 5-12\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s128_lamb.sh`` is provided in the same BERT tutorial directory for convenience and you can simply run the script to start the training.\n\n.. _fp32paramscopy-sr-phase1:\n\nPhase 1 BFloat16 BERT-Large pretraining with AdamW and FP32 copy of weights\n------------------------------------------------------------------------------\nBFloat16 training can be achieved without stochastic rounding when a copy of weights is kept in FP32. 
To train BERT-Large with AdamW and FP32 copy of weights, specify the ``--optimizer=AdamW_FP32ParamsCopy`` option when calling the BERT pretraining script (stochastic rounding is off):\n\n.. code:: bash\n\n    cd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\n    torchrun --nproc_per_node=32 dp_bert_large_hf_pretrain_hdf5.py \\\n    --batch_size 16 \\\n    --optimizer=AdamW_FP32ParamsCopy \\\n    --grad_accum_usteps 32 |& tee run_pretrain_log.txt\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s128.sh`` is provided in the same BERT tutorial directory for convenience, and you can simply run the script with the ``fp32paramscopy`` option, for example ``./run_dp_bert_large_hf_pretrain_bf16_s128.sh fp32paramscopy``, to start the training with an FP32 copy of weights.\n\n\n.. _amp-sr-phase1:\n\nPhase 1 BERT-Large pretraining with AdamW and PyTorch Autocast (Automatic Mixed Precision or AMP)\n--------------------------------------------------------------------------------------------------\nBesides the :ref:`bf16_sr_phase1`, you can also use `PyTorch Autocast for XLA (Automatic Mixed Precision or AMP) <https://github.com/pytorch/xla/blob/master/docs/source/perf/amp.md>`__, which automatically converts operations to either a lower precision (like BFloat16) or Float32. This generally provides better performance than full Float32 due to higher compute density and a lower memory footprint.\nWith the BERT-Large pretraining scripts you can use AMP by specifying the ``--enable_pt_autocast`` option without enabling stochastic rounding (``NEURON_RT_STOCHASTIC_ROUNDING_EN`` is not set).\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_amp_training_code.sh\n   :language: shell\n   :lines: 5-10\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s128.sh`` is provided in the same BERT tutorial directory for convenience, and you can simply run the script with the ``amp`` option, for example ``./run_dp_bert_large_hf_pretrain_bf16_s128.sh amp``, to start the training with AMP.\n\nUnder the hood, ``--enable_pt_autocast`` wraps only the forward pass and loss computation in the PyTorch autocasting context. The backward pass is NOT in the PyTorch autocasting context. This converts compute operations such as matrix multiply, convolution, activation, and pooling to a lower precision such as BFloat16 while keeping numerically sensitive operations such as softmax and cross-entropy in Float32. For information about operations that are autocast, please see the `PyTorch Autocast for XLA AMP guide <https://github.com/pytorch/xla/blob/master/docs/source/perf/amp.md#supported-operators>`__.\n\n.. 
code:: python\n\n             with torch.autocast(enabled=flags.enable_pt_autocast, dtype=torch.bfloat16, device_type='xla'):\n                 outputs = model(input_ids=input_ids,\n                                 attention_mask=input_mask,\n                                 token_type_ids=segment_ids,\n                                 labels=masked_lm_labels,\n                                 next_sentence_label=next_sentence_labels)\n                 loss = outputs.loss / flags.grad_accum_usteps\n             loss.backward()\n             running_loss += loss.detach()\n\nPhase 1 BERT-Large pretraining on two instances\n-----------------------------------------------\n\nIf you have two trn1.32xlarge instances with EFA-enabled interfaces, using `EFA-enabled security group <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html#nccl-start-base-setup>`__, and setup using :ref:`Install PyTorch Neuron on Trn1 <pytorch-neuronx-install>`, you can run\nmulti-instance BERT-Large pretraining. The following example demonstrate running BERT phase 1 pretraining on two instances.\nTo ensure that the global batch size remains at 16384 for phase 1, the gradient accumulation microstep count is reduced by half when the number of instances is 2.\nNOTE: To run on multiple instances, you will need to use trn1.32xlarge instances and using all 32 NeuronCores on each instance.\n\nOn the rank-0 Trn1 host (root), run with ``--node_rank=0`` using torchrun utility, and ``--master_addr`` set to rank-0 host's IP address:\n\n.. code:: shell\n\n   cd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\n   export FI_EFA_USE_DEVICE_RDMA=1\n   export FI_PROVIDER=efa\n   export BUCKET_CAP_MB=512\n   export XLA_TRANSFER_SEED_ASYNC=1\n   torchrun --nproc_per_node=32 --nnodes=2 --node_rank=0 --master_addr=<root IP> --master_port=2020 \\\n   dp_bert_large_hf_pretrain_hdf5.py \\\n   --batch_size 16 \\\n   --grad_accum_usteps 16 |& tee run_pretrain_log.txt\n\nOn another Trn1 host, run with ``--node_rank=1``, and ``--master_addr`` also set to rank-0 host's IP address:\n\n.. code:: shell\n\n   cd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\n   export FI_EFA_USE_DEVICE_RDMA=1\n   export FI_PROVIDER=efa\n   export BUCKET_CAP_MB=512\n   export XLA_TRANSFER_SEED_ASYNC=1\n   torchrun --nproc_per_node=32 --nnodes=2 --node_rank=1 --master_addr=<root IP> --master_port=2020 \\\n   dp_bert_large_hf_pretrain_hdf5.py \\\n   --batch_size 16 \\\n   --grad_accum_usteps 16 |& tee run_pretrain_log.txt\n\nIt is important to launch rank-0 worker with ``--node_rank=0`` to avoid hang.\n\nTo train on multiple instances, it is recommended to use a ParallelCluster. For a ParallelCluster example, please see `Train a model on AWS Trn1 ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`__.\n\nPhase 2 BERT-Large pretraining\n------------------------------\n\nAs mentioned above, BERT pretraining happens in two\nphases. In phase 1, the sequence length is 128.\nIn phase 2, the sequence length increases to 512.\nThis additional training phase will further reduce the pretraining\nloss and improve the metrics for the fine-tune tasks that usually\nfollow. 
The setup is very similar to phase 1, with some differences\nin the training environment and command-line arguments highlighted below.\n\nTraining Environment\n~~~~~~~~~~~~~~~~~~~~\n\nThe following dataset and checkpoint are required:\n\n* ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512`` is the WikiCorpus training dataset that is preprocessed (tokenized and pre-masked) for phase 2.\n\n* ``~/examples/dp_bert_hf_pretrain/output/ckpt_<phase1_end_step>.pt`` is the final checkpoint from phase 1. It’s generated automatically at the end of phase 1 pretraining. For convenience, one can also download the example available at ``s3://neuron-s3/training_checkpoints/pytorch/dp_bert_large_hf_pretrain/ckpt_28125.pt``, which is collected after 28125 training steps in phase 1. Phase 2 will continue training by loading this checkpoint. During its progression, phase 2 continues to generate its own checkpoints in the output directory, following the naming convention ``ckpt_<global_steps>.pt``.\n\nInitiating a Training Job\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo launch the phase 2 pretraining job with the AdamW optimizer, run the same Python script ``dp_bert_large_hf_pretrain_hdf5.py``\nas before except with different options for phase 2.\nFor phase 2, we use a global batch size of 32768, with a per-worker device batch size of 2\nand 512 gradient accumulation microsteps (2 x 512 x 32 workers = 32768). The pretokenized dataset is expected to be at ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512/`` following the setup steps above and is set via the ``--data_dir`` option.\n\n.. literalinclude:: tutorial_source_code/bert_training/bert_phase2_training_code.sh\n   :language: shell\n   :lines: 6-19\n\nThe script ``run_dp_bert_large_hf_pretrain_bf16_s512_phase2.sh`` is provided in the same BERT tutorial directory for convenience, and you can simply run the script to start the training with the AdamW optimizer. Similarly, you can use the LAMB optimizer with the script ``run_dp_bert_large_hf_pretrain_bf16_s512_lamb_phase2.sh``.\n\nThe output below is expected as the job is initiated. Step 28125 is the phase1_end_step in this run, which could be different if phase 1 training stops at a different global step.\n\n.. 
code:: shell\n\n    Worker 21 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n    Worker 23 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n    Worker 27 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n    Worker 26 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n    Worker 20 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n    Worker 22 resuming from checkpoint ./output/ckpt_28125.pt at step 28125\n\n    --------TRAINING CONFIG----------\n    Namespace(batch_size=2, data_dir='/home/ec2-user/examples_datasets/\n    bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512/', debug=False,\n    enable_pt_autocast=False, grad_accum_usteps=512, local_rank=0, lr=0.0002,\n    max_pred_len=80, max_steps=28125, metrics_file='/tmp/test_dict.json',\n    minimal_ckpt=False, num_ckpts_to_keep=1, output_dir='./output',\n    phase1_end_step=28125, phase2=True, resume_ckpt=True, resume_step=-1,\n    seed=12349, seq_len=512, shards_per_ckpt=1, steps_this_run=32, warmup_steps=781)\n\n    --------MODEL CONFIG----------\n    BertConfig {\n      \"_name_or_path\": \"bert-large-uncased\",\n      \"architectures\": [\n        \"BertForMaskedLM\"\n      ],\n      \"attention_probs_dropout_prob\": 0.1,\n      \"classifier_dropout\": null,\n      \"gradient_checkpointing\": false,\n      \"hidden_act\": \"gelu\",\n      \"hidden_dropout_prob\": 0.1,\n      \"hidden_size\": 1024,\n      \"initializer_range\": 0.02,\n      \"intermediate_size\": 4096,\n      \"layer_norm_eps\": 1e-12,\n      \"max_position_embeddings\": 512,\n      \"model_type\": \"bert\",\n      \"num_attention_heads\": 16,\n      \"num_hidden_layers\": 24,\n      \"pad_token_id\": 0,\n      \"position_embedding_type\": \"absolute\",\n      \"transformers_version\": \"4.15.0\",\n      \"type_vocab_size\": 2,\n      \"use_cache\": true,\n      \"vocab_size\": 30522\n    }\n\nAs the phase 2 training proceeds, similar metrics to phase 1 will appear on the console, showing the loss, learning rate, and throughput:\n\n.. code:: shell\n\n    LOG Tue Sep 27 20:56:35 2022 - (0, 26) step_loss : 4.3438  learning_rate : 6.66e-06  throughput : 494.55\n    LOG Tue Sep 27 20:57:40 2022 - (0, 27) step_loss : 4.0938  learning_rate : 6.91e-06  throughput : 495.67\n    LOG Tue Sep 27 20:58:46 2022 - (0, 28) step_loss : 4.1875  learning_rate : 7.17e-06  throughput : 496.18\n    LOG Tue Sep 27 20:59:53 2022 - (0, 29) step_loss : 4.0000  learning_rate : 7.43e-06  throughput : 495.31\n    LOG Tue Sep 27 21:00:58 2022 - (0, 30) step_loss : 4.2500  learning_rate : 7.68e-06  throughput : 495.60\n    LOG Tue Sep 27 21:02:05 2022 - (0, 31) step_loss : 4.3125  learning_rate : 7.94e-06  throughput : 495.50\n    LOG Tue Sep 27 21:03:10 2022 - (0, 32) step_loss : 4.4688  learning_rate : 8.19e-06  throughput : 496.02\n\nTools\n-----\n\nWhile running the tutorial, try experimenting with the following Neuron\ntools, which help monitor and evaluate compute utilization in real-time:\n\nneuron-ls\n~~~~~~~~~\n\nThe ``neuron-ls`` command describes the number of Neuron devices present\nin the system, along with the associated NeuronCore count, memory, and\nPCI device information:\n\n.. image:: neuron-ls.png\n   :alt: Image: image.png\n\nYou will find that the Trn1 instance has 16 Neuron devices, each with 2\nNeuronCores. 
This configuration allows you to train the model using a\ntotal of 32 workers, one per NeuronCore, within a single instance.\n\nAdditional information regarding neuron-ls can be found in the\n`neuron-ls user\nguide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/neuron-ls.html>`__.\n\nneuron-top\n~~~~~~~~~~\n\nThe ``neuron-top`` command presents a high-level view of the Neuron\nenvironment, including the utilization of each of the NeuronCores, any\nmodels that are currently loaded onto one or more NeuronCores, process\nIDs for any processes that are leveraging the Neuron runtime, and basic\nsystem statistics relating to vCPU and memory usage.\n\nPlease note that ``neuron-top`` can either display aggregate NeuronCore\nutilization for 'all' processes (the default), or alternatively display\nthe NeuronCore utilization for a particular process. You can toggle\nthrough the aggregate and per-process views using the ``a`` and ``d``\nkeys. The screenshot below illustrates the default aggregate view:\n\n.. image:: neuron-top.png\n   :alt: Image: image.png\n\nPlease refer to the `neuron-top user\nguide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/neuron-top-user-guide.html>`__\nfor additional details.\n\nGenerating tokenized and sharded dataset files\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis section describes how to generate tokenized and sharded dataset files from the WikiCorpus dataset. If you just want the pregenerated dataset files, please see the ``Downloading tokenized and sharded dataset files`` section above.\n\nOn a c5n.18xlarge instance launched with the Deep Learning Conda AMI and 512GB of disk space, you can generate the preprocessed datasets from the WikiCorpus dataset using NVIDIA's DeepLearningExamples for BERT pretraining. The preprocessing converts the WikiCorpus dataset to tokenized data and shards the data into multiple shards for parallel loading. The full flow takes about 8.7 hours:\n\n.. 
code:: shell\n\n    source activate pytorch_latest_p37\n    cd ~/\n    git clone https://github.com/NVIDIA/DeepLearningExamples.git\n    cd DeepLearningExamples\n    git checkout 81b9010096b6f9812e3977b607669f6ec8b16561\n    sudo mkdir -m a=rwx /workspace\n    cp -rf PyTorch/LanguageModeling/BERT /workspace/bert\n    cd /workspace\n    git clone https://github.com/attardi/wikiextractor.git\n    cd wikiextractor\n    git checkout 6408a430fc504a38b04d37ce5e7fc740191dee16\n    cd /workspace/bert\n    # increase num processes and shards\n    ex -s \"+%s/\\(bertPrep\\.py\\)\\( --action create_hdf5_files\\)/\\1 --n_processes 32 --n_test_shards 1024 --n_training_shards 1024\\2\" \"+wq\" data/create_datasets_from_start.sh\n    export BERT_PREP_WORKING_DIR=/workspace/data/\n    time ./data/create_datasets_from_start.sh wiki_only |& tee log\n\nAfter execution is finished, the phase 1 pre-tokenized and sharded dataset is located at:\n\n``/workspace/data/hdf5_lower_case_1_seq_len_128_max_pred_20_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5/wikicorpus_en/``\n\nCopy this entire directory to ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128`` of the trn1.32xlarge machine.\n\nThe phase 2 pre-tokenized dataset is located at:\n\n``/workspace/data/hdf5_lower_case_1_seq_len_512_max_pred_80_masked_lm_prob_0.15_random_seed_12345_dupe_factor_5/wikicorpus_en/``\n\nCopy this entire directory to ``~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512`` of the trn1.32xlarge machine.\n\nKnown issues and limitations\n----------------------------\n\nBERT-large compilation limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOptimal BERT-large phase 1 (sequence length 128) batch size is currently 8 for FP32 and 16 for full BF16 with stochastic rounding.\nOptimal BERT-large phase 2 (sequence length 512) batch size is currently 1 for FP32 and 2 for full BF16 with stochastic rounding.\n\nBERT-large pretraining with pretokenized dataset hangs when using xm.save\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCurrently, BERT-large pretraining with a pretokenized dataset hangs when\n``xm.save`` is used outside of the main training loop.\n\n.. code:: python\n\n   Loop through HDF5 sharded dataset files:\n       Train on one HDF5 sharded dataset file\n           Loop through batched samples:\n               Training iteration\n       Save checkpoint using xm.save\n\nThe reason is that xm.save has a synchronization point. However, the\nHDF5 sharded data files do not have the same number of training samples,\nso the workers cannot all reach xm.save in the same iteration.\n\nThe workaround is to use ``xm._maybe_convert_to_cpu`` to ensure tensors\nare moved to CPU, followed by ``torch.save``, as done in the BERT-large\npretraining tutorial:\n\n.. code:: python\n\n   cpu_data = xm._maybe_convert_to_cpu(data)\n\nBERT-large two-worker pretraining hangs or runs out of host memory during checkpointing on trn1.2xlarge\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOn trn1.2xlarge, where host memory and CPU resources are limited,\nthe BERT-large two-worker pretraining may hang or run out of host memory during\ncheckpointing. This problem can be worked around by not saving optimizer and\nLR scheduler states in the checkpoint. 
This is enabled by ``--minimal_ckpt`` option\nof the pretraining script.\n\nBERT precompilation using neuron_parallel_compile hangs when using torchrun\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe use neuron_parallel_compile in front of the short run command to do precompilation. However, the following command hangs when running BERT parallel compilation with torchrun:\n\n\n.. code:: bash\n\n    neuron_parallel_compile XLA_DOWNCAST_BF16=1 torchrun --nproc_per_node=32 --nnodes=1 dp_bert_large_hf_pretrain_hdf5.py --steps_this_run 5\n\n    ...\n    Updating train metrics in provide results.json file\n    Current data: {'num_workers': 32, 'epoch': 0, 'steps': 5, 'microsteps': 320, 'loss': -22172234.0, 'train_time_minutes': 0.7424166639645894, 'throughput_average': 1839.0391805624324, 'throughput_peak': 1840.0107059878164, 'batch_size': 8, 'max_length': 128}\n    Updating with data: {'num_workers': 32, 'epoch': 0, 'steps': 5, 'microsteps': 320, 'loss': -22172234.0, 'train_time_minutes': 0.7826640844345093, 'throughput_average': 1744.4691285659471, 'throughput_peak': 1745.4964663587539, 'batch_size': 8, 'max_length': 128}\n    Checkpointing...\n    Checkpointing done...\n    (hangs)\n\nThe fix is to add xm.rendezvous at the end of training to ensure all workers sync up before exiting the script dp_bert_large_pretrain_hdf5.py.\n\n.. code:: python\n\n    def _mp_fn(index, flags):\n        torch.set_default_tensor_type('torch.FloatTensor')\n        train_bert_hdf5(flags)\n        xm.rendezvous(\"_mp_fn finished\")\n\nTroubleshooting\n---------------\n\nThe following are troubleshooting tips related to this tutorial. See\n:ref:`PyTorch Neuron on Trainium Troubleshooting\nGuide <pytorch-neuron-traning-troubleshooting>` for additional troubleshooting\ntips.\n\n.. _modulenotfounderror-no-module-named-torch--torch_xla-transformers-etc:\n\nModuleNotFoundError: No module named 'torch' , 'torch_xla', 'transformers', etc\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you encounter 'ModuleNotFoundError' messages while attempting to run\nthe demo scripts, please ensure that you have activated the appropriate\nPython *virtualenv* which contains all of the demo dependencies:\n\n.. code:: bash\n\n   cd ~\n   source <python virtual environment path>/bin/activate\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/finetune_hftrainer.rst",
    "content": ".. _torch-hf-bert-finetune:\n\n\n.. meta::\n   :description: PyTorch Neuron for Trainium Hugging Face BERT MRPC task finetuning using Hugging Face Trainer API - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nPyTorch Neuron for Trainium Hugging Face BERT MRPC task finetuning using Hugging Face Trainer API\n=================================================================================================\n\n.. important::\n   Neuron will stop supporting XLA-based training support in a future release. For now, this tutorial is provided strictly for reference.\n\n.. note::\n\n   Use Hugging Face `Optimum-Neuron <https://huggingface.co/docs/optimum-neuron/index>`_ for the best coverage and support for Hugging Face models running on AWS Trainium and Inferentia devices.\n\nIn this tutorial, we show how to run a Hugging Face script that uses Hugging Face Trainer API\nto do fine-tuning on Trainium. The example follows the `text-classification\nexample <https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification>`__\nwhich fine-tunes BERT-base model for sequence classification on the GLUE\nbenchmark.\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. include:: ../note-performance.txt\n\nSetup and compilation\n---------------------\n\nBefore running the tutorial please follow the installation instructions at:\n\n:ref:`Install PyTorch Neuron on\nTrn1 <setup-torch-neuronx>`\n\nPlease set the storage of instance to *512GB* or more if you also want to run through the BERT pretraining and GPT pretraining tutorials.\n\nFor all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:\n\n.. code:: shell\n\n   source ~/aws_neuron_venv_pytorch/bin/activate\n\nFirst we install a recent version of HF transformers, scikit-learn and evaluate packages in our environment as well as download the source matching the installed version. In this example, we use the text classification example from HF transformers source:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_setup_code.sh\n   :language: shell\n   :lines: 5-10\n\nSingle-worker training\n----------------------\n\nWe will run MRPC task fine-tuning following the example in README.md located in the path ``~/transformers/examples/pytorch/text-classification``. In this part of the tutorial we will use the Hugging Face model hub's pretrained ``bert-large-uncased`` model.\n\n.. note::\n\n    If you are using older versions of transformers <4.27.0 or PyTorch Neuron <1.13.0, please see section :ref:`workarounds_for_older_versions` for necessary workarounds.\n\nWe use BF16 mixed-precision casting using trainer API ``--bf16`` option and compiler flag ``--model-type=transformer`` to enable best performance.\nWe also launch the ``run_glue.py`` script with ``torchrun`` using ``--nproc_per_node=N`` option to specify the number of workers. Here we start off with 1 worker.\n\n.. note::\n\n    With transformers version 4.44 and up, please use torchrun even for one worker (``--nproc_per_node=1``) to avoid execution hang.\n\nFirst, paste the following script into your terminal to create a “run.sh” file and change it to executable:\n\n.. 
literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_single_worker_training.sh\n   :language: shell\n   :lines: 7-29\n\nWe optionally precompile the model and training script using neuron_parallel_compile to warm up the persistent\ngraph cache (Neuron Cache) such that the actual run has fewer compilations (faster run\ntime):\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_single_worker_training.sh\n   :language: shell\n   :lines: 32\n\nPlease ignore the results from this precompile run as it is only for\nextracting and compiling the XLA graphs.\n\n.. note::\n\n   With both train and evaluation options (``--do_train`` and ``--do_eval``), you will encounter harmless error\n   ``ValueError: Target is multiclass but average='binary'`` when using neuron_parallel_compile.\n\nPrecompilation is optional and only needed to be done once unless hyperparameters such as batch size are modified.\nAfter the optional precompilation, the actual run will be faster with minimal\nadditional compilations.\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_single_worker_training.sh\n   :language: shell\n   :lines: 34\n\nIf precompilation was not done, the first execution of ./run.sh will be slower due to serial compilations. Rerunning the same script a second time would show quicker execution as the compiled graphs will be already cached in persistent cache.\n\n.. _multi_worker_training_parallel:\n\nMulti-worker data-parallel training\n-----------------------------------\n\nThe above script would run one worker on one Logical NeuronCore. To run on\nmultiple Logical NeuronCores in data-parallel configuration, launch the ``run_glue.py`` script with ``torchrun`` using ``--nproc_per_node=N`` option to specify the number of workers\n(N=2 for trn1.2xlarge, and N=2, 8, or 32 for trn1.32xlarge).\n\n.. note::\n\n    If you are using older versions of transformers <4.27.0 or PyTorch Neuron <1.13.0, please see section :ref:`workarounds_for_older_versions` for necessary workarounds.\n\nThe following example runs 2 workers.\nPaste the following script into your terminal to create a “run_2w.sh” file and change it to executable:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 7-29\n\nAgain, we optionally precompile the model and training script using neuron_parallel_compile to warm up the persistent\ngraph cache (Neuron Cache), ignoring the results from this precompile run as it is only for\nextracting and compiling the XLA graphs:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 32\n\nPrecompilation is optional and only needed to be done once unless hyperparameters such as batch size are modified.\nAfter the optional precompilation, the actual run will be faster with minimal\nadditional compilations.\n\n.. 
literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_multi_worker_training_code.sh\n   :language: shell\n   :lines: 34\n\nDuring run, you will now notice that the \"Total train batch size\" is now 16 and the \"Total optimization steps\" is now half the number for one worker training.\n\nConverting BERT pretrained checkpoint to Hugging Face pretrained model format\n-----------------------------------------------------------------------------\nIf you have a pretrained checkpoint (i.e., from the BERT phase 2 pretraining tutorial), you can run the script below (saved as \"convert.py\") to convert BERT pretrained saved checkpoint to Hugging Face pretrained model format. An example phase 2 pretrained checkpoint can be downloaded from ``s3://neuron-s3/training_checkpoints/pytorch/dp_bert_large_hf_pretrain/ckpt_29688.pt``. Note that here we also use the ``bert-large-uncased`` model configuration to match the BERT-Large model trained following BERT phase 2 pretraining tutorial.\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh\n   :language: python\n   :lines: 8-33\n\nRun the conversion script as:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh\n   :language: shell\n   :lines: 35\n\nAfter conversion, the new Hugging Face pretrained model is stored in the output directory specified by the ``--output_saved_model_path`` option which is ``hf_saved_model`` by default. You will use this directory in the next step.\n\nPaste the following script into your terminal to create a “run_converted.sh” file and change it to executable:\n(note that it uses the converted Hugging Face pretrained model in ``hf_saved_model`` directory):\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh\n   :language: shell\n   :lines: 38-61\n\nIf it is the first time running with ``bert-large-uncased`` model or if hyperparameters have changed, then the optional one-time precompilation step can save compilation time:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh\n   :language: shell\n   :lines: 64\n\nIf you have run the single worker training in a previous section, then you can skip the precompilation step and just do:\n\n.. literalinclude:: tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh\n   :language: shell\n   :lines: 67\n\n\n.. _known_issues:\n\nKnown issues and limitations\n----------------------------\n\n``RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['mps', 'cuda', 'xpu', 'hpu', 'cpu', 'mtia', 'privateuseone'] but torch.float32 and xla``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe error ``RuntimeError: `fused=True``` below indicates that the fused option for ``torch.optim.AdamW`` is turned on by default for PyTorch 2.8/2.9 in Hugging Face transformers versions 4.54 and newer. To work-around, use version <=4.53.3 or pass the option ``--optim adamw_torch`` to the ``run_glue.py`` script. This issue will be fixed with the upcoming Neuron PyTorch native which supports ``privateuseone`` device.\n\n.. 
code:: shell\n\n   RuntimeError: `fused=True` requires all the params to be floating point Tensors of supported devices: ['mps', 'cuda', 'xpu', 'hpu', 'cpu', 'mtia', 'privateuseone'] but torch.float32 and xla\n\n\n``INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension ...`` during precompilation of evaluation phase\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDuring precompilation (``neuron_parallel_compile``) of model evaluation phase, you may see the following crash:\n\n.. code:: shell\n\n    Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into; the 1th operand dimension is 2, the 1th output dimension is 0.\n    *** Begin stack trace ***\n        tsl::CurrentStackTrace[abi:cxx11]()\n        xla::Shape const* ConsumeValue<xla::Shape const*>(absl::lts_20230802::StatusOr<xla::Shape const*>&&) \n        ...\n\nThis is due to output dependent logic in HuggingFace Accelerate's ``pad_across_processes`` utility function. To work-around this issue, please add the following code snippet to the top of your run script (i.e. ``run_glue.py``):\n\n.. code:: python\n\n    import os\n    if os.environ.get(\"NEURON_EXTRACT_GRAPHS_ONLY\", \"0\") == \"1\":\n        from accelerate.accelerator import Accelerator\n        def pad_across_processes(self, tensor, dim=0, pad_index=0, pad_first=False):\n            return tensor\n        Accelerator.pad_across_processes = pad_across_processes\n\n\nCompilations for every evaluation step\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDuring model evaluation, there can be small compilations for every evaluation step due to a `known transformers issue <https://github.com/huggingface/transformers/issues/37593>`_. The work-around is to set training arguments ``eval_do_concat_batches=False`` and apply the changes in `the PR <https://github.com/huggingface/transformers/pull/37621>`_ which will be in a future release of transformers package (version 4.52 or later).\n\nRunning one worker fine-tuning without torchrun would result in a hang\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith transformers>=4.44.0, running one worker fine-tuning without torchrun would result in a hang. To workaround and run one worker fine-tuning, use ``torchrun --nproc_per_node=1 <script>``.\n\n\nLong compilation times\n^^^^^^^^^^^^^^^^^^^^^^\n\nLong compilation times can be alleviated by using the ``neuron_parallel_compile`` tool to extract graphs from a short trial run and compile them in parallel ahead of the actual run, as shown above. Subsequent runs would load compiled graphs from the Neuron Cache and thus avoid long compilation times.\n\nCompilation errors during precompilation using ``neuron_parallel_compile`` on small EC2 instances\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen precompiling using batch size of 16 on trn1.2xlarge, you will see ``ERROR ||PARALLEL_COMPILE||: parallel compilation with neuronx-cc exited with error.Received error code: -9``. 
To workaround this error, please set ``NEURON_PARALLEL_COMPILE_MAX_RETRIES=1`` in the environment.\n\n\nVariable input sizes leading to timeouts\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nVariable input sizes: When fine-tuning models such as dslim/bert-base-NER using the `token-classification example <https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification>`__, you may encounter timeouts (lots of \"socket.h:524 CCOM WARN Timeout waiting for RX\" messages) and execution hang. This occurs because NER dataset has different sample sizes, which causes many recompilations and compiled graph (NEFF) reloads. Furthermore, different data parallel workers can execute different compiled graph. This multiple-program multiple-data behavior is currently unsupported. To workaround this issue, please pad to maximum length using the Trainer API option ``--pad_to_max_length``.\n\n\"ValueError: Your setup doesn't support bf16/gpu.\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen using latest HuggingFace transformers version, you may see \"ValueError: Your setup doesn't support bf16/gpu.\" To fix this, please use ``--use_cpu True`` in your scripts.\n\n.. _resolved_hf_issues:\n\nResolved issues\n---------------\n\n-  With torch-neuronx 2.1, HF Trainer API's use of XLA function ``xm.mesh_reduce`` causes ``\"EOFError: Ran out of input\"`` or ``\"_pickle.UnpicklingError: invalid load key, '!'\"`` errors during Neuron Parallel Compile. This is an issue with the trial execution of empty NEFFs and should not affect the normal execution of the training script.\n-  Multi-worker training using Trainer API resulted in too many graph compilations for HF transformers>=4.35: This is resolved with HF transformers>=4.37 with the additional workarounds as shown in `the ticket <https://github.com/aws-neuron/aws-neuron-sdk/issues/813>`_.\n-  Reduced accuracy for RoBERTa-Large is seen with Neuron PyTorch 1.12 (release 2.6) in FP32 mode with compiler BF16 autocast.\n   The workaround is to set NEURON_CC_FLAGS=\"--auto-cast none\" or set NEURON_RT_STOCHASTIC_ROUNDING_EN=1.\n- When running HuggingFace GPT fine-tuning with transformers version >= 4.21.0 and using XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, you might see NaNs in the loss immediately at the first step. This issue occurs due to large negative constants used to implement attention masking (https://github.com/huggingface/transformers/pull/17306). To workaround this issue, please use transformers version <= 4.20.0.\n-  With release 2.6 and transformers==4.25.1,\n   using ``neuron_parallel_compile`` tool to run ``run_glue.py`` script\n   with both train and evaluation options (``--do_train`` and ``--do_eval``), you will encounter harmless error\n   ``ValueError: Target is multiclass but average='binary'``\n-  Using ``neuron_parallel_compile`` tool to run ``run_glue.py`` script\n   with both train and evaluation options (``--do_train`` and ``--do_eval``), you will\n   encounter INVALID_ARGUMENT error. To avoid this, only enable train for parallel\n   compile (``--do_train``). This will cause compilations during evaluation step.\n   The INVALID_ARGUMENT error is fixed in release 2.6 together with latest transformers package version 4.25.1.\n- When using Trainer API option --bf16, you will see \"RuntimeError: No CUDA GPUs are available\". To workaround this error, please add \"import torch; torch.cuda.is_bf16_supported = lambda: True\" to the Python script (i.e. run_glue.py). 
(Trainer API option --fp16 is not yet supported).\n-  When running HuggingFace BERT (any size) fine-tuning tutorial or pretraining tutorial with transformers version >= 4.21.0 and < 4.25.1 and using XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, you will see NaNs in the loss immediately at the first step. More details on the issue can be found at `pytorch/xla#4152 <https://github.com/pytorch/xla/issues/4152>`_. The workaround is to use transformers version < 4.21.0 or >= 4.25.1, or add ``transformers.modeling_utils.get_parameter_dtype = lambda x: torch.bfloat16`` to your Python script (i.e. run_glue.py).\n-  Some recompilation is seen at the epoch boundary even after ``neuron_parallel_compile`` is used. This can be fixed by using the same number of epochs both during precompilation and the actual run.\n-  When running multi-worker training, you may see the process getting killed at the time of model saving on trn1.2xlarge.\n   This happens because the transformers ``trainer.save_model`` api uses ``xm.save`` for saving models.\n   This api is known to cause high host memory usage in multi-worker setting `see Saving and Loading XLA Tensors in  <https://github.com/pytorch/xla/blob/master/API_GUIDE.md>`__ . Coupled with a compilation\n   at the same time results in a host OOM. To avoid this issue, we can: Precompile all the graphs in multi-worker\n   training. This can be done by running the multi-worker training first with ``neuron_parallel_compile <script>``\n   followed by the actual training. This would avoid the compilation at model save during actual training.\n\n.. _workarounds_for_older_versions:\n\nOlder versions of transformers <4.27.0 or PyTorch Neuron <1.13.0\n----------------------------------------------------------------\n\nIf using older versions of transformers package before 4.27.0 or PyTorch Neuron before 1.13.0, please edit the python script run_glue.py and add the following lines after the Python\nimports. They set the compiler flag for transformer model type and enable data parallel training using torchrun:\n\n.. code:: python\n\n    # Enable torchrun\n    import os\n    import torch\n    import torch_xla.distributed.xla_backend\n    from packaging import version\n    from transformers import __version__, Trainer\n    if version.parse(__version__) < version.parse(\"4.26.0\") and os.environ.get(\"WORLD_SIZE\"):\n        torch.distributed.init_process_group('xla')\n\n    # Disable DDP for torchrun\n    import contextlib\n    if version.parse(__version__) < version.parse(\"4.20.0\"):\n        def _wrap_model(self, model, training=True):\n            model.no_sync = lambda: contextlib.nullcontext()\n            return model\n    else:\n        def _wrap_model(self, model, training=True, dataloader=None):\n            model.no_sync = lambda: contextlib.nullcontext()\n            return model\n    Trainer._wrap_model = _wrap_model\n\n    # Workaround for NaNs seen with transformers version >= 4.21.0\n    # https://github.com/aws-neuron/aws-neuron-sdk/issues/593\n    import transformers\n    if os.environ.get(\"XLA_USE_BF16\") or os.environ.get(\"XLA_DOWNCAST_BF16\"):\n        transformers.modeling_utils.get_parameter_dtype = lambda x: torch.bfloat16\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/mlp.rst",
    "content": ".. _neuronx-mlp-training-tutorial:\n\n\n.. meta::\n   :description: Multi-Layer Perceptron Training Tutorial - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nMulti-Layer Perceptron Training Tutorial\n========================================\n\n\n\nMNIST is a standard dataset for handwritten digit recognition. A\nmulti-layer perceptron (MLP) model can be trained with MNIST dataset to\nrecognize hand-written digits. This tutorial starts with a 3-layer MLP\ntraining example in PyTorch on CPU, then show how to modify it to run on\nTrainium using PyTorch Neuron. It also shows how to do multiple worker\ndata parallel MLP training.\n\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\n.. include:: ../note-performance.txt\n\nSetup environment and download examples\n---------------------------------------\n\nBefore running the tutorial please follow the installation instructions at:\n\n:ref:`Install PyTorch Neuron on\nTrn1 <setup-torch-neuronx>`\n\nPlease set the storage of instance to *512GB* or more if you also want to run through the BERT pretraining and GPT pretraining tutorials.\n\nFor all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:\n\n.. code:: shell\n\n   source ~/aws_neuron_venv_pytorch/bin/activate\n\nInstall needed dependencies in your environment by running:\n\n.. code:: bash\n\n    pip install pillow\n\nTorchvision package is needed for MNIST dataset and has already been installed as part of :ref:`Install PyTorch Neuron on Trn1 <pytorch-neuronx-install>`. Installing Torchvision together with torch-neuronx ensures that the compatible version of Torchvision is selected. For example, torchvision==0.12 is compatible with torch==1.11 and torchvision==0.13 is compatible with torch==1.12.\n    \nTo download the MNIST MLP examples, do:\n\n.. code:: bash\n\n   git clone https://github.com/aws-neuron/aws-neuron-samples.git\n   cd aws-neuron-samples/torch-neuronx/training/mnist_mlp\n\nMulti-layer perceptron MNIST model\n----------------------------------\n\nIn ``model.py``, we define the multi-layer perceptron (MLP) MNIST model with 3\nlinear layers and ReLU activations, followed by a log-softmax layer.\nThis model will be used in multiple example scripts.\n\nSingle-worker MLP training script in PyTorch on CPU\n---------------------------------------------------\n\nWe will show how to modify a training script that runs on other platform to run on Trainium.\n\nWe begin with a single-worker MLP training script for running on\nthe host CPUs of the Trainium instance. The training script imports the\nMLP model from ``model.py``.\n\nIn this training script, we load the MNIST train dataset and, within the\n``main()`` method, set the data loader to read batches of 32 training\nexamples and corresponding labels.\n\nNext we instantiate the MLP model and move it to the device. We use\n``device = 'cpu'`` to illustrate the use of device in PyTorch. 
On GPU\nyou would use ``device = 'cuda'`` instead.\n\nWe also instantiate the other two components of a neural network\ntrainer: stochastic-gradient-descent (SGD) optimizer and\nnegative-log-likelihood (NLL) loss function (also known as cross-entropy\nloss).\n\nAfter the optimizer and loss function, we create a training loop to iterate over the training samples and\nlabels, performing the following steps for each batch in each iteration:\n\n-  Zero gradients using:\n\n.. code:: python\n\n   optimizer.zero_grad()\n\n-  Move training samples and labels to device using the 'tensor.to'\n   method.\n-  Perform forward/prediction pass using\n\n.. code:: python\n\n   output = model(train_x)\n\n-  The prediction results are compared against the corresponding labels\n   using the loss function to compute the loss\n\n.. code:: python\n\n   loss_fn(output, train_label)\n\n-  The loss is propagated back through the model using chain-rule to\n   compute the weight gradients\n\n.. code:: python\n\n   loss.backward()\n\n-  The weights are updated with a change that is proportional to the\n   computed weights gradients\n\n.. code:: python\n\n   optimizer.step()\n\nAt the end of training we compute the throughput, display the final loss\nand save the checkpoint.\n\nExpected CPU output:\n\n.. code:: bash\n\n    ----------Training ---------------\n    Train throughput (iter/sec): 286.96994718801335\n    Final loss is 0.1040\n    ----------End Training ---------------\n\nRun the command below to execute this script:\n\n.. literalinclude:: tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh\n   :language: bash\n   :lines: 7\n\nFor a full tutorial on training in PyTorch, please see\n`Training with PyTorch <https://pytorch.org/tutorials/beginner/introyt/trainingyt.html>`__.\n\nThus far we have used PyTorch without Trainium. Next, we will show how\nto change this script to run on Trainium.\n\nSingle-worker MLP training on Trainium\n--------------------------------------\n\nTo run on Trainium, first we modify the CPU training script train_cpu.py to run with\nPyTorch Neuron torch_xla as described in :ref:`PyTorch Neuron for Trainium Getting Started Guide <pytorch-neuronx-programming-guide>`\nby changing the device:\n\n.. code:: python\n\n   import torch_xla.core.xla_model as xm\n   device = xm.xla_device()\n   # or\n   device = 'xla'\n\nWhen the model is moved to the XLA device using ``model.to(device)``\nmethod, subsequent operations on the model are recorded for later\nexecution. This is XLA's lazy execution which is different from\nPyTorch's eager execution. Within the training loop, we must mark the\ngraph to be optimized and run on XLA device (NeuronCore) using\nxm.mark_step() (unless MpDeviceLoader is used as you will see in the next section). \nWithout this mark, XLA cannot determine where the graph\nends. The collected computational graph also gets compiled and executed\nwhen you request the value of a tensor such as by calling\n``loss.item()`` or ``print(loss)``.\n\nTo save a checkpoint, it is recommended to use the ``xm.save()``\nfunction instead of ``torch.save()`` to ensure states are moved to CPU.\n``xm.save()`` also prevents the \"XRT memory handle not found\" warning at\nthe end of evaluation script (if the checkpoint saved using torch.save()\nis used for evaluation).\n\nThe resulting script ``train.py`` can be executed as \n``python3 train.py``. Again, note that we import the MLP model\nfrom ``model.py``. 
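The condensed sketch below illustrates how these pieces fit together. It is not the full ``train.py``; names such as ``train_loader``, the learning rate, and the checkpoint path are illustrative assumptions:\n\n.. code:: python\n\n   import torch\n   import torch_xla.core.xla_model as xm\n   from model import MLP\n\n   device = 'xla'  # XLA: target a NeuronCore instead of the host CPU\n   model = MLP().to(device)\n   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n   loss_fn = torch.nn.NLLLoss()\n\n   model.train()\n   for train_x, train_label in train_loader:  # train_loader: an assumed MNIST DataLoader\n       optimizer.zero_grad()\n       train_x, train_label = train_x.to(device), train_label.to(device)\n       loss = loss_fn(model(train_x), train_label)\n       loss.backward()\n       optimizer.step()\n       xm.mark_step()  # XLA: compile and execute the lazily recorded graph\n\n   xm.save(model.state_dict(), 'checkpoints/checkpoint.pt')  # XLA: moves states to CPU before saving\n\n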
When you examine the script, the comments that begin with\n'XLA' indicate the changes required to make the script compatible with\ntorch_xla.\n\nRun the command below to execute this script:\n\n.. literalinclude:: tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh\n   :language: bash\n   :lines: 10\n\nExpected output on trn1.32xlarge (start from a fresh compilation cache, located at /var/tmp/neuron-compile-cache by default):\n\n.. code:: bash\n\n    2022-04-12 16:15:00.000947: INFO ||NCC_WRAPPER||: No candidate found under /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_18200615679846498221.\n    2022-04-12 16:15:00.000949: INFO ||NCC_WRAPPER||: Cache dir for the neff: /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_18200615679846498221/MODULE_0_SyncTensorsGraph.318_18200615679846498221_ip-172-31-69-14.ec2.internal-8355221-28940-5dc775cd78aa2/83a0fd4a-b07e-4404-aa55-701ab3b2700c\n    ........\n    Compiler status PASS\n    2022-04-12 16:18:05.000843: INFO ||NCC_WRAPPER||: Exiting with a successfully compiled graph\n    2022-04-12 16:18:05.000957: INFO ||NCC_WRAPPER||: No candidate found under /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_5000680699473283909.\n    2022-04-12 16:18:05.000960: INFO ||NCC_WRAPPER||: Cache dir for the neff: /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_5000680699473283909/MODULE_1_SyncTensorsGraph.390_5000680699473283909_ip-172-31-69-14.ec2.internal-8355221-28940-5dc7767e5fc69/7d0a2955-11b4-42e6-b536-6f0f02cc68df\n    .\n    Compiler status PASS\n    2022-04-12 16:18:12.000912: INFO ||NCC_WRAPPER||: Exiting with a successfully compiled graph\n    ----------Training ---------------\n    Train throughput (iter/sec): 95.06756661972014\n    Final loss is 0.1979\n    ----------End Training ---------------\n\nIf you re-run the training script a second time, you will see messages\nindicating that the compiled graphs are cached in the persistent cache\nfrom the previous run and that the startup time is quicker:\n\n.. code:: bash\n\n    (aws_neuron_venv_pytorch_p36) [ec2-user@ip-172-31-69-14 mnist_mlp]$ python train.py |& tee log_trainium\n    2022-04-12 16:21:58.000241: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_18200615679846498221/MODULE_0_SyncTensorsGraph.318_18200615679846498221_ip-172-31-69-14.ec2.internal-8355221-28940-5dc775cd78aa2/83a0fd4a-b07e-4404-aa55-701ab3b2700c/MODULE_0_SyncTensorsGraph.318_18200615679846498221_ip-172-31-69-14.ec2.internal-8355221-28940-5dc775cd78aa2.neff. Exiting with a successfully compiled graph\n    2022-04-12 16:21:58.000342: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/USER_neuroncc-1.0.47218.0+162039557/MODULE_5000680699473283909/MODULE_1_SyncTensorsGraph.390_5000680699473283909_ip-172-31-69-14.ec2.internal-8355221-28940-5dc7767e5fc69/7d0a2955-11b4-42e6-b536-6f0f02cc68df/MODULE_1_SyncTensorsGraph.390_5000680699473283909_ip-172-31-69-14.ec2.internal-8355221-28940-5dc7767e5fc69.neff. Exiting with a successfully compiled graph\n    ----------Training ---------------\n    Train throughput (iter/sec): 93.16748895384832\n    Final loss is 0.1979\n    ----------End Training ---------------\n\nMultiple graphs can be created during execution since there are\ndifferences between some iterations (first, steady state, last). 
After\nthe first iteration, the graph for each iteration should remain the same\nfrom iteration to iteration. This allows the XLA runtime to execute a\npreviously compiled graph that has been cached in the XLA runtime cache.\n\nIf the inner training loop has some control flow, for example for\ngradient accumulation, the number of compiled graphs may increase due to the\ngeneration and consumption of intermediates as well as additional\noperations when the conditional path is taken.\n\nMulti-worker data-parallel MLP training using torchrun\n------------------------------------------------------\n\nData-parallel training allows you to replicate your script across\nmultiple workers, each worker processing a proportional portion of the\ndataset, in order to train faster.\n\nThe PyTorch distributed utility torchrun can be used to launch multiple\nprocesses on a server node for multi-worker data-parallel training.\n\nTo run multiple workers in a data-parallel configuration using torchrun,\nmodify the single-worker training script train.py as follows (below we use ``xm``\nas an alias for ``torch_xla.core.xla_model`` and ``xmp`` as an alias for\n``torch_xla.distributed.xla_multiprocessing``):\n\n1. Import the XLA backend for torch.distributed using ``import torch_xla.distributed.xla_backend``.\n2. Use ``torch.distributed.init_process_group('xla')``\n   to initialize the PyTorch XLA runtime and the Neuron\n   runtime.\n3. Use the XLA multiprocessing device loader (``MpDeviceLoader``) from\n   ``torch_xla.distributed`` to wrap the PyTorch data loader.\n4. Use ``xm.optimizer_step(optimizer)`` to perform the allreduce and take\n   the optimizer step.\n\nXLA MpDeviceLoader is optimized for XLA and is recommended for best\nperformance. It also takes care of marking the step for execution\n(compile and execute the lazily collected operations for an iteration),\nso no separate ``xm.mark_step()`` is needed.\n\nThe following are general best-practice changes needed to scale up the\ntraining:\n\n1. Set the random seed to be the same across workers.\n2. Scale up the learning rate by the number of workers. Use\n   ``xm.xrt_world_size()`` to get the global number of workers.\n3. Add a distributed sampler to allow different workers to sample different\n   portions of the dataset.\n\nAlso, the ``xm.save()`` function used to save the checkpoint automatically\nsaves only the rank-0 worker's parameters.\n\nThe resulting script is ``train_torchrun.py``\n(note again that we import the MLP model from ``model.py``):\n\nNext we use the ``torchrun`` utility that is included with the torch\ninstallation to run multiple processes, each using one Logical NeuronCore. Use\nthe option ``nproc_per_node`` to indicate the number of processes to launch.\nFor example, to run on two Logical NeuronCores on one Trn1/Trn2 instance only, do:\n\nRun the command below to execute this script:\n\n.. literalinclude:: tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh\n   :language: bash\n   :lines: 13\n\n.. note::\n\n    Currently we only support:\n    - 1 and 2 worker configurations on trn1.2xlarge (default Logical NeuronCore size of 1)\n    - 1, 2, 8, and 32-worker configurations on trn1.32xlarge (default Logical NeuronCore size of 1)\n    - 1, 4, 16 and 64-worker configurations on trn2.48xlarge (default Logical NeuronCore size of 2)\n\nExpected output on trn1.32xlarge (second run to avoid compilations):\n\n.. code:: bash\n\n    ----------Training ---------------\n    ----------Training ---------------\n    ... 
(Info messages truncated)\n    Train throughput (iter/sec): 163.25353269069706\n    Train throughput (iter/sec): 163.23261047441036\n    Final loss is 0.3469\n    Final loss is 0.1129\n    ----------End Training ---------------\n    ----------End Training ---------------\n\nIn another example, we run on two trn1.32xlarge instances launched with EFA-enabled interfaces, using an `EFA-enabled security group <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html#nccl-start-base-setup>`__, and set up using :ref:`Install PyTorch Neuron on Trn1 <pytorch-neuronx-install>`.\nNOTE: To run on multiple instances, you will need to use trn1.32xlarge instances and use all 32 NeuronCores on each instance.\n\nOn the rank-0 Trn1 host (root), run with ``--node_rank=0`` using the torchrun utility, and ``--master_addr`` set to the rank-0 host's IP address:\n\n.. code:: shell\n\n   export FI_EFA_USE_DEVICE_RDMA=1\n   export FI_PROVIDER=efa\n   torchrun --nproc_per_node=32 --nnodes=2 --node_rank=0 --master_addr=<root IP> --master_port=2020 train_torchrun.py\n\nOn another Trn1 host, run with ``--node_rank=1``, and ``--master_addr`` also set to the rank-0 host's IP address:\n\n.. code:: shell\n\n   export FI_EFA_USE_DEVICE_RDMA=1\n   export FI_PROVIDER=efa\n   torchrun --nproc_per_node=32 --nnodes=2 --node_rank=1 --master_addr=<root IP> --master_port=2020 train_torchrun.py\n\nIt is important to launch the rank-0 worker with ``--node_rank=0`` to avoid a hang.\n\nFor trn2.48xlarge, use ``--nproc_per_node=64`` for the default of 64 Logical NeuronCores (each Logical NeuronCore using two physical NeuronCores).\n\nTo train on multiple instances, it is recommended to use a ParallelCluster. For a ParallelCluster example, please see `Train a model on AWS Trn1 ParallelCluster <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`__.\n\nSingle-worker MLP evaluation on Trainium\n----------------------------------------\n\nAfter training, the final checkpoint is saved in the ``checkpoints`` directory. You can run the evaluation step by running the ``eval.py`` script in the same directory as the training script:\n\nRun the command below to execute this script:\n\n.. literalinclude:: tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh\n   :language: bash\n   :lines: 16-17\n\nThis evaluation phase can be merged with the training script to check accuracy, for example at the end of every epoch. It is kept separate for illustration purposes.\n\nThe evaluation script follows a similar flow to the training script, with the following differences:\n\n- The input data used is the validation subset of the MNIST dataset.\n- The dataset only needs to be looped through once (no epochs).\n- There is only a forward pass through the model, with no backward pass or optimizer update.\n- The accuracy is computed across the validation set instead of the loss per batch.\n\nExpected results (after a second execution to eliminate warmup compilation time during first execution):\n\n.. code:: bash\n\n   ----------Evaluating---------------\n   Test throughput (iter/sec): 47.897945949832845\n   Accuracy: 0.9273833632469177\n   ----------Done Evaluating---------------\n\nIf you get a lower accuracy than above, please check that the training is done with at least 4 epochs.\n\nYou can also use :ref:`torch_neuronx_trace_api` in the evaluation loop. 
This can be achieved by the following changes to the ``eval.py``:\n\n- Use ``device = 'cpu'`` instead of XLA device.\n- Don't use ``mark_step()``.\n- Trace the model at the first iteration to freeze it and precompile for inference:\n\n.. code:: python\n\n         if idx == 0:\n             import torch_neuronx\n             model = torch_neuronx.trace(model, test_x)\n\n\nHowever, note that the inference trace API fixed the input tensor shape, so that every  input tensor will need to match the size used during the tracing step. To ensure every batch from ``DataLoader`` has the same tensor shape, pass ``drop_last=True`` option when instantiating ``DataLoader``.\n\n.. code:: python\n\n        test_loader = DataLoader(test_dataset, batch_size=32, drop_last=True)\n\nThe script ``eval_using_trace.py`` can be compared against ``eval.py`` to show the above modifications. It can be executed using:\n\nRun the command below to execute this script:\n\n.. literalinclude:: tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh\n   :language: bash\n   :lines: 18\n\nExpected results (note the large increase in performance when using trace API for inference):\n\n.. code:: bash\n\n   ----------Evaluating---------------\n   Test throughput (iter/sec): 409.0836291417652\n   Accuracy: 0.9288585186004639\n   ----------Done Evaluating---------------\n\n\nKnown issues and limitations\n----------------------------\n\nMLP model is not optimized for performance. For the single-worker training, the performance can be improved by using MpDeviceLoader which exists in the multiprocessing example. For example, by setting ``--nproc_per_node=1`` in the torchrun example, you will see higher MLP performance.\n\n.. code:: bash\n\n    (aws_neuron_venv_pytorch_p36) [ec2-user@ip-172-31-69-14 mnist_mlp]$ torchrun --nproc_per_node=1 train_torchrun.py\n\n    ----------Training ---------------\n    ... (Info messages truncated)\n    Train throughput (iter/sec): 192.43508922834008\n    Final loss is 0.2720\n    ----------End Training ---------------\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/analyze_training/analyze_training_code.sh",
    "content": "# Create the files needed\ntee supported.py > /dev/null <<EOF\nimport torch\nimport torch_xla.core.xla_model as xm\n\nclass NN(torch.nn.Module):\n    def __init__(self):\n        super().__init__()\n\n        self.layer1 = torch.nn.Linear(4,4)\n        self.nl1 = torch.nn.ReLU()\n        self.layer2 = torch.nn.Linear(4,2)\n        self.nl2 = torch.nn.Tanh()\n\n    def forward(self, x):\n        x = self.nl1(self.layer1(x))\n        return self.nl2(self.layer2(x))\n\n\ndef main():\n    device = xm.xla_device()\n\n    model = NN().to(device)\n    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n    loss_fn = torch.nn.MSELoss()\n\n    inp = torch.rand(4)\n    target = torch.tensor([1,0])\n\n    model.train()\n    for epoch in range(2):\n        optimizer.zero_grad()\n        inp = inp.to(device)\n        target = target.to(device)\n        output = model(inp)\n        loss = loss_fn(output,target)\n        loss.backward()\n        optimizer.step()\n        xm.mark_step()\n\nif __name__ == '__main__':\n    main()\nEOF\n\ntee unsupported.py > /dev/null <<EOF\nimport torch\nimport torch_xla.core.xla_model as xm\n\nclass UnsupportedModel(torch.nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        y =  torch.fft.fft(x)\n        x = x + 10\n        return x * y\n\n\ndef main():\n    device = xm.xla_device()\n\n    model = UnsupportedModel().to(device)\n\n    inp = torch.rand(4)\n\n    model.train()\n    for epoch in range(1):\n        inp = inp.to(device)\n        output = model(inp)\n\n        xm.mark_step()\n\nif __name__ == '__main__':\n    main()\nEOF\n\n# Run analyze\nneuron_parallel_compile --command analyze python supported.py\n\nneuron_parallel_compile --command analyze python unsupported.py\n\nneuron_parallel_compile --command analyze --analyze-verbosity 1 python unsupported.py"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_converted_checkpoint_training.sh",
    "content": "#!/bin/bash\n\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\n\nset -eExuo\n\ncd ~/transformers/examples/pytorch/text-classification\naws s3 cp --no-progress s3://neuron-s3/training_checkpoints/pytorch/dp_bert_large_hf_pretrain/ckpt_29688.pt ./ --no-sign-request\n\n# Create convert file\ntee convert.py > /dev/null <<EOF\nimport os\nimport sys\nimport argparse\nimport torch\nimport transformers\nfrom transformers import (\n    BertForPreTraining,\n)\nimport torch_xla.core.xla_model as xm\nfrom transformers.utils import check_min_version\nfrom transformers.utils.versions import require_version\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model_name', type=str, default='bert-large-uncased',  help=\"Path to model identifier from huggingface.co/models\")\n    parser.add_argument('--output_saved_model_path', type=str, default='./hf_saved_model', help=\"Directory to save the HF pretrained model format.\")\n    parser.add_argument('--checkpoint_path', type=str, required=True, help=\"Path to pretrained checkpoint which needs to be converted to a HF pretrained model format\")\n    args = parser.parse_args(sys.argv[1:])\n\n    model = BertForPreTraining.from_pretrained(args.model_name)\n    check_point = torch.load(args.checkpoint_path, map_location='cpu')\n    model.load_state_dict(check_point['model'], strict=False)\n    model.save_pretrained(args.output_saved_model_path, save_config=True, save_function=xm.save)\n    print(\"Done converting checkpoint {} to HuggingFace saved model in directory {}.\".format(args.checkpoint_path, args.output_saved_model_path))\nEOF\n\npython convert.py --checkpoint_path ckpt_29688.pt\n\n# Create run script\ntee run_converted.sh > /dev/null <<EOF\n#!/usr/bin/env bash\nset -eExuo\nexport TASK_NAME=mrpc\nexport NEURON_CC_FLAGS=\"--model-type=transformer\"\nNEURON_RT_STOCHASTIC_ROUNDING_EN=1 torchrun --nproc_per_node=2 ./run_glue.py \\\\\n--model_name_or_path hf_saved_model \\\\\n--tokenizer_name bert-large-uncased \\\\\n--task_name \\$TASK_NAME \\\\\n--do_train \\\\\n--do_eval \\\\\n--bf16 \\\\\n--use_cpu True \\\\\n--max_seq_length 128 \\\\\n--per_device_train_batch_size 8 \\\\\n--eval_do_concat_batches False \\\\\n--learning_rate 2e-5 \\\\\n--num_train_epochs 5 \\\\\n--save_total_limit 1 \\\\\n--overwrite_output_dir \\\\\n--output_dir /tmp/\\$TASK_NAME/ |& tee log_run_converted\nEOF\n\nchmod +x run_converted.sh\n\n# Pre-compile\nneuron_parallel_compile ./run_converted.sh\n\n#Run Training\n./run_converted.sh\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_multi_worker_training_code.sh",
    "content": "#!/bin/bash\n\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\n\nset -eExuo\n\ncd ~/transformers/examples/pytorch/text-classification\n\n# Create the run_2w.sh file\ntee run_2w.sh > /dev/null <<EOF\n#!/usr/bin/env bash\nset -eExuo\nexport TASK_NAME=mrpc\nexport NEURON_CC_FLAGS=\"--model-type=transformer\"\nNEURON_RT_STOCHASTIC_ROUNDING_EN=1 torchrun --nproc_per_node=2 ./run_glue.py \\\\\n--model_name_or_path bert-large-uncased \\\\\n--task_name \\$TASK_NAME \\\\\n--do_train \\\\\n--do_eval \\\\\n--bf16 \\\\\n--use_cpu True \\\\\n--max_seq_length 128 \\\\\n--per_device_train_batch_size 8 \\\\\n--eval_do_concat_batches False \\\\\n--learning_rate 2e-5 \\\\\n--num_train_epochs 5 \\\\\n--save_total_limit 1 \\\\\n--overwrite_output_dir \\\\\n--output_dir /tmp/\\$TASK_NAME/ |& tee log_run_2w\nEOF\n\nchmod +x run_2w.sh\n\n# Pre-compile and train\nneuron_parallel_compile ./run_2w.sh\n\n./run_2w.sh\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_setup_code.sh",
    "content": "#!/bin/bash\n\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\n\nset -eExuo\n\n# Install packages and clone transformers\nexport HF_VER=4.53.2\nexport ACC_VER=1.9.0\nexport DATA_VER=4.0.0\nexport EVAL_VER=0.4.5\npip install -U transformers==$HF_VER accelerate==$ACC_VER datasets==$DATA_VER evaluate==$EVAL_VER scikit-learn\ncd ~/\ngit clone https://github.com/huggingface/transformers --branch v$HF_VER\ncd ~/transformers/examples/pytorch/text-classification\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_mrpc_finetuning/bert_mrpc_finetuning_single_worker_training.sh",
    "content": "#!/bin/bash\n\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\n\nset -eExuo\n\ncd ~/transformers/examples/pytorch/text-classification\n\n# Create the run.sh file\ntee run.sh > /dev/null <<EOF\n#!/usr/bin/env bash\nset -eExuo\nexport TASK_NAME=mrpc\nexport NEURON_CC_FLAGS=\"--model-type=transformer\"\nNEURON_RT_STOCHASTIC_ROUNDING_EN=1 torchrun --nproc_per_node=1 ./run_glue.py \\\\\n--model_name_or_path bert-large-uncased \\\\\n--task_name \\$TASK_NAME \\\\\n--do_train \\\\\n--do_eval \\\\\n--bf16 \\\\\n--use_cpu True \\\\\n--max_seq_length 128 \\\\\n--per_device_train_batch_size 8 \\\\\n--eval_do_concat_batches False \\\\\n--learning_rate 2e-5 \\\\\n--num_train_epochs 5 \\\\\n--save_total_limit 1 \\\\\n--overwrite_output_dir \\\\\n--output_dir /tmp/\\$TASK_NAME/ |& tee log_run\nEOF\n\nchmod +x run.sh\n\n# Pre-compile and train\nneuron_parallel_compile ./run.sh\n\n./run.sh\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_amp_training_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Run the training script\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\n\ntorchrun --nproc_per_node=32 dp_bert_large_hf_pretrain_hdf5.py \\\n--batch_size 16 \\\n--enable_pt_autocast \\\n--grad_accum_usteps 32 | tee run_pretrain_log.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_lamb_bf16_training_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Run the training script\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\ntorchrun --nproc_per_node=32 \\\ndp_bert_large_hf_pretrain_hdf5.py \\\n--max_steps 7032 \\\n--batch_size 16 \\\n--optimizer LAMB \\\n--lr 6e-3 \\\n--grad_accum_usteps 128 | tee run_pretrain_log.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_lamb_training_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Run the training script\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\ntorchrun --nproc_per_node=32 \\\ndp_bert_large_hf_pretrain_hdf5.py \\\n--max_steps 7032 \\\n--batch_size 8 \\\n--optimizer LAMB \\\n--lr 6e-3 \\\n--grad_accum_usteps 256 | tee run_pretrain_log.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_phase2_training_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\naws s3 cp --no-progress s3://neuron-s3/training_checkpoints/pytorch/dp_bert_large_hf_pretrain/ckpt_28125.pt ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain/output/ckpt_28125.pt --no-sign-request\n\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\ntorchrun --nproc_per_node=32 dp_bert_large_hf_pretrain_hdf5.py \\\n    --data_dir ~/examples_datasets/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512/ \\\n    --lr 2.8e-4 \\\n    --phase2 \\\n    --resume_ckpt \\\n    --phase1_end_step 28125 \\\n    --batch_size 2 \\\n    --grad_accum_usteps 512 \\\n    --seq_len 512 \\\n    --max_pred_len 80 \\\n    --warmup_steps 781 \\\n    --max_steps 1563 \\\n    | tee run_pretrain_log_phase2.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_precompilation_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Navigate to the script directory and run the pre-compile script\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\nneuron_parallel_compile torchrun --nproc_per_node=32 \\\ndp_bert_large_hf_pretrain_hdf5.py \\\n--steps_this_run 10 \\\n--batch_size 16 \\\n--grad_accum_usteps 32 | tee compile_log.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_setup_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Install the required Python packages\npython3 -m pip install -r ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain/requirements.txt\n\n# Create a directory for the datasets and download the datasets\nmkdir -p ~/examples_datasets/\npushd ~/examples_datasets/\naws s3 cp --no-progress s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar .  --no-sign-request\ntar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar\nrm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar\naws s3 cp --no-progress s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar .  --no-sign-request\ntar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar\nrm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar\npopd"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_setup_code_ph2.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Install the required Python packages\npython3 -m pip install -r ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain/requirements.txt\n\n# Create a directory for the datasets and download the datasets\nmkdir -p ~/examples_datasets/\npushd ~/examples_datasets/\naws s3 cp --no-progress s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar .  --no-sign-request\ntar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar\nrm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen512.tar\npopd\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/bert_training/bert_training_code.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\n# Run the training script\ncd ~/aws-neuron-samples/torch-neuronx/training/dp_bert_hf_pretrain\ntorchrun --nproc_per_node=32 \\\ndp_bert_large_hf_pretrain_hdf5.py \\\n--batch_size 16 \\\n--grad_accum_usteps 32 | tee run_pretrain_log.txt\ntorchrun_exit_status=${PIPESTATUS[0]}\necho \"Training return code: $torchrun_exit_status\"\nexit $torchrun_exit_status\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/multi_layer_perceptron_training/multi_layer_perceptron_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/aws-neuron-samples/torch-neuronx/training/mnist_mlp\n\n# Single worker CPU training\npython train_cpu.py\n\n# Single worker MLP training\npython train.py\n\n# Multi-worker data-parallel MLP training\ntorchrun --nproc_per_node=2 train_torchrun.py\n\n# Single-worker MLP evaluation\ncd ~/aws-neuron-samples/torch-neuronx/training/mnist_mlp\npython eval.py\npython eval_using_trace.py"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorial_source_code/zero1_training/zero1_single_node_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo\n\n# Install requirements\ncd ~/aws-neuron-samples/torch-neuronx/training/zero1_gpt2\npython3 -m pip install -r requirements.txt\n\n# Run precompile and training\nneuron_parallel_compile bash run_clm.sh MIXED wikitext-103-raw-v1\nbash run_clm.sh MIXED wikitext-103-raw-v1\n\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx.rst",
    "content": "\n.. meta::\n   :description: Tutorials for Training(torch-neuronx) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nTutorials for Training(torch-neuronx)\n=====================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /frameworks/torch/torch-neuronx/tutorials/training/bert\n    /frameworks/torch/torch-neuronx/tutorials/training/mlp\n    /frameworks/torch/torch-neuronx/tutorials/training/finetune_hftrainer\n   \n    /frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2\n    /frameworks/torch/torch-neuronx/tutorials/training/analyze_for_training\n    /neuron-customops/tutorials/customop-mlp-training\n    /neuron-customops/tutorials/customop-mlp-perf-opt\n\n\n\n* :ref:`hf-bert-pretraining-tutorial`\n* :ref:`neuronx-mlp-training-tutorial`\n* :ref:`torch-hf-bert-finetune`\n* :ref:`torch-hf-t5-finetune`\n* :ref:`zero1-gpt2-pretraining-tutorial`\n* :ref:`torch-analyze-for-training-tutorial`\n* :ref:`neuronx-customop-mlp-tutorial`\n* :ref:`neuronx-customop-mlp-perf`\n\n.. note::\n\n    To use Jupyter Notebook see:\n\n    * :ref:`setup-jupyter-notebook-steps-troubleshooting`\n    * :ref:`running-jupyter-notebook-as-script`\n"
  },
  {
    "path": "frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.rst",
    "content": ".. _zero1-gpt2-pretraining-tutorial:\n\n\n.. meta::\n   :description: ZeRO-1 Tutorial - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training, tutorials\n   :date-modified: 2026-03-13\n\n\nZeRO-1 Tutorial\n===============\n\n.. important::\n   Neuron will stop supporting XLA-based training support in a future release. For now, this tutorial is provided strictly for reference.\n\nWhat is ZeRO-1?\n---------------\n\nZeRO-1 (Zero Redundancy Optimizer Stage 1,\nhttps://arxiv.org/abs/1910.02054) is an optimization technique for\nlarge-scale deep learning models. It is a memory efficient variation of\ndata parallelism. ZeRO leverages the aggregate computation and memory\nresources of data parallelism to reduce the memory and compute\nrequirements of each accelerator used for model training. ZeRO reduces\nthe memory consumption of each accelerator by partitioning the various\nmodel training states (weights, gradients, and optimizer states) across\nthe available devices in the distributed training hardware. ZeRO is\nbeing implemented as incremental stages of optimizations. In stage 1,\nthe optimizer states (e.g., for Adam optimizer, 32-bit weights, and the\nfirst, and second moment estimates) are partitioned across the\nprocesses, so that each process updates only its partition.\n\n.. image:: zero1.jpg\n   :alt: Image: zero1.jpg\n\nWe implemented an XLA-friendly version of ZeRO-1 and it has\nbeen merged in open-source PyTorch/XLA project. Users can use it to\nenable ZeRO-1 algorithm by simply wrapping the origin optimizer as shown\nbelow.\n\n::\n\n   # Before:\n   optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)\n\n\n   # After\n   optimizer = ZeroRedundancyOptimizer(model.parameters(), torch.optim.Adam, lr=0.0001)\n\nThen just call ``optimizer.step()`` directly, the wrapped optimizer will\nhandle the distributed operations automatically.\n\nThe above code snippet illustrates the basic usage. Generally, users can\nuse ZeRO-1 optimizer like a normal optimizer. In addition,\n``ZeroRedundancyOptimizer`` also provides other features: enable\ngradient clipping or use other data type for wrapped optimizer. Note\nthat though the most of optimizers can be used with ZeRO-1, optimizers\nthat compute norm for parameters (e.g. LAMB) might lead to accuracy\ndisparities compared to using original local optimizer when using\nZeRO-1, because these optimizers cannot get full parameters but shards.\n\nUsage\n-----\n\nTo enable ZeRO-1 optimizer, just import it and replace origin optimizer\nwith ZeRO-1 wrapped version\n\n::\n\n   from torch_xla.distributed.zero_redundancy_optimizer import ZeroRedundancyOptimizer\n   ...\n   ...\n\n   device = xm.xla_device()\n   model = model.to(device)\n\n   optimizer = ZeroRedundancyOptimizer(model.parameters(), AdamW, lr=0.001)\n\nThen in training loop, just call ``optimizer.step()`` , note that we\nshould not use ``xm.reduce_gradients()`` or ``xm.optimizer_step()`` as\ngradient reduction will be handle by ZeRO-1.\n\n::\n\n       ...\n       loss.backward()\n       xm.mark_step()\n       optimizer.step()\n       xm.mark_step()\n\nZeRO-1 optimizer also provides some additional features, user can pass\nthese arguments to the wrapper constructor:\n\n-  Change ``optimizer_dtype`` to choose data dtype used by optimizer, default\n   is ``torch.float32``. 
\nGPT2-XL Pretraining Tutorial\n----------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetup\n~~~~~\n\nWe use a single trn1.32xlarge instance. Follow :ref:`Install PyTorch Neuron on\nTrn1 <setup-torch-neuronx>` to set up the environment first. For all the commands below, make sure\nyou are in the virtual environment that you created above before\nyou run them:\n\n**requirements.txt:** We pin the following Hugging Face library versions\nnecessary for the tutorial:\n\n::\n\n   transformers==4.27.3\n   accelerate==0.17\n   tensorboard==2.12.2\n\n::\n\n   source ~/aws_neuron_venv_pytorch/bin/activate\n\n::\n\n   git clone https://github.com/aws-neuron/aws-neuron-samples.git\n   cd aws-neuron-samples/torch-neuronx/training/zero1_gpt2\n   python3 -m pip install -r requirements.txt\n\nThe specific files you need for this tutorial:\n\n-  config_1p5B_gpt2.json: The model configuration used in the tutorial\n   for GPT 2.7B Neo\n-  neuron_utils.py: includes utility functions and the logging tools\n-  run_clm_no_trainer.py: the main training script that runs the actual\n   training\n-  run_clm.sh: the shell script to launch the training job\n\nDataset\n~~~~~~~\n\nFor the dataset, we use the wikitext dataset, specifically\n``wikitext-103-raw-v1``, provided by HuggingFace at\nhttps://huggingface.co/datasets/wikitext. The data will be preprocessed\nthe first time it runs through the training script, and the preprocessed\ndata will then be cached in the HuggingFace cache directory for any future\ntraining runs.\n\nIf the main process successfully downloads the dataset, tokenizes the data,\nand groups it together, the expected output at the\nbeginning of the training is shown below.\n\n::\n\n   ***** Running training *****\n     Num examples = 114248\n     Num Epochs = 29\n     Instantaneous batch size per device = 1\n     Total train batch size (w. parallel, distributed & accumulation) = 32\n     Gradient Accumulation steps = 1\n     Total optimization steps = 100000\n\nTraining\n~~~~~~~~\n\nThe GPT2 Python fine-tuning script is adapted from the example\n`run_clm_no_trainer.py <https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py>`__\nin the `Transformers language modeling examples <https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling>`__.\nIt incorporates the `Accelerate <https://github.com/huggingface/accelerate>`__ library.\n
Given the beta stage of Accelerate,\nsome modifications are needed, along with the bridge code to XLA.\nIn particular, some workarounds to support Accelerate in the training\nscript are listed in \"Known Issues, Workarounds and Limitations\" below.\n\nIn this example, we use GPT2-XL and show the training steps\nwith mixed precision (bfloat16 and float32).\n\n-  single-node training:\n\n.. literalinclude:: tutorial_source_code/zero1_training/zero1_single_node_training_code.sh\n   :language: shell\n   :lines: 8-10\n\n-  multi-node training, run:\n\n::\n\n   sbatch run_clm_compile.slurm\n\nthen\n\n::\n\n   sbatch run_clm.slurm\n\nKnown Issues, Workarounds and Limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n1. Error message: ``ValueError: invalid literal for int() with base 10: ''``.\n   Simply re-running the script can resolve this issue. The issue is already fixed\n   in newer versions of transformers; see https://github.com/huggingface/transformers/pull/22427.\n\n2. Accelerator API workarounds:\n\n   -  Error message: \"Gradient accumulation is not supported on TPU.\n      Please set gradient_accumulation_steps to 1 and don’t pass in a\n      GradientAccumulationPlugin object.\" More context here:\n      https://github.com/huggingface/accelerate/pull/479. The training\n      still works by commenting out the assertion and avoiding the\n      accumulation wrapper accelerator.accumulate(model).\n   -  Accelerator.prepare call: We have noticed that the optimizer\n      returned by this API is not directly reusable. This is due to gaps\n      in configuring the Accelerate API for XLA devices.\n"
  },
  {
    "path": "frameworks/torch/torch-setup.rst",
    "content": ".. _torch-setup:\n\n\n.. meta::\n   :description: Pytorch Neuron Setup - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, PyTorch, setup\n   :date-modified: 2026-03-13\n\n\nPytorch Neuron Setup\n====================\n\nInstall and configure PyTorch for use with AWS Neuron on Trainium and Inferentia instances.\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: PyTorch Neuron (``torch-neuronx``) Setup for Inf2, Trn1, and Trn2 Instances\n        :link: setup-torch-neuronx\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Install and configure PyTorch NeuronX for Inf2, Trn1, and Trn2 instances.\n\n    .. grid-item-card:: PyTorch Neuron (``torch-neuron``) Setup for Inf1 Instances (Archived)\n        :link: /archive/torch-neuron/setup/pytorch-install\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Legacy setup guide for PyTorch Neuron on Inf1 instances. This package has been archived.\n"
  },
  {
    "path": "frameworks/torch/training-torch-neuronx.rst",
    "content": ".. _training-torch-neuronx:\n\n\n.. meta::\n   :description: Training (``torch-neuronx``) - AWS Neuron SDK documentation\n   :keywords: AWS Neuron, Inferentia, PyTorch, Trainium, torch-neuronx, training\n   :date-modified: 2026-03-13\n\n\nTraining (``torch-neuronx``)\n============================\n\nTrain models using PyTorch NeuronX on Trainium instances (Trn1, Trn2).\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n     Tutorials </frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx>\n     Additional Examples </frameworks/torch/torch-neuronx/additional-examples-training>\n     API Reference Guide </frameworks/torch/torch-neuronx/api-reference-guide/training/index>\n     Developer Guide  </frameworks/torch/torch-neuronx/programming-guide/training/index>\n     Misc  </frameworks/torch/torch-neuronx/misc-training>\n\nGet Started\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Setup (``torch-neuronx``)\n        :link: setup-torch-neuronx\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Install and configure PyTorch NeuronX for training workloads.\n\nTutorials & Examples\n---------------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Tutorials\n        :link: /frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Step-by-step training tutorials for PyTorch NeuronX.\n\n    .. grid-item-card:: Additional Examples\n        :link: /frameworks/torch/torch-neuronx/additional-examples-training\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        More training examples and sample code.\n\nReference\n----------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: API Reference Guide\n        :link: /frameworks/torch/torch-neuronx/api-reference-guide/training/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Training API reference for PyTorch NeuronX.\n\n    .. grid-item-card:: Developer Guide\n        :link: /frameworks/torch/torch-neuronx/programming-guide/training/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        In-depth developer guide for training on Neuron.\n\n    .. grid-item-card:: Misc\n        :link: /frameworks/torch/torch-neuronx/misc-training\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Supported operators, multi-node setup, and troubleshooting.\n"
  },
  {
    "path": "general/faq.rst",
    "content": ".. _neuron_faq:\n\nNeuron FAQ\n==========\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nNeuron 2.x FAQ\n--------------\n\n* :ref:`neuron2-intro-faq`\n\nTraining Only FAQ\n-----------------\n\n* :ref:`neuron-training-faq`\n\n\nInference Only FAQ\n------------------\n\n* :ref:`neuron-f1-faq`\n* :ref:`trouble-shooting-inf1-faq`\n* :ref:`tf1_faq`\n* :ref:`tf2_faq`\n* :ref:`NeuronPerf <neuronperf_faq>`\n\nRuntime FAQ\n-----------\n\n* :ref:`Neuron Runtime FAQ <neuron-runtime-faq>`\n\nCompiler FAQ\n------------\n\n* :ref:`neuronx_compiler_faq`\n* :ref:`neuron_compiler_faq`\n\n\nNeuron Containers\n-----------------\n\n* :ref:`Neuron Containers FAQ <container-faq>`\n\n\nONNX FAQ\n--------\n\n* :ref:`onnx-faq`\n  \n\nSupport\n-------\n\n* :ref:`neuron_roadmap_faq`\n* :ref:`contribute-faq`\n"
  },
  {
    "path": "includes/setup/select-framework-note.txt",
    "content": ".. note::\n    For help selecting a framework type, see:\n\n    :ref:`torch-neuron_vs_torch-neuronx`"
  },
  {
    "path": "includes/setup/tab-inference-mxnet-neuron-al2.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install MXNet Neuron (``mxnet-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit MXNet Neuron(``mxnet-neuron``) for Inference section\n    :link: inference-mxnet-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-mxnet-neuron-al2023.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install MXNet Neuron (``mxnet-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit MXNet Neuron(``mxnet-neuron``) for Inference section\n    :link: inference-mxnet-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-mxnet-neuron-u20.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install MXNet Neuron (``mxnet-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit MXNet Neuron(``mxnet-neuron``) for Inference section\n    :link: inference-mxnet-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-mxnet-neuron-u22.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install MXNet Neuron (``mxnet-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit MXNet Neuron(``mxnet-neuron``) for Inference section\n    :link: inference-mxnet-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-mxnet-neuron.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install MXNet Neuron (``mxnet-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=compiler_framework\n\n.. dropdown::  Run Tutorial\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    :ref:`ResNet50 </src/examples/mxnet/resnet50/resnet50.ipynb>`\n\n.. card:: Visit MXNet Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: /archive/mxnet-neuron/index\n    :link-type: doc"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuron-al2.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuron``) for Inference section\n    :link: inference-tensorflow-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit TensorFlow Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: tensorflow-neuron-main\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuron-al2023.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuron``) for Inference section\n    :link: inference-tensorflow-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit TensorFlow Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: tensorflow-neuron-main\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuron-u20.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuron``) for Inference section\n    :link: inference-tensorflow-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit TensorFlow Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: tensorflow-neuron-main\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuron-u22.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 113\n        :end-line: 114\n\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuron``) for Inference section\n    :link: inference-tensorflow-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit TensorFlow Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: tensorflow-neuron-main\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuronx-al2.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 44\n        :end-line: 45\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 32\n        :end-line: 33\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuronx``) for Inference section\n    :link: inference-tensorflow-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuronx-al2023.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 245\n        :end-line: 246\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 173\n        :end-line: 174\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuronx``) for Inference section\n    :link: inference-tensorflow-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuronx-u20.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 47\n        :end-line: 48\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 35\n        :end-line: 36\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuronx``) for Inference section\n    :link: inference-tensorflow-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-tensorflow-neuronx-u22.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 248\n        :end-line: 249\n\n.. dropdown::  Install TensorFlow Neuron (``tensorflow-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 107\n        :end-line: 108\n\n.. card:: Visit TensorFlow Neuron(``tensorflow-neuronx``) for Inference section\n    :link: inference-tensorflow-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuron-al2.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit PyTorch Neuron(``torch-neuron``) for Inference section\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuron-al2023.txt",
    "content": ".. _neuron_installation_al2023:\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit PyTorch Neuron(``torch-neuron``) for Inference section\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuron-u20.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami\n\n.. card:: Visit PyTorch Neuron(``torch-neuron``) for Inference section\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuron-u22.txt",
    "content": ".. _neuron_installation_u22:\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 110\n        :end-line: 111\n\n.. card:: Visit PyTorch Neuron(``torch-neuron``) for Inference section\n    :link: inference-torch-neuron\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: neuron-pytorch\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuron.txt",
    "content": ".. _neuron_installation:\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=compiler_framework\n\n.. dropdown::  Run Tutorial\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Torchvision ResNet50 tutorial :ref:`[html] </src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb>`\n\n.. card:: Visit PyTorch Neuron section for more\n    :class-body: sphinx-design-class-body-small\n    :link: pytorch-neuronx-main\n    :link-type: ref"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuronx-al2.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 44\n        :end-line: 45\n\n.. dropdown::  Install PyTorch Neuron (``torch-neuronx``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n        .. tab-item:: PyTorch 1.13.1\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 8\n                :end-line: 9\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuronx-al2023.txt",
    "content": ".. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n\\Trn2``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 245\n        :end-line: 246\n\n.. dropdown::  Install PyTorch Neuron (torch-neuronx) \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n        .. tab-item :: PyTorch 2.8.0\n\n            .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 278\n                :end-line: 279\n\n        .. tab-item :: PyTorch 2.7.0\n\n            .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 266\n                :end-line: 267\n        \n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuronx-u20.txt",
    "content": ".. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 47\n        :end-line: 48\n\n.. dropdown::  Install PyTorch Neuron (torch-neuronx) \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n        .. tab-item:: PyTorch 2.1.2\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 230\n                :end-line: 231\n\n        .. tab-item:: PyTorch 1.13.1\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 11\n                :end-line: 12\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuronx-u22.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n\\Trn2\\Trn3``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 248\n        :end-line: 249\n\n.. dropdown::  Install PyTorch Neuron (torch-neuronx) \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n        .. tab-item:: PyTorch 2.9.0\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 287\n                :end-line: 288\n\n        .. tab-item:: PyTorch 2.8.0\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 281\n                :end-line: 282\n\n        .. tab-item:: PyTorch 2.7.0\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 269\n                :end-line: 270\n\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "includes/setup/tab-inference-torch-neuronx-u24.txt",
    "content": ".. _neuronx_installation:\n\n.. dropdown::  Install EFA (Applicable only for ``Trn1\\Trn1n\\Trn2\\Trn3``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 290\n        :end-line: 291\n\n.. dropdown::  Install PyTorch Neuron (torch-neuronx) \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n        .. tab-item:: PyTorch 2.9.0\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 296\n                :end-line: 297\n\n        .. tab-item:: PyTorch 2.8.0\n\n            .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n                :start-line: 305\n                :end-line: 306\n\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section\n    :link: inference-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n\n.. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section\n    :link: training-torch-neuronx\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\n"
  },
  {
    "path": "index.rst",
    "content": ".. meta::\n   :description: AWS Neuron SDK enables high-performance deep learning and generative AI on AWS Inferentia and Trainium instances. Get started with PyTorch, JAX, and distributed training.\n   :date-modified: 2026-04-09\n\n.. _neuron_home:\n\nAWS Neuron Documentation\n=========================\n\n:ref:`AWS Neuron <what-is-neuron>` is a software stack that enables high-performance deep learning and generative AI workloads on `AWS Inferentia <https://aws.amazon.com/ai/machine-learning/inferentia/>`_ and `AWS Trainium <https://aws.amazon.com/ai/machine-learning/trainium/>`_ instances. Neuron provides a complete machine learning development experience with compiler optimization, runtime efficiency, and comprehensive tooling.\n\n* **For more details, see** :doc:`What is AWS Neuron? </about-neuron/what-is-neuron>` and :doc:`What's New in AWS Neuron? </about-neuron/whats-new>`\n\n* **For the latest release notes, see** :doc:`AWS Neuron Release Notes </release-notes/index>`. The current release is :doc:`version 2.29.0 </release-notes/2.29.0>`, released on April 09, 2026.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: \n      :class-card: sd-border-2\n\n      **Looking to dive into Neuron development? Follow these links:**\n      ^^^\n      * :doc:`Learn about Neuron's support for native PyTorch </frameworks/torch/pytorch-native-overview>`\n      * :doc:`Get started with vLLM </libraries/nxd-inference/vllm/index>` for :doc:`Offline </libraries/nxd-inference/vllm/quickstart-vllm-offline-serving>` or :doc:`Online </libraries/nxd-inference/vllm/quickstart-vllm-online-serving>` inference model serving\n      * :doc:`Implement and run your first NKI kernel </nki/get-started/quickstart-implement-run-kernel>`\n      * :doc:`Optimize model performance with Neuron Explorer </tools/neuron-explorer/index>`\n      * :doc:`Launch a Inf/Trn instance on Amazon EC2 </devflows/ec2-flows>`\n      * :doc:`Deploy a DLC </containers/get-started/quickstart-configure-deploy-dlc>`\n\n----\n\nLearn more about AWS Neuron\n----------------------------\n\n**Select a card below to read more about these features**:\n\n.. grid:: 1 2 2 2\n   :gutter: 3\n\n   .. grid-item-card:: \n      :link: /frameworks/torch/pytorch-native-overview\n      :link-type: doc\n      :class-card: sd-border-2\n \n      **Native PyTorch**\n      ^^^\n      Learn about native PyTorch support in AWS Neuron.\n\n   .. grid-item-card:: \n      :link: /libraries/nxd-inference/vllm/index\n      :link-type: doc\n      :class-card: sd-border-2\n\n      **vLLM on Neuron**\n      ^^^\n      High-performance inference serving for large language models with OpenAI-compatible APIs on Trainium and Inferentia.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: \n      :class-card: sd-border-2\n\n      **Developer Tools**\n      ^^^\n      Profile and monitor your models as you develop, build, test, and deploy them with Neuron's developer tools.\n\n      * :doc:`Neuron Explorer </tools/neuron-explorer/index>`\n      * :doc:`Neuron Profiler </tools/profiler/neuron-profile-user-guide>`\n      * :doc:`Neuron Profiler 2.0 </tools/profiler/neuron-profiler-2-0-beta-user-guide>`\n      * :doc:`Neuron System tools </tools/neuron-sys-tools/index>`\n\n.. grid:: 1\n   :gutter: 3\n\n   .. 
grid-item-card:: \n      :class-card: sd-border-2\n\n      **Neuron Kernel Interface**\n      ^^^\n      Low-level programming interface for custom kernel development on Trainium and Inferentia with direct hardware access.\n\n      * :doc:`Set up your developer environment </nki/get-started/setup-env>`\n      * :doc:`NKI Library  </nki/library/index>`\n      * :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`\n      * :doc:`NKI Tutorials </nki/guides/tutorials/index>`\n      * :doc:`NKI API Reference </nki/api/index>`\n      * :doc:`NKI Compiler </nki/deep-dives/nki-compiler>`\n\n**Other Neuron features:**\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: **Orchestration and Deployment on AWS EC2 and EKS**\n      :link: /devflows/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Configure and run AWS Deep Learning Images (DLAMIs) and Containers (DLCs) to test and deploy your models with AWS EC2 and EKS.\n\n   .. grid-item-card::  **AWS Neuron Open Source**\n      :link: /about-neuron/oss/index\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Interested in contributing to Neuron source code and samples? Review this documentation and learn about our public GitHub repos and how to contribute to the code and samples in them.  \n\n   .. grid-item-card:: **AWS Neuron-supported ML frameworks**\n      :class-card: sd-border-1\n\n      * :doc:`PyTorch NeuronX (torch-neuronx) <frameworks/torch/index>`\n      * :doc:`JAX NeuronX <frameworks/jax/index>`\n\n   .. grid-item-card:: **NeuronX Distributed (NxD) libraries**\n      :class-card: sd-border-1\n\n      * :doc:`NxD Libraries Overview <libraries/index>`\n      * :doc:`NxD Training <libraries/nxd-training/index>`\n      * :doc:`NxD Inference <libraries/nxd-inference/index>`\n      * :doc:`NxD Core <libraries/index>`\n\n   .. grid-item-card:: **Workloads**\n      :class-card: sd-border-1\n\n      * :doc:`Workload orchestration </devflows/index>`\n      * :doc:`AWS Neuron Deep Learning Machine Images (DLAMIs) <dlami/index>`\n      * :doc:`AWS Neuron Deep Learning Containers (DLCs) <containers/index>`\n\n   .. grid-item-card:: **Runtime & Collectives**\n      :class-card: sd-border-1\n\n      * :doc:`Neuron Runtime <neuron-runtime/index>`\n      * :doc:`Neuron Collectives <neuron-runtime/about/collectives>`\n      * :doc:`Neuron C++ Custom Operators <neuron-customops/index>`\n\n   .. grid-item-card:: **Compilers**\n      :class-card: sd-border-1\n\n      * :doc:`Neuron Graph Compiler <compiler/index>`\n      * :doc:`Neuron Compiler Error Codes <compiler/error-codes/index>`\n\n   .. grid-item-card:: **Legacy Documentation and Samples**\n      :class-card: sd-border-1\n\n      * :doc:`Apache MXNet </archive/mxnet-neuron/index>`\n      * :doc:`TensorFlow Neuron </archive/tensorflow/index>`\n      * :doc:`torch-neuron (Inf1) </archive/torch-neuron/index>`\n      * :doc:`All archived content </archive/index>`\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n   \n   About Neuron </about-neuron/index>\n   Neuron Architecture </about-neuron/arch/index>\n   What's New </about-neuron/whats-new>\n   Announcements </about-neuron/announcements/index>\n   News & Blogs </about-neuron/news-and-blogs/index>\n   Contribute </about-neuron/oss/index>\n\n.. toctree::\n    :maxdepth: 1\n    :caption: Get Started\n    :hidden:\n\n    Quickstarts </about-neuron/quick-start/index>\n    Setup Guides </setup/index>\n    Developer Flows </devflows/index>\n\n.. 
toctree::\n   :maxdepth: 1\n   :caption: ML Frameworks\n   :hidden:\n\n   Home </frameworks/index>\n   PyTorch </frameworks/torch/index>\n   JAX </frameworks/jax/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Training\n   :hidden:\n\n   NxD Training </libraries/nxd-training/index>\n   NxD Core (Training) </libraries/neuronx-distributed/index-training>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Inference\n   :hidden:\n\n   Overview </libraries/nxd-inference/neuron-inference-overview>\n   vLLM </libraries/nxd-inference/vllm/index>\n   NxD Inference </libraries/nxd-inference/index>\n   NxD Core (Inference) </libraries/neuronx-distributed/index-inference>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Developer Tools\n   :hidden:\n\n   Home </tools/index>\n   Neuron Explorer </tools/neuron-explorer/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Orchestrate and Deploy\n   :hidden:\n\n   AWS Workload Orchestration </devflows/index>\n   Neuron DLAMI </dlami/index>\n   Neuron Containers </containers/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Runtime & Collectives\n   :hidden:\n\n   Neuron Runtime </neuron-runtime/index>\n   Collectives </neuron-runtime/about/collectives>\n   Neuron C++ Custom Operators </neuron-customops/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Compilers\n   :hidden:\n\n   Graph Compiler </compiler/index>\n   Compiler Error Codes </compiler/error-codes/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Neuron Kernel Interface (NKI)\n   :hidden:\n\n   Home </nki/index>\n   Get Started </nki/get-started/index>\n   Guides </nki/guides/index>\n   Deep Dives </nki/deep-dives/index>\n   Migration Guides </nki/migration/index>\n   NKI API Reference </nki/api/index>\n   NKI Library </nki/library/index>\n\n.. toctree::\n   :maxdepth: 1\n   :caption: Archive\n   :hidden:\n\n   Archived content </archive/index>\n   \n*AWS and the AWS logo are trademarks of Amazon Web Services, Inc. or its affiliates. All rights reserved.*\n"
  },
  {
    "path": "info/exclude",
    "content": "# The following do not need to be shared outside of staging:\n/.github/*\n/_ext \n/static"
  },
  {
    "path": "libraries/index.rst",
    "content": ".. meta::\n   :description: AWS NeuronX distributed libraries - High-performance distributed training and inference libraries for AWS Trainium and Inferentia, including NxD Core, NxD Inference, NxD Training, and third-party integrations.\n\n.. _libraries-neuron-sdk:\n\nWork with training and inference libraries\n===========================================\n\nAccelerate your machine learning workloads with Neuron's distributed libraries. Our libraries provide high-level abstractions and optimized implementations for distributed training and inference on AWS Trainium and Inferentia.\n\nWhat are NeuronX Distributed libraries?\n----------------------------------------\n\nNeuronX Distributed (NxD) libraries are a comprehensive suite of PyTorch-based libraries designed to enable scalable machine learning on AWS Neuron hardware. The NxD ecosystem provides a layered architecture where foundational distributed primitives support higher-level training and inference workflows.\n\n**The NxD Stack:**\n\n* **NxD Core**: The foundational layer providing distributed primitives, model sharding techniques, and XLA-optimized implementations\n* **NxD Training**: High-level training library built on NxD Core, offering turnkey distributed training workflows with NeMo compatibility\n* **NxD Inference**: Production-ready inference library with advanced features like continuous batching, speculative decoding, and vLLM integration\n\nTogether, these libraries enable developers to scale from prototype to production while leveraging the full performance potential of AWS Trainium and Inferentia instances.\n\nAbout NxD Core Libraries\n------------------------\n        \nNxD Core libraries provide distributed training and inference mechanisms for Neuron devices with XLA-friendly implementations. This includes:\n        \n* :doc:`Tensor Parallel (TP) sharding </libraries/neuronx-distributed/ptl_developer_guide>` (:doc:`Overview </libraries/neuronx-distributed/tensor_parallelism_overview>`)\n* :doc:`Pipeline Parallel (PP) support </libraries/neuronx-distributed/pp_developer_guide>` (:doc:`Overview </libraries/neuronx-distributed/pipeline_parallelism_overview>`)\n* :doc:`Model activation memory reduction support </libraries/neuronx-distributed/activation_memory_reduction_developer_guide>` (:doc:`Overview </libraries/neuronx-distributed/activation_memory_reduction>`)\n* Model partitioning across devices\n* XLA-optimized distributed operations\n* Foundation for other NxD libraries\n\nThe NxD Training and Inference sections below document the NxD Core libraries in the context of training and inference models, respectively.\n\nNxD Training and Inference Libraries \n-------------------------------------\n\n.. grid:: 1\n  :gutter: 3\n  :class-container: library-grid\n\n  .. grid-item-card:: NxD Inference\n      :link: /libraries/nxd-inference/index\n      :link-type: doc\n      :class-header: bg-success text-white\n      :class-body: library-card-body\n        \n      PyTorch-based inference library for deploying large models on Inferentia and Trainium.\n        \n       * Large Language Model (LLM) inference\n       * Disaggregated inference architecture\n       * vLLM integration and compatibility\n       * Model sharding and parallelism\n       * Performance optimization tools\n\n  .. 
grid-item-card:: NxD Training\n      :link: nxdt\n      :link-type: ref\n      :class-header: bg-info text-white\n      :class-body: library-card-body\n\n      PyTorch library for end-to-end distributed training with Neuron.\n        \n       * Large-scale model training\n       * NeMo YAML configuration support\n       * HuggingFace and Megatron-LM models   \n       * Experiment management\n       * Advanced parallelism strategies\n\nOther Libraries\n----------------\n\n.. grid:: 1 1 2 2\n  :gutter: 3\n  :class-container: library-grid\n\n  .. grid-item-card:: Hugging Face Transformers (legacy)\n      :link: /libraries/transformers-neuronx/index\n      :link-type: doc\n      :class-header: bg-success text-white\n      :class-body: library-card-body\n          \n  .. grid-item-card:: NeMo Megatron\n      :link: /libraries/nemo-megatron/index\n      :link-type: doc\n      :class-header: bg-success text-white\n      :class-body: library-card-body\n\nHardware Compatibility\n----------------------\n\n.. list-table::\n   :header-rows: 1\n   :class: compatibility-matrix\n\n   * - Library\n     - Inf1\n     - Inf2\n     - Trn1/Trn1n\n     - Trn2\n     - Inference\n     - Training\n   * - **NxD Core**\n     - N/A\n     - ✅\n     - ✅\n     - ✅\n     - ✅\n     - ✅\n   * - **NxD Inference**\n     - N/A\n     - ✅\n     - ✅\n     - ✅\n     - ✅\n     - N/A\n   * - **NxD Training**\n     - N/A\n     - N/A\n     - ✅\n     - ✅\n     - N/A\n     - ✅\n\n.. _third-party-libraries:\n\nThird-party libraries\n-----------------------\n\nAWS Neuron integrates with multiple third-party partner products that alow you to run deep learning workloads on Amazon EC2 \ninstances powered by AWS Trainium and AWS Inferentia chips. The following list gives an overview of the third-party libraries \nworking with AWS Neuron.\n\nHugging Face Optimum Neuron\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOptimum Neuron bridges Hugging Face Transformers and the AWS Neuron SDK, providing standard Hugging Face APIs for \n`AWS Trainium <https://aws.amazon.com/ai/machine-learning/trainium/>`_ and `AWS Inferentia <https://aws.amazon.com/ai/machine-learning/inferentia/>`_. \nIt offers solutions for both training and inference, including support for large-scale model training and deployment for AI workflows. \nSupporting Amazon SageMaker and pre-built Deep Learning Containers, Optimum Neuron simplifies the use of Trainium and Inferentia \nfor machine learning. This integration allows developers to work with familiar Hugging Face interfaces while leveraging Trainium \nand Inferentia for their transformer-based projects.\n\n`Optimum Neuron documentation <https://huggingface.co/docs/optimum-neuron/en/index>`_\n\nPyTorch Lightning\n^^^^^^^^^^^^^^^^^^^\n\nPyTorch Lightning is a deep learning framework for professional AI researchers and machine learning engineers who need maximal \nflexibility without sacrificing performance at scale. Lightning organizes PyTorch code to remove boilerplate and unlock scalability.\n\n`Get Started with Lightning  <https://lightning.ai/lightning-ai/studios/finetune-llama-90-cheaper-on-aws-trainium~01hh3kj60fs8b8x91rv9n9fn2j?section=featured>`_\n\nUse PyTorch Lightning Trainer with :ref:`NeuronX Distributed <pytorch-lightning>`. \n\n\nAXLearn\n^^^^^^^^^\n\nAXLearn is an open-source JAX-based library used by AWS Neuron for training deep learning models on AWS Trainium. 
It integrates with the JAX ecosystem and supports distributed training.\n\nCheck out the `AXLearn GitHub repository <https://github.com/apple/axlearn>`_.\n\n\nAdditional libraries\n---------------------\n\nNeMo\n^^^^^\n\n:ref:`NxD Training <nxd-training-overview>` offers a `NeMo <https://github.com/NVIDIA/NeMo>`_-compatible YAML interface for training \nPyTorch models on AWS Trainium chips. The library supports both Megatron-LM and HuggingFace model classes through its model hub. \nNxD Training leverages key NeMo components, including Experiment Manager for tracking ML experiments and data loaders for efficient \ndata processing. This library simplifies the process of training deep learning models on AWS Trainium while providing compatibility \nwith the familiar NeMo YAML interface.\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   HF Transformers </libraries/transformers-neuronx/index>\n   NeMo Megatron </libraries/nemo-megatron/index>\n   NxD Core Release Notes </release-notes/components/nxd-core>\n"
  },
  {
    "path": "libraries/nemo-megatron/index.rst",
    "content": ".. _nemo-megatron-index:\n\nAWS Neuron Reference for NeMo Megatron\n======================================\n\nAWS Neuron Reference for NeMo Megatron is a library that includes modified versions of the open-source packages `NeMo <https://github.com/NVIDIA/NeMo>`_ and `Apex <https://github.com/NVIDIA/apex>`_ that have been adapted for use with AWS Neuron and AWS EC2 Trn1 instances.\nThe library supports Tensor Parallel, Pipeline parallel and Data Parallel configurations for distributed training of large language models like GPT-3 175B. The APIs have been optimized for XLA based computation and high performance communication over Trainium instances.\nThe library uses various techniques to improve memory utilization such as sequence parallelism which reduces activation memory footprint, selective or full activation checkpointing which allows larger model configurations to fit. SPMD optimizations are also used whenever possible to reduce the number of graphs obtained.\n\n\n\n.. dropdown::  Setup  (``neuronx-nemo-megatron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    The library can be installed from `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n\n\n.. dropdown::  Tutorials  (``neuronx-nemo-megatron``)\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n   \n    * `Launch a GPT-3 pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`_\n    * `Launch a Llama 2 pretraining job using neuronx-nemo-megatron <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_\n\nImportant Tips for Training with Neuron NeMo Megatron\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDo Not Create the Attention Mask\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nIf you are using your own data pipeline, do not create an attention mask for each record. Neuron NeMo Megatron is optimized to create an attention mask on Neuron Cores directly before use. Creating an attention mask per sample consumes excess CPU memory and often causes out of memory errors on CPU."
  },
  {
    "path": "libraries/neuronx-distributed/activation_memory_reduction.rst",
    "content": ".. _activation_memory_reduction:\n\nActivation Memory Reduction\n============================\n\nThere are three major contributors to high device memory utilization: \n`Parameters`, `Optimizer states` and `Activation Memory`.\nTo reduce the size of parameter/optimizer states memory, one can use parallelism \ntechniques like `Tensor-parallelism`, `Pipeline-parallelism` or `Zero1`.\nHowever, as the hidden size and sequence length increase, the size of the \nactivation memory keeps growing linearly with hidden size and quadratically with\nsequence length. \n\nThe total activation memory without any parallelism comes to about:\n\n.. math::\n\n   \\text{Activations memory per layer} = \\text{sbh} \\left(34 + \\frac{5as}{h}\\right)\n\nwhere,\n\n* `a`: Number of attention heads\n* `b`: microbatch size\n* `h`: hidden dimension size\n* `s`: sequence length\n\n\nWhen we use tensor-parallelism, it not only helps to reduce the parameter and optimizer states\nsize on each device, but it also helps to reduce the activation memory. For a transformer model,\nwhere we apply the tensor-parallel sharding on the attention block (more info `here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html#tensor-parallelism-overview>`__), \nthe activation memory within the attention block also drops by a factor of the tensor-parallel degree (`t`). However, the layernorms and dropouts \n(which are outside these attention blocks) are not parallelized and are replicated on each device. These \nlayernorms and dropouts are computationally inexpensive; however, they increase the overall activation memory \non each device. Moreover, since we only parallelize within the attention block or within the MLP block (h -> 4h projection),\nthe inputs to the QKV multiplies and the MLP are still unsharded. This overall adds up to about `10sbh` of total activation \nmemory. To reduce this activation memory, one can use two methods:\n\n* `Sequence-Parallelism <https://arxiv.org/abs/2105.13120>`__ \n* `Activation Recomputation <https://arxiv.org/abs/1604.06174>`__\n\n\nSequence Parallelism\n====================\n\nSequence-Parallelism was proposed by `Shenggui Li et al. <https://arxiv.org/pdf/2105.13120.pdf>`__. The authors propose to \nparallelize the compute along the sequence dimension in an attempt to reduce the memory pressure caused by long \nsequence lengths. Sequence-parallelism can be combined with tensor-parallelism to reduce the activation memory pressure \ndue to increasing sequence lengths.\n\nTensor-parallelism parallelizes the parts of the transformer which are computationally heavy; however, it leaves the \nlayer-norms, dropouts and parts of the MLP block intact. The activation memory for this block adds up to a factor of `10sbh`.\n`Vijay Korthikanti et al. <https://browse.arxiv.org/pdf/2205.05198.pdf>`__ noticed that the compute in the non-tensor-parallel \nregion is independent in the sequence dimension. This property can be leveraged to shard the compute along the \nsequence dimension. The main advantage of sharding these non-tensor-parallel blocks is reducing the activation memory.\nWe can use the same tensor-parallel degree to partition, thereby reducing the overall activation memory by a factor of `t`.\nHowever, this partitioning comes at a cost. Since we are partitioning the non-tensor-parallel region along the sequence dimension,\nwe have to collect the activations before we feed them to the tensor-parallel block. 
This requires the introduction of an \n`all-gather <https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#allgather>`__ collective \noperation, which gathers the activations along the sequence dimension. Similarly, after the tensor-parallel block, \nwe have to split the activations along the sequence dimension and distribute them among the devices. Since the tensor-parallel \nblock in the transformer module already uses an all-reduce (Row-parallel linear layer used for MLP), we can replace the \nall-reduce operation with a `reduce-scatter <https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#reducescatter>`__ operation.\n\n.. image:: /libraries/neuronx-distributed/images/sequence_parallel.png\n   :alt: Image: image.png\n\nRef: `Reducing Activation Recomputation in Large Transformer Models <https://browse.arxiv.org/pdf/2205.05198.pdf>`__\n\nIn the figure, `g` is the all-gather operation and `g¯` is the reduce-scatter operation. `g` and `g¯` are conjugates: in the \nbackward pass, `g¯` becomes an all-gather operation and `g` becomes the reduce-scatter operation. At first glance, it appears \nthat sequence-parallelism, when combined with tensor-parallelism, introduces an extra communication operation; however, in a ring \nall-reduce, the op is already broken down into an all-gather and a reduce-scatter. Hence, the bandwidth required for sequence-parallelism \ncombined with tensor-parallelism is the same as for tensor-parallelism alone: we do not lose compute, but we save activation memory per device.\nThe final activation memory per layer when sequence-parallelism is combined with tensor-parallelism is:\n\n.. math::\n\n   \\text{Activations memory per layer} = \\text{sbh} \\left(\\frac{10}{t} + \\frac{24}{t} + \\frac{5as}{ht}\\right) = \\frac{\\text{sbh}}{t} \\left(34 + \\frac{5as}{h}\\right)\n\n
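For intuition, here is a small, illustrative calculation of the per-layer activation memory implied by the two formulas above (the referenced paper derives these values in bytes assuming 16-bit activations; the model dimensions below are example values, not recommendations):\n\n.. code:: ipython3\n\n   # Back-of-the-envelope estimate based on the formulas above.\n   # a: attention heads, b: microbatch size, h: hidden size, s: sequence length, t: tensor-parallel degree\n   a, b, h, s, t = 32, 1, 4096, 4096, 8\n\n   no_parallelism = s * b * h * (34 + 5 * a * s / h)   # sbh(34 + 5as/h)\n   with_tp_and_sp = no_parallelism / t                 # (sbh/t)(34 + 5as/h)\n\n   GiB = 1024 ** 3\n   print(f\"per layer: {no_parallelism / GiB:.2f} GiB without parallelism, \"\n         f\"{with_tp_and_sp / GiB:.2f} GiB with tensor- and sequence-parallelism\")\n\n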
Activation Recomputation\n========================\n\nThe total required memory in the above equation can still be high as we increase the sequence length and hidden size. \nWe would have to keep increasing the tensor-parallel degree to accommodate this requirement. Increasing the tensor-parallel \ndegree might soon start producing diminishing returns, as the model would become bandwidth-bottlenecked because of the \nextra collective communication operations. `Activation recomputation <https://arxiv.org/abs/1604.06174>`__ can help to alleviate \nthis problem. In this method, we recompute a part of the forward pass during the backward pass, thereby avoiding the need to \nsave the activations during the forward pass. Activation recomputation is a trade-off between duplicate computation and memory.\nIt allows you to save on memory at the cost of extra recompute. This trade-off becomes valuable when we can fit larger models \nat the expense of recomputing forward pass activations. \n\nIdeally one can recompute the entire forward pass, thereby resulting in an activation memory of `2sbh` per transformer layer.\nThis method is called `Full-activation checkpointing`. This memory can further go down by a factor of `t` if we use tensor-parallelism.\nIn the activation memory equation, we have a quadratic term of `5abs^2`. As the sequence length increases, this term grows at a much \nfaster rate. This quadratic term comes from the softmax computation. `Vijay Korthikanti et al. <https://browse.arxiv.org/pdf/2205.05198.pdf>`__ \npropose `Selective activation checkpointing`, where they only recompute the softmax and attention computation and thereby avoid saving the activations coming \nfrom the softmax and attention computation. This completely gets rid of the quadratic term and brings down the activation memory per layer to \n`34sbh/t`. The Llama2-7B example in `this tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html#llama2-7b-tp-zero1-tutorial>`__\nuses selective activation checkpointing.\n"
  },
  {
    "path": "libraries/neuronx-distributed/activation_memory_reduction_developer_guide.rst",
    "content": ".. _activation_memory_reduction_developer_guide:\n\nDeveloper guide for Activation Memory reduction \n============================================================================\n\nSequence Parallelism\n^^^^^^^^^^^^^^^^^^^^\n\nTo combine sequence parallelism with tensor-parallelism, one needs to follow the steps below:\n\nModel changes for Tensor-Parallel block:\n'''''''''''''''''''''''''''''''''''''''''\n\nFor tensor-parallelism, we replace the linear layers with ColumnParallel and RowParallel Linear \nlayers as mentioned `here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html#creating-model>`__.\nTo enable sequence-parallelism, we need to pass the `sequence_parallel_enabled` argument to the ColumnParallel and RowParallel linear layers.\nSetting this argument to ``True`` makes the ColumnParallel and RowParallel linear layers introduce the ``all-gather`` and ``reduce-scatter`` \noperations for gathering and distributing the activations along the sequence dimension.\n\n.. code:: ipython3\n   \n   from transformers.models.gpt_neox.modeling_gpt_neox import GPTNeoXAttention\n\n   class GPTNeoXAttentionNxD(GPTNeoXAttention):\n       def __init__(self, config):\n           super().__init__(config)\n           ....\n           self.query_key_value = ColumnParallelLinear(\n                                    config.hidden_size,\n                                    3 * config.hidden_size,\n                                    stride=3,\n                                    gather_output=False,\n                                    init_method=init_method,\n                                    sequence_parallel_enabled=self.config.sequence_parallel_enabled,\n                                )\n           self.dense = RowParallelLinear(\n                            config.hidden_size,\n                            config.hidden_size,\n                            input_is_parallel=True,\n                            init_method=init_method,\n                            sequence_parallel_enabled=self.config.sequence_parallel_enabled,\n                        )\n           ....\n\nModel changes for Non-Tensor-Parallel block:\n''''''''''''''''''''''''''''''''''''''''''''\n\nIn a transformer module, the non-tensor-parallel block contains mainly the Layer-Norm modules. Since we partition \nthe layer-norm computation along the sequence dimension, we \nneed to sum up the layer-norm gradients along the sequence dimension. To help us do that, \nwe use the Layer-norm provided by ``neuronx-distributed.parallel_layers.layer_norm``. The Layer-norm in \nneuronx-distributed uses the same forward and backward passes as ``torch.nn.LayerNorm``; it additionally marks\nthe weights as sequence-parallel weights. This tagging allows us to look for weights with the sequence-parallel \ntag and reduce those gradients across the tensor-parallel degree. Hence we need to add the following two changes:\n\n\n.. 
code:: ipython3\n\n   from transformers.models.gpt_neox.modeling_gpt_neox import GPTNeoXLayer\n   from neuronx_distributed.parallel_layers import layer_norm\n\n   class GPTNeoXLayerNxD(GPTNeoXLayer):\n       def __init__(self, config):\n           super().__init__(config)\n           ...\n           self.input_layernorm = layer_norm.LayerNorm(\n                                    config.hidden_size,\n                                    eps=config.layer_norm_eps,\n                                    sequence_parallel_enabled=config.sequence_parallel_enabled\n                                  )\n           self.post_attention_layernorm = layer_norm.LayerNorm(\n                                                config.hidden_size,\n                                                eps=config.layer_norm_eps,\n                                                sequence_parallel_enabled=config.sequence_parallel_enabled\n                                            )\n\nOnce we replace the layernorm with neuronx-distributed's layernorm, it will `mark the weights <https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/parallel_layers/layer_norm.py#L32>`__ \nas sequence-parallel weights. Note: If your model is using RMSNorm or any other layer that parallelizes in the sequence-dimension,\nyou can mark the weights as sequence-parallel weights by using the following code:\n\n.. code:: ipython3\n\n    setattr(param, \"sequence_parallel_enabled\", sequence_parallel_enabled)\n\nOnce marked, we then use this attribute when we compute gradients for layer-norm. We need to add the following code before our optimizer.step:\n\n.. code:: ipython3\n\n    def allreduce_sequence_parallel_gradients(optimizer):\n        \"\"\" All-reduce layernorm parameters across model parallel nodes when sequence parallelism is used.\n            Modified from megatron-lm:\n            https://gitlab-master.nvidia.com/ADLR/megatron-lm/-/blob/3f91f09bb2ab32f9904b47f46f19d2fc3f518ed8/megatron/training.py#L425\n        \"\"\"\n        from neuronx_distributed.parallel_layers.mappings import reduce_from_tensor_model_parallel_region\n        grads = []\n        for param_group in optimizer.__getstate__()['param_groups']:\n            for group, params in param_group.items():\n                if group == 'params':\n                    for p in params:\n                        if isinstance(p, torch.Tensor) and p.grad is not None:\n                            sequence_parallel_param = getattr(p, 'sequence_parallel_enabled', False)\n                            if sequence_parallel_param:\n                                grads.append(p.grad.data)\n        for grad in grads:\n            reduce_from_tensor_model_parallel_region(grad)\n\nAs seen in the above code, we reduce the gradients from all tensor parallel devices. This is because the compute is divided along the \nsequence dimension across all the devices participating in the tensor parallel group. For reference implementation, check \nthe `GPTNeoX-20B modeling code <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py#L273C1-L289C55>`__ .\n\nTransposing the activations:\n''''''''''''''''''''''''''''\n\nSequence-parallelism implementation requires the sequence dimension to be the 0th dimension whereas the tensor-parallel region \nrequires the sequence dimension to be the first dimension. 
Our model implementations keep the sequence dimension \nas dimension 1 and the batch dimension as dimension 0. Hence, to accommodate sequence parallelism, we need to insert a few \ntranspose operations at the following places:\n\n1. Before we start looping through all the layers, we need to transpose the sequence and batch dimensions. We \nalso need to partition the inputs along the sequence dimension such that each tp-rank gets a part. This can be done as:\n\n.. code:: ipython3\n\n    from neuronx_distributed.parallel_layers.mappings import scatter_to_sequence_parallel_region\n    # NxD Core code change: sequence parallel uses seq_len as the 0-th dim\n    if self.config.sequence_parallel_enabled:\n        hidden_states = hidden_states.transpose(0, 1).contiguous()\n        hidden_states = scatter_to_sequence_parallel_region(hidden_states)\n\n2. Since the attention block requires the sequence dimension to be dimension 1, we transpose the output of the QKV projection and then \ntranspose it back before the final MLP of the attention block. \n\n.. code:: ipython3\n\n    # Within the attention module\n    qkv = self.query_key_value(hidden_states)\n\n    if config.sequence_parallel_enabled:\n        qkv = qkv.transpose(0,1)\n    ...\n\n    attn_output = attn_output.transpose(0,1)\n    attn_output = self.dense(attn_output)\n\n\n3. Finally, before returning the final output, we need to put all the partial activations along the sequence dimension \nback together. This can be done as follows:\n\n.. code:: ipython3\n\n    from neuronx_distributed.parallel_layers.mappings import gather_from_sequence_parallel_region\n    if self.config.sequence_parallel_enabled:\n        hidden_states = gather_from_sequence_parallel_region(hidden_states, to_model_parallel=False)\n        hidden_states = hidden_states.transpose(0, 1).contiguous()\n\n    return BaseModelOutputWithPast(\n            last_hidden_state=hidden_states,\n            past_key_values=presents,\n            hidden_states=all_hidden_states,\n            attentions=all_attentions,\n        )\n\nThese are the only major changes required to add sequence-parallelism on top of tensor-parallelism. Note: Sequence-parallelism \nuses the same tensor-parallel group. \nFor a reference implementation, follow the `GPTNeoX-20B model script <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py>`__.\n\nActivation Recomputation\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs seen in the :ref:`app note on Activation Memory Reduction <activation_memory_reduction>`, we can reduce the activation memory by recomputing a few operations from \nthe forward pass during the backward run. To replay some of the compute, we can use the \n`torch.utils.checkpoint.checkpoint <https://pytorch.org/docs/stable/checkpoint.html>`__ API. To use this API, we need \nto put the compute we want to replay inside a function that can be passed to the `checkpoint` API. This API takes care \nof maintaining the RNG seed, not saving the activations, and also inserting the forward recompute during the gradient computation.\n\nTo enable selective activation checkpointing for the attention block, we can simply pass the attention block to the checkpoint \nAPI as follows:\n\n.. 
code:: ipython3\n\n    if config.selective_activation_checkpointing_is_enabled:\n        attn_output = torch.utils.checkpoint.checkpoint(self._attn, query, key, value, attention_mask, head_mask)\n    else:\n        attn_output = self._attn(query, key, value, attention_mask, head_mask)\n\nNote: To use torch.utils.checkpoint, it is mandatory to use the `-O1 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/api-reference-guide/index.html?highlight=--O1#cmdoption-neuronx-cc-arg-0>`__ \ncompiler flag. If this flag is not set, the Neuron compiler would eliminate the duplicate recompute as an \noptimization and hence you would not see any memory gains."
  },
  {
    "path": "libraries/neuronx-distributed/api-reference-guide-inference.rst",
    "content": ".. _api_guide_nxd_inference:\n\nInference APIs\n==============\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n.. _nxd_tracing:\n\nModel Trace:\n^^^^^^^^^^^^\n\nWe can use the tensor parallel layers to perform large model inference\ntoo. For performing inference, we can re-use the parallel model built\nfor training and then use the trace APIs provided by the\nneuronx_distributed package to trace it for inference. One can use the\nfollowing set of APIs for running distributed inference:\n\n::\n\n   def neuronx_distributed.trace.parallel_model_trace(func, example_inputs, compiler_workdir=None, compiler_args=None, inline_weights_to_neff=True, bucket_config=None, tp_degree=1, max_parallel_compilations=None)\n\nThis API launches tensor parallel workers, where each worker traces\nits own model. These traced models are wrapped with a single\nTensorParallelModel module which can then be used like any other traced\nmodel.\n\n.. _parameters-9:\n\nParameters:\n\n\n-  ``func : Callable``: This is a function that returns a ``Model``\n   object and a dictionary of states. The ``parallel_model_trace`` API would call this function\n   inside each worker and run trace against them. Note: This differs\n   from ``torch_neuronx.trace``, which requires a model object to be passed.\n\n-  ``example_inputs: (torch.Tensor like)`` : The inputs that need to be passed to\n   the model. If you are using ``bucket_config``, then this must be a list of inputs for\n   each bucket model. This configuration is similar to :func:`torch_neuronx.bucket_model_trace`\n\n-  ``compiler_workdir: Optional[str,pathlib.Path]`` : Work directory used by\n   Neuron Graph Compiler. This can be useful for debugging and inspecting\n   intermediary Neuron Graph Compiler outputs.\n\n-  ``compiler_args: Optional[Union[List[str],str]]`` : List of strings representing\n   Neuron Graph Compiler compiler arguments. See :ref:`neuron-compiler-cli-reference-guide`\n   for more information about compiler options.\n\n-  ``inline_weights_to_neff: bool`` : A boolean indicating whether the weights should be\n   inlined to the NEFF. If set to False, weights will be separated from the NEFF.\n   The default is ``True``.\n\n-  ``bucket_config: torch_neuronx.BucketModelConfig`` : The config object that defines\n   bucket selection behavior. See :func:`torch_neuronx.BucketModelConfig` for more details.\n\n-  ``tp_degree: (int)`` : How many devices to use when performing\n   tensor parallel sharding\n\n-  ``max_parallel_compilations: Optional[int]`` : If specified, this function will only trace this number\n   of models in parallel, which can be necessary to prevent OOMs while tracing. The default\n   is None, which means the number of parallel compilations is equal to the ``tp_degree``.\n\n\n\n\nTrace Model Save/Load:\n^^^^^^^^^^^^^^^^^^^^^^\n\nSave:\n'''''\n\n::\n\n   def neuronx_distributed.trace.parallel_model_save(model, save_dir)\n\nThis API saves the traced model in ``save_dir``. Each shard is\nsaved in its respective directory inside ``save_dir``.\n\nParameters:\n\n-  ``model: (TensorParallelModel)`` : Traced model produced using the ``parallel_model_trace`` API\n-  ``save_dir: (str)`` : The directory where the model would be saved\n\nLoad:\n'''''\n\n::\n\n   def neuronx_distributed.trace.parallel_model_load(load_dir)\n\nThis API will load the sharded traced model into ``TensorParallelModel`` for inference.\n\n.. 
_parameters-10:\n\nParameters:\n'''''''''''\n\n-  ``load_dir: (str)`` : Directory which contains the traced model.\n
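\nPutting these APIs together, the following is a minimal, end-to-end sketch (``MyParallelModel``, the input shape, the ``tp_degree`` value, and the save directory are placeholders; substitute your own tensor-parallel model and inputs):\n\n::\n\n   import torch\n   import neuronx_distributed.trace\n\n   def get_model():\n       # ``func`` must return the model object and a dictionary of states,\n       # as described above. ``MyParallelModel`` is a placeholder.\n       model = MyParallelModel()\n       model.eval()\n       return model, {}\n\n   example_inputs = torch.zeros(1, 128, dtype=torch.int64)  # illustrative input\n\n   # Trace one shard per tensor-parallel worker and wrap them in a TensorParallelModel\n   traced_model = neuronx_distributed.trace.parallel_model_trace(\n       get_model,\n       example_inputs,\n       tp_degree=8,\n   )\n   output = traced_model(example_inputs)\n\n   # Persist the sharded traced model and reload it later\n   neuronx_distributed.trace.parallel_model_save(traced_model, \"traced_model/\")\n   traced_model = neuronx_distributed.trace.parallel_model_load(\"traced_model/\")\n"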
  },
  {
    "path": "libraries/neuronx-distributed/api-reference-guide-training.rst",
    "content": ".. _api_guide_nxd_training:\n\nTraining APIs\n==============\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nNeuronx-Distributed Training APIs:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn Neuronx-Distributed, we provide a series of APIs under `neuronx_distributed` directly that help\nusers apply optimizations in NxD Core easily. These APIs cover configuration, model/optimizer initialization\nand saving/loading checkpoints.\n\nInitialize NxD Core config:\n''''''''''''''''''''''''''''\n\n::\n\n    def neuronx_distributed.trainer.neuronx_distributed_config(\n        tensor_parallel_size=1,\n        pipeline_parallel_size=1,\n        pipeline_config=None,\n        optimizer_config=None,\n        activation_checkpoint_config=None,\n        pad_model=False,\n        sequence_parallel=False,\n        model_init_config=None,\n        lora_config=None,\n    )\n\nThis method initializes the NxD Core training config and initializes model parallelism. This config\nmaintains all optimization options for distributed training, and it's a global config\n(the same for all processes).\n\nParameters:\n\n- ``tensor_parallel_size (int)`` : Tensor model parallel size. Default: :code:`1`.\n- ``pipeline_parallel_size (int)`` : Pipeline model parallel size. Default: :code:`1`.\n- ``pipeline_config (dict)`` : Pipeline parallel config. For details please refer to the \n    `pipeline parallel developer guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pp_developer_guide.html>`__.\n    Default: :code:`None`.\n\n- ``optimizer_config (dict)`` : Optimizer config. Default: :code:`{\"zero_one_enabled\": False, \"grad_clipping\": True, \"max_grad_norm\": 1.0}`.\n- Enable ZeRO-1 by setting ``zero_one_enabled`` to ``True``.\n- Enable grad clipping by setting ``grad_clipping`` to ``True``.\n- Change the maximum grad norm value by setting ``max_grad_norm``.\n\n- ``activation_checkpoint_config (str or torch.nn.Module)`` : Activation checkpoint config,\n   accepted values: ``\"full\"``, ``None``, or any ``torch.nn.Module``. When set to ``\"full\"``,\n   regular activation checkpointing is enabled (every transformer layer will be re-computed).\n   When set to ``None``, activation checkpointing is disabled. When set to any ``torch.nn.Module``,\n   selective activation checkpointing is enabled and any provided module will be re-computed.\n   Default: :code:`None`.\n\n- ``pad_model (bool)`` : Whether to pad the attention heads of the model. Default: :code:`False`.\n- ``sequence_parallel (bool)`` : Whether to enable sequence parallelism. Default: :code:`False`.\n- ``model_init_config (dict)`` : Model initialization config. Default: :code:`{\"sequential_move_factor\": 11, \"meta_device_init\": False, \"param_init_fn\": None}`.\n- ``lora_config``: LoRA configuration. Default: :code:`None` with LoRA disabled.\n\n- ``sequential_move_factor``: number of processes instantiating the model on the host at the same time.\n    This is done to avoid host OOM. 
Range: 1-32.\n- ``meta_device_init``: whether to initialize the model on the meta device.\n- ``param_init_fn``: method that initializes the parameters of modules; should be provided when\n    ``meta_device_init`` is ``True``.\n\nInitialize NxD Core Model Wrapper:\n''''''''''''''''''''''''''''''''''''\n\n::\n\n    def neuronx_distributed.trainer.initialize_parallel_model(nxd_config, model_fn, *model_args, **model_kwargs)\n\nThis method initializes the NxD Core model wrapper and returns a wrapped model that can be used as\na regular ``torch.nn.Module``, with all the model optimizations in the config applied.\nThis wrapper is designed to hide the complexity of optimizations such as pipeline model\nparallelism, so that users can simply use the wrapped model as they would the unwrapped version.\n\nParameters:\n\n- ``nxd_config (dict)``: config generated by ``neuronx_distributed_config``.\n- ``model_fn (callable)``: user-provided function to get the model for training.\n- ``model_args`` and ``model_kwargs``: arguments that will be passed to ``model_fn``.\n\nModel wrapper class and its methods:\n\n::\n\n    class neuronx_distributed.trainer.model.NxDModel(torch.nn.Module):\n        def local_module(self):\n            # return the unwrapped local module\n\n        def run_train(self, *args, **kwargs):\n            # method to run one iteration; when pipeline parallel is enabled,\n            # users have to use this instead of forward+backward\n\n        def named_parameters(self, *args, **kwargs):\n            # only return parameters on the local rank.\n            # same for `parameters`, `named_buffers`, `buffers`\n\n        def named_modules(self, *args, **kwargs):\n            # only return modules on the local rank.\n            # same for `modules`, `named_children`, `children`\n\n.. note::\n    \n    As a shortcut, users can call ``model.config`` or ``model.dtype`` on the wrapped model\n    if the original model is a Hugging Face Transformers pre-trained model.\n\nInitialize NxD Core Optimizer Wrapper:\n''''''''''''''''''''''''''''''''''''''''\n\n::\n\n    def neuronx_distributed.trainer.initialize_parallel_optimizer(nxd_config, optimizer_class, parameters, **defaults)\n\nThis method initializes the NxD Core optimizer wrapper and returns a wrapped optimizer that can be used as\na regular ``torch.optim.Optimizer``, with all the optimizer optimizations in the config applied.\n\nThis optimizer wrapper inherits from ``torch.optim.Optimizer``. It takes in the ``nxd_config`` and\nconfigures the optimizer to work with different distributed training regimes.\n\nThe `step` method of the wrapped optimizer contains the necessary all-reduce operations and grad clipping.\nOther methods and variables work the same as in the unwrapped optimizer.\n\nParameters:\n\n- ``nxd_config (dict)``: config generated by ``neuronx_distributed_config``.\n- ``optimizer_class (Type[torch.optim.Optimizer])``: optimizer class to create the optimizer.\n- ``parameters (iterable)``: parameters passed to the optimizer.\n- ``defaults``: optimizer options that will be passed to the optimizer.\n\n\nEnable LoRA fine-tuning:\n''''''''''''''''''''''''''\n\nLoRA model wrapper\n::\n\n   class LoRAModel(module, LoraConfig)\n\n\nParameters:\n\n- ``module (torch.nn.Module)``: Module to be wrapped with LoRA\n\n- ``LoraConfig``: The LoRA configuration defined in ``neuronx_distributed.modules.lora.LoraConfig``\n\nThe flags in ``LoraConfig`` to initialize the LoRA adapter:\n\n- ``enable_lora (bool)``: Enable LoRA fine-tuning. \n\n- ``lora_rank (int)``: The rank of the LoRA adapter. 
A small LoRA rank reduces the memory footprint during fine-tuning, but it may harm the model quality.\n\n- ``lora_alpha (float)``: The alpha parameter for LoRA scaling, i.e., scaling LoRA weights against base model weights.\n\n- ``lora_dropout (float)``: The dropout probability for LoRA layers.\n\n- ``bias (str)``: Bias type for LoRA. Can be ``none``, ``all`` or ``lora_only``.\n\n- ``target_modules (List[str])``: The names of the modules that need LoRA.\n\n- ``use_rslora (bool)``: If True, uses Rank-Stabilized LoRA, which sets the adapter scaling factor to ``lora_alpha/math.sqrt(lora_rank)``.\n\n- ``init_lora_weights (str)``: Weights initialization of the LoRA adapter. Can be ``default`` (initialized with ``torch.nn.init.kaiming_uniform_()``) or ``gaussian`` (initialized with ``torch.nn.init.normal_()``).\n\n\n**Usage:**\n\n   We first define the LoRA configuration for fine-tuning. Suppose the target modules are ``[q_proj, v_proj, k_proj]``; \n   this indicates that LoRA will be applied to modules whose names include any of these keywords. \n   An example is\n\n   ::\n\n      lora_config = neuronx_distributed.modules.lora.LoraConfig(\n         enable_lora=True,\n         lora_rank=16,\n         lora_alpha=32,\n         lora_dropout=0.05,\n         bias=\"none\",\n         target_modules=[\"q_proj\", \"v_proj\", \"k_proj\"],\n      )\n\n   You can then enable LoRA fine-tuning as shown below\n\n   ::\n\n      nxd_config = neuronx_distributed.neuronx_distributed_config(\n        ...\n        lora_config=lora_config,\n      )\n      model = neuronx_distributed.initialize_parallel_model(nxd_config, ...)\n\n   Then the NxD model will be initialized with the LoRA adapter enabled.\n\n\nSave Checkpoint:\n''''''''''''''''\n\nMethod to save a checkpoint; returns ``None``.\n\nThis method saves checkpoints for the model, optimizer, scheduler and user contents sequentially.\nModel states are saved on data parallel rank 0 only. When the ZeRO-1 optimizer is not turned on,\noptimizer states are also saved this way; when the ZeRO-1 optimizer is turned on, its states\nare saved on all ranks. Scheduler and user contents are saved on the master rank only. In addition,\nusers can use ``use_xser=True`` to boost saving performance and avoid host OOM. This is achieved\nby saving tensors one by one while keeping the original data structure.\nHowever, the resulting checkpoint cannot be loaded using PyTorch's ``load`` API. Users\ncan also use ``async_save=True`` to further boost saving performance. This is achieved by saving tensors\nin separate processes that run alongside the computation. 
Setting ``async_save`` to ``True`` will result\nin more host memory being used, increasing the risk of the application crashing because the system\nruns out of memory.\n\n::\n\n    def neuronx_distributed.trainer.save_checkpoint(\n        path,\n        tag=\"\",\n        model=None,\n        optimizer=None,\n        scheduler=None,\n        user_content=None,\n        num_workers=8,\n        use_xser=False,\n        num_kept_ckpts=None,\n        async_save=False,\n        zero1_optimizer=False\n    )\n\nParameters:\n\n- ``path (str)``: path to save the checkpoints.\n- ``tag (str)``: tag to save the checkpoints.\n- ``model (torch.nn.Module)``: model to save, optional.\n- ``optimizer (torch.optim.Optimizer)``: optimizer to save, optional.\n- ``scheduler``: scheduler to save, optional.\n- ``user_content``: user contents to save, optional.\n- ``num_workers (int)``: number of processes saving data on the host at the same time.\n   This is done to avoid host OOM. Range: 1-32.\n\n- ``use_xser (bool)``: whether to use torch-xla serialization. When enabled, ``num_workers``\n   will be ignored and the maximum number of workers will be used. Default: :code:`False`.\n\n- ``num_kept_ckpts (int)``: number of checkpoints to keep on disk, optional. Default: :code:`None`.\n- ``async_save (bool)``: whether to use the asynchronous saving method. Default: :code:`False`.\n- ``zero1_optimizer (bool)``: whether the optimizer state is from a ZeRO-1 optimizer; used when the optimizer is a dict.\n\n\n**Save LoRA Checkpoint:**\n\nNxD also uses ``neuronx_distributed.trainer.save_checkpoint()`` to save LoRA models, but you can set ``save_lora_base`` and ``merge_lora`` in LoraConfig to specify how to save the LoRA checkpoint.\nThere are three modes for LoRA checkpoint saving:\n\n* ``save_lora_base=False, merge_lora=False``: Save the LoRA adapter only.\n* ``save_lora_base=True, merge_lora=False``: Save both the base model and the LoRA adapter separately.\n* ``save_lora_base=True, merge_lora=True``: Merge the LoRA adapter into the base model and then save the base model.\n\n\nIn addition to the adapter, NxD also needs to save the LoRA configuration file for LoRA loading. 
\nThe configuration can be saved into the same checkpoint as the adapter, or saved as a separate JSON file.\n\n- ``save_lora_config_adapter (bool)``: If False, save the configuration file as a separate JSON file.\n\nNote that if the LoRA configuration file is saved separately, it is named ``lora_adapter/adapter_config.json``.\n\nA configuration example to save the LoRA adapter only is\n\n::\n\n   lora_config = neuronx_distributed.modules.lora.LoraConfig(\n      ...\n      save_lora_base=False,  \n      merge_lora=False,      \n      save_lora_config_adapter=True, \n   )\n\n\nLoad Checkpoint:\n''''''''''''''''\n\nMethod to load a checkpoint saved by ``save_checkpoint``; returns user contents if they exist, otherwise ``None``.\nIf ``tag`` is not provided, the newest tag tracked by ``save_checkpoint`` is used.\n\nNote that the checkpoint to be loaded must have the same model parallel degrees as currently in use,\nand, if the ZeRO-1 optimizer is used, must use the same data parallel degrees.\n\n::\n\n    def neuronx_distributed.trainer.load_checkpoint(\n        path,\n        tag=None,\n        model=None,\n        optimizer=None,\n        scheduler=None,\n        num_workers=8,\n        strict=True,\n    )\n\nParameters:\n\n- ``path (str)``: path to load the checkpoints from.\n- ``tag (str)``: tag to load the checkpoints.\n- ``model (torch.nn.Module)``: model to load, optional.\n- ``optimizer (torch.optim.Optimizer)``: optimizer to load, optional.\n- ``scheduler``: scheduler to load, optional.\n- ``num_workers (int)``: number of processes loading data on the host at the same time. This is done to avoid host OOM. Range: 1-32.\n- ``strict (bool)``: whether to use strict mode when loading the model checkpoint. Default: ``True``.\n\n\n**Load LoRA Checkpoint:**\n\nNxD loads LoRA checkpoints by setting flags in LoraConfig.\n\n- ``load_lora_from_ckpt``: Resumes the checkpoint process.\n- ``lora_save_dir``: Load the LoRA checkpoint from the specified folder\n- ``lora_load_tag``: Load the LoRA checkpoint with the specified tag\n\nAn example is:\n\n::\n\n   lora_config = LoraConfig(\n      enable_lora=True,\n      load_lora_from_ckpt=True,\n      lora_save_dir=checkpoint_dir,  # checkpoint path\n      lora_load_tag=tag,  # sub-directory under checkpoint path\n   )\n   nxd_config = nxd.neuronx_distributed_config(\n      ...\n      lora_config=lora_config,\n   )\n   model = nxd.initialize_parallel_model(nxd_config, ...)\n\n\nThe NxD model will be initialized with LoRA enabled and LoRA weights loaded. 
LoRA-related configurations are the same as the LoRA adapter checkpoint.\n\n\n**Sample usage:**\n\n::\n\n    import neuronx_distributed as nxd\n\n    # create config\n    nxd_config = nxd.neuronx_distributed_config(\n        tensor_parallel_size=8,\n        optimizer_config={\"zero_one_enabled\": True, \"grad_clipping\": True, \"max_grad_norm\": 1.0},\n    )\n\n    # wrap model\n    model = nxd.initialize_parallel_model(nxd_config, get_model)\n\n    # wrap optimizer\n    optimizer = nxd.initialize_parallel_optimizer(nxd_config, AdamW, model.parameters(), lr=1e-3)\n\n    ...\n    (training loop):\n        loss = model.run_train(inputs)\n        optimizer.step()\n\n    ...\n    # loading checkpoint (auto-resume)\n    user_content = nxd.load_checkpoint(\n        \"ckpts\",\n        model=model,\n        optimizer=optimizer,\n        scheduler=scheduler,\n    )\n    ...\n    # saving checkpoint\n    nxd.save_checkpoint(\n        \"ckpts\",\n        nxd_config=nxd_config,\n        model=model,\n        optimizer=optimizer,\n        scheduler=scheduler,\n        user_content={\"total_steps\": total_steps},\n    )\n\nModules:\n^^^^^^^^\n\nGQA-QKV Linear Module:\n''''''''''''''''''''''\n\n::\n\n    class neuronx_distributed.modules.qkv_linear.GQAQKVColumnParallelLinear(\n        input_size, output_size, bias=True, gather_output=True,\n        sequence_parallel_enabled=False, dtype=torch.float32, device=None, kv_size_multiplier=1, fuse_qkv=True)\n\nThis module parallelizes the Q,K,V linear projections using ColumnParallelLinear layers. Instead of using \n3 different linear layers, we can replace it with a single QKV module. In case of GQA module, the number of \nQ attention heads are `N` times more than the number of K and V attention heads. The K and V attention heads \nare replicated after projection to match the number of Q attention heads. This helps to reduce the K and V \nweights and is useful especially during inference. However, in case of training these modules, it restricts \nthe tensor-parallel degree that can be used, since the attention heads should be divisible by tensor-parallel \ndegree. Hence, to mitigate this bottleneck, the `GQAQKVColumnParallelLinear` takes in a `kv_size_multiplier` \nargument. The module would replicate the K and V weights `kv_size_multiplier` times thereby allowing you to \nuse higher tensor-parallel degree. Note: here instead of replicating the projection `N/tp_degree` times, we \nend of replicating the weights `kv_size_multiplier` times. This would produce the same result, allow you to use \nhigher tp_degree degree, however, it would result in extra memory getting consumed.\n\n.. _parameters-11:\n\nParameters:\n        \n\n-  ``input_size: (int)`` : First dimension of the weight matrix\n-  ``output_sizes: (List[int])`` : A list of second dimension of the Q and K/V weight matrix\n-  ``bias: (bool)``: If set to True, bias would be added\n-  ``gather_output: (bool)`` : If true, call all-gather on output and make Y available to all\n    Neuron devices, otherwise, every Neuron device will have its output which is Y_i = XA_i\n- ``sequence_parallel_enabled: (bool)`` : When sequence-parallel is enabled, it would gather\n   the inputs from the sequence parallel region and perform the forward and backward passes\n-  ``init_method: (torch.nn.init)`` : Initialization function for the Q and K/V weights.\n-  ``dtype: (dtype)`` : Datatype for the weights\n-  ``device: (torch.device)`` : Device to initialize the weights on. 
By default, the weights\n    would be initialized on CPU\n- ``kv_size_multiplier: (int)``: Factor by which the K and V weights would be replicated along the first dimension\n- ``fuse_qkv: (bool)``: When fuse_qkv is enabled, a single fused tensor is used for QKV. By default, this parameter is True. \n\n\nCheckpointing:\n^^^^^^^^^^^^^^\n\nThese are set of APIs for saving and loading the checkpoint. These APIs\ntake care of saving and loading the shard depending the tensor parallel\nrank of the worker.\n\nSave Checkpoint:\n''''''''''''''''\n\n::\n\n   def neuronx_distributed.parallel_layers.save(state_dict, save_dir, save_serially=True, save_xser: bool=False, down_cast_bf16=False)\n\n.. note::\n    \n    This method will be deprecated, use ``neuronx_distributed.trainer.save_checkpoint`` instead.\n\nThis API will save the model from each tensor-parallel rank in the\nsave_dir . Only workers with data parallel rank equal to 0 would be\nsaving the checkpoints. Each tensor parallel rank would be creating a\n``tp_rank_ii_pp_rank_ii`` folder inside ``save_dir`` and each ones saves its shard\nin the ``tp_rank_ii_pp_rank_ii`` folder.\nIf ``save_xser`` is enabled, the folder name would be ``tp_rank_ii_pp_rank_ii.tensors``\nand there will be a ref data file named as ``tp_rank_ii_pp_rank_ii`` in save_dir for each rank.\n\n.. _parameters-4:\n\nParameters:\n\n\n-  ``state_dict: (dict)`` : Model state dict. Its the same dict that you\n   would save using torch.save\n-  ``save_dir: (str)`` : Model save directory.\n-  ``save_serially: (bool)``: This flag would save checkpoints one model-parallel rank at a time.\n   This is particularly useful when we are checkpointing large models.\n-  ``save_xser: (bool)``: This flag would save the model with torch xla serialization.\n   This could significantly reduce checkpoint saving time when checkpointing large model, so it's recommended\n   to enable xser when the model is large.\n   Note that if a checkpoint is saved with ``save_xser``, it needs to be loaded with ``load_xser``, vice versa.\n-  ``down_cast_bf16: (bool)``: This flag would downcast the state_dict to bf16 before saving.\n\nLoad Checkpoint\n'''''''''''''''\n\n::\n\n   def neuronx_distributed.parallel_layers.load(\n       load_dir, model_or_optimizer=None, model_key='model', load_xser=False, sharded=True)\n\n.. note:: This method will be deprecated, use ``neuronx_distributed.trainer.load_checkpoint`` instead.\n\nThis API will automatically load checkpoint depending on the tensor\nparallel rank. For large models, one should pass the model object to the\nload API to load the weights directly into the model. This could avoid\nhost OOM, as the load API would load the checkpoints for one tensor\nparallel rank at a time.\n\n.. 
_parameters-5:\n\nParameters:\n\n\n-  ``load_dir: (str)`` : Directory where the checkpoint is saved.\n-  ``model_or_optimizer``: (torch.nn.Module or torch.optim.Optimizer): Model or Optimizer object.\n-  ``model``: (torch.nn.Module or torch.optim.Optimizer): Model or Optimizer object, equivilant to ``model_or_optimizer``\n-  ``model_key: (str)`` : The model key used when saving the model in the\n   state_dict.\n-  ``load_xser: (bool)`` : Load model with torch xla serialization.\n   Note that if a checkpoint is saved with ``save_xser``, it needs to be loaded with ``load_xser``, vice versa.\n-  ``sharded: (bool)`` : If the checkpoint is not sharded, pass False.\n   This is useful (especially during inference) when the model is\n   trained using a different strategy and you end up saving a single\n   unsharded checkpoint. You can then load this unsharded checkpoint\n   onto the sharded model. When this attribute is set to ``False`` , it\n   is necessary to pass the model object. Note: The keys in the\n   state-dict should have the same name as in the model object, else it\n   would raise an error.\n\nGradient Clipping:\n''''''''''''''''''\n\nWith tensor parallelism, we need to handle the gradient clipping as we\nhave to accumulate the total norm from all the tensor parallel ranks.\nThis should be handled by the following API\n\n::\n\n   def neuronx_distributed.parallel_layers.clip_grad_norm(\n       parameters, max_norm, norm_type=2)\n\n.. _parameters-6:\n\nParameters:\n\n\n-  ``parameters (Iterable[Tensor] or Tensor)`` : an iterable of Tensors\n   or a single Tensor that will have gradients normalized\n-  ``max_norm (float or int)`` :max norm of the gradients\n-  ``norm_type (float or int)`` : type of the used p-norm. Can be ‘inf’\n   for infinity norm.\n\nNeuron Zero1 Optimizer:\n'''''''''''''''''''''''\n\nIn Neuronx-Distributed, we built a wrapper on the Zero1-Optimizer present in torch-xla.\n\n::\n\n   class NeuronZero1Optimizer(Zero1Optimizer)\n\nThis wrapper takes into account the tensor-parallel degree and computes the grad-norm\naccordingly. It also provides two APIs: save_sharded_state_dict and load_sharded_state_dict.\nAs the size of the model grows, saving the optimizer state from a single rank can result in OOMs.\nHence, the api to save_sharded_state_dict can allow saving states from each data-parallel rank. To\nload this sharded optimizer state, there is a corresponding load_sharded_state_dict that allows each\nrank to pick its corresponding shard from the checkpoint directory.\n\n::\n\n   optimizer_grouped_parameters = [\n        {\n            \"params\": [\n                p for n, p in param_optimizer if not any(nd in n for nd in no_decay)\n            ],\n            \"weight_decay\": 0.01,\n        },\n        {\n            \"params\": [\n                p for n, p in param_optimizer if any(nd in n for nd in no_decay)\n            ],\n            \"weight_decay\": 0.0,\n        },\n   ]\n\n   optimizer = NeuronZero1Optimizer(\n        optimizer_grouped_parameters,\n        AdamW,\n        lr=flags.lr,\n        pin_layout=False,\n        sharding_groups=parallel_state.get_data_parallel_group(as_list=True),\n        grad_norm_groups=parallel_state.get_tensor_model_parallel_group(as_list=True),\n    )\n\nThe interface is same as Zero1Optimizer in torch-xla\n\n::\n\n   save_sharded_state_dict(output_dir, save_serially = True)\n\n.. note:: This method will be deprecated, use ``neuronx_distributed.trainer.save_checkpoint`` instead.\n\n.. 
_parameters-7:\n\nParameters:\n\n\n-  ``output_dir (str)`` : Checkpoint directory where the sharded optimizer states need to be saved\n-  ``save_serially (bool)`` : Whether to save the states one data-parallel rank at a time. This is\n    especially useful when we want to checkpoint large models.\n\n::\n\n   load_sharded_state_dict(output_dir, num_workers_per_step = 8)\n\n.. note:: This method will be deprecated, use ``neuronx_distributed.trainer.load_checkpoint`` instead.\n\n.. _parameters-8:\n\nParameters:\n\n\n-  ``output_dir (str)`` : Checkpoint directory where the sharded optimizer states are saved\n-  ``num_workers_per_step (int)`` : This argument controls how many workers are doing model load\n   in parallel.\n\n\n.. _pytorch-lightning:\n\nNeuron PyTorch-Lightning\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron PyTorch-Lightning is currently based on Lightning version 2.4.0, and will eventually be upstreamed to the Lightning-AI code base.\n\nNeuron Lightning Module\n'''''''''''''''''''''''\n\nInherited from `LightningModule <https://lightning.ai/docs/pytorch/stable/common/lightning_module.html>`__\n\n::\n\n    class neuronx_distributed.lightning.NeuronLTModule(\n        model_fn: Callable,\n        nxd_config: Dict,\n        opt_cls: Callable,\n        scheduler_cls: Callable,\n        model_args: Tuple = (),\n        model_kwargs: Dict = {},\n        opt_args: Tuple = (),\n        opt_kwargs: Dict = {},\n        scheduler_args: Tuple = (),\n        scheduler_kwargs: Dict = {},\n        grad_accum_steps: int = 1,\n        log_rank0: bool = False,\n        manual_opt: bool = True,\n    )\n\nParameters:\n\n- ``model_fn``: Model function to create the actual model\n\n- ``nxd_config``: Neuronx Distributed Config, output of neuronx_distributed.neuronx_distributed_config\n\n- ``opt_cls``: Callable to create optimizer\n\n- ``scheduler_cls``: Callable to create scheduler\n\n- ``model_args``: Tuple of args fed to model callable\n\n- ``model_kwargs``: Dict of keyword args fed to model callable\n\n- ``opt_args``: Tuple of args fed to optimizer callable\n\n- ``opt_kwargs``: Dict of keyword args fed to optimizer callable\n\n- ``scheduler_args``: Tuple of args fed to scheduler callable\n\n- ``scheduler_kwargs``: Dict of keyword args fed to scheduler callable\n\n- ``grad_accum_steps``: Grad accumulation steps\n\n- ``log_rank0``: Log at rank 0 (by default it will log at the last PP rank). 
Note that setting this to True will introduce extra communication per step, causing a performance drop\n\n- ``manual_opt``: Whether to do manual optimization. Note that NeuronLTModule currently doesn't support automatic optimization, so this should always be set to True\n\n\nNeuron XLA Strategy\n'''''''''''''''''''\n\nInherited from `XLAStrategy <https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.strategies.XLAStrategy.html>`__\n\n::\n\n    class neuronx_distributed.lightning.NeuronXLAStrategy(\n        nxd_config: Dict = None,\n        tensor_parallel_size: int = 1,\n        pipeline_parallel_size: int = 1,\n        save_load_xser: bool = True,\n    )\n\nParameters:\n\n- ``nxd_config``: Neuronx Distributed Config, output of neuronx_distributed.neuronx_distributed_config\n\n- ``tensor_parallel_size``: Tensor parallel degree, only needed when nxd_config is not specified\n\n- ``pipeline_parallel_size``: Pipeline parallel degree, only needed when nxd_config is not specified (Note that for now we only support TP with Neuron-PT-Lightning)\n\n- ``save_load_xser``: Setting this to True enables save/load with XLA serialization; for more context, see `Save Checkpoint <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#save-checkpoint>`__\n\n\nNeuron XLA Precision Plugin\n'''''''''''''''''''''''''''\n\nInherited from `XLAPrecisionPlugin <https://github.com/Lightning-AI/lightning/blob/2.1.0/src/lightning/pytorch/plugins/precision/xla.py>`__\n\n::\n\n    class neuronx_distributed.lightning.NeuronXLAPrecisionPlugin\n\nNeuron TQDM Progress Bar\n''''''''''''''''''''''''\n\nInherited from `TQDMProgressBar <https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.TQDMProgressBar.html>`__\n\n::\n\n    class neuronx_distributed.lightning.NeuronTQDMProgressBar\n\n\nNeuron TensorBoard Logger\n'''''''''''''''''''''''''\n\nInherited from `TensorBoardLogger <https://lightning.ai/docs/pytorch/stable/extensions/generated/lightning.pytorch.loggers.TensorBoardLogger.html>`__\n\n::\n\n    class neuronx_distributed.lightning.NeuronTensorBoardLogger(save_dir)\n\nParameters:\n\n- ``save_dir``: Directory to save the log files\n\n\n.. |neuronx-cc| replace:: :ref:`neuronx-cc <neuron-compiler-cli-reference-guide>`\n\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/api-reference-guide.rst",
    "content": ".. _neuronx_distributed_api_guide:\n\nAPI Reference Guide\n===============================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /libraries/neuronx-distributed/api_guide\n    /libraries/neuronx-distributed/api-reference-guide-training\n    /libraries/neuronx-distributed/api-reference-guide-inference\n    /libraries/neuronx-distributed/model_builder_v2_api_reference\n\n.. include:: /libraries/neuronx-distributed/api-reference-guide.txt"
  },
  {
    "path": "libraries/neuronx-distributed/api-reference-guide.txt",
    "content": "* :ref:`api_guide`\n* :ref:`api_guide_nxd_training`\n* :ref:`api_guide_nxd_inference`\n* :ref:`nxd-core-model-builder-v2`\n"
  },
  {
    "path": "libraries/neuronx-distributed/api_guide.rst",
    "content": ".. _api_guide:\n\nDistributed Strategies APIs\n===========================\n\n\nNeuronX Distributed Core (NxD Core) is XLA based library for distributed training and inference on Neuron devices.\nAs part of this library, we support 3D parallelism: Tensor-Parallelism, Pipeline-Parallelism\nand Data-Parallelism. We also support Zero1 optimizer to shard the optimizer weights.\nTo support tensor-parallelism on Neuron, we adopted the Apex Library\nbuilt for CUDA devices. We modified the implementations to work with\nXLA. This document enlist the different APIs and modules provided by the library\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nParallel Model State:\n^^^^^^^^^^^^^^^^^^^^^\n\nInitialize Model Parallelism:\n'''''''''''''''''''''''''''''\n\n::\n\n   def neuronx_distributed.parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=1,\n       pipeline_model_parallel_size=1,\n   )\n\nThis module would initialize the distributed model training and allows\nusers to set the number of tensor_parallel world size.\n\nParameters:\n\n- ``tensor_model_parallel_size`` : This should set the number of tensor\n  parallel workers. Note the default value is set to 1\n- ``pipeline_model_parallel_size`` : This should set the number of pipeline\n  parallel workers. Note the default value is set to 1\n\nOther helper APIs:\n''''''''''''''''''\n\n-  ``neuronx_distributed.parallel_state.get_data_parallel_size()`` :\n   Returns the data parallel world size depending on the number of\n   global workers and tensor parallel workers.\n-  ``neuronx_distributed.parallel_state.get_tensor_model_parallel_size()``\n   : Returns the tensor parallel world size.\n-  ``neuronx_distributed.parallel_state.get_tensor_model_parallel_rank()``\n   : Returns the rank of the worker within the tensor parallel group\n-  ``neuronx_distributed.parallel_state.get_pipeline_model_parallel_size()``\n   : Returns the pipeline parallel world size.\n-  ``neuronx_distributed.parallel_state.get_pipeline_model_parallel_rank()``\n   : Returns the rank of the worker within the pipeline parallel group\n-  ``neuronx_distributed.parallel_state.get_data_parallel_rank()`` :\n   Returns the rank of the worker in the data parallel group.\n-  ``neuronx_distributed.parallel_state.get_data_parallel_group(as_list=False)``\n   : Returns the data parallel group after taking into account the\n   tensor parallel size and the global world size. as_list argument when\n   set to True, would return the group as a List[List] otherwise it\n   would return a torch.distributed.group.\n-  ``neuronx_distributed.parallel_state.get_tensor_model_parallel_group(as_list=False)``\n   : Returns the tensor parallel group after taking into account the\n   tensor parallel size and the global world size. as_list argument when\n   set to True, would return the group as a List[List] otherwise it\n   would return a torch.distributed.group.\n-  ``neuronx_distributed.parallel_state.get_pipeline_model_parallel_group(as_list=False)``\n   : Returns the pipeline parallel group after taking into account the\n   pipeline parallel size and the global world size. 
as_list argument when\n   set to True, would return the group as a List[List] otherwise it\n   would return a torch.distributed.group.\n- ``move_model_to_device(model, device)``: This api moves the model to device by\n  preserving tensor parallel attributes.\n\nParallel Layers:\n^^^^^^^^^^^^^^^^\n\nMajority of parameters within the transformer based model reside in the\nEmbedding and Linear layers. Hence, to reduce the number of parameters\non a single device because of these layers, we provided sharded\nEmbedding and Linear layers.\n\nParallel Embedding:\n'''''''''''''''''''\n\n::\n\n   class neuronx_distributed.parallel_layers.ParallelEmbedding(\n       num_embeddings, embedding_dim, init_method=init.normal_,\n       dtype=torch.float32, device=None)\n\nThis module is intended to replace torch.nn.Embedding . In cases where\nthe vocab size is too large, we can shard the Embedding table across\nworkers. Note: The embedding table would be sharded across all the\ntensor-parallel workers.\n\n.. _parameters-1:\n\nParameters:\n\n-  ``num_embeddings (int)`` : size of the dictionary of embeddings\n-  ``embedding_dim (int)`` : the size of each embedding vector\n-  ``init_method: (torch.nn.init)`` : Initialization function for the\n   embedding weights.\n-  ``dtype: (dtype)`` : Datatype for the weights\n-  ``device: (torch.device)`` : Device to initialize the weights on. By\n   default, the weights would be initialized on CPU\n\nColumnParallel Linear Layer:\n''''''''''''''''''''''''''''\n\n::\n\n   class neuronx_distributed.parallel_layers.ColumnParallelLinear(\n       input_size, output_size, bias=True, gather_output=True,\n       sequence_parallel_enabled=False, dtype=torch.float32, device=None)\n\nThis module would perform a Column wise partition of the weight matrix.\nLinear layer is defined as ``Y = XA + b`` , here A is parallelized along\nsecond dimension as ``A = [A_1, A_2 .... A_p]`` . ``Note``: This layer\nis designed to operate on 3-dimensional inputs.\n\n.. _parameters-2:\n\nParameters:\n\n-  ``input_size: (int)`` : First dimension of the weight matrix\n-  ``output_size: (int)`` : Second dimension of the weight matrix\n-  ``bias: (bool)``: If set to True, bias would be added\n-  ``gather_output: (bool)`` : If true, call all-gather on output and\n   make Y available to all Neuron devices, otherwise, every Neuron\n   device will have its output which is Y_i = XA_i\n- ``sequence_parallel_enabled: (bool)`` : When sequence-parallel is enabled, it would\n   gather the inputs from the sequence parallel region and perform the forward and backward\n   passes\n-  ``dtype: (dtype)`` : Datatype for the weights\n-  ``device: (torch.device)`` : Device to initialize the weights on. By\n   default, the weights would be initialized on CPU\n\nRowParallel Linear Layer:\n'''''''''''''''''''''''''\n\n::\n\n   class neuronx_distributed.parallel_layers.RowParallelLinear(\n       input_size, output_size, bias=True, input_is_parallel=False,\n       sequence_parallel_enabled=False, dtype=torch.float32, device=False\n   )\n\nThe linear layer is defined as ``Y = XA + b``. A is parallelized along\nits first dimension and X along its second. ``Note``: This layer is\ndesigned to operate on 3-dimensional inputs.\n\n.. 
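_parallel-linear-usage-example:\n\nA minimal sketch of the common pattern of composing the two layers above for a\ntensor-parallel MLP block: the column-parallel layer keeps its output sharded\n(``gather_output=False``) and feeds it directly into the row-parallel layer\n(``input_is_parallel=True``). The hidden sizes and module name are illustrative:\n\n::\n\n   import torch\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear\n\n   class ParallelMLP(torch.nn.Module):\n       def __init__(self, hidden_size=1024, ffn_size=4096):\n           super().__init__()\n           # Output stays sharded across the tensor-parallel ranks\n           self.up_proj = ColumnParallelLinear(hidden_size, ffn_size, gather_output=False)\n           # Consumes the already-sharded activations produced by the column-parallel layer\n           self.down_proj = RowParallelLinear(ffn_size, hidden_size, input_is_parallel=True)\n\n       def forward(self, x):\n           return self.down_proj(torch.nn.functional.gelu(self.up_proj(x)))\n\n.. 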
_parameters-3:\n\nParameters:\n\n-  ``input_size: (int)`` : First dimension of the weight matrix\n-  ``output_size: (int)`` : Second dimension of the weight matrix\n-  ``bias: (bool)`` : If set to True, bias would be added\n-  ``input_is_parallel: (bool)`` : If true, we assume that the input is\n   already split across the Neuron devices and we do not split again.\n   This is useful when we have a ColumnParallel Layer just before the\n   Row Parallel layer\n-  ``sequence_parallel_enabled: (bool)`` : When sequence-parallel is enabled, it would\n   gather the inputs from the sequence parallel region and perform the forward and backward\n   passes\n-  ``dtype: (dtype)`` : Datatype for the weights\n-  ``device: (torch.device)`` : Device to initialize the weights on. By\n   default, the weights would be initialized on CPU\n\n\nPadding Tensor-Parallel Layers\n''''''''''''''''''''''''''''''\n\n::\n\n   def neuronx_distributed.parallel_layers.pad.pad_model(\n      model, tp_degree, n_heads, wrapped_classes=(), pad_hook_fn=None)\n\n\nPads a generic model so that it can run at a desired tensor parallelism degree by padding the\nnumber of attention heads. Returns the original model modified with padding.\nUses a 1-axis padding strategy: pads the sharded dim of the ParallelLinear layers to the\nsize it would have been for the padded number of heads.\n\n.. _parameters-4:\n\nParameters:\n\n- ``model (torch.nn.Module)`` : model to be padded\n- ``tp_degree (int)`` : tensor parallel degree\n- ``n_heads (int)`` : the number of heads the given model to be padded has. This can\n   typically be found in the config\n- ``wrapped_classes (Tuple[any], *optional*, defaults to `()`)`` : tuple of classes\n   (and their submodules) which should be padded\n- ``pad_hook_fn (Callable[any, float], *optional*, defaults to `None`)`` : a hook\n   function that is called whenever encountering a class to pad. Receives an instance\n   of the class to pad and the tgt_src_ratio (num_heads_padded / num_heads) as its argument\n\nUsage:\n\n   When modifying the Attention layer, typically you must divide by TP degree like so:\n\n   ``self.num_heads = neuronx_dist_utils.divide(self.num_heads, get_tensor_model_parallel_size())``\n\n   This line must be modified like so:\n  \n   .. code-block:: python\n\n      self.num_heads = neuronx_dist_utils.divide(\n         self.num_heads + get_number_of_extra_heads(self.num_heads, get_tensor_model_parallel_size()),\n         get_tensor_model_parallel_size())\n\n   Then, after initializing the model, you must call this wrapper:\n   \n   .. code-block:: python\n\n      model = get_model(config=desired_config)\n      model = pad_model(model, tp_degree=32, n_heads=desired_config.num_heads)  # Use the model as desired after this point\n\n   You can specify a specific layer or class for your model to pad, so you aren't unnecessarily padding.\n   Typically, this layer will be your Attention layer.\n   \n   ``model = pad_model(model, tp_degree=32, n_heads=desired_config.num_heads, wrapped_classes=[MyAttention])``\n\n   You can also specify a pad_hook_fn, to be called whenever encountering an instance of wrapped_class,\n   passing in said instance as a parameter, along with the tgt_src_ratio (num_heads_padded / num_heads).\n   \n   .. 
code-block:: python\n\n      def my_hook(attention_to_pad, tgt_src_ratio):\n         attention_to_pad.split_size = int(attention_to_pad.split_size * tgt_src_ratio)\n\n      model = pad_model(\n         model,\n         tp_degree=32,\n         n_heads=desired_config.num_heads,\n         wrapped_classes=[MyAttention],\n         pad_hook_fn=my_hook\n      )\n\n\nLoss functions:\n''''''''''''''''''\n\nWhen you shard the final MLP layer using tensor-parallelism, instead of\nrecollecting all the outputs from each TP rank, we can use the\nParallelCrossEntropy loss function. This function takes the parallel\nlogits produced by the final parallel MLP and produces a loss by taking into\naccount that the logits are sharded across multiple workers.\n\n\n::\n\n   def neuronx_distributed.parallel_layers.loss_functions.parallel_cross_entropy(\n       parallel_logits, labels, label_smoothing=0.0)\n\n.. _parameters-6:\n\nParameters:\n\n\n-  ``parallel_logits (Tensor)`` : Sharded logits from the previous MLP\n-  ``labels (Tensor)`` : Label for each token. Labels should not be sharded; parallel_cross_entropy takes care of sharding the labels internally\n-  ``label_smoothing (float)`` : A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss, where 0.0 means no smoothing\n\nPipeline parallelism:\n^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron Distributed Pipeline Model\n'''''''''''''''''''''''''''''''''\n\n::\n\n   class NxDPPModel(\n        module: torch.nn.Module,\n        transformer_layer_cls: Optional[Any] = None,\n        num_microbatches: int = 1,\n        virtual_pipeline_size: int = 1,\n        output_loss_value_spec: Optional[Union[Dict, Tuple]] = None,\n        return_mb_loss: bool = False,\n        broadcast_and_average_loss: bool = False,\n        pipeline_cuts: Optional[List[str]] = None,\n        input_names: Optional[List[str]] = None,\n        leaf_module_cls: Optional[List[Any]] = None,\n        autowrap_functions: Optional[Tuple[ModuleType]] = None,\n        autowrap_modules: Optional[Tuple[Callable, ...]] = None,\n        tracer_cls: Optional[Union[str, Any]] = None,\n        param_init_fn: Optional[Any] = None,\n        trace_file_path: Optional[str] = None,\n        use_zero1_optimizer: bool = False,\n        auto_partition: Optional[bool] = False,\n        deallocate_pipeline_outputs: bool = False,\n   )\n\nParameters:\n\n- ``module``: Module to be distributed with pipeline parallelism\n\n- ``transformer_layer_cls``: The module class of transformer layers\n\n- ``num_microbatches``: Number of pipeline microbatches\n\n- ``virtual_pipeline_size``: Virtual pipeline size; if greater than 1 we will use the interleaved pipeline schedule.\n\n- ``output_loss_value_spec``:\n      The ``output_loss_value_spec`` value can be specified to disambiguate\n      which value in the output of `forward` is the loss value on which NxDPPModel should apply\n      backpropagation. For example, if your ``forward`` returns a tuple ``(loss, model_out)``,\n      you can specify ``output_loss_value_spec=(True, False)``. 
Or, if your ``forward`` returns\n      a dict ``{'loss': loss_value, 'model_out': model_out}``, you can specify\n      ``output_loss_value_spec={'loss': True, 'model_out': False}``\n      (referenced from `this <https://github.com/pytorch/PiPPy/blob/main/pippy/IR.py#L697>`__)\n\n- ``return_mb_loss``: Whether to return a list of losses for all microbatches\n\n- ``broadcast_and_average_loss``: Whether to broadcast the loss to all PP ranks and average it across DP ranks; when set to True, ``return_mb_loss`` must be False\n\n- ``pipeline_cuts``: A list of layer names that will be used to annotate pipeline stage boundaries\n\n- ``input_names``: The input names that will be used for tracing, which will be the same as the model inputs during runtime.\n\n- ``leaf_module_cls``: A list of module classes that should be treated as leaf nodes during tracing. Note that the transformer layer class is treated as a leaf node by default.\n\n- ``autowrap_modules``: (symbolic tracing only)\n      Python modules whose functions should be wrapped automatically\n      without needing to use fx.wrap().\n      reference `here <https://github.com/pytorch/pytorch/blob/main/torch/fx/_symbolic_trace.py#L241>`__\n\n- ``autowrap_functions``: (symbolic tracing only)\n      Python functions that should be wrapped automatically without\n      needing to use fx.wrap().\n      reference `here <https://github.com/pytorch/pytorch/blob/main/torch/fx/_symbolic_trace.py#L241>`__\n\n- ``tracer_cls``: User-provided tracer class for symbolic tracing. It can be \"hf\", \"torch\" or any tracer class the user created.\n\n- ``param_init_fn``:\n      Function used to initialize parameters. This is useful if the user wants to use the meta device to do\n      delayed parameter initialization. param_init_fn should take a module as input and initialize the\n      parameters that belong to this module only (not its submodules).\n\n- ``use_zero1_optimizer``: Whether to use the zero1 optimizer. When set to True, gradient averaging is handed over to the Zero1 optimizer.\n\n- ``auto_partition``:\n      Boolean to indicate whether to use auto_partition for the model. When set to True, the pipeline\n      cuts used as the pipeline stage boundaries to partition the model are automatically determined, and the\n      pipeline_cuts parameter should not be set. The pipeline cuts are chosen on the basis of the transformer layer names.\n\n- ``deallocate_pipeline_outputs``: \n      Whether to deallocate the pipeline outputs after send. After send, the output tensor is only needed for its \n      '.grad_fn' field, and not its '.data'.\n\nCommonly used APIs\n''''''''''''''''''\n\n::\n\n   NxDPPModel.run_train(**kwargs)\n\nTrains the model with the PP schedule, which runs both forward and backward in a PP manner.\nThe kwargs should be the same as the input_names provided to the trace function.\nOutputs the loss specified by the user through output_loss_value_spec.\n\n::\n\n   NxDPPModel.run_eval(**kwargs)\n\nEvaluates the model with the PP schedule, which runs forward only.\nThe kwargs should be the same as the input_names provided to the trace function.\nOutputs the loss specified by the user through output_loss_value_spec.\n\n::\n\n   NxDPPModel.local_named_parameters(**kwargs)\n\nThe parameters that are local to this PP rank. This must be called after the model is partitioned.\n\n::\n\n   NxDPPModel.local_named_modules(**kwargs)\n\nThe modules that are local to this PP rank. This must be called after the model is partitioned.\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/app_notes.rst",
    "content": ".. _neuronx_distributed_appnotes:\n\nApp Notes \n====================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /libraries/neuronx-distributed/tensor_parallelism_overview\n    /libraries/neuronx-distributed/pipeline_parallelism_overview\n    /libraries/neuronx-distributed/activation_memory_reduction\n    /libraries/neuronx-distributed/context_parallelism_overview\n\n\n\n.. include:: /libraries/neuronx-distributed/app_notes.txt"
  },
  {
    "path": "libraries/neuronx-distributed/app_notes.txt",
    "content": "* :ref:`tensor_parallelism_overview`\n* :ref:`pipeline_parallelism_overview`\n* :ref:`activation_memory_reduction`\n* :ref:`context_parallelism_overview`\n"
  },
  {
    "path": "libraries/neuronx-distributed/context_parallelism_overview.rst",
    "content": ".. _context_parallelism_overview:\n\nContext Parallelism Overview \n===============================\n\nContext parallelism (CP) is a technique used in deep learning model training to train large context models.\nCP parallelizes the processing of neural network activations across multiple devices by partitioning the input \ntensors along the sequence dimension. CP reduces the memory footprint and computational cost of processing long sequences.\nUnlike Sequence Parallelism (SP) that partitions the activations of specific layers, CP divides the activations of all layers.\n\nThe implementation of Context Parallelism in NxD leverages `Ring Attention <https://arxiv.org/abs/2310.01889>`_. Ring Attention\nenables efficient communication between devices by organizing them in a ring topology, allowing tokens to attend to each other \nacross devices without needing full attention computation on each device. This reduces memory overhead while extending the \nfeasible context length beyond traditional transformer models.\n\nFor more details, refer to Context Parallelism in Megatron <https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html>_\n\n.. image:: /libraries/neuronx-distributed/images/cp.png\n   :alt: Image: image.png\n\nFig: Context Parallelism in NxD (Figure adapted from `Megatron \nCP <https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html>`_).\nIn NxD's TP implementation, we make use of All-Gather (AG), Reduce-Scatter (RS) collectives. Further\nCP is applied to all layers including LayerNorm (LN), Linear (LIN) and Fully-Connected (FC) layers.\nThe figure shows a transformer layer running with TP2 and CP2. Assuming sequence length is 8K, each device processes 4K tokens. \nDevice0 and Device2 form a CP group and exchange KV with each other; similarly, Device1 and Device3 form a CP group and exchange KV with each other. \nThe collective communication to exchange KV is handled by NxD using approaches described in the \n`Ring Attention <https://arxiv.org/abs/2310.01889>`_ paper.\n   \n\n"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide-inference.rst",
    "content": ".. _neuronx_distributed_developer_guide_inference:\n\nInference Developer Guide\n==========================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /libraries/neuronx-distributed/neuronx_distributed_inference_developer_guide\n\n.. include:: /libraries/neuronx-distributed/developer-guide-inference.txt"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide-inference.txt",
    "content": "* :ref:`neuronx_distributed_inference_developer_guide`"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide-training.rst",
    "content": ".. _neuronx_distributed_developer_guide_training:\n\nTraining Developer Guides\n==========================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n   \n    /libraries/neuronx-distributed/tp_developer_guide\n    /libraries/neuronx-distributed/pp_developer_guide\n    /libraries/neuronx-distributed/activation_memory_reduction_developer_guide\n    /libraries/neuronx-distributed/save_load_developer_guide\n    /libraries/neuronx-distributed/ptl_developer_guide\n    /libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide\n    /libraries/neuronx-distributed/lora_finetune_developer_guide\n\n.. include:: /libraries/neuronx-distributed/developer-guide-training.txt"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide-training.txt",
    "content": "* :ref:`tp_developer_guide`\n* :ref:`pp_developer_guide`\n* :ref:`activation_memory_reduction_developer_guide`\n* :ref:`save_load_developer_guide`\n* :ref:`ptl_developer_guide`\n* :ref:`model_optimizer_wrapper_developer_guide`"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide.rst",
    "content": ".. _neuronx_distributed_developer_guide:\n\nDeveloper Guide \n==========================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /libraries/neuronx-distributed/developer-guide-training.rst\n    /libraries/neuronx-distributed/developer-guide-inference.rst\n\n.. include:: /libraries/neuronx-distributed/developer-guide.txt\n\n\n\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/developer-guide.txt",
    "content": "* :ref:`neuronx_distributed_developer_guide_training`\n* :ref:`neuronx_distributed_developer_guide_inference`\n"
  },
  {
    "path": "libraries/neuronx-distributed/index-inference.rst",
    "content": ".. meta::\n    :description: Home page for the NxD Inference for Training (NxDI) library included with the Neuron SDK.\n    :date-modified: 12/02/2025\n\n.. _neuronx-distributed-inference-index:\n\n\nNxD Core for Inference\n=======================\n\nNeuronX Distributed Core (NxD Core) is a package for supporting different distributed\ninference mechanisms for Neuron devices. It provides XLA-friendly\nimplementations of some of the more popular distributed\ninference techniques. As the size of the model scales, fitting\nthese models on a single device becomes impossible and hence we have to\nmake use of model sharding techniques to partition the model across\nmultiple devices.\n\nAs part of this library, we enable support for Tensor\nParallelism sharding technique with other distributed library supported to be\nadded in future.\n\n.. _neuronx_distributed_inference_developer_guide:\n\nAbout NeuronX-Distributed (NxD) Inference\n------------------------------------------\n\nNeuronX Distributed (NxD Core) provides fundamental building blocks that enable you to run advanced inference workloads on AWS Inferentia and Trainium instances. These building blocks include parallel linear layers that enable distributed inference, a model builder that compiles PyTorch modules into Neuron models, and more.\n\nAs part of NxD Core, Neuron offers NxD Inference, which is a library that provides optimized model and module implementations that build on top of NxD Core. For more information about NxD Inference, see :ref:`nxdi-overview`.\n\nFor examples of how to build directly on NxD Core, see the following:\n\n* `Llama 3.2 1B inference sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`_\n* T5 3B inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Setup </libraries/neuronx-distributed/setup/index>\n    App Notes </libraries/neuronx-distributed/app_notes>\n    API Reference Guide </libraries/neuronx-distributed/api-reference-guide>\n    Developer Guide </libraries/neuronx-distributed/developer-guide-inference>\n    LoRA Guide </libraries/neuronx-distributed/lora_finetune_developer_guide>\n\n    Tutorials  </libraries/neuronx-distributed/tutorials/index>\n    Misc  </libraries/neuronx-distributed/neuronx-distributed-misc>\n\nNxD Core for Inference Documentation\n-------------------------------------\n\n.. dropdown::  Setup\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/setup/index.txt\n\n.. dropdown::  App Notes\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/app_notes.txt\n\n.. dropdown::  API Reference Guide\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/api-reference-guide.txt\n\n.. dropdown::  Developer Guide\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/developer-guide-inference.txt\n\n.. 
dropdown::  Tutorials\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/tutorials/neuronx_distributed_tutorials.txt\n\n\n.. dropdown::  Misc\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /libraries/neuronx-distributed/neuronx-distributed-misc.txt\n"
  },
  {
    "path": "libraries/neuronx-distributed/index-training.rst",
    "content": ".. meta::\n    :description: Home page for the NxD Core for Training (NxDT) library included with the Neuron SDK.\n    :date-modified: 12/02/2025\n\n.. _neuronx-distributed-training-index:\n.. _neuronx-distributed-index:\n\n\nNxD Core for Training\n=======================\n\nNeuronX Distributed Core (NxD Core) is a package for supporting different distributed training mechanisms for Neuron devices. It provides XLA-friendly implementations of some of the more popular distributed\ntraining techniques. As the size of the model scales, fitting these models on a single device becomes impossible and hence we have to make use of model sharding techniques to partition the model across multiple devices. \n\n\nAbout NeuronX-Distributed (NxD) for Training\n---------------------------------------------\n\nNeuronX Distributed (NxD Core) provides fundamental building blocks that enable you to run advanced inference workloads on AWS Inferentia and Trainium instances. These building blocks include parallel linear layers that enable distributed inference, a model builder that compiles PyTorch modules into Neuron models, and more.\n\nThe NeuronX Distributed Training (NxD Training) library is a collection of open-source tools and libraries designed to empower customers to train PyTorch models on AWS Trainium instances. It combines both ease-of-use and access to features built on top of\nNxD Core library. Except for a few Trainium-specific features, NxD Training is compatible with training platforms like NVIDIA's NeMo.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Setup </libraries/neuronx-distributed/setup/index>\n    App Notes </libraries/neuronx-distributed/app_notes>\n    API Reference Guide </libraries/neuronx-distributed/api-reference-guide>\n    Developer Guide  </libraries/neuronx-distributed/developer-guide-training>\n    Tutorials  </libraries/neuronx-distributed/tutorials/index>\n    Misc  </libraries/neuronx-distributed/neuronx-distributed-misc>\n\nNxD Core for Inference Documentation\n-------------------------------------\n\n.. dropdown::  Setup  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    \n    .. include:: /libraries/neuronx-distributed/setup/index.txt\n\n.. dropdown::  App Notes  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n   \n    .. include:: /libraries/neuronx-distributed/app_notes.txt\n\n.. dropdown::  API Reference Guide  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    \n    .. include:: /libraries/neuronx-distributed/api-reference-guide.txt\n\n.. dropdown::  Developer Guide  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    \n    .. include:: /libraries/neuronx-distributed/developer-guide-training.txt\n\n.. dropdown::  Tutorials  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    \n    .. include:: /libraries/neuronx-distributed/tutorials/neuronx_distributed_tutorials.txt\n\n\n.. dropdown::  Misc  \n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n    \n    .. include:: /libraries/neuronx-distributed/neuronx-distributed-misc.txt\n"
  },
  {
    "path": "libraries/neuronx-distributed/lora_finetune_developer_guide.rst",
    "content": "\n.. _lora_finetune_developer_guide:\n\nDeveloper guide for LoRA finetuning\n===================================\n\nThis document will introduce how to enable model finetuning with LoRA.\n\nFor a complete api guide, refer to :ref:`API <api_guide>`.\n\nEnable LoRA finetuning:\n'''''''''''''''''''''''\n\nWe first set up LoRA-related configurations:\n\n.. code:: ipython3\n\n    lora_config = nxd.modules.lora.LoraConfig(\n        enable_lora=True,\n        lora_rank=16,\n        lora_alpha=32,\n        lora_dropout=0.05,\n        bias=\"none\",\n        lora_verbose=True,\n        target_modules=[\"q_proj\", \"v_proj\", \"k_proj\"],\n        save_lora_base=False,\n        merge_lora=False,\n    )\n\n\nThe default target modules for different model architectures can be found in `model.py <https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/modules/lora/model.py>`_.\n\n\nWe then initialize NxD model with LoRA enabled:\n\n.. code:: ipython3\n\n    nxd_config = nxd.neuronx_distributed_config(\n        ...\n        lora_config=lora_config,\n    )\n    model = nxd.initialize_parallel_model(nxd_config, ...)\n\n\nSave LoRA checkpoint\n''''''''''''''''''''\n\nUsers can save the LoRA adapter with\n\n.. code:: ipython3\n\n    nxd.save_checkpoint(\n        checkpoint_dir_str=checkpoint_dir, # checkpoint path\n        tag=tag,     # sub-directory under checkpoint path\n        model=model\n    )\n\n\nBecause ``save_lora_base=False`` and ``merge_lora=False``, only the LoRA adapter is saved under ``checkpoint_dir/tag/``.\nWe can also set ``merge_lora=True`` to save the merged model, i.e., merging LoRA adapter into the base model.\n\n\nLoad LoRA checkpoint:\n''''''''''''''''''''''\n\nA sample usage:\n\n.. code:: ipython3\n\n    lora_config = LoraConfig(\n        enable_lora=True,\n        load_lora_from_ckpt=True,\n        lora_save_dir=checkpoint_dir,  # checkpoint path\n        lora_load_tag=tag,  # sub-directory under checkpoint path\n    )\n    nxd_config = nxd.neuronx_distributed_config(\n        ...\n        lora_config=lora_config,\n    )\n    model = nxd.initialize_parallel_model(nxd_config, ...)\n   \n   \nThe NxD model with be initialized with LoRA enabled and LoRA weights loaded. LoRA-related configurations are the same as the LoRA adapter checkpoint."
  },
  {
    "path": "libraries/neuronx-distributed/model_builder_v2_api_reference.rst",
    "content": ".. _nxd-core-model-builder-v2:\n\nModelBuilderV2 API Reference\n==============================================\n\nAPIs\n~~~~\n\n- `neuronx_distributed.trace.model_builder.trace`_\n- `neuronx_distributed.trace.model_builder.compile`_\n- `neuronx_distributed.shard_checkpoint`_\n- `neuronx_distributed.ModelBuilder`_\n- `neuronx_distributed.ModelBuilder.trace`_\n- `neuronx_distributed.ModelBuilder.compile`_\n- `neuronx_distributed.trace.nxd_model.base_nxd_model.StateInitializer`_\n- `neuronx_distributed.NxDModel`_\n- `neuronx_distributed.NxDModel.add`_\n- `neuronx_distributed.NxDModel.get_neff`_\n- `neuronx_distributed.NxDModel.get_metaneff`_\n- `neuronx_distributed.NxDModel.get_hlo`_\n- `neuronx_distributed.NxDModel.set_weights`_\n- `neuronx_distributed.NxDModel.to_neuron`_\n- `neuronx_distributed.NxDModel.replace_weights`_\n- `neuronx_distributed.NxDModel.read_from_neuron_buffer`_\n- `neuronx_distributed.NxDModel.write_to_neuron_buffer`_\n- `neuronx_distributed.NxDModel.forward`_\n- `neuronx_distributed.NxDModel.save`_\n- `neuronx_distributed.NxDModel.load`_\n\n`Usage Notes`_\n\n**Examples**\n\n`Usage Examples`_\n\n- `E2E with ModelBuilder APIs`_\n- `E2E with Fundamental Units`_\n\nneuronx_distributed.trace.model_builder.trace\n=============================================\n\n::\n\n   neuronx_distributed.trace.model_builder.trace(\n       model: Union[Callable, torch.nn.Module],\n       args: Union[None, torch.Tensor, Tuple[torch.Tensor, ...]] = None,\n       kwargs: Optional[Dict[str, torch.Tensor]] = None,\n       spmd: bool = True,\n       preserve_parameters: bool = True,\n   ) -> TraceArtifacts\n\nThe ``trace()`` function is a fundamental unit in the ModelBuilderV2\nframework that handles the tracing of PyTorch models for execution on\nNeuron devices. It processes example inputs as both positional and\nkeyword arguments, validates model parameters, and generates necessary\ntrace artifacts such as HLOs.\n\nParameters\n~~~~~~~~~~\n\n- **model: Union[Callable, torch.nn.Module]** — The PyTorch model or\n  callable function to be traced. Must have explicitly defined\n  parameters (no ``*args`` *or* ``**kwargs``). Must have at least one\n  parameter.\n- **args: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None** —\n  Example inputs as positional arguments. Can be None, a single tensor,\n  or a tuple of tensors. Must match the model’s positional parameter\n  requirements.\n- **kwargs: Optional[Dict[str, torch.Tensor]] = None** — Example inputs\n  as keyword arguments. Must be a dictionary mapping parameter names to\n  tensor values. Cannot override parameters provided in args.\n- **spmd: bool = True** — Whether to use SPMD (Single Program Multiple\n  Data) for tracing. 
Currently only True is supported\n- **preserve_parameters: bool = True** — Whether to preserve module\n  buffers across multi-bucket trace.\n\nReturns\n~~~~~~~\n\nReturns a ``TraceArtifacts`` object containing:\n\n::\n\n   neuronx_distributed.trace.model_builder_utils.TraceArtifacts(\n       hlo: Any,                                 # HLO representation\n       metaneff: Any,                            # Meta information for NEFF\n       flattener: Any,                           # Function to flatten inputs\n       packer: Any,                              # Function to pack outputs\n       weight_name_to_idx: Dict[str, int],       # Maps weight names to indices\n       weight_names_to_skip: Set,                # Weight names excluded from optimization\n       provided_args: List[ProvidedArgInfo],     # Information about provided arguments\n       model_params: List[ModelParamInfo],       # Information about model parameters\n   )\n\n``ProvidedArgInfo`` object contains:\n\n::\n\n   neuronx_distributed.trace.model_builder_utils.ProvidedArgInfo(\n        param_name: str,       # Name of the parameter this argument corresponds to\n        is_positional: bool,   # Whether this argument is positional (required) or keyword (optional)\n        tensor: torch.Tensor,  # The tensor value provided for this argument\n   )\n\n``ModelParamInfo`` object contains:\n\n::\n\n   neuronx_distributed.trace.model_builder_utils.ModelParamInfo(\n        param_name: str,      # Name of the parameter in the function signature\n        is_positional: bool,  # Whether this parameter is positional (required) or keyword (optional)\n   )\n\nneuronx_distributed.trace.model_builder.compile\n===============================================\n\n::\n\n   neuronx_distributed.trace.model_builder.compile(\n       hlo_module: hlo_pb2.HloModuleProto,\n       metaneff: Any,\n       compiler_workdir: Optional[Union[str, pathlib.Path]] = None,\n       compiler_args: Optional[str] = None,\n       key: Optional[str] = None\n   ) -> CompilationArtifacts\n\nThe ``compile()`` function is a fundamental unit in the ModelBuilderV2\nframework that compiles traced models using the Neuron Compiler, and\ngenerates Neuron Executable File Format (NEFF) files. It handles\ncompiler configurations, workdir management, and produces compilation\nartifacts.\n\n.. _parameters-1:\n\nParameters\n~~~~~~~~~~\n\n- **hlo_module: hlo_pb2.HloModuleProto** — The HLO module representing\n  the computational graph to be compiled. Generated from the ``trace()``\n  function.\n- **metaneff: Any** — Meta information for the Neuron Executable File\n  Format (NEFF)\n- **compiler_workdir: Optional[Union[str, pathlib.Path]] = None** —\n  Directory path to store compiler artifacts. If None, uses a default\n  path. Creates timestamped subdirectories (in UTC format) for each\n  compilation.\n- **compiler_args: Optional[str] = None** — Compiler flags for\n  neuronx-cc. If None, uses default compiler\n  flags. Can include optimization levels and other compiler options.\n- **key: Optional[str] = None** — Key to tag the bucket with a\n  meaningful name. If None, generates a hash from the HLO module. Used\n  for logging and artifact organization\n\n.. 
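_trace-compile-usage-example:\n\nA minimal sketch chaining the two fundamental units, ``trace()`` and ``compile()``,\ndescribed above. The module, input shape, and variable names are illustrative:\n\n::\n\n   import torch\n   from neuronx_distributed.trace.model_builder import trace, compile\n\n   class TinyModel(torch.nn.Module):\n       def __init__(self):\n           super().__init__()\n           self.linear = torch.nn.Linear(128, 128)\n\n       def forward(self, x):\n           return self.linear(x)\n\n   # Trace with an example input to produce the HLO and metaneff\n   trace_artifacts = trace(TinyModel(), torch.rand(1, 128))\n\n   # Compile the traced HLO into a NEFF\n   compilation_artifacts = compile(trace_artifacts.hlo, trace_artifacts.metaneff)\n   print(compilation_artifacts.neff_filepath)\n\n.. 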
_returns-1:\n\nReturns\n~~~~~~~\n\nReturns a ``CompilationArtifacts`` object containing:\n\n::\n\n   neuronx_distributed.trace.model_builder_utils.CompilationArtifacts(\n       neff_filepath: str    # Path to the compiled NEFF file\n   )\n\nDefault Compiler Flags\n~~~~~~~~~~~~~~~~~~~~~~\n\nIf no ``compiler_args`` are provided, the following defaults are used:\n\n::\n\n   --enable-saturate-infinity --auto-cast=none --model-type=transformer -O1\n\nDirectory Structure\n~~~~~~~~~~~~~~~~~~~\n\nThis creates the following directory structure:\n\n::\n\n   compiler_workdir/\n   └── {key}/\n       └── {timestamp}/\n           ├── model/\n           │   └── graph.hlo\n           ├── graph.neff\n           ├── metaneff.pb\n           └── command.txt\n           └── log-neuron-cc.txt\n\nneuronx_distributed.shard_checkpoint\n====================================\n\n::\n\n   neuronx_distributed.shard_checkpoint(\n       checkpoint: Dict[str, torch.Tensor],\n       model: torch.nn.Module,\n       start_rank: Optional[int] = None,\n       end_rank: Optional[int] = None,\n       load_on_device: bool = False,\n       serialize_path: Optional[str] = None\n   ) -> List[Dict[str, torch.Tensor]]\n\nThe ``shard_checkpoint()`` function shards a model checkpoint across\ntensor parallel ranks for distributed execution. It supports options for\nserialization (pre-shard) and direct loading onto Neuron devices\n(shard-on-load).\n\n.. _parameters-2:\n\nParameters\n~~~~~~~~~~\n\n- **checkpoint: Dict[str, torch.Tensor]** — The model checkpoint\n  dictionary. Maps parameter names to tensor values. Must contain all\n  model parameters.\n- **model: torch.nn.Module** — The PyTorch model to be sharded. Used for\n  determining sharding strategy.\n- **start_rank: Optional[int] = None** — Starting rank for sharding.\n  Must be in range [0, tp_degree). Defaults to 0 if None.\n- **end_rank: Optional[int] = None** — Ending rank for sharding. Must be\n  in range [start_rank, tp_degree). Defaults to ``(tp_degree - 1)`` if\n  None.\n- **load_on_device: bool = False** — Whether to load sharded tensors\n  onto Neuron devices. Requires running on supported Neuron instance.\n  Defaults to False.\n- **serialize_path: Optional[str] = None** — Path to save sharded\n  checkpoints. If provided, saves as safetensors files. Creates\n  directory if it doesn’t exist.\n\n.. _returns-2:\n\nReturns\n~~~~~~~\n\nReturns a ``List[Dict[str, torch.Tensor]]`` where:\n\n- Each dictionary represents a sharded checkpoint for a rank\n- Dictionary keys are parameter names\n- Dictionary values are sharded tensor values\n- List length is (end_rank - start_rank + 1)\n\nneuronx_distributed.ModelBuilder\n================================\n\n::\n\n   class ModelBuilder:\n       def __init__(\n           self,\n           model: Union[Callable, torch.nn.Module],\n       )\n\n``ModelBuilder`` is a high-level class that provides a fluent interface\nfor tracing and compiling PyTorch models for Neuron devices. It supports\nSPMD (Single Program Multiple Data) execution, and distributed model\nexecution.\n\nConstructor Parameters\n~~~~~~~~~~~~~~~~~~~~~~\n\n- **model: Union[Callable, torch.nn.Module]** — The PyTorch model to be\n  traced and compiled. Can be a model class or callable function. 
Must\n  have explicitly defined parameters (no ``*args`` *or* ``**kwargs``).\n  Must have at least one argument.\n\nneuronx_distributed.ModelBuilder.trace\n======================================\n\n::\n\n   neuronx_distributed.ModelBuilder.trace(\n       self,\n       args: Union[None, torch.Tensor, Tuple[torch.Tensor, ...]] = None,\n       kwargs: Optional[Dict[str, torch.Tensor]] = None,\n       tag: Optional[str] = None,\n       spmd: bool = True,\n   ) -> ModelBuilderV2\n\nTraces the model with given inputs and stores trace artifacts. Leverages\n`neuronx_distributed.trace.model_builder.trace`_\nfundamental unit.\n\n.. _parameters-3:\n\nParameters\n~~~~~~~~~~\n\n- **args: Union[None, torch.Tensor, Tuple[torch.Tensor, …]] = None** —\n  Example inputs as positional arguments. Can be None, a single tensor,\n  or a tuple of tensors. Must match the model’s positional parameter\n  requirements.\n- **kwargs: Optional[Dict[str, torch.Tensor]] = None** — Example inputs\n  as keyword arguments\n- **tag: Optional[str] = None** — Unique identifier for this trace.\n  Corresponding bucket will be tagged with this name. If None, generates\n  a hash from the HLO module.\n- **spmd: bool = True** — Whether to use SPMD (Single Program Multiple\n  Data) for tracing. Currently only True is supported\n\n.. _returns-3:\n\nReturns\n~~~~~~~\n\nSelf reference for method chaining.\n\nneuronx_distributed.ModelBuilder.compile\n========================================\n\n::\n\n   neuronx_distributed.ModelBuilder.compile(\n       self,\n       priority_model_key: Optional[str] = None,\n       compiler_workdir: Optional[Union[str, pathlib.Path]] = None,\n       compiler_args: Optional[Union[str, Dict[str, str]]] = None,\n       max_workers: Optional[int] = None,\n   ) -> NxDModel\n\nCompiles traced models using the Neuron compiler. Leverages\n`neuronx_distributed.trace.model_builder.compile`_\nfundamental unit.\n\n.. _parameters-4:\n\nParameters\n~~~~~~~~~~\n\n- **priority_model_key: Optional[str] = None** — Key of model to\n  prioritize for WLO\n- **compiler_workdir: Optional[Union[str, pathlib.Path]] = None** —\n  Directory for compiler artifacts\n- **compiler_args: Optional[Union[str, Dict[str, str]]] = None** —\n  Compiler flags as string or dictionary mapping tags to flags.\n- **max_workers: Optional[int] = None** — Maximum worker threads for\n  parallel compilation. If None, uses the default value from\n  ThreadPoolExecutor.\n\n.. _returns-4:\n\nReturns\n~~~~~~~\n\nA built and configured ``NxDModel`` instance.\n\nneuronx_distributed.trace.nxd_model.base_nxd_model.StateInitializer\n===================================================================\n\n::\n\n   class StateInitializer(torch.nn.Module):\n       def __init__(\n           self,\n           shapes: Dict[str, List[int]],\n           dtypes: Dict[str, torch.dtype],\n           local_ranks_size: int\n       ):\n\nA TorchScript-compatible module to initialize state buffers onto Neuron.\n\n.. _constructor-parameters-1:\n\nConstructor Parameters\n~~~~~~~~~~~~~~~~~~~~~~\n\n- **shapes: Dict[str, List[int]]** — Dict of shape lists associated with\n  a specific stateful tensor by key\n- **dtypes: Dict[str, torch.dtype]** — Dict of torch dtypes associated\n  with a specific stateful tensor by key\n- **local_ranks_size: int** — integer representing the number of ranks\n  per instance in a distributed setting. 
Unless it’s a Multi Instance\n  Data Parallel setup, it is usually just equal to the ``world_size``\n  your model was compiled for.\n\nneuronx_distributed.NxDModel\n============================\n\n::\n\n   class NxDModel(torch.nn.Module, BaseNxDModel):\n       def __init__(\n           self,\n           world_size: int,\n           start_rank: Optional[int] = None,\n           local_ranks_size: Optional[int] = None,\n           state_initializer: Optional[StateInitializer] = None,\n           layout_transformer: Optional[LayoutTransformerArtifacts] = None\n       )\n\nAn executor class to run models compiled by either the ``ModelBuilder``\nor ``trace()``, ``compile()`` fundamental units.\n\n.. _constructor-parameters-2:\n\nConstructor Parameters\n~~~~~~~~~~~~~~~~~~~~~~\n\n- **world_size: int —** Total number of ranks/processes in the\n  distributed setup.\n- **start_rank: Optional[int], default=None —** Starting rank for this\n  instance. If None, defaults to 0.\n- **local_ranks_size: Optional[int], default=None —** Number of local\n  ranks. Must be specified if start_rank is provided.\n- **state_initializer: Optional[StateInitializer], default=None —**\n  Initializer for model states. If not provided, stateful model tensors\n  will be initialized with zeros.\n\nneuronx_distributed.NxDModel.add\n================================\n\n::\n\n   @torch.jit.unused\n   def add(\n       self,\n       key: str,\n       trace_artifacts: TraceArtifacts,\n       compilation_artifacts: Union[CompilationArtifacts, WLOArtifacts],\n   ) -> \"NxDModel\"\n\nAdd a compiled submodel to this ``NxDModel`` instance.\n\n**Notes:**\n\n- Creates a ``StateInitializer`` if state tensors are present in the\n  metaneff, and none was provided in the ``NxDModel`` constructor\n- Sets up ``SPMDModel`` instances and input/output processing components\n\n.. _parameters-5:\n\nParameters\n~~~~~~~~~~\n\n- **key: str —** Unique identifier for this submodel within the\n  ``NxDModel``\n- **trace_artifacts: TraceArtifacts —** Artifacts produced from the\n  ``trace()`` function\n- **compilation_artifacts:** CompilationArtifacts — Artifacts produced\n  from the ``compile()`` or ``compile_wlo()`` functions\n\n.. _returns-5:\n\nReturns\n~~~~~~~\n\n``NxDModel`` self reference, enabling builder-style method chaining.\n\nneuronx_distributed.NxDModel.get_neff\n=====================================\n\n::\n\n   @torch.jit.unused\n   def get_neff(self, key: str) -> bytes\n\nRetrieves the NEFF (Neuron Executable File Format) from the specified\nmodel. Requires the associated model to already be added using the\n``add()`` method.\n\n.. _parameters-6:\n\nParameters\n~~~~~~~~~~\n\n- **key: str —** The identifier for the model whose NEFF should be\n  retrieved.\n\n.. _returns-6:\n\nReturns\n~~~~~~~\n\n``bytes`` — The NEFF for the specified model\n\n.. _raises-6:\n\nRaises\n~~~~~~\n\n- ``KeyError``: If the specified key is not found in the available keys.\n- ``RuntimeError``: If there is an error retrieving the NEFF.\n\nneuronx_distributed.NxDModel.get_metaneff\n=========================================\n\n::\n\n   @torch.jit.unused\n   def get_metaneff(self, key: str) -> metaneff_pb2.MetaNeff\n\nRetrieves the metaneff from the specified model. Requires the associated\nmodel to already be added using the ``add()`` method.\n\n.. _parameters-7:\n\nParameters\n~~~~~~~~~~\n\n- **key: str** — The identifier for the model whose metaneff should be\n  retrieved.\n\n.. 
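_nxd-model-assembly-example:\n\nA minimal sketch, assuming the ``trace_artifacts`` and ``compilation_artifacts``\nfrom the earlier sketch, that adds a submodel to an ``NxDModel`` and retrieves its\ncompiled artifacts by key (the key name and ``world_size`` are illustrative):\n\n::\n\n   from neuronx_distributed import NxDModel\n\n   # world_size should match the degree of parallelism the model was traced for\n   nxd_model = NxDModel(world_size=1)\n   nxd_model.add(\"tiny_model\", trace_artifacts, compilation_artifacts)\n\n   neff_bytes = nxd_model.get_neff(\"tiny_model\")\n   metaneff = nxd_model.get_metaneff(\"tiny_model\")\n\n.. 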
_returns-7:\n\nReturns\n~~~~~~~\n\n``metaneff_pb2.MetaNeff`` — The metaneff proto object for the specified\nmodel.\n\n.. _raises-7:\n\nRaises\n~~~~~~~\n\n- ``KeyError``: If the specified key is not found in the available keys. \n- ``RuntimeError``: If there is an error retrieving the metaneff.\n\nneuronx_distributed.NxDModel.get_hlo\n====================================\n\n::\n\n   @torch.jit.unused\n   def get_hlo(self, key: str) -> hlo_pb2.HloModuleProto\n\nRetrieves the HLO from the specified model. Requires the associated\nmodel to already be added using the ``add()`` method.\n\n.. _parameters-8:\n\nParameters\n~~~~~~~~~~\n\n- **key: str** — The identifier for the model whose HLO should be\n  retrieved.\n\n.. _returns-8:\n\nReturns\n~~~~~~~\n\n``hlo_pb2.HloModuleProto`` — The HLO module proto object for the\nspecified model.\n\n.. _raises-8:\n\nRaises\n~~~~~~\n\n- ``KeyError``: If the specified key is not found in the available keys.\n- ``RuntimeError``: If there is an error retrieving the metaneff. \n\n\nneuronx_distributed.NxDModel.set_weights\n========================================\n\n::\n\n   @torch.jit.export\n   def set_weights(\n       self,\n       sharded_checkpoint: List[Dict[str, torch.Tensor]]\n   )\n\nSet the model’s weights from a sharded checkpoint.\n\nThis function initializes the model’s weights using a sharded\ncheckpoint. The checkpoint is processed and loaded using either a layout\ntransformer (if provided) or a direct parallel loading mechanism.\n\nThis function should only be called before the model is loaded onto a\nNeuron device. Once the model is loaded, use the\n``replace_weights()`` method to update the weights.\n\n.. _parameters-9:\n\nParameters\n~~~~~~~~~~\n\n- **sharded_checkpoint: List[Dict[str, torch.Tensor]]** — \\***\\* A list\n  of state dicts mapping parameter names to their corresponding tensor\n  values for each rank.\n\n.. _returns-9:\n\nReturns\n~~~~~~~\n\n``None``\n\n.. _raises-9:\n\nRaises\n~~~~~~\n\n``ValueError``: If the model is already loaded on a Neuron device.\n\nneuronx_distributed.NxDModel.to_neuron\n======================================\n\n::\n\n   @torch.jit.export\n   def to_neuron(self)\n\nLoads the model onto Neuron Devices.\n\nThis function initializes the model onto Neuron Hardware. Must be called\nbefore executing the model, otherwise the forward method will raise a\n``RuntimeError``.\n\n.. _returns-10:\n\nReturns\n~~~~~~~\n\n``None``\n\nneuronx_distributed.NxDModel.replace_weights\n============================================\n\n::\n\n   @torch.jit.export\n   def replace_weights(\n       self,\n       sharded_checkpoint: List[Dict[str, torch.Tensor]]\n   )\n\nReplace the model’s weights and reload onto Neuron devices.\n\nThis method should be used instead of ``set_weights()`` when the model\nis already loaded on Neuron devices and weights need to be updated.\n\n.. _parameters-10:\n\nParameters\n~~~~~~~~~~\n\n- **sharded_checkpoint: List[Dict[str, torch.Tensor]]** — \\***\\* A list\n  of state dicts mapping parameter names to their corresponding tensor\n  values for each rank.\n\n.. _returns-11:\n\nReturns\n~~~~~~~\n\n``None``\n\nneuronx_distributed.NxDModel.read_from_neuron_buffer\n====================================================\n\n::\n\n   @torch.jit.export\n   def read_from_neuron_buffer(\n       self,\n       buffer_key: str,\n       rank: int\n   ) -> torch.Tensor\n\nReads a tensor value from a Neuron device buffer to CPU, based on given\nkey and rank.\n\n.. 
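_neuron-buffer-read-example:\n\nA minimal sketch, assuming the model has already been loaded with ``to_neuron()``\nand has a state buffer registered under the key ``\"kv_cache\"`` (the buffer name is\nillustrative):\n\n::\n\n   # Copy the rank-0 copy of the state buffer back to host memory for inspection\n   cache_on_cpu = nxd_model.read_from_neuron_buffer(\"kv_cache\", rank=0)\n   print(cache_on_cpu.shape)\n\n.. 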
_parameters-11:\n\nParameters\n~~~~~~~~~~\n\n- **buffer_key: str** — The key identifying the specific buffer\n  to retrieve.\n- **rank: int** — The rank from which to retrieve the buffer.\n\n.. _returns-12:\n\nReturns\n~~~~~~~\n\n``torch.Tensor``: The requested tensor buffer copied to Host memory.\n\n.. _raises-12:\n\nRaises\n~~~~~~\n\n- ``AssertionError``: If this method is called before to_neuron()\n- ``KeyError``: If the specified state_buffer_key does not exist in the states for the given rank.\n\nneuronx_distributed.NxDModel.write_to_neuron_buffer\n===================================================\n\n::\n\n   @torch.jit.export\n   def write_to_neuron_buffer(\n       self,\n       tensor: torch.Tensor,\n       buffer_key: str,rank: int\n   )\n\nWrite a tensor to a specific Neuron device buffer.\n\nThis function updates a state buffer on a Neuron device by copying\nvalues from the provided tensor. The destination buffer must already\nexist and have the same shape as the input tensor.\n\n.. _parameters-12:\n\nParameters\n~~~~~~~~~~\n\n- **tensor: torch.Tensor** — The tensor containing the data to be\n  written to the buffer.\n- **buffer_key: str** — The key identifying the specific buffer\n  to update.\n- **rank: int** — The rank where the buffer is located.\n\n.. _returns-13:\n\nReturns\n~~~~~~~\n\n``None``\n\n.. _raises-13:\n\nRaises\n~~~~~~~\n\n- ``AssertionError``: If this method is called before ``to_neuron()``.\n- ``KeyError``: If the specified ``state_buffer_key`` does not exist in the states for the given rank, or if the shapes of the input tensor and target buffer do not match.\n\nneuronx_distributed.NxDModel.forward\n====================================\n\n::\n\n   def forward(\n       self,\n       *args,\n       model_name: Optional[str] = None,\n       forward_mode='default',\n       **kwargs\n   ):\n\nThe forward method of the NxDModel class, which will take in inputs and\nrun the respective NEFF.\n\n.. _parameters-13:\n\nParameters\n~~~~~~~~~~\n\n- **args: Union[torch.Tensor, List[torch.Tensor]]** — Positional\n  tensor inputs to model. List form must be used if\n  ``forward_mode != 'default'``.\n- **model_name: Optional[str]** — Parameter to pass in a specific\n  key to execute. This must be used in cases of ambiguous routing.\n- **forward_mode: str, default=‘default’** — There are 3\n  supported modes: default, ranked, async.\n\n  - **default**: This takes in inputs, replicates them across ranks,\n    executes the model, and only returns the outputs from rank 0\n  - **ranked:** This takes in inputs in ranked form, meaning each\n    individual tensor input (ie each ``arg`` in ``*args``) must be a list\n    of tensors whose length is equal to the world size of the compiled\n    model. The model will execute, and return a ranked output, which is\n    a ``List`` of all outputs by rank (ie a\n    ``List[List[torch.Tensor]]``.\n  - **async:** Like ranked, this takes in inputs and returns outputs in\n    ranked form, except the major difference is that the outputs will be\n    returned instantly, and will be references to buffers where the\n    model will write the output once the NEFF is done executing. To\n    block on the NEFF call, you must call ``.cpu()`` for each tensor in\n    the output.\n\n- ****kwargs (torch.Tensor, List[torch.Tensor])** — Keyword arguments\n  corresponding to specific input tensors to the model. List form must\n  be used if ``forward_mode != 'default'``.\n\n.. 
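_nxd-model-forward-example:\n\nA minimal sketch of the two most common ``forward_mode`` settings, reusing the\n``nxd_model`` and input shape from the sketches above:\n\n::\n\n   x = torch.rand(1, 128)\n\n   # Default mode: the input is replicated across ranks and only rank 0's output is returned\n   out = nxd_model(x)\n\n   # Ranked mode: pass one tensor per rank (list length equals the compiled world size)\n   ranked_out = nxd_model([x], forward_mode=\"ranked\")\n\n.. 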
\nneuronx_distributed.NxDModel.forward\n====================================\n\n::\n\n   def forward(\n       self,\n       *args,\n       model_name: Optional[str] = None,\n       forward_mode='default',\n       **kwargs\n   ):\n\nThe forward method of the NxDModel class, which takes in inputs and\nruns the respective NEFF.\n\n.. _parameters-13:\n\nParameters\n~~~~~~~~~~\n\n- **args: Union[torch.Tensor, List[torch.Tensor]]** — Positional\n  tensor inputs to the model. List form must be used if\n  ``forward_mode != 'default'``.\n- **model_name: Optional[str]** — Parameter to pass in a specific\n  key to execute. This must be used in cases of ambiguous routing.\n- **forward_mode: str, default='default'** — There are three\n  supported modes: default, ranked, and async.\n\n  - **default**: This takes in inputs, replicates them across ranks,\n    executes the model, and only returns the outputs from rank 0.\n  - **ranked:** This takes in inputs in ranked form, meaning each\n    individual tensor input (i.e., each ``arg`` in ``*args``) must be a list\n    of tensors whose length is equal to the world size of the compiled\n    model. The model will execute and return a ranked output, which is\n    a ``List`` of all outputs by rank (i.e., a\n    ``List[List[torch.Tensor]]``).\n  - **async:** Like ranked, this takes in inputs and returns outputs in\n    ranked form. The major difference is that the outputs are returned\n    instantly and are references to buffers where the model will write\n    the output once the NEFF is done executing. To block on the NEFF\n    call, you must call ``.cpu()`` for each tensor in the output.\n\n- **\\*\\*kwargs (torch.Tensor, List[torch.Tensor])** — Keyword arguments\n  corresponding to specific input tensors to the model. List form must\n  be used if ``forward_mode != 'default'``.\n\n.. _returns-14:\n\nReturns\n~~~~~~~\n\nIt depends on the ``forward_mode`` setting:\n\n- **default:** Expected format of tensor outputs based on what was originally traced.\n- **ranked or async:** ``List[List[torch.Tensor]]`` of shape (num_out_tensors, world_size).\n\nneuronx_distributed.NxDModel.save\n=================================\n\n::\n\n   def save(self, path_to_save: str, save_weights: bool = False)\n\nSaves the model as a TorchScript module to the specified path. The saved\nartifact can be loaded with ``NxDModel.load`` or ``torch.jit.load``\n(``NxDModel.load`` is preferable).\n\n.. _parameters-14:\n\nParameters\n~~~~~~~~~~\n\n- **path_to_save: str** — The file path where the TorchScript\n  model should be saved.\n- **save_weights: Optional[bool], default=False** — If ``True``,\n  preserves the weights within the TorchScript model. It is ``False`` by\n  default.\n\n.. _returns-15:\n\nReturns\n~~~~~~~\n\n``None``\n\nneuronx_distributed.NxDModel.load\n=================================\n\n::\n\n   @classmethod\n   def load(\n       cls,\n       path_to_model: str,\n       start_rank: Optional[int] = None,\n       local_ranks_size: Optional[int] = None\n   ) -> Union[\"NxDModel\", torch.jit.ScriptModule]\n\nAttempts to load and restore an ``NxDModel`` from a saved TorchScript\nmodel.\n\nThis classmethod tries to reconstruct an NxDModel instance from a\npreviously saved TorchScript model. If the restoration process fails, it\nreturns the loaded TorchScript model instead, as backwards compatibility\nis not guaranteed across different versions of NxD.\n\n.. _parameters-15:\n\nParameters\n~~~~~~~~~~\n\n- **path_to_model: str** — Path to the saved TorchScript model\n  file.\n- **start_rank: Optional[int], default=None** — Starting rank for\n  distributed processing. If ``None`` while ``local_ranks_size`` is set,\n  an ``AssertionError`` will be raised. Defaults to ``None``.\n- **local_ranks_size: Optional[int], default=None** — Size of\n  local ranks for distributed processing. Must be set if ``start_rank``\n  is provided. Defaults to ``None``.\n\n.. _returns-16:\n\nReturns\n~~~~~~~\n\n``Union[NxDModel, torch.jit.ScriptModule]``: Either the restored\n``NxDModel`` instance, or the loaded TorchScript model if restoration\nfails.\n\n.. _raises-16:\n\nRaises\n~~~~~~\n\n- ``ValueError``: If the provided model was not originally saved using ``NxDModel.save()``.\n- ``AssertionError``: If the ``start_rank``/``local_ranks_size`` parameters are inconsistently set.\n
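\nAs an illustration, a minimal save/load round trip could look like the sketch below, assuming a compiled ``nxd_model`` such as the ones built in the Usage Examples section. The file name is only a placeholder, and the sketch assumes restoration succeeds and that the restored model is loaded onto Neuron devices again before execution:\n\n::\n\n   # persist the compiled model together with its weights\n   nxd_model.save(\"traced_model.pt\", save_weights=True)\n\n   # later: restore it; a plain TorchScript module is returned if the\n   # NxDModel instance cannot be reconstructed\n   restored = NxDModel.load(\"traced_model.pt\")\n   restored.to_neuron()  # load onto Neuron devices before executing\n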
\nUsage Notes\n===========\n\nIn-place buffer updates\n~~~~~~~~~~~~~~~~~~~~~~~\n\nDescription\n~~~~~~~~~~~\n\nModelBuilderV2 enables users to update model buffers in-place during\ntheir model’s ``forward`` pass. In-place updates enable users to\nefficiently utilize memory when caching values during the ``forward``\npass. An example use case for in-place updates is the population of a\nmodel’s KV Cache.\n\nUnder the hood, ModelBuilderV2 detects when buffers are mutated during\n``forward`` while tracing a model, and uses `XLA’s\naliasing <https://openxla.org/xla/aliasing>`__ to ensure that buffers\nare mutated in-place.\n\nSupported Usage\n~~~~~~~~~~~~~~~\n\nIn-place updates are currently supported for the following combinations\nof ``torch.Tensor`` subclasses and torch operations:\n\n+-----------------------+-----------------------+-----------------------+\n| Tensor class          | Out of place torch    | In place torch        |\n|                       | operation             | operation             |\n+=======================+=======================+=======================+\n| torch.nn.Buffer,      | Supported             | Not Supported         |\n| persistent=True       |                       |                       |\n+-----------------------+-----------------------+-----------------------+\n| torch.nn.Buffer,      | Supported             | Not Supported         |\n| persistent=False      |                       |                       |\n+-----------------------+-----------------------+-----------------------+\n| torch.nn.Parameter    | Not Supported         | Not Supported         |\n+-----------------------+-----------------------+-----------------------+\n\nAdditionally, the following forms of updates are not supported, because\nthese mutations change the memory utilization or memory layout of the\nmutated tensor:\n\n- Updating the ``dtype`` of a buffer or parameter during ``forward``.\n- Updating the ``shape`` of a buffer or parameter during ``forward``.\n\n.. _supported-usage-1:\n\nExample\n~~~~~~~\n\n::\n\n   import torch\n   import torch.nn as nn\n\n   class ExampleModel(nn.Module):\n       def __init__(self):\n           super().__init__()\n\n           self.register_buffer(\"buffer_persistent\", torch.zeros(10, dtype=torch.bfloat16), persistent=True)\n           self.register_buffer(\"buffer_nonpersistent\", torch.zeros(10, dtype=torch.bfloat16), persistent=False)\n           self.parameter = nn.Parameter(torch.zeros(10, dtype=torch.bfloat16))\n\n       def forward(self, x, dim, index, src):\n           # supported: buffers with out of place torch operations\n           self.buffer_persistent = self.buffer_persistent + 1\n           self.buffer_nonpersistent = torch.scatter(self.buffer_nonpersistent, dim, index, src)\n\n           # not supported: buffers with in-place torch operations\n           self.buffer_persistent.scatter_(dim, index, src)\n           self.buffer_nonpersistent.index_copy_(dim, index, src)\n\n           # not supported: parameters\n           self.parameter = torch.scatter(self.parameter, dim, index, src)\n           self.parameter.scatter_(dim, index, src)\n\n           # not supported: dtype updates\n           self.buffer_persistent = self.buffer_persistent.to(torch.float32)\n\n           # not supported: shape changes\n           self.buffer_persistent = torch.reshape(self.buffer_persistent, (2, 5))\n\nUsage Examples\n==============\n\nE2E with ModelBuilder APIs\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nExample: Build and run callable with ModelBuilder\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   import torch.nn as nn\n   from neuronx_distributed import ModelBuilder\n\n   torch.manual_seed(0)\n\n   def func(a, 
b):\n       return a + b\n\n   nxd_model = ModelBuilder(func) \\\n       .trace(kwargs={'a': torch.rand(2,2), 'b': torch.rand(2,2)}, tag=\"key1\") \\\n       .compile()\n\n   nxd_model.to_neuron()\n   input = (torch.rand(2, 2), torch.rand(2, 2))\n   cpu_out = func(a=input[0], b=input[1])\n   neuron_out = nxd_model(a=input[0], b=input[1])\n\n   torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with ModelBuilder\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   import torch.nn as nn\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear\n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.layer1 = ColumnParallelLinear(1024, 1024, gather_output=False)\n               self.layer2 = RowParallelLinear(1024, 1024, input_is_parallel=True)\n           else:\n               self.layer1 = nn.Linear(1024, 1024)\n               self.layer2 = nn.Linear(1024, 1024)\n       def forward(self, x):\n           x = self.layer1(x)\n           return self.layer2(x)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32):\n       model = Model()\n\n       example_inputs = torch.rand(32, 1024)\n\n       nxd_model = ModelBuilder(model) \\\n           .trace(args=example_inputs, tag=\"key1\") \\\n           .compile()\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input = torch.ones(32, 1024)\n   cpu_out = cpu_model(input)\n   neuron_out = nxd_model(x=input)\n\nExample: Multi-bucket trace with ModelBuilder\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   import torch.nn as nn\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear\n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.layer1 = ColumnParallelLinear(1024, 1024, gather_output=True)\n               self.layer2 = ColumnParallelLinear(1024, 1024, gather_output=True)\n           else:\n               self.layer1 = nn.Linear(1024, 1024)\n               self.layer2 = nn.Linear(1024, 1024)\n       def forward(self, x):\n           x = self.layer1(x)\n           return self.layer2(x)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32):\n       model = Model()\n\n       example_inputs1 = torch.rand(32, 1024)\n       example_inputs2 = torch.rand(16, 1024)\n       \n       nxd_model = ModelBuilder(model) \\\n           .trace(args=example_inputs1, tag=\"bucket1\") \\\n           .trace(args=example_inputs2, tag=\"bucket2\") \\\n           .compile()\n\n\n   
with NxDParallelState(world_size=32, tensor_model_parallel_size=32), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input1 = torch.rand(32, 1024)\n   input2 = torch.rand(16, 1024)\n\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input)\n       neuron_out = nxd_model(input)\n       torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with ModelBuilder where example inputs are supplied as kwargs\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   import torch.nn as nn\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder\n   from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear\n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.layer1 = ColumnParallelLinear(5, 10, gather_output=True)\n               self.layer2 = ColumnParallelLinear(20, 10, gather_output=True)\n           else:\n               self.layer1 = nn.Linear(5, 10)\n               self.layer2 = nn.Linear(20, 10)\n\n       def forward(self, x, y):\n           return self.layer1(x) + self.layer2(y)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=2, tensor_model_parallel_size=2):\n       model = Model()\n\n       example_inputs1 = {'x': torch.rand(10, 5), 'y': torch.rand(10, 20)}\n       example_inputs2 = {'x': torch.rand(50, 5), 'y': torch.rand(50, 20)}\n       \n       nxd_model = ModelBuilder(model) \\\n           .trace(kwargs=example_inputs1, tag=\"bucket1\") \\\n           .trace(kwargs=example_inputs2, tag=\"bucket2\") \\\n           .compile()\n\n\n   with NxDParallelState(world_size=2, tensor_model_parallel_size=2), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input1 = (torch.rand(10, 5), torch.rand(10, 20))\n   input2 =  (torch.rand(50, 5), torch.rand(50, 20))\n\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input[0], input[1])\n       neuron_out = nxd_model(x=input[0], y=input[1])\n       torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with in-place buffer updates\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   from neuronx_distributed import ModelBuilder\n\n   torch.manual_seed(0)\n\n   class Model(torch.nn.Module):\n       def __init__(self):\n           super().__init__()\n           self.register_buffer('cache', torch.tensor([0], dtype=torch.float32), persistent=True)\n\n       def forward(self, x, update_value):\n           self.cache = torch.add(self.cache, update_value)\n           return x + self.cache\n\n   cpu_model = Model()\n\n   model = Model()\n\n   example_inputs1 = {'x': torch.zeros(1, dtype=torch.float32), 'update_value': torch.zeros(1, dtype=torch.float32)}\n\n   nxd_model = ModelBuilder(model) \\\n       .trace(kwargs=example_inputs1, tag=\"bucket1\") \\\n  
     .compile()\n\n   state_dict = [\n       {\n           \"cache\": torch.tensor([0])\n       }\n   ]\n   nxd_model.set_weights(state_dict)\n   nxd_model.to_neuron()\n\n   input1 = (torch.tensor([1], dtype=torch.float32), torch.tensor([5], dtype=torch.float32))\n   input2 =  (torch.tensor([2], dtype=torch.float32), torch.tensor([10], dtype=torch.float32))\n\n   model_iteration = 0\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input[0], input[1])\n       neuron_out = nxd_model(x=input[0], update_value=input[1])\n       \n       torch.testing.assert_close(cpu_out, neuron_out)\n       model_iteration += 1\n       print(f\"Iteration {model_iteration} matches!\")\n\nE2E with Fundamental Units\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nExample: Build and run Callable with Fundamental Units\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n   from neuronx_distributed import NxDModel\n   from neuronx_distributed.trace.model_builder import trace, compile\n\n   torch.manual_seed(0)\n\n   def func(a,b):\n       return a + b\n\n   trace_artifacts = trace(func, kwargs={'a': torch.rand(2,2), 'b': torch.rand(2,2)})\n   compilation_artifacts = compile(trace_artifacts.hlo, trace_artifacts.metaneff)\n\n   nxd_model = NxDModel(world_size=1)\n   nxd_model.add('func', trace_artifacts, compilation_artifacts)\n   nxd_model.to_neuron()\n\n   cpu_out = func(torch.ones(2, 2), torch.ones(2, 2))\n   neuron_out = nxd_model(torch.ones(2,2), torch.ones(2,2))\n   torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with Fundamental Units\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import os\n   import shutil\n   import torch\n   import torch.nn as nn\n\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder, NxDModel\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear\n   from neuronx_distributed.trace.model_builder_utils import ModelBuilderConstants\n   from neuronx_distributed.trace.model_builder import (\n       trace,\n       compile,\n   ) \n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.layer1 = ColumnParallelLinear(1024, 1024, gather_output=False)\n               self.layer2 = RowParallelLinear(1024, 1024, input_is_parallel=True)\n           else:\n               self.layer1 = nn.Linear(1024, 1024)\n               self.layer2 = nn.Linear(1024, 1024)\n       def forward(self, x):\n           x = self.layer1(x)\n           return self.layer2(x)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32):\n       model = Model()\n\n       example_inputs = torch.rand(32, 1024)\n\n       trace_artifacts = {\n           \"bucket1\": trace(model, args=example_inputs),\n       }\n\n       compilation_artifacts_priority = compile(\n           hlo_module=trace_artifacts[\"bucket1\"].hlo,\n           metaneff=trace_artifacts[\"bucket1\"].metaneff,\n           key=\"bucket1\"\n       )\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model = 
NxDModel(world_size=32)\n   nxd_model.add(key=\"bucket1\", trace_artifacts=trace_artifacts[\"bucket1\"], compilation_artifacts=compilation_artifacts_priority)\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input = torch.rand(32, 1024)\n\n   cpu_out = cpu_model(input)\n   neuron_out = nxd_model(input)\n   torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Multi-bucket trace with Fundamental Units\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import os\n   import shutil\n   import torch\n   import torch.nn as nn\n\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder, NxDModel\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear\n   from neuronx_distributed.trace.model_builder_utils import ModelBuilderConstants\n   from neuronx_distributed.trace.model_builder import (\n       trace,\n       compile,\n   ) \n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.layer1 = ColumnParallelLinear(1024, 1024, gather_output=False)\n               self.layer2 = RowParallelLinear(1024, 1024, input_is_parallel=True)\n           else:\n               self.layer1 = nn.Linear(1024, 1024)\n               self.layer2 = nn.Linear(1024, 1024)\n       def forward(self, x):\n           x = self.layer1(x)\n           return self.layer2(x)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32):\n       model = Model()\n\n       example_inputs1 = torch.rand(32, 1024)\n       example_inputs2 = torch.rand(16, 1024)\n\n       trace_artifacts = {\n           \"bucket1\": trace(model, args=example_inputs1),\n           \"bucket2\": trace(model, args=example_inputs2),\n       }\n\n       compilation_artifacts_bucket1 = compile(\n           hlo_module=trace_artifacts[\"bucket1\"].hlo,\n           metaneff=trace_artifacts[\"bucket1\"].metaneff,\n           key=\"bucket1\"\n       )\n       compilation_artifacts_bucket2 = compile(\n           hlo_module=trace_artifacts[\"bucket2\"].hlo,\n           metaneff=trace_artifacts[\"bucket2\"].metaneff,\n           key=\"bucket2\"\n       )\n\n   with NxDParallelState(world_size=32, tensor_model_parallel_size=32), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model = NxDModel(world_size=32)\n   nxd_model.add(key=\"bucket1\", trace_artifacts=trace_artifacts[\"bucket1\"], compilation_artifacts=compilation_artifacts_bucket1)\n   nxd_model.add(key=\"bucket2\", trace_artifacts=trace_artifacts[\"bucket2\"], compilation_artifacts=compilation_artifacts_bucket2)\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input1 = torch.rand(32, 1024)\n   input2 = torch.rand(16, 1024)\n\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input)\n       neuron_out = nxd_model(input)\n       torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with Fundamental Units where example inputs are supplied as kwargs\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import os\n   import shutil\n   import torch\n 
  import torch.nn as nn\n\n   from neuronx_distributed.utils.model_utils import init_on_device\n   from neuronx_distributed import NxDParallelState, shard_checkpoint, ModelBuilder, NxDModel\n   from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear\n   from neuronx_distributed.trace.model_builder_utils import ModelBuilderConstants\n   from neuronx_distributed.trace.model_builder import (\n       trace,\n       compile,\n   ) \n\n   torch.manual_seed(0)\n\n   class Model(nn.Module):\n       def __init__(self, is_distributed=True):\n           super().__init__()\n           if is_distributed:\n               self.linear1 = ColumnParallelLinear(5, 10, gather_output=True)\n               self.linear2 = ColumnParallelLinear(20, 10, gather_output=True)\n           else:\n               self.linear1 = nn.Linear(5, 10)\n               self.linear2 = nn.Linear(20, 10)\n\n       def forward(self, x, y):\n           return self.linear1(x) + self.linear2(y)\n\n   cpu_model = Model(is_distributed=False)\n   model_checkpoint = cpu_model.state_dict()\n\n   with NxDParallelState(world_size=2, tensor_model_parallel_size=2):\n       model = Model()\n\n       example_inputs1 = {'x': torch.rand(10, 5), 'y': torch.rand(10, 20)}\n       example_inputs2 = {'x': torch.rand(50, 5), 'y': torch.rand(50, 20)}\n\n       trace_artifacts = {\n           \"bucket1\": trace(model, kwargs=example_inputs1),\n           \"bucket2\": trace(model, kwargs=example_inputs2),\n       }\n\n       compilation_artifacts_bucket1 = compile(\n           hlo_module=trace_artifacts[\"bucket1\"].hlo,\n           metaneff=trace_artifacts[\"bucket1\"].metaneff,\n           key=\"bucket1\"\n       )\n       compilation_artifacts_bucket2 = compile(\n           hlo_module=trace_artifacts[\"bucket2\"].hlo,\n           metaneff=trace_artifacts[\"bucket2\"].metaneff,\n           key=\"bucket2\"\n       )\n\n   with NxDParallelState(world_size=2, tensor_model_parallel_size=2), init_on_device(torch.device(\"meta\")):\n       sharded_checkpoint = shard_checkpoint(\n           checkpoint=model_checkpoint,\n           model=Model()\n       )\n\n   nxd_model = NxDModel(world_size=2)\n   nxd_model.add(key=\"bucket1\", trace_artifacts=trace_artifacts[\"bucket1\"], compilation_artifacts=compilation_artifacts_bucket1)\n   nxd_model.add(key=\"bucket2\", trace_artifacts=trace_artifacts[\"bucket2\"], compilation_artifacts=compilation_artifacts_bucket2)\n\n   nxd_model.set_weights(sharded_checkpoint)\n   nxd_model.to_neuron()\n\n   input1 = (torch.rand(10, 5), torch.rand(10, 20))\n   input2 =  (torch.rand(50, 5), torch.rand(50, 20))\n\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input[0], input[1])\n       neuron_out = nxd_model(x=input[0], y=input[1])\n       torch.testing.assert_close(cpu_out, neuron_out)\n\nExample: Build and run torch module with in-place buffer updates\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   import torch\n\n   from neuronx_distributed import NxDModel\n   from neuronx_distributed.trace.model_builder import (\n       trace,\n       compile,\n   ) \n\n   torch.manual_seed(0)\n\n   class Model(torch.nn.Module):\n       def __init__(self):\n           super().__init__()\n           self.register_buffer('cache', torch.tensor([0], dtype=torch.float32), persistent=True)\n\n       def forward(self, x, update_value):\n           self.cache = torch.add(self.cache, update_value)\n           return x + self.cache\n\n   cpu_model = Model()\n\n   model = Model()\n\n  
 example_inputs1 = {'x': torch.zeros(1, dtype=torch.float32), 'update_value': torch.zeros(1, dtype=torch.float32)}\n\n   trace_artifacts = {\n       \"bucket1\": trace(model, kwargs=example_inputs1),\n   }\n\n   compilation_artifacts_bucket1 = compile(\n       hlo_module=trace_artifacts[\"bucket1\"].hlo,\n       metaneff=trace_artifacts[\"bucket1\"].metaneff,\n       key=\"bucket1\"\n   )\n\n\n   nxd_model = NxDModel(world_size=1)\n   nxd_model.add(key=\"bucket1\", trace_artifacts=trace_artifacts[\"bucket1\"], compilation_artifacts=compilation_artifacts_bucket1)\n\n   state_dict = [\n       {\n           \"cache\": torch.tensor([0], dtype=torch.float32)\n       }\n   ]\n   nxd_model.set_weights(state_dict)\n   nxd_model.to_neuron()\n\n   input1 = (torch.tensor([1], dtype=torch.float32), torch.tensor([5], dtype=torch.float32))\n   input2 =  (torch.tensor([2], dtype=torch.float32), torch.tensor([10], dtype=torch.float32))\n\n   model_iteration = 0\n   for input in [input1, input2]:\n       cpu_out = cpu_model(input[0], input[1])\n       neuron_out = nxd_model(x=input[0], update_value=input[1])\n       \n       torch.testing.assert_close(cpu_out, neuron_out)\n       model_iteration += 1\n       print(f\"Iteration {model_iteration} matches!\")\n"
  },
  {
    "path": "libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide.rst",
    "content": ".. _model_optimizer_wrapper_developer_guide:\n\nDeveloper guide for model and optimizer wrapper \n==========================================================================\n\nModel and optimizer wrapper are useful tools to wrap original model and optimizer\nwhile keep the API unchanged. We recommend to always use model and optimizer wrappers,\nit's helpful to apply optimizations and hide the complexity from the optimizations.\nUsers need to care about the implementation details of the optimization, just use\nthe wrappers as you normally use ``torch.nn.Module`` and ``torch.optim.Optimizer``.\n\nFor a complete api guide, refer to :ref:`API GUIDE<api_guide>`.\n\nCreate training config:\n'''''''''''''''''''''''\n\nTo use model and optimizer wrapper, we need to create ``neuronx_distributed``\nconfig firstly.\n\nA sample config use tensor parallel, pipeline parallel, ZeRO-1 optimizer,\nsequence parallel and activation checkpointing:\n\n.. code:: ipython3\n\n   nxd_config = nxd.neuronx_distributed_config(\n       tensor_parallel_size=args.tensor_parallel_size,\n       pipeline_parallel_size=args.pipeline_parallel_size,\n       pipeline_config={\n           \"transformer_layer_cls\": LlamaDecoderLayer,\n           \"num_microbatches\": args.num_microbatches,\n           \"output_loss_value_spec\": (True, False),\n           \"input_names\": [\"input_ids\", \"attention_mask\", \"labels\"],\n           \"pipeline_cuts\": pipeline_cuts,\n           \"trace_file_path\": args.trace_file_path,\n           \"param_init_fn\": None,\n           \"leaf_module_cls\": [LlamaRMSNorm.__name__],\n           \"autowrap_modules\": [mappings],\n           \"use_zero1_optimizer\": args.use_zero1_optimizer > 0,\n           \"use_optimizer_wrapper\": True,\n       },\n       optimizer_config={\n           \"zero_one_enabled\": args.use_zero1_optimizer > 0,\n           \"grad_clipping\": True,\n           \"max_grad_norm\": 1.0,\n       },\n       sequence_parallel=args.use_sequence_parallel,\n       activation_checkpoint_config=CoreAttention if args.use_selective_checkpoint > 0 else \"full\",\n       model_init_config=model_init_config,\n   )\n\nUse model wrapper:\n''''''''''''''''''\n\nWhen we wrap a model with model wrapper, we need to implement a model getter\nfunction. The model getter function will be called to initialize model on CPU and\nthen model will be moved to XLA device serially. Then, let's pass ``nxd_config``,\nmodel getter function and its inputs to method ``initialize_parallel_model``:\n\n.. code:: ipython3\n\n   model = nxd.initialize_parallel_model(nxd_config, get_model, config)\n\nIf pipeline parallel is enabled, to run a training iteration, user must use\n``run_train``, it handles pipeline partitioned forward and backward in it:\n\n.. code:: ipython3\n\n   loss = model.run_train(*inputs)\n\nOtherwise, users can use either ``run_train`` or:\n\n.. code:: ipython3\n\n   loss = model(*inputs)\n   loss.backward()\n\nTo access the wrapped model:\n\n.. code:: ipython3\n\n   model.local_module()\n\nModel wrapper also has short cuts to access some common fields of hugging\nface transformers model;\n\n.. code:: ipython3\n\n   model.dtype  # get model's dtype\n   model.config  # get model's config\n   model.name_or_path  # get model's name or path\n\nUse optimizer wrapper:\n''''''''''''''''''''''\n\nWhen we wrap an optimizer with optimizer wrapper, we need ``nxd_config``,\noriginal optimizer class and its inputs (parameters and optimizer arguments):\n\n.. 
.. code:: ipython3\n\n   optimizer = nxd.initialize_parallel_optimizer(\n       nxd_config, torch.optim.AdamW, param_groups, lr=args.lr, betas=(args.beta1, args.beta2), weight_decay=args.weight_decay\n   )\n\nOne useful feature is that the user can access the gradient norm value from the wrapped\noptimizer directly:\n\n.. code:: ipython3\n\n   # It's an XLA tensor\n   optimizer.grad_norm\n\nNote that if the optimizer has not been executed or ``grad_clipping`` is disabled,\naccessing ``grad_norm`` will return ``None``.\n
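\nPutting the two wrappers together, a minimal sketch of a single training iteration (where ``inputs`` stands for whatever batch your dataloader produces) could look like the following:\n\n.. code:: ipython3\n\n   # forward + backward through the wrapped model\n   loss = model(*inputs)  # with pipeline parallelism, use: loss = model.run_train(*inputs)\n   loss.backward()        # not needed with run_train, which runs backward internally\n\n   # optimizer step through the wrapped optimizer\n   optimizer.step()\n   optimizer.zero_grad()\n"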
  },
  {
    "path": "libraries/neuronx-distributed/neuronx-distributed-misc.rst",
    "content": "Misc \n===============================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /release-notes/components/nxd-core\n\n.. include:: /libraries/neuronx-distributed/neuronx-distributed-misc.txt"
  },
  {
    "path": "libraries/neuronx-distributed/neuronx-distributed-misc.txt",
    "content": "* :ref:`nxd-core_rn`"
  },
  {
    "path": "libraries/neuronx-distributed/neuronx_distributed_inference_developer_guide.rst",
    "content": ".. _neuronx_distributed_inference_developer_guide:\n\nAbout NeuronX-Distributed (NxD) Inference\n=================================================\n\nNeuronX Distributed (NxD Core) provides fundamental building blocks that enable you to run\nadvanced inference workloads on AWS Inferentia and Trainium instances. These building\nblocks include parallel linear layers that enable distributed inference, a model builder\nthat compiles PyTorch modules into Neuron models, and more.\n\nNeuron also offers Neuronx-Distributed (NxD) Inference,\nwhich is a library that provides optimized model and module implementations that build on top\nof NxD Core. We recommend that you use NxD Inference to run inference workloads and onboard\ncustom models. For more information about NxD Inference, see :ref:`nxdi-overview`.\n\nFor examples of how to build directly on NxD Core, see the following:\n\n* `Llama 3.2 1B inference sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`_\n* T5 3B inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`"
  },
  {
    "path": "libraries/neuronx-distributed/pipeline_parallelism_overview.rst",
    "content": ".. _pipeline_parallelism_overview:\n\nPipeline Parallelism Overview \n===============================\n\nPipeline parallelism is a technique used in deep learning model training to improve efficiency \nand reduce the training time of large neural networks.\nCurrently NeuronxDistributed's pipeline parallelism is built specially for transformer based models,\nwhere each Neuron core will be assigned with a subset of transformer layers.\nPipelining is a technique to achieve true parallelization in pipeline parallelism, \nby having the Neuron cores compute simultaneously on different data samples, \nand to overcome the performance loss due to sequential computation. \nWhen you use pipeline parallelism, training job is executed in a pipelined \nfashion over microbatches to maximize device usage.\n\nModel partitioning\n---------------------\n\nIn NeuronxDistributed, we use `Pytorch's FX <https://pytorch.org/docs/stable/fx.html>`__ to trace the model and do partition on the FX IR.\nUser simply needs to specify where to cut the pipeline stages, and our algorithm will cut the\npipeline stages and assign the corresponding modules to each Neuron core automatically.\nCurrently we require user to provide model partition decision but auto-partition will be supported in the future.\nHere is an example of simple partition with 5 linear layers\n\n.. code:: ipython3\n\n   # original NN module\n   class MyModule(torch.nn.Module):\n      def __init__(self):\n         super().__init__()\n         self.linears = torch.nn.ModuleList([torch.nn.Linear(4, 4) for _ in range(5)])\n\n      def forward(self, x):\n         for lin in self.linears:\n               x = lin(x)\n         return x\n\n   m = MyModule()\n   gm = torch.fx.symbolic_trace(m)\n   print(gm)\n   \"\"\"\n   GraphModule(\n   (linears): Module(\n      (0): Linear(in_features=4, out_features=4, bias=True)\n      (1): Linear(in_features=4, out_features=4, bias=True)\n      (2): Linear(in_features=4, out_features=4, bias=True)\n      (3): Linear(in_features=4, out_features=4, bias=True)\n      (4): Linear(in_features=4, out_features=4, bias=True)\n   )\n   )\n\n   def forward(self, x):\n      linears_0 = getattr(self.linears, \"0\")(x);  x = None\n      linears_1 = getattr(self.linears, \"1\")(linears_0);  linears_0 = None\n      linears_2 = getattr(self.linears, \"2\")(linears_1);  linears_1 = None\n      linears_3 = getattr(self.linears, \"3\")(linears_2);  linears_2 = None\n      linears_4 = getattr(self.linears, \"4\")(linears_3);  linears_3 = None\n      return linears_4\n   \"\"\"\n\nIf user decide to cut the pipeline stage at the 3nd linear call, after partition \nthere will be 2 submodules, where `submod_0` contains first 3 linear layers \nand `submod_1` contains last 2 linear layers.\n\n.. 
code:: ipython3\n\n   After Split module\n   GraphModule(\n   (submod_0): GraphModule(\n      (linears_0): Linear(in_features=4, out_features=4, bias=True)\n      (linears_1): Linear(in_features=4, out_features=4, bias=True)\n      (linears_2): Linear(in_features=4, out_features=4, bias=True)\n   )\n   (submod_1): GraphModule(\n      (linears_3): Linear(in_features=4, out_features=4, bias=True)\n      (linears_4): Linear(in_features=4, out_features=4, bias=True)\n   )\n   )\n\n   def forward(self, x):\n      submod_0 = self.submod_0(x);  x = None\n      submod_1 = self.submod_1(submod_0);  submod_0 = None\n      return submod_1\n\nPipeline Execution Schedule\n----------------------------\n\nPipelining is based on splitting a mini-batch into microbatches, which are \nfed into the training pipeline one-by-one and follow an execution schedule defined \nby the library runtime. A microbatch is a smaller subset of a given training mini-batch. \nThe pipeline schedule determines which microbatch is executed by which device for every time slot.\n\nFor example, depending on the pipeline schedule and the model partition, \nNeuron core i might perform (forward or backward) computation on microbatch b while Neuron core i+1 performs \ncomputation on microbatch b+1, thereby keeping both Neuron cores active at the same time. An example taken from\nMegatron paper is showed as below\n\n.. image:: /libraries/neuronx-distributed/images/pp_schedule.png\n   :alt: Image: image.png\n"
  },
  {
    "path": "libraries/neuronx-distributed/pp_developer_guide.rst",
    "content": ".. _pp_developer_guide:\n\nDeveloper guide for Pipeline Parallelism \n=====================================================================\n\nTraining\n^^^^^^^^\n\nFor training models with pipeline-parallelism, user needs to make few\nchanges to their model/training script. In the below steps, we walk through different \nchanges user has to make to use pipeline parallelism.\nFor general changes please refer to `tensor parallel guidance <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html>`__.\n\nCreating Model\n'''''''''''''''\n\nTo train with pipeline parallel, user needs to wrap their torch module with NeuronxDistributed's Pipeline Parallel model wrapper, i.e. ``NxDPPModel``\nLet's take a look at our Llama example:\n\n.. code:: ipython3\n\n    # Create torch model\n    config.return_dict = False\n    model = transformers.LlamaForCausalLM(config)\n    # Create pipeline cuts\n    pipeline_cuts = create_partition(config, args)\n    # Apply model wrapper\n    model = NxDPPModel(\n        model,\n        transformer_layer_cls=LlamaDecoderLayer,\n        num_microbatches=args.num_microbatches,\n        virtual_pipeline_size=1,\n        output_loss_value_spec=(True, False),\n        input_names=[\"input_ids\", \"attention_mask\", \"labels\"],\n        pipeline_cuts=pipeline_cuts,\n        trace_file_path=args.trace_file_path,\n        leaf_module_cls=[LlamaRMSNorm.__name__],\n        autowrap_modules=[mappings],\n        use_zero1_optimizer=args.use_zero1_optimizer,\n        deallocate_pipeline_outputs=False,\n    )\n    model.move_model_to_device()\n\nWe first create the model from the Hugging Face model config. If tensor parallel needs to be applied to model\nit must be done here before applying the pipeline parallel model wrapper. The next step is to create the partitions. Here\nis an example to evenly partition the layers for all stages:\n\n.. code:: ipython3\n\n    def create_partition(config, args):\n        \"\"\"\n        Evenly split the transformer layers between the PP ranks\n        \"\"\"\n        assert config.num_hidden_layers % args.pipeline_parallel_size == 0\n        num_layer_per_partition = config.num_hidden_layers  // args.pipeline_parallel_size\n        pipeline_cuts = []\n        current_cut = num_layer_per_partition - 1\n        for i in range(args.pipeline_parallel_size-1):\n            pipeline_cuts.append(f\"model.layers.{current_cut}\")\n            current_cut += num_layer_per_partition\n        if torch.distributed.get_rank() == 0:\n            print(f\"pipeline_cuts {pipeline_cuts}\")\n        return pipeline_cuts\n\nNote that the pipeline cuts should be at the transformer layer module name, which \nin Llama model is indicated as ``model.layers.i`` where ``i`` is the layer index. Users have the option to either provide the pipeline cuts, or set ``auto_partition`` to ``True`` to automatically determine the pipeline cuts to use.\nAfter pipeline cuts are decided, pipeline model wrapper is applied. 
Let's take a deeper look into each input of the model wrapper\n\n- ``model``: The original Pytorch module, could be TPfied.\n- ``transformer_layer_cls=LlamaDecoderLayer``: The transformer layer class, we will use it for partition\n- ``num_microbatches=args.num_microbatches``: The number of microbatches we used for pipeline execution.\n- ``virtual_pipeline_size``: Virtual pipeline size if greater than 1 we will use the interleaved pipeline schedule.\n- ``output_loss_value_spec=(True, False)``: This tells ``NxDPPModel`` how to get the loss from the model output. In this case output is a tuple, where first value is loss and second value is something else. ``NxDPPModel`` will use loss to run backward and return loss as the output.\n- ``input_names=[\"input_ids\", \"attention_mask\", \"labels\"]``: The model input names that we will use to run training. As our partition uses FX symbolic trace to trace the model, we will use these input names to create ``concrete_args``. Usually this will be the same input as you will feed into model for the execution. For details please check https://pytorch.org/docs/stable/fx.html#torch.fx.symbolic_trace\n- ``pipeline_cuts=pipeline_cuts``: The pipeline cuts to decide the stages\n- ``leaf_module_cls=[LlamaRMSNorm.__name__]``: We can add some pytorch modules as leaf module so that FX symbolic trace won't trace it through. Here we mark the ``LlamaRMSNorm`` as one leaf module. If you hit any issue about tracing you can skip tracing that part by add the module as a leaf module here. The transformer layer module will be a leaf module by default.\n- ``autowrap_modules``: This serves as the same functionality to simplify FX tracing. User can provide a **python** module here and all the methods from this python module will not be traced.\n- ``use_zero1_optimizer``: When zero-1 optimizer is used, set this to True, so the PP model will understand that zero-1 optimizer will handle data parallel gradient averaging.\n- ``deallocate_pipeline_outputs``: \n    Whether to deallocate the pipeline outputs after send. After send the output tensor is only useful for its \n    '.grad_fn' field, and not its '.data'.\n\nAfter applying model wrapper, ``NxDPPModel`` will partition the model based on the pipeline cuts. If the original model is not yet moved to device, we can call\n``model.move_model_to_device()`` so that ``NxDPPModel`` will only move the local module to device.\n\nRuntime execution:\n'''''''''''''''''''\n\nTo use pipeline runtime, user simply needs to replace their original model call with ``NxDPPModel.run_train``, rest will remain unchanged. \nPlease note that the pipeline runtime will take care of both forward and backward call, so user will not need to explicitly make backward calls. \nThe ``NxDPPModel.run_train`` call will return the loss that is achieved from ``output_loss_value_spec``.\n\nInterleaved Pipeline-Parallelism:\n---------------------------------\n\nTo use interleaved pipeline parallel, one has to set ``virtual_pipeline_size`` greater than 1. The value of the \n``virtual_pipeline_size * pipeline_parallel_size`` should be equal to the number of layers in the models. Interleave pipeline can \nhelp to reduce the pipeline bubble size and improve performance especially in cases when the number of microbatches \nper data-parallel rank is small. 
More information can be found `here <https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/#interleaved_schedule>`__\n\n\nMixed precision training\n------------------------\nWe support the torch autocast to do mixed precision, simply apply the context manager for the ``NxDPPModel.run_train`` call.\nHere is an example:\n\n\n.. code:: ipython3\n\n    # replace loss, _ = model(input_ids, attention_mask, labels) with below\n    with torch.autocast(enabled=args.use_amp > 0, dtype=torch.bfloat16, device_type=\"cuda\"):\n        loss = model.run_train(\n            input_ids=input_ids,\n            attention_mask=attention_mask,\n            labels=labels,\n        )\n\n\nThings that require user attention:\n'''''''''''''''''''''''''''''''''''\n\nModel initialization\n--------------------\n\nWhen the model is large, it is easy to cause host OOM when full model is created on every Neuron core. We recommend 2 ways to deal with this situation:\n\nUsing torchdistx's deferred initialization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPytorch's torchdistx package (https://github.com/pytorch/torchdistx/tree/main) provides easy way to do deferred initialization. If you have torchdistx installed,\nusing deferred initialization is simple as below\n\n.. code:: ipython3\n\n    from torchdistx import deferred_init\n    # Instead of model = LlamaForCausalLM(config)\n    model = deferred_init.deferred_init(LlamaForCausalLM, config)\n\nThe model weights will be initialized in fake tensor mode which will not consume memory.\nAfter applying the ``NxDPPModel`` model wrapper we will only materialize the weights that belong to the local module. \nPlease be aware that the torchdistx package is not actively maintained by Meta, please use at your own risk.\n\nUsing meta device for initialization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuronxDistributed also supports also offer a way to first create the model on meta device, then reinitialize it to host device with only the local modules.\nTo create the model on meta device, follow the below example:\n\n.. code:: ipython3\n\n    from neuronx_distributed.utils.model_utils import init_on_device\n    with init_on_device(torch.device(\"meta\")):\n        model = LlamaForCausalLM(config)\n\nWith ``init_on_device(torch.device(\"meta\"))`` context manager, all model weights will be create to meta device, which will not consume host memory.\nThen during applying the PP model wrapper, user can pass the ``param_init_fn`` kwargs which can define how to reinit the parameter. Here is an example:\n\n.. 
code:: ipython3\n    \n    def init_weights(module):\n        from neuronx_distributed.parallel_layers import ColumnParallelLinear, RowParallelLinear, ParallelEmbedding\n        if isinstance(module, (nn.Linear, Conv1D)):\n            module.weight.data.normal_(mean=0.0, std=model_config.initializer_range)\n            if module.bias is not None:\n                module.bias.data.zero_()\n        elif isinstance(module, nn.Embedding):\n            module.weight.data.normal_(mean=0.0, std=model_config.initializer_range)\n            if module.padding_idx:\n                module.weight.data[module.padding_idx].zero_()\n        elif isinstance(module, nn.LayerNorm):\n            module.bias.data.zero_()\n            module.weight.data.fill_(1.0)\n        elif isinstance(module, (ParallelEmbedding, RowParallelLinear, ColumnParallelLinear)):\n            module.init_weight_cpu()\n            if hasattr(module, \"bias\") and module.bias is not None:\n                module.bias.data.zero_()\n    \n    model = NxDPPModel(...,param_init_fn=init_weights,...)\n\n``param_init_fn`` should take a module as input and initialize how the weight of that module should be initialized.\n\nMoving model to device\n----------------------\n\nWhen user create the model it is usually either created on CPU, or using meta device/torchdistx for delayed parameter initialization. It is important to understand \nwhen the delayed parameter will be materialized and how/when to move model to device.\n\nOnce the ``NxDPPModel`` wrapper is applied with the model together with the partition information, tracing and partition will happen immediately. After partition\nwe will materialize the local module if torchdistx is used or ``param_init_fn`` is passed. So the returned model of ``NxDPPModel`` wrapper will have local parameters on host device.\n\nAfter model is wrapped with ``NxDPPModel`` user can do things that are recommended to run on CPU, e.g. loading shareded checkpoint. It is important to make sure to call ``model.move_model_to_device()``\nbefore creating the optimizer, so that the optimizer can take the weights that are on the device. When using zero-1 optimizer, it is also required to use ``model.local_parameters()`` to create parameter groups so the optimizer can\ninfer the right device information from parameter groups.\n\nGradient checkpointing\n----------------------\n\nGradient checkpointing (or activation checkpointing) is a common method used in deep learning to reduce memory footprint by doing \nrecomputation of forward computation. The common way to apply the gradient checkpointing on XLA device is to use the torch_xla's \n`gradient checkpointing wrapper <https://github.com/pytorch/xla/blob/master/torch_xla/utils/checkpoint.py#L129>`__, which will apply an autograd function.\nHowever FX's symbolic tracing does not understand autograd function, and as a result the checkpointing information will be ignored if the checkpoint wrapper\nis traced during partition.\nTo handle this case, user can manually re-apply gradient checkpoint after partition. Here we provide an example to checkpoint every transformer layer\nafter partition.\n\n.. 
code:: ipython3\n\n    from typing import Any, Dict, Iterator, Tuple\n    import torch.nn as nn\n\n    import torch\n    from torch_xla.utils.checkpoint import checkpoint as torch_checkpoint\n    from neuronx_distributed.parallel_layers.parallel_state import rmsg\n    from neuronx_distributed.utils.logger import get_logger\n    from torch.distributed.utils import _replace_by_prefix\n\n    logger = get_logger()\n\n    _CHECKPOINT_WRAPPED_MODULE = \"mod\"\n    _CHECKPOINT_PREFIX = _CHECKPOINT_WRAPPED_MODULE + \".\"\n\n    class CheckPointWrapper(torch.nn.Module):\n        def __init__(self, mod) -> None:\n            super().__init__()\n            self.mod = mod\n            # state_dict post hook to remove prefix to allow loading into a\n            # non-checkpoint wrapped module.\n            self._register_state_dict_hook(self._post_state_dict_hook)\n            # load_state_dict pre-hook to allow loading back into\n            # checkpoint-wrapped module.\n            self._register_load_state_dict_pre_hook(\n                self._pre_load_state_dict_hook, with_module=True\n            )\n\n\n        def forward(self, *args, **kwargs):\n            ordered_args = list(args)\n            for value in kwargs.values():\n                ordered_args += [value]\n\n            # Note: checkpoint cannot accept kwargs\n            return torch_checkpoint(self.mod, *ordered_args, use_reentrant=True)\n        \n        def named_parameters(\n            self,\n            *args,\n            **kwargs,\n        ) -> Iterator[Tuple[str, torch.nn.Parameter]]:\n            \"\"\"\n            Overrides :meth:`named_parameters()` to intercept parameter names and\n            remove all occurrences of ``_CHECKPOINT_PREFIX``.\n            \"\"\"\n            for param_name, param in super().named_parameters(*args, **kwargs):\n                updated_name = param_name.replace(_CHECKPOINT_PREFIX, \"\")\n                yield updated_name, param\n        \n        def named_modules(self,*args,**kwargs):\n            for module_name, module in super().named_modules(*args, **kwargs):\n                updated_name = module_name.replace(_CHECKPOINT_PREFIX, \"\")\n                yield updated_name, module\n\n        @staticmethod\n        def _post_state_dict_hook(\n            module: nn.Module,\n            state_dict: Dict[str, Any],\n            prefix: str,\n            *args: Any,\n        ) -> Dict[str, Any]:\n            \"\"\"\n            _post_state_dict_hook() is called after the state_dict() of this\n            FSDP module is executed. For ``checkpoint_wrapper``, it will strip\n            checkpoint-wrapped module prefix so that this module can be loaded into\n            non-checkpointed modules. It would still be able to be loaded into\n            checkpoint-wrapped modules as this class adds the prefix back before\n            loading the state_dict.\n            \"\"\"\n            _replace_by_prefix(state_dict, f\"{prefix}{_CHECKPOINT_PREFIX}\", prefix)\n            return state_dict\n        \n        @staticmethod\n        def _pre_load_state_dict_hook(\n            module: nn.Module,\n            state_dict: Dict[str, Any],\n            prefix: str,\n            *args: Any,\n        ) -> None:\n            \"\"\"\n            ``_pre_state_dict_hook` is called before ``self._load_from_state_dict()``\n            is called. 
For ``checkpoint_wrapper``, it will add back the module\n            prefix so that non-checkpointed modules can be loaded into\n            checkpoint_wrapper modules properly.\n            \"\"\"\n            _replace_by_prefix(state_dict, prefix, prefix + f\"{_CHECKPOINT_PREFIX}\")\n\n    def apply_checkpoint(dist_model, layers_to_checkpoint=None):\n        checkpoint_wrapper_added = False\n        if layers_to_checkpoint is not None and len(layers_to_checkpoint) == 0:\n            raise RuntimeError(\n                rmsg(f\"invalid input layers_to_checkpoint {layers_to_checkpoint}, can't be empty\")\n            )\n        for name, module in dist_model.local_module.named_children():\n            # checkpoint layers that are provided in input\n            # if layers not provide in input, then checkpoint if it is transformer layer\n            if (layers_to_checkpoint and name in layers_to_checkpoint) or (\n                not layers_to_checkpoint and type(module) == dist_model.transformer_layer_cls\n            ):\n                # add_module replaces old module with our own custom module.\n                # https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html#Module.add_module\n                dist_model.local_module.add_module(name, CheckPointWrapper(module))\n                checkpoint_wrapper_added = True\n        if layers_to_checkpoint is not None and not checkpoint_wrapper_added:\n            logger.warning(\n                rmsg(f\"layers_to_checkpoint {layers_to_checkpoint} do not exist in the graph\")\n            )\n        elif layers_to_checkpoint is None and not checkpoint_wrapper_added:\n            logger.warning(\n                rmsg(\n                    f\"During applying activation checkpointing, transformer_layer_cls {dist_model.transformer_layer_cls.__name__} can not be found in stage {dist_model.pipeline_parallel_rank}, skipping...\"\n                )\n            )\n\n    model = NxDPPModel(...)\n    # Will checkpoint every transformer layer\n    apply_checkpoint(model)\n\n``apply_checkpoint`` function will try to apply gradient checkpointing to every transformer layer. Please note we have plan to add this functionality into ``NxDPPModel`` in the future releases.\n\n\nModel tracing\n-------------\n\nIt is important to understand that the model cannot be partitioned without tracing.\nThe model tracing is currently done with FX's symbolic trace. There are `certain limitations for FX's symbolic trace <https://pytorch.org/docs/stable/fx.html#limitations-of-symbolic-tracing>`__. So in order to avoid any tracing issue, \nwe would like to trace as less operations as possible, which means that we only want to trace the structure of the model, and cut the pipeline stages on the transformer layers, we do not care how exactly the computations are in the model.\nBy default, we will mark all transformer layers as leaf nodes, so that the tracer will not trace inside these layers. If you have some module that might cause tracing problem, you can try to mark them as leaf nodes as well. Our previous example \nalso marks the `LlamaRMSNorm` as leaf module for Llama model.\n\nSpecial treatment for Hugging Face models\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHugging Face offers FX support for many of its models. 
We will detect if user is using a Hugging Face model (by checking if the model class is ``transformers.PreTrainedModel``), and if so we will use the Huggingface's FX tracer to do the symbolic trace.\nThe Hugging Face's tracer has implementation of many functionalities to help tracing, for details please refer to `here <https://github.com/huggingface/transformers/blob/main/src/transformers/utils/fx.py>`__.\nHowever, please be aware that Hugging Face's tracer will check if the model class name belongs to one of the Hugging Face models. So if you create your model class based on some Huggingface model class, it is important to maintain the same class name. Below is an example:\n\n.. code:: ipython3\n\n    from transformers.models.llama.modeling_llama import LlamaForCausalLM as LlamaForCausalLMHF\n\n    # Keep the same class name as original one\n    class LlamaForCausalLM(LlamaForCausalLMHF):\n        ...\n\n\nAuto partition\n---------------\nSetting the ``auto_partition`` parameter to ``True`` means that the transformer layers are automatically partitioned by evenly splitting the transformer layers between the PP ranks. If the transformer layers are not evenly divisible by the PP ranks, the remaining layers are distributed to the latter pipeline ranks.\nThe partitions are created on the basis of the transformer layer names. The transformer layer names are determined by recursively traversing the original torch module to find the layer names of modules that are of the ``transformer_layer_cls`` type in the model.\nIf the user does not want to partition the model in this way, they can set the partitions to use by specifying the ``pipeline_cuts``. Note that the pipeline cuts should be at the transformer layer module name, which in the Llama model is given by ``model.layers.i`` where ``i`` is the layer index.\n"
  },
  {
    "path": "libraries/neuronx-distributed/ptl_developer_guide.rst",
    "content": ".. _ptl_developer_guide:\n\nDeveloper guide for Neuron-PT-Lightning \n=================================================================\n\nTraining\n^^^^^^^^\n\nFor training models with Neuron-PT-Lightning, user needs to make few\nchanges to their model/training script. \nIn this document we explain how we can train a model using Tensor Parallelism (TP), Data Parallelism (DP) and Zero-1. \n\nFirst, let's start with the model changes. Please follow the guidelines here (`tensor parallel guidance <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html>`__) \nfor building the model with tensor-parallelism enabled and setting up training dataset.\n\nNext, let's walkthrough how we can build the training loop with Neuron-PT-Lightning APIs\n\nConfigure NeuronLTModule\n''''''''''''''''''''''''\nNeuronxDistributed overrides `LightningModule <https://lightning.ai/docs/pytorch/stable/common/lightning_module.html>`__ with built-in support for \nNeuron device. User needs to inherit from ``NeuronLTModule``\n\n.. code:: ipython3\n\n    class NeuronLlamaLTModule(NeuronLTModule):\n        def training_step(self, batch, batch_idx):\n            ...\n        ...\n\nWithin LTModule, user needs to override the following methods\n``training_step``\nAt this moment NeuronLTModule only support `manual optimization <https://lightning.ai/docs/pytorch/stable/model/manual_optimization.html>`__, so user needs to define forward, backward and optimization steps\n\n.. code:: ipython3\n\n    def training_step(self, batch, batch_idx):\n        xm.mark_step() # Isolate forward+backward graph\n        for logger in self.trainer.loggers:\n            logger.print_step = -1\n        self.should_print = False\n        outputs = self.model(\n            input_ids=batch[\"input_ids\"],\n            attention_mask=batch[\"attention_mask\"],\n            labels=batch[\"labels\"],\n        )\n        loss = outputs.loss / self.grad_accum_steps\n        loss.backward()\n        self.averaged_loss += loss.detach()\n        xm.mark_step() # Isolate forward+backward graph\n        if not self.automatic_optimization and (batch_idx +1) % self.grad_accum_steps == 0:\n            self.should_print = True\n            loss_div = self.averaged_loss / self.trainer.strategy.data_parallel_size\n            loss_reduced = xm.all_reduce(\n                xm.REDUCE_SUM,\n                loss_div,\n                groups=parallel_state.get_data_parallel_group(as_list=True),\n            )\n            loss_reduced_detached = loss_reduced.detach()\n            self.averaged_loss.zero_()\n            optimizer = self.optimizers()\n            scheduler = self.lr_schedulers()\n            optimizer.step()\n            optimizer.zero_grad()\n            scheduler.step()\n            xm.mark_step() # Isolate Optimization step graph\n\n            # Setup items for logging\n            self.loss = loss_reduced_detached\n        return loss\n\n``configure_optimizers``\nConfigure optimizer and lr_scheduler\n\n.. 
code:: ipython3\n\n    def configure_optimizers(self):\n        param_groups = self.get_param_groups_by_weight_decay()\n        optimizer = initialize_parallel_optimizer(\n            self.nxd_config, self.opt_cls, param_groups, **self.opt_kwargs\n        )\n        optimizer.zero_grad()\n        scheduler = self.scheduler_cls(optimizer, *self.scheduler_args, **self.scheduler_kwargs)\n        return (\n            [optimizer],\n            [\n                {\n                    \"scheduler\": scheduler,\n                }\n            ],\n        )\n\n``on_train_batch_end``\nCustomized behaviour at the end of each training batch, like logging\n\n.. code:: ipython3\n\n    def on_train_batch_end(self, *args, **kwargs):\n        if self.should_print:\n            if not self.automatic_optimization:\n                self.log(\n                    \"loss\",\n                    self.loss.detach().cpu().item() if self.loss is not None else torch.zeros(1, device=\"cpu\", requires_grad=False),\n                    prog_bar=True,\n                )\n                self.log(\n                    \"global_step\",\n                    self.global_step,\n                    prog_bar=True,\n                    on_step=True,\n                    on_epoch=True,\n                )\n                for logger in self.trainer.loggers:\n                    logger.print_step = self.global_step\n\nNote that NeuronLTModule has a built-in function of ``get_param_groups_by_weight_decay`` for common use case as shown in snippet below, \nusers can also override with their own param_groups generation.\n\n.. code:: ipython3\n\n    def get_param_groups_by_weight_decay(self):\n        \"\"\"Get param groups. Customers can override this to have their own way of weight_decay\"\"\"\n        param_optimizer = list(self.model.named_parameters())\n        no_decay = [\"bias\", \"LayerNorm\"]  # gamma/beta are in LayerNorm.weight\n\n        optimizer_grouped_parameters = [\n            {\n                \"params\": [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],\n                \"weight_decay\": 0.01,\n            },\n            {\n                \"params\": [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],\n                \"weight_decay\": 0.0,\n            },\n        ]\n        return optimizer_grouped_parameters\n\n\nConfigure DataModule\n''''''''''''''''''''\n\nCreate a LightningDataModule for data loading/sampling\n\n.. 
code:: ipython3\n\n    class NeuronLightningDataModule(LightningDataModule):\n        def __init__(\n            self, \n            dataloader_fn: Callable,\n            data_dir: str, \n            batch_size: int,\n            data_args: Tuple = (), \n            data_kwargs: Dict = {},\n        ):\n            super().__init__()\n            self.dataloader_fn = dataloader_fn\n            self.data_dir = data_dir\n            self.batch_size = batch_size\n            self.data_args = data_args  # no trailing comma: keep data_args as the tuple passed in\n            self.data_kwargs = data_kwargs\n            \n\n        def setup(self, stage: str):\n            pass\n\n        def train_dataloader(self):\n            return self.dataloader_fn(\n                self.data_dir,\n                self.batch_size,\n                self.trainer.strategy.data_parallel_size,\n                self.trainer.strategy.data_parallel_rank,\n                *self.data_args,\n                **self.data_kwargs\n            )\n\nUpdate Training Script\n''''''''''''''''''''''\n\nFor a detailed introduction to each API/class, see the `API guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html>`__.\n\nCreate NeuronLTModule and DataModule\n------------------------------------\n\n.. code:: ipython3\n\n    model = NeuronLlamaLTModule(\n        model_fn = LlamaForCausalLM,\n        nxd_config = nxd_config,\n        model_args = (model_config,),\n        opt_cls = optimizer_cls,\n        scheduler_cls = configure_scheduler,\n        opt_kwargs = {\n            \"lr\": flags.lr,\n        },\n        scheduler_args = (flags.warmup_steps, flags.max_steps),\n        grad_accum_steps = flags.grad_accum_usteps,\n        manual_opt = True, \n    )\n\n    dm = NeuronLightningDataModule(\n        create_llama_pretraining_dataset,\n        flags.data_dir,\n        flags.batch_size,\n        data_args = (flags.seed,),\n    )\n\nAdd Strategy, Plugins, Callbacks\n--------------------------------\n\n.. code:: ipython3\n\n    strategy = NeuronXLAStrategy(\n        nxd_config = nxd_config\n    )\n    plugins = []\n    plugins.append(NeuronXLAPrecisionPlugin())\n    callbacks = []\n    callbacks.append(NeuronTQDMProgressBar())\n\nCreate Trainer and Start Training\n---------------------------------\n\n.. code:: ipython3\n\n    trainer = Trainer(\n        strategy = strategy, \n        max_steps = flags.steps_this_run,\n        plugins = plugins,\n        enable_checkpointing = flags.save_checkpoint,\n        logger = NeuronTensorBoardLogger(save_dir=flags.log_dir),\n        log_every_n_steps = 1,\n        callbacks = callbacks,\n    )\n    trainer.fit(model=model, datamodule=dm)\n\nCheckpointing\n-------------\n\nTo enable checkpoint saving, add `ModelCheckpoint <https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html>`__\nto the callbacks:\n\n.. code:: ipython3\n\n    callbacks.append(\n        ModelCheckpoint(\n            save_top_k = flags.num_kept_checkpoint,\n            monitor=\"global_step\",\n            mode=\"max\",\n            every_n_train_steps = flags.checkpoint_freq,\n            dirpath = flags.checkpoint_dir,\n        )\n    )\n\nTo load from a specific checkpoint, pass ``ckpt_path=ckpt_path`` to ``trainer.fit``:\n\n.. code:: ipython3\n\n    trainer.fit(model=model, datamodule=dm, ckpt_path=ckpt_path)\n"
  },
  {
    "path": "libraries/neuronx-distributed/save_load_developer_guide.rst",
    "content": "\n.. _save_load_developer_guide:\n.. _neuronx_distributed_save_load_developer_guide:\n\nDeveloper guide for save/load checkpoint\n========================================\n\nThis document will introduce how to use `nxd.save_checkpoint` and `nxd.load_checkpoint`\nto save and load checkpoint for distributed model training. This two methods handle all\ncheckpoint in a single method: model, optimize, learning rate scheduler and any user contents.\n\nModel states are saved on data parallel rank-0 only. When ZeRO-1 optimizer is not turned on,\noptimizer states are also saved like this; while when ZeRO-1 optimizer is turned on, states\nare saved on all ranks. Scheduler and user contents are saved on master rank only.\n\nFor a complete api guide, refer to :ref:`API GUIDE<api_guide>`.\n\nSave checkpoint:\n''''''''''''''''\n\nA sample usage:\n\n.. code:: ipython3\n\n   nxd.save_checkpoint(\n       args.checkpoint_dir,  # checkpoint path\n       tag=f\"step_{total_steps}\",  # tag, sub-directory under checkpoint path\n       model=model,\n       optimizer=optimizer,\n       scheduler=lr_scheduler,\n       user_content={\"total_steps\": total_steps, \"batch_idx\": batch_idx, \"cli_args\": args.__dict__},\n       use_xser=True,\n       async_save=True,\n   )\n\nUsers can choose to not save every thing. For example, model states only:\n\n.. code:: ipython3\n\n   nxd.save_checkpoint(\n       args.checkpoint_dir,  # checkpoint path\n       tag=f\"step_{total_steps}\",  # tag, sub-directory under checkpoint path\n       model=model,\n       use_xser=True,\n       async_save=True,\n   )\n\nTo only keep several checkpoints (e.g. 5), just use :code:`num_kept_ckpts=5`.\n\nLoad checkpoint:\n''''''''''''''''\n\nA sample usage, note that if no user contents detected, it will return ``None``:\n\n.. code:: ipython3\n\n   user_content = nxd.load_checkpoint(\n       args.checkpoint_dir,  # checkpoint path\n       tag=f\"step_{args.loading_step}\",  # tag\n       model=model,\n       optimizer=optimizer,\n       scheduler=lr_scheduler,\n   )\n\nLeave ``tag`` not provided, this loading method will try to automatically resume from the\nlatest checkpoint.\n\n.. code:: ipython3\n\n   user_content = nxd.load_checkpoint(\n       args.checkpoint_dir,  # checkpoint path\n       model=model,\n       optimizer=optimizer,\n       scheduler=lr_scheduler,\n   )\n\nZeRO-1 Optimizer State Offline Conversion:\n''''''''''''''''''''''''''''''''''''''''''\n\nZeRO-1 optimizer checkpoint are sharded states stored for each rank. When user want to\nload ZeRO-1 optimizer states with different cluster setting (e.g. with DP degree changed),\nthey can run the offline ZeRO-1 optimizer checkpoint conversion tool. This tool supports\nconversion from sharded states to full states, from full to sharded, and from sharded to sharded.\n\n.. code:: ipython3\n   \n   # sharded to sharded or full to sharded\n   nxd_convert_zero_checkpoints --input_dir <input path> --output_dir <output path> --convert_to_sharded --dp_size <new dp degree>\n   # sharded to full\n   nxd_convert_zero_checkpoints --input_dir <input path> --output_dir <output path> --convert_to_full\n"
  },
  {
    "path": "libraries/neuronx-distributed/setup/index.rst",
    "content": ".. _neuronx_distributed_setup:\n\nNeuronX Distributed Setup\n===========================\n\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>` to create a pytorch environment. It is recommended to work out of a Python virtual environment (such as ``venv``) so as to avoid package installation issues.\n\nYou can install the ``neuronx-distributed`` package using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n\n\n\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/setup/index.txt",
    "content": ":ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>` to create a pytorch environment. It is recommended to work out of a Python virtual environment (such as ``venv``) so as to avoid package installation issues.\n\nYou can install the ``neuronx-distributed`` package using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com"
  },
  {
    "path": "libraries/neuronx-distributed/standard_mixed_precision.rst",
    "content": "\n.. _standard_mixed_precision:\n\nDeveloper guide for Standard Mixed Precision\n============================================\n\nThis document will introduce the concept of Standard Mixed Precision in NxD. It's\nnewly introduced in neuron release 2.20. It is recommended to use this setting for\ntraining large models using NxD. When enabled, the optimizer will maintain a copy of\nweights and their grads in FP32 data type.\n\n.. note::\n   Using this can increase memory pressure as we are using master weights and also performing\n   optimiizer updates in higher precision. This can result in increased memory pressure and a\n   slighly lower throughpout\n\nStandard Mixed Precision offers few config settings that can be tuned by users\n\nCompared to legacy mixed precision setting (i.e. before this feature's addition), Standard Mixed Precision\nincludes these components:\n\n- Use FP32 for precision sensitive operators\n- Use FP32 master weights and optimizer states for ZeRO-1 optimizer\n- Use FP32 in local gradients accumulation\n- Turn off stochastic rounding\n\n.. note::\n   The feature is tightly integrated with the :code:`NeuronZero1Optimizer`, to make\n   Standard Mixed Precision take effect, ZeRO-1 optimizer needs to be enabled.\n\nNxD Config Update\n'''''''''''''''''\n\nNewly introduced NxD config is as below:\n\n.. code:: ipython3\n\n   mixed_precision_config = {\n       \"use_master_weights\": True,\n       \"use_fp32_grad_acc\": True,\n       \"use_master_weights_in_ckpt\": False,\n   }\n\n   config = {\n       ...\n       \"mixed_precision_config\": mixed_precision_config,\n   }\n\nIn NxD training config, a new field :code:`mixed_precision_config` (default value is :code:`None`,\nsee details in the following sections) is added. It contains three sub-fields: :code:`use_master_weights`,\n:code:`use_fp32_grad_acc`, and :code:`use_master_weights_in_ckpt`. Default value of\n:code:`use_master_weights` and :code:`use_fp32_grad_acc` is whether ZeRO-1 optimizer is enabled.\nField :code:`use_master_weights` controls whether to use FP32 master weights. Field :code:`use_fp32_grad_acc`\ncontrols whether to enable FP32 gradient accumulation buffer. Default value of :code:`use_master_weights_in_ckpt`\nis :code:`False`. This field controls whether to save master weights in checkpoints.\n\n.. code:: ipython3\n\n   # same as `mixed_precision_config = None`\n   mixed_precision_config = {\n       \"use_master_weights\": optimizer_config[\"zero_one_enabled\"],\n       \"use_fp32_grad_acc\": optimizer_config[\"zero_one_enabled\"],\n       \"use_master_weights_in_ckpt\": False,\n   }\n\n   config = {\n       ...\n       \"mixed_precision_config\": mixed_precision_config,\n   }\n\nNote that only when ZeRO-1 optimizer is enabled, Standard Mixed Precision will take effect.\n\nTo disable this Standard Mixed Precision setting, just change NxD config:\n\n.. code:: ipython3\n\n   mixed_precision_config = {\n       \"use_master_weights\": False,\n       \"use_fp32_grad_acc\": False,\n       \"use_master_weights_in_ckpt\": False,\n   }\n\n   config = {\n       ...\n       \"mixed_precision_config\": mixed_precision_config,\n   }\n"
  },
  {
    "path": "libraries/neuronx-distributed/tensor_parallelism_overview.rst",
    "content": ".. _tensor_parallelism_overview:\n\nTensor Parallelism Overview \n===========================\n\nTensor Parallelism is a technique in which a tensor is split into N\nchunks along a particular dimension such that each device only holds 1/N\nchunk of the tensor. Computation is performed using this partial chunk\nso as to get partial output. These partial outputs are collected from\nall devices ensuring the correctness of the computation is maintained.\n\nTaking a general matrix multiplication as an example, let’s say we have\nC = AB. We can split B along the column dimension into [B0 B1 B2 … Bn]\nand each device holds a column. We then multiply A with each column in B\non each device, we will get [AB0 AB1 AB2 … ABn]. At this moment, each\ndevice still holds partial results, e.g. device rank 0 holds AB0. To\nmake sure the result is correct, we need to all-gather the partial\nresult and concatenate the tensor along the column dimension. In this\nway, we are able to distribute the tensor over devices while making sure\nthe computation flow remains correct.\n\n.. image:: /libraries/neuronx-distributed/images/tp.png\n   :alt: Image: image.png\n\nFig and TP explanation is borrowed from https://colossalai.org/docs/concepts/paradigms_of_parallelism/#tensor-parallel\n\nSimilarly we can perform the partition along the row dimensions and\ncreate a RowParallel Linear layer. In RowParallelLinear layer, we\npartition the weight matrix along the row dimension. Let’s say we have C\n= AB. We can split B along the row dimension into [B0 B1 B2 … Bn] and\neach device holds a row. We then multiply each column of A on each\ndevice, we will get [A0B0 A1B1 A2B2 … AnBn]. At this moment, each device\nstill holds partial results, e.g. device rank 0 holds A0B0. To make sure\nthe result is correct, we need to all-reduce sum the partial result from\nall devices to produce the final output.\n\nUsing this principle of sharded linear layers, we can construct MLPs of\narbitrary depth until the need to operate on the whole output tensor, in\nwhich case we would have to construct the output but gathering it from\nall devices.\n\n.. image:: /libraries/neuronx-distributed/images/mlp.png\n   :alt: Image: image.png\n\nHere is an illustration from the Megatron-LM paper In the above case, as\nyou can see two linear layers are implemented using Column Parallel and\nRow Parallel linear layers, wherein the ColumnParallel Linear shards\nalong the columns and then it is followed by RowParallel Linear layer\nwhich takes in parallel inputs (sharded outputs from\nColumnParallelLinear). Consider the example shown in the above diagram,\nZ = (X\\ *A)*\\ B. In this case we split the first matrix multiplication\nover column dimension such that each device after first matrix\nmultiplication holds partial result of Y0=XA0,Y1=XA1 and so on. For the\nsecond matrix multiplication, we partition the weight matrix over row\ndimension and since the inputs are already columns sharded and we can\nmultiply them to produce partial outputs. These outputs finally requires\nan all-reduce sum, since we want to sum up the single column*row result.\n\nTensor Parallelism for Transformers:\n\nA transformer block\n\n.. 
image:: /libraries/neuronx-distributed/images/self-attention.png\n   :alt: Image: image.png\n\nFig: Taken from the Megatron-LM paper.\n\nAs seen from the figure above, a simple self-attention block has the QKV linear layers followed by an MLP.\nUsing the same Column and Row Parallel linear layers, we can partition\nthe self-attention block across devices, thereby reducing the memory\nfootprint on each device, since each device now only holds partial\nparameters. This weight distribution strategy allows us to scale large\nmodel training across devices.\n\n
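To make the column-parallel and row-parallel flow concrete, here is a small, self-contained PyTorch sketch (plain tensors only, no Neuron APIs) that checks ``Z = (XA)B`` computed with a column-split ``A`` and a row-split ``B`` against the unsharded result:\n\n.. code:: ipython3\n\n   import torch\n\n   torch.manual_seed(0)\n   X = torch.randn(4, 8)   # activations\n   A = torch.randn(8, 6)   # first weight, split by columns across two 'devices'\n   B = torch.randn(6, 8)   # second weight, split by rows across two 'devices'\n\n   A_shards = A.chunk(2, dim=1)   # each rank holds half of A's columns\n   B_shards = B.chunk(2, dim=0)   # each rank holds half of B's rows\n   partials = [(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]\n\n   # The all-reduce (sum) of the partial outputs matches the unsharded computation\n   Z_parallel = partials[0] + partials[1]\n   assert torch.allclose(Z_parallel, (X @ A) @ B, atol=1e-5)\n"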
  },
  {
    "path": "libraries/neuronx-distributed/tp_developer_guide.rst",
    "content": ".. _tp_developer_guide:\n\nDeveloper guide for Tensor Parallelism \n=================================================================\n\nTraining\n^^^^^^^^\n\nFor training models with tensor-parallelism, one would have to make few\nchanges to their model/training script. Below we walk through the\ndifferent changes one would have to make to shard the models across\ndevices.\n\nCreating DataLoader:\n''''''''''''''''''''\n\nWhen we shard the model across devices using tensor parallelism, all the\ntensor parallel workers are operating on the same batch of data. Hence,\nto ensure that each tensor parallel worker is getting the same data, we\nmake use of ``DistributedSampler`` as shown in the snippet below\n\n.. code:: ipython3\n\n   def create_pretraining_dataset(\n       input_file, max_pred_length, mini_batch_size, worker_init\n   ):\n       train_data = pretraining_dataset(\n           input_file=input_file, max_pred_length=max_pred_length\n       )\n       # To distribute the data across different workers in the world, \n       # we use the DistributedSampler. The num_replicas should be equal\n       # to the data_parallel_world_size. Note: data_parallel_rank=0 can have\n       # multiple tensor parallel ranks and each of these should get the same \n       # data. \n       train_sampler = DistributedSampler(\n           train_data,\n           num_replicas=parallel_state.get_data_parallel_world_size(),\n           rank=parallel_state.get_data_parallel_rank(),\n       )\n       train_dataloader = DataLoader(\n           train_data,\n           sampler=train_sampler,\n           batch_size=mini_batch_size,\n           num_workers=0,\n           worker_init_fn=worker_init,\n           drop_last=True,\n           pin_memory=True,\n       )\n       return train_dataloader\n\nCreating Model:\n'''''''''''''''\n\nOne can create models by replacing the large linear layers with\n``ColumnParallel`` and ``RowParallel`` Linear layers. In case of\ntransformers, we have a good structure where the Attention block usually\nhave linear projections for QKV and this is followed by a fully\nconnected layer. Let’s take a look at the example for the BERT model. We\nmake the attention module of BERT model to use tensor parallel layers,\nthereby adding the ability to shard the model across devices.\n\n.. 
code:: ipython3\n\n   class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):\n       def __init__(self, config, position_embedding_type=None):\n           super().__init__(config, position_embedding_type)\n\n           self.query = ColumnParallelLinear(config.hidden_size,\n                                             self.all_head_size,\n                                             gather_output=False)\n           self.key = ColumnParallelLinear(config.hidden_size,\n                                           self.all_head_size,\n                                           gather_output=False)\n           self.value = ColumnParallelLinear(config.hidden_size,\n                                             self.all_head_size,\n                                             gather_output=False)\n           # Since we shard the number of attention heads across tensor parallel\n           # ranks, each rank would have a subset of heads, hence, we update\n           # the num_attention_heads here.\n           tp_size = parallel_state.get_tensor_model_parallel_size()\n           self.num_attention_heads = self.num_attention_heads // tp_size\n           self.all_head_size = self.all_head_size // tp_size\n\nAs seen, we just had to swap out the linear layers with ColumnParallelLinear\nlayers, and the rest of the forward method of the attention layer\ncan work as is. Note: In the above ColumnParallelLinear layers we are not\ngathering the output from each rank; in other words, each rank is working\non its own shard. We can set ``gather_output=True``, which would gather the\noutputs and give you a full-dimension output. However, gathering the output\nfrom all ranks would introduce an all-gather operation which can be\nexpensive depending on the size of the tensor. In the case of the attention\nmodule, we know that the SelfAttention block is followed by an MLP block.\nHence, we replace the linear layer there with a RowParallelLinear as\nshown below:\n\n.. code:: ipython3\n\n   class ParallelSelfOutput(transformers.models.bert.modeling_bert.BertSelfOutput):\n       def __init__(self, config):\n           super().__init__(config)\n           self.dense = RowParallelLinear(config.hidden_size,\n                                          config.hidden_size,\n                                          input_is_parallel=True)\n\nAs seen, we just had to replace the dense layer here and pass the\n``input_is_parallel`` argument. This way, the ``RowParallelLinear``\ncan operate on partitions and produce the collective result.\n\nMaking just the above two changes can help you partition a good chunk of\nyour model across multiple workers, thereby allowing models of larger\nsize to be trained on a single instance. Note: The majority of the\nparameters of a transformer model are in these linear layers, and hence\npartitioning these layers can help you scale.\n\nFinal Training script:\n''''''''''''''''''''''\n\nOnce the dataloader and model changes are done, we are ready to build\nthe training script. Good news: you can use the same training loop as\nbefore for data-parallel training; only minor tweaks are needed\nto get it all started.\n\n.. 
code:: ipython3\n\n   import torch_xla.core.xla_model as xm\n\n   import neuronx_distributed\n   from neuronx_distributed import parallel_layers\n   from neuronx_distributed.parallel_layers import parallel_state, clip_grad_norm\n\n   parallel_state.initialize_model_parallel(tensor_model_parallel_size=2)\n   dataloader = create_pretraining_dataset(\n    input_file, max_pred_length, mini_batch_size, worker_init)\n\n   model = YourNewlyBuiltParallelModel(config)\n   # We have to move the model to device using this API, because when\n   # we move model to device using .to(device), the model parameter's\n   # attributes aren't preserved. This causes some of the tensor parallel\n   # attributes to be lost. Hence, this API takes care of preserving the\n   # tensor parallel attributes.\n   parallel_layers.move_model_to_device(model, device)\n\n   for inputs, labels in dataloader:\n       output = model(*inputs)\n       loss = loss_fn(output, labels)\n       loss.backward()\n       # Here we use clip_grad_norm from neuronx_distributed as that \n       # can handle tensor parallel ranks\n       clip_grad_norm(model.parameters(), max_norm)\n       # For the optimizer step, we have to pass the data_parallel group\n       xm.optimizer_step(\n           optimizer, \n           groups=parallel_state.get_data_parallel_group(as_list=True)\n       )\n       optimizer.zero_grad()\n       scheduler.step()\n\nA few things to take note of in the above code snippet:\n\n1. We are initializing the model parallel state with a tensor parallel size of 2. This\n   will shard the model across 2 devices.\n2. We use the ``move_model_to_device`` API to move the model to the device. This is equivalent\n   to doing ``model.to(device)``. We need to explicitly call this API since\n   some of the tensor-parallel attributes do not get copied over when we\n   move the model to the device using ``model.to(device)``.\n3. We are calling ``clip_grad_norm`` from ``parallel_layers``. This clip_grad_norm\n   takes care of accumulating the gradient norms across the tensor-parallel\n   ranks and producing the correct output.\n4. We pass the ``data_parallel_group`` to ``optimizer_step``. If we don't pass the\n   group, the default would be all the workers in the world.\n\nSaving Model:\n'''''''''''''\n\nOnce training is done, we want to save the model. This can be done\neasily by calling the save API from\n``neuronx_distributed.parallel_layers``. Here is an example:\n\n.. code:: ipython3\n\n   neuronx_distributed.parallel_layers.save({\n               'epoch': epoch,\n               'model': model.state_dict(),\n               'optimizer_state_dict': optimizer.state_dict(),\n               'loss': loss,\n               ...\n               }, PATH)\n\nNote the ``model`` key used here; we need to provide the same key during\nmodel load, as in the hedged sketch below.\n\n
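A minimal sketch of the matching load call is shown below. It assumes ``parallel_layers.load`` accepts a ``model_key`` argument naming the entry saved above; treat the exact keyword as an assumption and check the API guide for your release.\n\n.. code:: ipython3\n\n   # Hypothetical load sketch: 'model' must match the key used in save() above.\n   neuronx_distributed.parallel_layers.load(PATH, model, model_key='model')\n"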
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/finetune_llama3_8b_ptl_lora.rst",
    "content": ".. _llama3_8b_tp_ptl_lora_finetune_tutorial:\n\nFine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning\n=====================================================================================\n\nThis tutorial shows how to fine-tune a Llama3-8B model with tensor-parallelism and LoRA adaptors. The tutorial uses the :ref:`PyTorch-lightning trainer <ptl_developer_guide>` for setting up the finetuning loop.\n\n\nSetting up the environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor this experiment, we will use one trn1.32xlarge compute instance in AWS EC2.\nTo set up the packages in the compute instance, see\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`.\nInstall the ``neuronx-distributed`` package inside the virtual environment using the following command:\n\n.. code-block:: ipython3\n   \n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n\nNext, download the scripts for fine-tuning with LoRA\n\n1. Create a directory to hold the experiments.\n\n.. code-block:: ipython3\n\n   mkdir -p ~/examples/tp_llama3_8b_lora_finetune\n   cd ~/examples/tp_llama3_8b_lora_finetune\n\n\n2. Download training scripts for the experiments.\n\n\nWe download training scripts for Llama modules, data modules, the config file of Llama3-8B, and the LoRA fine-tuning script from NxD.\nWe also download the requirements files for package dependencies and scripts to convert Llama checkpoint to NxD checkpoint.\n\n.. code-block:: ipython3\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/data_module.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/module_llama.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/tp_llama_hf_finetune_ptl.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lr.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/modeling_llama_nxd.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements.txt\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements_ptl.txt\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/training_utils.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/convert_checkpoints.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/test/integration/modules/lora/test_llama_lora_finetune.sh\n   wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py\n\n\n3. Install the additional requirements and give the right permissions to the shell script.\n\n.. 
code-block:: ipython3\n\n   python3 -m pip install -r requirements.txt\n   python3 -m pip install -r requirements_ptl.txt  # Currently we're supporting Lightning version 2.4.0\n   chmod +x test_llama_lora_finetune.sh\n   # prepare the dataset\n   python3 -c \"import nltk; nltk.download('punkt'); nltk.download('punkt_tab');\" \n\n\nPrepare the checkpoint and dataset\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\n1. Download the Llama3-8B checkpoint\n\nUse of this model is governed by the Meta license. In order to download the model weights and tokenizer, follow the instructions in meta-llama/Meta-Llama-3-8B.\n\nOnce granted access, you can download the model. For the purposes of this tutorial we assume you have saved the Llama-3-8B model in a directory called ``models/Llama-3-8B``.\n\n2. Convert the Llama checkpoint to an NxD checkpoint\n\nUse `convert_llama_weights_to_hf.py <https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py>`_ to convert the Llama checkpoint to a HuggingFace checkpoint. \nThis script will shard Llama3-8B into multiple partitions.\nIn order to save it as one partition, we need to set the flags ``max_shard_size=\"64GB\"`` and ``safe_serialization=False`` in ``model.save_pretrained()``.\n\n.. code-block:: ipython3\n\n   pip install blobfile tiktoken\n   cd ~/examples/tp_llama3_8b_lora_finetune\n   python convert_llama_weights_to_hf.py --input_dir models/Llama-3-8B/ --model_size 8B --llama_version 3 --output_dir models/Llama-3-8B-hf\n\n\nWhen the HuggingFace checkpoint is ready, we can convert it to an NxD checkpoint with:\n\n.. code-block:: ipython3\n\n   cd ~/examples/tp_llama3_8b_lora_finetune\n   python3 convert_checkpoints.py --tp_size 32 --qkv_linear 1 --kv_size_multiplier 4 --convert_from_full_state --config config.json --input_dir models/Llama-3-8B-hf/pytorch_model.bin --output_dir models/llama3_8b_tp32/pretrained_weight/\n\n\nWe then set ``PRETRAINED_PATH=\"models/llama3_8b_tp32\"`` in ``tp_llama3_8b_lora_finetune_ptl.sh``.\n\n\n3. Set up the HuggingFace Token for the Llama3 Tokenizer\n\nWe need to set ``HF_TOKEN`` in ``test_llama_lora_finetune.sh`` to configure your HuggingFace token for the Llama3-8B tokenizer.\n\nRefer to `Huggingface Access Tokens <https://huggingface.co/docs/hub/en/security-tokens>`_ to create your HuggingFace access tokens.\n\n\n4. Set the dataset for the fine-tuning job.\n\nIn this example, we will use `Dolly <https://huggingface.co/datasets/databricks/databricks-dolly-15k>`_, which is an open source dataset\nof instruction-following records on categories outlined in the `InstructGPT paper <https://arxiv.org/pdf/2203.02155>`_, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.\n\n.. code-block::\n\n   {\n   \"instruction\": \"Alice's parents have three daughters: Amy, Jessy, and what's the name of the third daughter?\",\n\n   \"context\": \"\",\n\n   \"response\": \"The name of the third daughter is Alice\"\n   }\n\nConfigure the following flags in ``test_llama_lora_finetune.sh`` to set up the dataset:\n\n.. code-block:: ipython3\n\n   --data_dir \"databricks/databricks-dolly-15k\" \\\n   --task \"open_qa\" \\\n\n\nRunning fine-tuning\n^^^^^^^^^^^^^^^^^^^\n\n1. Enable LoRA for fine-tuning \n\nIn ``test_llama_lora_finetune.sh``, we also need to enable LoRA by adding the argument below:\n\n.. code-block:: ipython3\n\n   --enable_lora \\\n\n\nThe default configuration for LoRA adapters in ``test_llama_lora_finetune.py`` is:\n\n.. 
code-block:: ipython3\n\n   target_modules = [\"q_proj\", \"v_proj\", \"k_proj\"] if flags.qkv_linear == 0 else [\"qkv_proj\"]      \n   lora_config = LoraConfig(\n      enable_lora=flags.enable_lora,\n      lora_rank=16,\n      lora_alpha=32,\n      lora_dropout=0.05,\n      bias=\"none\",\n      lora_verbose=True,\n      target_modules=target_modules,\n   )\n\n\n2. LoRA checkpoint\n\nThere are three checkpoint saving modes for LoRA fine-tuning, and we can select them with the LoRA flags ``save_lora_base`` and ``merge_lora``:\n\n* ``save_lora_base=False, merge_lora=False`` Save the LoRA adapter only.\n* ``save_lora_base=True, merge_lora=False``  Save both the base model and the LoRA adapter separately.\n* ``save_lora_base=True, merge_lora=True``   Merge the LoRA adapter into the base model and then save the base model.\n\n\nOther than the adapter, LoRA also needs to save the LoRA configuration file for adapter loading. \nThe configuration can be saved into the same checkpoint as the adapter, or saved as a separate JSON file.\nAn example configuration for LoRA saving is:\n\n.. code-block:: ipython3\n\n   lora_config = LoraConfig(\n      ...\n      save_lora_base=False,   # save the LoRA adapter only\n      merge_lora=False,       # do not merge LoRA adapter into the base model\n      save_lora_config_adapter=True,  # save LoRA checkpoint and configuration file in the same checkpoint\n   )\n\n\nAfter adding these flags, users can save the LoRA model with:\n\n.. code-block:: ipython3\n\n   import neuronx_distributed as nxd\n   nxd.save_checkpoint(\n      checkpoint_dir_str=\"lora_checkpoint\", \n      tag=\"lora\", \n      model=model\n   )\n\n\nThe output checkpoints of the LoRA adapter will be saved under the folder ``lora_checkpoint/lora/``. \n\n.. note::\n   If the LoRA configuration file is saved separately, it should be placed as ``lora_adapter/adapter_config.json``.\n\n\n3. Run the fine-tune script\n\n.. code-block:: ipython3\n\n   ./test_llama_lora_finetune.sh\n\n
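As a closing note, below is a minimal, hedged sketch of restoring the adapter saved above using the generic checkpoint API from the save/load developer guide. Whether the adapter can be restored this way depends on your LoRA configuration, so treat it as illustrative rather than the supported workflow.\n\n.. code-block:: ipython3\n\n   import neuronx_distributed as nxd\n\n   # Illustrative only: load the adapter saved under lora_checkpoint/lora/ back into\n   # a model built with the same LoraConfig.\n   nxd.load_checkpoint(\n      \"lora_checkpoint\",\n      tag=\"lora\",\n      model=model,\n   )\n"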
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/index.rst",
    "content": ".. _tp_tutorials:\n\nTutorials for NeuronX Distributed \n============================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Training Tutorials </libraries/neuronx-distributed/tutorials/training_tutorials>\n    Inference Tutorials </libraries/neuronx-distributed/tutorials/inference_tutorials>\n\n.. include:: /libraries/neuronx-distributed/tutorials/index.txt\n\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/index.txt",
    "content": "* :ref:`nxd_training_tutorials`\n* :ref:`nxd_inference_tutorials`"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/inference.rst",
    "content": ".. _tp_inference_tutorial:\n\nInference with Tensor Parallelism [Beta]\n===========================================================================\n\nBefore we start, let's install transformers.\n\n.. code:: ipython3\n\n    pip install transformers==4.26.0\n\nFor running model inference, we would need to trace the distributed\nmodel. Before we run the inference, let’s get a checkpoint that we can\nuse. Let’s run the below block of code:\n\n.. code:: ipython3\n\n    import torch\n    import torch_neuronx\n    import transformers\n    from transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n    name = \"bert-base-cased-finetuned-mrpc\"\n\n    model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n    torch.save({\"model\":model.state_dict()}, \"bert.pt\")\n\nIf you already have a checkpoint from the tensor parallel training tutorial or by running\ntraining from another source, feel free to skip the above step.\n\nOnce we have the checkpoint we are ready to trace the model and run\ninference against it. Let’s look at the example below:\n\n.. code:: ipython3\n\n    import os\n    import torch\n    import torch_neuronx\n    import transformers\n    from transformers import AutoTokenizer, AutoModelForSequenceClassification\n    from transformers.models.bert.modeling_bert import BertSelfAttention, BertSelfOutput\n\n    import neuronx_distributed\n    from neuronx_distributed.parallel_layers import layers, parallel_state\n\n\n    def encode(tokenizer, *inputs, max_length=128, batch_size=1):\n        tokens = tokenizer.encode_plus(\n            *inputs,\n            max_length=max_length,\n            padding='max_length',\n            truncation=True,\n            return_tensors=\"pt\"\n        )\n        return (\n            torch.repeat_interleave(tokens['input_ids'], batch_size, 0),\n            torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),\n            torch.repeat_interleave(tokens['token_type_ids'], batch_size, 0),\n        )\n\n\n    # Create the tokenizer and model\n    name = \"bert-base-cased-finetuned-mrpc\"\n    tokenizer = AutoTokenizer.from_pretrained(name)\n\n\n    # Set up some example inputs\n    sequence_0 = \"The company HuggingFace is based in New York City\"\n    sequence_1 = \"Apples are especially bad for your health\"\n    sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n\n    paraphrase = encode(tokenizer, sequence_1, sequence_2)\n    not_paraphrase = encode(tokenizer, sequence_1, sequence_1)\n\n    def get_model():\n        model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n        # Here we build a model with tensor-parallel layers.\n        # Note: If you already have a Model class that does this, we can use that directly\n        # and load the checkpoint in it.\n        class ParallelSelfAttention(BertSelfAttention):\n            def __init__(self, config, position_embedding_type=None):\n                super().__init__(config, position_embedding_type)\n                self.query = layers.ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)\n                self.key = layers.ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)\n                self.value = layers.ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)\n                self.num_attention_heads = self.num_attention_heads // parallel_state.get_tensor_model_parallel_size()\n         
       self.all_head_size = self.all_head_size // parallel_state.get_tensor_model_parallel_size()\n\n        class ParallelSelfOutput(BertSelfOutput):\n            def __init__(self, config):\n                super().__init__(config)\n                self.dense = layers.RowParallelLinear(config.hidden_size,\n                                        config.hidden_size,\n                                        input_is_parallel=True)\n\n        for layer in model.bert.encoder.layer:\n            layer.attention.self = ParallelSelfAttention(model.config)\n            layer.attention.output = ParallelSelfOutput(model.config)\n\n        # Here we created a checkpoint as mentioned above. We pass sharded=False, since the checkpoint\n        # we obtained is unsharded. In case you are using the checkpoint from the tensor-parallel training,\n        # you can set the sharded=True, as that checkpoint will contain shards from each tp rank.\n        neuronx_distributed.parallel_layers.load(\"bert.pt\", model, sharded=False)\n\n        # These io aliases would enable us to mark certain input tensors as state tensors. These\n        # state tensors are going to be device tensors.\n        io_aliases = {}\n        return model, io_aliases\n    \n    if __name__ == \"__main__\":\n\n        # Note how we are passing a function that returns a model object, which needs to be traced.\n        # This is mainly done, since the model initialization needs to happen within the processes\n        # that get launched internally within the parallel_model_trace.\n        model = neuronx_distributed.trace.parallel_model_trace(get_model, paraphrase, tp_degree=2)\n\n        # Once traced, we now save the trace model for future inference. This API takes care\n        # of saving the checkpoint from each tensor parallel worker\n        neuronx_distributed.trace.parallel_model_save(model, \"tp_models\")\n\n        # We now load the saved model and will run inference against it\n        model = neuronx_distributed.trace.parallel_model_load(\"tp_models\")\n        cpu_model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n        assert torch.argmax(model(*paraphrase)[0]) == torch.argmax(cpu_model(*paraphrase)[0])\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/inference_tutorials.rst",
    "content": ".. _nxd_inference_tutorials:\n\nInference Tutorials\n============================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n        \n    /src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb\n\n.. include:: /libraries/neuronx-distributed/tutorials/nxd_inference_tutorials.txt\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/neuronx_distributed_tutorials.txt",
    "content": "* :ref:`nxd_training_tutorials`\n* :ref:`nxd_inference_tutorials`"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_pp/llama_2_13b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain\n\nchmod +x run_llama2_13B_tp_pp.sh\nln -sf 13B_config_llama2/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama2_13B_tp_pp.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama2_13B_tp_pp.sh\""
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_pp/llama_2_70b.sh",
    "content": "#!/bin/bash\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain\n\nchmod +x run_llama2_70B_tp_pp.sh\nln -sf 70B_config_llama2/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama2_70B_tp_pp.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama2_70B_tp_pp.sh\""
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_pp/llama_31_70b.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain\n\nchmod +x run_llama3_70B_tp_pp.sh\nln -sf 70B_config_llama3.1/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 3 # change the version number to 2 for Llama-2 models\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama3_70B_tp_pp.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama3_70B_tp_pp.sh\""
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_pp/llama_3_70b.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain\n\nchmod +x run_llama3_70B_tp_pp.sh\nln -sf 70B_config_llama3/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 3 # change the version number to 2 for Llama-2 models\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/run_llama3_70B_tp_pp.sh\"\n\nsbatch --exclusive \\\n--nodes 32 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/run_llama3_70B_tp_pp.sh\""
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_pp/llama_tp_pp_setup.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain\nln -sf ~/neuronx-distributed/examples/training/llama/lr.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/training_utils.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/convert_checkpoints.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/get_dataset.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/modeling_llama_nxd.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/requirements.txt ./\n\npython3 -m pip install -r requirements.txt"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_zero1/llama_2_7b.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain\nchmod +x tp_zero1_llama2_7B_hf_pretrain.sh\nln -sf 7B_config_llama2/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 2\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh\"\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_zero1/llama_31_8b.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain\nchmod +x tp_zero1_llama3_8B_hf_pretrain.sh\ncp ./8B_config_llama3.1/config.json ./8B_config_llama3\nln -sf 8B_config_llama3.1/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\n\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 3\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama3_8B_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/tp_zero1_llama3_8B_hf_pretrain.sh\"\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_zero1/llama_3_8b.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain\nchmod +x tp_zero1_llama3_8B_hf_pretrain.sh\nln -sf 8B_config_llama3/config.json ./\n\nsudo rm -rf /home/ubuntu/.cache/\npip install --upgrade filelock\n\npython3 get_dataset.py --llama-version 3 # change the version number to 2 for Llama-2 models\n\nPATH=$PATH:/opt/slurm/bin/\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama3_8B_hf_pretrain.sh\"\n\nsbatch --exclusive \\\n--nodes 4 \\\n--cpus-per-task 128 \\\n--wrap=\"srun bash $(pwd)/tp_zero1_llama3_8B_hf_pretrain.sh\"\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd-source-code/llama_tp_zero1/llama_tp_zero1_setup.sh",
    "content": "#!/bin/bash\n# IMPORTANT: Neuron will stop supporting XLA-based training support in a future release. For now, this code sample is provided strictly for reference.\nset -eExuo\n\ncd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain\nln -sf ~/neuronx-distributed/examples/training/llama/training_utils.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/modeling_llama_nxd.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/get_dataset.py ./\nln -sf ~/neuronx-distributed/examples/training/llama/requirements.txt ./\n\npython3 -m pip install -r requirements.txt\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd_inference_tutorials.txt",
    "content": "* T5 inference tutorial :ref:`[html] </src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`\n* `Llama 3.2 1B inference example <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`__\n\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/nxd_training_tutorials.txt",
    "content": "* :ref:`tp_training_tutorial`\n* :ref:`gpt_neox_tp_zero1_tutorial`\n* :ref:`gpt_neox_20b_tp_zero1_tutorial`\n* :ref:`llama2_tp_pp_ptl_tutorial`\n* :ref:`llama2_7b_tp_zero1_ptl_finetune_tutorial`\n* :ref:`llama3_8b_tp_ptl_lora_finetune_tutorial`\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/training.rst",
    "content": ".. _tp_training_tutorial:\n\nTraining with Tensor Parallelism \n===========================================================\n\nKeeping the above changes made in :ref:`Developer guide <tp_developer_guide>`, let’s now run an end-to-end training\nwith tensor-parallelism. This section is adopted from `BERT pretraining\ntutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html#hf-bert-pretraining-tutorial>`__\nwhich used data-parallel training to scale the throughput. In this\nsection we modify that tutorial to showcase the use of\ntensor-parallelism which should enable us to scale the size of the\nmodel.\n\nSetting up environment:\n                       \nFor this experiment, we will use a trn1-32xl machine with the storage\nset to 512GB at least.\nFollow the instructions mentioned here: \n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`. \nIt is recommended to work out of python virtual env so as to avoid package installation issues.\n\nWe also have to install the ``neuronx-distributed`` package using the\nfollowing command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n\nMake sure the transformers version is set to ``4.26.0`` (Note: If you have transformers-neuronx in your environment, you need to uninstall it to avoid a conflict with the transformers version.)\n\nLet’s download the scripts and datasets for pretraining.\n\n.. code:: ipython3\n\n   mkdir -p ~/examples/tp_dp_bert_hf_pretrain\n   cd ~/examples/tp_dp_bert_hf_pretrain\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/tp_dp_bert_hf_pretrain/tp_dp_bert_large_hf_pretrain_hdf5.py\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/tp_dp_bert_hf_pretrain/requirements.txt\n   python3 -m pip install -r requirements.txt\n\nNext let’s download the tokenizer and the sharded datasets:\n\n.. code:: ipython3\n\n   mkdir -p ~/examples_datasets/\n   pushd ~/examples_datasets/\n   aws s3 cp s3://neuron-s3/training_datasets/bert_pretrain_wikicorpus_tokenized_hdf5/bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar .  --no-sign-request\n   tar -xf bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar\n   rm bert_pretrain_wikicorpus_tokenized_hdf5_seqlen128.tar\n   popd\n\nAt this point, you are all set to start training\n\nRunning training\n                \n\nWe first pre-compile the graphs using the ``neuron_parallel_compile``.\nThis process is similar to one discussed in the `BERT pretraining\ntutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html#hf-bert-pretraining-tutorial>`__\n. Let’s run the command below:\n\n.. code:: ipython3\n\n   cd ~/examples/tp_dp_bert_hf_pretrain\n   export XLA_DOWNCAST_BF16=1\n   neuron_parallel_compile torchrun --nproc_per_node=32 \\\n   tp_dp_bert_large_hf_pretrain_hdf5.py \\\n   --tensor_parallel_size 8 \\\n   --steps_this_run 10 \\\n   --batch_size 64 \\\n   --grad_accum_usteps 64 |& tee compile_log.txt\n\nThis script uses a tensor-parallel size of 8. This will automatically\nset the data-parallel degree to 4 (32 workers / tensor_parallel_size).\nOnce the graphs are compiled we can now run training and observe our\nloss go down. To run the training, we just the above command but without\n``neuron_parallel_compile``.\n\n.. 
code:: ipython3\n\n   XLA_DOWNCAST_BF16=1 torchrun --nproc_per_node=32 \\\n   tp_dp_bert_large_hf_pretrain_hdf5.py \\\n   --tensor_parallel_size 8 \\\n   --steps_this_run 10 \\\n   --batch_size 64 \\\n   --grad_accum_usteps 64 |& tee training_log.txt\n\nYou would notice that the throughput is lower than when you run\n``dp_bert_large_hf_pretrain_hdf5.py``. This is expected as the number of\ndata-parallel workers has gone down (from 32 to 4). However, if you\nopen ``neuron-top`` in another terminal, you should see that the memory\nutilization per core for this script is lower than for\n``dp_bert_large_hf_pretrain_hdf5.py``. Since the memory requirement has\ngone down, you can scale up the size of the model by increasing the\nnumber of layers, attention heads, or hidden sizes.\n\nThe loss curve should match the loss curve we would get from the\ndata-parallel counterpart.\n\nKnown Issues:\n~~~~~~~~~~~~~\n\n1. Currently the checkpoints dumped during training are sharded and\n   users would have to write a script to combine the checkpoints\n   themselves. This should be fixed in a future release.\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/training_llama_tp_pp.rst",
    "content": ".. _llama3_tp_pp_tutorial:\n\nTraining Llama-3.1-70B and Llama-3-70B with Tensor Parallelism and Pipeline Parallelism \n========================================================================================\n\n.. important::\n   Neuron will stop supporting XLA-based training support in a future release. For now, this tutorial is provided strictly for reference.\n\nIn this section, we showcase to pretrain Llama 3.1 and Llama3 70B models by using the tensor parallel, pipeline parallel, sequence parallel, activation\ncheckpoint as well as constant mask optimization in the ``neuronx-distributed`` package.\n\nSetting up environment:\n                       \nFor this experiment, we will use a ParallelCluster with at least 32 trn1-32xl compute nodes.\n`Train your model on ParallelCluster <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`__\nintroduces how to setup and use a ParallelCluster.\n\nWe also need to install the ``neuronx-distributed`` package using the following command:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n   git clone git@github.com:aws-neuron/neuronx-distributed.git\n\nLet’s download the scripts for pretraining:\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_tp_pp_setup.sh\n   :language: shell\n   :lines: 4-10\n\n\n\nIf you want to pre-train Llama3.1 70B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_31_70b.sh\n   :language: shell\n   :lines: 6-7\n\nIf you want to pre-train Llama3 70B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_3_70b.sh\n   :language: shell\n   :lines: 6-7\n\n\nThe below tutorial uses ``Llama3.1 70B`` as an example.\n\nFirst, let's get all the needed dependencies\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_tp_pp_setup.sh\n   :language: shell\n   :lines: 12\n    \n\nTo tokenize the data, we must request the tokenizer from hugging face and meta by following the instructions at the following link: `HuggingFace Llama 3 8B Model <https://huggingface.co/meta-llama/Meta-Llama-3-8B>`__ . \n\nUse of the Llama models is governed by the Meta license. In order to download the model weights and tokenizer, please visit the above website and accept their License before requesting access. After access has been granted, you may use the following python3 script along with your own hugging face token to download and save required tokenizer.\n\nRun the following from ``~/examples/tp_pp_llama_hf_pretrain`` directory:\n\n.. code:: ipython3\n\n   from huggingface_hub import login\n   from transformers import AutoTokenizer\n\n   login(token='your_own_hugging_face_token')\n\n   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')\n\n   tokenizer.save_pretrained(\".\")\n\nFor Llama3.1/Llama3, make sure your ``~/examples/tp_pp_llama2_hf_pretrain`` directory has the following files:\n\n.. code:: ipython3\n\n   './tokenizer_config.json', './special_tokens_map.json', './tokenizer.json'\n\n\nNext let’s download and pre-process the dataset:\n\n.. 
literalinclude:: nxd-source-code/llama_tp_pp/llama_3_70b.sh\n   :language: shell\n   :lines: 12\n\nIn case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/examples/tp_pp_llama2_hf_pretrain'. Use `repo_type` argument if needed.`` This could be because of a stale cache. Try deleting the cache using:\n\n.. code:: ipython3\n\n   sudo rm -rf /home/ubuntu/.cache/\n\nIn case you see an error of the following form when downloading data: ``NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.`` Try upgrading the ``datasets`` package:\n\n.. code:: ipython3\n\n   pip install -U datasets\n\n\nAt this point, you are all set to start training.\n\n\nRunning training\n^^^^^^^^^^^^^^^^\n\nWe first pre-compile the graphs using ``neuron_parallel_compile``. Let’s run the command below:\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_3_70b.sh\n   :language: shell\n   :lines: 16-19\n\nThis script uses a tensor-parallel size of 8 and a pipeline-parallel size of 8.\nTo run the training, we use the same command as above, but without ``neuron_parallel_compile``.\n\n.. literalinclude:: nxd-source-code/llama_tp_pp/llama_3_70b.sh\n   :language: shell\n   :lines: 21-24\n\n\nTo achieve better performance, the script applies a few techniques:\n\n**Sequence Parallelism and Selective Activation Checkpointing**:\n\nAs explained in the :ref:`Activation Memory Recomputation Doc <activation_memory_reduction>`, both `Sequence Parallelism` \nand `Selective activation checkpointing` can help with activation memory reduction, thereby allowing us to fit bigger \nmodels on fewer devices. \nPlease refer to the :ref:`Activation Memory Reduction Developer Guide <activation_memory_reduction_developer_guide>` on how to \nenable sequence parallelism and selective activation checkpointing. \n\n\n**GQAQKVColumnParallelLinear Layer**:\n\nIn the Llama 70B GQA module, there are `8` K and V attention heads, whereas Q has `64` attention heads. Since the number of \nattention heads should be divisible by the tensor_parallel_degree, we would end up using a tp_degree of 8. Hence, to fit \na 70B model, we would have to use a higher pipeline-parallel degree. Using a higher pipeline-parallel degree works well \nwhen the global batch size is very high; however, as the data-parallel degree increases at larger cluster sizes, the \nbatch size per node decreases. This results in a larger `pipeline bubble <https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/>`__, \nthereby reducing performance. To mitigate this issue, one can use the :ref:`GQAQKVColumnParallelLinear <parameters-11>` layer with the\n`kv_size_multiplier` set to 4. This repeats the KV heads and makes them 32, which allows tensor parallelism \nwith a tp_degree of 32. This reduces the activation memory per device and thereby eventually allows using a pipeline-parallel \ndegree of 4. This can be enabled by passing the arguments:\n\n.. code-block:: python\n\n   torchrun $DISTRIBUTED_ARGS run_llama_nxd.py \\\n   ... \\\n   --qkv_linear 1 \\\n   --kv_replicator 4 \\\n   --tb_dir $tb_dir |& tee $LOG_PATH/log\n\nThe above changes are already included in `run_llama_70b_tp_pp.sh`. For the Llama 13B model we only do 8-way tensor parallelism, so\nwe do not need this change.\n\n**Fusing Q,K,V layers**:\n\nIn the GQAQKVColumnParallelLinear layer, the parallel matrix multiplies are coalesced to improve throughput. 
Currently it's enabled by default. To disable it, set ``--fuse_qkv 0``.\n\n.. note::\n    Because the layers above are coalesced, ensure that any pretrained checkpoint loaded for fine-tuning has the q,k,v layers coalesced. Otherwise, preprocessing is required to fuse these layers in the checkpoint. Follow this :ref:`Checkpoint Conversion Guide <checkpoint_conversion>` and set ``--fuse_qkv`` to coalesce the layers in the checkpoint. \n\n\n**Flash Attention**:\n\nWe're introducing a flash attention function for better performance/memory efficiency. Currently it's enabled by default; to disable it, set ``--use_flash_attention 0``.\n\n\n`Save/Load Checkpoint` (refer to :ref:`API Guide <api_guide>` for more context about checkpoint APIs).\n\nTo enable checkpoint saving, add the following flags to ``run_llama_70b_tp_pp.sh`` (a sketch showing these flags added to the ``torchrun`` command appears at the end of this tutorial):\n\n* ``--checkpoint_freq`` Number of steps between checkpoint saves; set to -1 to disable checkpoint saving. This should be set to -1 when pre-compiling graphs.\n* ``--checkpoint_dir`` Directory to save the checkpoint to.\n* ``--num_kept_checkpoint`` Number of checkpoints to keep; older checkpoints will be deleted. Set to -1 to keep all saved checkpoints.\n* ``--save_load_xser`` Save with torch-xla serialization to reduce save time; it's recommended to enable xser for significantly faster save/load.\n* ``--async_checkpoint_saving`` Whether to use asynchronous checkpoint saving to reduce saving time.\n\nTo enable checkpoint loading, add the following flags to ``run_llama_70b_tp_pp.sh``:\n\n* ``--loading_step`` Step to retrieve the checkpoint from; set to -1 to disable checkpoint loading. Set to ``latest_if_exists`` to load the latest checkpoint under ``checkpoint_dir``.\n* ``--checkpoint_dir`` Directory to load the checkpoint from.\n* ``--save_load_xser`` Load with torch-xla serialization to reduce load time; it's recommended to enable xser for significantly faster save/load. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa. \n\nLoad pretrained model:\n\nWe also provide an option to load from a pretrained HF model. Before loading, convert the full model to a sharded model with ``convert_checkpoints.py``:\n\n.. code:: ipython3\n\n   python3 convert_checkpoints.py --tp_size <tp_size> --pp_size <pp_size> --n_layers <number_of_layers> --input_dir <path_to_full_model> --output_dir <sharded_model_path> --convert_from_full_model\n\nThen add the ``--pretrained_weight_dir <sharded_model_path>`` flag to ``run_llama_70b_tp_pp.sh``.\n\n\nConvert a sharded model to a full model with ``convert_checkpoints.py``:\n\n.. code:: ipython3\n\n   python3 convert_checkpoints.py --tp_size <tp_size> --pp_size <pp_size> --n_layers <number_of_layers> --input_dir <sharded_model_dir> --output_dir <full_model_dir> --convert_to_full_model --kv_size_multiplier <kv_size_multiplier> --config config.json --qkv_linear True --load_xser True\n
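\nFor reference, below is a sketch of how the checkpoint-saving flags described earlier might be added to the ``torchrun`` command in ``run_llama_70b_tp_pp.sh``. The flag values are illustrative only, and ``$CHECKPOINT_DIR`` is a placeholder for a path of your choice:\n\n.. code:: ipython3\n\n   torchrun $DISTRIBUTED_ARGS run_llama_nxd.py \\\n   ... \\\n   --checkpoint_freq 100 \\\n   --checkpoint_dir $CHECKPOINT_DIR \\\n   --num_kept_checkpoint 2 \\\n   --tb_dir $tb_dir |& tee $LOG_PATH/log\n"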
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.rst",
    "content": ".. _llama3_tp_zero1_tutorial:\n\nTraining Llama3.1-8B and Llama3-8B with Tensor Parallelism and ZeRO-1 Optimizer\n=================================================================================\n\n.. important::\n   Neuron will stop supporting XLA-based training support in a future release. For now, this tutorial is provided strictly for reference.\n\nIn this section, we showcase how to pre-train Llama3.1-8B and Llama3 8B models on four Trn1.32xlarge instances \nusing the Neuron Distributed library. We will use AWS ParallelCluster to orchestrate the training jobs. \nTo train the LLama model in this example, we will apply the following optimizations using the \nNeuron Distributed library:\n\n1. :ref:`Tensor Parallelism <tensor_parallelism_overview>`\n2. :ref:`Sequence Parallel <activation_memory_reduction>`\n3. :ref:`Selective checkpointing <activation_memory_reduction>`\n4. :ref:`ZeRO-1 <zero1-gpt2-pretraining-tutorial>`\n\n\nSetting up environment:\n^^^^^^^^^^^^^^^^^^^^^^^\n                       \nFor this experiment, we will use AWS ParallelCluster with at least four Trn1.32xlarge compute nodes.\n`Train your model on ParallelCluster <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`__\nintroduces how to setup and use a ParallelCluster.\nTo setup the packages on the headnode of the ParallelCluster, follow the instructions mentioned here:\n:ref:`Install PyTorch Neuron on Trn1 <setup-torch-neuronx>`.\n\nWe also need to install and clone the ``neuronx-distributed`` package inside the virtual env using the following commands:\n\n.. code:: ipython3\n\n   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n   git clone git@github.com:aws-neuron/neuronx-distributed.git\n\nLet’s download the scripts for pretraining:\n\n\n1. Navigate to a directory to hold our experiments\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_tp_zero1_setup.sh\n   :language: shell\n   :lines: 4\n\n2. Link the training scripts for our experiments\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_tp_zero1_setup.sh\n   :language: shell\n   :lines: 5-8\n\nIf you want to pre-train Llama3.1 8B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_31_8b.sh\n   :language: shell\n   :lines: 5-7\n\nIf you want to pre-train Llama3 8B, you would need to run the following steps -\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_3_8b.sh\n   :language: shell\n   :lines: 5-6\n\n3. Installing the additional requirements\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_tp_zero1_setup.sh\n   :language: shell\n   :lines: 10\n\n\nTo tokenize the data, we must request the tokenizer from hugging face and meta by following the instructions at the following link: `HuggingFace Llama 3 8B Model <https://huggingface.co/meta-llama/Meta-Llama-3-8B>`__ . \n\nUse of the Llama models is governed by the Meta license. In order to download the model weights and tokenizer, please visit the above website and accept their License before requesting access. After access has been granted, you may use the following python3 script along with your own hugging face token to download and save the tokenizer.\n\nRun the following from ``~/examples/tp_zero1_llama_hf_pretrain`` directory:\n\n.. 
code:: ipython3\n\n   from huggingface_hub import login\n   from transformers import AutoTokenizer\n\n   login(token='your_own_hugging_face_token')\n\n   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')\n\n   tokenizer.save_pretrained(\".\")\n\nFor Llama3.1/Llama3, make sure your ``~/examples/tp_zero1_llama_hf_pretrain`` directory has the following files:\n\n.. code:: ipython3\n\n   './tokenizer_config.json', './special_tokens_map.json', './tokenizer.json'\n\nNext let’s download and pre-process the dataset:\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_3_8b.sh\n   :language: shell\n   :lines: 11\n\n`Note:` In case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/examples/tp_zero1_llama_hf_pretrain'. Use `repo_type` argument if needed.``\nThis could be because of a stale cache. Try deleting the cache using:\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_3_8b.sh\n   :language: shell\n   :lines: 8\n\n\nAt this point, you are all set to start training. The tutorial below uses ``Llama3 8B`` as an example.\n\nRunning training\n^^^^^^^^^^^^^^^^\n\nBy this step, the ParallelCluster is all set up for running experiments.\nBefore we run training, we first pre-compile the graphs using the :ref:`neuron_parallel_compile <pytorch-neuronx-parallel-compile-cli>`.\nLet’s run the command below:\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_3_8b.sh\n   :language: shell\n   :lines: 15-18\n\nThis script uses a tensor-parallel size of 8.\nThis will automatically set the zero-1 sharding degree to 16 (4 * 32 workers / tensor_parallel_size).\n\n`Note`: You can use any number of nodes in this case; you would just need to adjust the number of nodes in the above\nslurm command accordingly. Also, the number of nodes used in the parallel_compile command should be the same as in the actual\ntraining run. This is because, as the number of nodes changes, the data-parallel degree changes too. This would\nresult in more workers participating in operations like `gradient all-reduce`, which would result in new graphs getting\ncreated.\n\nOnce the graphs are compiled, we can run training and observe that the loss goes down.\nTo run the training, we just run the above command, but without ``neuron_parallel_compile``.\n\n.. literalinclude:: nxd-source-code/llama_tp_zero1/llama_3_8b.sh\n   :language: shell\n   :lines: 20-23\n\n\nPerformance\n^^^^^^^^^^^^\n\nTo achieve better performance, the script applies a few techniques:\n\n**Sequence Parallelism and Selective Activation Checkpointing**\n\nAs explained in the :ref:`Activation Memory Recomputation Doc <activation_memory_reduction>`, both `Sequence Parallelism`\nand `Selective activation checkpointing` can help with activation memory reduction, thereby allowing us to fit bigger\nmodels on fewer devices. 
\nPlease refer to the :ref:`Activation Memory Reduction Developer Guide <activation_memory_reduction_developer_guide>` on how to\nenable sequence parallelism and selective activation checkpointing.\n\n**Coalescing Q, K, V layers**\n\nWe coalesced parallel matrix multiplies to improve throughput:\n\n* We coalesced ``query``, ``key`` and ``value`` into one matrix multiply\n* We coalesced ``gate_proj`` and ``up_proj`` into one matrix multiply\n\nPlease check ``modeling_llama_nxd.py`` for details.\n`Note:` Because we coalesced the layers above, pretrained checkpoints cannot be loaded out of the box for fine-tuning and would require preprocessing. The Q,K,V layers\nand the gate_proj and up_proj layers need to be coalesced in the checkpoint before loading.\n\n**Logging**\n\nCurrently, for better performance, we log loss values every 10 steps. Frequent logging results in frequent\nsyncs between device and CPU, which are expensive. Hence, it is recommended to log less frequently if possible.\n\n\n**Flash Attention**\n\nWe're introducing a flash attention function for better performance/memory efficiency. Currently it's enabled by default; to disable it, set ``--use_flash_attention 0``.\n\nCheckpointing\n^^^^^^^^^^^^^^\n\nCurrently, by default, the checkpoint is saved at the end of training. You can modify that behaviour by saving\nthe checkpoint after every `N steps` inside the training loop:\n\n.. code:: ipython3\n\n   from neuronx_distributed.parallel_layers import checkpointing\n   if global_step % every_n_steps_checkpoint == 0:\n      state_dict = {\n         \"model\": model.state_dict(),\n         \"global_step\": global_step,\n         \"epoch\": epoch,\n         \"scheduler\": scheduler.state_dict()\n      }\n      checkpointing.save(state_dict, flags.output_dir)\n      optimizer.save_sharded_state_dict(flags.output_dir)\n\nHere we have to save the model state_dict using the `checkpointing.save` API and the optimizer state_dict using\n`optimizer.save_sharded_state_dict`. This is because, currently, the `checkpointing.save` API only saves on\ndata-parallel rank 0, while in the case of the Zero1 Optimizer, the optimizer states are distributed across all data-parallel\nranks. Hence, we use the Zero1 Optimizer's save API to save the optimizer states.\n\n`Time to save a checkpoint:`\n\nCheckpoint save time can vary depending on where the checkpoint is saved. If the checkpoint is saved in\nthe `home` directory, the checkpointing time can be higher. This time can be reduced by 4x if the checkpoint\nis dumped to the FSx file system.\n\nBy default, the `checkpointing.save` API allows one tensor-parallel rank at a time to save the checkpoint. This is done\nin order to avoid HOST OOM. When all tensor-parallel ranks try to save at the same time, they would end up copying\nweights to CPU at the same time. This can result in HOST OOM. `Note:` Since we use the `XLA_DOWNCAST_BF16` flag for\nBF16 training, even though the weights on device are in bf16, the weights on CPU are copied in FP32 format. In case\nyou want to avoid this typecasting from BF16 to FP32 when copying weights from device to CPU for checkpoint saving,\nyou can pass `down_cast_bf16=True` to the `checkpointing.save` API as follows:\n\n.. 
code:: ipython3\n\n   from neuronx_distributed.parallel_layers import checkpointing\n   if global_step % every_n_steps_checkpoint == 0:\n      state_dict = {\n         \"model\": model.state_dict(),\n         \"global_step\": global_step,\n         \"epoch\": epoch,\n         \"scheduler\": scheduler.state_dict()\n      }\n      checkpointing.save(state_dict, flags.output_dir, down_cast_bf16=True)\n\nThis should not only reduce the HOST memory pressure when saving weights, but at the same time reduce model checkpointing \ntime by half. `Note:` We are saving checkpoint in sharded format, wherein each tensor-parallel rank is \nsaving one shard. To deploy these pretrained models, one would have to combine these shards by loading them and \nconcatenating the tensor-parallel layers together. (We are working on a checkpoint conversion script that \ncombines the shards into a single checkpoint)\n\nIn addition to the above method, if we want to speed up checkpoint saving for the model further, we can do so by:\n\n.. code:: ipython3\n\n   from neuronx_distributed.parallel_layers import checkpointing\n   if global_step % every_n_steps_checkpoint == 0:\n      state_dict = {\n         \"model\": model.state_dict(),\n         \"global_step\": global_step,\n         \"epoch\": epoch,\n         \"scheduler\": scheduler.state_dict()\n      }\n      checkpointing.save(state_dict, flags.output_dir, down_cast_bf16=True, save_xser=True)\n\nThe `save_xser` uses torch-xla's `xser.save <https://pytorch.org/xla/release/2.1/index.html#saving-and-loading-xla-tensors>`__ \nto save the tensors serially. This API will copy one tensor at a time to the disk. This will allow all the ranks to \nsave the checkpoint at the same time. This speeds up checkpoint saving especially for large models as all ranks \nare saving at the same time. Moreover, the risk of HOST OOM is completely eliminated because only one tensor is copied \nto CPU at a time. \n\n`Note:` If we use `save_xser` to save the checkpoint, we would have to pass `load_xser` to the \n`checkpoint.load` API. \nAlso, if you use `save_xser`, the checkpoint folder would contain a `.pt` file for each tensor instead of a \nsingle `.pt` for the entire state_dict. To read this checkpoint in your checkpoint conversion script, you would \nhave to use `xser.load <https://pytorch.org/xla/release/2.1/index.html#saving-and-loading-xla-tensors>`__ API \ninstead of `torch.load` to load the checkpoint. The `xser.load` should load the serialized checkpoint and return \nthe full state_dict.\n\nFinally, to speed up optimizer saving time, you can increase the number of workers saving at the same time. \nThis can be done as follows:\n\n.. code:: ipython3\n\n   if global_step % every_n_steps_checkpoint == 0:\n      ...\n      optimizer.save_sharded_state_dict(flags.output_dir, num_workers_per_step=32)\n\nBy default, `num_workers_per_step` is set to 8.\n"
  },
  {
    "path": "libraries/neuronx-distributed/tutorials/training_tutorials.rst",
    "content": ".. _nxd_training_tutorials:\n\nTraining Tutorials\n============================================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n        \n    Training using Tensor Parallelism </libraries/neuronx-distributed/tutorials/training>\n    Training Llama 3.1 8B/Llama 3 8B using TP and ZeRO-1 </libraries/neuronx-distributed/tutorials/training_llama_tp_zero1>\n    Training Llama 3.1 70B/Llama 3 70B using TP and PP </libraries/neuronx-distributed/tutorials/training_llama_tp_pp>\n    Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning </libraries/neuronx-distributed/tutorials/finetune_llama3_8b_ptl_lora>\n\n.. include:: /libraries/neuronx-distributed/tutorials/nxd_training_tutorials.txt\n\n"
  },
  {
    "path": "libraries/nxd-inference/_templates/model_card.jinja.rst",
    "content": ".. -*- mode: rst -*-\n\n.. meta::\n   :description: Learn how to get started with the {{ data.model.display_name }} model with Neuron, using recommended online and offline serving configurations.\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}:\n\n{{ data.model.display_name }}\n=====================================\n\n.. toctree::\n   :hidden:\n\nLearn how to get started with the {{ data.model.display_name }} model with Neuron, using recommended online and offline serving configurations. \n\nAbout {{ data.model.display_name }}\n-------------------------------------------------------------------\n\n{{ data.model.description }}\n\nFor detailed model specifications, capabilities, and checkpoints, see the official `{{ data.model.checkpoint }} <https://huggingface.co/{{ data.model.checkpoint }}>`_ model card on Hugging Face.\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-quickstart:\n\nQuickstart\n-----------------\n\nThe following examples show how to use {{ data.model.display_name }} with NeuronX Distributed Inference (NxDI) framework and vLLM for both online and offline use cases on Neuron devices.\n\n.. admonition:: Before you start...\n   :class: note\n\n   Before running the sample code below, review how to set up your environment by following the :ref:`NxDI Setup Guide <nxdi-setup>`. Additionally, download the model checkpoint to a local directory of your choice (such as ``~/models/{{ data.model.name }}/``).\n\n{%- macro render_nxdi_code(config) %}\n\n.. code-block:: python\n   :linenos:\n   :emphasize-lines: 9,10,11,12{% for key, value in config.neuron.items() %}{% if key not in [\"extra\"] %},{{ loop.index + 12 }}{% endif %}{% endfor %}\n\n   import torch\n   from transformers import AutoTokenizer, GenerationConfig\n\n   from neuronx_distributed_inference.models.config import NeuronConfig\n   from neuronx_distributed_inference.models.llama.modeling_llama import LlamaInferenceConfig, NeuronLlamaForCausalLM\n   from neuronx_distributed_inference.modules.generation.sampling import prepare_sampling_params\n   from neuronx_distributed_inference.utils.hf_adapter import HuggingFaceGenerationAdapter, load_pretrained_config\n\n   MODEL_PATH = \"~/models/{{ data.model.name }}/\"\n   TRACED_MODEL_PATH = \"~/traced_models/{{ data.model.name }}/\"\n   SEED = 0\n   NEURON_CONFIG = NeuronConfig({% for key, value in config.neuron.items() %}{% if key not in [\"extra\"] %}\n      {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n      {%- endif %}{%- endfor %}\n   )\n\n   # Set random seed for reproducibility\n   torch.manual_seed(SEED)\n\n   # Initialize configs and tokenizer.\n   generation_config = GenerationConfig.from_pretrained(MODEL_PATH)\n   eos = generation_config.eos_token_id\n   generation_config_kwargs = {\n      \"do_sample\": True,\n      \"top_k\": 1,\n      \"pad_token_id\": eos[0] if isinstance(eos, list) else eos,\n   }\n   generation_config.update(**generation_config_kwargs)\n   config = LlamaInferenceConfig(NEURON_CONFIG, load_config=load_pretrained_config(MODEL_PATH))\n\n   tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, padding_side=\"right\")\n   tokenizer.pad_token = tokenizer.eos_token\n\n   # Compile and save model.\n   print(\"Compiling and saving model...\")\n   model = NeuronLlamaForCausalLM(MODEL_PATH, config)\n   model.compile(TRACED_MODEL_PATH)\n   tokenizer.save_pretrained(TRACED_MODEL_PATH)\n\n   # Load from compiled 
checkpoint.\n   print(\"Loading model from compiled checkpoint...\")\n   model = NeuronLlamaForCausalLM(TRACED_MODEL_PATH)\n   model.load(TRACED_MODEL_PATH)\n\n   # Generate outputs.\n   print(\"Generating outputs...\")\n   prompts = [\"I believe the meaning of life is\", \"The color of the sky is\"]\n   sampling_params = prepare_sampling_params(\n      batch_size=NEURON_CONFIG.batch_size,\n      top_k=[10, 5],\n      top_p=[0.5, 0.9],\n      temperature=[0.9, 0.5],\n   )\n   print(f\"Prompts: {prompts}\")\n\n   inputs = tokenizer(prompts, padding=True, return_tensors=\"pt\")\n   generation_model = HuggingFaceGenerationAdapter(model)\n   outputs = generation_model.generate(\n      inputs.input_ids,\n      generation_config=generation_config,\n      attention_mask=inputs.attention_mask,\n      max_length=model.config.neuron_config.max_length,\n      sampling_params=sampling_params,\n   )\n\n   output_tokens = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)\n   print(\"Generated outputs:\")\n   for i, output_token in enumerate(output_tokens):\n      print(f\"Output {i}: {output_token}\")\n\n{%- endmacro %}\n\n{%- macro render_offline_code(config) %}\n\n.. code-block:: python\n   :linenos:\n   :emphasize-lines: 9{% for key, value in config.vllm.items() %}{% if key != \"extra\" %},{{ loop.index + 9 }}{% endif %}{% endfor %}\n\n   import os\n\n   os.environ[\"VLLM_NEURON_FRAMEWORK\"] = \"neuronx-distributed-inference\"\n\n   from vllm import LLM, SamplingParams\n\n   # Create an LLM.\n   llm = LLM(\n      model=\"~/models/{{ data.model.name }}/\",{% for key, value in config.vllm.items() %}{% if key != \"extra\" %}\n      {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n      {%- endif %}{%- endfor %}\n   )\n\n   # Sample prompts.\n   prompts = [\n      \"The president of the United States is\",\n      \"The capital of France is\",\n      \"The future of AI is\",\n   ]\n   outputs = llm.generate(prompts, SamplingParams(top_k=1))\n\n   for output in outputs:\n      prompt = output.prompt\n      generated_text = output.outputs[0].text\n      print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\n{%- endmacro %}\n\n{%- macro render_online_code(config) %}\n\n.. code-block:: bash\n   :linenos:\n   :emphasize-lines: 2,3{% for key, value in config.vllm.items() %}{% if key != \"extra\" %},{{ loop.index + 3 }}{% endif %}{% endfor %}\n\n   vllm serve \\\n      --model=\"~/models/{{ data.model.name }}/\"{% for key, value in config.vllm.items() %}{% if key != \"extra\" %} \\\n      --{{ key | replace(\"_\", \"-\") }}{% if value is sameas true %}{% elif value is mapping %}='{{ value | tojson | replace(\"True\", \"true\") | replace(\"False\", \"false\") }}'{% elif value is string %}=\"{{ value }}\"{% else %}={{ value }}{% endif %}{% endif %}{% endfor %} \\\n      --port=8080 \n\nOnce the vLLM server is online, submit requests using the example below:\n\n.. literalinclude:: ../../examples/vllm_client.py\n   :linenos:\n   :language: python\n\n{%- endmacro %}\n\n.. tab-set::\n\n   .. tab-item:: NxDI\n      :selected:\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. 
tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_nxdi_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n   .. tab-item:: Offline serving\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_offline_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n   .. tab-item:: Online serving\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_online_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-benchmarks:\n\n{% if data.benchmarks %}\nBenchmarks\n------------------------\n\nSelect a metric to view performance benchmarks for various **batch sizes** and **input|output** sequence length combinations.\n\n.. tab-set::\n\n   .. tab-item:: Latency\n      :sync: Latency\n\n      Measured in: seconds (s)\n\n      .. df-table::\n         :header-rows: 1\n\n         latency_data = {{ data.benchmarks.Latency | tojson }}\n         df_raw = pd.DataFrame(latency_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(3)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: Throughput\n      :sync: Throughput\n\n      Measured in: tokens per second (tok/s)\n\n      .. df-table::\n         :header-rows: 1\n\n         throughput_data = {{ data.benchmarks.Throughput | tojson }}\n         df_raw = pd.DataFrame(throughput_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].max().round(2)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: TTFT\n      :sync: TTFT\n\n      Measured in: seconds (s)\n\n      .. df-table::\n         :header-rows: 1\n\n         ttft_data = {{ data.benchmarks.TTFT | tojson }}\n         df_raw = pd.DataFrame(ttft_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(3)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: ITL\n      :sync: ITL\n\n      Measured in: seconds (s)\n\n      .. 
df-table::\n         :header-rows: 1\n\n         itl_data = {{ data.benchmarks.ITL | tojson }}\n         df_raw = pd.DataFrame(itl_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(5)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n.. admonition:: Tip\n   :class: tip\n\n   Further improvements and optimizations are possible through the :ref:`Neuron Kernel Interface (NKI) <neuron-nki>`.\n\n\n{% endif %}\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-neuron-config:\n\nRecommended configuration\n--------------------------\n\n{% if data.recommendations %}\nSelect a use case to view the recommended Neuron configuration. For the definitions of the flags listed below, see the :ref:`NxDI API reference guide <nxd-inference-api-guide>`.\n\n{%- set throughput_config = data.configurations[data.recommendations.Throughput.config] %}\n{%- set latency_config = data.configurations[data.recommendations.Latency.config] %}\n\n.. tab-set::\n\n   .. tab-item:: Offline serving\n      :sync: Throughput\n\n      For most use cases, the configuration below can be used to optimize **throughput** on Neuron devices. You can also increase the ``batch_size`` or use quantization to improve throughput even further. \n\n      {% if throughput_config.dp_degree != 1 %}\n      For this specific configuration, we recommend using **Data Parallelism (DP) of {{ throughput_config.dp_degree }}**. For more details on how to implement data parallelism, refer to the :ref:`Data Parallelism on Trn2 <nxdi-trn2-llama3.3-70b-dp-tutorial>` tutorial.\n      {% endif %}\n\n      :bdg-info:`{{ throughput_config.instance_type }}`\n\n      .. code-block:: python\n         :linenos:\n\n         NeuronConfig({% for key, value in throughput_config.neuron.items() %}{% if key not in [\"extra\"] %}\n            {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n            {%- endif %}{%- endfor %}\n         )\n\n   .. tab-item:: Online serving\n      :sync: Latency\n\n      For most use cases, the configuration below can be used to optimize **latency** on Neuron devices.\n\n      {% if latency_config.dp_degree != 1 %}\n      For this specific configuration, we recommend using **Data Parallelism (DP) of {{ latency_config.dp_degree }}**. For more details on how to implement data parallelism, refer to the :ref:`Data Parallelism on Trn2 <nxdi-trn2-llama3.3-70b-dp-tutorial>` tutorial.\n      {% endif %}\n\n      :bdg-info:`{{ latency_config.instance_type }}`\n\n      .. code-block:: python\n         :linenos:\n\n         NeuronConfig({% for key, value in latency_config.neuron.items() %}{% if key not in [\"extra\"] %}\n            {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n            {%- endif %}{%- endfor %}\n         )\n\n{% else %}\n.. note::\n\n   The recommended configuration for the {{ data.model.display_name }} model is coming soon...\n\n{% endif %}\n"
  },
  {
    "path": "libraries/nxd-inference/_templates/model_card_qwen3.jinja.rst",
    "content": ".. -*- mode: rst -*-\n\n.. meta::\n   :description: Learn how to get started with the {{ data.model.display_name }} model with Neuron, using recommended online and offline serving configurations.\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}:\n\n{{ data.model.display_name }}\n=====================================\n\n.. toctree::\n   :hidden:\n\nLearn how to get started with the {{ data.model.display_name }} model with Neuron, using recommended online and offline serving configurations. \n\nAbout {{ data.model.display_name }}\n-------------------------------------------------------------------\n\n{{ data.model.description }}\n\nFor detailed model specifications, capabilities, and checkpoints, see the official `{{ data.model.checkpoint }} <https://huggingface.co/{{ data.model.checkpoint }}>`_ model card on Hugging Face.\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-quickstart:\n\nQuickstart\n-----------------\n\nThe following examples show how to use {{ data.model.display_name }} with NeuronX Distributed Inference (NxDI) framework and vLLM for both online and offline use cases on Neuron devices.\n\n.. admonition:: Before you start...\n   :class: note\n\n   Before running the sample code below, review how to set up your environment by following the :ref:`NxDI Setup Guide <nxdi-setup>`. Additionally, download the model checkpoint to a local directory of your choice (such as ``~/models/{{ data.model.name }}/``).\n\n{%- macro render_nxdi_code(config) %}\n\n.. code-block:: python\n   :linenos:\n   :emphasize-lines: 11,12{% for key, value in config.neuron.items() %}{% if key not in [\"extra\"] %},{{ loop.index + 12 }}{% endif %}{% endfor %}\n\n   import torch\n   from transformers import AutoTokenizer, GenerationConfig\n\n   from neuronx_distributed_inference.models.config import MoENeuronConfig, OnDeviceSamplingConfig\n   from neuronx_distributed_inference.models.qwen3_moe.modeling_qwen3_moe import Qwen3MoeInferenceConfig, NeuronQwen3MoeForCausalLM\n   from neuronx_distributed_inference.utils.hf_adapter import HuggingFaceGenerationAdapter, load_pretrained_config\n\n   MODEL_PATH = \"~/models/{{ data.model.name }}/\"\n   TRACED_MODEL_PATH = \"~/traced_models/{{ data.model.name }}/\"\n   SEED = 0\n   NEURON_CONFIG = MoENeuronConfig({% for key, value in config.neuron.items() %}{% if key not in [\"extra\"] %}\n      {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n      {%- endif %}{%- endfor %}\n   )\n\n   # Set random seed for reproducibility\n   torch.manual_seed(SEED)\n\n   # Initialize configs and tokenizer.\n   generation_config = GenerationConfig.from_pretrained(MODEL_PATH)\n   config = Qwen3MoeInferenceConfig(NEURON_CONFIG, load_config=load_pretrained_config(MODEL_PATH))\n\n   tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, padding_side=\"right\")\n   tokenizer.pad_token = tokenizer.eos_token\n\n   # Compile and save model.\n   print(\"Compiling and saving model...\")\n   model = NeuronQwen3MoeForCausalLM(MODEL_PATH, config)\n   model.compile(TRACED_MODEL_PATH)\n   tokenizer.save_pretrained(TRACED_MODEL_PATH)\n\n   # Load from compiled checkpoint.\n   print(\"Loading model from compiled checkpoint...\")\n   model = NeuronQwen3MoeForCausalLM(TRACED_MODEL_PATH)\n   model.load(TRACED_MODEL_PATH)\n\n   # Generate outputs.\n   print(\"\\nGenerating outputs...\")\n   prompt = \"Give me a short introduction to large language models.\"\n   
messages = [\n      {\"role\": \"user\", \"content\": prompt}\n   ]\n   text = tokenizer.apply_chat_template(\n      messages,\n      tokenize=False,\n      add_generation_prompt=True,\n      enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.\n   )\n   inputs = tokenizer([text], padding=True, return_tensors=\"pt\")\n   generation_model = HuggingFaceGenerationAdapter(model)\n   outputs = generation_model.generate(\n      inputs.input_ids,\n      generation_config=generation_config,\n      attention_mask=inputs.attention_mask,\n      max_length=model.config.neuron_config.max_length,\n   )\n\n   output_tokens = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)\n   print(\"Generated outputs:\")\n   for i, output_token in enumerate(output_tokens):\n      print(f\"Output {i}: {output_token}\")\n\n{%- endmacro %}\n\n{%- macro render_offline_code(config) %}\n\n.. code-block:: python\n   :linenos:\n   :emphasize-lines: 9,10,11,14,15{% for key, value in config.vllm.items() %}{% if key != \"extra\" %},{{ loop.index + 9 }}{% endif %}{% endfor %}\n\n   import os\n\n   os.environ[\"VLLM_NEURON_FRAMEWORK\"] = \"neuronx-distributed-inference\"\n\n   from vllm import LLM, SamplingParams\n\n   # Create an LLM.\n   llm = LLM(\n      model=\"~/models/{{ data.model.name }}/\",{% for key, value in config.vllm.items() %}{% if key != \"extra\" %}\n      {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n      {%- endif %}{%- endfor %}\n      enable_prefix_caching=False,\n      enable_chunked_prefill=False,\n   )\n\n   # Sample prompts.\n   prompts = [\n      \"The president of the United States is\",\n      \"The capital of France is\",\n      \"The future of AI is\",\n   ]\n   outputs = llm.generate(prompts, SamplingParams(top_k=1))\n\n   for output in outputs:\n      prompt = output.prompt\n      generated_text = output.outputs[0].text\n      print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\n{%- endmacro %}\n\n{%- macro render_online_code(config) %}\n\n.. code-block:: bash\n   :linenos:\n   :emphasize-lines: 2,3,7,8,9{% for key, value in config.vllm.items() %}{% if key != \"extra\" %},{{ loop.index + 3 }}{% endif %}{% endfor %}\n\n   vllm serve \\\n      --model=\"~/models/{{ data.model.name }}/\"{% for key, value in config.vllm.items() %}{% if key != \"extra\" %} \\\n      --{{ key | replace(\"_\", \"-\") }}{% if value is sameas true %}{% elif value is mapping %}='{{ value | tojson | replace(\"True\", \"true\") | replace(\"False\", \"false\") }}'{% elif value is string %}=\"{{ value }}\"{% else %}={{ value }}{% endif %}{% endif %}{% endfor %} \\\n      --no-enable-chunked-prefill \\\n      --no-enable-prefix-caching \\\n      --port=8080 \n\nOnce the vLLM server is online, submit requests using the example below:\n\n.. literalinclude:: ../../examples/vllm_client.py\n   :linenos:\n   :language: python\n\n{%- endmacro %}\n\n.. tab-set::\n\n   .. tab-item:: NxDI\n      :selected:\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_nxdi_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n   .. 
tab-item:: Offline serving\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_offline_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n   .. tab-item:: Online serving\n\n      Select the instance type and make sure to update the highlighted code below to match your chosen path before you execute it.\n\n      .. tab-set::\n      {% for hardware_type, default_config in data.defaults.items() %}\n         {%- set config = data.configurations[default_config.config] %}\n\n         .. tab-item:: {{ hardware_type }}\n            {% if loop.first %}:selected:{% endif %}\n\n{{ render_online_code(config) | indent(12, true) }}\n\n      {% endfor %}\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-benchmarks:\n\n{% if data.benchmarks %}\nBenchmarks\n------------------------\n\nSelect a metric to view performance benchmarks for various **batch sizes** and **input|output** sequence length combinations.\n\n.. tab-set::\n\n   .. tab-item:: Latency\n      :sync: Latency\n\n      Measured in: seconds (s)\n\n      .. df-table::\n         :header-rows: 1\n\n         latency_data = {{ data.benchmarks.Latency | tojson }}\n         df_raw = pd.DataFrame(latency_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(3)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: Throughput\n      :sync: Throughput\n\n      Measured in: tokens per second (tok/s)\n\n      .. df-table::\n         :header-rows: 1\n\n         throughput_data = {{ data.benchmarks.Throughput | tojson }}\n         df_raw = pd.DataFrame(throughput_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].max().round(2)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: TTFT\n      :sync: TTFT\n\n      Measured in: seconds (s)\n\n      .. df-table::\n         :header-rows: 1\n\n         ttft_data = {{ data.benchmarks.TTFT | tojson }}\n         df_raw = pd.DataFrame(ttft_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(3)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n   .. tab-item:: ITL\n      :sync: ITL\n\n      Measured in: seconds (s)\n\n      .. df-table::\n         :header-rows: 1\n\n         itl_data = {{ data.benchmarks.ITL | tojson }}\n         df_raw = pd.DataFrame(itl_data)\n\n         cols = [c for c in df_raw.columns if c not in ('neuron_config', 'batch_size')]\n\n         df_grouped = df_raw.groupby('batch_size')[cols].min().round(5)\n         df = df_grouped.reset_index()\n         df.rename(columns={'batch_size': 'Batch Size'}, inplace=True)\n\n.. 
admonition:: Tip\n   :class: tip\n\n   Further improvements and optimizations are possible through the :ref:`Neuron Kernel Interface (NKI) <neuron-nki>`.\n\n\n{% endif %}\n\n.. _nxdi-models-{{ data.model.name | lower | replace(\".\", \"-\") | replace(\"/\", \"-\") }}-neuron-config:\n\nRecommended configuration\n--------------------------\n\n{% if data.recommendations %}\nSelect a use case to view the recommended Neuron configuration. For the definitions of the flags listed below, see the :ref:`NxDI API reference guide <nxd-inference-api-guide>`.\n\n{%- set throughput_config = data.configurations[data.recommendations.Throughput.config] %}\n{%- set latency_config = data.configurations[data.recommendations.Latency.config] %}\n\n.. tab-set::\n\n   .. tab-item:: Offline serving\n      :sync: Throughput\n\n      For most use cases, the configuration below can be used to optimize **throughput** on Neuron devices. You can also increase the ``batch_size`` or use quantization to improve throughput even further. \n\n      {% if throughput_config.neuron.moe_ep_degree != 1 %}\n      For this specific configuration, we recommend using **Expert Parallelism (EP) of {{ throughput_config.neuron.moe_ep_degree }}**. For more details, refer to the :ref:`Qwen3-MoE Inference on Trn2 <qwen3-moe-tutorial>` tutorial.\n      {% endif %}\n\n      :bdg-info:`{{ throughput_config.instance_type }}`\n\n      .. code-block:: python\n         :linenos:\n\n         NeuronConfig({% for key, value in throughput_config.neuron.items() %}{% if key not in [\"extra\"] %}\n            {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n            {%- endif %}{%- endfor %}\n         )\n\n   .. tab-item:: Online serving\n      :sync: Latency\n\n      For most use cases, the configuration below can be used to optimize **latency** on Neuron devices.\n\n      {% if latency_config.neuron.moe_ep_degree != 1 %}\n      For this specific configuration, we recommend using **Expert Parallelism (EP) of {{ latency_config.neuron.moe_ep_degree }}**. For more details, refer to the :ref:`qwen3-moe-tutorial` tutorial.\n      {% endif %}\n\n      :bdg-info:`{{ latency_config.instance_type }}`\n\n      .. code-block:: python\n         :linenos:\n\n         NeuronConfig({% for key, value in latency_config.neuron.items() %}{% if key not in [\"extra\"] %}\n            {{ key }}={% if value is string %}\"{{ value }}\"{% else %}{{ value }}{% endif %},\n            {%- endif %}{%- endfor %}\n         )\n\n{% else %}\n.. note::\n\n   The recommended configuration for the {{ data.model.display_name }} model is coming soon...\n\n{% endif %}\n"
  },
  {
    "path": "libraries/nxd-inference/api-guides/api-guide.rst",
    "content": ".. _nxd-inference-api-guide:\n\nNxD Inference API Reference\n===========================\n\nNeuronX Distributed (NxD) Inference (``neuronx-distributed-inference``) is\nan open-source PyTorch-based inference library that simplifies deep learning\nmodel deployment on AWS Inferentia and Trainium instances. Neuronx Distributed\nInference includes a model hub and modules that users can reference to\nimplement their own models on Neuron.\n\nThis API guide describes API and configuration functions and parameters that you\ncan use when you directly interact with the NxD Inference library.\n\n.. note ::\n\n   NxD Inference also supports integration with vLLM. When you use vLLM, you can\n   use the ``override_neuron_config`` attribute to override defaults using the\n   :ref:`NeuronConfig parameters <nxd-inference-api-guide-neuron-config>` described\n   in this API guide. For more information about vLLM integration, see :ref:`nxdi-vllm-user-guide-v1`.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nConfiguration\n-------------\n\nNxD Inference defines configuration objects that enable you to control how a model\nis compiled and used for inference. When you compile a model, its configuration is\nserialized to a JSON file in the compiled checkpoint, so you can distribute the\ncompiled checkpoint to additional Neuron instances without needing to compile on\neach instance.\n\nNxD Inference supports loading HuggingFace model checkpoints and configurations.\nWhen you run a model from a HuggingFace checkpoint, NxD Inference loads the model\nconfiguration from the model's PretrainedConfig.\n\n.. _nxd-inference-api-guide-neuron-config:\n\nNeuronConfig\n~~~~~~~~~~~~\n\nNeuronConfig contains compile-time configuration options for inference on Neuron. \n\nInitialization\n^^^^^^^^^^^^^^\n\nPass the NeuronConfig attributes as keyword args.\n\nFunctions\n^^^^^^^^^\n\n- ``NeuronConfig(**kwargs)`` - Initializes a NeuronConfig with\n  attributes from ``kwargs``.\n\nAttributes\n^^^^^^^^^^\n\n- General configuration\n\n  - ``batch_size`` - The number of inputs to process in a single\n    request. Defaults to ``1``.\n  - ``padding_side`` - The padding side. Defaults to ``right``.\n  - ``seq_len`` - The sequence length, which is typically the sum of\n    ``max_context_length`` and ``max_new_tokens``. This value is the\n    maximum sequence size that the model can process in a single\n    request. Defaults to ``128``.\n  - ``max_context_length`` - The maximum context length. Default to the\n    ``seq_len``.\n  - ``max_new_tokens`` - The maximum number of tokens to generate in a\n    single request. Default to the difference between ``seq_len`` and\n    ``max_context_length``. If the difference is zero, then\n    ``max_new_tokens`` is set to ``None``.\n  - ``max_length`` - The maximum length to process. Default to the\n    ``seq_len``.\n  - ``n_active_tokens`` - The number of active tokens to track. Defaults\n    to the ``seq_len``.\n  - ``n_positions`` - The number of positions to track. Defaults to the\n    ``seq_len``.\n  - ``torch_dtype`` - The torch data type to use for computation. Choose\n    from the following options. Defaults to ``torch.bfloat16``.\n\n    - ``torch.bfloat16``\n    - ``torch.float16``\n    - ``torch.float32``\n\n  - ``rpl_reduce_dtype`` - The torch data type to use for ``all_reduce``\n    operations in RowParallelLinear layers. 
Defaults to ``None`` and does the \n    reduction in the input tensors dtype.\n  - ``cast_type`` - The type of casting strategy to use when loading model parameters. \n    Can be set to ``config`` (default) which casts all parameters to ``torch_dtype``, \n    or ``as-declared`` which casts all parameters to the dtype they were defined with.\n  - ``async_mode`` - Whether to use asynchronous mode for inference.\n    Defaults to ``false``.\n  - ``save_sharded_checkpoint`` - Whether to save the sharded weights in\n    the compiled checkpoint. If this option is disabled, NxD Inference\n    shards the weights during model load. Defaults to ``true``.\n  - ``logical_nc_config`` - The Logical NeuronCore Configuration (LNC).\n    On Trn1 and Inf2, this defaults to ``1``. On Trn2, this defaults to ``2``.\n    You can also configure LNC with the ``NEURON_LOGICAL_NC_CONFIG`` environment\n    variable. For more information about LNC, see :ref:`logical-neuroncore-config`.\n\n    - Note: If you use Trn2 with NxD Inference v0.1 (Neuron 2.21), you must\n      specify LNC=2 by setting ``logical_neuron_cores=2`` in NeuronConfig.\n      The ``logical_neuron_cores`` attribute is deprecated in NxD Inference v0.2\n      and later.\n\n  - ``skip_sharding`` - Whether to skip weight sharding during compilation.\n    You can use this option if the compiled checkpoint path already\n    includes sharded weights for the model. Defaults to ``false``.\n  - ``weights_to_skip_layout_optimization`` - The list of weight names\n    to skip during weight layout optimization.\n  - ``skip_warmup`` - Whether to skip warmup during model load. To improve\n    the performance of the first request sent to a model, NxD Inference\n    warms up the model during load. Defaults to ``false``.\n  - ``scratchpad_page_size`` - The scratchpad page size to use during compilation\n    and at runtime. The scratchpad is a shared memory buffer used for internal\n    model variables and other data. You can adjust this attribute in scenarios\n    where you need to adjust memory usage to support larger models or larger\n    sequence lengths.\n\n- Distributed configuration\n\n  - ``tp_degree`` - The number of Neuron cores to parallelize across\n    using tensor parallelism. Defaults to ``1``.\n\n    - The number of attention heads needs to be divisible by the\n      tensor-parallelism degree.\n    - The total data size of model weights and key-value caches needs to\n      be smaller than the tensor-parallelism degree multiplied by the\n      amount of HBM memory per Neuron core.\n\n      - On trn2, each Neuron core has 24GB of memory (with\n        ``logical_nc_config`` set to ``2``).\n      - On inf2/trn1, each Neuron core has 16GB of memory.\n\n    - The Neuron runtime supports the following tensor-parallelism\n      degrees:\n\n      - trn2: 1, 2, 4, 8, 16, 32, and 64 (with ``logical_nc_config``\n        set to ``2``)\n      - inf2: 1, 2, 4, 8, and 24\n      - trn1: 1, 2, 8, 16, and 32\n\n- Attention\n\n  - ``flash_decoding_enabled`` - Whether to enable flash decoding.\n    Defaults to ``false``.\n  - ``fused_qkv`` - Whether to fuse the query (Q), key (K), and value\n    (V) weights in the models attention layers. This option improves\n    performance by using larger matrices. Defaults to ``false``.\n  - ``sequence_parallel_enabled`` - Whether to use sequence parallelism,\n    which splits tensors along the sequence dimension. Defaults to\n    ``false``. 
Sequence parallel requires context sequence length to\n    be divisible with tensor parallelism degree. Once enabled, sequence parallelism\n    is only applied to context encoding.\n  - ``qk_layernorm`` - Whether to enable QK layer normalization.\n    Defaults to ``false``.\n  - ``attention_dtype`` - The torch data type to use for all operations in attention. \n    Defaults to ``None`` and infers the dtype based on the dtype of the hidden_states passed to attention.\n\n- On-device sampling\n\n  - ``on_device_sampling_config`` - The on-device sampling configuration\n    to use. Specify this config to enable on-device sampling. This\n    config is an ``OnDeviceSamplingConfig``, which has the following\n    attributes:\n\n    - ``do_sample`` - Whether to use multinomial sampling (true) or\n      greedy sampling (false). Defaults to ``false``.\n    - ``top_k`` - The top-k value to use for sampling. Defaults to\n      ``1``.\n    - ``dynamic`` - Whether to enable dynamic sampling. With dynamic\n      sampling, you can pass different ``top_k``, ``top_p``, and\n      ``temperature`` values to the ``forward`` call to configure\n      sampling for each input in a batch. Defaults to ``false``.\n    - ``deterministic`` - Whether to enable deterministic sampling.\n      Defaults to ``false``.\n    - ``global_topk`` - The global topK value to use. Defaults to\n      ``256``.\n\n- Bucketing\n\n  - ``enable_bucketing`` - Whether to enable bucketing. Defaults to\n    ``false``. You can specify the buckets to use with the\n    ``context_encoding_buckets`` and ``token_generation_buckets``\n    attributes. If you don't specify the buckets to use, NxDI\n    automatically selects buckets based on the following logic.\n\n    - Context encoding: Powers of two between 128 and the max context\n      length.\n\n      - Note: Max context length is equivalent to sequence length by\n        default.\n\n    - Token generation: Powers of two between 128 and the maximum\n      sequence length.\n\n  - ``context_encoding_buckets`` - The list of bucket sizes to use for\n    the context encoding model.\n  - ``token_generation_buckets`` - The list of bucket sizes to use for\n    the token generation model.\n\n- Quantization\n\n  - ``quantized`` - Whether the model weights are quantized. Defaults to\n    ``false``.\n  - ``quantized_checkpoints_path`` - The path to the quantized\n    checkpoint. To quantize the model and save it to this path, use\n    NeuronApplicationBase's ``save_quantized_state_dict`` function.\n    Specify one of the following:\n\n    - A folder path. During quantization, NxD Inference\n      saves the quantized model in safetensors format to this folder. To\n      use a quantized model from a folder, it can be in safetensors or\n      pickle format.\n    - A file path to a quantized model file in pickle format.\n\n  - ``quantization_dtype`` - The data type to use for quantization.\n    Choose from the following options. Defaults to ``int8``.\n\n    - ``int8`` - 8 bit int.\n    - ``f8e4m3`` - 8-bit float with greater precision and less range.\n\n      - Important: To use ``f8e4m3`` for quantization, you must set the\n        ``XLA_HANDLE_SPECIAL_SCALAR`` environment variable to ``1``.\n\n    - ``f8e5m2`` - 8-bit float with greater range and less precision.\n\n  - ``quantization_type`` - The type of quantization to use. Choose from\n    the following options. 
- Quantization\n\n  - ``quantized`` - Whether the model weights are quantized. Defaults to\n    ``false``.\n  - ``quantized_checkpoints_path`` - The path to the quantized\n    checkpoint. To quantize the model and save it to this path, use\n    NeuronApplicationBase's ``save_quantized_state_dict`` function.\n    Specify one of the following:\n\n    - A folder path. During quantization, NxD Inference\n      saves the quantized model in safetensors format to this folder. To\n      use a quantized model from a folder, it can be in safetensors or\n      pickle format.\n    - A file path to a quantized model file in pickle format.\n\n  - ``quantization_dtype`` - The data type to use for quantization.\n    Choose from the following options. Defaults to ``int8``.\n\n    - ``int8`` - 8-bit integer.\n    - ``f8e4m3`` - 8-bit float with greater precision and less range.\n\n      - Important: To use ``f8e4m3`` for quantization, you must set the\n        ``XLA_HANDLE_SPECIAL_SCALAR`` environment variable to ``1``.\n\n    - ``f8e5m2`` - 8-bit float with greater range and less precision.\n\n  - ``quantization_type`` - The type of quantization to use. Choose from\n    the following options. Defaults to ``per_tensor_symmetric``.\n\n    - ``per_tensor_symmetric``\n    - ``per_channel_symmetric``\n\n  - ``modules_to_not_convert`` - The list of modules that should not be quantized. This attribute is also required when running inference on custom quantized models (quantized with external libraries) where certain layers are left in full precision. Example: [\"lm_head\", \"layers.0.self_attn\", \"layers.1.mlp\", ...].\n    Defaults to ``None``, which means that all modules are quantized.\n\n  - ``draft_model_modules_to_not_convert`` - The list of draft model modules to keep in full precision when working with fused speculation. If no draft model layers should be quantized, include all layers in the list. Example: [\"lm_head\", \"layers.0.self_attn\", \"layers.1.mlp\", ...].\n    This attribute is only required in the case of fused speculation.\n\n- KV cache quantization\n\n  - ``kv_cache_quant`` - Whether to quantize the KV cache. When enabled,\n    the model quantizes the KV cache to the ``torch.float8_e4m3fn`` data\n    type. Defaults to ``false``.\n\n    - Important: To use ``kv_cache_quant``, you must set the\n      ``XLA_HANDLE_SPECIAL_SCALAR`` environment variable to ``1``.\n\n- Kernels\n\n  - ``attn_kernel_enabled`` - Whether to enable the flash attention\n    kernel when supported. Defaults to ``false``. Flash attention is automatically enabled under certain conditions;\n    see ``NeuronAttentionBase.get_flash_attention_strategy`` in\n    `neuronx_distributed_inference.modules.attention.attention_base <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/modules/attention/attention_base.py>`_.\n    Even if you explicitly enable flash attention with ``NeuronConfig(attn_kernel_enabled=True)``, it is disabled for use cases\n    where enabling it would be less efficient.\n  - ``qkv_kernel_enabled`` - Whether to enable the fused QKV kernel. To\n    use this option, you must set ``fused_qkv`` to ``true`` and ``torch_dtype``\n    to ``torch.bfloat16``. Defaults to ``false``.\n  - ``mlp_kernel_enabled`` - Whether to enable the MLP kernel. To use this\n    option, you must set ``torch_dtype`` to ``torch.bfloat16``. Defaults\n    to ``false``.\n  - ``quantized_mlp_kernel_enabled`` - Whether to enable the quantized\n    MLP kernel, which uses FP8 compute to improve performance. To use this\n    option, you must set ``mlp_kernel_enabled`` to ``true``. Defaults to ``false``.\n  - ``rmsnorm_quantize_kernel_enabled`` - Whether to enable the\n    quantized RMS norm kernel. Defaults to ``false``.\n\n- Continuous batching\n\n  - ``is_continuous_batching`` - Whether to enable continuous batching.\n    Defaults to ``false``.\n  - ``max_batch_size`` - The maximum batch size to use for continuous\n    batching. Defaults to ``batch_size``.\n  - ``ctx_batch_size`` - The maximum batch size to use for the context\n    encoding model in continuous batching. Defaults to ``batch_size``.\n  - ``tkg_batch_size`` - The maximum batch size to use for the token\n    generation model in continuous batching. Defaults to ``batch_size``.\n\n- Speculative decoding\n\n  - ``speculation_length`` - The number of tokens to generate with the\n    draft model before verifying them with the primary model. Set this\n    value to a positive integer to enable speculation. 
Defaults to\n    ``0``.\n  - ``spec_batch_size`` - The batch size to use for speculation.\n    Defaults to ``batch_size``.\n  - ``enable_eagle_speculation`` - Whether to enable EAGLE speculation,\n    where the previous hidden state is passed to a specialized target\n    model to improve performance. Defaults to ``false``.\n  - ``enable_eagle_draft_input_norm`` - Whether to perform input\n    normalization in the EAGLE draft model. Defaults to ``false``.\n  - ``enable_fused_speculation`` - Whether to enable fused speculation,\n    where the target and draft model are fused into a single compiled\n    model to improve performance. Fused speculation is enabled by\n    default if ``enable_eagle_speculation`` is true. Otherwise, this\n    defaults to ``false``.\n\n- Medusa decoding - Medusa is a speculation method that uses multiple\n  smaller LM heads to perform speculation.\n\n  - ``is_medusa`` - Whether to use Medusa decoding. Defaults to\n    ``false``\n  - ``medusa_speculation_length`` - The number of tokens to generate\n    with the Medusa heads before checking work with the primary model.\n    Set this value to a positive integer. Defaults to ``0``.\n  - ``num_medusa_heads`` - The number of LM heads to use for Medusa.\n    Defaults to ``0``.\n  - ``medusa_tree`` - The Medusa tree to use. For an example, see\n    ``medusa_mc_sim_7b_63.json`` in the ``examples`` folder.\n\n\n\n- Multi-LoRA serving\n\n  - ``lora_config`` - The multi-lora serving configuration to use. Defaults to ``none``. Specify this config to enable multi-LoRA serving. This\n    config is ``LoraServingConfig``, which has the following\n    attributes:\n\n    - ``max_loras`` - The maximum number of concurrent LoRA adapters \n      in device memory. Defaults to ``1``.\n    - ``max_cpu_loras`` - The maximum number of concurrent LoRA adapters in host memory.\n    - ``enable-dynamic-multi-lora`` - The flag to enable dynamic multi-LoRA serving in NxD inference. Defaults to False.\n    - ``lora_ckpt_paths`` - The checkpoint paths for LoRA adapters that need to be loaded to HBM during initialization with key-value pairs. The key is the adapter ID and the value is the local path of the LoRA adapter checkpoint.\n    - ``lora_ckpt_paths_cpu`` - The checkpoint paths for LoRA adapters in host memory during initialization with key-value pairs. The key is the adapter ID and the value is the local path of the LoRA adapter checkpoint.\n    - ``lora_memory_transpose`` - Transpose memory layout to optimize \n      inference performance. Defaults to ``True``.\n    - ``lora_shard_linear_layer`` - Shard the linear layer across TP group.\n      Defaults to ``True``.\n    - ``base_model_quantized`` - Whether the base model is quantized. Defaults to False.\n    - ``lora_ckpt_json`` - The JSON file that specifies the checkpoint paths for LoRA adapters in both HBM and host memory. Users can set either ``lora_ckpt_json`` or ``lora_ckpt_paths``/``lora_ckpt_paths_cpu`` to specify LoRA adapters, but ``lora_ckpt_json`` is recommended. The JSON file includes three fields:\n      - ``lora-ckpt-dir`` - The directory of the LoRA adapters.\n      - ``lora-ckpt-paths`` - The mapping between LoRA adapter IDs on HBM and their checkpoint paths at initialization.\n      - ``lora-ckpt-paths-cpu`` - The mapping between LoRA adapter IDs and their checkpoints on CPU.\n\n\n- Compilation configuration\n\n  - ``cc_pipeline_tiling_factor`` - The pipeline tiling factor to use\n    for collectives. 
Defaults to ``2``.\n\n- Debugging\n\n  - ``output_logits`` - Whether to return model logits from the Neuron device\n    when using on-device sampling. With on-device sampling, the model samples\n    the logits on-device to return a singular token, and the model output includes only\n    the tokens (without the logits) to improve performance. The ``output_logits`` feature enables\n    you to output the logits alongside the token, which enables you to run logit\n    validation and investigate the model output. Note: This feature\n    impacts performance and shouldn't be used in production; this should \n    only be used for testing and debugging model logits.\n\nInferenceConfig\n~~~~~~~~~~~~~~~\n\nInferenceConfig contains a NeuronConfig and model configuration\nattributes.\n\n\n.. _initialization-1:\n\nInitialization\n^^^^^^^^^^^^^^\n\nYou can pass attributes through keyword args, or provide a\n``load_config`` hook that is called during initialization to load the\nconfiguration attributes.\n\nInferenceConfig is compatible with HuggingFace ``transformers``. To use\na model from HuggingFace ``transformers``, you can populate an\nInferenceConfig with the attributes from the model's PretrainedConfig,\nwhich is stored in ``config.json`` in the model checkpoint.\n\n::\n\n   from neuronx_distributed_inference.models.llama import (\n       LlamaInferenceConfig,\n       LlamaNeuronConfig\n   )\n   from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n\n   model_path = \"/home/ubuntu/models/Meta-Llama-3.1-8B\"\n\n   neuron_config = LlamaNeuronConfig()\n   config = LlamaInferenceConfig(\n       neuron_config,\n       load_config=load_pretrained_config(model_path),\n   )\n\n.. _attributes-1:\n\nAttributes\n^^^^^^^^^^\n\nAn InferenceConfig includes ``neuron_config`` and any other attributes\nthat you set during initialization.\n\n- ``neuron_config`` - The NeuronConfig for this inference config.\n- ``fused_spec_config`` - The FusedSpecNeuronConfig for this inference\n  config. Provide a fused spec config if using fused speculation.\n- ``load_config`` - The ``load_config`` hook to run during\n  initialization. You can provide a load config hook to load\n  configuration attributes from another source. To load from a\n  HuggingFace PretrainedConfig, pass the load config hook returned by\n  ``load_pretrained_config``. The ``load_pretrained_config`` hook\n  provider takes the model path as its argument.\n\nInferenceConfig also supports an attribute map, which lets you configure\nadditional names or aliases for attributes. When you get or set an\nattribute by an alias, you retrieve or modify the value of the original\nattribute. When you initialize an InferenceConfig from a HuggingFace\nPretrainedConfig, it automatically inherits the attribute map from that\nPretrainedConfig.\n\n.. _functions-1:\n\nFunctions\n^^^^^^^^^\n\n- ``InferenceConfig(neuron_config, load_config=None, **kwargs)`` -\n  Initializes an InferenceConfig.\n- ``load_config(self)`` - Loads the config attributes. This function\n  does nothing by default; subclasses can override it to provide a\n  model-specific implementation. This function is called during\n  initialization unless a ``load_config`` hook is provided.\n- ``get_required_attributes(self)`` - Returns the list of attribute\n  names that must be present in this config for it to validate during\n  initialization. 
This function returns an empty list by default;\n  subclasses can override it to require model-specific attributes to be\n  present.\n- ``validate_config(self)`` - Checks that the config is valid. This\n  function is called during initialization. By default, this function\n  checks that the attributes returned by ``get_required_attributes`` are\n  present. Subclasses can override this function to implement\n  model-specific validation.\n- ``save(self, model_path)`` - Serializes the config to a JSON file,\n  ``neuron_config.json`` in the given model path.\n- ``to_json_file(self, json_file)`` - Serializes the config to the given\n  JSON file.\n- ``to_json_string(self)`` - Serializes the config to a string in JSON\n  format.\n- ``load(cls, model_path, **kwargs)`` - Loads the config from the\n  ``neuron_config.json`` file in the given model path. You can specify\n  ``kwargs`` to override attributes in the config.\n- ``from_json_file(cls, json_file, **kwargs)`` - Loads the config from\n  the given JSON file. You can specify ``kwargs`` to override attributes\n  in the config.\n- ``from_json_string(cls, json_string, **kwargs)`` - Loads the config\n  from the given JSON string. You can specify ``kwargs`` to override\n  attributes in the config.\n- ``get_neuron_config_cls(cls)`` - Returns the NeuronConfig class type\n  to use for this InferenceConfig. This function returns\n  ``NeuronConfig`` by default; subclasses can override this function to\n  configure a specific NeuronConfig subclass to use.\n\nRouterConfig\n~~~~~~~~~~~~~\n\nConfiguration class for expert router in mixture-of-experts models. This config specifies the activation function and data type used in the router component.\n\n.. _initialization-4:\n\nInitialization\n^^^^^^^^^^^^^^^\n\nInitialize directly with parameters or use the from_kwargs class method.\n\n.. _functions-4:\n\nFunctions\n^^^^^^^^^^\n\n- ``RouterConfig(**kwargs)`` - Initializes router configuration with specified activation function and data type.\n\n.. _attributes-4:\n\nAttributes\n^^^^^^^^^^^\n\n- ``act_fn`` - Activation function to use in the router. Defaults to ``\"softmax\"``. See ACT2FN for supported activations.\n- ``dtype`` - Data type for router computations. Defaults to ``torch.float32``.\n\nRoutedExpertsMLPOpsConfig\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nConfiguration class for routed experts in mixture-of-experts models. This class shares several configuration flags with MoENeuronConfig and provides additional settings specific to expert MLPs.\n\n.. _initialization-6:\n\nInitialization\n^^^^^^^^^^^^^^^\n\nInitialize with specific parameters for expert MLP operations.\n\n.. _attributes-6:\n\nAttributes\n^^^^^^^^^^^\n\n- ``num_experts`` - Total number of experts in the model.\n- ``hidden_size`` - Hidden dimension of the layers.\n- ``intermediate_size`` - Intermediate dimension of the layers.\n- ``top_k`` - Number of experts activated per token. Must be less than or equal to num_experts.\n- ``hidden_act`` - Activation function for hidden layers. See ACT2FN for supported activations.\n- ``glu_mlp`` -  When True, combines gate and up projection; otherwise, uses simple up projection.\n- ``bias`` - Whether to include bias terms in linear layers. Defaults to ``False``.\n- ``glu_type`` - Type of GLU activation to use. Defaults to ``GLUType.GLU``.\n- ``hidden_act_scaling_factor`` - Scaling factor applied to gate projections before activation. Defaults to ``1.0``\n- ``hidden_act_bias`` - Bias term added to the up projection values. 
Defaults to ``0.0``.\n- ``capacity_factor`` - Controls expert capacity and token dropping rate. None indicates full capacity with no token dropping.\n- ``use_index_calc_kernel`` - Whether to use specialized kernel for index calculations. Defaults to ``False``.\n- ``gate_clamp_upper_limit`` - Upper bound for clamping expert MLP gate projection results. No clamping if ``None``.\n- ``gate_clamp_lower_limit`` - Lower bound for clamping expert MLP gate projection results. No clamping if ``None``.\n- ``up_clamp_upper_limit`` - Upper bound for clamping expert MLP up projection results. No clamping if ``None``.\n- ``up_clamp_lower_limit`` - Lower bound for clamping expert MLP up projection results. No clamping if ``None``.\n- ``normalize_top_k_affinities`` - Whether to normalize chosen experts' affinities before combining with MLP outputs. Defaults to ``False``.\n- ``early_expert_affinity_modulation`` - Whether to enable early modulation of expert affinities. Defaults to ``False``.\n- ``input_layer_init_method`` - Initialization function for input linear layer weights. Defaults to ``None``.\n- ``output_layer_init_method`` - Initialization function for output linear layer weights. Defaults to ``None``.\n- ``enable_spmd_rank`` - Whether to use runtime rank information in inference. Defaults to ``False``.\n- ``is_prefill`` - Whether the configuration is for prefill computation. Defaults to ``None``.\n\nBlockwiseMatmulConfig\n~~~~~~~~~~~~~~~~~~~~~~\n\nConfiguration class for blockwise matrix multiplication operations. This config contains settings that control how blockwise matrix multiplication is performed, particularly in the context of expert MLPs.\n\n.. _initialization-3:\n\nInitialization\n^^^^^^^^^^^^^^^\n\nInitialize with specific parameters or use the from_kwargs class method.\n\n.. _functions-3:\n\nFunctions\n^^^^^^^^^^\n\n- ``BlockwiseMatmulConfig(**kwargs)`` - Initializes configuration with the specified attributes.\n\n.. _attributes-3:\n\nAttributes\n^^^^^^^^^^^\n\n- ``block_size`` - Size of blocks used in blockwise matrix multiplication.\n- ``use_block_parallel`` - Whether to enable block parallel blockwise matmul NKI kernel.\n- ``block_sharding_strategy`` - Strategy for block parallel blockwise matmul kernel implementation.\n\n  - ``BlockShardStrategy.HI_LO`` - distribute upper half and lower half blocks across LNCs.\n  - ``BlockShardStrategy.PING_PONG`` - distribute odd blocks on NC0 and even blocks on NC1.\n  \n- ``skip_dma_token`` - Kernel optimization flag for skipping token DMA operations for padding. When true, inputs to blockwise kernel don't require padding.\n- ``skip_dma_weight`` - Kernel optimization flag for skipping weight DMA operations for padding.\n- ``logical_nc_config`` - LNC size configuration. Defaults to 1 on trn1 and 2 on trn2.\n- ``blockwise_nki_autograd_cls`` - NKI function implementing blockwise matmul for expert MLPs. Defaults to ``BlockwiseMatmulNKIFunc`` when None.\n- ``use_torch_block_wise`` - Forces using PyTorch implementation of blockwise matmul for expert MLPs instead of NKI kernel.\n- ``parallelize_token_to_block_mapping`` - Enables parallel computation of block position to token indices mapping. 
Enabled by default.\n- ``optimized_block_to_token_mapping`` - When enabled, token position in blocks will only include top k experts.\n- ``always_augment_inputs_for_blockwise_matmul`` - Forces padding of inputs to blockwise kernel regardless of skip_dma value.\n- ``use_shard_on_intermediate_dynamic_while`` - Enables shard-on-intermediate dynamic while kernel.\n- ``use_shard_on_block_dynamic_while`` - Enables shard-on-block dynamic while kernel.\n- ``num_static_blocks`` - Number of static blocks to compute in dynamic kernel. Static blocks have fixed computation, while dynamic blocks can be skipped.\n\nMoEFusedTKGConfig\n~~~~~~~~~~~~~~~~~~\n\nConfiguration class for fused Token Generation operations in mixture-of-experts models. This config controls various kernel optimizations and fusion options.\n\n.. _initialization-7:\n\nInitialization\n^^^^^^^^^^^^^^^\n\nInitialize with settings for quantization and kernel enablement options.\n\n.. _attributes-7:\n\nAttributes\n^^^^^^^^^^^\n\n- ``quantized`` - Whether weights are quantized or not.\n- ``moe_fused_kernel_enabled`` - Whether to enable the fused MoE kernel. Defaults to ``None``.\n- ``router_topk_kernel_enabled`` - Whether to enable the router top-k kernel optimization. Defaults to ``None``.\n- ``expert_mlp_kernel_enabled`` - Whether to enable the expert MLP kernel optimization. Defaults to ``None``.\n- ``shared_mlp_kernel_enabled`` - Whether to enable the shared MLP kernel optimization. Defaults to ``None``.\n\nHybridShardingConfig\n~~~~~~~~~~~~~~~~~~~~~\n\nConfiguration class for hybrid sharding in mixture-of-experts models. This config specifies different parallelism degrees for CTE (Context Encoding) and TKG (Token Generation) components.\n\n.. _initialization-5:\n\nInitialization\n^^^^^^^^^^^^^^^\n\nInitialize with keyword arguments specifying parallelism degrees.\n\n.. _functions-5:\n\nFunctions\n^^^^^^^^^^\n\n- ``HybridShardingConfig(**kwargs)`` - Initializes configuration with specified parallelism degrees.\n\n.. _attributes-5:\n\nAttributes\n^^^^^^^^^^^\n\n- ``moe_cte_tp_degree`` - Tensor parallelism degree for Context Encoding. Defaults to ``1``.\n- ``moe_cte_ep_degree`` - Expert parallelism degree for Context Encoding. Defaults to ``1``.\n- ``moe_tkg_tp_degree`` - Tensor parallelism degree for Token Generation. Defaults to ``1``.\n- ``moe_tkg_ep_degree`` - Expert parallelism degree for Token Generation. Defaults to ``1``.\n\n.. _nxd-inference-api-guide-moe-neuron-config:\n\nMoENeuronConfig\n~~~~~~~~~~~~~~~\n\nA NeuronConfig subclass for mixture-of-experts (MoE) models. This config\nincludes attributes specific to MoE models. MoE model configurations, such\nas DbrxNeuronConfig, are subclasses of MoENeuronConfig.\n\n.. _initialization-2:\n\nInitialization\n^^^^^^^^^^^^^^\n\nPass the attributes as keyword args.\n\n.. _functions-2:\n\nFunctions\n^^^^^^^^^\n\n- ``MoENeuronConfig(**kwargs)`` - Initializes an MoENeuronConfig with\n  attributes from ``kwargs``.\n\n.. _attributes-2:\n\nAttributes\n^^^^^^^^^^\n- General\n\n  - ``moe_tp_degree`` - Tensor parallelism degree for MoE. Defaults to ``1``.\n  - ``moe_ep_degree`` - Expert parallelism degree. Defaults to ``1``.\n  - ``hybrid_sharding_config(**kwargs)`` - Configuration for hybrid model sharding. Defaults to ``None``.\n\n- Router\n\n  - ``router_config(**kwargs)`` - Configuration for the expert router. Can be initialized from a dictionary or kwargs.\n  - ``return_expert_index`` - Whether to return expert indices in output. 
Defaults to ``False``.\n  - ``return_router_logits`` - Whether to return router logits in output. Defaults to ``False``.\n\n- Expert MLPs\n\n  - ``blockwise_matmul_config(**kwargs)`` - Configuration for blockwise matrix multiplication. Defaults to empty config.\n  - ``capacity_factor`` - The capacity factor to use when allocating\n    tokens across experts. When an expert is at capacity, tokens allocated\n    to that expert are dropped until that expert has capacity again.\n    Defaults to ``None``, which means that NxDI waits until an expert has\n    capacity, and no tokens are dropped.\n  - ``glu_mlp`` - Whether to use a Gated Linear Unit in the MLP. Defaults\n    to ``false``.\n  - ``glu_type`` - Type of GLU activation to use. Defaults to ``\"glu\"``.\n  - ``hidden_act_scaling_factor`` - Scaling factor applied to gate projections before activation. Defaults to ``1.0``.\n  - ``hidden_act_bias`` - Bias term added to the up projection values. Defaults to ``0.0``.\n  - ``gate_clamp_upper_limit`` - Upper limit for clamping experts' gate values. Defaults to ``None``.\n  - ``gate_clamp_lower_limit`` - Lower limit for clamping experts' gate values. Defaults to ``None``.\n  - ``up_clamp_upper_limit`` - Upper limit for clamping experts' up values. Defaults to ``None``.\n  - ``up_clamp_lower_limit`` - Lower limit for clamping experts' up values. Defaults to ``None``.\n  - ``use_index_calc_kernel`` - Whether to use a specialized kernel for index calculations. Defaults to ``False``.\n  - ``early_expert_affinity_modulation`` - Enable early modulation of expert affinities. Defaults to ``False``.\n  - ``normalize_top_k_affinities`` - Whether to normalize the top-k expert affinities. Defaults to ``True``.\n\n- Shared Experts\n\n  - ``fused_shared_experts`` - Whether to use fused gate/up computation for shared experts. Defaults to ``False``.\n  - ``shared_experts_sequence_parallel_enabled`` - Enable sequence parallelism for shared experts. Defaults to ``False``.\n  - ``transpose_shared_experts_weights`` - Whether to transpose weights for shared experts. Defaults to ``False``.\n\nFusedSpecNeuronConfig\n~~~~~~~~~~~~~~~~~~~~~\n\nA configuration for a model that uses fused speculation, which is a speculative\ndecoding feature where the target and draft models are compiled into a combined model to improve\nperformance. For more information, see :ref:`nxd-fused-speculative-decoding`.\n\n.. _attributes-17:\n\nAttributes\n^^^^^^^^^^\n\n- ``worker_cls`` - The model class to use for fused speculation. This\n  class should be a subclass of NeuronBaseModel.\n- ``draft_config`` - The InferenceConfig for the draft model.\n- ``draft_model_path`` - The path to the draft model checkpoint.\n\nGeneration\n----------\n\nHuggingFaceGenerationAdapter\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports running inference with the HuggingFace ``generate``\nAPI. To use HuggingFace-style generation, create a\nHuggingFaceGenerationAdapter that wraps a Neuron application model.\nThen, you can call ``generate`` on the adapted model.\n\n::\n\n   generation_model = HuggingFaceGenerationAdapter(neuron_model)\n   outputs = generation_model.generate(\n       inputs.input_ids,\n       attention_mask=inputs.attention_mask,\n       generation_config=generation_config\n   )\n\nModels\n------\n\nNxD Inference provides a :ref:`model hub<nxdi-model-reference>` with production-ready\nmodels. You can use these existing models to run inference, or use them as\nreference implementations when you develop your own models on Neuron. All models\ninherit from base classes that provide a basic set of functionality that\nis common to all models.\n\n
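The following sketch shows a typical application-model workflow with a model\nfrom the model hub. It reuses the Llama configuration classes shown earlier and\nassumes that the Llama application class is importable as\n``neuronx_distributed_inference.models.llama.modeling_llama.NeuronLlamaForCausalLM``;\nexact class names and import paths can vary by release.\n\n::\n\n   from neuronx_distributed_inference.models.llama import (\n       LlamaInferenceConfig,\n       LlamaNeuronConfig\n   )\n   from neuronx_distributed_inference.models.llama.modeling_llama import NeuronLlamaForCausalLM\n   from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n\n   model_path = \"/home/ubuntu/models/Meta-Llama-3.1-8B\"\n   compiled_model_path = \"/home/ubuntu/traced_models/Meta-Llama-3.1-8B\"\n\n   neuron_config = LlamaNeuronConfig(tp_degree=32)\n   config = LlamaInferenceConfig(\n       neuron_config,\n       load_config=load_pretrained_config(model_path),\n   )\n\n   # Compile the model (which also shards the weights), then load it to the Neuron device.\n   model = NeuronLlamaForCausalLM(model_path, config)\n   model.compile(compiled_model_path)\n   model.load(compiled_model_path)\n\n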
NeuronApplicationBase\n~~~~~~~~~~~~~~~~~~~~~\n\nNeuronApplicationBase is the base class for all application models,\nincluding NeuronBaseForCausalLM. NeuronApplicationBase provides\nfunctions to compile and load models. This class extends\n``torch.nn.Module``. Application models are the entry point to running\ninference with NxD Inference. You can extend this class to define new\napplication models that implement use cases in addition to causal LM.\n\n.. _attributes-18:\n\nAttributes\n^^^^^^^^^^\n\n- ``config`` - The InferenceConfig for this model.\n- ``neuron_config`` - The NeuronConfig for this model.\n- ``model_path`` - The model path for this model.\n- ``models`` - The list of models that make up this application model.\n  These models are instances of ModelWrapper. Add models to this list to\n  compile them with ``compile``.\n- ``is_compiled`` - Whether this model is compiled.\n- ``is_loaded_to_neuron`` - Whether this model is loaded to the Neuron\n  device.\n\n.. _functions-8:\n\nFunctions\n^^^^^^^^^\n\n- ``NeuronApplicationBase(self, model_path, config=None, neuron_config=None)``\n  - Initializes an application model from the given model path, and\n  optionally the given InferenceConfig (``config``) and NeuronConfig\n  (``neuron_config``). If no InferenceConfig is provided, this function\n  loads the config from the given model path.\n- ``compile(self, compiled_model_path, debug=False)`` - Compiles this\n  model for Neuron and saves the compiled model to the given path. This\n  function compiles all models added to ``self.models``. This function\n  also shards the weights for the model. To produce HLO files that have\n  source annotations enabled for debugging, set ``debug`` to ``True``. When ``debug`` is enabled, HLOs contain the following attributes for each computation: ``op_type``, ``op_name``, ``source_file``, and ``source_line``.\n- ``load(self, compiled_model_path)`` - Loads the compiled model from\n  the given path to the Neuron device. This function also loads the\n  model weights to the Neuron device.\n- ``load_weights(self, compiled_model_path)`` - Loads the model weights\n  from the given path to the Neuron device. You can call this function\n  to load new weights without reloading the entire model.\n- ``shard_weights(self, compiled_model_path)`` - Shards the model's\n  weights and serializes the sharded weights to the given path.\n- ``forward(self, **kwargs)`` - The forward function for this\n  application model. This function must be implemented by subclasses.\n- ``validate_config(cls, config)`` - Checks whether the config is valid\n  for this model. By default, this function requires that\n  ``neuron_config`` is present. This function can be implemented by\n  subclasses to provide model-specific validation.\n- ``get_compiler_args(self)`` - Returns the Neuron compiler arguments to\n  use when compiling this model. By default, this returns no compiler\n  arguments. This function can be implemented by subclasses to use\n  model-specific compiler args.\n- ``to_cpu(self)`` - Allows inference to be run entirely on CPU. Use this\n  in place of the ``compile`` and ``load`` functions. Note that CPU inference\n  doesn't currently work for configurations that use kernels.\n- ``get_state_dict(cls, model_path, config)`` - Gets the state dict for\n  this model. By default, this function loads the state dict from the\n  given model path. 
This function calls the class'\n  ``convert_hf_to_neuron_state_dict`` function to convert the state dict\n  according to the specific model. Subclasses can override this function\n  to provide custom state dict loading.\n\n  - When loading the state dict, this function replaces keys that start\n    with the class' ``_STATE_DICT_MODEL_PREFIX`` value with the class'\n    ``_NEW_STATE_DICT_MODEL_PREFIX`` value. Subclasses can set these\n    values to update the state dict keys accordingly.\n\n- ``convert_hf_to_neuron_state_dict`` - Converts a state dict from HF\n  format to the format expected by Neuron. By default, this function\n  returns the state dict without modifying it; subclasses can override\n  this to provide custom conversion for each model.\n- ``save_quantized_state_dict(cls, model_path, config)`` - Quantizes the\n  model's state dict and saves the quantized checkpoint to the\n  ``quantized_checkpoint_path`` from the given config's NeuronConfig.\n- ``generate_quantized_state_dict(cls, model_path, config)`` - Generates\n  the quantized state dict for this model. This function loads the\n  HuggingFace model from the given model path in order to quantize the\n  model. Then, this function passes the quantized model to\n  ``prepare_quantized_state_dict`` to generate the state dict.\n  Subclasses can override this function to customize quantization.\n- ``prepare_quantized_state_dict(cls, hf_model_quant)`` - Prepares the\n  quantized state dict for the model. By default, this function converts\n  the state dict from qint8 to int8. Subclasses can override this\n  function to customize quantization.\n- ``load_hf_model(model_path)`` - Loads the equivalent HuggingFace model\n  from the given model path. Subclasses must implement this function to\n  use quantization or to generate expected outputs when evaluating\n  accuracy with ``accuracy.py``.\n- ``reset(self)`` - Resets the model state. By default, this function\n  does nothing; subclasses can implement it to provide custom behavior.\n\nNeuronBaseForCausalLM\n~~~~~~~~~~~~~~~~~~~~~\n\nNeuronBaseForCausalLM is the base application class that you use to generate\ntext with causal language models. This class extends NeuronApplicationBase.\nYou can extend this class to run text generation in custom models.\n\n.. _attributes-9:\n\nAttributes\n^^^^^^^^^^\n\n- ``kv_cache_populated`` - Whether the KV cache is populated.\n\n.. _functions-9:\n\nFunctions\n^^^^^^^^^\n\n- ``NeuronBaseForCausalLM(self, *args, **kwargs)`` - Initializes the\n  NeuronApplicationBase and configures the models used in this LM\n  application, including context encoding, token gen, and others, based\n  on the given NeuronConfig.\n- ``forward(self, input_ids=None, seq_ids=None, attention_mask=None, position_ids=None, sampling_params=None, prev_hidden=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, medusa_args=None, return_dict=None, input_capture_hook=None)``\n  - The forward function for causal LM. This function routes the forward\n  pass to the correct sub-model (such as context encoding or token\n  generation) based on the current model state. If an ``input_capture_hook``\n  function is provided, the forward function calls the hook with the model\n  inputs as arguments.\n- ``reset(self)`` - Resets the model for a new batch of inference. 
After\n  the model is reset, a subsequent run will invoke the context encoding\n  model.\n- ``reset_kv_cache(self)`` - Resets the KV cache by replacing its key\n  values with zeroes.\n\nNeuronBaseModel\n~~~~~~~~~~~~~~~\n\nNeuronBaseModel is the base class for all models. This class extends\n``torch.nn.Module``. In instances of NeuronBaseModel, you define the\nmodules, such as attention, MLP, and decoder layers, that make up a model.\nYou can extend this class to define custom decoder models.\n\n.. _attributes-16:\n\nAttributes\n^^^^^^^^^^\n\n- ``sampler`` - The sampler to use for on-device sampling.\n- ``kv_mgr`` - The KV cache manager to use to manage the KV cache.\n- ``sequence_dimension`` - The dimension for sequence parallelism.\n\n.. _functions-15:\n\nFunctions\n^^^^^^^^^\n\n- ``NeuronBaseModel(config, optimize_inference=True)`` - Initializes the\n  Neuron model from the given config. If ``optimize_inference`` is true,\n  then this initializes a KV cache manager and sampler (if on-device\n  sampling).\n- ``setup_attr_for_model(self, config)`` - Initializes the following\n  attributes for the model. These attributes are used by modules within\n  the model. Subclasses must implement this function to set these\n  attributes from the config.\n\n  - ``on_device_sampling``\n  - ``tp_degree``\n  - ``hidden_size``\n  - ``num_attention_heads``\n  - ``num_key_value_heads``\n  - ``max_batch_size``\n  - ``buckets``\n\n- ``init_model(self, config)`` - Initializes the following modules for\n  the model. Subclasses must implement this function.\n\n  - ``embed_tokens``\n  - ``layers``\n  - ``norm``\n  - ``lm_head``\n\n- ``forward(self, input_ids, attention_mask, position_ids, seq_ids, accepted_indices=None, current_length=None, medusa_mask=None, scatter_index=None)``\n  - The forward function for this model.\n\nModelWrapper\n~~~~~~~~~~~~\n\nWraps a model to prepare it for compilation. Neuron applications, such\nas NeuronBaseForCausalLM, use this class to prepare a model for\ncompilation. ModelWrapper defines the inputs to use when tracing the\nmodel during compilation.\n\nTo define a custom model with additional model inputs, you can extend ModelWrapper\nand override the ``input_generator`` function, which defines the inputs for tracing.\n\n.. _functions-6:\n\nFunctions\n^^^^^^^^^\n\n- ``ModelWrapper(config, model_cls, tag, compiler_args)`` - Initializes\n  a model wrapper from a given config and model class. This model class\n  is used to compile the model with the given compiler args. The tag is\n  used to identify the compiled model in the application.\n- ``input_generator(self)`` - Returns a list of input tensors to use to trace\n  the model for compilation. When you trace and compile a model, the trace captures\n  only the code paths that are run with these inputs. To support different inputs and\n  code paths based on configuration options, provide configuration-specific inputs\n  in ``input_generator``.\n"
  },
  {
    "path": "libraries/nxd-inference/api-guides/api-guide.txt",
    "content": "* :ref:`nxd-inference-api-guide`"
  },
  {
    "path": "libraries/nxd-inference/api-guides/index.rst",
    "content": ".. _nxdi-api-ref-index:\n\nAPI Reference Guides\n====================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: API Reference Guides\n    \n    /libraries/nxd-inference/api-guides/api-guide\n\n\nUse the NxD Inference (``neuronx-distributed-inference``) API Reference Guides to learn how to use NxD Inference.\n\n.. include:: /libraries/nxd-inference/api-guides/api-guide.txt"
  },
  {
    "path": "libraries/nxd-inference/app-notes/app_notes.txt",
    "content": "* :ref:`introduce-nxd-inference`\n* :ref:`nxdi-parallelism-user-guide`"
  },
  {
    "path": "libraries/nxd-inference/app-notes/index.rst",
    "content": "Application Notes\n=================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Application Notes\n    \n    /about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference\n    /libraries/nxd-inference/app-notes/parallelism\n\n.. include:: /libraries/nxd-inference/app-notes/app_notes.txt"
  },
  {
    "path": "libraries/nxd-inference/app-notes/parallelism.rst",
    "content": ".. _nxdi-parallelism-user-guide:\n.. _nxdi-tensor-parallelism:\n\nParallelism Techniques for LLM Inference\n========================================\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nLarge language models (LLMs) have grown exponentially in size in the past few years, requiring\nincreasing accelerator memory to run the model. In order to effectively generate predictions from an LLM, it\nis often necessary to use one or more **parallelism techniques** to shard operations across multiple available accelerators.\n**Model parallelism**, such as tensor and sequence parallelism described in this document, can reduce memory requirements per NeuronCore \nby sharding the model across multiple cores. **Data parallelism**, on the other hand, enables\nhigher throughput by sharding input data.\n\nTensor Parallelism\n--------------------\n\nTensor parallelism is a technique in which a tensor is split into a number of chunks along the intermediate\ndimension, resulting in sharding not only model parameters but also intermediate activations.\nTensor parallelism has relatively high communication volume and presents a synchronization point in forward pass,\nmaking it costly to scale beyond 1 node. When tensors are sharded across multiple EC2 instances, the collective communication\nat these synchronization points must occur through network interfaces like Elastic Fabric Adapter (EFA) instead of\nthe faster chip-to-chip NeuronLink connections.\n\nA basic transformer MLP block contains a single matrix multiplication (matmul) called the up-projection, \nwhich increases the dimensionality from the hidden_size to the intermediate_size, and a single output matmul called the down projection, \nwhich reduces the dimensionality back to the hidden_size, with a non-linear activation function in-between. \nIn order to avoid running collective operations (synchronization point) after each matrix multiply, we\ndefer collective to run after 2nd linear layer. To ensure correctness of non-linear activation\nfunction computation (``f(x+y) != f(x) + f(y)`` for non-linear ``f`` like silu in SwiGLU), we split first linear layer\nalong columns (ColumnParallel) and second linear layer along rows (RowParallel), then run an AllReduce collective\noperation at the end.\n\nModern transformer architectures use SwiGLU activation function, where the MLP block has 3 matrices, first\nup and gate projection and later a down projection. We can view up and gate projection as the same\n(referred to as first matrix multiply or first linear layer) in this context because they have the same\nsharding approach. In this case up and gate projection is column parallel, while down projection is row parallel.\n\nIn attention, we similarly split Q, K and V projections in column parallel fashion and use row parallel for\nfinal output (O) projection, then run AllReduce with input tensor size equal to\n``batch_size x sequence_length x hidden_size x per_element_bytes`` bytes. Here,``per_element_bytes`` depends on the\nnumerical format of your tensors. When using BF16, for example, it would be ``2``. \nAllReduce input tensor size is the same for both MLP and attention blocks, resulting in two AllReduce operations\nwith with the same input size and output size as per AllReduce algorithm per transformer layer.\n\n.. 
figure:: /images/sharding/tensor_parallel.png\n   :alt: Image: sharding_tensor_parallel.png\n\n   Image visualizing transformer layer like llama3 with SwiGLU activation layer in MLP.\n\nHow to Use Tensor Parallelism with NxD Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTensor parallelism can be enabled by setting the ``tp_degree`` parameter in NeuronConfig. See\n:ref:`nxdi-feature-guide-tensor-parallelism` for more detail.\n\nCode example, defining NeuronConfig:\n\n .. code-block:: python\n\n    neuron_config = NeuronConfig(tp_degree=32)\n\nSee :ref:`tensor_parallelism_overview` for a detailed reference of the concepts underlying tensor parallelism.\n\nSequence Parallelism\n---------------------\n\nOne drawback of tensor parallelism is that it replicates attention/MLP layer norm and dropout operations across all NeuronCores.\nThese operations are less compute intensive compared to linear layers, but still requires\nsignificant memory. These computations are independent along the sequence dimension, allowing us to shard\nacross the sequence dimension. This requires AllGather in the transition from a sequence to a tensor parallel \nregion and ReduceScatter in the transition from tensor to sequence parallel region during inference.\nSequence parallelism is especially useful for longer sequences and usually used in conjunction with tensor parallelism.\n\n\n.. figure:: /images/sharding/sequence_parallel.png\n   :alt: Image: sharding_sequence_parallel.png\n\n   Image visualizing how sequence and tensor parallelism intertwine in transformer layer like Llama 3.\n\nHow to Use Sequence Parallelism with NxD Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSequence parallelism can be enabled by setting the ``sequence_parallel_enabled`` parameter in NeuronConfig. See \n:ref:`nxdi-feature-guide-sequence-parallelism` for more detail.\n\nCode example, defining NeuronConfig:\n\n.. code-block:: python\n\n    neuron_config = NeuronConfig(sequence_parallel_enabled=True)\n\nFlash Decoding\n--------------\n\nFlash decoding enables inference on long sequences by partitioning the KV cache. The technique is useful for \nlong sequences and used in decoding phase. It is motivated by the fact that assuming KV caching, K and V memory\nfootprint scales with sequence length, while Q has sequence length equal to 1 during decoding.\n\nFlash decoding shards K and V, and at the start uses AllGather to gather all Q heads in each\nKV partition. Each KV partition computes partial softmax (also called log-sum-exp) which uses AllGather\nto compute log-sum-exp scaling factor and correction denominator after “local” attention computation\n(multiply Q and K, then apply the mask). Lastly, the algorithm performs ReduceScatter on attention results at the end.\n\nHow to Use Flash Decoding with NxD Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFlash decoding can be enabled by setting the ``flash_decoding_enabled`` parameter in NeuronConfig.\nThe technique is only supported with GQA (grouped query attention).\n\nCode example, defining NeuronConfig:\n\n\n.. 
code-block:: python\n\n    neuron_config = NeuronConfig(flash_decoding_enabled=True)\n\n\nData Parallelism\n------------------\n\nData parallelism replicates the model (same architecture, weights, and hyperparameters) but shards the input data.\nBy distributing the data across NeuronCores or even different instances, data parallelism reduces\nthe total execution time for large batch sizes by processing the sharded inputs in parallel instead of\nsequentially. Compared to batch parallelism, where the KV cache is sharded, each data parallel replica has\nits own individual cache for separate sequences.\n\nData parallelism works as a standalone technique or can be used in conjunction with other model sharding techniques such as tensor parallelism.\nFor example, Trn2 instances have 64 NeuronCores available when using the default Logical NeuronCore configuration (LNC) of 2, so you can use a\ntensor parallel degree of 16 and a data parallel degree of 4, resulting in four copies of the model, each with a disjoint data partition and\neach sharded across 16 logical NeuronCores. Model replicas can run on the same instance or separate instances.\nData parallelism doesn't introduce any additional collective operations during inference.\n\n.. figure:: /images/sharding/data_parallel.png\n   :alt: Image: sharding_data_parallel.png\n\n   Image visualizing how data parallelism shards inputs.\n\nHow to Use Data Parallelism with NxD Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSee :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb` for detailed guidance on how to use vLLM to apply data parallelism along with tensor\nparallelism to increase model inference throughput in NxDI. "
  },
  {
    "path": "libraries/nxd-inference/developer_guides/accuracy-eval-with-datasets.rst",
    "content": ".. _accuracy-eval-with-datasets:\n\nAccuracy Evaluation of Models on Neuron Using Open Source Datasets\n====================================================================\n\nThis guide demonstrates how to evaluate accuracy of models on Trainium and Inferentia instances using open source datasets. \nThis approach expands on the accuracy evaluation using logits and enables you to evaluate accuracy using open source datasets \nlike MMLU and GSM8K for tasks such as instruction following and mathematical reasoning.\n\nUnder the hood, this accuracy suite uses vLLM server to serve the model\nand can use benchmarking clients such as `lm-eval <https://github.com/EleutherAI/lm-evaluation-harness>`__ \nand `LongBench <https://github.com/THUDM/LongBench>`__ to evaluate on their supported datasets. \nIn future we will add support for other benchmarking clients. \n\nThe code used in this guide is located at https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/\n\nFor a tutorial that you can follow and run on a trainium or inferentia instance please look at :ref:`Evaluating Accuracy of Llama-3.1-70B on Neuron using open source datasets </libraries/nxd-inference/tutorials/trn1-llama3.1-70b-instruct-accuracy-eval-tutorial.ipynb>`.\n\nConfiguration Setup\n-------------------\n\nCreating the Configuration File\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCreate a test_config.yaml file that defines your server settings and\naccuracy test configurations:\n\n.. code:: yaml\n\n   server:\n     name: \"test-model-server\"\n     model_path: \"/path/to/model\"\n     model_s3_path: \"s3://bucket/path/to/model\"\n     max_seq_len: 2048\n     context_encoding_len: 1024\n     tp_degree: 2\n     n_vllm_threads: 16\n     server_port: 8000\n     continuous_batch_size: 2\n\n   test:\n     accuracy:\n       mmlu_test:\n         client: \"lm_eval\"\n         datasets: [\"mmlu\"]\n         max_concurrent_requests: 1\n         timeout: 3600\n         client_params:\n           limit: 100\n       \n       longbench_test:\n         client: \"longbench\"\n         datasets: [\"qasper\", \"multifieldqa\"]\n         max_concurrent_requests: 1\n         timeout: 7200\n         client_params:\n           max_length: 4096\n\nConfiguration Parameters\n------------------------\n\nServer Configuration\n~~~~~~~~~~~~~~~~~~~~\n\n========================= ================================\nParameter                 Description\n========================= ================================\n``name``                  Identifier for your model server\n``model_path``            Local path to model files\n``model_s3_path``         S3 location of model files\n``max_seq_len``           Maximum sequence length\n``context_encoding_len``  Length of context encoding\n``tp_degree``             Tensor parallelism degree\n``n_vllm_threads``        Number of vLLM threads\n``server_port``           Server port number\n``continuous_batch_size`` Size of continuous batches\n========================= ================================\n\nif ``model_s3_path`` is specified, the model will be downloaded into ``model_path``,\notherwise model should already exist in ``model_path``.\n\nAccuracy Test Configuration\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n+-----------------------------+---------------------------------------+\n| Parameter                   | Description                           |\n+=============================+=======================================+\n| ``client``                  | Evaluation framework (e.g.,           |\n|  
                           | “lm_eval”, “longbench”)               |\n+-----------------------------+---------------------------------------+\n| ``datasets``                | List of datasets for evaluation       |\n|                             | from the supported set by the client  |\n+-----------------------------+---------------------------------------+\n| ``max_concurrent_requests`` | Maximum parallel requests             |\n+-----------------------------+---------------------------------------+\n| ``timeout``                 | Maximum execution time (seconds)      |\n+-----------------------------+---------------------------------------+\n| ``client_params``           | Client-specific parameters            |\n+-----------------------------+---------------------------------------+\n\nRunning Evaluations\n-------------------\n\nExecute accuracy tests using the CLI command:\n\n.. code:: bash\n\n   python accuracy.py --config test_config.yaml\n\n\n\nFor more detailed information and advanced configurations, please refer\nto: - `lm-eval\nDocumentation <https://github.com/EleutherAI/lm-evaluation-harness>`__ -\n`LongBench Documentation <https://github.com/THUDM/LongBench>`__\n\nThese resources provide comprehensive guides on client-specific\nparameters and advanced evaluation scenarios.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/custom-quantization.rst",
    "content": ".. _nxdi-custom-quantization:\n\nCustom Quantization\n===================\n\nOverview\n--------\n\nThis document gives an overview of customizable quantization feature in\nthe NxD Inference. Users can specify which modules should not be\nconverted during quantization, allowing custom quantized model\ninference. Users can take an un-quantized model and apply selective\nquantization to specific layers while keeping others in full precision.\n\nThe document also explains how to use external libraries like\n`llmcompressor <https://github.com/vllm-project/llm-compressor>`__,\nincluding quantization config setup and applying necessary patches. It\nalso covers running inference with quantized models and specifying\nunconverted modules through either command-line arguments or\nNeuronConfig kwargs.\n\nQuantization\n------------\n\nCustom quantization allows users to have fine-grained control over which\nlayers of the model are quantized. This can be particularly useful for\nmaintaining model accuracy while still benefiting from the reduced\nmemory footprint of quantization. For more detailed information on\nquantization techniques and implementation, please refer to the\n`quantization feature\nguide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#quantization>`__.\n\nQuantize Using NxD\n~~~~~~~~~~~~~~~~~~\n\nQuantization can significantly reduce the model size and inference time,\nmaking it more suitable for deployment of large models that typically\ncannot fit on a single instance. However, not all layers of the model\nbenefit equally from quantization.\n\n- Some layers, especially those involved in critical computations like\n  normalizations or certain types of activations, may see a significant\n  drop in accuracy if quantized. Leaving these layers in full precision\n  helps maintain the overall performance of the model.\n- Quantization can also introduce small errors in each layer’s\n  computation. When these errors accumulate through the network, they\n  can lead to a noticeable degradation in performance. Keeping certain\n  layers in full precision can mitigate this accumulation.\n\nTo leverage the customizable quantization feature in NxD, follow the\nsteps below. 
This process involves importing necessary libraries,\ndefining the model and output paths, specifying modules to not convert,\nand utilizing a quantization function to create a quantized model.\n\n::\n\n   import torch\n   from typing import Optional, List\n   from transformers import AutoModelForCausalLM, AutoTokenizer\n   from neuronx_distributed_inference.modules.checkpoint import prune_state_dict,save_state_dict_safetensors\n   from neuronx_distributed.quantization.quantization_utils import quantize_pytorch_model_per_channel_symmetric, convert_qint8_to_int8_state_dict\n\n   model_path = \"/<model_path/llama-3.1-405b-instruct-4layers/\" \n   output_path = \"<save_quantized_checkpoints>\"\n\n   modules_to_not_convert = [\n       \"lm_head\",\n       \"layers.0.self_attn\",\n       \"layers.1.self_attn\",\n       \"layers.2.self_attn\",\n       \"layers.1.mlp\"\n   ]\n\n   def quantize(model: torch.nn.Module, dtype=torch.qint8, modules_to_not_convert: Optional[List[str]] = None) -> torch.nn.Module:\n       quant_model = quantize_pytorch_model_per_channel_symmetric(model,dtype=dtype, modules_to_not_convert=modules_to_not_convert)\n       model_quant_sd = quant_model.state_dict()\n       convert_qint8_to_int8_state_dict(model_quant_sd)\n       quantized_state_dict = prune_state_dict(model_quant_sd)\n       return quantized_state_dict\n       \n   model = AutoModelForCausalLM.from_pretrained(model_path)\n   tokenizer = AutoTokenizer.from_pretrained(model_path)\n\n   state_dict = quantize(model,torch.float8_e4m3fn,modules_to_not_convert)\n\n   save_state_dict_safetensors(state_dict=state_dict,state_dict_dir=output_path)\n   tokenizer.save_pretrained(output_path)\n\nQuantize using external libraries\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn addition to the built-in quantization features of NxD, users can also\nleverage external libraries for more flexible and advanced quantization\noptions. One such library is ``llmcompressor``, which offers a robust\nset of tools for quantizing models. To use the ``llmcompressor`` library\nfor quantization, follow the steps below.\n\nThis process involves importing necessary libraries, specifying modules\nto not convert, setting up a quantization recipe, and applying the\nquantization to create a quantized model. llmcompressor gives us a range\nfrom -/+448, so it is important to ensure the scale range is set from\n-/+240 if you need to run inference on the quantized model later using\nNxD Inference. 
Values outside the range of -/+240 on Neuron devices\nresult in NaNs.\n\nThe ``LLaMA`` model is an example where not all layers are quantized.\n\n- By keeping the attention layers, first and last MLP layers, and the LM\n  head in full precision, the model maintains high accuracy in tasks\n  like language generation and comprehension.\n- Quantizing the remaining layers (e.g., intermediate MLP layers)\n  reduces the model size and inference time without significantly\n  compromising performance.\n- This strategy allows for a balanced trade-off between model efficiency\n  and accuracy, making the model suitable for high performance\n  deployment.\n\n::\n\n   import torch\n   from llmcompressor.transformers import oneshot, SparseAutoModelForCausalLM\n   from transformers import AutoTokenizer\n   from compressed_tensors.quantization.utils.helpers import calculate_range\n   from compressed_tensors.quantization.quant_args import QuantizationType\n   import compressed_tensors.quantization.utils.helpers as helpers\n\n   model_path = \"/<model_path>/llama-3.1-405b-instruct-4layers/\" \n   output_path = \"<save_quantized_checkpoints>\"\n\n   modules_to_not_convert = ['lm_head',\n       \"model.layers.0.mlp.down_proj\",\n       \"model.layers.0.mlp.gate_proj\",\n       \"model.layers.0.mlp.up_proj\",\n       \"model.layers.3.mlp.down_proj\",\n       \"model.layers.3.mlp.gate_proj\",\n       \"model.layers.3.mlp.up_proj\",\n       \"model.layers.0.self_attn.k_proj\",\n       \"model.layers.0.self_attn.o_proj\",\n       \"model.layers.0.self_attn.q_proj\",\n       \"model.layers.0.self_attn.v_proj\",\n       \"model.layers.1.self_attn.k_proj\",\n       \"model.layers.1.self_attn.o_proj\",\n       \"model.layers.1.self_attn.q_proj\",\n       \"model.layers.1.self_attn.v_proj\",\n       \"model.layers.2.self_attn.k_proj\",\n       \"model.layers.2.self_attn.o_proj\",\n       \"model.layers.2.self_attn.q_proj\",\n       \"model.layers.2.self_attn.v_proj\",\n       \"model.layers.3.self_attn.k_proj\",\n       \"model.layers.3.self_attn.o_proj\",\n       \"model.layers.3.self_attn.q_proj\",\n       \"model.layers.3.self_attn.v_proj\"]\n\n   recipe = f\"\"\"\n   quant_stage:\n       quant_modifiers:\n           QuantizationModifier:\n               ignore: {modules_to_not_convert}\n               config_groups:\n                   group_0:\n                       weights:\n                           num_bits: 8\n                           type: float\n                           strategy: channel\n                           dynamic: false\n                           symmetric: true\n                       input_activations:\n                           num_bits: 8\n                           type: float\n                           strategy: token\n                           dynamic: true\n                           symmetric: true\n                       targets: [\"Linear\"]\n   \"\"\"\n\n   model = SparseAutoModelForCausalLM.from_pretrained(\n       model_path, torch_dtype=\"auto\"\n   )\n\n   # Monkey patch to rescale weights from -/+448 to -/+240\n   original_calculate_range = helpers.calculate_range\n   def calculate_range(*args, **kwargs):\n       q_min, q_max = original_calculate_range(*args, **kwargs)\n       if args[0].type == QuantizationType.FLOAT and args[0].num_bits == 8:\n           return torch.tensor(-240.0, device=args[1]), torch.tensor(240.0, device=args[1])\n       return q_min, q_max\n\n   # Patch it\n   helpers.calculate_range = calculate_range\n   oneshot(model=model, recipe=recipe)\n\n   
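# Store the per-module weight scales as float32 before saving the checkpoint.\n   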
for name, module in model.named_modules():\n       if hasattr(module, 'weight_scale'):\n           module.weight_scale.data = module.weight_scale.data.to(torch.float32)\n\n   tokenizer = AutoTokenizer.from_pretrained(model_path)\n\n   model.save_pretrained(output_path)\n   tokenizer.save_pretrained(output_path)\n\nQuantization Commands\n---------------------\n\nTo utilize the quantization commands in NxD Inference, users can follow\nthe instructions below. These commands cover the required flags to\nenable running inference with quantized models.\n\nFirst Quantize then Inference\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you have a model in full precision and need to quantize it on the CPU\nfirst before using it for inference, you can set the following flags to\nenable quantization during inference:\n\n::\n\n   inference_demo --model-type llama --task-type causal-lm run \\\n   --model-path /your_model_path/ \\\n   --compiled-model-path /save_to_path/ \\\n   --torch-dtype bfloat16 \\\n   --tp-degree 32 \\\n   --batch-size 1 \\\n   --max-context-length 1024 \\\n   --quantized \\\n   --quantization-dtype f8e4m3 \\\n   --quantization-type per_channel_symmetric \\\n   --quantized-checkpoints-path /save_to_path/ \\\n   --seq-len 2048 \\\n   --fused-qkv \\\n   --pad-token-id 2 \\\n   --on-device-sampling \\\n   --sequence-parallel-enabled \\\n   --attn-kernel-enabled \\\n   --prompt \"I believe the meaning of life is\" \\\n   --is-continuous-batching \\\n   --enable-fused-speculation \\\n   --enable-eagle-speculation \\\n   --speculation-length 4  \\\n   --draft-model-path /your_draft_model_path \\\n   --modules-to-not-convert-file /path/modules_to_not_convert.json\n\nInference Using Already quantized checkpoint\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo utilize the quantization commands in NxD, users can follow the\ninstructions below. These commands cover the required flags to enable\nrunning inference with quantized models. 
The\n``modules-to-not-convert-file`` flag allows you to specify the list of\nmodules to exclude from quantization, which is useful for models that explicitly\nrequire some modules to be left in their original precision.\n\nHow to Use\n~~~~~~~~~~\n\n- Pass ``modules_to_not_convert`` using Inference Demo\n\n::\n\n   inference_demo --model-type llama --task-type causal-lm run \\\n       --model-path <path> \\\n       --compiled-model-path <path> \\\n       --torch-dtype bfloat16 \\\n       --tp-degree <value> \\\n       --batch-size <value> \\\n       --max-context-length <value> \\\n       --seq-len <value> \\\n       --on-device-sampling \\\n       --mlp-kernel-enabled \\\n       --quantized-mlp-kernel-enabled \\\n       --quantization-dtype <dtype> \\\n       --quantization-type <type> \\\n       --prompt \"I believe the meaning of life is\" \\\n       --modules-to-not-convert-file /<your_path>/modules_to_not_convert.json\n\n- Pass ``modules_to_not_convert`` using NeuronConfig kwargs\n\n::\n\n   neuron_config = NeuronConfig(\n       tp_degree=32,\n       batch_size=2,\n       max_context_length=32,\n       seq_len=64,\n       on_device_sampling_config=OnDeviceSamplingConfig(top_k=1),\n       enable_bucketing=True,\n       flash_decoding_enabled=False,\n       modules_to_not_convert=[\"lm_head\", \"layers.0.self_attn\", \"layers.1.mlp\", ...],\n       draft_model_modules_to_not_convert=[\"lm_head\", \"layers.0.self_attn\", \"layers.1.mlp\", ..., \"fc\"]\n   )\n\n..\n\n   *Note: If you are creating different NeuronConfig objects for the draft and\n   target models, you only need to pass the modules_to_not_convert list\n   in both of them.*\n\nJSON File Structure\n~~~~~~~~~~~~~~~~~~~\n\nThe JSON file specifies which modules\nshould not be converted during quantization when you are using the\ninference demo. This section provides detailed examples of how to format\nthe JSON file. The JSON structure depends on whether fused speculation\nis used.\n\n1. Basic Structure\n\nFor simple cases:\n\n::\n\n   {\n       \"modules_to_not_convert\": [\n               \"lm_head\",\n               \"layers.0.self_attn\",\n               \"layers.1.self_attn\",\n               \"layers.2.self_attn\",\n               \"layers.3.self_attn\",\n               \"layers.0.mlp\",\n               \"layers.3.mlp\"\n       ]}\n\nOR\n^^\n\n::\n\n   {\n       \"model\": {\n           \"modules_to_not_convert\": [\n               \"lm_head\",\n               \"layers.0.self_attn\",\n               \"layers.1.self_attn\",\n               \"layers.2.self_attn\",\n               \"layers.3.self_attn\",\n               \"layers.0.mlp\",\n               \"layers.3.mlp\"\n           ]\n       }}\n\n2. With Fused Speculation\n\n::\n\n   {\n       \"model\": {\n           \"modules_to_not_convert\": [\n               \"lm_head\",\n               \"layers.0.self_attn\",\n               \"layers.1.self_attn\",\n               \"layers.2.self_attn\",\n               \"layers.3.self_attn\",\n               \"layers.0.mlp\",\n               \"layers.3.mlp\"\n           ]\n       },\n       \"draft_model\": {\n           \"modules_to_not_convert\": [\n               \"lm_head\",\n               \"layers.0.self_attn\",\n               \"layers.0.mlp\",\n               \"fc\"\n           ]\n       }}\n\nImportant Notes\n~~~~~~~~~~~~~~~\n\n- Make sure to specify partial module names for the modules you want to exclude, as\n  shown in the examples above. 
This is necessary because the naming\n  schemes differ between the model layers read from the source checkpoint and the\n  model we create for inference. The examples above use the partial\n  names that are common to both naming schemes.\n\n  - For example: original model names look like\n    ``model.layers.0.self_attn.q_proj``, whereas the names we use look\n    like ``layers.0.self_attn.qkv_proj.q_proj``.\n\n- Quantization with Fused Speculation\n\n  - We currently do not quantize the draft model. Include its modules in the\n    ``draft_model.modules_to_not_convert`` section of your JSON file.\n\nBackward Incompatible Changes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- The quantization workflow now requires the\n  ``modules-to-not-convert-file`` flag when running with\n  ``inference_demo``, because the layers to quantize are no longer\n  hard-coded.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/disaggregated-inference.rst",
    "content": ".. _nxdi-disaggregated-inference:\n\n==============================\nDisaggregated Inference [BETA]\n==============================\n\n\nOverview\n--------\n\nDisaggregated Inference (DI), also known as disaggregated serving, disaggregated prefill, P/D disaggregation,\nis an LLM serving architecture that separates the prefill and decode phases of inference onto different hardware resources. \nTo achieve this, the prefill worker needs to transfer the computed KV cache to the decode worker to resume decoding.\nSeparating the compute intensive prefill phase from the memory bandwidth intensive \ndecode phase can improve the LLM serving experience by\n\n1. Removing prefill interruptions to decode from continuous batching to reduce inter token latency (ITL). These gains can be used to achieve higher throughput by running with a higher decode batch size while staying under Service Level Objectives (SLO).\n\n2. Adapt to changing traffic patterns while still remaining under application SLOs.\n\n3. Enable independent scaling of resources and parallelism strategies for prefill (compute bound) and decode (memory bound).\n\n\n.. note::\n\n    Automatic Prefix Caching is not supported with DI.\n\n\nHigh-Level Flow on Neuron\n-------------------------\n\nDisaggregated Inference is mainly implemented through Neuron's vLLM fork \nhttps://github.com/aws-neuron/upstreaming-to-vllm/tree/neuron-2.25 \nand the Neuron Runtime.\n\nThere are three main components to a DI workflow.\n\n1. The router. Its job is to orchestrate requests to servers inside the prefill and decode clusters.\n\n2. The prefill cluster. This represents all of the prefill servers ready to run a DI workload.\n\n3. The decode cluster. This represents all of the decode servers ready to run a DI workload.\n\nBelow is an example lifespan of a single request through the DI flow.\n\n.. image:: /libraries/nxd-inference/developer_guides/images/di_high_level_architecture.png\n    :alt: High Level Disaggregated Inference Architecture\n\n1. A request is sent to the router (1), a component responsible for orchestrating (2) the requests to both\nthe prefill and decode servers. It receives responses from the prefill and decode servers and \nstreams the results back to the user. \n\n2. The prefill server receives the request from the router (3a) and starts prefilling. After the prefill completes (4),\nit updates the status of the request for the decode server by sending information through another ZMQ server.\nThen, it listens for a \"pull request\" from the decode server to initiate the KV cache transfer.\nWe use Neuron runtime APIs to transfer the KV cache through EFA from Neuron device to Neuron device.\nThis is a zero copy transfer, meaning that we do not copy the KV cache from a Neuron device to CPU to transfer, \nbut rather directly transfer KV cache from Neuron device to Neuron device.\nThe transfer is also asynchronous. This means that the prefill server can immediately start \nprefilling the next request while the KV cache of the previous request is being transferred. This \nensures that TTFT is not impacted for other requests while the KV cache for older request is being transferred to decode.\n\n3. The decode server also receives a request from the router at the same time as the prefill server (3b).\nIt waits until it receives a signal that its corresponding prefill is done from the prefill server by listening\non the ZMQ server. 
Then, if there is a free spot in the decode batch, the scheduler will schedule the request and send\na \"pull request\" to the prefill server. This initiates the asynchronous KV cache transfer (red arrow) \nthrough EFA by calling the Neuron Runtime API. The receive also needs to be asynchronous to ensure\nsmooth ITL. While the receive is happening other decode requests will still run. As soon as the receive is\nfinished the scheduler will add the request to the next decode batch (5).\n\n\nPrefill Decode Interference When Colocating Prefill and Decode\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn traditional continuous batching, prefill requests are prioritized over decode requests. Prefills\nare run as batch size 1 because they are compute intensive whereas decodes can be run at a higher \nbatch size because it is constrained on memory bandwidth not compute. To ensure the highest\nthroughput, continuous batching schedulers prioritize new prefills if the decode batch is not at max capacity.\nAs soon as a decode request finishes, another prefill is scheduled to fill the finished request's place. \nHowever, all other ongoing decodes pause while the new prefill is running because that prefill uses\nthe same compute resources. This effect is known as prefill stall or prefill contention.\n\nDisaggregated Inference avoids prefill stall because the decode workflow is never interrupted by a prefill as\nit receives KV caches asynchronously while decoding. The overall ITL on DI is affected\nby the transfer time of the KV cache but this does not scale with batch size. For example, in a continuous\nbatching workload of batch size 8 each request will on average be interrupted 7 times whereas in DI each \nrequest is only affected by a single transfer since it happens asynchronously.\n\nAnother advantage of DI is its ability to adapt to traffic patterns while maintaining a consistent\nITL. For example, if prefill requests double in length the application can double the amount of available prefill\nservers in the prefill cluster to match the new traffic pattern. Continuous batching workloads would suffer because\nlonger prefill requests increase tail ITL whereas DI workloads continue to deliver a low variance and a predictable customer experience.\n\nAdditionally, DI also allows users to tailor their parallelism strategies differently for prefill and decode. \nFor example, a model with 32 attention heads may prefer to run two decode servers Data Parallel=2 (DP) \neach with Tensor Parallel=32 (TP) in order to reduce KV replication instead of TP=64. Such replication will get worse if using Group Query Attention (GQA).\n\nDI does not necessarily improve throughput directly but it can help depending on the workload. Continuous\nbatching is a technique optimized for throughput at the cost of ITL. An application may have an SLO to ensure \nthat ITL is under a certain threshold. Because increasing the batch size\nincreases the amount of prefill stall, and therefore increases ITL, many applications run on smaller than ideal batch sizes \nwhen using continuous batching. DI can allow an application to run at a higher batch size while still keeping ITL\nunder the application defined SLO.\n\n\nTrade-Offs\n^^^^^^^^^^^^\n\nBecause DI runs prefill and decode separately, each part of the inference process needs to operate at an\nequal level of efficiency to maximize throughput and hardware resources. 
For example, if you can process 4 prefill\nrequests per second and two decode requests per second the application will be stuck processing\ntwo requests per second. It is also important to note that the prefill and decode efficiency can vary based on\nthe prompt length and the number of tokens for a response respectively. Continuous batching and chunked prefill\ndo not have this issue as these techniques run prefill and decode on the same hardware.\n\nOne technique to remediate this is to run with a dynamic amount of prefill and decode servers. We call this\ndynamic xPyD. In the above example, we could run with 1 prefill and 2 decode servers so that our prefill and \ndecode efficiency will be balanced.\n\n\nProxy Server Architecture\n----------------------------\n\nThe proxy server routes messages between clients and workers in our disaggregated inference system. \nIt uses the Quart framework, Python's asyncio libraries, and etcd to manage this communication.\n\nMain Components\n^^^^^^^^^^^^^^^^^\n\n* **Framework**: Quart (for handling web requests)\n* **Task Management**: Python asyncio\n* **Request Forwarding**: Uses etcd to detect new prefill and decode workers (xPyD only)\n\nHow Requests Flow\n^^^^^^^^^^^^^^^^^\n\nWhen a client sends a request, the proxy server starts two tasks at the same time:\n\n.. code:: python\n\n    prefill_task = asyncio.create_task(anext(prefill_response))\n    decode_task = asyncio.create_task(anext(decode_response))\n\n    await prefill_task\n    async for chunk in handle_prefill_response(prefill_response,\n                                             streaming, endpoint,\n                                             uid, request_time):\n        yield chunk\n\n    await decode_task\n    async for chunk in handle_decode_response(decode_response,\n                                            streaming, endpoint, uid,\n                                            request_time):\n        yield chunk\n\nIf running in static 1P1D mode, the workers are pre-chosen. If running in dynamic \nxPyD mode, the workers are chosen by round-robin and discovered through etcd.\n\nThis approach offers two benefits:\n\n1. Faster responses because network delays don't stack up\n2. The decode server can get ready while prefill is working\n\nHow Tokens Work\n^^^^^^^^^^^^^^^^^\n\nThe proxy server handles tokens in specific ways to ensure accurate responses:\n\n**Prefill Settings**\n\n* Sets ``max_tokens=1`` for prefill requests\n* Returns the first output token\n\n**Decode settings**\n\n* Runs as normal except it skips the first token from decode\n\nOutput Types\n^^^^^^^^^^^^^^^\n\nThe system can work in two ways decided by the client if streaming is enabled:\n\n1. **Streaming Mode**\n   \n   * Sends tokens to the client one at a time\n   * Uses both prefill and decode servers\n   * Shows results as they're created\n\n2. **Batch Mode (stream=false)**\n   \n   * Sends all tokens at once when finished\n\nResponse Handling\n^^^^^^^^^^^^^^^^^^\n\nThe proxy server:\n\n* Combines responses from both servers\n* Keeps tokens in the right order\n* Makes sure outputs match what clients expect from a regular system\n\nDynamic xPyD (Multiple Prefill, Multiple Decode)\n--------------------------------------------------\n\nDynamic xPyD lets you use multiple prefill and decode workers and dynamically add new workers to the cluster.\n\n.. 
note::\n   The system can't yet remove or handle unresponsive nodes automatically.\n\n\nWorker Discovery and Connection Manager (neuron_connector.py)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe system keeps track of workers using an etcd server. Here's how it works:\n\n.. code:: python\n\n    class NeuronConnector:\n        def _keep_alive_ectd(self):\n            # Add worker to etcd\n            etcd_client.put(\n                f\"/workers/{self.role}/{self.local_ip}/{self.api_server_port}\",\n                json.dumps({\"connections\": []}),\n                lease\n            )\n\nThis manager:\n\n* Signs up workers with etcd \n* Keeps a list of active connections\n* Creates new buffers when needed (dynamic xPyD)\n* Or statically creates buffers (static 1P1D)\n\nSignal Plane (ZMQ Communication)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* **Router** (Prefill): Works with many decode connections\n* **Dealer** (Decode): Connects to prefill\n* **Message Types**:\n  \n  * Welcome message when connecting\n  * Setting up key-value maps\n  * Managing transfers\n\nBuffer Connection Management Details\n------------------------------------\n\nBuffer connection management is a critical component of the DI system that controls how servers communicate.\nThe system supports two modes of operation: static 1P1D and dynamic xPyD.\nThe connection management is done by ``neuron_connector.py`` and the actual buffer class is in ``neuron_buffer.py``.\n\nWe use two types of buffers:\n\n* ``SendBuffer``: For prefill workers\n* ``RecvBuffer``: For decode workers\n\nStatic 1P1D Mode\n-----------------\n\nIn static mode, the system creates a single buffer for each worker during initialization:\n\n.. code-block:: python\n\n    def initialize_buffer(self):\n        if self.config.is_kv_producer:\n            self.static_buffer = SendBuffer(\n                self.kv_caches,\n                self.zmq_context,\n                self.neuron_recv_ip,\n                self.config.kv_ip,\n                self.config.kv_port\n            )\n\nThis approach means:\n\n* All connection components are predefined\n* Communication paths are fixed\n* Buffers have predetermined communication partners\n\nDynamic xPyD Mode\n------------------\n\nIn dynamic mode, the system creates buffers on demand. Both SendBuffers and RecvBuffers can be created dynamically:\n\n.. code-block:: python\n\n    def maybe_setup_buffer(self, remote_ip, remote_port):\n        if self.static_buffer:\n            return self.static_buffer\n\n        key = \"\" if self.config.is_kv_producer else (remote_ip, remote_port)\n        \n        if key in self.connection_dict:\n            return self.connection_dict[key]\n\nKey differences in dynamic mode:\n\n1. One to many relationship between SendBuffers and RecvBuffers\n2. Workers register themselves in etcd for service discovery\n3. New connection determined by proxy server, the info is encoded in the request_id\n4. Workers check their connection dictionary for existing buffers encoded in the request_id\n5. If no buffer exists, they create a new one using the proxy server's information\n6. The new buffer establishes ZMQ communication with its partner\n\nThis dynamic approach allows the system to:\n\n* Add new connections as needed\n* Scale with changing workloads\n* Maintain efficient communication paths\n* Adapt to cluster changes\n\n\nTransfer Engine and Communication\n---------------------------------\n\nBelow is an image showing the KV cache transfer process on neuron:\n\n.. 
image:: /libraries/nxd-inference/developer_guides/images/di_transfer_architecture.png\n    :alt: High Level Transfer Architecture\n\nTransfer Engine (neuron_transfer_engine.py)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe transfer engine moves KV cache efficiently between workers:\n\n.. code:: python\n\n    class NeuronTransferEngine:\n        def transfer_neuron_tensors(self, tensors, offsets, lengths, peer_devices, ...):\n            self.engine.queue_transfer_with_token(\n                tensors, offsets, lengths, peer_devices, self.local_devices,\n                self.comm_ids, completion_count, completion_token, use_queue,\n                completion_time_out)\n\nThe engine:\n\n* Sets up KV communication between devices\n* Calls Neuron Runtime APIs to move KV caches\n* Tracks when transfers finish\n\nZero-Copy Transfer System\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Send Handler (Prefill Side)**\n\n* Runs in its own thread\n* Listens for requests from decode servers\n* Handles three types of requests:\n  \n  * Handshakes to confirm connection establishment\n  * Setting up KV cache maps\n  * Decode server requests for KV cache transfer (lookup_all)\n\nHere's how it works:\n\n.. code:: python\n\n    def send_handler(self):\n        while True:\n            identity, request = self.router.recv_json()\n        \n            if request[\"type\"] == \"handshake\":\n                self.router.send_json(identity, {\n                    \"status\": \"ok\",\n                    \"timestamp\": time.time()\n                })\n                continue\n        \n            if request[\"type\"] == \"kv_map_init\":\n                # Set up transfer details\n                continue\n                \n            if request[\"type\"] == \"lookup_all\":\n                self._process_lookup_all(identity, request)\n                continue\n\n**Receive Handler (Decode Side)**\n\n* Keeps a list of waiting transfers\n* For each task:\n  \n  * Sends request to prefill server\n  * Waits for answer\n  * If successful:\n    \n    * Saves the output token from signal plane\n    * Starts moving the KV cache through Transfer Engine\n  \n  * If it fails:\n    \n    * Tries again later\n\nStarting Transfers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOn the Prefill Side:\n\n.. code:: python\n\n    # ensure that the request is finished prefill\n    if request_id not in self.lookup_dict:\n        self.router.send_json(identity, {\"success\": False})\n        return\n\n    # After getting decode server request and prefill is finished\n    kv_caches, offsets, lengths, peer_devices = \\\n        self.generate_transfer_sequences(entry, remote_id=identity_str)\n\n    # Start transfer\n    self.get_transfer_engine(remote_id=identity_str).transfer_neuron_tensors(\n        kv_caches, offsets, lengths, peer_devices,\n        completion_token=entry.completion_token)\n\nOn the Decode Side:\n\n.. code:: python\n\n    # receive prefill worker's output token\n    entry.output_token = torch.tensor(\n        response[\"output_token\"]).unsqueeze(0)\n\n    kv_caches, offsets, lengths, peer_devices = \\\n         self.generate_transfer_sequences(entry)\n\n    # do not wait for request completion for recv buffer\n    self.get_transfer_engine().transfer_neuron_tensors(\n        kv_caches, offsets,lengths, peer_devices,\n        completion_token=entry.completion_token)\n\nThe ``completion_token`` provides the status of the transfer.\n\n.. 
note::\n\n    These are separate threads from the main inference process and do not block ongoing inference.\n\n\nRequest Scheduling Rules\n------------------------\n\nHere are new scheduling rules for Disaggregated Inference:\n\nPrefill Worker Rules\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Requests can be: Waiting, Transferring, or Running\n* Only one request can run at a time\n* Total of transferring + running must not exceed batch size\n* Can start new requests when:\n  \n  * Nothing is running\n  * Number of transfers is less than batch size\n\nDecode Worker Rules\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Uses same request states as prefill\n* Running + transferring must not exceed batch size\n* Running must not exceed batch size\n* Must finish key-value cache transfer before running\n* Can start new transfers when there's space\n\nScheduler Jobs\n^^^^^^^^^^^^^^^\n\n* Adds transfer requests to a list\n* Checks status without blocking\n* Uses status to make decisions\n* Doesn't handle transfers directly\n\nThese rules help:\n\n* Keep key-value caches safe\n* Use resources well\n* Process batches efficiently\n* Keep scheduling separate from transfers\n\nExample Usage\n-------------\n\nRefer to the `offline inference DI example <https://github.com/aws-neuron/upstreaming-to-vllm/tree/neuron-2.26/examples/offline_inference/neuron_di.py>`_\nfor a quick example to get started.\n\nRefer to the :ref:`Disaggregated Inference Tutorial<nxdi-disaggregated-inference-tutorial>` for a detailed usage guide."
  },
  {
    "path": "libraries/nxd-inference/developer_guides/feature-guide.rst",
    "content": ".. _nxdi-feature-guide:\n\nNxD Inference Features Configuration Guide\n==========================================\n\nNxD Inference (``neuronx-distributed-inference``) is\nan open-source PyTorch-based inference library that simplifies deep learning\nmodel deployment on AWS Inferentia and Trainium instances. Neuronx Distributed\nInference includes a model hub and modules that users can reference to\nimplement their own models on Neuron.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nCheckpoint compatibility with HuggingFace Transformers\n------------------------------------------------------\n\nModels included in the NxD Inference model hub are checkpoint-compatible with\nHuggingFace Transformers. Supporting other checkpoint formats in NxD Inference is possible through converting the\nobtained checkpoint to the standard HuggingFace Transformers checkpoint format.\n\n.. _nxdi-checkpoint-support:\n\nCheckpoint support\n------------------\n\nNxD Inference supports older PyTorch binary checkpoints\nand newer `safetensors <https://github.com/huggingface/safetensors>`__\ncheckpoints. For improved load speed and reduced host memory\nconsumption, we recommend to always use safetensors by default. Both\nregular and sharded variants of checkpoints are supported.\n\nNxD Inference supports weights stored in the model path in the following\nformats:\n\n=========== ======= ============================\nFormat      Sharded File name\n=========== ======= ============================\nSafetensors No      model.safetensors\nSafetensors Yes     model.safetensors.index.json\nPickle      No      pytorch_model.bin\nPickle      Yes     pytorch_model.bin.index.json\n=========== ======= ============================\n\nIf your weights are in another format, you must convert them to one of\nthese formats before you can compile and load the model to Neuron. See\nthe following references for more information about these formats:\n\n- Safetensors:\n\n  - https://github.com/huggingface/safetensors\n  - https://huggingface.co/docs/safetensors/en/convert-weights\n\n- Pickle:\n\n  - https://docs.python.org/3/library/pickle.html\n\nCompiling models\n----------------\nTo run a model on Neuron with NxD Inference, you compile Python code into\na NEFF file (Neuron Executable File Format), which you can load to Neuron\ndevices using the Neuron Runtime.\n\nWhen you call ``compile()``, NxD Inference does the following:\n\n1. Trace the Python code to produce an HLO file.\n2. Use the Neuron Compiler to compile the HLO file into a NEFF.\n\nDuring the trace process, the model code is traced based on a given sample\ntensor for each input. As a result, model code should avoid dynamic logic\nthat depends on the input values in a tensor, because NxD Inference compiles\nonly the code path that is traced for the sample input tensor.\n\n::\n\n    # Configure, initialize, and compile a model.\n    model = NeuronLlamaForCausalLM(model_path, config)\n    model.compile(compiled_model_path)\n\n\n.. _nxdi-neuron-persistent-cache:\n\nNeuron Persistent Cache\n------------------------\n\nThe Neuron Persistent Cache is enabled by default for NxD Inference library.\nModel artifacts which have been compiled once will be cached and reused on\nsuccessive runs when possible. Model artifacts will only be reused when\ncompiling with the same compiler version (neuronx-cc), model configurations,\nand compiler flags. Neuron Persistent Cache also includes other features, such as using an S3 bucket as\nthe cache backend. 
For more detailed information, see the\n:ref:`Persistent cache documentation <neuron-caching>`\n\n\nSerialization support\n---------------------\n\nWhen you compile a model with NxD Inference, the library\nserializes the model to a given folder. After you have a serialized\nmodel, you can load it directly to a Neuron device without needing to\ncompile again.\n\nThe compile function does not serialize sharded weights by default, and you can\nenable this functionality with the ``save_sharded_checkpoint`` flag in\nNeuronConfig. For more information on weights sharding, see :ref:`nxdi-weights-sharding-guide`.\n\nLogical NeuronCore Configuration (LNC) support\n----------------------------------------------\nOn Trn2 instances, Neuron supports Logical NeuronCore (LNC) configuration,\nwhich combines multiple physical NeuronCores into a single logical\nNeuronCore. On Trn2 instances, the Neuron SDK is optimized for LNC=2, which means\neach NeuronCore visible to the Neuron SDK is two physical NeuronCores.\n\nNxD Inference automatically chooses the correct LNC configuration\nbased on the target platform. To override the default LNC configuration,\nyou can set the ``NEURON_LOGICAL_NC_CONFIG`` environment variable, or set the\n``logical_nc_config`` flag in NeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(logical_nc_config=2)\n\nFor more information about logical NeuronCore support, see\n:ref:`logical-neuroncore-config`.\n\n.. _nxdi-feature-guide-tensor-parallelism:\n\nTensor-parallelism support\n--------------------------\n\nFor transformer decoders used in large language models,\ntensor-parallelism is necessary as it provides a way to shard the\nmodels' large weight matrices onto multiple NeuronCores, and having\nNeuronCores working on the same matrix multiply operation\ncollaboratively. neuronx-distributed-inference's tensor-parallelism\nsupport makes heavy use of collective operations such as all-reduce,\nwhich is supported natively by the Neuron runtime.\n\nThere are some principles for setting tensor-parallelism degree (number\nof NeuronCores participating in sharded matrix multiply operations) for\nNeuron-optimized transformer decoder models.\n\n1. The number of attention heads needs to be divisible by the\n   tensor-parallelism degree.\n2. The total data size of model weights and key-value caches needs to be\n   smaller than the tensor-parallelism degree multiplied by the amount\n   of memory per Neuron core.\n\n   1. On Trn2, each Neuron core has 24GB of memory (with LNC2).\n   2. On Inf2/Trn1, each Neuron core has 16GB of memory.\n\n3. The Neuron runtime supports the following tensor-parallelism degrees:\n\n   1. Trn2: 1, 2, 4, 8, 16, 32, and 64 (with LNC2)\n   2. Inf2: 1, 2, 4, 8, and 24\n   3. Trn1: 1, 2, 8, 16, and 32\n\nExamples\n~~~~~~~~\n\n1. ``meta-llama/Meta-Llama-3.1-8B`` has 32 attention heads, and when\n   running at batch size 1 and bfloat16 precision, the model requires\n   about 16GB memory. Therefore, a ``trn1.2xlarge`` with 32GB device\n   memory is sufficient.\n2. ``meta-llama/Meta-Llama-3.1-70B`` has 64 attention heads, and when\n   running at batch size 1 and bfloat16 precision, the model requires\n   about 148GB memory. Therefore, it can run on 16 NeuronCores on one\n   ``trn1.32xlarge`` using 256GB device memory.\n\n.. _nxdi-feature-guide-sequence-parallelism:\n\nSequence Parallelism\n--------------------\nSequence parallelism splits tensors across the sequence dimension to\nimprove performance. 
You can enable sequence parallelism by setting\n``sequence_parallel_enabled=True`` in NeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(sequence_parallel_enabled=True)\n\nCompile-time Configurations\n---------------------------\n\nNxD Inference models support a variety of compile-time\nconfigurations you can use to tune model performance. For more\ninformation, see the :ref:`nxd-inference-api-guide`.\n\nHugging Face generate() API support\n-----------------------------------\n\nNxD Inference models support the HuggingFace `generate()\nAPI <https://huggingface.co/docs/transformers/main/en/main_classes/text_generation>`__\nvia the ``HuggingFaceGenerationAdapter`` class. This adapter wraps a\nNeuron model to provide the HuggingFace generation interface.\n\nNxD Inference supports the following HuggingFace\ngeneration modes:\n\n- Greedy decoding — ``num_beams=1`` and ``do_sample=False``.\n- Multinomial sampling — ``num_beams=1`` and ``do_sample=True``.\n- Assisted (speculative) decoding — ``assistant_model`` or\n  ``prompt_lookup_num_tokens`` are specified.\n\nNxD Inference doesn't currently support other\nHuggingFace generation modes such as beam-search sampling.\n\nNote: When you call ``generate``, the number of prompts must match the\n``batch_size`` for the model, which is an attribute of NeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(batch_size=2)\n\nExample\n~~~~~~~\n\nThe following example demonstrates how to wrap a model with\nHuggingFaceGenerationAdapter to call ``generate()``.\n\n::\n\n   from neuronx_distributed_inference.utils.hf_adapter import HuggingFaceGenerationAdapter\n\n   # Init Neuron model, HuggingFace tokenizer, HuggingFace and generation config.\n\n\n   # Run generation with HuggingFaceGenerationAdapter.\n   generation_model = HuggingFaceGenerationAdapter(model)\n   inputs = tokenizer(prompts, padding=True, return_tensors=\"pt\")\n   outputs = generation_model.generate(\n       inputs.input_ids,\n       generation_config=generation_config,\n       attention_mask=inputs.attention_mask,\n       max_length=model.neuron_config.max_length,\n       **kwargs,\n   )\n\n   output_tokens = tokenizer.batch_decode(\n       outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False\n   )\n\n   print(\"Generated outputs:\")\n   for i, output_token in enumerate(output_tokens):\n       print(f\"Output {i}: {output_token}\")\n\nOn-device Sampling Support\n--------------------------\n\nOn-device sampling performs sampling logic on the Neuron device (rather\nthan on the CPU) to achieve better performance. To enable on-device\nsampling, provide an OnDeviceSamplingConfig for the\n``on_device_sampling_config`` attribute in NeuronConfig.\n\n::\n\n   on_device_sampling_config = OnDeviceSamplingConfig(global_topk=256)\n   neuron_config = NeuronConfig(on_device_sampling_config=on_device_sampling_config)\n\nDynamic Sampling\n~~~~~~~~~~~~~~~~\n\nWith dynamic sampling, you can pass different ``top_k``, ``top_p``, and\n``temperature`` values to the ``forward`` call to configure sampling for\neach input in a batch. To enable dynamic sampling, provide an\nOnDeviceSamplingConfig with ``dynamic=True``.\n\n::\n\n   on_device_sampling_config = OnDeviceSamplingConfig(dynamic=True)\n   neuron_config = NeuronConfig(on_device_sampling_config=on_device_sampling_config)\n\nTo use dynamic sampling, pass a ``sampling_params`` tensor to the\nforward function of the model. 
The ``sampling_params`` tensor has shape\n``[batch_size, 3]``, where the three values per batch are ``top_k``,\n``top_p``, and ``temperature``.\n\nThe following example demonstrates how to create ``sampling_params`` for\na batch with two inputs. In the first input, ``top_k=50``,\n``top_p=0.5``, and ``temperature=0.75``. In the second input,\n``top_k=5``, ``top_p=1.0``, and ``temperature=1.0``.\n\n::\n\n   sampling_params = torch.tensor([[50, 0.5, 0.75], [5, 1.0, 1.0]])\n\nGreedy Sampling\n~~~~~~~~~~~~~~~\n\nBy default, on-device sampling uses greedy sampling, where the model\npicks the highest scoring token.\n\nMultinomial (Top-K) Sampling\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWith multinomial (top-k) sampling, the model picks one of the top\n*k*-highest scoring tokens. To use on-device multinomial sampling, you\nmust enable dynamic sampling. You can configure the default ``top_k``\nattribute in the OnDeviceSamplingConfig, or you can specify the\n``top_k`` value in each call to the model's ``forward`` function.\n\n::\n\n   on_device_sampling_config = OnDeviceSamplingConfig(top_k=5)\n\nTop-P Support in On-Device Sampling\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo use top-p in on-device sampling, enable dynamic sampling, and specify\n``top_p`` values in the ``sampling_params``.\n\nTemperature Support in On-Device Sampling\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo adjust temperature in on-device sampling, enable dynamic sampling,\nand specify ``temperature`` values in the ``sampling_params``.\n\n.. _qkv-weight-fusion:\n\nQKV Weight Fusion\n-----------------\n\nQKV weight fusion concatenates a model's query, key and value weight\nmatrices to achieve better performance, because larger matrices allow\nfor more efficient data movement and compute. You can enable QKV weight\nfusion by setting ``fused_qkv=True`` in the NeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(fused_qkv=True)\n\n.. _nxdi-bucketing:\n\nBucketing\n---------\n\nLLM inference is a generation process that can produce variable length\nsequences. This poses a problem since the Neuron compiler produces\nexecutables which expect statically shaped inputs and outputs. To make\nLLMs work with different shapes, NxD Inference supports\nbuckets and applies padding wherever it is required. When you run\ninference, NxD Inference automatically chooses the\nsmallest bucket that fits the input for optimal performance. For more\ninformation about bucketing, see :ref:`torch-neuronx-autobucketing-devguide`.\n\nAutomatic Bucketing\n~~~~~~~~~~~~~~~~~~~\n\nWhen automatic bucketing is enabled, NxD Inference\nautomatically chooses buckets for each model according to the following\nlogic:\n\n- Context encoding: Powers of two between 128 and the max context\n  length.\n\n  - Note: Max context length is equivalent to sequence length by\n    default.\n\n- Token generation: Powers of two between 128 and the maximum sequence\n  length.\n\nTo enable automatic bucketing, set ``enable_bucketing=True`` in\nNeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(enable_bucketing=True)\n\nConfiguring Specific Buckets\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can configure specific buckets to further optimize inference based\non the input and output length distribution that you expect to process\nwith your model. 
In NeuronConfig, set ``enable_bucketing=True``, and\nprovide a list of bucket sizes in ``context_encoding_buckets`` and/or\n``token_generation_buckets``.\n\n::\n\n   neuron_config = NeuronConfig(\n       enable_bucketing=True,\n       context_encoding_buckets=[1024, 2048, 4096],\n       token_generation_buckets=[8192]\n   )\n\n.. _nxdi-quantization:\n\nQuantization\n------------\n\nNxD Inference supports quantization, where model weights\nand data are converted to a smaller data type to reduce memory bandwidth\nusage, which improves model performance.\n\nNote: Quantization slightly reduces accuracy due to using data types\nwith lower precision and/or lower range.\n\n.. _nxdi-weight-quantization:\n\nModel Weight Quantization\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports quantizing model weights to the\nfollowing data types:\n\n- INT8 (``int8``) - 8 bit int.\n- FP8 - 8 bit float.\n\n  - ``f8e4m3`` - 8-bit float with greater precision and less range.\n\n    - Important: To use ``f8e4m3`` for quantization, you must set the\n      ``XLA_HANDLE_SPECIAL_SCALAR`` environment variable to ``1``.\n\n  - ``f8e5m2`` - 8-bit float with greater range and less precision.\n\nNxD Inference supports the following quantization methods, which you specify with `quantization_type` in NeuronConfig:\n\n- `per_tensor_symmetric`\n- `per_channel_symmetric`\n\n.. _example-1:\n\nExample\n^^^^^^^\n\nThe following example demonstrates how to quantize a model to INT8. To quantize\na model to a different data type, change the ``quantization_dtype`` config\nattribute in ``NeuronConfig``.\n\n::\n\n   from neuronx_distributed_inference.models.config import NeuronConfig\n   from neuronx_distributed_inference.models.llama.modeling_llama import (\n       LlamaInferenceConfig,\n       NeuronLlamaForCausalLM\n   )\n   from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n\n   model_path = \"/home/ubuntu/models/Llama-3.1-8B\"\n   quantized_model_path = \"/home/ubuntu/models/Llama-3.1-8B-quantized\"\n\n   neuron_config = NeuronConfig(\n       quantized=True,\n       quantized_checkpoints_path=quantized_model_path,\n       quantization_dtype=\"int8\",\n       quantization_type=\"per_tensor_symmetric\"\n   )\n\n   config = LlamaInferenceConfig(\n       neuron_config,\n       load_config=load_pretrained_config(model_path)\n   )\n\n   # Quantize the model and save it to `quantized_checkpoints_path`.\n   NeuronLlamaForCausalLM.save_quantized_state_dict(model_path, config)\n\n   # Compile, load, and use the model.\n   model = NeuronLlamaForCausalLM(model_path, config)\n\n.. _nxdi-kv-cache-quantization:\n\nKV Cache Quantization\n~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports KV cache quantization, where the\nmodel's KV cache is quantized to a smaller data type. When enabled, the\nmodel quantizes the KV cache to the ``torch.float8_e4m3fn`` data type.\nBefore using the KV cache, the model dequantizes the KV cache to the data\ntype specified by ``torch_dtype`` in NeuronConfig.\n\nTo enable KV cache quantization, set ``kv_cache_quant=True`` in\nNeuronConfig.\n\n::\n\n   neuron_config = NeuronConfig(kv_cache_quant=True)\n\n- Important: To use KV cache quantization, you must set the\n  ``XLA_HANDLE_SPECIAL_SCALAR`` environment variable to ``1``.\n\n.. _nxd-speculative-decoding:\n\nSpeculative Decoding\n--------------------\n\nSpeculative decoding is a performance optimization technique where a\nsmaller *draft* LLM model predicts the next tokens, and the larger *target*\nLLM model verifies those predictions. 
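\n\nConceptually, each speculation step proposes a few draft tokens and keeps only the prefix that the target model agrees with. The following minimal sketch is illustrative only; it is not NxD Inference code, and the ``draft_next`` and ``target_next`` helpers are assumed greedy next-token callables::\n\n   # Simplified greedy draft-and-verify step. A real implementation verifies all\n   # proposed tokens with a single target forward pass instead of one call per token.\n   def speculative_step(tokens, draft_next, target_next, speculation_length=4):\n       # 1. The draft model proposes `speculation_length` tokens autoregressively.\n       proposed = []\n       context = list(tokens)\n       for _ in range(speculation_length):\n           token = draft_next(context)\n           proposed.append(token)\n           context.append(token)\n\n       # 2. The target model verifies the proposals and accepts the longest matching prefix.\n       accepted = []\n       context = list(tokens)\n       for token in proposed:\n           expected = target_next(context)\n           if expected != token:\n               accepted.append(expected)  # replace the first mismatch with the target's token\n               return tokens + accepted\n           accepted.append(token)\n           context.append(token)\n\n       # 3. If every proposal is accepted, the target contributes one extra token.\n       accepted.append(target_next(context))\n       return tokens + accepted\n\n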
NxD Inference supports\nthe following speculative decoding implementations:\n\n1. :ref:`Speculative decoding with a draft model <nxd-vanilla-speculative-decoding>`,\n   where a separate draft model predicts the next *n* tokens for the target\n   model. Each model is compiled independently.\n2. :ref:`Medusa speculative decoding<nxd-medusa-speculative-decoding>`,\n   where several small model heads predict next tokens, and the target\n   model verifies all predictions at the same time.\n3. :ref:`EAGLE speculative decoding<nxd-eagle-speculative-decoding>`,\n   where the draft model uses additional context from the target model\n   to improve generation efficiency. NxD Inference supports EAGLE v1 with\n   a flat draft structure.\n\n.. _nxd-vanilla-speculative-decoding:\n\nSpeculative Decoding with a Draft model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo use speculative decoding with a draft model, you configure, compile, and load a\ndraft model in addition to the main target model. To enable \nspeculative decoding with a draft model, set ``speculation_length`` and\n``trace_tokengen_model=False`` in the target model's NeuronConfig. The\ndraft model's NeuronConfig should use the same configuration but with\nthese additional attributes reset to their defaults.\n\n Speculative decoding with a draft model currently supports only batch sizes of 1.\n\n.. _example-2:\n\nExample\n^^^^^^^\n\nThe following example demonstrates using Llama-3.2 3B as a draft model\nfor Llama-3.1 70B. The speculation length is set to 5 tokens.\n\n::\n\n   import copy\n\n   from transformers import AutoTokenizer, GenerationConfig\n\n   from neuronx_distributed_inference.models.config import NeuronConfig\n   from neuronx_distributed_inference.models.llama.modeling_llama import (\n       LlamaInferenceConfig,\n       NeuronLlamaForCausalLM\n   )\n   from neuronx_distributed_inference.utils.accuracy import get_generate_outputs\n   from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n\n   prompts = [\"I believe the meaning of life is\"]\n\n   model_path = \"/home/ubuntu/models/Llama-3.1-70B\"\n   draft_model_path = \"/home/ubuntu/models/Llama-3.2-3B\"\n   compiled_model_path = \"/home/ubuntu/neuron_models/Llama-3.1-70B\"\n   compiled_draft_model_path = \"/home/ubuntu/neuron_models/Llama-3.2-3B\"\n\n   # Initialize target model.\n   neuron_config = NeuronConfig(\n       speculation_length=5,\n       trace_tokengen_model=False\n   )\n   config = LlamaInferenceConfig(\n       neuron_config,\n       load_config=load_pretrained_config(model_path)\n   )\n   model = NeuronLlamaForCausalLM(model_path, config)\n\n   # Initialize draft model.\n   draft_neuron_config = copy.deepcopy(neuron_config)\n   draft_neuron_config.speculation_length = 0\n   draft_neuron_config.trace_tokengen_model = True\n   draft_config = LlamaInferenceConfig(\n       draft_neuron_config,\n       load_config=load_pretrained_config(draft_model_path)\n   )\n   draft_model = NeuronLlamaForCausalLM(draft_model_path, draft_config)\n\n   # Compile and save models.\n   model.compile(compiled_model_path)\n   draft_model.compile(compiled_draft_model_path)\n\n   # Load models to the Neuron device.\n   model.load(compiled_model_path)\n   draft_model.load(compiled_draft_model_path)\n\n   # Load tokenizer and generation config.\n   tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side=neuron_config.padding_side)\n   generation_config = GenerationConfig.from_pretrained(model_path)\n\n   # Run generation.\n   _, 
output_tokens = get_generate_outputs(\n       model,\n       prompts,\n       tokenizer,\n       is_hf=False,\n       draft_model=draft_model,\n       generation_config=generation_config\n   )\n\n   print(\"Generated outputs:\")\n   for i, output_token in enumerate(output_tokens):\n       print(f\"Output {i}: {output_token}\")\n\n.. _nxd-medusa-speculative-decoding:\n\nMedusa Speculative Decoding\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo use Medusa speculative decoding, you must use a model that is\nspecifically fine-tuned for Medusa speculation, such as\n`text-generation-inference/Mistral-7B-Instruct-v0.2-medusa <https://huggingface.co/text-generation-inference/Mistral-7B-Instruct-v0.2-medusa>`__.\nYou must also provide a Medusa tree. For an example Medusa tree, see\n``medusa_mc_sim_7b_63.json`` in the ``examples`` folder in NeuronX\nDistributed Inference.\n\nTo enable Medusa, set ``is_medusa=True``, set the\n``medusa_speculation_length``, set the ``num_medusa_heads``, and specify\nthe ``medusa_tree``.\n\n::\n\n   def load_json_file(json_path):\n       with open(json_path, \"r\") as f:\n           return json.load(f)\n\n   medusa_tree = load_json_file(\"medusa_mc_sim_7b_63.json\")\n\n   neuron_config = NeuronConfig(\n       is_medusa=True,\n       medusa_speculation_length=64,\n       num_medusa_heads=4,\n       medusa_tree=medusa_tree\n   )\n\nTo run generation with a Medusa model and the HuggingFace ``generate()``\nAPI, set the ``assistant_model`` to the target model.\n\nFor more information about Medusa speculative decoding, see the official\nimplementation on GitHub: https://github.com/FasterDecoding/Medusa.\n\nMedusa speculative decoding currently supports only batch sizes of 1.\n\n.. _nxd-eagle-speculative-decoding:\n\nEAGLE Speculative Decoding\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports EAGLE v1 speculative decoding with a flat draft structure.\n\nEAGLE Checkpoint Compatibility\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo use EAGLE speculative decoding, you must use a draft\nmodel that is specifically fine-tuned for EAGLE speculation. Additionally, to use EAGLE with\nNxD Inference, the draft model must include the LM head weights from the target model.\nThese weights are shared between the draft and target model.\n\nBecause NxD Inference uses a flat draft structure, it predicts only one token per draft iteration.\nAlthough NxD Inference doesn't support EAGLE with a tree structure, you can train\nan EAGLE checkpoint in the same way. Note that depending on your use case and dataset, you\nmight see lower acceptance rate with the flat draft structure compared with using a tree structure.\n\nNxD Inference supports EAGLE models with or without input normalization. By default,\nNxD Inference expects that the EAGLE model doesn't use input normalization. To use\nan EAGLE model with input normalization, set ``enable_eagle_draft_input_norm`` to ``True``\nin NeuronConfig.\n\nYou can find links to pretrained EAGLE draft model checkpoints for various\npopular models in the official EAGLE repository on GitHub: https://github.com/SafeAILab/EAGLE.\nHowever, these pretrained EAGLE model checkpoints don't include the LM head\nweights from the target model. 
To use these pretrained checkpoints with NxD Inference,\nyou must first copy the LM head weights from the target to the draft model.\n\nThe following code demonstrates how to perform this operation for a `Llama-3.1-70B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`__\ntarget model and the corresponding `EAGLE draft <https://huggingface.co/yuhuili/EAGLE-LLaMA3-Instruct-70B>`__:\n\n::\n\n    import json\n    import os\n\n    import torch\n    from safetensors import safe_open\n    from safetensors.torch import save_file\n\n    target_model_path = \"Meta-Llama-3.1-70B-Instruct\"\n    draft_model_path = \"Llama-3.1-70B-Instruct-EAGLE-Draft\"\n\n    DRAFT_MODEL_SAFETENSORS_NAME = \"model.safetensors\"\n    LM_HEAD_WEIGHT_TENSOR_NAME = \"lm_head.weight\"\n    TARGET_MODEL_SAFETENSORS_INDEX_NAME = \"model.safetensors.index.json\"\n\n    def find_lm_head_safetensors_location(model_dir):\n        model_index_location_path = os.path.join(model_dir, TARGET_MODEL_SAFETENSORS_INDEX_NAME)\n\n        with open(model_index_location_path, 'r') as f:\n            model_index_locations = json.load(f)\n\n        lm_head_safetensors_name = model_index_locations[\"weight_map\"][LM_HEAD_WEIGHT_TENSOR_NAME]\n\n        return lm_head_safetensors_name\n\n    # Find the target model `lm_head.weight` location in safetensors\n    target_lm_head_safetensors_name = find_lm_head_safetensors_location(target_model_path)\n    target_lm_head_safetensors_path = os.path.join(target_model_path, target_lm_head_safetensors_name)\n\n    # Open the target model.safetensor containing `lm_head.weight`\n    with safe_open(target_lm_head_safetensors_path, framework=\"pt\") as f:\n        target_lm_head = f.get_tensor(LM_HEAD_WEIGHT_TENSOR_NAME)\n\n    # Collect all tensors in the draft model\n    draft_model_safetensors_path = os.path.join(draft_model_path, DRAFT_MODEL_SAFETENSORS_NAME)\n    tensors = {}\n    with safe_open(draft_model_safetensors_path, framework=\"pt\") as f:\n        for key in f.keys():\n            tensors[key] = f.get_tensor(key)\n\n    # Add the LM head weights and save out the new draft model.safetensors file\n    tensors[LM_HEAD_WEIGHT_TENSOR_NAME] = target_lm_head.type(torch.float16)\n    save_file(tensors, draft_model_safetensors_path)\n\n.. _nxd-fused-speculative-decoding:\n\nFused Speculation\n^^^^^^^^^^^^^^^^^\n\nEAGLE speculation uses a feature called *fused speculation*, where the\ndraft model and target model are fused into a single compiled model to\nimprove performance. Fused speculation uses a different config called\nFusedSpecNeuronConfig, which specifies the model class. draft config,\nand draft model path to fuse with the target model.\n\n.. 
_example-3:\n\nExample\n^^^^^^^\n\n::\n\n    import copy\n\n    from neuronx_distributed_inference.models.config import (\n        FusedSpecNeuronConfig,\n        NeuronConfig,\n        OnDeviceSamplingConfig\n    )\n    from neuronx_distributed_inference.models.llama.modeling_llama import (\n        NeuronLlamaForCausalLM,\n        NeuronLlamaModel\n    )\n    from neuronx_distributed_inference.utils.accuracy import get_generate_outputs\n    from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n    from transformers import AutoTokenizer, GenerationConfig\n\n    prompt = \"The future of AI is\"\n\n    model_path = \"/home/ubuntu/models/Llama-3.1-70B-Instruct\"\n    draft_model_path = \"/home/ubuntu/models/Llama-3.1-70B-Instruct-EAGLE-Draft\"\n    compiled_model_path = \"/home/ubuntu/neuron_models/Llama-3.1-70B-Instruct-EAGLE\"\n    max_sequence_length = 1024\n\n    # Initialize on-device sampling configuration.\n    on_device_sampling_config = OnDeviceSamplingConfig(\n        temperature=0.7,\n        top_k=50,\n        top_p=1.0,\n    )\n\n    # Initialize model configuration.\n    neuron_config = NeuronConfig(\n        # Neuron supports EAGLE batch sizes greater than 1.\n        # We set batch size to 1 in this tutorial due to a\n        # limitation in the transformers library for\n        # generation with speculative decoding.\n        # For more information, see: https://github.com/huggingface/transformers/issues/32165\n        batch_size = 1,\n        enable_eagle_speculation=True,\n        enable_fused_speculation=True,\n        max_context_length=max_sequence_length,\n        max_length=max_sequence_length,\n        on_device_sampling_config=on_device_sampling_config,\n        seq_len=max_sequence_length,\n        speculation_length=5,\n        # For best performance, set to the maximum tensor\n        # parallelism of your Neuron instance type.\n        tp_degree=32,\n        trace_tokengen_model=False\n    )\n\n    config = NeuronLlamaForCausalLM.get_config_cls()(\n        neuron_config, load_config=load_pretrained_config(model_path)\n    )\n\n    # Initialize draft model configuration and set EAGLE-specific values.\n    draft_neuron_config = copy.deepcopy(neuron_config)\n    draft_neuron_config.trace_tokengen_model = True\n    draft_neuron_config.enable_fused_speculation = False\n    draft_neuron_config.is_eagle_draft = True\n    draft_neuron_config.sequence_parallel_enabled = False\n\n    draft_config = NeuronLlamaForCausalLM.get_config_cls()(\n        draft_neuron_config, load_config=load_pretrained_config(draft_model_path))\n\n    # Initialize fused speculation configuration.\n    fused_spec_config = FusedSpecNeuronConfig(\n        NeuronLlamaForCausalLM._model_cls,\n        draft_config=draft_config,\n        draft_model_path=draft_model_path,\n    )\n    config.fused_spec_config = fused_spec_config\n\n    # Initialize model from configuration.\n    model = NeuronLlamaForCausalLM(model_path, config)\n\n    # Compile and save model.\n    model.compile(compiled_model_path)\n\n    # Load model to the Neuron device.\n    model.load(compiled_model_path)\n\n    # Load tokenizer and generation config.\n    tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side=neuron_config.padding_side)\n    generation_config = GenerationConfig.from_pretrained(model_path)\n    generation_config.max_length = 1024\n    # pad_token_id is required for Hugging Face assisted sampling.\n    generation_config.pad_token_id = tokenizer.eos_token_id\n\n    # Run 
generation and print outputs.\n    _, output_tokens = get_generate_outputs(\n        model,\n        [prompt],\n        tokenizer,\n        is_hf=False,\n        # draft_model is not set here due to fused speculation.\n        draft_model=None,\n        generation_config=generation_config\n    )\n\n    print(\"Generated output:\")\n    for _, output in enumerate(output_tokens):\n        print(output)\n\nMoE model architecture support\n------------------------------\n\nNxD Inference supports mixture-of-experts (MoE) models.\nThe library includes ready-to-use modeling code for Mixtral and DBRX.\nThese models are built using reusable MoE modules from NeuronX\nDistributed Core: ``RouterTopK``, ``ExpertMLPs``, and ``MoE``. You can\nuse these modules to onboard additional MoE models.\n\nNxD Inference also provides a helper function,\n``initialize_moe_module``, which you can use to initialize an MoE\nmodel's MLP module from these MoE modules. For examples of how to use\nthis helper function, see the decoder layer module implementation in the\n`Mixtral <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/mixtral/modeling_mixtral.py>`__\nand `DBRX <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/dbrx/modeling_dbrx.py>`__\nmodeling code.\n\nGrouped-query attention (GQA) support\n-------------------------------------\n\nNxD Inference provides a reusable attention module,\nNeuronAttentionBase, which you can use when onboarding models. This\nmodule is also used in NxD Inference modeling code like Llama and\nMixtral.\n\nNxD Inference supports the following sharding strategies\nfor the KV cache used in the attention module:\n\n- ``CONVERT_TO_MHA`` — Transforms a GQA attention mechanism into a\n  traditional MHA mechanism by replicating the K/V heads to evenly match\n  the corresponding Q heads. This consumes more memory than would\n  otherwise be used with other sharding mechanisms but works in all\n  cases.\n- ``REPLICATE_TO_TP_DEGREE`` — Transforms a GQA attention mechanism such\n  that there is exactly one K/V head per tp_degree through replication,\n  e.g. 8 K/V heads with tp_degree=32 results in 32 K/V heads. This is\n  more memory efficient but does not work for all configurations. Q\n  heads are padded interleaved to retain correct alignment between Q and\n  K/V heads.\n\nThe NeuronAttentionBase module uses ``REPLICATE_TO_TP_DEGREE`` by\ndefault. If the TP degree isn't divisible by the number of KV heads,\nNeuronAttentionBase uses ``CONVERT_TO_MHA``.\n\n.. _nxdi_async_mode_feature_guide:\n\nAsynchronous Runtime Support\n----------------------------\n\nNxD Inference supports running certain model configurations with Asynchronous Runtime Mode (Async mode).\nAsync mode allows NxD Inference to parallelize CPU logic with Neuron (NEFF) logic. As a result, any CPU overheads\nwithin NxDI that exist between sequential model executions (e.g., the autoregressive loop in LLMs) are almost fully\neliminated. 
This reduces latency anywhere from 5% to 20% based on the model configuration.\n\nThis feature can be enabled by setting ``async_mode`` to ``True`` in ``NeuronConfig``.\n\nTo use Async mode, a model configuration must meet the following prerequisites:\n\n- On-device sampling is enabled.\n- If speculation is enabled, fused speculation must also be enabled.\n\nIt is highly recommended to set ``async_mode`` to ``True`` whenever these prerequisites are met, since it offers a latency reduction.\nFurthermore, this is a purely runtime feature, so if you have a previously compiled model and its configuration\nmeets the prerequisites above, ``async_mode`` will likely be able to improve performance.\n\n.. note::\n    If you are using vLLM, this feature works independently of vLLM's Async Engine. As a result, ``async_mode`` can be enabled\n    whether vLLM is used or not.\n\n.. _nxdi_prefix_caching:\n\nPrefix Caching Support\n----------------------\n\nPrefix caching is a performance optimization technique where prompts in multiple requests sharing the same prefix can reuse the\npreviously computed KV cache. When context encoding a prompt that starts with a previously computed prefix, the encoding of the\nprefix tokens will be skipped and the corresponding KV cache will be fetched and used for encoding the rest of the tokens (suffix).\nThe performance benefit comes from the time saved by re-using the KV cache instead of re-encoding the prefix tokens. NxD Inference\nsupports prefix caching during context encoding. To store the KV cache and match prefixes efficiently, NxD Inference uses a block KV cache\nlayout for prefix caching. NxD Inference does not implement its own cache eviction, memory management, or prefix hashing for matches.\nInstead, it requires external management of the block KV cache and expects active block tables and slot mappings to be provided with\neach generation request. This feature integrates with vLLM by enabling automatic prefix caching, which manages the block tables,\nhandles automatic prefix matching across prompts, and performs cache evictions. More on automatic prefix caching support in vLLM\ncan be found `here <https://docs.vllm.ai/en/latest/design/v1/prefix_caching.html>`__.\n\nTo enable prefix caching with NxD Inference, set ``is_prefix_caching=True`` in NeuronConfig along with configurations for\nthe block KV cache layout.\n\n::\n\n    neuron_config = NeuronConfig(\n        is_prefix_caching=True,\n        is_block_kv_layout=True,\n        pa_num_blocks=1024,\n        pa_block_size=32,\n    )\n\n``is_block_kv_layout=True`` and ``is_prefix_caching=True`` are set in NeuronConfig to enable the block KV cache layout and enable\nprefix caching. The first two dimensions of the KV cache are set to the number of blocks and block size, respectively. These\nconfigurations are specified using ``pa_num_blocks`` and ``pa_block_size`` in NeuronConfig. For optimal performance with Neuron,\nit's recommended to set ``pa_block_size=32``. The minimum required ``pa_num_blocks`` to be initialized is\n``(batch_size * max_model_len) / pa_block_size``. However, it is recommended to initialize more blocks than the required minimum\nto accommodate caching of common prefixes. The higher the number of blocks, the greater the likelihood of cache hits, as fewer\ncache evictions will occur. NxD Inference does not currently provide an automated solution to determine the maximum number of\nKV cache blocks that can be initialized in HBM without exceeding available memory space. 
\nCustomers are advised to experiment to find the number of blocks that balances the cache hit rate against the memory consumed. Any memory consumed by increasing the cache will\nimpact the batch sizes and sequence lengths that can be supported, so customers should pick the number of blocks\nwith these trade-offs and the specific inference workload they plan to run in production in mind.\n\nNxD Inference does not use paged attention for prefix caching. Instead, it follows a different process:\nfirst gathering the block KV cache using the block table, then converting it to a flat KV cache layout, computing attention, and\nfinally scattering the computed cache back to the block KV cache layout. This approach introduces overhead during\ntoken generation requests due to layout conversions, which can negatively impact performance as the ``max_model_len`` increases.\n\n.. _bucketing-with-prefix-caching:\n\nBucketing with Prefix Caching\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPrefix caching handles both the prefix (cache hit) and suffix (no cache hit) portions of input prompts during context encoding.\nA two-dimensional bucketing system has been introduced to support context encoding when prefix caching is enabled. This system\nuses separate dimensions corresponding to the prefix and suffix (non cache-hit portion) of the input prompts. In contrast,\ntoken generation still uses one-dimensional bucketing based on the maximum sequence length.\n\nWhen bucketing is enabled, NxD Inference creates prefill (suffix) buckets (covering the suffix portion) starting with powers of 2,\nranging from 512 up to the maximum context length. The prefix buckets mirror the prefill buckets, with one key difference: a special\nprefix bucket of size 0 is added to handle requests with no cache hits. NxD Inference then creates a two-dimensional grid of all prefill\nand prefix bucket combinations, which represents the effective set of buckets during context encoding. During request processing,\nNxD Inference first identifies the smallest prefill bucket that can accommodate the largest suffix portion of the input prompts.\nIf prefill padding is needed, NxD Inference prioritizes moving tokens from the prefix's end to the prefill bucket before adding padding.\nIt then determines the smallest prefix bucket that can fit the largest prefix across prompts. These two dimensions together determine\nthe final (prefill, prefix) bucket combination used to serve the context encoding request.\n\nYou can configure specific buckets to optimize inference based on the expected distribution of prefix lengths, input lengths, and\noutput lengths for your model. In NeuronConfig, set ``enable_bucketing=True``, and provide a list of bucket sizes in\n``context_encoding_buckets``, ``prefix_buckets``, and/or ``token_generation_buckets``. ``context_encoding_buckets`` corresponds to prefill\nbuckets when prefix caching is enabled.\n\n::\n\n    neuron_config = NeuronConfig(\n        enable_bucketing=True,\n        context_encoding_buckets=[512, 1024, 2048],\n        prefix_buckets=[512, 1024],\n        token_generation_buckets=[2048]\n    )\n\nExamples\n^^^^^^^^\n\nFor ``context_encoding_buckets=[512, 1024, 2048]`` and ``prefix_buckets=[512, 1024]``, consider the following requests:\n\n- Input prompt of size 1000 with no prefix: NxDI uses a prefill bucket of 1024 and a prefix bucket of 0.\n- Input prompt of size 800 with 128 as the prefix size and the remaining 672 as the suffix size: NxDI first selects 1024\n  as the prefill bucket.
\n  The remaining 352 prefill slots are filled by moving the entire prefix to the suffix part,\n  so a prefill bucket of 1024 and a prefix bucket of 0 are used here.\n- Input prompt of size 900 with 640 as the prefix size and the remaining 260 as the suffix size: NxDI first selects 512\n  as the prefill bucket. The remaining 252 prefill slots are filled by moving 252 tokens from the end of the prefix to the suffix part.\n  The effective prefix length becomes 388, so a prefill bucket of 512 and a prefix bucket of 512 are used.\n- Input prompt of size 1600 with 1280 as the prefix size and the remaining 320 as the suffix size: NxDI selects 512 as the\n  prefill bucket. The remaining 192 prefill slots are filled by moving 192 tokens from the end of the prefix to the suffix part.\n  The effective prefix length becomes 1088, which is larger than the largest prefix bucket of 1024. This leads to an exception\n  during request processing.\n\nThe two-dimensional bucketing system exponentially increases the number of context encoding buckets. Therefore, users should exercise caution\nwhen using auto-bucketing with large context lengths. It is recommended to limit the granularity of prefix buckets based on your\nspecific workload requirements.\n\nFor detailed examples of prefix caching with NxD Inference and vLLM, see :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial.ipynb`.\n\nMulti-LoRA Serving\n------------------\n\nNxD Inference supports serving with multiple LoRA adapters, and users can specify different LoRA adapters for their requests at runtime.\nIt also supports multi-LoRA serving with vLLM as the frontend.\nNxD Inference currently supports loading of LoRA adapters for dense model families, including Llama-2, Llama-3.1, Llama-3.2, Llama-3.3, TinyLlama, OpenLLaMA, Qwen2, and Qwen3.\nA current prerequisite is that the LoRA adapter checkpoints must be stored locally before the server is initialized and started.\n\nEnable multi-LoRA serving\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo enable multi-LoRA serving, provide a ``LoraServingConfig`` for the ``lora_config`` attribute in NeuronConfig.\n\n::\n\n    lora_config = LoraServingConfig(\n        max_loras=max_loras,\n        max_cpu_loras=max_cpu_loras,\n        batch_size=batch_size,\n        dynamic_multi_lora=dynamic_multi_lora,\n        base_model_quantized=quantized,\n        lora_ckpt_json=lora_ckpt_json,\n    )\n    neuron_config = NeuronConfig(lora_config=lora_config)\n\nRefer to :ref:`nxd-inference-api-guide` for more details of ``LoraServingConfig``.\n\nNxD Inference primarily supports the format of LoRA adapters from `Hugging Face PEFT <https://github.com/huggingface/peft>`__.\nEach checkpoint path is a folder that contains a checkpoint file (.safetensors, .bin, or .pt) and a configuration JSON file (.json).\nIn addition, NxD Inference also supports LoRA adapters trained with `NxD LoRA finetuning <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/lora_finetune_developer_guide.html>`__.\nEach such checkpoint path is a checkpoint file (.pt) that includes both the LoRA adapter weights and the configuration.
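\n\nFor illustration, the sketch below produces an adapter checkpoint in the Hugging Face PEFT folder format using the ``peft`` package. The base model path, LoRA rank, target modules, and output directory are placeholder assumptions; substitute values from your own fine-tuning setup.\n\n::\n\n    from peft import LoraConfig, get_peft_model\n    from transformers import AutoModelForCausalLM\n\n    base_model = AutoModelForCausalLM.from_pretrained(\"/home/ubuntu/models/open_llama_3b\")\n    lora_model = get_peft_model(base_model, LoraConfig(r=16, target_modules=[\"q_proj\", \"v_proj\"]))\n\n    # ... fine-tune lora_model ...\n\n    # Writes adapter_config.json and the adapter weights (for example, adapter_model.safetensors)\n    # into a local folder that can then be registered under a unique adapter ID for serving.\n    lora_model.save_pretrained(\"/home/ubuntu/lora_adapters/adapter_id_1\")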
\n\nNxD Inference assumes all the LoRA adapters for multi-LoRA serving are available locally during compilation, and their weights are loaded on Neuron devices during serving.\nWhen uploading a LoRA adapter checkpoint to NxDI for multi-LoRA serving, you must name the adapter with a unique adapter ID, such as ``adapter_id_1``, which is used to specify the LoRA adapter for serving at runtime and by NxDI for model compilation.\n\nThe maximum numbers of concurrent LoRA adapters in device memory and host memory for serving are specified by ``max_loras`` and ``max_cpu_loras``, respectively.\nWhen ``dynamic_multi_lora=False``, all the LoRA adapters must be fully pre-loaded into device memory before the serving process begins.\nDynamic multi-LoRA serving is enabled by ``dynamic_multi_lora=True``, which loads more LoRA adapters into host memory and dynamically swaps them from CPU to HBM at runtime according to user requests.\nNxD Inference can quantize the base model for multi-LoRA serving with ``base_model_quantized=True``.\nRefer to :ref:`nxd-inference-api-guide-neuron-config` for setting the quantization configurations.\nThe set of LoRA adapters is specified by ``lora_ckpt_json``, which is a JSON file describing the mapping between adapter IDs and the local paths of the LoRA adapter checkpoints.\nRefer to :ref:`nxd-inference-api-guide-neuron-config` for the JSON format.\nFor detailed examples of multi-LoRA serving in NxDI, see :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial.ipynb`.\n\nMaximum number of LoRA adapters supported in device memory\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe LoRA adapter is much smaller than the base model, but its weights still consume non-negligible on-device memory.\nThe maximum number of LoRA adapters that can be concurrently supported in device memory depends on the base model, the LoRA rank, the reserved HBM size for LoRA adapters, and how the LoRA adapters are sharded across TP groups.\n\nSuppose a Trainium instance is used for multi-LoRA serving and the reserved HBM size on each Neuron core for LoRA adapters is 2GB.\nEach LoRA adapter has two parts, LoRA A and LoRA B; only one of them can be partitioned with tensor parallelism, and the other is just a linear layer.\nWe analyze the maximum number of LoRA adapters supported in device memory under two cases: (1) the linear layer is duplicated, and (2) the linear layer is sharded.\nThese two cases can be specified by ``lora_shard_linear_layer`` in ``LoraServingConfig``.\n\nWhen the linear layer is duplicated\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe weight size of a LoRA adapter on each device is around half of the total LoRA adapter size in this case.\nWhen the base model is Llama3.1 8B, the LoRA adapter checkpoint size with LoRA rank 16 in BF16 is around 170MB.\nBecause ``2GB / (170MB / 2) = 23``, the maximum number of concurrent LoRA adapters is 23.\nWhen the base model is Llama3.3 70B, the LoRA adapter checkpoint size with LoRA rank 16 in BF16 is around 830MB and we can set ``max_loras=4``.\n\n
.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   - Model\n        - Reserved Memory size\n        - LoRA rank\n        - Maximum LoRAs\n\n    *   - Llama3.1 8B\n        - 2GB\n        - 16\n        - 23\n    *   - Llama3.1 8B\n        - 2GB\n        - 32\n        - 12\n    *   - Llama3.3 70B\n        - 2GB\n        - 16\n        - 4\n    *   - Llama3.3 70B\n        - 2GB\n        - 32\n        - 2\n\nWhen the linear layer is sharded\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this case, the linear layer in a LoRA adapter is sharded across Neuron cores in a TP group, at the cost of AllGather communication overhead.\nThe weight size of a LoRA adapter on each device is ``1/TP_DEGREE`` of the total LoRA adapter size.\n\n.. list-table::\n    :widths: auto\n    :header-rows: 1\n    :stub-columns: 1\n    :align: left\n\n    *   - Model\n        - Reserved Memory size\n        - LoRA rank\n        - TP degree\n        - Maximum LoRAs\n\n    *   - Llama3.1 8B\n        - 2GB\n        - 16\n        - 32\n        - 376\n    *   - Llama3.1 8B\n        - 2GB\n        - 32\n        - 32\n        - 188\n    *   - Llama3.3 70B\n        - 2GB\n        - 16\n        - 32\n        - 77\n    *   - Llama3.3 70B\n        - 2GB\n        - 32\n        - 32\n        - 38\n\n.. _nxdi_di_feature_guide:\n\nDisaggregated Inference [BETA]\n------------------------------\n\nDisaggregated Inference is an LLM serving architecture that separates the prefill and decode phases of inference onto different hardware resources.\nSeparating the compute-intensive prefill phase from the memory-bandwidth-intensive decode phase can improve the LLM serving experience by:\n\n1. Removing prefill interruptions to decode from continuous batching to reduce inter-token latency (ITL). These gains can be used to\n   achieve higher throughput by running with a higher decode batch size while staying under Service Level Objectives (SLO).\n\n2. Adapting to changing traffic patterns while still remaining under application SLOs.\n\n3. Enabling independent scaling of resources and parallelism strategies for prefill (compute bound) and decode (memory bound).\n\nSee the :ref:`Disaggregated Inference Developer Guide<nxdi-disaggregated-inference>` and the :ref:`Disaggregated Inference Tutorial<nxdi-disaggregated-inference-tutorial>` for more information.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/how-to-use-fpem.rst",
    "content": ".. meta::\n   :description: Learn how to use Pipeline Execution Mode to optimize performance for large models with multiple submodels using NxD Inference\n   :date_updated: 2025-09-19\n\n.. _how-to-use-fpem:\n\n=======================================================================\nHow to Use On-device Forward Pipeline Execution Mode for Optimization\n=======================================================================\n\nTask Overview\n-------------\n\nThis topic shows you how to use Pipeline Execution Mode to optimize performance for large models with multiple submodels using the NxD Inference. This technique keeps intermediate tensors from sub models on the device to reduce data transfer overhead and minimize model latency.\n\nIn this guide, you'll learn to:\n\n* Configure pipeline execution flags for optimal performance\n* Set up multi-stage model wrappers that communicate efficiently\n* Manage intermediate tensor placement between pipeline stages\n* Implement a simple vision-text pipeline as a practical example\n\nSample Architecture\n-------------------\n\nThis guide uses a vision-text multimodal model to demonstrate pipeline execution. The architecture consists of:\n\n**Vision Model**: Processes image inputs through convolutional layers and outputs vision embeddings\n\n**Text Model**: Takes vision embeddings and text inputs, then produces final classification results\n\nThis two-stage pipeline shows how intermediate vision embeddings can remain on the device, avoiding costly CPU transfers between model stages. The same principles apply to other multi-stage architectures like transformer decoder chains, diffusion model denoisers, or encoder-decoder pairs.\n\nPrerequisites\n-------------\n\n- **NeuronX Distributed Inference (NxDI)**: You must have NxDI installed and configured. See NxD Inference Setup Guide.\n- **Multi-stage model**: Your model should have intermediate tensors in a pipeline structure, such as Llama4-style models, Pixtral, or diffusion-based models.\n\nThe following diagram shows how intermediate tensors flow through a multi-stage pipeline::\n\n    Input Data\n        |\n        v\n    ┌─────────────┐\n    │   Stage 1   │  <- Vision Model (Conv2D + Pooling)\n    │ (SubModel)  │\n    └─────────────┘\n        |\n        v\n    Intermediate    <- Kept on device with pipeline_execution=True\n    Tensors            and return_ranked_to_cpu=False\n        |\n        v\n    ┌─────────────┐\n    │   Stage 2   │  <- Text Model (Embedding + Fusion)\n    │ (SubModel)  │\n    └─────────────┘\n        |\n        v\n    Final Output    <- Returned to CPU with return_ranked_to_cpu=True\n\nWithout pipeline execution, intermediate tensors transfer between CPU and device at each stage, creating overhead. With pipeline execution enabled, intermediate tensors remain on the device, reducing latency.\n\n.. note::\n\n   **Padding Requirements**: When passing outputs between ModelWrapper instances, you must manually pad the list of lists to ensure consistent input dimensions. Padding is crucial to maintain tensor compatibility across pipeline stages.\n\nInstructions\n------------\n\n**1:** Import required modules and define model classes\n\nStart by importing the necessary modules and defining your model architectures:\n\n.. 
code-block:: python\n\n    import torch\n    from torch import nn\n    from neuronx_distributed_inference.models.encoder_base import NeuronEncoderBase\n    from neuronx_distributed_inference.models.model_wrapper import ModelWrapper\n    from neuronx_distributed_inference.models.application_base import NeuronApplicationBase\n    from neuronx_distributed_inference.models.config import InferenceConfig, NeuronConfig\n\n    # Vision Model Definition\n    class VisionModel(NeuronEncoderBase):\n        def __init__(self, config: InferenceConfig):\n            super().__init__(config)\n            self.conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)\n            self.pool = nn.AdaptiveAvgPool2d((1, 1))\n            self.fc = nn.Linear(64, config.vision_embedding_size)\n\n        def forward(self, x):\n            x = self.conv(x)\n            x = self.pool(x)\n            x = torch.flatten(x, 1)\n            return self.fc(x)\n\n    # Text Model Definition\n    class TextModel(NeuronEncoderBase):\n        def __init__(self, config: InferenceConfig):\n            super().__init__(config)\n            self.embedding = nn.Linear(config.text_input_size, config.text_embedding_size)\n            self.fusion = nn.Linear(\n                config.vision_embedding_size + config.text_embedding_size,\n                config.output_size\n            )\n\n        def forward(self, vision_features, text_input):\n            text_features = self.embedding(text_input)\n            combined = torch.cat([vision_features, text_features], dim=1)\n            return self.fusion(combined)\n\n**2:** Configure ModelWrappers with pipeline execution flags\n\nSet up your ModelWrapper classes with appropriate pipeline execution parameters:\n\n.. code-block:: python\n\n    # Vision Model Wrapper - keeps output on device\n    class VisionModelWrapper(ModelWrapper):\n        def __init__(self, config: InferenceConfig):\n            super().__init__(\n                config=config,\n                model_cls=VisionModel,\n                pipeline_execution=True,\n                return_ranked_to_cpu=False,  # Keep output ranked for efficient pipeline\n                tag=\"vision_model\"\n            )\n\n        def input_generator(self):\n            # Generate sample input for compilation\n            x = torch.randn(\n                self.neuron_config.batch_size,\n                3,\n                224,\n                224\n            )\n            return [(x,)]\n\n    # Text Model Wrapper - returns final output to CPU\n    class TextModelWrapper(ModelWrapper):\n        def __init__(self, config: InferenceConfig):\n            super().__init__(\n                config=config,\n                model_cls=TextModel,\n                pipeline_execution=True,\n                return_ranked_to_cpu=True,  # Return final output to CPU\n                tag=\"text_model\"\n            )\n\n        def input_generator(self):\n            # Generate sample inputs for compilation\n            vision_features = torch.randn(\n                self.neuron_config.batch_size,\n                self.config.vision_embedding_size\n            )\n            text_input = torch.randn(\n                self.neuron_config.batch_size,\n                self.config.text_input_size\n            )\n            return [(vision_features, text_input)]\n\n**3:** Create application classes\n\nBuild application classes that use your configured ModelWrappers:\n\n.. 
code-block:: python\n\n    # Application Classes\n    class VisionModelApp(NeuronApplicationBase):\n        def __init__(self, model_path: str, config: InferenceConfig):\n            super().__init__(model_path=model_path, config=config)\n            self.model = VisionModelWrapper(config)\n            self.models.append(self.model)\n\n        def forward(self, x):\n            return self.models[0].forward(x)\n\n    class TextModelApp(NeuronApplicationBase):\n        def __init__(self, model_path: str, config: InferenceConfig):\n            super().__init__(model_path=model_path, config=config)\n            self.model = TextModelWrapper(config)\n            self.models.append(self.model)\n\n        def forward(self, vision_features, text_input):\n            return self.models[0].forward(vision_features, text_input)\n\n**4:** Run the complete pipeline example\n\nExecute your pipeline with the configured models:\n\n.. code-block:: python\n\n    def main():\n        # Configure models\n        config = InferenceConfig(\n            NeuronConfig(batch_size=32, torch_dtype=torch.float32, tp_degree=2),\n            vision_embedding_size=512,\n            text_input_size=256,\n            text_embedding_size=512,\n            output_size=1024\n        )\n\n        # Create applications\n        vision_app = VisionModelApp(\"path/to/vision/model\", config)\n        text_app = TextModelApp(\"path/to/text/model\", config)\n\n        # Compile models\n        vision_app.compile(\"path/to/compiled/vision\")\n        text_app.compile(\"path/to/compiled/text\")\n\n        # Load models\n        vision_app.load(\"path/to/compiled/vision\")\n        text_app.load(\"path/to/compiled/text\")\n\n        # Example inference\n        image_input = torch.randn(32, 3, 224, 224)\n        text_input = torch.randn(32, 256)\n\n        # Forward pass through vision model\n        # Returns ranked output (list of lists) since return_ranked_to_cpu=False\n        vision_features = vision_app.forward(image_input)\n\n        # Forward pass through text model\n        # Returns CPU tensor since return_ranked_to_cpu=True\n        final_output = text_app.forward(vision_features, text_input)\n\n        print(f\"Final output shape: {final_output.shape}\")  # [32, 1024]\n\nConfirm your work\n-----------------\n\nTo confirm you have successfully configured pipeline execution mode, check that your model outputs have the expected tensor placement:\n\n.. code-block:: python\n\n    # Check intermediate output placement\n    print(f\"Vision features type: {type(vision_features)}\")  # Should be list of lists\n    print(f\"Final output shape: {final_output.shape}\")       # Should be [32, 1024]\n    print(f\"Final output device: {final_output.device}\")     # Should be CPU\n\nCommon issues\n-------------\n\n.. rubric:: Tensor dimension mismatch between pipeline stages\n\n- **Possible solution**: Ensure you manually pad the list of lists when passing outputs between ModelWrapper instances to maintain consistent input dimensions.\n\n.. rubric:: Performance not improving with pipeline execution\n\n- **Possible solution**: Verify that your model has intermediate tensors in a pipeline structure. Pipeline execution works best with models like Llama4-style, Pixtral, or diffusion-based models.\n\n.. rubric:: Memory issues with large models\n\n- **Possible solution**: Adjust your batch size and tensor parallelism degree (tp_degree) in the NeuronConfig to better fit your available memory."
  },
  {
    "path": "libraries/nxd-inference/developer_guides/index.rst",
    "content": ".. meta::\n   :description: Developer guides for NxD Inference (neuronx-distributed-inference) on AWS Inferentia and AWS Trainium, covering model deployment, optimization, quantization, and integration with vLLM.\n   :keywords: AWS Neuron, NxD Inference, neuronx-distributed-inference, LLM inference, model deployment, AWS Inferentia, AWS Trainium, model optimization, quantization, vLLM integration\n   :author: AWS Neuron Team\n\n.. _nxdi-dev-ref-index:\n\nDeveloper Guides\n================\n\nComprehensive guides for using NxD Inference (neuronx-distributed-inference) to deploy and optimize machine learning models on AWS Inferentia and AWS Trainium accelerators. These guides cover model onboarding, performance optimization, quantization techniques, integration with vLLM, and other advanced features to help you maximize the performance of your models on AWS Neuron hardware.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Developer Guides\n    \n    Accuracy Evaluation </libraries/nxd-inference/developer_guides/accuracy-eval-with-datasets>\n    Custom Quantization </libraries/nxd-inference/developer_guides/custom-quantization>\n    Disaggregated Inference </libraries/nxd-inference/developer_guides/disaggregated-inference>\n    Feature Guide </libraries/nxd-inference/developer_guides/feature-guide>\n    Using Pipeline Execution Mode </libraries/nxd-inference/developer_guides/how-to-use-fpem>\n    LLM Benchmarking </libraries/nxd-inference/developer_guides/llm-inference-benchmarking-guide>\n    Migrate from TNX </libraries/nxd-inference/developer_guides/migrate-from-tnx-to-nxdi>\n    Model Reference </libraries/nxd-inference/developer_guides/model-reference>\n    MoE Architecture </libraries/nxd-inference/developer_guides/moe-arch-deep-dive>\n    Migrate from NxD Core </libraries/nxd-inference/developer_guides/nxd-examples-migration-guide>\n    Onboarding Models </libraries/nxd-inference/developer_guides/onboarding-models>\n    Performance Benchmarking CLI </libraries/nxd-inference/developer_guides/performance-cli-params>\n    vLLM Guide (Legacy) </libraries/nxd-inference/developer_guides/vllm-user-guide>\n    vLLM Guide v1 </libraries/nxd-inference/developer_guides/vllm-user-guide-v1>\n    Weights Sharding </libraries/nxd-inference/developer_guides/weights-sharding-guide>\n    Writing Tests </libraries/nxd-inference/developer_guides/writing-tests>\n\nUse the NxD Inference (``neuronx-distributed-inference``) Developer Guides to learn how to use NxD Inference.\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Accuracy Evaluation with Datasets\n        :link: /libraries/nxd-inference/developer_guides/accuracy-eval-with-datasets\n        :link-type: doc\n        \n        Guide for evaluating model accuracy using datasets to ensure model quality and performance.\n\n    .. grid-item-card:: Custom Quantization\n        :link: /libraries/nxd-inference/developer_guides/custom-quantization\n        :link-type: doc\n        \n        Guide for implementing custom quantization techniques to optimize model size and performance.\n\n    .. grid-item-card:: Disaggregated Inference\n        :link: /libraries/nxd-inference/developer_guides/disaggregated-inference\n        :link-type: doc\n        \n        Guide for using disaggregated inference architecture that separates prefill and decode phases for improved performance.\n\n    .. 
grid-item-card:: Feature Guide\n        :link: /libraries/nxd-inference/developer_guides/feature-guide\n        :link-type: doc\n        \n        Overview of NxD Inference features and configuration options for optimizing model deployment.\n\n    .. grid-item-card:: Using Pipeline Execution Mode\n        :link: /libraries/nxd-inference/developer_guides/how-to-use-fpem\n        :link-type: doc\n        \n        Guide for using on-device Forward Pipeline Execution Mode to optimize multi-stage models by keeping intermediate tensors on device.\n\n    .. grid-item-card:: LLM Inference Benchmarking Guide\n        :link: /libraries/nxd-inference/developer_guides/llm-inference-benchmarking-guide\n        :link-type: doc\n        \n        Guide for benchmarking LLM inference performance to optimize deployment configurations.\n\n    .. grid-item-card:: Migrate from TNX to NxDI\n        :link: /libraries/nxd-inference/developer_guides/migrate-from-tnx-to-nxdi\n        :link-type: doc\n        \n        Guide for migrating from Transformers NeuronX to NxD Inference with step-by-step instructions.\n\n    .. grid-item-card:: Model Reference\n        :link: /libraries/nxd-inference/developer_guides/model-reference\n        :link-type: doc\n        \n        Reference for production-ready models supported by NxD Inference and their configuration options.\n\n    .. grid-item-card:: MoE Architecture Deep Dive\n        :link: /libraries/nxd-inference/developer_guides/moe-arch-deep-dive\n        :link-type: doc\n        \n        Deep dive into Mixture of Experts (MoE) architecture implementation in NxD Inference.\n\n    .. grid-item-card:: NxD Examples Migration Guide\n        :link: /libraries/nxd-inference/developer_guides/nxd-examples-migration-guide\n        :link-type: doc\n        \n        Guide for migrating examples to NxD Inference from other frameworks or previous versions.\n\n    .. grid-item-card:: Onboarding Models\n        :link: /libraries/nxd-inference/developer_guides/onboarding-models\n        :link-type: doc\n        \n        Guide for onboarding new models to NxD Inference with detailed implementation steps.\n\n    .. grid-item-card:: Performance CLI Parameters\n        :link: /libraries/nxd-inference/developer_guides/performance-cli-params\n        :link-type: doc\n        \n        Guide for performance tuning using command-line interface parameters for optimal model execution.\n\n    .. grid-item-card:: vLLM User Guide (Legacy)\n        :link: /libraries/nxd-inference/developer_guides/vllm-user-guide\n        :link-type: doc\n        \n        Guide for using vLLM v0.x with NxD Inference (Legacy version) for LLM inference and serving.\n\n    .. grid-item-card:: vLLM User Guide v1\n        :link: /libraries/nxd-inference/developer_guides/vllm-user-guide-v1\n        :link-type: doc\n        \n        Guide for using vLLM v1.x with NxD Inference for efficient LLM inference and serving.\n\n    .. grid-item-card:: Weights Sharding Guide\n        :link: /libraries/nxd-inference/developer_guides/weights-sharding-guide\n        :link-type: doc\n        \n        Guide for implementing weights sharding to distribute model parameters across multiple devices.\n\n    .. grid-item-card:: Writing Tests\n        :link: /libraries/nxd-inference/developer_guides/writing-tests\n        :link-type: doc\n        \n        Guide for writing tests for NxD Inference models to ensure accuracy and performance.\n\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/llm-inference-benchmarking-guide.rst",
    "content": ".. _llm-inference-benchmarking:\n\nLLM Inference benchmarking guide\n================================\n\nThis guide gives an overview of the metrics that are tracked for LLM Inference and guidelines in using LLMPerf library\nto benchmark for LLM Inference.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n.. _llm_inference_metrics:\n\nLLM Inference metrics\n---------------------\nFollowing are the essential metrics for monitoring LLM Inference server performance.\n\n.. list-table::\n   :widths: 20 70 \n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - Metric\n     - Description\n\n   * - Time To First Token (TTFT) \n     - Average time taken for the LLM to process the prompt and output the first output token to the user. This is typically measured in milli seconds.\n  \n   * - Time per Output Token (TPOT) \n     - Average time taken for LLM to generate an output token for an inference request. This is typically measured in milli seconds. This metric is also referred as Inter Token Latency (ITL) or Per Token Latency(PTL)\n  \n   * - End-to-End Response Latency\n     - Time taken for the LLM to generate the entire response, including all output tokens. This metric is computed as  \n       end-to-end latency = (TTFT) + (TPOT) * (Number of output tokens).\n \n   * - Output Token Throughput\n     - Number of output tokens generated per second by the inference server across all concurrent users and requests.\n\n\n.. _llm_perf_patch_changes:\n\nUsing LLMPerf to benchmark LLM Inference performance\n----------------------------------------------------\n\n`LLMPerf <https://github.com/ray-project/llmperf>`_ is an open source library to benchmark LLM Inference performance. However, there are few changes that need to be applied to LLMPerf\nto accurately benchmark and reproduce the metrics that are published by Neuron.\n\n\nAll the changes outlined below are provided as a patch file. \n\n.. note::\n\n  Patches need to be applied in order because they might modify the same files.\n\nStep 1: Install LLMPerf from source\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n    python3 -m venv llmperf-env\n    source llmperf-env/bin/activate\n\n    git clone https://github.com/ray-project/llmperf.git ~/llmperf\n    cd ~/llmperf\n    pip install -e .\n\n\nStep 2: Patch custom Tokenizer and updated TPOT metric\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn public LLMPerf, ``hf-internal-testing`` tokenizer is used for all models which leads to incorrect\nperformance metrics due to counting more or less tokens than were actually processed by the model\non the server. Instead, we use the tokenizer of the model that is being benchmarked. \n\nLLMPerf includes TTFT in Time per Output Token(or Inter Token Latency) calculation. As TPOT and TTFT are two different metrics, a change is done to LLMPerf\nto exclude TTFT from TPOT calculation to keep it consistent with how other industry standard performance benchmarks are done.\n\nFollow these instructions to apply the patch to the LLMPerf library.\n\n* Download the ``neuron_perf.patch`` :download:`file </src/benchmark/helper_scripts/neuron_perf.patch>` into the ``llmperf`` directory. \n* Run ``git apply neuron_perf.patch``. 
\n\nStep 3: Patch data parallel benchmarking with multiple model endpoints\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo measure performance with data parallel inference using multiple model copies,\nwe allow users to provide multiple semicolon-separated endpoints via ``OPENAI_API_BASE``\n(e.g. \"export OPENAI_API_BASE=http://server1;http://server2;http://server3\") for\nthe OpenAI chat completion client. By default, the patch uses round-robin to route\nrequests.\n\n* Download the ``llmperf_dp.patch`` :download:`file </src/benchmark/helper_scripts/llmperf_dp.patch>` into the ``llmperf`` directory.\n* Run ``git apply llmperf_dp.patch``. Confirm changes with ``git diff``.\n\n\nStep 4: Patch reasoning model support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo measure LLM Inference performance of reasoning models, we need to patch LLMPerf to measure TTFT up to the\nfirst reasoning token instead of the first answer token.\n\n* Download the ``llmperf_reasoning.patch`` :download:`file </src/benchmark/helper_scripts/llmperf_reasoning.patch>` into the ``llmperf`` directory.\n* Run ``git apply llmperf_reasoning.patch``. Confirm changes with ``git diff``.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/migrate-from-tnx-to-nxdi.rst",
    "content": ".. _nxdi_migrate_from_tnx:\n\n\nMigrating from Transformers NeuronX to  NeuronX Distributed(NxD) Inference\n==========================================================================\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nFor customers who are currently using Transformers NeuronX, this migration guide explains the steps involved in\nmigrating from Transformers NeuronX to NxD Inference library.  \n\n\nHow is writing modeling code different in NxD Inference?\n---------------------------------------------------------\n\nIn Transformers NeuronX, you write modeling code in HLO format using a Python HLO interface. In NeuronX Distributed Inference, you write modeling code in native PyTorch and Python, and the library converts it to HLO for you. \nThis change makes it easier to develop models to run on Neuron, because you can start from existing Pytorch or Python modeling code.\n\n\nHow can I migrate from Transformers NeuronX to use NxD Inference with vLLM?\n----------------------------------------------------------------------------\n\nTransformers NeuronX library currently supports Llama and Mistral model architectures with vLLM integration. If you are using one of these models, like Llama 3.1, Llama 3, Llama 2, or Mistral-7b-V2, you can migrate to use NxD Inference library with vLLM using the following steps:\n\n\nUpdate Environment Variable to force vLLM to use NxD Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nAs vLLM currently supports both Transformers NeuronX and NeuronX Distributed Inference libraries for the Llama and Mistral models, you need to update the following environment variable in the inference scripts to force vLLM to use NxD Inference.\n\n.. code:: \n\n    # Force vLLM framework to use neuronx-distributed-inference\n    os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n\n\nCompiling and loading the model\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTransformers NeuronX uses Neuron Persistent Cache to load a pre-compiled model so that there is no additional delay in compilation when loading the model on vLLM.  NxD Inference currently does not support Neuron Persistent Cache but provides the following way to load a pre-compiled model in NeuronX Distributed Inference.\n\nFor production use cases where customer wants to avoid compiling the model in NxD Inference for the first time, users can set the environment variable ``NEURON_COMPILED_ARTIFACTS`` which points to pre-compiled artifacts directory to avoid the compilation time. If the artifacts are not present within the specified directory, then compilation of the model would be triggered as a fallback mechanism and will store the artifacts by default in ``neuron-compiled-artifacts/{unique_hash}/``\n\n\nFeatures currently not supported in NxD Inference through vLLM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNxD Inference doesn't yet support the following features that TNx supports in vLLM integration.\n\n* Multi-Node Inference\n* Persistent Cache\n* concurrency > 1 support for speculation\n\nUsers can use exactly the same set of parameters to test out vLLM with NxD Inference library as they specify with Transformers NeuronX with the exception of ``override_neuron_config`` . Both Transformers NeuronX and NxD Inference allows overriding available NeuronConfig, but not all NeuronConfig parameters that are available with Transformers NeuronX are still valid/applicable in NxD Inference. 
\nRefer to the :ref:`neuron_config_migration_tnx_nxdi` to migrate your ``override_neuron_config`` params from Transformers NeuronX to NxD Inference.\n\nSerialization support\n----------------------\n\nIn both libraries, you serialize the compiled model, so you can use the model in subsequent runs without compiling it each time.\n\nIn Transformers NeuronX, the save function does not serialize sharded weights by default, and you can enable this functionality with the ``sharded_weights`` flag. In NeuronX Distributed Inference, the ``compile`` function serializes sharded weights by default, and you can disable this functionality with the ``save_sharded_checkpoint`` flag in ``NeuronConfig``.\n\nTransformers NeuronX\n^^^^^^^^^^^^^^^^^^^^\n\n.. code::\n\n    # Create and compile the Neuron model\n    neuron_config = NeuronConfig()\n    model_neuron = LlamaForSampling.from_pretrained(\n        'openlm-research/open_llama_3b',\n        batch_size=1,\n        tp_degree=8,\n        n_positions=128,\n        neuron_config=neuron_config\n    )\n\n    # Compile the model.\n    model_neuron.to_neuron()\n\n    # Save the presharded weights and compiled artifacts to a directory.\n    model_neuron.save('llama-artifacts', sharded_weights=True)\n\nNeuronX Distributed Inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code::\n\n    model_path = \"/home/ubuntu/models/open_llama_3b\"\n    compiled_model_path = \"/home/ubuntu/compiled_models/open_llama_3b\"\n\n    neuron_config = NeuronConfig(\n        batch_size=1,\n        tp_degree=8,\n        seq_len=128\n    )\n\n    config = LlamaInferenceConfig(\n        neuron_config,\n        load_config=load_pretrained_config(model_path)\n    )\n\n    model = NeuronLlamaForCausalLM(model_path, config)\n\n    # Compile the model, shard the weights, and save to the given path.\n    model.compile(compiled_model_path)\n\nModels supported in Transformers NeuronX and NxD Inference model hubs\n----------------------------------------------------------------------\n\nThe following table lists the models currently supported by TNx and their status in the NxD Inference library.\n
For a more detailed list of models currently supported in NeuronX Distributed Inference, please refer to :ref:`NxD Inference model hub guide <nxdi-model-reference>`\n\n\n\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Model                      | Transformers NeuronX (TNx)               | NxD Inference (NxDI)                              |\n+                            +--------------------+---------------------+------------------+--------------------------------+\n|                            | supported in TNx   | vLLM Support (TNx)  | supported in NxDI| vLLM Support (NxD Inference)   |\n+============================+====================+=====================+==================+================================+\n| BLOOM                      | Yes                | No                  | No               | No                             |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| GPT2                       | Yes                | No                  | No               | No                             |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| GPT-J                      | Yes                | No                  | No               | No                             |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| GPT-Neox                   | Yes                | No                  | No               | No                             |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Llama 2                    | Yes                | Yes                 | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Llama 3                    | Yes                | Yes                 | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Llama 3.1                  | Yes                | Yes                 | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Llama 3.2 (1B and 3B)      | Yes                | Yes                 | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Llama 3.2 (11B and 90B)    | No                 | No                  | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Mistral-V2                 | Yes                | Yes                 | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| Mixtral                    | Yes                | No                  | Yes              | Yes                            
|\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n| DBRX                       | No                 | No                  | Yes              | Yes                            |\n+----------------------------+--------------------+---------------------+------------------+--------------------------------+\n\n\n\n\nOnboarding custom or private models with NxD Inference\n-------------------------------------------------------\n\nIf you need model support for one of the models not currently supported in NxD Inference or if you have a private model that you currently implemented support in Transformers Neuronx,\nyou need to implement the model using NxD Inference library.  You can use the :ref:`nxdi-onboarding-models` guide.\n\n.. _neuron_config_migration_tnx_nxdi:\n\nNeuron Config Migration\n-----------------------\n\nThere are differences in Neuron Config parameters in Transformers NeuronX and :ref:`NxD Inference <nxd-inference-api-guide-neuron-config>` libraries.  \nIf you use TNx directly without vLLM, or if you use the ``override_neuron_config`` param in vLLM with TNx, then you must update config parameters according to the following table.\n\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 30 40\n\n   * - Transformers NeuronX parameter\n     - NxD Inference parameter\n     - Notes\n   * - sparse_attn\n     - N/A\n     - \n   * - quant.quant_dtype\n     - quantization_dtype\n     - To use quantization, set ``quantized`` to True, and provide the ``quantized_checkpoints_path`` where the quantized model is stored (or will be stored).\n   * - quant.dequant_dtype\n     - torch_dtype\n     - NxD Inference uses the inference dtype as the dequant dtype.\n   * - quant.quantize_method\n     - quantization_type\n     - \n   * - quant.quantize_attn\n     - N/A\n     - \n   * - quant.no_quantize_list\n     - N/A\n     - \n   * - kv_cache_quant.quant_dtype\n     - N/A\n     - NxD Inference uses FP8 (torch.float8_e4m3fn) for KV cache quantization. 
To use KV cache quantization, set ``kv_cache_quant`` to True.\n   * - kv_cache_quant.dequant_dtype\n     - torch_dtype\n     - NxD Inference uses the inference dtype as the dequant dtype.\n   * - kv_cache_quant.quantize_method\n     - N/A\n     - NxD Inference uses direct cast.\n   * - continuous_batching.max_num_seqs\n     - max_batch_size\n     - To use continuous batching, set ``is_continous_batching`` to True, and set ``tkg_batch_size`` to the max batch size.\n   * - continuous_batching.max_model_len\n     - seq_len\n     - \n   * - continuous_batching.optimized_paged_attention\n     - N/A\n     - \n   * - continuous_batching.block_size\n     - N/A\n     - \n   * - continuous_batching.num_blocks\n     - N/A\n     - \n   * - attention_layout\n     - N/A\n     - NxD Inference uses BHSD layout.\n   * - collectives_layout\n     - N/A\n     - NxD Inference uses BHSD layout.\n   * - cache_layout\n     - N/A\n     - NxD Inference uses BHSD layout.\n   * - padding_side\n     - padding_side\n     - NxD Inference defaults to padding on the right side.\n   * - group_query_attention\n     - N/A\n     - \n   * - sequence_parallel_norm\n     - sequence_parallel_enabled\n     - \n   * - sequence_parallel_norm_threshold\n     - N/A\n     - \n   * - bf16_rms_norm\n     - N/A\n     - NxD Inference upcasts RMS norm inputs to fp32.\n   * - on_device_embedding\n     - N/A\n     - \n   * - on_device_generation\n     - on_device_sampling_config\n     - \n   * - on_device_generation.max_length\n     - seq_len\n     - NxD Inference uses the model's sequence length.\n   * - on_device_generation.do_sample\n     - on_device_sampling_config.do_sample\n     - \n   * - on_device_generation.top_k\n     - on_device_sampling_config.top_k\n     - NxD Inference supports top_k through dynamic sampling. Pass the top_k values to the model inputs.\n   * - on_device_generation.top_p\n     - N/A\n     - NxD Inference supports top_p through dynamic sampling. Pass the top_p values to the model inputs.\n   * - on_device_generation.temperature\n     - N/A\n     - NxD Inference supports temperature through dynamic sampling. 
Pass the temperature values to the model inputs.\n   * - on_device_generation.top_p_min_tokens\n     - N/A\n     - NxD Inference defaults to a minimum of 1 token.\n   * - on_device_generation.global_top_k\n     - on_device_sampling_config.global_topk\n     - \n   * - on_device_generation.eos_token_id\n     - N/A\n     - NxD Inference sampling treats EOS like any other token.\n   * - on_device_generation.dynamic\n     - on_device_sampling_config.dynamic\n     - \n   * - on_device_generation.deterministic\n     - on_device_sampling_config.deterministic\n     - \n   * - on_device_generation.per_batch_line\n     - N/A\n     - \n   * - all_reduce_dtype\n     - rpl_reduce_dtype\n     - NxD Inference applies this dtype to only the all_reduce in attention's ``o_proj`` layer.\n   * - cast_logits_dtype\n     - N/A\n     - \n   * - fuse_qkv\n     - fused_qkv\n     - \n   * - qkv_tiling\n     - N/A\n     - \n   * - weight_tiling\n     - N/A\n     - \n   * - mlp_in_weight_tiling_permute_order\n     - N/A\n     - \n   * - mlp_out_weight_tiling_permute_order\n     - N/A\n     - \n   * - mlp_out_weight_transpose\n     - N/A\n     - \n   * - log_softmax_scores\n     - N/A\n     - \n   * - shard_over_sequence\n     - flash_decoding_enabled\n     - \n   * - duplicate_q_weight_sos\n     - N/A\n     - \n   * - output_all_logits\n     - N/A\n     - \n   * - fused_rmsnorm_qkv\n     - qkv_kernel_enabled\n     - \n   * - fused_rmsnorm_mlp\n     - mlp_kernel_enabled\n     - \n   * - attn_output_transposed\n     - N/A\n     - \n   * - compilation_worker_count\n     - N/A\n     - \n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/model-reference.rst",
    "content": ".. _nxdi-model-reference:\n\nNxD Inference - Production Ready Models\n=======================================\n\nNeuronx Distributed Inference provides production ready models that you can\ndirectly use for seamless deployment. You can view the source code for all\nsupported models in the `NxD Inference GitHub repository <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models>`__. \n\n.. note:: \n   \n   If you are looking to deploy a custom model integration, you can follow the\n   :ref:`model onboarding guide <nxdi-onboarding-models>`. You can refer to the source\n   code for supported models in the `NxD Inference GitHub repository <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models>`__\n   and make custom changes required for your use case.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nUsing Models to Run Inference\n-----------------------------\n\nYou can run models through vLLM or integrate directly with NxD\nInference.\n\nUsing vLLM\n~~~~~~~~~~\n\nIf you are using vLLM for production deployment, we recommend that you\nuse the vLLM API to integrate with NxD Inference. The vLLM API automatically\nchooses the correct model and config classes based on the model's config file.\nFor more information, refer to the :ref:`nxdi-vllm-user-guide-v1`.\n\nIntegrating Directly with NxD Inference\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo use NxD Inference directly, you construct model and configuration\nclasses. For more information about which model and configuration classes to use for each\nmodel, see :ref:`nxdi-supported-model-architectures`. To see an example of how to\nrun inference directly with NxD Inference, see the `generation_demo.py\nscript <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/examples/generation_demo.py>`__.\n\n.. _nxdi-supported-model-architectures:\n\nSupported Model Architectures\n-----------------------------\n\nNxD Inference currently provides support for the following model\narchitectures.\n\nLlama (Text)\n~~~~~~~~~~~~\n\nNxD Inference supports Llama text models. The Llama model architecture\nsupports all Llama text models, including Llama 2, Llama 3, Llama 3.1,\nLlama 3.2, and Llama 3.3. You can also use the Llama model architecture\nto run any model based on Llama, such as Mistral.\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: NeuronConfig\n- Inference config class: LlamaInferenceConfig\n- Causal LM model class: NeuronLlamaForCausalLM\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct (requires\n  Trn2)\n- https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct\n- https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct\n- https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct\n- https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3\n\n----\n\nLlama 4\n~~~~~~~~\n\nNxD Inference supports Llama 4 models, including both Scout and Maverick checkpoints.\nYou can use Hugging Face checkpoints. Both checkpoints leverage early fusion for native multimodality,\nenabling them to process text and image inputs. For more information\nabout how to run Llama 4 inference, see :ref:`/libraries/nxd-inference/tutorials/llama4-tutorial.ipynb`.\n\n.. 
_neuron-classes-1:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: Llama4NeuronConfig\n- Inference config class: Llama4InferenceConfig\n- Causal LM model class: NeuronLlama4ForCausalLM\n\n.. _compatible-checkpoint-examples-1:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct\n- https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct\n\n----\n\nMixtral\n~~~~~~~\n\nNxD Inference supports models based on the Mixtral model architecture,\nwhich uses mixture-of-experts (MoE) architecture.\n\n.. _neuron-classes-2:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: MoENeuronConfig\n- Inference config class: MixtralInferenceConfig\n- Causal LM model class: NeuronMixtralForCausalLM\n\n.. _compatible-checkpoint-examples-2:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1\n\n----\n\nDBRX\n~~~~\n\nNxD Inference supports models based on the DBRX model architecture,\nwhich uses mixture-of-experts (MoE) architecture.\n\n.. _neuron-classes-3:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: DbrxNeuronConfig\n- Inference config class: DbrxInferenceConfig\n- Causal LM model class: NeuronDbrxForCausalLM\n\n.. _compatible-checkpoint-examples-3:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/databricks/dbrx-instruct\n\nQwen2.5\n~~~~~~~~\n\nNxD Inference supports models based on the Qwen2.5 model architecture.\n\n----\n\n.. _neuron-classes-4:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: Qwen2NeuronConfig\n- Inference config class: Qwen2InferenceConfig\n- Causal LM model class: NeuronQwen2ForCausalLM\n\n.. _compatible-checkpoint-examples-4:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/Qwen/Qwen2.5-72B-Instruct\n- https://huggingface.co/Qwen/Qwen2.5-32B-Instruct\n- https://huggingface.co/Qwen/Qwen2.5-14B-Instruct (Not tested, but expected to work out of the box)\n- https://huggingface.co/Qwen/Qwen2.5-7B-Instruct\n- https://huggingface.co/Qwen/Qwen2.5-3B-Instruct (Not tested, but expected to work out of the box)\n- https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct (Not tested, but expected to work out of the box)\n- https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct\n\n----\n\nQwen3\n~~~~~~\n\nNxD Inference supports models based on the Qwen3 model architecture.\n\n.. _neuron-classes-5:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: Qwen3NeuronConfig\n- Inference config class: Qwen3InferenceConfig\n- Causal LM model class: NeuronQwen3ForCausalLM\n\n.. _compatible-checkpoint-examples-5:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/Qwen/Qwen3-0.6B\n- https://huggingface.co/Qwen/Qwen3-1.7B\n- https://huggingface.co/Qwen/Qwen3-4B\n- https://huggingface.co/Qwen/Qwen3-8B\n- https://huggingface.co/Qwen/Qwen3-14B\n- https://huggingface.co/Qwen/Qwen3-32B\n\n----\n\nQwen3 MoE\n~~~~~~~~~~\n\nNxD Inference supports Qwen3 MoE language model which supports multilingual text inputs.\n\n.. _neuron-classes-6:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: MoENeuronConfig\n- Inference config class: Qwen3MoeInferenceConfig\n- Causal LM model class: NeuronQwen3MoeForCausalLM\n\n.. 
_compatible-checkpoint-examples-6:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/Qwen/Qwen3-235B-A22B\n\n----\n\nFLUX.1 [BETA]\n~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports FLUX.1-dev model checkpoint for text to image generation.\nYou can use Hugging Face checkpoints. For more information\nabout how to run FLUX.1-dev inference, see :ref:`/libraries/nxd-inference/tutorials/flux-inference-tutorial.ipynb`.\n\n.. _neuron-classes-7:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Flux Application class: NeuronFluxApplication\n- Flux Pipeline class: NeuronFluxPipeline\n- Flux Backbone Neuron config class: FluxBackboneInferenceConfig\n\n.. _compatible-checkpoint-examples-7:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/black-forest-labs/FLUX.1-dev\n\n----\n\nPixtral-Large-Instruct-2411\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports Pixtral image understanding model which processes text and image inputs. You can use HuggingFace checkpoint.\n\n.. _neuron-classes-8:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: NeuronConfig\n- Inference config class: PixtralInferenceConfig\n- Causal LM model class: NeuronPixtralForCausalLM\n\n.. _compatible-checkpoint-examples-8:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411\n\n----\n\nQwen2-VL-7B-Instruct (Dense)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports models based on the Qwen2-VL-7B-Instruct (Dense) model architecture.\n\n.. _neuron-classes-9:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: Qwen2VLNeuronConfig\n- Inference config class: Qwen2VLInferenceConfig\n- Causal LM model class: NeuronQwen2VLForCausalLM\n\n.. _compatible-checkpoint-examples-9:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct\n\n----\n\nQwen3-VL-8B-Thinking (Dense)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports models based on the Qwen3-VL-8B-Thinking (Dense) model architecture.\n\n.. _neuron-classes-10:\n\nNeuron Classes\n^^^^^^^^^^^^^^\n\n- Neuron config class: Qwen3VLNeuronConfig\n- Inference config class: Qwen3VLInferenceConfig\n- Causal LM model class: NeuronQwen3VLForCausalLM\n\n.. _compatible-checkpoint-examples-10:\n\nCompatible Checkpoint Examples\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/moe-arch-deep-dive.rst",
    "content": ".. meta::\n   :description: Deep dive into MoE architecture support in NxD Inference\n   :date_updated: 12/01/2025\n\n.. _moe-inference-deep-dive:\n\n================================================================================\nDeep dive: Explore Mixture of Experts (MoE) inference support for Neuron\n================================================================================\n\n**Why read this guide?** This guide is intended for ML engineers looking to\nimplement custom MoE models or implement advanced performance optimizations on Neuron.\nIt explains how each MoE component maps to Neuron hardware and how to combine router, expert, and parallelism\nsettings to extract maximium performance during the prefill and decode phases of MoE model inference.\n\n**How to use this guide:** If you are looking to deploy existing MoE models with vLLM,\nrefer to the :doc:`vLLM user guide <vllm-user-guide-v1>` instead.\nSkip to the :ref:`optimization sections <moe-prefill-optimization>` if you already know NxD basics.\n\nThis topic explores Mixture of Experts (MoE) inference in depth. It discusses the\ntechnical details from an AWS Neuron expert perspective. You need experience\nwith model sharding concepts like Tensor Parallelism and performance tuning on Neuron\nusing Neuron Kernel Interface (NKI) to fully understand this content.\n\nPrerequisites\n-------------\n\nBefore you start, you must be familiar with the following:\n\n- **NxD Inference library overview:** How to build and deploy models\n  using NxD Inference. See :doc:`../index`.\n- **Neuron Kernel Interface (NKI):** Performance optimization techniques\n  using NKI for custom kernel development. See :doc:`/nki/index`.\n- **Model parallelism techniques:** Tensor parallelism and other\n  distributed inference strategies. See :doc:`../app-notes/parallelism`.\n\nOverview\n--------\n\nMixture of Experts (MoE) is a neural network architecture that scales\nto massive parameter counts while maintaining computational efficiency. An\nMoE layer replaces a traditional dense feedforward network with multiple specialized\n\"expert\" networks. Only a subset of experts are activated per token.\nEach input token is processed by only the top-k most relevant\nexperts (typically k=1-8), as determined by a learned router. This selective activation\nallows models to have billions of parameters while computing only a fraction of them.\nThis breaks the linear relationship between model size and computational cost.\nDue to its computational benefits,\nthe MoE architecture has gained significant adoption across the industry.\nRecent models like GPT-OSS, Llama4, DeepSeek-V3, and Qwen3-MoE all use MoE.\n\n.. image:: /images/deep-dives/moe-arch/moe-architecture-overview.png\n   :alt: MoE layer architecture showing input tokens, router, expert selection, and output combination\n   :align: center\n   :width: 80%\n\nImplementing MoE models to extract peak performance on Neuron hardware requires careful\ndesign. This is due to the dynamic nature of expert selection, which creates variable\ncomputational graphs. These must be handled within Neuron's static compilation model.\nExpert routing decisions vary per iteration. This causes different number of tokens to be\nassigned to each expert. This requires algorithms like the blockwise\nmatrix multiplication approach to maintain static tensor shapes while minimizing padding\noverhead. 
Additionally, MoE models require careful consideration of tensor parallelism\n(TP), expert parallelism (EP), and sequence parallelism (SP) strategies. The\noptimal approach depends on expert size, sparsity patterns, and whether the workload is\ncompute-bound (prefill) or memory-bound (decode). These topics form the focus of this deep dive.\n\nAnatomy of an MoE layer and MoE API in NxDI\n--------------------------------------------\n\nAn MoE layer consists of three main components: a router that determines expert selection,\nexpert MLPs that perform the actual computation, and optional shared experts that\nprocess all tokens.\nThe NxD Inference library provides a comprehensive set of APIs for building MoE layers\nthat mirrors this conceptual structure.\n\nMoE Layer Structure\n~~~~~~~~~~~~~~~~~~~\n\nThe ``MoE`` class in NeuronxDistributed serves as the main orchestrator. It combines the\nthree core components into a unified layer. The data flow implements a clear pattern:\ninput tokens first pass through the router to determine expert assignments, then through\nthe selected expert MLPs for computation, and finally through optional shared experts\nbefore output combination. This modular design allows you to flexibly configure and\nbuild different MoE model architectures. You also benefit from optimizations in the\nNeuron SDK to optimize MoE model performance.\n\n**Expert combine** is an operation where outputs\nfrom multiple experts are weighted and combined to produce the final token\nrepresentation. For each token processed by top-k experts, the router produces affinity\nscores that determine how much each expert's output contributes to the final result.\nMathematically, for a token processed by experts :math:`E_1, E_2, ..., E_k` with corresponding\naffinities :math:`a_1, a_2, ..., a_k`, the final output is computed as:\n\n.. math::\n\n   \\mathrm{output\\_token} = \\sum_{i=1}^{k} a_i \\times E_i(\\text{token})\n\nwhere:\n\n- :math:`E_i(\\text{token})` is the output of expert :math:`i` for the given token\n- :math:`a_i` is the affinity score for expert :math:`i`\n- :math:`k` is the number of selected experts (top_k)\n\nThis weighted combination ensures that experts with higher routing confidence contribute\nmore significantly to the final output. The affinity normalization (controlled by\n``normalize_top_k_affinities``) ensures that :math:`\\sum_{i=1}^{k} a_i = 1.0` across the selected\nexperts for each token. The NxD framework handles this expert combination logic internally,\nalong with routing and static compilation optimizations.\n\nBelow is an example of how to instantiate the MoE API:\n\n.. 
code-block:: python\n\n   from neuronx_distributed.modules.moe import MoE, routing\n   from neuronx_distributed.modules.moe.expert_mlps_v2 import ExpertMLPsV2\n   from neuronx_distributed.modules.moe.moe_configs import (\n       RoutedExpertsMLPOpsConfig,\n       BlockwiseMatmulConfig\n   )\n   from neuronx_distributed.modules.moe.shared_experts import SharedExperts\n\n   # Example: GPT-OSS MoE layer configuration\n   num_experts = 128\n   top_k = 8\n   hidden_size = 7168\n   intermediate_size = 2048\n\n   # Initialize router for expert selection\n   router = routing.RouterTopK(\n       num_experts=num_experts,\n       top_k=top_k,\n       hidden_size=hidden_size,\n   )\n\n   # Configure expert MLPs using ExpertMLPsV2 class\n   routed_experts_config = RoutedExpertsMLPOpsConfig(\n       num_experts=num_experts,\n       top_k=top_k,\n       hidden_size=hidden_size,\n       intermediate_size=intermediate_size,\n       hidden_act=\"silu\",\n       glu_mlp=True,\n       capacity_factor=None,  # Full capacity, no token dropping\n       normalize_top_k_affinities=True,\n   )\n\n   # These configs relate to the blockwise matrix multiply (BWMM) algorithm,\n   # which enables static compilation by organizing tokens into fixed-size blocks\n   # assigned to experts. BWMM tuning parameters are covered in detail later.\n   blockwise_config = BlockwiseMatmulConfig.from_kwargs(\n       block_size=512,\n       logical_nc_config=2,  # Use LNC2 for Trn2\n   )\n\n   expert_mlps = ExpertMLPsV2(\n       routed_experts_mlp_config=routed_experts_config,\n       blockwise_matmul_config=blockwise_config,\n       sequence_parallel_enabled=True,\n   )\n\n   # Create complete MoE layer\n   moe_layer = MoE(\n       router=router,\n       expert_mlps=expert_mlps,\n       sequence_parallel_enabled=True,\n   )\n\nRouter\n~~~~~~\n\nThe router component determines which experts compute each token through routing\ndecisions learned during model training. NxD Inference supports multiple routing\nstrategies, each optimized for different model architectures. The ``RouterBase``\nclass provides interfaces for inputs and outputs that the MoE module expects. Specialized implementations offer distinct\nrouting behaviors.\n\nThe ``RouterTopK`` implementation available for use out of the box in NxD inference provides standard top-k expert selection, making it\nsuitable for most MoE models including GPT-OSS, Llama4, and Qwen-3 Moe. It supports\nboth softmax and sigmoid activation functions for computing token to expert affinities:\n\n.. code-block:: python\n\n   # Standard top-k routing (used in GPT-OSS, DBRX)\n   router = routing.RouterTopK(\n       num_experts=128,\n       top_k=8,\n       hidden_size=7168,\n       act_fn=\"softmax\",  # or \"sigmoid\"\n       sequence_parallel_enabled=True,\n   )\n\nThe ``GroupLimitedRouter`` is another built-in routing API that implements the no-auxiliary-loss method from DeepSeek-V3,\nwhich groups experts and selects top groups before performing top-k selection within\nthose groups:\n\n.. code-block:: python\n\n   # Setting up Group-limited routing (DeepSeek-V3 style)\n   router = routing.GroupLimitedRouter(\n       num_experts=256,\n       top_k=8,\n       hidden_size=7168,\n       n_group=8,  # Number of expert groups\n       topk_group=2,  # Top groups to select\n   )\n\nRouted Experts\n~~~~~~~~~~~~~~\n\nThe ``ExpertMLPsV2`` class handles the core routed expert computation. It computes tokens through\ntheir assigned experts. 
This class contains implementations of the experts matrix\nmultiplication that are optimized depending on whether the workload is compute-bound\nor memory-bound. It automatically selects the appropriate strategy based on sequence\nlength, batch size and other architectural parameters.\n\nThe V2 API provides a configuration-based approach with ``RoutedExpertsMLPOpsConfig``\nfor expert-specific settings to implement different MoE architectures\nand ``BlockwiseMatmulConfig`` for optimization parameters.\nThis separation provides cleaner configuration management and better extensibility:\n\n.. code-block:: python\n\n   # GPT-OSS Expert MLPs configuration\n   routed_experts_config = RoutedExpertsMLPOpsConfig(\n       num_experts=128,\n       top_k=8,\n       hidden_size=7168,\n       intermediate_size=2048,\n       hidden_act=\"swiglu\",\n       glu_mlp=True,\n       capacity_factor=None,  # Full capacity, no token dropping\n       normalize_top_k_affinities=True,\n   )\n\n   # Configuration parameters for the BWMM algorithm, which are explained later.\n   blockwise_config = BlockwiseMatmulConfig.from_kwargs(\n       block_size=512,\n       logical_nc_config=2,  # Use LNC2 for Trn2\n       skip_dma_token=True,  # Skip loading padded tokens\n       skip_dma_weight=True,  # Skip duplicate weight loads\n   )\n\n   expert_mlps = ExpertMLPsV2(\n       routed_experts_mlp_config=routed_experts_config,\n       blockwise_matmul_config=blockwise_config,\n       sequence_parallel_enabled=True,\n   )\n\n\nNxD Inference supports both dropping and dropless MoE strategies. Each has different\ntrade-offs between computational efficiency and model accuracy. The choice between these\nstrategies is controlled by the ``capacity_factor`` parameter in the expert configuration.\n\n**Dropless MoE** (``capacity_factor=None``) processes all tokens through their assigned\nexperts without dropping any tokens. This approach maintains full model accuracy but\nrequires dynamic handling of variable expert loads. Models using dropless strategies\ninclude GPT-OSS, Llama4, and DBRX. The blockwise matrix multiplication\nalgorithm enables efficient dropless computation by organizing tokens into fixed-size\nblocks while minimizing padding overhead:\n\n.. code-block:: python\n\n   # Dropless MoE configuration (recommended for inference)\n   routed_experts_config = RoutedExpertsMLPOpsConfig(\n       num_experts=128,\n       top_k=8,\n       hidden_size=7168,\n       intermediate_size=2048,\n       hidden_act=\"swiglu\",\n       glu_mlp=True,\n       capacity_factor=None,  # Dropless - no tokens dropped\n       normalize_top_k_affinities=True,\n   )\n\n**Dropping MoE** (``capacity_factor > 0``) sets a fixed capacity for each expert and\ndrops tokens that exceed this capacity. This approach provides more predictable\ncomputational costs but may impact model accuracy due to dropped tokens. Models using\ndropping strategies include DeepSeek-V3:\n\n.. 
code-block:: python\n\n   # Dropping MoE configuration with 25% extra capacity\n   routed_experts_config = RoutedExpertsMLPOpsConfig(\n       num_experts=128,\n       top_k=8,\n       hidden_size=2880,\n       intermediate_size=2880,\n       hidden_act=\"swiglu\",\n       glu_mlp=True,\n       capacity_factor=1.25,  # 25% extra capacity beyond perfect balance\n       normalize_top_k_affinities=True,\n   )\n\n**Parallelism Strategies for Routed Experts**\n\nMoE models on Neuron hardware benefit from two primary parallelism strategies that can\nbe used independently or in combination to optimize performance and memory usage:\n\n.. image:: /images/deep-dives/moe-arch/moe-parallelism-strategies.png\n   :alt: MoE parallelism strategies showing data flow for Tensor Parallelism vs Expert Parallelism\n   :align: center\n   :width: 80%\n\n**Tensor Parallelism (TP)** distributes each expert's computation across multiple\nNeuronCores by sharding the expert weights along the intermediate dimension. This\napproach reduces memory usage per core and enables larger models to fit in available\nmemory. With TP, each expert's gate, up, and down projection matrices are split across\nTP ranks, requiring collective communication to combine results.\n\n**Expert Parallelism (EP)** distributes different experts across different NeuronCores,\nallowing each core to specialize in computing a subset of the total experts.\n\nAs we discuss later in this deep dive,\nthe choice between TP and EP (or their combination) depends on model architecture\nand the specific TRN hardware under consideration.\n\nTo configure TP and EP, configure the degrees\nwhile initializing the model parallel state in NxD.\nThe MoE components automatically create and use the appropriate PyTorch process groups based on the\nparallelism configuration. These configurations set up routed expert behavior and\nparallelism strategy, while NxD internally manages mapping to the optimized kernels,\nand process group mapping for TP/EP. We show a few code examples below.\n\n.. code-block:: python\n\n   from neuronx_distributed.parallel_layers import parallel_state\n\n   # Configure Tensor Parallelism only (TP=8)\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=8,\n       expert_model_parallel_size=1,  # No expert parallelism\n   )\n\n   # Configure Expert Parallelism only (EP=16)\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=1,  # No tensor parallelism\n       expert_model_parallel_size=16,\n   )\n\n   # Configure combined TP and EP (TP=4, EP=16)\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=4,\n       expert_model_parallel_size=16,\n   )\n\n\n\nShared Experts\n~~~~~~~~~~~~~~\n\nShared experts provide an optional mechanism for processing all tokens through a\ndedicated expert network in addition to the routed experts described above.\nModel architectures that use shared experts include Llama4 Maverick and DeepSeek-V3.\n\nThe ``SharedExperts`` implementation supports both tensor parallelism and sequence\nparallelism execution modes. **Sequence Parallelism (SP)** distributes the sequence\ndimension across multiple NeuronCores, where each core processes a subset of tokens\nwhile maintaining complete copies of the weights. It uses automatic weight replication or sharding based on the\nconfiguration. For prefill, shared experts can run in sequence parallel mode\nwith replicated weights. Token generation uses tensor parallel mode with sharded\nweights:\n\n.. 
code-block:: python\n\n   # Llama4 Maverick shared experts configuration\n   shared_experts = SharedExperts(\n       hidden_size=5120,\n       intermediate_size=8192,\n       num_shared_experts=1,  # Llama4 Maverick uses 1 shared expert\n       hidden_act=\"silu\",\n       sequence_parallel_enabled=True,  # Run in SP for prefill\n       fused_gate_up_projection=True,  # Optimize gate/up fusion\n   )\n\n   # Complete Llama4 Maverick MoE layer with shared experts\n   moe_layer = MoE(\n       router=router,\n       expert_mlps=expert_mlps,\n       shared_experts=shared_experts,\n       sequence_parallel_enabled=True,\n   )\n\nThe shared experts component automatically handles the complexity of different execution\nmodes. It switches between sequence parallel execution for prefill (where weights\nare replicated) and tensor parallel execution for token generation (where weights are\nsharded).\n\n.. _moe-prefill-optimization:\n\nMoE prefill optimization\n------------------------\n\nThis section explores the core design principles and optimization techniques that enable\nefficient MoE execution during prefill. It focuses on three key areas: router execution strategies,\nblockwise matrix multiplication algorithms for efficient routed expert computation,\nand optimization strategies for shared experts.\n\nRouter execution in sequence parallel mode\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRouter networks are significantly smaller compared to expert MLPs. They have weight matrices of size\n:math:`[\\mathrm{hidden\\_size}, \\mathrm{num\\_experts}]`. For most MoE architectures, this represents a relatively\nmodest memory footprint that allows RouterTopK to run with replicated weights across sequence\nparallel ranks. NxD also delays logit gathering until after expert selection to reduce\ncommunication volume. Consider a concrete example:\n\n.. code-block:: text\n\n   Example: GPT-OSS 120B configuration\n   - Hidden size: 2880\n   - Number of experts: 128\n   - Router weight size: 2880 × 128 × 2 bytes = ~0.07MB per MoE layer\n   - Router across all layers: 0.07MB × 36 layers = ~2.4MB\n   - Replicating the router occupies ~0.01% of HBM capacity on a TRN2 instance\n\nThe small size of router weights makes weight replication across cores acceptable. This enables\nsequence parallel execution where each core processes a subset of the sequence but maintains\na complete copy of the router weights. This approach improves the arithmetic intensity\nof router layer operations without imposing significant memory overhead.\n\n**Communication optimization in sequence parallel mode**\n\nThe NxD implementation performs an additional optimization to reduce communication overhead.\nA naive implementation of router in sequence parallel (SP) would involve gathering the\nrouter logits computed in sequence parallel. This induces a communication\nvolume of :math:`[\\mathrm{batch\\_size}, \\mathrm{seq\\_len}, \\mathrm{num\\_experts}]`.\nThe gathering of logits is needed to proceed to the next step\nof computing experts. 
The computation operates in TP or EP mode rather than SP.\nFor long sequences and models with a large number of experts, this step can become a performance bottleneck.\n\nTo optimize this, we delay gathering logits until after expert selection is completed.\nFollowing this step, the size of router logits to be gathered becomes :math:`[\\mathrm{batch\\_size}, \\mathrm{seq\\_len}, \\mathrm{top\\_k}]`.\nThis is significantly smaller and reduces communication overhead by a factor of :math:`\\frac{\\mathrm{num\\_experts}}{\\mathrm{top\\_k}}`.\n\nFor example, with 128 experts and top_k=8, this optimization reduces communication volume by 16×.\n\n**Takeaway**: During prefill, we recommend configuring the router in sequence parallel mode.\n\n**Enabling router in sequence parallel mode**\n\nThe router implementation in NxD automatically handles sequence parallel execution through\nthe ``sequence_parallel_enabled`` parameter.\n\n.. code-block:: python\n\n   # Router configuration for sequence parallel execution\n   router = routing.RouterTopK(\n       num_experts=128,\n       top_k=8,\n       hidden_size=2880,\n       sequence_parallel_enabled=True,  # Enable SP execution\n       act_fn=\"softmax\"\n   )\n\n\nBlockwise Matrix Multiplication (BWMM): Routed Expert optimization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA naive implementation of routed expert computation\ninherently creates dynamic computational graphs. This is because token-to-expert\ndistributions vary across iterations.\n\nConsider a simple example that illustrates the core problem:\n\n.. code-block:: python\n\n   # Naive MoE implementation picked from HuggingFace\n   # (problematic for static compilation)\n   def moe_forward(tokens, experts, router):\n       expert_assignments = router(tokens)  # Dynamic routing decisions\n       outputs = []\n\n       for expert_id in range(num_experts):\n           # Variable number of tokens per expert each iteration\n           expert_tokens = tokens[expert_assignments == expert_id]\n           if len(expert_tokens) > 0:\n               # experts[expert_id] represents the expert network/function\n               expert_output = experts[expert_id](expert_tokens)\n               outputs.append(expert_output)\n\n       return combine_outputs(outputs, expert_assignments)\n\n\nBlockwise matrix multiplication solution\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe blockwise matrix multiplication (BWMM) approach solves this challenge\nby transforming the dynamic problem into a static one. It maps tokens\ninto fixed-size computational blocks:\n\n.. image:: /images/deep-dives/moe-arch/moe-blockwise-transformation.png\n   :alt: Transformation from dynamic expert assignment to fixed-size blocks\n   :align: center\n   :width: 80%\n\n**Core design principles:**\n\nThe algorithm maps tokens into blocks with a fixed number of tokens (equal to block_size).\nIt maintains the following constraints:\n\n1. **Single expert per block**: Each block contains tokens assigned to only one expert\n2. **Multiple blocks per expert**: Experts can be assigned multiple blocks when needed\n3. 
**Padded blocks allowed**: Some blocks may contain only padding tokens depending on the token-to-expert distribution during inference\n\nFor dropless inference, provisioning :math:`N = \\lceil\\frac{\\mathrm{tokens} \\times \\mathrm{top\\_k}}{\\mathrm{block\\_size}}\\rceil + (\\mathrm{num\\_experts} - 1)`\nblocks is sufficient to map all tokens without dropping while satisfying these constraints.\n\n**Concrete example:**\n\n.. code-block:: text\n\n   Input: 6 tokens [T0, T1, T2, T3, T4, T5]\n   Expert assignment: [E0, E1, E0, E2, E1, E0]\n   Block size: 4\n\n   Block organization:\n   Block 0 → Expert E0: [T0, T2, T5, -1]  # 3 real tokens + 1 padding\n   Block 1 → Expert E1: [T1, T4, -1, -1]  # 2 real tokens + 2 padding\n   Block 2 → Expert E2: [T3, -1, -1, -1]  # 1 real token + 3 padding\n\n**Padding overhead analysis**\n\nUnderstanding padding overhead is crucial for optimizing MoE performance. It directly\nimpacts compute utilization and memory efficiency. The BWMM algorithm introduces\npadding in two scenarios: within blocks (when experts receive fewer tokens than block_size)\nand across blocks (when we provision more blocks than the minimum required).\n\n*Mathematical framework:*\n\nThe total padding overhead can be quantified as:\n\n.. math::\n\n   \\text{Padding overhead} = (\\text{Total provisioned compute}) - (\\text{Actual required compute})\n\n.. math::\n\n   = (N \\times \\mathrm{block\\_size}) - (T \\times \\mathrm{top\\_k})\n\nWhere:\n\n- :math:`N` = number of blocks provisioned\n- :math:`T` = total input tokens\n- :math:`\\mathrm{block\\_size}` = tokens per block\n- :math:`\\mathrm{top\\_k}` = experts per token\n\n*Concrete example - Padding impact:*\n\n.. code-block:: text\n\n   Scenario: 1000 tokens, 8 experts, top_k=2, block_size=256\n\n   Required computation: 1000 × 2 = 2000 token-expert pairs\n\n   Blocks statically provisioned to handle worst case:\n   N = ⌈(1000 × 2) / 256⌉ + (8 - 1) = ⌈7.8⌉ + 7 = 15\n\n   Best case (perfect load balancing):\n   - Each expert gets: 2000 ÷ 8 = 250 tokens\n   - Blocks needed: 8 experts × 1 block = 8 blocks\n   - Total compute slots (required): 8 × 256 = 2048\n   - Total compute slots (actual): 15 × 256 = 3840\n   - Padding overhead (to handle worst case): (3840 - 2048) ÷ 2048 = 87.5%\n   - Algorithmic padding overhead: (2048 - 2000) / 2000 = 2.4%\n\n   Worst case (load imbalance):\n   - One expert gets 1750 tokens, others get ~36 tokens each\n   - Blocks needed: 7 blocks for hot expert + 7 blocks for others = 14 blocks\n   - Total compute slots (required): 14 × 256 = 3584\n   - Total compute slots (actual): 15 × 256 = 3840\n   - Padding overhead (to handle worst case): (3840 - 3584) ÷ 3584 = 7.14%\n   - Algorithmic padding overhead: (3584 - 2000) / 2000 = 79.2%\n\n**Block size selection guidance**\n\n*Trade-offs:*\n\n.. code-block:: text\n\n   Smaller block_size (e.g., 128):\n   ✓ Reduces within-block padding, improving performance when token-to-expert distribution is imbalanced\n   ✗ Lower arithmetic intensity per block\n\n   Larger block_size (e.g., 1024):\n   ✓ Higher arithmetic intensity per block\n   ✗ Higher within-block padding for sparse experts\n\n*Optimization principle:*\n\nChoose the block size just large enough so that the workload becomes compute-bound rather than memory-bound.\n\nThe arithmetic intensity factor (AIF) provides a quantitative framework for block size selection:\n\n.. math::\n\n   \\text{AIF} = \\frac{\\text{Compute FLOPs}}{\\text{Data movement}}\n\n.. 
math::\n\n   = \\frac{2 \\times 3 \\times \\mathrm{block\\_size} \\times \\mathrm{hidden\\_size} \\times \\mathrm{intermediate\\_size} \\times \\mathrm{num\\_blocks}}{2 \\times 3 \\times \\mathrm{num\\_experts} \\times \\mathrm{hidden\\_size} \\times \\mathrm{intermediate\\_size}}\n\n.. math::\n\n   = \\frac{\\mathrm{block\\_size} \\times \\mathrm{num\\_blocks}}{\\mathrm{num\\_experts}}\n\nTarget configuration: :math:`\\text{AIF} \\geq \\frac{\\text{Peak compute throughput}}{\\text{Memory bandwidth}}`\n\nFor TRN2 instances, this ratio is approximately 400-500 FLOPs/byte, providing guidance for optimal block size selection.\n\n\n\nAdvanced optimizations in the BWMM algorithm\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe implementation of the BWMM kernel that is available in the Neuron SDK\nprovides several sophisticated optimizations. These significantly\nimprove MoE performance by reducing memory bandwidth requirements and eliminating\nunnecessary computation.\n\n**DMA skipping optimizations**\n\nDMA (Direct Memory Access) skipping addresses the padding overhead inherent in the blockwise\napproach. It selectively avoids DMA transfers for padded elements.\n\n*Token skipping:*\n\nToken skipping eliminates memory transfers for padded token positions (marked as ``-1`` in\nthe token position mapping):\n\n.. image:: /images/deep-dives/moe-arch/moe-token-skipping.png\n   :alt: Token skipping optimization showing elimination of padded token transfers\n   :align: center\n   :width: 80%\n\n.. code-block:: text\n\n   Without token skipping:\n   Block: [T0, T2, T5, -1]\n   DMA operations: 4 token loads (including padding)\n\n   With token skipping:\n   Block: [T0, T2, T5, -1]\n   DMA operations: 3 token loads (padding skipped)\n   Performance improvement: ~25% reduction in memory bandwidth\n\n*Weight skipping:*\n\nWeight skipping avoids redundant expert weight loads when consecutive blocks use the same expert:\n\n.. code-block:: text\n\n   Block sequence: [E0, E0, E1, E2, E2]\n\n   Without weight skipping:\n   - Load E0 weights for Block 0\n   - Load E0 weights for Block 1 (redundant)\n   - Load E1 weights for Block 2\n   - Load E2 weights for Block 3\n   - Load E2 weights for Block 4 (redundant)\n\n   With weight skipping:\n   - Load E0 weights for Block 0\n   - Reuse E0 weights for Block 1\n   - Load E1 weights for Block 2\n   - Load E2 weights for Block 3\n   - Reuse E2 weights for Block 4\n\n**Configuration in NxD Inference:**\n\nRecommendation is to have both these features as default on.\n\n.. code-block:: python\n\n   # Enable DMA skipping optimizations\n   blockwise_config = BlockwiseMatmulConfig.from_kwargs(\n       block_size=512,\n       logical_nc_config=2,\n       skip_dma_token=True,    # Enable token skipping\n       skip_dma_weight=True,   # Enable weight skipping\n   )\n\n**Dynamic control flow - block compute skipping**\n\nDynamic control flow optimization eliminates computation\nentirely for blocks that contain only padding tokens.\nThis is done inside the kernel by leveraging support for\nexecuting while loops on chip with dynamic number of iterations in the Neuron SDK.\n\n.. image:: /images/deep-dives/moe-arch/moe-dynamic-control-flow.png\n   :alt: Dynamic while loop skipping fully padded blocks\n   :align: center\n   :width: 80%\n\n**Conceptual example:**\n\n.. 
code-block:: text\n\n   Total blocks: 10\n   Token distribution: 6 blocks with real tokens, 4 blocks fully padded\n\n   Block to expert allocation: [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]\n                              ^-- real blocks --^  ^-- skip --^\n\n   Regular execution: Compute all 10 blocks\n   With dynamic control flow: Compute only 6 blocks, skip 4 entirely\n   Performance improvement roofline: ~40% reduction in compute FLOPs, especially when token to expert distribution is not imbalanced.\n\n**NxD Inference configuration:**\n\n.. code-block:: python\n\n   # Enable dynamic control flow optimization\n   blockwise_config = BlockwiseMatmulConfig.from_kwargs(\n       block_size=512,\n       logical_nc_config=2,\n       # Choose based on LNC2 sharding:\n       use_shard_on_block_dynamic_while=True,\n       # OR\n       use_shard_on_intermediate_dynamic_while=True, # Based on technique used for LNC2 sharding\n   )\n\n**LNC2 sharding strategies**\n\nTRN2 and TRN3 provide two physical cores per logical NeuronCore.\nNxD inference via the Neuron Kernel Library (NKI-Lib) supports three distinct sharding strategies,\neach optimized for different scenarios. The choice of LNC sharding algorithm can be configured\nthrough `BlockwiseMatmulConfig` parameters:\n\n*Hidden dimension sharding (shard on H):*\n\nDefault sharding strategy in `BlockwiseMatmulConfig`.\n\n.. code-block:: text\n\n   Computation per block: [block_size, H] @ [H, I] @ [I, H]\n   Sharding strategy: Split H dimension across cores\n\n   Core 0: [block_size, H/2] @ [H/2, I] @ [I, H/2]\n   Core 1: [block_size, H/2] @ [H/2, I] @ [I, H/2]\n\n   Requires: Cross-core reduction after first matmul\n   Best for: High tensor parallelism scenarios\n\n*Intermediate dimension sharding (shard on I):*\n\nConfigured with `use_shard_on_intermediate_dynamic_while=True` in `BlockwiseMatmulConfig`.\n\n.. code-block:: text\n\n   Computation per block: [block_size, H] @ [H, I] @ [I, H]\n   Sharding strategy: Split I dimension across cores\n\n   Core 0: [block_size, H] @ [H, I/2] @ [I/2, H]\n   Core 1: [block_size, H] @ [H, I/2] @ [I/2, H]\n\n   Requires: Cross-core reduction after second matmul\n   Best for: Low expert parallelism scenarios, large intermediate dimensions\n\n*Block parallel execution:*\n\nConfigured with `use_shard_on_block_dynamic_while=True` in `BlockwiseMatmulConfig`.\n\n.. code-block:: text\n\n   Total blocks: N\n   Sharding strategy: Distribute blocks across cores\n\n   Core 0: Processes blocks [0, 2, 4, ...] (even indices)\n   Core 1: Processes blocks [1, 3, 5, ...] (odd indices)\n\n   Requires: Enough HBM capacity to store intermediate outputs across cores and a cross-core reduction at the end.\n   Best for: When workload can afford the HBM capacity to store intermediate outputs from both cores and when there is more than one expert per logical core.\n\n\nShared experts optimization\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nShared experts, used in models like Llama4 Maverick, process all tokens regardless of\nrouting decisions. Their optimization strategy differs significantly from routed experts\ndue to their deterministic computation pattern.\n\n**Execution mode selection**\n\nShared experts support two primary execution modes. 
Each is optimized for different phases\nof inference:\n\n*Sequence parallel mode:*\n- **When to use**: Context encoding with small weights and available HBM capacity\n- **Characteristics**: Weights replicated across cores, each core processes subset of sequence\n- **Benefits**: Maximizes compute utilization, minimizes communication overhead\n\n*Tensor parallel mode :*\n- **When to use**: When memory constraints require weight sharding\n- **Characteristics**: Weights sharded across cores, requires collective communication\n- **Benefits**: Reduces memory usage per core, enables larger models\n\n\n**Configuration in NxD Inference**\n\n.. code-block:: python\n\n   # Shared experts with dual-mode execution\n   shared_experts = SharedExperts(\n       hidden_size=5120,\n       intermediate_size=8192,\n       num_shared_experts=1,\n       hidden_act=\"silu\",\n       sequence_parallel_enabled=True,  # Enable SP for prefill\n       fused_gate_up_projection=True,\n   )\n\n\nConfiguring TP and EP\n~~~~~~~~~~~~~~~~~~~~~~\n\nThe choice between Tensor Parallelism (TP) and Expert Parallelism (EP) depends on several\nmodel characteristics and hardware constraints. This section provides practical guidance\nfor selecting the optimal parallelism strategy.\n\n**Decision framework**\n\n**When to prefer Tensor Parallelism:**\n\n- *Small number of experts* (≤32): TP provides good load balancing without expert distribution concerns\n- *Large intermediate dimensions*: Optimal configuration is when sharded intermediate dimensions are >= 128 for good tensor engine utilization\n\n**When to prefer Expert Parallelism:**\n\n- *Large number of experts* (≥64): Better expert distribution and load balancing\n- *Small intermediate dimensions*: Avoids under-utilization from excessive TP sharding\n\n**Hybrid TP+EP approach:**\n\n- *Best of both worlds*: Combine moderate TP (2-8) with EP to achieve good compute efficiency.\n- *Load balancing problem with very large EP*: Expert parallelism can suffer from load imbalance.\n  \nSome EP groups receive significantly more work than others. In the worst case, one EP\ngroup may receive 3-4x the average number of tokens. This creates straggler effects that\nlimit overall performance. This skew becomes more pronounced with larger EP degrees\nand imbalanced routing patterns. The overall MoE layer performance is determined by\nthe slowest EP group. This makes load balancing critical for EP effectiveness.\n\n\n**Configuration examples**\n\n.. code-block:: python\n\n   # Small model, balanced routing - prefer TP\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=8,\n       expert_model_parallel_size=1,\n   )\n\n   # Large model, many experts - prefer EP\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=1,\n       expert_model_parallel_size=16,\n   )\n\n   # Very large model - hybrid approach\n   parallel_state.initialize_model_parallel(\n       tensor_model_parallel_size=4,\n       expert_model_parallel_size=16,\n   )\n\n\nMoE decode optimization\n-----------------------\n\nToken generation (decode) presents fundamentally different optimization challenges compared\nto prefill due to its memory-bound characteristics. During decode, the input shape is\n``[batch_size, 1, hidden_size]`` rather than ``[1, seq_len, hidden_size]``. This creates\nsmall matrix multiplications that are limited by memory bandwidth rather than compute\nthroughput. 
This section explores the specialized optimization strategies for efficient\nMoE execution during token generation.\n\nMemory-bound characteristics of token generation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nToken generation workloads exhibit distinct computational characteristics. They require\ndifferent optimization approaches from prefill:\n\n**Computational profile comparison:**\n\n.. code-block:: text\n\n   Prefill (compute-bound):\n   - Input shape: [1, seq_len, hidden_size] where seq_len >> batch_size\n   - Large matrix multiplications: [1, 8192, 4096] @ [4096, 12288]\n   - High arithmetic intensity: ~400+ FLOPs/byte\n   - Bottleneck: Compute throughput (TensorEngine utilization)\n\n   Token generation (memory-bound):\n   - Input shape: [batch_size, 1, hidden_size] where batch_size << seq_len\n   - Small matrix multiplications: [32, 1, 4096] @ [4096, 12288]\n   - Low arithmetic intensity: ~50-100 FLOPs/byte\n   - Bottleneck: Memory bandwidth (weight loading from HBM)\n\nThe key insight is that during token generation, the time to load expert weights from\nHBM often exceeds the actual computation time. This makes memory bandwidth optimization\nthe primary concern.\n\nSelective loading algorithm\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSelective loading addresses the memory bandwidth bottleneck. It loads only the expert\nweights required for the current batch of tokens, rather than loading all expert weights.\n\n**Core principle:**\n\nInstead of loading all ``E`` experts, load only the ``batch_size × top_k`` unique experts\nneeded for the current batch. This can provide significant memory bandwidth savings when\nthe number of required experts is much smaller than the total number of experts.\n\n**Algorithm overview:**\n\n.. code-block:: text\n\n   For each token generation step:\n   1. Determine expert assignments for current batch\n   2. Identify unique experts needed across all tokens\n   3. Load only required expert weights from HBM\n   4. Compute only loaded experts\n   5. Combine outputs using expert affinities\n\n**Effectiveness conditions:**\n\nSelective loading is most effective when the number of unique experts required is significantly smaller than the total number of experts:\n\n.. math::\n\n   \\mathrm{Effectiveness\\ condition:\\ } \\mathrm{batch\\_size} \\times \\mathrm{top\\_k} \\ll \\mathrm{num\\_experts}\n\n**Memory bandwidth savings:**\n\nThe theoretical memory bandwidth reduction can be calculated as:\n\n.. math::\n\n   \\mathrm{Bandwidth\\ reduction} = 1 - \\frac{\\mathrm{unique\\_experts\\_loaded}}{\\mathrm{num\\_experts}}\n\n**Example scenarios:**\n\n.. code-block:: text\n\n   DeepSeek (256 experts, top_k=8):\n   - Effective for batch_size ≤ 16\n   - Max unique experts: 16 × 8 = 128 (50% of total experts)\n   - Potential bandwidth savings: ~50%\n\n   GPT-OSS (128 experts, top_k=8):\n   - Effective for batch_size ≤ 8\n   - Max unique experts: 8 × 8 = 64 (50% of total experts)\n   - Potential bandwidth savings: ~50%\n\n   Llama4 (16 experts, top_k=1):\n   - Effective for batch_size ≤ 8\n   - Max unique experts: 8 × 1 = 8 (50% of total experts)\n   - Potential bandwidth savings: ~50%\n\n\nAll-Experts algorithm\n~~~~~~~~~~~~~~~~~~~~~~\n\nWhen selective loading becomes ineffective (large batch sizes),\nthe all-experts algorithm provides an alternative optimization strategy.\n\n**When to use All-Experts:**\n\nNxD Inference automatically determines when to switch from selective loading to all-experts\nbased on workload characteristics. 
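For intuition, the following sketch illustrates the kind of decision being made. It is a hypothetical helper, not an NxD Inference API, and it applies the switching threshold that is defined formally right after the code:\n\n.. code-block:: python\n\n   import torch\n\n   # Hypothetical helper (not part of NxD Inference): decide between selective\n   # loading and the all-experts path for one decode step.\n   def choose_decode_strategy(expert_assignments, num_experts, alpha=0.8):\n       # expert_assignments: [batch_size, top_k] expert indices from the router\n       unique_experts = torch.unique(expert_assignments).numel()\n       # batch_size * top_k upper-bounds unique_experts; the switching rule below\n       # compares that bound against alpha * num_experts\n       if expert_assignments.numel() >= alpha * num_experts:\n           return \"all-experts\", num_experts  # load every expert, mask unused outputs\n       return \"selective\", unique_experts  # load only the experts actually routed to\n\n   # Example: 128 experts, top_k=8, decode batch of 4 stays in the selective regime\n   assignments = torch.randint(0, 128, (4, 8))\n   strategy, experts_loaded = choose_decode_strategy(assignments, num_experts=128)\n\n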
The threshold for switching can be determined by:\n\n.. math::\n\n   \\mathrm{Switch\\ threshold:\\ } \\mathrm{batch\\_size} \\times \\mathrm{top\\_k} \\geq \\alpha \\times \\mathrm{num\\_experts}\n\nwhere :math:`\\alpha` is typically between 0.8-1, representing the point where loading all experts becomes more efficient than selective loading.\n\n**Example threshold analysis:**\n\n.. code-block:: text\n\n   DeepSeek with batch_size=32, top_k=8:\n   - Required experts: 32 × 8 = 256 (potentially all experts)\n   - All-experts becomes more efficient than selective loading\n\n**Implementation strategy:**\n\nThe all-experts algorithm follows a structured approach:\n\n1. **Load all expert weights** once per token generation step\n2. **Compute all experts** for all tokens in parallel\n3. **Apply expert masks** during output combination to zero out unused expert outputs\n4. **Benefits**:\n   - Better DMA efficiency since all DMA loads do not have indirection unlike in selective loading.\n5. **Scalability with TP+EP**: Use TP+EP to shard weights across multiple cores, increasing effective memory bandwidth for expert weight loading\n6. **Automatic configuration**: NxD Inference automatically selects between selective loading and all-experts based on the workload characteristics\n\n\nMoE Quantization Support\n------------------------\n\nThe MoE module available in NxD inference supports the below quantization techniques:\n\n1. BF16 weights and compute\n2. Weights quantized to FP8 along the hidden dimension with BF16 compute\n3. Weights quantized to MxFP4 with MxFP4/BF16 compute\n\nReference Implementations\n-------------------------\n\nFor detailed reference implementations of MoE models using the techniques described in this guide,\nrefer to the following NxDI model code:\n\n- **GPT-OSS MoE models**: `GPT-OSS implementation <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models/gpt_oss>`_\n- **Llama4 MoE models**: `Llama4 implementation <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/src/neuronx_distributed_inference/models/llama4>`_\n\nThese implementations demonstrate practical applications of the router configurations, expert\nparallelism strategies, and optimization techniques covered in this deep dive.\n\nFuture Optimizations\n--------------------\n\nWe will continue to optimize the Neuron SDK with advanced optimizations for MoE workloads. Two key improvements\nwhich will be available in future releases are:\n\n**Expert Parallel Load Balancing (EPLB)**\n\nExpert Parallel Load Balancing (EPLB) addresses the fundamental challenge of load imbalance in EP configurations\nwhere some expert groups receive significantly more tokens than others, creating straggler effects.\nEPLB introduces redundant expert placement across multiple EP ranks, allowing dynamic load redistribution\nwhen imbalance is detected.\n\n**Communication Optimization for Expert Parallelism with All-to-All-v**\n\nCurrently, Expert Parallelism uses All-Gather to gather all tokens at all ranks, resulting in\nwasted communication volume since each rank only needs tokens assigned to its subset of experts.\nWe are working on an optimized All-to-All-v primitive in the Neuron SDK that will enable\nvariable-sized token exchanges between EP ranks, communicating only the actual tokens assigned\nto each expert rather than gathering all tokens everywhere. This optimization will significantly\nreduce network bandwidth requirements for EP communication.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/nxd-examples-migration-guide.rst",
    "content": ".. _nxd-examples-migration-guide:\n\nMigrating from NxD Core inference examples to NxD Inference\n===========================================================\n\nWe have migrated the NeuronX Distributed Core `examples/inference <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference>`__\nfolder to a separate package, NeuronX Distributed (NxD) Inference\n(``neuronx-distributed-inference``), so you can import and use it as a\nproper library. This new library, NxD Inference, includes production ready\nmodels that you can deploy out of the box with model inference backends,\nsuch as vLLM. This library also provides modules that you can use to\nimplement your own models to run with the Neuron SDK.\n\nIf you use the inference examples from NxD Core, follow this guide to migrate\nto NxD Inference. For more information about NxD Inference and to see examples\nof how to use it, see :ref:`nxdi-feature-guide`, :ref:`NxD Inference Tutorials <nxdi-tutorials-index>`,\nand the `generation_demo.py script <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/examples/generation_demo.py>`__.\n\n.. warning::\n   Previous inference examples (including Llama 2, Llama 3, Mixtral, and DBRX) in\n   the NxD Core GitHub repository were removed in Neuron Release 2.23.\n   The models and example code are implemented in the\n   NxD Inference library, so you can easily integrate them with your inference\n   scripts. If you use these examples in NxD Core, we recommend\n   that you update your inference scripts to use the NxD Inference model hub\n   instead. If your use case requires you to directly integrate with the NxD\n   Core library (and not NxD Inference) then you can continue to use the NxD\n   Core library directly. For an example of how to integrate with NxD Core directly,\n   see the newer `Llama3.2 1B sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`__\n   added in Neuron Release 2.23. For more information, see :ref:`announce-eos-nxd-examples`.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nChanges\n-------\n\n1. New config interface\n~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference includes a new model config interface, ``InferenceConfig``,\nwhere NeuronConfig is an attribute within the model config, and the\nmodel config no longer extends HuggingFace's PretrainedConfig. NxDI\nincludes an adapter for loading an HuggingFace's config into this model\nconfig. The configurations are serialized into a file named\n``neuron_config.json``.\n\n**This change means that the config structure is inverted compared to\nthe NxD examples folder.**\n\n- To access the model config (similar to HuggingFace's\n  PreTrainedConfig), use ``config`` (or ``model.config``,\n  ``self.config``).\n- To access the NeuronConfig, use ``config.neuron_config`` (or\n  ``model.neuron_config``, ``self.neuron_config``).\n\nTo onboard a custom model, you define config classes that extend InferenceConfig\nand NeuronConfig. The following example from DBRX shows how to define a\nDBRX-specific NeuronConfig (NeuronDbrxConfig) and InferenceConfig\n(DbrxInferenceConfig). DbrxInferenceConfig that defines required config\nattributes and specifies that NeuronDbrxConfig is the NeuronConfig\nclass. The required attributes are typically set by loading a\nPretrainedConfig (in this case, HuggingFace's DbrxConfig) into the\nInferenceConfig. 
Alternatively, a user can manually provide these\nattributes to avoid depending on an HuggingFace config class.\n\n::\n\n   class NeuronDbrxConfig(MoENeuronConfig):\n       def __init__(self, **kwargs):\n           super().__init__(**kwargs)\n           self.fused_qkv = True\n\n\n   class DbrxInferenceConfig(InferenceConfig):\n       def get_required_attributes(self) -> List[str]:\n           return [\n               \"d_model\",\n               \"n_heads\",\n               \"max_seq_len\",\n               \"emb_pdrop\",\n               \"resid_pdrop\",\n               \"pad_token_id\",\n               \"vocab_size\",\n               \"attn_config\",\n               \"ffn_config\",\n           ]\n\n       @classmethod\n       def get_neuron_config_cls(cls):\n           return NeuronDbrxConfig\n\n.. note:: \n\n   NeuronDbrxConfig extends MoENeuronConfig, which is a subclass of NeuronConfig\n   that includes attributes that are specific to mixture-of-experts (MoE) models.\n\n\nTo load the config from an HuggingFace checkpoint or a compiled\ncheckpoint, pass ``load_pretrained_config(path)`` as the ``load_config``\nhook when you create the InferenceConfig.\n\n::\n\n   from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\n\n   neuron_config = DbrxNeuronConfig()  # Provide args\n   config = DbrxInferenceConfig(\n       neuron_config,\n       load_config=load_pretrained_config(model_path),\n   )\n\nTo serialize the config, call ``save(path)``.\n\n::\n\n   config.save(compiled_model_path)\n\nTo deserialize the config, call ``load(path)``.\n\n::\n\n   config = DbrxInferenceConfig.load(compiled_model_path)\n\nNeuronConfig also supports nested configs now. For example, see the\nOnDeviceSamplingConfig class and its integration into NeuronConfig.\n\n2. New base application interface\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuronApplicationBase takes general purpose features from\nNeuronBaseForCausalLM, such as compile and load, and makes them\navailable in a new abstract base class. You can extend this base class\nto define other types of application heads, such as for image\nclassification.\n\n3. New generation inference\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Neuron model classes no longer extend HuggingFace's PretrainedModel,\nso they no longer include a HuggingFace ``generate()`` function.\nAdditionally, GenerationConfig arguments are no longer passed through\nthe model config. To run HuggingFace generation in NxD Inference, wrap\nthe Neuron model in a HuggingFaceGenerationAdapter, and pass a\nGenerationConfig when you call ``generate()``.\n\n::\n\n   from transformers import GenerationConfig\n\n   from neuronx_distributed_inference.utils.hf_adapter import HuggingFaceGenerationAdapter\n\n   # Init config, model, and tokenizer.\n\n   generation_config = GenerationConfig.from_pretrained(model_path)\n   generation_config_kwargs = {\n       \"do_sample\": True,\n       \"top_k\": 1,\n       \"pad_token_id\": generation_config.eos_token_id,\n       \"max_length\": neuron_config.max_length,\n   }\n   generation_config.update(**generation_config_kwargs)\n\n   inputs = tokenizer(prompts, padding=True, return_tensors=\"pt\")\n   generation_model = HuggingFaceGenerationAdapter(model)\n   outputs = generation_model.generate(\n       inputs.input_ids,\n       generation_config=generation_config,\n       attention_mask=inputs.attention_mask,\n   )\n\n4. 
New quantization interface\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis new base class also includes an interface for quantization, which\nwas previously part of the ``run_llama_quantized.py`` example in the old\nNxD examples folder. The following example saves a quantized checkpoint\nfor a Llama model. In this example, the ``config`` includes a\n``neuron_config`` with quantization enabled.\n\n::\n\n   NeuronLlamaForCausalLM.save_quantized_state_dict(model_path, config)\n\n5. Inference demo script (replaces runners)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn place of ``runner.py`` and various ``run_x.py`` examples, NxD-I\nprovides an ``inference_demo`` console script. When you run the script,\nyou provide a model path and configuration parameters to use for\ninference. This script includes benchmarking and accuracy checking\nfeatures that you can use to verify that your models and modules work\ncorrectly.\n\nThe following example demonstrates how to run Llama-3.1-8b with token\nmatching and benchmarking enabled.\n\n::\n\n   inference_demo \\\n     --model-type llama \\\n     --task-type causal-lm \\\n     run \\\n       --model-path /home/ubuntu/model_hf/Llama-3.1-8b/ \\\n       --compiled-model-path /home/ubuntu/traced_model/Llama-3.1-8b/ \\\n       --torch-dtype bfloat16 \\\n       --tp-degree 32 \\\n       --batch-size 2 \\\n       --max-context-length 32 \\\n       --seq-len 64 \\\n       --on-device-sampling \\\n       --enable-bucketing \\\n       --top-k 1 \\\n       --do-sample \\\n       --pad-token-id 2 \\\n       --prompt \"I believe the meaning of life is\" \\\n       --prompt \"The color of the sky is\" \\\n       --check-accuracy-mode token-matching \\\n       --benchmark\n\nFor additional examples, see the ``neuronx-distributed-inference``\nGitHub repository:\nhttps://github.com/aws-neuron/neuronx-distributed-inference."
  },
  {
    "path": "libraries/nxd-inference/developer_guides/onboarding-models.rst",
    "content": ".. _nxdi-onboarding-models:\n\nOnboarding models to run on NxD Inference\n=========================================\n\nThis guide covers how to onboard a model to get it to run on NxD Inference\nfor the first time. To learn more about how to optimize a model on Neuron,\nsee the :ref:`nxdi-feature-guide`.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nOverview\n--------\n\nThis guide demonstrates how to adapt an existing PyTorch model to run on\nNeuron with the NeuronX Distributed (NxD) Inference library. At a\nhigh-level, you will do the following:\n\n1. Define configuration classes. NxD Inference models include a\n   NeuronConfig, which defines Neuron-specific configuration parameters,\n   and an InferenceConfig, which defines model configuration parameters.\n   When adapting a model that works with HuggingFace, InferenceConfig is\n   synonymous to PretrainedConfig.\n2. Define model classes. When you define model classes, you replace\n   linear layers with parallel layers that are optimized for distributed\n   inference on Neuron. NxD Inference also provides modules for\n   attention, KV cache management, and more, which you can use to write\n   model classes that work with Neuron. Model classes are compiled to\n   run effectively on Neuron.\n3. Define application heads. Application heads orchestrate passing\n   inputs to the correct compiled model. Application heads also provide\n   the interface to compile and load the model.\n4. Convert weights to a supported format. NxD Inference supports\n   safetensors and pickle formats.\n\n\n1. Define a NeuronConfig class\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDefine a Neuron configuration class, which extends NeuronConfig.\nNeuronConfig includes Neuron-specific configuration parameters. In the\nconfig class for your model, you can define any additional\nNeuron-specific configuration parameters that your model requires.\n\n- For MoE models, you can extend MoENeuronConfig instead of\n  NeuronConfig. This class includes configuration parameters specific to\n  MoE models.\n\n::\n\n   from neuronx_distributed_inference.models.config import NeuronConfig\n\n   class NeuronLlamaConfig(NeuronConfig):\n       def __init__(self, **kwargs):\n           super().__init__(**kwargs)\n           # Set any args/defaults\n\n2. Define an InferenceConfig class\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDefine an inference configuration class, which extends InferenceConfig.\nInferenceConfig includes model parameters, such as those from a\nHuggingFace PretrainedConfig (like LlamaConfig). When users initialize\nyour config, they can provide required attributes directly, or they can\npopulate the config from a HuggingFace PretrainedConfig. 
You can also\noverride ``get_required_attributes`` to enforce that certain attributes\nare present.\n\n::\n\n   from neuronx_distributed_inference.models.config import InferenceConfig, NeuronConfig\n\n   class LlamaInferenceConfig(InferenceConfig):\n       def get_required_attributes(self) -> List[str]:\n           return [\n               \"hidden_size\",\n               \"num_attention_heads\",\n               \"num_hidden_layers\",\n               \"num_key_value_heads\",\n               \"pad_token_id\",\n               \"vocab_size\",\n               \"max_position_embeddings\",\n               \"rope_theta\",\n               \"rms_norm_eps\",\n               \"hidden_act\",\n           ]\n           \n       @classmethod\n       def get_neuron_config_cls(cls) -> Type[NeuronConfig]:\n           return NeuronLlamaConfig\n\n3. Define a Neuron model\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nDefine a Neuron model. This class is a subclass of NeuronBaseModel,\nwhich is a PyTorch module.\n\n1. In this class, you provide implementations for\n   ``setup_attr_for_model(self, config)`` and\n   ``init_model(self, config)``.\n\n   1. In ``setup_attr_for_model``, set values for the following\n      attributes. You can set these attributes from values in ``config``\n      and ``config.neuron_config``.\n\n      1. self.on_device_sampling\n      2. self.tp_degree\n      3. self.hidden_size\n      4. self.num_attention_heads\n      5. self.num_key_value_heads\n      6. self.max_batch_size\n      7. self.buckets\n\n   2. In ``init_model``, initialize the modules that make up the model.\n\n      1. For attention modules, extend NeuronAttentionBase, which\n         provides a group query attention (GQA) implementation adapted\n         to Neuron.\n      2. Replace linear layers (such as in attention and MLP) with\n         Neuron parallel layers (RowParallelLinear and\n         ColumnParallelLinear).\n\n         1. For more information about RowParallelLinear and\n            ColumnParallelLinear layers, see :ref:`tensor_parallelism_overview`.\n\n      3. Replace embeddings with Neuron parallel embeddings\n         (ParallelEmbedding).\n      4. 
Replace any other modules that require Neuron-specific\n         implementations.\n\nNote: This example demonstrates a simplified version of NeuronLlamaModel\nfrom from the NxDI model hub.\n\n::\n\n   from torch import nn\n   from transformers.activations import ACT2FN\n\n   from neuronx_distributed.parallel_layers import parallel_state\n   from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear, RowParallelLinear, ParallelEmbedding\n\n   from neuronx_distributed_inference.models.model_base import NeuronBaseModel\n   from neuronx_distributed_inference.modules.attention.attention_base import NeuronAttentionBase\n   from neuronx_distributed_inference.modules.attention.utils import RotaryEmbedding\n   from neuronx_distributed_inference.modules.custom_calls import CustomRMSNorm\n\n   class NeuronLlamaMLP(nn.Module):\n       \"\"\"\n       This class just replace the linear layers (gate_proj, up_proj and down_proj) with column and row parallel layers\n       \"\"\"\n\n       def __init__(self, config: InferenceConfig):\n           super().__init__()\n           self.config = config\n           self.neuron_config = config.neuron_config\n           self.tp_degree = config.neuron_config.tp_degree\n           self.hidden_size = config.hidden_size\n           self.intermediate_size = config.intermediate_size\n           self.act_fn = ACT2FN[config.hidden_act]\n\n           self.gate_proj = ColumnParallelLinear(\n               self.hidden_size,\n               self.intermediate_size,\n               bias=False,\n               gather_output=False,\n               dtype=config.neuron_config.torch_dtype,\n               pad=True,\n           )\n           self.up_proj = ColumnParallelLinear(\n               self.hidden_size,\n               self.intermediate_size,\n               bias=False,\n               gather_output=False,\n               dtype=config.neuron_config.torch_dtype,\n               pad=True,\n           )\n           self.down_proj = RowParallelLinear(\n               self.intermediate_size,\n               self.hidden_size,\n               bias=False,\n               input_is_parallel=True,\n               dtype=config.neuron_config.torch_dtype,\n               pad=True,\n           )\n\n       def forward(self, x):\n           return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))\n\n\n   class NeuronLlamaAttention(NeuronAttentionBase):\n       \"\"\"\n       Compared with LlamaAttention, this class just\n       1. replaces the q_proj, k_proj, v_proj with column parallel layer\n       2. replaces the o_proj with row parallel layer\n       3. update self.num_head to be self.num_head / tp_degree\n       4. update self.num_key_value_heads to be self.num_key_value_heads / tp_degree\n       5. 
update forward() method to adjust to changes from self.num_head\n       \"\"\"\n\n       def __init__(self, config: InferenceConfig):\n           super().__init__()\n\n           self.config = config\n           self.neuron_config = config.neuron_config\n           self.hidden_size = config.hidden_size\n           self.num_attention_heads = config.num_attention_heads\n           self.num_key_value_heads = config.num_key_value_heads\n           self.head_dim = self.hidden_size // self.num_attention_heads\n           self.max_position_embeddings = config.max_position_embeddings\n           self.rope_theta = config.rope_theta\n           self.padding_side = config.neuron_config.padding_side\n           self.torch_dtype = config.neuron_config.torch_dtype\n\n           self.tp_degree = parallel_state.get_tensor_model_parallel_size()\n\n           self.fused_qkv = config.neuron_config.fused_qkv\n           self.clip_qkv = None\n\n           self.init_gqa_properties()\n           self.init_rope()\n\n       def init_rope(self):\n           self.rotary_emb = RotaryEmbedding(\n               self.head_dim,\n               max_position_embeddings=self.max_position_embeddings,\n               base=self.rope_theta,\n           )\n\n\n   class NeuronLlamaDecoderLayer(nn.Module):\n       \"\"\"\n       Just replace the attention with the NXD version, and MLP with the NXD version\n       \"\"\"\n\n       def __init__(self, config: InferenceConfig):\n           super().__init__()\n           self.hidden_size = config.hidden_size\n           self.self_attn = NeuronLlamaAttention(config)\n           self.mlp = NeuronLlamaMLP(config)\n           self.input_layernorm = CustomRMSNorm(\n               config.hidden_size,\n               eps=config.rms_norm_eps,\n           )\n           self.post_attention_layernorm = CustomRMSNorm(\n               config.hidden_size,\n               eps=config.rms_norm_eps,\n           )\n\n       def forward(\n           self,\n           hidden_states: torch.Tensor,\n           attention_mask: Optional[torch.Tensor] = None,\n           position_ids: Optional[torch.LongTensor] = None,\n           past_key_value: Optional[Tuple[torch.Tensor]] = None,\n           **kwargs,\n       ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:\n           residual = hidden_states\n           hidden_states = self.input_layernorm(hidden_states)\n\n           # Self Attention\n           attn_outs = self.self_attn(\n               hidden_states=hidden_states,\n               attention_mask=attention_mask,\n               position_ids=position_ids,\n               past_key_value=past_key_value,\n               **kwargs,\n           )\n\n           hidden_states, present_key_value = attn_outs\n           hidden_states = residual + hidden_states\n\n           # Fully Connected\n           residual = hidden_states\n           hidden_states = self.post_attention_layernorm(hidden_states)\n           hidden_states = self.mlp(hidden_states)\n           hidden_states = residual + hidden_states\n\n           return (hidden_states, present_key_value)\n\n\n   class NeuronLlamaModel(NeuronBaseModel):\n       \"\"\"\n       The neuron version of the LlamaModel\n       \"\"\"\n\n       def setup_attr_for_model(self, config: InferenceConfig):\n           # Needed for init_inference_optimization()\n           self.on_device_sampling = config.neuron_config.on_device_sampling_config is not None\n           self.tp_degree = config.neuron_config.tp_degree\n           
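# Model shape and bucketing attributes required by NeuronBaseModel (see the attribute list earlier in this section).\n           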
self.hidden_size = config.hidden_size\n           self.num_attention_heads = config.num_attention_heads\n           self.num_key_value_heads = config.num_key_value_heads\n           self.max_batch_size = config.neuron_config.max_batch_size\n           self.buckets = config.neuron_config.buckets\n\n       def init_model(self, config: InferenceConfig):\n           self.padding_idx = config.pad_token_id\n           self.vocab_size = config.vocab_size\n\n           self.embed_tokens = ParallelEmbedding(\n               config.vocab_size,\n               config.hidden_size,\n               self.padding_idx,\n               dtype=config.neuron_config.torch_dtype,\n               shard_across_embedding=True,\n               # We choose to shard across embedding dimension because this stops XLA from introducing\n               # rank specific constant parameters into the HLO. We could shard across vocab, but that\n               # would require us to use non SPMD parallel_model_trace.\n               pad=True,\n           )\n           self.lm_head = ColumnParallelLinear(\n               config.hidden_size,\n               config.vocab_size,\n               bias=False,\n               pad=True,\n           )\n\n           self.layers = nn.ModuleList(\n               [NeuronLlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)]\n           )\n           self.norm = CustomRMSNorm(config.hidden_size, eps=config.rms_norm_eps)\n\n4. Define an application/task head\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDefine an application/task head. Applications include causal LM,\nclassification, and so on. This class extends a task-specific Neuron\napplication head class (such as NeuronBaseForCausalLM), or the general\nNeuronApplicationHead class.\n\n1. In this class, you provide a value for ``_model_cls``, which is the\n   Neuron model class you defined.\n2. You can also override any other functions as needed for your model,\n   such as ``get_compiler_args(self)`` or\n   ``convert_hf_to_neuron_state_dict(model_state_dict, neuron_config)``.\n\nNote: This example demonstrates a simplified version of\n`NeuronLlamaForCausalLM <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/llama/modeling_llama.py>`__\nfrom the NxD Inference model hub.\n\n\n::\n\n   class NeuronLlamaForCausalLM(NeuronBaseForCausalLM):\n       _model_cls = NeuronLlamaModel\n\n       @classmethod\n       def get_config_cls(cls):\n           return LlamaInferenceConfig\n\nNxD Inference offers :ref:`nxdi_async_mode_feature_guide` as an alternative mode that executes NEFFs in parallel with CPU logic. To evaluate whether your\ntask can utilize ``async_mode``, answer the following questions:\n\n1. Does your task repeatedly execute a model for a single user request? If not, then ``async_mode`` won't offer any benefits.\n    - Example: The auto-regressive loops used in LLMs perform repeated execution of models for a given prompt, which can benefit from async mode.\n2. Does the output of one execution get passed onto the next execution without manipulation? If not, then ``async_mode`` is incompatible.\n    - NOTE: It might be possible to address this by moving some manipulation logic into the NEFF.\n    - Example: For LLMs using on-device sampling, we pass the generated output token directly as input to the next step in the auto-regressive loop. 
Without on-device sampling, the sampling logic relies on logits as output, which is a data-dependent compute pattern that is incompatible with async mode.\n3. Is there sufficient CPU logic that is independent of the previous outputs? If not, then ``async_mode`` likely won't offer major benefits.\n    - Example: In production workloads, these are typically server overheads (scheduling, logging, etc.), but this could also be some pre/post-processing steps in the model execution pipeline.\n\nBased on the answers above, set ``async_mode`` accordingly and, if needed, configure it to work correctly with the application.\n\n5. Convert weights to a supported format\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports weights stored in the model path in the following\nformats:\n\n=========== ======= ============================\nFormat      Sharded File name\n=========== ======= ============================\nSafetensors No      model.safetensors\nSafetensors Yes     model.safetensors.index.json\nPickle      No      pytorch_model.bin\nPickle      Yes     pytorch_model.bin.index.json\n=========== ======= ============================\n\nIf your weights are in another format, you must convert them to one of\nthese formats before you can compile and load the model to Neuron. See\nthe following references for more information about these formats:\n\n- Safetensors:\n\n  - https://github.com/huggingface/safetensors\n  - https://huggingface.co/docs/safetensors/en/convert-weights\n\n- Pickle:\n\n  - https://docs.python.org/3/library/pickle.html\n\n.. _nxdi-onboarding-models-vllm:\n\nIntegrating Onboarded Model with vLLM\n-------------------------------------\n\nAfter completing the model onboarding in NxDI using the steps outlined\nin this guide, you can follow these steps to run that model through vLLM.\n\n1. **Model Architecture**: Ensure your model follows standard NxDI naming\n   conventions (e.g., ``ModelNameForCausalLM``). The model is automatically\n   recognized through NxDI's ``MODEL_TYPES`` registry.\n\n2. **Model Directory**: Use the local directory as ``model_name_or_path``\n   when initializing vLLM. This directory should contain:\n\n   - Model weights (safetensors or pickle format)\n   - ``config.json`` file compatible with your InferenceConfig class\n\n3. **Custom Configuration**: Pass any custom NeuronConfig attributes using\n   the ``override_neuron_config`` parameter when initializing the vLLM engine.\n\n4. **Run Inference**: Execute offline or online inference using vLLM's\n   standard APIs to get your model working with vLLM.\n\n\n.. _nxdi-evaluating-models:\n\nEvaluating Models on Neuron\n---------------------------\n\nNxD Inference provides tools that you can use to\nevaluate the accuracy and performance of the models that you onboard to\nNeuron.\n\n.. _nxdi-logit-matching:\n\nLogit Matching\n~~~~~~~~~~~~~~\n\nThe logit matching evaluation tool verifies that output logits are\nwithin certain tolerances of expected logits. With this evaluation tool,\nNxD Inference runs generation on the Neuron device.\nThen, it compares the output logits against expected logits, which you\ncan provide or generate with the HuggingFace model on CPU.\n\nDuring logit validation, if the output tokens diverge, then this process\nruns generation on Neuron again, using the tokens up to the point where it diverged. This\nprocess is performed repeatedly each time the output diverges, until the\nentire output matches. 
This process uses greedy sampling to choose the\nmost likely token at each index.\n\nOnce all tokens match, this process compares the logits generated on\nNeuron with the expected logits. If all logits are within expected\ntolerances, this accuracy check passes. Divergence difference tolerance\nis used to compare the logits at the token that diverges. Absolute and\nrelative tolerance are used to compare the values of the logits for the\ntop k highest scoring tokens. For best results, use a lower relative\ntolerance for smaller k values, and a higher relative tolerance for\nlarger k values. A top k of ``None`` means to compare logits for all\npossible tokens at each index.\n\nLogit matching uses the following tolerances by default, and you can\ncustomize these tolerances.\n\n- Divergence difference tolerance: ``0.001``\n- Absolute tolerance:\n\n  - Top k = 5: ``1e-5``\n  - Top k = 50: ``1e-5``\n  - Top k = 1000: ``1e-5``\n  - Top k = None: ``1e-5``\n\n- Relative tolerance:\n\n  - Top k = 5: ``0.01``\n  - Top k = 50: ``0.02``\n  - Top k = 1000: ``0.03``\n  - Top k = None: ``0.05``\n\nIf all logits are within expected thresholds, this accuracy check\npasses.\n\n- Note: Logit matching cannot be used with on-device sampling.\n- Note: Generating HuggingFace model outputs on CPU can take a\n  significant amount of time for larger models or large sequence\n  lengths.\n\nExample (``check_accuracy_logits_v2`` API)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   from neuronx_distributed_inference.utils.accuracy import generate_expected_logits, check_accuracy_logits_v2\n\n   # Init Neuron model, test inputs and HuggingFace generation config.\n\n   # Generating HuggingFace model outputs on CPU.\n   expected_logits = generate_expected_logits(\n       neuron_model,\n       inputs.input_ids,\n       inputs.attention_mask,\n       generation_config,\n   )\n   # Alternatively, you can load the expected_logits from disk to save time.\n   # expected_logits = ...\n\n   check_accuracy_logits_v2(\n       neuron_model,\n       expected_logits,\n       inputs.input_ids,\n       inputs.attention_mask,\n       generation_config=generation_config,\n   )\n\nExample (``check_accuracy_logits`` API)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   from neuronx_distributed_inference.utils.accuracy import check_accuracy_logits\n\n   # Init Neuron model, HuggingFace tokenizer, and HuggingFace generation config.\n\n   check_accuracy_logits(\n       model,\n       tokenizer,\n       generation_config,\n   )\n\nToken Matching\n~~~~~~~~~~~~~~\n\nThe token matching evaluation tool verifies that output tokens match\nexpected tokens. With this evaluation tool, NxD Inference\nruns generation on the Neuron device. Then, it compares the\noutput against expected tokens, which you can provide or generate with\nthe HuggingFace model on CPU. If all tokens match, this accuracy check\npasses.\n\n- Warning: Token mismatches are acceptable in many scenarios, especially\n  with large models or large sequence lengths. 
This tool should only be\n  used for small models and small sequence lengths.\n- Note: Generating HuggingFace model outputs on CPU can take a\n  significant amount of time for larger models or large sequence\n  lengths.\n\nExample (``check_accuracy`` API)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   from neuronx_distributed_inference.utils.accuracy import check_accuracy\n\n   # Init Neuron model, HuggingFace tokenizer, and HuggingFace generation config.\n\n   check_accuracy(\n       model,\n       tokenizer,\n       generation_config,\n   )\n\n.. _nxdi-benchmark-sampling:\n\nBenchmarking\n~~~~~~~~~~~~\n\nNxD Inference provides a benchmarking tool that\nevaluates the latency and throughput of a Neuron model and its\nsub-models (context encoding, token generation, etc.).\n\nExample (``benchmark_sampling`` API)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   from neuronx_distributed_inference.utils.benchmark import benchmark_sampling\n\n   # Init Neuron model and HuggingFace generation config.\n\n   benchmark_sampling(model, generation_config)\n\nExample benchmarking result\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   {\n       \"e2e_model\": {\n           \"latency_ms_p50\": 28890.24031162262,\n           \"latency_ms_p90\": 28977.734088897705,\n           \"latency_ms_p95\": 28983.17071199417,\n           \"latency_ms_p99\": 29032.21325159073,\n           \"latency_ms_p100\": 29044.473886489868,\n           \"latency_ms_avg\": 28879.499554634094,\n           \"throughput\": 283.66142510545984\n       },\n       \"context_encoding_model\": {\n           \"latency_ms_p50\": 705.0175666809082,\n           \"latency_ms_p90\": 705.3698301315308,\n           \"latency_ms_p95\": 705.6618571281433,\n           \"latency_ms_p99\": 705.8443236351013,\n           \"latency_ms_p100\": 705.8899402618408,\n           \"latency_ms_avg\": 705.0377488136292,\n           \"throughput\": 5809.618005408024\n       },\n       \"token_generation_model\": {\n           \"latency_ms_p50\": 27.20165252685547,\n           \"latency_ms_p90\": 27.295589447021484,\n           \"latency_ms_p95\": 27.324914932250977,\n           \"latency_ms_p99\": 27.655515670776367,\n           \"latency_ms_p100\": 32.74345397949219,\n           \"latency_ms_avg\": 27.19622969277793,\n           \"throughput\": 147.22298324644066\n       }\n   }\n\nProfiling Models\n~~~~~~~~~~~~~~~~\n\nNeuron provides a profiling tool, ``neuron-profile``, which you can use\nto analyze the performance of a compiled Neuron model. 
For more\ninformation, see :ref:`neuron-profile-ug`.\n\nEvaluating Models with the Inference Demo Script\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference provides an ``inference_demo`` console\nscript, which you can run from the environment where you install\n``neuronx_distributed_inference``.\n\nNote: Before you can use a custom model with the ``inference_demo``, you\nmust add it to the ``MODEL_TYPES`` dictionary in ``inference_demo.py``.\n\nThis script provides the following arguments to configure evaluation\ntools:\n\n- ``--check-accuracy-mode`` - Provide one of the following values:\n\n  - ``token-matching`` - Perform a token matching accuracy check.\n  - ``logit-matching`` - Perform a logit matching accuracy check.\n  - ``skip-accuracy-check`` - Do not perform an accuracy check.\n\n- ``--num-tokens-to-check`` - The number of tokens to check when performing\n  token matching or logit matching.\n- ``--expected-outputs-path`` - The path to a file that contains tokens or\n  logits to compare against for the accuracy check. This file must contain\n  an object saved with ``torch.save()``.\n- ``--benchmark`` - Run benchmarking.\n- ``--on-cpu`` - Run inference on CPU. To simulate tensor parallelism, \n  initialize ``inference_demo.py`` with ``torchrun``.\n\nDebugging Models on Neuron\n--------------------------\n\nWhen you debug models on Neuron, you can enable debug logging to view\ninformation about inputs and outputs of the NeuronBaseForCausalLM\nforward function, which calls the NeuronBaseModel's forward function.\n\n::\n\n   import logging\n\n   logging.getLogger().setLevel(logging.DEBUG)\n\nBecause the forward function of NeuronBaseModel is compiled, you cannot\nuse log/print statements to debug code that is called from this forward\nfunction (or any other compiled code).\n\nDebugging Neuron modeling code on CPU isn't yet supported.\n\nWriting Tests on Neuron\n-----------------------\n\nNxD Inference provides tools to help you write unit and integration tests\nthat validate your model works as expected. For more information, see\n:ref:`nxdi-writing-tests`.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/performance-cli-params.rst",
    "content": ".. _performance-cli-params:\n\nEvaluating Performance of Models on Neuron Using LLMPerf\n==========================================================\n\nThis topic guides you through determining the performance of your models on Trainium and Inferentia instances using  open-source clients.\nIt expands on the basic performance analysis tools provided with Neuron by incorporating the `LLMperf <https://github.com/ray-project/llmperf>`_ client to collect additional information about performance for models such as llama-3.3-70B-instruct and llama-3.1-8b.\n\n\nUnder the hood, this performance suite uses vLLM server to serve the model\nand can use benchmarking clients such as `llm-perf <https://github.com/ray-project/llmperf>`_\nto evaluate on their supported models. \n\nIn the future we will add support for other benchmarking clients. \n\nThe code used in this guide is located at `inference-benchmarking <https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/>`_.\n\nFor a tutorial that you can follow and run on a Trainium or Inferentia instance, see :ref:`/libraries/nxd-inference/tutorials/generating-results-with-performance-cli.ipynb`. \n\n\n\nCreating the Configuration File\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCreate a test_config.yaml file that defines your server settings and\nperformance test configurations and paste in the following code:\n\n.. code:: yaml\n\n   server:\n     name: \"test-model-server\"\n     model_path: \"/path/to/model\"\n     model_s3_path: \"s3://bucket/path/to/model\"\n     max_seq_len: 256\n     context_encoding_len: 128\n     tp_degree: 32\n     server_port: 8000\n     continuous_batch_size: 1\n     custom_chat_template_path: \"default\"\n\n   test:\n     performance:\n       llama_test:\n         client: \"llm_perf\"\n         client_type: \"llm_perf_github_patched\"\n         max_concurrent_requests: 20\n         timeout: 3600\n         input_size: 128\n         output_size: 124\n         client_params:\n           stddev_input_tokens: 0\n           stddev_output_tokens: 1\n       \n\n\nConfiguration Parameters\n------------------------\n\nBelow is a reference for the configuration parameters you can use when configuring the server and tastes for your model performance analysis:\n\nServer Configuration\n~~~~~~~~~~~~~~~~~~~~\n\n===================================== ===================================\nParameter                               Description\n===================================== ===================================\n``name``                              Identifier for your model server\n``model_path``                        Local path to model files\n``model_s3_path``                     S3 location of model files\n``max_seq_len``                       Maximum sequence length\n``context_encoding_len``              Length of context encoding\n``tp_degree``                         Tensor parallelism degree\n``server_port``                       Server port number\n``continuous_batch_size``             Size of continuous batches\n``custom_chat_template_path``         Chat template for the prompt\n===================================== ===================================\n\nif ``model_s3_path`` is specified, the model is downloaded to ``model_path``;\notherwise, the model should already be available at ``model_path``.\n\nPerformance Test Configuration\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n+-----------------------------+---------------------------------------+\n| Parameter                   | Description                  
         |\n+=============================+=======================================+\n| ``client``                  | Performance framework (such as,       |\n|                             | llm-perf)                             |\n+-----------------------------+---------------------------------------+\n| ``client_type``             | List of clients such as               |\n|                             |  llm_perf_github_patched              |\n+-----------------------------+---------------------------------------+\n| ``max_concurrent_requests`` | Maximum parallel requests             |\n+-----------------------------+---------------------------------------+\n| ``timeout``                 | Maximum execution time (seconds)      |\n+-----------------------------+---------------------------------------+\n| ``input_size``              | Input context length                  |\n+-----------------------------+---------------------------------------+\n| ``output_size``             | Output length / MaxNewTokens          |\n+-----------------------------+---------------------------------------+\n| ``client_params``           | Client-specific parameters            |\n+-----------------------------+---------------------------------------+\n\nClient_params\n-------------------\n\nInvolves ``stddev_input_tokens`` and ``stddev_output_tokens``\n\nTo prevent bucket overflow at higher batch sizes, we use the following default:\n\n``outputlength`` = ``orig_output_length - 4* continuous_batch_size``\n\n``stddev_output_tokens`` = ``batch_size``\n\n\nRunning Evaluations\n-------------------\n\nExecute performance tests using the CLI command:\n\n.. code:: bash\n\n   python performance.py --config perf.yaml\n\n\n\nFor more detailed information and advanced configurations, please refer\nto: - `llm-perf\nDocumentation <https://github.com/ray-project/llmperf>`__ -\n\n\nThese resources provide comprehensive guides on client-specific\nparameters and advanced evaluation scenarios.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/vllm-user-guide-v1.rst",
    "content": ".. _nxdi-vllm-user-guide-v1:\n.. _nxdi-vllm-user-guide:\n\nvLLM User Guide for NxD Inference\n============================================\n\n`vLLM <https://docs.vllm.ai/en/latest/>`_ is a popular library for LLM inference and serving utilizing advanced inference features such as continuous batching.\nThis guide describes how to utilize AWS Inferentia and AWS Trainium AI accelerators in vLLM by using NxD Inference (``neuronx-distributed-inference``).\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nNxD Inference integrates with vLLM by using `vLLM's Plugin System <https://docs.vllm.ai/en/latest/design/plugin_system.html>`_ to extend the model execution components responsible for loading and invoking models within vLLM's LLMEngine (see `vLLM architecture <https://docs.vllm.ai/en/latest/design/arch_overview.html#llm-engine>`_ \nfor more details). This means input processing, scheduling and output \nprocessing follow the default vLLM behavior.\n\nVersioning\n^^^^^^^^^^\n\nPlugin Version: ``0.5.0``\n\nNeuron SDK Version: ``2.29.0``\n\nvLLM Version: ``0.16.0``\n\nPyTorch Version: ``2.9.1``\n\n\nSupported Models\n----------------\n\nThe following models are supported on vLLM with NxD Inference:\n\n- Llama 2/3.1/3.3\n- Llama 4 Scout, Maverick\n- Qwen 2.5\n- Qwen 3\n- Qwen2-VL\n- Qwen3-VL\n- Pixtral\n\nIf you are adding your own model to NxD Inference, see :ref:`Integrating Onboarded Model with vLLM<nxdi-onboarding-models-vllm>`.\n\n  \nSetup\n-----\n\nPrerequisite: Launch an instance and install drivers and tools\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBefore installing vLLM with the instructions below, you must launch an Inferentia or Trainium instance and install the necessary\nNeuron SDK dependency libraries. We recommend using a Neuron Deep Learning Container (DLC) for the best compatibility. \nRefer to :ref:`these setup instructions<nxdi-setup>` for information on using Neuron DLCs.\n\n\n**Prerequisites:**\n\n- Latest AWS Neuron SDK (`Neuron SDK 2.29.0 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/2.29.0.html>`_)\n- Python 3.10+ (compatible with vLLM requirements)\n- Supported AWS instances: Inf2, Trn1/Trn1n, Trn2\n\n\nQuickstart using Docker\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nYou can use a Deep Learning Container (DLC) which bundles the SDK and dependencies.\nRefer to the `pytorch-inference-neuronx container <https://github.com/aws-neuron/deep-learning-containers?tab=readme-ov-file#pytorch-inference-neuronx>`_\non `https://github.com/aws-neuron/deep-learning-containers <https://github.com/aws-neuron/deep-learning-containers>`_ to get started.\n\nFor a complete step-by-step tutorial, see :ref:`Option B in the DLC quickstart <quickstart_vllm_dlc_option_b>`. After entering the container, proceed to `Manually install from source`_ below to install the vLLM Neuron plugin.\n\nManually install from source\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nInstall the plugin from GitHub sources using the following commands. The plugin will automatically install the correct version of vLLM along with other required dependencies.\nThis version of the plugin is intended to work with the Neuron SDK 2.29.0, PyTorch 2.9, and vLLM 0.16.0.\n\n.. 
code-block:: bash\n\n    git clone --branch \"0.5.0\" https://github.com/vllm-project/vllm-neuron.git\n    cd vllm-neuron\n    pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com -e .\n\n\nUsage\n-----\n\nQuickstart\n^^^^^^^^^^^^\n\nHere is a very basic example to get started:\n\n.. code-block:: python\n\n  from vllm import LLM, SamplingParams\n\n  if __name__ == '__main__':\n      # Initialize the model\n      llm = LLM(\n          model=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n          max_num_seqs=4,\n          max_model_len=128,\n          tensor_parallel_size=2,\n          block_size=32,\n          num_gpu_blocks_override=16\n      )\n\n      # Generate text\n      prompts = [\n          \"Hello, my name is\",\n          \"The president of the United States is\",\n          \"The capital of France is\",\n      ]\n      sampling_params = SamplingParams(temperature=0.0)\n      outputs = llm.generate(prompts, sampling_params)\n\n      for output in outputs:\n          print(f\"Prompt: {output.prompt}\")\n          print(f\"Generated: {output.outputs[0].text}\")\n\nFeature Support\n------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 10 60\n\n   * - Feature\n     - Status\n     - Notes\n   * - Continuous batching\n     - 🟢\n     -\n   * - Prefix Caching\n     - 🟢\n     - \n   * - Multi-LORA\n     - 🟢\n     - \n   * - Speculative Decoding\n     - 🟢\n     - Eagle V1 and V3 are supported\n   * - Quantization\n     - 🟢\n     - INT8/FP8 quantization support\n   * - Dynamic sampling\t\n     - 🟢\n     -\n   * - Tool calling\n     - 🟢\n     -\n   * - CPU Sampling\n     - 🟢\n     -\n   * - Multimodal\n     - 🟢\n     - Llama4 and Pixtral are supported\n\n- 🟢 Functional: Fully operational, with ongoing optimizations.\n- 🚧 WIP: Under active development.\n\nFeature Configuration\n----------------------\n\nNxD Inference models provide many configuration options. When using NxD Inference through vLLM,\nyou configure the model with a default configuration that sets the required fields from vLLM settings.\n\n.. code:: ipython3\n\n    neuron_config = dict(\n        tp_degree=parallel_config.tensor_parallel_size,\n        ctx_batch_size=1,\n        batch_size=scheduler_config.max_num_seqs,\n        max_context_length=scheduler_config.max_model_len,\n        seq_len=scheduler_config.max_model_len,\n        enable_bucketing=True,\n        is_continuous_batching=True,\n        quantized=False,\n        torch_dtype=TORCH_DTYPE_TO_NEURON_AMP[model_config.dtype],\n        padding_side=\"right\"\n    )\n\n\nUse the ``additional_config`` field to provide an ``override_neuron_config`` dictionary that specifies your desired NxD Inference configuration settings. You provide the settings you want to override as a dictionary (or JSON object when starting vLLM from the CLI) containing basic types. For example, to enable prefix caching:\n\n.. code:: ipython3\n    \n    additional_config=dict(\n        override_neuron_config=dict(\n            is_prefix_caching=True,\n            is_block_kv_layout=True,\n            pa_num_blocks=4096,\n            pa_block_size=32,\n        )\n    )\n\nor when launching vLLM from the CLI\n\n.. code:: bash\n\n    --additional-config '{\n        \"override-neuron-config\": {\n            \"is_prefix_caching\": true,\n            \"is_block_kv_layout\": true,\n            \"pa_num_blocks\": 4096,\n            \"pa_block_size\": 32\n        }\n    }'\n\nHere's a list of arguments that can be set to enable specific features from NxD Inference.\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 45 10 45\n\n   * - Neuronx Distributed Inference Feature\n     - Argument\n     - Description\n   * - :ref:`Sequence Parallelism <nxdi-feature-guide-sequence-parallelism>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"sequence_parallel_enabled\": true\n                }\n            }'\n\n     - Sequence parallelism splits tensors across the sequence dimension\n   * - :ref:`QKV Weight Fusion <qkv-weight-fusion>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"fused_qkv\": true\n                }\n        }'\n     - QKV weight fusion concatenates a model’s query, key and value weight matrices\n   * - :ref:`Bucketing <nxdi-bucketing>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"enable_bucketing\": true,\n                \"context_encoding_buckets\": [512, 1024],\n                \"token_generation_buckets\": [1536, 2048]\n                }\n            }'\n     - Bucketing helps LLMs work optimally with different shapes. Setting only ``enable_bucketing=True`` enables automatic bucketing, which creates context encoding and token generation buckets with powers of two between 128 and ``max-model-len``. Set ``context_encoding_buckets`` and ``token_generation_buckets`` to explicit values if your use case needs to be optimized for specific sequence lengths.\n   * - :ref:`Prefix Caching<nxdi_prefix_caching>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"is_prefix_caching\": true,\n                \"is_block_kv_layout\": true,\n                \"pa_num_blocks\": 4096,\n                \"pa_block_size\": 32\n                }\n            }'\n     - ``is_prefix_caching`` and ``is_block_kv_layout`` enable prefix caching and block KV cache layout, respectively. Both arguments need to be enabled for automatic prefix caching. For optimal performance with Neuron, it’s recommended to set ``pa_block_size`` to 32 or 16. Also set ``num_gpu_blocks_override`` to the same value as ``pa_num_blocks``.\n   * - :ref:`Asynchronous Runtime Support<nxdi_async_mode_feature_guide>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"async_mode\": true\n                }\n            }'\n     - Parallelizes CPU logic with Neuron device logic, eliminating CPU overheads\n   * - :ref:`Bucketing with Prefix Caching<bucketing-with-prefix-caching>`\n     - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"is_prefix_caching\": true,\n                \"is_block_kv_layout\": true,\n                \"pa_num_blocks\": 4096,\n                \"pa_block_size\": 32,\n                \"context_encoding_buckets\": [512, 1024, 2048],\n                \"prefix_buckets\": [512, 1024],\n                \"token_generation_buckets\": [2048]\n                }\n            }'\n     - Bucketing is enabled by default, and Neuron automatically determines optimal bucket sizes. 
However, if needed, you can specify custom bucket sizes by defining the ``context_encoding_buckets`` and ``prefix_buckets`` parameters in ``override-neuron-config``.\n\nFor more information on NxD Inference features, see :ref:`NxD Inference Features Configuration Guide<nxdi-feature-guide>`\nand :ref:`NxD Inference API Reference<nxd-inference-api-guide>`.\n\nEnabling Kernels\n^^^^^^^^^^^^^^^^\nKernels can be enabled in NxD Inference by passing certain arguments through ``--additional-config`` via the ``override-neuron-config`` field.\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n\n   * - Argument\n     - Description\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_kernel_enabled\": true\n                }\n            }'\n\n     - Prefill attention kernel\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_block_tkg_nki_kernel_enabled\": true\n                }\n        }'\n     - Token generation attention kernel with block KV layout. Improves performance when prefix caching is enabled.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_tkg_nki_kernel_enabled\": true\n                }\n        }'\n     - Token generation attention kernel without block KV layout.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_block_tkg_nki_kernel_cascaded_attention\": true\n                }\n        }'\n     - Token generation attention kernel with cascaded attention. Performance is better at longer sequence lengths and higher batch sizes. Needs to be used with ``attn_block_tkg_nki_kernel_enabled``.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_block_tkg_nki_kernel_cache_update\": true\n                }\n        }'\n     - Enables cache update inside the attention kernel.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"attn_block_cte_nki_kernel_enabled\": true\n                }\n        }'\n     - Prefill attention kernel with block KV for prefix caching support\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"mlp_kernel_enabled\": true\n                }\n        }'\n     - Prefill MLP kernel\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"quantized_mlp_kernel_enabled\": true\n                }\n        }'\n     - Prefill MLP kernel with fp8 (static / dynamic quantization)\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"mlp_tkg_nki_kernel_enabled\": true\n                }\n        }'\n     - Token generation MLP kernel. Should be used with ``mlp_kernel_enabled``.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"mlp_kernel_fuse_residual_add\": true\n                }\n        }'\n     - Fuses the residual add into the MLP kernel. This kernel cannot be used when ``sequence_parallel_enabled`` is used.\n   * - .. 
code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"qkv_kernel_enabled\": true\n                }\n        }'\n     - QKV projection prefill kernel\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"qkv_nki_kernel_enabled\": true\n                }\n        }'\n     - QKV projection prefill kernel (new NKI)\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"qkv_cte_nki_kernel_fuse_rope\": true\n                }\n        }'\n     - QKV projection prefill with fused RoPE\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"qkv_kernel_fuse_residual_add\": true\n                }\n        }'\n     - Fuses residual add into the QKV kernel. Can be used with ``qkv_kernel_enabled`` and cannot be used with ``sequence_parallel_enabled``.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"rmsnorm_quantize_kernel_enabled\": true\n                }\n        }'\n     - Used in combination with the quantized MLP kernel for prefill. Moves quantization from the MLP kernel to RMSNorm, followed by collectives in fp8 and the quantized MLP. Use for better performance.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"strided_context_parallel_kernel_enabled\": true\n                }\n        }'\n     - Context parallel attention CTE kernel with striding for load balancing. To be used with context parallelism.\n   * - .. code-block::\n\n        --additional-config '{\n            \"override-neuron-config\": {\n                \"k_cache_transposed\": true\n                }\n        }'\n     - Decode optimization (transposed K cache layout)\n\n\nScheduling and K/V Cache\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNxD Inference uses a contiguous memory layout for the K/V cache instead of PagedAttention support.\nIt integrates into vLLM's block manager by setting the block size to the maximum length supported by the model\nand allocating one block per maximum number of sequences configured. However, the vLLM scheduler currently does\nnot introspect the blocks associated to each sequence when (re-)scheduling running sequences. The scheduler requires an additional\nfree block regardless of space available in the current block, resulting in preemption. This would lead to a large increase\nin latency for the preempted sequence because it would be rescheduled in the context encoding phase. Since NxD Inference's implementation ensures each block\nis big enough to fit the maximum model length, preemption is never needed in our current integration.\nAs a result, AWS Neuron disabled the preemption checks done by the scheduler in our fork. This significantly improves\nE2E performance of the Neuron integration.\n\n.. _nxdi-on-device-sampling:\n\nDecoding\n^^^^^^^^^^\n\nOn-device sampling is enabled by default, which performs sampling logic on the Neuron devices\nrather than passing the generated logits back to the CPU and sampling through vLLM. This allows you to\nuse Neuron hardware to accelerate sampling and reduce the amount of data transferred between devices,\nleading to improved latency.\n\nHowever, on-device sampling comes with some limitations. Currently, we only support the following\nsampling parameters: ``temperature``, ``top_k``, and ``top_p``. 
\nOther `sampling parameters <https://docs.vllm.ai/en/latest/dev/sampling_params.html>`_ are currently\nnot supported through on-device sampling.\n\nWhen on-device sampling is enabled, we handle the following special cases:\n\n* When ``top_k`` is set to -1, we limit ``top_k`` to 256 instead.\n* When ``temperature`` is set to 0, we use greedy decoding to remain compatible with existing conventions. This is the same as setting ``top_k`` to 1.\n\nBy default, on-device sampling utilizes a greedy decoding strategy to select tokens with the highest probabilities. \nYou can enable a different on-device sampling strategy by passing a ``on_device_sampling_config``\nusing the override neuron config feature (see :ref:`Model Configuration<nxdi-vllm-model-configuration>`). It is strongly recommended to make use\nof the ``global_top_k`` configuration limiting the maximum value of ``top_k`` a user can request for improved performance.\n\nQuantization\n^^^^^^^^^^^^^^\n\nNxD Inference supports quantization but has not yet been integrated with vLLM's configuration for quantization.\nIf you want to use quantization, **do not** set vLLM's ``--quantization`` setting to ``neuron_quant``. \nKeep it unset and use the Neuron configuration of the model to configure quantization of the NxD Inference model directly.\nFor more information on how to configure and use quantization with NxD Inference incl. requirements on checkpoints,\nrefer to :ref:`Quantization<nxdi-quantization>` in the NxD Inference Feature Guide.\n\n.. _nxdi-vllm-v1-serialization:\n\nLoading pre-compiled models / Serialization Support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTracing and compiling the model can take a non-trivial amount of time depending on model size e.g. \na small-ish model of 15GB might take around 15min to compile. Exact times depend on multiple factors.\nDoing this on each server start would lead to unacceptable application startup times. \nTherefore, we support storing and loading the traced and compiled models.\n\nBoth are controlled through the ``NEURON_COMPILED_ARTIFACTS`` variable. When pointed to a path that contains a pre-compiled model,\nwe load the pre-compiled model directly, and any differing model configurations passed in to the vllm API will not trigger re-compilation. \nIf loading from the ``NEURON_COMPILED_ARTIFACTS`` path fails, then we will recompile the model with the provided configurations and store \nthe results in the provided location. If ``NEURON_COMPILED_ARTIFACTS`` is not set, we will compile the model and store it under a ``neuron-compiled-artifacts``\nsubdirectory in the directory of your model checkpoint.\n\nPrefix Caching\n^^^^^^^^^^^^^^^^\n\nStarting in Neuron SDK 2.24, prefix caching is supported on the AWS Neuron fork of vLLM. Prefix caching allows developers to improve TTFT by \nre-using the KV Cache of the common shared prompts across inference requests. See :ref:`Prefix Caching <nxdi_prefix_caching>` for more information on how to \nenable prefix caching with vLLM. \n\n\nExamples\n--------\n\nFor more in depth NxD Inference tutorials that include vLLM deployment steps, refer to :ref:`Tutorials <nxdi-tutorials-index>`.\n\nThe following examples use `TinyLlama/TinyLlama-1.1B-Chat-v1.0 <https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0>`_\n\nIf you have access to the model checkpoint locally, replace ``TinyLlama/TinyLlama-1.1B-Chat-v1.0`` with the path to your local copy. 
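\n\nIf you want to apply any of the ``override_neuron_config`` settings described in the Feature Configuration section to these examples, you can pass them through the ``additional_config`` field when constructing the ``LLM`` object. The following is a minimal sketch based on the Quickstart settings; the ``enable_bucketing`` flag shown is only an illustration, substitute the settings your model needs:\n\n.. code-block:: python\n\n    from vllm import LLM\n\n    # Same settings as the Quickstart example, plus an illustrative Neuron override.\n    llm = LLM(\n        model=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n        max_num_seqs=4,\n        max_model_len=128,\n        tensor_parallel_size=2,\n        block_size=32,\n        num_gpu_blocks_override=16,\n        additional_config=dict(\n            override_neuron_config=dict(\n                enable_bucketing=True,\n            )\n        ),\n    )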
\n\nIf you use a different instance type, you need to adjust the ``tensor_parallel_size`` according to the number of Neuron Cores \navailable on your instance type. (For more information see: :doc:`Tensor-parallelism support </libraries/nxd-inference/app-notes/parallelism>`.)\n\nOffline Inference Example\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor offline inference, refer to the code example in the Quickstart section above.\n\nOnline Inference Example\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can start an OpenAI API compatible server with the same settings as the offline example by running\nthe following command:\n\n.. code:: bash\n\n    vllm serve \\\n        --model \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\" \\\n        --tensor-parallel-size 2 \\\n        --max-model-len 128 \\\n        --max-num-seqs 4 \\\n        --block-size 32 \\\n        --num-gpu-blocks-override 16 \\\n        --port 8000\n\nIn addition to the sampling parameters supported by OpenAI, we also support ``top_k``.\nYou can change the sampling parameters and enable or disable streaming.\n\n.. code:: python\n\n    from openai import OpenAI\n\n    # Client Setup\n    openai_api_key = \"EMPTY\"\n    openai_api_base = \"http://localhost:8000/v1\"\n\n    client = OpenAI(\n        api_key=openai_api_key,\n        base_url=openai_api_base,\n    )\n\n    models = client.models.list()\n    model_name = models.data[0].id\n\n    # Sampling Parameters\n    max_tokens = 64\n    temperature = 1.0\n    top_p = 1.0\n    top_k = 50\n    stream = False\n\n    # Chat Completion Request\n    prompt = \"Hello, my name is Llama \"\n    response = client.chat.completions.create(\n        model=model_name,\n        messages=[{\"role\": \"user\", \"content\": prompt}],\n        max_tokens=int(max_tokens),\n        temperature=float(temperature),\n        top_p=float(top_p),\n        stream=stream,\n        extra_body={'top_k': top_k}\n    )\n\n    # Parse the response\n    generated_text = \"\"\n    if stream:\n        for chunk in response:\n            if chunk.choices[0].delta.content is not None:\n                generated_text += chunk.choices[0].delta.content\n    else:\n        generated_text = response.choices[0].message.content\n        \n    print(generated_text)\n\n\nKnown Issues\n---------------\n\n1. Chunked prefill is not supported on Neuron.\n2. You must provide ``num_gpu_blocks_override`` to avoid out-of-bounds (OOB) errors. This override ensures vLLM's scheduler uses the same block count that you compiled into the model Currently NxDI does not support using different kv cache sizes at compile vs. runtime.\n\n   - With either chunked prefill or prefix caching: NxDI will internally use blockwise kv cache layout. Set ``num_gpu_blocks_override`` to at least ``ceil(max_model_len / block_size) * max_num_seqs``\n   - With neither chunked prefill nor prefix caching: NxDI will internally use contiguous kv cache layout, and overwrite ``block_size`` to ``max_model_len``. Set ``num_gpu_blocks_override`` to exactly ``max_num_seqs``\n\n3. When using HuggingFace model IDs with `shard on load <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/weights-sharding-guide.html#shard-on-load>`_ enabled, models with ``tie_word_embeddings=true`` in their config.json (including Qwen3-8B, Qwen2.5-7B, and other Qwen family models) will encounter the error ``NotImplementedError: Cannot copy out of meta tensor; no data!``. 
To resolve this, download the model checkpoint locally from Hugging Face and serve it from the local path instead of using the HuggingFace model ID.\n4. Async tokenization in vLLM V1 may result in increased time to first token (TTFT) compared to V0 for small inputs and low batch sizes, as the orchestration overhead can outweigh the efficiency gains from async processing.\n5. The following features are only supported on the legacy Neuron fork of vLLM v0 architecture that is no longer supported: disaggregated inference, mllama, and speculative decoding with a draft model. The fork can be found at https://github.com/aws-neuron/upstreaming-to-vllm/releases/tag/2.26.1. \n\nSupport\n----------\n\n- **Documentation**: `AWS Neuron Documentation <https://awsdocs-neuron.readthedocs-hosted.com/>`_\n- **Issues**: `GitHub Issues <https://github.com/vllm-project/vllm-neuron/issues>`_\n- **Community**: `AWS Neuron Forum <https://repost.aws/tags/TAjy-krivRTDqDPWNNBmV9lA>`_"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/vllm-user-guide.rst",
    "content": ".. _nxdi-vllm-user-guide-v0:\n\nvLLM V0 User Guide for NxD Inference (Legacy)\n==============================================\n\n`vLLM <https://docs.vllm.ai/en/latest/>`_ is a popular library for LLM inference and serving utilizing advanced inference features such as continuous batching.\nThis guide describes how to utilize AWS Inferentia and AWS Trainium AI accelerators in vLLM by using NxD Inference (``neuronx-distributed-inference``).\n\n.. important::\n   This guide is compatible with vLLM v0.x versions. Since vLLM has deprecated v0.x versions (see `vLLM issue #18571 <https://github.com/vllm-project/vllm/issues/18571>`_), Neuron recommends using vLLM v1.x with the vLLM-Neuron Plugin for new deployments. See :ref:`vLLM User Guide  V1 <nxdi-vllm-user-guide>` for the updated guide.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nNxD Inference integrates into vLLM by extending the model execution components responsible\nfor loading and invoking models used in vLLM’s LLMEngine (see https://docs.vllm.ai/en/latest/design/arch_overview.html#llm-engine \nfor more details on vLLM architecture). This means input processing, scheduling and output \nprocessing follow the default vLLM behavior. \n\nCurrently, we support continuous batching and streaming generation in the NxD Inference vLLM integration.\nWe are working with the vLLM community to enable support for other vLLM features like PagedAttention\nand Chunked Prefill on Neuron instances through NxD Inference in upcoming releases.\n\n\nSupported Models\n----------------\n\nRefer to :ref:`Supported Model Architectures<nxdi-supported-model-architectures>` for a list of models supported in vLLM through NxD Inference.\n\nIf you are adding your own model to NxD Inference, please see :ref:`Integrating Onboarded Model with vLLM<nxdi-onboarding-models-vllm>`\nfor instructions on how to setup vLLM integration for it.\n\n.. warning::\n  NeuronX distributed inference does not support the following combination of features in vLLM:\n\n  - vLLM with model ID\n  - Shard on load\n  - Tied weight embeddings\n \n  If this combination is configured, you will likely see this error: ``NotImplementedError: Cannot copy out of meta tensor; no data!``\n \n  To workaround this limitation, download a model checkpoint from Hugging Face (such as `Qwen3-8B <https://huggingface.co/Qwen/Qwen3-8B>`_) and serve it.\n  \nSetup\n-----\nBefore installing vLLM with the instructions below, you need to install the Neuron SDK.\n\nPrerequisite: Launch an instance and install drivers and tools\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBefore installing vLLM with the instructions below, you will first need to launch an Inferentia or Trainium instance and install the necessary\nNeuron drivers and tools. Refer to :ref:`these setup instructions<nxdi-setup>` for different ways to prepare your environment, including using\nNeuron DLAMIs and Neuron DLCs for quick setups.\n\nInstalling the AWS Neuron fork of vLLM \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe maintain a fork of vLLM that supports the latest features for NxD Inference. 
\n\nQuickstart using Docker\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nUsers can now use a preconfigured Deep Learning Container (DLC) with the AWS Neuron fork of vLLM pre-installed.\nRefer to the `vllm-inference-neuronx container <https://github.com/aws-neuron/deep-learning-containers?tab=readme-ov-file#vllm-inference-neuronx>`_\non `https://github.com/aws-neuron/deep-learning-containers <https://github.com/aws-neuron/deep-learning-containers>`_ to get started.\n\nFor a complete step-by-step tutorial on deploying the vLLM Neuron DLC, see :ref:`quickstart_vllm_dlc_deploy`.\n\nManually install from source\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nTo manually install the AWS fork from source, use the following commands:\n\n.. code::\n\n    git clone -b 2.26.1 https://github.com/aws-neuron/upstreaming-to-vllm.git\n    cd upstreaming-to-vllm\n    pip install -r requirements/neuron.txt\n    VLLM_TARGET_DEVICE=\"neuron\" pip install -e .\n\n\nInstalling vLLM from vLLM main repository\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA prior version of Neuron SDK 2.23 NxD Inference support was upstreamed onto vLLM v0.9.0. \nAdditional details can be found in vLLM docs `here <https://docs.vllm.ai/en/stable/getting_started/installation/ai_accelerator.html#aws-neuron>`_.\n\nTo install the official vLLM repository with Neuron support, use the following commands. Only Neuron SDK 2.23 and prior features are \ncurrently available in the official vLLM repository. See Neuron SDK 2.23 artifacts :ref:`here<neuron-2.23.0-artifacts>`. It is recommended \nto re-install neuronx-distributed and neuronx-distributed-inference libraries after installing vLLM to avoid dependency version incompatibilities.\n\n.. code::\n\n    git clone -b releases/v0.9.0 https://github.com/vllm-project/vllm.git\n    cd vllm\n    pip install -U -r requirements/neuron.txt\n    VLLM_TARGET_DEVICE=\"neuron\" pip install -e .\n\n    pip install neuronx-distributed==0.12.12111\n    pip install neuronx-distributed-inference==0.3.5591\n\n\nUsage\n-----\n\nNeuron Framework Selection\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. note::\n\n    The Neuron integration for vLLM supports both Transformers NeuronX and NxD Inference libraries. Set the ``VLLM_NEURON_FRAMEWORK`` \n    environment variable to ``neuronx-distributed-inference`` to use the NxD Inference library. Set the  ``VLLM_NEURON_FRAMEWORK`` \n    environment variable to ``transformers-neuronx`` to use the Transformers NeuronX library. Make sure you have the corresponding library\n    installed before running vLLM. If you have both libraries installed, and the ``VLLM_NEURON_FRAMEWORK`` environment variable is not set,\n    the NxD Inference library will be used by default.\n\nIf you are migrating from Transformers NeuronX to NxD Inference, you can refer to this :ref:`Migration Guide<nxdi_migrate_from_tnx>` for\nadditional support.\n\nQuickstart\n^^^^^^^^^^\n\nHere is a quick and minimal example to get running.\n\n.. 
code::\n\n    import os\n    os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n\n    from vllm import LLM, SamplingParams\n    llm = LLM(\n        model=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n        max_num_seqs=8,\n        max_model_len=128,\n        device=\"neuron\",\n        tensor_parallel_size=2)\n\n    prompts = [\n        \"Hello, my name is\",\n        \"The president of the United States is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n    # note that top_k must be set to lower than the global_top_k defined in\n    # the neuronx_distributed_inference.models.config.OnDeviceSamplingConfig\n    sampling_params = SamplingParams(top_k=10, temperature=0.8, top_p=0.95)\n\n    outputs = llm.generate(prompts, sampling_params)\n\n    for output in outputs:\n        prompt = output.prompt\n        generated_text = output.outputs[0].text\n        print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\n\n.. _nxdi-vllm-model-configuration:\n\nModel Configuration\n^^^^^^^^^^^^^^^^^^^\n\nNxD Inference models provide many configuration options. When using NxD Inference through vLLM,\nwe configure the model with a default configuration that sets the required fields from vLLM settings.\nIt is recommended that you do not override these configuration settings unless you need it.\n\n.. code:: ipython3\n\n    neuron_config = dict(\n        tp_degree=parallel_config.tensor_parallel_size,\n        ctx_batch_size=1,\n        batch_size=scheduler_config.max_num_seqs,\n        max_context_length=scheduler_config.max_model_len,\n        seq_len=scheduler_config.max_model_len,\n        enable_bucketing=True,\n        is_continuous_batching=True,\n        quantized=False,\n        torch_dtype=TORCH_DTYPE_TO_NEURON_AMP[model_config.dtype],\n        padding_side=\"right\"\n    )\n\n\nIf you want to add or change any settings, you can use vLLM's ``override_neuron_config`` setting. \nYou provide the settings you want to override as dictionary (or JSON object when starting vLLM from the CLI)\ncontaining basic types e.g. to disable auto bucketing (for illustration), use \n\n.. code:: ipython3\n    \n    override_neuron_config={\n        \"enable_bucketing\":False,\n    }\n\nor when launching vLLM from the CLI\n\n.. code::\n\n    --override-neuron-config \"{\\\"enable_bucketing\\\":false}\"\n\n\nFor more information on NxD Inference features, see :ref:`NxD Inference Features Configuration Guide<nxdi-feature-guide>`\nand :ref:`NxD Inference API Reference<nxd-inference-api-guide>`.\n\nScheduling and K/V Cache\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe currently use a contiguous memory layout for the K/V cache instead of PagedAttention support in NxD Inference.\nWe integrated into vLLMs block manager by setting the block size to the maximum length supported by the model\nand allocating one block per maximum number of sequences configured. However, the vLLM scheduler currently does\nnot introspect the blocks associated to each sequence when (re-)scheduling running sequences. It requires an additional\nfree block regardless of space available in the current block resulting in preemption. This would lead to a large increase \nin latency for the preempted sequence because it would be rescheduled in the context encoding phase. Since we ensure each block\nis big enough to fit the maximum model length, preemption is never needed in our current integration. \nTherefore, we disabled the preemption checks done by the scheduler in our fork. 
This significantly improves\nE2E performance of the Neuron integration.\n\nDecoding\n^^^^^^^^\n\n:ref:`On-device sampling<nxdi-on-device-sampling>` is enabled by default, which performs sampling logic on the Neuron devices \nrather than passing the generated logits back to the CPU and sampling through vLLM. This allows us to\nuse Neuron hardware to accelerate sampling and reduce the amount of data transferred between devices, \nleading to improved latency.\n\nHowever, on-device sampling comes with some limitations. Currently, we only support the following\nsampling parameters: ``temperature``, ``top_k``, and ``top_p``. \nOther sampling parameters (https://docs.vllm.ai/en/latest/dev/sampling_params.html) are currently\nnot supported through on-device sampling.\n\nWhen on-device sampling is enabled, we handle the following special cases:\n\n* When ``top_k`` is set to -1, we limit ``top_k`` to 256 instead.\n* When ``temperature`` is set to 0, we use greedy decoding to remain compatible with existing conventions. This is the same as setting ``top_k`` to 1.\n\nBy default, on-device sampling utilizes a greedy decoding strategy to select tokens with the highest probabilities. \nYou can enable a different on-device sampling strategy by passing an ``on_device_sampling_config``\nusing the override neuron config feature (see :ref:`Model Configuration<nxdi-vllm-model-configuration>`). For improved performance, it is strongly recommended to use\nthe ``global_top_k`` configuration, which limits the maximum value of ``top_k`` a user can request.\n\nQuantization\n^^^^^^^^^^^^\n\nNxD Inference supports quantization but has not yet been integrated with vLLM's configuration for quantization.\nIf you want to use quantization, **do not** set vLLM's ``--quantization`` setting to ``neuron_quant``. \nKeep it unset and use the Neuron configuration of the model to configure quantization of the NxD Inference model directly.\nFor more information on how to configure and use quantization with NxD Inference, including requirements on checkpoints,\nrefer to :ref:`Quantization<nxdi-quantization>` in the NxD Inference Feature Guide.\n\nLoading pre-compiled models / Serialization Support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTracing and compiling the model can take a non-trivial amount of time depending on model size; for example, \na relatively small 15 GB model might take around 15 minutes to compile. Exact times depend on multiple factors.\nDoing this on each server start would lead to unacceptable application startup times. \nTherefore, we support storing and loading the traced and compiled models.\n\nBoth storing and loading are controlled through the ``NEURON_COMPILED_ARTIFACTS`` variable. When pointed to a path that contains a pre-compiled model,\nwe load the pre-compiled model directly, and any differing model configurations passed to the vLLM API will not trigger re-compilation. \nIf loading from the ``NEURON_COMPILED_ARTIFACTS`` path fails, then we will recompile the model with the provided configurations and store \nthe results in the provided location. If ``NEURON_COMPILED_ARTIFACTS`` is not set, we will compile the model and store it under a ``neuron-compiled-artifacts``\nsubdirectory in the directory of your model checkpoint.\n\nPrefix Caching\n^^^^^^^^^^^^^^\nStarting in Neuron SDK 2.24, prefix caching is supported on the AWS Neuron fork of vLLM. Prefix caching allows developers to improve TTFT by \nre-using the KV Cache of common shared prompts across inference requests. 
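\n\nAs an illustration, the following is a minimal sketch of enabling prefix caching for offline inference. It assumes the standard vLLM ``enable_prefix_caching`` engine argument and a hypothetical shared system prompt; see the guide linked below and the automatic prefix caching example script in the Examples section for the authoritative usage.\n\n.. code:: ipython3\n\n    import os\n    os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n\n    from vllm import LLM, SamplingParams\n\n    # Sketch: enable automatic prefix caching so that requests sharing a prompt\n    # prefix can re-use the KV cache computed for that prefix.\n    llm = LLM(\n        model=\"TinyLlama/TinyLlama-1.1B-Chat-v1.0\",\n        max_num_seqs=8,\n        max_model_len=128,\n        enable_prefix_caching=True,\n        device=\"neuron\",\n        tensor_parallel_size=2)\n\n    # Hypothetical shared prefix followed by request-specific suffixes.\n    shared_prefix = \"You are a concise assistant. Answer in one sentence. \"\n    prompts = [\n        shared_prefix + \"What is the capital of France?\",\n        shared_prefix + \"What is the capital of Japan?\",\n    ]\n    outputs = llm.generate(prompts, SamplingParams(top_k=1))\n\n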
See :ref:`Prefix Caching<nxdi_prefix_caching>` for more information on how to \nenable prefix caching with vLLM. \n\n\nDisaggregated Inference\n^^^^^^^^^^^^^^^^^^^^^^^\nStarting in Neuron SDK 2.24, disaggregated inference is supported on the AWS Neuron fork of vLLM. This feature allows different hardware\nresources to separately perform the compute intensive prefill phase and the memory bandwidth intensive decode phase of inference, thereby \nremoving the prefill-decode interference and improving Goodput. See :ref:`Disaggregated Inference<nxdi-disaggregated-inference>` for more information on \nhow to use disaggregated inference with vLLM. \n\n\nExamples\n--------\n\nFor a list of examples for using vLLM with Neuron, refer to `upstreaming-to-vllm/examples\n/offline_inference/ <https://github.com/aws-neuron/upstreaming-to-vllm/tree/neuron-2.26/examples/offline_inference>`_ folder. Look for example scripts with the ``neuron_`` prefix. \nWe provide examples for use cases such as `automatic prefix caching <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_prefix_caching.py>`_,\n`disaggregated inference <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_di.py>`_, \n`speculative decoding with a draft model <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_speculation.py>`_,\n`speculative decoding using EAGLE <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_eagle.py>`_,\n`multimodal models <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_multimodal.py>`_, \n`multi-LoRA <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_multi_lora.py>`_, \n`quantization <https://github.com/aws-neuron/upstreaming-to-vllm/blob/neuron-2.26/examples/offline_inference/neuron_int8_quantization.py>`_, and more.\n\n\nFor more in depth NxD Inference tutorials that include vLLM deployment steps, refer to :ref:`Tutorials<nxdi-tutorials-index>`.\n\nThe following examples use `meta-llama/Llama-3.1-8B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_ on a ``Trn1.32xlarge`` instance. \n\nIf you have access to the model checkpoint locally, replace ``meta-llama/Llama-3.1-8B-Instruct`` with the path to your local copy. \nOtherwise, you need to request access through HuggingFace and login via `huggingface-cli login <https://huggingface.co/docs/huggingface_hub/en/guides/cli#huggingface-cli-login>`_ using \na `HuggingFace user access token <https://huggingface.co/docs/hub/en/security-tokens>`_ before running the examples. \n\nIf you use a different instance type, you need to adjust the ``tp_degree`` according to the number of Neuron Cores \navailable on your instance type (for more information see: :ref:`Tensor-parallelism support<nxdi-tensor-parallelism>`).\n\nOffline Inference Example\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nHere is an example for running offline inference. :ref:`Bucketing<nxdi-bucketing>` is only disabled to demonstrate \nhow to override Neuron configuration values. Keeping it enabled generally delivers better\nperformance.\n\n.. 
code:: ipython3\n\n    import os\n    os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n\n    from vllm import LLM, SamplingParams\n\n    # Sample prompts.\n    prompts = [\n        \"The president of the United States is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n    # Create a sampling params object.\n    sampling_params = SamplingParams(top_k=1)\n\n    # Create an LLM.\n    llm = LLM(\n        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n        max_num_seqs=4,\n        max_model_len=128,\n        override_neuron_config={\n            \"enable_bucketing\":False,\n        },\n        device=\"neuron\",\n        tensor_parallel_size=32)\n\n    outputs = llm.generate(prompts, sampling_params)\n\n    for output in outputs:\n        prompt = output.prompt\n        generated_text = output.outputs[0].text\n        print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n\nOnline Inference Example\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can start an OpenAI API compatible server with the same settings as the offline example by running\nthe following command:\n\n.. code::\n\n    VLLM_NEURON_FRAMEWORK='neuronx-distributed-inference' python -m vllm.entrypoints.openai.api_server \\\n        --model=\"meta-llama/Llama-3.1-8B-Instruct\" \\\n        --max-num-seqs=4 \\\n        --max-model-len=128 \\\n        --tensor-parallel-size=32 \\\n        --port=8080 \\\n        --device \"neuron\" \\\n        --override-neuron-config \"{\\\"enable_bucketing\\\":false}\"\n\nIn addition to the sampling parameters supported by OpenAI, we also support ``top_k``.\nYou can change the sampling parameters and enable or disable streaming.\n\n.. code::\n\n    from openai import OpenAI\n\n    # Client Setup\n    openai_api_key = \"EMPTY\"\n    openai_api_base = \"http://localhost:8080/v1\"\n\n    client = OpenAI(\n        api_key=openai_api_key,\n        base_url=openai_api_base,\n    )\n\n    models = client.models.list()\n    model_name = models.data[0].id\n\n    # Sampling Parameters\n    max_tokens = 1024\n    temperature = 1.0\n    top_p = 1.0\n    top_k = 50\n    stream = False\n\n    # Chat Completion Request\n    prompt = \"Hello, my name is Llama \"\n    response = client.chat.completions.create(\n        model=model_name,\n        messages=[{\"role\": \"user\", \"content\": prompt}],\n        max_tokens=int(max_tokens),\n        temperature=float(temperature),\n        top_p=float(top_p),\n        stream=stream,\n        extra_body={'top_k': top_k}\n    )\n\n    # Parse the response\n    generated_text = \"\"\n    if stream:\n        for chunk in response:\n            if chunk.choices[0].delta.content is not None:\n                generated_text += chunk.choices[0].delta.content\n    else:\n        generated_text = response.choices[0].message.content\n        \n    print(generated_text)\n\n\nSpecifying context and token buckets (online inference)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can tune bucketing for **prefill** (context encoding) and **decode** (token generation) by\npassing ``override_neuron_config`` to the OpenAI-compatible server.  \nThe example below targets a 1K-token workload on ``meta-llama/Llama-3.1-8B-Instruct`` with **single sequence** (BS=1) execution.\n\n.. 
code:: bash\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n\n    python -m vllm.entrypoints.openai.api_server \\\n      --model \"meta-llama/Llama-3.1-8B-Instruct\" \\\n      --device \"neuron\" \\\n      --tensor-parallel-size 16 \\\n      --max-num-seqs 1 \\\n      --max-model-len 1024 \\\n      --port 8080 \\\n      --override-neuron-config \"{\\\"enable_bucketing\\\": true, \\\n        \\\"context_encoding_buckets\\\": [256, 512, 1024], \\\n        \\\"token_generation_buckets\\\": [32, 64, 128, 256, 512, 768], \\\n        \\\"max_context_length\\\": 1024, \\\n        \\\"seq_len\\\": 1024, \\\n        \\\"batch_size\\\": 1, \\\n        \\\"ctx_batch_size\\\": 1, \\\n        \\\"tkg_batch_size\\\": 1, \\\n        \\\"is_continuous_batching\\\": true}\"\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/weights-sharding-guide.rst",
    "content": ".. _nxdi-weights-sharding-guide:\n\nNxD Inference Weights Sharding Guide\n==========================================\n\nNxD Inference provides two approaches to shard model weights and load them onto Neuron Devices, enabling parallel processing \n(e.g. Tensor Parallelism) on each device. This guide demonstrates the usage of both approaches using :ref:`nxdi-trn2-llama3.1-405b-speculative-tutorial`,\nand provides insights into selecting the appropriate method based on the usage pattern and performance requirements.\n\n.. note::\n\n    Sharding speed on different storage volumes can vary. We recommend to use NVMe solid state drive (SSD) storage to achieve the best sharding performance.\n    This guide shows sharding results on NVMe SSD. For more information about NVMe storage on EC2 instances, see the following:\n    * `Instance store volumes <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html>`__ in the Amazon EC2 User Guide. Instance store volumes are drives attached to EC2 instances that you can use for temporary storage. Neuron instances such as Trn1 and Trn2 include NVMe drives that you can use as instance store volumes.\n    * `EBS volumes and NVMe <https://docs.aws.amazon.com/ebs/latest/userguide/nvme-ebs-volumes.html>`__ in the Amazon EBS User Guide. For persistent storage on NVMe, you can use EBS volumes built on AWS Nitro.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nShard on compile (Pre-shard)\n----------------------------\n\nThe shard on compile (pre-shard) approach loads the supported pretrained :ref:`checkpoints <nxdi-checkpoint-support>`, \nconverts to Neuron compatible format, shards for each parallel rank and serializes sharded weights to disk as safetensors files. The entire sharding and serialization \nprocess can take a few minutes to hours depending on the model size and throughput of the storage volume. This approach is optimized to minimize the future model loading time.\n\nThe following example demonstrates how to run shard on compile with Llama3.1-405b.\n\nFirst, complete the `prerequisites <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial.html#prerequisites>`__\nfor running Llama3.1-405b on a Trn2.48xlarge instance.\n\nNext, enable shard on compile by adding ``--save-sharded-checkpoint`` to the command. 
The sharded checkpoints will be saved to the ``/weights`` folder under the specified ``COMPILED_MODEL_PATH``.\n\nFull command to run shard on compile for Llama3.1-405b:\n::\n\n    # Replace this with the path where model files are downloaded.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct/\"\n    # This is where the compiled model will be saved.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.1-405B-Instruct/\"\n\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n            run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 2048 \\\n            --seq-len 2048 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --pad-token-id 2 \\\n            --save-sharded-checkpoint \\\n            --prompt \"What is annapurna labs?\" 2>&1 | tee log\n\nYou should see the outputs below in your logs. The duration can slightly vary between runs. Note that model loading started only after sharding is completed. \n\n::\n\n    INFO:Neuron:Sharding Weights for ranks: 0...63\n    INFO:Neuron:Done sharding weights in 1856.5586961259833 seconds\n    Loading model to Neuron...\n    Total model loading time: 107.76132441597292 seconds\n\nNow that sharded checkpoints have been serialized to disk, you may save sharding time in your next run by adding ``--skip-sharding`` to the command.\nSharded weights will be directly loaded from the disk for inference, which saves you 30+ minutes of sharding for each subsequent run in this example.\n\nThe total model loading time in each subsequent run is expected to be comparable with the first run.\n\n\nShard on load\n------------------\n\n.. warning::\n    At high batch size (>=32), we have observed performance degradation with ``shard-on-load`` for some models such as Llama3.1-8B. If you observe worse inference performance with ``shard-on-load``, please disable this feature (by enabling the ``--save-sharded-checkpoint`` flag during compilation with ``inference_demo`` as above).\n    Alternatively, if you are not using ``inference_demo``, you can also enable ``save_sharded_checkpoint`` directly in ``NeuronConfig`` which will be passed to model init when the model is traced and compiled.\n\nThe shard on load approach significantly reduces sharding overheads by parallelizing tensor movement in sharding/loading and skipping sharded checkpoints serialization. \nThis approach is preferred when you are working with weights that are frequently retrained/fine-tuned so re-sharding becomes a bottleneck when serving with new weights.\nSince Neuron 2.23 release, Shard on load is enabled by default in NxD Inference.\n\nFull command to run shard on load for Llama3.1-405b is shown below. 
Note that ``--save-sharded-checkpoint`` is excluded from the command.\n::\n\n    # Replace this with the path where model files are downloaded.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct/\"\n    # This is where the compiled model will be saved.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.1-405B-Instruct/\"\n\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n            run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 2048 \\\n            --seq-len 2048 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --pad-token-id 2 \\\n            --prompt \"What is annapurna labs?\" 2>&1 | tee log\n\nYou should see the outputs below in your logs. The duration can slightly vary between runs. Note that sharding happened while model was being loaded (i.e. shard on load).\n\n::\n\n    Loading model to Neuron...\n    INFO:Neuron:Done Sharding weights in 49.31190276599955 seconds\n    Total model loading time: 187.3972628650372 seconds\n\nAs you can see, weights sharding of shard on load is much faster than that of shard on compile.\n\nWhen the current run finishes, no sharded checkpoints will be saved. Therefore, you cannot use ``--skip-sharding`` for your next run. \nIn each subsequent run, NxD Inference will do the exact same amount of sharding work, so the total model loading time is expected to be \ncomparable with the first run. It's also expected that the total model loading time is longer than that of shard on compile, due to the extra\nsharding work it has to do during loading time.\n"
  },
  {
    "path": "libraries/nxd-inference/developer_guides/writing-tests.rst",
    "content": ".. _nxdi-writing-tests:\n\nTesting modeling code with NxD Inference\n========================================\n\nTo ensure that your model is accurate and performant, we recommend that\nyou write tests for your modules, functions, and models. Run your tests\neach time you make a code change to check that your modeling code\ncontinues to work as expected.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nTesting models on Neuron\n------------------------\n\nNxD Inference provides utilities that you can use to test the performance\nand accuracy of a full model end-to-end. The :ref:`check_accuracy_logits <nxdi-logit-matching>` \ntool validates the accuracy of a Neuron model's output logits over the full sequence\nlength. NxD Inference also includes a benchmarking tool, :ref:`benchmark_sampling <nxdi-benchmark-sampling>`,\nthat you can use to evaluate the performance of your model and its submodels.\nYou can use these utilities to define integration tests that\nvalidate your model. For more information, see :ref:`nxdi-evaluating-models`.\n\nTesting modules and functions on Neuron\n---------------------------------------\n\nNxD Inference provides common test utilities to help you validate that\nmodules and functions run correctly on Neuron. The ``build_module`` and\n``build_function`` utilities help you convert modules and functions into\nNeuron models. Then, you can use the ``validate_accuracy`` function to\nvalidate that the Neuron model is accurate for given inputs. You can use\nthese utilities to write unit tests for your modeling code.\n\nBuilding modules to run on Neuron\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   neuronx_distributed_inference.utils.testing.build_module(\n       module_cls,\n       example_inputs: List[Tuple[torch.Tensor]],\n       module_init_kwargs: Dict = {},\n       tp_degree: int = 1,\n       compiler_args: Optional[str] = None,\n       compiler_workdir: Optional[str] = None,\n       checkpoint_path: Optional[str] = None,\n   )\n\nBuilds a module into a Neuron model. This function traces the module\nusing the example inputs, which is a list of tuples where each item is a\ntensor. Then, it compiles the traced module to produce a Neuron model.\nArguments:\n\n- ``module_cls``: The module class to compile.\n- ``example_inputs``: The list of example inputs to use to trace the\n  module. This list must contain exactly one tuple of tensors.\n- ``tp_degree``: The TP degree to use. Defaults to 1.\n- ``module_init_kwargs``: The kwargs to pass when initializing the\n  module.\n- ``compiler_args``: The compiler args to use.\n- ``compiler_workdir``: Where to save compiler artifacts. Defaults to a\n  tmp folder with a UUID for uniqueness.\n- ``checkpoint_path``: The path to the checkpoint to load. By default,\n  this function saves the module state dict to use as the checkpoint.\n\nBuilding functions to run on Neuron\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   neuronx_distributed_inference.utils.testing.build_function(\n       func: Callable,\n       example_inputs: List[Tuple[torch.Tensor]],\n       tp_degree: int = 1,\n       compiler_args: Optional[str] = None,\n       compiler_workdir: Optional[str] = None,\n   )\n\nBuilds a function into a Neuron model. You can use ``build_function`` to\ntest an individual function, such as a ``top_k`` sampling function.\n\nSee ``build_module`` for more\ninformation about common arguments. 
If the function has non-tensor\ninputs, you must convert it to a function that only takes tensor inputs.\nYou can use ``partial`` to do this, where you provide the non-tensor\ninputs as constants in the partial function. This step is necessary\nbecause all inputs must be tensors in a Neuron model.\n\n::\n\n   from functools import partial\n\n   import torch\n\n   from neuronx_distributed_inference.utils.testing import build_function\n\n\n   def top_k(input: torch.Tensor, k: int, dim: int):\n       return torch.topk(input, k, dim)\n\n\n   # Bind the non-tensor arguments as constants so the traced function\n   # only takes tensor inputs.\n   top_k_partial = partial(top_k, k=1, dim=0)\n   model = build_function(top_k_partial, example_inputs=[(torch.rand(4),)])\n   output = model(torch.rand(4))\n\nValidating module and function accuracy on Neuron\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   neuronx_distributed_inference.utils.testing.validate_accuracy(\n       neuron_model,\n       inputs: List[Tuple],\n       expected_outputs: Optional[List] = None,\n       cpu_callable: Optional[Callable] = None,\n       assert_close_kwargs: Dict = {},\n   )\n\nValidates the accuracy of a Neuron model. This function tests that the\nmodel produces expected outputs, which you can provide and/or produce on\nCPU. To compare outputs, this function uses\n``torch_neuronx.testing.assert_close``. If the output isn't similar,\nthis function raises an AssertionError. Arguments:\n\n- ``neuron_model``: The Neuron model to validate.\n- ``inputs``: The list of inputs to use to run the model. Each input is\n  passed to the model's forward function.\n- ``expected_outputs``: The list of expected outputs for each input. If\n  not provided, this function compares against the CPU output for each\n  input.\n- ``cpu_callable``: The callable to use to produce output on CPU.\n- ``assert_close_kwargs``: The kwargs to pass to\n  ``torch_neuronx.testing.assert_close``.\n\nExamples\n~~~~~~~~\n\nExample: Basic module test\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example demonstrates how to validate the accuracy of a basic module\nwith a single linear layer. In this example, we initialize the module\nseparately on Neuron and CPU (using the ``distributed`` arg in\n``ExampleModule``). 
This flag enables us to run a parallel linear layer on\nNeuron and compare it to a standard linear layer on CPU.\n\n::\n\n   import torch\n\n   from neuronx_distributed.parallel_layers.layers import ColumnParallelLinear\n   from neuronx_distributed_inference.utils.testing import build_module, validate_accuracy\n\n   # Feature dimension used by the example module and its test inputs.\n   SAMPLE_SIZE = 16\n\n   # Module to test.\n   class ExampleModule(torch.nn.Module):\n       def __init__(self, distributed):\n           super().__init__()\n           if distributed:\n               self.linear = ColumnParallelLinear(\n                   input_size=SAMPLE_SIZE,\n                   output_size=SAMPLE_SIZE,\n                   bias=False,\n                   dtype=torch.float32,\n               )\n           else:\n               self.linear = torch.nn.Linear(\n                   in_features=SAMPLE_SIZE,\n                   out_features=SAMPLE_SIZE,\n                   bias=False,\n                   dtype=torch.float32,\n               )\n\n       def forward(self, x):\n           return self.linear(x)\n\n\n   def test_validate_accuracy_basic_module():\n       inputs = [(torch.arange(0, SAMPLE_SIZE, dtype=torch.float32),)]\n       example_inputs = [(torch.zeros((SAMPLE_SIZE), dtype=torch.float32),)]\n\n       module_cpu = ExampleModule(distributed=False)\n       neuron_model = build_module(ExampleModule, example_inputs, module_init_kwargs={\"distributed\": True})\n\n       validate_accuracy(neuron_model, inputs, cpu_callable=module_cpu)\n\nExample: Basic function test\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example demonstrates how to validate the accuracy of a basic\nfunction with tensor args.\n\n::\n\n   import torch\n\n   from neuronx_distributed_inference.utils.testing import build_function, validate_accuracy\n\n\n   def example_sum(tensor):\n       return torch.sum(tensor)\n\n\n   def test_validate_accuracy_basic_function():\n       inputs = [(torch.tensor([1, 2, 3], dtype=torch.float32),)]\n       example_inputs = [(torch.zeros((3), dtype=torch.float32),)]\n\n       neuron_model = build_function(example_sum, example_inputs)\n       validate_accuracy(neuron_model, inputs, cpu_callable=example_sum)\n\nAdditional examples\n^^^^^^^^^^^^^^^^^^^\n\nFor additional examples of ``build_module``, ``build_function``, and\n``validate_accuracy``, see the `testing.py unit\ntests <https://github.com/aws-neuron/neuronx-distributed-inference/tree/main/test/unit/testing/test_testing.py>`__.\n"
  },
  {
    "path": "libraries/nxd-inference/examples/vllm_client.py",
    "content": "from openai import OpenAI\n\n\nclient = OpenAI(api_key=\"EMPTY\", base_url=\"http://0.0.0.0:8080/v1\")\nmodels = client.models.list()\nmodel_name = models.data[0].id\n\nprompt = \"Hello, my name is Llama \"\n\nresponse = client.chat.completions.create(\n    model=model_name,\n    messages=[{\"role\": \"user\", \"content\": prompt}],\n    max_tokens=1024,\n    temperature=1.0,\n    top_p=1.0,\n    stream=False,\n    extra_body={\"top_k\": 50},\n)\n\ngenerated_text = response.choices[0].message.content\nprint(generated_text)\n"
  },
  {
    "path": "libraries/nxd-inference/index.rst",
    "content": ".. meta::\n   :description: NxD Inference (NeuronX Distributed Inference) is an ML inference library included with the Neuron SDK that simplifies deploying deep learning models on AWS Inferentia and Trainium instances.\n   :keywords: NxD Inference, NeuronX Distributed Inference, AWS Neuron SDK, Deep Learning Inference, LLM Deployment, Model Optimization, Tensor Parallelism, Sequence Parallelism, vLLM Integration, Speculative Decoding, Continuous Batching \n   :date-modified: 12/02/2025\n\n.. _nxdi-index:\n\nNxD Inference\n=============\n\nThis section contains the technical documentation specific to the NxD Inference library included with the Neuron SDK.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Overview </libraries/nxd-inference/overview-index>\n    Setup </libraries/nxd-inference/nxdi-setup>\n    Tutorials  </libraries/nxd-inference/tutorials/index>\n    Developer Guides  </libraries/nxd-inference/developer_guides/index>\n    API Reference Guide </libraries/nxd-inference/api-guides/index>\n    App Notes  </libraries/nxd-inference/app-notes/index>\n    Release Notes </release-notes/components/nxd-inference>\n    Misc  </libraries/nxd-inference/misc/index>\n\nWhat is NxD Inference?\n-----------------------\n\nNxD Inference (NeuronX Distributed Inference) is an ML inference library included with the Neuron SDK that simplifies deploying deep learning models on AWS Inferentia and Trainium instances. It offers advanced features like continuous batching and speculative decoding for high-performance inference, and supports popular models like Llama-3.1, DBRX, and Mixtral.\n\nWith NxD Inference, developers can:\n\n* Deploy production-ready LLMs with minimal configuration\n* Leverage optimizations like KV Cache, Flash Attention, and Quantization\n* Distribute large models across multiple NeuronCores using Tensor and Sequence Parallelism\n* Integrate with vLLM for seamless production deployment\n* Customize and extend models with a modular design approach\n\nWith NxD Inference, developers can:\n\nUse vLLM for Inference\n------------------------\n\nNeuron recommends that use vLLM when building your inference models. Read more about Neuron's integration with vLLM here: :doc:`vLLM on Neuron </libraries/nxd-inference/vllm/index>`\n\nQuickstarts\n------------\n\n.. grid:: 1 1 2 2\n    :gutter: 3\n    \n    .. grid-item-card:: Quickstart: Serve models online with vLLM on Neuron\n        :link: /libraries/nxd-inference/vllm/quickstart-vllm-online-serving\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Get started serving online models with vLLM. Time to complete: ~20 minutes.\n\n    .. grid-item-card:: Quickstart: Run offline inference with vLLM on Neuron\n        :link: /libraries/nxd-inference/vllm/quickstart-vllm-offline-serving\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Get started running offline inference with vLLM. Time to complete: ~20 minutes.\n\nNxD Inference documentation\n----------------------------\n\n.. grid:: 1 1 2 2\n    :gutter: 3\n    \n    .. grid-item-card:: Overview\n        :link: /libraries/nxd-inference/overview-index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Learn about NxD Inference architecture, key features, and how it can help you deploy models efficiently on AWS Neuron hardware.\n\n    .. 
grid-item-card:: Setup\n        :link: /libraries/nxd-inference/nxdi-setup\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Step-by-step instructions for setting up NxD Inference using DLAMI, Docker containers, or manual installation.\n\n    .. grid-item-card:: Get Started with Models\n        :link: /libraries/nxd-inference/models/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Deploy production-ready models like Llama 3, DBRX, and Mixtral with optimized configurations for AWS Neuron hardware.\n\n    .. grid-item-card:: Tutorials\n        :link: /libraries/nxd-inference/tutorials/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Hands-on tutorials for deploying various models, including Llama 3 variants, multimodal models, and using advanced features like speculative decoding.\n\n    .. grid-item-card:: Developer Guides\n        :link: /libraries/nxd-inference/developer_guides/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        In-depth guides for model onboarding, feature integration, vLLM usage, benchmarking, and customizing inference workflows.\n\n    .. grid-item-card:: API Reference\n        :link: /libraries/nxd-inference/api-guides/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Comprehensive API documentation for integrating NxD Inference into your applications and customizing inference behavior.\n\n    .. grid-item-card:: Application Notes\n        :link: /libraries/nxd-inference/app-notes/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Detailed application notes on parallelism strategies and other advanced topics for optimizing inference performance.\n\n    .. grid-item-card:: Misc Resources\n        :link: /libraries/nxd-inference/misc/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Release notes, troubleshooting guides, and other helpful resources for working with NxD Inference.\n\n    .. grid-item-card:: NxD Inference Release Notes\n        :link: /release-notes/components/nxd-inference\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Release notes, troubleshooting guides, and other helpful resources for working with NxD Inference.\n"
  },
  {
    "path": "libraries/nxd-inference/misc/index.rst",
    "content": ".. _nxdi-misc-index:\n\nNxD Inference Misc\n===================\n\n.. toctree::\n    :maxdepth: 1\n    \n    /release-notes/components/nxd-inference\n    /libraries/nxd-inference/misc/nxdi-troubleshooting\n\n.. include:: /libraries/nxd-inference/misc/misc.txt\n  "
  },
  {
    "path": "libraries/nxd-inference/misc/misc.txt",
    "content": "* :ref:`nxd-inference_rn`\n* :ref:`nxdi-troubleshooting`"
  },
  {
    "path": "libraries/nxd-inference/misc/nxdi-troubleshooting.rst",
    "content": ".. _nxdi-troubleshooting:\n\nTroubleshooting Guide for NxD Inference\n=======================================\n\nThis guide provides solutions for common issues encountered when using NxD Inference.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nAccuracy Issues\n----------------\n\nThe primary methods for validating model accuracy on Neuron involve both token-by-token output matching and logit-level error analysis (relative or max absolute error) against a pre-calibrated GPU FP32 or CPU FP32 reference. When output deviations are observed, these can be systematically attributed to factors such as tokenizer/input discrepancies, amplification from large weight norms (high Lipschitz constants), quantization or precision loss, differences in operator implementation or kernel fusion, compiler optimization, or unintended hardware-level datatype casts.\n\nWhen validating model accuracy on Neuron, it is important to recognize that predicting the exact output deviations from a high-precision reference (like CPU or GPU FP32) is theoretically NP-hard, due to the complex and nonlinear nature of large neural networks. Rather than attempting to anticipate every possible numerical difference, the recommended strategy is to systematically identify, localize, and diagnose deviations as they occur.\n\nAccuracy Degradation with Auto-Cast\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Issue**: You may observe accuracy degradation in model outputs when using the default auto-cast behavior of the Neuron compiler.\n\n**Explanation**: By default, the Neuron compiler automatically casts certain operations to lower precision data types (BF16) to improve performance. While this works well for most cases, it can sometimes lead to accuracy issues, especially in operations involving integer-to-float conversions.\n\n**Solution**: Use the ``--auto-cast=none`` compiler flag to disable automatic casting. This preserves the original precision of operations at the cost of some performance.\n\nExample using inference_demo:\n\n.. code:: bash\n\n   inference_demo --model-type llama --task-type causal-lm run \\\n       --model-path <path> \\\n       --compiled-model-path <path> \\\n       --torch-dtype bfloat16 \\\n       --tp-degree <value> \\\n       --batch-size <value> \\\n       --max-context-length <value> \\\n       --seq-len <value> \\\n       --on-device-sampling \\\n       --prompt \"Your prompt here\" \\\n       --compiler-args \"--auto-cast=none\"\n\nExample using NeuronConfig:\n\n.. code:: python\n\n   from neuronx_distributed_inference.models.config import NeuronConfig\n   \n   neuron_config = NeuronConfig(\n       tp_degree=32,\n       batch_size=1,\n       max_context_length=1024,\n       seq_len=2048,\n       compiler_args=\"--auto-cast=none\"\n   )\n\nInteger-to-Float Conversion Issues\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Issue**: Operations involving integer-to-float conversions (such as in rotary embeddings) may experience significant accuracy degradation when auto-cast is enabled.\n\n**Explanation**: When integer values are converted to floating point and then automatically cast to lower precision (like BF16), the precision loss can be substantial. This is particularly problematic in operations like rotary embeddings where position IDs are converted to floating point for computing sin/cos values.\n\n**Solution**: Use the ``--auto-cast=none`` compiler flag to prevent downcasting these operations. 
This is especially important for models that use rotary embeddings or similar position encoding mechanisms.\n\n**Technical Details**: The issue occurs in operations like:\n\n.. code:: python\n\n   # Integer position IDs are converted to float for sin/cos computation\n   # Downcasting to BF16 here can cause significant precision loss\n   position_ids = position_ids.to(torch.bfloat16)\n   sin, cos = self.compute_sin_cos(position_ids)\n\nMemory Usage Considerations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Note**: Using ``--auto-cast=none`` will increase memory usage as operations will use higher precision data types. Ensure your instance has sufficient memory when using this flag.\n\nPerformance Impact\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n**Note**: Disabling auto-cast will typically result in slower inference. The exact performance impact depends on your model architecture and hardware configuration. Consider this trade-off when optimizing for accuracy.\n\n\nArray indexing and in-place operations in Neuron\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Issue**: When building attention masks, operations that combine array slicing with in-place modifications (e.g., ``mask_i[: arx[0] * arx[1], :ntok] = 0``) can cause accuracy issues in Neuron. This is particularly problematic when the array indices are dynamically computed.\n\n**Explanation**: The accuracy issue stems from two main factors:\n\n1. Array Slicing with Dynamic Ranges:\n\n.. code:: python\n\n   # Problematic: Array slicing with dynamic range (arx[0] * arx[1])\n   mask_i[: arx[0] * arx[1], :ntok] = 0\n\n- Uses computed indices to access specific portions of the tensor\n- Dynamic ranges can lead to unpredictable memory access patterns\n\n2. In-place Modifications:\n\n.. code:: python\n\n   # Problematic: Modifying tensor in-place\n   mask_i[...] = 0  # Direct modification of the original tensor\n\n- Changes the original tensor's values directly\n- Can cause issues with Neuron's memory management and optimization\n\n**Solution**: Replace array slicing and in-place operations with element-wise operations:\n\n.. code:: python\n\n   # Instead of array slicing and in-place modification:\n   mask_i[: arx[0] * arx[1], :ntok] = 0  # Problematic\n\n   # Use element-wise operations:\n   arx_mask = (torch.arange(num_chunks, device=x.device) >= (arx[0] * arx[1])).to(dtype=x.dtype)\n   mask_i[:, :ntok] *= arx_mask.view(num_chunks, 1, 1)  # Neuron-friendly\n\n**Example**: \nFile: `test/unit/models/mllama/test_vision_encoder_attention_mask.py <https://github.com/aws-neuron/neuronx-distributed-inference/blob/9b90cd02ffc3cc76bb3e81113a177f10d7a350a8/test/unit/models/mllama/test_vision_encoder_attention_mask.py>`__\n\n.. 
code:: python\n\n   # CPU version (problematic in Neuron):\n   def build_encoder_attention_mask_meta(x, ar, ntok, num_chunks, n_heads):\n       masks = []\n       for arx in ar:\n           mask_i = torch.ones((num_chunks, x.shape[2], 1), dtype=x.dtype)\n           mask_i[: arx[0] * arx[1], :ntok] = 0  # Problematic: array slicing + in-place\n           # ...\n\n   # Neuron-friendly version:\n   def build_encoder_attention_mask(x, ar, ntok, num_chunks, n_heads):\n       masks = []\n       for arx in ar:\n           mask_i = torch.ones((num_chunks, x.shape[2], 1), dtype=x.dtype, device=x.device)\n           arx_mask = (torch.arange(num_chunks, device=x.device) >= (arx[0] * arx[1])).to(dtype=x.dtype)\n           mask_i[:, :ntok] *= arx_mask.view(num_chunks, 1, 1)  # Element-wise operation\n           # ...\n\n**Note**: This pattern applies to similar operations where array slicing and in-place modifications are used together. \nConsider using element-wise operations and avoiding in-place modifications for better Neuron compatibility.\n\n\nPerformance Issues\n--------------------\n\n\nSkip model warmup during inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Issue**: You may observe slower performance for the first few inference requests, particularly on Trn2.\n\n**Explanation**: By default, model warmup is disabled (``skip_warmup=True``) on Trn2 since warmup feature is not yet implemented for Trn2. This means the model needs to \"warm up\" naturally through actual inference requests, leading to slower performance during the initial requests.\n\n\n**Solution**: There are approaches to ensure initial request performance:\n\n1. Enable built-in warmup if your configuration supports it (on Inf2, Trn1):\n\n.. code:: python\n\n   neuron_config = NeuronConfig(\n       tp_degree=32,\n       batch_size=1,\n       # skip_warmup=True is the default for Trn2 in release 2.23\n       # skip_warmup=False is the default for Trn1, Inf2 in release 2.23\n   )\n\n2. Implement manual warmup by sending dummy requests (on all instance types):\n\n.. code:: python\n\n   # Send a few dummy requests before serving real traffic\n   dummy_prompt = \"This is a warmup request.\"\n   for _ in range(3):  # Number of warmup iterations\n       model.generate(\n           prompt=dummy_prompt,\n           max_new_tokens=32\n       )\n\n\n**Note**:\n \n- When using vLLM for serving, the same initial performance impact applies if warmup is disabled.\n- Use `--override-neuron-config \"{\\\"skip_warmup\\\":false}\"` to change the warmup setting\n\n**Best Practice**: \n\n- For production environments where initial latency is critical, test if your configuration supports built-in warmup.\n- If built-in warmup isn't supported, implement manual warmup before serving real traffic.\n- For development or non-latency-critical scenarios, the default configuration (warmup disabled) is sufficient.\n\nOther Common Issues\n--------------------\n\nTensor Materialization During Tracing caused unexpected model behavior\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Issue**: Developers may inadvertently write code that forces tensor materialization during model tracing, leading to fixed computation paths and unexpected behaviors.\n\n**Explanation**: When model logic depends on tensor values during the forward pass, the compiler may try to evaluate these values during tracing time. 
This \"fixes\" the computation path based on the initial values, resulting in a model that doesn't properly handle different runtime values.\n\nExample of problematic code:\n\n.. code:: python\n\n   def forward(self, tensor):\n       if tensor[0] == 1:  # Forces tensor sync during tracing\n           return tensor\n       else:\n           return tensor * 2\n\n**Solution**: There are two debugging approaches to detect tensor materialization issues:\n\n1. Enable warning messages:\n\n.. code:: python\n\n   import os\n   \n   # Set before model tracing\n   os.environ['PT_XLA_DEBUG_LEVEL'] = '2'  # Will print warnings when tensor sync occurs\n\n2. Force errors on tensor materialization:\n\n.. code:: python\n\n   import torch_xla\n   \n   # Set before model tracing\n   torch_xla._XLAC._set_allow_execution(False)  # Will raise an error if tensor sync is attempted\n\n**Best Practice**: \n\n- Avoid control flow that depends on tensor values during tracing. Instead, consider setting flags through configurations that should not change during runtime. See below example:\n\n.. code:: python\n\n   class TestModel(torch.nn.Module):\n      def __init__(self, flag=1):\n         super().__init__()\n         # the flag should be pre-determined based on the model configuration\n         # it should not be an input of the model during runtime\n         self.flag = flag\n\n      def forward(self, tensor):\n         if self.flag:\n               return tensor\n         else:\n               return tensor * 2\n\n- If dynamic model path is required, consider using JIT inference (See: :ref:`trace-vs-xla-lazytensor`)\n\n\nInput Data Type Handling for int64/fp64 due to compiler dtype compatibility\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\n**Issue**: While you may be using 64-bit data types (int64/fp64) from tokenizers or other input sources, be aware that these are automatically converted to 32-bit types inside `ModelWrapper <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/model_wrapper.py>`__.\n\n**Explanation**: The Neuron compiler is optimized for 32-bit data types. To ensure optimal accuracy and compatibility, the model wrapper automatically converts 64-bit inputs (like those from Hugging Face tokenizers) to their 32-bit equivalents (int64 → int32, fp64 → fp32).\n\n**Note**: No action is required from users as this conversion is handled automatically.\n\n**Best Practice**:\n \n- Continue using your tokenizers and input pipelines as normal\n- Be aware that 64-bit inputs are automatically converted to 32-bit when using `ModelWrapper <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/model_wrapper.py>`__\n- If you're implementing custom pre-processing, using 32-bit types directly can be more efficient\n\nThis automatic conversion ensures consistent accuracy and compatibility with the Neuron compiler while maintaining ease of use with standard tokenizers and input pipelines."
  },
  {
    "path": "libraries/nxd-inference/models/index.rst",
    "content": ".. meta::\n  :description: Reference guide for running inference with NeuronX Distributed Inference (NxDI) on AWS Neuron for Trainium and Inferentia ML chips.\n\n.. _nxdi-models-index:\n\nNeuron Inference Model Support\n=================================\n\nThis section provides information on model support in **NeuronX Distributed Inference (NxDI)** and how to determine appropriate configurations for both online and offline use cases.\n\n.. _nxdi-models-llama3-index:\n\nLlama 3\n---------------------------\n\nMeta's Llama 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:\n\n.. grid:: 1\n  :gutter: 1\n\n  .. grid-item-card:: Llama 3.3 70B\n\n    Meta's multilingual LLM, featuring 70B parameters and Grouped Query Attention.\n\n    :bdg-ref-primary:`Quickstart <nxdi-models-llama-3-3-70b-instruct-quickstart>`\n\n.. _nxdi-models-qwen3-index:\n\nQwen 3\n---------------------------\n\nQwen 3 family includes large language models available in multiple sizes and versions. Select the model variant that matches your application requirements:\n\n.. grid:: 1\n  :gutter: 1\n\n  .. grid-item-card:: Qwen3 MoE 235B\n\n    Qwen family multilingual LLM, featuring sparse Mixture-of-Experts and Grouped Query Attention\n\n    :bdg-ref-primary:`Quickstart <nxdi-models-qwen3-235b-a22b-quickstart>`\n\n.. note::\n   Instructions for additional models will be available soon. For a complete list of supported model architectures, refer to this :ref:`developer guide <nxdi-model-reference>`.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   llama3/llama_33_70b\n   qwen3/qwen3_moe_235b\n\n"
  },
  {
    "path": "libraries/nxd-inference/models/llama3/data/card_llama33_70b.yml",
    "content": "# Optional Metadata\nmetadata:\n\n# Model Information\nmodel:\n  family: \"Llama 3\"\n  name: \"Llama-3.3-70B-Instruct\"\n  display_name: \"Llama 3.3 70B\"\n  checkpoint: \"meta-llama/Llama-3.3-70B-Instruct\"\n\n  description: |\n    Llama 3.3 70B is Meta's multilingual large language model with 70B parameters \n    and a transformer architecture featuring Grouped Query Attention (GQA).\n\n# Hardware Requirements\nhardware:\n\n# Configurations\nconfigurations:\n  config1:\n    instance_type: \"trn2.48xlarge\"\n    sdk_version: \"2.25\"\n    dp_degree: 1\n    neuron:\n      async_mode: true\n      batch_size: 1\n      tp_degree: 64\n      attn_block_tkg_nki_kernel_cache_update: true\n      attn_block_tkg_nki_kernel_enabled: true\n      attn_kernel_enabled: true\n      cc_pipeline_tiling_factor: 1\n      enable_bucketing: true\n      fused_qkv: true\n      is_continuous_batching: true\n      k_cache_transposed: true\n      kv_cache_tiling: false\n      logical_nc_config: 2\n      mlp_kernel_enabled: true\n      qkv_kernel_enabled: true\n      seq_len: 16384\n      sequence_parallel_enabled: true\n      token_generation_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n      context_encoding_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n      on_device_sampling_config: \n        do_sample: true\n        dynamic: true\n      torch_dtype: \"bfloat16\"\n    vllm:\n      tensor_parallel_size: 64\n      max_num_seqs: 1\n      max_model_len: 16384\n      additional_config:\n        override_neuron_config:\n          async_mode: true\n          batch_size: 1\n          tp_degree: 64\n          attn_block_tkg_nki_kernel_cache_update: true\n          attn_block_tkg_nki_kernel_enabled: true\n          attn_kernel_enabled: true\n          cc_pipeline_tiling_factor: 1\n          enable_bucketing: true\n          fused_qkv: true\n          is_continuous_batching: true\n          k_cache_transposed: true\n          kv_cache_tiling: false\n          logical_nc_config: 2\n          mlp_kernel_enabled: true\n          qkv_kernel_enabled: true\n          seq_len: 16384\n          sequence_parallel_enabled: true\n          token_generation_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n          context_encoding_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n          on_device_sampling_config:\n            do_sample: true\n            dynamic: true\n          torch_dtype: \"bfloat16\"\n\n  config2:\n    instance_type: \"trn2.48xlarge\"\n    sdk_version: \"2.25\"\n    dp_degree: 2\n    neuron:\n      async_mode: true\n      batch_size: 8\n      ctx_batch_size: 1\n      tp_degree: 32\n      attn_block_tkg_nki_kernel_cache_update: true\n      attn_block_tkg_nki_kernel_enabled: true\n      attn_kernel_enabled: true\n      cc_pipeline_tiling_factor: 1\n      enable_bucketing: true\n      fused_qkv: true\n      is_continuous_batching: true\n      k_cache_transposed: true\n      kv_cache_tiling: false\n      logical_nc_config: 2\n      mlp_kernel_enabled: true\n      qkv_kernel_enabled: true\n      seq_len: 16384\n      sequence_parallel_enabled: true\n      token_generation_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n      context_encoding_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n      on_device_sampling_config: \n        do_sample: true\n        dynamic: true\n      torch_dtype: \"bfloat16\"\n    vllm:\n      tensor_parallel_size: 32\n      max_num_seqs: 8\n      max_model_len: 
16384\n      additional_config:\n        override_neuron_config:\n          async_mode: true\n          batch_size: 8\n          ctx_batch_size: 1\n          tp_degree: 32\n          attn_block_tkg_nki_kernel_cache_update: true\n          attn_block_tkg_nki_kernel_enabled: true\n          attn_kernel_enabled: true\n          cc_pipeline_tiling_factor: 1\n          enable_bucketing: true\n          fused_qkv: true\n          is_continuous_batching: true\n          k_cache_transposed: true\n          kv_cache_tiling: false\n          logical_nc_config: 2\n          mlp_kernel_enabled: true\n          qkv_kernel_enabled: true\n          seq_len: 16384\n          sequence_parallel_enabled: true\n          token_generation_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n          context_encoding_buckets: [256, 512, 1024, 2048, 4096, 8192, 10240, 12288, 16384]\n          on_device_sampling_config:\n            do_sample: true\n            dynamic: true\n          torch_dtype: \"bfloat16\"\n  \n  config3:\n    instance_type: \"trn2.48xlarge\"\n    sdk_version: \"2.25\"\n    dp_degree: 2\n    neuron:\n      batch_size: 1\n      tp_degree: 64\n      enable_bucketing: true\n      is_continuous_batching: true\n      logical_nc_config: 2\n      seq_len: 16384\n      torch_dtype: \"bfloat16\"\n    vllm:\n      tensor_parallel_size: 64\n      max_num_seqs: 1\n      max_model_len: 16384\n      additional_config:\n        override_neuron_config:\n          batch_size: 1\n          tp_degree: 64\n          enable_bucketing: true\n          is_continuous_batching: true\n          logical_nc_config: 2\n          seq_len: 16384\n          torch_dtype: \"bfloat16\"\n\n  config4:\n    instance_type: \"trn1.32xlarge\"\n    sdk_version: \"2.25\"\n    dp_degree: 1\n    neuron:\n      batch_size: 1\n      tp_degree: 32\n      enable_bucketing: true\n      is_continuous_batching: true\n      logical_nc_config: 1\n      seq_len: 16384\n      torch_dtype: \"bfloat16\"\n    vllm:\n      tensor_parallel_size: 32\n      max_num_seqs: 1\n      max_model_len: 16384\n      additional_config:\n        override_neuron_config:\n          batch_size: 1\n          tp_degree: 32\n          enable_bucketing: true\n          is_continuous_batching: true\n          logical_nc_config: 1\n          seq_len: 16384\n          torch_dtype: \"bfloat16\"\n\ndefaults:\n  \"trn2.48xlarge\": \n    config: \"config3\"\n  \"trn1.32xlarge\": \n    config: \"config4\"\n\n# Recommendations\nrecommendations:\n  Latency:\n    config: \"config1\"\n\n  Throughput:\n    config: \"config2\"\n"
  },
  {
    "path": "libraries/nxd-inference/models/llama3/llama_33_70b.rst",
    "content": ".. datatemplate:yaml:: data/card_llama33_70b.yml\n   :template: model_card.jinja.rst\n\nResources\n-----------\n\n* :ref:`llm-inference-benchmarking`\n* :ref:`nxdi-onboarding-models`\n* :ref:`nxdi-vllm-user-guide-v0`\n* :ref:`neuron-nki`\n"
  },
  {
    "path": "libraries/nxd-inference/models/models.txt",
    "content": "* :ref:`nxdi-models-llama3-index`"
  },
  {
    "path": "libraries/nxd-inference/models/qwen3/data/card_qwen3_moe_235b.yml",
    "content": "# Optional Metadata\nmetadata:\n\n# Model Information\nmodel:\n  family: \"Qwen3\"\n  name: \"Qwen3-235B-A22B\"\n  display_name: \"Qwen3 235B A22B\"\n  checkpoint: \"Qwen/Qwen3-235B-A22B\"\n\n  description: |\n    Qwen3 235B A22B is a mixture-of-experts (MoE) model with 235B parameters developed by the Qwen Team,\n    activating 22B parameters per forward pass.\n\n# Hardware Requirements\nhardware:\n\n# Configurations\nconfigurations:\n  config1:\n    instance_type: \"trn2.48xlarge\"\n    sdk_version: \"2.27\"\n    dp_degree: 8\n    neuron:\n      tp_degree: 64\n      attention_dp_degree: 8\n      cp_degree: 16\n      moe_tp_degree: 2\n      moe_ep_degree: 32\n      use_index_calc_kernel: true\n      mode_mask_padded_tokens: true\n      batch_size: 16\n      ctx_batch_size: 1\n      max_context_length: 16384\n      seq_len: 16384\n      scratch_pad_size: 1024\n      torch_dtype: \"float16\"\n      is_continuous_batching: true\n      fused_qkv: true\n      blockwise_matmul_config:\n        use_shard_on_intermediate_dynamic_while: true\n        skip_dma_token: true\n      on_device_sampling_config:\n        do_sample: true\n        temperature: 0.6\n        top_k: 20\n        top_p: 0.95\n      enable_bucketing: true\n      token_generation_buckets: [10240, 16384]\n      context_encoding_buckets: [10240, 16384]\n      flash_decoding_enabled: false\n      logical_nc_config: 2\n      cc_pipeline_tiling_factor: 2\n      sequence_parallel_enabled: true\n      qkv_kernel_enabled: true\n      qkv_nki_kernel_enabled: true\n      qkv_cte_nki_kernel_fuse_rope: true\n      attn_kernel_enabled: true\n      strided_context_parallel_kernel_enabled: true\n      async_mode: true\n    # vllm v1 config\n    vllm:\n      tensor_parallel_size: 64\n      max_num_seqs: 16\n      max_model_len: 16384\n      additional_config:\n        override_neuron_config:\n          tp_degree: 64\n          attention_dp_degree: 8\n          cp_degree: 16\n          moe_tp_degree: 2\n          moe_ep_degree: 32\n          use_index_calc_kernel: true\n          mode_mask_padded_tokens: true\n          batch_size: 16\n          ctx_batch_size: 1\n          max_context_length: 16384\n          seq_len: 16384\n          scratch_pad_size: 1024\n          torch_dtype: \"float16\"\n          is_continuous_batching: true\n          fused_qkv: true\n          blockwise_matmul_config:\n            use_shard_on_intermediate_dynamic_while: true\n            skip_dma_token: true\n          on_device_sampling_config:\n            do_sample: true\n            temperature: 0.6\n            top_k: 20\n            top_p: 0.95\n          enable_bucketing: true\n          token_generation_buckets: [10240, 16384]\n          context_encoding_buckets: [10240, 16384]\n          flash_decoding_enabled: false\n          logical_nc_config: 2\n          cc_pipeline_tiling_factor: 2\n          sequence_parallel_enabled: true\n          qkv_kernel_enabled: true\n          qkv_nki_kernel_enabled: true\n          qkv_cte_nki_kernel_fuse_rope: true\n          attn_kernel_enabled: true\n          strided_context_parallel_kernel_enabled: true\n          async_mode: true\n\n  config2:\n    instance_type: \"trn2.48xlarge\"\n    sdk_version: \"2.27\"\n    dp_degree: 8\n    neuron:\n      tp_degree: 64\n      attention_dp_degree: 8\n      cp_degree: 16\n      moe_tp_degree: 2\n      moe_ep_degree: 32\n      use_index_calc_kernel: true\n      mode_mask_padded_tokens: true\n      batch_size: 64\n      ctx_batch_size: 1\n      max_context_length: 
16384\n      seq_len: 16384\n      scratch_pad_size: 1024\n      torch_dtype: \"float16\"\n      is_continuous_batching: true\n      fused_qkv: true\n      blockwise_matmul_config:\n        use_shard_on_intermediate_dynamic_while: true\n        skip_dma_token: true\n      on_device_sampling_config:\n        do_sample: true\n        temperature: 0.6\n        top_k: 20\n        top_p: 0.95\n      enable_bucketing: true\n      token_generation_buckets: [10240, 16384]\n      context_encoding_buckets: [10240, 16384]\n      flash_decoding_enabled: false\n      logical_nc_config: 2\n      cc_pipeline_tiling_factor: 2\n      sequence_parallel_enabled: true\n      qkv_kernel_enabled: true\n      qkv_nki_kernel_enabled: true\n      qkv_cte_nki_kernel_fuse_rope: true\n      attn_kernel_enabled: true\n      strided_context_parallel_kernel_enabled: true\n      async_mode: true\n\ndefaults:\n  \"trn2.48xlarge\": \n    config: \"config1\"\n\n# Recommendations\nrecommendations:\n  Latency:\n    config: \"config1\"\n\n  Throughput:\n    config: \"config2\"\n"
  },
  {
    "path": "libraries/nxd-inference/models/qwen3/qwen3_moe_235b.rst",
    "content": ".. datatemplate:yaml:: data/card_qwen3_moe_235b.yml\n   :template: model_card_qwen3.jinja.rst\n\nResources\n-----------\n\n* :ref:`llm-inference-benchmarking`\n* :ref:`nxdi-onboarding-models`\n* :ref:`nxdi-vllm-user-guide-v1`\n* :ref:`neuron-nki`\n"
  },
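  {
    "path": "libraries/nxd-inference/models/qwen3/examples/card_to_vllm_command_example.py",
    "content": "# Illustrative sketch (not part of the official model samples): show how the ``vllm`` section of\n# the Qwen3-235B-A22B model card above could be turned into a ``vllm serve`` command line.\n# The card path, the generated flag names, and passing override_neuron_config as a JSON string\n# via --override-neuron-config are assumptions based on examples elsewhere in this documentation;\n# check them against the vLLM version you are running.\nimport json\n\nimport yaml  # assumes PyYAML is available\n\nCARD_PATH = \"data/card_qwen3_moe_235b.yml\"  # hypothetical path, relative to the qwen3 docs folder\n\nwith open(CARD_PATH) as f:\n    card = yaml.safe_load(f)\n\n# Use the configuration the card recommends for latency.\nconfig_name = card[\"recommendations\"][\"Latency\"][\"config\"]\nvllm_cfg = card[\"configurations\"][config_name][\"vllm\"]\nneuron_overrides = vllm_cfg[\"additional_config\"][\"override_neuron_config\"]\n\ncommand = [\n    \"vllm\", \"serve\", card[\"model\"][\"checkpoint\"],\n    \"--tensor-parallel-size\", str(vllm_cfg[\"tensor_parallel_size\"]),\n    \"--max-num-seqs\", str(vllm_cfg[\"max_num_seqs\"]),\n    \"--max-model-len\", str(vllm_cfg[\"max_model_len\"]),\n    \"--override-neuron-config\", json.dumps(neuron_overrides),\n]\nprint(\" \".join(command))\n"
  },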
  {
    "path": "libraries/nxd-inference/neuron-inference-overview.rst",
    "content": ".. _neuron-inference-overview:\n\nAI Inference on Neuron\n======================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nAWS Neuron provides optimized AI inference on AWS Trainium and Inferentia instances across diverse AI workloads, from Large Language Models (LLMs) to image/video generation models and custom machine learning architectures. The Neuron SDK enables optimized performance tuning for both latency-sensitive applications like interactive chatbots and high-throughput batch processing workloads. Whether you're building real-time generative AI applications, agentic AI systems, or processing offline batch requests, the Neuron SDK provides the flexibility to optimize inference for your specific performance requirements.\n\n\nDeploying Production-Ready Models on Trainium/Inferentia\n--------------------------------------------------------\n\nThe Neuron SDK enables deployment of production-ready popular LLM models like Meta Llama-3.3-70B and OpenAI gpt-oss-120B using vLLM. \nFor model architectures not supported through vLLM, such as diffusion transformer models (Flux), you can integrate with other model servers directly using NxD Inference APIs.\n\n\n.. _neuron_inference_deployment_figure:\n\n.. figure:: ./images/inference-deployment-options.png\n   :align: center\n   :class: outlined-image\n\n   Figure 1: Neuron Inference Deployment options\n\n\n\n\nDeploy Production-Ready Models with vLLM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nvLLM on Neuron offers a streamlined deployment experience using the standard vLLM V1 APIs with minimal code changes. Once you :ref:`install the latest Neuron SDK <nxdi-setup>`, you can easily get started using vLLM to serve\nproduction-ready models. Below is an example of starting vLLM serving for the Llama-3.1-8B model:\n\n.. code-block:: python\n\n      ##install the vllm-neuron plugin which automatically installs the right vLLM version that is supported\n      git clone https://github.com/vllm-project/vllm-neuron.git\n      cd vllm-neuron\n      pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com -e .\n\n      ##start the vLLM server to start serving inference requests (sample config for Trn1 instance)\n      vllm serve meta-llama/Meta-Llama-3-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 128 --block-size 32 --num-gpu-blocks-override 256\n\n\n\n\nNeuron also offers :ref:`AMIs <neuron-dlami-overview>` with pre-installed Neuron SDK dependencies to quickly test your inference workloads and :ref:`pre-built inference containers<neuron_containers>` that you can use\nto get started with your production workloads in Kubernetes environments.\n\nYou can refer to the :ref:`detailed developer guide on vLLM V1 support <nxdi-vllm-user-guide-v1>` for the list of features and models supported through vLLM.\n\nIf you are looking to deploy a model in vLLM that is not yet supported out of the box on Neuron, you can refer to the :ref:`implementing custom models section below <neuron-inference-implement-custom-models>`.\n\n\nIntegrate with NxD Inference APIs for Custom Model Serving Deployments\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you are looking to deploy models beyond standard LLM architectures, such as Diffusion Transformers which are not supported in vLLM, NxD Inference provides direct API integration options that you can\nintegrate with general-purpose model serving frameworks like FastAPI or Triton Inference Server. 
You can refer to the `Flux tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/flux-inference-tutorial.html>`_ to learn how to integrate directly with NxD Inference APIs.\n\nSimilarly, if you want to integrate LLM model serving with model serving options other than vLLM, you can integrate directly with NxD Inference. However, you will need to make custom changes to the scheduler along with any modifications required to make it compatible with your desired model server.\n\n\n.. _neuron-inference-implement-custom-models:\n\nImplementing Custom Models or Performance Optimizations\n---------------------------------------------------------\n\nNxD Inference Library\n^^^^^^^^^^^^^^^^^^^^^^\n\nNxD Inference is a PyTorch-based open-source library that provides reference implementations for optimizing popular dense LLM models, MoE LLM models, and image generation models like Llama-3.3-70B, gpt-oss-120B, and Flux on Neuron.\nThe NxD Inference library provides key model building blocks such as different attention techniques, distributed strategies like Tensor Parallel, Expert Parallelism, speculative decoding techniques, and NKI kernels for popular model architectures that you can use to quickly\nbuild custom LLM and other ML model architectures. \n\nYou can use the :ref:`model onboarding guide <nxdi-onboarding-models>` to get started implementing custom models on Neuron. Similarly, you can extend and implement custom performance optimizations on models\nalready implemented in NxD Inference. Once you have implemented the model in NxD Inference, you can either integrate it with vLLM as described in the :ref:`model onboarding guide <nxdi-onboarding-models-vllm>` or integrate it with another model serving framework.\n\nNxD Inference is an open-source library with `source code publicly available on GitHub <https://github.com/aws-neuron/neuronx-distributed-inference>`_. We invite you to contribute\ncustom model implementations or performance optimizations by opening a PR on GitHub.\n\nImplementing Custom Models Directly on PyTorch\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nIf you want to implement models directly on PyTorch without using the NxD Inference library and need more fine-grained control, you can use the :ref:`NxD Core library<neuronx_distributed_api_guide>` that offers Neuron essential primitives like tracing and compilation. The `Llama-3.2-1B <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`_ example provides a sample reference implementation showing how to build custom models with the NxD Core library.\n"
  },
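  {
    "path": "libraries/nxd-inference/examples/query_vllm_completions_example.py",
    "content": "# Illustrative sketch (not one of the shipped examples): query the vLLM server started in the\n# overview above through its OpenAI-compatible completions endpoint. Assumes the server listens\n# on the default port 8000 on localhost and that the ``requests`` package is installed; set the\n# model name to whatever you passed to ``vllm serve``.\nimport requests\n\nresponse = requests.post(\n    \"http://localhost:8000/v1/completions\",\n    json={\n        \"model\": \"meta-llama/Meta-Llama-3-8B-Instruct\",\n        \"prompt\": [\"a tornado is a\"],\n        \"max_tokens\": 10,\n        \"temperature\": 0,\n    },\n    timeout=300,\n)\nresponse.raise_for_status()\nprint(response.json()[\"choices\"][0][\"text\"])\n"
  },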
  {
    "path": "libraries/nxd-inference/nxdi-setup.rst",
    "content": ".. _nxdi-setup:\n\nNxD Inference Setup Guide\n=========================\n\nThe NeuronX Distributed (NxD) Inference framework is built on top of\n:ref:`NxD Core <neuronx-distributed-index>`. Follow the steps in this\nguide to set up your environment to run inference using the NxD Inference framework.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOption 1: Launch an instance using a Neuron DLAMI\n-------------------------------------------------\nNeuron Deep Learning AMIs (DLAMIs) are Amazon Machine Images (AMIs) that come\nwith the Neuron SDK pre-installed. To quickly get started with NxD Inference,\nyou can launch an EC2 instance with the multi-framework DLAMI, which includes\nNxD Inference and its dependencies. For more information, see the\n:ref:`Neuron Multi-Framework DLAMI Guide <setup-ubuntu22-multi-framework-dlami>`\nand :ref:`neuron-dlami-overview`.\n\nAfter you launch an instance, you can run the following command to activate the\nNxD Inference virtual environment.\n\n::\n\n   source /opt/aws_neuronx_venv_pytorch_2_6_nxd_inference/bin/activate\n\n\n\nOption 2: Use a Neuron Deep Learning Container (DLC)\n----------------------------------------------------\nNeuron Deep Learning Containers (DLCs) are Docker images that come with the\nNeuron SDK pre-installed. To run NxD Inference in a Docker container, use the\n`Neuronx PyTorch Inference Containers <https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx>`_.\nFor more information, see :ref:`neuron_containers`.\n\n\nOption 3: Manually Install NxD Inference\n----------------------------------------\n\nFollow these instructions to manually install NxD Inference on an instance.\n\n.. note:: \n\n   For information about which Python versions are compatible with the Neuron\n   SDK, see :ref:`Release Artifacts <latest-neuron-release-artifacts>`.\n\nSetup a Neuron Environment\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBefore you install NxD Inference, you must install the Neuron SDK and its\ndependencies, including PyTorch Neuron (``torch-neuronx``). Follow instructions\nfor one of the following operating systems:\n\n* :ref:`PyTorch NeuronX Setup on Ubuntu 22 <setup-torch-neuronx>`\n* :ref:`PyTorch NeuronX Setup on Amazon Linux 2023 <setup-torch-neuronx>`\n\n\nInstall NxD Inference\n^^^^^^^^^^^^^^^^^^^^^\n\nRun this command to install NxD Inference.\n\n::\n\n   source aws_neuron_venv_pytorch/bin/activate\n   pip install -U pip\n   pip install --upgrade neuronx-cc==2.* neuronx-distributed-inference --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n\nVerify NxD Inference Installation\n---------------------------------\n\nTo verify that NxD Inference installed successfully, check that you can\nrun the ``inference_demo`` console script.\n\n::\n\n   inference_demo --help\n"
  },
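  {
    "path": "libraries/nxd-inference/examples/verify_nxdi_install_example.py",
    "content": "# Illustrative sketch: a small Python check that complements ``inference_demo --help`` by\n# verifying that the key Neuron packages are importable and reporting their versions. The\n# distribution names are taken from the setup guide above; run this inside the activated\n# virtual environment.\nfrom importlib.metadata import PackageNotFoundError, version\n\nfor dist in (\"neuronx-distributed-inference\", \"neuronx-cc\", \"torch-neuronx\"):\n    try:\n        print(f\"{dist}: {version(dist)}\")\n    except PackageNotFoundError:\n        print(f\"{dist}: NOT INSTALLED\")\n\n# The import name uses underscores, unlike the distribution name.\nimport neuronx_distributed_inference  # noqa: F401\n\nprint(\"neuronx_distributed_inference imported successfully\")\n"
  },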
  {
    "path": "libraries/nxd-inference/overview-index.rst",
    "content": ".. _nxdi-overview:\n\nNxD Inference Overview\n=======================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n\nOverview\n--------\n\nNxD Inference  (where NxD stands for NeuronX Distributed) is an open-source PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. Introduced with Neuron SDK 2.21 release, \nit offers advanced inference capabilities, including features such as continuous batching and speculative decoding for high performance inference. Additionally, it supports inference engine for vLLM for seamless integration with the majority of customers' production deployment systems. ML developers can use NxD Inference library at different levels of abstraction that fits their inference use case.\n\n\nNxD Inference(NxDI) library offers the following benefits:\n\n* **Production ready models**: NxD Inference provides production ready models like  Llama-3.1, DBRX, and Mixtral that you can quickly deploy for high performance inference. \n\n* **LLM Inference Features**:  NxD Inference provides support for various LLM inference features like KV Cache, Multi-Head Attention (MHA), Grouped Query Attention (GQA), Flash Attention, Quantization, MoE , Continuous Batching and Speculative Decoding enabling high performance inference.  \n\n* **Modular Design**:  Inference features in NxDI like KV Caching are implemented with a modular design, allowing developers to easily incorporate them into new models or customize and extend them.\n\n* **Distributed Strategies**: NxD Inference enables distributing inference workload of large models across multiple NeuronCores in a single instance using Tensor parallelism and Sequence Parallelism. Pipeline parallelism and multi-node inference will be supported in future Neuron releases. \n\n* **Support for NKI Kernels**:  NxD Inference provides support for integrating custom NKI kernels on Trainium and  Inferentia instances. \n\n* **Open Source and SW Release**:  NxD Inference library is provided as pip wheel and corresponding source code is made available on `GitHub <https://github.com/aws-neuron/neuronx-distributed-inference>`_ . We encourage developers to contribute new model implementations or feature optimizations to the NxDI library by submitting a pull request.\n\n\n\n.. _nxdi_figure:\n\n.. figure:: ./images/nxd-inference-block-diagram.jpg\n   :align: center\n   :class: outlined-image\n\n   NxD Inference High level Overview\n\n\n\nUsing NxD Inference Library\n---------------------------\n\n\nML developers can use NxD Inference library at different levels of abstraction. As shown in the below diagram :numref:`Fig. %s <nxdi_use-cases-figure>`, developers can use NxDI library in 3 different ways.\n\n* **Deploy production ready models with vLLM**:  NxDI supports production ready models like Llama-3.1, DBRX and Mixtral that can be easily deployed directly through vLLM. Customers can integrate their inference scripts directly with vLLM API. \n\n* **Deploy production ready models with NxDI**:   For customers who are not using vLLM, they can integrate with NxDI models directly for use cases such as static batching.   For continuous batching, customers can also integrate with NxDI API to implement a custom model server with scheduler(other than vLLM) . See :numref:`Fig. %s <nxdi_use-cases-figure>` b) for reference.\n\n* **Integrate with Inference modules and NxD Core primitives**:   As described in :numref:`Fig. 
%s <nxdi_use-cases-figure>` c), customers who are looking to onboard new models which are not in NxDI model hub can integrate with inference modules and NxD Core primitives. In addition, customers who are looking to integrate with model servers other than vLLM can also integrate directly with NxD Inference modules and NxD core primitives.\n\n.. _nxdi_use-cases-figure:\n\n.. figure:: ./images/nxd-inference-use-cases.jpg\n   :align: center\n   :class: outlined-image\n\n   Using NxD Inference through various abstractions "
  },
  {
    "path": "libraries/nxd-inference/setup.txt",
    "content": "* :ref:`nxdi-setup`"
  },
  {
    "path": "libraries/nxd-inference/tutorials/disaggregated-inference-tutorial-1p1d.rst",
    "content": ".. _nxdi-disaggregated-inference-1p1d-tutorial:\n\nTutorial: Static 1P1D Disaggregated Inference on Trn2 [BETA]\n============================================================\n\nOverview\n~~~~~~~~\n\nThis tutorial will mainly cover how to run Disaggregated Inference (DI) 1P1D (1 prefill, 1 Decode) \neither on a single Trn2 instance (1P and 1D both are on same instance) or on 2 instances \n(1P and 1D are on separate instances). It will provide scripts that can setup both\nsingle and multi instance workflows. Next, the tutorial will demonstrate how to benchmark DI. Finally,\nwe show how to benchmark non Disaggregated Inference (non-DI) continuous batching to compare results between DI vs. non-DI.\n\nRead the :ref:`DI Developer Guide<nxdi-disaggregated-inference>` for more detailed information.\n\n.. note::\n\n   This tutorial was tested on trn2.48xlarge but its concepts are also be applicable to trn1.32xlarge.\n\nSet up and connect trn2.48xlarge instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAs a prerequisite, this tutorial requires that you have one or two Trn2 instances\nwith Neuron SDK, Neuron vLLM and Elastic Fabric Adapter (EFA) enabled and installed. The Neuron Deep Learning AMI\ncomes with Neuron dependencies and EFA enabled and installed so it is the recommended\nway to run this tutorial.\n\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\n.. note::\n\n   Disaggregated Inference is only supported on Neuron instances with EFA enabled (trn1.32xlarge or trn2.48xlarge).\n   EFA is still required even when running single instance as the KV cache transfer happens through EFA.\n\nIf you choose to manually install NxD Inference follow the \n`EFA setup guide <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html>`_ to install and enable EFA.\n\n\nIf running multi-instance it is recommended to have shared storage between the two instances to avoid having\nto download, compile and save scripts twice. For more details, see documentation on mounting \n`EFS <https://docs.aws.amazon.com/efs/latest/ug/mount-multiple-ec2-instances.html>`_ or \n`FSX <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage_fsx.html>`_ filesystems.\n\nAfter setting up an instance, use SSH to connect to the Neuron instance(s) using the key pair that you\nchose when you launched the instance.\n\nAfter you are connected, activate the Python virtual environment that includes the Neuron SDK.\n\n::\n\n   source ~/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/activate\n\nInstall the latest release branch of vLLM from the AWS Neuron fork \nfollowing the instructions in the :ref:`vLLM User Guide for NxD Inference<nxdi-vllm-user-guide>`.\n\n\nRun ``pip list`` to verify that the Neuron SDK is installed.\n\n::\n\n   pip list | grep neuron\n\nYou should see Neuron packages including\n``neuronx-distributed-inference`` and ``neuronx-cc`` and ``vllm``.\n\nDownload Dependencies\n~~~~~~~~~~~~~~~~~~~~~\n\nTo use this sample, you must first download a `Llama-3.3-70B-Instruct <https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct>`_ model checkpoint from Hugging Face\nto a local path on the Trn2 instance. 
\nNote that you may need access from Meta for model download.\nFor more information, see\n`Downloading models <https://huggingface.co/docs/hub/en/models-downloading>`_\nin the Hugging Face documentation.\n\n\nCompile the model\n~~~~~~~~~~~~~~~~~\n\nCompile the model for Neuron by using the following ``compile.sh`` script.\n\n::\n\n   #!/bin/bash\n   # copy and paste me into a file called compile.sh\n   # then run chmod +x compile.sh\n\n   # Parse command line arguments\n   while [[ $# -gt 0 ]]; do\n      case $1 in\n         --tp-degree)\n               TP_DEGREE=\"$2\"\n               shift 2\n               ;;\n         --batch-size)\n               BATCH_SIZE=\"$2\"\n               shift 2\n               ;;\n         --model-path)\n               MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         *)\n               echo \"Unknown parameter: $1\"\n               echo \"Usage: $0 --tp-degree <value> --batch-size <value> --model-path <path>\"\n               exit 1\n               ;;\n      esac\n   done\n\n   export COMPILED_MODEL_PATH=\"di_traced_model_tp${TP_DEGREE}_b${BATCH_SIZE}/\"\n\n   inference_demo \\\n      --model-type llama \\\n      --task-type causal-lm \\\n      run \\\n      --model-path $MODEL_PATH \\\n      --compiled-model-path $COMPILED_MODEL_PATH \\\n      --torch-dtype bfloat16 \\\n      --tp-degree $TP_DEGREE \\\n      --batch-size $BATCH_SIZE \\\n      --ctx-batch-size 1 \\\n      --tkg-batch-size $BATCH_SIZE \\\n      --is-continuous-batching \\\n      --max-context-length 8192 \\\n      --seq-len 8192 \\\n      --on-device-sampling \\\n      --fused-qkv \\\n      --global-topk 256 --dynamic \\\n      --top-k 50 --top-p 0.9 --temperature 0.7 \\\n      --do-sample \\\n      --sequence-parallel-enabled \\\n      --qkv-kernel-enabled \\\n      --attn-kernel-enabled \\\n      --mlp-kernel-enabled \\\n      --cc-pipeline-tiling-factor 1 \\\n      --pad-token-id 2 \\\n      --logical-neuron-cores 2 \\\n      --context-encoding-buckets 256 512 1024 2048 4096 8192 \\\n      --token-generation-buckets 512 1024 2048 4096 8192 \\\n      --apply-seq-ids-mask \\\n      --enable-bucketing \\\n      --prompt \"test prompt\" \\\n      --save-sharded-checkpoint \\\n      --attn-block-tkg-nki-kernel-enabled \\\n      --attn-block-tkg-nki-kernel-cache-update \\\n      --k-cache-transposed \\\n      --async-mode \\\n      --compile-only\n\nThe ``--apply-seq-ids-mask`` flag is required for DI because it\ntells Neuron to only update the KV cache of the current sequence ID to ensure \nKV cache integrity, and ultimately, accuracy.\n\nMulti-Instance\n---------------\nFor multi-instance run: \n\n::\n\n   ./compile.sh --tp-degree 64 --batch-size 4 --model-path path/to/your/downloaded/model\n\nSingle-Instance\n---------------\nFor single-instance run: \n\n::\n\n   ./compile.sh --tp-degree 32 --batch-size 4 --model-path path/to/your/downloaded/model\n\nWe compile for ``tp-degree=32`` because 1 prefill server will take up half \nof the Neuron Cores cores while the decode server will take up the other half.\n\n\nLaunch the Prefill and Decode Servers\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe provide a script called ``server.sh``, which you can use to launch prefill and\ndecode servers.\n\n``NEURON_RT_ASYNC_SENDRECV_EXPERIMENTAL_ENABLED=1`` is currently required as DI is still in beta.\n``NEURON_RT_ASYNC_SENDRECV_BOOTSTRAP_PORT=45645`` is required to tell the Neuron Runtime which port to use for KV Cache transfer 
communications.\n``NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=2`` enables :ref:`Asynchronous Runtime Support<nxdi_async_mode_feature_guide>`\n\nThe ``KVTransferConfig`` provided to both servers on startup have key information.\n``kv_connector=NeuronConnector`` lets vLLM know to use the Neuron implementation for KV cache transfer.\n``kv_role=producer`` lets vLLM know that this server's job is to do prefill.\n``kv_role=consumer`` lets vLLM know that this server's job is to do decode.\n``neuron_core_offset=n`` lets vLLM know that the model is hosted starting on the nth Neuron Core.\n\n\n::\n\n   #!/bin/bash\n   # copy and paste me into a file called server.sh\n   # then run chmod +x server.sh\n\n   #!/bin/bash\n\n   # Parse command line arguments\n   while [[ $# -gt 0 ]]; do\n      case $1 in\n         --tp-degree)\n               TP_DEGREE=\"$2\"\n               shift 2\n               ;;\n         --batch-size)\n               BATCH_SIZE=\"$2\"\n               shift 2\n               ;;\n         --model-path)\n               MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         --compiled-model-path)\n               COMPILED_MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         --neuron-send-ip)\n               SEND_IP=\"$2\"\n               shift 2\n               ;;\n         --neuron-recv-ip)\n               RECV_IP=\"$2\"\n               shift 2\n               ;;\n         *)\n               echo \"Unknown parameter: $1\"\n               echo \"Usage: $0 --tp-degree <value> --batch-size <value> --model-path <path> \\\n                              --compiled-model-path <path> --send-ip <ip> --recv-ip <ip>\"\n               exit 1\n               ;;\n      esac\n   done\n\n   export NEURON_RT_ASYNC_SENDRECV_BOOTSTRAP_PORT=45645\n   export NEURON_RT_ASYNC_SENDRECV_EXPERIMENTAL_ENABLED=1\n   export NEURON_COMPILED_ARTIFACTS=\"$COMPILED_MODEL_PATH\"\n   export NEURON_SEND_IP=\"$SEND_IP\"\n   export NEURON_RECV_IP=\"$RECV_IP\"\n   export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=2\n\n   if [ \"$SEND\" = \"1\" ]; then\n      PORT=8100\n      if [ \"$SINGLE_INSTANCE\" = \"1\" ]; then\n         export NEURON_RT_VISIBLE_CORES=0-31\n      fi\n      TRANSFER_CONFIG='{\n               \"kv_connector\":\"NeuronConnector\",\n               \"kv_buffer_device\":\"cpu\",\n               \"kv_role\":\"kv_producer\",\n               \"kv_rank\":0,\n               \"kv_parallel_size\":2,\n               \"kv_buffer_size\":2e11,\n               \"kv_ip\":\"'\"$NEURON_SEND_IP\"'\",\n               \"neuron_core_offset\": 0\n         }'\n      \n   else\n      PORT=8200\n      if [ \"$SINGLE_INSTANCE\" = \"1\" ]; then\n         NC_OFFSET=32\n         export NEURON_RT_VISIBLE_CORES=32-63\n      else   \n         NC_OFFSET=0\n      fi\n      TRANSFER_CONFIG='{\n               \"kv_connector\":\"NeuronConnector\",\n               \"kv_buffer_device\":\"cpu\",\n               \"kv_role\":\"kv_consumer\",\n               \"kv_rank\":1,\n               \"kv_parallel_size\":2,\n               \"kv_buffer_size\":2e11,\n               \"kv_ip\":\"'\"$NEURON_SEND_IP\"'\",\n               \"neuron_core_offset\": \"'\"$NC_OFFSET\"'\"\n         }'\n   fi\n\n   python3 -m vllm.entrypoints.openai.api_server \\\n         --model \"$MODEL_PATH\" \\\n         --max-num-seqs \"$BATCH_SIZE\" \\\n         --max-model-len 8192 \\\n         --tensor-parallel-size \"$TP_DEGREE\" \\\n         --device neuron \\\n         --use-v2-block-manager \\\n         --override-neuron-config \"{}\" \\\n     
    --kv-transfer-config \"$TRANSFER_CONFIG\" \\\n         --port \"$PORT\"\n\n\nYou may need multiple terminals to run the following commands.\n\nFor multi-instance choose one instance to be your prefill instance and\none instance to be your decode instance. Get the IP addresses of them by running\n``hostname -i`` and use them in the commands below. Single instance can use ``127.0.0.1``\nas the IP address since prefill and decode always run on the same instance.\n\nMulti-Instance\n---------------\n\nTo launch a prefill server for multi-instance run: \n\n::\n\n   SEND=1 ./server.sh --tp-degree 64 --batch-size 4 \\\n                      --model-path path/to/your/downloaded/model \\\n                      --compiled-model-path di_traced_model_tp64_b4/ \\\n                      --neuron-send-ip prefill_ip --neuron-recv-ip decode_ip\n\nTo launch a decode server open up a new tab and run: \n\n::\n\n   ./server.sh --tp-degree 64 --batch-size 4 \\\n               --model-path path/to/your/downloaded/model \\\n               --compiled-model-path di_traced_model_tp64_b4/  \\\n               --neuron-send-ip prefill_ip --neuron-recv-ip decode_ip\n\n\nSingle-Instance\n---------------\nTo launch a prefill server for single-instance run: \n\n::\n\n   SEND=1 SINGLE_INSTANCE=1 ./server.sh --tp-degree 32 --batch-size 4 \\\n                                        --model-path path/to/your/downloaded/model \\\n                                        --compiled-model-path di_traced_model_tp32_b4/ \\\n                                        --neuron-send-ip 127.0.0.1 --neuron-recv-ip 127.0.0.1\n\n\nTo launch a decode server open up a new tab and run: \n\n::\n\n   SINGLE_INSTANCE=1 ./server.sh --tp-degree 32 --batch-size 4 \\\n                                 --model-path path/to/your/downloaded/model \\\n                                 --compiled-model-path di_traced_model_tp32_b4/ \\\n                                 --neuron-send-ip 127.0.0.1 --neuron-recv-ip 127.0.0.1\n\n\n\nWhen you see the line ``INFO:     Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit)``\non your prefill and decode server tabs your servers are ready.\n\nLaunch a Router (Proxy Server)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBoth servers need to receive a request to run inference. The component that does this job is called the \nrouter as mentioned in :ref:`DI Developer Guide<nxdi-disaggregated-inference>`.\nWe offer an implementation of a router called the ``neuron-proxy-server``.\nThe ``neuron-proxy-server`` is an entrypoint in our fork of vLLM which launches a proxy server that\nwill take a request and forward it to both the prefill and decode servers. It will \nthen capture their responses and format them back to the user. \n\nThe implementation of the neuron-proxy-server can be found \n`here <https://github.com/aws-neuron/upstreaming-to-vllm/tree/neuron-2.24-vllm-v0.7.2/vllm/neuron_immediate_first_token_proxy_server.py>`_.\n\n\nFor multi-instance run the router as another process on your prefill instance. 
\nFor single-instance run the router as another process on your Trn2.\n\nA router can run on any instance that has a connection to both the prefill and decode nodes.\nFor multi-instance 1P1D, it makes the most sense to have the router on the prefill node to reduce network latency.\n\nLaunch the proxy server by running:\n\n::\n\n   pip install quart # only install one time\n   neuron-proxy-server --prefill-ip your_prefill_ip --decode-ip your_decode_ip --prefill-port 8100 --decode-port 8200\n\nThe proxy server is ready when you see the line ``INFO:hypercorn.error:Running on http://127.0.0.1:8000 (CTRL + C to quit)``\n\nTest the DI Setup\n~~~~~~~~~~~~~~~~~\n\nRun a sanity check to see if you DI setup is working by sending a curl request to the ``neuron-proxy-server``:\n\n::\n\n   curl -s http://localhost:8000/v1/completions \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n      \"model\": \"path/to/your/downloaded/model\",\n      \"prompt\": [\"a tornado is a\"],\n      \"max_tokens\": 10,\n      \"temperature\": 0\n      }'\n\nA successful response looks like:\n``{\"id\": ... :[{\"index\":0,\"text\":\" rotating column of air that forms during severe thunderstorms\" ... }``\n\nThe ``neuron-proxy-server`` also supports the streaming of responses. It can be tested by:\n\n::\n\n   curl -s http://localhost:8000/v1/completions \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n      \"model\": \"path/to/your/downloaded/model\",\n      \"prompt\": [\"a tornado is a\"],\n      \"max_tokens\": 10,\n      \"temperature\": 0,\n      \"stream\": true\n      }'\n\n\nBenchmark the DI Setup\n~~~~~~~~~~~~~~~~~~~~~~\n\nInstall LLMPerf\n---------------\n\nWe will use `LLMPerf <https://github.com/ray-project/llmperf>`_ to measure the performance.\n\nLLMPerf will send requests to the ``neuron-proxy-server`` and capture data including Time To First Token,\nInter Token Latency and throughput.\n\nInstall llmperf into the ``aws_neuronx_venv_pytorch_2_7_nxd_inference`` virtual environment.\n\nFor multi-instance LLMperf is only required to be installed on the prefill instance where you will run benchmarking.\n\n::\n\n    git clone https://github.com/ray-project/llmperf.git\n    cd llmperf\n    pip install -e .    \n\nOnce you have installed LLMPerf, apply the ``neuron_perf.patch`` as described in :ref:`llm-inference-benchmarking`. 
\n\nNext, use the ``llmperf.sh`` script to run benchmarks.\n\n::\n\n   #!/bin/bash\n   # copy and paste me into a file called llmperf.sh\n   # then run chmod +x llmperf.sh\n\n   # Set environment variables\n   export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n   export OPENAI_API_KEY=\"mock_key\"\n\n   python llmperf/token_benchmark_ray.py \\\n      --model=$MODEL_PATH \\\n      --tokenizer=$MODEL_PATH \\\n      --mean-input-tokens=1024 \\\n      --stddev-input-tokens=0 \\\n      --mean-output-tokens=100 \\\n      --stddev-output-tokens=10 \\\n      --max-num-completed-requests=200 \\\n      --timeout=1720000 \\\n      --num-concurrent-requests=4 \\\n      --results-dir=llmperf_results \\\n      --llm-api=openai \\\n      --additional-sampling-params \"{\\\"top_k\\\": 50, \\\"top_p\\\": 0.9, \\\"temperature\\\": 0.7}\"\n\nSince the ``llmperf.sh`` script sends requests to localhost, it should be run on the same instance\nthe router is running on.\n\nFor multi-instance, that means running it as a separate process on your prefill instance.\nFor single-instance, that means running it as a separate process on your Trn2 instance.\n\n::\n\n   MODEL_PATH=path/to/your/downloaded/model ./llmperf.sh\n\nThis will run a total of 200 requests, and your final output should have the line:\n``Completed Requests Per Minute: xx.xxxxxxx``. Scroll up to see metrics such as\nInter Token Latency and Time To First Token.\n\n\nBenchmark a Non-DI Continuous Batching Setup for Comparison\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo compare Disaggregated Inference against non-DI continuous batching,\nwe will run benchmarks without Disaggregated Inference.\n\nFirst, kill all DI servers. Then kill the ``neuron-proxy-server``.\n\nWe will run the same compiled model as a single server for non-DI benchmarks.\nFor single-instance non-DI benchmarking, we will start one TP=32 server. For multi-instance non-DI\nbenchmarking, we will start one TP=64 server. This means you do not need your second (decode) instance for this step.\nLatency can be compared directly between DI and non-DI benchmarks. You might need to adjust the throughput-related\nmetrics based on the number of instances to compare apples-to-apples between DI and non-DI. 
\nIn this case, Non-DI throughput should be doubled before comparing with DI as the non-DI benchmark uses half the amount of hardware.\n\nUse the ``baseline_server.sh`` to launch a vLLM server without DI.\n\n::\n\n   #!/bin/bash\n   # copy and paste me into a file called baseline_server.sh\n   # then run chmod +x baseline_server.sh\n\n   #!/bin/bash\n\n   # Parse command line arguments\n   while [[ $# -gt 0 ]]; do\n      case $1 in\n         --tp-degree)\n               TP_DEGREE=\"$2\"\n               shift 2\n               ;;\n         --batch-size)\n               BATCH_SIZE=\"$2\"\n               shift 2\n               ;;\n         --model-path)\n               MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         --compiled-model-path)\n               COMPILED_MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         *)  \n               echo \"Unknown parameter: $1\"\n               echo \"Usage: $0 --tp-degree <value> --batch-size <value> --model-path <path> \\\n                              --compiled-model-path <path>\"\n               exit 1\n               ;;\n      esac\n   done\n\n   export NEURON_COMPILED_ARTIFACTS=\"$COMPILED_MODEL_PATH\"\n   export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=2\n\n   if [ \"$SINGLE_INSTANCE\" = \"1\" ]; then\n      NEURON_RT_VISIBLE_CORES=0-31\n   fi\n\n   python3 -m vllm.entrypoints.openai.api_server \\\n         --model \"$MODEL_PATH\" \\\n         --max-num-seqs \"$BATCH_SIZE\" \\\n         --max-model-len 8192 \\\n         --tensor-parallel-size \"$TP_DEGREE\" \\\n         --device neuron \\\n         --use-v2-block-manager \\\n         --override-neuron-config \"{}\" \\\n         --port 8000\n\n\nMulti-Instance\n---------------\nLaunch for multi-instance with:\n\n::\n   \n   ./baseline_server.sh --tp-degree 64 --batch-size 4 \\\n                        --model-path path/to/your/downloaded/model \\\n                        --compiled-model-path di_traced_model_tp64_b4/\n\n\nSingle-Instance\n---------------\nLaunch for single-instance with:\n\n::\n   \n   SINGLE_INSTANCE=1 ./baseline_server.sh --tp-degree 32 --batch-size 4 \\\n                                          --model-path path/to/your/downloaded/model \\\n                                          --compiled-model-path di_traced_model_tp32_b4/\n\nNow we have a server launched with the same underlying model but with DI turned off.\n\nThen on the same instance run llmperf which will now directly send requests to the server\ninstead of going through a proxy:\n\n::\n\n   MODEL_PATH=path/to/your/downloaded_model ./llmperf.sh \n\nThis will run a total of 200 requests and your final output should have the line:\n``Completed Requests Per Minute: xx.xxxxxxx``. Scroll up to see metrics such as\nInter Token Latency and Time To First Token.\n\n\nKnown Issues\n~~~~~~~~~~~~\n\n``ENC:kv_store_acquire_file_lock   Failed to open kv store server lock file Permission denied`` \nusually means that another user on the system ran a DI workload and left behind a lock file\nthat the current user does not have access to. The solution is to delete ``/tmp/nrt_kv_store_server.lock`` file."
  },
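  {
    "path": "libraries/nxd-inference/tutorials/examples/di_streaming_client_example.py",
    "content": "# Illustrative sketch: a Python client for the streaming test in the 1P1D tutorial above. It\n# sends the same request as the streaming ``curl`` example to the ``neuron-proxy-server`` on\n# localhost:8000 and prints tokens as they arrive. The server-sent-events framing (``data:``\n# lines ending with ``data: [DONE]``) follows the OpenAI-compatible API; adjust the model path\n# to your downloaded checkpoint.\nimport json\n\nimport requests\n\npayload = {\n    \"model\": \"path/to/your/downloaded/model\",\n    \"prompt\": [\"a tornado is a\"],\n    \"max_tokens\": 10,\n    \"temperature\": 0,\n    \"stream\": True,\n}\n\nwith requests.post(\"http://localhost:8000/v1/completions\", json=payload, stream=True, timeout=300) as r:\n    r.raise_for_status()\n    for line in r.iter_lines(decode_unicode=True):\n        if not line or not line.startswith(\"data: \"):\n            continue\n        data = line[len(\"data: \"):]\n        if data == \"[DONE]\":\n            break\n        chunk = json.loads(data)\n        print(chunk[\"choices\"][0][\"text\"], end=\"\", flush=True)\nprint()\n"
  },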
  {
    "path": "libraries/nxd-inference/tutorials/disaggregated-inference-tutorial.rst",
    "content": ".. _nxdi-disaggregated-inference-tutorial:\n\nTutorial: Disaggregated Inference [BETA]\n================================================\n\nOverview\n~~~~~~~~\n\nThis tutorial shows how to run Disaggregated Inference (DI) using prefill and decode vLLM workers. You'll learn how to set up both worker types and scale from a basic 1P1D setup to larger configurations. The guide includes benchmarks that show how DI improves performance compared to standard inference, especially for long input sequences.\n\nDI splits work between prefill workers and decode workers. Each worker needs:\n\n* A Trn1 or Trn2 instance \n* Neuron SDK\n* A supported vLLM version\n* Elastic Fabric Adapter (EFA)\n\nDI also needs a proxy server to manage traffic between workers and an etcd service for worker registration. You can run these on a basic EC2 instance like an M-series.\n\nFor more details, see the :ref:`DI Developer Guide<nxdi-disaggregated-inference>`.\n\n.. note::\n\n  This tutorial works with trn2.48xlarge and trn1.32xlarge instances.\n\nBefore You Begin  \n~~~~~~~~~~~~~~~~\n\nYou need:\n\n* A Trn1 or Trn2 instance with Neuron SDK, Neuron vLLM, and EFA enabled (see :ref:`nxdi-setup`)\n* An m5.xlarge instance with Ubuntu or Amazon Linux\n\n.. note::\n   DI only works on Neuron instances with EFA (trn1.32xlarge or trn2.48xlarge). You need EFA even for single-instance setups.\n\n.. tip::\n   Use the AWS Neuron Deep Learning Container (DLC) to avoid manual setup. We'll use the vllm-inference-neuronx DLC in this guide.\n\nSelect and Compile Your Model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDI works best with large models that have billions of parameters. We'll use ``meta-llama/Llama-3.3-70B-Instruct`` as an example. First, compile your model following the :ref:`nxdi-trn2-llama3.3-70b-dp-tutorial` guide. Make sure to set the correct input shapes and tensor parallelism.\n\nSet Up the etcd Server\n~~~~~~~~~~~~~~~~~~~~~~~\n\n1. Connect to your EC2 proxy instance using SSH or Session Manager\n2. Run these commands:\n\n.. code-block:: bash\n\n   sudo su - ubuntu\n   HOST_IP=$(hostname -i | awk '{print $1}')\n\n   # Remove old containers\n   docker rm -f etcd proxy 2>/dev/null || true\n\n   # Start etcd\n   docker run -d \\\n     --name etcd \\\n     --shm-size=10g \\\n     --privileged \\\n     -p 8989:8989 \\\n     -e ETCD_IP=$HOST_IP \\\n     ubuntu:22.04 \\\n     bash -c \"apt-get update && apt-get install -y etcd && \\\n              exec etcd \\\n                --data-dir=/etcd-data \\\n                --listen-client-urls=http://0.0.0.0:8989 \\\n                --advertise-client-urls=http://\\$ETCD_IP:8989 \\\n                --listen-peer-urls=http://127.0.0.1:21323 \\\n                --initial-advertise-peer-urls=http://127.0.0.1:21323 \\\n                --initial-cluster=default=http://127.0.0.1:21323 \\\n                --name=default\"\n\n   # Start proxy\n   docker run -d \\\n     --name proxy \\\n     --shm-size=10g \\\n     --privileged \\\n     -p 8000:8000 \\\n     -e ETCD_IP=$HOST_IP \\\n     -e ETCD_PORT=8989 \\\n     public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py310-sdk2.25.1-ubuntu22.04 \\\n     bash -c \"exec neuron-proxy-server --etcd \\$ETCD_IP:\\$ETCD_PORT\"\n\nVerify both services are running:\n\n.. code-block:: bash\n\n   docker ps\n\nStart the Prefill Server\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRun these commands:\n\n.. 
code-block:: bash\n\n   sudo su - ubuntu\n   export MODEL=\"meta-llama/Llama-3.3-70B-Instruct\"\n   export VLLM_BATCH=8\n   export MAX_LEN=8192\n   export ETCD=\"${HOST_IP}:8989\"\n   export PORT=8000\n\n   # Remove old container\n   docker rm -f prefill-vllm-server1 2>/dev/null || true\n\n   # Start prefill server\n   docker run -d \\\n     --name prefill-vllm-server1 \\\n     --privileged \\\n     --device /dev/infiniband/uverbs0 \\\n     --shm-size=10g \\\n     -p ${PORT}:${PORT} \\\n     -e MODEL \\\n     -e VLLM_BATCH \\\n     -e MAX_LEN \\\n     -e ETCD \\\n     -e PORT \\\n     public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py310-sdk2.25.1-ubuntu22.04 \\\n     bash -c \"exec python3 -m vllm.entrypoints.openai.api_server \\\n       --model \\$MODEL \\\n       --max-num-seqs \\$VLLM_BATCH \\\n       --max-model-len \\$MAX_LEN \\\n       --tensor-parallel-size 64 \\\n       --device neuron \\\n       --speculative-max-model-len \\$MAX_LEN \\\n       --override-neuron-config '{}' \\\n       --kv-transfer-config '{\\\"kv_connector\\\":\\\"NeuronConnector\\\",\\\"kv_role\\\":\\\"kv_producer\\\",\\\"kv_buffer_size\\\":2e11,\\\"etcd\\\":\\\"\\$ETCD\\\"}' \\\n       --port \\$PORT\"\n\nNote: The prefill server uses ``kv_role:kv_producer`` in its configuration.\n\nStart the Decode Server\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRun similar commands for the decode server:\n\n.. code-block:: bash\n\n   sudo su - ubuntu\n   export MODEL=\"meta-llama/Llama-3.3-70B-Instruct\"\n   export VLLM_BATCH=8\n   export MAX_LEN=8192\n   export ETCD=\"${HOST_IP}:8989\"\n   export PORT=8000\n\n   # Remove old container\n   docker rm -f decode-vllm-server1 2>/dev/null || true\n\n   # Start decode server\n   docker run -d \\\n     --name decode-vllm-server1 \\\n     --privileged \\\n     --device /dev/infiniband/uverbs0 \\\n     --shm-size=10g \\\n     -p ${PORT}:${PORT} \\\n     -e MODEL \\\n     -e VLLM_BATCH \\\n     -e MAX_LEN \\\n     -e ETCD \\\n     -e PORT \\\n     public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py310-sdk2.25.1-ubuntu22.04 \\\n     bash -c \"exec python3 -m vllm.entrypoints.openai.api_server \\\n       --model \\$MODEL \\\n       --max-num-seqs \\$VLLM_BATCH \\\n       --max-model-len \\$MAX_LEN \\\n       --tensor-parallel-size 64 \\\n       --device neuron \\\n       --speculative-max-model-len \\$MAX_LEN \\\n       --override-neuron-config '{}' \\\n       --kv-transfer-config '{\\\"kv_connector\\\":\\\"NeuronConnector\\\",\\\"kv_role\\\":\\\"kv_consumer\\\",\\\"kv_buffer_size\\\":2e11,\\\"etcd\\\":\\\"\\$ETCD\\\"}' \\\n       --port \\$PORT\"\n\nNote: The decode server uses ``kv_role:kv_consumer`` in its configuration.\n\nTest Your Setup\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTest your DI setup with this simple request:\n\n.. code-block:: bash\n\n   curl -s http://localhost:8000/v1/completions \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n      \"model\": \"meta-llama/Llama-3.3-70B-Instruct\",\n      \"prompt\": [\"a tornado is a\"],\n      \"max_tokens\": 10,\n      \"temperature\": 0\n      }'\n\nScale Your Setup\n~~~~~~~~~~~~~~~~~~~~~\n\nTo add more capacity:\n\n1. Launch additional prefill workers when you need more compute power\n2. Launch additional decode workers when you need more memory\n3. Workers can run on the same instance or different ones\n4. New workers automatically register with etcd\n5. 
The proxy automatically routes traffic to all workers\n\nBenchmark Your Setup\n~~~~~~~~~~~~~~~~~~~~~~\n\nInstall LLMPerf\n---------------\n\n1. Get LLMPerf:\n\n.. code-block:: bash\n\n   git clone https://github.com/ray-project/llmperf.git\n   cd llmperf\n   pip install -e .    \n\n2. Apply the ``neuron_perf.patch`` as shown in :ref:`llm-inference-benchmarking`\n\n3. Create this benchmark script (``llmperf.sh``):\n\n.. code-block:: bash\n\n   #!/bin/bash\n   export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n   export OPENAI_API_KEY=\"mock_key\"\n\n   python llmperf/token_benchmark_ray.py \\\n      --model=$MODEL_PATH \\\n      --tokenizer=$MODEL_PATH \\\n      --mean-input-tokens=1024 \\\n      --stddev-input-tokens=0\\\n      --mean-output-tokens=100 \\\n      --stddev-output-tokens=10 \\\n      --max-num-completed-requests=200 \\\n      --timeout=1720000 \\\n      --num-concurrent-requests=4 \\\n      --results-dir=llmperf_results \\\n      --llm-api=openai \\\n      --additional-sampling-params \"{\\\"top_k\\\": 50, \\\"top_p\\\": 0.9, \\\"temperature\\\": 0.7}\"\n\n4. Run the benchmark:\n\n.. code-block:: bash\n\n   MODEL_PATH=path/to/your/downloaded/model ./llmperf.sh \n\nCompare with Standard Inference\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo benchmark without DI:\n\n1. Stop all DI servers and the proxy\n2. Create this script (``baseline_server.sh``):\n\n.. code-block:: bash\n\n   #!/bin/bash\n   while [[ $# -gt 0 ]]; do\n      case $1 in\n         --tp-degree)\n               TP_DEGREE=\"$2\"\n               shift 2\n               ;;\n         --batch-size)\n               BATCH_SIZE=\"$2\"\n               shift 2\n               ;;\n         --model-path)\n               MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         --compiled-model-path)\n               COMPILED_MODEL_PATH=\"$2\"\n               shift 2\n               ;;\n         *)  \n               echo \"Unknown parameter: $1\"\n               exit 1\n               ;;\n      esac\n   done\n\n   export NEURON_COMPILED_ARTIFACTS=\"$COMPILED_MODEL_PATH\"\n   export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=2\n\n   if [ \"$SINGLE_INSTANCE\" = \"1\" ]; then\n      NEURON_RT_VISIBLE_CORES=0-31\n   fi\n\n   python3 -m vllm.entrypoints.openai.api_server \\\n         --model \"$MODEL_PATH\" \\\n         --max-num-seqs \"$BATCH_SIZE\" \\\n         --max-model-len 8192 \\\n         --tensor-parallel-size \"$TP_DEGREE\" \\\n         --device neuron \\\n         --use-v2-block-manager \\\n         --override-neuron-config \"{}\" \\\n         --port 8000\n\n3. Run for multi-instance:\n\n.. code-block:: bash\n   \n   ./baseline_server.sh --tp-degree 64 --batch-size 4 \\\n                        --model-path path/to/your/downloaded/model \\\n                        --compiled-model-path di_traced_model_tp64_b4/\n\nOr for single-instance:\n\n.. code-block:: bash\n   \n   SINGLE_INSTANCE=1 ./baseline_server.sh --tp-degree 32 --batch-size 4 \\\n                                          --model-path path/to/your/downloaded/model \\\n                                          --compiled-model-path di_traced_model_tp32_b4/\n\n4. Run the benchmark:\n\n.. code-block:: bash\n\n   MODEL_PATH=path/to/your/downloaded_model ./llmperf.sh \n\nKnown Issues\n~~~~~~~~~~~~\n\nIf you see ``ENC:kv_store_acquire_file_lock Failed to open kv store server lock file Permission denied``, delete the lock file:\n\n.. code-block:: bash\n\n   sudo rm /tmp/nrt_kv_store_server.lock"
  },
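  {
    "path": "libraries/nxd-inference/tutorials/examples/di_readiness_check_example.py",
    "content": "# Illustrative sketch: poll the proxy started in the disaggregated inference tutorial above until\n# the prefill and decode workers can serve a test completion, before starting LLMPerf. The\n# endpoint, model name, and timeout are assumptions; adjust them to your deployment.\nimport time\n\nimport requests\n\nPROXY_URL = \"http://localhost:8000/v1/completions\"\npayload = {\n    \"model\": \"meta-llama/Llama-3.3-70B-Instruct\",\n    \"prompt\": [\"a tornado is a\"],\n    \"max_tokens\": 5,\n    \"temperature\": 0,\n}\n\ndeadline = time.time() + 1800  # wait up to 30 minutes for model load and warmup\nwhile time.time() < deadline:\n    try:\n        r = requests.post(PROXY_URL, json=payload, timeout=120)\n        if r.ok:\n            print(\"DI setup is serving requests:\", r.json()[\"choices\"][0][\"text\"])\n            break\n    except requests.RequestException:\n        pass\n    time.sleep(30)\nelse:\n    raise SystemExit(\"DI setup did not become ready in time\")\n"
  },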
  {
    "path": "libraries/nxd-inference/tutorials/flux-inference-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"6a2a5707\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Generating Images with Black Forest Labs Flux.1-Dev on Trn1/Trn2\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide for generating images using the Flux.1-dev model from Black Forest Labs with NeuronX Distributed (NxD) Inference on a single trn2.48xl instance. This sample specifically generates 1k x 1k images.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"46aaf0d2\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a71ce078\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"plaintext\"\n    }\n   },\n   \"source\": [\n    \"## Background, Concepts, and Optimizations\\n\",\n    \"\\n\",\n    \"### Tensor and Context Parallel\\n\",\n    \"\\n\",\n    \"For the latent transformer model, use a combination of Tensor Parallelism and Context Parallelism. Due to the compute-bound nature of diffusion inference, add additional parallelism by using sharding on the sequence dimension. Sharding is governed by the `world_size` relative to the `backbone_tp_degree`. \\n\",\n    \"\\n\",\n    \"### CFG Parallelism\\n\",\n    \"\\n\",\n    \"Classifier-Free Guidance (CFG) inference runs two forward passes per denoising step: one for the conditional (prompt) input and one for the unconditional (negative prompt) input. CFG Parallelism accelerates this by distributing the two passes across two sets of devices, effectively halving the per-step latency. To enable CFG Parallelism, set `cfg_parallel_enabled=True` and also enable CFG inference via `use_cfg=True` (providing a `negative_prompt`). Like Context Parallelism, CFG Parallelism requires `world_size = 2 × backbone_tp_degree`. 
CFG Parallelism and Context Parallelism are mutually exclusive.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"9d6c47a4\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Setup the environment\\n\",\n    \"### Set up and connect to a trn2.48xlarge instance\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"To use Jupyter Notebook on the Neuron instance, follow the [Jupyter Notebook QuickStart guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\\n\",\n    \"\\n\",\n    \"`source ~/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate`\\n\",\n    \"\\n\",\n    \"Run pip list to verify that the Neuron SDK is installed.\\n\",\n    \"\\n\",\n    \"`pip list | grep neuron`\\n\",\n    \"\\n\",\n    \"You should see Neuron packages including neuronx-distributed-inference and neuronx-cc.\\n\",\n    \"\\n\",\n    \"### Download the model\\n\",\n    \"\\n\",\n    \"To use this sample, you must first download the model checkpoint from HuggingFace to a local path on the Trn2 instance. For more information, see [Download models](https://huggingface.co/docs/hub/en/models-downloading) in the HuggingFace documentation. 
You can download and use [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) for this tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fdce0741\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install matplotlib\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"744c9c85\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import torch\\n\",\n    \"from matplotlib import pyplot as plt\\n\",\n    \"\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.application import NeuronFluxApplication, get_flux_parallelism_config\\n\",\n    \"from neuronx_distributed_inference.models.config import NeuronConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.clip.modeling_clip import CLIPInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.t5.modeling_t5 import T5InferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.modeling_flux import FluxBackboneInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.vae.modeling_vae import VAEDecoderInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\\n\",\n    \"from neuronx_distributed_inference.utils.diffusers_adapter import load_diffusers_config\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2f77e9c9\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Setup Inference Parameters and Model Config\\n\",\n    \"\\n\",\n    \"Start by initializing your inference parameters, which include model parallelism configuration, image sizes and model configuration. Ensure that that `CKPT_DIR` matches the local directory where you downloaded the model in Step 1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"29c0c1fd\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"backbone_tp_degree = 4\\n\",\n    \"dtype = torch.bfloat16\\n\",\n    \"\\n\",\n    \"# Set context_parallel_enabled or cfg_parallel_enabled to True to enable those parallelism modes.\\n\",\n    \"context_parallel_enabled = False\\n\",\n    \"cfg_parallel_enabled = False\\n\",\n    \"\\n\",\n    \"# world_size is derived automatically: backbone_tp_degree * 2 if either parallel mode is enabled, else backbone_tp_degree\\n\",\n    \"world_size = get_flux_parallelism_config(\\n\",\n    \"    backbone_tp_degree,\\n\",\n    \"    context_parallel_enabled=context_parallel_enabled,\\n\",\n    \"    cfg_parallel_enabled=cfg_parallel_enabled,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"height, width = [1024, 1024]\\n\",\n    \"guidance_scale = 3.5\\n\",\n    \"num_inference_steps = 25\\n\",\n    \"prompt = \\\"A robot named trn2\\\"\\n\",\n    \"\\n\",\n    \"# CFG inference parameters. 
Set use_cfg=True and provide a negative_prompt to enable CFG.\\n\",\n    \"# cfg_parallel_enabled above requires use_cfg=True.\\n\",\n    \"use_cfg = False\\n\",\n    \"negative_prompt = \\\"\\\"\\n\",\n    \"true_cfg_scale = 2.0 if use_cfg else 1.0\\n\",\n    \"\\n\",\n    \"# The Ckpt directory root under huggingface\\n\",\n    \"CKPT_DIR = \\\"/shared/models/FLUX.1-dev/\\\"\\n\",\n    \"\\n\",\n    \"# Existing Compiled working directory for the compiler\\n\",\n    \"BASE_COMPILE_WORK_DIR = \\\"/tmp/flux/compiler_workdir/\\\"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"81915c20\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Setup Model and Neuron Configuration\\n\",\n    \"\\n\",\n    \"Here, you initialize the various component model configuration objects for the models within the Flux Pipeline. The Flux pipeline contains CLIP, T5, the backbone transformer and the VAE. For each component model, you can use the following parallelism configuration:\\n\",\n    \"- For CLIP, `tp_degree` of 1\\n\",\n    \"- For T5, `tp_degree` is the same as the `world_size`. In the case of this example, this will be 8.\\n\",\n    \"- For the backbone transformer, if using Context Parallelism or CFG Parallelism, `tp_degree` is half the world size. In the case of this example, this will be 4, which allows for 2 parallel ranks.\\n\",\n    \"- Finally, for the VAE, `tp_degree` of 1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fae08e51\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"text_encoder_path = os.path.join(CKPT_DIR, \\\"text_encoder\\\")\\n\",\n    \"text_encoder_2_path = os.path.join(CKPT_DIR, \\\"text_encoder_2\\\")\\n\",\n    \"backbone_path = os.path.join(CKPT_DIR, \\\"transformer\\\")\\n\",\n    \"vae_decoder_path = os.path.join(CKPT_DIR, \\\"vae\\\")\\n\",\n    \"\\n\",\n    \"clip_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=1,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_dtype=dtype,\\n\",\n    \")\\n\",\n    \"clip_config = CLIPInferenceConfig(\\n\",\n    \"    neuron_config=clip_neuron_config,\\n\",\n    \"    load_config=load_pretrained_config(text_encoder_path),\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"t5_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=world_size,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_dtype=dtype,\\n\",\n    \")\\n\",\n    \"t5_config = T5InferenceConfig(\\n\",\n    \"    neuron_config=t5_neuron_config,\\n\",\n    \"    load_config=load_pretrained_config(text_encoder_2_path),\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"backbone_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=backbone_tp_degree,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_type=dtype,\\n\",\n    \")\\n\",\n    \"backbone_config = FluxBackboneInferenceConfig(\\n\",\n    \"    cfg_parallel_enabled=cfg_parallel_enabled,\\n\",\n    \"    context_parallel_enabled=context_parallel_enabled,\\n\",\n    \"    neuron_config=backbone_neuron_config,\\n\",\n    \"    load_config=load_diffusers_config(backbone_path),\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"decoder_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=1,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_type=dtype,\\n\",\n    \")\\n\",\n    \"decoder_config = VAEDecoderInferenceConfig(\\n\",\n    \"    neuron_config=decoder_neuron_config,\\n\",\n    \"    
load_config=load_diffusers_config(vae_decoder_path),\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \"    transformer_in_channels=backbone_config.in_channels,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"setattr(\\n\",\n    \"    backbone_config,\\n\",\n    \"    \\\"vae_scale_factor\\\",\\n\",\n    \"    decoder_config.vae_scale_factor,\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"870692df\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 4: Initialize the Flux Application and Compile\\n\",\n    \"\\n\",\n    \"Now you instantiate the `NeuronFluxApplication` which contains the pipeline orchestration logic, as well as the various component models. You then compile the application, which then compiles each component model individually.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"bd1f472a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"flux_app = NeuronFluxApplication(\\n\",\n    \"    model_path=CKPT_DIR,\\n\",\n    \"    text_encoder_config=clip_config,\\n\",\n    \"    text_encoder2_config=t5_config,\\n\",\n    \"    backbone_config=backbone_config,\\n\",\n    \"    decoder_config=decoder_config,\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"flux_app.compile(BASE_COMPILE_WORK_DIR)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4389cc4d\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 5: Load Model\\n\",\n    \"This step loads the compiled model (NEFF), along with the model weights into device memory. Specifically, calling load on the flux_app loads all the individual component models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"6c271fe2\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"flux_app.load(BASE_COMPILE_WORK_DIR)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d5d3d857\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 6: Generate an Image\\n\",\n    \"\\n\",\n    \"Finally, you will generate a singular image and render it:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"8c6e5781\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"image = flux_app(\\n\",\n    \"    prompt,\\n\",\n    \"    negative_prompt=negative_prompt if use_cfg else None,\\n\",\n    \"    true_cfg_scale=true_cfg_scale,\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \"    guidance_scale=guidance_scale,\\n\",\n    \"    num_inference_steps=num_inference_steps\\n\",\n    \").images[0]\\n\",\n    \"plt.imshow(image)\\n\",\n    \"plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d1db69b0\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Notes \\n\",\n    \"### Running Flux Inference on trn1\\n\",\n    \"\\n\",\n    \"This sample can also be deployed to a trn1.32xlarge with a few modifications. 
modifications. If you are using Context Parallelism specifically, then apply the following parallelism configuration:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"world_size = 16\\n\",\n    \"backbone_tp_degree = 8\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Otherwise, use the following:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"world_size = 8\\n\",\n    \"backbone_tp_degree = 8\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"### Using CFG Parallelism\\n\",\n    \"\\n\",\n    \"To enable CFG Parallelism, set both `use_cfg=True` and `cfg_parallel_enabled=True` in Step 2. You must also provide a `negative_prompt`. CFG Parallelism and Context Parallelism are mutually exclusive — only one can be enabled at a time.\\n\",\n    \"\\n\",\n    \"When `cfg_parallel_enabled=True`, `world_size` is automatically set to `2 × backbone_tp_degree` by `get_flux_parallelism_config`.\\n\",\n    \"\\n\",\n    \"For **trn2.48xlarge**:\\n\",\n    \"```\\n\",\n    \"backbone_tp_degree = 4  # world_size will be 8\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"For **trn1.32xlarge**:\\n\",\n    \"```\\n\",\n    \"backbone_tp_degree = 8  # world_size will be 16\\n\",\n    \"```\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"venv\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.19\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/flux-inpainting-inference-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"6a2a5707\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Inpainting Images with Black Forest Labs Flux.1-Fill-Dev on Trn1/Trn2\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide for inpainting/outpainting images using the Flux.1-Fill-dev model from Black Forest Labs with NeuronX Distributed (NxD) Inference on a single trn2.48xl instance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a71ce078\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"plaintext\"\n    }\n   },\n   \"source\": [\n    \"## Background, Concepts, and Optimizations\\n\",\n    \"\\n\",\n    \"### Tensor and Context Parallelism\\n\",\n    \"\\n\",\n    \"For the latent transformer model, use a combination of Tensor Parallelism and Context Parallelism. Due to the compute-bound nature of diffusion inference, add additional parallelism by using sharding on the sequence dimension. Sharding is governed by the `world_size` relative to the `backbone_tp_degree`. \\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"9d6c47a4\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Setup the environment\\n\",\n    \"### Set up and connect to a trn2.48xlarge instance\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"To use a Jupyter Notebook (`.ipynb`) on the Neuron instance, follow the [Jupyter Notebook QuickStart guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\\n\",\n    \"\\n\",\n    \"`source ~/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate`\\n\",\n    \"\\n\",\n    \"Run pip list to verify that the Neuron SDK is installed.\\n\",\n    \"\\n\",\n    \"`pip list | grep neuron`\\n\",\n    \"\\n\",\n    \"You should see Neuron packages including neuronx-distributed-inference and neuronx-cc.\\n\",\n    \"\\n\",\n    \"### Download the model\\n\",\n    \"\\n\",\n    \"To use this sample, you must first download the model checkpoint from HuggingFace to a local path on the Trn2 instance. For more information, see [Download models](https://huggingface.co/docs/hub/en/models-downloading) in the HuggingFace documentation. 
You can download and use [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev) for this tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fdce0741\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install matplotlib\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"744c9c85\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import torch\\n\",\n    \"from matplotlib import pyplot as plt\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.application import NeuronFluxApplication\\n\",\n    \"from neuronx_distributed_inference.models.config import NeuronConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.clip.modeling_clip import CLIPInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.t5.modeling_t5 import T5InferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.modeling_flux import FluxBackboneInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.models.diffusers.flux.vae.modeling_vae import VAEDecoderInferenceConfig\\n\",\n    \"from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\\n\",\n    \"from neuronx_distributed_inference.utils.diffusers_adapter import load_diffusers_config\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2f77e9c9\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Setup Inference Parameters and Model Config\\n\",\n    \"\\n\",\n    \"Start by initializing your inference parameters, which include model parallelism configuration, image sizes and model configuration. Ensure that `CKPT_DIR` matches the local directory where you downloaded the model in Step 1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"29c0c1fd\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"world_size = 8\\n\",\n    \"backbone_tp_degree = 4\\n\",\n    \"dtype = torch.bfloat16\\n\",\n    \"\\n\",\n    \"height, width = [1024, 1024]\\n\",\n    \"guidance_scale = 3.5\\n\",\n    \"num_inference_steps = 25\\n\",\n    \"prompt = \\\"Milky way galaxy in space\\\"\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# The Ckpt directory root under huggingface\\n\",\n    \"CKPT_DIR = \\\"/shared/models/FLUX.1-Fill-dev/\\\"\\n\",\n    \"\\n\",\n    \"# Existing Compiled working directory for the compiler\\n\",\n    \"BASE_COMPILE_WORK_DIR = \\\"/tmp/flux/compiler_workdir/\\\"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"81915c20\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Setup Model and Neuron Configuration\\n\",\n    \"\\n\",\n    \"Here, you initialize the various component model configuration objects for the models within the Flux Pipeline. The Flux pipeline contains CLIP, T5, the backbone transformer and the VAE. For each component model, you can use the following parallelism configuration:\\n\",\n    \"- For CLIP, `tp_degree` of 1\\n\",\n    \"- For T5, `tp_degree` is the same as the `world_size`. In the case of this example, this will be 8.\\n\",\n    \"- For the backbone transformer, if using Context Parallelism, `tp_degree` is half the world size. 
In the case of this example, this will be 4, which allows for 2 CP ranks.\\n\",\n    \"- Finally, for the VAE, `tp_degree` of 1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fae08e51\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"text_encoder_path = os.path.join(CKPT_DIR, \\\"text_encoder\\\")\\n\",\n    \"text_encoder_2_path = os.path.join(CKPT_DIR, \\\"text_encoder_2\\\")\\n\",\n    \"backbone_path = os.path.join(CKPT_DIR, \\\"transformer\\\")\\n\",\n    \"vae_decoder_path = os.path.join(CKPT_DIR, \\\"vae\\\")\\n\",\n    \"\\n\",\n    \"clip_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=1,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_dtype=dtype,\\n\",\n    \")\\n\",\n    \"clip_config = CLIPInferenceConfig(\\n\",\n    \"    neuron_config=clip_neuron_config,\\n\",\n    \"    load_config=load_pretrained_config(text_encoder_path),\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"t5_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=world_size,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_dtype=dtype,\\n\",\n    \")\\n\",\n    \"t5_config = T5InferenceConfig(\\n\",\n    \"    neuron_config=t5_neuron_config,\\n\",\n    \"    load_config=load_pretrained_config(text_encoder_2_path),\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"backbone_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=backbone_tp_degree,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_type=dtype,\\n\",\n    \")\\n\",\n    \"backbone_config = FluxBackboneInferenceConfig(\\n\",\n    \"    neuron_config=backbone_neuron_config,\\n\",\n    \"    load_config=load_diffusers_config(backbone_path),\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"decoder_neuron_config = NeuronConfig(\\n\",\n    \"    tp_degree=1,\\n\",\n    \"    world_size=world_size,\\n\",\n    \"    torch_type=dtype,\\n\",\n    \")\\n\",\n    \"decoder_config = VAEDecoderInferenceConfig(\\n\",\n    \"    neuron_config=decoder_neuron_config,\\n\",\n    \"    load_config=load_diffusers_config(vae_decoder_path),\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"setattr(\\n\",\n    \"    backbone_config,\\n\",\n    \"    \\\"vae_scale_factor\\\",\\n\",\n    \"    decoder_config.vae_scale_factor,\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"870692df\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 4: Initialize the Flux Application and Compile\\n\",\n    \"\\n\",\n    \"Now you instantiate the `NeuronFluxApplication` which contains the pipeline orchestration logic, as well as the various component models. 
You then compile the application, which then compiles each component model individually.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"bd1f472a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"flux_app = NeuronFluxApplication(\\n\",\n    \"    model_path=CKPT_DIR,\\n\",\n    \"    text_encoder_config=clip_config,\\n\",\n    \"    text_encoder2_config=t5_config,\\n\",\n    \"    backbone_config=backbone_config,\\n\",\n    \"    decoder_config=decoder_config,\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"flux_app.compile(BASE_COMPILE_WORK_DIR)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4389cc4d\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 5: Load Model\\n\",\n    \"This step loads the compiled model (NEFF), along with the model weights into device memory. Specifically, calling load on the flux_app loads all the individual component models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"6c271fe2\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"flux_app.load(BASE_COMPILE_WORK_DIR)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bceb5666\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 6: Load the Image and Mask\\n\",\n    \"\\n\",\n    \"Load the image and mask which denotes the area that has to be filled in adherence to the prompt. The `cat.png` and `mask.png` are taken from COCO dataset (https://cocodataset.org/#explore?id=261706). Ensure that the images are in the same directory as the notebook.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"98fb15fa\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from diffusers.utils import load_image\\n\",\n    \"from PIL import Image\\n\",\n    \"\\n\",\n    \"def load_and_resize_image(image_path: str, height: int, width: int) -> Image.Image:\\n\",\n    \"    \\\"\\\"\\\"Load an image from a file path and resize it to the specified dimensions.\\\"\\\"\\\"\\n\",\n    \"    image = load_image(image_path)\\n\",\n    \"    return image.resize((width, height), Image.Resampling.LANCZOS)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"image = load_and_resize_image('./cat.png', height, width)\\n\",\n    \"mask_image = load_and_resize_image('./mask.png', height, width)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d5d3d857\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 7: Generate Fill Image using the model\\n\",\n    \"\\n\",\n    \"Finally, you will fill the masked-region of the image using the prompt and render it:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"8c6e5781\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"image = flux_app(\\n\",\n    \"    prompt=prompt,\\n\",\n    \"    image=image,\\n\",\n    \"    mask_image=mask_image,\\n\",\n    \"    height=height,\\n\",\n    \"    width=width,\\n\",\n    \"    guidance_scale=guidance_scale,\\n\",\n    \"    num_inference_steps=num_inference_steps,\\n\",\n    \").images[0]\\n\",\n    \"plt.imshow(image)\\n\",\n    \"plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d1db69b0\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Notes \\n\",\n    \"### Running Flux Inference on trn1\\n\",\n    \"\\n\",\n    \"This sample can also be deployed to a trn1.32xlarge with a few 
modifications. If you are using Context Parallelism specifically, then apply the following parallelism configuration\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"world_size = 16\\n\",\n    \"backbone_tp_degree = 8\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Otherwise use the following:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"world_size = 8\\n\",\n    \"backbone_tp_degree = 8\\n\",\n    \"```\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"venv\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.19\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/generating-results-with-performance-cli.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"8b8b4883\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Evaluating Performance of Llama-3.3-70B on Neuron using Performance CLI\\n\",\n    \"\\n\",\n    \"## Introduction\\n\",\n    \"This tutorial provides a step-by-step guide to measure the performance of Llama3.3 70B on `Trn1` with easy to reproduce benchmarks.\\n\",\n    \"\\n\",\n    \"In this tutorial you will learn how llama-3.3-70B can be easily tested with llm-perf for 3.3-70b-instruct model.\\n\",\n    \"\\n\",\n    \"You must have the instruction-tuned version of llama-3.3 70b [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/) available for Hugging Face to successfully complete it.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f747ff61\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Environment Setup Guide\\n\",\n    \"\\n\",\n    \"### Prerequisites\\n\",\n    \"\\n\",\n    \"This tutorial requires that you have a `Trn1` instance created from a Deep Learning AMI that has the Neuron SDK pre-installed. This tutorial depends on the Neuron fork of vLLM.\\n\",\n    \"\\n\",\n    \"Before running evaluations, ensure your environment is properly configured by following these essential setup guides:\\n\",\n    \"\\n\",\n    \"1. [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html)\\n\",\n    \"2. [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html)\\n\",\n    \"\\n\",\n    \"###  Installing dependencies\\n\",\n    \"\\n\",\n    \"- Copy the [inference-benchmarking](https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/) directory to some location on your instance. \\n\",\n    \"- Change your current working directory to your copy of [inference-benchmarking](https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/). \\n\",\n    \"- Install other required dependencies in the same Python env (such as `aws_neuron_venv_pytorch`, if you followed the steps in [Manually install NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#id3)) by:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"id\": \"0d51cd27\",\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"aws-neuron-llm-evaluation                1.0\\n\",\n      \"awsneuroneval                            1.0\\n\",\n      \"libneuronxla                             2.2.7366.0+1faf0ddf\\n\",\n      \"neuron-torch-tools                       1.0.0.33853+83b6bf63a\\n\",\n      \"neuronx-cc                               2.20.2831.0+8bfecb25\\n\",\n      \"neuronx-cc-devel                         2.20.2831.0+8bfecb25\\n\",\n      \"neuronx-distributed                      0.14.17095+c66a8ca6\\n\",\n      \"neuronx-distributed-inference            0.5.0+dev\\n\",\n      \"torch-neuronx                            2.7.0.2.9.8707+08e1f40d\\n\",\n      \"vllm-neuronx                             0.9.0.dev0+neuron225\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"\\n\",\n      \"WARNING: apt does not have a stable CLI interface. 
Use with caution in scripts.\\n\",\n      \"\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"aws-neuronx-collectives/now 2.27.13.0-f3bd841a2 amd64 [installed,local]\\n\",\n      \"aws-neuronx-dkms/now 2.23.0.0 all [installed,local]\\n\",\n      \"aws-neuronx-runtime-lib/now 2.27.7.0-765d5f599 amd64 [installed,local]\\n\",\n      \"aws-neuronx-tools/now 2.25.100.0 amd64 [installed,local]\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/home/ubuntu/aws_neuron_venv/lib/python3.10/site-packages/IPython/core/completerlib.py:371: UserWarning: This is now an optional IPython functionality, using bookmarks requires you to install the `pickleshare` library.\\n\",\n      \"  bks = self.db.get('bookmarks',{})\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"%%bash\\n\",\n    \"pip list | grep neuron\\n\",\n    \"apt list --installed | grep neuron\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b87a5da4\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see Neuron packages including `neuronx-distributed-inference` and its related components.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"186aacdb\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e4db09b0\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Download Llama-3.3 70B\\n\",\n    \"To use this sample, you must first download the [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) model checkpoint from Hugging Face to `/home/ubuntu/models/Llama-3.3-70B-Instruct/` on the `Trn1` instance. For more information, see [Downloading models](https://huggingface.co/docs/hub/en/models-downloading) in the Hugging Face documentation.\\n\",\n    \"\\n\",\n    \"To use a Jupyter Notebook on the Neuron instance, follow this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"### Running Evaluations\\n\",\n    \"There are two methods that you can use to run your evaluation.\\n\",\n    \"\\n\",\n    \"1. Use a YAML configuration file and `performance.py` script\\n\",\n    \"\\n\",\n    \"2. Write your own Python script that uses several components provided in `performance.py` and `server_config.py`\\n\",\n    \"\\n\",\n    \"Each use case is demonstrated below:\\n\",\n    \"\\n\",\n    \"### 1. Running performance with a YAML config file\\n\",\n    \"In this method, you create a YAML (`.yaml`) config file that specifies the server configuration and testing scenario you want to run. 
Create `perf.yaml` with the following content.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"09b648ec\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"cd inference-benchmarking/\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"id\": \"32fab921\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"#Install requirements present in inference-benchmarking package\\n\",\n    \"#!pip install -r requirements.txt\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4702a430\",\n   \"metadata\": {},\n   \"source\": [\n    \"perf.yaml\\n\",\n    \"```yaml\\n\",\n    \"server:\\n\",\n    \"  name: \\\"llama-3.3-70b-instruct\\\"\\n\",\n    \"  model_path: \\\"/home/ubuntu/models/llama-3.3-70b-instruct\\\"\\n\",\n    \"  model_s3_path: null\\n\",\n    \"  compiled_model_path: \\\"/home/ubuntu/traced_models/llama-3.3-70b-instruct\\\"\\n\",\n    \"  max_seq_len: 256\\n\",\n    \"  context_encoding_len: 128\\n\",\n    \"  tp_degree: 32\\n\",\n    \"  server_port: 8000\\n\",\n    \"  continuous_batch_size: 1\\n\",\n    \"  custom_chat_template_path: \\\"default\\\"\\n\",\n    \"\\n\",\n    \"test:\\n\",\n    \"  performance:\\n\",\n    \"    sonnets_small_test:\\n\",\n    \"      client: \\\"llm_perf\\\"\\n\",\n    \"      client_type: \\\"llm_perf_github_patched\\\"\\n\",\n    \"      n_batches: 1\\n\",\n    \"      max_concurrent_requests: 20\\n\",\n    \"      timeout: 3600\\n\",\n    \"      input_size: 128\\n\",\n    \"      output_size: 124\\n\",\n    \"      client_params:\\n\",\n    \"        stddev_input_tokens: 0\\n\",\n    \"        stddev_output_tokens: 1\\n\",\n    \"\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0ad27ecd\",\n   \"metadata\": {},\n   \"source\": [\n    \"The above YAML file is explained in more detail in the [Performance Params guide](../developer_guides/performance-cli-params.html).\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"To change the sequence length, adjust `max_seq_len`. \\n\",\n    \"\\n\",\n    \"Run `python performance.py --config perf.yaml`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"e6d2442a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!python performance.py --config perf.yaml\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"85dad77c\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. Running perf as part of your own Python code\\n\",\n    \"\\n\",\n    \"You might want to run the performance script as part of your Python code. For example, you might want to change the configuration programmatically or post-process the results. This is possible using three main components provided in `performance.py` and `server_config.py`.\\n\",\n    \"\\n\",\n    \"1. Server Configuration: Use ServerConfig to define the vLLM server settings\\n\",\n    \"\\n\",\n    \"2. Performance Scenario: Use PerformanceScenario to specify evaluation parameters\\n\",\n    \"\\n\",\n    \"3. 
Test Execution: Run the performance with the configured settings\\n\",\n    \"\\n\",\n    \"### Step-by-Step Implementation\\n\",\n    \"First, import the necessary components:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d3ac8e7b\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"cd \\\"/home/ubuntu/inference-benchmarking\\\"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 59,\n   \"id\": \"7d4bd54d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from performance import PerformanceScenario, run_perf_test\\n\",\n    \"from server_config import ServerConfig\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"263b8270\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 1. Configure the Server\\n\",\n    \"\\n\",\n    \"Set up your server configuration with ServerConfig. This example uses Llama 3.3-70b Instruct:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 60,\n   \"id\": \"e37832b5\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"name = \\\"llama-3.3-70b-instruct\\\"\\n\",\n    \"server_config = ServerConfig(\\n\",\n    \"    name=name,\\n\",\n    \"    model_path=f\\\"/home/ubuntu/models/{name}\\\",  # Local model path\\n\",\n    \"    model_s3_path=None,  # S3 model path\\n\",\n    \"    max_seq_len=256,          # Maximum sequence length\\n\",\n    \"    context_encoding_len=128,  # Context window size\\n\",\n    \"    tp_degree=32,               # Tensor parallel degree\\n\",\n    \"    n_vllm_threads=1,          # Number of vLLM threads\\n\",\n    \"    server_port=8000,           # Server port\\n\",\n    \"    continuous_batch_size=1,    # Batch size for continuous batching\\n\",\n    \"    custom_chat_template_path=\\\"default\\\" # Chat template\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"90aecfe7\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. Define Performance Scenarios\\n\",\n    \"\\n\",\n    \"Create a PerformanceScenario to specify your perf parameters:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 61,\n   \"id\": \"818598ca\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"scenario = PerformanceScenario(\\n\",\n    \"    client=\\\"llm_perf\\\",          # Evaluation client\\n\",\n    \"    client_type=\\\"llm_perf_github_patched\\\",\\n\",\n    \"    n_batches=1,\\n\",\n    \"    max_concurrent_requests=20,  # Maximum concurrent requests\\n\",\n    \"    timeout=5000,              # Timeout in seconds - changed to 5000 from 3600\\n\",\n    \"    input_size=128,            # Input length\\n\",\n    \"    output_size=124,           # Output length\\n\",\n    \"    client_params={\\\"stddev_input_tokens\\\": 0, \\\"stddev_output_tokens\\\": 1}  # Client-specific parameters\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f57a53d0\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 3. 
Run the Evaluation\\n\",\n    \"\\n\",\n    \"Execute the evaluation using `run_perf_test`:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d8c5dcd8\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Run the test with a named scenario\\n\",\n    \"results_collection = run_perf_test(\\n\",\n    \"    server_config=server_config,\\n\",\n    \"    named_scenarios={\\\"mytest\\\": scenario}\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"75ebe3d8\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from pprint import pprint\\n\",\n    \"# Display results\\n\",\n    \"pprint(results_collection)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"8753980b\",\n   \"metadata\": {},\n   \"source\": [\n    \"This code will execute and return detailed performance metrics for the model.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"aws_neuron_venv\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/index.rst",
    "content": ".. meta::\n    :description: Comprehensive tutorials for NeuronX Distributed (NxD) Inference on AWS Neuron hardware, covering various LLM deployments and optimizations.\n    :date-modified: 12/02/2025\n\n.. _nxdi-tutorials-index:\n\nNxD Inference Tutorials\n========================\n\nWelcome to the NeuronX Distributed (NxD) Inference tutorials collection. These step-by-step guides help you deploy and optimize large language models (LLMs) on AWS Neuron hardware. Learn how to run various models like Llama3, GPT, and more with different optimization techniques including speculative decoding, tensor parallelism, and disaggregated inference.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Tutorials\n\n    Disaggregated Inference (1P1D) </libraries/nxd-inference/tutorials/disaggregated-inference-tutorial-1p1d>\n    Disaggregated Inference </libraries/nxd-inference/tutorials/disaggregated-inference-tutorial>\n    Flux Inference </libraries/nxd-inference/tutorials/flux-inference-tutorial>\n    Flux Inpainting </libraries/nxd-inference/tutorials/flux-inpainting-inference-tutorial>\n    Benchmark using Performance CLI </libraries/nxd-inference/tutorials/generating-results-with-performance-cli>\n    GPT-OSS 120B </libraries/nxd-inference/tutorials/trn3-gpt-oss-120b-tutorial>\n    Llama3.1 405B on Trn2 </libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial>\n    Llama3.1 405B with Speculative Decoding </libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial>\n    Llama3.1 70B Instruct Accuracy Evaluation </libraries/nxd-inference/tutorials/trn1-llama3.1-70b-instruct-accuracy-eval-tutorial>\n    Llama3.1 8B with Multi-LoRA </libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial>\n    Llama3.2 Multimodal </libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial>\n    Llama3.3 70B FP8 </libraries/nxd-inference/tutorials/trn2-llama3.3-70b-fp8>\n    Llama3.3 70B with APC </libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial>\n    Llama3.3 70B with Data Parallelism </libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial>\n    Llama3.3 70B with Speculative Decoding </libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial>\n    Llama4 </libraries/nxd-inference/tutorials/llama4-tutorial>\n    Llama4 Legacy </libraries/nxd-inference/tutorials/llama4-tutorial-v0>\n    Pixtral </libraries/nxd-inference/tutorials/pixtral-tutorial>\n    Qwen3 MoE 235B </libraries/nxd-inference/tutorials/qwen3-moe-tutorial>\n    Qwen2 VL 7B </libraries/nxd-inference/tutorials/qwen2-vl-tutorial>\n    Speculative Decoding </libraries/nxd-inference/tutorials/sd-inference-tutorial>\n    Qwen3-VL 8B </libraries/nxd-inference/tutorials/qwen3-vl-tutorial>\n\nLlama\n-----\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Llama3.1 405B on Trn2\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to deploy Llama3.1 405B on a single Trn2 instance using NxD Inference with vLLM and explore performance optimization techniques.\n\n    .. grid-item-card:: Llama3.1 405B with Speculative Decoding\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Optimize Llama3.1 405B inference on Trn2 using vanilla fused speculative decoding techniques for improved performance.\n\n    .. 
grid-item-card:: Llama3.1 70B Instruct Accuracy Evaluation\n        :link: /libraries/nxd-inference/tutorials/trn1-llama3.1-70b-instruct-accuracy-eval-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Evaluate the accuracy of Llama3.1 70B Instruct model on Trn1 hardware and learn how to measure model performance.\n\n    .. grid-item-card:: Llama3.1 8B with Multi-LoRA\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to use multiple LoRA adapters with Llama3.1 8B on Trn2 for efficient fine-tuning and domain-specific inference.\n\n    .. grid-item-card:: Llama3.3 70B with Speculative Decoding\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Deploy Llama3.3 70B on Trn2 instances and learn how to optimize performance with tensor parallelism and other NxD Inference features.\n\n    .. grid-item-card:: Llama3.3 70B with Data Parallelism\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Explore data parallelism techniques for Llama3.3 70B on Trn2 to increase throughput for high-volume inference workloads.\n\n    .. grid-item-card:: Llama3.3 70B with APC\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Deploy Llama3.3 70B on Trn2 with Automatic Prefix Caching (APC) to improve inference performance for repetitive patterns.\n\n    .. grid-item-card:: Llama3.3 70B FP8 on Trainium2\n        :link: /libraries/nxd-inference/tutorials/trn2-llama3.3-70b-fp8\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Deploy Llama3.3 70B FP8 quantized model on Trainium2.\n\n    .. grid-item-card:: Llama4\n        :link: /libraries/nxd-inference/tutorials/llama4-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Deploy and optimize Llama4 models on AWS Neuron hardware using NxD Inference with various performance tuning options.\n\nQwen\n----\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Qwen3 MoE 235B\n        :link: /libraries/nxd-inference/tutorials/qwen3-moe-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to deploy `Qwen/Qwen3-235B-A22B <https://huggingface.co/Qwen/Qwen3-235B-A22B>`__ with NxD Inference with various performance tuning options.\n\n    .. grid-item-card:: Qwen3 VL 8B\n        :link: /libraries/nxd-inference/tutorials/qwen3-vl-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to deploy `Qwen/Qwen3-VL-8B-Thinking <https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking>`__ on a single `trn2.48xlarge` instance.\n\n    .. grid-item-card:: Qwen2 VL 7B\n        :link: /libraries/nxd-inference/tutorials/qwen2-vl-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to deploy `Qwen/Qwen2-VL-7B-Instruct <https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct>`__ with NxD Inference with various performance tuning options.\n\n    .. 
grid-item-card:: Speculative Decoding (Qwen3-32B) on Trainium2\n        :link: /libraries/nxd-inference/tutorials/sd-inference-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Implement speculative decoding techniques with Qwen3-32B on Trn2 instances to accelerate LLM inference with NxD Inference.\n\nGPT\n---\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: GPT-OSS 120B on Trainium3\n        :link: /libraries/nxd-inference/tutorials/trn3-gpt-oss-120b-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Deploy open-source GPT models on Trainium3 hardware using NxD Inference and explore Trn3-specific optimizations.\n\nFlux\n----\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Flux Inference\n        :link: /libraries/nxd-inference/tutorials/flux-inference-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to use Flux for efficient inference with NxD, enabling dynamic batch processing and optimized resource utilization.\n\n    .. grid-item-card:: Flux Inpainting\n        :link: /libraries/nxd-inference/tutorials/flux-inpainting-inference-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to use the Flux-Fill model for efficient inference with NxD, enabling image inpainting/outpainting.\n\nPixtral\n-------\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Pixtral Large Instruct\n        :link: /libraries/nxd-inference/tutorials/pixtral-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn how to deploy `mistralai/Pixtral-Large-Instruct-2411 <https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411>`__ on a single `trn2.48xlarge` instance.\n\nTechniques and tools\n--------------------\n\n.. grid:: 2\n    :gutter: 3\n\n    .. grid-item-card:: Disaggregated Inference\n        :link: /libraries/nxd-inference/tutorials/disaggregated-inference-tutorial\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Implement disaggregated inference to distribute model components across multiple instances for large-scale LLM deployment.\n\n    .. grid-item-card:: Disaggregated Inference (1P1D)\n        :link: /libraries/nxd-inference/tutorials/disaggregated-inference-tutorial-1p1d\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Learn about the 1P1D (1 Prefill, 1 Decode) pattern for disaggregated inference to optimize latency and throughput.\n\n    .. grid-item-card:: Benchmark using Performance CLI\n        :link: /libraries/nxd-inference/tutorials/generating-results-with-performance-cli\n        :link-type: doc\n        :class-card: sd-rounded-3\n\n        Use the Performance CLI tool to benchmark and generate performance results for NxD Inference deployments.\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/llama4-tutorial-v0.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4f580405\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploying Llama4 Multimodal Models (Legacy)\\n\",\n    \"\\n\",\n    \"> **Important**: This guide is compatible with vLLM v0.x versions. Since vLLM has deprecated v0.x versions, we recommend using vLLM v1.x with the vLLM-Neuron Plugin for new deployments. See [Tutorial: Deploying Llama4 Multimodal Models](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/llama4-tutorial.html) for the updated guide.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"98b5edf6\",\n   \"metadata\": {},\n   \"source\": [\n    \"This guide shows how to deploy Llama4 on an AWS Neuron Trainium2 (Trn2) instance. This model supports both text and images. It uses Llama4 Scout (meta-llama/Llama-4-Scout-17B-16E) as the example model in this tutorial; however, Maverick (meta-llama/Llama-4-Maverick-17B-128E-Instruct) can also be used.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"id\": \"88abcf3e\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 1\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"963ac8ed\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Examples\\n\",\n    \"\\n\",\n    \"- [Offline Example](#offline-example)\\n\",\n    \"- [Online Example](#online-example)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5d0049f7\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Set up your development environment\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To use a Jupyter Notebook (.ipynb) on a Neuron-enabled instance, see this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"dff142cd\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Compile your model and save it as artifacts\\n\",\n    \"\\n\",\n    \"The code snippet below is required to compile Llama4 as artifacts to load for vLLM serving. There is no need to download a Llama4 checkpoint from HuggingFace explicitly, but you may need access from Meta to download it as part of compilation script. For more information, see [Downloading models](https://huggingface.co/docs/hub/en/models-downloading) in the Hugging Face documentation.\\n\",\n    \"\\n\",\n    \"In``neuron_config``, to support multimodal architecture, you can define ``text_config`` and ``vision_config`` separately for text decoder and vision encoder. 
\\n\",\n    \"\\n\",\n    \"The image input can be represented in 1, 4, or 16 chunks based on its resolution and aspect ratio. Additionally, there is one chunk to describe the entire image, resulting in the total number of chunks. Due to the use of data parallelism (DP) together with tensor parallelism (TP), the vision model input batch size is padded to the next value divisible by the DP degree, which in this case is 4. The final padded batch size will be:\\n\",\n    \"* 1+1 = 2 → 4: Each rank has the batch size = 4/4 = 1\\n\",\n    \"* 4+1 = 5 → 8: Each rank has the batch size = 8/4 = 2\\n\",\n    \"* 16+1 = 17 → 20: Each rank has the batch size = 20/4 = 5\\n\",\n    \"\\n\",\n    \"There are a few fields you can configure to improve performance:\\n\",\n    \"- ``cp_degree``: degree of context parallelism at the attention layer for prefill.\\n\",\n    \"- ``blockwise_matmul_config``: the configuration of the blockwise MoE kernel for prefill.\\n\",\n    \"- ``attn_block_tkg_nki_kernel_enabled`` and ``attn_block_tkg_nki_kernel_cache_update`` to enable a NKI kernel for attention and a kernel KV cache update for decode operations.\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"14ff3fe0\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"scout_neuron_config = {\\n\",\n    \"    \\\"text_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"is_continuous_batching\\\": true,\\n\",\n    \"        \\\"seq_len\\\": 16384,\\n\",\n    \"        \\\"enable_bucketing\\\": true,\\n\",\n    \"        \\\"context_encoding_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"        \\\"token_generation_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"        \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"async_mode\\\": true,\\n\",\n    \"        \\\"world_size\\\": 64,\\n\",\n    \"        \\\"tp_degree\\\": 64,\\n\",\n    \"        \\\"cp_degree\\\": 16,\\n\",\n    \"        \\\"on_device_sampling_config\\\": {\\n\",\n    \"            \\\"dynamic\\\": true,\\n\",\n    \"            \\\"top_k_kernel_enabled\\\": true,\\n\",\n    \"            \\\"top_k\\\": 1\\n\",\n    \"        },\\n\",\n    \"        \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"        \\\"cc_pipeline_tiling_factor\\\": 1,\\n\",\n    \"        \\\"sequence_parallel_enabled\\\": true,\\n\",\n    \"        \\\"fused_qkv\\\": true,\\n\",\n    \"        \\\"qkv_kernel_enabled\\\": true,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": true,\\n\",\n    \"        \\\"attn_block_tkg_nki_kernel_enabled\\\": true,\\n\",\n    \"        \\\"attn_block_tkg_nki_kernel_cache_update\\\": true,\\n\",\n    \"        \\\"blockwise_matmul_config\\\": {\\n\",\n    \"            \\\"block_size\\\": 256,\\n\",\n    \"            \\\"use_block_parallel\\\": true,\\n\",\n    \"            \\\"block_sharding_strategy\\\": \\\"HI_LO\\\",\\n\",\n    \"            \\\"skip_dma_token\\\": true,\\n\",\n    \"            \\\"skip_dma_weight\\\": true,\\n\",\n    \"            \\\"parallelize_token_to_block_mapping\\\": true\\n\",\n    \"        }\\n\",\n    \"    },\\n\",\n    \"    \\\"vision_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"seq_len\\\": 8192,\\n\",\n    \"        \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"tp_degree\\\": 16,\\n\",\n    \"        
\\\"cp_degree\\\": 1,\\n\",\n    \"        \\\"dp_degree\\\": 4,\\n\",\n    \"        \\\"world_size\\\": 64,\\n\",\n    \"        \\\"fused_qkv\\\": true,\\n\",\n    \"        \\\"qkv_kernel_enabled\\\": true,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": true,\\n\",\n    \"        \\\"mlp_kernel_enabled\\\": true,\\n\",\n    \"        \\\"enable_bucketing\\\": true,\\n\",\n    \"        \\\"buckets\\\": [8, 28, 88],\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"        \\\"save_sharded_checkpoint\\\": true\\n\",\n    \"    }\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"import argparse\\n\",\n    \"import json\\n\",\n    \"\\n\",\n    \"import torch\\n\",\n    \"from neuronx_distributed_inference.models.config import OnDeviceSamplingConfig\\n\",\n    \"from neuronx_distributed_inference.models.llama4.modeling_llama4 import NeuronLlama4ForCausalLM, Llama4InferenceConfig, Llama4NeuronConfig\\n\",\n    \"from neuronx_distributed_inference.utils.hf_adapter import load_pretrained_config\\n\",\n    \"\\n\",\n    \"def parse_args():\\n\",\n    \"    parser = argparse.ArgumentParser()\\n\",\n    \"    parser.add_argument('--model-path', type=str, required=True)\\n\",\n    \"    parser.add_argument('--traced-model-path', type=str, required=True)\\n\",\n    \"    parser.add_argument('--neuron-config-path', type=str, default=None)\\n\",\n    \"    return parser.parse_args()\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def build_config(neuron_config_path, model_path):\\n\",\n    \"    with open(neuron_config_path, 'r') as f:\\n\",\n    \"        config_json = json.load(f)\\n\",\n    \"    text_neuron_config = Llama4NeuronConfig(**config_json['text_config'])\\n\",\n    \"    vision_neuron_config = Llama4NeuronConfig(**config_json['vision_config'])\\n\",\n    \"    return Llama4InferenceConfig(\\n\",\n    \"        text_neuron_config=text_neuron_config,\\n\",\n    \"        vision_neuron_config=vision_neuron_config,\\n\",\n    \"        load_config=load_pretrained_config(model_path)\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"def compile(model_path, traced_model_path, config):\\n\",\n    \"    model = NeuronLlama4ForCausalLM(model_path, config)\\n\",\n    \"    model.compile(traced_model_path)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"args = parse_args()\\n\",\n    \"config = build_config(args.neuron_config_path, args.model_path)\\n\",\n    \"compile(\\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\", \\n\",\n    \"    \\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct\\\", \\n\",\n    \"    scount_neuron_config)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"01b099e8\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Deploy with vLLM Inference\\n\",\n    \"\\n\",\n    \"We provide two examples to run Llama4 with vLLM:\\n\",\n    \"\\n\",\n    \"* Offline inference: you can provide prompts in a python script and execute it.\\n\",\n    \"* Online inference: you will serve the model in an online server and send requests.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"7a59eb13\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline Example\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"Prior to launching the vLLM server, you must trace the Llama4 model. 
Provide the trace model by setting the environment variable NEURON_COMPILED_ARTIFACTS.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"08f20db7\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"\\n\",\n    \"# Hugging Face authentication (replace with your token)\\n\",\n    \"# from huggingface_hub import login\\n\",\n    \"# login(token=\\\"your_hf_token_here\\\")\\n\",\n    \"\\n\",\n    \"# Configure Neuron environment for inference\\n\",\n    \"os.environ['VLLM_NEURON_FRAMEWORK'] = \\\"neuronx-distributed-inference\\\"\\n\",\n    \"os.environ['NEURON_COMPILED_ARTIFACTS'] = \\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct\\\"\\n\",\n    \"\\n\",\n    \"IMAGE_URL = \\\"https://httpbin.org/image/png\\\"\\n\",\n    \"\\n\",\n    \"# Initialize LLM with Neuron device configuration\\n\",\n    \"llm = LLM(\\n\",\n    \"    model=\\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\",  # or the file path to the downloaded checkpoint\\n\",\n    \"    max_num_seqs=1,\\n\",\n    \"    max_model_len=16384,\\n\",\n    \"    device=\\\"neuron\\\",\\n\",\n    \"    tensor_parallel_size=64,\\n\",\n    \"    use_v2_block_manager=True,\\n\",\n    \"    limit_mm_per_prompt={\\\"image\\\": 5}, # Accepts up to 5 images per prompt\\n\",\n    \")\\n\",\n    \"# Configure sampling for deterministic output\\n\",\n    \"sampling_params = SamplingParams(top_k=1, max_tokens=100)\\n\",\n    \"\\n\",\n    \"# Test 1: Text-only input\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"what is the recipe of mayonnaise in two sentences?\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\",\n    \"\\n\",\n    \"# Test 2: Single image with text\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Describe this image\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\",\n    \"\\n\",\n    \"# Test 3: Multiple images with text\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Compare these two images, tell me the difference.\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"586cc351\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example 
output:\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"Generated text: 'To make mayonnaise, combine 2 egg yolks, 1 tablespoon of lemon juice or vinegar, and a pinch of salt in a bowl, and whisk them together until smooth. Then, slowly pour in 1/2 cup of oil while continuously whisking the mixture until it thickens and emulsifies into a creamy sauce.'\\n\",\n    \"Generated text: \\\"The image depicts a cartoon-style illustration of a pig's face, characterized by its pink color and endearing expression. The pig features two small black eyes with white outlines, a curved smile, and two small nostrils on its snout. Two red circles adorn the cheeks, adding to the pig's rosy appearance.\\\\n\\\\n**Key Features:**\\\\n\\\\n* **Color:** Pink\\\\n* **Facial Expression:** Smiling\\\\n* **Eyes:** Small, black, with white outlines\\\\n* **Sn\\\"\\n\",\n    \"Generated text: \\\"The two images are identical, with no discernible differences. The only variation is a slight difference in the shade of pink used for the pig's face, but this could be due to different rendering or display settings rather than an actual difference in the images themselves.\\\\n\\\\n**Key Features:**\\\\n\\\\n* Both images feature a cartoon-style pig's head with a smiling face.\\\\n* The pig has two small ears, two eyes, and a curved smile.\\\\n* The background of both images is white.\\\\n\\\\n**Conclusion:**\\\\nGiven\\\"\\n\",\n    \"```\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5a04efba\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Example\\n\",\n    \"\\n\",\n    \"Prior to launching the Vllm server, you must trace the llama4 model, with the traced model path provided through the environment variable NEURON_COMPILED_ARTIFACTS.\\n\",\n    \"\\n\",\n    \"Open a terminal and spin up a server of the model. 
\\n\",\n    \"To accommodate multiple image inputs, include the optional argument --limit-mm-per-prompt\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1b31fb3f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"export VLLM_NEURON_FRAMEWORK=\\\"neuronx-distributed-inference\\\"\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=\\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct/\\\"\\n\",\n    \"export VLLM_RPC_TIMEOUT=100000\\n\",\n    \"nohup python -m vllm.entrypoints.openai.api_server \\\\\\n\",\n    \"    --model \\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\" \\\\\\n\",\n    \"    --max-num-seqs 1 \\\\\\n\",\n    \"    --max-model-len 16384 \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --device neuron \\\\\\n\",\n    \"    --port 8000 \\\\\\n\",\n    \"    --use-v2-block-manager \\\\\\n\",\n    \"    --disable-log-requests \\\\\\n\",\n    \"    --override-neuron-config '{}' \\\\\\n\",\n    \"    --limit-mm-per-prompt image=5\\n\",\n    \"\\n\",\n    \"...\\n\",\n    \"INFO:     Started server process [25218]\\n\",\n    \"INFO:     Waiting for application startup.\\n\",\n    \"INFO:     Application startup complete.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"69768e75\",\n   \"metadata\": {},\n   \"source\": [\n    \"Open another terminal and execute the following client code with python:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0494f47e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from openai import OpenAI\\n\",\n    \"\\n\",\n    \"MODEL = \\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\"\\n\",\n    \"\\n\",\n    \"client = OpenAI(\\n\",\n    \"    api_key = \\\"EMPTY\\\",\\n\",\n    \"    base_url = \\\"http://localhost:8000/v1\\\"\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"print(\\\"== Test text input ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"what is the recipe of mayonnaise in two sentences?\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"print(\\\"== Test image input ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Describe this image\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"print(\\\"== Test multiple image inputs ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", 
\\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Compare these two images, tell me the difference.\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"39d36c0c\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d07d242d\",\n   \"metadata\": {},\n   \"source\": [\n    \"```\\n\",\n    \"== Test text input ==\\n\",\n    \"To make mayonnaise, combine 2 egg yolks, 1 tablespoon of lemon juice or vinegar, and a pinch of salt in a bowl, and whisk them together until smooth. Then, slowly pour in 1/2 cup of oil while continuously whisking the mixture until it thickens and emulsifies into a creamy sauce.\\n\",\n    \"== Test image input ==\\n\",\n    \"The image depicts a cartoon-style illustration of a pig's face, characterized by its pink color and endearing expression. The pig features two small black eyes with white outlines, a curved smile, and two small nostrils on its snout. Two red circles adorn the cheeks, adding to the pig's rosy appearance.\\n\",\n    \"\\n\",\n    \"**Key Features:**\\n\",\n    \"\\n\",\n    \"* **Ears:** Two triangular ears are positioned at the top of the head.\\n\",\n    \"* **Facial Expression:** The pig's facial expression is cheerful, with a smile and rosy cheeks.\\n\",\n    \"* **Background:** The background of the image is transparent.\\n\",\n    \"\\n\",\n    \"Overall, the image presents a cute and friendly cartoon pig face.\\n\",\n    \"== Test multiple image inputs ==\\n\",\n    \"The two images are identical, featuring a cartoon pig's face with a pink color and black outline. The only difference is that the first image has a lighter shade of pink compared to the second image.\\n\",\n    \"\\n\",\n    \"**Key Features:**\\n\",\n    \"\\n\",\n    \"* Both images depict a cartoon pig's face.\\n\",\n    \"* They have the same facial features, including eyes, nose, mouth, and ears.\\n\",\n    \"* The background of both images is white.\\n\",\n    \"\\n\",\n    \"**Color Comparison:**\\n\",\n    \"\\n\",\n    \"* The first image has a lighter pink color (RGB: 255, 182, 193).\\n\",\n    \"* The second image has a slightly darker pink color (RGB: 240, 128, 128).\\n\",\n    \"\\n\",\n    \"Overall, while the two images appear similar at first glance, they differ slightly in terms of their pink hue.\\n\",\n    \"```\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"language_info\": {\n   \"name\": \"python\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/llama4-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4f580405\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploying Llama4 Multimodal Models\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"98b5edf6\",\n   \"metadata\": {},\n   \"source\": [\n    \"This guide shows how to deploy Llama4 on an AWS Neuron Trainium2 (Trn2) instance using vLLM V1 with the vLLM-Neuron Plugin. This model supports both text and images. It uses Llama4 Scout (meta-llama/Llama-4-Scout-17B-16E) as the example model in this tutorial; however, Maverick (meta-llama/Llama-4-Maverick-17B-128E-Instruct) can also be used.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"id\": \"88abcf3e\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n   :local:\\n   :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"963ac8ed\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Examples\\n\",\n    \"\\n\",\n    \"- [Offline Example](#offline-example)\\n\",\n    \"- [Online Example](#online-example)\\n\",\n    \"- [Advanced Configuration Examples](#advanced-configuration-examples)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5d0049f7\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Set up your development environment\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To use a Jupyter Notebook (.ipynb) on a Neuron-enabled instance, see this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"dff142cd\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Install the vLLM version that supports NxD Inference\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is available in the vLLM-Neuron GitHub repository. 
Install the latest release branch of vLLM-Neuron plugin following instructions in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"01b099e8\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Deploy with vLLM V1 Inference\\n\",\n    \"\\n\",\n    \"We provide two examples to run Llama4 with vLLM V1:\\n\",\n    \"\\n\",\n    \"* Offline inference: you can provide prompts in a python script and execute it.\\n\",\n    \"* Online inference: you will serve the model in an online server and send requests.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"7a59eb13\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline Example\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"Prior to launching the vLLM server, you must trace the Llama4 model. Provide the trace model by setting the environment variable NEURON_COMPILED_ARTIFACTS.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"08f20db7\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"\\n\",\n    \"# Hugging Face authentication (replace with your token)\\n\",\n    \"# from huggingface_hub import login\\n\",\n    \"# login(token=\\\"your_hf_token_here\\\")\\n\",\n    \"\\n\",\n    \"# Configure Neuron environment for inference\\n\",\n    \"# Note: No need to set VLLM_NEURON_FRAMEWORK in V1 - it defaults to neuronx-distributed-inference\\n\",\n    \"os.environ['NEURON_COMPILED_ARTIFACTS'] = \\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct\\\"\\n\",\n    \"\\n\",\n    \"IMAGE_URL = \\\"https://httpbin.org/image/png\\\"\\n\",\n    \"\\n\",\n    \"# Initialize LLM with Neuron device configuration\\n\",\n    \"# Note: In V1, configuration is passed via additional_config\\n\",\n    \"llm = LLM(\\n\",\n    \"    model=\\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\",  # or the file path to the downloaded checkpoint\\n\",\n    \"    max_num_seqs=1,\\n\",\n    \"    max_model_len=16384,\\n\",\n    \"    tensor_parallel_size=64,\\n\",\n    \"    limit_mm_per_prompt={\\\"image\\\": 5}, # Accepts up to 5 images per prompt\\n\",\n    \"    # V1 uses additional_config for Neuron-specific settings\\n\",\n    \"    additional_config=dict(\\n\",\n    \"        override_neuron_config=dict(\\n\",\n    \"            # Add any custom Neuron configurations here if needed\\n\",\n    \"        )\\n\",\n    \"    )\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Configure sampling for deterministic output\\n\",\n    \"sampling_params = SamplingParams(temperature=0.0, max_tokens=100)\\n\",\n    \"\\n\",\n    \"# Test 1: Text-only input\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"what is the recipe of mayonnaise in two sentences?\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\",\n    \"\\n\",\n    \"# Test 2: Single image with text\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            
{\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Describe this image\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\",\n    \"\\n\",\n    \"# Test 3: Multiple images with text\\n\",\n    \"conversation = [\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": IMAGE_URL}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Compare these two images, tell me the difference.\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }\\n\",\n    \"]\\n\",\n    \"for output in llm.chat(conversation, sampling_params):\\n\",\n    \"    print(f\\\"Generated text: {output.outputs[0].text !r}\\\")\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"586cc351\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"Generated text: 'To make mayonnaise, combine 2 egg yolks, 1 tablespoon of lemon juice or vinegar, and a pinch of salt in a bowl, and whisk them together until smooth. Then, slowly pour in 1/2 cup of oil while continuously whisking the mixture until it thickens and emulsifies into a creamy sauce.'\\n\",\n    \"Generated text: \\\"The image depicts a cartoon-style illustration of a pig's face, characterized by its pink color and endearing expression. The pig features two small black eyes with white outlines, a curved smile, and two small nostrils on its snout. Two red circles adorn the cheeks, adding to the pig's rosy appearance.\\\\n\\\\n**Key Features:**\\\\n\\\\n* **Color:** Pink\\\\n* **Facial Expression:** Smiling\\\\n* **Eyes:** Small, black, with white outlines\\\\n* **Sn\\\"\\n\",\n    \"Generated text: \\\"The two images are identical, with no discernible differences. The only variation is a slight difference in the shade of pink used for the pig's face, but this could be due to different rendering or display settings rather than an actual difference in the images themselves.\\\\n\\\\n**Key Features:**\\\\n\\\\n* Both images feature a cartoon-style pig's head with a smiling face.\\\\n* The pig has two small ears, two eyes, and a curved smile.\\\\n* The background of both images is white.\\\\n\\\\n**Conclusion:**\\\\nGiven\\\"\\n\",\n    \"```\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5a04efba\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Example\\n\",\n    \"\\n\",\n    \"Prior to launching the vLLM server, you must trace the Llama4 model, with the traced model path provided through the environment variable NEURON_COMPILED_ARTIFACTS.\\n\",\n    \"\\n\",\n    \"Open a terminal and spin up a server for the model. 
\\n\",\n    \"To accommodate multiple image inputs, include the optional argument --limit-mm-per-prompt\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1b31fb3f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=\\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct/\\\"\\n\",\n    \"export VLLM_RPC_TIMEOUT=100000\\n\",\n    \"\\n\",\n    \"# V1 uses different configuration syntax with --additional-config\\n\",\n    \"nohup vllm serve \\\\\\n\",\n    \"    --model \\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\" \\\\\\n\",\n    \"    --max-num-seqs 1 \\\\\\n\",\n    \"    --max-model-len 16384 \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --port 8000 \\\\\\n\",\n    \"    --disable-log-requests \\\\\\n\",\n    \"    --limit-mm-per-prompt image=5 \\\\\\n\",\n    \"    --additional-config '{\\n\",\n    \"        \\\"override_neuron_config\\\": {}\\n\",\n    \"    }' &\\n\",\n    \"\\n\",\n    \"# Wait for server to start\\n\",\n    \"sleep 10\\n\",\n    \"echo \\\"Server started. Check logs for startup completion.\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"server_output\",\n   \"metadata\": {},\n   \"source\": [\n    \"Expected server startup output:\\n\",\n    \"\\n\",\n    \"```text\\n\",\n    \"INFO:     Started server process [25218]\\n\",\n    \"INFO:     Waiting for application startup.\\n\",\n    \"INFO:     Application startup complete.\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"69768e75\",\n   \"metadata\": {},\n   \"source\": [\n    \"Open another terminal and execute the following client code with python:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0494f47e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from openai import OpenAI\\n\",\n    \"\\n\",\n    \"MODEL = \\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\"\\n\",\n    \"\\n\",\n    \"client = OpenAI(\\n\",\n    \"    api_key = \\\"EMPTY\\\",\\n\",\n    \"    base_url = \\\"http://localhost:8000/v1\\\"\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"print(\\\"== Test text input ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"what is the recipe of mayonnaise in two sentences?\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"print(\\\"== Test image input ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Describe this image\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"print(\\\"== Test multiple image inputs ==\\\")\\n\",\n    \"completion = client.chat.completions.create(\\n\",\n    \"    model=MODEL,\\n\",\n    \"    
messages=[{\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": \\\"https://httpbin.org/image/png\\\"}},\\n\",\n    \"            {\\\"type\\\": \\\"text\\\", \\\"text\\\": \\\"Compare these two images, tell me the difference.\\\"},\\n\",\n    \"        ]\\n\",\n    \"    }]\\n\",\n    \")\\n\",\n    \"print(completion.choices[0].message.content)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"39d36c0c\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d07d242d\",\n   \"metadata\": {},\n   \"source\": [\n    \"```\\n\",\n    \"== Test text input ==\\n\",\n    \"To make mayonnaise, combine 2 egg yolks, 1 tablespoon of lemon juice or vinegar, and a pinch of salt in a bowl, and whisk them together until smooth. Then, slowly pour in 1/2 cup of oil while continuously whisking the mixture until it thickens and emulsifies into a creamy sauce.\\n\",\n    \"\\n\",\n    \"== Test image input ==\\n\",\n    \"The image depicts a cartoon-style illustration of a pig's face, characterized by its pink color and endearing expression. The pig features two small black eyes with white outlines, a curved smile, and two small nostrils on its snout. Two red circles adorn the cheeks, adding to the pig's rosy appearance.\\n\",\n    \"\\n\",\n    \"**Key Features:**\\n\",\n    \"\\n\",\n    \"* **Ears:** Two triangular ears are positioned at the top of the head.\\n\",\n    \"* **Facial Expression:** The pig's facial expression is cheerful, with a smile and rosy cheeks.\\n\",\n    \"* **Background:** The background of the image is transparent.\\n\",\n    \"\\n\",\n    \"Overall, the image presents a cute and friendly cartoon pig face.\\n\",\n    \"\\n\",\n    \"== Test multiple image inputs ==\\n\",\n    \"The two images are identical, featuring a cartoon pig's face with a pink color and black outline. The only difference is that the first image has a lighter shade of pink compared to the second image.\\n\",\n    \"\\n\",\n    \"**Key Features:**\\n\",\n    \"\\n\",\n    \"* Both images depict a cartoon pig's face.\\n\",\n    \"* They have the same facial features, including eyes, nose, mouth, and ears.\\n\",\n    \"* The background of both images is white.\\n\",\n    \"\\n\",\n    \"**Color Comparison:**\\n\",\n    \"\\n\",\n    \"* The first image has a lighter pink color (RGB: 255, 182, 193).\\n\",\n    \"* The second image has a slightly darker pink color (RGB: 240, 128, 128).\\n\",\n    \"\\n\",\n    \"Overall, while the two images appear similar at first glance, they differ slightly in terms of their pink hue.\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"advanced_config\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Advanced Configuration Examples\\n\",\n    \"\\n\",\n    \"#### Model Compilation and Configuration\\n\",\n    \"\\n\",\n    \"In `override_neuron_config`, to support multimodal architecture, you can define `text_config` and `vision_config` separately for text decoder and vision encoder.\\n\",\n    \"\\n\",\n    \"The image input can be represented in 1, 4, or 16 chunks based on its resolution and aspect ratio. 
Additionally, there is one chunk to describe the entire image, resulting in the total number of chunks. Due to the use of data parallelism (DP) together with tensor parallelism (TP), the vision model input batch size is padded to the next value divisible by the DP degree, which in this case is 4. The final padded batch size will be:\\n\",\n    \"\\n\",\n    \"* 1+1 = 2 → 4: Each rank has the batch size = 4/4 = 1\\n\",\n    \"* 4+1 = 5 → 8: Each rank has the batch size = 8/4 = 2\\n\",\n    \"* 16+1 = 17 → 20: Each rank has the batch size = 20/4 = 5\\n\",\n    \"\\n\",\n    \"There are a few fields you can configure to improve performance:\\n\",\n    \"\\n\",\n    \"- `cp_degree`: degree of context parallelism at the attention layer for prefill.\\n\",\n    \"- `blockwise_matmul_config`: the configuration of the blockwise MoE kernel for prefill.\\n\",\n    \"- `attn_block_tkg_nki_kernel_enabled` and `attn_block_tkg_nki_kernel_cache_update` to enable a NKI kernel for attention and a kernel KV cache update for decode operations.\\n\",\n    \"\\n\",\n    \"The `scout_neuron_config` shown below contains the recommended configuration for Llama4 Scout model.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"scout_config\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"scout_neuron_config = {\\n\",\n    \"    \\\"text_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"is_continuous_batching\\\": True,\\n\",\n    \"        \\\"seq_len\\\": 16384,\\n\",\n    \"        \\\"enable_bucketing\\\": True,\\n\",\n    \"        \\\"context_encoding_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"        \\\"token_generation_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"        \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"async_mode\\\": True,\\n\",\n    \"        \\\"world_size\\\": 64,\\n\",\n    \"        \\\"tp_degree\\\": 64,\\n\",\n    \"        \\\"cp_degree\\\": 16,\\n\",\n    \"        \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"        \\\"cc_pipeline_tiling_factor\\\": 1,\\n\",\n    \"        \\\"sequence_parallel_enabled\\\": True,\\n\",\n    \"        \\\"fused_qkv\\\": True,\\n\",\n    \"        \\\"qkv_kernel_enabled\\\": True,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"        \\\"attn_block_tkg_nki_kernel_enabled\\\": True,\\n\",\n    \"        \\\"attn_block_tkg_nki_kernel_cache_update\\\": True,\\n\",\n    \"        \\\"k_cache_transposed\\\": False,\\n\",\n    \"        \\\"blockwise_matmul_config\\\": {\\n\",\n    \"            \\\"block_size\\\": 256,\\n\",\n    \"            \\\"use_block_parallel\\\": True,\\n\",\n    \"            \\\"block_sharding_strategy\\\": \\\"HI_LO\\\",\\n\",\n    \"            \\\"skip_dma_token\\\": True,\\n\",\n    \"            \\\"skip_dma_weight\\\": True,\\n\",\n    \"            \\\"parallelize_token_to_block_mapping\\\": True\\n\",\n    \"        }\\n\",\n    \"    },\\n\",\n    \"    \\\"vision_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"seq_len\\\": 8192,\\n\",\n    \"        \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"tp_degree\\\": 16,\\n\",\n    \"        \\\"cp_degree\\\": 1,\\n\",\n    \"        \\\"dp_degree\\\": 4,\\n\",\n    \"        \\\"world_size\\\": 64,\\n\",\n    \"        \\\"fused_qkv\\\": True,\\n\",\n    \"        
\\\"qkv_kernel_enabled\\\": True,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"        \\\"mlp_kernel_enabled\\\": True,\\n\",\n    \"        \\\"enable_bucketing\\\": True,\\n\",\n    \"        \\\"buckets\\\": [8, 28, 88],\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"        \\\"save_sharded_checkpoint\\\": True\\n\",\n    \"    }\\n\",\n    \"}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"custom_config_v1\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Using Custom Neuron Configuration with vLLM V1\\n\",\n    \"\\n\",\n    \"When using vLLM V1, you can pass custom Neuron configurations using the `additional_config` parameter. Here's an example of how to use the advanced configuration:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"custom_v1_example\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Example: Using custom Neuron configuration with vLLM V1\\n\",\n    \"import os\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"\\n\",\n    \"# Enable V1 mode\\n\",\n    \"os.environ['VLLM_USE_V1'] = '1'\\n\",\n    \"os.environ['NEURON_COMPILED_ARTIFACTS'] = \\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct\\\"\\n\",\n    \"\\n\",\n    \"# Initialize LLM with custom Neuron configuration\\n\",\n    \"llm = LLM(\\n\",\n    \"    model=\\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\",\\n\",\n    \"    max_num_seqs=1,\\n\",\n    \"    max_model_len=16384,\\n\",\n    \"    tensor_parallel_size=64,\\n\",\n    \"    limit_mm_per_prompt={\\\"image\\\": 5},\\n\",\n    \"    # V1 syntax: use additional_config with override_neuron_config\\n\",\n    \"    additional_config=dict(\\n\",\n    \"        override_neuron_config=scout_neuron_config  # Use the configuration defined above\\n\",\n    \"    )\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# The rest of your inference code remains the same\\n\",\n    \"sampling_params = SamplingParams(temperature=0.0, max_tokens=100)\\n\",\n    \"# ... 
inference code ...\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"server_custom_config\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Server Configuration with Custom Neuron Config\\n\",\n    \"\\n\",\n    \"For online inference with custom configuration, you can pass the Neuron config via the `--additional-config` flag:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"server_custom_example\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"# Example server startup with custom Neuron configuration\\n\",\n    \"export VLLM_USE_V1=1\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=\\\"/home/ubuntu/llama4/traced_models/Llama-4-Scout-17B-16E-Instruct/\\\"\\n\",\n    \"export VLLM_RPC_TIMEOUT=100000\\n\",\n    \"\\n\",\n    \"# Start server with custom Neuron configuration\\n\",\n    \"vllm serve \\\\\\n\",\n    \"    --model \\\"meta-llama/Llama-4-Scout-17B-16E-Instruct\\\" \\\\\\n\",\n    \"    --max-num-seqs 1 \\\\\\n\",\n    \"    --max-model-len 16384 \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --port 8000 \\\\\\n\",\n    \"    --disable-log-requests \\\\\\n\",\n    \"    --limit-mm-per-prompt image=5 \\\\\\n\",\n    \"    --additional-config '{\\n\",\n    \"        \\\"override_neuron_config\\\": {\\n\",\n    \"            \\\"text_config\\\": {\\n\",\n    \"                \\\"batch_size\\\": 1,\\n\",\n    \"                \\\"is_continuous_batching\\\": true,\\n\",\n    \"                \\\"seq_len\\\": 16384,\\n\",\n    \"                \\\"enable_bucketing\\\": true,\\n\",\n    \"                \\\"context_encoding_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"                \\\"token_generation_buckets\\\": [256, 512, 1024, 2048, 4096, 8192, 10240, 16384],\\n\",\n    \"                \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"                \\\"async_mode\\\": true,\\n\",\n    \"                \\\"world_size\\\": 64,\\n\",\n    \"                \\\"tp_degree\\\": 64,\\n\",\n    \"                \\\"cp_degree\\\": 16\\n\",\n    \"            },\\n\",\n    \"            \\\"vision_config\\\": {\\n\",\n    \"                \\\"batch_size\\\": 1,\\n\",\n    \"                \\\"seq_len\\\": 8192,\\n\",\n    \"                \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"                \\\"tp_degree\\\": 16,\\n\",\n    \"                \\\"cp_degree\\\": 1,\\n\",\n    \"                \\\"dp_degree\\\": 4,\\n\",\n    \"                \\\"world_size\\\": 64\\n\",\n    \"            }\\n\",\n    \"        }\\n\",\n    \"    }'\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"language_info\": {\n   \"name\": \"python\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/llama405b_perf_comparison.csv",
    "content": "Scenario (all using BF16),TTFT (P50 in ms),TPOT (P50 in ms),Output token Throughput (per second)\nNo speculative decoding,2442,37.9,25.46\nFused speculative decoding + rescaled weights (Llama 3.2 1B Draft),2255,8.27,102.41"
  },
  {
    "path": "libraries/nxd-inference/tutorials/llama70b_apc_perf_comparison.csv",
    "content": "Dataset,TTFT (P50 in ms) without prefix caching,TTFT (P50 in ms) with prefix caching,Improvement\nmath.math (>90% cache hit),342.81,107.8,3.18x\ndynamic sonnet 1k (~25% cache hit),123.08,102.15,1.2x\ndynamic sonnet 2k (~25% cache hit),592.8,377.2,1.57x\nHumanEval (No cache hit),89.7,91.8,0.98x"
  },
  {
    "path": "libraries/nxd-inference/tutorials/llama70b_perf_comparison.csv",
    "content": "Scenario (all using BF16),TTFT (P50 in ms),TPOT (P50 in ms),Output token Throughput (per second)\nNo speculative decoding,814.2,19.6,36\nFused speculative decoding (Llama 3.2 1B Draft),870.1,5.3,144"
  },
  {
    "path": "libraries/nxd-inference/tutorials/modules_to_not_convert.json",
    "content": "{\n\"model\":{\n    \"modules_to_not_convert\": [\n    \"lm_head\",\n    \"layers.0.mlp\",\n    \"layers.125.mlp\",\n    \"layers.0.self_attn\",\n    \"layers.1.self_attn\",\n    \"layers.2.self_attn\",\n    \"layers.3.self_attn\",\n    \"layers.4.self_attn\",\n    \"layers.5.self_attn\",\n    \"layers.6.self_attn\",\n    \"layers.7.self_attn\",\n    \"layers.8.self_attn\",\n    \"layers.9.self_attn\",\n    \"layers.10.self_attn\",\n    \"layers.11.self_attn\",\n    \"layers.12.self_attn\",\n    \"layers.13.self_attn\",\n    \"layers.14.self_attn\",\n    \"layers.15.self_attn\",\n    \"layers.16.self_attn\",\n    \"layers.17.self_attn\",\n    \"layers.18.self_attn\",\n    \"layers.19.self_attn\",\n    \"layers.20.self_attn\",\n    \"layers.21.self_attn\",\n    \"layers.22.self_attn\",\n    \"layers.23.self_attn\",\n    \"layers.24.self_attn\",\n    \"layers.25.self_attn\",\n    \"layers.26.self_attn\",\n    \"layers.27.self_attn\",\n    \"layers.28.self_attn\",\n    \"layers.29.self_attn\",\n    \"layers.30.self_attn\",\n    \"layers.31.self_attn\",\n    \"layers.32.self_attn\",\n    \"layers.33.self_attn\",\n    \"layers.34.self_attn\",\n    \"layers.35.self_attn\",\n    \"layers.36.self_attn\",\n    \"layers.37.self_attn\",\n    \"layers.38.self_attn\",\n    \"layers.39.self_attn\",\n    \"layers.40.self_attn\",\n    \"layers.41.self_attn\",\n    \"layers.42.self_attn\",\n    \"layers.43.self_attn\",\n    \"layers.44.self_attn\",\n    \"layers.45.self_attn\",\n    \"layers.46.self_attn\",\n    \"layers.47.self_attn\",\n    \"layers.48.self_attn\",\n    \"layers.49.self_attn\",\n    \"layers.50.self_attn\",\n    \"layers.51.self_attn\",\n    \"layers.52.self_attn\",\n    \"layers.53.self_attn\",\n    \"layers.54.self_attn\",\n    \"layers.55.self_attn\",\n    \"layers.56.self_attn\",\n    \"layers.57.self_attn\",\n    \"layers.58.self_attn\",\n    \"layers.59.self_attn\",\n    \"layers.60.self_attn\",\n    \"layers.61.self_attn\",\n    \"layers.62.self_attn\",\n    \"layers.63.self_attn\",\n    \"layers.64.self_attn\",\n    \"layers.65.self_attn\",\n    \"layers.66.self_attn\",\n    \"layers.67.self_attn\",\n    \"layers.68.self_attn\",\n    \"layers.69.self_attn\",\n    \"layers.70.self_attn\",\n    \"layers.71.self_attn\",\n    \"layers.72.self_attn\",\n    \"layers.73.self_attn\",\n    \"layers.74.self_attn\",\n    \"layers.75.self_attn\",\n    \"layers.76.self_attn\",\n    \"layers.77.self_attn\",\n    \"layers.78.self_attn\",\n    \"layers.79.self_attn\",\n    \"layers.80.self_attn\",\n    \"layers.81.self_attn\",\n    \"layers.82.self_attn\",\n    \"layers.83.self_attn\",\n    \"layers.84.self_attn\",\n    \"layers.85.self_attn\",\n    \"layers.86.self_attn\",\n    \"layers.87.self_attn\",\n    \"layers.88.self_attn\",\n    \"layers.89.self_attn\",\n    \"layers.90.self_attn\",\n    \"layers.91.self_attn\",\n    \"layers.92.self_attn\",\n    \"layers.93.self_attn\",\n    \"layers.94.self_attn\",\n    \"layers.95.self_attn\",\n    \"layers.96.self_attn\",\n    \"layers.97.self_attn\",\n    \"layers.98.self_attn\",\n    \"layers.99.self_attn\",\n    \"layers.100.self_attn\",\n    \"layers.101.self_attn\",\n    \"layers.102.self_attn\",\n    \"layers.103.self_attn\",\n    \"layers.104.self_attn\",\n    \"layers.105.self_attn\",\n    \"layers.106.self_attn\",\n    \"layers.107.self_attn\",\n    \"layers.108.self_attn\",\n    \"layers.109.self_attn\",\n    \"layers.110.self_attn\",\n    \"layers.111.self_attn\",\n    \"layers.112.self_attn\",\n    
\"layers.113.self_attn\",\n    \"layers.114.self_attn\",\n    \"layers.115.self_attn\",\n    \"layers.116.self_attn\",\n    \"layers.117.self_attn\",\n    \"layers.118.self_attn\",\n    \"layers.119.self_attn\",\n    \"layers.120.self_attn\",\n    \"layers.121.self_attn\",\n    \"layers.122.self_attn\",\n    \"layers.123.self_attn\",\n    \"layers.124.self_attn\",\n    \"layers.125.self_attn\"\n    ]\n  },\n\"draft_model\":{\n    \"modules_to_not_convert\": [\n    \"lm_head\",\n    \"layers.0.mlp\",\n    \"layers.1.mlp\",\n    \"layers.2.mlp\",\n    \"layers.3.mlp\",\n    \"layers.4.mlp\",\n    \"layers.5.mlp\",\n    \"layers.6.mlp\",\n    \"layers.7.mlp\",\n    \"layers.8.mlp\",\n    \"layers.9.mlp\",\n    \"layers.10.mlp\",\n    \"layers.11.mlp\",\n    \"layers.12.mlp\",\n    \"layers.13.mlp\",\n    \"layers.14.mlp\",\n    \"layers.15.mlp\",\n    \"layers.0.self_attn\",\n    \"layers.1.self_attn\",\n    \"layers.2.self_attn\",\n    \"layers.3.self_attn\",\n    \"layers.4.self_attn\",\n    \"layers.5.self_attn\",\n    \"layers.6.self_attn\",\n    \"layers.7.self_attn\",\n    \"layers.8.self_attn\",\n    \"layers.9.self_attn\",\n    \"layers.10.self_attn\",\n    \"layers.11.self_attn\",\n    \"layers.12.self_attn\",\n    \"layers.13.self_attn\",\n    \"layers.14.self_attn\",\n    \"layers.15.self_attn\",\n    \"fc\"\n    ]\n  }\n}\n    "
  },
  {
    "path": "libraries/nxd-inference/tutorials/pixtral-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploy Pixtral Large on Trn2 instances\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide to deploy [mistralai/Pixtral-Large-Instruct-2411](https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411) using NeuronX Distributed (NxD) Inference on a single `trn2.48xlarge` instance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\\n\",\n    \"\\n\",\n    \"### Set up and connect to a `trn2.48xlarge` instance\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance with a Deep Learning AMI that has the Neuron SDK pre-installed. To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup).\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"To use Jupyter Notebook on the Neuron instance, you can follow this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html). After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see Neuron packages including\\n\",\n    \"`neuronx-distributed-inference` and `neuronx-cc`.\\n\",\n    \"\\n\",\n    \"### Install packages\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is\\n\",\n    \"available in a fork of the vLLM GitHub repository:\\n\",\n    \"\\n\",\n    \"- [aws-neuron/upstreaming-to-vllm](https://github.com/aws-neuron/upstreaming-to-vllm/tree/neuron-2.26)\\n\",\n    \"\\n\",\n    \"To run NxD Inference with vLLM, you need to download and install vLLM from this fork. Refer the [Neuron vllm installation guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html#installing-the-aws-neuron-fork-of-vllm) to install vllm.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Ensure that the Neuron virtual environment is activated if using a new terminal instead of the one from connect step above. 
Then, install the Neuron vLLM fork into the virtual environment.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1 Download the model and convert the checkpoint\\n\",\n    \"\\n\",\n    \"To deploy [mistralai/Pixtral-Large-Instruct-2411](https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411) on Neuron, you need to first download the checkpoint from HuggingFace to a local path on the Trn2 instance (for more information on downloading models from HuggingFace, refer [the guide on Downloading models](https://huggingface.co/docs/hub/en/models-downloading)).\\n\",\n    \"\\n\",\n    \"Once you have downloaded the model, convert the original Pixtral checkpoint by running the following [script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/convert_pixtral_weights_to_hf.py). After the conversion, you should see a `config.json` file in the output folder along with weights in `model-xxxx-of-xxxx.safetensors` format.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"<div class=\\\"alert alert-block alert-warning\\\">\\n\",\n    \"<b>Note:</b> There is a known issue in the Huggingface conversion script that sets the `image_token_index` to `32000` in `config.json`. You need to manually set `image_token_index` to `10` before proceeding with the subsequent steps.\\n\",\n    \"</div>\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Compile and deploy Pixtral Large\\n\",\n    \"\\n\",\n    \"While compiling the model, certain configurations are used to optimize the performance of the model. These configurations are described below and can be modified as per one's use-case.\\n\",\n    \"\\n\",\n    \"- Pixtral consists of a **_text_** model and a **_vision encoder_**. You need to specify configurations explicitly through `text_neuron_config` and `vision_neuron_config`.\\n\",\n    \"- `tp_degree` : This is the tensor parallel degree for sharding the model across the neuron cores. Here, it is set to **64** for the **_text model_** and **16** for the **_vision encoder_**.\\n\",\n    \"- `batch_size` : This is set to the batch size for compiling the models. Currently prefill is always done with `batch_size = 1`; hence the `batch_size` in `vision_neuron_config` is set to **1** and the `batch_size` in `text_neuron_config` is set to the desired value for handling concurrent requests (same as `max-num-seqs` for the vllm argument).\\n\",\n    \"- `seq_len` : Set this to the maximum sequence length that needs to be supported.\\n\",\n    \"- `text_neuron_config`\\n\",\n    \"    - `enable_bucketing` : [Bucketing](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#bucketing) allows one to optimize performance for specific sequence lengths and in this case we [configure specific buckets](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#configuring-specific-buckets). \\n\",\n    \"    - `context_encoding_buckets` : This refers to the prefill phase (size of the input prompt) and should be set to handle different sequence lengths for inputs. It's set to `[2048, 4096, 10240]`.\\n\",\n    \"    - `token_generation_buckets` : Token generation buckets are set to the output token lengths. 
In this case - `[2048, 4096, 10240]`.\\n\",\n    \"    - `flash_decoding_enabled` : Setting this to `True` enables partitioning the KV cache and improves the performance for long sequences. Refer the app note on [Flash Decoding](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/app-notes/parallelism.html#flash-decoding) for more details.\\n\",\n    \"    - `sequence_parallel_enabled` : [Sequence Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#sequence-parallelism) splits tensors across the sequence dimension to improve performance.\\n\",\n    \"    - `fused_qkv` : [QKV weight fusion](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#qkv-weight-fusion) concatenates a model's query, key and value weight matrices to achieve better performance.\\n\",\n    \"    - `qkv_kernel_enabled` : Enable the use of the fused QKV kernel.\\n\",\n    \"    - `mlp_kernel_enabled` : Enable the use of the MLP kernel.\\n\",\n    \"    - `cc_pipeline_tiling_factor` : \\n\",\n    \"- `vision_neuron_config`\\n\",\n    \"    - `buckets` : In the context of the vision encoder, buckets account for two dimensions - image sizes and number of images. The Pixtral HF processor processes each image in `16x16` patches. For example, a `512x512` image is processed as a `32x32` grid, which is `32x32=1024` image tokens. To handle 6 images, it'll be `6144` tokens. In this case, buckets are set to `[2048, 4096, 6144, 8192, 10240]` to handle different number of images and image sizes.\\n\",\n    \"    - `seq_len` : Set this to the maximum sequence length for the use case.\\n\",\n    \"    - `tp_degree` : The vision encoder uses a tensor parallel degree of 16. \\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Compile and deploy using `vllm`\\n\",\n    \"In this step, you can directly use the vllm command to deploy the model. The `neuronx-distributed-inference` model loader in vllm performs JIT compilation before deploying it with the model server. 
Replace `<path to converted pixtral checkpoint>` with your specific path before running the below command.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile start_vllm.sh\\n\",\n    \"#!/bin/bash\\n\",\n    \"\\n\",\n    \"echo \\\"Running vLLM server in the background...\\\"\\n\",\n    \"rm -f ./vllm_server.log \\n\",\n    \"\\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0 \\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=2\\n\",\n    \"export VLLM_NEURON_FRAMEWORK=\\\"neuronx-distributed-inference\\\"\\n\",\n    \"VLLM_RPC_TIMEOUT=100000\\n\",\n    \"\\n\",\n    \"nohup vllm serve \\\\\\n\",\n    \"    --model \\\"/home/ubuntu/model_hf/\\\" \\\\\\n\",\n    \"    --limit-mm-per-prompt 'image=6' \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --max-model-len 10240 \\\\\\n\",\n    \"    --max-num-seqs 4 \\\\\\n\",\n    \"    --device neuron \\\\\\n\",\n    \"    --override-neuron-config \\\"{\\\\\\\"text_neuron_config\\\\\\\": { \\\\\\\"tp_degree\\\\\\\": 64, \\\\\\\"world_size\\\\\\\": 64, \\\\\\\"batch_size\\\\\\\": 4, \\\\\\\"seq_len\\\\\\\": 10240, \\\\\\\"ctx_batch_size\\\\\\\": 1, \\\\\\\"flash_decoding_enabled\\\\\\\": true, \\\\\\\"enable_bucketing\\\\\\\": true, \\\\\\\"skip_warmup\\\\\\\": true, \\\\\\\"context_encoding_buckets\\\\\\\": [2048, 4096, 10240], \\\\\\\"token_generation_buckets\\\\\\\": [2048, 4096, 10240], \\\\\\\"torch_dtype\\\\\\\": \\\\\\\"float16\\\\\\\", \\\\\\\"sequence_parallel_enabled\\\\\\\": true, \\\\\\\"fused_qkv\\\\\\\": true, \\\\\\\"qkv_kernel_enabled\\\\\\\": true, \\\\\\\"mlp_kernel_enabled\\\\\\\": true, \\\\\\\"cc_pipeline_tiling_factor\\\\\\\": 1 }, \\\\\\\"vision_neuron_config\\\\\\\": { \\\\\\\"batch_size\\\\\\\": 1, \\\\\\\"seq_len\\\\\\\": 10240, \\\\\\\"tp_degree\\\\\\\": 16, \\\\\\\"world_size\\\\\\\": 64, \\\\\\\"torch_dtype\\\\\\\": \\\\\\\"float16\\\\\\\", \\\\\\\"buckets\\\\\\\": [2048, 4096, 6144, 8192, 10240] }}\\\" > ./vllm_server.log 2>&1 &\\n\",\n    \"SERVER_PID=$!\\n\",\n    \"\\n\",\n    \"echo \\\"Server started in the background with the following id: $SERVER_PID. Waiting until server is ready to serve...\\\"\\n\",\n    \"until grep -q \\\"Application startup complete\\\" ./vllm_server.log 2>/dev/null || ! kill -0 $SERVER_PID 2>/dev/null; do sleep 0.5; done\\n\",\n    \"grep -q \\\"Application startup complete\\\" ./vllm_server.log 2>/dev/null && echo \\\"vLLM Server is ready!\\\" || (echo \\\"vLLM Server failed, check the ./vllm_server.log file\\\" && exit 1)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!chmod +x ./start_vllm.sh\\n\",\n    \"!./start_vllm.sh\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": []\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Ping the server using a client\\n\",\n    \"\\n\",\n    \"After deploying the model server, you can run inference by sending it requests. 
The below example sends a text prompt with a single image - \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import requests\\n\",\n    \"import json\\n\",\n    \"from huggingface_hub import hf_hub_download\\n\",\n    \"from datetime import datetime, timedelta\\n\",\n    \"\\n\",\n    \"url = \\\"http://0.0.0.0:8000/v1/chat/completions\\\"\\n\",\n    \"headers = {\\\"Content-Type\\\": \\\"application/json\\\", \\\"Authorization\\\": \\\"Bearer token\\\"}\\n\",\n    \"\\n\",\n    \"model = \\\"mistralai/Pixtral-Large-Instruct-2411\\\"\\n\",\n    \"vllm_model = \\\"/home/ubuntu/model_hf/\\\"\\n\",\n    \"\\n\",\n    \"def load_system_prompt(repo_id: str, filename: str) -> str:\\n\",\n    \"    file_path = hf_hub_download(repo_id=repo_id, filename=filename)\\n\",\n    \"    with open(file_path, \\\"r\\\") as file:\\n\",\n    \"        system_prompt = file.read()\\n\",\n    \"    today = datetime.today().strftime(\\\"%Y-%m-%d\\\")\\n\",\n    \"    yesterday = (datetime.today() - timedelta(days=1)).strftime(\\\"%Y-%m-%d\\\")\\n\",\n    \"    model_name = repo_id.split(\\\"/\\\")[-1]\\n\",\n    \"    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"SYSTEM_PROMPT = load_system_prompt(model, \\\"SYSTEM_PROMPT.txt\\\")\\n\",\n    \"\\n\",\n    \"image_url = \\\"https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png\\\"\\n\",\n    \"\\n\",\n    \"messages = [\\n\",\n    \"    {\\\"role\\\": \\\"system\\\", \\\"content\\\": SYSTEM_PROMPT},\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"            {\\n\",\n    \"                \\\"type\\\": \\\"text\\\",\\n\",\n    \"                \\\"text\\\": \\\"Which of the depicted countries has the best food? Which the second and third and fourth? Name the country, its color on the map and one its city that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.\\\",\\n\",\n    \"            },\\n\",\n    \"            {\\\"type\\\": \\\"image_url\\\", \\\"image_url\\\": {\\\"url\\\": image_url}},\\n\",\n    \"        ],\\n\",\n    \"    },\\n\",\n    \"]\\n\",\n    \"\\n\",\n    \"data = {\\\"model\\\": vllm_model, \\\"messages\\\": messages}\\n\",\n    \"\\n\",\n    \"response = requests.post(url, headers=headers, data=json.dumps(data))\\n\",\n    \"print(response.json()[\\\"choices\\\"][0][\\\"message\\\"][\\\"content\\\"])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Sample response from the model\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"The ranking of countries based on the best food is subjective and can vary greatly depending on personal preferences. It can be perceived as offensive by some to rank cuisines but I will do it based on commonly held opinions.\\n\",\n    \"\\n\",\n    \"1. Italy\\n\",\n    \"Color on the map: Brown\\n\",\n    \"City visible on the map: Napoli (in brown color)\\n\",\n    \"\\n\",\n    \"2. France\\n\",\n    \"Color on the map: Dark teal\\n\",\n    \"City visible on the map: Marseille (in dark teal color)\\n\",\n    \"\\n\",\n    \"3. Spain\\n\",\n    \"Color on the map: Red pink\\n\",\n    \"City visible on the map: Barcelona (in red pink color)\\n\",\n    \"\\n\",\n    \"4. 
Germany\\n\",\n    \"Color on the map: Orange\\n\",\n    \"City visible on the map: Cologne (in orange color)\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"Congratulations! You now know how to deploy `mistralai/Pixtral-Large-Instruct-2411` on a `trn2.48xlarge` instance. Modify the configurations and deploy the model as per your requirements and use case.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"neuron-224\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/qwen2-vl-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploy Qwen2-VL on Trn2 instances\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide to deploy [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) using NeuronX Distributed (NxD) Inference on a single `trn2.48xlarge` instance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Set up your development environment\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To run a Jupyter (.ipynb) notebook on a Neuron instance, follow this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see Neuron packages including\\n\",\n    \"`neuronx-distributed-inference` and `neuronx-cc`.\\n\",\n    \"\\n\",\n    \"## Step 2: Install the vLLM version that supports NxD Inference\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is available in the vLLM-Neuron GitHub repository. Install the latest release branch of vLLM-Neuron plugin following instructions in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Ensure that the Neuron virtual environment is activated if you are using a new terminal session instead of the one from connection step above. Then, install the Neuron vLLM fork into the virtual environment.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3 Download the model from HuggingFace (Optional)\\n\",\n    \"\\n\",\n    \"To deploy [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) on Neuron, first download the checkpoint from HuggingFace to a local path on the Trn2 instance. 
For more information on downloading models from HuggingFace, refer to [HuggingFace's guide on Downloading models](https://huggingface.co/docs/hub/en/models-downloading)).\\n\",\n    \"\\n\",\n    \"After the download, you should see a `config.json` file in the output folder along with weights in `model-xxxx-of-xxxx.safetensors` format.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 4: Compile and deploy Qwen2-VL Inference\\n\",\n    \"\\n\",\n    \"In this step, you use the `vllm` command to deploy the model. The `neuronx-distributed-inference` model loader in vllm performs JIT compilation before deploying it with the model server. Replace the `model_name_or_path` with your specific path if you download the model checkpoint from HuggingFace(Step 3).\\n\",\n    \"\\n\",\n    \"Here are two examples of running Qwen2-VL with vLLM V1:\\n\",\n    \"\\n\",\n    \"* Offline inference: you can provide prompts in a python script and execute it.\\n\",\n    \"* Online inference: you will serve the model in an online server and send requests. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Model Configuration Requirements & Examples\\n\",\n    \"\\n\",\n    \"There is a known issue with `batch_size` > 1 or `tp_degree` != 4 configurations for Qwen2-VL models. Here we suggest to use `batch_size` = 1 and `tp_degree` = 4 configuration, which deploys `Qwen/Qwen2-VL-7B-Instruct` model on a single trn2 chip with 4 cores. You can replicate the setting on the `trn2.48xlarge` instance consisting of 16 chips and 64 cores.\\n\",\n    \"\\n\",\n    \"We support configurable image sizes for Qwen2-VL and use `number_of_images` as the vision buckets. For example, in the configuration below, `number_of_images` is the maximum vision bucket, i.e., `128`.\\n\",\n    \"Please specify `default_image_width` and `default_image_height` in the `vision_neuron_config` as the input image size. The default image sizes are `default_image_width: 640` and `default_image_height: 320`.\\n\",\n    \"\\n\",\n    \"<div class=\\\"alert alert-block alert-warning\\\">\\n\",\n    \"<b>Note:</b> Please make sure the number of tokens does not exceed the `max_context_length` in the `text_neuron_config`, i.e., `number_of_prompt_tokens + (default_image_width // 28) * (default_image_height // 28) * number_of_images < max_context_length - max_new_tokens`.\\n\",\n    \"</div>\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We configure these fields below to improve performance. For more details, refer to [NxD Inference features configurations guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html).\\n\",\n    \"- **`enable_ve_data_parallelism`: whether to enable vision encoder data parallelism.**\\n\",\n    \"\\n\",\n    \"<div class=\\\"alert alert-block alert-warning\\\">\\n\",\n    \"<b>Note:</b> We set the `ve_dp_degree` to `world_size // tp_degree` in the vision_neuron_config. With `enable_ve_data_parallelism=True`, we require the number of images (vision bucket size) to be divisible by `ve_dp_degree`.\\n\",\n    \"</div>\\n\",\n    \"\\n\",\n    \"- `sequence_parallel_enabled`: whether to enable sequence parallelism.\\n\",\n    \"- `fuse_qkv` and `qkv_kernel_enabled`: whether to use the fused QKV kernel. 
`qkv_kernel_enabled` is not supported yet in the `vision_neuron_config` for Qwen2-VL.\\n\",\n    \"- `attn_kernel_enabled`: whether to use the optimized attention kernel.\\n\",\n    \"\\n\",\n    \"Below we provide the recommended configuration with `batch_size` 1 and `tp_degree` 4.\\n\",\n    \"<div class=\\\"alert alert-block alert-warning\\\">\\n\",\n    \"<b>Note:</b> If you encounter Out-of-Memory issue during the runtime, please try to reduce the size of vision buckets as the KV cache grows linearly with batch size and sequence length.\\n\",\n    \"</div>\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"qwen2_vl_neuron_config = {\\n\",\n    \"    \\\"text_neuron_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"ctx_batch_size\\\": 1,\\n\",\n    \"        \\\"tkg_batch_size\\\": 1,\\n\",\n    \"        \\\"seq_len\\\": 32768,\\n\",\n    \"        \\\"max_new_tokens\\\": 64,\\n\",\n    \"        \\\"max_context_length\\\": 32768,\\n\",\n    \"        \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"skip_sharding\\\": False,\\n\",\n    \"        \\\"save_sharded_checkpoint\\\": True,\\n\",\n    \"        \\\"tp_degree\\\": 4,\\n\",\n    \"        \\\"world_size\\\": 4,\\n\",\n    \"        \\\"enable_bucketing\\\": True,\\n\",\n    \"        \\\"context_encoding_buckets\\\": [2048, 16384, 32768],\\n\",\n    \"        \\\"token_generation_buckets\\\": [2048, 16384, 32768],\\n\",\n    \"        \\\"fused_qkv\\\": True,\\n\",\n    \"        \\\"qkv_kernel_enabled\\\": True,\\n\",\n    \"        \\\"sequence_parallel_enabled\\\": True,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"        \\\"cc_pipeline_tiling_factor\\\": 2,\\n\",\n    \"        \\\"attention_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"rpl_reduce_dtype\\\": \\\"float16\\\",\\n\",\n    \"        \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"    },\\n\",\n    \"    \\\"vision_neuron_config\\\": {\\n\",\n    \"        \\\"batch_size\\\": 1,\\n\",\n    \"        \\\"seq_len\\\": 131072,\\n\",\n    \"        \\\"max_context_length\\\": 131072,\\n\",\n    \"        \\\"torch_dtype\\\": \\\"bfloat16\\\",\\n\",\n    \"        \\\"skip_sharding\\\": False,\\n\",\n    \"        \\\"save_sharded_checkpoint\\\": True,\\n\",\n    \"        \\\"tp_degree\\\": 1,\\n\",\n    \"        \\\"world_size\\\": 4,\\n\",\n    \"        \\\"fused_qkv\\\": True,\\n\",\n    \"        \\\"enable_ve_data_parallel\\\": True,\\n\",\n    \"        \\\"qkv_kernel_enabled\\\": False,\\n\",\n    \"        \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"        \\\"enable_bucketing\\\": True,\\n\",\n    \"        \\\"buckets\\\": [128],\\n\",\n    \"        \\\"cc_pipeline_tiling_factor\\\": 2,\\n\",\n    \"        \\\"attention_dtype\\\": \\\"bfloat16\\\",\\n\",\n    \"        \\\"rpl_reduce_dtype\\\": \\\"bfloat16\\\",\\n\",\n    \"        \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"        \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"        \\\"default_image_width\\\": 640,\\n\",\n    \"        \\\"default_image_height\\\": 320\\n\",\n    \"    }\\n\",\n    \"}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   
\"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"os.environ[\\\"VLLM_NEURON_FRAMEWORK\\\"] = \\\"neuronx-distributed-inference\\\"\\n\",\n    \"\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"from vllm.assets.image import ImageAsset\\n\",\n    \"from transformers import AutoProcessor\\n\",\n    \"\\n\",\n    \"def qwen2_vl_offline_test():\\n\",\n    \"    model_name_or_path = \\\"Qwen/Qwen2-VL-7B-Instruct/\\\"\\n\",\n    \"    # Create an LLM.\\n\",\n    \"    llm = LLM(\\n\",\n    \"    model=model_name_or_path,\\n\",\n    \"    tensor_parallel_size=4,\\n\",\n    \"    max_num_seqs=1,\\n\",\n    \"    max_model_len=32768,\\n\",\n    \"    additional_config=dict(\\n\",\n    \"        override_neuron_config=qwen2_vl_neuron_config  # Use the configuration defined above\\n\",\n    \"    ),\\n\",\n    \"    enable_prefix_caching=False,\\n\",\n    \"    enable_chunked_prefill=False,\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"    # Sample prompts.\\n\",\n    \"    prompt = \\\"What do you see in these images?\\\"\\n\",\n    \"    # Resize to default image size\\n\",\n    \"    default_image_size = (640, 320)\\n\",\n    \"\\n\",\n    \"    images = [\\n\",\n    \"        ImageAsset(\\\"blue_flowers\\\").pil_image.resize(default_image_size),\\n\",\n    \"        ImageAsset(\\\"bird\\\").pil_image.resize(default_image_size),\\n\",\n    \"    ]\\n\",\n    \"\\n\",\n    \"    processor = AutoProcessor.from_pretrained(model_name_or_path)\\n\",\n    \"\\n\",\n    \"    placeholders = [{\\\"type\\\": \\\"image\\\"} for _ in images]\\n\",\n    \"    messages = [\\n\",\n    \"    {\\\"role\\\": \\\"system\\\", \\\"content\\\": \\\"You are a helpful assistant.\\\"},\\n\",\n    \"    {\\n\",\n    \"        \\\"role\\\": \\\"user\\\",\\n\",\n    \"        \\\"content\\\": [\\n\",\n    \"                *placeholders,\\n\",\n    \"                {\\n\",\n    \"                \\\"type\\\": \\\"text\\\",\\n\",\n    \"                \\\"text\\\": prompt,\\n\",\n    \"                },\\n\",\n    \"        ],\\n\",\n    \"    },\\n\",\n    \"    ]\\n\",\n    \"\\n\",\n    \"    prompt = processor.apply_chat_template(\\n\",\n    \"    messages,\\n\",\n    \"    tokenize=False,\\n\",\n    \"    add_generation_prompt=True,\\n\",\n    \"    )\\n\",\n    \"    inputs = {\\n\",\n    \"    \\\"prompt\\\": prompt,\\n\",\n    \"    \\\"multi_modal_data\\\": {\\n\",\n    \"        \\\"image\\\": images,\\n\",\n    \"    },\\n\",\n    \"    }\\n\",\n    \"    outputs = llm.generate([inputs], SamplingParams(top_k=1, max_tokens=64))\\n\",\n    \"\\n\",\n    \"    for output in outputs:\\n\",\n    \"        generated_text = output.outputs[0].text\\n\",\n    \"        print(f\\\"Generated text: {generated_text!r}\\\")\\n\",\n    \"\\n\",\n    \"if __name__ == \\\"__main__\\\":\\n\",\n    \"    qwen2_vl_offline_test()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"```bash\\n\",\n    \"Generated text: 'The first image shows a close-up of a flower with blue petals and water droplets on them, set against a dark background. 
The second image features a vibrant red bird with blue and green wings perched on a branch.'\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"\\n\",\n    \"VLLM_NEURON_FRAMEWORK='neuronx-distributed-inference'\\n\",\n    \"model_name_or_path = 'Qwen/Qwen2-VL-7B-Instruct/'  # model ID or local checkpoint path, same as in the offline example\\n\",\n    \"additional_neuron_config=json.dumps(dict(override_neuron_config=qwen2_vl_neuron_config))\\n\",\n    \"start_server_cmd = f'''python3 -m vllm.entrypoints.openai.api_server \\\\\\n\",\n    \"   --model=\\\\'{model_name_or_path}\\\\' \\\\\\n\",\n    \"   --tensor-parallel-size=4 \\\\\\n\",\n    \"   --max-num-seqs=1 \\\\\\n\",\n    \"   --max-model-len=32768 \\\\\\n\",\n    \"   --additional-config=\\\\'{additional_neuron_config}\\\\' \\\\\\n\",\n    \"   --no-enable-chunked-prefill \\\\\\n\",\n    \"   --no-enable-prefix-caching \\\\\\n\",\n    \"   --port=8080\\n\",\n    \"'''\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"os.system(start_server_cmd)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once the vLLM server is online, submit requests using the example below:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from openai import OpenAI\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"client = OpenAI(api_key=\\\"EMPTY\\\", base_url=\\\"http://0.0.0.0:8080/v1\\\")\\n\",\n    \"models = client.models.list()\\n\",\n    \"model_name = models.data[0].id\\n\",\n    \"\\n\",\n    \"messages = [\\n\",\n    \"   {\\\"role\\\": \\\"system\\\", \\\"content\\\": \\\"You are a helpful assistant.\\\"},\\n\",\n    \"   {\\n\",\n    \"      \\\"role\\\": \\\"user\\\",\\n\",\n    \"      \\\"content\\\": [\\n\",\n    \"        {\\n\",\n    \"            \\\"type\\\": \\\"text\\\",\\n\",\n    \"            \\\"text\\\": \\\"Describe this image.\\\",\\n\",\n    \"        },\\n\",\n    \"        {\\n\",\n    \"            \\\"type\\\": \\\"image_url\\\",\\n\",\n    \"            \\\"image_url\\\": {\\n\",\n    \"                \\\"url\\\": \\\"example_image_url\\\" # need to resize to {default_image_width}x{default_image_height}\\n\",\n    \"            }\\n\",\n    \"        }\\n\",\n    \"      ],\\n\",\n    \"   },\\n\",\n    \"]\\n\",\n    \"\\n\",\n    \"response = client.chat.completions.create(\\n\",\n    \"    model=model_name,\\n\",\n    \"    messages=messages,\\n\",\n    \"    max_tokens=64,\\n\",\n    \"    temperature=1.0,\\n\",\n    \"    top_p=1.0,\\n\",\n    \"    stream=False,\\n\",\n    \"    extra_body={\\\"top_k\\\": 1},\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"generated_text = response.choices[0].message.content\\n\",\n    \"print(generated_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"Congratulations! You now know how to deploy `Qwen/Qwen2-VL-7B-Instruct` on a `trn2.48xlarge` instance. 
Modify the configurations and deploy the model as per your requirements and use case.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"neuron-224\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/qwen3-moe-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d860f5c8\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploy Qwen3-MoE 235B on Trn2 instances\\n\",\n    \"This tutorial provides a step-by-step guide to deploy [Qwen/Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) on a single `trn2.48xlarge` instance using vLLM V1 with the vLLM-Neuron Plugin.\\n\",\n    \"\\n\",\n    \"**Note**: Qwen3-MoE 235B may observe degraded decode throughput compared to previous releases. Our team is actively investigating the root cause. In the meantime, we recommend customers use release 2.28 for workloads where Qwen3-MoE 235B decode performance is critical.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f05df502\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bdc1ca69\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Set up your development environment\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To use Jupyter Notebook on the Neuron instance, you can follow this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2cb24e4d\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\\n\",\n    \"You should see Neuron packages including\\n\",\n    \"`neuronx-distributed-inference` and `neuronx-cc`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"c4dcab0f\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Install the vLLM version that supports NxD Inference\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is available in the vLLM-Neuron GitHub repository. Install the latest release branch of vLLM-Neuron plugin following instructions in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"db96b543\",\n   \"metadata\": {},\n   \"source\": [\n    \"Ensure that the Neuron virtual environment is activated if using a new terminal instead of the one from connect step above. 
Then, install the Neuron vLLM plugin into the virtual environment.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f7683efb\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Download the model from HuggingFace\\n\",\n    \"\\n\",\n    \"To deploy [Qwen/Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) on Neuron, you need to first download the checkpoint from HuggingFace to a local path on the Trn2 instance (for more information on downloading models from HuggingFace, refer to [the guide on Downloading models](https://huggingface.co/docs/hub/en/models-downloading)).\\n\",\n    \"\\n\",\n    \"After the download, you should see a `config.json` file in the output folder along with weights in `model-xxxx-of-xxxx.safetensors` format.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"ba133c4a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 4: Compile and deploy Qwen3 Inference\\n\",\n    \"\\n\",\n    \"In this step, you can directly use the vllm command to deploy the model. The `neuronx-distributed-inference` model loader in vllm performs JIT compilation before deploying it with the model server. Replace the default model path `~/models/Qwen3-235B-A22B/` with your specific path before running the below command.\\n\",\n    \"\\n\",\n    \"We provide two examples to run Qwen3 with vLLM V1:\\n\",\n    \"\\n\",\n    \"* Offline inference: you can provide prompts in a Python script and execute it.\\n\",\n    \"* Online inference: you will serve the model in an online server and send requests. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"1eee733a\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Model Compilation and Configuration\\n\",\n    \"There is a known issue with `batch_size < 16` for Qwen3 MoE configurations.\\n\",\n    \"\\n\",\n    \"There are a few fields you can configure to improve performance:\\n\",\n    \"- `tp_degree`: degree of tensor parallelism.\\n\",\n    \"- `attention_dp_degree`: degree of data parallelism at the attention layer for decoding.\\n\",\n    \"- `cp_degree`: degree of context parallelism at the attention layer for prefill.\\n\",\n    \"- `moe_tp_degree`: degree of tensor parallelism at the MoE layer; `moe_tp_degree`*`moe_ep_degree` should equal `tp_degree`.\\n\",\n    \"- `moe_ep_degree`: degree of expert parallelism at the MoE layer; `moe_tp_degree`*`moe_ep_degree` should equal `tp_degree`.\\n\",\n    \"- `blockwise_matmul_config`: the configuration of the blockwise MoE kernel for prefill; we recommend sharding on the intermediate dimension.\\n\",\n    \"- `use_index_calc_kernel`: whether to use a specialized kernel for index calculations.\\n\",\n    \"- `moe_mask_padded_tokens`: whether to mask padded tokens at the MoE layer.\\n\",\n    \"- `qkv_kernel_enabled` and `qkv_nki_kernel_enabled`: whether to use the fused QKV kernel.\\n\",\n    \"- `qkv_cte_nki_kernel_fuse_rope`: whether to use the fused QKV and RoPE kernel.\\n\",\n    \"- `strided_context_parallel_kernel_enabled`: whether to use the strided context parallel flash attention kernel.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ca58ccff\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"qwen3_moe_neuron_config = {\\n\",\n    \"    \\\"tp_degree\\\": 64,\\n\",\n    \"    \\\"attention_dp_degree\\\": 8,\\n\",\n    \"    \\\"cp_degree\\\": 16,\\n\",\n    \"    \\\"moe_tp_degree\\\": 2,\\n\",\n    \"    \\\"moe_ep_degree\\\": 32,\\n\",\n    \"    
\\\"use_index_calc_kernel\\\": True,\\n\",\n    \"    \\\"moe_mask_padded_tokens\\\": True,\\n\",\n    \"    \\\"batch_size\\\": 16,\\n\",\n    \"    \\\"ctx_batch_size\\\": 1,\\n\",\n    \"    \\\"max_context_length\\\": 16384,\\n\",\n    \"    \\\"seq_len\\\": 16384,\\n\",\n    \"    \\\"is_continuous_batching\\\": True,\\n\",\n    \"    \\\"fused_qkv\\\": True,\\n\",\n    \"    \\\"blockwise_matmul_config\\\":{\\\"use_shard_on_intermediate_dynamic_while\\\": True, \\\"skip_dma_token\\\": True},\\n\",\n    \"    \\\"on_device_sampling_config\\\": {\\n\",\n    \"        \\\"do_sample\\\": True,\\n\",\n    \"        \\\"temperature\\\": 0.6,\\n\",\n    \"        \\\"top_k\\\": 20,\\n\",\n    \"        \\\"top_p\\\": 0.95\\n\",\n    \"    },\\n\",\n    \"    \\\"enable_bucketing\\\": True,\\n\",\n    \"    \\\"context_encoding_buckets\\\": [10240, 16384],\\n\",\n    \"    \\\"token_generation_buckets\\\": [10240, 16384],\\n\",\n    \"    \\\"flash_decoding_enabled\\\": False,\\n\",\n    \"    \\\"logical_nc_config\\\": 2,\\n\",\n    \"    \\\"sequence_parallel_enabled\\\": True,\\n\",\n    \"    \\\"qkv_kernel_enabled\\\": True,\\n\",\n    \"    \\\"qkv_nki_kernel_enabled\\\": True,\\n\",\n    \"    \\\"qkv_cte_nki_kernel_fuse_rope\\\": True,\\n\",\n    \"    \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"    \\\"strided_context_parallel_kernel_enabled\\\": True,\\n\",\n    \"    \\\"async_mode\\\": True\\n\",\n    \"}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4ffbd74f\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"a1320307\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"os.environ[\\\"VLLM_NEURON_FRAMEWORK\\\"] = \\\"neuronx-distributed-inference\\\"\\n\",\n    \"\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"\\n\",\n    \"# Create an LLM.\\n\",\n    \"llm = LLM(\\n\",\n    \"   model=\\\"~/models/Qwen3-235B-A22B/\\\",\\n\",\n    \"   tensor_parallel_size=64,\\n\",\n    \"   max_num_seqs=16,\\n\",\n    \"   max_model_len=16384,\\n\",\n    \"   additional_config=dict(\\n\",\n    \"    override_neuron_config=qwen3_moe_neuron_config  # Use the configuration defined above\\n\",\n    \"    ),\\n\",\n    \"   enable_prefix_caching=False,\\n\",\n    \"   enable_chunked_prefill=False,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Sample prompts.\\n\",\n    \"prompts = [\\n\",\n    \"   \\\"The president of the United States is\\\",\\n\",\n    \"   \\\"The capital of France is\\\",\\n\",\n    \"   \\\"The future of AI is\\\",\\n\",\n    \"]\\n\",\n    \"outputs = llm.generate(prompts, SamplingParams(top_k=1))\\n\",\n    \"\\n\",\n    \"for output in outputs:\\n\",\n    \"   prompt = output.prompt\\n\",\n    \"   generated_text = output.outputs[0].text\\n\",\n    \"   print(f\\\"Prompt: {prompt!r}, Generated text: {generated_text!r}\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"7c0dce40\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"Prompt: 'The president of the United States is', Generated text: ' the head of state and head of government of the United States, indirectly elected to'\\n\",\n    \"Prompt: 'The capital of France is', Generated text: ' Paris. The capital of Italy is Rome. 
The capital of Germany is Berlin.'\\n\",\n    \"Prompt: 'The future of AI is', Generated text: \\\" not just about smarter algorithms or faster processors; it's about creating systems that can\\\"\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"417b462b\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d95a68b0\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"\\n\",\n    \"VLLM_NEURON_FRAMEWORK='neuronx-distributed-inference'\\n\",\n    \"additional_neuron_config=json.dumps(dict(override_neuron_config=qwen3_moe_neuron_config))\\n\",\n    \"start_server_cmd=cmd = f'''vllm serve \\\\\\n\",\n    \"   --model=\\\"~/models/Qwen3-235B-A22B/\\\" \\\\\\n\",\n    \"   --tensor-parallel-size=64 \\\\\\n\",\n    \"   --max-num-seqs=16 \\\\\\n\",\n    \"   --max-model-len=16384 \\\\\\n\",\n    \"   --additional-config=\\\\'{additional_neuron_config}\\\\' \\\\\\n\",\n    \"   --no-enable-chunked-prefill \\\\\\n\",\n    \"   --no-enable-prefix-caching \\\\\\n\",\n    \"   --port=8080\\n\",\n    \"'''\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"os.system(start_server_cmd)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d7c3024a\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once the vLLM server is online, submit requests using the example below:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"f09ec8eb\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from openai import OpenAI\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"client = OpenAI(api_key=\\\"EMPTY\\\", base_url=\\\"http://0.0.0.0:8080/v1\\\")\\n\",\n    \"models = client.models.list()\\n\",\n    \"model_name = models.data[0].id\\n\",\n    \"\\n\",\n    \"prompt = \\\"Hello, my name is Llama \\\"\\n\",\n    \"\\n\",\n    \"response = client.chat.completions.create(\\n\",\n    \"    model=model_name,\\n\",\n    \"    messages=[{\\\"role\\\": \\\"user\\\", \\\"content\\\": prompt}],\\n\",\n    \"    max_tokens=1024,\\n\",\n    \"    temperature=1.0,\\n\",\n    \"    top_p=1.0,\\n\",\n    \"    stream=False,\\n\",\n    \"    extra_body={\\\"top_k\\\": 50},\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"generated_text = response.choices[0].message.content\\n\",\n    \"print(generated_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"583a3a9f\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"```bash\\n\",\n    \"<think>\\n\",\n    \"Okay, so the user is Llama, and they want to know if I can handle that name. Let me think. First, Llama is an animal, but people can have names like that too. I should make sure I use the correct capitalization if that's how they present themselves. The user mentioned they're trying to start a conversation, so I should respond warmly. Maybe they want to check if I can remember their name or if I can be friendly. I need to acknowledge their name properly and invite them to ask questions. Also, considering Llama isn't a common name, I should take care not to misspell it or use lowercase unless instructed. Let me confirm the name and offer assistance. I'll keep it simple and welcoming.\\n\",\n    \"\\n\",\n    \"Wait, but maybe the user just wants to confirm they're using the correct name format. Should I include an emoji to keep the tone friendly? 
The example response uses a Llama face, but since Llama is their name, maybe a different emoji like a star or checkmark? Or maybe none, to stay professional. But the user wants a conversational tone, so perhaps a smiley. Let me structure the response as: \\\"Hello, Llama! 😊 Nice to meet you. How can I assist you today?\\\" That's friendly, uses their name correctly, and opens the door for help without assuming their intent.\\n\",\n    \"</think>\\n\",\n    \"\\n\",\n    \"Hello, Llama! 🤗 It's nice to meet you. How can I assist you today? Let me know if you have any questions or need help exploring topics!\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"6bd1f1bc\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"Congratulations ! You now know how to deploy `Qwen/Qwen3-235B-A22B` on a `trn2.48xlarge` instance. Modify the configurations and deploy the model as per your requirements and use case.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"language_info\": {\n   \"name\": \"python\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/qwen3-vl-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d860f5c8\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Deploy Qwen3-VL 8B on Trn2 instances\\n\",\n    \"This tutorial provides a step-by-step guide to deploy [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking) on a single `trn2.48xlarge` instance using vLLM V1 with the vLLM-Neuron Plugin.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f05df502\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b3d939e5\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Examples\\n\",\n    \"\\n\",\n    \"- [Offline Example](#offline-example)\\n\",\n    \"- [Online Example](#online-example)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bdc1ca69\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 1: Set up your development environment\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To use a Jupyter (.ipynb) notebook on a Neuron instance, follow this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"2cb24e4d\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\\n\",\n    \"You should see Neuron packages including\\n\",\n    \"`neuronx-distributed-inference` and `neuronx-cc`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"c4dcab0f\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 2: Install the vLLM version that supports NxD Inference\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is available in the vLLM-Neuron GitHub repository. Install the latest release branch of vLLM-Neuron plugin following instructions in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"db96b543\",\n   \"metadata\": {},\n   \"source\": [\n    \"Ensure that the Neuron virtual environment is activated if you are using a new terminal instead of the one from connection step above. 
Then, install the Neuron vLLM plugin into the virtual environment.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f7683efb\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 3: Download the model from HuggingFace (Optional)\\n\",\n    \"\\n\",\n    \"To deploy [Qwen/Qwen3-VL-8B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking) on Neuron, download the checkpoint from HuggingFace to a local path on the Trn2 instance. For more information on downloading models from HuggingFace, refer to [the HuggingFace guide on downloading models](https://huggingface.co/docs/hub/en/models-downloading).\\n\",\n    \"\\n\",\n    \"After the download, you should see a `config.json` file in the output folder along with weights in `model-xxxx-of-xxxx.safetensors` format.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"ba133c4a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step 4: Compile and deploy Qwen3 VL Inference\\n\",\n    \"\\n\",\n    \"We provide two examples to run Qwen3 VL with vLLM V1:\\n\",\n    \"\\n\",\n    \"* Offline inference: you can provide prompts in a Python script and execute it.\\n\",\n    \"* Online inference: you will serve the model in an online server and send requests. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"ab33a384\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Model Compilation and Configuration\\n\",\n    \"\\n\",\n    \"Certain configurations are used to optimize the performance of the model during compilation. These configurations are described below and can be modified for your specific use case.\\n\",\n    \"- Qwen3 VL consists of a text model and a vision encoder. You must specify configurations explicitly through `text_neuron_config` and `vision_neuron_config`.\\n\",\n    \"- `world_size`: maximum number of Neuron cores in the distributed environment. The text and vision models must have the same world size.\\n\",\n    \"- `tp_degree`: degree of tensor parallelism. The text and vision models can use different sharding schemes and therefore different TP degrees.\\n\",\n    \"- `batch_size`: This is set to the batch size for compiling the models. For optimized latency, prefill is always done with batch_size = 1; hence `ctx_batch_size` in `text_neuron_config` and the `batch_size` in `vision_neuron_config` are set to 1. The `batch_size` and `tkg_batch_size` in `text_neuron_config` are set to the desired value for handling concurrent requests (same as `max-num-seqs` for the vllm argument).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"1eee733a\",\n   \"metadata\": {},\n   \"source\": [\n    \"- `text_neuron_config`\\n\",\n    \"    - `seq_len`: Set this to the maximum sequence length in your use case. We currently support up to 32768 in the text model. 
This refers to the total length of vision and text, input and output tokens.\\n\",\n    \"    - `enable_bucketing`: [Bucketing](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#bucketing) allows one to optimize performance for specific sequence lengths and in this case we [configure specific buckets](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#configuring-specific-buckets).\\n\",\n    \"    - `context_encoding_buckets`: This refers to the prefill/context encoding phase and should be set to handle different total length of vision and text input tokens.\\n\",\n    \"    - `token_generation_buckets`: This refers to the decode/token generation phase. The bucket size should reflect the total sequence length, which is the sum of vision tokens, text input tokens, and output tokens.\\n\",\n    \"    - **`sequence_parallel_enabled`: Enable the sequence parallelism for text model.**\\n\",\n    \"    - `fused_qkv`: [QKV weight fusion](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#qkv-weight-fusion) concatenates a model’s query, key and value weight matrices to achieve better performance.\\n\",\n    \"    - `qkv_kernel_enabled`: Enable the use of the fused QKV kernel.\\n\",\n    \"    - `mlp_kernel_enabled`: Enable the use of the MLP kernel.\\n\",\n    \"    - `attn_kernel_enabled`: Enable the use of the Flash Attention kernel.\\n\",\n    \"- `vision_neuron_config`\\n\",\n    \"    - `seq_len`: Set this to the maximum vision sequence length in your use case. We currently support up to 16384 in the vision model. Vision sequence length is calculated by `num_images * (image_height//patch_size) * (image_width//patch_size)`.\\n\",\n    \"    - `buckets`: Set this to handle different vision sequence lengths.\\n\",\n    \"    - `fused_qkv`: [QKV weight fusion](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#qkv-weight-fusion) concatenates a model’s query, key and value weight matrices to achieve better performance.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0f583149\",\n   \"metadata\": {},\n   \"source\": [\n    \"<div class=\\\"alert alert-block alert-warning\\\">\\n\",\n    \"<b>Note:</b> Qwen3 VL vision embeddings are spatially compressed by a factor of `spatial_merge_size ** 2` before being fed into the text model. This value is defined in the model's `config.json`. 
As a result, the effective text context length is calculated as: `text_context_len = vision_seq_len // (spatial_merge_size ** 2)`.\\n\",\n    \"</div>\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ca58ccff\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"text_neuron_config = {\\n\",\n    \"    # Batch Size\\n\",\n    \"    \\\"batch_size\\\": 1,\\n\",\n    \"    \\\"ctx_batch_size\\\": 1,\\n\",\n    \"    \\\"tkg_batch_size\\\": 1,\\n\",\n    \"    \\n\",\n    \"    # Sequence Lengths\\n\",\n    \"    \\\"seq_len\\\": 32768,\\n\",\n    \"    \\\"max_context_length\\\": 32768,\\n\",\n    \"    \\n\",\n    \"    # Buckets\\n\",\n    \"    \\\"enable_bucketing\\\": True,\\n\",\n    \"    \\\"context_encoding_buckets\\\": [2048, 5120, 32768],\\n\",\n    \"    \\\"token_generation_buckets\\\": [2048, 5120, 32768],\\n\",\n    \"    \\n\",\n    \"    # Parallelism\\n\",\n    \"    \\\"world_size\\\": 16,\\n\",\n    \"    \\\"tp_degree\\\": 16,\\n\",\n    \"    \\\"sequence_parallel_enabled\\\": True,\\n\",\n    \"    \\n\",\n    \"    # Others\\n\",\n    \"    \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"    \\\"rpl_reduce_dtype\\\": \\\"float16\\\",\\n\",\n    \"    \\\"attention_dtype\\\": \\\"float16\\\",\\n\",\n    \"    \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"    \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"    \\\"cc_pipeline_tiling_factor\\\": 1,\\n\",\n    \"    \\n\",\n    \"    # Kernels\\n\",\n    \"    \\\"fused_qkv\\\": True,\\n\",\n    \"    \\\"qkv_kernel_enabled\\\": True,\\n\",\n    \"    \\\"mlp_kernel_enabled\\\": True,\\n\",\n    \"    \\\"attn_kernel_enabled\\\": True,\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"vision_neuron_config = {\\n\",\n    \"    # Batch Size\\n\",\n    \"    \\\"batch_size\\\": 1,\\n\",\n    \"    \\n\",\n    \"    # Sequence Lengths\\n\",\n    \"    \\\"seq_len\\\": 16384,\\n\",\n    \"    \\\"max_context_length\\\": 16384,\\n\",\n    \"    \\n\",\n    \"    # Buckets\\n\",\n    \"    \\\"enable_bucketing\\\": True,\\n\",\n    \"    \\\"buckets\\\": [1024, 2048, 16384],\\n\",\n    \"    \\n\",\n    \"    # Parallelism\\n\",\n    \"    \\\"world_size\\\": 16,\\n\",\n    \"    \\\"tp_degree\\\": 16,\\n\",\n    \"    \\n\",\n    \"    # Others\\n\",\n    \"    \\\"torch_dtype\\\": \\\"float16\\\",\\n\",\n    \"    \\\"rpl_reduce_dtype\\\": \\\"float16\\\",\\n\",\n    \"    \\\"cast_type\\\": \\\"as-declared\\\",\\n\",\n    \"    \\\"logical_neuron_cores\\\": 2,\\n\",\n    \"    \\\"cc_pipeline_tiling_factor\\\": 2,\\n\",\n    \"    \\n\",\n    \"    # Kernels\\n\",\n    \"    \\\"fused_qkv\\\": True,\\n\",\n    \"    \\\"attn_kernel_enabled\\\": False,\\n\",\n    \"    \\\"mlp_kernel_enabled\\\": False,\\n\",\n    \"}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4ffbd74f\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"a1320307\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"os.environ[\\\"VLLM_NEURON_FRAMEWORK\\\"] = \\\"neuronx-distributed-inference\\\"\\n\",\n    \"os.environ[\\\"NEURON_RT_DBG_INTRA_RDH_CHANNEL_BUFFER_SIZE\\\"] = \\\"146800640\\\" # to support 32k sequence length\\n\",\n    \"\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"\\n\",\n    \"model_name_or_path = \\\"~/models/Qwen3-VL-8B-Thinking/\\\"\\n\",\n    \"\\n\",\n    \"# Create an 
LLM.\\n\",\n    \"llm = LLM(\\n\",\n    \"   model=model_name_or_path,\\n\",\n    \"   tokenizer=model_name_or_path,\\n\",\n    \"   trust_remote_code=True,\\n\",\n    \"   tensor_parallel_size=16,\\n\",\n    \"   max_num_seqs=1,\\n\",\n    \"   max_model_len=32768,\\n\",\n    \"   additional_config={\\n\",\n    \"      \\\"override_neuron_config\\\": {\\n\",\n    \"            \\\"text_neuron_config\\\": text_neuron_config,\\n\",\n    \"            \\\"vision_neuron_config\\\": vision_neuron_config\\n\",\n    \"      }\\n\",\n    \"   },\\n\",\n    \"   limit_mm_per_prompt={\\\"image\\\": 20}, # Use the max number of image in your use case\\n\",\n    \"   enable_prefix_caching=False,\\n\",\n    \"   enable_chunked_prefill=False,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Sample prompts.\\n\",\n    \"from transformers import AutoProcessor\\n\",\n    \"from vllm.assets.image import ImageAsset\\n\",\n    \"\\n\",\n    \"processor = AutoProcessor.from_pretrained(model_name_or_path)\\n\",\n    \"\\n\",\n    \"prompt = \\\"What do you see in these images?\\\"\\n\",\n    \"images = [\\n\",\n    \"   ImageAsset(\\\"blue_flowers\\\").pil_image,\\n\",\n    \"   ImageAsset(\\\"bird\\\").pil_image,\\n\",\n    \"]\\n\",\n    \"      \\n\",\n    \"placeholders = [{\\\"type\\\": \\\"image\\\"} for _ in images]\\n\",\n    \"messages = [\\n\",\n    \"   {\\\"role\\\": \\\"system\\\", \\\"content\\\": \\\"You are a helpful assistant.\\\"},\\n\",\n    \"   {\\n\",\n    \"   \\\"role\\\": \\\"user\\\",\\n\",\n    \"      \\\"content\\\": [\\n\",\n    \"               *placeholders,\\n\",\n    \"               {\\n\",\n    \"               \\\"type\\\": \\\"text\\\",\\n\",\n    \"               \\\"text\\\": prompt,\\n\",\n    \"               },\\n\",\n    \"      ],\\n\",\n    \"   },\\n\",\n    \"]\\n\",\n    \"\\n\",\n    \"prompt = processor.apply_chat_template(\\n\",\n    \"   messages,\\n\",\n    \"   tokenize=False,\\n\",\n    \"   add_generation_prompt=True,\\n\",\n    \")\\n\",\n    \"inputs = {\\n\",\n    \"   \\\"prompt\\\": prompt,\\n\",\n    \"   \\\"multi_modal_data\\\": {\\n\",\n    \"      \\\"image\\\": images,\\n\",\n    \"   },\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"outputs = llm.generate([inputs], SamplingParams(top_k=1, max_tokens=1024))\\n\",\n    \"print(f\\\"Prompt: {prompt!r}, Generated text: {outputs[0].outputs[0].text!r}\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"11fa90db\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"```bash\\n\",\n    \"Prompt: '<|im_start|>system\\\\nYou are a helpful assistant.<|im_end|>\\\\n<|im_start|>user\\\\n<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|>What do you see in these images?<|im_end|>\\\\n<|im_start|>assistant\\\\n<think>\\\\n', Generated text: \\\"So, let's look at both images. First image: there are blue flowers with water droplets, some pink flowers in the background, and they're in a wet, reflective surface, maybe water. There are bokeh lights (those yellow circles) in the background, so it's a shallow depth of field. Second image: a bird with bright red head and chest, blue wings and tail, perched on a branch. The background is green, blurred, so it's a forest or jungle setting. Need to describe each image clearly.\\\\n\\\\nFirst image details: blue flowers (maybe plumeria?), water droplets on petals, some pink flowers, wet surface (water), reflections, bokeh lights (out of focus yellow circles). 
Second image: bird with vibrant colors—red body, blue wings/tail, black beak, perched on a brown branch, green background (blurred foliage). Both images have high detail, vibrant colors, nature themes.\\\\n\\\\nSo, summarize each image's content.\\\\n</think>\\\\n\\\\nIn the first image, I see **vibrant blue flowers** (likely plumeria) with water droplets glistening on their petals. These flowers are partially submerged in a reflective, wet surface (possibly water), creating subtle ripples and reflections. In the background, there are soft, out-of-focus pink flowers and warm, golden bokeh lights (blurred circular highlights), which add a dreamy, atmospheric quality to the scene. The overall mood is serene and ethereal, emphasizing the delicate beauty of the flowers and the moisture around them.  \\\\n\\\\nIn the second image, I observe a **colorful bird** perched on a thick, textured brown branch. The bird has a striking combination of colors: a bright red head and chest, vivid blue wings and tail, and a dark beak. Its feathers appear detailed and glossy, with the blue wings showing intricate patterns. The background is a blurred, lush green (suggesting a forest or jungle environment), which creates a soft, natural backdrop that highlights the bird’s vibrant plumage. The image captures the bird in sharp focus, emphasizing its vivid colors and the texture of its feathers and the branch it rests on.  \\\\n\\\\nBoth images showcase nature’s beauty with high detail, vibrant colors, and a focus on the interplay of light and texture.\\\"\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"417b462b\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Example\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d95a68b0\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"\\n\",\n    \"VLLM_NEURON_FRAMEWORK='neuronx-distributed-inference'\\n\",\n    \"additional_neuron_config=json.dumps(dict(override_neuron_config=dict(text_neuron_config=text_neuron_config, vision_neuron_config=vision_neuron_config)))\\n\",\n    \"limit_mm_per_prompt_json = json.dumps({\\\"image\\\": 20})\\n\",\n    \"\\n\",\n    \"start_server_cmd= f'''vllm serve \\\\\\n\",\n    \"--model=\\\"~/models/Qwen3-VL-8B-Thinking/\\\" \\\\\\n\",\n    \"--tokenizer=\\\"~/models/Qwen3-VL-8B-Thinking/\\\" \\\\\\n\",\n    \"--trust-remote-code \\\\\\n\",\n    \"--tensor-parallel-size=16 \\\\\\n\",\n    \"--max-num-seqs=1 \\\\\\n\",\n    \"--max-model-len=32768 \\\\\\n\",\n    \"--additional-config=\\\\'{additional_neuron_config}\\\\' \\\\\\n\",\n    \"--limit_mm_per_prompt=\\\\'{limit_mm_per_prompt_json}\\\\' \\\\\\n\",\n    \"--no-enable-chunked-prefill \\\\\\n\",\n    \"--no-enable-prefix-caching \\\\\\n\",\n    \"--port=8080\\n\",\n    \"'''\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"os.system(start_server_cmd)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d7c3024a\",\n   \"metadata\": {},\n   \"source\": [\n    \"After deploying the model server, you can run inference by sending it requests. 
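Before sending a request, you can optionally confirm that the server is up by listing the models it serves (a minimal check, assuming the server is running locally on the port 8080 configured above):

```bash
# Query the OpenAI-compatible models endpoint exposed by the vLLM server
curl http://localhost:8080/v1/models
```

If the server is ready, this returns a JSON payload that lists the deployed model. 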
The below example sends a text prompt with two images -\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"f09ec8eb\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from openai import OpenAI\\n\",\n    \"\\n\",\n    \"client = OpenAI(api_key=\\\"EMPTY\\\", base_url=\\\"http://0.0.0.0:8080/v1\\\")\\n\",\n    \"models = client.models.list()\\n\",\n    \"model_name = models.data[0].id\\n\",\n    \"\\n\",\n    \"messages = [\\n\",\n    \"   {\\n\",\n    \"      \\\"role\\\": \\\"user\\\",\\n\",\n    \"      \\\"content\\\": [\\n\",\n    \"            {\\n\",\n    \"               \\\"type\\\": \\\"image_url\\\",\\n\",\n    \"               \\\"image_url\\\": {\\n\",\n    \"                  \\\"url\\\": \\\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg\\\"}\\n\",\n    \"            },\\n\",\n    \"            {\\n\",\n    \"               \\\"type\\\": \\\"text\\\",\\n\",\n    \"               \\\"text\\\": \\\"Describe this image\\\",\\n\",\n    \"            },\\n\",\n    \"      ],\\n\",\n    \"   },\\n\",\n    \"]\\n\",\n    \"\\n\",\n    \"response = client.chat.completions.create(\\n\",\n    \"   model=model_name,\\n\",\n    \"   messages=messages,\\n\",\n    \"   temperature=1.0,\\n\",\n    \"   top_p=1.0,\\n\",\n    \"   stream=False,\\n\",\n    \"   extra_body={\\\"top_k\\\": 1},\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"generated_text = response.choices[0].message.content\\n\",\n    \"print(generated_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"583a3a9f\",\n   \"metadata\": {},\n   \"source\": [\n    \"Below is an example output:\\n\",\n    \"```bash\\n\",\n    \"So, let's describe this image. First, the main subject is a wild cat, probably a Pallas's cat, in a snowy environment. Let's check the details. The cat has thick, fluffy fur that's a mix of brown, gray, and maybe some lighter shades. Its fur is dusted with snow, so it's in a winter setting. The cat is walking on snow, with one paw lifted, so it's in motion. The background has white birch trees with black bark patterns, typical of a snowy forest. There's also a chain-link fence on the left side, which might indicate a controlled environment like a zoo or wildlife reserve. The snow on the ground is fresh, and there are some small twigs or debris visible. The cat's face has distinctive markings, like the white area around the mouth and the striped pattern on its cheeks. The overall scene is cold, with the snow and the cat's thick fur suggesting it's adapted to cold climates. Let's structure the description: start with the main subject, then details about the cat's appearance, the environment, and the setting.\\n\",\n    \"</think>\\n\",\n    \"\\n\",\n    \"The image depicts a **Pallas's cat** (a wild feline species native to Central Asia) walking through a snowy landscape. The cat’s thick, fluffy fur is a mix of brown, gray, and cream tones, dusted with snowflakes, emphasizing its adaptation to cold climates. Its face features distinctive markings: a white patch around the mouth, dark stripes on the cheeks, and a short, rounded muzzle. The cat is captured mid-stride, with one paw lifted, conveying movement across the snow-covered ground.  \\n\",\n    \"\\n\",\n    \"In the background, **white-barked birch trees** with dark, irregular bark patterns create a stark, wintry forest scene. 
To the left, a **chain-link fence** suggests the setting may be a controlled environment like a zoo or wildlife reserve. The snow on the ground is fresh and undisturbed except for the cat’s path, with small twigs and debris scattered nearby. The overall atmosphere is serene and cold, highlighting the cat’s natural camouflage and resilience in a snowy habitat.\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"6bd1f1bc\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"Congratulations ! You now know how to deploy `Qwen/Qwen3-VL-8B-Thinking` on a `trn2.48xlarge` instance. Modify the configurations and deploy the model as per your requirements and use case.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"language_info\": {\n   \"name\": \"python\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/sd-inference-tutorial.rst",
    "content": ".. _nxdi-sd-inference-tutorial:\n\nTutorial: Using Speculative Decoding (SD) to improve inference performance on Trn2 instances\n============================================================================================\n\nNeuronX Distributed Inference (NxDI) allows you to deploy large language models on\nTrn2 or Trn1 instances. This tutorial provides a step-by-step guide to deploy a Qwen3-32B model\non a Trn2 instance using two configurations: one without speculative decoding and one\nwith Qwen3-0.6B as the draft model for speculative decoding. We use LLMPerf to measure and compare\nperformance between the two configurations. While this tutorial uses Qwen models for\ndemonstration, the approach is model-agnostic and can be applied to other supported models\n(see :ref:`nxdi-model-reference`).\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nEnvironment setup\n-----------------\nThis tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\nConnect to the EC2 instance via your preferred option: EC2 Instance Connect, Session Manager, or SSH client.\nFor more information, see `Connect to your Linux instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-methods.html>`_ in the Amazon EC2 User Guide.\n\nStart a built-in vLLM Neuron Deep Learning Container (DLC). For more information about available containers,\nsee the `AWS Neuron Deep Learning Containers repository <https://github.com/aws-neuron/deep-learning-containers#vllm-inference-neuronx>`_.\n\nFor example, we use the following:\n\n::\n\n    docker run -d -it --privileged -v /home/ubuntu/:/home/ubuntu/ public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04\n\nScenario 1: Run without Speculative Decoding\n---------------------------------------------\n\nStep 1: Set environment variables\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nPopulate the following environment variables:\n\n::\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=\"/home/ubuntu/Qwen-32B-BS1-SL6k-TP64\"\n    export MODEL_ID=\"Qwen/Qwen3-32B\"\n\nNxDI will persist the compiled model artifacts on the EC2 instance in ``NEURON_COMPILED_ARTIFACTS`` so you can rerun the model without recompiling it. If you need to recompile it, empty the folder.\n\nStep 2: Start the vLLM server\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nInvoke the model:\n\n::\n\n    VLLM_USE_V1=0 vllm serve $MODEL_ID \\\n        --tensor-parallel-size 64 \\\n        --max-num-seqs 1  \\\n        --max-model-len 6400 \\\n        --override-neuron-config '{\"save_sharded_checkpoint\": true}'  \n\nWe use ``tensor-parallel-size 64`` assuming the default Logical NeuronCore (LNC) configuration.\nFor more information about LNC, see `Trainium2 Architecture <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium2.html>`_.\n\nWe also use ``max-num-seqs 1`` as a baseline. Feel free to adjust this value to your needs. 
We will use the same value for both scenarios.\n\nFinally, we use ``save_sharded_checkpoint: true`` to speed up model loading after compilation.\nFor more information, see the :ref:`NeuronX Distributed Save/Load Developer Guide <neuronx_distributed_save_load_developer_guide>`.\n\nAfter the model compiles, you will see the following output:\n\n::\n\n    INFO:     Started server process [7]\n    INFO:     Waiting for application startup.\n    INFO:     Application startup complete.\n\nThis indicates the server is ready and the model endpoint is available for inference.\n\nStep 3: Test the endpoint\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nYou can test the endpoint using curl or any HTTP client:\n\n::\n\n    curl http://localhost:8000/v1/completions \\\n        -H \"Content-Type: application/json\" \\\n        -d '{\n            \"model\": \"Qwen/Qwen3-32B\",\n            \"prompt\": \"What is machine learning?\",\n            \"max_tokens\": 100,\n            \"temperature\": 0.7\n        }'\n\nStep 4: Load the model and measure performance with LLMPerf\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nLogin to the docker container that runs the model (``docker exec -it ...``) and install llmperf:\n\n::\n\n    cd /opt\n    git clone https://github.com/ray-project/llmperf.git\n    cd llmperf\n    pip install -e .\n\n    export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n    export OPENAI_API_KEY=dummy\n\n    python token_benchmark_ray.py \\\n        --model \"Qwen/Qwen3-32B\" \\\n        --mean-input-tokens 128 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 512 \\\n        --stddev-output-tokens 0 \\\n        --max-num-completed-requests 10 \\\n        --timeout 1200 \\\n        --num-concurrent-requests 1 \\\n        --results-dir /tmp/results \\\n        --llm-api openai \\\n        --additional-sampling-params '{}'\n\nWe used ``mean-output-tokens 512`` as a baseline example of an output token length to demonstrate SD performance. 
Shorter values in our case here did not show significant benefits.\n\nLog the results (we kept p99 for brevity):\n\n::\n\n    ttft_s 0.04828366368776187\n    end_to_end_latency_s 6.044886132028841\n    request_output_throughput_token_per_s 102.27375153804246\n    number_input_tokens 128.0\n    number_output_tokens 558.0\n\n\nScenario 2: Run with Speculative Decoding\n------------------------------------------\n\nStep 1: Set environment variables\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nFor speculative decoding, we need to specify both the target model and the draft model:\n\n::\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=\"/home/ubuntu/Qwen-32B-BS1-SL6k-TP64-SD\"\n    export MODEL_ID=\"Qwen/Qwen3-32B\"\n    export DRAFT_MODEL_ID=\"Qwen/Qwen3-0.6B\"\n\nStep 2: Start the vLLM server with speculative decoding\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nInvoke the model with speculative decoding enabled:\n\n::\n\n    VLLM_USE_V1=0 vllm serve $MODEL_ID \\\n        --tensor-parallel-size 64 \\\n        --max-num-seqs 1 \\\n        --max-model-len 6400 \\\n        --override-neuron-config '{\"save_sharded_checkpoint\": true, \"enable_fused_speculation\": true}' \\\n        --speculative-config '{\"model\": \"'\"$DRAFT_MODEL_ID\"'\", \"num_speculative_tokens\": 7, \"max_model_len\": 2048, \"method\": \"eagle\"}'\n\nThe key differences from the baseline configuration are:\n\n- ``--speculative-config``: Specifies the draft model configuration including:\n  \n  - ``model``: The draft model path (Qwen3-0.6B)\n  - ``num_speculative_tokens``: Number of tokens to speculatively generate (7 in this example)\n  - ``max_model_len``: Maximum sequence length for the draft model (2048)\n  - ``method``: Speculative decoding method (eagle)\n\n- ``enable_fused_speculation``: Enables fused speculation in the Neuron config for improved performance by combining draft model execution with verification\n\nAfter the model compiles, you will see the same startup messages indicating the server is ready.\n\nStep 3: Test the endpoint\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nTest the endpoint the same way as in Scenario 1:\n\n::\n\n    curl http://localhost:8000/v1/completions \\\n        -H \"Content-Type: application/json\" \\\n        -d '{\n            \"model\": \"Qwen/Qwen3-32B\",\n            \"prompt\": \"What is machine learning?\",\n            \"max_tokens\": 100,\n            \"temperature\": 0.7\n        }'\n\nStep 4: Load the model and measure performance with LLMPerf\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nLogin to the docker container that runs the model (``docker exec -it ...``) and follow Step 4 from the non-SD experiment.\nRun the load test with the same configuration.\n\nLog the results (we kept p99 for brevity):\n\n::\n\n    ttft_s 0.04737630250383518\n    end_to_end_latency_s 5.6368158639998\n    request_output_throughput_token_per_s 137.84216889131872\n    number_input_tokens 128.0\n    number_output_tokens 565.37\n\n\nPerformance Comparison\n----------------------\n\nThe table below summarizes the key performance metrics (p99 values) from both configurations:\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 40 30 30\n\n   * - Metric\n     - Without SD\n     - With SD\n   * - Time to First Token (TTFT)\n     - 48.3 ms\n     - 47.4 ms\n   * - End-to-End Latency\n     - 6.04 s\n     - 5.64 s\n   * - Request Output Throughput\n     - 102.3 tokens/s\n     - 137.8 tokens/s\n   * - Number of Input Tokens\n     - 128\n     - 128\n   * - Number of Output Tokens\n     - 558\n     - 565\n\nKey observations:\n\n- **Throughput improvement**: Speculative decoding achieves 35% higher throughput (137.8 vs 102.3 tokens/s)\n- **Latency reduction**: End-to-end latency is reduced by 7% (5.64s vs 6.04s)\n- **TTFT**: Time to first token remains comparable between both configurations\n\nConclusion\n----------\nFor Qwen3-32B with Qwen3-0.6B as the draft model on Trn2, speculative decoding delivers\n35% higher throughput and 7% lower end-to-end latency at 512 output tokens. Performance\ngains vary based on model pairing, output length, and workload characteristics. Use this\nbenchmarking approach to validate the optimal configuration for your use case. \n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn1-llama3.1-70b-instruct-accuracy-eval-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Evaluating Accuracy of Llama-3.1-70B on Neuron using open source datasets\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide to measure the accuracy of Llama3.1 70B on Trn1 with evaluation on two distinct tasks: mathematical reasoning and logical analysis.\\n\",\n    \"\\n\",\n    \"For this tutorial we use two datasets available in lm-eval, namely `gsm8k_cot`(high school math questions) and `mmlu_flan_n_shot_generative_logical_fallacies` (multiple choice questions on the subject) to demonstrate accuracy evaluation on Trn1. The metrics in these task are two variants of [ExactMatch](https://huggingface.co/spaces/evaluate-metric/exact_match) metrics called StrictMatch and FlexibleExtract which differ in how strict they are in extracting the final answer from the generated output from the model. To see the exact task definition used in lm-eval please look at [gsm8k-cot](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/gsm8k/gsm8k-cot.yaml) and [mmlu template](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/flan_n_shot/generative/_mmlu_flan_generative_template_yaml).\\n\",\n    \"\\n\",\n    \"We also need the instruction-tuned version of llama-3.1 70b [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) available hugging face.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 4\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Task Overview\\n\",\n    \"\\n\",\n    \"### 1. GSM8K with Chain-of-Thought (gsm8k_cot)\\n\",\n    \"\\n\",\n    \"The GSM8K dataset focuses on grade school math word problems, testing LLMs’ mathematical reasoning capabilities. Using Chain-of-Thought (CoT) prompting, we evaluate models’ ability to:\\n\",\n    \"\\n\",\n    \"- Solve complex math word problems\\n\",\n    \"\\n\",\n    \"- Show step-by-step reasoning\\n\",\n    \"\\n\",\n    \"- Arrive at accurate numerical answers\\n\",\n    \"\\n\",\n    \"### 2. MMLU Logical Fallacies (mmlu_flan_n_shot_generative_logical_fallacies)\\n\",\n    \"\\n\",\n    \"This evaluation focuses on the model’s ability to identify and explain logical fallacies, a subset of the MMLU benchmark. The task tests:\\n\",\n    \"\\n\",\n    \"- Understanding of common logical fallacies\\n\",\n    \"\\n\",\n    \"- Ability to analyze arguments\\n\",\n    \"\\n\",\n    \"- Explanation of reasoning flaws\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Environment Setup Guide\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Prerequisites\\n\",\n    \"\\n\",\n    \"This tutorial requires that you have a Trn1 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed. 
Also we depend on our fork of vLLM as described in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html#nxdi-vllm-user-guide).\\n\",\n    \"\\n\",\n    \"To use Jupyter Notebook on the Neuron instance, you can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"Before running evaluations, ensure your environment is properly configured by following these essential setup guides:\\n\",\n    \"\\n\",\n    \"1. [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html)\\n\",\n    \"\\n\",\n    \"    - Configure AWS Neuron environment\\n\",\n    \"\\n\",\n    \"    - Set up required dependencies\\n\",\n    \"\\n\",\n    \"    - Verify system requirements\\n\",\n    \"\\n\",\n    \"2. [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html)\\n\",\n    \"\\n\",\n    \"    - Setup vLLM according to the guide\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"### Installing dependencies\\n\",\n    \"\\n\",\n    \"Copy the [inference-benchmarking](https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/) directory to some location on your instance. Change directory to the your copy of [inference-benchmarking](https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/). Install other required dependencies in the same python env (e.g aws_neuron_venv_pytorch if you followed [manual install NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#id3) ) by:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"git clone --depth 1 https://github.com/aws-neuron/aws-neuron-samples.git\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"pip install -r requirements.txt\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Download llama-3.1 70B\\n\",\n    \"To use this sample, you must first download [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) model checkpoint from Hugging Face and store it locally. We are saving the model checkpoints at ``/home/ubuntu/models/Llama-3.1-70B-Instruct/`` on the Trn1 instance. For more information, see [Downloading models](https://huggingface.co/docs/hub/en/models-downloading) in the Hugging Face documentation.\\n\",\n    \"\\n\",\n    \"## Running Evaluations\\n\",\n    \"There are two methods that you can use [the evaluation scripts](https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking/) to run your evaluation.\\n\",\n    \"\\n\",\n    \"1. Using a yaml configuration file and `accuracy.py` script\\n\",\n    \"\\n\",\n    \"2. 
writing your own python script that uses several components provided in `accuracy.py` and `server_config.py`\\n\",\n    \"\\n\",\n    \"We demonstrate each use case separately here.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 1. Running eval with yaml config file\\n\",\n    \"In this method all you need is to create a yaml config file that specifies the server configuration and testing scenario you want to run. Create `config.yaml` with the following content.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile config.yaml\\n\",\n    \"server:\\n\",\n    \"  name: \\\"Llama-3.1-70B-Instruct\\\"\\n\",\n    \"  model_path: \\\"/home/ubuntu/model_hf/llama-3.1-70B-Instruct-hf\\\"\\n\",\n    \"  model_s3_path: null\\n\",\n    \"  compiled_model_path: \\\"/home/ubuntu/traced_model_hf/llama-3.1-70B-Instruct-hf\\\"\\n\",\n    \"  max_seq_len: 16384\\n\",\n    \"  context_encoding_len: 16384\\n\",\n    \"  tp_degree: 32\\n\",\n    \"  n_vllm_threads: 32\\n\",\n    \"  server_port: 8000\\n\",\n    \"  continuous_batch_size: 1\\n\",\n    \"\\n\",\n    \"test:\\n\",\n    \"  accuracy:\\n\",\n    \"    mytest:\\n\",\n    \"      client: \\\"lm_eval\\\"\\n\",\n    \"      datasets: [\\\"gsm8k_cot\\\", \\\"mmlu_flan_n_shot_generative_logical_fallacies\\\"]\\n\",\n    \"      max_concurrent_requests: 1\\n\",\n    \"      timeout: 3600\\n\",\n    \"      client_params:\\n\",\n    \"        limit: 200\\n\",\n    \"        use_chat: True\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"For tasks that require higher sequence length you need to adjust `max_seq_len`. For the tasks in this tutorial 16384 would suffice.\\n\",\n    \"\\n\",\n    \"Run `python accuracy.py --config config.yaml`\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"python accuracy.py --config config.yaml 2>&1 | tee accuracy_evaluation.log\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. Running eval through your own python code\\n\",\n    \"You might be interested in running the evaluation in you python code. For instance if you want to change the configuration programatically or post-process the results. This is possible using 3 main components provided in `accuracy.py` and `server_config.py`.\\n\",\n    \"\\n\",\n    \"1. Server Configuration: Using ServerConfig to define the vLLM server settings\\n\",\n    \"\\n\",\n    \"2. Accuracy Scenario: Using AccuracyScenario to specify evaluation parameters\\n\",\n    \"\\n\",\n    \"3. Test Execution: Running the evaluation with the configured settings\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"First, import the necessary components:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from aws_neuron_eval.accuracy import AccuracyScenario, run_accuracy_test\\n\",\n    \"from aws_neuron_eval.server_config import ServerConfig\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 1. Configure the Server\\n\",\n    \"\\n\",\n    \"Set up your server configuration with ServerConfig. 
This example uses Llama 3.1-70b Instruct:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Configure the server settings\\n\",\n    \"name = \\\"Llama-3.1-70B-Instruct\\\"\\n\",\n    \"\\n\",\n    \"server_config = ServerConfig(\\n\",\n    \"    name=name,\\n\",\n    \"    model_path=f\\\"/home/ubuntu/model_hf/llama-3.1-70B-Instruct-hf\\\",  # Local model path\\n\",\n    \"    model_s3_path=None,                         # S3 model path (not used)\\n\",\n    \"    compiled_model_path=f\\\"/home/ubuntu/traced_model_hf/llama-3.1-70B-Instruct-hf\\\",  # Compiled model path\\n\",\n    \"    max_seq_len=16384,                          # Maximum sequence length\\n\",\n    \"    context_encoding_len=16384,                 # Context window size\\n\",\n    \"    tp_degree=32,                               # Tensor parallel degree for Trn1\\n\",\n    \"    n_vllm_threads=32,                          # Number of vLLM threads\\n\",\n    \"    server_port=8000,                           # Server port\\n\",\n    \"    continuous_batch_size=1,                    # Batch size for continuous batching\\n\",\n    \")\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 2. Define Evaluation Scenarios\\n\",\n    \"\\n\",\n    \"Create an AccuracyScenario to specify your evaluation parameters:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"scenario = AccuracyScenario(\\n\",\n    \"    client=\\\"lm_eval\\\",              # Evaluation client\\n\",\n    \"    datasets=[                     # Target datasets\\n\",\n    \"        \\\"gsm8k_cot\\\",\\n\",\n    \"        \\\"mmlu_flan_n_shot_generative_logical_fallacies\\\",\\n\",\n    \"    ],\\n\",\n    \"    max_concurrent_requests=1,     # Maximum concurrent requests\\n\",\n    \"    timeout=5000,                  # Timeout in seconds - changed to 5000 from 3600\\n\",\n    \"    client_params={\\\"limit\\\": 200}   # Client-specific parameters\\n\",\n    \")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### 3. Run the Evaluation\\n\",\n    \"\\n\",\n    \"Execute the evaluation using run_accuracy_test:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Run the test with a named scenario\\n\",\n    \"results_collection = run_accuracy_test(\\n\",\n    \"    server_config=server_config,\\n\",\n    \"    named_scenarios={\\\"mytest\\\": scenario}\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Display results\\n\",\n    \"print(results_collection)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This code will execute the evaluation on the specified datasets and return detailed performance metrics. 
The results include accuracy scores and other relevant metrics for each dataset.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"aws_neuron_venv\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial.rst",
    "content": ".. _nxdi-trn2-llama3.1-405b-speculative-tutorial:\n\nTutorial: Using Speculative Decoding and Quantization to improve Llama-3.1-405B inference performance on Trn2 instances\n=======================================================================================================================\n\nNeuronX Distributed (NxD) Inference allows you to deploy Llama3.1 405B on\na single Trn2 instance. This tutorial will show you how to optimize inference performance for Llama3.1 405B on a Trn2 instance\nwith speculative decoding and quantization. We will compile and load the model into a VLLM server and measure performance using LLMPerf.\nThis tutorial consists of two parts. In the first part, we will collect performance metrics for our base configuration with ``bf16`` model weights. In the second part, we will optimize inference performance with ``fp8`` quantized weights and speculative decoding. \nThe performance is then compared with the results from part 1.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nPrerequisites\n-----------------------------------------------\n\n\nSet up and connect to a Trn2.48xlarge instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs a prerequisite, this tutorial requires that you have a Trn2 instance\ncreated from a Deep Learning AMI that has the Neuron SDK pre-installed.\n\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\nAfter setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you\nchose when you launched the instance.\n\nAfter you are connected, activate the Python virtual environment that\nincludes the Neuron SDK.\n\n::\n\n   source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate\n\nRun ``pip list`` to verify that the Neuron SDK is installed.\n\n::\n\n   pip list | grep neuron\n\nYou should see Neuron packages including\n``neuronx-distributed-inference`` and ``neuronx-cc``.\n\nInstall packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNxD Inference supports running models with vLLM via the upstream ``vllm-neuron``\nplugin. Install the latest release branch by following the steps in the\n:ref:`vLLM User Guide for NxD Inference<nxdi-vllm-user-guide-v1>`.\n\nIn this tutorial, you will use `llmperf <https://github.com/ray-project/llmperf>`_ to measure the inference performance of the base Llama-3.1-405b-Instruct configuration and the more\noptimized configuration. \nWe will use the `load test <https://github.com/ray-project/llmperf?tab=readme-ov-file#load-test>`_ feature of LLMPerf and measure the performance for accepting\n10,000 tokens as input and generating 1500 tokens as output.\nInstall llmperf into the virtual environment.\n\n::\n\n    git clone https://github.com/ray-project/llmperf.git\n    cd llmperf\n    pip install -e . \n\n\nDownload models\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo run inference in the first part of the tutorial, you need to download the Llama-3.1-405b-Instruct model checkpoint with ``bf16`` weights from Hugging Face (`meta-llama/Llama-3.1-405B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct>`__). \nFor the second part of the tutorial, you will run a more optimized inference configuration. 
For this part, you need to download an fp8-quantized Llama3.1-405B-FP8 model checkpoint (`meta-llama/Llama-3.1-405B-Instruct-FP8 <https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct-FP8>`__).\nWith Speculative Decoding, you will also need to specify a draft model. You can download and use the model checkpoint from `meta-llama/Llama-3.2-1B-Instruct <https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct>`__.\nFor more information, see\n`Downloading models <https://huggingface.co/docs/hub/en/models-downloading>`__\nin the Hugging Face documentation. \n\nScenario 1: Run Llama-3.1-405b inference with base configuration using ``bf16`` weights\n-----------------------------------------------------------------------------------------\n\nStep 1: Compile the model and run generate\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWe will first compile and run generation on a sample prompt using a command\ninstalled by ``neuronx-distributed-inference``. Save the contents of the below script to your favorite \nshell script file, for example, ``compile_model.sh`` and then run it.\n\nNote that we are using the following features as described in\nthe tutorial for running 405B model :ref:`nxdi-trn2-llama3.1-405b-tutorial`\n\n* Logical NeuronCore Configuration (LNC)\n* Tensor parallelism (TP) on Trn2\n* Optimized Kernels\n\nThe script compiles the model and runs generation on the given input prompt. Please refer to :ref:`nxd-inference-api-guide` for more information on these ``inference_demo`` flags.\nNote the path we used to save the compiled model. This path should be used\nwhen launching vLLM server for inference so that the compiled model can be loaded without recompilation.\n\n.. note::\n\n    Known issue: Using kernels with bucket length of 1024 or less may lead to ``Numerical Error`` in inference.\n\n    ::\n\n        RuntimeError: Failed to execute the model status=1003 message=Numerical Error\n\n::\n\n    # Replace this with the path where you downloaded and saved the model files.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct/\"\n    # This is where the compiled model will be saved. 
The same path\n    # should be used when launching vLLM server for inference.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.1-405B-Instruct/\"\n\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n\n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n            run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 12288 \\\n            --seq-len 12800 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --pad-token-id 2 \\\n            --enable-bucketing \\\n            --context-encoding-buckets 2048 4096 10240 12288 \\\n            --token-generation-buckets 12800 \\\n            --prompt \"What is annapurna labs?\" 2>&1 | tee log\n\n\nThe above script will compile a Neuron model for this base-case configuration, and also run generate on the example prompt specified with the ``--prompt`` flag. \nYou can change this prompt to your prompt of choice. \nThe script's output will be written into ``log``, a log file in the working directory. \n\nIn addition, in the subsequent runs of this script, you can add a ``--skip-compile`` flag to skip \nthe compiling step since the model is already compiled in the first run of the script. \nThis will allow you to test the model with different prompts. \n\nStep 2: Start the vLLM server with the compiled Neuron model\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAfter compiling the model, you can run the model using vLLM. Save the contents of the below script to another\nshell script file, for example, ``start_vllm.sh`` and then run it.\n\n::\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct/\"\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.1-405B-Instruct/\"\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH  # Re-use the compiled artifacts\n    VLLM_RPC_TIMEOUT=100000 vllm serve \\\n        --model \"$MODEL_PATH\" \\\n        --max-num-seqs 1 \\\n        --max-model-len 12800 \\\n        --tensor-parallel-size 64 \\\n        --no-enable-prefix-caching \\\n        --port 8000 > llama405b_bf16.log 2>&1 & PID=$!\n    echo \"vLLM server started with PID $PID\"\n\nStep 3: Measure performance using LLMPerf\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nAfter the above steps, the vLLM server should be running. Before we can use the ``llmperf`` package, we need to make a few changes to its code. \nFollow :ref:`benchmarking with LLMPerf guide <llm_perf_patch_changes>` to apply the code changes. \n    \nWe can now measure the performance using ``llmperf``. Below is a sample shell script to run ``llmperf``. 
More information about several arguments used in the script can be found in the \n`llmperf open source code <https://github.com/ray-project/llmperf/blob/main/token_benchmark_ray.py>`_ .\n\n::\n\n    # This should be the same path to which the model was downloaded (also used in the above steps).\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct\"\n    # This is the name of directory where the test results will be saved.\n    OUTPUT_PATH=llmperf-results-sonnets\n\n    export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n    export OPENAI_API_KEY=\"mock_key\"\n\n    python token_benchmark_ray.py \\\n        --model $MODEL_PATH \\\n        --mean-input-tokens 10000 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 1500 \\\n        --stddev-output-tokens 0 \\\n        --num-concurrent-requests 1\\\n        --timeout 3600 \\\n        --max-num-completed-requests 50 \\\n        --additional-sampling-params '{}' \\\n        --results-dir $OUTPUT_PATH \\\n        --llm-api \"openai\"\n\n\nThe output for this llama-3.1-405B model run for the base case is shown below. Please note that the numbers can slightly vary between runs but should be in the same order of magnitude.\n::\n    \n    Results for token benchmark for /home/ubuntu/models/llama-3.1-405b queried with the openai api.\n\n    inter_token_latency_s\n        p25 = 0.03783673520494379\n        p50 = 0.037929154633788834\n        p75 = 0.03799374728198055\n        p90 = 0.03806084386428147\n        p95 = 0.03818095359194858\n        p99 = 0.03862880035825585\n        mean = 0.03790912092492011\n        min = 0.03711292916794487\n        max = 0.03867580939426865\n        stddev = 0.0002364662521116205\n    ttft_s\n        p25 = 2.437347081664484\n        p50 = 2.441959390998818\n        p75 = 2.4439403364085592\n        p90 = 2.444729209714569\n        p95 = 2.445114637189545\n        p99 = 79.22927707570342\n        mean = 5.451600373298861\n        min = 2.427013176959008\n        max = 153.00210832804441\n        stddev = 21.29264628138615\n    end_to_end_latency_s\n        p25 = 70.06310007086722\n        p50 = 70.09642704750877\n        p75 = 70.1557097924524\n        p90 = 70.28295350184199\n        p95 = 70.56055794338462\n        p99 = 148.28325726192182\n        mean = 73.19207735829521\n        min = 70.00512732309289\n        max = 222.50397142698057\n        stddev = 21.54750467688136\n    request_output_throughput_token_per_s\n        p25 = 25.417755028050912\n        p50 = 25.463487985775544\n        p75 = 25.522234144656743\n        p90 = 25.6487981126861\n        p95 = 25.729858763245502\n        p99 = 25.90146713883131\n        mean = 25.13808905954906\n        min = 8.080754642125802\n        max = 26.021214285642255\n        stddev = 2.465472136291901\n    number_input_tokens\n        p25 = 10000.0\n        p50 = 10000.0\n        p75 = 10000.0\n        p90 = 10000.0\n        p95 = 10000.0\n        p99 = 10000.0\n        mean = 10000.0\n        min = 10000\n        max = 10000\n        stddev = 0.0\n    number_output_tokens\n        p25 = 1783.0\n        p50 = 1785.0\n        p75 = 1789.75\n        p90 = 1798.1\n        p95 = 1803.55\n        p99 = 1816.67\n        mean = 1787.92\n        min = 1779\n        max = 1825\n        stddev = 8.54720386310933\n    Number Of Errored Requests: 0\n    Overall Output Throughput: 24.421011092151268\n    Number Of Completed Requests: 50\n    Completed Requests Per Minute: 0.8195336846889548\n\n\n\nScenario 2: Run Llama-3.1-405b inference with fp8 weights 
and fused speculation (with draft model)\n--------------------------------------------------------------------------------------------------\n\nStep 1: Rescale the model weights to use Neuron FP8 format and save the modules to not convert file in model path\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nSince the Neuron device only supports the ``FP8_EXP4 (IEEE-754)`` data type, and the Hugging Face FP8 checkpoint for Llama-3.1-405B is in a different FP8 format (``OCP FP8 E4M3/e4m3fn``) with a different numerical range, we need to rescale the public model weights. \nFollow this guide to rescale the FP8 model weights from Hugging Face: `link <https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/quantization/README_rescaling_fp8_for_neuron.md>`__.\n\nRunning a quantized model requires a modules-to-not-convert JSON file that explicitly lists the layers of the model that are not quantized. For this tutorial, we can use the following file.\n\nDownload: :download:`modules_to_not_convert.json <modules_to_not_convert.json>`\n\nNext we will compile and run the model and record performance metrics.\n\nStep 2: Compile the model and run generate\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWe will first compile and run generation on a sample prompt using a command\ninstalled by ``neuronx-distributed-inference``. Save the contents of the below script to your favorite \nshell script file, for example, ``compile_model.sh`` and then run it.\n\nNote that we are using the following features as described in\nthe tutorial for running the 405B model, :ref:`nxdi-trn2-llama3.1-405b-tutorial`:\n\n* Logical NeuronCore Configuration (LNC)\n* Tensor parallelism (TP) on Trn2\n* Optimized Kernels\n\nThe compilation script is similar to the one in part 1. \nNote that we have added the path for the draft model.\n\n\n.. note::\n\n    Known issue: Using kernels with bucket length of 1024 or less may lead to ``Numerical Error`` in inference.\n\n    ::\n\n        RuntimeError: Failed to execute the model status=1003 message=Numerical Error\n\n\n::\n    \n    # Replace this with the path where you downloaded and saved the model files.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct-FP8-rescaled/\"\n    # Replace this with the path where you downloaded and saved the draft model files.\n    DRAFT_MODEL_PATH=\"/home/ubuntu/models/Llama-3.2-1b-instruct/\"    \n    # This is where the compiled model (.pt file) and sharded checkpoints will be saved. The same path\n    # should be used when launching vLLM server for inference.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.1-405B-Instruct/\"\n    # Add a modules-to-not-convert JSON file to the model path to specify non-quantized modules.\n    MTNC_FILE_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct-FP8-rescaled/modules_to_not_convert.json\"
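\n\n    # With LNC=2, the 128 physical NeuronCores of the Trn2 instance are exposed as 64 logical\n    # NeuronCores, so the tensor parallel degree below is set to NUM_CORES / LNC = 64.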
\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n    export XLA_HANDLE_SPECIAL_SCALAR=1\n    export UNSAFE_FP8FNCAST=1\n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n        run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 12288 \\\n            --seq-len 12800 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --draft-model-path $DRAFT_MODEL_PATH \\\n            --enable-fused-speculation \\\n            --speculation-length 7 \\\n            --pad-token-id 2 \\\n            --quantized-mlp-kernel-enabled \\\n            --quantization-type per_channel_symmetric \\\n            --rmsnorm-quantize-kernel-enabled \\\n            --enable-bucketing \\\n            --prompt \"What is annapurna labs?\" \\\n            --modules-to-not-convert-file $MTNC_FILE_PATH \\\n            --context-encoding-buckets 2048 4096 10240 12288 \\\n            --token-generation-buckets 12800 2>&1 | tee compile_and_generate_log\n\n\nThe above script will compile a Neuron model with fused speculation, and also run generate on the example prompt specified with the ``--prompt`` flag. Please refer to :ref:`nxd-inference-api-guide` for more information on these ``inference_demo`` flags.\n\nYou can change this prompt to your prompt of choice. \nThe script's output will be written into ``compile_and_generate_log``, a log file in the working directory. \n\nIn this script, we also turn on some additional environment variables, ``XLA_HANDLE_SPECIAL_SCALAR`` and ``UNSAFE_FP8FNCAST``, to enable the Neuron compiler to treat rescaled ``FP8FN`` weights as\n``FP8_EXP4`` weights.\n\nIn addition, in the subsequent runs of this script, you can add a ``--skip-compile`` flag to skip \nthe compiling step since the model is already compiled in the first run of the script. \nThis will allow you to test the model with different prompts. \n\n\n\nStep 3: Start the vLLM server with the compiled Neuron model\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAfter compiling the model, you can run the model using vLLM. 
Save the contents of the below script to another\nshell script file, for example, ``start_vllm.sh`` and then run it.\n\n::\n\n    export NEURON_RT_INSPECT_ENABLE=0\n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n    export XLA_HANDLE_SPECIAL_SCALAR=1\n    export UNSAFE_FP8FNCAST=1\n\n\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct-FP8-rescaled\"\n    DRAFT_MODEL_PATH=\"/home/ubuntu/models/Llama-3.2-1b-instruct\"\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_models/Llama-3.1-405B-Instruct_fp8\"\n\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\n    VLLM_RPC_TIMEOUT=100000 vllm serve \\\n        --model $MODEL_PATH \\\n        --max-num-seqs 1 \\\n        --max-model-len 12800 \\\n        --tensor-parallel-size 64 \\\n        --device neuron \\\n        --speculative-max-model-len 12800 \\\n        --speculative-model $DRAFT_MODEL_PATH \\\n        --num-speculative-tokens 7 \\\n        --use-v2-block-manager \\\n        --override-neuron-config \"{\\\"enable_fused_speculation\\\":true, \\\"quantized_mlp_kernel_enabled\\\":true, \\\"quantization_type\\\":\\\"per_channel_symmetric\\\", \\\"skip_warmup\\\": true}\" \\\n        --port 8000 & PID=$!\n    echo \"vLLM server started with PID $PID\"\n\nStep 4: Measure performance using LLMPerf\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nAfter the above steps, the vLLM server should be running. Before we can use the ``llmperf`` package, we need to make a few changes to its code. \nFollow :ref:`benchmarking with LLMPerf guide <llm_perf_patch_changes>` to apply the code changes.\n    \nWe can now measure the performance using ``llmperf``. Run the following script with the modified ``llmperf`` package.\n\n::\n\n    # This should be the same path to which the model was downloaded (also used in the above steps).\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.1-405B-Instruct-FP8-rescaled\"\n    # This is the name of the directory where the test results will be saved.\n    OUTPUT_PATH=llmperf-results-sonnets\n\n    export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n    export OPENAI_API_KEY=\"mock_key\"\n\n    python token_benchmark_ray.py \\\n        --model $MODEL_PATH \\\n        --mean-input-tokens 10000 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 1500 \\\n        --stddev-output-tokens 0 \\\n        --num-concurrent-requests 1 \\\n        --timeout 3600 \\\n        --max-num-completed-requests 50 \\\n        --additional-sampling-params '{}' \\\n        --results-dir $OUTPUT_PATH \\\n        --llm-api \"openai\"\n\n\nThe output for this Llama-3.1-405B model run with fused speculation is shown below. Please note that the numbers can slightly vary between runs but should be in the same order of magnitude. 
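\n\nIf you want to post-process these metrics programmatically, note that ``token_benchmark_ray.py`` also writes its results as JSON files into the directory passed as ``--results-dir``. A minimal sketch for loading them is shown here (the exact file names depend on the llmperf version you cloned, so adjust the pattern if needed)::\n\n    import glob\n    import json\n\n    # Directory that was passed to token_benchmark_ray.py as --results-dir\n    results_dir = \"llmperf-results-sonnets\"\n\n    # Load every JSON results file that llmperf wrote and print its contents.\n    for path in sorted(glob.glob(f\"{results_dir}/*.json\")):\n        with open(path) as f:\n            data = json.load(f)\n        print(f\"==== {path} ====\")\n        print(json.dumps(data, indent=2))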
\n\n::\n\n    Results for token benchmark for /home/ubuntu/models/Llama-3.1-405B-Instruct-FP8-rescaled queried with the openai api.\n\n    inter_token_latency_s\n        p25 = 0.008220573497974934\n        p50 = 0.008265312568750231\n        p75 = 0.008438719224417583\n        p90 = 0.00848199803312309\n        p95 = 0.008495625438929224\n        p99 = 0.011143428944987235\n        mean = 0.008419798457414533\n        min = 0.008173695931987216\n        max = 0.01364151847269386\n        stddev = 0.0007612118573477839\n    ttft_s\n        p25 = 2.2543624382815324\n        p50 = 2.254961202503182\n        p75 = 2.2576071268413216\n        p90 = 2.2596270388457924\n        p95 = 2.260639927221928\n        p99 = 2.2628143909573555\n        mean = 2.256157155628316\n        min = 2.2534945809748024\n        max = 2.2629711360204965\n        stddev = 0.0023667267664955545\n    end_to_end_latency_s\n        p25 = 14.586015026085079\n        p50 = 14.65608573507052\n        p75 = 14.91364526405232\n        p90 = 14.977840351965279\n        p95 = 15.000083449739032\n        p99 = 18.969864878777866\n        mean = 14.886235136194154\n        min = 14.520539953839034\n        max = 22.716861865017563\n        stddev = 1.1415236552464672\n    request_output_throughput_token_per_s\n        p25 = 100.64608830743339\n        p50 = 102.4148205461138\n        p75 = 102.90679421801005\n        p90 = 103.02201242683091\n        p95 = 103.26614794565539\n        p99 = 103.36118277211666\n        mean = 101.22055373532301\n        min = 66.0742671641385\n        max = 103.37081160698546\n        stddev = 5.19249551094185\n    number_input_tokens\n        p25 = 10000.0\n        p50 = 10000.0\n        p75 = 10000.0\n        p90 = 10000.0\n        p95 = 10000.0\n        p99 = 10000.0\n        mean = 10000.0\n        min = 10000\n        max = 10000\n        stddev = 0.0\n    number_output_tokens\n        p25 = 1501.0\n        p50 = 1501.0\n        p75 = 1501.0\n        p90 = 1501.0\n        p95 = 1501.0\n        p99 = 1501.0\n        mean = 1501.0\n        min = 1501\n        max = 1501\n        stddev = 0.0\n    Number Of Errored Requests: 0\n    Overall Output Throughput: 100.69986490153724\n    Number Of Completed Requests: 50\n    Completed Requests Per Minute: 4.025311055357918\n\n\n\n\nConclusion\n-----------------------------------------------------------\nAs seen from the table below, draft model based fused speculative decoding and quantization significantly improved inference performance: TPOT reduced by 4x and output token throughput increased by 4x, while TTFT decreased from 2442 ms to 2255 ms compared to baseline without speculative decoding.\nPlease note that batch size of 1 is used in this tutorial for computing the below metrics.\n\n.. csv-table::\n   :file: llama405b_perf_comparison.csv\n   :header-rows: 1\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial.rst",
    "content": ".. _nxdi-trn2-llama3.1-405b-tutorial:\n\nTutorial: Deploying Llama3.1 405B (Trn2)\n========================================\n\nNeuronX Distributed (NxD) Inference enables you to deploy Llama3.1 405B on\na single Trn2 instance.\n\nYou can run Llama3.1 405B with default configuration options. NxD\nInference also provides several features and configuration options that\nyou can use to optimize and tune the performance of Llama3.1 405B on\nTrn2. This guide walks through how to run Llama3.1 405B on Trn2 with\nvLLM, and how to enable these optimizations for optimal performance. In addition, we also have a separate tutorial for running Llama3.1 405B with vanilla fused speculative decoding :ref:`nxdi-trn2-llama3.1-405b-speculative-tutorial`. \n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nBackground, Concepts, and Optimizations\n---------------------------------------\n\nLogical NeuronCore Configuration (LNC)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOn Trn2, the Neuron SDK supports *Logical NeuronCore Configuration\n(LNC)*, which determines the number of NeuronCores visible to the Neuron SDK.\nWhen running on Trn2, the Neuron SDK is optimized for LNC=2, which means\neach NeuronCore visible to the Neuron SDK is two physical NeuronCores.\nThe LNC configuration also affects what TP degree options you can use.\n\nNxD Inference automatically chooses the correct LNC configuration\nbased on the target platform.\n\nFor more information about LNC, see :ref:`logical-neuroncore-config`.\n\nTensor parallelism (TP) on Trn2\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nEach Trn2 instance has 128 Neuron cores. With LNC=2, you can set a TP\ndegree up to 64. We recommend that you use LNC=2 for all models on Trn2.\n\nFor more information about tensor parallelism in NxD Inference, see\n:ref:`nxdi-tensor-parallelism`.\n\nOptimizing Performance\n~~~~~~~~~~~~~~~~~~~~~~\n\nEAGLE Speculative Decoding\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSpeculative decoding is a performance optimization technique where a\nsmaller *draft* LLM model predicts the next tokens, and the larger *target*\nLLM model verifies those predictions.\n\nNxD Inference supports EAGLE v1 speculative decoding with a\nflat draft structure. To use EAGLE v1, you must use an EAGLE checkpoint for a draft model \nthat is not tree-based and is specifically fine-tuned for EAGLE speculation. For more\ninformation about EAGLE, see the official implementation on GitHub: `SafeAILab/EAGLE <https://github.com/SafeAILab/EAGLE>`__.\n\nTo optimize performance for EAGLE speculative decoding, NxD Inference uses\na feature called *fused speculation*, where the\ndraft model and target model are fused into a single compiled model artifact\nto improve performance. Fused speculation uses a different config called\nFusedSpecNeuronConfig, which specifies the model class. draft config,\nand draft model path to fuse with the target model.\n\nFor more information about speculative decoding in NxD Inference, including\nother types of speculative decoding supported, see :ref:`nxd-speculative-decoding`.\n\nFP8 Quantization\n^^^^^^^^^^^^^^^^\n\nNxD Inference supports FP8 quantization, where model weights and data\nare converted to a smaller data type to reduce memory bandwidth usage.\nFP8 quantization enables optimal usage of memory bandwidth to improve\nmodel performance. For more information, see :ref:`nxdi-weight-quantization`.\n\nNxD Inference also supports KV cache quantization, where the KV cache is\nquantized to FP8. 
For more information, see :ref:`nxdi-kv-cache-quantization`.\n\nOptimized Kernels\n^^^^^^^^^^^^^^^^^\n\nNxD Inference supports kernels that optimize parts of the modeling code\nfor best performance.\n\n- Flash attention. This kernel uses a sharded flash attention\n  implementation to improve performance during the context encoding\n  pass. This kernel is enabled automatically at supported sequence\n  lengths. For LNC2, NxD Inference automatically enables flash attention for sequence lengths of\n  256 and larger that are divisible by 256. For LNC1, NxD Inference automatically enables flash attention\n  for sequence lengths of 4096 and larger. You can also enable it with ``attn_kernel_enabled=True`` in\n  NeuronConfig. NxD Inference automatically enables the flash attention kernel\n  at supported sequence lengths even if ``attn_kernel_enabled`` is ``false``.\n- QKV. This kernel fuses the QKV layers to improve performance during\n  the attention forward pass. To enable this kernel, set\n  ``qkv_kernel_enabled=True`` in NeuronConfig.\n- MLP. This kernel implements the MLP module used in decoder layers. To\n  enable this kernel, set ``mlp_kernel_enabled=True`` in NeuronConfig.\n- Quantized MLP. This kernel implements a quantized version of the MLP\n  kernel. This kernel uses FP8 compute to improve performance. To enable\n  this kernel, set ``quantized_mlp_kernel_enabled=True``. This kernel requires\n  ``mlp_kernel_enabled=True``.\n\n.. note::\n   To use the QKV and MLP kernels, you must set ``torch_dtype`` to ``torch.bfloat16``\n   in NeuronConfig.\n\n.. _nxdi-trn2-llama3.1-405b-running:\n\nTutorial: Run Llama3.1 405B on Trn2\n-----------------------------------\n\nAs a prerequisite, this tutorial requires that you have a Trn2 instance\ncreated from a Deep Learning AMI that has the Neuron SDK pre-installed.\n\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\nStep 1: Connect to the Trn2 instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUse SSH to connect to the Trn2 instance using the key pair that you\nchose when you launched the instance.\n\nAfter you are connected, activate the Python virtual environment that\nincludes the Neuron SDK.\n\n::\n\n   source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate\n\nRun ``pip list`` to verify that the Neuron SDK is installed.\n\n::\n\n   python -m pip list\n\nYou should see Neuron packages including\n``neuronx-distributed-inference`` and ``neuronx-cc``.\n\nStep 2: Install the vLLM version that supports NxD Inference\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNxD Inference supports running models with vLLM through the upstream\n``vllm-neuron`` plugin that ships in the vLLM project. Install the\nlatest release branch of the plugin following the detailed\nsteps in the vLLM user guide for NxD Inference.\n\nStep 3: Deploy Llama 3.1 405B sample code\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nChoose one of the following examples to run on the Trn2 instance:\n\n1. Deploy Llama3.1 405B with vLLM offline inference. This example demonstrates\n   how to deploy on Trn2 with vLLM and topK sampling.\n\n2. Deploy Llama3.1 405B with EAGLE speculative decoding. 
This example\n   demonstrates how to use EAGLE to optimize Llama3.1 405B on Trn2.\n\nExample 1: Deploy Llama3.1 405B on Trn2 with vLLM offline inference\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example demonstrates how to deploy Llama3.1 405B on Trn2 with vLLM\noffline inference and the following configuration options:\n\n- Sequence length: 2048 tokens\n- Max context length: 1024 tokens\n\nTo use this sample, you must first download a 405B model checkpoint from Hugging Face\nto a local path on the Trn2 instance. For more information, see\n`Downloading models <https://huggingface.co/docs/hub/en/models-downloading>`__\nin the Hugging Face documentation. You can download and use `meta-llama/Llama-3.1-405B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct>`__\nfor this tutorial.\n\n::\n\n   import os\n   import torch\n   \n   from vllm import LLM, SamplingParams\n   \n   # Force vLLM framework to use neuronx-distributed-inference\n   os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n   \n   model_path = \"/home/ubuntu/models/Llama-3.1-405B-Instruct/\"\n   \n   \n   def run_llama_generate():\n       # Initialize vLLM.\n       llm = LLM(\n           model=model_path,\n           tensor_parallel_size=64,\n           max_num_seqs=1,\n           max_model_len=2048,\n           block_size=2048,\n           dtype=torch.bfloat16,\n           enable_prefix_caching=False,\n           additional_config={\n               \"override_neuron_config\": {\n                   \"skip_warmup\": True,\n                   \"max_context_length\": 1024,\n               },\n           },\n       )\n   \n       # Run vLLM to generate outputs.\n       prompts = [\"I believe the meaning of life is\"]\n       sampling_params = SamplingParams(top_k=50)\n       outputs = llm.generate(prompts, sampling_params)\n       for output in outputs:\n           prompt = output.prompt\n           generated_text = output.outputs[0].text\n           print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n   \n   \n   if __name__ == \"__main__\":\n       run_llama_generate()\n\nExample 2: Deploy Llama3.1 405B on Trn2 with EAGLE speculative decoding\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis example demonstrates how to deploy Llama3.1 405B on Trn2 with EAGLE\nspeculative decoding.\n\n.. note::\n   To use this example, you must provide an EAGLE-trained Llama3.1 405B\n   checkpoint to use for EAGLE speculative decoding. 
For more information\n   about EAGLE checkpoint compatibility with NxD Inference, see :ref:`nxd-eagle-speculative-decoding`.\n\nThis example uses the following configuration options:\n\n- Sequence length: 2048 tokens\n- Max context length: 1024 tokens\n- Speculation length: 6 tokens\n- Flash attention, QKV, and MLP kernels\n- On-device sampling with greedy sampling\n- Sequence parallelism enabled\n- Auto-bucketing enabled, which automatically selects buckets to use.\n  For more information about bucketing and how to customize the buckets used,\n  see :ref:`nxdi-bucketing`.\n\n::\n\n   import copy\n   import os\n   import torch\n   \n   from transformers import AutoTokenizer, GenerationConfig\n   \n   from neuronx_distributed_inference.models.config import FusedSpecNeuronConfig, NeuronConfig, OnDeviceSamplingConfig\n   from neuronx_distributed_inference.models.llama.modeling_llama import LlamaInferenceConfig, NeuronLlamaForCausalLM\n   from neuronx_distributed_inference.utils.hf_adapter import HuggingFaceGenerationAdapter, load_pretrained_config\n   \n   model_path = \"/home/ubuntu/models/llama-3.1-405b-Instruct/\"\n   draft_model_path = \"/home/ubuntu/models/EAGLE-llama-3-405b/\"\n   compiled_model_path = \"/home/ubuntu/neuron_models/llama-3-405b-instruct-EAGLE/\"\n   \n   # Set environment variables for Trn2.\n   os.environ[\"XLA_DENSE_GATHER_FACTOR\"] = \"0\"\n   os.environ[\"NEURON_RT_EXEC_TIMEOUT\"] = \"600\"\n   \n   def run_llama_generate():\n       top_k = 1\n       do_sample = False\n   \n       # Initialize tokenizer.\n       tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side=\"right\")\n       tokenizer.pad_token = tokenizer.eos_token\n   \n       # Initialize target model config.\n       neuron_config = NeuronConfig(\n           torch_dtype=torch.bfloat16,\n           tp_degree=64,\n           batch_size=1,\n           max_context_length=1024,\n           seq_len=2048,\n           on_device_sampling_config=OnDeviceSamplingConfig(\n               dynamic=False,\n               do_sample=do_sample,\n               top_k=top_k\n           ),\n           enable_eagle_speculation=True,\n           enable_fused_speculation=True,\n           speculation_length=6,\n           trace_tokengen_model=False,\n           enable_bucketing=True,\n           fused_qkv=True,\n           sequence_parallel_enabled=True,\n           attn_kernel_enabled=True,\n           qkv_kernel_enabled=True,\n           mlp_kernel_enabled=True,\n           cc_pipeline_tiling_factor=1,\n       )\n       config = LlamaInferenceConfig(\n           neuron_config,\n           load_config=load_pretrained_config(model_path),\n       )\n   \n       # Initialize draft model config.\n       draft_neuron_config = copy.deepcopy(neuron_config)\n       draft_neuron_config.trace_tokengen_model = True\n       draft_neuron_config.enable_fused_speculation = False\n       draft_neuron_config.is_eagle_draft = True\n       draft_config = LlamaInferenceConfig(\n           draft_neuron_config,\n           load_config=load_pretrained_config(draft_model_path)\n       )\n   \n       # Initialize fused speculation config.\n       fused_spec_config = FusedSpecNeuronConfig(\n           NeuronLlamaForCausalLM._model_cls,\n           draft_config=draft_config,\n           draft_model_path=draft_model_path,\n       )\n       config.fused_spec_config = fused_spec_config\n           \n       # Compile and save model.\n       print(\"\\nCompiling and saving model...\")\n       model = NeuronLlamaForCausalLM(model_path, config)\n    
   model.compile(compiled_model_path)\n       tokenizer.save_pretrained(compiled_model_path)\n   \n       # Load from compiled checkpoint.\n       print(\"\\nLoading model from compiled checkpoint...\")\n       model = NeuronLlamaForCausalLM(compiled_model_path)\n       model.load(compiled_model_path)\n       tokenizer = AutoTokenizer.from_pretrained(compiled_model_path)\n   \n       # Initialize generation config.\n       generation_config = GenerationConfig.from_pretrained(model_path)\n       generation_config_kwargs = {\n           \"do_sample\": do_sample,\n           \"top_k\": top_k,\n           \"pad_token_id\": 0,\n           \"prompt_lookup_num_tokens\": neuron_config.speculation_length,\n       }\n       generation_config.update(**generation_config_kwargs)\n   \n       # Generate outputs.\n       print(\"\\nGenerating outputs...\")\n       prompts = [\"I believe the meaning of life is\"]\n       print(f\"Prompts: {prompts}\")\n       inputs = tokenizer(prompts, padding=True, return_tensors=\"pt\")\n       generation_model = HuggingFaceGenerationAdapter(model)\n       outputs = generation_model.generate(\n           inputs.input_ids,\n           generation_config=generation_config,\n           attention_mask=inputs.attention_mask,\n           max_length=model.config.neuron_config.max_length,\n       )\n       output_tokens = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)\n       print(\"Generated outputs:\")\n       for i, output_token in enumerate(output_tokens):\n           print(f\"Output {i}: {output_token}\")\n   \n   \n   if __name__ == \"__main__\":\n       run_llama_generate()\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Multi-LoRA serving for Llama-3.1-8B on Trn2 instances\\n\",\n    \"\\n\",\n    \"NeuronX Distributed (NxD) Inference supports multi-LoRA serving. This tutorial provides a step-by-step guide for multi-LoRA serving with Llama-3.1-8B as the base model on a Trn2 instance. It describes two different ways of running multi-LoRA serving with NxD Inference directly and through vLLM (with NxD Inference) We will use LoRA adapters downloaded from HuggingFace as examples for serving.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\\n\",\n    \"\\n\",\n    \"### Set up and connect to a Trn2.48xlarge instance\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK, see [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup). To use Jupyter Notebook on the Neuron instance, you can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"source ~/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run ```pip list``` to verify that the Neuron SDK is installed.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see Neuron packages including `neuronx-distributed-inference` and `neuronx-cc`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Install Packages\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is available in the AWS Neuron fork of the vLLM GitHub repository. 
Install the latest release branch of vLLM from the AWS Neuron fork following instructions in the [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html#nxdi-vllm-user-guide).\\n\",\n    \"\\n\",\n    \"### Download base model and LoRA adapters\\n\",\n    \"\\n\",\n    \"To use this sample, you must first download a [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model checkpoint from Hugging Face to a local path on the Trn2 instance. Note that you may need access from Meta for model download. For more information, see [Downloading models](https://huggingface.co/docs/hub/en/models-downloading) in the Hugging Face documentation.\\n\",\n    \"\\n\",\n    \"You must download LoRA adapters from Hugging Face for multi-LoRA serving. As examples, you can download [nvidia/llama-3.1-nemoguard-8b-topic-control](https://huggingface.co/nvidia/llama-3.1-nemoguard-8b-topic-control), [reissbaker/llama-3.1-8b-abliterated-lora](https://huggingface.co/reissbaker/llama-3.1-8b-abliterated-lora), [Stefano-M/aixpa_amicifamiglia_short_prompt](https://huggingface.co/Stefano-M/aixpa_amicifamiglia_short_prompt), and [GaetanMichelet/Llama-31-8B_task-2_180-samples_config-2](https://huggingface.co/GaetanMichelet/Llama-31-8B_task-2_180-samples_config-2). Suppose these LoRA adapters are saved in `/home/ubuntu/lora_adapters/`.\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Using vLLM V1 for multi-LoRA serving on Trn2\\n\",\n    \"\\n\",\n    \"You will run multi-LoRA serving on Trn2 with vLLM V1 using Llama-3.1-8b-instruct and four LoRA adapters, two are preloaded in HBM during model initialization and the four adapters are loaded in host memory. The data type is bfloat16 precision.\\n\",\n    \"Please refer to [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html#nxdi-vllm-user-guide) for more details on how to run model inference on TRN2 with vLLM V1.\\n\",\n    \"\\n\",\n    \"### Multi-LoRA Configurations\\n\",\n    \"\\n\",\n    \"You should specifically set the following configurations when enabling multi-LoRA serving with vLLM V1.\\n\",\n    \"\\n\",\n    \"- `enable_lora` - The flag to enable multi-LoRA serving in NxD Inference. Defaults to False.\\n\",\n    \"\\n\",\n    \"- `max_loras` - The maximum number of concurrent LoRA adapters in device memory.\\n\",\n    \"\\n\",\n    \"- `max_cpu_loras` - The maximum number of concurrent LoRA adapters in host memory.\\n\",\n    \"\\n\",\n    \"- `max_lora_rank` - The highest LoRA rank that needs to be supported. Defaults to ```16```. If it is not specified, the maximum LoRA rank of the LoRA adapter checkpoints will be used.\\n\",\n    \"\\n\",\n    \"- `lora-ckpt-json` - The the path of JSON file that describes the mappings for the adapter IDs and their checkpoint paths. It includes three fields:\\n\",\n    \"   - `lora-ckpt-dir` - The directory of the LoRA adapters.\\n\",\n    \"   - `lora-ckpt-paths` - The mapping between LoRA adapter IDs on HBM and their checkpoint paths at initialization. 
Note that they might be evicted at runtime.\\n\",\n    \"   - `lora-ckpt-paths-cpu` - The mapping between LoRA adapter IDs and their checkpoints on CPU.\\n\",\n    \"\\n\",\n    \"Here is an example of the JSON file:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"```json\\n\",\n    \"{\\n\",\n    \"    \\\"lora-ckpt-dir\\\": \\\"/home/ubuntu/lora_adapters/\\\",\\n\",\n    \"    \\\"lora-ckpt-paths\\\": {\\n\",\n    \"        \\\"lora_id_1\\\": \\\"llama-3.1-nemoguard-8b-topic-control\\\",\\n\",\n    \"        \\\"lora_id_2\\\": \\\"llama-3.1-8b-abliterated-lora\\\"\\n\",\n    \"    },\\n\",\n    \"    \\\"lora-ckpt-paths-cpu\\\": {\\n\",\n    \"        \\\"lora_id_1\\\": \\\"llama-3.1-nemoguard-8b-topic-control\\\",\\n\",\n    \"        \\\"lora_id_2\\\": \\\"llama-3.1-8b-abliterated-lora\\\",\\n\",\n    \"        \\\"lora_id_3\\\": \\\"aixpa_amicifamiglia_short_prompt\\\",\\n\",\n    \"        \\\"lora_id_4\\\": \\\"Llama-31-8B_task-2_180-samples_config-2\\\"\\n\",\n    \"    }\\n\",\n    \"}\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Offline inference example\\n\",\n    \"\\n\",\n    \"You can run multi-LoRA serving offline on TRN2 with vLLM V1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from vllm import LLM, SamplingParams\\n\",\n    \"from vllm.lora.request import LoRARequest\\n\",\n    \"\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/model_hf/llama-3.1-8b-instruct/\\\"\\n\",\n    \"# Replace this with the path where you saved the JSON file.\\n\",\n    \"LORA_CKPT_JSON=\\\"/home/ubuntu/lora_adapters/lora_adapters.json\\\"\\n\",\n    \"# This is where the compiled model will be saved.\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/llama-3.1-8B-Lora/\\\"\\n\",\n    \"os.environ[\\\"NEURON_COMPILED_ARTIFACTS\\\"] = (COMPILED_MODEL_PATH)\\n\",\n    \"os.environ[\\\"VLLM_USE_V1\\\"] = \\\"1\\\"\\n\",\n    \"\\n\",\n    \"# Sample prompts.\\n\",\n    \"prompts = [\\n\",\n    \"    \\\"The president of the United States is\\\",\\n\",\n    \"    \\\"The capital of France is\\\",\\n\",\n    \"]\\n\",\n    \"\\n\",\n    \"# Create a sampling params object.\\n\",\n    \"sampling_params = SamplingParams(top_k=1)\\n\",\n    \"override_neuron_config = {\\n\",\n    \"    \\\"skip_warmup\\\": True,\\n\",\n    \"    \\\"lora_ckpt_json\\\": LORA_CKPT_JSON,\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"# Create an LLM with multi-LoRA serving.\\n\",\n    \"llm = LLM(\\n\",\n    \"    model=MODEL_PATH,\\n\",\n    \"    max_num_seqs=2,\\n\",\n    \"    max_model_len=64,\\n\",\n    \"    tensor_parallel_size=32,\\n\",\n    \"    additional_config={\\n\",\n    \"        \\\"override_neuron_config\\\": override_neuron_config\\n\",\n    \"    },\\n\",\n    \"    enable_lora=True,\\n\",\n    \"    max_loras=2,\\n\",\n    \"    max_cpu_loras=4,\\n\",\n    \"    enable_prefix_caching=False,\\n\",\n    \"    enable_chunked_prefill=False,\\n\",\n    \")\\n\",\n    \"\\\"\\\"\\\"\\n\",\n    \"Only the lora_name needs to be specified.\\n\",\n    \"The lora_id and lora_path are supplied at the LLM class/server initialization, after which the paths are\\n\",\n    \"handled by NxD Inference.\\n\",\n    \"\\\"\\\"\\\"\\n\",\n    \"# lora_id_1 is in HBM\\n\",\n    \"lora_req_1 = LoRARequest(\\\"lora_id_1\\\", 1, \\\" \\\")\\n\",\n    \"# lora_id_3 is 
in host memory and it will be dynamically swapped to HBM at runtime\\n\",\n    \"lora_req_2 = LoRARequest(\\\"lora_id_3\\\", 2, \\\" \\\")\\n\",\n    \"outputs = llm.generate(prompts, sampling_params, lora_request=[lora_req_1, lora_req_2])\\n\",\n    \"\\n\",\n    \"for output in outputs:\\n\",\n    \"    prompt = output.prompt\\n\",\n    \"    generated_text = output.outputs[0].text\\n\",\n    \"    print(f\\\"Prompt: {prompt!r}, Generated text: {generated_text!r}\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"#### Run multi-LoRA serving with model quantization\\n\",\n    \"\\n\",\n    \"To enable multi-LoRA serving with the base model quantized, you must pass some quantization-related arguments to vLLM. For example, you can add the following arguments to `override_neuron_config`. Refer to [Model Weight Quantization](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#nxdi-weight-quantization) for more information.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"quantization_config = {\\n\",\n    \"    \\\"quantized\\\": True,\\n\",\n    \"    # quantized_checkpoints_path is the path that saves the quantized base model weights\\n\",\n    \"    \\\"quantized_checkpoints_path\\\": os.path.join(COMPILED_MODEL_PATH, \\\"model_quant.pt\\\"),\\n\",\n    \"    \\\"quantization_type\\\": \\\"per_channel_symmetric\\\",\\n\",\n    \"}\\n\",\n    \"# Add quantization config to override_neuron_config\\n\",\n    \"override_neuron_config.update(quantization_config)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Online Server Example\\n\",\n    \"\\n\",\n    \"You can also run online multi-LoRA serving on TRN2 with vLLM V1. Save the contents of the below script to another shell script file, for example, `start_vllm.sh` and then run it.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile start_vllm.sh\\n\",\n    \"#!/bin/bash\\n\",\n    \"\\n\",\n    \"echo \\\"Running vLLM server in the background...\\\"\\n\",\n    \"rm -f ./vllm_server.log\\n\",\n    \"\\n\",\n    \"# These should be the same paths used when compiling the model.\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/model_hf/llama-3.1-8b-instruct/\\\"\\n\",\n    \"# Replace this with the path where you saved the JSON file. 
Refer to the NxD Inference script for the JSON format.\\n\",\n    \"LORA_CKPT_JSON=\\\"/home/ubuntu/lora_adapters/lora_adapters.json\\\"\\n\",\n    \"# This is where the compiled model will be saved.\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/llama-3.1-8B-Lora/\\\"\\n\",\n    \"# Replace this with the path where you saved the LoRA adapters\\n\",\n    \"LORA_ADAPTER_DIR=\\\"/home/ubuntu/lora_adapters\\\"\\n\",\n    \"# Set lora_modules to register LoRA adapters during multi-LoRA serving\\n\",\n    \"LORA_MODULES=\\\"lora_id_1=${LORA_ADAPTER_DIR}/llama-3.1-nemoguard-8b-topic-control \\\"\\n\",\n    \"LORA_MODULES+=\\\"lora_id_2=${LORA_ADAPTER_DIR}/llama-3.1-8b-abliterated-lora \\\"\\n\",\n    \"LORA_MODULES+=\\\"lora_id_3=${LORA_ADAPTER_DIR}/aixpa_amicifamiglia_short_prompt \\\"\\n\",\n    \"LORA_MODULES+=\\\"lora_id_4=${LORA_ADAPTER_DIR}/Llama-31-8B_task-2_180-samples_config-2 \\\"\\n\",\n    \"\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\\n\",\n    \"VLLM_RPC_TIMEOUT=100000 \\n\",\n    \"nohup vllm serve $MODEL_PATH \\\\\\n\",\n    \"    --max-num-seqs 2 \\\\\\n\",\n    \"    --max-model-len 64 \\\\\\n\",\n    \"    --tensor-parallel-size 32 \\\\\\n\",\n    \"    --disable-log-requests \\\\\\n\",\n    \"    --no-enable-chunked-prefill \\\\\\n\",\n    \"    --no-enable-prefix-caching \\\\\\n\",\n    \"    --enable-lora \\\\\\n\",\n    \"    --max-loras 2 \\\\\\n\",\n    \"    --max-cpu-loras 8 \\\\\\n\",\n    \"    --override-neuron-config \\\"{\\\\\\\"sequence_parallel_enabled\\\\\\\": false}\\\" \\\\\\n\",\n    \"    --lora-modules ${LORA_MODULES} \\\\\\n\",\n    \"    --port 8000 > ./vllm_server.log 2>&1 & \\n\",\n    \"\\n\",\n    \"SERVER_PID=$!\\n\",\n    \"\\n\",\n    \"echo \\\"Server started in the background with the following id: $SERVER_PID. Waiting until server is ready to serve...\\\"\\n\",\n    \"\\n\",\n    \"until grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null || ! kill -0 $SERVER_PID 2>/dev/null; do sleep 0.5; done\\n\",\n    \"grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null && echo \\\"vLLM Server is ready!\\\" || (echo \\\"vLLM Server failed, check the ./vllm_server.log file\\\" && exit 1)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!chmod +x ./start_vllm.sh\\n\",\n    \"!./start_vllm.sh\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"After the vLLM server is launched, you can check the registered LoRA adapters in the vLLM server.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"curl http://localhost:8000/v1/models | jq\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can send requests to the server, setting the `model` field to one of the registered LoRA adapter IDs or to the base model path. 
Here are a few sample requests:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"shellscript\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"# request LoRA adapter in HBM\\n\",\n    \"curl http://localhost:8000/v1/completions \\\\\\n\",\n    \"    -H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"    -d '{\\n\",\n    \"        \\\"model\\\": \\\"lora_id_1\\\",\\n\",\n    \"        \\\"prompt\\\": \\\"The president of the United States is\\\",\\n\",\n    \"        \\\"max_tokens\\\": 32,\\n\",\n    \"        \\\"temperature\\\": 0\\n\",\n    \"    }' | jq\\n\",\n    \"\\n\",\n    \"# request LoRA adapter in host memory with dynamic swap\\n\",\n    \"curl http://localhost:8000/v1/completions \\\\\\n\",\n    \"    -H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"    -d '{\\n\",\n    \"        \\\"model\\\": \\\"lora_id_3\\\",\\n\",\n    \"        \\\"prompt\\\": \\\"The capital of France is\\\",\\n\",\n    \"        \\\"max_tokens\\\": 32,\\n\",\n    \"        \\\"temperature\\\": 0\\n\",\n    \"    }' | jq\\n\",\n    \"    \\n\",\n    \"# request the base model for serving (the model name is the same path passed to vllm serve)\\n\",\n    \"curl http://localhost:8000/v1/completions \\\\\\n\",\n    \"    -H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"    -d '{\\n\",\n    \"        \\\"model\\\": \\\"/home/ubuntu/model_hf/llama-3.1-8b-instruct/\\\",\\n\",\n    \"        \\\"prompt\\\": \\\"The capital of France is\\\",\\n\",\n    \"        \\\"max_tokens\\\": 32,\\n\",\n    \"        \\\"temperature\\\": 0\\n\",\n    \"    }' | jq\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Dynamically loading LoRA Adapters\\n\",\n    \"\\n\",\n    \"In addition to specifying LoRA adapters at server startup, you can also dynamically configure LoRA adapters at runtime through dedicated API endpoints. 
This feature can be particularly useful when the flexibility to change LoRA adapters on-the-fly is needed.\\n\",\n    \"\\n\",\n    \"Note: the LoRA adapter checkpoints must be stored locally on the host where the server is running before a LoRA adapter is loaded.\\n\",\n    \"\\n\",\n    \"To enable dynamic LoRA configuration, ensure that the environment variable `VLLM_ALLOW_RUNTIME_LORA_UPDATING` is set to True when starting the server engine.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"shellscript\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Example request to load a LoRA adapter:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"shellscript\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"curl -X POST http://localhost:8000/v1/load_lora_adapter \\\\\\n\",\n    \"-H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"-d '{\\n\",\n    \"    \\\"lora_name\\\": \\\"lora_id_5\\\",\\n\",\n    \"    \\\"lora_path\\\": \\\"/path/to/lora-adapter-5\\\"\\n\",\n    \"}'\\n\",\n    \"\\n\",\n    \"# check the registered LoRA adapters in the vLLM server.\\n\",\n    \"curl http://localhost:8000/v1/models | jq\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Example request to unload a LoRA adapter:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"shellscript\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"curl -X POST http://localhost:8000/v1/unload_lora_adapter \\\\\\n\",\n    \"-H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"-d '{\\n\",\n    \"    \\\"lora_name\\\": \\\"lora_id_1\\\"\\n\",\n    \"}'\\n\",\n    \"\\n\",\n    \"# check the registered LoRA adapters in the vLLM server.\\n\",\n    \"curl http://localhost:8000/v1/models | jq\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"neuron-224\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Using Prefix Caching with Llama-3.3-70B on Trn2 instances\\n\",\n    \"\\n\",\n    \"This tutorial provides a step-by-step guide to deploy Llama3.3 70B using \\n\",\n    \"NeuronX Distributed (NxD) Inference on a single Trn2.48xl instance using two\\n\",\n    \"different configurations, one with prefix caching enabled and the other\\n\",\n    \"without prefix caching. We will also measure average response time\\n\",\n    \"for both the configurations with prompts containing a common prefix.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Background, Concepts, and Optimizations\\n\",\n    \"\\n\",\n    \"### Block KV Cache Layout\\n\",\n    \"\\n\",\n    \"To support prefix caching, NxDI now uses block kv cache layout. Enable block layout of\\n\",\n    \"the cache by setting `is_block_kv_layout=True` in NeuronConfig. The first two\\n\",\n    \"dimensions of the KV cache are set to the number of blocks and block size, respectively.\\n\",\n    \"These configurations are specified using `pa_num_blocks` and `pa_block_size` in NeuronConfig.\\n\",\n    \"\\n\",\n    \"For optimal performance with Neuron, it's recommended to set `pa_block_size=32`.\\n\",\n    \"The minimum required `pa_num_blocks` can be calculated using the formula\\n\",\n    \"`(batch_size * max_seq_len) / block_size` where batch_size is the compiled batch size\\n\",\n    \"and max_seq_len is the maximum sequence length of the compiled model on Neuron.\\n\",\n    \"While using the minimum block calculation will produce accurate results, it's recommended\\n\",\n    \"to initialize as many blocks as possible without exceeding HBM space limitations. This\\n\",\n    \"ensures that Neuron has sufficient blocks to save as much prefix data as possible. More cache\\n\",\n    \"blocks implies higher prefix caching hit rate and hence better context encoding performance.\\n\",\n    \"\\n\",\n    \"### Kernels\\n\",\n    \"\\n\",\n    \"NxD Inference supports kernels that optimize parts of the modeling code\\n\",\n    \"for best performance when prefix caching is enabled.\\n\",\n    \"\\n\",\n    \"- Token generation attention kernel with block kv cache read and update capabilities.\\n\",\n    \"  This kernel reads the cache blocks using the active block table, converts the required\\n\",\n    \"  blocks into flat layout, performs attention and scatters back the computed key and value\\n\",\n    \"  to the correct slot in the block cache layout. 
To enable this kernel, set\\n\",\n    \"  `attn_block_tkg_nki_kernel_enabled=True` and `attn_block_tkg_nki_kernel_cache_update=True`\\n\",\n    \"  in NeuronConfig.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\\n\",\n    \"\\n\",\n    \"### Set up and connect to a Trn2.48xlarge instance\\n\",\n    \"\\n\",\n    \"As a prerequisite, this tutorial requires that you have a Trn2 instance\\n\",\n    \"created from a Deep Learning AMI that has the Neuron SDK pre-installed.\\n\",\n    \"\\n\",\n    \"To set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\\n\",\n    \"see the [NxDI setup guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html#nxdi-setup).\\n\",\n    \"\\n\",\n    \"After setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you\\n\",\n    \"chose when you launched the instance.\\n\",\n    \"\\n\",\n    \"To use Jupyter Notebook on the Neuron instance, you can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"After you are connected, activate the Python virtual environment that\\n\",\n    \"includes the Neuron SDK.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"source ~/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/activate\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run `pip list` to verify that the Neuron SDK is installed.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"pip list | grep neuron\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see Neuron packages including\\n\",\n    \"`neuronx-distributed-inference` and `neuronx-cc`.\\n\",\n    \"\\n\",\n    \"### Install packages\\n\",\n    \"\\n\",\n    \"NxD Inference supports running models with vLLM. This functionality is\\n\",\n    \"available through the vllm-neuron plugin. Install the latest release branch of\\n\",\n    \"vLLM from the vllm-neuron plugin following instructions in the\\n\",\n    \"[vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html#nxdi-vllm-user-guide).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Download models\\n\",\n    \"\\n\",\n    \"To use this sample, you must first download a 70B model checkpoint from Hugging Face\\n\",\n    \"to a local path on the Trn2 instance. For more information, see\\n\",\n    \"[Downloading models](https://huggingface.co/docs/hub/en/models-downloading)\\n\",\n    \"in the Hugging Face documentation. 
You can download and use [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)\\n\",\n    \"for this tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Scenario 1: Run Llama3.3 70B on Trn2 without Prefix Caching\\n\",\n    \"\\n\",\n    \"### Step 1: Compile the model\\n\",\n    \"\\n\",\n    \"We will first compile using a command installed by `neuronx-distributed-inference`.\\n\",\n    \"\\n\",\n    \"Note that we are also using the following features as described in\\n\",\n    \"the tutorial for running 405B model [Tutorial: Deploying Llama3.1 405B (Trn2)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial.html)\\n\",\n    \"\\n\",\n    \"- Logical NeuronCore Configuration (LNC)\\n\",\n    \"- Tensor parallelism (TP) on Trn2\\n\",\n    \"- Optimized Kernels\\n\",\n    \"\\n\",\n    \"Note the path we used to save the compiled model. This path should be used\\n\",\n    \"when launching vLLM server for inference so that the compiled model can be loaded without recompilation.\\n\",\n    \"Refer to the [NxD inference API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/api-guides/api-guide.html) guide for more information on these `inference_demo` flags.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"# Replace this with the path where you downloaded and saved the model files.\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"# This is where the compiled model will be saved. The same path\\n\",\n    \"# should be used when launching vLLM server for inference.\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"NUM_CORES=128\\n\",\n    \"TP_DEGREE=64\\n\",\n    \"LNC=2\\n\",\n    \"\\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\\n\",\n    \"export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\\n\",\n    \"export NEURON_RT_EXEC_TIMEOUT=600 \\n\",\n    \"export XLA_DENSE_GATHER_FACTOR=0 \\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0\\n\",\n    \"\\n\",\n    \"inference_demo \\\\\\n\",\n    \"    --model-type llama \\\\\\n\",\n    \"    --task-type causal-lm \\\\\\n\",\n    \"        run \\\\\\n\",\n    \"        --model-path $MODEL_PATH \\\\\\n\",\n    \"        --compiled-model-path $COMPILED_MODEL_PATH \\\\\\n\",\n    \"        --torch-dtype bfloat16 \\\\\\n\",\n    \"        --start_rank_id 0 \\\\\\n\",\n    \"        --local_ranks_size $TP_DEGREE \\\\\\n\",\n    \"        --tp-degree $TP_DEGREE \\\\\\n\",\n    \"        --batch-size 4 \\\\\\n\",\n    \"        --is-continuous-batching \\\\\\n\",\n    \"        --ctx-batch-size 1 \\\\\\n\",\n    \"        --tkg-batch-size 4 \\\\\\n\",\n    \"        --max-context-length 8192 \\\\\\n\",\n    \"        --seq-len 8192 \\\\\\n\",\n    \"        --on-device-sampling \\\\\\n\",\n    \"        --top-k 1 \\\\\\n\",\n    \"        --do-sample \\\\\\n\",\n    \"        --fused-qkv \\\\\\n\",\n    \"        --sequence-parallel-enabled \\\\\\n\",\n    \"        --qkv-kernel-enabled \\\\\\n\",\n    \"        --attn-kernel-enabled \\\\\\n\",\n    \"        --mlp-kernel-enabled \\\\\\n\",\n    \"        --attn-block-tkg-nki-kernel-enabled \\\\\\n\",\n    \"        --attn-block-tkg-nki-kernel-cache-update 
\\\\\\n\",\n    \"        --k-cache-transposed \\\\\\n\",\n    \"        --cc-pipeline-tiling-factor 1 \\\\\\n\",\n    \"        --pad-token-id 2 \\\\\\n\",\n    \"        --enable-bucketing \\\\\\n\",\n    \"        --context-encoding-buckets 512 1024 2048 4096 8192 \\\\\\n\",\n    \"        --token-generation-buckets 512 1024 2048 4096 8192 \\\\\\n\",\n    \"        --compile-only \\\\\\n\",\n    \"        --prompt \\\"What is annapurna labs?\\\" 2>&1 | tee log.txt\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 2: Serve the model using vLLM\\n\",\n    \"\\n\",\n    \"After compiling the model, you can run the model using vLLM. Save the contents of the below script to another\\n\",\n    \"shell script file, for example, `start_vllm.sh` and then run it.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile start_vllm.sh\\n\",\n    \"#!/bin/bash\\n\",\n    \"\\n\",\n    \"echo \\\"Running vLLM server in the background...\\\"\\n\",\n    \"rm -f ./vllm_server.log \\n\",\n    \"\\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0 \\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=2\\n\",\n    \"\\n\",\n    \"# These should be the same paths used when compiling the model.\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"export VLLM_NEURON_FRAMEWORK=\\\"neuronx-distributed-inference\\\"\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\\n\",\n    \"VLLM_RPC_TIMEOUT=100000 \\n\",\n    \"nohup vllm serve \\\\\\n\",\n    \"    --model $MODEL_PATH \\\\\\n\",\n    \"    --max-num-seqs 4 \\\\\\n\",\n    \"    --max-model-len 8192 \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --no-enable-prefix-caching \\\\\\n\",\n    \"    --block-size 32 \\\\\\n\",\n    \"    --port 8000 > ./vllm_server.log 2>&1 &\\n\",\n    \"SERVER_PID=$!\\n\",\n    \"\\n\",\n    \"echo \\\"Server started in the background with the following id: $SERVER_PID. Waiting until server is ready to serve...\\\"\\n\",\n    \"\\n\",\n    \"until grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null || ! 
kill -0 $SERVER_PID 2>/dev/null; do sleep 0.5; done\\n\",\n    \"grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null && echo \\\"vLLM Server is ready!\\\" || (echo \\\"vLLM Server failed, check the ./vllm_server.log file\\\" && exit 1)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!chmod +x ./start_vllm.sh\\n\",\n    \"!./start_vllm.sh\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"If you see the below logs, that means your server is up and running:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"INFO: Started server process [284309]\\n\",\n    \"INFO: Waiting for application startup.\\n\",\n    \"INFO: Application startup complete.\\n\",\n    \"INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 3: Analyze Request response from server\\n\",\n    \"\\n\",\n    \"An example script has been added to demonstrate how a common lookup table is used to\\n\",\n    \"answer 10 different questions while measuring the total response time. The lookup table\\n\",\n    \"serves as a shared prefix that's consistently applied across all 10 input prompts.\\n\",\n    \"The script will calculate and display the average time required to answer all questions.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"LONG_PROMPT=$(cat << 'EOL'\\n\",\n    \"You are a helpful assistant in recognizes the content of tables in markdown format. 
Here is a table as follows.\\n\",\n    \"# Table\\n\",\n    \"| ID  | Name          | Age | Occupation    | Country       | Email                  | Phone Number   | Address                       |\\n\",\n    \"|-----|---------------|-----|---------------|---------------|------------------------|----------------|------------------------------|\\n\",\n    \"| 1   | John Doe      | 29  | Engineer      | USA           | john.doe@example.com   | 555-1234       | 123 Elm St, Springfield, IL  |\\n\",\n    \"| 2   | Jane Smith    | 34  | Doctor        | Canada        | jane.smith@example.com | 555-5678       | 456 Oak St, Toronto, ON      |\\n\",\n    \"| 3   | Alice Johnson | 27  | Teacher       | UK            | alice.j@example.com    | 555-8765       | 789 Pine St, London, UK      |\\n\",\n    \"| 4   | Bob Brown     | 45  | Artist        | Australia     | bob.b@example.com      | 555-4321       | 321 Maple St, Sydney, NSW    |\\n\",\n    \"| 5   | Carol White   | 31  | Scientist     | New Zealand   | carol.w@example.com    | 555-6789       | 654 Birch St, Wellington, NZ |\\n\",\n    \"| 6   | Dave Green    | 28  | Lawyer        | Ireland       | dave.g@example.com     | 555-3456       | 987 Cedar St, Dublin, IE     |\\n\",\n    \"| 7   | Emma Black    | 40  | Musician      | USA           | emma.b@example.com     | 555-1111       | 246 Ash St, New York, NY     |\\n\",\n    \"| 8   | Frank Blue    | 37  | Chef          | Canada        | frank.b@example.com    | 555-2222       | 135 Spruce St, Vancouver, BC |\\n\",\n    \"| 9   | Grace Yellow  | 50  | Engineer      | UK            | grace.y@example.com    | 555-3333       | 864 Fir St, Manchester, UK   |\\n\",\n    \"| 10  | Henry Violet  | 32  | Artist        | Australia     | henry.v@example.com    | 555-4444       | 753 Willow St, Melbourne, VIC|\\n\",\n    \"| 11  | Irene Orange  | 26  | Scientist     | New Zealand   | irene.o@example.com    | 555-5555       | 912 Poplar St, Auckland, NZ  |\\n\",\n    \"| 12  | Jack Indigo   | 38  | Teacher       | Ireland       | jack.i@example.com     | 555-6666       | 159 Elm St, Cork, IE         |\\n\",\n    \"| 13  | Karen Red     | 41  | Lawyer        | USA           | karen.r@example.com    | 555-7777       | 357 Cedar St, Boston, MA     |\\n\",\n    \"| 14  | Leo Brown     | 30  | Chef          | Canada        | leo.b@example.com      | 555-8888       | 246 Oak St, Calgary, AB      |\\n\",\n    \"| 15  | Mia Green     | 33  | Musician      | UK            | mia.g@example.com      | 555-9999       | 975 Pine St, Edinburgh, UK   |\\n\",\n    \"| 16  | Noah Yellow   | 29  | Doctor        | Australia     | noah.y@example.com     | 555-0000       | 864 Birch St, Brisbane, QLD  |\\n\",\n    \"| 17  | Olivia Blue   | 35  | Engineer      | New Zealand   | olivia.b@example.com   | 555-1212       | 753 Maple St, Hamilton, NZ   |\\n\",\n    \"| 18  | Peter Black   | 42  | Artist        | Ireland       | peter.b@example.com    | 555-3434       | 912 Fir St, Limerick, IE     |\\n\",\n    \"| 19  | Quinn White   | 28  | Scientist     | USA           | quinn.w@example.com    | 555-5656       | 159 Willow St, Seattle, WA   |\\n\",\n    \"| 20  | Rachel Red    | 31  | Teacher       | Canada        | rachel.r@example.com   | 555-7878       | 357 Poplar St, Ottawa, ON    |\\n\",\n    \"| 21  | Steve Green   | 44  | Lawyer        | UK            | steve.g@example.com    | 555-9090       | 753 Elm St, Birmingham, UK   |\\n\",\n    \"| 22  | Tina Blue     | 36  | Musician      | Australia     | tina.b@example.com     | 
555-1213       | 864 Cedar St, Perth, WA      |\\n\",\n    \"| 23  | Umar Black    | 39  | Chef          | New Zealand   | umar.b@example.com     | 555-3435       | 975 Spruce St, Christchurch, NZ|\\n\",\n    \"| 24  | Victor Yellow | 43  | Engineer      | Ireland       | victor.y@example.com   | 555-5657       | 246 Willow St, Galway, IE    |\\n\",\n    \"| 25  | Wendy Orange  | 27  | Artist        | USA           | wendy.o@example.com    | 555-7879       | 135 Elm St, Denver, CO       |\\n\",\n    \"| 26  | Xavier Green  | 34  | Scientist     | Canada        | xavier.g@example.com   | 555-9091       | 357 Oak St, Montreal, QC     |\\n\",\n    \"| 27  | Yara Red      | 41  | Teacher       | UK            | yara.r@example.com     | 555-1214       | 975 Pine St, Leeds, UK       |\\n\",\n    \"| 28  | Zack Blue     | 30  | Lawyer        | Australia     | zack.b@example.com     | 555-3436       | 135 Birch St, Adelaide, SA   |\\n\",\n    \"| 29  | Amy White     | 33  | Musician      | New Zealand   | amy.w@example.com      | 555-5658       | 159 Maple St, Wellington, NZ |\\n\",\n    \"| 30  | Ben Black     | 38  | Chef          | Ireland       | ben.b@example.com      | 555-7870       | 246 Fir St, Waterford, IE    |\\n\",\n    \"EOL\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"questions=(\\n\",\n    \"    \\\"Question: what is the age of John Doe? Your answer: The age of John Doe is \\\"\\n\",\n    \"    \\\"Question: what is the age of Zack Blue? Your answer: The age of Zack Blue is \\\"\\n\",\n    \"    \\\"Question: Which country is Ben Black from? Your answer: The country of Ben Black is \\\"\\n\",\n    \"    \\\"Question: Who has rachel.r@example.com as their email domain? Your answer: The email domain rachel.r@example.com belongs to \\\"\\n\",\n    \"    \\\"Question: What is the phone number for contacting Karen Red? Your answer: The phone number for contacting Karen Red is \\\"\\n\",\n    \"    \\\"Question: What is the occupation of Tina Blue? Your answer: The occupation of Tina Blue is \\\"\\n\",\n    \"    \\\"Question: What is the name of the person with id as 29? Your answer: The name of the person with id as 29 is \\\"\\n\",\n    \"    \\\"Question: What is the address of Alice Johnson? Your answer: The address of Alice Johnson is \\\"\\n\",\n    \"    \\\"Question: What is the id of Irene Orange? Your answer: The id of Irene Orange is \\\"\\n\",\n    \"    \\\"Question: What is the age of Leo Brown? 
Your answer: The age of Leo Brown is \\\"\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Function to make a single request\\n\",\n    \"make_request() {\\n\",\n    \"    local question=$1\\n\",\n    \"    local prompt_with_suffix=\\\"${LONG_PROMPT}\\n\",\n    \"\\n\",\n    \"Based on the table above, please answer this question:\\n\",\n    \"${question}\\\"\\n\",\n    \"    \\n\",\n    \"    local escaped_prompt=$(echo \\\"$prompt_with_suffix\\\" | jq -Rs .)\\n\",\n    \"    \\n\",\n    \"    # Make the curl request and capture both response and time\\n\",\n    \"    local response_file=$(mktemp)\\n\",\n    \"    time_output=$(TIMEFORMAT='%R'; { time curl -s http://localhost:8000/v1/chat/completions \\\\\\n\",\n    \"        -H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"        -d \\\"{\\n\",\n    \"            \\\\\\\"model\\\\\\\": \\\\\\\"$MODEL_PATH\\\\\\\",\\n\",\n    \"            \\\\\\\"messages\\\\\\\": [\\n\",\n    \"                {\\n\",\n    \"                    \\\\\\\"role\\\\\\\": \\\\\\\"user\\\\\\\",\\n\",\n    \"                    \\\\\\\"content\\\\\\\": ${escaped_prompt}\\n\",\n    \"                }\\n\",\n    \"            ]\\n\",\n    \"        }\\\" > \\\"$response_file\\\"; } 2>&1)\\n\",\n    \"    \\n\",\n    \"    # Extract the response content\\n\",\n    \"    local response_content=$(cat \\\"$response_file\\\" | jq -r '.choices[0].message.content')\\n\",\n    \"    rm \\\"$response_file\\\"\\n\",\n    \"    \\n\",\n    \"    # Return both time and response\\n\",\n    \"    echo \\\"TIME:$time_output\\\"\\n\",\n    \"    echo \\\"RESPONSE:$response_content\\\"\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"# Make first request (warm-up) with a random question\\n\",\n    \"random_index=$((RANDOM % ${#questions[@]}))\\n\",\n    \"echo \\\"Warm-up request with question: ${questions[$random_index]}\\\"\\n\",\n    \"IFS=$'\\\\n' read -r -d '' time_str response_str < <(make_request \\\"${questions[$random_index]}\\\" && echo '')\\n\",\n    \"echo \\\"Response: $response_str\\\"\\n\",\n    \"echo \\\"Time taken: ${time_str#TIME:} seconds\\\"\\n\",\n    \"echo \\\"Warm-up complete\\\"\\n\",\n    \"echo \\\"-------------------\\\"\\n\",\n    \"\\n\",\n    \"# Make 10 timed requests with random questions\\n\",\n    \"total_time=0\\n\",\n    \"for i in {0..9}; do\\n\",\n    \"    random_index=$i\\n\",\n    \"    #random_index=$((RANDOM % ${#questions[@]}))\\n\",\n    \"    question=\\\"${questions[$random_index]}\\\"\\n\",\n    \"    echo \\\"Request $i with question: $question\\\"\\n\",\n    \"    \\n\",\n    \"    IFS=$'\\\\n' read -r -d '' time_str response_str < <(make_request \\\"$question\\\" && echo '')\\n\",\n    \"    time_taken=${time_str#TIME:}\\n\",\n    \"    response=${response_str#RESPONSE:}\\n\",\n    \"    \\n\",\n    \"    total_time=$(echo \\\"$total_time + $time_taken\\\" | bc -l)\\n\",\n    \"    echo \\\"Response: $response\\\"\\n\",\n    \"    echo \\\"Time taken: ${time_taken} seconds\\\"\\n\",\n    \"    echo \\\"-------------------\\\"\\n\",\n    \"done\\n\",\n    \"\\n\",\n    \"# Calculate and display average time\\n\",\n    \"average_time=$(echo \\\"scale=3; $total_time / 10\\\" | bc -l)\\n\",\n    \"echo \\\"Average time across 10 requests: ${average_time} seconds\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Output from the script would include all the answers to the questions along with the average time to process all the requests at the very end as shown 
below.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"Average time across 10 requests: .388 seconds\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Scenario 2: Run Llama3.3 70B on Trn2 with Prefix Caching\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 1: Compile the model\\n\",\n    \"\\n\",\n    \"The compilation script with prefix caching adds extra flags specific to prefix caching to enable and configure Block KV cache layout along with enabling the kernels used with prefix caching. Please refer to the [Prefix Caching Support](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#prefix-caching-support) documentation for more information on the prefix caching flags used below.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"# Replace this with the path where you downloaded and saved the model files.\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"# This is where the compiled model will be saved. The same path\\n\",\n    \"# should be used when launching vLLM server for inference.\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"NUM_CORES=128\\n\",\n    \"TP_DEGREE=64\\n\",\n    \"LNC=2\\n\",\n    \"\\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\\n\",\n    \"export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\\n\",\n    \"export NEURON_RT_EXEC_TIMEOUT=600 \\n\",\n    \"export XLA_DENSE_GATHER_FACTOR=0 \\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0\\n\",\n    \"\\n\",\n    \"inference_demo \\\\\\n\",\n    \"    --model-type llama \\\\\\n\",\n    \"    --task-type causal-lm \\\\\\n\",\n    \"        run \\\\\\n\",\n    \"        --model-path $MODEL_PATH \\\\\\n\",\n    \"        --compiled-model-path $COMPILED_MODEL_PATH \\\\\\n\",\n    \"        --torch-dtype bfloat16 \\\\\\n\",\n    \"        --start_rank_id 0 \\\\\\n\",\n    \"        --local_ranks_size $TP_DEGREE \\\\\\n\",\n    \"        --tp-degree $TP_DEGREE \\\\\\n\",\n    \"        --batch-size 4 \\\\\\n\",\n    \"        --is-continuous-batching \\\\\\n\",\n    \"        --ctx-batch-size 1 \\\\\\n\",\n    \"        --tkg-batch-size 4 \\\\\\n\",\n    \"        --max-context-length 8192 \\\\\\n\",\n    \"        --seq-len 8192 \\\\\\n\",\n    \"        --on-device-sampling \\\\\\n\",\n    \"        --top-k 1 \\\\\\n\",\n    \"        --do-sample \\\\\\n\",\n    \"        --fused-qkv \\\\\\n\",\n    \"        --sequence-parallel-enabled \\\\\\n\",\n    \"        --qkv-kernel-enabled \\\\\\n\",\n    \"        --attn-kernel-enabled \\\\\\n\",\n    \"        --mlp-kernel-enabled \\\\\\n\",\n    \"        --attn-block-tkg-nki-kernel-enabled \\\\\\n\",\n    \"        --attn-block-tkg-nki-kernel-cache-update \\\\\\n\",\n    \"        --cc-pipeline-tiling-factor 1 \\\\\\n\",\n    \"        --pad-token-id 2 \\\\\\n\",\n    \"        --enable-bucketing \\\\\\n\",\n    \"        --context-encoding-buckets 512 1024 2048 4096 8192 \\\\\\n\",\n    \"        --token-generation-buckets 512 1024 2048 4096 8192 \\\\\\n\",\n    \"        --prefix-buckets 512 1024 2048 \\\\\\n\",\n    \"        --enable-block-kv-layout \\\\\\n\",\n    \" 
       --pa-num-blocks 2048 \\\\\\n\",\n    \"        --pa-block-size 32 \\\\\\n\",\n    \"        --enable-prefix-caching \\\\\\n\",\n    \"        --compile-only \\\\\\n\",\n    \"        --prompt \\\"What is annapurna labs?\\\" 2>&1 | tee log.txt\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 2: Serve the model using vLLM with prefix caching enabled\\n\",\n    \"\\n\",\n    \"After compiling the model, you can serve the model using vLLM with prefix caching enabled.\\n\",\n    \"Save the contents of the below script to another\\n\",\n    \"shell script file, for example, `start_vllm_apc.sh` and then run it.\\n\",\n    \"\\n\",\n    \"Note that we use `--enable-prefix-caching` in vLLM to enable prefix caching, along\\n\",\n    \"with `--block-size 32` and `--num-gpu-blocks-override 2048` which are consistent\\n\",\n    \"with `--pa-block-size 32` and `--pa-num-blocks 2048` flags specified during model\\n\",\n    \"compilation.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile start_vllm.sh\\n\",\n    \"#!/bin/bash\\n\",\n    \"echo \\\"Running vLLM server in the background...\\\"\\n\",\n    \"rm -f ./vllm_server.log \\n\",\n    \"\\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0 \\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=2\\n\",\n    \"\\n\",\n    \"# These should be the same paths used when compiling the model.\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"export VLLM_NEURON_FRAMEWORK=\\\"neuronx-distributed-inference\\\"\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\\n\",\n    \"VLLM_RPC_TIMEOUT=100000 \\n\",\n    \"nohup vllm serve \\\\\\n\",\n    \"    --model $MODEL_PATH \\\\\\n\",\n    \"    --max-num-seqs 4 \\\\\\n\",\n    \"    --max-model-len 8192 \\\\\\n\",\n    \"    --tensor-parallel-size 64 \\\\\\n\",\n    \"    --num-gpu-blocks-override 2048 \\\\\\n\",\n    \"    --enable-prefix-caching \\\\\\n\",\n    \"    --block-size 32 \\\\\\n\",\n    \"    --additional-config '{\\\"override_neuron_config\\\": {\\\"is_block_kv_layout\\\": true, \\\"is_prefix_caching\\\": true}}' \\\\\\n\",\n    \"    --port 8000 > ./vllm_server.log 2>&1 &\\n\",\n    \"SERVER_PID=$!\\n\",\n    \"\\n\",\n    \"echo \\\"Server started in the background with the following id: $SERVER_PID. Waiting until server is ready to serve...\\\"\\n\",\n    \"\\n\",\n    \"until grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null || ! 
kill -0 $SERVER_PID 2>/dev/null; do sleep 0.5; done\\n\",\n    \"grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null && echo \\\"vLLM Server is ready!\\\" || (echo \\\"vLLM Server failed, check the ./vllm_server.log file\\\" && exit 1)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!chmod +x ./start_vllm.sh\\n\",\n    \"!./start_vllm.sh\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Wait for the server to be up and running before proceeding further.\\n\",\n    \"\\n\",\n    \"### Step 3: Analyze Request response from server\\n\",\n    \"\\n\",\n    \"Execute the same script file from scenario 1,\\n\",\n    \"to send identical request to the server with prefix caching enabled.\\n\",\n    \"The average time to respond to all the requests will be printed in the terminal.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"LONG_PROMPT=$(cat << 'EOL'\\n\",\n    \"You are a helpful assistant in recognizes the content of tables in markdown format. Here is a table as follows.\\n\",\n    \"# Table\\n\",\n    \"| ID  | Name          | Age | Occupation    | Country       | Email                  | Phone Number   | Address                       |\\n\",\n    \"|-----|---------------|-----|---------------|---------------|------------------------|----------------|------------------------------|\\n\",\n    \"| 1   | John Doe      | 29  | Engineer      | USA           | john.doe@example.com   | 555-1234       | 123 Elm St, Springfield, IL  |\\n\",\n    \"| 2   | Jane Smith    | 34  | Doctor        | Canada        | jane.smith@example.com | 555-5678       | 456 Oak St, Toronto, ON      |\\n\",\n    \"| 3   | Alice Johnson | 27  | Teacher       | UK            | alice.j@example.com    | 555-8765       | 789 Pine St, London, UK      |\\n\",\n    \"| 4   | Bob Brown     | 45  | Artist        | Australia     | bob.b@example.com      | 555-4321       | 321 Maple St, Sydney, NSW    |\\n\",\n    \"| 5   | Carol White   | 31  | Scientist     | New Zealand   | carol.w@example.com    | 555-6789       | 654 Birch St, Wellington, NZ |\\n\",\n    \"| 6   | Dave Green    | 28  | Lawyer        | Ireland       | dave.g@example.com     | 555-3456       | 987 Cedar St, Dublin, IE     |\\n\",\n    \"| 7   | Emma Black    | 40  | Musician      | USA           | emma.b@example.com     | 555-1111       | 246 Ash St, New York, NY     |\\n\",\n    \"| 8   | Frank Blue    | 37  | Chef          | Canada        | frank.b@example.com    | 555-2222       | 135 Spruce St, Vancouver, BC |\\n\",\n    \"| 9   | Grace Yellow  | 50  | Engineer      | UK            | grace.y@example.com    | 555-3333       | 864 Fir St, Manchester, UK   |\\n\",\n    \"| 10  | Henry Violet  | 32  | Artist        | Australia     | henry.v@example.com    | 555-4444       | 753 Willow St, Melbourne, VIC|\\n\",\n    \"| 11  | Irene Orange  | 26  | Scientist     | New Zealand   | irene.o@example.com    | 555-5555       | 912 Poplar St, Auckland, NZ  |\\n\",\n    \"| 12  | Jack Indigo   | 38  | Teacher       | Ireland       | jack.i@example.com     | 555-6666       | 159 Elm St, Cork, IE        
 |\\n\",\n    \"| 13  | Karen Red     | 41  | Lawyer        | USA           | karen.r@example.com    | 555-7777       | 357 Cedar St, Boston, MA     |\\n\",\n    \"| 14  | Leo Brown     | 30  | Chef          | Canada        | leo.b@example.com      | 555-8888       | 246 Oak St, Calgary, AB      |\\n\",\n    \"| 15  | Mia Green     | 33  | Musician      | UK            | mia.g@example.com      | 555-9999       | 975 Pine St, Edinburgh, UK   |\\n\",\n    \"| 16  | Noah Yellow   | 29  | Doctor        | Australia     | noah.y@example.com     | 555-0000       | 864 Birch St, Brisbane, QLD  |\\n\",\n    \"| 17  | Olivia Blue   | 35  | Engineer      | New Zealand   | olivia.b@example.com   | 555-1212       | 753 Maple St, Hamilton, NZ   |\\n\",\n    \"| 18  | Peter Black   | 42  | Artist        | Ireland       | peter.b@example.com    | 555-3434       | 912 Fir St, Limerick, IE     |\\n\",\n    \"| 19  | Quinn White   | 28  | Scientist     | USA           | quinn.w@example.com    | 555-5656       | 159 Willow St, Seattle, WA   |\\n\",\n    \"| 20  | Rachel Red    | 31  | Teacher       | Canada        | rachel.r@example.com   | 555-7878       | 357 Poplar St, Ottawa, ON    |\\n\",\n    \"| 21  | Steve Green   | 44  | Lawyer        | UK            | steve.g@example.com    | 555-9090       | 753 Elm St, Birmingham, UK   |\\n\",\n    \"| 22  | Tina Blue     | 36  | Musician      | Australia     | tina.b@example.com     | 555-1213       | 864 Cedar St, Perth, WA      |\\n\",\n    \"| 23  | Umar Black    | 39  | Chef          | New Zealand   | umar.b@example.com     | 555-3435       | 975 Spruce St, Christchurch, NZ|\\n\",\n    \"| 24  | Victor Yellow | 43  | Engineer      | Ireland       | victor.y@example.com   | 555-5657       | 246 Willow St, Galway, IE    |\\n\",\n    \"| 25  | Wendy Orange  | 27  | Artist        | USA           | wendy.o@example.com    | 555-7879       | 135 Elm St, Denver, CO       |\\n\",\n    \"| 26  | Xavier Green  | 34  | Scientist     | Canada        | xavier.g@example.com   | 555-9091       | 357 Oak St, Montreal, QC     |\\n\",\n    \"| 27  | Yara Red      | 41  | Teacher       | UK            | yara.r@example.com     | 555-1214       | 975 Pine St, Leeds, UK       |\\n\",\n    \"| 28  | Zack Blue     | 30  | Lawyer        | Australia     | zack.b@example.com     | 555-3436       | 135 Birch St, Adelaide, SA   |\\n\",\n    \"| 29  | Amy White     | 33  | Musician      | New Zealand   | amy.w@example.com      | 555-5658       | 159 Maple St, Wellington, NZ |\\n\",\n    \"| 30  | Ben Black     | 38  | Chef          | Ireland       | ben.b@example.com      | 555-7870       | 246 Fir St, Waterford, IE    |\\n\",\n    \"EOL\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"questions=(\\n\",\n    \"    \\\"Question: what is the age of John Doe? Your answer: The age of John Doe is \\\"\\n\",\n    \"    \\\"Question: what is the age of Zack Blue? Your answer: The age of Zack Blue is \\\"\\n\",\n    \"    \\\"Question: Which country is Ben Black from? Your answer: The country of Ben Black is \\\"\\n\",\n    \"    \\\"Question: Who has rachel.r@example.com as their email domain? Your answer: The email domain rachel.r@example.com belongs to \\\"\\n\",\n    \"    \\\"Question: What is the phone number for contacting Karen Red? Your answer: The phone number for contacting Karen Red is \\\"\\n\",\n    \"    \\\"Question: What is the occupation of Tina Blue? Your answer: The occupation of Tina Blue is \\\"\\n\",\n    \"    \\\"Question: What is the name of the person with id as 29? 
Your answer: The name of the person with id as 29 is \\\"\\n\",\n    \"    \\\"Question: What is the address of Alice Johnson? Your answer: The address of Alice Johnson is \\\"\\n\",\n    \"    \\\"Question: What is the id of Irene Orange? Your answer: The id of Irene Orange is \\\"\\n\",\n    \"    \\\"Question: What is the age of Leo Brown? Your answer: The age of Leo Brown is \\\"\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Function to make a single request\\n\",\n    \"make_request() {\\n\",\n    \"    local question=$1\\n\",\n    \"    local prompt_with_suffix=\\\"${LONG_PROMPT}\\n\",\n    \"\\n\",\n    \"Based on the table above, please answer this question:\\n\",\n    \"${question}\\\"\\n\",\n    \"    \\n\",\n    \"    local escaped_prompt=$(echo \\\"$prompt_with_suffix\\\" | jq -Rs .)\\n\",\n    \"    \\n\",\n    \"    # Make the curl request and capture both response and time\\n\",\n    \"    local response_file=$(mktemp)\\n\",\n    \"    time_output=$(TIMEFORMAT='%R'; { time curl -s http://localhost:8000/v1/chat/completions \\\\\\n\",\n    \"        -H \\\"Content-Type: application/json\\\" \\\\\\n\",\n    \"        -d \\\"{\\n\",\n    \"            \\\\\\\"model\\\\\\\": \\\\\\\"$MODEL_PATH\\\\\\\",\\n\",\n    \"            \\\\\\\"messages\\\\\\\": [\\n\",\n    \"                {\\n\",\n    \"                    \\\\\\\"role\\\\\\\": \\\\\\\"user\\\\\\\",\\n\",\n    \"                    \\\\\\\"content\\\\\\\": ${escaped_prompt}\\n\",\n    \"                }\\n\",\n    \"            ]\\n\",\n    \"        }\\\" > \\\"$response_file\\\"; } 2>&1)\\n\",\n    \"    \\n\",\n    \"    # Extract the response content\\n\",\n    \"    local response_content=$(cat \\\"$response_file\\\" | jq -r '.choices[0].message.content')\\n\",\n    \"    rm \\\"$response_file\\\"\\n\",\n    \"    \\n\",\n    \"    # Return both time and response\\n\",\n    \"    echo \\\"TIME:$time_output\\\"\\n\",\n    \"    echo \\\"RESPONSE:$response_content\\\"\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"# Make first request (warm-up) with a random question\\n\",\n    \"random_index=$((RANDOM % ${#questions[@]}))\\n\",\n    \"echo \\\"Warm-up request with question: ${questions[$random_index]}\\\"\\n\",\n    \"IFS=$'\\\\n' read -r -d '' time_str response_str < <(make_request \\\"${questions[$random_index]}\\\" && echo '')\\n\",\n    \"echo \\\"Response: $response_str\\\"\\n\",\n    \"echo \\\"Time taken: ${time_str#TIME:} seconds\\\"\\n\",\n    \"echo \\\"Warm-up complete\\\"\\n\",\n    \"echo \\\"-------------------\\\"\\n\",\n    \"\\n\",\n    \"# Make 10 timed requests with random questions\\n\",\n    \"total_time=0\\n\",\n    \"for i in {0..9}; do\\n\",\n    \"    random_index=$i\\n\",\n    \"    #random_index=$((RANDOM % ${#questions[@]}))\\n\",\n    \"    question=\\\"${questions[$random_index]}\\\"\\n\",\n    \"    echo \\\"Request $i with question: $question\\\"\\n\",\n    \"    \\n\",\n    \"    IFS=$'\\\\n' read -r -d '' time_str response_str < <(make_request \\\"$question\\\" && echo '')\\n\",\n    \"    time_taken=${time_str#TIME:}\\n\",\n    \"    response=${response_str#RESPONSE:}\\n\",\n    \"    \\n\",\n    \"    total_time=$(echo \\\"$total_time + $time_taken\\\" | bc -l)\\n\",\n    \"    echo \\\"Response: $response\\\"\\n\",\n    \"    echo \\\"Time taken: ${time_taken} seconds\\\"\\n\",\n    \"    echo \\\"-------------------\\\"\\n\",\n    \"done\\n\",\n    \"\\n\",\n    \"# Calculate and display average time\\n\",\n    \"average_time=$(echo \\\"scale=3; $total_time / 10\\\" | 
bc -l)\\n\",\n    \"echo \\\"Average time across 10 requests: ${average_time} seconds\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```python\\n\",\n    \"Average time across 10 requests: .388 seconds\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"As seen from the two scenarios, the average time with prefix caching enabled is lower than the time it takes to serve the same requests with prefix caching disabled. This is because the time to compute the first token is reduced by reusing the common prefix across all the prompts.\\n\",\n    \"\\n\",\n    \"We also ran the same model configurations with public datasets with varying cache hit rates to benchmark prefix caching on Neuron, and here are the results that we achieved:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"| Dataset | TTFT (P50 in ms) without prefix caching | TTFT (P50 in ms) with prefix caching | Improvement |\\n\",\n    \"|---------|----------------------------------------|-------------------------------------|-------------|\\n\",\n    \"| math.math (>90% cache hit) | 342.81 | 107.8 | 3.18x |\\n\",\n    \"| dynamic sonnet 1k (~25% cache hit) | 123.08 | 102.15 | 1.2x |\\n\",\n    \"| dynamic sonnet 2k (~25% cache hit) | 592.8 | 377.2 | 1.57x |\\n\",\n    \"| HumanEval (No cache hit) | 89.7 | 91.8 | 0.98x |\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"In general, with a higher ratio of prefix (shared prompt) tokens to prefill tokens, which results in a higher cache-hit rate, \\n\",\n    \"prefix caching achieves a TTFT speedup of up to 3x compared to when prefix caching is disabled. When the dataset has\\n\",\n    \"low prefix cache hit rate, prefix caching TTFT performance can degrade slightly due to the overhead of supporting\\n\",\n    \"block KV cache layout, as seen in the HumanEval dataset.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"neuron-224\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tutorial: Scaling LLM Inference with Data Parallelism on Trn2\\n\",\n    \"\\n\",\n    \"This tutorial demonstrates how to implement data parallelism (DP) for LLM inference with multiple model copies on AWS Neuron. We'll walk through the steps to deploy multiple Llama 3.3 70B model endpoints on a single ```trn2.48xlarge``` instance using NxD Inference and vLLM, and run data parallel inference.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. contents:: Table of contents\\n\",\n    \"    :local:\\n\",\n    \"    :depth: 2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Data Parallel Inference\\n\",\n    \"\\n\",\n    \"We can achieve Data Parallelism by using multiple copies of the same model hosted on the instance to process multiple requests simultaneously. Using NxD Inference and vLLM, you can deploy multiple model endpoints by adjusting the tensor parallel degree (Tensor Parallelism (TP) refers to sharding model weight matrices onto multiple NeuronCores within each model copy) and allocating appropriate NeuronCore ranges for each model endpoint. While increasing the batch size with a single copy of the model increases throughput, introducing data parallelism with multiple model endpoints combined with tensor parallelism allows further increase in instance throughput with some impact to latency. Use this technique when you can relax the latency constraint of your application to further maximize the throughput of the instance.\\n\",\n    \"\\n\",\n    \"In this tutorial we use Llama 3.3 70B with DP=2 and TP=32. However, you can follow the same sequence of steps to deploy additional model copies by appropriately changing the tensor parallel degree. You can also use this guide to deploy multiple copies of any other models on Trn1 or Inf2 instances as long as the model fits and the DP x TP degree does not exceed the number of model cores.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\\n\",\n    \"\\n\",\n    \"### Setup and Connect to an Amazon EC2 Trn2 Instance\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\"\n   },\n   \"source\": [\n    \"An Amazon EC2 ``trn2.48xlarge`` instance with AWS Neuron SDK version 2.23.0 or later (:ref:`latest-neuron-release`) is required. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To launch a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK and NxD Inference dependencies, see [NxD Inference Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/neuronx-distributed/setup-guide-nxd-inference.html).\\n\",\n    \"\\n\",\n    \"To use Jupyter Notebook on the Neuron instance, you can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"Make sure to activate the Neuron virtual environment\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To verify that NxD Inference has installed successfully, check that you can run the inference_demo console script.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \"```python\\n\",\n    \"inference_demo --help\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Download Model Weights\\n\",\n    \"\\n\",\n    \"To use this tutorial, you must first download a Llama 3.3 70B Instruct model checkpoint from Hugging Face to a local path on the Trn2 instance. For more information, see [Downloading Models](https://huggingface.co/docs/transformers/main/en/installation#offline-mode) in the Hugging Face documentation. You can download and use [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) for this tutorial.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Install the vLLM Neuron plugin\\n\\n\",\n    \"NxD Inference supports running models with vLLM via the upstream `vllm-neuron` plugin. \",\n    \"Install the latest release branch by following the steps in the \",\n    \"[vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html).\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Install LLMPerf\\n\",\n    \"\\n\",\n    \"In this tutorial, you will use [LLMPerf](https://github.com/ray-project/llmperf) to measure the performance.\\n\",\n    \"Follow [LLM Inference Benchmarking guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/neuronx-distributed/programming-guide/nxd-inference/nxdi-llm-inference-benchmarking.html) to install LLMPerf from source and apply patches incl. support to benchmark data parallel setups.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Step-by-Step Tutorial Instructions\\n\",\n    \"\\n\",\n    \"### Step 1: Compile the model\\n\",\n    \"\\n\",\n    \"Before we launch the model endpoint with vLLM, we'll use the NxD Inference library to compile the model with an appropriate configuration. 
Refer to [NxD Inference Features Configuration Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/neuronx-distributed/programming-guide/nxd-inference/nxdi-features-configuration.html) for more information. To compile a model for data parallel inference, set ```NUM_CORES```, ```TP_DEGREE```, and ```BATCH_SIZE``` so that the workload is distributed across all NeuronCores on the instance. For DP=2 with BATCH_SIZE>=1, TP_DEGREE should be set to 64/2=32 to maximize NeuronCore utilization across all model copies. Simply create and run a shell script as illustrated below:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"# Replace with path to your downloaded Hugging Face model checkpoints\\n\",\n    \"MODEL_PATH=\\\"/home/ubuntu/model_hf/Llama-3.3-70B-Instruct/\\\"\\n\",\n    \"\\n\",\n    \"# This is where the compiled model will be saved. The same path should be used when launching vLLM server for inference.\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct-DP/\\\"\\n\",\n    \"\\n\",\n    \"NUM_CORES=128\\n\",\n    \"TP_DEGREE=32\\n\",\n    \"LNC=2\\n\",\n    \"BATCH_SIZE=4\\n\",\n    \"\\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\\n\",\n    \"export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\\n\",\n    \"export NEURON_RT_EXEC_TIMEOUT=600\\n\",\n    \"export XLA_DENSE_GATHER_FACTOR=0\\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0\\n\",\n    \"\\n\",\n    \"inference_demo \\\\\\n\",\n    \"    --model-type llama \\\\\\n\",\n    \"    --task-type causal-lm \\\\\\n\",\n    \"        run \\\\\\n\",\n    \"        --model-path $MODEL_PATH \\\\\\n\",\n    \"        --compiled-model-path $COMPILED_MODEL_PATH \\\\\\n\",\n    \"        --torch-dtype bfloat16 \\\\\\n\",\n    \"        --start_rank_id 0 \\\\\\n\",\n    \"        --local_ranks_size $TP_DEGREE \\\\\\n\",\n    \"        --tp-degree $TP_DEGREE \\\\\\n\",\n    \"        --batch-size $BATCH_SIZE \\\\\\n\",\n    \"        --max-context-length 8192 \\\\\\n\",\n    \"        --seq-len 8192 \\\\\\n\",\n    \"        --on-device-sampling \\\\\\n\",\n    \"        --top-k 1 \\\\\\n\",\n    \"        --do-sample \\\\\\n\",\n    \"        --fused-qkv \\\\\\n\",\n    \"        --sequence-parallel-enabled \\\\\\n\",\n    \"        --qkv-kernel-enabled \\\\\\n\",\n    \"        --attn-kernel-enabled \\\\\\n\",\n    \"        --mlp-kernel-enabled \\\\\\n\",\n    \"        --cc-pipeline-tiling-factor 1 \\\\\\n\",\n    \"        --pad-token-id 2 \\\\\\n\",\n    \"        --enable-bucketing \\\\\\n\",\n    \"        --context-encoding-buckets 2048 4096 8192 \\\\\\n\",\n    \"        --token-generation-buckets 2048 4096 8192 \\\\\\n\",\n    \"        --compile-only \\\\\\n\",\n    \"        --prompt \\\"What is annapurna labs?\\\" 2>&1 | tee log2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"It's important to specify the path to which the compiled model is saved, as this same path must be used when you later launch the vLLM server for inference, allowing you to use the pre-compiled model without having to compile it again.\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"metadata\": {\n    \"raw_mimetype\": \"text/restructuredtext\",\n    \"vscode\": {\n     \"languageId\": \"raw\"\n    }\n   },\n   \"source\": [\n    \".. note::\\n\",\n    \"    \\n\",\n    \"    To run this script on trn1, set LNC=1. 
For more information about LNC, see :ref:`logical-neuroncore-config`.\\n\",\n    \"    Also change NUM_CORES and TP_DEGREE appropriately (e.g., TP_DEGREE=16 for DP=2).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"For detailed information about the inference_demo flags, you can consult the [NxD Inference API Reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/neuronx-distributed/api-reference-guide/nxd-inference/index.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 2: Launch model endpoints\\n\",\n    \"Create a deployment script (```start_vllm.sh```) containing the code snippet below, which configures and launches a model endpoint. The script is parameterized so that you can pass a specific port number, range of NeuronCores, tensor parallel degree, and batch size.\\n\",\n    \"\\n\",\n    \"#### Key Parameters Explained:\\n\",\n    \"\\n\",\n    \"- ```MODEL_PATH```: The Hugging Face model identifier or the local path containing the Llama-3.3-70B-Instruct Hugging Face checkpoints, e.g. /home/ubuntu/model_hf/Llama-3.3-70B-Instruct/\\n\",\n    \"\\n\",\n    \"- ```port```: Network port for the endpoint, e.g. 8000. The port number should be unique for each model endpoint.\\n\",\n    \"\\n\",\n    \"- ```cores```: Range of NeuronCores allocated to this endpoint. This should be a non-overlapping range of cores when deploying multiple model endpoints on the same instance. For example, when allocating 32 NeuronCores to a model endpoint, specify 0-31 or 32-63.\\n\",\n    \"\\n\",\n    \"- ```tp_degree```: Degree of tensor parallelism for model sharding. To maximize NeuronCore utilization, reduce tp_degree while increasing the data parallel degree.\\n\",\n    \"\\n\",\n    \"- ```bs```: Batch size for the model endpoint.\\n\",\n    \"\\n\",\n    \"These parameters should match the values used during the compilation step above.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%writefile start_vllm.sh\\n\",\n    \"#!/bin/bash\\n\",\n    \"\\n\",\n    \"echo \\\"Running vLLM server in the background...\\\"\\n\",\n    \"\\n\",\n    \"# Default values for arguments\\n\",\n    \"DEFAULT_PORT=8000\\n\",\n    \"DEFAULT_CORES=\\\"0-31\\\"\\n\",\n    \"DEFAULT_TP_DEGREE=32\\n\",\n    \"DEFAULT_BS=4\\n\",\n    \"\\n\",\n    \"# Help function\\n\",\n    \"show_help() {\\n\",\n    \"    echo \\\"Usage: $0 [options]\\\"\\n\",\n    \"    echo \\\"Options:\\\"\\n\",\n    \"    echo \\\"  -p port        Port number for vLLM endpoint (default: $DEFAULT_PORT)\\\"\\n\",\n    \"    echo \\\"  -c cores       Range of neuron cores (default: $DEFAULT_CORES)\\\"\\n\",\n    \"    echo \\\"  -t tp_degree   Tensor parallel degree (default: $DEFAULT_TP_DEGREE)\\\"\\n\",\n    \"    echo \\\"  -b bs          Batch size (default: $DEFAULT_BS)\\\"\\n\",\n    \"    echo \\\"  -h             Show this help message\\\"\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"# Parse single-letter arguments\\n\",\n    \"while getopts \\\"p:c:t:b:h\\\" opt; do\\n\",\n    \"    case $opt in\\n\",\n    \"        p) port=\\\"$OPTARG\\\" ;;\\n\",\n    \"        c) cores=\\\"$OPTARG\\\" ;;\\n\",\n    \"        t) tp_degree=\\\"$OPTARG\\\" ;;\\n\",\n    \"        b) bs=\\\"$OPTARG\\\" ;;\\n\",\n    \"        h) show_help; exit 0 ;;\\n\",\n    \"        ?) 
show_help; exit 1 ;;\\n\",\n    \"    esac\\n\",\n    \"done\\n\",\n    \"\\n\",\n    \"# Set defaults if not provided\\n\",\n    \"port=${port:-$DEFAULT_PORT}\\n\",\n    \"cores=${cores:-$DEFAULT_CORES}\\n\",\n    \"tp_degree=${tp_degree:-$DEFAULT_TP_DEGREE}\\n\",\n    \"bs=${bs:-$DEFAULT_BS}\\n\",\n    \"\\n\",\n    \"# Environment configurations\\n\",\n    \"export NEURON_RT_INSPECT_ENABLE=0\\n\",\n    \"export NEURON_RT_VIRTUAL_CORE_SIZE=2\\n\",\n    \"\\n\",\n    \"# These should be the same paths used when compiling the model.\\n\",\n    \"MODEL_PATH=\\\"/shared/models/llama-3.3-70b-instruct/\\\"\\n\",\n    \"COMPILED_MODEL_PATH=\\\"/shared/traced-models/dp_tutorial/\\\"\\n\",\n    \"\\n\",\n    \"export VLLM_NEURON_FRAMEWORK=\\\"neuronx-distributed-inference\\\"\\n\",\n    \"export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\\n\",\n    \"export NEURON_RT_VISIBLE_CORES=${cores}\\n\",\n    \"\\n\",\n    \"VLLM_RPC_TIMEOUT=100000\\n\",\n    \"nohup vllm serve \\\\\\n\",\n    \"    --model \\\"$MODEL_PATH\\\" \\\\\\n\",\n    \"    --max-num-seqs \\\"${bs}\\\" \\\\\\n\",\n    \"    --max-model-len 8192 \\\\\\n\",\n    \"    --no-enable-prefix-caching \\\\\\n\",\n    \"    --tensor-parallel-size \\\"${tp_degree}\\\" \\\\\\n\",\n    \"    --additional-config '{\\n\",\n    \"        \\\"override_neuron_config\\\": {\\n\",\n    \"            \\\"on_device_sampling_config\\\": {\\n\",\n    \"                \\\"do_sample\\\": true,\\n\",\n    \"                \\\"global_topk\\\": 64\\n\",\n    \"            }\\n\",\n    \"        }\\n\",\n    \"    }' \\\\\\n\",\n    \"    --port \\\"${port}\\\" > ./vllm_server.log 2>&1 &\\n\",\n    \"SERVER_PID=$!\\n\",\n    \"\\n\",\n    \"echo \\\"Server started in the background with the following id: $SERVER_PID. Waiting until server is ready to serve...\\\"\\n\",\n    \"\\n\",\n    \"until grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null || ! kill -0 $SERVER_PID 2>/dev/null; do sleep 0.5; done\\n\",\n    \"grep -q \\\"Server is ready to serve\\\" ./vllm_server.log 2>/dev/null && echo \\\"vLLM Server is ready!\\\" || (echo \\\"vLLM Server failed, check the ./vllm_server.log file\\\" && exit 1)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!chmod +x ./start_vllm.sh\\n\",\n    \"!./start_vllm.sh -p 8000 -c 0-31\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!./start_vllm.sh -p 8001 -c 32-63\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run this script to launch 2 vLLM servers. You can run these commands as background processes in the same terminal or run two separate terminals for each command. We launch two servers, each with a tensor parallel degree of 32 and batch size of 4. Note that the first vLLM server uses neuron cores 0-31 and the second one 32-63. You can pick any ports that are available.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The server start up time can take a few minutes since the model weights are getting loaded. Once the vLLM servers have been launched, you should see the following log output. 
This implies that the model server has been deployed.\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"INFO:     Started server process [221607]\\n\",\n    \"INFO:     Waiting for application startup.\\n\",\n    \"INFO:     Application startup complete.\\n\",\n    \"INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Step 3: Benchmark the deployed model endpoints\\n\",\n    \"After the above steps, the vLLM server should be running. You can now measure the performance using LLMPerf. Ensure you have made the required changes to use LLMPerf with DP>1 by following [Install LLMPerf](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.html#install-llmperf)\\n\",\n    \"\\n\",\n    \"Below is a sample shell script to run LLMPerf. The script allows the user to specify tensor parallelism degree, data parallelism degree, and batch size through command-line arguments, with default values provided. It calculates the concurrency based on batch size and data parallelism, sets up the environment for benchmarking with input tokens N(7936, 30) and output tokens N(256,30), and then runs LlmPerf\\u2019s ```token_benchmark_ray.py``` with various parameters to measure the model endpoints\\u2019 performance. The benchmark simulates requests with specific input and output token distributions, and collects results for analysis.\\n\",\n    \"\\n\",\n    \"More information about several arguments used in the script can be found in the [llmperf open source code](https://github.com/ray-project/llmperf/blob/main/token_benchmark_ray.py)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"\\n\",\n    \"# Default values for arguments\\n\",\n    \"DEFAULT_TP_DEGREE=32\\n\",\n    \"DEFAULT_DP_DEGREE=2\\n\",\n    \"DEFAULT_BS=1\\n\",\n    \"\\n\",\n    \"# Help function\\n\",\n    \"show_help() {\\n\",\n    \"    echo \\\"Usage: $0 [options]\\\"\\n\",\n    \"    echo \\\"Options:\\\"\\n\",\n    \"    echo \\\"  -t tp_degree          Tensor parallel degree (default: $DEFAULT_TP_DEGREE)\\\"\\n\",\n    \"    echo \\\"  -d dp_degree          Data parallel degree (default: $DEFAULT_DP_DEGREE)\\\"\\n\",\n    \"    echo \\\"  -b bs          Batch size (default: $DEFAULT_BS)\\\"\\n\",\n    \"    echo \\\"  -h             Show this help message\\\"\\n\",\n    \"}\\n\",\n    \"\\n\",\n    \"# Parse single-letter arguments\\n\",\n    \"while getopts \\\"t:d:b:h\\\" opt; do\\n\",\n    \"    case $opt in\\n\",\n    \"        t) tp_degree=\\\"$OPTARG\\\" ;;\\n\",\n    \"        d) dp_degree=\\\"$OPTARG\\\" ;;\\n\",\n    \"        b) bs=\\\"$OPTARG\\\" ;;\\n\",\n    \"        h) show_help; exit 0 ;;\\n\",\n    \"        ?) 
show_help; exit 1 ;;\\n\",\n    \"    esac\\n\",\n    \"done\\n\",\n    \"\\n\",\n    \"# Set defaults if not provided\\n\",\n    \"tp_degree=${tp_degree:-$DEFAULT_TP_DEGREE}\\n\",\n    \"dp_degree=${dp_degree:-$DEFAULT_DP_DEGREE}\\n\",\n    \"bs=${bs:-$DEFAULT_BS}\\n\",\n    \"\\n\",\n    \"# Calculate total concurrent requests (batch_size * data_parallelism)\\n\",\n    \"# If result is less than 1, default to batch_size\\n\",\n    \"concurrency=$(awk -v batch=\\\"$bs\\\" -v dp_degree=\\\"$dp_degree\\\" 'BEGIN {\\n\",\n    \"    concurrency = int(batch * dp_degree)\\n\",\n    \"    print (concurrency >= 1 ? concurrency : batch)\\n\",\n    \"}')\\n\",\n    \"echo \\\"concurrency: $concurrency\\\"\\n\",\n    \"\\n\",\n    \"MODEL_PATH=\\\"/shared/ashdeok/llama33-70B/Llama-3.3-70B-Instruct\\\"\\n\",\n    \"export COMPILED_MODEL_PATH=\\\"/shared/ashdeok/llama33-70B/traced_model/Llama-3.3-70B-Instruct-DP/\\\"\\n\",\n    \"\\n\",\n    \"# Modify OpenAI's API key and API base to use vLLM's API server.\\n\",\n    \"export OPENAI_API_KEY=EMPTY\\n\",\n    \"\\n\",\n    \"#if you have more vLLM servers, append the required number of ports like so:\\n\",\n    \"#;http://localhost:8001/v1;http://localhost:8002/v1\\\"\\n\",\n    \"export OPENAI_API_BASE=\\\"http://0.0.0.0:8000/v1;http://0.0.0.0:8001/v1\\\"\\n\",\n    \"\\n\",\n    \"python /shared/ashdeok/PR_tutorials/llmperf/token_benchmark_ray.py \\\\\\n\",\n    \"--model ${MODEL_PATH} \\\\\\n\",\n    \"--mean-input-tokens 7936 \\\\\\n\",\n    \"--stddev-input-tokens 30 \\\\\\n\",\n    \"--mean-output-tokens 256 \\\\\\n\",\n    \"--stddev-output-tokens 30 \\\\\\n\",\n    \"--num-concurrent-requests ${concurrency} \\\\\\n\",\n    \"--results-dir \\\"/shared/ashdeok/results-DP/\\\" \\\\\\n\",\n    \"--timeout 21600 \\\\\\n\",\n    \"--max-num-completed-requests 1000 \\\\\\n\",\n    \"--additional-sampling-params '{\\\"temperature\\\": 0.7, \\\"top_k\\\": 50}' \\\\\\n\",\n    \"--llm-api \\\"openai\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Once the script starts executing, you will see output like:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"INFO worker.py:1852 -- Started a local Ray instance.\\n\",\n    \"  4%|\\u258d         | 39/1000 [01:29<30:14,  1.89s/it]\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Once benchmarking is complete, results can be found in the directory specified with the `--results-dir` flag in the ```benchmark_model.sh``` script.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Conclusion\\n\",\n    \"\\n\",\n    \"This tutorial demonstrates how data parallelism using multiple model copies can help increase the throughput. While standard batching (DP=1, BS>1) processes multiple requests through a single model copy, data parallelism deploys multiple independent model copies that can process different requests simultaneously. Our experiments with batch sizes 1 & 4 show that as we decrease Tensor Parallelism (TP) from 64 to 16 and increase Data Parallelism (DP) from 1 to 4, we see up to 2x throughput improvement with non optimized configurations. However, this comes with an increase in Time To First Token (TTFT) latency. 
This illustrates a key consideration: while DP can improve overall system throughput by processing more concurrent requests, it can lead to higher latency\\n\",\n    \"\\n\",\n    \"When to choose Data parallel with multiple model copies over using single model copy in an instance:\\n\",\n    \"\\n\",\n    \"- Use DP when your workload is collective-bound rather than memory or compute-bound. At high batch sizes, TP64 / TP128 collectives can become slow due to the number of hops and increasing throughput requirements. At high enough batch size, it can be better to pay the cost of duplicated weight loads and use DP with multiple model copies in order to reduce collective latencies.\\n\",\n    \"\\n\",\n    \"- Consider DP when you need to handle many concurrent requests and can tolerate moderate latency increases.\\n\",\n    \"\\n\",\n    \"Implementation requires careful consideration of your total memory budget, as each additional model copy increases memory consumption. You'll need to balance the number of model copies against the resources allocated to each model copy based on your specific throughput and latency requirements. By understanding these trade-offs and following the implementation guidelines in this tutorial, users can select the most appropriate approach for their specific use case and optimize their inference setup accordingly.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"neuron-224\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.10.12\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.3-70b-fp8.rst",
    "content": ".. _nxdi-vllm-llama-fp8-inference-tutorial:\n\nTutorial: Deploy fp8 quantized Llama3.3-70B on Trn2 instances\n============================================================================================\n\nQuantization can significantly reduce the model size and inference time. This tutorial provides a step-by-step guide to deploy a fp8 quantized Llama3.3-70B on Trainium2 instances. We utilize the custom quantization feature to quantize specific layers from the original model checkpoint. \n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nEnvironment setup\n-----------------\nThis tutorial requires that you have a Trn2 instance created from a Deep Learning AMI that has the Neuron SDK pre-installed.\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\nConnect to the EC2 instance via your preferred option: EC2 Instance Connect, Session Manager, or SSH client.\nFor more information, see `Connect to your Linux instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-methods.html>`_ in the Amazon EC2 User Guide.\n\n\nFor this tutorial, we use a pre-installed virtual environment in the DLAMI at ``/opt/aws_neuronx_venv_pytorch_inference_vllm``. If you prefer to use a container, start a built-in vLLM Neuron Deep Learning Container (DLC). For more information about available containers,\nsee the `AWS Neuron Deep Learning Containers repository <https://github.com/aws-neuron/deep-learning-containers#vllm-inference-neuronx>`_.\n\n\n\nStep 1: Quantize the Llama3.3 70B b16 checkpoint to fp8 \n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nWe first quantize the `original Llama3.3 70B model <https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct>`_ checkpoint using `modules from Neuronx Distributed <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/custom-quantization.html#quantize-using-nxd>`_.\nIn the below script, ``modules_to_not_convert`` contains the layers that are not being quantized to fp8. In this instance, we quantize only the mlp layers except the first and the last layer. If you have a similar FP8 checkpoint, you can skip this step and use that.\nUse the below code snippet to create a script for quantization and execute the script. 
This will create a fp8 checkpoint in the `output_path`.\n::\n\n    import json\n    import torch\n    from typing import Optional, List\n    from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, AutoConfig\n    from neuronx_distributed_inference.modules.checkpoint import prune_state_dict,save_state_dict_safetensors\n    from neuronx_distributed.quantization.quantization_utils import quantize_pytorch_model_per_channel_symmetric, convert_qint8_to_int8_state_dict\n\n    model_path = \"<path to the bf16 checkpoint>\"\n    output_path = \"<path to save the quantized checkpoint>\"\n\n    modules_to_not_convert = [\n        \"lm_head\",\n        \"layers.0.mlp\",\n        \"layers.79.mlp\",\n        \"layers.0.self_attn\",\n        \"layers.1.self_attn\",\n        \"layers.2.self_attn\",\n        \"layers.3.self_attn\",\n        \"layers.4.self_attn\",\n        \"layers.5.self_attn\",\n        \"layers.6.self_attn\",\n        \"layers.7.self_attn\",\n        \"layers.8.self_attn\",\n        \"layers.9.self_attn\",\n        \"layers.10.self_attn\",\n        \"layers.11.self_attn\",\n        \"layers.12.self_attn\",\n        \"layers.13.self_attn\",\n        \"layers.14.self_attn\",\n        \"layers.15.self_attn\",\n        \"layers.16.self_attn\",\n        \"layers.17.self_attn\",\n        \"layers.18.self_attn\",\n        \"layers.19.self_attn\",\n        \"layers.20.self_attn\",\n        \"layers.21.self_attn\",\n        \"layers.22.self_attn\",\n        \"layers.23.self_attn\",\n        \"layers.24.self_attn\",\n        \"layers.25.self_attn\",\n        \"layers.26.self_attn\",\n        \"layers.27.self_attn\",\n        \"layers.28.self_attn\",\n        \"layers.29.self_attn\",\n        \"layers.30.self_attn\",\n        \"layers.31.self_attn\",\n        \"layers.32.self_attn\",\n        \"layers.33.self_attn\",\n        \"layers.34.self_attn\",\n        \"layers.35.self_attn\",\n        \"layers.36.self_attn\",\n        \"layers.37.self_attn\",\n        \"layers.38.self_attn\",\n        \"layers.39.self_attn\",\n        \"layers.40.self_attn\",\n        \"layers.41.self_attn\",\n        \"layers.42.self_attn\",\n        \"layers.43.self_attn\",\n        \"layers.44.self_attn\",\n        \"layers.45.self_attn\",\n        \"layers.46.self_attn\",\n        \"layers.47.self_attn\",\n        \"layers.48.self_attn\",\n        \"layers.49.self_attn\",\n        \"layers.50.self_attn\",\n        \"layers.51.self_attn\",\n        \"layers.52.self_attn\",\n        \"layers.53.self_attn\",\n        \"layers.54.self_attn\",\n        \"layers.55.self_attn\",\n        \"layers.56.self_attn\",\n        \"layers.57.self_attn\",\n        \"layers.58.self_attn\",\n        \"layers.59.self_attn\",\n        \"layers.60.self_attn\",\n        \"layers.61.self_attn\",\n        \"layers.62.self_attn\",\n        \"layers.63.self_attn\",\n        \"layers.64.self_attn\",\n        \"layers.65.self_attn\",\n        \"layers.66.self_attn\",\n        \"layers.67.self_attn\",\n        \"layers.68.self_attn\",\n        \"layers.69.self_attn\",\n        \"layers.70.self_attn\",\n        \"layers.71.self_attn\",\n        \"layers.72.self_attn\",\n        \"layers.73.self_attn\",\n        \"layers.74.self_attn\",\n        \"layers.75.self_attn\",\n        \"layers.76.self_attn\",\n        \"layers.77.self_attn\",\n        \"layers.78.self_attn\",\n        \"layers.79.self_attn\"\n    ]\n\n    def quantize(model: torch.nn.Module, dtype=torch.qint8, modules_to_not_convert: Optional[List[str]] = None) -> 
torch.nn.Module:\n        quant_model = quantize_pytorch_model_per_channel_symmetric(model,dtype=dtype, modules_to_not_convert=modules_to_not_convert)\n        model_quant_sd = quant_model.state_dict()\n        convert_qint8_to_int8_state_dict(model_quant_sd)\n        quantized_state_dict = prune_state_dict(model_quant_sd)\n        return quantized_state_dict\n\n    model = AutoModelForCausalLM.from_pretrained(model_path)\n    tokenizer = AutoTokenizer.from_pretrained(model_path)\n    generation_config = GenerationConfig.from_pretrained(model_path)\n    config = AutoConfig.from_pretrained(model_path)\n\n    state_dict = quantize(model,torch.float8_e4m3fn,modules_to_not_convert)\n\n    save_state_dict_safetensors(state_dict=state_dict,state_dict_dir=output_path)\n\n    #save tokenizer, config in new checkpoint folder\n    tokenizer.save_pretrained(output_path)\n    config.save_pretrained(output_path)\n    generation_config.save_pretrained(output_path)\n\n    modules_to_not_convert_json = {\n        \"model\": {\n            \"modules_to_not_convert\": modules_to_not_convert\n        }\n    }\n\n    with open(f\"{output_path}/modules_to_not_convert.json\", \"w\") as f:\n        json.dump(modules_to_not_convert_json, f, indent=2)\n\n\n\n\nStep 2: Compile the model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn this step, we use the quantized fp8 checkpoint to compile the model using a utility from `neuronx-distributed-inference <https://github.com/aws-neuron/neuronx-distributed-inference>`_.\nNote that we are using multiple optimization features like tensor parallelism, sequence parallelism and optimized kernels for attention, mlp and QKV computation.\nYou can modify some of the below parameters based on your use case:\n\n* ``tp-degree``: set this to the number of neuron cores for partitioning the model. Typically ``local_ranks_size`` needs to be set to the same value.\n* ``batch-size``: set this to the desired number of requests to process simultaneously. Along with this, ``tkg-batch-size`` and ``max-batch-size`` should be set to the same value.\n* ``seq-len``: set this to the maximum sequence length during inference. i.e. 
sum of input and output sequence lengths.\n\n::\n\n    export NEURON_RT_INSPECT_ENABLE=0\n    export NEURON_RT_EXEC_TIMEOUT=600\n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n    export NEURON_RT_NUM_CORES=64\n    export XLA_DENSE_GATHER_FACTOR=0\n    export XLA_IR_DEBUG=1\n    export XLA_HLO_DEBUG=1\n    export XLA_HANDLE_SPECIAL_SCALAR=1\n    export UNSAFE_FP8FNCAST=1\n    export DISABLE_NUMERIC_CC_TOKEN=1\n    MODEL_PATH=\"<path to the fp8 model checkpoint>\"\n    COMPILED_MODEL_PATH=\"<folder to save compiled artifacts>\"\n    export BASE_COMPILE_WORK_DIR=\"<folder to save compiled artifacts>\"\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n        run \\\n        --model-path $MODEL_PATH \\\n        --compiled-model-path $COMPILED_MODEL_PATH \\\n        --torch-dtype bfloat16 \\\n        --batch-size 4 \\\n        --enable-bucketing \\\n        --local_ranks_size 64 \\\n        --tp-degree 64 \\\n        --start_rank_id 0 \\\n        --pad-token-id 0 \\\n        --cc-pipeline-tiling-factor 1 \\\n        --on-device-sampling \\\n        --global-topk 256 \\\n        --dynamic \\\n        --top-k 50 \\\n        --top-p 0.9 \\\n        --temperature 0.7 \\\n        --do-sample \\\n        --sequence-parallel-enabled \\\n        --fused-qkv \\\n        --qkv-kernel-enabled \\\n        --attn-kernel-enabled \\\n        --mlp-kernel-enabled \\\n        --logical-neuron-cores 2 \\\n        --prompt \"What is annapurna labs?\" \\\n        --ctx-batch-size 1 \\\n        --tkg-batch-size 4 \\\n        --max-batch-size 4 \\\n        --is-continuous-batching \\\n        --compile-only \\\n        --quantized-mlp-kernel-enabled \\\n        --quantization-type per_channel_symmetric \\\n        --rmsnorm-quantize-kernel-enabled \\\n        --modules-to-not-convert-file $MODEL_PATH/modules_to_not_convert.json \\\n        --async-mode \\\n        --attn-block-tkg-nki-kernel-enabled \\\n        --attn-block-tkg-nki-kernel-cache-update \\\n        --k-cache-transposed \\\n        --save-sharded-checkpoint \\\n        --max-context-length 4096 \\\n        --seq-len 5120 \\\n        --context-encoding-buckets  2048 4096 \\\n        --token-generation-buckets  5120   2>&1 | tee compile.log\n\n.. note::\n\n    There is a known limitation for compiling the fp8 model directly through vLLM. This will be fixed in a future release.\n\n\nStep 3: Serve the model using vLLM\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nIn this step, we use the pre-compiled model from the previous step and serve it using vLLM.\n\n* ``tensor-parallel-size``: set this to the ``tp-degree`` used during compilation.\n* ``max-num-seqs``: set this to the ``batch-size`` used during compilation.\n* ``max-model-len``: set this to ``seq-len`` from the above step.\n\nNote that we set an environment variable (``NEURON_COMPILED_ARTIFACTS``) to the path that has the compiled model from the previous step. 
The vLLM command skips compilation and loads the model using the pre-compiled artifacts.\n::\n\n    export NEURON_RT_INSPECT_ENABLE=0\n    export NEURON_RT_EXEC_TIMEOUT=600\n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n    export NEURON_RT_NUM_CORES=64\n    export NEURON_RT_VISIBLE_CORES='0-63'\n    export XLA_DENSE_GATHER_FACTOR=0\n    export XLA_IR_DEBUG=1\n    export XLA_HLO_DEBUG=1\n    export XLA_HANDLE_SPECIAL_SCALAR=1\n    export UNSAFE_FP8FNCAST=1\n    export DISABLE_NUMERIC_CC_TOKEN=1\n    export VLLM_RPC_TIMEOUT=100000\n    export VLLM_NEURON_FRAMEWORK=neuronx-distributed-inference\n    \n    MODEL_PATH=\"<path to Llama3.3 70B fp8 checkpoint>\"\n    COMPILED_MODEL_PATH=\"<path to a folder that has the pre-compiled model artifacts from the previous step>\"\n    export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\n\n    vllm serve \\\n        $MODEL_PATH \\\n        --tensor-parallel-size 64 \\\n        --max-num-seqs 4 \\\n        --max-model-len 5120 \\\n        --port 8000 \\\n        --disable-log-requests \\\n        --block_size 128 \\\n        --num-gpu-blocks-override 4 \\\n        --no-enable-prefix-caching \\\n        --additional-config '{\n            \"override_neuron_config\": {\n                \"max_prompt_length\": 4096\n               }\n        }' 2>&1 | tee vllm.log \n\n\nOnce the model is loaded, you will see the following output:\n\n::\n\n    INFO:     Started server process [7]\n    INFO:     Waiting for application startup.\n    INFO:     Application startup complete.\n\nThis indicates the server is ready and the model endpoint is available for inference.\n\nStep 4: Test the endpoint\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nYou can test the endpoint using curl or any HTTP client:\n\n::\n\n    curl http://localhost:8000/v1/completions \\\n        -H \"Content-Type: application/json\" \\\n        -d '{\n            \"model\": \"<model name>\",\n            \"prompt\": \"What is machine learning?\",\n            \"max_tokens\": 100,\n            \"temperature\": 0.7\n        }'\n\n\nConclusion\n----------\nYou have successfully quantized a Llama3.3 70B model to fp8 and deployed the model on Trainium2 for inference. To evaluate the accuracy of the quantized model, run accuracy evaluation tests using :ref:`accuracy-eval-with-datasets`."
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial.rst",
    "content": ".. _nxdi-trn2-llama3.3-70b-tutorial:\n\nTutorial: Using Speculative Decoding to improve Llama-3.3-70B inference performance on Trn2 instances\n=======================================================================================================\n\nNeuronX Distributed (NxD) Inference allows you to deploy Llama3.3 70B on\na single Trn2 or Trn1 instance. This tutorial provides a step-by-step\nguide to deploy Llama3.3 70B on a Trn2 instance using two different configurations, one without\nspeculative decoding and the other with draft model based speculative decoding enabled\n(with Llama-3.2 1B as the draft model).\nWe will also measure performance by running a load test using LLMPerf\nand compare key metrics between the two configurations.\nWhile this tutorial uses batch size 1 for demonstration purposes, the model configuration provides support for batch sizes up to 4.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nPrerequisites:\n---------------\nSet up and connect to a Trn2.48xlarge instance\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAs a prerequisite, this tutorial requires that you have a Trn2 instance\ncreated from a Deep Learning AMI that has the Neuron SDK pre-installed.\n\nTo set up a Trn2 instance using Deep Learning AMI with pre-installed Neuron SDK,\nsee :ref:`nxdi-setup`.\n\nAfter setting up an instance, use SSH to connect to the Trn2 instance using the key pair that you\nchose when you launched the instance.\n\nAfter you are connected, activate the Python virtual environment that\nincludes the Neuron SDK.\n\n::\n\n   source ~/aws_neuronx_venv_pytorch_2_5_nxd_inference/bin/activate\n\nRun ``pip list`` to verify that the Neuron SDK is installed.\n\n::\n\n   pip list | grep neuron\n\nYou should see Neuron packages including\n``neuronx-distributed-inference`` and ``neuronx-cc``.\n\nInstall packages\n~~~~~~~~~~~~~~~~~\nNxD Inference supports running models with vLLM. This functionality is\navailable in the AWS Neuron fork of the vLLM GitHub repository. Install the latest release branch of vLLM from the AWS Neuron fork \nfollowing instructions in the :ref:`vLLM User Guide for NxD Inference<nxdi-vllm-user-guide-v1>`.\n\nIn this tutorial, you will use `llmperf <https://github.com/ray-project/llmperf>`_ to measure the performance.\nWe will use the `load test <https://github.com/ray-project/llmperf?tab=readme-ov-file#load-test>`_ feature of LLMPerf and measure the performance for accepting\n10,000 tokens as input and generating 1500 tokens as output.\nInstall llmperf into the virtual environment.\n\n::\n\n    git clone https://github.com/ray-project/llmperf.git\n    cd llmperf\n    pip install -e . \n\n\nDownload models\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTo use this sample, you must first download a 70B model checkpoint from Hugging Face\nto a local path on the Trn2 instance. For more information, see\n`Downloading models <https://huggingface.co/docs/hub/en/models-downloading>`__\nin the Hugging Face documentation. You can download and use `meta-llama/Llama-3.3-70B-Instruct <https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct>`__\nfor this tutorial.\n\nSince we will be using Speculative Decoding in the second configuration, \nyou will also need a draft model checkpoint. You can download and use `meta-llama/Llama-3.2-1B-Instruct <https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct>`__.\n\n.. note::\n\n    NxD Inference supports batch sizes up to 4 for this model configuration. 
To determine the optimal batch size for your specific use case, we recommend incrementally testing batch sizes from 1 to 4 while monitoring your application's performance metrics.\n\nScenario 1: Run Llama3.3 70B on Trn2\n-------------------------------------\nIn this scenario, you will run Llama3.3 70B on Trn2 without Speculative Decoding\nusing bfloat16 precision.\n\nStep 1: Compile the model\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nWe will first compile and run generation on a sample prompt using a command\ninstalled by ``neuronx-distributed-inference``. Save the contents of the below script to your favorite \nshell script file, for example, ``compile_model.sh`` and then run it.\n\nNote that we are using the following features as described in\nthe tutorial for running 405B model :ref:`nxdi-trn2-llama3.1-405b-tutorial`\n\n* Logical NeuronCore Configuration (LNC)\n* Tensor parallelism (TP) on Trn2\n* Optimized Kernels\n\nThe script compiles the model and runs generation on the given input prompt.\nNote the path we used to save the compiled model. This path should be used\nwhen launching vLLM server for inference so that the compiled model can be loaded without recompilation.\nPlease refer to :ref:`nxd-inference-api-guide` for more information on these ``inference_demo`` flags.\n\n\n.. note::\n\n    Known issue: Using kernels with bucket length of 1024 or less may lead to ``Numerical Error`` in inference.\n\n    ::\n\n        RuntimeError: Failed to execute the model status=1003 message=Numerical Error\n\n\n::\n\n    # Replace this with the path where you downloaded and saved the model files.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    # This is where the compiled model will be saved. The same path\n    # should be used when launching vLLM server for inference.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\"\n\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n    export XLA_DENSE_GATHER_FACTOR=0 \n    export NEURON_RT_INSPECT_ENABLE=0\n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n            run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 12288 \\\n            --seq-len 12800 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --do-sample \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --pad-token-id 2 \\\n            --enable-bucketing \\\n            --context-encoding-buckets 2048 4096 8192 12288 \\\n\t        --token-generation-buckets 2048 4096 8192 12800 \\\n            --prompt \"What is annapurna labs?\" 2>&1 | tee log\n\n\n    \nStep 2: Run the model using vLLM \n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nAfter compiling the model, you can run the model using vLLM. 
Save the contents of the below script to another\nshell script file, for example, ``start_vllm.sh`` and then run it.\n\n::\n\n    export NEURON_RT_INSPECT_ENABLE=0 \n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n\n    # These should be the same paths used when compiling the model.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\"\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\n    VLLM_RPC_TIMEOUT=100000 vllm serve \\\n        --model $MODEL_PATH \\\n        --max-num-seqs 1 \\\n        --max-model-len 12800 \\\n        --tensor-parallel-size 64 \\\n        --device neuron \\\n        --use-v2-block-manager \\\n        --override-neuron-config \"{\\\"on_device_sampling_config\\\": {\\\"do_sample\\\": true}, \\\"skip_warmup\\\": true}\" \\\n        --port 8000 &\n    PID=$!\n    echo \"vLLM server started with PID $PID\"\n\nStep 3: Measure performance using LLMPerf\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nAfter the above steps, the vllm server should be running. \nYou can now measure the performance using LLMPerf. Before we can use the ``llmperf`` package, we need to make a few changes to its code. \nFollow :ref:`benchmarking with LLMPerf guide <llm_perf_patch_changes>` to apply the code changes.\n\n\nBelow is a sample shell script to run LLMPerf. To provide the model with 10000 tokens as input and generate 1500 tokens as output on average,\nwe use the following parameters from LLMPerf:\n\n::\n\n    --mean-input-tokens 10000 \\\n    --mean-output-tokens 1500 \\\n\n\nMore information about several arguments used in the script can be found in the \n`llmperf open source code <https://github.com/ray-project/llmperf/blob/main/token_benchmark_ray.py>`_.\n\n::\n\n    # This should be the same path to which the model was downloaded (also used in the above steps).\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    # This is the name of directory where the test results will be saved.\n    OUTPUT_PATH=llmperf-results-sonnets\n\n    export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n    export OPENAI_API_KEY=\"mock_key\"\n\n    python token_benchmark_ray.py \\\n        --model $MODEL_PATH \\\n        --mean-input-tokens 10000 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 1500 \\\n        --stddev-output-tokens 0 \\\n        --num-concurrent-requests 1\\\n        --timeout 3600 \\\n        --max-num-completed-requests 50 \\\n        --tokenizer $MODEL_PATH \\\n        --additional-sampling-params '{}' \\\n        --results-dir $OUTPUT_PATH \\\n        --llm-api \"openai\"\n\nA sample output from the above script is shown below:\n\n::\n\n    Results for token benchmark for /home/ubuntu/models/Llama-3.3-70B-Instruct/ queried with the openai api.\n\n    inter_token_latency_s\n        p25 = 0.01964743386193489\n        p50 = 0.01965969146322459\n        p75 = 0.019672998415771872\n        p90 = 0.01969826815724373\n        p95 = 0.019810569172135244\n        p99 = 0.020350346909947692\n        mean = 0.01969182239660784\n        min = 0.0196275211258056\n        max = 0.020702997242410977\n        stddev = 0.00015700734112322808\n    ttft_s\n        p25 = 0.8109508841298521\n        p50 = 0.8142827898263931\n        p75 = 30.46490489714779\n        p90 = 30.513100237119943\n        p95 = 30.521608413150535\n        p99 = 48.876512633068415\n        mean = 11.503728219866753\n        min = 
0.8080519903451204\n        max = 66.4881955627352\n        stddev = 15.692731777293613\n    end_to_end_latency_s\n        p25 = 30.296781020238996\n        p50 = 30.326033774763346\n        p75 = 59.9560666854959\n        p90 = 60.001504834741354\n        p95 = 60.028880204679446\n        p99 = 79.1842334462329\n        mean = 41.04328096391633\n        min = 30.265212223865092\n        max = 97.54387667682022\n        stddev = 15.796048923358924\n    request_output_throughput_token_per_s\n        p25 = 25.044969421803977\n        p50 = 49.49542857484997\n        p75 = 49.543217224244\n        p90 = 49.583184869985566\n        p95 = 49.58588728343319\n        p99 = 49.592597790896676\n        mean = 40.91042833304163\n        min = 15.387946954098137\n        max = 49.59489426003143\n        stddev = 11.825984480587056\n    number_input_tokens\n        p25 = 10000.0\n        p50 = 10000.0\n        p75 = 10000.0\n        p90 = 10000.0\n        p95 = 10000.0\n        p99 = 10000.0\n        mean = 10000.0\n        min = 10000\n        max = 10000\n        stddev = 0.0\n    number_output_tokens\n        p25 = 1501.0\n        p50 = 1501.0\n        p75 = 1501.0\n        p90 = 1501.0\n        p95 = 1501.0\n        p99 = 1502.02\n        mean = 1501.04\n        min = 1501\n        max = 1503\n        stddev = 0.282842712474619\n    Number Of Errored Requests: 0\n    Overall Output Throughput: 36.55567822866449\n    Number Of Completed Requests: 50\n    Completed Requests Per Minute: 1.4612140207588533\n\n\nScenario 2: Run Llama3.3 70B on Trn2 with Speculative Decoding\n--------------------------------------------------------------\nIn this scenario, you will run Llama3.3 70B on Trn2 with Speculative Decoding.\nSpecifically, we will use the below variations from the supported variants as described in\n:ref:`nxd-speculative-decoding`\n\n* Speculative Decoding with Llama-3.2-1B as the draft model :ref:`nxd-vanilla-speculative-decoding`\n* Fused Speculation for improved performance :ref:`nxd-fused-speculative-decoding`\n\nStep 1: Compile the model\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nWhen compiling the model to use speculative decoding, you need to provide \na draft model checkpoint and a few additional parameters to the ``inference_demo`` command.\n\nFor a quick review, here are the additional arguments provided:\n\n::\n\n            --draft-model-path $DRAFT_MODEL_PATH \\\n            --enable-fused-speculation \\\n            --speculation-length 7 \\\n\nPlease refer to :ref:`nxd-inference-api-guide` for more information on these ``inference_demo`` flags.\nThe complete script to compile the model for this configuration is shown below:\n\n\n.. 
note::\n\n    Known issue: Using kernels with bucket length of 1024 or less may lead to ``Numerical Error`` in inference.\n\n    ::\n\n        RuntimeError: Failed to execute the model status=1003 message=Numerical Error\n\n\n::\n\n    # This is the same path as in the previous scenario.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    # This is the path where the draft model is downloaded and saved.\n    DRAFT_MODEL_PATH=\"/home/ubuntu/models/Llama-3.2-1B-Instruct/\"\n    # As in the previous scenario, this is where the compiled model will be saved.\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\"\n\n    NUM_CORES=128\n    TP_DEGREE=64\n    LNC=2\n\n    export NEURON_RT_VIRTUAL_CORE_SIZE=$LNC\n    export NEURON_RT_NUM_CORES=$((NUM_CORES/NEURON_RT_VIRTUAL_CORE_SIZE))\n    export NEURON_RT_EXEC_TIMEOUT=600 \n    export XLA_DENSE_GATHER_FACTOR=0 \n    export NEURON_RT_INSPECT_ENABLE=0\n\n    inference_demo \\\n        --model-type llama \\\n        --task-type causal-lm \\\n            run \\\n            --model-path $MODEL_PATH \\\n            --compiled-model-path $COMPILED_MODEL_PATH \\\n            --torch-dtype bfloat16 \\\n            --start_rank_id 0 \\\n            --local_ranks_size $TP_DEGREE \\\n            --tp-degree $TP_DEGREE \\\n            --batch-size 1 \\\n            --max-context-length 12288 \\\n            --seq-len 12800 \\\n            --on-device-sampling \\\n            --top-k 1 \\\n            --fused-qkv \\\n            --sequence-parallel-enabled \\\n            --qkv-kernel-enabled \\\n            --attn-kernel-enabled \\\n            --mlp-kernel-enabled \\\n            --cc-pipeline-tiling-factor 1 \\\n            --draft-model-path $DRAFT_MODEL_PATH \\\n            --enable-fused-speculation \\\n            --speculation-length 7 \\\n            --pad-token-id 2 \\\n            --enable-bucketing \\\n            --context-encoding-buckets 2048 4096 8192 12288 \\\n\t        --token-generation-buckets 2048 4096 8192 12800 \\\n            --prompt \"What is annapurna labs?\" 2>&1 | tee log\n\nStep 2: Run the model using vLLM\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nSimilar to compiling the model, we need to specify parameters specific to \nspeculative decoding when running the model using vLLM.\n\nFor a quick glance, these are the parameters that are different for \nrunning the vLLM server with a model compiled using speculative decoding:\n\n::\n\n            --speculative-max-model-len 12800 \\\n            --speculative-model $DRAFT_MODEL_PATH \\\n            --num-speculative-tokens 7 \\\n            --override-neuron-config \"{\\\"enable_fused_speculation\\\":true}\" \\\n            \nHere is the complete script to run the model using vLLM with speculative decoding:\n\n::\n\n    export NEURON_RT_INSPECT_ENABLE=0 \n    export NEURON_RT_VIRTUAL_CORE_SIZE=2\n\n    # These should be the same paths used when compiling the model.\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    DRAFT_MODEL_PATH=\"/home/ubuntu/models/Llama-3.2-1B-Instruct/\"\n    COMPILED_MODEL_PATH=\"/home/ubuntu/traced_model/Llama-3.3-70B-Instruct/\"\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n    export NEURON_COMPILED_ARTIFACTS=$COMPILED_MODEL_PATH\n    VLLM_RPC_TIMEOUT=100000 vllm serve \\\n        --model $MODEL_PATH \\\n        --max-num-seqs 1 \\\n        --max-model-len 12800 \\\n        --tensor-parallel-size 64 \\\n        --device neuron \\\n        --speculative-max-model-len 12800 \\\n        
--speculative-model $DRAFT_MODEL_PATH \\\n        --num-speculative-tokens 7 \\\n        --use-v2-block-manager \\\n        --override-neuron-config \"{\\\"enable_fused_speculation\\\":true}\" \\\n        --port 8000 &\n    PID=$!\n    echo PID=$PID\n    echo \"vLLM server started with PID $PID\"\n\nStep 3: Measure performance using LLMPerf\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nThe script to measure the performance using LLMPerf is same as the one used in the first scenario. Before we can use the ``llmperf`` package, we need to make a few changes to its code. \nFollow :ref:`benchmarking with LLMPerf guide <llm_perf_patch_changes>` to apply the code changes.\n\nFor convenience, here's the script once again:\n\n::\n\n    # This should be the same path to which the model was downloaded (also used in the above steps).\n    MODEL_PATH=\"/home/ubuntu/models/Llama-3.3-70B-Instruct/\"\n    # This is the name of directory where the test results will be saved. Use a different name for this scenario.\n    OUTPUT_PATH=llmperf-results-sonnets-speculative\n\n    export OPENAI_API_BASE=\"http://localhost:8000/v1\"\n    export OPENAI_API_KEY=\"mock_key\"\n\n    python token_benchmark_ray.py \\\n        --model $MODEL_PATH \\\n        --mean-input-tokens 10000 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 1500 \\\n        --stddev-output-tokens 0 \\\n        --num-concurrent-requests 1\\\n        --timeout 3600 \\\n        --max-num-completed-requests 50 \\\n        --tokenizer $MODEL_PATH \\\n        --additional-sampling-params '{}' \\\n        --results-dir $OUTPUT_PATH \\\n        --llm-api \"openai\"\n\nA sample output from the above script is shown below:\n\n::\n\n    Results for token benchmark for /home/ubuntu/models/Llama-3.3-70B-Instruct/ queried with the openai api.\n\n    inter_token_latency_s\n        p25 = 0.0053349758717231455\n        p50 = 0.005386366705410183\n        p75 = 0.005441084293027719\n        p90 = 0.005499971026182175\n        p95 = 0.005520176071580499\n        p99 = 0.005911254031351169\n        mean = 0.00540780140378178\n        min = 0.005264532127728065\n        max = 0.006265544256816307\n        stddev = 0.00013951778334019935\n    ttft_s\n        p25 = 0.8693495176266879\n        p50 = 0.870149074587971\n        p75 = 0.8710820493288338\n        p90 = 0.8725412225350737\n        p95 = 0.8742059985175729\n        p99 = 36.83790613239617\n        mean = 2.280795605443418\n        min = 0.8676468348130584\n        max = 71.38881027325988\n        stddev = 9.97280539681726\n    end_to_end_latency_s\n        p25 = 8.873123338911682\n        p50 = 8.950916013680398\n        p75 = 9.030085149221122\n        p90 = 9.120021602977067\n        p95 = 9.150626054406166\n        p99 = 45.70815015356973\n        mean = 10.393093119114637\n        min = 8.766328778117895\n        max = 80.78758085798472\n        stddev = 10.158917239418473\n    request_output_throughput_token_per_s\n        p25 = 166.22213179149702\n        p50 = 167.69243252025473\n        p75 = 169.16253286110174\n        p90 = 169.52692450439133\n        p95 = 169.81518762962915\n        p99 = 170.85438941846397\n        mean = 164.631719334475\n        min = 18.579588397857652\n        max = 171.2233293995004\n        stddev = 21.152953887186314\n    number_input_tokens\n        p25 = 10000.0\n        p50 = 10000.0\n        p75 = 10000.0\n        p90 = 10000.0\n        p95 = 10000.0\n        p99 = 10000.0\n        mean = 10000.0\n        min = 10000\n        max = 10000\n        
stddev = 0.0\n    number_output_tokens\n        p25 = 1501.0\n        p50 = 1501.0\n        p75 = 1501.0\n        p90 = 1501.0\n        p95 = 1501.0\n        p99 = 1502.02\n        mean = 1501.04\n        min = 1501\n        max = 1503\n        stddev = 0.282842712474619\n    Number Of Errored Requests: 0\n    Overall Output Throughput: 144.17136914316023\n    Number Of Completed Requests: 50\n    Completed Requests Per Minute: 5.76285918335928\n\nConclusion\n-----------\nAs seen in the table below, TPOT reduced by 3.6x and output token throughput increased by 4x when using speculative decoding with draft model combined with fused speculative decoding,\ncompared to baseline without speculative decoding. Please note that batch size of 1 is used in this tutorial for computing the below metrics.\n\n\n.. csv-table::\n   :file: llama70b_perf_comparison.csv\n   :header-rows: 1\n\n"
  },
  {
    "path": "libraries/nxd-inference/tutorials/trn3-gpt-oss-120b-tutorial.rst",
    "content": ".. meta::\n    :description: Tutorial for deploying GPT-OSS 120B on Trainium3 instances using NeuronX Distributed (NxD) Inference with vLLM.\n    :keywords: GPT-OSS 120B, Trainium3, NeuronX Distributed Inference, NxD Inference, vLLM, Large Language Models, LLM Deployment, Tensor Parallelism, Data Parallelism, Speculative Decoding, Neuron SDK\n    :date-modified: 12/02/2025\n\n.. _nxdi-trn3-gpt-oss-120b-tutorial:\n\nTutorial: GPT-OSS 120B on Trn3 instances [BETA]\n=======================================================================\n\nNeuronX Distributed (NxD) Inference allows you to deploy GPT-OSS 120B on\nTrn3 instances for high-performance inference. This tutorial provides a step-by-step\nguide to deploy GPT-OSS 120B on a Trn3 instance using tensor parallelism, \ndata parallelism and optimized kernels for efficient inference at scale.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nPrerequisites\n-------------\n\nAs a prerequisite, this tutorial requires that you have a Trn3 instance\ncreated from a Deep Learning AMI which is current private[Beta] \nthat has the Neuron SDK with support for GPT-OSS 120B on Trn3 instances pre-installed.\n\n.. note::\n\n    Please contact us to get access to the private Deep Learning AMI for 2.27 Beta release\n    that has all the necessary artifacts for you to run this tutorial on Trn3 instance.\n\n\nThe Deep Learning AMI contains the following:\n\n* Neuron system dependencies\n* Python virtual environment with Neuron SDK and vLLM v0.11.1 in :code:`~/neuronx_gpt_oss_120b_in_vllm_venv`\n* vLLM startup script at :code:`~/start_vllm_server.sh`\n* GPT-OSS 120B and EAGLE3 draft model checkpoints in :code:`/mnt/inference/models/`\n\n\nPerformance Optimizations\n-------------------------\n\nThe model is configured to run with data parallelism i.e. 8 independent vLLM endpoints per Trn3\ninstance each using tensor parallelism with :code:`tp_degree=8` and :code:`LNC=2`. Furthermore, we use the\nfollowing performance optimizations:\n\n* speculative decoding using EAGLE3 with speculation length 5\n* optimized NKI kernels for attention, MoE, sampling\n* support for MXFP4 compute in MoE (Trn3 only)\n\nFor more information see:\n\n* :ref:`moe-inference-deep-dive`\n* :ref:`logical-neuroncore-config`\n* :ref:`trainium3-arch`\n\n\nStep 1: Launch vLLM server\n--------------------------\n\nUse the included script to launch a vLLM server on the instance.\n\n::\n\n    source ./neuronx_gpt_oss_120b_in_vllm_venv/bin/activate\n    bash start_vllm_server.sh\n\n\nDuring first start up, the model will be compiled and serialized. Subsequent startups will\ndirectly load from the serialized model (:ref:`nxdi-vllm-v1-serialization`). You should see output indicating the server is \nready:\n\n\n::\n\n    INFO:     Started server process\n    INFO:     Waiting for application startup.\n    INFO:     Application startup complete.\n    INFO:     Uvicorn running on http://0.0.0.0:8000\n\n\n\nThe setup is intended to be used with data parallelism and supports running 8 copies on\none Trn3 instance. You will need to provide a unique port (:code:`--port 8000`) and update the visible \nNeuronCores range (:code:`export NEURON_RT_VISIBLE_CORES=0-7`) for each copy. 
If you want to start multiple\nservers concurrently without loading from a serialized model, you also need to provide each with\na unique compiler working directory by setting the :code:`BASE_COMPILE_WORK_DIR` environment variable.\nPlease refer to :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb` for more information.\n\nCurrently, NxD Inference supports EAGLE3 heads that have the same hidden size and vocabulary size as the target model\nand follow the Llama3 dense architecture. The head must contain the following layers:\n:code:`fc, hidden_norm, input_layernorm, attention, mlp, lm_head and embed_tokens`. Any other EAGLE3 head \narchitecture needs to be brought up as a new model.\n\n\nStep 2: Test inference with sample requests\n--------------------------------------------\n\nWith the vLLM server running, open a new terminal session and test the inference endpoint.\n\nFirst, verify the server is responding:\n\n::\n\n    curl -i http://localhost:8000/health\n\nYou should receive an :code:`HTTP/1.1 200 OK` response.\n\nNow, send a sample inference request:\n\n::\n\n    curl http://localhost:8000/v1/chat/completions \\\n        -H \"Content-Type: application/json\" \\\n        -d '{\n            \"model\": \"/mnt/inference/models/gpt-oss-120b\",\n            \"messages\": [\n                {\"role\": \"user\", \"content\": \"How are you?\"}\n            ]\n        }'\n\nYou should receive a JSON response with the generated text.\n\n\nStep 3: Run performance benchmarks\n-----------------------------------\n\nWe are going to use LLMPerf for benchmarking. Install LLMPerf from source and \npatch it to support data parallelism and reasoning models, following :ref:`llm-inference-benchmarking`.\n\nThen, run the benchmark with the following commands:\n\n::\n\n    export OPENAI_API_KEY=EMPTY\n\n    # if you have started multiple vLLM servers, \n    # append the endpoints separated by semicolon\n    # e.g. `export OPENAI_API_BASE=\"http://localhost:8000/v1;http://localhost:8001/v1\"`\n    # and adjust `--num-concurrent-requests` accordingly. You might also want to increase\n    # `--max-num-completed-requests`.\n    export OPENAI_API_BASE=\"http://0.0.0.0:8000/v1\"\n\n    python ~/llmperf/token_benchmark_ray.py \\\n        --model /mnt/inference/models/gpt-oss-120b \\\n        --mean-input-tokens 10000 \\\n        --stddev-input-tokens 0 \\\n        --mean-output-tokens 3000 \\\n        --stddev-output-tokens 0 \\\n        --num-concurrent-requests 1 \\\n        --results-dir \"./llmperf_results/\" \\\n        --max-num-completed-requests 50 \\\n        --additional-sampling-params '{\"temperature\": 1.0, \"top_k\": 1.0, \"top_p\": 1.0}' \\\n        --llm-api \"openai\"\n\nStep 4: Clean up\n----------------\n\nTo stop the vLLM server and free up resources:\n\n1. Press ``Ctrl+C`` in the terminal running the vLLM server\n2. Verify all processes have stopped:\n\n::\n\n    ps aux | grep vllm\n\n3. If any vLLM processes are still running, terminate them using their process IDs (PIDs): ``kill -9 <PID>``.\n   \nYou have now successfully deployed GPT-OSS 120B on a Trn3 instance using NxD Inference with vLLM!\n"
  },
  {
    "path": "libraries/nxd-inference/vllm/index.rst",
    "content": ".. meta:: \n    :description: Run high-performance LLM inference with vLLM on AWS Neuron accelerators. Deploy models like Llama, Qwen, and more on Trainium and Inferentia instances.\n    :date-modified: 11/25/2025\n\n\nvLLM on Neuron\n===============\n\nvLLM on Neuron enables high-performance LLM inference on AWS Trainium and Inferentia instances, providing a streamlined deployment experience with minimal code changes. The integration leverages AWS Neuron's optimized AI inference capabilities and vLLM's advanced features like continuous batching to deliver efficient model serving for both latency-sensitive applications and high-throughput batch processing workloads.\n\nOverview\n---------\n\nvLLM is a popular library for LLM inference and serving that integrates with AWS Neuron through the NxD Inference (neuronx-distributed-inference) library. This integration uses vLLM's Plugin System to extend the model execution components responsible for loading and invoking models within vLLM's LLMEngine, while maintaining vLLM's input processing, scheduling, and output processing behaviors.\n\n**Key Features:**\n\n- **Continuous batching** for efficient processing of multiple requests\n- **Prefix caching** to improve time-to-first-token by reusing KV cache of common prompts\n- **Speculative decoding** support (Eagle V1)\n- **Quantization** with INT8/FP8 support for optimized performance\n- **Dynamic sampling** and tool calling capabilities\n- **Multimodal support** for models like Llama 4 Scout and Maverick\n\n**Supported Models:**\n\n- Llama 2/3.1/3.3\n- Llama 4 Scout, Maverick (with multimodal capabilities)\n- Qwen 2.5\n- Qwen 3\n- Custom models onboarded to NxD Inference\n\n**Deployment Options:**\n\n- Quick deployment using pre-configured Deep Learning Containers\n- Manual installation from source with the vLLM-Neuron plugin\n- Offline batch inference for processing multiple prompts\n- Online model serving with an OpenAI-compatible API server\n\nGet Started with Inference and vLLM on Neuron\n----------------------------------------------\n\nLearn how to run high-performance inference workloads using vLLM on AWS Neuron accelerators. These quickstart guides walk you through setting up both offline batch processing and online API serving, helping you deploy large language models efficiently on Trainium and Inferentia instances.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Deploy a Deep Learning Container with vLLM\n      :link: /containers/get-started/quickstart-configure-deploy-dlc\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Quickly deploy a vLLM server on Trainium and Inferentia instances using a DLC image preconfigured with AWS Neuron SDK artifacts.\n\n   .. grid-item-card:: Offline Model Serving\n      :link: quickstart-vllm-offline-serving\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Run batch inference jobs with vLLM on Neuron. Install the plugin, process multiple prompts, and cache compiled artifacts for faster reruns.\n\n   .. grid-item-card:: Online Model Serving\n      :link: quickstart-vllm-online-serving\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Launch an OpenAI-compatible API server with vLLM on Neuron. Set up interactive endpoints, validate with curl, and integrate with the OpenAI SDK.\n\nGuides for vLLM on Neuron\n--------------------------\n\n.. grid:: 1 \n   :gutter: 3\n\n   .. 
grid-item-card:: vLLM on Neuron User Guide (V1)\n      :link: /libraries/nxd-inference/developer_guides/vllm-user-guide-v1\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn the details of developing inference models on Neuron with vLLM V1.\n\nvLLM on Neuron Tutorials\n--------------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Deploy Llama4 with vLLM\n      :link: /libraries/nxd-inference/tutorials/llama4-tutorial\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to deploy Llama4 multimodal models on Trainium2 instances using vLLM for both offline and online inference.\n\n.. toctree::\n    :hidden:\n    :maxdepth: 1\n\n    Quickstart: Offline Model Serving </libraries/nxd-inference/vllm/quickstart-vllm-offline-serving>\n    Quickstart: Online Model Serving </libraries/nxd-inference/vllm/quickstart-vllm-online-serving>\n    vLLM on Neuron User Guide </libraries/nxd-inference/developer_guides/vllm-user-guide-v1>\n    Model Recipes </libraries/nxd-inference/models/index>\n    Deploy Llama4 with vLLM </libraries/nxd-inference/tutorials/llama4-tutorial>\n"
  },
  {
    "path": "libraries/nxd-inference/vllm/quickstart-vllm-offline-serving.rst",
    "content": ".. meta::\n   :description: Learn how to run your first offline vLLM batch inference job on AWS Neuron.\n   :date_updated: 2025-12-02\n\n.. _quickstart-offline-serving:\n\nQuickstart: Run offline inference with vLLM on Neuron\n======================================================\n\nThis quickstart walks you through running vLLM in offline (batch) inference mode on AWS Neuron. You install the ``vllm-neuron`` plugin, generate text for a batch of prompts, and cache the compiled artifacts so reruns stay fast.\n\n**This quickstart is for**: Developers who want to run offline/batch inference on Neuron without an API server  \n**Time to complete**: ~20 minutes\n\nPrerequisites\n-------------\n\nBefore you begin, make sure you have:\n\n* An EC2 instance with Neuron cores and network access to Hugging Face Hub.\n* The Neuron SDK installed (see :ref:`Setup Instructions<nxdi-setup>`).\n* Python 3.10 or later with ``pip``.\n* Basic familiarity with running Python scripts in a virtual environment.\n\n.. note::\n   For the fastest setup, consider the vLLM Neuron Deep Learning Container (DLC) which bundles the SDK, vLLM, and dependencies. See :ref:`quickstart_vllm_dlc_deploy`.\n\nStep 1: Install the ``vllm-neuron`` plugin\n-------------------------------------------\n\nIn this step, you install the Neuron-enabled vLLM plugin inside your Python environment.\n\n.. code-block:: bash\n\n   # Activate your Neuron virtual environment\n   source ~/aws_neuronx_venv_pytorch_2_8_nxd_inference/bin/activate\n\n   # Clone the vLLM Neuron plugin repository\n   git clone https://github.com/vllm-project/vllm-neuron.git\n   cd vllm-neuron\n\n   # Install with the Neuron package repository\n   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com -e .\n\n.. important::\n   The ``--extra-index-url`` flag ensures Neuron-compatible wheels are pulled from the AWS repository.\n\nTo confirm the install succeeded, run ``python -c \"import vllm\"`` and verify no errors display.\n\nStep 2: Run a batch inference job\n---------------------------------\n\nIn this step, you run a short Python script that generates completions for three prompts using the Llama 3.1 8B Instruct model.\n\n.. tip::\n   **Before your first run**, set the ``NEURON_COMPILED_ARTIFACTS`` environment variable to enable caching. This lets subsequent runs skip the Neuron compilation phase and load instantly:\n\n   .. code-block:: bash\n\n      export NEURON_COMPILED_ARTIFACTS=\"./compiled_models\"\n\n   After the first run completes, the ``compiled_models`` directory will contain the cached artifacts.\n\n.. 
code-block:: python\n\n    from vllm import LLM, SamplingParams\n\n    llm = LLM(\n        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n        tensor_parallel_size=32,\n        max_num_seqs=1,\n        max_model_len=128,\n        enable_prefix_caching=False,\n        enable_chunked_prefill=False,\n        additional_config={\n            \"override_neuron_config\": {\n                \"skip_warmup\": True,\n            },\n        },\n    )\n\n    prompts = [\n        \"Hello, my name is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n\n    outputs = llm.generate(prompts, SamplingParams(top_k=10))\n    for output in outputs:\n        prompt = output.prompt\n        generated_text = output.outputs[0].text\n        print(f\"Prompt: {prompt!r}\")\n        print(f\"Generated: {generated_text!r}\")\n\nIf the script succeeds, you will see each prompt followed by generated text in the console.\n\nStep 3: Optimize model loading with sharded checkpoints\n-------------------------------------------------------\n\nIn this step, you configure vLLM to save sharded checkpoints, which significantly speeds up model loading on subsequent runs.\n\nBy default, vLLM shards the model weights during every load, which can take considerable time. Setting ``save_sharded_checkpoint`` to ``True`` saves the sharded weights to disk after the first run, eliminating this overhead.\n\n.. code-block:: python\n\n    from vllm import LLM, SamplingParams\n\n    llm = LLM(\n        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n        tensor_parallel_size=32,\n        max_num_seqs=1,\n        max_model_len=128,\n        enable_prefix_caching=False,\n        enable_chunked_prefill=False,\n        additional_config={\n            \"override_neuron_config\": {\n                \"skip_warmup\": True,\n                \"save_sharded_checkpoint\": True,\n            },\n        },\n    )\n\nAfter the first run, the sharded checkpoint is saved alongside your model files. Subsequent runs will load the pre-sharded weights directly, reducing initialization time.\n\nStep 4: Try advanced configuration options (optional)\n-----------------------------------------------------\n\nIn this step, you explore optional tuning features that can improve throughput for specific workloads.\n\n**Enable prefix caching when prompts share a long system prefix**:\n\n.. note::\n   To understand how to configure prefix caching parameters like ``num_gpu_blocks_override``, ``block_size``, ``pa_num_blocks``, and ``pa_block_size``,\n   see the `Llama 3.3 70B prefix caching tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial.html#Scenario-1:-Run-Llama3.3-70B-on-Trn2-without-Prefix-Caching>`_.\n\n.. 
code-block:: python\n\n    from vllm import LLM, SamplingParams\n\n    llm = LLM(\n        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n        tensor_parallel_size=32,\n        max_num_seqs=4,\n        max_model_len=2048,\n        num_gpu_blocks_override=4096,\n        block_size=32,\n        enable_prefix_caching=True,\n        additional_config={\n            \"override_neuron_config\": {\n                \"is_prefix_caching\": True,\n                \"is_block_kv_layout\": True,\n                \"pa_num_blocks\": 4096,\n                \"pa_block_size\": 32,\n                \"skip_warmup\": True,\n            },\n        },\n    )\n\n    prompts = [\n        \"Hello, my name is\",\n        \"The president of the United States is\",\n        \"The capital of France is\",\n    ]\n\n    outputs = llm.generate(prompts, SamplingParams(temperature=0.0))\n\n    for output in outputs:\n        print(f\"Prompt: {output.prompt}\")\n        print(f\"Generated: {output.outputs[0].text}\")\n\n**Use Eagle speculative decoding when you have an EAGLE checkpoint available**:\n\nBelow is an example of how to run vLLM inference with an EAGLE V1 checkpoint\n\n.. note::\n   Eagle draft checkpoints must be converted for NxD Inference compatibility and include the target model's LM head. Follow the guidance at `EAGLE Checkpoint Compatibility <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#eagle-checkpoint-compatibility>`_.\n\n.. code-block:: python\n\n    from vllm import LLM, SamplingParams\n\n    llm = LLM(\n        model=\"meta-llama/Llama-3.1-8B-Instruct\",\n        tensor_parallel_size=32,\n        max_num_seqs=4,\n        max_model_len=256,\n        speculative_config={\n            \"model\": \"./eagle_draft_converted\",\n            \"num_speculative_tokens\": 5,\n            \"max_model_len\": 256,\n            \"method\": \"eagle\",\n        },\n    )\n\n    prompts = [\n        \"The key benefits of cloud computing are\",\n        \"Python is a popular programming language because\",\n        \"Machine learning models can be improved by\",\n    ]\n\n    outputs = llm.generate(prompts, SamplingParams(top_k=50, max_tokens=100))\n\n    for output in outputs:\n        print(f\"Prompt: {output.prompt}\")\n        print(f\"Generated: {output.outputs[0].text}\")\n\nConfirmation\n------------\n\nRe-run the script from Step 2. You should see completions printed again, and the log will indicate:\n\n* Compiled artifacts were loaded from cache (if ``NEURON_COMPILED_ARTIFACTS`` is set)\n* Sharded checkpoint was loaded directly (if ``save_sharded_checkpoint: true`` was used)\n\nIf you enable Neuron debug logging, look for ``Loaded Neuron compiled artifacts`` messages.\n\nCommon issues\n-------------\n\n- **Initial run takes too long**: Set ``NEURON_COMPILED_ARTIFACTS`` before running so the second run reuses the cache.\n- **Model loading is slow on every run**: Enable ``save_sharded_checkpoint: true`` in ``override_neuron_config`` to avoid re-sharding the model weights each time.\n- **Warmup adds latency**: Keep ``skip_warmup=True`` in ``override_neuron_config`` if your workload does not require the warmup pass.\n\nClean up\n--------\n\nDeactivate your Python environment with ``deactivate``. Delete the ``compiled_models`` directory if you no longer need the cached artifacts. Remove any sharded checkpoint directories created by ``save_sharded_checkpoint``. 
Remove the cloned ``vllm-neuron`` repository when you are finished testing.\n\nNext steps\n----------\n\n* Explore prefix caching, Eagle speculative decoding, and other options in :ref:`nxdi-feature-guide`.\n* Review supported model architectures in :ref:`nxdi-supported-model-architectures`.\n* Switch to the online serving quickstart (:ref:`quickstart-online-serving`) when you need an API endpoint.\n\nFurther reading\n---------------\n\n- :ref:`nxdi-vllm-user-guide-v1`: Complete integration reference.\n- :ref:`nxdi-tutorials-index`: In-depth tutorials and workflow guides.\n- `Downloading models from Hugging Face <https://huggingface.co/docs/hub/en/models-downloading>`_: Instructions for obtaining model checkpoints.\n"
  },
  {
    "path": "libraries/nxd-inference/vllm/quickstart-vllm-online-serving.rst",
    "content": ".. meta::\n   :description: Launch the vLLM OpenAI-compatible server on AWS Neuron for interactive inference.\n   :date_updated: 2025-01-15\n\n.. _quickstart-online-serving:\n\nQuickstart: Serve models online with vLLM on Neuron\n===================================================\n\nThis quickstart shows you how to launch the vLLM OpenAI-compatible API server on AWS Neuron. You install the ``vllm-neuron`` plugin, start the server, validate it with ``curl``, and call it from the OpenAI Python SDK.\n\n**This quickstart is for**: Developers who need an interactive, low-latency serving endpoint on Neuron  \n**Time to complete**: ~20 minutes\n\nPrerequisites\n-------------\n\nBefore you begin, make sure you have:\n\n* An EC2 instance with Neuron cores and network access to Hugging Face Hub.\n* The Neuron SDK installed (see :ref:`Setup Instructions<nxdi-setup>`).\n* Python 3.10 or later with ``pip``.\n* Basic familiarity with running Python scripts in a virtual environment.\n\n.. note::\n   For the fastest setup, consider the vLLM Neuron Deep Learning Container (DLC), which bundles the SDK, vLLM, and dependencies. See :ref:`quickstart_vllm_dlc_deploy`.\n\nStep 1: Install the ``vllm-neuron`` plugin\n-------------------------------------------\n\nIn this step, you install the Neuron-enabled vLLM plugin inside your Python environment.\n\n.. code-block:: bash\n\n   # Activate your Neuron virtual environment\n   source ~/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate\n\n   # Clone the vLLM Neuron plugin repository\n   git clone https://github.com/vllm-project/vllm-neuron.git\n   cd vllm-neuron\n\n   # Install with the Neuron package repository\n   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com -e .\n\n.. important::\n   The ``--extra-index-url`` flag pulls the Neuron-compatible wheels from the AWS repository.\n\nTo confirm the install succeeded, run ``python -c \"import vllm\"`` and verify no errors display.\n\nStep 2: Launch the API server\n-----------------------------\n\nIn this step, you start an OpenAI-compatible endpoint with a LLaMA model.\n\n.. code-block:: bash\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n\n    vllm serve \\\n      --model meta-llama/Llama-3.1-8B-Instruct \\\n      --tensor-parallel-size 8 \\\n      --max-model-len 128 \\\n      --max-num-seqs 4 \\\n      --no-enable-prefix-caching \\\n      --port 8000 \\\n      --additional-config '{\n        \"override_neuron_config\": {\n          \"enable_bucketing\": false\n        }\n      }'\n\nKey arguments:\n\n* ``--tensor-parallel-size``: Matches the number of Neuron cores you want to use.\n* ``--max-model-len`` and ``--max-num-seqs``: Duplicate limits from your offline workflow.\n* ``--additional-config``: Wrap Neuron overrides under ``override_neuron_config`` (``enable_bucketing`` here).\n* ``--port``: Choose the listening port for the API server.\n\nStep 3: Verify the endpoint with ``curl``\n------------------------------------------\n\nIn this step, you confirm the server is responding by sending a chat completion request.\n\n.. 
code-block:: bash\n\n    curl http://localhost:8000/v1/chat/completions \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n            \"model\": \"meta-llama/Llama-3.1-8B-Instruct\",\n            \"messages\": [\n              {\"role\": \"system\", \"content\": \"You are a concise assistant.\"},\n              {\"role\": \"user\", \"content\": \"List three Neuron optimization tips.\"}\n            ],\n            \"temperature\": 0.2\n          }'\n\nIf successful, the server returns a JSON payload containing the generated answer.\n\nStep 4: Call the API with the OpenAI SDK\n-----------------------------------------\n\nNow that the server is live, call it using the OpenAI Python SDK.\n\n.. code-block:: python\n\n    from openai import OpenAI\n\n    # Client setup\n    client = OpenAI(\n        api_key=\"EMPTY\",\n        base_url=\"http://localhost:8000/v1\",\n    )\n\n    models = client.models.list()\n    model_name = models.data[0].id\n\n    max_tokens = 50\n    temperature = 1.0\n    top_p = 1.0\n    top_k = 50\n    stream = False\n\n    response = client.chat.completions.create(\n        model=model_name,\n        messages=[{\"role\": \"user\", \"content\": \"Hello, my name is Llama\"}],\n        max_tokens=max_tokens,\n        temperature=temperature,\n        top_p=top_p,\n        stream=stream,\n        extra_body={\"top_k\": top_k},\n    )\n\n    generated_text = \"\"\n    if stream:\n        for chunk in response:\n            if chunk.choices[0].delta.content is not None:\n                generated_text += chunk.choices[0].delta.content\n    else:\n        generated_text = response.choices[0].message.content\n\n    print(generated_text)\n\nStep 5: Explore advanced configuration (optional)\n-------------------------------------------------\n\nThe commands below show optional tuning features to adapt the server for different workloads.\n\n**Reuse compiled models to avoid recompilation**:\n\n.. code-block:: bash\n\n    # Create directory for compiled artifacts if it doesn't exist\n    mkdir -p ./neuron_compiled_models/llama3-8b\n    \n    # Set the environment variable before launching the server\n    export NEURON_COMPILED_ARTIFACTS=\"./neuron_compiled_models/llama3-8b\"\n\n**Enable prefix caching when prompts share a long context**:\n\n.. code-block:: bash\n\n    vllm serve \\\n      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \\\n      --tensor-parallel-size 32 \\\n      --max-model-len 1024 \\\n      --max-num-seqs 8 \\\n      --enable-prefix-caching \\\n      --block-size 32 \\\n      --num-gpu-blocks-override 256 \\\n      --additional-config '{\n        \"override_neuron_config\": {\n          \"is_prefix_caching\": true,\n          \"is_block_kv_layout\": true,\n          \"pa_num_blocks\": 256,\n          \"pa_block_size\": 32\n        }\n      }' \\\n      --port 8000\n\n**Use Eagle speculative decoding when you have an EAGLE checkpoint available**:\n\nBelow is an example of how to run vLLM inference with an EAGLE V1 checkpoint\n\n.. note::\n   Eagle draft checkpoints must be converted for NxD Inference compatibility and include the target model's LM head. Follow the guidance at `EAGLE Checkpoint Compatibility <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#eagle-checkpoint-compatibility>`_.\n\n.. 
code-block:: bash\n\n    vllm serve \\\n      --model meta-llama/Llama-3.1-8B-Instruct \\\n      --tensor-parallel-size 32 \\\n      --max-model-len 2048 \\\n      --max-num-seqs 4 \\\n      --speculative-config '{\"model\": \"./eagle_draft_converted\", \"method\": \"eagle\", \"num_speculative_tokens\": 5, \"max_model_len\": 2048}' \\\n      --port 8000\n\n**Use MultiLoRA**:\n\n.. note::\n   For multi-LoRA serving, you can optionally create a JSON configuration file that maps LoRA adapter IDs to their checkpoint paths. This enables dynamic adapter loading and swapping between HBM and host memory.\n\n.. code-block:: bash\n\n    # Example JSON configuration (save as lora_config.json):\n    # {\n    #   \"lora-ckpt-dir\": \"/opt/lora/tinyllama/\",\n    #   \"lora-ckpt-paths\": {\n    #     \"tarot_adapter\": \"tarot\",\n    #     \"support_adapter\": \"mental-health\"\n    #   },\n    #   \"lora-ckpt-paths-cpu\": {\n    #     \"tarot_adapter\": \"tarot\",\n    #     \"support_adapter\": \"mental-health\"\n    #   }\n    # }\n\n    vllm serve \\\n      --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \\\n      --tensor-parallel-size 32 \\\n      --max-model-len 1024 \\\n      --max-num-seqs 2 \\\n      --enable-lora \\\n      --max-loras 2 \\\n      --max-cpu-loras 4 \\\n      --lora-modules \\\n        tarot_adapter=/opt/lora/tinyllama/tarot \\\n        support_adapter=/opt/lora/tinyllama/mental-health \\\n      --additional-config '{\"override_neuron_config\": {\"lora_ckpt_json\": \"/path/to/lora_config.json\"}}' \\\n      --port 8000\n\nClients can select an adapter per request by setting the ``model`` field to the adapter ID in the request. The ``max-loras`` parameter controls concurrent adapters in HBM, while ``max-cpu-loras`` controls adapters in host memory with dynamic swapping support.\n\n**Tune context and token buckets for long prompts**:\n\n.. code-block:: bash\n\n    export VLLM_NEURON_FRAMEWORK=\"neuronx-distributed-inference\"\n\n    vllm serve \\\n      --model \"meta-llama/Llama-3.1-8B-Instruct\" \\\n      --tensor-parallel-size 32 \\\n      --max-num-seqs 1 \\\n      --max-model-len 1024 \\\n      --port 8000 \\\n      --no-enable-prefix-caching \\\n      --additional-config '{\n        \"override_neuron_config\": {\n          \"enable_bucketing\": true,\n          \"context_encoding_buckets\": [256, 512, 1024],\n          \"token_generation_buckets\": [32, 64, 128, 256, 512, 768],\n          \"max_context_length\": 1024,\n          \"seq_len\": 1024,\n          \"batch_size\": 1,\n          \"ctx_batch_size\": 1,\n          \"tkg_batch_size\": 1,\n          \"is_continuous_batching\": true\n        }\n      }'\n\nSet ``NEURON_COMPILED_ARTIFACTS`` before launching if you want to reuse artifacts across runs.\n\nConfirmation\n------------\n\nResend the ``curl`` request from Step 3 or rerun the OpenAI SDK snippet from Step 4. Successful responses confirm that the server is up and reachable. You can also open ``http://localhost:8000/health`` to check the health probe.\n\nCommon issues\n-------------\n\n- **Server exits immediately**: Confirm ``--tensor-parallel-size`` matches the number of available Neuron cores.\n- **Requests return 5xx errors**: Lower ``--max-num-seqs`` or ``--max-model-len`` if the model runs out of memory.\n- **Initial requests take too long**: Set ``NEURON_COMPILED_ARTIFACTS`` so subsequent launches reuse compiled artifacts.\n\nClean up\n--------\n\nStop the API server (Ctrl+C). Deactivate your Python environment with ``deactivate``. 
Remove the cloned ``vllm-neuron`` repository if you no longer need it, and clear any cached artifacts if disk space is a concern.\n\nNext steps\n----------\n\n* Explore prefix caching, Eagle speculative decoding, and other options in :ref:`nxdi-feature-guide`.\n* Review supported model architectures in :ref:`nxdi-supported-model-architectures`.\n* Try the offline batch quickstart (:ref:`quickstart-offline-serving`) if you need non-interactive inference.\n\nFurther reading\n---------------\n\n- :ref:`nxdi-vllm-user-guide-v1` – Complete integration reference.\n- :ref:`nxdi-tutorials-index` – In-depth tutorials and workflow guides.\n- `OpenAI Python SDK reference <https://github.com/openai/openai-python>`_ – API documentation for the client used in Step 4.\n"
  },
  {
    "path": "libraries/nxd-training/api-guide.txt",
    "content": "* :ref:`nxdt_config_overview`\n\n..\n    * :ref:`nxdt_config_overview`\n    * :ref:`nxdt_config_trainer`\n    * :ref:`nxdt_config_exptm`\n    * :ref:`nxdt_config_distributed_strategy`\n    * :ref:`nxdt_config_data`\n    * :ref:`nxdt_config_model`\n    * :ref:`nxdt_config_overview_precision_config`"
  },
  {
    "path": "libraries/nxd-training/api-reference-guide.rst",
    "content": ".. _nxd-training-api-guide:\n\nAPI Reference Guide \n===============================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /libraries/nxd-training/general/config_overview\n\n\n.. include:: /libraries/nxd-training/api-guide.txt"
  },
  {
    "path": "libraries/nxd-training/app_notes/nxd-training-amr-appnote.rst",
    "content": ".. _nxd_training_amr_appnote:\n\n.. include:: /libraries/neuronx-distributed/activation_memory_reduction.rst"
  },
  {
    "path": "libraries/nxd-training/app_notes/nxd-training-cp-appnote.rst",
    "content": ".. _nxd_training_cp_appnote:\n\n.. include:: /libraries/neuronx-distributed/context_parallelism_overview.rst"
  },
  {
    "path": "libraries/nxd-training/app_notes/nxd-training-pp-appnote.rst",
    "content": ".. _nxd_training_pp_appnote:\n\n.. include:: /libraries/neuronx-distributed/pipeline_parallelism_overview.rst"
  },
  {
    "path": "libraries/nxd-training/app_notes/nxd-training-tp-appnote.rst",
    "content": ".. _nxd_training_tp_appnote:\n\n.. include:: /libraries/neuronx-distributed/tensor_parallelism_overview.rst"
  },
  {
    "path": "libraries/nxd-training/app_notes.rst",
    "content": ".. _nxd_training_appnotes:\n\nApp Notes\n=========\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    \n    /about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training\n    /libraries/nxd-training/app_notes/nxd-training-tp-appnote\n    /libraries/nxd-training/app_notes/nxd-training-pp-appnote\n    /libraries/nxd-training/app_notes/nxd-training-amr-appnote\n\n\n\n.. include:: /libraries/nxd-training/app_notes.txt"
  },
  {
    "path": "libraries/nxd-training/app_notes.txt",
    "content": "* :ref:`introduce-nxd-training`\n* :ref:`nxd_training_tp_appnote`\n* :ref:`nxd_training_pp_appnote`\n* :ref:`nxd_training_amr_appnote`"
  },
  {
    "path": "libraries/nxd-training/developer-guide.rst",
    "content": ".. _nxdt_developer_guide\n\nDeveloper Guide (``nxd-training`` )\n====================================\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /libraries/nxd-training/developer_guides/index\n\n.. include:: /libraries/neuronx-distributed/developer-guide.txt\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/cpu_mode_developer_guide.rst",
    "content": ".. _cpu_mode_overview:\n\nCPU Mode Overview\n=================\n\nCPU mode allows users to run parallel primitives\nlike `RowParallelLinear` and `ColumnParallelLinear` on CPU. This is useful\nwhen debugging or developing model sharding and want to check the intermediate results \nof sharded layers. The CPU mode runs in PyTorch's eager mode and does not require\nthe compilation steps of torch-xla and Neuron compiler. The collective communications\nlike all-reduce use the PyTorch's \n`gloo backend <https://pytorch.org/docs/stable/distributed.html#backends-that-come-with-pytorch>`_\nfor communications.\n\nTo enable the CPU mode, we need to set the environment variable `NXD_CPU_MODE=1` to \nenable the CPU mode. As the CPU mode leverages Gloo backend for communication, users \nneed to initialize the distributed environment with \"gloo\" backend instead of \"xla\" backend.\nIn the following, we given an example of a MLP with Tensor Parallel linear layers. \n\n.. code-block:: python\n\n    import torch\n    import torch.nn as nn\n    import torch.distributed as dist\n    from neuronx_distributed.parallel_layers import layers\n    from neuronx_distributed.parallel_layers import initialize_model_parallel\n    from neuronx_distributed.utils import cpu_mode, get_device, master_print\n\n    # initialize the distributed environment inside PyTorch\n    cc_backend = \"gloo\" if cpu_mode() else \"xla\"\n    dist.init_process_group(backend=cc_backend)\n\n    # assuming sharding the model with TP=2\n    initialize_model_parallel(tensor_model_parallel_size=2)\n\n    hidden_size = 1024\n    rand_inputs = torch.rand(4, hidden_size)\n    model = nn.Sequential(\n        layers.ColumnParallelLinear(\n            hidden_size,\n            hidden_size,\n            bias=False,\n            gather_output=False,\n            keep_master_weight=True,\n        ),\n        layers.RowParallelLinear(\n            hidden_size,\n            hidden_size,\n            bias=False,\n            input_is_parallel=True,\n            keep_master_weight=True,\n        ),\n    )\n    model = model.to(get_device())\n    rand_inputs = rand_inputs.to(get_device())\n\n    outputs = model(rand_inputs)\n    # user can check the outputs are on the CPU\n    # and there is no compilation triggered\n    master_print(f\"Output sum is {outputs.sum()}\")\n\n\n.. code-block:: bash\n\n    # set the environment variable to enable CPU mode\n    # if the environment variable is set to 0, \n    # the script will run on Trainium accelerator using XLA\n    export NXD_CPU_MODE=1\n    # assumign the script show above is saved in test_cpu_mode.py\n    exec_file=test_cpu_mode.py\n    torchrun --nnodes=1 --nproc-per-node=2 --master_port=1234 ${exec_file}\n\n\nHow to use CPU mode in existing scripts\n---------------------------------------\n\nIf the scripts previously used the `xla_device` explicitly, \nusers need to replace the corresponding use of `xla_device` with \n`get_device()` function call from `neuronx_distributed.utils` to get the suitable device. \nSimilarly, you need to replace explicit calling of `xm.master_print` with wrapped `master_print`\nfrom `neuronx_distributed.utils`. In principle, to make the \nscripts general to both CPU mode and XLA mode with Trainium as the backend, you \nneed to replace functions from torch-xla package with a thin wrapper that can \ndispatch the function calls to the native PyTorch counterparts, when CPU mode \nis in-use.\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/dev-guide.txt",
    "content": "* :ref:`nxdt_developer_guide_integrate_new_model`\n* :ref:`nxdt_developer_guide_integrate_new_dataloader`\n* :ref:`nxdt_developer_flow_register_optimizer_lr_scheduler`\n* :ref:`nxdt_developer_guide_migration_nnm_nxdt`\n* :ref:`nxdt_developer_guide_migration_nemo_nxdt`"
  },
  {
    "path": "libraries/nxd-training/developer_guides/index.rst",
    "content": ".. _nxdt_developer_guide:\n\nDeveloper Guide\n===============\n\nThis section will go over a variety of developer guides to help users get started with\nthe Neuronx Distributed Training library.\n\n.. toctree::\n    :maxdepth: 2\n\n    Integrating a new model <new_model_guide>\n    Integrating a new dataset/dataloader <new_dataloader_guide>\n    Registering an optimizer and LR scheduler <optimizer_lr_scheduler_flow>\n    Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training <migration_nnm_nxdt>\n    NxD Training Compatibility with NeMo <migration_nemo_nxdt>\n    CPU Mode Developer Guide <cpu_mode_developer_guide>\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/migration_nemo_nxdt.rst",
    "content": ".. _nxdt_developer_guide_migration_nemo_nxdt:\n\nNxD Training Compatibility with NeMo\n====================================\n\nNxD Training (NxDT) is built on top of `NeMo-1.14 <https://github.com/NVIDIA/NeMo/tree/v1.14.0>`_.\nThe framework reuses modules from NeMo and exposes them via similar config interface.\n\n.. note::\n\n    At the moment, NxDT only allows running training of decoder LLM models.\n\nThis document goes over steps on how to run the NeMo training workloads inside NxDT.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nModel Integration\n------------------\n\n**Model already Exists in NxDT Model Hub:**\n\nIf the model you want to train is already included in the NxDT model hub, and the training workflow\n(e.g., pre-training, fine-tuning) is supported in NxDT, you need to modify NeMo YAML configuration file to\nthe NxDT YAML file. Follow the mapping table in the :ref:`nxdt_nemo_nxdt_config_mapping`.\n\n**Custom/New Model**\n\nIf your model is not part of the NxDT model hub, please use the guide\n:ref:`nxdt_developer_guide_integrate_new_model`.\n\n\nDataloader Integration\n----------------------\n\n**Dataloader already exposed via one of the NxDT configs**\n\nIn this case, please map the NeMo YAML config parameters to NxDT config parameters using the\nmapping table provided here :ref:`nxdt_nemo_nxdt_config_mapping`.\n\n**Custom/New Dataloader**\n\nIf the dataloader is not part of the hub, please use the guide\n:ref:`nxdt_developer_guide_integrate_new_dataloader`.\n\nOptimizer/LR Scheduler Integration\n----------------------------------\n\nSince NxDT is built on top of NeMo, all the optimizers/LR schedulers provided by NeMo can be enabled\nfrom the config.\n\nOptimal Partitioning\n--------------------\n\nNxDT is built on top of\n`NxD Core <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index.html>`_\nprimitives and exposes different model parallelism techniques. All of them can be configured using\nthe ``distributed_strategy`` config.\n\nFusions/kernels\n---------------\n\nAll the fused kernels available inside the NeMo config are not available in NxDT. This is because fused\nkernels in NeMo are built specifically for GPUs. Neuron have a different set of kernels that can be\nenabled from the config. Also, since Neuron uses a graph based approach, the compiler can optimize\nsome of the modules and do fusions wherever required.\n\nCheckpoint Saving/loading\n-------------------------\n\n#.\n   NeMo combines the model weights, optimizers and other state_dicts into a single ``state_dict``\n   and dumps a file of the format: ``tp_rank_0*_pp_rank_00*/model_optim_rng.ckpt``. However, with NxDT, we\n   save the model ``state_dict`` and the optimizer separately. The model statedict is saved in a folder\n   of the form: ``model/dp_rank_00_tp_rank_00_pp_rank_00.pt`` and the optimizer is saved into a separate folder\n   as: ``optim/dp_rank_00_tp_rank_00_pp_rank_00.pt``. This is mainly done so that when we use\n   `zero1 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html?highlight=zero1#neuron-zero1-optimizer>`_,\n   each DP rank can save its own optimizer shard.\n\n#.\n   NxDT doesn’t support ``.nemo`` style checkpoint saving. 
If users have a ``.nemo`` checkpoint, they would\n   have to unpack it themselves and build a checkpoint conversion script to load the checkpoint into NxDT.\n\n#.\n   In NeMo, if we are using pipeline parallelism, each pipeline stage creates an independent model. So\n   let's say we have a model with 32 layers and we use PP=4; NeMo would then create 4 chunks of 8 layers each,\n   and each PP rank would have a ``model_state_dict`` with keys going from layer 0 to layer 7. However, in NxDT, the model\n   is created as a whole and then sharded, so the layer numbers are preserved.\n\n#.\n   One would have to write a checkpoint conversion script similar to the checkpoint conversion from\n   NeMo to NxDT.\n\nFor a more detailed mapping of NeMo parameters to NxDT parameters, follow the guide\n:ref:`nxdt_nemo_nxdt_config_mapping`.\n\n.. _nxdt_nemo_nxdt_config_mapping:\n\nConfig Mapping\n--------------\n\nHere is a detailed mapping for all the parameters in the config file. For the below mapping, we chose\nthe Llama example across both NeMo and NxDT frameworks. The same mapping is also true for other models.\n\n.. csv-table::\n   :file: nemo_nxdt_mapping.csv\n   :header-rows: 1\n   :widths: 20, 20, 40\n\n.. note::\n\n   For parameters that are not supported by NxDT, please create a feature request with the specific use case\n   for the parameter, if needed.\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/migration_nnm_nxdt.rst",
    "content": ".. _nxdt_developer_guide_migration_nnm_nxdt:\n\nMigrating from Neuron-NeMo-Megatron to Neuronx Distributed Training\n====================================================================\n\nIn this section, we go over the changes one would have to make if they are migrating their\ntraining workload from\n`Neuronx-NeMo-Megatron (NNM) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nemo-megatron/index.html>`_\nto Neuronx Distributed Training (NxDT) framework.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nConfig migration\n----------------\n\nNxDT is a framework built on top of `NeMo <https://github.com/NVIDIA/NeMo>`_ and\n`NeuronxDistributed (NxD) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index.html>`_\nand supports megatron-style model. The megatron model implementation is ported over from NNM.\nHence, most of the config YAMLs from NNM can be migrated to use NxDT.\n\nWhen building NxDT for the sake of modularity, we grouped certain parameters together, eg.\n:ref:`distributed_strategy<nxdt_config_distributed_strategy>` has all the configuration for model parallelism,\n:ref:`data<nxdt_config_data>` config now holds all the parameters required to configure\nthe dataset.\n\nAt a high level, there are some differences with the NNM config, which are highlighted below:\n\n#.\n    The overall config structure has changed. For simplicity and ease of understanding, the config parameters\n    are grouped according to their high level use case. For example, previously all the distributed config parameters\n    used to reside inside ``model`` config, now it’s been moved to a ``distributed_config`` of its own. Similarly data\n    config is moved out to have clear separation between model and data.\n\n#.\n    Environment variables like ``neuron_cc_flags``  and ``neuron_compile_cache_url`` can be set from the config\n    itself. There is no need to set the environment variables. The rationale is to avoid having to configure training\n    scripts from multiple places.\n\n#.\n    ``Activation Checkpointing:`` NxDT only supports selective and full activation checkpointing. The ``selective``\n    checkpointing is done only for the ``CoreAttention`` block (in case of llama3-8K we recompute the ``MLP``\n    block, too) and ``full`` activation checkpointing is done only at a layer boundary. NxDT doesn’t support\n    config parameters like ``activations_checkpoint_method``, ``activations_checkpoint_num_layers``,\n    ``num_micro_batches_with_partial_activation_checkpoints``, ``activations_checkpoint_layers_per_pipeline``,\n    ``disable_layer_norm_checkpointing``. Please remove these parameters from your config.yaml file.\n\n.. note::\n\n    If you plan to add more modules that need to be recomputed, one would have to override the checkpointing config inside\n    ``ModelModule`` (refer to ``build_model`` API at :ref:`nxdt_developer_guide_integrate_new_model_build_module`)\n    and add the modules that need to be recomputed.\n\n4.\n    ``Tokenizer:`` The tokenizer which used to reside under ``model`` is now moved to ``data``. This is done so that all\n    data related configuration can reside at one place.\n\n#.\n    ``accumulate_grad_batches:`` This param is removed since it should always be 1. 
Gradient accumulation is handled by\n    setting the global_batch_size and micro_batch_size along with data-parallel degree.\n\n#.\n    ``pre_process and post_process:``: These two parameters were added to the model to decide if the embedding lookup\n    needs to be added at the start and if a ``pooler`` layer needs to be added at the end. This has been set by default\n    for all decoder models and hence the config param is no longer exposed.\n\n#.\n    ``Mixed precision config:`` NxDT no longer exposes NeMo mixed precision parameters: ``native_amp_init_scale``,\n    ``native_amp_growth_interval``, ``hysteresis``, ``fp32_residual_connection``, ``fp16_lm_cross_entropy``. All these\n    parameters are specific to the GPU mixed precision strategy, which Neuron doesn’t support, or they are not\n    applicable. Neuron has a different way to enable mixed precision training through ``master_weights`` and\n    ``fp32_grad_accumulation``.\n\n\n#.\n    ``megatron_amp_o2:`` This parameter is not supported.\n\n#.\n    ``Fusions:`` Neuron doesn’t support fusion parameters like ``grad_div_ar_fusion``, ``gradient_accumulation_fusion``,\n    ``bias_activation_fusion``, ``bias_dropout_add_fusion``, ``masked_softmax_fusion``. All of these fusions are built\n    for GPU and require CUDA kernels which cannot run on Trn1. Neuron would have its own set of kernels and when we\n    support them, we would enable those parameters from the config.\n\n.. note::\n\n    If there is a need to support these configs, please create a feature request with exact needs and we shall work on it.\n\nFor detailed mapping, please check the :ref:`nxdt_nnm_nxdt_config_mapping`.\n\nModel code\n----------\n\nThere are the following differences in the model code:\n\n#.\n    NNM used `Apex <https://github.com/NVIDIA/apex/tree/master>`_ to get all the distributed parallel layers and schedules.\n    Since NxDT uses NxD as the base library, all the\n    `parallel layers/parallel state <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#parallel-layers>`_\n    are coming from NxD. Eg. `apex.parallel_state <https://github.com/NVIDIA/apex/blob/master/apex/transformer/parallel_state.py>`_\n    is replaced with\n    `nxd.parallel_layers.parallel_state <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#parallel-model-state>`_.\n\n#.\n    NNM explicitly creates a module for each pipeline-parallel (PP) rank, however, NxDT uses NxD which does the\n    partitioning under the hood. Hence, users no longer have to worry about creating a rank specific module.\n    They can create one single model and\n    `NxD’s PP wrapper <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#neuron-distributed-pipeline-model>`_\n    takes care of sharding for each PP rank. Hence, all the code related to pipeline parallelism inside model\n    code is removed. The model code assumes there is no PP and just uses TP layers from NxD.\n\n.. note::\n    For the tracer to work efficiently, we configure the pipeline parallel config inside the ``BaseModelModule`` class inside\n    ``lightning_modules/model``.\n\n3.\n    In NNM, megatron module had to explicitly handle gradient reduction for shared weights across PP ranks. 
In NxDT,\n    since we are using NxD’s PP wrapper, all that is handled for the user.\n\n#.\n    For activation checkpointing, NNM had explicit recompute functions which handled the\n    `custom forward API <https://github.com/aws-neuron/neuronx-nemo-megatron/blob/main/nemo/nemo/collections/nlp/modules/common/megatron/transformer.py>`_.\n    With NxDT, `NxD’s Activation Checkpoint wrapper <https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/utils/activation_checkpoint.py>`_\n    handles the recompute of the modules. Users just have to configure the ``activation_checkpoint_config`` inside\n    ``nxd_config``\n    `here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#initialize-nxd-config>`__.\n\n\nCheckpointing Save/Load\n-----------------------\n\nNxDT supports all the checkpointing features which NNM supports. This includes async checkpointing, auto-resume, etc.\nThere are some differences in the format of the checkpoint. This is because NxDT uses\n`NxD’s checkpoint api <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#save-checkpoint>`_.\nThe key differences are listed below:\n\n#.\n    NNM combines the model weights, optimizers and other state_dicts into a single ``state_dict`` and dump a file\n    of the format: ``tp_rank_0*_pp_rank_00*/model_optim_rng.ckpt``. However, with NxDT, we save the model ``state_dict``\n    and the optimizer separately. The model ``statedict`` is saved in a folder of the form:\n    ``model/dp_rank_00_tp_rank_00_pp_rank_00.pt`` and the optimizer is saved into a separate folder as:\n    ``optim/dp_rank_00_tp_rank_00_pp_rank_00.pt``. This is mainly done so that when we use\n    `zero1 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html?highlight=zero1#neuron-zero1-optimizer>`_,\n    each DP rank can save its own optimizer shard.\n\n#.\n    In NNM, if we are using pipeline parallelism, each pipeline stage creates an independent model. So lets say we have\n    a model with 32 layers and we use PP=4, then NNM would create 4 chunks with layers 0-7. So each PP rank would have\n    ``model_state_dict`` with keys going from layer-0-7. However, in NxDT, the model is created as a whole and then\n    sharded. So the layer numbers are preserved.\n\n#.\n    There are checkpoint conversion scripts provided under ``examples/`` of NxDT repository to convert the existing NNM\n    checkpoints to NxDT format in case of migrating in the middle of training.\n\n.. code-block:: shell\n\n    python nnm_nxdt_ckpt_converter.py --tp 8 --pp 4 --n_layers 32 --nnm_ckpt_path {path_to_ckpt}/ckpt/nnm --nxdt_ckpt_path {path_to_ckpt}/nnm-converted-nxdt-ckpt/ --enable_parallel_processing True --num_parallel_processes 8\n\n.. _nxdt_nnm_nxdt_config_mapping:\n\nConfig Mapping\n--------------\n\nHere is a detailed mapping for all the parameters in the config file. For the below mapping, we chose the\nLlama-7B example across NNM and NxDT frameworks. The same mapping is also true for other models.\n\n.. csv-table::\n   :file: nnm_nxdt_mapping.csv\n   :header-rows: 1\n   :widths: 20, 20, 40\n\n.. note::\n\n    For parameters that are not supported by NxDT, please create a feature request with specific use-case\n    for the parameter, if needed.\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/nemo_nxdt_mapping.csv",
    "content": "﻿NeMo param,NxDT param mapping,Comments\r\nname,name,\r\nrestore_from_path,Not supported,\r\n**trainer**,,\r\ndevices,devices,\r\nnum_nodes,num_nodes,\r\naccelerator,Not required,\"We made the default as TPU which maps to Neuron internally, so users no longer have to add it.\"\r\nprecision,replaced by ``precision_config``,There is a separate `precision` config to control the precision of model and optimizer.\r\nlogger,Not required,\"We set default value of logger to False.\"\r\nenable_checkpointing,Separate ``exp_manager`` config,\"All checkpointing is controlled by exp_manager config.\"\r\nuse_distributed_sampler,Not supported,\r\nmax_epochs,max_epochs,\r\nmax_steps,max_steps,\r\nlog_every_n_steps,log_every_n_steps,\r\nval_check_interval,val_check_interval,\r\nlimit_val_batches,limit_val_batches,\r\nlimit_test_batches,limit_test_batches,\r\naccumulate_grad_batches,Removed,\"This is automatically configured based on global_batchsize, micro-batchsize and distributed config.\"\r\ngradient_clip_val,gradient_clip_val,\r\nbenchmark,Not supported,\r\nenable_model_summary,Not supported,\r\n**exp_manager**,,\r\nlog_local_rank_0_only,log_local_rank_0_only,\r\ncreate_tensorboard_logger,create_tensorboard_logger,\r\nexplicit_log_dir,explicit_log_dir,\r\nexp_dir,exp_dir,\r\nname,name,\r\ncreate_wandb_logger,Not supported,\"This was not supported under NNM, either. We have removed this argument from NxDT.\"\r\nwandb_logger_kwargs,Not supported,\r\nresume_if_exists,resume_if_exists,\r\nresume_ignore_no_checkpoint,resume_ignore_no_checkpoint,\r\ncreate_checkpoint_callback,create_checkpoint_callback,\r\ncheckpoint_callback_params,checkpoint_callback_params,\r\n**model**,,\r\nmcore_gpt,Not supported,NxDT has its own implementation of megatron_gpt_model which is based on v1.14 version of NeMo\r\ntensor_model_parallel_size,``distributed_strategy.tensor_model_parallel_size``,All the parallelism config are moved to distributed_strategy config\r\npipeline_model_parallel_size,``distributed_strategy.pipeline_model_parallel_size``,\r\nvirtual_pipeline_model_parallel_size,``distributed_strategy.virtual_pipeline_model_parallel_size``,\r\nsequence_parallel,``distributed_strategy.sequence_parallel``,\r\nmicro_batch_size,``data.micro_batch_size``,All the dataset/dataloader/tokenizer configuration are now part of a separate config called data\r\nglobal_batch_size,``data.global_batch_size``,\r\ntokenizer,``data.tokenizer``,\r\ndata,Moved to ``data`` at the same level as model,\"The entire ``data`` key now controls a ``DataModule`` and is placed at the same level as ``model`` key in the config structure.\"\r\nencoder_seq_length,encoder_seq_length,\r\nmax_position_embeddings,max_position_embeddings,\r\nmake_vocab_size_divisible_by,make_vocab_size_divisible_by,\r\npre_process,Not supported,NxDT by default adds embedding layer at the start of the transformer block.\r\npost_process,Not supported,NxDT by default adds a LM-head at the end of the transformer block.\r\npersist_layer_norm,persist_layer_norm,\r\nshare_embeddings_and_output_weights,share_embeddings_and_output_weights,\r\nposition_embedding_type,position_embedding_type,\r\nrotary_percentage,rotary_percentage,\r\ntransformer_block_type,transformer_block_type,\r\nhas_bias,has_bias,\r\nnum_query_groups,Not required,query group attention can be configured using ``num_kv_heads`` parameter.\r\nnative_amp_init_scale,Not Required,\r\nnative_amp_growth_interval,Not Required,\"GPU optimizations which were not supported in NNM, have been removed from NxDT. 
Most of these fusion ops, the neuron compiler handles on its own. For Attention and Softmax, Neuron uses NKI kernels and custom ops to implement them\"\r\nhysteresis,Not Required,\r\nfp32_residual_connection,Not Required,\r\nfp16_lm_cross_entropy,Not Required,\r\nmegatron_amp_O2,Not Required,\r\ngrad_div_ar_fusion,Not Required,\r\ngradient_accumulation_fusion,Not Required,\r\nbias_activation_fusion,Not Required,\r\nbias_dropout_add_fusion,Not Required,\r\nmasked_softmax_fusion,``fusions.softmax``,\r\nseed,seed is moved out of model and at the same level as model,\r\nresume_from_checkpoint,``exp_manager.resume_from_checkpoint``,\r\nuse_cpu_initialization,use_cpu_initialization,\r\nonnx_safe,Not supported,\"This was not supported under NNM too, we have removed this argument from NxDT.\"\r\napex_transformer_log_level,Not supported,\r\ngradient_as_bucket_view,Not supported,\r\nsync_batch_comm,Not supported,\r\nactivations_checkpoint_granularity,activations_checkpoint_granularity,By default NxDT checkpoints attention module in case of selective and a single layer in case of full checkpointing.\r\nactivations_checkpoint_method,Not supported,\r\nactivations_checkpoint_num_layers,Not supported,\r\nnum_micro_batches_with_partial_activation_checkpoints,Not supported,\r\nactivations_checkpoint_layers_per_pipeline,Not supported,\r\ndisable_layer_norm_checkpointing,Not supported,\r\ntransformer_engine,Not supported,This is specifically built for NVIDIA GPUs.\r\nfp8,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nfp8_e4m3,Not supported,\r\nfp8_hybrid,Not supported,\r\nfp8_margin,Not supported,\r\nuse_emha,Not supported,\r\nnsys_profile,Not supported,This is specifically built for NVIDIA GPUs.\r\noptim,optim,"
  },
  {
    "path": "libraries/nxd-training/developer_guides/new_dataloader_guide.rst",
    "content": ".. _nxdt_developer_guide_integrate_new_dataloader:\n\nIntegrating a new dataset/dataloader\n====================================\n\nIn this section, we showcase how to integrate a new dataset/dataloader with the library.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nBuilding Dataset module\n-----------------------\n\nOne can use the guide on `PyTorch docs <https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class>`_\nto create a ``Dataset`` class.\n\nBuilding DataModule\n-------------------\n\nTo configure the dataloader, one needs to create a ``DataModule`` class. Neuronx Distributed Training library provides\na ``BaseDataModule`` which one can use to implement their new ``DataModule``. Create a new file called\n``new_data_module.py`` and add the following content.\n\n.. code-block:: python\n\n    from neuronx_distributed_training.lightning_modules.data.base import BaseDataModule\n\n    class NewDataModule(BaseDataModule):\n        def __init__(self, cfg, trainer):\n            \"\"\"\n            DataModule class for configuring the dataset/dataloader\n\n            Args:\n                cfg: `data` cfg in the yaml file.\n                trainer: PyTorch-Lightning trainer.\n            \"\"\"\n            super().__init__(cfg, trainer)\n            # Users can use the cfg argument to pass down\n            # arguments from the yaml file to the DataModule.\n\n\n        def get_batch_length(self, batch):\n            \"\"\"\n            Returns the length of the batch.\n            \"\"\"\n            return len(batch[\"input_ids\"])\n\n        def process_global_batch(self, global_batch, global_batch_size=None):\n            \"\"\" Any custom processing of batches can be done here.\n\n            Args:\n                global_batch: list of inputs, eg.[tokens, labels]\n                global_batch_size: Length of tokens and labels\n            \"\"\"\n            return global_batch\n\n        def train_dataloader(self):\n            \"\"\"\n            This API should return a torch.utils.data.dataloader.DataLoader object\n            \"\"\"\n            ...\n\n        def val_dataloader(self):\n            \"\"\"\n            This API should return a torch.utils.data.dataloader.DataLoader object\n            \"\"\"\n            ...\n\n        def test_dataloader(self):\n            \"\"\"\n            This API should return a torch.utils.data.dataloader.DataLoader object\n            \"\"\"\n            ...\n\n\nPlug into ``training.py``\n#########################\n\nOnce the new data module is created, we can then plug this into the ``training.py`` script under ``examples``\nfolder. We can modify the ``training.py`` script as follows:\n\n.. code-block:: python\n\n    ...\n    # Assuming we are using the same ModelModule we used for LLama example.\n    from new_data_module import NewDataModule\n    data_module = NewDataModule(cfg, trainer)\n    model = HFLLamaModule(cfg, trainer)\n\n    trainer.fit(model, datamodule=data_module)\n\n\nThe rest of the code can remain the same. The trainer will now use the ``NewDataModule`` for fetching the\n``dataloader`` and run e2e training.\n\nCreate config file\n###################\n\nNext, we can create a config file under ``conf`` to be used for this new dataloader. We can start with a copy of\n``hf_llama_7B_config.yaml``. Let's call this config file ``my_new_config.yaml``. We can edit the ``data`` key\nto configure the ``DataModule``\n\n.. 
note::\n\n    For the model, we are using the same model that the llama example is using. To configure\n    a new model, please check the\n    :ref:`nxdt_developer_guide_integrate_new_model` section.\n\nLaunching e2e training\n######################\n\nWe can now launch training using the new ``data_module``. This can be done using the following command:\n\n.. code-block:: shell\n\n    CONF=my_new_config.yaml ./train.sh\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/new_model_guide.rst",
    "content": ".. _nxdt_developer_guide_integrate_new_model:\n\nIntegrating a New Model\n==========================\n\nThe NeuronX Distributed Training library is a modular framework that allows users to integrate\ntheir new modules with the framework while still utilizing the other modules provided by the\nlibrary. In this section, we showcase how to integrate a new model with the library.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nModel Building (torch.nn.Module)\n--------------------------------\n\nUsers can create a torch.nn.Module using the tensor-parallel APIs provided by the\n`NeuronxDistributed <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index.html>`_\nlibrary. Let’s take an example of the\n`GPT-NeoX model built inside NxD examples <https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py>`_.\nWe can copy the model file and treat it as a new model to onboard using the framework.\n\n.. note::\n\n    To understand more about how to build models using Tensor-parallel APIs check the\n    `Developer guide here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html#creating-model>`_.\n\n\nModel Integration\n-----------------\n\nOnce we have built the model, the next step is to integrate with the training framework. This can be done\nusing the following steps:\n\n.. _nxdt_developer_guide_integrate_new_model_build_module:\n\nBuild a `Lightning Module <https://lightning.ai/docs/pytorch/stable/common/lightning_module.html>`_\n####################################################################################################\n\nNeuronx Distributed Training framework provides a ``BaseModelModule`` that implements the majority of the training\nAPIs. Users can subclass this base module and implement few APIs that set up the model. Here is an example to\nsetup the GPT-NeoX model example. Create a new file called ``new_model_module.py`` and add the following content.\n\n.. 
code-block:: python\n\n    from transformers import GPTNeoXConfig\n    import neuronx_distributed as nxd\n    from neuronx_distributed.parallel_layers.layer_norm import LayerNorm\n    from neuronx_distributed_training.lightning_modules.model.base import BaseModelModule\n    from neuronx_distributed_training.utils.model_utils import get_param_groups_by_weight_decay\n    # GPTNeoXLayerNxD and GPTNeoXMLPNxD are imported as well since they are used below\n    # to configure pipeline parallelism and activation checkpointing.\n    from modeling_gpt_neox_nxd import GPTNeoXForCausalLMNxD, GPTNeoXLayerNxD, GPTNeoXMLPNxD\n\n    class MyNewModel(BaseModelModule):\n\n        def _get_model(self):\n            model_name = \"EleutherAI/gpt-neox-20b\"\n            config = GPTNeoXConfig.from_pretrained(model_name)\n            config.use_cache = False\n            # Note: We can modify the model by reading parameters from self.config.model.\n            # We would have to expose those configs in self.config.model accordingly.\n            # A couple of examples are shown here, where we have exposed num_layers and hidden_size.\n            if self.config.model.get('num_layers', -1) != -1:\n                config.num_hidden_layers = self.config.model.get('num_layers')\n            if self.config.model.get('hidden_size', -1) != -1:\n                config.hidden_size = self.config.model.get('hidden_size')\n            # This is because the GPT-NeoX implementation requires this in the config.\n            config.sequence_parallel_enabled = self.config.distributed_strategy.get(\"sequence_parallel\", False)\n            return GPTNeoXForCausalLMNxD(config)\n\n        def build_model(self):\n            # This API is where we build the model object, and return the model.\n            # However, in addition to returning the model, users need to\n            # configure the nxd config too for pipeline parallelism and\n            # activation checkpointing. Here is an example:\n            if self.config.model.get(\"activations_checkpoint_granularity\", None) == \"selective\":\n                # Here, just to showcase how to recompute modules, we are using\n                # GPTNeoXMLPNxD; users can add their own custom modules\n                self.nxd_config[\"activation_checkpoint_config\"] = GPTNeoXMLPNxD\n            elif self.config.model.get(\"activations_checkpoint_granularity\", None) == \"full\":\n                self.nxd_config[\"activation_checkpoint_config\"] = \"full\"\n\n            # Read more about configuring pipeline parallel config here:\n            # https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pp_developer_guide.html#pp-developer-guide\n            self.nxd_config[\"pipeline_config\"].update(\n                {\n                    \"transformer_layer_cls\": GPTNeoXLayerNxD,\n                    \"output_loss_value_spec\": (True, False),\n                    \"input_names\": [\"input_ids\", \"attention_mask\", \"labels\"],\n                    \"leaf_module_cls\": [LayerNorm.__name__],\n                }\n            )\n            return nxd.initialize_parallel_model(self.nxd_config, self._get_model)\n\n        def setup_optimizer_param_groups(self):\n            # Depending on what weight decay we need, users can configure\n            # the params groups accordingly.\n            no_decay = [\"bias\"]\n            if self.config.model.get(\"do_layer_norm_weight_decay\", False):\n                no_decay.append(\"LayerNorm\")\n            self._optimizer_param_groups = get_param_groups_by_weight_decay(self.model, no_decay)\n\n        def init_weights(self, module):\n            \"\"\"\n            This API is mainly to tell the framework how each 
layer needs\n            to be initialized. This is required because NxD's PP API would\n            use this to initialize the layers after model partition.\n            Any layer that is unique to the model needs to be added here.\n            \"\"\"\n            if isinstance(module, LayerNorm):\n                module.weight.data.fill_(1.0)\n            # The BaseModelModule already initializes the ColumnParallel, RowParallel\n            # ParallelEmbedding layers.\n            super().init_weights()\n\n\nPlug into ``training.py``\n#########################\n\n\nOnce the new model is created, we can then plug this into the ``training.py`` script under ``examples`` folder.\nWe can modify the ``training.py`` script as follows:\n\n.. code-block:: python\n\n    ...\n    # Assuming we are using the same DataModule we used for LLama example.\n    data_module = HFDataModule(cfg, trainer)\n    from new_model_module import MyNewModel\n    model = MyNewModel(cfg, trainer)\n\n    trainer.fit(model, datamodule=data_module)\n\nThe rest of the code can remain the same. The trainer will now use the ``MyNewModel`` for fetching the\n``model`` code and run e2e training.\n\nCreate config file\n###################\n\nNext we can create a config file under ``conf`` to be used for this new model. We can start with a copy of\n``hf_llama_7B_config.yaml``. Let's call this config file ``my_new_config.yaml``. We can remove the key\n``model.model_config`` as we are not using it inside our ``MyNewModel``. We can edit the\n``distributed_strategy`` config depending on what we need.\n\n.. note::\n\n    For the dataset, we are using the same dataset that the llama example is using. To configure\n    a new dataset, please check the\n    :ref:`nxdt_developer_guide_integrate_new_dataloader` section\n\nLaunching e2e training\n######################\n\nWe can now launch training using the new model. This can be done using the following command:\n\n.. code-block:: shell\n\n    CONF=my_new_config.yaml ./train.sh\n"
  },
  {
    "path": "libraries/nxd-training/developer_guides/nnm_nxdt_mapping.csv",
    "content": "﻿NNM param,NxDT param mapping,Comments\r\nname,name,\r\nrestore_from_path,Not supported,\"This config was not fully supported in NNM, either.\"\r\n**trainer**,,\r\ndevices,devices,\r\nnum_nodes,num_nodes,\r\naccelerator,Not required,\"We made the default as TPU which maps to Neuron internally, so users no longer have to add it.\"\r\nprecision,replaced by ``precision_config``,There is a separate `precision` config to control the precision of model and optimizer.\r\nlogger,Replaced by default,\"We made the NNM logger default in NxDT.\"\r\nenable_checkpointing,Separate ``exp_manager`` config,\"All checkpointing is controlled by exp_manager config.\"\r\nreplace_sampler_ddp,Not supported,\"Had to be always False in NNM, made it default in NxDT. No setting required.\"\r\nmax_epochs,max_epochs,\r\nmax_steps,max_steps,\r\nlog_every_n_steps,log_every_n_steps,\r\nval_check_interval,val_check_interval,\r\nlimit_val_batches,limit_val_batches,\r\nlimit_test_batches,limit_test_batches,\r\naccumulate_grad_batches,Removed,\"This is automatically configured based on global_batchsize, micro-batchsize and distributed config.\"\r\ngradient_clip_val,gradient_clip_val,\r\nbenchmark,Not supported,\r\nenable_model_summary,Not supported,\r\n**exp_manager**,,\r\nlog_local_rank_0_only,log_local_rank_0_only,\r\ncreate_tensorboard_logger,create_tensorboard_logger,\r\nexplicit_log_dir,explicit_log_dir,\r\nexp_dir,exp_dir,\r\nname,name,\r\ncreate_wandb_logger,Not supported,\"This was not supported under NNM, either. We have removed this argument from NxDT.\"\r\nwandb_logger_kwargs,Not supported,\r\nresume_if_exists,resume_if_exists,\r\nresume_ignore_no_checkpoint,resume_ignore_no_checkpoint,\r\ncreate_checkpoint_callback,create_checkpoint_callback,\r\ncheckpoint_callback_params,checkpoint_callback_params,\r\n**model**,,\r\ntensor_model_parallel_size,``distributed_strategy.tensor_model_parallel_size``,\"All the parallelism config are moved to distributed_strategy config.\"\r\npipeline_model_parallel_size,``distributed_strategy.pipeline_model_parallel_size``,\r\nvirtual_pipeline_model_parallel_size,``distributed_strategy.virtual_pipeline_model_parallel_size``,\r\nsequence_parallel,``distributed_strategy.sequence_parallel``,\r\nwrap_with_zero,``distributed_strategy.zero1``,\r\nmicro_batch_size,``data.micro_batch_size``,All the dataset/dataloader/tokenizer configurations are now part of a separate config called data.\r\nglobal_batch_size,``data.global_batch_size``,\r\ntokenizer,``data.tokenizer``,\r\ndata,Moved to ``data`` at the same level as model,\"The entire ``data`` key now controls a ``DataModule`` and is placed at the same level as ``model`` key in the config structure.\"\r\nencoder_seq_length,encoder_seq_length,\r\nmax_position_embeddings,max_position_embeddings,\r\nmake_vocab_size_divisible_by,make_vocab_size_divisible_by,\r\npre_process,Not supported,NxDT by default adds embedding layer at the start of the transformer block.\r\npost_process,Not supported,NxDT by default adds a LM-head at the end of the transformer block.\r\npersist_layer_norm,persist_layer_norm,\r\nshare_embeddings_and_output_weights,share_embeddings_and_output_weights,\r\nposition_embedding_type,position_embedding_type,\r\nrotary_percentage,rotary_percentage,\r\ntransformer_block_type,transformer_block_type,\r\nhas_bias,has_bias,\r\nnative_amp_init_scale,Not required,\r\nnative_amp_growth_interval,Not required,\"GPU optimizations which were not supported in NNM, have been removed from NxDT. 
Most of these fusion ops, the neuron compiler handles on its own. For Attention and Softmax, Neuron uses NKI kernels and custom ops to implement them.\"\r\nhysteresis,Not required,\r\nfp32_residual_connection,Not required,\r\nfp16_lm_cross_entropy,Not required,\r\nmegatron_amp_O2,Not required,\r\ngrad_div_ar_fusion,Not required,\r\ngradient_accumulation_fusion,Not required,\r\nbias_activation_fusion,Not required,\r\nbias_dropout_add_fusion,Not required,\r\nmasked_softmax_fusion,``fusions.softmax``,\r\nseed,Seed is moved out of model and at the same level as ``model``,\r\nresume_from_checkpoint,``exp_manager.resume_from_checkpoint``,\r\nuse_cpu_initialization,use_cpu_initialization,\r\nonnx_safe,Not supported,\"This was not supported under NNM, either. We have removed this argument from NxDT.\"\r\napex_transformer_log_level,Not supported,\r\ngradient_as_bucket_view,Not supported,\r\nsync_batch_comm,Not supported,\r\nlog_parameter_norm,``exp_manager.log_gradient_norm``,\r\nlog_gradient_norm,``exp_manager.log_gradient_norm``,\r\nflexible_pipeline_parallel_stages,Not supported,\r\nactivations_checkpoint_granularity,activations_checkpoint_granularity,\"Currently, NxDT checkpoints the attention module in case of selective and a single layer in case of full checkpointing.\"\r\nactivations_checkpoint_method,Not supported,\r\nactivations_checkpoint_num_layers,Not supported,\r\nnum_micro_batches_with_partial_activation_checkpoints,Not supported,\r\nactivations_checkpoint_layers_per_pipeline,Not supported,\r\ndisable_layer_norm_checkpointing,Not supported,\r\nzero_use_master_weight,Supported via precision config,See :ref:`manual precision config<nxdt_config_overview_precision_config>`.\r\nzero_use_fp32_grad_accum,Supported via precision config,See :ref:`manual precision config<nxdt_config_overview_precision_config>`.\r\ntransformer_engine,Not supported,This is specifically built for NVIDIA GPUs.\r\nfp8,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nfp8_e4m3,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nfp8_hybrid,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nfp8_margin,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nuse_emha,Not supported,fp8 training is not supported on Neuron (both NNM and NxDT).\r\nconvert_to_hf,Supported via separate script,\r\nnsys_profile,Not supported,This is specifically built for NVIDIA GPUs.\r\noptim,optim,\r\nenable_recovery_time_instrumentation,``exp_manager.enable_recovery_time_instrumentation``,\r\nasync_checkpointing,``exp_manager.async_checkpointing``,"
  },
  {
    "path": "libraries/nxd-training/developer_guides/optimizer_lr_scheduler_flow.rst",
    "content": ".. _nxdt_developer_flow_register_optimizer_lr_scheduler:\n\nRegistering an optimizer and LR scheduler\n=========================================\n\nA new optimizer or LR scheduler can be registered with the framework and enabled from the config.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nSetting up the optimizer\n------------------------\n\nOne can write their own optimizer class. One such example is the\n`AdamW_FP32OptimParams <https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/utils/adamw_fp32_optim_params.py>`_.\n\nThe inputs to the optimizer can be exposed in the config YAML file. To do this, we need to create a ``Params`` class\nas shown below:\n\n.. code-block:: python\n\n    from dataclasses import dataclass\n    from typing import Any, Dict, Optional, Tuple\n\n    from omegaconf import MISSING\n\n    @dataclass\n    class OptimizerParams:\n        \"\"\"\n        All the params listed below can be configured from the YAML file\n        \"\"\"\n\n        lr: Optional[float] = MISSING\n        betas: Tuple[float, float] = (0.9, 0.999)\n        eps: float = 1e-08\n        weight_decay: float = 0\n        amsgrad: bool = False\n\n\nOnce we create the optimizer and the optimizer params class, we can now register the optimizer with the\nframework using the following code:\n\n.. code-block:: python\n\n    from nemo.core.optim import register_optimizer\n\n    # `adamw_fp32OptState` would be the name in the optim config of the YAML file.\n    register_optimizer(\"adamw_fp32OptState\", AdamW_FP32OptimParams, OptimizerParams)\n\nThis registration can be done inside the ``training.py`` file which resides in ``examples`` folder.\n\nOnce the registration is done, we can now expose the ``OptimizerParams`` under ``optim`` config of the\nYAML file.\n\n\nSetting up the LR scheduler\n---------------------------\n\nOne can write their own LR scheduler and register with the framework. One such example of LR scheduler is\nshown below:\n\n.. code-block:: python\n\n    from functools import partial\n\n    from torch.optim.lr_scheduler import LambdaLR\n    from transformers.optimization import _get_linear_schedule_with_warmup_lr_lambda\n\n\n    class LinearAnnealingWithWarmUp(LambdaLR):\n        def __init__(self, optimizer, warmup_steps, max_steps, last_epoch=-1):\n            lr_lambda = partial(\n                _get_linear_schedule_with_warmup_lr_lambda,\n                num_warmup_steps=warmup_steps,\n                num_training_steps=max_steps,\n            )\n            super().__init__(optimizer, lr_lambda, last_epoch)\n\n\nOnce we build this LR scheduler, we can expose the arguments to the config YAML file. Before that,\nwe need to write up a ``LRSchedulerParams`` class. Here is an example for the same:\n\n.. code-block:: python\n\n    from nemo.core.config.schedulers import SchedulerParams\n\n    class LinearAnnealingWithWarmupParams(SchedulerParams):\n        warmup_steps: int = 0\n        max_steps: int = 0\n\n\nOnce the LR scheduler and the ``SchedulerParams`` class are set, we can now register the scheduler\nwith the framework as below:\n\n.. 
code-block:: python\n\n    from nemo.core.optim.lr_scheduler import register_scheduler\n\n\n    # Here, `LinearAnnealingWithWarmUp` is the name of the scheduler we would use in the config YAML file\n    register_scheduler(\"LinearAnnealingWithWarmUp\", LinearAnnealingWithWarmUp, LinearAnnealingWithWarmupParams)\n\n\nThis registration can be done inside the ``training.py`` file which resides in the ``examples`` folder.\n\nOnce the registration is done, we can now expose the ``LinearAnnealingWithWarmupParams`` under the ``sched`` config\nof the YAML file.\n
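\nFor reference, the registered names can then be selected from the ``optim`` section of the YAML file. The sketch below is illustrative and assumes the ``OptimizerParams`` and ``LinearAnnealingWithWarmupParams`` classes defined above; adjust the parameter values to your needs:\n\n.. code-block:: yaml\n\n    optim:\n        name: adamw_fp32OptState\n        lr: 3e-4\n        weight_decay: 0.01\n        betas:\n        - 0.9\n        - 0.999\n        sched:\n            name: LinearAnnealingWithWarmUp\n            warmup_steps: 100\n            max_steps: ${trainer.max_steps}\n"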
  },
  {
    "path": "libraries/nxd-training/general/config_overview.rst",
    "content": ".. _nxdt_config_overview:\n\nYAML Configuration Settings\n===========================\n\nThe library allows configuring a bunch of parameters in the YAML file to run large scale training.\nThe important categories and parameters are highlighted below. At the top level, we have the following\nkeys:\n\n.. code-block:: yaml\n\n    name:\n        # Name of the experiment\n    model_source:\n        # Model source code, could be megatron or hf\n    seed:\n        # Random seed to be used for the entire experiment\n    trainer:\n        # Settings to configure the PyTorch-Lightning trainer\n    exp_manager:\n        # Settings to configure logging/checkpointing\n    distributed_strategy:\n        # Settings to configure how the model is to be distributed across devices\n    data:\n        # Settings to configure the dataset/dataloader\n    model:\n        # Settings to configure the model architecture and the optimizer\n    precision:\n        # Settings to configure the model precision\n    compiler_flags:\n        # Neuron compiler flags to be used\n    compiler_cache_url:\n        # Cache to be used to save the compiled artifacts\n    aync_exec_max_inflight_requests:\n        # Used to configure the runtime queue\n    bucket_size_collectives:\n        # Collectives are batched into tensors of this size (in MBs)\n    neuron_rt_exec_timeout:\n        # Runtime timeout\n    neuron_experimental_compress_rg:\n        # To use compress replica group\n\n\n.. _nxdt_config_trainer:\n\nTrainer\n-------\n\nNeuronx Distributed Trainer framework is built on top of `PyTorch-Lightning <https://lightning.ai/docs/pytorch/stable/>`_\nand this key allows users to configure the ``trainer``.\n\n.. code-block:: yaml\n\n    devices: 32\n    num_nodes: 1\n    max_epochs: -1\n    max_steps: 20000\n    log_every_n_steps: 1\n    val_check_interval: 20000\n    check_val_every_n_epoch: null\n    num_sanity_val_steps: 0\n    limit_val_batches: 1\n    limit_test_batches: 1\n    gradient_clip_val: 1.0\n    lnc: 2\n    sequential_move_factor: 11\n\n.. note::\n\n    All the above trainer parameters follow the exact same definition of the PyTorch-Lightning Trainer.\n    More information about each of them can be found\n    `here <https://lightning.ai/docs/pytorch/stable/common/trainer.html>`__.\n\n**devices**\n\nNumber of devices to be used for training. If using torchrun, this is equal to ``nproc_per_node * num_nodes``.\n\n    * **Type**: integer\n    * **Required**: True\n\n**lnc**\n\nNeuron-specific setting that specifies the logical-to-physical Neuron Core mapping ratio.\nThis parameter determines the number of physical Neuron cores used for each logical Neuron Core.\n\nValues:\n\n- lnc: 1 - Each node exposes 128 logical devices, with a 1:1 mapping between logical and physical Neuron Cores.\n- lnc: 2 - Implements a 2:1 mapping between logical and physical Neuron Cores.\n\n    * **Type**: integer\n    * **Required**: False\n    * **Default**: None (must be explicitly set)\n\n**num_nodes**\n\nNumber of nodes to be used for training\n\n    * **Type**: integer\n    * **Required**: True\n\n**max_epochs**\n\nMaximum number of epochs to run. A value of ``-1`` means that the number of training steps would be inferred\nfrom ``max_steps``\n\n    * **Type**: integer\n    * **Required**: True\n\n**log_every_n_steps**\n\nHow often to log loss values\n\n    * **Default value**: 1\n    * **Type**: integer\n    * **Required**: True\n\n**val_check_interval**\n\nHow often to run validation step. 
Using this parameter, one can run a validation step after every ``X`` training steps.\n\n    * **Type**: integer\n    * **Required**: True\n\n**check_val_every_n_epoch**\n\nAnother parameter that controls the frequency of the validation step. Using this parameter, one can run a validation\nstep after every ``X`` epochs.\n\n    * **Type**: integer\n    * **Required**: True\n\n**num_sanity_val_steps**\n\nHow many sanity validation steps to run. Setting it to ``0`` skips the sanity validation step at the start of\ntraining.\n\n    * **Type**: integer\n    * **Required**: True\n\n\n**limit_val_batches**\n\nNumber of batches to run the validation step on.\n\n    * **Type**: integer\n    * **Required**: True\n\n\n**gradient_clip_val**\n\nFloat value to clip gradients at.\n\n    * **Type**: float\n    * **Required**: True\n\n\n**sequential_move_factor**\n\nNumber of ranks/devices participating in initializing the model weights in parallel. Useful to reduce init time\nwhen using a TP-PP config. The value can be increased up to the number of ``trainer.devices`` being used.\n\n    * **Default value**: 11\n    * **Type**: integer\n    * **Required**: False\n\n.. _nxdt_config_exptm:\n\nExperiment Manager\n------------------\n\nThis setting is mainly for configuring different aspects of experiment management like checkpointing,\nexperiment logging directory, which parameters to log and how often to log, etc.\n\n\n.. code-block:: yaml\n\n    log_local_rank_0_only: True\n    create_tensorboard_logger: True\n    explicit_log_dir: null\n    exp_dir: null\n    name: megatron_llama\n    resume_if_exists: True\n    resume_ignore_no_checkpoint: True\n    create_checkpoint_callback: True\n    checkpoint_callback_params:\n        monitor: step\n        save_top_k: 1\n        mode: max\n        save_last: False\n        filename: 'megatron_llama--{step}-{consumed_samples}'\n        every_n_train_steps: 200\n        use_master_weights_in_ckpt: False\n    log_parameter_norm: True\n    log_gradient_norm: True\n    enable_recovery_time_instrumentation: False\n    save_xser: True\n    load_xser: True\n    async_checkpointing: False\n    resume_from_checkpoint: null\n\n**log_local_rank_0_only**\n\nLog only on rank 0. The recommended setting is ``True``.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**create_tensorboard_logger**\n\nSetting this to ``True`` logs the loss and other parameters to TensorBoard.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**exp_log_dir**\n\nExplicitly specify the logging directory. Otherwise, the framework saves to the current directory by default.\n\n    * **Type**: str\n    * **Default**: null\n    * **Required**: False\n\n**resume_if_exists**\n\nSet this to ``True`` to resume from an existing checkpoint. This config is useful when we want to\nauto-resume from a failed training job.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n\n**resume_ignore_no_checkpoint**\n\nThe experiment manager errors out if ``resume_if_exists`` is ``True`` and no checkpoint could be found. This\nbehaviour can be disabled by setting ``resume_ignore_no_checkpoint`` to ``True``, in which case ``exp_manager``\nprints a message and continues without restoring.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**checkpoint_callback_params.save_top_k**\n\nHow many checkpoints to keep around. For example, if set to ``1``, only one checkpoint is kept around at any\ngiven time.\n
The framework would automatically keep deleting checkpoints.\n\n    * **Type**: int\n    * **Required**: True\n\n**checkpoint_callback_params.every_n_train_steps**\n\nHow often we want to checkpoint.\n\n    * **Type**: int\n    * **Required**: True\n\n**checkpoint_callback_params.use_master_weights_in_ckpt**\n\nWhether or not to save master weights when checkpointing.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**log_parameter_norm**\n\nSet this to log parameter norm across model parallel ranks.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**log_gradient_norm**\n\nSet this to log gradient norm across model parallel ranks.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**enable_recovery_time_instrumentation**\n\nSet this if you don’t want to default to not printing the detailing timing for recovery.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**save_xser**\n\nSet this to save with torch xla serialization to reduce time saving, it’s recommended to enable ``xser``\nfor significantly faster save/load. Note that if the checkpoint is saved with ``xser``, it can only be\nloaded with ``xser``, vice versa.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**load_xser**\n\nSet this to load with torch xla serialization to reduce time saving, it’s recommended to enable ``xser`` for\nsignificantly faster save/load. Note that if the checkpoint is saved with ``xser``, it can only be loaded\nwith ``xser``, vice versa.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**async_checkpointing**\n\nSet this if you want to use async checkpointing. Under the hood the library uses the async checkpointing\nfeature provided by NeuronxDistributed's\n`save API <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#id3>`_.\n\n    * **Type**: bool\n    * **Default**: False\n    * **Required**: False\n\n**resume_from_checkpoint**\n\nSet this as the checkpoint file to load from. Check the SFT/DPO/ORPO example config under ``conf`` on how to use it.\n\n    * **Type**: str\n    * **Default**: null\n    * **Required**: False\n\n**ckpt_ptl_version**\n\nSet this only if your checkpoint does not contain the pytorch-lightning version in it.\nThis version is the pytorch-lightning version the checkpoint was saved with.\n\n    * **Type**: str\n    * **Default**: \"2.5.0\"\n    * **Required**: False\n\n.. _nxdt_config_distributed_strategy:\n\nDistributed Strategy\n--------------------\n\n.. 
code-block:: yaml\n\n    tensor_model_parallel_size: 8\n    pipeline_model_parallel_size: 1\n    virtual_pipeline_model_parallel_size: 1\n    zero1: True\n    sequence_parallel: True\n    kv_replicator: 4\n\nThis setting allows users to configure the sharding strategy to be used for distributing the model across\nworkers.\n\n**tensor_model_parallel_size**\n\n`Tensor parallel degree <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#initialize-model-parallelism>`_\nto be used for sharding models.\n\n    * **Type**: int\n    * **Required**: True\n\n**pipeline_model_parallel_size**\n\n`Pipeline parallel degree <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#initialize-model-parallelism>`_\nto be used for sharding models.\n\n    * **Type**: int\n    * **Required**: True\n\n**virtual_pipeline_model_parallel_size**\n\n`Interleaved pipeline parallel degree <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#neuron-distributed-pipeline-model>`_.\nUse a value of 1 if no pipeline parallelism is used.\n\n    * **Type**: int\n    * **Required**: True\n\n**context_parallel_size**\n\nContext parallel degree to be used for sharding sequence. When\ncontext_parallel_size is greater than 1,\n``fusions.ring_attention`` must be set to ``True``.\n\n    * **Type**: int\n    * **Required**: False\n    * **Default**: 1\n\n**zero1**\n\nWraps the optimizer with zero1.\n\n    * **Type**: bool\n    * **Required**: True\n\n**sequence_parallel**\n\nTo shard along the sequence dimension. Sequence Parallel is always used in conjuction with tensor parallel.\nThe sequence dimension will be sharded with the same degree as the ``tensor_model_parallel_size``.\n\n    * **Type**: bool\n    * **Required**: True\n\n**kv_replicator**\n\nThis parameter is used together with ``qkv_linear`` parameter. It is used to configure the\n`GQAQKVLinear module <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#gqa-qkv-linear-module>`_\n\n    * **Type**: bool\n    * **Required**: True\n\n.. _nxdt_config_data:\n\nData\n----\n\nThis is where we configure the dataset/dataloader. This config is dependent on the dataloader/dataset been\nused. Users can add custom keys in this config and read inside the ``CustomDataModule`` using ``cfg.data``.\nCurrently the library adds support for 3 kinds of data modules: ``MegatronDataModule``, ``ModelAlignmentDataModule``\nand ``HFDataModule``. To learn about the config parameters of ``MegatronDataModule`` please check the\n``megatron_llama_7B_config.yaml``, for ``ModelAlignmentDataModule`` check the ``megatron_llama2_7B_SFT_config.yaml``\nand for ``HFDataModule``, refer to ``hf_llama3_8B_config.yaml``.\n\nThe parameters that are common across all the configs are documented below.\n\n.. code-block:: yaml\n\n    micro_batch_size: 1\n    global_batch_size: 1024\n\n\n**micro_batch_size**\n\nThe batch is distributed across multiple data parallel ranks and within each rank, we accumulate gradients.\nMicro batch size is the size that is used for each of those gradient calculation steps.\n\n    * **Type**: int\n    * **Required**: True\n\n**global_batch_size**\n\nThis config along with micro-batchsize decides the gradient accumulation number automatically.\n\n    * **Type**: int\n    * **Required**: True\n\n.. _nxdt_config_model:\n\nModel\n-----\n\nThis is where we can configure the model architecture. 
When building custom models, this config can be\nused to parameterize the custom model. The below parameters are taken from an example of the Megatron\nmodel config. Depending on the model and required parameters, this config can change.\n\nHF Model\n########\n\nLet's start with the config for the HF model:\n\n.. code-block:: yaml\n\n    # model architecture\n    model_config: /home/ubuntu/config.json\n    encoder_seq_length: 4096\n    max_position_embeddings: ${.encoder_seq_length}\n    num_layers: 4\n    hidden_size: 4096\n    qkv_linear: False\n\n    # Miscellaneous\n    use_cpu_initialization: True\n\n    ## Activation Checkpointing\n    activations_checkpoint_granularity: selective\n    activations_checkpoint_recompute: [CoreAttention]\n\n    fusions:\n        softmax: True\n        flash_attention: False\n\n    do_layer_norm_weight_decay: False\n\n    optim:\n        name: adamw_fp32OptState\n        lr: 3e-4\n        weight_decay: 0.01\n        capturable: False\n        betas:\n        - 0.9\n        - 0.999\n        sched:\n            name: LinearAnnealingWithWarmUp\n            warmup_steps: 100\n            max_steps: ${trainer.max_steps}\n\n**model_config**\n\nPoints to the ``config.json`` path required by the ``transformers`` model implementation. One such example of\n``config.json`` is `here <https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/training/llama/tp_zero1_llama_hf_pretrain/7B_config_llama2/config.json>`__\n\n    * **Type**: str\n    * **Required**: True\n\n**encoder_seq_length**\n\nSetting the sequence length for the training job. This parameter is common for all models supported in the library.\n\n    * **Type**: int\n    * **Required**: True\n\n**num_layers**\n\nThis config will override the number of layers inside the ``config.json`` in the ``model_config``. This is exposed\nso that one can quickly increase/decrease the size of the model. This parameter is common for all models supported\nin the library.\n\n    * **Type**: int\n    * **Required**: True\n\n**hidden_size**\n\nThis config will override the ``hidden_size`` inside the ``config.json`` in the ``model_config``. This parameter\nis common for all models supported in the library.\n\n    * **Type**: int\n    * **Required**: True\n\n**qkv_linear**\n\nThis needs to be set if users want to use the\n`GQAQKVLinear module <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#gqa-qkv-linear-module>`_\n\n    * **Type**: bool\n    * **Required**: True\n\n**fuse_qkv**\n\nThis is set if users want to use fused q, k and v tensors in\n`GQAQKVLinear module <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide-training.html#gqa-qkv-linear-module>`_ Using fuse_qkv can improve throughput. \nThis parameter is True by default.\n\n    * **Type**: bool\n    * **Required**: False\n\n**transpose_nki_inputs**\n\nThis is set if users want to transpose the inputs to NKI FlashAttention function. To be used only when\n``fusions.flash_attention`` is ``True``. Using ``transpose_nki_inputs`` with ``fusions.flash_attention``\ncan improve throughput. 
This parameter is True by default for all models, unless used otherwise.\n\n    * **Type**: bool\n    * **Required**: False\n\n**pipeline_cuts**\n\nThis is set as a list of layer names if users want to specify manual cut points for pipeline parallelism.\nOne example is ['model.layers.10', 'model.layers.20'] in the case of PP=3.\n\n    * **Type**: List[str]\n    * **Required**: False\n\n.. note::\n    When using this param, the number of pipeline cuts should always be ``pipeline_model_parallel_size-1``.\n\n**use_cpu_initialization**\n\nSetting this flag to ``True`` will initialize the weights on ``CPU`` and then move to device. It is recommended to set\nthis flag to ``True``. This parameter is common for all models supported in the library.\n\n    * **Type**: bool\n    * **Required**: True\n\n**activations_checkpoint_granularity**\n\nThis flag controls which module needs to be recomputed during the backward pass.\n\nValues:\n\n- ``selective`` - Enables selective recomputation of specified\n                modules in `activations_checkpoint_recompute` during the backward pass.\n- ``full`` - Saves activations at layer boundaries and recomputes the entire layer during the backward pass.\n- ``null`` - Disables activation checkpointing.\n\nMore information on activation recompute can be found\n`in this link <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html#activation-recomputation>`_.\nThis parameter is common for all models supported in the library.\n\n    * **Type**: str\n    * **Possible Values**: ``selective``, ``full``, ``null``\n    * **Required**: True\n\n**activations_checkpoint_recompute**\nThis config specifies which modules to recompute when using ``selective`` activation checkpointing.\nIt accepts a list of module names as strings or `null`.\n\n    * **Type**: list[str] or `null`\n    * **Required**: False\n\n**fusions.softmax**\n\nSetting this flag to ``True`` will replace the ``torch.nn.Softmax`` with a fused custom ``Softmax`` operator. This\nparameter is common for all models supported in the library.\n\n    * **Type**: bool\n    * **Required**: True\n\n**fusions.flash_attention**\n\nSetting this flag to ``True`` will insert the flash attention module for both forward and backward. This parameter is\ncommon for all models supported in the library.\n\n    * **Type**: bool\n    * **Required**: True\n\n**fusions.ring_attention**\n\nSetting this flag to ``True`` will use the ring attention module for\nboth forward and backward.\nThis parameter must be true when ``context_parallel_size``\nis greater than 1.\n\n    * **Type**: bool\n    * **Required**: False\n\n**fusions.do_layer_norm_weight_decay**\n\nSetting this flag to ``True`` will add layer norm weight decay. This parameter is common for all models supported in\nthe library.\n\n    * **Type**: bool\n    * **Required**: True\n\n**optim**\n\nThis is where the optimizers can be set. We can configure the optimizers\nsupported by ``NeMo``. All the optimzers can be configured according to the\n`parameters specified here <https://github.com/NVIDIA/NeMo/blob/v1.14.0/nemo/core/config/optimizers.py>`__.\n\n    * **Type**: config\n    * **Possible Values**: ``adamw``, ``adamw_fp32OptState``, ``sgd``, ``adam``, ``adadelta``, ``adamax``,\n    *  ``adagrad``, ``rmsprop``, ``rprop``, ``novograd``, ``adafactor``\n    * **Required**: True\n\n**optim.sched**\n\nThis is where the LR schedulers can be set. We can configure the schedulers\nsupported by ``NeMo``. 
All the schedulers can be configured according to the\n`parameters specified here <https://github.com/NVIDIA/NeMo/blob/v1.14.0/nemo/core/config/schedulers.py>`__.\n\n    * **Type**: config\n    * **Possible Values**: ``LinearAnnealingWithWarmUp``, ``CosineAnnealing``, ``WarmupPolicy``,\n      ``WarmupHoldPolicy``, ``SquareAnnealing``, ``NoamAnnealing``, ``WarmupAnnealing``,\n      ``StepLR``, ``rprop``, ``ExponentialLR``\n    * **Required**: True\n\nMegatron Model\n##############\n\nThe library enables a\n`megatron transformer <https://github.com/NVIDIA/NeMo/blob/v1.14.0/nemo/collections/nlp/models/language_modeling/megatron/gpt_model.py>`_\nmodel which can be configured from the YAML file. The different available parameters are documented below after\nthe following reference example.\n\n.. code-block:: yaml\n\n    # model architecture\n    encoder_seq_length: 4096\n    max_position_embeddings: ${.encoder_seq_length}\n    num_layers: 32\n    hidden_size: 4096\n    ffn_hidden_size: 11008\n    num_attention_heads: 32\n    num_kv_heads: 32\n    init_method_std: 0.021\n    hidden_dropout: 0\n    attention_dropout: 0\n    ffn_dropout: 0\n    apply_query_key_layer_scaling: True\n    normalization: 'rmsnorm'\n    layernorm_epsilon: 1e-5\n    do_layer_norm_weight_decay: False # True means weight decay on all params\n    make_vocab_size_divisible_by: 8 # Pad the vocab size to be divisible by this value for computation efficiency.\n    persist_layer_norm: True # Use of persistent fused layer norm kernel.\n    share_embeddings_and_output_weights: False # Untie embedding and output layer weights.\n    position_embedding_type: 'rope' # Position embedding type. Options ['learned_absolute', 'rope']\n    rotary_percentage: 1 # If using position_embedding_type=rope, then the per head dim is multiplied by this.\n    activation: 'swiglu' # ['swiglu', 'gelu']\n    has_bias: False\n    # Miscellaneous\n    use_cpu_initialization: True\n\n    ## Activation Checkpointing\n    activations_checkpoint_granularity: selective # 'selective' or 'full'\n\n    fusions:\n        softmax: True\n        flash_attention: False # Use NKI flash attention\n\n    optim:\n        name: adamw\n        lr: 3e-4\n        weight_decay: 0.1\n        capturable: True\n        betas:\n        - 0.9\n        - 0.95\n        sched:\n            name: CosineAnnealing\n            warmup_steps: 2000\n            constant_steps: 0\n            min_lr: 3.0e-5\n\n.. note::\n\n    For common config, please refer to the ``HF Model`` section above.\n\n**ffn_hidden_size**\n\nTransformer FFN hidden size.\n\n    * **Type**: int\n    * **Required**: True\n\n**num_attention_heads**\n\nNumber of ``Q`` attention heads.\n\n    * **Type**: int\n    * **Required**: True\n\n**num_kv_heads**\n\nNumber of ``KV`` heads.\n
This is where we can configure ``Q`` and ``KV`` differently to create ``GQA`` modules.\n\n    * **Type**: int\n    * **Required**: True\n\n**init_method_std**\n\nStandard deviation to use when we init layers of the transformer model.\n\n    * **Type**: float\n    * **Required**: True\n\n**hidden_dropout**\n\nDropout probability for hidden state transformer.\n\n    * **Type**: float\n    * **Required**: True\n\n**attention_dropout**\n\nDropout probability in the attention layer.\n\n    * **Type**: float\n    * **Required**: True\n\n**ffn_dropout**\n\nDropout probability in the feed-forward layer.\n\n    * **Type**: float\n    * **Required**: True\n\n**apply_query_key_layer_scaling**\n\nScale ``Q * K^T`` by ``(1 / layer-number)``.\n\n    * **Type**: bool\n    * **Required**: True\n\n**normalization**\n\nNormalization layer to use.\n\n    * **Type**: str\n    * **Possible Values**: ``rmsnorm``, ``layernorm``\n    * **Required**: True\n\n**layernorm_epsilon**\n\nEpsilon value for layernorm.\n\n    * **Type**: float\n    * **Required**: True\n\n**share_embeddings_and_output_weights**\n\nSetting this parameter to ``True`` will tie the ``vocab embedding`` weight with the final ``MLP`` weight.\n\n    * **Type**: bool\n    * **Required**: True\n\n**make_vocab_size_divisible_by**\n\nSo lets say your vocab size is ``31999`` and you set this value to 4, the framework would pad the vocab-size such that\nit becomes divisible by ``4``. In this case the close divisible value is ``32K``.\n\n    * **Type**: int\n    * **Required**: True\n\n**position_embedding_type**\n\nType of position embedding to be used.\n\n    * **Type**: str\n    * **Possible Values**: ``learned_absolute``, ``rope``\n    * **Required**: True\n\n**rotary_percentage**\n\nIf using ``position_embedding_type=rope``, then the per head dim is multiplied by this factor.\n\n    * **Type**: float\n    * **Required**: True\n\n**activation**\n\nUsers can specify the activation function to be used in the model.\n\n    * **Type**: str\n    * **Possible Values**: ``swiglu``, ``gelu``\n    * **Required**: True\n\n**has_bias**\n\nSetting this parameter to ``True`` will add bias to each of the linear layers in the model.\n\n    * **Type**: bool\n    * **Required**: True\n\n\n.. _nxdt_config_overview_precision_config:\n\nPrecision\n---------\n\nThis config can help to decide the dtype of the model/optimizer.\n\n.. code-block:: yaml\n\n    precision:\n        type: 'mixed_precision' # ['bf16SR', 'fp32', 'autocast', 'mixed_precision', 'mixed_precisionSR', 'manual']\n        # Set the following only if precision type is manual, otherwise they will be automatically set.\n        master_weights: False\n        fp32_grad_acc: False\n        xla_use_bf16: '0'\n        xla_downcast_bf16: '0'\n        neuron_rt_stochastic_rounding_en: '0'\n        parallel_layers_reduce_dtype: 'bf16'\n\n.. note::\n\n    Only if the precision type is ``manual``, ``master_weights`` , ``fp32_grad_acc``, ``xla_use_bf16``, ``xla_downcast_bf16``,\n    ``neuron_rt_stochastic_rounding_en`` will be picked up from the config. These parameters are for more finer control of\n    precision. It is recommended to use ``mixed_precision`` config for better accuracy.\n\n**type**\n    **mixed_precision**\n\n    The ``mixed_precision`` config uses the ``zero1`` optimizer. It performs grad accumulation,\n    ``grad cc``, and keeps the master copy of the weights in ``fp32``. 
It also sets the ``xla_downcast_bf16``\n    environment variable to 1 and disables stochastic rounding.\n\n    **mixed_precisionSR**\n\n    ``mixed_precisionSR`` is a superset of the ``mixed_precision`` config with stochastic rounding enabled.\n\n\n    **bf16SR**\n\n    ``bf16SR`` config will perform all operations in ``bf16`` and relies on stochastic rounding feature for accuracy gains.\n\n\n    **autocast**\n\n    ``autocast`` config will follow the exact same precision strategy followed by ``torch.autocast``.\n\n    .. note::\n        Autocast is not supported in this release.\n\n    **manual**\n\n    To gain control of the different precision nobs, one can set the precision type to ``manual`` and control parameters\n    like - ``master_weights`` , ``fp32_grad_acc``, ``xla_use_bf16``, ``xla_downcast_bf16`` and\n    ``neuron_rt_stochastic_rounding_en``.\n\n**parallel_layers_reduce_dtype**\n\nThis config will perform reduce collectives (all-reduce and reduce-scatter) within parallel layers in the\nspecified precision. If ``fp32`` precision type is used, then we implicitly set reduce dtype to ``fp32``.\nOtherwise it will be defaulted to ``bf16`` in all other cases unless specified.\n\n\nModel Alignment Specific\n------------------------\n\nYou can configure fine-tuning (SFT) or model alignment (DPO/ORPO)\nthrough the YAML file, along with parameter-efficient\nfine-tuning using LoRA.\n\n.. code-block:: yaml\n\n    model_alignment_strategy:\n        # DPO specific config\n        dpo:\n            kl_beta: 0.01\n            loss_type: sigmoid\n            max_prompt_length: 2048\n            precompute_ref_log_probs: True\n            truncation_mode: keep_start\n\n        # Alternatively, you can also use SFT specific config\n        sft:\n            packing: True\n\n        # Alternatively, can also use ORPO specific config\n        orpo:\n            beta: 0.01\n            max_prompt_length: 2048\n            truncation_mode: keep_start\n\n        # Parameter-efficient finetuning - LoRA config\n        peft:\n            lora_rank: 16\n            lora_alpha: 32\n            lora_dropout: 0.05\n            lora_bias: \"none\"\n            lora_verbose: True\n            target_modules: [\"qkv_proj\"]\n\n\n**model_alignment_strategy**\n\n    Set only when using finetuning specific algorithms (SFT, DPO, etc) and related hyperparameters\n    DPO-specific parameters.\n\n        **dpo**\n            **kl_beta**\n\n            KL-divergence beta to control divergence of policy model from reference model\n\n                * **Type**: float\n                * **Default**: 0.01\n                * **Required**: True\n\n            **loss_type**\n\n            Currently support sigmoid version of optimized DPO loss\n\n                * **Type**: str\n                * **Default**: ``sigmoid``\n                * **Required**: True\n\n            **max_prompt_length**\n\n            Set maximum length of prompt in the concatenated prompt and (chosen/rejected) response input\n\n                * **Type**: integer\n                * **Required**: True\n\n            **precompute_ref_log_probs**\n\n            To enable precomputation of reference model log probabilities using pre-fit hook,\n            False is not supported currently\n\n                * **Type**: bool\n                * **Required**: True\n\n            **truncation_mode**\n\n            To define how to truncate if size (prompt+response) exceeds seq_length\n            options: [\"keep_start\", \"keep_end\"]\n\n                * 
**Type**: str\n                * **Default**: ``keep_start```\n                * **Required**: True\n\n    SFT-specific parameters.\n\n        **sft**\n            **packing**\n\n            Appends multiple records in a single record until seq length\n            supported by model, if false uses pad tokens to reach seq length.\n            Setting it to True increases throughput but might impact accuracy.\n\n                * **Type**: bool\n                * **Default**: False\n                * **Required**: False\n\n    `Odds Ratio Preference Optimization (ORPO) <https://arxiv.org/abs/2403.07691>`_\n    specific parameters.\n\n        **orpo**\n            **beta**\n\n            KL-divergence beta to control divergence of policy model from reference model\n\n                * **Type**: float\n                * **Default**: 0.01\n                * **Required**: True\n\n            **max_prompt_length**\n\n            Set maximum length of prompt in the concatenated prompt and (chosen/rejected) response input\n\n                * **Type**: integer\n                * **Required**: True\n\n            **truncation_mode**\n\n            To define how to truncate if size (prompt+response) exceeds seq_length\n            options: [\"keep_start\", \"keep_end\"]\n\n                * **Type**: str\n                * **Default**: ``keep_start```\n                * **Required**: True\n\n        **peft**\n            Configuration options for Parameter-Efficient Fine-Tuning (PEFT) methods,\n            specifically LoRA settings.\n\n            **lora_rank**\n\n            Rank of LoRA; determines the number of trainable parameters\n            Higher rank allows for more expressive adaptations but increases memory usage\n\n                * **Type**: int\n                * **Default**: 16\n                * **Required**: True\n\n            **lora_alpha**\n\n            Scaling factor for LoRA updates; affects the magnitude of LoRA adaptations.\n\n                * **Type**: int\n                * **Default**: 32\n                * **Required**: True\n\n            **lora_dropout**\n\n            Dropout rate for LoRA layers to prevent overfitting.\n\n                * **Type**: float\n                * **Default**: 0.05\n                * **Required**: False\n\n            **lora_bias**\n\n            Bias type for LoRA. Determines which biases are trainable. Can be 'none', 'all' or 'lora_only'\n\n                * **Type**: str\n                * **Default**: \"none\"\n                * **Required**: False\n\n            **lora_verbose**\n\n            Enables detailed LoRA-related logging during training.\n\n                * **Type**: bool\n                * **Default**: False\n                * **Required**: False\n\n            **target_modules**\n\n            List of model layers to apply `LoRA <https://arxiv.org/abs/2106.09685>`__.\n\n                * **Type**: list[str]\n                * **Default**: [\"qkv_proj\"] (for Llama)\n                * **Required**: True"
  },
  {
    "path": "libraries/nxd-training/general/features.rst",
    "content": ".. _nxdt_features:\n\nNeuronx Distributed Training Library Features\n=============================================\n\nThe library is meant to provide an end-to-end framework for training on Trainium instances. The NxD Training is a\ncollection of open-source libraries, tools, and resources that empowers customers to run end-to-end training workflows\non Neuron. Its an extension to Neuronx-Distributed (NxD) library. NxD Training incorporates the distributed strategies\nprimitives from NxD (i.e., NxD Parallel Primitives),while maintaining a design that is ready to integrate partitioning\ntechnologies from native PyTorch or from OpenXLA such as GSPMD. NxD Training also supports  PyTorch Lightning (PTL)\nTrainer and extends NxD to include data engineering features from NeMo, such as data loaders, datasets, and tokenizers,\nas well as ML engineering capabilities from NeMo like monitoring, logging, and experiment management. Furthermore,\nthe NxD Training framework introduces support for training techniques such as pre-training and fine-tuning, along with\na model hub featuring end-to-end examples for state of the art models like LLama, GPT, and Mixtral MoE implemented using\nboth HuggingFace and Megatron-LM model classes.\n\nThe framework uses the distributed training technology from NxD. This allows the framework to support all the\nsharding techniques and Modules already supported by NxD.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nDistributed Techniques\n-----------------------\n\n1. Data-parallelism\n2. `Tensor-parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html#tensor-parallelism-overview>`_\n3. `Sequence-Parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html#sequence-parallelism>`_\n4. `Pipeline-parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html>`_\n    1. 1F1B pipeline schedule\n    2. Interleave pipeline schedule (or virtual pipeline parallel)\n5. `Zero1 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html#what-is-zero-1>`_\n6. Expert-parallelism\n7. `Context-Parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/context_parallelism_overview.html>`_\n\nModules\n--------\n\n1. `Grouped Query Attention layer <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#gqa-qkv-linear-module>`_\n2. Mixture of Experts (MoE)\n\nModel/Optimizer Precision\n-------------------------\n\nTo cater to different types of precision that can affect the overall training, the library provides an option to\nconfigure the following:\n\n1. Zero1 with Master weights in FP32\n2. BF16 + Stochastic Rounding\n3. FP32\n\nCheckpoint Saving/Loading\n-------------------------\nWhen we are working with large models and running training for a long time, checkpointing becomes an important\npart of training models. The framework supports the following features for checkpointing:\n\n1. Save/Load sharded checkpoints\n2. Asynchronous checkpoint saving/loading\n3. Ability to keep only the last K checkpoints\n4. Auto-resume training jobs from previous checkpoints\n5. 
Ability to dump a checkpoint to S3\n\nTo optimize the checkpointing time, we have enabled dumping of checkpoints from all ranks to distribute the workload\nand parallelize the checkpoint saving. Similarly, when loading checkpoints, the API loads only on one data-parallel\nrank and broadcasts the checkpoint to all other ranks. This improves the checkpoint loading time as it avoids contention on the file\nsystem.\n\nTraining Recipes\n----------------\n\nThe library supports the following training recipes:\n\n1. Pre-training: The library shows examples of pretraining models like Llama2/3-8B/70B, GPT, Mistral, and Mixtral MoE\n2. Supervised fine-tuning: Showcases fine-tuning of the Llama-3 model with a chat dataset.\n"
  },
  {
    "path": "libraries/nxd-training/general/installation_guide.rst",
    "content": ".. _nxdt_installation_guide:\n\nSetup\n=====\n\nNeuronx Distributed Training framework is built on top of\n`NeuronxDistributed (NxD) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index.html>`_ ,\n`NeMo <https://github.com/NVIDIA/NeMo/tree/v1.14.0>`_ libraries and\n`PyTorch-Lightning <https://github.com/Lightning-AI/pytorch-lightning/tree/1.8.6>`_. The guide below will provide\na step-by-step instructions on how to setup the environment to run training using NeuronX Distributed Training\nframework. Alternatively, you can use the Neuronx Distributed Training virtual environment found in the Neuron DLAMI without\nrunning any of these setup steps. See :ref:`neuron-dlami-overview`.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n.. _nxdt_python_venv:\n\nSetup a python Virtual Environment\n----------------------------------\n\nLet's first setup a virtual env for our development. This can be done using the command below:\n\n.. code-block :: shell\n\n    python3 -m venv env\n    source env/bin/activate\n\n.. _nxdt_neuron_deps:\n\nInstalling Neuron Dependencies\n------------------------------\n\nInstall the neuron packages using the command:\n\n.. code-block :: shell\n\n    pip install -U pip\n    pip install --upgrade neuronx-cc==2.* torch-neuronx torchvision neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n.. _nxdt_nemo_deps:\n\nBuilding Apex\n-------------\n\nNxD Training uses the NeMo toolkit, which requires you to install additional dependencies. One of these dependencies is \nthe `Apex <https://github.com/NVIDIA/apex/tree/master>`_ library. The NeMo toolkit uses this library for several fused \nmodule implementations.\n\n.. note::\n    NeMo used to use Apex for all distributed training APIs. Since we are using NxD for the same purpose, the use of\n    Apex for this framework is very minimal. It's been added as a dependency since some of the minor imports inside NeMo\n    will break without it. Hence, when building Apex, we build a slim CPU version using the instructions below:\n\n1. Clone Apex repo\n\n.. code-block :: shell\n\n    git clone https://github.com/NVIDIA/apex.git\n    cd apex\n    git checkout 23.05\n\n\n2. Replace the contents of the ``setup.py`` with the following contents:\n\n.. code-block :: python\n\n    import sys\n    import warnings\n    import os\n    from packaging.version import parse, Version\n\n    from setuptools import setup, find_packages\n    import subprocess\n\n    import torch\n    from torch.utils.cpp_extension import BuildExtension, CppExtension, CUDAExtension, CUDA_HOME, load\n\n    setup(\n        name=\"apex\",\n        version=\"0.1\",\n        packages=find_packages(\n            exclude=(\"build\", \"csrc\", \"include\", \"tests\", \"dist\", \"docs\", \"tests\", \"examples\", \"apex.egg-info\",)\n        ),\n        install_requires=[\"packaging>20.6\",],\n        description=\"PyTorch Extensions written by NVIDIA\",\n    )\n\n3. Install python dependencies:\n\n.. code-block :: shell\n\n    pip install packaging wheel\n\n\n4. Build the wheel using the command:\n\n.. code-block :: shell\n\n    python setup.py bdist_wheel\n\n\n5. After this, you should see the wheel at ``dist/``. You can use this for installation in the next section.\n6. Come out of the ``apex`` directory using ``cd ..``.\n\n\n.. _nxdt_nxdt_reqs:\n\nInstalling the requirements\n---------------------------\n\nDownload the ``requirements.txt`` using the command:\n\n.. 
code-block :: shell\n\n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/requirements.txt\n\nWe can now install the dependencies of the library using the following command:\n\n.. code-block :: shell\n\n    pip install -r requirements.txt ~/apex/dist/apex-0.1-py3-none-any.whl\n\nAfter installing the requirements, we need to patch some of the installations, so run:\n\n.. code-block :: shell\n\n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/install_setup.sh\n    chmod +x install_setup.sh\n    ./install_setup.sh\n\nYou may see some warnings related to the installations, but those can be ignored.\n\n.. _nxdt_nxdt_nxdt_install:\n\nInstalling Neuronx Distributed Training framework\n-------------------------------------------------\n\nTo install the library, one can run the following command:\n\n.. code-block :: shell\n\n    pip install neuronx_distributed_training --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n\n.. _nxdt_installation_common_failures:\n\nCommon failures during installation\n-----------------------------------\n\nThis section goes over the common failures one can see during setup and how to resolve them.\n\n1. **``ModuleNotFoundError: No module named 'Cython'``**\n\n   You may have to install Cython explicitly using ``pip install Cython``.\n\n2. **Error while building ``youtokentome``**\n\n   If you get an error that says ``Python.h file not found``, you may have to install python-dev and recreate the\n   virtual env. To install python-dev, you can use the command: ``sudo apt-get install python-dev``\n\n3. **Mismatched torch and torch-xla version**\n\n   When you see an error that looks like:\n\n::\n\n    ImportError: env/lib/python3.10/site-packages/_XLAC.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c109TupleTypeC1ESt6vectorINS_4Type24SingletonOrSharedTypePtrIS2_EESaIS4_EENS_8optionalINS_13QualifiedNameEEESt10shared_ptrINS_14FunctionSchemaEE\n\n   It indicates that the major versions of torch and torch-xla don't match.\n\n.. note::\n    If you install torch again, make sure to install the corresponding torchvision version, otherwise there will be\n    a conflict.\n\n4. **Torch vision version error**\n\n   The below error indicates an incorrect torchvision version. If installing ``torch=2.1``, install ``torchvision=0.16``\n   (This `link <https://pypi.org/project/torchvision/>`_ shows which version of torchvision is compatible with\n   which version of torch).\n\n::\n\n    ValueError: Could not find the operator torchvision::nms. Please make sure you have already registered the operator\n    and (if registered from C++) loaded it via torch.ops.load_library.\n\n5. **Matplotlib lock error**\n\n   If you see the below error:\n\n::\n\n    TimeoutError: Lock error: Matplotlib failed to acquire the following lock file\n\n   This error means there is some contention in compute/worker nodes to access the matplotlib cache, and hence the timeout\n   error. To resolve this error, run the ``python -c 'import matplotlib.pyplot as plt'`` command as part of your setup.\n   This will create a matplotlib cache and avoid the race condition.\n
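\n.. _nxdt_installation_verify:\n\nVerifying the installation\n--------------------------\n\nAs a quick sanity check (a minimal sketch, assuming the package names installed above), you can confirm that the core libraries import cleanly inside the virtual environment:\n\n.. code-block :: shell\n\n    python -c \"import neuronx_distributed; import neuronx_distributed_training; print('setup looks ok')\"\n\nIf the import fails, revisit the common failures listed above.\n"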
  },
  {
    "path": "libraries/nxd-training/general/known-issues.txt",
    "content": "* :ref:`nxdt_known_issues`"
  },
  {
    "path": "libraries/nxd-training/general/known_issues.rst",
    "content": ".. _nxdt_known_issues:\n\nKnown Issues and Workarounds\n============================\n\nThis section covers the common failures that one can see while working with Neuronx Distributed Training library.\nSome of the failures regarding installation have been documented in :ref:`nxdt_installation_common_failures`.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nShared weights error\n--------------------\n\nTieing weights is not supported when using pipeline parallelism.\nThis means currently, the ``share_embeddings_and_output_weights`` parameter is not supported when using pipeline\nparallelism. It would produce an error that looks like this\n\n::\n\n    File \"/home/ubuntu/aws_neuron_venv_pytorch/lib/python3.8/site-packages/neuronx_distributed/pipeline/model.py\", line 625, in _reduce_shared_weights\n    assert p.grad is not None, f\"Found shared weight {n} has None grad\"\n    AssertionError: Found shared weight language_model_embedding_word_embeddings.weight has None grad\n\nPlease set this flag to ``False`` when using pipeline parallelism.\n\n\nHOST OOM issues\n---------------\n\nYou would see an error log that looks like this without any other error above it.\n\n::\n\n    WARNING:torch.distributed.elastic.agent.server.api:Received 15 death signal, shutting down workers\n    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3721028 closing signal SIGTERM\n    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3721029 closing signal SIGTERM\n    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3721030 closing signal SIGTERM\n    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3721031 closing signal SIGTERM\n\nYou can confirm ``HOST OOM`` by checking ``sudo dmesg`` on the Trn1 node. ``HOST OOM`` can occur because of multiple\nreasons:\n\nDuring checkpoint saving\n########################\n\nIf you see the above error immediately after a checkpoint saving log, this indicates that the entire checkpoint\nis copied to CPU. In this case, please check if the ``save_xser`` parameter is set to ``True``. This mode will\nensure each worker saves only one tensor at a time to disk. Setting this to ``False`` will make all the workers\ncopy the entire checkpoint to CPU and can result in ``HOST OOM``.\n\nDuring async_checkpointing\n##########################\n\n``async_checkpointing`` when used with a low number of nodes can cause ``HOST OOM`` as it increases memory pressure\nper node. When we use more nodes, the memory pressure gets divided among the nodes and hence you would get an OOM.\n\nOn a high level, async checkpointing copies data from device memory to host memory, then launch a new process\nto save host memory to storage, and let the main process continue with the training. Since we launch\na new process, it requires a lot more extra host memory, because the launched process has the exact copy of memory\nspace of the parent process. Let's use the following example to demonstrate how much memory we would need. For a llama2\n70b training using tp32 on 32 nodes, we launch 32 processes on each node. As baseline, each process uses 5 GB of host\nmemory. There is also the XRT server, which uses 110 GB of host memory, so in total 270 GB host memory is used\n(5*32 + 110). If we enable ``async_checkpointing`` on this setting, the final memory usage can reach as high as\n482 GB because of the following reasons:\n\n1. Each training process needs to allocate memory to hold the model. 
The model weights for llama2 70B would\nrequire 280GB of memory to store the weights. The optimizer state would require twice as much memory. So the total\namount of host memory is 840 GB. Because we used all ranks for saving, the 840GB of data was evenly distributed\namong 1,024 processes (32 x 32), which means 0.84 GB of memory per process, or 26 GB of memory per instance. So\neach process’s host memory usage is 5.8GB.\n\n2. Each training process will fork a process for saving. The forked process will have a copy of the parent’s\nmemory. In practice, Linux uses a Copy-On-Write mechanism to save memory usage, but in theory the actual memory\nusage of the child process can still reach the full 5.8 GB. When ``async_checkpointing`` is enabled, we have 64 processes\neach using 5.8 GB of memory, and the XRT server uses 110 GB of memory. Therefore the total memory usage will be 482GB\n(64 * 5.8 + 110).\n\nHence with 32 nodes, we are already on the edge (each Trn1 node has 512GB of host memory) and we could OOM at 32 nodes.\nFor a more stable run, enabling ``async_checkpointing`` at 64 nodes is recommended.\n\n\nDuring Dataloading\n##################\n\nAnother common reason for ``HOST OOM`` is loading too much data onto CPU. For pipeline-parallel processing, the\nlibrary loads the entire global batch onto CPU and then moves it one-by-one to device. If we have a large\nbatch size, with each batch taking up host memory, it can lead to ``HOST OOM``.\n\n\nImportError: ``helpers``\n------------------------\n\nIf you see an error that looks like:\n\n::\n\n    ImportError: cannot import name 'helpers' from 'nemo.collections.nlp.data.language_modeling.megatron' (/usr/local/lib/python3.8/dist-packages/nemo/collections/nlp/data/language_modeling/megatron/__init__.py)\n\nThis could be because ``helpers.cpp`` didn’t get built correctly at the time of execution. We can pre-build it\nby running the following code:\n\n.. 
code-block:: python\n\n    import sys\n    import types\n\n    import torch\n\n    if torch.__version__.startswith(\"2\"):\n        string_classes = str\n        inf = torch.inf\n    else:\n        string_classes = None\n        inf = None\n\n\n    # conditionally modify the import\n    def modify_torch_six_import():\n        if string_classes is not None:\n            try:\n                if \"torch._six\" not in sys.modules:\n                    # Create and add dummy module to sys.modules\n                    six_module = types.ModuleType(\"torch._six\")\n                    six_module.string_classes = string_classes\n                    six_module.inf = inf\n                    sys.modules[\"torch._six\"] = six_module\n            except Exception as e:\n                raise RuntimeError(f\"Failed to override torch._six import: {e}\")\n\n    modify_torch_six_import()\n    from nemo.collections.nlp.data.language_modeling.megatron.dataset_utils import compile_helper\n    compile_helper()\n\n\nAlternatively, you may see:\n\n::\n\n    ImportError: /shared/username/aws_neuron_venv_pytorch/lib/python3.10/site-packages/nemo/collections/nlp/data/language_modeling/megatron/helpers.cpython-310-x86_64-linux-gnu.so: file too short\n\nA current workaround for this case is to delete the ``.so`` file and run the above snippet explicitly.\n\nMatplotlib error\n----------------\n\nIf you see an error that looks like:\n\n::\n\n    TimeoutError: Lock error: Matplotlib failed to acquire the following lock file\n\nIt means there is some contention in compute/worker nodes to access the matplotlib cache, and hence the lock error.\nTo resolve this, add or run ``python -c 'import matplotlib.pyplot as plt'`` as part of your setup. This will\ncreate a matplotlib cache and avoid the race condition.\n\nFlash Attention not supported for megatron-style models\n-------------------------------------------------------\n\nThe flash attention kernel is supported only for HF-style models and will be added for megatron-style models in one of\nthe future releases.\n"
  },
  {
    "path": "libraries/nxd-training/index.rst",
    "content": ".. meta::\n   :description: NxD Training (NeuronX Distributed Training) is a PyTorch library for end-to-end distributed training on AWS Trainium instances, offering turnkey workflows for pre-training, fine-tuning, and PEFT.\n   :keywords: NxD Training, NeuronX Distributed Training, AWS Neuron SDK, Distributed Training, PyTorch Lightning, Tensor Parallelism, Pipeline Parallelism, ZeRO-1, LoRA, PEFT, Model Training\n   :date-modified: 01/22/2026\n\n.. _nxdt:\n\nNxD Training\n============\n\nThis section contains the technical documentation specific to the NxD Training library included with the Neuron SDK.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Overview </libraries/nxd-training/overview>\n    Setup </libraries/nxd-training/general/installation_guide>\n    Tutorials  </libraries/nxd-training/tutorials/index>\n    Developer Guides  </libraries/nxd-training/developer_guides/index>\n    API Reference Guide </libraries/nxd-training/api-reference-guide>\n    App Notes </libraries/nxd-training/app_notes>\n    Release Notes </release-notes/components/nxd-training>\n    Misc  </libraries/nxd-training/misc>\n\nWhat is NxD Training?\n---------------------\n\nNxD Training (NeuronX Distributed Training) is a PyTorch library for end-to-end distributed training on AWS Trainium instances. It combines ease-of-use with powerful features built on top of the NxD Core library, offering turnkey support for model pre-training, supervised fine-tuning (SFT), and parameter-efficient fine-tuning (PEFT) using LoRA.\n\nWith NxD Training, developers can:\n\n* Train large-scale models with turnkey workflows for pre-training, SFT, and PEFT (LoRA)\n* Leverage distributed strategies including Data Parallelism, Tensor Parallelism, Sequence Parallelism, Pipeline Parallelism, and ZeRO-1\n* Use PyTorch Lightning integration for organized training code\n* Access ready-to-use model samples based on HuggingFace and Megatron-LM formats\n* Manage experiments with integrated checkpointing, logging, and S3 storage support\n* Choose from three usage interfaces: YAML configuration files, PyTorch Lightning APIs, or NxD Core primitives\n\nNxD Training is compatible with training platforms like NVIDIA's NeMo (except for Trainium-specific features) and is available on GitHub as both pip wheel and source code.\n\nUsage Interfaces\n----------------\n\nNxD Training provides three interfaces to meet different developer needs:\n\n* **YAML Configuration Files**: High-level access for distributed training with minimal code changes\n* **PyTorch Lightning APIs**: Standardized training workflows with NxD Core primitives\n* **NxD Core Primitives**: Low-level APIs for custom model integration and advanced use cases\n\n\nNxD Training documentation\n---------------------------\n\n.. grid:: 1 1 2 2\n    :gutter: 3\n    \n    .. grid-item-card:: Overview\n        :link: /libraries/nxd-training/overview\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Learn about NxD Training architecture, key features, and usage interfaces for distributed training on AWS Trainium.\n\n    .. grid-item-card:: Setup\n        :link: /libraries/nxd-training/general/installation_guide\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Step-by-step instructions for installing and configuring NxD Training on Trainium instances.\n\n    .. 
grid-item-card:: Tutorials\n        :link: /libraries/nxd-training/tutorials/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Hands-on tutorials for training various models including Llama, GPT, and BERT with different parallelism strategies.\n\n    .. grid-item-card:: Developer Guides\n        :link: /libraries/nxd-training/developer_guides/index\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        In-depth guides for model integration, YAML configuration, migration from NeMo/NNM, and advanced training workflows.\n\n    .. grid-item-card:: API Reference\n        :link: /libraries/nxd-training/api-reference-guide\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Comprehensive API documentation for NxD Training modules, configuration options, and programming interfaces.\n\n    .. grid-item-card:: Application Notes\n        :link: /libraries/nxd-training/app_notes\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Detailed application notes on distributed strategies, optimization techniques, and best practices for training.\n\n    .. grid-item-card:: Misc Resources\n        :link: /libraries/nxd-training/misc\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Known issues, troubleshooting guides, and other helpful resources for working with NxD Training.\n\n    .. grid-item-card:: NxD Training Release Notes\n        :link: /release-notes/components/nxd-training\n        :link-type: doc\n        :class-card: sd-rounded-3\n        \n        Review the latest updates, new features, and bug fixes in NxD Training releases."
  },
  {
    "path": "libraries/nxd-training/misc.rst",
    "content": ".. _nxdt_misc\n\nMisc.\n=====\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /release-notes/components/nxd-training\n    /libraries/nxd-training/general/known_issues\n\n.. include:: /libraries/nxd-training/misc.txt"
  },
  {
    "path": "libraries/nxd-training/misc.txt",
    "content": "* :ref:`nxd-training_rn`\n* :ref:`nxdt_known_issues`"
  },
  {
    "path": "libraries/nxd-training/overview.rst",
    "content": ".. _nxd-training-overview:\n\nOverview\n=========\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nNxD Training\n-------------------\n\nThe NeuronX Distributed Training (NxD Training) library is a collection of open-source tools and\nlibraries designed to empower customers to train PyTorch models on AWS Trainium instances.\nIt combines both ease-of-use and access to features built on top of\n:ref:`NxD Core <neuronx-distributed-training-index>` library. Except for a few Trainium specific features, NxD Training\nis compatible with training platforms like NVIDIA’s NeMo.\n\nSpecifically, :ref:`NxD Training <nxdt_figure>` offers the following features and productivity flows:\n\n*  **Training Workflows**: Developers benefit from turnkey support for multiple workflows such as model Pre-training, Supervised Finetuning (SFT),  \n   and Parameter Efficient Finetuning (PEFT) using Low Rank Adapters (LoRA) [#f1]_. For these workflows, precision types supported include  \n   (a) FP32 for both baseline and for master weights when using ZeRO-1, \n   and (b) BF16 combined with :ref:`stochastic rounding <neuron-rounding-modes>`.\n\n*  **Distributed Strategies**: Splitting training workload over multiple nodes shortens the job duration. This is made possible through distributed strategies \n   that are the techniques used to shard large scale models across multiple Neuron Cores. NxD Training Distributed Strategies are implemented in the \n   :ref:`NxD Core <neuronx-distributed-training-index>` library and include:\n   Data Parallelism, \n   `Tensor-parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html#tensor-parallelism-overview>`_, \n   `Sequence-Parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html#sequence-parallelism>`_,  \n   `Pipeline-parallelism <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html>`_  (including 1F1B pipeline \n   schedule and interleaved pipeline schedule), and `ZeRO-1 <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html#what-is-zero-1>`_.\n\n*  **Data Science  Modules**: The integration of datasets, dataloaders, tokenizers and other data wrangling tools makes it easy to prepare and use large-scale training data.\n\n*  **Data Engineering Modules**: Integrated *Experiment Manager* allows for saving training outputs through checkpointing and evaluating results through enhanced logging. It comes with \n   multiple options\n   for optimally loading/saving checkpoints such as sharded checkpoints, last-K checkpoints, asynchronous checkpoints, auto-resume from checkpoints and storage in S3 buckets.\n\n*  **PyTorch Lightning**: NxD Training is integrated with training frameworks like like PyTorch Lightning that help with organizing training code.\n\n*  **Models**: Users can start on NxD Training with ready-to-use samples based on HuggingFace and Megatron-LM model formats. It has support for advanced LLM architecture blocks such as \n   `Grouped Query Attention layer <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#gqa-qkv-linear-module>`_. \n\n*  **SW Releases**: NxD Training code is available on `GitHub <https://github.com/aws-neuron/neuronx-distributed-training/tree/main>`_, both as pip wheel and source code.\n\n.. 
_nxdt_figure:\n\n.. figure:: ./images/nxd_training.jpg\n    \n    `NxD Training`\n\nUsing NxD Training\n------------------\n\nML developers often need access to training code at different levels of abstraction. As shown in :ref:`figure <nxdt_usage_figure>`, NxD Training can be used\nthrough three interfaces:\n\n*   High-level `YAML <https://yaml.org/>`_  configuration file used in conjunction with models in NxD Training's model hub\n*   `PyTorch Lightning (PTL) <https://github.com/Lightning-AI/pytorch-lightning>`_ APIs and Trainer in conjunction with NxD Core primitives\n*   :ref:`NxD Core <neuronx-distributed-training-index>` foundational API, also referred to as NxD Core primitives\n\nAll three usage mechanisms employ the underlying NxD Core library either directly through programming interfaces or \nconfiguration files, and developers can choose the method that meets \ntheir needs.\n\n.. _nxdt_usage_figure:\n\n.. figure:: ./images/nxdt_ux.jpg\n\n    `Using NxD Training through (a) Configuration Files (b) PyTorch Lightning APIs, and (c) NxD Core primitives`\n\nConfiguration File\n^^^^^^^^^^^^^^^^^^\n\nNxD Training supports top-level access for distributed training using YAML-based configuration files. \nThis option is available for models in the model hub or for custom models enabled after following\nthe steps listed in the :ref:`model integration guide <nxdt_developer_guide_integrate_new_model>` inside NxD Training. With this usage model, only the configuration parameters \ninside the YAML file need to be set and no further code changes are necessary. This facilitates easy experimentation with various configuration settings and automating the workflow.\nThe figure below shows the major \nsettings available inside the YAML configuration file, and more details on how to exercise these options are in \n:ref:`YAML Configuration Settings <nxdt_config_overview>`. Existing users of NeuronX NeMo Megatron (NNM) or NVIDIA NeMo \ncan review the :ref:`NNM <nxdt_developer_guide_migration_nnm_nxdt>` and :ref:`NeMo <nxdt_developer_guide_migration_nemo_nxdt>`\nmigration guides, respectively, to map the configuration parameters to NxD Training.\n\n.. figure:: ./images/yaml_parts.jpg\n\n    `Top level settings for NxD Training through configuration file`\n\nPyTorch Lightning APIs\n^^^^^^^^^^^^^^^^^^^^^^\n\n`PyTorch Lightning <https://github.com/Lightning-AI/pytorch-lightning>`_ is a library that abstracts out model \ntraining workflows and eliminates the boilerplate code to set up training loops. Through its inheritable classes for \ntraining loops, data and customizable callbacks for checkpointing and distributed strategies, developers can set up \ntraining workflows in a standardized and compact manner. \n\nAs shown in :ref:`user interfaces to NxD Training, Figure (b) <nxdt_usage_figure>`, overall training scripts can be built \nusing PyTorch Lightning and making use of the NxD Core library. \nThis requires overriding the base classes of PyTorch Lightning such as ``LightningModule`` and ``DataModule``; \nconfiguring the optimizer and LR scheduler; setting appropriate callbacks; and launching the ``Trainer``.\nFor more details, refer to NxD Core's PyTorch Lightning :ref:`developer guide <ptl_developer_guide>` \nand :ref:`sample tutorial <llama2_tp_pp_ptl_tutorial>`. \n\nNxD Core Primitives\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNxD Core primitives are basic APIs that can be stitched together to build complete training workflows for AWS Trainium instances. 
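\n\nAs a concrete illustration, the following is a minimal, hypothetical sketch of this idea. It assumes the ``parallel_state`` and ``layers`` modules from ``neuronx_distributed.parallel_layers`` (described in the NxD Core API guide) and an already-initialized ``torch.distributed`` process group; it is not a complete training script, and the layer sizes and tensor-parallel degree are example values only.\n\n.. code-block:: python\n\n    import torch\n    from neuronx_distributed.parallel_layers import layers, parallel_state\n\n    # Assumes torch.distributed has already been initialized (XLA backend).\n    # A tensor-parallel degree of 8 is an example value.\n    parallel_state.initialize_model_parallel(tensor_model_parallel_size=8)\n\n    class ParallelMLP(torch.nn.Module):\n        # hidden_size and intermediate_size are illustrative values only.\n        def __init__(self, hidden_size=4096, intermediate_size=11008):\n            super().__init__()\n            # Column-parallel layer: shards the output dimension across tensor-parallel ranks.\n            self.up_proj = layers.ColumnParallelLinear(\n                hidden_size, intermediate_size, bias=False, gather_output=False)\n            # Row-parallel layer: shards the input dimension and all-reduces the result.\n            self.down_proj = layers.RowParallelLinear(\n                intermediate_size, hidden_size, bias=False, input_is_parallel=True)\n\n        def forward(self, x):\n            return self.down_proj(torch.nn.functional.silu(self.up_proj(x)))\n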
\nAdditionally, these primitives are required for integrating a new custom model into NxD Training or \nusing the model directly via the NxD Core library.\n\nThe NxD Core library has support for all the essential training features: model sharding, handling collective communications, \nmemory reduction, checkpointing, optimizer setting, and profiling. \nFor example, tensor parallelism through NxD Core is achieved by converting the linear layers, common in attention modules \nof transformer-architecture based models, to parallel layers. For pipeline parallelism, NxD Core offers the ability for both manual and automatic\nselection of pipeline cut points in the model graph. \nAdditional options for sequence parallelism and activation recomputation help with memory reduction.\nFor all these parallelism options, the NxD Core library automatically ensures efficient management of all the required collective communications across Neuron Cores.\n\nExact details on how these capabilities can be exercised are described in the :ref:`NxD Core developer guide <neuronx_distributed_developer_guide>`. \nFor background information and a description of NxD Core primitives, users are referred to \nNxD Core's :ref:`app notes <neuronx_distributed_appnotes>`, and :ref:`API guide <neuronx_distributed_api_guide>`, respectively. \nFollowing these steps, once a new model is onboarded using NxD Core APIs, its training workflow can be streamlined using\nNxD Training's experiment manager and data science/engineering modules.\n\n.. [#f1] Supported through NxD Core.\n..\n   With NxD Core, model sharding is made possible using \n   conversion of linear layers to ``RowParallel``/ ``ColumnParallel`` layers for tensor parallelism; wrapping the model class into ``NxDPPModel`` for pipeline parallelism; and setting suitable flags for sequence parallelism.\n   NxD Core provides sample implementations for optimizer and checkpointing code and they can then be integrated inside an overall model training script.\n   Details on how these capabilities can be exercised are detailed in :ref:`NxD Core developer guide <neuronx_distributed_developer_guide>`. For background information and interface descriptions, users are referred to \n   NxD Core's :ref:`app notes <neuronx_distributed_appnotes>`, and :ref:`API guide <neuronx_distributed_api_guide>`, respectively. Once a new model is onboarded using NxD Core APIs, its training workflow can be streamlined using\n   NxD Training's experiment manager and data science/engineering modules.\n"
  },
  {
    "path": "libraries/nxd-training/overview.txt",
    "content": "* :ref:`nxd-training-overview`"
  },
  {
    "path": "libraries/nxd-training/setup.txt",
    "content": "* :ref:`nxdt_installation_guide`"
  },
  {
    "path": "libraries/nxd-training/tutorials/checkpoint_conversion.rst",
    "content": ".. _checkpoint_conversion:\n\nCheckpoint Conversion\n=====================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nThe NxD Training library provides a versatile checkpoint conversion functionality,\nallowing seamless transition between different model styles. This tutorial aims to provide a\ncomprehensive guide through the various use cases and demonstrate how to perform the checkpoint conversions.\n\nSupported Model Architectures\n-----------------------------\n\nThe checkpoint conversion functionality supports conversion of the following model styles to/from NxDT checkpoints:\n\n1. **HuggingFace (HF) style models**\n2. **Megatron style models**\n\nExtends support for both GQA (Llama-3) and non-GQA models (Llama-2).\n\nConversion Scenarios and Usage\n------------------------------\n\nThe tool supports the following conversion scenarios. It internally\nuses ``NeuronxDistributed (NxD)`` to convert to/from checkpoints.\nRun the following commands from the ``/examples/checkpoint_conversion_scripts/`` directory:\n\n.. note::\n\n   1. **Important**: You must set the ``--hw_backend`` argument correctly for your hardware.\n      The sample commands below use ``trn1``.\n\n      - Set ``--hw_backend trn1`` for Trainium (Trn1) hardware\n      - Set ``--hw_backend trn2`` for Trainium 2 (Trn2) hardware\n\n   All example commands in this tutorial use ``trn1``. If you're using Trn2,\n   remember to replace ``trn1`` with ``trn2`` in every command.\n\n   2. Ensure that the model configuration config.json file is present,\n      as it is required for checkpoint conversions.\n      It is suggested to use specific json files like\n      `examples <https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json>`__ .\n      If not present, you will need to create it.\n\n   3. If your HF/custom checkpoint has multiple ``.bin`` or ``.pt`` or ``.pth`` files\n      then merge and convert to a single file before conversion.\n\nFor conversion of non-GQA based models (e.g. Llama2), just set the ``--qkv_linear`` argument to ``False``.\n\n1. **HF style model**:\n\n   a. **HF to NxDT checkpoint**:\n\n      **Command**:\n\n      .. code-block:: bash\n\n        python3 checkpoint_converter.py --model_style hf --hw_backend trn1 --input_dir /home/ubuntu/pretrained_llama_3_8B_hf/pytorch_model.bin --output_dir /home/ubuntu/converted_hf_style_hf_to_nxdt_tp8pp4/ --save_xser True --config /home/ubuntu/pretrained_llama_3_8B_hf/config.json --tp_size 8 --pp_size 4 --n_layers 32 --kv_size_multiplier 1 --qkv_linear True --convert_from_full_state\n\n     This converts an HF-style checkpoint to an NxDT checkpoint.\n\n   b. **NxDT to HF checkpoint**:\n\n    **Command**:\n\n    .. code-block:: bash\n\n       python3 checkpoint_converter.py --model_style hf --hw_backend trn1 --input_dir ~/examples/nemo_experiments/hf_llama3_8B_SFT/2024-07-19_23-07-40/checkpoints/hf_llama3_8B--step=5-consumed_samples=160.0.ckpt/model --output_dir ~/converted_hf_style_nxdt_to_hf_tp8pp4/ --load_xser True --config ~/config.json --tp_size 8 --pp_size 4 --kv_size_multiplier 1 --qkv_linear True --convert_to_full_state\n\n    This converts an NxDT checkpoint to an HF-style checkpoint.\n\n2. **Megatron style model (non-GQA models: e.g., Llama-2, and GQA models: e.g., Llama-3)**:\n\n   a. **HF to NxDT Megatron checkpoint**:\n\n    **Command**:\n\n    .. 
code-block:: bash\n\n       python3 checkpoint_converter.py --model_style megatron --hw_backend trn1 --input_dir ~/megatron-tp8pp4-nxdt-to-hf4/checkpoint.pt --output_dir ~/meg_nxdt_hf3_nxdt3 --config ~/llama_gqa/config.json --save_xser True --tp_size 8 --pp_size 4 --n_layers 32 --kv_size_multiplier 1 --qkv_linear True --convert_from_full_state\n\n    This converts an HF-style checkpoint to an NxDT Megatron-style checkpoint.\n\n   b. **NxDT Megatron checkpoint to HF**:\n\n    **Command**:\n\n    .. code-block:: bash\n\n       python3 checkpoint_converter.py  --model_style megatron --hw_backend trn1 --input_dir ~/examples/nemo_experiments/megatron_llama/2024-07-23_21-07-30/checkpoints/megatron_llama--step=5-consumed_samples=5120.0.ckpt/model --output_dir ~/megatron-tp8pp4-nxdt-to-hf4 --load_xser True --config ~/llama_gqa/config.json --tp_size 8 --pp_size 4 --kv_size_multiplier 1 --qkv_linear True --convert_to_full_state\n\n    This converts an NxDT Megatron-style checkpoint to an HF-style checkpoint (GQA-based model, see: ``--qkv_linear`` set to ``True``).\n\n\nKey Arguments\n^^^^^^^^^^^^^\n\nThe ``checkpoint_converter.py`` script supports the following key arguments:\n\n- ``--model_style``: Specifies the model style, either ``hf`` (HuggingFace: default) or ``megatron``\n- ``--hw_backend``: (required) Specifies the hardware backend, either ``trn1`` or ``trn2``\n- ``--input_dir``: (required) directory containing the input checkpoint\n- ``--hf_model_name``: (optional) HuggingFace model identifier for directly converting models hosted on HuggingFace\n- ``--output_dir``: (required) directory to save the converted checkpoint\n- ``--save_xser``: Saves the checkpoint with torch_xla serialization\n- ``--load_xser``: Loads the checkpoint with torch_xla serialization\n- ``--convert_from_full_state``: Converts full model checkpoint to sharded model checkpoint\n- ``--convert_to_full_state``: Converts sharded model checkpoint to full model checkpoint\n- ``--config``: path to the model configuration file (create the ``json`` file if not present)\n- ``--tp_size``: tensor parallelism degree\n- ``--pp_size``: pipeline parallelism degree\n- ``--n_layers``: number of layers in the model\n- ``--kv_size_multiplier``: key-value size multiplier\n- ``--qkv_linear``: boolean to specify GQA/non-GQA models\n- ``--fuse_qkv``: boolean to specify fused QKV in GQA models\n\nWe recommend enabling xser for significantly faster save and load times.\nNote that if the checkpoint is saved with xser, it can only be loaded with xser,\nand vice versa.\n\nConversion Example\n------------------\n\nAssuming you have a pre-trained HF-style Llama3-8B model checkpoint looking similar to:\n\n``input_dir: /hf/checkpoint/pytorch_model.bin``\n\n.. code-block:: bash\n\n  $ ls /hf/checkpoint\n\n  -rw-r--r-- 1 user group 123 Aug 27 2024 pytorch_model.bin\n\nConvert the HF-style checkpoint to an NxDT checkpoint on a single instance:\n\n.. code-block:: bash\n\n  python3 checkpoint_converter.py --model_style hf --hw_backend trn1 --input_dir /hf/checkpoint/pytorch_model.bin --output_dir /nxdt/checkpoint --save_xser True --convert_from_full_state --config /path/to/config.json --tp_size 8 --pp_size 4 --n_layers 32 --kv_size_multiplier 1 --qkv_linear True\n\nThis command will create an NxDT checkpoint in ``output_dir: /nxdt/checkpoint``\nand it will be sharded with (tp=8, pp=4) like:\n\n.. 
code-block:: bash\n\n  $ ls /nxdt/checkpoint/model\n\n  -rw-r--r-- 1 user group 123 Aug 27 2024 dp_rank_00_tp_rank_00_pp_rank_00.pt\n  -rw-r--r-- 1 user group 456 Aug 27 2024 dp_rank_00_tp_rank_01_pp_rank_00.pt\n  ...........................................................................\n  -rw-r--r-- 1 user group 789 Aug 27 2024 dp_rank_00_tp_rank_07_pp_rank_02.pt\n  -rw-r--r-- 1 user group 122 Aug 27 2024 dp_rank_00_tp_rank_07_pp_rank_03.pt\n\nDirect HuggingFace Model Conversion\n-----------------------------------\n\nUsing the ``--hf_model_name`` argument allows users to directly convert checkpoint files hosted on HuggingFace\nwithout the need for manual downloading or merging of checkpoint files.\n\nTo use this feature, you can specify the HuggingFace model identifier using the ``--hf_model_name`` argument.\nThe script will then download the model and convert it directly to the NxDT format.\n\n.. note::\n\n   1. When using ``--hf_model_name``, do not specify ``--input_dir``. These arguments are mutually exclusive.\n   2. If both ``--hf_model_name`` and ``--input_dir`` are specified, the script will prioritize ``--input_dir`` and ignore ``--hf_model_name``.\n   3. You will be prompted to enter your HuggingFace API token. If you don't have one,\n      you can create it at https://huggingface.co/settings/tokens.\n   4. Ensure you have sufficient disk space to download and process the model.\n\nExample usage:\n\n.. code-block:: bash\n\n   python3 checkpoint_converter.py --model_style hf --hw_backend trn1 --hf_model_name \"meta-llama/Llama-2-7b-hf\" --output_dir /path/to/output --save_xser True --config /path/to/config.json --tp_size 8 --pp_size 4 --n_layers 32 --kv_size_multiplier 1 --qkv_linear False --convert_from_full_state\n\nThis command will download the Llama-2-7b model from HuggingFace,\nconvert it to NxDT format, and save it in the specified output directory.\n\nTroubleshooting\n^^^^^^^^^^^^^^^\n\n- If you encounter an error related to HuggingFace authentication, ensure you're using a valid API token.\n- If the download fails, check your internet connection and verify that the model identifier is correct."
  },
  {
    "path": "libraries/nxd-training/tutorials/hf_llama3_70B_pretraining.rst",
    "content": ".. _hf_llama3_70B_pretraining:\n\nHuggingFace Llama3.1/Llama3-70B Pretraining\n=============================================\n\nIn this example, we will compile and train a HuggingFace Llama3.1/Llama3-70B model\non multiple trn1 or newly launched trn2 instances using ParallelCluster with the ``NxD Training (NxDT)`` library.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nParallelCluster Setup\n^^^^^^^^^^^^^^^^^^^^^\n\nIn this example, we will use 16 trn1.32xlarge instances or 8 trn2.48xlarge instances with ParallelCluster.\nPlease follow the instructions here to create a cluster:\n`Train your model on ParallelCluster\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html>`_\n\nParallelCluster automates the creation of trainium clusters,\nand provides the Slurm job management system for scheduling and managing distributed training jobs.\nPlease note that the home directory on your ParallelCluster\nhead node will be shared with all of the worker nodes via NFS.\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nOnce you have launched ParallelCluster,\nplease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>`_.\n\nNext, we will need to install ``NxDT`` and its dependencies.\nPlease see the following installation guide for installing ``NxDT``:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`\n\n\nDownload the dataset\n--------------------\n\nLet's download training-data scripts for our experiments\n\n.. code:: ipython3\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/get_dataset.py\n\nThen download ``config.json`` file:\n\nFor Llama-3.1-70B:\n\n.. code-block:: bash\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_pp_llama_hf_pretrain/70B_config_llama3.1/config.json ~/\n\nFor Llama-3-70B:\n\n.. code-block:: bash\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_pp_llama_hf_pretrain/70B_config_llama3/config.json ~/\n\nTo tokenize the data, we must request the tokenizer from Hugging Face and Meta by following the\ninstructions at the following link: `HuggingFace Llama 3.1 70B Model <https://huggingface.co/meta-llama/Meta-Llama-3.1-70B>`__ . \n\nUse of the Llama models is governed by the Meta license.\nIn order to download the model weights and tokenizer, please visit the above website\nand accept their License before requesting access. After access has been granted,\nyou may use the following python3 script along with your own hugging face token to download and save the tokenizer.\n\n\n.. code:: ipython3\n\n   from huggingface_hub import login\n   from transformers import AutoTokenizer\n\n   login(token='your_own_hugging_face_token')\n\n   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3.1-70B')  \n   # For llama3 uncomment line below\n   # tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-70B')\n\n   tokenizer.save_pretrained(\".\")\n\nFor Llama3.1/Llama3, make sure your base directory has the following files:\n\n.. 
code:: ipython3\n\n   './tokenizer_config.json', './special_tokens_map.json', './tokenizer.json'\n\nNext, let’s download and pre-process the dataset:\n\n.. code:: ipython3\n\n   mkdir ~/examples_datasets/\n   python3 get_dataset.py --llama-version 3\n\n\n`Note:` In case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'. Use `repo_type` argument if needed.`` \nThis could be because of a stale cache. Try deleting the cache using: \n\n.. code:: ipython3\n\n   sudo rm -rf ~/.cache/\n\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``hf_llama3_70B_config.yaml`` for training on trn1 cluster, and ``hf_llama3_70B_trn2_config.yaml`` for trn2.\n\nIn this tutorial, we will train Llama3-70B model on multiple compute nodes. For training on trn1, please make sure ``hf_llama3_70B_config`` has the right configuration:\n\n.. code-block:: bash\n\n    trainer:\n      devices: 32\n      num_nodes: 16\n\nFor pretraining on trn2, ``hf_llama3_70B_trn2_config`` would contain:\n\n.. code-block:: bash\n\n    trainer:\n      devices: 64\n      lnc: 2 # default for trn2 workloads\n      num_nodes: 8\n\nOn trn2 instances, the configuration `lnc: 2` indicates that there is a 2-to-1 mapping between logical Neuron Core (lnc) and physical Neuron Core.\nAnother supported configuration is `lnc: 1`, in which case each node would expose 128 logical devices.\n\nThe default config here is a 70B parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage for the ``.yaml`` config files.\n\nOn trn1 cluster, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    export CONF_FILE=hf_llama3_70B_config\n    sbatch --exclusive \\\n        --nodes 16 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nOn trn2 cluster, run the following:\n\n.. 
code-block:: bash\n\n    export COMPILE=1\n    export CONF_FILE=hf_llama3_70B_trn2_config\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\n\nOnce you have launched the precompilation job, run the squeue command to view the\nSlurm job queue on your cluster. If you have not recently run a job on your cluster,\nit may take 4-5 minutes for the requested trn1.32xlarge or trn2.48xlarge nodes nodes to\nbe launched and initialized.\nOnce the job is running, squeue should show output similar to the following:\n\n\n.. code-block:: bash\n\n    JOBID  PARTITION  NAME      USER    ST  TIME  NODES NODELIST(REASON)\n    7      compute1   wrap      ubuntu  R   5:11  16    compute1-st-queue1-i1-[1-16]\n\nYou can view the output of the precompilation job by examining the file named\n``slurm-ZZ.out``,\nwhere ZZ represents the JOBID of your job in the squeue output above.\n\n.. code-block:: bash\n\n    tail -f slurm-7.out\n\nOnce the precompilation job is complete, just like the above output\nyou should see a message similar to the following in the logs:\n\n.. code-block:: bash\n\n    2024-11-07 09:57:13.000144:  39810  INFO ||NEURON_PARALLEL_COMPILE||: Total graphs: 36\n    2024-11-07 09:57:13.000144:  39810  INFO ||NEURON_PARALLEL_COMPILE||: Total successful compilations: 36\n    2024-11-07 09:57:13.000144:  39810  INFO ||NEURON_PARALLEL_COMPILE||: Total failed compilations: 0\n\nAt this point, you can press ``CTRL-C`` to exit the tail command.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\n\nTraining the model\n------------------\n\nYou can launch pre-training job similar to compilation by using the same\ntraining script but now turning off the ``COMPILE`` environment variable\n\nOn trn1 ParallelCluster:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    export CONF_FILE=hf_llama3_70B_config\n    sbatch --exclusive \\\n        --nodes 16 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nOn trn2 ParallelCluster:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    export CONF_FILE=hf_llama3_70B_trn2_config\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nAs outlined above, you can again use the ``squeue`` command to view the job queue,\nand also monitor the job in the same way with the ``tail`` command to see the training logs.\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. code-block:: bash\n\n    Epoch 0:   3%|▎         | 3/91 [16:05<7:52:06, 321.89s/it, loss=6.7, v_num=2, reduced_train_loss=13.40, lr=7.5e-9, parameter_norm=5536.0, global_step=1.000, consumed_samples=2048.0]\n    Epoch 0:   3%|▎         | 3/91 [16:05<7:52:06, 321.89s/it, loss=4.47, v_num=2, reduced_train_loss=13.40, lr=7.5e-9, parameter_norm=5536.0, global_step=2.000, consumed_samples=3072.0]\n    Epoch 0:   4%|▍         | 4/91 [21:20<7:44:18, 320.22s/it, loss=4.47, v_num=2, reduced_train_loss=13.40, lr=7.5e-9, parameter_norm=5536.0, global_step=2.000, consumed_samples=3072.0]\n    Epoch 0:   4%|▍         | 4/91 [21:20<7:44:18, 320.22s/it, loss=3.35, v_num=2, reduced_train_loss=13.40, lr=7.5e-9, parameter_norm=5536.0, global_step=3.000, consumed_samples=4096.0]\n\n\n.. 
note::\n    The convergence shown is for demonstration only and would differ based on instance type, model, and other factors.\n\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/hf_llama/``.\nOnce you have identified the directory, ``cd`` into it, and then launch TensorBoard:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/hf_llama/8/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node:\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during an ongoing training job, run ``neuron-top``:\n\n.. code-block:: bash\n\n    ssh compute1-st-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\n\nContinual Pre-training with Downloaded Meta Model Weights\n---------------------------------------------------------\nIf you want to perform continual pre-training using the model weights provided by Meta, follow these steps:\n\nEnsure you have the ``config.json`` file, which should have been downloaded as described in the `Download the dataset`_ section.\n\n\nDownload the model and convert the ``state_dict`` to NxDT checkpoint format\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nGet the conversion scripts described in the :ref:`Checkpoint Conversion <checkpoint_conversion>` tutorial. \nUse the ``hf_model_name`` argument to specify the HuggingFace model identifier for\nthe model you want to download and convert to NxDT format.\n\nRun the following to download the model and convert the ``state_dict`` to an NxDT sharded checkpoint.\n\nOn trn1 cluster:\n\n.. 
code-block:: bash\n\n   python3 ./checkpoint_converter_scripts/checkpoint_converter.py \\\n     --model_style hf \\\n     --hf_model_name meta-llama/Meta-Llama-3-70B \\\n     --hw_backend trn1 \\\n     --tp_size 32 --pp_size 8 --n_layers 80 \\\n     --output_dir /fsx/pretrained_weight/ \\\n     --convert_from_full_state --save_xser True \\\n     --kv_size_multiplier 4 --qkv_linear True \\\n     --config ~/config.json\n\nOn trn2 cluster:\n\n.. code-block:: bash\n\n   python3 ./checkpoint_converter_scripts/checkpoint_converter.py \\\n     --model_style hf \\\n     --hf_model_name meta-llama/Meta-Llama-3-70B \\\n     --hw_backend trn2 \\\n     --tp_size 32 --pp_size 4 --n_layers 80 \\\n     --output_dir /fsx/pretrained_weight/ \\\n     --convert_from_full_state --save_xser True \\\n     --kv_size_multiplier 4 --qkv_linear True \\\n     --config ~/config.json\n\n\n.. note::\n    This conversion process requires a large amount of host memory. Please run it on a trn1.32xlarge or trn2.48xlarge compute node. \n    In this example, the converted model is stored on FSx for Lustre to be accessed by all compute nodes.\n\nStart the continual training job by loading converted checkpoints\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn order to start the continual training job by loading this converted model as the initial weights, please update the config file (``hf_llama3_70B_config.yaml`` or ``hf_llama3_70B_trn2_config.yaml``) as below:\n\n.. code-block:: bash\n\n    exp_manager:\n    .\n    .\n      resume_from_checkpoint: /fsx/pretrained_weight/ # manually set the checkpoint file to load from\n    .\n    .\n    model:\n      # Miscellaneous\n      use_cpu_initialization: False # Init weights on the CPU (slow for large models) \n      weight_init_only: True \n\nCompared to the initial pre-training loss value, you should see a lower loss value when the training starts with Meta's model weights. Logs for one such sample run look like the following:\n\n.. code-block:: bash\n\n    Epoch 0:   3%|▎         | 3/91 [16:09<7:53:59, 323.17s/it, loss=0.834, v_num=7, reduced_train_loss=1.670, lr=7.5e-9, parameter_norm=4736.0, global_step=1.000, consumed_samples=2048.0]\n    Epoch 0:   3%|▎         | 3/91 [16:09<7:53:59, 323.17s/it, loss=0.556, v_num=7, reduced_train_loss=1.670, lr=7.5e-9, parameter_norm=4736.0, global_step=2.000, consumed_samples=3072.0]\n    Epoch 0:   4%|▍         | 4/91 [21:25<7:46:02, 321.41s/it, loss=0.556, v_num=7, reduced_train_loss=1.670, lr=7.5e-9, parameter_norm=4736.0, global_step=2.000, consumed_samples=3072.0]\n    Epoch 0:   4%|▍         | 4/91 [21:25<7:46:02, 321.41s/it, loss=0.417, v_num=7, reduced_train_loss=1.670, lr=7.5e-9, parameter_norm=4736.0, global_step=3.000, consumed_samples=4096.0]\n\n\nPretraining with Context Parallelism\n------------------------------------\n\nTo run pretraining with context parallelism, use the following YAML config file: ``hf_llama3_70B_CP_config.yaml``.\nThis YAML file has the following changes to enable context parallelism:\n\n\n.. 
code-block:: yaml\n\n    distributed_strategy:\n        context_parallel_size: 2\n\n    fusions:\n        flash_attention: False\n        ring_attention: True\n\n\n**distributed_strategy**\n    **context_parallel_size**\n\n    Context parallel degree to be used for sharding the sequence.\n\n    * **Type**: int\n    * **Required**: False\n    * **Default**: 1\n\n\n**fusions**\n    **ring_attention**\n\n    Setting this flag to ``True`` will use the ring attention module for\n    both the forward and backward passes.\n    This parameter must be ``True`` when\n    ``context_parallel_size`` is greater than 1.\n\n    * **Type**: bool\n    * **Required**: False\n\n\nIn the config file, ``context_parallel_size`` is set to the desired degree, and as\ncontext parallelism leverages ring attention instead of flash attention, we set ``ring_attention: True``,\nand ``flash_attention: False``.\n\nContext parallelism currently supports sequence lengths up to 32k and is supported on Trn1.\n\nCompile with:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    export CONF_FILE=hf_llama3_70B_CP_config\n    sbatch --exclusive \\\n        --nodes 16 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nand launch pre-training with:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    export CONF_FILE=hf_llama3_70B_CP_config\n    sbatch --exclusive \\\n        --nodes 16 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\n\nTroubleshooting Guide\n---------------------\n\nFor issues with ``NxDT``, please see:\n:ref:`NxDT Known Issues <nxdt_known_issues>`\n"
  },
  {
    "path": "libraries/nxd-training/tutorials/hf_llama3_8B_DPO_ORPO.rst",
    "content": ".. _hf_llama3_8B_DPO_ORPO:\n\nHF Llama3.1/Llama3-8B Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO) based Fine-tuning (Beta)\n=================================================================================================================================\n\nIn this example, we will show how to compile and finetune a pre-trained\nHF Llama3.1/Llama3-8B model on a single instance with the ``NxD Training (NxDT)`` library\nusing `Direct Preference Optimization (DPO) <https://arxiv.org/pdf/2305.18290>`_ and\n`Odds Ratio Preference Optimization (ORPO) <https://arxiv.org/abs/2403.07691>`_\nbased fine-tuning. The pre-trained Llama3-8B model serves as the foundation, and we will\nbuild upon this base by fine-tuning and aligning the model to adapt\nit to a specific task or dataset.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nOnce you have launched a Trn1 instance,\nPlease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>`_.\n\nNext, we will need to install ``NxDT`` and its dependencies.\nPlease see the following installation guide for installing ``NxDT``:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`.\n\nFor DPO and ORPO tests, We have to first install ``requirements.txt`` and then install ``alignment_requirements.txt``. We can use the following commands for the same:\n\n.. code-block:: shell\n\n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/requirements.txt\n    pip install -r requirements.txt\n    wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/alignment_requirements.txt\n    pip install -r alignment_requirements.txt\n\nDPO-YAML Configuration Overview\n-------------------------------\n\nYou can configure a variety of DPO-specific and model parameters for finetuning through the YAML file.\n\n.. 
code-block:: yaml\n\n    exp_manager:\n        resume_from_checkpoint: /pretrained_ckpt\n\n    data:\n        train_dir: /example_datasets/llama3_8b/data_dpo.jsonl\n        val_dir: null\n        dev_choose_samples: null\n        seq_length: 4096\n        tokenizer:\n            type: /llama3_tokenizer\n\n    model:\n        weight_init_only: True\n\n    model_alignment_strategy:\n        dpo:\n            kl_beta: 0.01\n            loss_type: sigmoid\n            max_prompt_length: 2048\n            precompute_ref_log_probs: True\n            truncation_mode: keep_start\n\n\n**exp_manager**\n    **resume_from_checkpoint**\n\n    Manually set the checkpoint file (pretrained/post-SFT checkpoint) to load from\n\n        * **Type**: str\n        * **Default**: ``/pretrained_ckpt``\n        * **Required**: True (start with pretrained checkpoint)\n\n**data**\n    **train_dir**\n\n    DPO training data - jsonl or arrow file\n\n    As DPO uses the HF-style ModelAlignment dataloader, we also use HF-style data file paths\n\n        * **Type**: str\n        * **Required**: True\n\n    **val_dir**\n\n    DPO validation data - jsonl or arrow file\n\n    As DPO uses the HF-style ModelAlignment dataloader, we also use HF-style data file paths\n\n        * **Type**: str\n        * **Required**: False\n\n    **dev_choose_samples**\n\n    If set, will use that many records from the\n    head of the dataset instead of using all of them. Set to null to use the full dataset\n\n        * **Type**: integer\n        * **Default**: null\n        * **Required**: False\n\n    **seq_length**\n\n    Set the sequence length for the training job.\n    For DPO, it is the total sequence length of the prompt and the (chosen/rejected) response concatenated together\n\n        * **Type**: integer\n        * **Required**: True\n\n    **tokenizer**\n        **type**\n\n        Set the tokenizer path/type\n\n            * **Type**: str\n            * **Default**: ``/llama3_tokenizer``\n            * **Required**: True\n\n**model**\n    **weight_init_only**\n\n    Load only the model states and ignore the optim states from the ckpt directory\n\n        * **Type**: bool\n        * **Default**: True\n\n**model_alignment_strategy**\n\n    Set only when using finetuning-specific algorithms (SFT, DPO, etc.) and parameter-efficient\n    fine-tuning methods like LoRA (Low-Rank Adaptation).\n\n        **dpo**\n            Direct Preference Optimization (DPO) specific parameters.\n\n            **kl_beta**\n\n            KL-divergence beta to control the divergence of the policy model from the reference model\n\n                * **Type**: float\n                * **Default**: 0.01\n                * **Required**: True\n\n            **loss_type**\n\n            Currently supports the sigmoid version of the optimized DPO loss\n\n                * **Type**: str\n                * **Default**: ``sigmoid``\n                * **Required**: True\n\n            **max_prompt_length**\n\n            Set the maximum length of the prompt in the concatenated prompt and (chosen/rejected) response input\n\n                * **Type**: integer\n                * **Required**: True\n\n            **precompute_ref_log_probs**\n\n            Enables precomputation of reference model log probabilities using a pre-fit hook;\n            ``False`` is not supported currently\n\n                * **Type**: bool\n                * **Required**: True\n\n            **truncation_mode**\n\n            Defines how to truncate if the size of (prompt+response) exceeds seq_length.\n            options: 
[\"keep_start\", \"keep_end\"]\n\n                * **Type**: str\n                * **Default**: ``keep_start```\n                * **Required**: True\n\nORPO-YAML Configuration Overview\n--------------------------------\n\nHere we show the ORPO-specific model parameters which can be configured\nfor finetuning through the YAML file.\nAnd below we explain the parameters that are new as compared to DPO-specific\nparameters.\n\n.. code-block:: yaml\n\n    exp_manager:\n        checkpoint_callback_params:\n            every_n_train_steps: 10\n        resume_from_checkpoint: /pretrained_ckpt\n\n    data:\n        train_dir: /example_datasets/llama3_8b/data_orpo.jsonl\n        val_dir: null\n        dev_choose_samples: null\n        seq_length: 4096\n        tokenizer:\n            type: /llama3_tokenizer\n\n    model:\n        encoder_seq_len: 4096\n        weight_init_only: True\n        optim:\n            lr: 1.5e-4\n            sched:\n                name: CosineAnnealing\n\n    model_alignment_strategy:\n        orpo:\n            beta: 0.1\n            max_prompt_length: 2048\n            truncation_mode: keep_start\n\n\n**exp_manager**\n\n    **checkpoint_callback_params.every_n_train_steps**\n\n    How often we want to checkpoint.\n\n        * **Type**: int\n        * **Required**: True\n\n**model**\n    **encoder_seq_length**\n\n    Setting the sequence length for the training job. This parameter is common for all\n    models supported in the library.\n\n        * **Type**: int\n        * **Required**: True\n\n    **optim.sched**\n\n    This is where the LR schedulers can be set. We can configure the schedulers supported by\n    ``NeMo``. All the schedulers can be configured according to the\n    `parameters specified here <https://github.com/NVIDIA/NeMo/blob/v1.14.0/nemo/core/config/schedulers.py>`__.\n\n        * **Type**: config\n        * **Possible Values**: ``LinearAnnealingWithWarmUp``, ``CosineAnnealing``, ``WarmupPolicy``,\n        *  ``WarmupHoldPolicy``, ``SquareAnnealing``, ``NoamAnnealing``, ``WarmupAnnealing``,\n        *   ``StepLR``, ``rprop``, ``ExponentialLR``\n        * **Required**: True\n\n\n **model_alignment_strategy**\n\n    Set only when using finetuning specific algorithms (SFT, DPO, ORPO, etc) and parameter-efficient\n    fine-tuning methods like LoRA (Low-Rank Adaptation).\n\n        **orpo**\n            Odds Ratio Preference Optimization (ORPO) specific parameters.\n\n            **beta**\n\n            KL-divergence beta to control divergence of policy model from reference model\n\n                * **Type**: float\n                * **Default**: 0.01\n                * **Required**: True\n\nDownload the dataset\n--------------------\n\nThe DPO (& ORPO) tutorial makes use of the same preprocessed version of `intel-orca_dpo_pairs`\npreference dataset that is stored in S3. The dataset can be downloaded to your cluster or\ninstance by running the following AWS CLI commands on the head node or your Trn1 instance:\n\n.. code-block:: bash\n\n    export DATA_DIR=~/examples_datasets/llama3_8b\n    mkdir -p ${DATA_DIR} && cd ${DATA_DIR}\n    aws s3 cp s3://neuron-s3/training_datasets/llama/dpo/data_dpo.jsonl .  --no-sign-request\n\nThen, download the ``config.json`` file:\n\nFor Llama-3.1-8B:\n\n.. code-block:: bash\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3.1/config.json ~/\n\n\nFor Llama-3-8B:\n\n.. 
code-block:: bash\n\n   wget -P ~/ https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json\n\n\nConvert data to DPO-specific Preference data format\n---------------------------------------------------\n\nIf you directly downloaded the `Intel ORCA_dpo_pairs dataset <https://huggingface.co/datasets/Intel/orca_dpo_pairs>`_, then you need to convert the\ndata into the preference dataset format using the script below.\n\n.. note::\n    For different datasets with different field names, make the necessary changes to the script accordingly.\n\n.. code-block:: python\n\n    from datasets import load_dataset\n\n    def preference_data_format(example):\n\n        system = \"<|im_start|>\\n\" + example['system'] + \"<|im_end|>\\n\"\n\n        # Format instruction\n        prompt = \"<|im_start|> \" + example['question'] + \"<|im_end|>\\n<|im_start|>assistant\\n\"\n\n        # Format chosen answer\n        chosen = example['chosen'] + \"<|im_end|>\\n\"\n\n        # Format rejected answer\n        rejected = example['rejected'] + \"<|im_end|>\\n\"\n\n        return {\n            \"prompt\": system + prompt,\n            \"chosen\": chosen,\n            \"rejected\": rejected,\n        }\n\n    # This particular dataset has the following fields: \"system\", \"question\", \"chosen\", \"rejected\"\n    dataset = load_dataset(\"json\", data_files=\"orca_rlhf.jsonl\", split=\"train\")\n\n    # Save columns\n    original_columns = dataset.column_names\n\n    # Format dataset\n    dataset = dataset.map(\n        preference_data_format,\n        remove_columns=original_columns\n        )\n\n    # Save the converted preference dataset\n    dataset.to_json(\"data_dpo.jsonl\")\n\n\nDownload pretrained model checkpoint and tokenizer\n--------------------------------------------------\n\nIn this tutorial, we will use a pretrained Llama3-8B checkpoint (post-SFT checkpoint preferred)\nfrom the original repository.\nFollow the steps to download the tokenizer and model checkpoint from\nthe pretraining stage: `<https://llama.meta.com/llama-downloads/>`_\n\nAlternatively, the model checkpoint and tokenizer can also be downloaded\nfrom HuggingFace by following this `guide <https://huggingface.co/meta-llama/Llama-3.1-8B#use-with-llama>`_\n\nYou can also directly download and convert the HuggingFace\nmodel checkpoint using :ref:`Direct HuggingFace Model Conversion <checkpoint_conversion>`\n\nCreate a folder ``llama3_tokenizer`` and copy the tokenizer contents to it.\n\nModify the following paths in the YAML file based on your specific directory configuration:\n\n1. ``model.model_config``\n2. ``exp_manager.resume_from_checkpoint``\n3. ``tokenizer.type``\n4. 
``train_dir`` and ``val_dir``\n\nYou can use your own Llama model, pretrained checkpoint, and tokenizer by\nmodifying the ``hf_llama3_8B_<DPO/ORPO>_config.yaml`` file.\n\n\nCheckpoint Conversion\n^^^^^^^^^^^^^^^^^^^^^\n\nFollow this :ref:`Checkpoint Conversion Guide <checkpoint_conversion>` to convert the\nHF-style Llama3-8B checkpoint\nto the NxDT-supported format and store it in the ``pretrained_ckpt`` directory.\nModify the config parameter ``exp_manager.resume_from_checkpoint`` path to the\nconverted pretrained checkpoint path.\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``hf_llama3_8B_<DPO/ORPO>_config.yaml``. The default config here is an 8B-parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage of the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    export CONF_FILE=hf_llama3_8B_<DPO/ORPO>_config\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see logs similar to this:\n\n.. code-block:: bash\n\n    2024-10-24 18:49:49.000950: INFO ||NEURON_PARALLEL_COMPILE||: Total graphs: 32\n    2024-10-24 18:49:49.000950: INFO ||NEURON_PARALLEL_COMPILE||: Total successful compilations: 32\n    2024-10-24 18:49:49.000950: INFO ||NEURON_PARALLEL_COMPILE||: Total failed compilations: 0\n\nThis indicates that your compilation has completed successfully.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\n\nTraining the model\n------------------\n\nThe fine-tuning job is launched almost exactly the same way as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start fine-tuning.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    export CONF_FILE=hf_llama3_8B_<DPO/ORPO>_config\n    ./train.sh\n\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. 
code-block:: bash\n\n    Epoch 0:   5%|▍         | 3/62 [02:59<58:44,  0.02it/s, v_num=8-06, reduced_train_loss=6.930, chosen_rewards=-0.81, rejected_rewards=-0.675, lr=2.73e-5, parameter_norm=1.95e+3, global_step=1.000, consumed_samples=32.00, throughput=0.108, throughput_peak=0.0677, gradient_norm=8.600]\n    Epoch 0:   6%|▋         | 4/62 [03:24<49:27,  0.02it/s, v_num=8-06, reduced_train_loss=6.790, chosen_rewards=-0.628, rejected_rewards=-0.64, lr=5.45e-5, parameter_norm=1.95e+3, global_step=3.000, consumed_samples=64.00, throughput=0.181, throughput_peak=0.146, gradient_norm=6.590]\n    Epoch 0:   8%|▊         | 5/62 [03:50<43:42,  0.02it/s, v_num=8-06, reduced_train_loss=6.790, chosen_rewards=-0.628, rejected_rewards=-0.64, lr=5.45e-5, parameter_norm=1.95e+3, global_step=3.000, consumed_samples=64.00, throughput=0.181, throughput_peak=0.146, gradient_norm=6.590]\n\n.. note::\n    The values in the above logs will differ based on the config used, package versions,\n    models, and other factors. This is just an example.\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/``.\nOnce you have identified the directory, cd into it, and then launch TensorBoard:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node:\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during an ongoing training job, run ``neuron-top``:\n\n.. code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with ``NxDT``, please see:\n:ref:`NxDT Known Issues <nxdt_known_issues>`"
  },
  {
    "path": "libraries/nxd-training/tutorials/hf_llama3_8B_SFT.rst",
    "content": ".. _hf_llama3_8B_SFT:\n\nHuggingFace  Llama3.1/Llama3-8B Supervised Fine-tuning\n======================================================\n\nIn this example, we will compile and finetune pre-trained HF  Llama3.1/Llama3-8B\nmodel on a single instance with the NxD Training library.\nThe pre-trained Llama3-8B model serves as the foundation, and we will\nbuild upon this solid base by fine-tuning the model to adapt\nit to a specific task or dataset.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nPlease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>`_.\n\nNext, we will need to install NxD Training and its dependencies.\nPlease see the following installation guide for installing NxD Training:\n:ref:`NxD Training Installation Guide <nxdt_installation_guide>`.\n\n\nSFT-YAML Configuration Overview\n-------------------------------\n\nYou can configure a variety of SFT-specific and model parameters for finetuning through the YAML file.\n\n.. code-block:: yaml\n\n    exp_manager:\n        resume_from_checkpoint: /pretrained_ckpt\n\n    data:\n        train_dir: /example_datasets/llama3_8b/training.jsonl\n        val_dir: /example_datasets/llama3_8b/validation.json\n        dev_choose_samples: 2250\n        seq_length: 4096\n        alignment_strategy:\n            sft:\n                packing: True\n        tokenizer:\n            type: /llama3_tokenizer\n\n    model:\n        weight_init_only: True\n\n\n**exp_manager**\n    **resume_from_checkpoint**\n\n    Manually set the checkpoint file (pretrained checkpoint) to load from\n\n        * **Type**: str\n        * **Default**: ``/pretrained_ckpt``\n        * **Required**: True (start with pretrained checkpoint)\n\n**data**\n\n    **train_dir**\n\n    SFT training data - jsonl or arrow file\n\n    As for SFT we use HF style ModelAlignment dataloader, we also use HF style data file paths\n\n        * **Type**: str\n        * **Required**: True\n\n    **val_dir**\n\n    SFT validation data - jsonl or arrow file\n\n    As for SFT we use HF style ModelAlignment dataloader, we also use HF style data file paths\n\n        * **Type**: str\n        * **Required**: False\n\n    **dev_choose_samples**\n\n    If set, will use that many number of records from the\n    head of the dataset instead of using all. 
Set to null to use the full dataset\n\n        * **Type**: integer\n        * **Default**: null\n        * **Required**: False\n\n    **seq_length**\n\n    Set the sequence length for the training job.\n\n        * **Type**: integer\n        * **Required**: True\n\n    **alignment_strategy**\n\n    Set only when using finetuning-specific algorithms (SFT, DPO, etc.) and their related\n    hyperparameters, such as the SFT-specific parameters below.\n\n        **sft**\n            **packing**\n\n            Packs multiple records into a single record up to the sequence length\n            supported by the model; if False, pad tokens are used to reach the sequence length.\n            Setting it to True increases throughput but might impact accuracy.\n\n                * **Type**: bool\n                * **Default**: False\n                * **Required**: False\n\n    **tokenizer**\n        **type**\n\n        Set tokenizer path/type\n\n            * **Type**: str\n            * **Default**: ``/llama3_tokenizer``\n            * **Required**: True\n\n**model**\n        **weight_init_only**\n\n        Load only the model states and ignore the optimizer states from the checkpoint directory\n\n            * **Type**: bool\n            * **Default**: True\n\n\nDownload the dataset\n--------------------\n\nThis tutorial makes use of a preprocessed version of the `databricks-dolly` instruction-following\ndataset that is stored in S3. The dataset can be downloaded to your cluster or instance\nby running the following AWS CLI commands on the head node or your Trn1 instance:\n\n.. code-block:: bash\n\n    export DATA_DIR=~/examples_datasets/llama3_8b\n    mkdir -p ${DATA_DIR} && cd ${DATA_DIR}\n    aws s3 cp s3://neuron-s3/training_datasets/llama/sft/training.jsonl .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/llama/sft/validation.jsonl .  --no-sign-request\n\n\nDownload pretrained model checkpoint and tokenizer\n--------------------------------------------------\n\nIn this tutorial, we will use a pretrained Llama3-8B checkpoint from the original repository.\nFollow the steps to download the tokenizer and model checkpoint from\nthe pretraining stage: `<https://llama.meta.com/llama-downloads/>`_\n\nAlternatively, the model checkpoint and tokenizer can also be downloaded\nfrom HuggingFace by following this `guide <https://huggingface.co/meta-llama/Llama-3.1-8B#use-with-llama>`_\n\nYou can also directly download and convert the HuggingFace\nmodel checkpoint using :ref:`Direct HuggingFace Model Conversion <checkpoint_conversion>`\n\nCreate a folder ``llama3_tokenizer`` and copy the tokenizer contents to it.\n\nModify the following paths in the YAML file based on your specific directory configuration:\n\n1. ``model.model_config``\n2. ``exp_manager.resume_from_checkpoint``\n3. ``tokenizer.type``\n4. 
``train_dir`` and ``val_dir``\n\nYou can use your custom model, pretrained checkpoint, and tokenizer by\nmodifying the ``hf_llama3_8B_SFT_config.yaml`` file.\n\n\nCheckpoint Conversion\n^^^^^^^^^^^^^^^^^^^^^\n\nFollow this :ref:`Checkpoint Conversion Guide <checkpoint_conversion>` to convert the\nHF-style Llama3-8B checkpoint\nto the NxDT-supported format and store it in the ``pretrained_ckpt`` directory.\nModify the config parameter ``exp_manager.resume_from_checkpoint`` path to the\nconverted pretrained checkpoint path.\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``hf_llama3_8B_SFT_config``. The default config here is an 8B-parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage of the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see logs similar to this:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nThis indicates that your compilation has completed successfully.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\n\nTraining the model\n------------------\n\nThe fine-tuning job is launched almost exactly the same way as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start fine-tuning.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    ./train.sh\n\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. 
code-block:: bash\n\n    Epoch 0:   0%|          | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]\n    Epoch 0:   0%|          | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]\n    Epoch 0:   0%|          | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/``.\nOnce you have identified the directory, cd into it, and then launch TensorBoard:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node:\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during an ongoing training job, run ``neuron-top``:\n\n.. code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with NxD Training, please see:\n:ref:`NxD Training Known Issues <nxdt_known_issues>`"
  },
  {
    "path": "libraries/nxd-training/tutorials/hf_llama3_8B_SFT_LORA.rst",
    "content": ".. _hf_llama3_8B_SFT_LORA:\n\nHuggingFace  Llama3.1/Llama3-8B Efficient Supervised Fine-tuning with LoRA (Beta)\n=================================================================================\n\nIn this example, we will compile and finetune pre-trained HF  Llama3.1/Llama3-8B model\nwith LoRA adaptors on a single instance with the ``NxD Training (NxDT)`` library.\nLoRA or Low Rank Adaptation allows for parameter-efficient fine-tuning (PEFT) by adding small trainable rank\ndecomposition matrices to specified layer of the model, significantly\nreducing memory usage and training time compared to dense fine-tuning.\nThe pre-trained Llama3-8B model serves as the foundation, and we will\nbuild upon this by fine-tuning the model to adapt it to a specific task or dataset.\n\n.. warning::\n   **9/18/2025**: Currently, the code in this tutorial does not work. We will be updating it at a futu\n\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nFirst, you can launch a Trn1 instance by following the Neuron DLAMI guide:\n`Neuron DLAMI User Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html>`_.\n\nOnce you have launched a Trn1 instance,\nfollow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>`_.\n\nNext, we will need to install ``NxDT`` and its dependencies.\nPlease see the following installation guide for installing ``NxDT``:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`.\n\n\nDownload the dataset\n--------------------\n\nThis tutorial makes use of a preprocessed version of `databricks-dolly` instruction-following\ndataset that is stored in S3. The dataset can be downloaded to your cluster or instance\nby running the following AWS CLI commands on the head node or your Trn1 instance:\n\n.. code-block:: bash\n\n    export DATA_DIR=~/examples_datasets/llama3_8b\n    mkdir -p ${DATA_DIR} && cd ${DATA_DIR}\n    aws s3 cp s3://neuron-s3/training_datasets/llama/sft/training.jsonl .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/llama/sft/validation.jsonl .  --no-sign-request\n\n\nThen, download the ``config.json`` file:\n\nFor Llama-3-8B:\n\n.. code-block:: bash\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json ~/\n\n\nDownload pretrained model checkpoint and tokenizer\n--------------------------------------------------\n\nIn this tutorial, we will use a pretrained Llama3-8B checkpoint from the original repository.\nFollow the steps to download tokenizer and model checkpoint from\nthe pretraining stage: `<https://llama.meta.com/llama-downloads/>`_.\n\nAlternatively, the model checkpoint and tokenizer can also be downloaded\nfrom HuggingFace by following this `guide <https://huggingface.co/meta-llama/Meta-Llama-3-8B#use-with-llama3>`_.\n\nYou can also directly download and covert the HuggingFace\nmodel checkpoint using :ref:`Direct HuggingFace Model Conversion <checkpoint_conversion>`.\n\nIf you choose to download the weights from HuggingFace with your own token, you can create a python script to run such as:\n\n.. 
code-block:: python\n\n    import transformers\n\n    tokenizer_path = \"llama3_tokenizer\"\n    model_weights_path = \"llama3-8B_hf_weights\"\n    model_id = \"meta-llama/Meta-Llama-3-8B\"\n\n    t = transformers.AutoTokenizer.from_pretrained(model_id)\n    t.save_pretrained(tokenizer_path)\n\n    m = transformers.AutoModelForCausalLM.from_pretrained(model_id)\n    m.save_pretrained(model_weights_path)\n\nCreate a folder ``llama3_tokenizer`` and copy the tokenizer contents to it.\n\nModify the following paths in the YAML file based on your specific directory configuration:\n\n1. ``model.model_config``\n2. ``exp_manager.resume_from_checkpoint``\n3. ``tokenizer.type``\n4. ``train_dir`` and ``val_dir``\n\nYou can use your custom model, pretrained checkpoint, and tokenizer by\nmodifying the ``hf_llama3_8B_SFT_lora_config.yaml`` file.\n\n\nCheckpoint Conversion\n^^^^^^^^^^^^^^^^^^^^^\n\nFollow this :ref:`Checkpoint Conversion Guide <checkpoint_conversion>` to convert the\nHF-style Llama3-8B checkpoint\nto the NxDT-supported format and store it in the ``pretrained_ckpt`` directory.\nModify the config parameter ``exp_manager.resume_from_checkpoint`` path to the\nconverted pretrained checkpoint path.\n\n\nLoRA SFT-YAML Configuration Overview\n------------------------------------\n\nYou can configure a variety of SFT, DPO, PEFT-specific, and model parameters for finetuning using the YAML file.\n\n.. code-block:: yaml\n\n    exp_manager:\n        resume_from_checkpoint: /pretrained_ckpt\n\n    data:\n        train_dir: /example_datasets/llama3_8b/training.jsonl\n        val_dir: /example_datasets/llama3_8b/validation.jsonl\n        dev_choose_samples: 2250\n        seq_length: 4096\n        tokenizer:\n            type: /llama3_tokenizer\n\n    model:\n        weight_init_only: True\n\n    model_alignment_strategy:\n        sft:\n            packing: True\n        peft:\n            lora_rank: 16\n            lora_alpha: 32\n            lora_dropout: 0.05\n            lora_bias: \"none\"\n            lora_verbose: True\n            target_modules: [\"qkv_proj\"]\n\n\n**exp_manager**\n    **resume_from_checkpoint**\n\n    Manually set the checkpoint file (pretrained checkpoint) to load from\n\n        * **Type**: str\n        * **Default**: ``/pretrained_ckpt``\n        * **Required**: True (start with pretrained checkpoint)\n\n**data**\n\n    **train_dir**\n\n    SFT training data - jsonl or arrow file\n\n    For SFT, we use the HF style ModelAlignment dataloader, so data file paths also follow the HF style\n\n        * **Type**: str\n        * **Required**: True\n\n    **val_dir**\n\n    SFT validation data - jsonl or arrow file\n\n    For SFT, we use the HF style ModelAlignment dataloader, so data file paths also follow the HF style\n\n        * **Type**: str\n        * **Required**: False\n\n    **dev_choose_samples**\n\n    If set, uses that many records from the\n    head of the dataset instead of using all of them. 
Set to null to use the full dataset\n\n        * **Type**: integer\n        * **Default**: null\n        * **Required**: False\n\n    **seq_length**\n\n    Set the sequence length for the training job.\n\n        * **Type**: integer\n        * **Required**: True\n\n    **tokenizer**\n        **type**\n\n        Set tokenizer path/type\n\n            * **Type**: str\n            * **Default**: ``/llama3_tokenizer``\n            * **Required**: True\n\n**model**\n        **weight_init_only**\n\n        Load only the model states and ignore the optimizer states from the checkpoint directory\n\n            * **Type**: bool\n            * **Default**: True\n\n**model_alignment_strategy**\n\n    Set only when using finetuning-specific algorithms (SFT, DPO, etc.) and parameter-efficient\n    fine-tuning methods like LoRA (Low-Rank Adaptation).\n\n        **sft**\n            Supervised Fine-Tuning (SFT) specific parameters.\n\n            **packing**\n\n            Packs multiple records into a single record up to the sequence length\n            supported by the model; if False, pad tokens are used to reach the sequence length.\n            Setting it to True increases throughput but might impact accuracy.\n\n                * **Type**: bool\n                * **Default**: False\n                * **Required**: False\n\n        **peft**\n            Configuration options for Parameter-Efficient Fine-Tuning (PEFT) methods,\n            specifically LoRA settings.\n\n            **lora_rank**\n\n            Rank of LoRA; determines the number of trainable parameters.\n            A higher rank allows for more expressive adaptations but increases memory usage.\n\n                * **Type**: int\n                * **Default**: 16\n                * **Required**: True\n\n            **lora_alpha**\n\n            Scaling factor for LoRA updates; affects the magnitude of LoRA adaptations.\n\n                * **Type**: int\n                * **Default**: 32\n                * **Required**: True\n\n            **lora_dropout**\n\n            Dropout rate for LoRA layers to prevent overfitting.\n\n                * **Type**: float\n                * **Default**: 0.05\n                * **Required**: False\n\n            **lora_bias**\n\n            Bias type for LoRA. Determines which biases are trainable. Can be 'none', 'all', or 'lora_only'.\n\n                * **Type**: str\n                * **Default**: \"none\"\n                * **Required**: False\n\n            **lora_verbose**\n\n            Enables detailed LoRA-related logging during training.\n\n                * **Type**: bool\n                * **Default**: False\n                * **Required**: False\n\n            **target_modules**\n\n            List of model layers to which LoRA is applied.\n\n                * **Type**: list[str]\n                * **Default**: [\"qkv_proj\"] (for Llama)\n                * **Required**: True\n\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. 
In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``hf_llama3_8B_SFT_lora_config``. The default config here is an 8B-parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage of the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples\n    export COMPILE=1\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see logs similar to this:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nThis indicates that your compilation has completed successfully.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\n\nTraining the model\n------------------\n\nThe fine-tuning job is launched almost exactly the same way as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start fine-tuning.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    ./train.sh\n\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. code-block:: bash\n\n    Epoch 0:   0%|          | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]\n    Epoch 0:   0%|          | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]\n    Epoch 0:   0%|          | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/``.\nOnce you have identified the directory, cd into it, and then launch TensorBoard:\n\n.. 
code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node,\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during on ongoing training job, run ``neuron-top``:\n\n.. code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with ``NxDT``, please see:\n:ref:`NxDT Known Issues <nxdt_known_issues>`\n"
  },
  {
    "path": "libraries/nxd-training/tutorials/hf_llama3_8B_pretraining.rst",
    "content": ".. _hf_llama3_8B_pretraining:\n\nHuggingFace Llama3.1/Llama3-8B Pretraining\n==========================================\n\nIn this example, we will compile and train a HF Llama3.1/Llama3-8B model on a single instance\nwith the ``NxD Training (NxDT)`` library.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nOnce you have launched a Trn1 instance,\nplease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html#setup-torch-neuronx>`_.\n\nNext, we will need to install ``NxDT`` and its dependencies.\nPlease see the following installation guide for installing ``NxDT``:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`\n\n\nDownload the dataset\n--------------------\n\nLet's download training-data scripts for our experiments\n\n.. code:: ipython3\n\n   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/get_dataset.py\n\n\nTo tokenize the data, we must request the tokenizer from Hugging Face and Meta by following the\ninstructions at the following link: `HuggingFace Llama 3 8B Model <https://huggingface.co/meta-llama/Meta-Llama-3-8B>`__ . \n\nUse of the Llama models is governed by the Meta license.\nIn order to download the model weights and tokenizer, please visit the above website\nand accept their License before requesting access. After access has been granted,\nyou may use the following python3 script along with your own hugging face token to download and save the tokenizer.\n\n\n.. code:: ipython3\n\n   from huggingface_hub import login\n   from transformers import AutoTokenizer\n\n   login(token='your_own_hugging_face_token')\n\n   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')  \n\n   tokenizer.save_pretrained(\".\")\n\nFor Llama3.1/Llama3, make sure your base directory has the following files:\n\n.. code:: ipython3\n\n   './tokenizer_config.json', './special_tokens_map.json', './tokenizer.json'\n\nNext let’s download and pre-process the dataset:\n\n.. code:: ipython3\n\n   mkdir ~/examples_datasets/ && cd ~/examples_datasets/\n   python3 ~/get_dataset.py --llama-version 3\n\n\n`Note:` In case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name'. Use `repo_type` argument if needed.`` \nThis could be because of a stale cache. Try deleting the cache using: \n\n.. code:: ipython3\n\n   sudo rm -rf ~/.cache/\n\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. 
In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``hf_llama3_8B_config``. The default config here is a 8B parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage for the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see a message similar to this:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nThen, you know your compilation has successfully completed.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\n\nTraining the model\n------------------\n\nThe pre-training job is launched almost exactly the same as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start pre-training.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    ./train.sh\n\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. code-block:: bash\n\n    Epoch 0:   0%|          | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]\n    Epoch 0:   0%|          | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]\n    Epoch 0:   0%|          | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]\n\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/``.\nOnce you have identifed the directory, cd into it, and then launch TensorBoard:\n\n.. 
code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/hf_llama3_8B/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node,\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during on ongoing training job, run ``neuron-top``:\n\n.. code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with ``NxDT``, please see:\n:ref:`NxDT Known Issues <nxdt_known_issues>`"
  },
  {
    "path": "libraries/nxd-training/tutorials/index.rst",
    "content": ".. _nxdt_tutorials:\n\nTutorials\n=========\n\nThis section will go over tutorials to help users get started with NxD Training library.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    HuggingFace Llama3.1/Llama3-8B Pretraining <hf_llama3_8B_pretraining>\n    HuggingFace Llama3.1/LLama3-8B Supervised Fine-tuning <hf_llama3_8B_SFT>\n    HuggingFace Llama3.1/Llama3-8B Efficient Supervised Fine-tuning with LoRA (Beta) <hf_llama3_8B_SFT_LORA>\n    HuggingFace Llama3.1/Llama3-8B Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO) based Fine-tuning (Beta) <hf_llama3_8B_DPO_ORPO>\n    HuggingFace Llama3.1/Llama3-70B Pretraining <hf_llama3_70B_pretraining>\n    Checkpoint Conversion <checkpoint_conversion>\n\n.. include:: /libraries/nxd-training/tutorials/tutorials.txt\n"
  },
  {
    "path": "libraries/nxd-training/tutorials/megatron_gpt_pretraining.rst",
    "content": ".. _megatron_gpt_pretraining:\n\nMegatron GPT Pretraining\n========================\n\nIn this example, we will compile and train a Megatron GPT model on a single instance or\non multiple instances using ParallelCluster with the NxD Training library.\nThe example has the following main sections:\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nSetting up the environment\n--------------------------\n\nParallelCluster Setup\n^^^^^^^^^^^^^^^^^^^^^\n\nIn this example, we will use 8 instances with ParallelCluster,\nplease follow the instructions here to create a cluster:\n`Train your model on ParallelCluster\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/devflows/training/parallelcluster/parallelcluster-training.html>`_\n\nParallelCluster automates the creation of trn1 clusters,\nand provides the SLURM job management system for scheduling and managing distributed training jobs.\nPlease note that the home directory on your ParallelCluster\nhead node will be shared with all of the worker nodes via NFS.\n\nInstall Dependencies\n^^^^^^^^^^^^^^^^^^^^\n\nOnce you have launched a trn1 instance or ParallelCluster,\nplease follow this guide on how to install the latest Neuron packages:\n`PyTorch Neuron Setup Guide\n<https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/setup/torch-neuronx.html#setup-torch-neuronx>`_.\n\nNext, we will need to install NxD Training and its dependencies.\nPlease see the following installation guide for installing NxD Training:\n:ref:`NxDT Installation Guide <nxdt_installation_guide>`\n\n\nDownload the dataset\n--------------------\n\nThis tutorial makes use of a preprocessed Wikipedia dataset that is stored in S3.\nThe dataset can be downloaded to your cluster or instance by running\nthe following commands on the head node or your trn1 instance:\n\n.. code-block:: bash\n\n    export DATA_DIR=~/examples_datasets/gpt2\n    mkdir -p ${DATA_DIR} && cd ${DATA_DIR}\n    wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json\n    wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx .  --no-sign-request\n    aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt .  --no-sign-request\n\n\n\nPre-compile the model\n---------------------\n\nBy default, PyTorch Neuron uses a just in time (JIT) compilation flow that sequentially\ncompiles all of the neural network compute graphs as they are encountered during a training job.\nThe compiled graphs are cached in a local compiler cache so that subsequent training jobs\ncan leverage the compiled graphs and avoid compilation\n(so long as the graph signatures and Neuron version have not changed).\n\nAn alternative to the JIT flow is to use the included ``neuron_parallel_compile``\ncommand to perform ahead of time (AOT) compilation. In the AOT compilation flow,\nthe compute graphs are first identified and extracted during a short simulated training run,\nand the extracted graphs are then compiled and cached using parallel compilation,\nwhich is considerably faster than the JIT flow.\n\nFirst, clone the open-source ``neuronx-distributed-training`` library\n\n.. 
code:: ipython3\n\n   git clone https://github.com/aws-neuron/neuronx-distributed-training\n   cd neuronx-distributed-training/examples\n\nNow, ensure that you are using the proper config file in the ``conf/`` directory.\nIn the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly\nset to the config for the model you want to use. In our case,\nit will be ``megatron_gpt_config``. The default config here is a 6.7B parameter model,\nbut users can also add their own ``conf/*.yaml`` files and run different configs and\nhyperparameters if desired. Please see :ref:`Config Overview <nxdt_config_overview>`\nfor examples and usage for the ``.yaml`` config files.\n\nNext, run the following commands to launch an AOT pre-compilation job on your instance:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    ./train.sh\n\nThe compile output and logs will be shown directly in the terminal\nand you will see a message similar to this:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nThen, you know your compilation has successfully completed.\n\n.. note::\n    The number of graphs will differ based on package versions, models, and other factors.\n    This is just an example.\n\nIf you are using ParallelCluster, then you will need to update the ``conf/megatron_gpt_config.yaml``\nwith\n\n.. code-block:: yaml\n\n    num_nodes: 8\n\nThen to run the compile job:\n\n.. code-block:: bash\n\n    export COMPILE=1\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nOnce you have launched the precompilation job, run the squeue command to view the\nSLURM job queue on your cluster. If you have not recently run a job on your cluster,\nit may take 4-5 minutes for the requested trn1.32xlarge nodes to be launched and initialized.\nOnce the job is running, squeue should show output similar to the following:\n\n.. code-block:: bash\n\n    JOBID  PARTITION  NAME      USER    ST  TIME  NODES NODELIST(REASON)\n    10     compute1   wrap      ubuntu  R   5:11  8     compute1-dy-queue1-i1-[0-7]\n\nYou can view the output of the precompilation job by examining the file named\n``slurm-ZZ.out``,\nwhere ZZ represents the JOBID of your job in the squeue output above.\n\n.. code-block:: bash\n\n    tail -f slurm-10.out\n\nOnce the precompilation job is complete, just like the above output\nyou should see a message similar to the following in the logs:\n\n.. code-block:: bash\n\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22\n    2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0\n\nAt this point, you can press ``CTRL-C`` to exit the tail command.\n\nTraining the model\n------------------\n\nThe pre-training job is launched almost exactly the same as the compile job.\nWe now turn off the ``COMPILE`` environment variable and\nrun the same training script to start pre-training.\n\nOn a single instance:\n\n.. code-block:: bash\n\n    export COMPILE=0\n    ./train.sh\n\nIf you are using ParallelCluster:\n\n.. 
code-block:: bash\n\n    export COMPILE=0\n    sbatch --exclusive \\\n        --nodes 8 \\\n        --cpus-per-task 128 \\\n        --wrap=\"srun ./train.sh\"\n\nAs outlined above, you can again use the ``squeue`` command to view the job queue,\nand also monitor the job in the same way with the ``tail`` command to see the training logs.\nOnce the model is loaded onto the Trainium accelerators and training has commenced,\nyou will begin to see output indicating the job progress:\n\nExample:\n\n.. code-block:: bash\n\n    Epoch 0:   0%|          | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]\n    Epoch 0:   0%|          | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]\n    Epoch 0:   0%|          | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]\n\nMonitoring Training\n-------------------\n\nTensorboard monitoring\n^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to the text-based job monitoring described in the previous section,\nyou can also use standard tools such as TensorBoard to monitor training job progress.\nTo view an ongoing training job in TensorBoard, you first need to identify the\nexperiment directory associated with your ongoing job.\nThis will typically be the most recently created directory under\n``~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/``.\nOnce you have identifed the directory, cd into it, and then launch TensorBoard:\n\n.. code-block:: bash\n\n    cd ~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/\n    tensorboard --logdir ./\n\nWith TensorBoard running, you can then view the TensorBoard dashboard by browsing to\n``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address,\nplease make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node,\n\n.. code-block:: bash\n\n    ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006\n\nneuron-top / neuron-monitor / neuron-ls\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe `neuron-top <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html>`_\ntool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization,\nand loaded graphs on a per-node basis. To use neuron-top during on ongoing training job,\nfirst SSH into one of your compute nodes from the head node (if using ParallelCluster), and then run ``neuron-top``:\n\n.. 
code-block:: bash\n\n    ssh compute1-dy-queue1-i1-1  # to determine which compute nodes are in use, run the squeue command\n    neuron-top\n\nSimilarly, once you are logged into one of the active compute nodes,\nyou can also use other Neuron tools such as\n`neuron-monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nand `neuron-ls <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html>`_\nto capture performance and utilization statistics and to understand NeuronCore allocation.\n\nTroubleshooting Guide\n---------------------\n\nFor issues with NxD Training, please see:\n:ref:`NxD Training Known Issues <nxdt_known_issues>`\n\nFor ParallelCluster issues see:\n`AWS ParallelCluster Troubleshooting <https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting-v3.html>`_\n"
  },
  {
    "path": "libraries/nxd-training/tutorials/tutorials.txt",
    "content": "* :ref:`megatron_gpt_pretraining`\n* :ref:`hf_llama3_8B_pretraining`\n* :ref:`hf_llama3_8B_SFT`\n* :ref:`hf_llama3_8B_SFT_LORA`\n* :ref:`hf_llama3_8B_DPO_ORPO`\n* :ref:`hf_llama3_70B_pretraining`\n* :ref:`checkpoint_conversion`\n"
  },
  {
    "path": "libraries/transformers-neuronx/index.rst",
    "content": ".. _transformers_neuronx_readme:\n\nTransformers NeuronX (``transformers-neuronx``)\n==============================================\n\n.. important::\n    As of 9/16/2025, support has ended for the Transformers NeuronX library and the corresponding documentation has been archived. If you are still using Transformers NeuronX, Neuron recommends that you migrate to the :ref:`NxD Inference library <nxdi-index>`, which provides a PyTorch-based inference library.  Refer to the :ref:`migration guide <nxdi_migrate_from_tnx>` to learn how to migrate your Transformers NeuronX workloads to NxD Inference.           \n\nTo view the archived Transformers NeuronX documentation, including setup instructions and developer guides, see the :ref:`Transformers NeuronX documentation <transformers_neuronx_archive_readme>`. \n\n\n"
  },
  {
    "path": "llms.txt",
    "content": "# AWS Neuron SDK Documentation\n\n> AWS Neuron SDK enables high-performance deep learning and generative AI on AWS Inferentia and Trainium instances. The SDK provides comprehensive support for PyTorch, JAX, TensorFlow, and distributed training/inference libraries, along with developer tools for profiling, debugging, and optimizing ML workloads.\n\n## Important notes for AI agents:\n\n* The ReStructured Text (RST, .rst) sources for this documentation can be found at https://github.com/aws-neuron/aws-neuron-sdk in the `master` branch.\n\n* When developing on behalf of or providing code or information to a user, observe the information listed below in this section:\n\n  * Use the `latest` version of the website (as present in the page URL) instead of older versions (with `/release-*.**.*/` in the page URL). Older versions of the docs have a higher probability of being incorrect over time.\n  * Currently, setting the number of logical NeuronCores (LogicalNeuronCores, LNC) to 8 is not supported. Do not suggest LNC=8 until the documentation clearly and expressly notes it is possible.\n\n## Getting Started\n\n- [AWS Neuron Documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html): Main documentation home page\n- [What is AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/what-is-neuron.html): Overview of the Neuron SDK for high-performance machine learning on AWS Inferentia and Trainium\n- [What's New](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html): Latest releases and features of the Neuron SDK\n- [Quickstarts](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/index.html): Quick start guides for getting started with AWS Neuron\n- [Inference Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/inference-quickstart.html): End-to-end inference quickstart on Neuron\n- [Training Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/training-quickstart.html): End-to-end training quickstart on Neuron\n- [Announcements](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html): SDK announcements including end-of-support and maintenance notices\n\n## Setup Guides\n\n- [Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html): Installation and upgrade instructions for Neuron on Trainium and Inferentia instances\n- [Multi-Framework DLAMI](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/multiframework-dlami.html): Pre-configured AMI with PyTorch, JAX, and vLLM virtual environments\n\n### PyTorch Setup\n\n- [PyTorch Setup Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html): Choose between DLAMI, DLC, or manual installation for PyTorch on Neuron\n- [PyTorch DLAMI Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlami.html): Install PyTorch via pre-configured Deep Learning AMI\n- [PyTorch DLC Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlc.html): Install PyTorch via Docker Deep Learning Container from ECR\n- [PyTorch Manual Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/manual.html): Install PyTorch manually using pip on bare OS\n- [Update PyTorch DLAMI](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlami.html): Update PyTorch version and drivers on an 
existing DLAMI\n- [Update PyTorch Manual](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-manual.html): Update PyTorch version and drivers on a manual installation\n- [Update PyTorch DLC](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlc.html): Update PyTorch container image and host driver\n\n### JAX Setup\n\n- [JAX Setup Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/index.html): Choose between DLAMI, DLC, or manual installation for JAX on Neuron\n- [JAX DLAMI Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlami.html): Install JAX via pre-configured Deep Learning AMI\n- [JAX DLC Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlc.html): Install JAX via Docker Deep Learning Container from ECR\n- [JAX Manual Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/manual.html): Install JAX manually using pip on bare OS\n\n### Legacy and Troubleshooting\n\n- [Legacy Inf1 Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/legacy-inf1/index.html): Installation guides for legacy Inferentia 1 instances\n- [Setup Troubleshooting](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/troubleshooting.html): Solutions for common setup issues\n\n## Core Concepts\n\n- [Neuron Architecture](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/index.html): Understanding the Neuron hardware and software architecture\n- [Neuron Hardware](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/index.html): Inferentia and Trainium hardware architecture details\n- [Neuron Features](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/index.html): Overview of model development features provided by Neuron\n- [Term Glossary](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/glossary.html): Definitions of key terms used in AWS Neuron documentation\n- [Model Samples](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/index.html): Pre-tested model samples and implementations\n- [Benchmarks](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/index.html): Training and inference performance benchmarks\n- [SDK Maintenance Policy](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/sdk-policy.html): AWS Neuron SDK maintenance, support lifecycle, and versioning policy\n- [FAQ](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/index.html): Frequently asked questions about AWS Neuron\n- [Troubleshooting](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/troubleshooting.html): Solutions for common issues with AWS Neuron\n- [News and Blogs](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/news-and-blogs/index.html): AWS Neuron news articles and blog posts\n\n## ML Frameworks\n\n- [ML Frameworks Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/index.html): PyTorch and JAX integration for high-performance machine learning on Neuron\n\n### PyTorch on Neuron\n\n- [PyTorch on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html): Complete PyTorch integration for both inference and training on Neuron hardware\n- [About PyTorch on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/about/index.html): History and 
evolution of PyTorch support (torch-neuron, torch-neuronx, TorchNeuron Native)\n- [Native PyTorch for Trainium](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/pytorch-native-overview.html): TorchNeuron native backend with eager mode and torch.compile for Trn2/Trn3\n- [Training with PyTorch NeuronX](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/training-torch-neuronx.html): Training guides and resources for Trn1/Trn2/Trn3\n- [Inference with PyTorch NeuronX](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/inference-torch-neuronx.html): Inference guides for Inf2 and Trn1/Trn2/Trn3\n- [PyTorch Training Developer Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide.html): Core concepts for training on Neuron with XLA\n- [PyTorch Training Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx.html): Step-by-step training examples\n- [PyTorch Inference Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorials-inference-torch-neuronx.html): Step-by-step inference examples\n- [torch-neuron vs torch-neuronx Comparison](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.html): Detailed comparison for inference workloads across Inf1 and Inf2/Trn1\n- [Supported Operators](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators.html): List of PyTorch operators supported on Neuron\n- [Multi-Node Training Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html): Configure multi-node distributed training on Trn1\n\n### JAX on Neuron\n\n- [JAX on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/index.html): JAX support with PJRT plugin and NKI integration\n- [JAX NeuronX Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-setup.html): Install and configure the JAX NeuronX plugin\n- [JAX API Reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/index.html): API reference for JAX NeuronX features\n- [JAX Environment Variables](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/neuron-envvars.html): JAX NeuronX environment variables reference\n- [JAX Known Issues](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-neuronx-known-issues.html): Known issues and limitations in JAX NeuronX\n\n### TensorFlow (Archived)\n\n- [TensorFlow on Neuron (Archived)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/index.html): Archived TensorFlow support documentation\n\n## Distributed Libraries & Inference\n\n- [NeuronX Distributed Libraries](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/index.html): High-performance distributed training and inference libraries\n- [NxD Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/index.html): PyTorch-based inference library for deploying large models\n- [NxD Inference 
Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/index.html): Comprehensive tutorials for deploying LLMs and vision models\n- [Llama 3.3 70B Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial.html): Deploy Llama 3.3 70B on Trn2\n- [Llama 3.1 405B Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial.html): Deploy Llama 3.1 405B on Trn2\n- [Llama 4 Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/llama4-tutorial.html): Deploy Llama 4 models on Neuron\n- [Multi-LoRA Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial.html): Serve multiple LoRA adapters with Llama 3.1 8B\n- [Qwen2 VL Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/qwen2-vl-tutorial.html): Deploy Qwen2 VL vision language model\n- [Qwen3 VL Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/qwen3-vl-tutorial.html): Deploy Qwen3 VL vision language model\n- [Flux Inference Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/flux-inference-tutorial.html): Deploy Flux.1 image generation model\n- [Flux Inpainting Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/flux-inpainting-inference-tutorial.html): Image inpainting with Flux.1\n- [Pixtral Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/pixtral-tutorial.html): Deploy Pixtral vision language model\n- [Disaggregated Inference Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/disaggregated-inference-tutorial.html): Separate prefill and decode for improved performance\n- [vLLM on Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/index.html): High-performance inference serving for large language models with OpenAI-compatible APIs\n- [vLLM Offline Serving Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-offline-serving.html): Run batch inference with vLLM on Neuron\n- [vLLM Online Serving Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-online-serving.html): Launch an OpenAI-compatible API server with vLLM on Neuron\n- [vLLM DLC Deployment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/get-started/quickstart-configure-deploy-dlc.html): Deploy a vLLM server using a pre-configured Neuron Deep Learning Container\n- [vLLM V1 User Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html): Complete guide for vLLM V1 with vLLM-Neuron Plugin\n- [vLLM User Guide (Legacy)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html): User guide for earlier vLLM versions\n- [NxD Training](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/index.html): PyTorch library for end-to-end distributed training with Neuron\n- [NxD Core](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index.html): 
Core distributed training and inference primitives\n\n## Developer Tools\n\n- [Developer Tools Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/index.html): Comprehensive suite of tools for optimizing, monitoring, and debugging ML workloads\n\n### Neuron Explorer\n\n- [Neuron Explorer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/index.html): Unified profiling suite with AI-driven optimization recommendations\n- [Neuron Explorer Getting Started](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/get-started.html): Set up Neuron Explorer, launch the web UI, and configure SSH tunneling\n- [Capture and View Profiles](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-profile-workload.html): Capture and view profiles in the UI or via VSCode integration\n- [Device Trace Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-device-profiles.html): Hardware-level execution timeline, operator table, and dependency analysis\n- [System Trace Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-system-profiles.html): System-level execution timeline and analysis\n- [Hierarchy Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-hierarchy-view.html): Model layer to hardware execution visualization\n- [Source Code Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-link-view-source-code.html): Bidirectional linking between source code and profile data\n- [Summary Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-summary-page.html): High-level performance metrics and optimization recommendations\n- [Database Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-database-viewer.html): SQL and natural language queries on profiling data\n- [Tensor Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-tensor-viewer.html): Tensor names, shapes, sizes, and memory usage details\n- [Memory Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-memory-viewer.html): Low-level memory allocation and usage pattern analysis\n- [AI Recommendation Viewer](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-ai-recommendations.html): AI-powered bottleneck analysis and optimization recommendations\n- [Migration from Neuron Profiler](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/migration-faq.html): Migration guide and FAQ for moving from Neuron Profiler to Neuron Explorer\n- [View Profiles with Perfetto](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/view-perfetto.html): View Neuron Explorer profiles using the Perfetto UI\n\n### System Tools\n\n- [System Tools](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/index.html): Command-line utilities for monitoring, debugging, and managing AWS Neuron devices\n- [Neuron Top](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html): Real-time monitoring of Neuron device utilization\n- [Neuron Monitor](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html): Collect metrics for monitoring and alerting\n\n## 
Neuron Runtime & Compiler\n\n- [NeuronX Runtime](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/index.html): High-performance execution engine for running models on AWS Inferentia and Trainium\n- [Runtime Configuration](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/configuration-guide.html): Learn how to configure the Neuron Runtime using environment variables\n- [Runtime API Reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/index.html): Comprehensive guide to the Neuron Runtime API\n- [Neuron Graph Compiler](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/index.html): Sophisticated compilation system that transforms ML models into optimized code\n- [Compiler Error Codes](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/index.html): Neuron Compiler error code documentation\n\n## Neuron Kernel Interface (NKI)\n\n- [NKI Introduction](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/index.html): Programming interface for direct access to AWS NeuronDevices\n- [NKI FAQ](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/nki_faq.html): Frequently asked questions about NKI\n- [NKI Getting Started](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/index.html): Setup and quickstart for NKI development\n- [NKI Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/quickstart-implement-run-kernel.html): Implement and run your first NKI kernel\n- [NKI Language Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/nki-language-guide.html): Developer guide for NKI's Pythonic language syntax\n- [NKI Setup Environment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/setup-env.html): Environment setup for NKI development\n\n### NKI Overviews & Concepts\n\n- [NKI About Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/index.html): Core concepts and architecture overview\n- [Memory Hierarchy Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/memory-hierarchy-overview.html): Understanding NKI memory hierarchy\n- [Data Representation Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/data-representation-overview.html): How data is represented in NKI\n- [Indexing Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/indexing-overview.html): Tensor indexing in NKI\n- [Tiling Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/tiling-overview.html): Tiling strategies for efficient computation\n- [DMA Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/nki-dma-overview.html): Direct Memory Access in NKI\n- [LNC Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/lnc.html): Large NeuronCore multi-core programming\n\n### NKI Tutorials\n\n- [NKI Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/index.html): Step-by-step tutorials for NKI kernel development\n- [Tensor Addition Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/tensor_addition_tutorial.html): Learn NKI basics with tensor addition\n- [SPMD Tensor Addition Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_tensor_addition.html): Multi-core tensor 
addition with SPMD\n- [SPMD Multi-NC Tensor Addition](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_multiple_nc_tensor_addition.html): Multi-NeuronCore SPMD programming\n- [Matrix Multiplication Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/matrix_multiplication_tutorial.html): Implement efficient matrix multiplication\n- [Attention Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/attention_tutorial.html): Build attention mechanisms in NKI\n- [Fused Mamba Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/fused_mamba_tutorial.html): Optimize Mamba state space models\n- [Average Pool2D Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/average_pool2d_tutorial.html): Implement 2D average pooling\n- [Transpose2D Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/transpose2d_tutorial.html): Efficient 2D tensor transpose\n- [Kernel Optimization Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/kernel-optimization.html): Techniques for optimizing NKI kernel performance\n\n### NKI Guides\n\n- [NKI Guides](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/index.html): Comprehensive guides for NKI development\n- [Framework Custom Operators](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/framework_custom_op.html): Integrate NKI kernels with PyTorch and JAX\n- [Scheduling APIs Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/how-to-scheduling-apis.html): Using NKI scheduling APIs for instruction control\n- [Architecture Overview](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/index.html): Hardware architecture guides\n- [Trainium & Inferentia2 Architecture](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium_inferentia2_arch.html): Architecture details for Trn1 and Inf2\n- [Trainium2 Architecture](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium2_arch.html): Architecture details for Trn2\n- [Trainium3 Architecture](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium3_arch.html): Architecture details for Trn3\n\n### NKI Deep Dives\n\n- [NKI Deep Dives](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/index.html): In-depth documentation on NKI concepts and optimization\n- [NKI Performance Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_perf_guide.html): Optimization techniques for NKI kernels\n- [Profiling NKI Kernels](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/use-neuron-profile.html): Learn how to profile NKI kernels with Neuron Explorer\n- [NKI Compiler](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-compiler.html): Documentation for the NKI compiler\n- [NKI 0.3.0 Update Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-0-3-0-update-guide.html): Migration guide for NKI 0.3.0 changes\n- [NKI Beta2 Migration Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-beta2-migration-guide.html): Migrating to NKI Beta 2\n- [NKI Block Dimension Migration](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_block_dimension_migration_guide.html): Migrating to 
new block dimension APIs\n- [NKI APS](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-aps.html): Automatic Performance Scheduling in NKI\n- [NKI DGE](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dge.html): Data Gather Engine programming\n- [NKI Dynamic Range](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dynamic-range.html): Dynamic range indexing in NKI\n- [NKI HBM CRC Hashing](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-hbm-crc-hashing.html): HBM CRC hashing for data integrity\n- [MxFP MatMul Deep Dive](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/mxfp-matmul.html): Microscaling floating-point matrix multiplication\n\n### NKI API Reference\n\n- [NKI API Reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/index.html): Complete API documentation for the Neuron Kernel Interface\n- [nki](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.html): Top-level NKI module\n- [nki.jit](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.jit.html): JIT compilation decorator for NKI kernels\n- [nki.language](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.html): NKI language constructs, memory types, and tile operations\n- [nki.language.tile_size](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.tile_size.html): Tile size constants for NKI programming\n- [nki.isa](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.isa.html): Instruction Set Architecture APIs for low-level operations\n- [nki.collectives](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.collectives.html): Collective communication operations across NeuronCores\n- [nki.simulate](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.simulate.html): NKI kernel simulation for CPU-based testing\n- [nki.api.shared](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.api.shared.html): Shared API utilities and types\n\n### NKI Library\n\n- [NKI Library](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/index.html): Pre-built optimized kernels for common operations\n- [NKI Library About](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/about/index.html): Overview and design principles of the NKI Library\n- [NKI Library API Reference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/index.html): Complete API reference for all NKI Library kernels\n- [Attention CTE Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-cte.html): Fused attention for context encoding\n- [Attention TKG Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-tkg.html): Fused attention for token generation\n- [Attention Block TKG Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-block-tkg.html): Fused attention block for token generation\n- [QKV Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/qkv.html): Fused QKV projection with quantization support\n- [MLP Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/mlp.html): Fused MLP with gate/up projection\n- [RMSNorm-Quant Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rmsnorm-quant.html): Fused RMSNorm with quantization\n- [RoPE 
Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rope.html): Rotary Position Embedding\n- [Router Top-K Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/router-topk.html): Expert selection for Mixture of Experts\n- [MoE CTE Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-cte.html): MoE context encoding\n- [MoE TKG Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-tkg.html): MoE token generation\n- [Cumsum Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cumsum.html): Cumulative sum operation\n- [Cross Entropy Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cross-entropy.html): Cross entropy forward and backward\n- [Output Projection CTE](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-cte.html): Output projection for context encoding\n- [Output Projection TKG](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-tkg.html): Output projection for token generation\n- [Transformer TKG Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/transformer-tkg.html): Full transformer block for token generation\n- [Conv1D Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/conv1d.html): 1D convolution\n- [Depthwise Conv1D Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/depthwise-conv1d.html): Depthwise 1D convolution\n- [Blockwise MM Backward](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/blockwise-mm-backward.html): Blockwise matrix multiply backward for MoE training\n- [Top-K Reduce Kernel](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/topk-reduce.html): Top-K reduction operation\n- [Dynamic Elementwise Add](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/dynamic-elementwise-add.html): Dynamic elementwise addition\n- [Find Nonzero Indices](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/find-nonzero-indices.html): Find nonzero element indices\n- [Fine-Grained All-Gather](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fg-allgather.html): Ring-based all-gather with compute overlap\n- [FGCC (All-Gather + Matmul)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fgcc.html): Fused fine-grained collective compute combining all-gather and matmul\n- [SBUF-to-SBUF All-Gather](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/sb2sb-allgather.html): SBUF-to-SBUF all-gather for small and large tensors\n\n### NKI Library Kernel Utilities\n\n- [Kernel Utilities](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/index.html): Shared utilities for NKI Library kernels\n- [TensorView](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tensor-view.html): Tensor view abstraction with rearrange and dynamic access\n- [SbufManager / Allocator](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/allocator.html): SBUF memory allocation management\n- [TiledRange](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tiled-range.html): Tiled range iteration utilities\n- [Stream Shuffle 
Broadcast](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/stream-shuffle-broadcast.html): Stream shuffle and broadcast utilities\n\n## Deployment & Orchestration\n\n- [AWS Workload Orchestration](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/index.html): Deployment patterns and best practices for running Neuron-powered applications\n- [Amazon EC2](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ec2-flows.html): Launching Inf/Trn instances on Amazon EC2\n- [Amazon EKS](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/eks-flows.html): Deploy Neuron workloads on Kubernetes with Amazon Elastic Kubernetes Service\n- [Amazon SageMaker](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/sagemaker-flows.html): Deploy Neuron workloads on Amazon SageMaker\n- [Neuron Containers](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html): Pre-configured Docker images for training and serving models\n- [Getting Started with Containers](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/getting-started.html): Step-by-step guide for building Neuron containers using Docker\n- [Kubernetes Getting Started](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html): Deploy Neuron workloads on Kubernetes\n- [Neuron DRA](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neuron-dra.html): Dynamic Resource Allocation for Kubernetes\n- [Container Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials.html): Hands-on tutorials for deploying containers on EC2, EKS, ECS\n\n## Release Notes & Support\n\n- [Release Notes](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html): Official home page for AWS Neuron SDK release notes\n- [What's New](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html): Latest releases and features of the Neuron SDK\n- [Latest Release (2.29.0)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/2.29.0.html): Release notes for Neuron SDK version 2.29.0\n- [Component Release Notes](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/index.html): Release notes for individual Neuron components\n- [Troubleshooting](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/troubleshooting.html): Solutions for common issues with AWS Neuron\n- [FAQ](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/index.html): Frequently asked questions about AWS Neuron\n\n## Optional\n\n- [Model Samples](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/index.html): Sample models and implementations for AWS Neuron\n- [Benchmarks](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/index.html): Training performance benchmarks for Trn1 instances with distributed training metrics\n- [PyTorch NeuronX Application Notes](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/index.html): Technical documentation for PyTorch NeuronX\n- [Open Source](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/oss/index.html): Neuron Open Source GitHub Repos and contribution guidelines\n- [Native PyTorch Support](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/pytorch-native-overview.html): Learn about native PyTorch support for inference 
and training\n- [TensorFlow Neuron Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/index.html): Tutorials for TensorFlow Neuron\n- [TensorFlow NeuronX Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.html): Tutorials for TensorFlow NeuronX\n- [NeMo Megatron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nemo-megatron/index.html): NeMo Megatron integration with Neuron for large-scale model training\n- [Third-party Tools](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/third-party-solutions.html): Third-party tools and integrations that support the AWS Neuron development experience\n- [Tool Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/index.html): Tutorials for how to utilize all Neuron Tools\n- [Setup Troubleshooting (legacy)](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-troubleshooting.html): Solutions for common setup issues (legacy page)\n- [Amazon ECS](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ecs-flows.html): Run containerized Neuron applications using Amazon Elastic Container Service\n- [AWS ParallelCluster](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/parallelcluster-flows.html): Set up HPC clusters for distributed training and inference workloads\n- [AWS Batch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/aws-batch-flows.html): Execute batch ML jobs with automatic scaling and resource management\n- [Container FAQ & Troubleshooting](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/faq.html): Frequently asked questions and solutions for common issues with Neuron containers\n- [Runtime Troubleshooting](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html): Solutions for common issues with the Neuron Runtime\n- [Neuron C++ Custom Operators](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/index.html): Documentation for creating custom operators in C++ for Neuron\n- [AWS Neuron DLAMIs](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html): Pre-configured Amazon Machine Images with Neuron SDK\n- [Security](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/security.html): Security disclosures and notification for the AWS Neuron SDK\n- [SDK Maintenance Policy](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/sdk-policy.html): AWS Neuron SDK maintenance and support policy\n"
  },
  {
    "path": "neuron-customops/api-reference-guide/api-reference-guide.rst",
    "content": "API Reference Guide\n===================\n\n\n.. toctree::\n    :maxdepth: 1\n\n    /neuron-customops/api-reference-guide/custom-ops-ref-guide"
  },
  {
    "path": "neuron-customops/api-reference-guide/custom-ops-ref-guide.rst",
    "content": ".. _custom-ops-api-ref-guide:\n\nCustom Operators API Reference Guide [Beta]\n============================================\n\nThis page provides the documentation for the C++ API available to creators of Neuron custom C++ operators (see :ref:`neuron_c++customops`).\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nTensor Library\n--------------\n\nThe tensor library used for Neuron custom C++ operators is based upon the PyTorch ATen tensor library. This includes the core Tensor class as well as select operations defined below. Users need to include the ``<torch/torch.h>`` header to access the tensor library. A small example of using the tensor library looks as follows.\n\n.. code-block:: c++\n\n    #include <torch/torch.h>\n    ...\n    torch::Tensor a = torch::zeros({32, 32, 3}, torch::kFloat);\n\nTensor Factory Functions\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe tensor factory functions provide different means for creating new tensors.\n\nThey each take in a ``size`` argument that specifies the size of each dimension of the tensor created (with the exception of ``eye``, which takes in two int64's and creates a strictly 2-dimensional identity matrix.)\n\n``c10::TensorOptions`` allows the specification of optional properties for the tensor being created. Currently, only the ``dtype`` property has an effect on tensor construction, and it must be specified. Other properties, such as ``layout`` may be supported in the future.\nThe example above shows a common way to use factory functions.\n\nThe following dtypes are supported:\n\n* torch::kFloat\n* torch::kBFloat16\n* torch::kHalf\n* torch::kInt\n* torch::kChar\n* torch::kShort\n* torch::kByte\n\n.. cpp:function:: torch::Tensor empty(torch::IntArrayRef size, c10::TensorOptions options)\n\n    Creates a tensor filled with uninitialized data, with the specified size and options. Slightly faster than other factory functions since it skips writing data to the tensor.\n\n.. cpp:function:: torch::Tensor full(torch::IntArrayRef size, const Scalar & fill_value, c10::TensorOptions options)\n\n    Creates a tensor filled with the specified ``fill_value``, with the specified size and options.\n\n.. cpp:function:: torch::Tensor zeros(torch::IntArrayRef size, c10::TensorOptions options)\n\n    Creates a tensor filled with zeros, with the specified size and options.\n\n.. cpp:function:: torch::Tensor ones(torch::IntArrayRef size, c10::TensorOptions options)\n\n    Creates a tensor filled with ones, with the specified size and options.\n\n.. cpp:function:: torch::Tensor eye(int64_t n, int64_t m, c10::TensorOptions options)\n\n    Creates a 2-D tensor with ones on the diagonal and zeros elsewhere.\n\nTensor Operation Functions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe tensor library provides commonly used operations defined below. The tensor operation functions do not support broadcasting; the shape of the operands must match if applicable. \n\nThe library provides two styles of functions for each tensor operation. For functions ending with ``_out``, a tensor with the proper size must be provided to which the output is written. This is illustrated in the example below.\n\n.. code-block:: c++\n\n    torch::exp_out(t_out, t_in);\n\nAlternatively, for functions that do not end in ``_out``, a new tensor that contains the results of the operation is allocated and returned as seen in the example below.\n\n.. code-block:: c++\n\n    torch::Tensor t_out = torch::exp(t_in);\n\n.. 
warning:: \n    Only operations that are documented below are supported.\n\n.. cpp:function:: torch::Tensor& abs_out(torch::Tensor &result, torch::Tensor &self)\n.. cpp:function:: torch::Tensor abs(torch::Tensor& self)\n\n    Computes the absolute value of each element in ``self``.\n\n.. cpp:function:: torch::Tensor& ceil_out(torch::Tensor &result, torch::Tensor &self)\n.. cpp:function:: torch::Tensor ceil(torch::Tensor &self)\n\n    Computes the ceiling of the elements of ``self``, the smallest integer greater than or equal to each element.\n\n.. cpp:function:: torch::Tensor& floor_out(torch::Tensor& result, torch::Tensor &self)\n.. cpp:function:: torch::Tensor floor(torch::Tensor &self)\n\n    Computes the floor of the elements of ``self``, the largest integer less than or equal to each element.\n\n.. cpp:function:: torch::Tensor& sin_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor sin(torch::Tensor& self)\n\n    Computes the sine value of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& cos_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor cos(torch::Tensor& self)\n\n    Computes the cosine value of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& tan_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor tan(torch::Tensor& self)\n\n    Computes the tangent value of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& log_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor log(torch::Tensor& self)\n\n    Computes the natural logarithm of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& log2_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor log2(torch::Tensor& self)\n\n    Computes the base-2 logarithm of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& log10_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor log10(torch::Tensor& self)\n\n    Computes the base-10 logarithm of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& exp_out(torch::Tensor& result, torch::Tensor& self)\n.. cpp:function:: torch::Tensor exp(torch::Tensor& self)\n\n    Computes the exponential of the elements of ``self``.\n\n.. cpp:function:: torch::Tensor& pow_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar & exponent)\n.. cpp:function:: torch::Tensor& pow_out(torch::Tensor& result, const torch::Scalar& self, const torch::Tensor & exponent)\n.. cpp:function:: torch::Tensor& pow_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor & exponent)\n.. cpp:function:: torch::Tensor pow(const torch::Tensor& self, const torch::Scalar & exponent)\n.. cpp:function:: torch::Tensor pow(const torch::Scalar& self, const torch::Tensor & exponent)\n.. cpp:function:: torch::Tensor pow(const torch::Tensor& self, const torch::Tensor & exponent)\n\n    Takes the power of each element in ``self`` with ``exponent``. \n\n.. cpp:function:: torch::Tensor& clamp_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar& minval, const torch::Scalar& maxval)\n.. cpp:function:: torch::Tensor clamp(const torch::Tensor& self, const torch::Scalar& minval, const torch::Scalar& maxval)\n\n    Clamps all elements in ``self`` into the range ``[minval, maxval]``.\n\n.. cpp:function:: torch::Tensor& add_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar &other, const torch::Scalar& alpha=1)\n.. 
cpp:function:: torch::Tensor& add_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other, const torch::Scalar& alpha=1)\n.. cpp:function:: torch::Tensor add(const torch::Tensor& self, const torch::Scalar &other, const torch::Scalar& alpha=1)\n.. cpp:function:: torch::Tensor add(const torch::Tensor& self, const torch::Tensor &other, const torch::Scalar& alpha=1)\n\n    Adds ``other``, scaled by ``alpha``, to ``self``:\n\n    .. math::\n\n        out = self + alpha \\times other\n\n.. cpp:function:: torch::Tensor& sub_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar &other, const torch::Scalar& alpha=1)\n.. cpp:function:: torch::Tensor& sub_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other, const torch::Scalar& alpha=1)\n.. cpp:function:: torch::Tensor sub(const torch::Tensor& self, const torch::Tensor &other, const torch::Scalar& alpha=1)\n.. cpp:function:: torch::Tensor sub(const torch::Tensor& self, const torch::Scalar& other, const torch::Scalar& alpha=1)\n\n    Subtracts ``other``, scaled by ``alpha``, from ``self``:\n\n    .. math::\n\n        out = self - alpha \\times other\n\n.. cpp:function:: torch::Tensor& mul_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar &other)\n.. cpp:function:: torch::Tensor& mul_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor mul(const torch::Tensor& self, const torch::Scalar &other)\n.. cpp:function:: torch::Tensor mul(const torch::Tensor& self, const torch::Tensor &other)\n\n    Multiplies ``self`` by ``other``.\n\n.. cpp:function:: torch::Tensor& div_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar &other)\n.. cpp:function:: torch::Tensor& div_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor div(const torch::Tensor& self, const torch::Scalar &other)\n.. cpp:function:: torch::Tensor div(const torch::Tensor& self, const torch::Tensor &other)\n\n    Divides ``self`` by ``other``.\n\n.. note:: \n   Tensor-tensor bitwise operations are performed elementwise between the two tensors. For scalar-tensor bitwise operations, the scalar is cast to the datatype of the tensor before computing the bitwise operation.\n\n.. cpp:function:: torch::Tensor& bitwise_and_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor& bitwise_and_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar& other)\n.. cpp:function:: torch::Tensor& bitwise_and_out(torch::Tensor& result, const torch::Scalar& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor bitwise_and(const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor bitwise_and(const torch::Tensor& self, const torch::Scalar& other)\n.. cpp:function:: torch::Tensor bitwise_and(const torch::Scalar& self, const torch::Tensor& other)\n\n    Computes the bitwise AND of ``self`` and ``other``. The input tensors must be of integral types.\n\n.. cpp:function:: torch::Tensor& bitwise_or_out(torch::Tensor& result, const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor& bitwise_or_out(torch::Tensor& result, const torch::Tensor& self, const torch::Scalar& other)\n.. cpp:function:: torch::Tensor& bitwise_or_out(torch::Tensor& result, const torch::Scalar& self, const torch::Tensor& other)\n.. 
cpp:function:: torch::Tensor bitwise_or(const torch::Tensor& self, const torch::Tensor& other)\n.. cpp:function:: torch::Tensor bitwise_or(const torch::Tensor& self, const torch::Scalar& other)\n.. cpp:function:: torch::Tensor bitwise_or(const torch::Scalar& self, const torch::Tensor& other)\n\n    Computes the bitwise OR of ``self`` and ``other``. The input tensors must be of integral types.\n\n.. cpp:function:: torch::Tensor& bitwise_not_out(torch::Tensor& result, const torch::Tensor& self)\n.. cpp:function:: torch::Tensor bitwise_not(const torch::Tensor& self)\n\n    Computes the bitwise NOT of ``self``. The input tensor must be of an integral type.\n\nClass torch::Tensor\n^^^^^^^^^^^^^^^^^^^\n\nConstructors\n\"\"\"\"\"\"\"\"\"\"\"\"\n\nUsers should not call the Tensor constructor directly but instead use one of the Tensor factory functions.\n\nMember Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. cpp:function:: template<typename T, size_t N> TensorAccessor<T,N,true> accessor() const&\n\n    Return a ``TensorAccessor`` for element-wise random access of a Tensor's elements. Scalar type and dimension template parameters must be specified. This const-qualified overload returns a read-only ``TensorAccessor``, preventing the user from writing to Tensor elements. See the Tensor Accessors section below for more details.\n\n.. cpp:function:: template<typename T, size_t N> TensorAccessor<T,N,false> accessor() &\n\n    Return a ``TensorAccessor`` for element-wise random access of a Tensor's elements. Scalar type and dimension template parameters must be specified. This non-const-qualified overload returns a ``TensorAccessor`` that can be used to both read and write to Tensor elements. See the Tensor Accessors section below for more details.\n\n.. cpp:function:: template<typename T> TensorReadStreamAccessor<T> read_stream_accessor() const&\n\n    Opens a streaming accessor for read on a tensor. Template parameter ``T`` is the scalar type of the tensor data. See the Streaming Accessors section below for more details.\n\n.. cpp:function:: template<typename T> TensorWriteStreamAccessor<T> write_stream_accessor() &\n\n    Opens a streaming accessor for write on a tensor. Template parameter ``T`` is the scalar type of the tensor data. See the Streaming Accessors section below for more details.\n\n.. cpp:function:: CoherencyEnforcer::Policy get_accessor_coherence_policy() const\n\n    Get the Tensor accessor coherence policy. See the Coherence section below for more details.\n\n.. cpp:function:: void set_accessor_coherence_policy(CoherencyEnforcer::Policy policy) const\n\n    Set the Tensor accessor coherence policy. See the Coherence section below for more details.\n\n.. cpp:function:: TensorTcmAccessor<true> tcm_accessor() const&\n\n    Opens a TCM accessor on a tensor. This const-qualified overload returns a read-only ``TensorTcmAccessor``, preventing the user from writing to Tensor elements. See the TCM Accessor section below for more details.\n\n.. cpp:function:: TensorTcmAccessor<false> tcm_accessor() &\n\n    Opens a TCM accessor on a tensor. This non-const-qualified overload returns a ``TensorTcmAccessor`` that can be used to both read and write to Tensor elements. See the TCM Accessor section below for more details.\n\n.. cpp:function:: torch::Tensor& fill_(const torch::Scalar & value) const\n\n    Fill a tensor with the specified value.\n\nTensor Operators\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. cpp:function:: Tensor& operator=(const Tensor &x) &\n.. 
cpp:function:: Tensor& operator=(Tensor &&x) &\n\n    Assignment operators\n\nTensor Accessors\n----------------\n\nThe standard tensor accessor provides element-wise random access to ``Tensor`` elements. It is created by calling ``Tensor::accessor()`` and can be used similarly to the PyTorch ATen version (see https://pytorch.org/cppdocs/notes/tensor_basics.html#cpu-accessors). However, it is not as fast as other methods of accessing a ``Tensor``, such as the streaming accessor or the TCM accessor.\n\n.. warning::\n    The standard tensor accessors can only be used in single core mode. Using standard tensor accessors in multicore mode is undefined behaviour and will cause race conditions, yielding incorrect results.\n\nExample Usage\n^^^^^^^^^^^^^\n\nElement-wise add of two 1D tensors using ``TensorAccessor``.\n\n.. code-block:: c++\n\n    torch::Tensor tensor_add_compute(const torch::Tensor& t1, const torch::Tensor& t2) {\n        size_t num_elem = t1.numel();\n        assert(t1.sizes() == t2.sizes());\n        torch::Tensor t_out = torch::empty({num_elem}, torch::kFloat);\n\n        auto t1_acc = t1.accessor<float, 1>();\n        auto t2_acc = t2.accessor<float, 1>();\n        auto t_out_acc = t_out.accessor<float, 1>();\n        for (size_t i = 0; i < num_elem; i++) {\n            t_out_acc[i] = t1_acc[i] + t2_acc[i];\n        }\n        return t_out;\n    }\n\n.. _custom-ops-ref-guide-mem-arch:\n\nMemory Architecture\n^^^^^^^^^^^^^^^^^^^\n\nTensor data is stored in HBM. The various types of accessors enable users to access tensor data from their custom C++ operator code running on the GPSIMD engine.\n\n.. image:: /neuron-customops/images/ncorev2_gpsimd_memory.png\n    :width: 600\n\nStreaming Accessors\n-------------------\n\nStreaming accessors provide the user the ability to access ``Tensor`` elements in sequential order, faster than the standard tensor accessor. There are two stream accessor classes, one for reading and one for writing. Users should not construct stream accessors directly, but should get them from a ``Tensor`` using ``Tensor::read_stream_accessor()`` and ``Tensor::write_stream_accessor()``.\n\nAn active stream accessor is defined as a stream accessor that has been instantiated and not yet closed (via the ``close()`` method or by going out-of-scope).\n\nThe user is responsible for managing stream accessors concurrently accessing the same ``Tensor``. For safest usage, no stream accessor should be active while there is an active ``TensorWriteStreamAccessor`` on the same ``Tensor``. The user may either have multiple ``TensorReadStreamAccessors`` active on the same ``Tensor``, or only have a single ``TensorWriteStreamAccessor`` active on that ``Tensor``. Stream accessors should not be used concurrently with standard tensor accessors on the same ``Tensor``.\n\nAn unlimited number of active stream accessors (in total, across all ``Tensors``) is functionally supported, but only up to 4 active stream accessors will be performant. Additional stream accessors beyond the 4th will have performance similar to that of a standard tensor accessor.\n\n.. warning::\n    Streaming accessors can only be used in single core mode. Using streaming accessors in multicore mode is undefined behaviour and will cause race conditions, yielding incorrect results.\n\nExample Usage\n^^^^^^^^^^^^^\n\nElement-wise add of two tensors using ``TensorReadStreamAccessor`` and ``TensorWriteStreamAccessor``.\n\n.. 
code-block:: c++\n\n    torch::Tensor tensor_add_compute(const torch::Tensor& t1, const torch::Tensor& t2) {\n        assert(t1.sizes() == t2.sizes());\n        torch::Tensor t_out = torch::empty(t1.sizes(), torch::kFloat);\n\n        auto t1_rd_stm_acc = t1.read_stream_accessor<float>();\n        auto t2_rd_stm_acc = t2.read_stream_accessor<float>();\n        auto t_out_wr_stm_acc = t_out.write_stream_accessor<float>();\n        for (int i = 0; i < t1.numel(); i++) {\n            auto sum = t1_rd_stm_acc.read() + t2_rd_stm_acc.read();\n            t_out_wr_stm_acc.write(sum);\n        }\n        return t_out;\n    }\n\nClass torch::TensorReadStreamAccessor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. cpp:class:: template<typename T> class torch::TensorReadStreamAccessor\n\n    The class template parameter ``T`` is the scalar type of the tensor data.\n\nMember Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. cpp:function:: T read()\n\n    Reads the next element in the stream. The user is responsible for knowing when to stop reading from a ``TensorReadStreamAccessor``. Reading past the end of the stream or on a closed stream results in undefined behaviour.\n\n.. cpp:function:: int close()\n\n    Closes the stream. Do not read from the stream after calling ``close()``.\n\nClass torch::TensorWriteStreamAccessor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. cpp:class:: template<typename T> class torch::TensorWriteStreamAccessor\n\n    The class template parameter ``T`` is the scalar type of the tensor data.\n\nMember Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. cpp:function:: void write(T value)\n\n    Writes to the next element in the stream. The written value is not guaranteed to be written back to the Tensor's memory until the ``TensorWriteStreamAccessor`` goes out of scope, or the user explicitly calls ``close()``. The user is responsible for knowing when to stop writing to a stream accessor. Writing past the end of the stream or on a closed stream results in undefined behaviour.\n\n.. cpp:function:: int close()\n\n    Closes the stream. Flushes write data to the ``Tensor``'s memory. Do not write to the stream after calling ``close()``.\n\nCoherence\n^^^^^^^^^\n\nStream accessors cache ``Tensor`` data in GPSIMD tightly-coupled memory (TCM), but do not ensure their caches remain coherent. When exactly they read from or write back to HBM is opaque to the user (except for ``close()``, which forces a write back).\n\nThe safest way to use them is to ensure that no stream accessor is active (instantiated and not yet closed) while there is an active write stream accessor on the same ``Tensor``. The user should either have multiple read stream accessors active on the same ``Tensor``, or only have a single write stream accessor active on that ``Tensor``.\n\nThe standard tensor accessors read/write HBM directly. Therefore, tensor accessors can safely concurrently access the same ``Tensor``, but it is safest not to use them concurrently with stream accessors since HBM isn't guaranteed to be coherent with the stream accessor caches.\n\nThese coarse-grained guidelines are best practices, but it is possible to ignore them with careful usage of the accessors (making sure elements are read before they are written to, elements written to are written back before being read again, etc.).\n\nThe coherence policy of a ``Tensor`` determines what to do when there is potentially incoherent access by an accessor of that ``Tensor``. It can either cause an error, allow the access but print a warning, or do nothing. 
In the case of the latter two options, it is the user's responsibility to ensure they carefully use accessors coherently. Coherence policy for ``Tensors`` is ``torch::CoherencyEnforcer::Policy::COHERENT`` by default, but can be changed using ``Tensor::set_accessor_coherence_policy()``.\n\n.. code-block:: c++\n\n    // class torch::CoherencyEnforcer\n    enum Policy {\n        // Enforce a resource is acquired in a way that guarantees coherence\n        // Causes an error if it encounters potentially incoherent access\n        COHERENT,\n\n        // Allows potentially incoherent access, but will print a warning\n        INCOHERENT_VERBOSE,\n\n        // Allows potentially incoherent access, no error or warnings\n        INCOHERENT_QUIET\n    };\n\nTCM Accessor\n------------\n\nTCM accessors provide the fastest read and write performance. TCM accessors allow the user to manually manage copying data between larger, but slower-access HBM to faster GPSIMD tightly-coupled memory (TCM). It may be beneficial to see the diagram under :ref:`custom-ops-ref-guide-mem-arch`. Create a ``TensorTcmAccessor`` from a ``Tensor`` by calling ``Tensor::tcm_accessor()``. Users can allocate and free TCM memory using ``tcm_malloc()`` and ``tcm_free()``. Users have access to a 16KB pool of TCM memory. Note the streaming accessors also allocate from this pool (4KB each). TCM accessors do not do any coherence checks.\n\n.. note:: \n    See :ref:`neuronx-customop-mlp-perf` for a tutorial on how to use TCM accessors. \n\nExample Usage\n^^^^^^^^^^^^^\n\nElement-wise negate of a tensor using ``TensorTcmAccessor``.\n\n.. code-block:: c++\n\n    torch::Tensor tensor_negate_compute(const torch::Tensor& t_in) {\n        size_t num_elem = t_in.numel();\n        torch::Tensor t_out = torch::empty(t_in.sizes(), torch::kFloat);\n\n        static constexpr size_t buffer_size = 1024;\n        float *tcm_buffer = (float *)torch::neuron::tcm_malloc(sizeof(float) * buffer_size);\n\n        if (tcm_buffer != nullptr) {\n            // tcm_malloc allocated successfully, use TensorTcmAccessor\n            auto t_in_tcm_acc = t_in.tcm_accessor();\n            auto t_out_tcm_acc = t_out.tcm_accessor();\n            for (size_t i = 0; i < num_elem; i += buffer_size) {\n                size_t remaining_elem = num_elem - i;\n                size_t copy_size = (remaining_elem > buffer_size) ? buffer_size : remaining_elem;\n\n                t_in_tcm_acc.tensor_to_tcm<float>(tcm_buffer, i, copy_size);\n                for (size_t j = 0; j < copy_size; j++) {\n                    tcm_buffer[j] *= -1;\n                }\n                t_out_tcm_acc.tcm_to_tensor<float>(tcm_buffer, i, copy_size);\n            }\n\n            torch::neuron::tcm_free(tcm_buffer);\n        } else {\n            // Handle not enough memory...\n        }\n\n        return t_out;\n    }\n\nTCM Management Functions\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. cpp:function:: void * torch::neuron::tcm_malloc(size_t nbytes)\n\n    Allocate ``nbytes`` bytes of memory from TCM and return pointer to this memory. Upon failure, returns null.\n\n.. cpp:function:: void torch::neuron::tcm_free(void * ptr)\n\n    Free memory that was allocated by ``tcm_malloc()``. Undefined behaviour if ``ptr`` was not returned from a previous call to ``tcm_malloc()``.\n\nClass torch::TensorTcmAccessor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
cpp:class:: template<bool read_only> class torch::TensorTcmAccessor\n\n    The ``read_only`` template parameter controls whether or not you can write to the accessor's ``Tensor``. A ``const Tensor`` will return a read-only ``TensorTcmAccessor`` from ``Tensor::tcm_accessor()``.\n\nMember Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. cpp:function:: template<typename T> void tensor_to_tcm(T * tcm_ptr, size_t tensor_offset, size_t num_elem)\n\n    Copy ``num_elem`` elements from the accessor's ``Tensor`` starting at the index ``tensor_offset`` to a TCM buffer starting at ``tcm_ptr``. Tensor indexing is performed as if the tensor was flattened. Template parameter ``T`` is the scalar type of the tensor data. The TCM buffer's size should be at least ``sizeof(T) * num_elem`` bytes.\n\n.. cpp:function:: template<typename T> void tcm_to_tensor(T * tcm_ptr, size_t tensor_offset, size_t num_elem)\n\n    Copy ``num_elem`` elements from a TCM buffer starting at ``tcm_ptr`` to the accessor's ``Tensor`` starting at the index ``tensor_offset``. Tensor indexing is performed as if the tensor was flattened. The TCM buffer's size should be at least ``sizeof(T) * num_elem`` bytes.\n\n\nWriting Directly to Output Tensor\n---------------------------------\n\n.. cpp:function:: torch::Tensor get_dst_tensor()\n\n    Returns a reference to the Custom C++ operator output tensor (return value). If this method is called, it is assumed that data will be written to this output tensor, and the tensor returned from the C++ operator will be ignored. Using this method will improve performance by avoiding additional copying of the return value. See the example below for usage.\n\n    .. code-block:: c++\n        :emphasize-lines: 4, 12\n        \n        // Example of write to get_dst_tensor()\n        torch::Tensor example_kernel(const torch::Tensor& t_in) {\n            size_t num_elem = t_in.numel();\n            torch::Tensor t_out = get_dst_tensor();\n            auto t_out_tcm_acc = t_out.tcm_accessor();\n\n            float *tcm_buffer = (float *)torch::neuron::tcm_malloc(sizeof(float) * buffer_size);\n            \n            // Populate tcm_buffer with results\n            ...\n            // Write to t_out through the tcm_accessor\n            t_out_tcm_acc.tcm_to_tensor<float>(tcm_buffer, offset, copy_size);\n            \n            ...\n        }\n\nUsing multiple GPSIMD cores\n---------------------------\n\n.. note:: \n    See :ref:`neuronx-customop-mlp-perf` for a tutorial on how to use multiple GPSIMD cores to execute the Custom C++ Operator.\n\nBy default, Custom C++ operators target a single core of the GPSIMD engine. Performance of Custom C++ operators can be improved by targeting multiple cores. To enable usage of multiple GPSIMD cores, ``multicore=True`` should be passed to ``custom_op.load()``.\n\n.. code-block:: python\n    :emphasize-lines: 6\n\n    custom_op.load(\n        name=name,\n        compute_srcs=compute_srcs,\n        shape_srcs=shape_srcs,\n        build_directory=os.getcwd(),\n        multicore=True\n    )\n\nEach GPSIMD core executes the same kernel function. The user can control the execution on each core by conditioning the Custom C++ operator logic on the core id (obtained via the ``get_cpu_id()`` API). This is illustrated in the example below.\n\n.. warning::\n    In multicore mode, tensors can only be accessed through TCM accessors. 
Using regular tensor accessors or streaming accessors will yield incorrect results.\n\nThe following functions are defined in ``neuron/neuron-utils.hpp``:\n\n.. cpp:function:: uint32_t get_cpu_id()\n\n    Return the id of the core that the Custom C++ operator is executing on. The id is in the range ``[0, get_cpu_count())``.\n\n.. cpp:function:: uint32_t get_cpu_count()\n\n    Return the total number of available GPSIMD cores.\n\n.. code-block:: c++\n    :emphasize-lines: 6, 7, 16\n\n    torch::Tensor example_kernel(const torch::Tensor& t_in) {\n        size_t num_elem = t_in.numel();\n        torch::Tensor t_out = get_dst_tensor();\n        auto t_out_tcm_acc = t_out.tcm_accessor();\n\n        uint32_t cpu_id = get_cpu_id();\n        uint32_t cpu_count = get_cpu_count();\n\n        uint32_t partition = num_elem / cpu_count;\n\n        float *tcm_buffer = (float *)torch::neuron::tcm_malloc(sizeof(float) * buffer_size);\n        // Populate tcm_buffer with desired results\n        ...\n\n        // Write to t_out with an offset computed from cpu_id and cpu_count\n        t_out_tcm_acc.tcm_to_tensor<float>(tcm_buffer, partition*cpu_id, copy_size);\n\n        ...\n    }\n\nReturn Value Handling\n^^^^^^^^^^^^^^^^^^^^^\n\nWhen using multiple GPSIMD cores, the ``get_dst_tensor()`` API must be used to write the return value of the Custom C++ operator. Not writing data to the tensor reference returned by ``get_dst_tensor()``, or not invoking ``get_dst_tensor()`` at all, results in undefined behavior. The user is responsible for writing the appropriate portion of the output reference tensor from a given GPSIMD core. Since there is no synchronization between GPSIMD cores, it is advised that each GPSIMD core writes to a mutually exclusive partition of the output reference tensor.\n\nprintf()\n--------------\n\nCustom C++ operators support the use of C++'s ``printf()`` to send information to the host's terminal. Using ``printf()`` is the recommended approach to functional debugging. With it, the programmer can check the value of inputs, outputs, intermediate values, and control flow within their operator.\n\nUsage\n^^^^^\n\nTo use ``printf()`` within a Custom C++ operator, the programmer must set the following environment variables before running their model in order to receive the messages printed by their operator:\n\n.. list-table:: Environment Variables\n   :widths: 50 200 20 200 200\n   :header-rows: 1\n\n   * - Name\n     - Description\n     - Type\n     - Value to Enable printf\n     - Default Value\n   * - ``NEURON_RT_LOG_LEVEL``\n     - Runtime log verbose level\n     - String\n     - At least ``INFO``\n     - See (:ref:`nrt-configuration`) for more options.\n   * - ``NEURON_RT_GPSIMD_STDOUT_QUEUE_SIZE_BYTES``\n     - Size of the printf output buffer, in bytes\n     - Integer\n     - Any power of two that is equal to or less than ``131072`` (128KB)\n     - Recommend setting a value of ``131072`` to maximize the size of printf's buffer. Setting a value of 0 disables printf.\n\nWithin a Custom C++ operator, ``printf()`` can be used as normal from within a C++ program. For more information, consult a reference such as https://cplusplus.com/reference/cstdio/printf/.\n\nExample\n^^^^^^^\n\n.. 
code-block:: c++\n\n    #include <torch/torch.h>\n    #include <stdio.h> // Contains printf()\n\n    torch::Tensor tensor_negate_compute(const torch::Tensor& t_in) {\n        size_t num_elem = t_in.numel();\n        torch::Tensor t_out = torch::zeros({num_elem}, torch::kFloat);\n\n        auto t_in_acc = t_in.accessor<float, 1>();\n        auto t_out_acc = t_out.accessor<float, 1>();\n        for (size_t i = 0; i < num_elem; i++) {\n            float tmp = -1 * t_in_acc[i];\n            printf(\"Assigning element %d to a value of %f\\n\", i, tmp);\n            t_out_acc[i] = tmp;\n        }\n        return t_out;\n    }\n\nPrint statements then appear on the host's terminal with a header message prepended:\n\n::\n\n    2023-Jan-26 00:25:02.0183  4057:4131   INFO  TDRV:pool_stdio_queue_consume_all_entries    Printing stdout from GPSIMD:\n    Assigning element 0 to a value of -1.000000\n    Assigning element 1 to a value of -2.000000\n    Assigning element 2 to a value of -3.000000\n    Assigning element 3 to a value of -4.000000\n    Assigning element 4 to a value of -5.000000\n    Assigning element 5 to a value of -6.000000\n    Assigning element 6 to a value of -7.000000\n    Assigning element 7 to a value of -8.000000\n\n\nLimitations\n^^^^^^^^^^^\n\n* Performance: using ``printf()`` significantly degrades the operator's performance.\n\n  * The programmer can disable it by unsetting ``NEURON_RT_GPSIMD_STDOUT_QUEUE_SIZE_BYTES`` or setting it to 0.\n\n    * We recommend that you disable ``printf()`` if you are running the model in a performance-sensitive context.\n\n  * To maximize performance, remove calls to ``printf()`` from within the operator.\n\n    * Even if ``printf()`` is disabled, calling the function incurs overhead.\n* Buffer size: output from ``printf()`` is buffered during model execution and read by the Neuron runtime after execution.\n\n  * The model can still execute successfully if you overflow the buffer.\n  * Overflowing the buffer causes the oldest data in the buffer to be overwritten.\n* Print statements are processed and printed to the host's terminal at the end of model execution, not in real time.\n* ``printf()`` is only supported in single core mode, or on GPSIMD core 0 only when using multiple GPSIMD cores.\n\nLibrary Limitations\n-------------------\n\n* Tensors passed into and returned from CustomOp functions can either have up to 8 dimensions where the maximum size of each dimension is 65535, or up to 4 dimensions where the maximum size of each dimension is 4294967295.\n* When using multiple GPSIMD cores, only ``TensorTcmAccessor`` is supported. Usage of other accessors results in undefined behaviour.\n* Each model can only have one CustomOp library, and the library can have 10 functions registered. For more information on function registration in PyTorch, see `Implementing an operator in C++` in the :ref:`feature-custom-operators-devguide`.\n\n  * However, models using ``torch.sort`` cannot have any CustomOps.\n"
  },
  {
    "path": "neuron-customops/customops-intro.txt",
    "content": "Neuron Custom C++ Operators enable developers to write C++ Custom Operators (“CustomOps”) that run on NeuronCores. This enables developers to extend operator support beyond what is officially supported by Neuron.\n\nDevelopers can use standard PyTorch custom operators programming interfaces to leverage Neuron Custom C++ Operators feature. This makes it easy to migrate CPU Custom Operators to Neuron, and implement new beta operators, all without any intimate knowledge of the NeuronCore hardware."
  },
  {
    "path": "neuron-customops/index.rst",
    "content": ".. _neuron_c++customops:\n\nNeuron Custom C++ Operators [Beta]\n==================================\n\n\n.. include:: /neuron-customops/customops-intro.txt\n\n\n.. note:: \n\n        Neuron Custom C++ Operators feature is currently supported on NeuronCore-v2 architecture only, which is found in Trainium (Trn1) and second-generation Inferentia (Inf2) chips.\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /neuron-customops/api-reference-guide/api-reference-guide\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n      \n    /neuron-customops/programming-guide/programming-guide\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /neuron-customops/tutorials/tutorials\n\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    /neuron-customops/misc-customops\n\n\n\n\n.. dropdown::  API Reference Guide\n      :class-title: sphinx-design-class-title-med\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n      :open:\n\n      * :ref:`custom-ops-api-ref-guide`       \n\n\n.. dropdown::  Developer Guide\n      :class-title: sphinx-design-class-title-med\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n      :open:\n\n      * :ref:`feature-custom-operators-devguide`\n\n\n.. dropdown::  Tutorials\n      :class-title: sphinx-design-class-title-med\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n      :open:\n\n      * :ref:`neuronx-customop-mlp-tutorial`\n      * :ref:`neuronx-customop-mlp-perf`\n\n\n.. dropdown::  Misc\n      :class-title: sphinx-design-class-title-med\n      :class-body: sphinx-design-class-body-small\n      :animate: fade-in\n      :open:\n\n  \n      * :ref:`gpsimd-customop-tools-rn`\n      * :ref:`gpsimd-customop-lib-rn`\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "neuron-customops/misc-customops.rst",
    "content": "Misc (Neuron Custom C++ Operators)\n==================================\n\n.. toctree::\n    :maxdepth: 1\n\n    /release-notes/archive/customcxxps/gpsimd-tools\n    /release-notes/archive/customcxxps/gpsimd-customop-lib"
  },
  {
    "path": "neuron-customops/programming-guide/custom-c++-operators-devguide.rst",
    "content": ".. _feature-custom-operators-devguide:\n\nNeuron Custom C++ Operators Developer Guide [Beta]\n==================================================\n\nThis document gives an overview of the Neuron Custom C++ Operator feature and APIs . Currently, CustomOp support is limited to the PyTorch framework.  \n\nPlease refer to the following documents for further information regarding Neuron Custom C++ Operators:\n\n* :ref:`neuronx-customop-mlp-tutorial`\n* :ref:`neuronx-customop-mlp-perf`\n* :ref:`custom-ops-api-ref-guide`\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nSetup & Installation\n--------------------\n\n.. note::\n   The name of ``aws-neuronx-gpsimd-customop`` has been changed to ``aws-neuronx-gpsimd-customop-lib`` as of the neuron 2.10 release.\n\nWe provide tooling and library packages (RPM and DEB) that can be installed on TRN1 and INF2 instances:\n::\n\n   aws-neuronx-gpsimd-tools-0.3\n   aws-neuronx-gpsimd-customop-lib-0.3\n\nFor AL2023 only, the following packages need be installed as dependencies:\n\n::\n   sudo dnf install libnsl\n   sudo dnf install libxcrypt-compat\n\nOn AL2023, they can be installed with the following commands:\n\n::\n   sudo dnf remove python3-devel -y\n   sudo dnf remove aws-neuronx-gpsimd-tools-0.* -y\n   sudo dnf remove aws-neuronx-gpsimd-customop-lib-0.* -y\n   \n   sudo dnf install python3-devel -y\n   sudo dnf install aws-neuronx-gpsimd-tools-0.* -y \n   sudo dnf install aws-neuronx-gpsimd-customop-lib-0.* -y\n\nOn Ubuntu, they can be installed with the following commands:\n\n::\n   sudo apt-get remove python3-dev -y\n   sudo apt-get remove aws-neuronx-gpsimd-tools=0.* -y\n   sudo apt-get remove aws-neuronx-gpsimd-customop-lib=0.* -y  \n   \n   sudo apt-get install python3-dev -y\n   sudo apt-get install aws-neuronx-gpsimd-tools=0.* -y\n   sudo apt-get install aws-neuronx-gpsimd-customop-lib=0.* -y \n\n\nImplementing an operator in C++\n-------------------------------\n\nCustom operators require a function that defines the custom computation. We define this as the **kernel function**. Neuron Custom C++ Operators also contain a **shape function** separate from the normal compute code. This *shape function* defines the shapes of output tensors for a given set of inputs to the operator. This is needed because PyTorch Neuron (torch-neuronx) is based on the PyTorch/XLA software package and uses a Just-In-Time (JIT) compilation strategy. At runtime the operators in the model will be compiled into a binary to be executed on the NeuronCore. During compilation the shapes of the input and output tensors to operators are computed. The **shape function** is executed on the host, whereas the **kernel function** is executed on the NeuronCore. \n\nKernel Function\n^^^^^^^^^^^^^^^\n\nThe kernel function contains the C++ implementation of the CustomOp, as shown in the example below.  By including torch.h in the source, the developer has access to a NeuronCore-ported subset of the torch C++ api  (https://pytorch.org/cppdocs/).  
The port contains everything required for CustomOp development and model integration, specifically Tensor and Scalar classes in c10, and a subset of aTen operators.\n::\n\n   #include <stdint.h>\n   #include <stdlib.h>\n   #include <torch/torch.h>\n\n   torch::Tensor tensor_negate_compute(const torch::Tensor& t_in) {\n      size_t num_elem = t_in.numel();\n      torch::Tensor t_out = torch::zeros({num_elem}, torch::kFloat);\n\n      auto t_in_acc = t_in.accessor<float, 1>();\n      auto t_out_acc = t_out.accessor<float, 1>();\n      for (size_t i = 0; i < num_elem; i++) {\n         t_out_acc[i] = -1 * t_in_acc[i];\n      }\n      return t_out;\n   }\n\nThe kernel function is the main computational code for the operator. We support a subset of the input types usable by regular PyTorch Custom Operators: ``torch::Tensor``, ``torch::Scalar``, ``double``, and ``int64_t``. However we do not support ``std::vector`` or ``std::tuple`` of these types at this time. When passing in scalars, ``double`` is the only supported dtype, no other integral types such as ``int``, ``short``, ``int64_t`` or ``long`` are supported. The return value must be a ``torch::Tensor``.\n\n.. warning::\n   Tensors passed into and returned from CustomOp functions can either have up to 8 dimensions where the maximum size of each dimension is 65535, or up to 4 dimensions where the maximum size of each dimension is 4294967295.\n\nThe body of the kernel function may exercise C/C++ libraries, ``torch::Tensor`` classes, and select aTen operators, as is customary for Torch programming.  For high performance, feature offerings provide faster memory access, via new Tensor Accessor classes and stack management compiler flags. Additionally, higher performance can be obtained by parallelizing execution of the kernel over multiple GPSIMD cores. See the :ref:`custom-ops-api-ref-guide` for more details.\n\nFinally, because the kernel is specially compiled for and run by the NeuronCore target, its tooling, libraries, and environment differ from the host pytorch installation. For example, while the host may run Pytorch 1.13 and a C++17 compatible compiler in a linux environment, the NeuronCore may run a port of Pytorch 1.12 (c10) and LLVM’s libc++ C++14 version 10.0.1 without linux.  Developers must develop for the compiler, torch version, and environment of their targeted NeuronCore.  See the :ref:`custom-ops-api-ref-guide` for more details.\n\n\nShape Function\n^^^^^^^^^^^^^^\n\nThe shape function has the same function signature as the kernel function, but does not perform any computations. Rather, it only defines the shape of the output tensor but not the actual values. \n::\n\n   #include <stdint.h>\n   #include <stdlib.h>\n   #include <torch/torch.h>\n\n   torch::Tensor tensor_negate_shape(torch::Tensor t1) {\n      size_t num_elem = t1.numel();\n      torch::Tensor t_out = torch::zeros({num_elem}, torch::kFloat);\n\n      return t_out;\n   }\n\nThe body of the shape function may exercise C/C++ libraries or ``torch::Tensor`` classes. The body may not access the data of input tensors since these are XLA Tensors and do not have any data storage allocated yet. However, any of the functions that access shape information such as *numel* (to get the number of elements) may be used. \n\n\nBuilding and executing operators\n--------------------------------\n\nOnce you have the kernel and shape functions for your operators you can build them into a library to use them from PyTorch in your model. 
Just like regular PyTorch Custom Operators, Neuron Custom C++ Operators use a registration macro to associate the kernel and shape functions with the name of the operator that will be called from Python.\n\nSimilar to PyTorch, Neuron Custom C++ Operators are grouped into libraries defined within the ``NEURON_LIBRARY(<lib_name>, m)`` scope, where ``lib_name`` is the name of your library of custom operators. Within this scope, calls to ``m.def(<op_name>, <shape_fcn>, <kernel_fcn>)`` define each operator in your library. The ``op_name`` is the name to call the operator with in the model (i.e. ``torch.ops.lib_name.op_name()``). The ``shape_fcn`` is a function pointer to the shape function to call during compilation. Finally, the ``kernel_fcn`` is the name of the function to be executed on the NeuronCore at runtime.\n::\n\n   #include <stdint.h>\n   #include <stdlib.h>\n   #include <torch/torch.h>\n   #include \"torchneuron/register.h\"\n\n   torch::Tensor tensor_negate_shape(torch::Tensor t1) {\n      size_t num_elem = t1.numel();\n      torch::Tensor t_out = torch::zeros({num_elem}, torch::kFloat);\n\n      return t_out;\n   }\n\n   NEURON_LIBRARY(my_ops, m) {\n      m.def(\"tensor_negate\", &tensor_negate_shape, \"tensor_negate_compute\");\n   }\n\nNotice that the ``NEURON_LIBRARY`` macro is used in the same C++ file as the shape function. This is because the registration is loaded on the host.\n\n.. warning::\n   Each model can only have one CustomOp library, and the library can have 10 functions registered. However, models using ``torch.sort`` cannot have any CustomOps.\n\nThe custom op library is built by calling the ``load`` API in Python like:\n::\n\n   import torch_neuronx\n   from torch_neuronx.xla_impl import custom_op\n\n   custom_op.load(\n      name='my_ops',\n      compute_srcs=['kernel.cpp'],\n      shape_srcs=['shape.cpp'],\n      multicore=False\n   )\n\nIn the example above, ``name`` refers to the name of the library file to be created (i.e. ``libmy_ops.so``) and the ``compute_srcs`` and ``shape_srcs`` are lists of files to be compiled. After the ``load`` API completes, the library will have been compiled and loaded into the current PyTorch process.\n\n.. warning::\n   The library file name should not be \"builtin\" as it is a reserved keyword.\n\nCustomOp also supports multicore execution mode. If you want the library to run in multicore mode, pass the flag ``multicore=True`` into the ``load`` API. Notice that the execution mode is specified at the library level, so all the functions in the library run in the same mode. For more details on multicore CustomOps, please refer to the `Using multiple GPSIMD cores` section in :ref:`custom-ops-api-ref-guide`.\n\nSimilar to PyTorch, the Neuron custom op will be available at ``torch.ops.<lib_name>.<op_name>`` where ``lib_name`` is defined in the ``NEURON_LIBRARY`` macro, and ``op_name`` is defined in the call to ``m.def``.\n::\n\n   import torch\n\n   out_tensor = torch.ops.my_ops.tensor_negate(in_tensor)\n\n\nLoading a previously built library\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe library can also be built ahead of time or in a separate process and loaded later. 
In the ``load`` API, specify the ``build_directory`` argument and the library will be written to that location on disk.\n::\n\n   import torch_neuronx\n   from torch_neuronx.xla_impl import custom_op\n\n   custom_op.load(\n      name='my_ops',\n      compute_srcs=['kernel.cpp'],\n      shape_srcs=['shape.cpp'],\n      build_directory=os.getcwd(),\n   )\n\nThen, later, this library can be loaded by calling the ``load_library`` API and using the ops in the exact same way.\n::\n\n   import torch\n   import torch_neuronx\n   from torch_neuronx.xla_impl import custom_op\n\n   custom_op.load_library('/home/user/libmy_ops.so')\n\n   out_tensor = torch.ops.my_ops.tensor_negate(in_tensor)\n\nNote: The ``load_library`` API does not need to be called in the same process where the library is built with the ``load`` API. Similar to regular PyTorch Custom Operators, Neuron Custom C++ Operators are built and loaded at the same time when the ``load`` API is called.\n\n\nPerformance Guidance\n--------------------\n\nWhen possible, it is recommended to use operators that are supported by the designated framework and can be compiled onto Neuron devices. These operators have been highly optimized for the Neuron architecture. However, for other scenarios where Custom C++ operators are the required solution, the following recommendations can be followed to improve performance:\n\n* Use the provided memory management accessors (streaming and TCM accessors). Both of these accessors reduce data fetch overhead. See the :ref:`custom-ops-api-ref-guide` for more information.\n* You can optionally specify the estimated amount of stack space (in bytes) used in your Custom C++ operator via the ``extra_cflags`` argument in the call to ``custom_op.load()``. For instance, if you anticipate your operator using ~20KB of stack space, include the argument ``extra_cflags=['-DSTACK_SIZE=20000']`` in the call to ``custom_op.load()``. **This is necessary only if you anticipate the stack to grow beyond ~8KB.** This flag is used to decide whether to place the stack in faster local memory, which significantly improves performance, or if we will need to place the stack in larger NeuronCore memory with longer access latency. If you do not specify this flag, or the estimate you provide is small enough (less than ~8KB), the stack will go in local memory. Note that when placed in local memory, the stack space will not be restricted by your estimate, but if your stack grows beyond ~8KB, there is a risk of a stack overflow, and you will be notified with an error message from GPSIMD should such a case occur. If you do specify a stack size, the maximum supported stack size is 400KB.\n* Use multiple GPSIMD cores when possible to parallelize execution of the Custom C++ operator (and hence improve performance); refer to the `Using multiple GPSIMD cores` section in :ref:`custom-ops-api-ref-guide` for more information.\n\nFunctional Debug\n----------------\n\nCustom C++ operators support the use of the C++ language's ``printf()``. For functional debugging, the recommended approach is to use ``printf()`` to print input, intermediate, and final values. Consult the :ref:`custom-ops-api-ref-guide` for more information.\n\n\n"
  },
  {
    "path": "neuron-customops/programming-guide/programming-guide.rst",
    "content": "Developer Guide\n===============\n\n.. toctree::\n    :maxdepth: 1\n\n    /neuron-customops/programming-guide/custom-c++-operators-devguide"
  },
  {
    "path": "neuron-customops/tutorials/customop-mlp-perf-opt.rst",
    "content": ".. _neuronx-customop-mlp-perf:\n\nNeuron Custom C++ Operators Performance Optimization\n====================================================\n\nIn this tutorial, we will build on the small MLP model shown in :ref:`neuronx-customop-mlp-tutorial` and demonstrate methods to optimize the performance of a custom C++ operator. We will be taking advantage of the TCM accessor as well as the usage of multiple GPSIMD cores to enhance performance.\n\nThis tutorial assumes the reader has read and set up an environment described in :ref:`neuronx-customop-mlp-tutorial`.\n\n.. contents:: Table of Contents\n    :local:\n    :depth: 2\n\nDownload Examples\n-----------------\n\nTo download the source code for this tutorial, do:\n\n.. code:: bash\n\n    git clone https://github.com/aws-neuron/aws-neuron-samples.git\n    cd aws-neuron-samples/torch-neuronx/inference/customop_mlp\n\n.. note:: \n    We will be using an inference example in this tutorial in order to adhere to certain Custom C++ operator restrictions when using multiple GPSIMD cores (see :ref:`custom-ops-api-ref-guide`  for details on current restrictions).\n\n.. note::\n\n    Custom C++ Operators are supported as of Neuron SDK Version 2.7 as a beta feature. As such this feature is not installed by default, additional tooling and library packages (RPM and DEB) are required. \n\n    For AL2023 only, the following packages need be installed as dependencies:\n    ::\n      sudo dnf install libnsl\n      sudo dnf install libxcrypt-compat\n    \n    On AL2023, they can be installed with the following commands:\n    ::\n      sudo dnf remove python3-devel -y\n      sudo dnf remove aws-neuronx-gpsimd-tools-0.* -y\n      sudo dnf remove aws-neuronx-gpsimd-customop-lib-0.* -y\n      \n      sudo dnf install python3-devel -y\n      sudo dnf install aws-neuronx-gpsimd-tools-0.* -y \n      sudo dnf install aws-neuronx-gpsimd-customop-lib-0.* -y\n\n    On Ubuntu, they can be installed with the following commands:\n    ::\n      sudo apt-get remove python3-dev -y\n      sudo apt-get remove aws-neuronx-gpsimd-tools=0.* -y\n      sudo apt-get remove aws-neuronx-gpsimd-customop-lib=0.* -y  \n      \n      sudo apt-get install python3-dev -y\n      sudo apt-get install aws-neuronx-gpsimd-tools=0.* -y\n      sudo apt-get install aws-neuronx-gpsimd-customop-lib=0.* -y  \n\nActivate the virtual environment created in :ref:`neuronx-customop-mlp-tutorial`,\n\n.. code:: shell\n\n    source ~/aws_neuron_venv_pytorch/bin/activate\n\nAs a reminder, ``ninja`` should be already installed in the virtual environment. If not, install it for PyTorch Custom Extensions in your environment by running:\n\n.. literalinclude:: tutorial_source_code/custom_c_perf_optimization/custom_c_perf_optimization_code.sh\n   :language: bash\n   :lines: 5-6\n\nModel Configuration Adjustment\n------------------------------\n\nFor this tutorial, we will enlarge the size of the hidden layer from ``[120, 84]`` to ``[4096, 2048]`` in ``model.py``.\n\n.. 
code-block:: python\n    :emphasize-lines: 8\n\n    import torch\n    import torch.nn as nn\n    from torch.nn import functional as F\n    import my_ops\n\n    # Declare 3-layer MLP for MNIST dataset                                                                \n    class MLP(nn.Module):\n        def __init__(self, input_size = 28 * 28, output_size = 10, layers = [4096, 2048]):\n            super(MLP, self).__init__()\n            self.fc1 = nn.Linear(input_size, layers[0])\n            self.fc2 = nn.Linear(layers[0], layers[1])\n            self.fc3 = nn.Linear(layers[1], output_size)\n\n        def forward(self, x):\n            f1 = self.fc1(x)\n            r1 = my_ops.Relu.apply(f1)\n            f2 = self.fc2(r1)\n            r2 = my_ops.Relu.apply(f2)\n            f3 = self.fc3(r2)\n            return torch.log_softmax(f3, dim=1)\n\nPerformance with Element-wise Accessor\n---------------------------------------\n\nThe ``neuron`` directory contains the same code shown in :ref:`neuronx-customop-mlp-tutorial`, where the ``relu_forward`` is implemented with element-wise accessor. Go to ``neuron`` directory, run ``build.py`` then ``inference.py``, the expected output on a trn1 instance is,\n\n.. code-block:: bash\n\n    Inf throughput (iter/sec): 8.098649744235592\n    ----------End Inference ---------------\n\nPerformance with TCM Accessor\n-----------------------------\nNow we switch to ``neuron-tcm`` folder. As mentioned in :ref:`custom-ops-api-ref-guide`, TCM accessors provide faster read and write performance. We implement the ``relu_forward`` using TCM accessor in ``relu.cpp``:\n\n.. code-block:: c++\n\n    torch::Tensor relu_forward(const torch::Tensor& t_in) {\n        size_t num_elem = t_in.numel();\n        torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat); \n\n        static constexpr size_t buffer_size = 1024;\n        float *tcm_buffer = (float*)torch::neuron::tcm_malloc(sizeof(float) * buffer_size);\n\n        if (tcm_buffer != nullptr) {\n            auto t_in_tcm_acc = t_in.tcm_accessor();\n            auto t_out_tcm_acc = t_out.tcm_accessor();\n\n            for (size_t i = 0; i < num_elem; i += buffer_size) {\n            size_t remaining_elem = num_elem - i;\n            size_t copy_size = (remaining_elem > buffer_size) ? buffer_size : remaining_elem;\n\n            t_in_tcm_acc.tensor_to_tcm<float>(tcm_buffer, i, copy_size);\n            for (size_t j = 0; j < copy_size; j++) {\n                tcm_buffer[j] = tcm_buffer[j] > 0.0 ? tcm_buffer[j] : 0.0;\n            }\n            t_out_tcm_acc.tcm_to_tensor<float>(tcm_buffer, i, copy_size);\n            }\n        }\n        torch::neuron::tcm_free(tcm_buffer);\n        return t_out;\n    }\n\nRun ``build.py`` then ``inference.py``, the expected output on a trn1 instance is:\n\n.. code-block:: bash\n\n    Inf throughput (iter/sec): 220.73800131604054\n    ----------End Inference ---------------\n\nExtending the example to utilize multiple GPSIMD cores\n------------------------------------------------------\n\nNow we switch to the ``neuron-multicore`` folder. We first enable the usage of multiple GPSIMD cores by ``multicore=True`` in the ``build.py``. \n\n.. code-block:: python\n\n    custom_op.load(\n        name='relu',\n        compute_srcs=['relu.cpp'],\n        shape_srcs=['shape.cpp'],\n        build_directory=os.getcwd(),\n        multicore=True,\n        verbose=True\n    )\n\nAfter passing the flag, the kernel function ``relu_forward`` defined in ``relu.cpp`` will execute on all GPSIMD cores. 
Thus we need to use ``cpu_id`` to partition the workload among all cores. \n\n.. code-block:: c++\n\n    torch::Tensor relu_forward(const torch::Tensor& t_in) {\n        size_t num_elem = t_in.numel();\n        torch::Tensor t_out = get_dst_tensor();\n\n        uint32_t cpu_id = get_cpu_id();\n        uint32_t cpu_count = get_cpu_count();\n        uint32_t partition = num_elem / cpu_count;\n        if (cpu_id == cpu_count - 1) {\n            partition = num_elem - partition * (cpu_count - 1);\n        }\n\n        static constexpr size_t buffer_size = 1024;\n        float *tcm_buffer = (float*)torch::neuron::tcm_malloc(sizeof(float) * buffer_size);\n\n        if (tcm_buffer != nullptr) {\n            auto t_in_tcm_acc = t_in.tcm_accessor();\n            auto t_out_tcm_acc = t_out.tcm_accessor();\n\n            for (size_t i = 0; i < partition; i += buffer_size) {\n            size_t remaining_elem = partition - i;\n            size_t copy_size = (remaining_elem > buffer_size) ? buffer_size : remaining_elem;\n\n            t_in_tcm_acc.tensor_to_tcm<float>(tcm_buffer, partition *cpu_id + i, copy_size);\n            for (size_t j = 0; j < copy_size; j++) {\n                tcm_buffer[j] = tcm_buffer[j] > 0.0 ? tcm_buffer[j] : 0.0;\n            }\n            t_out_tcm_acc.tcm_to_tensor<float>(tcm_buffer, partition *cpu_id + i, copy_size);\n            }\n        }\n        torch::neuron::tcm_free(tcm_buffer);\n        return t_out;\n    }\n\nThere are two things noteworthy in the code:\n\n1. We use ``cpu_id`` and ``cpu_count`` to distribute the workload among all cores. Particularly, each cores performs ``relu`` on a partition of the tensor, the offset is computed based on ``cpu_id``.\n2. The output of the operator is directly written to the tensor from ``get_dst_tensor()``. The ``return t_out;`` statement is ignored during execution.\n\nRun ``build.py`` then ``inference.py``, the expected output on a trn1 instance is:\n\n.. code-block:: bash\n\n    Inf throughput (iter/sec): 269.936119707143\n    ----------End Inference ---------------\n\nDetails of the API used in the sample here can be found in :ref:`custom-ops-api-ref-guide`. \n\n"
  },
  {
    "path": "neuron-customops/tutorials/customop-mlp-training.rst",
    "content": ".. _neuronx-customop-mlp-tutorial:\n\nNeuron Custom C++ Operators in MLP Training \n===========================================\n\nIn this tutorial we’ll demonstrate how to prepare a PyTorch model that contains a custom operator (ie. CppExtension) for Neuron compilation to run on Trainium EC2 instances. To learn more about Neuron CustomOps see :ref:`neuron_c++customops`. For a deeper dive on MNIST or Multi-Layer Perceptron models, see the :ref:`neuronx-mlp-training-tutorial`. This tutorial assumes the reader is familiar with `PyTorch Custom Extensions <https://pytorch.org/tutorials/advanced/cpp_extension.html>`_.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nSetup Environment and Download Examples\n---------------------------------------\n\nBefore running the tutorial please follow the installation instructions at:\n\n* :ref:`pytorch-neuronx-install` on Trn1\n\n.. note::\n    The name of ``aws-neuronx-gpsimd-customop`` has been changed to ``aws-neuronx-gpsimd-customop-lib`` as of the neuron 2.10 release.\n\n.. note::\n\n    Custom C++ Operators are supported as of Neuron SDK Version 2.7 as a beta feature. As such this feature is not installed by default, additional tooling and library packages (RPM and DEB) are required. \n\n    For AL2023 only, the following packages need be installed as dependencies:\n    ::\n        sudo dnf install libnsl\n        sudo dnf install libxcrypt-compat\n    \n    On AL2023, they can be installed with the following commands:\n    ::\n        sudo dnf remove python3-devel -y\n        sudo dnf remove aws-neuronx-gpsimd-tools-0.* -y\n        sudo dnf remove aws-neuronx-gpsimd-customop-lib-0.* -y\n\n        sudo dnf install python3-devel -y\n        sudo dnf install aws-neuronx-gpsimd-tools-0.* -y \n        sudo dnf install aws-neuronx-gpsimd-customop-lib-0.* -y\n\n    On Ubuntu, they can be installed with the following commands:\n    ::\n        sudo apt-get remove python3-dev -y\n        sudo apt-get remove aws-neuronx-gpsimd-tools=0.* -y\n        sudo apt-get remove aws-neuronx-gpsimd-customop-lib=0.* -y  \n\n        sudo apt-get install python3-dev -y\n        sudo apt-get install aws-neuronx-gpsimd-tools=0.* -y\n        sudo apt-get install aws-neuronx-gpsimd-customop-lib=0.* -y \n\n  \nFor all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:\n\n.. code:: shell\n\n    source ~/aws_neuron_venv_pytorch/bin/activate\n\nInstall dependencies for PyTorch Custom Extensions in your environment by running:\n\n.. literalinclude:: tutorial_source_code/custom_c_mlp_training/custom_c_mlp_training_code.sh\n   :language: bash\n   :lines: 5-6\n\nThe ``ninja`` package is only needed for the reference CPU example. It is not needed by Neuron to run on Trainium instances.\n    \nTo download the source code for this tutorial, do:\n\n.. code:: bash\n\n    git clone https://github.com/aws-neuron/aws-neuron-samples.git\n    cd aws-neuron-samples/torch-neuronx/training/customop_mlp\n\nIn the ``customop_mlp`` directory there are two subdirectories. The ``pytorch`` directory contains an example model and training script using a custom operator that runs using the cpu device with standard PyTorch APIs and libraries (ie. not specific to AWS/Neuron). The ``neuron`` directory contains a version of the same model and training script with the custom operator ported to Neuron to run on trn1 using the XLA device. 
\n\nBasic PyTorch Custom Relu Operator\n----------------------------------\n\nFor the next few sections we’ll review the example model in the ``pytorch`` directory. This is a condensed and simplified explanation of PyTorch C++ Extensions, for more details see the `PyTorch documentation <https://pytorch.org/tutorials/advanced/cpp_extension.html>`_. In ``my_ops.py`` we implement a custom relu activation op as a torch autograd function so that we can use it in a training loop:\n\n.. code-block:: python\n\n    import torch\n\n    torch.ops.load_library('librelu.so')\n\n    class Relu(torch.autograd.Function):\n        @staticmethod\n        def forward(ctx, input):\n            ctx.save_for_backward(input)\n            return torch.ops.my_ops.relu_forward(input)\n\n        @staticmethod\n        def backward(ctx, grad):\n            input, = ctx.saved_tensors\n            return torch.ops.my_ops.relu_backward(grad, input), None\n\nNotice that here we first load ``librelu.so`` using the ``load_library`` API. And then call the ``relu_forward`` and ``relu_backward`` functions from our library within the relevant static methods. \n\nWe implemented these two library functions in the ``relu.cpp`` file:\n\n.. code-block:: c++\n\n    torch::Tensor relu_forward(const torch::Tensor& t_in) {\n        ...\n        t_out_acc[i][j] = t_in_acc[i][j] > 0.0 ? t_in_acc[i][j] : 0.0;\n        ...\n    }\n\n    torch::Tensor relu_backward(const torch::Tensor& t_grad, const torch::Tensor& t_in) {\n        ...\n        t_out_acc[i][j] = t_in_acc[i][j] > 0.0 ? t_grad_acc[i][j] : 0.0;\n        ...\n    }\n\n    TORCH_LIBRARY(my_ops, m) {\n        m.def(\"relu_forward\", &relu_forward);\n        m.def(\"relu_backward\", &relu_backward);\n    }\n\nAnd then built them into a library using the PyTorch Cpp Extension APIs in the ``build.py`` script:\n\n.. code-block:: python\n\n    torch.utils.cpp_extension.load(\n        name='librelu',\n        sources=['relu.cpp'],\n        is_python_module=False,\n        build_directory=os.getcwd()\n    )\n\nRun ``python build.py`` to produce the ``librelu.so`` library.\n    \nMulti-layer perceptron MNIST model\n----------------------------------\n\nIn ``model.py``, we define the multi-layer perceptron (MLP) MNIST model with 3 linear layers and a custom ReLU activation, followed by a log-softmax layer. Highlighted below are the relevant custom changes in the ``model.py`` file:\n\n.. code-block:: python\n    :emphasize-lines: 4, 16, 18\n\n    import torch\n    import torch.nn as nn\n    from torch.nn import functional as F\n    import my_ops\n\n    # Declare 3-layer MLP for MNIST dataset                                                                \n    class MLP(nn.Module):\n        def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]):\n            super(MLP, self).__init__()\n            self.fc1 = nn.Linear(input_size, layers[0])\n            self.fc2 = nn.Linear(layers[0], layers[1])\n            self.fc3 = nn.Linear(layers[1], output_size)\n\n        def forward(self, x):\n            f1 = self.fc1(x)\n            r1 = my_ops.Relu.apply(f1)\n            f2 = self.fc2(r1)\n            r2 = my_ops.Relu.apply(f2)\n            f3 = self.fc3(r2)\n            return torch.log_softmax(f3, dim=1)\n\nTraining the MLP model on CPU\n-----------------------------\n\nIn the ``train_cpu.py`` script we load the MNIST train dataset, instantiate the MLP model, and use ``device='cpu'`` to execute on the host CPU. Expected CPU output:\n\n.. 
code:: bash\n\n    ----------Training ---------------\n    Train throughput (iter/sec): 286.96994718801335\n    Final loss is 0.1040\n    ----------End Training ---------------\n\nNeuron Relu CustomOp\n--------------------\n\nNow switch over into the ``neuron`` directory. To migrate our PyTorch CustomOp to Neuron, we have to make a few small changes. First, we create a new ``shape.cpp`` file to implement our shape function as required by XLA (see :ref:`feature-custom-operators-devguide` for details). We also replace the ``TORCH_LIBRARY`` API with ``NEURON_LIBRARY``.\n\n.. code-block:: c++\n\n    torch::Tensor relu_fwd_shape(torch::Tensor t_in) {\n        torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);\n        return t_out;\n    }\n\n    torch::Tensor relu_bwd_shape(torch::Tensor t_grad, torch::Tensor t_in) {\n        torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);\n        return t_out;\n    }\n\n    NEURON_LIBRARY(my_ops, m) {\n        m.def(\"relu_forward\", &relu_fwd_shape, \"relu_forward\");\n        m.def(\"relu_backward\", &relu_bwd_shape, \"relu_backward\");\n    }\n\nAnd then we build it using the ``torch_neuronx`` package in ``build.py``:\n\n.. code-block:: python\n\n    from torch_neuronx.xla_impl import custom_op\n\n    custom_op.load(\n        name='relu',\n        compute_srcs=['relu.cpp'],\n        shape_srcs=['shape.cpp'],\n        build_directory=os.getcwd()\n    )\n\nNotice that here we specify both the ``relu.cpp`` and ``shape.cpp`` files separately. This is because the shape functions will be compiled with an x86 compiler and run on the host during the XLA compilation, and the compute functions will be compiled for the NeuronCore accelerator and executed during the training loop. Running ``build.py`` produces the same ``librelu.so`` as in the CPU example, but compiles the source code to execute on the NeuronCore.\n\nIn our ``my_ops.py`` file we just use the ``torch_neuronx`` API to load our new library and execute our CustomOp exactly the same way we did before:\n\n.. code-block:: python\n\n    import torch\n    import torch_neuronx\n    from torch_neuronx.xla_impl import custom_op\n\n    custom_op.load_library('librelu.so')\n\n    class Relu(torch.autograd.Function):\n        @staticmethod\n        def forward(ctx, input):\n            ctx.save_for_backward(input)\n            return torch.ops.my_ops.relu_forward(input)\n\n        @staticmethod\n        def backward(ctx, grad):\n            input, = ctx.saved_tensors\n            return torch.ops.my_ops.relu_backward(grad, input), None\n\nTraining the MLP model on Trainium\n----------------------------------\n\nIn the ``train.py`` script we modify the CPU training script ``train_cpu.py`` to run with PyTorch Neuron (torch_xla). Expected output on a trn1 instance:\n\n.. code:: bash\n\n    ----------Training ---------------\n    2023-02-02 22:46:58.000299: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/USER_neuroncc-2.0.0.8683a0+c94c3936c/MODULE_4447837791278761679/MODULE_0_SyncTensorsGraph.329_4447837791278761679_ip-172-31-38-167.us-west-2.compute.internal-49ad7ade-14011-5f3bf523d8788/1650ba41-bcfd-4d15-9038-16d391c4a57c/MODULE_0_SyncTensorsGraph.329_4447837791278761679_ip-172-31-38-167.us-west-2.compute.internal-49ad7ade-14011-5f3bf523d8788.neff. 
Exiting with a successfully compiled graph\n    2023-02-02 22:46:58.000433: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/USER_neuroncc-2.0.0.8683a0+c94c3936c/MODULE_16964505026440903899/MODULE_1_SyncTensorsGraph.401_16964505026440903899_ip-172-31-38-167.us-west-2.compute.internal-4d0cabba-14011-5f3bf529794a3/23d74230-59dd-4347-b247-fa98aed416bd/MODULE_1_SyncTensorsGraph.401_16964505026440903899_ip-172-31-38-167.us-west-2.compute.internal-4d0cabba-14011-5f3bf529794a3.neff. Exiting with a successfully compiled graph\n    Train throughput (iter/sec): 117.47151142662648\n    Final loss is 0.1970\n    ----------End Training ---------------\n"
  },
  {
    "path": "neuron-customops/tutorials/tutorial_source_code/custom_c_mlp_training/custom_c_mlp_training_code.sh",
    "content": "#!/bin/bash\nset -eExuo\n\n# Install requirements\npip install regex\npip install ninja\n\ncd ~/aws-neuron-samples/torch-neuronx/training/customop_mlp\n\ncd pytorch\npython build.py\n\npython train_cpu.py\n\ncd ..\ncd neuron\npython build.py\n\npython train.py"
  },
  {
    "path": "neuron-customops/tutorials/tutorial_source_code/custom_c_perf_optimization/custom_c_perf_optimization_code.sh",
    "content": "#!/bin/bash\nset -eExuo\n\n# Install requirements\npip install regex\npip install ninja\n\ncd ~/aws-neuron-samples/torch-neuronx/inference/customop_mlp\n\ncd neuron\npython build.py\npython inference.py\n\ncd ..\ncd neuron-tcm\npython build.py\npython inference.py\n\ncd ..\ncd neuron-multicore\npython build.py\npython inference.py"
  },
  {
    "path": "neuron-customops/tutorials/tutorials.rst",
    "content": "Tutorials\n=========\n\n.. toctree::\n    :maxdepth: 1\n\n    /neuron-customops/tutorials/customop-mlp-training\n    /neuron-customops/tutorials/customop-mlp-perf-opt"
  },
  {
    "path": "neuron-runtime/about/collectives.rst",
    "content": ".. meta::\n    :description: Learn about Neuron Collective Communication in AWS Neuron SDK. Understand key operations like AllGather, ReduceScatter, AllReduce and All-to-All, along with intra-node and inter-node communication scopes.\n    :date-modified: 12/02/2025\n\n.. _about_collectives:\n\nWhat is Neuron Collective Communication?\n=========================================\n\nThis topic covers Neuron Collective Communication and how it applies to developing with the AWS Neuron SDK. Collectives are distributed communication primitives that enable ranks in a distributed workload to exchange data using simple, well-defined semantics. In Neuron, each rank can be represented by a physical or logical Neuron Core.\n\nOverview\n--------\n\nModern neural networks with billions to trillions of parameters exceed single-machine computational capacity, making distributed machine learning essential for training and deployment. Collectives are a set of distributed computing primitives with simple semantics, originally developed in HPC.\n\nCollective communication coordinates data exchange among multiple processes in distributed systems. Unlike point-to-point communication, collective operations involve groups performing tasks like gradient aggregation, parameter sharing, and computation synchronization.\n\n\nApplies to\n----------\n\nThis concept is applicable to:\n\n* **Distributed Training**: Collective communication aggregates and synchronizes gradients across workers to maintain model consistency. In this scenario, collective operations enable workers to compute gradient sums across all nodes, ensuring uniform parameter updates.\n\n* **Distributed Inference**: During inference, collective communication distributes requests across multiple accelerators in serving nodes, optimizing resource utilization and maintaining low latency under high loads.\n\nIn distributed training, workers compute gradients on different data batches simultaneously. Collective communication aggregates and synchronizes gradients across workers to maintain model consistency. Also, during inference, collective communication distributes requests across multiple accelerators in serving nodes, optimizing resource utilization and maintaining low latency under high loads.\n\nFrom a developer perspective, the training/inference code will have high-level invocations to collective functions like (PyTorch) ``all_gather``, ``all_reduce``, ``reduce_scatter``, ``all_to_all``, ``permute``, and others. See below for a visual representation of some key collective operations:\n\nCollective Operations\n---------------------\n\nAllGather Operation\n~~~~~~~~~~~~~~~~~~~\n\nIn the **AllGather** operation, each rank shares its tensor and receives the aggregated tensors from all ranks, ordered by rank index.\n\n.. image:: /neuron-runtime/img/collectives/all-gather.gif\n   :alt: AllGather Operation\n   :align: center\n   :width: 80%\n\nReduceScatter Operation\n~~~~~~~~~~~~~~~~~~~~~~~\n\nThe **ReduceScatter** operation performs reductions on input data (for example, sum, min, max) across ranks, with each rank receiving an equal-sized block/piece of the result based on its rank index.\n\n.. image:: /neuron-runtime/img/collectives/reduce-scatter.gif\n   :alt: ReduceScatter Operation\n   :align: center\n   :width: 80%\n\nAllReduce Operation\n~~~~~~~~~~~~~~~~~~~\n\nThe **AllReduce** operation performs reductions on data (e.g., sum, max, min) across ranks and stores the result in the output buffer of every rank.\n\n.. 
image:: /neuron-runtime/img/collectives/all-reduce.gif\n   :alt: AllReduce Operation\n   :align: center\n   :width: 80%\n\nAll-to-All Operation\n~~~~~~~~~~~~~~~~~~~~\n\nIn **All-to-All**, each rank sends different data to and receives different data from every other rank, resembling a distributed transpose.\n\n.. image:: /neuron-runtime/img/collectives/all-to-all.gif\n   :alt: All-to-All Operation\n   :align: center\n   :width: 80%\n\nPermute Operation\n~~~~~~~~~~~~~~~~~~\n\nIn the **Permute** operation, each rank sends its data to a designated destination rank and receives data from a designated source rank, according to a set of source-target pairs. The source-target pairs must form a valid ring topology with direct physical connectivity between adjacent ranks. Only ranks included in the source-target pairs participate in the collective execution; other ranks remain inactive during the operation. Currently, only circular permute patterns are supported.\n\n.. image:: /neuron-runtime/img/collectives/permute.gif\n   :alt: Permute Operation\n   :align: center\n   :width: 80%\n\n\n\nCommunication Scope\n--------------------\n\nCollective communication operations can be further categorized based on their scope within the distributed system topology. Understanding this distinction is crucial for optimizing performance and minimizing communication overhead in large-scale distributed training and inference. Collectives can be grouped into two main categories:\n\nIntra-node Collectives\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Intra-node collectives** operate within a single node or a group of nodes where all corresponding Neuron Chips are physically interconnected using NeuronLinks. These operations typically leverage high-bandwidth, low-latency chip-to-chip connections, high-speed PCIe links and NeuronLink interconnections. Since data remains within the local memory hierarchy (of one or more interconnected nodes), intra-node collectives generally offer superior bandwidth and lower latency compared to inter-node communication. However, depending on the size of the model, multiple nodes may be required for the job.\n\n  For more details, see :doc:`Intra-node Collective Communications with AWS Neuron </neuron-runtime/explore/intranode-collective-comm>`.\n\nInter-node Collectives\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Inter-node collectives** coordinate communication across multiple physical nodes in a distributed cluster, requiring data to traverse network infrastructure via EFA (Elastic Fabric Adapter) connections. While inter-node communication typically has higher latency and lower bandwidth than intra-node alternatives, it enables scaling beyond the computational limits of a single machine. Efficient inter-node collective implementations often employ hierarchical communication patterns, where intra-node operations are performed first, followed by inter-node coordination among designated processes.\n\nModern distributed training frameworks automatically optimize collective operations by combining intra-node and inter-node communication strategies. 
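From the application code, this hierarchy is not visible: the developer issues a single collective call, and the framework together with the Neuron runtime chooses the communication strategy. The sketch below is a minimal, hypothetical illustration using the generic ``torch.distributed`` API; the process-group backend, device placement, and launcher are assumptions here, and in a real Neuron workload they are provided by the framework integration (for example, ``torch_xla`` or NeuronX Distributed).\n\n.. code-block:: python\n\n    import torch\n    import torch.distributed as dist\n\n    # Hypothetical setup: the backend and launch mechanism depend on the\n    # framework integration in use; \"gloo\" is only a stand-in for illustration.\n    dist.init_process_group(backend=\"gloo\")\n\n    # Each rank contributes its own tensor.\n    t = torch.ones(4) * (dist.get_rank() + 1)\n\n    # AllReduce: after this call, every rank holds the element-wise sum\n    # of the tensors contributed by all ranks.\n    dist.all_reduce(t, op=dist.ReduceOp.SUM)\n\n    # all_gather, reduce_scatter, and all_to_all follow the same pattern\n    # with their respective torch.distributed entry points.\n\nThe call itself is the same regardless of how many chips or nodes participate; the runtime decides how to decompose it into intra-node and inter-node phases. 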
For example, in a Trn2 cluster, an all-reduce operation across 256 accelerators distributed across 4 nodes might first perform local reductions within each 64-accelerator node, then execute inter-node communication between the 4 nodes, and finally broadcast results back within each node.\n\n  For more details, see :doc:`Inter-node Collective Communications with AWS Neuron </neuron-runtime/explore/internode-collective-comm>`.\n\nSystem Connectivity\n-------------------\n\nEach Trainium 2 server (trn2.48xlarge or trn2u.48xlarge) consists of 16 Trainium2 chips, each connected to a 200Gbps EFA (`Elastic Fabric Adapter <https://aws.amazon.com/hpc/efa/>`__) network interface, for an aggregated 3.2Tbps `device-RDMA connectivity <https://en.wikipedia.org/wiki/Remote_direct_memory_access>`__. Each Trainium2 chip consists of eight physical NeuronCores. These physical cores can also be configured as Logical Cores or LNC (Logical Neuron Core). By default, each two NeuronCores are exposed as one (Logical) rank (LNC=2), but under LNC=1, they're exposed as two. In ``LNC=2``, each chip is exposed as 4 ranks for a total of 64 ranks per server, and each rank gets 3.2 Tbps / 64 = 50Gbps. In the case of ``LNC=1``, each chip is exposed as 8 ranks, and each rank gets 50 Gbps / 2 = 25Gbps.\n\nEach NeuronCore has dedicated components to actually realize collective operations called CC Cores. The collectives communication cores (CC cores) are dedicated synchronization processors responsible for the orchestration of collective communications. The CC cores control when and how data movement engines transfer data, ensuring each step of the collective algorithm executes in the correct order.\n\nLatency-wise, Trn2.48xl instances are backed by the AWS `10p10u <https://www.aboutamazon.com/news/aws/aws-infrastructure-generative-ai>`__ network. When measured with the `RDMA core performance test ib_write_lat <https://enterprise-support.nvidia.com/s/article/ib-write-lat>`__, a minimal packet takes 15us (latency) to go from an HBM in one server to an HBM of another.\n\n.. image:: /neuron-runtime/img/collectives/trn2-topology.png\n   :alt: Trn2 Topology\n   :align: center\n   :width: 80%  \n\nEach Trn2 server consists of 16 Trainium2 chips connected in a **2D Torus** — each chip is connected to 4 neighbors with a NeuronLink. For an :ref:`UltraServer configuration <trn2-ultraserver>`, we extend this to a **3D Torus**, with each chip adding connections on the Z dimensions to 2 neighbors with a bidirectional **NeuronLink** between each pair.\n\n.. image:: /neuron-runtime/img/collectives/trn2-ultraserver-topology.png\n   :alt: Trn2 UltraServer Topology\n   :align: center\n   :width: 80%\n\nRead more\n----------\n\nFor more details about how collectives are implemented in Neuron, see the following pages:\n\n* :doc:`Inter-node Collective Communications with AWS Neuron </neuron-runtime/explore/internode-collective-comm>`\n* :doc:`Intra-node Collective Communications with AWS Neuron </neuron-runtime/explore/intranode-collective-comm>`"
  },
  {
    "path": "neuron-runtime/about/core-dump.rst",
    "content": ".. meta::\n   :description: This topic guides you through your first time generating a Neuron runtime core dump when using the AWS Neuron SDK. \n   :date-modified: 12-02-2025\n\n.. _runtime-core-dump-quickstart:\n\nQuickstart: Generating a Neuron runtime core dump\n==================================================\n\nThis topic guides you through your first time generating a Neuron runtime core dump. It will help you understand the process when using AWS Neuron during a runtime failure and debugging the state of the device. When you have completed it, you will have a core dump.\n\n**This quickstart is for**: Advanced users\n\n**Time to complete**: 15m\n\nPrerequisites\n---------------\n\n* `Launch an EC2 instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`__\n* Use the latest :doc:`AWS Neuron Multi-Framework DLAMI </dlami/index>`\n* `Connect to the EC2 instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-linux-inst-ssh.html>`__\n* Understand the  :doc:`AWS Neuron Kernel Interface </nki/get-started/index>`\n\nStep 1: Setup the python virtual environment\n---------------------------------------------\n\nTo run this example, you must create a Python virtual environment with the Neuron Compiler::\n\n    python3 -m venv venv\n    source venv/bin/activate\n    python3 -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\n    pip install neuronx-cc==2.*\n\nStep 2: Implement a NKI kernel with an error\n---------------------------------------------\n\nTo generate a core dump, you must run a model with a runtime error. The following script implements a NKI kernel with a out-of-bounds indirect memcopy. Save it to ``oob.py``::\n\n    import neuronxcc.nki as nki\n    import neuronxcc.nki.isa as nisa\n    import neuronxcc.nki.language as nl\n    from neuronxcc.nki.typing import tensor\n    import numpy as np\n\n    @nki.jit()\n    def out_of_bounds(in_tensor):\n        output = nl.ndarray([64, 512], dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n\n        n, m = in_tensor.shape\n        ix, iy = nl.mgrid[0:n//2, 0:m]\n\n        # indices are out of range on purpose to demonstrate the core dump\n        expr_arange = 3*nl.arange(n//2)[:, None] \n        idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n        out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)\n        nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)\n\n        nl.store(output, out_tile)\n        return output\n\n    if __name__ == \"__main__\":\n        in_tensor = np.random.random_sample([128, 512]).astype(np.float32) * 100\n        output = out_of_bounds(in_tensor)\n\nStep 3: Run the NKI kernel\n---------------------------\n\nTrigger the core dump by running the script in your virtual environment: ``python3 oob.py``.\n\nThis leads to a runtime error and is accompanied with a ``nrt_infodump``::\n\n    2025-Sep-19 18:57:20.782962  4444:4444  ERROR  TDRV:exec_process_custom_notification        nd0:nc0:h_model.id1001: Received notification generated at runtime: failed to run scatter/gather (indirect memory copy via vector DGE), due to out-of-bound access. 
model name = file.neff.\n    2025-Sep-19 18:57:20.798030  4444:4444  ERROR  TDRV:exec_wait_round_robin                   [ND 0][NC 0] Out of bounds access on model file.neff\n    2025-Sep-19 18:57:20.805570  4444:4444  ERROR  NMGR:dlr_infer                               Inference completed with err: 1006. mode->h_nn=1001, lnc=0\n    2025-Sep-19 18:57:20.813269  4444:4444  ERROR   NRT:nrt_infodump                            Neuron runtime information - please include in any support request:\n    2025-Sep-19 18:57:20.821272  4444:4444  ERROR   NRT:nrt_infodump                            ------------->8------------[ cut here ]------------>8-------------\n    2025-Sep-19 18:57:20.829241  4444:4444  ERROR   NRT:nrt_infodump                            NRT version: 2.x.33931.0 (8be979e9fd075e9294c151d7cf03968058670d4c)\n    2025-Sep-19 18:57:20.837226  4444:4444  ERROR   NRT:nrt_infodump                            Embedded FW version: 1.0.22039.0 (d5fbbb7781171a2d6dd5bf6bac8f71064308bb0a) loaded from \"libnrtucode_extisa.so\"\n    2025-Sep-19 18:57:20.848129  4444:4444  ERROR   NRT:nrt_infodump                            CCOM version: 2.0.35440.0- (compat 78)\n    2025-Sep-19 18:57:20.855228  4444:4444  ERROR   NRT:nrt_infodump                            NCFW version: 1.0.18253.0 (7c9806c58d468da2cd27d24d59ceaf8fa0d25e4a)\n    2025-Sep-19 18:57:20.863255  4444:4444  ERROR   NRT:nrt_infodump                            Instance ID: i-0b514eadc4fec7de6\n    2025-Sep-19 18:57:20.870138  4444:4444  ERROR   NRT:nrt_infodump                            Cluster ID: 0\n    2025-Sep-19 18:57:20.876409  4444:4444  ERROR   NRT:nrt_infodump                            Kernel: Linux 5.10.240-218.959.amzn2int.x86_64 #1 SMP Thu Aug 7 19:38:22 UTC 2025\n    2025-Sep-19 18:57:20.886375  4444:4444  ERROR   NRT:nrt_infodump                            Nodename: 9371096ea4a1\n    2025-Sep-19 18:57:20.892956  4444:4444  ERROR   NRT:nrt_infodump                            Driver version: 2.x\n\n    2025-Sep-19 18:57:20.901533  4444:4444  ERROR   NRT:nrt_infodump                            Failure: NRT_EXEC_OOB in nrt_execute()\n    2025-Sep-19 18:57:20.908621  4444:4444  ERROR   NRT:nrt_infodump                            LNC: 0\n    2025-Sep-19 18:57:20.914681  4444:4444  ERROR   NRT:nrt_infodump                            Visible cores: 0, 1\n    2025-Sep-19 18:57:20.921135  4444:4444  ERROR   NRT:nrt_infodump                            Environment:\n    2025-Sep-19 18:57:20.927398  4444:4444  ERROR   NRT:nrt_infodump                            -------------8<-----------[ cut to here ]-----------8<------------\n    2025-Sep-19 18:57:21.484865  4444:4444  ERROR   NRT:nrt_execute_repeat                      Failed to execute model file.neff with status 1006\n\nConfirmation\n--------------\n\nThe core dump is generated under ``/tmp/neuron-core-dump/``::\n\n    $ ls /tmp/neuron-core-dump/\n    dt-20250917-194443-cid-0000000000000000\n    $ ls /tmp/neuron-core-dump/dt-20250917-194443-cid-0000000000000000/\n    i-0b514eadc4fec7de6-nd0-nc0-pid-897-tid-897-lid-0  i-0b514eadc4fec7de6-nrt-pid-897.log\n\nThe core dump creates two types of files:\n\n* Dump of the hardware state\n* Dump of the tail of Neuron runtime error logs\n\nNext Steps\n-----------\n\nNow that you've completed this quickstart, take the core dump and dive into other topics that build on it and investigate it further.\n\n* :ref:`Explore a Neuron Runtime core dump <runtime-core-dump-deep-dive>`\n\n"
  },
  {
    "path": "neuron-runtime/about/index.rst",
    "content": ".. _neuron-runtime-about:\n\n.. meta::\n   :description: Learn about the AWS NeuronX Runtime, its features, and capabilities.\n   :date-modified: 11/03/2025\n\nAbout the NeuronX Runtime\n==========================\n\nThis section provides information about the AWS Neuron Runtime, its features, and capabilities. Learn about core dumps, debugging techniques, and other important aspects of the Neuron Runtime.\n\nWhat is the NeuronX Runtime?\n--------------------------------\n\nThe NeuronX Runtime consists of a kernel driver and C/C++ libraries which provides APIs to access Inferentia and Trainium Neuron devices. The Neuron ML frameworks plugins for TensorFlow, PyTorch and Apache MXNet use the Neuron runtime to load and run models on the NeuronCores. Neuron runtime loads compiled deep learning models, also referred to as Neuron Executable File Format (NEFF) to the Neuron devices and is optimized for high-throughput and low-latency.\n\nWhat are Neuron Collectives?\n-----------------------------\n\nNeuron Collectives are distributed communication primitives that coordinate data exchange among multiple NeuronCores in distributed machine learning workloads. Each rank represents a physical or logical NeuronCore that participates in collective operations such as AllGather, AllReduce, ReduceScatter, and AllToAll.\n\nThese operations enable efficient gradient aggregation during distributed training and parameter sharing during distributed inference. Collectives operate at two levels: intra-node communication uses high-bandwidth NeuronLink interconnects between chips within a node, while inter-node communication leverages EFA (Elastic Fabric Adapter) networks to coordinate across multiple physical nodes. The runtime automatically selects optimal algorithms based on message size, cluster topology, and latency requirements.\n\nGet Started\n------------  \n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card:: Quickstart: Generate a Neuron Runtime Core Dump\n      :link: runtime-core-dump-quickstart\n      :link-type: ref\n      :class-header: sd-bg-primary sd-text-white\n\n      Learn how to generate a Neuron runtime core dump for debugging runtime failures and analyzing device state.\n\nNeuron Runtime Collectives\n---------------------------\n\n.. grid:: 1\n   :gutter: 2\n\n   .. grid-item-card:: About Neuron Runtime Collectives\n      :link: collectives\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Learn about \"Collectives\", distributed communication primitives that enable efficient data exchange between NeuronCores.\n   \n\n\n"
  },
  {
    "path": "neuron-runtime/api/debug-stream-api.rst",
    "content": ".. _nrt-debug-stream-api:\n\n========================================\nNeuron Debug Stream API Documentation\n========================================\n\nOverview\n========\n\nThe ``ndebug_stream`` APIs provide applications a way to consume debug events from the runtime. These debug events are emitted by the runtime per Logical Neuron Core and can be used by applications to get information on events that occurred on the device (such as device prints, breakpoints, etc.).\n\nDebug events are streamed through a connection interface, allowing applications to monitor and display information from Neuron Cores during execution.\n\nConnecting, Polling, and Consuming\n===================================\n\nConnection Process\n------------------\n\nApplications that want to consume debug events must follow these steps:\n\n1. **Connect** to a Logical Neuron Core's debug stream via ``nrt_debug_client_connect``\n2. **Poll** for events using Linux kernel polling APIs on the returned file descriptor\n3. **Consume** events using the ``nrt_debug_client_read_one_event`` API\n4. **Close** the connection when finished using ``nrt_debug_client_connect_close``\n\nOnce a client is connected to a core's debug stream, the runtime will push debug events emitted by the Logical Neuron Core to the stream for clients to consume.\n\nPolling for Events\n------------------\n\nThe stream file descriptor obtained from ``nrt_debug_client_connect`` is a standard Linux file descriptor and can be passed into any Linux polling API (such as ``epoll``, ``poll``, or ``select``). This allows applications to efficiently wait for debug events without busy waiting.\n\n.. important::\n   While the ``stream_fd`` is pollable, all other non-polling functionality must go through the provided ``nrt_debug_client*`` APIs. The stream contents can only be accessed using the ``nrt_debug_client_read*`` API(s).\n\nEvents\n======\n\nEvents consist of two parts:\n\n1. A header describing the payload type\n2. A payload representing the contents of the event\n\nEach event sent to the application is wrapped as a datagram. The header is a fixed-sized struct that describes the contents of the payload, including the size and how to interpret it.\n\nEvent Types\n-----------\n\nCurrently, the system supports these event types:\n\n+-------------------------------------------------+------------------------------------------+\n| Event Type                                      | Description                              |\n+=================================================+==========================================+\n| ``NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ``  | Debug tensor read events from the core   |\n+-------------------------------------------------+------------------------------------------+\n\nAPI Reference\n=============\n\nnrt_debug_client_connect\n------------------------\n\n.. code-block:: c\n\n   NRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd);\n\nEstablishes a connection to a specified Logical Neuron Core's debug stream.\n\n**Parameters:**\n\n* ``logical_nc_idx [in]`` - Core's debug stream to connect to\n* ``stream_fd [out]`` - Connection handle to reference and interact with the stream\n\n**Returns:**\n\n* ``NRT_SUCCESS`` on success\n\n.. note::\n   Only one client can connect to a Logical Neuron Core's stream at any given time. Attempts to connect to a stream with multiple clients will result in a ``NRT_INVALID`` return status.\n\nnrt_debug_client_connect_close\n------------------------------\n\n.. 
code-block:: c\n\n   void nrt_debug_client_connect_close(int stream_fd);\n\nCloses a connection created by ``nrt_debug_client_connect``.\n\n**Parameters:**\n\n* ``stream_fd [in]`` - Connection handle to close\n\nnrt_debug_client_read_one_event\n-------------------------------\n\n.. code-block:: c\n\n   NRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload);\n\nConsumes a single event from the stream.\n\n**Parameters:**\n\n* ``stream_fd [in]`` - Stream to consume an event from\n* ``header [out]`` - Consumed event's header\n* ``payload [out]`` - Consumed event's payload\n\n**Returns:**\n\n* ``NRT_SUCCESS`` on success\n* ``NRT_QUEUE_EMPTY`` if no events are available\n\n.. important::\n   It is the user's responsibility to free the payload pointer.\n\n.. note::\n   This function must be called from the same process that owns the Logical Neuron Core. Calling this function from any other process results in undefined behavior.\n
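\nExample Usage\n=============\n\nThe following minimal sketch ties these APIs together for a single Logical Neuron Core: it connects, waits for events with ``poll()``, drains the stream, frees each payload, and closes the connection. The include paths and the ``consume_debug_events`` wrapper are illustrative assumptions rather than part of the documented API.\n\n.. code-block:: c\n\n   #include <poll.h>\n   #include <stdio.h>\n   #include <stdlib.h>\n\n   /* Include paths are assumptions; adjust them to match your Neuron runtime installation. */\n   #include <nrt/nrt.h>\n   #include <nrt/ndebug_stream.h>\n\n   int consume_debug_events(void)\n   {\n       int stream_fd = -1;\n\n       /* Connect to Logical Neuron Core 0's debug stream. */\n       if (nrt_debug_client_connect(0, &stream_fd) != NRT_SUCCESS) {\n           return -1;\n       }\n\n       struct pollfd pfd = { .fd = stream_fd, .events = POLLIN };\n\n       /* Wait up to one second for events, then drain everything that is available. */\n       while (poll(&pfd, 1, 1000) > 0) {\n           ndebug_stream_event_header_t header;\n           void *payload = NULL;\n\n           while (nrt_debug_client_read_one_event(stream_fd, &header, &payload) == NRT_SUCCESS) {\n               if (header.type == NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ) {\n                   ndebug_stream_payload_debug_tensor_read_t *ev = payload;\n                   puts(ev->prefix);   /* print the device-side prefix string */\n               }\n               free(payload);          /* the caller owns the payload buffer */\n               payload = NULL;\n           }\n       }\n\n       nrt_debug_client_connect_close(stream_fd);\n       return 0;\n   }\n\nBecause events are only buffered after the connection is established and are dropped once the stream's buffer fills (see the notes below), connect before the workload starts emitting events and drain the stream regularly.\n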
\nData Structures\n===============\n\nndebug_stream_event_type\n------------------------\n\n.. code-block:: c\n\n   typedef enum ndebug_stream_event_type {\n       NDEBUG_STREAM_EVENT_TYPE_INVALID = 0,\n       NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ = 1,\n   } ndebug_stream_event_type_t;\n\nEnumeration of the different types of debug events that can be emitted.\n\nndebug_stream_event_header\n--------------------------\n\n.. code-block:: c\n\n   typedef struct ndebug_stream_event_header {\n       uint64_t data_size;\n       uint32_t type;\n       char reserved[52];\n   } ndebug_stream_event_header_t;\n\nHeader structure for debug stream events.\n\n**Fields:**\n\n* ``data_size`` - Size of the payload data in bytes\n* ``type`` - Type of event (see ``ndebug_stream_event_type_t``)\n* ``reserved`` - Reserved bytes for future use\n\nndebug_stream_payload_debug_tensor_read\n---------------------------------------\n\n.. code-block:: c\n\n   typedef struct ndebug_stream_payload_debug_tensor_read {\n       char prefix[512];\n       uint32_t logical_nc_id;\n       uint32_t pipe;\n       char tensor_dtype[16];\n       uint64_t tensor_shape[8];\n       uint64_t tensor_data_size;\n       char reserved0[416];\n       char tensor_data[];\n   } ndebug_stream_payload_debug_tensor_read_t;\n\nPayload structure for debug tensor read events.\n\n**Fields:**\n\n* ``prefix`` - The prefix string to print\n* ``logical_nc_id`` - The logical core the print event originated from\n* ``pipe`` - The pipe to write the printed string to\n* ``tensor_dtype`` - Tensor data type\n* ``tensor_shape`` - Tensor shape dimensions (up to 8 dimensions)\n* ``tensor_data_size`` - Size in bytes of the tensor content\n* ``reserved0`` - Reserved bytes for future use\n* ``tensor_data`` - The contents of the tensor to display (flexible array member)\n\nNotes and Important Considerations\n==================================\n\n1. These APIs do not allow for interprocess communication. Debug events are only pushed to the process that owns the Logical Neuron Core.\n\n2. These APIs do not provide thread safety for multiple threads accessing the SAME stream (thread safety for different streams is guaranteed).\n\n3. There can only be one outstanding connection per stream. Any attempts to initialize multiple connections will result in an error.\n\n4. Events are only emitted AFTER a client connects to a Logical Neuron Core's stream. Any event that would have been emitted before connecting to the stream is dropped.\n\n5. Events will be dropped if the number of unconsumed events in a stream exceeds the stream's buffer size. Clients must consume events fast enough to prevent dropped events.\n\n6. Clients can configure the stream's buffer size via the ``NEURON_RT_DEBUG_STREAM_BUFFER_SIZE`` environment variable. The buffer size currently defaults to 64K debug events.\n\n7. The payload buffer returned by ``nrt_debug_client_read_one_event`` must be freed by the caller.\n"
  },
  {
    "path": "neuron-runtime/api/index.rst",
    "content": ".. _nrt_api_reference:\n\nNeuron Runtime API Reference\n=============================\n\nThis section provides comprehensive API reference documentation for the Neuron Runtime (NRT) and Neuron Driver Library (NDL). These APIs enable low-level access to AWS Neuron devices and provide interfaces for model loading, execution, memory management, and collective operations.\n\n**Source code for these APIs can be found at**: https://github.com/aws-neuron/aws-neuron-sdk.\n\nCore Runtime APIs\n-----------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NRT API </neuron-runtime/api/nrt>`\n     - Main Neuron Runtime API for model loading, execution, and tensor management\n   * - :doc:`NRT Status </neuron-runtime/api/nrt_status>`\n     - Status codes and error handling for runtime operations\n   * - :doc:`NRT Version </neuron-runtime/api/nrt_version>`\n     - Version information and compatibility checking\n\nAsynchronous Execution APIs\n----------------------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NRT Async </neuron-runtime/api/nrt_async>`\n     - Asynchronous execution API for non-blocking operations\n   * - :doc:`NRT Async Send/Recv </neuron-runtime/api/nrt_async_sendrecv>`\n     - Asynchronous tensor send and receive operations\n\nProfiling and Debugging APIs\n-----------------------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NRT Profile </neuron-runtime/api/nrt_profile>`\n     - Profiling API for performance analysis and optimization\n   * - :doc:`NRT System Trace </neuron-runtime/api/nrt_sys_trace>`\n     - System trace capture and event fetching\n   * - :doc:`Debug Stream </neuron-runtime/api/ndebug_stream>`\n     - Debug event streaming from Logical Neuron Cores\n\nCollective Operations API\n--------------------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NEC API </neuron-runtime/api/nec>`\n     - Neuron Elastic Collectives (NEC) for distributed operations\n\nNeuron Driver Library (NDL) APIs\n---------------------------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NDL API </neuron-runtime/api/ndl>`\n     - Low-level Neuron Driver Library for device access and control\n   * - :doc:`Neuron Driver Shared </neuron-runtime/api/neuron_driver_shared>`\n     - Shared definitions between runtime and driver\n   * - :doc:`Tensor Batch Operations </neuron-runtime/api/neuron_driver_shared_tensor_batch_op>`\n     - Batch operation structures for tensor transfers\n\nNeuron Datastore API\n--------------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Neuron Datastore </neuron-runtime/api/neuron_ds>`\n     - Neuron Datastore (NDS) for sharing metrics and model information\n\nExperimental APIs\n-----------------\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`NRT Experimental </neuron-runtime/api/nrt_experimental>`\n     - Experimental features and APIs (subject to change)\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Core Runtime APIs\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Debugging\n\n   Debug Stream APIs </neuron-runtime/api/debug-stream-api>\n\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Asynchronous Execution APIs\n\n   NRT Async <nrt_async>\n   NRT Async Send/Recv <nrt_async_sendrecv>\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Profiling and Debugging APIs\n\n   NRT Profile <nrt_profile>\n   NRT System Trace <nrt_sys_trace>\n   Debug Stream <ndebug_stream>\n\n.. 
toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Collective Operations API\n\n   NEC API <nec>\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Neuron Driver Library APIs\n\n   NDL API <ndl>\n   Neuron Driver Shared <neuron_driver_shared>\n   Tensor Batch Operations <neuron_driver_shared_tensor_batch_op>\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Neuron Datastore API\n\n   Neuron Datastore <neuron_ds>\n\n.. toctree::\n   :maxdepth: 5\n   :hidden:\n   :caption: Experimental APIs\n\n   NRT Experimental <nrt_experimental>\n"
  },
  {
    "path": "neuron-runtime/api/ndebug_stream.rst",
    "content": ".. _api_ndebug_stream_h:\n\nndebug_stream.h\n===============\n\nNeuron Debug Stream API - Consume debug events from the runtime per Logical Neuron Core.\n\n**Source**: `src/libnrt/include/nrt/ndebug_stream.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h>`_\n\nOverview\n--------\n\nThe ``ndebug_stream`` APIs provide applications a way to consume debug events from the runtime. These debug events are emitted by the runtime per Logical Neuron Core and can be used by applications to get information on events that occurred on the device (ie prints, breakpoints, etc.).\n\n**Connecting, polling, and consuming:** Applications that want to consume debug events will first need to connect to a Logical Neuron Core's debug stream via a call to ``nrt_debug_client_connect``. Once a client is connected to a core's debug stream, the runtime will push debug events emitted by the Logical Neuron Core to the stream for clients to consume.\n\n**Closing a Connection:** Once a connection is not needed anymore, clients can close the connection using the ``nrt_debug_client_connect_close`` API.\n\n**Events:** Events consist of a header describing the payload type, and a payload representing the contents of the event. Events can be consumed by clients via the ``nrt_debug_client_read*`` API(s).\n\nEnumerations\n------------\n\nndebug_stream_event_type_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum ndebug_stream_event_type {\n       NDEBUG_STREAM_EVENT_TYPE_INVALID = 0,\n       NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ = 1,\n   } ndebug_stream_event_type_t;\n\nDebug stream event types.\n\n**Source**: `ndebug_stream.h:51 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L51>`_\n\nStructures\n----------\n\nndebug_stream_event_header_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndebug_stream_event_header {\n       uint64_t data_size;\n       uint32_t type;\n       char reserved[52];\n   } ndebug_stream_event_header_t;\n\nDebug stream event header.\n\n**Source**: `ndebug_stream.h:56 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L56>`_\n\nndebug_stream_payload_debug_tensor_read_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndebug_stream_payload_debug_tensor_read {\n       char prefix[512];\n       uint32_t logical_nc_id;\n       uint32_t pipe;\n       char tensor_dtype[16];\n       uint64_t tensor_shape[8];\n       uint64_t tensor_data_size;\n       char reserved0[416];\n       char tensor_data[];\n   } ndebug_stream_payload_debug_tensor_read_t;\n\nPayload for debug tensor read events.\n\n**Source**: `ndebug_stream.h:62 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L62>`_\n\nFunctions\n---------\n\nnrt_debug_client_connect\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd);\n\nEstablish a connection to a specified Logical Neuron Core's debug stream.\n\n**Parameters:**\n\n* ``logical_nc_idx`` [in] - Core's debug stream to connect to.\n* ``stream_fd`` [out] - Connection handle to reference and interact with the stream.\n\n**Returns:** NRT_SUCCESS on success.\n\n**Note:** Only one client can connect to a Logical Neuron Core's stream at any given time. 
Attempts to connect to a stream with multiple clients will result in a NRT_INVALID return status.\n\n**Source**: `ndebug_stream.h:82 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L82>`_\n\nnrt_debug_client_connect_close\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   void nrt_debug_client_connect_close(int stream_fd);\n\nCloses connection created by ``nrt_debug_client_connect``.\n\n**Parameters:**\n\n* ``stream_fd`` [in] - Connection handle to close.\n\n**Source**: `ndebug_stream.h:88 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L88>`_\n\nnrt_debug_client_read_one_event\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload);\n\nConsumes a single event from the stream.\n\n**Parameters:**\n\n* ``stream_fd`` [in] - Stream to consume an event from\n* ``header`` [out] - Consumed event's header.\n* ``payload`` [out] - Consumed event's payload. **IMPORTANT**: it is the user's responsibility to free this payload pointer.\n\n**Returns:** NRT_SUCCESS on success.\n\n**Note:** This function must be called from the same process that owns the Logical Neuron Core. Calling this function from any other process results in undefined behavior.\n\n**Source**: `ndebug_stream.h:102 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/ndebug_stream.h#L102>`_\n"
  },
  {
    "path": "neuron-runtime/api/ndl.rst",
    "content": ".. _api_ndl_h:\n\nndl.h\n=====\n\nNeuron Driver Library (NDL) API - Low-level interface to Neuron devices.\n\n**Source**: `src/libnrt/include/ndl/ndl.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h>`_\n\nEnumerations\n------------\n\nNQ_DEV_TYPE\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum NQ_DEV_TYPE {\n       NQ_DEV_TYPE_NEURON_CORE = 0,\n       NQ_DEV_TYPE_TOPSP,\n       NQ_DEV_TYPE_MAX,\n   } ndl_nq_dev_t;\n\nDevice type enumeration for notification queues.\n\n**Source**: `ndl.h:18 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L18>`_\n\nConstants\n---------\n\nNEURON_MAX_DEVICES\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEURON_MAX_DEVICES MAX_NEURON_DEVICE_COUNT\n\nMaximum neuron devices supported on a system.\n\n**Source**: `ndl.h:24 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L24>`_\n\nNEURON_DEVICE_PREFIX\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEURON_DEVICE_PREFIX \"/dev/neuron\"\n\nDevice file prefix for Neuron devices.\n\n**Source**: `ndl.h:25 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L25>`_\n\nMAX_HBM_PER_DEVICE\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define MAX_HBM_PER_DEVICE 4\n\nMaximum HBM (High Bandwidth Memory) regions per device.\n\n**Source**: `ndl.h:28 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L28>`_\n\nMAX_NEURON_DEVICE_COUNT\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define MAX_NEURON_DEVICE_COUNT 64\n\nMaximum neuron devices supported on a system.\n\n**Source**: `ndl.h:78 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L78>`_\n\nMAX_NC_PER_DEVICE\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define MAX_NC_PER_DEVICE 8\n\nMaximum neuron cores per device.\n\n**Source**: `ndl.h:81 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L81>`_\n\nStructures\n----------\n\nndl_version_info_t\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndl_version_info {\n       uint16_t driver_major_version;\n       uint16_t driver_minor_version;\n       char driver_full_version[DRIVER_VERSION_MAX_SIZE];\n       uint16_t library_major_version;\n       uint16_t library_minor_version;\n   } ndl_version_info_t;\n\nVersion information for driver and library.\n\n**Source**: `ndl.h:31 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L31>`_\n\nndl_device_init_param_t\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndl_device_init_param {\n       bool initialize_device;\n       int num_dram_regions;\n       bool map_hbm;\n   } ndl_device_init_param_t;\n\nDevice initialization parameters.\n\n**Source**: `ndl.h:59 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L59>`_\n\nndl_device_t\n^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef struct ndl_device {\n       uint8_t device_index;\n       uint8_t device_type;\n       uint16_t device_revision;\n       uint8_t connected_device_count;\n       uint8_t connected_devices[MAX_NEURON_DEVICE_COUNT];\n       uint64_t csr_base[2];\n       uint64_t csr_size[2];\n       ndl_copy_buf_t cpy_bufs[MAX_NC_PER_DEVICE];\n       void *hbm_va[MAX_HBM_PER_DEVICE];\n       size_t hbm_size;\n       uint32_t hbm_va_cnt;\n       uint32_t shift_hbm_size;\n       uint64_t hbm_offset[MAX_HBM_PER_DEVICE];\n       uint8_t context[];\n   } ndl_device_t;\n\nDevice structure containing device information and resources.\n\n**Source**: `ndl.h:83 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L83>`_\n\nndl_mem_info_t\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndl_mem_info {\n       ndl_device_t *device;\n       __u64 driver_handle;\n       uint64_t pa;\n       uint64_t mmap_offset;\n       uint64_t size;\n       uint32_t align;\n       void *mmap_va;\n       uint32_t host_memory;\n       int nc_id;\n   } ndl_mem_info_t;\n\nMemory allocation information.\n\n**Source**: `ndl.h:107 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L107>`_\n\nndl_notification_context_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct ndl_notification_context {\n       union {\n           uint8_t nc_id;\n           uint8_t nq_dev_id;\n       };\n       ndl_nq_dev_t nq_dev_type;\n       uint8_t nq_type;\n       uint8_t engine_index;\n       uint32_t size;\n       int fd;\n       uint64_t offset;\n       uint64_t mem_handle;\n       void *va;\n       ndl_mem_info_t *mem_info;\n   } ndl_notification_context_t;\n\nNotification queue context.\n\n**Source**: `ndl.h:119 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L119>`_\n\nFunctions\n---------\n\nndl_get_version\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_get_version(ndl_version_info_t *version);\n\nGet version info.\n\n**Parameters:**\n\n* ``version`` [out] - Buffer to store the version information.\n\n**Returns:** 0 on success, -1 on failed to read driver version.\n\n**Source**: `ndl.h:45 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L45>`_\n\nndl_open_device\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_open_device(int device_index, ndl_device_init_param_t *params, ndl_device_t **device);\n\nCalled by app the first time when it accesses the device.\n\n**Parameters:**\n\n* ``device_index`` [in] - device index that is to be opened\n* ``params`` [in] - device initialization parameters\n* ``device`` [out] - device specific information\n\n**Returns:** 0 on success, -1 on failure\n\n**Source**: `ndl.h:141 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L141>`_\n\nndl_close_device\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_close_device(ndl_device_t *device);\n\nCalled by app when it is done. After this, device cannot be accessed.\n\n**Parameters:**\n\n* ``device`` [in] - Device to close.\n\n**Returns:** 0 on success, -1 on failure\n\n**Source**: `ndl.h:150 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L150>`_\n\nndl_available_devices\n^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   int ndl_available_devices(int *device_indexes, int device_indexes_size);\n\nGet all the device indexes.\n\n**Parameters:**\n\n* ``device_indexes`` [out] - Buffer to store device indexes.\n* ``device_indexes_size`` [in] - Size of the buffer in dwords.\n\n**Returns:** Number of devices found.\n\n**Source**: `ndl.h:159 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L159>`_\n
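\nExample: device discovery and open\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe functions above compose into a simple discovery-and-open sequence. The sketch below is illustrative only: the include path, the ``open_first_device`` wrapper, and the ``ndl_device_init_param_t`` values are assumptions and should be adapted to your application.\n\n.. code-block:: c\n\n   #include <stdbool.h>\n   #include <stdio.h>\n\n   /* The include path is an assumption; adjust it to your installation. */\n   #include <ndl/ndl.h>\n\n   int open_first_device(ndl_device_t **device_out)\n   {\n       ndl_version_info_t version;\n       if (ndl_get_version(&version) != 0) {\n           return -1;                         /* driver version could not be read */\n       }\n       puts(version.driver_full_version);     /* log the installed driver version */\n\n       /* Enumerate the Neuron devices visible on this instance. */\n       int device_indexes[NEURON_MAX_DEVICES];\n       int count = ndl_available_devices(device_indexes, NEURON_MAX_DEVICES);\n       if (count <= 0) {\n           return -1;\n       }\n\n       /* Open the first device; the init parameters below are placeholder values. */\n       ndl_device_init_param_t params = {\n           .initialize_device = true,\n           .num_dram_regions = 1,\n           .map_hbm = false,\n       };\n       return ndl_open_device(device_indexes[0], &params, device_out);\n   }\n\nWhen the application is finished with the device, it should release it with ``ndl_close_device``.\n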
\nndl_memory_alloc\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_memory_alloc(ndl_device_t *device, size_t size, uint64_t align, uint32_t host_memory, \n                        uint32_t dram_channel, uint32_t dram_region, uint32_t nc_id, \n                        uint32_t mem_alloc_type, uint64_t *mem_handle);\n\nAllocates memory.\n\n**Parameters:**\n\n* ``device`` [in] - Device to be associated with the allocation.\n* ``size`` [in] - Number of bytes to allocate.\n* ``host_memory`` [in] - If true allocate from host memory instead of using device memory.\n* ``dram_channel`` [in] - DRAM channel to use in the device memory.\n* ``dram_region`` [in] - DRAM region to use in the device memory.\n* ``nc_id`` [in] - NC ID to use in the device\n* ``mem_alloc_type`` [in] - Type of memory allocation\n* ``mem_handle`` [out] - Allocated memory handle would be stored here.\n\n**Returns:** 0 on success.\n\n**Source**: `ndl.h:227 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L227>`_\n\nndl_memory_map\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_memory_map(uint64_t mem_handle, void **va);\n\nMap given memory handle into virtual address space.\n\n**Parameters:**\n\n* ``mem_handle`` [in] - Handle to map.\n* ``va`` [out] - Resulting virtual address.\n\n**Returns:** 0 on success\n\n**Source**: `ndl.h:240 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L240>`_\n\nndl_memory_free\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_memory_free(uint64_t mem_handle);\n\nFrees already allocated memory.\n\n**Parameters:**\n\n* ``mem_handle`` [in] - Memory handle to be freed.\n\n**Returns:** 0 on success.\n\n**Source**: `ndl.h:255 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L255>`_\n\nndl_notification_init\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_notification_init(ndl_device_t *device, int nq_dev_id, ndl_nq_dev_t nq_dev_type, \n                             uint8_t nq_type, uint8_t engine_index, uint32_t size, \n                             bool on_host_memory, uint32_t dram_channel, uint32_t dram_region,\n                             uint64_t *notification_context);\n\nConfigure notification queue.\n\n**Parameters:**\n\n* ``device`` [in] - Device\n* ``nq_dev_id`` [in] - Notification device index\n* ``nq_dev_type`` [in] - Notification device type\n* ``nq_type`` [in] - Notification queue type\n* ``engine_index`` [in] - Engine index\n* ``size`` [in] - Size in bytes\n* ``on_host_memory`` [in] - If true, NQ is created on host memory\n* ``dram_channel`` [in] - If NQ is created on device, DRAM channel to use\n* ``dram_region`` [in] - If NQ is created on device, DRAM region to use\n* ``notification_context`` [out] - Resulting NQ context.\n\n**Returns:** 0 on success.\n\n**Source**: `ndl.h:625 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L625>`_\n\nndl_reset_ncs\n^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int ndl_reset_ncs(int device_index, int nc_map, uint32_t *request_id);\n\nReset given NCs within a device.\n\n**Parameters:**\n\n* ``device_index`` [in] - Device to reset.\n* ``nc_map`` [in] - NCs to reset (-1 to reset entire device)\n* ``request_id`` [out] - ID for this reset request\n\n**Returns:** 0 on success.\n\n**Source**: `ndl.h:476 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/ndl.h#L476>`_\n"
  },
  {
    "path": "neuron-runtime/api/nec.rst",
    "content": ".. _api_nec_h:\n\nnec.h\n=====\n\nNeuron Elastic Collectives (NEC) API - Collective operations for distributed computing on Neuron devices.\n\n**Source**: `src/libnrt/include/nrt/nec.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h>`_\n\nOverview\n--------\n\nThis is the main component for Neuron Elastic Collectives in Neuron Runtime (NRT). This provides collective operations to applications offloaded by the device including collective comm init, receiving (post) operations, building resources for the operation, triggering the operation and polling its completion.\n\nConstants\n---------\n\nNEC_MAX_CHANNELS\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEC_MAX_CHANNELS 32\n\nMaximum channels (matches MAXCHANNELS in NCCL).\n\n**Source**: `nec.h:18 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L18>`_\n\nNEC_MAX_COMM_N\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEC_MAX_COMM_N 12\n\nMax supported replica-groups in NEFF.\n\n**Source**: `nec.h:26 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L26>`_\n\nNEC_MAX_STREAM_N\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEC_MAX_STREAM_N 4\n\nThe maximum number of concurrent cc execution.\n\n**Source**: `nec.h:56 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L56>`_\n\nEnumerations\n------------\n\nnec_pod_type_t\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum nec_pod_type {\n       NEC_POD_TYPE_NONE,\n       NEC_POD_TYPE_P2P,\n       NEC_POD_TYPE_SWITCH,\n       NEC_POD_TYPE_INVALID\n   } nec_pod_type_t;\n\nPod type enumeration (translated from what KaenaDriver returns).\n\n**Source**: `nec.h:103 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L103>`_\n\nenc_pattern_t\n^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum enc_pattern {\n       ENC_PATTERN_RING,\n       ENC_PATTERN_MESH,\n       ENC_PATTERN_INVALID,\n   } enc_pattern_t;\n\nCommunication pattern types.\n\n**Source**: `nec.h:244 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L244>`_\n\nStructures\n----------\n\nnccl_comm_info_t\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nccl_comm_info {\n       uint64_t cluster_id;\n       time_t epoch;\n       int neuron_dev;\n       int rank;\n       int rank_n;\n       int local_rank_n;\n       int local_rack_rank_n;\n       int node;\n       int node_n;\n       bool enable_pod;\n       bool use_net;\n       int pod;\n       int pod_n;\n       int pod_node;\n       int pod_node_n;\n       struct enc_peer_info *peers;\n       int channel_n;\n       struct enc_ring rings[NEC_MAX_CHANNELS];\n       int kangaring_channel_n;\n       int* kangaring_paths[NEC_MAX_CHANNELS];\n       int mla_cycle_n;\n       int* mla_cycles[NEC_MAX_CHANNELS];\n   } nccl_comm_info_t;\n\nComm info to query from NCCL.\n\n**Source**: `nec.h:732 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L732>`_\n\nenc_neuron_device_info_t\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef struct enc_neuron_device_info {\n       int nec_dev_id;\n       int mla_idx;\n       int tpb_idx;\n       int host_device_id;\n       int routing_id;\n       uint64_t pod_id;\n       nec_pod_type_t pod_type;\n       uint32_t pod_node_id;\n       uint32_t virtual_server_id;\n       enc_proxy_histogram_config_t histogram_config;\n   } enc_neuron_device_info_t;\n\nNeuron Device information. This data structure is used to send the device information from KaenaRuntime to KaenaNCCL for nccl communicator building.\n\n**Source**: `nec.h:787 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L787>`_\n\nnec_version_info_t\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nec_version_info {\n       uint64_t major;\n       uint64_t minor;\n       uint64_t patch;\n       uint64_t maintenance;\n       char git_hash[16];\n       uint64_t compatibility_version;\n       uint8_t future_fields[];\n   } nec_version_info_t;\n\nNEC version information.\n\n**Source**: `nec.h:920 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L920>`_\n\nFunctions\n---------\n\nnec_get_device_count\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nec_get_device_count(int *available_devices_array, uint32_t array_size);\n\nQuery device information - get device count.\n\n**Parameters:**\n\n* ``available_devices_array`` [out] - Array to store available device IDs\n* ``array_size`` [in] - Size of the array\n\n**Returns:** Number of available devices\n\n**Source**: `nec.h:917 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L917>`_\n\nnec_get_virtual_core_size\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nec_get_virtual_core_size(uint32_t *virtual_core_size);\n\nQuery vcore size.\n\n**Parameters:**\n\n* ``virtual_core_size`` [out] - Virtual core size\n\n**Returns:** NRT_STATUS_SUCCESS on success\n\n**Source**: `nec.h:923 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L923>`_\n\nnec_get_version_info\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nec_get_version_info(nec_version_info_t *version_info);\n\nGet NEC version information.\n\n**Parameters:**\n\n* ``version_info`` [out] - Version information structure\n\n**Returns:** NRT_STATUS_SUCCESS on success\n\n**Source**: `nec.h:932 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nec.h#L932>`_\n"
  },
  {
    "path": "neuron-runtime/api/neuron_driver_shared.rst",
    "content": ".. _api_neuron_driver_shared_h:\n\nneuron_driver_shared.h\n======================\n\nShared definitions between Neuron driver and runtime.\n\n**Source**: `src/libnrt/include/ndl/neuron_driver_shared.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h>`_\n\nEnumerations\n------------\n\nneuron_driver_feature_flag\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   enum neuron_driver_feature_flag {\n       NEURON_DRIVER_FEATURE_DMABUF = 1ull << 0,\n       NEURON_DRIVER_FEATURE_ASYNC_DMA = 1ull << 1,\n       NEURON_DRIVER_FEATURE_BATCH_DMAQ_INIT = 1ull << 2,\n       NEURON_DRIVER_FEATURE_BIG_CORE_MAPS = 1ull << 3,\n       NEURON_DRIVER_FEATURE_MEM_ALLOC_TYPE = 1ull << 4,\n       NEURON_DRIVER_FEATURE_HBM_SCRUB = 1ull << 5,\n       NEURON_DRIVER_FEATURE_MEM_ALLOC64 = 1ull << 6,\n       NEURON_DRIVER_FEATURE_CONTIGUOUS_SCRATCHPAD = 1ull << 7,\n       NEURON_DRIVER_FEATURE_ZEROCOPY = 1ull << 8,\n   };\n\nFeature flags for driver capabilities.\n\n**Source**: `neuron_driver_shared.h:11 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L11>`_\n\nneuron_pod_ctrl_req\n^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   enum neuron_pod_ctrl_req {\n       NEURON_NPE_POD_CTRL_REQ_POD = 0,\n       NEURON_NPE_POD_CTRL_REQ_SINGLE_NODE = 1,\n       NEURON_NPE_POD_CTRL_REQ_KILL = 2,\n       NEURON_NPE_POD_CTRL_SET_MODE = 3,\n   };\n\nPod control request types.\n\n**Source**: `neuron_driver_shared.h:40 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L40>`_\n\nneuron_ultraserver_mode\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   enum neuron_ultraserver_mode {\n       NEURON_ULTRASERVER_MODE_UNSET = 0,\n       NEURON_ULTRASERVER_MODE_X4 = 1,\n       NEURON_ULTRASERVER_MODE_X2H = 2,\n       NEURON_ULTRASERVER_MODE_X2V = 3,\n       NEURON_ULTRASERVER_MODE_X1 = 4,\n   };\n\nUltraserver configuration modes.\n\n**Source**: `neuron_driver_shared.h:47 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L47>`_\n\nneuron_dma_queue_type\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   enum neuron_dma_queue_type {\n       NEURON_DMA_QUEUE_TYPE_TX = 0,\n       NEURON_DMA_QUEUE_TYPE_RX,\n       NEURON_DMA_QUEUE_TYPE_COMPLETION,\n   };\n\nDMA queue types.\n\n**Source**: `neuron_driver_shared.h:63 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L63>`_\n\nNQ_DEVICE_TYPE\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   enum NQ_DEVICE_TYPE {\n       NQ_DEVICE_TYPE_NEURON_CORE = 0,\n       NQ_DEVICE_TYPE_TOPSP,\n       NQ_DEVICE_TYPE_MAX\n   };\n\nNotification queue device types.\n\n**Source**: `neuron_driver_shared.h:115 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L115>`_\n\nNQ_TYPE\n^^^^^^^\n\n.. code-block:: c\n\n   enum NQ_TYPE {\n       NQ_TYPE_TRACE = 0,\n       NQ_TYPE_NOTIFY,\n       NQ_TYPE_EVENT,\n       NQ_TYPE_ERROR,\n       NQ_TYPE_TRACE_DMA,\n       NQ_TYPE_THROTTLE,\n       NQ_TYPE_MAX\n   };\n\nNotification queue types.\n\n**Source**: `neuron_driver_shared.h:123 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L123>`_\n\nmem_alloc_category_t\n^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef enum {\n       NEURON_MEMALLOC_TYPE_UNKNOWN_HOST,\n       NEURON_MEMALLOC_TYPE_CODE_HOST,\n       NEURON_MEMALLOC_TYPE_TENSORS_HOST,\n       NEURON_MEMALLOC_TYPE_CONSTANTS_HOST,\n       NEURON_MEMALLOC_TYPE_MISC_HOST,\n       NEURON_MEMALLOC_TYPE_NCDEV_HOST,\n       NEURON_MEMALLOC_TYPE_NOTIFICATION_HOST,\n       NEURON_MEMALLOC_TYPE_UNKNOWN_DEVICE,\n       NEURON_MEMALLOC_TYPE_CODE_DEVICE,\n       NEURON_MEMALLOC_TYPE_TENSORS_DEVICE,\n       NEURON_MEMALLOC_TYPE_CONSTANTS_DEVICE,\n       NEURON_MEMALLOC_TYPE_SCRATCHPAD_DEVICE,\n       NEURON_MEMALLOC_TYPE_MISC_DEVICE,\n       NEURON_MEMALLOC_TYPE_NCDEV_DEVICE,\n       NEURON_MEMALLOC_TYPE_COLLECTIVES_DEVICE,\n       NEURON_MEMALLOC_TYPE_SCRATCHPAD_NONSHARED_DEVICE,\n       NEURON_MEMALLOC_TYPE_NOTIFICATION_DEVICE,\n       NEURON_MEMALLOC_TYPE_DMA_RINGS_HOST,\n       NEURON_MEMALLOC_TYPE_DMA_RINGS_DEVICE,\n       NEURON_MEMALLOC_TYPE_CONTIGUOUS_SCRATCHPAD_DEVICE,\n       NEURON_MEMALLOC_TYPE_MAX\n   } mem_alloc_category_t;\n\nMemory allocation categories for sysfs counters.\n\n**Source**: `neuron_driver_shared.h:234 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L234>`_\n\nStructures\n----------\n\nneuron_dma_eng_state\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   struct neuron_dma_eng_state {\n       __u32 revision_id;\n       __u32 max_queues;\n       __u32 num_queues;\n       __u32 tx_state;\n       __u32 rx_state;\n   };\n\nDMA engine state information.\n\n**Source**: `neuron_driver_shared.h:76 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L76>`_\n\nneuron_dma_queue_state\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   struct neuron_dma_queue_state {\n       __u32 hw_status;\n       __u32 sw_status;\n       __u64 base_addr;\n       __u32 length;\n       __u32 head_pointer;\n       __u32 tail_pointer;\n       __u64 completion_base_addr;\n       __u32 completion_head;\n   };\n\nDMA queue state information.\n\n**Source**: `neuron_driver_shared.h:84 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L84>`_\n\nneuron_uuid\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   struct neuron_uuid {\n       __u8 value[32];\n   };\n\nUUID structure for model identification.\n\n**Source**: `neuron_driver_shared.h:163 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L163>`_\n\nneuron_app_info\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   struct neuron_app_info {\n       __s32 pid;\n       __u8 nc_lock_map;\n       struct neuron_uuid uuid_data[APP_INFO_MAX_MODELS_PER_DEVICE];\n       size_t host_mem_size;\n       size_t device_mem_size;\n   };\n\nApplication information including PID, locked neuron cores, and memory usage.\n\n**Source**: `neuron_driver_shared.h:175 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L175>`_\n\nneuron_memcpy_batch_t\n^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef struct neuron_memcpy_batch {\n       __u64 mem_handle;\n       __u64 mem_handle_offset;\n       const nrt_tensor_batch_op_t *ops_ptr;\n       __u32 num_ops;\n       __u16 bar4_wr_threshold;\n       __u16 flags;\n       void *context;\n   } neuron_memcpy_batch_t;\n\nA batch of copy operations for efficient data transfer.\n\n**Source**: `neuron_driver_shared.h:220 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L220>`_\n\nnds_header_t\n^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nds_header {\n       char signature[4];\n       int version;\n   } nds_header_t;\n\nNeuron Datastore header structure.\n\n**Source**: `neuron_driver_shared.h:330 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L330>`_\n\nConstants\n---------\n\nNEURON_DMA_H2T_DEFAULT_QID\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEURON_DMA_H2T_DEFAULT_QID (-1)\n\nH2T DMA Default Queue id.\n\n**Source**: `neuron_driver_shared.h:108 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L108>`_\n\nNEURON_MAX_PROCESS_PER_DEVICE\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NEURON_MAX_PROCESS_PER_DEVICE 16\n\nMaximum processes per device.\n\n**Source**: `neuron_driver_shared.h:167 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L167>`_\n\nNDS_MAX_NEURONCORE_COUNT\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NDS_MAX_NEURONCORE_COUNT (4)\n\nMaximum neuron core count for NDS.\n\n**Source**: `neuron_driver_shared.h:323 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared.h#L323>`_\n"
  },
  {
    "path": "neuron-runtime/api/neuron_driver_shared_tensor_batch_op.rst",
    "content": ".. _api_neuron_driver_shared_tensor_batch_op_h:\n\nneuron_driver_shared_tensor_batch_op.h\n=======================================\n\nShared tensor batch operation structures between runtime and driver.\n\n**Source**: `src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h>`_\n\nTypedefs\n--------\n\nnrt_tensor_batch_offset_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef uint64_t nrt_tensor_batch_offset_t;\n\nType for tensor batch operation offset.\n\n**Source**: `neuron_driver_shared_tensor_batch_op.h:13 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h#L13>`_\n\nnrt_tensor_batch_size_t\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef uint64_t nrt_tensor_batch_size_t;\n\nType for tensor batch operation size.\n\n**Source**: `neuron_driver_shared_tensor_batch_op.h:14 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h#L14>`_\n\nStructures\n----------\n\nnrt_tensor_batch_op_t\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_batch_op {\n       nrt_tensor_batch_offset_t offset;\n       nrt_tensor_batch_size_t size;\n       void *buffer;\n   } nrt_tensor_batch_op_t;\n\nTensor batch operation structure containing offset, size, and buffer pointer.\n\n**Source**: `neuron_driver_shared_tensor_batch_op.h:17 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h#L17>`_\n"
  },
  {
    "path": "neuron-runtime/api/neuron_ds.rst",
    "content": ".. _api_neuron_ds_h:\n\nneuron_ds.h\n===========\n\nNeuron Datastore (NDS) API - Shared memory datastore for runtime metrics and model information.\n\n**Source**: `src/libnrt/include/nrt/nds/neuron_ds.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h>`_\n\nConstants\n---------\n\nOBJECT_TYPE_MODEL_NODE_INFO\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define OBJECT_TYPE_MODEL_NODE_INFO (0)\n\nNDS object type for model node information.\n\n**Source**: `neuron_ds.h:19 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L19>`_\n\nOBJECT_TYPE_PROCESS_INFO\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define OBJECT_TYPE_PROCESS_INFO (1)\n\nNDS object type for process information.\n\n**Source**: `neuron_ds.h:20 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L20>`_\n\nMODEL_MEM_USAGE_LOCATION_COUNT\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define MODEL_MEM_USAGE_LOCATION_COUNT 2\n\nNumber of memory usage locations tracked.\n\n**Source**: `neuron_ds.h:24 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L24>`_\n\nEnumerations\n------------\n\nfeature_bitmap_bit_index_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum feature_bitmap_bit_index {\n       BIT_INDEX_TEST_FEATURE = 0,\n       BIT_INDEX_MULTICORE_FEATURE = 1,\n       BIT_INDEX_COUNT = BIT_INDEX_MULTICORE_FEATURE + 1\n   } feature_bitmap_bit_index_t;\n\nFeature bitmap's bit index information.\n\n**Source**: `neuron_ds.h:88 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L88>`_\n\nStructures\n----------\n\nnds_mem_usage_info_t\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nds_mem_usage_info {\n       size_t total_size;\n       uint32_t chunk_count;\n   } nds_mem_usage_info_t;\n\nAggregated data for all chunks of the same type/location.\n\n**Source**: `neuron_ds.h:45 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L45>`_\n\nnds_model_node_info_t\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nds_model_node_info {\n       uint32_t model_id;\n       uint32_t model_node_id;\n       char name[256];\n       char uuid[16];\n       uint8_t nc_index;\n       uint8_t sg_index;\n   } nds_model_node_info_t;\n\nLoaded model node information.\n\n**Source**: `neuron_ds.h:51 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L51>`_\n\nnds_model_node_mem_usage_info_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nds_model_node_mem_usage_info {\n       nds_mem_usage_info_t model_mem_usage[MODEL_MEM_USAGE_LOCATION_COUNT][NDS_DMA_MEM_USAGE_SLOT_COUNT];\n   } nds_model_node_mem_usage_info_t;\n\nLoaded model node memory usage information.\n\n**Source**: `neuron_ds.h:61 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L61>`_\n\nnds_version_info_t\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nds_version_info {\n       uint8_t major;\n       uint8_t minor;\n       uint32_t build;\n   } nds_version_info_t;\n\nVersion information.\n\n**Source**: `neuron_ds.h:66 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L66>`_\n\nnds_process_info_t\n^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef struct nds_process_info {\n       int8_t framework_type;\n       char tag[32];\n       nds_version_info_t framework_version;\n       nds_version_info_t fal_version;\n       nds_version_info_t runtime_version;\n   } nds_process_info_t;\n\nProcess information-related struct.\n\n**Source**: `neuron_ds.h:73 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L73>`_\n\nFunctions\n---------\n\nnds_open\n^^^^^^^^\n\n.. code-block:: c\n\n   int nds_open(ndl_device_t *device, pid_t pid, nds_instance_t **inst);\n\nOpens NDS for the given pid. If pid == 0, it acquires it for the current PID and it's opened in read-write mode. If pid != 0, it acquires it for the provided PID and it's opened as read-only.\n\n**Parameters:**\n\n* ``device`` [in] - ndl_device used to open this NDS\n* ``pid`` [in] - pid for which to open the NDS, if 0 - it's opened as r/w for the current process\n* ``inst`` [out] - address of a pointer which will contain the instance handle\n\n**Returns:** non zero in case of error\n\n**Source**: `neuron_ds.h:102 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L102>`_\n\nnds_close\n^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_close(nds_instance_t *inst);\n\nReleases the NDS instance and frees the data associated with it (mandatory for readers).\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance to close\n\n**Returns:** non zero in case of error, the pointer gets deleted regardless\n\n**Source**: `neuron_ds.h:110 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L110>`_\n\nnds_increment_nc_counter\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_increment_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t increment);\n\nIncrements a simple per-nc counter.\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``pnc_index`` [in] - Neuroncore index\n* ``counter_index`` [in] - Counter index\n* ``increment`` [in] - Amount to increment\n\n**Returns:** 0 on success.\n\n**Source**: `neuron_ds.h:123 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L123>`_\n\nnds_get_nc_counter\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_get_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t *value);\n\nGets a simple per-nc counter.\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``pnc_index`` [in] - Neuroncore index\n* ``counter_index`` [in] - Counter index\n* ``value`` [out] - Counter value\n\n**Returns:** 0 on success.\n\n**Source**: `neuron_ds.h:145 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L145>`_\n\nnds_increment_nd_counter\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_increment_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t increment);\n\nIncrements a simple per-nd counter - may overflow.\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``counter_index`` [in] - Counter index\n* ``increment`` [in] - Amount to increment\n\n**Returns:** 0 on success.\n\n**Source**: `neuron_ds.h:167 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L167>`_\n\nnds_get_nd_counter\n^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   int nds_get_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t *value);\n\nGets a simple per-nd counter.\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``counter_index`` [in] - Counter index\n* ``value`` [out] - Counter value\n\n**Returns:** 0 on success.\n\n**Source**: `neuron_ds.h:193 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L193>`_\n\nnds_obj_new\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   nds_obj_handle_t nds_obj_new(nds_instance_t *inst, int type);\n\nCreates a new NDS object with the given type.\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``type`` [in] - type of object to create\n\n**Returns:** handle for newly created object\n\n**Source**: `neuron_ds.h:220 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L220>`_\n\nnds_obj_commit\n^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_obj_commit(nds_obj_handle_t obj);\n\nWrites an NDS object to the NDS memory.\n\n**Parameters:**\n\n* ``obj`` [in] - NDS object handle\n\n**Returns:** 0 on success.\n\n**Source**: `neuron_ds.h:213 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L213>`_\n\nnds_read_all_model_nodes\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   int nds_read_all_model_nodes(nds_instance_t *inst, nds_obj_handle_t **models, size_t *count);\n\nReads all model info data and returns it as an array (needs to be deleted by caller).\n\n**Parameters:**\n\n* ``inst`` [in] - NDS instance\n* ``models`` [out] - Pointer where to write the address of an array of length count containing object handles\n* ``count`` [out] - Number of models loaded (present in the models array)\n\n**Returns:** non-NULL on success.\n\n**Source**: `neuron_ds.h:250 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nds/neuron_ds.h#L250>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt-async-api-best-practices.rst",
    "content": "##############\nBest Practices\n##############\n\nSync vs Async APIs\n==================\n\nWith the introduction of the explicit async APIs, the Neuron Runtime provides users with a choice between synchronous APIs and asynchronous APIs. Choosing the right approach\ndepends on your workload requirements and performance goals.\n\nWhen to Use Synchronous APIs\n----------------------------\n\nSynchronous APIs are appropriate when:\n\n* **Prototyping or debugging** — Blocking behavior simplifies reasoning about execution order and makes it easier to isolate issues.\n* **Simple, sequential workloads** — If your application processes one request at a time without pipelining, the added complexity of async APIs may not provide meaningful\n  benefit.\n\nWhen to Use Asynchronous APIs\n-----------------------------\n\nAsynchronous APIs are recommended when:\n\n* **Maximizing device utilization** — Async APIs allow you to queue future execution requests while the device processes current work, eliminating idle time between operations.\n* **Pipelining across Execution Units** — Async APIs enable the overlapping of work between different Execution Units, allowing for customizable pipelining schemes, reducing\n  Execution Unit idle time.\n* **Overlapping device work with CPU work** — Non-blocking APIs free the CPU to perform other tasks (e.g., preprocessing, request management) while the device processes requests.\n\nMaximizing Device Utilization\n=============================\n\nTo maximize device utilization, applications should keep execution unit queues saturated with work at all times. Rather than waiting for each request to complete before submitting\nthe next request, use the schedule APIs to queue multiple requests ahead of execution—this ensures the device always has work ready to execute when the current operation finishes.\nMonitor queue depth using completion APIs like ``nrta_get_sequence`` to track how many requests remain in flight, and submit new work as completions occur to maintain a steady pipeline.\nAvoid letting the queue drain completely, as this creates idle gaps while the CPU prepares and submits the next request. A good rule of thumb is to keep at least 2-3 requests\nqueued per execution unit to absorb any variability in CPU scheduling or request preparation time. For workloads that span multiple execution units, submit work to each unit\nas soon as the data dependencies are satisfied—this allows compute, communication, and data transfer operations to overlap, further improving overall device utilization.\n\nHandling Execution Errors\n=========================\n\nRequest Error Handling\n----------------------\n\nWhen using asynchronous APIs, errors may not surface until after the\nschedule call returns—the device could encounter a failure mid-execution\nwhile the application continues to submit new work. To detect these\nfailures, all the schedule APIs accept an ``NRT_STATUS*`` parameter,\nthat will be populated with the result of the request execution upon\nrequest completion. 
Applications should keep track of the status variable passed to each\nscheduled request and check it once that request has completed to detect\nexecution failures.\n\nSee :doc:`nrt-async-api-examples` for a complete example.\n\nExecution Unit Unrecoverable\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn rare cases, an execution unit may enter a fatal failure state due to\na non-recoverable error such as a timeout or detectable hardware issue.\nOnce in this state, the execution unit can no longer process requests —\nall subsequent schedule calls will return\n``NRT_EXEC_UNIT_UNRECOVERABLE``.\n\nThis is a terminal state; the execution unit cannot be restored without\n*at least* reinitializing the Neuron Runtime, most likely by terminating and\nrelaunching the application. For the most severe errors, reloading the driver\nor rebooting the machine will be needed. Applications should monitor for\nthis return code and implement appropriate recovery logic, such as\nreleasing resources, notifying upstream services, and relaunching their\napplication.\n"
  },
  {
    "path": "neuron-runtime/api/nrt-async-api-examples.rst",
    "content": "Async API Usage Examples\n========================\n\nSchedule\n--------\n\n.. code:: c\n\n   NRT_STATUS exec_ret;\n\n   NRT_STATUS ret = nrta_execute_schedule(model, inputs, outputs, 0, &exec_ret, &seq);\n   if (ret != NRT_SUCCESS) {\n       if (ret == NRT_QUEUE_FULL) {\n           // or handle retries if desired\n           break;\n       }\n       // handle other errors\n       ...\n   } else {\n       req_seqs[req] = seq;\n   }\n\nWait for Completion via polling\n-------------------------------\n\nHere’s an example for polling for completion:\n\n.. code:: c\n\n   nrta_seq_t last_req_seq;\n   nrta_seq_t completed_seq = {};\n   while (true) {\n       nrta_get_sequence(lnc, NRTA_XU_COMPUTE, &completed_seq);\n       if (completed_seq >= last_req_seq) {\n           break;\n       }\n       usleep(1);\n   }\n\nIn a similar vein, we can also poll with\n``nrta_is_completed(last_req_seq, &is_completed)``\n\n.. _nrta-error-handling:\n\nError Handling\n--------------\n\nWe need to maintain an array/vector to track the execution return\nstatuses.\n\n.. code:: c\n\n   static const int NUM_EXECS = 8;\n   int lnc = 0;\n   NRT_STATUS exec_rets[NUM_EXECS];\n\n   // submit execution requests\n   nrta_seq_t req_seqs[NUM_EXECS];\n   for (int req = 0; req < NUM_EXECS; req++) {\n       nrta_seq_t seq = {};\n       NRT_STATUS ret = nrta_execute_schedule(model, inputs, outputs, 0, &exec_rets[req], &seq);\n       if (ret != NRT_SUCCESS) {\n           if (ret == NRT_QUEUE_FULL) {\n               // or handle retries if desired\n               break;\n           }\n           // handle other errors\n           ...\n       } else {\n           req_seqs[req] = seq;\n       }\n   }\n\n   // check for execution errors\n   int error_count = 0;\n   for (int req = 0; req < NUM_EXECS; req++) {\n       if (exec_rets[req] != NRT_SUCCESS) {\n           fprintf(stderr, \"Request [%x] completed with error %lu\\n\", req_seqs[req], exec_rets[req]);\n           error_count++;\n       }\n   }\n   if (error_count > 0) {\n       ...\n   }\n\nFinding Number of Pending Executions\n------------------------------------\n\nWhile this is susceptible to some races, here’s an example of how to\nestimate the outstanding requests:\n\n.. code:: c\n\n   nrta_seq_t last_completed = {};\n   const int compute_queue = 0; // Compute XU only has 1 queue\n   nrta_get_sequence(lnc, NRTA_XU_COMPUTE, compute_queue, &last_completed);\n\n   // sanity check: the two sequence ids should be from the same XU\n   assert(NRTA_SEQ_GET_XU_ID(last_submitted) == NRTA_SEQ_GET_XU_ID(last_completed));\n   // the sequence id is a monotone and sequential value for each XU\n   return last_submitted - last_completed;\n"
  },
  {
    "path": "neuron-runtime/api/nrt-async-api-overview.rst",
    "content": "====================================================\nNeuron Runtime Async APIs: Motivation and Overview\n====================================================\n\nIntroduction\n============\n\nAchieving maximum utilization of AWS Neuron Devices requires applications to execute work asynchronously—submitting future execution requests while the device is still processing\nprevious ones. The Neuron Runtime (NRT) Async APIs provide explicit, fine-grained control over asynchronous operations, enabling developers to fully optimize their workloads for\nNeuron hardware.\n\nNeuron Device Execution Units\n=============================\n\nThe Neuron Runtime exposes the Neuron device as a collection of specialized, independent processing blocks called execution units. Each execution unit can process\noperations asynchronously, enabling parallel execution across multiple units.\n\nCurrently there are 3 types of Execution Units:\n\n+------------------+-----------------------------------------------------------------------------------+\n| Execution Unit   | Purpose                                                                           |\n+==================+===================================================================================+\n| Neuron Core XU   | Executes compiled models or kernels                                               |\n+------------------+-----------------------------------------------------------------------------------+\n| Collectives XU   | Runs standalone collective operations (all-gather, reduce-scatter, all-reduce)    |\n|                  | outside of a compiled model/kernel                                                |\n+------------------+-----------------------------------------------------------------------------------+\n| Tensor Op XU     | Transfers data between host and Neuron Devices                                    |\n+------------------+-----------------------------------------------------------------------------------+\n\nAnd each neuron core has multiple execution units of each type *(PENDING\nAPI for getting number of queues)*: \n\n+------------------+------------+\n| Execution Unit   | Queues/NC  |\n+==================+============+\n| Neuron Core XU   | 1          |\n+------------------+------------+\n| Collectives XU   | 3          |\n+------------------+------------+\n| Tensor Op XU     | 2          |\n+------------------+------------+\n\nIn general, an individual Execution Unit on the device is uniquely\nidentified by :math:`(NeuronCore\\times XUType\\times QueueID)`\n\nThis abstraction along with the Explicit Async APIs, provide\napplications the control necessary to overlap compute, communication,\nand data movement operations.\n\n(Legacy) Async Execution Mode vs (New) Async APIs\n=================================================\n\nPreviously, the Neuron Runtime supported an Async Execution Mode which allowed for the asynchronous submission of model/kernel executions. When this mode is enabled, calls to\n``nrt_execute`` return immediately, allowing the calling thread to prepare the next execution while the device processes the current one. 
To maintain tensor consistency, tensor\nread/write operations automatically block while tensors are in use by pending executions.\n\nWhile this flow works, the implicit nature of the implementation limits both the flexibility and control available to applications.\n\n**Limited Flexibility:** The legacy async execution mode ties execution and data operations together in ways that prevent efficient pipelining. For example, reading tensor\ndata from the device blocks until all pending executions complete, preventing applications from overlapping data transfers with ongoing Neuron Core computation.\n\n**Limited Control:** The legacy APIs do not expose asynchronous control for all execution units, preventing applications from making optimal scheduling decisions. Without\nfine-grained, asynchronous control over each execution unit, applications cannot implement scheduling strategies that maximize overlap between compute, communication, and\ndata movement operations.\n\nAsync APIs\n==========\n\nThe Async APIs directly address the limitations of the implicit async implementation through two core design choices:\n\n* **Explicit completion primitives** — Instead of relying on implicit blocking behavior to ensure consistency, the new APIs provide explicit mechanisms for tracking request\n  completion. This gives applications full control over synchronization and enables efficient polling patterns that keep execution units saturated with work.\n* **All execution units can run asynchronously** — Unlike the legacy mode, where execution and tensor operations are coupled, the new APIs allow the Neuron Core, Collectives,\n  and Tensor execution units to operate independently and in parallel. This enables applications to schedule compute, communication, and data movement operations concurrently,\n  achieving true overlap between these different types of work.\n\nTogether, these design choices give applications the flexibility to implement custom scheduling strategies and the control needed to make optimal decisions about when to overlap work,\nwhen to synchronize, and how to maximize device utilization.\n\nKey Benefits\n------------\n\n* **Higher device utilization** — Pipeline work across multiple devices without idle cycles\n* **Compute/communication/data transfer overlap** — Schedule independent operations in parallel\n* **Greater optimization flexibility** — Build custom execution strategies tailored to your specific workload\n\nWhat are the Async APIs\n=======================\n\nThe Explicit Async APIs (prefixed with ``nrta``) are organized into two main categories:\n\n* **Schedule APIs** (``nrta_execute_schedule``, ``nrta_cc_schedule``, ``nrta_tensor_read``, ``nrta_tensor_write``, ``nrta_tensor_copy``) — enqueue work to an execution unit and return a sequence number for tracking.\n* **Completion APIs** (``nrta_get_sequence``, ``nrta_is_completed``) — enable applications to monitor execution unit progress and check for request completion.\n\nTogether, these categories enable a workflow where applications continuously submit work and monitor completions—keeping execution units busy and\nmaximizing device utilization.\n\nSee `nrt_async.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h>`_ for more details.\n\nSummary\n=======\n\nThe Neuron Runtime Async APIs give developers explicit control over asynchronous execution on Neuron hardware. 
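At a glance, the intended pattern is to schedule work and then poll for completion. A rough sketch (variable names are illustrative; assumes a loaded model and prepared tensor sets):\n\n.. code:: c\n\n   NRT_STATUS exec_ret;\n   nrta_seq_t seq = 0;\n\n   // enqueue an execution on the Compute XU (queue 0) without blocking\n   nrta_execute_schedule(model, inputs, outputs, 0, &exec_ret, &seq);\n\n   // overlap CPU work with device execution, then poll for completion\n   bool done = false;\n   while (nrta_is_completed(seq, &done) == NRT_SUCCESS && !done) {\n       /* e.g., prepare the next request */\n   }\n\nSee :doc:`nrt-async-api-examples` for complete usage.\n\n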
Whether you're building advanced inference pipelines or\nimplementing eager mode workloads that demand responsive kernel scheduling, these APIs unlock optimization opportunities by exposing non-blocking interfaces for all\nexecution units.\n"
  },
  {
    "path": "neuron-runtime/api/nrt.rst",
    "content": ".. _api_nrt_h:\n\nnrt.h\n=====\n\nNeuron Runtime (NRT) API - Main interface for loading and executing models on Neuron devices.\n\n**Source**: `src/libnrt/include/nrt/nrt.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h>`_\n\nConstants\n---------\n\nNRT_MAJOR_VERSION\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NRT_MAJOR_VERSION 2\n\nMajor version of runtime.\n\n**Source**: `nrt.h:21 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L21>`_\n\nNRT_MINOR_VERSION\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NRT_MINOR_VERSION 0\n\nMinor version of runtime.\n\n**Source**: `nrt.h:22 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L22>`_\n\nEnumerations\n------------\n\nnrt_tensor_placement_t\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum {\n       NRT_TENSOR_PLACEMENT_DEVICE,\n       NRT_TENSOR_PLACEMENT_HOST,\n       NRT_TENSOR_PLACEMENT_VIRTUAL,\n   } nrt_tensor_placement_t;\n\nTensor placement options.\n\n**Source**: `nrt.h:34 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L34>`_\n\nnrt_framework_type_t\n^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum {\n       NRT_FRAMEWORK_TYPE_INVALID = 0,\n       NRT_FRAMEWORK_TYPE_NO_FW = 1,\n       NRT_FRAMEWORK_TYPE_TENSORFLOW,\n       NRT_FRAMEWORK_TYPE_PYTORCH,\n       NRT_FRAMEWORK_TYPE_MXNET,\n       NRT_FRAMEWORK_TYPE_PRECHECK,\n   } nrt_framework_type_t;\n\nFramework types supported by NRT.\n\n**Source**: `nrt.h:40 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L40>`_\n\nnrt_dtype_t\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum nrt_dtype {\n       NRT_DTYPE_UNKNOWN = 0x0,\n       NRT_DTYPE_INVALID = 0x0,\n       NRT_DTYPE_FP8_E3 = 0xD,\n       NRT_DTYPE_FP8_E4 = 0xE,\n       NRT_DTYPE_FP8_E5 = 0xF,\n       NRT_DTYPE_FLOAT16 = 0x7,\n       NRT_DTYPE_BFLOAT16 = 0x6,\n       NRT_DTYPE_FLOAT32 = 0xA,\n       NRT_DTYPE_FP32R = 0xB,\n       NRT_DTYPE_UINT8 = 0x3,\n       NRT_DTYPE_UINT16 = 0x5,\n       NRT_DTYPE_UINT32 = 0x9,\n       NRT_DTYPE_UINT64 = 0x1,\n       NRT_DTYPE_INT8 = 0x2,\n       NRT_DTYPE_INT16 = 0x4,\n       NRT_DTYPE_INT32 = 0x8,\n       NRT_DTYPE_INT64 = 0xC,\n   } nrt_dtype_t;\n\nData types supported by NRT.\n\n**Source**: `nrt.h:90 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L90>`_\n\nnrt_op_type_t\n^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum nrt_op_type {\n       NRT_OP_ADD = 0x0,\n       NRT_OP_FMA = 0x1,\n       NRT_OP_MAX = 0x2,\n       NRT_OP_MIN = 0x3,\n       NRT_OP_INVALID = 0xF,\n   } nrt_op_type_t;\n\nOperation types for collectives.\n\n**Source**: `nrt.h:83 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L83>`_\n\nnrt_cc_op_type_t\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum nrt_cc_op_type {\n       NRT_CC_ALLGATHER,\n       NRT_CC_ALLREDUCE,\n       NRT_CC_REDUCESCATTER\n   } nrt_cc_op_type_t;\n\nCollective communication operation types.\n\n**Source**: `nrt.h:111 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L111>`_\n\nStructures\n----------\n\nnrt_instance_info_t\n^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   typedef struct nrt_instance_info {\n       uint32_t family;\n       uint32_t size;\n       char arch_name[16];\n       char device_revision[8];\n   } nrt_instance_info_t;\n\nInstance information structure.\n\n**Source**: `nrt.h:117 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L117>`_\n\nnrt_tensor_batch_t\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_batch {\n       const nrt_tensor_t *tensor;\n       const nrt_tensor_batch_op_t *ops;\n       uint32_t num_ops;\n   } nrt_tensor_batch_t;\n\nA batch of tensor operations on a single tensor.\n\n**Source**: `nrt.h:343 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L343>`_\n\nnrt_tensor_device_allocation_info_t\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_device_allocation_info {\n       uint64_t physical_address;\n       size_t size;\n       int hbm_index;\n   } nrt_tensor_device_allocation_info_t;\n\nReturns on device allocation info for a tensor.\n\n**Source**: `nrt.h:442 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L442>`_\n\nnrt_vnc_memory_stats_t\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_vnc_memory_stats {\n       size_t bytes_used;\n       size_t bytes_limit;\n   } nrt_vnc_memory_stats_t;\n\nNRT memory stats for a VNC.\n\n**Source**: `nrt.h:509 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L509>`_\n\nnrt_cc_comm_t\n^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_cc_comm {\n       uint32_t *replica_group;\n       uint32_t rank;\n       uint32_t rank_n;\n       uint32_t ctx_device_id;\n       uint32_t ctx_device_count;\n       uint32_t vnc;\n   } nrt_cc_comm_t;\n\nCommunicator for collective operations.\n\n**Source**: `nrt.h:545 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L545>`_\n\nnrt_tensor_list_t\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_list {\n       nrt_tensor_t **tensors;\n       size_t num_tensors;\n   } nrt_tensor_list_t;\n\nList of tensors.\n\n**Source**: `nrt.h:554 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L554>`_\n\nFunctions\n---------\n\nnrt_init\n^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_init(nrt_framework_type_t framework, const char *fw_version, const char *fal_version);\n\nInitialize neuron runtime.\n\n**Parameters:**\n\n* ``framework`` [in] - Type of the framework.\n* ``fw_version`` [in] - Framework version as string. (eg 2.1)\n* ``fal_version`` [in] - Framework Abstraction Layer version as string.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:133 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L133>`_\n\nnrt_close\n^^^^^^^^^\n\n.. code-block:: c\n\n   void nrt_close();\n\nCloses all the devices and cleans up the runtime state.\n\n**Source**: `nrt.h:138 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L138>`_\n\nnrt_load\n^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_load(const void *neff_bytes, size_t size, int32_t vnc, int32_t vnc_count, nrt_model_t **model);\n\nLoad given NEFF and place it in one or more neuron cores.\n\n**Parameters:**\n\n* ``neff_bytes`` [in] - Pointer to NEFF data.\n* ``size`` [in] - Length of the NEFF data.\n* ``vnc`` [in] - VNC index where the NEFF should be loaded(-1 means runtime would automatically load in first free VNC).\n* ``vnc_count`` [in] - DEPRECATED: always use -1\n* ``model`` [out] - Resulting model would be stored here.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:149 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L149>`_\n\nnrt_unload\n^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_unload(nrt_model_t *model);\n\nUnload given model and free up device and host resources.\n\n**Parameters:**\n\n* ``model`` - Model to unload.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:172 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L172>`_\n\nnrt_execute\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_execute(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set);\n\nExecute given model with given inputs and collect outputs.\n\n**Parameters:**\n\n* ``model`` [in] - Model to execute.\n* ``input_set`` [in] - Set of input tensors.\n* ``output_set`` [in] - Set of output tensors.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:256 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L256>`_\n\nnrt_tensor_allocate\n^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_tensor_allocate(nrt_tensor_placement_t tensor_placement, int vnc, size_t size, \n                                  const char *name, nrt_tensor_t **tensor);\n\nAllocates a tensor that can be passed and used by a model for compute.\n\n**Parameters:**\n\n* ``tensor_placement`` [in] - Where the tensor would be allocated (device, host, or virtual memory)\n* ``vnc`` [in] - Virtual Neuron Core id to allocate the tensor on. Pass in -1 if allocating tensors on host memory.\n* ``size`` [in] - Size in bytes of the tensor to allocate.\n* ``name`` [in] - OPTIONAL. Name of the tensor.\n* ``tensor`` [out] - Pointer to newly created tensor will be stored here.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:283 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L283>`_\n\nnrt_tensor_free\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   void nrt_tensor_free(nrt_tensor_t **tensor);\n\nDeallocates a tensor created by \"nrt_tensor_allocate\".\n\n**Parameters:**\n\n* ``tensor`` [in] - Deallocates given tensor.\n\n**Source**: `nrt.h:292 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L292>`_\n\nnrt_tensor_read\n^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_tensor_read(const nrt_tensor_t *tensor, void *buf, size_t offset, size_t size);\n\nCopies data from tensor to passed in buffer.\n\n**Parameters:**\n\n* ``tensor`` [in] - Tensor used to reference the tensor to read from.\n* ``buf`` [out] - Buffer used to store data read from the tensor.\n* ``offset`` [in] - Offset into the tensor to read from.\n* ``size`` [in] - Number of bytes to read.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:303 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L303>`_\n\nnrt_tensor_write\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_tensor_write(nrt_tensor_t *tensor, const void *buf, size_t offset, size_t size);\n\nCopies data from passed in buffer to tensor.\n\n**Parameters:**\n\n* ``tensor`` [in/out] - Tensor used to reference the tensor to write to.\n* ``buf`` [in] - Buffer used to store data to write to the tensor.\n* ``offset`` [in] - Offset into the tensor to write to.\n* ``size`` [in] - Number of bytes to write.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:315 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L315>`_\n\nnrt_tensor_copy\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_tensor_copy(const nrt_tensor_t *src, size_t src_offset, nrt_tensor_t *dst, \n                              size_t dst_offset, size_t size);\n\nCopies data between tensors.\n\n**Parameters:**\n\n* ``src`` [in] - Tensor to copy from.\n* ``src_offset`` [in] - Offset into the source tensor to copy from.\n* ``dst`` [out] - Tensor to copy to.\n* ``dst_offset`` [in] - Offset into the destination tensor to copy to.\n* ``size`` [in] - Number of bytes to copy.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:381 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L381>`_\n\nnrt_get_total_vnc_count\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_get_total_vnc_count(uint32_t *vnc_count);\n\nReturns VirtualNeuronCores available in instance.\n\n**Parameters:**\n\n* ``vnc_count`` [out] - VirtualNeuronCores available in instance.\n\n**Note:** This API can be called before nrt_init().\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt.h:203 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h#L203>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_async.rst",
    "content": ".. _api_nrt_async_h:\n\nnrt_async.h\n===========\n\nNeuron Runtime Asynchronous Execution API - Non-blocking operations for tensor I/O and model execution.\n\n**Source**: `src/libnrt/include/nrt/nrt_async.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h>`__\n\n.. note::\n\n   The Neuron Runtime Async APIs are currently in early release and may change across Neuron versions.\n\nEnumerations\n------------\n\nnrta_xu_t\n^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum {\n       NRTA_XU_TENSOR_OP = 0,\n       NRTA_XU_COMPUTE,\n       NRTA_XU_COLLECTIVES,\n       NRTA_XU_TYPE_NUM\n   } nrta_xu_t;\n\nExecution unit types.\n\n**Source**: `nrt_async.h:18 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L20>`__\n\nTypedefs\n--------\n\nnrta_seq_t\n^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef uint64_t nrta_seq_t;\n\nMonotonically increasing IDs of executions. The first 16 bits are an Execution Unit ID, while the last 48 bits are a strictly ordered Sequence Number.\n\n**Source**: `nrt_async.h:31 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L33>`__\n\nnrta_xu_id_t\n^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef uint16_t nrta_xu_id_t;\n\nExecution unit ID type.\n\n**Source**: `nrt_async.h:32 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L34>`__\n\nConstants\n---------\n\nNRTA_SEQ_NUM_MAX\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define NRTA_SEQ_NUM_MAX ((1ull << 48) - 1)\n\nMaximum sequence number value.\n\n**Source**: `nrt_async.h:34 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L36>`__\n\nFunctions\n---------\n\nnrta_tensor_write\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_tensor_write(nrt_tensor_t *tensor, const void *buf, uint64_t offset, \n                                uint64_t size, int queue, NRT_STATUS *ret, \n                                nrta_seq_t *req_sequence);\n\nEnqueues a tensor write request. Copies the data from a host buffer to a tensor allocated on a Neuron device.\n\n**Parameters:**\n\n* ``tensor`` [in] - Destination tensor\n* ``buf`` [in] - Host buffer containing source data\n* ``offset`` [in] - Offset into the tensor\n* ``size`` [in] - Number of bytes to write\n* ``queue`` [in] - XU queue to use\n* ``ret`` [in] - pointer to store return value of the async request upon completion\n* ``req_sequence`` [out] - Sequence number of the scheduled request\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_async.h:59 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L57>`__\n\nnrta_tensor_read\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_tensor_read(void *buf, nrt_tensor_t *tensor, uint64_t offset, \n                               uint64_t size, int queue, NRT_STATUS *ret, \n                               nrta_seq_t *req_sequence);\n\nEnqueues a tensor read request. 
Copies the data from a tensor allocated on a Neuron device to a host buffer.\n\n**Parameters:**\n\n* ``buf`` [in] - Destination Host buffer\n* ``tensor`` [in] - Source tensor\n* ``offset`` [in] - Offset into the tensor\n* ``size`` [in] - Number of bytes to read\n* ``queue`` [in] - XU queue to use\n* ``ret`` [in] - pointer to store return value of the async request upon completion\n* ``req_sequence`` [out] - Sequence number of the scheduled request\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_async.h:77 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L81>`__\n\nnrta_tensor_copy\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_tensor_copy(nrt_tensor_t *src, uint64_t src_offset, nrt_tensor_t *dst, \n                               uint64_t dst_offset, uint64_t size, int queue, \n                               NRT_STATUS *ret, nrta_seq_t *req_sequence);\n\nEnqueues a tensor copy request. Copies data between two tensors allocated on the same Logical Neuron Core.\n\n**Parameters:**\n\n* ``src`` [in] - Source tensor\n* ``src_offset`` [in] - Offset into the source tensor\n* ``dst`` [in] - Destination tensor\n* ``dst_offset`` [in] - Offset into the destination tensor\n* ``size`` [in] - Number of bytes to copy\n* ``queue`` [in] - XU queue to use\n* ``ret`` [in] - pointer to store return value of the async request upon completion\n* ``req_sequence`` [out] - Sequence number of the scheduled request\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_async.h:98 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L107>`__\n\nnrta_execute_schedule\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_execute_schedule(nrt_model_t *model, const nrt_tensor_set_t *input, \n                                    nrt_tensor_set_t *output, int queue, \n                                    NRT_STATUS *ret, nrta_seq_t *req_sequence);\n\nSchedules an asynchronous request to execute a model with specified inputs and outputs.\n\n**Parameters:**\n\n* ``model`` [in] - The model to schedule for execution\n* ``input`` [in] - Set of input tensors for the model\n* ``output`` [in] - Set of tensors to receive the outputs\n* ``queue`` [in] - XU queue to use, must be 0\n* ``ret`` [in] - pointer to store return value of the async request upon completion\n* ``req_sequence`` [out] - Sequence number of the scheduled request\n\n**Returns:** NRT_SUCCESS on successful preparation, appropriate error code otherwise\n\n**Source**: `nrt_async.h:118 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L129>`__\n\nnrta_cc_prepare\n^^^^^^^^^^^^^^^^^^^^^\n**NOTE: The nrta_cc_prepare and nrta_cc_schedule APIs are work-in-progress and subject to change.**\n\n.. 
code-block:: c\n\n   NRT_STATUS nrta_cc_prepare(nrt_cc_comm_t *comm, nrt_tensor_list_t *input, \n                              nrt_tensor_list_t *output, nrt_dtype_t dtype, \n                              nrt_op_type_t op, nrt_cc_op_type_t cc_op,\n                              nrt_cc_context_t **cc_ctx);\n\nPrepares the collective context and HW configuration needed for a collectives operation.\nAllocates a collective context handle that is returned to the caller and freed by the schedule thread after the CC op executes.\n\n**Parameters:**\n\n* ``comm`` [in] - Communicator containing the replica group\n* ``input`` [in] - Input tensor list\n* ``output`` [out] - Output tensor list\n* ``dtype`` [in] - Data type of elements\n* ``op`` [in] - Reduction operation (e.g., SUM, MAX) if applicable\n* ``cc_op`` [in] - Collective operation (e.g., ALLREDUCE, ALLGATHER)\n* ``cc_ctx`` [out] - Collective context\n\n**Returns:** NRT_SUCCESS on successful preparation, appropriate error code otherwise\n\n**Source**: `nrt_async.h:155 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L155>`__\n\nnrta_cc_schedule\n^^^^^^^^^^^^^^^^^^^^^\n**NOTE: The nrta_cc_prepare and nrta_cc_schedule APIs are work-in-progress and subject to change.**\n\n.. code-block:: c\n\n   NRT_STATUS nrta_cc_schedule(nrt_cc_context_t **cc_ctx, int queue, \n                              NRT_STATUS *ret, nrta_seq_t *req_sequence);\n\nSchedules an asynchronous request to execute a collective operation.\n\n**Parameters:**\n\n* ``cc_ctx`` [in] - Collective context\n* ``queue`` [in] - XU queue to use, must be 0\n* ``ret`` [in] - pointer to store return value of the async request upon completion\n* ``req_sequence`` [out] - Sequence number of the scheduled request\n\n**Returns:** NRT_SUCCESS on successful scheduling, appropriate error code otherwise\n\n**Source**: `nrt_async.h:172 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L172>`__\n\nnrta_is_completed\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_is_completed(nrta_seq_t seq, bool *is_completed);\n\nChecks completion status of a scheduled request.\n\n**Parameters:**\n\n* ``seq`` [in] - Scheduled request sequence id\n* ``is_completed`` [out] - true if the request is completed, false otherwise\n\n**Returns:** NRT_SUCCESS if the request is completed, NRT_INVALID if the seq is not valid\n\n**Source**: `nrt_async.h:159 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L186>`__\n\nnrta_get_sequence\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrta_get_sequence(uint32_t lnc, nrta_xu_t xu, int queue, nrta_seq_t *seq);\n\nReturns sequence number of the last completed request.\n\n**Parameters:**\n\n* ``lnc`` [in] - LNC\n* ``xu`` [in] - XU\n* ``queue`` [in] - XU's queue\n* ``seq`` [out] - last completed sequence number\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_async.h:185 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h#L198>`__\n"
  },
  {
    "path": "neuron-runtime/api/nrt_async_sendrecv.rst",
    "content": ".. _api_nrt_async_sendrecv_h:\n\nnrt_async_sendrecv.h\n====================\n\nNeuron Runtime Asynchronous Send/Receive API - Network communication between logical neuron cores.\n\n**Source**: `src/libnrt/include/nrt/nrt_async_sendrecv.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h>`_\n\n.. note::\n\n   The Neuron Runtime Async APIs are currently in early release and may change across Neuron versions.\n\nFunctions\n---------\n\nnrt_async_sendrecv_init\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_init(int lnc);\n\nInitialize asynchronous tensor send and receive on logical neuron core.\n\nLogical neuron core ID is the absolute ID of the logical core on the host machine. The ID is unaffected by device remapping via docker and selection of visible logical cores.\n\n**Parameters:**\n\n* ``lnc`` [in] - Logical neuron core ID on the current server\n\n**Returns:** NRT_SUCCESS if logical core has been initialized successfully, NRT_FAILURE for errors\n\n**Note:** This function may only be called when runtime is initialized. This function must have a matching call to nrt_async_sendrecv_close() before nrt_close() is called.\n\n**Source**: `nrt_async_sendrecv.h:48 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L48>`_\n\nnrt_async_sendrecv_close\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_close(int lnc);\n\nCloses asynchronous tensor send and receive of logical neuron core and cleans up resources.\n\n**Parameters:**\n\n* ``lnc`` [in] - Logical neuron core ID on the current server\n\n**Returns:** NRT_SUCCESS if logical core has been closed successfully, NRT_FAILURE for errors\n\n**Note:** After this function was invoked, all sendrecv communicators and requests associated with this logical neuron core are closed and cannot be accessed anymore.\n\n**Source**: `nrt_async_sendrecv.h:64 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L64>`_\n\nnrt_async_sendrecv_connect\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_connect(const char* peer_ip, int peer_lnc, int lnc, \n                                         nrt_async_sendrecv_comm_t** send_comm);\n\nCreate send communicator.\n\nBefore send communicator can be used to initiate sending a tensor, connection to receive communicator must be established. Use function nrt_async_sendrecv_test_comm() to test whether connection is established.\n\n**Parameters:**\n\n* ``peer_ip`` [in] - IP address of peer logical neuron core\n* ``peer_lnc`` [in] - Logical neuron core ID on the peer server\n* ``lnc`` [in] - Logical neuron core ID on the current server\n* ``send_comm`` [out] - Pointer to send communicator\n\n**Returns:** NRT_SUCCESS if logical core has been created successfully, NRT_RESOURCE if the number of created communicators exceeds the limit, NRT_FAILURE for other errors\n\n**Note:** This function is thread-safe.\n\n**Source**: `nrt_async_sendrecv.h:84 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L84>`_\n\nnrt_async_sendrecv_accept\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_accept(const char* peer_ip, int peer_lnc, int lnc, \n                                        nrt_async_sendrecv_comm_t** recv_comm);\n\nCreate receive communicator.\n\nBefore receive communicator can be used to initiate receiving a tensor, connection to receive communicator must be established. Use function nrt_async_sendrecv_test_comm() to test whether connection is established.\n\n**Parameters:**\n\n* ``peer_ip`` [in] - IP address of peer logical neuron core\n* ``peer_lnc`` [in] - Logical neuron core ID on the peer server\n* ``lnc`` [in] - Logical neuron core ID on the current server\n* ``recv_comm`` [out] - Pointer to receive communicator\n\n**Returns:** NRT_SUCCESS if logical core has been created successfully, NRT_RESOURCE if the number of created communicators exceeds the limit, NRT_FAILURE for other errors\n\n**Note:** This function is thread-safe.\n\n**Source**: `nrt_async_sendrecv.h:104 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L104>`_\n\nnrt_async_sendrecv_send_tensor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_send_tensor(nrt_tensor_t* tensor, size_t offset, size_t length, \n                                             nrt_async_sendrecv_comm_t* send_comm, \n                                             nrt_async_sendrecv_request_t** request);\n\nAsynchronously send a tensor.\n\nThis is a non-blocking function. This function is thread-safe. This function is only allowed to be invoked on a communicator that is successfully tested to be connected via call to nrt_async_sendrecv_test_comm().\n\n**Parameters:**\n\n* ``tensor`` [in] - Tensor to send from\n* ``offset`` [in] - Offset into the tensor to send from\n* ``length`` [in] - Number of bytes to send\n* ``send_comm`` [in] - Send communicator\n* ``request`` [out] - Pointer to send request\n\n**Returns:** NRT_SUCCESS on success, NRT_INVALID_HANDLE if handle is invalid, NRT_RESOURCE if the number of pending requests exceeds the limit, NRT_FAILURE for other errors\n\n**Source**: `nrt_async_sendrecv.h:135 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L135>`_\n\nnrt_async_sendrecv_recv_tensor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_recv_tensor(nrt_tensor_t* tensor, size_t offset, size_t length, \n                                             nrt_async_sendrecv_comm_t* recv_comm, \n                                             nrt_async_sendrecv_request_t** request);\n\nAsynchronously receive a tensor.\n\nThis is a non-blocking function. This function is thread-safe. This function is only allowed to be invoked on a communicator that is successfully tested to be connected via call to nrt_async_sendrecv_test_comm().\n\n**Parameters:**\n\n* ``tensor`` [in] - Tensor to receive to\n* ``offset`` [in] - Offset into the tensor to receive to\n* ``length`` [in] - Number of bytes to read\n* ``recv_comm`` [in] - Receive communicator\n* ``request`` [out] - Pointer to receive request\n\n**Returns:** NRT_SUCCESS on success, NRT_INVALID_HANDLE if handle is invalid, NRT_RESOURCE if the number of pending requests exceeds the limit, NRT_FAILURE for other errors\n\n**Source**: `nrt_async_sendrecv.h:156 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L156>`_\n\nnrt_async_sendrecv_test_request\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_async_sendrecv_test_request(nrt_async_sendrecv_request_t* request, bool* done, size_t* size);\n\nTest the completion status of an asynchronous request.\n\nThis function is thread-safe when invoked with different requests. This function is not allowed to be invoked concurrently by multiple threads with the same request at the same time.\n\n**Parameters:**\n\n* ``request`` [in] - Request to test\n* ``done`` [out] - Whether the request has completed\n* ``size`` [out] - Number of bytes sent/received\n\n**Returns:** NRT_SUCCESS on success, NRT_INVALID_HANDLE if handle is invalid, NRT_TIMEOUT if the request fails to complete data transfer within time limit, NRT_FAILURE for other errors\n\n**Source**: `nrt_async_sendrecv.h:174 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async_sendrecv.h#L174>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_experimental.rst",
    "content": ".. _api_nrt_experimental_h:\n\nnrt_experimental.h\n==================\n\nNeuron Runtime Experimental API - Features under development and subject to change.\n\n**Source**: `src/libnrt/include/nrt/nrt_experimental.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h>`_\n\n.. note::\n\n   Experimental APIs are provided for testing and feedback and may not be appropriate for production environments.\n\nEnumerations\n------------\n\nnrt_tensor_usage_t\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum nrt_tensor_usage {\n       NRT_TENSOR_USAGE_INPUT = 0,\n       NRT_TENSOR_USAGE_OUTPUT,\n   } nrt_tensor_usage_t;\n\nUsage of a Tensor in the NEFF.\n\n**Source**: `nrt_experimental.h:18 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L18>`_\n\nStructures\n----------\n\nnrt_tensor_info_t\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_info {\n       char name[NRT_TENSOR_NAME_MAX];\n       nrt_tensor_usage_t usage;\n       size_t size;\n       nrt_dtype_t dtype;\n       uint32_t *shape;\n       uint32_t ndim;\n   } nrt_tensor_info_t;\n\nTensor information including name, usage, size, data type, and shape.\n\n**Source**: `nrt_experimental.h:25 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L25>`_\n\nnrt_tensor_info_array_t\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_tensor_info_array {\n       uint64_t tensor_count;\n       nrt_tensor_info_t tensor_array[];\n   } nrt_tensor_info_array_t;\n\nArray of tensor information.\n\n**Source**: `nrt_experimental.h:34 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L34>`_\n\nnrt_model_info_t\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_model_info {\n       uint32_t vnc;\n   } nrt_model_info_t;\n\nModel information structure.\n\n**Source**: `nrt_experimental.h:139 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L139>`_\n\nFunctions\n---------\n\nnrt_get_model_tensor_info\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_get_model_tensor_info(nrt_model_t *model, nrt_tensor_info_array_t **tensor_info);\n\nReturn input/output tensor information for a given model.\n\n**Parameters:**\n\n* ``model`` [in] - Model for which tensor information needs to be extracted.\n* ``tensor_info`` [out] - Pointer to store the result.\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt_experimental.h:48 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L48>`_\n\nnrt_trace_start\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_trace_start(bool trace_mem);\n\nEnable tracing for all VNCs visible to the app.\n\n**Parameters:**\n\n* ``trace_mem`` [in] - collect memory allocation info\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_experimental.h:68 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L68>`_\n\nnrt_trace_stop\n^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_trace_stop(const char *filename);\n\nSerialize all data and disable tracing.\n\n**Parameters:**\n\n* ``filename`` [in] - filename to write to\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_experimental.h:75 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L75>`_\n\nnrt_barrier\n^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_barrier(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count);\n\nImplements a barrier by running a small all-reduce over all workers.\n\n**Parameters:**\n\n* ``vnc`` [in] - local VNC (within the instance)\n* ``global_device_id`` [in] - global worker ID\n* ``global_device_count`` [in] - total number of workers\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt_experimental.h:115 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_experimental.h#L115>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_profile.rst",
    "content": ".. _api_nrt_profile_h:\n\nnrt_profile.h\n=============\n\nNeuron Runtime Profiling API - Tools for profiling model execution and device performance.\n\n**Source**: `src/libnrt/include/nrt/nrt_profile.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h>`_\n\nFunctions\n---------\n\nnrt_profile_start\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_profile_start(nrt_model_t *model, const char *filename);\n\nEnable profiling for a model.\n\n**Parameters:**\n\n* ``model`` [in] - model to profile\n* ``filename`` [in] - output filename that will be used with nrt_profile_stop()\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_profile.h:18 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L18>`_\n\nnrt_profile_stop\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_profile_stop(const char *filename);\n\nCollect results and disable profiling for a model.\n\n**Parameters:**\n\n* ``filename`` [in] - output filename to save the NTFF profile to\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_profile.h:26 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L26>`_\n\nnrt_profile_continuous_start\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_profile_continuous_start(nrt_profile_continuous_options_t *options);\n\nStart continuous device profiling.\n\nWhen continuous device profiling is started, profiling is enabled for every model but notifications will only be serialized to disk when the user calls nrt_profile_continuous_save().\n\n**Parameters:**\n\n* ``options`` [in] - options to control continuous device profiling\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_profile.h:77 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L77>`_\n\nnrt_profile_continuous_save\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_profile_continuous_save(uint32_t vnc, nrt_profile_continuous_options_t *options);\n\nSave NTFF profile to disk for the latest model executed on requested NeuronCore.\n\n**Parameters:**\n\n* ``vnc`` [in] - (start) NeuronCore id to collect profile for\n* ``options`` [in] - options to control continuous device profiling\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_profile.h:91 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L91>`_\n\nnrt_inspect_begin\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_inspect_begin();\n\nBegin tracing/profiling.\n\nUsers of this API must set options through environment variables (NEURON_RT_INSPECT_ENABLE, NEURON_RT_INSPECT_OUTPUT_DIR, etc.).\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_profile.h:118 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L118>`_\n\nnrt_inspect_stop\n^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_inspect_stop();\n\nStop tracing/profiling and dump profile data.\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_profile.h:126 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L126>`_\n\nnrt_inspect_begin_with_options\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_inspect_begin_with_options(nrt_inspect_config_t *options);\n\nBegin tracing/profiling with configurable options.\n\n**Parameters:**\n\n* ``options`` [in] - A pointer to an nrt_inspect_config struct containing configuration options for profiling. If NULL is passed, default options will be used.\n\n**Returns:** NRT_SUCCESS on success\n\n**Note:** This API ignores all the NEURON_RT_INSPECT_* environment variables.\n\n**Source**: `nrt_profile.h:237 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L237>`_\n\nnrt_inspect_config_allocate\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_inspect_config_allocate(nrt_inspect_config_t **options);\n\nAllocate memory for the options structure which is needed to start profiling using nrt_inspect_begin_with_options.\n\n**Parameters:**\n\n* ``options`` [out] - pointer to a pointer to options nrt_inspect_config struct\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_profile.h:149 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L149>`_\n\nnrt_inspect_config_set_output_dir\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_inspect_config_set_output_dir(nrt_inspect_config_t *options, const char *output_dir);\n\nSets the output directory for results of profiling using nrt_inspect_begin_with_options.\n\n**Parameters:**\n\n* ``options`` [in,out] - Pointer to the options structure.\n* ``output_dir`` [in] - Path to the output directory. Must be a valid non-empty string\n\n**Returns:** NRT_SUCCESS on success, NRT_INVALID for invalid parameters, NRT_RESOURCE for memory allocation failure.\n\n**Source**: `nrt_profile.h:180 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_profile.h#L180>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_status.rst",
    "content": ".. _api_nrt_status_h:\n\nnrt_status.h\n============\n\nNeuron Runtime status codes and error handling.\n\n**Source**: `src/libnrt/include/nrt/nrt_status.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_status.h>`_\n\nEnumerations\n------------\n\nNRT_STATUS\n^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef enum {\n       NRT_SUCCESS = 0,\n       NRT_FAILURE = 1,\n       NRT_INVALID = 2,\n       NRT_INVALID_HANDLE = 3,\n       NRT_RESOURCE = 4,\n       NRT_TIMEOUT = 5,\n       NRT_HW_ERROR = 6,\n       NRT_QUEUE_FULL = 7,\n       NRT_LOAD_NOT_ENOUGH_NC = 9,\n       NRT_UNSUPPORTED_NEFF_VERSION = 10,\n       NRT_FAIL_HOST_MEM_ALLOC = 11,\n       NRT_UNINITIALIZED = 13,\n       NRT_CLOSED = 14,\n       NRT_QUEUE_EMPTY = 15,\n       NRT_EXEC_UNIT_UNRECOVERABLE = 101,\n       NRT_EXEC_BAD_INPUT = 1002,\n       NRT_EXEC_COMPLETED_WITH_NUM_ERR = 1003,\n       NRT_EXEC_COMPLETED_WITH_ERR = 1004,\n       NRT_EXEC_NC_BUSY = 1005,\n       NRT_EXEC_OOB = 1006,\n       NRT_COLL_PENDING = 1100,\n       NRT_EXEC_HW_ERR_COLLECTIVES = 1200,\n       NRT_EXEC_HW_ERR_HBM_UE = 1201,\n       NRT_EXEC_HW_ERR_NC_UE = 1202,\n       NRT_EXEC_HW_ERR_DMA_ABORT = 1203,\n       NRT_EXEC_SW_NQ_OVERFLOW = 1204,\n       NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE = 1205,\n       NRT_NETWORK_PROXY_FAILURE = 1206,\n   } NRT_STATUS;\n\nStatus codes returned by NRT API functions.\n\n**Status Codes:**\n\n* ``NRT_SUCCESS`` - Operation completed successfully\n* ``NRT_FAILURE`` - Non-specific failure\n* ``NRT_INVALID`` - Invalid input (e.g., invalid NEFF, bad instruction, input tensor name/size mismatch)\n* ``NRT_INVALID_HANDLE`` - Invalid handle passed\n* ``NRT_RESOURCE`` - Failed to allocate a resource for requested operation\n* ``NRT_TIMEOUT`` - Operation timed out\n* ``NRT_HW_ERROR`` - Hardware failure\n* ``NRT_QUEUE_FULL`` - Not enough space in the execution input queue\n* ``NRT_LOAD_NOT_ENOUGH_NC`` - Failed to allocate enough NCs for loading a NEFF\n* ``NRT_UNSUPPORTED_NEFF_VERSION`` - Unsupported version of NEFF\n* ``NRT_UNINITIALIZED`` - NRT API called before nrt_init()\n* ``NRT_CLOSED`` - NRT API called after nrt_close()\n* ``NRT_QUEUE_EMPTY`` - Accessed a queue with no data\n* ``NRT_EXEC_UNIT_UNRECOVERABLE`` - Encountered fatal error, Execution Unit cannot recover\n* ``NRT_EXEC_BAD_INPUT`` - Invalid input submitted to exec()\n* ``NRT_EXEC_COMPLETED_WITH_NUM_ERR`` - Execution completed with numerical errors (produced NaN)\n* ``NRT_EXEC_COMPLETED_WITH_ERR`` - Execution completed with other errors\n* ``NRT_EXEC_NC_BUSY`` - Neuron core is locked (in use) by another model/process\n* ``NRT_EXEC_OOB`` - One or more indirect memcopies and/or embedding updates are out of bound\n* ``NRT_COLL_PENDING`` - Collective operation is still pending\n* ``NRT_EXEC_HW_ERR_COLLECTIVES`` - Stuck in collectives op (missing notification(s))\n* ``NRT_EXEC_HW_ERR_HBM_UE`` - HBM encountered an unrepairable uncorrectable error\n* ``NRT_EXEC_HW_ERR_NC_UE`` - On-chip memory of Neuron Core encountered a parity error\n* ``NRT_EXEC_HW_ERR_DMA_ABORT`` - DMA engine encountered an unrecoverable error\n* ``NRT_EXEC_SW_NQ_OVERFLOW`` - Software notification queue overflow\n* ``NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE`` - HBM encountered a repairable uncorrectable error\n* ``NRT_NETWORK_PROXY_FAILURE`` - EFA network proxy operation failed\n\n**Source**: `nrt_status.h:13 
<https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_status.h#L13>`_\n\nFunctions\n---------\n\nnrt_get_status_as_str\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   const char *nrt_get_status_as_str(NRT_STATUS status);\n\nGet string representation of a status code.\n\n**Parameters:**\n\n* ``status`` [in] - Status code to convert to string.\n\n**Returns:** String representation of the status code.\n\n**Source**: `nrt_status.h:58 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_status.h#L58>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_sys_trace.rst",
    "content": ".. _api_nrt_sys_trace_h:\n\nnrt_sys_trace.h\n===============\n\nNeuron Runtime System Trace API - Capture and fetch system trace events from Neuron devices.\n\n**Source**: `src/libnrt/include/nrt/nrt_sys_trace.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h>`_\n\nFunctions\n---------\n\nSystem Trace Capture\n^^^^^^^^^^^^^^^^^^^^\n\nnrt_sys_trace_config_allocate\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   NRT_STATUS nrt_sys_trace_config_allocate(nrt_sys_trace_config_t **options);\n\nAllocate memory for the options structure which is needed to start profiling using nrt_sys_trace_start.\n\n**Parameters:**\n\n* ``options`` [in] - pointer to a pointer to options nrt_sys_trace_config struct\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_sys_trace.h:29 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L29>`_\n\nnrt_sys_trace_config_set_max_events_per_nc\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   void nrt_sys_trace_config_set_max_events_per_nc(nrt_sys_trace_config_t *options, uint64_t max_events_per_nc);\n\nSets max number of events that can be stored across all ring buffers.\n\n**Parameters:**\n\n* ``options`` [in,out] - Pointer to the options structure.\n* ``max_events_per_nc`` [in] - Max number of events that can be stored in each ring buffer.\n\n**Source**: `nrt_sys_trace.h:50 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L50>`_\n\nnrt_sys_trace_config_set_capture_enabled_for_nc\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   void nrt_sys_trace_config_set_capture_enabled_for_nc(nrt_sys_trace_config_t *options, uint32_t nc_idx, bool enabled);\n\nSets system trace capture enabled for a specific NeuronCore. Ring buffers won't be allocated for disabled NeuronCores.\n\n**Parameters:**\n\n* ``options`` [in,out] - Pointer to the options structure.\n* ``nc_idx`` [in] - NeuronCore index.\n* ``enabled`` [in] - Capture enabled flag.\n\n**Source**: `nrt_sys_trace.h:60 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L60>`_\n\nnrt_sys_trace_get_event_types\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   NRT_STATUS nrt_sys_trace_get_event_types(const char ***event_types, size_t *count);\n\nReturns an allocated array of all valid event type strings.\n\n**Parameters:**\n\n* ``event_types`` [out] - Pointer to array of const char* (allocated).\n* ``count`` [out] - Number of event types.\n\n**Returns:** NRT_SUCCESS on success, error code otherwise.\n\n**Note:** The user is responsible for freeing the array and each string, or can use nrt_sys_trace_free_event_types() for convenience.\n\n**Source**: `nrt_sys_trace.h:79 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L79>`_\n\nnrt_sys_trace_start\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: c\n\n   NRT_STATUS nrt_sys_trace_start(nrt_sys_trace_config_t *options);\n\nInitialization for system trace capture including allocating memory for event ring buffers.\n\n**Parameters:**\n\n* ``options`` [in] - Configuration options for system trace capture\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_sys_trace.h:106 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L106>`_\n\nnrt_sys_trace_stop\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   NRT_STATUS nrt_sys_trace_stop();\n\nTeardown for system trace capture including freeing allocated memory for event ring buffers.\n\n**Returns:** NRT_SUCCESS on success\n\n**Source**: `nrt_sys_trace.h:109 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L109>`_\n\nSystem Trace Fetch\n^^^^^^^^^^^^^^^^^^\n\nnrt_sys_trace_fetch_events\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   NRT_STATUS nrt_sys_trace_fetch_events(char **buffer, size_t *written_size, const nrt_sys_trace_fetch_options_t *options);\n\nFetches system trace events from process memory and returns them as a JSON-formatted string. Once events are fetched, they cannot be fetched again.\n\n**Parameters:**\n\n* ``buffer`` [out] - On successful return, will point to a dynamically allocated, null-terminated JSON string containing the trace events. The caller must free the allocated memory by calling nrt_sys_trace_buffer_free(buffer).\n* ``written_size`` [out] - A pointer to a size_t variable that will be set to the number of bytes written into the allocated buffer.\n* ``options`` [in] - Pointer to options such as max number of events to fetch.\n\n**Returns:** NRT_SUCCESS on success.\n\n**Source**: `nrt_sys_trace.h:143 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L143>`_\n\nnrt_sys_trace_buffer_free\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: c\n\n   void nrt_sys_trace_buffer_free(char *buffer);\n\nFree the buffer allocated by nrt_sys_trace_fetch_events. Should be called after the events are no longer needed.\n\n**Parameters:**\n\n* ``buffer`` [in] - Pointer to buffer to be freed.\n\n**Source**: `nrt_sys_trace.h:151 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_sys_trace.h#L151>`_\n"
  },
  {
    "path": "neuron-runtime/api/nrt_version.rst",
    "content": ".. _api_nrt_version_h:\n\nnrt_version.h\n=============\n\nNeuron Runtime version information API.\n\n**Source**: `src/libnrt/include/nrt/nrt_version.h <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_version.h>`_\n\nConstants\n---------\n\nRT_VERSION_DETAIL_LEN\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define RT_VERSION_DETAIL_LEN 128\n\nMaximum length for version detail string.\n\n**Source**: `nrt_version.h:12 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_version.h#L12>`_\n\nGIT_HASH_LEN\n^^^^^^^^^^^^\n\n.. code-block:: c\n\n   #define GIT_HASH_LEN 64\n\nMaximum length for git hash string.\n\n**Source**: `nrt_version.h:13 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_version.h#L13>`_\n\nStructures\n----------\n\nnrt_version_t\n^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   typedef struct nrt_version {\n       uint64_t rt_major;\n       uint64_t rt_minor;\n       uint64_t rt_patch;\n       uint64_t rt_maintenance;\n       char rt_detail[RT_VERSION_DETAIL_LEN];\n       char git_hash[GIT_HASH_LEN];\n   } nrt_version_t;\n\nNRT version information structure.\n\n**Source**: `nrt_version.h:15 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_version.h#L15>`_\n\nFunctions\n---------\n\nnrt_get_version\n^^^^^^^^^^^^^^^\n\n.. code-block:: c\n\n   NRT_STATUS nrt_get_version(nrt_version_t *ver, size_t size);\n\nGet the NRT library version.\n\n**Parameters:**\n\n* ``ver`` [out] - Pointer to nrt version struct\n* ``size`` [in] - Length of the data needed to be filled in the nrt_version_struct\n\n**Returns:** NRT_STATUS_SUCCESS on success.\n\n**Source**: `nrt_version.h:28 <https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_version.h#L28>`_\n"
  },
  {
    "path": "neuron-runtime/configuration-guide.rst",
    "content": "Configuration Guide\n===================\n\n.. toctree::\n    :maxdepth: 1\n    \n    Runtime Configuration </neuron-runtime/nrt-configurable-parameters>"
  },
  {
    "path": "neuron-runtime/explore/compute-comm-overlap.rst",
    "content": ".. _neuron-runtime-explore-compute-comm:\n\n.. meta::\n   :description: How AWS Neuron's architecture enables compute-communication overlap to improve performance in distributed training workloads.\n   :keywords: AWS Neuron, collective communocation, compute-communication overlap, distributed training, FSDP, TP, Neuron Runtime, Neuron Compiler\n\n=========================================\nCompute-Communication Overlap in Neuron\n=========================================\n\nThis topic explains how AWS Neuron's architecture enables compute-communication overlap to improve performance in distributed training workloads. Users will learn about the asynchronous execution model where dedicated collective communication cores operate independently from computation engines, the challenges of resource contention between DMA engines, and optimization techniques including Token Threading for FSDP and static DMA priority adjustment. The content covers practical implementation strategies for overlapping FSDP operations with computational tasks in adjacent network layers, helping developers maximize throughput in tensor parallelism and fully-sharded data parallelism scenarios.\n\nBackground\n----------\n\nCollective communication (CC) operations on the AWS Trainium System-on-Chip (SoC) architecture are executed autonomously from computation engines using dedicated CC cores. Computation engines on each Neuron core do not execute explicit communication instructions. Instead, they asynchronously initiate the CC core and later retrieve completion signals once CC operations finish. The Neuron compiler implements this mechanism by generating pseudo-instructions (PseudoTriggerCollective2 or PTC2) for each CC operation in the engine binaries of the Neuron Executable File Format (NEFF).\n\nWhen a NEFF is loaded, the Neuron Runtime translates these pseudo-instructions into Write instructions to trigger the CC core during execution. At the same time, the runtime loads the collective communication program for the control path and pre-constructed DMA rings that establish the data path for CC operations. During runtime execution, whenever a Neuron core triggers a CC core, the next scheduled operation advances through the configured DMA rings, enabling inter-core data transfer using a semaphore-based synchronization protocol among CC cores within the processing cluster.\n\nThis asynchronous execution paradigm enables intrinsic overlapping of computation and communication processes, which enhances throughput in scenarios where computation can proceed independently from communication results. This architectural advantage is especially pronounced in computation-intensive applications such as neural network training.\n\nDespite these performance benefits, resource contention is a significant consideration. DMA engines are shared resources between computation and communication subsystems. This contention can cause throughput degradation for compute operations due to delayed DMA transactions between High Bandwidth Memory (HBM) and Scratchpad Buffer (SBUF), affecting both input tensor loading and output tensor spill-out for computation engines. Communication operations may also experience performance degradation due to time-sharing of DMA engine resources. 
Implementing optimal DMA prioritization strategies is critical for maximizing system performance in real-world conditions.\n\nOverlap Between Compute and Communication\n-----------------------------------------\n\nThe Neuron compiler enables concurrent execution of operations across the Neuron core and CC core through a sophisticated instruction scheduling mechanism. The compiler backend maintains separate scheduling queues for computation engines and communication streams, allowing independent instruction scheduling except where explicit dependencies exist. In theory, this design should enable optimal overlapping of compute and communication operations without manual intervention, similar to scheduling computational instructions across multiple computation engines. However, empirical analysis reveals suboptimal overlapping patterns in some scenarios.\n\nFor example, in dense Large Language Model (LLM) training that uses Tensor Parallelism (TP), Fully-Sharded Data Parallelism (FSDP), and Sequence Parallelism (SP), each network layer exhibits characteristic communication requirements:\n\n- **TP AllGather**: Precedes matrix multiplication to consolidate sharded activations.\n- **TP ReduceScatter**: Aggregates and re-shards the outputs.\n- **FSDP AllGather**: Required before each layer execution to gather sharded model parameters.\n- **FSDP ReduceScatter**: Needed during the backward pass for gradient accumulation.\n\nCurrent compiler heuristics schedule FSDP AllGather operations collectively at the earliest possible execution point, as these operations depend only on subsequent computational operations within their respective layers. However, this strategy creates resource contention with critical TP communication operations, resulting in decreased end-to-end performance—even when Multi-stream CC capability is available for concurrent execution. A more efficient approach would proactively perform FSDP AllGather for a given layer during the execution of the preceding layer.\n\nSimilarly, FSDP ReduceScatter operations are typically scheduled at the end of the backward pass, just before optimizer execution, due to compiler memory optimization strategies. An alternative scheduling approach—placing each FSDP ReduceScatter operation within the subsequent backward layer—would enable better computational overlap and eliminate idle periods at the end of the backward pass.\n\nToken Threading for FSDP\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo achieve optimal overlapping of CC operations, a novel dependency control mechanism called **Token Threading for FSDP** has been implemented. This experimental feature can be activated with environment variables:\n\n- For JAX frameworks: ``NEURON_FSDP=1``\n- For NeuronX Distributed (NxD): ``NEURON_NXD_FSDP_CC_MULTISTREAM=1``\n\nThis mechanism uses a specialized Neuron PJRT compiler pass to identify operation patterns spanning TP and FSDP dimensions. It enforces precise execution ordering between CC operations by establishing synthetic data dependencies using a daisy-chain configuration of token tensors. Each token is a single-element tensor serving as a synchronization mechanism.\n\nThe resulting High Level Optimizer (HLO) instruction sequence demonstrates the dependency chain:\n\n.. 
code-block:: none\n   \n   constant.45 = bf16[] constant(0)\n   all-gather.26 = (bf16[4096,8192]{2,1,0}, bf16[]) all-gather(param, constant.45), ...\n   ...\n   get-tuple-element.6 = bf16[] get-tuple-element(all-gather.26), index=1,...\n   all-gather.25 = (bf16[896,8192]{1,0}, bf16[]) all-gather(param.2, get-tuple-element.6), ...\n   ...\n   get-tuple-element.2 = bf16[896,8192]{1,0} get-tuple-element(all-gather.25), index=0, ... \n   dot.9 = bf16[4096,8192]{1,0} dot(maximum.14, get-tuple-element.2),...\n   ...\n   get-tuple-element.7 = bf16[] get-tuple-element(all-gather.25), index=1, ...\n   reduce-scatter.8 = (bf16[128,8192]{1,0}, bf16[]) reduce-scatter(dot.9, get-tuple-element.7), ...\n\nA token is extracted from the preceding CC operation and incorporated into the input tuple of the next CC operation, creating an explicit data dependency that enforces deterministic ordering. The Neuron compiler preserves this ordering during instruction scheduling but eliminates the token tensors from the final execution plan.\n\nThis implementation enables effective overlapping of FSDP CC operations with computational operations in adjacent network layers. Performance analysis confirms that FSDP AllGather operations for Attention layers successfully overlap with computation in preceding Multi-Layer Perceptron (MLP) layers, specifically in the execution window between TP AllGather and ReduceScatter operations.\n\n.. figure:: /images/deep-dives/compiler/deep-dive-compute-comm1.png\n   :align: center\n   :width: 80%\n\n   Image that shows how FSDP-AG operations for Attention layers successfully overlap with computation in preceding MLP layers.\n\nAdjusting Static DMA Priority\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo address performance degradation caused by overlapping FSDP AllGather operations competing for DMA resources, a configurable static prioritization mechanism is provided through DMA packet size adjustment. DMA engines process descriptors from up to 16 DMA rings in HBM using a round-robin arbitration scheme. Arbitration transitions between rings only at packet boundaries. DMA rings with smaller packet sizes are more susceptible to resource starvation. Increasing packet size elevates processing priority.\n\n- The Neuron compiler generates PseudoDmaTrigger (PDMAT) instructions and descriptors in the NEFF.\n- The Neuron Runtime translates these into hardware WRITE operations and constructs hardware-compatible DMA rings.\n- The ``NEURON_RT_DBG_DMA_PACKETIZATION_SIZE`` environment variable controls packet size during DMA ring construction. The default is 4 KiB, the empirically determined minimum for DMA/HBM efficiency. This parameter only allows increasing packet size to elevate priority.\n- For PTC2 instructions, ``NEURON_RT_DBG_CC_DMA_PACKET_SIZE`` controls packet size, with a default and maximum of 64 KiB. This parameter only allows reducing packet size to lower priority and only affects memory copy components of CC operations.\n\nFor systems with both TP and FSDP, optimal performance is achieved by prioritizing PDMAT for computational operations over FSDP CC operations:\n\n.. 
code-block:: shell\n\n   NEURON_RT_DBG_DMA_PACKETIZATION_SIZE=65536\n   NEURON_RT_DBG_CC_DMA_PACKET_SIZE=4096\n\nAlthough ``NEURON_RT_DBG_CC_DMA_PACKET_SIZE`` also affects critical TP collective communication operations, empirical analysis shows operational efficiency remains unimpaired.\n\nThe architecture supports additional DMA instruction types for dynamic transaction handling (DmaMemcpy, DmaIndirect, DmaTranspose), using the Descriptor Generation Engine (DGE) to generate DMA descriptors dynamically. The ``NEURON_RT_DBG_DMA_PACKETIZATION_SIZE`` parameter does not affect these DGE-based instructions. Enhanced dynamic DMA prioritization is under development.\n\nOverlap Between Communications – Multi-stream CC\n------------------------------------------------\n\nOptimal system performance requires computation duration to be sufficient to fully mask communication latency. Partial communication masking can provide incremental benefits but may introduce secondary performance implications as seen in the figure below.\n\n.. figure:: /images/deep-dives/compiler/deep-dive-compute-comm2.png\n   :align: center\n   :width: 80%\n\n   Image that shows idle compute resources due to cross-compute communication latency.\n\nIn experimental configurations, FSDP AllGather operations gather weight parameters for Up, Gate, and Down projections in the next MLP layer. These operations are larger than those in the Attention layer, and the Attention layer's computation is shorter. Extended FSDP AllGather operations can delay TP ReduceScatter operations, which could otherwise start immediately. If TP ReduceScatter could execute concurrently with FSDP AllGather, subsequent computations (such as Up and Gate projections) could begin earlier.\n\nMulti-stream CC enables concurrent execution of communication operations using parallel communication resources. The hardware provides two CC cores per physical Neuron core. In TP×FSDP training, two physical Neuron cores are configured as a Logical Neuron Core (LNC2 mode), resulting in four CC cores per logical unit. Each CC core can manage a distinct communication stream, supporting up to four concurrent CC streams in LNC2 mode.\n\n.. figure:: /images/deep-dives/compiler/deep-dive-compute-comm3.png\n   :align: center\n   :width: 80%\n\n   Image that shows efficient use of compute when effective overlapping of communication operations are enabled.\n\n- With fewer streams than CC cores, each stream has exclusive access to a CC core, and surplus cores are allocated to stream 0.\n- Increased CC core allocation does not necessarily provide linear throughput gains. The benefit is greatest when communication operations use algorithms with multiple channels.\n- In reference implementations, optimal performance requires two streams: stream 0 for TP CC operations and stream 1 for FSDP CC operations.\n\nTo enable multi-stream CC in JAX, set these environment variables:\n\n.. code-block:: shell\n   \n   NEURON_FSDP=1\n   NEURON_FSDP_CC_MULTISTREAM=1\n\nFor NxD implementations, also set this environment variable:\n\n.. code-block:: shell\n   \n   NEURON_NXD_FSDP_CC_MULTISTREAM=1\n\nThe stream allocation mechanism is implemented in Neuron PJRT compilation passes, where CC stream identifiers (stream_id) are assigned to the ``frontend_attributes`` field of HLO instructions, using metadata tags from Token Threading for FSDP.\n\n.. 
code-block:: none\n   \n   reduce-scatter.8 =\n     (bf16[128,8192]{1,0}, bf16[]) reduce-scatter(dot.9, get-tuple-element.7), ...\n     frontend_attributes={collective_type=\"tp_reduce_scatter\",has_token=\"1\",stream_id=\"0\"}, ...\n\nThese configuration parameters are being incorporated into default settings in future releases, enabling automatic activation. More granular user-configurable options for stream allocation are also under development.\n\nAdjusting Static DMA Priority (per Stream)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDMA prioritization for TP CC operations is critical, as these operations directly block subsequent computation. They must not be delayed by concurrent FSDP CC weight prefetch operations. Since FSDP CC operations overlap with long computational sequences, they can be executed on a best-effort basis. The optimal DMA priority hierarchy is: TP CC ≥ PDMAT (compute) > FSDP CC.\n\nThe ``NEURON_RT_DBG_CC_DMA_PACKET_SIZE`` variable accepts comma-delimited values for individual adjustment of DMA packet sizes per communication stream:\n\n.. code-block:: shell\n\n   NEURON_RT_DBG_DMA_PACKETIZATION_SIZE=65536\n   NEURON_RT_DBG_CC_DMA_PACKET_SIZE=65536,4096 # 65536 for stream 0, 4096 for stream 1\n\nWeight Prefetch\n^^^^^^^^^^^^^^^\n\nTo overlap FSDP CC operations with computation from adjacent layers, FSDP AllGather operations are strategically relocated to preceding layers in both forward and backward passes. Similarly, FSDP ReduceScatter operations in the backward pass are relocated to subsequent layers. Large language models typically alternate Attention and MLP blocks. MLP layers have longer computation and larger weights, resulting in larger FSDP CC operations.\n\nIf all FSDP CC operations are shifted by one layer, Attention layers in the backward pass may be burdened with very large FSDP AllGather and ReduceScatter operations for adjacent MLP layers, exceeding their computational duration.\n\nTo balance communication and computation, additional configuration parameters enable precise control over the shifting distance for FSDP CC operations:\n\n.. code-block:: shell\n   \n   NEURON_FSDP_NUM_LAYER_EARLY_AG_SHIFT=1\n   NEURON_FSDP_NUM_LAYER_LATE_RS_SHIFT=2\n\nThese parameters enable differential shifting strategies for AllGather and ReduceScatter operations, optimizing the overlap pattern for each model architecture.\n\nWhat’s Next?\n------------\n\nDynamic DMA Prioritization\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFuture implementations will introduce a dedicated field in DMA instructions to specify priority class, enabling dynamic DMA prioritization at the instruction level, including DGE instructions. This will allow developers to assign priority designations in HLO instructions, with the Neuron compiler generating instructions with appropriate priority class based on user tags and compiler heuristics. Beyond packet size adjustment, this approach will provide additional mechanisms for regulating relative priority among competing instructions.\n\nFor critical CC operations, the DGE will implement dynamic resource reallocation, temporarily relinquishing DMA engines occupied by inflight CC operations. This is especially beneficial for latency-sensitive scenarios, such as inference token generation, where CC operations are critical and often contend with weight prefetching from HBM to SBUF. Since these critical operations typically involve small data transfers, packet size adjustment may not be sufficient. 
Complete isolation of DMA engines during these operations can yield substantial improvements in end-to-end performance, even if it reduces overall DGE throughput.\n\nTRN3 and later generations will include DMA engines with strict priority-based arbitration, processing descriptors from the highest-priority ring to completion before lower-priority transactions. This hardware advancement will expand the flexibility and effectiveness of DMA prioritization strategies.\n\nFine-grained CC\n^^^^^^^^^^^^^^^\n\nCurrently, TP CC operations cannot be effectively overlapped with computation due to strict data dependencies. Performance profiles show computational idle periods during TP collective communication operations. Two common patterns create these stalls:\n\n1. ``dot(all-gather(x), y)``: Matrix multiplication cannot proceed until AllGather consolidates sharded activations across the TP dimension.\n2. ``reduce-scatter(dot(x, y))``: Requires matrix multiplication to complete before reduction and redistribution.\n\nThese CC operations can be decomposed into more granular communication primitives—specifically, sequences of send/receive operations implemented with CollectivePermute operations. In the ``dot(all-gather(x), y)`` pattern, this allows partial matrix multiplication to begin with each received data segment while transmitting it to other ranks, rather than waiting for the full tensor. Similarly, ``reduce-scatter(dot(x, y))`` can be restructured for progressive reduction and communication of partial results during ongoing computation.\n\nThis fine-grained CC approach is based on research from Google and is under development for future versions of the Neuron SDK.\n\nRead More\n---------\n\n- `AWS Neuron SDK Documentation Home <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/>`_\n- `Neuron Distributed Training Guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/index.html>`_\n- `Neuron Runtime Documentation <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/index.html>`_\n"
  },
  {
    "path": "neuron-runtime/explore/core-dump-deep-dive.rst",
    "content": ".. meta::\n   :description: This topic explores Neuron runtime core dumps in depth, using the neuron-dump tool included in the AWS Neuron SDK.\n   :date-modified: 12-02-2025\n   \n.. _runtime-core-dump-deep-dive:   \n\nDeep Dive: Explore Neuron runtime core dumps\n=============================================\n\nThis topic explores Neuron runtime core dumps in depth and discusses the technical details of it from the perspective of an AWS Neuron expert. Some experience in AWS NeuronCore Architecture is required to understand it in full.\n\nWhat you should know before reading\n------------------------------------\n\n* :doc:`AWS NeuronCore Architecture </about-neuron/arch/neuron-hardware/neuroncores-arch>`\n* :doc:`Amazon EC2 AI Chips Architecture </about-neuron/arch/neuron-hardware/neuron-devices>`\n* :doc:`Generating a Neuron runtime core dump </neuron-runtime/about/core-dump>`\n\nOverview\n--------\n\nWhat are Neuron Runtime core dumps?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCore dumps are a snapshot of relevant runtime and hardware state to aid in debugging issues when deploying Neuron at scale.\n\nWhat problems do Neuron Runtime core dumps solve?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen deploying Neuron applications at scale, there can be infrequent and difficult to reproduce errors.\nCore dumps are a mechanism to capture relevant state about these errors to aid in debugging.\n\nWho are Neuron runtime core dumps for?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOrganizations who are scaling up Neuron applications and encountering sporadic issues in the fleet.\n\nWhen should Neuron Runtime core dumps be used?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDiagnoising correctness issues occuring infrequently in the fleet.\n\nHow are core dumps enabled?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nCore dumps are enabled by default if an executable script ``/opt/aws/neuron/bin/neuron-dump`` exists.\nThe package ``aws-neuronx-tools`` provides a default implementation of ``/opt/aws/neuron/bin/neuron-dump``.\nAlternatively, core dumps are enabled if users install a custom version of ``/opt/aws/neuron/bin/neuron-dump``.\n\nIf users want to disable this default behavior, core dumps are disabled by defining both ``NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY`` and ``NEURON_RT_S3_CORE_DUMP_PREFIX`` to an empty string::\n\n    export NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY=\"\"\n    export NEURON_RT_S3_CORE_DUMP_PREFIX=\"\"\n\nAlternatively, deleting ``/opt/aws/neuron/bin/neuron-dump`` also disables core dumps.\n\nWhat is the core dump generation flow?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUpon execution error:\n\n1. Neuron runtime produces state snapshots for each rank\n2. Neuron runtime invokes ``neuron-dump`` to capture instance hardware state\n3. ``neuron-dump`` captures environment and hardware state\n4. 
``neuron-dump`` can optionally be configured to upload core dump artifacts to S3\n\nWhat is included in a core dump?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- tail of runtime logs (default naming: ``nrt-<instance id>-pid-<pid>.log``)\n- dump of hardware state for every participating physical NeuronCore (default naming: ``<instance id>-nd<device id>-nc<core id>-pid-<pid>-tid-<tid>-lid-<log id>``)\n  - installed neuron packages\n  - snapshot of instruction buffers\n  - semaphore values\n  - DMA state\n- dump of CC core state for every participating CC core (naming: ``<instance id>-nd<device id>-cc-core-<cc core id>-pid-<pid>-tid-<tid>-lid-<log id>``)\n- tail of nrt error logs for every participating process (naming: ``<instance id>-nrt-pid-<pid>.log``)\n\nneuron-dump\n-------------\n\nWhat is neuron-dump?\n~~~~~~~~~~~~~~~~~~~~~\n\n``neuron-dump`` is the script responsible for capturing relevant hardware state for core dumps and uploading the core dump to Amazon S3.\n\nHow is neuron-dump distributed?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA default ``neuron-dump`` is distributed as part of the ``aws-neuronx-tools`` package.\n\nWhere is neuron-dump installed?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``/opt/aws/neuron/bin/neuron-dump``\n\nHow do users customize the information included in core dumps?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo add or remove information from core dumps, you must install a custom ``neuron-dump`` at ``/opt/aws/neuron/bin/neuron-dump``. If you choose to install a custom ``neuron-dump`` as part of an automated script, you must install it after you install ``aws-neuronx-tools``.\n\nWhat input interface does Neuron runtime provide to neuron-dump?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron runtime provides input to ``neuron-dump`` as CLI flag-value pairs.\nThe following CLI flags are provided to ``neuron-dump``::\n\n    --neff-name: Name from the neff header\n    --neff-uuid: UUID from the neff header\n\n    --date-time: datetime formatted as `yyyy-mm-dd-HH-MM`. This datetime represents the epoch of the initial barrier when running a collectives execution, or it falls back to the epoch from the local process if collectives context is not available.\n    --pid: The process id\n    --tid: The thread id\n    --log-id: A process-unique id. Each execution of neuron-dump is given a unique log id within that runtime process. Not guaranteed to be unique across processes.\n\n    --instance-id: The instance id\n    --cluster-id: The unique identifier for a single collectives execution. `0000000000000000` if collectives information is not available.\n\n    --error-location: The libnrt API where the error occurred.\n    --error-code: The libnrt API return code: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-api-guide.html#api-return-codes.\n\n    --local-output-dir: The directory specified by `NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY` with format variables replaced.\n    --s3-output-prefix: The prefix specified by `NEURON_RT_S3_CORE_DUMP_PREFIX` with format variables replaced. Only included if `NEURON_RT_S3_CORE_DUMP_PREFIX` is set.\n\nConfiguring Neuron Runtime core dumps\n---------------------------------------\n\nWhere are core dumps located locally on the instance?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Neuron Runtime exposes the environment variable ``NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY`` to configure the local root directory of core dumps. 
The default value is ``/tmp/neuron-core-dump/dt-%d-cid-%c``.\n\nWhere are core dumps uploaded to in Amazon S3?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Neuron Runtime exposes the environment variable ``NEURON_RT_S3_CORE_DUMP_PREFIX`` to configure the root directory that core dumps are uploaded to in an S3 bucket. Neuron Runtime does not perform the upload to S3 itself. The formatted directory is provided as an argument to ``neuron-dump``, which can be configured by the user to upload the core dump to S3.\n\nWhat format variables are supported for core dump paths?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe configuration environment variables ``NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY`` and ``NEURON_RT_S3_CORE_DUMP_PREFIX`` support format variable substitution. Neuron Runtime substitutes these variables with information from the runtime process. The formatted directories are then passed along to ``neuron-dump``::\n\n    %d: datetime\n    %c: cluster id\n    %p: the process id\n    %t: the thread id\n    %l: the log id\n    %i: the instance id\n\nHow do users ensure the root path for core dumps is unique?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIncluding the format variables ``%d`` (datetime) and ``%c`` (cluster id) in the path ensures uniqueness.\nThese values are also agreed upon by all participating ranks in a collectives execution, so with these set, all ranks produce their core dumps in the same directory::\n\n    export NEURON_RT_LOCAL_CORE_DUMP_DIRECTORY=\"/your/base/path/%d-%c\"\n    export NEURON_RT_S3_CORE_DUMP_PREFIX=\"s3://your/s3/bucket/%d-%c\"\n"
  },
  {
    "path": "neuron-runtime/explore/device-memory.rst",
    "content": ".. meta::\n   :description: Learn how to understand, monitor, and optimize memory usage on AWS Neuron devices such as Trainium and Inferentia ML chips. \n   :date-modified: 10/16/2025\n\n.. _neuron-device-memory-deep-dive:\n\nNeuron Device Memory\n====================\n\nLearn how to understand, monitor, and optimize memory usage on AWS Neuron devices. This topic covers memory categories including tensors, model constants, scratchpad allocations, DMA rings, and profiling buffers. Discover debugging tools like neuron-top and neuron-monitor, troubleshoot out-of-memory (OOM) errors, and implement strategies to reduce memory consumption for efficient ML workload execution on Inferentia and Trainium instances.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nThe Neuron Runtime's memory usage falls into the following categories:\n\n- ``tensors``: input and output tensors allocated by application\n- ``model constants``: compiled constants used by a NEFF program\n- ``model code``: the executable instructions for the Neuron Core. This also includes a micro-code overhead of 96MB per physical Neuron Core (this overhead is subject to future improvements)\n- ``profile buffers``: buffers used to store profling events\n- ``scratchpad`` and ``shared scratchpad``: additional space used to store intermediary SBUF and other computations. Read :ref:`nd-scratchpad` for details.\n- ``dma rings``: Data transfer instructions describing data movements during NEFF execution, used during NEFF execution.\n- ``collectives``: Memory overhead used to orchestrate collective communication\n\nHere's what users can do to adjust these forms of memory usage:\n\n1. ``model constants`` and ``tensors`` are entirely controlled by the user. Adjust similar to other XLA devices with matrix dimensions, batch sizes, etc.\n2. ``scratchpad`` and ``shared scratchpad`` depend on model size, model type and tiling strategy. Read the :ref:`nd-scratchpad`.\n3. ``dma rings`` usage is not easily actionable. It can be reduced by using DGE where possible, or changing the model to reduce data movements (like transfers between HBM and SBUF).\n4. ``profile buffers`` are allocated when the user enables profiling. Users can influence these allocations by either disabling profiling or manually adjusting. Read the :ref:`nd-profile-buffers` section.\n5. ``model code`` usages are not actionable. If users observe significant usage, contact your AWS Neuron support.\n\n\n\nLogical Neuron Cores\n~~~~~~~~~~~~~~~~~~~~~\n\nStarting with ``trn2``, we introduced the concept of Logical Neuron Cores, where multiple physical Neuron Cores are grouped into the same \"Neuron Core\". Read :doc:`this article </about-neuron/arch/neuron-features/logical-neuroncore-config>` for more details.\n\n.. note::\n   On ``trn2``, the default configuration is LNC2, but when using LNC1 (``NEURON_LOGICAL_NC_CONFIG=1``), two neighboring Neuron Cores will end up **SHARING a HBM**. See the following diagram, where two vertically neighboring NeuronCore-V3s share a HBM.\n\n   .. image:: /images/architecture/Trainium2/trainium2.png\n\n   As a result, there will be **noisy neighbor problems**, and you may see out-of-memory (OOM) errors earlier than expected depending on what is loaded on the neighboring core.\n\nDebugging Tools\n~~~~~~~~~~~~~~~\n\nneuron-top\n^^^^^^^^^^\n\nRunning ``neuron-top`` will give you a view of the current memory usages on a core level. 
Read :doc:`this article </tools/neuron-sys-tools/neuron-top-user-guide>` for more details.\n\n\nsysfs\n^^^^^\n\nAs an alternative, you can find the same information from the sysfs. Read :doc:`this article </tools/neuron-sys-tools/neuron-sysfs-user-guide>` for more details.\n\nOut-of-memory (OOM) Errors\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen an OOM occurs, the Neuron Runtime dumps a detailed breakdown of the various memory usage types for each NEFF. For example:\n\n.. code-block:: text\n\n   2025-May-15 20:58:33.895937 224822:224822 ERROR  TDRV:print_lnc_hbm_details                   LNC size is 1. Neuron Cores using this HBM: NC 4 and NC 5\n   2025-May-15 20:58:33.897479 224822:224822 ERROR  TDRV:log_dev_mem                             Failed to allocate 4.000GB (alignment: none, usage: tensors) on ND 0:NC 4\n   2025-May-15 20:58:33.899416 224822:224822 ERROR  TDRV:log_dev_mem_usage_table                 Displaying Current Memory Utilization:\n   (NOTE: the lines are LONG, and NEFF id to name mapping is printed after)\n\n                 |          |  Model   |  Model   |          |  Shared  |          |          |DMA Rings |DMA Rings | DMA Rings |DMA Rings |           |          | Profiler |\n                 |  TOTAL   |   Code   |Constants | Tensors  |Scratchpad|Scratchpad| Runtime  |    IO    |  Spill   |Collectives| Runtime  |Collectives|  XT CC   | Buffers  |\n   ND 0 Overall  | 20.188GB |192.102MB | 82.344KB | 20.000GB |  0.000B  |  0.000B  |350.125KB |179.000KB | 64.000KB |  0.000B   | 68.000KB |  0.000B   |  0.000B  |  0.000B  |\n   \\_NC 4        | 20.094GB | 96.065MB | 58.344KB | 20.000GB |  0.000B  |  0.000B  |229.062KB |118.000KB | 48.000KB |  0.000B   | 36.000KB |  0.000B   |  0.000B  |  0.000B  |\n     \\_NEFF 1001 |263.906KB | 28.562KB | 34.344KB |   n/a    |   n/a    |  0.000B  |108.000KB | 57.000KB | 32.000KB |  0.000B   | 4.000KB  |  0.000B   |   n/a    |   n/a    |\n     \\_NEFF 1002 |244.875KB | 31.875KB | 24.000KB |   n/a    |   n/a    |  0.000B  |108.000KB | 61.000KB | 16.000KB |  0.000B   | 4.000KB  |  0.000B   |   n/a    |   n/a    |\n   \\_NC 5        | 96.285MB | 96.037MB | 24.000KB |  0.000B  |  0.000B  |  0.000B  |121.062KB | 61.000KB | 16.000KB |  0.000B   | 32.000KB |  0.000B   |  0.000B  |  0.000B  |\n     \\_NEFF 1003 |244.875KB | 31.875KB | 24.000KB |   n/a    |   n/a    |  0.000B  |108.000KB | 61.000KB | 16.000KB |  0.000B   | 4.000KB  |  0.000B   |   n/a    |   n/a    |\n\n   NEFF id to name mapping:\n   1001: \"1.0.41235.0+df4a714bb-/local/out-test0_meta_dense\"\n   1002: \"1.0.41235.0+df4a714bb-/local/out-test0_meta_concat3\"\n   1003: \"1.0.41235.0+df4a714bb-/local/out-test0_meta_concat3\"\n\nIn case this OOM message is truncated, this information is also available under ``/tmp/neuron_mem_table_device_<device_id>_hbm_<hbm_idx>.log``.\n\nPer-NEFF INFO logs\n^^^^^^^^^^^^^^^^^^\n\nThe memory usage of a NEFF is also available as ``INFO`` level logs during model load. By using ``NEURON_RT_LOG_LEVEL_TDRV=info``, you'll see a log like:\n\n.. 
code-block:: text\n\n   2025-May-15 07:41:15.014997 2198754:2198754  INFO  TDRV:dml_log_dev_neff_mem\n   [ND 0:NC 0] Current Usage Total: 96.543MB\n           shared scratchpad: 0.000B\n   Per NEFF memory usage breakdown for [out-test0_meta_concat3]:\n           Total: 230.562KB\n           * model code: 30.562KB\n           * model constants: 24.000KB\n           * scratchpad: 0.000B\n           * runtime: 95.000KB\n           * dma rings io: 61.000KB\n           * dma rings spill: 16.000KB\n           * dma rings collectives: 0.000B\n           * dma rings runtime: 4.000KB\n           * collectives: 0.000B\n\n\n.. _nd-profile-buffers:\n\nProfile Buffers\n---------------\n\nWhen used with NRT's profiling APIs and ``neuron-profiler capture``, Runtime allocates buffers in order to store the profiling events. These profiling buffers by default are about 64 or 128 MB each, so expect around 2 GB overhead. (*subject to future changes*)\n\nThese profiler buffer sizes can be manually adjusted by setting flags ``NEURON_RT_PROFILE_BUF_<buffer type>_MB``. For example, ``NEURON_RT_PROFILE_BUF_DMA_MB=512``. Here's a list of the different buffers one can attempt adjusting: ``EVENT``, ``DMA``, ``THROTTLE``, ``CC_CORE_INSTRUCTION``, ``CC_CORE_EVENT``.\n\n.. note::\n   Adjusting the buffer sizes manually is NOT recommended, since buffers too small will cause profiler to lose events. **Prioritize profiling one NEFF at a time, and only consider when profiling a single NEFF still OOMs.**\n\nAnother option for reducing memory usage further when profiling is to use the ``--single-io``. This option will reduce the memory used by IO tensors by creating an IO tensor the size of the largest IO tensor in the model. Other IO tensors will point to slices of this tensor during execution. The output will no longer be correct but the profile will still realistically capture performance. Note that the ``--single-io`` option is only available to ``neuron-profile``.\n\n.. code-block:: bash\n\n   neuron-profile capture -n file.neff --single-io\n\n**NOTE**: only device profiles require extra device memory. System profiles do not. If you are only interested in a high-level view of performance kernel execution latency and time spent in Neuron runtime APIs, consider capturing a system profile with the ``nrt_sys_trace_fetch_events`` or ``NEURON_RT_INSPECT_ENABLE`` APIs.\n\n.. _nd-scratchpad:\n\nScratchpad\n----------\n\nAside from inputs and outputs, a NEFF execution requires additional space on HBM for temporary spills out of the state buffer (the cache). This is necessary because the working set of a program can be arbitrarily large, and may not fit in the state buffer. We call this space **scratchpad**.\n\nScratchpad size requirement for a NEFF is specified entirely by the compiler. Scratchpad size depends on kernel size, kernel type and tiling strategy. For example, for a training workload, scratchpad usage is usually determined by the size of activation between forward and backward layer. For an inference kernel, scratchpad usage is usually determined by the size of hidden states. Additionally, optimal tiling and fusion of collective and/or compute operations can reduce scratchpad usage significantly.\n\n``def.json`` within a NEFF contains information about how much scratchpad space is required for the NEFF. Scratchpad memory is allocated on the HBM, per NeuronCore. The memory is only used while a NEFF execution is running. 
Thus it makes sense to share this memory among all loaded NEFFs to reduce the overall memory footprint. Runtime allocates a **shared scratchpad** that is shared by all NEFFs loaded on a particular NeuronCore. The size of the **shared scratchpad** is equal to the size of the largest **scratchpad** among all the loaded NEFFs. In some cases a variable cannot be placed in the **shared scratchpad** and is instead placed in a **non-shared scratchpad** specific to a NEFF (see `Scratchpad variables`_ below).\n\nScratchpad variables\n~~~~~~~~~~~~~~~~~~~~\n\nThe scratchpad space is fully managed by the Compiler. A NEFF defines scratchpad variables and their **size** and **offset** within the scratchpad space. Runtime maps all these variables to the scratchpad space it allocates on the HBM. Some of the variables may overlap with others since not all variables are \"live\" at the same time during NEFF execution.\n\nRuntime iterates through all scratchpad variables in ``def.json`` and computes the ``MAX`` of ``offset + size`` over all of them. That is the size of the shared scratchpad space required by the NEFF.\n\nShared scratchpad\n~~~~~~~~~~~~~~~~~\n\nAs the name implies, the **shared scratchpad** is shared among all programs/NEFFs loaded on a particular NeuronCore. This is possible because only one NEFF executes at a time on a NeuronCore, and data cannot be passed from one NEFF to another through the scratchpad. That means the scratchpad dynamically grows/shrinks with NEFF loads/unloads. To achieve that, the **runtime allocates the shared scratchpad in chunks**, referred to as **scratchpad pages**.\n\nOnce a variable is placed in a scratchpad page, the variable's physical location cannot be changed, i.e. the variable cannot be moved to another page and the page itself cannot be moved. That is because during NEFF load the Runtime generates DMA descriptors that point to the variables' physical addresses, and the descriptors are generated only once during NEFF load. The number of pages can grow and shrink as NEFFs are loaded and unloaded, but the variables for the loaded NEFFs retain their physical locations. When a new NEFF is loaded, it might require larger **scratchpad** space than any of the currently loaded NEFFs. In that case new pages are allocated, but the pages are not necessarily contiguous with the previously allocated pages.\n\nBecause the pages are not contiguous in HBM, a scratchpad variable must fit entirely within a page in order to be placed in the shared scratchpad (``(var_offset % NEURON_SCRATCHPAD_PAGE_SIZE) + var_size <= NEURON_SCRATCHPAD_PAGE_SIZE``). The default scratchpad page size in Runtime is 512 MB; through environment variables described later in this document, it can be set to any multiple of 512 MB, up to a maximum of 3.5 GB.\n\nShared scratchpad pages are shown in the OOM reporting in Runtime as category **\"shared scratchpad\"** and in sysfs under:\n\n.. 
code-block:: text\n\n   /sys/devices/virtual/neuron_device/neuron<device_number>/neuron_core<nc_number>/stats/memory_usage/device_mem/model_shared_scratchpad/\n\nNon-shared/Private scratchpad allocations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf a variable cannot fit into a shared scratchpad page, Runtime makes a completely separate allocation for it.\n\nAs an example, let's say scratchpad page size is 512 MB, and we load the following two NEFFs:\n\nNEFF A has the following scratchpad variables (using a different format from ``def.json`` for brevity here):\n\n``a_var1: {offset: 0, size: 536870912 [512 MB]}, a_var2: {offset: 536870912 [512 MB], size: 1073741824 [1 GB]}``\n\nNEFF B has the following scratchpad variables:\n``b_var1: {offset: 0, size: 104857600 [100 MB]}, b_var2: {offset: 104857600 [100 MB], size: 1610612736 [1.5 GB]}``\n\n``a_var1`` and ``b_var1`` both satisfy the condition to fit within the 512 MB shared scratchpad page. Since they are both at same offset, they will end up sharing the same shared scratchpad page.\n\nBut ``a_var2`` and ``b_var2`` are both bigger than 512 MB, Runtime will make separate allocations for them. So there will be 1 GB of private allocation for NEFF A and another 1.5 GB of private allocation for NEFF B.\n\nIn this example we would have 2.5 GB of non-shared scratchpad allocations on the HBM. These would show up as category **\"scratchpad\"** in the OOM reporting in Runtime, and in sysfs under: \n\n.. code-block:: text\n\n   /sys/devices/virtual/neuron_device/neuron<device_number>/neuron_core<nc_number>/stats/memory_usage/device_mem/model_shared_scratchpad/\n\nOne thing to note in this case is that Runtime will still calculate the required amount of shared scratchpad and allocate it. It comes to 1.5 GB for NEFF A and 1.6 GB for NEFF B - so the maximum among the NEFFs is 1.6 GB; and rounded up to scratchpad page size, it comes to 2 GB. Thus, Runtime will allocate 2 GB of shared scratchpad (or 4 pages), and 2.5 GB of non-shared scratchpad allocations in this case, even though it only ends up using 1 page of the shared scratchpad.\n\nIf the page size is set to 2GB (by setting ``NEURON_SCRATCHPAD_PAGE_SIZE=2048`` - see environment variables described later in this doc), all variables would fit within the shared scratchpad page. After loading both NEFFs only a single shared 2 GB page will be allocated, with zero HBM consumed by the non-shared scratchpad. Thus, choosing the right scratchpad page size can reduce HBM allocations by a significant amount.\n\nHow to avoid high non-shared scratchpad usage\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf the OOM report has a high amount of non-shared scratchpad usage (i.e. high ``scratchpad`` category usage, but not ``shared scratchpad`` category), it typically means that the scratchpad variables are larger than the default Runtime scratchpad page size.\n\nExamples of non-shared scratchpad usage in OOM report:\n\n.. code-block:: text\n\n   Overall HBM usage\n       * total: 23.577GB\n       * ...\n       * shared scratchpad: 9.000GB\n       * scratchpad: 8.149GB   <--- non-shared scratchpad allocations\n       * ...\n\nOr, with recent changes to OOM reporting:\n\n.. 
code-block:: text\n\n                                                                   non-shared scratchpad allocations\n                                                                             |\n                                                                             v\n                 |          |  Model   |  Model   |          |  Shared  |          |          |DMA Rings | ...\n                 |  TOTAL   |   Code   |Constants | Tensors  |Scratchpad|Scratchpad| Runtime  |    IO    | ...\n   ND 0 HBM 0    | 23.577GB |932.370MB | 1.438MB  | 5.359GB  |  9.000GB |  8.149GB |203.062KB |118.000KB | ...\n   ...\n\nYou can try experimenting with larger scratchpad page sizes through the following environment variables for Compiler and Runtime respectively:\n\n.. code-block:: bash\n\n   export NEURON_CC_FLAGS=' <other flags if required> --hbm-scratchpad-page-size=<size in MB> ' # Env var for Neuron Compiler\n   export NEURON_SCRATCHPAD_PAGE_SIZE=<size in MB>  # Env var for Neuron Runtime\n\nBoth these environment variables specify the scratchpad page size in MBs (megabytes)\n\nAs an example, setting scratchpad page size to 2 GB:\n\n.. code-block:: bash\n\n   export NEURON_CC_FLAGS=' --hbm-scratchpad-page-size=2048 '\n   export NEURON_SCRATCHPAD_PAGE_SIZE=2048\n\nNote that the env variable for Neuron Compiler needs to be set as well, otherwise it may set the offsets for the variables in an inefficient manner.\n\n**The size should be a multiple of 512 and less than 4096 (4 GB)**. Setting the scratchpad page size too low would lead to non-shared allocations, and setting it too high could also lead to memory wastage (as the last scratchpad page allocated may only be partially utilized). It is recommended to try values like 2048 (2 GB), 1536 (1.5 GB) and 1024 (1 GB) in case of OOM.\n\nAppendix: NEFF format for scratchpad variables\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf we unpack a NEFF (using ``neuron-packager``), and inspect ``sg00/def.json`` (and ``sg01/def.json`` in case of NEFFs generated for Trn2 LNC size 2 configuration), we will see variables entries like these:\n\n.. code-block:: json\n\n   \"var\": {\n           \"some_variable_name\": {\n               \"backing_variable_off\": 17108992,\n               \"ops\": [],\n               \"size\": 131072,\n               \"type\": \"virtual\",\n               \"var_id\": 2349\n           },\n           ...\n    }\n\n``type`` being \"virtual\" for a variable indicates that it is a scratchpad variable. The ``backing_variable_off`` field is the offset inside the shared scratchpad space allocated by Runtime, and the ``size`` field is the size of the variable.\n\nDMA Rings\n---------\n\n**DMA rings** are buffers used to store DMA **descriptors** (each descriptor describes a data movement that the DMA engines can execute).\n\nDGE generates the descriptors dynamically during NEFF execution, so, if a NEFF is using DGE for some DMA, then no allocation is needed on the HBM for those descriptors.\n\nFor any DMAs not using DGE, Runtime must allocate the DMA rings on HBM and build the DMA descriptors before execution. The details for building the descriptors for these DMAs in the NEFF is encoded in ``def.json`` and ``<engine>.json`` where ``<engine>`` is the TPB engine that will trigger the DMA operation.\n\nOverall, reducing DMA rings usage requires changes in the NEFF itself, with the most effective change being using DGE for DMAs where supported.\n\nIn OOM reports, DMA rings are further categorized as:\n\n1. 
IO - These descriptors have an I/O tensor as their source or destination\n2. Spill - These descriptors move data between any NEFF variables/tensors, excluding any I/O tensors\n3. Collectives - These descriptors move data for collectives operations between ranks on the same node\n4. Runtime - These descriptors do not correspond to any explicit DMAs in the NEFF but are needed to perform DMAs to support NEFF execution. Examples: loading DVE and activation tables, instruction fetch DMAs for TPB engines"
  },
  {
    "path": "neuron-runtime/explore/direct-hbm-tensor-alloc.rst",
    "content": ".. _direct-hbm-tensor-alloc:\n\n.. meta::\n   :description: Guide on Direct HBM Tensor Allocation with Neuron\n   :date_updated: 12/02/2024\n\nDirect HBM Tensor Allocation with Neuron\n========================================\n\nThis topic provides an overview and usage examples for directly allocating tensors into High Bandwidth Memory (HBM) on AWS Neuron devices using the Neuron Runtime with PyTorch.\n\nOverview\n---------\n\n* Device identifier: On Trainium/Inferentia instances, Neuron devices are identified in PyTorch through the names: ``privateuseone`` or ``neuron``. These names can be used interchangeably\n* Direct HBM allocation: Allows tensors to be allocated directly into High Bandwidth Memory (HBM) on Neuron devices  \n* Performance optimization: Eliminates memory transfer overhead between CPU and device memory\n\nBackground\n-----------\n\n* PyTorch has many different devices which it dispatches ops (like add, matmul, to) to, ``privateuseone`` is one of these devices, we utilize this and register our backend using this PyTorch interface, and we rename it as ``neuron``. If a tensor is created or moved to a device, PyTorch will dispatch the allocation operation to that device. For instance, if a tensor is created on ``neuron:0`` specifically, the Neuron Runtime will handle the allocation, and will allocate the result on device instead of CPU.\n\n* *Diagram 1: Device registration and allocation flow*\n\n  .. image:: /neuron-runtime/img/device-allocation-flow.png\n     :align: center\n     :width: 80%\n\n* *Diagram 2: Tensor allocation behaviour*\n\n  .. image:: /neuron-runtime/img/tensor-allocation-behavior.png\n     :align: center\n     :width: 80%\n\nDevice Placement Behavior\n--------------------------\n\nCritical Rule\n~~~~~~~~~~~~~~\n\n* All-or-nothing: ALL inputs must be on ``neuron:0`` for outputs to remain on device  \n* CPU fallback: Any CPU input causes ALL outputs to move to CPU\n\nWhy This Matters\n~~~~~~~~~~~~~~~~~\n\n* Chained operations: Enables efficient multi-model pipelines without CPU roundtrips  \n* Reduced latency: Eliminates expensive device-to-CPU transfers  \n* Memory efficiency: Better utilization of 32GB (trn1) / 96GB (trn2) HBM available on Trainium instances\n\nUsage Examples\n----------------\n\nBasic Usage - All Inputs on Device\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    traced_model = '{your-model-here}'\n    torch_neuronx.move_trace_to_device(traced_model, 0)\n\n    # Single input\n    input_tensor = torch.rand([1, 3, 224, 224], device=\"neuron:0\")\n    output = traced_model(input_tensor)\n    print(output.device)  # device(type='neuron', index=0)\n\n    # Multiple inputs\n    a = torch.rand([2, 2], device=\"neuron:0\")\n    b = torch.rand([2, 2], device=\"neuron:0\")\n    output = traced_model(a, b)\n    print(output.device)  # device(type='neuron', index=0)\n\n\nMixed Device Inputs - Shows Fallback\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    a = torch.rand([2, 2], device=\"neuron:0\")\n    b = torch.rand([2, 2], device=\"cpu\")  # One CPU tensor\n    output = traced_model(a, b)\n    print(output.device)  # device(type='cpu') - falls back to CPU\n\n\nEfficient Model Chaining\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. 
\n\nEfficient Model Chaining\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: python\n\n    input_data = torch.rand([1, 256], device=\"neuron:0\")\n    intermediate = traced_model1(input_data)    # stays on device\n    final_output = traced_model2(intermediate)  # stays on device\n\n\nBest Practices\n----------------\n\n* Keep all tensors on the same device: Ensure all inputs are on ``neuron:0`` to avoid CPU fallback  \n* Monitor HBM usage: Be aware of HBM limits on Trainium instances (32GB for trn1, 96GB for trn2)\n* Verify device placement: Check ``tensor.device`` to confirm expected placement\n\nCompatibility\n--------------\n\n* Works with: All ``torch_neuronx.trace`` models, dynamic batching, ``move_trace_to_device``\n* Limited by: Available HBM memory\n"
  },
  {
    "path": "neuron-runtime/explore/index.rst",
    "content": ".. _neuron-runtime-explore-home:\n\n.. meta::\n   :description: Topics that explore the AWS Neuron Runtime and tools in-depth, written by the AWS engineers who developed them.\n   :keywords: AWS Neuron, deep dives, whitepapers, engineering\n\nNeuron Runtime Deep Dives\n==========================\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   Understand NEFF Files <work-with-neff-files>\n   Compute-Communication Overlap <compute-comm-overlap>\n   Neuron Device Memory <device-memory>\n   Direct HBM Tensor Allocation <direct-hbm-tensor-alloc>\n   Runtime Performance Tips <runtime-performance-tips>\n   Neuron Runtime Core Dumps <core-dump-deep-dive>\n   Inter-node Collectives <internode-collective-comm>\n   Intra-node Collectives <intranode-collective-comm>\n\nCurious about how the Neuron Runtime works? Looking for deeper explorations of the computer science, techniques, and algorithms used to develop it? This section provides topics that dive into the learnings and engineering behind the Neuron Runtime, written by the AWS engineers who developed it.\n\nNeuronX Runtime Deep Dives\n---------------------------\n\n.. grid:: 2\n        :gutter: 2\n\n        .. grid-item-card:: Understand NEFF Files\n\n                * :ref:`work-with-neff-files`\n\n                Explore the structure and contents of NEFF files, the compiled model format used by the Neuron Runtime.\n\n        .. grid-item-card:: Compute-Communication Overlap\n\n                * :ref:`neuron-runtime-explore-compute-comm`\n  \n        .. grid-item-card:: Neuron Device Memory\n\n                * :ref:`neuron-device-memory-deep-dive`\n\n                Learn how the Neuron Runtime overlaps computation and communication to maximize performance on AWS Inferentia and Trainium chips.\n  \n        .. grid-item-card:: Neuron Device Memory\n\n                * :ref:`neuron-device-memory-deep-dive`\n\n                Understand, monitor, and optimize memory usage on AWS Neuron devices including tensors, model constants, scratchpad allocations, and more.\n\n        .. grid-item-card:: Direct HBM Tensor Allocation\n\n                * :ref:`direct-hbm-tensor-alloc`\n  \n                Optimize performance by allocating tensors directly into High Bandwidth Memory (HBM) on Neuron devices, eliminating CPU-device memory transfer overhead.\n\n        .. grid-item-card:: Runtime Performance Tips\n\n                * :ref:`runtime-performance-tips`\n  \n                Best practices and optimization techniques for achieving optimal performance with the AWS Neuron Runtime. \n\n        .. grid-item-card:: Neuron Runtime Core Dumps   \n\n                * :ref:`runtime-core-dump-deep-dive`\n\n                Dive into the structure and analysis of Neuron Runtime core dumps to troubleshoot and debug runtime issues effectively.\n\nNeuron Collectives Deep Dives\n-----------------------------\n\n.. grid:: 2\n        :gutter: 2\n\n        .. grid-item-card:: Inter-node Collectives Communication\n\n                * :doc:`internode-collective-comm`\n\n                Explore Ring, Mesh, and Recursive Doubling-Halving algorithms for coordinating data exchange across multiple nodes via EFA networks.\n\n        .. grid-item-card:: Intra-node Collectives Communication\n\n                * :doc:`intranode-collective-comm`\n\n                Learn about Ring, Mesh, KangaRing, and RDH algorithms optimized for high-bandwidth NeuronLink communication within single nodes.\n"
  },
  {
    "path": "neuron-runtime/explore/internode-collective-comm.rst",
    "content": ".. meta::\n    :description: Learn about inter-node collective communications with AWS Neuron, including algorithms and optimization strategies\n    :date-modified: 12/02/2025\n\n.. _internode_collectives:\n\nInter-node Collective Communications with AWS Neuron\n====================================================\n\nThis topic explores inter-node collective communication algorithms and optimization strategies for AWS Neuron distributed workloads. It covers the implementation details of Ring, Mesh, and Recursive Doubling-Halving algorithms for coordinating data exchange across multiple nodes connected via EFA (Elastic Fabric Adapter) networks.\n\nOverview\n--------\n\nInter-node collective communication enables efficient data exchange between NeuronCores distributed across multiple physical nodes in a cluster. This document examines three primary algorithmic approaches: Ring, Mesh, and Recursive Doubling-Halving (RDH), with each optimized for different cluster sizes and message characteristics. The choice of algorithm depends on the trade-offs between step latency (O(N), O(1), O(logN)) and network bandwidth utilization, with performance further influenced by EFA network topology and message size considerations.\n\nApplies to\n----------\n\nThis concept is applicable to:\n\n* **Distributed Training**: Collective communication aggregates and synchronizes gradients across workers to maintain model consistency. In this scenario, collective operations enable workers to compute gradient sums across all nodes, ensuring uniform parameter updates.\n* **Distributed Inference**: During inference, collective communication distributes requests across multiple accelerators in serving nodes, optimizing resource utilization and maintaining low latency under high loads.\n  \nIntroduction: About Collectives on Neuron\n------------------------------------------\n\nAlso see :ref:`intranode_collectives`.\n\nCollective Communication Operations\n-----------------------------------\n\nWe define the following denotations:\n\n* **N**: the number of participating ranks in a communication group\n* **C**: a \"chunk\", which is a piece of data (subset of tensor data transmitted at each algorithm step) with size equaling to that of a rank's input in AllGather, or output in ReduceScatter\n* **B**: the size of both the input and output buffer in AllReduce. In that context, C = B / N\n\nNow we establish the following collective operations:\n\n.. list-table:: Collective Operations\n   :widths: 20 15 15 50\n   :header-rows: 1\n\n   * - Operation Type\n     - Input Size\n     - Output Size\n     - Explanation\n   * - AllGather\n     - C\n     - N * C\n     - Each rank starts with a chunk and ends with everyone else's chunks\n   * - ReduceScatter\n     - N * C\n     - C\n     - Each rank starts with N chunks, and ends with a unique chunk which is fully reduced among the N ranks\n   * - AllReduce\n     - B = N * C\n     - B = N * C\n     - Each rank contributes B, and ends with B which is fully reduced among the N ranks. AllReduce can be seen as a concatenation of ReduceScatter followed by AllGather\n   * - AllToAll\n     - B = N * C\n     - B = N * C\n     - Each rank starts with N chunks, and ends with the N in a way that the pieces of data were transposed between the ranks r0[A0, A1] r1[B0, B1] → r0[A0, B0], r1 [A1, B1]\n\nThe execution time of a collective communication operation consists of two parts: **step latency + data transfer** time. As mentioned above, the per-hop or point-to-point latency is ~15us. 
\n\nThe transfer time, on the other hand, is a function of the buffer/message size. For example, to transfer 1KiB, 1MiB, and 1GiB at 50Gbps takes 160ns, 160us, and 160ms respectively. Therefore, the collective communication problem is latency dominant for small sizes, throughput dominant for large sizes, and a mix for mid-sizes. This requires us to incorporate different strategies and algorithms for each range.\n\nCommunication Groups - Only One Rank per Node\n----------------------------------------------\n\nWhen distributing an ML workload across multiple nodes, communication groups are always formed with symmetry:\n\n1. The number of participating ranks on each node is consistent\n2. These local ranks must have the same intra-node indices\n\nAs the name suggests, one-rank-per-node groups refer to the simple case where we only need to focus on the network communication between peers.\n\nRing Algorithm\n~~~~~~~~~~~~~~\n\n.. image:: /neuron-runtime/img/collectives/ring-algorithm.png\n   :alt: Ring Algorithm\n   :align: center\n\nIn the Ring algorithm, all participating ranks are joined together in a directed cycle. This algorithm is considered bandwidth optimal, meaning that from each rank's perspective, it transfers the minimal amount of data. For instance, in the AllGather and ReduceScatter cases we have: ``number_of_steps * chunk_size = (N - 1) * C``.\n\nRing's O(N) number of hops means it has linear step latency, making it unsuitable for large clusters or for latency-bound small message sizes. However, because each rank only receives from and sends to a fixed peer, the Ring algorithm does not incur any ingress congestion. Furthermore, we can arrange the neighbors in Ring to be topologically close to each other in the substrate network, which reduces the congestion between the global inflight transactions on core switches. As a result, Ring tends to push for the highest bandwidth utilization rate with large message sizes.\n\nRing AllGather\n^^^^^^^^^^^^^^^\n\nIn step 0, each rank r sends its input chunk to its downstream peer. In step 1, each rank sends the chunk it just received from the upstream peer further downstream, and the process repeats until all chunks have traversed the ring.\n\nRing ReduceScatter\n^^^^^^^^^^^^^^^^^^\n\nIn step 0, each rank r sends its (r-1)th chunk to its downstream. In step 1, each rank reduces the chunk it just received with the same indexed chunk from its own input, and then sends the result to its downstream. This process goes on until each rank's output chunk has fully traversed the ring and been reduced among all ranks.\n\nIn practice, each ReduceScatter step consists of three components: network receive, local reduction, and network send. To hide the serial latency, an implementation trick is to further break a chunk of data into slices to pipeline the communication and reduction.\n\nRing AllReduce\n^^^^^^^^^^^^^^^\n\nThis algorithm is a concatenation of the above two patterns (ReduceScatter followed by AllGather). Its traffic is doubled since we need to traverse the ring twice, but still minimal/optimal.\n\nMesh Algorithm\n~~~~~~~~~~~~~~\n\n.. image:: /neuron-runtime/img/collectives/mesh-algorithm.png\n   :alt: Mesh Algorithm\n   :align: center\n\nThe Mesh algorithm aims to optimize step latency for small message sizes. Rather than transferring data point-to-point like Ring does and accumulating hop latency along the steps, Mesh broadcasts or scatters data directly from each rank to all peers in a single step. 
Consequently, there is no extra overhead in Mesh with ``traffic = number_of_peers * chunk_size = (N - 1) * C``.\n\nThe mesh communication pattern suffers from ingress congestion because each rank directly receives from all peers. Even small variances in the start time of an operation will cause the fast starters to saturate disproportionately high fractions of the switch and NIC bandwidth, congesting the rest of the transactions by the slower ranks. Furthermore, the fact that one has to communicate with multiple peers means that some of the network paths will be longer (go through higher level of switches) and hence subject to congestion and queuing delays. As a result, Mesh does not scale well to large clusters and/or message sizes.\n\nMesh AllGather\n^^^^^^^^^^^^^^^\n\nIt directly broadcasts data from each rank to all the other peers in one step, hence achieving O(1) latency.\n\nMesh ReduceScatter\n^^^^^^^^^^^^^^^^^^\n\nSimilarly, Mesh ReduceScatter scatters the input of each rank to the other peers, and then locally reduces the N-1 received chunks, plus the self chunk into the output.\n\nMesh AllReduce\n^^^^^^^^^^^^^^^\n\nThis algorithm is a concatenation of the above two patterns (ReduceScatter followed by AllGather). Its traffic is doubled since we need to run Mesh twice, but still minimal/optimal.\n\nSingle-step Mesh Algorithm (AllReduce)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Single-step Mesh algorithm is a variant of Mesh specifically designed for AllReduce. The goal is to trade off bandwidth optimality for a reduced number of hops (from 2 to 1). Rather than simply concatenating ReduceScatter and AllGather, we can have each rank duplicate and broadcast its whole input buffer to all peers. Upon receiving these duplicates, a rank will reduce the whole buffer to its output. Remember that each hop is expected to add ~15 us latency. Single-step Mesh outperforms regular Mesh for sufficiently small cluster and/or message sizes where the extra data transfer time is shorter than 15 us.\n\n.. list-table:: Mesh Algorithm Comparison\n   :widths: 30 15 55\n   :header-rows: 1\n\n   * - Algorithm\n     - # steps\n     - Network Traffic Amount per Rank\n   * - Mesh AllGather\n     - 1\n     - optimal = (N - 1) * C\n   * - Mesh ReduceScatter\n     - 1\n     - optimal = (N - 1) * C\n   * - Mesh AllReduce\n     - 2\n     - optimal = 2 * (N - 1) * C\n   * - Single-step Mesh AllReduce\n     - 1\n     - not optimal = (N - 1) * N * C\n\nRecursive Doubling and Halving (RDH) Algorithm\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. image:: /neuron-runtime/img/collectives/rdh-algorithm.png\n   :alt: Recursive Doubling an Halving Algorithm\n   :align: center\n\n(inspired by https://web.cels.anl.gov/~thakur/papers/ijhpca-coll.pdf)\n\nThe Recursive Doubling and Halving (RDH) algorithm works to find the middle ground between Mesh and Ring in both step latency and bandwidth utilization, in a communication group with N = 2^p members.\n\nThere is no additional overhead in RDH with ``traffic = (1 + 2 + 4 ...) * chunk_size = (N - 1) * C``. Corresponding to the number of steps, the step latency of RDH is O(logN) or O(p). In respect to congestion deficiency, having log(N) peers poses ingress contentions, but in the implementation, we can issue send/receive credits with rate control to mitigate such issues. Effectively, a rank will only talk to one single peer at any given time, resulting in several steady streams of high-speed transfer and a relatively high amortized bandwidth utilization. 
Overall, RDH is suitable for large clusters with medium/large sized messages.\n\nWhen rank indices are represented in binary, a rank's log(N) peers (one per step) are obtained by flipping each of the p bits of its own index.\n\nRecursive-Doubling AllGather\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIt works by having each rank communicate sequentially with log(N) peers. In each step of AllGather, a rank sends all the chunks it has collected so far, and receives an equal amount of new chunks from its peer, hence doubling the amount of data. The algorithm follows a classic Recursive-Doubling communication style.\n\nRecursive-Halving ReduceScatter\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis algorithm works similarly — in each step, a rank sends half of the partially reduced chunks so far to its peer, and receives the other half from its peer. It then reduces the self chunks and the received chunks together, and we repeat the process with the problem space exactly halved.\n\nRecursive-Halving AllReduce\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAgain, AllReduce works by concatenating ReduceScatter and AllGather.\n\nAlgorithm Summary\n~~~~~~~~~~~~~~~~~\n\n.. list-table:: Algorithm Comparison\n   :widths: 15 20 25 20 20\n   :header-rows: 1\n\n   * - Algorithm\n     - Step Latency\n     - Network BW Utilization\n     - Suitable Group Sizes\n     - Suitable Message Sizes\n   * - Ring\n     - O(N)\n     - High\n     - Small-to-medium\n     - Large\n   * - Mesh\n     - O(1)\n     - Low\n     - Small\n     - Small\n   * - RDH\n     - O(logN)\n     - > Mesh; < Ring\n     - Medium-to-large\n     - Small-to-medium\n\nCommunication Groups - Multiple Ranks per Node\n-----------------------------------------------\n\nOrchestrating collective communication operations across distributed computing systems presents a fundamental challenge when multiple processing ranks are deployed per node. The complexity arises from the need to efficiently coordinate data exchange both within individual nodes (intra-node) and across the network between different nodes (inter-node), each with distinct bandwidth characteristics, latency profiles, and optimal communication patterns.\n\nTraditional flat communication algorithms that treat all ranks uniformly often fail to exploit the inherent hierarchical structure of modern distributed systems, leading to suboptimal performance and scalability bottlenecks.\n\nHierarchical algorithms address this challenge by recognizing and leveraging the two-tier nature of distributed systems, strategically decomposing global operations into separate intra-node and inter-node phases that can each be optimized independently while maintaining overall correctness and efficiency.\n\nHierarchical Algorithm\n~~~~~~~~~~~~~~~~~~~~~~\n\nThe Hierarchical algorithm is a powerful framework to break down a multiple-rank-per-node operation into stages of pure intra-node and inter-node communication.\n\nThe hierarchical algorithm implementation employs a plug-and-play mechanism to allow for any combination of intra-node and inter-node algorithms — and we simply choose the best one for each communication stage.\n\nThe latency and throughput properties of the Hierarchical algorithm are therefore dependent on the selected sub-algorithms. However, it's worth calling out that by breaking a global communication into intra-node and inter-node dimensions, we apply the principle of divide and conquer, which matters especially for the total latency. For example, by choosing Ring + Ring, the latency is O(X) + O(Y), where X is the number of nodes and Y the number of intra-node ranks. That is significantly better than Flat Ring's O(X * Y).
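\n\nAs a quick sanity check on that step-count arithmetic, here is a small illustrative sketch (it only counts algorithm steps, ignoring transfer time and congestion; the X and Y values are arbitrary examples):\n\n.. code-block:: python\n\n    # X = number of nodes, Y = ranks per node\n    def flat_ring_steps(x_nodes, y_ranks):\n        return x_nodes * y_ranks - 1            # O(X * Y)\n\n    def hierarchical_ring_ring_steps(x_nodes, y_ranks):\n        return (x_nodes - 1) + (y_ranks - 1)    # O(X) + O(Y)\n\n    for x, y in ((4, 16), (16, 16), (64, 16)):\n        print(x, y, flat_ring_steps(x, y), hierarchical_ring_ring_steps(x, y))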
\n\nOverall, the Hierarchical algorithm is versatile enough to work well across a wide range of group and message sizes. For example, small groups + sizes can use intra-node Mesh + inter-node Mesh, and large groups + sizes can use intra-node KangaRing + inter-node RDH.\n\nGlobal AllGather = inter-node AllGather + intra-node AllGather\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nLet's assume there are X servers globally, each containing Y ranks. The first step of the hierarchical algorithm is to form Y rank lists, each containing the X ranks that have the same local index. Later, we run AllGather on these inter-node groups in parallel and each rank ends up with X chunks. Finally, we form X rank lists, each containing all the Y ranks on one node, and run intra-node AllGather again to further broadcast the data. By the end, everyone has all the (X * Y) chunks.\n\nBecause the inter-node EFA interface has lower bandwidth than the intra-node interface, and the first stage incurs less traffic than the second, we choose to run the inter-node communication first.\n\n.. image:: /neuron-runtime/img/collectives/global-allgather.png\n   :alt: Global All-Gather node communication\n   :align: center\n\nGlobal ReduceScatter = intra-node ReduceScatter + inter-node ReduceScatter\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe can run intra-node ReduceScatter first on X parallel rank lists of Y local ranks, reducing the buffer size to 1/Y of the original size. Notice that each rank will end up holding a different 1/Y corresponding to its local index. Next, we run inter-node ReduceScatter on Y parallel rank lists of X network ranks, further reducing the buffer on each global rank to a unique 1/(X * Y) chunk of the original.\n\nThe order of the intra- and inter-node stages is flipped when compared to AllGather, because now the second stage has less traffic.\n\n.. image:: /neuron-runtime/img/collectives/global-reducescatter.png\n   :alt: Global ReduceScatter node communication\n   :align: center\n\nGlobal AllReduce = intra-node ReduceScatter + inter-node AllReduce + intra-node AllGather\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe first run an intra-node ReduceScatter to break down the buffer size to 1/Y of the original. Then we run an inter-node AllReduce. Again, each inter-node group will work on a different 1/Y section of the original buffer, so there's no duplicated work. Lastly, we run an intra-node AllGather to broadcast the whole buffer to everyone.\n\n.. image:: /neuron-runtime/img/collectives/global-allreduce.png\n   :alt: Global All-Reduce node communication\n   :align: center\n\nFlat Ring Algorithm (Edge cases only)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Flat Ring algorithm works by connecting all the global ranks in a directed cycle. Ranks local to a single server are connected in an open chain with the intra-node communication interface, and the two ends will be joined to chains on other servers with the inter-node EFA interface.\n\nThe step latency is O(X * Y) - where X is the number of nodes and Y the number of intra-node ranks - and the network bandwidth utilization is high. However, one caveat is that each EFA interface is connected to a different Trainium Chip. 
So, to utilize all of them, we need to run multiple directed cycles (called channels) in parallel, thus reducing the transfer size and efficiency of each cycle, besides causing high context switching overheads in the collective execution cores. We only enable Flat Ring on Trn1 for large message size cases where it has a clear edge.\n\n.. image:: /neuron-runtime/img/collectives/flat-ring.png\n   :alt: Flat Ring algorithm\n   :align: center\n\nMore information\n-----------------\n\n* :doc:`Intra-node Collective Communications </neuron-runtime/explore/intranode-collective-comm>`\n* :doc:`About Neuron Runtime Collectives </neuron-runtime/about/collectives>`"
  },
  {
    "path": "neuron-runtime/explore/intranode-collective-comm.rst",
    "content": ".. meta::\n    :description: Learn about intra-node collective communications with AWS Neuron, including Ring, Mesh, KangaRing, and RDH algorithms\n    :date-modified: 12/02/2025\n\n.. _intranode_collectives:\n\nIntra-node Collective Communications with AWS Neuron\n====================================================\n\nThis topic covers intra-node collective communication algorithms and optimization strategies for AWS Neuron distributed workloads within a single node. It examines Ring, Mesh, KangaRing, and Recursive Doubling-Halving (RDH) algorithms for coordinating data exchange between NeuronCores connected via high-bandwidth intra-chip and chip-to-chip NeuronLink interconnects.\n\nOverview\n--------\n\nIntra-node collective communication enables efficient data exchange between NeuronCores within a single physical node or tightly coupled nodes connected via NeuronLinks. This document explores four primary algorithmic approaches—Ring, Mesh, KangaRing, and RDH—each optimized for different message sizes and latency requirements. The algorithms leverage the 2D Torus topology of Trainium chips and specialized hardware features like duplication to minimize memory bandwidth pressure and maximize throughput.\n\nApplies to\n----------\n\nThis concept is applicable to:\n\n* **Distributed Training**: Collective communication aggregates and synchronizes gradients across workers to maintain model consistency. In this scenario, collective operations enable workers to compute gradient sums across all nodes, ensuring uniform parameter updates.\n* **Distributed Inference**: During inference, collective communication distributes requests across multiple accelerators in serving nodes, optimizing resource utilization and maintaining low latency under high loads.\n\nCollective Communication Operations\n-----------------------------------\n\nWe define the following denotations:\n\n* **N**: the number of participating ranks in a communication group\n* **C**: a \"chunk\", which is a piece of data (subset of tensor data transmitted at each algorithm step) with size equaling to that of a rank's input in AllGather, or output in ReduceScatter\n* **B**: the size of both the input and output buffer in AllReduce. In that context, C = B / N\n\nNow we establish the following collective operations:\n\n.. list-table:: Collective Operations\n   :widths: 20 15 15 50\n   :header-rows: 1\n\n   * - Operation Type\n     - Input Size\n     - Output Size\n     - Explanation\n   * - AllGather\n     - C\n     - N * C\n     - Each rank starts with a chunk and ends with everyone else's chunks\n   * - ReduceScatter\n     - N * C\n     - C\n     - Each rank starts with N chunks, and ends with a unique chunk which is fully reduced among the N ranks\n   * - AllReduce\n     - B = N * C\n     - B = N * C\n     - Each rank contributes B, and ends with B which is fully reduced among the N ranks. AllReduce can be seen as a concatenation of ReduceScatter followed by AllGather\n   * - AllToAll\n     - B = N * C\n     - B = N * C\n     - Each rank starts with N chunks, and ends with the N in a way that the pieces of data were transposed between the ranks r0[A0, A1] r1[B0, B1] → r0[A0, B0], r1 [A1, B1]\n\nThe execution time of a collective communication operation consists of two portions: latency + data transfer time. More concretely, the latency term is of 10^0 to 10^1 us magnitude. For example, the per-hop latency of Ring/KangaRing is about 1-2 us (HBM load dependent). 
On the other hand, the transfer time is dependent on the buffer/message size. For example, to transfer 1KB, 1MB, and 1GB at 100GBps takes 10 ns, 10 us, and 10 ms respectively. Therefore, the collective communication problem is latency dominant for small sizes, and throughput dominant for large sizes, and a balance for mid-sizes. For this reason, different strategies and algorithms are required to provide the best performance for each range.\n\nRing Algorithm\n--------------\n\n.. image:: /neuron-runtime/img/collectives/ring-algorithm.png\n   :alt: Ring Algorithm\n   :align: center\n\nIn Ring algorithm, all the ranks are connected in a directed cycle. Algorithmically, it has O(N) per-hop latency where N is the number of ranks. In practice, we run multiple cycles with mutually exclusive wires in parallel for full wire bandwidth. That means big tensors (packets) are divided into smaller packets called chunks (more specifically, a chunk is a subset of tensor data transmitted at each algorithm step. The chunk size depends on number of participating ranks on collective) that are transferred across ranks in one or more cycles.\n\nRing AllGather\n~~~~~~~~~~~~~~\n\nIn step 0, each rank sends its input chunk to its downstream neighbor. In step 1, each rank sends the chunk it has just received from upstream to downstream. The process goes on until all the chunks have traversed the ring.\n\nRing ReduceScatter\n~~~~~~~~~~~~~~~~~~\n\nIn step 0, each rank r sends its (r-1)th chunk to its downstream. In step 1, each rank reduces the chunk it has just received with the same indexed chunk from its own input, and writes the result to downstream. The process goes on until each rank has its output chunk fully traversed the ring. It is important to mention a chunk transmit is divided into two sliced transmissions, where the first slice reduction overlaps the second slice communication.\n\nRing AllReduce\n~~~~~~~~~~~~~~\n\nThis algorithm is a concatenation of the above two patterns (ReduceScatter followed by AllGather), so it requires the ring to be traversed twice.\n\nMesh Algorithm\n--------------\n\n.. image:: /neuron-runtime/img/collectives/mesh-algorithm.png\n   :alt: Mesh Algorithm\n   :align: center\n\nThe Mesh algorithm aims to optimize latency for small message sizes. Rather than transferring data step-by-step like in Ring and accumulate per-hop latency along the way, Mesh directly broadcasts/scatters data to all other ranks in one step, hence, to a first degree it has O(1) latency. This is made possible by inter-chip routing — from a rank, data can be directly written to any other rank on a remote chip, where the in-between traffic is routed automatically. The downside of routing is that it leads to link over-subscription, hence mesh is good for mainly small sizes.\n\nMesh AllGather\n~~~~~~~~~~~~~~\n\nThis algorithm consists of two steps. In step 0, each chip contains 4 input chunks which need to be broadcasted to the other 15 chips. We split these destinations roughly evenly among the 4 local ranks. Each rank reads the 4 chunks, either locally or over intra-chip connectivity, and writes to the closest rank on each destination chip via routing. In step 1, each local rank has received 16 distinct chunks. We then run intra-chip broadcast to further exchange them.\n\nMesh ReduceScatter\n~~~~~~~~~~~~~~~~~~\n\nThis algorithm also involves two steps. In step 0, each rank is tasked to send to roughly a quarter of the other 60 off-chip ranks. 
For each destination, the local rank reads 4 on-chip chunks, one from each NeuronCore (LNC=1 or LNC=2), which correspond to the destination rank, reduces them, and writes the result over via routing. In step 1, each rank has received 16 partially reduced chunks, each of which is from a different chip. It then reads these 16 chunks, reduces them, and writes the result to output.\n\nMesh AllReduce\n~~~~~~~~~~~~~~\n\nThis algorithm is a concatenation of the above two patterns (ReduceScatter followed by AllGather).\n\nSingle-step Mesh Algorithm (AllReduce)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Single-step Mesh algorithm is a variant of Mesh specifically designed for AllReduce. The goal is to trade off bandwidth optimality for a reduced number of hops (from 2 to 1). Rather than simply concatenating ReduceScatter and AllGather, we can have each rank duplicate and broadcast its whole input buffer to all peers. Upon receiving these duplicates, a rank will reduce the whole buffer to its output.\n\nKangaRing Algorithm\n-------------------\n\n.. image:: /neuron-runtime/img/collectives/kangaring-algorithm.png\n   :alt: KangaRing Algorithm\n   :align: center\n\nThe KangaRing Algorithm is an extension and optimization of Ring. Rather than connecting all ranks in a flat cycle, we group each two ranks (with LNC=2 each rank is composed of 2 NeuronCores) out of four on the same Neuron Device, nominate one as primary and the other as secondary, and connect only the primary ranks in a cycle. Hence, the per-hop latency is cut by half when compared with Ring (although still O(N)). Primary ranks handle all the data movement and reduction, while secondary ranks just sit idle. In practice, we will alternate the assignment of primary and secondary ranks in different cycles (when using all 16 Neuron Devices, there are 2 non-overlapping Hamiltonian cycles on 2D Torus), so that each rank is active in half of them.\n\nKangaRing AllGather\n~~~~~~~~~~~~~~~~~~~\n\nAlgorithm-wise, in step 0, each primary rank sends its self chunk as well as the secondary chunk to the downstream. In subsequent steps, it reads the newly received chunk and duplicates it to both the secondary peer and downstream. Duplication is a hardware feature that allows a data transfer to only incur one read but duplicate the write to two destinations, for reduced HBM pressure. Specifically, for each chunk to traverse every 2 ranks, Ring needs to do 1R1W (one read / one write) twice, resulting in 4 HBM accesses. The same transfer can be done with one 1R2W (one read / 2 writes) in KangaRing, resulting in 3 HBM accesses or a 25% reduction.\n\nKangaRing ReduceScatter\n~~~~~~~~~~~~~~~~~~~~~~~\n\nIn step 0, each primary rank reduces self and secondary chunks and writes the result to downstream. In subsequent steps, it reduces the newly received partial sum, self chunk, and secondary chunk, and writes to downstream. For each chunk to traverse every 2 ranks, Ring needs to do 2R1W (two reads / 1 write) twice, resulting in 6 HBM accesses. In comparison, KangaRing does one 3R1W for only 4 touches or a 33% reduction.\n\nKangaRing is an option for TP replica-groups where all ranks in device are in same rank-list. For instance: On a one-rank-per-chip rank-list replica-group, Ring is used rather than KangaRing. In these particular cases, KangaRing is better than Ring at all sizes. At smaller sizes, it has better latency. At larger sizes, which are HBM bandwidth bound or contended, the number of touches is reduced. 
But obviously, it still loses to Mesh at small sizes. KangaRing is only relevant for TP replica-groups where all ranks in chip are in same rank-list.\n\nRecursive Doubling and Halving (RDH) Algorithm\n-----------------------------------------------\n\n.. image:: /neuron-runtime/img/collectives/rdh-interchip-algorithm.png\n   :alt: RDH Algorithm at the inter-node level\n   :align: center\n\nThe RDH Algorithm optimizes for mid-size collectives, where both the latency and transfer factors matter. The 2D-Torus connectivity can also be seen as a 4D hyper-cube, where each Chip can reach to a neighbor in 4 axis directions W, X, Y, and Z.\n\nRDH AllGather\n~~~~~~~~~~~~~\n\nThis algorithm involves two stages: inter-chip recursive-doubling and intra-chip broadcast. In the first stage, ranks of the same in-chip index form a communication group, so there are 4 groups of 16 ranks each. Within a group, each rank sends/receives in the 4 axis directions sequentially, and pair-wise exchanges the received chunks so far. By the end of recursive doubling, chunks within each communication group are fully broadcasted. In the second stage, intra-chip broadcast, the 4 local ranks then use intra-chip to further exchange chunks.\n\nRDH ReduceScatter\n~~~~~~~~~~~~~~~~~\n\nThe algorithm also involves two stages: intra-chip reduction and inter-chip recursive halving. In the first stage, a quarter of the chunks are partially reduced to each of the 4 local ranks, with indices corresponding to each rank's inter-chip communication group members. In the second stage, each rank sends/receives the partially reduced chunks in the 4 axis directions sequentially, halving its problem space at each step, until there is one fully reduced chunk left.\n\nEvidently, the intra-chip stage has O(1) number of steps or latency, and the inter-chip recursive stage has O(logN) latency, where N is the number of ranks. When a rank communicates in an axis direction that requires on-chip routing via intra-chip, it may contend with traffic by another rank, but this only happens in some of the cases. So, RDH suffers from less severe link over-subscription than Mesh.\n\nAlgorithm Summary\n-----------------\n\n.. list-table:: Algorithm Comparison\n   :widths: 15 15 20 20 30\n   :header-rows: 1\n\n   * - Algorithm\n     - Latency\n     - Link Utilization\n     - HBM Pressure\n     - Sweet Range (Empirically)\n   * - Ring\n     - O(N)\n     - Full\n     - Normal\n     - Fallback only\n   * - Mesh\n     - O(1)\n     - Over-subscription\n     - Normal\n     - < 1MB\n   * - RDH\n     - O(logN)\n     - Partial Over-subscription\n     - Normal\n     - 1-56MB\n   * - KangaRing\n     - O(N/2)\n     - Full\n     - Reduced\n     - >56MB\n\n.. image:: /neuron-runtime/img/collectives/mesh-rdh-kr-summary.png\n   :alt: Comparison of message size for RDH, KangaRing, and Mesh algorithms\n   :align: center\n\n\nMore information\n-----------------\n\n* :doc:`Inter-node Collective Communications </neuron-runtime/explore/internode-collective-comm>`\n* :doc:`About Neuron Runtime Collectives </neuron-runtime/about/collectives>`\n"
  },
  {
    "path": "neuron-runtime/explore/runtime-performance-tips.rst",
    "content": ".. meta::\n   :description: Performance optimization tips for AWS Neuron Runtime\n   :keywords: AWS Neuron, performance, optimization, runtime, asynchronous execution, NUMA, CPU affinity\n\n.. _runtime-performance-tips:\n\n==========================================\nBest Practices: Neuron Runtime Performance\n==========================================\n\nThis topic provides best practices and performance optimization tips for applications using the AWS Neuron Runtime (NRT). Following these guidelines can help you achieve optimal performance when running workloads on AWS Neuron devices.\n\nBest Practice: Enable asynchronous execution\n---------------------------------------------\n\nBackground\n^^^^^^^^^^\n\nThe Neuron runtime's main submission interface, ``nrt_execute()``, is synchronous by default. It's typically required to\nenable asynchronous mode to achieve high on-device utilization. The asynchronous interface allows the application's call\nto ``nrt_execute()`` to return immediately after preparing and enqueuing a request. A callback can be registered (see\n``nrt_register_async_exec_callback()``) to receive completion notifications.\n\nEnabling this feature improves performance by allowing the application thread to proceed with host-side processing,\nwhich creates a pipeline between host and device work in the critical path.\n\nInstructions\n^^^^^^^^^^^^^\n\nEnable this feature by setting the following environment variable to the required queue depth::\n\n    NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=<queue-depth>\n\nAdditional Notes\n^^^^^^^^^^^^^^^^^\n\n* The queue depth can be arbitrarily large. However, each execution submission typically has pre-allocated reserved\n  tensors for output buffers, which means limiting the number of requests in the queue is necessary to manage memory\n  usage.\n\nBest Practice: Isolate latency-sensitive threads\n-------------------------------------------------\n\nBackground\n^^^^^^^^^^\n\nThe proxy thread is a per-neuron-core thread in the runtime that drives network communication over EFA. While this\nthread doesn't perform heavy computation, it is latency-sensitive and directly impacts on-device execution. This thread\nneeds to be isolated for consistent performance.\n\nInstructions\n^^^^^^^^^^^^^\n\nNeuron Runtime provides an environment variable that allows you to specify the CPU affinity of proxy threads. This\nenables you to isolate a set of CPUs and place the proxy threads on them. Here's a simple way to achieve this::\n\n    NEURON_RT_LOW_LATENCY_TASKS_CPU_AFFINITY=40-47,88-95,136-143,184-191\n    taskset --cpu-list 0-39,48-87,96-135,144-183 my_workload.py\n\nAdditional Notes\n^^^^^^^^^^^^^^^^^\n\n* Using fewer cores than threads for proxy threads naturally results in slightly higher P0 latency. However, isolating\n  to a small set of cores is practically preferred because the performance is predictable and consistent, while the\n  impact remains negligible.\n\n* The configuration suggested above is specific to the trn2.48xlarge instance. It allocates 32 out of 192 cores to\n  latency-sensitive threads, split across 2 NUMA nodes. If you choose a custom configuration, it's important to balance\n  the allocated cores across NUMA nodes. Use ``lscpu | grep -i numa`` if needed to check your system's NUMA topology.\n\n* The approach above provides a simple baseline configuration. 
If your application involves multiple processes, you'll\n  want to adjust their affinities away from critical path threads, as demonstrated using taskset above. You can also\n  enable system-wide isolation using the kernel parameter\n  `isolcpus <https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cpu-partitioning/isolcpus>`_.\n\nUnderstanding Neuron Runtime CPU Usage\n---------------------------------------\n\nDuring typical operation, there can be many polling threads in the runtime. For example, in a trn2.48xlarge instance\nused with an ``lnc=1`` configuration, there will be 128 threads polling for execution completions. Additionally, the\napplication can perform three operations in parallel per core: read, write, and execute. Between NRT and upper layers\n(PJRT, etc.), this is handled with three different threads that busy-loop while polling for the completion of these\nevents. This results in a total of 384 threads.\n\nThis activity appears as busy CPUs but is typically harmless for the following reasons:\n\n1. **Thread yielding**: The threads simply poll and yield, so other threads on the system will not be starved of CPU\n   resources.\n\n2. **Non-blocking execution**: These threads do not block on-device executions. Since the execution queue is managed on\n   the device, as long as there are queued executions, no performance impact should be observed from host jitter.\n\nBest Practice: Respect the NUMA node layout\n--------------------------------------------\n\nBackground\n^^^^^^^^^^\n\nEach Neuron Device is connected to a specific NUMA node on the host instance. Data movements between host <-> device are\naffected by the NUMA node layout.\n\nInstructions\n^^^^^^^^^^^^^^\n\nWhile the Neuron Runtime internally takes the NUMA layout into account, configuring application threads to respect the\nNUMA node layout may also lead to performance benefits. As a general rule of thumb, threads that interact with a specific\nNeuron Core might see latency improvements if the CPU affinity for that thread places it on the same NUMA Node as the\nNeuron Core it interacts with. The NUMA node layout can be obtained from the ``neuron-ls`` tool and is also listed below.\n\nLayout\n^^^^^^\n\ntrn1.32xlarge\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 10 10 12 10 15 15 15 8\n\n   * - NEURON DEVICE\n     - NEURON CORES\n     - NEURON CORE IDS\n     - NEURON MEMORY\n     - CONNECTED DEVICES\n     - PCI BDF\n     - CPU AFFINITY\n     - NUMA NODE\n   * - 0\n     - 2\n     - 0-1\n     - 32 GB\n     - 12, 3, 4, 1\n     - 0000:10:1c.0\n     - 0-31,64-95\n     - 0\n   * - 1\n     - 2\n     - 2-3\n     - 32 GB\n     - 13, 0, 5, 2\n     - 0000:10:1d.0\n     - 0-31,64-95\n     - 0\n   * - 2\n     - 2\n     - 4-5\n     - 32 GB\n     - 14, 1, 6, 3\n     - 0000:a0:1c.0\n     - 32-63,96-127\n     - 1\n   * - 3\n     - 2\n     - 6-7\n     - 32 GB\n     - 15, 2, 7, 0\n     - 0000:a0:1d.0\n     - 32-63,96-127\n     - 1\n   * - 4\n     - 2\n     - 8-9\n     - 32 GB\n     - 0, 7, 8, 5\n     - 0000:20:1b.0\n     - 0-31,64-95\n     - 0\n   * - 5\n     - 2\n     - 10-11\n     - 32 GB\n     - 1, 4, 9, 6\n     - 0000:20:1c.0\n     - 0-31,64-95\n     - 0\n   * - 6\n     - 2\n     - 12-13\n     - 32 GB\n     - 2, 5, 10, 7\n     - 0000:90:1b.0\n     - 32-63,96-127\n     - 1\n   * - 7\n     - 2\n     - 14-15\n     - 32 GB\n     - 3, 6, 11, 4\n     - 0000:90:1c.0\n     - 32-63,96-127\n     - 1\n   * - 8\n     - 2\n     - 16-17\n     - 32 GB\n     - 4, 11, 12, 9\n     - 0000:20:1d.0\n     - 0-31,64-95\n     - 0\n   * - 9\n     - 2\n     - 18-19\n     - 32 GB\n     - 5, 8, 13, 10\n     - 0000:20:1e.0\n     - 0-31,64-95\n     - 0\n   * - 10\n     - 2\n     - 20-21\n     - 32 GB\n     - 6, 9, 14, 11\n     - 0000:90:1d.0\n     - 32-63,96-127\n     - 1\n   * - 11\n     - 2\n     - 22-23\n     - 32 GB\n     - 7, 10, 15, 8\n     - 0000:90:1e.0\n     - 32-63,96-127\n     - 1\n   * - 12\n     - 2\n     - 24-25\n     - 32 GB\n     - 8, 15, 0, 13\n     - 0000:10:1e.0\n     - 0-31,64-95\n     - 0\n   * - 13\n     - 2\n     - 26-27\n     - 32 GB\n     - 9, 12, 1, 14\n     - 0000:10:1b.0\n     - 0-31,64-95\n     - 0\n   * - 14\n     - 2\n     - 28-29\n     - 32 GB\n     - 10, 13, 2, 15\n     - 0000:a0:1e.0\n     - 32-63,96-127\n     - 1\n   * - 15\n     - 2\n     - 30-31\n     - 32 GB\n     - 11, 14, 3, 12\n     - 0000:a0:1b.0\n     - 32-63,96-127\n     - 1\n\ninf2.48xlarge\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 10 10 12 10 12 15 15 8\n\n   * - NEURON DEVICE\n     - NEURON CORES\n     - NEURON CORE IDS\n     - NEURON MEMORY\n     - CONNECTED DEVICES\n     - PCI BDF\n     - CPU AFFINITY\n     - NUMA NODE\n   * - 0\n     - 2\n     - 0-1\n     - 32 GB\n     - 11, 1\n     - 0000:80:1e.0\n     - 48-71,144-167\n     - 2\n   * - 1\n     - 2\n     - 2-3\n     - 32 GB\n     - 0, 2\n     - 0000:90:1e.0\n     - 72-95,168-191\n     - 3\n   * - 2\n     - 2\n     - 4-5\n     - 32 GB\n     - 1, 3\n     - 0000:80:1d.0\n     - 48-71,144-167\n     - 2\n   * - 3\n     - 2\n     - 6-7\n     - 32 GB\n     - 2, 4\n     - 0000:90:1f.0\n     - 72-95,168-191\n     - 3\n   * - 4\n     - 2\n     - 8-9\n     - 32 GB\n     - 3, 5\n     - 0000:80:1f.0\n     - 48-71,144-167\n     - 2\n   * - 5\n     - 2\n     - 10-11\n     - 32 GB\n     - 4, 6\n     - 0000:90:1d.0\n     - 72-95,168-191\n     - 3\n   * - 6\n     - 2\n     - 12-13\n     - 32 GB\n     - 5, 7\n     - 0000:20:1e.0\n     - 24-47,120-143\n     - 1\n   * - 7\n     - 2\n     - 14-15\n     - 32 GB\n     - 6, 8\n     - 0000:20:1f.0\n     - 24-47,120-143\n     - 1\n   * - 8\n     - 2\n     - 16-17\n     - 32 GB\n     - 7, 9\n     - 0000:10:1e.0\n     - 0-23,96-119\n     - 0\n   * - 9\n     - 2\n     - 18-19\n     - 32 GB\n     - 8, 10\n     - 0000:10:1f.0\n     - 0-23,96-119\n     - 0\n   * - 10\n     - 2\n     - 20-21\n     - 32 GB\n     - 9, 11\n     - 0000:10:1d.0\n     - 0-23,96-119\n     - 0\n   * - 11\n     - 2\n     - 22-23\n     - 32 GB\n     - 10, 0\n     - 0000:20:1d.0\n     - 24-47,120-143\n     - 1\n\ntrn2.48xlarge\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. list-table::\n   :header-rows: 1\n   :widths: 10 10 12 10 15 15 15 8\n\n   * - NEURON DEVICE\n     - NEURON CORES\n     - NEURON CORE IDS\n     - NEURON MEMORY\n     - CONNECTED DEVICES\n     - PCI BDF\n     - CPU AFFINITY\n     - NUMA NODE\n   * - 0\n     - 4\n     - 0-3\n     - 96 GB\n     - 12, 3, 4, 1\n     - 0000:cc:00.0\n     - 48-95,144-191\n     - 1\n   * - 1\n     - 4\n     - 4-7\n     - 96 GB\n     - 13, 0, 5, 2\n     - 0000:b5:00.0\n     - 48-95,144-191\n     - 1\n   * - 2\n     - 4\n     - 8-11\n     - 96 GB\n     - 14, 1, 6, 3\n     - 0000:b6:00.0\n     - 48-95,144-191\n     - 1\n   * - 3\n     - 4\n     - 12-15\n     - 96 GB\n     - 15, 2, 7, 0\n     - 0000:cb:00.0\n     - 48-95,144-191\n     - 1\n   * - 4\n     - 4\n     - 16-19\n     - 96 GB\n     - 0, 7, 8, 5\n     - 0000:6f:00.0\n     - 0-47,96-143\n     - 0\n   * - 5\n     - 4\n     - 20-23\n     - 96 GB\n     - 1, 4, 9, 6\n     - 0000:58:00.0\n     - 0-47,96-143\n     - 0\n   * - 6\n     - 4\n     - 24-27\n     - 96 GB\n     - 2, 5, 10, 7\n     - 0000:59:00.0\n     - 0-47,96-143\n     - 0\n   * - 7\n     - 4\n     - 28-31\n     - 96 GB\n     - 3, 6, 11, 4\n     - 0000:6e:00.0\n     - 0-47,96-143\n     - 0\n   * - 8\n     - 4\n     - 32-35\n     - 96 GB\n     - 4, 11, 12, 9\n     - 0000:9b:00.0\n     - 0-47,96-143\n     - 0\n   * - 9\n     - 4\n     - 36-39\n     - 96 GB\n     - 5, 8, 13, 10\n     - 0000:84:00.0\n     - 0-47,96-143\n     - 0\n   * - 10\n     - 4\n     - 40-43\n     - 96 GB\n     - 6, 9, 14, 11\n     - 0000:85:00.0\n     - 0-47,96-143\n     - 0\n   * - 11\n     - 4\n     - 44-47\n     - 96 GB\n     - 7, 10, 15, 8\n     - 0000:9a:00.0\n     - 0-47,96-143\n     - 0\n   * - 12\n     - 4\n     - 48-51\n     - 96 GB\n     - 8, 15, 0, 13\n     - 0000:f8:00.0\n     - 48-95,144-191\n     - 1\n   * - 13\n     - 4\n     - 52-55\n     - 96 GB\n     - 9, 12, 1, 14\n     - 0000:e1:00.0\n     - 
48-95,144-191\n     - 1\n   * - 14\n     - 4\n     - 56-59\n     - 96 GB\n     - 10, 13, 2, 15\n     - 0000:e2:00.0\n     - 48-95,144-191\n     - 1\n   * - 15\n     - 4\n     - 60-63\n     - 96 GB\n     - 11, 14, 3, 12\n     - 0000:f7:00.0\n     - 48-95,144-191\n     - 1\n"
  },
  {
    "path": "neuron-runtime/explore/work-with-neff-files.rst",
    "content": ".. meta::\n  :description: Learn about NEFF (Neuron Executable File Format) architecture, structure, and components\n\n.. _work-with-neff-files:\n\nWork with NEFF Files\n====================\n\nNEFF Architecture\n-----------------\n\nOverview\n~~~~~~~~\n\nA NEFF (Neuron Executable File Format) is a Neuron Runtime executable file generated by the Neuron compiler describing a compute graph (typically a neural network model). While each NEFF is always a single file, at its core, the NEFF is just a tarball of all the metadata needed to run the described compute graph.\n\nPackaging\n~~~~~~~~~\n\nAt its core, the NEFF is just a file with a Header prepended onto a Tarball. Unpacking the NEFF and examining its contents is as straightforward as stripping the header from the file and untaring the header-stripped buffer. As part of the Neuron devtools suite, we have a ``neuron-packager`` tool that can be used to unpack a NEFF::\n\n    neuron-packager unpack file.neff\n\nNEFF Header\n~~~~~~~~~~~\n\nThe NEFF header is a 1024 byte buffer prepended onto the NEFF tarball::\n\n    typedef struct neff_header {\n        uint64_t pkg_version;\n        uint64_t header_size;\n        uint64_t data_size;\n        uint64_t neff_version_major;\n        uint64_t neff_version_minor;\n        uint8_t neff_build_version[128];\n        uint32_t num_tpb;\n        uint8_t hash[32];\n        uint8_t uuid[16];\n        char name[256];\n        uint32_t requested_tpb_count;\n        uint8_t tpb_per_node[64];\n        uint64_t feature_bits;\n        uint32_t lnc_size;\n        uint8_t pad[468];\n        uint8_t data[];\n    } neff_header_t;\n\nIts contents are described below:\n\n* ``uint64_t pkg_version``\n    Tool version used to create this NEFF\n* ``uint64_t header_size``\n    Number of bytes contained in this header\n* ``uint64_t data_size``\n    Size in bytes of the NEFF contents\n* ``uint64_t neff_version_major``\n    NEFF major version\n* ``uint64_t neff_version_minor``\n    NEFF minor version\n* ``uint8_t neff_build_version[128]``\n    Build version information\n* ``uint32_t num_tpb``\n    Total number of TPBs required for efficient execution (all SGs get their own TPB)\n* ``uint8_t hash[NEFF_HEADER_HASH_SZ]``\n    Hash of the package, sha256 or md5 depending on the pkg_version\n* ``uint8_t uuid[NEFF_HEADER_UUID_SZ]``\n    Unique identifier for the NEFF\n* ``char name[NEFF_HEADER_NAME_SZ]``\n    Name of the NEFF\n* ``uint32_t requested_tpb_count``\n    How many TPBs were requested during compilation\n* ``uint8_t tpb_per_node[MAX_NODES]``\n    Number of required TPBs per kelf node in the graph, 1 byte per node\n* ``uint64_t feature_bits``\n    Bits representing individual incompatible NEFF features for fine-grained compatibility checking\n* ``uint32_t lnc_size``\n    Logical core size required to run this NEFF\n\nTarball\n~~~~~~~\n\nThe NEFF tarball, when unpacked, consists of top-level JSON files describing the graph as a whole and partitioned subgraphs.\n\nComponents\n----------\n\nSubgraphs (sg00 ... sgN)\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nA subgraph is a directory in the unpackaged NEFF that contains files which describe the computation and resources needed to run a \"subgraph\". When the NEFF is loaded, each subgraph declared in the NEFF will be loaded onto its own TPB. In the past on INF1, a NEFF could contain multiple subgraphs with data being passed between subgraphs to improve model throughput. This feature was called serial TPB. 
The serial TPB feature does not exist (not needed) on architectures after INF1. Today multiple subgraphs in a NEFF tie to the logical core feature.\n\ndef.json\n~~~~~~~~\n\nThe ``def.json`` file is the starting file for any subgraph. At its top level, it will point the runtime to the engine JSONs and the engine binaries as well as declare queue sets and variables used by the subgraph to move and hold data.\n\nQueue Sets\n^^^^^^^^^^\n\nQueue sets declared in ``def.json`` will be mapped to physical HW queues by the runtime during model load. These queue sets will be used to move data during NEFF execution and are declared in the ``dma_queue`` object. Each queue set is a JSON object that can contain the following fields:\n\n* ``type`` (required)\n    * **Type**: string\n    * **Description**: What this queue set will be used for\n    * **Valid values**: ``in``, ``out``, ``data``, ``embedding_update``, ``dynamic``\n    * **Supported architectures**: all\n\n* ``num_queues`` (optional)\n    * **Type**: int\n    * **Description**: Number of HW queues to reserve for this queue set. More HW queues allow the NEFF program to use multiple DMA engines to transfer data\n    * **Restrictions**: On INF1, this field must be 1. On non-INF1 platforms it must be 16 or less\n    * **Default**: 1\n    * **Supported architectures**: all\n\n* ``owner`` (optional)\n    * **Type**: string\n    * **Description**: Engine that owns this queue set. When queue set is assigned to an engine, the owning engine will perform the DMA triggers of the queue set\n    * **Supported architectures**: all\n\n* ``pinned`` (optional)\n    * **Type**: bool\n    * **Description**: Queue is used to move data to the TPB's state buffer during model start. Once the data is moved to SB, the NEFF will never write to the buffers \"pinning\" the data to SB\n    * **Default**: false\n    * **Supported architectures**: INF1\n\n* ``queue_instances`` (optional)\n    * **Type**: [string]\n    * **Description**: Set of DMA rings that can be swapped in/out during model execution\n    * **Supported architectures**: all architectures except for INF1\n\n* ``semaphore_set`` (optional)\n    * **Type**: [int]\n    * **Description**: A list of semaphores used by the queue set to signal data transfer completion\n    * **Supported architectures**: all\n\n* ``semaphore`` (optional)\n    * **Type**: int\n    * **Description**: Single semaphore used by single queue to signal data transfer completion\n    * **Supported architectures**: all\n\n* ``fabric_path`` (optional)\n    * **Type**: string\n    * **Description**: Which pathway the DMA queue should take to move data\n    * **Valid values**: ``main``, ``alt``\n    * **Default**: \"main\"\n    * **Supported architectures**: all\n\nVariables\n^^^^^^^^^\n\nVariables are buffers allocated on device that can be referenced by the NEFF to read data from and write data to during execution. 
Variables are declared in the ``var`` object in ``def.json`` and each variable is a JSON object that can contain the following fields:\n\n* ``type`` (required)\n    * **Type**: string\n    * **Description**: What type of data this variable contains\n    * **Valid values**: ``state-buffer``, ``input``, ``output``, ``file`` (HBM), ``tmp-buf`` (HBM) - private-per-NEFF scratchpad allocation, ``virtual`` (HBM) - shared scratchpad variables, ``pointer`` (HBM), ``dge-table``\n    * **Supported architectures**: all\n\n* ``var_id`` (required)\n    * **Type**: int\n    * **Description**: Unique ID to reference this variable with\n    * **Restrictions**: Must be unique to this variable\n    * **Supported architectures**: all\n\n* ``size`` (required)\n    * **Type**: int\n    * **Description**: Size in bytes of the variable\n    * **Supported architectures**: all\n\n* ``alignment`` (optional)\n    * **Type**: int\n    * **Description**: Physical address alignment for this variable\n    * **Restrictions**: Must be a power of two\n    * **Default**: 0\n    * **Supported architectures**: all\n\n* ``fabric_path`` (optional)\n    * **Type**: string\n    * **Description**: Fabric path to place this variable on\n    * **Default**: \"main\"\n    * **Supported architectures**: all\n\n* ``file_name`` (optional)\n    * **Type**: string\n    * **Description**: File to load variable data from. Can point to .npy files or raw binary data (any file without a .npy extension)\n    * **Restrictions**: Only used with variable type ``file``\n    * **Supported architectures**: all\n\n* ``backing_variable_off`` (optional)\n    * **Type**: int\n    * **Description**: The offset inside the shared scratchpad space allocated by Runtime\n    * **Restrictions**: Only used with variable type ``virtual``\n    * **Supported architectures**: all\n\n* ``referenced_var_id`` (optional)\n    * **Type**: int\n    * **Description**: ``var_id`` of the variable whose address will be placed in this pointer variable\n    * **Restrictions**: Only used with variable type ``pointer``\n    * **Supported architectures**: all\n\n* ``list`` (optional)\n    * **Type**: [int]\n    * **Description**: List of ``var_ids`` to populate the table with\n    * **Restrictions**: Used with variable type ``dge-table``\n    * **Supported architectures**: all\n\n{ENGINE}.json\n~~~~~~~~~~~~~\n\nThe engine JSON is a JSON for each of the TPB's engines. This JSON will describe the DMA descriptors triggered by the engine to move data during execution as well as some extra engine-specific metadata.\n\nDMA Descriptors\n^^^^^^^^^^^^^^^\n\nIn each engine JSON file, there is a list of JSON objects describing DMA data movements triggered by the engine. This list is indexed by the ``dma`` key. Each object in the list is a JSON object with the following fields:\n\n* ``id`` (required)\n    * **Type**: int\n    * **Description**: Identifier to map this descriptor to a trigger in the engine binary. 
Other descriptors with the same ID in the same function call must have the same trigger amounts\n    * **Supported architectures**: all\n\n* ``queue`` (required if ``instance_name`` is empty)\n    * **Type**: string\n    * **Description**: Name of the queue set this descriptor will be placed on\n    * **Restrictions**: ``instance_name`` field takes precedence over this field\n    * **Supported architectures**: all\n\n* ``instance_name`` (required if ``queue`` field is empty)\n    * **Type**: string\n    * **Description**: Name of the queue set instance this descriptor will be placed on\n    * **Restrictions**: Takes precedence over ``queue`` field\n    * **Supported architectures**: everything but INF1\n\n* ``function_start`` (optional)\n    * **Type**: string\n    * **Description**: Names the function that will trigger this descriptor and all other descriptors after it belonging to the same queue set until the next ``function_start`` for the queue set is hit. Used in the call graph flow feature of the compiler\n    * **Restrictions**: Must name a valid function declared in the engine binary\n    * **Default**: \"\"\n    * **Supported architectures**: everything but INF1\n\n* ``section_start_desc`` (optional)\n    * **Type**: bool\n    * **Description**: If this field is true, the runtime will place this descriptor on the first queue in the queue set\n    * **Default**: false\n    * **Supported architectures**: everything but INF1\n\n* ``event`` (optional)\n    * **Type**: int\n    * **Description**: Event to set after this descriptor has been executed\n    * **Supported architectures**: all\n\n* ``semaphore`` (optional)\n    * **Type**: int\n    * **Description**: Semaphore to increment after this descriptor has been executed. If there are multiple queues in the queue set, semaphore will be incremented by ``num_queues`` amount when the transfer is complete\n    * **Supported architectures**: all\n\n* ``remote_semaphores`` (optional)\n    * **Type**: [int]\n    * **Description**: Semaphore(s) of other NeuronCore/TPB to increment in case of LNC size 2\n    * **Supported architectures**: Trn2 and above\n\n* ``desc`` (required)\n    Contains the following sub-fields:\n\n    * ``op`` (optional)\n        * **Type**: string\n        * **Description**: Op for the DMA engine to perform for this transfer\n        * **Valid values**: ``fma``, ``cast``, ``add``, ``min``, ``max``, ``transpose``, ``copy``\n        * **Default**: \"copy\"\n        * **Supported architectures**: everything but INF1\n\n    * ``from``/``to`` (``to`` is always required; ``from`` is required for non-CCE descriptors)\n        * **Type**: string\n        * **Description**: Which variable to read from/write to\n        * **Restrictions**: Must be a variable declared in ``def.json``\n        * **Supported architectures**: all\n\n    * ``from_off``/``to_off`` (required)\n        * **Type**: int\n        * **Description**: Offset in variable to read/write\n        * **Supported architectures**: all\n\n    * ``from_steps``/``to_steps`` (required)\n        * **Type**: Array[int]\n        * **Description**: Access pattern steps for variable. All elements are in denominations of bytes. 
The first element - corresponding to the innermost/fastest-growing dimension - is usually 1 to indicate that successive bytes must be copied\n        * **Restrictions**: Max array size of 4; array-length must match ``{from, to}_sizes``\n        * **Supported architectures**: all\n\n    * ``from_sizes``/``to_sizes`` (required)\n        * **Type**: Array[int]\n        * **Description**: Access pattern sizes for variable. The first element - corresponding to the innermost/fastest-growing dimension - is in denomination of bytes. All other elements are counts of number of elements in those dimensions\n        * **Restrictions**: Max array-size of 4; array length must match ``{from, to}_steps``\n        * **Supported architectures**: all\n\n    * ``from_dtype``/``to_dtype`` (optional)\n        * **Type**: string\n        * **Description**: Dtype of variable\n        * **Valid values**: ``float8e3``, ``float8e4``, ``float8e5``, ``float16``, ``float32``, ``float32r``, ``bfloat16``, ``uint8``, ``uint16``, ``uint32``, ``uint64``, ``int8``, ``int16``, ``int32``, ``int64``\n        * **Default**: \"uint8\"\n        * **Supported architectures**: everything but INF1\n\n    * ``num_tiling_dimensions`` (optional)\n        * **Type**: int\n        * **Description**: Number of dimensions used to tile the DMA descriptor; number of dimensions used for a single tile\n        * **Supported architectures**: all\n\n    * ``from_arr`` (required for CCE descriptors, replaces ``from*`` fields)\n        * **Type**: [objects]\n        * **Description**: List of source tensors to perform the CCE op on; fields are the \"from\" parts of a DMA descriptor\n        * **Restrictions**: Cannot have more than 16 source tensors (length of ``from_arr`` <= 16)\n        * **Supported architectures**: everything but INF1\n\n    **FMA only fields:**\n\n    * ``scale_dtype`` (required with fma op)\n        * **Type**: string\n        * **Description**: Data type of the scale constant\n        * **Valid values**: ``float32``\n        * **Default**: \"float32\"\n\n    * ``scale`` (optional)\n        * **Type**: double\n        * **Description**: Scale for data being moved\n        * **Restrictions**: Only valid on \"fma\" type descriptors\n        * **Default**: 1.0\n        * **Supported architectures**: everything but INF1\n\n    **Min/Max only fields:**\n\n    * ``constant_dtype`` (optional)\n        * **Type**: string\n        * **Description**: Datatype of \"min\"/\"max\" constant\n        * **Valid values**: ``float32``, ``int32``, ``uint32``\n\n    * ``constant`` (required if ``constant_dtype`` specified)\n        * **Type**: double, int, or uint\n        * **Description**: Constant to start the \"min\"/\"max\" operation with\n        * **Restrictions**: Only valid on \"min\"/\"max\" type descriptors; will be ignored if ``constant_dtype`` is not specified\n        * **Supported architectures**: everything but INF1\n\n    **Transpose only fields:**\n\n    * ``transpose_shape`` (optional)\n        * **Type**: [int]\n        * **Description**: Shape to transpose the data to\n        * **Restrictions**: Number of elements must be ``XPOSE_NUM_DIMS`` (4)\n        * **Supported architectures**: everything but INF1\n\n    * ``transpose_element_size`` (optional)\n        * **Type**: int\n        * **Description**: Size of a single element of the transpose\n        * **Supported architectures**: everything but INF1\n\nActivation.json\n~~~~~~~~~~~~~~~\n\nIn addition to DMA descriptors, the ``Activation.json`` file also contains metadata 
on the PWP tables used by the NEFF. The field ``activation_function_sets`` lists the activation function sets used during execution of the NEFF. Each \"activation function set\" points to activation table metadata contained in the subgraph's directory.\n\nExample::\n\n    \"activation_function_sets\": [\n        \"reciprocal_sqrt_and_small\",\n        \"natural_log_exp_and_others\",\n        \"reciprocal_and_small\",\n        \"gelu_and_others\"\n    ]\n\nDVE.json\n~~~~~~~~\n\n``DVE.json`` contains information about the DVE tables used by the NEFF. In ``DVE.json``, the DVE tables are indexed by the ``dve_tables`` key. More information on how this works can be found in the Loadable DVE doc.\n\nExample::\n\n    \"dve_tables\": [\n        {\n            \"control_table\": \"default_control_table.bin\",\n            \"datapath_table\": \"default_datapath_table.bin\",\n            \"opcode_table\": \"default_opcode_table.bin\"\n        }\n    ]\n\nConstants\n~~~~~~~~~\n\nConstants are data files placed directly in the subgraph directory. These files can be referenced from a variable declared in ``def.json``; during model load, the contents of each file are written into the variable that references it. For most files the data is written as raw binary; for ``.npy`` files, the file is parsed and the NumPy array data is written to the buffer. Constant files are pointed to by the ``file_name`` field in var declarations.\n\n
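As an illustrative sketch only (not taken from a real NEFF), a ``file``-type variable declaration using the fields documented above could look like the following; the ID, size, and file name here are made up for the example::\n\n    {\n        \"type\": \"file\",\n        \"var_id\": 7,\n        \"size\": 4096,\n        \"file_name\": \"bias_const.npy\"\n    }\n"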
  },
  {
    "path": "neuron-runtime/faq.rst",
    "content": ".. _neuron-runtime-faq:\n\nNeuronX runtime FAQ\n==================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 1\n\n\nWhere can I find information about Neuron Runtime 2.x (``libnrt.so``)\n---------------------------------------------------------------------\n\nSee :ref:`introduce-libnrt` for detailed information about Neuron Runtime 2.x (``libnrt.so``).\n\nWhat will happen if I will upgrade Neuron Framework without upgrading latest kernel mode driver?\n------------------------------------------------------------------------------------------------\n\nApplication start would fail with the following error message:\n.. code:: bash\n\n    2021-Aug-11 19:18:21.0661 24616:24616 ERROR   NRT:nrt_init      This runtime requires Neuron Driver version 2.0 or greater. Please upgrade aws-neuron-dkms package.\n\n\nDo I need to recompile my model to use the Runtime Library?\n-----------------------------------------------------------\nNo. Runtime 2.x supports all the models compiled with Neuron Compiler 1.x.\n\n\nDo I need to change my application launch command?\n--------------------------------------------------\nNo.\n\nHow do I restart/start/stop the NeuronX Runtime?\n-----------------------------------------------\nSince Neuron Runtime is a library, starting/stopping application would result in starting/stopping the Neuron Runtime.\n\n\nHow do I know which runtimes are associated with which Neuron Device(s)?\n------------------------------------------------------------------------\n`neuron-ls` and `neuron-top` can be used to find out applications using Neuron Devices.\n\n\nWhat about RedHat or other versions of Linux and Windows?\n--------------------------------------------------------\n\nWe don't officially support it yet.\n\n\nHow can I take advantage of multiple NeuronCores to run multiple inferences in parallel?\n---------------------------------------------------------------------------------------\n\nExamples of this for TensorFlow and MXNet are found\n:ref:`here <tensorflow-tutorials>` and :ref:`here <mxnet-tutorials>`.\n"
  },
  {
    "path": "neuron-runtime/index.rst",
    "content": ".. _neuron_runtime:\n\nNeuronX Runtime\n================\n\nThe NeuronX Runtime is a high-performance execution engine that enables deep learning models to run on AWS Inferentia and Trainium accelerators. It consists of a kernel driver and C/C++ libraries that provide low-level APIs for accessing Neuron devices, managing model execution, and coordinating collective communications across NeuronCores.\n\nThe Neuron Runtime serves as the foundation for all ML framework integrations (TensorFlow, PyTorch, JAX, and Apache MXNet), loading compiled models in Neuron Executable File Format (NEFF) and orchestrating their execution on Neuron hardware. It is optimized for high-throughput and low-latency inference and training workloads, with features including:\n\n* **Efficient model execution**: Loads and executes NEFF files on NeuronCores with optimized memory management\n* **Multi-model support**: Manages multiple models across multiple NeuronCores with flexible allocation strategies\n* **Collective communications**: Provides high-performance collective operations for distributed training and inference\n* **Device management**: Handles NeuronCore allocation, device discovery, and resource management\n* **Debugging support**: Offers core dump generation, debug streams, and detailed logging for troubleshooting\n* **Configuration flexibility**: Extensive environment variables for fine-tuning runtime behavior\n\nThe Neuron Runtime is typically used transparently through ML framework plugins, but also provides direct C/C++ APIs for developers building custom frameworks or requiring low-level device control. \n\n.. toctree::\n    :maxdepth: 2\n    :hidden:\n\n    Overview </neuron-runtime/about/index>\n    Get Started </neuron-runtime/about/core-dump>\n    Deep Dives </neuron-runtime/explore/index>\n    /neuron-runtime/configuration-guide\n    Developer Guide </neuron-runtime/nrt-developer-guide>\n    API Reference </neuron-runtime/api/index>\n    NRT Debug Stream </neuron-runtime/api/debug-stream-api>\n    Troubleshooting on Inf1 and Trn1 </neuron-runtime/nrt-troubleshoot>\n    Release Notes </release-notes/components/runtime>\n    FAQ </neuron-runtime/faq>\n\nGet Started\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: About the NeuronX Runtime\n        :link: neuron-runtime-about\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Learn about the AWS Neuron Runtime, its features, and capabilities for accessing Inferentia and Trainium Neuron devices.\n\n    .. grid-item-card:: Quickstart: Generate a Core Dump\n        :link: runtime-core-dump-quickstart\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Learn how to generate a Neuron runtime core dump for debugging runtime failures and analyzing device state.\n\nReference\n------------\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Runtime Developer Guide\n        :link: nrt-api-guide\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Comprehensive guide to the Neuron Runtime API for developers building custom frameworks that call libnrt APIs directly.\n\n    .. grid-item-card:: Runtime API Reference Documentation\n        :link: /neuron-runtime/api/index\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Documentation of the APIs in the public headers for the Neuron Runtime.\n\n    .. 
grid-item-card:: Runtime Configuration\n        :link: nrt-configuration\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Learn how to configure the Neuron Runtime using environment variables to control NeuronCore allocation, logging, and more.\n\n    .. grid-item-card:: Troubleshooting on Inf1 and Trn1\n        :link: nrt-troubleshooting\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Solutions for common issues encountered when using the Neuron Runtime on Inferentia and Trainium instances.\n\n    .. grid-item-card:: Frequently Asked Questions\n        :link: neuron-runtime-faq\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Answers to common questions about the Neuron Runtime, including compatibility, configuration, and usage.\n\nLearn More\n------------\n\n.. grid:: 1\n    :gutter: 2\n\n    .. grid-item-card:: Explore the Neuron Runtime\n        :link: neuron-runtime-explore-home\n        :link-type: ref\n        :class-header: sd-bg-primary sd-text-white\n\n        Deep dives into the Neuron Runtime, including NEFF files, compute-communication overlap, device memory, and core dumps.\n\nCollectives\n------------\n\n.. grid:: 1\n    :gutter: 2\n\n    .. grid-item-card:: About Collectives\n        :link: /neuron-runtime/about/collectives\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Learn about Neuron Runtime collectives.\n\n.. grid:: 2\n    :gutter: 2\n\n    .. grid-item-card:: Deep Dive: Inter-node Collective Communication\n        :link: /neuron-runtime/explore/internode-collective-comm\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Explore and understand techniques for communication across nodes in the Neuron Runtime.\n\n    .. grid-item-card:: Deep dive: Intra-node Collective Communication\n        :link: /neuron-runtime/explore/intranode-collective-comm\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Explore and understand techniques for communication within nodes in the Neuron Runtime.\n\nRelease Notes\n--------------\n\n.. grid:: 1\n    :gutter: 2\n\n    .. grid-item-card:: Runtime Release Notes\n        :link: /release-notes/components/runtime\n        :link-type: doc\n        :class-header: sd-bg-primary sd-text-white\n\n        Latest updates, improvements, and bug fixes for the Neuron Runtime library, driver, and collectives.\n\n"
  },
  {
    "path": "neuron-runtime/nrt-configurable-parameters.rst",
    "content": ".. _nrt-configuration:\n\nNeuronX Runtime Configuration\n============================\n\nNeuronX Runtime is responsible for executing ML models on Neuron Devices. NeuronX Runtime determines which NeuronCore will execute which model and how to execute it.\nConfiguration of the NeuronX Runtime is controlled through the use of Environment variables at the process level.  By default, Neuron framework extensions will take care of NeuronX Runtime configuration on the user's behalf.  Explicit configurations are also possible when attempting to achieve a desired behavior.\n\nThis guide provides an overview of the different environment variables available to\nconfigure NeuronX Runtime behavior.\n\n.. list-table:: Environment Variables\n   :widths: 25 60 20 50 20 50\n   :header-rows: 1\n   \n\n   \n   * - Name\n     - Description\n     - Type\n     - Expected Values\n     - Default Value\n     - RT Version\n   * - ``NEURON_RT_VISIBLE_CORES``\n     - Range of specific NeuronCores needed by the process\n     - Integer range (like 1-3)\n     - Any value or range between 0 to Max NeuronCore in the system.\n     - None\n     - 2.0+\n   * - ``NEURON_RT_NUM_CORES``\n     - Number of NeuronCores required by the process.\n     - Integer\n     - A value from 1 to Max NeuronCore in the system.\n     - 0, which is interpreted as \"all\"\n     - 2.0+\n   * - ``NEURON_RT_LOG_LOCATION``\n     - Runtime log location\n     - string\n     - console or syslog\n     - console\n     - 2.0+\n   * - ``NEURON_RT_LOG_LEVEL``\n     - Runtime log verbose level\n     - string\n     - ERROR, WARNING, INFO, DEBUG, TRACE\n     - ERROR\n     - 2.0+\n   * - ``NEURON_RT_EXEC_TIMEOUT``\n     - Timeout for execution in seconds\n     - Integer\n     - 0 to INT_MAX\n     - 30\n     - 2.0+\n   * - ``NEURON_RT_VALIDATE_HASH``\n     - Validate NEFF contents before loading into accelerator\n     - Boolean\n     - TRUE or FALSE\n     - FALSE\n     - 2.0+\n   * - ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS``\n     - Share weights when loading multiple instance versions of the same model on different NeuronCores\n     - Boolean\n     - TRUE or FALSE\n     - FALSE\n     - 2.11+\n   * - ``NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS``\n     - Controls number of asynchronous execution requests to be supported.\n     - Integer\n     - 0 to INT_MAX; 0 is disabled.\n     - 0\n     - 2.15+\n   * - ``NEURON_RT_ALLOW_LEGACY_NEFF``\n     - Allow a NEFF compiled for an older arch to execute on a newer one. For example, executing a NEFF originally compiled for Trn1 architecture on Trn2.\n     - Boolean\n     - TRUE or FALSE\n     - FALSE\n     - 2.25+\n\n.. warning::\n  When applying ``NEURON_RT_ALLOW_LEGACY_NEFF``, note that not all NEFF files, especially those from older architectures, may be compatible.\n  In the case of an incompatibility, the operation will fail with a data mismatch error or stall out.\n\nNeuronCore Allocation\n---------------------\n\n.. important ::\n\n  ``NEURONCORE_GROUP_SIZES`` is being deprecated, if your application is using ``NEURONCORE_GROUP_SIZES`` please \n  see :ref:`neuron-migrating-apps-neuron-to-libnrt` for more details.\n\n\nBy default, NeuronX Runtime initializes all the cores present in the system and reserves them for the current process.\n\n.. 
note::\n\n  Once a NeuronCore is reserved for a process, it cannot be used by another process until the process reserving that NeuronCore is terminated.\n\nUsing NEURON_RT_VISIBLE_CORES\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor parallel processing, ``NEURON_RT_VISIBLE_CORES`` can be used to control which NeuronCores each process will reserve. This variable is specified with a single NeuronCore index or an inclusive range value.\n\nFor example, if a process (myapp.py) requires one NeuronCore, then it can be started with\n``NEURON_RT_VISIBLE_CORES=0`` to limit the process to NeuronCore 0. For parallel processing, multiple processes can be\nstarted (without any change to the myapp.py code) with different ``NEURON_RT_VISIBLE_CORES`` values.\nHere is an example that runs myapp.py in parallel across the four NeuronCores available on an inf1.xlarge.\n\n::\n\n NEURON_RT_VISIBLE_CORES=0 myapp.py &\n NEURON_RT_VISIBLE_CORES=1 myapp.py &\n NEURON_RT_VISIBLE_CORES=2 myapp.py &\n NEURON_RT_VISIBLE_CORES=3 myapp.py &\n\n\nIf myapp.py required 3 NeuronCores and was running on an inf1.6xlarge (16 NeuronCores maximum), the first instance of myapp.py could use NeuronCores 0-2, the next instance could use 3-5, and so on:\n\n::\n\n NEURON_RT_VISIBLE_CORES=0-2 myapp.py &\n NEURON_RT_VISIBLE_CORES=3-5 myapp.py &\n NEURON_RT_VISIBLE_CORES=6-8 myapp.py &\n NEURON_RT_VISIBLE_CORES=9-11 myapp.py &\n NEURON_RT_VISIBLE_CORES=12-14 myapp.py &\n\n\nUsing NEURON_RT_NUM_CORES\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf ``NEURON_RT_NUM_CORES`` is set to a value between 1 and the maximum number of NeuronCores in the instance, Neuron Runtime will attempt to automatically reserve the specified number of free NeuronCores for the process. The difference between ``NEURON_RT_VISIBLE_CORES`` and ``NEURON_RT_NUM_CORES`` is that ``NEURON_RT_VISIBLE_CORES`` specifies the exact NeuronCores to allocate, whereas ``NEURON_RT_NUM_CORES`` specifies only the number of NeuronCores needed and Neuron Runtime selects free NeuronCores.\n\nUsing the same example as earlier, where myapp.py needed 3 cores but *which* 3 cores was of no concern, the same application could be executed in parallel up to 5 times on an inf1.6xlarge (16 NeuronCores max):\n\n::\n\n NEURON_RT_NUM_CORES=3 myapp.py &\n NEURON_RT_NUM_CORES=3 myapp.py &\n NEURON_RT_NUM_CORES=3 myapp.py &\n NEURON_RT_NUM_CORES=3 myapp.py &\n NEURON_RT_NUM_CORES=3 myapp.py &\n\nExecuting a 6th ``NEURON_RT_NUM_CORES=3 myapp.py &`` in the above example would fail as there is only a single NeuronCore still free.\n\n\nNotes\n~~~~~\n\n1. The number of NeuronCores in an Inferentia device is 4.\n2. The number of Inferentia devices depends on the instance size.\n3. The NeuronCore index in ``NEURON_RT_VISIBLE_CORES`` starts from 0 and ends at (number of NeuronDevices * number of NeuronCores) - 1.\n4. By default, ``NEURON_RT_NUM_CORES`` is set to ``0``, which indicates to the runtime that all cores are to be used.\n5. ``NEURON_RT_VISIBLE_CORES`` takes precedence over ``NEURON_RT_NUM_CORES``. 
If specified, all cores within the range will be assigned to the owning process.\n\n\nLogging and debug-ability\n--------------------------\n\nBy default, NeuronX Runtime logs to syslog with a verbosity level of *INFO*, and only *ERROR* messages are logged to the console.\nThe following snippet shows ways to increase/decrease the log level.\n\n::\n\n NEURON_RT_LOG_LEVEL=INFO myapp.py         # Sets the log level for syslog and console to INFO\n NEURON_RT_LOG_LOCATION=console NEURON_RT_LOG_LEVEL=QUIET myapp.py    # Completely disables console logging.\n\nBy default, NeuronX Runtime expects the NeuronCore to complete execution of any model within 2 seconds.\nIf the NeuronCore does not complete the execution within 2 seconds, the runtime fails the execution with a timeout error.\nMost models take a few milliseconds to complete, so 2 seconds (2000 milliseconds) is more than adequate.\nHowever, if your model is expected to run for more than 2 seconds, you can increase the timeout with ``NEURON_RT_EXEC_TIMEOUT``.\n\n::\n\n NEURON_RT_EXEC_TIMEOUT=5 myapp.py       # increases the timeout to 5 seconds\n\n\nAdditional Logging Controls\n----------------------------\n\nNeuronX Runtime enables detailed control over logging behaviors, including the ability to set separate log levels and log locations for individual components.\nWhen ``NEURON_RT_LOG_LEVEL`` is set globally, NeuronX Runtime combines the logs from all modules into a single stream.\nFor instance, the logs from the modules ``TDRV`` and ``NMGR`` would appear in the same stream, as shown in the example below.\n\n::\n\n  2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_infer_status_notifications (FATAL-RT-UNDEFINED-STATE) inference timeout (600000 ms) on Neuron Device 0 NC 0, waiting for execution completion notification\n  2023-Jan-09 20:27:41.0600 15042:15042 ERROR  NMGR:dlr_infer\n\nHowever, it is possible to adjust the log level for individual components to capture more or less detail as required for specific debugging contexts.\nThese individual components are:\n\n- ``TDRV``: the low-level driver library\n- ``NMGR``: the higher-level manager library bridging the driver and runtime\n- ``NRT``: the Neuron Runtime library responsible for loading and executing models, which is exposed to end users and frameworks\n\nTo adjust the log level for individual components, use the environment variable ``NEURON_RT_LOG_LEVEL_<component>``, where ``<component>`` is the identifier of the component\n(either ``TDRV``, ``NMGR``, or ``NRT``).\nThis allows for precise control over the verbosity of logs generated by each component, facilitating more targeted debugging.\nFor example, the following sets different log levels for the ``TDRV`` and ``NMGR`` components.\n\n::\n\n  export NEURON_RT_LOG_LEVEL_TDRV=DEBUG\n  export NEURON_RT_LOG_LEVEL_NMGR=ERROR\n\n\nSimilarly, to specify separate log locations for individual components, use the environment variable ``NEURON_RT_LOG_LOCATION_<component>``, following the same naming convention as for log levels. 
\nThis feature enables logs from different components to be directed to separate files or destinations, making it easier to organize and analyze the log output.\nFor example, the following sets different log locations for the ``TDRV`` and ``NMGR`` components.\n\n::\n\n  export NEURON_RT_LOG_LOCATION_TDRV=tdrv.log\n  export NEURON_RT_LOG_LOCATION_NMGR=nmgr.log\n\n\n\nChecksum\n--------\n\nTo execute a model (NEFF), NeuronX Runtime needs to load the NEFF file onto a NeuronCore and run it.\nNeuron Runtime provides a way to perform checksum validation on each NEFF file during load, to verify that the file is not corrupted.\nThis option is off by default to avoid a performance penalty during model load (~50%).\n\n::\n\n NEURON_RT_VALIDATE_HASH=true myapp1.py     # enables model checksum validation while loading\n NEURON_RT_VALIDATE_HASH=false myapp2.py    # disables (default) model checksum validation while loading\n\n\nShared Weights (NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS)\n------------------------------------------------------------\n\nBy default, NeuronX Runtime will make copies of model weights when loading the same instance of a model to multiple NeuronCores. Changing this default to a weight-sharing mechanism is possible with NeuronX Runtime 2.11 or higher by setting ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS=TRUE``. Use of this flag will allow more models to be loaded by reducing the memory requirements, but will potentially come at a cost of throughput by forcing the execution across cores to compete for memory bandwidth.\n\nNote: the use of this flag requires the model to be loaded with the multi-instance feature (see :ref:`torch_core_placement_api`).\n\nSee the :pytorch-neuron-src:`[BERT tutorial with shared weights notebook] <bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>` for an example of how this is used in ``Torch-Neuron``.\n\n::\n\n NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS=TRUE myapp1.py     # enables model weight sharing\n NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS=FALSE myapp2.py    # disables (default) model weight sharing\n\n\nAsynchronous Execution (NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS)\n----------------------------------------------------------------------\n\nThis is a beta asynchronous execution feature which can reduce latency by roughly 12% for training workloads. Starting in Neuron Runtime version 2.15, the feature is available but disabled by default. To enable the feature, the recommendation is to set ``NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS`` to 3. Setting the number of inflight requests above 3 may lead to Out-Of-Memory (OOM) errors during execution. For developers using ``libnrt.so`` directly, use ``nrt_register_async_exec_callback`` to register a callback for the nrt execution thread to post the execution status to. A default callback will be registered if one is not set by the developer.\n\n::\n\n NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3 myapp.py     # Up to 3 async exec requests at once.\n NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=0 myapp.py     # disables async execution (default behavior)\n"
  },
  {
    "path": "neuron-runtime/nrt-developer-guide.rst",
    "content": ".. _nrt-api-guide:\n\nDeveloper Guide - NeuronX Runtime\n=================================\n\nThis guide is intended to support a deeper understanding of the Neuron Runtime and how ML applications are built using the Runtime APIs directly, and focuses on the information you need to know when building custom frameworks that call ``libnrt`` APIs directly from C/C++ apps. It is applicable to developers building their own ML frameworks; if you are using a popular existing framework such as PyTorch, JAX, or TensorFlow, the concepts and techniques discussed in this guide do not apply to your work.\n\n.. note::\n    The next few paragraphs provide a brief introduction to the Neuron hardware and the Neuron Runtime architecture. Customers who would rather skip this and jump straight to building their first ML\n    application which runs without the aid of an ML framework, should go to :ref:`first_app`.\n\n\nAbout the Neuron Runtime Library\n--------------------------------\n\nThe Neuron Runtime Library (``libnrt``) is the intermediate layer between an application and a framework, and the Neuron driver and Neuron Devices. It provides a C API for initializing the Neuron hardware, staging models and input data, executing inferences and training iterations on the staged models, and retrieving output data. The vast majority of ML applications running on Neuron will follow one of the following 3 architectural templates:\n\n\n.. figure:: ../images/neuron-rt-diagram.png\n\n    `Individual processes executing models on one or more Neuron Devices`\n\n.. figure:: ../images/neuron-rt-diagram-2.png\n\n    `Processes working together on executing models within the same instance - libnccom (The Neuron Collective Communication Library) handles inter-worker communication`\n\n\n.. figure:: ../images/neuron-rt-diagram-3.png\n\n    `Processes working together on executing models across multiple instances - libnccom, libfabric and the EFA driver handle communication`\n\n\n.. _reqs:\n\nRequirements\n------------\n\nA more comprehensive guide to installing Neuron software can be found in the :ref:`torch_quick_start` guide.\n\nThe Neuron Runtime requires the Neuron Driver, which is provided by the ``aws-neuron-dkms`` package. Run the commands below to install the driver for the indicated operating system:\n\nAL2023:\n\n.. code-block:: bash\n\n    sudo dnf install aws-neuronx-dkms\n\nUbuntu:\n\n.. code-block:: bash\n\n    sudo apt-get install aws-neuronx-dkms\n\n\n\nThe Runtime Library consists of the ``libnrt.so`` and header files.  These artifacts are version-controlled and installed via the ``aws-neuronx-runtime-lib`` package. After installing the package, you will find the compied library file (``libnrt.so``) in\n``/opt/aws/neuron/lib`` and the necessary header files to use the APIs it provides in ``/opt/aws/neuron/include``. Run the commands below to install the runtime library and headers for the indicated operating system:\n\nAL2023:\n\n.. code-block:: bash\n\n    sudo dnf install aws-neuronx-runtime-lib\n\nUbuntu:\n\n.. code-block:: bash\n\n    sudo apt-get install aws-neuronx-runtime-lib\n\nFor applications that use distributed training or distributed inferences, the Neuron Collective Communication Library is required. Run the commands below to the library for the indicated operating system:\n\nAL2023:\n\n.. code-block:: bash\n\n    sudo dnf install aws-neuronx-collectives\n\nUbuntu:\n\n.. 
code-block:: bash\n\n    sudo apt-get install aws-neuronx-collectives\n\n\nIn case of multi-instance training, you must also install the EFA driver and the Libfabric library (provided by the EFA installer). Run the command below to install it:\n\nAL2023 & Ubuntu:\n\n.. code-block:: bash\n\n    curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz\n    wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key\n    cat aws-efa-installer.key | gpg --fingerprint\n    wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig\n\n    tar -xvf aws-efa-installer-latest.tar.gz\n    cd aws-efa-installer && sudo bash efa_installer.sh --yes\n    cd\n    sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer\n\n\n.. _insttypes:\n\nIntroduction to Neuron Hardware\n-------------------------------\n\nNeuron Machine Learning Accelerators (or Neuron Devices) are custom accelerators designed to efficiently run  Machine Learning workloads such as inference using a given model or a distributed training job. Depending on the type of workload and its size, customers can opt for the following Neuron-equipped EC2 instances:\n\n.. list-table::\n    :widths: 40 40 40 40 40\n    :header-rows: 1\n\n    * - Workload type\n      - Neuron Device Name\n      - Instance type(s)\n      - Devices Per Instance\n      - Availability\n    * - Inference\n      - Inferentia II (v3)\n      - inf2.xlarge, inf2.8xlarge\n      - 1\n      - Available Now!\n    * - Inference\n      - Inferentia II (v3)\n      - inf2.24xlarge\n      - 6\n      - Available Now!\n    * - Inference\n      - Inferentia II (v3)\n      - inf2.48xlarge\n      - 12\n      - Available Now!\n    * - Inference\n      - Inferentia (v1)\n      - inf1.xlarge, inf1.2xlarge\n      - 1\n      - Available Now!\n    * - Inference\n      - Inferentia (v1)\n      - inf1.6xlarge\n      - 4\n      - Available Now!\n    * - Inference\n      - Inferentia (v1)\n      - inf1.24xlarge\n      - 16\n      - Available Now!\n    * - Training\n      - Trainium (v2)\n      - trn1.2xlarge\n      - 1\n      - Available Now!\n    * - Training\n      - Trainium (v2)\n      - trn1.32xlarge\n      - 16\n      - Available Now!\n\n\n\nNeuron Device\n^^^^^^^^^^^^^\n\nEach Neuron Device consists of multiple execution units called \"NeuronCores\". They use high-bandwidth device memory and PCIe interfaces to coordinate with the host CPU and other Neuron Devices and components (depending on the Neuron Device version).\n\nTo get the number of NeuronCores per Neuron Device, the amount of Neuron Device memory, and the way devices are directly connected, use the ``neuron-ls`` tool by running the following command:\n\n``neuron-ls --topology``\n\nIf successful, it will return output like this:\n\n.. 
code-block:: bash\n\n    instance-type: trn1.32xlarge\n    instance-id: i-0633517e496256bf8\n    +--------+--------+--------+---------------+---------+\n    | NEURON | NEURON | NEURON |   CONNECTED   |   PCI   |\n    | DEVICE | CORES  | MEMORY |    DEVICES    |   BDF   |\n    +--------+--------+--------+---------------+---------+\n    | 0      | 2      | 32 GB  | 12, 3, 4, 1   | 10:1c.0 |\n    | 1      | 2      | 32 GB  | 13, 0, 5, 2   | 10:1d.0 |\n    | 2      | 2      | 32 GB  | 14, 1, 6, 3   | a0:1c.0 |\n    | 3      | 2      | 32 GB  | 15, 2, 7, 0   | a0:1d.0 |\n    | 4      | 2      | 32 GB  | 0, 7, 8, 5    | 20:1b.0 |\n    | 5      | 2      | 32 GB  | 1, 4, 9, 6    | 20:1c.0 |\n    | 6      | 2      | 32 GB  | 2, 5, 10, 7   | 90:1b.0 |\n    | 7      | 2      | 32 GB  | 3, 6, 11, 4   | 90:1c.0 |\n    | 8      | 2      | 32 GB  | 4, 11, 12, 9  | 20:1d.0 |\n    | 9      | 2      | 32 GB  | 5, 8, 13, 10  | 20:1e.0 |\n    | 10     | 2      | 32 GB  | 6, 9, 14, 11  | 90:1d.0 |\n    | 11     | 2      | 32 GB  | 7, 10, 15, 8  | 90:1e.0 |\n    | 12     | 2      | 32 GB  | 8, 15, 0, 13  | 10:1e.0 |\n    | 13     | 2      | 32 GB  | 9, 12, 1, 14  | 10:1b.0 |\n    | 14     | 2      | 32 GB  | 10, 13, 2, 15 | a0:1e.0 |\n    | 15     | 2      | 32 GB  | 11, 14, 3, 12 | a0:1b.0 |\n    +--------+--------+--------+---------------+---------+\n    Neuron Device Topology\n          *        *        *        *\n          │        │        │        │\n          ▼        ▼        ▼        ▼\n    *––►[ 0 ]◄––►[ 1 ]◄––►[ 2 ]◄––►[ 3 ]◄––*\n          ▲        ▲        ▲        ▲\n          │        │        │        │\n          ▼        ▼        ▼        ▼\n    *––►[ 4 ]◄––►[ 5 ]◄––►[ 6 ]◄––►[ 7 ]◄––*\n          ▲        ▲        ▲        ▲\n          │        │        │        │\n          ▼        ▼        ▼        ▼\n    *––►[ 8 ]◄––►[ 9 ]◄––►[10 ]◄––►[11 ]◄––*\n          ▲        ▲        ▲        ▲\n          │        │        │        │\n          ▼        ▼        ▼        ▼\n    *––►[12 ]◄––►[13 ]◄––►[14 ]◄––►[15 ]◄––*\n          ▲        ▲        ▲        ▲\n          │        │        │        │\n          *        *        *        *\n\n\n|nd_v1|\n\n\nNeuronCore\n^^^^^^^^^^\n\nThe NeuronCore is the primary execution unit within the accelerator. Each NeuronCore contains several execution engines\n(for different types of compute operations such as tensor-based, vector, and scalar), Direct Memory Access (DMA) engines, and a local cache.\n\nA NeuronCore can operate independently or together with other NeuronCores, depending on the nature of the workload and the way\na model is compiled and loaded to the NeuronCores in the accelerator. Each execution engine can access the cache and DRAM attached to the accelerator device.\nData is transferred between the host CPU and the accelerator device (as well as between the device DRAM and NeuronCores) using DMA, which enables more efficient data movement.\n\nThe Neuron Runtime Architecture\n-------------------------------\n\n|nrt_arch|\n\nApplication Interface Layer (The ``libnrt`` API)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Application Interface Layer allows applications and frameworks to use the available Neuron Devices to run\ninference or training workloads. A complete reference of the C interface can be found in :ref:`nrt_api`.\n\nMonitoring and Profiling\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron Runtime is able to capture key execution metrics which can be read in real-time using ``neuron-monitor`` and\n``neuron-top``. 
``neuron-monitor`` allows forwarding those metrics to CloudWatch or a Prometheus server, enabling fleet-wide\nmonitoring - for more on that please refer to the ``neuron-monitor`` usage guide :ref:`neuron-monitor-ug`.\nProfiling an execution is another feature of the Neuron Runtime - which provides an API for starting and stopping profiling,\nas well as saving the profile data to a file, which can be used by tools such as the Neuron Tensorboard. This API is\ndocumented in :ref:`api_profile` section.\n\n.. _neff-format:\n\nThe NEFF format and NEFF Parser\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nA NEFF (Neuron Executable File Format) is a single file container for all the artifacts needed to execute a model on one or more NeuronCores.\nA NEFF is the output of the Neuron Compiler (neuron-cc). It contains Neuron machine instructions, pseudo instructions (compiler-generated instructions\nwhich are parsed and replaced with Neuron instructions by the Neuron Runtime when the model loads), tensor information, model parameters and other components\nthat support the model's execution on one or more NeuronCores.\n\nOperators that are not supported by Neuron can be compiled into CPU-executable binary and included into the NEFF as well.\n\nUsually there is only one subgraph (which is executed on a single NeuronCore) in a NEFF:\n\n.. code-block:: bash\n\n    NEFF Nodes:\n        NODE       Executor    Name        Variable       Size    Type    Format            Shape    DataType    TimeSeries\n           1    Neuron Core    sg00\n                                            image:0    3259008      IN      NHWC    [1 3 552 984]\n                                       net_output:0    1323972     OUT      NHWC    [1 78 69 123]                false\n\nIn this example, there is a single subgraph, one input, and one output:\n\n|nrt_neff_single|\n\nSome NEFFs can have multiple subgraphs (which are deployed by the runtime on separate NeuronCores) and multiple CPU operators, as demonstrated below:\n\n.. 
code-block:: bash\n\n    NEFF Nodes:\n        NODE       Executor                             Name               Variable    Size    Type    Format        Shape    DataType    TimeSeries\n           1    Neuron Core                             sg00\n                                                                            input:0       2      IN      NHWC    [1 1 1 1]\n                                                                         nn/relu1:0       2     OUT      NHWC    [1 1 1 1]                false\n           1    Neuron Core                             sg01\n                                                                         nn/relu1:0       2      IN      NHWC    [1 1 1 1]\n                                                                         nn/relu2:0       2     OUT      NHWC    [1 1 1 1]                false\n           2            CPU         fused_3_layout_transform\n                                                                layout_transform0:0       0     OUT                     []\n           4            CPU        fused_2_nn_conv2d_nn_relu\n                                                                          constant0       2      IN              [1 1 1 1]     float16\n                                                                         nn.relu0:0       0     OUT                     []\n           5            CPU    fused_1_layout_transform_copy\n                                                                         nn/relu3:0       0     OUT                     []\n           6    Neuron Core                             sg02\n                                                                         nn/relu3:0       2      IN      NHWC    [1 1 1 1]\n                                                                         nn/relu4:0       2     OUT      NHWC    [1 1 1 1]                false\n           6    Neuron Core                             sg03\n                                                                         nn/relu4:0       2      IN      NHWC    [1 1 1 1]\n                                                                        nn/output:0       2     OUT      NHWC    [1 1 1 1]                false\n\nThe output above is summarized by the graph below:\n\n|nrt_neff|\n\nThe nodes marked with dark blue are intermediate tensors that are handled internally by the Neuron Runtime.\nThe other blue nodes are inputs/outputs. The green colored box indicates the operator is executed on the NeuronCore while\nthe red color box indicates the execution is done on the CPU.\n\nThe NEFF layer in Neuron Runtime is responsible for parsing a NEFF, validating it, and translating pseudo instructions into hardware specific\ninstructions and DMA descriptors.\n\n\nGraph Walker and CPU Node Executor\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs shown in the previous section, a NEFF can contain one or more nodes. During execution, the Neuron Runtime Graph Walker executes each node\none by one and handles copying input and output between each of them. If a node needs to be executed by the CPU, then a corresponding library function, found\nin a .so file in the NEFF, is dynamically loaded using ``dlopen()`` during model load and executed during model execution. 
Since this library function is executed in the calling\nthread’s context, the workload can be efficiently parallelized using a multi-threaded approach.\n\nIn the example below, each invocation of ``nrt_execute()`` would take 23ms: the first CPU node takes 1ms, the NeuronCore execution takes 20ms and the second CPU node takes 2 ms,\nso the total latency is 23ms and the throughput is ~43 calls per second (1000/23).\n\n|nrt_neff_s|\n\nIf multiple threads are used, subsequent executions would be pipelined inside the runtime, hence increasing the throughput in this case to ~50 calls per second (1000/20).\n\n|nrt_neff_m|\n\nUser Mode Driver\n^^^^^^^^^^^^^^^^\n\nThis is the lowest level component of the Neuron Runtime and handles programming the engines, managing memory,\ncreating DMA descriptors to move data between host and device, handling notifications, etc.\n\nMemory Management\n~~~~~~~~~~~~~~~~~~\n\nThe Neuron Runtime is responsible for managing Neuron Device and host memory for the running models. The application is responsible for\ndeallocating every loaded model and allocated tensor, so the proper deallocation method needs to be called.\nFor more details, refer to the :ref:`nrt_api` documentation.\nTools such as ``neuron-top`` and ``neuron-monitor`` can be used to determine the amount of memory being used at any given time.\n\n\n.. _first_app:\n\n\nBuilding your first Neuron application\n----------------------------------------\n\nThe simple application presented here loads a NEFF file, uses the provided binary files' contents as input tensors, and saves the output tensors as\nbinary files. If a file isn't provided for an input tensor, that input tensor will be zero-filled.\n\nPrerequisites\n^^^^^^^^^^^^^^\n\nBefore you start, you must have the following available in your local environment:\n\n* A recent version of the GCC C++ compiler\n* An installation of the ``aws-neuronx-runtime-lib`` package as described in :ref:`reqs`\n\nRunning the built application requires:\n\n* A Neuron-equipped EC2 compute instance as shown in :ref:`insttypes`\n* Installing the ``aws-neuronx-runtime-lib`` and the ``aws-neuronx-dkms`` packages on the instance as described in :ref:`reqs`\n* A NEFF file\n\n\nObtain a NEFF file\n^^^^^^^^^^^^^^^^^^\n\nWhen you run a workload through a Neuron framework, the compiled NEFFs are placed in ``/var/tmp/neuron-compile-cache``.\nAdditionally, setting the ``NEURON_FRAMEWORK_DEBUG`` environment variable to ``1`` before running the workload enables\nthe compiled NEFFs to be written to the current directory.\n\nAuthor your code\n^^^^^^^^^^^^^^^^\n\nFor the purposes of this guide, use the code provided below. If you are developing your own application, review this code to understand how to use the Neuron Runtime in it.\n\n.. 
code-block:: c\n\n    #include <stdbool.h>\n    #include <nrt/nrt.h>\n    #include <nrt/nrt_experimental.h>\n\n    #include <stdio.h>\n    #include <string.h>\n    #include <stdlib.h>\n    #include <time.h>\n    #include <errno.h>\n    #include <sys/mman.h>\n    #include <sys/stat.h>\n    #include <pthread.h>\n    #include <fcntl.h>\n    #include <stdint.h>\n    #include <unistd.h>\n\n    // Function to mmap a file in the application's memory space,\n    // it will return a pointer to the mmapped memory and the size\n    // of the mmapped data will be written to *size\n    void *mmap_file(const char *filepath, size_t *size) {\n        struct stat sb;\n        int fd = open(filepath, O_RDONLY);\n        if (fd < 0 || fstat(fd, &sb) != 0) {\n            fprintf(stderr, \"Unable to open %s: %s\\n\", filepath, strerror(errno));\n            return MAP_FAILED;\n        }\n        *size = sb.st_size;\n        return mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);\n    }\n\n    #define P_ERR(...) fprintf(stderr, __VA_ARGS__)\n\n    #define CHECK_RESULT(res, expected, ...)    \\\n        if (res != expected) {                  \\\n            fprintf(stderr, __VA_ARGS__);       \\\n            exit(-1);                           \\\n        }\n\n    // struct used to load input tensors from files\n    typedef struct {\n        char *name;\n        size_t size;\n        void *data;\n    } input_tensor_info_t;\n\n    // simple container for input_tensor_info_t\n    typedef struct {\n        input_tensor_info_t *entries;\n        int entry_count;\n    } input_tensor_info_array_t;\n\n    // Allocate tensorsets and tensors based on the info_array and returns a valid tensorset in out_tset\n    // containing all the newly allocated tensors\n    NRT_STATUS allocate_tensors(nrt_tensor_info_array_t *info_array, nrt_tensor_usage_t usage_type, nrt_tensor_set_t **out_tset) {\n        NRT_STATUS result;\n        int tensor_idx;\n        nrt_tensor_info_t *tensor_info = NULL;\n        nrt_tensor_t *tensor = NULL;\n\n        // We allocate a nrt_tensor_set which acts as a containers for nrt_tensors\n        result = nrt_allocate_tensor_set(out_tset);\n        if (result != NRT_SUCCESS) {\n            P_ERR(\"Couldn't allocate %s tensorset\\n\", usage_type == NRT_TENSOR_USAGE_INPUT ? 
\"input\" : \"output\");\n        }\n\n        for (tensor_idx = 0; tensor_idx < info_array->tensor_count; tensor_idx++) {\n            tensor_info = &info_array->tensor_array[tensor_idx];\n            if (tensor_info->usage != usage_type) {\n                continue;\n            }\n            // Allocate the tensor with the name and size found in tensor_info_array\n            result = nrt_tensor_allocate(NRT_TENSOR_PLACEMENT_DEVICE, 0, tensor_info->size,\n                                         tensor_info->name, &tensor);\n            if (result != NRT_SUCCESS) {\n                P_ERR(\"Couldn't allocate tensor %s\\n\", tensor_info->name);\n                return result;\n            }\n            // Finally add the tensors to the newly allocated tensor set\n            result = nrt_add_tensor_to_tensor_set(*out_tset, tensor_info->name, tensor);\n            if (result != NRT_SUCCESS) {\n                P_ERR(\"Couldn't add tensor %s to tensorset\\n\", tensor_info->name);\n                return result;\n            }\n        }\n        return NRT_SUCCESS;\n    }\n\n    // Tensor iterator handler - returns false if the iteration needs to stop\n    typedef bool (*tensor_handler)(nrt_tensor_t *, nrt_tensor_info_t *, NRT_STATUS *, void *);\n\n    // Iterates through all the tensors in the given tensorset, based on the data in info_array for the given usage_type\n    // and calls the handler function with the provided args pointer\n    // Will return the first error returned by a handler\n    NRT_STATUS iterate_tensors(nrt_tensor_set_t *tset, nrt_tensor_info_array_t *info_array, nrt_tensor_usage_t usage_type,\n                               tensor_handler handler, void *args) {\n        NRT_STATUS result = NRT_SUCCESS;\n        NRT_STATUS final_result = NRT_SUCCESS;\n        int tensor_idx;\n        nrt_tensor_info_t *tensor_info = NULL;\n        nrt_tensor_t *tensor = NULL;\n\n        for (tensor_idx = 0; tensor_idx < info_array->tensor_count; tensor_idx++) {\n            tensor_info = &info_array->tensor_array[tensor_idx];\n            if (tensor_info->usage != usage_type) {\n                continue;\n            }\n            result = nrt_get_tensor_from_tensor_set(tset, tensor_info->name, &tensor);\n            if (result != NRT_SUCCESS) {\n                P_ERR(\"Tensor %s not found in tensor set\\n\", tensor_info->name);\n                continue;\n            }\n            result = NRT_SUCCESS;\n            if ((*handler)(tensor, tensor_info, &result, args) == false) {\n                return result;\n            }\n            if (final_result == NRT_SUCCESS && result != final_result) {\n                final_result = result;\n            }\n        }\n        return final_result;\n    }\n\n    // Tensor iteration handler that checks if a tensor has an input file associated with it\n    // based on the CLI args\n    bool handler_load_inputs(nrt_tensor_t *tensor, nrt_tensor_info_t *tensor_info, NRT_STATUS *result, void* args) {\n        NRT_STATUS res;\n        int idx;\n        input_tensor_info_array_t *info_array = (input_tensor_info_array_t *)args;\n        bool input_found = false;\n\n        for (idx = 0; idx < info_array->entry_count; idx++) {\n            if (strcmp(info_array->entries[idx].name, tensor_info->name) != 0) {\n                continue;\n            }\n            if (info_array->entries[idx].size != tensor_info->size) {\n                P_ERR(\"Input file for tensor %s has incorrect size %lu, expected %lu\\n\",\n                      tensor_info->name, 
info_array->entries[idx].size, tensor_info->size);\n                break;\n            }\n            res = nrt_tensor_write(tensor, info_array->entries[idx].data, 0, tensor_info->size);\n            if (res != NRT_SUCCESS) {\n                P_ERR(\"Unable to write content to input tensor %s\\n\", tensor_info->name);\n            } else {\n                input_found = true;\n            }\n        }\n        if (!input_found) {\n            fprintf(stderr, \"Input tensor %s will be zero-filled\\n\", tensor_info->name);\n        }\n        *result = NRT_SUCCESS;\n        return true;\n    }\n\n    // Tensor iteration handler that saves outputs\n    bool handler_save_outputs(nrt_tensor_t *tensor, nrt_tensor_info_t *tensor_info, NRT_STATUS *result, void* args) {\n        static char filename[280];\n\n        int fd;\n        // Allocating a buffer large enough to read the entire tensor\n        void *tensor_data = malloc(tensor_info->size);\n\n        *result = NRT_SUCCESS;\n        if (tensor_data == NULL) {\n            fprintf(stderr, \"Unable to allocate memory for saving output tensor %s\\n\", tensor_info->name);\n            *result = NRT_FAILURE;\n            return true;\n        }\n        // Reading the tensor to the newly allocated buffer\n        *result = nrt_tensor_read(tensor, tensor_data, 0, tensor_info->size);\n        if (*result != NRT_SUCCESS) {\n            fprintf(stderr, \"Unable to read tensor %s\\n\", tensor_info->name);\n            free(tensor_data);\n            return true;\n        }\n\n        // Saving the tensor to a file\n        snprintf(filename, 280, \"%s.out\", tensor_info->name);\n        fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);\n        if (fd < 0) {\n            fprintf(stderr, \"Unable to open %s for writing\\n\", filename);\n            free(tensor_data);\n            *result = NRT_FAILURE;\n            return true;\n        }\n        if (write(fd, tensor_data, tensor_info->size) != tensor_info->size) {\n            *result = NRT_FAILURE;\n            fprintf(stderr, \"Unable to write tensor %s contents to file %s\\n\", tensor_info->name, filename);\n        }\n        close(fd);\n\n        free(tensor_data);\n        return true;\n    }\n\n    // Tensor iteration handler that deallocates tensors\n    bool handler_free_tensor(nrt_tensor_t *tensor, nrt_tensor_info_t *tensor_info, NRT_STATUS *result, void* args) {\n        *result = NRT_SUCCESS;\n        nrt_tensor_free(&tensor);\n        return true;\n    }\n\n    int main(int argc, char *argv[]) {\n        NRT_STATUS result;\n        int idx = 0;\n        int tensor_idx = 0;\n        void *neff_data = NULL;\n        size_t neff_size = 0;\n        void *input_data = NULL;\n\n        input_tensor_info_array_t input_tensor_info_array = {0};\n        input_tensor_info_t *current_input = NULL;\n\n        nrt_model_t *model = NULL;\n        nrt_tensor_set_t *inputs = NULL;\n        nrt_tensor_set_t *outputs = NULL;\n\n        nrt_tensor_t *tensor = NULL;\n        nrt_tensor_info_array_t *tensor_info_array = NULL;\n\n        if (argc < 2) {\n            fprintf(stderr, \"Incorrect number of args, usage: exec_test file.neff [input_1_name] [input_1_file] ...\\n\");\n            exit(-1);\n        }\n\n        // Try mmapping the NEFF file first, so we can fail fast if not found or\n        // mmap fails\n        neff_data = mmap_file(argv[1], &neff_size);\n        if (neff_data == MAP_FAILED) {\n            fprintf(stderr, \"Unable to map file %s\\n\", argv[1]);\n            exit(-1);\n    
    }\n\n        // mmap input tensor files (if any provided) and fill the input_tensor_info array\n        if (argc > 3) {\n            input_tensor_info_array.entries = malloc((argc - 2 / 2) * sizeof(input_tensor_info_t));\n            for (idx = 2; idx < argc; idx += 2) {\n                if (idx + 1 >= argc) {\n                    break;\n                }\n                current_input = &input_tensor_info_array.entries[input_tensor_info_array.entry_count];\n                input_data = mmap_file(argv[idx + 1], &current_input->size);\n                if (input_data == MAP_FAILED) {\n                    fprintf(stderr, \"Unable to mmap inputs file %s\\n\", argv[idx + 1]);\n                    continue;\n                }\n                current_input->name = argv[idx];\n                current_input->data = input_data;\n                input_tensor_info_array.entry_count++;\n            }\n        }\n\n        // Before calling any nrt API, nrt_init must be called\n        // Since this is not running as part of a framework, the correct parameter for 'framework' is\n        // NRT_FRAMEWORK_TYPE_NO_FW and the others can be empty strings\n        result = nrt_init(NRT_FRAMEWORK_TYPE_NO_FW, \"\", \"\");\n        CHECK_RESULT(result, NRT_SUCCESS, \"NRTLIB could not be initialized, error: %d\\n\", (int)result);\n\n        // Loading the NEFF\n        printf(\"Loading NEFF\\n\");\n        result = nrt_load(neff_data, neff_size, -1, -1, &model);\n        CHECK_RESULT(result, NRT_SUCCESS, \"Unable to load NEFF\\n\");\n\n        // In order to allocate tensors, first we need to call nrt_get_model_tensor_info which\n        // will give us the model tensors' names and sizes in tensor_info_array\n        printf(\"Getting IO tensor information\\n\");\n        result = nrt_get_model_tensor_info(model, &tensor_info_array);\n        CHECK_RESULT(result, NRT_SUCCESS, \"Unable to get model tensor information\\n\");\n\n        // Allocating tensors\n        printf(\"Creating I/O data (%ld tensors)\\n\", tensor_info_array->tensor_count);\n        result = allocate_tensors(tensor_info_array, NRT_TENSOR_USAGE_INPUT, &inputs);\n        CHECK_RESULT(result, NRT_SUCCESS, \"Error allocating input tensors\\n\");\n        result = allocate_tensors(tensor_info_array, NRT_TENSOR_USAGE_OUTPUT, &outputs);\n        CHECK_RESULT(result, NRT_SUCCESS, \"Error allocating input tensors\\n\");\n\n        // Loading input files (if provided)\n        iterate_tensors(inputs, tensor_info_array, NRT_TENSOR_USAGE_INPUT, handler_load_inputs,\n                        (void*) &input_tensor_info_array);\n\n        // Executing model using the tensors in the inputs tensorset and writing the outputs to the tensors\n        // in the outputs tensorset\n        result = nrt_execute(model, inputs, outputs);\n        CHECK_RESULT(result, NRT_SUCCESS, \"Error during model execution: %d\\n\", result);\n\n        // Saving outputs to files\n        result = iterate_tensors(outputs, tensor_info_array, NRT_TENSOR_USAGE_OUTPUT, handler_save_outputs, NULL);\n        if (result != NRT_SUCCESS) {\n            P_ERR(\"Error saving outputs to files\\n\");\n        }\n\n        // Unloading the model\n        result = nrt_unload(model);\n        if (result != NRT_SUCCESS) {\n            P_ERR(\"Unable to unload NEFF\\n\");\n        }\n\n        printf(\"Freeing tensors\\n\");\n        iterate_tensors(inputs, tensor_info_array, NRT_TENSOR_USAGE_INPUT, handler_free_tensor, NULL);\n        iterate_tensors(outputs, tensor_info_array, 
NRT_TENSOR_USAGE_OUTPUT, handler_free_tensor, NULL);\n\n        nrt_destroy_tensor_set(&inputs);\n        nrt_destroy_tensor_set(&outputs);\n\n        printf(\"Deallocating model tensor info\\n\");\n        // We are done with the tensor_info_array, we can dispose of it\n        nrt_free_model_tensor_info(tensor_info_array);\n\n        printf(\"Deallocating inputs tensor info\\n\");\n        // Unmapping the input files\n        for (tensor_idx = 0; tensor_idx < input_tensor_info_array.entry_count; tensor_idx++) {\n            munmap(input_tensor_info_array.entries[tensor_idx].data, input_tensor_info_array.entries[tensor_idx].size);\n        }\n        if (input_tensor_info_array.entries) {\n            free(input_tensor_info_array.entries);\n        }\n\n        // Clean-up the runtime\n        printf(\"Cleaning up the runtime\\n\");\n        nrt_close();\n\n        printf(\"DONE\\n\");\n    }\n\n\n\n\nBuilding the example:\n\n.. code-block:: bash\n\n    gcc run_neff.c -o run_neff -lnrt -pthread -I/opt/aws/neuron/include -L/opt/aws/neuron/lib\n\n\nRunning the example:\n\n.. code-block:: bash\n\n    ./run_neff my.neff [input_1] [input_1.bin] [input_2] [input_2.bin] ...\n\n\nReview of the example code\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section breaks down the code example above to better illustrate the structure, flow, and calls in it.\n\nInitialization and cleanup\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code-block:: c\n\n    // ...\n    result = nrt_init(NRT_FRAMEWORK_TYPE_NO_FW, \"\", \"\");\n    // ...\n    nrt_close();\n\n\nThe Neuron Runtime is initialized by calling ``nrt_init`` and all applications should call ``nrt_close`` once they're done\nusing it. For more details on these functions, go to the :ref:`api_init` section.\n\nLoading the NEFF\n~~~~~~~~~~~~~~~~\n\nOnce the contents of a NEFF file have been mapped to virtual memory using ``mmap``,  load the NEFF with ``nrt_load``.\n\n.. code-block:: c\n\n    // ...\n    void *mmap_file(const char *filepath, size_t *size) {\n        struct stat sb;\n        int fd = open(filepath, O_RDONLY);\n        if (fd < 0 || fstat(fd, &sb) != 0) {\n            fprintf(stderr, \"Unable to open %s: %s\\n\", filepath, strerror(errno));\n            return MAP_FAILED;\n        }\n        *size = sb.st_size;\n        return mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);\n    }\n    // ...\n    neff_data = mmap_file(argv[1], &neff_size);\n\n\nThe runtime will decide the optimal placement for the model. Specifically, it chooses the optimal NeuronCore on which to deploy the model.\n\n.. code-block:: c\n\n    // ...\n    result = nrt_load(neff_data, neff_size, -1, -1, &model);\n    // ...\n\n\nThe call to ``nrt_load`` returns a valid model handle of type ``nrt_model_t*``, which you can use for other calls to the Runtime API (such as ``nrt_execute``).\n\nFor more details on the model API (including ``nrt_load``), see :ref:`api_model`.\n\n\nCreating input/output tensors\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe main container for tensors is the ``nrt_tensor_set_t*``. Tensors (``nrt_tensor_t*``) are not passed directly to the NEFF execution function (``nrt_execute``); rather,\nthey have to be wrapped as ``nrt_tensor_set_t*``. The ``allocate_tensors`` function allocates the tensorset and the tensors for the requested usage type\n(``NRT_TENSOR_USAGE_INPUT`` or ``NRT_TENSOR_USAGE_OUTPUT``) and returns the tensorset containing the allocated tensors in ``out_tset``.\n\n.. 
code-block:: c\n\n    NRT_STATUS allocate_tensors(nrt_tensor_info_array_t *info_array, nrt_tensor_usage_t usage_type, nrt_tensor_set_t **out_tset) {\n        // ...\n        // We allocate a nrt_tensor_set which acts as a containers for nrt_tensors\n        result = nrt_allocate_tensor_set(out_tset);\n        // ...\n\n        for (tensor_idx = 0; tensor_idx < info_array->tensor_count; tensor_idx++) {\n            tensor_info = &info_array->tensor_array[tensor_idx];\n            if (tensor_info->usage != usage_type) {\n                continue;\n            }\n            // ...\n            // Allocate the tensor with the name and size found in tensor_info_array\n            result = nrt_tensor_allocate(NRT_TENSOR_PLACEMENT_DEVICE, 0, tensor_info->size,\n                                         tensor_info->name, &tensor);\n            // ...\n            // Finally add the tensors to the newly allocated tensor set\n            result = nrt_add_tensor_to_tensor_set(*out_tset, tensor_info->name, tensor);\n            // ...\n        }\n        // ...\n    }\n\n\nIterating through tensors in an nrt_tensor_set_t\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA helper function, ``iterate_tensors``, is defined and implemented to iterate through the ``nrt_tensor_t`` values in a tensorset and call the function\n``handler`` for each of them. If the handler function returns ``false``, iteration ends. ``iterate_tensors`` returns the first error\nreported by the handler function.\n\n.. code-block:: c\n\n    // Tensor iterator handler - returns false if the iteration needs to stop\n    typedef bool (*tensor_handler)(nrt_tensor_t *, nrt_tensor_info_t *, NRT_STATUS *, void *);\n\n    NRT_STATUS iterate_tensors(nrt_tensor_set_t *tset, nrt_tensor_info_array_t *info_array, nrt_tensor_usage_t usage_type,\n                               tensor_handler handler, void *args) {\n    // ...\n    for (tensor_idx = 0; tensor_idx < info_array->tensor_count; tensor_idx++) {\n        // ...\n        result = nrt_get_tensor_from_tensor_set(tset, tensor_info->name, &tensor);\n        // ...\n        if ((*handler)(tensor, tensor_info, &result, args) == false) {\n            return result;\n        }\n        // ...\n    }\n\n\nDeallocating input/output tensors\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAfter the execution is complete, the tensors are deallocated using ``iterate_tensors`` and the tensorsets are deallocated\nusing ``nrt_destroy_tensor_set``:\n\n.. code-block:: c\n\n    iterate_tensors(inputs, tensor_info_array, NRT_TENSOR_USAGE_INPUT, handler_free_tensor, NULL);\n    iterate_tensors(outputs, tensor_info_array, NRT_TENSOR_USAGE_OUTPUT, handler_free_tensor, NULL);\n\n    nrt_destroy_tensor_set(&inputs);\n    nrt_destroy_tensor_set(&outputs);\n\n\nThe ``handler_free_tensor`` function simply deallocates the given tensor:\n\n.. code-block:: c\n\n    bool handler_free_tensor(nrt_tensor_t *tensor, nrt_tensor_info_t *tensor_info, NRT_STATUS *result, void* args) {\n        // ...\n        nrt_tensor_free(&tensor);\n        // ...\n    }\n\nExecuting the NEFF\n~~~~~~~~~~~~~~~~~~\n\nExecute the NEFF by calling ``nrt_execute``. If ``nrt_execute`` completes successfully, the output tensors are\nread and saved to files (one binary file per output tensor) using ``iterate_tensors``:\n\n.. 
code-block:: c\n\n    // Executing model using the tensors in the inputs tensorset and writing the outputs to the tensors\n    // in the outputs tensorset\n    result = nrt_execute(model, inputs, outputs);\n    // ...\n    // Saving outputs to files\n    result = iterate_tensors(outputs, tensor_info_array, NRT_TENSOR_USAGE_OUTPUT, handler_save_outputs, NULL);\n\n\nThe iteration handler reads the tensor data and writes it to a file with the same name as the tensor:\n\n.. code-block:: c\n\n    bool handler_save_outputs(nrt_tensor_t *tensor, nrt_tensor_info_t *tensor_info, NRT_STATUS *result, void* args) {\n        // ...\n        void *tensor_data = malloc(tensor_info->size);\n        // ...\n        // Reading the tensor to the newly allocated buffer\n        *result = nrt_tensor_read(tensor, tensor_data, 0, tensor_info->size);\n        // ...\n\n        // Saving the tensor to a file\n        snprintf(filename, 280, \"%s.out\", tensor_info->name);\n        fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);\n        // ...\n        if (write(fd, tensor_data, tensor_info->size) != tensor_info->size) {\n            // ...\n        }\n        close(fd);\n\n\nFor more details on the execution API, go to the :ref:`api_exec` section.\n\n\n.. _nrt_api:\n\nThe LIBNRT API\n------------------\n\nAPI Return Codes\n^^^^^^^^^^^^^^^^\n\nAll API calls return an ``NRT_STATUS`` value representing the return status of the call. In case of an error, an error message\nis logged (based on the logging settings). The table below contains all the possible error codes. Note that some error codes only apply to certain API calls. \nFor more details on these errors, refer the :ref:`nrt-troubleshooting` docs.\n\n.. list-table::\n    :widths: 40 20 260\n    :header-rows: 1\n\n    * - Name\n      - Return Code\n      - Error\n    * - ``NRT_SUCCESS``\n      - 0\n      - Call was successful\n    * - ``NRT_FAILURE``\n      - 1\n      - Generic failure\n    * - ``NRT_INVALID``\n      - 2\n      - Invalid NEFF, bad instruction, bad DMA descriptor, input tensor name/size does not match the model, etc.\n    * - ``NRT_INVALID_HANDLE``\n      - 3\n      - Invalid handle (e.g. an invalid model handle)\n    * - ``NRT_RESOURCE``\n      - 4\n      - Failed to allocate a resource for the requested operation\n    * - ``NRT_TIMEOUT``\n      - 5\n      - Operation timed out\n    * - ``NRT_HW_ERROR``\n      - 6\n      - Hardware failure\n    * - ``NRT_QUEUE_FULL``\n      - 7\n      - Too many pending ``nrt_execute()`` requests. The runtime request queue is full. 
Cannot enqueue more ``nrt_execute()`` requests\n    * - ``NRT_LOAD_NOT_ENOUGH_NC``\n      - 9\n      - The number of available NeuronCores is insufficient for the requested operation\n    * - ``NRT_UNSUPPORTED_NEFF_VERSION``\n      - 10\n      - NEFF version unsupported\n    * - ``NRT_UNINITIALIZED``\n      - 13\n      - Returned when attempting an API call when the library is not initialized\n    * - ``NRT_CLOSED``\n      - 14\n      - Returned when attempting an API call after ``nrt_close()`` was called\n    * - ``NRT_EXEC_BAD_INPUT``\n      - 1002\n      - Invalid input has been submitted to nrt_execute()\n    * - ``NRT_EXEC_COMPLETED_WITH_NUM_ERR``\n      - 1003\n      - Execution completed with numerical errors (produced NaN)\n    * - ``NRT_EXEC_COMPLETED_WITH_ERR``\n      - 1004\n      - Execution was completed with other errors, either logical (event double clear), or hardware (parity error)\n    * - ``NRT_EXEC_NC_BUSY``\n      - 1005\n      - The NeuronCore is locked (in use) by another model/thread\n    * - ``NRT_OOB``\n      - 1006\n      - One or more indirect memcopies and/or embedding updates are out of bound due to input corruptions\n    * - ``NRT_EXEC_HW_ERR_COLLECTIVES``\n      - 1200\n      - Suspected hang in collectives operation due to hardware errors on this or other workers.\n    * - ``NRT_EXEC_HW_ERR_HBM_UE``\n      - 1201\n      - HBM Unrepairable Uncorrectable hardware error caused incorrect results \n    * - ``NRT_EXEC_HW_ERR_NC_UE``\n      - 1202\n      - NeuronCore parity errors caused incorrect results\n    * - ``NRT_EXEC_HW_ERR_DMA_ABORT``\n      - 1203\n      - The DMA engine encountered an error and halted execution\n    * - ``NRT_EXEC_SW_NQ_OVERFLOW``\n      - 1204\n      - Execution timed out due to dropped notifications. Likely caused by Hardware DMA Generation Engine (DGE) notifications\n    * - ``NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE``\n      - 1205\n      - HBM Repairable Uncorrectable hardware error caused incorrect results\n    * - ``NRT_NETWORK_PROXY_FAILURE``\n      - 1206\n      - Network communication failed between Neuron hardware and Collectives\n\n\n.. _api_init:\n\nInitialization, configuration and teardown\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. c:function:: NRT_STATUS nrt_init(nrt_framework_type_t framework, const char *fw_version, const char *fal_version)\n\n    Initializes the Neuron Runtime’s internal state and the Neuron hardware’s state.\n    This should be called before any other nrt_* call is attempted, although a small set of functions\n    are exempt from this rule (such as ``nrt_get_total_nc_count`` and ``get_nrt_version``). 
Any call to the NRT\n    library API that requires initialization will return ``NRT_FAILURE`` if ``nrt_init`` has not been called beforehand.\n\n    The runtime is configured by setting the appropriate environment variable before this API call.\n    The list of available environment variables is found in the :ref:`api_config` section.\n\n    :param framework: Can be one of:\n\n        ``NRT_FRAMEWORK_TYPE_INVALID,                 // Invalid framework\n        NRT_FRAMEWORK_TYPE_NO_FW,                   // No framework\n        NRT_FRAMEWORK_TYPE_TENSORFLOW,              // Tensorflow\n        NRT_FRAMEWORK_TYPE_PYTORCH,                 // Pytorch\n        NRT_FRAMEWORK_TYPE_MXNET                    // Mxnet``\n\n        This argument is used by our Neuron Tools to determine the type of application running;\n        it has no other impact on the functioning of the runtime.\n        Applications using a custom framework or calling the Neuron Runtime directly should use ``NRT_FRAMEWORK_TYPE_NO_FW``.\n\n    :param const char *fw_version: version of the framework on top of which this runtime is running\n    :param const char *fal_version: version of the framework adapter on top of which this runtime is running\n\n    Applications using `NRT_FRAMEWORK_TYPE_NO_FW` for the first argument should use two empty strings for the versions.\n\n\n.. _api_config:\n\nEnvironment variables used to configure the Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``NEURON_RT_LOG_LOCATION=<CONSOLE/SYSLOG>, default=CONSOLE``\n    Chooses the output target for the Neuron Runtime logs (either console or syslog).\n\n``NEURON_RT_LOG_LEVEL=<ERROR/WARN/INFO/DEBUG/TRACE>, default=ERROR``\n    Specifies the logging verbosity for the Neuron Runtime library, from ERROR (least verbose), to TRACE (most verbose).\n\n``NEURON_RT_NUM_CORES=<n>``\n    Specifies how many NeuronCores are needed for the application. During ``nrt_init`` the requested number of NeuronCores are **exclusively** associated with the calling process and\n    become unavailable to any other process attempting to use them. If there aren't enough NeuronCores available, ``nrt_init`` will return an error. Once the owner process has called ``nrt_close``\n    or exited, the NeuronCores are released and become available to be associated with another process. By default, all NeuronCores present on the instance will be made available to the caller.\n\n\n``NEURON_RT_VISIBLE_CORES=<m,n,p-q>``\n    Similar to the previous variable, it allows the calling process to get exclusive access to a set of NeuronCores, but it allows explicitly specifying which NeuronCores are available for the application based on their zero-based indices.\n    This variable can be a list of NeuronCores, for example: ``NEURON_RT_VISIBLE_CORES=3,4,5,6``, a range of NeuronCores, for example: ``NEURON_RT_VISIBLE_CORES=3-6``, or a combination of both: ``NEURON_RT_VISIBLE_CORES=3-5,6``.\n    The resulting range must be contiguous, for example this is not valid: ``NEURON_RT_VISIBLE_CORES=3,5,6`` because 4 is missing from the list, and indices need to be provided in consecutive increasing order.\n\n\n    .. 
note::\n\n        If both ``NEURON_RT_VISIBLE_CORES`` and ``NEURON_RT_NUM_CORES`` are defined, ``NEURON_RT_VISIBLE_CORES`` will be used.\n\n\n``NEURON_RT_ROOT_COMM_ID=<ip_address:port>``\n    Mandatory for applications that run workloads containing Collective Communication operators; allows specifying the IP address and port of the rank 0 worker in the Collective Compute worker pool.\n    For example: ``NEURON_RT_ROOT_COMM_ID=10.0.1.2:46820``.\n\n\n``NEURON_RT_STOCHASTIC_ROUNDING_SEED=<value>``\n    Allows setting a value for the stochastic rounding seed. Has no effect on inf1.\n\n\n``NEURON_RT_DEBUG_MEMLOG_MAX_SIZE=<value>, default=1024*1024``\n    Allows changing the number of entries in the memory allocations log. This log contains an entry for every allocation and deallocation and will be dumped to a file in CSV format in case of a memory allocation failure.\n\n\n.. c:function::  NRT_STATUS nrt_close()\n\n    Closes all the devices used by the application (as defined by ``NEURON_RT_NUM_CORES``/``NEURON_RT_VISIBLE_CORES``)\n    and cleans up the runtime state. Note that once ``nrt_close`` has been called, most nrt_* API calls will fail if attempted.\n\n\n.. _api_model:\n\nThe Model API\n^^^^^^^^^^^^^\n\n.. c:function:: NRT_STATUS nrt_load(const void *neff_bytes, size_t size, int32_t start_nc, int32_t nc_count, nrt_model_t **model)\n\n    Loads a NEFF file whose content is found in `neff_bytes`, with the given size, placing it on ``nc_count`` NeuronCores starting with NeuronCore index `start_nc`.\n    If either ``nc_count`` or ``start_nc`` is -1, an optimal value for each will be determined automatically. The model can be configured using a list of environment\n    variables read inside this API call which can be found in the :ref:`model_env` section. It returns a handle to the loaded model in the ``nrt_model_t*``\n    pointer if the call succeeds. The returned handle represents the loaded model and can be used with calls that operate on an ``nrt_model_t*`` (such as ``nrt_execute``).\n\n\n    :param neff_bytes: Pointer to existing NEFF file data\n    :param size: Size of data in ``neff_bytes``\n    :param start_nc: Index of the NeuronCore on which to stage the model. The first NeuronCore owned by the application will always have the index ``0`` - for example, even when setting ``NEURON_RT_VISIBLE_CORES=3,4``, the two NeuronCores will be referred to as ``0`` and ``1``. If -1, an optimal index will be automatically determined (based on current NeuronCore usage).\n    :param nc_count: Number of NeuronCores on which to stage the model. If its value is a multiple of the number of NeuronCores needed by the model, the model will be replicated on the number of NeuronCores specified in the argument. This feature is called **TBD** and it will be explained in detail in a separate section. If its value is -1, the model will be staged a single time, using the number of cores needed by a single instance of the model.\n    :param model: Model handle returned by the call which can be passed to other functions that operate on models (such as ``nrt_execute``).\n\n\n.. 
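note::\n\n    A minimal usage sketch (not part of the tutorial above) showing explicit placement with ``nrt_load``; it assumes ``neff_data`` and ``neff_size`` already hold a mapped NEFF that needs a single NeuronCore:\n\n    .. code-block:: c\n\n        nrt_model_t *model = NULL;\n        // Stage one instance of the model on NeuronCore index 0\n        NRT_STATUS status = nrt_load(neff_data, neff_size, 0, 1, &model);\n        if (status != NRT_SUCCESS) {\n            fprintf(stderr, \"nrt_load failed with status %d\\n\", (int)status);\n        }\n\n.. 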
_model_env:\n\nEnvironment variables used to configure a model being loaded\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``NEURON_RT_EXEC_TIMEOUT=<n>, default=30 (inf1), default=600(trn1,inf2)``\n    Maximum of time, in seconds, allowed for one execution before timing out - which will cause the call to ``nrt_execute`` to fail and return ``NRT_TIMEOUT``.\n\n``NEURON_RT_VALIDATE_HASH=<true/false>, default=false``\n    Verify the integrity of NEFF data being loaded by checking against a checksum found in the header.\n\n``NEURON_RT_STOCHASTIC_ROUNDING_EN=<true/false>, default=false``\n    Enable stochastic rounding.\n\n\n.. c:function:: NRT_STATUS nrt_load_collectives(const void *neff_bytes, size_t size, int32_t start_nc, int32_t nc_count, uint32_t g_device_id, uint32_t g_device_count, nrt_model_t **model)\n\n    Same as ``nrt_load`` (same environment variables can be used to configure the model), but must be used when loading NEFFs containing Collective Communication operators. Uses the same arguments as `nrt_load`, but adds 2 extra ones.\n\n    :param neff_bytes: Pointer to existing NEFF file data\n    :param size: Size of data in ``neff_bytes``\n    :param start_nc: Index of NeuronCore on which to stage the model. If -1, an optimal index will be automatically determined (based on current NeuronCore usage).\n    :param nc_count: Number of NeuronCores on which to stage the model. If its value is a multiple of the amount of NeuronCores needed by the model, the model will be replicated on the number of NeuronCores specified in the argument. This feature is called **TBD** and it will be explained in detail in a separate section. If its value is -1, the model will be staged a single time, using the number of cores needed by a single instance of the model.\n    :param g_device_id: Globally unique ID within the Collective Communication world associated with this model instance.\n    :param g_device_count: Size of the Collective Communication world (total number of participating unique IDs).\n    :param model: Model handle returned by the call which can be passed to other functions that operate on models (such as ``nrt_execute``).\n\n\n.. c:function:: NRT_STATUS nrt_unload(nrt_model_t *model)\n\n    Unloads the given model and frees up device and host resources.\n\n    :param model: Pointer to model to unload. All data associated with the model is deleted, do not reuse the pointer or try to deallocate it afterwards. Do not call ``nrt_unload`` again on the same ``nrt_model_t*`` pointer (think of it as a call to `free()`).\n\n\n.. c:function:: NRT_STATUS nrt_get_model_nc_count(const nrt_model_t *model, uint32_t *nc_count)\n\n    Gets the number of NeuronCores used by the model and writes that value at the address pointed by ``nc_count``.\n\n    :param model: Valid pointer to an ``nrt_model_t``.\n    :param nc_count: If the call completes successfully, the pointed address will contain the number of NeuronCores used by the model.\n\n\n.. c:function:: NRT_STATUS nrt_get_model_tensor_info(nrt_model_t *model, nrt_tensor_info_array_t **tensor_info)\n\n    Gets input/output tensor information for a given loaded model.\n\n    :param model: Valid pointer to an ``nrt_model_t``.\n    :param tensor_info: Pointer to a ``nrt_tensor_info_array_t*`` which will contain the tensor information data. 
The function allocates memory for the structure internally which can only be correctly freed by calling ``nrt_free_model_tensor_info``.\n        The ``nrt_tensor_info_array_t`` struct and its dependencies are defined as follows:\n\n        .. code-block:: c\n\n            typedef struct nrt_tensor_info_array {\n                uint64_t tensor_count;              // Total number of input/output tensors used by the model\n                nrt_tensor_info_t tensor_array[];   // Array of tensor info representing those tensors\n            } nrt_tensor_info_array_t;\n\n            typedef struct nrt_tensor_info {\n                char name[NRT_TENSOR_NAME_MAX];     // Name of the tensor\n                nrt_tensor_usage_t usage;           // Type of the tensor\n                size_t size;                        // Tensor size in bytes\n                nrt_dtype_t dtype;                  // Data type\n                uint32_t *shape;                    // An array representing data shape\n                uint32_t ndim;                      // The number of dimensions (number of elements in the shape array)\n            } nrt_tensor_info_t;\n\n            // Usage type definitions for tensors\n            typedef enum nrt_tensor_usage {\n                NRT_TENSOR_USAGE_INPUT = 0,     // Tensor is used for input\n                NRT_TENSOR_USAGE_OUTPUT,        // Tensor is used for output\n            } nrt_tensor_usage_t;\n\n            // Data type definitions for tensors\n            typedef enum nrt_dtype {\n                NRT_DTYPE_UNKNOWN = 0,\n                NRT_DTYPE_FLOAT32,\n                NRT_DTYPE_FLOAT16,\n                NRT_DTYPE_BFLOAT16,\n                NRT_DTYPE_INT8,\n                NRT_DTYPE_UINT8,\n                NRT_DTYPE_INT16,\n                NRT_DTYPE_UINT16,\n                NRT_DTYPE_INT32,\n                NRT_DTYPE_UINT32,\n                NRT_DTYPE_INT64,\n                NRT_DTYPE_UINT64\n            } nrt_dtype_t;\n\n\n.. c:function:: NRT_STATUS nrt_free_model_tensor_info(nrt_tensor_info_array_t *tensor_info)\n\n    Frees a ``nrt_tensor_info_array_t`` allocated by a call to ``nrt_get_model_tensor_info``. As with all deallocation functions, don’t call it more than once on the same pointer.\n\n    :param tensor_info: ``nrt_tensor_info_array_t`` to deallocate.\n\n\n.. c:function:: NRT_STATUS nrt_get_model_instance_count(nrt_model_t *model, uint32_t *instance_count)\n\n    Returns the number of times this `nrt_model_t `is currently staged on the NeuronDevice(s) by writing it to the address pointed by ``instance_count``. It will always be >= 1. This value can be used to determine the number of threads that can optimally call ``nrt_execute`` on this ``nrt_model_t``.\n\n    :param model: Valid pointer to an ``nrt_model_t``.\n    :param instance_count: If the call completes successfully, the address will contain the instance count for this model\n\n\n.. _api_tensor:\n\nThe Tensor API\n^^^^^^^^^^^^^^\n\n\n.. c:function:: NRT_STATUS nrt_tensor_allocate(nrt_tensor_placement_t tensor_placement, int logical_nc_id, size_t size, const char *name, nrt_tensor_t **tensor)\n\n    Allocates a new tensor, placing it in either host virtual memory or device memory (based on the ``tensor_placement`` argument), on the specified NeuronCore index, of a given size, and attaches the given name to it - the name is only used for log messages.\n    For applications running on Inferentia, ``tensor_placement`` should always be ``NRT_TENSOR_PLACEMENT_VIRTUAL``. 
For all other cases, ``NRT_TENSOR_PLACEMENT_DEVICE`` should be used. If successful, the ``tensor`` address will contain a valid pointer to the newly allocated ``nrt_tensor_t``.\n    (deprecated) ``tensor_placement`` set to ``NRT_TENSOR_PLACEMENT_HOST`` will allocate tensors in physical host memory. Tensors allocated with ``NRT_TENSOR_PLACEMENT_HOST`` cannot be larger than 4MB, the Kernel physical page size limit. We restrict tensors to a single page of host memory to simplify the generation of DMA descriptors during pre-execution setup.\n\n    :param tensor_placement: Controls where the tensor will be placed; the definition of the ``nrt_tensor_placement_t`` enum is as follows:\n\n        .. code-block:: c\n\n            typedef enum {\n                NRT_TENSOR_PLACEMENT_DEVICE,    // the tensor is allocated directly in device memory\n                NRT_TENSOR_PLACEMENT_HOST,      // (deprecated) the tensor is allocated in DMAable host memory (only for sizes < 4MB)\n                NRT_TENSOR_PLACEMENT_VIRTUAL    // the tensor is allocated in host memory\n            } nrt_tensor_placement_t;\n\n    :param int logical_nc_id: Zero-based NeuronCore index on which to allocate the tensor (if ``tensor_placement`` is ``NRT_TENSOR_PLACEMENT_DEVICE``) or with which to associate the tensor in all other cases.\n    :param size: Size for the new tensor.\n    :param name: Name for the new tensor.\n    :param tensor: If the call completes successfully, the address will contain a valid ``nrt_tensor_t*`` pointer.\n\n\n\n.. c:function:: void nrt_tensor_free(nrt_tensor_t **tensor)\n\n    Frees a tensor allocated by a call to ``nrt_tensor_allocate`` and sets the ``nrt_tensor_t*`` pointer at address ``tensor`` to NULL.\n\n    :param tensor: Pointer to a pointer to a previously allocated ``nrt_tensor_t``. After the call returns, the ``nrt_tensor_t*`` pointer will be NULL.\n\n\n\n.. c:function:: NRT_STATUS nrt_tensor_read(const nrt_tensor_t *tensor, void *buf, size_t offset, size_t size)\n\n    Reads ``size`` bytes of data from a given tensor, starting at ``offset``, to ``buf`` starting at offset 0. ``buf`` needs to be allocated with a size of at least ``size`` bytes.\n\n    :param tensor: Valid pointer to an ``nrt_tensor_t``.\n    :param buf: Buffer to which the read data is written; it needs to be at least ``size`` bytes in size.\n    :param offset: Offset within the tensor from which to begin reading.\n    :param size: Size to read.\n\n\n\n.. c:function:: NRT_STATUS nrt_tensor_write(nrt_tensor_t *tensor, const void *buf, size_t offset, size_t size)\n\n    Writes ``size`` bytes of data to a given tensor, starting at ``offset``, from ``buf`` (starting at offset 0).\n\n    :param tensor: Valid pointer to an ``nrt_tensor_t``.\n    :param buf: Buffer containing ``size`` bytes of data to write to the tensor.\n    :param offset: Offset within the tensor from which to begin writing.\n    :param size: Size to write.\n\n\n.. 
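note::\n\n    A minimal sketch (hypothetical tensor name, size and data) of a host-to-device round trip using ``nrt_tensor_allocate``, ``nrt_tensor_write`` and ``nrt_tensor_read``:\n\n    .. code-block:: c\n\n        float host_in[256] = {0};   // hypothetical input data\n        float host_out[256];\n        nrt_tensor_t *t = NULL;\n        // Allocate a device tensor on NeuronCore 0, large enough for the host buffer\n        if (nrt_tensor_allocate(NRT_TENSOR_PLACEMENT_DEVICE, 0, sizeof(host_in), \"example\", &t) == NRT_SUCCESS) {\n            nrt_tensor_write(t, host_in, 0, sizeof(host_in));   // copy host data into the tensor\n            nrt_tensor_read(t, host_out, 0, sizeof(host_out));  // read it back\n            nrt_tensor_free(&t);\n        }\n\n.. 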
c:function:: NRT_STATUS nrt_tensor_copy(const nrt_tensor_t *src, size_t src_offset, nrt_tensor_t *dst, size_t dst_offset, size_t size)\n\n    Copies ``size`` bytes of data from ``src`` (starting at ``src_offset``) into ``dst`` (starting at ``dst_offset``).\n    When copying between two device tensors, they must both be allocated on the same HBM or the call returns ``NRT_INVALID``.\n\n    :param src: Valid pointer to an ``nrt_tensor_t`` to copy from.\n    :param src_offset: Offset within the source tensor from which to begin copying.\n    :param dst: Valid pointer to an ``nrt_tensor_t`` to copy to.\n    :param dst_offset: Offset within the destination tensor at which to begin copying.\n    :param size: Size to copy.\n\n\n.. c:function:: size_t nrt_tensor_get_size(const nrt_tensor_t *tensor)\n\n    Returns the size, in bytes, of the given tensor.\n\n    :param tensor: Valid pointer to an ``nrt_tensor_t``.\n    :returns: Size in bytes of the given tensor.\n\n\n.. c:function:: NRT_STATUS nrt_tensor_allocate_empty(const char *name, nrt_tensor_t **tensor)\n\n    Allocates an empty tensor, i.e. the tensor structure w/o any attached storage.\n\n    :param name: Name for the new tensor.\n    :param tensor: If the call completes successfully, the address will contain a valid ``nrt_tensor_t*`` pointer.\n\n\n.. c:function:: NRT_STATUS nrt_tensor_attach_buffer(nrt_tensor_t *tensor, void *buffer, size_t size)\n\n    Attaches a caller-supplied buffer to a tensor. Any storage previously attached to the tensor is detached and freed if was owned by the tensor.\n    The attached buffer is managed by the caller and must persist through the entire lifetime of the tensor - calling `nrt_tensor_free` will not deallocate it.\n    This changes the memory placement of the nrt_tensor_t to ``NRT_TENSOR_PLACEMENT_VIRTUAL`` regardless of the initial memory placement type.\n\n    :param tensor: Valid pointer to an ``nrt_tensor_t``.\n    :param buffer: Buffer of ``size`` bytes to attach to the tensor.\n    :param size: Size of attached buffer.\n\n\n.. c:function:: NRT_STATUS nrt_tensor_allocate_slice(const nrt_tensor_t *tensor_source, size_t offset, size_t size, const char *name, nrt_tensor_t **tensor_slice)\n\n    Allocates a new ``nrt_tensor_t`` that doesn’t have its own backing storage - instead, it will use a part (slice) of ``tensor_source``’s storage, starting at ``offset``\n    with the given size. The shared backing storage is reference counted and it will not be deallocated until the last tensor using it is deallocated.\n\n    :param tensor_source: Valid pointer to a ``nrt_tensor_t`` whose storage will be used by the new tensor.\n    :param offset: Offset within the ``tensor_source`` used as origin for the 'slice'.\n    :param size: Size of storage to be used by the new tensor.\n    :param name: Name for the new tensor.\n    :param tensor_slice: If the call completes successfully, the address will contain a valid, newly allocated, ``nrt_tensor_t*`` pointer.\n\n\n.. c:function:: void *nrt_tensor_get_va(const nrt_tensor_t *tensor)\n\n    Returns the virtual address for an allocated tensor.\n\n    :param tensor: Valid pointer to an ``nrt_tensor_t``.\n    :returns: Pointer to host memory used by the tensor.\n\n.. 
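note::\n\n    A minimal sketch (hypothetical buffer size and tensor name) of wrapping a caller-owned host buffer with ``nrt_tensor_allocate_empty`` and ``nrt_tensor_attach_buffer``; the buffer must outlive the tensor and is not freed by ``nrt_tensor_free``:\n\n    .. code-block:: c\n\n        size_t buf_size = 4096;            // hypothetical size\n        void *buffer = malloc(buf_size);   // caller-owned storage\n        nrt_tensor_t *t = NULL;\n        if (nrt_tensor_allocate_empty(\"wrapped\", &t) == NRT_SUCCESS) {\n            nrt_tensor_attach_buffer(t, buffer, buf_size);\n            // ... use the tensor ...\n            nrt_tensor_free(&t);   // does not free 'buffer'\n        }\n        free(buffer);\n\n.. 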
c:function:: NRT_STATUS nrt_tensor_check_output_completion(const nrt_tensor_t *output_tensor, int64_t timeout, uint64_t expected_completion_count)\n\n    Checks if the output tensor has been completely written to by the Neuron Runtime.\n    It waits for up to ``timeout`` microseconds, or without limit if ``timeout`` is negative, until the tensor reaches the expected completion count.\n    If the ``timeout`` is given as unbounded, it emits a warning after the first 30 seconds of waiting.\n    The caller is in charge of handling the timeout behavior.\n    If the tensor is complete, it returns ``NRT_SUCCESS``;\n    if the output tensor is given as NULL, it returns ``NRT_INVALID``;\n    if the tensor does not reach the ``expected_completion_count`` within the timeout, it returns ``NRT_TIMEOUT``.\n\n    :param output_tensor: Valid pointer to an ``nrt_tensor_t``, which is expected to be an output tensor.\n    :param timeout: Maximum time to wait for the output tensor to be written to, in microseconds. If negative, it waits indefinitely until the tensor is complete.\n    :param expected_completion_count: The number of completions expected by the caller.\n\n\n.. _api_tensorset:\n\nThe Tensorset API\n~~~~~~~~~~~~~~~~~\n\nTensorsets are containers for tensors.\n\n.. c:function:: NRT_STATUS nrt_allocate_tensor_set(nrt_tensor_set_t **result)\n\n    Allocates an empty ``nrt_tensor_set_t`` and places its address in ``result``.\n\n    :param result: If the call completes successfully, this address will contain a pointer to a valid, newly allocated ``nrt_tensor_set_t``.\n\n\n.. c:function:: void nrt_destroy_tensor_set(nrt_tensor_set_t **tensor_set)\n\n    Frees a tensor set allocated by a call to ``nrt_allocate_tensor_set`` and sets the ``nrt_tensor_set_t*`` pointer at address ``tensor_set`` to NULL.\n\n    :param tensor_set: Pointer to a pointer to a previously allocated ``nrt_tensor_set_t``. After the call returns, the ``nrt_tensor_set_t*`` pointer will be NULL.\n\n\n.. c:function:: NRT_STATUS nrt_add_tensor_to_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t *tensor)\n\n    Adds an ``nrt_tensor`` to a tensor set under a given name. That name can later be used to retrieve the tensor.\n\n    :param tensor_set: Pointer to a valid Tensorset to which to add the tensor.\n    :param tensor_name: Name that will be used to access the added tensor in the container. Does not need to be the same as the ``nrt_tensor_t``’s name.\n    :param tensor: Pointer to a valid ``nrt_tensor_t`` to add to the Tensorset.\n\n\n.. c:function:: NRT_STATUS nrt_get_tensor_from_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t **tensor)\n\n    Gets an ``nrt_tensor`` from the tensor set based on the name used when it was added by ``nrt_add_tensor_to_tensor_set`` and places its address\n    at the address pointed by ``tensor``. If the tensor is not found, ``NRT_FAILURE`` is returned and nothing gets written at the address pointed by ``tensor``.\n\n    :param tensor_set: Pointer to a valid Tensorset containing the tensor.\n    :param tensor_name: Name associated with the searched ``nrt_tensor_t`` when it was added to this Tensorset. Might be different from the ``nrt_tensor_t``’s internal name.\n    :param tensor: Address where the address of the found ``nrt_tensor_t`` will be placed.\n\n\n.. _api_exec:\n\nThe Execution API\n^^^^^^^^^^^^^^^^^\n\n\n.. 
c:function:: NRT_STATUS nrt_execute(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set)\n\n    Runs one execution of the given ``nrt_model_t`` using the provided input tensor set and writing the results to the provided output tensor set.\n\n    :param model: Valid pointer to a `nrt_model_t` on which to run the execution.\n    :param input_set: Tensorset containing input data.\n    :param output_set: Tensor set where the output data will be written to.\n\n\n.. c:function:: NRT_STATUS nrt_execute_repeat(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set, int repeat_count)\n\n    Same as ``nrt_execute`` but it will repeat the execution ``repeat_count`` times, using the outputs of the (n-1)th iteration as inputs for the nth iteration.\n    This requires a specially compiled NEFF and it's not a commonly used call.\n\n    :param model: Valid pointer to a `nrt_model_t` on which to run the execution.\n    :param input_set: Tensorset containing input data.\n    :param output_set: Tensor set where the output data will be written to.\n    :param repeat_count:  Number of times to repeat this execution.\n\n\n.. _api_profile:\n\nThe Profiling API\n^^^^^^^^^^^^^^^^^\n\n\n.. c:function:: NRT_STATUS nrt_profile_start(nrt_model_t *model, const char *filename)\n\n    Begins profiling of the execution of the given model. The profile data will be written to the file specified by the path in ``filename``.\n    The file will be truncated if it exists.\n\n    :param model: Valid pointer to a `nrt_model_t` which will be profiled by the Neuron Runtime during execution.\n    :param filename: Path to a file where the profile will be written. If the file already exists, it will be truncated.\n\n\n.. c:function:: NRT_STATUS nrt_profile_stop(const char *filename)\n\n    Ends profiling of the execution of a model and writes profile data to ``filename``. ``filename`` needs to be the same path as the one used for ``nrt_profile_start``.\n\n    :param filename: Path to a file where the profile will be written. If the file already exists, it will be truncated.\n\n\n.. _api_debug_stream:\n\nThe Debug Stream API\n^^^^^^^^^^^^^^^^^^^^\n\nSee :ref:`Neuron Debug Stream API Documentation <nrt-debug-stream-api>` for more details.\n\n.. c:function:: NRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd)\n\n    Establishes a connection to a specified Logical Neuron Core's debug stream and returns a handle to the stream in the ``stream_fd`` parameter. Note that only one client\n    can connect to a Logical Neuron Core's stream at any given time. Attempts to connect to a stream with multiple clients will result in an ``NRT_INVALID`` return status.\n\n    :param logical_nc_idx: Index of the Logical Neuron Core whose debug stream to connect to\n    :param stream_fd: Connection handle to reference and interact with the stream\n\n    :returns: ``NRT_SUCCESS`` on success\n\n.. c:function:: void nrt_debug_client_connect_close(int stream_fd)\n\n    Closes a connection created by ``nrt_debug_client_connect``.\n\n    :param stream_fd: Connection handle to close\n\n.. c:function:: NRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload)\n\n    Consumes a single event from the stream and returns it in ``header`` and ``payload``. Note that it is the user's responsibility to free the payload pointer. Also keep\n    in mind that this function must be called from the same process that owns the Logical Neuron Core. 
Calling this function from any other process results in undefined behavior.\n\n    :param stream_fd: Stream to consume an event from\n    :param header: Consumed event's header\n    :param payload: Consumed event's payload (caller's responsibility to free this pointer)\n\n    :returns: ``NRT_SUCCESS`` on success or ``NRT_QUEUE_EMPTY`` if no events are available\n\nOther APIs\n^^^^^^^^^^\n\n.. c:function:: NRT_STATUS nrt_get_version(nrt_version_t *ver, size_t size)\n\n    Fills an ``nrt_version_t`` struct of the provided size with version info. The ``size`` argument allows for backwards compatibility\n    if the struct changes in future releases.\n\n    :param *ver: Pointer to a ``nrt_version_t`` structure which is currently defined as:\n\n        .. code-block:: c\n\n            typedef struct nrt_version {\n                uint64_t rt_major;       // major version number\n                uint64_t rt_minor;       // minor version number\n                uint64_t rt_patch;       // patch version number\n                uint64_t rt_maintenance; // maintenance version number\n                char rt_detail[RT_VERSION_DETAIL_LEN]; // runtime version description string\n                char git_hash[GIT_HASH_LEN];           // runtime git hash\n            } nrt_version_t;\n\n    :param size_t size: Size of the ``nrt_version_t`` structure, should always be ``sizeof(nrt_version_t)``\n\n\n.. c:function:: NRT_STATUS nrt_get_total_nc_count(uint32_t *nc_count)\n\n    Gets the total number of NeuronCores present on the current instance. The result is not affected by the values in\n    ``NEURON_RT_NUM_CORES`` or ``NEURON_RT_VISIBLE_CORES`` and, in fact, this function can be called before calling ``nrt_init``.\n\n    :param nc_count: If the call completes successfully, the address will contain the total number of NeuronCores present on the instance.\n\n\n.. c:function:: NRT_STATUS nrt_get_visible_nc_count(uint32_t *nc_count)\n\n    Gets the total number of NeuronCores available to the application after ``nrt_init`` has parsed the configuration environment variables ``NEURON_RT_NUM_CORES`` and ``NEURON_RT_VISIBLE_CORES``\n    (if provided).\n\n    :param nc_count: If the call completes successfully, the address will contain the total number of NeuronCores available to the application.\n\n\n.. |nd_v1| image:: ../images/neuron-rt-nd-v1.png\n.. |nrt_arch| image:: ../images/neuron-rt-architecture.png\n.. |nrt_neff| image:: ../images/neuron-rt-neff.png\n.. |nrt_neff_s| image:: ../images/neuron-rt-neff-s.png\n.. |nrt_neff_m| image:: ../images/neuron-rt-neff-m.png\n.. |nrt_neff_single| image:: ../images/neuron-rt-neff-single.png\n"
  },
  {
    "path": "neuron-runtime/nrt-troubleshoot.rst",
    "content": ".. _nrt-troubleshooting:\n\nNeuron Runtime Troubleshooting on Inf1, Inf2 and Trn1\n=====================================================\n\nThis document aims to provide more information on how to fix issues you\nmight encounter while using the Neuron Runtime 2.x or above. For each\nissue we will provide an explanation of what happened and what can\npotentially correct the issue.\n\n\nIf your issue is not listed below or you have a more nuanced problem, contact\nus via `issues <https://github.com/aws/aws-neuron-sdk/issues>`__ posted\nto this repo, the `AWS Neuron developer\nforum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`__, or\nthrough AWS support.\n\n\n.. contents::  Table of contents\n   :local:\n   :depth: 2\n\nNeuron Driver installation fails\n--------------------------------\n\naws-neuron-dkms is a driver package which needs to be compiled during\ninstallation. The compilation requires kernel headers for the instance's\nkernel. ``uname -r`` can be used to find kernel version in the instance.\nIn some cases, the installed kernel headers might be newer than the\ninstance's kernel itself.\n\nPlease look at the aws-neuron-dkms installation log for message like the\nfollowing:\n\n::\n\n   Building for 4.14.193-149.317.amzn2.x86_64\n   Module build for kernel 4.14.193-149.317.amzn2.x86_64 was skipped since the\n   kernel headers for this kernel does not seem to be installed.\n\nIf installation log is not available, check whether the module is\nloaded.\n\n::\n\n   $ lsmod | grep neuron\n\nIf the above has no output then that means ``aws-neuron-dkms``\ninstallation is failed.\n\nSolution\n^^^^^^^^\n\n1. Stop all applications using the NeuronCores.\n\n2. Uninstall aws-neuron-dkms ``sudo apt remove aws-neuron-dkms`` or\n   ``sudo dnf remove aws-neuron-dkms``\n\n3. Install kernel headers for the current kernel\n   ``sudo apt install -y linux-headers-$(uname -r)`` or\n   ``sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"``\n\n4. Install aws-neuron-dkms ``sudo apt install aws-neuron-dkms`` or\n   ``sudo dnf install aws-neuron-dkms``\n\nApplication fails to start\n--------------------------\n\nNeuron Runtime requires Neuron Driver(aws-neuron-dkms package) to access Neuron\ndevices. If the driver is not installed then Neuron Runtime wont able to access the\nNeuron devices and will fail with an error message in console and syslog.\n\nIf ``aws-neuron-dkms`` is not installed then the error message will be like the following::\n\n 2021-Aug-11 18:38:27.0917 13713:13713 ERROR   NRT:nrt_init      Unable to determine Neuron Driver version. Please check aws-neuron-dkms package is installed.\n\nIf ``aws-neuron-dkms`` is installed but does not support the latest runtime then the error message will be like the following::\n\n 2021-Aug-11 19:18:21.0661 24616:24616 ERROR   NRT:nrt_init      This runtime requires Neuron Driver version 2.0 or greater. 
Please upgrade aws-neuron-dkms package.\n\nWhen using any supported framework from Neuron SDK version 2.5.0 and Neuron Driver (aws-neuron-dkms) versions 2.4 or older, Neuron Runtime will return the following error message::\n\n  2022-Dec-01 09:34:12.0559   138:138   ERROR   HAL:aws_hal_tpb_pooling_write_profile       failed programming the engine\n\nSolution\n^^^^^^^^\n\nPlease follow the installation steps in :ref:`setup-guide-index` to install ``aws-neuronx-dkms``.\n\nThis Neuron Runtime (compatibility id: X) is not compatible with the installed aws-neuron-dkms package\n------------------------------------------------------------------------------------------------------\n\nThis error is caused by incompatibility between the Neuron Driver (dkms package) and the Runtime Library (runtime-lib package).  The driver remains backwards compatible with older versions of Neuron Runtime, but newer versions of the Runtime might rely on functionality that is only provided by a newer driver.  In that case, an update to the newer driver is required.\n\nIn some cases the compatibility error persists even after the driver has been updated.  That happens when the update process fails to reload the driver at the end of the update.  Note that ``$ modinfo neuron`` will misleadingly show the new version because modinfo reads the version information from the neuron.ko file that has been successfully replaced.\n\nReload failure happens because one of the processes is still using Neuron Devices and thus the driver cannot be reloaded.\n\nSolution\n^^^^^^^^\n\nCheck for any process that is still using the Neuron driver by running lsmod:\n\n.. code:: bash\n\n   ubuntu@ip-10-1-200-50:~$ lsmod | grep neuron\n   neuron                237568  0\n   ubuntu@ip-10-1-200-50:~$\n\nThe “Used by” counter, the second number, should be 0.  If it is not, there is still a running process that is using Neuron.  Terminate that process and either:\n\n.. code:: bash\n\n   $ sudo rmmod neuron\n   $ sudo modprobe neuron\n\nOr simply rerun the installation one more time.  The driver logs its version in dmesg:\n\n.. code:: bash\n\n   $ sudo dmesg\n   ...\n   [21531.105295] Neuron Driver Started with Version:2.9.4.0-8a6fdf292607dccc3b7059ebbe2fb24c60dfc7c4\n\nA common culprit is a Jupyter process.  If you are using Jupyter on the instance, make sure to terminate the Jupyter process before updating the driver.\n\nNeuron Core is in use\n---------------------\n\nA NeuronCore can't be shared between two applications. If an application\nhas started using a NeuronCore, all other applications trying to use the\nsame NeuronCore will fail during runtime initialization with the following\nmessage in the console and in syslog:\n\n.. code:: bash\n\n   2021-Aug-27 23:22:12.0323 28078:28078 ERROR   NRT:nrt_allocate_neuron_cores               NeuronCore(s) not available - Requested:nc1-nc1 Available:0\n\nSolution\n^^^^^^^^\n\nTerminate any other processes that are using the NeuronCore and then try launching the application again. 
If you are using Jupyter, ensure that you only have a single Jupyter kernel attempting to access the NeuronCores by restarting or shutting down any other kernels, which will release any NeuronCores that might be in use.\n\nUnsupported NEFF Version\n------------------------\n\nWhile loading a model (NEFF), Neuron Runtime checks the version compatibility.\nIf the version of the NEFF is incompatible with the Runtime, the\nmodel load fails with the following error message:\n\n::\n\n   NEFF version mismatch supported: 1.1 received: 2.0\n\nSolution\n^^^^^^^^\n\nUse compatible versions of Neuron Compiler and Runtime. Updating to the\nlatest version of both Neuron Compiler and Neuron Runtime is the\nsimplest solution. If updating one of the two is not an option, please\nrefer to the :ref:`runtime_rn`\nof the Neuron Runtime to determine NEFF version support.\n\nUnsupported Hardware Operator Code\n----------------------------------\n\nWhile loading a model (NEFF), Neuron Runtime checks whether the hardware operators are supported or not. If any are unsupported,\nNeuron Runtime will display the following error messages:\n\n::\n\n    2023-Jul-28 22:23:13.0357 101413:101422 ERROR  TDRV:translate_one_pseudo_instr_v2           Unsupported hardware operator code 214 found in neff.\n    2023-Jul-28 22:23:13.0357 101413:101422 ERROR  TDRV:translate_one_pseudo_instr_v2           Please make sure to upgrade to latest aws-neuronx-runtime-lib and aws-neuronx-collective; for detailed installation instructions visit Neuron documentation.\n\nSolution\n^^^^^^^^\n\nUpgrade to the latest Neuron Runtime and Neuron Collectives.\n\n\nInsufficient Memory\n-------------------\n\nWhile loading a model (NEFF), Neuron Runtime reserves both device and host memory\nfor storing the weights, ifmaps and ofmaps of the model. The memory consumption of\neach model is different. If Neuron Runtime is unable to allocate memory then\nthe model load will fail with the following message in syslog:\n\n::\n\n   kernel: [XXXXX] neuron:mc_alloc: device mempool [0:0] total 1073741568 occupied 960539030 needed 1272 available 768\n\n\nSolution\n^^^^^^^^\n\nAs the error is contextual to what's going on with your instance, the\nexact next step is unclear. Try unloading some of the loaded models,\nwhich will free up device DRAM space. If this is still a problem, moving\nto a larger Inf1 instance size with additional NeuronCores may help.\n\nInsufficient number of NeuronCores\n----------------------------------\n\nThe NEFF requires more NeuronCores than are available on the instance.\n\nCheck for error messages in syslog similar to:\n\n::\n\n  NRT:  26638:26638 ERROR  TDRV:db_vtpb_get_mla_and_tpb                 Could not find VNC id n\n  NRT:  26638:26638 ERROR  NMGR:dlr_kelf_stage                          Failed to create shared io\n  NRT:  26638:26638 ERROR  NMGR:stage_kelf_models                       Failed to stage graph: kelf-a.json to NeuronCore\n  NRT:  26638:26638 ERROR  NMGR:kmgr_load_nn_post_metrics               Failed to load NN: xxxxxxx, err: 2\n\nSolution\n^^^^^^^^\n\nThe NeuronCores may be in use by models you are not actively using.\nEnsure you've unloaded models you're not using and terminated unused applications.\nIf this is still a problem, moving to a larger Inf1 instance\nsize with additional NeuronCores may help.\n\nNumerical Error\n---------------\n\nNeuron Devices will detect any NaN generated during execution and\nreport it. 
If the Neuron Runtime sees that NaNs were generated, it fails the\nexecution request with a Numerical Error and the following\nmessage:\n\n::\n\n   nrtd[nnnnn]: ....  Error notifications found on NC .... INFER_ERROR_SUBTYPE_NUMERICAL\n\nSolution\n^^^^^^^^\n\nThis is usually an indication of either an error in the model or an error in the\ninput.\n\nReport the issue to Neuron by posting the relevant details on GitHub\n`issues <https://github.com/aws/aws-neuron-sdk/issues>`__.\n\nRuntimeError: module compiled against API version 0xf but this version of numpy is 0xe\n--------------------------------------------------------------------------------------\nThis usually means that the numpy version used during compilation is different than the one used when executing the model.\nAs of Neuron SDK release 2.15, the numpy versions supported in the Neuron SDK are the following: numpy<=1.25.2, >=1.22.2.  Check and confirm the right\nnumpy version is installed and re-compile/execute the model.\n\n\nFailure to initialize Neuron\n----------------------------\n\n::\n\n   nd0 nc0 Timestamp program stop timeout (1000 ms)\n   nd0 nc0 Error while waiting for timestamp program to end on TPB eng 0\n   nd0 nc0 Failed to stop neuron core\n   nd0 nc0 Failed to end timestamp sync programs\n   TDRV not initialized\n   Failed to initialize devices, error:5\n\n.. _solution-2:\n\nSolution\n^^^^^^^^\n\nA previously executed application left the Neuron devices in a running state.\nReset the Neuron devices by reloading the Neuron Driver. Note that this is a\ntemporary workaround; future versions of Neuron will reset running\ndevices automatically.\n\n::\n\n   sudo rmmod neuron; sudo modprobe neuron\n\nAn application is trying to use more cores than are available on the instance\n------------------------------------------------------------------------------\n\n::\n\n   Could not open the nd1\n\n.. _solution-3:\n\nSolution\n^^^^^^^^\n\nUse a properly sized instance. trn1.32xlarge has 32 Neuron Cores,\ntrn1.2xlarge has 2 Neuron Cores.\n\n\nNeuron DGE notification queue overflow\n----------------------------------------\n\n.. code:: bash\n\n   2025-Oct-01 23:48:34.002205 516278:516289 ERROR  TDRV:exec_consume_topsp_cc_notifications     [ND 1][NC 4] execution on model /home/ubuntu/compiled-models/model.MODULE_7c055c4ac6e2851a63bb+7d89256e.neff, is stuck after all collectives operation have completed\n   2025-Oct-01 23:48:34.002207 516278:516288 ERROR   NRT:nrt_infodump                            Failure: NRT_EXEC_SW_NQ_OVERFLOW in nrt_execute()\n   2025-Oct-01 23:48:34.002234 516278:516288 ERROR   NRT:nrt_infodump                            LNC: 0\n   2025-Oct-01 23:48:34.002260 516278:516285 ERROR  TDRV:exec_request_process_errors             [ND 1][NC 0] execution timeout (30000 ms) on model /home/ubuntu/compiled-models/model.MODULE_7c055c4ac6e2851a63bb+7d89256e.neff, potentially caused by DGE notifications enabled. 
Please disable it (set NEURON_RT_ENABLE_DGE_NOTIFICATIONS to 0) and try again.\n\nSolution\n^^^^^^^^\n\nSet the environment variable ``NEURON_RT_ENABLE_DGE_NOTIFICATIONS`` to ``0`` to disable DMA Generation Engine notifications.\n\n\nNeuron Runtime execution fails at out-of-bound access\n-----------------------------------------------------\n\nWhen a Neuron Runtime execution encounters an out-of-bound access error, the runtime logs in the stdout console will display one of the following error messages:\n\n::\n\n    2024-08-12 18:34:56,116::ERROR: 2024-Aug-12 18:34:56.067150 159612:159612 ERROR  TDRV:generate_custom_notification_msg        nd0:nc0:h_model.id1107: Received notification generated at runtime: failed to run embedding table update, due to out-of-bound access.\n    2024-08-12 18:34:56,116::ERROR: 2024-Aug-12 18:34:56.067151 159602:159602 ERROR  TDRV:generate_custom_notification_msg        nd0:nc1:h_model.id1109: Received notification generated at runtime: failed to run scatter/gather (indirect memory copy), due to out-of-bound access.\n\n**Cause of the Error**\n\nAn out-of-bound access error typically indicates that incorrect inputs have been provided to the model.\n\n**How to Debug**\n\nTo troubleshoot this issue, you need to examine both the High-Level Operation (HLO) and all inputs.\nNeuron Runtime can automatically dump all inputs in binary format, which can be instrumental in debugging.\nTo enable input dumping for each failed execution, set the following environment variable:\n\n::\n\n    export NEURON_RT_DBG_DUMP_INPUTS_ON_ERR=<an NRT_STATUS value>\n\nA complete set of ``NRT_STATUS`` can be found under :ref:`The LIBNRT API Return Codes <nrt_api>`.\n\nOnce this variable is set, Neuron Runtime generates a directory in the current working directory for each failed execution at this `NRT_STATUS` value. The directory name follows this pattern:\n\n::\n\n    input_dump_<runtime_generated_random_number>_h_nn_<runtime_generated_execution_id>\n\n\nInside each directory, you'll find all the inputs that led to this failure, stored in binary format.\nAdditionally, the model name is saved in a separate file called model_name.txt within the same directory.\n\nTo disable input dump, you can set the environment variable back to 0\n\n::\n\n    export NEURON_RT_DBG_DUMP_INPUTS_ON_ERR=0\n\n**Example: Debug an out-of-bound access execution**\n\nTo debug an out-of-bound (OOB) execution, which returns an NRT_STATUS code of 1006, both HLO and all inputs are required.\nBy setting the ``NEURON_RT_DBG_DUMP_INPUTS_ON_ERR`` environment variable to 1006, you can capture the inputs leading to an OOB execution.\n\nFor example, when an OOB error occurs, Neuron Runtime creates a directory named input_dump_424238335_h_nn_10001.\nHere, 424238335 is a randomly generated number by Neuron Runtime, and 10001 is the Neuron Runtime generated execution ID.\nAll relevant inputs, labeled from input0 to input14, are saved in binary format within this directory.\n\n::\n\n    ubuntu@ip-172-31-53-90:~$ NEURON_RT_DBG_DUMP_INPUTS_ON_ERR=1006 torchrun --nproc_per_node=2 train_torchrun.py\n    ......\n    2024-Jun-26 00:32:47.943821 30294:32381 ERROR  TDRV:generate_custom_notification_msg        nd0:nc0:h_model.id1001: Received notification generated at runtime: failed to run scatter/gather (indirect memory copy), due to out-of-bound access. isa instruction line number = 11. 
model name = /home/ubuntu/token-seqlen1280-batch128-FullyUnrolled.736.2.0.62758.0a0+44863561.93f365ce40ab99133659.pb.neff\n    ......\n    2024-Jun-26 00:32:47.948678 30294:32381 ERROR  NMGR:dlr_infer                               Inference completed with err: 1006. mode->h_nn=1001, start_nc=0, nc_count=1\n    2024-Jun-26 00:32:50.801487 30294:32381 ERROR  TDRV:tensor_dump_inputs                      15 input tensors were dumped successfully to directory /home/ubuntu/input_dump_424238335_h_nn_10001. Model name is /home/ubuntu/token-seqlen1280-batch128-FullyUnrolled.736.2.0.62758.0a0+44863561.93f365ce40ab99133659.pb.neff\n    ......\n\n    ubuntu@ip-172-31-53-90:~$ ls -lt\n    total 3908900\n    drwxrwxr-x 2 ubuntu ubuntu 4096 Jun 26 00:32 input_dump_424238335_h_nn_10001\n    .....\n\n    ubuntu@ip-172-31-53-90:~$ ls -lt input_dump_424238335_h_nn_10001\n    total 1405192\n    -rw-r—r-- 1 ubuntu ubuntu 5242880 Jun 26 00:32 input14.bin\n    -rw-r—r-- 1 ubuntu ubuntu 5242880 Jun 26 00:32 input13.bin\n    -rw-r—r-- 1 ubuntu ubuntu 5242880 Jun 26 00:32 input12.bin\n    -rw-r—r-- 1 ubuntu ubuntu 5242880 Jun 26 00:32 input11.bin\n    -rw-r—r-- 1 ubuntu ubuntu 13967360 Jun 26 00:32 input10.bin\n    -rw-r—r-- 1 ubuntu ubuntu 81920 Jun 26 00:32 input8.bin\n    -rw-r—r-- 1 ubuntu ubuntu 4 Jun 26 00:32 input9.bin\n    -rw-r—r-- 1 ubuntu ubuntu 4 Jun 26 00:32 input6.bin\n    -rw-r—r-- 1 ubuntu ubuntu 81920 Jun 26 00:32 input7.bin\n    -rw-r—r-- 1 ubuntu ubuntu 16777216 Jun 26 00:32 input5.bin\n    -rw-r—r-- 1 ubuntu ubuntu 131072 Jun 26 00:32 input3.bin\n    -rw-r—r-- 1 ubuntu ubuntu 13967360 Jun 26 00:32 input4.bin\n    -rw-r—r-- 1 ubuntu ubuntu 16777216 Jun 26 00:32 input2.bin\n    -rw-r—r-- 1 ubuntu ubuntu 13967360 Jun 26 00:32 input1.bin\n    -rw-r—r-- 1 ubuntu ubuntu 1342177280 Jun 26 00:32 input0.bin\n    -rw-r—r-- 1 ubuntu ubuntu 9 Jun 26 00:32 model_name.txt\n\n    ubuntu@ip-172-31-53-96:~$ cat input_dump_424238335_h_nn_10001/model_name.txt\n    /home/ubuntu/token-seqlen1280-batch128-FullyUnrolled.736.2.0.62758.0a0+44863561.93f365ce40ab99133659.pb.neff\n\n\n**Known Limitations**\n\n* **HLO Access**: Neuron Runtime does not have direct access to the HLO; it must be deduced from the model name.\n\n* **Partial Input Dumps**: If a Neuron Runtime execution fails and an exception is raised to the Neuron Framework, other ongoing Neuron Runtime executions may be terminated by the Neuron Framework. This means only one set of inputs may be fully captured, while others may be incomplete if terminated prematurely.\n\n  * An input dump folder is considered complete when the model_name.txt file is fully written, as Neuron Runtime saves all inputs first and then writes the model_name.txt file. 
So you might find out the folder with the complete set of inputs by searching for the model_name.txt file.\n\n\nHardware Errors\n----------------\n\n\nFor Trn and Inf instances, the following hardware errors are monitored by Neuron Runtime:\n\n\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| Error Types                         | Description                                               | Behaviors                                                                                                                     | Recommended Actions                                                                                                                                                  |\n+=====================================+===========================================================+===============================================================================================================================+======================================================================================================================================================================+\n| SRAM Uncorrectable                  | An on-chip SRAM encountered a parity error and produced   | 1. Instance Retirement Notice:                                                                                                | 1. Replace the EC2 instance by                                                                                                                                       |\n|                                     | incorrect results.                                        | You will receive an `EC2 instance retirement notice                                                                           | `terminating <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html>`_                                                                      |\n|                                     |                                                           | <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-retirement.html>`_                                              | it or `stopping then starting <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html>`_ it.                                                            |\n|                                     |                                                           | within 15 minutes of experiencing this message.                                                                               |                                                                                                                                                                      |\n|                                     |                                                           | EKS, EC2 Auto Scaling Groups, and AWS ParallelCluster will react to                                                           | 2. 
Utilize `Neuron Sysfs <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-sysfs-user-guide.html#description-for-each-metric>`_ |\n|                                     |                                                           | these retirement notices according to their configured policies,                                                              | and `Neuron Monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html#system-level-metric-groups>`_     |\n|                                     |                                                           | but you can also automate responses to these notices yourself with                                                            | to monitor the ``sram_ecc_uncorrected`` error counts.                                                                                                                |\n|                                     |                                                           | `EventBridge rules <https://repost.aws/knowledge-center/eventbridge-notification-scheduled-events>`_.                         |                                                                                                                                                                      |\n|                                     |                                                           |                                                                                                                               |                                                                                                                                                                      |\n|                                     |                                                           | 2. Neuron Runtime Behavior:                                                                                                   |                                                                                                                                                                      |\n|                                     |                                                           | Neuron Runtime will timeout and exit with ``NRT_EXEC_COMPLETED_WITH_ERR (1004)``                                              |                                                                                                                                                                      |\n|                                     |                                                           | or ``NRT_EXEC_HW_ERR_NC_UE (1202)`` return code.                                                                              |                                                                                                                                                                      |\n|                                     |                                                           | You will see the following error message in runtime logs from stdout console: ``(FATAL-RT-UNDEFINED-STATE)                    |                                                                                                                                                                      |\n|                                     |                                                           | [ND 0][NC 0] Uncorrectable memory error is detected, metadata: 0x16. 
Please terminate or stop/start this instance to prevent  |                                                                                                                                                                      |\n|                                     |                                                           | future impact from the hardware error.``                                                                                      |                                                                                                                                                                      |\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| HBM Unrepairable Uncorrectable      | An HBM encountered an unrepairable uncorrectable error    | 1. Instance Retirement Notice:                                                                                                | 1. Replace the EC2 instance by                                                                                                                                       |\n|                                     | and produced incorrect results.                           | You will receive an `EC2 instance retirement notice                                                                           | `terminating <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html>`_                                                                      |\n|                                     |                                                           | <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-retirement.html>`_                                              | it or `stopping then starting <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html>`_ it.                                                            |\n|                                     |                                                           | within 15 minutes of experiencing this message.                                                                               |                                                                                                                                                                      |\n|                                     |                                                           | EKS, EC2 Auto Scaling Groups, and AWS ParallelCluster will react to                                                           | 2. 
Utilize `Neuron Sysfs <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-sysfs-user-guide.html#description-for-each-metric>`_ |\n|                                     |                                                           | these retirement notices according to their configured policies,                                                              | and `Neuron Monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html#system-level-metric-groups>`_     |\n|                                     |                                                           | but you can also automate responses to these notices yourself with                                                            | to monitor the ``mem_ecc_uncorrected`` error counts.                                                                                                                 |\n|                                     |                                                           | `EventBridge rules <https://repost.aws/knowledge-center/eventbridge-notification-scheduled-events>`_.                         |                                                                                                                                                                      |\n|                                     |                                                           |                                                                                                                               |                                                                                                                                                                      |\n|                                     |                                                           | 2. Neuron Runtime Behavior:                                                                                                   |                                                                                                                                                                      |\n|                                     |                                                           | Neuron Runtime will timeout and exit with ``NRT_TIMEOUT (5)``                                                                 |                                                                                                                                                                      |\n|                                     |                                                           | or ``NRT_EXEC_HW_ERR_HBM_UE (1201)`` return code.                                                                             |                                                                                                                                                                      |\n|                                     |                                                           | You will see the following error message in runtime logs from stdout console: ``(FATAL-RT-UNDEFINED-STATE)                    |                                                                                                                                                                      |\n|                                     |                                                           | Uncorrectable HBM memory error is detected. Execution results may be invalid.                                             
    |                                                                                                                                                                      |\n|                                     |                                                           | Please terminate this instance to prevent future impact from the hardware error.``                                            |                                                                                                                                                                      |\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| HBM Repairable Uncorrectable        | An HBM encountered a repairable uncorrectable error       | Neuron Runtime Behavior:                                                                                                      | 1. Reload the neuron driver or                                                                                                                                       |\n|                                     | and produced incorrect results.                           | Neuron Runtime will timeout and exit with ``NRT_TIMEOUT (5)``                                                                 | `reboot <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-reboot.html>`_ the EC2 instance.                                                           |\n|                                     |                                                           | or ``NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE (1205)`` return code.                                                                  |                                                                                                                                                                      |\n|                                     |                                                           | You will see the following error message in runtime logs from stdout console: ``(FATAL-RT-UNDEFINED-STATE)                    | 2. Utilize `Neuron Sysfs <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-sysfs-user-guide.html#description-for-each-metric>`_ |\n|                                     |                                                           | Uncorrectable HBM memory error is detected. Execution results may be invalid.                                                 | and `Neuron Monitor <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html#system-level-metric-groups>`_     |\n|                                     |                                                           | Please reload the neuron driver or reboot your EC2 instance to prevent future impact from the hardware error.``               | to monitor the ``mem_ecc_repairable_uncorrected`` error counts.                                                                                                      
|\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| DMA Aborts                          | A DMA engine encountered an unrecoverable error.          | Neuron Runtime Behavior:                                                                                                      | Replace the EC2 instance by                                                                                                                                          |\n|                                     |                                                           | Neuron Runtime will timeout and exit with ``NRT_TIMEOUT (5)``                                                                 | `terminating <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html>`_                                                                      |\n|                                     |                                                           | or ``NRT_EXEC_HW_ERR_DMA_ABORT (1203)`` return code.                                                                          | it or `stopping then starting <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html>`_ it.                                                            |\n|                                     |                                                           | You will see the following error messages in runtime logs from stdout console:                                                |                                                                                                                                                                      |\n|                                     |                                                           | ``[MLA 0][NC 0] DMA TX engine 0 is in an abort state`` or                                                                     |                                                                                                                                                                      |\n|                                     |                                                           | ``[MLA 0][NC 0] DMA RX engine 0 is in an abort state``                                                                        |                                                                                                                                                                      |\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| Hang on Collectives                 | Possibly caused by a hardware error on another worker.    
| Neuron Runtime Behavior:                                                                                                      | Search for SRAM Uncorrectable, HBM Uncorrectable, DMA Aborts, and Hang on Compute errors on the other workers, and implement the recommended actions on the          |\n|                                     |                                                           | Neuron Runtime will timeout and exit with ``NRT_TIMEOUT (5)``                                                                 | affected worker. Afterward, restart your workload and attempt again.                                                                                                 |\n|                                     |                                                           | or ``NRT_EXEC_HW_ERR_COLLECTIVES (1200)`` return code.                                                                        |                                                                                                                                                                      |\n|                                     |                                                           | You will see the following error messages in runtime logs from stdout console:                                                |                                                                                                                                                                      |\n|                                     |                                                           | ``(FATAL-RT-UNDEFINED-STATE) missing collectives status                                                                       |                                                                                                                                                                      |\n|                                     |                                                           | on Neuron Device 0 NC 0, model 0 - suspected hang in collectives operation 0 out of 100``                                     |                                                                                                                                                                      |\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n| Hang on Compute                     | Unexpected software or hardware issue.                    | Neuron Runtime Behavior:                                                                                                      | Replace the EC2 instance by                                                                                                                                          |\n|                                     |                                                           | Neuron Runtime will timeout and exit with ``NRT_TIMEOUT (5)``.                                                                
| `terminating <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html>`_                                                                      |\n|                                     |                                                           | You will see the following error messages in runtime logs from stdout console:                                                | it or `stopping then starting <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html>`_ it.                                                            |\n|                                     |                                                           | ``(FATAL-RT-UNDEFINED-STATE) execution timeout (30000 ms)                                                                     |                                                                                                                                                                      |\n|                                     |                                                           | on Neuron Device 0 NC 0, model xxx.neff, waiting for execution completion notification``                                      |                                                                                                                                                                      |\n+-------------------------------------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n\nUpon any hardware errors, you should also expect to see the error message like the following in ``dmesg``:\n``NEURON_HW_ERR=SRAM_UNCORRECTABLE_ERROR instance-id=i-0592464924bd45322 hostname=ip-172-31-61-252 nd-id=0 nc-id=0 serial-num=19fcda00f5ff6eb9 action=TERMINATE_INSTANCE``\n\n\nEFA and Collective Communication Errors\n-----------------------------------------\n\nMissing aws-neuronx-collectives package\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**aws-neuronx-collectives** package is required to execute Collective\nCommunication on a single instance and across multiple instances.\n\n::\n\n   NCCL init error: Error opening libnccom.so, cannot use collective operations! Please set LD_LIBRARY_PATH to library location. Error: libnccom.so: cannot open shared object\n   file: No such file or directory\n   Please make sure to install correct version of aws-neuronx-collectives; for detailed installation instructions visit Neuron documentation\n\n.. _solution-4:\n\nSolution\n~~~~~~~~~\n\nInstall aws-neuornx-collectives package. If the installation used\nnon-default destination set LD_LIBRARY_PATH.\n\n.. _missing-efa-installer-package:\n\nMissing efa installer package.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**efa-installer** package is required to execute Collective\nCommunication across multiple instances.\n\n::\n\n   Unable to run multi-instance workload.  Ofi plugin is not installed or EFA is not enabled\n\n.. _solution-5:\n\nSolution\n~~~~~~~~~\n\nFollow the directions to install efa-installer package. Make sure to add\nthe path to to libfabric library to LD_LIBRARY_PATH\n\n.. 
_efa-is-not-enabled-in-trn132xlarage:\n\nEFA is not enabled in trn1.32xlarge\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEFA is used as a transport for Collective Communication among multiple\ninstances. EFA must be enabled on the instances used for multi-node\ntraining.\n\n::\n\n    OFI plugin initNet() failed is EFA enabled?\n\n.. _solution-6:\n\nSolution\n~~~~~~~~~\n\nConfirm that EFA is enabled by running lspci command and making sure\nthere are eight EFA devices. For example:\n\n::\n\n   [ec2-user@ip-10-0-13-247 ~]$ lspci -tv\n   -+-[0000:a0]-+-00.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-19.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1a.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1b.0  Amazon.com, Inc. NeuronDevice\n    |           +-1c.0  Amazon.com, Inc. NeuronDevice\n    |           +-1d.0  Amazon.com, Inc. NeuronDevice\n    |           +-1e.0  Amazon.com, Inc. NeuronDevice\n    |           \\-1f.0  Amazon.com, Inc. NVMe SSD Controller\n    +-[0000:90]-+-00.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-19.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1a.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1b.0  Amazon.com, Inc. NeuronDevice\n    |           +-1c.0  Amazon.com, Inc. NeuronDevice\n    |           +-1d.0  Amazon.com, Inc. NeuronDevice\n    |           +-1e.0  Amazon.com, Inc. NeuronDevice\n    |           \\-1f.0  Amazon.com, Inc. NVMe SSD Controller\n    +-[0000:20]-+-00.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-19.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1a.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1b.0  Amazon.com, Inc. NeuronDevice\n    |           +-1c.0  Amazon.com, Inc. NeuronDevice\n    |           +-1d.0  Amazon.com, Inc. NeuronDevice\n    |           +-1e.0  Amazon.com, Inc. NeuronDevice\n    |           \\-1f.0  Amazon.com, Inc. NVMe SSD Controller\n    +-[0000:10]-+-00.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-01.0  Amazon.com, Inc. Elastic Network Adapter (ENA)\n    |           +-19.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1a.0  Amazon.com, Inc. Elastic Fabric Adapter (EFA)\n    |           +-1b.0  Amazon.com, Inc. NeuronDevice\n    |           +-1c.0  Amazon.com, Inc. NeuronDevice\n    |           +-1d.0  Amazon.com, Inc. NeuronDevice\n    |           +-1e.0  Amazon.com, Inc. NeuronDevice\n    |           \\-1f.0  Amazon.com, Inc. NVMe SSD Controller\n    \\-[0000:00]-+-00.0  Intel Corporation 440FX - 82441FX PMC [Natoma]\n                +-01.0  Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]\n                +-01.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI\n                +-03.0  Amazon.com, Inc. Device 1111\n                +-04.0  Amazon.com, Inc. NVMe EBS Controller\n                \\-1f.0  Amazon.com, Inc. NVMe EBS Controller\n\nLaunch instances with EFA enabled and try again. If not planning to use\nthe instances for multi-node training or running on trn1.2xlarge, this\nerror message can be ignored.\n\nCommunication timeout\n^^^^^^^^^^^^^^^^^^^^^^\n\nRanks exchange information during NEFF loading and before the start of\nthe execution. 
The loading/execution cannot move forward until all ranks\nare ready.\n\n::\n\n   Timeout waiting for RX (waited 120 sec) - retrying\n\n::\n\n   Timeout waiting for incoming connection (waited 120 sec) - retrying\n\n::\n\n   Connect to localhost:33666 failed - retrying\n\n.. _solution-7:\n\nSolution\n~~~~~~~~~\n\nThe communication timeouts are not fatal. The ranks will continue\nwaiting indefinitely. In most cases the timeouts are caused by one of the\nranks getting delayed, usually by recompilation of a graph. The\nexecution resumes after the graph is compiled (which might take a significant\namount of time). It is possible to determine whether compilation is in\nprogress by checking the logs on all nodes.\n\nCommunication timeouts might also indicate that one of the nodes or\nranks is hung. If that is the case, terminate the run and restart from\nthe last known good checkpoint.\n\n.. _communication-errors:\n\nCommunication errors\n---------------------\n\n::\n\n   RX, connection closed by remote peer\n\nThere could be other similar messages indicating that ranks failed to\ncommunicate.\n\n.. _solution-8:\n\nSolution\n^^^^^^^^\n\nOne of the ranks or nodes encountered a problem and terminated.\nTerminate the run and restart from the last known checkpoint.\n\n.. _efa-kernel-messages-dmesg-after-process-termination:\n\nEFA Kernel messages (dmesg) after process termination.\n------------------------------------------------------\n\n::\n\n   [298850.502143] neuron:npid_detach: neuron:npid_detach: pid=90193, slot=0\n   [298850.919248] efa 0000:a0:1a.0 rdmap160s26: Failed to process command DEREG_MR (opcode 8) comp_status 7 err -22\n\n.. _solution-9:\n\nSolution\n^^^^^^^^\n\nWhen a process that executed Collective Communication terminates, it\nderegisters buffers that were registered with the networking stack.\nThere is a race condition because the Neuron driver also deregisters buffers\nowned by the terminating process as part of the memory cleanup. The error is\nbenign and will be removed in a future release.\n\nFailure to find bootstrap interface\n-----------------------------------\n\n::\n\n   No interface found in the same subnet as remote address fe80::1461:22ff:fe33:b471<45015>\n   No usable listening interface found\n\n.. _solution-10:\n\nSolution\n^^^^^^^^\n\nThe bootstrap code incorrectly tries to use a link-local IPv6 address for\ncommunication. This error will be fixed in the next Neuron release. In\nthe meantime, as a workaround, disable IPv6 on the instances.\n\n::\n\n   sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1\n   sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1\n\nName resolution failure\n-----------------------\n\n.. code:: bash\n\n     WARN Invalid NCCL_COMM_ID [compute1-dy-training-0-1.pcluster-trn1-24-pdx80-2n.pcluster:41211], please use format: <ipv4>:<port> or [<ipv6>]:<port>\n\n.. _solution-11:\n\nSolution\n^^^^^^^^\n\nVerify that the name can be resolved by DNS by using nslookup or dig.  The currently released version fails to resolve FQDNs longer than 63 characters.  This error will be fixed in an upcoming Neuron release.  In the meantime, use shorter names to ensure that the FQDN length does not exceed the maximum of 63 characters.\n\n
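For example, you can check that the worker name reported in the warning resolves, and confirm that a node's FQDN stays within the limit. The hostname below is taken from the example warning above; substitute your own:\n\n.. code:: bash\n\n   # Check that the worker hostname resolves\n   nslookup compute1-dy-training-0-1.pcluster-trn1-24-pdx80-2n.pcluster\n   # Check the length of this node's FQDN (should be 63 characters or fewer)\n   hostname -f | awk '{ print length }'\n\n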
Neuron Runtime timeout or GPSIMD exception\n------------------------------------------\n\nCurrently, a reset of the Neuron Runtime is required after running a model that\ninvoked a Neuron Custom C++ operator. Otherwise, a Neuron Runtime timeout or\nGPSIMD exception may occur.\n\nExample Neuron Runtime timeout:\n\n::\n\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:1)\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:2)\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:3)\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:4)\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_tpb_status_notifications   Missing infer_status notification: (end:0)\n   2023-Jan-09 20:27:41.0593 15042:15042 ERROR  TDRV:exec_consume_infer_status_notifications (FATAL-RT-UNDEFINED-STATE) inference timeout (600000 ms) on Neuron Device 0 NC 0, waiting for execution completion notification\n   2023-Jan-09 20:27:41.0600 15042:15042 ERROR  NMGR:dlr_infer                               Inference completed with err: 5\n\nExample GPSIMD exception:\n\n::\n\n   2023-Jan-06 22:28:01.0845 137472:137472 ERROR TDRV:pool_stdio_queue_consume_all_entries  Printing stderr from GPSIMD:\n   GPSIMD EXCEPTION OCCURRED: ILLEGAL INSTRUCTION\n   Subtype/Type/Cause: 0x201\n   Exception PC: 0x840001E8\n\nSolution\n^^^^^^^^\n\nIf either of the above errors is seen, and ``NEURON_RT_RESET_CORES`` is set to\n0, either unset it or set it to 1. This will enable the default runtime\nbehavior of resetting NeuronCores when initializing applications. See\n:ref:`nrt-configuration` for more information.\n\nAlso note that the timeout period can be changed by setting\n``NEURON_RT_EXEC_TIMEOUT``. See :ref:`nrt-configuration` for more information.\n\n\nFI_EFA_FORK_SAFE\n----------------\n\nOlder Linux kernels (<5.15) require the environment variable ``FI_EFA_FORK_SAFE`` to be set to 1 for libfabric to operate correctly.  Specifically, Amazon Linux 2 uses the 5.10 kernel and requires the variable to be set.\n\nWhen the variable is not set, multi-node collective communication will be disabled.  Intra-node collective communication is still possible.  The following error message will be logged the first time a model containing collective communication is loaded:\n\n.. code-block::\n\n   Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.\n   Please restart with FI_EFA_FORK_SAFE=1 set.\"\n\n
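To set the variable, export it in the environment of every process before launching the workload. For example (the ``torchrun`` invocation mirrors the earlier example in this guide; substitute your own launch command):\n\n.. code-block::\n\n   export FI_EFA_FORK_SAFE=1\n   torchrun --nproc_per_node=2 train_torchrun.py\n\n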
Neuron driver cannot be uninstalled\n------------------------------------\n\nIf you attempt to uninstall the Neuron driver on Ubuntu with ``sudo dpkg -r aws-neuronx-dkms``, you may get an error like this:\n\n.. code-block::\n\n   Removing aws-neuronx-dkms (2.x) ...\n   Neuron module is currently loaded. Attempting to unload...\n   ERROR: Cannot unload neuron module - it is currently in use.\n   Please stop all processes using the neuron module before uninstalling.\n   dpkg: error processing package aws-neuronx-dkms (--remove):\n   installed aws-neuronx-dkms package pre-removal script subprocess returned error exit status 1\n   Errors were encountered while processing:\n   aws-neuronx-dkms\n\nOn Amazon Linux, you get a similar error if you run ``sudo rpm -e aws-neuronx-dkms`` to uninstall the driver:\n\n.. code-block::\n   \n   Uninstall of aws-neuronx module (version 2.x) beginning:\n   Neuron module is currently loaded. Attempting to unload...\n   ERROR: Cannot unload neuron module - it is currently in use.\n   Please stop all processes using the neuron module before uninstalling.\n   error: %preun(aws-neuronx-dkms-2.x-dkms.noarch) scriptlet failed, exit status 1\n   error: aws-neuronx-dkms-2.x-dkms.noarch: erase failed\n\nUsually, this just means you still have an active process using the driver. Killing that process will allow the driver to be unloaded and uninstalled. But if, for some rare reason, the driver is stuck, one remediation is to force-uninstall the driver and then reboot.\n\nSolution\n^^^^^^^^\n\nForce-uninstall the Neuron driver.\n\n.. warning:: Force-uninstalling the driver runs the risk of causing system instability or a kernel panic. Reboot your instance immediately after uninstalling it.\n\nTo force-uninstall the driver on Ubuntu instances:\n\n.. code-block::\n\n   sudo dkms remove aws-neuronx/<version> --all\n   sudo dpkg -r --force-all aws-neuronx-dkms\n\nTo force-uninstall the driver on Amazon Linux instances:\n\n.. code-block::\n\n   sudo dkms remove aws-neuronx/<version> --all\n   sudo rpm -e --noscript aws-neuronx-dkms\n"
  },
  {
    "path": "neuron-runtime/rn.rst",
    "content": "What's New\n==========\n\n.. toctree::\n   :maxdepth: 1\n\n   /release-notes/components/runtime\n\n"
  },
  {
    "path": "nki/_ext/nki_directives.py",
    "content": "\"\"\"\nCopyright (c) 2023, Amazon.com. All Rights Reserved\n\nDefine new directives for nki documentation\n\n\"\"\"\n\nfrom __future__ import annotations\n\nimport importlib\nimport os\nfrom typing import TYPE_CHECKING, ClassVar, Any, Union\n\nfrom docutils import nodes\nfrom docutils.parsers.rst import directives\nfrom docutils.statemachine import ViewList\nfrom sphinx.directives.code import (\n    LiteralInclude,\n    container_wrapper,\n    LiteralIncludeReader,\n)\nfrom sphinx.locale import __\nfrom sphinx.util import logging, parselinenos\nfrom sphinx.util.docutils import SphinxDirective\nfrom sphinx.util.nodes import nested_parse_with_titles\n\nif TYPE_CHECKING:\n    from docutils.nodes import Element, Node\n\n    from sphinx.application import Sphinx\n    from sphinx.config import Config\n    from sphinx.util.typing import ExtensionMetadata, OptionSpec\n\nlogger = logging.getLogger(__name__)\n\n\nclass NKIExampleReader(LiteralIncludeReader):\n\n    def __init__(self, filename: str, options: dict[str, Any], config: Config) -> None:\n        if \"diff\" in options:\n            raise ValueError(__(\"`diff` mode is not supported\"))\n\n        super().__init__(filename=filename, options=options, config=config)\n        marker = self.options.get(\"marker\", \"NKI_EXAMPLE\")\n        self.example_begin = f\"{marker}_BEGIN\"\n        self.example_end = f\"{marker}_END\"\n        self.skip_marker = self.options.get(\"skip_marker\", \"NKI_EXAMPLE\")\n\n    def nki_example_filter(\n        self,\n        lines: list[str],\n        location: Union[tuple[str, int], None] = None,\n    ) -> list[str]:\n        whole_file = \"whole-file\" in self.options\n        example_lines = []\n        include_line = whole_file\n        indentsize = 0\n\n        for lineno, line in enumerate(lines):\n            if include_line:\n                if not whole_file and self.example_end in line:\n                    include_line = False\n                    continue\n\n                if self.skip_marker in line:\n                    continue\n\n                if indentsize and \"\\n\" not in line[:indentsize]:\n                    line = line[indentsize:]\n\n                example_lines.append(line)\n                continue\n\n            assert not whole_file, \"`inline` should stay true if `whole_file` is True\"\n            if self.example_begin in line:\n                include_line = True\n                indentsize = len(line) - len(line.lstrip())\n                if example_lines:\n                    # Insert an empty line between blocks\n                    example_lines.append(\"\\n\")\n\n                continue\n\n        return example_lines\n\n    def read(self, location: Union[tuple[str, int], None] = None) -> tuple[str, int]:\n        filters = [\n            self.nki_example_filter,\n            #  self.pyobject_filter,\n            #  self.start_filter,\n            #  self.end_filter,\n            #  self.lines_filter,\n            self.dedent_filter,\n            self.prepend_filter,\n            self.append_filter,\n        ]\n\n        lines = self.read_file(self.filename, location=location)\n\n        for func in filters:\n            lines = func(lines, location=location)\n\n        return \"\".join(lines), len(lines)\n\n\nclass NKIExample(LiteralInclude):\n    \"\"\"A directive to include nki example\"\"\"\n\n    option_spec: ClassVar[OptionSpec] = {\n        \"marker\": str,\n        \"skip_marker\": str,\n        \"whole-file\": directives.flag,\n        
**LiteralInclude.option_spec,\n    }\n\n    def run(self) -> list[Node]:\n        document = self.state.document\n        if not document.settings.file_insertion_enabled:\n            return [\n                document.reporter.warning(\"File insertion disabled\", line=self.lineno)\n            ]\n        # convert options['diff'] to absolute path\n        if \"diff\" in self.options:\n            _, path = self.env.relfn2path(self.options[\"diff\"])\n            self.options[\"diff\"] = path\n\n        try:\n            location = self.state_machine.get_source_and_line(self.lineno)\n            nki_root = self.config.nki_example_root\n            if nki_root and not os.path.isabs(self.arguments[0]):\n                filename = os.path.join(nki_root, self.arguments[0])\n                rel_filename = self.arguments[0]\n            else:\n                rel_filename, filename = self.env.relfn2path(self.arguments[0])\n            self.env.note_dependency(rel_filename)\n\n            reader = NKIExampleReader(filename, self.options, self.config)\n            text, lines = reader.read(location=location)\n\n            retnode: Element = nodes.literal_block(text, text, source=filename)\n            retnode[\"force\"] = \"force\" in self.options\n            self.set_source_info(retnode)\n            if self.options.get(\"diff\"):  # if diff is set, set udiff\n                retnode[\"language\"] = \"udiff\"\n            elif \"language\" in self.options:\n                retnode[\"language\"] = self.options[\"language\"]\n            if (\n                \"linenos\" in self.options\n                or \"lineno-start\" in self.options\n                or \"lineno-match\" in self.options\n            ):\n                retnode[\"linenos\"] = True\n            retnode[\"classes\"] += self.options.get(\"class\", [])\n            extra_args = retnode[\"highlight_args\"] = {}\n            if \"emphasize-lines\" in self.options:\n                hl_lines = parselinenos(self.options[\"emphasize-lines\"], lines)\n                if any(i >= lines for i in hl_lines):\n                    logger.warning(\n                        __(\"line number spec is out of range(1-%d): %r\"),\n                        lines,\n                        self.options[\"emphasize-lines\"],\n                        location=location,\n                    )\n                extra_args[\"hl_lines\"] = [x + 1 for x in hl_lines if x < lines]\n            extra_args[\"linenostart\"] = reader.lineno_start\n\n            if \"caption\" in self.options:\n                caption = self.options[\"caption\"] or self.arguments[0]\n                retnode = container_wrapper(self, retnode, caption)\n\n            # retnode will be note_implicit_target that is linked from caption and numref.\n            # when options['name'] is provided, it should be primary ID.\n            self.add_name(retnode)\n\n            return [retnode]\n        except Exception as exc:\n            return [document.reporter.warning(exc, line=self.lineno)]\n\n\ndef setup(app: Sphinx) -> ExtensionMetadata:\n    app.add_config_value(\"nki_example_root\", None, \"env\")\n    app.add_directive(\"nki_example\", NKIExample)\n\n    return {\n        \"version\": \"0.1\",\n        \"parallel_read_safe\": True,\n        \"parallel_write_safe\": True,\n    }\n"
  },
  {
    "path": "nki/_templates/nki-custom-class-attr-only-template.rst",
    "content": "{{ fullname | escape | underline}}\n\n.. currentmodule:: {{ module }}\n\n.. autoclass:: {{ objname }}\n\n   {% block attributes %}\n   {% if attributes %}\n   .. rubric:: {{ _('Attributes') }}\n\n   .. autosummary::\n   {% for item in attributes %}\n      ~{{ name }}.{{ item }}\n   {%- endfor %}\n   {% endif %}\n   {% endblock %}\n"
  },
  {
    "path": "nki/_templates/nki-custom-class-template.rst",
    "content": "{{ fullname | escape | underline}}\n\n.. currentmodule:: {{ module }}\n\n.. autoclass:: {{ objname }}\n   :members:\n\n   {% block methods %}\n   {% if methods %}\n   .. rubric:: {{ _('Methods') }}\n\n   .. autosummary::\n      :nosignatures:\n   {% for item in methods %}\n      {%- if not item.startswith('_') %}\n      ~{{ name }}.{{ item }}\n      {%- endif -%}\n   {%- endfor %}\n   {% endif %}\n   {% endblock %}\n\n   {% block attributes %}\n   {% if attributes %}\n   .. rubric:: {{ _('Attributes') }}\n\n   .. autosummary::\n   {% for item in attributes %}\n      ~{{ name }}.{{ item }}\n   {%- endfor %}\n   {% endif %}\n   {% endblock %}\n"
  },
  {
    "path": "nki/api/index.rst",
    "content": ".. _nki_api_reference:\n\nNKI API Reference Manual\n===============================\n\n.. toctree::\n    :maxdepth: 2\n\n    nki\n    nki.isa\n    nki.language\n    nki.collectives\n    nki.api.shared"
  },
  {
    "path": "nki/api/nki/__init__.py",
    "content": "\"\"\"Auto-generated stub file\"\"\"\nfrom enum import Enum\nimport nki.language as nl\nimport ml_dtypes\n\ndef jit(func=None, mode=\"auto\", **kwargs):\n    r\"\"\"\n    This decorator compiles a top-level NKI function to run on NeuronDevices.\n\n    This decorator tries to automatically detect the current framework and compile\n    the function as a custom operator. To bypass the framework detection logic, you\n    can specify the ``mode`` parameter explicitly.\n\n    You might need to explicitly set the target platform using the\n    ``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable. Supported values are\n    \"trn1\"/\"gen2\", \"trn2\"/\"gen3\", and \"trn3\"/\"gen4\".\n\n    :param func: Function that defines the custom operation.\n    :param mode: Compilation mode. Supported values are \"jax\", \"torchxla\",\n                 and \"auto\". (Default: \"auto\".)\n\n    .. code-block:: python\n       :caption: Writing an addition kernel using ``@nki.jit``\n\n        @nki.jit()\n        def nki_tensor_add_kernel(a_input, b_input):\n            # Check both input tensor shapes are the same for element-wise operation.\n            assert a_input.shape == b_input.shape\n\n            # Check the first dimension's size to ensure it does not exceed on-chip\n            # memory tile size, since this simple kernel does not tile inputs.\n            assert a_input.shape[0] <= nl.tile_size.pmax\n\n            # Allocate space for the input tensors in SBUF and copy the inputs from HBM\n            # to SBUF with DMA copy.\n            a_tile = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.sbuf)\n            nisa.dma_copy(dst=a_tile, src=a_input)\n\n            b_tile = nl.ndarray(dtype=b_input.dtype, shape=b_input.shape, buffer=nl.sbuf)\n            nisa.dma_copy(dst=b_tile, src=b_input)\n\n            # Allocate space for the result and use tensor_tensor to perform\n            # element-wise addition. Note: the first argument of 'tensor_tensor'\n            # is the destination tensor.\n            c_tile = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.sbuf)\n            nisa.tensor_tensor(dst=c_tile, data1=a_tile, data2=b_tile, op=nl.add)\n\n            # Create a tensor in HBM and copy the result into HBM.\n            c_output = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.hbm)\n            nisa.dma_copy(dst=c_output, src=c_tile)\n\n            # Return kernel output as function output.\n            return c_output\n    \"\"\"\n    ...\n\n\ndef simulate(kernel):\n    \"\"\"Create a CPU-simulated version of an NKI kernel.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    See :ref:`nki-simulate` for full documentation including target platform\n    selection, precise floating-point mode, debugging, and known limitations.\n\n    Example:\n    \n    .. code-block:: python\n\n        @nki.jit\n        def my_kernel(a, b): ...\n\n        # Explicit simulation\n        result = nki.simulate(my_kernel)(a_np, b_np)\n\n        # With LNC2\n        result = nki.simulate(my_kernel[2])(a_np, b_np)\n\n    Args:\n      kernel: NKI kernel function, typically decorated with ``@nki.jit``.\n        If a plain function is passed, it is automatically wrapped.\n\n    Returns:\n      A callable that, when invoked with NumPy arrays, executes the kernel\n      on CPU and returns NumPy array results.\n    \"\"\"\n    ...\n"
  },
  {
    "path": "nki/api/nki/collectives/__init__.py",
    "content": "\"\"\"Stubs for nki.collectives\"\"\"\n\nfrom enum import Enum\nimport nki.language as nl\n\nclass NKIObject:\n    r\"\"\"Base class for NKI kernel dataclasses and configuration objects.\"\"\"\n    ...\n\n\nclass ReplicaGroup(NKIObject):\n    r\"\"\"Defines a group of ranks that participate in a collective operation.\n\n    Sub-groups represented by lists of ranks should not have any overlap.\"\"\"\n    ...\n\n\ndef all_gather(srcs, dsts, replica_group, collective_dim):\n    r\"\"\"Perform an all-gather on the given replica group and input/output tensors.\n\n    The ``srcs`` and ``dsts`` parameters accept lists of tensors to support coalesced\n    collective communication, which allows multiple tensors to be gathered in a single\n    collective operation for improved efficiency.\n\n    Tensors can reside on either HBM or SBUF. However, mixing memory spaces is not\n    supported: all tensors must be on HBM or all must be on SBUF. Coalesced collective\n    communication (multiple tensors) is only supported when tensors are on HBM.\n\n    :param srcs: List of input tensors to gather\n    :param dsts: List of output tensors to store results\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param collective_dim: Dimension along which output tensors are concatenated.\n        Currently only 0 is supported for HBM tensors. For SBUF tensors, 0 or 1 is\n        supported as SBUF collectives currently only operate on 2D tensors with a\n        single free dimension.\"\"\"\n    ...\n\n\ndef all_reduce(srcs, dsts, replica_group, op):\n    r\"\"\"Perform an all-reduce on the given replica group and input/output tensors.\n\n    The ``srcs`` and ``dsts`` parameters accept lists of tensors to support coalesced\n    collective communication, which allows multiple tensors to be reduced in a single\n    collective operation for improved efficiency.\n\n    Tensors can reside on either HBM or SBUF. However, mixing memory spaces is not\n    supported: all tensors must be on HBM or all must be on SBUF. Coalesced collective\n    communication (multiple tensors) is only supported when tensors are on HBM.\n\n    :param srcs: List of input tensors to reduce\n    :param dsts: List of output tensors to store results\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param op: The reduction operation to perform (``nl.add``, ``nl.minimum``, or ``nl.maximum``)\"\"\"\n    ...\n\n\ndef all_to_all(srcs, dsts, replica_group, collective_dim):\n    r\"\"\"Perform an all-to-all on the given replica group and input/output tensors.\n\n    The ``srcs`` and ``dsts`` parameters accept lists of tensors to support coalesced\n    collective communication, which allows multiple tensors to be redistributed in a\n    single collective operation for improved efficiency.\n\n    Tensors must reside on HBM. 
SBUF is not currently supported for all-to-all.\n\n    :param srcs: List of input tensors to redistribute\n    :param dsts: List of output tensors to store results\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param collective_dim: Dimension along which input tensors are split and output tensors are concatenated.\n        Currently only 0 is supported.\"\"\"\n    ...\n\n\ndef all_to_all_v(srcs, dsts, replica_group, metadata_tensor, recv_counts_known=False, has_rdispls=False):\n    r\"\"\"Perform a variable-length all-to-all on the given replica group and input/output tensors.\n\n    Unlike all_to_all which splits and concatenates along a collective_dim,\n    all_to_all_v treats tensors as flat buffers of elements. Counts and\n    displacements in the metadata tensor are in elements (row-major order),\n    not slices along a particular dimension.\n\n    :param srcs: List of input tensors to redistribute (must be exactly one)\n    :param dsts: List of output tensors to store results (must be exactly one)\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param metadata_tensor: Metadata tensor of shape (2-4, world_size), dtype uint32.\n                            Row 0: send counts, Row 1: send displacements,\n                            Row 2 (optional): recv counts, Row 3 (optional): recv displacements.\n    :param recv_counts_known: If True, metadata includes receive counts (row 2)\n    :param has_rdispls: If True, metadata includes receive displacements (row 3)\"\"\"\n    ...\n\n\ndef collective_permute(srcs, dsts, source_target_pairs):\n    r\"\"\"Send and receive data between ranks based on explicitly defined source-target pairs.\n\n    Each pair ``(source, target)`` specifies that data from the source rank\n    should be sent to the target rank. This gives you full control over the\n    communication pattern (e.g., pairwise swaps, arbitrary shuffles).\n\n    Prefer :func:`collective_permute_implicit` when the communication\n    follows a ring topology, as the hardware can optimize that pattern.\n\n    Tensors must reside on HBM. 
SBUF is not currently supported for collective_permute.\n\n    Coalesced collective communication (multiple tensors) is not currently supported;\n    each list parameter must contain exactly one tensor.\n\n    :param srcs: List of source tensors to send\n    :param dsts: List of destination tensors to receive into\n    :param source_target_pairs: List of (source, target) rank ID pairs\"\"\"\n    ...\n\n\ndef collective_permute_implicit(srcs_by_channel, dsts_by_channel, replica_group, channel_ids=[0]):\n    r\"\"\"Send and receive data between ranks in a ring, where sources and destinations are\n    implicitly determined by the ring structure during runtime.\n\n    Each rank sends data to its successor and receives from its predecessor in the ring.\n    This differs from :func:`collective_permute` where users explicitly specify source-target pairs.\n\n    Since the sources and destinations are implicitly determined, use\n    :func:`collective_permute_implicit_current_processing_rank_id` to get the rank ID\n    whose data is currently being processed.\n\n    The outer dimension of ``srcs_by_channel`` and ``dsts_by_channel`` corresponds to channels.\n    For each channel, the inner list contains exactly one tensor (coalesced collective\n    communication is not currently supported).\n\n    **Channels**: Multiple channels enable overlapping communication, allowing concurrent data\n    transfers. The number of available channels depends on the replica group and system\n    connectivity (see\n    `Neuron Collectives <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/collectives.html#system-connectivity>`_).\n    The maximum number of channels is 4 for replica groups containing all devices inside a node\n    and 2 for other supported replica groups.\n\n    :param srcs_by_channel: List of source tensor lists, one per channel. Each inner list must contain exactly one tensor.\n    :param dsts_by_channel: List of destination tensor lists, one per channel. Each inner list must contain exactly one tensor.\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param channel_ids: List of channel IDs to use for communication (default [0] for single channel).\n        Currently must be consecutive integers starting from 0.\"\"\"\n    ...\n\n\ndef collective_permute_implicit_current_processing_rank_id(iteration_id, replica_group, channel_id=0):\n    r\"\"\"Returns the rank ID of the data to be processed in the current ring iteration.\n\n    This function is intended to be used in conjunction with\n    :func:`collective_permute_implicit` or :func:`collective_permute_implicit_reduce`.\n    Since the sources and destinations are implicitly determined in ring algorithms,\n    the rank ID of received data can only be determined at runtime.\n\n    At iteration 0, this returns the current rank's own ID (processing local data).\n    In subsequent iterations, it returns the rank ID of data received from predecessors,\n    progressing around the ring.\n\n    The returned rank ID is a scalar register. 
To determine the offset of the received\n    data chunk within a tensor, use register ALU operations (e.g., multiply the rank ID\n    by chunk size), then use dynamic access pattern (``tensor.ap()``) in ISA compute\n    operations (e.g., ``nisa.nc_matmul()``).\n\n    **Typical usage pattern**: In each iteration of a ring algorithm, the compute kernel\n    uses this function to identify which rank's data is being processed, computes on that\n    data while concurrently triggering the next communication step to send already-computed\n    chunks to the successor.\n\n    **Channels**: Multiple channels enable overlapping communication, allowing concurrent data\n    transfers. The number of available channels depends on the replica group and system\n    connectivity (see\n    `Neuron Collectives <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/collectives.html#system-connectivity>`_).\n    The maximum number of channels is 4 for replica groups containing all devices inside a node\n    and 2 for other supported replica groups.\n\n    :param iteration_id: Current ring step (typically the loop counter).\n    :param replica_group: ReplicaGroup defining the ring topology\n    :param channel_id: Channel ID for the communication (0 to num_channels-1)\n    :return: Scalar register containing the rank ID of the data to be processed\"\"\"\n    ...\n\n\ndef collective_permute_implicit_reduce(srcs0_by_channel, srcs1_by_channel, dsts_by_channel, replica_group, op, channel_ids=[0]):\n    r\"\"\"Perform an implicit collective permute with reduction in a ring, where sources and\n    destinations are implicitly determined by the ring structure during runtime.\n\n    Combines :func:`collective_permute_implicit` with a reduction operation.\n    Each rank reduces its local sources using ``op(srcs0_by_channel[i], srcs1_by_channel[i])``,\n    sends the result to its successor, and receives its predecessor's reduced result into\n    ``dsts_by_channel[i]``.\n\n    Since the sources and destinations are implicitly determined, use\n    :func:`collective_permute_implicit_current_processing_rank_id` to get the rank ID\n    whose data is currently being processed.\n\n    The outer dimension of ``srcs0_by_channel``, ``srcs1_by_channel``, and ``dsts_by_channel``\n    corresponds to channels. For each channel, the inner list contains exactly one tensor\n    (coalesced collective communication is not currently supported).\n\n    **Channels**: Multiple channels enable overlapping communication, allowing concurrent data\n    transfers. The number of available channels depends on the replica group and system\n    connectivity (see\n    `Neuron Collectives <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/collectives.html#system-connectivity>`_).\n    The maximum number of channels is 4 for replica groups containing all devices inside a node\n    and 2 for other supported replica groups.\n\n    :param srcs0_by_channel: List of source tensor lists (left operand of reduction), one per channel. Each inner list must contain exactly one tensor.\n    :param srcs1_by_channel: List of source tensor lists (right operand of reduction), one per channel. Each inner list must contain exactly one tensor.\n    :param dsts_by_channel: List of destination tensor lists to receive predecessor's reduced result, one per channel. 
Each inner list must contain exactly one tensor.\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param op: The reduction operation to perform (``nl.add``, ``nl.minimum``, or ``nl.maximum``)\n    :param channel_ids: List of channel IDs to use for communication (default [0] for single channel).\n        Currently must be consecutive integers starting from 0.\"\"\"\n    ...\n\n\ndef rank_id():\n    r\"\"\"Get the rank ID of the current rank.\n\n    :return: The rank ID of the current rank within the collective group\"\"\"\n    ...\n\n\ndef reduce_scatter(srcs, dsts, replica_group, collective_dim, op):\n    r\"\"\"Perform a reduce-scatter on the given replica group and input/output tensors.\n\n    The ``srcs`` and ``dsts`` parameters accept lists of tensors to support coalesced\n    collective communication, which allows multiple tensors to be reduced and scattered\n    in a single collective operation for improved efficiency.\n\n    Tensors can reside on either HBM or SBUF. However, mixing memory spaces is not\n    supported: all tensors must be on HBM or all must be on SBUF. Coalesced collective\n    communication (multiple tensors) is only supported when tensors are on HBM.\n\n    :param srcs: List of input tensors to reduce and scatter\n    :param dsts: List of output tensors to store results\n    :param replica_group: ReplicaGroup defining rank groups for the collective\n    :param collective_dim: Dimension along which input tensors are split.\n        Currently only 0 is supported for both HBM and SBUF tensors.\n    :param op: The reduction operation to perform (``nl.add``, ``nl.minimum``, or ``nl.maximum``)\"\"\"\n    ...\n\n"
  },
  {
    "path": "nki/api/nki/isa/__init__.py",
    "content": "\"\"\"Stubs for nki.isa\"\"\"\n\nfrom enum import Enum\nimport nki.collectives as nc\nimport nki.isa as nisa\nimport nki.language as nl\n\nclass NKIObject:\n    r\"\"\"Base class for NKI kernel dataclasses and configuration objects.\"\"\"\n    ...\n\n\nclass NkiValidationError(Exception):\n    r\"\"\"Raised when hardware constraints are violated.\"\"\"\n    ...\n\n\nclass VirtualRegister(NKIObject):\n    r\"\"\"A virtual register on engine.\n\n    Allocated via ``nisa.register_alloc()`` and manipulated via\n    ``nisa.register_move()``, ``nisa.register_load()``, ``nisa.register_store()``.\n    \n    Virtual registers represent registers on engine and are used for various APIs \n    such loading and storing constants from tensors, as the return value of \n    ``nki.collective`` and ``nki.isa`` APIs, and for dynamic addressing.\n    \n    In addition to NKI APIs, virtual registers can be used to represent dynamic \n    loop bounds for for loops using :doc:`dynamic_range <nki.language.dynamic_range>`,\n    and while loops.\n    \n    .. code-block:: python\n\n        import nki.language as nl\n        import nki.isa as nisa\n\n        # Using a register in a dynamic for loop.\n        reg = nisa.register_alloc(5)\n        for _ in nl.dynamic_range(reg):\n            tile = nl.load(input_tensor[0:128, 0:512])\n            result = nl.multiply(tile, tile)\n            nl.store(out_tensor[0:128, 0:512], result)\n       \n    .. code-block:: python\n\n        import nki.language as nl\n        import nki.isa as nisa\n\n        # Using a register in a dynamic while loop.\n        cond_sb = nl.ndarray((1, 1), dtype=nl.int32, buffer=nl.sbuf)\n        nisa.dma_copy(dst=cond_sb, src=...)\n\n        # Load condition into register\n        reg = nisa.register_alloc()\n        nisa.register_load(reg, cond_sb)\n\n        while reg:\n            ... \n            nisa.dma_copy(dst=cond_sb, src = ...)\n            nisa.register_load(reg, cond_sb)\n            \n    \"\"\"\n    ...\n\n\nclass dge_mode(Enum):\n    r\"\"\"Descriptor Generation Engine mode.\"\"\"\n\n    unknown = 0\n    \"\"\"Unknown DGE mode, i.e., let compiler decide the DGE mode\"\"\"\n    swdge = 1\n    \"\"\"Software DGE\"\"\"\n    hwdge = 2\n    \"\"\"Hardware DGE\"\"\"\n    none = 3\n    \"\"\"Not using DGE\"\"\"\n\n\nclass dma_engine(Enum):\n    r\"\"\"DMA transfer engine.\n        \"\"\"\n\n    dma = 1\n    \"\"\"Shared DMA with CoreBarrier synchronization (default). 
Can be triggered from any engine.\"\"\"\n    gpsimd_dma = 2\n    \"\"\"GPSIMD's internal DMA engine for low-latency SB-to-SB swaps in LNC=2.\n        Implies GPSIMD as the trigger engine.\"\"\"\n\n\nclass engine(Enum):\n    r\"\"\"Neuron Device engines.\"\"\"\n\n    tensor = 1\n    \"\"\"Tensor Engine\"\"\"\n    vector = 5\n    \"\"\"Vector Engine\"\"\"\n    scalar = 2\n    \"\"\"Scalar Engine\"\"\"\n    gpsimd = 3\n    \"\"\"GpSIMD Engine\"\"\"\n    dma = 4\n    \"\"\"DMA Engine\"\"\"\n    sync = 6\n    \"\"\"Sync Engine\"\"\"\n    unknown = 0\n    \"\"\"Unknown Engine\"\"\"\n\n\nclass matmul_perf_mode(Enum):\n    r\"\"\"Performance mode for matmul.\"\"\"\n\n    none = 'none'\n    \"\"\"Default mode, no performance optimization\"\"\"\n    double_row = 'double_row'\n    \"\"\"Double FP8 mode, 2x matmul throughput by packing two FP8 weight/ifmap element pairs\"\"\"\n\n\nclass nc_version(Enum):\n    r\"\"\"NeuronCore version.\"\"\"\n\n    gen2 = 2\n    \"\"\"Trn1/Inf2 target\"\"\"\n    gen3 = 3\n    \"\"\"Trn2 target\"\"\"\n    gen4 = 4\n    \"\"\"Trn3 target\"\"\"\n\n\nclass oob_mode(Enum):\n    r\"\"\"Out-of-bounds access mode.\"\"\"\n\n    error = 0\n    \"\"\"Raise a runtime error when an out-of-bounds access is detected.\"\"\"\n    skip = 1\n    \"\"\"Silently skip the runtime out-of-bounds access.\"\"\"\n\n\nclass reduce_cmd(Enum):\n    r\"\"\"Engine register reduce commands.\"\"\"\n\n    idle = 0\n    \"\"\"Not using the accumulator registers\"\"\"\n    reset = 1\n    \"\"\"Resets the accumulator registers to its initial state\"\"\"\n    reduce = 2\n    \"\"\"Keeps accumulating over the current value of the accumulator registers\"\"\"\n    reset_reduce = 3\n    \"\"\"Resets the accumulator registers then immediately accumulate the results of the current instruction into the accumulators\"\"\"\n    load_reduce = 4\n    \"\"\"Loads a value into the accumulator registers, then accumulate the results of the current instruction into the accumulators\"\"\"\n\n\ndef activation(dst, op, data, bias=None, scale=1.0, reduce_op=None, reduce_res=None, reduce_cmd=reduce_cmd.idle, name=None):\n    r\"\"\"Apply an activation function on every element of the input tile using Scalar Engine, with an optional scale/bias operation\n    before the activation and an optional reduction operation after the activation in the same instruction.\n\n    The activation function is specified in the ``op`` input field (see :ref:`nki-act-func` for a list of\n    supported activation functions and their valid input ranges).\n\n    ``nisa.activation`` can optionally multiply the input ``data`` by a scalar or vector ``scale``\n    and then add another vector ``bias`` before the activation function is applied.\n\n    After the activation function\n    is applied, Scalar Engine can also reduce along the free dimensions of the activated data per lane, using\n    ``reduce_op`` operation. 
``reduce_op`` must be ``nl.add``.\n\n    The reduction result is then either stored into or reduced on top of a set of internal engine registers\n    called ``reduce_regs`` (one 32-bit register per compute lane, 128 registers in total), controlled by the\n    ``reduce_cmd`` field:\n\n    - ``nisa.reduce_cmd.reset``: Reset ``reduce_regs`` to zero only.\n    - ``nisa.reduce_cmd.idle``: Do not modify ``reduce_regs``.\n    - ``nisa.reduce_cmd.reduce``: Reduce activated data over existing values in ``reduce_regs``.\n    - ``nisa.reduce_cmd.reset_reduce``: Reset ``reduce_regs`` to zero and then store the reduction result\n      of the activated data.\n\n    ``nisa.activation`` can also emit another instruction to read out ``reduce_regs`` by\n    passing an SBUF/PSUM tile in the ``reduce_res`` arguments.\n    The ``reduce_regs`` state can persist across multiple ``nisa.activation`` instructions without the need to\n    be evicted back to SBUF/PSUM (``reduce_res`` tile).\n\n    The following is the pseudo code for ``nisa.activation``:\n\n    .. code-block:: python\n\n        output = op(data * scale + bias)\n\n        if reduce_cmd == nisa.reduce_cmd.reset or reduce_cmd == nisa.reduce_cmd.reset_reduce:\n            reduce_regs = 0\n\n        result = reduce_op(reduce_regs, reduce_op(output, axis=<FreeAxis>))\n\n        if reduce_cmd == nisa.reduce_cmd.reduce or reduce_cmd == nisa.reduce_cmd.reset_reduce:\n            reduce_regs += result\n\n        if reduce_res:\n            reduce_res = reduce_regs\n\n    All these optional operations incur no further performance penalty compared to only applying the activation function,\n    except reading out ``reduce_regs`` into ``reduce_res`` will have a small overhead due to an extra instruction.\n\n    **Memory types.**\n\n    The input ``data`` tile can be an SBUF or PSUM tile. Similarly, the instruction\n    can write the output ``dst`` tile into either SBUF or PSUM.\n\n    **Data types.**\n\n    Both input ``data`` and output ``dst`` tiles can be in any valid NKI data type\n    (see :ref:`nki-dtype` for more information).\n    The Scalar Engine always performs the math operations in float32 precision.\n    Therefore, the engine automatically casts the input ``data`` tile to float32 before\n    performing multiply/add/activate specified in the activation instruction.\n    The engine is also capable of casting the float32 math results into another\n    output data type in ``dst`` at no additional performance cost.\n    The ``scale`` parameter must\n    have a float32 data type, while the ``bias`` parameter can be any supported dtype except tfloat32.\n\n    **Layout.**\n\n    The ``scale`` can either be a compile-time constant scalar or a\n    ``[N, 1]`` vector from SBUF/PSUM. 
``N`` must be the same as the partition dimension size of ``data``.\n    In NeuronCore-v2, the ``bias`` must be a ``[N, 1]`` vector, but starting NeuronCore-v3, ``bias`` can either be\n    a compile-time constant scalar or a ``[N, 1]`` vector similar to ``scale``.\n\n    When the ``scale`` (or similarly, ``bias``) is a scalar, the scalar\n    is broadcast to all the elements in the input ``data`` tile to perform the computation.\n    When the ``scale`` (or ``bias``) is a vector, the ``scale`` (or ``bias``) value in each partition is broadcast\n    along the free dimension of the ``data`` tile.\n\n    **Tile size.**\n\n    The partition dimension size of input ``data`` and output ``dst`` tiles must be the same and must not exceed 128.\n    The number of elements per partition of ``data`` and ``dst`` tiles must be the same and must not\n    exceed the physical size of each SBUF partition.\n\n    :param dst: the activation output\n    :param op: an activation function (see :ref:`nki-act-func` for supported functions)\n    :param data: the input tile; layout: (partition axis <= 128, free axis)\n    :param scale: a scalar or a vector for multiplication\n    :param bias: a scalar (NeuronCore-v3 or newer) or a vector for addition\n    :param reduce_op: the reduce operation to perform on the free dimension of the activated data\n    :param reduce_res: a tile of shape ``(data.shape[0], 1)`` to hold the final state of ``reduce_regs``.\n    :param reduce_cmd: an enum member from ``nisa.reduce_cmd`` to control the state of ``reduce_regs``.\"\"\"\n    ...\n\n\ndef activation_reduce(dst, op, data, reduce_op, reduce_res, bias=None, scale=1.0, name=None):\n    r\"\"\"Perform the same computation as ``nisa.activation`` and also a reduction along the free dimension of the\n    ``nisa.activation`` result using Scalar Engine. The result of the reduction is stored\n    in ``reduce_res``.\n\n    This API is equivalent to calling ``nisa.activation`` with\n    ``reduce_cmd=nisa.reduce_cmd.reset_reduce`` and passing in reduce_res. This API is kept for\n    backward compatibility; we recommend using ``nisa.activation`` moving forward.\n\n    Refer to :doc:`nisa.activation <nki.isa.activation>` for semantics of ``op/data/bias/scale``.\n\n    In addition to :doc:`nisa.activation <nki.isa.activation>` computation, this API also performs a reduction\n    along the free dimension(s) of the :doc:`nisa.activation <nki.isa.activation>` result, at a small additional\n    performance cost. The reduction result is returned in ``reduce_res`` in-place, which must be a\n    SBUF/PSUM tile with the same partition axis size as the input tile ``data`` and one element per partition.\n    On NeuronCore-v2, the ``reduce_op`` must be ``nl.add``.\n\n    There are 128 registers on the scalar engine for storing reduction results, corresponding\n    to the 128 partitions of the input. These registers are shared between ``activation`` and ``activation_accu`` calls.\n    This instruction first resets those\n    registers to zero, performs the reduction on the value after the activation function is applied,\n    stores the results into the registers,\n    then reads out the reduction results from the registers, and eventually stores them into ``reduce_res``.\n\n    Note that ``nisa.activation`` can also change the state of these registers. It is the user's\n    responsibility to ensure correct ordering. As a best practice, do not mix\n    the use of ``activation_reduce`` and ``activation``.\n\n    Reduction axis is not configurable in this API. 
If the input tile has multiple free axis, the API will\n    reduce across all of them.\n\n    Mathematically, this API performs the following computation:\n\n    .. code-block:: python\n\n        output = op(data * scale + bias)\n        reduce_res = reduce_op(output, axis=<FreeAxis>)\n\n    :param dst: output tile of the activation instruction; layout: same as input ``data`` tile\n    :param op: an activation function (see :ref:`nki-act-func` for supported functions)\n    :param data: the input tile; layout: (partition axis <= 128, free axis)\n    :param reduce_op: the reduce operation to perform on the free dimension of the activation result\n    :param reduce_res: a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile. The result of ``sum(ReductionResult)``\n                    is written in-place into the tensor.\n    :param bias: a vector with the same partition axis size as ``data``\n                 for broadcast add (after broadcast multiply with ``scale``)\n    :param scale: a scalar or a vector with the same partition axis size as ``data``\n                  for broadcast multiply\"\"\"\n    ...\n\n\ndef affine_select(dst, pattern, channel_multiplier, on_true_tile, on_false_value, cmp_op=nl.equal, offset=0, name=None):\n    r\"\"\"Select elements between an input tile ``on_true_tile`` and a scalar value ``on_false_value``\n    according to a boolean predicate tile using GpSimd Engine.\n\n    The predicate tile is calculated on-the-fly in the engine by evaluating an affine expression element-by-element.\n    The affine expression is defined by a ``pattern``, ``offset``, and ``channel_multiplier``, similar to ``nisa.iota``.\n    The ``pattern`` field is a list of lists in the form of\n    ``[[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]]``. When fewer than 4D ``pattern``\n    is provided, NKI compiler automatically pads remaining dimensions with size of 1.\n\n    Given a 4D pattern (padded if needed), the instruction generates a predicate using the following pseudo code:\n\n    .. code-block:: python\n\n        num_partitions = dst.shape[0]\n        [[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]] = pattern\n\n        for channel_id in range(num_partitions):\n          for w in range(num_w):\n            for z in range(num_z):\n              for y in range(num_y):\n                for x in range(num_x):\n                  affine_value = offset + (channel_id * channel_multiplier) +\n                                (w * step_w) + (z * step_z) + (y * step_y) + (x * step_x)\n\n                  predicate = cmp_op(affine_value, 0)  # Compare with 0 using cmp_op\n\n                  if predicate:\n                      dst[channel_id, w, z, y, x] = on_true_tile[channel_id, w, z, y, x]\n                  else:\n                      dst[channel_id, w, z, y, x] = on_false_value\n\n    The above pseudo code assumes ``dst`` has the same size in every dimension ``x/y/z/w`` for simplicity. However,\n    the instruction allows any sizes in the free dimension, as long as the number of elements per partition in ``dst``\n    matches the product: ``num_w * num_z * num_y * num_x``.\n\n    A common use case for ``affine_select`` is to apply a causal mask on the attention\n    scores for transformer decoder models.\n\n    **Memory types.**\n\n    The output ``dst`` tile must be in SBUF. 
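Building on the causal-mask use case above, a minimal sketch (tile shape, dtype, and the mask fill value are illustrative assumptions):\n\n    .. code-block:: python\n\n        import nki.isa as nisa\n        import nki.language as nl\n\n        # Hypothetical attention-score tile: partition dim = query index, free dim = key index.\n        scores = nl.ndarray((128, 128), dtype=nl.float32, buffer=nl.sbuf)\n        masked = nl.ndarray((128, 128), dtype=nl.float32, buffer=nl.sbuf)\n\n        # affine_value = row - col; keep scores where row - col >= 0 (on/below the\n        # diagonal) and fill the rest with a large negative value before softmax.\n        nisa.affine_select(dst=masked, pattern=[[-1, 128]], channel_multiplier=1,\n                           on_true_tile=scores, on_false_value=-30000.0,\n                           cmp_op=nl.greater_equal, offset=0)\n\n    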
The input ``on_true_tile`` must also be in SBUF.\n\n    **Data types.**\n\n    The input ``on_true_tile`` and output ``dst`` tile can be any valid NKI data type\n    (see :ref:`nki-dtype` for more information). If the data type of ``on_true_tile`` differs from\n    that of ``dst``, the input elements in ``on_true_tile``, if selected, are first cast to FP32\n    before converting to the output data type in ``dst``.\n    The ``on_false_value`` must be float32, regardless of the input/output tile data types.\n\n    **Layout.**\n\n    The partition dimension determines the number of active channels for parallel pattern generation and selection.\n    The input tile ``on_true_tile``, the calculated boolean predicate tile, and the returned output tile\n    must have the same partition dimension size.\n\n    **Tile size.**\n\n    - The partition dimension size of ``dst`` and ``on_true_tile`` must be the same and must not exceed 128.\n    - The number of elements per partition of ``dst`` and ``on_true_tile`` must not\n      exceed the physical size of each SBUF partition.\n    - The total number of elements in ``pattern`` must match the number of elements\n      per partition in the ``dst`` and ``on_true_tile`` tiles.\n\n    :param dst: the output tile in SBUF to store the selected values\n    :param pattern: a list of [step, num] to describe up to 4D tensor sizes and strides for affine expression generation\n    :param offset: an int32 offset value to be added to every generated affine value\n    :param channel_multiplier: an int32 multiplier to be applied to the channel (partition) ID\n    :param on_true_tile: an input tile for selection with a ``True`` predicate value\n    :param on_false_value: a scalar value for selection with a ``False`` predicate value\n    :param cmp_op: comparison operator to use for predicate evaluation (default: nl.equal)\"\"\"\n    ...\n\n\ndef bn_aggr(dst, data, name=None):\n    r\"\"\"Aggregate one or multiple ``bn_stats`` outputs to generate\n    a mean and variance per partition using Vector Engine.\n\n    The input ``data`` tile\n    effectively has an array of ``(count, mean, variance*count)`` tuples per partition\n    produced by  :doc:`bn_stats <nki.isa.bn_stats>` instructions. 
Therefore, the number of elements per partition\n    of ``data`` must be a multiple of three.\n\n    Note, if you need to aggregate multiple ``bn_stats`` instruction outputs,\n    it is recommended to declare a SBUF tensor\n    and then make each ``bn_stats`` instruction write its output into the\n    SBUF tensor at different offsets.\n\n    Vector Engine performs the statistics aggregation in float32 precision.\n    The engine automatically casts the input ``data`` to float32 before performing computation.\n    The float32 computation results are cast to ``dst.dtype`` at no additional performance cost.\n\n    :param dst: an output tile with two elements per partition: a mean followed by a variance\n    :param data: an input tile with results of one or more :doc:`bn_stats <nki.isa.bn_stats>`\"\"\"\n    ...\n\n\ndef bn_stats(dst, data, name=None):\n    r\"\"\"Compute mean- and variance-related statistics for each partition of an input tile ``data``\n    in parallel using Vector Engine.\n\n    The output tile of the instruction has 6 elements per partition:\n\n    - the ``count`` of the even elements (of the input tile elements from the same partition)\n    - the ``mean`` of the even elements\n    - ``variance * count`` of the even elements\n    - the ``count`` of the odd elements\n    - the ``mean`` of the odd elements\n    - ``variance * count`` of the odd elements\n\n    To get the final mean and variance of the input tile,\n    we need to pass the above ``bn_stats`` instruction output\n    into the :doc:`bn_aggr <nki.isa.bn_aggr>`\n    instruction, which will output two elements per partition:\n\n    - mean (of the original input tile elements from the same partition)\n    - variance\n\n    Due to a hardware limitation, the number of elements per partition\n    (i.e., free dimension size) of the input ``data`` must not exceed 512 (nl.tile_size.bn_stats_fmax).\n    To calculate per-partition mean/variance of a tensor with more than\n    512 elements in free dimension, we can invoke ``bn_stats`` instructions\n    on each 512-element tile and use a single ``bn_aggr`` instruction to\n    aggregate ``bn_stats`` outputs from all the tiles.\n\n    Vector Engine performs the above statistics calculation in float32 precision.\n    The engine automatically casts the input ``data`` to float32 before performing computation.\n    The float32 computation results are cast to ``dst.dtype`` at no additional performance cost.\n\n    :param dst: an output tile with 6-element statistics per partition\n    :param data: the input tile (up to 512 elements per partition)\"\"\"\n    ...\n\n\ndef core_barrier(data, cores, engine=engine.gpsimd, name=None):\n    r\"\"\"Synchronize execution across multiple NeuronCores by implementing a barrier mechanism.\n\n    .. note::\n      Available only on NeuronCore-v3 or newer.\n\n    This instruction creates a synchronization point that all specified NeuronCores must\n    reach before any can proceed. The barrier is implemented using a semaphore-based protocol\n    where each NeuronCore writes a semaphore to each other core (remote semaphore update)\n    and then waits for the other cores' semaphores before continuing execution (local semaphore wait).\n\n    The use case is when two NeuronCores both need to write to disjoint portions of a\n    shared HBM tensor (``data``) and they both need to consume the tensor after both cores\n    have finished writing into the tensor. 
In this case, both cores can perform the write to\n    ``data`` in HBM using ``nisa.dma_copy``, and then signal to each other when the write operation is complete\n    using ``nisa.core_barrier``.\n\n    This instruction is only allowed in NeuronCore-v3 or newer when\n    `LNC (Logical NeuronCore) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/logical-neuroncore-config.html>`_\n    is enabled. Currently only ``cores=(0, 1)`` is supported. This allows synchronization between exactly\n    two NeuronCores that share the same HBM stack.\n\n    The ``data`` parameter represents the shared data that all cores need to synchronize on.\n    This must be data in shared HBM that multiple cores are accessing.\n\n    The ``engine`` parameter allows specifying which engine inside the NeuronCores should execute the barrier\n    instruction (that is, the remote semaphore update and local semaphore wait). The barrier will block\n    execution on this engine, other engines will not be blocked.\n\n    :param data: the shared data that all cores need to synchronize on; must be data in shared HBM\n    :param cores: a tuple of core indices to synchronize; only ``(0, 1)`` is supported when LNC2 is enabled\n    :param engine: the engine to execute the barrier instruction on; defaults to GpSimd Engine\n\n    Example:\n\n    .. code-block:: python\n\n        # Synchronize between two cores after each core writes to half of shared tensor\n        shared_tensor = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.shared_hbm)\n\n        # Each core writes to half of the tensor\n        if core_id == 0:\n            # Core 0 writes to first half\n            core0_data = nl.ndarray((batch_size // 2, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)\n            nisa.dma_copy(dst=shared_tensor[:batch_size // 2, :], src=core0_data)\n        else:\n            # Core 1 writes to second half\n            core1_data = nl.ndarray((batch_size // 2, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)\n            nisa.dma_copy(dst=shared_tensor[batch_size // 2:, :], src=core1_data)\n\n        core_barrier(data=shared_tensor, cores=(0, 1))\n\n        # Now both cores can safely read the complete tensor\"\"\"\n    ...\n\n\ndef dma_compute(dst, srcs, reduce_op, scales=None, unique_indices=True, name=None):\n    r\"\"\"Perform math operations using compute logic inside DMA engines with element-wise scaling and reduction.\n\n    This instruction leverages the compute capabilities within DMA engines to perform scaled element-wise operations\n    followed by reduction across multiple source tensors. The computation follows the pattern:\n    ``dst = reduce_op(srcs[0] * scales[0], srcs[1] * scales[1], ...)``, where each source tensor is first\n    multiplied by its corresponding scale factor, then all scaled results are combined using the specified\n    reduction operation.\n    Currently, only ``nl.add`` is supported for ``reduce_op``, and\n    all values in ``scales`` must be ``1.0`` (or ``scales`` can be ``None``\n    which defaults to all 1.0).\n\n    The DMA engines perform all computations in float32 precision internally. 
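For instance, a minimal sketch of the basic (non-indirect) form, with hypothetical tile names, shapes, and dtypes:\n\n    .. code-block:: python\n\n        import nki.isa as nisa\n        import nki.language as nl\n\n        # Hypothetical SBUF tiles with matching element counts.\n        a = nl.ndarray((128, 512), dtype=nl.bfloat16, buffer=nl.sbuf)\n        b = nl.ndarray((128, 512), dtype=nl.bfloat16, buffer=nl.sbuf)\n        out = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n\n        # out = a + b; omitting scales defaults to all-1.0 scaling.\n        nisa.dma_compute(dst=out, srcs=[a, b], reduce_op=nl.add)\n\n    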
Input tensors are automatically\n    cast from their source data types to float32 before computation, and the final float32 result is cast\n    to the output data type in a pipelined fashion.\n\n    **Read-Modify-Write with vector_offset (scatter and gather).**\n\n    When one of the source tensors has a ``vector_offset`` (indirect indexing),\n    ``dma_compute`` performs read-modify-write with two modes:\n\n    **Scatter RMW**: ``dst(HBM)[indices] = dst(HBM)[indices] + src(SB)``\n      - ``dst`` is in HBM with indirect indexing\n      - One source matches ``dst`` and has ``vector_offset``\n      - The other source is data in SBUF\n\n    **Gather RMW**: ``dst(SB) = dst(SB) + src(HBM)[indices]``\n      - ``dst`` is in SBUF\n      - One source is data in HBM with ``vector_offset``\n      - The other source matches ``dst``\n\n    Both modes require:\n      - Exactly 2 source tensors\n      - All ``scales`` must be ``1.0`` (or ``None``)\n      - ``unique_indices`` must be ``True`` (non-unique indices not yet supported)\n\n    **Memory types.**\n\n    Both input ``srcs`` tensors and output ``dst`` tensor can be in HBM or SBUF.\n    Both ``srcs`` and ``dst`` tensors must have compile-time known addresses (unless using vector_offset for indirect access).\n\n    **Data types.**\n\n    All input ``srcs`` tensors and the output ``dst`` tensor can be any supported NKI data types\n    (see :ref:`nki-dtype` for more information). The DMA engines automatically cast input data types to float32\n    before performing the scaled reduction computation. The float32 computation results are then cast to the\n    data type of ``dst`` in a pipelined fashion.\n\n    **Layout.**\n\n    The computation is performed element-wise across all tensors, with the reduction operation applied\n    across the scaled source tensors at each element position.\n\n    **Tile size.**\n\n    The element count of each tensor in ``srcs`` and ``dst`` must match exactly.\n    The max number of source tensors in ``srcs`` is 16.\n\n    :param dst: the output tensor to store the computed results\n    :param srcs: a list of input tensors to be scaled and reduced\n    :param reduce_op: the reduction operation to apply (currently only ``nl.add`` is supported)\n    :param scales: (optional) a list of scale factors corresponding to each\n                   tensor in ``srcs``. 
Must be all 1.0 if provided.\n                   Defaults to None (equivalent to [1.0, 1.0, ...]).\n    :param unique_indices: (optional) Whether scatter indices are unique.\n                          Must be True when using vector_offset (non-unique not yet supported).\n                          Default: True.\"\"\"\n    ...\n\n\ndef dma_copy(dst, src, oob_mode=oob_mode.error, dge_mode=dge_mode.unknown, engine=engine.unknown, name=None):\n    r\"\"\"Copy data from ``src`` to ``dst`` using DMA engines.\n\n    This instruction performs data movement between memory locations (SBUF or HBM) using DMA engines.\n    The operation copies data from the source tensor to the destination tensor: ``dst = src``.\n\n    ``nisa.dma_copy`` supports different modes of DMA descriptor generation (DGE):\n\n    - ``nisa.dge_mode.none``: Neuron Runtime generates DMA descriptors and stores them into HBM before NEFF execution.\n    - ``nisa.dge_mode.swdge``: Gpsimd Engine generates DMA descriptors as part of the ``nisa.dma_copy`` instruction\n      during NEFF execution.\n    - ``nisa.dge_mode.hwdge``: Sync Engine or Scalar Engine sequencers invoke DGE hardware block to generate DMA\n      descriptors as part of the ``nisa.dma_copy`` instruction during NEFF execution.\n\n    See `Trainium2 arch guide` and `Introduction to DMA with NKI` for more discussion.\n\n    When either ``sw_dge`` or ``hw_dge`` mode is used, the ``src`` and ``dst`` tensors can have a dynamic start address\n    which depends on a variable that cannot be resolved at compile time. When ``sw_dge`` is selected, ``nisa.dma_copy``\n    can also perform a gather or scatter operation, using a list of dynamic indices from SBUF.\n    In both of these dynamic modes, out-of-bound address checking is turned on automatically during execution.\n    By default a runtime error is raised (``oob_mode=oob_mode.error`` as default setting).\n    Developers can disable this error and make the ``nisa.dma_copy`` instruction skip the DMA transfer for a given dynamic\n    address or index when it is out of bound using ``oob_mode=oob_mode.skip``.\n\n    **Memory types.**\n\n    Both ``src`` and ``dst`` tiles can be in HBM or SBUF. However, if both tiles are in SBUF, consider using an alternative\n    for better performance:\n\n    - :doc:`nisa.tensor_copy <nki.isa.tensor_copy>` for direct copies\n    - :doc:`nisa.nc_n_gather <nki.isa.nc_n_gather>` to gather elements within each partition independently\n    - :doc:`nisa.local_gather <nki.isa.local_gather>` to gather elements within groups of partitions\n\n    **Data types.**\n\n    Both ``src`` and ``dst`` tiles can be any supported NKI data types (see :ref:`nki-dtype` for more information).\n\n    The DMA engines automatically handle data type conversion when ``src`` and ``dst`` have different data types.\n    The conversion is performed through a two-step process: first casting from ``src.dtype`` to float32, then\n    from float32 to ``dst.dtype``.\n\n    **Tile size.**\n\n    The total number of data elements in ``src`` must match that of ``dst``.\n\n    **Indirect addressing (gather/scatter).**\n\n    ``nisa.dma_copy`` supports indirect addressing for dynamic row selection at runtime. This enables\n    gather (read from dynamic rows) and scatter (write to dynamic rows) patterns. 
Indirect addressing\n    is activated by calling ``.ap()`` on ``src`` or ``dst`` with a ``vector_offset`` or ``scalar_offset``\n    parameter.\n\n    There are two types of indirect addressing:\n\n    *Vector indirection* provides per-partition dynamic offsets. Each of the hardware partitions\n    gets its own index, enabling gather/scatter where different partitions access different rows.\n    Use ``.ap(pattern=..., vector_offset=idx_tensor, indirect_dim=0)`` where ``idx_tensor`` is an\n    SBUF tensor of shape ``(P, 1)`` containing one row index per partition.\n    The tensor being indexed (the one ``.ap()`` is called on) must be in HBM.\n\n    *Scalar indirection* provides a single dynamic offset applied uniformly to all partitions.\n    Use ``.ap(pattern=..., scalar_offset=reg_or_tensor, indirect_dim=N)`` where the offset is\n    either a 1x1 SBUF tensor or a ``VirtualRegister`` from ``nisa.register_alloc()``.\n\n    ``vector_offset`` and ``scalar_offset`` are mutually exclusive.\n\n    **Indirect gather example** (``vector_offset`` on ``src``):\n\n    .. code-block:: python\n\n        import nki\n        import nki.isa as nisa\n        import nki.language as nl\n\n        @nki.jit\n        def indirect_gather_kernel(data, indices):\n            P, F = indices.shape[0], data.shape[1]\n            output = nl.ndarray((P, F), dtype=data.dtype, buffer=nl.shared_hbm)\n\n            idx = nl.ndarray((P, 1), dtype=nl.uint32, buffer=nl.sbuf)\n            nisa.dma_copy(dst=idx, src=indices)\n\n            dst = nl.ndarray((P, F), dtype=data.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(\n                dst=dst,\n                src=data.ap(\n                    pattern=[[F, P], [1, F]],\n                    vector_offset=idx,\n                    indirect_dim=0,\n                ),\n            )\n\n            nisa.dma_copy(dst=output, src=dst)\n            return output\n\n    **Indirect scatter example** (``vector_offset`` on ``dst``):\n\n    .. code-block:: python\n\n        import nki\n        import nki.isa as nisa\n        import nki.language as nl\n\n        @nki.jit\n        def indirect_scatter_kernel(src_data, indices, output):\n            P, F = src_data.shape\n\n            src = nl.ndarray((P, F), dtype=src_data.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=src, src=src_data)\n\n            idx = nl.ndarray((P, 1), dtype=nl.uint32, buffer=nl.sbuf)\n            nisa.dma_copy(dst=idx, src=indices)\n\n            nisa.dma_copy(\n                dst=output.ap(\n                    pattern=[[F, P], [1, F]],\n                    vector_offset=idx,\n                    indirect_dim=0,\n                ),\n                src=src,\n            )\n            return output\n\n    :param dst: the destination tensor to copy data into\n    :param src: the source tensor to copy data from\n    :param dge_mode: (optional) specify which Descriptor Generation Engine (DGE) mode to use for DMA descriptor generation: ``nki.isa.dge_mode.none`` (turn off DGE) or ``nki.isa.dge_mode.swdge`` (software DGE) or ``nki.isa.dge_mode.hwdge`` (hardware DGE)  or ``nki.isa.dge_mode.unknown`` (by default, let compiler select the best DGE mode). Hardware based DGE is only supported for NeuronCore-v3 or newer. See `Trainium2 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`__ for more information.\n    :param oob_mode: (optional) Specifies how to handle out-of-bounds (oob) array indices during indirect access operations. 
Valid modes are:\n\n        - ``oob_mode.error``: (Default) Raises an error when encountering out-of-bounds indices.\n        - ``oob_mode.skip``: Silently skips any operations involving out-of-bounds indices.\n\n        For example, when using indirect gather/scatter operations, out-of-bounds indices can occur if the index array contains values that exceed the dimensions of the target array.\n\n    :param engine: (optional) the engine to use for HWDGE descriptor generation: ``nki.isa.engine.sync`` or ``nki.isa.engine.scalar``.\n                   Only valid when ``dge_mode=nisa.dge_mode.hwdge``. ``nki.isa.engine.unknown`` by default.\"\"\"\n    ...\n\n\ndef dma_transpose(dst, src, axes=None, dge_mode=dge_mode.unknown, oob_mode=oob_mode.error, name=None):\n    r\"\"\"Perform a transpose on input ``src`` using DMA Engine.\n\n    The permutation of transpose follow the rules described below:\n\n    1. For 2-d input tile, the permutation will be [1, 0]\n    2. For 3-d input tile, the permutation will be [2, 1, 0]\n    3. For 4-d input tile, the permutation will be [3, 1, 2, 0]\n\n    **DMA Direct Transpose Constraints**\n\n    The only valid ``dge_mode`` s are ``unknown`` and ``hwdge``. If ``hwdge``, this instruction will be lowered\n    to a Hardware DGE transpose. This has additional restrictions:\n\n    1. ``src.shape[0] == 16``\n    2. ``src.shape[-1] % 128 == 0``\n    3. ``src.dtype`` is 2 bytes\n\n    **DMA Indirect Transpose Constraints**\n\n    The only valid ``dge_mode`` s are ``unknown`` and ``swdge``. This instruction will be lowered\n    to a Software DGE transpose (``dma_gather_transpose``). This has additional restrictions:\n\n    #. When ``src`` is 4D: ``len(src[1])`` or ``len(src[2])`` must be 1\n    #. ``src.shape[-1] <= 128``\n    #. ``src.dtype`` is 2 bytes\n    #. ``src`` tensor must be on HBM\n    #. ``indices`` must be 2-d\n    #. ``indices.shape[0] * indices.shape[1]`` must be ``>=`` ``src.shape[0]``\n    #. ``src.shape[0]`` must be divisible by 16\n    #. ``indices.shape[0]`` must be in ``[16, 128]`` and divisible by 16\n    #. When ``indices.shape[1] > 1``: ``indices.shape[0]`` must be exactly 128\n    #. ``indices.dtype`` is ``np.uint32``\n    #. ``indices`` tensor must be on SBUF\n    #. TRN2+ only\n\n    Indirect transpose effectively performs the following operation:\n    ``flat_indices = indices.T.flatten()[:src.shape[0]]``\n    ``gathered = src[flat_indices, :]``\n    ``dst = gathered.T``\n\n    **Indirect transpose example** (``vector_offset`` on ``src``):\n\n    .. code-block:: python\n\n        import nki\n        import nki.isa as nisa\n        import nki.language as nl\n\n        @nki.jit\n        def gather_transpose_kernel(src_hbm, idx_hbm):\n            P, F = 128, 128\n            output = nl.ndarray((F, P), dtype=src_hbm.dtype, buffer=nl.shared_hbm)\n\n            idx_sb = nl.load(idx_hbm)\n\n            dst_sb = nl.ndarray((F, P), dtype=src_hbm.dtype, buffer=nl.sbuf)\n            nisa.memset(dst=dst_sb, value=0)\n\n            src_ap = src_hbm.ap(\n                pattern=[[F, P], [1, F]],\n                vector_offset=idx_sb,\n                indirect_dim=0,\n            )\n            nisa.dma_transpose(dst=dst_sb, src=src_ap, axes=(1, 0))\n\n            nisa.dma_copy(dst=output, src=dst_sb)\n            return output\n\n    :param dst: the destination of transpose, must be a tile in SBUF.\n    :param src: the source of transpose, must be a tile in HBM or SBUF. 
``src.dtype == dst.dtype``\n    :param axes: transpose axes where the i-th axis of the transposed tile will correspond to the axes[i] of the source.\n                 Supported axes are ``(1, 0)``, ``(2, 1, 0)``, and ``(3, 1, 2, 0)``.\n    :param dge_mode: (optional) specify which Descriptor Generation Engine (DGE) mode to use for DMA descriptor generation: ``nki.isa.dge_mode.none`` (turn off DGE) or ``nki.isa.dge_mode.swdge`` (software DGE) or ``nki.isa.dge_mode.hwdge`` (hardware DGE)  or ``nki.isa.dge_mode.unknown`` (by default, let compiler select the best DGE mode). Hardware based DGE is only supported for NeuronCore-v3 or newer. See `Trainium2 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`__ for more information.\n    :param oob_mode: (optional) Specifies how to handle runtime out-of-bounds (oob) array indices during indirect access operations. Valid modes are:\n\n        - ``oob_mode.error``: (Default) Raises an error when encountering runtime out-of-bounds indices.\n\n        - ``oob_mode.skip``: Silently skips any operations involving out-of-bounds indices. Only valid when ``src`` uses indirect indexing.\"\"\"\n    ...\n\n\ndef dropout(dst, data, prob, name=None):\n    r\"\"\"Randomly replace some elements of the input tile ``data`` with zeros\n    based on input probabilities using Vector Engine.\n    The probability of replacing input elements with zeros (i.e., drop probability)\n    is specified using the ``prob`` field:\n    - If the probability is 1.0, all elements are replaced with zeros.\n    - If the probability is 0.0, all elements are kept with their original values.\n\n    The ``prob`` field can be a scalar constant or a tile of shape ``(data.shape[0], 1)``,\n    where each partition contains one drop probability value.\n    The drop probability value in each partition is applicable to the input\n    ``data`` elements from the same partition only.\n\n    Data type of the input ``data`` tile can be any valid NKI data types\n    (see :ref:`nki-dtype` for more information).\n    However, data type of ``prob`` has restrictions based on the data type of ``data``:\n\n    - If data type of ``data`` is any of the integer types (e.g., int32, int16),\n      ``prob`` data type must be float32\n    - If data type of data is any of the float types (e.g., float32, bfloat16),\n      ``prob`` data can be any valid float type\n\n    The output data type ``dst.dtype`` must match the input data type ``data.dtype``.\n\n    :param dst: an output tile of the dropout result\n    :param data: the input tile\n    :param prob: a scalar or a tile of shape ``(data.shape[0], 1)`` to indicate the\n                 probability of replacing elements with zeros\"\"\"\n    ...\n\n\ndef exponential(dst, src, max_value=0.0, reduce_res=None, reduce_cmd=reduce_cmd.idle, reduce_init=0.0, name=None):\n    r\"\"\"Apply exponential function to each element after subtracting a max_value using Vector Engine.\n\n    .. note::\n        Available only on NeuronCore-v4 and newer.\n\n    This instruction computes ``exp(src - max_value)`` for each element. The instruction can\n    optionally maintain a running sum of the exponential values using shared internal reduction\n    registers in the Vector Engine.\n\n    The exponential operation is performed as:\n\n    .. code-block::\n\n        dst[i] = exp(src[i] - max_value)\n\n    When accumulation is enabled through ``reduce_cmd``, the instruction also computes:\n\n    .. 
code-block::\n\n        reduce_res[i] = sum(dst[i])\n\n    The Vector Engine performs the computation in float32 precision internally and can\n    output results in various data types as specified by the ``dst`` dtype field.\n\n    **Constraints**\n\n    - Supported engines: Vector.\n    - ``src``, ``dst`` must have the same number of elements in the partition dimension.\n    - ``src``, ``dst`` must have the same number of elements in the free dimensions.\n    - ``src``, ``dst`` can be up to 4D tensor.\n    - ``reduce_init`` should be unset or set to ``0.0`` when ``reduce_cmd`` is not ``load_reduce``.\n\n    :param dst: The output tile with exponential function applied. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32, int8, int16, int32, uint8, uint16.\n    :param src: The input tile to apply exponential function on. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, int8, int16, int32, uint8, uint16, uint32.\n    :param max_value: The maximum value to subtract from each element before applying exponential (for numerical stability). Can be a scalar or vector of shape ``(src.shape[0], 1)``. Supported dtypes: float32.\n    :param reduce_res: Optional tile to store reduction results (sum of exponentials). Must have shape ``(src.shape[0], 1)``. Supported buffers: SBUF, PSUM. Supported dtypes: float8_e4m3, float8_e5m2, float16, bfloat16, float32, tfloat32.\n    :param reduce_cmd: Control the state of reduction registers for accumulating exponential results. Supported: ``idle``, ``reset_reduce``, ``reduce``, ``load_reduce``.\n    :param reduce_init: Initial value for reduction when using ``reduce_cmd.load_reduce``. Supported dtypes: float32.\n\n    **Accumulator behavior:**\n\n    The Vector Engine maintains internal accumulator registers that can be controlled via the ``reduce_cmd`` parameter:\n\n    - ``reduce_cmd.reset_reduce``: Reset accumulators to 0, then accumulate the current results.\n    - ``reduce_cmd.reduce``: Continue accumulating without resetting (useful for multi-step reductions).\n    - ``reduce_cmd.load_reduce``: Load the values from ``reduce_init`` into the accumulator, then accumulate the current result on top of it.\n    - ``reduce_cmd.idle``: (default) No accumulation performed, accumulator state unknown.\n\n    .. note::\n      Even when ``reduce_cmd`` is set to ``idle``, the accumulator state may still be modified.\n      Always use ``reset_reduce`` after any Vector Engine operation that ran with ``idle`` mode to ensure\n      consistent behavior.\n\n    .. note::\n      The accumulator registers are shared for other Vector Engine accumulation instructions such as :doc:`nki.isa.range_select <nki.isa.range_select>`,\n      :doc:`nki.isa.select_reduce <nki.isa.select_reduce>`, and :doc:`nki.isa.tensor_scalar_cumulative <nki.isa.tensor_scalar_cumulative>`.\n\n    **Behavior**\n\n    .. 
code-block:: python\n\n        # Initialize reduction if requested\n        if reduce_cmd == reduce_cmd.reset_reduce:\n            accumulator = 0\n        elif reduce_cmd == reduce_cmd.load_reduce:\n            accumulator = reduce_init\n        elif reduce_cmd == reduce_cmd.idle:\n            accumulator = undefined  # Not used\n\n        # Process each element\n        for i in range(num_elements):\n            dst[i] = exp(src[i] - max_value)\n\n            # Update reduction if active\n            if reduce_cmd != reduce_cmd.idle:\n                accumulator += dst[i]\"\"\"\n    ...\n\n\ndef get_nc_version():\n    r\"\"\"Returns the nc_version of the current target context.\"\"\"\n    ...\n\n\ngpsimd_engine = engine.gpsimd\n\"\"\"GpSIMD Engine\"\"\"\n\n\ndef iota(dst, pattern, offset=0, channel_multiplier=0, name=None):\n    r\"\"\"Generate a constant literal pattern into SBUF using GpSimd Engine.\n\n    The pattern is defined by an int32 ``offset``, a tensor access pattern of up to 4D ``pattern`` and\n    an int32 ``channel_multiplier``. The ``pattern`` field is a list of lists in the form of\n    ``[[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]]``. When fewer than 4D ``pattern``\n    is provided, NKI compiler automatically pads remaining dimensions with size of 1.\n\n    Given a 4D pattern (padded if needed), the instruction generates a stream of values using the following pseudo code:\n\n    .. code-block:: python\n\n        num_partitions = dst.shape[0]\n        [[step_w, num_w], [step_z, num_z], [step_y, num_y], [step_x, num_x]] = pattern\n\n        for channel_id in range(num_partitions):\n            for w in range(num_w):\n                for z in range(num_z):\n                    for y in range(num_y):\n                        for x in range(num_x):\n                            value = offset + (channel_id * channel_multiplier) +\n                                    (w * step_w) + (z * step_z) + (y * step_y) + (x * step_x)\n\n                            dst[channel_id, w, z, y, x] = value\n\n    The above pseudo code assumes ``dst`` has the same size in every dimension ``x/y/z/w`` for simplicity. However,\n    the instruction allows any sizes in the free dimension, as long as the number of elements per partition in ``dst``\n    matches the product: ``num_w * num_z * num_y * num_x``.\n\n    **Memory types.**\n\n    The output ``dst`` tile must be in SBUF.\n\n    **Data types.**\n\n    The generated values are computed in 32-bit integer arithmetic. The GpSimd Engine can cast\n    these integer results to any valid NKI data type (see :ref:`nki-dtype` for more information)\n    before writing to the output tile. The output data type is determined by the ``dst`` tile's\n    data type.\n\n    **Layout.**\n\n    The partition dimension determines the number of active channels for parallel pattern generation.\n\n    **Tile size.**\n\n    The partition dimension size of ``dst`` must not exceed 128. 
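As a minimal sketch (shape and dtype are illustrative), the following writes one index value per partition, so that ``idx[p, 0] == p``:\n\n    .. code-block:: python\n\n        import nki.isa as nisa\n        import nki.language as nl\n\n        # One int32 element per partition across 128 partitions.\n        idx = nl.ndarray((128, 1), dtype=nl.int32, buffer=nl.sbuf)\n\n        # value = offset + channel_id * channel_multiplier = channel_id\n        nisa.iota(dst=idx, pattern=[[1, 1]], offset=0, channel_multiplier=1)\n\n    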
The number of\n    elements per partition of ``dst`` must not exceed the physical size of each SBUF partition.\n    The total number of elements in ``pattern`` must match the number of elements per partition in the ``dst`` tile.\n\n    :param dst: the output tile in SBUF to store the generated pattern\n    :param pattern: a list of [step, num] to describe up to 4D tensor sizes and strides\n    :param offset: an int32 offset value to be added to every generated value\n    :param channel_multiplier: an int32 multiplier to be applied to the channel (parition) ID\"\"\"\n    ...\n\n\ndef local_gather(dst, src_buffer, index, num_elem_per_idx=1, num_valid_indices=None, name=None):\n    r\"\"\"Gather SBUF data in ``src_buffer`` using ``index`` on GpSimd Engine.\n\n    Each of the eight GpSimd cores in GpSimd Engine connects to 16 contiguous SBUF partitions\n    (e.g., core[0] connected to partition[0:16]) and performs gather from the connected 16\n    SBUF partitions *independently* in parallel. The indices used for gather on each core should also\n    come from the same 16 connected SBUF partitions. If you only need to gather elements within a partition,\n    consider using :doc:`nisa.nc_n_gather <nki.isa.nc_n_gather>` instead, which supports gathering more indices.\n\n    During execution of the instruction, each GpSimd core reads a 16-partition slice from ``index``, flattens\n    all indices into a 1D array ``indices_1d`` (along the partition dimension first).\n    By default with no ``num_valid_indices`` specified, each GpSimd core\n    will treat all indices from its corresponding 16-partition ``index`` slice as valid indices.\n    However, when the number of valid indices per core\n    is not a multiple of 16, users can explicitly specify the valid index count per core in ``num_valid_indices``.\n    Note, ``num_valid_indices`` must not exceed the total element count in each 16-partition ``index`` slice\n    (i.e., ``num_valid_indices <= index.size / (index.shape[0] / 16)``).\n\n    Next, each GpSimd core uses the flattened ``indices_1d`` indices as *partition offsets* to gather from\n    the connected 16-partition slice of ``src_buffer``. Optionally, this API also allows gathering of multiple\n    contiguous elements starting at each index to improve gather throughput, as indicated by ``num_elem_per_idx``.\n    Behavior of out-of-bound index access is undefined.\n\n    Even though all eight GpSimd cores can gather with completely different indices, a common use case for\n    this API is to make all cores gather with the same set of indices (i.e., partition offsets). In this case,\n    users can generate indices into 16 partitions, replicate them eight times to 128 partitions and then feed them into\n    ``local_gather``.\n\n    As an example, if ``src_buffer`` is (128, 512) in shape and ``index`` is (128, 4) in shape, where the partition\n    dimension size is 128, ``local_gather`` effectively performs the following operation:\n\n    ``local_gather`` preserves the input data types from ``src_buffer`` in the gather output.\n    Therefore, no data type casting is allowed in this API. The indices in ``index`` tile must be uint16 types.\n\n    This API has three tile size constraints [subject to future relaxation]:\n\n    #. The partition axis size of ``src_buffer`` must match that of ``index`` and must\n       be a multiple of 16. In other words, ``src_buffer.shape[0] == index.shape[0] and src_buffer.shape[0] % 16 == 0``.\n    #. 
The number of contiguous elements to gather per index per partition ``num_elem_per_idx``\n       must be one of the following values: ``[1, 2, 4, 8, 16, 32]``.\n    #. The number of indices for gather per core must be less than or equal to 4096.\n\n    :param dst: an output tile of the gathered data\n    :param src_buffer: an input tile for gathering.\n    :param index: an input tile with indices used for gathering.\n    :param num_elem_per_idx: an optional integer value to read multiple contiguous elements per index per partition; default is 1.\n    :param num_valid_indices: an optional integer value to specify the number of valid indices per GpSimd core; default is\n                              ``index.size / (index.shape[0] / 16)``.\n\n    Click :download:`here <../../test/test_nki_isa_local_gather.py>` to download the\n    full NKI code example with equivalent numpy implementation.\"\"\"\n    ...\n\n\ndef max8(dst, src, name=None):\n    r\"\"\"Find the 8 largest values in each partition of the source tile.\n\n    This instruction reads the input elements, converts them to fp32 internally, and outputs\n    the 8 largest values in descending order for each partition. Outputs are converted to\n    ``dst.dtype`` automatically.\n\n    The source tile can be up to 5-dimensional, while the output tile is always 2-dimensional.\n    The number of elements read per partition must be between 8 and 16,384 inclusive.\n    The output will always contain exactly 8 elements per partition.\n    The source and output must have the same partition dimension size:\n\n    - source: [par_dim, ...]\n    - output: [par_dim, 8]\n\n    :param dst: a 2D tile containing the 8 largest values per partition in descending order with shape [par_dim, 8]\n    :param src: the source tile to find maximum values from\"\"\"\n    ...\n\n\ndef memset(dst, value, engine=engine.unknown, name=None):\n    r\"\"\"Initialize ``dst`` by filling it with a compile-time constant ``value``, using Vector or GpSimd Engine.\n    The memset instruction supports all valid NKI dtypes (see :ref:`nki-dtype`).\n\n    :param dst: destination tile to initialize.\n    :param value: the constant value to initialize with\n    :param engine: specify which engine to use for memset: ``nki.isa.engine.vector`` or ``nki.isa.engine.gpsimd`` ;\n                   ``nki.isa.engine.unknown`` by default, lets compiler select the best engine for the given\n                   input tile shape\n\n    .. note::\n        For x4 packed types (``float8_e4m3fn_x4``, ``float8_e5m2_x4``,\n        ``float4_e2m1fn_x4``), only ``value=0`` is supported.\"\"\"\n    ...\n\n\ndef nc_find_index8(dst, data, vals, name=None):\n    r\"\"\"Find indices of the 8 given vals in each partition of the data tensor.\n\n    This instruction first loads the 8 values,\n    then loads the data tensor and outputs the indices (starting at 0) of the first\n    occurrence of each value in the data tensor, for each partition.\n\n    The data tensor can be up to 5-dimensional, while the vals tensor must be up\n    to 3-dimensional. The data tensor must have between 8 and 16,384 elements per\n    partition. The vals tensor must have exactly 8 elements per partition.\n    The output will contain exactly 8 elements per partition and will be uint16 or\n    uint32 type. 
Default output type is uint32.\n\n    Behavior is undefined if vals tensor contains values that are not in\n    the data tensor.\n\n    If provided, a mask is applied only to the data tensor.\n\n    :param dst: a 2D tile containing indices (uint16 or uint32) of the 8 values in each partition with shape [par_dim, 8]\n    :param data: the data tensor to find indices from\n    :param vals: tensor containing the 8 values per partition whose indices will be found\"\"\"\n    ...\n\n\ndef nc_match_replace8(dst, data, vals, imm, dst_idx=None, name=None):\n    r\"\"\"Replace first occurrence of each value in ``vals`` with ``imm`` in ``data``\n    using the Vector engine and return the replaced tensor. If ``dst_idx``\n    tile is provided, the indices of the matched values are written to ``dst_idx``.\n\n    :param dst: output tile with replaced values\n    :param data: the data tensor to search and replace in\n    :param vals: tensor containing the 8 values per partition to match\n    :param imm: the immediate float value to replace matched values with\n    :param dst_idx: optional tile to store indices of matched values\"\"\"\n    ...\n\n\ndef nc_matmul(dst, stationary, moving, is_stationary_onezero=False, is_moving_onezero=False, is_transpose=False, accumulate=None, tile_position=(), tile_size=(), perf_mode=matmul_perf_mode.none, name=None):\n    r\"\"\"Compute ``dst = stationary.T @ moving`` matrix multiplication using Tensor Engine.\n\n    The figure below illustrates how to map a matrix multiplication from a mathematical definition\n    to ``nisa.nc_matmul`` on Tensor Engine. The stationary tensor is loaded into the systolic array first and\n    stays in place, while the moving tensor streams through the array during computation.\n    For more detailed discussion of Tensor Engine capabilities, see\n    `Trainium arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium_inferentia2_arch.html>`_.\n\n    .. figure:: ../../img/arch_images/matmul.png\n      :align: center\n      :width: 100%\n\n      MxKxN Matrix Multiplication Visualization.\n\n    **Performance mode.**\n\n    On NeuronCore-v2, performance mode is not supported.\n    On NeuronCore-v3 and NeuronCore-v4, Tensor Engine supports FP8 double performance mode, enabled by setting\n    performance mode to ``double_row``.\n    See `Trainium2 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`_\n    for more details.\n    ``double_row`` performance mode cannot be combined with Tensor Engine column tiling mode (details below).\n\n    **Tiling mode.**\n    NeuronCore Tensor Engine is built upon a systolic array with 128 rows and 128 columns of processing elements (PEs).\n    Tensor Engine supports both row and column tiling modes, which allow multiple ``nc_matmul`` instructions with\n    a stationary tile size smaller than [128, 128] to run in parallel to improve hardware utilization.\n    Row tiling mode slices the 128 PE rows into 2x 64 row\n    tiles (NeuronCore-v2 or newer), or 4x 32 row tiles (NeuronCore-v3 or newer). Column tiling mode slices\n    the 128 PE columns in the same fashion. The row and column tile sizes can be set independently in the\n    ``tile_size`` field as a tuple ``(row_size, column_size)``. 
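\n\n    For illustration, a minimal sketch of two half-array matmuls that the hardware can execute in parallel might look\n    as follows, using the ``tile_position`` field described below (tile names, shapes and buffers in this sketch are\n    assumptions, not a complete kernel):\n\n    .. code-block:: python\n\n        # w0, w1: [64, 128] stationary tiles in SBUF; x0, x1: [64, 512] moving tiles in SBUF\n        psum0 = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.psum)\n        psum1 = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.psum)\n\n        # Map one matmul onto the top 64 PE rows ...\n        nisa.nc_matmul(dst=psum0, stationary=w0, moving=x0, tile_position=(0, 0), tile_size=(64, 128))\n        # ... and the other onto the bottom 64 PE rows, so the two can overlap in time\n        nisa.nc_matmul(dst=psum1, stationary=w1, moving=x1, tile_position=(64, 0), tile_size=(64, 128))\n\n    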
The stationary tile size must not exceed the chosen\n    ``tile_size``.\n\n    In addition, a given ``nc_matmul`` can also pick the exact row and column tile within the 128x128 systolic\n    array, by specifying the starting row and starting column in ``tile_position`` as a\n    tuple ``(start_row, start_column)``. The ``start_row`` must be a multiple of ``row_size`` specified in ``tile_size``\n    and must not exceed 128. Similarly, the ``start_column`` must be a multiple of ``column_size`` and must not exceed 128.\n\n    For example, setting ``tile_position`` to (64, 0) and ``tile_size`` to (64, 128) means using the bottom half\n    of the systolic array.\n\n    Note, ``tile_position`` and ``tile_size`` must both be set to enable tiling mode. If they are not set,\n    the default is to use the full systolic array, which is equivalent to ``tile_position=(0, 0)``\n    and ``tile_size=(128, 128)``. The values in ``tile_position`` and ``tile_size`` tuples can be\n    integers or affine expressions.\n\n    **Accumulation mode.**\n\n    The ``accumulate`` parameter controls whether the matmul result should overwrite or accumulate on top of\n    the ``dst`` PSUM tile. When ``accumulate=False``, the result overwrites the existing content.\n    When ``accumulate=True``, the result is added to the existing content.\n    When ``accumulate=None`` (default), the behavior is auto-detected: the first write to a PSUM location\n    overwrites, and subsequent writes to the same location accumulate. Multiple ``nc_matmul`` instructions\n    with ``accumulate=True`` can form an accumulation group before the PSUM tile content is evicted back to SBUF.\n\n    **Transpose mode.**\n\n    Tensor Engine can transpose a tile in SBUF by loading it as a stationary tile and using an identity matrix\n    as the moving tile.\n    Starting NeuronCore-v3, turning on transpose mode by setting ``is_transpose=True`` enables bit-accurate\n    data transpose, which can transpose tensors with NaN/Inf values properly.\n    See `Trainium2 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`_\n    for more details.\n\n    On NeuronCore-v2, Tensor Engine does not support transpose mode natively. However, setting ``is_transpose=True``\n    ensures neuron-profile identifies this instruction as a transpose for performance metric accounting purposes.\n\n    **Memory types.**\n\n    The ``nc_matmul`` instruction *must* read inputs from SBUF and\n    write outputs to PSUM. Therefore, the ``stationary`` and ``moving`` must be SBUF tiles, and ``dst`` tile\n    must be a PSUM tile.\n\n    **Data types.**\n\n    The input ``stationary`` and ``moving`` tiles can be one of these supported data types:\n    ``float8_e4m3/float8_e5m2/bfloat16/float16/tfloat32/float32``. The ``stationary`` and ``moving`` tiles\n    can have different data types, with one exception: if one of the input tiles is ``tfloat32/float32``,\n    the other tile must also be ``tfloat32/float32``.\n    On NeuronCore-v3 and NeuronCore-v4, when performance mode is ``double_row``, ``stationary`` and ``moving`` tiles\n    must be one of ``float8_e4m3`` or ``float8_e5m2``, but the two input tiles can have different float8 formats.\n\n    The accumulation precision internal to Tensor Engine is float32.\n    The ``dst`` tile must be a float32 tile in NeuronCore-v2 and NeuronCore-v3. 
Starting NeuronCore-v4,\n    ``dst`` can either be a float32 or bfloat16 tile.\n\n    **Layout.**\n\n    If performance mode is off, the contraction dimension of the matmul must be along the partition dimension in\n    both ``stationary`` and ``moving`` tiles.\n\n    If performance mode is ``double_row``, the contraction dimension of the matmul is split between the partition dimension\n    and the first free dimension after the partition dimension in both ``stationary`` and ``moving`` tiles.\n    The first free dimension must be 2. For example, to perform a matmul of ``[1, 256]@[256, 3]=[1, 3]``, the stationary\n    tile is of shape ``[128, 2, 1]``, while the moving tile is of shape ``[128, 2, 3]``.\n\n    Regardless of performance mode, the free dimension of the ``stationary`` tile matches the partition\n    dimension of the output ``dst`` tile in size, while the free dimension of the ``moving`` tile\n    matches the free dimension of the ``dst`` tile in size.\n\n    **Tile size.**\n\n    The partition dimension sizes of the ``stationary`` and ``moving`` tiles must be identical. They must not\n    exceed 128 when tiling mode is off or ``row_size`` specified in ``tile_size`` when tiling mode is on.\n    The free dimension size of ``stationary`` must not exceed 128 when tiling mode is off or ``column_size``\n    in ``tile_size`` when tiling mode is on.\n\n    On NeuronCore-v2 and -v3, the free dimension size of ``moving`` tile must not exceed 512, matching the maximum\n    number of float32 elements per PSUM bank. Starting NeuronCore-v4, the free dimension size of ``moving`` tile\n    can go up to 4096 for float32 ``dst`` or 8192 for bfloat16 ``dst``, matching the size of 8x PSUM banks\n    (the entire PSUM).\n\n    Explicit tiling is required when the high-level matmul operation exceeds the tile size limits of ``nc_matmul``.\n\n    **Profiler view syntax.**\n\n    Each ``nc_matmul`` call lowers to two ISA instructions in the profiler: a load instruction\n    (to load the stationary operand into the Tensor Engine) followed by a multiply instruction.\n    Both instructions will appear in profiler output for a single ``nc_matmul`` call.\n\n    The multiply instruction operands are displayed in a compact ISA syntax:\n\n    .. 
code-block:: text\n\n        src=<dtype>@<address>[<strides>][<num_elem>]\n        dst=<dtype>@<address>[<strides>][<num_elem>]\n        <M>*<K> acc_flags=<flags> psum_zero=<val>\n\n    Where:\n\n    - ``<dtype>``: data type (e.g., ``bfloat16``, ``fp8e4``, ``fp8e5``)\n    - ``<address>``: hex memory address in SBUF (for src) or PSUM (for dst)\n    - ``[<strides>]``: element strides per dimension (multi-dimensional)\n    - ``[<num_elem>]``: number of elements per dimension (multi-dimensional)\n    - ``<M>*<K>``: matmul dimensions (M rows × K contraction)\n    - ``acc_flags``: accumulator control flags (e.g., ``2`` = reset accumulator)\n    - ``psum_zero``: PSUM zero-initialization control value\n\n    :param dst: the matmul output\n    :param stationary: the stationary operand\n    :param moving: the moving operand\n    :param is_stationary_onezero: hints to the compiler whether the ``stationary`` operand is a tile with ones/zeros only;\n                           setting this field explicitly could lead to 2x better performance\n                           if ``stationary`` tile is in float32; the field has no impact for non-float32 ``stationary``\n    :param is_moving_onezero: hints to the compiler whether the ``moving`` operand is a tile with ones/zeros only;\n                           setting this field explicitly could lead to 2x better performance\n                           if ``moving`` tile is in float32; the field has no impact for non-float32 ``moving``\n    :param is_transpose: controls Tensor Engine transpose mode on/off starting NeuronCore-v3\n    :param accumulate: if True, accumulate the matmul result into the existing ``dst`` PSUM tile content;\n                       if False, overwrite the existing content;\n                       if None (default), auto-detect based on whether this PSUM location was previously written.\n                       Not exposed for ``nc_transpose``.\n    :param tile_position: a 2D tuple (start_row, start_column) to control the starting row and starting column of the tile in Tensor Engine tiling mode\n    :param tile_size: a 2D tuple (row_size, column_size) to control the row and column tile sizes in Tensor Engine tiling mode\n    :param perf_mode: controls Tensor Engine FP8 double performance mode on/off starting NeuronCore-v3: ``matmul_perf_mode.none`` (default) disables double FP8 mode; ``matmul_perf_mode.double_row`` enables double FP8 mode which achieves 2x matmul throughput by packing two FP8 weight/ifmap element pairs and computing two multiplications in parallel per cycle; cannot be combined with column tiling mode. See the `Trainium2 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`__ for more information.\"\"\"\n    ...\n\n\ndef nc_matmul_mx(dst, stationary, moving, stationary_scale, moving_scale, tile_position=None, tile_size=None, accumulate=None, name=None):\n    r\"\"\"Compute matrix multiplication of MXFP8/MXFP4 quantized matrices with integrated dequantization using Tensor Engine.\n\n    .. note::\n\n      Available only on NeuronCore-v4 and newer.\n\n    The NeuronCore-v4 Tensor Engine supports matrix multiplication of MXFP8/MXFP4 quantized matrices as defined in the\n    `OCP Microscaling standard <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__.\n    This instruction performs matrix multiplication between quantized ``stationary`` and ``moving`` matrices while\n    applying dequantization scales during computation. 
The micro-scaling group size is 32 elements in groups of\n    8 partitions × 4 elements per partition of both ``stationary`` and ``moving`` tensors.\n    See `Trainium3 arch guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/about/trainium3_arch.html>`_\n    for more detailed discussion.\n\n    **Tiling Mode.**\n\n    NeuronCore Tensor Engine is built upon a systolic array with 128 rows and 128 columns of processing elements (PEs).\n    For ``nc_matmul_mx``, Tensor Engine supports only row tiling mode, which allows multiple ``nc_matmul_mx`` instructions with\n    a stationary partition dimension size smaller than 128 to run in parallel to improve hardware utilization.\n    Row tiling mode slices the 128 PE rows into 2x 64 row tiles or 4x 32 row tiles.\n\n    The row tile size can be set in the ``tile_size`` field as a tuple ``(row_size, column_size)``,\n    where ``column_size`` must be 128.\n    The stationary tile size must not exceed the chosen ``tile_size``.\n\n    A given ``nc_matmul_mx`` can pick the exact row tile within the 128x128 systolic array by specifying the starting row\n    in ``tile_position`` as a tuple ``(start_row, start_column)``, where ``start_column`` must be 0.\n    The ``start_row`` must be a multiple of ``row_size`` specified in ``tile_size`` and must not exceed 128.\n\n    For example, setting ``tile_position`` to (64, 0) and ``tile_size`` to (64, 128) means using the bottom half\n    of the systolic array.\n\n    Note, ``tile_position`` and ``tile_size`` must both be set to enable tiling mode. If they are not set,\n    the default is to use the full systolic array, which is equivalent to ``tile_position=(0, 0)``\n    and ``tile_size=(128, 128)``. The values in ``tile_position`` and ``tile_size`` tuples can be\n    integers or affine expressions.\n\n    **Memory types.**\n\n    The ``nc_matmul_mx`` instruction must read inputs from SBUF and write outputs to PSUM. Therefore, the\n    ``stationary``, ``moving``, ``stationary_scale``, and ``moving_scale`` must be SBUF tiles, and ``dst``\n    tile must be a PSUM tile.\n\n    **Data types.**\n\n    The input ``stationary`` and ``moving`` tiles must be float8_e5m2_x4, float8_e4m3fn_x4, or float4_e2m1fn_x4\n    (4-packed quantized data types). The ``stationary_scale`` and ``moving_scale`` tiles must be uint8.\n    The ``dst`` tile can be float32 or bfloat16.\n\n    **Layout.**\n\n    The contraction dimension of the matrix multiplication is along the partition dimension of ``stationary``\n    and ``moving`` tensors and also the x4 dimension within each packed data type element\n    (float8_e5m2_x4, float8_e4m3fn_x4, or float4_e2m1fn_x4).\n\n    The free dimension of the ``stationary`` tile matches the partition\n    dimension of the output ``dst`` tile in size, while the free dimension of the ``moving`` tile\n    matches the free dimension of the ``dst`` tile in size.\n\n    The scale tensors follow a special layout requirement. 
See more details in ``nisa.quantize_mx`` API doc.\n\n    **Tile size.**\n\n    - The partition dimension size of ``stationary`` and ``moving`` must be identical and be a multiple of 32,\n      not exceeding 128.\n    - The free dimension size of ``stationary`` must be even and not exceed 128.\n    - The free dimension size of ``moving`` must not exceed 512 when ``dst`` is in float32 or 1024 when ``dst`` is in bfloat16.\n    - The scale tensors have partition dimensions that depend on whether the data tensors span multiple quadrants.\n      See more details in ``nisa.quantize_mx`` API doc.\n\n    **Profiler view syntax.**\n\n    ``nc_matmul_mx`` uses the same profiler output format as :doc:`nisa.nc_matmul <nki.isa.nc_matmul>`,\n    except the source access pattern is interpreted as an MX-quantized tensor:\n    ``src=<dtype>@$MX[<data_addr>,<scale_addr>,<start_scale_partition>]@[<step_elem>][<num_elem>]``.\n\n    :param dst: the matrix multiplication output (PSUM tile)\n    :param stationary: the stationary quantized matrix (SBUF tile)\n    :param moving: the moving quantized matrix (SBUF tile)\n    :param stationary_scale: the dequantization scales for stationary matrix\n                             (SBUF tile)\n    :param moving_scale: the dequantization scales for moving matrix (SBUF tile)\n    :param tile_position: a 2D tuple (start_row, start_column) to control the\n                          starting row in Tensor Engine row tiling mode;\n                          ``start_column`` must be 0\n    :param tile_size: a 2D tuple (row_size, column_size) to control the row\n                      tile size in Tensor Engine row tiling mode; ``column_size``\n                      must be 128\n    :param accumulate: if True, accumulate the matmul result into the existing\n                       ``dst`` PSUM tile content; if False, overwrite the\n                       existing content; if None (default), auto-detect based on\n                       whether this PSUM location was previously written\"\"\"\n    ...\n\n\ndef nc_n_gather(dst, data, indices, name=None):\n    r\"\"\"Gather elements from ``data`` according to ``indices`` using GpSimd Engine.\n\n    This instruction performs a gather operation where elements are selected from the input ``data`` tile\n    based on flattened indices specified in the ``indices`` tile. The free dimensions of ``data`` are\n    treated as if they were flattened into a single dimension for indexing purposes, while the partition\n    dimension defines the parallel compute boundary.\n\n    The gather operation works independently within each partition. For each partition, the free dimensions\n    of ``data`` are conceptually flattened, and elements are gathered according to the corresponding\n    flattened indices from the same partition in ``indices``. If you need to gather elements across partitions\n    (within groups of partitions), consider using :doc:`nisa.local_gather <nki.isa.local_gather>`.
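\n\n    Roughly, the per-partition behavior can be sketched in NumPy as follows (an illustrative sketch only; the 2D\n    shapes are assumptions, and the free dimensions of ``data`` and ``indices`` may in general be multi-dimensional):\n\n    .. code-block:: python\n\n        import numpy as np\n\n        def nc_n_gather_reference(data, indices):\n            # data: [par_dim, m]; indices: [par_dim, k] with values in [0, m)\n            out = np.empty((data.shape[0], indices.shape[1]), dtype=data.dtype)\n            for p in range(data.shape[0]):          # each partition is independent\n                out[p] = data[p][indices[p]]        # gather along the flattened free dimension\n            return out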
\n\n    The ``n`` in ``nc_n_gather`` indicates that this instruction corresponds to ``n`` groups of instructions\n    in the underlying ISA, where ``n = ceil(elems_per_partition / 512)``.\n\n    Alternatively, we could gather elements by calling :doc:`nisa.dma_copy <nki.isa.dma_copy>` with an\n    indirect access pattern derived from ``indices``. However, this is less efficient than ``nc_n_gather``,\n    which uses GpSimd Engine to perform local data movement within SBUF, without using DMA engines.\n\n    **Memory types.**\n\n    All input and output tiles (``data``, ``indices``, and ``dst``) must be in SBUF.\n    GpSimd Engine cannot access PSUM (see :ref:`arch_sec_neuron_core_engines` for details).\n\n    **Data types.**\n\n    The input ``data`` tile can be any valid NKI data type (see :ref:`nki-dtype` for more information).\n    The output ``dst`` tile must have the same data type as ``data``.\n    The ``indices`` tile must be uint32.\n\n    **Layout.**\n\n    The partition dimension of ``data``, ``indices``, and ``dst`` must be the same.\n    Within each partition, the free dimensions of ``data`` are flattened for indexing.\n    The free dimensions of ``indices`` determine the shape of the output ``dst``.\n\n    **Tile size.**\n\n    The partition dimension size of ``data``, ``indices``, and ``dst`` must be the same and must not exceed 128.\n    The number of elements per partition in ``dst`` must match the number of elements per partition in ``indices``.\n    The indices' values must be within the range ``[0, data.size / data.shape[0])``.\n\n    :param dst: output tile containing the gathered elements\n    :param data: the input tile to gather elements from\n    :param indices: the indices tile (uint32) specifying which elements to gather\"\"\"\n    ...\n\n\ndef nc_stream_shuffle(dst, src, shuffle_mask, name=None):\n    r\"\"\"Apply cross-partition data movement within a quadrant of 32 partitions from source tile\n    ``src`` to destination tile ``dst`` using Vector Engine.\n\n    Both source and destination tiles can be in either SBUF or PSUM, and passed in by reference as arguments.\n    In-place shuffle is allowed, i.e., ``dst`` same as ``src``. ``shuffle_mask`` is a 32-element list. Each mask\n    element must be an int or an affine expression. ``shuffle_mask[i]`` indicates which input partition the\n    output partition [i] copies from within each 32-partition quadrant. The special value ``shuffle_mask[i]=255``\n    means the output tensor in partition [i] will be unmodified. ``nc_stream_shuffle`` can be applied to multiple\n    quadrants. When more than one quadrant is involved, the shuffle is applied to each quadrant independently,\n    and the same ``shuffle_mask`` is used for each quadrant. For more information about the cross-partition data movement,\n    see :ref:`arch_guide_cross_partition_data_movement`.\n\n    This API has 3 constraints on ``src`` and ``dst``:\n\n    #. ``dst`` must have the same data type as ``src``.\n    #. ``dst`` must have the same number of elements per partition as ``src``.\n    #. The access start partition of ``src`` (``src_start_partition``) does not have to match or be in the same quadrant\n       as that of ``dst`` (``dst_start_partition``). 
However, ``src_start_partition``/``dst_start_partition`` needs to follow\n       some special hardware rules with the number of active partitions ``num_active_partitions``.\n       ``num_active_partitions = ceil(max(src_num_partitions, dst_num_partitions)/32) * 32``, where ``src_num_partitions`` and\n       ``dst_num_partitions`` refer to the number of partitions the ``src`` and ``dst`` tensors access respectively.\n       ``src_start_partition``/``dst_start_partition`` is constrained based on the value of ``num_active_partitions``:\n\n      * If ``num_active_partitions`` is 96/128, ``src_start_partition``/``dst_start_partition`` must be 0.\n\n      * If ``num_active_partitions`` is 64, ``src_start_partition``/``dst_start_partition`` must be 0/64.\n\n      * If ``num_active_partitions`` is 32, ``src_start_partition``/``dst_start_partition`` must be 0/32/64/96.\n\n    :param dst: the destination tile\n    :param src: the source tile\n    :param shuffle_mask: a 32-element list that specifies the shuffle source and destination partition\"\"\"\n    ...\n\n\ndef nc_transpose(dst, data, engine=engine.unknown, name=None):\n    r\"\"\"Perform a 2D transpose between the partition axis and the free axis of input ``data`` using Tensor or Vector Engine.\n\n    If the ``data`` tile has more than one free axis, this API implicitly flattens all free axes into one axis\n    and then performs a 2D transpose.\n\n    2D transpose on Tensor Engine is implemented by performing a matrix multiplication between ``data`` as the\n    stationary tensor and an identity matrix as the moving tensor. This is equivalent to calling ``nisa.nc_matmul``\n    directly with ``is_transpose=True``. See :ref:`architecture guide <arch_sec_tensor_engine_alternative_use>`\n    for more information. On NeuronCore-v2, Tensor Engine transpose is not bit-accurate if the input ``data``\n    contains NaN/Inf.\n    You may consider replacing NaN/Inf with regular floats (float_max/float_min/zeros) in the input matrix.\n    Starting NeuronCore-v3, all Tensor Engine transpose is bit-accurate.\n\n    **Memory types.**\n\n    Tensor Engine ``nc_transpose`` must read the input tile from SBUF and write the transposed result to PSUM.\n    Vector Engine ``nc_transpose`` can read/write from/to either SBUF or PSUM.\n\n    **Data types.**\n\n    The input ``data`` tile can be any valid NKI data type (see :ref:`nki-dtype` for more information).\n    The output ``dst`` tile must have the same data type as that of ``data``.\n\n    **Layout.**\n    The partition dimension of ``data`` tile becomes the free dimension of the ``dst`` tile.\n    Similarly, the free dimension of the ``data`` tile becomes the partition dimension of the ``dst`` tile.\n\n    **Tile size.**\n    Tensor Engine ``nc_transpose`` can handle an input tile of shape [128, 128] or smaller, while Vector\n    Engine can handle shape [32, 32] or smaller.\n    If no ``engine`` is specified, Neuron Compiler will automatically select an engine\n    based on the input shape.\n\n    :param dst: the transpose output\n    :param data: the input tile to be transposed\n    :param engine: specify which engine to use for transpose: ``nki.isa.engine.tensor`` or ``nki.isa.engine.vector``;\n                   by default, the best engine will be selected for the given input tile shape\"\"\"\n    ...\n\n\ndef nonzero_with_count(dst, src, index_offset=0, padding_val=-1, name=None):\n    r\"\"\"Find indices of nonzero elements in an input tensor and their total count using GpSimd Engine.\n\n    .. 
note::\n\n      Available only on NeuronCore-v3 and newer.\n\n    NOTE: this instruction only operates on partitions [0, 16, 32, ..., 112] of the input tile\n    and writes to partitions [0, 16, 32, ..., 112] of the destination tile. The data in other\n    partitions of the destination tile are not modified, including the last 'extra' slot for count.\n\n    This behavior is due to the physical connectivity of GpSimd engine. Each of the eight GpSimd cores\n    connects to 16 contiguous SBUF partitions (e.g., core[0] connects to partitions[0:16]).\n    In nonzero_with_count, each GpSimd core reads from and writes to its 0-th partition only.\n\n    This instruction takes an input array and produces an output array containing the indices of all\n    nonzero elements, followed by padding values, and ending with the count of nonzero elements found.\n\n    The output tensor has one more element in the free dimension than the input tensor:\n\n    - **First N elements**: 0-indexed positions of nonzero elements, offset by ``index_offset``\n    - **Next T-N elements**: Filled with ``padding_val``\n    - **Last element**: Count ``N`` of nonzero elements found\n\n    The ``index_offset`` parameter is useful when processing arrays in tiles, allowing\n    indices to be relative to the original array position rather than the tile.\n\n    Example for one partition of the tensor:\n\n    .. code-block::\n\n        Input array (T=8): [0, 1, 1, 0, 0, 1, 0, 0]\n        index_offset = 16\n        padding_val = -1\n\n        Output (T+1=9): [17, 18, 21, -1, -1, -1, -1, -1, 3]\n\n        Where:\n\n        - 17, 18, 21 are the indices (1, 2, 5) plus offset 16\n        - -1 is the padding value for unused slots\n        - 3 is the count of nonzero elements\n\n    **Constraints**\n\n    - Supported arch versions: NeuronCore-v3+.\n    - Supported engines: GpSimd.\n    - Parameters ``src``, ``dst`` must have the same number of elements in the partition dimension.\n    - Destination tensor must have exactly 1 more element than the source tensor in the free dimension.\n    - Only accesses the 0-th partition for each GpSimd core (i.e., [0, 16, 32, ..., 112]).\n    - ``src`` must be in SBUF with dtype float32 or int32.\n    - ``dst`` must be in SBUF with dtype int32.\n    - ``index_offset`` and ``padding_val`` must be int32.\n\n    :param src: Input tensor to find nonzero indices from. Only partitions [0, 16, 32, ..., 112] are read from. Supported buffers: SBUF. Supported dtypes: float32, int32.\n    :param dst: Output tensor containing nonzero indices, padding, and count. Only partitions [0, 16, 32, ..., 112] are written to. It must have one extra element than src in the free dimension. Supported buffers: SBUF. Supported dtypes: int32.\n    :param index_offset: Offset to add to the found indices (useful for tiled processing). Supported dtypes: int32.\n    :param padding_val: Value to use for padding unused output elements. Supported dtypes: int32.\n\n    **Behavior**\n\n    .. 
code-block:: python\n\n        # Find all nonzero elements in input\n        nonzero_indices = []\n        for i in range(len(input_array)):\n            if input_array[i] != 0:\n                nonzero_indices.append(i + index_offset)\n\n        # Build output array\n        output = []\n        # Add found indices\n        for idx in nonzero_indices:\n            output.append(idx)\n        # Add padding for remaining slots\n        for _ in range(len(input_array) - len(nonzero_indices)):\n            output.append(padding_val)\n        # Add count as last element\n        output.append(len(nonzero_indices))\n\n    **Example**\n\n    .. code-block:: python\n\n        def nonzero_with_count_kernel(in_tensor):\n            in_shape = in_tensor.shape\n            assert len(in_tensor.shape) == 2, \"expected 2D tensor\"\n\n            in_tile = nl.ndarray(in_shape, dtype=in_tensor.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=in_tile, src=in_tensor)\n\n            out_tile = nl.ndarray((in_shape[0], in_shape[1] + 1), dtype=nl.int32, buffer=nl.sbuf)\n            nisa.nonzero_with_count(dst=out_tile, src=in_tile, index_offset=0, padding_val=-1)\n\n            out_tensor = nl.ndarray(out_tile.shape, dtype=out_tile.dtype, buffer=nl.hbm)\n            nisa.dma_copy(dst=out_tensor, src=out_tile)\n\n            return out_tensor\"\"\"\n    ...\n\n\ndef quantize_mx(dst, src, dst_scale, name=None):\n    r\"\"\"Quantize FP16/BF16 data to MXFP8 tensors (both data and scales) using Vector Engine.\n\n    .. note::\n\n      Available only on NeuronCore-v4 and newer.\n\n    The resulting MXFP8 tensors, ``dst`` and ``dst_scale`` are as defined in the\n    `OCP Microscaling standard <https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf>`__.\n    This instruction calculates the required scales for each group of 32 values in ``src``, divides them by the calculated scale,\n    and casts to the target MXFP8 datatype. The output layout is suitable for direct consumption by the\n    ``nisa.nc_matmul_mx`` API running on Tensor Engine.\n\n    **Memory types.**\n\n    All input ``src`` and output tiles (``dst`` and ``dst_scale``) must be in SBUF.\n\n    **Data types.**\n\n    The input ``src`` tile must be float16 or bfloat16. The output ``dst`` tile must be float8_e5m2_x4 or\n    float8_e4m3fn_x4 (4-packed FP8 data types). The ``dst_scale`` tile must be uint8.\n\n    The 4-packed data types (float8_e5m2_x4/float8_e4m3fn_x4) are 32-bit data types that pack four 8-bit\n    float8_e5m2/float8_e4m3fn values.\n\n    **Layout.**\n\n    The quantization operates on groups of 32 elements from the input ``src`` tile, where each group consists of\n    8 partitions × 4 elements per partition. 
For each 32-element group, the instruction produces:\n\n    - Quantized FP8 data in ``dst``\n    - One shared scale value in ``dst_scale`` per group\n\n    **Tile size.**\n\n    - The partition dimension size of ``src`` must be a multiple of 32 and must not exceed 128.\n    - The free dimension size of ``src`` must be a multiple of 4 and must not exceed the physical size of each SBUF\n      partition.\n    - The ``dst`` tile has the same partition dimension size as ``src`` but a free dimension size\n      that is 1/4 of ``src`` free dimension size due to the special 4-packed FP8 data types.\n\n    :param dst: the quantized MXFP8 output tile\n    :param src: the input FP16/BF16 tile to be quantized\n    :param dst_scale: the output scale tile\"\"\"\n    ...\n\n\ndef rand2(dst, min, max, name=None):\n    r\"\"\"Generate pseudo random numbers with uniform distribution using Vector Engine.\n\n    .. note::\n\n      Available only on NeuronCore-v4 and newer.\n\n    This instruction generates pseudo random numbers and stores them into SBUF/PSUM.\n    The generated values follow a uniform distribution within the specified [min, max] range.\n\n    Key features:\n\n    - Uses XORWOW PRNG algorithm for high-quality random number generation\n    - Generates FP32 random values with uniform distribution\n    - Supports output conversion to various data types\n\n    **Memory types.**\n\n    The output ``dst`` tile can be in SBUF or PSUM.\n\n    **Data types.**\n\n    The output ``dst`` tile can be any of: float8_e4m3, float8_e5m2, float16, bfloat16, float32,\n    tfloat32, int8, int16, int32, uint8, uint16, or uint32.\n\n    **Tile size.**\n\n    The partition dimension size of ``dst`` must not exceed 128. The number of\n    elements per partition of ``dst`` must not exceed the physical size of each SBUF/PSUM partition.\n\n    **Constraints.**\n\n    - Supported arch versions: NeuronCore-v4+.\n    - Supported engines: Vector.\n    - min < max for valid range.\n\n    :param dst: the destination tensor to write random values to\n    :param min: minimum value for uniform distribution range (FP32), can be a scalar or vector value\n    :param max: maximum value for uniform distribution range (FP32), can be a scalar or vector value\"\"\"\n    ...\n\n\ndef rand_get_state(dst, engine=engine.unknown, name=None):\n    r\"\"\"Store the current pseudo random number generator (PRNG) states from the engine.\n\n    This instruction stores the current PRNG states cached inside the engine to SBUF/PSUM.\n    Each partition in the output tensor holds the PRNG states for the corresponding compute lane\n    inside the engine.\n\n    **Memory types.**\n\n    The output ``dst`` tile must be in SBUF (NeuronCore-v3) or SBUF/PSUM (NeuronCore-v4+).\n\n    **Data types.**\n\n    The output ``dst`` tile must be uint32.\n\n    **Tile size.**\n\n    - dst element count for XORWOW must be 6 elements (GpSimd) or 24 elements (Vector).\n\n    **Constraints.**\n\n    - Supported arch versions: NeuronCore-v3+.\n    - Supported engines: NeuronCore-v3: GpSimd. 
NeuronCore-v4+: GpSimd, Vector.\n    - Since GpSimd Engine cannot access PSUM, ``dst`` must be in SBUF when using GpSimd Engine.\n\n    :param dst: the destination tensor to store PRNG state values; must be a 2D uint32 tensor\n    :param engine: specify which engine to use: ``nki.isa.engine.vector``, ``nki.isa.engine.gpsimd``,\n                   or ``nki.isa.engine.unknown`` (default, the best engine will be selected)\"\"\"\n    ...\n\n\ndef rand_set_state(src_seeds, engine=engine.unknown, name=None):\n    r\"\"\"Seed the pseudo random number generator (PRNG) inside the engine.\n\n    This instruction initializes the PRNG state for future random number generation operations.\n    Each partition in the source tensor seeds the PRNG states for the corresponding compute lane\n    inside the engine.\n\n    The PRNG state is cached inside the engine as a persistent state during the rest of NEFF\n    execution. However, the state cannot survive TPB resets or Runtime reload.\n\n    **Memory types.**\n\n    The input ``src_seeds`` tile must be in SBUF.\n\n    **Data types.**\n\n    The input ``src_seeds`` tile must be uint32.\n\n    **Tile size.**\n\n    - src_seeds element count for XORWOW must be 6 elements (GpSimd) or 24 elements (Vector).\n\n    **Constraints.**\n\n    - Supported arch versions: NeuronCore-v3+.\n    - Supported engines: NeuronCore-v3: GpSimd. NeuronCore-v4+: GpSimd, Vector.\n    - ``src_seeds`` must be in SBUF.\n\n    :param src_seeds: the source tensor containing seed values for the PRNG; must be a 2D uint32 tensor\n                      with the partition dimension representing the compute lanes and the free dimension\n                      containing the seed values\n    :param engine: specify which engine to use: ``nki.isa.engine.vector``, ``nki.isa.engine.gpsimd``,\n                   or ``nki.isa.engine.unknown`` (default, the best engine will be selected)\"\"\"\n    ...\n\n\ndef range_select(dst, on_true_tile, comp_op0, comp_op1, bound0, bound1, reduce_cmd=reduce_cmd.reset_reduce, reduce_res=None, reduce_op=nl.maximum, range_start=0, on_false_value=-3.4028235e+38, name=None):\n    r\"\"\"Select elements from ``on_true_tile`` based on comparison with bounds using Vector Engine.\n\n    .. note::\n\n      Available only on NeuronCore-v3 and newer.\n\n    For each element in ``on_true_tile``, compares its free dimension index + ``range_start`` against ``bound0`` and ``bound1``\n    using the specified comparison operators (``comp_op0`` and ``comp_op1``). If both comparisons\n    evaluate to True, copies the element to the output; otherwise uses  ``on_false_value``.\n\n    Additionally performs a reduction operation specified by ``reduce_op`` on the results,\n    storing the reduction result in ``reduce_res``.\n\n    **Note on numerical stability:**\n\n    In self-attention, we often have this instruction sequence: ``range_select`` (VectorE) -> ``reduce_res`` -> ``activation`` (ScalarE).\n    When ``range_select`` outputs a full row of ``fill_value``, caution is needed to avoid NaN in the\n    activation instruction that subtracts the output of ``range_select`` by ``reduce_res`` (max value):\n\n    - If ``dst.dtype`` and ``reduce_res.dtype`` are both FP32, we should not hit any NaN issue\n      since ``FP32_MIN - FP32_MIN = 0``. 
Exponentiation on 0 is stable (1.0 exactly).\n\n    - If ``dst.dtype`` is FP16/BF16/FP8, the ``on_false_value`` fill in the output tile will become ``-INF``\n      since HW performs a downcast from FP32_MIN to a smaller dtype.\n      In this case, you must make sure ``reduce_res.dtype`` is FP32 to avoid NaN in ``activation``.\n      NaN can be avoided because ``activation`` always upcasts input tiles to FP32 to perform math operations: ``-INF - FP32_MIN = -INF``.\n      Exponentiation on ``-INF`` is stable (0.0 exactly).\n\n    **Constraints:**\n\n    The comparison operators must be one of:\n\n    - nl.equal\n    - nl.less\n    - nl.less_equal\n    - nl.greater\n    - nl.greater_equal\n\n    Partition dim sizes must match across ``on_true_tile``, ``bound0``, and ``bound1``:\n\n    - ``bound0`` and ``bound1`` must have one element per partition\n    - ``on_true_tile`` must be one of the FP dtypes, and ``bound0/bound1`` must be FP32 types.\n\n    The comparison with ``bound0``, ``bound1``, and free dimension index is done in FP32.\n    Make sure ``range_start`` + free dimension index is within 2^24 range.\n\n    **Numpy equivalent:**\n\n    .. code-block:: python\n\n        indices = np.zeros_like(on_true_tile, dtype=np.float32)\n        indices[:] = range_start + np.arange(on_true_tile[0].size)\n\n        mask = comp_op0(indices, bound0) & comp_op1(indices, bound1)\n        select_out_tile = np.where(mask, on_true_tile, on_false_value)\n        reduce_tile = reduce_op(select_out_tile, axis=1, keepdims=True)\n\n    :param dst: output tile with selected elements\n    :param on_true_tile: input tile containing elements to select from\n    :param on_false_value: constant value to use when selection condition is False.\n      Due to hardware constraints, this must be ``FP32_MIN`` (``-3.4028235e+38``).\n      See the numerical stability note above for guidance on output dtype selection.\n    :param comp_op0: first comparison operator\n    :param comp_op1: second comparison operator\n    :param bound0: tile with one element per partition for first comparison\n    :param bound1: tile with one element per partition for second comparison\n    :param reduce_op: reduction operator to apply across the selected output. Currently only ``nl.maximum`` is supported.\n    :param reduce_cmd: controls the state of the Vector Engine accumulator registers.\n      Defaults to ``reduce_cmd.reset_reduce``. See :ref:`nki-reduce-cmd` for supported values.\n    :param reduce_res: optional tile to store reduction results.\n    :param range_start: starting base offset for index array for the free dimension of ``on_true_tile``.\n        Defaults to 0, and must be a compile-time integer.\"\"\"\n    ...\n\n\ndef reciprocal(dst, data, name=None):\n    r\"\"\"Compute element-wise reciprocal (1.0/x) of the input ``data`` tile using Vector Engine.\n\n    **Memory types.**\n\n    Both the input ``data`` and output ``dst`` tiles can be in SBUF or PSUM.\n\n    **Data types.**\n\n    The input ``data`` tile can be any valid NKI data type (see :ref:`nki-dtype` for more information).\n    The Vector Engine automatically casts the input data type to float32 and performs the reciprocal\n    computation in float32 math. The float32 results are cast to the data type of ``dst``.\n\n    **Layout.**\n\n    The partition dimension of the input ``data`` is considered the parallel compute dimension.\n\n    **Tile size.**\n\n    The partition dimension size of input ``data`` and output ``dst`` tiles must be the same\n    and must not exceed 128. 
The number of elements per partition of ``dst`` must match\n    that of ``data`` and must not exceed the physical size of each SBUF partition.\n\n    :param dst: the output tile\n    :param data: the input tile\"\"\"\n    ...\n\n\ndef register_alloc(x=None):\n    r\"\"\"Allocate a virtual register and optionally initialize it with a value.\n\n    Each engine sequencer (Tensor/Scalar/Vector/GpSimd/Sync Engine) within a NeuronCore maintains its own set of\n    physical registers for scalar operations (64x 32-bit registers per engine sequencer in NeuronCore v2-v4).\n    This API conceptually allocates a register within a virtual register space.\n    Users do not need to explicitly free a register through nisa APIs. The NKI compiler\n    handles physical register allocation (and deallocation) across the appropriate engine sequencers\n    based on the dynamic program flow.\n\n    NKI provides the following APIs to manipulate allocated registers:\n\n    - ``nisa.register_move``: Move a constant integer or another register's value into a register\n    - ``nisa.register_load``: Load a scalar (32-bit) value from HBM/SBUF into a register\n    - ``nisa.register_store``: Store register contents to HBM/SBUF\n\n    In the current NKI release, these registers are primarily used to specify dynamic loop boundaries and\n    while loop conditions. The NKI compiler compiles such dynamic looping constructs to branching instructions\n    executed by engine sequencers. For additional details, see ``nl.dynamic_range``. For more information\n    on engine sequencer and its capabilities, see\n    `Trainium/Inferentia2 architecture guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium_inferentia2_arch.html>`_.\n\n    :param x: optional initialization value. Can be one of:\n\n              - ``None`` (default): allocate an uninitialized register\n              - ``int``: allocate a register initialized with this immediate integer value\n\n    Example:\n\n    Two ways to allocate a register initialized to zero:\n\n    .. code-block:: python\n\n        # Approach 1: Using an immediate value\n        reg1 = nisa.register_alloc(0)\n\n        # Approach 2: Two-step with register_load\n        zero_tensor = nl.zeros([1, 1], dtype=nl.int32, buffer=nl.sbuf)\n        reg2 = nisa.register_alloc(None)\n        nisa.register_load(reg2, zero_tensor)\"\"\"\n    ...\n\n\ndef register_load(dst, src):\n    r\"\"\"Load a scalar value from memory (HBM or SBUF) into a virtual register.\n\n    This instruction reads a single scalar value (up to 32-bit) from a memory location (HBM or SBUF)\n    and stores it in the specified virtual register. The source must be a NKI tensor with exactly\n    one element (shape [1] or [1, 1]). This enables dynamic loading of values computed at\n    runtime into registers for use in control flow operations.\n\n    The virtual register system allows the NKI compiler to allocate physical registers across\n    different engine sequencers as needed. See ``nisa.register_alloc`` for more details on\n    virtual register allocation.\n\n    :param dst: the destination virtual register (allocated via ``nisa.register_alloc``)\n    :param src: the source tensor containing a single scalar value to load\n\n    Example:\n\n    .. 
code-block:: python\n\n        # Load a computed value into a register\n        computed_bound = nl.ones([1], dtype=nl.int32, buffer=nl.sbuf)  # bound of 1 in SBUF\n        loop_reg = nisa.register_alloc()\n        nisa.register_load(loop_reg, computed_bound)\"\"\"\n    ...\n\n\ndef register_move(dst, src):\n    r\"\"\"Move a value into a virtual register.\n\n    This instruction loads a value into the specified virtual register. The source can be\n    either a compile-time constant integer or another virtual register.\n\n    The virtual register system allows the NKI compiler to allocate physical registers across\n    different engine sequencers as needed. See ``nisa.register_alloc`` for more details on\n    virtual register allocation.\n\n    This instruction operates on virtual registers only and does not access SBUF, PSUM, or HBM.\n\n    :param dst: the destination virtual register (allocated via ``nisa.register_alloc``)\n    :param src: source value - either a compile-time constant integer or a VirtualRegister\n\n    Example:\n\n    .. code-block:: python\n\n        # Allocate a register and initialize it with a constant\n        loop_count = nisa.register_alloc()\n        nisa.register_move(loop_count, 10)  # Set register to 10\n\n        # Copy from another register\n        reg2 = nisa.register_alloc()\n        nisa.register_move(reg2, loop_count)  # Copy value from loop_count\"\"\"\n    ...\n\n\ndef register_store(dst, src):\n    r\"\"\"Store the value from a virtual register into memory (HBM/SBUF).\n\n    This instruction writes the scalar value (up to 32-bit) stored in a virtual register to a memory location\n    (HBM or SBUF). The destination must be a tensor with exactly one element (shape [1] or [1, 1]).\n    This enables saving register values back to memory for later use or for output purposes.\n\n    The virtual register system allows the NKI compiler to allocate physical registers across\n    different engine sequencers as needed. See ``nisa.register_alloc`` for more details on\n    virtual register allocation.\n\n    :param dst: the destination tensor with a single element to store the register value\n    :param src: the source virtual register (allocated via ``nisa.register_alloc``)\n\n    Example:\n\n    .. code-block:: python\n\n        # Store a register value back to memory\n        counter_reg = nisa.register_alloc(0)\n        # ... perform operations that modify counter_reg ...\n        result_tensor = nl.ndarray([1], dtype=nl.int32, buffer=nl.sbuf)\n        nisa.register_store(result_tensor, counter_reg)\"\"\"\n    ...\n\n\ndef rng(dst, engine=engine.unknown, name=None):\n    r\"\"\"Generate pseudo random numbers using the Vector or GpSimd Engine.\n\n    This instruction generates 32 random bits per element and writes them to the\n    destination tensor. Depending on the size of the dtype, the instruction truncates\n    each 32-bit random value to the specified data type, taking the least significant bits.\n\n    Example use case:\n    To generate random FP32 numbers between 0.0 and 1.0, follow the Rng instruction\n    with a normalization instruction (e.g., write 16 random bits as UINT16, then\n    divide by (2^16-1) to get a random FP32 number between 0.0 and 1.0).\n\n    **Memory types.**\n\n    The output ``dst`` tile can be in SBUF or PSUM.\n\n    **Data types.**\n\n    The output ``dst`` tile must be an integer type: int8, int16, int32, uint8, uint16, or uint32.\n\n    **Tile size.**\n\n    The partition dimension size of ``dst`` must not exceed 128. 
The number of\n    elements per partition of ``dst`` must not exceed the physical size of each SBUF/PSUM partition.\n\n    **Constraints.**\n\n    - Supported arch versions: NeuronCore-v2+.\n    - Supported engines: NeuronCore-v2: Vector. NeuronCore-v3+: GpSimd, Vector.\n    - Since GpSimd Engine cannot access PSUM, ``dst`` must be in SBUF when using GpSimd Engine.\n\n    :param dst: the destination tensor to write random values to\n    :param engine: specify which engine to use: ``nki.isa.engine.vector``, ``nki.isa.engine.gpsimd``,\n                   or ``nki.isa.engine.unknown`` (default, the best engine will be selected)\"\"\"\n    ...\n\n\nscalar_engine = engine.scalar\n\"\"\"Scalar Engine\"\"\"\n\n\ndef scalar_tensor_tensor(dst, data, op0, operand0, op1, operand1, reverse0=False, reverse1=False, name=None):\n    r\"\"\"Apply two math operators in sequence using Vector Engine: ``(data <op0> operand0) <op1> operand1``.\n\n    This instruction is equivalent to running two operations back-to-back:\n    1. ``temp_result = tensor_scalar(data, op0, operand0)`` - broadcast ``operand0`` and apply ``op0``\n    2. ``dst = tensor_tensor(temp_result, op1, operand1)`` - element-wise operation with ``operand1``\n\n    The ``operand0`` can be either a compile-time\n    constant scalar for broadcast across all elements of ``data`` or\n    a tile of shape ``(data.shape[0], 1)`` for broadcast along the free dimension.\n    The ``operand1`` tile must have the same shape as ``data`` for element-wise operation.\n\n    The scalar broadcasting in the first operation is performed at no additional performance cost,\n    making this instruction have approximately the same latency as a regular ``tensor_tensor`` instruction.\n\n    Both ``op0`` and ``op1`` must be arithmetic operators (see :ref:`nki-aluop` for supported operators).\n    Bitvec operators are not supported. When the operators are non-commutative (e.g., subtract),\n    operand ordering can be reversed using ``reverse0`` and ``reverse1`` flags.\n\n    **Memory types.**\n\n    The input ``data`` tile can be an SBUF or PSUM tile. The ``operand0`` can be an SBUF or PSUM tile\n    or a compile-time constant scalar. The ``operand1`` must be an SBUF or PSUM tile.\n    However, ``data`` and ``operand1`` cannot both reside in PSUM. The output ``dst`` tile can be\n    written to either SBUF or PSUM.\n\n    **Data types.**\n\n    All input tiles can be any supported NKI data type (see :ref:`nki-dtype` for more information).\n    The Vector Engine automatically casts input data types to float32 and performs all computations\n    in float32 math. The float32 results are cast to the data type of output ``dst``.\n\n    **Layout.**\n\n    The parallel computation dimension of ``nisa.scalar_tensor_tensor`` is along the partition dimension.\n\n    **Tile size.**\n\n    The partition dimension size of input ``data``, ``operand1``, and output ``dst`` tiles must be\n    the same and must not exceed 128. 
The total number of elements per partition of input ``data``, ``operand1``,\n    and output ``dst`` tiles must be the same and must not exceed the\n    physical size of each SBUF partition.\n    If ``operand0`` is not a scalar, the partition dimension size of ``operand0`` must be the same as that of ``data``\n    and the number of elements per partition of ``operand0`` must be 1.\n\n    :param dst: the output tile\n    :param data: the input tile\n    :param op0: the first math operator used with operand0 (see :ref:`nki-aluop` for supported operators)\n    :param operand0: a scalar constant or a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile\n    :param reverse0: reverse ordering of inputs to ``op0``; if false, ``operand0`` is the rhs of ``op0``;\n                     if true, ``operand0`` is the lhs of ``op0``\n    :param op1: the second math operator used with operand1 (see :ref:`nki-aluop` for supported operators)\n    :param operand1: a tile with the same size as ``data`` for element-wise operation\n    :param reverse1: reverse ordering of inputs to ``op1``; if false, ``operand1`` is the rhs of ``op1``;\n                     if true, ``operand1`` is the lhs of ``op1``
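\n\n    Example:\n\n    A minimal sketch of fusing a subtract and a multiply in a single instruction, e.g. ``(x - mean) * inv_std``\n    (tile names, shapes and dtypes in this sketch are assumptions, not a complete kernel):\n\n    .. code-block:: python\n\n        # x: [128, 512] tile in SBUF; mean: [128, 1] per-partition values; inv_std: [128, 512] tile in SBUF\n        out = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n        nisa.scalar_tensor_tensor(dst=out, data=x, op0=nl.subtract, operand0=mean,\n                                  op1=nl.multiply, operand1=inv_std)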
\"\"\"\n    ...\n\n\ndef select_reduce(dst, predicate, on_true, on_false, reduce_res=None, reduce_cmd=reduce_cmd.idle, reduce_op=nl.maximum, reverse_pred=False, name=None):\n    r\"\"\"Selectively copy elements from either ``on_true`` or ``on_false`` to the destination tile\n    based on a ``predicate`` using Vector Engine, with optional reduction (max).\n\n    The operation can be expressed in NumPy as:\n\n    .. code-block:: python\n\n        # Select:\n        predicate = ~predicate if reverse_pred else predicate\n        result = np.where(predicate, on_true, on_false)\n\n        # With Reduce:\n        reduction_result = np.max(result, axis=1, keepdims=True)\n\n    **Memory constraints:**\n\n    - Both ``on_true`` and ``predicate`` are permitted to be in SBUF\n    - Either ``on_true`` or ``predicate`` may be in PSUM, but not both simultaneously\n    - The destination ``dst`` can be in either SBUF or PSUM\n\n    **Shape and data type constraints:**\n\n    - ``on_true``, ``dst``, and ``predicate`` must have identical shapes (same number of partitions and elements per partition)\n    - ``on_true`` can be any supported dtype except ``tfloat32``, ``int32``, ``uint32``\n    - ``on_false`` dtype must be ``float32`` if ``on_false`` is a scalar.\n    - ``on_false`` must be either a scalar or a vector of shape ``(on_true.shape[0], 1)``\n    - ``predicate`` dtype can be any supported integer type ``int8``, ``uint8``, ``int16``, ``uint16``\n    - ``reduce_res`` must be a vector of shape ``(on_true.shape[0], 1)``\n    - ``reduce_res`` dtype must be of float type\n    - ``reduce_op`` only supports ``max``\n\n    **Behavior:**\n\n    - Where predicate is True: The corresponding elements from ``on_true`` are copied to ``dst``\n    - Where predicate is False: The corresponding elements from ``on_false`` are copied to ``dst``\n    - When reduction is enabled, the max value from each partition of the ``result`` is computed and stored in ``reduce_res``\n\n    **Accumulator behavior:**\n\n    The Vector Engine maintains internal accumulator registers that can be controlled via the ``reduce_cmd`` parameter:\n\n    - ``nisa.reduce_cmd.reset_reduce``: Reset accumulators to -inf, then accumulate the current results\n    - ``nisa.reduce_cmd.reduce``: Continue accumulating without resetting (useful for multi-step reductions)\n    - ``nisa.reduce_cmd.idle``: No accumulation performed (default)\n\n    .. note::\n      Even when ``reduce_cmd`` is set to ``idle``, the accumulator state may still be modified.\n      Always use ``reset_reduce`` after any operations that ran with ``idle`` mode to ensure\n      consistent behavior.\n\n    .. note::\n      The accumulator registers are shared with other Vector Engine accumulation instructions such as :doc:`nki.isa.range_select <nki.isa.range_select>`\n\n    :param dst: The destination tile to write the selected values to\n    :param predicate: Tile that determines which value to select (on_true or on_false)\n    :param on_true: Tile to select from when predicate is True\n    :param on_false: Value to use when predicate is False, can be a scalar value or a vector tile of ``(on_true.shape[0], 1)``\n    :param reduce_res: (optional) Tile to store reduction results, must have shape ``(on_true.shape[0], 1)``\n    :param reduce_cmd: (optional) Control accumulator behavior using ``nisa.reduce_cmd`` values, defaults to idle\n    :param reduce_op: (optional) Reduction operator to apply (only ``nl.maximum`` is supported)\n    :param reverse_pred: (optional) Reverse the meaning of the predicate condition, defaults to False\"\"\"\n    ...\n\n\ndef sendrecv(src, dst, send_to_rank, recv_from_rank, pipe_id, dma_engine=dma_engine.dma, name=None):\n    r\"\"\"Perform point-to-point communication between NeuronCores by sending and receiving data\n    simultaneously using DMA engines.\n\n    .. note::\n      Available only on NeuronCore-v3 or newer.\n\n    This instruction enables bidirectional data exchange between two NeuronCores within a\n    Logical NeuronCore (LNC) configuration.\n    The current NeuronCore sends its ``src`` tile to the ``dst`` location of the target\n    NeuronCore specified by ``send_to_rank``,\n    while simultaneously receiving data from ``recv_from_rank`` into its own ``dst`` tile.\n\n    The typical use case is when NeuronCores need to exchange data for distributed computation patterns,\n    such as all-gather communication or other collective operations where cores need to\n    coordinate their computations by exchanging tiles.\n\n    This instruction is only allowed in NeuronCore-v3 or newer when\n    `LNC (Logical NeuronCore) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/logical-neuroncore-config.html>`_\n    is enabled. The communication occurs between NeuronCores that share the same HBM stack within the LNC configuration.\n    Therefore, ``send_to_rank`` and ``recv_from_rank`` must be either 0 or 1.\n\n    The ``pipe_id`` parameter provides synchronization control by grouping sendrecv operations. Operations with the same\n    ``pipe_id`` form a logical group where all operations in the group must complete before any can proceed. Operations\n    with different ``pipe_id`` values can progress independently without blocking each other.\n\n    The ``dma_engine`` parameter specifies which DMA transfer mechanism to use:\n\n    - ``nisa.dma_engine.dma`` (default): Uses the standard DMA engine with CoreBarrier synchronization.\n      Can be triggered from any engine.\n    - ``nisa.dma_engine.gpsimd_dma``: Uses the GPSIMD's internal DMA engine for low-latency\n      SB-to-SB swaps in LNC=2. Implies GPSIMD as the trigger engine. 
This mode has restrictions:\n      the partition dimension size of ``src``/``dst`` must be a multiple of 16, and the data size\n      per partition must not exceed 1024 bytes for 32-bit types, 512 bytes for 16-bit types,\n      or 256 bytes for 8-bit types.\n\n    **Memory types.**\n\n    Both ``src`` and ``dst`` tiles must be in SBUF.\n\n    **Data types.**\n\n    ``src`` and ``dst`` must have the same data type, but they can be any supported data types in NKI.\n\n    **Layout.**\n\n    ``src`` and ``dst`` must have the same shape and layout.\n\n    **Tile size.**\n\n    ``src`` and ``dst`` must have the same partition dimension size and the same number of elements per partition.\n\n    :param src: the source tile on the current NeuronCore to be sent to the target NeuronCore\n    :param dst: the destination tile on the current NeuronCore where received data will be stored\n    :param send_to_rank: rank ID of the target NeuronCore to send data to\n    :param recv_from_rank: rank ID of the source NeuronCore to receive data from\n    :param pipe_id: synchronization identifier that groups sendrecv operations; operations with the same pipe_id are synchronized\n    :param dma_engine: the DMA transfer mode; defaults to ``nisa.dma_engine.dma``\n\n    Example:\n\n    .. code-block:: python\n\n        # Exchange data between two cores in a ring pattern\n        num_cores = 2\n        current_rank = nl.program_id()\n        next_rank = (current_rank + 1) % num_cores\n        prev_rank = (current_rank - 1) % num_cores\n\n        # Data to send and buffer to receive\n        send_data = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)\n        recv_buffer = nl.ndarray((batch_size, hidden_dim), dtype=nl.float32, buffer=nl.sbuf)\n\n        # Perform bidirectional exchange\n        sendrecv(\n            src=send_data,\n            dst=recv_buffer,\n            send_to_rank=next_rank,\n            recv_from_rank=prev_rank,\n            pipe_id=0\n        )\n\n        # Now recv_buffer contains data from the previous core\"\"\"\n    ...\n\n\ndef sequence_bounds(dst, segment_ids, name=None):\n    r\"\"\"Compute the sequence bounds for a given set of segment IDs using GpSIMD Engine.\n\n    Given a tile of segment IDs, this function identifies where each segment begins and ends.\n    For each element, it returns a pair of values: [start_index, end_index] indicating\n    the boundaries of the segment that element belongs to. All segment IDs must be non-negative\n    integers. Padding elements (with segment ID of zero) receive special boundary\n    values: a start index of n and an end index of (-1), where n is the length\n    of ``segment_ids``.\n\n    The output tile contains two values per input element: the start index (first column)\n    and end index (second column) of each segment. The partition dimension must always be 1.\n    For example, with input shape (1, 512), the output shape becomes (1, 2, 512), where\n    the additional dimension holds the start and end indices for each element.\n\n    Both the input tile (``segment_ids``) and output tile (``dst``) must have data type ``nl.float32`` or ``nl.int32``.\n\n    **NumPy equivalent:**\n\n    :param dst: tile containing the sequence bounds.\n    :param segment_ids: tile containing the segment IDs. 
Elements with ID=0 are treated as padding.\"\"\"\n    ...\n\n\ndef set_rng_seed(src_seeds, name=None):\n    r\"\"\"Seed the pseudo random number generator (PRNG) inside the Vector Engine.\n\n    The PRNG state is cached inside the engine as a persistent state during the rest of NEFF\n    execution. However, the state cannot survive TPB resets or Runtime reload.\n\n    Using the same seed will generate the same sequence of random numbers when used\n    together with the ``nisa.rng()`` on the Vector Engine.\n\n    **Memory types.**\n\n    The input ``src_seeds`` must be in SBUF or PSUM.\n\n    **Data types.**\n\n    The input ``src_seeds`` must be a 32-bit value.\n\n    **Tile size.**\n\n    The input ``src_seeds`` must be a [1,1] tensor.\n\n    :param src_seeds: a [1,1] tensor on SBUF or PSUM with a 32-bit value to be used as the seed\"\"\"\n    ...\n\n\ndef tensor_copy(dst, src, engine=engine.unknown, name=None):\n    r\"\"\"Create a copy of ``src`` tile within NeuronCore on-chip SRAMs using Vector, Scalar or GpSimd Engine.\n\n    The output tile has the same partition axis size and also the same number of elements per partition\n    as the input tile ``src``.\n\n    All three compute engines, Vector, Scalar and GpSimd Engine can perform tensor copy. However, their copy behavior\n    is slightly different across engines:\n\n    - Scalar Engine on NeuronCore-v2 performs copy by first casting the input tile to FP32 internally and then casting from\n      FP32 to ``dst.dtype``. Users should be cautious with assigning this instruction to Scalar Engine when the input data\n      type cannot be precisely cast to FP32 (e.g., INT32).\n    - Both GpSimd and Vector Engine can operate in two modes: (1) bit-accurate copy when input and output data types are\n      the same or (2) intermediate FP32 cast when input and output data types differ, similar to Scalar Engine.\n\n    In addition, since GpSimd Engine cannot access PSUM in NeuronCore, Scalar or Vector Engine must be chosen when the input or\n    output tile is in PSUM (see :ref:`arch_sec_neuron_core_engines` for details). By default, this API returns\n    a tile in SBUF, unless the returned value is assigned to a pre-declared PSUM tile.\n\n    On NeuronCore v2, ``tensor_copy`` is not supported on the Scalar Engine. Instead, use :doc:`nisa.activation <nki.isa.activation>` with ``op=nl.copy``.\n\n    :param dst: a tile with the same content and partition axis size as the ``src`` tile.\n    :param src: the source of copy, must be a tile in SBUF or PSUM.\n    :param engine: (optional) the engine to use for the operation: `nki.isa.engine.vector`, `nki.isa.engine.scalar`,\n                  `nki.isa.engine.gpsimd` or `nki.isa.engine.unknown` (default, compiler selects best engine based on engine workload).\"\"\"\n    ...\n\n\ndef tensor_copy_predicated(dst, src, predicate, reverse_pred=False, name=None):\n    r\"\"\"Conditionally copy elements from the ``src`` tile to the destination tile on SBUF / PSUM\n    based on a ``predicate`` using Vector Engine.\n\n    This instruction provides low-level control over conditional data movement on NeuronCores,\n    optimized for scenarios where only selective copying of elements is needed. Either ``src`` or\n    ``predicate`` may be in PSUM, but not both simultaneously. Both ``src`` and ``predicate`` are permitted to be in SBUF.\n\n    Shape and data type constraints:\n\n    1. 
``src`` (if it is a tensor), ``dst``, and ``predicate`` must occupy the same number of partitions and same number of elements per partition.\n    2. ``predicate`` must be of type ``uint8``, ``uint16``, or ``uint32``.\n    3. ``src`` and ``dst`` must share the same data type.\n\n    **Behavior:**\n\n    - Where predicate is True: The corresponding elements from ``src`` are copied to the ``dst`` tile. If ``src`` is a scalar, the scalar is copied to the ``dst`` tile.\n    - Where predicate is False: The corresponding values in the ``dst`` tile are left unmodified.\n\n    :param src: The source tile or number to copy elements from when ``predicate`` is True\n    :param dst: The destination tile to copy elements to\n    :param predicate: A tile that determines which elements to copy\n    :param reverse_pred: A boolean that reverses the effect of ``predicate``.\"\"\"\n    ...\n\n\ntensor_engine = engine.tensor\n\"\"\"Tensor Engine\"\"\"\n\n\ndef tensor_partition_reduce(dst, op, data, name=None):\n    r\"\"\"Apply a reduction operation across partitions of an input ``data`` tile using GpSimd Engine.\n\n    :param dst: output tile with reduced result\n    :param op: the reduction operator (add, max, bitwise_or, bitwise_and)\n    :param data: the input tile to be reduced\"\"\"\n    ...\n\n\ndef tensor_reduce(dst, op, data, axis, negate=False, keepdims=False, name=None):\n    r\"\"\"Apply a reduction operation to the free axes of an input ``data`` tile using Vector Engine.\n\n    The reduction operator is specified in the ``op`` input field\n    (see :ref:`nki-aluop` for a list of supported reduction operators).\n    ``nisa.tensor_reduce`` supports two types of reduction operators: 1) bitvec operators (e.g., bitwise_and, bitwise_or)\n    and 2) arithmetic operators (e.g., add, subtract, multiply).\n\n    The reduction axes are specified in the ``axis`` field as an int or list of ints indicating\n    which dimensions to reduce. The reduction axes must be the last contiguous free dimension(s)\n    of the tile, ending at the final dimension. Axis 0 (partition axis) cannot be reduced.\n\n    For example, given a 4D tile ``(P, D1, D2, D3)``:\n\n    - ``axis=(3,)`` reduces only ``D3``\n    - ``axis=(2, 3)`` reduces ``D2`` and ``D3``\n    - ``axis=(1, 2, 3)`` reduces ``D1``, ``D2``, and ``D3``\n\n    When the reduction ``op`` is an arithmetic operator, the instruction can also multiply the output reduction\n    results by ``-1.0`` before writing into the output tile, at no additional performance cost. This behavior is\n    controlled by the ``negate`` input field.\n\n    **Memory types.**\n\n    Both the input ``data`` and ``dst`` tiles can be in SBUF or PSUM.\n\n    **Data types.**\n\n    For bitvec operators, the input/output data types must be integer types and Vector Engine treats\n    all input elements as bit patterns without any data type casting. For arithmetic operators,\n    the input/output data types can be any supported NKI data types, but the engine automatically casts\n    input data types to float32\n    and performs the reduction operation in float32 math. The float32 reduction results are cast to the\n    data type of ``dst``.\n\n    **Layout.**\n\n    ``nisa.tensor_reduce`` only supports free axes reduction. Therefore, the partition dimension of the input\n    ``data`` is considered the parallel compute dimension. To perform a partition axis reduction, we can either:\n\n    1. 
invoke a ``nisa.nc_transpose`` instruction on the input tile and then this ``nisa.tensor_reduce``\n       on the transposed tile, or\n    2. invoke ``nisa.nc_matmul`` instructions to multiply a ``nl.ones([128, 1], dtype=data.dtype)`` tile as a stationary\n       tensor with the input tile as a moving tensor. See more discussion on Tensor Engine alternative usage in\n       `Trainium architecture guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/arch/trainium2_arch.html>`_.\n\n    **Tile size.**\n\n    The partition dimension size of input ``data`` and output ``dst`` tiles must be the same and must not exceed 128.\n    The number of elements per partition of ``data`` must not\n    exceed the physical size of each SBUF partition. The number of elements per partition in ``dst`` must be consistent\n    with the ``axis`` field. For example, if ``axis`` indicates all free dimensions of ``data`` are reduced,\n    the number of elements per partition in ``dst`` must be 1.\n\n    :param dst: output tile of the reduction result\n    :param op: the reduction operator (see :ref:`nki-aluop` for supported reduction operators)\n    :param data: the input tile to be reduced\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to reduce;\n                 must be the last contiguous free dimension(s) ending at the final dim.\n                 For example, for a 4D tile ``(P, D1, D2, D3)``: valid values are\n                 ``(3,)``, ``(2, 3)``, or ``(1, 2, 3)``. Axis 0 (partition dim) cannot be reduced.\n    :param negate: if True, reduction result is multiplied by ``-1.0``;\n                   only applicable when op is an arithmetic operator\n    :param keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.\n                     With this option, the result will broadcast correctly against the input array.\"\"\"\n    ...\n\n\ndef tensor_scalar(dst, data, op0, operand0, reverse0=False, op1=None, operand1=None, reverse1=False, engine=engine.unknown, name=None):\n    r\"\"\"Apply up to two math operators to the input ``data`` tile by broadcasting scalar/vector operands\n    in the free dimension using Vector or Scalar or GpSimd Engine: ``(data <op0> operand0) <op1> operand1``.\n\n    The input ``data`` tile can be an SBUF or PSUM tile. Both ``operand0`` and ``operand1`` can be\n    SBUF or PSUM tiles of shape ``(data.shape[0], 1)``, i.e., vectors,\n    or compile-time constant scalars.\n\n    ``op1`` and ``operand1`` are optional, but must be ``None`` (default values) when unused.\n    Note, performing one operator has the same performance cost as performing two operators in the instruction.\n\n    When the operators are non-commutative (e.g., subtract), we can reverse ordering of the inputs for each operator through:\n\n      - ``reverse0 = True``: ``tmp_res = operand0 <op0> data``\n      - ``reverse1 = True``: ``operand1 <op1> tmp_res``\n\n    The ``tensor_scalar`` instruction supports two types of operators: 1) bitvec\n    operators (e.g., bitwise_and) and 2) arithmetic operators (e.g., add).\n    See :ref:`nki-aluop` for the full list of supported operators.\n    The two operators, ``op0`` and ``op1``, in a ``tensor_scalar`` instruction must be of the same type\n    (both bitvec or both arithmetic).\n    If bitvec operators are used, the ``tensor_scalar`` instruction must run on Vector Engine. 
Also, the input/output\n    data types must be integer types, and input elements are treated as bit patterns without any data type casting.\n\n    If arithmetic operators are used, the ``tensor_scalar`` instruction can run on Vector or Scalar or GpSimd Engine.\n    However, each engine supports limited arithmetic operators (see :ref:`tbl-aluop`). The Scalar Engine on trn2 only\n    supports the following operator combinations:\n\n      - ``op0=nl.multiply`` and ``op1=nl.add``\n      - ``op0=nl.multiply`` and ``op1=None``\n      - ``op0=nl.add`` and ``op1=None``\n\n    Also, arithmetic operators impose no restriction on the data types of input tensor ``data`` and output tensor ``dst``,\n    but ``operand0`` and ``operand1`` (if used) must be float32.\n    The compute engine automatically casts ``data.dtype`` to float32\n    and performs the operators in float32 math.\n    The float32 computation results are cast to ``dst.dtype`` at no additional performance cost.\n\n    :param dst: an output tile of ``(data <op0> operand0) <op1> operand1`` computation\n    :param data: the input tile\n    :param op0: the first math operator used with operand0 (see :ref:`nki-aluop` for supported operators)\n    :param operand0: a scalar constant or a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile\n    :param reverse0: reverse ordering of inputs to ``op0``; if false, ``operand0`` is the rhs of ``op0``;\n                     if true, ``operand0`` is the lhs of ``op0``\n    :param op1: the second math operator used with operand1 (see :ref:`nki-aluop` for supported operators);\n                this operator is optional\n    :param operand1: a scalar constant or a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile\n    :param reverse1: reverse ordering of inputs to ``op1``; if false, ``operand1`` is the rhs of ``op1``;\n                     if true, ``operand1`` is the lhs of ``op1``\n    :param engine: (optional) the engine to use for the operation: `nki.isa.engine.vector`, `nki.isa.engine.scalar`,\n                   `nki.isa.engine.gpsimd` (only allowed for rsqrt) or `nki.isa.engine.unknown` (default, let\n                   compiler select best engine based on the input tile shape).\"\"\"\n    ...\n\n\ndef tensor_scalar_cumulative(dst, src, op0, op1, imm0, imm1=None, reduce_cmd=reduce_cmd.reset_reduce, name=None):\n    r\"\"\"Perform a tensor-scalar arithmetic operation with cumulative reduction using Vector Engine.\n\n    The operation applies a scalar operation to each tensor element, then performs a cumulative\n    reduction, storing the cumulative results in the destination tensor.\n\n    The operation can be expressed in pseudocode as:\n\n    .. 
code-block:: python\n\n        if reduce_cmd == reset_reduce:\n            if op1 == add or op1 == subtract:\n                reg = 0\n            elif op1 == mult:\n                reg = 1\n            elif op1 == max:\n                reg = -inf\n            elif op1 == min:\n                reg = +inf\n        elif reduce_cmd == reduce:\n            reg = reg\n        elif reduce_cmd == load_reduce:\n            reg = imm1\n\n        for i in range(len(in_tensor)):\n            if not reverse0:\n                reg = op1(op0(in_tensor[i], imm0), reg)\n                out_tensor[i] = reg\n            else:\n                reg = op1(op0(imm0, in_tensor[i]), reg)\n                out_tensor[i] = reg\n\n    **Operation constraints:**\n\n    - Scalar operation (``op0``) must be an arithmetic op (e.g., add, mult, max)\n    - Reduction operation (``op1``) is limited to add, subtract, mult, max, min\n    - Input / output dtypes are restricted to BF16, FP16, FP32, FP8, UINT8, UINT16, INT8, INT16\n        - INT32/UINT32 are not supported as input/output dtypes (ISA limitation)\n\n    **Accumulator behavior:**\n\n    The Vector Engine maintains internal accumulator registers controlled via ``reduce_cmd``:\n\n    - ``reset_reduce``: Reset accumulator based on reduction operation type\n    - ``load_reduce``: Initialize accumulator with ``imm1`` value\n    - ``reduce``: Continue with existing accumulator value\n\n    :param dst: The destination tensor to write cumulative results to\n    :param src: The source tensor to process\n    :param op0: Scalar arithmetic operation to apply to each element\n    :param op1: Arithmetic operation used for the cumulative reduction\n    :param imm0: Scalar or vector value for tensor-scalar operation. Must be FP32 datatype\n    :param imm1: (optional) Initial scalar or vector value for the accumulator when ``load_reduce``\n                            is specified as the ``reduce_cmd``. Must be FP32 datatype\n    :param reduce_cmd: (optional) Control accumulator behavior using ``nisa.reduce_cmd`` values,\n                                defaults to ``reset_reduce``\"\"\"\n    ...\n\n\ndef tensor_scalar_reduce(dst, data, op0, operand0, reduce_op, reduce_res, reverse0=False, name=None):\n    r\"\"\"Perform the same computation as ``nisa.tensor_scalar`` with one math operator\n    and also a reduction along the free dimension of the ``nisa.tensor_scalar`` result using Vector Engine.\n\n    Refer to :doc:`nisa.tensor_scalar <nki.isa.tensor_scalar>` for semantics of ``data/op0/operand0``.\n    Unlike regular ``nisa.tensor_scalar`` where two operators are supported, only one\n    operator is supported in this API. Also, ``op0`` can only be an arithmetic operation in :ref:`nki-aluop`.\n    Bitvec operators are not supported in this API.\n\n    In addition to :doc:`nisa.tensor_scalar <nki.isa.tensor_scalar>` computation, this API also performs a reduction\n    along the free dimension(s) of the :doc:`nisa.tensor_scalar <nki.isa.tensor_scalar>` result, at a small additional\n    performance cost. The reduction result is returned in ``reduce_res`` in-place, which must be a\n    SBUF/PSUM tile with the same partition axis size as the input tile ``data`` and one element per partition.\n    The ``reduce_op`` can be any of ``nl.add``, ``nl.subtract``, ``nl.multiply``, ``nl.max`` or ``nl.min``.\n\n    Reduction axis is not configurable in this API. If the input tile has multiple free axes, the API will\n    reduce across all of them.\n\n    .. 
math::\n      result = data <op0> operand0 \\\\\n      reduce\\_res = reduce\\_op(result, axis=<FreeAxis>)\n\n    :param dst: an output tile of ``(data <op0> operand0)`` computation\n    :param data: the input tile\n    :param op0: the math operator used with operand0 (any arithmetic operator in :ref:`nki-aluop` is allowed)\n    :param operand0: a scalar constant or a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile\n    :param reverse0: `(not supported yet)` reverse ordering of inputs to ``op0``; if false, ``operand0`` is the rhs of ``op0``;\n                     if true, ``operand0`` is the lhs of ``op0``.\n    :param reduce_op: the reduce operation to perform on the free dimension of ``data <op0> operand0``\n    :param reduce_res: a tile of shape ``(data.shape[0], 1)``, where data.shape[0]\n                    is the partition axis size of the input ``data`` tile. The result of ``reduce_op(data <op0> operand0)``\n                    is written in-place into the tile.\"\"\"\n    ...\n\n\ndef tensor_tensor(dst, data1, data2, op, engine=engine.unknown, name=None):\n    r\"\"\"Perform an element-wise operation on two input tiles using Vector Engine or GpSimd Engine.\n    The two tiles must have the same partition axis size and the same number of elements per partition.\n\n    The element-wise operator is specified using the ``op`` field. Valid choices for ``op``:\n\n    1. Any supported *binary* operator that runs on the Vector Engine. (See :ref:`nki-aluop` for details.)\n    2. ``nl.power``, which runs on the GpSimd Engine.\n\n    For bitvec operators, the input/output data types must be integer types and Vector Engine treats\n    all input elements as bit patterns without any data type casting. For arithmetic operators, the behavior\n    depends on the data types:\n\n    - **Float types**: The engine casts input data types to float32 and performs the element-wise operation\n      in float32 math. The float32 results are cast to ``dst.dtype`` at no additional performance cost.\n    - **int32/uint32 types**: When all input/output tiles are int32 or uint32, the operation defaults to\n      GpSimd Engine, which uses native integer arithmetic. This ensures exact results for all 32-bit integer\n      values. You may override this by passing ``engine=nki.isa.engine.vector`` explicitly.\n\n    Since GpSimd Engine cannot access PSUM, the input/output tiles cannot be in PSUM if ``op`` is ``nl.power``.\n    Similarly, the automatic GpSimd dispatch for int32/uint32 falls back to Vector Engine when any operand\n    resides in PSUM. (See :ref:`arch_sec_neuron_core_engines` for details.)\n\n    Otherwise, the output tile can be in either SBUF or PSUM.\n    However, the two input tiles, ``data1`` and ``data2``, cannot both reside in PSUM.\n    The three legal cases are:\n\n    1. Both ``data1`` and ``data2`` are in SBUF.\n    2. ``data1`` is in SBUF, while ``data2`` is in PSUM.\n    3. 
``data1`` is in PSUM, while ``data2`` is in SBUF.\n\n    Note that if you need broadcasting capability in the free dimension for either input tile, you should consider\n    using the :doc:`nki.isa.tensor_scalar <nki.isa.tensor_scalar>` API instead,\n    which has better performance than ``nki.isa.tensor_tensor`` in general.\n\n    :param dst: an output tile of the element-wise operation\n    :param data1: lhs input operand of the element-wise operation\n    :param data2: rhs input operand of the element-wise operation\n    :param op: a binary math operator (see :ref:`nki-aluop` for supported operators)\n    :param engine: (optional) the engine to use for the operation: `nki.isa.engine.vector`, `nki.isa.engine.gpsimd`\n                   or `nki.isa.engine.unknown` (default, let compiler select best engine based on the input tile shape).\"\"\"\n    ...\n\n\ndef tensor_tensor_scan(dst, data0, data1, initial, op0, op1, reverse0=False, reverse1=False, name=None):\n    r\"\"\"Perform a scan operation of two input tiles using Vector Engine.\n\n    Mathematically, the tensor_tensor_scan instruction on Vector Engine performs\n    the following computation per partition:\n\n    .. code-block:: python\n\n        # Let's assume we work with numpy, and data0 and data1 are 2D (with shape[0] being the partition axis)\n        import numpy as np\n\n        result = np.ndarray(data0.shape, dtype=data0.dtype)\n        result[:, 0] = op1(op0(data0[:, 0], initial), data1[:, 0])\n\n        for i in range(1, data0.shape[1]):\n            result[:, i] = op1(op0(data0[:, i], result[:, i-1]), data1[:, i])\n\n    The two input tiles (``data0`` and ``data1``) must have the same\n    partition axis size and the same number of elements per partition.\n    The third input ``initial`` can either be a float32 compile-time scalar constant\n    that will be broadcasted in the partition axis of ``data0``/``data1``, or a tile\n    with the same partition axis size as ``data0``/``data1`` and one element per partition.\n\n    The two input tiles, ``data0`` and ``data1``, cannot both reside in PSUM. The three legal cases are:\n\n    1. Both ``data0`` and ``data1`` are in SBUF.\n    2. ``data0`` is in SBUF, while ``data1`` is in PSUM.\n    3. ``data0`` is in PSUM, while ``data1`` is in SBUF.\n\n    The scan operation supported by this API has two programmable\n    math operators in the ``op0`` and ``op1`` fields.\n    Both ``op0`` and ``op1`` can be any binary arithmetic operator\n    supported by NKI (see :ref:`nki-aluop` for details).\n    We can optionally reverse the input operands of ``op0`` by setting ``reverse0`` to True\n    (or ``op1`` by setting ``reverse1``). Reversing operands is useful for non-commutative\n    operators, such as subtract.\n\n    Input/output data types can be any supported NKI data type (see :ref:`nki-dtype`),\n    but the engine automatically casts input data types to float32\n    and performs the computation in float32 math. 
The float32 computation results are\n    cast to ``dst.dtype`` at no additional performance cost.\n\n    :param dst: an output tile of the scan operation\n    :param data0: lhs input operand of the scan operation\n    :param data1: rhs input operand of the scan operation\n    :param initial: starting state of the scan; can be a SBUF/PSUM tile with 1 element/partition or a scalar\n                        compile-time constant\n    :param op0: a binary arithmetic math operator (see :ref:`nki-aluop` for supported operators)\n    :param op1: a binary arithmetic math operator (see :ref:`nki-aluop` for supported operators)\n    :param reverse0: reverse ordering of inputs to ``op0``; if false, ``data0`` is the lhs of ``op0``;\n                   if true, ``data0`` is the rhs of ``op0``\n    :param reverse1: reverse ordering of inputs to ``op1``; if false, ``data1`` is the rhs of ``op1``;\n                   if true, ``data1`` is the lhs of ``op1``\"\"\"\n    ...\n\n\nunknown_engine = engine.unknown\n\"\"\"Unknown Engine\"\"\"\n\n\nvector_engine = engine.vector\n\"\"\"Vector Engine\"\"\"\n\n"
  },
  {
    "path": "nki/api/nki/language/__init__.py",
    "content": "\"\"\"Stubs for nki.language\"\"\"\n\nfrom enum import Enum\n\nclass MemoryRegion(Enum):\n    r\"\"\"Memory region constants for NKI tensors.\"\"\"\n\n    sbuf = 'sbuf'\n    psum = 'psum'\n    private_hbm = 'private_hbm'\n    shared_hbm = 'shared_hbm'\n\n\nclass NKIObject:\n    r\"\"\"Base class for NKI kernel dataclasses and configuration objects.\"\"\"\n    ...\n\n\nclass tile_size:\n    r\"\"\"Hardware tile size constants (pmax, psum_fmax, gemm_stationary_fmax, etc.)\"\"\"\n    bn_stats_fmax = ...\n    \"\"\"Maximum free dimension of BN_STATS\"\"\"\n    gemm_moving_fmax = ...\n    \"\"\"Maximum free dimension of the moving operand of General Matrix Multiplication on Tensor Engine\"\"\"\n    gemm_stationary_fmax = ...\n    \"\"\"Maximum free dimension of the stationary operand of General Matrix Multiplication on Tensor Engine\"\"\"\n    pmax = ...\n    \"\"\"Maximum partition dimension of a tile\"\"\"\n    psum_fmax = ...\n    \"\"\"Maximum free dimension of a tile on PSUM buffer\"\"\"\n    psum_min_align = ...\n    \"\"\"Minimum byte alignment requirement for PSUM free dimension address\"\"\"\n    sbuf_min_align = ...\n    \"\"\"Minimum byte alignment requirement for SBUF free dimension address\"\"\"\n    total_available_sbuf_size = ...\n    \"\"\"Usable SBUF size per partition (total minus reserved bytes).\"\"\"\n\n\nclass NkiTensor(NKIObject):\n    r\"\"\"Tensor class with access pattern support.\n\n    Attributes:\n        shape: Tuple of dimension sizes\n        dtype: NKI data type string\n        buffer: Buffer location (sbuf, psum, hbm, etc.)\n        _storage: Opaque storage handle, interpreted by the backend\n        _pattern: Access pattern as list of [step, num] tuples (None = identity)\n        offset: Element offset into storage\"\"\"\n    ...\n\n\ndef abs(x, dtype=None):\n    r\"\"\"Absolute value of the input, element-wise.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has absolute values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.abs\n        a = nl.full((128, 512), -1.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.abs(a)\n        expected = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\n\n        # nki.language.abs with explicit dtype\n        a = nl.full((128, 512), -1.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.abs(a, dtype=nl.float16)\n        expected = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef add(x, y, dtype=None):\n    r\"\"\"Add the inputs, element-wise.\n\n    ((Similar to `numpy.add <https://numpy.org/doc/stable/reference/generated/numpy.add.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has ``x + y``, element-wise.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.add -- element-wise addition of two tiles\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 2.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.add(a, b)\n\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.add -- adding a scalar to every element of a tile\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.add(a, 2.0)\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef affine_range(start, stop=None, step=1):\n    r\"\"\"Create a sequence for fully unrolled loop iteration.\n\n    Create a sequence of numbers for use as loop iterators in NKI, resulting in\n    a fully unrolled loop. Prefer :doc:`static_range <nki.language.static_range>` instead.\n\n    .. warning::\n    \n        This API is deprecated and will be removed in future releases.\n\n    :param start: start value (or stop if ``stop`` is None).\n    :param stop: stop value (exclusive).\n    :param step: step size.\n    :return: an iterator yielding integer values from start to stop.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.affine_range\n        for i in nl.affine_range(input_tensor.shape[1] // 512):\n            offset = i * 512\n            tile = nl.load(input_tensor[0:128, offset:offset+512])\n            result = nl.multiply(tile, tile)\n            nl.store(out_tensor[0:128, offset:offset+512], result)\"\"\"\n    ...\n\n\ndef all(x, axis, dtype=None):\n    r\"\"\"Whether all elements along the specified axis (or axes) evaluate to True.\n\n    ((Similar to `numpy.all <https://numpy.org/doc/stable/reference/generated/numpy.all.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile with the logical AND reduction along the provided axis.\"\"\"\n    ...\n\n\ndef arctan(x, dtype=None):\n    r\"\"\"Inverse tangent of the input, element-wise.\n\n    ((Similar to `numpy.arctan <https://numpy.org/doc/stable/reference/generated/numpy.arctan.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has inverse tangent values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.arctan -- arctan(0.0) = 0.0\n        a = nl.full((128, 512), 0.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.arctan(a)\n        expected = nl.full((128, 512), 0.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\nbfloat16 = 'bfloat16'\n\"\"\"16-bit floating-point number (1S,8E,7M)\"\"\"\n\n\ndef bitwise_and(x, y, dtype=None):\n    r\"\"\"Compute the bitwise AND of two tiles element-wise.\n\n    ((Similar to `numpy.bitwise_and <https://numpy.org/doc/stable/reference/generated/numpy.bitwise_and.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs must be integer typed.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the bitwise AND result.\"\"\"\n    ...\n\n\ndef bitwise_or(x, y, dtype=None):\n    r\"\"\"Compute the bitwise OR of two tiles element-wise.\n\n    ((Similar to `numpy.bitwise_or <https://numpy.org/doc/stable/reference/generated/numpy.bitwise_or.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs must be integer typed.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the bitwise OR result.\"\"\"\n    ...\n\n\ndef bitwise_xor(x, y, dtype=None):\n    r\"\"\"Compute the bitwise XOR of two tiles element-wise.\n\n    ((Similar to `numpy.bitwise_xor <https://numpy.org/doc/stable/reference/generated/numpy.bitwise_xor.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs must be integer typed.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. 
At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the bitwise XOR result.\"\"\"\n    ...\n\n\nbool_ = 'bool'\n\"\"\"Boolean (True or False) stored as a byte\"\"\"\n\n\ndef broadcast_to(x, shape, dtype=None):\n    r\"\"\"Broadcast a tile to a new shape following numpy broadcasting rules.\n\n    ((Similar to `numpy.broadcast_to <https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    If ``x.shape`` is already the same as ``shape``, returns ``x`` unchanged\n    (or a dtype-cast copy if ``dtype`` differs).\n\n    :param x: the source tile in SBUF or PSUM.\n    :param shape: the target shape. Must have the same rank as ``x``.\n        Each dimension must either match or be broadcast from size 1.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile with the target shape containing broadcast values from ``x``.\"\"\"\n    ...\n\n\ndef ceil(x, dtype=None):\n    r\"\"\"Ceiling of the input, element-wise.\n\n    ((Similar to `numpy.ceil <https://numpy.org/doc/stable/reference/generated/numpy.ceil.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    The ceil of the scalar x is the smallest integer i, such that i >= x.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has ceiling values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.ceil -- rounds 3.2 up to 4.0\n        a = nl.full((128, 512), 3.2, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.ceil(a)\n        expected = nl.full((128, 512), 4.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.ceil -- rounds -3.7 up to -3.0\n        a = nl.full((128, 512), -3.7, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.ceil(a)\n        expected = nl.full((128, 512), -3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef copy(x, dtype=None):\n    r\"\"\"Create a copy of the input tile.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Uses the Scalar Engine via ``activation(op=copy)``. Note that the Scalar Engine\n    internally casts through FP32, which may be lossy for integer types with\n    values exceeding FP32 precision (e.g. 
int32 values > 2^23).\n\n    :param x: the source of copy, must be a tile in SBUF or PSUM.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a new tile with the same layout as ``x``, allocated on the same buffer\n        as ``x`` (SBUF or PSUM).\"\"\"\n    ...\n\n\ndef cos(x, dtype=None):\n    r\"\"\"Cosine of the input, element-wise.\n\n    ((Similar to `numpy.cos <https://numpy.org/doc/stable/reference/generated/numpy.cos.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has cosine values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.cos -- cos(0.0) = 1.0\n        a = nl.full((128, 512), 0.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.cos(a)\n        expected = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef device_print(print_prefix, tensor):\n    r\"\"\"Print a message with a string prefix followed by the value of a tile.\n\n    During kernel execution on hardware, the Neuron Runtime (NRT) exports device-printed tensors\n    via the NRT debug stream API. By default, setting the environment variable\n    ``NEURON_RT_DEBUG_OUTPUT_DIR`` to a directory path enables the default stream consumer,\n    which dumps tensor data to that directory. The output is organized as:\n    ``<output_dir>/<print_prefix>/core_<logical_core_id>/<iteration>/``.\n\n    In CPU simulation, this prints immediately to stdout.\n\n    :param print_prefix: prefix of the print message. Evaluated at trace time; must be a constant string.\n    :param tensor: tensor to print out. Can be in SBUF or HBM.\"\"\"\n    ...\n\n\ndef divide(x, y, dtype=None):\n    r\"\"\"Divide the inputs, element-wise.\n\n    ((Similar to `numpy.divide <https://numpy.org/doc/stable/reference/generated/numpy.divide.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has ``x / y``, element-wise.\n    \"\"\"\n    ...\n\n\ndef dropout(x, rate, dtype=None):\n    r\"\"\"Randomly zeroes some of the elements of the input tile given a probability rate.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param rate: the probability of zeroing each element. 
Can be a scalar constant\n        or a tile of shape ``(x.shape[0], 1)`` for per-partition drop probabilities.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile with randomly zeroed elements of ``x``.\"\"\"\n    ...\n\n\ndef ds(start, size):\n    r\"\"\"Create a dynamic slice for tensor indexing.\n\n    :param start: the start index of the slice.\n    :param size: the size of the slice.\n    :return: a dynamic slice object for use in tensor indexing.\"\"\"\n    ...\n\n\ndef dynamic_range(start, stop=None, step=1):\n    r\"\"\"Create a sequence for **dynamic**  loop iteration.\n\n    Create a sequence of numbers for use as **dynamic** loop iterators in NKI.\n    The loop runs on device with dynamic bounds.\n\n    :param start: start value (or stop if ``stop`` is None), can be VirtualRegister.\n    :param stop: stop value (exclusive), can be VirtualRegister.\n    :param step: step size, must be a compile-time positive integer (not VirtualRegister).\n    :return: an iterator yielding integer values from start to stop.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.dynamic_range\n        for _ in nl.dynamic_range(1):\n            tile = nl.load(input_tensor[0:128, 0:512])\n            result = nl.multiply(tile, tile)\n            nl.store(out_tensor[0:128, 0:512], result)\"\"\"\n    ...\n\n\ndef empty_like(x, dtype=None, buffer=None, name=''):\n    r\"\"\"Create a new tensor with the same shape and type as a given tensor.\n\n    ((Similar to `numpy.empty_like <https://numpy.org/doc/stable/reference/generated/numpy.empty_like.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: the tensor.\n    :param dtype: the data type of the tensor (default: same as ``x``).\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), (default: same as ``x``).\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` with the same shape and type as ``x``.\"\"\"\n    ...\n\n\ndef equal(x, y, dtype=None):\n    r\"\"\"Return (x == y) element-wise.\n\n    ((Similar to `numpy.equal <https://numpy.org/doc/stable/reference/generated/numpy.equal.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where equal, 0 otherwise.\"\"\"\n    ...\n\n\ndef erf(x, dtype=None):\n    r\"\"\"Error function, element-wise.\"\"\"\n    ...\n\n\ndef erf_dx(x, dtype=None):\n    r\"\"\"Derivative of error function, element-wise.\"\"\"\n    ...\n\n\ndef exp(x, dtype=None):\n    r\"\"\"Exponential of the input, element-wise.\n\n    ((Similar to `numpy.exp <https://numpy.org/doc/stable/reference/generated/numpy.exp.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    The ``exp(x)`` is ``e^x`` where ``e`` is the Euler's number = 2.718281...\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has exponential values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.exp -- exp(0.0) = 1.0\n        a = nl.full((128, 512), 0.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.exp(a)\n        expected = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef expand_dims(x, axis):\n    r\"\"\"Expand the shape of a tile.\n\n    ((Similar to `numpy.expand_dims <https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Insert a new axis that will appear at the axis position in the expanded tile shape.\n\n    :param x: a tile.\n    :param axis: position in the expanded axes where the new axis is placed.\n    :return: a tile with view of input data with the number of dimensions increased.\"\"\"\n    ...\n\n\nfloat16 = 'float16'\n\"\"\"16-bit floating-point number\"\"\"\n\n\nfloat32 = 'float32'\n\"\"\"32-bit floating-point number\"\"\"\n\n\nfloat4_e2m1fn_x4 = 'float4_e2m1fn_x4'\n\"\"\"4x packed float4_e2m1fn elements, custom data type for nki.isa.nc_matmul_mx on NeuronCore-v4\"\"\"\n\n\nfloat8_e4m3 = 'float8_e4m3'\n\"\"\"8-bit floating-point number (1S,4E,3M)\"\"\"\n\n\nfloat8_e4m3fn = 'float8_e4m3fn'\n\"\"\"8-bit floating-point number (1S,4E,3M), Extended range: no inf, NaN represented by 0bS111'1111\"\"\"\n\n\nfloat8_e4m3fn_x4 = 'float8_e4m3fn_x4'\n\"\"\"4x packed float8_e4m3fn elements, custom data type for nki.isa.nc_matmul_mx on NeuronCore-v4\"\"\"\n\n\nfloat8_e5m2 = 'float8_e5m2'\n\"\"\"8-bit floating-point number (1S,5E,2M)\"\"\"\n\n\nfloat8_e5m2_x4 = 'float8_e5m2_x4'\n\"\"\"4x packed float8_e5m2 elements, custom data type for nki.isa.nc_matmul_mx on NeuronCore-v4\"\"\"\n\n\ndef floor(x, dtype=None):\n    r\"\"\"Floor of the input, element-wise.\n\n    ((Similar to `numpy.floor <https://numpy.org/doc/stable/reference/generated/numpy.floor.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    The floor of the scalar x is the largest integer i, such that i <= x.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has floor values of ``x``.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.floor -- rounds 3.7 down to 3.0\n        a = nl.full((128, 512), 3.7, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.floor(a)\n        expected = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.floor -- rounds -3.2 down to -4.0\n        a = nl.full((128, 512), -3.2, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.floor(a)\n        expected = nl.full((128, 512), -4.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef fmod(x, y, dtype=None):\n    r\"\"\"Floating-point remainder of ``x / y``, element-wise.\n\n    The remainder has the same sign as the dividend x.\n    It is equivalent to the Matlab(TM) rem function and should not be confused with the Python modulus operator x % y.\n\n    ((Similar to `numpy.fmod <https://numpy.org/doc/stable/reference/generated/numpy.fmod.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile. If x is a scalar value it will be broadcast to the shape of y.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has values ``x fmod y``.\n    \"\"\"\n    ...\n\n\ndef full(shape, fill_value, dtype, buffer=MemoryRegion.sbuf, name=''):\n    r\"\"\"Create a new tensor of given shape and dtype on the specified buffer, filled with initial value.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param shape: the shape of the tensor.\n    :param fill_value: the value to fill the tensor with.\n    :param dtype: the data type of the tensor.\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` allocated on the buffer.\"\"\"\n    ...\n\n\ndef gather_flattened(data, indices, axis=0, dtype=None):\n    r\"\"\"Gather elements from data tensor using indices after flattening.\n\n    This instruction gathers elements from the data tensor using integer indices\n    provided in the indices tensor. For each element in the indices tensor, it\n    retrieves the corresponding value from the data tensor using the index value\n    to select from the free dimension of data.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param data: input tensor to gather from.\n    :param indices: indices to gather.\n    :param axis: axis along which to gather.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: gathered tensor.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.gather_flattened -- gather elements by index\n        data = nl.load(data_tensor[0:128, 0:512])\n        indices = nl.load(indices_tensor[0:128, 0:512])\n        result = nl.gather_flattened(data, indices)\n        nl.store(actual_tensor[0:128, 0:512], result)\"\"\"\n    ...\n\n\ndef gelu(x, dtype=None):\n    r\"\"\"GELU activation, element-wise.\"\"\"\n    ...\n\n\ndef gelu_apprx_sigmoid(x, dtype=None):\n    r\"\"\"GELU approximation using sigmoid, element-wise.\"\"\"\n    ...\n\n\ndef gelu_apprx_sigmoid_dx(x, dtype=None):\n    r\"\"\"Derivative of sigmoid-approximated GELU, element-wise.\"\"\"\n    ...\n\n\ndef gelu_apprx_tanh(x, dtype=None):\n    r\"\"\"GELU approximation using tanh, element-wise.\"\"\"\n    ...\n\n\ndef gelu_dx(x, dtype=None):\n    r\"\"\"Derivative of GELU activation, element-wise.\"\"\"\n    ...\n\n\ndef greater(x, y, dtype=None):\n    r\"\"\"Return (x > y) element-wise.\n\n    ((Similar to `numpy.greater <https://numpy.org/doc/stable/reference/generated/numpy.greater.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where x > y, 0 otherwise.\"\"\"\n    ...\n\n\ndef greater_equal(x, y, dtype=None):\n    r\"\"\"Return (x >= y) element-wise.\n\n    ((Similar to `numpy.greater_equal <https://numpy.org/doc/stable/reference/generated/numpy.greater_equal.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where x >= y, 0 otherwise.\"\"\"\n    ...\n\n\nhbm = MemoryRegion.private_hbm\n\n\nint16 = 'int16'\n\"\"\"16-bit signed integer number\"\"\"\n\n\nint32 = 'int32'\n\"\"\"32-bit signed integer number\"\"\"\n\n\nint8 = 'int8'\n\"\"\"8-bit signed integer number\"\"\"\n\n\ndef invert(x, dtype=None):\n    r\"\"\"Compute the bitwise NOT element-wise.\n\n    ((Similar to `numpy.invert <https://numpy.org/doc/stable/reference/generated/numpy.invert.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Input must be integer typed. 
Implemented as XOR with all-ones.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile with the bitwise NOT result.\"\"\"\n    ...\n\n\ndef is_hbm(buffer):\n    r\"\"\"Check if buffer is any HBM type.\"\"\"\n    ...\n\n\ndef is_on_chip(buffer):\n    r\"\"\"Check if buffer is on-chip (SBUF or PSUM).\"\"\"\n    ...\n\n\ndef is_psum(buffer):\n    r\"\"\"Check if buffer is PSUM.\"\"\"\n    ...\n\n\ndef is_sbuf(buffer):\n    r\"\"\"Check if buffer is SBUF.\"\"\"\n    ...\n\n\ndef left_shift(x, y, dtype=None):\n    r\"\"\"Left shift the bits of x by y positions element-wise.\n\n    ((Similar to `numpy.left_shift <https://numpy.org/doc/stable/reference/generated/numpy.left_shift.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs must be integer typed.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the left-shifted result.\"\"\"\n    ...\n\n\ndef less(x, y, dtype=None):\n    r\"\"\"Return (x < y) element-wise.\n\n    ((Similar to `numpy.less <https://numpy.org/doc/stable/reference/generated/numpy.less.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where x < y, 0 otherwise.\"\"\"\n    ...\n\n\ndef less_equal(x, y, dtype=None):\n    r\"\"\"Return (x <= y) element-wise.\n\n    ((Similar to `numpy.less_equal <https://numpy.org/doc/stable/reference/generated/numpy.less_equal.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where x <= y, 0 otherwise.\"\"\"\n    ...\n\n\ndef load(src, dtype=None):\n    r\"\"\"Load a tensor from device memory (HBM) into on-chip memory (SBUF).\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param src: HBM tensor to load the data from.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a new tile on SBUF with values from ``src``.\"\"\"\n    ...\n\n\ndef load_transpose2d(src, dtype=None):\n    r\"\"\"Load a tensor from device memory (HBM) and 2D-transpose the data before storing into on-chip memory (SBUF).\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param src: HBM tensor to load the data from.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a new tile on SBUF with values from ``src`` 2D-transposed.\"\"\"\n    ...\n\n\ndef log(x, dtype=None):\n    r\"\"\"Natural logarithm of the input, element-wise.\n\n    ((Similar to `numpy.log <https://numpy.org/doc/stable/reference/generated/numpy.log.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    It is the inverse of the exponential function, such that: ``log(exp(x)) = x`` .\n    The natural logarithm base is ``e``.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has natural logarithm values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.log -- log(1.0) = 0.0\n        a = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.log(a)\n        expected = nl.full((128, 512), 0.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef logical_and(x, y, dtype=None):\n    r\"\"\"Compute the logical AND of two tiles element-wise.\n\n    ((Similar to `numpy.logical_and <https://numpy.org/doc/stable/reference/generated/numpy.logical_and.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs should be boolean-like (0 or 1 values).\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the logical AND result.\"\"\"\n    ...\n\n\ndef logical_not(x, dtype=None):\n    r\"\"\"Compute the logical NOT element-wise.\n\n    ((Similar to `numpy.logical_not <https://numpy.org/doc/stable/reference/generated/numpy.logical_not.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    Implemented as XOR with 1, so inputs should be boolean-like (0 or 1 values).\n    For non-boolean inputs, use ``nl.equal(x, 0)`` instead.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile with the logical NOT result.\"\"\"\n    ...\n\n\ndef logical_or(x, y, dtype=None):\n    r\"\"\"Compute the logical OR of two tiles element-wise.\n\n    ((Similar to `numpy.logical_or <https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs should be boolean-like (0 or 1 values).\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the logical OR result.\"\"\"\n    ...\n\n\ndef logical_xor(x, y, dtype=None):\n    r\"\"\"Compute the logical XOR of two tiles element-wise.\n\n    ((Similar to `numpy.logical_xor <https://numpy.org/doc/stable/reference/generated/numpy.logical_xor.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs should be boolean-like (0 or 1 values).\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the logical XOR result.\"\"\"\n    ...\n\n\ndef matmul(x, y, transpose_x=False):\n    r\"\"\"x @ y matrix multiplication of x and y.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile on SBUF (partition dimension <= 128, free dimension <= 128),\n        x's free dimension must match y's partition dimension.\n    :param y: a tile on SBUF (partition dimension <= 128, free dimension <= 512).\n    :param transpose_x: defaults to False. If True, x is treated as already transposed.\n        If False, an additional transpose will be inserted to make x's partition\n        dimension the contract dimension of the matmul to align with the Tensor Engine.\n    :return: x @ y or x.T @ y if transpose_x=True.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n        import nki.isa as nisa\n\n        # nki.language.matmul -- identity.T @ ones = ones\n        x = nl.shared_identity_matrix(n=128, dtype=nl.float32)\n        y = nl.full((128, 128), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        result_psum = nl.matmul(x, y, transpose_x=True)\n        result = nl.ndarray((128, 128), dtype=nl.float32, buffer=nl.sbuf)\n        # copy the PSUM result into the SBUF tile\n        nisa.tensor_copy(result, result_psum)\n        expected = nl.full((128, 128), 1.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(result, expected)\"\"\"\n    ...\n\n\ndef max(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Maximum of elements along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.max <https://numpy.org/doc/stable/reference/generated/numpy.max.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: if True, the reduced axes are kept as size-one dimensions.\n    :return: a tile with the maximum along the provided axis.\"\"\"\n    ...\n\n\ndef maximum(x, y, dtype=None):\n    r\"\"\"Maximum of the inputs, element-wise.\n\n    ((Similar to `numpy.maximum <https://numpy.org/doc/stable/reference/generated/numpy.maximum.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has the maximum of each element from x and y.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.maximum -- max(3.0, 5.0) = 5.0\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.maximum(a, b)\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.maximum -- with a scalar operand\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.maximum(a, 5.0)\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef mean(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Arithmetic mean along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.mean <https://numpy.org/doc/stable/reference/generated/numpy.mean.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. 
The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: if True, the reduced axes are kept as size-one dimensions.\n    :return: a tile with the average of elements along the provided axis. Float32\n        intermediate values are used for the computation.\"\"\"\n    ...\n\n\ndef min(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Minimum of elements along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.min <https://numpy.org/doc/stable/reference/generated/numpy.min.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: if True, the reduced axes are kept as size-one dimensions.\n    :return: a tile with the minimum along the provided axis.\"\"\"\n    ...\n\n\ndef minimum(x, y, dtype=None):\n    r\"\"\"Minimum of the inputs, element-wise.\n\n    ((Similar to `numpy.minimum <https://numpy.org/doc/stable/reference/generated/numpy.minimum.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has the minimum of each element from x and y.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.minimum -- min(3.0, 5.0) = 3.0\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.minimum(a, b)\n        expected = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.minimum -- with a scalar operand\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.minimum(a, 5.0)\n        expected = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef mish(x, dtype=None):\n    r\"\"\"Mish activation, element-wise.\"\"\"\n    ...\n\n\ndef mod(x, y, dtype=None):\n    r\"\"\"Remainder of ``x / y``, element-wise.\n\n    Computes the remainder complementary to the floor_divide function.\n    It is equivalent to the Python modulus x % y and has the same sign as the divisor y.\n\n    ((Similar to `numpy.mod <https://numpy.org/doc/stable/reference/generated/numpy.mod.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile. If x is a scalar value it will be broadcast to the shape of y.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has values ``x mod y``.\n    \"\"\"\n    ...\n\n\ndef multiply(x, y, dtype=None):\n    r\"\"\"Multiply the inputs, element-wise.\n\n    ((Similar to `numpy.multiply <https://numpy.org/doc/stable/reference/generated/numpy.multiply.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has ``x * y``, element-wise.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.multiply -- element-wise multiplication of two tiles\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 4.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.multiply(a, b)\n        expected = nl.full((128, 512), 12.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.multiply -- scaling every element by a scalar\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.multiply(a, 4.0)\n        expected = nl.full((128, 512), 12.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef ndarray(shape, dtype, buffer=MemoryRegion.sbuf, name='', address=None):\n    r\"\"\"Create a new tensor of given shape and dtype on the specified buffer.\n\n    :param shape: the shape of the tensor.\n    :param dtype: the data type of the tensor.\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :param address: optional memory address ``(partition_offset, free_offset)``.\n    :return: a new :class:`NkiTensor` allocated on the buffer.\"\"\"\n    ...\n\n\ndef negative(x, dtype=None):\n    r\"\"\"Numerical negative of the input, element-wise.\n\n    ((Similar to `numpy.negative <https://numpy.org/doc/stable/reference/generated/numpy.negative.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has numerical negative values of ``x``.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.negative -- negates 5.0 to -5.0\n        a = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.negative(a)\n        expected = nl.full((128, 512), -5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.negative -- negates -3.0 to 3.0\n        a = nl.full((128, 512), -3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.negative(a)\n        expected = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef no_reorder():\n    r\"\"\"Prevent the scheduler from reordering operations in this region.\n\n    Use as a context manager (``with nl.no_reorder():``) to guarantee that\n    operations inside the block execute in program order. Without this\n    directive, the compiler scheduler is free to reorder independent\n    operations for better hardware utilization.\n\n    Dynamic loops (``nl.dynamic_range``) are not supported inside a\n    ``no_reorder`` block. Static loops (``nl.affine_range``,\n    ``nl.sequential_range``, ``nl.static_range``) are allowed because\n    they are fully unrolled at compile time.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.no_reorder -- guarantee execution order\n        with nl.no_reorder():\n            a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n            b = nl.full((128, 512), 2.0, dtype=nl.float32, buffer=nl.sbuf)\n            c = nl.add(a, b)\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef not_equal(x, y, dtype=None):\n    r\"\"\"Return (x != y) element-wise.\n\n    ((Similar to `numpy.not_equal <https://numpy.org/doc/stable/reference/generated/numpy.not_equal.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information); Defaults to the input tile dtype.\n        Use ``dtype=nl.uint8`` for a boolean-like result.\n    :return: a tile with 1 where not equal, 0 otherwise.\"\"\"\n    ...\n\n\ndef num_programs(axes=0):\n    r\"\"\"Number of SPMD programs along the given axes in the launch grid.\n\n    :param axes: the axes of the launch grid. If not provided, returns the total\n        number of programs along the entire launch grid.\n    :return: the number of SPMD programs along ``axes`` in the launch grid.\"\"\"\n    ...\n\n\ndef ones(shape, dtype, buffer=MemoryRegion.sbuf, name=''):\n    r\"\"\"Create a new tensor of given shape and dtype on the specified buffer, filled with ones.\n\n    ((Similar to `numpy.ones <https://numpy.org/doc/stable/reference/generated/numpy.ones.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param shape: the shape of the tensor.\n    :param dtype: the data type of the tensor.\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` allocated on the buffer.\"\"\"\n    ...\n\n\ndef power(x, y, dtype=None):\n    r\"\"\"Elements of x raised to powers of y, element-wise.\n\n    ((Similar to `numpy.power <https://numpy.org/doc/stable/reference/generated/numpy.power.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has values ``x`` to the power of ``y``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.power -- element-wise exponentiation of two tiles\n        a = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 2.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.power(a, b)\n        expected = nl.full((128, 512), 9.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\nprivate_hbm = MemoryRegion.private_hbm\n\n\ndef prod(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Product of elements along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.prod <https://numpy.org/doc/stable/reference/generated/numpy.prod.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: if True, the reduced axes are kept as size-one dimensions.\n    :return: a tile with the product along the provided axis.\"\"\"\n    ...\n\n\ndef program_id(axis=0):\n    r\"\"\"Index of the current SPMD program along the given axis in the launch grid.\n\n    :param axis: the axis of the launch grid.\n    :return: the program id along ``axis``.\"\"\"\n    ...\n\n\ndef program_ndim():\n    r\"\"\"Number of dimensions in the SPMD launch grid.\n\n    :return: the number of dimensions in the launch grid, i.e. the number of axes. 0 if no grid.\"\"\"\n    ...\n\n\npsum = MemoryRegion.psum\n\n\ndef rand(shape, dtype, buffer=MemoryRegion.sbuf, name=''):\n    r\"\"\"Create a new tensor of given shape and dtype on the specified buffer, filled with random values.\n\n    Values are sampled from a uniform distribution between 0 and 1.\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param shape: the shape of the tensor.\n    :param dtype: the data type of the tensor (see :ref:`nki-dtype` for more information).\n    :param buffer: the specific buffer (sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` allocated on the buffer with random values.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.rand -- generate random values in [0, 1)\n        a = nl.rand((128, 512), dtype=nl.float32)\"\"\"\n    ...\n\n\ndef random_seed(seed):\n    r\"\"\"Set the random seed for random number generation.\n\n    Using the same seed will generate the same sequence of random numbers\n    when used with ``rand()``.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param seed: a [1,1] tensor on SBUF or PSUM with a 32-bit seed value.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.random_seed -- set seed for reproducible random values\n        seed = nl.full((1, 1), 42, dtype=nl.int32, buffer=nl.sbuf)\n        nl.random_seed(seed)\n        a = nl.rand((128, 512), dtype=nl.float32)\n\n        # nki.language.random_seed -- same seed produces same values\n        seed = nl.full((1, 1), 42, dtype=nl.int32, buffer=nl.sbuf)\n        nl.random_seed(seed)\n        a = nl.rand((128, 512), dtype=nl.float32)\n        nl.random_seed(seed)\n        b = nl.rand((128, 512), dtype=nl.float32)\n        assert nl.equal(a, b)\"\"\"\n    ...\n\n\ndef reciprocal(x, dtype=None):\n    r\"\"\"Reciprocal of the input, element-wise.\n\n    ((Similar to `numpy.reciprocal <https://numpy.org/doc/stable/reference/generated/numpy.reciprocal.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    ``reciprocal(x) = 1 / x``\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has reciprocal values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.reciprocal -- reciprocal(4.0) = 0.25\n        a = nl.full((128, 512), 4.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.reciprocal(a)\n        expected = nl.full((128, 512), 0.25, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef relu(x, dtype=None):\n    r\"\"\"ReLU activation, element-wise.\"\"\"\n    ...\n\n\ndef right_shift(x, y, dtype=None):\n    r\"\"\"Right shift the bits of x by y positions element-wise.\n\n    ((Similar to `numpy.right_shift <https://numpy.org/doc/stable/reference/generated/numpy.right_shift.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    Inputs must be integer typed.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value. 
At least one of x, y must be a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile with the right-shifted result.\"\"\"\n    ...\n\n\ndef rms_norm(x, w, axis, n, epsilon=1e-06, dtype=None, compute_dtype=None):\n    r\"\"\"Apply Root Mean Square Layer Normalization.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: input tile.\n    :param w: weight tile.\n    :param axis: axis along which to compute the root mean square (rms) value.\n    :param n: total number of values to calculate rms.\n    :param epsilon: epsilon value used by rms calculation to avoid divide-by-zero.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param compute_dtype: (optional) dtype for the internal computation.\n    :return: ``x / RMS(x) * w``\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.rms_norm -- normalize with unit weights\n        x = nl.full((128, 512), 2.0, dtype=nl.float32, buffer=nl.sbuf)\n        w = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        result = nl.rms_norm(x, w, axis=1, n=512)\"\"\"\n    ...\n\n\ndef rsqrt(x, dtype=None):\n    r\"\"\"Reciprocal of the square-root of the input, element-wise.\n\n    ((Similar to `torch.rsqrt <https://pytorch.org/docs/master/generated/torch.rsqrt.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    ``rsqrt(x) = 1 / sqrt(x)``\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has reciprocal square-root values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.rsqrt -- rsqrt(4.0) = 0.5\n        a = nl.full((128, 512), 4.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.rsqrt(a)\n        expected = nl.full((128, 512), 0.5, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\nsbuf = MemoryRegion.sbuf\n\n\ndef sequential_range(start, stop=None, step=1):\n    r\"\"\"Create a sequence for fully unrolled loop iteration.\n\n    Create a sequence of numbers for use as loop iterators in NKI, resulting in\n    a fully unrolled loop. Prefer :doc:`static_range <nki.language.static_range>` instead.\n\n    .. warning::\n    \n        This API is deprecated and will be removed in future releases.\n\n    :param start: start value (or stop if ``stop`` is None).\n    :param stop: stop value (exclusive).\n    :param step: step size.\n    :return: an iterator yielding integer values from start to stop.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.sequential_range\n        for i in nl.sequential_range(input_tensor.shape[1] // 512):\n            offset = i * 512\n            tile = nl.load(input_tensor[0:128, offset:offset+512])\n            result = nl.multiply(tile, tile)\n            nl.store(out_tensor[0:128, offset:offset+512], result)\"\"\"\n    ...\n\n\nshared_hbm = MemoryRegion.shared_hbm\n\n\ndef shared_identity_matrix(n, dtype='uint8', dst=None):\n    r\"\"\"Create an identity matrix in SBUF with the specified data type.\n\n    The compiler will reuse all identity matrices of the same\n    dtype in the graph to save space.\n\n    :param n: the number of rows (and columns) of the returned identity matrix\n    :param dtype: the data type of the tensor, default to be ``nl.uint8`` (see :ref:`nki-dtype` for more information).\n    :return: a new :class:`NkiTensor` which contains the identity tensor\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.shared_identity_matrix -- 128x128 identity matrix\n        identity = nl.shared_identity_matrix(n=128, dtype=nl.float32)\n        expected = nl.load(expected_tensor[0:128, 0:128])\n        assert nl.equal(identity, expected)\n        nl.store(actual_tensor[0:128, 0:128], identity)\"\"\"\n    ...\n\n\ndef sigmoid(x, dtype=None):\n    r\"\"\"Sigmoid activation, element-wise.\"\"\"\n    ...\n\n\ndef sign(x, dtype=None):\n    r\"\"\"Sign of the numbers of the input, element-wise.\n\n    ((Similar to `numpy.sign <https://numpy.org/doc/stable/reference/generated/numpy.sign.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    The sign function returns ``-1`` if ``x < 0``, ``0`` if ``x==0``, ``1`` if ``x > 0``.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has sign values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.sign -- sign(-5.0) = -1.0\n        a = nl.full((128, 512), -5.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.sign(a)\n        expected = nl.full((128, 512), -1.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef silu(x, dtype=None):\n    r\"\"\"SiLU (Swish) activation, element-wise.\"\"\"\n    ...\n\n\ndef silu_dx(x, dtype=None):\n    r\"\"\"Derivative of SiLU activation, element-wise.\"\"\"\n    ...\n\n\ndef sin(x, dtype=None):\n    r\"\"\"Sine of the input, element-wise.\n\n    ((Similar to `numpy.sin <https://numpy.org/doc/stable/reference/generated/numpy.sin.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has sine values of ``x``.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.sin -- sin(0.0) = 0.0\n        a = nl.full((128, 512), 0.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.sin(a)\n        expected = nl.full((128, 512), 0.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef softmax(x, axis=-1, dtype=None):\n    r\"\"\"Softmax activation function on the input, element-wise.\n\n    ((Similar to `torch.nn.functional.softmax <https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate; must be free dimensions, not partition dimension (0); can only be the last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4]\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has softmax of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.softmax -- uniform input produces uniform output\n        a = nl.full((128, 512), 1.0, dtype=nl.float32, buffer=nl.sbuf)\n        result = nl.softmax(a, axis=1)\"\"\"\n    ...\n\n\ndef softplus(x, dtype=None):\n    r\"\"\"Softplus activation, element-wise.\"\"\"\n    ...\n\n\ndef sqrt(x, dtype=None):\n    r\"\"\"Non-negative square-root of the input, element-wise.\n\n    ((Similar to `numpy.sqrt <https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has square-root values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.sqrt -- sqrt(4.0) = 2.0\n        a = nl.full((128, 512), 4.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.sqrt(a)\n        expected = nl.full((128, 512), 2.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef square(x, dtype=None):\n    r\"\"\"Square of the input, element-wise.\n\n    ((Similar to `numpy.square <https://numpy.org/doc/stable/reference/generated/numpy.square.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has square of ``x``.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.square -- square(3.0) = 9.0\n        a = nl.full((128, 512), 3.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.square(a)\n        expected = nl.full((128, 512), 9.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef static_range(start, stop=None, step=1):\n    r\"\"\"Create a sequence for fully unrolled loop iteration.\n\n    Create a sequence of numbers for use as loop iterators in NKI, resulting in\n    a fully unrolled loop. Prefer this method over :doc:`affine_range <nki.language.affine_range>`\n    and :doc:`sequential_range <nki.language.sequential_range>`\n\n    :param start: start value (or stop if ``stop`` is None).\n    :param stop: stop value (exclusive).\n    :param step: step size.\n    :return: an iterator yielding integer values from start to stop.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.static_range\n        for i in nl.static_range(input_tensor.shape[1] // 512):\n            offset = i * 512\n            tile = nl.load(input_tensor[0:128, offset:offset+512])\n            result = nl.multiply(tile, tile)\n            nl.store(out_tensor[0:128, offset:offset+512], result)\"\"\"\n    ...\n\n\ndef store(dst, value):\n    r\"\"\"Store into a tensor on device memory (HBM) from on-chip memory (SBUF).\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param dst: HBM tensor to store the data into.\n    :param value: an SBUF tile that contains the values to store.\"\"\"\n    ...\n\n\ndef subtract(x, y, dtype=None):\n    r\"\"\"Subtract the inputs, element-wise.\n\n    ((Similar to `numpy.subtract <https://numpy.org/doc/stable/reference/generated/numpy.subtract.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile or a scalar value.\n    :param y: a tile or a scalar value.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tiles, or whichever input type has the highest precision (see NKI Type Promotion for more information);\n    :return: a tile that has ``x - y``, element-wise.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.subtract -- element-wise subtraction of two tiles\n        a = nl.full((128, 512), 10.0, dtype=nl.float32, buffer=nl.sbuf)\n        b = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.subtract(a, b)\n        expected = nl.full((128, 512), 7.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.subtract -- subtracting a scalar from every element\n        a = nl.full((128, 512), 10.0, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.subtract(a, 3.0)\n        expected = nl.full((128, 512), 7.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\ndef sum(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Sum of elements along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.sum <https://numpy.org/doc/stable/reference/generated/numpy.sum.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: if True, the reduced axes are kept as size-one dimensions.\n    :return: a tile with the sum along the provided axis.\"\"\"\n    ...\n\n\ndef tan(x, dtype=None):\n    r\"\"\"Tangent of the input, element-wise.\n\n    ((Similar to `numpy.tan <https://numpy.org/doc/stable/reference/generated/numpy.tan.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has tangent values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.tan -- tan(0.0) = 0.0\n        a = nl.full((128, 512), 0.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        b = nl.tan(a)\n        expected = nl.full((128, 512), 0.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(b, expected)\"\"\"\n    ...\n\n\ndef tanh(x, dtype=None):\n    r\"\"\"Hyperbolic tangent, element-wise.\"\"\"\n    ...\n\n\ntfloat32 = 'tfloat32'\n\"\"\"32-bit floating-point number (1S,8E,10M)\"\"\"\n\n\ndef transpose(x, dtype=None):\n    r\"\"\"Transposes a 2D tile between its partition and free dimension.\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: 2D input tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has the values of the input tile with its partition and free\n        dimensions swapped.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n        import nki.isa as nisa\n\n        # nki.language.transpose -- transpose of identity is identity\n        x = nl.shared_identity_matrix(n=128, dtype=nl.float32)\n        result_psum = nl.transpose(x)\n        result = nl.ndarray((128, 128), dtype=nl.float32, buffer=nl.sbuf)\n        # copy the PSUM result into the SBUF tile\n        nisa.tensor_copy(result, result_psum)\n        assert nl.equal(result, x)\"\"\"\n    ...\n\n\ndef trunc(x, dtype=None):\n    r\"\"\"Truncated value of the input, element-wise.\n\n    ((Similar to `numpy.trunc <https://numpy.org/doc/stable/reference/generated/numpy.trunc.html>`_))\n\n    .. 
warning::\n\n       This API is experimental and may change in future releases.\n\n    The truncated value of the scalar x is the nearest integer i which is closer to zero than x is.\n    In short, the fractional part of the signed number x is discarded.\n\n    :param x: a tile.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: a tile that has truncated values of ``x``.\n\n    Examples:\n\n    .. code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.trunc -- truncates 3.7 toward zero to 3.0\n        a = nl.full((128, 512), 3.7, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.trunc(a)\n        expected = nl.full((128, 512), 3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\n\n        # nki.language.trunc -- truncates -3.7 toward zero to -3.0\n        a = nl.full((128, 512), -3.7, dtype=nl.float32, buffer=nl.sbuf)\n        c = nl.trunc(a)\n        expected = nl.full((128, 512), -3.0, dtype=nl.float32, buffer=nl.sbuf)\n        assert nl.equal(c, expected)\"\"\"\n    ...\n\n\nuint16 = 'uint16'\n\"\"\"16-bit unsigned integer number\"\"\"\n\n\nuint32 = 'uint32'\n\"\"\"32-bit unsigned integer number\"\"\"\n\n\nuint8 = 'uint8'\n\"\"\"8-bit unsigned integer number\"\"\"\n\n\ndef var(x, axis, dtype=None, keepdims=False):\n    r\"\"\"Variance along the specified axis (or axes) of the input.\n\n    ((Similar to `numpy.var <https://numpy.org/doc/stable/reference/generated/numpy.var.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: a tile.\n    :param axis: int or tuple/list of ints. The axis (or axes) along which to operate;\n        must be free dimensions, not partition dimension (0); can only be the\n        last contiguous dim(s) of the tile: [1], [1,2], [1,2,3], [1,2,3,4].\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :param keepdims: currently ignored; result always has keepdims=True shape.\n    :return: a tile with the variance of the elements along the provided axis.\"\"\"\n    ...\n\n\ndef where(condition, x, y, dtype=None):\n    r\"\"\"Return elements chosen from x or y depending on condition.\n\n    ((Similar to `numpy.where <https://numpy.org/doc/stable/reference/generated/numpy.where.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param condition: condition tile with float values (1.0 for True, 0.0 for False).\n    :param x: tensor from which to take elements where condition is True.\n    :param y: tensor from which to take elements where condition is False.\n    :param dtype: (optional) data type to cast the output type to (see Supported Data Types for more information); if not specified, it will default to be the same as the data type of the input tile.\n    :return: tensor with elements from x or y based on condition.\n\n    Examples:\n\n    .. 
code-block:: python\n\n        import nki.language as nl\n\n        # nki.language.where -- select 10.0 where condition is 1, else 0.0\n        cond = nl.full((128, 512), 1.0, dtype=nl.float32,\n                       buffer=nl.sbuf)\n        x = nl.full((128, 512), 10.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        y = nl.full((128, 512), 0.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        result = nl.where(cond, x, y)\n        expected = nl.full((128, 512), 10.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(result, expected)\n\n        # nki.language.where -- select 5.0 where condition is 0\n        cond = nl.full((128, 512), 0.0, dtype=nl.float32,\n                       buffer=nl.sbuf)\n        x = nl.full((128, 512), 10.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        y = nl.full((128, 512), 5.0, dtype=nl.float32,\n                    buffer=nl.sbuf)\n        result = nl.where(cond, x, y)\n        expected = nl.full((128, 512), 5.0, dtype=nl.float32,\n                           buffer=nl.sbuf)\n        assert nl.equal(result, expected)\"\"\"\n    ...\n\n\ndef zeros(shape, dtype, buffer=MemoryRegion.sbuf, name=''):\n    r\"\"\"Create a new tensor of given shape and dtype on the specified buffer, filled with zeros.\n\n    ((Similar to `numpy.zeros <https://numpy.org/doc/stable/reference/generated/numpy.zeros.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param shape: the shape of the tensor.\n    :param dtype: the data type of the tensor.\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` allocated on the buffer.\"\"\"\n    ...\n\n\ndef zeros_like(x, dtype=None, buffer=None, name=''):\n    r\"\"\"Create a new tensor of zeros with the same shape and type as a given tensor.\n\n    ((Similar to `numpy.zeros_like <https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html>`_))\n\n    .. warning::\n\n       This API is experimental and may change in future releases.\n\n    :param x: the tensor.\n    :param dtype: the data type of the tensor.\n    :param buffer: the specific buffer (ie, sbuf, psum, hbm), defaults to sbuf.\n    :param name: the name of the tensor, used in :ref:`scheduling <how-to-scheduling-apis>`.\n    :return: a new :class:`NkiTensor` of zeros with the same shape as ``x``.\"\"\"\n    ...\n\n"
  },
  {
    "path": "nki/api/nki.api.shared.rst",
    "content": "=======================\nNKI API Common Fields\n=======================\n\n.. _nki-dtype:\n\nSupported Data Types\n========================\n\n:ref:`tbl-dtype` below lists all supported data types by NKI.\nAlmost all of the NKI APIs accept a data type field, `dtype`,\nwhich must be a `nki.language` data type.\n\n.. _tbl-dtype:\n\n.. table:: Supported Data Types by NKI\n\n  +------------------------+------------------------------+-------------------------------------------------+\n  |                        | Data Type                    | Accepted ``dtype`` Field by NKI APIs            |\n  +========================+==============================+=================================================+\n  |                        | 8-bit unsigned integer       | ``nki.language.uint8``                          |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | 8-bit signed integer         | ``nki.language.int8``                           |\n  |                        +------------------------------+-------------------------------------------------+\n  | Integer                | 16-bit unsigned integer      | ``nki.language.uint16``                         |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | 16-bit signed integer        | ``nki.language.int16``                          |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | 32-bit unsigned integer      | ``nki.language.uint32``                         |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | 32-bit signed integer        | ``nki.language.int32``                          |\n  +------------------------+------------------------------+-------------------------------------------------+\n  |                        | float8_e4m3 (1S,4E,3M) [#1]_ | ``nki.language.float8_e4m3``                    |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | float8_e5m2 (1S,5E,2M)       | ``nki.language.float8_e5m2``                    |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | float16 (1S,5E,10M)          | ``nki.language.float16``                        |\n  |                        +------------------------------+-------------------------------------------------+\n  | Float                  | bfloat16 (1S,8E,7M)          | ``nki.language.bfloat16``                       |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | tfloat32 (1S,8E,10M)         | ``nki.language.tfloat32``                       |\n  |                        +------------------------------+-------------------------------------------------+\n  |                        | float32 (1S,8E,23M)          | ``nki.language.float32``                        |\n  +------------------------+------------------------------+-------------------------------------------------+\n  | Boolean                | boolean stored as uint8      | ``nki.language.bool_``                          |\n  
+------------------------+------------------------------+-------------------------------------------------+\n\n.. _nki-aluop:\n\nSupported Math Operators for NKI ISA\n====================================\n\n:ref:`tbl-aluop` below lists all the mathematical operator primitives supported by NKI.\nMany :ref:`nki.isa <nki-isa>` APIs (instructions) allow programmable operators through the ``op`` field.\nThe supported operators fall into two categories: *bitvec* and *arithmetic*. In general, instructions\nusing *bitvec* operators expect integer data types and treat input elements as bit patterns. On the other\nhand, instructions using *arithmetic* operators accept any valid NKI data type and convert input elements\ninto float32 before performing the operators.\n\n.. _tbl-aluop:\n.. table:: Supported Math Operators by NKI ISA\n\n  +------------------------+----------------------------+---------------------------------------------+------------------------+\n  |                        | Operator                   | ``op``                                      | Legal Reduction ``op`` |\n  +========================+============================+=============================================+========================+\n  |                        | Bitwise Not                | ``nki.language.invert``                     | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Bitwise And                | ``nki.language.bitwise_and``                | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Bitwise Or                 | ``nki.language.bitwise_or``                 | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  | Bitvec                 | Bitwise Xor                | ``nki.language.bitwise_xor``                | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Arithmetic Shift Left      | ``nki.language.left_shift``                 | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Arithmetic Shift Right     |  Not supported                              | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Logical Shift Left         | ``nki.language.left_shift``                 | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Logical Shift Right        | ``nki.language.right_shift``                | N                      |\n  +------------------------+----------------------------+---------------------------------------------+------------------------+\n  |                        | Add                        | ``nki.language.add``                        | Y                      |\n  |                        
+----------------------------+---------------------------------------------+------------------------+\n  |                        | Subtract                   | ``nki.language.subtract``                   | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Multiply                   | ``nki.language.multiply``                   | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Max                        | ``nki.language.maximum``                    | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Min                        | ``nki.language.minimum``                    | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Is Equal to                | ``nki.language.equal``                      | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Is Not Equal to            | ``nki.language.not_equal``                  | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  | Arithmetic             | Is Greater than or Equal to| ``nki.language.greater_equal``              | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Is Greater than to         | ``nki.language.greater``                    | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Is Less than or Equal to   | ``nki.language.less_equal``                 | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Is Less than               | ``nki.language.less``                       | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Logical And                | ``nki.language.logical_and``                | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Logical Or                 | ``nki.language.logical_or``                 | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Logical Xor                | ``nki.language.logical_xor``                | Y                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Reverse Square Root        | 
``nki.language.rsqrt``                      | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Reciprocal                 | ``nki.language.reciprocal``                 | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Absolute                   | ``nki.language.abs``                        | N                      |\n  |                        +----------------------------+---------------------------------------------+------------------------+\n  |                        | Power                      | ``nki.language.power``                      | N                      |\n  +------------------------+----------------------------+---------------------------------------------+------------------------+\n\n.. _nki-act-func:\n\nSupported Activation Functions for NKI ISA\n==========================================\n:ref:`tbl-act-func` below lists all the activation function supported by the ``nki.isa.activation`` API. These\nactivation functions are approximated with piece-wise polynomials on Scalar Engine.\n*NOTE*: if input values fall outside the supported **Valid Input Range** listed below,\nthe Scalar Engine will generate invalid output results.\n\n.. _tbl-act-func:\n.. table:: Supported Activation Functions by NKI ISA\n   :widths: 25 25 25\n\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Function Name                              | Accepted ``op`` by Scalar Engine                    | Valid Input Range   |\n   +============================================+=====================================================+=====================+\n   | Identity                                   | ``nki.language.copy``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Square                                     | ``nki.language.square``                             | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Sigmoid                                    | ``nki.language.sigmoid``                            | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Relu                                       | ``nki.language.relu``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Gelu                                       | ``nki.language.gelu``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Gelu Derivative                            | ``nki.language.gelu_dx``                            | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Gelu with Tanh Approximation               | ``nki.language.gelu_apprx_tanh``                    | ``[-inf, 
inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Gelu with Sigmoid Approximation            | ``nki.language.gelu_apprx_sigmoid``                 | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Gelu with Sigmoid Approximation Derivative | ``nki.language.gelu_apprx_sigmoid_dx``              | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Silu                                       | ``nki.language.silu``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Silu Derivative                            | ``nki.language.silu_dx``                            | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Tanh                                       | ``nki.language.tanh``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Softplus                                   | ``nki.language.softplus``                           | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Mish                                       | ``nki.language.mish``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Erf                                        | ``nki.language.erf``                                | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Erf Derivative                             | ``nki.language.erf_dx``                             | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Exponential                                | ``nki.language.exp``                                | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Natural Log                                | ``nki.language.log``                                | ``[2^-64, 2^64]``   |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Sine                                       | ``nki.language.sin``                                | ``[-PI, PI]``       |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Arctan                                     | ``nki.language.arctan``                             | ``[-PI/2, PI/2]``   |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Square Root                                | ``nki.language.sqrt``                               | 
``[2^-116, 2^118]`` |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Reverse Square Root                        | ``nki.language.rsqrt``                              | ``[2^-87, 2^97]``   |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Reciprocal                                 | ``nki.language.reciprocal``                         | ``±[2^-42, 2^42]``  |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Sign                                       | ``nki.language.sign``                               | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n   | Absolute                                   | ``nki.language.abs``                                | ``[-inf, inf]``     |\n   +--------------------------------------------+-----------------------------------------------------+---------------------+\n\n.. _nki-engine-sel:\n\nNKI Engine Selection for Operators Supported on Multiple Engines\n================================================================\nThere is a tradeoff between precision and speed on different engines for operators with multiple engine options. Users can select which engine to map to based on\ntheir needs. We take reciprocal and reverse square root as two examples and explain the tradeoff below.\n\n1. Reciprocal can run on Scalar Engine or Vector Engine:\n\n  Reciprocal can run on Vector Engine with ``nki.isa.reciprocal`` or on Scalar Engine with ``nki.isa.activation(nl.reciprocal)``. Vector Engine performs reciprocal\n  at a higher precision compared to Scalar Engine; however, the computation throughput of reciprocal on Vector Engine is about 8x lower than Scalar Engine for large\n  input tiles. For input tiles with a small number of elements per partition (less than 64, processed one per cycle), instruction initiation interval (roughly 64\n  cycles) dominates performance so Scalar Engine and Vector Engine have comparable performance. In this case, we suggest using Vector Engine to achieve better precision.\n\n  **Estimated cycles on different engines:**\n\n  .. list-table::\n    :widths: 40 60\n    :header-rows: 1\n\n    * - Cost `(Engine Cycles)`\n      - Condition\n    * - ``max(MIN_II, N)``\n      - mapped to Scalar Engine ``nki.isa.scalar_engine``\n    * - ``max(MIN_II, 8*N)``\n      - mapped to Vector Engine ``nki.isa.vector_engine``\n\n  where,\n\n  - ``N`` is the number of elements per partition in the input tile.\n  - ``MIN_II`` is the minimum instruction initiation interval for small input tiles.\n    ``MIN_II`` is roughly 64 engine cycles.\n\n  **Note** ``nki.isa.activation(op=nl.reciprocal)`` doesn't support setting bias on NeuronCore-v2.\n\n2. Reverse square root can run on GpSIMD Engine or Scalar Engine:\n\n  Reverse square root can run on GpSIMD Engine with ``nki.isa.tensor_scalar(op0=nl.rsqrt, operand0=0.0)`` or on Scalar Engine with ``nki.isa.activation(nl.rsqrt)``.\n  GpSIMD Engine performs reverse square root at a higher precision compared to Scalar Engine; however, the computation throughput of reverse square root on GpSIMD\n  Engine is 4x lower than Scalar Engine.\n\n\n.. rubric:: Footnotes\n\n.. [#1] S: sign bits, E: exponent bits, M: mantissa bits\n"
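To make the engine-selection tradeoff above concrete, the following is a minimal sketch of mapping the same math onto different engines. The kernel structure, tile shapes, and keyword spellings (``data=...``) are illustrative assumptions; only the engine/API pairings follow the discussion above.

.. code-block:: python

   import nki
   import nki.isa as nisa
   import nki.language as nl

   @nki.jit
   def engine_selection_sketch(in_tensor):
       # Illustrative output buffer and input tile.
       out = nl.ndarray(in_tensor.shape, dtype=nl.float32, buffer=nl.shared_hbm)
       tile = nl.load(in_tensor)

       # Reciprocal on Vector Engine: higher precision, but roughly 8x lower
       # throughput than Scalar Engine for large input tiles.
       recip_vector = nisa.reciprocal(tile)

       # Reciprocal on Scalar Engine: higher throughput for large tiles.
       recip_scalar = nisa.activation(op=nl.reciprocal, data=tile)

       # Reverse square root on GpSIMD Engine (higher precision) ...
       rsqrt_gpsimd = nisa.tensor_scalar(data=tile, op0=nl.rsqrt, operand0=0.0)

       # ... or on Scalar Engine (higher throughput).
       rsqrt_scalar = nisa.activation(op=nl.rsqrt, data=tile)

       nl.store(out, recip_vector)
       return out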
  },
  {
    "path": "nki/api/nki.collectives.rst",
    "content": "nki.collectives\n===============\n\n.. currentmodule:: nki.collectives\n\nThe ``nki.collectives`` module provides APIs for multi-core collective communication\noperations such as all-reduce and all-gather across NeuronCores.\n\n.. _nki-collectives:\n\nNKI Collectives\n---------------\n\nCollective operations for multi-rank communication.\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   all_reduce\n   all_gather\n   reduce_scatter\n   all_to_all\n   all_to_all_v\n   collective_permute\n   collective_permute_implicit\n   collective_permute_implicit_reduce\n   collective_permute_implicit_current_processing_rank_id\n   rank_id\n\n\nConstants\n--------------\n\n.. autosummary::\n   :toctree: generated\n   :template: nki-custom-class-template.rst\n   :nosignatures:\n\n   ReplicaGroup\n"
  },
  {
    "path": "nki/api/nki.isa.rst",
    "content": "nki.isa\n========\n\n.. currentmodule:: nki.isa\n\nThe ``nki.isa`` module exposes low-level ISA instructions for compute, data movement, and synchronization.\nThese APIs map to individual Tensor Engine, Vector Engine, Scalar Engine, and DMA Engine operations,\ngiving you fine-grained control over the underlying hardware capabilities.\n\n.. _nki-isa:\n\nNKI ISA\n--------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   nc_matmul\n   nc_matmul_mx\n   nc_transpose\n   activation\n   activation_reduce\n   tensor_reduce\n   tensor_partition_reduce\n   tensor_tensor\n   tensor_tensor_scan\n   scalar_tensor_tensor\n   tensor_scalar\n   tensor_scalar_reduce\n   tensor_scalar_cumulative\n   tensor_copy\n   tensor_copy_predicated\n   exponential\n   reciprocal\n   quantize_mx\n   iota\n   dropout\n   affine_select\n   range_select\n   select_reduce\n   sequence_bounds\n   memset\n   bn_stats\n   bn_aggr\n   local_gather\n   nc_n_gather\n   dma_copy\n   dma_transpose\n   dma_compute\n   max8\n   nc_find_index8\n   nc_match_replace8\n   nc_stream_shuffle\n   register_alloc\n   register_load\n   register_move\n   register_store\n   core_barrier\n   sendrecv\n   rng\n   rand2\n   rand_set_state\n   rand_get_state\n   set_rng_seed\n   nonzero_with_count\n\n\n\nNKI ISA Config Enums\n--------------------\n.. autosummary::\n   :toctree: generated\n   :template: nki-custom-class-attr-only-template.rst\n   :nosignatures:\n\n   engine\n   dma_engine\n   reduce_cmd\n   dge_mode\n   oob_mode\n   matmul_perf_mode\n\n\nTarget\n-------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   nc_version\n   get_nc_version\n\n\nConstants\n---------\n\n.. autosummary::\n   :toctree: generated\n   :template: nki-custom-class-template.rst\n   :nosignatures:\n\n   VirtualRegister\n"
  },
  {
    "path": "nki/api/nki.language.rst",
    "content": ".. _nki-language:\n\nnki.language\n====================\n\n.. currentmodule:: nki.language\n\nThe ``nki.language`` module provides high-level constructs for writing NKI kernels.\nIt includes tensor creation, indexing, type casting, math operations, and loop constructs\nthat the NKI compiler translates into efficient hardware instructions.\n\n.. _nl_creation:\n\nCreation operations\n--------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   ndarray\n   zeros\n   ones\n   full\n   zeros_like\n   empty_like\n   shared_identity_matrix\n   rand\n   random_seed\n\n.. _nl_tensor_ops:\n\nTensor operations\n------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   load\n   load_transpose2d\n   store\n   copy\n   matmul\n   transpose\n\n.. _nl_math:\n\nMath operations\n----------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   abs\n   add\n   arctan\n   ceil\n   cos\n\n   .. divide : not supported\n\n   exp\n   floor\n   log\n   maximum\n   minimum\n   multiply\n   negative\n   power\n   reciprocal\n   rsqrt\n   sign\n   sin\n   sqrt\n   square\n   subtract\n   tan\n   tanh\n   trunc\n\n.. _nl_activation_and_backpropagation:\n\nActivation and Backpropagation functions\n-----------------------------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   relu\n   sigmoid\n   silu\n   silu_dx\n   gelu\n   gelu_dx\n   gelu_apprx_sigmoid\n   gelu_apprx_sigmoid_dx\n   gelu_apprx_tanh\n   mish\n   softplus\n   softmax\n   erf\n   erf_dx\n\n\n.. _nl_normalization_and_regularization:\n\nNormalization and Regularization functions\n------------------------------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   dropout\n   rms_norm\n\n\n.. _nl_reduction:\n\nReduction operations\n---------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   all\n   max\n   mean\n   min\n   prod\n   sum\n   var\n\n.. _nl_comparison:\n\nComparison operations\n----------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   equal\n   not_equal\n   less\n   less_equal\n   greater\n   greater_equal\n\n.. _nl_logical:\n\nLogical operations\n-------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   logical_and\n   logical_or\n   logical_xor\n   logical_not\n\n.. _nl_bitwise:\n\nBitwise operations\n-------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   bitwise_and\n   bitwise_or\n   bitwise_xor\n   invert\n   left_shift\n   right_shift\n\n.. _nl_tensor_manipulation_operations:\n\nTensor manipulation operations\n-------------------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   broadcast_to\n   ds\n   expand_dims\n\n\n.. _nl_indexing:\n\nIndexing operations\n------------------------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   where\n   gather_flattened\n\n\n.. _nl_iterators:\n\nIterators\n----------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   affine_range\n   dynamic_range\n   sequential_range\n   static_range\n\n\n.. _nl_memory_hierarchy:\n\nMemory Hierarchy\n-----------------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   psum\n   sbuf\n   hbm\n   private_hbm\n   shared_hbm\n   is_psum\n   is_sbuf\n   is_hbm\n   is_on_chip\n\n.. _nl_others:\n\nOthers\n-------\n\n.. 
autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   device_print\n   no_reorder\n   program_id\n   num_programs\n   program_ndim\n\n.. _nl_datatypes:\n\nData Types\n-----------\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   bool_\n   int8\n   int16\n   int32\n   uint8\n   uint16\n   uint32\n   float16\n   float32\n   bfloat16\n   tfloat32\n   float8_e4m3\n   float8_e5m2\n   float8_e4m3fn\n   float8_e5m2_x4\n   float8_e4m3fn_x4\n   float4_e2m1fn_x4\n\n\n.. _nl_constants:\n\nConstants\n----------\n\n.. list-table::\n\n   * - :doc:`tile_size <nki.language.tile_size>`\n     - Hardware tile size constants (pmax, psum_fmax, gemm_stationary_fmax, etc.)\n\n.. toctree::\n   :hidden:\n\n   nki.language.tile_size\n"
  },
  {
    "path": "nki/api/nki.language.tile_size.rst",
    "content": "nki.language.tile\\_size\n=======================\n\n.. currentmodule:: nki.language\n\n.. autoclass:: tile_size\n\n   .. rubric:: Attributes\n\n   .. autosummary::\n\n      ~tile_size.pmax\n      ~tile_size.psum_fmax\n      ~tile_size.gemm_stationary_fmax\n      ~tile_size.gemm_moving_fmax\n      ~tile_size.bn_stats_fmax\n      ~tile_size.psum_min_align\n      ~tile_size.sbuf_min_align\n      ~tile_size.total_available_sbuf_size\n"
  },
  {
    "path": "nki/api/nki.rst",
    "content": ".. _nki-reference:\n\nnki\n======\n\n.. currentmodule:: nki\n\nThe ``nki`` module provides the top-level entry points for compiling and running NKI kernels.\nUse the :func:`jit` decorator to compile a kernel for NeuronDevices, or :func:`simulate` to\nrun a kernel in the CPU simulator for debugging.\n\n.. _nki_decorators:\n\n\n.. autosummary::\n   :toctree: generated\n   :nosignatures:\n\n   jit\n   simulate\n"
  },
  {
    "path": "nki/api/nki.simulate.rst",
    "content": ".. meta::\n    :description: Documentation for the nki.simulate API in the Neuron SDK\n    :keywords: nki, simulate, nki.simulate, test, kernels, aws neuron sdk\n    :date-modified: 04/02/2026\n\n.. _nki-simulate:\n\nnki.simulate\n============\n\n.. note::\n\n   This API is experimental and may change in future releases.\n\n``nki.simulate`` runs NKI kernels on your CPU using Python (and NumPy), with no Trainium hardware required.\nIt executes kernel code as regular Python, making it ideal for fast development, debugging, and correctness testing.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\nOverview\n--------\n\n``nki.simulate`` is a CPU-based functional simulator for NKI kernels. It executes every ``nki.isa``\nand ``nki.language`` operation using Python and NumPy, producing results that approximate hardware behavior.\nYou write your kernel once and can run it on both the simulator and real Trainium devices. Some kernels\nmay require adjustments when moving to hardware — see :ref:`Simulation Limitations <simulation-limitations-api>` for details.\n\n**Why use the simulator?**\n\n- **No hardware required** — develop and test NKI kernels on any machine with Python.\n- **Cost savings** — avoid the cost of developing on Trainium instances; iterate locally, then deploy to hardware when ready.\n- **Same kernel code** — the same ``@nki.jit`` kernel can run on both hardware and the simulator. See :ref:`Simulation Limitations <simulation-limitations-api>` for cases where adjustments may be needed.\n- **Full debugging support** — use ``breakpoint()``, PDB, or IDE debuggers to step through kernel execution and inspect tensor values.\n- **Fast iteration** — test kernels instantly without compilation or deployment.\n- **Hardware constraint validation** — catches invalid shapes, buffer misuse, dtype errors, and other constraint violations at runtime with clear error messages.\n- **AI-assisted development** — ideal for GenAI coding agents authoring NKI kernels, thanks to instant local feedback and detailed error messages that enable rapid autonomous iteration.\n\nQuick Start\n-----------\n\n.. nki_example:: /nki/examples/simulate/nki_simulate_example.py\n   :language: python\n   :marker: NKI_EXAMPLE_SIMULATE\n\n.. nki_example:: /nki/examples/simulate/nki_simulate_example.py\n   :language: python\n   :marker: NKI_EXAMPLE_SIMULATE_RUN\n\n\nUsage\n-----\n\nRunning the Simulator\n^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator accepts **NumPy arrays** as inputs. If your script uses PyTorch or JAX tensors,\nconvert them to NumPy arrays before passing them to simulated kernels (for example, ``tensor.numpy()``).\n\n**nki.simulate() API**\n\nUse the explicit API to run a kernel on the simulator. This is also useful when you want\nto run a kernel on *both* the simulator and hardware in the same script — for example,\nto compare results:\n\n.. code-block:: python\n\n   # Run on simulator\n   sim_result = nki.simulate(my_kernel)(a_np, b_np)\n\n   # Run on hardware (requires Trainium and neuronx-cc)\n   hw_result = my_kernel(a_torch, b_torch)\n\n   # Compare\n   np.testing.assert_allclose(sim_result, hw_result.numpy(), rtol=1e-2)\n\n\nTarget Platform\n^^^^^^^^^^^^^^^\n\nThe simulator models different NeuronCore generations. Set the target using the\n``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable:\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 40 60\n\n   * - Environment variable value\n     - Hardware\n   * - ``trn1`` or ``gen2``\n     - Trn1 (NeuronCore-v2)\n   * - ``trn2`` or ``gen3``\n     - Trn2 (NeuronCore-v3)\n   * - ``trn3`` or ``gen4``\n     - Trn3 (NeuronCore-v4)\n   * - *(unset)*\n     - Auto-detect (uses the Neuron chip detected on the running machine, otherwise defaults to ``trn3``)\n\nPrecise Floating-Point Mode\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBy default, the simulator stores ``bfloat16``, ``float8_e4m3``, and ``float8_e5m2`` tensors as ``float32``\nfor faster simulation performance and to let you examine kernel correctness in high-precision floating-point.\nTo get numerical behavior similar to hardware, enable precise mode with ``NKI_PRECISE_FP=1``:\n\n.. code-block:: bash\n\n   NKI_PRECISE_FP=1 python my_script.py\n\nWhen enabled, low-precision dtypes are stored using ``ml_dtypes`` (real ``bfloat16``, ``float8``, etc.)\ninstead of ``float32``. This is recommended for most use cases.\n\nDebugging\n^^^^^^^^^\n\nBecause the simulator runs kernels as regular Python, you have full access to Python's\ndebugging ecosystem.\n\n**Using breakpoint():**\n\n.. code-block:: python\n\n   @nki.jit\n   def my_kernel(a_ptr):\n       tile = nl.load(a_ptr)\n       breakpoint()  # Debugger stops here — inspect `tile`\n       result = nl.add(tile, tile)\n       return nl.store(result)\n\n   nki.simulate(my_kernel)(data)\n\n**Using device_print:**\n\n``nl.device_print`` works in the simulator and prints tensor values to stdout:\n\n.. code-block:: python\n\n   @nki.jit\n   def my_kernel(a_ptr):\n       tile = nl.load(a_ptr)\n       nl.device_print(\"my tile\", tile)\n       ...\n\n**Using Python print:**\n\nSince the simulator executes kernels as standard Python, you can use ``print()`` to inspect any\nintermediate tensor or register value during execution. This is especially useful for both interactive\ndebugging and AI-assisted development workflows where agents iterate on kernels locally.\n\n**IDE Debugging (VSCode / PyCharm):**\n\nSet breakpoints in your kernel code and run your script normally. The simulator executes\nkernel code in-process, so IDE debuggers work without any special configuration.\n\n\nHow It Works\n------------\n\nExecution\n^^^^^^^^^\n\nWhen you call ``nki.simulate(kernel)(a, b)``:\n\n1. Each NumPy array argument is wrapped into an ``NkiTensor`` with ``buffer=nl.hbm``\n   (or ``shared_hbm`` for LNC2). Non-array arguments pass through unchanged.\n2. The simulator backend is activated, routing all ``nki.isa`` and ``nki.language``\n   operations to NumPy-based implementations.\n3. The kernel function runs as regular Python — each NKI API call executes eagerly\n   and sequentially. There is no instruction scheduling or engine parallelism.\n4. On return, ``NkiTensor`` results are converted back to NumPy arrays. Input arrays are\n   updated in-place if the kernel modified the corresponding HBM tensors.\n\nFor **LNC2 kernels** (``kernel[2]``), the simulator spawns two Python threads that execute the\nkernel concurrently, each with its own ``program_id``. Input arrays use ``shared_hbm`` buffers,\nso both threads can access shared memory. ``nki.isa.sendrecv`` and ``nki.isa.core_barrier``\nuse thread-safe synchronization primitives.\n\nUninitialized Memory Detection\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator automatically fills all newly allocated tensors with **sentinel values** — ``NaN`` for\nfloating-point types and ``4`` for integer types. 
This makes it easy to detect bugs where a kernel\nreads from memory that was never written to.\n\nBecause ``NaN`` propagates through arithmetic (any operation involving ``NaN`` produces ``NaN``), if\nyour kernel accidentally computes on uninitialized memory, the resulting output will contain ``NaN``\nvalues. You can check for this in your test:\n\n.. code-block:: python\n\n   result = nki.simulate(my_kernel)(inputs)\n   assert not np.any(np.isnan(result)), \"Kernel computed on uninitialized memory!\"\n\n**Why this matters:**\n\nOn real hardware, uninitialized memory contains arbitrary leftover values from previous operations.\nA kernel that reads uninitialized data may appear to produce correct results on hardware by coincidence —\nmaking these bugs extremely difficult to track down. The simulator's sentinel values turn these silent\ncorrectness hazards into immediately visible ``NaN`` values in the output.\n\n.. tip::\n\n   If you see unexpected ``NaN`` values in your simulation output, check that all tensors are properly\n   initialized before use. Common causes include:\n\n   - Allocating a tensor with ``nl.ndarray`` but not writing to all elements before reading\n   - Off-by-one errors in tile loop bounds that leave some elements unwritten\n   - Conditional writes that skip certain partitions or indices\n\n\nHardware Constraint Validation\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEach ``nki.isa`` operation validates hardware constraints at runtime — shape limits, dtype\ncompatibility, buffer types, engine restrictions, and architecture version requirements.\nInvalid operations raise clear Python exceptions with descriptive error messages.\n\n.. note::\n\n   Hardware constraint validation is actively being developed. Some constraints may not yet\n   be checked by the simulator. If your kernel passes simulation but fails on hardware,\n   report it to the Neuron team as an issue.\n\n\n**Example:**\n\n.. code-block:: python\n\n   @nki.jit\n   def bad_kernel(a_ptr):\n       tile = nl.ndarray((256, 512), dtype=nl.float32, buffer=nl.sbuf)  # exceeds 128\n       ...\n\n   nki.simulate(bad_kernel)(data)\n   # AssertionError: tensor_tensor data1 partition dimension 256 exceeds maximum 128\n\n\n\n.. _simulation-limitations-api:\n\nSimulation Limitations\n----------------------\n\nThe simulator approximates hardware behavior but is not identical. Understanding these\nlimitations helps you write kernels that work on both the simulator and real Trainium hardware.\n\nNo Compilation\n^^^^^^^^^^^^^^\n\nThe simulator runs kernel code directly as Python — there is no compilation step. For real hardware,\nNKI kernels go through a full compilation pipeline (NKI → NEFF binary). This means\nthe simulator cannot catch compilation errors; a kernel that runs on the simulator may still fail\nto compile for hardware.\n\nNKI Meta-Programming Support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator accepts any valid Python in the kernel body, including arbitrary classes, closures,\nand dynamic control flow. The NKI compiler, however, only supports a restricted subset of Python\nfor meta-programming, see :ref:`NKI Language Guide <nki-language-guide>`. As a result, kernels that execute successfully on the simulator may fail to\ncompile on hardware.\n\nNumerical Precision\n^^^^^^^^^^^^^^^^^^^\n\nBy default, the simulator stores low-precision types (``bfloat16``, ``float8_e4m3``, ``float8_e5m2``)\nas ``float32``, which can mask rounding and precision issues that appear on hardware. 
Enable\n``NKI_PRECISE_FP=1`` (recommended) to use real low-precision storage via ``ml_dtypes`` for\nnumerical behavior similar to hardware. See `Precise Floating-Point Mode`_ for details.\n\nPerformance\n^^^^^^^^^^^\n\nThe simulator runs on the CPU using Python and NumPy. It does not model instruction latency,\nengine parallelism, or hardware scheduling. Since kernels are interpreted rather than compiled\nand optimized for Trainium NeuronCores, the simulator is significantly slower than hardware\nexecution and is not suitable for performance benchmarking.\n\nMemory Model\n^^^^^^^^^^^^\n\nThe simulator allocates each tensor independently without simulating overlapping memory regions\nor validating against SBUF/PSUM capacity limits. Kernels with memory conflicts may run\nsuccessfully on the simulator but fail or produce incorrect results on real hardware, where\nSBUF and PSUM are shared physical memory with capacity constraints.\n\nKnown Gaps\n^^^^^^^^^^\n\n- ``nki.collectives`` APIs are not implemented in the simulator.\n- Some ``nki.isa`` instructions produce incorrect results: ``local_gather``,\n  ``nc_stream_shuffle`` with ``mask=255``, ``nc_matmul_mx``, and ``quantize_mx``.\n"
  },
  {
    "path": "nki/deep-dives/index.rst",
    "content": ".. _nki_deep-dives_home:\n\n.. meta::\n    :description: Documentation home for the AWS Neuron SDK NKI Deep Dives and other advanced materials.\n    :keywords: NKI, AWS Neuron, Deep Dives, Advanced Programming\n    :date-modified: 12/01/2025\n\nNKI Deep Dives\n==============\n\nThis section provides in-depth technical documentation and guides for advanced users of the Neuron Kernel Interface (NKI). These deep dives offer detailed explanations of NKI concepts, programming patterns, and best practices to help you maximize the performance and capabilities of your NKI code on AWS Neuron devices.\n\nOptimizing a NKI Kernel\n-----------------------\n\n.. grid:: 2\n   :margin: 4 1 0 0\n\n   .. grid-item-card:: NKI Performance Optimizations\n      :link: nki_perf_guide\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\nAdvanced NKI Programming\n------------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: MXFP4/8 Matrix Multiplication Guide\n      :link: mxfp-matmul \n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Perform matrix multiplication using MXFP8 data types in NKI kernels, including data layout, quantization, and tiling strategies.\n\n   .. grid-item-card:: NKI Compiler\n      :link: nki_compiler_about\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Learn about the NKI Compiler.\n\n   .. grid-item-card:: NKI Dynamic Loops\n      :link: nki-dynamic-loops\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Use dynamic loops with runtime-determined trip counts via hardware loop instructions.\n\n   .. grid-item-card:: Descriptor Generation Engine (DGE)\n      :link: dge-documentation\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Control how DMA descriptors are generated: pre-computed, software (GpSimd), or hardware DGE.\n\n   .. grid-item-card:: DMA Bandwidth Guide\n      :link: nki-dma-bandwidth-guide\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Guidelines for maximizing DMA bandwidth with large contiguous payloads.\n\n   .. grid-item-card:: NKI Access Patterns\n      :link: nki-aps\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Learn about Access Patterns (AP) to directly specify how the Trainium hardware accesses tensors.\n\n\nAdditional NKI Information\n--------------------------\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Performance Optimizations <nki_perf_guide>\n    MXFP8/4 Matrix Multiplication <mxfp-matmul>\n    NKI Access Patterns <nki-aps>\n    NKI Dynamic Loops <nki-dynamic-loops>\n    Descriptor Generation Engine (DGE) <nki-dge>\n    DMA Bandwidth Guide <nki-dma-bandwidth-guide>\n    nki-compiler\n"
  },
  {
    "path": "nki/deep-dives/mxfp-matmul.rst",
    "content": ".. meta::\n    :description: Guide for implementating MXFP4/8 matrix multiplication using NKI on AWS Neuron hardware.\n    :keywords: MXFP8, MXFP4, Matrix Multiplication, NKI, Neuron\n    :date-modified: 12/19/2025\n\nMXFP Matrix Multiplication with NKI on AWS Neuron\n===================================================\n\nIn this guide, you'll learn how to perform MXFP4/8 matrix multiplication, quantization, and Neuron's recommended best practices for writing MX kernels.\n\n\nBefore You start\n-----------------\n\n* Read the MX-related sections of the :ref:`Trainium 3 Architecture Guide for NKI <trainium3_arch>` and become familiar with basic matrix multiplication concepts on Neuron in the :doc:`Matrix Multiplication tutorial </nki/guides/tutorials/matrix_multiplication>`.\n\n.. note::\n    The code snippets in this guide are taken from the `tutorial code package <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/mxfp-matmul>`_ which demonstrates how to execute all MX kernel examples from Torch. We recommend you browse and run the code as you read the tutorial.\n\nWhat is MXFP4/8 Matrix Multiplication?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nMXFP4/8 matrix multiplication uses microscaling (MX) quantization as defined in the OCP standard. Unlike traditional quantization that uses tensor- or channel-wide scale factors, microscaling calculates quantization scales from small groups of values. Specifically, groups of 32 elements along the matrix multiplication contraction dimension share the same 8-bit MX scale value.\n\nThis approach preserves significantly more information in quantized values by preventing high-magnitude outliers from \"squeezing\" the entire data distribution. The NeuronCore-v4 Tensor Engine performs matrix multiplication of MXFP4 or MXFP8 input matrices and dequantization with MX scales in a single instruction, achieving 4x throughput compared to BF16/FP16 matrix multiplication while outputting results in FP32 or BF16.\n\nLayout and Tile Size Requirements\n----------------------------------\n\nBefore diving into code examples of MX multiplication, it's important to review the layout and tile-size requirements of MX. MX quantized tensors are represented with separate data and scale tensors, each with distinct requirements.\n\nData Tensor\n~~~~~~~~~~~~\n\nCompared to BF16/FP32 matrix multiplication, the performance uplift from Matmul-MX comes from the ability to contract 4x more elements during one matmul operation as each TensorE processing element is able to perform four simultaneous, FP4/FP8, multiply-accumulate computations. This means the maximum effective contraction dimension has increased from 128 → 512. \n\nFirst, let's examine the tile-size constraints for MX so we can allocate the correct space for tensors. MX data is represented in NKI using quad (x4) packed data types (:doc:`float8_e5m2_x4 </nki/api/generated/nki.language.float8_e5m2_x4>`, :doc:`float8_e4m3fn_x4 </nki/api/generated/nki.language.float8_e4m3fn_x4>`, and :doc:`float4_e2m1fn_x4 </nki/api/generated/nki.language.float4_e2m1fn_x4>`, herein referred to collectively as ``MXFP_x4``). The ``float8_*_x4`` types are 32-bits wide and physically contain four ``float8`` elements. The ``float4_*_x4`` type is 16-bits wide and physically contains four ``float4`` elements. As expressed in ``_x4`` elements, the TensorE maximum tile sizes in NKI code continue to be given by the existing hardware constraints, summarized below.\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 20 20 30 30\n\n   * - Matrix Type\n     - Data Type\n     - Implied Physical Size\n     - Max Tile Size in Code\n   * - Stationary\n     - BF16\n     - [128P, 128F]\n     - [128P, 128F]\n   * - Stationary\n     - MXFP_x4\n     - [512P, 128F]\n     - [128P, 128F]\n   * - Moving\n     - BF16\n     - [128P, 512F]\n     - [128P, 512F]\n   * - Moving\n     - MXFP_x4\n     - [512P, 512F]\n     - [128P, 512F]\n\nThis means that we will allocate data tensors, of type ``MXFP_x4``, in our NKI code with the same shapes as we would for BF16/FP32, but it's implied they contain 4x more contraction elements as shown in the subsequent diagrams.\n\nNow let's examine a BF16 tile destined to be quantized into a max-sized moving tile for Matmul-MX (``[128P, 512F] MXFP_x4``). Note that the following concepts are equally applicable to the stationary tile whose max size is ``[128P, 128F]``.\n\nSince a 4x larger contraction dimension is supported we'll start with a BF16 tile of size ``[512, 512]`` as shown below. To help us in the subsequent step we'll also view it as being sectioned into 4 regions of 128 rows (i.e. reshaped as ``[4, 128, 512]``). This view is mathematical (i.e. not residing in any particular memory).\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-1.png\n   :width: 50%\n   :align: center\n\nAs explained in the :doc:`Trainium 3 Architecture Guide for NKI </nki/guides/architecture/trainium3_arch>` we must take 4 elements originating 128 apart on the contraction axis and pack them together on the SBUF free-dimension as shown below. We'll call this transformation \"interleaving\".\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-2.png\n   :align: center\n\nNotice the SBUF shape has become ``[128P, 2048F]``. In a subsequent code example we'll see that it's useful to view/reshape this as ``[128P, 512F, 4F]``, making it clear we have 512 groups of 4 packed elements.\n\nNext, let's Quantize-MX this tile, which will preserve the layout but pack groups of 4 free-dimension elements into a single ``MXFP_x4`` element, as shown below. Note that Quantize-MX does not support an FP4 output but Matmul-MX does support FP4 input.\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-3.png\n   :width: 50%\n   :align: center\n\nNotice the shape is now ``[128P, 512F]`` which is the max moving tile size we aimed for. But each ``MXFP_x4`` element, shown in red, physically contains four quantized elements from the original tile. Recall that each TensorE processing element ingests enough data to perform four, FP4/FP8 multiply-accumulate operations, which is why four elements from the original contraction axis must be packed together in this fashion.\n\nWith this understanding we'll state the space allocation rules for quantized ``MXFP_x4`` data tiles.\n\n.. code-block:: none\n\n    Unquantized Interleaved Data Tile = [P,F] BF16 in SBUF\n\n    MX Quantized Data Tile = [P, F//4] MXFP_x4 in SBUF\n\nScale Tensor\n~~~~~~~~~~~~~\n\nLet's revisit the BF16 tile with the interleaved SBUF layout but this time with one of the ``[8P, 4F]`` scaling groups overlaid.\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-4.png\n   :align: center\n\nMX scales are represented using a ``UINT8`` tile containing one element for each scaling group.\n\nAs explained in the :doc:`Trainium 3 Architecture Guide for NKI </nki/guides/architecture/trainium3_arch>`, we view the partition-dimension of SBUF as being split into 4 quadrants of 32 partitions each. 
Scales must be placed in the quadrant from which the corresponding scaling group originated, as shown below.\n\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-5.png\n   :width: 50%\n   :align: center\n\n\nNotice the allocated shape is ``[128P, 512F]`` despite the underlying useful shape being ``[16P, 512F]``. See the :doc:`quantize_mx API </nki/api/generated/nki.isa.quantize_mx>` for an example of how to improve memory usage by packing scales, from other quantized tensors, into the same allocation.\n\nWith this understanding we'll state the space allocation rules for quantized MX scale tiles.\n\n.. code-block:: none\n\n    Unquantized Interleaved Data Tile = [P,F] BF16 in SBUF\n\n    If P <= 32 (Oversize optional)\n\n    MX Quantized Scale = [P//8, F//4] UINT8 in SBUF\n\n    If P > 32 (Oversize required)\n\n    MX Quantized Scale = [P, F//4] UINT8 in SBUF\n\nBasic Matmul-MX\n----------------\n\nThis NKI example performs a single Matmul-MX using offline-quantized, max-sized input tiles. For simplicity, it assumes the MX *data* tiles in HBM already satisfy the layout requirements so they may be simply loaded straight into SBUF. The MX *scale* tiles require some shuffling. Note that subsequent examples, instead, show how to establish this layout yourself in SBUF.\n\n.. literalinclude:: src/mxfp-matmul/mx_kernels.py\n   :language: python\n   :start-after: [start-kernel_offline_quantized_mx_matmul]\n   :end-before: [end-kernel_offline_quantized_mx_matmul]\n\nA few notes about the above example:\n\n* The ``MXFP_x4`` packed data types are custom to NKI and are not supported in Torch. Therefore, we mimic the packed data using ``uint8`` in Torch and simply view it as ``MXFP_x4`` in the kernel, as shown.\n* The ``load_scales_scattered()`` helper function reads contiguously packed offline scales from HBM and spreads them across partition-dim quadrants.\n* The PSUM output tile is allocated with data type BF16 to indicate the desired output data type of the Matmul-MX. Note that Matmul-MX (:doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul_mx>`) supports both BF16 and FP32 output dtypes.\n\nLet's also look at the host code which calls this kernel as all subsequent examples use the same structure.\n\n.. literalinclude:: src/mxfp-matmul/mx_toplevel.py\n   :language: python\n   :start-after: [start-run_offline_quantized_matmul_mx_test]\n   :end-before: [end-run_offline_quantized_matmul_mx_test]\n\n* The ``generate_stabilized_mx_data()`` helper function is used to generate MX data on the host. \"Stabilized\" means the data is randomly generated but injected with certain properties to allow for lossless quantization/dequantization, including constraining the data to be in the FP4/8 range. It conveniently returns MX data as ``ml_dtypes`` FP4/FP8, the same data packed into ``uint`` to mimic the ``MXFP_x4`` packing (suitable for sending to a NKI kernel), MX scales, and a corresponding unquantized FP32 tensor. The input shape argument specifies the unquantized shape. The unquantized tensor is viewed as being in the required layout for MX operations. Therefore to generate an MX data tile of maximum size we must specify an unquantized free-dimension that is 4x larger. In this example the moving unquantized shape is ``[128P, 2048F]`` and the function will return a ``[128P, 512F]`` packed MX data tensor, as desired.\n* ``nc_matmul_mx_golden()`` is a utility to mimic the hardware's Matmul-MX operation and is therefore useful for verifying the hardware output. 
It assumes the input tensors meet the SBUF layout requirements and the data tensor is packed to mimic ``MXFP_x4``. Hence it can directly accept MX data generated by ``generate_stabilized_mx_data()``.\n* ``compare_and_print_results()`` uses ``numpy.allclose()`` to check data correctness and print the tensors to ``stdout``.\n* Although this is a single-tile Matmul-MX, larger MX tensors can be multiplied by using the same tiling techniques shown in the non-MX :doc:`Matrix Multiplication tutorial </nki/guides/tutorials/matrix_multiplication>`.\n\nQuantize-MX + Matmul-MX\n-----------------------\n\nNext we'll replace one of the Matmul-MX inputs with a tile that we quantize on the VectorE using Quantize-MX. Again, it assumes the interleaved SBUF layout requirement is already satisfied. The source data for Quantize-MX must be in SBUF (cannot be in PSUM).\n\nThe two main changes in this example are:\n\n* The ``allocate_mx_tiles()`` helper function implements the data and scale tile allocation rules mentioned above.\n* ``load_scales_scattered()`` is again used for the stationary scales but is unnecessary for the moving scales since Quantize-MX will correctly spread the data across SBUF partition-dim quadrants.\n\n.. literalinclude:: src/mxfp-matmul/mx_kernel_utils.py\n   :language: python\n   :start-after: [start-allocate_mx_tiles]\n   :end-before: [end-allocate_mx_tiles]\n\n.. literalinclude:: src/mxfp-matmul/mx_kernels.py\n   :language: python\n   :start-after: [start-kernel_on_device_quantize_matmul_mx]\n   :end-before: [end-kernel_on_device_quantize_matmul_mx]\n\nPlease see the code package for the host code that calls this kernel.\n\nSBUF Layout Using Strided Access\n--------------------------------\n\nHere we present two techniques for establishing the interleaved layout required for MX operations. Both produce the same result but have different performance tradeoffs. Therefore it's useful to think of them as tools in a toolbox where you use the one that's appropriate for your given situation.\n\nIt's important to note that these techniques operate on unquantized tensors (BF16 in these examples) as the layout must be established before calling Quantize-MX. If you already have offline MX weights (already quantized), it's suggested you establish the required layout offline so you may perform a direct load to SBUF.\n\nThe techniques are first explained then followed by a combined code example.\n\nVectorE/ScalarE Strided Access\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHere we use either VectorE or ScalarE to write data to SBUF in the required layout. The simplest operation is a TensorCopy (shown below) but it's usually more performant to apply the strided access pattern to some prior useful computation already occurring on these engines.\n\nFor completeness the example loads an HBM tensor to SBUF prior to rearranging the data on-device using an SBUF-to-SBUF TensorCopy. The load is needed for this to be a standalone executable example but in practice it's expected your data would already be in SBUF from some previous operation. The TensorCopy strided access pattern is the key takeaway from this example.\n\nAlso note the TensorCopy source could be PSUM if you want to rearrange the data immediately after a prior matmul.\n\nDMA Strided Access\n~~~~~~~~~~~~~~~~~~~\n\nHere we DMA a tensor from HBM to SBUF using a strided access pattern. It's conceptually similar to the above technique except the source of the copy is in HBM. 
This technique is typically significantly slower than on-device techniques but it can be useful in heavily compute-bound workloads where the DMA may overlap with compute.\n\nCode\n~~~~\n\nThis example demonstrates both techniques, selected by the ``use_tensor_copy`` argument. They are very similar but with slightly different read access patterns. It's useful to refer to the above layout diagrams as you read this code as the reshapes and access patterns directly correspond.\n\n.. literalinclude:: src/mxfp-matmul/mx_kernel_utils.py\n   :language: python\n   :start-after: [start-copy_data_strided]\n   :end-before: [end-copy_data_strided]\n\nSee the code package for an example kernel that calls ``copy_data_strided()`` to establish the interleaved layout for stationary and moving tiles, quantize both, and perform a Matmul-MX.\n\n.. _nki-mxfp-scale-packing:\n\nPacking Scale Values\n--------------------\n\nAs discussed in `Scale Tensor`_, each element of a scale tensor corresponds to a group of 32 elements in the unquantized source tensor.\nEach scaling group spans 8 partitions, with 4 free elements per partition, giving scale tensors a logical size of ``[P // 8, F // 4]``.\nHowever, due to connectivity constraints between SBUF and VectorE, scale values must be placed in the same quadrant as their corresponding scaling group.\nWhen the unquantized source tensor spans multiple partitions (i.e., ``src.shape[0] > 32``), the scale tensor has physical shape ``[P, F // 4]``.\nOnly the first 4 partitions (= 32 partitions per quadrant divided by 8 partitions per scaling group) of each quadrant are occupied, leaving the remaining 28 unused.\n\nQuantize-MX allows you to utilize some of this space by packing scale values from multiple Quantize-MX calls.\nQuantize-MX and Matmul-MX support writing/reading scales at an offset of 0, 4, 8, or 12 within each partition, allowing you to pack scale values from up to four tensors into a single tile.\nTo illustrate, consider the scale tile from `Scale Tensor`_ shown with and without scale packing:\n\n.. image:: /nki/img/deep-dives/mxfp84-matmul-guide-6.drawio.png\n   :align: center\n\nCode\n~~~~\n\nThis example demonstrates how to pack scale values from multiple Quantize-MX calls into a single tensor in SBUF, as mentioned in the :ref:`Trainium3 Architecture Guide <arch-trn3-quad-mxfp>`.\nWe use tensor slicing to control the offset into each quadrant at which Quantize-MX writes scale values.\n\n.. literalinclude:: src/mxfp-matmul/mx_kernels.py\n   :language: python\n   :start-after: [start-kernel_copy_strided_quantize_matmul_mx_packed_scale]\n   :end-before: [end-kernel_copy_strided_quantize_matmul_mx_packed_scale]\n\n\nAdditional Tips\n----------------\n\n* It's important to plan where in your design you'll pay the cost of interleaving the data. Ideally you minimize the cost by finding existing, prior compute on which you can apply the strided access pattern. Or find existing compute against which you can overlap the interleave process. For offline MX weights prepare the layout offline on CPU so you may load the data to SBUF directly in a contiguous/unstrided fashion.\n\n* As with all compute on Neuron, it's generally performant to spread it across multiple engines operating in parallel. 
Given that Quantize-MX runs exclusively on the VectorE a bit more care may be needed to alleviate VectorE contention by becoming familiar with operations that may be relegated other engines, like ScalarE.\n\n* The TensorE operates at double the clock frequency of VectorE, therefore Matmul-MX produces data at double the rate that Quantize-MX can consume it. It may seem that the TensorE could be back-pressured in a situation where a Matmul-MX quickly feeds a subsequent Matmul-MX (since you must Quantize-MX in between at half the speed), but that only happens for small tensors. Larger tensors require tiled matrix multiplication which inherently reuses input (quantized) tiles, allowing time for prior matmul output data to be quantized.\n\nMatmul-MX supports PE-tiling (row-tiling only) where matmuls with a small (<= 64) contraction-dimension (partition-dimension) may be parallelized on the TensorE. This becomes more relevant for MX since a 4x-larger effective contraction-dimension is supported, meaning it's useful for an ``MXFP_x4`` contraction-dimension <= 64 or an equivalent unquantized contraction-dimension <= 256.\n\nExecuting the Code\n------------------\nAfter downloading the `tutorial code package <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/mxfp-matmul>`_ to your Trainium3 Neuron environment, simply execute it as follows and observe the sample output.\n\n.. code-block:: bash\n\n  $ python3 mx_toplevel.py\n\n  =====================================================================================\n      OFFLINE_QUANTIZED_MX_MATMUL - stationary <float8_e5m2> @ moving <float8_e5m2>\n  =====================================================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[0.02526855 0.59765625 1.15625   ] ... [-0.09033203 -0.10888672 -0.84375   ]]\n  ...\n  [[ 0.25585938  0.18554688 -0.546875  ] ... [-0.71875    -0.6015625  -0.46484375]]\n\n  Golden:\n  [[0.02535721 0.5957752  1.1556101 ] ... [-0.09036541 -0.10906862 -0.8448767 ]]\n  ...\n  [[ 0.2551025   0.1856966  -0.54681885] ... [-0.71797514 -0.6026518  -0.4641544 ]]\n\n\n  =========================================================================================\n      OFFLINE_QUANTIZED_MX_MATMUL - stationary <float4_e2m1fn> @ moving <float4_e2m1fn>\n  =========================================================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[-0.02038574  0.02648926  0.10351562] ... [-0.25        0.02404785  0.08154297]]\n  ...\n  [[ 0.234375  -0.0456543  1.140625 ] ... [ 1.1015625   0.04833984 -0.17675781]]\n\n  Golden:\n  [[-0.02036181  0.02647817  0.10362364] ... [-0.24955288  0.02399684  0.08132255]]\n  ...\n  [[ 0.23485765 -0.04565394  1.1424086 ] ... [ 1.0981529   0.04839906 -0.17722145]]\n\n\n  ========================================================================================\n      ON_DEVICE_QUANTIZE_MATMUL_MX - stationary <float4_e2m1fn> @ moving <float8_e5m2>\n  ========================================================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[-0.12792969  0.02685547 -0.19140625] ... [ 0.05883789 -0.01916504 -0.66796875]]\n  ...\n  [[ 0.03198242 -0.24316406 -0.1640625 ] ... [ 0.06591797 -0.11914062  0.6015625 ]]\n\n  Golden:\n  [[-0.1284121   0.02687968 -0.19178611] ... 
[ 0.05882631 -0.01915852 -0.666565  ]]\n  ...\n  [[ 0.03191248 -0.24304396 -0.16389877] ... [ 0.06606946 -0.11931092  0.60205466]]\n\n\n  ======================================================================================\n      ON_DEVICE_QUANTIZE_MATMUL_MX - stationary <float8_e5m2> @ moving <float8_e5m2>\n  ======================================================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[ 0.02832031 -0.29296875  0.04394531] ... [-0.13671875 -0.00704956 -0.47265625]]\n  ...\n  [[ 0.03442383 -0.75        0.11572266] ... [ 0.86328125 -0.00735474  0.33007812]]\n\n  Golden:\n  [[ 0.02831857 -0.29297137  0.04390652] ... [-0.13685682 -0.00703458 -0.47168562]]\n  ...\n  [[ 0.03451066 -0.7511592   0.11560257] ... [ 0.86369723 -0.00734489  0.3300762 ]]\n\n\n  ================================================================\n      COPY_STRIDED_TENSOR_COPY - <float8_e5m2> @ <float8_e5m2>\n  ================================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[ 0.56640625 -1.28125     0.26953125] ... [ 0.5859375   0.31054688 -0.60546875]]\n  ...\n  [[ 1.2421875 -0.859375  -1.140625 ] ... [-0.06542969  0.11425781  0.6015625 ]]\n\n  Golden:\n  [[ 0.5663527  -1.2832397   0.26900524] ... [ 0.5861912  0.3109728 -0.6038357]]\n  ...\n  [[ 1.2426924  -0.85944945 -1.1438001 ] ... [-0.0654989   0.11429967  0.6028823 ]]\n\n\n  ============================================================\n      COPY_STRIDED_DMA - <float8_e5m2> @ <float8_e5m2>\n  ============================================================\n\n  Result shape: (128, 512)\n\n  np.allclose pass? True\n\n  Device Output:\n  [[ 0.32421875  0.43359375 -0.09814453] ... [ 0.82421875 -2.171875    0.71484375]]\n  ...\n  [[-0.47070312 -0.734375    0.09765625] ... [ 1.328125   -1.09375    -0.32226562]]\n\n  Golden:\n  [[ 0.32461044  0.43410686 -0.09810834] ... [ 0.82437325 -2.1703691   0.71522826]]\n  ...\n  [[-0.47003102 -0.733371    0.09745546] ... [ 1.3250915  -1.0969493  -0.32166338]]\n\n"
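As a recap of the space-allocation rules stated earlier in this guide, the following is a minimal sketch of allocating the data and scale tiles for a max-sized moving tile. This is not the tutorial's ``allocate_mx_tiles()`` helper; the shapes, dtype choice, and variable names are illustrative.

.. code-block:: python

   import nki.language as nl

   # Unquantized, interleaved SBUF view assumed to be [128P, 2048F] BF16.
   P, F = 128, 2048

   # MX Quantized Data Tile = [P, F // 4] MXFP_x4 in SBUF
   mx_data = nl.ndarray((P, F // 4), dtype=nl.float8_e5m2_x4, buffer=nl.sbuf)

   # P > 32, so the oversized scale layout is required:
   # MX Quantized Scale = [P, F // 4] UINT8 in SBUF
   mx_scale = nl.ndarray((P, F // 4), dtype=nl.uint8, buffer=nl.sbuf)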
  },
  {
    "path": "nki/deep-dives/nki-aps.rst",
    "content": ".. meta::\n   :description: Deep dive into Access Patterns (AP) to directly specify how tensors are accessed on Trainium hardware\n   :keywords: NKI kernels, Neuron Kernel Interface, AWS Neuron SDK, kernel compilation, Trainium, Inferentia, machine learning acceleration\n   :date-modified: 12/19/2025\n\n.. _nki-aps:\n\n===================\nNKI Access Patterns\n===================\n\nStarting with NKI 0.2.0, NKI supports the use of access patterns (AP) on \n``nl.ndarray``, which provides users with the ability to specify \nhardware-native access patterns. This low-level capability allows developers \nto specify precisely what they want their instructions to read on the hardware.\n\nAccess patterns are only necessary if slicing cannot represent the desired \ntensor access.\n\nHardware Capability\n===================\n\nInstructions can read and write tensors from/to the SBUF or PSUM, which are \nboth two-dimensional memories with 128 partitions on NeuronCore v2/v3/v4. \nWithin each SBUF/PSUM partition, the tensor read/write logic on the NeuronCore \nsupports accessing elements from up to four-dimensional arrays, though most \ninstructions only support 1D/2D/3D in the free dimension due to instruction \nlength limitations.\n\nThe multi-dimensional access patterns are typically described using two pieces \nof information: 1) the element stepping (i.e., ``stride``) and 2) number of \nelements (i.e., ``size``) in each dimension. A tensor access pattern of an \ninstruction is expected to be the same across all partitions.\n\nIn addition to the free dimension pattern, additional information is required \nto locate the number of elements to access: 1) the offset from the beginning \nof the tensor and 2) the number of partitions. The next section will describe \nhow the NKI API abstracts this information.\n\nNKI API for the Access Pattern\n===============================\n\nThe NKI API for access pattern is a direct reflection of the hardware capability. \nThe ``nl.ndarray`` has an ``ap`` method.\n\n.. code-block:: python\n\n   def ap(self, pattern: List[Tuple[int, int]], \n      offset: Optional[int] = 0,\n      scalar_offset: Optional[Access] = None,\n      vector_offset: Optional[Access] = None,\n      indirect_dim: int = 0\n      dtype: Optional[Dtype] = None):\n      pass\n\nThe parameters have the following definitions:\n\n* ``pattern``: A list of two-element tuples, each tuple describes the access on one dimension. The first element represents the element stepping and the second element represents the number of elements in each dimension. This tuple is referred to as ``[step, num]`` going forward.\n\n  * The shape of a pattern is the collection of num. For example, given pattern ``[[w_step, w_num], [z_step, z_num], [y_step, y_num], [x_step, x_num]]``, the shape is ``[w_num, z_num, y_num, x_num]``.\n  * **Note**: The order of the pattern specified here is in the opposite order to what is actually accepted by the hardware. Therefore, the order of the tuples shown on the profiler will be in the opposite order of what is specified here.\n\n* ``offset``: The offset to start the access in terms of number of elements from the beginning of the tensor. The default value is 0.\n* ``scalar_offset``: An SBUF tensor of shape ``(1, 1)`` that specifies the location to start the access in terms of number of elements on the ``indirect_dim`` of the access pattern. 
At most one of the ``scalar_offset`` and ``vector_offset`` can be specified.\n* ``vector_offset``: An SBUF tensor that specifies the location to start the access in terms of number of elements from the beginning of the indirect dimension specified by ``indirect_dim``. At most one of the ``scalar_offset`` and ``vector_offset`` can be specified.\n* ``indirect_dim``: The indirect dimension on which to apply ``scalar_offset`` and ``vector_offset``.\n* ``dtype``: The data type of the access pattern. The default value is the ``dtype`` of the tensor being accessed.\n\nSemantics of the Access Pattern\n================================\n\nAccess patterns can be thought of as compact representations of a loop. The \noffset is an integer indicating the start offset in terms of elements with \nrespect to the beginning of the tensor. Each two-element list ``[step, num]`` \nrepresents the stride in terms of elements and the number of iterations of \neach level of the loop. The semantics are explored through the following \nexample.\n\nGiven a tensor, the Access Pattern conceptually flattens the tensor to 1D,\nand then uses a loop to fetch elements from the tensor to construct a view.\nConsider the following NKI code:\n\n.. code-block:: python\n\n   t = nl.ndarray((p_count, N), dtype=nl.float32, buffer=nl.sbuf)\n   access = t.ap(\n     pattern=[[N, p_size], [z_step, z_num],\n              [y_step, y_num], [x_step, x_num]],\n     offset=offset)\n\nThe above represents the following access on the tensor ``t``, written below in pseudo-code.\n\n.. code-block:: python\n\n   access = nl.ndarray((p_size, z_num, y_num, x_num), dtype=nl.float32, buffer=nl.sbuf)\n   for w in range(p_size):\n     for z in range(z_num):\n       for y in range(y_num):\n         for x in range(x_num):\n           t_flatten = t.flatten() # first flatten the tensor to 1d\n           access[w, z, y, x] = t_flatten[offset + (w * N) + (z * z_step)\n                                          + (y * y_step) + (x * x_step)]\n\nThe access pattern has the following properties:\n\n1. Recall from the hardware capability that the access pattern in each partition \nmust be identical. Therefore, the step of the first tuple in the AP must be \nequal to the number of elements in the free dimension of the tensor.\n2. The shape of the result view is always the same as the shape of the pattern.\n\nNote that calling ``.ap`` on a tensor does not do any computation directly. \nIt describes how to get data. The engines will consume data when the AP \nis passed into a ``nki.isa`` instruction.\n\n.. code-block:: python\n\n   src = nl.ndarray((16, 32), dtype=nl.float32, buffer=nl.sbuf)\n   dst = nl.ndarray((16, 32), dtype=nl.float32, buffer=nl.sbuf)\n   src_access = src.ap([[32, 16], [1, 32]]) # no computation happens\n   dst_access = dst.ap([[32, 16], [1, 32]]) # no computation happens\n\n   # Engine reads both src_access and dst_access and performs the copy\n   nisa.dma_copy(dst_access, src_access)\n\nA Concrete Example\n==================\n\nGiven a tensor ``t`` of size (16P, 16F), to iterate over all the elements in \n``t[0:16, 8:16]`` the access pattern can be written as:\n\n.. code-block:: python\n\n   t = nl.ndarray((16, 16), dtype=nl.float32, buffer=nl.sbuf)\n   access = t.ap(pattern=[[16, 16], [1, 8]], offset=8)\n\n\n   # Semantics: the following is pseudo-code\n   access = nl.ndarray((16, 8), dtype=nl.float32, buffer=nl.sbuf)\n   # in loop form\n   for w in range(16):\n     for z in range(8):\n       idx = 8 + (w * 16) + (1 * z)\n       t_flatten = t.flatten()\n       access[w, z] = t_flatten[idx]\n\n.. image:: /nki/img/deep-dives/memory-access-visualization-1.png\n   :width: 80%\n   :align: center\n
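\nThe resulting view can be passed straight to an instruction, just like the earlier ``nisa.dma_copy`` example. As a minimal sketch (the ``dst`` buffer and its access pattern below are illustrative additions, not part of the example above), the ``t[0:16, 8:16]`` sub-tile could be copied into a separate ``(16, 8)`` buffer as follows:\n\n.. code-block:: python\n\n   t = nl.ndarray((16, 16), dtype=nl.float32, buffer=nl.sbuf)\n   dst = nl.ndarray((16, 8), dtype=nl.float32, buffer=nl.sbuf)\n\n   # Source view: 16 partitions, elements 8..15 within each partition\n   src_access = t.ap(pattern=[[16, 16], [1, 8]], offset=8)\n   # Destination view: 16 partitions, all 8 elements (partition step = free-dimension size = 8)\n   dst_access = dst.ap(pattern=[[8, 16], [1, 8]], offset=0)\n\n   # The engine consumes both access patterns and performs the copy\n   nisa.dma_copy(dst_access, src_access)\n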
\nRestriction on SBUF/PSUM Tensors\n=================================\n\nFor SBUF/PSUM tensors, the first tuple must always be the access for the \npartition dimension. On NeuronCore v2/v3/v4, the access on the partition \ndimension must be contiguous, meaning that the step of the leading dimension \nmust be the element count of the entire free dimension of the tensor. \nTherefore, given a tensor of shape ``(p_dim, f_dim0, f_dim1)``, the step of \nthe leading dimension must be ``f_dim0 * f_dim1``.\n\nThe following example is not allowed because it reads every other partition.\n\n.. code-block:: python\n\n   t = nl.ndarray((16, 32), dtype=nl.float32, buffer=nl.sbuf)\n\n   # The following is illegal, because the first stride is 32*2 and reads every other partition\n   t.ap(pattern=[[64, 8], [1, 32]], offset=0)\n\n.. image:: /nki/img/deep-dives/memory-access-visualization-2.png\n   :width: 80%\n   :align: center\n\nRestriction on Nested Indexing\n===============================\n\nThe ``.ap`` method is only allowed on ``nl.ndarray`` and cannot be called on a \ntile produced by it. For example, the following would result in an error.\n\n.. code-block:: python\n\n   t = nl.ndarray((128, 256), dtype=nl.float32, buffer=nl.sbuf)\n   t.ap(pattern=[[256, 128],[2, 128]], offset=0).ap(pattern=[[128, 64], [1, 64]], offset=0)\n        ^-- cannot specify an access pattern on an already indexed tensor\n\nTo facilitate nested indexing, the :doc:`NKI Library </nki/library/index>`\nprovides :doc:`TensorView </nki/library/kernel-utils/tensor-view>`. ``TensorView`` provides\na convenient interface for tensor manipulation operations like slicing, permuting, broadcasting, and reshaping without copying data. It keeps track of the \noperations performed on the tensor, and can efficiently generate an NKI Access Pattern by calling ``get_view()``. For example, the nested indexing \nabove could be represented as the following chain of TensorView operations.\n\n.. code-block:: python\n\n   t = nl.ndarray((128, 256), dtype=nl.float32, buffer=nl.sbuf)\n   t_view = TensorView(t)\n\n   \"\"\"\n   Equivalent to .ap(pattern=[[256, 128],[2, 128]], offset=0);\n   notice the ``step`` parameter in TensorView is on the dimension it is slicing,\n   whereas in Access Patterns, the ``stride`` is computed by flattening the tensor to 1D. \n\n   Conceptually equivalent to t[0:128, 0:256:2], where the resulting view is of shape (128, 128)\n   \"\"\"\n   t_access_0 = t_view.slice(dim=0, start=0, end=128, step=1).slice(dim=1, start=0, end=256, step=2)\n\n   \"\"\"\n   Slice t_access_0, conceptually equivalent to t_access_0[0:64, 0:64], where the resulting\n   view is of shape (64, 64)\n   \"\"\"\n   t_access_1 = t_access_0.slice(dim=0, start=0, end=64).slice(dim=1, start=0, end=64, step=1)\n\n   # t_access_1.get_view() is equivalent to the nested indexing.\n   t_access_1.get_view() # Materialize the operations to the NKI Access Pattern\n\n\nReinterpret Cast with ``ap``\n============================\n\nThe ``dtype`` parameter can be used for reinterpret casting the tensor. \nSince both the pattern and the offset are in terms of number of elements, \nnot bytes, the count must be computed accordingly. See the following example \nof a reinterpret cast from ``INT32`` to ``BF16``.\n\n.. 
code-block:: python\n\n   t = nl.ndarray((128, 256), dtype=nl.int32, buffer=nl.sbuf)\n   cast_to_bf16 = t.ap(pattern=[\n     [512, 128], [1, 512]\n    ], # notice the number of elements is doubled due to dtype size change\n   offset = 0, dtype=nl.bfloat16) # cast_to_bf16 has shape (128, 512)\n\nDynamic Access with ``scalar_offset`` and ``vector_offset``\n===========================================================\n\nThe ``scalar_offset`` and ``vector_offset`` are for dynamic tensor access, i.e. using a \nruntime value to index another tensor. \n\nScalar Dynamic Access\n---------------------\n\nThe ``scalar_offset`` is an SBUF value that specifies the index on the ``indirect_dim`` of the tensor. \n\n.. code-block:: python\n   \n   def scalar_dynamic_dma(A):\n      # Assume input A is of shape (4*128, 512). We want to copy from A[3*128:, 0:256]\n      # The 3*128 offset comes from a dynamic variable in SBUF\n      assert A.shape == [512, 512]\n      batch_idx = nl.ndarray((1, 1), nl.int32, buffer=nl.sbuf)\n      nisa.memset(batch_idx, value=3*128)\n\n      result = nl.ndarray((128, 256), A.dtype, buffer=nl.shared_hbm)\n\n      nisa.dma_copy(src=A.ap(\n         pattern=[[512, 128], [1, 256]], offset=0,\n         scalar_offset=batch_idx, indirect_dim=0\n         ),\n         dst=result[...])\n\n      return result\n\nThe code block above accesses ``batch_idx`` on the 0-th dimension \nof the tensor A. Note that the dimension is relative to \nthe base tensor, not relative to the pattern specified.\n\nThis example will access the memory from A starting at the element offset below.\n\n.. code-block:: python\n\n   # prod(A.shape[indirect_dim+1:]) is the accumulated shape\n   # to the right of indirect_dim\n   offset + scalar_offset * prod(A.shape[indirect_dim+1:])\n\nIn the example above, the access starts from:\n\n.. code-block:: python\n\n   0 + batch_idx * 512\n\nAgain, we should notice that 512 is read from the shape of the base tensor, not from the access pattern. The shape of the access pattern is ``(128, 256)``.\n\n\nVector Dynamic Access\n---------------------\n\nVector dynamic access is similar to that of scalar, except that the dynamic offsets are in a vector. \nWe need to specify the field ``vector_offset``. **Currently, only ``indirect_dim=0`` is supported**. \nThe stride on the leading dimension must be the total number of elements to the right of the \nleading dimension in the base tensor, and the stride specified in the \nleading dimension of the pattern in the .ap() is currently ignored. \nWe still recommend setting the stride properly so that code would still work \nif this limitation is lifted in the future.\n\n.. code-block:: python \n\n   def indirect_vector_dynamic_dma(A):\n      # shape of A is (128, 512)\n      dynamic_idx_legal = nl.ndarray((64, 1), nl.int32, nl.sbuf)\n      nisa.iota(dynamic_idx_legal, [[1, 1]], 0, 2)\n\n      result_sb = nl.ndarray((64, 512), nl.float32, buffer=nl.sbuf)\n      result_hbm = nl.ndarray((64, 512), nl.float32, buffer=nl.shared_hbm)\n\n      nisa.dma_copy(src=A.ap(\n         [[512, 64], [1, 512]], 0, vector_offset=dynamic_idx_legal, indirect_dim=0\n         ), dst=result_sb, name='inst0')\n\n      nisa.dma_copy(result_hbm, result_sb, name=\"copy1\")\n\n      return result_hbm\n\nFor this particular case, the semantics of the access are the following. Note that the stride on the dynamic dimension is directly read from the base tensor.\n\n.. 
code-block:: python\n\n   indirect_dimension = 0\n\n   for w in range(64):\n     for z in range(512):\n       dynamic_idx = dynamic_idx_legal[w]\n       A[\n         # static offsets\n         offset +\n         # AP with the indirect dimension number replaced.\n         # Note that the 512 is read from the shape of the **base** tensor.\n         1 * z + 512 * dynamic_idx\n       ]\n\n\nInteraction with DGE\n--------------------\n\nThe ``scalar_offset`` and ``vector_offset`` interact with the DGE mode selection. Refer to \n:doc:`Descriptor Generation Engine (DGE) Reference </nki/deep-dives/nki-dge>` for details.\n"
  },
  {
    "path": "nki/deep-dives/nki-compiler.rst",
    "content": ".. meta::\n   :description: Overview of the NKI Compiler, its integration with the Neuron SDK, and how it enables efficient kernel development for AWS Neuron hardware.\n   :keywords: NKI Compiler, Neuron Kernel Interface, AWS Neuron SDK, kernel compilation, Trainium, Inferentia, machine learning acceleration\n\n.. _nki_compiler_about:\n\n======================\nAbout the NKI Compiler\n======================\n\nThis topic covers the NKI Compiler and how it interacts with the Neuron Graph Compiler to produce a complete model. The NKI Compiler is responsible for compiling NKI kernels.\n\nOverview\n----------\n\nThe NKI language allows kernel writers to have direct, fine grained control over Neuron devices. Through low level APIs that reflect the Neuron instruction set architecture (ISA), NKI empowers developers to take direct control over critical performance optimizations during kernel development. This approach requires a dedicated NKI Compiler, separate from :doc:`the existing Neuron Graph Compiler </compiler/index>`, which compiles kernel code while preserving the developer's optimization choices. To seamlessly integrate NKI into model architectures defined in machine learning frameworks like JAX and PyTorch, the NKI Compiler also works in conjunction with the Neuron Graph compiler.\n\nThe diagram below shows the detailed compilation flow inside the Neuron compilers and how they work together to build the overall binary that is executable on Neuron hardware. The NKI Compiler first parses the kernel code into an AST representation for semantic analysis. It then performs a small number of middle end and back end transformations on the AST, optimizing resource allocations and instruction scheduling, producing optimized NKI IR that gets integrated back into the overall model.\n\n.. image:: /nki/img/compiler/nki-compiler-1.jpg\n\n.. important::\n    While the NKI meta-programming language looks and feels like Python, it is not actually Python code. When the Python interpreter encounters a top level function decorated with ``@nki.jit``, it invokes the NKI Compiler to handle compilation of that function.\n\n.. code-block:: python\n    \n    # this is a Python function that calls 'kernel', which is a NKI kernel\n    def a_function(x,y,z):\n        kernel(x, y, z)\n\n    # this is a NKI kernel that will be compiled by the NKI Compiler and \n    # integrated back into the overall model by the Neuron Graph compiler\n    @nki.jit\n    def kernel(x,y,z):\n        # this is kernel code\n\n\nUsing Python features within NKI kernels that are not supported will result in useful errors from the NKI Compiler indicating that the feature is not a valid NKI feature. Neuron has intentionally constrained the NKI meta-programming language to be as minimal as possible while serving the needs of building high performance kernels for today's popular models and will continue to grow and evolve the language over time. \n\nNKI Compiler Open Source\n-------------------------\n\nNeuron is planning to release the source code for the NKI Compiler to increase awareness and transparency, to enable easier development of tools, and to invite participation and collaboration as we evolve the NKI language. Developers will be able to download the compiler sources, modify them, build the compiler, and use their locally built compiler in their overall model compilation flow. \n\nTo do this, developers will be able to download our sources from our public git repository: https://github.com/aws-neuron/nki-library. 
The source files can be found under the ``...`` filepath in the repo.\n\nThe repo contains all the sources for the entire NKI Compiler, as well as build instructions on how to produce a standalone nki.whl. Once built, developers can install their locally built wheel: ``pip install nki.whl``. This will replace the default NKI Compiler that is installed with the Neuron SDK package. The local wheel will then be registered to handle subsequent ``@nki.jit`` decorators and will be picked up and integrated with the rest of the Neuron Graph compiler flow.\n\nNote that upon installing a locally-built wheel, developers must reinstall the Neuron SDK in order to revert their changes to the official version of the NKI Compiler. Also, the officially built compiler will have an officially tagged version whereas locally built versions will not. Any bug and error reports will contain the version of the compiler used.\n\n\nHow the NKI Compiler Works with the Graph Compiler\n--------------------------------------------------\n\nFor each kernel function, Neuron runs the NKI Compiler to produce an artifact \nfor that kernel function. This is similar to compiling a single file with a \ntraditional compiler, such as a C++ compiler.\n\nAll of the kernel artifacts are managed by the Neuron SDK. Programmers do not \nneed to manage these files themselves. Similar to prior versions of NKI, \nprogrammers mark kernel functions with ``nki.jit``---the NKI Compiler will be \ninvoked automatically when this decorator is encountered during compilation.\n\nThe Neuron Graph Compiler (or just the Neuron Compiler) handles the rest of the\nmodel, which we refer to as \"the compute graph\". The framework, such as\nPyTorch or Jax, orchestrates the process of building a compute graph from the\nmodel definition. When the model includes a call to a NKI kernel function, the\nNKI Compiler will insert a reference to the compiled artifact into the graph.\nThe Graph Compiler recognizes these references and assembles the final result\nthat can be run on the Trainium Hardware.\n\nIntegration\n-----------\n\nAs described above, both the NKI Compiler and the Neuron Compiler are used to\nconstruct the final artifact that can be run on Trainium hardware. The NKI\nCompiler compiles each NKI kernel function in turn, and the Neuron Compiler\ncompiles the whole model and inserts the NKI kernels based on the references\ngenerated by the NKI Compiler.\n\nThis insertion of NKI kernels into the graph is done very late in the\ncompilation process. This is different from prior versions of NKI that\nintegrated NKI kernels earlier in the compile process. Insertion later in the\nprocess allows the NKI Compiler to provide custom behavior for NKI and give\nusers a more predictable and performant result.\n\nFurther reading\n---------------\n\n- :doc:`/compiler/index`\n- :doc:`/nki/get-started/about/index`\n\n"
  },
  {
    "path": "nki/deep-dives/nki-dge.rst",
    "content": ".. meta::\n   :description: Deep dive into the Descriptor Generation Engine (DGE) modes for DMA operations in NKI on AWS Neuron hardware.\n   :keywords: NKI, DGE, DMA, descriptor, swdge, hwdge, gather, scatter, AWS Neuron, Trainium\n   :date-modified: 03/31/2026\n\n.. _dge-documentation:\n\n=============================================\nDescriptor Generation Engine (DGE) Reference\n=============================================\n\nEvery DMA operation (``nisa.dma_copy``, ``nisa.dma_transpose``) needs a\n*descriptor* that tells the hardware the source address, destination address,\ntransfer shape, and stride pattern. We can specify *when* and *where* those\ndescriptors are produced---on the host before execution, on the GpSimd engine\nat runtime, or on a dedicated hardware block. Each choice has different\nperformance characteristics and capability constraints. DGE (Descriptor\nGeneration Engine) is the umbrella term for the strategies that control this.\n\nIn the NKI API, there are three concrete strategies---plus an ``unknown`` mode\nthat lets the compiler choose---exposed through the ``nki.isa.dge_mode`` enum.\nThe rest of this document describes each mode, its constraints, and when to use\neach one.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\n\nDGE Modes\n----------\n\n``unknown`` --- let the compiler decide\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.unknown)\n\nThe default. The compiler selects the best mode based on the target hardware,\ntensor shapes, and surrounding instruction schedule. Use this unless you have a\nspecific reason to force a specific mode.\n\n``none`` --- pre-computed descriptors in HBM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.none)\n\nDMA descriptors are pre-computed on the Trainium host **before** NEFF\nexecution. The pre-computed descriptors are stored them in HBM. At runtime the\nDMA engine reads the pre-built descriptor directly---no on-device generation is\nneeded.\n\n**When to use:**\n\n- Fully static transfer patterns where source/destination addresses are known\n  at compile time.\n- When you want to avoid any on-device descriptor generation overhead.\n\n**Trade-offs:**\n\n- Descriptors consume HBM capacity (one per DMA instruction instance).\n- Cannot handle dynamic (runtime-computed) addresses or indices.\n\n``swdge`` --- software DGE\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.swdge)\n\nThe **GpSimd Engine** generates DMA descriptors during NEFF execution. 
\nImportantly, ``swdge`` has additional constraints for indirect transpose:\n\n- ``src.shape[-1] <= 128``\n- ``src.dtype`` must be 2 bytes (``float16`` / ``bfloat16``)\n- ``src`` must be on HBM\n- ``src.shape[0]`` must be divisible by 16\n- When ``src`` is 4D: ``src.shape[1]`` or ``src.shape[2]`` must be 1\n- Index tensor must be 2-D, on SBUF, with dtype ``uint32``\n- ``indices.shape[0]`` must be in ``[16, 128]`` and divisible by 16\n- When ``indices.shape[1] > 1``: ``indices.shape[0]`` must be exactly 128\n- Only available on NeuronCore-v3 (Trainium2) or newer\n\n``hwdge`` --- hardware DGE\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.hwdge)\n\nA dedicated **hardware block** on the NeuronCore generates descriptors on\ndemand, triggered by the Scalar Engine or Sync Engine sequencer. Each TRN2\nNeuronCore has **two DGE instances**.\n\n**When to use:**\n\n- Dynamic or semi-dynamic transfer patterns on NeuronCore-v3+.\n- When GpSimd Engine is busy with other work (avoids ``swdge`` contention).\n- Overlapping descriptor generation with compute via Scalar Engine pipelining.\n\n**Trade-offs:**\n\n- Each hardware-DGE DMA instruction takes approximately **600 ns** to execute.\n- Does **not** support indirect (gather/scatter) operations.\n\nNote that for ``dma_copy`` with ``hwdge``, the ``engine`` parameter can optionally\nselect which sequencer triggers the DGE block:\n\n.. code-block:: python\n\n   # Let Scalar Engine trigger DGE (can overlap with earlier compute)\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.hwdge,\n                 engine=nisa.engine.scalar)\n\n   # Let Sync Engine trigger DGE\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.hwdge,\n                 engine=nisa.engine.sync)\n\nOnly ``nisa.engine.scalar`` and ``nisa.engine.sync`` are valid when\n``dge_mode=hwdge``.\n\nHardware DGE constraints for ``dma_transpose``:\n\n- ``src.shape[0] == 16``\n- ``src.shape[-1] % 128 == 0``\n- ``src.dtype`` must be 2 bytes (``float16`` / ``bfloat16``)\n\n\nMode Selection Summary\n------------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 15 15 15 20 35\n\n   * - Mode\n     - Descriptor Source\n     - Min HW\n     - Indirect Support\n     - Best For\n   * - ``none``\n     - Host (pre-computed in HBM)\n     - Any\n     - No\n     - Fully static patterns, zero on-device overhead\n   * - ``swdge``\n     - GpSimd Engine\n     - Any (indirect: v3+)\n     - Yes\n     - Gather/scatter, dynamic indices\n   * - ``hwdge``\n     - Hardware DGE block\n     - NeuronCore-v3+\n     - No\n     - Dynamic patterns without GpSimd contention\n   * - ``unknown``\n     - Compiler decides\n     - Any\n     - Depends\n     - Default---recommended unless tuning\n\n\nHow ``.ap()`` Affects DGE Mode\n-------------------------------\n\nWhen you use ``.ap()`` with ``vector_offset`` for indirect (gather/scatter)\naccess, the DGE mode is constrained to ``swdge``:\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 35 30 30\n\n   * - Access Pattern\n     - ``dma_copy``\n     - ``dma_transpose``\n   * - Static (no ``.ap()``, or ``.ap()`` without offsets)\n     - Any mode\n     - ``none``, ``hwdge``, or compiler-selected\n   * - ``.ap()`` with ``scalar_offset``\n     - Any mode\n     - Any mode\n   * - ``.ap()`` with ``vector_offset``\n     - ``unknown`` or ``swdge``\n     - ``unknown`` or ``swdge``\n\nIf you specify ``dge_mode=unknown`` (the default) with ``vector_offset``, the\ncompiler will automatically select ``swdge``.\n\nThe ``name`` Parameter\n-----------------------\n\nBoth ``dma_copy`` and ``dma_transpose`` accept an optional ``name`` string:\n\n.. code-block:: python\n\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor, name=\"load_weights\")\n\nThis label appears in profiling traces and compiler debug output. It does not\naffect execution. Assigning meaningful names makes it significantly easier to\nidentify specific DMA operations when analyzing performance with Neuron\nprofiling tools.\n\n\nPerformance Implications\n-------------------------\n\nIn essence, the choice comes down to where you want to spend your overhead\nbudget:\n\n- **``none``** --- Lowest per-transfer latency (descriptor already in HBM), but\n  each descriptor consumes HBM bandwidth on first fetch and HBM capacity\n  permanently.\n- **``swdge``** --- Flexible but uses GpSimd cycles. In GpSimd-bound kernels\n  this can become a bottleneck.\n- **``hwdge``** --- ~600 ns per instruction. When triggered from Scalar Engine,\n  descriptor generation overlaps with earlier compute instructions in the\n  pipeline, effectively hiding the cost. Frees GpSimd for other work.\n- **``unknown``** --- The compiler applies heuristics to pick the best mode for\n  the target and workload. Start here and only override after profiling.\n\nIn summary, use ``unknown`` until profiling tells you otherwise, then\nswitch to the specific mode that addresses the bottleneck you observe.\n\n\nCode Examples\n--------------\n\nStatic copy (no DGE)\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # Pre-computed descriptors — addresses fully known at compile time\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.none,\n                 name=\"static_load\")\n\nSoftware DGE copy with dynamic address\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # GpSimd generates the descriptor at runtime\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.swdge,\n                 name=\"dynamic_load\")\n\nHardware DGE copy\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # Hardware DGE block generates the descriptor (NeuronCore-v3+)\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 dge_mode=nisa.dge_mode.hwdge,\n                 name=\"hwdge_load\")\n\nHardware DGE transpose\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # src must be [16, ...] with last dim divisible by 128, 2-byte dtype\n   nisa.dma_transpose(dst=sbuf_tile, src=hbm_tensor,\n                      dge_mode=nisa.dge_mode.hwdge,\n                      name=\"hwdge_transpose\")\n\nSoftware DGE indirect transpose (gather + transpose)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n   import nki.isa as nisa\n   import nki.language as nl\n\n   # indices is a 2-D uint32 SBUF tensor; src is on HBM\n   # Effectively: dst = src[indices.T.flatten()[:src.shape[0]], :].T\n   P, F = 128, 128\n   src_ap = hbm_tensor.ap(\n       pattern=[[P, F], [1, P]],\n       vector_offset=indices,\n       indirect_dim=0,\n   )\n   nisa.dma_transpose(dst=sbuf_tile, src=src_ap,\n                      dge_mode=nisa.dge_mode.swdge,\n                      name=\"gather_transpose\")\n\nCompiler-selected mode (default)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # Let the compiler pick the best DGE mode\n   nisa.dma_copy(dst=sbuf_tile, src=hbm_tensor,\n                 name=\"auto_load\")\n\n   nisa.dma_transpose(dst=sbuf_tile, src=hbm_tensor,\n                      name=\"auto_transpose\")\n"
  },
  {
    "path": "nki/deep-dives/nki-dma-bandwidth-guide.rst",
    "content": ".. meta::\n   :description: Guidelines for maximizing DMA bandwidth by using large contiguous payloads in NKI.\n   :keywords: NKI, DMA, bandwidth, payload size, AWS Neuron, Trainium\n   :date-modified: 04/12/2026\n\n.. _nki-dma-bandwidth-guide:\n\n=====================================================\nGuideline to Avoid Under-Utilizing DMA Bandwidth\n=====================================================\n\nA common misconception is that the hardware's internal memory interleaving\nremoves the need for large contiguous DMA payloads. In practice, small\nfragmented transfers underperform badly regardless of how the hardware\ndistributes traffic across memory channels. This document clarifies why large\npayloads (≥4 KiB) are required to saturate HBM bandwidth.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\n\nHow HBM Channel Interleaving Works\n-------------------------------------\n\nHBM is organized into multiple independent channels and banks. The hardware\nuses address interleaving to spread DMA traffic across all available channels,\navoiding hot-spots where one channel becomes a bottleneck while others sit\nidle. This achieves higher effective channel utilization and more consistent\nbandwidth across diverse access patterns.\n\nHowever, channel interleaving only solves the *channel utilization* problem.\nIt has no effect on the per-transfer payload size seen by the DMA engines.\n\n\nWhy Large Contiguous DMA Payloads Are Required\n------------------------------------------------\n\nThe fundamental problem is per-packet overhead. Each NeuronCore has 16 DMA\nengines, and every DMA transfer incurs descriptor setup, synchronization, and\nsemaphore-to-start latency (~1300 ns cross-engine). When payloads are small,\nthe engines spend more time on overhead than on data movement, and the DMA\npacket rate---not HBM bandwidth---becomes the limiting factor.\n\nChannel interleaving does **not**:\n\n- Reduce the number of DMA packets required for a given transfer.\n- Remove the need for large contiguous payloads per DMA operation.\n- Eliminate DMA packets-per-second (PPS) bottlenecks caused by small\n  transfers.\n\nChannel utilization and per-engine throughput are independent concerns.\nInterleaving addresses the first; payload size addresses the second.\n\nLarge contiguous payloads (≥4 KiB per partition) amortize this fixed overhead\nand allow each engine to sustain its peak throughput:\n\n=====  ==============  ==============  ==============\n Gen   BW / Engine     Engines / NC    Aggregate BW\n=====  ==============  ==============  ==============\nTRN1   17 B/ns         16              272 GB/s\nTRN2   23 B/ns         16              368 GB/s\nTRN3   33 B/ns         16              528 GB/s\n=====  ==============  ==============  ==============\n\nWith small payloads the engines cannot fill their pipelines, and achieved\nbandwidth drops well below these peaks regardless of how well the hardware\ndistributes traffic across channels.\n\n\nBandwidth vs. Payload Size\n----------------------------\n\nThe relationship between DMA payload size and achieved bandwidth follows a\nsaturation curve:\n\n- **< 256 B per partition:** Severely overhead-bound. Achieved bandwidth is a\n  small fraction of peak.\n- **256 B -- 2 KiB per partition:** Improving but still below peak. 
Per-packet\n  overhead is a significant fraction of transfer time.\n- **≥ 2 KiB per partition (minimum recommended):** Approaches peak bandwidth.\n  The kernel efficiency guide recommends at least 2 KiB of contiguous data per\n  partition for all data types.\n- **≥ 4 KiB per partition (target for full saturation):** Fully amortizes\n  per-packet overhead and saturates the DMA engines.\n\n.. list-table:: Minimum free-dimension sizes for 2 KiB per partition\n   :header-rows: 1\n\n   * - Data Type\n     - Minimum Free Dimension\n     - Bytes per Partition\n   * - float32\n     - 512 elements\n     - 2 048\n   * - bfloat16 / float16\n     - 1 024 elements\n     - 2 048\n   * - float8\n     - 2 048 elements\n     - 2 048\n\n\nPractical Guidance\n--------------------\n\n- **Maximize the free dimension** of every DMA tile. Target ≥4 KiB per\n  partition for peak throughput.\n- **Coalesce transfers.** One large DMA covering multiple logical sub-tiles\n  is faster than many small DMAs to adjacent addresses.\n- **Do not rely on hardware channel interleaving alone** to solve bandwidth\n  problems caused by small or fragmented transfers. Channel utilization and\n  per-engine throughput are independent concerns.\n- **Use full partitions (P=128).** Fewer partitions means fewer engines\n  utilized, compounding the effect of small payloads.\n"
  },
  {
    "path": "nki/deep-dives/nki-dynamic-loops.rst",
    "content": ".. meta::\n   :description: Deep dive into nki.language.dynamic_range for dynamic loop iteration with runtime bounds on AWS Neuron hardware.\n   :keywords: NKI, dynamic_range, hardware loop, runtime bounds, VirtualRegister, AWS Neuron, Trainium\n   :date-modified: 03/31/2026\n\n.. _nki-dynamic-loops:\n\n==================\nNKI Dynamic Loops\n==================\n\nThis document covers the `dynamic_range` NKI language API and describes how it\ncan be used to create on-chip (a.k.a. dynamic) loops.\n\nTo begin, let's look at the `dynamic_range` function which is defined below.\n\n.. py:function:: nki.language.dynamic_range(start, stop=None, step=1)\n   :noindex:\n\n   Create a sequence for **dynamic** loop iteration with runtime bounds.\n\n   :param start: Start value (inclusive), or stop if ``stop`` is ``None``. Can be a ``VirtualRegister``.\n   :param stop: Stop value (exclusive). Can be a ``VirtualRegister``.\n   :param step: Step size. Must be a compile-time positive ``int`` (not a ``VirtualRegister``).\n   :return: An iterator yielding integer values from *start* to *stop*.\n\nThe other NKI range iterators (``affine_range``, ``sequential_range``,\n``static_range``) all require compile-time constant bounds. However, some\nkernels need trip counts determined at execution time on the NeuronCore---for\nexample, when the number of tiles to process is loaded from a tensor or\ncomputed on device. The ``nl.dynamic_range`` iterator supports this use case.\n\nWhen the compiler encounters a ``dynamic_range`` loop it emits a **hardware\nloop instruction** on the device. The loop body is not unrolled; instead, a\nsingle copy of the body is generated and the hardware iterates over it at\nruntime.\n\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\nParameter Constraints\n-----------------------\n\n``start`` / ``stop``\n   Can be Python ``int`` literals **or** ``VirtualRegister`` objects (runtime\n   values computed on device). When only one positional argument is given it is\n   treated as ``stop`` and ``start`` defaults to ``0``, matching the Python\n   ``range()`` convention.\n\n``step``\n   **Must** be a compile-time positive integer. Passing a ``VirtualRegister``\n   raises an ``AssertionError``:\n   The step must be known at compile time because the hardware loop instruction\n   encodes the step as an immediate operand.\n\nComparison with Other Range Iterators\n---------------------------------------\n\nNKI provides four range iterators. The table below summarises their key\ndifferences:\n\n.. list-table::\n   :header-rows: 1\n   :widths: 20 15 15 20 30\n\n   * - Iterator\n     - Bounds\n     - Unrolled?\n     - Generated Code\n     - Primary Use Case\n   * - ``static_range``\n     - Compile-time ``int``\n     - Yes (at compile time)\n     - Fully unrolled---no loop instruction\n     - Default choice---supersedes ``sequential_range`` and ``affine_range``.\n   * - ``sequential_range``\n     - Compile-time ``int``\n     - Yes (at compile time)\n     - Fully unrolled---no loop instruction\n     - Deprecated, formerly for iterations with loop-carried dependencies. Prefer ``static_range`` instead.\n   * - ``affine_range``\n     - Compile-time ``int``\n     - Yes (at compile time)\n     - Fully unrolled---no loop instruction\n     - Deprecated, formerly for parallel iterations with no loop-carried dependency. 
Prefer ``static_range`` instead.\n   * - ``dynamic_range``\n     - Runtime ``VirtualRegister`` or ``int``\n     - **No**\n     - **Hardware loop instruction**\n     - Trip count unknown at compile time\n\nThere are two key distinctions worth calling out:\n\n- ``static_range``, ``affine_range``, and ``sequential_range`` require all bounds to be\n  compile-time integers. The compiler keeps them as loops internally but may\n  unroll them in the backend. ``dynamic_range`` bounds can be\n  runtime values and the loop is **never** unrolled.\n- ``static_range``, ``affine_range``, and ``sequential_range`` fully unroll at compile time, which can dramatically increase\n  compilation time; ``dynamic_range`` avoids this entirely.\n\nHardware Lowering\n-------------------\n\nThe compiler lowers ``dynamic_range`` loops to hardware loop instructions on\nthe NeuronCore. Because the loop exists as a single hardware instruction with a body:\n\n- The compiled artifact size does **not** grow with the trip count.\n- The loop variable is a device register, not a Python ``int``. You cannot use\n  it in host-side Python expressions (e.g., ``if i == 0:``). Use NKI\n  device-side operations for any conditional logic that depends on the loop\n  variable.\n\nRegister Allocation Implications\n----------------------------------\n\nInside a ``dynamic_range`` loop the compiler must keep all live tensors in\non-chip memory (SBUF/PSUM) for the **entire duration** of the loop, because\nthe hardware re-executes the same body on each iteration. This means:\n\n- Tensors allocated inside the loop body are allocated once and reused across\n  iterations.\n- Keeping the loop body small and limiting the number of live tiles reduces\n  memory pressure.\n\nIn contrast, ``static_range`` unrolls each iteration independently, giving the\ncompiler full freedom to schedule instructions across the flattened instruction\nstream. However, this does not solve the issue when the trip count is unknown\nat compile time---which is precisely when ``dynamic_range`` is needed.\n\nInteraction with ``no_reorder``\n---------------------------------\n\n``dynamic_range`` loops inside a ``nl.no_reorder()`` block are not currently\nsupported.\n\n.. code-block:: python\n\n   # ✗ This is NOT supported and will error\n   with nl.no_reorder():\n       for i in nl.dynamic_range(n):\n           ...\n\n``affine_range``, ``sequential_range``, and ``static_range`` are all permitted\ninside ``no_reorder`` blocks.\n\nTo work around this, place the ``no_reorder`` block inside the loop body:\n\n.. code-block:: python\n\n   # ✓ no_reorder inside the dynamic loop body\n   for i in nl.dynamic_range(n):\n       with nl.no_reorder():\n           ...\n\nUsing ``while`` with a ``VirtualRegister``\n--------------------------------------------\n\nAs an alternative to ``dynamic_range``, you can use a standard ``while`` loop\nwith a ``VirtualRegister`` as the condition. The loop terminates when the\nregister holds the value ``0``.\n\n.. 
code-block:: python\n\n   import nki.language as nl\n   import nki.isa as nisa\n\n   reg = nisa.register_alloc(1)\n   while reg:\n       # perform work ...\n\n       # update condition from an SBUF tensor\n       nisa.register_load(reg, cond_tensor)\n\n\nWhen to Use ``dynamic_range``\n-------------------------------\n\nUse ``dynamic_range`` when:\n\n- The number of iterations is **not known at compile time**---for example, it\n  depends on a value loaded from a tensor or computed on device.\n- The trip count is **large** and unrolling (``static_range``, ``affine_range``, or ``sequential_range``) would cause\n  excessive compilation time or code size.\n\nPrefer other iterators when:\n\n- Bounds are compile-time constants and iterations are independent, contain loop-carried dependencies, or need full unrolling → \n  ``static_range``, ``affine_range``, or ``sequential_range``.\n\nExamples\n----------\n\nBasic usage with a constant bound\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.language as nl\n   import nki.isa as nisa\n\n   for _ in nl.dynamic_range(1):\n       tile = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n       result = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n       nisa.dma_copy(src=input_tensor[0:128, 0:512], dst=tile)\n       nisa.tensor_tensor(dst=result, data1=tile, data2=tile, op=nl.multiply)\n       nisa.dma_copy(src=result, dst=out_tensor[0:128, 0:512])\n\nEven with a constant bound, this generates a hardware loop instruction rather than unrolling.\n\nRuntime trip count from a ``VirtualRegister``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.language as nl\n   import nki.isa as nisa\n\n   start = nisa.register_alloc(0)\n   stop = nisa.register_alloc(512)\n   for i in nl.dynamic_range(start, stop, 128):\n       tile = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n       result = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.sbuf)\n       nisa.dma_copy(src=input_tensor.ap([[512, 128], [1, 512]], scalar_offset=i), dst=tile)\n       nisa.tensor_scalar(dst=result, data=tile, op0=nl.add, operand0=2.0)\n       nisa.dma_copy(src=result, dst=out_tensor.ap([[512, 128], [1, 512]], scalar_offset=i))\n\n\nSpecifying start, stop, and step\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.language as nl\n   import nki.isa as nisa\n\n   # Loop from `begin` to `end` with step 2\n   # begin and end are VirtualRegisters; step must be a compile-time int\n   begin = nisa.register_alloc(0)\n   end = nisa.register_alloc(4)\n   for i in nl.dynamic_range(begin, end, 2):\n       ...\n"
  },
  {
    "path": "nki/deep-dives/nki_perf_guide.rst",
    "content": ".. _nki_perf_guide:\n\nNKI Performance Optimizations\n=============================\n\nIn this document, we describe a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations\nto address such bottlenecks. During this process, we will showcase how to leverage :doc:`neuron-profile </nki/guides/use-neuron-profile>`,\na GUI-based performance profiler designed for NeuronDevices, to guide your performance optimization efforts. Before proceeding\nwith this document, make sure to read through :doc:`NeuronDevice Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>`\nto familiarize yourself with Neuron hardware architecture.\n\nIdeally, performance optimization efforts would end with one of two possible outcomes: the execution of a NKI kernel is\neither strictly **compute-bound** or **memory-bound**. In the context of NeuronDevices, compute-bound means at least one\nof the compute engines is active close to 100% of the kernel execution time (90%+ is considered good in practice),\nwhile memory-bound typically means the achieved device memory bandwidth utilization (MBU) is close to 100% (60%+\nis considered good in practice). For compute-bound kernels that are matrix-multiplication dominated, we should also aim\nfor close to 100% model flops utilization (MFU) in the execution. All of these metrics are available under the ``Summary``\ntab in ``neuron-profile`` GUI:\n\n.. _perf_guide_mbu:\n\n.. figure:: /nki/img/nki_perf_guide/fig1.png\n   :align: center\n   :width: 60%\n\n   MBU metric in neuron-profile.\n\n.. _perf_guide_compute_metrics:\n\n.. figure:: /nki/img/nki_perf_guide/fig2.png\n   :align: center\n   :width: 60%\n\n   Compute-related metrics in neuron-profile.\n\nThe rest of this document is divided into three sections, focusing on three categories of performance optimizations. The\nfirst section covers optimizations to maximize achieved arithmetic intensity, with the goal of minimizing compute engine\nidle periods due to unnecessary data movement. The second and third sections dive into optimizations to improve compute\nengine and data movement efficiency, respectively.\n\nImproving Arithmetic Intensity\n------------------------------\n\nArithmetic intensity of a computation workload is commonly defined as the average number of computation operations performed\nper byte of data accessed from memory. In the context of NeuronDevices, the definition refers to data accessed from *device\nmemory* (HBM), since the on-chip memory (SBUF) has sufficient bandwidth to keep all compute engines busy.\n\nWhen arithmetic intensity is overly low, compute engines would be consuming data much faster than DMA engines fetching data\nfrom device memory into the on-chip memory SBUF. In this case, the execution is bounded by the available device memory bandwidth.\nOnce arithmetic intensity is beyond certain threshold, that is, ratio of maximum compute throughput over memory bandwidth,\nthe performance bottleneck shifts to how fast compute engines can perform computation, which leads to a compute-bound execution.\n\nFigure below visualizes the `Roofline Model <https://en.wikipedia.org/wiki/Roofline_model#:~:text=The%20roofline%20model%20is%20an,benefit%20and%20priority%20of%20optimizations.>`_\\\n, which captures this idea by plotting the projected attainable compute throughput with respective to the arithmetic intensity\nof an algorithm.\n\n\n.. _perf_guide_roof:\n\n.. 
figure:: /nki/img/nki_perf_guide/fig3.png\n   :align: center\n   :width: 50%\n\n   The Roofline Model.\n\n*Algorithmic* arithmetic intensity is an intrinsic characteristic of the particular workload and solely dependent on the\ncompute algorithm. In reality, due to limited capacity in SBUF, the *achieved* arithmetic intensity of a NKI kernel implementation\nof such workload could be lower than the algorithmic arithmetic intensity. This could lead to excessive compute engine idle\ntime blocked by completion of data movements. The two typical reasons behind this are *input data reloading* and *intermediate\ndata spillage*. Let's discuss how to identify their symptoms in ``neuron-profile`` and how to mitigate these issues to improve\narithmetic intensity next.\n\n.. _perf_guide_temporal_locality:\n\nOpt #1. Exploit temporal locality to minimize input data reloading\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n\n**Symptom**: In neuron-profile, if a NKI kernel triggers DMAs (\\ ``nl.load``\\ ) for the same input tensor multiple times,\nyou would see the relevant DMA activities (on the timeline row with a label starting with ``q`` and ending with ``IO``\\\n) being highlighted in an orange box. Hovering over the “+” sign of the box in top-left corner, a performance warning pop-up\nwill show up, indicating which input tensor is being reloaded, the size of it and how many times it was reloaded. For example,\nfigure below is a screenshot of such warning pop-up showing the ``u`` input tensor defined in my NKI kernel was reloaded\n~7 times:\n\n\n.. _perf_guide_input_reload_warning:\n\n.. figure:: /nki/img/nki_perf_guide/fig4.png\n   :align: center\n   :width: 50%\n\n   Performance warning on input data reloading.\n\n**Optimization**: Input tensor reloading could be avoided if the same data stay in SBUF across all the operations that consume\nit at different points of the execution. However, keeping too much data in SBUF across operations can increase the memory\npressure in SBUF, leading to more spilling of intermediate data. Therefore, avoiding input reload should be a trade-off\nprogrammers need to make carefully. Figure below illustrates this trade-off conceptually.\n\n\n.. _perf_guide_input_reloading:\n\n.. figure:: /nki/img/nki_perf_guide/fig5.png\n   :align: center\n   :width: 70%\n\n   SBUF usage impact with and without input reloading.\n\nA classic example of using this optimization technique is in a matrix multiplication kernel, where we need to exploit data\nreuse in the same rows of the left hand-side input matrix across different columns of the right hand-side matrix. See\n:doc:`Matmul NKI Tutorial Optimization 1-3 </nki/guides/tutorials/matrix_multiplication>` for more\ndetailed discussion. Another great example is in the :ref:`Fused Mamba <tut_mamba_loop_reordering>`\nkernel tutorial, where programmers can minimize reloading of largest input tensors through loop reordering.\n\n.. _perf_guide_opt2:\n\nOpt #2.  Fuse operations to minimize intermediate data spilling\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptom**: In ``neuron-profile`` , we can find many useful data movement related metrics in the ``Summary`` tab:\n\n\n.. _perf_guide_summary:\n\n.. 
figure:: /nki/img/nki_perf_guide/fig6.png\n   :align: center\n   :width: 60%\n\n   ``neuron-profile`` Summary tab.\n\nBelow we highlight four relevant metrics to assess severity of data spilling under the ``data_movement`` section (tip: hovering\nover any metric name will show a detailed description of the metric):\n\n.. _perf_guide_data_metrics:\n\n.. figure:: /nki/img/nki_perf_guide/fig7.png\n   :align: center\n   :width: 60%\n\n   Data movement metrics\n\nHere, ``spill_save_bytes`` refers to the total size of intermediate data in bytes the workload spills from SBUF into device\nmemory, while ``spill_reload_bytes`` indicates the total size of spilled data in bytes the workload reloads back into SBUF.\nBy comparing ``spill_save_bytes`` against ``sb_read_bytes``\\ , you can get a feel for how much of the data movement traffic\nfrom SBUF to device memory is related to spilling. Similarly, comparing ``spill_reload_bytes`` against ``sb_write_bytes``\nindicates how much of the traffic from device memory back to SBUF is related to spilling. If the spill-related traffic takes\nup a significant portion (for example over 30%), it is likely worthwhile to take a close look at this optimization.\n\n**Optimization**: To reduce spilling, the key is to find operator fusion opportunities in the kernel. To achieve fusion, we\ntypically also need to slice up computation of each operator and perform computation for a portion of the input tensor at\na time. As a simple example, assume a chain of operators ``op0 → op1`` on a large input tensor ``kernel_in_hbm`` that cannot\nfit in SBUF all at once. If we were to do the operators one at a time, we will effectively have the following sequence of\nevents:\n\n.. code-block::\n\n   for tile in kernel_in_hbm:\n       tile_sbuf = load(tile)\n       op0_out_sbuf = op0(tile_sbuf)\n       # compiler generated spilling, or NKI programmers explicitly perform a store\n       spill_save(op0_out_sbuf, op0_out_hbm)\n\n   for tile in op0_out_hbm:\n       tile_sbuf = spill_reload(tile)\n       op1_out_sbuf = op1(tile_sbuf)\n       store(op1_out_sbuf, kernel_out_hbm)\n\nHowever, if we fuse the operators from above:\n\n.. code-block::\n\n   for tile in kernel_in_hbm:\n       tile_sbuf = load(tile)\n       op0_out_sbuf = op0(tile_sbuf)\n       op1_out_sbuf = op1(op0_out_sbuf)\n       store(op1_out_sbuf, kernel_out_hbm)\n\nInside a NKI kernel, operator fusion is done exactly as above, through explicit loop fusion.\n\nOne great use of this optimization is the self attention operator commonly found in Transformer models. Self attention performs\na chain of operators: matmul_0 → softmax → matmul_1, where matmul_0 of a single attention head produces a large intermediate\ntensor that overflows SBUF in common Transformer models with a context length in the thousands.\n\nConsider caching as described in :ref:`Exploit temporal locality to minimize input data reloading <perf_guide_temporal_locality>` if there \nare no opportunities for operation fusion. If caching is already implemented with blocking, then an oversized block size might be causing spills. \nRefer to :doc:`Matmul NKI Tutorial Optimization 1-3 </nki/guides/tutorials/matrix_multiplication>` for more details on block sizing.\n\n\n**Optimization Gotchas**:\nCertain code patterns in NKI might lead to unexpected spilling from programmers' perspectives. We are working on improving\nthese in future releases. 
As an example, buffers sometimes need to be declared within the inner loop to avoid spilling.\nIn other words, instead of:\n\n.. code-block::\n\n\n   buf = nl.ndarray((2, 4, nl.par_dim(128), 512), buffer=nl.sbuf)\n   for i0 in nl.affine_range(2):\n     for i1 in nl.affine_range(4):\n        buf[i0, i1, ....] = nl.load(...)\n        ...\n\nwe need to implement:\n\n.. code-block::\n\n   for i0 in nl.affine_range(2):\n     for i1 in nl.affine_range(4):\n        buf = nl.ndarray((nl.par_dim(128), 512), buffer=nl.sbuf)\n        buf[...] = nl.load(...)\n\nWith the above aforementioned optimizations, the kernel execution should achieve an arithmetic intensity that is somewhat\nclose to the algorithmic arithmetic intensity. At this point, you should be able to observe from the execution timeline\nin ``neuron-profile`` whether the kernel spends more time in compute or DMA engines. The ``engine/dma_active_time_percent``\nmetrics reported in the Summary tab should also give you good hints. If your kernel execution is dominated by computation,\nwe recommend going over :ref:`Optimizing Compute Efficiency <perf_guide_compute>`\nfirst to optimize compute efficiency. Otherwise, jump straight to :ref:`Optimizing Data Movement Efficiency <perf_guide_memory>`\nto understand how to optimize data movement efficiency.\n\n\n.. _perf_guide_compute:\n\nOptimizing Compute Efficiency\n-----------------------------\n\nCompute efficiency optimizations typically fall into two categories:\n\n\n#. “time” domain engine utilization: reduce engine idle time to keep the compute engine *on critical path* as busy as possible,\n   such as enabling pipelining among engines.\n#. “spatial” domain engine utilization: within the engine active periods, increase instruction efficiency to use as many\n   hardware units within the engine as possible, such as combining multiple instructions into one.\n\nLet's dive into each category below.\n\nReducing engine idle time\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo improve the active time of a compute engine, we need to understand the exact reasons for the engine to enter an idle\nstate. In neuron-profile, we can focus on the execution trace of the bottlenecked engine and zoom into the visually large\nengine idle gaps. For example, in the below profile, we expect VectorE to be the bottlenecked engine and therefore focus\non the idle gaps on VectorE:\n\n.. _perf_guide_engine_idle:\n\n.. figure:: /nki/img/nki_perf_guide/fig8.png\n   :align: center\n   :width: 100%\n\n   Engine idle gaps.\n\n*Side note*\\ , for faster GUI rendering, neuron-profile enables data sampling by default and “hides” certain instructions\nfrom the timeline with a large profile. To confirm whether an engine indeed has an idle gap, we recommend zooming into a\nsmaller region of the profile and turn on “Show unsampled data” in ``View Edit Settings`` to make sure all instructions\nare rendered:\n\n.. _perf_guide_unsampled:\n\n.. figure:: /nki/img/nki_perf_guide/fig9.png\n   :align: center\n   :width: 100%\n\n   Show unsampled data in neuron-profile.\n\nFor each engine idle gap, you can find out the reasons why the engine cannot execute instructions by inspecting the **semaphore\nwait condition** of the first instruction executed on the engine after the gap. Broadly speaking, these semaphore wait conditions\nare either waiting for 1) other compute engine instructions or 2) DMA activities to finish. 
We have different techniques\nto shrink the idle gaps caused by either of these wait conditions (that is, engine stall reasons).\n\n.. _perf_guide_opt3:\n\nOpt #3.  Overlap execution across compute engines through pipelining\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: The semaphore wait condition of the first instruction after an idle gap is on a semaphore name that matches a\ncompute engine name in NeuronCore: Vector, Scalar, GpSimd and Tensor. These semaphores are associated with instruction completion\non the corresponding compute engine.\n\nFor example, the below ``TENSOR_TENSOR`` instruction on VectorE is waiting for ``S[4] (Scalar)`` to reach a value of 36.\nThis means VectorE was waiting for ScalarE to finish certain instructions.\n\n.. _perf_guide_wait_engine:\n\n.. figure:: /nki/img/nki_perf_guide/fig10.png\n   :align: center\n   :width: 100%\n\n   Semaphore wait on another compute engine.\n\n**Optimization**: When there is a sequence of operators on different compute engines, we can slice the computation in a way\nthat the compute engines can process tiles of the original operator in a pipeline fashion. As an example, let’s assume we\nhave two operators back to back on a large (say, thousands of elements) tensor ``X``\\ : ``X → op0 → Y → op1 → Z``. ``op0``\nis performed on ScalarE while ``op1`` is on VectorE. For simplicity, let’s assume tensors ``X/Y/Z`` have the same shape.\n\nThe figure below shows two possible execution timelines with and without engine pipelining. Without pipelining, VectorE is fully\nidle when ScalarE is executing ``op0`` on tensor ``X`` in the first half of the execution. Similarly, ScalarE is idle while\nVectorE is running ``op1``. However, with pipelining, ScalarE is able to produce partial results in tiles and unblock VectorE\nas soon as the first tile is processed. Overall, engine pipelining shortens the end-to-end latency to complete ``op0`` and\n``op1``\\ , by shrinking engine idle time and improving hardware utilization.\n\n.. _perf_guide_engine_pipe:\n\n.. figure:: /nki/img/nki_perf_guide/fig11.png\n   :align: center\n   :width: 80%\n\n   Engine timeline with and without engine pipelining.\n\nChoosing a proper tile size is crucial to the performance of such engine pipelining. It is up to NKI programmers to make\nthis choice in kernel implementation and iterate on it using performance profiling data in neuron-profile. For complex kernels,\nwe often need to schedule a pipeline among all engines: Tensor/Scalar/Vector/GpSimd Engine.\n\nFor example, in Transformer's self-attention layer, in addition to fusing matmul_0(Q, K) → softmax → matmul_1(softmax_out,\nV) in a single kernel to minimize spilling as discussed in :ref:`Opt #2 <perf_guide_opt2>`,\nwe also need to form a complex engine pipeline for the operators to maximize utilization of the compute engines:\n\n\n* matmul_0/matmul_1: TensorE\n* softmax:\n\n  * exponential: ScalarE\n  * summation: VectorE\n  * scale by reciprocal of summation: ScalarE\n  * for causal self attention, triangular masking: GpSimdE\n\n\n.. _perf_guide_opt4:\n\nOpt #4.  Overlap data loading with computation\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: The semaphore wait condition of the first instruction after an idle gap is on a semaphore name that starts with\nthe letter ``q``. These semaphores are associated with completion of DMA activities.\n\nFor example, hovering over an instruction will bring up the key instruction details as follows:\n\n.. 
_perf_guide_wait_input:\n\n.. figure:: /nki/img/nki_perf_guide/fig12.png\n   :align: center\n   :width: 100%\n\n   Instruction waiting for input data loading.\n\nIn this particular screenshot, the ``EVENT_SEMAPHORE`` instruction could not start earlier even though VectorE was idle\nbecause it was waiting for semaphore S[22] (\\ ``qSyncIO0``\\ ) to reach a value of 240. The semaphore is only incremented\nwhenever the corresponding DMA activities shown on the ``qSyncIO0`` execution trace are completed. Clicking on the DMA activities\non ``qSyncIO0`` immediately before the ``EVENT_SEMAPHORE`` instruction, you may follow the ``nki_source_location`` to find\nout which line of code is related to this DMA activity (\\ ``nl.load()`` call).\n\nSimilarly, if an instruction is blocked on ``S[47] (qSyncSpillReload0)``, that means it is blocked by DMA activities for\nspilling:\n\n.. _perf_guide_wait_spill:\n\n.. figure:: /nki/img/nki_perf_guide/fig13.png\n   :align: center\n   :width: 100%\n\n   Instruction waiting for spilled data reloading.\n\nClicking on the DMA activities on ``qSyncSpillReload0`` immediately before the ``EVENT_SEMAPHORE`` instruction, you may\nfind out the name of the intermediate NKI tensor that was spilled/reloaded. For example, the below DMA transfer reloads\nthe tensor named ``deltaU`` as defined in our NKI kernel. Note, spill/reload DMA transfers are generated by Neuron Compiler\nautomatically by analyzing SBUF usage in NKI kernels. Therefore, these DMA transfers do not have an associated explicit\nNKI API call or ``nki_source_location`` information.\n\n.. _perf_guide_spill_variable:\n\n.. figure:: /nki/img/nki_perf_guide/fig14.png\n   :align: center\n   :width: 60%\n\n   Spilled tensor variable name.\n\n**Optimization**: Overlapping data loading with compute is highly similar to enabling compute engine pipelining in :ref:`Opt #3 <perf_guide_opt3>`,\nsince DMA engines can move data in parallel with compute engine execution, just like how compute engines can run different\noperators in parallel.\n\n.. _perf_guide_overlap_comp_mem:\n\n.. figure:: /nki/img/nki_perf_guide/fig15.png\n   :align: center\n   :width: 80%\n\n   DMA and engine timeline with and without overlapping.\n\nHowever, it is also possible that even after overlapping compute and data movement as much as possible, the data\nmovement duration is still not hidden behind compute even though your kernel has a compute-bound arithmetic intensity. In\nthese cases, the most common cause is that the data movement in your kernel is not using the DMA engines *efficiently*. Refer\nto a :ref:`later section <perf_guide_memory>` to\nsee relevant optimization techniques to improve DMA bandwidth utilization.\n\nAs a concrete example, we demonstrate how to properly overlap compute and data movement in a compute-bound (VectorE as the\nbottlenecked engine) kernel in the :ref:`Mamba tutorial <tut_mamba_tiling>`.\n\nImproving engine efficiency\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOnce done with “avoiding engine idle gaps” as much as possible, we can focus on improving “engine efficiency” during the\nbusy periods of the engine. We will start with two optimization techniques that are generally applicable to all compute\nengines, followed by TensorE-specific optimization techniques.\n\nOpt #5a: Use sufficiently large input tiles in free dimension\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: Certain operators might trigger many back-to-back instructions with small free dimension sizes in the input\ntensors. 
For example, in the below profile, ScalarE is busy with many repeated ``activation`` instructions with the IDENTITY\n(scale/bias enabled) activation function, which is equivalent to calling the ``nki.isa.tensor_scalar(op0=nl.multiply, op1=nl.add)``\nAPI. If you click on one of the instructions to pull up the detailed instruction view, you can see the source tensor access\npattern is ``fp32@20580[1,1,1][1,1,1]``, where the first set of brackets indicates 3D strides and the second set indicates\n3D shape in FP32 elements. A more detailed discussion of ISA access patterns can be found by clicking on the ``i`` button at\nthe end of the ``Operands`` row.\n\nIn this example, each of the back-to-back instructions is reading **one** element per partition from SBUF, which would take\nabout one engine cycle to perform useful computation within the instruction. Such instructions are extremely inefficient\nsince the static instruction overhead on the order of ~100 cycles would limit the overall throughput.\n\nTo make things worse, these instructions also have data dependencies (read after write) between consecutive instructions,\nwhich means the next instruction cannot start data read until the previous instruction has all of its output committed to\nthe local SRAM. In neuron-profile, you can inspect data dependencies between instructions by clicking on an instruction of\ninterest (\\ ``Inst1`` in the below profile), which will highlight the clicked instruction and also the instruction that\nproduces input for the clicked instruction (\\ ``Inst0`` in the below profile). The dependency information can also be viewed\nin the instruction details under “instruction dependency pcs”. In fact, all the neighboring instructions also have a similar dependency pattern\nin this profile.\n\nWith the above inefficiencies, the initiation interval (the time between the starting points of two consecutive instructions)\nfor these instructions on ScalarE is around ``189 ns (264 ScalarE cycles on NC-v2)``, which is much higher than the useful\ncomputation cost (one ScalarE cycle throughput-wise).\n\n.. _perf_guide_small_instr:\n\n.. figure:: /nki/img/nki_perf_guide/fig16.png\n   :align: center\n   :width: 100%\n\n   Many back-to-back ScalarE instructions with small tensor shapes\n\n**Optimization**: The trick of this optimization is to increase the free dimension size of instruction input tiles. As discussed\nin the :doc:`architecture guide </nki/guides/architecture/trainium_inferentia2_arch>`, NeuronCore compute engines\ntypically require at least 128 elements/partition in the source tensor to be efficient. However, it is worth mentioning\nthat increasing free dimension sizes might not be trivial due to the high-level computation definition. 
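Still, as a minimal sketch of the basic pattern (``in_sbuf`` and ``out_sbuf`` below are hypothetical, pre-allocated ``[128, 512]`` SBUF tiles, ``scale`` and ``bias`` are ``[128, 1]`` tiles, and the read-after-write dependency of the profiled example above is ignored), compare issuing one tiny instruction per element against a single instruction covering the whole free dimension:\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n\n   # Inefficient: 512 back-to-back instructions, each reading 1 element/partition\n   for i in nl.affine_range(512):\n       out_sbuf[0:128, i] = nisa.tensor_scalar(in_sbuf[0:128, i], op0=nl.multiply, operand0=scale,\n                                               op1=nl.add, operand1=bias)\n\n   # Better: a single instruction reading 512 elements/partition\n   out_sbuf[0:128, 0:512] = nisa.tensor_scalar(in_sbuf[0:128, 0:512], op0=nl.multiply, operand0=scale,\n                                               op1=nl.add, operand1=bias)\n\nThe second form amortizes the fixed per-instruction overhead over 512 elements per partition instead of one. 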
We suggest developers\nwalk through the :doc:`architecture guide </nki/guides/architecture/trainium_inferentia2_arch>` in detail to better understand the capabilities of\ndifferent compute engines, and map/reformulate the high-level operators onto the engines using the most suitable instructions.\nSuch instructions can be invoked either through the high-level ``nki.language`` or low-level\n``nki.isa`` APIs.\n\nIn addition, keep in mind there is a trade-off in choosing the free dimension size in instruction input tiles: too small\nof a tile size exposes significant instruction overhead leading to inefficient engine execution, while too large of a tile\nsize often leads to inefficient pipelining between engines (working against :ref:`Opt #3 <perf_guide_opt3>`)\nand high memory pressure in SBUF (working against :ref:`Opt #2 <perf_guide_opt2>`).\n\nAs an example, a naive implementation of the prefix sum scan operation in Mamba v1 would trigger ``seq_len`` back-to-back\nsingle-element ``nki.isa.tensor_scalar`` instructions as shown in the above profile example, where ``seq_len`` is the sequence\nlength of the model, typically in the range of thousands. A more efficient way to implement this operation is through a special\nVectorE instruction ``nisa.tensor_tensor_scan``.\nSee the :doc:`Mamba tutorial </nki/guides/tutorials/fused_mamba>` for more discussion.\n\nOpt #5b: Use sufficiently large input tiles in partition dimension\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: When instructions use input/output tiles that span fewer than 128 partitions, they typically under-utilize\nthe compute engine capabilities. This is because each SBUF/PSUM partition has a one-to-one mapping to parallel vector lanes\nin the compute engines. As an example, the ``TENSOR_TENSOR`` instruction (equivalent to ``nki.isa.tensor_tensor``\\ ) on VectorE\ntakes a source tensor in SBUF that occupies 64 partitions only, as indicated by the ``channels=64`` instruction operand\nfield. If we were to increase the ``channels`` field to 128, the instruction would take the same amount of time as with\n``channels=64``.\n\n.. _perf_guide_le128_part:\n\n.. figure:: /nki/img/nki_perf_guide/fig17.png\n   :align: center\n   :width: 70%\n\n   An instruction that reads/writes fewer than 128 partitions.\n\n\nSimilarly, for a ``MultiplyMoving`` instruction (Matmul opcode in neuron-profile) on TensorE, if the tiles the instruction reads/writes\ndo not span the full SBUF/PSUM partitions, we would be underutilizing TensorE. As an example, the below ``MultiplyMoving``\ninstruction only writes to 96 partitions in PSUM, as indicated by the operand ``128*96``\\ , which means the instruction\nonly uses 128 rows and 96 columns of the processing elements out of the available 128x128 systolic array.\n\n.. _perf_guide_le128_col:\n\n.. figure:: /nki/img/nki_perf_guide/fig18.png\n   :align: center\n   :width: 70%\n\n   MultiplyMoving instruction that uses <128 TensorE columns\n\n\n**Optimization**:\nIf we see **many back-to-back instructions** on the compute engine that have fewer than 128 partitions in the input/output\ntiles as discussed above, we should consider an optimization called “partition vectorization”.\n\nAs an example, say we have two ``nki.isa.nc_matmul()`` instructions, each generating a 64-partition PSUM tile of the\nsame shape. Then VectorE needs to run ``nki.isa.tensor_reduce()`` on both tiles to generate a reduction result. 
Note, on\ntrn1/inf2, VectorE cannot run the two independent ``nki.isa.tensor_reduce()`` instructions in parallel in this case, even\nthough the total number of compute lanes required for these instructions does not exceed 128. To improve VectorE utilization\nin this case, we can:\n\n\n#. Have the two ``nc_matmul()`` instructions write to disjoint PSUM partitions: partition 0-63 for the first ``nc_matmul`` and\n   partition 64-127 for the second one.\n#. Invoke a single ``nki.isa.tensor_reduce()`` instruction to process the output of both ``nki.isa.nc_matmul()`` instructions.\n\nThe below pseudo-code illustrates the above computation without and with partition vectorization.\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n   import numpy as np\n\n   ################################################################\n   # option 1: No partition vectorization\n   # two 64-partition vector instructions running serially\n\n   # By default, NKI creates mm_tile0 and mm_tile1 in partitions 0-63\n   mm_tile0 = nisa.nc_matmul(...)\n   mm_tile1 = nisa.nc_matmul(...)\n\n   # Both nisa.tensor_reduce instructions move data from PSUM partitions 0-63\n   # in a serialized fashion\n   reduce0 = nisa.tensor_reduce(mm_tile0, ...)\n   reduce1 = nisa.tensor_reduce(mm_tile1, ...)\n\n   ################################################################\n   # option 2: Partition vectorization\n   # vectorized into one 128-partition vector instruction\n\n   # Here, we explicitly declare a 128-partition tensor in PSUM\n   mm_tile = nl.zeros((128, ...), np.float32, buffer=nl.psum)\n\n   i_output0_p = nl.arange(64)[:, None]\n   i_output1_p = 64 + nl.arange(64)[:, None]\n   # Assign first part of mm_tile to partitions 0-63\n   mm_tile[i_output0_p, ...] = nisa.nc_matmul(...)\n   # Assign second part of mm_tile to partitions 64-127\n   mm_tile[i_output1_p, ...] = nisa.nc_matmul(...)\n\n   # A single nisa.tensor_reduce instruction, using all 128 partitions\n   reduce = nisa.tensor_reduce(mm_tile, ...)\n\nOption #2 above is able to perform the reduction 2x faster, by vectorizing the partition dimension and performing a single\nreduction instead of two.\n\nOpt #6: Combine instructions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: Even though the majority of popular ML models are matrix multiplication heavy, certain operators can be vector/scalar\noperation heavy instead, such as self-attention in Transformer models. These operators typically have a performance bottleneck\nin VectorE or ScalarE or both. As an example, the below profile shows the inner loop of self attention, where either VectorE\nor ScalarE is busy at any moment in time, while TensorE has clear engine idle gaps.\n\n.. _perf_guide_vector_scalar_bound:\n\n.. figure:: /nki/img/nki_perf_guide/fig19.png\n   :align: center\n   :width: 100%\n\n   A VectorE/ScalarE-bound profile.\n\n**Optimization**: A common optimization to tackle vector/scalar-operation-heavy operators is **combining instructions** using\nlow-level ``nki.isa`` APIs. Combining instructions can leverage the deep pipelined stages within the VectorE and ScalarE engine\ndata paths to increase hardware utilization per instruction and reduce the instruction count. Check out the\n:doc:`architecture guide </nki/guides/architecture/trainium_inferentia2_arch>` to learn what operations can be done in a pipeline fashion\nin a single VectorE/ScalarE instruction.\n\nFor example, the below pseudo-code showcases combining three instructions into a single one on ScalarE. 
``impl 1`` and ``impl\n2`` are functionally equivalent, but ``impl 2`` is 3x faster in terms of latency by touching the input ``data`` only once\nand running all three operations (multiply, add, exp) in a pipeline.\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n\n   # input: data (tile[128, 512]), scale (tile[128, 1]), bias (tile[128, 1])\n\n   # impl 1:\n   scaled = nl.multiply(data, scale)\n   shifted = nl.add(scaled, bias)\n   exp = nl.exp(shifted)\n\n   # impl 2:\n   exp = nisa.activation(nl.exp, data, bias=bias, scale=scale)\n\nCheck out :doc:`nki.isa APIs </nki/api/nki.isa>`\nto understand low-level ISA API semantics, limitations, engine mapping, and rough estimates of performance cost.\n\nSee the :doc:`Fused Mamba </nki/guides/tutorials/fused_mamba>` tutorial for a concrete example of combining\nmatrix-vector multiplication and exponential evaluation in a single ``nisa.activation`` instruction.\n\nOpt #7: TensorE only: Leverage fast weight load\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: Let's consider a matrix multiplication between two matrices of shape ``[M, K]`` and ``[K, N]``\\ , with one of\nthe following conditions:\n\n\n#. M is significantly smaller than 128, while N is much larger than 128, or\n#. the other way around: N is significantly smaller than 128, while M is much larger than 128\n\nIn NKI, if the matrix with the ``min(M, N)`` dimension is mapped to the **stationary tensor** (\\ ``x`` input tensor in ``nl.matmul``\nand ``nisa.nc_matmul``\\ ) for the TensorE ``LoadStationary`` instruction (see the :ref:`architecture guide <arch_guide_tensor_engine>`\nfor details), we will typically end up under-utilizing TensorE more severely compared to mapping such a matrix to the **moving tensor**.\n\nIn ``neuron-profile``\\ , programmers can also identify this inefficient case by inspecting the ``src`` access patterns for\nLoadStationary and MultiplyMoving instructions on TensorE. For example, the below screenshot indicates a stationary tensor\nwith 1 element per partition and a moving tensor with 128 elements per partition:\n\n\n.. _perf_guide_matrix_vector_instr:\n\n.. figure:: /nki/img/nki_perf_guide/fig20-21.png\n   :align: center\n   :width: 100%\n\n   Example instructions for matrix-vector multiplication.\n\nIf you have many back-to-back TensorE instructions with the above pattern, we recommend applying the below optimization.\n\n**Optimization**: The key idea of this optimization is to simply swap the stationary and moving tensor positions for the given\nmatmul in NKI, in order to leverage the \"Fast LoadStationary\" support in TensorE (more discussion in\n:ref:`architecture guide <arch_guide_tensor_engine_perf>`). To better understand the intuition behind this, let's walk\nthrough a concrete example.\n\nConsider a ``[1, 128] x [128, 128]`` matrix multiplication as below:\n\n.. _perf_guide_matrix_vector:\n\n.. figure:: /nki/img/nki_perf_guide/fig22.png\n   :align: center\n   :width: 60%\n\n   Illustration of matrix-vector multiplication.\n\nSince K=128 is the contraction dimension, it will get mapped to the partition dimension of the SBUF for both the ``x`` and\n``y`` matrices. M and N will therefore get mapped to the free dimension of the SBUF, and we will refer to ``x`` as the\n“short” tensor, and ``y`` as the “long” tensor (short and long in the free dimension, respectively). 
We have two possible\nways of performing this computation on the TensorE, which we'll refer to as “Short Moving” and “Short Stationary”, depending\non which tensor has the short free dimension.\n\n.. _perf_guide_matrix_vector_2way:\n\n.. figure:: /nki/img/nki_perf_guide/fig23.png\n   :align: center\n   :width: 100%\n\n   Two possible TensorE instruction mappings for matrix-vector multiplication.\n\nBased on the multiplication property of transpose, we have ``A×B=(B.T×A.T).T``. Meanwhile, based on the semantics of TensorE, when\nwe want to compute ``A×B``, we need to call ``nc_matmul(A.T, B)``, and for ``B.T×A.T``, we need to call\n``nc_matmul(B.T.T, A.T)`` -> ``nc_matmul(B, A.T)``. Notice how the parameters\nto ``nc_matmul`` are swapped! Thus, when we swap stationary and moving tensors and perform the matrix multiplication, the\noutput tensor will be transposed from the original output.\n\nRecall, if there is a difference in initiation interval between ``LoadStationary`` and ``MultiplyMoving``, one of them\ncan end up limiting the throughput of TensorE:\n\n.. _perf_guide_tensor_perf:\n\n.. figure:: /nki/img/arch_images/mm_bottleneck.png\n   :align: center\n   :width: 60%\n\n   Two possible TensorE performance characteristics.\n\nIn the above scenarios, we expect TensorE performance to be bound by whichever instruction reads the longer tensor - LoadStationary\nin “Short Moving”, and MultiplyMoving in “Short Stationary”. However, with TensorE Fast LoadStationary, TensorE can perform\n``LoadStationary`` **up to 4x** faster than a ``MultiplyMoving`` with the same free axis size.\n\nSo in the above two scenarios:\n\n\n#. Short Moving - ``LoadStationary`` initiation interval is roughly equal to the number of elements divided by 4 (because\n   of fast LoadStationary), and ``MultiplyMoving`` initiation interval is dominated by the TensorE instruction turnaround time ``MM_INIT_LATENCY\n   (64 cycles on trn1)``. Therefore, we have ``LS_II ~= 128/4 = 32 cycles``, and ``MM_II ~= max(1, MM_INIT_LATENCY=64 cycles)``\n   which leads to issuing a MM roughly every 64 cycles.\n#. Short Stationary - ``MultiplyMoving`` initiation interval will dominate, which leads to issuing a MM roughly every 128\n   cycles.\n\nBecause of the above, we prefer to map the short tensor to the moving tensor in the ``MultiplyMoving`` instruction on TensorE.\n\nA classic example is a matrix-vector product. This is commonly seen in auto-regressive token generation in LLMs, where most\nof the matmuls occur only on a single token (vector) as the feature map, while the weight tensor remains large and hence\nmust be broken into tiles to meet TensorE tile size constraints.\n\nOpt #8: TensorE only: Mitigating overhead from tensor transposes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Symptom**: Since TensorE accounts for over 90% of the hardware FLOPS on a NeuronCore, we would like the engine to perform\nuseful computations as much as possible, especially in matmul-heavy kernels. The most common “not useful” computation that\ncould occupy precious TensorE cycles is tensor PF-transposes, which swap the partition and free dimensions of a NKI tile.\nWhen you have a profile with TensorE visually extremely busy, we recommend doing a sanity check on how much of the TensorE\nactivities are performing transposes. One easy way to check is by selecting ``Instruction Type`` as the ``Instruction Grouping``\nin ``View Settings``:\n\n.. _perf_guide_transpose_setting:\n\n.. 
figure:: /nki/img/nki_perf_guide/fig25.png\n   :align: center\n   :width: 60%\n\n   Change view settings to visualize transposes.\n\nWith this instruction coloring, TensorE instructions will be highlighted in two different colors: one for Transpose and\none for Regular (useful matmuls). As an example, the below profile has an execution trace with TensorE being the performance\nbottleneck. Visually, we can see the bulk of the TensorE execution is for regular matmuls, but there is a noticeable chunk\nof engine time spent on transpose-induced instructions in red. Note, the colors for transpose versus regular instructions\nare chosen randomly by the profiler each time. You should hover over the instructions to check the ``Instruction Type``\nfield on the pop-up to confirm the color mapping.\n\n.. _perf_guide_transpose_timeline:\n\n.. figure:: /nki/img/nki_perf_guide/fig26.png\n   :align: center\n   :width: 100%\n\n   Example timeline with a transpose instruction type.\n\n\n**Optimization**: The key goal of this optimization is to reduce the number of transpose-induced instructions on TensorE,\nwhen such instructions are taking up a large portion of the execution. Before diving into techniques to reduce transposes,\nit is important to understand the root cause of these transposes.\n\nAt a high level, tensor transposes are needed to adjust the data layout of tensors to match the partition dimension requirements\nof different ISA instructions. Refer to the :doc:`architecture guide </nki/guides/architecture/trainium_inferentia2_arch>`\nfor layout requirements of each compute engine. Transposes are inserted explicitly into NKI kernels through \n:doc:`nisa.nc_transpose </nki/api/generated/nki.isa.nc_transpose>` APIs, or by calling ``nl.matmul`` with ``transpose_x=False``. \nThese transposes are most commonly lowered to the Tensor Engine.\n\nBroadly speaking, there are two different types of tensor transposes, with different root causes:\n\n\n#. IO tensor transpose (abbreviated as IO transpose)\n#. intermediate tensor transpose (abbreviated as intermediate transpose)\n\n**IO transpose.** These transposes are done on NKI kernel IO (input/output) tensors, which must reside in device memory\nin current NKI releases. The transposes are needed when the NKI compute API consuming the input tensors or producing the output\ntensors expects a different layout than their IO layout in device memory. To simplify the discussion, we dive into input tensor\nlayout below, but the same reasoning also applies to output tensors.\n\nFor example, say we have an input tensor in device memory with layout ``[out_channel=128, in_channel=128]`` (major-to-minor\nordering), but the ``nisa.nc_matmul`` call in our NKI kernel expects ``[in_channel, out_channel]`` as input tile layout.\nIn this case, we can perform a :doc:`nl.load </nki/api/generated/nki.language.load>` to load the input\ninto SBUF, with ``out_channel`` being the partition dimension because ``out_channel`` is the most major dimension in device\nmemory. Then, a PF-transpose on TensorE is required before the loaded data can be consumed by ``nisa.nc_matmul``. Alternatively,\nwe can invoke :doc:`nl.load_transpose2d </nki/api/generated/nki.language.load_transpose2d>`\nto transpose the input tensor on the fly in the DMA engine, with a major caveat of much lower DMA bandwidth compared to\n``nl.load``. 
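As a minimal sketch of these two alternatives for the example above (tile and tensor names here are illustrative, assuming ``in_tensor`` has the ``[out_channel=128, in_channel=128]`` device memory layout):\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n\n   # Alternative 1: plain load (partition dim = out_channel), then PF-transpose on TensorE\n   x_sbuf = nl.load(in_tensor[0:128, 0:128])\n   x_for_matmul = nisa.nc_transpose(x_sbuf)   # partition dim becomes in_channel\n\n   # Alternative 2: transpose on the fly in the DMA engine, at much lower DMA bandwidth\n   x_for_matmul = nl.load_transpose2d(in_tensor[0:128, 0:128])\n\n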
``nl.load_transpose2d`` could make sense in a compute-bound kernel, but should certainly be avoided in memory-bound\nkernels.\n\nEither way, an IO transpose is inevitable here *due to* the IO tensor layout choice we made as NKI programmers. In the simple\nscenario where we only care about reaching the best performance for a single kernel, we can carefully decide on the\nIO tensor layout to make sure it is compatible with the NKI compute API layout requirements. When the input tensor is consumed\nby multiple compute APIs with conflicting layout requirements, IO-transposes cannot be avoided but should still be minimized\nas much as possible with a careful trade-off.\n\nHowever, NKI kernels are often injected into a larger model defined at the framework level, such as in PyTorch or JAX, in which case\nthe kernel IO tensors are also inputs/outputs of the surrounding framework operators. These cases will require more complex\nreasoning on the optimal IO tensor layout for the NKI kernel, but the optimization goal of minimizing IO transposes remains\nthe same.\n\nOne last complexity in deciding IO tensor layout is that the layout choice also has a potential impact on DMA efficiency. See\nmore discussion in a :ref:`later section <perf_guide_memory>`\non optimizing data movement efficiency.\n\n**Intermediate Transpose.** These transposes are done on intermediate tensors produced within a NKI kernel. These transposes\narise due to layout requirement mismatches between producer and consumer NKI compute APIs.\n\nThere are two common techniques to reduce intermediate transposes: 1) swapping moving/stationary tensors in ``nisa.nc_matmul``\n(or equivalently, ``nl.matmul``) and 2) mapping a computation to an alternative engine with different layout requirements.\n\nOne example of technique 1) is an operator chain commonly seen in Transformer models: ``linear_layer`` → ``layernorm``.\nNormally, we tend to map the weight ``[hidden_size, 4xhidden_size]`` tensor in ``linear_layer`` to the stationary tensor\nand the input feature map ``[hidden_size, seq_len]`` to the moving tensor when performing ``nisa.nc_matmul`` on TensorE.\nThe output feature map of this matmul will be in a layout of ``[4xhidden_size, seq_len]``. However, the first step in ``layernorm``\nto calculate mean and variance, ``nisa.bn_stats``\\ , requires ``4xhidden_size`` to be the free dimension because we need\nto calculate mean/variance within a single token. Therefore, a naive implementation of this operator chain will trigger\na PF-transpose between the ``nisa.nc_matmul`` and ``nisa.bn_stats`` instructions. However, if we were to instead map the\nweight tensor to the moving tensor and the input feature map to the stationary tensor, we can skip this PF-transpose entirely because\nthe ``nisa.nc_matmul`` output will already be in the layout expected by ``nisa.bn_stats``.\n\nAn example of technique 2) is a similar operator chain: ``linear_layer → RMSnorm`` with the same intermediate tensor\ndimensions as the above example. ``RMSnorm`` is considered a cheaper normalization operator compared to ``Layernorm``\\ ,\nbecause it replaces the mean/variance calculation with squaring and summation. 
Unlike ``nisa.bn_stats`` for mean/variance\ncalculations, which must be done along the free dimension, for ``RMSnorm`` the scalar square operation has no layout requirement\nand the summation can be done along either dimension: use VectorE ``nisa.tensor_reduce`` for free dimension summation or\nuse TensorE ``nisa.nc_matmul`` for partition dimension summation (see :ref:`TensorE alternative use case <arch_sec_tensor_engine_alternative_use>`\nin the architecture guide). Since ``RMSnorm`` can be done with either ``[4xhidden_size, seq_len]`` or ``[seq_len, 4xhidden_size]``\\ ,\nwe should make the layout choice based on the surrounding operators: ``RMSnorm`` in Transformer models is typically followed\nby yet another ``linear_layer``\\ , which requires the ``[4xhidden_size, seq_len]`` layout. Therefore, to minimize intermediate\ntransposes in an operator chain like ``linear_layer → RMSnorm → linear_layer``, we should map the weight tensor of the\nfirst ``linear_layer`` to the stationary tensor and leverage TensorE to perform cross-partition summation for ``RMSnorm``.\n\n.. _perf_guide_memory:\n\nOptimizing Data Movement Efficiency\n-----------------------------------\n\nThe key goal of optimizing memory-bound kernels is to keep the DMA engines running at high bandwidth utilization as much\nas possible. If you are seeing major DMA engine idle gaps in neuron-profile, you should first find ways to hide compute\nbehind DMA activities using techniques discussed in :ref:`Opt #4 <perf_guide_opt4>`.\nThe rest of this section is going to focus on optimizations to improve DMA bandwidth utilization. All the optimizations\nbelow are applicable to a common symptom: computation blocked by DMA activities, which are keeping the DMA engines “busy”\nbut at low bandwidth utilization (< 60%):\n\n.. _perf_guide_busy_dma:\n\n.. figure:: /nki/img/nki_perf_guide/fig27.png\n   :align: center\n   :width: 100%\n\n   Busy DMA engines with relatively idle compute engines.\n\nNote, the current NKI release only supports running a kernel on a single NeuronCore (subject to changes in future releases).\nTherefore, the optimizations below will focus solely on movement between device memory and the on-chip SBUF memory for now.\n\n\n.. _perf_guide_opt9:\n\nOpt #9: Perform sufficiently large DMA transfers\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptom**: A quick way to determine whether the DMA transfers are moving a large enough amount of data per transfer is to\nvisualize the DMA activities per engine in ``neuron-profile``:\n\n.. _perf_guide_dma_setting:\n\n.. figure:: /nki/img/nki_perf_guide/fig28.png\n   :align: center\n   :width: 70%\n\n   Change view settings to visualize DMA transfers per DMA engine.\n\nWith the above view settings, each DMA transfer will be shown with a continuous bar on the execution trace, grouped by DMA\nengines. Below is a profile example with small DMA transfers going on all 16 DMA engines. Visually, we can see that DMA engine\nempty gaps (due to DMA overhead) take up more time than active DMA transfers. Hovering over some of the DMA transfers,\nwe can also see a transfer size of 4B, which is extremely tiny. For reference, the transfer size on Trainium/Inferentia2\nshould be larger than 32KiB to achieve ideal bandwidth.\n\n.. _perf_guide_tiny_dma:\n\n.. 
figure:: /nki/img/nki_perf_guide/fig29.png\n   :align: center\n   :width: 100%\n\n   Example timeline with tiny DMA transfers.\n\nFor comparison, here's another profile with sufficiently large DMA transfers, achieving close to 70% DMA throughput utilization:\n\n.. _perf_guide_large_dma:\n\n.. figure:: /nki/img/nki_perf_guide/fig30.png\n   :align: center\n   :width: 100%\n\n   Example timeline with large DMA transfers.\n\n**Optimizations**: Refer to the architecture guide for a more detailed discussion of DMA engines and the intuition behind the need\nfor large DMA transfer sizes to achieve good DMA efficiency. Here, we will discuss a simple rule of thumb in NKI to trigger\nlarge DMA transfers: maximize the partition and free dimension sizes in both :doc:`nl.load </nki/api/generated/nki.language.load>`\nand :doc:`nl.store </nki/api/generated/nki.language.store>`. For example, the below data loading will trigger\n16 DMA transfers that can be run on all 16 DMA engines, with each transfer loading 8 SBUF partitions' worth of data with\na transfer size of 32KiB:\n\n.. code-block::\n\n   import nki.language as nl\n\n   def load_store_32kib_contiguous(in_tensor, out_tensor):\n       # both in_tensor and out_tensor have FP32 data type, 4B/element\n       assert in_tensor.dtype == out_tensor.dtype == nl.float32\n       # both have shape 128x1024 in device memory\n       assert in_tensor.shape == out_tensor.shape == [128, 1024]\n\n       # partition dim size is at maximum supported by the architecture: 128\n       # free dim size is at the ideal size to achieve good bandwidth usage: 1024\n       # Going beyond 1024 has diminishing returns on bandwidth and\n       # runs the risk of degrading compute/data movement pipelining efficiency\n\n       # This access pattern should map to 16 DMA transfers (1 transfer/DMA engine),\n       # with each DMA transfer moving 8 partitions worth of data:\n       # 8 partitions * 1024 elements * 4B/element = 32 KiB\n       data_tile = nl.load(in_tensor[0:128, 0:1024])\n\n       # Do some useful computation\n       ...\n\n       # Store, similar size as the load\n       nl.store(out_tensor[0:128, 0:1024], data_tile)\n\nOpt #10: Minimize use of DMA transposes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptom**: Excessive use of DMA transposes, invoked through ``nl.load_transpose2d``, can degrade DMA bandwidth significantly.\nIn ``neuron-profile``, you can find out whether ``nl.load_transpose2d`` is taking up a substantial amount of execution\ntime by using the search functionality, which will highlight all the DMA activities that perform transposes on the fly:\n\n.. _perf_guide_search_transpose:\n\n.. figure:: /nki/img/nki_perf_guide/fig31.png\n   :align: center\n   :width: 100%\n\n   Search for DMA activities that perform transposes.\n\n**Optimizations**: Refer to :ref:`Opt #8 <perf_guide_opt9>`\nfor a detailed discussion on how to eliminate the need for transposes on device memory input data. When the transposes are\ninevitable and the kernel is memory bound, we recommend replacing ``nl.load_transpose2d`` with ``nl.load()`` and ``nisa.nc_transpose()``.\nFor example, if you have an ``in_tensor`` of shape [8192, 128] in device memory but you would like an SBUF tile of shape\n[128, 8192] spread across 128 partitions for computation, the following two code snippets can achieve the same functionality:\n\n.. 
code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n\n   # Option 1, low DMA bandwidth usage:\n   sbuf_opt1 = nl.load_transpose2d(in_tensor[0:8192, 0:128])\n\n   # Option 2, better DMA bandwidth usage, fastest transpose:\n   sbuf_opt2 = nl.ndarray((128, 8192), dtype=in_tensor.dtype, buffer=nl.sbuf)\n   for i_in_tile in nl.affine_range(8192 // 128):\n       i_start = i_in_tile*128\n       current_tile = nl.load(in_tensor[i_start:i_start+128, 0:128])\n       sbuf_opt2[0:128, i_start:i_start+128] = nisa.nc_transpose(current_tile)\n\nOption 2 above is especially great for cases where ``nl.load_transpose2d`` is slowing down data movement in the critical\npath and TensorE is otherwise idle. Occasionally Option 1 can still be the right call, when the amount of data to be transposed\nis small and the overhead of ``nl.load_transpose2d`` can be well hidden behind other useful computation.\n"
  },
  {
    "path": "nki/deep-dives/src/mxfp-matmul/mx_cpu_utils.py",
    "content": "################################################################\n# CPU Utilities to generate MX kernel input and golden data\n################################################################\n\nimport numpy as np\nimport ml_dtypes as mld\n\n# Ensure dtype is in the list of MX FP8/FP4 dtypes we support\ndef validate_quantized_dtype(dtype):\n  if dtype not in {mld.float8_e5m2, mld.float8_e4m3fn, mld.float4_e2m1fn}:\n    raise ValueError(f\"Unsupported quantized dtype: {dtype}\")\n  return dtype == mld.float4_e2m1fn\n\n# Get exponent for float32 in IEEE 754 standard\ndef get_float32_exp(float_data):\n  man_nbits, exp_nbits = 23, 8\n  return (float_data.astype(np.float32).view(np.uint32) >> man_nbits) & ((1 << exp_nbits) - 1)\n\n# max normal\n# float8_e5m2: S 11110 11 = ± 2^15 × 1.75 = ± 57,344\n# float8_e4m3fn: S 1111 110 = ± 2^8 × 1.75 = ± 448\n# float4_e2m1fn: S 11 1 = ± 2^2 × 1.5 = ± 6\ndef get_mx_fp_max(mx_dtype):\n  \"\"\"Get maximum representable value for MX dtype\"\"\"\n  validate_quantized_dtype(mx_dtype)\n  if mx_dtype == mld.float8_e5m2:\n    return 57344.0  # 2^15 * 1.75\n  elif mx_dtype == mld.float8_e4m3fn:\n    return 448.0    # 2^8 * 1.75\n  elif mx_dtype == mld.float4_e2m1fn:\n    return 6.0      # 2^2 * 1.5\n  else:\n    raise ValueError(f\"Unsupported mx_dtype: {mx_dtype}\")\n\ndef get_mx_max_exp(mx_dtype):\n  \"\"\"Get maximum exponent for MX dtype\"\"\"\n  validate_quantized_dtype(mx_dtype)\n  if mx_dtype == mld.float8_e5m2:\n    return 15\n  elif mx_dtype == mld.float8_e4m3fn:\n    return 8\n  elif mx_dtype == mld.float4_e2m1fn:\n    return 2\n  else:\n    raise ValueError(f\"Unsupported mx_dtype: {mx_dtype}\")\n\ndef get_p_contiguous_scale(hw_scale, data_p_size, p_offset=0):\n  if data_p_size <= 32:\n    return hw_scale[p_offset : p_offset + data_p_size]\n\n  scale = np.zeros((data_p_size // 8,) + tuple(hw_scale.shape[1:]), hw_scale.dtype)\n  for i in range(data_p_size // 8):\n    scale[i] = hw_scale[i // 4 * 32 + i % 4 + p_offset]\n\n  return scale\n\n# inputs/outputs are numpy, with shape [P,F]\n# returns:\n#   mx_data_golden x4 mimicked packing. If fp8, then uint32 containing 4 x fp8 elements. 
If fp4, then uint8 containing 2 x fp4 elements.\n#   mx_scale_golden as uint8 with shape [P//8, F//4] (scales are packed contiguously)\ndef quantize_mx_golden(in_tensor, out_quantized_dtype, ocp_saturation = True, reverse_dst_fdim_group = 0, custom_mx_max_exp=None):\n  max_exp = custom_mx_max_exp(out_quantized_dtype) if custom_mx_max_exp else get_mx_max_exp(out_quantized_dtype)\n  max_val = get_mx_fp_max(out_quantized_dtype)\n  float32_exp_bias = 127\n\n  P, F = in_tensor.shape\n  SP, SF = P // 8, F // 4\n\n  in_tensor_ = np.copy(in_tensor)\n\n  RG = reverse_dst_fdim_group\n  # reverse free dimension by a group of RG elements (keep the order within each group)\n  if RG > 0:\n    assert F % RG == 0\n    in_tensor_ = in_tensor_.reshape(P, F // RG, RG)[:, ::-1, :].reshape(P, F)\n\n  exp = get_float32_exp(in_tensor_)\n\n  # Reshape exponent tensor to group by 8x4 blocks for max computation\n  exp_reshaped = exp.reshape(SP, 8, SF, 4)\n\n  # Compute max exponent for each 8x4 block using vectorized operations\n  # Take max over the 8x4 dimensions (axes 1 and 3)\n  mx_scale_golden = np.max(exp_reshaped, axis=(1, 3)).astype(np.uint8) - max_exp\n\n  # Convert scale exponents to scale factors\n  scale_exp = mx_scale_golden.astype(np.int32) - float32_exp_bias\n  scale_factors = 2.0**scale_exp  # Shape: [SP, SF]\n\n  # Expand scale factors to match input tensor shape using vectorized operations\n  # Each scale factor applies to an 8x4 block\n  scale_expanded_p = np.repeat(scale_factors, 8, axis=0)  # Shape: [P, SF]\n  scale = np.repeat(scale_expanded_p, 4, axis=1)  # Shape: [P, F]\n\n  # Quantize: divide by scale\n  mx_data_golden = in_tensor_ / scale\n  if ocp_saturation:\n    mx_data_golden = np.clip(mx_data_golden, -max_val, max_val)\n  \n  # Cast to out_quantized_dtype then mimic x4 packing\n  mx_data_golden = mx_data_golden.astype(out_quantized_dtype)\n  mx_data_golden_x4 = pack_mx_data_into_x4(mx_data_golden)\n\n  return mx_data_golden_x4, mx_scale_golden\n\n# *_x4 inputs must mimic x4 packing via uint\n#   if quantized_dtype=fp8, then must be uint32 containing 4 x quantized_dtype elements\n#   if quantized_dtype=fp4, then must be uint8 containing 2 x quantized_dtype elements\n# *_scale inputs are numpy uint8.\n# use_contiguous_scale: True=scales are packed together contiguously, False=scales are spread across p-dim quadrants.\n# Return numpy result.\ndef nc_matmul_mx_golden(stationary_x4, moving_x4, stationary_scale, moving_scale, stationary_quantized_dtype, moving_quantized_dtype,\n                        use_contiguous_scale=True, stationary_scale_p_offset=0, moving_scale_p_offset=0):\n  \n  validate_quantized_dtype(stationary_quantized_dtype)\n  validate_quantized_dtype(moving_quantized_dtype)\n\n  # Unpack and upcast to fp32\n  moving = unpack_mx_data_from_x4(moving_x4, moving_quantized_dtype).astype(np.float32)\n  moving_scale = moving_scale.astype(np.float32)\n  stationary = unpack_mx_data_from_x4(stationary_x4, stationary_quantized_dtype).astype(np.float32)\n  stationary_scale = stationary_scale.astype(np.float32)\n\n  # Process moving tensor\n  new_shape = moving.shape[:-1] + (moving.shape[-1] // 4, 4)\n  moving = moving.reshape(new_shape)\n  MP, MF0, MF1 = moving.shape\n  assert MF1 == 4\n  # moving_scale = moving_scale.cpu().numpy().astype(np.float32)\n  if not use_contiguous_scale:\n    # if scale follows hw layout, make it contiguous at partition dimension\n    moving_scale = get_p_contiguous_scale(moving_scale, MP, moving_scale_p_offset)\n\n  MSP, MSF0 = moving_scale.shape\n\n  # 
The scale tensor may have more columns than needed (e.g., when stationary and moving scales are packed together).\n  moving_scale_relevant = moving_scale[:, :MF0]\n\n  # Convert scale exponents to scale factors\n  moving_scale_factors = 2.0 ** (moving_scale_relevant - 127)  # Shape: [MSP, MF0]\n\n  # Expand scale factors to match moving tensor shape\n  # Each scale factor applies to an 8x1x4 block\n  moving_scale_expanded = np.repeat(moving_scale_factors[:, :, np.newaxis], 4, axis=2)  # Shape: [MSP, MF0, 4]\n  moving_scale_expanded = np.repeat(moving_scale_expanded[:, np.newaxis, :, :], 8, axis=1)  # Shape: [MSP, 8, MF0, 4]\n  moving_scale_expanded = moving_scale_expanded.reshape(MSP * 8, MF0, 4)  # Shape: [MP, MF0, 4]\n\n  # Apply scaling\n  moving *= moving_scale_expanded\n\n  # Process stationary tensor\n  new_shape = stationary.shape[:-1] + (stationary.shape[-1] // 4, 4)\n  stationary = stationary.reshape(new_shape)\n  SP, SF0, SF1 = stationary.shape\n  assert SF1 == 4\n  stationary = stationary.astype(np.float32)\n\n  if not use_contiguous_scale:\n    # if scale follows hw layout, make it contiguous at partition dimension\n    stationary_scale = get_p_contiguous_scale(stationary_scale, SP, stationary_scale_p_offset)\n\n  SSP, SSF0 = stationary_scale.shape\n\n  # The scale tensor may have more columns than needed (e.g., when stationary and moving scales are packed together).\n  stationary_scale_relevant = stationary_scale[:, :SF0]\n\n  # Convert scale exponents to scale factors\n  stationary_scale_factors = 2.0 ** (stationary_scale_relevant - 127)  # Shape: [SSP, SF0]\n\n  # Expand scale factors to match stationary tensor shape\n  # Each scale factor applies to an 8x1x4 block\n  stationary_scale_expanded = np.repeat(stationary_scale_factors[:, :, np.newaxis], 4, axis=2)  # Shape: [SSP, SF0, 4]\n  stationary_scale_expanded = np.repeat(stationary_scale_expanded[:, np.newaxis, :, :], 8, axis=1)  # Shape: [SSP, 8, SF0, 4]\n  stationary_scale_expanded = stationary_scale_expanded.reshape(SSP * 8, SF0, 4)  # Shape: [SP, SF0, 4]\n\n  # Apply scaling\n  stationary *= stationary_scale_expanded\n\n  # This einsum mimics the hardware's Matmul-MX operation. In contrast to a standard 2D x 2D matmul, \n  # this performs an additional multiply-accumulate on the 4 elements inside one _x4 element, which is what\n  # the hardware does.\n  golden = np.einsum(\"kiq,kjq->ij\", stationary, moving)\n  return golden\n\ndef dequantize_mx_golden(mx_data_x4, quantized_dtype, mx_scale):\n  \"\"\"\n  Dequantize MX data back to float32, reversing quantize_mx_golden.\n\n  This is the exact reverse of quantize_mx_golden:\n  - quantize: out_data = in_data / scale, then clip, then cast to MX format\n  - dequantize: cast to float32, then out_data = in_data * scale\n  where scale = 2^(mx_scale - float32_exp_bias)\n\n  Args:\n      mx_data_x4: np.ndarray mimicking x4 packing via uint. 
[P, F//4] if fp8, [P, F//2] if fp4\n      mx_scale: np.ndarray [SP, SF] in uint8 - scale tensor where SP=P//8, SF=F//4 if fp8 or F//2 if fp4 \n\n  Returns:\n      np.ndarray [P, F] in float32 - dequantized data (same shape as original input to quantize)\n  \"\"\"\n  \n  is_fp4 = validate_quantized_dtype(quantized_dtype)\n\n  float32_exp_bias = 127\n\n  P, F_packed = mx_data_x4.shape\n  SP, SF = mx_scale.shape\n\n  assert SP == P // 8, f\"Scale tensor P dimension mismatch: expected {P//8}, got {SP}\"\n  expected_SF = F_packed // 2 if is_fp4 else F_packed\n  assert SF == expected_SF, f\"Scale tensor F dimension mismatch: expected {expected_SF}, got {SF}\"\n\n  # Unpack\n  mx_data_unpacked = unpack_mx_data_from_x4(mx_data_x4, quantized_dtype)\n  # Convert quantized_dtype to float32\n  data_float = mx_data_unpacked.astype(np.float32)\n  P_expanded, F_expanded = data_float.shape\n\n  # The F dimension is expanded, so check it's as expected\n  expected_F_expanded = F_packed * 2 if is_fp4 else F_packed * 4\n  assert F_expanded == expected_F_expanded, f\"Unexpected expansion: expected {expected_F_expanded}, got {F_expanded}\"\n\n  # Convert scale exponents to scale factors\n  scale_exp = mx_scale.astype(np.int32) - float32_exp_bias\n  scale_exp = np.clip(scale_exp, -127, 127)\n  scale_factors = 2.0**scale_exp\n\n  # Use numpy's repeat and tile to expand scale factors to match data shape\n  # Each scale factor needs to be applied to an 8x4 block\n  # First expand along P dimension: repeat each row 8 times\n  scale_expanded_p = np.repeat(scale_factors, 8, axis=0)  # Shape: [P_expanded, SF]\n\n  # Then expand along F dimension: repeat each column 4 times\n  scale_expanded = np.repeat(scale_expanded_p, 4, axis=1)   # Shape: [P_expanded, F_expanded]\n\n  # Dequantize: multiply by scale (reverse of quantize division)\n  dequantized_data = data_float * scale_expanded\n\n  return dequantized_data\n\ndef generate_stabilized_mx_data(quantized_dtype, shape, val_range=1.0):\n  \"\"\"\n  Generate stabilized floating-point data and its equivalent MX quantized representation.\n\n  This function returns standard floating-point numbers along with their equivalent\n  MX quantized data and scale tensors that are stabilized in the sense that the\n  floating-point data and MX data can convert to each other exactly without losing precision.\n\n  Args:\n      quantized_dtype: MX quantization dtype (ml_dtypes.float8_e5m2, ml_dtypes.float8_e4m3fn, ml_dtypes.float4_e2m1fn)\n      shape: 2D shape for the unquantized output tensor, each 8x4 block is a scaling group; e.g.,\n             fp_data[8*row : 8*(row+1), 4*col : 4*(col+1)] is a scaling group\n      val_range: fp_data output will be in (-val_range, val_range), (default: 1.0)\n\n  Returns numpy tensors:\n      tuple: (fp_data, quantized_mx_data, quantized_mx_data_x4, quantized_mx_scale)\n          - fp_data: floating-point data\n          - quantized_mx_data: MX quantized data that can be de-quantized to fp_data.\n          - quantized_mx_data_x4: quantized_mx_data packed to mimic NKI MXFP_x4 datatypes.\n              if quantized_dtype=fp8, then dtype=uint32 packed with 4 x quantized_dtype elements\n              if quantized_dtype=fp4, then dtype=uint8 packed with 2 x quantized_dtype elements.\n                uint16 is not used because it behaves inconsistently in torch when moving data host <-> device.\n          - quantized_mx_scale: MX scale tensor, uint8\n  \"\"\"\n  validate_quantized_dtype(quantized_dtype)\n\n  _q_height, _q_width = 8, 4\n  assert (shape[0] % 
_q_height == 0), f'shape[0] must be a multiple of {_q_height}, but got {shape[0]}'\n  assert (shape[1] % _q_width == 0), f'shape[1] must be a multiple of {_q_width}, but got {shape[1]}'\n\n  if val_range == 0:\n    zeros = np.zeros(shape)\n    return zeros, *quantize_mx_golden(zeros, quantized_dtype)\n\n  # Get MX dtype parameters\n  max_val = get_mx_fp_max(quantized_dtype)\n  max_exp = get_mx_max_exp(quantized_dtype)\n\n  # Generate initial random mxfp data within the mxfp dtype's range.\n  rand_data = (np.random.random(shape) * 2 - 1) * max_val\n\n  # For each scaling block, randomly select one element to have max exponent.\n  # This prevents change in mx_scale after quantize(dequantize(rand_mx_data, rand_mx_scale)), causing precision loss.\n  for i in range(0, shape[0], _q_height):\n    for j in range(0, shape[1], _q_width):\n      # Random position within the tile\n      tile_i = np.random.randint(0, _q_height - 1)\n      tile_j = np.random.randint(0, _q_width - 1)\n\n      # Set this element to have maximum exponent\n      # Value = ±1.xxx × 2^max_exp (where 1.xxx is the mantissa)\n      sign = np.random.choice([-1, 1])\n      # Within the range of [1.0, 1.5) (could be upto 1.75 for mxfp8).\n      mantissa = 1.0 + np.random.random() * 0.5\n      rand_data[i + tile_i, j + tile_j] = sign * mantissa * (2 ** max_exp)\n\n  # Cast to quantized_dtype\n  rand_data_quantized = rand_data.astype(quantized_dtype)\n  # pack into uint to mimic x4\n  rand_data_quantized_x4 = pack_mx_data_into_x4(rand_data_quantized)\n\n  # Calculate mx_scale bounds based on val_range\n  # max_val already takes max_exp into account\n  float32_exp_bias = 127\n  mx_scale_upper_bound = min(255, int(np.log2(val_range / max_val) + float32_exp_bias))\n  mx_scale_lower_bound = max(0, mx_scale_upper_bound - 10)\n\n  # Generate random scale\n  scale_shape = (shape[0] // _q_height, shape[1] // _q_width)\n  rand_quantized_scale_np = np.random.randint(mx_scale_lower_bound, mx_scale_upper_bound + 1,\n                                            size=scale_shape, dtype=np.uint8)\n\n  # Dequantize to get final fp data\n  dequantized_fp_data_np = dequantize_mx_golden(rand_data_quantized_x4, quantized_dtype, rand_quantized_scale_np)\n\n  return dequantized_fp_data_np, rand_data_quantized, rand_data_quantized_x4, rand_quantized_scale_np\n\ndef pack_mx_data_into_x4(mx_data):\n  \"\"\"\n  Pack MX data based on dtype:\n  - FP4: Pack 2 adjacent values into uint8 (4 bits each)\n  - FP8: Pack 4 adjacent values into uint32 (8 bits each)\n  \"\"\"\n  import ml_dtypes as mld\n  \n  if mx_data.dtype == mld.float4_e2m1fn:\n    # FP4 path: pack 2 values into uint8. Each FP4 element consumes 8 bits. 
Take the relevant 4-bits from two elements\n    # and pack into uint8.\n    mx_as_bytes = mx_data.view(np.uint8)\n    H, W = mx_data.shape\n    assert W % 2 == 0, \"Width must be divisible by 2 for FP4 packing\"\n    \n    bytes_grouped = mx_as_bytes.reshape(H, W // 2, 2)\n    return ((bytes_grouped[:, :, 0] & 0xF).astype(np.uint8) << 0) | \\\n            ((bytes_grouped[:, :, 1] & 0xF).astype(np.uint8) << 4)\n  \n  elif mx_data.dtype in [mld.float8_e5m2, mld.float8_e4m3fn]:\n    # FP8 path: view automatically gives (H, W//4) shape\n    # Just view it as uint32.\n    return mx_data.view(np.uint32)\n  \n  else:\n    raise ValueError(f\"Unsupported dtype: {mx_data.dtype}\")\n\ndef unpack_mx_data_from_x4(packed_data, target_dtype):\n  \"\"\"\n  Unpack MX data based on target dtype:\n  - FP4: Unpack uint8 into 2 adjacent values (4 bits each)\n  - FP8: Unpack uint32 into 4 adjacent values (8 bits each)\n  \"\"\"\n  import ml_dtypes as mld\n  \n  if target_dtype == mld.float4_e2m1fn:\n    # FP4 path: unpack uint8 into 2 values\n    assert packed_data.dtype == np.uint8, f\"Expected uint8 for FP4, got {packed_data.dtype}\"\n    H, W_packed = packed_data.shape\n    \n    # Extract 4-bit values from uint8\n    unpacked = np.zeros((H, W_packed, 2), dtype=np.uint8)\n    unpacked[:, :, 0] = packed_data & 0xF\n    unpacked[:, :, 1] = (packed_data >> 4) & 0xF\n    \n    # Each FP4 (target_dtype) actually consumes 8-bits.\n    return unpacked.reshape(H, W_packed * 2).view(target_dtype)\n  \n  elif target_dtype in [mld.float8_e5m2, mld.float8_e4m3fn]:\n    # FP8 path: view automatically gives (P, F*4) shape\n    assert packed_data.dtype == np.uint32, f\"Expected uint32 for FP8, got {packed_data.dtype}\"\n    return packed_data.view(target_dtype)\n      \n  else:\n    raise ValueError(f\"Unsupported dtype: {target_dtype}\")\n\n"
  },
  {
    "path": "nki/deep-dives/src/mxfp-matmul/mx_kernel_utils.py",
    "content": "################################################################\n# NKI Kernel helper utilities for using MX\n################################################################\n\nimport nki\nimport nki.isa as nisa\nimport nki.language as nl\nimport numpy as np\n\n# data_hbm = MX data tile, dtype=*_x4, in HBM. dim[0] must be multiple of 32.\n# scale_hbm = MX scale tile, dtype=*_x4, in HBM, contiguous.\n# Returns SBUF tile with scales spread across P-dim quadrants as follows:\n# HBM Scale:      →     Physical SBUF Layout:\n# [0:4,   :]      →     Quadrant 0: partitions [0:4,   :]\n# [4:8,   :]      →     Quadrant 1: partitions [32:36, :]\n# [8:12,  :]      →     Quadrant 2: partitions [64:68, :]\n# [12:16, :]      →     Quadrant 3: partitions [96:100, :]\ndef load_scales_scattered(data_hbm, scale_hbm):\n  # As per nc_matmul_mx's SBUF input layout rules, we need to spread the scales across the partition-dimension.\n\n  # P dimension must be multiple of 32 and not exceed 128\n  data_p, _ = data_hbm.shape\n  assert data_p % 32 == 0, f\"Data tile P={data_p} must be divisible by 32 for MX. Apply padding.\"\n  assert data_p <= 128, f\"Data tile P={data_p} must be <= 128.\"\n  \n  scale_p, scale_f = scale_hbm.shape\n  # This should automatically be true, but just sanity check.\n  assert (scale_p == data_p//8), f\"Scale tile P={scale_p} must be Data tile P//8 (data_p={data_p}), for MX.\" \n\n  # We only need to scatter the scales if more than one SBUF quadrant is used.\n  if (data_p > 32): # Could also check (scale_p > 4)\n    # Allocate expanded scale tile. Notice here we match the P-dim of the data tile.\n    scale_sbuf = nl.ndarray((data_p, scale_f), dtype=scale_hbm.dtype, buffer=nl.sbuf)\n    nisa.memset(dst=scale_sbuf,value=0)\n \n    # Take each group of 4 scale rows from HBM and write them to the respective SBUF quadrant, where SBUF quadrants\n    # are 32-rows.\n    for q in range (scale_p // 4):\n      # .ap(pattern) tuple of [step_size, count], right-most is the inner (fastest changing) dimension of the access pattern (AP)\n      # The src AP reads scale_f elements, jumps to the next row, 4 times total. \n      # Outer for-loop sets the src AP start offset to be the first of a set of 4 rows.\n      # The dst AP also writes scale_f elements, jumps to the next row, 4 times total.\n      # But the start-offset is the first of a set of 32 rows in dst.\n      nisa.dma_copy(\n        src=scale_hbm.ap(pattern=[[scale_f, 4], [1, scale_f]],offset=(4*q)*scale_f),\n        dst=scale_sbuf.ap(pattern=[[scale_f, 4], [1, scale_f]],offset=(32*q)*scale_f)        \n      )\n\n  else:\n    # Allocate scale tile. 
Notice here we use scale_p directly since scales will fit into one quadrant.\n    scale_sbuf = nl.ndarray((scale_p, scale_f), dtype=scale_hbm.dtype, buffer=nl.sbuf)\n    nisa.dma_copy(src=scale_hbm, dst=scale_sbuf) # Straight copy\n\n  return scale_sbuf\n\n# Expected input tile shapes: stationary_hbm [4, P_st, F_st], moving_hbm [4, P_mv, F_mv]\n# Output SBUF shapes: stationary_sbuf [P_st, 4, F_st], moving_sbuf [P_mv, 4, F_mv]\n#\n# HBM Layout [4, P, F]:           SBUF Layout [P, 4, F]:\n# =====================           ======================\n# ┌───────────┐                   ┌─────────┬─────────┬─────────┬─────────┐\n# │           │                   │         │         │         │         │\n# │ Tile0     │                   │  Tile0  │  Tile1  │  Tile2  │  Tile3  │\n# │ [P,F]     │                   │  [P,F]  │  [P,F]  │  [P,F]  │  [P,F]  │\n# │           │                   │         │         │         │         │\n# ├───────────┤                   └─────────┴─────────┴─────────┴─────────┘\n# │           │\n# │ Tile1     │\n# │ [P,F]     │\n# │           │\n# ├───────────┤\n# │           │\n# │ Tile2     │\n# │ [P,F]     │\n# │           │\n# ├───────────┤\n# │           │\n# │ Tile3     │\n# │ [P,F]     │\n# │           │\n# └───────────┘\ndef load_tensor_helper(stationary_hbm, moving_hbm):\n  P_st = stationary_hbm.shape[1]\n  F_st = stationary_hbm.shape[2]\n  P_mv = moving_hbm.shape[1]\n  F_mv = moving_hbm.shape[2]\n  \n  stationary_sbuf = nl.ndarray((P_st, 4, F_st), dtype=stationary_hbm.dtype, buffer=nl.sbuf)\n  moving_sbuf = nl.ndarray((P_mv, 4, F_mv), dtype=moving_hbm.dtype, buffer=nl.sbuf)\n  \n  # .ap(pattern) tuple of [step_size, count], right-most is the inner (fastest changing) dimension of the access pattern (AP).\n  # dst (SBUF) does not have an AP specified which means it is linearly accessed.\n  # The src AP reads F elements, then jumps to the next Tile, 4 times. This supplies the data to fill one row of SBUF.\n  #   Then we jump to the next row of HBM and repeat.\n\n  nisa.dma_copy(src=stationary_hbm.ap(pattern=[[F_st, P_st], [P_st*F_st, 4], [1, F_st]], offset=0), dst=stationary_sbuf)\n  nisa.dma_copy(src=moving_hbm.ap(pattern=[[F_mv, P_mv], [P_mv*F_mv, 4], [1, F_mv]], offset=0), dst=moving_sbuf)\n\n  return stationary_sbuf, moving_sbuf\n\n# [start-allocate_mx_tiles]\n# shape_unquantized represents the 2D unquantized SBUF shape with interleaved\n# layout established (i.e. the shape immediately before calling Quantize-MX).\ndef allocate_mx_tiles(shape_unquantized, mx_dtype, alloc_scale: bool = True):\n  assert len(shape_unquantized) == 2, f\"shape_unquantized must have exactly 2 dimensions, got {len(shape_unquantized)}\"\n  \n  P, F = shape_unquantized\n  \n  # Allocate data tile\n  # Quantize-MX shrinks the free-dim by 4x because it packs 4 elements into 1.\n  mx_data_sbuf = nl.ndarray((P, F//4), dtype=mx_dtype, buffer=nl.sbuf)\n\n  if not alloc_scale:\n      return mx_data_sbuf, None\n  \n  # Allocate scale tile\n  # Nominally the scale tile is sized (P//8, F//4) given that the scaling\n  # group shape is [8P, 4F]. 
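(For example, P=128 and F=2048 would nominally give a (16, 512) scale tile.) 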
But when P > 32, the scales must be placed in the\n  # partition-dim quadrant from which the corresponding scaling group originated \n  # hence we must allocate the full P.\n  if P <= 32: # Can store all scales in first p-dim quadrant.\n    mx_scale_sbuf = nl.ndarray((P//8, F//4), dtype=nl.uint8, buffer=nl.sbuf)\n  else: # Must oversize and spread across quadrants.\n    mx_scale_sbuf = nl.ndarray((P, F//4), dtype=nl.uint8, buffer=nl.sbuf)\n  \n  return mx_data_sbuf, mx_scale_sbuf\n# [end-allocate_mx_tiles]\n\n# [start-copy_data_strided]\n# Read unquantized tensors from HBM and establish interleaved layout in SBUF.\n# use_tensor_copy=true: Straight read from HBM->SBUF, then use SBUF-to-SBUF TensorCopy to stride the data.\n#   Intended to demonstrate how to stride the tile using VectorE/ScalarE if tile already present on SBUF.\n# use_tensor_copy=false: Stride the data while reading HBM->SBUF.\n#   Intended to demonstrate how to stride the tile if coming from HBM, using only the DMA engine.\n# The output shapes are [P//4, F*4] where the [P,F] is the shape of the corresponding unquantized input tensor.\ndef copy_data_strided(stationary_hbm, moving_hbm, use_tensor_copy: bool = True):  \n    \n  # The HBM tensors have nominal shape [P,F]. Reshape into [4, P//4, F]. \n  # In other words, we divide the contraction axis into 4 \"P\" tiles since we'll eventually\n  # need to read data from each tile and pack them together on SBUF.\n  \n  # These dimensions reflect the shape of each \"P\" tile.\n  P_st = stationary_hbm.shape[0] // 4\n  F_st = stationary_hbm.shape[1]\n  P_mv = moving_hbm.shape[0] // 4\n  F_mv = moving_hbm.shape[1]\n  \n  stationary_hbm_reshape = stationary_hbm.reshape((4, P_st, F_st))\n  moving_hbm_reshape = moving_hbm.reshape((4, P_mv, F_mv))\n\n  # Allocate SBUF tensors to store the strided result.\n  # The shape is [P//4, F, 4] where the [P,F] is the shape of the unquantized input tensor.\n  # In other words, we view the free-dim as having F_st/F_mv groups of 4 elements.\n  # Taking 3D views of both the HBM and SBUF tensors allows for cleaner indexing.\n  stationary_sbuf_strided = nl.ndarray((P_st, F_st, 4), dtype=stationary_hbm.dtype, buffer=nl.sbuf)\n  moving_sbuf_strided = nl.ndarray((P_mv, F_mv, 4), dtype=moving_hbm.dtype, buffer=nl.sbuf)    \n\n  # Perform a TensorCopy to achieve the required layout.\n  if (use_tensor_copy):\n\n    # First load from HBM -> SBUF. Take \"P\" tiles from HBM and write them\n    # contiguously (adjacent to each other) into the SBUF free-dim. \n    # This load is not the focus of this example so its details are encapsulated in load_tensor_helper().\n    # The SBUF shapes will be stationary_sbuf [P_st, 4, F_st], moving_sbuf [P_mv, 4, F_mv]\n    stationary_sbuf, moving_sbuf = load_tensor_helper(stationary_hbm_reshape, moving_hbm_reshape)\n\n    # Perform SBUF-to-SBUF TensorCopy to shuffle the data into the required MX layout.\n    # Here are some tips on how to read this access pattern (AP).\n    # .ap(pattern) = tuple of [step_size, count], right-most is the inner (fastest changing) dimension of the access pattern (AP).\n    # The dst (*_strided) has no AP specified, meaning it is linearly written to.\n    # To understand the src AP it's useful to refer to the SBUF Layout diagram in load_tensor_helper().\n    # We read 1 element, then step F elements to the next tile, 4 times total. 
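(The innermost AP term [F_st, 4] / [F_mv, 4] expresses this: a stride of one tile-width, count 4.) 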
In other words, we gather a group\n    # of 4 elements (one from each tile).\n    # Then step 1 element and repeat the above F times to read an entire row of SBUF.\n    # Then step to the next row of SBUF and repeat the above for all P rows of SBUF.\n    # Note, this example is shown as a strided-read but it could be re-written as a strided-write, though it will be slower.\n    # Secondly, the source tile can be in PSUM (i.e. the result of a prior matmul).\n  \n    nisa.tensor_copy(src=stationary_sbuf.ap(pattern=[[4*F_st, P_st], [1, F_st], [F_st, 4]], offset=0), dst=stationary_sbuf_strided)\n    nisa.tensor_copy(src=moving_sbuf.ap(pattern=[[4*F_mv, P_mv], [1, F_mv], [F_mv, 4]], offset=0), dst=moving_sbuf_strided)\n\n  # Perform a strided DMA to achieve the required layout.\n  else:\n\n    # Similar to TensorCopy, we linearly write to stationary_sbuf_strided.\n    # When reading from *_hbm_reshape, we read one element from each tile.\n    # Then step 1 element and repeat the above F times, thereby reading one full row of HBM.\n    # Then step to the next row of HBM and repeat the above P times.\n\n    nisa.dma_copy(src=stationary_hbm_reshape.ap(pattern=[[F_st, P_st], [1, F_st], [P_st*F_st, 4]], offset=0),\n                  dst=stationary_sbuf_strided)\n    nisa.dma_copy(src=moving_hbm_reshape.ap(pattern=[[F_mv, P_mv], [1, F_mv], [P_mv*F_mv, 4]], offset=0),\n                  dst=moving_sbuf_strided)\n\n  # Return as 2D.\n  return stationary_sbuf_strided.reshape((P_st, F_st*4)), moving_sbuf_strided.reshape((P_mv, F_mv*4))\n# [end-copy_data_strided]\n"
  },
  {
    "path": "nki/deep-dives/src/mxfp-matmul/mx_kernels.py",
    "content": "################################################################\n# NKI Kernels to demonstrate MX usage\n################################################################\n\nimport nki\nimport nki.isa as nisa\nimport nki.language as nl\nfrom mx_kernel_utils import load_scales_scattered, allocate_mx_tiles, copy_data_strided\n\n# [start-kernel_offline_quantized_mx_matmul]\n# Matmul-MX using offline-quantized input tiles in HBM, assumed to be maximum tile sizes for the TensorE.\n# MX layout requirements for data tiles are ignored. (i.e. it's assumed the data tiles are \n# already correctly laid out).\n# *_mx_data inputs mimic _x4 packed types via uint. This kernel will simply view it as _x4.\n# *_mx_scale inputs are uint8, with scales packed contiguous (this kernel will spread them across partition-dim).\n# mx_dtype = one of nl.float8_e5m2_x4, nl.float8_e4m3fn_x4, nl.float4_e2m1fn_x4.\n# Returns bfloat16 matmul result.\n@nki.jit\ndef kernel_offline_quantized_mx_matmul(stationary_mx_data, stationary_mx_scale, moving_mx_data, moving_mx_scale, mx_dtype):    \n  \n  MAX_TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  MAX_TILE_K = nl.tile_size.pmax  # 128\n  MAX_TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # View the input data as _x4 mx_dtype. This is done using an access pattern, specifying the target dtype and a simple\n  # linear pattern.\n  stationary_mx_data_hbm_x4 = stationary_mx_data.ap(dtype=mx_dtype, pattern=[[MAX_TILE_M,MAX_TILE_K],[1,MAX_TILE_M]], offset=0)\n  moving_mx_data_hbm_x4 = moving_mx_data.ap(dtype=mx_dtype, pattern=[[MAX_TILE_N,MAX_TILE_K],[1,MAX_TILE_N]], offset=0)\n\n  # Check that the input tiles are max-sized. This is merely for simplicity of the example but\n  # smaller shapes are also supported.\n  assert stationary_mx_data_hbm_x4.shape == (MAX_TILE_K, MAX_TILE_M)\n  assert moving_mx_data_hbm_x4.shape == (MAX_TILE_K, MAX_TILE_N)\n\n  # Load inputs directly from HBM to SBUF. Data is assumed to already have the \n  # layout required by MX. Scales are assumed to be contiguous in HBM therefore we use\n  # load_scales_scattered() to spread them across SBUF partition-dim quadrants, as is required\n  # by Matmul-MX.\n\n  stationary_mx_data_sbuf_x4 = nl.ndarray(stationary_mx_data_hbm_x4.shape, dtype=mx_dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=stationary_mx_data_sbuf_x4, src=stationary_mx_data_hbm_x4)\n  stationary_mx_scale_sbuf = load_scales_scattered(stationary_mx_data_sbuf_x4, stationary_mx_scale)\n\n  # Load moving\n  moving_mx_data_sbuf_x4 = nl.ndarray(moving_mx_data_hbm_x4.shape, dtype=mx_dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=moving_mx_data_sbuf_x4, src=moving_mx_data_hbm_x4)\n  moving_mx_scale_sbuf = load_scales_scattered(moving_mx_data_sbuf_x4, moving_mx_scale)\n  \n  # Allocate a tile in PSUM. 
This could also be float32.\n  result_psum = nl.ndarray((MAX_TILE_M, MAX_TILE_N), dtype=nl.bfloat16, buffer=nl.psum)\n\n  # Matmul-MX\n  nisa.nc_matmul_mx(\n    dst=result_psum,\n    stationary=stationary_mx_data_sbuf_x4,\n    moving=moving_mx_data_sbuf_x4,\n    stationary_scale=stationary_mx_scale_sbuf,\n    moving_scale=moving_mx_scale_sbuf\n  )\n\n  # Copy the PSUM result back to SBUF\n  result_sbuf = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n  nisa.tensor_copy(dst=result_sbuf, src=result_psum)  \n\n  # Store to HBM\n  result_hbm = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)  \n  nisa.dma_copy(dst=result_hbm, src=result_sbuf)\n  \n  return result_hbm\n# [end-kernel_offline_quantized_mx_matmul]\n\n# [start-kernel_on_device_quantize_matmul_mx]\n# Matmul-MX using a offline-quantized stationary input tile from HBM and on-device quantized moving tile.\n# Input to Quantize-MX must be bf16/fp16.\n# MX layout requirements for data tiles are ignored. (i.e. it's assumed the data tiles are \n# already correctly laid out, including moving_data_bf16).\n# *_mx_data inputs are float32 where each element contains 4 x quantized elements elements.\n#   *_mx_data will be viewed as mx_dtype.\n# *_mx_scale inputs are uint8, with scales packed contiguous (this kernel will spread them across partition-dim).\n# mx_dtype = one of nl.float8_e5m2_x4, nl.float8_e4m3fn_x4, nl.float4_e2m1fn_x4.\n# It's assumed TensorE max tile sizes are used.\n@nki.jit\ndef kernel_on_device_quantize_matmul_mx(stationary_mx_data, stationary_mx_scale, moving_data_bf16, stationary_mx_dtype, moving_mx_dtype):\n\n  assert moving_mx_dtype != nl.float4_e2m1fn_x4, \"FP4 not supported by Quantize-MX\"\n\n  MAX_TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  MAX_TILE_K = nl.tile_size.pmax  # 128\n  MAX_TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # View the input MX data as _x4 mx_dtype. This is done using an access pattern, specifying the target dtype and a simple\n  # linear pattern.\n  stationary_mx_data_hbm_x4 = stationary_mx_data.ap(dtype=stationary_mx_dtype, pattern=[[MAX_TILE_M,MAX_TILE_K],[1,MAX_TILE_M]], offset=0)\n\n  # Check that the input tiles are max-sized. This is merely for simplicity of the example but\n  # smaller shapes are also supported.\n  assert stationary_mx_data_hbm_x4.shape == (MAX_TILE_K, MAX_TILE_M)\n  # Note the factor of 4 on the N free-dim. This is unquantized data whose free-dim will be packed and\n  # reduced by a factor of 4 during quantize_mx.\n  assert moving_data_bf16.shape == (MAX_TILE_K, MAX_TILE_N*4)\n\n  # Load stationary MX.\n  stationary_mx_data_sbuf_x4 = nl.ndarray(stationary_mx_data_hbm_x4.shape, dtype=stationary_mx_dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=stationary_mx_data_sbuf_x4, src=stationary_mx_data_hbm_x4)\n  stationary_mx_scale_sbuf = load_scales_scattered(stationary_mx_data_sbuf_x4, stationary_mx_scale)\n  \n  # Load moving BF16\n  moving_bf16_sbuf = nl.ndarray(moving_data_bf16.shape, dtype=moving_data_bf16.dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=moving_bf16_sbuf, src=moving_data_bf16)\n\n  # Allocate quantized moving tiles\n  moving_mx_data_sbuf_x4, moving_mx_scale_sbuf = allocate_mx_tiles(moving_data_bf16.shape, moving_mx_dtype)  \n\n  # Quantize-MX. 
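(Here the bf16 moving tile of shape (MAX_TILE_K, MAX_TILE_N*4) is packed down to the (MAX_TILE_K, MAX_TILE_N) _x4 data tile allocated above, with one uint8 scale per 32-element scaling group.) 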
Scales will automatically be spread across partition-dim quadrants.\n  nisa.quantize_mx(dst=moving_mx_data_sbuf_x4,\n                  src=moving_bf16_sbuf,\n                  dst_scale=moving_mx_scale_sbuf)  \n\n  # Allocate a tile in PSUM\n  result_psum = nl.ndarray((MAX_TILE_M, MAX_TILE_N), dtype=nl.bfloat16, buffer=nl.psum)\n\n  # Matmul-MX\n  nisa.nc_matmul_mx(\n    dst=result_psum,\n    stationary=stationary_mx_data_sbuf_x4,\n    moving=moving_mx_data_sbuf_x4,\n    stationary_scale=stationary_mx_scale_sbuf,\n    moving_scale=moving_mx_scale_sbuf\n  )  \n\n  # Copy the PSUM result back to SBUF\n  result_sbuf = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n  nisa.tensor_copy(dst=result_sbuf, src=result_psum)  \n\n  # Store to HBM\n  result_hbm = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)  \n  nisa.dma_copy(dst=result_hbm, src=result_sbuf)\n\n  return result_hbm\n# [end-kernel_on_device_quantize_matmul_mx]\n\n# Matmul-MX using on-device quantized stationary and moving tensors, demonstrating how to use\n# a strided access pattern to establish the SBUF layout required by MX operations.\n# Two examples are shown: the access pattern is implemented either in VectorE/ScalarE Tensor Copy or by the DMA engine.\n# Unquantized input tiles from HBM are expected to be sized such that they become max-tiles for the \n# TensorE once quantized.\n@nki.jit\ndef kernel_copy_strided_quantize_matmul_mx(stationary_hbm, moving_hbm, mx_dtype, use_tensor_copy: bool = True):\n  \n  assert mx_dtype != nl.float4_e2m1fn_x4, \"FP4 not supported by Quantize-MX\"\n \n  MAX_TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  MAX_TILE_K = nl.tile_size.pmax  # 128\n  MAX_TILE_N = nl.tile_size.gemm_moving_fmax  # 512  \n\n  # Ensure input tensors are in HBM.\n  assert stationary_hbm.buffer == moving_hbm.buffer == nl.hbm\n\n  # Sanity check the shapes. We expect contraction dimension of the unquantized tile to be 4x.\n  assert stationary_hbm.shape == (MAX_TILE_K*4, MAX_TILE_M)\n  assert moving_hbm.shape == (MAX_TILE_K*4, MAX_TILE_N)\n\n  # The key details of this example are shown in copy_data_strided() where data is copied into SBUF\n  # using strided access patterns to achieve the required MX layout.\n  # Returned shape is [P//4, F*4] where [P,F] is the input shape.\n  stationary_sbuf_strided, moving_sbuf_strided = copy_data_strided(stationary_hbm, moving_hbm, use_tensor_copy)\n\n  # Allocate quantized moving tiles\n  stationary_mx_data_sbuf, stationary_mx_scale_sbuf = allocate_mx_tiles(stationary_sbuf_strided.shape, mx_dtype)\n  moving_mx_data_sbuf, moving_mx_scale_sbuf = allocate_mx_tiles(moving_sbuf_strided.shape, mx_dtype)\n\n  # Quantize-MX. 
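(Unlike the previous kernel, both the stationary and the moving tiles are quantized on device here, each from its strided bf16 copy.) 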
Scales will automatically be spread across partition-dim quadrants.\n  nisa.quantize_mx(dst=stationary_mx_data_sbuf,\n                  src=stationary_sbuf_strided,\n                  dst_scale=stationary_mx_scale_sbuf)\n\n  nisa.quantize_mx(dst=moving_mx_data_sbuf,\n                  src=moving_sbuf_strided,\n                  dst_scale=moving_mx_scale_sbuf)\n  \n  # Allocate a tile in PSUM\n  result_psum = nl.ndarray((MAX_TILE_M, MAX_TILE_N), dtype=nl.bfloat16, buffer=nl.psum)\n\n  # Matmul-MX\n  nisa.nc_matmul_mx(\n    dst=result_psum,\n    stationary=stationary_mx_data_sbuf,\n    moving=moving_mx_data_sbuf,\n    stationary_scale=stationary_mx_scale_sbuf,\n    moving_scale=moving_mx_scale_sbuf\n  )\n\n  # Copy the PSUM result back to SBUF\n  result_sbuf = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n  nisa.tensor_copy(dst=result_sbuf, src=result_psum)  \n\n  # Store to HBM\n  result_hbm = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)  \n  nisa.dma_copy(dst=result_hbm, src=result_sbuf)\n\n  return result_hbm\n\n#[start-kernel_copy_strided_quantize_matmul_mx_packed_scale]\n# Matmul-MX using on-device quantized stationary and moving tensors, demonstrating how to use\n# pack scale values from multiple quantize_mx calls into a single tensor in SBUF.\n# \n# Unquantized input tiles from HBM are expected to be sized such that they become max-tiles for the \n# TensorE once quantized.\n@nki.jit\ndef kernel_copy_strided_quantize_matmul_mx_packed_scale(stationary_hbm, moving_hbm, mx_dtype, use_tensor_copy: bool = True):\n  \n  assert mx_dtype != nl.float4_e2m1fn_x4, \"FP4 not supported by Quantize-MX\"\n \n  MAX_TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  MAX_TILE_K = nl.tile_size.pmax  # 128\n  MAX_TILE_N = nl.tile_size.gemm_moving_fmax  # 512  \n\n  # Ensure input tensors are in HBM.\n  assert stationary_hbm.buffer == moving_hbm.buffer == nl.hbm\n\n  # Sanity check the shapes. We expect contraction dimension of the unquantized tile to be 4x.\n  assert stationary_hbm.shape == (MAX_TILE_K*4, MAX_TILE_M)\n  assert moving_hbm.shape == (MAX_TILE_K*4, MAX_TILE_N)\n\n  # Use strided access patterns to achieve required MX layout.\n  # Returned shape is [P//4, F*4] where [P,F] is the input shape.\n  stationary_sbuf_strided, moving_sbuf_strided = copy_data_strided(stationary_hbm, moving_hbm, use_tensor_copy)\n\n  # Allocate quantized stationary/moving tiles.\n  # Unlike the example kernel_copy_strided_quantize_matmul_mx, we do not allocate scale tiles here.\n  stationary_mx_data_sbuf, _  = allocate_mx_tiles(stationary_sbuf_strided.shape, mx_dtype, alloc_scale=False)\n  moving_mx_data_sbuf, _ = allocate_mx_tiles(moving_sbuf_strided.shape, mx_dtype, alloc_scale=False)\n\n  # Allocate a single tile into which we will pack scale values from BOTH quantize_mx calls.\n  #\n  # quantize_mx requires that the input tile's free dimension contains exactly 4x as many \n  # elements as the scale tile. We will use this tile for both quantize_mx calls, so its \n  # free dimension needs to be able to hold the larger of the two input tiles, hence MAX_TILE_N.\n  packed_mx_scale_sbuf = nl.ndarray((MAX_TILE_K, MAX_TILE_N), dtype=nl.uint8, buffer=nl.sbuf)\n\n  # Each scaling group consists of 32 elements, with 8 partitions x 4 elements per partition.\n  # Therefore, for each 32-partition SBUF quadrant, we get only 32 // 8 = 4 partitions' worth of scale factors.\n  # This leaves 28 partitions unused. 
quantize_mx lets us use some of this space by storing other tensors'\n  # scale factors at an offset.\n\n  # In this example, we use tensor slicing to store:\n  # - stationary's scale values at offset 0 in each quadrant (i.e., partitions 0:4, 32:36, 64:68, 96:100)\n  # - moving's scale values at offset 4 in each quadrant (i.e., partitions 4:8, 36:40, 68:72, 100:104)\n\n  # stationary's scale values will be written to partitions 0:4 in each quadrant.\n  # Additionally, we restrict the free dimension size to match stationary's shape.\n  stationary_mx_scale_sbuf = packed_mx_scale_sbuf[0:, :MAX_TILE_M]\n\n  # moving's scale values will be written to partitions 4:8 in each quadrant.\n  # We don't restrict the size of the free dimension; it already matches moving's shape.\n  moving_mx_scale_sbuf = packed_mx_scale_sbuf[4:, :]\n\n  # Quantize-MX. Scales will automatically be spread across partition-dim quadrants.\n  nisa.quantize_mx(dst=stationary_mx_data_sbuf,\n                  src=stationary_sbuf_strided,\n                  dst_scale=stationary_mx_scale_sbuf)\n\n  nisa.quantize_mx(dst=moving_mx_data_sbuf,\n                  src=moving_sbuf_strided,\n                  dst_scale=moving_mx_scale_sbuf)\n  \n  # Allocate a tile in PSUM\n  result_psum = nl.ndarray((MAX_TILE_M, MAX_TILE_N), dtype=nl.bfloat16, buffer=nl.psum)\n\n  # Matmul-MX\n  nisa.nc_matmul_mx(\n    dst=result_psum,\n    stationary=stationary_mx_data_sbuf,\n    moving=moving_mx_data_sbuf,\n    stationary_scale=stationary_mx_scale_sbuf,\n    moving_scale=moving_mx_scale_sbuf\n  )\n\n  # Copy the PSUM result back to SBUF\n  result_sbuf = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n  nisa.tensor_copy(dst=result_sbuf, src=result_psum)  \n\n  # Store to HBM\n  result_hbm = nl.ndarray(result_psum.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)  \n  nisa.dma_copy(dst=result_hbm, src=result_sbuf)\n\n  return result_hbm\n#[end-kernel_copy_strided_quantize_matmul_mx_packed_scale]"
  },
  {
    "path": "nki/deep-dives/src/mxfp-matmul/mx_toplevel.py",
    "content": "import torch\nimport os\nimport nki.language as nl\nimport numpy as np\nimport torch_xla\nimport shutil\nimport ml_dtypes as mld\nfrom mx_cpu_utils import generate_stabilized_mx_data, nc_matmul_mx_golden, quantize_mx_golden\nfrom mx_kernels import kernel_offline_quantized_mx_matmul, kernel_on_device_quantize_matmul_mx, kernel_copy_strided_quantize_matmul_mx, kernel_copy_strided_quantize_matmul_mx_packed_scale\n\n# Global compiler flags\nNEURON_CC_BASE_FLAGS = \" --target trn3 --pipeline compile SaveTemps --internal-compiler-debug-mode=all --internal-backend-options='--print-format=json,condensed' \"\n\ndevice = None\ncpu = None\n\n# NKI kernels use these _x4 custom dtypes to represent MXFP* data.\nquantized_dtype_to_x4_map = {\n  mld.float8_e5m2: nl.float8_e5m2_x4,\n  mld.float8_e4m3fn: nl.float8_e4m3fn_x4,\n  mld.float4_e2m1fn: nl.float4_e2m1fn_x4,\n}\n\ndef setup_compiler_workdir(test_name):\n  \"\"\"Setup unique compiler output directory for each test\"\"\"\n  current_dir = os.path.dirname(os.path.abspath(__file__))\n  workdir = f\"{current_dir}/artifacts_{test_name}\"\n  \n  # Remove existing directory if it exists\n  if os.path.exists(workdir):\n    shutil.rmtree(workdir)\n  os.makedirs(workdir, exist_ok=True)\n  \n  # Set full environment variable\n  os.environ[\"NEURON_CC_FLAGS\"] = f\"{NEURON_CC_BASE_FLAGS} --compile_workdir {workdir}\"\n\ndef compare_and_print_results(res, golden, rtol=5e-2, atol=5e-2):\n  print(\"\\n\\nResult shape:\", res.shape)\n  \n  # Ensure both are numpy float32\n  res_float = res.astype(np.float32) if res.dtype != np.float32 else res\n  golden_float = golden.astype(np.float32) if golden.dtype != np.float32 else golden\n  \n  match = np.allclose(res_float, golden_float, rtol=rtol, atol=atol)\n  print(\"\\nnp.allclose pass?\", match)\n  \n  if not match:\n    # Print mismatch info\n    diff = np.abs(res_float - golden_float)\n    max_diff = np.max(diff)\n    mean_diff = np.mean(diff)\n    print(f\"Max difference: {max_diff:.6f}\")\n    print(f\"Mean difference: {mean_diff:.6f}\")\n  \n  # Print first and last row, first 3 and last 3 columns\n  print(f\"\\nDevice Output:\\n[{res_float[0,:3]} ... {res_float[0,-3:]}]\\n...\\n[{res_float[-1,:3]} ... {res_float[-1,-3:]}]\")\n  print(f\"\\nGolden:\\n[{golden_float[0,:3]} ... {golden_float[0,-3:]}]\\n...\\n[{golden_float[-1,:3]} ... {golden_float[-1,-3:]}]\")\n\ndef print_test_header(test_name):\n  border_length = max(60, len(test_name) + 8)  # Ensure minimum width + padding\n  print(f\"\\n\\n{'='*border_length}\")\n  print(f\"    {test_name}\")\n  print(f\"{'='*border_length}\\n\")\n\n# [start-run_offline_quantized_matmul_mx_test]\n# This test will quantize to MXFP8 on the host.\n# Then execute Matmul-MX on the device using these offline-quantized tiles.\ndef run_offline_quantized_matmul_mx_test(quantized_dtype):\n  \n  # Choose max tile-sizes for TensorE.\n  M, K, N = 128, 128, 512\n\n  print_test_header(f\"OFFLINE_QUANTIZED_MX_MATMUL - stationary <{quantized_dtype.__name__}> @ moving <{quantized_dtype.__name__}>\")\n\n  setup_compiler_workdir(f\"offline_quantized_mx_matmul\")\n\n  # Generate stationary MX tile. Note the scales will be packed contiguously here. 
The kernel will later load the scales into SBUF\n  # in the required scattered fashion.\n  st_unquantized_shape = (K, M*4)\n  _, _, st_mx_data_x4, st_mx_scale = generate_stabilized_mx_data(quantized_dtype, st_unquantized_shape)\n\n  # Generate moving MX tile\n  mv_unquantized_shape = (K, N*4)\n  _, _, mv_mx_data_x4, mv_mx_scale = generate_stabilized_mx_data(quantized_dtype, mv_unquantized_shape)\n\n  # Call the Kernel. Perform matmul-mx: stationary_mx @ moving_mx\n  output_kernel = kernel_offline_quantized_mx_matmul(\n    torch.from_numpy(st_mx_data_x4).to(device), \n    torch.from_numpy(st_mx_scale).to(device), \n    torch.from_numpy(mv_mx_data_x4).to(device), \n    torch.from_numpy(mv_mx_scale).to(device), \n    quantized_dtype_to_x4_map[quantized_dtype]\n  )\n\n  output_kernel_np = output_kernel.cpu().float().numpy()\n\n  # Generate the golden\n  golden = nc_matmul_mx_golden(st_mx_data_x4, mv_mx_data_x4, st_mx_scale, mv_mx_scale, quantized_dtype, quantized_dtype)\n\n  compare_and_print_results(output_kernel_np, golden)\n# [end-run_offline_quantized_matmul_mx_test]\n\n# This test will quantize the stationary tile to MXFP8 on the host, and moving tile on device.\n# Then execute Matmul-MX on the device,\ndef run_on_device_quantize_matmul_mx_test(quantized_dtype_stationary, quantized_dtype_moving):\n  \n  # Choose max tile-sizes for TensorE.\n  M, K, N = 128, 128, 512\n \n  print_test_header(f\"ON_DEVICE_QUANTIZE_MATMUL_MX - stationary <{quantized_dtype_stationary.__name__}> @ moving <{quantized_dtype_moving.__name__}>\")\n\n  setup_compiler_workdir(f\"on_device_quantize_matmul_m\")\n\n  # Generate stationary MX tile. Note the scales will be packed contiguously here. The kernel will later load the scales into SBUF\n  # in the required scattered fashion.\n  st_unquantized_shape = (K, M*4)\n  _, _, st_mx_data_x4, st_mx_scale = generate_stabilized_mx_data(quantized_dtype_stationary, st_unquantized_shape)\n\n  # Generate moving tile\n  mv_unquantized_shape = (K, N*4)\n  # Notice we don't just generate random fp data using, say, np.random.\n  # Instead we use generate_stabilized_mx_data()'s fp_data output to get stabilized unquantized data that can be\n  # quantized and dequantized without loss of precision.\n  mv_data, _, _, _ = generate_stabilized_mx_data(quantized_dtype_moving, mv_unquantized_shape)\n\n  # Call the Kernel. Quantize mv_data, then perform Matmul-MX.\n  output_kernel = kernel_on_device_quantize_matmul_mx(\n    torch.from_numpy(st_mx_data_x4).to(device), \n    torch.from_numpy(st_mx_scale).to(device), \n    torch.from_numpy(mv_data).bfloat16().to(device), # Convert to bf16,\n    quantized_dtype_to_x4_map[quantized_dtype_stationary], # stationary mx\n    quantized_dtype_to_x4_map[quantized_dtype_moving], # moving qmx output\n  )\n\n  output_kernel_np = output_kernel.cpu().float().numpy()\n\n  # Generate the golden\n  # Quantize moving tensor as an intermediate step.\n  moving_mx_data, moving_mx_scale = quantize_mx_golden(mv_data, quantized_dtype_moving)\n  # Matmul-MX\n  golden = nc_matmul_mx_golden(st_mx_data_x4, moving_mx_data, st_mx_scale, moving_mx_scale, quantized_dtype_stationary, quantized_dtype_moving)\n\n  compare_and_print_results(output_kernel_np, golden)\n\n# This example:\n# 1. Starts with two HBM tensors.\n# 2. Establishes required SBUF layout using:\n#   - TensorCopy on the NeuronCore (if use_tensor_copy is True)\n#   - DMA (if use_tensor_copy is False)\n# 3. 
Quantizes both tensors on device, storing scale values:\n#   - In a single packed tile (if pack_scales is True)\n#   - In two separate tiles (if pack_scales is False)\n# 4. Performs Matmul-MX.\ndef run_copy_strided_test(quantized_dtype, use_tensor_copy: bool = True, pack_scales: bool = False):\n  # Choose max tile-sizes for TensorE. But here we're specifying unquantized shapes.\n  # Since Matmul-MX allows for 4x larger contraction dimension, we choose K=512.\n  K, M, N = 512, 128, 512\n\n  print_test_header(f\"COPY_STRIDED_{'TENSOR_COPY' if use_tensor_copy else 'DMA'}_{'PACKED' if pack_scales else 'UNPACKED'} - <{quantized_dtype.__name__}> @ <{quantized_dtype.__name__}>\")\n\n  setup_compiler_workdir(f\"copy_strided_test_tensor_copy_{use_tensor_copy}_{pack_scales}\")\n\n  # Generate the stationary and moving tensors in bf16.\n  # Using generate_stabilized_mx_data() to generate FP data that is within the MX data-type range.\n  # Contraction dimension is the first dimensions, as is required by TensorE.\n  st_shape = (K, M)\n  st_data, _, _, _ = generate_stabilized_mx_data(quantized_dtype, st_shape)\n  \n  mv_shape = (K, N)\n  mv_data, _, _, _ = generate_stabilized_mx_data(quantized_dtype, mv_shape)\n\n  # Call the kernel\n  kernel = kernel_copy_strided_quantize_matmul_mx_packed_scale if pack_scales else kernel_copy_strided_quantize_matmul_mx\n  output_kernel = kernel(\n    torch.from_numpy(st_data).bfloat16().to(device),\n    torch.from_numpy(mv_data).bfloat16().to(device),\n    quantized_dtype_to_x4_map[quantized_dtype],\n    use_tensor_copy\n  )\n\n  output_kernel_np = output_kernel.cpu().float().numpy()\n\n  # To generate a golden we simply perform matmul using the input fp tensors.\n  # Notice we're not using the matmul_mx_golden/quantize_mx_golden utilities -- they mimic the hardware\n  # and therefore assume the input tensors have the interleaved layout.\n  golden = st_data.T @ mv_data\n  \n  compare_and_print_results(output_kernel_np, golden)\n\nif __name__ == \"__main__\":\n\n  device = torch_xla.device()\n  cpu = torch.device('cpu')\n  \n  # Matmul-MX with MX tensors prepared on host\n  run_offline_quantized_matmul_mx_test(mld.float8_e5m2) # FP8 @ FP8\n  run_offline_quantized_matmul_mx_test(mld.float4_e2m1fn) # FP4 @ FP4\n\n  # Matmul-MX with moving tensor quantized on device.\n  run_on_device_quantize_matmul_mx_test(mld.float4_e2m1fn, mld.float8_e5m2) # Mixed FP4 @ FP8\n  run_on_device_quantize_matmul_mx_test(mld.float8_e5m2, mld.float8_e5m2) # FP8 @ FP8\n\n  # Use TensorCopy to stride the data\n  run_copy_strided_test(mld.float8_e5m2, use_tensor_copy=True, pack_scales=False) # FP8 @ FP8\n\n  # Use DMA to stride the data\n  run_copy_strided_test(mld.float8_e5m2, use_tensor_copy=False, pack_scales=False) # FP8 @ FP8\n\n  # Pack scale values into single tensor and use TensorCopy to stride the data\n  run_copy_strided_test(mld.float8_e5m2, use_tensor_copy=True, pack_scales=True) # FP8 @ FP8\n\n  # Pack scale values into single tensor and use DMA to stride the data\n  run_copy_strided_test(mld.float8_e5m2, use_tensor_copy=False, pack_scales=True) # FP8 @ FP8\n"
  },
  {
    "path": "nki/examples/average_pool2d/average_pool2d_jax.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nJAX implementation for average pool 2D NKI tutorial.\n\n\"\"\"\n# NKI_EXAMPLE_40_BEGIN\nimport jax.numpy as jnp\n# NKI_EXAMPLE_40_END\nfrom average_pool2d_nki_kernels import tensor_avgpool_kernel\n\n\n# NKI_EXAMPLE_40_BEGIN\n# Reference JAX implementation\ndef jax_average_pool_2D(in_tensor, pool_size):\n  c, h_in, w_in = in_tensor.shape\n  reshaped = in_tensor.reshape(c, h_in // pool_size, pool_size, w_in // pool_size, pool_size)\n  return jnp.nanmean(reshaped, axis=(2, 4))\n  # NKI_EXAMPLE_40_END\n\n\n# NKI_EXAMPLE_41_BEGIN\nif __name__ == \"__main__\":\n  POOL_SIZE = 2\n  C, HIN, WIN = 2, 6, 6\n  HOUT, WOUT = HIN//POOL_SIZE, WIN//POOL_SIZE\n\n  in_array = jnp.arange(C * HIN * WIN, dtype=jnp.float32).reshape(C, HIN, WIN)\n\n  # NKI_EXAMPLE_39_BEGIN\n  out_nki = tensor_avgpool_kernel(in_array, pool_size=POOL_SIZE)\n  # NKI_EXAMPLE_39_END\n  out_jax = jax_average_pool_2D(in_array, pool_size=POOL_SIZE)\n\n  print(in_array, out_nki, out_jax)\n\n  if jnp.allclose(out_nki, out_jax):\n    print(\"NKI and JAX match\")\n  else:\n    print(\"NKI and JAX differ\")\n    # NKI_EXAMPLE_41_END\n"
  },
  {
    "path": "nki/examples/average_pool2d/average_pool2d_nki_kernels.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nNKI implementation for average pool 2D NKI tutorial.\n\n\"\"\"\nimport numpy as np\n# NKI_EXAMPLE_37_BEGIN\nimport nki\nimport nki.isa as nisa\nimport nki.language as nl\nfrom nki.typing import tensor\n\n@nki.jit\ndef tensor_avgpool_kernel(in_tensor, pool_size):\n  \"\"\"NKI kernel to compute a 2D avg-pool operation\n\n  Args:\n      in_tensor: an input tensor, of shape C x H x W\n      pool_size: an integer representing a (square) pool-window size\n\n  Return:\n      out_tensor: the resulting output tensor, of shape C x (H/pool_size) x (W/pool_size)\n  \"\"\"\n\n  # Get input/output dimensions\n  sz_cin, sz_hin, sz_win = in_tensor.shape\n  sz_hout = sz_hin // pool_size\n  sz_wout = sz_win // pool_size\n  # Create output tensor shared between all SPMD instances as result tensor\n  out_tensor = nl.ndarray((sz_cin, sz_hout, sz_wout), dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  # Set relevant sizes\n  sz_p = sz_cin\n  sz_pool = pool_size\n\n  # Generate pool access pattern to create a 5D view:\n  # [sz_p, sz_hout, sz_wout, sz_pool, sz_pool]\n  # The pool dimensions are placed last so we can reduce over them.\n\n  # Load input data from external memory to on-chip memory\n  in_tile = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=in_tile, src=in_tensor)\n\n  # Perform the pooling operation using an access pattern view:\n  # The .ap() creates a strided 5D view of the 3D input tile,\n  # grouping elements into pool windows for reduction.\n  pool_view = in_tile.ap([\n    [sz_hin * sz_win, sz_p],      # partition stride\n    [sz_pool * sz_win, sz_hin // sz_pool],  # outer row stride\n    [sz_pool, sz_win // sz_pool],            # outer col stride\n    [sz_win, sz_pool],             # inner row stride (within pool window)\n    [1, sz_pool],                  # inner col stride (within pool window)\n  ])\n  sum_tile = nl.sum(pool_view, axis=[3, 4])\n  out_tile = nl.ndarray(sum_tile.shape, dtype=sum_tile.dtype, buffer=nl.sbuf)\n  nisa.tensor_scalar(dst=out_tile, data=sum_tile, op0=nl.multiply,\n                     operand0=1.0 / (pool_size * pool_size))\n\n  # Store the results back to hbm\n  nisa.dma_copy(dst=out_tensor, src=out_tile)\n\n  # Transfer the ownership of `out_tensor` to the caller\n  return out_tensor\n  # NKI_EXAMPLE_37_END\n\n\n# Reference NumPy implementation\ndef np_average_pool_2D(in_tensor, pool_size):\n  c, h_in, w_in = in_tensor.shape\n  reshaped = in_tensor.reshape(c, h_in // pool_size, pool_size, w_in // pool_size, pool_size)\n  return np.nanmean(reshaped, axis=(2, 4))\n\n\nif __name__ == \"__main__\":\n  # Now let's run the kernel\n  POOL_SIZE = 2\n  C, HIN, WIN = 2, 6, 6\n  HOUT, WOUT = HIN//POOL_SIZE, WIN//POOL_SIZE\n\n  in_tensor = np.arange(C * HIN * WIN, dtype=np.float16).reshape(C, HIN, WIN)\n\n  out_nki = tensor_avgpool_kernel(in_tensor, POOL_SIZE)\n\n  out_np = np_average_pool_2D(in_tensor, POOL_SIZE)\n\n  print(in_tensor, out_nki, out_np)\n\n  match = (out_nki == out_np).all()\n\n  if match:\n    print(\"NKI and NumPy match\")\n  else:\n    print(\"NKI and NumPy differ\")\n\n  assert match\n"
  },
  {
    "path": "nki/examples/average_pool2d/average_pool2d_torch.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nPyTorch implementation for average pool 2D NKI tutorial.\n\n\"\"\"\n# NKI_EXAMPLE_38_BEGIN\nimport torch\nimport torch_xla\n# NKI_EXAMPLE_38_END\nfrom average_pool2d_nki_kernels import tensor_avgpool_kernel\n\n\n# NKI_EXAMPLE_38_BEGIN\nif __name__ == \"__main__\":\n  device = torch_xla.device()\n\n  # Now let's run the kernel\n  POOL_SIZE = 2\n  C, HIN, WIN = 2, 6, 6\n  HOUT, WOUT = HIN//POOL_SIZE, WIN//POOL_SIZE\n\n  in_tensor = torch.arange(C * HIN * WIN, dtype=torch.bfloat16).reshape(C, HIN, WIN).to(device=device)\n  out_nki = torch.zeros((C, HOUT, WOUT), dtype=torch.bfloat16).to(device=device)\n\n  out_nki = tensor_avgpool_kernel(in_tensor, POOL_SIZE)\n\n  out_torch = torch.nn.functional.avg_pool2d(in_tensor, POOL_SIZE, POOL_SIZE)\n\n  print(in_tensor, out_nki, out_torch) # an implicit XLA barrier/mark-step\n\n  if (out_nki == out_torch).all():\n    print(\"NKI and Torch match\")\n  else:\n    print(\"NKI and Torch differ\")\n    # NKI_EXAMPLE_38_END\n"
  },
  {
    "path": "nki/examples/fused_mamba/mamba_nki_kernels.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nMamba-v1 NKI kernel implementation.\n\n\"\"\"\n# NKI_EXAMPLE_25_BEGIN\nimport nki\nimport nki.language as nl\nimport nki.isa as nisa\nimport numpy as np\n# NKI_EXAMPLE_25_END\nimport argparse\nimport itertools\n\n# NKI_EXAMPLE_25_BEGIN\n@nki.jit\ndef mamba_v1(delta, u, A, B, C):\n    \"\"\"Computes the SSM operation in the Mamba model.\n\n    :param delta: (batch_size, channels, seq_len)\n    :param u: (batch_size, channels, seq_len)\n    :param A: (channels, state_size)\n    :param B: (batch_size, state_size, seq_len)\n    :param C: (batch_size, state_size, seq_len)\n    :return: (batch_size, channels, seq_len)\n    \"\"\"\n    batch_size, channels, seq_len = delta.shape\n    output = nl.ndarray((batch_size, channels, seq_len), dtype=delta.dtype,\n                        buffer=nl.shared_hbm)\n\n    _, state_size = A.shape\n\n    # We can relax this using mask paramters in all the NKI API calls\n    assert channels % 128 == 0\n\n    # Map channels to the partition dimension\n    # Tile channels to comply with NKI tile size constraints\n    channel_psize = nl.tile_size.pmax\n    n_channel_tile = channels // channel_psize\n\n    # Most outer loop with batch_size, parallel_for\n    for i_batch in nl.affine_range(batch_size):\n        # Inner loop: tiling channels\n        for i_channel_tile in nl.affine_range(n_channel_tile):\n            channel_start = i_channel_tile * channel_psize\n\n            # partial accumulated scanC result with processed states\n            scanC_accum = nl.zeros((channel_psize, seq_len), dtype=delta.dtype)\n\n            # Second outer loop with state_size, partial parallel\n            for i_state in nl.affine_range(state_size):\n\n                # Load the relevant tile from delta and A\n                delta_slice = delta[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n                delta_i = nl.ndarray(delta_slice.shape, dtype=delta_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=delta_i, src=delta_slice)\n                A_slice = A[channel_start:channel_start+channel_psize, i_state:i_state+1]\n                A_i = nl.ndarray(A_slice.shape, dtype=A_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=A_i, src=A_slice)\n\n                # Step 1&2: Element-wise multiplication of delta_i and A_i and then exponential\n                deltaA = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.activation(dst=deltaA, op=nl.exp, data=delta_i, scale=A_i)\n\n                # Load the relevant tile from u and B\n                u_slice = u[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n                u_i = nl.ndarray(u_slice.shape, dtype=u_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=u_i, src=u_slice)\n                B_slice = B[i_batch, i_state:i_state+1, 0:seq_len]\n                B_i = nl.ndarray(B_slice.shape, dtype=B_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=B_i, src=B_slice)\n\n                # Step 3: Element-wise multiplication of delta_i, B_i and u_i\n                deltaU = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor(dst=deltaU, data1=delta_i, data2=u_i, op=nl.multiply)\n                B_i_bcast = nl.broadcast_to(B_i, (channel_psize, seq_len))\n                deltaBu = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                
nisa.tensor_tensor(dst=deltaBu, data1=deltaU, data2=B_i_bcast, op=nl.multiply)\n\n                # Step 4: Associative scan between deltaA and deltaBu\n                scan_res = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor_scan(dst=scan_res, data0=deltaA, data1=deltaBu, initial=0.0,\n                        op0=nl.multiply, op1=nl.add)\n\n                # Load the relevant tile from C\n                C_slice = C[i_batch, i_state:i_state+1, 0:seq_len]\n                C_i = nl.ndarray(C_slice.shape, dtype=C_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=C_i, src=C_slice)\n\n                # Step 5: Element-wise multiplication of scan_res and C_i\n                C_i_bcast = nl.broadcast_to(C_i, (channel_psize, seq_len))\n                scanC = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor(dst=scanC, data1=scan_res, data2=C_i_bcast, op=nl.multiply)\n\n                # Step 6: Accumulation of scanC along state_size dimension\n                nisa.tensor_tensor(dst=scanC_accum, data1=scanC_accum, data2=scanC, op=nl.add)\n\n            # Store scanC_accum for a single batch/channel tile to output\n            nisa.dma_copy(dst=output[i_batch, channel_start:channel_start+channel_psize, 0:seq_len],\n                    src=scanC_accum)\n\n    return output\n# NKI_EXAMPLE_25_END\n\n# NKI_EXAMPLE_26_BEGIN\n@nki.jit\ndef mamba_v2(delta, u, A, B, C):\n    \"\"\"Computes the SSM operation in the Mamba model.\n\n    :param delta: (batch_size, channels, seq_len)\n    :param u: (batch_size, channels, seq_len)\n    :param A: (channels, state_size)\n    :param B: (batch_size, state_size, seq_len)\n    :param C: (batch_size, state_size, seq_len)\n    :return: (batch_size, channels, seq_len)\n    \"\"\"\n    batch_size, channels, seq_len = delta.shape\n    output = nl.ndarray((batch_size, channels, seq_len), dtype=delta.dtype,\n                        buffer=nl.shared_hbm)\n    _, state_size = A.shape\n\n    assert channels % 128 == 0\n\n    # Map channels to the partition dimension\n    # Tile channels to comply with NKI tile size constraints\n    channel_psize = nl.tile_size.pmax\n    n_channel_tile = channels // channel_psize\n\n    # Most outer loop with batch_size, parallel_for\n    for i_batch in nl.affine_range(batch_size):\n\n        # Second outer loop: tiling channels\n        for i_channel_tile in nl.affine_range(n_channel_tile):\n            channel_start = i_channel_tile * channel_psize\n\n            # partial accumulated scanC result with processed states\n            scanC_accum = nl.zeros((channel_psize, seq_len), dtype=delta.dtype)\n\n            # Load delta/u once to be reused across states\n            delta_slice = delta[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n            delta_i = nl.ndarray(delta_slice.shape, dtype=delta_slice.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=delta_i, src=delta_slice)\n            u_slice = u[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n            u_i = nl.ndarray(u_slice.shape, dtype=u_slice.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=u_i, src=u_slice)\n\n            # Inner loop with state_size, partial parallel\n            for i_state in nl.affine_range(state_size):\n                # Load the relevant tile from A\n                A_slice = A[channel_start:channel_start+channel_psize, i_state:i_state+1]\n                A_i = 
nl.ndarray(A_slice.shape, dtype=A_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=A_i, src=A_slice)\n\n                # Step 1&2: Element-wise multiplication of delta_i and A_i and then exponential\n                deltaA = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.activation(dst=deltaA, op=nl.exp, data=delta_i, scale=A_i)\n\n                # Load the relevant tile from B\n                B_slice = B[i_batch, i_state:i_state+1, 0:seq_len]\n                B_i = nl.ndarray(B_slice.shape, dtype=B_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=B_i, src=B_slice)\n\n                # Step 3: Element-wise multiplication of delta_i, B_i and u_i\n                deltaU = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor(dst=deltaU, data1=delta_i, data2=u_i, op=nl.multiply)\n                B_i_bcast = nl.broadcast_to(B_i, (channel_psize, seq_len))\n                deltaBu = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor(dst=deltaBu, data1=deltaU, data2=B_i_bcast, op=nl.multiply)\n\n                # Step 4: Associative scan between deltaA and deltaBu\n                scan_res = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor_scan(dst=scan_res, data0=deltaA, data1=deltaBu, initial=0.0,\n                        op0=nl.multiply, op1=nl.add)\n\n                # Load the relevant tile from C\n                C_slice = C[i_batch, i_state:i_state+1, 0:seq_len]\n                C_i = nl.ndarray(C_slice.shape, dtype=C_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=C_i, src=C_slice)\n\n                # Step 5: Element-wise multiplication of scan_res and C_i\n                C_i_bcast = nl.broadcast_to(C_i, (channel_psize, seq_len))\n                scanC = nl.ndarray((channel_psize, seq_len), dtype=delta.dtype, buffer=nl.sbuf)\n                nisa.tensor_tensor(dst=scanC, data1=scan_res, data2=C_i_bcast, op=nl.multiply)\n\n                # Step 6: Accumulation of scanC along state_size dimension\n                nisa.tensor_tensor(dst=scanC_accum, data1=scanC_accum, data2=scanC, op=nl.add)\n\n            # Store scanC_accum for a single batch to output\n            nisa.dma_copy(dst=output[i_batch, channel_start:channel_start+channel_psize, 0:seq_len],\n                    src=scanC_accum[0:channel_psize, 0:seq_len])\n\n    return output\n# NKI_EXAMPLE_26_END\n\n\n@nki.jit\ndef mamba_v3(delta, u, A, B, C):\n    \"\"\"Computes the SSM operation in the Mamba model.\n\n    :param delta: (batch_size, channels, seq_len)\n    :param u: (batch_size, channels, seq_len)\n    :param A: (channels, state_size)\n    :param B: (batch_size, state_size, seq_len)\n    :param C: (batch_size, state_size, seq_len)\n    :return: (batch_size, channels, seq_len)\n    \"\"\"\n    batch_size, channels, seq_len = delta.shape\n    output = nl.ndarray((batch_size, channels, seq_len), dtype=delta.dtype,\n                        buffer=nl.shared_hbm)\n    _, state_size = A.shape\n\n    # Map channels to the partition dimension\n    # Tile channels to comply with NKI tile size constraints\n    channel_psize = nl.tile_size.pmax\n    n_channel_tile = channels // channel_psize\n\n    # Magic number, decided through empirical profiling data\n    seq_len_fsize = 512\n    n_seq_len_tile = seq_len // seq_len_fsize\n\n    # Fix this later with mask\n    assert 
channels % channel_psize == 0\n    assert seq_len % seq_len_fsize == 0\n\n    # Most outer loop with batch_size, parallel_for\n    for i_batch in nl.affine_range(batch_size):\n\n        # Second outer loop: tiling channels\n        for i_channel_tile in nl.affine_range(n_channel_tile):\n            channel_start = i_channel_tile * channel_psize\n\n            # partial accumulated scanC result with processed states\n            scanC_accum = nl.zeros((channel_psize, seq_len), dtype=delta.dtype)\n\n            # Load delta/u once to be reused across states\n            delta_slice = delta[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n            delta_i = nl.ndarray(delta_slice.shape, dtype=delta_slice.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=delta_i, src=delta_slice)\n            u_slice = u[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]\n            u_i = nl.ndarray(u_slice.shape, dtype=u_slice.dtype, buffer=nl.sbuf)\n            nisa.dma_copy(dst=u_i, src=u_slice)\n\n            # Inner loop with state_size, partial parallel\n            for i_state in nl.affine_range(state_size):\n                # Load the relevant tile from A\n                A_slice = A[channel_start:channel_start+channel_psize, i_state:i_state+1]\n                A_i = nl.ndarray(A_slice.shape, dtype=A_slice.dtype, buffer=nl.sbuf)\n                nisa.dma_copy(dst=A_i, src=A_slice)\n\n                # Last scan result\n                scan_init = nl.zeros((channel_psize, 1), dtype=delta_i.dtype)\n                # FIXME: sequential_range gives incorrect answer and also much worse perf than static_range\n                # for i_seq_len_tile in nl.sequential_range(n_seq_len_tile):\n                for i_seq_len_tile in nl.static_range(n_seq_len_tile):\n                    seq_len_start = i_seq_len_tile * seq_len_fsize\n\n                    # Step 1&2: Element-wise multiplication of delta_i and A_i and then exponential\n                    deltaA = nl.ndarray((channel_psize, seq_len_fsize), dtype=delta.dtype, buffer=nl.sbuf)\n                    nisa.activation(dst=deltaA, op=nl.exp,\n                            data=delta_i[0:channel_psize, seq_len_start:seq_len_start+seq_len_fsize],\n                            scale=A_i)\n\n                    # Load the relevant tile from B\n                    B_slice = B[i_batch, i_state:i_state+1, seq_len_start:seq_len_start+seq_len_fsize]\n                    B_i = nl.ndarray(B_slice.shape, dtype=B_slice.dtype, buffer=nl.sbuf)\n                    nisa.dma_copy(dst=B_i, src=B_slice)\n\n                    # Step 3: Element-wise multiplication of delta_i, B_i and u_i\n                    deltaU = nl.ndarray((channel_psize, seq_len_fsize), dtype=delta.dtype, buffer=nl.sbuf)\n                    nisa.tensor_tensor(dst=deltaU,\n                            data1=delta_i[0:channel_psize, seq_len_start:seq_len_start+seq_len_fsize],\n                            data2=u_i[0:channel_psize, seq_len_start:seq_len_start+seq_len_fsize],\n                            op=nl.multiply)\n                    B_i_bcast = nl.broadcast_to(B_i, (channel_psize, seq_len_fsize))\n                    deltaBu = nl.ndarray((channel_psize, seq_len_fsize), dtype=delta.dtype, buffer=nl.sbuf)\n                    nisa.tensor_tensor(dst=deltaBu, data1=deltaU, data2=B_i_bcast, op=nl.multiply)\n\n                    # Step 4: Associative scan between deltaA and deltaBu\n                    scan_res = nl.ndarray((channel_psize, seq_len_fsize), dtype=delta.dtype, 
buffer=nl.sbuf)\n                    nisa.tensor_tensor_scan(dst=scan_res, data0=deltaA, data1=deltaBu, initial=scan_init,\n                            op0=nl.multiply, op1=nl.add)\n                    nisa.tensor_copy(dst=scan_init, src=scan_res[0:channel_psize, seq_len_fsize-1:seq_len_fsize])\n\n                    # Load the relevant tile from C\n                    C_slice = C[i_batch, i_state:i_state+1, seq_len_start:seq_len_start+seq_len_fsize]\n                    C_i = nl.ndarray(C_slice.shape, dtype=C_slice.dtype, buffer=nl.sbuf)\n                    nisa.dma_copy(dst=C_i, src=C_slice)\n\n                    # Step 5: Element-wise multiplication of scan_res and C_i\n                    C_i_bcast = nl.broadcast_to(C_i, (channel_psize, seq_len_fsize))\n                    scanC = nl.ndarray((channel_psize, seq_len_fsize), dtype=delta.dtype, buffer=nl.sbuf)\n                    nisa.tensor_tensor(dst=scanC, data1=scan_res, data2=C_i_bcast, op=nl.multiply)\n\n                    # Step 6: Accumulation of scanC along state_size dimension\n                    nisa.tensor_tensor(dst=scanC_accum[0:channel_psize, seq_len_start:seq_len_start+seq_len_fsize],\n                            data1=scanC_accum[0:channel_psize, seq_len_start:seq_len_start+seq_len_fsize],\n                            data2=scanC, op=nl.add)\n\n            # Store scanC_accum for a single batch to output\n            nisa.dma_copy(dst=output[i_batch, channel_start:channel_start+channel_psize, 0:seq_len],\n                    src=scanC_accum[0:channel_psize, 0:seq_len])\n    return output\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\"Run Mamba NKI kernels.\")\n    parser.add_argument(\"--version\",\n            nargs='+',\n            default=[\"v1\", \"v2\", \"v3\"],\n            choices=[\"v1\", \"v2\", \"v3\"],\n            help=\"Test versions\")\n\n    parser.add_argument(\"--batch\",\n            nargs='+',\n            default=[1],\n            help=\"Batch size.\")\n    parser.add_argument(\"--seq_len\",\n            nargs='+',\n            default=[2048],\n            help=\"Sequence length.\")\n    parser.add_argument(\"--channels\",\n            nargs='+',\n            default=[256],\n            help=\"Number of channels.\")\n    parser.add_argument(\"--state_size\",\n            nargs='+',\n            default=[16],\n            help=\"State size.\")\n\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n\n    # Small test to ensure numerical correctness\n    arr_batch = [int(_) for _ in args.batch]\n    arr_seq_len = [int(_) for _ in args.seq_len]\n    arr_channels = [int(_) for _ in args.channels]\n    arr_state_size = [int(_) for _ in args.state_size]\n\n    configs = itertools.product(arr_batch, arr_seq_len, arr_channels, arr_state_size)\n\n    for config in configs:\n        batch, seq_len, channels, state_size = config\n        print(f\">>> batch={batch}, seq_len={seq_len}, channels={channels}, state_size={state_size}\")\n\n        # Set up input tensors\n        dtype = np.float32\n        delta = np.ones((batch, channels, seq_len), dtype=dtype)\n        u = np.ones((batch, channels, seq_len), dtype=dtype)\n        A = -np.ones((channels, state_size), dtype=dtype)\n        B = np.ones((batch, state_size, seq_len), dtype=dtype)\n        C = np.ones((batch, state_size, seq_len), dtype=dtype)\n\n        func_dict = {\"v1\": mamba_v1,\n                     \"v2\": mamba_v2,\n                     \"v3\": mamba_v3,\n                 
   }\n\n        # v1: reference kernel\n        print(f\">>>> Running v1 (reference).\")\n        nki_out_v1 = mamba_v1(delta, u, A, B, C)\n\n        for version in args.version:\n            if version == \"v1\":\n                # already run, continue\n                continue\n\n            print(f\">>>> Running version {version}.\")\n            func = func_dict[version]\n            nki_out_test = func(delta, u, A, B, C)\n            print(f\">>>> mamba {version} matches?\", np.all(nki_out_test == nki_out_v1))\n            assert np.all(nki_out_test == nki_out_v1)"
  },
  {
    "path": "nki/examples/fused_mamba/mamba_torch.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nMamba-v1 PyTorch Reference Implementation.\n\n\"\"\"\n\n# NKI_EXAMPLE_24_BEGIN\nimport torch\nimport torch_xla\nimport os\nimport argparse\n\nos.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\nos.environ[\"NEURON_CC_FLAGS\"]= \" --model-type=transformer --disable-dge \"\n\n\ndef associative_scan(deltaA, deltaB_u):\n    \"\"\"\n    Args:\n        deltaA: [batch_size, channels, state_size, seq_len]\n        deltaB_u: [batch_size, channels, state_size, seq_len]\n\n    Mamba uses an associative scan operator to aggregate information across\n    time sequentially (sequence length, e.g. sequence of tokens),\n    from the past to the present.\n    \"\"\"\n    batch_size, channels, state_size, seq_len = deltaA.shape\n    out = torch.empty(batch_size, channels, state_size, seq_len,\n                        device=deltaA.device, dtype=deltaA.dtype)\n    for i in range(seq_len):\n        prev_state = out[..., i - 1] if i > 0 else 0\n        out[..., i] = deltaA[..., i] * prev_state + deltaB_u[..., i]\n    return out\n\n\ndef mamba_layer(delta, A, B, u, C):\n    \"\"\"\n    Args:\n        delta: [batch, channels, seq_len]\n        u: [batch, channels, seq_len]\n        A: [channels, state_size]\n        B: [batch, state_size, seq_len]\n        C: [batch, state_size, seq_len]\n    \"\"\"\n    # expand the tensors so they all have the same dimensions and compute elementwise products (with broadcast)\n    # deltaA and deltaB_u have shape [batch_size, channels, state_size, seq_len]\n    deltaA = torch.exp(delta[:, :, None, :] * A[None, :, :, None])\n    deltaB_u = delta[:, :, None, :] * B[:, None, :, :] * u[:, :, None, :]\n    scan_res = associative_scan(deltaA, deltaB_u)\n    # y sums over the `state_size` axis and has shape [batch_size, channels, seq_len]\n    mamba_out = (C[:, None, :, :] * scan_res).sum(dim=-2)\n    return mamba_out\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\n    \"\"\"Run Mamba PyTorch implementation. 
Hard-coded small example only since\n       PyTorch implementation is very slow for larger configs.\n    \"\"\")\n    parser.add_argument(\"--mode\",\n                        choices=[\"accuracy\", \"perf\"],\n                        default=\"accuracy\",\n                        help=\"\"\"Do accuracy test or perf test.\n                                Accuracy test compares mamba_v1 kernel against PyTorch implementation.\n                                Perf test will generate a NEFF for the PyTorch implementation in local directory\n                                for a manual run of neuron-profile.\n                             \"\"\")\n    args = parser.parse_args()\n    return args\n\n\nif __name__ == \"__main__\":\n    args = parse_args()\n\n    # Toy example\n    batch = 1\n    seq_len = 512\n    channels = 256\n    state_size = 16\n\n    dtype = torch.float32\n\n    device = torch_xla.device()\n\n    delta = torch.ones(batch, channels, seq_len, dtype=dtype, device=device)\n    u = torch.ones(batch, channels, seq_len, dtype=dtype, device=device)\n\n    # For numerical accuracy testing purposes, we choose negative numbers for A on purpose.\n    # Otherwise, the associative scan will integrate too fast and overflow, which would\n    # mask any real numerical issues in our computation.\n    # A negative A will ensure we catch numerical issues when we have them.\n    A = -torch.ones(channels, state_size, dtype=dtype, device=device)\n    B = torch.ones(batch, state_size, seq_len, dtype=dtype, device=device)\n\n    C = torch.ones(batch, state_size, seq_len, dtype=dtype, device=device)\n\n    torch_xla.sync()\n    torch_out = mamba_layer(delta, A, B, u, C)\n    torch_xla.sync()\n    print(torch_out)\n    # NKI_EXAMPLE_24_END\n\n    if args.mode == \"accuracy\":\n        # Call NKI mamba_v1 kernel to check accuracy\n        from mamba_nki_kernels import mamba_v1\n\n        torch_xla.sync()\n        nki_out = mamba_v1(delta, u, A, B, C)\n        torch_xla.sync()\n\n        allclose = torch.allclose(torch_out, nki_out, atol=1e-2, rtol=1e-2)\n\n        if allclose:\n            print(\"NKI and Torch match\")\n        else:\n            print(\"NKI and Torch differ\")\n\n        assert allclose\n"
  },
  {
    "path": "nki/examples/getting_started_baremetal.py",
    "content": "# NKI_EXAMPLE_0_BEGIN NKI_EXAMPLE_1_BEGIN\nimport nki\nimport nki.language as nl\n# NKI_EXAMPLE_1_END\n\n\n# NKI_EXAMPLE_2_BEGIN\n@nki.jit\ndef nki_tensor_add_kernel(a_input, b_input):\n    # NKI_EXAMPLE_2_END\n\n    \"\"\"NKI kernel to compute element-wise addition of two input tensors\n    \"\"\"\n\n    # NKI_EXAMPLE_3_BEGIN\n    # Check all input/output tensor shapes are the same for element-wise operation\n    assert a_input.shape == b_input.shape\n\n    # Check size of the first dimension does not exceed on-chip memory tile size limit,\n    # so that we don't need to tile the input to keep this example simple\n    assert a_input.shape[0] <= nl.tile_size.pmax\n    # NKI_EXAMPLE_3_END\n\n    # Load the inputs from device memory to on-chip memory\n    # NKI_EXAMPLE_4_BEGIN\n    a_tile = nl.load(a_input)\n    b_tile = nl.load(b_input)\n    # NKI_EXAMPLE_4_END\n\n    # Specify the computation (in our case: a + b)\n    # NKI_EXAMPLE_5_BEGIN\n    c_tile = nl.add(a_tile, b_tile)\n    # NKI_EXAMPLE_5_END\n\n    # NKI_EXAMPLE_6_BEGIN\n    # Create a HBM tensor as the kernel output\n    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)\n\n    # Store the result to c_output from on-chip memory to device memory\n    nl.store(c_output, value=c_tile)\n\n    # Return kernel output as function output\n    return c_output\n# NKI_EXAMPLE_0_END NKI_EXAMPLE_6_END\n\n\nif __name__ == \"__main__\":\n    # NKI_EXAMPLE_8_BEGIN\n    import numpy as np\n\n    a = np.ones((4, 3), dtype=np.float16)\n    b = np.ones((4, 3), dtype=np.float16)\n\n    # NKI_EXAMPLE_12_BEGIN\n    # Run NKI kernel on a NeuronDevice\n    c = nki_tensor_add_kernel(a, b)\n    # NKI_EXAMPLE_12_END\n\n    print(c)\n    # NKI_EXAMPLE_8_END\n"
  },
  {
    "path": "nki/examples/getting_started_jax.py",
    "content": "import nki\nimport nki.language as nl\n\n@nki.jit\ndef nki_tensor_add_kernel(a_input, b_input):\n    \"\"\"NKI kernel to compute element-wise addition of two input tensors\n    \"\"\"\n\n    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)\n\n    # Check all input/output tensor shapes are the same for element-wise operation\n    assert a_input.shape == b_input.shape\n\n    # Check size of the first dimension does not exceed on-chip memory tile size limit,\n    # so that we don't need to tile the input to keep this example simple\n    assert a_input.shape[0] <= nl.tile_size.pmax\n\n    # Load the inputs from device memory to on-chip memory\n    a_tile = nl.load(a_input)\n    b_tile = nl.load(b_input)\n\n    # Specify the computation (in our case: a + b)\n    c_tile = nl.add(a_tile, b_tile)\n\n    # Store the result to c_output from on-chip memory to device memory\n    nl.store(c_output, value=c_tile)\n\n    return c_output\n\n\nif __name__ == \"__main__\":\n    # NKI_EXAMPLE_11_BEGIN\n    import jax.numpy as jnp\n\n    a = jnp.ones((4, 3), dtype=jnp.float16)\n    b = jnp.ones((4, 3), dtype=jnp.float16)\n\n    c = nki_tensor_add_kernel(a, b)\n\n    print(c)\n    # NKI_EXAMPLE_11_END\n"
  },
  {
    "path": "nki/examples/getting_started_torch.py",
    "content": "import nki\nimport nki.language as nl\n\n@nki.jit\ndef nki_tensor_add_kernel(a_input, b_input):\n    \"\"\"NKI kernel to compute element-wise addition of two input tensors\n    \"\"\"\n\n    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)\n\n    # Check all input/output tensor shapes are the same for element-wise operation\n    assert a_input.shape == b_input.shape\n\n    # Check size of the first dimension does not exceed on-chip memory tile size limit,\n    # so that we don't need to tile the input to keep this example simple\n    assert a_input.shape[0] <= nl.tile_size.pmax\n\n    # Load the inputs from device memory to on-chip memory\n    a_tile = nl.load(a_input)\n    b_tile = nl.load(b_input)\n\n    # Specify the computation (in our case: a + b)\n    c_tile = nl.add(a_tile, b_tile)\n\n    # Store the result to c_output from on-chip memory to device memory\n    nl.store(c_output, value=c_tile)\n\n    return c_output\n\n\nif __name__ == \"__main__\":\n    # NKI_EXAMPLE_10_BEGIN\n    import torch\n    import torch_xla\n\n    device = torch_xla.device()\n\n    a = torch.ones((4, 3), dtype=torch.float16).to(device=device)\n    b = torch.ones((4, 3), dtype=torch.float16).to(device=device)\n\n    c = nki_tensor_add_kernel(a, b)\n\n    print(c)  # an implicit XLA barrier/mark-step (triggers XLA compilation)\n    # NKI_EXAMPLE_10_END\n"
  },
  {
    "path": "nki/examples/index-case-1.py",
    "content": "import nki\nimport nki.language as nl\nimport math\n\n@nki.jit\ndef tensor_split_kernel_(in_tensor):\n  \"\"\"NKI kernel to split an input tensor into two output tensors, along the column axis.\n\n  The even columns of the input tensor will be gathered into the first output tensor,\n  and the odd columns of the input tensor will be gathered into the second output tensor.\n\n  Args:\n      in_tensor: an input tensor\n  Returns:\n      out_tensor_even: a first output tensor (will hold the even columns of the input tensor)\n      out_tensor_odd: a second output tensor (will hold the odd columns of the input tensor)\n  \"\"\"\n\n  # This example only works for tensors with a partition dimension that fits in the SBUF\n  assert in_tensor.shape[0] <= nl.tile_size.pmax\n\n  # Extract tile sizes.\n  sz_p, sz_f = in_tensor.shape\n  sz_fout_even = sz_f - sz_f // 2\n  sz_fout_odd = sz_f // 2\n\n  # create output tensors\n  out_tensor_even = nl.ndarray((sz_p, sz_fout_even), dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n  out_tensor_odd = nl.ndarray((sz_p, sz_fout_odd), dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n\n  # Load input data from external memory to on-chip memory\n  in_tile = nl.load(in_tensor)\n\n  # Store the results back to external memory\n  nl.store(out_tensor_even, value=in_tile[:, 0:sz_f:2])\n  nl.store(out_tensor_odd,  value=in_tile[:, 1:sz_f:2])\n\n  return out_tensor_even, out_tensor_odd\n\n\nif __name__ == \"__main__\":\n    import torch\n    import torch_xla\n\n    device = torch_xla.device()\n\n    X, Y = 4, 5\n    in_tensor = torch.arange(X * Y, dtype=torch.bfloat16).reshape(X, Y).to(device=device)\n\n    out1_tensor, out2_tensor = tensor_split_kernel_(in_tensor)\n    print(in_tensor, out1_tensor, out2_tensor)\n"
  },
  {
    "path": "nki/examples/index-case-3.py",
    "content": "import nki\nimport nki.language as nl\n\n@nki.jit\ndef tensor_maxpool_kernel_(in_tensor, sz_pool):\n  \"\"\"NKI kernel to compute a 2D max-pool operation\n\n  Args:\n      in_tensor: an input tensor, of dimensions C x H x W\n      sz_pool: integer P representing a (square) pool-window size\n  Returns:\n      out_tensor: the resulting output tensor, of dimensions C x (H/P) x (W/P)\n  \"\"\"\n\n  # Get input/output dimensions\n  sz_p, sz_hin, sz_win = in_tensor.shape\n  sz_hout, sz_wout = sz_hin // sz_pool, sz_win // sz_pool\n  out_tensor = nl.ndarray((sz_p, sz_hout, sz_wout), dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  # Load input data from external memory to on-chip memory\n  in_tile = nl.load(in_tensor)\n\n  # Perform the pooling operation using an access pattern to create a 5D view:\n  # [sz_p, sz_hout, sz_wout, sz_pool, sz_pool]\n  # The pool dimensions are placed last so we can reduce over them.\n  pool_view = in_tile.ap([\n    [sz_hin * sz_win, sz_p],      # partition stride\n    [sz_pool * sz_win, sz_hout],   # outer row stride (hop by pool rows)\n    [sz_pool, sz_wout],            # outer col stride (hop by pool cols)\n    [sz_win, sz_pool],             # inner row stride (within pool window)\n    [1, sz_pool],                  # inner col stride (within pool window)\n  ])\n  out_tile = nl.max(pool_view, axis=[3, 4])\n\n  # Store the results back to external memory\n  nl.store(out_tensor, value=out_tile)\n\n  return out_tensor\n\n\nif __name__ == \"__main__\":\n    import torch\n    import torch_xla\n\n    device = torch_xla.device()\n\n    # Now let's run the kernel\n    POOL_SIZE = 2\n    C, HIN, WIN = 2, 6, 6\n    HOUT, WOUT = HIN//POOL_SIZE, WIN//POOL_SIZE\n\n    in_tensor = torch.arange(C * HIN * WIN, dtype=torch.bfloat16).reshape(C, HIN, WIN).to(device=device)\n    out_tensor = tensor_maxpool_kernel_(in_tensor, POOL_SIZE)\n\n    print(in_tensor, out_tensor) # an implicit XLA barrier/mark-step\n"
  },
  {
    "path": "nki/examples/layout-dynamic-loop.py",
    "content": "import nki.language as nl\nimport nki\nimport math\n\n@nki.jit\ndef tensor_exp_kernel_(in_tensor):\n  \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n\n  Args:\n      in_tensor: an input tensor of ANY 2D shape (up to SBUF size)\n  Returns:\n      out_tensor: an output tensor of ANY 2D shape (up to SBUF size)\n  \"\"\"\n  sz_p, sz_f = in_tensor.shape\n  out_tensor = nl.ndarray((sz_p, sz_f), dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  for p in nl.affine_range(math.ceil(sz_p / nl.tile_size.pmax)):\n    # Generate tensor indices for the input/output tensors\n    p_start = k * nl.tile_size.pmax\n    p_end = p_start + nl.tile_size.pmax\n    i_p = slice(p_start, min(p_end, sz_p))\n\n    # Load input data from external memory to on-chip memory\n    in_tile = nl.load(in_tensor[i_p, 0:sz_f]\n\n    # perform the computation\n    out_tile = nl.exp(in_tile)\n\n    # store the results back to external memory\n    nl.store(out_tensor[i_p, 0:sz_f], value=out_tile)\n\n    return out_tensor\n"
  },
  {
    "path": "nki/examples/layout-loop.py",
    "content": "import nki.language as nl\nfrom torch_neuronx import nki_jit\n\n@nki_jit\ndef tensor_exp_kernel_(in_tensor):\n  \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n\n  Args:\n      in_tensor: an input tensor of shape [256,512]\n  Returns:\n      out_tensor: an output tensor of shape [256,512]\n  \"\"\"\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  for k in nl.affine_range(2):\n    # Generate tensor indices for the input/output tensors\n    p_start = k * nl.tile_size.pmax\n    p_end = p_start + nl.tile_size.pmax\n    i_p = slice(p_start, p_end)\n\n    # Load input data from HBM to on-chip memory\n    in_tile = nl.load(in_tensor[i_p, 0:512])\n\n    # perform the computation\n    out_tile = nl.exp(in_tile)\n\n    # store the results back to HBM\n    nl.store(out_tensor[i_p, i_f], value=out_tile)\n\n  return out_tensor\n"
  },
  {
    "path": "nki/examples/layout-pass.py",
    "content": "import nki.language as nl\nimport nki\n\n@nki.jit\ndef tensor_exp_kernel_(in_tensor):\n  \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n\n  Args:\n      in_tensor: an input tensor of shape [128,512]\n  Returns:\n      out_tensor: an output tensor of shape [128,512]\n  \"\"\"\n\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  # Load input data from HBM to on-chip memory\n  in_tile = nl.load(in_tensor[0:128, 0:512])\n\n  # perform the computation:\n  out_tile = nl.exp(in_tile)\n\n  # store the results back to HBM\n  nl.store(out_tensor[0:128, 0:512], value=out_tile)\n\n  return out_tensor\n\n\nif __name__ == \"__main__\":\n  import torch\n  import torch_xla\n\n  device = torch_xla.device()\n\n  shape = (128, 512)\n  in_tensor = torch.ones(shape,  dtype=torch.bfloat16).to(device=device)\n  out_tensor = tensor_exp_kernel_(in_tensor)\n\n  print(out_tensor) # an implicit XLA barrier/mark-step\n"
  },
  {
    "path": "nki/examples/layout-violation.py",
    "content": "import nki.language as nl\nimport nki\n\n\n@nki.jit\ndef tensor_exp_kernel_(in_tensor):\n  \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n\n  Args:\n      in_tensor: an input tensor of shape [128,512]\n  Returns:\n      out_tensor: an output tensor of shape [128,512]\n  \"\"\"\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  # Load input data from HBM to on-chip memory\n  in_tile = nl.load(in_tensor[0:256, 0:512])\n\n  # perform the computation:\n  out_tile = nl.exp(in_tile)\n\n  # store the results back to HBM\n  nl.store(out_tensor[0:256, 0:512], value=out_tile)\n\n\n# NKI_EXAMPLE_12_BEGIN\nif __name__ == \"__main__\":\n  import torch\n  import torch_xla\n\n  device = torch_xla.device()\n\n  shape = (256, 512) # Previously (128, 512)\n  in_tensor = torch.ones(shape,  dtype=torch.bfloat16).to(device=device)\n  out_tensor = tensor_exp_kernel_(in_tensor)\n\n  print(out_tensor) # an implicit XLA barrier/mark-step\n  # NKI_EXAMPLE_12_END\n"
  },
  {
    "path": "nki/examples/matrix_multiplication/matrix_multiplication_nki_kernels.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nNKI implementation for matrix multiplication NKI tutorial.\n\n\"\"\"\n\nimport nki as nki\nimport nki.isa as nisa\nimport nki.language as nl\nimport numpy as np\n\n\n# NKI_EXAMPLE_16_BEGIN\n@nki.jit\ndef nki_matmul_basic_(lhsT, rhs):\n  \"\"\"NKI kernel to compute a 64x128x512 matrix multiplication operation\n\n  Args:\n      lhsT: an input tensor of shape [128,64], a left hand side argument of the\n        matrix multiplication, delivered transposed for optimal performance\n      rhs: an input tensor of shape [128,512], a right hand side argument of the\n        matrix multiplication\n  Returns:\n      result: the resulting output tensor of shape [64,512]\n  \"\"\"\n  # Verify that the lhsT and rhs are the expected sizes.\n  K, M = lhsT.shape\n  K_, N = rhs.shape\n\n  # Check that the contraction dimension matches and all dimensions\n  #are what were expected.\n  assert K == K_, \\\n    f\"Expected contraction dimension to match on both lhsT ({K}) and rhs ({K})\"\n  assert K == 128, f\"Expected contraction dimension to be 128, but got {K}\"\n  assert M == 64, f\"Expected lhsT matrix to have dimension M of 64, but got {M}\"\n  assert N == 512, f\"Expected rhs matrix to have dimension N of 512, but got {N}\"\n\n  # Create a tensor to write the result into (not initialized)\n  result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n  # Creating a tensor in SBUF to load the inputs into (not initialized)\n  lhs_tile = nl.ndarray(lhsT.shape, dtype=lhsT.dtype, buffer=nl.sbuf)\n  rhs_tile = nl.ndarray(rhs.shape, dtype=rhs.dtype, buffer=nl.sbuf)\n\n  # Loading the inputs (HBM->SBUF)\n  # Note: here we take Tile dtype definition into account,\n  # which forces P-dim as the left most index\n  nisa.dma_copy(dst=lhs_tile, src=lhsT)\n  nisa.dma_copy(dst=rhs_tile, src=rhs)\n\n  # Create a tensor in PSUM to accumulate the result in (uninitialized)\n  result_psum = nl.ndarray(result.shape, dtype=nl.float32, buffer=nl.psum)\n\n  # Perform the matrix-multiplication\n  # Note: A NKI matmul instruction always writes to PSUM in float32 data-type\n  nisa.nc_matmul(result_psum, lhs_tile, rhs_tile)\n\n  # Create a tensor in SBUF and copy the result from PSUM back to SBUF, \n  # and cast to expected output data-type\n  result_sbuf = nl.ndarray(result_psum.shape, dtype=result.dtype, buffer=nl.sbuf)\n  nisa.tensor_copy(dst=result_sbuf, src=result_psum)\n\n  # The result of [64,128] x [128,512] matrix multiplication has a shape of [64, 512].\n  # This dictates which indices to use to address the result tile.\n  nisa.dma_copy(dst=result, src=result_sbuf)\n\n  return result\n  # NKI_EXAMPLE_16_END\n\n\n# NKI_EXAMPLE_18_BEGIN\n@nki.jit\ndef nki_matmul_tiled_(lhsT, rhs):\n  \"\"\"NKI kernel to compute a matrix multiplication operation in a tiled manner\n\n  Args:\n      lhsT: an input tensor of shape [K,M], where both K and M are multiples for\n        128.  It is the left-hand-side argument of the matrix multiplication,\n        delivered transposed for optimal performance.\n      rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n        is a multiple of 512.  
It is the right-hand-side argument of the matrix\n        multiplication.\n  Returns:\n      result: the resulting output tensor of shape [M,N]\n  \"\"\"\n\n  # Verify that the lhsT and rhs have the same contraction dimension.\n  K, M = lhsT.shape\n  K_, N = rhs.shape\n  assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n  # Lookup the device matrix multiply dimensions.\n  TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  TILE_K = nl.tile_size.pmax  # 128\n  TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # Verify that the input matrices are a multiple of the tile dimensions.\n  assert M % TILE_M == 0, \\\n    f\"Expected M, {M}, to be a multiple of stationary free-dimension max, {TILE_M}\"\n  assert N % TILE_N == 0, \\\n    f\"Expected N, {N}, to be a multiple of moving free-dimension max, {TILE_N}\"\n  assert K % TILE_K == 0, \\\n    f\"Expected K, {K}, to be a multiple of the partition dimension max, {TILE_K}\"\n\n  # Create a space for the result in HBM (not initialized)\n  result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n  # Use affine_range to loop over tiles\n  for m in nl.affine_range(M // TILE_M):\n    for n in nl.affine_range(N // TILE_N):\n      # Allocate a tensor in PSUM\n      res_psum = nl.ndarray((TILE_M, TILE_N), nl.float32, buffer=nl.psum)\n\n      for k in nl.affine_range(K // TILE_K):\n        # Declare the tiles on SBUF\n        lhsT_tile = nl.ndarray((TILE_K, TILE_M), dtype=lhsT.dtype, buffer=nl.sbuf)\n        rhs_tile = nl.ndarray((TILE_K, TILE_N), dtype=rhs.dtype, buffer=nl.sbuf)\n\n        # Load tiles from lhsT and rhs\n        nisa.dma_copy(dst=lhsT_tile,\n                      src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                               m * TILE_M:(m + 1) * TILE_M])\n        nisa.dma_copy(dst=rhs_tile, \n                      src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                              n * TILE_N:(n + 1) * TILE_N])\n\n        # Accumulate partial-sums into PSUM\n        nisa.nc_matmul(dst=res_psum, stationary=lhsT_tile, moving=rhs_tile)\n\n      # Copy the result from PSUM back to SBUF, and cast to expected output data-type\n      res_sb = nl.ndarray(res_psum.shape, dtype=result.dtype, buffer=nl.sbuf)\n      nisa.tensor_copy(dst=res_sb, src=res_psum)\n\n      # Copy the result from SBUF to HBM.\n      nisa.dma_copy(dst=result[m * TILE_M:(m + 1) * TILE_M,\n                               n * TILE_N:(n + 1) * TILE_N],\n                    src=res_sb)\n\n  return result\n  # NKI_EXAMPLE_18_END\n\n\n# NKI_EXAMPLE_19_BEGIN\n@nki.jit\ndef nki_matmul_hoist_load_(lhsT, rhs):\n  \"\"\"NKI kernel to compute a matrix multiplication operation in a tiled manner\n     while hoisting the load of the lhsT and rhs to outer loops.\n\n  Args:\n      lhsT: an input tensor of shape [K,M], where both K and M are multiples for\n        128.  It is the left-hand-side argument of the matrix multiplication,\n        delivered transposed for optimal performance.\n      rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n        is a multiple of 512.  
It is the right-hand-side argument of the matrix\n        multiplication.\n  Returns:\n      result: the resulting output tensor of shape [M,N]\n  \"\"\"\n\n  # Verify that the lhsT and rhs are the expected sizes.\n  K, M = lhsT.shape\n  K_, N = rhs.shape\n  assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n  # Lookup the device matrix multiply dimensions.\n  TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  TILE_K = nl.tile_size.pmax  # 128\n  TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # Verify that the input matrices are a multiple of the tile dimensions.\n  assert M % TILE_M == 0, \\\n    f\"Expected M, {M}, to be a multiple of stationary free-dimension max, {TILE_M}\"\n  assert N % TILE_N == 0, \\\n    f\"Expected N, {N}, to be a multiple of moving free-dimension max, {TILE_N}\"\n  assert K % TILE_K == 0, \\\n    f\"Expected K, {K}, to be a multiple of the partition dimension max, {TILE_K}\"\n\n  # Create a space for the result in HBM (not initialized)\n  result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n  # Use affine_range to loop over tiles\n  for m in nl.affine_range(M // TILE_M):\n    # Load a whole column tiles from lhsT (with K * TILE_M numbers)\n    # This corresponds to the whole row in the original lhs\n    lhsT_tiles = []\n    for k in nl.affine_range(K // TILE_K):\n      # Allocate space in SBUF for the tile (uninitialized)\n      lhsT_tile = nl.ndarray(shape=(TILE_K, TILE_M), dtype=lhsT.dtype, buffer=nl.sbuf)\n      # Copy the tile from HBM to SBUF\n      nisa.dma_copy(dst=lhsT_tile, \n                    src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                             m * TILE_M:(m + 1) * TILE_M])\n      # Append the tile to the list of tiles.\n      lhsT_tiles.append(lhsT_tile)\n\n    for n in nl.affine_range(N // TILE_N):\n      # Load a whole column tiles from rhs (with K * TILE_N numbers)\n      rhs_tiles = []\n      for k in nl.affine_range(K // TILE_K):\n        # Allocate space in SBUF for the tile (uninitialized)\n        rhs_tile = nl.ndarray(shape=(TILE_K, TILE_N), dtype=rhs.dtype, buffer=nl.sbuf)\n        # Copy the tile from HBM to SBUF\n        nisa.dma_copy(dst=rhs_tile,\n                      src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                              n * TILE_N:(n + 1) * TILE_N])\n        # Append the tile to the list of tiles.\n        rhs_tiles.append(rhs_tile)\n\n      # Allocate a tile in PSUM for the result (uninitialized)\n      res_psum = nl.ndarray(shape=(TILE_M, TILE_N), dtype=nl.float32, buffer=nl.psum)\n      for k in nl.affine_range(K // TILE_K):\n        # Accumulate partial-sums into PSUM\n        nisa.nc_matmul(dst=res_psum, stationary=lhsT_tiles[k], moving=rhs_tiles[k])\n\n      # Copy the result from PSUM back to SBUF, and cast to expected output data-type\n      res_sb = nl.ndarray(shape=(TILE_M, TILE_N), dtype=nl.float32, buffer=nl.sbuf)\n      nisa.tensor_copy(dst=res_sb, src=res_psum)\n\n      # Copy the result from SBUF to HBM.\n      nisa.dma_copy(dst=result[m * TILE_M:(m + 1) * TILE_M,\n                               n * TILE_N:(n + 1) * TILE_N],\n                    src=res_sb)\n\n  return result\n  # NKI_EXAMPLE_19_END\n\n\n# NKI_EXAMPLE_20_BEGIN\n@nki.jit\ndef nki_matmul_block_free_dimension_(lhsT, rhs):\n  \"\"\"NKI kernel to compute a matrix multiplication operation while blocking the\n     free dimensions of the LHS and RHS to improve memory access pattern.\n\n  Args:\n      lhsT: an input tensor of shape [K,M], where both K and M are multiples for\n        
128.  It is the left-hand-side argument of the matrix multiplication,\n        delivered transposed for optimal performance.\n      rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n        is a multiple of 512.  It is the right-hand-side argument of the matrix\n        multiplication.\n  Returns:\n      result: the resulting output tensor of shape [M,N]\n  \"\"\"\n\n  # Verify that the lhsT and rhs have the same contraction dimension.\n  K, M = lhsT.shape\n  K_, N = rhs.shape\n  assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n  # Lookup the device matrix multiply dimensions.\n  TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  TILE_K = nl.tile_size.pmax  # 128\n  TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # Configuring the blocking size for the free dimensions\n  TILES_IN_BLOCK_M = 2\n  TILES_IN_BLOCK_N = 2\n\n  BLOCK_M = TILE_M * TILES_IN_BLOCK_M  # 256\n  BLOCK_N = TILE_N * TILES_IN_BLOCK_N  # 1024\n\n  # the size has to be multiple of block size\n  assert M % BLOCK_M == 0\n  assert N % BLOCK_N == 0\n\n  # Create a space for the result in HBM (not initialized)\n  result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n  # Loop over blocks over the M dimension\n  for m in nl.affine_range(M // BLOCK_M):\n    # Load TILES_IN_BLOCK_M columns tiles by TILES_K rows from lhsT\n    lhsT_tiles = []\n    for bm in nl.affine_range(TILES_IN_BLOCK_M):\n      # Inner tile array.\n      lhsT_tiles_internal = []\n      for k in nl.affine_range(K // TILE_K):\n        # Allocate space in SBUF for the tile (uninitialized)\n        lhsT_tile = nl.ndarray(shape=(TILE_K, TILE_M),\n                               dtype=lhsT.dtype,\n                               buffer=nl.sbuf)\n        # Copy the tile from HBM to SBUF\n        nisa.dma_copy(dst=lhsT_tile,\n                      src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                               (m * TILES_IN_BLOCK_M + bm) *\n                               TILE_M:((m * TILES_IN_BLOCK_M + bm) + 1) *\n                               TILE_M])\n        # Append the tile to the inner list of tiles.\n        lhsT_tiles_internal.append(lhsT_tile)\n      # Append the inner list of tiles into the outer list of tiles.\n      lhsT_tiles.append(lhsT_tiles_internal)\n\n    for n in nl.affine_range(N // BLOCK_N):\n      # Load TILES_IN_BLOCK_N columns from rhs by TILES_K rows from rhs\n      rhs_tiles = []\n      for bn in nl.affine_range(TILES_IN_BLOCK_N):\n        # Inner tile array.\n        rhs_tiles_internal = []\n        for k in nl.affine_range(K // TILE_K):\n          # Allocate space in SBUF for the tile (uninitialized)\n          rhs_tile = nl.ndarray(shape=(TILE_K, TILE_N),\n                                dtype=rhs.dtype,\n                                buffer=nl.sbuf)\n          # Copy the tile from HBM to SBUF\n          nisa.dma_copy(dst=rhs_tile,\n                        src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                                (n * TILES_IN_BLOCK_N + bn) *\n                                TILE_N:((n * TILES_IN_BLOCK_N + bn) + 1) *\n                                TILE_N])\n          # Append the tile to the inner list of tiles.\n          rhs_tiles_internal.append(rhs_tile)\n        # Append the inner list of tiles into the outer list of tiles.\n        rhs_tiles.append(rhs_tiles_internal)\n\n      for bm in nl.affine_range(TILES_IN_BLOCK_M):\n        for bn in nl.affine_range(TILES_IN_BLOCK_N):\n          # Allocate a tensor in PSUM\n          result_tile = 
nl.ndarray(shape=(TILE_M, TILE_N),\n                                   dtype=nl.float32,\n                                   buffer=nl.psum)\n          for k in nl.affine_range(K // TILE_K):\n            # Accumulate partial-sums into PSUM\n            nisa.nc_matmul(dst=result_tile,\n                           stationary=lhsT_tiles[bm][k],\n                           moving=rhs_tiles[bn][k])\n  \n          # Copy the result from PSUM back to SBUF, and cast to expected\n          # output data-type\n          result_tmp = nl.ndarray(shape=result_tile.shape,\n                                  dtype=result.dtype,\n                                  buffer=nl.sbuf)\n          nisa.tensor_copy(dst=result_tmp, src=result_tile)\n\n          # Copy the result from SBUF to HBM.\n          nisa.dma_copy(dst=result[(m * TILES_IN_BLOCK_M + bm) *\n                                   TILE_M:((m * TILES_IN_BLOCK_M + bm) + 1) *\n                                   TILE_M,\n                                   (n * TILES_IN_BLOCK_N + bn) *\n                                   TILE_N:((n * TILES_IN_BLOCK_N + bn) + 1) *\n                                   TILE_N],\n                        src=result_tmp)\n\n  return result\n  # NKI_EXAMPLE_20_END\n\n\n# NKI_EXAMPLE_21_BEGIN\n@nki.jit\ndef nki_matmul_fully_optimized_(\n    lhsT,\n    rhs,\n    # Meta-parameters\n    TILES_IN_BLOCK_M=16,\n    TILES_IN_BLOCK_N=2,\n    TILES_IN_BLOCK_K=8,\n):\n  \"\"\"NKI kernel to compute a large matrix multiplication efficiently by\n     blocking all dimensions and doing layout optimization.\n\n  Args:\n      lhsT: an input tensor of shape [K,M], where K is a multiple of 128 *\n        TILES_IN_BLOCK_K and M is a multiple of 128 * TILES_IN_BLOCK_M.  It is the\n        left-hand-side argument of the matrix multiplication, delivered transposed\n        for optimal performance.\n      rhs: an input tensor of shape [K,N],  where K is a multiple of 128 *\n        TILES_IN_BLOCK_K and N is a multiple of 512 * TILES_IN_BLOCK_N.  
It is\n        the right-hand-side argument of the matrix multiplication.\n      TILES_IN_BLOCK_*: meta parameters to control blocking dimensions\n  Returns:\n      result: the resulting output tensor of shape [M,N]\n  \"\"\"\n\n  # Verify that the lhsT and rhs have the same contraction dimension.\n  K, M = lhsT.shape\n  K_, N = rhs.shape\n  assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n  # Lookup the device matrix multiply dimensions.\n  TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n  TILE_K = nl.tile_size.pmax  # 128\n  TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n  # Compute the block dimensions.\n  BLOCK_M = TILE_M * TILES_IN_BLOCK_M\n  BLOCK_N = TILE_N * TILES_IN_BLOCK_N\n  BLOCK_K = TILE_K * TILES_IN_BLOCK_K\n\n  # Verify the size is a multiple of block size\n  assert M % BLOCK_M == 0, \\\n    f\"Expected M {M} to be divisible by {BLOCK_M} when there are {TILES_IN_BLOCK_M}\"\n  assert N % BLOCK_N == 0, \\\n    f\"Expected N {N} to be divisible by {BLOCK_N} when there are {TILES_IN_BLOCK_N}\"\n  assert K % BLOCK_K == 0, \\\n    f\"Expected K {K} to be divisible by {BLOCK_K} when there are {TILES_IN_BLOCK_K}\"\n\n  # Create a space for the result in HBM (not initialized)\n  result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n  # Compute the number of blocks in each dimension\n  NUM_BLOCK_M = M // BLOCK_M\n  NUM_BLOCK_N = N // BLOCK_N\n  NUM_BLOCK_K = K // BLOCK_K\n\n  # Blocking N dimension (the RHS free dimension)\n  for n in nl.affine_range(NUM_BLOCK_N):\n    n_start = n * BLOCK_N\n    n_end = n_start + BLOCK_N\n\n    # Allocate and initialize result matrix N-block to 0.0.\n    #\n    # Each result M-tile stores its N-block contiguous on the free-dim\n    # with shape (TILE_M, TILES_IN_BLOCK_N, TILE_N). 
This layout allows\n    # reshaping to (TILE_M, BLOCK_N) for SBUF->HBM DMA to operate on a\n    # large payload, enabling good DMA efficiency.\n    #\n    # We split the N-block into individual M-tiles so the compiler can\n    # pipeline memset(0), matmul, tensor_tensor, and SBUF->HBM DMA\n    # on M-tile granularity.\n    result_m_tiles = []\n    for m in nl.affine_range(NUM_BLOCK_M):\n      for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n        result_m_tile = nl.ndarray(\n          shape=(TILE_M, TILES_IN_BLOCK_N, TILE_N),\n          dtype=result.dtype,\n          buffer=nl.sbuf,\n        )\n        nisa.memset(dst=result_m_tile, value=0.0)\n        result_m_tiles.append(result_m_tile)\n\n    # Blocking K dimension (the contraction dimension)\n    for k in nl.sequential_range(NUM_BLOCK_K):\n      k_block_tile_start = k * TILES_IN_BLOCK_K\n\n      # Load tiles from RHS\n      # Load tiles one N-block at a time for good DMA efficiency.\n      rhs_tiles = nl.ndarray(\n        shape=(TILE_K, TILES_IN_BLOCK_K, BLOCK_N),\n        dtype=rhs.dtype,\n        buffer=nl.sbuf,\n      )\n      for k_tile in range(TILES_IN_BLOCK_K):\n        k_tile_start = (k_block_tile_start + k_tile) * TILE_K\n        k_tile_end = k_tile_start + TILE_K\n        nisa.dma_copy(\n          dst=rhs_tiles[0:TILE_K, k_tile, 0:BLOCK_N],\n          src=rhs[k_tile_start:k_tile_end, n_start:n_end],\n        )\n\n      # Blocking M dimension (the LHS free dimension)\n      for m in nl.affine_range(NUM_BLOCK_M):\n        # Loading tiles from lhsT\n        # Load tiles one M-block at a time for good DMA efficiency.\n        lhsT_tiles = nl.ndarray(\n          shape=(TILE_K, TILES_IN_BLOCK_K, BLOCK_M),\n          dtype=lhsT.dtype,\n          buffer=nl.sbuf,\n        )\n        m_start = m * BLOCK_M\n        m_end = m_start + BLOCK_M\n        for k_tile in nl.affine_range(TILES_IN_BLOCK_K):\n          k_tile_start = (k_block_tile_start + k_tile) * TILE_K\n          k_tile_end = k_tile_start + TILE_K\n          nisa.dma_copy(\n            dst=lhsT_tiles[0:TILE_K, k_tile, 0:BLOCK_M],\n            src=lhsT[k_tile_start:k_tile_end, m_start:m_end],\n          )\n\n        # Do matmul with all tiles in the blocks\n        m_block_tile_start = m * TILES_IN_BLOCK_M\n        for n_tile in nl.affine_range(TILES_IN_BLOCK_N):\n          for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n            result_tile = nl.ndarray(\n              shape=(TILE_M, TILE_N), dtype=nl.float32, buffer=nl.psum\n            )\n            for k_tile in nl.affine_range(TILES_IN_BLOCK_K):\n              m_tile_start = m_tile * TILE_M\n              m_tile_end = m_tile_start + TILE_M\n              n_tile_start = n_tile * TILE_N\n              n_tile_end = n_tile_start + TILE_N\n              nisa.nc_matmul(\n                dst=result_tile,\n                stationary=lhsT_tiles[0:TILE_K, k_tile, m_tile_start:m_tile_end],\n                moving=rhs_tiles[0:TILE_K, k_tile, n_tile_start:n_tile_end],\n              )\n\n            # Evict from PSUM to SBUF while accumulating into result M-tile.\n            m_tile_idx = m_block_tile_start + m_tile\n            result_m_tile = result_m_tiles[m_tile_idx]\n            nisa.tensor_tensor(\n              dst=result_m_tile[0:TILE_M, n_tile, 0:TILE_N],\n              data1=result_m_tile[0:TILE_M, n_tile, 0:TILE_N],\n              data2=result_tile,\n              op=nl.add,\n            )\n\n    # Evict the result M-tiles from SBUF to HBM.\n    # Copy on N-blocks granularity for good DMA efficiency.\n    for m in 
nl.affine_range(NUM_BLOCK_M):\n      m_block_tile_start = m * TILES_IN_BLOCK_M\n      for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n        m_tile_idx = m_block_tile_start + m_tile\n        result_m_tile = result_m_tiles[m_tile_idx]\n        result_m_tile_block = result_m_tile.reshape((TILE_M, BLOCK_N))\n\n        m_tile_start = m_tile_idx * TILE_M\n        m_tile_end = m_tile_start + TILE_M\n        nisa.dma_copy(\n          dst=result[m_tile_start:m_tile_end, n_start:n_end],\n          src=result_m_tile_block[0:TILE_M, 0:BLOCK_N],\n        )\n\n  return result\n# NKI_EXAMPLE_21_END\n"
  },
  {
    "path": "nki/examples/matrix_multiplication/matrix_multiplication_torch.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nPyTorch implementation for matrix multiplication NKI tutorial.\n\n\"\"\"\n\nimport torch\nimport torch_xla\n\nfrom matrix_multiplication_nki_kernels import nki_matmul_basic_, nki_matmul_tiled_, nki_matmul_hoist_load_, nki_matmul_block_free_dimension_, nki_matmul_fully_optimized_\n\nif __name__ == \"__main__\":\n\n  # NKI_EXAMPLE_17_BEGIN\n  device = torch_xla.device()\n  cpu = torch.device('cpu')\n\n  # Test the small workload with basic kernel\n  lhs_small = torch.rand((64, 128), dtype=torch.bfloat16, device=device)\n  rhs_small = torch.rand((128, 512), dtype=torch.bfloat16, device=device)\n\n  # Run NKI kernel\n  output_small = nki_matmul_basic_(lhs_small.T, rhs_small)\n\n  # Run torch reference\n  output_small_torch = torch.matmul(lhs_small, rhs_small)\n\n  # Compare results\n  print(\"Checking correctness of nki_matmul_basic\")\n  if torch.allclose(output_small_torch, output_small, atol=1e-4, rtol=1e-2):\n    print(\"NKI and Torch match\")\n  else:\n    print(\"NKI and Torch differ\")\n    # NKI_EXAMPLE_17_END\n\n  # NKI_EXAMPLE_22_BEGIN\n  # Test the large workload with tiled kernels\n  lhs = torch.rand((4096, 1024), dtype=torch.bfloat16, device=device)\n  rhs = torch.rand((1024, 2048), dtype=torch.bfloat16, device=device)\n\n  # Run torch reference\n  output_torch = torch.matmul(lhs, rhs).to(device=cpu)\n\n  def check_match(nki_func):\n    output = nki_func(lhs.T, rhs)\n    output_nki = output.to(device=cpu)\n    if torch.allclose(output_torch, output_nki, atol=1e-4, rtol=1e-2):\n      print(\"NKI and Torch match\")\n    else:\n      print(\"NKI and Torch differ\")\n\n  print(\"Checking correctness of nki_matmul_tiled\")\n  check_match(nki_matmul_tiled_)\n\n  print(\"Checking correctness of nki_matmul_hoist_load\")\n  check_match(nki_matmul_hoist_load_)\n\n  print(\"Checking correctness of nki_matmul_block_free_dimension\")\n  check_match(nki_matmul_block_free_dimension_)\n\n  print(\"Checking correctness of nki_matmul_fully_optimized\")\n  check_match(nki_matmul_fully_optimized_)\n  # NKI_EXAMPLE_22_END\n"
  },
  {
    "path": "nki/examples/simulate/nki_simulate_example.py",
    "content": "\"\"\"Quick Start example for nki.simulate documentation.\"\"\"\n\nimport nki\nimport nki.language as nl\nimport nki.isa as nisa\nimport numpy as np\n\n\n# NKI_EXAMPLE_SIMULATE_BEGIN\n@nki.jit\ndef add_kernel(a_ptr, b_ptr):\n    # Load tiles from HBM into SBUF\n    a = nl.load(a_ptr)\n    b = nl.load(b_ptr)\n    # Element-wise add\n    result = nl.add(a, b)\n    # Store result back to HBM\n    out = nl.ndarray(a_ptr.shape, dtype=a_ptr.dtype, buffer=nl.shared_hbm)\n    nl.store(out, value=result)\n    return out\n# NKI_EXAMPLE_SIMULATE_END\n\n\n# NKI_EXAMPLE_SIMULATE_RUN_BEGIN\n# Run on the CPU simulator\nresult = nki.simulate(add_kernel)(a, b)\n\n# Verify correctness\nnp.testing.assert_allclose(result, a + b, rtol=1e-5)\n# NKI_EXAMPLE_SIMULATE_RUN_END\n"
  },
  {
    "path": "nki/examples/tensor_addition/tensor_addition_nki_kernels.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\nNKI implementation for tensor addition NKI tutorial.\n\n\"\"\"\n# NKI_EXAMPLE_27_BEGIN\nimport nki as nki\nimport nki.language as nl\nimport nki.isa as nisa\nimport os\n\nos.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn1\"\n\n@nki.jit\ndef nki_tensor_add(a_input, b_input):\n  \"\"\"NKI kernel to compute element-wise addition of two input tensors\n\n  This kernel assumes strict input/output sizes can be uniformly tiled to [128,512]\n\n  Args:\n      a_input: a first input tensor\n      b_input: a second input tensor\n\n  Returns:\n      c_output: an output tensor\n  \"\"\"\n  # Create output tensor shared between all SPMD instances as \n  # result tensor (uninitialized)\n  c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)\n\n  # Extract the dimensions for the a_input shape.\n  M, N = a_input.shape\n\n  # Set the tile dimensions, while the TILE_N is not, strictly speaking, limited to \n  # 512 for the additiona operation, we stick with this size for simplicity.\n  TILE_M = 128\n  TILE_N = 512\n\n  # Check the input sizes match and match the tilable constraint.\n  assert a_input.shape == b_input.shape, \\\n    f\"Expected shaps {a_input.shape} and {b_input.shape} to match\"\n  assert a_input.dtype == b_input.dtype, \\\n    f\"Expected data types {a_input.dtype} and {b_input.dtype} to match\"\n  assert M % TILE_M == 0, \\\n    f\"Expected partition dimention ({M}) to be divisble by {TILE_M}\"\n  assert N % TILE_N == 0, \\\n    f\"Expected partition dimention ({N}) to be divisble by {TILE_N}\"\n\n  # Lop over each tile, load the tile, do the addition, and save it back to HBM.\n  for m in nl.affine_range(M // TILE_M):\n    for n in nl.affine_range(N // TILE_N):\n      # Allocte space for the a_tile and b_tile in sbuf (uninitialized)\n      a_tile = nl.ndarray(shape=(TILE_M, TILE_N), dtype=a_input.dtype, buffer=nl.sbuf)\n      b_tile = nl.ndarray(shape=(TILE_M, TILE_N), dtype=b_input.dtype, buffer=nl.sbuf)\n\n      # Load the a_tile and b_tile from HBM into SBUF.\n      nisa.dma_copy(dst=a_tile,\n                    src=a_input[m * TILE_M:(m + 1) * TILE_M,\n                                n * TILE_N:(n + 1) * TILE_N])\n      nisa.dma_copy(dst=b_tile,\n                    src=b_input[m * TILE_M:(m + 1) * TILE_M,\n                                n * TILE_N:(n + 1) * TILE_N])\n\n      # Allocate space for the c_tile in sbuf.\n      c_tile = nl.ndarray(shape=(TILE_M, TILE_N), dtype=a_input.dtype, buffer=nl.sbuf)\n\n      # Perform the addition using the element-wise tensor_tensor instruction.\n      nisa.tensor_tensor(dst=c_tile, data1=a_tile, data2=b_tile, op=nl.add)\n\n      # Copy the result to the output tensor.\n      nisa.dma_copy(dst=c_output[m * TILE_M:(m + 1) * TILE_M,\n                                 n * TILE_N:(n + 1) * TILE_N],\n                    src=c_tile)\n\n  # Transfer the ownership of `c_output` to the caller\n  return c_output\n  # NKI_EXAMPLE_27_END\n"
  },
  {
    "path": "nki/examples/transpose2d/transpose2d_jax.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nJAX implementation for transpose2d NKI tutorial.\n\n\"\"\"\n\n# NKI_EXAMPLE_36_BEGIN\nimport jax\nimport jax.numpy as jnp\n# NKI_EXAMPLE_36_END\n\nfrom transpose2d_nki_kernels import tensor_transpose2D_kernel_\n\n# NKI_EXAMPLE_36_BEGIN\nif __name__ == \"__main__\":\n  P, X, Y = 5, 37, 44\n  a = jax.random.uniform(jax.random.PRNGKey(42), (P, X * Y))\n  a_t_nki = tensor_transpose2D_kernel_(a, shape2D=(X, Y))\n\n  a_t_jax = jnp.transpose(a.reshape(P, X, Y), axes=(0, 2, 1)).reshape(P, X * Y)\n  print(a, a_t_nki, a_t_jax)\n\n  allclose = jnp.allclose(a_t_jax, a_t_nki)\n  if allclose:\n    print(\"NKI and JAX match\")\n  else:\n    print(\"NKI and JAX differ\")\n\n  assert allclose\n# NKI_EXAMPLE_36_END\n"
  },
  {
    "path": "nki/examples/transpose2d/transpose2d_nki_kernels.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nNKI baremetal implementation for transpose2d NKI tutorial.\n\"\"\"\n\nimport numpy as np\n# NKI_EXAMPLE_33_BEGIN\nimport nki\nimport nki.language as nl\nimport nki.isa as nisa\n\n\n@nki.jit\ndef tensor_transpose2D_kernel_(in_tensor, shape2D):\n  \"\"\"\n  NKI kernel to reorder the elements on axis[1] of the input tensor.\n\n  Every row of the input tensor is a flattened row-major 2D matrix.\n  The shape2D argument defines the dimensions of the flattened matrices (#rows,#cols).\n  Our goal in this kernel is to transpose these flattened 2D matrices, i.e. make them (#cols,#rows).\n\n  Example:\n      in_tensor = [a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3]\n      shape2D = (3,4)\n  this means that in_tensor has 3 rows and 4 columns, i.e. can be represented as:\n      [a0,a1,a2,a3]\n      [b0,b1,b2,b3]\n      [c0,c1,c2,c3]\n  after transpose, we expect to get:\n      [a0,b0,c0]\n      [a1,b1,c1]\n      [a2,b2,c2]\n      [a3,b3,c3]\n  Thus, out_tensor is expected to be [a0,b0,c0,a1,b1,c1,a2,b2,c2,a3,b3,c3]\n\n  Args:\n    in_tensor: an input tensor\n    shape2D: tuple representing the dimensions to be transposed: (#rows, #cols)\n  \"\"\"\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # Gather input shapes\n  sz_p, _ = in_tensor.shape\n\n  # Load input data from external memory to on-chip memory\n  in_tile = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.sbuf)\n  nisa.dma_copy(dst=in_tile, src=in_tensor)\n\n  # Performing f1/f2 transpose\n  # ==========================\n  # The desired transpose pattern is provided as an input:\n  sz_f1, sz_f2 = shape2D\n\n  # Perform the transposition via element-wise SBUF-to-SBUF copies\n  # with index arithmetic to scatter elements into transposed positions.\n  # RHS traverses an F1 x F2 matrix in row major order\n  # LHS traverses an F2 x F1 (transposed) matrix in row major order\n  out_tile = nl.ndarray(shape=(sz_p, sz_f2*sz_f1), dtype=in_tensor.dtype,\n                        buffer=nl.sbuf)\n  for i_f1 in nl.affine_range(sz_f1):\n    for i_f2 in nl.affine_range(sz_f2):\n      nisa.tensor_copy(dst=out_tile[:, nl.ds(i_f2*sz_f1+i_f1, 1)],\n                       src=in_tile[:, nl.ds(i_f1*sz_f2+i_f2, 1)])\n\n  # Finally, we store out_tile to external memory\n  nisa.dma_copy(dst=out_tensor, src=out_tile)\n\n  return out_tensor\n  # NKI_EXAMPLE_33_END\n\n\nif __name__ == \"__main__\":\n  P, X, Y = 5, 3, 4\n  a = np.arange(P*X*Y, dtype=np.int8).reshape((P, X*Y))\n\n  a_t_nki = tensor_transpose2D_kernel_(a, (X, Y))\n\n  a_t_np = np.transpose(a.reshape(P, X, Y), (0, 2, 1)).reshape(P, X * Y)\n\n  print(a, a_t_nki, a_t_np)\n\n  allclose = np.allclose(a_t_np, a_t_nki)\n  if allclose:\n    print(\"NKI and NumPy match\")\n  else:\n    print(\"NKI and NumPy differ\")\n\n  assert allclose\n"
  },
  {
    "path": "nki/examples/transpose2d/transpose2d_torch.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\nPyTorch implementation for transpose2d NKI tutorial.\n\"\"\"\n\n# NKI_EXAMPLE_34_BEGIN\nimport torch\nimport torch_xla\n# NKI_EXAMPLE_34_END\n\nfrom transpose2d_nki_kernels import tensor_transpose2D_kernel_\n\n\n# NKI_EXAMPLE_34_BEGIN\nif __name__ == \"__main__\":\n  device = torch_xla.device()\n\n  P, X, Y = 5, 3, 4\n  a = torch.arange(P*X*Y, dtype=torch.int8).reshape((P, X*Y)).to(device=device)\n  a_t_nki = torch.zeros((P, Y*X), dtype=torch.int8).to(device=device)\n\n  a_t_nki = tensor_transpose2D_kernel_(a, (X, Y))\n\n  a_cpu = torch.arange(P*X*Y, dtype=torch.int8).reshape((P, X*Y))\n  a_t_torch = torch.transpose(a_cpu.reshape(P, X, Y), 1, 2).reshape(P, X * Y).to(device=device)\n\n  print(a, a_t_nki, a_t_torch)\n\n  allclose = torch.allclose(a_t_torch, a_t_nki)\n  if allclose:\n    print(\"NKI and PyTorch match\")\n  else:\n    print(\"NKI and PyTorch differ\")\n\n  assert allclose\n  # NKI_EXAMPLE_34_END\n"
  },
  {
    "path": "nki/get-started/about/data-representation-overview.rst",
    "content": ".. meta::\n   :description: Overview of Data Representations in NKI\n   :date_updated: 12/02/2025\n\n.. _nki-about-data:\n\n==========================\nData Representation in NKI\n==========================\n\nThis topic covers Data Representation and how it applies to developing with the AWS Neuron SDK.\nThis overview will describe how data appears to the NKI programmer, and how this data is organized on the NeuronDevice.\n\nRepresenting data in NKI\n------------------------\n\nNKI represents data in NeuronCore's memory hierarchy with built-in ``tensor`` type.\nA ``tensor`` is a multi-dimensional array which contains elements with\nthe same data type, or \"dtype\".\n\nProgrammers can pass ``tensor`` values in and out of NKI kernels, and declare or initialize ``tensor`` values in any memory within the NeuronDevice\n(PSUM, SBUF, HBM) using APIs such as :doc:`nki.language.ndarray </nki/api/generated/nki.language.ndarray>` and :doc:`nki.language.zeros </nki/api/generated/nki.language.zeros>`.\n\nA ``tensor`` value has a name, a shape that describes the number and size of\neach of its dimensions, an element data type (or \"dtype\"), and a description of\nthe physical location of the underlying data on the NeuronCore. For example, a\nmatrix of 16-bit floating point numbers may have a shape of ``(128,64)``\nindicating that there are 128 rows and 64 columns of numbers, and a dtype of\n\"bfloat16\" describing the floating format.\n\nThe physical location of a ``tensor`` consists of a memory (HBM, SBUF, or\nPSUM), and an offset and size for the underlying data. In the case of HBM\ntensors, there is only one offset and size. However, for SBUF and PSUM tensors\nthere are two offsets and two sizes because those memories are two-dimensional.\nThe two offsets and sizes describe a rectangle in the underlying memory within\nwhich the tensor data will live. For two-dimensional memories, the first\ndimension is called the \"partition dimension\" and corresponds to the partitions\nof the underlying memory. Using our example from above, if our 128x64 element\ntensor was resident on the SBUF, then the partition offset and size could be 0\nand 128 indicating that each row of the matrix corresponds to one partition of\nthe SBUF. The second offset and size could be, for instance, 1024 and 128,\nindicating that each matrix row start 1024 bytes from the beginning of each\npartition, and consumes 128 bytes (2 bytes for each 16-bit float), within each\npartition.\n\nOften, NKI programmers will not need to worry about the physical location of\ntensors. When using high-level APIs such as\n:doc:`nki.language.ndarray </nki/api/generated/nki.language.ndarray>`,\nthe physical location is assigned automatically by the NKI compiler. However,\nmore advanced kernels may directly control the relative physical locations of\ntensors using the direct allocation APIs.\n\nInput and output tensors from ML frameworks to NKI kernels will be tensors with\nunderlying memory type of ``hbm``. These tensors are placed in the HBM memory\nprior to calling the NKI kernel. Intermediate tensors can be allocated using the\ntensor creation APIs, for instance:\n\n.. code-block::\n\n   # Allocate 3D tensor on the SBUF\n   x = nl.ndarray((128, 32, 512), dtype=nl.float32, buffer=nl.sbuf)\n\nThe above code creates a new 3D tensor on the SBUF memory with shape 128x32x512, and with\nan element type of 32-bit floats. 
The physical location of this tensor will\nbe assigned by the NKI compiler, and the total amount of memory used will be:\n``8,388,608 = 128 * 32 * 512 * 4``\n"
  },
  {
    "path": "nki/get-started/about/index.rst",
    "content": ".. meta::\n   :description: Learn about Neuron Kernel Interface (NKI) and core concepts essential for working with it.\n   :keywords: NKI, AWS Neuron, Core Concepts, Programming Model, Architecture\n   :date-modified: 12/01/2025\n\n.. _nki_about_home:\n\nAbout Neuron Kernel Interface (NKI)\n===================================\n\nThis section covers core concepts Neuron Kernel Interface (NKI) within the AWS Neuron SDK. Whether you're developing custom kernels or optimizing machine learning workloads, this documentation will help you leverage the full capabilities of AWS Neuron accelerators.\n\nIntroducing NKI: Complete Kernel Development Solution \n-------------------------------------------------------\n\nNeuron Kernel Interface (NKI) is an open source tool for developing kernels for Trainium hardware. It has three main parts: \n\n* The first part is the NKI Programming Interface, which offers two APIs: ``nki.lang`` for high-level tile programming (similar to numpy and Triton), and ``nki.isa`` for direct access to hardware instructions.\n\n* The second part is the NKI Compiler, built on MLIR, which turns NKI kernel code into optimized hardware instructions. It keeps the execution order and memory allocation that developers specify. \n\n* The third part is the NKI Library (``NKI-Lib``), which provides ready-to-use optimized kernels that developers can use directly or learn from.\n\nUsing MLIR enables NKI integration with the LLVM ecosystem and compiler research community. NKI's open-source code lets everyone see how the compilation works, from the Python code to the final hardware instructions. Researchers can try new compiler techniques, framework developers can learn how kernels work with their code, and the community can improve both the compiler and kernel library. If you want to start using NKI, you can find tutorials available at https://github.com/aws-neuron/nki-samples.\n\nFor more details on NKI and Neuron open source GitHub repos, see :doc:`/about-neuron/oss/index`.\n\nNKI and Neuron Hardware \n------------------------\n\nBefore learning about NKI, it's important to understand the hardware where NKI kernels run. NKI is made specifically for AWS Trainium, so let's look at the architecture your NKI code will use.\n\n.. image:: /nki/img/overviews/about-nki-1.png\n\nTrainium chips are AI chips built by AWS for AI training and inference. They are highly optimized for performance, power efficiency, and cost efficiency for a broad range of AI/ML workloads. Trainium uses a small number of powerful cores (called NeuronCores), each with four specialized engines that work together:\n\n* **Tensor Engine**: Handles matrix operations like matrix-multiplications and convolutions\n* **Vector Engine**: Processes multi-input vector operations and reductions (e.g. Normalization, ResidualAdd)\n* **Scalar Engine**: Performs element-wise functions, including non-linearities (e.g. GELU, Square)\n* **GpSimd Engine**: Deeply embedded general-purpose programmable processors for custom operations\n\nTrainium devices also have dedicated **Collective Communication Engines** (**CC-Cores**) that move data between NeuronCores and between Trainium chips. These engines handle operations like AllReduce, ReduceScatter, AllGather and All2all, while the core computation continues processing in parallel, allowing more efficient scaling to multiple cores and chips.\n\nThe memory system has three levels:\n\n* **HBM** (High Bandwidth Memory): Provides device working memory.  
\n* **SBUF** (State Buffer): On-chip scratchpad SRAM that serves as a software-managed cache. Holds active tensors locally and acts as a landing buffer for DMA transfers.  \n* **PSUM** (Partial Sum Buffer): Stores and accumulates matrix operations results.  \n\nUnlike traditional CPUs and GPUs which adopt hardware managed caches, Trainium software (NKI and the Neuron Graph Compiler) explicitly manages the allocation and data movment within the entire memory hierarchy. This architecture allows developers to optimize hardware usage directly, resulting in more consistent and predictable performance. NKI exposes all NISA primitives needed to manage the memory hierarchy.\n\n.. _nki_arch_guides:\n\nNKI and Neuron Architecture\n----------------------------\n\nNKI currently supports the following NeuronDevice generations:\n\n* Trainium/Inferentia2, available on AWS ``trn1``, ``trn1n`` and ``inf2`` instances\n* Trainium2, available on AWS ``trn2`` instances and UltraServers\n* Trainium3, available on AWS ``trn3`` instances and UltraServers\n\nThe documents below provide an architecture deep dive of each NeuronDevice generation,\nwith a focus on areas that NKI developers can directly control through kernel implementation.\n\n* :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` serves as a foundational architecture guide for understanding the basics of any NeuronDevice generation.\n* :doc:`Trainium2 Architecture Guide </nki/guides/architecture/trainium2_arch>` walks through the architecture enhancements when compared to the previous generation.\n* :doc:`Trainium3 Architecture Guide </nki/guides/architecture/trainium3_arch>` covers the enhancements for the next-generation Trainium ML accelerators.\n  \nNeuron recommends new NKI developers start with :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` before exploring newer NeuronDevice architecture.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Trainium/Inferentia2 Architecture Guide\n      :link: trainium_inferentia2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Foundational architecture guide for understanding NeuronDevice basics.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Trainium2 Architecture Guide\n      :link: trainium2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Architecture enhancements and improvements in the Trainium2 generation.\n\n   .. grid-item-card:: Trainium3 Architecture Guide\n      :link: trainium3_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Latest architecture features and capabilities in Trainium3 devices.\n\nNKI APIs\n-------------\n\nNKI provides two sets of APIs:\n\n1. The higher-level ``nki.lang`` interface makes memory allocation, tensor indexing, and control of logical neuron core groups easier. Data scientists and ML engineers who know numpy and Triton will find this familiar.\n2. The lower-level ``nki.isa`` interface gives direct access to the Neuron Instruction Set Architecture (NISA). This lets operations map directly to hardware instructions with full control over instruction selection, scheduling, and allocation. 
This helps developers get the most out of the hardware for better performance, throughput, and latency.\n\nThese two APIs are designed to work together: ``nki.lang`` makes indexing and memory operations simpler, while ``nki.isa`` provides the hardware details needed for maximum efficiency.\n\nIn the next section, we provide a broad view of key concepts for NKI programming, starting with how tensors are allocated, how loop performance is controlled, and memory movement APIs.\n\nTensor management and indexing \n------------------------------\n\nThe ``nki.lang`` APIs provide tools for memory allocation, execution scheduling, tensor indexing, and tensor manipulation. The next two examples demonstrate memory allocation and scheduling APIs.\n\nFor memory allocation, developers can explicitly control tensor placement in the memory hierarchy. For example:\n\n.. code-block:: python\n\n    import nki.language as nl\n\n    # Allocate tensor of FP32 elements in SBUF (on-chip scratchpad memory)\n    # using ndarray call similar to numpy \n    # like numpy, nl supports ndarray(), zeros() and ones() functions\n    x_on_chip = nl.ndarray((128, 32, 512), dtype=nl.float32, buffer=nl.sbuf)\n\n    # Allocate tensor of FP16 elements in HBM (high-bandwidth memory, off-chip)\n    y_in_hbm = nl.ndarray((128, 32, 512), dtype=nl.float16, buffer=nl.shared_hbm)\n\nScheduling options for loops\n------------------------------\n\nLoops are a key part of tile and tensor programming. NKI offers three ways to write loops that control execution order and determine whether loops are optimized during compilation or depend on runtime values.\n\nLet's look at three types of loops, which serve as hints to the compiler. The compiler will always make sure your code works correctly, regardless of any optimizations it makes.\n\nSequential loop (default loops)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nLoops with sequential ranges might carry dependencies from the result of one iteration to the next. The NKI compiler does not try to re-order or parallelize these iterations, and runs them in order. When in doubt, Neuron recommends you start with sequential loops.\n\n.. code-block:: python\n\n    import nki.language as nl\n\n    # Sequential range - compiler will assume loop iteration n *might* depend on \n    # results from iterations n-1, n-2,...0, and will not try to unroll \n    # or parallelize the code execution\n    # when in doubt, developers should start with sequential_range()\n\n    for i in nl.sequential_range(8):\n        # Compiler will not re-order\n        result = process_tile(result_from_previous_loop)\n        result_from_previous_loop = result\n\nAffine loop \n^^^^^^^^^^^^\n\nAffine loops are a hint that developers can give to the compiler when they are confident there are no loop-carried dependencies between iterations. This allows the compiler to unroll the loop and re-order code across iterations to improve performance. \n\n.. code-block:: python\n\n    import nki.language as nl\n\n    # Affine range - allows compiler optimizations like pipelining and unrolling\n    for i in nl.affine_range(8):\n        # Compiler can reorder and optimize these iterations\n        process_tile(i)\n\nOn-device (Dynamic) loop \n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSometimes the number of loop iterations is not known at compile time; for example, it may depend on integer values that are generated at runtime.\n
In this case, the NKI compiler does not attempt to optimize across loop iterations.\n\n.. code-block:: python\n\n    import nki.language as nl\n    import nki.isa as nisa\n\n    # Dynamic range - runs on device at runtime, not compile-time\n    lower_bound = nisa.register_alloc(0)\n    upper_bound = nisa.register_alloc(10)\n    for i in nl.dynamic_range(lower_bound, upper_bound):\n        process_tensor(t[i])\n\nDirect Hardware Control with nki.isa\n--------------------------------------\n\nThe ``nki.isa`` APIs provide low-level operations for computation, data movement, dynamic control flow, and communication between cores. The examples below show compute operations, dynamic control flow, and collective communication APIs.\n\nMatrix operations execute on the Tensor Engine. For instance:\n\n.. code-block:: python\n\n    import nki.isa as nisa\n\n    # Matrix multiplication on Tensor Engine using nc_matmul\n    # nc stands for NeuronCore, and matmul is the instruction name\n    # stationary: [128, 128], moving: [128, 512], output: [128, 512]\n    # The input arguments must meet NISA requirements as defined \n    # in the Trainium architecture, such as data types, layout, tile sizes\n    # and buffer memory types (SBUF or PSUM)\n    # dst is explicitly defined as an instruction parameter\n    nisa.nc_matmul(stationary, moving, dst=output)\n\n    # Element-wise operations between two tensors\n    # in this specific example, x and y must have the same partition dimension size\n    # and the same number of elements per partition.\n    # Notice the destination (dst) is explicitly defined in the instruction parameters\n    # and op=nl.add defines the actual element-wise operation needed\n    nisa.tensor_tensor(dst=output, data1=x, data2=y, op=nl.add)\n\nDynamic control flow uses register-based operations to enable runtime control decisions on the device itself. For example:\n\n.. code-block:: python\n\n    import nki.isa as nisa\n    import nki.language as nl\n\n    # this tensor holds the value used to update the loop condition register\n    # memory allocation does NOT perform initialization\n    cond = nl.ndarray((1, 1), buffer=nl.shared_hbm, dtype=nl.int32)\n\n    # explicit initialization is required: initialize cond to zero\n    nisa.dma_copy(dst=cond, src=nl.zeros((1, 1), dtype=nl.int32))\n\n    # Allocate a scalar register for control flow\n    # initialize register to 1\n    reg = nisa.register_alloc(1)\n\n    # Dynamic while-loop with runtime condition\n    # while condition will check for non-zero integer in register as true condition\n    while reg:\n        # Perform some calculation on device, which updates tensor cond\n        # update loop condition from cond\n        nisa.register_load(reg, cond)  # Re-evaluate condition\n\n\nCollective communication primitives enable kernels to coordinate and exchange data across multiple NeuronCores. For example:\n\n.. code-block:: python\n\n    import nki.isa as nisa\n\n    # Synchronize all cores at a barrier point\n    nisa.barrier()\n\n    # Send and receive data between cores\n    nisa.sendrecv()\n\nThe ``nki.isa`` interface gives developers detailed control over AWS Trainium's hardware. This direct access lets them fine-tune how computations work, manage memory, and optimize when instructions run.\n
By controlling these elements precisely, developers can get the best performance from Trainium by creating custom versions of AI model parts like attention mechanisms, loss functions, and data preprocessing routines.\n\nNKI Open Source Compiler\n---------------------------\n\nThe NKI Compiler, built on MLIR, turns kernel source code into optimized NKI IR (Intermediate Representation). The Neuron Compiler Back-end then turns this NKI IR into NeuronISA instructions. When a framework model includes NKI source code, the framework calls the NKI Compiler to process these kernels separately. The NKI Compiler creates optimized NKI IR that gets added to the larger Neuron IR representing the complete model, which then goes to the Neuron Graph Compiler.\n\nThe NKI Compiler processes one kernel at a time, creating NKI intermediate representation (NKI IR). This IR, along with other kernels and compilation graphs, is used to create a Neuron Executable (NEFF). We've put the NKI Compiler code on GitHub so performance engineers, researchers, compiler developers, and MLIR enthusiasts can understand how the compilation works and contribute to research or development.\n\nThe diagram below shows how PyTorch or JAX models are turned into optimized NeuronISA instructions. When developers create a model with NKI kernels (marked with the @nki.jit decorator), the framework starts tracing the model through the Neuron Backend. During this process, when the framework finds NKI kernels, it calls the NKI Compiler to process them right away. The NKI Compiler creates optimized NKI IR that is saved and referenced by custom-call nodes in the Neuron IR. The framework continues building the complete Neuron IR, adding these custom-call nodes alongside regular model operations. When the Neuron IR is complete, the Graph Compiler processes the entire model, and the Neuron Compiler Back-end generates code for both standard operations and the NKI kernels by turning the referenced NKI IR into NeuronISA instructions.\n\n.. image:: /nki/img/overviews/about-nki-2.png\n\nIn PyTorch, the ``@nki_op`` decorator handles registration of the custom operation, enabling seamless integration into the framework's execution model.\n\nFor more information, see :doc:`the NKI Compiler documentation </nki/deep-dives/nki-compiler>`.\n\nNKI Library\n------------\n\nThe NKI Library (``NKI-Lib``) is a collection of open-source, pre-optimized, production-ready kernels for common operations. You can use these kernels directly in your PyTorch or JAX code as regular Python functions. The library has two main purposes:\n\n1. It gives you immediate performance improvements through optimized implementations\n2. It provides examples that show best practices for memory management, instruction scheduling, and hardware use\n\nDevelopers can use these kernels as they are or as starting points for creating custom optimizations for specific needs.\n\nFor more information, see :doc:`the NKI Library documentation </nki/library/index>`.\n\nWorking with NKI Kernels\n-------------------------\n\nIf you're already running models on Trainium or Inferentia, you're probably using NKI kernels without realizing it. The Neuron compiler automatically adds optimized NKI kernels for common operations during compilation. Many of these kernels are already part of the standard compilation process. When you use vLLM with the Neuron plug-in, popular models already include NKI kernels. Models in NeuronXDistributed Inference also regularly use NKI kernels for you. 
In many cases, you get the performance benefits of these kernels without changing any code.\n\nBeyond these automatic optimizations, developers who want more control can use NKI in two more ways. First, you can call existing kernels from the NKI Library directly in your PyTorch or JAX code. This needs only small code changes. You just import the kernel and call it where needed in your model. For example, if you need a faster attention mechanism or a special activation function, you can add the matching NKI Library kernel with just a few lines of code.\n\n.. code-block:: python\n\n    # Example: Authoring a custom NKI kernel and registering it in PyTorch\n\n    import torch\n    from torch_neuronx import nki_op, nki\n    import nki.language as nl\n\n    # Step 1: Define NKI kernel\n    @nki.jit\n    def my_kernel(in_ptr0, out_ptr):\n        ...  # kernel implementation elided\n\n    # Step 2: Register as PyTorch custom operator\n    @nki_op(\"mylib::my_op\", mutates_args={})\n    def my_op(x: torch.Tensor) -> torch.Tensor:\n        out = torch.empty_like(x)\n        my_kernel(x, out)\n        return out\n\n    # Use the registered operator in PyTorch code\n    x = torch.randn(128, device=\"neuron\")\n    y = my_op(x)\n\n\nSecond, developers can create custom kernels for operations that aren't in the library or need special optimizations. You can start from scratch using the ``nki.lang`` or ``nki.isa`` APIs, or you can modify existing kernels from the NKI Library as starting points.\n\nThese three approaches (automatic optimization, using library kernels, and creating custom kernels) are widely used across ML frameworks and libraries. Frameworks like PyTorch use NKI kernels through ATen operator dispatch for seamless integration. NxD Inference (NxDI), Optimum Neuron, and vLLM use all three approaches: they benefit from automatic compiler optimizations, directly call kernels from the NKI Library when appropriate, and create custom kernels for their specific needs.\n\nProfiling, Debugging, and Performance Optimization\n----------------------------------------------------\n\nNeuron Explorer helps you profile your NKI kernel by making it easier to capture and analyze performance data at both system and device levels. You can collect detailed system profiles that show:\n\n* Device utilization (how much each engine is used)\n* Memory consumption\n* Communication patterns between cores\n\nFor NKI kernels specifically, Neuron Explorer shows source-code level information, helping you find bottlenecks by connecting kernel code directly with device-level profiles. The tool works with familiar framework APIs in both PyTorch and JAX. You can view the results in several ways:\n\n* The Neuron Profiler UI\n* Perfetto integration\n* JSON export for custom analysis\n\nThis makes it easier than ever to optimize your NKI kernel performance.\n\nFor a more in-depth example of profiling a NKI kernel with Neuron Explorer, see :doc:`/nki/guides/use-neuron-profile` and the :doc:`Neuron Explorer documentation </tools/neuron-explorer/index>`.\n\nCore Concepts\n---------------\n\nFor details on specific NKI concepts, jump to one of these topics:\n\n.. grid:: 1 \n   :gutter: 3\n\n   .. grid-item-card:: Introduction to Direct Memory Access (DMA)\n      :link: nki-dma-overview\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Learn about DMA with NKI.\n
\n   .. grid-item-card:: Data Representation Overview\n      :link: data-representation-overview\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Understanding data types, layouts, and representation in NKI programming.\n\n   .. grid-item-card:: Indexing Overview\n      :link: indexing-overview\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Tensor indexing patterns and addressing schemes in NKI kernels.\n\n   .. grid-item-card:: Memory Hierarchy Overview\n      :link: memory-hierarchy-overview\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Memory levels, allocation strategies, and data movement in Neuron devices.\n\n   .. grid-item-card:: Tiling Overview\n      :link: tiling-overview\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Strategies for breaking down large computations into manageable tiles.\n\nUnderstanding the NKI Language\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section explores core language constructs, including loops, indexing, and control flow; explains the memory hierarchy and data representation; and covers tiling and scheduling concepts with examples, with links to deeper dives on optimization techniques such as allocation and scheduling.\n\nFor more about the NKI Language, see :doc:`/nki/get-started/nki-language-guide`. Otherwise, read up on the core programming concepts below!\n\nCore Programming Model\n^^^^^^^^^^^^^^^^^^^^^^^\n\nNKI uses a sequential programming model where operations run in the order they're written. However, the compiler may change the order of operations that don't depend on each other to make the code faster. This approach gives predictable execution while letting the hardware's multiple compute engines work in parallel behind the scenes.\n\nThere's an important difference between compile-time and runtime execution:\n\n* Most NKI code, including print statements, runs during compilation\n* Other statements, like ``nki.isa.*`` function calls, create actual runtime operations on the device\n\nFor example:\n\n\n.. code-block:: python\n\n    @nki.jit\n    def my_function(x, y):\n        print(f\"adding tensors of type {x.dtype} and shape {x.shape}\")  # Compile-time print\n        output = nki.language.ndarray(x.shape, dtype=x.dtype, buffer=nki.language.sbuf)\n        nki.isa.tensor_tensor(dst=output, data1=x, data2=y, op=nki.language.add)  # Runtime\n        return output\n\n\nThe print statement shows \"adding tensors of type float16 and shape (128, 512)\" during compilation, not when the code runs on the device. If you want to see output from the device itself, NKI provides a special ``device_print`` function.\n\n\nValue Types and Data Structures\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe NKI meta-programming language supports six basic value types:\n\n* None\n* Booleans\n* 32-bit integers\n* 32-bit floats\n* String literals\n* Tensors (references to on-device memory)\n\nIt also supports container types like tuples, lists, dictionaries with string keys, and simple user-defined classes. These containers work much like their Python equivalents:\n\n\n.. code-block:: python\n\n    l = [1, 2, 3]\n    l.append(4.1)\n    l.extend((\"Hello\", \"List\"))\n    size = len(l)\n\n    d = dict()\n    d['a'] = 1\n    for k, v in d.items():\n        print(k, v)\n\nTensor Management and Memory\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTensors are the most important type in NKI. They represent on-chip memory regions with metadata you can query, including dtype, shape, address, offset, pattern, and buffer.\n
The most commonly used fields are dtype and shape, which help with compatibility checking and iteration:\n\n\n.. code-block:: python\n\n    assert x.shape == y.shape, \"expecting tensors of the same shape\"\n    for i in range(t.shape[0]):  # Compile-time constant bounds\n        my_function(t[i])\n\n\nYou can create tensors using the simple nki.language.ndarray API or more advanced memory management techniques. The basic approach creates tensors with a specified shape, data type, and memory buffer:\n\n\n.. code-block:: python\n\n    t = nl.ndarray((128, 128), nl.float16, nl.sbuf)\n    u = t.reshape((128, 2, 64))  # Alternative view of same memory\n\n\n\nMemory Architecture and Indexing\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe SBUF memory uses a two-dimensional layout with partition and free dimensions. By convention, the first tensor dimension always maps to the partition dimension, while the remaining dimensions are arranged in the free dimension.\n\nTensor indexing supports integer indexing, slices (start:stop:step), and ellipsis (...) notation, just like NumPy:\n\n\n.. code-block:: python\n\n    t = nl.ndarray((128, 2, 64), nl.float16, nl.sbuf)  # a 3-D tile\n\n    u = t[0, 0, 10]        # Single element\n    u = t[:, 0, :]         # Slice with defaults\n    u = t[0, ..., :]       # Using ellipsis\n    u = t[::2, :, ::2]     # Step indexing\n\n\nEach indexing operation creates a new tensor reference with hardware access patterns you can query:\n\n\n.. code-block:: python\n\n    u = t[0, ...]\n    print(u.offset)   # Hardware access pattern offset\n    print(u.pattern)  # Hardware access pattern\n\nControl Flow Constructs\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nNKI supports two types of control flow:\n\n1. Static control flow (evaluated at compile-time)\n2. Dynamic control flow (executed on the device)\n\nStatic control flow includes standard if statements, for loops, and while loops that are unrolled during compilation:\n\n\n.. code-block:: python\n\n    for i in range(len(inputs)):\n        if i % 2 == 0:\n            nki.isa.nc_transpose(dst=outputs[i], data=inputs[i])\n        else:\n            nki.isa.reciprocal(dst=outputs[i], data=inputs[i])\n\n\nThe compiler provides special range functions as performance hints: sequential_range(), static_range(), and affine_range(). These don't change how your code works, but they give the compiler hints about how to optimize it.\n\n**Dynamic control flow** runs on the Trainium device using register values and a special range function:\n\n.. code-block:: python\n\n    # Dynamic loop with static bounds\n    for i in nl.dynamic_range(10):\n        process_tensor(t[i])\n\n    # Dynamic loop with register-based bounds\n    count = nki.isa.register_alloc(count_tensor)\n    for i in nl.dynamic_range(count):\n        process_tensor(t[i])\n\n\nDynamic while loops use register conditions and four register management APIs: register_alloc(), register_move(), register_load(), and register_store().\n\n\nClass Support and Interoperability\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNKI provides basic support for user-defined classes, which must inherit from NKIObject. These classes work similarly to Python dataclasses and can be created with or without the @dataclass decorator:\n\n\n.. code-block:: python\n\n    @dataclass\n    class C(NKIObject):\n        x: int\n        y: bool = False\n\n        def toggle(self):\n            self.y = not self.y\n\n    c = C(1)\n    c.toggle()\n\n\nYou can create class instances in Python and pass them to NKI kernels, where they're translated using the object's dictionary. 
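\n\nFor example, an instance of the ``C`` class defined above could be passed into a kernel as a compile-time parameter. The sketch below is illustrative only: the kernel name, tensor names, and choice of operation are assumptions, and the ISA calls follow the ``dst``-style shown earlier in this guide.\n\n.. code-block:: python\n\n    import nki\n    import nki.isa as nisa\n    import nki.language as nl\n\n    @nki.jit\n    def toggle_kernel(in_tensor, cfg: C):\n        out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n        in_tile = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.sbuf)\n        out_tile = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.sbuf)\n\n        nisa.dma_copy(dst=in_tile, src=in_tensor)\n        # cfg.y is read at compile time while the kernel is traced\n        if cfg.y:\n            nisa.reciprocal(dst=out_tile, data=in_tile)\n        else:\n            nisa.dma_copy(dst=out_tile, src=in_tile)\n        nisa.dma_copy(dst=out_tensor, src=out_tile)\n        return out_tensor\n\n    cfg = C(1)\n    cfg.toggle()                        # cfg.y is now True\n    result = toggle_kernel(x_hbm, cfg)  # x_hbm: an HBM input tensor\n\n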
Check the language guide for more details.\n\n\nNKI Compiler Architecture and Development\n------------------------------------------\n\nThe NKI language gives kernel writers detailed control over Neuron hardware. By offering low-level APIs that match the hardware instructions, the compiler steps back and lets developers take control. This needs a separate compiler that processes the kernel code and works together with the Neuron Graph compiler to fit kernels into the overall model.\n\nThe NKI compiler runs when Python is tracing the code. When the interpreter finds a top-level function with the ``@nki.jit`` decorator, it calls the NKI compiler. The compiler reads the function, creates an Abstract Syntax Tree (AST) of the user's code, and makes a few low-level changes to:\n\n* Optimize the code\n* Allocate memory\n* Schedule instructions\n\nIt then sends the optimized code to the Neuron Graph compiler, which adds it to the overall model and creates the NEFF executable.\n\nThe diagram below shows the detailed compilation process inside the Neuron compilers and how they work together to create the final program that runs on Neuron hardware. The NKI Compiler first converts the kernel code into an AST representation for analysis. It then makes a few middle-end and back-end changes to the AST, improving resource allocation and instruction scheduling. This creates optimized NKI IR that gets added back into the overall model.\n\n.. image:: /nki/img/overviews/about-nki-3.png\n\n.. toctree::\n      :maxdepth: 1\n      :hidden:\n\n      Memory Hierarchy <memory-hierarchy-overview>\n      Data Representation <data-representation-overview>\n      Indexing <indexing-overview>\n      Tiling <tiling-overview>\n      Direct Memory Access <nki-dma-overview>\n      Logical Neuron Cores <lnc>\n"
  },
  {
    "path": "nki/get-started/about/indexing-overview.rst",
    "content": ".. meta::\n   :description: Overview of Indexing in NKI\n   :date_updated: 12/02/2025\n\n.. _nki-about-indexing:\n.. _nki-tensor-indexing:\n\n=======================\nTensor Indexing on NKI\n=======================\n\nThis topic covers basic tensor indexing and how it applies to developing with the AWS Neuron SDK. This overview describes basic indexing of tensors with several examples of how to use indexing in NKI kernels.\n\n.. _nki-basic-tensor-indexing:\n\nBasic Tensor Indexing\n^^^^^^^^^^^^^^^^^^^^^\n\nNKI supports basic indexing of tensors using integers as indexes. For example,\nwe can index a 3-dimensional tensor with a single integer to get get a *view*\nof a portion of the original tensor.\n\n.. code-block::\n\n   x = nl.ndarray((2, 2, 2), dtype=nl.float32, buffer=nl.hbm)\n\n   # `x[1]` return a view of x with shape of [2, 2]\n   # [[x[1, 0, 0], x[1, 0 ,1]], [x[1, 1, 0], x[1, 1 ,1]]]\n   assert x[1].shape == [2, 2]\n\nNKI also supports creating views from sub-ranges of the original tensor\ndimension. This is done with the standard Python **slicing** syntax. For\nexample:\n\n.. code-block::\n\n   x = nl.ndarray((2, 128, 1024), dtype=nl.float32, buffer=nl.hbm)\n\n   # `x[1, :, :]` is the same as `x[1]`\n   assert x[1, :, :].shape == [128, 1024]\n\n   # Get a smaller view of the third dimension\n   assert x[1, :, 0:512].shape == [128, 512]\n\n   # `x[:, 1, 0:2]` returns a view of x with shape of [2, 2]\n   # [[x[0, 1, 0], x[0, 1 ,1]], [x[1, 1, 0], x[1, 1 ,1]]]\n   assert x[:, 1, 0:2].shape == [2, 2]\n\nWhen indexing into tensors, NeuronCore offers much more flexible memory access\nin its on-chip SRAMs along the free dimension. You can use this to efficiently\nstride the SBUF/PSUM memories at high performance for all NKI APIs that access\non-chip memories. Note, however, this flexibility is not supported along the\npartition dimension. That being said, device memory (HBM) is always more\nperformant when accessed sequentially.\n\n.. _nki-advanced-tensor-indexing:\n.. _pm_sec_tile_indexing:\n\nTensor Indexing by Example\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this section, we share several use cases that benefit from advanced\nmemory access patterns and demonstrate how to implement them in NKI.\n\nCase #1 - Tensor split to even and odd columns\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHere we split an input tensor into two output tensors, where the first\noutput tensor gathers all the even columns from the input tensor,\nand the second output tensor gathers all the odd columns from the\ninput tensor. We assume the rows of the input tensors are mapped to SBUF\npartitions. Therefore, we are effectively gathering elements along\nthe free dimension of the input tensor. The figure below visualizes the input and output tensors.\n\n.. _nki-fig-pm-index-1:\n\n.. image:: /nki/img/pm-index-1.png\n   :align: center\n   :width: 60%\n\n*Tensor split to even and odd columns*\n\n.. nki_example:: /nki/examples/index-case-1.py\n   :language: python\n   :linenos:\n   :whole-file:\n\nThe main concept in this example is that we are using slices to access the even\nand odd columns of the input tensor. For the partition dimension, we use the\nslice expression `:`, which selects all of the rows of the input tensor. For\nthe free dimension, we use `0:sz_f:2` for the even columns. This slice says:\nstart at index `0`, take columns unto index `sz_f`, and increment by `2` at\neach step. 
The odd columns are similar, except we start at index `1`.\n\n\nCase #2 - Transpose tensor along the f axis\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn this example we transpose a tensor along two of its axes. Note,\nthere are two main types of transposition in NKI:\n\n1. Transpose between the partition-dimension axis and one of the free-dimension axes, which is achieved via the\n   :doc:`nki.isa.nc_transpose </nki/api/generated/nki.isa.nc_transpose>` API.\n2. Transpose between two free-dimension axes, which is achieved via a :doc:`nki.isa.dma_copy </nki/api/generated/nki.isa.dma_copy>` API,\n   with indexing manipulation in the transposed axes to re-arrange the data.\n\nIn this example, we'll focus on the second case: consider a\nthree-dimensional input tensor ``[P, F1, F2]``, where the ``P`` axis is mapped\nto the different SBUF partitions and the ``F1`` and ``F2`` axes are\nflattened and placed in each partition, with ``F1`` being the major\ndimension. Our goal in this example is to transpose the ``F1`` and\n``F2`` axes with a parallel dimension ``P``,\nwhich would re-arrange the data within each partition. The figure below illustrates the input and output tensor layouts.\n\n.. _nki-fig-index-2:\n\n.. image:: /nki/img/pm-index-2.png\n   :align: center\n   :width: 60%\n\n*Tensor F1:F2 Transpose*\n\n.. nki_example:: /nki/examples/transpose2d/transpose2d_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_33\n\nThe main concept introduced in this example is a 2D memory access\npattern per partition, via additional indices. We copy ``in_tile`` into\n``out_tile``, while traversing the memory in different access patterns\nbetween the source and destination, thus achieving the desired\ntransposition.\n\nYou may download the full runnable script from :ref:`Transpose2d tutorial <tutorial_transpose2d_code>`.\n\nCase #3 - 2D pooling operation\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nLastly, we examine a case of\ndimensionality reduction. We implement a 2D MaxPool operation, which\nis used in many vision neural networks. This operation takes\n``C x [H,W]`` matrices and reduces each matrix along the ``H`` and ``W``\naxes. To leverage free-dimension flexible indexing, we can map the ``C``\n(parallel) axis to the ``P`` dimension and ``H/W`` (contraction)\naxes to the ``F`` dimension.\nPerforming such a 2D pooling operation requires a 4D memory access\npattern in the ``F`` dimension, with reduction along two axes. The figure below illustrates the input and output tensor layouts.\n\n.. _nki-fig-index-3:\n\n.. image:: /nki/img/pm-index-3.png\n   :align: center\n   :width: 60%\n\n*2D-Pooling Operation (reducing on axes F2 and F4)*\n\n.. nki_example:: /nki/examples/index-case-3.py\n   :language: python\n   :linenos:\n   :whole-file:\n\n\n"
  },
  {
    "path": "nki/get-started/about/lnc.rst",
    "content": ".. meta::\n   :description: Overview of Neuron Logical Cores\n   :date_updated: 12/12/2025\n\n.. _nki-about-lnc:\n\nUsing Logical Neuron Cores (LNC)\n================================\n\nThis topic covers how to use multiple neuron cores by launching your NKI kernel\non multiple cores at the same time. This overview will cover how to launch\nkernels, and the basic methods for writing a kernel to run on multiple cores.\n\nLogical Neuron Cores (LNC)\n--------------------------\n\nThe Neuron SDK supports running NKI kernels on multiple logical cores. When\nlaunching a kernel, you can opt to run the kernel on 1 or 2 logical cores. If\nyou choose to run on 2 logical cores, at runtime, your kernel will be run on\ntwo physical cores (if available) that have shared HBM memory (see Trainium3\nArchitrecture <trainium3_arch> for more details on NeuronCores). These two\nversion can operate on different parts of the input data, increasing overall\nperformance of your kernel.\n\nNKI gives you a few mechanisms to for using Logical Neuron Cores (LNC). We will\nlook briefly at each of these, specifically we will describe:\n\n1. How to launch a kernel on multiple cores\n2. How to tell if a kernel is running on multiple cores\n3. How to tell which core a kernel is running on\n\nLaunching a kernel on multiple cores\n-----------\n\nTo launch a NKI kernel on multiple cores, you specify the number of cores to\nuse, in square brackets, when calling the kernel. For example, suppose we have\na kernel called `lnc_test`, and we want to launch this kernel on two cores.\n\n.. code-block::\n\n   # Launch lnc_test on 2 cores\n   lnc_test[2](input)\n\nThe bracket syntax must contain only one number, the number of cores to use.\nIf no brackets are given the number of cores defaults to 1. If the number is\ntoo large for the current architecture, then you will receive an error.\n\n.. code-block::\n\n   # Launch lnc_test on 1 core\n   lnc_test(input)\n\n   # Launch lnc_test on 1 core\n   lnc_test[1](input)\n\n   # Launch lnc_test on 2 cores\n   lnc_test[2](input)\n\n   # Launch lnc_test on 8 cores (ERROR on current architecture)\n   lnc_test[8](input)\n\nProgramming for multiple cores\n-----\n\nWhen writing a NKI kernel for multiple cores, there are two important APIs that\ncan be used to tell how many cores are being used and which core the current\ninstance is running on. These APIs are called `num_programs` and `program_id`.\n\nThe `num_programs` API will return the total number of cores the current kernel\nis running on. If LNC is not being used, this API will return 1. So, we can\ntell if we are running on multiple cores by inspecting the result of this\nvariable:\n\n.. code-block::\n\n   @nki.jit\n   def lnc_test(input):\n     if nl.num_programs() > 1:\n       print(\"Running on multiple cores\")\n     else:\n       print(\"Running on one core - no LNC\")\n\n   # Launch lnc_test on 1 core\n   # prints \"Running on one core - no LNC\"\n   lnc_test(input)\n\n   # Launch lnc_test on 2 cores\n   # prints \"Running on multiple cores\"\n   lnc_test[2](input)\n\nThe `program_id` API will return the logical core id that the current\ninstance is running on. In the case of LNC=2, this API will return either 0\nor 1. When not using LNC, this API will return 0. This API can be used to\nprogrammatically divide work between multiple cores.\n\nFor example, suppose we have a tensor with shape `2x128x128` and we want to\ncompute the reciprocal of all of the elements of this tensor. 
We can write a\nkernel function that is LNC-aware and can make use of extra cores when\navailable.\n\n.. code-block::\n\n   @nki.jit\n   def lnc_test(input):\n       # Check the first dimension is 2 for this example\n       assert input.shape[0] == 2\n\n       # create temporary storage on SBUF for computation\n       in_tile = nl.ndarray(input.shape[1:], input.dtype, buffer=nl.sbuf)\n       out_tile = nl.ndarray(input.shape[1:], input.dtype, buffer=nl.sbuf)\n\n       # create output tensor\n       output = nl.ndarray(input.shape, input.dtype, buffer=nl.shared_hbm)\n\n       if nl.num_programs() == 1:\n           # Not using multiple cores, process two tiles\n           for i in range(2):\n               nisa.dma_copy(dst=in_tile, src=input[i])\n               nisa.reciprocal(dst=out_tile, data=in_tile)\n               nisa.dma_copy(dst=output[i], src=out_tile)\n       else:\n           # Using multiple cores, process tiles in parallel, one per core\n           i = nl.program_id(0)\n           nisa.dma_copy(dst=in_tile, src=input[i])\n           nisa.reciprocal(dst=out_tile, data=in_tile)\n           nisa.dma_copy(dst=output[i], src=out_tile)\n       return output\n\nThe code above has two cases, one for when we are not using LNC\n(`num_programs` returns 1), and one for when we are using LNC=2\n(`num_programs` returns 2). In the non-LNC case, there is a for loop that\nprocesses each input tile one after the other. However, in the LNC=2 case,\nwe can use the `program_id` API to query which core we are on. This API will\nreturn either `0` or `1`. The code uses the `program_id` to have each core\nprocess one of the two tiles in parallel.\n\nFinal Notes\n-----------\n\nUsing LNC can improve the performance of NKI kernels by leveraging multiple\nNeuronCores. However, there are two things to be mindful of when using LNC.\nFirst, the inputs and outputs of the kernel should be stored in the shared HBM\nthat all of the cores can access. Second, the Neuron SDK assumes that when\nrunning a kernel on multiple cores, the program on each core is \"the same\".\nThis means that each core is executing the same basic control flow as the other\ncores. Most of the time, this requirement will be automatically satisfied by\nthe NKI compiler. However, if you use dynamic control flow, and this\ncontrol flow is different on the different cores, then the behavior is\nundefined, and you will likely receive an error at runtime.\n"
  },
  {
    "path": "nki/get-started/about/memory-hierarchy-overview.rst",
    "content": ".. meta::\n   :description: Overview of the Trainium Memory Hierarchy\n   :date_updated: 12/02/2025\n\n.. _nki-about-memory:\n\n======================================\nThe Trainium Memory Hierarchy\n======================================\n\nThis topic covers the Trainium Memory Hierarchy and how it applies to developing with the AWS Neuron SDK. This overview covers the various memories\nthat are available on the Trainium hardware and how they are used. Understanding the memory hierarchy is important for writing performant kernels\nfor use in your Machine Leaning models.\n\n\nMemory hierarchy\n-----------------\n\nThe diagram in :numref:`Fig. %s <nki-fig-pm-memory>`, below, shows the four-level memory hierarchy available to a single NeuronCore. The latency\nranges provided in the figure are approximate and are intended to calibrate the programmer's mental model (see :doc:`NeuronDevice Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` for the exact values). Memories closer to the top of the figure are the closer to the compute engines; therefore, they are designed to provide the highest bandwidth and lowest latency. However, the faster memories also have smaller capacities compared to memories near the bottom. This set of memories is the *Memory Hierarchy* for the Trainium devices.\n\nUnlike memory hierarchies for traditional processors (such as CPUs and GPUs), all of the memories available to a NeuronCore are software-managed. This means the contents of the memories are managed either directly by the programmer, or by the Neuron SDK tool chain, rather than being managed by the hardware. In other words, NeuronCore does not have a hardware cache system that performs data movement across memories in a way that is opaque to the program. All memory movement is explicit in the program itself. These explicit memory movements may be specified by writing a NKI kernel, or they may be computed by the Neuron Graph Compiler as part of the optimization process. \n\nIn the following section we will discuss each memory in turn.\n\n.. _nki-fig-pm-memory:\n\n.. figure:: /nki/img/pm-memory.png\n   :align: center\n   :width: 80%\n\n   NeuronCore Memory Hierarchy with Capacity and Bandwidth Ranges\n\nNeuronCore external memory\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe two memories at the bottom of the hierarchy, host memory and device memory,\nare both considered *external* memory for a NeuronCore. These memories are\n**linear memory**, where multi-dimensional tensors must be stored in a\nflattened manner.\n\nThe **host memory** is the CPU-attached DRAM, which is accessible by the host\nCPUs and all the NeuronCores attached to the instance. NKI kernels currently do\nnot provide APIs to move data in and out of the host memory directly, but\nrather, rely on ML frameworks such as PyTorch or JAX to send input data from\nhost memory to the NeuronDevice and vice versa. For an example of this, see\n:doc:`Getting Started with NKI </nki/get-started/quickstart-implement-run-kernel>`.\n\nThe **device memory** resides within a NeuronDevice and uses High Bandwidth\nMemory (HBM) technologies starting from NeuronDevice v2. Currently, the input\nand output parameters to NKI kernels must be HBM tensor references. When a NKI\nkernel begins execution, the first task is to load the input tensors from HBM\ninto the internal memory. Then computation can be done on the tensors in\ninternal memory. 
NeuronCore internal memory\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe two memories at the top of the hierarchy, SBUF and PSUM, are both\nconsidered *internal* (or *on-chip*) memory for a NeuronCore. Both memories are\n**two-dimensional**, organized in **128 partitions**. The partition\nsize of PSUM is typically much smaller than SBUF, and PSUM/SBUF partition sizes\nvary with NeuronCore generations.\n\nState Buffer (SBUF) memory is the main software-managed on-chip memory. The\nSBUF is accessible by all the compute engines within a NeuronCore. NKI kernel\ninput tensors from HBM must be loaded into the SBUF for computation, and computed\noutput tensors of the kernel must be stored back into the HBM from SBUF before\nthe host can access them.\n\nBoth loading and storing to and from the HBM memory can be done using the :doc:`nki.isa.dma_copy </nki/api/generated/nki.isa.dma_copy>` API. In addition, SBUF is used for storing intermediate data within the kernel, generated by the compute engines. Note, SBUF has **~20x higher bandwidth** than HBM, but it needs to be carefully managed to minimize HBM accesses for better performance.\n\nLastly, Partial Sum Buffer (PSUM) memory is a small, dedicated memory designed\nfor storing matrix multiplication (MatMult) results computed by the tensor\nengine. The Tensor Engine is able to read-add-write to every address in PSUM.\nTherefore, PSUM is useful for performing large MatMult calculations using\nmultiple tiles where multiple MatMult instructions need to accumulate into the\nsame output tile. As is shown in :numref:`Fig. %s <nki-fig-pm-memory>`, PSUM memory\ncan also be read and written by the vector and scalar engines. However, due to\nthe limited capacity of PSUM, we recommend that you reserve PSUM space for the\ntensor engine to write MatMult outputs and to use the vector and scalar engines\nto evict MatMult results back to SBUF as soon as possible.\n\n.. note:: \n   To optimize kernel performance, it is good practice for NKI programmers to be mindful of SBUF and PSUM usage through careful :ref:`tiling <nki-about-tiling>` and loop fusion. If the total size of the live data being used by a NKI kernel overflows the capacity of any on-chip memory, the Neuron compiler will insert the necessary spills or refills between that memory and the next-tier memory in the hierarchy.\n\n"
  },
  {
    "path": "nki/get-started/about/nki-dma-overview.rst",
    "content": ".. meta::\n    :description: Direct Memory Access (DMA) engines in Neuron enable efficient data movement between different memory types, maximizing memory bandwidth utilization and overall workload performance.\n\nIntroduction to Direct Memory Access (DMA) with NKI\n======================================================\n\nDirect Memory Access (DMA) engines in Neuron enable efficient data movement between different memory types, primarily between the device memory (HBM) and on-chip SRAM buffers (SBUF). DMA Engines can operate in parallel to compute, allowing asynchronous data movement independent from compute operations. Each NeuronCore (v2-v4) is paired with 16 DMA engines. Understanding and efficiently utilizing these DMA engines is critical for maximizing memory bandwidth utilization and overall workload performance.\n\nBefore reading this doc, it may be helpful to refer to :doc:`Introduction to memory hierarchy in NKI </nki/get-started/about/memory-hierarchy-overview>`.\n\nBasic DMA Capabilities\n-----------------------\n\nTo move data between HBM and SBUF, programmers can initiate a DMA transfer that gets executed by the DMA engines. Each DMA transfer starts with a DMA trigger from a NeuronCore and ends with a semaphore update from the DMA engine to signal the completion of transfer back to the NeuronCore. Today, each DMA transfer is by default parallelized up to 16 DMA engines, depending on the shape.\n\nThe 16 DMA Engines are connected to both the off chip HBM and the on-chip SRAM, called SBUF. DMA transfers can move data in multiple directions: bidirectionally between HBM to SBUF, within HBM or within SBUF. Each DMA engine has a theoretical bandwidth of 27.2 GB/s for NeuronCore-v2 and -v3 or 38.4 GB/s for NeuronCore-v4. DMA engines also support scatter-gather operations, allowing a single transfer to gather data from multiple non-contiguous source buffers or scatter to multiple non-contiguous destination buffers. \n\nDMA transfers can perform both copy and transpose transfers into SBUF. This doc will mainly focus on copy transfers.\nYou can also perform casting as part of DMA when the transfer has a different source and destination datatype. Neuron supported datatypes can be found in the :doc:`NKI datatype guide </nki/api/nki.api.shared>`. The casting operation is performed by first casting the source type to FP32, before finally casting to the destination type. This may be worth considering if working with integer types. Casting with DMAs is not supported for MXFP4 and MXFP8 datatypes.\n\nDMA Triggers\n-------------\n\nDMA transfers can be triggered by any engine sequencer in the NeuronCore. (For details, refer to :doc:`/nki/guides/architecture/trainium2_arch`.) The sequencer instruction to trigger the transfer may wait on any semaphore condition which is signaled by other compute engines to respect data dependencies. The Trigger Engine for a given transfer can be specified by setting the ``engine`` parameter when calling :doc:`nisa.dma_copy </nki/api/generated/nki.isa.dma_copy>`. This behavior is only allowed when using hardware DGE in the current NKI release.\n\nDMA Queues\n-----------\n\nDMA transfers are submitted to DMA queues for the DMA Engines to consume. There are 16 DMA queues per DMA engine (ID 0-15). A given DMA transfer can be submitted to a single queue ID across all 16x DMA engines paired with a NeuronCore. The given queue for a DMA transfer can be seen when mousing over a DMA transfer in a profile in Neuron Explorer. 
The queue ID is typically tied to the trigger engine and the method of descriptor generation (refer to the NeuronCore-v3 architecture guide for details). DMA transfers within a queue on the same DMA engine are executed in order. DMA transfers from different DMA queues are scheduled in a round robin fashion (for NeuronCore-v2 and v3) or based on the queue QoS configured (for NeuronCore-v4). Refer to the NeuronCore-v4 architecture guide for more details on DMA QoS.\n\nPerformance Considerations\n---------------------------\n\nWhen moving data in or out of SBUF, optimal performance is achieved with transfers maximizing the number of partitions with 4KiB or larger per partition. Given 16x DMA engines and 128 SBUF partitions, each DMA engine is typically responsible for moving data for eight SBUF partitions (128 partitions / 16 DMA engines). The figure below visualizes the DMA throughput across different number of bytes per partition (\"Free Bytes\"), for a fixed partition dimension size of 128:\n\n.. figure:: /nki/img/overviews/nki-dma-intro-1.jpg\n   :alt: DMA throughput graph showing performance across different bytes per partition\n\nThe points on the graph refer to various Free (Dimension) Byte values (that is, bytes per partition). We see that at 4096 free bytes, we are able to nearly saturate DMA bandwidth.\n\nAnother key consideration for performance is overhead to initiate a DMA transfer. Small, frequent transfers incur significant overhead causing us to be latency bound, while larger transfers help amortize these costs, moving to a more bandwidth bound regime. For optimal performance, it's important to batch data movements into larger transfers whenever possible. \n\nThe table below shows the theoretical peak DMA bandwidth per NeuronCore generation:\n\n.. list-table::\n   :header-rows: 1\n\n   * - Generation\n     - BW / Engine\n     - Engines / NC\n     - Aggregate BW\n   * - Trainium1\n     - 17 B/ns\n     - 16\n     - 272 GB/s\n   * - Trainium2\n     - 23 B/ns\n     - 16\n     - 368 GB/s\n   * - Trainium3\n     - 33 B/ns\n     - 16\n     - 528 GB/s\n\nThe minimum free dimension size to reach the recommended 2 KiB per partition depends on the data type:\n\n.. list-table::\n   :header-rows: 1\n\n   * - Data Type\n     - Minimum Free Dimension\n     - Bytes per Partition\n   * - float32\n     - 512 elements\n     - 2 048\n   * - bfloat16 / float16\n     - 1 024 elements\n     - 2 048\n   * - float8\n     - 2 048 elements\n     - 2 048\n\nWe will look at two examples below, which show various shapes, sizes and access patterns, and how this affects the the achieved DMA throughput of the corresponding DMA transfers.\n\nExamples\n---------\n\nAs DMAs are a result of the corresponding source layout and access pattern, it is best to look at concrete examples to ground our understanding of common applications and their resulting access patterns.\n\nExample 1: Move A[4,4096] HBM → SBUF\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe purpose of this example is to show a very simple access pattern (a 2D tensor in contiguous memory in HBM, being written to SBUF). This should build a foundation of how a particular access pattern maps to a specific set of DMA transfers.\n\nConsider a 2D Tensor, A[4, 4096], in HBM. Assume the tensor is laid out in row-major form and is contiguous in the HBM. In row major form, array elements are stored sequentially row by row in memory, meaning all elements of the first row are stored first, followed by all elements of the second row, and so on. 
Let's assume we wish to move this tensor to SBUF, where the destination tensor will have a partition dimension of 4 and a free dimension of 4096. Each row of the source tensor will occupy a single partition in SBUF. \n\nAssuming A is a bfloat16 tensor, this means that the total size of the tensor is 32KiB (4*4096*2B). Knowing that each DMA engine corresponds to 8 partition lanes, and we are writing our 4 rows to only 4 partition lanes of SBUF, we would expect to see a single DMA engine active, with a single transfer size of 32KiB.\n\nHere is a diagram with the expected behavior:\n\n.. figure:: /nki/img/overviews/nki-dma-intro-2.jpg\n   :alt: Diagram showing DMA transfer of A[4,4096] from HBM to SBUF\n\nExample\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nHere is the kernel to perform the DMA transfer.\n\n.. code-block:: python\n\n    import nki.language as nl\n    import nki.isa as nisa\n    import nki\n\n    @nki.jit\n    def tensor_exp_kernel_isa(in_tensor):\n      \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n      Args:\n           in_tensor: an input tensor of shape [4,4096]\n      Returns:\n           out_tensor: an output tensor of shape [4,4096]\n      \"\"\"\n      out_tensor = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)\n      sbuf_tensor = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n      out_tile = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n\n      # Load input data from HBM to on-chip memory\n      nisa.dma_copy(src=in_tensor[0:4, 0:4096], dst=sbuf_tensor)\n\n      # perform the computation:\n      out_tile = nisa.activation(op=nl.exp, data=sbuf_tensor)\n\n      # store the results back to HBM\n      nisa.dma_copy(src=out_tile, dst=out_tensor[0:4, 0:4096])\n      return out_tensor\n\n    if __name__ == \"__main__\":\n      import torch\n      import torch_neuronx\n      shape = (4, 4096)\n      in_tensor = torch.ones(shape, dtype=torch.bfloat16)\n      out_tensor = tensor_exp_kernel_isa(in_tensor)\n      print(out_tensor)\n\nProfile\n\"\"\"\"\"\"\"\n\nThe above code runs on a single NeuronCore-v3, in a Trn2 instance. Here we can look at the profile, to validate the expected behavior. Refer to the :doc:`Neuron Explorer user guide </tools/neuron-explorer/index>` for guidance on how to generate a profile.\n\n.. figure:: /nki/img/overviews/nki-dma-intro-3.png\n   :alt: Profile showing DMA transfer for Example 1\n\nThis is exactly what we expected based on our analysis. From the profile, we can see that the first DMA engine takes 1416 ns to load 32 KiB from HBM to SBUF and also a small 4B semaphore update. Even though the remaining 15 DMA engines do not perform useful data movement, they also perform a small 4B semaphore update writes. This allows the NeuronCore to always monitor a semaphore increment of 16 to signal DMA transfer completion, regardless of the tensor shapes in the transfer.  \n\nThis is good, but this example only uses a single DMA engine. In the next example, we increase partition dimension to increase the number of DMA Engines in use.\n\nExample 2: Move A[128,128] HBM → SBUF\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe purpose of this example is to show how as partition count scales, the number of DMA Engines in use increases.\n\nConsider a 2D Tensor A[128, 128] in HBM, laid out in row-major form and contiguous on the HBM. 
Assuming we wish to move A from HBM to SBUF, how many DMA engines will this require?\n\nAgain, we see the total tensor size is 32KiB (128*128*2B), the same as the previous example. We are writing across 128 partitions of SBUF, with each row corresponding to a partition lane. Knowing that each DMA engine corresponds to 8 partition lanes, and we are writing to 128 partitions, we would expect all 16 DMA engines to be active, each performing a single DMA operation of 2KiB (8 rows x 128 elements x 2 bytes per element).\n\nHere is a diagram of the expected transfer:\n\n.. figure:: /nki/img/overviews/nki-dma-intro-4.jpg\n   :alt: Diagram showing DMA transfer of A[128,128] from HBM to SBUF\n\nExample\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: python\n\n    import nki.language as nl\n    import nki.isa as nisa\n    import nki\n    import os\n    os.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\n    os.environ[\"NEURON_RT_ENABLE_DGE_NOTIFICATIONS\"] = \"1\"\n    os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n\n\n    @nki.jit(mode=\"torchxla\")\n    def tensor_exp_kernel_isa(in_tensor):\n      \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n      Args:\n            in_tensor: an input tensor of shape [128,128]\n      Returns:\n            out_tensor: an output tensor of shape [128,128]\n      \"\"\"\n      out_tensor = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.shared_hbm)\n      sbuf_tensor = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n      out_tile = nl.ndarray(in_tensor.shape, dtype=nl.bfloat16, buffer=nl.sbuf)\n   \n      # Load input data from HBM to on-chip memory\n      nisa.dma_copy(src=in_tensor[0:128, 0:128], dst=sbuf_tensor)\n\n      # perform the computation:\n      out_tile = nisa.activation(op=nl.exp, data=sbuf_tensor)\n   \n      # store the results back to HBM\n      nisa.dma_copy(src=out_tile, dst=out_tensor[0:128, 0:128])\n      return out_tensor\n\n    if __name__ == \"__main__\":\n      import torch\n      import torch_xla\n\n\n      device = torch_xla.device()\n      shape = (128, 128) # Tensor shape : [128, 128]\n      in_tensor = torch.ones(shape,  dtype=torch.bfloat16).to(device=device)\n      print(in_tensor.dtype)\n      out_tensor = tensor_exp_kernel_isa(in_tensor)\n      print(out_tensor) # an implicit XLA barrier/mark-step\n\nProfile\n\"\"\"\"\"\"\"\n\n.. figure:: /nki/img/overviews/nki-dma-intro-5.png\n   :alt: Profile showing DMA transfer for Example 2\n\nIn the above profile, we can see that all 16 DMA engines are active, as each DMA engine is reading 8 rows from HBM and writing to 8 corresponding partition lanes in SBUF.  Similarly, we see the reverse also applies from SBUF, back to HBM. By mousing over an individual DMA operation, we see each DMA engine corresponds to a single 2KiB read (8 rows x 128 elements x 2B), as we expect!\n\nUsing the same profile from the 128x128 DMA example, lets look at the DMA Trigger and the associated Transfer. You can trace the DMA trigger instruction and the associated DMA transfer via the profiler. This would be useful if you wanted to understand the why a DMA was triggered when, and any preceding dependencies.\n\n.. figure:: /nki/img/overviews/nki-dma-intro-6.png\n   :alt: Profile showing DMA trigger from qGpSimdDynamic\n\n.. figure:: /nki/img/overviews/nki-dma-intro-7.png\n   :alt: Profile showing corresponding trigger in GPSimd\n\nWe can see the first DMA is triggered from qGpSimdDynamic (First screenshot). 
We can look at GPSimd to see the corresponding trigger (second screenshot).\n"
  },
  {
    "path": "nki/get-started/about/tiling-overview.rst",
    "content": ".. meta::\n   :description: Overview of Tiling process for NKI programmers\n   :date_updated: 12/02/2025\n\n.. _nki-about-tiling:\n\n=======================\nWhat is Tiling?\n=======================\n\nThis topic covers tiling and how it applies to developing NKI kernels with the AWS Neuron SDK. Tiling is the process of dividing a large tensor up in to smaller tensors that can be processed by single Neuron ISA instructions. When writing NKI kernels, all tensors must be tiled to fit within the constraints of the hardware.\n\nTile-based operations\n----------------------\n\nAll NKI APIs operate on tiles. A tile is just a tensor that resides in either the SBUF or PSUM memory with a size and layout that satisfies the constraints of the Neuron instruction set architecture (NeuronCore ISA). Since the SBUF and PSUM memories have 128 partitions, most APIs are limited to tiles with a first dimension (also called the \"Partition Dimension\") no larger than 128 elements. So, for example, to compute the reciprocal of a matrix of size 256x256, you will need to split the computation up into (at least) two parts:\n\n.. code-block::\n\n   # Example how to split 256x256 into tiles with 128 partition dimensions\n   # Assume input and output are tensors of size 256 x 256\n\n   # The hardware supports up to 128 partitions\n   P_DIM = nki.language.tile_size.pmax\n\n   # allocating memory for input and output tiles\n   # note that memory allocation does not initialize\n   in_tile = nl.ndarray((P_DIM, 256), dtype=nl.float32, buffer=nl.sbuf)\n   out_tile = nl.ndarray((P_DIM, 256), dtype=nl.float32, buffer=nl.sbuf)\n\n   # process first tile from input to output\n   nki.isa.dma_copy(dst=in_tile, src=input[0:P_DIM, 0:256])\n   nki.isa.reciprocal(dst=out_tile, data=in_tile)\n   nki.isa.dma_copy(dst=output[0:P_DIM, 0:256], src=out_tile)\n\n   # process second tile\n   nki.isa.dma_copy(dst=in_tile, src=input[P_DIM:256, 0:256])\n   nki.isa.reciprocal(dst=out_tile, data=in_tile)\n   nki.isa.dma_copy(dst=output[P_DIM:256, 0:256], src=out_tile)\n\nIn the code above, we allocate two SBUF tensors to store our tiles: one for the input and one for the result. These two tiles are available within the kernel that they are declared in, and will be automatically recycled by the compiler when no longer needed. Then we copy the first 128 rows of our matrix from the input in HBM to the input tile in SBUF, and compute the reciprocal placing the result into the output tile in SBUF. Finally, we copy the result back to the output tensor, in HBM. Of course, this could also be done with a loop, as shown below.\n\n.. code-block::\n\n   # allocate memory for input and output tiles\n   in_tile = nl.ndarray((P_DIM, 256), dtype=nl.float32, buffer=nl.sbuf)\n   out_tile = nl.ndarray((P_DIM, 256), dtype=nl.float32, buffer=nl.sbuf)\n   # process tiles\n   for i in range(input.shape[0] // P_DIM):\n       s = nl.ds(i * P_DIM, P_DIM) # equivalent to i * P_DIM : (i + 1) * P_DIM\n       nki.isa.dma_copy(dst=in_tile, src=input[s, 0:256])\n       nki.isa.reciprocal(dst=out_tile, data=in_tile)\n       nki.isa.dma_copy(dst=output[s, 0:256], src=out_tile)\n\nWe will provide more discussion of the indexing in :ref:`Tensor Indexing <nki-tensor-indexing>`. Next, let's discuss two important considerations when working with tile-based operations in NKI: :ref:`data layout <nki-tile-layout>` and :ref:`tile size <nki-tile-size>` constraints.\n\n.. 
.. _nki-tile-layout:\n\nLayout considerations\n-----------------------\n\nWhen working with multi-dimensional arrays on any platform, it is important to consider the physical memory layout of the arrays, or how data is stored in memory. For example, in the context of 1D linear memory, we can store a 2D array in a row-major layout or a column-major layout. Row-major layouts place elements within each row in contiguous memory, and column-major layouts place elements within each column in contiguous memory.\n\nAs discussed in :ref:`Memory hierarchy <nki-about-memory>`, the on-chip memories, SBUF and PSUM, are arranged as 2D memory arrays. The first dimension is always the partition dimension ``P`` with 128 memory partitions that can be read and written in parallel by compute engines. The second dimension is the free dimension ``F`` where elements are read and written sequentially. A tensor is placed in SBUF and PSUM across both ``P`` and ``F``, with the same start offset across all ``P`` partitions used by the tensor. The figure below illustrates a default tensor layout. Note that a tile in NKI must map shape[0] to the partition dimension.\n\n.. _nki-fig-pm-layout:\n\n.. figure:: /nki/img/overviews/tiling-1.png\n   :align: center\n   :width: 70%\n\n   Tensor mapped to partition and free dimensions of SBUF and PSUM\n\nSimilar to other domain-specific languages that operate on tensors, NKI defines a contraction axis of a tensor as the axis over which reduction is performed, for example the summation axis in a dot product. NKI also defines a parallel axis as an axis over which the same operation is performed on all elements. For example, if we take a ``[100, 200]`` matrix and sum each row independently to get an output of shape ``[100, 1]``, then the row-axis (``axis[0]``, left-most) is the parallel axis, and the column-axis (``axis[1]``, right-most) is the contraction axis.\n\nTo summarize, the partition and free dimensions of a NKI tensor dictate how the tensor is stored in the 2D on-chip memories physically, while the parallel and contraction axes of a tensor are logical axes that are determined by the computation to be done on the tensor.\n\nThe NeuronCore compute engines impose two layout constraints (LC):\n\n* **[Layout Constraint #1]** For matrix multiplication operations, the contraction axis of both input tiles must be mapped to the Partition (``P`` or ``P_DIM``) dimension, which is limited to 128 elements on current hardware.\n* **[Layout Constraint #2]** For operations that are not matrix multiplication operations, such as scalar or vector operations, the parallel axis should be mapped to the Partition (``P`` or ``P_DIM``) dimension.\n\nLayout Constraint #1 means that to perform a matrix multiplication of shapes ``[M, K]`` and ``[K, N]`` that contracts on K to generate ``[M, N]``, the Tensor Engine (the engine performing this matmul operation) requires the K dimension to be mapped to the partition dimension in SBUF for both input matrices. Therefore, you need to pass shapes ``[K, M]`` and ``[K, N]`` into the :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>` API, as the partition dimension is always the left-most dimension for an input tile to any NKI compute API.\n\nTo help developers get started with NKI quickly, NKI also provides the higher-level ``nki.language.matmul`` API (``nl.matmul``), which accepts ``[M, K]`` and ``[K, N]`` input shapes and performs the necessary layout shuffling on the input data before invoking the Tensor Engine matmul instruction.
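\n\nFor example, the sketch below lays out the input tiles for a ``[128, 128] x [128, 512]`` matmul so that the contraction axis ``K`` sits on the partition dimension of both tiles, as Layout Constraint #1 requires. It is an illustration rather than a complete kernel: the ``dst=``/``stationary=``/``moving=`` argument names and the ``nl.psum`` buffer are assumptions here, so check the :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>` reference for the exact signature.\n\n.. code-block::\n\n   # Illustrative sketch: C[M, N] = A[M, K] @ B[K, N], with the contraction\n   # axis K mapped to the partition dimension of both input tiles.\n   # Argument names (stationary/moving) are assumptions; see the API reference.\n   M, K, N = 128, 128, 512\n\n   # Both inputs are stored K-major in SBUF: shapes [K, M] and [K, N]\n   a_tile = nl.ndarray((K, M), dtype=nl.float32, buffer=nl.sbuf)  # A transposed\n   b_tile = nl.ndarray((K, N), dtype=nl.float32, buffer=nl.sbuf)  # B as-is\n\n   # Matmul results accumulate in PSUM; the output tile has shape [M, N]\n   c_tile = nl.ndarray((M, N), dtype=nl.float32, buffer=nl.psum)\n\n   nki.isa.nc_matmul(dst=c_tile, stationary=a_tile, moving=b_tile)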
\n\nLC#2, on the other hand, is applicable to many instructions supported on the Vector, Scalar, and GpSimd Engines. See the :doc:`nki.isa.tensor_reduce </nki/api/generated/nki.isa.tensor_reduce>` API as an example.\n\n.. _nki-tile-size:\n\nTile size considerations\n-------------------------\n\nBesides layout constraints, NeuronCore hardware further imposes three tile-size constraints (TC) in NKI:\n\n* **[Tile-Size Constraint #1]** The P dimension size of a tile in both SBUF and PSUM must never exceed ``nki.tile_size.pmax == 128``.\n* **[Tile-Size Constraint #2]** For tiles in PSUM, the F dimension size must not exceed ``nki.tile_size.psum_fmax == 512``.\n* **[Tile-Size Constraint #3]** The F dimension size of matrix multiplication input tiles must not exceed ``nki.tile_size.gemm_stationary_fmax == 128`` on the left-hand side (LHS), or ``nki.tile_size.gemm_moving_fmax == 512`` on the right-hand side (RHS).\n\nYou are responsible for breaking up your tensors according to these tile-size constraints. For example, below is a simple kernel that applies the exponential function to every element of an input tensor. The kernel expects a shape of ``(128, 512)`` for both input and output tensors:\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n   import nki\n\n   # The hardware supports up to 128 partitions\n   P_DIM = nki.language.tile_size.pmax\n\n   @nki.jit\n   def tensor_kernel(in_tensor):\n     \"\"\"NKI kernel to compute the elementwise exponential of an input tensor\n     Args:\n         in_tensor: an input tensor of shape [128,512]\n     Returns:\n         out_tensor: an output tensor of shape [128,512]\n     \"\"\"\n     X_SIZE = 128\n     Y_SIZE = 512\n\n     # allocate space for the result\n     out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n     # allocate space for tile memory\n     in_tile = nl.ndarray((P_DIM, Y_SIZE), dtype=nl.float32, buffer=nl.sbuf)\n     out_tile = nl.ndarray((P_DIM, Y_SIZE), dtype=nl.float32, buffer=nl.sbuf)\n\n     # Process the single (128, 512) tile: copy in, apply exp, copy out\n     nki.isa.dma_copy(dst=in_tile, src=in_tensor[0:P_DIM, 0:Y_SIZE])\n     nki.isa.activation(dst=out_tile, data=in_tile, op=nl.exp)\n     nki.isa.dma_copy(dst=out_tensor[0:P_DIM, 0:Y_SIZE], src=out_tile)\n\n     return out_tensor\n\nAs expected, the output tensor is an element-wise exponentiation of the input tensor (a tensor of ones):\n\n::\n\n   tensor([[2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   ...,\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188]],\n   device='xla:1', dtype=torch.bfloat16)\n\n.. _nki-output-garbage-data:\n\nNow let's examine what happens if the input/output tensor shapes do not match the shape of the compute kernel.
\nAs an example, suppose we change the input and output tensor shape from ``[128,512]`` to ``[256,512]`` while leaving the kernel unchanged. Since the compute kernel expects ``(128, 512)`` input/output tensors, but we pass ``(256, 512)`` tensors instead, the bottom half of the output tensor becomes garbage data:\n\n::\n\n   tensor([[2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   [2.7188, 2.7188, 2.7188, ..., 2.7188, 2.7188, 2.7188],\n   ...,\n   [0.5273, 0.6055, 0.4336, ..., 0.9648, 0.9414, 0.4062],\n   [0.7109, 0.2539, 0.7227, ..., 0.7344, 0.2539, 0.1211],\n   [0.8867, 0.2109, 0.8789, ..., 0.8477, 0.2227, 0.1406]],\n   device='xla:1', dtype=torch.bfloat16)\n\nWe could try to fix this by also changing the tile size inside the compute kernel to ``(256, 512)`` (**Note**: this violates Tile-Size Constraint #1). In that case, the Neuron Graph Compiler identifies the tile-size constraint violation and fails compilation with the following exception:\n\n::\n\n   Size of partition dimension 256 exceeds architecture limitation of 128.\n\nNow, let's see how to build a kernel that properly handles ``(256, 512)`` input/output tensors with a simple loop. We can use the ``nki.language.tile_size.pmax`` constant defined in NKI as the maximum partition dimension size in a tile.\n\n.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n   import nki\n\n   # The hardware supports up to 128 partitions\n   P_DIM = nki.language.tile_size.pmax\n\n   @nki.jit\n   def tensor_exp_kernel_(in_tensor):\n     \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n     Args:\n         in_tensor: an input tensor of shape [256,512]\n     Returns:\n         out_tensor: an output tensor of shape [256,512]\n     \"\"\"\n     X_SIZE = 256\n     Y_SIZE = 512\n     assert in_tensor.shape == (X_SIZE, Y_SIZE)\n     # allocate space for the result\n     out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n     # allocate space for tile memory\n     in_tile = nl.ndarray((P_DIM, Y_SIZE), dtype=nl.float32, buffer=nl.sbuf)\n     out_tile = nl.ndarray((P_DIM, Y_SIZE), dtype=nl.float32, buffer=nl.sbuf)\n\n     for k in nl.affine_range(in_tensor.shape[0] // nl.tile_size.pmax):\n       # Generate tensor indices for the input/output tensors\n       p_start = k * nl.tile_size.pmax\n       i_p = nl.ds(p_start, nl.tile_size.pmax)\n\n       # Process tile\n       nki.isa.dma_copy(dst=in_tile, src=in_tensor[i_p, :])\n       nki.isa.activation(dst=out_tile, data=in_tile, op=nl.exp)\n       nki.isa.dma_copy(dst=out_tensor[i_p, :], src=out_tile)\n\n     return out_tensor\n\nThe ``nl.affine_range`` API is similar to the Python ``range`` function; here it is called as ``nl.affine_range(2)``, and you can think of it as returning ``[0, 1]``. See :ref:`NKI iterator API <nl_iterators>` for a detailed discussion of various loop iterator options in NKI.\n\nWhile the code above does handle ``(256, 512)`` tensors correctly, it is rather inflexible since it only supports an input shape of ``(256, 512)``. Therefore, as a last step, we extend this kernel to handle varying input/output sizes:\n\n
.. code-block::\n\n   import nki.isa as nisa\n   import nki.language as nl\n   import nki\n   import math\n\n   # The hardware supports up to 128 partitions\n   P_DIM = nki.language.tile_size.pmax\n\n   @nki.jit\n   def tensor_exp_kernel_(in_tensor):\n     \"\"\"NKI kernel to compute elementwise exponential of an input tensor\n     Args:\n         in_tensor: an input tensor of ANY 2D shape (up to SBUF size)\n     Returns:\n         out_tensor: an output tensor of ANY 2D shape (up to SBUF size)\n     \"\"\"\n\n     sz_p, sz_f = in_tensor.shape\n     assert sz_f < nl.tile_size.total_available_sbuf_size\n\n     # allocate space for the result\n     out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n     # allocate space for tile memory\n     in_tile = nl.ndarray((P_DIM, sz_f), dtype=nl.float32, buffer=nl.sbuf)\n     out_tile = nl.ndarray((P_DIM, sz_f), dtype=nl.float32, buffer=nl.sbuf)\n\n     for p in nl.affine_range(math.ceil(sz_p / P_DIM)):\n       # Generate tensor indices for the input/output tensors; the last tile\n       # may cover fewer than P_DIM partitions\n       p_start = p * P_DIM\n       p_size = min(p_start + P_DIM, sz_p) - p_start\n       i_p = slice(p_start, p_start + p_size) # same as nl.ds(p_start, p_size)\n\n       # Process tile (only the first p_size partitions of each SBUF tile)\n       nki.isa.dma_copy(dst=in_tile[0:p_size, :], src=in_tensor[i_p, :])\n       nki.isa.activation(dst=out_tile[0:p_size, :], data=in_tile[0:p_size, :], op=nl.exp)\n       nki.isa.dma_copy(dst=out_tensor[i_p, :], src=out_tile[0:p_size, :])\n\n     return out_tensor\n\nThe above example handles cases where ``in_tensor.shape[0]`` is not a multiple of 128 by using the standard Python ``min`` function to make sure the tensor access is in bounds.\n\nFurther reading\n---------------\n\n- :ref:`Logical Neuron Cores (LNC) <nki-about-lnc>`\n"
  },
  {
    "path": "nki/get-started/index.rst",
    "content": ".. meta::\n   :description: Get started with Neuron Kernel Interface (NKI).\n   :keywords: NKI, AWS Neuron, Get Started, Language Guide\n   :date-modified: 12/13/2025\n\n.. _nki-get-started:\n\nGet Started with Neuron Kernel Interface (NKI)\n==============================================\n\nThis section provides the essentials that you need to get started with NKI.\n\n.. grid:: 1\n      :margin: 2\n\n      .. grid-item::\n\n            .. card:: About NKI\n                  :link: nki_about_home\n                  :link-type: ref\n                  :class-body: sphinx-design-class-title-small\n\n                  Learn about Neuron Kernel Interface (NKI) and core concepts essential for working with it.\n\nThe Quick Start Guide will walk you through implementing and running your first kernel. The NKI Language Guide\nprovides an introduction to some of the core concepts within NKI.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: NKI Quick Start Guide\n      :link: quickstart-run-nki-kernel\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Implement and run your first NKI kernel.\n\n   .. grid-item-card:: NKI Language Guide\n      :link: nki-language-guide\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Developer guide for NKI's Pythonic language syntax.\n\n.. toctree::\n      :maxdepth: 1\n      :hidden:\n\n      Environment Setup <setup-env>\n      First NKI Kernel <quickstart-implement-run-kernel>\n      NKI Language Guide <nki-language-guide>\n      Concepts <about/index>\n"
  },
  {
    "path": "nki/get-started/nki-language-guide.rst",
    "content": ".. meta::\n    :description: Comprehensive guide to the NKI language for AWS Neuron SDK, covering tensor operations, control flow, memory management, and programming patterns for Trainium accelerators.\n    :keywords: NKI, AWS Neuron, Language Guide, Tensor Operations, Trainium\n    :date-modified: 04/08/2026\n\n.. _nki-language-guide:\n\nNKI Language Guide\n==================\n\nThe Neuron Kernel Interface (NKI) language is designed for writing kernel functions to accelerate machine learning workloads on Trainium devices. This guide is an introduction to the NKI language and the key concepts you will need to know to program in NKI effectively.\n\nLet us start by looking at a simple NKI function.\n\n.. code-block:: python\n\n    @nki.jit\n    def nki_tensor_add_kernel(a_input, b_input):\n        \"\"\"\n        NKI kernel to compute element-wise addition of two input tensors.\n        \"\"\"\n\n        # Check both input tensor shapes/dtypes are the same for element-wise operation.\n        assert a_input.shape == b_input.shape\n        assert a_input.dtype == b_input.dtype\n\n        print(f\"adding tensors of type {a_input.dtype} and shape {a_input.shape}\")\n\n        # Check the first dimension's size to ensure it does not exceed on-chip\n        # memory tile size, since this simple kernel does not tile inputs.\n        assert a_input.shape[0] <= nl.tile_size.pmax\n\n        # Allocate space for the input tensors in SBUF and copy the inputs from HBM\n        # to SBUF with DMA copy.\n        a_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n        nisa.dma_copy(dst=a_tile, src=a_input)\n\n        b_tile = nl.ndarray(shape=b_input.shape, dtype=b_input.dtype, buffer=nl.sbuf)\n        nisa.dma_copy(dst=b_tile, src=b_input)\n\n        # Allocate space for the result and use tensor_tensor to perform\n        # element-wise addition. Note: the first argument of 'tensor_tensor'\n        # is the destination tensor.\n        c_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n        nisa.tensor_tensor(dst=c_tile, data1=a_tile, data2=b_tile, op=nl.add)\n\n        # Create a tensor in HBM and copy the result into HBM.\n        c_output = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.shared_hbm)\n        nisa.dma_copy(dst=c_output, src=c_tile)\n\n        # Return kernel output as function output.\n        return c_output\n        \n.. important::\n   The first thing you may notice about this NKI function is that it looks very much like a Python function. In fact, all NKI functions are syntactically valid Python functions. However, it is important to understand that NKI functions are not Python functions: they will be compiled by the NKI compiler and run on the Trainium accelerator. Because of this, not all Python constructs and libraries are supported within a NKI function.\n\nThe second thing to notice is that NKI has a sequential programming model. This means that the logical order of operations follows the syntactic order of the statements in the function. As you learn more about the Trainium hardware, you will see that the hardware can often do many things at the same time across the different compute engines on the Trainium devices. When we compile NKI functions, we will respect the sequential order of operations written by the programmer. The compiler may reorder operations that have no data dependencies, but this is functionally transparent to NKI programmers. 
Later you will see how to control which engines operations run on and even how to influence the ordering of operations with no data dependencies for better performance, but all of this is done in the context of the sequential ordering of the code.\n\nThe third thing to notice about this simple function is that it has a print statement. You may be wondering: When does this print happen? Does the Trainium hardware output a string? Where does it go? What about all those different engines we just talked about and the sequential ordering? The answers to these questions reveal a very important aspect of NKI programming. The answer is that the print is evaluated by the compiler at compile time, not at runtime. So, when you compile this NKI function, the NKI compiler will output a string like:\n\n.. code-block:: text\n\n   adding tensors of type float16 and shape (128, 512)\n\nHowever, when we run this compiled function on Trainium devices, they will not output anything. This is usually what you want. The compiler gives important debugging information during compilation, but when you deploy your function across 1000 Trainium devices, they will not waste any time generating debug output.\n\n**Note**: There is a special print function that does run on the Trainium devices, called ``device_print``, that can be used if this is really what you need; see the API reference for more information.\n\nWe have just seen that the print statement is evaluated at compile-time, and not at runtime. In fact, most things in NKI programs are evaluated at compile time. In general, calls to ``nki.isa.*`` functions will result in on-device operations, and (almost) all other things will be evaluated by the compiler at compile time. We will discuss some exceptions to this rule below, but for now it is generally the case that only the ``nki.isa.*`` calls result in run-time operations, and everything else is evaluated by the compiler at compile-time.\n\nThis leads us to our last observation about NKI functions. The ``nki.isa.*`` APIs are the heart of the matter. These APIs are designed to expose the underlying hardware capabilities in as direct a way as possible. If you call a ``nki.isa`` function, then the hardware will execute that operation at that point in the program. The NKI meta-programming language simply provides a convenient way to specify which ISA operations you want to run on your data.\n\nIn the rest of this guide we will focus on the NKI language, starting with the compilation model and namespaces, then the values you can manipulate in a NKI function. We will then cover tensor indexing, control flow, and end with a discussion of class support, interoperation with Python, and composable kernels.\n\nCompilation Model\n------------------\n\nWhen you decorate a function with ``@nki.jit`` and call it, the NKI compiler processes your kernel in three stages:\n\n1. **Specialization**: The compiler takes your Python function and evaluates all meta-programming constructs. This includes resolving tensor shapes, unrolling loops, inlining function calls, and evaluating if-statements with compile-time conditions. The result is a specialized, flat sequence of ``nki.isa.*`` operations with all compile-time values resolved.\n\n2. **Compilation**: The specialized program is lowered to Trainium machine code. This stage performs instruction scheduling, register allocation, and memory layout.\n\n
3. **Graph-compiler linking**: The compiled kernel is linked into the larger computation graph managed by the Neuron graph compiler, which handles data movement between the host and device.\n\nThe specialization stage is key to understanding NKI programming. During specialization, the compiler acts as an interpreter for the meta-programming parts of your kernel. Everything that is not a ``nki.isa.*`` call or a ``dynamic_range`` loop is evaluated and resolved at this stage. This means:\n\n- All ``for`` loops (except ``dynamic_range``) are **unrolled** at specialization time. The compiler expands the loop body once for each iteration.\n- All function calls are **inlined** at specialization time. The compiler substitutes the function body at each call site.\n- All ``if`` statements with compile-time conditions are **resolved** at specialization time. Only the taken branch is included in the specialized program.\n- All Python expressions on compile-time values (integers, booleans, strings, shapes) are **evaluated** at specialization time.\n\nThe only constructs that survive specialization and become runtime operations are ``nki.isa.*`` calls and ``dynamic_range`` loops. Everything else is part of the meta-programming language that controls how the final sequence of ISA operations is generated.\n\n.. note::\n\n   Throughout this documentation, we use the term **NKI meta-programming language** to refer to the Python subset that is evaluated at specialization time (loops, conditionals, function calls, and expressions on compile-time values), and **NKI language** to refer to the runtime primitives (``nki.isa.*`` operations and ``dynamic_range`` loops) that execute on the device.\n\n.. code-block:: python\n\n   @nki.jit\n   def example_kernel(a_input):\n       # Meta-programming: this loop is unrolled at specialization time\n       for i in range(4):\n           tile = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n           nisa.dma_copy(dst=tile, src=a_input[i * 128:(i + 1) * 128, :])\n           # Meta-programming: this if is resolved at specialization time\n           if i % 2 == 0:\n               nisa.tensor_scalar(dst=tile, data=tile, op0=nl.add, operand0=1.0)\n\nAfter specialization, this kernel becomes a flat sequence of ``dma_copy`` and ``tensor_scalar`` operations, with the loop and if-statement fully resolved.
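\n\nTo make this concrete, the snippet below is a hand-written equivalent of what the kernel above specializes into. The tile names are illustrative only; the ``nl.ndarray`` allocations are themselves resolved at specialization time and are shown here just to keep the snippet readable.\n\n.. code-block:: python\n\n   # Conceptual result of specializing example_kernel (illustration only)\n   tile0 = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n   nisa.dma_copy(dst=tile0, src=a_input[0:128, :])\n   nisa.tensor_scalar(dst=tile0, data=tile0, op0=nl.add, operand0=1.0)  # i == 0: branch taken\n\n   tile1 = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n   nisa.dma_copy(dst=tile1, src=a_input[128:256, :])                    # i == 1: branch not taken\n\n   tile2 = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n   nisa.dma_copy(dst=tile2, src=a_input[256:384, :])\n   nisa.tensor_scalar(dst=tile2, data=tile2, op0=nl.add, operand0=1.0)  # i == 2: branch taken\n\n   tile3 = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n   nisa.dma_copy(dst=tile3, src=a_input[384:512, :])                    # i == 3: branch not taken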
\n\nNKI Namespaces\n---------------\n\nNKI is organized into several Python namespaces:\n\n- ``nki`` — The top-level package. Provides the ``@nki.jit`` decorator for compiling kernel functions.\n- ``nki.language`` (commonly imported as ``nl``) — The high-level language API. This includes tensor creation (``ndarray``), data types, memory buffers, loop ranges (``affine_range``, ``dynamic_range``), and high-level math operations (``nl.add``, ``nl.matmul``, ``nl.softmax``, etc.). Many of the functions in ``nki.language`` are convenience wrappers around one or more ``nki.isa`` operations.\n- ``nki.isa`` (commonly imported as ``nisa``) — The low-level instruction set architecture API. Each function in this namespace maps directly to a Trainium hardware operation. These are the only calls that produce runtime operations on the device.\n- ``nki.collectives`` — APIs for multi-device collective communication operations such as ``all_reduce``, ``all_gather``, and ``collective_permute``.\n\nA typical NKI kernel imports these namespaces as follows:\n\n.. code-block:: python\n\n   import nki\n   import nki.language as nl\n   import nki.isa as nisa\n\nThe distinction between ``nki.language`` and ``nki.isa`` is important. When you call a ``nki.language`` function like ``nl.add(a, b)``, the compiler may lower this to one or more ``nki.isa`` operations depending on the tensor shapes and types. When you call a ``nki.isa`` function like ``nisa.tensor_tensor(...)``, you are directly specifying the hardware operation. Use ``nki.language`` for readability and portability; use ``nki.isa`` when you need precise control over which hardware engine executes an operation.\n\nNKI Values\n-----------\n\nThe NKI language supports six types of values:\n\n1. The special None value\n2. Boolean values (True and False)\n3. 32-bit integer values\n4. 32-bit IEEE floating-point values\n5. String literals\n6. Tensors (on-device tensor memory)\n\nIn addition, NKI supports the following container types:\n\n1. Tuples of any fixed length\n2. Lists of arbitrary length\n3. Dictionaries with string-value keys\n4. Simple user-defined classes\n\nNKI values and containers are very similar to their Python equivalents. For instance, you can use most of the Python standard list functions, and they work in the same way as in Python.\n\n.. code-block:: python\n\n   l = [1,2,3]    # create a list with 3 elements\n   l.append(4.1)  # append a value to the list\n   l.extend((\"Hello\", \"List\")) # extend list with multiple values\n   size = len(l) # return number of elements in list\n   third = l[2]  # get third element of list (index 2)\n\n   # check whether the list contains a specific value\n   if 2 in l:\n     print(\"list contains 2\")\n\n   # remove a specific value from a list (if present)\n   l.remove(1)\n\n   # print out list in reverse order\n   l.reverse()\n   for x in l:\n     print(x)\n\nThe NKI dictionary type is also similar to the Python version, but with the restriction that the keys must be string values.\n\n.. code-block:: python\n\n    d = dict() # create an empty dictionary\n    d['a'] = 1 # set a value in the dictionary\n\n    print(d.keys())  # print out keys in dictionary\n    print(d.items())  # print out key/value pairs in dictionary\n\n    # print out dictionary\n    for k in d.keys():\n        v = d[k]\n        print(k, v)\n\n    # remove value from dictionary if present\n    if d.pop('a'):\n        print(\"removed 'a' from dictionary\")\n\n    # fetch value of a, set to 2 if not present\n    a = d.setdefault('a', 2)\n\nWe will discuss user-defined classes later in the guide. For now, let's take a closer look at the most important value in NKI: the tensor.\n\nTensor Values\n--------------\n\nThe ``NkiTensor`` class represents an on-chip tensor. That is, an ``NkiTensor`` instance is really a reference to some region of memory on the Trainium device at runtime. At compile-time, we do not yet know the precise location nor the precise contents of this tensor, and therefore, code evaluated at compile-time will not be able to query the precise location nor the contents. At compile-time we can only query meta-data about the tensor, such as its shape and element type. ``NkiTensor`` exposes the following meta-data:\n\n* ``t.dtype`` - The element type of the tensor, e.g. \"float16\"\n* ``t.shape`` - The shape of the tensor, e.g. (128,64,64)
\n* ``t.ndim`` - The number of dimensions\n* ``t.size`` - The total number of elements\n* ``t.offset`` - The access pattern offset (discussed below)\n* ``t.buffer`` - The memory buffer this tensor lives in (discussed below)\n* ``t.get_pattern()`` - The access pattern (discussed below)\n\nThe most commonly used fields are ``dtype`` and ``shape``. We have already seen an example of using these fields to check that argument tensors are compatible in our simple example. Another common case is using a dimension of a shape to iterate over a tensor:\n\n.. code-block:: python\n\n   # assume t is a 3-dimensional tensor, we can iterate over the\n   # 2-D subtensors\n   for i in range(t.shape[0]):\n     my_function(t[i])\n\nNote that because the shape is part of the meta-data of the tensor, the expression ``t.shape[0]`` is a compile-time constant. Therefore, the bounds of the for-loop are known at compile time. The compiler will unroll this loop into a sequence of calls to ``my_function``, one for each subtensor of ``t``.\n\nIn addition to the basic meta-data fields, ``NkiTensor`` provides two methods for creating alternate views of the same underlying storage:\n\n``view(dtype)``\n  Reinterpret the tensor's storage bits as a different data type. The underlying memory is not modified; only the interpretation changes. This is useful for bitwise manipulation, such as reinterpreting ``int32`` values as ``float32``.\n\n  .. code-block:: python\n\n     int_tensor = nl.ndarray((128, 256), dtype=nl.int32, buffer=nl.sbuf)\n     float_tensor = int_tensor.view(nl.float32)\n\n``ap(pattern, offset=0, scalar_offset=None, vector_offset=None, indirect_dim=0, dtype=None)``\n  Create a tensor with an explicit hardware access pattern sharing the same storage. The ``pattern`` is a list of ``[step, num]`` tuples that define how elements are accessed. This is an advanced feature for controlling the exact memory access pattern used by the hardware. See the architecture guide for details on access patterns.\n\n  .. code-block:: python\n\n     t = nl.ndarray((128, 1024), dtype=nl.float16, buffer=nl.sbuf)\n     # Access every other element in the free dimension\n     u = t.ap(pattern=[(1, 128), (2, 512)])\n\nCreating Tensors\n-----------------\n\nThe easiest way to create tensors is using the ``nki.language.ndarray`` API. This function takes a shape, a dtype, and a memory type, and returns an ``NkiTensor`` representing a reference to a memory region in the given memory type large enough to hold the tensor.\n\n.. note::\n\n   ``ndarray`` does **not** initialize memory. The contents of a newly allocated tensor are undefined until explicitly written to (e.g., via ``nisa.dma_copy`` or ``nisa.memset``).\n\n.. code-block:: python\n\n   # A matrix of 128x128 16-bit float values in the SBUF memory\n   t = nl.ndarray((128,128), nl.float16, nl.sbuf)\n   assert t.shape == (128,128)\n   assert t.dtype == nl.float16\n   assert t.buffer == nl.sbuf\n\nYou can also pass an optional ``name`` argument to ``ndarray``. The name is a string label that is propagated through the compiler into the generated IR and debug information. This can be helpful when profiling or debugging compiled kernels, since the name will appear in compiler output and diagnostic messages.\n\n.. code-block:: python\n\n   # Named tensor for easier identification in compiler output\n   t = nl.ndarray((128,128), nl.float16, nl.sbuf, name=\"my_weights\")\n\nYou can also create a tensor from an existing tensor using the ``reshape`` method.
\nThe ``reshape`` method will create a new reference to the same memory with a different shape. The reshaped tensor must have the same total number of elements as the original.\n\n.. code-block:: python\n\n   # create an alternate view of t with shape 128x2x64\n   u = t.reshape((128,2,64))\n\n   # create an alternate view of t with shape 128x32x4\n   v = t.reshape((128,32,4))\n\nIn both cases, ``u`` and ``v`` refer to the same underlying memory as ``t``; no data is copied.\n\nTensor Indexing\n----------------\n\nNext, we will examine two meta-data fields related to tensor indexing: offset and pattern. But before we talk about these fields, let's look at the most common way of indexing tensors using integers and slices.\n\nSuppose you have a tensor t with shape 64x64x64 that is in the SBUF memory. The SBUF memory is a two-dimensional block of memory, so the underlying storage for this 3-D tensor is a 2-D region of the SBUF. Recall, in the SBUF, the first dimension is called the partition dimension and the second dimension is called the free dimension. By convention, the first dimension of a tensor always corresponds to the partition dimension, and the remaining dimensions are laid out in the free dimension. Therefore, in our example, we have 64 partitions, each with 64*64=4096 elements.\n\nWe can refer to specific elements of the tensor using an index expression.\n\n.. code-block:: python\n\n   # 11th element in partition 0\n   u = t[0,0,10]\n\n   # 65th element in partition 0\n   u = t[0,1,0]\n\n   # last element of the tensor\n   u = t[63,63,63]\n\nIt is more common to refer to whole sub-tensors rather than single elements, and for this we can use slices. A slice is an expression of the form ``start:stop:step``, which describes a range of elements starting with index start, up to (but not including) index stop, and incrementing by step. If any of start, stop, or step are not specified, defaults will be used.\n\n.. code-block:: python\n\n   # The first 64 elements of every partition\n   u = t[0:64, 0, 0:64]\n\n   # Same as above, but using defaults\n   u = t[:, 0, :]\n\n   # Only the even elements of the third dimension\n   u = t[:, :, ::2]\n\nFinally, you can also use the ellipsis (...) to indicate defaults for a range of dimensions.\n\n.. code-block:: python\n\n   # the whole tensor t\n   u = t[...]\n\n   # same as above\n   u = t[:,...]\n\n   # use defaults for second dimension\n   # equivalent to t[0,0:64,0:64]\n   u = t[0,...,:]\n\nNote that when you index into a tensor, the result is another tensor. So, in the examples above, the tensor u also has the normal tensor fields and capabilities. This means you can query the shape of the result, or further index the tensor u.\n\n.. code-block:: python\n\n   u = t[0,...]\n   assert u.shape == (64,64)\n\n   v = u[0:32, :]\n   assert v.shape == (32, 64)\n\nIn addition to querying the shape, you can also query the hardware access pattern that corresponds to the tensor value. For example, the code below will display the access pattern that would be used to query u, which is a sub-tensor of t.\n\n.. code-block:: python\n\n   u = t[0,...]\n\n   # check hardware access pattern\n   print(u.offset)\n   print(u.get_pattern())\n\nFor advanced use cases, the hardware access pattern can be specified directly.\n\n
.. code-block:: python\n\n   # Specify HW access pattern directly\n   u = t.ap(offset = 0, pattern = [...])\n\nFor more details on hardware access patterns, see the architecture guide.\n\nControl Flow\n-------------\n\nNKI supports basic control flow constructs, including if-statements, for-loops over ranges, lists or tuples, and while loops. All of these constructs work similarly to their equivalents in Python, but with one important difference: they are all evaluated at specialization time. This means the compiler unrolls every loop and resolves every branch before generating device code. For example, the code below uses a simple loop with a nested if statement to process the even and odd elements of a list differently.\n\n.. code-block:: python\n\n    inputs = [a, b, c]\n    outputs = [x, y, z]\n\n    assert len(inputs) == len(outputs)\n    for i in range(len(inputs)):\n        if i % 2 == 0:\n            nisa.nc_transpose(dst=outputs[i], data=inputs[i])\n        else:\n            nisa.reciprocal(dst=outputs[i], data=inputs[i])\n\nThe loop and if-statement above will ultimately be evaluated away by the NKI compiler. This means that the ISA instructions will be included in the final executable as a linear sequence:\n\n.. code-block:: python\n\n   nki.isa.nc_transpose(dst=x, data=a)\n   nki.isa.reciprocal(dst=y, data=b)\n   nki.isa.nc_transpose(dst=z, data=c)\n\nA for-loop can also iterate over a list or tuple, similar to Python. The two loops below both print the numbers 1-3 in sequence.\n\n.. code-block:: python\n\n   l = [1,2,3]\n   for x in l:\n     print(x)\n\n   t = (1,2,3)\n   for x in t:\n     print(x)\n\nFinally, NKI also supports while loops. Again, these loops are similar to Python, and will be unrolled by the compiler, just like the for-loops.\n\n.. code-block:: python\n\n   # print the numbers 0-9\n   x = 0\n   while x < 10:\n     print(x)\n     x += 1\n\nDynamic Control Flow\n----------------------\n\nIn the previous section we looked at control-flow constructs that are ultimately expanded at compile-time. NKI also supports dynamic control-flow, or control-flow that runs on the device. Dynamic control-flow is not expanded by the compiler, but lowered to equivalent Trainium control-flow instructions.\n\nThe most basic dynamic loop is a for-loop with static bounds. A dynamic loop with static bounds can be written using the standard for-loop with a ``dynamic_range`` hint.\n\n.. code-block:: python\n\n   # create a dynamic loop that runs \"on chip\"\n   for i in dynamic_range(10):\n     process_tensor(t[i])\n\nThe for loop above will lower to a loop on the Trainium device. The loop will execute its body (``process_tensor``) 10 times and then continue. Because this is a dynamic loop, the loop index ``i`` will be stored in a hardware register during evaluation. Therefore, ``i`` is a register value in NKI. Register values can be used to index tensors and can be passed to ``nki.isa`` APIs. We can also use registers to create dynamic loops with dynamic bounds.\n\n.. code-block:: python\n\n   count = nisa.register_alloc(0)\n   nisa.register_load(count, count_tensor)\n   for i in dynamic_range(count):\n     process_tensor(t[i])\n\nThe loop above uses a register value as the upper bound. This register is allocated with the ``register_alloc`` function, and then its value is populated from a tensor using ``register_load``. The for loop will then execute ``count`` times.\n\nThere are four register APIs that can be used to create registers and to load and store values to and from them.
\nEach register is 32-bit and supports multiple data types: ``u8``, ``u16``, ``u32``, ``i8``, ``i16``, ``i32``, and ``fp32`` (or a pair of registers for ``u64``/``i64``). Signed integers are supported, so negative values (e.g., ``count=-5``) are valid. The register APIs return and operate on ``VirtualRegister`` objects.\n\nA ``VirtualRegister`` represents a scalar value stored in a hardware register on the Trainium device. Unlike compile-time integer values, a ``VirtualRegister`` holds a value that exists at runtime. You can use a ``VirtualRegister`` as a loop bound for ``dynamic_range``, as a condition for a dynamic ``while`` loop, or as a ``scalar_offset`` in a tensor access pattern for dynamic indexing.\n\n.. note::\n\n   The induction variable of a ``dynamic_range`` loop is also a ``VirtualRegister``, but it is frozen: you cannot write to it with ``register_move`` or ``register_load``. This prevents ambiguity about whether modifying the induction variable would affect loop termination.\n\n.. code-block:: python\n\n   # allocate a new register with initial value (32-bit integer)\n   def register_alloc(x: int) -> VirtualRegister: ...\n\n   # store a constant integer into a register\n   def register_move(dst: VirtualRegister, imm: int): ...\n\n   # load a value from an SBUF tensor into a register\n   # the source tensor must be a 1x1 SBUF tile\n   def register_load(dst: VirtualRegister, src: tensor): ...\n\n   # store the value of a register into an SBUF tensor\n   def register_store(dst: tensor, src: VirtualRegister): ...\n\nUsing the APIs above, we can also create dynamic while loops. A dynamic while loop is specified using the standard while-loop with a condition that is a single register value. The NKI compiler will preserve while loops with register conditions, and not unroll them.\n\n.. code-block:: python\n\n   # suppose cond is an SBUF tensor, perhaps declared as\n   cond = nl.ndarray((1, 1), buffer=nl.sbuf, dtype=nl.int32)\n\n   # allocate a register with initial value 1\n   reg = nisa.register_alloc(1)\n\n   # This while loop is dynamic because the condition is a register\n   while reg:\n      # perform a calculation that updates cond\n      nisa.dma_copy(dst=cond, ...)\n\n      # update register used in while-loop condition\n      nisa.register_load(reg, cond)\n\nThe code above uses a 1x1 SBUF tensor called ``cond`` to store the condition. We update this tensor in the body of the loop and then use ``register_load`` to update the register. When the register ``reg`` holds the value 0, the loop will terminate.
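\n\nAs a small illustration of ``register_store`` (a sketch using only the register APIs listed above; ``last_iter`` is a hypothetical tile introduced for this example), the loop below writes the current induction variable into a 1x1 SBUF tile on every iteration, so after the loop that tile holds the index of the last iteration that ran:\n\n.. code-block:: python\n\n   # Sketch: record the most recent dynamic-loop iteration index in SBUF.\n   # 'last_iter' is a hypothetical 1x1 SBUF tile used only for illustration.\n   last_iter = nl.ndarray((1, 1), dtype=nl.int32, buffer=nl.sbuf)\n\n   count = nisa.register_alloc(0)\n   nisa.register_load(count, count_tensor)\n\n   for i in dynamic_range(count):\n     process_tensor(t[i])\n     # i is a (frozen) VirtualRegister; it can be read and its value stored\n     nisa.register_store(last_iter, i)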
\n\nClass Support\n--------------\n\nNKI has basic support for user-defined classes. In NKI all classes are similar to Python data classes. When you declare a class for use in a NKI kernel, the class must inherit from ``NKIObject`` and no other classes. This restriction is to ensure the NKI compiler only brings in class definitions that are intended for NKI. A simple NKI class can be declared similarly to a Python data class:\n\n.. code-block:: python\n\n   @dataclass \n   class C(NKIObject):\n     x : int\n     y : bool = False\n     \n     def toggle(self):\n       self.y = not self.y\n       \n   c = C(1)\n   c.toggle()\n\n   # prints 1 True\n   print(c.x, c.y)\n\nThe ``@dataclass`` decorator is optional; classes with and without the ``@dataclass`` decorator will be compiled in the same way by the NKI compiler. The compiler will create the initializer functions ``__init__`` and ``__post_init__``, if they are not provided by the user. For the class above, the default initializers are:\n\n.. code-block:: python\n\n   # default if not provided by the user\n   def __init__(self, x = None, y = False):\n     self.x = x\n     self.y = y\n     self.__post_init__()\n\n   # default if not provided by the user\n   def __post_init__(self):\n     pass\n\nClasses can be declared in Python and passed as arguments to NKI functions. When a class is used as an argument to a NKI kernel, the NKI kernel will import the definition of the Python class, and convert the Python class instance to a NKI instance using the object's dictionary. Currently, NKI does not look at slots or other object features, only the object dictionary. For example, consider the code shown below.\n\n.. code-block:: python\n\n   class A(NKIObject):\n     x : int = 1\n     def __init__(self, x):\n       self.x = x\n\n   @nki.jit\n   def kernel(a : A): ...\n\n   kernel(A(1))\n\nThe class ``A`` is instantiated in Python as an argument to the kernel function. The NKI compiler will take this object and translate it to an instance of ``A`` on the NKI side. Roughly this translation is done by translating the object dictionary, in pseudo-code:\n\n.. code-block:: python\n\n   # pseudo-code \"copy construct\" A on NKI side\n   def kernel(python_a : A):\n     # make a NKI instance of class A\n     nki_a = new A\n     # populate NKI instance from Python instance\n     nki_a.__dict__ = python_a.__dict__\n\nEnumerations\n-------------\n\nIn addition to the basic data classes described above, NKI also supports basic enumerations. For example, the following can be used in NKI kernel functions.\n\n.. code-block:: python\n\n   class E(Enum):\n     x = 1\n     y = 2\n     z = 3\n\n   def f(e : E):\n     if e == E.x: ...\n     elif e == E.y: ...\n     elif e == E.z: ...\n     \n   f(E.x)\n\nSimilar to Python, the NKI compiler will translate the enumeration class E to the following:\n\n.. code-block:: python\n\n   class E(NKIObject):\n     x = E(\"x\", 1)\n     y = E(\"y\", 2)\n     z = E(\"z\", 3)\n     \n     def __init__(self, name, value):\n       self.name = name\n       self.value = value\n\nEquality in NKI is structural, so no additional code is needed to replicate the behavior of == and != for objects of type E. No other binary operators on enum values are supported.\n\nComposable Kernels\n-------------------\n\nBecause all functions are inlined at specialization time, NKI supports a powerful composition pattern: you can pass functions as arguments to other functions, and the compiler will inline them at each call site. This allows you to write generic kernel templates that can be specialized with different operations.\n\nFor example, consider a generic tiled processing kernel that applies a user-supplied function to each tile:\n\n.. code-block:: python\n\n   def tiled_process(input_tensor, output_tensor, tile_fn):\n       \"\"\"Generic kernel that applies tile_fn to each tile of the input.\"\"\"\n       for i in range(input_tensor.shape[0] // nl.tile_size.pmax):\n           tile = nl.ndarray((128, 512), dtype=input_tensor.dtype, buffer=nl.sbuf)\n           nisa.dma_copy(dst=tile, src=input_tensor[i * 128:(i + 1) * 128, :])\n\n           result = nl.ndarray((128, 512), dtype=input_tensor.dtype, buffer=nl.sbuf)\n           tile_fn(dst=result, src=tile)\n\n           nisa.dma_copy(dst=output_tensor[i * 128:(i + 1) * 128, :], src=result)\n\n   def my_activation(dst, src):\n       nisa.activation(dst=dst, data=src, op=nl.relu)\n\n   def my_scale(dst, src):\n       nisa.tensor_scalar(dst=dst, data=src, op0=nl.multiply, operand0=0.5)\n\n   @nki.jit\n   def relu_kernel(a_input, a_output):\n       tiled_process(a_input, a_output, my_activation)\n\n   @nki.jit\n   def scale_kernel(a_input, a_output):\n       tiled_process(a_input, a_output, my_scale)\n\nDuring specialization, the compiler inlines ``tiled_process`` and then inlines the specific ``tile_fn`` (either ``my_activation`` or ``my_scale``) at each call site. The result is a fully specialized kernel with no function call overhead.\n\nThis pattern is especially useful for building mega-kernels that compose multiple operations. You can pass function references as hyperparameters when using the kernel builder API:\n\n.. code-block:: python\n\n   from nki.compiler.kernel_builder import compile_kernel\n\n   compile_kernel(\n       tiled_process,\n       inputs={\"input_tensor\": input_array},\n       outputs={\"output_tensor\": output_array},\n       compile_opts=opts,\n       tile_fn=my_activation,  # passed as a hyperparameter\n   )\n\nFunctions can also be stored in data structures, returned from other functions, and selected dynamically at specialization time based on compile-time conditions:\n\n.. code-block:: python\n\n   def select_activation(name):\n       if name == \"relu\":\n           return my_relu\n       elif name == \"gelu\":\n           return my_gelu\n\n   @nki.jit\n   def kernel(a_input, a_output):\n       act_fn = select_activation(\"relu\")\n       # act_fn is resolved at specialization time; the selected\n       # function is inlined directly\n       act_fn(dst=a_output, src=a_input)\n\nBecause all of this resolution happens at specialization time, there is no runtime cost. The compiled kernel contains only the specific ISA operations for the chosen function.\n"
  },
  {
    "path": "nki/get-started/quickstart-implement-run-kernel.rst",
    "content": ".. meta::\n    :description: Learn how to implement and run your first NKI kernel on AWS Neuron accelerators\n    :date-modified: 03/30/2026\n\n.. _quickstart-run-nki-kernel:\n\nQuickstart: Implement and run your first kernel\n================================================\n\nThe Neuron Kernel Interface (NKI) lets you write low-level kernels that use the ISA of Trainium2 and Trainium3 ML accelerators. Your kernels can be used in PyTorch and JAX models to speed up critical parts of your model. This topic guides you through your first time writing a NKI kernel. It will help you understand the process when using AWS Neuron and NKI. \n\nWhen you have completed it, you will have a simple kernel that adds two input tensors and returns the result and a test program in PyTorch or JAX.\n\n* This quickstart is for: Customers new to NKI\n* Time to complete: ~10 minutes\n\nPrerequisites\n--------------\n\nBefore you begin, you will need a Trn2 or Trn3 EC2 instance.\n\n* Your EC2 instance should have the Neuron SDK and NKI library installed on them. If you used the Deep Learning AMI (DLAMI), these will be available by activating a PyTorch or JAX environment with Python's venv.\n* You will need a text editor or IDE for editing code.\n* A basic familiarity with Python and either PyTorch or JAX will be helpful, though not strictly required.\n\n\nBefore you start\n-----------------\n\nMake sure you are logged in to your EC2 instance and have activated either a PyTorch or JAX environment. See :doc:`Set up your environment for NKI development <setup-env>` for details.\n\nStep 1: Import the nki library\n-------------------------------\n\nIn this step you create the ``add_kernel.py`` file and add imports for the ``nki``, ``nki.language``, and ``nki.isa`` libraries.\n\n.. code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n\nOpen your favorite editor or IDE and create the ``add_kernel.py`` code file, and then add the imports for the NKI libraries.\n\nStep 2: Create the nki_tensor_add_kernel\n-----------------------------------------\n\nIn this step, you define the ``nki_tensor_add_kernel`` function. \n\n.. code-block:: python\n\n    @nki.jit\n    def nki_tensor_add_kernel(a_input, b_input):\n        \"\"\"\n        NKI kernel to compute element-wise addition of two input tensors.\n        \"\"\"\n\nAdd the ``nki_tensor_add_kernel`` function definition above. Make sure you annotate it with the ``@nki.jit`` decorator as in the example above.\n\nStep 3: Check input size and shapes\n------------------------------------\n\nIn this step, you add a couple of assertions to check that ``a_input`` and ``b_input`` are the same size/datatype and that these will fit within the on-chip tile size.\n\nAdd the following assertions to your ``nki_tensor_add_kernel`` function in ``add_kernel.py``.\n\n.. code-block:: python\n\n        # check both input tensor shapes/dtypes are the same for element-wise operation.\n        assert a_input.shape == b_input.shape\n        assert a_input.dtype == b_input.dtype\n\n        # Check the first dimension's size to ensure it does not exceed on-chip\n        # memory tile size, since this simple kernel does not tile inputs.\n        assert a_input.shape[0] <= nl.tile_size.pmax\n\nThe first assertion checks that ``a_input`` and ``b_input`` have the same shape. The second assertion checks that the inputs will fit in within the tile size of the on-chip memory. If an input is larger than the on-chip tile size, you must tile the input. 
\nTo keep this example simple we will avoid discussing tiling further in this quick start.\n\nStep 4: Read input into the on-chip memory\n-------------------------------------------\n\nIn this step, you will add code to read the inputs from HBM into on-chip memory.\n\nThe ``nki_tensor_add_kernel`` function will receive inputs from the HBM memory and must move them into on-chip memory to operate over their values. You first create space in the on-chip memory and then copy the value into on-chip memory for each input. See :doc:`Memory Hierarchy </nki/get-started/about/memory-hierarchy-overview>` for more details on the memory hierarchy.\n\n.. code-block:: python\n\n    # Allocate space for the input tensors in SBUF and copy the inputs from HBM\n    # to SBUF with DMA copy.\n    a_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n    nisa.dma_copy(dst=a_tile, src=a_input)\n\n    b_tile = nl.ndarray(shape=b_input.shape, dtype=b_input.dtype, buffer=nl.sbuf)\n    nisa.dma_copy(dst=b_tile, src=b_input)\n\nThe ``nl.ndarray`` function allows you to allocate tensors in SBUF. Here you allocate ``a_tile`` and ``b_tile`` and use the ``nisa.dma_copy`` :doc:`instruction </nki/api/generated/nki.isa.dma_copy>` to copy tensors between HBM and SBUF memories. You first supply the destination for the copy, ``a_tile`` and ``b_tile``. Then you provide the source for the copy, ``a_input`` and ``b_input``, as seen in this example.\n\nStep 5: Add the two tensors\n----------------------------\n\nIn this step, you add code to allocate a destination tensor in SBUF and put the result of adding these two tensors in the new tensor.\n\n.. code-block:: python\n\n    # Allocate space for the result and use tensor_tensor to perform\n    # element-wise addition. Note: the first argument of 'tensor_tensor'\n    # is the destination tensor.\n    c_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n    nisa.tensor_tensor(dst=c_tile, data1=a_tile, data2=b_tile, op=nl.add)\n\nAs in step 4, you allocate space for ``c_tile`` in SBUF, using ``nl.ndarray``. Since the output will have the same shape as the inputs, you can use the ``a_input`` data type and shape for the allocation. You use the ``nisa.tensor_tensor`` :doc:`instruction </nki/api/generated/nki.isa.tensor_tensor>` to perform an element-wise calculation on two tensors. The first argument of ``tensor_tensor`` is the destination tensor, ``c_tile``, and the sources, ``a_tile`` and ``b_tile``, follow it. You must also provide an ``op`` argument, which tells ``tensor_tensor`` which operation to perform on the inputs. In this case, you use ``op=nl.add`` to specify addition.\n\nStep 6: Copy the result to HBM\n-------------------------------\n\nIn this step, you will allocate space for the output tensor in HBM and copy the result from SBUF to the new tensor. This is the inverse of what you did with the input, where you copied the inputs from HBM into SBUF.\n\n.. code-block:: python\n\n    # Create a tensor in HBM and copy the result into HBM.\n    c_output = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.shared_hbm)\n    nisa.dma_copy(dst=c_output, src=c_tile)\n\nYou use ``nl.ndarray`` with ``buffer=nl.shared_hbm`` to create tensors in HBM, similar to how you allocated space in SBUF with ``buffer=nl.sbuf``. You then copy the result in ``c_tile`` into ``c_output``. Remember that ``c_output`` is the destination and ``c_tile`` is the source for the ``dma_copy`` instruction.\n
The copy is needed because outputs, like inputs, need to be in HBM.\n\nStep 7: Return the output\n--------------------------\n\nIn this step, you will return the result.\n\n.. code-block:: python\n\n    # Return kernel output as function output.\n    return c_output\n\nYou should now have an ``add_kernel.py`` file that looks as follows.\n\n.. code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n\n    @nki.jit\n    def nki_tensor_add_kernel(a_input, b_input):\n        \"\"\"\n        NKI kernel to compute element-wise addition of two input tensors.\n        \"\"\"\n\n        # check both input tensor shapes/dtypes are the same for element-wise operation.\n        assert a_input.shape == b_input.shape\n        assert a_input.dtype == b_input.dtype\n\n        # Check the first dimension's size to ensure it does not exceed on-chip\n        # memory tile size, since this simple kernel does not tile inputs.\n        assert a_input.shape[0] <= nl.tile_size.pmax\n\n        # Allocate space for the input tensors in SBUF and copy the inputs from HBM\n        # to SBUF with DMA copy.\n        a_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n        nisa.dma_copy(dst=a_tile, src=a_input)\n\n        b_tile = nl.ndarray(shape=b_input.shape, dtype=b_input.dtype, buffer=nl.sbuf)\n        nisa.dma_copy(dst=b_tile, src=b_input)\n\n        # Allocate space for the result and use tensor_tensor to perform\n        # element-wise addition. Note: the first argument of 'tensor_tensor'\n        # is the destination tensor.\n        c_tile = nl.ndarray(shape=a_input.shape, dtype=a_input.dtype, buffer=nl.sbuf)\n        nisa.tensor_tensor(dst=c_tile, data1=a_tile, data2=b_tile, op=nl.add)\n\n        # Create a tensor in HBM and copy the result into HBM.\n        c_output = nl.ndarray(dtype=a_input.dtype, shape=a_input.shape, buffer=nl.shared_hbm)\n        nisa.dma_copy(dst=c_output, src=c_tile)\n\n        # Return kernel output as function output.\n        return c_output\n\n\nStep 8: Create a PyTorch or JAX test program\n---------------------------------------------\n\nIn this step, you create a test program as a Python script using either PyTorch or JAX.\n\n.. tabs::\n\n   .. tab:: PyTorch\n\n      You can create a file called ``test_program.py`` with the following content.\n\n      .. code-block:: python\n\n          import torch\n          import torch_neuronx\n          from add_kernel import nki_tensor_add_kernel\n\n          # Generate input tensors.\n          a = torch.ones((4, 3), dtype=torch.float16)\n          b = torch.ones((4, 3), dtype=torch.float16)\n\n          # Trace the kernel for Neuron.\n          trace = torch_neuronx.trace(nki_tensor_add_kernel, (a, b))\n\n          # Run the traced kernel.\n          c = trace(a, b)\n\n          # Print the result.\n          print(c)\n\n      You create input tensors using PyTorch. You use ``torch_neuronx.trace`` to compile the kernel for the Neuron device, then call the traced function to run it. The ``print`` function prints the result to the console.\n\n   .. tab:: JAX\n\n      You can create a file called ``test_program.py`` with the following content.\n\n      .. 
code-block:: python\n\n          import jax.numpy as jnp\n          from add_kernel import nki_tensor_add_kernel\n\n          # Generate the input tensors.\n          a = jnp.ones((4, 3), dtype=jnp.float16)\n          b = jnp.ones((4, 3), dtype=jnp.float16)\n\n          # Invoke the kernel to add the results.\n          c = nki_tensor_add_kernel(a, b)\n\n          # Print the result tensor.\n          print(c)\n\n      You create input tensors using the ``jax.numpy`` library. You call the ``nki_tensor_add_kernel`` function to invoke the kernel. The ``print`` function prints the result to the console.\n\nAll complete! Now, let's confirm everything works.\n\nConfirmation\n-------------\n\nYou can confirm the success of the kernel by running the driver you created in Step 8.\n\n.. code-block:: bash\n\n    NEURON_PLATFORM_TARGET_OVERRIDE=trn3 python test_program.py\n\nThe ``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable sets the target architecture for compilation. In this example it is set to ``trn3``, which creates a binary suitable for running on Trn3 machines. For Trn2, specify ``trn2``.\n\nWhether you used PyTorch or JAX for the driver, you should see the following result.\n\n.. code-block:: text\n\n    [[2. 2. 2.]\n     [2. 2. 2.]\n     [2. 2. 2.]\n     [2. 2. 2.]]\n\nYou will also see some additional output depending on whether you used PyTorch or JAX.\n\n.. tabs::\n\n   .. tab:: PyTorch\n\n      .. code-block:: text\n\n            2026-Apr-13 01:46:31.0675 837617:837663 [2] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):219 CCOM WARN NET/OFI Failed to initialize rdma protocol\n            2026-Apr-13 01:46:31.0678 837617:837663 [2] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):354 CCOM WARN NET/OFI aws-ofi-nccl initialization failed\n            2026-Apr-13 01:46:31.0681 837617:837663 [2] ncclResult_t nccl_net_ofi_init_no_atexit_fini_v6(ncclDebugLogger_t):183 CCOM WARN NET/OFI Initializing plugin failed\n            2026-Apr-13 01:46:31.0683 837617:837663 [2] net_plugin.cc:97 CCOM WARN OFI plugin initNet() failed is EFA enabled?\n            .\n            Compiler status PASS\n            2026-04-13 01:46:33.000003: 837617 [INFO]: Compilation Successfully Completed for model.MODULE_9886333626096130500+70e3f644.hlo_module.pb\n            tensor([[2., 2., 2.],\n             [2., 2., 2.],\n             [2., 2., 2.],\n             [2., 2., 2.]], device='xla:0', dtype=torch.float16)\n\n      .. note::\n\n         The CCOM warnings about OFI/EFA initialization are harmless on single-node instances without EFA networking and can be safely ignored.\n\n   .. tab:: JAX\n\n      .. code-block:: text\n\n            WARNING:2026-04-13 01:56:40,630:jax._src.xla_bridge:901: Platform 'neuron' is experimental and not all JAX functionality may be correctly supported!\n            2026-Apr-13 01:56:47.0115 838811:838863 [3] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):219 CCOM WARN NET/OFI Failed to initialize rdma protocol\n            2026-Apr-13 01:56:47.0117 838811:838863 [3] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):354 CCOM WARN NET/OFI aws-ofi-nccl initialization failed\n            2026-Apr-13 01:56:47.0120 838811:838863 [3] ncclResult_t nccl_net_ofi_init_no_atexit_fini_v6(ncclDebugLogger_t):183 CCOM WARN NET/OFI Initializing plugin failed\n            2026-Apr-13 01:56:47.0122 838811:838863 [3] net_plugin.cc:97 CCOM WARN OFI plugin initNet() failed is EFA enabled?
2.]\n             [2. 2. 2.]]\n\n      .. note::\n\n         The \"Platform 'neuron' is experimental\" warning and CCOM warnings are harmless and can be safely ignored.\n\nCongratulations! You now have your first NKI kernel written and running. If you encountered any issues, see the Common issues section below.\n\nCommon issues\n--------------\n\nUh oh! Did you encounter an error or other issue while working through this quickstart? Here are some commonly encountered issues and how to address them.\n\n* ``nki``, ``jax``, ``torch``, or another library not found: You may need to activate the PyTorch or JAX environment.\n* No neuron device available: You may not have the ``neuron`` kernel module loaded. Make sure the ``neuron`` module is loaded with ``sudo modprobe neuron``.\n\nClean up\n---------\n\nWhen you are finished with this example, you can deactivate your ``venv`` with ``deactivate`` and remove both ``add_kernel.py`` and ``test_program.py``.\n\nNext steps\n-----------\n\nNow that you've completed this quickstart, dive into other topics that build on it.\n\n* :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`\n* :doc:`NKI Tutorials </nki/guides/tutorials/index>`\n\nFurther reading\n----------------\n\n* :doc:`NKI API Reference Manual </nki/api/index>`\n* :doc:`NKI Developer Guides </nki/guides/index>`\n"
  },
  {
    "path": "nki/get-started/setup-env.rst",
    "content": ".. meta::\n    :description: How to set up your environment for NKI development with AWS Neuron SDK\n    :date-modified: 04/12/2026\n\n\n.. _how-to-set-up-nki-env:\n\nHow to set up your environment for NKI development\n===================================================\n\nThe Neuron Kernel Interface (NKI) lets you write kernels that directly use hardware resources in the Trn2 / Trn3 family of Neuron ML accelerators. NKI kernels use low-level operators that match instructions on Neuron devices. You can use kernels with PyTorch or JAX to speed up critical sections of your model. This topic shows you how to set up your environment for NKI development using the AWS Neuron SDK. After you set up your environment, you can access the NKI and Neuron Graph compilers.\n\nTask overview\n--------------\nThis tutorial walks you through launching a Trn2 / Trn3 instance with an Amazon Machine Image (AMI).\n\nPrerequisites\n--------------\n\n* You need an AWS login to launch a Trn2 / Trn3 EC2 instance.\n\nInstructions\n-------------\n\n\n.. tabs::\n\n   .. tab:: Amazon Linux 2023\n\n      You can set up an environment to use NKI in several ways. The easiest method uses the Neuron Multi-framework Deep Learning AMI (DLAMI). The DLAMI provides Python virtual environments (using venv) for frameworks like PyTorch and JAX. AWS updates the DLAMI with each new Neuron SDK release. If you prefer to manage the environment directly, you can start with a standard Amazon Linux 2023 (AL2023) AMI and install the Neuron SDK and NKI library directly. If you already have a configured environment, follow the upgrade tab instructions to upgrade to the latest SDK.\n\n      .. tabs::\n\n         .. tab:: DLAMI\n\n            1. Launch the instance using the Neuron Deep Learning AMI.\n   \n               .. image:: /nki/img/get-started/nki-setup-1.png\n\n               Select the desired region from the EC2 Console and choose \"Launch Instance\". In the \"Quick Start\" tab, select \"Amazon Linux\", then in the AMI dropdown search for \"neuron\". The \"Deep Learning AMI Neuron (Amazon Linux 2023)\" should be the only option. Select an Trn2 / Trn3 instance type. For more details see the Trn2 or Trn3 EC2 pages.\n\n               Once the instance is launched, an environment can be activated with the NKI library and Neuron SDK already installed.\n\n               * Note: If you are looking to use the Neuron DLAMI in your cloud automation flows, Neuron also supports SSM parameters to easily retrieve the latest DLAMI id.\n\n         .. tab:: Standard AMI\n\n            1. Launch the instance using the Amazon Linux 2023\n               \n               Select the desired region from the EC2 Console and choose \"Launch Instance\". In the \"Quick Start\" tab, select \"Amazon Linux\", then in the AL2023 AMI. Select an Trn2 / Trn3 instance type. For more details see the Trn2 or Trn3 EC2 pages. Note: You will need to allocate at least 85 GB of storage.\n            \n            2. Install Drivers and Tools\n\n               .. 
code-block:: bash\n\n                  # Configure Linux for Neuron repository updates\n                  sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n                  [neuron]\n                  name=Neuron YUM Repository\n                  baseurl=https://yum.repos.neuron.amazonaws.com\n                  enabled=1\n                  metadata_expire=0\n                  EOF\n                  sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n\n                  # Update OS packages \n                  sudo dnf update -y\n\n                  # Install OS headers \n                  sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"\n\n                  # Install git \n                  sudo dnf install git -y\n\n                  # Install Neuron Driver\n                  sudo dnf install aws-neuronx-dkms-2.* -y\n\n                  # Install Neuron Runtime \n                  sudo dnf install aws-neuronx-collectives-2.* -y\n                  sudo dnf install aws-neuronx-runtime-lib-2.* -y\n\n                  # Install Neuron Tools \n                  sudo dnf install aws-neuronx-tools-2.* -y\n\n                  # Add PATH\n                  export PATH=/opt/aws/neuron/bin:$PATH\n\n            3. Set up either a PyTorch or JAX environment to use with NKI\n\n               .. tabs::\n\n                  .. tab:: PyTorch\n\n                     .. code-block:: bash\n\n                        # Install External Dependency\n                        sudo dnf install -y libxcrypt-compat\n\n                        # Install Python \n                        sudo dnf install -y python3.11\n\n                        # Install GCC\n                        sudo dnf install -y gcc-c++ \n\n                        # Create Python venv\n                        python3.11 -m venv aws_neuron_venv_pytorch \n\n                        # Activate Python venv \n                        source aws_neuron_venv_pytorch/bin/activate \n                        pip install -U pip \n\n                        # Install Jupyter notebook kernel\n                        pip install ipykernel \n                        python3.11 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name \"Python (torch-neuronx)\"\n                        pip install jupyter notebook\n                        pip install environment_kernels\n\n                        # Set pip repository pointing to the Neuron repository \n                        pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\n\n                        # Install wget, awscli \n                        pip install wget \n                        pip install awscli \n\n                        # Install Neuron Compiler and Framework\n                        pip install neuronx-cc==2.* torch-neuronx==2.9.* torchvision nki\n\n                  .. tab:: JAX\n\n                     .. 
code-block:: bash\n\n                        # Install External Dependency\n                        sudo dnf install -y libxcrypt-compat\n\n                        # Install Python \n                        sudo dnf install -y python3.11\n\n                        # Install GCC \n                        sudo dnf install -y gcc-c++ \n\n                        # Create Python venv\n                        python3.11 -m venv aws_neuron_venv_jax\n\n                        # Activate Python venv \n                        source aws_neuron_venv_jax/bin/activate \n                        pip install -U pip\n\n                     Neuron provides two different ways to install the JAX package. The first is a common package with jax-neuronx packaged together and tested with all the necessary dependencies including jax, jaxlib, libneuronxla, neuronx-cc, and nki. This package can be installed as follows.\n\n                     .. code-block:: bash\n\n                        pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n                     Alternatively, jax, jaxlib, libneuronxla, neuronx-cc, and nki can be installed separately, with jax-neuronx being an optional addition. This version can be installed as follows.\n\n                     .. code-block:: bash\n\n                        pip install jax==0.7.0 jaxlib==0.7.0\n                        pip install jax-neuronx libneuronxla neuronx-cc==2.* nki --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n         .. tab:: Upgrade\n\n            Upgrading an existing AL2023 install of the Neuron SDK with NKI can be done as follows for PyTorch or JAX.\n\n            .. tabs::\n\n               .. tab:: PyTorch\n\n                  .. code-block:: bash\n\n                     # Install External Dependency\n                     sudo dnf install -y libxcrypt-compat\n\n                     # Activate Python venv \n                     source aws_neuron_venv_pytorch/bin/activate \n\n                     # Install Jupyter notebook kernel\n                     pip install ipykernel \n                     python3.11 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name \"Python (torch-neuronx)\"\n                     pip install jupyter notebook\n                     pip install environment_kernels\n\n                     # Set pip repository pointing to the Neuron repository \n                     pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\n\n                     # Install wget, awscli \n                     pip install wget \n                     pip install awscli \n\n                     # Update Neuron Compiler and Framework\n                     pip install --upgrade neuronx-cc==2.* torch-neuronx==2.9.* torchvision nki\n\n               .. tab:: JAX\n\n                  .. code-block:: bash\n\n                     # Install External Dependency\n                     sudo dnf install -y libxcrypt-compat\n\n                     # Activate Python venv \n                     source aws_neuron_venv_jax/bin/activate\n\n                     # Install wget, awscli \n                     pip install wget \n                     pip install awscli \n\n                  JAX can be upgraded either with the combined jax-neuronx package, which is tested to work together, as follows.\n\n                  .. 
code-block:: bash\n\n                     pip install --upgrade jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n                  Alternatively, jax, jaxlib, libneuronxla, neuronx-cc, and nki can be upgraded separately, with jax-neuronx being an optional addition. This version can be upgraded as follows.\n\n                  .. code-block:: bash\n\n                     pip install jax==0.7.0 jaxlib==0.7.0\n                     pip install --upgrade jax-neuronx libneuronxla neuronx-cc==2.* nki --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n   .. tab:: Ubuntu 24\n\n      The easiest way to set up an environment to use NKI is by using the Neuron Multi-framework Deep Learning AMI (DLAMI). The DLAMI provides Python virtual environments (using venv) for a variety of frameworks including PyTorch and JAX and is updated with each new release of the Neuron SDK. For customers who prefer to manage the environment directly, it is also possible to start with a standard Ubuntu 24 AMI and install the Neuron SDK and NKI library directly. Customers who already have an environment configured can follow the instructions in the upgrade tab to upgrade to the latest SDK.\n\n      .. tabs::\n\n         .. tab:: DLAMI\n\n            1. Launch the instance using the Neuron Deep Learning AMI\n   \n               .. image:: /nki/img/get-started/nki-setup-2.png\n\n               Select the desired region from the EC2 Console and choose \"Launch Instance\". In the \"Quick Start\" tab, select \"Ubuntu\", then in the AMI dropdown search for \"neuron\". The \"Deep Learning AMI Neuron (Ubuntu 24.04)\" should be the only option. Select a Trn2 / Trn3 instance type. For more details see the Trn2 or Trn3 EC2 pages.\n\n               Once the instance is launched, an environment can be activated with the NKI library and Neuron SDK already installed.\n\n               * Note: If you are looking to use the Neuron DLAMI in your cloud automation flows, Neuron also supports SSM parameters to easily retrieve the latest DLAMI ID.\n\n         .. tab:: Standard AMI\n\n            1. Launch the instance using the Ubuntu 24 AMI\n               \n               Select the desired region from the EC2 Console and choose \"Launch Instance\". In the \"Quick Start\" tab, select \"Ubuntu\", then select the Ubuntu Server 24 AMI. Select a Trn2 / Trn3 instance type. For more details see the Trn2 or Trn3 EC2 pages. Note: You will need to allocate at least 50 GB of storage.\n            \n            2. Install Drivers and Tools\n\n               .. code-block:: bash\n\n                  # Configure Linux for Neuron repository updates\n                  . 
/etc/os-release\n                  sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n                  deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n                  EOF\n                  wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n\n                  # Update OS packages \n                  sudo apt-get update -y\n\n                  # Install OS headers \n                  sudo apt-get install linux-headers-$(uname -r) -y\n\n                  # Install git \n                  sudo apt-get install git -y\n\n                  # Install Neuron Driver\n                  sudo apt-get install aws-neuronx-dkms=2.* -y\n\n                  # Install Neuron Runtime \n                  sudo apt-get install aws-neuronx-collectives=2.* -y\n                  sudo apt-get install aws-neuronx-runtime-lib=2.* -y\n\n                  # Install Neuron Tools \n                  sudo apt-get install aws-neuronx-tools=2.* -y\n\n                  # Add PATH\n                  export PATH=/opt/aws/neuron/bin:$PATH\n\n            3. Set up either a PyTorch or JAX environment to use with NKI\n\n               .. tabs::\n\n                  .. tab:: PyTorch\n\n                     .. code-block:: bash\n\n                        # Install Python venv \n                        sudo apt-get install -y python3.12-venv g++ \n\n                        # Create Python venv\n                        python3.12 -m venv aws_neuron_venv_pytorch \n\n                        # Activate Python venv \n                        source aws_neuron_venv_pytorch/bin/activate \n                        python -m pip install -U pip \n\n                        # Install Jupyter notebook kernel\n                        pip install ipykernel \n                        python3.12 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name \"Python (torch-neuronx)\"\n                        pip install jupyter notebook\n                        pip install environment_kernels\n\n                        # Set pip repository pointing to the Neuron repository \n                        python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\n\n                        # Install wget, awscli \n                        python -m pip install wget \n                        python -m pip install awscli \n\n                        # Install Neuron Compiler and Framework\n                        python -m pip install neuronx-cc==2.* torch-neuronx==2.9.* torchvision nki\n\n                  .. tab:: JAX\n\n                     .. code-block:: bash\n\n                        # Install Python venv \n                        sudo apt-get install -y python3.12-venv g++ \n\n                        # Create Python venv\n                        python3.12 -m venv aws_neuron_venv_jax\n\n                        # Activate Python venv \n                        source aws_neuron_venv_jax/bin/activate \n                        python -m pip install -U pip \n\n                     Neuron provides two different ways to install the JAX package. The first is a common package with jax-neuronx packaged together and tested with all the necessary dependencies including jax, jaxlib, libneuronxla, neuronx-cc, and nki. This package can be installed as follows.\n\n                     .. 
code-block:: bash\n\n                        pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n                     Alternatively, jax, jaxlib, libneuronxla, neuronx-cc, and nki can be installed separately, with jax-neuronx being an optional addition. This version can be installed as follows.\n\n                     .. code-block:: bash\n\n                        pip install jax==0.7.0 jaxlib==0.7.0\n                        pip install jax-neuronx libneuronxla neuronx-cc==2.* nki --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n         .. tab:: Upgrade\n\n            Upgrading an existing Ubuntu 24 install of the Neuron SDK with NKI can be done as follows for PyTorch or JAX.\n\n            .. tabs::\n\n               .. tab:: PyTorch\n\n                  .. code-block:: bash\n\n                     # Install Python venv \n                     sudo apt-get install -y python3.12-venv g++ \n\n                     # Create Python venv\n                     python3.12 -m venv aws_neuron_venv_pytorch \n\n                     # Activate Python venv \n                     source aws_neuron_venv_pytorch/bin/activate \n                     pip install -U pip \n\n                     # Install Jupyter notebook kernel\n                     pip install ipykernel \n                     python3.12 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name \"Python (torch-neuronx)\"\n                     pip install jupyter notebook\n                     pip install environment_kernels\n\n                     # Set pip repository pointing to the Neuron repository \n                     pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\n\n                     # Install wget, awscli \n                     pip install wget \n                     pip install awscli \n\n                     # Update Neuron Compiler and Framework\n                     pip install --upgrade neuronx-cc==2.* torch-neuronx==2.9.* torchvision nki\n\n               .. tab:: JAX\n\n                  .. code-block:: bash\n\n                     # Update Python venv \n                     sudo apt-get install -y python3.12-venv g++ \n\n                     # Activate Python venv \n                     source aws_neuron_venv_jax/bin/activate \n                     pip install -U pip \n\n                  Neuron provides two different ways to install the JAX package. The first is a common package with jax-neuronx packaged together and tested with all the necessary dependencies including jax, jaxlib, libneuronxla, neuronx-cc, and nki. This package can be installed as follows.\n\n                  .. code-block:: bash\n\n                     pip install --upgrade jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n                  Alternatively, jax, jaxlib, libneuronxla, neuronx-cc, and nki can be installed separately, with jax-neuronx being an optional addition. This version can be installed as follows.\n\n                  .. code-block:: bash\n\n                     pip install jax==0.7.0 jaxlib==0.7.0\n                     pip install --upgrade jax-neuronx libneuronxla neuronx-cc==2.* nki --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nConfirm your work\n------------------\n\nTo test that the NKI environment is set up and ready to use, a ``venv`` that contains the ``nki`` library must be activated. Select the tab below that corresponds to how you installed the Neuron SDK above.\n\n.. tabs::\n\n   .. 
tab:: Deep Learning AMI\n      \n      The Deep Learning AMI provides a number of environments for PyTorch, JAX, and other supported ML frameworks. Any of the PyTorch or JAX venvs supplied as a part of the Deep Learning AMI will include the ``nki`` library. See the Neuron DLAMI overview for the full list of environments. For simplicity, the JAX and PyTorch tabs below each choose the plain JAX and PyTorch venv respectively.\n\n      .. tabs::\n\n         .. tab:: PyTorch\n\n            .. code-block:: bash\n\n               source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate\n\n         .. tab:: JAX\n\n            .. code-block:: bash\n\n               source /opt/aws_neuronx_venv_jax_0_7/bin/activate\n\n   .. tab:: Standard AMI\n      \n      The venv created in the setup step above can be activated as follows.\n\n      .. tabs::\n\n         .. tab:: PyTorch\n\n            .. code-block:: bash\n\n               source aws_neuron_venv_pytorch/bin/activate\n\n         .. tab:: JAX\n\n            .. code-block:: bash\n\n               source aws_neuron_venv_jax/bin/activate\n\nOnce the ``venv`` is activated, confirm that NKI is available.\n\n.. code-block:: bash\n\n   python -c 'import nki'\n\nIf the environment is set up correctly, Python should return without reporting any errors.\n\nCommon issues\n---------------\n\nUh oh! Did you encounter an error or other issue while working through this task? Here are some commonly encountered issues and how to address them.\n\n* Python reports an error trying to import NKI when using a Deep Learning AMI:\n  \n    - Make sure a PyTorch or JAX ``venv`` (provided as part of the Deep Learning AMI) is activated. Your shell prompt should reflect this by starting with ``(aws_neuronx_venv_<framework+version>) ...``\n  \n* Python reports an error trying to import NKI in the ``venv`` created as part of the Standard AMI install:\n  \n    - Make sure the ``venv`` you created is activated. Your shell prompt should reflect this by starting with ``(<venv-name>) ...``\n    - Make sure that the NKI library installation (with ``pip``) from the previous instructions succeeded.\n\nRelated information\n-------------------\n\n* :doc:`Neuron DLAMI User Guide </dlami/index>`\n* :doc:`Neuron Setup Guide </setup/index>`\n"
  },
  {
    "path": "nki/guides/architecture/index.rst",
    "content": ".. meta::\n    :description: NKI and Neuron Architectures.\n    :keywords: NKI, AWS Neuron, Architecture, Trainium, trn1, trn2, trn3, inf2\n    :date-modified: 12/14/2025\n\n.. _nki-architecture-guides:\n\nNKI and Neuron Architecture\n----------------------------\n\nNKI currently supports the following NeuronDevice generations:\n\n* Trainium/Inferentia2, available on AWS ``trn1``, ``trn1n`` and ``inf2`` instances\n* Trainium2, available on AWS ``trn2`` instances and UltraServers\n* Trainium3, available on AWS ``trn3`` instances and UltraServers\n\nThe documents below provide an architecture deep dive of each NeuronDevice generation,\nwith a focus on areas that NKI developers can directly control through kernel implementation.\n\n* :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` serves as a foundational architecture guide for understanding the basics of any NeuronDevice generation.\n* :doc:`Trainium2 Architecture Guide </nki/guides/architecture/trainium2_arch>` walks through the architecture enhancements when compared to the previous generation.\n* :doc:`Trainium3 Architecture Guide </nki/guides/architecture/trainium3_arch>` covers the enhancements for the next-generation Trainium ML accelerators.\n  \nNeuron recommends new NKI developers start with :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` before exploring newer NeuronDevice architecture.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Trainium/Inferentia2 Architecture Guide\n      :link: trainium_inferentia2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Foundational architecture guide for understanding NeuronDevice basics.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Trainium2 Architecture Guide\n      :link: trainium2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Architecture enhancements and improvements in the Trainium2 generation.\n\n   .. grid-item-card:: Trainium3 Architecture Guide\n      :link: trainium3_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Latest architecture features and capabilities in Trainium3 devices.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Trainium/Inferentia2 Guide <trainium_inferentia2_arch>\n   Trainium2 Guide <trainium2_arch>\n   Trainium3 Guide <trainium3_arch>\n\n"
  },
  {
    "path": "nki/guides/architecture/trainium2_arch.rst",
    "content": ".. meta::\n   :description: Trainium2 Architecture Guide for NKI\n   :keywords: AWS Neuron, Trainium2, NeuronCore-v3, NKI, architecture\n   :date-modified: 12/01/2025\n\n.. _trainium2_arch:\n\nTrainium2 Architecture Guide for NKI\n===============================================\n\nIn this guide, we will dive into hardware architecture of third-generation NeuronDevices: Trainium2. This guide will highlight major architectural updates compared to the previous generation. Therefore, we assume readers have gone through :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` in detail to understand the basics of NeuronDevice Architecture.\n\nThe diagram below shows a block diagram of a Trainium2 device, which consists of:\n\n* 8 NeuronCores (v3).\n* 4 HBM stacks with a total device memory capacity of 96GiB and bandwidth of 3TB/s.\n* 128 DMA (Direct Memory Access) engines to move data within and across devices.\n* 20 CC-Cores for collective communication.\n* 4 NeuronLink-v3 for device-to-device collective communication.\n\n.. _fig-arch-neuron-device-v3:\n\n.. image:: /nki/img/arch_images/neuron_device3.png\n\nTrainium2 Device Diagram.\n\nFor a high-level architecture specification comparison from Trainium1 to Trainium2, check out the\n:doc:`Neuron architecture guide for Trainium2 </about-neuron/arch/neuron-hardware/trainium2>`. The rest of this guide will provide details on new features or improvements in NeuronCore-v3 compute engines and memory subsystem compared to NeuronCore-v2.\n\nNeuronCore-v3 Compute Engine Updates\n------------------------------------\n\nThe figure below is a simplified NeuronCore-v3 diagram of the compute engines and their connectivity to the two on-chip SRAMs, SBUF and PSUM. This is similar to NeuronCore-v2.\n\n.. _fig-neuroncore-v3-diagram:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-1.png\n\nNeuronCore-v3 SBUF capacity is **28MiB** (or, 128 partitions of 224KiB), up from 24 MiB in NeuronCore-v2. PSUM capacity remains the same at 2MiB. Engine data-path width and frequency are updated to the following:\n\n.. list-table:: Compute Engine Specifications\n   :widths: 20 20 40 20\n   :header-rows: 1\n\n   * - Device Architecture\n     - Compute Engine\n     - Data-path Width (elements/cycle)\n     - Frequency (GHz)\n   * - Trainium2\n     - Tensor\n     - 4x128 (dense FP8_E4/FP8_E5 input), 2x128 (dense BF16/FP16 input) or 5x128 (sparse input); 1x128 (output)\n     - 2.4\n   * - \n     - Vector\n     - 512 BF16/FP16 input/output; 256 input/output for other data types\n     - 0.96\n   * - \n     - Scalar\n     - 128 input/output\n     - 1.2\n   * - \n     - GpSimd\n     - \n     - 1.2\n\nNext, we will go over major updates to each compute engine.\n\nTensor Engine\n--------------\n\nThe Tensor Engine is optimized for tensor computations such as GEMM, CONV, and Transpose. A NeuronCore-v3 Tensor Engine delivers 158 FP8, 79 BF16/FP16/TF32 and 20 FP32 dense TFLOPS of tensor computations. It also delivers 316 FP8/BF16/FP16/TF32 sparse TFLOPS. The rest of this section describes new architectural features introduced in NeuronCore-v3 Tensor Engine. \n\nDouble FP8 Matmul Performance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuronCore-v3 TensorEngine (TensorE from now on) supports matrix multiplications (matmuls) of FP8 input matrices (including FP8_E4 and FP8_E5 formats [1]_) at **double** the throughput compared to BF16/FP16. Mixing FP8_E4 in one input matrix and FP8_E5 in the other is also allowed. 
This FP8 double performance mode uses FP32 as the accumulation data type, similar to BF16/FP16 matmul.\n\n.. [1] FP8_E3 format is still supported by NeuronCore-v3 TensorE similar to NeuronCore-v2, but its matmul performance is the same as BF16/FP16.\n\nLogically, TensorE doubles the FP8 matmul performance by doubling the maximum contraction dimension of a matmul instruction from 128 (for BF16/FP16) to 256, effectively presenting a 256x128 systolic array to the programmer. Under the hood, since the systolic array is still organized as a grid of 128x128 processing elements, each processing element performs two pairs of FP8 multiplications and also accumulation of the two multiplication results per cycle. The remaining section discusses the semantics of a single double-FP8 matmul instruction. Multiple such instructions can be used to accommodate larger matrix multiplications than the allowed instruction-level tile sizes.\n\nA double-FP8 matmul can perform a multiplication of a 128x256 matrix and a 256x512 matrix (that is, MxKxN matmul, M=128, K=256, N=512). The figure below shows a visualization of the two input matrices (x and y) and the matmul output matrix (output). The figure also highlights two elements (red and yellow) in the first row of the x matrix and in the first column of the y matrix. These two elements are 128 (K//2) elements apart within the rows and columns. We will use these elements to illustrate the SBUF layout requirements for these matrices next. \n\n\n.. _fig-double-fp8-matmul:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-2.png\n\nThese tensors must still fit in the 128-partition SBUF, with each partition feeding data into each row of processing elements inside the TensorE. The contraction of size 256 is therefore split into two dimensions: (1) the partition dimension of size 128 and (2) the most major (slowest) free dimension of size 2. This is illustrated in the figure below. Both the stationary matrix (x in above figure) and the moving matrix (y in above figure) are sliced in two tiles, where the first and second tile correspond to first and second halves of the contraction dimension, respectively. \n\n.. _fig-double-fp8-sbuf-layout:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-3.png\n\nNext, we invoke the LoadStationary and MultiplyMoving instructions to perform the matrix multiplications using the above tensors in SBUF. This is illustrated in figure below. The LoadStationary instruction loads the stationary tensor (K/2=128, 2, M=128) into TensorE, which stores two data elements into a single processing element (for example, the red and yellow elements land in the first processing element of TensorE as shown in ❶). Next, the MultiplyMoving instruction streams the moving tensor horizontally across the loaded stationary tensor. Similar to LoadStationary, two elements of moving tensor are sent to the same processing element simultaneously as shown in ❷, such that they can get multiplied with the corresponding pair of loaded stationary elements.\n\n.. _fig-double-fp8-instruction:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-4.png\n\nNote that the above double FP8 ``LoadStationary``/``MultiplyMoving`` instruction sequence with a 256 contraction dimension takes the same amount of time as the regular BF16/FP16 LoadStationary/MultiplyMoving instruction sequence with a 128 contraction dimension. 
Since the double FP8 instruction performs double the FLOPs, overall double FP8 matmul on TensorE can achieve double the throughput compared to BF16/FP16 matmuls.\n\nNKI programmers can invoke double FP8 matmul using the ``nisa.nc_matmul()`` API on NeuronCore-v3:\n\n.. code-block:: python\n\n   import nki.isa as nisa\n\n   # stationary: [128, 2, 128]\n   # moving: [128, 2, 512]\n   # dst: [128, 512]\n   nisa.nc_matmul(dst, stationary, moving, \n                  perf_mode=nisa.matmul_perf_mode.double_row, ...)\n\nThe ``nt.tensor[128, 2, 128]`` stationary and ``nt.tensor[128, 2, 512]`` moving tensor shapes reflect the maximum tile sizes for the double FP8 matmul instruction. Smaller tile sizes are supported, though the second dimension (the most major free dimension) of both input tensors must be two. In other words, if the contraction dimension of the matmul is not a multiple of two, programmers are required to explicitly pad the input tensors with zeros to enable the performance mode.\n\nNote that Double FP8 matmul performance mode cannot be combined with the following TensorE features:\n\n* Column tiling mode\n* Sparse matmul (new in NeuronCore-v3, discussion below)\n* Transpose mode (new in NeuronCore-v3, more discussion below)\n\n.. TODO: Uncomment and unindent when the NISA API ships\n   M:N Structured Sparsity\n   ^^^^^^^^^^^^^^^^^^^^^^^^\n\n   Trainium2 TensorE introduces sparse matmul (matrix multiplication) support for M:N structured sparsity. This new functionality multiplies a regular dense moving matrix with a sparse stationary matrix that exhibits a M:N sparsity pattern, where every N elements only have up to M non-zero values along the contraction dimension. Trainium2 hardware supports up to 4x compression ratio and therefore 4x faster matmul performance compared to dense, with the largest value of N being 16. Programmers also have the flexibility to choose a lower compression ratio (CR=N/M) for better model accuracy. NKI currently supports the following M:N patterns: 4:8 (2x compression), 4:12 (3x) and 4:16 (4x), through the ``nki.isa.sparse_matmul`` API.\n\n   To exercise sparse matmul, the sparse stationary matrix must be compressed to store only M out of every N elements, along with a tag tensor which indicates the original positions of the remaining M non-zero elements. In Figure below, a stationary matrix with a compression ratio of 16:4, along with its compressed representation.\n\n   .. _fig-sparse-matmul:\n\n   .. image:: /nki/img/arch_images/nki-trn2-arch-5.png\n\n   The ``nki.isa.sparse_matmul`` API takes the following arguments ``nc_matmul_sparse(moving, stationary, tag, compress_ratio)``.\n\n   Each row TensorE is able to read from 4, 2, or 1 SBUF partitions corresponding to the maximum compression ratio supported by the sparsity on TRN2 ratio. In order to efficiently utilize the TensorE the input moving should have shape ``[Partition Dimension <=128, Compression Ratio, Tile Free Dimension <= 512]``. The stationary matrix represents a 128x128 compressed weight tensor.\n\n   Finally, the tag tensor is a 128x32 tensor of uint16. Each position of uncompressed elements is encoded as an 4-bit integer, which is the minimal width to relative position within N=16. Four tags are then packed into a uint16 datatype which forms the [128,32] tensor.\n\n   A sample ``nki.isa.sparse_matmul`` can be found here:\n\n   .. 
code-block:: python\n\n      def mm_sparse_128_512_cr4(moving_tensor, stationary_tensor, tag_tensor, output):\n      \"\"\"\n      Args:\n         moving: Input tensor of shape [128, 4, 512], which represents activation tensor\n         stationary: Input tensor of shape [128, 128], which represents the \n                     compressed weight tensor\n         tag: Input tensor of shape [128, 32] of uint16 datatype where each tag \n               represents the indices of non-zero elements of the weight tensor.\n               Tags are uint4 datatypes, 4 tags are packed into 1 uint16 datatype\n         output: reference to the resulting output tensor of shape [128, 512]\n      \"\"\"\n      _, compress_ratio, _ = moving_tensor.shape\n      \n      moving = nl.ndarray(moving_tensor.shape, dtype=moving_tensor.dtype, buffer=nl.sbuf)\n      stationary = nl.ndarray(stationary_tensor.shape, dtype=stationary_tensor.dtype, buffer=nl.sbuf)\n      tag = nl.ndarray(tag_tensor.shape, dtype=tag_tensor.dtype, buffer=nl.sbuf)\n      nisa.dma_copy(dst=moving, src=moving_tensor)\n      nisa.dma_copy(dst=stationary, src=stationary_tensor)\n      nisa.dma_copy(dst=tag, src=tag_tensor)\n\n      psum_buf = nc_matmul_sparse(moving, # [128P, 4, 512F] \n                                    stationary, # [128P, 128F]\n                                    tag, # [128, 32]\n                                    compress_ratio)\n\n      nisa.dma_copy(dst=output, src=psum_buf)\n\n      # Sparse matmul     \n      def test_nc_matmul_sparse(self):\n         M = 512\n         N = 128\n         K = 128\n         sparsity_pattern = (16, 4)\n         L, R = sparsity_pattern\n         ratio = L // R\n\n         moving = np.random.random_sample((K, ratio, M)).astype(dtype) # activation\n         stationary = np.random.random_sample((K, N)).astype(dtype) # compressed weight \n         \n         # For demonstration purpose, we use random values between 0~15 in the tag tensor\n         tag = np.random.randint(0, 16, size=(K, N), dtype=np.ubyte)\n         \n         # maps logical tags to physical tags and pack 4 uint4 to 1 uint\n         squeezed_tag = squeeze_tags(tag, ratio) \n         \n         # Generate NKI output\n         nki_output = np.zeros((N, M), dtype=dtype)\n         mm_sparse_128_512_cr4(moving, stationary, squeezed_tag, nki_output)\n\nBuilt-in Transpose Support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs discussed in :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>`, one common use of TensorE besides matrix multiplication operations is transposition of a 2D SBUF tensor, which swaps the partition and free dimension of the matrix. Such a transposition is done through a matmul of the tensor to be transposed (stationary tensor) and an identity matrix (moving tensor). Prior to NeuronCore-v3, TensorE has to perform multiplication of each data element with 1.0 or 0.0 and accumulation along the contraction dimension normally. However, if the tensor to be transposed contains NaN/Inf floating point values, the matmul result will not be a bit-accurate transposition of the original matrix - the NaN/Inf values will propagate through the accumulation chain and spread across the output tensor.\n\nStarting with NeuronCore-v3, TensorE supports an explicit transpose mode, which can correctly transpose input tensors with NaN/Inf. In addition, the transpose mode provides the following benefits:\n\n* 2x speedup in FP32 transpose, vs. 
no transpose mode enabled.\n* FP16/BF16 PSUM output for FP16/BF16 transpose, vs. FP32 (default matmul output data type) PSUM output when no transpose mode is enabled. This allows faster PSUM data eviction back to SBUF.\n\n.. note:: NeuronCore-v3 TensorE transpose mode for FP8 input data produces 16-bit output elements in PSUM, with the upper 8 bits filled with zeros.\n\nNKI programmers can enable TensorE transpose mode on NeuronCore-v3 through the following APIs:\n\n.. code-block:: python\n\n   nisa.nc_matmul(..., is_transpose=True)\n   # OR\n   nisa.nc_transpose(..., engine=nisa.constants.engine.tensor)\n\nVector Engine\n----------------\n\nVector Engine (VectorE) is specially designed to accelerate vector operations where every element in the output tensor typically depends on multiple elements from input tensor(s), such as vector reduction and element-wise operators between two tensors. NeuronCore-v3 Vector Engine delivers a total of 1.0 TFLOPS of FP32 computations and can handle various input/output data-types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32. \n\nVector Engine Performance Mode\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuronCore-v3 Vector Engine provides a new performance mode for BF16/FP16 data types, which quadruples or doubles the instruction throughput depending on the instruction type compared to NeuronCore-v2 (more details below). Enabling this performance mode does not change the computation precision - all computation is still done in FP32, similar to NeuronCore-v2 Vector Engine.\n\nIn particular, the following instructions could see a 4x throughput lift compared to NeuronCore-v2:\n\n1. ``nisa.tensor_copy`` and ``nisa.tensor_scalar`` when both input/output tensors:\n    a. are in SBUF\n    b. are in BF16/FP16 (input and output data types do not need to match)\n    c. have physically contiguous elements in the inner-most (most minor) free dimension\n\nThe following instructions could see a 2x throughput lift compared to NeuronCore-v2:\n\n1. ``nisa.tensor_copy`` and ``nisa.tensor_scalar``:\n    a. when both input/output tensors satisfy 1a and 1b, but not 1c conditions above, or\n    b. when both input/output tensors satisfy 1b and 1c, but one of input and output tensors is in PSUM\n2. ``nisa.tensor_tensor``:\n    a. when both input tensors are in SBUF and all input/output tensors are in BF16/FP16\n\nNote that NKI programmers are not required to explicitly enable VectorE performance mode. VectorE detects the above conditions and enables performance mode automatically in hardware.\n\nScalar Engine\n---------------\n\nAs discussed in Trainium/Inferentia2 Architecture Guide, Scalar Engine (ScalarE) is specially designed to accelerate scalar operations where every element in the output tensor only depends on one element of the input tensor. In addition, ScalarE provides hardware acceleration to evaluate non-linear functions such as Gelu and Sqrt. All architectural capabilities from NeuronCore-v2 Scalar Engine are applicable to NeuronCore-v3. NeuronCore-v3 Scalar Engine additionally supports bit-accurate tensor copies without intermediate FP32 data type casting, similar to VectorE and Gpsimd Engine (see details in ``nisa.tensor_copy``).\n\nGpsimd Engine\n--------------\n\nGpSimd Engine (GpSimdE) is intended to be a general-purpose engine that can run any ML operators that cannot be efficiently lowered onto the other highly specialized compute engines discussed above, such as applying a triangular mask to a tensor. 
A GpSimdE consists of eight fully programmable processors that can execute arbitrary C/C++ programs.\n\nIn NeuronCore-v3, each processor in GpsimdE also comes with an integrated DMA engine that can move data in parallel to computation on GpsimdE and also parallel to data movements done by the main DMA engines on the Neuron Device. These integrated DMA engines can reach any SBUF/HBM on-chip or off-chip in the same trn2 instance. All eight processors together have a total integrated DMA bandwidth of 307 GB/s (153 GB/s per read/write direction).\n\nData Movement Updates\n----------------------\n\nTrainium2 consists of a three-tiered memory hierarchy: HBM, SBUF and PSUM, from highest to lowest memory capacity. Figures below show the specifications of these memories and their connectivity for one NeuronCore-v3.\n\n.. _fig-memory-hierarchy:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-5-1.png\n\n.. _fig-memory-hierarchy-2:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-6.png\n\nAs shown in the above figures, data movement between HBM and SBUF is performed using on-chip DMA (Direct Memory Access) engines, which can run in parallel to computation within the NeuronCore. Data movement between PSUM and SBUF is done through ISA instructions on the compute engines. In NeuronCore-v3, two restrictions in engine parallel accesses to SBUF/PSUM are lifted to improve programming flexibility compared to NeuronCore-v2:\n\n1. VectorE and GpSimdE can access SBUF in parallel.\n    a. This was disallowed in NeuronCore-v2.\n    b. VectorE's performance mode leverages a shared memory bus between the VectorE and GpsimdE engines to deliver 2-4x performance improvement for select VectorE instructions. The hardware automatically coordinates access between engines to optimize bus utilization, including arbitrating between GpsimdE and relevant VectorE instructions.\n2. VectorE and ScalarE can access PSUM in parallel.\n    a. This was disallowed in NeuronCore-v2.\n    b. Both VectorE and ScalarE can access PSUM at full bandwidth in parallel, as long as their accesses do not collide on the same PSUM bank.\n\nDMA Transpose\n^^^^^^^^^^^^^^^\n\nTrainium2 DMA engines can perform a tensor transpose while moving data from HBM into SBUF, or from SBUF to SBUF itself. The figure below illustrates these two supported DMA transpose data flows. Trainium2 DMA transpose supports bit-accurate transposition for both 2-byte and 4-byte data types.\n\n.. _fig-dma-transpose:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-7.png\n\nHBM2SBUF DMA transpose\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nBefore diving into how HBM2SBUF transpose works, let's revisit a simple DMA copy from a packed HBM tensor ``[128, 512]`` to an SBUF tensor ``[nl.par_dim(128), 512]``. Following Numpy convention, these tensor shapes follow a major to minor ordering. The figure below visualizes these HBM and SBUF tensors. A packed ``[128, 512]`` HBM tensor consists of 128 chunks of 512 elements, laid out back to back in the HBM linear memory. 
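In NKI, this simple DMA copy can be written with the same ``nisa.dma_copy`` API used elsewhere in this guide. The sketch below is illustrative only; the ``hbm_src`` tensor name is a placeholder for a ``[128, 512]`` tensor already resident in HBM.\n\n.. code-block:: python\n\n   import nki\n   import nki.language as nl\n   import nki.isa as nisa\n\n   # hbm_src: packed [128, 512] tensor in HBM; 512 is the inner-most (minor) dimension\n   # Allocate the SBUF destination: 128 partitions x 512 elements along the free dimension.\n   sbuf_dst = nl.ndarray((128, 512), dtype=hbm_src.dtype, buffer=nl.sbuf)\n\n   # Plain DMA copy: the most major HBM dimension maps to the SBUF partition dimension.\n   nisa.dma_copy(dst=sbuf_dst, src=hbm_src)\n\n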
The most minor (that is, inner-most) dimension consists of 512 contiguous elements in memory. Once loaded into the SBUF, the most minor HBM tensor dimension (512) is mapped to the free dimension of the SBUF, while the most major dimension is mapped to the SBUF partition dimension.\n\nIn Trainium2, each NeuronCore-v3 is typically paired with 16x DMA engines to drive its corresponding SBUF bandwidth. In the above DMA copy, each DMA engine would be responsible for moving 128/16 = 8 chunks of 512 elements.\n\n* HBM tensor [128, 512]: 512 is the inner-most (minor) dimension\n\n.. _fig-hbm2sbuf-dma-copy:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-8.png\n\nIn contrast, in a DMA transpose operation, we take an HBM tensor of opposite layout [512, 128]:\n\n.. _fig-hbm2sbuf-dma-transpose:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-9.png\n\nIn a DMA transposition, the most minor dimension of the source HBM tensor now becomes the partition dimension of the destination SBUF tensor. Compared to the above DMA copy operation where each DMA engine reads and writes an independent slice of 512 elements, DMA transpose requires all 16x DMA engines to work co-operatively to deliver the best throughput - these 16x DMA engines should write into a single ``[nl.par_dim(128), 16]`` SBUF tile in parallel at a time, where the 16 elements along the free dimension must be contiguous. Having a multiple of 128 and a multiple of 16 in the output SBUF partition and inner-most free dimension sizes is a prerequisite for achieving the best possible DMA throughput efficiency with DMA transpose. However, it is not a functionality requirement - DMA transpose supports flexible tile sizes at the cost of DMA performance. \n\nHBM2SBUF DMA transpose is commonly seen in ML workloads where the data layout in HBM differs from the format needed by the initial compute engine that processes the data. For example, in the LLM decode phase, the K cache typically has an HBM layout of ``[seqlen, d_head]``, where ``seqlen`` and ``d_head`` are the sequence length and head dimensions, respectively. However, when K is consumed by TensorE in the Q@K operator in self-attention, ``d_head`` is the contraction dimension of the matrix multiplication. Therefore, the most-minor d_head dimension in HBM should become the partition dimension to satisfy TensorE layout requirements (see :ref:`Tiling <nki-tile-layout>`: Contraction dimension must map to partition dimension). Mapping the most minor HBM tensor dimension to the SBUF partition dimension is exactly an HBM2SBUF DMA transpose operation on Trainium2. \n\nIn NKI, programmers can invoke an HBM2SBUF DMA transpose using the ``nisa.dma_transpose`` API.\n\n.. code-block:: python\n\n   import nki\n   import nki.language as nl\n   import nki.isa as nisa\n\n   # hbm_src: [512, 128] in shared_hbm\n   sbuf_dst = nl.ndarray((128, 512), dtype=hbm_src.dtype, buffer=nl.sbuf)\n   nisa.dma_transpose(dst=sbuf_dst, src=hbm_src)\n\n.. admonition:: Performance Consideration\n\n   DMA transpose on Trainium2 can achieve up to 90% DMA throughput utilization given hardware-friendly tensor access patterns, compared to up to 100% throughput utilization for a DMA copy.\n\nSBUF2SBUF DMA transpose\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nSBUF2SBUF DMA transpose works in a similar fashion as HBM2SBUF transpose, where the most minor dimension of the input SBUF tensor, i.e., inner-most free dimension, becomes the partition dimension of the output SBUF tensor. 
Therefore, SBUF2SBUF DMA transpose is a way to swap partition and free axis of an SBUF tensor, an alternative to TensorE transpose.\n\nThe same ``nisa.dma_transpose`` API can be used to perform an SBUF2SBUF DMA transpose:\n\n.. code-block:: python\n\n   import nki\n   import nki.language as nl\n   import nki.isa as nisa\n\n   # sbuf_src: [128, 128] in sbuf\n   sbuf_dst = nl.ndarray((128, 128), dtype=sbuf_src.dtype, buffer=nl.sbuf)\n   nisa.dma_transpose(dst=sbuf_dst, src=sbuf_src)\n\nPerformance Consideration. SBUF2SBUF transpose can achieve up to 50% of DMA throughput on Trainium2. Compared to TensorE transpose that is more performant but requires ScalarE/VectorE to evict the transposed output from PSUM back to SBUF, DMA transpose can read from and write to SBUF directly. Therefore, DMA transpose is particularly useful in operators that are ScalarE/VectorE bound, such as self attention.\n\n.. _dge_arch:\n\nDescriptor Generation Engine (DGE)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Descriptor Generation Engine (DGE) is a new hardware block in NeuronCore-v3 that accelerates DMA descriptor generation to perform either DMA copy or transpose on the DMA engines. Each NeuronCore-v3 comes with two instances of DGE, which can be commanded through either SyncE or ScalarE sequencer. The figure below shows the connectivity of the DGE instances.\n\n.. _fig-dge:\n\n.. image:: /nki/img/arch_images/nki-trn2-arch-10.png\n\nPrior to Trainium2, DMA descriptor generation was handled in two ways. They were either generated statically on the host when loading a NEFF onto a Neuron Device (i.e., static DMA), or created dynamically through custom kernels on GpsimdE during NEFF execution (i.e., software DGE). The static approach stored all descriptors in HBM, consuming valuable memory space that could otherwise be used for model parameters or computation data. The software-based approach used a portion of SBUF for storing descriptors generated during execution and occupies GpsimdE that could otherwise perform useful computation.\n\nIn comparison, the new hardware-based DGE in Trainium2 generates descriptors on demand without requiring additional memory storage. It also frees up GpsimdE to perform useful computation. Therefore, it is recommended to leverage hardware-based DGE on Trainium2 whenever possible to initiate a DMA transfer.\n\nNKI programmers can invoke hardware-based DGE on NeuronCore-v3 using ``nisa.dma_copy`` and ``nisa.dma_transpose`` APIs, by setting ``dge_mode=nisa.dge_mode.hw_dge``. The compute engine to initiate a DGE command (Sync Engine or ScalarE) is currently determined by NKI compiler (subject to changes).\n\n.. note::\n   NeuronCore-v3 hardware DGE currently does not support indirect DMA operations (gather/scatter). Refer to nisa API documentation for detailed implementation guidelines.\n\n.. admonition:: Performance Consideration\n\n   When triggered from ScalarE, execution of the DGE-based DMA instruction could be hidden behind earlier compute instructions (such as ``nisa.activate()``) in program order, since DGE and the compute pipeline of ScalarE are independent hardware resources. Each DGE-based DMA instruction takes about 600 ns to execute on NeuronCore-v3.\n\n"
  },
  {
    "path": "nki/guides/architecture/trainium3_arch.rst",
    "content": ".. meta::\n   :description: Trainium3 Architecture Guide for NKI\n   :keywords: AWS Neuron, Trainium3, NeuronCore-v4, NKI, architecture\n   :date-modified: 03/09/2026\n\n.. _trainium3_arch:\n\nTrainium3 Architecture Guide for NKI\n====================================\n\nIn this guide, we will dive into the hardware architecture of fourth-generation NeuronDevices: Trainium3. This guide will highlight major architectural updates compared to the previous generation (Trainium2). Therefore, we assume readers are familiar with :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` and :doc:`Trainium2 Architecture Guide for NKI </nki/guides/architecture/trainium2_arch>` to understand the basics of NeuronDevice Architecture. \n\nThe diagram below shows a block diagram of a Trainium3 device, which consists of:\n\n* 8 NeuronCores (v4).\n* 4 HBM stacks with a total device memory capacity of 144 GiB and bandwidth of 4.7 TB/s. \n* 128 DMA (Direct Memory Access) engines to move data within and across devices.\n* 20 CC-Cores for collective communication.\n* 4 NeuronLink-v4 for device-to-device collective communication.\n\n.. _fig-arch-neuron-device-v4:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-1.png\n\nThe rest of this guide discusses NeuronCore-v4's major architectural updates compared to NeuronCore-v3 that are relevant for NKI programmers. \n\nNeuronCore-v4 Compute Engine Updates\n------------------------------------\n\nThe figure below is a simplified NeuronCore-v4 diagram of the compute engines and their connectivity to the two on-chip SRAMs, which are SBUF and PSUM. This is similar to previous versions of NeuronCore. \n\n.. _fig-neuroncore-v4-diagram:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-2.png\n\nThe NeuronCore-v4 SBUF capacity is 32 MiB (up from 28 MiB in NeuronCore-v3), while the PSUM capacity remains the same at 2 MiB. The engine data-path widths and frequencies are updated to the following:\n\n.. list-table:: Compute Engine Specifications\n   :widths: 20 20 40 20\n   :header-rows: 1\n\n   * - Device Architecture\n     - Compute Engine\n     - Data-path Width (elements/cycle)\n     - Frequency (GHz)\n   * - Trainium3\n     - Tensor\n     - 8x128 (MXFP8 dense input) or 2x128 (non-MXFP8 dense input) or 5x128 (sparse input); 1x128 (output)\n     - 2.4\n   * - \n     - Vector\n     - 512 BF16/FP16/FP8 input/output; 256 input/output for other data types\n     - 1.2\n   * - \n     - Scalar\n     - 256 BF16/FP16/FP8 input/output; 128 input/output for other data types\n     - 1.2\n   * - \n     - GpSimd\n     - 128 input/output for all data types\n     - 1.2\n\nSync Engine has not changed since :doc:`previous Trainium architectures </nki/guides/architecture/trainium_inferentia2_arch>`. Next, we will go over major architectural updates to each compute engine. \n\nTensor Engine\n--------------\n\nThe Tensor Engine is optimized for tensor computations such as GEMM, CONV, and Transpose. A NeuronCore-v4 Tensor Engine delivers 315 MXFP8/MXFP4 TFLOPS, where MXFP8/MXFP4 are OCP (Open Compute Project) compliant data type formats. Besides quantized data types, a NeuronCore-v4 Tensor Engine also delivers 79 BF16/FP16/TF32 and 20 FP32 TFLOPS of tensor computations. The rest of this section describes new architectural features introduced in the NeuronCore-v4 Tensor Engine. \n\n.. 
_arch-trn3-quad-mxfp:\n\nQuad-MXFP8/MXFP4 Matmul Performance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe NeuronCore-v4 Tensor Engine (TensorE) supports two new input data types: MXFP8 and MXFP4, where MX stands for \"microscaling\", as defined in the OCP standard. Microscaling is a subset of absmax (absolute maximum quantization), where quantization scale factors are calculated using absolute maxima of fine-granularity groups of values, as opposed to having tensor- or channel-wise scale factors. It can significantly improve the amount of information preserved in the quantized values. The supported scaling group size is 32: That means that 32 MXFP8/MXFP4 elements along the matrix multiplication (matmul) contraction dimension share the same 8-bit MX scale value. The Tensor Engine performs matrix multiplications of MXFP8 or MXFP4 input matrices [1]_ and dequantization with the MX scales in a single instruction, with the output either in FP32 or BF16. We will refer to MXFP8 and MXFP4 matmul as MX matmul in the rest of this guide. An MX matmul with either the MXFP8 or MXFP4 datatype runs at 4x the throughput compared to a BF16/FP16 matmul. \n\n.. [1] Multiplying an MXFP8 matrix with an MXFP4 matrix is also allowed. \n\nLogically, TensorE quadruples the MX matmul performance, as compared to BF16 performance, by quadrupling the maximum contraction dimension of the matmul instruction from 128 (for BF16/FP16) to 512, effectively presenting a 512x128 systolic array to the programmer. Under the hood, since the systolic array is still organized as a grid of 128x128 processing elements, each processing element performs four pairs of MX multiplications and also accumulation of the four multiplication results per cycle. This is similar to the Double-FP8 performance mode in the Trainium2 TensorE (discussed in :doc:`Trainium2 Architecture Guide </nki/guides/architecture/trainium2_arch>`), but the data layout requirements for MX matmul are distinct and discussed below. \n\nMathematically, an MX matmul instruction can perform a multiplication of a 128x512 matrix and a 512x512 matrix (that is, MxKxN matmul, M=128, K=512, N=512). The figure below shows a visualization of the two input matrices (x and y) and the matmul output matrix (output). The figure also highlights four elements (red, blue, yellow and green) in the first row of the x matrix and in the first column of the y matrix. These four elements are 128 (K//4) elements apart within the row and column. Each pair of same-colored elements from x and y matrices will get multiplied, and the multiplication results are subsequently accumulated in the matmul operation, inside the TensorE. We will use these elements to illustrate the SBUF layout requirements for these matrices next. \n\n.. _fig-mx-matmul:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-3.png\n\nThe figure below shows how the above matrices should be laid out in SBUF in preparation for MX matmul. For visualization purposes, the x matrix is rotated 90 degrees, such that the contraction K dimension is aligned with the SBUF partition dimension. In addition, we pack the four highlighted elements that used to be 128 elements apart back-to-back along the free dimension. As a result, the matmul contraction dimension K=512 is split into two dimensions: (1) the partition dimension of size 128 and (2) the most minor (fastest) free dimension of size 4. The y (moving) matrix follows a similar four-element packing pattern along the free dimension. 
The MX matmul instruction requires that data is packed in such quads of elements. In NKI, programmers can directly work with MX data using special quad (x4) packed data types: ``float8_e5m2_x4``, ``float8_e4m3fn_x4``, and ``float4_e2m1fn_x4``.\n\n.. _fig-mx-sbuf-layout:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-4.png\n\nNext, we invoke the LoadStationary and MultiplyMoving instructions to perform the matrix multiplications using the above tensors in SBUF. This is illustrated in the figure below. The LoadStationary instruction loads the MX stationary tensor (K/4=128, M=128, 4) into TensorE, which stores four MX data elements into a single processing element as shown in ❶. Next, the MultiplyMoving instruction streams the moving tensor horizontally across the loaded stationary tensor. Similar to LoadStationary, four elements of moving tensor are sent to the same processing element simultaneously as shown in ❷, such that they can get multiplied with the corresponding loaded stationary elements.\n\n.. _fig-mx-instruction:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-5.png\n\nSince MX matmul in TensorE performs dequantization in addition to the multiplication of the input matrices, we discuss how the scale tensor is laid out in SBUF for TensorE consumption. Recall that the supported MX group size on NeuronCore-v4 TensorE is 32 elements along the contraction dimension. Each input MX matrix to the matmul operation therefore has its own scale tensor. In fact, the highlighted x4 elements within each matrix in the above images are within the same scaling group. The diagram below shows the full 32-element scaling group that includes these highlighted x4 elements within matrix x and y. \n\n.. image:: /nki/img/arch_images/nki-trn3-arch-6.png\n\nLet's focus on the stationary data and scale tensor layout below. On the left, the purple rectangle represents the 32-element scaling group that includes the four highlighted elements, which spans 8 SBUF partitions (8P) and 4 elements per partition. \n\nA single scaling group corresponds to one 8-bit integer scale. Therefore, for every 32 partitions of the data tensor, we get 32/8=4 partitions worth of scale factors. As shown in the scale tensor below, the full scale tensor is split across four SBUF quadrants, where each quadrant holds 4 partitions worth of scales. Note the free dimension of the scale tensor is M=128, which is 4x smaller than the data tensor. This is because the four packed colored elements in the data tensor belong to the same scaling group and hence share a single scale. Within each SBUF quadrant, 32-4=28 partitions are unused in the scale tensor below. Multiple scale tensors for different MX matmul instructions can be packed together to fill up the unused partitions. See the :ref:`MXFP NKI tutorial <nki-mxfp-scale-packing>` for more discussion on packed scales.\n\n.. _fig-mx-scale-layout:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-7.png\n\nThe moving data and scale tensor layout follows the same rules. Therefore, an MX matmul on TensorE requires four input tensors:\n\n1. stationary data\n2. stationary scale\n3. moving data\n4. moving scale\n\nIn NKI, programmers can define MX data tensors using the special x4 data types. The maximum tile size for stationary MX data tensor is [128, 128] in x4 data types ([128, 512] of actual values), while the maximum tile size for moving MX data tensor is [128, 512] in x4 data types ([128, 2048] of actual values). 
One convenience of the x4 datatypes is that the output matrix dimensions map directly to the sizes of the free dimensions of the input matrices. Similarly, the maximum tile sizes for the stationary and moving MX scale tensors are [128, 128] and [128, 512] in ``nl.uint8``, respectively. The API to invoke an MX matmul is :doc:`nisa.nc_matmul_mx </nki/api/generated/nki.isa.nc_matmul_mx>`:\n\n.. code-block:: python\n\n   nisa.nc_matmul_mx(dst, stationary, moving, stationary_scale, moving_scale)\n\n.. _arch-trn3-bf16-psum:\n\nBF16 Matmul Results in PSUM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nPrior to NeuronCore-v4, the Tensor Engine always passed FP32 matrix multiplication results to PSUM unless transpose mode was turned on. Similarly, the PSUM buffer was restricted to FP32 near-memory accumulation (fp32_psum_tensor += fp32_matmul_output). Starting with NeuronCore-v4, the Tensor Engine allows the matrix multiplication instruction (:doc:`nisa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>`) to store BF16 data into the PSUM buffer directly and also to perform addition to a BF16 tensor stored in PSUM. \n\nNote that the accumulation performed during a matmul operation within the systolic array is still performed using FP32 data. When writing the matmul results into a BF16 PSUM tensor location, the downcast from FP32 to BF16 is performed immediately before the write. The downcast can use the RNE (round nearest even) or SR (stochastic rounding) mode. The figure below illustrates this data flow. \n\n.. _fig-bf16-psum:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-8.png\n\nWhen adding the matmul results to an existing BF16 tensor stored in PSUM, the following operations are performed:\n\n* The existing PSUM tensor (red) is upcast to FP32.\n* The PSUM tensor (now in FP32) and the TensorE output (yellow) are added together at FP32 precision.\n* The result of the addition (green) is converted to BF16 using the given rounding mode, and written back to PSUM.\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-9.png\n\nBackground Transpose\n^^^^^^^^^^^^^^^^^^^^^\n\nThe NeuronCore-v4 Tensor Engine supports a new background transpose functionality, which allows it to run a transpose operation in parallel to another matrix multiplication (or another transpose). This makes it possible to achieve close to double the performance on long chains of transposes, or to overlap a larger matrix multiplication with transpose operations in the background.\n\nNKI programmers are not required to enable background transpose explicitly to leverage the performance improvements from this feature. The decision to trigger background transpose is made automatically by the hardware. \n\nVector Engine\n--------------\n\nThe Vector Engine is optimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include axpy operations (Z=aX+Y), Layer Normalization, and Pooling operations. The NeuronCore-v4 Vector Engine delivers a total of 1.2 TFLOPS of FP32 computations and can handle various input/output data types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32. The rest of this section describes new architectural features introduced in the NeuronCore-v4 Vector Engine. \n\nMX data-type Quantization\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe NeuronCore-v4 VectorE supports quantizing FP16/BF16 data to MXFP8 tensors (both data and scales) in a layout that TensorE can directly consume for MX matmul, as described in the Quad-MXFP8/MXFP4 Matmul Performance section above. 
As a reminder, an MxK MXFP8 matrix, where K is the contraction dimension, requires the following data and scale layout in SBUF:\n\n.. _fig-mx-quantization:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-10.png\n\nThe VectorE can natively quantize BF16/FP16 data to produce this layout using the QuantizeMX instruction. QuantizeMX calculates the required scale for each group of 32 values, divides the values by the calculated scale, and casts them to the target MXFP8 data type (as per the OCP specification):\n\n.. _fig-mx-quantization-flow:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-11.png\n\nThe source FP16/BF16 data must be in SBUF, and has to be in a layout that exactly matches the target MXFP8 data layout (QuantizeMX preserves the data layout). The target MXFP8 data and scales also have to be in SBUF. The quantization instruction can quantize four input elements per partition, per cycle (i.e., 4x Vector performance mode).\n\nIn NKI, programmers can perform such an MX data type quantization using the :doc:`nisa.quantize_mx </nki/api/generated/nki.isa.quantize_mx>` API.\n\nFast Exponential Evaluation\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe NeuronCore-v4 Vector Engine introduces a new instruction to perform fast exponential evaluation (:doc:`nisa.exponential </nki/api/generated/nki.isa.exponential>`), at 4x the throughput compared to the :doc:`nisa.activation </nki/api/generated/nki.isa.activation>` (``op=nl.exp``) instruction on the Scalar Engine. In addition to the exponential function, the instruction on Vector Engine can also apply a subtraction before the exponential function and an accumulation after:\n\n.. code-block:: python\n\n   # Inputs:\n   # src tile [M, N]\n   # max_value tile [M, 1]\n   # Outputs:\n   # dst tile of the same shape [M, N]\n   # reduce_res tile [M, 1]\n   for i in range(M): # parallel (partition) dimension\n       reduce_res[i, 0] = 0\n       for j in range(N): # sequential (free) dimension\n           dst[i, j] = exp(src[i, j] - max_value[i, 0])\n           reduce_res[i, 0] += dst[i, j]\n\nThis particular pattern is useful to speed up the Softmax operator, which is commonly on the critical path of long context length self-attention in large language models (LLMs):\n\n.. math::\n\n   Softmax(X)_i=\\frac{e^{X_i-max(X)}}{\\sum_j e^{X_j-max(X)}}\n\nHere, X is a vector of attention scores in the context of self-attention, which corresponds to a row in the ``src`` tile in the above instruction pseudocode.\n\nXORWOW-based PRNG\n^^^^^^^^^^^^^^^^^\n\nThe NeuronCore-v4 VectorE provides hardware support to produce PRNG (pseudo-random) values using XORWOW as the underlying algorithm. Compared to the LFSR-based algorithm used in VectorE prior to NeuronCore-v4, XORWOW produces higher-quality random values. The NeuronCore-v4 VectorE can produce 4x 32-bit PRNG values per compute lane per engine cycle.\n\nIn addition, the NeuronCore-v4 VectorE introduces support for loading and storing XORWOW random states from and to SBUF (or PSUM), across all 128 compute lanes. Within each compute lane, four XORWOW random states are tracked to maintain the :doc:`nisa.rand2 </nki/api/generated/nki.isa.rand2>` instruction throughput, with each state comprising 6 ``uint32`` values. For more details, refer to the :doc:`nisa.rand_set_state </nki/api/generated/nki.isa.rand_set_state>` and :doc:`nisa.rand_get_state </nki/api/generated/nki.isa.rand_get_state>` API documentation. 
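\n\nAs an illustration only, the sketch below shows how a kernel might snapshot and later restore the per-lane random states; the call signatures shown here are assumptions, so refer to the linked API documentation for the exact arguments.\n\n.. code-block:: python\n\n   # Hypothetical sketch: snapshot the XORWOW states, draw random values, then\n   # restore the states so the same sequence can be reproduced later.\n   state = nisa.rand_get_state()   # 128 lanes x 4 states, 6 uint32 values each\n   ...                             # draw pseudo-random values with nisa.rand2\n   nisa.rand_set_state(state)      # subsequent draws repeat the same sequence\n\n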
This new state load/store capability in NeuronCore-v4 VectorE, which was not available in previous NeuronCore versions, allows users to save and restore random states for reproducible training runs more easily. \n\nScalar Engine\n--------------\n\nThe Scalar Engine is optimized for scalar computations in which every element of the output is dependent on one element of the input. The NeuronCore-v4 Scalar Engine delivers a total of 1.2 TFLOPS of FP32 computations and can support various input/output data types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32. The rest of this section describes new architectural features introduced in the NeuronCore-v4 Scalar Engine. \n\nPerformance mode\n^^^^^^^^^^^^^^^^^\n\nThe Trainium3 ScalarE now natively supports the :doc:`tensor_scalar </nki/api/generated/nki.isa.tensor_scalar>` and :doc:`tensor_copy </nki/api/generated/nki.isa.tensor_copy>` instructions (same as VectorE), and offers up to 2x performance uplift for BF16/FP16 datatypes, which is the same as the 2x performance mode on VectorE introduced with Trainium2. For those instructions, NKI users are able to select the execution engine, which can help offload either one of the engines, or load balance between them, depending on workload characteristics.\n\n.. _arch-trn3-activation2:\n\nMore flexible ``nisa.activation``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. note::\n   The Activation2 instruction does not have ``nki.isa`` API support at this time.\n\nTrainium3 introduces the Activation2 instruction, which provides more flexibility to users compared to the existing Activation instruction. Unlike Activation, which only supports the combination of scale multiplication and bias addition, Activation2 supports bias subtraction and allows users to disable scale multiplication and bias addition entirely. Further, while Activation only supported add as a reduce command, Activation2 supports add, max, min, absmax, and absmin reductions.\n\nData Movement and DMA updates\n------------------------------\n\n.. _arch-trn3-indirect-access:\n\nSBUF/PSUM indirect access\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. note::\n   SBUF/PSUM indirect access for compute engines does not have ``nki.isa`` API support at this time.\n\nThe NeuronCore-v4 SBUF/PSUM introduce a new indirect addressing mode for all compute engines (TensorE/VectorE/ScalarE/GpsimdE), which allows gathering or scattering SBUF and PSUM tensors along the free (F) dimension. Consider a tensor of shape [128, 512] located in SBUF, which occupies 128 partitions with 512 elements per partition. Suppose a user is interested in only accessing the elements 0, 128 and 384 along the free dimension across all 128 partitions for a single computation operation, such as ``nisa.nc_matmul``:\n\n.. _fig-indirect-access-1:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-12.png\n\nSince these three vectors do not have a uniform stride along the free dimension, the access pattern is not a tensorized pattern (i.e. a regular N-dimensional access pattern). Prior to NeuronCore-v4, such an access pattern would require three separate instructions (such as ``nisa.nc_matmul``) to perform the computation on all three vectors. \n\nIn NeuronCore-v4, all compute engines can perform a gather access pattern to directly access those three vectors in a single instruction:\n\n.. _fig-indirect-gather:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-13.png\n\n\nSimilarly, an indirect scatter operation allows any engine to scatter a set of vectors into a target tensor:\n\n.. 
_fig-indirect-scatter:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-14.png\n\nBoth styles of indirection use a separate offset tensor to encode which vectors to access. \n\nSBUF Read-Add-Write\n^^^^^^^^^^^^^^^^^^^^\n\n.. note::\n   SBUF Read-Add-Write does not have ``nki.isa`` API support at this time.\n\nNeuronCore-v4 introduces an enhanced SBUF capability that enables on-the-fly tensor accumulation near memory. This feature allows DMA engines to perform B+=A operations, where tensor B resides in SBUF and tensor A can be sourced from any accessible memory location (such as HBM or SBUF). Tensors A and B can be either BF16 or FP32 data types, but they must have a matching data type within a single DMA transfer performing the Read-Add-Write operation. This near-memory accumulation maintains the same throughput as standard DMA copy operations to SBUF (compared to 50% DMA throughput via DMA collective compute engines prior to NeuronCore-v4), enabling efficient in-place tensor updates without additional memory overhead.\n\nThe figure below illustrates the data flow that is used to enable this SBUF accumulation feature. First, a DMA unit transfers tensor A to the ReadAddWrite unit adjacent to the SBUF. The ReadAddWrite unit then retrieves tensor B from SBUF, performs the addition of A and B, and writes the result back to tensor B's original location in SBUF. \n\n.. _fig-read-add-write:\n\n.. image:: /nki/img/arch_images/nki-trn3-arch-15.png\n\n.. _arch-trn3-traffic-shaping:\n\nDMA Traffic Shaping\n^^^^^^^^^^^^^^^^^^^^\n\n.. note::\n   DMA Traffic Shaping does not have ``nki.isa`` API support at this time.\n\nTrainium3 DMA engines support Traffic Shaping, which enables configurable bandwidth allocation across different DMA operations. The DMA Traffic Shaping feature supports 4 distinct classes of service, enabling fine-grained control over the priorities of data movement. This capability is particularly beneficial when optimizing parallel computation and communication (collective operations) across multiple NeuronCores.\n"
  },
  {
    "path": "nki/guides/architecture/trainium_inferentia2_arch.rst",
    "content": ".. meta::\n    :description: Comprehensive guide to Trainium/Inferentia2 hardware architecture for NKI, covering compute engines, memory hierarchy, and optimization techniques for AWS Neuron SDK.\n    :keywords: Trainium, Inferentia2, NKI, AWS Neuron, Hardware Architecture\n    :date-modified: 12/01/2025\n\n.. _trainium_inferentia2_arch:\n\nTrainium/Inferentia2 Architecture Guide for NKI\n===============================================\n\nIn this guide, we will dive into hardware architecture of second-generation NeuronDevices: Trainium/Inferentia2.\nOur goal is to equip advanced Neuron users with sufficient architectural knowledge to write performant NKI kernels and\ntroubleshoot performance issues on NeuronDevices using :doc:`Neuron Explorer </nki/guides/use-neuron-profile>`,\na profiler tool designed specifically for NeuronDevices. This guide is also written assuming readers have read\nthrough :doc:`NKI Language Guide </nki/get-started/nki-language-guide>` and familiarized themselves with key NKI concepts.\n\n:numref:`Fig. %s <fig-arch-neuron-device-v2>` shows a block diagram of a Trainium and Inferentia2 device.\nAt a high level, both Trainium and Inferentia2 devices consist of:\n\n* 2 NeuronCores (v2).\n* 2 HBM stacks with a total device memory capacity of 32GiB and bandwidth of 820 GB/s.\n* 32 DMA (Direct Memory Access) engines to move data within and across devices.\n* 6 CC-Cores for collective communication.\n* 2 (Inferentia2) or 4 (Trainium) NeuronLink-v2 for device-to-device collective communication.\n\n\n.. _fig-arch-neuron-device-v2:\n\n.. figure:: /nki/img/arch_images/neuron_device2.png\n   :align: center\n   :width: 100%\n\n   Trainium/Inferentia2 Device Diagrams.\n\nThe rest of this guide will go into details of each compute engine in NeuronCore-v2 and supported data movement\npatterns across the memory hierarchy.\n\n.. _arch_sec_neuron_core_engines:\n\nNeuronCore-v2 Compute Engines\n-----------------------------\n\nIn this section, we will describe the architectural details within a NeuronCore-v2. The figure below is a simplified diagram\nof the compute engines and their connectivity to the two on-chip SRAMs: state buffer (SBUF) and partial sum buffer (PSUM).\n\n.. _fig-arch-neuron-core-v2:\n\n.. figure:: /nki/img/neuroncore_fig24.png\n   :align: center\n   :width: 60%\n\n   NeuronCore-v2 and its device memory (HBM).\n\nA NeuronCore-v2 consists of four heterogeneous compute engines (Tensor, Vector, Scalar, and GpSimd), each designed to accelerate different types of operators in modern machine learning models. Each compute engine has its own sequencer, which is responsible for instruction fetch, decode, and issue. The four compute engines execute four independent instruction streams asynchronously in parallel. Explicit synchronization to satisfy data dependencies between engines is handled through atomic semaphores in hardware. In NKI, programmers do not need to program engine synchronization manually. The Neuron Compiler can automatically insert the required synchronizations during compilation, based on data dependencies identified in the NKI kernel. \n\nThe instruction stream within each compute engine consists of both control and data-path instructions. Control instructions are executed directly by the engine sequencer and can perform scalar operations using a set of 32-bit scalar registers private to each sequencer. 
Examples of control instructions include register ALU operations for dynamic condition and address calculations, branching for control flow execution, and triggering DMA transfers. Data path instructions are executed by the specialized engine data path, which interacts with tensors in SBUF/PSUM. Data path instructions can handle flexible addressing and shapes by referencing values stored in scalar registers.\n\nWithin each NeuronCore, there is also a Sync Engine, which functions as an engine sequencer that can perform the same types of control instructions. The Sync Engine is most commonly used to trigger DMA transfers without interfering with compute engine instruction scheduling and ordering.\n\n\nIn addition, it is often useful to take engine data-path width and frequency into account when optimizing performance for\na multi-engine operator:\n\n  +------------------------+----------------+------------------------------------+-----------------------+\n  | Device Architecture    | Compute Engine | Data-path Width (elements/cycle)   | Frequency (GHz)       |\n  +========================+================+====================================+=======================+\n  |                        | Tensor         | 2x128 (input); 1x128 (output)      | 2.8                   |\n  |                        +----------------+------------------------------------+-----------------------+\n  |                        | Vector         |                                    | 1.12                  |\n  |                        +----------------+                                    +-----------------------+\n  | Trainium/Inferentia2   | Scalar         |   128 input/output                 | 1.4                   |\n  |                        +----------------+                                    +-----------------------+\n  |                        | GpSimd         |                                    | 1.4                   |\n  +------------------------+----------------+------------------------------------+-----------------------+\n\nMemory-wise, a NeuronCore-v2 consists of two software-managed on-chip SRAMs, a 24MiB SBUF as the main data storage and a\n2MiB PSUM as a dedicated accumulation buffer for Tensor Engine. Both SBUF and PSUM are considered two-dimensional memories\nwith 128 partitions each, i.e., one SBUF partitions has 192KiB of memory while one PSUM partition has 16KiB. We will cover\nmore details on data movements with SBUF/PSUM later :ref:`here <arch_sec_data_movement>`.\n\n\nThe rest of this section will cover the following topics for each compute engine:\n\n\n* Key functionalities.\n* Layout and tile size requirement for input and output tensors.\n* Best practices to achieve good performance on the engine.\n\n.. _arch_guide_tensor_engine:\n\nTensor Engine\n^^^^^^^^^^^^^\n\nTensor Engine (TensorE from now on) is specially designed to accelerate matrix-multiplications (matmuls), as well as other\noperators that can be executed using matrix multiplications such as 2D convolutions. 
We also note that TensorE can be used\nfor advanced data movement from SBUF to PSUM, including transposition and broadcast\n(more discussion below :ref:`here <arch_sec_tensor_engine_alternative_use>`).\nArchitecturally, the engine is built around a `systolic array <https://en.wikipedia.org/wiki/Systolic_array>`_ with\n128 rows and 128 columns of processing elements, which streams input data from SBUF and writes output to PSUM.\n\n**Data Types.** TensorE supports `BF16 <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_\\ ,\nFP16, `TF32 <https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/>`_\\\n, and cFP8 input matrix data types at a maximum throughput of 92 TFLOPS, as well as 23 TFLOPS for FP32 inputs. TensorE performs\nmixed-precision calculations, with accumulations at FP32 precision. Therefore, the output data of a TensorE calculation\nis always in FP32.\n\n**Layout.** To understand the layout and tiling constraints of TensorE, let's visualize its connection to SBUF\nand PSUM as below. Note, the PSUM partition dimension is purposely rotated 90 degrees compared to the SBUF partition dimension\ndue to the systolic array data flow.\n\n\n.. _fig-arch-tensor-engine:\n\n.. figure:: /nki/img/arch_images/tensor_engine.png\n   :align: center\n   :width: 80%\n\n   Tensor Engine and SRAM Connectivity.\n\nAs shown in the diagram above, TensorE must **read** input matrices from **SBUF** and **write** output matrices to **PSUM**.\nPSUM also allows near-memory accumulation of multiple matrix multiplication output tiles (detailed usage discussed\n:ref:`here <arch_sec_accumulation_psum>`).\n\nIn NKI, to perform a multiplication of two matrices, ``x[M, K]`` and ``y[K, N]``, you may invoke the high-level API\n``nki.language.matmul(x, y)`` directly. The returned tile has a shape of ``[M, N]`` as expected. At the hardware level,\nTensorE requires both input tiles to have the **contraction dimension** ``K`` in the SBUF partition\ndimension, that is, the first dimension of input shapes (:ref:`Tiling Layout <nki-tile-layout>`).\nThis ISA requirement is reflected in the low-level API :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>`,\nwhich takes ``stationary`` and ``moving`` matrices as input parameters. Therefore, ``nki.language.matmul(x, y)`` is, under the hood, a two-step computation:\ninvoking ``nki.isa.nc_transpose(x)`` to get the ``stationary`` input and then ``nki.isa.nc_matmul(stationary, y)`` to get the final result.\nIn other words, ``nki.isa.nc_matmul(stationary[K,M], moving[K,N])`` performs a ``stationary.T @ moving`` calculation, which will result\nin an output with dimensions ``[M,N]``.\n\nFor every ``nki.isa.nc_matmul(stationary, moving)`` call, TensorE executes two distinct Neuron ISA instructions:\n\n* LoadStationary (short for LS): This instruction loads the ``stationary`` matrix from SBUF and caches it in the internal storage of TensorE.\n* MultiplyMoving (short for MM): This instruction loads the ``moving`` matrix from SBUF and multiplies ``moving`` across the pre-loaded\n  ``stationary`` matrix from the previous LoadStationary instruction. The output of this instruction is the\n  output of the ``nki.isa.nc_matmul`` call written to PSUM.\n\nWith the above instruction sequence, we as NKI programmers effectively map input tile ``stationary`` as the stationary tensor\nand input tile ``moving`` as the moving tensor for TensorE. 
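\n\nAs a minimal sketch (tile names and shapes here are illustrative only), the two-step computation described above may\nlook like the following, where ``x`` is an ``[M, K]`` tile and ``y`` is a ``[K, N]`` tile in SBUF (tile-size limits are discussed below):\n\n.. code-block::\n\n   # Step 1: transpose x so that the contraction dimension K lands on the partition dimension.\n   xT = nki.isa.nc_transpose(x)       # shape [K, M]\n   # (in a real kernel the transposed tile may first need to be placed back in SBUF\n   #  before it can feed the matmul; that detail is omitted in this sketch)\n   # Step 2: stationary=xT, moving=y; the [M, N] result is written to PSUM.\n   result = nki.isa.nc_matmul(xT, y)\n\n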
As a rule-of-thumb for layout analysis, the **free** axis of the\n**stationary** tensor always becomes the partition (first) axis of the output tile, while the **free** axis of the\n**moving** tensor becomes the free axis of the output. :numref:`Fig %s <fig-arch-matmul>` below visualizes this concept\nby showing a matrix multiplication in both mathematical and TensorE views.\n\n.. _fig-arch-matmul:\n\n.. figure:: /nki/img/arch_images/matmul.png\n   :align: center\n   :width: 100%\n\n   MxKxN Matrix Multiplication Visualization.\n\nHowever, programmers are also free to map ``stationary`` tile to the moving tensor instead, which would lead to the same output tile\nbut transposed: ``nki.isa.nc_matmul(moving[K,N], stationary[K,M]) = moving.T @ stationary = outputT[N, M]``. In fact, mapping high-level input tiles\nto the low-level stationary/moving tensors in TensorE is an important layout decision that NKI programmers should consider\nto minimize data transposes. Programmers should make this decision based on layout requirements imposed\nby the compute engine that is going to consume the matrix multiplication output. See NKI Performance Guide\nfor more discussion.\n\n.. _arch_matmul_tile_size:\n\n**Tile Size.** The ``nki.isa.nc_matmul`` API enforces the following constraints on the input/output tile sizes:\n\n#. ``stationary`` tensor free axis size (\\ ``stationary_fsize``\\ ) must never exceed 128, due to the number of PE columns in TensorE.\n#. ``stationary/moving`` tensor partition axis size (\\ ``stationary_psize/moving_psize``\\ ) must never exceed 128, due to the number of PE rows and\n   also the number of SBUF partitions.\n#. ``moving`` tensor free axis size (``moving_fsize``) must never exceed 512, due to the fact that each ``nc_matmul`` can only write\n   to a single PSUM bank, which can only hold 512 FP32 elements per PSUM partition.\n\nWhen the shapes of the input matrices defined in the user-level operator exceed any of the above tile size limitation, we\nmust tile the input matrices and invoke multiple ``nki.isa.nc_matmul`` calls to perform the matrix multiplication. Exceeding\nthe ``stationary_fsize`` (#1) or ``moving_fsize`` (#3) tile limitations for M or N should lead to fully independent ``nki.isa.nc_matmul``\nwith disjoint output tiles. However, when ``K`` exceeds the ``stationary_psize/moving_psize`` limit, we need to tile the input matrices\nin the contraction dimension and invoke multiple ``nki.isa.nc_matmul`` to accumulate into the *same* output buffer in PSUM.\nRefer to the :ref:`Tiling Matrix Multiplications <tutorial_matmul_tiling>`\ntutorial for a NKI code example.\n\n.. _arch_sec_tensor_engine_alternative_use:\n\n**Alternative Use Case**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOne interesting use case of TensorE is low-latency data reshape within NeuronCore, which typically involves multiplying\na matrix to be reshaped with a compile-time constant matrix filled with zeros and ones.\n\nAs an example, we can perform a 128x128 matrix transposition (i.e., swap the free and partition axis of the matrix) using\n``nki.isa.nc_matmul(transpose_input, identity)``\\ , where ``transpose_input`` is the matrix to be transposed and\n``identity`` is a 128x128 identity matrix. In fact, this is exactly what nki.isa.nc_transpose() does, when TensorE is chosen\nas the compute engine.\n\n.. _fig-arch-mm-transpose:\n\n.. 
figure:: /nki/img/arch_images/mm_transpose.png\n   :align: center\n   :width: 80%\n\n   Transposition.\n\nSimilarly, we can broadcast a vector occupying a single partition to M (M <= 128) partitions using ``nki.isa.nc_matmul(ones,\nbroadcast_input, is_stationary_onezero=True)``\\ , where ``ones`` is a 1xM vector filled with ones and ``broadcast_input`` is\nthe vector to be broadcast. In fact, NKI invokes such matmul under the hood when ``broadcast_input.broadcast_to((M, broadcast_input.shape[1]))``\nis called.\n\n.. _fig-arch-mm-broadcast:\n\n.. figure:: /nki/img/arch_images/mm_broadcast.png\n   :align: center\n   :width: 80%\n\n   Partition Broadcast.\n\nIn general, we can achieve many more complex data reshapes in TensorE, such as shuffling partitions of a SBUF tensor, by\nconstructing appropriate zero/one patterns as one of the matmul inputs.\n\nFinally, we can also leverage TensorE for data summation across SBUF partitions (P-dim summation). For example, a vector\nlaid out across SBUF partitions can be reduced into a single sum using TensorE as shown in the diagram below. Note, this\nutilizes only a single PE column of the TensorE; therefore, depending on the surrounding operators, this may not be the\nbest use of TensorE. If you can do summation within each partition (F-dim summation), see\n:doc:`nki.isa.tensor_reduce </nki/api/generated/nki.isa.tensor_reduce>`\nfor an alternative reduction implementation on Vector Engine. It is recommended to choose the engine based on the natural\nlayout of your input data to avoid any transpositions.\n\n.. _fig-arch-mm-cross-partition:\n\n.. figure:: /nki/img/arch_images/mm_cross_partition.png\n   :align: center\n   :width: 60%\n\n   Cross-Partition Accumulation\n\nAs TensorE is the most performant compute engine of the NeuronCore in terms of FLOPS, the goal is to have it execute meaningful\ncomputation at high utilization as much as possible. The above “alternative use cases” stop TensorE from performing *useful*\ncomputations at *high* throughput and therefore, should generally be avoided. However, there are situations where it is\nadvisable to use them:\n\n\n* Operators that do not require heavy matmuls anyhow, e.g. normalization, softmax.\n* Layout conflicts between producer and consumer engines where broadcast/transpose are absolutely unavoidable (see example\n  in fused attention tutorial).\n\n.. _arch_guide_tensor_engine_perf:\n\n**Performance Consideration**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAs a rule of thumb, TensorE can achieve the best throughput when it runs many back-to-back ``nki.isa.nc_matmul`` with both\ninput matrices at the largest possible tiles sizes (``stationary`` is 128x128 and ``moving`` is 128x512). In this ideal\nscenario, TensorE sees the below instruction sequence:\n\n\n* ``LoadStationary (LS[0])`` (128x128)\n* ``MultiplyMoving (MM[0])`` (128x512)\n* ``LoadStationary (LS[1])`` (128x128)\n* ``MultiplyMoving (MM[1])`` (128x512)\n* ...\n\n**Cost Model:** TensorE is a deeply pipelined engine; therefore, the engine can have several ``LS&MM`` instruction pairs\nin-flight at a given time. Due to this pipelining nature, it is often *not* useful to use end-to-end execution *latency*\nof a single instruction when estimating the instruction cost. Instead, we can focus on the **initiation interval** of\nsuch instructions, that is, the number of cycles between successive instruction launches. 
Therefore, we can estimate the\ncost of an instruction ``I`` by how soon TensorE can issue the next instruction after ``I``.\n\nFor the sake of discussion, let's assume we have many back-to-back ``MM`` instructions with BF16/FP16/TF32/cFP8 input data\ntype that reuse a single pre-loaded ``stationary`` inside TensorE. The initiation interval between subsequent MM instructions in\nthis case is roughly ``max(N, MM_INIT_LATENCY)``\\ , where ``MM_INIT_LATENCY`` is 64 TensorE cycles on NeuronCore-v2, and ``N`` is the\nfree axis size of ``moving`` of the current ``MM`` (typically set to 512). For FP32 input data type,\nthe instruction cost is roughly 4x higher than BF16/FP16/TF32/cFP8. Therefore, whenever possible, we recommend down-casting\nFP32 input matrix data type to one of BF16/FP16/TF32/cFP8 before performing matrix multiplications.\n\nThe figure below visualizes two pipelined ``MM`` instructions:\n\n.. _fig-arch-mm-pipeline:\n\n.. figure:: /nki/img/arch_images/mm_pipeline.png\n   :align: center\n   :width: 90%\n\n   Pipelined MultiplyMoving instructions.\n\n**Background LoadStationary:** In typical workloads, TensorE would be alternating between LS and MM instructions with different\ninput matrices. In order to optimize TensorE's utilization, we also enable a \"background LoadStationary\" capability, which\nallows loading of the next stationary tensor in parallel to the computation on the current stationary tensor.\n\nAs a result, depending on the relative sizes of the ``stationary`` and ``moving`` matrices, the overall\nTensorE performance can be bounded by either ``LS`` or ``MM`` instructions. The figure below visualizes these two cases. In\nthe ideal scenario where ``stationary`` and ``moving`` use the largest tile sizes, TensorE should operate in case (a).\n\n.. _fig-arch-mm-bottlenecks:\n\n.. figure:: /nki/img/arch_images/mm_bottleneck.png\n   :align: center\n   :width: 70%\n\n   Possible execution timelines with background LoadStationary.\n\n**Fast LoadStationary:** Since ``LoadStationary`` is a pure data movement with no computation, TensorE can perform ``LoadStationary``\n**up to 4x** faster than a ``MultiplyMoving`` with the same free axis size. Fast ``LoadStationary`` has an important performance\nimplication on ``nki.isa.nc_matmul``\\ : When one of the input matrices has a small free axis size and the other has a large\nfree axis size, we prefer to use the matrix with the large free axis as the ``stationary`` matrix. For example, if we\ntry to do a vector-matrix multiplication, it is recommended to use the matrix as the ``stationary`` input and the vector as the ``moving``\ninput to get the best performance out of TensorE.\n\n.. _arch_guide_vector_engine:\n\nVector Engine\n^^^^^^^^^^^^^\n\nVector Engine (VectorE) is specially designed to accelerate vector operations where every element in the output tensor typically\ndepends on multiple elements from input tensor(s), such as vector reduction and element-wise operators between two tensors.\nVectorE consists of 128 parallel vector lanes, each of which can stream data from a SBUF/PSUM partition, perform mathematical\noperations, and write data back to each SBUF/PSUM partition in a deeply pipelined fashion.\n\n**Data Types.** VectorE supports all NKI data types (see :ref:`supported data types in NKI <nki-dtype>` for details)\nin both input and output tiles. :ref:`Arithmetic operations <nki-aluop>`\nare performed in FP32, with automatic zero-overhead input and output casting to and from FP32. 
Refer to ``nki.isa`` API\nreference manual for any instruction-specific data type requirements.\n\n**Layout & Tile Size.** VectorE instructions expect the parallel axis of the input and output data to be mapped to the partition dimension. For\nexample, the figure below shows reduction add of a NxM matrix along the M dimension. Since each of N rows in the matrix\ncan be reduced in parallel, the N dimension of the matrix should be mapped to the SBUF partition dimension. Refer to the\n:doc:`nki.isa API manual </nki/api/nki.isa>` for\ninstruction-specific layout constraint of different VectorE instructions.\n\n\n.. _fig-arch-vector-engine-reduce:\n\n.. figure:: /nki/img/arch_images/vector_engine_reduce.png\n   :align: center\n   :width: 60%\n\n   Reduce add on Vector Engine.\n\nIn terms of tile size, the majority of VectorE instructions only have limitation on the input/output tile partition dimension\nsize which must not exceed 128, while the free dimension size can be up to 64K elements for SBUF or 4K elements for PSUM.\nHowever, there are a few notable exceptions, such as :doc:`nki.isa.bn_stats </nki/api/generated/nki.isa.bn_stats>`\nwhich further imposes free dimension size of input tile cannot exceed 512. Refer to the `nki.isa API manual <nki.language>`\nfor instruction-specific tile size constraints.\n\n.. _arch_guide_cross_partition_data_movement:\n\nCross-partition Data Movement\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe VectorE also supports a limited set of cross-partition data movement within each group of 32 partitions. The figure\nbelow shows connectivity between SBUF and VectorE banks. VectorE consists of four Reshape and Compute banks: each Reshape\nBank connects to 32 SBUF/PSUM partitions and outputs 32 parallel streams of data, while each Compute Bank can process 32\nparallel data streams using 32 vector lanes. The Compute Bank can write back to 32 SBUF/PSUM partitions.\n\n\n.. _fig-arch-vector_cross_partition:\n\n.. figure:: /nki/img/arch_images/vector_engine_cross_partition.png\n   :align: center\n   :width: 90%\n\n   Vector Engine reshape and compute banks.\n\nThe Reshape Bank supports the following data movement:\n\n\n#. *32x32 transpose*\\ : Each Reshape Bank can read in 32 elements per SBUF/PSUM partitions and transpose the partition and\n   free dimension of the incoming 32x32 matrix. This can be invoked by :doc:`nki.isa.nc_transpose </nki/api/generated/nki.isa.nc_transpose>`\n   API by selecting VectorE as the execution engine.\n#. *32 partition shuffle*\\ : Each Reshape Bank can take an arbitrary *shuffle mask*\n   ``SM``\\ * of length 32. The integer value of ``SM[i]`` indicates the source partition ID (modulo 32) that the Reshape Bank\n   output stream ``i`` will get. For example, we can broadcast partition[0] to partition[0-31] using a SM of 32 zeros.\n   This can be invoked by :doc:`nki.isa.nc_stream_shuffle </nki/api/generated/nki.isa.nc_stream_shuffle>` API.\n\nRefer :ref:`here <arch_sec_cross_partition_connect>`\nlater in this doc for cross-bank data movement.\n\n.. _arch_sec_vector_engine_perf:\n\n**Performance Consideration**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**128 Parallel Compute Lanes:** VectorE can perform computation with all 128 vector lanes in parallel, with each lane streaming\ndata from/to one SBUF/PSUM partition. 
Therefore, the performance cost of a VectorE instruction using all 128 lanes is the\nsame as an instruction that uses fewer than 128 lanes.\n\nAs a result, we recommend that NKI developers maximize the compute lanes used per VectorE instruction, that is, the partition\naxis size of input/output tiles of a single ``nki.isa`` or ``nki.language`` compute API call. When the partition axis size\nof input tiles is inevitably fewer than 128 partitions due to high-level operator definition, we could adopt an optimization\ncalled “partition vectorization” by packing multiple “small” VectorE instructions of the same operation into a single “large”\nVector instruction. Refer to the NKI Performance Guide for a more detailed discussion of this optimization.\n\n**Cost Model:** In the most common cases where the free axis size (\\ ``N``\\ ) of the input tile(s) is sufficiently large\n(\\ ``N > 128``\\ ), the execution cost of an instruction on VectorE is correlated to ``N``\\ :\n\n\n* If there is only one input tile, most VectorE instructions can execute in roughly ``N`` cycles (example:\n  :doc:`nki.isa.tensor_scalar </nki/api/generated/nki.isa.tensor_scalar>`)\n* If there are two input tiles, the instruction can execute in roughly ``2N`` cycles (example: ``nki.isa.tensor_tensor``)\n\n\nThere are a few exceptions to the above rule, depending on the data types and instruction type. See\n:doc:`NKI ISA API doc </nki/api/nki.isa>`\nfor instruction-specific instruction cost details.\n\nIn the rare cases where VectorE is running many back-to-back instructions either with ``N << 128`` or with every instruction\ndepending on the output tile of the previous instruction, we need to add a static instruction overhead of 100 engine cycles\nto the above execution cost estimate.\n\nThe above rules are for general guidance only. To find out the exact instruction costs for your NKI kernel, you may capture\na detailed instruction execution trace on device using :doc:`neuron-profiler </nki/guides/use-neuron-profile>`.\n\n\nScalar Engine\n^^^^^^^^^^^^^\n\nScalar Engine (ScalarE) is specially designed to accelerate scalar operations where every element in the output tensor only\ndepends on one element of the input tensor. In addition, ScalarE provides hardware acceleration to evaluate non-linear functions\nsuch as Gelu and Sqrt. The currently supported set of non-linear functions is listed :ref:`here <nki-act-func>`.\nIt is worth noting that we can support any new non-linear functions on ScalarE as they come up in new ML model architectures\nthrough Neuron SDK software updates. Similar to VectorE, ScalarE consists of 128 parallel lanes, each of which can stream\ndata from a SBUF/PSUM partition, perform mathematical operations, and write data back to each SBUF/PSUM partition in a deeply\npipelined fashion.\n\n**Data Types.** ScalarE supports all NKI data types (see :ref:`supported data types in NKI <nki-dtype>` for details)\nin both input and output tiles. All internal computation is performed in FP32,\nwith automatic zero-overhead input and output casting to and from FP32.\n\n**Layout & Tile Size.** ScalarE typically evaluates scalar operations (such as ``nki.language.gelu``), which does not impose\nany input/output tile layout constraints. 
However, there are additional hardware features in ScalarE that will have layout\nconstraints similar to VectorE (more discussion later).\n\nIn terms of tile size, ScalarE instructions only have a limitation on the input/output tile partition dimension size, which\nmust not exceed 128, while the free dimension size can be up to 64K elements for SBUF or 4K elements for PSUM.\n\n.. _arch_sec_scalar_pipelined_fma:\n\nPipelined Multiply-Add\n~~~~~~~~~~~~~~~~~~~~~~\n\nEach ScalarE compute lane also supports an additional multiply-add **before** the non-linear function (\\ ``func``\\ ) is applied\nin a pipelined fashion. Mathematically, ScalarE implements:\n\n.. code-block::\n\n   # Case 1: scale is a SBUF/PSUM vector\n   # Input: 2D in_tile, 1D scale, 1D bias\n   # Output: 2D out_tile\n   for lane_id in range(in_tile.shape[0]):\n       for k in range(in_tile.shape[1]):\n           out_tile[lane_id][k] = func(in_tile[lane_id][k] * scale[lane_id]\n                                       + bias[lane_id])\n\n   # Case 2: scale is a compile-time scalar constant in the instruction\n   for lane_id in range(in_tile.shape[0]):\n       for k in range(in_tile.shape[1]):\n           out_tile[lane_id][k] = func(in_tile[lane_id][k] * scale\n                                       + bias[lane_id])\n\nThis functionality can be invoked using the :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>`\nAPI by specifying a ``scale`` for multiplication and ``bias`` for addition. The scale can either be a tile from SBUF/PSUM\nwith one element/partition or a compile-time constant. On the other hand, the bias can only be a tile from SBUF/PSUM with\none element/partition. A useful mental model for this capability is combining a :doc:`nki.isa.tensor_scalar </nki/api/generated/nki.isa.tensor_scalar>`\ninstruction with a non-linear function evaluation into a single instruction (a 2x speed-up over two separate instructions).\n\nPipelined Reduction\n~~~~~~~~~~~~~~~~~~~~~~\n\nEach ScalarE compute lane also supports reduction **after** the non-linear function (\\ ``func``\\ ) is applied\nin a pipelined fashion. On NeuronCore-v2, the reduction operator can only be addition.\n\nMathematically, ScalarE with accumulation enabled implements:\n\n.. code-block::\n   :emphasize-lines: 7\n\n   # Input: 2D in_tile, 1D scale (similarly for scalar scale), 1D bias\n   # Output: 2D out_tile, 1D reduce_res\n   for lane_id in range(in_tile.shape[0]):\n       for k in range(in_tile.shape[1]):\n           out_tile[lane_id][k] = func(in_tile[lane_id][k] * scale[lane_id]\n                                       + bias[lane_id])\n           reduce_res[lane_id] += out_tile[lane_id][k]\n\nThis functionality can be invoked using the :doc:`nki.isa.activation_reduce </nki/api/generated/nki.isa.activation_reduce>`\nAPI by specifying ``reduce_op`` as ``nki.language.add`` and ``reduce_res`` as\nthe output reduction tile, passed by reference.\n\nA useful mental model for this capability is combining a :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>`\ninstruction with a :doc:`nki.isa.tensor_reduce </nki/api/generated/nki.isa.tensor_reduce>` into a single API,\nwhich returns results from **both** APIs. Note,\n:doc:`nki.isa.activation_reduce </nki/api/generated/nki.isa.activation_reduce>`\ninvokes two back-to-back ISA instructions on hardware, `Activate` and `ActReadAccumulator`. The `Activate` instruction\nperforms the regular computation as specified in :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>` and also\nreduction at no additional cost. 
The reduction result is cached inside ScalarE after `Activate`.\nThe `ActReadAccumulator` instruction is a low cost (roughly 64 ScalarE cycles on NeuronCore-v2)\ninstruction to write the internal reduction result back to SBUF/PSUM, one element per partition.\n\nPerformance Consideration\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAll the performance notes discussed for :ref:`Vector Engine <arch_sec_vector_engine_perf>`\nearlier are applicable to Scalar Engine, with one exception regarding instruction cost for two input tensors - ScalarE can\nonly read up to one input tensor per instruction.\n\n**Instruction Combination.** All ``nki.isa.activation`` instructions have the same execution cost, regardless of whether\nwe enable the scale multiplication or bias add. Therefore, it is recommended to combine such multiply-add operations with\nnon-linear function evaluation into a single ScalarE instruction if the computation allows it. This is highly useful for\nML operators that are **not** TensorE heavy (not matmul-bound). Softmax is one such example, where we typically subtract\nthe maximum value of the input elements before evaluating exponential function for numerical stability.\n\nGpSimd Engine\n^^^^^^^^^^^^^\n\nGpSimd Engine (GpSimdE) is intended to be a general-purpose engine that can run any ML operators that cannot be lowered\nonto the other highly specialized compute engines discussed above efficiently, such as applying a triangular mask to a tensor.\n\n\nA GpSimdE consists of eight fully programmable processors that can execute arbitrary C/C++ programs. Therefore, this engine\nprovides the hardware support for `Neuron Custom Operator. <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/custom-c%2B%2B-operators-devguide.html>`_\nIn addition, each processor is a 512-bit vector machine that can run high-performance vectorized kernels. Every  ``nki.isa``\nAPI running on GpSimdE such as :doc:`nki.isa.iota </nki/api/generated/nki.isa.iota>`\nuses a vectorized kernel implementation that Neuron engineers hand-tune for the underlying processor ISA.\n\n**Data Types.** Each processor in GpSimd supports vectorized computation for\n\n\n* 16x FP32/INT32/UINT32, or\n* 32x FP16/INT16/UINT16, or\n* 64x INT8/UINT8\n\nThis is in contrast to ScalarE/VectorE which can only perform arithmetic operations in FP32. However, if the GpSimdE program\nchooses to, it can also access SBUF data of any :ref:`supported data types in NKI <nki-dtype>`\nand perform data casting to- and from-FP32 at no throughput cost similar to VectorE/ScalarE.\n\n**Layout & Tile Size.** The layout and tile size requirements of GpSimdE highly depend on semantics of the exact instruction.\nRefer to the :doc:`nki.isa API reference guide </nki/api/nki.isa>`\nfor these requirements.\n\n**Memory Hierarchy.** In Trainium/Inferentia2, each GpSimdE processor has 64KB of local data RAM, also called tightly-coupled\nmemory (TCM) as discussed in `Neuron Custom Operator <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/custom-c%2B%2B-operators-devguide.html>`_.\nThe TCM is configured with a 3-cycle access latency and 512-bit data width. Therefore, TCM is often used to store intermediate\ncomputation results within a Neuron Custom Operator or GpSimdE instruction.\n\nThe eight processors in GpSimdE also have a high-bandwidth read/write interface connected to the SBUF.\n:numref:`Figure %s <fig-gpsimd-sbuf-connectivity>` below illustrates the GpSimdE connectivity to SBUF. 
Each processor connects\nto 16 SBUF partitions for both reading and writing: processor[0] connected to partition[0:15], processor[1] to partition[16:31]\nand so on. Each processor can programmatically send tensor read/write requests to SBUF to access data from the connected\npartitions. On the read side, once a read request is processed, the tensor read interface can deliver up to 512-bit of data\nfrom all 16 connected partitions collectively (up to 32-bit per partition) to the processor per cycle, which matches the\n512-bit SIMD width. Similarly, on the write side, the tensor write interface can accept 512-bit of data for writing back\nto the connected SBUF partitions per cycle.\n\n.. _fig-gpsimd-sbuf-connectivity:\n\n.. figure:: /nki/img/arch_images/gpsimd-sbuf-connectivity.png\n   :align: center\n   :width: 60%\n\n   Connectivity between GpSimdE and SBUF.\n\n**Performance Consideration**\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**128 Parallel Compute Lanes:** Similar to VectorE and ScalarE, GpSimdE has 128 parallel compute lanes for 32-bit computation\ndata types across SIMD lanes of all eight processors. Therefore, it is desirable to invoke GpSimdE instructions that will\nutilize all the parallel compute lanes, typically through accessing all 128 SBUF partitions for input and output. In addition,\nsince each processor can also handle 32-wide 16-bit or 64-wide 8-bit data type computation, GpSimdE can effectively support\n256 or 512 parallel compute lanes internally.\n\n**Cost Model:** Unlike VectorE/ScalarE, there is no rule-of-thumb to estimate execution cost of a GpSimdE instruction. Refer\nto the :doc:`nki.isa </nki/api/nki.isa>`\nAPI reference manual to find out instruction-specific latency estimates.\n\n.. _arch_sec_data_movement:\n\nData Movement\n-------------\n\nIn this section, we will dive into the memory subsystem and discuss how to perform data movement between different memories\nand also how to do it efficiently. As a reminder, there are three main types of memory on a NeuronDevice: HBM, SBUF, and\nPSUM, from highest to lowest capacity. Figure below shows the specifications of these memories and their connectivity\nfor one NeuronCore-v2:\n\n.. _fig-arch-memory-hierarchy:\n\n.. figure:: /nki/img/arch_images/memory_hierarchy.png\n   :align: center\n   :width: 60%\n\n   Memory hierarchy.\n\nAs shown in the above figure, data movement between HBM and SBUF is performed using on-chip DMA\n(Direct Memory Access) engines, which can run in\nparallel to computation within the NeuronCore. Data movement between PSUM and SBUF is done through ISA instructions on the\ncompute engines. However, different compute engines have different connectivity to SBUF/PSUM as indicated by the arrows\nin the figure. In addition, NeuronCore-v2 has the following restrictions:\n\n\n#. VectorE and GpSimdE cannot access SBUF in parallel.\n#. VectorE and ScalarE cannot access PSUM in parallel.\n\nTherefore, VectorE and GpSimdE instructions that access SBUF must be serialized, similarly for VectorE and ScalarE instructions\nthat access PSUM. 
This is enforced by Neuron Compiler during NKI kernel compilation, so NKI developers are not required\nto program such serializations.\n\nThe rest of this section will discuss the following topics in detail:\n\n\n* Data movement between HBM and SBUF using DMAs.\n* Accessing SBUF/PSUM tensors using compute engines.\n* In-memory accumulation using TensorE and PSUM.\n\nData movement between HBM and SBUF using DMAs\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEach NeuronCore-v2 is equipped by 16 parallel DMA engines that can perform data movement between any addressable\nmemories in the system. Here, we focus on using these DMA engines to move data between the local SBUF and HBM.\nEach DMA engine can process one **DMA transfer** at a time driving a peak bandwidth of 27 GiB/s, but all DMA engines\ncan process different DMA transfers in parallel.\n\nEach DMA transfer can gather a list of source **DMA buffers** and then scatter the data into another list of destination\nDMA buffers. Data within a DMA buffer must be continuous in the memory address map. There is some performance overhead\nat both DMA buffer and transfer levels, both of which can be amortized by moving a sufficiently\nlarge amount of data (more discussion below).\n\nNext, let's examine how HBM and SBUF are laid out in the device memory address map. On one hand,\nHBM is logically a one-dimensional memory and hence occupies a flat chunk of continuous addresses in the\naddress map. In the most common cases, an HBM tensor in NKI is also contiguous in the HBM address space.\n\nOn the other hand, SBUF is considered a two-dimensional memory with 128 partitions as discussed earlier :ref:`here <arch_sec_neuron_core_engines>`.\n:numref:`Figure %s <fig-arch-sbuf-addr-space>`\nshows how SBUF addresses fit in the device\naddress map. ``sbuf_base_addr`` is a 64-bit address dependent\non which NeuronCore-v2 on the device the SBUF is located in. The SBUF addresses start from the first byte of partition 0,\nincrement along the free dimension first and then advance onto the next partition.\n\n\n.. _fig-arch-sbuf-addr-space:\n\n.. figure:: /nki/img/arch_images/sbuf_addr_space.png\n   :align: center\n   :width: 80%\n\n   SBUF memory address space.\n\nAs discussed in :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`,\nan SBUF tensor in NKI spans one or more partitions, with data starting at the same offset:\n\n.. _fig-arch-sbuf-tensor:\n\n.. figure:: /nki/img/pm-layout.png\n   :align: center\n   :width: 80%\n\n   SBUF tensor.\n\nAs a result, a data movement involving ``tensor`` in SBUF will require at least ``tensor.shape[0]``, i.e., P dim size,\ndifferent DMA buffers, since slices of tensor data from different SBUF partitions occupy non-contiguous memory\nin the address space. If the tensor data slice within each SBUF partition is not contiguous in the F dimension,\nmore DMA buffers will need to be unrolled along the F dim. These DMA buffers are typically grouped into different\nDMA transfers so that multiple DMA engines can participate in the data movement to maximize memory bandwidth utilization.\n\nIn NKI, moving data from HBM to SBUF and from SBUF to HBM are done with calls to the :doc:`nki.isa.dma_copy </nki/api/generated/nki.isa.dma_copy>` API. Neuron Compiler is responsible for converting each NKI API call to DMA transfers and\nassigning these transfers to different DMA engines. 
As an example, loading a 128x512 FP32 HBM tensor to SBUF is best\ndone through 16 DMA transfers (one per DMA engine), each moving a scatter-gather list of 8 DMA buffers:\n\n.. code-block::\n\n   import nki.language as nl\n   tile = nl.load(in_tensor[0:128, 0:512])\n\nTo achieve good performance out of the DMAs, we generally aim to:\n\n#. Move a large amount of contiguous data in each DMA buffer to amortize DMA buffer overhead\n#. Move a large amount of data in each DMA transfer to amortize DMA transfer overhead.\n#. Invoke as many parallel DMA transfers on the available DMA engines as possible.\n\nThese goals ultimately boil down to a quick optimization rule: maximize **both free (4KiB or above) and partition\n(ideally 128) dimension sizes** when moving tensors between SBUF and HBM using ``nki.language.load``\nand ``nki.language.store``. Refer to the\n:doc:`NKI Performance Guide </nki/deep-dives/nki_perf_guide>` for more information\non optimizing performance of data movements between HBM and SBUF.\n\nAccessing SBUF/PSUM tensors using compute engines\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n:numref:`Figure %s <fig-arch-data-streaming>` shows a simplified timeline of how compute engines\n**stream** data in and out of on-chip SRAM (SBUF or PSUM).\nRefer to :numref:`Figure %s <fig-arch-neuron-core-v2>` for the available connectivity between engines and SBUF/PSUM.\nAt a high level, the compute engines are able to pipeline\ndata reads, computation and writes along the F dimension of the src/dst tensors.\nIn every cycle, each engine can read 128 elements across 128 SBUF/PSUM partitions,\nperform a computation on previously\nread 128 elements, and write 128 previously computed results to SBUF/PSUM.\nIn other words, the P axis of a tensor\nis the *parallel* dimension for SBUF/PSUM data accessing, while the F axis of the tensor is the *time* dimension for data\naccessing.\n\n.. _fig-arch-data-streaming:\n\n.. figure:: /nki/img/arch_images/data_streaming.png\n   :align: center\n   :width: 80%\n\n   Data streaming between SBUF and compute engine.\n\nWhen accessing SBUF/PSUM tensors in an instruction, we need to follow different rules in the P and F dimensions. First,\nhardware does not allow P dimension striding when accessing data from a single SBUF/PSUM tensor. Therefore, a valid src/dst\ntensor of an instruction must occupy a continuous number of partitions. In addition, the hardware further enforces which\npartition a tensor can start from (\\ ``start_partition``\\ ) based on the number of partitions the tensor occupies (\\ ``num_partition``\\\n). This is currently handled by the tensor allocator in Neuron Compiler during NKI kernel compilation process:\n\n\n* If ``64 < num_partition <= 128``\\ , ``start_partition`` must be 0\n* If ``32 < num_partition <= 64``\\ , ``start_partition`` must be 0 or 64\n* If ``0 < num_partition <= 32``\\ , ``start_partition`` must be one of 0/32/64/96\n\nOn the other hand, data accessing along the free dimension is a lot more flexible: the src/dst tensor of an engine\ninstruction can support up to four-dimensional tensorized access pattern with a stride in each dimension\nwithin each partition. At the ISA level,\neach F axis in the tensor can have a size expressed in ``uint16`` and a stride expressed in ``int16``\\ , measured in data elements.\nAs an example, if the tensor data type is BF16, and the stride of the most-minor F dimension is set to 10, then we will\nstride across 20B within a partition at a time. 
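\n\nTo make this concrete, the pseudo-code below (illustrative only) spells out which element offsets a strided access\nwith a most-minor F dimension stride of 10 would touch within each active partition:\n\n.. code-block::\n\n   # src tile occupies P partitions; the stride is measured in data elements (10 here).\n   for j in range(num_accessed_elements):   # time steps along the free dimension\n       for p in range(P):                   # all active partitions in parallel\n           value = src[p, j * 10]           # same element offset in every partition\n\n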
Refer to :ref:`Tile Indexing in NKI Programming Guide <pm_sec_tile_indexing>`\nto learn how to index SBUF/PSUM tensors to achieve F dimension striding in NKI syntax.\n\nLastly, as implied in :numref:`Figure %s <fig-arch-data-streaming>`,\nwhen accessing a SBUF/PSUM tensor, all active partitions must follow the same F dimension access pattern. In other words,\nat every time step, the engine read/write interface will access data elements at the same *offset* within each active partition.\n\n.. _arch_sec_cross_partition_connect:\n\nCross-Partition Connectivity\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe majority of VectorE/ScalarE/GpSimdE instructions on NeuronCore-v2 require ``src_tensor`` and ``dst_tensor`` to occupy\nthe same number of partitions. When the number of partitions involved exceeds 64, by the ``start_partition`` rule discussed\nabove, the src_tensor and dst_tensor in such cases must both start from partition 0. Therefore, we effectively cannot perform\nany cross-partition data movement when ``num_partition > 64`` : each partition of ``src_tensor`` data will eventually flow\ninto the corresponding partition in ``dst_tensor``.\n\nHowever, when ``num_partition <= 64``\\ , VectorE/ScalarE/GpSimdE on NeuronCore-v2 support two styles of cross-partition\nSBUF/PSUM data movement patterns: 1) cross-half movement for ``32 < num_partition <= 64`` and 2) cross-quadrant movement\nfor ``0 < num_partition <= 32``. Figure below illustrates these two patterns for ``num_partition=64`` and ``num_partition=32``.\nThe shaded portion of the ``Engine`` block indicates the active lanes for the given instruction. With these movement patterns,\neach partition in ``src_tensor`` still has a one-to-one mapping to each partition in ``dst_tensor``.\n\n.. _fig-arch-cross-quadrant:\n\n.. figure:: /nki/img/arch_images/cross_quadrant.png\n   :align: center\n   :width: 90%\n\n   Cross-partition connectivity.\n\nPerformance Consideration\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Access pattern.** As discussed previously in the context of compute engine utilization, it is recommended to use as many\npartitions as possible when accessing SBUF/PSUM tensors to saturate the available data streaming bandwidth. In addition,\naccessing with a large stride in the most-minor (fastest) F dimension will incur a performance penalty. When the most-minor\nF dimension stride is less than 16 bytes, SBUF/PSUM on NeuronCore-v2 can supply a peak bandwidth of 128 elements/cycle at\n1.4 GHz for each tensor read/write interface. A 16-byte stride is equivalent to 4 elements for 32-bit data types, 8 elements\nfor 16-bit data types or 16 elements for 8-bit data types.\nIf the most-minor F dimension stride exceeds 16 bytes, the achievable bandwidth of each tensor read/write interface will\nbe half of the peak bandwidth, which translates to a roughly 50% performance hit on the instructions.\n\n**Concurrent SBUF/PSUM accesses by engines.** As mentioned earlier, NeuronCore-v2 has the following on-chip RAM access restrictions:\n\n#. Vector Engine and GpSimd Engine cannot access SBUF in parallel\n#. Vector Engine and Scalar Engine cannot access PSUM in parallel\n\nDespite these restrictions, SBUF is capable of driving peak bandwidth in each tensor read/write interface connected to VectorE/ScalarE/TensorE\nor GpSimdE/ScalarE/TensorE *simultaneously* without bandwidth interference. 
Similarly, PSUM can drive peak bandwidth for\nVectorE/TensorE or ScalarE/TensorE *simultaneously*.\n\n**Tensor access overhead.** Initiating a tensor access request from an engine to its SBUF/PSUM read/write interface incurs\na static overhead approximately 60 cycles on NeuronCore-v2. Compute engines can typically hide some of this latency through\ninstruction level parallelism. However, it is still highly recommended to access tensors with large P and F dimension sizes\nwhenever possible to amortize this overhead.\n\n.. _arch_sec_accumulation_psum:\n\nNear-memory accumulation in PSUM\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAs shown in :numref:`Figure %s <fig-arch-neuron-core-v2>`,\nboth VectorE and ScalarE have read and write access to PSUM, while TensorE only has write access. In fact, PSUM is designed\nto be a landing buffer for TensorE with near-memory accumulation capabilities that allows read-accumulate-write to every\n4B element in memory. Note, this accumulation mechanism can *only* be controlled by TensorE. VectorE and ScalarE can only\naccess PSUM like a regular SRAM similar to SBUF.\n\nNext, let's discuss how TensorE can write outputs to PSUM. As previously discussed, PSUM is organized into 128 *partitions,*\neach consisting of 16KB of memory. Each partition is further divided into 8 PSUM banks, with each bank holding up to 512\n32-bit values. The output tile of a TensorE matrix multiplication instruction (\\ ``nki.isa.nc_matmul``\\ ) must **fit** into\none PSUM bank per partition, which is the fundamental reason for\nthe :ref:`free dimension size limitation <arch_matmul_tile_size>` for the ``moving`` tensor.\nEvery ``nc_matmul`` instruction can choose whether to *override* existing bank data with instruction output or *accumulate*\ninstruction output into existing bank data element-wise.\n\nThe accumulation mode of PSUM is particularly useful when the high-level matmul operator has a contraction dimension (i.e.,\n``stationary/moving`` partition dimension of ``nki.isa.nc_matmul``) greater than 128. As an example, let's assume the following\nmatmul dimensions:\n\n\n* ``x.shape = [128, 256]``\n* ``y.shape = [256, 512]``\n\nFigure below shows this matmul mathematically and also how we would tile the contraction dimension. With tiling, we slice\nboth ``x`` and ``y`` in the contraction dimension to get ``[x0, x1]`` and ``[y0, y1]`` input tiles. To get the\nfinal output result, we need to perform:\n\n\n* output0 = matmul(x0, y0)\n* output1 = matmul(x1, y1)\n* output = output0 + output1\n\n.. _fig-arch-mm-tiling:\n\n.. figure:: /nki/img/arch_images/mm_tiling.png\n   :align: center\n   :width: 90%\n\n   Matmul tiling (mathematical view).\n\nPSUM accumulation effectively combines Step 2 and 3 above into a single TensorE ``nki.isa.nc_matmul`` instruction. Assuming\nwe have ``x`` in the transposed layout in SBUF, visually the above tiled matmul example will have two back-to-back ``nki.isa.nc_matmul``\ninstructions on TensorE:\n\n.. _fig-arch-mm-tiling-hw:\n\n.. figure:: /nki/img/arch_images/mm_tiling_hw.png\n   :align: center\n   :width: 90%\n\n   Matmul tiling (hardware view).\n\nEffectively, the first ``nki.isa.nc_matmul`` instruction overwrites the destination PSUM bank with the instruction output.\nThe second instruction accumulates instruction output onto the previous instruction's result in the same PSUM. The PSUM\naccumulation is always done in FP32. 
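\n\nIn NKI, the two instructions from this example could be expressed roughly as the sketch below. The tile names ``xT0``, ``xT1``, ``y0`` and ``y1`` are assumptions for this illustration only: they stand for the two contraction-dimension slices of the transposed ``x`` and of ``y``, assumed to be already resident in SBUF:\n\n.. code-block:: python\n\n   import nki.language as nl\n   import nki.isa as nisa\n\n   # Assumptions for illustration: xT0/xT1 are [128, 128] SBUF tiles holding the two\n   # contraction slices of x in transposed layout; y0/y1 are [128, 512] SBUF tiles of y\n   psum_out = nl.ndarray((128, 512), dtype=nl.float32, buffer=nl.psum)\n\n   # First matmul overwrites the destination PSUM bank\n   nisa.nc_matmul(dst=psum_out, stationary=xT0, moving=y0, accumulate=False)\n   # Second matmul accumulates onto the previous result in the same bank (always in FP32)\n   nisa.nc_matmul(dst=psum_out, stationary=xT1, moving=y1, accumulate=True)\n\n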
A series of TensorE matmul instructions with the first one writing to a PSUM bank and\nmore subsequent instructions accumulating into the same PSUM bank data is called a *matmul accumulation group*.\n\nIn NKI, ``nisa.nc_matmul`` supports an ``accumulate`` parameter to control PSUM accumulation behavior.\nWhen not specified (default), the compiler auto-detects: the first write to a PSUM bank overwrites, and\nsubsequent writes accumulate. The following NKI code pattern demonstrates PSUM accumulation:\n\n.. code-block::\n\n   # condition 1: a psum buffer with zeros\n   psum_buf = nl.zeros((128, 128), dtype=nl.float32, buffer=nl.psum)\n\n   # condition 2: an affine range loop\n   for i in nl.affine_range(N):\n      # condition 3: add matmul results from TensorEngine\n      nisa.nc_matmul(dst=psum_buf, stationary=stationary_tile, moving=moving_tile)\n\n\nRefer to the\n:ref:`Tiling Matrix Multiplications <tutorial_matmul_tiling>`\ntutorial for a detailed implementation.\n\n.. note::\n\n   When ``accumulate`` is not specified (default), ``nisa.nc_matmul`` auto-detects accumulation:\n   the first write to a PSUM location overwrites, and subsequent writes accumulate. Accumulation\n   can also be controlled explicitly with ``accumulate=True`` or ``accumulate=False``.\n\nFinally, with 8 PSUM banks per partition, TensorE can have up to eight outstanding matmul accumulation groups, which allows\nflexible scheduling of matmul instructions on TensorE. Also, the extra buffering from multiple PSUM banks allows us to pipeline\nTensorE computation with other compute engines: TensorE can move onto the next accumulation group without waiting for VectorE/ScalarE\nto evict previous accumulation group results.\n"
  },
  {
    "path": "nki/guides/framework_custom_op.rst",
    "content": ".. meta::\n   :description: Learn how to insert NKI kernels as custom operators into PyTorch or JAX models with code examples.\n   :keywords: NKI, custom operator, PyTorch, JAX, kernel integration, Neuron Kernel Interface\n   :date-modified: 02/26/2026\n\n.. _nki_framework_custom_op:\n\nNKI Kernel as a Framework Custom Operator\n===========================================\n\nThis document demonstrates how to insert a NKI kernel as a custom\noperator into a PyTorch or JAX model using simple code examples.\n\nUsing NKI kernels\n-------------------------------\n\nTo register a NKI kernel registration, you need to call a decorated\nNKI function.\n\nLet's examine a guiding example below where we\nrandomly initialize two inputs, add them together, and then\nmultiply the result by the two input tensors element-wise.\nThis effectively calculates: ``a * b * (a + b)``.\n\nWe define a common NKI kernel for addition. This is a tiled variation of the addition kernel from\n:doc:`Quickstart: Build and Run a Kernel </nki/get-started/quickstart-implement-run-kernel>`.\n\n.. nki_example:: /nki/examples/tensor_addition/tensor_addition_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_27\n\n^^^^^^^\nPyTorch\n^^^^^^^\n\nWe can perform ``(a + b) * a * b`` using native PyTorch code.\n::\n\n   import torch\n   import torch_xla\n\n   device = torch_xla.device()\n\n   a = torch.randn(256, 1024, dtype=torch.float32).to(device)\n   b = torch.randn(256, 1024, dtype=torch.float32).to(device)\n   c = a + b\n   out = a * b * c\n\n   print(out)\n\nNow let's replace the tensor addition (``c = a + b``) with a NKI\nkernel.\nTo do this we replace the ``+`` operator with a call to the NKI kernel\ncaller (``nki_tensor_add``), and everything else works as before.\n\n::\n\n   device = torch_xla.device()\n   a = torch.randn(256, 1024, dtype=torch.float32).to(device)\n   b = torch.randn(256, 1024, dtype=torch.float32).to(device)\n   c = nki_tensor_add(a, b) # calling a NKI kernel, instead of the built-in torch op\n   out = a * b * c\n   print(out)\n\nTo understand what happens under the hood when we compile the above\ncode, we can print HLO IR graph generated by XLA by setting the\n``NEURON_FRAMEWORK_DEBUG`` environment variable, which preserves the HLO in\nbinary form, and the ``XLA_SAVE_TENSORS_FILE``, which presents a textual\nrepresentation of the HLO. For example, you may add the following lines to your\ncode:\n\n::\n\n   import os\n   os.environ['NEURON_FRAMEWORK_DEBUG'] = \"1\"\n   os.environ[\"XLA_SAVE_TENSORS_FILE\"] = \"example1.pbtxt\"\n\nA ``example1.pbtxt.0`` file is then written in your run directory that has the\ncorresponding human-readable HLO IR.\n\nLet's examine the XLA output of this example.\nIn line #14 we can identify that the tensor addition is now\nmapped to an HLO ``xla::_op_<locals>CallImpl`` instruction, representing the custom call. The output of\nthat ``xla::_op_<locals>CallImpl`` is then consumed by the next instruction in line\n#15 as usual.\n\n.. 
code-block::\n   :linenos:\n\n    [ScheduleSyncTensorsGraph]\n    TensorsGraphInfo:\n      _str_intern (/home/ec2-user/pytorch-klir/lib/python3.10/site-packages/torch/_tensor_str.py:462)\n      _str (/home/ec2-user/pytorch-klir/lib/python3.10/site-packages/torch/_tensor_str.py:726)\n      __repr__ (/home/ec2-user/pytorch-klir/lib/python3.10/site-packages/torch/_tensor.py:590)\n      <module> (/home/ec2-user/private-aws-neuron-sdk-staging/nki/examples/tensor_addition/t2.py:14)\n    \n    Root Hashes: (181deae9d76fbfbf2fe0e040179f9da8)\n    \n    ## BEGIN_GRAPH\n    IR {\n      %0 = f32[256,1024]{1,0} xla::device_data(), xla_shape=f32[256,1024]{1,0}\n      %1 = f32[256,1024]{1,0} xla::device_data(), xla_shape=f32[256,1024]{1,0}\n      %2 = (f32[256,1024]{1,0}) xla::_op_<locals>CallImpl(%1, %0), xla_shape=(f32[256,1024]{1,0})\n      %3 = f32[256,1024]{1,0} aten::mul(%1, %0), xla_shape=f32[256,1024]{1,0}\n      %4 = f32[256,1024]{1,0} aten::mul(%3, %2), xla_shape=f32[256,1024]{1,0}, ROOT=0\n    }\n    \n    Graph Hash: f518d5bd723cb9d6f9482b42b33105e1\n    \n    ## END_GRAPH\n\nThe Neuron compiler replaces the above custom call with\nthe corresponding NKI kernel implementation while optimizing the rest of the\ncompute graph as usual. At the end of the compilation process, a single\ncompiled binary NEFF file\nis generated representing the entire graph\nincluding the NKI kernel. For more information about NEFF files, see `Neuron Compiler <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/index.html>`__.\n\n.. _nki_framework_custom_op_jax:\n\n^^^\nJAX\n^^^\n\nWe can perform ``(a + b) * a * b`` using native JAX code.\n\n::\n\n   import jax\n   import jax.numpy as jnp\n\n   @jax.jit\n   def jax_customop_tutorial(a, b):\n      c = a + b\n      out = a * b * c\n      return out\n\n   seed = jax.random.PRNGKey(0)\n   seed_a, seed_b = jax.random.split(seed)\n   a = jax.random.normal(seed_a, (256, 1024), dtype=jnp.float32)\n   b = jax.random.normal(seed_b, (256, 1024), dtype=jnp.float32)\n\n   print(jax_customop_tutorial(a, b))\n\nSimilar to the PyTorch example above, let's replace the tensor addition ``(c = a + b)`` with\nthe addition NKI kernel. To do this we replace the ``+`` operator with a call to the NKI kernel\ncaller (``nki_tensor_add``), and everything else works as before.\n\n::\n\n   import jax\n   import jax.numpy as jnp\n\n   @jax.jit\n   def jax_customop_tutorial(a, b):\n      c = nki_tensor_add(a, b) # calling a NKI kernel, instead of the built-in jax op\n      out = a * b * c\n      return out\n\n   seed = jax.random.PRNGKey(0)\n   seed_a, seed_b = jax.random.split(seed)\n   a = jax.random.normal(seed_a, (256, 1024), dtype=jnp.float32)\n   b = jax.random.normal(seed_b, (256, 1024), dtype=jnp.float32)\n   print(jax_customop_tutorial(a, b))\n\n\nTo understand what happens under the hood when we compile the above code,\nwe can print the HLO IR graph by adding the following snippet to your code:\n\n::\n\n   print(jax.jit(jax_customop_tutorial)\n      .lower(a, b)\n      .compile()\n      .runtime_executable()\n      .hlo_modules()[0].to_string()\n   )\n\nLet's examine the XLA output of this example.\nIn line #8 we can identify that the tensor addition is now\nmapped to an HLO ``custom-call`` instruction, similar to PyTorch. The output of\nthat ``custom-call`` is then consumed by the next instruction in line\n#9 as usual.\n\n.. 
code-block::\n   :linenos:\n\n   HloModule jit_jax_customop_tutorial, entry_computation_layout={(f32[256,1024]{1,0}, f32[256,1024]{1,0})->(f32[256,1024]{1,0})}, allow_spmd_sharding_propagation_to_parameters={}, allow_spmd_sharding_propagation_to_output={true}\n   \n   ENTRY %main.12 (Arg_0.1: f32[256,1024], Arg_1.2: f32[256,1024]) -> (f32[256,1024]) {\n     %Arg_0.1 = f32[256,1024]{1,0} parameter(0), metadata={op_name=\"a\"}\n     %Arg_1.2 = f32[256,1024]{1,0} parameter(1), metadata={op_name=\"b\"}\n     %multiply.0 = f32[256,1024]{1,0} multiply(%Arg_0.1, %Arg_1.2), metadata={op_name=\"jit(jax_customop_tutorial)/jit(main)/jit(jax_customop_tutorial)/mul\" source_file=\"/home/ec2-user/private-aws-neuron-sdk-staging/nki/examples/tensor_addition/t4.py\" source_line=9}\n     %constant.0 = s8[128,128]{1,0} constant({...})\n     %custom-call.0 = f32[256,1024]{1,0} custom-call(%Arg_0.1, %Arg_1.2, %constant.0), custom_call_target=\"AwsNeuronCustomNativeKernel\", api_version=API_VERSION_STATUS_RETURNING, metadata={op_name=\"jit(jax_customop_tutorial)/jit(main)/jit(jax_customop_tutorial)/nki_call\" source_file=\"/home/ec2-user/jax-klir/lib/python3.10/site-packages/nki/_jax.py\" source_line=64}, backend_config=\"eyJrZXJuZWxfdmVyc2lvbiI6IDEsICJrbGlyX2JpbmFyeSI6IHsiYmluYXJ5IjogIi90bXAvbmtpX3RlbnNvcl9hZGRncmV6aGF4eS5rbGlyIiwgImlucHV0X25hbWVzIjogWyJhX2lucHV0IiwgImJfaW5wdXQiLCAidG1wLjQiXSwgIm91dHB1dF9uYW1lcyI6IFsiY19vdXRwdXQuMzUiXX0sICJmdW5jX25hbWUiOiAibmtpX3RlbnNvcl9hZGQiLCAiZ3JpZCI6IFtdLCAiaGFzX2NvbGxlY3RpdmVzIjogZmFsc2V9\"\n     %multiply.1 = f32[256,1024]{1,0} multiply(%multiply.0, %custom-call.0), metadata={op_name=\"jit(jax_customop_tutorial)/jit(main)/jit(jax_customop_tutorial)/mul\" source_file=\"/home/ec2-user/private-aws-neuron-sdk-staging/nki/examples/tensor_addition/t4.py\" source_line=9}\n     ROOT %tuple.11 = (f32[256,1024]{1,0}) tuple(%multiply.1)\n   }\n\nThe Neuron compiler replaces the above custom-call with\nthe corresponding NKI kernel implementation while optimizing the rest of the\ncompute graph as usual. At the end of the compilation process, a single\ncompiled binary NEFF file\nis generated representing the entire graph\nincluding the NKI kernel. For more information about NEFF files, see `Neuron Compiler <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/index.html>`__.\n\n\nUsing NKI in training graphs\n----------------------------\n\nIf you are using NKI to implement a new operator in a training graph,\nyou might need to make the new operator interplay with the\n``autograd`` engine in the framework. To do this, in PyTorch, you can\nsubclass the framework’s base operator class and implement both the ``forward()``\nand ``backward()`` methods. The ``autograd`` engine then uses the ``backward()``\nmethod when performing auto-differentiation. See\n`Extending torch.autograd <https://pytorch.org/docs/stable/notes/extending.html>`__ in the\nPyTorch Docs for instructions on doing this in PyTorch. To do this in JAX,\nyou can create a ``custom_vjp`` rule (vjp stands for Vector-Jacobian product), which binds the\n``forward()`` and ``backward()`` calls. See\n`Autodiff Cookbook <https://jax.readthedocs.io/en/latest/notebooks/autodiff_cookbook.html>`__ in\nthe JAX Docs for instructions on doing this.\n\nLet's reuse the ``nki_tensor_add`` kernels from before and demonstrate how to train a\nsimple compute graph ``(a+b)*a*b`` in both PyTorch and JAX.\n\n.. 
_nki_framework_custom_op_pytorch:\n\n^^^^^^^\nPyTorch\n^^^^^^^\n\nWe define a ``NkiAddFunc``\nclass, which leverages the ``nki_tensor_add`` kernel in its ``forward()``\nfunction. The gradients of both input tensors in ``y = a + b`` are\nones, so the ``backward()`` function\npropagates the ``dy`` gradients from the previous backward function.\n\n::\n\n   import torch\n   import torch_xla\n   device = torch_xla.device()\n\n   class NkiAddFunc(torch.autograd.Function):\n     @staticmethod\n     def forward(ctx, a, b):\n       return nki_tensor_add(a, b)\n\n     @staticmethod\n     def backward(ctx, dy, *args):\n       # gradients for a and b\n       return dy, dy\n\n   # now, let's define the compute graph\n   a = torch.randn(256, 1024, dtype=torch.float32).to(device).detach().requires_grad_()\n   b = torch.randn(256, 1024, dtype=torch.float32).to(device).detach().requires_grad_()\n   c = NkiAddFunc.apply(a, b)\n   out = a * b * c\n\n   # here we define a (dummy) loss-function, in prep for backward propagation\n   loss = out.sum()\n\n   # lastly, let's invoke the auto-grad engine\n   loss.backward()\n\n   torch_xla.sync()\n\n^^^\nJAX\n^^^\n\nWe define a ``custom_vjp`` function ``nki_add_func`` by using\nthe ``@jax.custom_vjp`` decorator which directly calls\nthe ``nki_tensor_add`` kernel. We then define and register\nthe ``forward()`` and ``backward()`` implementations of the\n``nki_add_func`` function via ``defvjp()``. Just like the PyTorch\nexample before, the ``backward()`` implementation simply passes\nthe gradients through. Finally, to start training, we execute the\nforward pass by calling ``nki_add_func(a, b) * x * y``.\nTo get the gradients, we call ``jax.grad`` directly with a loss function.\n\n::\n\n   @jax.custom_vjp\n   def nki_add_func(a, b):\n      return nki_tensor_add(a, b)\n\n   def f_forward(a, b):\n      # operator output and residual (same as input here)\n      return nki_add_func(a, b), (a, b)\n\n   def f_backward(res, grad):\n      # gradients for a and b\n      return grad, grad\n\n   nki_add_func.defvjp(f_forward, f_backward) # line 11\n\n   @jax.jit\n   def jax_customop_tutorial_and_grad(a, b):\n      out = nki_add_func(a, b) * a * b\n\n      # use the same dummy loss function (output sum) as PyTorch example above\n      grad = jax.grad(lambda x, y: (nki_add_func(x, y) * x * y).sum(), argnums=(0, 1))(a, b)\n      return out, *grad\n\n   c, grad_a, grad_b = jax_customop_tutorial_and_grad(a, b)\n"
  },
  {
    "path": "nki/guides/how-to-scheduling-apis.rst",
    "content": ".. _how-to-scheduling-apis:\n\n.. meta::\n   :description: Learn how to use NKI Scheduling APIs to control automatic instruction scheduling by adding dependency edges and using no-reorder blocks.\n   :keywords: NKI, scheduling, instruction scheduling, dependency edges, no-reorder, Neuron Kernel Interface\n   :date-modified: 02/20/2026\n\nHow to Use the NKI Scheduling APIs\n==================================\n\nLearn how to control instruction execution order in your NKI kernels using scheduling APIs. This guide demonstrates how to add dependency edges between instructions with ``with_schedule()`` and how to use ``no_reorder`` blocks to prevent automatic instruction reordering, giving you fine-grained control over kernel performance optimization.\n\nAbout the NKI Scheduling APIs\n-----------------------------\n\nThe NKI Scheduling APIs provide additional control over automatic instruction scheduling. This control comes in the form of adding additional dependency edges before scheduling. The extra dependency edges constrain the reordering that the automatic scheduler will do.\n\nAdding dependency edges\n-----------------------\n\nBelow is an example showing how to specify the scheduling metadata for NKI kernel functions. The extra dependency edges are communicated to the NKI compiler by setting a property on the top-level kernel functions with the scheduling edges.\n\n.. code-block:: python\n\n   @nki.jit()\n   def kernel(t):\n       x = nl.ndarray(t.shape, t.dtype, buffer=nl.sbuf)\n       nisa.dma_copy(dst=x, src=t)\n       \n       a = nl.ndarray(t.shape, t.dtype, buffer=nl.sbuf)\n       b = nl.ndarray(t.shape, t.dtype, buffer=nl.sbuf)\n       c = nl.ndarray(t.shape, t.dtype, buffer=nl.sbuf)\n       \n       nisa.reciprocal(dst=a, data=x, name=\"recip\")\n       nisa.tensor_scalar(dst=b, data=x, op0=nl.add, operand0=1, name=\"plus1\")\n       nisa.tensor_tensor(dst=c, data1=a, data2=b, op=nl.add)\n       \n       out = nl.ndarray(t.shape, t.dtype, buffer=nl.hbm)\n       nisa.dma_copy(dst=out, src=c)\n       \n       return out\n\n   # The named statements \"recip\" and \"plus1\" could execute in any order.\n   # We can fix the order with a dependency by setting the \"schedule\" property.\n   # This property can be a list of pairs of instruction names.\n   # Each pair is a set of dependency edges.\n   # In this case \"plus1\" depends on \"recip\", and so will execute second.\n   scheduled = kernel.with_schedule([\n       (\"plus1\", \"recip\")\n   ])\n\n   # The second component of each pair can be a single name or a list of names\n   # This is equivalent to above. Using a list is convenient\n   # for declaring multiple dependency edges.\n   scheduled = kernel.with_schedule([\n       (\"plus1\", [\"recip\"])\n   ])\n\nThe NKI compiler will collect the data from the ``with_schedule`` call, check that it makes sense, and propagate it to the scheduling pass of the compiler. Below is a more complicated example with programmatic meta-data generation. In this example, we will enforce a sequential order for all of the activation operations.\n\n.. 
code-block:: python\n\n   # compute exp on three tiles and return the result tiles\n   @nki.jit\n   def kernel(a, b, c):\n       in_tiles = []\n       for inp in (a,b,c):\n           in_tile = nl.ndarray(inp.shape, inp.dtype, buffer=nl.sbuf)\n           nisa.dma_copy(dst=in_tile, src=inp)\n           in_tiles.append(in_tile)\n       \n       out_tiles = []\n       for i in range(len(in_tiles)):\n           tile = in_tiles[i]\n           out_tile = nl.ndarray(tile.shape, tile.dtype, buffer=nl.sbuf)\n           nisa.activation(dst=out_tile, data=tile, op=nl.exp, name=f\"act{i}\")\n           out_tiles.append(out_tile)\n       \n       outs = []\n       for tile in out_tiles:\n           out = nl.ndarray(tile.shape, tile.dtype, buffer=nl.hbm)\n           nisa.dma_copy(dst=out, src=tile)\n           outs.append(out)\n       \n       return tuple(outs)\n\n   # The activations have no data dependencies, and could execute in any order.\n   # Make them execute serially by building a list of pairs of edges\n   # act1 depends on act0\n   # act2 depends on act1\n   l = []\n   for i in range(1,3):\n       l.append((f\"act{i}\", f\"act{i-1}\"))\n   \n   # attach the dependencies to the kernel by calling with_schedule\n   scheduled = kernel.with_schedule(l)\n\nUsing no_reorder\n----------------\n\nAdding dependency edges can be tedious. To make the process more streamlined, the NKI compiler also supports no-reorder blocks. A no-reorder block is a section of code where dependency edges are automatically added between every pair of instructions. Using no-reorder blocks, the example above could be written as shown below.\n\n.. code-block:: python\n\n   # compute exp on three tiles and return the result tiles\n   @nki.jit()\n   def loop(a, b, c):\n       in_tiles = []\n       for inp in (a,b,c):\n           in_tile = nl.ndarray(inp.shape, inp.dtype, buffer=nl.sbuf)\n           nisa.dma_copy(dst=in_tile, src=inp)\n           in_tiles.append(in_tile)\n       \n       out_tiles = []\n       with nl.no_reorder():\n           for i in range(len(in_tiles)):\n               tile = in_tiles[i]\n               out_tile = nl.ndarray(tile.shape, tile.dtype, buffer=nl.sbuf)\n               nisa.activation(dst=out_tile, data=tile, op=nl.exp, name=f\"act{i}\")\n               out_tiles.append(out_tile)\n       \n       outs = []\n       for tile in out_tiles:\n           out = nl.ndarray(tile.shape, tile.dtype, buffer=nl.hbm)\n           nisa.dma_copy(dst=out, src=tile)\n           outs.append(out)\n       \n       return tuple(outs)\n\nThe ``no_reorder`` block instructs the compiler to insert dependency edges between every pair of instructions in the block. Note, the ``no_reorder`` block is \"dynamically scoped\", meaning it applies to all of the code that would execute under the block, not just the code that is syntactically under the block. For example, the following code is equivalent to the above.\n\n.. 
code-block:: python\n\n   def loop_body(i, in_tiles, out_tiles):\n       tile = in_tiles[i]\n       out_tile = nl.ndarray(tile.shape, tile.dtype, buffer=nl.sbuf)\n       nisa.activation(dst=out_tile, data=tile, op=nl.exp, name=f\"act{i}\")\n       out_tiles.append(out_tile)\n\n   @nki.jit\n   def loop(a, b, c):\n       in_tiles = []\n       for inp in (a,b,c):\n           in_tile = nl.ndarray(inp.shape, inp.dtype, buffer=nl.sbuf)\n           nisa.dma_copy(dst=in_tile, src=inp)\n           in_tiles.append(in_tile)\n       \n       out_tiles = []\n       with nl.no_reorder():\n           for i in range(len(in_tiles)):\n               loop_body(i, in_tiles, out_tiles)\n       \n       outs = []\n       for tile in out_tiles:\n           out = nl.ndarray(tile.shape, tile.dtype, buffer=nl.hbm)\n           nisa.dma_copy(dst=out, src=tile)\n           outs.append(out)\n       \n       return tuple(outs)\n\nNotice that even though the ``loop_body`` function is not syntactically under a ``no_reorder`` block, it will be evaluated as a no-reorder block because the function is called from under a ``no_reorder`` block.\n"
  },
  {
    "path": "nki/guides/index.rst",
    "content": ".. meta::\n    :description: Guides for AWS Neuron Kernel Interface (NKI), including architectures, tutorials for implementing and optimizing kernels, and how to use kernels with common frameworks.\n    :keywords: NKI, AWS Neuron, Guides, Tutorials, how-to\n    :date-modified: 2/26/2026\n\n.. _nki-guides:\n\nNKI Guides\n===========\n\nThis section provides hands-on tutorials for the Neuron Kernel Interface (NKI), demonstrating how to write custom kernels for AWS Trainium and Inferentia instances. These tutorials cover fundamental operations, advanced techniques, and distributed computing patterns using NKI.\n\nTutorials\n---------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: nki-matrix-multiplication\n      :link-type: ref\n\n      **Matrix Multiplication**\n      ^^^\n      Learn the fundamentals of implementing matrix multiplication in your NKI kernels.\n\n   .. grid-item-card::\n      :link: nki-transpose2d\n      :link-type: ref\n\n      **Transpose 2D**\n      ^^^\n      Implement efficient 2D matrix transpose operations using NKI\n\n   .. grid-item-card::\n      :link: nki-averagepool2d\n      :link-type: ref\n\n      **Average Pooling 2D**\n      ^^^\n      Create custom 2D average pooling kernels for computer vision workloads\n\n   .. grid-item-card::\n      :link: nki-fused-mamba\n      :link-type: ref\n\n      **Fused Mamba**\n      ^^^\n      Implement fused Mamba state space model kernels\n\nArchitecture Guides\n-------------------\n\nNeuron recommends new NKI developers start with :doc:`Trainium/Inferentia2 Architecture Guide </nki/guides/architecture/trainium_inferentia2_arch>` before exploring newer NeuronDevice architecture.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Trainium/Inferentia2 Architecture Guide\n      :link: trainium_inferentia2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Foundational architecture guide for understanding NeuronDevice basics.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Trainium2 Architecture Guide\n      :link: trainium2_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Architecture enhancements and improvements in the Trainium2 generation.\n\n   .. grid-item-card:: Trainium3 Architecture Guide\n      :link: trainium3_arch\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Latest architecture features and capabilities in Trainium3 devices.\n\nHow-To Guides\n-------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: How to use the NKI CPU Simulator\n      :link: nki-simulator\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      Develop and debug NKI kernels on your CPU with no hardware required.\n\n   .. grid-item-card:: How to Insert NKI Kernels into Models\n      :link: nki_framework_custom_op\n      :link-type: ref\n      :class-body: sphinx-design-class-title-small\n\n      How to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples.\n\n   .. grid-item-card:: How to Use the NKI Scheduling APIs\n      :link: how-to-scheduling-apis\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Control instruction execution order using dependency edges and no-reorder blocks for kernel performance optimization.\n\n   .. 
grid-item-card:: Profiling a NKI Kernel with Neuron Explorer\n      :link: /nki/guides/use-neuron-profile\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Profile NKI kernels using Neuron Explorer to analyze hardware-level performance.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Tutorials </nki/guides/tutorials/index>\n   Architecture </nki/guides/architecture/index>\n   NKI CPU Simulator </nki/guides/nki_simulator>\n   Insert NKI Kernels into Models </nki/guides/framework_custom_op>\n   Use NKI Scheduling APIs </nki/guides/how-to-scheduling-apis>\n   Profile a NKI Kernel </nki/guides/use-neuron-profile>"
  },
  {
    "path": "nki/guides/nki_simulator.rst",
    "content": ".. meta::\n    :description: Documentation for the nki.simulate API in the Neuron SDK\n    :keywords: nki, simulate, nki.simulate, test, kernels, aws neuron sdk\n    :date-modified: 04/02/2026\n\n.. _nki-simulator:\n\nNKI CPU Simulator\n=================\n\n.. warning::\n\n   This API is experimental and may change in future releases.\n\n``nki.simulate`` runs NKI kernels on your CPU using Python (and NumPy), with no Trainium hardware required.\nIt executes kernel code as regular Python, making it ideal for fast development, debugging, and correctness testing.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\nOverview\n--------\n\n``nki.simulate`` is a CPU-based functional simulator for NKI kernels. It executes every ``nki.isa``\nand ``nki.language`` operation using Python and NumPy, producing results that approximate hardware behavior.\nYou write your kernel once and can run it on both the simulator and real Trainium devices. Some kernels\nmay require adjustments when moving to hardware — see :ref:`Simulation Limitations <simulation-limitations>` for details.\n\n**Why use the simulator?**\n\n- **No hardware required** — develop and test NKI kernels on any machine with Python.\n- **Cost savings** — avoid the cost of developing on Trainium instances; iterate locally, then deploy to hardware when ready.\n- **Same kernel code** — the same ``@nki.jit`` kernel can run on both hardware and the simulator. See :ref:`Simulation Limitations <simulation-limitations>` for cases where adjustments may be needed.\n- **Full debugging support** — use ``breakpoint()``, PDB, or IDE debuggers to step through kernel execution and inspect tensor values.\n- **Fast iteration** — test kernels instantly without compilation or deployment.\n- **Hardware constraint validation** — catches invalid shapes, buffer misuse, dtype errors, and other constraint violations at runtime with clear error messages.\n- **AI-assisted development** — ideal for GenAI coding agents authoring NKI kernels: instant local feedback, detailed error messages, and the ability to instrument every line of code with debug prints (including intermediate tensors) enable rapid autonomous iteration without hardware access.\n\nQuick Start\n-----------\n\n.. nki_example:: /nki/examples/simulate/nki_simulate_example.py\n   :language: python\n   :marker: NKI_EXAMPLE_SIMULATE\n\n.. nki_example:: /nki/examples/simulate/nki_simulate_example.py\n   :language: python\n   :marker: NKI_EXAMPLE_SIMULATE_RUN\n\n\nUsage\n-----\n\nRunning the Simulator\n^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator accepts **NumPy arrays** as inputs. If your script uses PyTorch or JAX tensors,\nconvert them to NumPy arrays before passing them to simulated kernels (for example, ``tensor.numpy()``).\n\n**nki.simulate() API**\n\nUse the explicit API to run a kernel on the simulator. This is also useful when you want\nto run a kernel on *both* the simulator and hardware in the same script — for example,\nto compare results:\n\n.. code-block:: python\n\n   # Run on simulator\n   sim_result = nki.simulate(my_kernel)(a_np, b_np)\n\n   # Run on hardware (requires Trainium and neuronx-cc)\n   hw_result = my_kernel(a_torch, b_torch)\n\n   # Compare\n   np.testing.assert_allclose(sim_result, hw_result.numpy(), rtol=1e-2)\n\n\nTarget Platform\n^^^^^^^^^^^^^^^\n\nThe simulator models different NeuronCore generations. Set the target using the\n``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable:\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 40 60\n\n   * - Environment variable value\n     - Hardware\n   * - ``trn1`` or ``gen2``\n     - Trn1 (NeuronCore-v2)\n   * - ``trn2`` or ``gen3``\n     - Trn2 (NeuronCore-v3)\n   * - ``trn3`` or ``gen4``\n     - Trn3 (NeuronCore-v4)\n   * - *(unset)*\n     - Auto-detect (uses the Neuron chip detected on the running machine, otherwise defaults to ``trn3``)\n\nPrecise Floating-Point Mode\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nBy default, the simulator stores ``bfloat16``, ``float8_e4m3``, and ``float8_e5m2`` tensors as ``float32``\nfor faster simulation performance and to let you examine kernel correctness in high-precision floating-point.\nTo get numerical behavior similar to hardware, enable precise mode with ``NKI_PRECISE_FP=1``:\n\n.. code-block:: bash\n\n   NKI_PRECISE_FP=1 python my_script.py\n\nWhen enabled, low-precision dtypes are stored using ``ml_dtypes`` (real ``bfloat16``, ``float8``, etc.)\ninstead of ``float32``. This is recommended for most use cases.\n\nDebugging\n^^^^^^^^^\n\nBecause the simulator runs kernels as regular Python, you have full access to Python's\ndebugging ecosystem.\n\n**Using breakpoint():**\n\n.. code-block:: python\n\n   @nki.jit\n   def my_kernel(a_ptr):\n       tile = nl.load(a_ptr)\n       breakpoint()  # Debugger stops here — inspect `tile`\n       result = nl.add(tile, tile)\n       return nl.store(result)\n\n   nki.simulate(my_kernel)(data)\n\n**Using device_print:**\n\n``nl.device_print`` works in the simulator and prints tensor values to stdout:\n\n.. code-block:: python\n\n   @nki.jit\n   def my_kernel(a_ptr):\n       tile = nl.load(a_ptr)\n       nl.device_print(\"my tile\", tile)\n       ...\n\n**Using Python print:**\n\nSince the simulator executes kernels as standard Python, you can use ``print()`` to inspect any\nintermediate tensor or register value during execution. This is especially useful for both interactive\ndebugging and AI-assisted development workflows where agents iterate on kernels locally.\n\n**IDE Debugging (VSCode / PyCharm):**\n\nSet breakpoints in your kernel code and run your script normally. The simulator executes\nkernel code in-process, so IDE debuggers work without any special configuration.\n\n\nHow It Works\n------------\n\nExecution\n^^^^^^^^^\n\nWhen you call ``nki.simulate(kernel)(a, b)``:\n\n1. Each NumPy array argument is wrapped into an ``NkiTensor`` with ``buffer=nl.hbm``\n   (or ``shared_hbm`` for LNC2). Non-array arguments pass through unchanged.\n2. The simulator backend is activated, routing all ``nki.isa`` and ``nki.language``\n   operations to NumPy-based implementations.\n3. The kernel function runs as regular Python — each NKI API call executes eagerly\n   and sequentially. There is no instruction scheduling or engine parallelism.\n4. On return, ``NkiTensor`` results are converted back to NumPy arrays. Input arrays are\n   updated in-place if the kernel modified the corresponding HBM tensors.\n\nFor **LNC2 kernels** (``kernel[2]``), the simulator spawns two Python threads that execute the\nkernel concurrently, each with its own ``program_id``. Input arrays use ``shared_hbm`` buffers,\nso both threads can access shared memory. ``nki.isa.sendrecv`` and ``nki.isa.core_barrier``\nuse thread-safe synchronization primitives.\n\nUninitialized Memory Detection\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator automatically fills all newly allocated tensors with **sentinel values** — ``NaN`` for\nfloating-point types and ``4`` for integer types. 
This makes it easy to detect bugs where a kernel\nreads from memory that was never written to.\n\nBecause ``NaN`` propagates through arithmetic (any operation involving ``NaN`` produces ``NaN``), if\nyour kernel accidentally computes on uninitialized memory, the resulting output will contain ``NaN``\nvalues. You can check for this in your test:\n\n.. code-block:: python\n\n   result = nki.simulate(my_kernel)(inputs)\n   assert not np.any(np.isnan(result)), \"Kernel computed on uninitialized memory!\"\n\n**Why this matters:**\n\nOn real hardware, uninitialized memory contains arbitrary leftover values from previous operations.\nA kernel that reads uninitialized data may appear to produce correct results on hardware by coincidence —\nmaking these bugs extremely difficult to track down. The simulator's sentinel values turn these silent\ncorrectness hazards into immediately visible ``NaN`` values in the output.\n\n.. tip::\n\n   If you see unexpected ``NaN`` values in your simulation output, check that all tensors are properly\n   initialized before use. Common causes include:\n\n   - Allocating a tensor with ``nl.ndarray`` but not writing to all elements before reading\n   - Off-by-one errors in tile loop bounds that leave some elements unwritten\n   - Conditional writes that skip certain partitions or indices\n\n\nHardware Constraint Validation\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nEach ``nki.isa`` operation validates hardware constraints at runtime — shape limits, dtype\ncompatibility, buffer types, engine restrictions, and architecture version requirements.\nInvalid operations raise clear Python exceptions with descriptive error messages.\n\n.. warning::\n\n   Hardware constraint validation is actively being developed. Some constraints may not yet\n   be checked by the simulator. If your kernel passes simulation but fails on hardware,\n   report it to the Neuron team as an issue.\n\n\n**Example:**\n\n.. code-block:: python\n\n   @nki.jit\n   def bad_kernel(a_ptr):\n       tile = nl.ndarray((256, 512), dtype=nl.float32, buffer=nl.sbuf)  # exceeds 128\n       ...\n\n   nki.simulate(bad_kernel)(data)\n   # AssertionError: tensor_tensor data1 partition dimension 256 exceeds maximum 128\n\n\n\n.. _simulation-limitations:\n\nSimulation Limitations\n----------------------\n\nThe simulator approximates hardware behavior but is not identical. Understanding these\nlimitations helps you write kernels that work on both the simulator and real Trainium hardware.\n\nNKI Meta-Programming Support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe simulator executes kernel code directly as Python — there is no compilation step. As a result,\nthe simulator accepts any valid Python in the kernel body, including arbitrary classes, closures,\nand dynamic control flow. The NKI compiler, however, only supports a restricted subset of Python\nfor meta-programming, see :ref:`NKI Language Guide<nki-language-guide>`. Kernels that use\nunsupported Python constructs will execute successfully on the simulator but fail to compile for hardware.\n\nNumerical Precision\n^^^^^^^^^^^^^^^^^^^\n\nBy default, the simulator stores low-precision types (``bfloat16``, ``float8_e4m3``, ``float8_e5m2``)\nas ``float32``, which can mask rounding and precision issues that appear on hardware. Enable\n``NKI_PRECISE_FP=1`` (recommended) to use real low-precision storage via ``ml_dtypes`` for\nnumerical behavior similar to hardware. 
See `Precise Floating-Point Mode`_ for details.\n\nPerformance\n^^^^^^^^^^^\n\nThe simulator runs on the CPU using Python and NumPy. It does not model instruction latency,\nengine parallelism, or hardware scheduling. Since kernels are interpreted rather than compiled\nand optimized for Trainium NeuronCores, the simulator is significantly slower than hardware\nexecution and is not suitable for performance benchmarking.\n\nMemory Model\n^^^^^^^^^^^^\n\nThe simulator allocates each tensor independently without simulating overlapping memory regions\nor validating against SBUF/PSUM capacity limits. Kernels with memory conflicts may run\nsuccessfully on the simulator but fail or produce incorrect results on real hardware, where\nSBUF and PSUM are shared physical memory with capacity constraints.\n\nKnown Gaps\n^^^^^^^^^^\n\n- ``nki.collectives`` APIs are not implemented in the simulator.\n- Some ``nki.isa`` instructions produce incorrect results: ``local_gather``,\n  ``nc_stream_shuffle`` with ``mask=255``, ``nc_matmul_mx``, and ``quantize_mx``.\n- Some hardware constraint checks are missing — see `Hardware Constraint Validation`_ for details.\n"
  },
  {
    "path": "nki/guides/tutorials/average_pool2d.rst",
    "content": ".. _nki-averagepool2d:\n\nAveragePool2D\n=============\n\nIn this tutorial, we examine a case of\ndimensionality reduction. We implement a 2D AveragePool operation, which\nis used in many vision neural networks.\nIn doing so, we learn about:\n\n-  NKI syntax and programming model.\n-  multi-dimensional memory access patterns in NKI.\n\nThe 2D AveragePool operation takes\n``C x [H,W]`` matrices and reduces each matrix along the ``H`` and ``W``\naxes. To leverage free-dimension flexible indexing, we can map the ``C``\n(parallel) axis to the ``P`` dimension and ``H/W`` (contraction)\naxes to the ``F`` dimension.\nPerforming such a 2D pooling operation requires a 4D memory access\npattern in the ``F`` dimension, with reduction along two axes.\n:ref:`Figure <nki-fig-avgpool>`\nbelow illustrates the input and output tensor layouts.\n\n.. :\n\n.. figure:: ../../img/pm-index-3.png\n   :name: nki-fig-avgpool\n   :align: center\n   :width: 60%\n\n   2D-Pooling Operation (reducing on axes F2 and F4)\n\nPyTorch\n-------\n\nCompute kernel\n^^^^^^^^^^^^^^\n\n.. nki_example:: ../../examples/average_pool2d/average_pool2d_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_37\n\n\nLaunching kernel and testing correctness\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo execute the kernel, we prepare tensors ``in_tensor`` and call ``tensor_avgpool_kernel``:\n\n.. nki_example:: ../../examples/average_pool2d/average_pool2d_torch.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_38\n\nJAX\n-------\n\nCompute kernel\n^^^^^^^^^^^^^^\n\nLet's reuse the same NKI kernel implementation defined for PyTorch above:\n\n.. nki_example:: ../../examples/average_pool2d/average_pool2d_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_37\n\nIn order to pass ``pool_size`` as a compile time constant, we pass ``pool_size`` as kwargs.\n\n.. nki_example:: ../../examples/average_pool2d/average_pool2d_jax.py\n   :language: python\n   :marker: NKI_EXAMPLE_39\n\nWe write a reference JAX implementation of ``AveragePool2D`` as JAX does\nnot have a primitive for it.\n\n.. nki_example:: ../../examples/average_pool2d/average_pool2d_jax.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_40\n\n\nLaunching kernel and testing correctness\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo execute the kernel, we prepare array ``in_array`` and invoke the kernel caller function ``tensor_avgpool_kernel``:\n\n.. 
nki_example:: ../../examples/average_pool2d/average_pool2d_jax.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_41\n\n\nDownload All Source Code\n--------------------------\n\nClick the links to download source code of the kernels and the testing code\ndiscussed in this tutorial.\n\n* NKI baremetal implementation: :download:`average_pool2d_nki_kernels.py <../../examples/average_pool2d/average_pool2d_nki_kernels.py>`\n* PyTorch implementation: :download:`average_pool2d_torch.py <../../examples/average_pool2d/average_pool2d_torch.py>`\n    * You must also download :download:`average_pool2d_nki_kernels.py <../../examples/average_pool2d/average_pool2d_nki_kernels.py>`\n      into the same folder to run this PyTorch script.\n* JAX implementation: :download:`average_pool2d_jax.py <../../examples/average_pool2d/average_pool2d_jax.py>`\n    * You must also download :download:`average_pool2d_nki_kernels.py <../../examples/average_pool2d/average_pool2d_nki_kernels.py>`\n      into the same folder to run this JAX script.\n\nYou can also view the source code in the GitHub repository `nki_samples <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/average_pool2d/>`_\n\nExample usage of the scripts:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nRun NKI baremetal implementation:\n\n.. code-block::\n\n   python3 average_pool2d_nki_kernels.py\n\nRun PyTorch implementation:\n\n.. code-block::\n\n   python3 average_pool2d_torch.py\n\nRun JAX implementation:\n\n.. code-block::\n\n   python3 average_pool2d_jax.py\n"
  },
  {
    "path": "nki/guides/tutorials/fused_mamba.rst",
    "content": ".. _nki-fused-mamba:\n\nFused Mamba\n==============\n\nIn this tutorial, we implement a NKI kernel for the `Mamba Large Language Model <https://arxiv.org/abs/2312.00752>`_,\na State Space Model (SSM) which replaces\nthe attention of a regular Transformer model with a custom layer inspired by Recurrent Neural Networks. We will walk through\nthe core computation step-by-step and map it to NKI APIs to form a functional kernel. Next, by scaling the input shapes\nof the kernel (both channel size and sequence length), we will iterate on a more hardware-efficient kernel implementation\nto improve the scaling efficiency.\n\nIn this tutorial, we learn about:\n\n* Mapping different vector operations efficiently to NeuronCore compute engines, such as associative scan and element-wise\n  operations between tensors\n* Leveraging data reuse and tiling to reduce excessive data movement and keep compute engines busy\n* Using :doc:`neuron-profile </nki/guides/use-neuron-profile>` to identify performance bottlenecks and opportunities\n\nPyTorch Reference Implementation\n--------------------------------\n\nBefore jumping to NKI, let's examine the compute definition of a Mamba-v1 layer using the below PyTorch script\n(``mamba_torch.py``):\n\n.. nki_example:: ../../examples/fused_mamba/mamba_torch.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_24\n\nThe input tensor shapes are as follows:\n\n* ``delta: [batch, channels, seq_len]``\n* ``u: [batch, channels, seq_len]``\n* ``A: [channels, state_size]``\n* ``B: [batch, state_size, seq_len]``\n* ``C: [batch, state_size, seq_len]``\n\nThe key model parameters are:\n\n\n* ``batch``\\ : batch size of the model.\n* ``seq_len``\\ : sequence length of the model.\n* ``channels``\\ : hidden size of a token.\n* ``state_size``\\ : number of model states.\n\nWe use ``[batch=1, seq_len=512, channels = 256, state_size = 16]`` as a simple test case for initial performance evaluation.\n\nRunning the above Python script will compile the ``PyTorch`` compute graph using Neuron Compiler and generate a Neuron executable\nfile (NEFF) in the same directory. We can then profile the NEFF on a single NeuronCore using :doc:`neuron-profiler </nki/guides/use-neuron-profile>`.\nFigure below is a screenshot of the profile. We see this initial PyTorch implementation takes **151.83 ms** to execute *on\ndevice*.\n\n.. _fig_mamba_torch_ref:\n\n.. figure:: ../../img/mamba_torch_ref.png\n   :align: center\n   :width: 100%\n\n   Profile of Mamba PyTorch Implementation\n\nZooming into a portion of the profile, we notice the compute activities on different engines (TensorE/VectorE/ScalarE/GpSimdE)\nare quite sparse compared to data movement activities (the qSyncIO0 and qVectorSpillReload rows):\n\n.. _fig_mamba_torch_ref_zoomed:\n\n.. figure:: ../../img/mamba_torch_ref_zoomed.png\n   :align: center\n   :width: 100%\n\n   Profile of Mamba PyTorch Implementation (Zoomed-in)\n\nIn this seemingly “memory-bound” execution trace, the achieved DMA throughput is also extremely low, hovering around\n0.33% utilization throughout execution. Therefore, we are stressing neither the compute nor the memory subsystem, hinting\nthe workload is running at low efficiency on the NeuronCore. 
In the rest of this tutorial, we will showcase how to re-write\nthe above computation using NKI to achieve a device execution latency of **172.93 usec** , which is a **878x speedup**\ncompared to the PyTorch reference implementation.\n\nMapping Mamba Layer to NeuronCore\n---------------------------------\n\nIn this section, we will discuss how the computation can be mapped onto the NeuronCore architecture. We will also highlight\nthe importance of choosing appropriate data layouts to achieve good compute efficiency.\n\nRecall we have the following input tensor shapes in device memory:\n\n\n* ``delta: [batch_size, channels, seq_len]``\n* ``u: [batch_size, channels, seq_len]``\n* ``A: [channels, state_size]``\n* ``B: [batch_size, state_size, seq_len]``\n* ``C: [batch_size, state_size, seq_len]``\n\nIn fact, the above tensor layout has been chosen carefully based on the computation done in NeuronCore, which we will discuss\nin more detail below.\n\nIn Mamba models, both ``seq_len`` and ``channels`` are typically in the thousands (such as ``seq_len=16K, channels=4K``),\nwhile ``batch_size`` and ``state_size`` are much smaller by 2-3 order of magnitudes (such as ``batch_size=4, state_size=16``).\nTo simplify visualization of computation\non multi-dimensional tensors, let's hold ``batch`` and ``state_size`` dimension constant and focus on computation per batch\nper state. Note, the ``batch_size`` dimension is considered a fully parallel axis in a Mamba layer, while ``state_size``\nis only a partial parallel axis where results from different states will be accumulated together.\n\nBy extracting ``batch`` and ``state_size`` dimensions, we get the following input tensor shapes in device memory:\n\n\n* ``delta_i: [channels, seq_len]``\n* ``u_i:     [channels, seq_len]``\n* ``A_i:     [channels]``\n* ``B_i:     [seq_len]``\n* ``C_i:     [seq_len]``\n\nNext, let's visualize the data flow and computation using 2D matrices or vectors step-by-step.\n\nStep 1: Element-wise multiplication of ``delta_i`` and ``A_i``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe have the following PyTorch reference code for Step 1:\n\n.. code-block::\n\n   # delta[batch, channels, seq_len]\n   # A    [channels, state_size]\n   delta[:, :, None, :] * A[None, :, :, None]\n\n   # Holding batch and state_size constant\n   # delta_i: [channels, seq_len]\n   # A_i:     [channels]\n   delta_i[:, :] * A_i[:]\n\nAfter the above transformation, the multiplication between ``delta_i`` and ``A_i`` involves a **broadcasting** across the\n``seq_len`` dimension of ``delta_i``. In NKI, free-dimension broadcast can often be folded into the actual computation instruction\nat no additional performance cost, while partition-dim broadcast often requires a separate instruction on TensorE (see TensorE\nalternative use case in :ref:`Trainium/Inferentia2 Architecture Guide <arch_sec_tensor_engine_alternative_use>`).\nAs a result, we have two options for executing Step 1.\n\n**Option 1: Map ``seq_len`` to free dimension.** Element-wise multiplication of ``delta_i`` and ``A_i`` on NeuronCore can\nbe done through :doc:`nisa.tensor_scalar </nki/api/generated/nki.isa.tensor_scalar>`\non either VectorE or ScalarE, which automatically broadcast ``A_i`` along the free dimension to match the ``seq_len`` dimension\nin ``A_i``.\n\nNote, the ``channels`` dimension is mapped to SBUF partition dimension. 
Since the input ``channels`` dimension has a size\nof 256 in our initial setup, which exceeds the architectural limitation of ``nl.tile_size.pmax=128``\\ , we must **tile**\n``delta_i`` in the ``channels`` dimension (tiled dimension denoted as ``channels_tiled``\\ ) and feed one tile into ``nisa.tensor_scalar``\nat a time. Figure below illustrates the computation done for Option 1.\n\n.. _fig_mamba_step1_opt1:\n\n.. figure:: ../../img/mamba_step1_opt1.png\n   :align: center\n   :width: 80%\n\n   Step 1, Option 1: `nisa.tensor_scalar`\n\nAs an example, the associated NKI code for batch ``i_batch``\\ , state ``i_state`` and tile ``i_tile_channels`` in ``channels``\nis:\n\n.. code-block::\n\n   # Input shape in device memory matches the computation layout\n   # Device memory layout:\n   # delta_i: [channels, seq_len]\n   # A_i:     [channels]\n\n   # Computation layout in SBUF:\n   # delta_i: [par_dim(channels), seq_len]\n   # A_i:     [par_dim(channels)]\n\n   deltaA_i = nisa.tensor_scalar(delta_i, op0=nl.multiply, operand0=A_i)\n\nNote, with this compute layout option, the ``delta_i`` tensor shape ``[channels, seq_len]`` in device memory can be loaded\ninto SBUF efficiently with ``seq_len`` as the free dimension and fed into VectorE/ScalarE for computation. No extra transposes\nare needed.\n\n**Option 2: Map ``seq_len`` to partition dimension.** Alternatively, if we choose a transposed layout for ``delta_i`` in\nSBUF for computation, we will need a partition-dimension broadcast of ``A_i`` using a separate instruction on TensorE\n(``A_i.broadcast_to(...)``) and then a :doc:`nisa.tensor_tensor </nki/api/generated/nki.isa.tensor_tensor>`\noperation between ``delta_i`` and the broadcast ``A_i`` on VectorE. As a reminder, we need to tile the ``seq_len`` dimension\nto meet the tile size constraint ``nl.tile_size.pmax=128``. Figure below illustrates the computation done for Option 2.\n\n\n.. _fig_mamba_step1_opt2:\n\n.. figure:: ../../img/mamba_step1_opt2.png\n   :align: center\n   :width: 80%\n\n   Step 1, Option 2: p-dim broadcast + `nisa.tensor_tensor`\n\nThe associated NKI code is as follows:\n\n.. code-block::\n\n   # Input shape in device memory does NOT match the computation layout\n   # Device memory layout:\n   # delta_i: [channels, seq_len]\n   # A_i:     [channels]\n\n   # Computation layout in SBUF:\n   # delta_i: [par_dim(seq_len_tiled), channels]\n   # A_i:     [par_dim(1), channels]\n\n   A_i_bcast = A_i.broadcast_to((nl.tile_size.pmax, channels))\n   deltaA_i = nisa.tensor_tensor(delta_i, A_i_bcast, op=nl.multiply)\n\nAssuming the same ``delta_i`` device memory layout ``[channels, seq_len]``\\ , before performing the ``nisa.tensor_tensor``\ninstruction, we will need to either:\n\n\n* Do a regular load of ``delta_i`` into SBUF using :doc:`nl.load <../../api/generated/nki.language.load>` and an explicit transpose on the loaded ``delta_i`` using\n  ``nl.transpose`` to make ``seq_len`` lie in the partition dimension, or\n* Do a transposed load of ``delta_i`` using :doc:`nl.load_transpose2d <../../api/generated/nki.language.load_transpose2d>`,\n  which is significantly less efficient in memory bandwidth usage compared to ``nl.load``\n\nIf Option 2 were chosen as the compute layout, we would have an incentive to define the ``delta`` input tensor shape as ``[seq_len,\nchannels]`` in device memory instead.\n\nFrom a computation perspective, Option 2 is less efficient than Option 1 because:\n\n#. Option 2 needs an extra TensorE instruction performing partition dimension broadcast.\n#. 
``nisa.tensor_tensor`` is 2x slower than ``nisa.tensor_scalar`` for our input data type FP32 (see API doc for instruction\n   cost estimates).\n\nTherefore, for Step 1 only, Option 1 is the winner compared to Option 2. Let's continue with the rest of the steps to see\nif we need to revise this selection due to surrounding operator layout preferences.\n\nStep 2: Exponential of deltaA_i.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStep 2 is evaluating exponential on ``deltaA_i`` from the previous step:\n\n.. code-block::\n\n   torch.exp(...)\n\nIn NeuronCore, evaluating an exponential function on a tensor is considered a scalar operation, which runs on ScalarE. This\noperation can be invoked through :doc:`nl.exp <../../api/generated/nki.language.exp>`\nor :doc:`nisa.activation </nki/api/generated/nki.isa.activation>`.\nHowever, ScalarE is able to perform a “pipelined multiply-add” on the input before evaluating a non-linear function (detail\nsee :ref:`Trainium/Inferentia2 Architecture Guide <arch_sec_scalar_pipelined_fma>`).\nIn other words, we can fold Step 1 (Option 1) ``nisa.tensor_scalar`` and Step 2 into a single ScalarE instruction at\nno additional cost. This functionality is only exposed in the ``nisa.activation`` API. This folding is not feasible if we\nchose Option 2 ``nisa.tensor_tensor`` in Step 1. Figure below illustrates our new execution plan to combine Step 1 and 2\ninto ``nisa.activation`` :\n\n.. _fig_mamba_step2:\n\n.. figure:: ../../img/mamba_step2.png\n   :align: center\n   :width: 80%\n\n   Step 1&2: ``nisa.activation``\n\nThe associated NKI code is as follows:\n\n.. code-block::\n\n   # Input shape in device memory matches the computation layout\n   deltaA_i = nisa.activation(op=nl.exp, data=delta_i, scale=A_i)\n\nStep 3: Element-wise multiplication of delta_i, B_i and u_i.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nPyTorch reference code for Step 3 is:\n\n.. code-block::\n\n   # delta[batch, channels, seq_len]\n   # B:   [batch, state_size, seq_len]\n   # u:   [batch, channels, seq_len]\n   delta[:, :, None, :] * B[:, None, :, :] * u[:, :, None, :]\n\n   # Holding batch and state_size constant\n   # delta_i: [channels, seq_len]\n   # B_i:     [seq_len]\n   # u_i:     [channels, seq_len]\n   delta_i[:, :] * B_i[None, :] * u_i[:, :]\n\nThis step involves similar compute layout and instruction choices as Step 1:\n\n\n* ``channels`` is either partition or free dimension for both ``delta_i`` and ``u_i``\n* multiplication with ``B_i`` is either through ``nisa.tensor_tensor`` or ``nisa.tensor_scalar``\n\nSince we preferred Step 1 to consume ``delta_i`` using ``channels`` as the partition dimension in previous steps, it is\nwise to follow the same layout choice here for ``delta_i`` to avoid any transposes. Given this layout choice, the multiplication\nwith ``B_i`` will have to be a ``nisa.tensor_tensor``. Figure below visualizes the computation in Step 3:\n\n\n.. _fig_mamba_step3:\n\n.. figure:: ../../img/mamba_step3.png\n   :align: center\n   :width: 80%\n\n   Step 3: p-dim broadcast + 2x ``nisa.tensor_tensor``\n\nThe associated NKI code is as follows:\n\n.. 
code-block::\n\n   # Input shape in device memory does NOT match the computation layout\n   # Device memory layout:\n   # delta_i: [channels, seq_len]\n   # u_i:     [channels, seq_len]\n   # B_i:     [seq_len]\n\n   # Computation layout in SBUF:\n   # delta_i: [par_dim(channels_tiled), seq_len]\n   # u_i:     [par_dim(channels_tiled), seq_len]\n   # B_i:     [par_dim(1), seq_len]\n\n   deltaU_i = nisa.tensor_tensor(delta_i, u_i, op=nl.multiply)\n   B_i_bcast = B_i.broadcast_to((nl.tile_size.pmax, seq_len))\n   deltaBu_i = nisa.tensor_tensor(deltaU_i, B_i_bcast, op=nl.multiply)\n\nStep 4: Associative scan between deltaA_i and deltaBu_i\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn this step, we use an associative scan operator between ``deltaA`` and ``deltaBu`` to aggregate information across time\nsequentially (sequence length, e.g. sequence of tokens), from the past to the present. Here is a PyTorch reference implementation:\n\n.. code-block::\n\n   # deltaA:   [batch_size, channels, state_size, seq_len]\n   # deltaB_u: [batch_size, channels, state_size, seq_len]\n   out = torch.empty(batch_size, channels, state_size, seq_len,\n                     device=deltaA.device, dtype=deltaA.dtype)\n\n   for i in range(seq_len):\n       # starting state is 0\n       prev_state = out[..., i - 1] if i > 0 else 0\n       # multiply deltaA by the previous time step state and then add deltaB_u\n       out[..., i] = deltaA[..., i] * prev_state + deltaB_u[..., i]\n\nBy holding batch and state_size dimensions constant, we get ``deltaA_i`` and ``deltaBu_i`` both with\n``[channels_tiled, seq_len]``, where ``channels_tiled`` is the partition dimension.\nThe associative scan between these two tile shapes can\nbe implemented in NKI naively through the following loop:\n\n.. code-block::\n\n   scan_i = nl.ndarray((channels_tiled, seq_len), ...)\n\n   # Peeling the first iteration out, which is\n   # equivalent to loop iterator dependent control flow within the loop\n   scan_i[0:channels_tiled, 0] = deltaBu_i[0:channels_tiled, 0]\n\n   for i in nl.sequential_range(seq_len - 1):\n      scan_i[0:channels_tiled, i+1] = (deltaA_i[0:channels_tiled, i+1] * scan_i[0:channels_tiled, i]\n                                       + deltaBu_i[0:channels_tiled, i+1])\n\nWithin the loop, the current implementation invokes one instruction for multiplication and another for addition. Since both\ninstructions are performed on tiles of shape ``[channels_tiled, 1]``, we can combine\nthese two instructions using :doc:`nisa.tensor_scalar <../../api/generated/nki.isa.tensor_scalar>`,\nwhich supports two operators in a pipelined fashion within an instruction at the same cost as a single operator. Below is\na new implementation that could provide a 2x speedup compared to the above:\n\n.. code-block::\n\n   scan_i = nl.ndarray((channels_tiled, seq_len), dtype=deltaA_i.dtype, buffer=nl.sbuf)\n   scan_i[0:channels_tiled, 0] = deltaBu_i[0:channels_tiled, 0]\n\n   for i in nl.sequential_range(seq_len - 1):\n      scan_i[0:channels_tiled, i+1] = nisa.tensor_scalar(\n           deltaA_i[0:channels_tiled, i+1],\n           op0=nl.multiply,\n           operand0=scan_i[0:channels_tiled, i],\n           op1=nl.add,\n           operand1=deltaBu_i[0:channels_tiled, i+1])\n\nHowever, the above loop nest will turn into ``seq_len`` many instructions with input tiles that have a single element per\npartition in SBUF. In addition, every ``nisa.tensor_scalar`` instruction has a data dependency on the output of the previous\ninstruction. 
As discussed in the :ref:`Trainium/Inferentia2 Architecture Guide <arch_sec_vector_engine_perf>`,\nthese two traits combined in the instruction sequence are considered extremely *inefficient* on ScalarE/VectorE, where\nstatic instruction overhead, rather than useful execution time, would dominate the engine timeline.\n\nConveniently, NKI exposes another instruction :doc:`nisa.tensor_tensor_scan <../../api/generated/nki.isa.tensor_tensor_scan>`\non VectorE, which can perform the above loop nest in a *single* instruction by caching the intermediate scan result from\nthe previous time step internally in VectorE without going through SBUF.\n\n.. code-block::\n\n   scan_i = nisa.tensor_tensor_scan(deltaA_i, deltaBu_i, initial=0,\n                                    op0=np.multiply, op1=np.add)\n\nNote, the shape of ``scan_i`` is exactly the same as the input ``deltaA_i/deltaBu_i``\\ : ``[channels_tiled, seq_len]``.\n\nStep 5: Element-wise multiplication of C_i and scan_i\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe PyTorch reference implementation is:\n\n.. code-block::\n\n   # scan_res: [batch_size, channels, state_size, seq_len]\n   # C:        [batch_size, state_size, seq_len]\n   scanC = C[:, None, :, :] * scan_res\n\n   # Holding batch and state constant\n   # scan_i: [channels_tiled, seq_len]\n   # C_i:    [seq_len]\n   scanC_i = C_i[None, :] * scan_i[:, :]\n\nYou know the drill: since ``channels_tiled`` is the partition dimension in ``scan_i`` from the previous step, we need to\nperform a partition-dimension broadcast on ``C_i`` before invoking ``nisa.tensor_tensor``\\ :\n\n\n.. _fig_mamba_step5:\n\n.. figure:: ../../img/mamba_step5.png\n   :align: center\n   :width: 80%\n\n   Step 5: p-dim broadcast + ``nisa.tensor_tensor``\n\nThe corresponding NKI code is:\n\n.. code-block::\n\n   C_i_bcast = C_i.broadcast_to((nl.tile_size.pmax, seq_len))\n   scanC_i = nisa.tensor_tensor(scan_i, C_i_bcast, op=nl.multiply)\n\nStep 6: Accumulation of scanC_i along ``state_size`` dimension\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSo far in Steps 1-5, all the computation is logically parallel across the ``state_size`` dimension in a Mamba layer. The\nnext step of computation introduces a data dependency along the ``state_size`` dimension for the first time. The PyTorch reference\nimplementation is:\n\n.. code-block::\n\n   # scan_res: [batch_size, channels, state_size, seq_len]\n   # C:        [batch_size, state_size, seq_len]\n   # -2 dim is state_size\n   scanC.sum(dim=-2)\n\n   # Holding batch constant only.\n   # scan_i_states: [channels_tiled, state_size, seq_len]\n   (scanC_i).sum(dim=-2)\n\nIn NKI, we can accumulate the ``scanC_i`` results across states element-wise using ``state_size-1`` number of ``nisa.tensor_tensor``\ninstructions:\n\n.. _fig_mamba_step6:\n\n.. figure:: ../../img/mamba_step6.png\n   :align: center\n   :width: 80%\n\n   Step 6: ``state_size-1`` number of ``nisa.tensor_tensor``\n\nSince we will be looping over different states, we can also declare an empty accumulation buffer ``scanC_accum`` of shape\n``[channels_tiled, seq_len]`` outside of the loop structure and accumulate into this buffer at the end of every loop\niteration using the ``+=`` operator. The use of a single accumulation buffer avoids allocating memory for ``scanC_i`` across\nall states in SBUF. The corresponding NKI code is:\n\n.. 
code-block::\n\n   scanC_accum = nl.zeros(...)\n\n   for i_state in nl.affine_range(state_size):\n       scanC_i = ...\n       scanC_accum += scanC_i\n\nInitial NKI Kernel\n------------------\n\nPutting all the pieces together from the previous section, we can arrive at the below kernel implementation ``mamba_v1``:\n\n.. nki_example:: ../../examples/fused_mamba/mamba_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_25\n\nIn the above code example,\n\n* We have three levels of loop nests. From the outer-most to inner-most:\n    * Iterating over ``batch``: Different batch samples perform completely different computation. ``A`` tensor is the only\n      input parameter that is shared among batch samples.\n    * Iterating over ``state_size``: Different states perform parallel computation until Step 6 as discussed in the previous\n      section. Both ``delta`` and ``u`` tensors are shared across different states.\n    * Iterating over ``channels``: This is the most-inner dimension where we tile the input channels dimension into ``nl.tile_size.pmax=128``\n      chunks. Both ``B`` and ``C`` tensors are shared across different ``channels``.\n* The kernel above assumes channels is a multiple of ``nl.tile_size.pmax=128`` . We can relax this by adding a ``mask``\n  parameter in all the NKI API call in the kernel. To simplify the code example, we omit this change.\n  See :ref:`NKI API Masking <nki-mask>` for more information.\n* We declare an empty intermediate tensor ``scanC_accum`` to hold partial summation from every state.\n* Within the inner loop, we process data for ``nl.tile_size.pmax=128`` channels for one batch sample in one state.\n    * We use the :ref:`slicing syntax <nki-basic-tensor-indexing>`\n      to index a tensor. For example, ``delta[i_batch, channel_start:channel_start+channel_psize, 0:seq_len]`` grabs data from\n      the input ``delta`` tensor for the current range of channels at the current batch sample.\n    * Note, in tensor slicing, the first index dimension from the left with a slicing range will be chosen as the partition\n      dimension. When loading ``B``, since we intend to load only one state's worth of data into one partition of SBUF (discussed\n      in Step 3), we need to explicitly slice the state using: ``nl.load(B[i_batch, **i_state:i_state+1**, 0:seq_len])``. Otherwise,\n      ``nl.load(B[i_batch, **i_state**, 0:seq_len])`` will treat ``seq_len`` as the partition dimension, which is not what we\n      planned for in Step 3 and would also trigger a NKI compilation error since ``seq_len`` exceeds ``nl.tile_size.pmax``.\n    * We accumulate partial ``scanC_i`` results into the accumulation buffer using the ``+=`` operator. This creates a loop-carried\n      dependency for ``scanC_accum`` on the ``i_state`` loop.\n\nPerformance Check\n^^^^^^^^^^^^^^^^^\n\nLet's re-run neuron-profile on the above NKI kernel:\n\n.. _fig_mamba_v1_profile:\n\n.. figure:: ../../img/mamba_v1_profile.png\n   :align: center\n   :width: 100%\n\n   Profile of initial Mamba kernel implementation ``mamba_v1``\n\nHooray! This NKI kernel implementation now takes ``172.93`` usec, which is **878x** speedup compared to the reference PyTorch\nimplementation. Based on the profile, VectorE is the busiest compute engine in the Mamba layer. This makes sense because\nthe bulk of computation in the kernel is in ``nisa.tensor_tensor``\\ , which can only run on VectorE.\n\nTherefore, our goal is to keep VectorE as busy as possible throughout execution. 
Note, every NEFF execution involves certain\nstart-up and tear-down overhead. We can use the ``Selection Summary`` feature in ``neuron-profile`` to find out the percentage\nof time VectorE is busy during the actual execution period:\n\n\n.. _fig_mamba_v1_profile_zoomed:\n\n.. figure:: ../../img/mamba_v1_profile_zoomed.png\n   :align: center\n   :width: 100%\n\n   Profile of initial Mamba kernel implementation ``mamba_v1`` (zoomed in)\n\nAs indicated by the above profile, VectorE is active over **98.71%** of the time, which is rather impressive. However,\nremember we used small input shapes as a toy example to get started: ``[batch=1, seq_len=512, channels = 256, n = 16]``.\nNext, let's increase the ``channels`` and ``seq_len`` dimensions one by one and observe how VectorE efficiency changes.\n\nIncreasing input ``channels`` size\n--------------------------------------\n\nLet's increase the size of ``channels`` by 16x, from 256 to a more realistic value 4096. We obtain the following profile:\n\n.. _fig_mamba_v1_profile_4k_chan:\n\n.. figure:: ../../img/mamba_v1_profile_4k_chan.png\n   :align: center\n   :width: 100%\n\n   Profile of ``mamba_v1`` kernel with 4K channels\n\nThe new device execution time with increased channels is now **2.34 ms**. We can see that VectorE active duration has\ndropped to **92.16%** during the core execution period, compared to **98.71%** previously with the toy example. Let’s zoom\ninto an arbitrary region of the profile to see what could be causing VectorE to go idle:\n\n.. _fig_mamba_v1_profile_4k_chan_sem:\n\n.. figure:: ../../img/mamba_v1_profile_4k_chan_sem.png\n   :align: center\n   :width: 100%\n\n   ``mamba_v1`` kernel blocking on input tensor loading\n\nBy identifying a gap where VectorE is completely idle, we can hover over the first executed instruction after the gap\nto find out what's the reason for idleness in the instruction semaphore wait condition. In the above screenshot, the instruction\nis pending on ``S[22]`` to reach a value of 240, which is set by ``qSyncIO0`` activities. This means VectorE has been waiting\nfor input tensors to be loaded before performing more computation. If you hover over ``qSyncIO0`` activities during the\nVectorE idle period, you can also see the exact input tensor name defined in NKI being loaded in the DMA:\n\n.. _fig_mamba_v1_profile_4k_chan_load_var:\n\n.. figure:: ../../img/mamba_v1_profile_4k_chan_load_var.png\n   :align: center\n   :width: 100%\n\n   DMA loading tensor u in ``mamba_v1`` profile\n\nWe can find similar VectorE gaps through the execution trace. At this point, we can conclude one of the reasons why we have\na lower VectorE active time percentage is due to *blocking* input tensor loading (``nl.load``) activities in the DMA.\nNext, let's spend some time analyzing DMA efficiency.\n\nZooming out, we can make several observations. First, we see two orange boxes around the ``qSyncIO0`` row. Hovering over\nthe top left corners of the boxes shows two similar performance warnings for loading IO tensors:\n\n\n.. _fig_mamba_v1_profile_4k_chan_reload:\n\n.. figure:: ../../img/mamba_v1_profile_4k_chan_reload.png\n   :align: center\n   :width: 100%\n\n   Performance warnings for reloading ``u`` and ``delta`` tensors\n\nThis indicates we reload both the input ``u`` and ``delta`` tensors around 7 times. This could be inevitable\nwhen we don't have sufficient on-chip memory (SBUF) to allow full reuse of the input data tensors. 
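\n\nTo put rough numbers on the capacity question, below is a back-of-the-envelope footprint estimate. This is only a sketch: it assumes FP32 inputs, the ``channels=4096, seq_len=512`` configuration profiled here, and the roughly 192 KiB/partition SBUF capacity implied later in this tutorial.\n\n.. code-block::\n\n   channels, seq_len = 4096, 512\n   fp32_bytes = 4\n   pmax = 128                          # nl.tile_size.pmax partitions per tile\n\n   # Total device-memory footprint of delta and u combined\n   total_bytes = 2 * channels * seq_len * fp32_bytes\n   print(total_bytes / 2**20)          # 16.0 MiB\n\n   # Per-partition SBUF footprint if both tensors were kept fully resident:\n   # each of the channels // pmax tiles contributes seq_len FP32 elements per partition\n   per_partition_bytes = 2 * (channels // pmax) * seq_len * fp32_bytes\n   print(per_partition_bytes / 2**10)  # 128.0 KiB out of ~192 KiB per partition\n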
However, the profiler\nshows we are only hitting around 50% capacity usage throughout execution:\n\n.. _fig_mamba_v1_profile_4k_chan_sb:\n\n.. figure:: ../../img/mamba_v1_profile_4k_chan_sb.png\n   :align: center\n   :width: 100%\n\n   Low SBUF usage\n\nTherefore, the input tensor reloading is likely not justified, and we should investigate whether we can optimize the\nNKI kernel to avoid it.\n\n.. _tut_mamba_loop_reordering:\n\nMinimizing data reloading by loop reordering\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo understand why delta and u are being reloaded, let's revisit our input tensor shapes:\n\n\n* ``delta: [batch_size, channels, seq_len]``\n* ``u:     [batch_size, channels, seq_len]``\n* ``A:     [channels, state_size]``\n* ``B:     [batch_size, state_size, seq_len]``\n* ``C:     [batch_size, state_size, seq_len]``\n\nLet's hold ``batch_size`` constant since the majority of input tensors have completely different slices for different batch\nsamples:\n\n\n* ``delta: [channels, seq_len]``\n* ``u:     [channels, seq_len]``\n* ``A:     [channels, state_size]``\n* ``B:     [state_size, seq_len]``\n* ``C:     [state_size, seq_len]``\n\n``delta`` and ``u`` tensors have the same shape with ``channels`` as the outer dimensions, while ``B`` and ``C`` have the\nsame shape with ``state_size`` as the outer dimension. All four of these input tensors have ``seq_len`` as the inner dimension.\nTherefore, we say ``delta/u`` is reused across different states, while ``B/C`` are reused across different channels. Given\nthis conflicting reuse dimensions, we further say it is more important to **prioritize reuse of ``delta/u``** because\nthe expected size of ``channels`` is much higher than ``state_size``:\n\n\n* ``state_size`` is now 16 and typically stay small\n* ``channels`` is now 4096 and typically in the thousands\n\nIn NKI, we can prioritize ``delta/u`` reuse through loop ordering. Recall in the initial NKI kernel implementation, we have\nthe following inner loops:\n\n.. code-block::\n\n   ...\n   for i_state in nl.affine_range(state_size):\n       for i_channel_tile in nl.affine_range(n_channel_tile):\n           # step 1-6\n   ...\n\nSince these two loops are executed serially within a single NeuronCore, the loop instances will be unrolled by Neuron Compiler.\nWith the channel dimension in the fastest dimension, we will need to load ``delta/u`` across all channels in the first state,\nand then likely reload them again in the later states due to a large total memory size in ``delta`` and ``u`` (16MB in this\ncase).\n\nTo prioritize reuse of ``delta/u``\\ , we should reorder the above loop nests. To further enforce the reuse, we can hoist\nthe ``nl.load`` calls for ``delta/u`` outside of the ``i_state`` inner loop:\n\n.. code-block::\n\n   ...\n   for i_channel_tile in nl.affine_range(n_channel_tile):\n       delta_i = nl.load(...)\n       u_i = nl.load(...)\n\n       for i_state in nl.affine_range(state_size):\n           # step 1-6\n   ...\n\nAs a side effect of this loop re-ordering, we can also spot a loop fusion opportunity since we have two ``i_channel_tile``\nloop nests at the same level now:\n\n.. 
code-block::\n\n   scanC_accum = nl.zeros((n_channel_tile, nl.par_dim(channel_psize), seq_len), ...)\n   ...\n\n   # First i_channel_tile loop\n   for i_channel_tile in nl.affine_range(n_channel_tile):\n       delta_i = nl.load(...)\n       u_i = nl.load(...)\n\n       for i_state in nl.affine_range(state_size):\n           # step 1-6\n\n   # Second i_channel_tile loop\n   for i_channel_tile in nl.affine_range(n_channel_tile):\n       nl.store(..., scanC_accum[i_channel_tile, 0:channel_psize, 0:seq_len])\n\n   ...\n\nBy fusing the two ``i_channel_tile`` loop nests into a single loop nest, we can pull the declaration of ``scanC_accum``\ninside the ``i_channel_tile`` loop and further reduce the ``scanC_accum`` size requirement by a factor of ``n_channel_tile``\n:\n\n.. code-block::\n\n   ...\n\n   # First i_channel_tile loop\n   for i_channel_tile in nl.affine_range(n_channel_tile):\n       scanC_accum = nl.zeros((nl.par_dim(channel_psize), seq_len), ...)\n\n       delta_i = nl.load(...)\n       u_i = nl.load(...)\n\n       for i_state in nl.affine_range(state_size):\n           # step 1-6\n\n       nl.store(..., scanC_accum[i_channel_tile, 0:channel_psize, 0:seq_len])\n\n   ...\n\nLet's modify our initial NKI kernel implementation accordingly to get ``mamba_v2``:\n\n.. nki_example:: ../../examples/fused_mamba/mamba_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_26\n\nWe recapture the profile for the new kernel implementation:\n\n.. _fig_mamba_v2:\n\n.. figure:: ../../img/mamba_v2.png\n   :align: center\n   :width: 100%\n\n   Profile of ``mamba_v2`` kernel with loop reordering optimization\n\n\nThe device execution time is now **1.61 ms**, which is a **31%** reduction in latency compared to our initial kernel implementation.\nWe can also see VectorE active duration is back up to 99.63% and the performance warnings on input tensor reloading are\nnow gone. In case you are curious, the above loop reordering optimization alone provides around 30% of latency reduction,\nwhile the loop fusion optimization contributes the remaining 1% performance boost. This makes sense because the loop reordering\naddresses our key performance concern around input data reloading, while reducing intermediate tensor size is only a nice-to-have\ngiven we were quite low on SBUF usage to begin with.\n\nIncreasing input ``seq_len`` size\n-------------------------------------\n\nNext, let's increase the input ``seq_len`` by **16x**, from 512 to 8192 and recompile the above NKI kernel. Below is the\nassociated performance profile:\n\n.. _fig_mamba_v2_8K_seqlen:\n\n.. figure:: ../../img/mamba_v2_8K_seqlen.png\n   :align: center\n   :width: 100%\n\n   Profile of ``mamba_v2`` kernel with 8K seq_len\n\nThe new profile now takes **53.33 ms**, which is **33x longer** than the previous profile. VectorE active duration has\ndropped down to a new low: 58.93%. Compared to the profile captured with a smaller ``seq_len``, we notice new DMA activity\nrows ``qSyncSpillReload0`` and ``qVectorSpillReload0`` , which are associated with data movement traffic for intermediate\ndata spill from SBUF into device memory or reload back to SBUF. Zooming into a smaller portion of the profile:\n\n.. _fig_mamba_v2_8K_seqlen_zoomed:\n\n.. 
figure:: ../../img/mamba_v2_8K_seqlen_zoomed.png\n   :align: center\n   :width: 100%\n\n   Poor overlap of computation and data movement\n\n\nWe can see VectorE enters idle states due to a blocking semaphore wait for ``qSyncSpillReload0`` activities,\nwhich indicates the extra spill/reload is indeed degrading overall computation performance. In addition, we can see low\nSBUF usage peaking at merely 50%. Computation and data movement are also not overlapped properly, leading to low average\nutilization in both compute engines and DMA throughput in the overall timeline.\n\nIntuitively, increasing ``seq_len`` of the kernel increases the active tile sizes of input and intermediate tensors in the\nfree dimension, which could cause severe fragmentation in SBUF and excessive data movements to spill/reload tensors in\nSBUF. To mitigate these inefficiencies, we must **tile** the ``seq_len`` dimension in our NKI kernel through a new loop\nlevel.\n\n.. _tut_mamba_tiling:\n\nMitigate spilling by tiling ``seq_len``\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe have **three** key considerations when adding this new loop level:\n\n1. tile size selection,\n2. loop-carried dependency handling\n3. loop ordering with other loop nests.\n\n**Tile size of ``seq_len``.** Since previously with ``seq_len=512`` in our toy example, we were able to achieve close to\n100% VectorE utilization, let's set the tile size ``seq_len_fsize`` to 512 as a starting point. We can revisit this decision\nas needed once we obtain a new profile.\n\n**Loop-carried dependency.** Splitting ``seq_len`` into chunks is straightforward for all computation steps except for Step\n4. In the associative scan operation, the next loop iteration requires results from the previous iteration for computation.\nAs a result, we will introduce another loop-carried dependency here with the scan tiles. This dependency can be handled\nthrough the ``initial`` input parameter:\n\n.. code-block::\n\n   scan_init = nl.zeros((channel_psize, 1), ...)\n\n   for i_seq_len_tile in static_range(seq_len // seq_len_fsize):\n       scan_i = nisa.tensor_tensor_scan(deltaA, deltaBu, initial=scan_init,\n                                             op0=np.multiply, op1=np.add)\n       scan_init = scan_i[0:channel_psize, seq_len_fsize-1]\n\nNote, we choose to use ``static_range`` instead of ``affine_range`` due to the new loop-carried dependencies.\n\n**Loop ordering.** Recall from our latest NKI kernel implementation, we have the following loop nest:\n\n.. 
code-block::\n\n   ...\n   for i_batch in nl.affine_range(batch_size):\n\n       for i_channel_tile in nl.affine_range(n_channel_tile):\n           scanC_accum = nl.zeros((nl.par_dim(channel_psize), **seq_len**), ...)\n\n           delta_i = nl.load(delta[i_batch, channel_start:channel_start+channel_psize, 0:**seq_len**])\n           u_i = nl.load(u[i_batch, channel_start:channel_start+channel_psize, 0:**seq_len**])\n\n           for i_state in nl.affine_range(state_size):\n               A_i = nl.load(A[channel_start:channel_start+channel_psize, i_state])\n\n               B_i = nl.load(B[i_batch, i_state:i_state+1, 0:**seq_len**])\n               C_i = nl.load(C[i_batch, i_state:i_state+1, 0:**seq_len**])\n\n               deltaA = ...\n               deltaBu = ...\n               scanC = ...\n               ...\n               scanC_accum += ...\n\n           nl.store(..., scanC_accum[0:channel_psize, 0:**seq_len**])\n   ...\n\nLet's denote the above loop ordering as ``[batch_size, n_channel_tile, state_size]``\\ , and our key question here is where\nto insert ``seq_len`` in this list.\n\nAppending ``seq_len`` to the above list, that is, making ``seq_len`` the new inner-most loop, would involve the least amount\nof code changes to our current NKI kernel. However, it will lead to the least amount of SBUF usage reduction, since this\nloop ordering won't be tiling the ``scanC_accum``, ``delta_i`` and ``u_i`` tensors. Given ``seq_len=8192`` and FP32 data types,\nthese three tensors will occupy ``8192 * 4B * 3 = 96 KiB`` per partition, half of the available SBUF capacity. Let's go ahead and\nexperiment with this loop ordering in a new kernel ``mamba_v3``:\n\n.. _fig_mamba_v3:\n\n.. figure:: ../../img/mamba_v3.png\n   :align: center\n   :width: 100%\n\n   Profile of ``mamba_v3`` kernel with seq_len tiling optimization\n\n\nWith the above profile, the kernel now takes **27.8 ms**\\ , which is a **48%** reduction in latency compared to no ``seq_len``\ntiling. VectorE is now 94.85% active, and we no longer have spilling-related DMA activities.\n\nFinally, the key advantage of Mamba compared to Transformer models is that Mamba's computation and latency should scale\nlinearly with respect to ``seq_len``, instead of quadratically as in Transformers. Let's plot the measured kernel latencies across different\n``seq_len`` up to 8K (what we have optimized so far) and compare them against “perfect latencies” assuming linear scaling\nfrom ``seq_len=512``. We evaluate scaling efficiency using ``perfect latency / measured latency``,\nwhich is a higher-is-better metric. To showcase the importance of the last seq_len tiling optimization for scaling seq_len,\nwe also compare scaling efficiency for ``mamba_v2`` (no seq_len tiling) and ``mamba_v3`` (seq_len tiling).\n\n.. 
list-table::\n   :header-rows: 1\n\n   * - seq_len\n     - Perfect Latency (ms)\n     - mamba_v2 Measured Latency (ms)\n     - mamba_v2 Scaling Efficiency\n     - mamba_v3 Measured Latency (ms)\n     - mamba_v3 Scaling Efficiency\n   * - 512\n     - N/A\n     - 1.6\n     - N/A\n     - 1.6\n     - N/A\n   * - 1024\n     - 3.2\n     - 4.4\n     - 72.73%\n     - 3.3\n     - 96.97%\n   * - 2048\n     - 6.4\n     - 8.9\n     - 71.91%\n     - 6.6\n     - 96.97%\n   * - 3072\n     - 9.6\n     - 13.1\n     - 73.28%\n     - 10.1\n     - 95.05%\n   * - 4096\n     - 12.8\n     - 17.6\n     - 72.73%\n     - 13.3\n     - 96.24%\n   * - 5120\n     - 16\n     - 23.7\n     - 67.51%\n     - 17.3\n     - 92.49%\n   * - 6144\n     - 19.2\n     - 27.5\n     - 69.82%\n     - 19.6\n     - 97.96%\n   * - 7168\n     - 22.4\n     - 41.3\n     - 54.24%\n     - 24.2\n     - 92.56%\n   * - 8192\n     - 25.6\n     - 52.2\n     - 49.04%\n     - 27.8\n     - 92.09%\n\n\nThe above data shows the last NKI kernel implementation ``mamba_v3`` can reach 90%+ scaling efficiency up to 8K ``seq_len``.\nTo support even larger ``seq_len``, we will need more aggressive tiling by pulling the ``seq_len`` loop level further\ntowards the outer-loop level to tile more input/intermediate tensors to keep spilling low and VectorE busy.\n\nDownload All Source Code\n--------------------------\n\nClick the links to download source code of the kernels and the testing code\ndiscussed in this tutorial.\n\n* PyTorch reference implementation: :download:`mamba_torch.py <../../examples/fused_mamba/mamba_torch.py>`\n* Three versions of NKI kernels: :download:`mamba_nki_kernels.py <../../examples/fused_mamba/mamba_nki_kernels.py>`\n\nYou can also view the source code in the GitHub repository `nki_samples <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/fused_mamba/>`_\n\nExample usage of the scripts:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n**Performance mode**\n\nRun PyTorch reference implementation to generate a NEFF for profiling:\n\n.. code-block::\n\n   python3 mamba_torch.py --mode perf\n\nCheck performance numbers of mamba_v1/mamba_v2/mamba_v3:\n\n.. code-block::\n\n   python3 mamba_nki_kernels.py --mode perf --version v1 v2 v3 --batch 1 --seq_len 2048 --channels 512 --state_size 16\n\n\n**Accuracy mode**\n\nCheck mamba_v1 NKI kernel accuracy against PyTorch implementation:\n\n.. code-block::\n\n   python3 mamba_torch.py --mode accuracy\n\nCheck optimized Mamba kernel (mamba_v2, mamba_v3) accuracy against mamba_v1:\n\n.. code-block::\n\n   python3 mamba_nki_kernels.py --mode accuracy --version v1 v2 v3 --batch 1 --seq_len 2048 --channels 512 --state_size 16\n"
  },
  {
    "path": "nki/guides/tutorials/index.rst",
    "content": ".. meta::\n    :description: Hands-on tutorials for AWS Neuron Kernel Interface (NKI), covering matrix operations, normalization techniques, advanced kernels, and distributed computing patterns.\n    :keywords: NKI, AWS Neuron, Tutorials, Matrix Multiplication, Normalization\n    :date-modified: 12/01/2025\n\n.. _nki-tutorials:\n\nNKI Tutorials\n==============\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Matrix Multiplication <matrix_multiplication>\n   average_pool2d\n   transpose2d\n   fused_mamba\n   kernel-optimization\n\nThis section provides hands-on tutorials for the Neuron Kernel Interface (NKI), demonstrating how to write custom kernels for AWS Trainium and Inferentia instances. These tutorials cover fundamental operations, advanced techniques, and distributed computing patterns using NKI.\n\nThe full source code of the following tutorials can be also viewed on the \n`nki-samples <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/>`_ repository on GitHub.\n\nBasic Operations\n----------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: matrix_multiplication\n      :link-type: doc\n\n      **Matrix Multiplication**\n      ^^^\n      Learn the fundamentals of implementing matrix multiplication in your NKI kernels.\n\n   .. grid-item-card::\n      :link: transpose2d\n      :link-type: doc\n\n      **2D Transpose**\n      ^^^\n      Implement efficient 2D matrix transpose operations using NKI\n\n   .. grid-item-card::\n      :link: average_pool2d\n      :link-type: doc\n\n      **Average Pooling 2D**\n      ^^^\n      Create custom 2D average pooling kernels for computer vision workloads\n\nAdvanced Kernels\n----------------\n\n.. grid:: 1 1 2 2\n   :gutter: 2\n\n   .. grid-item-card::\n      :link: fused_mamba\n      :link-type: doc\n\n      **Fused Mamba**\n      ^^^\n      Implement fused Mamba state space model kernels\n\n   .. grid-item-card::\n      :link: kernel-optimization\n      :link-type: doc\n\n      **Kernel Optimization**\n      ^^^\n      Learn the recommended workflow for optimizing NKI kernels using profiling and performance analysis\n"
  },
  {
    "path": "nki/guides/tutorials/kernel-optimization.rst",
    "content": ".. meta::\n    :description: Learn the recommended workflow for optimizing kernels with NKI and AWS Neuron.\n    :date-modified: 12/02/2025\n\n.. _nki-kernel-optimization-guide:\n\nIntroduction to NKI Kernel Optimization\n========================================\n\nThe Neuron Kernel Interface (NKI) provides an API for writing hand-tuned kernels. You use the Instruction Set Architecture (ISA) of a Neuron device directly to speed up critical parts of an ML model. This topic covers how you develop and tune NKI kernels and how this applies to developing with the AWS Neuron SDK. The Neuron Profiler helps you identify opportunities to improve ML model performance and drive hand-tuned optimizations with a NKI kernel.\n\n\nOverview\n--------\n\nDevelopers commonly create NKI kernels to accelerate critical operations in larger ML inference or training models. Just as you might accelerate a traditional program by writing small parts in inline assembler, NKI lets you directly program the underlying Neuron hardware using the same ISA instructions the Neuron Compiler generates. In this overview, we use a kernel that performs matrix multiply as an example. We use the profiler to work from a simpler, more obviously correct version of the kernel to a version that performs better by improving memory usage by removing redudant loads and increasing DMA efficiency through blocking which better overlaps loading data and computing results. Along the way, we use a test program in PyTorch or JAX to ensure each step preserves a working kernel. We use the Neuron Explorer to drive additional performance improvements. We also change the kernel from being memory bound in the initial tiled implementation to being compute bound, as we would expect, in the optimized version of the kernel.\n\nApplies to\n-----------\n\nThis concept is applicable to:\n\n*  Improving the performance of critical sections of ML inference or training models.\n* Writing small performant kernels for standalone ML inference or training.\n\nWhen to write a kernel?\n------------------------\n\nThe Neuron Compiler takes ML models written in PyTorch, JAX, and other frameworks and generates the best performing code it can based on that model. Like any general purpose compiler, it may make optimization decisions that work well for the general case but may not produce optimal code for this specific model. The Neuron Kernel Interface (NKI) provides a mechanism for replacing sections of a model with a hand-tuned kernel. The first step in identifying a good candidate for turning a section of a model into a kernel is the Neuron Profiler, which provides a view on how the model performs.\n\nThe Neuron Profiler can help indicate where the model might benefit from optimization. You can map sections in the Neuron Profiler where one or more engines are idle while waiting on DMA or similar apparent gaps to places in the model where code may execute several times. These can be good candidates for writing a custom kernel. Good candidates are similar to where you might split a large function into smaller functions in a traditional program. This means some \"minimum cut\" in the graph where there are relatively few inputs and outputs of the kernel.\n\nStarting simple\n----------------\n\nThe end goal of writing a kernel is to improve the performance of the model, but the first step is to write a kernel that correctly performs the operation you wish to replace in the graph. 
As a motivating example, suppose that the section of the graph you wish to replace consists of a matrix multiply of two relatively large matrices. Kernels will often be more sophisticated than this, as you can see by looking at the Neuron Kernel Library (NKI-Lib), for instance performing functions like RMSNorm-Quant or QKV, but matrix multiply may be an aspect of these more sophisticated kernels.\n\nNKI provides the ``nki.isa.nc_matmul`` instruction to perform a matrix multiply. This instruction operates over restricted-size matrices, with at most a 128 x 128 \"stationary\" (weights) matrix and a 128 x 512 \"moving\" (ifmap) matrix. This allows you to produce a 128 x 512 matrix, at most, as output. The \"stationary\" matrix must be transposed to get a result that is not transposed. To call the ``nki.isa.nc_matmul`` instruction, provide the input matrices in the state buffer (SBUF), and the result will be written into the partial sum buffer (PSUM). If you use a small driver program to invoke the kernel, the arguments will be passed in from the device memory (HBM) and the result will be read from HBM as well. The kernel will move inputs from HBM to SBUF, call the ``nki.isa.nc_matmul`` instruction, move the result from PSUM to SBUF (you cannot move data directly from PSUM to HBM), and then from SBUF to HBM.\n\n.. code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n    import os\n\n    os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n    \n    @nki.jit\n    def matrix_multiply_kernel(lhsT, rhs):\n      \"\"\"NKI kernel to compute a matrix multiplication operation on a single tile\n\n      Args:\n        lhsT: an input tensor of shape [K,M], where both K and M are, at most, \n          128.  It is the left-hand-side argument of the matrix multiplication,\n          delivered transposed for optimal performance.\n        rhs: an input tensor of shape [K,N], where K is, at most, 128, and N\n          is, at most, 512.  
It is the right-hand-side argument of the matrix\n          multiplication.\n      Returns:\n        result: the resulting output tensor of shape [M,N]\n      \"\"\"\n      # Verify that the lhsT and rhs are the expected sizes.\n      K, M = lhsT.shape\n      K_, N = rhs.shape\n\n      # Ensure that the contraction dimension matches\n      assert K == K_, \\\n        f\"Contraction demention {K} does not match {K_}, did you remember to transpose?\"\n\n      # Ensure the dimensions will fit within the constrins of matmul.\n      assert K <= nl.tile_size.pmax, \\\n        f\"Expected partition dimension in lhsT ({K}) to be less than {nl.tile_size.pmax}\"\n      assert M <= nl.tile_size.gemm_stationary_fmax, \\\n        f\"Expected free dimension in lhsT ({M}) to be less than \" \\\n        f\"{nl.tile_size.gemm_stationary_fmax}\"\n      assert N <= nl.tile_size.gemm_moving_fmax, \\\n        f\"Expected free dimension in rhs ({N}) to be less than \" \\\n        f\"{nl.tile_size.gemm_moving_fmax}\"\n\n      # Allocate tiles for lhsT and rhs on sbuf (uninitialized)\n      lhsT_tile = nl.ndarray(shape=lhsT.shape, dtype=lhsT.dtype, buffer=nl.sbuf)\n      rhs_tile = nl.ndarray(shape=rhs.shape, dtype=rhs.dtype, buffer=nl.sbuf)\n\n      # Copy the input matrices from HBM to SBUF\n      nisa.dma_copy(dst=lhsT_tile, src=lhsT)\n      nisa.dma_copy(dst=rhs_tile, src=rhs)\n\n      # Perform matrix multiply, result will be written into PSUM\n      result_tile = nl.ndarray(shape=(M, N), dtype=nl.float32, buffer=nl.psum)\n      nisa.nc_matmul(dst=result_tile, stationary=lhsT_tile, moving=rhs_tile)\n\n      # Copy result to SBUF (we cannot copy directly from PSUM to HBM)\n      result_tmp = nl.ndarray(shape=result_tile.shape,\n                              dtype=result_tile.dtype,\n                              buffer=nl.sbuf)\n      nisa.tensor_copy(dst=result_tmp, src=result_tile)\n\n      # Copy result to HBM\n      result = nl.ndarray(shape=result_tmp.shape,\n                          dtype=result_tmp.dtype,\n                          buffer=nl.hbm)\n      nisa.dma_copy(dst=result, src=result_tmp)\n\n      return result\n\nThis small kernel allows you to experiment with the ``nki.isa.nc_matmul`` instruction and you can test that it works with a simple driver.\n\n.. tabs::\n\n   .. tab:: PyTorch\n\n      .. 
code-block:: python\n\n          import numpy as np\n          import torch\n          import torch_xla\n          import torch_xla\n          from multiply_kernel import matrix_multiply_kernel\n\n          # Set up our initial inputs in numpy, and compute the matrix multiply in pure\n          # numpy on the CPU\n          rng = np.random.default_rng()\n          lhs = rng.random((128, 128), dtype=np.float32)\n          rhs = rng.random((128, 512), dtype=np.float32)\n          expected_result = np.matmul(lhs, rhs)\n\n          # Setup the XLA device and generate input tensors.\n          device = torch_xla.device()\n\n          lhsT_torch = torch.from_numpy(lhs.T).to(device=device)\n          rhs_torch = torch.from_numpy(rhs).to(device=device)\n\n          # Invoke the kernel to add the results.\n          result_device = matrix_multiply_kernel(lhsT_torch, rhs_torch)\n\n          result_torch = result_device.cpu()\n\n          if np.allclose(expected_result, result_torch):\n              print(\"Kernel computed correct output\")\n              print(result_torch)\n          else:\n              print(\"FAILED: Kernel computed output off from expected\")\n              print(\"expected:\")\n              print(expected_result)\n              print(\"actual:\")\n              print(result_torch)\n\n   .. tab:: JAX\n\n      .. code-block:: python\n\n          import numpy as onp\n          import jax.numpy as jnp\n          from multiply_kernel import matrix_multiply_kernel\n\n          # Set up our initial inputs in numpy, and compute the matrix multiply in pure\n          # numpy on the CPU\n          rng = onp.random.default_rng()\n          lhs = rng.random((128, 128), dtype=onp.float32)\n          rhs = rng.random((128, 512), dtype=onp.float32)\n          expected_result = onp.matmul(lhs, rhs)\n\n          # Generate the input tensors\n          lhsT_jax = jnp.array(lhs.T)\n          rhs_jax = jnp.array(rhs)\n\n          result_jax = matrix_multiply_kernel(lhsT_jax, rhs_jax)\n\n          if onp.allclose(expected_result, result_jax):\n              print(\"Kernel computed correct output\")\n              print(result_jax)\n          else:\n              print(\"FAILED: Kernel computed output off from expected\")\n              print(\"expected:\")\n              print(expected_result)\n              print(\"actual:\")\n              print(result_jax)\n\nYou can validate that you have the correct understanding of the nki.isa.nc_matmul instruction by invoking your test:\n\n.. code-block:: bash\n\n    $ python driver.py\n    Kernel computed correct output\n    tensor([[35.7896, 32.8659, 31.6545,  ..., 37.1804, 31.4682, 33.9796],\n            [28.8202, 27.4512, 26.0832,  ..., 30.1993, 27.0034, 27.1942],\n            [35.0943, 30.6835, 33.3721,  ..., 36.8755, 32.7837, 32.4317],\n            ...,\n            [34.9192, 30.0401, 32.3874,  ..., 34.2831, 31.9439, 32.8761],\n            [33.0372, 28.7389, 32.2096,  ..., 34.8574, 30.7248, 32.1855],\n            [32.4571, 29.1864, 31.7483,  ..., 33.3723, 30.1617, 29.8077]])\n\n(Note that there will be some additional output, which varies slightly depending on which framework you use. The values will also vary, since the inputs are randomly generated.)\n\nAs you become more familiar with NKI, you will no longer need to start with quite so simple a variation on the kernel. While this kernel allowed us to validate our understanding of the ``nki.isa.nc_matmul`` instruction, it will not allow you to pass in matrices larger than a single tile. 
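\n\nTo get a sense of what going beyond a single tile means in practice, the short sketch below counts the tile-level operations implied by the larger test case used later in this topic (a 4096x8192 by 8192x8192 multiply); the numbers follow directly from the tile limits quoted above.\n\n.. code-block:: python\n\n    # Tile limits for nki.isa.nc_matmul quoted above\n    TILE_M, TILE_K, TILE_N = 128, 128, 512\n\n    # Input sizes used by the larger test driver below: lhs is M x K, rhs is K x N\n    M, K, N = 4096, 8192, 8192\n\n    m_tiles, k_tiles, n_tiles = M // TILE_M, K // TILE_K, N // TILE_N\n    print(m_tiles, k_tiles, n_tiles)        # 32, 64, 16\n    print(m_tiles * n_tiles * k_tiles)      # 32768 nc_matmul instructions in total\n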
A more realistic variant of the kernel needs to take matrices larger than the tile size, break down the inputs into single tiles, compute each output tile, then write the result back to HBM.\n\nWriting the kernel\n-------------------\n\nThe simple start allowed us to validate our understanding of the ``nki.isa.matmul`` instruction. The following kernel shows how you can do this with input matrices that are larger than a single tile size. You may recognize the traditional three nested loop structure of matrix multiply, but instead of the inner body computing a scalar value it operates over a full tile.\n\n.. code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n    import os\n\n    os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n\n    @nki.jit\n    def matrix_multiply_kernel(lhsT, rhs):\n      \"\"\"NKI kernel to compute a matrix multiplication operation in a tiled manner\n\n      Args:\n          lhsT: an input tensor of shape [K,M], where both K and M are multiples for\n            128.  It is the left-hand-side argument of the matrix multiplication,\n            delivered transposed for optimal performance.\n          rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n            is a multiple of 512.  It is the right-hand-side argument of the matrix\n            multiplication.\n      Returns:\n          result: the resulting output tensor of shape [M,N]\n      \"\"\"\n\n      # Verify that the lhsT and rhs have the same contraction dimension.\n      K, M = lhsT.shape\n      K_, N = rhs.shape\n      assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n     \n      # Lookup the device matrix multiply dimensions.\n      TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n      TILE_K = nl.tile_size.pmax  # 128\n      TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n     \n      # Verify that the input matrices are a multiple of the tile dimensions.\n      assert M % TILE_M == 0, \\\n        f\"Expected M, {M}, to be a multiple of stationary free-dimension max, {TILE_M}\"\n      assert N % TILE_N == 0, \\\n        f\"Expected N, {N}, to be a multiple of moving free-dimension max, {TILE_N}\"\n      assert K % TILE_K == 0, \\\n        f\"Expected K, {K}, to be a multiple of the partition dimension max, {TILE_K}\"\n     \n      # Create a space for the result in HBM (uninitialized)\n      result = nl.ndarray(shape=(M, N), dtype=lhsT.dtype, buffer=nl.hbm)\n     \n      # Use affine_range to loop over tiles\n      for m in nl.affine_range(M // TILE_M):\n        for n in nl.affine_range(N // TILE_N):\n          # Allocate a tensor in PSUM (uninitialized)\n          result_tile = nl.ndarray(shape=(TILE_M, TILE_N),\n                               dtype=nl.float32,\n                               buffer=nl.psum)\n     \n          for k in nl.affine_range(K // TILE_K):\n            # Declare the tiles on SBUF (uninitialized)\n            lhsT_tile = nl.ndarray(shape=(TILE_K, TILE_M),\n                               dtype=lhsT.dtype,\n                               buffer=nl.sbuf)\n            rhs_tile = nl.ndarray(shape=(TILE_K, TILE_N),\n                              dtype=rhs.dtype,\n                              buffer=nl.sbuf)\n     \n            # Load tiles from lhsT and rhs\n            nisa.dma_copy(dst=lhsT_tile, \n                      src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                               m * TILE_M:(m + 1) * TILE_M])\n            nisa.dma_copy(dst=rhs_tile,\n                   
   src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                              n * TILE_N:(n + 1) * TILE_N])\n     \n            # Accumulate partial-sums into PSUM\n            nisa.nc_matmul(dst=result_tile, stationary=lhsT_tile, moving=rhs_tile)\n     \n          # Copy the result from PSUM back to SBUF, and cast to expected\n          # output data-type\n          result_tmp = nl.ndarray(shape=(TILE_M, TILE_N),\n                              dtype=nl.float32,\n                              buffer=nl.sbuf)\n          nisa.tensor_copy(dst=result_tmp, src=result_tile)\n\n          # Copy the result from SBUF to HBM.\n          nisa.dma_copy(dst=result[m * TILE_M:(m + 1) * TILE_M,\n                               n * TILE_N:(n + 1) * TILE_N],\n                    src=result_tmp)\n     \n      return result\n\nThe tiled version expects the input and output matrices to be a multiple of the tile sizes. In cases where the matrices you want to multiply do not match that, they can be padded or the implementation could be extended to handle the sub-tile sized edges. The body of the n and m loops allocates a result_tile in the PSUM. The inner-most k loop then loads the tiles from the lhsT and rhs inputs into SBUF from HBM, performs the matrix multiply, accumulating the result into the result_tile. After the k loop completes, the m, n tile has been computed and can be moved from PSUM to SBUF and then written into the correct position in the result HBM.\n\nNow that you have a kernel that can handle what you expect the model to need, you can extend the small test driver above to ensure you can keep the kernel functioning correctly as you begin to improve the performance of the kernel. This driver is something you can continue to use with each progressive improvement of the kernel. This is just a variation on the original test that provides input matrices large enough to represent the real workload the kernel will be expected to handle.\n\nIn this case that just means increasing the size of the input matrices from a single tile at 128x128 x 128x512 to something slightly more realistic at 4096x8192 x 8192x8192. You can update the numpy generation of inputs to set the lhs and rhs to the new dimensions.\n\n.. code-block:: python\n\n    lhs = rng.random((4096, 8192), dtype=np.float32)\n    rhs = rng.random((8192, 8192), dtype=np.float32)\n\nIt is important to select input sizes that are realistic (or at least representative) of the real work you expect the kernel to handle, because you will use this test not just for correctness, but also to allow you to profile the kernel to guide improvements on the kernel's performance.\n\nIn addition to changing the size of the input to the kernel, you will also want to enable profiling of the kernel. You will use the approach described in the :doc:`Neuron Explorer user guide </tools/neuron-explorer/index>` to profile just the call to the NKI matrix multiply kernel. With this you can surround the call to the kernel with the profiling context.\n\n.. tabs::\n\n   .. tab:: PyTorch\n\n      .. code-block:: python\n\n          from torch_neuronx.experimental import profiler\n          ...\n          with profiler.profile(port=9012,\n                                profile_type='system',\n                                target='neuron_profile_perfetto',\n                                output_dir='./output',\n                                ms_duration=600000) as profiler:\n              result_device = matrix_multiply_kernel(lhsT_device, rhs_device)\n\n\n\n   .. 
tab:: JAX\n\n      .. code-block:: python\n\n          import jax\n          ...\n          with jax.profiler.trace(\"./output\"):\n            result_jax = matrix_multiply_kernel(lhsT_jax, rhs_jax)\n\nWhen you run the test driver, in addition to showing that the output matches the numpy result, you will also get both the Neuron Execution File Format (NEFF) file, which is what executes on the accelerator, and the Neuron Timing File Format (NTFF) file generated by running the kernel with profiling enabled. You can use these two files with the Neuron Profiler to view the results of running the kernel.\n\nLooking at this profile for the full kernel run, you can see that the DMA queues which move data from HBM to SBUF and back are quite active. Looking at the Tensor and TensorMatrix lines, it appears there are some gaps within the run as well. The heavy use of DMA and the Tensor Engine (TensorE) is not too surprising, since those are the two things the kernel is primarily doing. The profile also provides some data as an overview of how much each engine is being used. You can zoom in to one of the areas where you see a gap and validate the impression.\n\n.. image:: /nki/img/how-to/v2-full.png\n\nYou can see that the TensorE is busy from the start of the kernel through the end. Note that matrix multiply becomes two instructions on the hardware: load weights, which loads the stationary matrix, and matrix multiply, which loads the moving matrix and performs the matrix multiply operation.\n\n.. image:: /nki/img/how-to/v2-zoom.png\n\nHowever, when we zoom in, we can see gaps between matrix multiply operations, which indicate that the TensorE is waiting on data to be read from HBM to SBUF before the next operation can take place. Looking at the original kernel code, you can see that you are loading the two tiles just before each matrix multiply. Looking at the summary data provided in the profile, you can also see that the DMA engines were active 99.93% of the time while the TensorE was only active 87.28% of the run.\n\n.. list-table::\n   :header-rows: 0\n   :widths: 50 50\n\n   * - .. image:: /nki/img/how-to/v2-dma.png\n          :width: 100%\n     - .. image:: /nki/img/how-to/v2-pe.png\n          :width: 100%\n\n\nAnalyzing the kernel\n---------------------\n\nThe first step to improving the performance of the kernel is to analyze the performance you observed and apply that to your understanding of the NeuronEngine Architecture. The NeuronEngine Architecture consists of a number of computational engines that can each run independently, assuming the inputs are available for each instruction. In the current example, the only computational engine you are using is the TensorE, and all of its inputs are coming directly from the DMA engines just before the computation is performed, with the output of each tile written back after the inner-most k loop completes. Considering that matrix multiply is compute bound, you would expect that the matrix multiply instruction should be the limiting factor of your performance. However, TensorE was only active about 69.83% of the time, which tells us you can likely get more data to it faster to improve the overall computation time.\n\nLooking at this, you might notice two things. First, since the data for each matrix multiply is being loaded just before the multiply, you are always waiting on these loads to complete before you can start the next multiply. If you look at the structure of the iteration, you can also see that you will load the same tile more than once. 
For instance the m=0, k=0 tile will be loaded N // TILE_N times. One change you could make is to load all of the tiles needed to compute a given output tile before you start the computation. You can accomplish this by moving the loads out into the outer loops, loading all K // TILE_K tiles for a given value of m from the stationary matrix at the start of the m loop, and all K // TILE_K tiles for a given value of n from the stationary matrix at the start of the n loop.\n\n.. code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n    import os\n\n    os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n\n    @nki.jit\n    def matrix_multiply_kernel(lhsT, rhs):\n      \"\"\"NKI kernel to compute a matrix multiplication operation in a tiled manner\n         while hoisting the load of the lhsT and rhs to outer loops.\n\n      Args:\n          lhsT: an input tensor of shape [K,M], where both K and M are multiples for\n            128.  It is the left-hand-side argument of the matrix multiplication,\n            delivered transposed for optimal performance.\n          rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n            is a multiple of 512.  It is the right-hand-side argument of the matrix\n            multiplication.\n      Returns:\n          result: the resulting output tensor of shape [M,N]\n      \"\"\"\n\n      # Verify that the lhsT and rhs are the expected sizes.\n      K, M = lhsT.shape\n      K_, N = rhs.shape\n      assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n      result = nl.ndarray(shape=(M, N), dtype=nl.float32, buffer=nl.hbm)\n\n      # Lookup the device matrix multiply dimensions.\n      TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n      TILE_K = nl.tile_size.pmax  # 128\n      TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n      # Verify that the input matrices are a multiple of the tile dimensions.\n      assert M % TILE_M == 0, \\\n        f\"Expected M, {M}, to be a multiple of stationary free-dimension max, {TILE_M}\"\n      assert N % TILE_N == 0, \\\n        f\"Expected N, {N}, to be a multiple of moving free-dimension max, {TILE_N}\"\n      assert K % TILE_K == 0, \\\n        f\"Expected K, {K}, to be a multiple of the partition dimension max, {TILE_K}\"\n\n      # Use affine_range to loop over tiles\n      for m in nl.affine_range(M // TILE_M):\n        # Load a whole column tiles from lhsT (with K * TILE_M numbers)\n        # This corresponds to the whole row in the original lhs\n        lhsT_tiles = []\n        for k in nl.affine_range(K // TILE_K):\n          # Allocate space in SBUF for the tile (uninitialized)\n          lhsT_tile = nl.ndarray(shape=(TILE_K, TILE_M),\n                               dtype=lhsT.dtype,\n                               buffer=nl.sbuf)\n          # Copy the tile from HBM to SBUF\n          nisa.dma_copy(dst=lhsT_tile, \n                      src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                             m * TILE_M:(m + 1) * TILE_M])\n          # Append the tile to the list of tiles.\n          lhsT_tiles.append(lhsT_tile)\n\n        for n in nl.affine_range(N // TILE_N):\n          # Load a whole column tiles from rhs (with K * TILE_N numbers)\n          rhs_tiles = []\n          for k in nl.affine_range(K // TILE_K):\n            # Allocate space in SBUF for the tile (uninitialized)\n            rhs_tile = nl.ndarray(shape=(TILE_K, TILE_N),\n                              dtype=rhs.dtype,\n                        
      buffer=nl.sbuf)\n            # Copy the tile from HBM to SBUF\n            nisa.dma_copy(dst=rhs_tile,\n                      src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                              n * TILE_N:(n + 1) * TILE_N])\n            # Append the tile to the list of tiles.\n            rhs_tiles.append(rhs_tile)\n\n          # Allocate a tile in PSUM for the result\n          result_tile = nl.ndarray(shape=(TILE_M, TILE_N),\n                               dtype=nl.float32,\n                               buffer=nl.psum)\n          for k in nl.affine_range(K // TILE_K):\n            # Accumulate partial-sums into PSUM\n            nisa.nc_matmul(dst=result_tile,\n                       stationary=lhsT_tiles[k],\n                       moving=rhs_tiles[k])\n\n          # Copy the result from PSUM back to SBUF, and cast to expected\n          # output data-type\n          result_tmp = nl.ndarray(shape=(TILE_M, TILE_N),\n                              dtype=nl.float32,\n                              buffer=nl.sbuf)\n          nisa.tensor_copy(dst=result_tmp, src=result_tile)\n\n          # Copy the result from SBUF to HBM.\n          nisa.dma_copy(dst=result[m * TILE_M:(m + 1) * TILE_M,\n                               n * TILE_N:(n + 1) * TILE_N],\n                    src=result_tmp)\n\n      return result\n\nThe test program validates that the new implementation is correct and also provides a new NEFF and NTFF.\n\n.. image:: /nki/img/how-to/v3-full.png\n\nAt this level the profile does not look too different; however, when you zoom in, you can see that the matrix multiplies no longer show so many gaps.\n\n.. image:: /nki/img/how-to/v3-zoom.png\n\nAnalyzing the improvement, though, you can see that this change has made big strides. The DMA and matrix multiply are better overlapped: the DMA engines are now busy 99.73% of the time, slightly more than before, but the TensorE is busy 99.85% of the time. This is a huge improvement, but the time spent in the kernel is still dominated by DMA.\n\n.. list-table::\n   :header-rows: 0\n   :widths: 50 50\n\n   * - .. image:: /nki/img/how-to/v3-dma.png\n          :width: 100%\n     - .. image:: /nki/img/how-to/v3-pe.png\n          :width: 100%\n\nOverlapping data and compute through blocking\n-----------------------------------------------\n\nThe previous refinement of the kernel showed that you can improve the utilization of the TensorE by improving how the data is loaded. Instead of loading each tile in the innermost loop, lifting the loads to the outer loops and loading a whole column from both the transposed stationary matrix and the moving matrix reduced the overall amount of data that needed to be moved from HBM to SBUF. However, the fact that the kernel is still memory-bound means there is more that can be done.\n\nBlocking is a technique that helps load even larger amounts of data at a time. Instead of copying single tiles of data from HBM to SBUF, you can load a full block, which consists of multiple tiles. Since matrix multiply still needs to operate tile by tile, you compute all of the tiles in the block before proceeding to the next block.\n\n.. 
code-block:: python\n\n    import nki\n    import nki.language as nl\n    import nki.isa as nisa\n    import os\n\n    os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n\n    @nki.jit\n    def matrix_multiply_kernel(lhsT, rhs):\n      \"\"\"NKI kernel to compute a matrix multiplication operation while blocking the\n         free dimensions of the LHS and RHS to improve the memory access pattern.\n      \n      Args:\n          lhsT: an input tensor of shape [K,M], where both K and M are multiples of\n            128.  It is the left-hand-side argument of the matrix multiplication,\n            delivered transposed for optimal performance.\n          rhs: an input tensor of shape [K,N], where K is a multiple of 128, and N\n            is a multiple of 512.  It is the right-hand-side argument of the matrix\n            multiplication.\n      Returns:\n          result: the resulting output tensor of shape [M,N]\n      \"\"\"\n      \n      # Verify that the lhsT and rhs have the same contraction dimension.\n      K, M = lhsT.shape\n      K_, N = rhs.shape\n      assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n      \n      # Lookup the device matrix multiply dimensions.\n      TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n      TILE_K = nl.tile_size.pmax  # 128\n      TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n      \n      # Configure the blocking size for the free dimensions\n      TILES_IN_BLOCK_M = 2\n      TILES_IN_BLOCK_N = 2\n      \n      BLOCK_M = TILE_M * TILES_IN_BLOCK_M  # 256\n      BLOCK_N = TILE_N * TILES_IN_BLOCK_N  # 1024\n      \n      # The size has to be a multiple of the block size\n      assert M % BLOCK_M == 0, f\"Expected M ({M}) to be divisible by BLOCK_M ({BLOCK_M})\"\n      assert N % BLOCK_N == 0, f\"Expected N ({N}) to be divisible by BLOCK_N ({BLOCK_N})\"\n\n      # Create a space for the result in HBM (not initialized)\n      result = nl.ndarray(shape=(M, N), dtype=lhsT.dtype, buffer=nl.hbm)\n      \n      # Loop over blocks in the M dimension\n      for m in nl.affine_range(M // BLOCK_M):\n        # Load TILES_IN_BLOCK_M columns of tiles from lhsT\n        lhsT_tiles = []\n        for bm in nl.affine_range(TILES_IN_BLOCK_M):\n          # Inner tile array.\n          lhsT_tiles_internal = []\n          for k in nl.affine_range(K // TILE_K):\n            # Allocate space in SBUF for the tile (uninitialized)\n            lhsT_tile = nl.ndarray(shape=(TILE_K, TILE_M),\n                                   dtype=lhsT.dtype,\n                                   buffer=nl.sbuf)\n            # Copy the tile from HBM to SBUF\n            nisa.dma_copy(dst=lhsT_tile,\n                    src=lhsT[k * TILE_K:(k + 1) * TILE_K,\n                         (m * TILES_IN_BLOCK_M + bm) *\n                         TILE_M:((m * TILES_IN_BLOCK_M + bm) + 1) *\n                         TILE_M])\n            # Append the tile to the inner list of tiles.\n            lhsT_tiles_internal.append(lhsT_tile)\n          # Append the inner list of tiles into the outer list of tiles.\n          lhsT_tiles.append(lhsT_tiles_internal)\n      \n        for n in nl.affine_range(N // BLOCK_N):\n          # Load TILES_IN_BLOCK_N columns from rhs\n          rhs_tiles = []\n          for bn in nl.affine_range(TILES_IN_BLOCK_N):\n            # Inner tile array.\n            rhs_tiles_internal = []\n            for k in nl.affine_range(K // TILE_K):\n              # Allocate space in SBUF for the tile (uninitialized)\n              rhs_tile = 
nl.ndarray(shape=(TILE_K, TILE_N),\n                                    dtype=rhs.dtype,\n                                    buffer=nl.sbuf)\n              # Copy the tile from HBM to SBUF\n              nisa.dma_copy(dst=rhs_tile,\n                    src=rhs[k * TILE_K:(k + 1) * TILE_K,\n                        (n * TILES_IN_BLOCK_N + bn) *\n                        TILE_N:((n * TILES_IN_BLOCK_N + bn) + 1) *\n                        TILE_N])\n              # Append the tile to the inner list of tiles.\n              rhs_tiles_internal.append(rhs_tile)\n            # Append the inner list of tiles into the outer list of tiles.\n            rhs_tiles.append(rhs_tiles_internal)\n      \n          for bm in nl.affine_range(TILES_IN_BLOCK_M):\n            for bn in nl.affine_range(TILES_IN_BLOCK_N):\n              # Allocate a tensor in PSUM\n              result_tile = nl.ndarray(shape=(TILE_M, TILE_N),\n                                       dtype=nl.float32,\n                                       buffer=nl.psum)\n              for k in nl.affine_range(K // TILE_K):\n                # Accumulate partial-sums into PSUM\n                nisa.nc_matmul(dst=result_tile,\n                               stationary=lhsT_tiles[bm][k],\n                               moving=rhs_tiles[bn][k])\n      \n              # Copy the result from PSUM back to SBUF, and cast to expected\n              # output data-type\n              result_tmp = nl.ndarray(shape=result_tile.shape,\n                                      dtype=result.dtype,\n                                      buffer=nl.sbuf)\n              nisa.tensor_copy(dst=result_tmp, src=result_tile)\n\n              # Copy the result from SBUF to HBM.\n              nisa.dma_copy(dst=result[(m * TILES_IN_BLOCK_M + bm) *\n                                       TILE_M:((m * TILES_IN_BLOCK_M + bm) + 1) *\n                                       TILE_M,\n                                       (n * TILES_IN_BLOCK_N + bn) *\n                                       TILE_N:((n * TILES_IN_BLOCK_N + bn) + 1) *\n                                       TILE_N],\n                            src=result_tmp)\n      \n      return result\n\nRunning the test driver ensures the new implementation of the kernel is correct and provides a new NEFF and NTFF that help us understand the improvements.\n\n.. image:: /nki/img/how-to/v4-full.png\n\nZooming in on a similarly sized section shows that while the overall time of the kernel has improved, there are again gaps between our matrix multiply instructions.\n\n.. image:: /nki/img/how-to/v4-zoom.png\n\nAgain you can see gaps in the matrix multiply. Even though the new implementation improves the overall time of the kernel and reduces the number of DMA instructions (because each instruction loads more data), you now wait longer for each block to load. In fact, even though the performance improved, the TensorE is actually less utilized as a percentage of time, dropping to 99.52%, with the DMA engines hitting 95.70%. This means there is a small amount of time when only the TensorE is being used, but the DMA engines are still active for most of the kernel run, which you should expect can be reduced further.\n\n.. list-table::\n   :header-rows: 0\n   :widths: 50 50\n\n   * - .. image:: /nki/img/how-to/v4-dma.png\n          :width: 100%\n     - .. 
image:: /nki/img/how-to/v4-pe.png\n          :width: 100%\n\nOptimizing DMA through blocking the contraction dimension\n---------------------------------------------------------\n\nOne of the advantages of leaving the K dimension unblocked was that you could rely on the PSUM buffer to hold the final computed value. To block in the K dimension, you will need to store intermediate partial sums in a temporary SBUF array of tiles. The ``nki.isa.tensor_tensor`` instruction can be used to add two tensors, allowing you to accumulate into the temporary tile. With this, you can build blocks in all three dimensions. This version of blocking loads the blocks in ``BLOCK_K`` by ``BLOCK_M`` and ``BLOCK_K`` by ``BLOCK_N`` dimensions.\n\n.. code-block:: python\n\n   import nki\n   import nki.language as nl\n   import nki.isa as nisa\n   import os\n\n   os.environ[\"NEURON_PLATFORM_TARGET_OVERRIDE\"] = \"trn2\"\n\n   @nki.jit\n   def matrix_multiply_kernel(\n       lhsT,\n       rhs,\n       # Meta-parameters\n       TILES_IN_BLOCK_M=16,\n       TILES_IN_BLOCK_N=2,\n       TILES_IN_BLOCK_K=8,\n   ):\n     \"\"\"NKI kernel to compute a large matrix multiplication efficiently by\n        blocking all dimensions and doing layout optimization.\n     \n     Args:\n         lhsT: an input tensor of shape [K,M], where K is a multiple of 128 *\n           TILES_IN_BLOCK_K and M is a multiple of 128 * TILES_IN_BLOCK_M.  It is the\n           left-hand-side argument of the matrix multiplication, delivered transposed\n           for optimal performance.\n         rhs: an input tensor of shape [K,N],  where K is a multiple of 128 *\n           TILES_IN_BLOCK_K and N is a multiple of 512 * TILES_IN_BLOCK_N.  It is\n           the right-hand-side argument of the matrix multiplication.\n         TILES_IN_BLOCK_*: meta-parameters to control blocking dimensions\n     Returns:\n         result: the resulting output tensor of shape [M,N]\n     \"\"\"\n\n     # Verify that the lhsT and rhs have the same contraction dimension.\n     K, M = lhsT.shape\n     K_, N = rhs.shape\n     assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n     # Lookup the device matrix multiply dimensions.\n     TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n     TILE_K = nl.tile_size.pmax  # 128\n     TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n     # Compute the block dimensions.\n     BLOCK_M = TILE_M * TILES_IN_BLOCK_M\n     BLOCK_N = TILE_N * TILES_IN_BLOCK_N\n     BLOCK_K = TILE_K * TILES_IN_BLOCK_K\n\n     # The size has to be a multiple of the block size\n     assert M % BLOCK_M == 0, \\\n       f\"Expected M {M} to be divisible by {BLOCK_M} when there are {TILES_IN_BLOCK_M}\"\n     assert N % BLOCK_N == 0, \\\n       f\"Expected N {N} to be divisible by {BLOCK_N} when there are {TILES_IN_BLOCK_N}\"\n     assert K % BLOCK_K == 0, \\\n       f\"Expected K {K} to be divisible by {BLOCK_K} when there are {TILES_IN_BLOCK_K}\"\n\n     # Create a space for the result in HBM (not initialized)\n     result = nl.ndarray(shape=(M,N), dtype=nl.float32, buffer=nl.hbm)\n\n     # Compute the number of blocks in each dimension\n     NUM_BLOCK_M = M // BLOCK_M\n     NUM_BLOCK_N = N // BLOCK_N\n     NUM_BLOCK_K = K // BLOCK_K\n\n     # Blocking N dimension (the RHS free dimension)\n     for n in nl.affine_range(NUM_BLOCK_N):\n       # Create the initial result tiles in SBUF and initialize each tile to\n       # 0.0, since the final results will be accumulated here.\n       result_tmps = []\n       for m_idx in range(NUM_BLOCK_M):\n         block_m = 
[]\n         for bm_idx in range(TILES_IN_BLOCK_M):\n           block_n = []\n           for bn_idx in range(TILES_IN_BLOCK_N):\n             # Create the result tile (uninitialized)\n             tile = nl.ndarray(shape=(TILE_M, TILE_N),\n                               dtype=lhsT.dtype,\n                               buffer=nl.sbuf)\n             # Initialize the tile to 0.0\n             nisa.memset(dst=tile, value=0.0)\n             # Append the tile to block_n array.\n             block_n.append(tile)\n           # Append block_n array to block_m array.\n           block_m.append(block_n)\n         # Append block_m array into result_tmps.\n         result_tmps.append(block_m)\n\n       # Blocking K dimension (the contraction dimension)\n       # Use `sequential_range` because we do not want the compiler to\n       # change this loop by, for example, vectorizing it\n       for k in nl.sequential_range(NUM_BLOCK_K):\n         # Load tiles from rhs, setting the load tile to\n         # `TILE_K x BLOCK_N` to optimize DMA performance\n         rhs_tiles = []\n         for bk_r in range(TILES_IN_BLOCK_K):\n           # Allocate rhs_tile tensor, TILE_K x BLOCK_N\n           rhs_tile = nl.ndarray(shape=(TILE_K, BLOCK_N),\n                                 dtype=rhs.dtype,\n                                 buffer=nl.sbuf)\n           # Copy block tile from rhs to rhs_tile.\n           nisa.dma_copy(dst=rhs_tile[0:TILE_K, 0:BLOCK_N],\n                         src=rhs[(TILES_IN_BLOCK_K * k + bk_r) *\n                                 TILE_K:(TILES_IN_BLOCK_K * k + bk_r + 1) * TILE_K,\n                                 BLOCK_N * n:BLOCK_N * (n + 1)])\n           # Append rhs_tile to rhs_tiles.\n           rhs_tiles.append(rhs_tile)\n\n         # Blocking M dimension (the LHS free dimension)\n         for m in nl.affine_range(NUM_BLOCK_M):\n           # Load tiles from lhsT\n           lhsT_tiles = []\n           for bk_l in nl.affine_range(TILES_IN_BLOCK_K):\n             # Allocate lhsT_tile in SBUF (uninitialized)\n             lhsT_tile = nl.ndarray(shape=(TILE_K, BLOCK_M),\n                                    dtype=lhsT.dtype,\n                                    buffer=nl.sbuf)\n             # Copy block tile from lhsT to lhsT_tile\n             nisa.dma_copy(\n               dst=lhsT_tile[0:TILE_K, 0:BLOCK_M],\n               src=lhsT[(TILES_IN_BLOCK_K * k + bk_l) *\n                    TILE_K:(TILES_IN_BLOCK_K * k + bk_l + 1) * TILE_K,\n                    BLOCK_M * m:BLOCK_M * (m + 1)])\n             # Append lhsT_tile to lhsT_tiles.\n             lhsT_tiles.append(lhsT_tile)\n\n           # Do matmul with all tiles in the blocks\n           for bn in nl.affine_range(TILES_IN_BLOCK_N):\n             for bm in nl.affine_range(TILES_IN_BLOCK_M):\n               # Allocate result_tile in PSUM (uninitialized)\n               result_tile = nl.ndarray(shape=(TILE_M, TILE_N),\n                                        dtype=nl.float32,\n                                        buffer=nl.psum)\n               for bk in nl.affine_range(TILES_IN_BLOCK_K):\n                 # Perform matrix multiply on a tile.\n                 nisa.nc_matmul(\n                   dst=result_tile,\n                   stationary=lhsT_tiles[bk][0:TILE_K, bm * TILE_M:(bm + 1) * TILE_M],\n                   moving=rhs_tiles[bk][0:TILE_K, bn * TILE_N:(bn + 1) * TILE_N]\n                 )\n               # Accumulate the result into the result_tmps tile.\n               
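# Accumulating into the SBUF temporaries (rather than keeping everything in PSUM)\n               # is what makes blocking the contraction dimension possible; this step is the\n               # new VectorE work that shows up in the profile below.\n               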
nisa.tensor_tensor(dst=result_tmps[m][bm][bn],\n                                  data1=result_tmps[m][bm][bn],\n                                  data2=result_tile,\n                                  op=nl.add)\n\n       # Copying the result from SBUF to HBM\n       for m in nl.affine_range(NUM_BLOCK_M):\n         for bm in nl.affine_range(TILES_IN_BLOCK_M):\n           # coalesce result tiles for better DMA performance\n           result_packed = nl.ndarray(shape=(TILE_M, BLOCK_N),\n                                      dtype=nl.float32,\n                                      buffer=nl.sbuf)\n           for bn in nl.affine_range(TILES_IN_BLOCK_N):\n             nisa.tensor_copy(\n               dst=result_packed[0:TILE_M, bn * TILE_N:(bn + 1) * TILE_N],\n               src=result_tmps[m][bm][bn][0:TILE_M, 0:TILE_N])\n\n           # Copy packed result from SBUF to HBM.\n           nisa.dma_copy(dst=result[(TILES_IN_BLOCK_M * m + bm) *\n                                    TILE_M:(TILES_IN_BLOCK_M * m + bm + 1) * TILE_M,\n                                    BLOCK_N * n:BLOCK_N * (n + 1)],\n                         src=result_packed[0:TILE_M, 0:BLOCK_N])\n\n     return result\n\nThis version of the kernel is considerably more complicated, but the test driver you created for the simplest version of this kernel means you have a ready test. The sizes of matrices you chose in the original test were forward-looking in that they correspond to the tiling dimensions you selected. However, you expose these as additional arguments (unlike in the previous blocking), so a model calling this kernel can choose block sizes appropriate for the model. The test driver also gives us a new set of NEFF and NTFF files.\n\n.. image:: /nki/img/how-to/v5-full.png\n\nOther than the improved time, this seems similar to the other profile graphs, however you can see a slightly more complex pattern. This reflects the time to compute the full output tile and then copying the results out.\n\n.. image:: /nki/img/how-to/v5-zoom.png\n\nZooming in you can see the gap at the end of the set of matrix multiplies where the results are accumulated into the SBUF temporary results. Looking at the utilization of the DMA engines and TensorE you can see the DMA engines are now active only 21.54% of the time, while the TensorE is now active 99.50%, with the Vector Engine (VectorE) active 10.55% of the time, where it was previously unused.\n\n.. list-table::\n   :header-rows: 0\n   :widths: 50 50\n\n   * - .. image:: /nki/img/how-to/v5-dma.png\n          :width: 100%\n     - .. image:: /nki/img/how-to/v5-pe.png\n          :width: 100%\n   * -\n     - .. image:: /nki/img/how-to/v5-vec.png\n          :width: 100%\n\nThis final version of the matrix multiply kernel is no longer memory-bound. 
Instead, as you should expect, it is compute-bound, with the TensorE and VectorE being the limiting factors on the speed of the kernel.\n\nSummary\n-------\n\nWhile the matrix multiply example kernel is a relatively simple one that primarily exercises just two of the engines in the NeuronDevice architecture, the DMA engines and the TensorE, it demonstrates a general workflow: start with a simple, known-correct version of a kernel and a test case that provides a representative workload, then use your understanding of the NeuronDevice architecture, the Neuron Profiler, and the kernel you are trying to implement to improve the kernel's performance.\n\nOnce the kernel is ready, you use it to replace the section of the model it is intended to implement. The test driver can continue to be used as a unit test that ensures correct operation and allows you to add regression tests, both for the accuracy and the performance of the kernel. It can also provide a starting point for porting to other generations of the NeuronDevice architecture.\n\nRelated concepts\n----------------\n\n* :doc:`Tutorial: Matrix multiplication </nki/guides/tutorials/matrix_multiplication>`\n* :doc:`Profiling NKI kernels with Neuron Explorer </nki/guides/use-neuron-profile>`\n\nFurther reading\n---------------\n\n* :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`\n* :doc:`NeuronDevice Architecture Guide for NKI </nki/guides/architecture/trainium_inferentia2_arch>`\n* :doc:`NKI Performance Guide </nki/deep-dives/nki_perf_guide>`\n"
  },
  {
    "path": "nki/guides/tutorials/matrix_multiplication.rst",
    "content": ".. meta::\n    :description: Learn how to implement and optimize matrix multiplication kernels using NKI on AWS Neuron hardware, from basic implementation to advanced optimization techniques.\n    :keywords: Matrix Multiplication, NKI, Neuron, Optimization, TensorE\n    :date-modified: 12/01/2025\n\n.. _nki-matrix-multiplication:\n\nMatrix multiplication\n=====================\n\nIn this tutorial, we will start with a simple NKI matrix multiplication kernel\nand optimize it step by step. In doing so, we learn about:\n\n-  The NKI syntax and programming model.\n-  Layout, tiling, and memory management considerations when performing\n   matrix multiplication in NKI.\n\nBasic compute kernel\n----------------------\n\n\n.. _nki-fig-mm-view:\n\n.. figure:: ../../img/matrix-multiplication-views.png\n   :align: center\n\n   MxKxN Matrix Multiplication Visualization\n\n:numref:`Fig. %s <nki-fig-mm-view>` illustrates how a simple matrix\nmultiplication: ``lhs [M, K] * rhs [K, N] = output [M, N]`` would be mapped to the\nTensor Engine (TensorE) and SRAMs from its original mathematical view. Note, the PSUM\npartition dimension is rotated 90 degrees from SBUF partition dimension solely for layout visualization.\nThe copy preserves the ``output`` tile layout from PSUM to SBUF, by copying data from each PSUM partition\nto the corresponding SBUF partition.\n\nThe NKI example below implements a compute kernel for a single-tile matrix\nmultiplication. It computes a ``64(M) x 128(K) x 512 (N)`` matrix\nmultiplication operation.\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_16\n\nIn this example, we define the NKI kernel as ``nki_matmul_basic_:``\n\n1. We define indices to access the LHS and RHS input tensors.\n2. To adhere to NKI's layout considerations,\n   we map the contraction axis of both LHS and RHS to the P-dimension,\n   which means we load LHS in transposed form.\n3. To adhere to NKI's tile size considerations,\n   we limit the matmul instruction arguments to tiles of up to\n   ``[128,128]`` for LHS, and ``[128,512]`` for RHS.\n4. Using the ``nisa.dma_copy`` operation, we load the inputs from HBM tensors\n   to SBUF tiles.\n5. We then use the ``nisa.nc_matmul`` operation to perform the matrix\n   multiplication. Note that we set the LHS argument is transposed. Also note that the *64x128*\n   dimension here actually under-utilizes the TensorE, but it helps to\n   distinguish the M, K and N dimensions for education purposes in this first\n   code example.\n6. ``nisa.nc_matmul`` always writes its result to PSUM, and since\n   ``nisa.dma_copy`` only moves data from SBUF to HBM, we copy the\n   multiplication result from PSUM back to SBUF using ``nisa.tensor_copy``.\n\nWe can then execute the kernel and verify correctness against the torch\nimplementation as follows. Note that we use `torch.allclose` to tolerate\nnumerical error inherent to floating-point arithmetic.\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_torch.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_17\n\n\n.. _tutorial_matmul_tiling:\n\nTiling matrix multiplications\n-------------------------------\n\n.. TODO\n  Stretch goal (not urgent): use nki masking to support non-multiples\n\nSo far, we've limited our matrix multiplication to the tile sizes\nallowed by NKI's tile size and layout constraints. Next, we'll see how\nto handle larger matrix multiplications. 
Let's start with a pseudo-code\nfor tiling an ``[M,K] @ [K,N]`` matrix-multiplication.\nNote that we assume the left-hand-side matrix (``[M,K]``) is already transposed\nto LHS_T (``[K,M]``) for optimal performance of the underlying TensorE.\n\n::\n\n   # LHS_T: left-hand-side matmul argument (shape [K,M])\n   # RHS: right-hand-side matmul argument (shape [K,N])\n   # RES: matmul result (shape [M,N])\n\n   # Tile LHS_T free dimension\n   for m in range(0, M, 128):\n     # Tile RHS free dimension\n     for n in range(0, N, 512):\n       # Zero-out the accumulator buffer\n       accum = zeros((128, 512))\n       # Tile contraction dimension\n       for k in range(0, K, 128):\n         lhsT_tile = LHS_T[m : m+128, k : k+128]\n         rhs_tile = RHS[k : k+128, n : n+512]\n         accum += dot(lhsT_tile, rhs_tile)\n       RES[m : m+128, n : n+512] = accum\n\nThis form of tiling can be achieved in NKI as follows:\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_18\n\nA few notes about the above code example:\n\n.. code-block::\n\n   psum_buf = nl.ndarray(..., buffer=nl.psum)\n\n   # condition: an affine range loop\n   for i in nl.affine_range(N):\n      # condition 3: add matmul results from TensorEngine\n      nisa.nc_matmul(psum_buf, stationary_tile, moving_tile) # or nl.matmul\n\nThe use of :ref:`PSUM accumulation architecture feature <arch_sec_accumulation_psum>` is critical to\nachieve good performance out of TensorEngine when\nthe contraction dimension of the matmul is greater than 128.\n\nThe :doc:`nl.affine_range <../../api/generated/nki.language.affine_range>` is used\nto define loop-level iterators, which is the recommended iterator type when the\nloop does not have loop-carried dependency (Note, associative reductions are\nnot considered loop carried dependencies in this context). The first\n``nisa.nc_matmul`` call overwrites the contents of the ``psum_buf``, with\nsubsequent calls to the ``nisa.nc_matmul`` instruction accumulating results\ninto the ``psum_buf``.\n\nThere is an alternative way to implement this tiled matrix multiplication kernel\nusing the SPMD programming model.  We can use the SPMD model to launch ``(M/128)\nx (N/512)`` instances of the kernel to complete the innermost loop.\n\n\nOptimization 1: Removing Redundant Loads\n----------------------------------------\n\n\nCurrently, every ``nisa.nc_matmul`` is accompanied with two ``nisa.dma_copy`` calls in the\ninner loop, both of which move data from HBM to SBUF. Let's introduce a metric,\narithmetic intensity, to help understand why this is problematic. The arithmetic\nintensity of a workload is defined as the number of computation operations\nperformed per byte of data accessed from HBM on average. The reason why we do\nnot consider data accessed from SBUF in this metric is because the SBUF\nbandwidth (~20x higher than HBM) is high enough to sustain the peak computation\nthroughput in TensorE.\n\n.. _nki-fig-roofline:\n\n.. figure:: ../../img/roofline.png\n   :align: center\n\n   Roofline Model: The Relationship Between Arithmetic Intensity and Performance\n\n:numref:`Fig. %s <nki-fig-roofline>`  shows the roofline model, which models the\nrelationship between arithmetic intensity of a workload and its achievable\nperformance on a given computing platform. To saturate TensorE in a\nNeuronCore-v2, the arithmetic intensity threshold of a workload is 222\nFlops/Byte for ``bfloat16`` data type.  
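\n\nThe numbers discussed in the next paragraph can be reproduced with a rough back-of-the-envelope calculation. This is only a sketch, assuming the ``128x128`` (stationary) and ``128x512`` (moving) tile sizes used by ``nki_matmul_tiled_`` and 2-byte ``bfloat16`` elements:\n\n.. code-block:: python\n\n   TILE_M, TILE_K, TILE_N = 128, 128, 512\n\n   # Bytes read from HBM per inner-loop iteration (one lhsT tile + one rhs tile)\n   bytes_from_hbm = (TILE_K * TILE_M + TILE_K * TILE_N) * 2   # 163,840 bytes (~160 KB)\n\n   # Multiply-accumulate work performed by the single nisa.nc_matmul call\n   flops = 2 * TILE_M * TILE_K * TILE_N                       # ~16 MFlops\n\n   arithmetic_intensity = flops / bytes_from_hbm              # ~102 Flops/Byte\n\n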
Inside the inner loop of\n``nki_matmul_tiled_``, accessing ``lhsT_tile`` and ``rhs_tile`` requires\n160 KB of data read from HBM, while the ``nisa.nc_matmul`` call involves 16 MFlops.\nThis leads to an arithmetic intensity of 102, which is significantly lower than\nthe saturation threshold of 222. Therefore, ``nki_matmul_tiled_``\noperates in the memory-bound region of the roofline model and under-utilizes\nTensorE.  To make the best use of TensorE, we need to improve the arithmetic\nintensity of the matmul kernel.\n\nWith NKI, programmers can control when and how to load data from HBM into SBUF\nand also perform computation. We will demonstrate in the upcoming steps how to\nincrease the arithmetic intensity of the matmul kernel using NKI, thereby\nmaximizing the utilization of TensorE.\n\nFirst, we notice that in ``nki_matmul_tiled_``, the same tiles from the\n``lhsT`` and ``rhs`` matrices are loaded more than once across different\niterations of the inner loop. The following example reduces these redundant\nloads by hoisting them out of the innermost loop.\n\n.. _nki-fig-mm-after-load-hoisting:\n\n.. figure:: ../../img/mm-memory-pattern-after-load-hoisting.png\n   :align: center\n\n   Memory Pattern After Hoisting Loads Out of the Innermost Loop\n\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_19\n\n\nOptimization 2: Blocking M and N Dimension\n-----------------------------------------------------------\n\nWhile hoisting the load out of the innermost loop eliminates some redundant\nloads, we can push this idea further to increase arithmetic intensity.\n\nEach time we load K elements from the MxK matrix stored in HBM, Optimization 1 allows us\nto utilize those same elements N different times.\nHowever, the SBUF capacity is much larger than the ``K`` elements currently cached by Optimization 1.\nWe can load multiple sets of K elements from the MxK matrix at a time, resulting in higher data reuse.\nThis will increase arithmetic intensity.\n\n\nBlock size must balance two constraints: it should be large enough to saturate arithmetic intensity, yet\nsmall enough for all live blocks to remain within SBUF capacity and avoid spilling, which would cause a performance regression.\n\n\n:numref:`Fig. %s <nki-fig-mm-after-blocking-free>` below visualizes the memory pattern\nafter blocking both free dimensions.\n\n.. _nki-fig-mm-after-blocking-free:\n\n.. figure:: ../../img/mm-memory-pattern-after-blocking-free.png\n   :align: center\n\n   Memory Pattern After Blocking Free Dimensions\n\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_20\n\nOptimization 3: Blocking M, N and K Dimension\n----------------------------------------------------------------\n\nBlocking only the free dimensions while loading the whole partition dimension (K) sets an upper\nlimit on the block size (M and N) due to limited SBUF capacity.\n\nMatrix multiply with shapes ``[M, K] @ [K, N] = [M, N]`` requires ``K`` multiplies and ``K`` additions\n(or ``K-1`` for accumulation) for each element in the resulting ``[M, N]`` grid, totaling ``2*K*M*N`` FLOPS.\nIt has to access ``M*K + K*N + M*N`` elements, resulting in arithmetic intensity ``2*M*N*K/(2*(M*K + K*N + M*N))``\nfor 2-byte data types like FP16 or BF16. Since the full K has to fit in memory for Optimization 2,\nit will limit the M and N size for a block. 
Arithmetic intensity will be lower if any of the M, N or K is\nmuch smaller than the others.\n\nBlocking partition dimension also results in calculating partial matrix multiplies in each block that have to\nbe accumulated, resulting in additional HBM traffic if not handled carefully.\n\n.. _nki-fig-mm-after-blocking-all:\n\n.. figure:: ../../img/mm-memory-pattern-after-blocking-all.png\n   :align: center\n\n   Memory Pattern After Blocking All Dimensions\n\nWith the blocking configuration in the code (16 tiles or 2048 numbers in the\n``M`` dimension; 2 tiles or 1024 numbers in the ``N`` dimension; and 8 tiles or\n1024 numbers in the ``K`` dimension), this computation has an arithmetic\nintensity of 683 Flops/Byte (2048*1024*1024/(2048*1024 + 1024*1024)). This is\ncertainly above the threshold of 222.\n\nAt the same time, this blocking configuration keeps all the tensors within the\nSBUF limit as much as possible.  With all matrices in BF16 data type, the\n``lhsT_tiles`` requires 4MB and ``rhs_tiles`` requires 2MB SBUF memory. The\n``result_m_tiles`` requires ``4 * NUM_BLOCK_M`` MB SBUF memory, where\n``NUM_BLOCK_M`` is ``M // 2048``. Thus, as long as ``M <= 8192``, the required\nSBUF memory is under the 24 MB budget (4 + 2 + 4 * (8192 // 2048) == 22 MB).\nWhen the ``M`` dimension becomes bigger, spilling and reloading of the\n``result_m_tiles`` will happen, but because the frequency is relatively low, the\ncomputation can still be sufficient.\nBlock size must balance two constraints: it should be large enough to saturate arithmetic intensity, yet\nsmall enough for all live blocks to remain within SBUF capacity to avoid spilling, causing performance regression.\n\nWe also use a contiguous N-block layout per M-tile to eliminate coalescing. Each\nresult M-tile is allocated with shape ``(TILE_M, TILES_IN_BLOCK_N, TILE_N)``\ninstead of separate ``(TILE_M, TILE_N)`` tiles. Because the N-block tiles are\nalready contiguous in the free dimension, we can later reshape to\n``(TILE_M, BLOCK_N)`` and issue a single large ``nisa.dma_copy`` for SBUF to HBM\neviction without needing a ``nisa.tensor_copy()`` coalescing step to remove work\nfrom VectorE.\nFurthermore, by splitting the N-block into individual M-tiles, the compiler can\npipeline ``memset(0)``, matmul, ``tensor_tensor`` accumulation, and SBUF to HBM\nDMA eviction on M-tile granularity, overlapping all stages.\n\n.. nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_21\n\nTesting Correctness and Benchmarking\n------------------------------------\n\nTo test the correctness of the kernels, we compare the result with the\n``torch.matmul`` with ``torch.allclose``.\n\n.. 
nki_example:: ../../examples/matrix_multiplication/matrix_multiplication_torch.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_22\n\nOutput from the test:\n\n::\n\n   Checking correctness of nki_matmul_tiled\n   NKI and Torch match\n   Checking correctness of nki_matmul_hoist_load\n   NKI and Torch match\n   Checking correctness of nki_matmul_block_free_dimension\n   NKI and Torch match\n   Checking correctness of nki_matmul_fully_optimized\n   NKI and Torch match\n\nDownload All Source Code\n--------------------------\n\nClick the links to download source code of the kernels and the testing code\ndiscussed in this tutorial.\n\n* All matrix multiplication NKI kernels: :download:`matrix_multiplication_nki_kernels.py <../../examples/matrix_multiplication/matrix_multiplication_nki_kernels.py>`\n* PyTorch implementation: :download:`matrix_multiplication_torch.py <../../examples/matrix_multiplication/matrix_multiplication_torch.py>`\n\nYou can also view the source code in the GitHub repository `nki_samples <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/matrix_multiplication/>`_\n\nExample usage of the scripts:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nRun benchmarking of different NKI kernels:\n\n.. code-block::\n\n   python3 matrix_multiplication_nki_kernels.py\n\nRun PyTorch implementation to validate the NKI results against the PyTorch\nimplementation:\n\n.. code-block::\n\n   python3 matrix_multiplication_torch.py\n"
  },
  {
    "path": "nki/guides/tutorials/transpose2d.rst",
    "content": ".. _nki-transpose2d:\n\nTranspose2D\n===========\n\nIn this tutorial, we transpose a tensor along two of its axes using NKI.\nIn doing so, we learn about:\n\n-  The NKI syntax and programming model.\n-  Multi-dimensional memory address patterns in NKI.\n\nAs background, there are two main types of transposition in NKI:\n\n1. Transposition between the partition-dimension axis and one of the\n   free-dimension axes, which is achieved via the\n   :literal:`nki.isa.nc_transpose` instruction.\n2. Transposition between two axes on the free-dimension, which is achieved\n   via a ``nki.language.copy`` instruction, with indexing manipulation\n   in the free axis to re-arrange the data.\n\n\nIn this example, we'll focus on the second case: consider a\nthree-dimensional input tensor ``[P, F1, F2]``, where the ``P`` axis is mapped\nto the different SBUF partitions and the ``F1`` and ``F2`` axes are\nflattened and placed in each partition, with ``F1`` being the major\ndimension. Our goal in this example is to transpose the ``F1`` and\n``F2`` axes with a parallel dimension ``P``,\nto re-arrange the data within each partition. :ref:`Figure <nki-fig-transpose>`\nbelow illustrates the input and output tensor layouts.\n\n.. _nki-fig-transpose:\n\n.. figure:: ../../img/pm-index-2.png\n   :align: center\n   :width: 60%\n\n   Tensor F1:F2 Transpose\n\nPyTorch\n-------\n\nCompute kernel\n^^^^^^^^^^^^^^\n\n.. nki_example:: ../../examples/transpose2d/transpose2d_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_33\n\nLaunching kernel and testing correctness\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo execute the kernel, we prepare tensors ``a`` and call ``tensor_transpose2D_kernel_``:\n\n\n.. nki_example:: ../../examples/transpose2d/transpose2d_torch.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_34\n\n\nJAX\n---\n\nCompute kernel\n^^^^^^^^^^^^^^\n\nWe can reuse the same NKI compute kernel defined for PyTorch above.\n\n.. nki_example:: ../../examples/transpose2d/transpose2d_nki_kernels.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_33\n\n\nLaunching kernel and testing correctness\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo execute the kernel, we prepare array ``a`` and call ``tensor_transpose2D_kernel_``:\n\n.. nki_example:: ../../examples/transpose2d/transpose2d_jax.py\n   :language: python\n   :linenos:\n   :marker: NKI_EXAMPLE_36\n\n.. note::\n   We pass ``shape2D`` as kwargs to pass the shape as a compile-time constant\n   to the kernel function.\n\n.. 
_tutorial_transpose2d_code:\n\nDownload All Source Code\n--------------------------\n\nClick the links to download source code of the kernels and the testing code\ndiscussed in this tutorial.\n\n* NKI baremetal implementation: :download:`transpose2d_nki_kernels.py <../../examples/transpose2d/transpose2d_nki_kernels.py>`\n* PyTorch implementation: :download:`transpose2d_torch.py <../../examples/transpose2d/transpose2d_torch.py>`\n    * You must also download :download:`transpose2d_nki_kernels.py <../../examples/transpose2d/transpose2d_nki_kernels.py>`\n      into the same folder to run this PyTorch script.\n* JAX implementation: :download:`transpose2d_jax.py <../../examples/transpose2d/transpose2d_jax.py>`\n    * You must also download :download:`transpose2d_nki_kernels.py <../../examples/transpose2d/transpose2d_nki_kernels.py>`\n      into the same folder to run this JAX script.\n\nYou can also view the source code in the GitHub repository `nki_samples <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/tutorials/transpose2d/>`_\n\nExample usage of the scripts:\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nRun NKI baremetal implementation:\n\n.. code-block::\n\n   python3 transpose2d_nki_kernels.py\n\nRun PyTorch implementation:\n\n.. code-block::\n\n   python3 transpose2d_torch.py\n\nRun JAX implementation:\n\n.. code-block::\n\n   python3 transpose2d_jax.py\n"
  },
  {
    "path": "nki/guides/use-neuron-profile.rst",
    "content": ".. meta::\n    :description: Learn how to profile Neuron Kernel Interface (NKI) kernels using Neuron Explorer to analyze hardware-level performance characteristics on Trainium and Inferentia devices.\n    :date-modified: 12/02/2025\n\n.. _use-neuron-profile:\n\nProfile a NKI Kernel\n====================\n\nLearn how to profile Neuron Kernel Interface (NKI) kernels using Neuron Explorer to analyze hardware-level performance characteristics on Trainium and Inferentia devices. This comprehensive guide covers two profiling methods: using the ``neuron-explorer capture`` command-line tool. You'll discover how to generate NEFF and NTFF files, identify performance bottlenecks, optimize kernel execution, and leverage the interactive web-based Neuron Profile UI to visualize execution traces with source code integration for efficient NKI kernel development and optimization.\n\nInstall Neuron Explorer\n------------------------\n\nEnsure that you have the latest version of the ``aws-neuronx-tools`` package installed as Neuron Explorer comes with this package. The ``aws-neuronx-tools`` package is pre-installed on Neuron DLAMIs.\n\n* For detailed installation instructions, see: :ref:`How to Get Started with Neuron Explorer <new-neuron-profiler-setup>`.\n\nProfile a NKI Kernel\n--------------------\n\nProfiling NKI (Neuron Kernel Interface) kernels helps you understand hardware level performance characteristics of your kernels running on AWS Trainium and Inferentia devices. When you write or optimize custom NKI kernels, profiling allows you to:\n\n* **Identify bottlenecks**: Determine if your kernel is compute-bound, memory-bound, or limited by data movement.\n* **Optimize performance**: Analyze kernel-level execution time, investigate compute engine utilization, look for opportunities to implement operator fusion to fine-tune performance.\n* **Compare implementations**: Benchmark different kernel implementations or configurations to pick the most efficient kernel.\n\nYou can profile NKI kernels using several approaches. In this guide, you'll learn two primary methods for profiling NKI kernels.\n\nHow to profile using neuron-explorer capture\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo profile an NKI kernel using neuron-explorer capture, follow these three steps:\n\n1. Set the environment variable ``NEURON_FRAMEWORK_DEBUG=1`` to instruct the compiler to save the NEFF (Neuron Executable File Format) file.\n2. Execute the NKI kernel to generate the NEFF file.\n3. Run ``neuron-explorer capture`` to create an Neuron Trace File Format (NTFF) file for performance analysis.\n\nEach of these steps is explained in detail below.\n\nStep 1: Set Environment Variables\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWe will profile a 3-layer MLP model that fuses matrix multiplications with ReLU activation functions and uses a NKI matrix multiplication kernel. The rest of this tutorial will use a performance profile generated from this example. Here is the implementation of ``mlp_with_mm_kernel.py``. 
Save this file before moving on to the next step::\n\n    \"\"\"\n    Example 3-layer MLP with matrix multiplication kernel to demonstrate Neuron Profile.\n    \"\"\"\n\n    import torch\n    import torch.nn as nn\n    import torch.nn.functional as F\n    import torch_neuronx\n    import nki\n    import nki.isa as nisa\n    import nki.language as nl\n    import os\n\n    os.environ[\"NEURON_LOGICAL_NC_CONFIG\"] = \"1\"\n\n    os.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\n    os.environ[\"XLA_IR_DEBUG\"] = \"1\"       # Preserve source-level IR names in the compiled graph for profiler source mapping\n    os.environ[\"XLA_HLO_DEBUG\"] = \"1\"      # Preserve HLO operation names and metadata for profiler attribution\n\n    @nki.jit\n    def nki_matmul(\n        lhsT,\n        rhs,\n        # Meta-parameters\n        TILES_IN_BLOCK_M=16,\n        TILES_IN_BLOCK_N=2,\n        TILES_IN_BLOCK_K=8,\n    ):\n        \"\"\"NKI kernel to compute a large matrix multiplication efficiently by\n        blocking all dimensions and doing layout optimization.\n\n        Args:\n            lhsT: an input tensor of shape [K,M], where K is a multiple of 128 *\n                TILES_IN_BLOCK_K and M is a multiple of 128 * TILES_IN_BLOCK_M.  It is the\n                left-hand-side argument of the matrix multiplication, delivered transposed\n                for optimal performance.\n            rhs: an input tensor of shape [K,N],  where K is a multiple of 128 *\n                TILES_IN_BLOCK_K and N is a multiple of 512 * TILES_IN_BLOCK_N.  It is\n                the right-hand-side argument of the matrix multiplication.\n            TILES_IN_BLOCK_*: meta parameters to control blocking dimensions\n        Returns:\n            result: the resulting output tensor of shape [M,N]\n        \"\"\"\n\n        # Verify that the lhsT and rhs have the same contraction dimension.\n        K, M = lhsT.shape\n        K_, N = rhs.shape\n        assert K == K_, \"lhsT and rhs must have the same contraction dimension\"\n\n        # Lookup the device matrix multiply dimensions.\n        TILE_M = nl.tile_size.gemm_stationary_fmax  # 128\n        TILE_K = nl.tile_size.pmax  # 128\n        TILE_N = nl.tile_size.gemm_moving_fmax  # 512\n\n        # Compute the block dimensions.\n        BLOCK_M = TILE_M * TILES_IN_BLOCK_M\n        BLOCK_N = TILE_N * TILES_IN_BLOCK_N\n        BLOCK_K = TILE_K * TILES_IN_BLOCK_K\n\n        # Verify the size is a multiple of block size\n        assert M % BLOCK_M == 0, \\\n            f\"Expected M {M} to be divisible by {BLOCK_M} when there are {TILES_IN_BLOCK_M}\"\n        assert N % BLOCK_N == 0, \\\n            f\"Expected N {N} to be divisible by {BLOCK_N} when there are {TILES_IN_BLOCK_N}\"\n        assert K % BLOCK_K == 0, \\\n            f\"Expected K {K} to be divisible by {BLOCK_K} when there are {TILES_IN_BLOCK_K}\"\n\n        # Create a space for the result in HBM (not initialized)\n        result = nl.ndarray((M, N), dtype=lhsT.dtype, buffer=nl.shared_hbm)\n\n        # Compute the number of blocks in each dimension\n        NUM_BLOCK_M = M // BLOCK_M\n        NUM_BLOCK_N = N // BLOCK_N\n        NUM_BLOCK_K = K // BLOCK_K\n\n        # Blocking N dimension (the RHS free dimension)\n        for n in nl.affine_range(NUM_BLOCK_N):\n            n_start = n * BLOCK_N\n            n_end = n_start + BLOCK_N\n\n            # Allocate and initialize result matrix N-block to 0.0.\n            #\n            # Each result M-tile stores its N-block contiguous on the free-dim\n            # with 
shape (TILE_M, TILES_IN_BLOCK_N, TILE_N). This layout allows\n            # reshaping to (TILE_M, BLOCK_N) for SBUF->HBM DMA to operate on a\n            # large payload, enabling good DMA efficiency.\n            #\n            # We split the N-block into individual M-tiles so the compiler can\n            # pipeline memset(0), matmul, tensor_tensor, and SBUF->HBM DMA\n            # on M-tile granularity.\n            result_m_tiles = []\n            for m in nl.affine_range(NUM_BLOCK_M):\n                for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n                    result_m_tile = nl.ndarray(\n                        shape=(TILE_M, TILES_IN_BLOCK_N, TILE_N),\n                        dtype=result.dtype,\n                        buffer=nl.sbuf,\n                    )\n                    nisa.memset(dst=result_m_tile, value=0.0)\n                    result_m_tiles.append(result_m_tile)\n\n            # Blocking K dimension (the contraction dimension)\n            for k in nl.sequential_range(NUM_BLOCK_K):\n                k_block_tile_start = k * TILES_IN_BLOCK_K\n\n                # Load tiles from RHS\n                # Load tiles one N-block at a time for good DMA efficiency.\n                rhs_tiles = nl.ndarray(\n                    shape=(TILE_K, TILES_IN_BLOCK_K, BLOCK_N),\n                    dtype=rhs.dtype,\n                    buffer=nl.sbuf,\n                )\n                for k_tile in range(TILES_IN_BLOCK_K):\n                    k_tile_start = (k_block_tile_start + k_tile) * TILE_K\n                    k_tile_end = k_tile_start + TILE_K\n                    nisa.dma_copy(\n                        dst=rhs_tiles[0:TILE_K, k_tile, 0:BLOCK_N],\n                        src=rhs[k_tile_start:k_tile_end, n_start:n_end],\n                    )\n\n                # Blocking M dimension (the LHS free dimension)\n                for m in nl.affine_range(NUM_BLOCK_M):\n                    # Loading tiles from lhsT\n                    # Load tiles one M-block at a time for good DMA efficiency.\n                    lhsT_tiles = nl.ndarray(\n                        shape=(TILE_K, TILES_IN_BLOCK_K, BLOCK_M),\n                        dtype=lhsT.dtype,\n                        buffer=nl.sbuf,\n                    )\n                    m_start = m * BLOCK_M\n                    m_end = m_start + BLOCK_M\n                    for k_tile in nl.affine_range(TILES_IN_BLOCK_K):\n                        k_tile_start = (k_block_tile_start + k_tile) * TILE_K\n                        k_tile_end = k_tile_start + TILE_K\n                        nisa.dma_copy(\n                            dst=lhsT_tiles[0:TILE_K, k_tile, 0:BLOCK_M],\n                            src=lhsT[k_tile_start:k_tile_end, m_start:m_end],\n                        )\n\n                    # Do matmul with all tiles in the blocks\n                    m_block_tile_start = m * TILES_IN_BLOCK_M\n                    for n_tile in nl.affine_range(TILES_IN_BLOCK_N):\n                        for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n                            result_tile = nl.ndarray(\n                                shape=(TILE_M, TILE_N), dtype=nl.float32, buffer=nl.psum\n                            )\n                            for k_tile in nl.affine_range(TILES_IN_BLOCK_K):\n                                m_tile_start = m_tile * TILE_M\n                                m_tile_end = m_tile_start + TILE_M\n                                n_tile_start = n_tile * TILE_N\n                                n_tile_end = 
n_tile_start + TILE_N\n                                nisa.nc_matmul(\n                                    dst=result_tile,\n                                    stationary=lhsT_tiles[0:TILE_K, k_tile, m_tile_start:m_tile_end],\n                                    moving=rhs_tiles[0:TILE_K, k_tile, n_tile_start:n_tile_end],\n                                )\n\n                            # Evict from PSUM to SBUF while accumulating into result M-tile.\n                            m_tile_idx = m_block_tile_start + m_tile\n                            result_m_tile = result_m_tiles[m_tile_idx]\n                            nisa.tensor_tensor(\n                                dst=result_m_tile[0:TILE_M, n_tile, 0:TILE_N],\n                                data1=result_m_tile[0:TILE_M, n_tile, 0:TILE_N],\n                                data2=result_tile,\n                                op=nl.add,\n                            )\n\n            # Evict the result M-tiles from SBUF to HBM.\n            # Copy on N-blocks granularity for good DMA efficiency.\n            for m in nl.affine_range(NUM_BLOCK_M):\n                m_block_tile_start = m * TILES_IN_BLOCK_M\n                for m_tile in nl.affine_range(TILES_IN_BLOCK_M):\n                    m_tile_idx = m_block_tile_start + m_tile\n                    result_m_tile = result_m_tiles[m_tile_idx]\n                    result_m_tile_block = result_m_tile.reshape((TILE_M, BLOCK_N))\n\n                    m_tile_start = m_tile_idx * TILE_M\n                    m_tile_end = m_tile_start + TILE_M\n                    nisa.dma_copy(\n                        dst=result[m_tile_start:m_tile_end, n_start:n_end],\n                        src=result_m_tile_block[0:TILE_M, 0:BLOCK_N],\n                    )\n\n        return result\n\n\n    class NKILinear(nn.Module):\n        def __init__(self, in_features, out_features):\n            super(NKILinear, self).__init__()\n            self.weight = nn.Parameter(torch.randn(out_features, in_features))\n            self.bias = nn.Parameter(torch.randn(out_features))\n\n        def forward(self, x):\n            weight_T = self.weight.t()\n            x_T = x.t()\n            output = nki_matmul(x_T, weight_T)\n            return output + self.bias\n\n\n    class MLP(nn.Module):\n        def __init__(self):\n            super(MLP, self).__init__()\n            self.fc1 = NKILinear(2048, 2048)\n            self.fc2 = NKILinear(2048, 1024)\n            self.fc3 = NKILinear(1024, 1024)\n\n        def forward(self, x):\n            x = F.relu(self.fc1(x))\n            x = F.relu(self.fc2(x))\n            x = self.fc3(x)\n            return F.log_softmax(x, dim=1)\n\n\n    def main():\n        torch.manual_seed(0)\n\n        model = MLP()\n        train_x = torch.randn(2048, 2048)\n\n        # Use torch_neuronx.trace to compile the model and generate the NEFF\n        traced_model = torch_neuronx.trace(model, train_x, compiler_args=\"--lnc=1\", compiler_workdir=\"./compiler_workdir\")\n\n        output = traced_model(train_x)\n        print(f\"Output tensor: {output}\")\n\n\n    if __name__ == \"__main__\":\n        main()\n\nAs you can see, at the very top we have added the following environment variables::\n\n    os.environ[\"NEURON_FRAMEWORK_DEBUG\"] = \"1\"\n    os.environ[\"XLA_IR_DEBUG\"] = \"1\"\n    os.environ[\"XLA_HLO_DEBUG\"] = \"1\"\n\nThese environment variables serve the following purposes:\n\n* ``NEURON_FRAMEWORK_DEBUG=1``: Enables Neuron debug output. 
This triggers the Neuron compiler to save the Neuron Executable File Format (NEFF) artifact to the current directory after compilation of your NKI kernel. The NEFF contains all hardware instructions required to execute your NKI kernel on a NeuronDevice, as well as metadata and debug info needed for profiling.\n* ``XLA_IR_DEBUG=1``: Preserves the mapping between high-level framework operations (e.g., PyTorch operators) and the intermediate representation (IR) passed to the compiler. This enables source code linking from device instructions back to framework-level code in the profiler.\n* ``XLA_HLO_DEBUG=1``: Preserves the mapping between the HLO (High Level Operation) graph and the original framework operations. This enables the profiler to display descriptive operator names and stack frame information, making it easier to identify which part of your model corresponds to each device instruction.\n\nStep 2: Compile Your NKI Kernel\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nCompile your NKI kernel to create a NEFF in your current directory::\n\n    $ python3 mlp_with_mm_kernel.py\n\n.. note:: The ``compiler_workdir`` argument to ``torch_neuronx.trace`` specifies the directory where the compiler saves artifacts, including the NEFF file. Look for your NEFF file inside the ``./compiler_workdir`` directory, which will be named ``graph.neff``.\n\nStep 3: Profile the Generated NEFF\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe last step is profiling the generated NEFF. This step executes the NEFF on the NeuronDevice and records a raw execution trace into a NTFF artifact::\n\n    $ neuron-explorer capture -n ./compiler_workdir/graph.neff -s profile.ntff --profile-nth-exec=2 --enable-dge-notifs\n\nThis will save your NTFF profile to ``profile_exec_2.ntff``.\n\n.. important::\n\n    The ``--profile-nth-exec=2`` option will profile your NEFF twice on the NeuronDevice and output a NTFF profile for the second iteration. This is recommended to avoid one-time warmup delays which can be seen in the first iteration of execution.\n\n    The ``--enable-dge-notifs`` option enables the capture of DGE DMA events but has known issues where it may overflow the status notification queue and cause execution timeouts when there are many DGE instructions.\n\nView the Neuron Explorer UI\n----------------------------\n\nThis section assumes you've completed the previous step and have already generated both the NEFF and NTFF files, and downloaded them on your local machine.\n\nNeuron Explorer includes an interactive, web-based UI for exploring execution traces in detail. In this section, we'll open the Neuron Explorer UI to examine NKI-specific profiling information. These details can be found in multiple areas of the interface — including instruction hover tooltips, instruction click panels, search results, and box select results. 
For a comprehensive overview of all available viewers, see the :doc:`Neuron Explorer documentation </tools/neuron-explorer/index>`.\n\nTo view the Neuron Profile web UI, execute the view command, replacing ``<workspace>`` with a path to a folder to store your profiling artifacts::\n\n    $ neuron-explorer view --data-path ./<workspace>\n\n``<workspace>`` is the path that ``neuron-explorer`` will use for storing and managing profiles.\n\nThe above command should print a URL that you can click to open the web UI::\n\n    View a list of profiles at http://localhost:3001/\n\nPort Forwarding for Remote Instances\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf ``neuron-explorer view`` is run on a remote instance, you may need to use port forwarding to access the web UI. By default, neuron-explorer creates a web server on port 3001 and the API server on port 3002. To connect from the browser on your local computer, you will need to establish an SSH tunnel to both ports.\n\nFor example::\n\n    ssh -L 3001:localhost:3001 -L 3002:localhost:3002 <user>@<ip> -fN\n\nIf you created an EC2 instance with ``pem`` credentials, include it in the ``ssh`` tunnel command below::\n\n    ssh -i ~/my-ec2.pem -L 3001:localhost:3001 -L 3002:localhost:3002 ubuntu@[PUBLIC_IP_ADDRESS] -fN\n\n\nUsing the Profile UI\n~~~~~~~~~~~~~~~~~~~~~\n\n* Once the SSH tunnel is set up, you can open a browser and navigate to http://localhost:3001.\n\n   .. image:: /nki/img/how-to/nki-profiler-1.png\n      :align: center\n      :width: 750\n\n* Click the \"Upload Profile\" button to upload the NEFF and NTFF files, and give a meaningful name to your profile. Selecting a source code folder for code linking is optional.\n\n   .. image:: /nki/img/how-to/nki-profiler-2.png\n      :align: center\n      :width: 750\n\n* After the files are uploaded and processed, you will be able to open the profile from the list.\n\n   .. image:: /nki/img/how-to/nki-profiler-3.png\n      :align: center\n      :width: 750\n\n* If you click the name of your profile in the Profile Name column, it will navigate to the profile page.\n\n   .. image:: /nki/img/how-to/nki-profiler-4.png\n      :align: center\n      :width: 750\n\n* If you hover over any engine instruction in the timeline with your mouse, you will see instruction details in a pop-up box.\n\n   .. image:: /nki/img/how-to/nki-profiler-5.png\n      :align: center\n      :width: 750\n\n* If you click on any engine instruction in the timeline with your mouse, you will see event details in a panel below the timeline.\n\n   .. image:: /nki/img/how-to/nki-profiler-6.png\n      :align: center\n      :width: 750\n\n* To view the hierarchy of this profile, click on **Add Widget** and select **Hierarchy**. For more details, see the :doc:`Hierarchy Viewer </tools/neuron-explorer/overview-hierarchy-view>` documentation.\n\n   .. image:: /nki/img/how-to/nki-profiler-7.png\n      :align: center\n      :width: 750\n\n* Using the Profiler's flexible layout support, you can drag and group every widget into any panel of your choice to customize the layout for your workflow.\n\n   .. image:: /nki/img/how-to/nki-profiler-8.png\n      :align: center\n      :width: 750\n\n* If you right-click on an operator in the hierarchy timeline, it will highlight all related instructions in the instruction timeline.\n\n   .. image:: /nki/img/how-to/nki-profiler-9.png\n      :align: center\n      :width: 750\n\nView NKI Source Code in Neuron Profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can optionally include your NKI source code files for display in Neuron Profile. When provided, Neuron Profile loads the source code into an integrated viewer, displayed side-by-side with the execution timeline in the web UI. This makes it easier to navigate between the instruction trace and the corresponding NKI source code, and to track the exact version of the code that generated the profile. For more details on source code linking, see the :doc:`Source Code Viewer </tools/neuron-explorer/how-to-link-view-source-code>` documentation.\n\n.. note:: Even if you don't upload the source code, the NKI source filename and line number remain available in the instruction detail view, as noted in *View the Neuron Explorer UI* above.\n\n* If the source code is uploaded along with the NEFF and NTFF files, you will be able to see it in the code editor. To open the code editor, click on **Add Widget** and select **Code Editor**.\n\n   .. image:: /nki/img/how-to/nki-profiler-10.png\n      :align: center\n      :width: 750\n\n* The code editor opens on the right-hand side.\n\n   .. image:: /nki/img/how-to/nki-profiler-11.png\n      :align: center\n      :width: 750\n\n* Hover over an instruction that has an NKI source location and **Command + left click** on Mac (**Ctrl + right click** on Windows); the viewer will jump to the corresponding line of the source code and highlight all instructions related to that line.\n\n   .. image:: /nki/img/how-to/nki-profiler-12.png\n      :align: center\n      :width: 750\n\n* You can also enable different source code decorations in **Source Code Settings**.\n\n   .. image:: /nki/img/how-to/nki-profiler-13.png\n      :align: center\n      :width: 750\n\n   .. image:: /nki/img/how-to/nki-profiler-14.png\n      :align: center\n      :width: 750\n\nNext Steps\n----------\n\nGreat! Now that you've learned how to profile an NKI kernel, it's time to take this further:\n\n* Dive into the :doc:`NKI Performance Guide </nki/deep-dives/nki_perf_guide>` to discover techniques for making your kernels faster and more efficient.\n* Explore the `NKI sample kernels <https://github.com/aws-neuron/nki-samples>`__ to see real-world examples of high-performance kernel implementations — and get inspiration for your own NKI kernels.\n* Learn more about the Neuron Explorer viewers to deepen your profiling analysis:\n\n  * :doc:`Device Trace Viewer </tools/neuron-explorer/overview-device-profiles>` — Explore hardware-level execution with timeline view, operator table, and event details.\n  * :doc:`Hierarchy Viewer </tools/neuron-explorer/overview-hierarchy-view>` — Visualize execution from model layers down to hardware operations.\n  * :doc:`Source Code Viewer </tools/neuron-explorer/how-to-link-view-source-code>` — Navigate between source code and profile data with bidirectional linking.\n  * :doc:`Summary Viewer </tools/neuron-explorer/overview-summary-page>` — Get high-level performance insights and optimization recommendations.\n  * :doc:`AI Recommendation Viewer </tools/neuron-explorer/overview-ai-recommendations>` — Get AI-powered bottleneck analysis and optimization suggestions for NKI profiles.\n\nBy combining profiling insights with optimization strategies and practical examples, you'll be well prepared to write NKI kernels that leverage Neuron hardware in an efficient way.\n"
  },
  {
    "path": "nki/index.rst",
    "content": ".. _neuron-nki:\n\n.. meta::\n   :description: Neuron Kernel Interface (NKI) - Low-level programming interface for custom kernel development on AWS Trainium and Inferentia with direct NeuronCore ISA access.\n   :keywords: NKI, Neuron Kernel Interface, custom kernels, NeuronCore, AWS Neuron, Trainium, Inferentia, ISA, tile programming, torch.compile\n   :date-modified: 2026-04-02\n\nNeuron Kernel Interface (NKI)\n====================================\n\nNeuron Kernel Interface (NKI) is a bare-metal language and compiler for directly programming NeuronDevices\navailable on AWS Trn/Inf instances. You can use NKI to develop, optimize and run new operators directly on\nNeuronCores while making full use of available compute and memory resources. NKI empowers ML developers to\nself-serve and invent new ways to use the NeuronCore hardware, starting NeuronCores v2 (Trainium1) and beyond.\n\nNKI provides developers with direct access to the NeuronCore ISA (Instruction Set Architecture), accessible from a\nPython-based programming environment, which has syntax and tile-level semantics that are similar to\n`Triton <https://triton-lang.org/main/index.html>`_ and `NumPy <https://numpy.org/doc/stable/>`_.\nThis enables developers to get started quickly and optimize performance in a familiar environment, while at the same\ntime get full control of the underlying hardware. At the hardware level, NeuronCore's tensorized memory access\ncapability enables efficient reading and writing of multi-dimensional arrays on a per instruction basis,\nwhich makes NKI's tile-based programming highly suitable for the NeuronCore instruction set.\n\nFor comparison, before NKI was introduced, the only way to program NeuronDevices was through defining high-level ML\nmodels in frameworks such as `PyTorch <https://pytorch.org/>`_\nand `JAX <https://jax.readthedocs.io/en/latest/index.html>`_.\nNeuron Compiler takes such high-level model definitions as input,\nperforms multiple rounds of optimization, and eventually generates a NEFF (Neuron Executable File Format) that\nis executable on NeuronDevices. At a high level, Neuron Compiler runs the following optimization stages in order:\n\n1. **Hardware-agnostic graph-level optimizations.** These transformations are done in the compiler front-end,\n   using `XLA <https://openxla.org/xla>`_, including optimizations like constant propagation, re-materialization\n   and operator fusion.\n\n2. **Loop-level optimization.** Compiler turns the optimized graph from Step 1 into a series of loop nests\n   and performs layout, tiling and loop fusion optimizations.\n\n3. **Hardware intrinsics mapping.** Compiler maps the architecture-agnostic loop nests from Step 2 into\n   architecture-specific instructions.\n\n4. **Hardware-specific optimizations.** These optimizations are mainly\n   done at the instruction level in compiler back-end,\n   with a key goal of reducing memory pressure and improving instruction-level parallelism. For example, memory\n   allocation and instruction scheduling are done in this stage.\n\nNKI kernels bypass the first 3 steps, and are compiled into IRs (intermediate representations) that the compiler's\nback-end (Step 4 above) can directly consume. Advanced features in NKI, such as direct allocation, also allow programmers\nto bypass certain compiler passes in Step 4. As a result, NKI developers can now have great control over NeuronDevices down to\nthe instruction level. 
We highly recommend that developers study the underlying hardware architecture before\noptimizing the performance of their NKI kernels. See the NKI guide below to learn more!\n\n.. _api_reference_guide:\n\nAPI Reference Guide\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. grid:: 2\n      :margin: 4 1 0 0\n\n      .. grid-item::\n\n            .. card:: NKI API Reference Manual\n                  :link: nki_api_reference\n                  :link-type: ref\n                  :class-body: sphinx-design-class-title-small\n\n\n.. toctree::\n      :maxdepth: 1\n      :hidden:\n\n      NKI FAQ <nki_faq>\n\n\n"
  },
  {
    "path": "nki/library/about/index.rst",
    "content": ".. meta::\n    :description: Overviews and conceptual docs for the NKI Library . NKI Library provides pre-built NKI kernels you can use in model development with Neuron.\n    :date-modified: 12/02/2025\n\n.. _nkl_overviews_home:\n\nAbout the NKI Library \n======================================\n\nLearn about the NKI Library and the pre-built kernels it provides to accelerate the performance of your models.\n\nWhat is the NKI Library?\n-----------------------------------\n\nThe NKI Library is a collection of pre-built NKI kernels optimized for AWS Neuron-powered devices. These kernels are designed to accelerate machine learning workloads by providing efficient implementations of common operations used in deep learning models. NKI kernels are commonly used to implement custom PyTorch operators that run on NeuronCores, enabling developers to optimize performance-critical operations beyond what the Neuron Compiler generates automatically.\n\nHow do I use the NKI Library?\n------------------------------\n\nThe kernels in the NKI Library are provided in a public GitHub repository that you can clone and integrate into your Neuron-based model development workflow. You can use these kernels directly in your models to take advantage of their optimized performance on Neuron hardware. \n\n* **NKI Library repository**: https://github.com/aws-neuron/nki-library\n\nTo get started using NKI Library kernels in your model development, clone or fork the repo and follow the instructions in the `README <https://github.com/aws-neuron/nki-library/blob/main/README.md>`_ file.\n\nResources\n---------\n\n* :doc:`NKI Library Kernel API Reference </nki/library/api/index>`\n* :doc:`NKI Library Kernel Design Specifications </nki/library/specs/index>`\n* :doc:`NKI Documentation </nki/index>`\n\n    "
  },
  {
    "path": "nki/library/api/attention-block-tkg.rst",
    "content": ".. meta::\n    :description: Attention Block TKG kernel implements fused attention block optimized for Token Generation.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.attention_block_tkg.attention_block_tkg\n\n.. _nki_library_attention_block_tkg:\n\nAttention Block TKG Kernel API Reference\n=========================================\n\n**[Experimental]** Implements a fully fused attention block optimized for Token Generation (autoregressive decoding), keeping all intermediate tensors in SBUF to minimize HBM traffic.\n\nThe kernel supports:\n\n* Fused multi-stage computation: pre-normalization, QKV projection, RoPE, post-normalization, attention, KV cache update, and output projection\n* Multiple KV cache layouts: flat (transposed/non-transposed) and block-based\n* Grouped-Query Attention (GQA) with configurable Q/KV head ratios\n* Optional RMSNorm at multiple stages (pre-projection, post-projection per-head)\n* Optional Rotary Position Embedding (RoPE) with configurable layouts\n* Flexible quantization support (FP8, FP16, BF16)\n* FP8 KV cache quantization support\n* Configurable softmax scaling factor\n* Batch processing with per-batch cache indexing\n* Single program multiple data (SPMD) sharding for distributed computation\n\nBackground\n----------\n\nThe ``attention_block_tkg`` kernel combines multiple stages of transformer attention computation into a single fused operation that minimizes data movement between HBM and on-chip memory (SBUF).\n\n**Fused Operations:**\n\nThe kernel fuses the following stages in SBUF to avoid HBM round-trips:\n\n1. **Pre-normalization**: Optional RMSNorm on input hidden states\n2. **QKV Projection**: Linear projection to Query, Key, Value tensors\n3. **RoPE**: Optional Rotary Position Embedding on Q and K\n4. **Post-normalization**: Optional per-head RMSNorm on Q and K\n5. **Attention Computation**: Scaled dot-product attention with KV cache\n6. **KV Cache Update**: Write new K/V tokens to cache\n7. **Output Projection**: Linear projection of attention output\n\n**Performance Benefits:**\n\nBy keeping intermediate tensors in SBUF throughout the computation, this kernel achieves:\n\n* Reduced HBM bandwidth consumption\n* Lower latency for token generation\n* Better hardware utilization through operation fusion\n\nAPI Reference\n-------------\n\n**Source code for this kernel API can be found at**: `attention_block_tkg.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/transformer/attention_block_tkg.py>`_\n\nattention_block_tkg\n^^^^^^^^^^^^^^^^^^^\n\n.. 
py:function:: attention_block_tkg(X: nl.ndarray, X_hidden_dim_actual: Optional[int], rmsnorm_X_enabled: bool, rmsnorm_X_eps: Optional[float], rmsnorm_X_gamma: Optional[nl.ndarray], W_qkv: nl.ndarray, bias_qkv: Optional[nl.ndarray], quantization_type_qkv: QuantizationType, weight_dequant_scale_qkv: Optional[nl.ndarray], input_dequant_scale_qkv: Optional[nl.ndarray], rmsnorm_QK_pre_rope_enabled: bool, rmsnorm_QK_pre_rope_eps: float, cos: Optional[nl.ndarray], sin: Optional[nl.ndarray], rope_contiguous_layout: bool, rmsnorm_QK_post_rope_enabled: bool, rmsnorm_QK_post_rope_eps: float, rmsnorm_QK_post_rope_W_Q: Optional[nl.ndarray], rmsnorm_QK_post_rope_W_K: Optional[nl.ndarray], K_cache_transposed: bool, active_blocks_table: Optional[nl.ndarray], K_cache: nl.ndarray, V_cache: nl.ndarray, attention_mask: nl.ndarray, sink: Optional[nl.ndarray], softmax_scale: Optional[float] = None, update_cache: bool, kv_cache_update_idx: Optional[nl.ndarray], k_scale: Optional[nl.ndarray] = None, v_scale: Optional[nl.ndarray] = None, W_out: Optional[nl.ndarray], bias_out: Optional[nl.ndarray], quantization_type_out: QuantizationType, weight_dequant_scale_out: Optional[nl.ndarray], input_dequant_scale_out: Optional[nl.ndarray], transposed_out: bool, out_in_sb: bool, sbm: Optional[SbufManager] = None, skip_attention: bool = False)\n\n   Fused Attention Block for Token Generation (TKG).\n\n   Performs end-to-end attention block computation optimized for autoregressive decoding:\n   X → [RMSNorm] → QKV Projection → [RMSNorm Q/K] → [RoPE] → [RMSNorm Q/K] →\n   Attention → KV Cache Update → [Output Projection] → Output\n\n   All intermediate tensors remain in SBUF to minimize HBM traffic.\n\n   :param X: Input hidden states ``[B, S_tkg, H]`` @ HBM or ``[pmax, B*S_tkg, H//pmax]`` @ SBUF\n   :type X: ``nl.ndarray``\n   :param X_hidden_dim_actual: Actual hidden dim if X is padded\n   :type X_hidden_dim_actual: ``int``, optional\n   :param rmsnorm_X_enabled: Apply RMSNorm to X before QKV projection\n   :type rmsnorm_X_enabled: ``bool``\n   :param rmsnorm_X_eps: RMSNorm epsilon (default 1e-3)\n   :type rmsnorm_X_eps: ``float``, optional\n   :param rmsnorm_X_gamma: RMSNorm weights ``[1, H]`` @ HBM\n   :type rmsnorm_X_gamma: ``nl.ndarray``, optional\n   :param W_qkv: QKV projection weights ``[H, d_head*(q_heads+2)]`` @ HBM\n   :type W_qkv: ``nl.ndarray``\n   :param bias_qkv: QKV bias ``[1, d_head*(q_heads+2)]`` @ HBM\n   :type bias_qkv: ``nl.ndarray``, optional\n   :param quantization_type_qkv: Quantization type for QKV projection\n   :type quantization_type_qkv: ``QuantizationType``\n   :param weight_dequant_scale_qkv: Weight dequantization scale for QKV projection\n   :type weight_dequant_scale_qkv: ``nl.ndarray``, optional\n   :param input_dequant_scale_qkv: Input dequantization scale for QKV projection\n   :type input_dequant_scale_qkv: ``nl.ndarray``, optional\n   :param rmsnorm_QK_pre_rope_enabled: Apply RMSNorm to Q/K before RoPE\n   :type rmsnorm_QK_pre_rope_enabled: ``bool``\n   :param rmsnorm_QK_pre_rope_eps: Pre-RoPE RMSNorm epsilon\n   :type rmsnorm_QK_pre_rope_eps: ``float``\n   :param cos: RoPE cosine embeddings ``[d_head//2, B, S_tkg]`` @ HBM (None = skip RoPE)\n   :type cos: ``nl.ndarray``, optional\n   :param sin: RoPE sine embeddings ``[d_head//2, B, S_tkg]`` @ HBM (None = skip RoPE)\n   :type sin: ``nl.ndarray``, optional\n   :param rope_contiguous_layout: True for contiguous halves, False for interleaved\n   :type rope_contiguous_layout: ``bool``\n   :param rmsnorm_QK_post_rope_enabled: Apply RMSNorm to 
Q/K after RoPE\n   :type rmsnorm_QK_post_rope_enabled: ``bool``\n   :param rmsnorm_QK_post_rope_eps: Post-RoPE RMSNorm epsilon\n   :type rmsnorm_QK_post_rope_eps: ``float``\n   :param rmsnorm_QK_post_rope_W_Q: Post-RoPE Q weights ``[1, d_head]`` @ HBM\n   :type rmsnorm_QK_post_rope_W_Q: ``nl.ndarray``, optional\n   :param rmsnorm_QK_post_rope_W_K: Post-RoPE K weights ``[1, d_head]`` @ HBM\n   :type rmsnorm_QK_post_rope_W_K: ``nl.ndarray``, optional\n   :param K_cache_transposed: K cache layout flag\n   :type K_cache_transposed: ``bool``\n   :param active_blocks_table: Block indices for block KV cache ``[B, num_blocks]`` @ HBM\n   :type active_blocks_table: ``nl.ndarray``, optional\n   :param K_cache: Key cache @ HBM\n   :type K_cache: ``nl.ndarray``\n   :param V_cache: Value cache @ HBM\n   :type V_cache: ``nl.ndarray``\n   :param attention_mask: Attention mask ``[S_ctx, B, q_heads, S_tkg]`` @ HBM\n   :type attention_mask: ``nl.ndarray``\n   :param sink: Attention sink tokens ``[H, 1]`` @ HBM\n   :type sink: ``nl.ndarray``, optional\n   :param softmax_scale: Scaling factor for attention scores (``Q @ K^T * softmax_scale``). If ``None``, defaults to ``1.0 / sqrt(d_head)``.\n   :type softmax_scale: ``float``, optional\n   :param update_cache: Update KV cache with new tokens\n   :type update_cache: ``bool``\n   :param kv_cache_update_idx: Cache write positions ``[B, 1]`` (uint32_max = skip)\n   :type kv_cache_update_idx: ``nl.ndarray``, optional\n   :param k_scale: Key quantization scale for FP8 KV cache. Enables FP8 quantization of K values written to cache.\n   :type k_scale: ``nl.ndarray``, optional\n   :param v_scale: Value quantization scale for FP8 KV cache. Enables FP8 quantization of V values written to cache.\n   :type v_scale: ``nl.ndarray``, optional\n   :param W_out: Output projection weights ``[q_heads*d_head, H]`` @ HBM\n   :type W_out: ``nl.ndarray``, optional\n   :param bias_out: Output projection bias ``[1, H]`` @ HBM\n   :type bias_out: ``nl.ndarray``, optional\n   :param quantization_type_out: Quantization type for output projection\n   :type quantization_type_out: ``QuantizationType``\n   :param weight_dequant_scale_out: Weight dequantization scale for output projection\n   :type weight_dequant_scale_out: ``nl.ndarray``, optional\n   :param input_dequant_scale_out: Input dequantization scale for output projection\n   :type input_dequant_scale_out: ``nl.ndarray``, optional\n   :param transposed_out: Transpose output layout (requires W_out)\n   :type transposed_out: ``bool``\n   :param out_in_sb: Return output in SBUF instead of HBM\n   :type out_in_sb: ``bool``\n   :param sbm: SBUF memory manager (otherwise auto-allocated)\n   :type sbm: ``SbufManager``, optional\n   :param skip_attention: Skip attention computation (for testing). 
Default: False.\n   :type skip_attention: ``bool``\n   :return: Tuple of (out, K_out, V_out) - Output tensor, updated K cache or new K tokens, updated V cache or new V tokens\n   :rtype: ``tuple``\n\n   **Dimensions**:\n\n   * B: batch size\n   * S_tkg: number of new tokens to generate\n   * S_ctx: KV cache sequence length in current bucket\n   * S_max_ctx: maximum KV cache capacity of current bucket\n   * H: hidden dimension\n   * d_head: head dimension (must be even)\n   * q_heads: number of query heads\n   * kv_heads: 1 (GQA with single KV head)\n\n   **Supported Data Types**:\n\n   * Supports nl.float16 and nl.bfloat16\n\n   **Constraints**:\n\n   * Requires NeuronCore v3+\n   * d_head must be even\n   * H must be multiple of 128\n   * Requires ``batch * sequence_tkg * q_heads <= pmax (=128)``\n\n\nImplementation Details\n----------------------\n\n**Computation Flow:**\n\nThe kernel executes the following stages in sequence:\n\n1. **Input Pre-normalization** (optional):\n   \n   - Apply RMSNorm to input hidden states: ``X_norm = RMSNorm(X, rmsnorm_pre_W, rmsnorm_pre_eps)``\n   - Computed in FP32, result cast back to input dtype\n\n2. **QKV Projection**:\n   \n   - Compute ``QKV = X_norm @ W_qkv.T`` using matrix multiplication\n   - Result shape: ``[B, S_tkg, (q_heads + 2) * d_head]``\n   - Supports FP8 quantization with dequantization scales\n\n3. **Q/K Processing** (per head group):\n   \n   - Extract Q heads: ``Q = QKV[:, :, :q_heads * d_head]``\n   - Extract K head: ``K = QKV[:, :, q_heads * d_head : (q_heads + 1) * d_head]``\n   - Apply RoPE if enabled: ``Q, K = RoPE(Q, K, cos, sin, position_ids)``\n   - Apply per-head RMSNorm if enabled: ``Q = RMSNorm(Q, rmsnorm_post_W_Q)``, ``K = RMSNorm(K, rmsnorm_post_W_K)``\n\n4. **V Processing**:\n   \n   - Extract V head: ``V = QKV[:, :, (q_heads + 1) * d_head :]``\n\n5. **KV Cache Update**:\n   \n   - Write new K/V tokens to cache at positions specified by ``kv_cache_update_idx``\n   - Supports multiple cache layouts (flat, transposed, block-based)\n   - Uses indirect addressing for efficient batch processing\n\n6. **Attention Computation**:\n   \n   - Compute scaled dot-product attention: ``Attn = softmax(Q @ K_cache.T / scale) @ V_cache``\n   - Apply causal masking based on ``S_ctx`` (context lengths)\n   - Use FP32 accumulation if ``mixed_precision=True``\n   - Supports Grouped-Query Attention by replicating KV heads\n\n7. **Output Projection**:\n   \n   - Reshape attention output: ``Attn_flat = Attn.reshape([B, S_tkg, q_heads * d_head])``\n   - Compute ``out = Attn_flat @ W_o.T``\n   - Supports FP8 quantization with dequantization scales\n\n**Memory Management:**\n\nThe kernel uses a custom SBUF memory manager (``SbufManager``) to efficiently allocate and reuse on-chip memory:\n\n- Stack-based allocation for temporary tensors\n- Automatic memory reuse after tensor lifetime ends\n- Minimizes SBUF fragmentation\n\n**Parallelization:**\n\nThe kernel supports data parallelism across multiple Neuron Cores:\n\n- Batch dimension (``B``) can be sharded across cores\n- Each core processes a subset of batch elements independently\n- KV cache updates use per-core indexing\n\n**Cache Layout Support:**\n\n1. **Flat Cache** (``is_block_kv=False``):\n   \n   - K cache: ``[B, S_max_ctx, d_head]`` or ``[B, d_head, S_max_ctx]`` (transposed)\n   - V cache: ``[B, S_max_ctx, d_head]``\n   - Direct indexing by batch and sequence position\n\n2. 
**Block Cache** (``is_block_kv=True``):\n   \n   - K/V cache: ``[num_blocks, block_len, d_head]``\n   - Indirect indexing via block slot mapping\n   - Efficient for variable-length sequences\n\n**Quantization Support:**\n\n- FP8 weights: Provide ``qkv_scale`` and ``o_scale`` for dequantization\n- Mixed precision: FP32 accumulation with FP16/BF16 inputs\n- Automatic dtype handling throughout the pipeline\n\n**Key Implementation Notes:**\n\n1. **Grouped-Query Attention**: The kernel processes Q heads in groups, where each group shares a single K/V head. This reduces KV cache memory by a factor of ``q_heads / kv_heads``.\n\n2. **RoPE Application**: Rotary embeddings are applied using position indices derived from ``S_ctx`` (current context length). Supports both contiguous and interleaved layouts.\n\n3. **Causal Masking**: Attention scores are masked such that token at position ``i`` can only attend to positions ``0`` to ``i`` in the context. Implemented by adding ``-inf`` to masked positions before softmax.\n\n4. **Cache Update Optimization**: \n   \n   - For ``S_tkg=1``: Uses batched vector DMA with ``vector_offset`` for all batches in one operation\n   - For ``S_tkg>1``: Uses per-batch scalar DMA with ``scalar_offset``\n   - Block cache uses indirect addressing via block slot indices\n\n5. **Memory Efficiency**: All intermediate tensors (QKV, Q, K, V, attention scores, attention output) remain in SBUF. Only input ``X``, weights, caches, and final output ``out`` reside in HBM.\n"
  },
  {
    "path": "nki/library/api/attention-cte.rst",
    "content": ".. meta::\n    :description: Attention CTE kernel implements attention optimized for Context Encoding (prefill) use cases.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.core.attention.attention_cte\n\nAttention CTE Kernel API Reference\n===================================\n\nImplements attention optimized for Context Encoding (prefill) use cases with long sequence lengths.\n\nThe kernel supports:\n\n* Efficient attention computation for long sequence lengths\n* Causal masking\n* Sliding window attention\n* Context parallelism for distributed computation\n* Prefix caching for efficient inference\n* Sink tokens for streaming attention\n* Native Grouped Query Attention (GQA) support\n* Softmax caching for training\n* Sequence packing with per-query KV range bounds\n\nBackground\n--------------\n\nThe ``Attention CTE`` kernel is designed specifically for context encoding (prefill) scenarios where the sequence length is large (typically > 256). It performs the standard attention operation ``Attention(Q, K, V) = softmax(scale * Q @ K^T) @ V`` with optimizations for long sequence lengths.\n\nThe kernel employs efficient tiling strategies and memory access patterns to maximize performance on Neuron hardware. It supports various optimizations including flash attention for long sequences, LNC sharding, and context parallelism.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `attention_cte.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/attention/attention_cte.py>`_\n\n\nattention_cte\n^^^^^^^^^^^^^^^\n\n.. py:function:: attention_cte(q: nl.ndarray, k: nl.ndarray, v: nl.ndarray, scale: float = 1.0, causal_mask: bool = True, k_prior: Optional[nl.ndarray] = None, v_prior: Optional[nl.ndarray] = None, prior_used_len: Optional[nl.ndarray] = None, sink: Optional[nl.ndarray] = None, sliding_window: Optional[int] = None, tp_q: bool = True, tp_k: bool = False, tp_out: bool = False, cache_softmax: bool = False, softmax_dtype=nl.float32, mm_out_dtype=nl.float32, cp_offset: Optional[nl.ndarray] = None, global_cp_deg: int = None, cp_strided_q_slicing: bool = False, bound_min: Optional[nl.ndarray] = None, bound_max: Optional[nl.ndarray] = None)\n\n   Entrypoint NKI kernel that supports multiple attention variants.\n\n   The kernel can be invoked with 1D SPMD grid for LNC2 or without grid.\n\n   :param q: Query tensor with layout dependent on ``tp_q`` parameter\n   :type q: ``nl.ndarray``\n   :param k: Key tensor with layout dependent on ``tp_k`` parameter\n   :type k: ``nl.ndarray``\n   :param v: Value tensor with shape ``(batch_size_kv, seqlen, d)``\n   :type v: ``nl.ndarray``\n   :param scale: Scaling factor for attention scores. 
Must be 1.0 when using sliding window, context parallel, or prefix caching.\n   :type scale: ``float``, optional\n   :param causal_mask: Whether to use causal mask\n   :type causal_mask: ``bool``, optional\n   :param k_prior: (Prefix caching) Prior key tensor with layout dependent on ``tp_k`` parameter\n   :type k_prior: ``nl.ndarray``, optional\n   :param v_prior: (Prefix caching) Prior value tensor with shape ``(batch_size_kv, seqlen_prior, d)``\n   :type v_prior: ``nl.ndarray``, optional\n   :param prior_used_len: (Prefix caching) Actual used length in prior with shape ``(1,)``\n   :type prior_used_len: ``nl.ndarray``, optional\n   :param sink: Sink token tensor\n   :type sink: ``nl.ndarray``, optional\n   :param sliding_window: Sliding window size for attention, ``None`` or ``0`` denotes no sliding window mask\n   :type sliding_window: ``int``, optional\n   :param tp_q: Query tensor transpose flag\n   :type tp_q: ``bool``, optional\n   :param tp_k: Key tensor transpose flag\n   :type tp_k: ``bool``, optional\n   :param tp_out: Output tensor transpose flag\n   :type tp_out: ``bool``, optional\n   :param cache_softmax: Whether to cache softmax intermediate values\n   :type cache_softmax: ``bool``, optional\n   :param softmax_dtype: Data type for softmax computations\n   :type softmax_dtype: ``nl.dtype``, optional\n   :param mm_out_dtype: Data type for matmul output accumulation. Default: ``nl.float32``.\n   :type mm_out_dtype: ``nl.dtype``, optional\n   :param cp_offset: Context parallel offset tensor\n   :type cp_offset: ``nl.ndarray``, optional\n   :param global_cp_deg: Global context parallel degree\n   :type global_cp_deg: ``int``, optional\n   :param cp_strided_q_slicing: Whether to use strided Q slicing for context parallelism. Default: False.\n   :type cp_strided_q_slicing: ``bool``\n   :param bound_min: (Sequence packing) Per-query minimum KV index bounds with shape ``(batch_size, seqlen_q)``. When provided with ``bound_max``, restricts the KV range each query attends to. Default: ``None`` (no packing).\n   :type bound_min: ``nl.ndarray``, optional\n   :param bound_max: (Sequence packing) Per-query maximum KV index bounds with shape ``(batch_size, seqlen_q)``. When provided with ``bound_min``, restricts the KV range each query attends to. Default: ``None`` (no packing).\n   :type bound_max: ``nl.ndarray``, optional\n   :return: Output tensor with attention results. Shape depends on ``tp_out`` parameter. If ``cache_softmax`` is ``True``, returns tuple of ``(output, out_neg_max, out_sum_recip)``.\n   :rtype: ``nl.ndarray`` or ``tuple``\n\n   **IO Shapes**:\n\n   * q:\n     ``(batch_size, seqlen_q, d)`` when ``tp_q`` is ``True``\n     ``(batch_size, d, seqlen_q)`` when ``tp_q`` is ``False``\n   * k:\n     ``(batch_size_kv, seqlen_kv, d)`` when ``tp_k`` is ``True``\n     ``(batch_size_kv, d, seqlen_kv)`` when ``tp_k`` is ``False``\n   * v: ``(batch_size_kv, seqlen_kv, d)``\n   * returns:\n     ``(batch_size, d, seqlen_q)`` if ``tp_out`` is ``True``\n     ``(batch_size, seqlen_q, d)`` if ``tp_out`` is ``False``\n\n   **Constraints**:\n\n   * Head dimension (``d``) must be <= 128\n   * ``scale`` must be 1.0 when using sliding window, context parallel, or prefix caching\n   * Context parallelism currently only supports causal attention\n   * Sliding window attention currently only supports causal attention\n\nFeatures\n-----------\n\n1. 
**Causal Masking (causal_mask=True)**:\n   \n   * Masks upper triangle of attention scores: ``S[i,j] = -inf`` when ``i < j``\n   * Enables compute skipping: skip MM1/MM2 for upper triangle tiles\n\n2. **Sliding Window Attention (SWA, when sliding_window > 0)**:\n   \n   * Local attention: each query only attends to nearby keys within a window\n   * Masks attention scores: ``S[i,j] = -inf`` when ``|i - j| > sliding_window``\n   * Currently only works with causal: masks both upper triangle AND positions outside window\n   * When used with CP: loads only required KV slice to save memory\n\n3. **Context Parallelism (CP, global_cp_deg > 1, cp_offset != None)**:\n   \n   * Distributes long sequence computation across multiple devices/ranks\n   * Each rank (kernel call) processes a slice of Q sequence with full K/V\n   * ``cp_offset`` indicates which Q slice this rank handles (runtime value)\n   * Requires dynamic masking since offset unknown at compile time\n   * Currently only supports causal attention\n\n4. **Prefix Caching (k_prior/v_prior provided)**:\n   \n   * K/V split into two parts: prior (cached) and active (current)\n   * ``prior_used_len`` specifies how much of prior to use (dynamic mask)\n   * Causal mask not required for prior portion (although SWA still applies if enabled)\n\n5. **Sink Tokens (sink provided)**:\n   \n   * Add additional sink token to softmax denominator\n\n6. **Grouped Query Attention (GQA, batch_size_kv < batch_size)**:\n   \n   * Kernel handles GQA natively without explicit K/V replication\n\n7. **Support for training**:\n   \n   * Kernel can optionally return maximum attention score and softmax denominator (per row) for backpropagation\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **LNC2 Sharding**: Shards computation across 2 NeuronCores with primary sharding on batch dimension and secondary sharding on sequence length for odd batch sizes.\n\n2. **Flash Attention**: For K/V length > 10K tokens, divides into 8K-token sections and processes one section at a time to fit in SBUF memory.\n\n3. **Software Pipelining**: Overlaps operations across Q groups (``i``, ``i+1``, ``i+2``) for efficient hardware utilization:\n   \n   * Group ``i``: PV computation, writeback\n   * Group ``i+1``: Exp computation\n   * Group ``i+2``: Q load, QK computation\n\n4. **Modular Allocation**: Uses efficient buffer reuse with modular allocation for intermediate tensors.\n\n5. **Dynamic Masking**: Implements efficient masking strategies for causal, sliding window, and context parallel scenarios.\n\n6. **Optimized Memory Access**: Employs careful memory access patterns to optimize data movement between HBM and SBUF.\n\n\n\nSee Also\n-----------\n\n* :doc:`Attention TKG Kernel API Reference </nki/library/api/attention-tkg>`\n"
  },
  {
    "path": "nki/library/api/attention-tkg.rst",
    "content": ".. meta::\n    :description: Attention TKG kernel implements attention optimized for Token Generation (decode) use cases.\n    :date-modified: 11/28/2025\n\n.. currentmodule:: nkilib.core.attention_tkg\n\nAttention TKG Kernel API Reference\n===================================\n\nImplements attention optimized for Token Generation (decode) use cases with small active sequence lengths.\n\nThe kernel supports:\n\n* Efficient attention computation for small active sequence lengths\n* Flexible tensor placement in SBUF or HBM\n* Adaptive LNC2 sharding strategies\n* In-kernel mask generation\n* Fused RoPE (Rotary Position Embedding)\n* Block KV cache for efficient long-context inference\n* Attention sink for streaming attention\n* GPSIMD optimizations for inter-core communication\n\nBackground\n--------------\n\nThe ``Attention TKG`` kernel is designed specifically for token generation (decoding) scenarios where the active sequence length is small (typically ≤ 7). It performs the standard attention operation ``Attention(Q, K, V) = softmax(Q @ K^T) @ V`` with optimizations for small active sequence lengths and large KV caches.\n\nThe kernel employs efficient tiling strategies and memory access patterns to maximize performance on Neuron hardware. It supports various optimizations including LNC sharding, block KV cache, and attention sink for streaming attention.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `attention_tkg.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/attention/attention_tkg.py>`_\n\nAttnTKGConfig\n^^^^^^^^^^^^^^^\n\n.. py:class:: AttnTKGConfig\n\n   Configuration for token-generation attention kernel.\n\n   This dataclass contains shape parameters and performance optimization flags\n   for the attention_tkg kernel, which is optimized for small active sequence lengths.\n\n   .. py:attribute:: bs\n      :type: int\n      :value: 0\n\n      Batch size\n\n   .. py:attribute:: q_head\n      :type: int\n      :value: 0\n\n      Number of query heads\n\n   .. py:attribute:: s_active\n      :type: int\n      :value: 0\n\n      Active sequence length (>1 means speculative decoding)\n\n   .. py:attribute:: curr_sprior\n      :type: int\n      :value: 0\n\n      Current prior sequence length (KV cache length for this execution)\n\n   .. py:attribute:: full_sprior\n      :type: int\n      :value: 0\n\n      Full prior sequence length (maximum KV cache capacity)\n\n   .. py:attribute:: d_head\n      :type: int\n      :value: 0\n\n      Head dimension (embedding size per head)\n\n   .. py:attribute:: block_len\n      :type: int\n      :value: 0\n\n      Block length for block KV cache (0 if not using block KV)\n\n   .. py:attribute:: tp_k_prior\n      :type: bool\n      :value: False\n\n      Specifies that k_prior is transposed (shape ``[B, 1, d, s_prior]`` instead of ``[B, 1, s_prior, d]``)\n\n   .. py:attribute:: strided_mm1\n      :type: bool\n      :value: True\n\n      Use strided memory access for first matmul to improve cache locality\n\n   .. py:attribute:: use_pos_id\n      :type: bool\n      :value: False\n\n      Generate attention mask from position IDs in-kernel instead of loading pre-generated mask\n\n   .. py:attribute:: fuse_rope\n      :type: bool\n      :value: False\n\n      Fuse RoPE (Rotary Position Embedding) computation into the kernel\n\n   .. 
py:attribute:: use_gpsimd_sb2sb\n      :type: bool\n      :value: True\n\n      Use GPSIMD instructions for SBUF-to-SBUF data transfers (LNC2 sharding)\n\n   .. py:attribute:: qk_in_sb\n      :type: bool\n      :value: False\n\n      Query and key tensors are already in SBUF instead of HBM\n\n   .. py:attribute:: k_out_in_sb\n      :type: bool\n      :value: False\n\n      Output key tensor after RoPE should be stored in SBUF instead of HBM\n\n   .. py:attribute:: out_in_sb\n      :type: bool\n      :value: False\n\n      Output tensor should be stored in SBUF instead of HBM\n\nattention_tkg\n^^^^^^^^^^^^^^^\n\n.. py:function:: attention_tkg(q: nl.ndarray, k_active: nl.ndarray, v_active: nl.ndarray, k_prior: nl.ndarray, v_prior: nl.ndarray, mask: nl.ndarray, out: nl.ndarray, cfg: AttnTKGConfig, sbm: SbufManager, inv_freqs: Optional[nl.ndarray] = None, rope_pos_ids: Optional[nl.ndarray] = None, sink: Optional[nl.ndarray] = None, active_blocks_table: Optional[nl.ndarray] = None, k_out: Optional[nl.ndarray] = None, DBG_TENSORS: Optional[tuple] = None) -> Tuple[nl.ndarray, Optional[nl.ndarray]]\n\n   Attention specifically optimized for token-gen (where s_active is small). Can optionally fuse RoPE at the start.\n\n   :param q: Query tensor. Shape depends on ``cfg.qk_in_sb``: If ``True``: ``[d, B * H * s_active]``, else: ``[B, d, H, s_active]``\n   :type q: ``nl.ndarray``\n   :param k_active: Active key tensor. Shape depends on ``cfg.qk_in_sb``: If ``True``: ``[d, B * s_active]``, else: ``[B, d, s_active]``\n   :type k_active: ``nl.ndarray``\n   :param v_active: Active value tensor. Shape: ``[B, 1, s_active, d]``\n   :type v_active: ``nl.ndarray``\n   :param k_prior: Prior key tensor from KV cache. Shape: ``[B+, 1, s_prior, d]`` if ``cfg.tp_k_prior`` else ``[B+, 1, d, s_prior]``. For block KV cache, shape is ``[B+ * block_count, block_len, d]``\n   :type k_prior: ``nl.ndarray``\n   :param v_prior: Prior value tensor from KV cache. Shape: ``[B+, 1, s_prior, d]``. For block KV cache, shape is ``[B+ * block_count, block_len, d]``\n   :type v_prior: ``nl.ndarray``\n   :param mask: Attention mask. Shape: ``[s_active, B, H, s_active]`` if ``cfg.use_pos_id`` else ``[s_prior, B, H, s_active]``\n   :type mask: ``nl.ndarray``\n   :param out: Output tensor. Shape depends on ``cfg.out_in_sb``: If ``True``: ``[d, B * H * s_active]``, else: ``[B, H, d, s_active]``\n   :type out: ``nl.ndarray``\n   :param cfg: Kernel configuration with shapes and performance flags\n   :type cfg: ``AttnTKGConfig``\n   :param sbm: SBUF memory manager for allocating temporary buffers\n   :type sbm: ``SbufManager``\n   :param inv_freqs: Inverse frequencies for RoPE. Shape: ``[d // 2, 1]``. Required when ``cfg.fuse_rope`` is ``True``\n   :type inv_freqs: ``nl.ndarray``, optional\n   :param rope_pos_ids: Position IDs for RoPE. Shape: ``[B, s_active]``. Required when ``cfg.fuse_rope`` or ``cfg.use_pos_id`` is ``True``\n   :type rope_pos_ids: ``nl.ndarray``, optional\n   :param sink: Sink attention tokens. Shape: ``[H, 1]`` for streaming attention sink tokens\n   :type sink: ``nl.ndarray``, optional\n   :param active_blocks_table: Table of active blocks for block KV cache. Shape: ``[B, num_blocks]``. Required when using block KV cache\n   :type active_blocks_table: ``nl.ndarray``, optional\n   :param k_out: Output key tensor after RoPE. 
Shape depends on ``cfg.k_out_in_sb``: If ``True``: ``[d, B * s_active]``, else: ``[B, 1, d, s_active]``\n   :type k_out: ``nl.ndarray``, optional\n   :param DBG_TENSORS: Optional tuple of 4-5 debug tensors with shared HBM type for intermediate value inspection\n   :type DBG_TENSORS: ``tuple``, optional\n   :return: Tuple of ``(out, k_out)`` where ``out`` is the attention output tensor and ``k_out`` is the key output tensor (if ``cfg.fuse_rope`` is ``True``)\n   :rtype: ``tuple``\n\n   **Constraints**:\n\n   * Optimized for ``s_active <= 7`` and ``d_head <= 128``\n   * ``cfg.qk_in_sb=True`` is required when skipping fused RoPE\n   * Block KV cache requires ``cfg.qk_in_sb=True``\n   * In-kernel mask generation (``cfg.use_pos_id=True``) is not supported with batch sharding or block KV cache\n\nFeatures\n-----------\n\n1. **Flexible Tensor Placement**:\n   \n   * ``q``, ``k``, ``k_out``, and ``out`` tensors can be placed in either SBUF or HBM\n   * When ``qk_in_sb=True``, q and k tensors are pre-loaded in SBUF (required for block KV cache)\n   * ``out_in_sb`` and ``k_out_in_sb`` flags control output tensor placement for reduced memory transfers\n   * Use this feature for performance improvement when integrating this kernel into a larger kernel\n\n2. **Adaptive LNC2 Sharding**:\n   \n   * Automatically selects sharding strategy based on tensor dimensions\n   * Batch sharding: Used when batch is even AND (``s_prior < 256`` OR ``b*q_head*s_active > 128``)\n   * Sequence sharding: Used when ``s_prior >= 256`` and batch sharding criteria not met\n   * Balances computation across 2 NeuronCores for improved throughput\n\n3. **Mask Generation**:\n   \n   * ``use_pos_id=False``: Pre-generated mask loaded from HBM\n   * ``use_pos_id=True``: Mask generated in-kernel from position IDs\n   * In-kernel generation reduces memory bandwidth but requires position ID input\n\n4. **Fused RoPE (Rotary Position Embedding)**:\n   \n   * ``fuse_rope`` integrates RoPE computation directly into the attention kernel\n   * Applies rotary embeddings to Q and K tensors, scaling Q by ``1/sqrt(d_head)``\n   * Reduces memory traffic by avoiding separate RoPE passes\n\n5. **Block KV Cache**:\n   \n   * Supports block-sparse KV cache with configurable ``block_len``\n   * Uses ``active_blocks_table`` to track which cache blocks are active per batch\n   * Enables efficient long-context inference with sparse memory access patterns\n\n6. **K_prior Transpose Handling**:\n   \n   * ``tp_k_prior`` flag indicates whether K_prior is pre-transposed in memory\n   * Optimizes memory layout: ``[B, 1, d, s_prior]`` when ``tp_k_prior=True`` vs ``[B, 1, s_prior, d]`` when False\n   * Reduces transpose operations during computation and improves interoperability with other kernels\n\n7. **Strided Memory Access (strided_mm1)**:\n   \n   * Enables strided read patterns for K in first matmul\n   * When enabled, allows MM2 to use sequential V reads for better DMA throughput\n   * Trades off MM1 memory access for MM2 optimization\n\n8.  **Attention Sink**:\n   \n   * Supports streaming attention with sink tokens for infinite context\n   * Sink tokens maintain fixed attention scores across all positions\n   * Integrated into softmax reduction for minimal overhead\n\n9.  **GPSIMD SBUF-to-SBUF Transfers**:\n    \n   * ``use_gpsimd_sb2sb`` enables high-performance GPSIMD instructions for inter-core communication\n   * Optimizes LNC2 sharding by using extended instructions for SBUF-to-SBUF data transfers\n\n10. 
**Context Length Management**:\n    \n    * ``curr_sprior``: Current prior sequence length (actual KV cache content for this invocation)\n    * ``full_sprior``: Full prior sequence length (maximum KV cache capacity allocated)\n    * Allows progressive filling of KV cache during autoregressive generation\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Efficient Tiling Strategy**: Uses carefully chosen tile sizes for processing batches, sequences, and heads to maximize hardware utilization.\n\n2. **Cascaded Reduction**: Implements cascaded max and sum reduction operations for softmax computation to maintain numerical stability.\n\n3. **Memory Access Optimization**: Employs careful memory access patterns to optimize data movement between HBM and SBUF.\n\n4. **Block KV Cache Support**: Implements efficient block-sparse KV cache with dynamic block size adjustment to ensure optimal hardware utilization.\n\n5. **Attention Sink Integration**: Efficiently integrates attention sink tokens into the softmax computation for streaming attention.\n\n6. **Fused RoPE Implementation**: Implements efficient rotary position embeddings with optimized trigonometric computations.\n\n7. **Adaptive Sharding**: Dynamically selects between batch and sequence sharding based on tensor dimensions to optimize performance.\n\n8. **GPSIMD Optimization**: Uses GPSIMD instructions for high-performance SBUF-to-SBUF data transfers in LNC2 sharding.\n\n9. **Debug Support**: Provides comprehensive debug tensor support for intermediate value inspection.\n\n10. **Stack-based SBUF Allocation**: Uses SbufManager for efficient on-chip memory management with hierarchical scoping.\n\n\n\nSee Also\n-----------\n\n* :doc:`Output Projection TKG Kernel API Reference </nki/library/api/output-projection-tkg>`\n"
  },
  {
    "path": "nki/library/api/blockwise-mm-backward.rst",
    "content": ".. meta::\n    :description: Blockwise MM Backward kernel computes backward pass for blockwise Mixture of Experts layers.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.experimental.moe.bwd\n\nBlockwise MM Backward Kernel API Reference\n============================================\n\n**[Experimental]** Computes the backward pass for blockwise matrix multiplication in Mixture of Experts (MoE) layers, producing gradients for all parameters.\n\nThe kernel supports:\n\n* Gradient computation for hidden states, expert affinities, gate/up weights, and down weights\n* Optional bias gradient computation\n* Multiple sharding strategies (hidden dimension, intermediate dimension)\n* Affinity scaling on hidden or intermediate dimension\n* Gradient clamping for numerical stability\n* Various activation functions (SiLU, GELU, Swish)\n* Dropless MoE with variable block assignments per expert\n\nBackground\n-----------\n\nThe ``blockwise_mm_bwd`` kernel is the backward pass companion to the MoE CTE forward kernel. It computes gradients for all learnable parameters in a blockwise MoE layer by reversing the forward computation:\n\n1. **Down projection backward**: Compute gradients for down projection weights and intermediate activations\n2. **Activation backward**: Compute gradients through the activation function using checkpointed activations\n3. **Gate/Up projection backward**: Compute gradients for gate and up projection weights\n4. **Hidden states backward**: Compute gradients for input hidden states\n5. **Affinity backward**: Compute gradients for expert affinities\n\nThe kernel uses activation checkpoints saved during the forward pass (``gate_up_proj_act_checkpoint_T`` and ``down_proj_act_checkpoint``) to avoid recomputation.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `blockwise_mm_backward.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/moe/bwd/blockwise_mm_backward.py>`_\n\nblockwise_mm_bwd\n^^^^^^^^^^^^^^^^^\n\n.. py:function:: blockwise_mm_bwd(hidden_states: nl.ndarray, expert_affinities_masked: nl.ndarray, gate_up_proj_weight: nl.ndarray, down_proj_weight: nl.ndarray, gate_up_proj_act_checkpoint_T: nl.ndarray, down_proj_act_checkpoint: nl.ndarray, token_position_to_id: nl.ndarray, block_to_expert: nl.ndarray, output_hidden_states_grad: nl.ndarray, block_size: int, skip_dma: SkipMode = None, compute_dtype: nki.dtype = nl.bfloat16, is_tensor_update_accumulating: bool = True, shard_option: ShardOption = ShardOption.SHARD_ON_HIDDEN, affinity_option: AffinityOption = AffinityOption.AFFINITY_ON_H, kernel_type_option: KernelTypeOption = KernelTypeOption.DROPLESS, clamp_limits: ClampLimits = None, bias: bool = False, activation_type: ActFnType = ActFnType.SiLU, block_tile_size: int = None) -> tuple\n\n   Compute backward pass for blockwise MoE layer.\n\n   Computes gradients for all parameters in a Mixture of Experts layer using blockwise\n   matrix multiplication. 
Optimized for dropless MoE with variable block assignments per expert.\n\n   :param hidden_states: Input hidden states tensor with shape ``[T, H]`` in HBM.\n   :type hidden_states: ``nl.ndarray``\n   :param expert_affinities_masked: Expert affinities with shape ``[T * E, 1]`` in HBM.\n   :type expert_affinities_masked: ``nl.ndarray``\n   :param gate_up_proj_weight: Gate and up projection weights with shape ``[E, H, 2, I_TP]`` in HBM.\n   :type gate_up_proj_weight: ``nl.ndarray``\n   :param down_proj_weight: Down projection weights with shape ``[E, I_TP, H]`` in HBM.\n   :type down_proj_weight: ``nl.ndarray``\n   :param gate_up_proj_act_checkpoint_T: Checkpointed gate/up activations from forward pass with shape ``[N, 2, I_TP, B]``.\n   :type gate_up_proj_act_checkpoint_T: ``nl.ndarray``\n   :param down_proj_act_checkpoint: Checkpointed down projection activations from forward pass with shape ``[N, B, H]``.\n   :type down_proj_act_checkpoint: ``nl.ndarray``\n   :param token_position_to_id: Token position to block mapping with shape ``[N * B]``.\n   :type token_position_to_id: ``nl.ndarray``\n   :param block_to_expert: Expert index per block with shape ``[N, 1]``.\n   :type block_to_expert: ``nl.ndarray``\n   :param output_hidden_states_grad: Upstream gradient from output with shape ``[T, H]``.\n   :type output_hidden_states_grad: ``nl.ndarray``\n   :param block_size: Number of tokens per block. Must be one of: 128, 256, 512, 1024.\n   :type block_size: ``int``\n   :param skip_dma: DMA skip mode for OOB handling. Default: ``SkipMode(False, False)``.\n   :type skip_dma: ``SkipMode``, optional\n   :param compute_dtype: Computation data type. Default: ``nl.bfloat16``.\n   :type compute_dtype: ``nki.dtype``\n   :param is_tensor_update_accumulating: Whether to accumulate into existing gradients. Default: ``True``.\n   :type is_tensor_update_accumulating: ``bool``\n   :param shard_option: Sharding strategy. ``SHARD_ON_HIDDEN``: shard across hidden dimension. ``SHARD_ON_INTERMEDIATE``: shard across intermediate dimension. ``AUTO``: auto-select. Default: ``SHARD_ON_HIDDEN``.\n   :type shard_option: ``ShardOption``\n   :param affinity_option: Dimension for affinity scaling. ``AFFINITY_ON_H``: scale on hidden dimension. ``AFFINITY_ON_I``: scale on intermediate dimension. Default: ``AFFINITY_ON_H``.\n   :type affinity_option: ``AffinityOption``\n   :param kernel_type_option: Token dropping strategy. ``DROPLESS``: variable blocks per expert. ``DROPPING``: fixed blocks per expert. Default: ``DROPLESS``.\n   :type kernel_type_option: ``KernelTypeOption``\n   :param clamp_limits: Gradient clamping limits for numerical stability. Contains ``linear_clamp_upper_limit``, ``linear_clamp_lower_limit``, ``non_linear_clamp_upper_limit``, ``non_linear_clamp_lower_limit``.\n   :type clamp_limits: ``ClampLimits``, optional\n   :param bias: Whether to compute bias gradients. Default: ``False``.\n   :type bias: ``bool``\n   :param activation_type: Activation function type. Default: ``SiLU``.\n   :type activation_type: ``ActFnType``\n   :param block_tile_size: Optional tile size override for block processing.\n   :type block_tile_size: ``int``, optional\n   :return: Tuple of gradient tensors. When ``bias=False``: ``(hidden_states_grad, expert_affinities_masked_grad, gate_up_proj_weight_grad, down_proj_weight_grad)``. 
When ``bias=True``: additionally includes ``gate_and_up_proj_bias_grad`` and ``down_proj_bias_grad``.\n   :rtype: ``tuple``\n\n   **Dimensions**:\n\n   * T: Total number of input tokens\n   * H: Hidden dimension size\n   * I_TP: Intermediate size / tensor parallel degree\n   * E: Number of experts\n   * B: Block size (tokens per block)\n   * N: Number of blocks\n\n   **Supported Data Types**:\n\n   * Input: bfloat16, float16\n\n   **Constraints**:\n\n   * ``block_size`` must be one of: 128, 256, 512, 1024\n   * H must be divisible by the number of shards for LNC sharding\n   * Currently only supports ``DROPLESS`` kernel type\n   * Requires activation checkpoints from the forward pass (``gate_up_proj_act_checkpoint_T`` and ``down_proj_act_checkpoint``)\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Sharding Strategies**: Supports sharding across hidden dimension (simpler, no H-tiling) or intermediate dimension (better memory efficiency) for LNC2 parallelism.\n\n2. **Activation Checkpointing**: Uses saved activations from the forward pass to avoid recomputation during backward, trading memory for compute.\n\n3. **Blockwise Processing**: Processes tokens in blocks matching the forward pass structure, enabling efficient gradient accumulation across experts.\n\n4. **Gradient Clamping**: Optional clamping of gradients for numerical stability during training.\n\n5. **Affinity Gradient Computation**: Computes gradients for expert routing weights, enabling end-to-end training of the router.\n\nSee Also\n-----------\n\n* :doc:`MoE CTE Kernel API Reference </nki/library/api/moe-cte>`\n* :doc:`MoE TKG Kernel API Reference </nki/library/api/moe-tkg>`\n"
  },
  {
    "path": "nki/library/api/conv1d.rst",
    "content": ".. meta::\n    :description: 1D Convolution operation using tensor engine with replication strategy.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.conv\n\nConv1D Kernel API Reference\n===========================\n\nImplements 1D convolution using tensor engine with a replication strategy for efficient computation.\n\nThe kernel supports:\n\n* Arbitrary stride, padding, and dilation values\n* Optional bias addition\n* Activation function fusion\n* LNC sharding on the output channel dimension\n\nIntended usage range:\n\n* Kernel size (K): 1 to 128\n* Sequence length (L): 1 to 4096\n* Input channels (C_in): 1 to 4096\n* Output channels (C_out): 1 to 4096\n* Batch size (B): Any positive integer\n\nBackground\n-----------\n\nThe ``conv1d`` kernel applies 1D convolution filters across the input sequence dimension. It uses a replication strategy to efficiently utilize the tensor engine by stacking multiple filter positions along the partition dimension.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `conv1d.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/conv/conv1d.py>`_\n\nconv1d\n^^^^^^\n\n.. py:function:: conv1d(x_in: nl.ndarray, filters: nl.ndarray, bias: Optional[nl.ndarray] = None, stride: int = 1, padding: tuple[int, int] = (0, 0), dilation: int = 1, activation_fn: Optional[ActFnType] = None, lnc_shard: bool = False) -> nl.ndarray\n\n   1D Convolution operation using tensor engine with replication strategy.\n\n   :param x_in: [B, C_in, L], Input tensor on HBM.\n   :type x_in: ``nl.ndarray``\n   :param filters: [K, C_in, C_out], Convolution filter weights on HBM.\n   :type filters: ``nl.ndarray``\n   :param bias: [C_out], Optional bias tensor on HBM. Default None.\n   :type bias: ``Optional[nl.ndarray]``\n   :param stride: Stride for convolution. Must be >= 1. Default 1.\n   :type stride: ``int``\n   :param padding: Tuple of (left_pad, right_pad). Must be non-negative. Default (0, 0).\n   :type padding: ``tuple[int, int]``\n   :param dilation: Dilation factor for dilated convolution. Must be >= 1. Default 1.\n   :type dilation: ``int``\n   :param activation_fn: Optional activation function to fuse. Default None.\n   :type activation_fn: ``Optional[ActFnType]``\n   :param lnc_shard: If True, shard computation across LNC cores on C_out dimension. Default False.\n   :type lnc_shard: ``bool``\n   :return: [B, C_out, L_out], Output tensor on HBM where L_out = (L + pad_left + pad_right - dilation * (K - 1) - 1) // stride + 1\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * All input tensors (x_in, filters, bias) must have the same dtype\n   * Input channels C_in must match filter channels\n   * Uses replication strategy to stack K filter positions along partition dimension\n   * Partition alignment rules limit K replication factor based on C_in tile size\n   * Memory management uses SbufManager with multi-buffering for efficiency\n\n   **Dimensions**:\n\n   * B: Batch size\n   * C_in: Number of input channels\n   * C_out: Number of output channels\n   * L: Input sequence length\n   * L_out: Output sequence length = (L + pad_left + pad_right - dilation * (K - 1) - 1) // stride + 1\n\n"
  },
  {
    "path": "nki/library/api/cross-entropy.rst",
    "content": ".. meta::\n    :description: Cross entropy kernel implements memory-efficient cross entropy loss for large vocabularies using online log-sum-exp algorithm.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.experimental.loss\n\nCross Entropy Kernel API Reference\n===================================\n\nImplements memory-efficient cross entropy loss computation for large vocabularies using the online log-sum-exp algorithm with batched processing.\n\nThe kernel supports:\n\n* Memory-efficient computation for large vocabularies\n* Online log-sum-exp algorithm to avoid numerical overflow\n* Forward and backward pass kernels\n* Batched processing for improved throughput\n* Optimized for LNC2 (2 cores) architecture\n* Configurable chunk sizes and batch sizes\n* Support for bfloat16 and float32 data types\n\nBackground\n-----------\n\nThe ``cross_entropy_forward`` kernel is designed for efficient computation of cross entropy loss in large vocabulary scenarios, such as language modeling. Traditional cross entropy implementations require loading the entire vocabulary for each position, which can be memory-intensive. This kernel uses an online log-sum-exp algorithm that processes the vocabulary in chunks, maintaining numerical stability while reducing memory requirements.\n\nA companion ``cross_entropy_backward`` kernel computes gradients with respect to logits using the saved log-sum-exp state from the forward pass.\n\n.. note::\n    This kernel is optimized for Trainium2 (TRN2) and uses batched processing where each core processes multiple positions simultaneously with vectorized operations.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `cross_entropy.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/loss/cross_entropy.py>`_\n\ncross_entropy_forward\n^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: cross_entropy_forward(logits_hbm: nl.ndarray, targets_hbm: nl.ndarray, positions_per_batch: int = 32, chunk_size: int = 32768, dtype: nki.dtype = nl.bfloat16) -> tuple[nl.ndarray, nl.ndarray]\n\n   Cross entropy forward pass using online log-sum-exp algorithm with batching.\n\n   This kernel computes cross entropy loss for large vocabularies using a memory-efficient\n   online log-sum-exp algorithm. Optimized for LNC2 (2 cores) with batched processing where\n   each core processes multiple positions in batches with vectorized operations.\n\n   :param logits_hbm: Input logits tensor in HBM with shape [num_positions, V]. Supported dtypes: nl.bfloat16, nl.float32. MUST be 2D (already flattened).\n   :type logits_hbm: ``nl.ndarray``\n   :param targets_hbm: Target indices tensor in HBM with shape [num_positions]. dtype: nl.int32. MUST be 1D (already flattened).\n   :type targets_hbm: ``nl.ndarray``\n   :param positions_per_batch: Number of positions to process together. Default: 32. Larger batches improve HBM bandwidth and SBUF utilization. Candidate values (powers of 2): 8, 16, 32, 64, 128. Must satisfy: positions_per_batch × chunk_size × dtype_bytes ≤ 24 MiB.\n   :type positions_per_batch: ``int``\n   :param chunk_size: Size of vocabulary chunks. Default: 32768 (32K). Must not exceed vocabulary size V or hardware limit (65535). 
Candidate values: 65535 (F_MAX, ideal for 128K-256K vocabs, bf16 only), 49152 (3/4 of F_MAX), 40960 (Good balance), 32768 (Standard, good for 32K-128K vocabs), 16384 (Half of 32K), 8192 (Quarter of 32K), 4096 (Small vocab fallback), 2048 (Minimum practical).\n   :type chunk_size: ``int``\n   :param dtype: Data type for internal computations. Default: nl.bfloat16. Supported types: nl.bfloat16 (2 bytes), nl.float32 (4 bytes). Controls precision of intermediate calculations and memory usage.\n   :type dtype: ``nki.dtype``\n   :return: A tuple containing: loss_hbm (Cross entropy loss per position in HBM with shape [num_positions], dtype matches dtype parameter), lse_state_hbm (Log-sum-exp values per position in HBM with shape [num_positions], dtype matches dtype parameter, saved for backward pass).\n   :rtype: ``tuple[nl.ndarray, nl.ndarray]``\n\n   **Notes**:\n\n   * Batched version for LNC2 (2 cores): Each core processes multiple positions in batches\n   * Positions assigned in strided pattern (core_id, core_id + 2, core_id + 4, ...)\n   * Vectorized operations across batch dimension for efficiency\n   * chunk_size must not exceed vocabulary size V\n   * positions_per_batch must be in range (0, 128]\n   * Per-allocation size constraint: positions_per_batch × chunk_size × dtype_bytes ≤ 24 MiB\n   * Performance tuning: Increase positions_per_batch for better throughput (up to memory limit)\n   * Performance tuning: Use larger chunk_size to reduce loop iterations (up to V and memory limit)\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Online Log-Sum-Exp Algorithm**: Processes vocabulary in chunks while maintaining running maximum and sum of exponentials to avoid numerical overflow.\n\n2. **Batched Processing**: Each core processes multiple positions simultaneously using vectorized operations for improved throughput.\n\n3. **Memory Efficiency**: Uses configurable chunk sizes to balance memory usage and computational efficiency.\n\n4. **Load Balancing**: Distributes positions across cores in a strided pattern for optimal load distribution.\n\n5. **Numerical Stability**: Maintains numerical stability through careful handling of maximum values and exponential computations.\n\n**Chunk Size Selection Guide**:\n\n* V ≤ 32K: Use chunk_size = V (single chunk)\n* 32K < V ≤ 128K: Use chunk_size = 32768 or 40960\n* 128K < V ≤ 256K: Use chunk_size = 65535 (bf16) or 32768 (fp32)\n* Always verify: positions_per_batch × chunk_size × dtype_bytes ≤ 24 MiB\n\ncross_entropy_backward\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: cross_entropy_backward(logits_hbm: nl.ndarray, targets_hbm: nl.ndarray, lse_state_hbm: nl.ndarray, reduction: str = \"mean\", positions_per_batch: int = 32, chunk_size: int = 32768, dtype: nki.dtype = nl.bfloat16, inplace: bool = True) -> nl.ndarray\n\n   Cross entropy backward pass computing gradients with respect to logits.\n\n   Computes the gradient of cross entropy loss with respect to input logits using the formula:\n   ``grad_logits[i, j] = grad_scale * (softmax(logits[i, j]) - 1{j == target[i]})``\n   where softmax is computed using the saved LSE state from the forward pass, and ``grad_scale``\n   is determined by the reduction parameter.\n\n   Optimized for LNC2 (2 cores) with batched processing where each core processes multiple\n   positions in batches with vectorized operations.\n\n   :param logits_hbm: Input logits tensor in HBM with shape ``[num_positions, V]``. 
Supported dtypes: ``nl.bfloat16``, ``nl.float32``. MUST be 2D (already flattened). Same tensor used in forward pass.\n   :type logits_hbm: ``nl.ndarray``\n   :param targets_hbm: Target indices tensor in HBM with shape ``[num_positions]``. dtype: ``nl.int32``. MUST be 1D (already flattened). Same tensor used in forward pass.\n   :type targets_hbm: ``nl.ndarray``\n   :param lse_state_hbm: Log-sum-exp values from forward pass in HBM with shape ``[num_positions]``. dtype matches ``dtype`` parameter. Saved state from ``cross_entropy_forward``.\n   :type lse_state_hbm: ``nl.ndarray``\n   :param reduction: How to scale gradients. ``'mean'``: scale by ``1/num_positions`` (matches PyTorch default). ``'sum'``: scale by ``1.0``. Default: ``'mean'``.\n   :type reduction: ``str``\n   :param positions_per_batch: Number of positions to process together. Default: 32. Must satisfy: ``positions_per_batch × chunk_size × dtype_bytes ≤ 24 MiB``.\n   :type positions_per_batch: ``int``\n   :param chunk_size: Size of vocabulary chunks. Default: 32768.\n   :type chunk_size: ``int``\n   :param dtype: Data type for internal computations. Default: ``nl.bfloat16``. Supported types: ``nl.bfloat16``, ``nl.float32``.\n   :type dtype: ``nki.dtype``\n   :param inplace: If ``True``, write gradients directly over ``logits_hbm`` to save HBM memory. Default: ``True``. When ``True``, ``logits_hbm`` is overwritten and cannot be used after.\n   :type inplace: ``bool``\n   :return: Gradient with respect to logits in HBM with shape ``[num_positions, V]``. If ``inplace=True``, this is the same tensor as ``logits_hbm``.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * Uses the saved LSE state from ``cross_entropy_forward`` to compute softmax without recomputing the full forward pass\n   * ``inplace=True`` saves ``num_positions × vocab_size × dtype_bytes`` of HBM memory\n   * Same chunking and batching strategy as the forward pass for consistent performance
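\n\nUsage Example\n--------------\n\nA minimal sketch (not part of the kernel source) showing the forward and backward kernels used together, assuming the import path mirrors the source layout linked above. ``logits`` and ``targets`` are placeholders for tensors already resident in HBM; allocation and kernel launch details are omitted.\n\n.. code-block:: python\n\n   from nkilib.experimental.loss.cross_entropy import cross_entropy_forward, cross_entropy_backward\n\n   # logits: [num_positions, V] bf16 tensor in HBM (already flattened), for example V=131072\n   # targets: [num_positions] int32 tensor in HBM\n   # 32 positions/batch x 32768 chunk x 2 bytes = 2 MiB, within the 24 MiB per-allocation limit\n   loss, lse_state = cross_entropy_forward(logits, targets, positions_per_batch=32, chunk_size=32768)\n\n   # The backward pass reuses the saved log-sum-exp state; with the default inplace=True\n   # the gradient overwrites logits in HBM\n   grad_logits = cross_entropy_backward(logits, targets, lse_state, reduction='mean')\n"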
  },
  {
    "path": "nki/library/api/cumsum.rst",
    "content": ".. meta::\n    :description: Cumsum kernel computes cumulative sum along the last dimension.\n    :date-modified: 01/21/2026\n\n.. currentmodule:: nkilib.core.cumsum\n\nCumsum Kernel API Reference\n============================\n\nComputes cumulative sum along the last dimension of the input tensor. Optimized for batch sizes up to 2048 and hidden dimension sizes up to 8192. Supports 3D inputs with sequence length up to 10.\n\nThe kernel supports:\n\n* Cumulative sum computation along the last dimension only\n* 2D and 3D input tensors\n* Float32 accumulation for numerical stability\n* Efficient tiled processing for large tensors\n* Sequential processing to maintain cumulative dependencies\n\nBackground\n--------------\n\nThe ``cumsum`` kernel implements cumulative sum computation, where each element in the output is the sum of all preceding elements (including itself) along the specified dimension. This operation is commonly used in various machine learning applications including attention mechanisms and sequence processing.\n\nThe kernel applies the following transformation along the last dimension:\n\n* ``out[..., i] = sum(x[..., 0:i+1])``\n\nThe implementation uses ``tensor_tensor_scan`` operations with float32 accumulation for numerical stability, processing data in tiles to handle large tensors efficiently.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `cumsum.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/cumsum/cumsum.py>`_\n\ncumsum\n^^^^^^^^^^^^^^^\n\n.. py:function:: cumsum(x, axis=-1)\n\n   Compute cumulative sum along the last dimension.\n\n   :param x: Input tensor of shape ``[B, H]`` for 2D or ``[B, S, H]`` for 3D in HBM\n   :type x: ``nl.ndarray``\n   :param axis: Axis along which to compute cumsum. Must be -1 or the last dimension index. Default is -1.\n   :type axis: ``int``, optional\n   :return: Output tensor with same shape and dtype as input, containing cumulative sums along the last dimension\n   :rtype: ``nl.ndarray``\n\n   **Constraints**:\n\n   * Only supports cumsum along the last dimension (axis=-1)\n   * Batch size (``B``) must be up to 2048\n   * Hidden dimension size (``H``) must be up to 8192\n   * Sequence length (``S``) for 3D inputs must be up to 10\n   * Input tensor must be 2D or 3D\n   * For very long hidden dimensions (>5K), expect ~1e-2 absolute error due to fp32 accumulation\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Tiled Processing**: Processes data in tiles to handle large tensors efficiently:\n   \n   * **Partition Tiles**: Up to 128 elements per partition tile\n   * **Free Dimension Tiles**: Up to 2048 elements per free dimension tile\n   * Sequential processing across free dimension tiles to maintain cumulative dependencies\n\n2. **Numerical Stability**: Uses float32 accumulation internally regardless of input dtype to maintain numerical precision for long sequences.\n\n3. **Tensor Scan Operations**: Leverages ``tensor_tensor_scan`` with multiply and add operations to compute cumulative sums efficiently:\n   \n   * ``result[i] = ones[i] * result[i-1] + data[i] = result[i-1] + data[i]``\n\n4. **Carry Forward**: Maintains cumulative state across tiles by carrying forward the last column of each processed tile as the initial value for the next tile.\n\n5. 
**Memory Management**: Efficiently manages SBUF allocations for intermediate buffers and uses DMA operations for HBM transfers.
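\n\nUsage Example\n--------------\n\nThe following minimal sketch (not part of the kernel source) shows how the kernel can be invoked. It assumes the import path mirrors the source layout linked above and that ``x`` is a placeholder for a 2D input tensor already resident in HBM; allocation and kernel launch details are omitted.\n\n.. code-block:: python\n\n   from nkilib.core.cumsum.cumsum import cumsum\n\n   # x: [B, H] input in HBM, for example B=1024, H=4096, dtype bfloat16\n   out = cumsum(x, axis=-1)\n\n   # out has the same shape and dtype as x, with\n   # out[..., i] == sum(x[..., 0:i+1]) along the last dimension\n"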
  },
  {
    "path": "nki/library/api/depthwise-conv1d.rst",
    "content": ".. meta::\n    :description: Depthwise Conv1D kernel using implicit GEMM approach for TRN2.\n    :date-modified: 02/06/2025\n\n.. currentmodule:: nkilib.experimental.conv\n\nDepthwise Conv1D Kernel API Reference\n======================================\n\nImplements depthwise 1D convolution using implicit GEMM without full im2col materialization.\n\nThe kernel supports:\n\n* Depthwise 1D convolution with stride=1 and zero padding\n* Implicit GEMM approach for memory efficiency\n* LNC2 sharding on channel dimension\n* Optimized for TRN2 platform\n\nBackground\n-----------\n\nThe ``depthwise_conv1d_implicit_gemm`` kernel performs depthwise 1D convolution by loading input with shape [S_TILE, Q] where row k contains elements starting at index k (i.e., input[k:k+Q]), enabling implicit im2col via offset-based loading. This approach avoids materializing the full im2col matrix, saving W*S*C memory. The kernel tiles on S dimension for S > 128 and is optimized for TRN2 platform with LNC2 sharding on channel dimension.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `depthwise_conv1d.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/conv/depthwise_conv1d.py>`_\n\ndepthwise_conv1d_implicit_gemm\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: depthwise_conv1d_implicit_gemm(img_ref: nl.ndarray, filter_ref: nl.ndarray, padding: tuple = ((0, 0), (0, 0)), stride: tuple = (1, 1), rhs_dilation: tuple = (1, 1), lhs_dilation: tuple = (1, 1), feature_group_count: int = 1, batch_group_count: int = 1, in_perm: tuple = None, kern_perm: tuple = None, out_perm: tuple = None) -> nl.ndarray\n\n   Depthwise Conv1D using implicit GEMM without full im2col materialization.\n\n   Performs depthwise 1D convolution by loading input with shape [S_TILE, Q] where\n   row k contains elements starting at index k (i.e., input[k:k+Q]), enabling implicit\n   im2col via offset-based loading. Tiles on S dimension for S > 128. Optimized for\n   TRN2 platform with LNC2 sharding on channel dimension.\n\n   :param img_ref: Input tensor on HBM with shape [N, C, 1, W].\n   :type img_ref: ``nl.ndarray``\n   :param filter_ref: Depthwise kernel weights on HBM with shape [C, 1, 1, S].\n   :type filter_ref: ``nl.ndarray``\n   :param padding: Padding as ((H_pad_l, H_pad_r), (W_pad_l, W_pad_r)). Default: ((0,0),(0,0)), only zeros supported.\n   :type padding: ``tuple``\n   :param stride: Stride values. Default: (1, 1), only (1, 1) supported.\n   :type stride: ``tuple``\n   :param rhs_dilation: RHS dilation. Default: (1, 1).\n   :type rhs_dilation: ``tuple``\n   :param lhs_dilation: LHS dilation. Default: (1, 1).\n   :type lhs_dilation: ``tuple``\n   :param feature_group_count: Number of feature groups. Default: 1.\n   :type feature_group_count: ``int``\n   :param batch_group_count: Number of batch groups. Default: 1.\n   :type batch_group_count: ``int``\n   :param in_perm: Input permutation. Default: None.\n   :type in_perm: ``tuple``, optional\n   :param kern_perm: Kernel permutation. Default: None.\n   :type kern_perm: ``tuple``, optional\n   :param out_perm: Output permutation. 
Default: None.\n   :type out_perm: ``tuple``, optional\n   :return: Convolution output on HBM with shape [N, C, 1, Q] where Q = W - S + 1.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * Only supports stride=1 and zero padding\n   * Requires C to be divisible by NUM_SHARDS (2)\n   * Uses LNC2 sharding on channel dimension\n   * For depthwise convolution, feature_group_count must equal C\n\n   **Dimensions**:\n\n   * N: Batch size\n   * C: Number of channels\n   * W: Input width (spatial dimension)\n   * S: Kernel size\n   * Q: Output width (W - S + 1)\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Implicit GEMM Approach**: Avoids materializing full im2col matrix by using offset-based loading patterns, saving W*S*C memory.\n\n2. **Tiling Strategy**:\n   * Input: [N, C, W] tiled as [N, C_TILES, C_TILE] x [S_TILES, S_TILE, Q]\n   * Filter: [C, S] tiled as [C_TILES, C_TILE] x [S_TILES, S_TILE]\n   * Output: [N, C, Q] accumulated in [Q_TILES, Q_TILE] chunks\n\n3. **Tile Size Selection**:\n   * S_TILE = min(S, 128): Matches partition dimension (P_MAX=128)\n   * Q_TILE = min(Q, 512): Matches free dimension (F_MAX=512)\n   * C_TILE = min(C_per_shard, 128): Balances parallelism and memory\n\n4. **Filter Preloading**: Amortizes transpose cost across channels by preloading filter tiles in outer loop.\n\n5. **Sequential S-tile Accumulation**: Enables pipelining and reduces PSUM pressure.\n\n6. **LNC2 Sharding**: Distributes computation across channel dimension for parallel processing.
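\n\nUsage Example\n--------------\n\nA minimal sketch (not part of the kernel source) of a depthwise convolution call, assuming the import path mirrors the source layout linked above. ``img_ref``, ``filter_ref``, and ``C`` are placeholders; tensor allocation and kernel launch details are omitted.\n\n.. code-block:: python\n\n   from nkilib.experimental.conv.depthwise_conv1d import depthwise_conv1d_implicit_gemm\n\n   # img_ref: [N, C, 1, W] input on HBM; filter_ref: [C, 1, 1, S] depthwise weights on HBM\n   # For depthwise convolution, feature_group_count must equal the channel count C\n   out = depthwise_conv1d_implicit_gemm(\n       img_ref=img_ref,\n       filter_ref=filter_ref,\n       feature_group_count=C,\n   )\n\n   # out: [N, C, 1, Q] on HBM with Q = W - S + 1 (stride 1, zero padding)\n"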
  },
  {
    "path": "nki/library/api/dynamic-elementwise-add.rst",
    "content": ".. meta::\n    :description: Elementwise addition with dynamic partition dimension tiling.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.dynamic_shapes\n\nDynamic Elementwise Add Kernel API Reference\n============================================\n\nElementwise addition with dynamic partition dimension tiling.\n\nComputes output = input_a + input_b for 2D bf16 tensors where the number of M-dimension tiles to process is determined at runtime via num_m_tiles. Optimized for M dimensions up to 2048 and H dimensions up to 8192.\n\nBackground\n-----------\n\nThe ``dynamic_elementwise_add`` kernel computes elementwise addition where the number of M-dimension tiles to process is determined at runtime. This demonstrates NKI's support for dynamic loop bounds using ``sequential_range`` with runtime-variable trip counts.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `dynamic_elementwise_add.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/dynamic_shapes/dynamic_elementwise_add.py>`_\n\ndynamic_elementwise_add\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: dynamic_elementwise_add(input_a: nl.ndarray, input_b: nl.ndarray, num_m_tiles: nl.ndarray) -> nl.ndarray\n\n   Elementwise addition with dynamic partition dimension tiling.\n\n   :param input_a: [M, H], First input tensor, bf16, on HBM.\n   :type input_a: ``nl.ndarray``\n   :param input_b: [M, H], Second input tensor, bf16, on HBM. Must match input_a shape.\n   :type input_b: ``nl.ndarray``\n   :param num_m_tiles: [1, 1], int32 scalar tensor on HBM. Value = number of M-tiles to process (0 <= num_m_tiles <= M // P_MAX).\n   :type num_m_tiles: ``nl.ndarray``\n   :return: [M, H], bf16 output tensor on HBM. Elements in the first (num_m_tiles * P_MAX) rows contain input_a + input_b; remaining rows are unmodified.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * M must be divisible by P_MAX (128)\n   * H must be divisible by H_TILE_SIZE (512)\n   * input_a and input_b must have identical shapes\n\n   **Dimensions**:\n\n   * M: Row dimension, tiled at P_MAX (128). Dynamic at runtime via num_m_tiles.\n   * H: Hidden/column dimension, tiled at H_TILE_SIZE (512). Static.\n\n"
  },
  {
    "path": "nki/library/api/fg-allgather.rst",
    "content": ".. meta::\n    :description: Fine-grained ring-based all-gather kernel for TRN2.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.collectives\n\nFine-Grained All-Gather Kernel API Reference\n=============================================\n\nPerforms fine-grained ring-based all-gather across ranks for TRN2.\n\nThe kernel supports:\n\n* Ring-based collective permute with double buffering\n* Both SBUF and HBM communication paths with automatic selection based on tensor sizes\n* Overlapped communication and data movement\n\nBackground\n-----------\n\nThe ``fine_grained_allgather`` kernel performs all-gather on the input tensor across ranks along the row dimension. It uses ring-based collective permute with double buffering to overlap communication and data movement.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `fg_allgather.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/collectives/fg_allgather.py>`_\n\nfine_grained_allgather\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: fine_grained_allgather(lhs: nl.ndarray, tp_degree: int, num_groups: int, force_hbm_cc: bool = False) -> nl.ndarray\n\n   Fine-grained ring-based all-gather kernel for TRN2.\n\n   :param lhs: [m, K], Input tensor, row-sharded across ranks.\n   :type lhs: ``nl.ndarray``\n   :param tp_degree: Tensor parallelism degree (number of ranks). Must be even. Supported values: 4, 8, 16, 32, 64, 128.\n   :type tp_degree: ``int``\n   :param num_groups: Number of replica groups for collective communication.\n   :type num_groups: ``int``\n   :param force_hbm_cc: If True, force HBM collective communication path even when SBUF path is feasible.\n   :type force_hbm_cc: ``bool``\n   :return: [RANK_N, ...], Fully gathered tensor in shared HBM. Shape depends on communication path (SBUF vs HBM).\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * tp_degree must be even.\n   * M must be divisible by (RANK_N * LNC_N * CHANNEL_N).\n   * Platform target is TRN2 only.\n\n   **Dimensions**:\n\n   * m: Local rows per rank (before all-gather).\n   * M: Total rows after all-gather (m * tp_degree).\n\n"
  },
  {
    "path": "nki/library/api/fgcc.rst",
    "content": ".. meta::\n    :description: Fine grained all-gather and matrix multiplication (FGCC) kernel for TRN2.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.collectives\n\nFGCC (All-Gather + Matmul) Kernel API Reference\n=================================================\n\nPerforms fused all-gather and matrix multiplication (FGCC) for TRN2.\n\nThe kernel supports:\n\n* All-gather on left-hand side tensor across ranks\n* Matrix multiplication with column-sharded right-hand side tensor\n* Ring-based collective permute overlapped with compute\n* Both SBUF and HBM communication paths with automatic selection\n\nBackground\n-----------\n\nThe ``allgather_compute_matmul`` kernel performs all-gather on the left-hand side tensor across ranks, then computes matrix multiplication with a column-sharded right-hand side tensor. Communication is overlapped with compute using ring-based collective permute.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `fgcc.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/collectives/fgcc.py>`_\n\nallgather_compute_matmul\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: allgather_compute_matmul(lhs: nl.ndarray, rhs: nl.ndarray, tp_degree: int, num_groups: int, force_hbm_cc: bool = False) -> nl.ndarray\n\n   Fine grained all-gather and matrix multiplication (FGCC) kernel for TRN2.\n\n   :param lhs: [m, K], Left-hand side tensor, row-sharded across ranks.\n   :type lhs: ``nl.ndarray``\n   :param rhs: [K, N], Right-hand side tensor, column-sharded per rank.\n   :type rhs: ``nl.ndarray``\n   :param tp_degree: Tensor parallelism degree (number of ranks). Must be even.\n   :type tp_degree: ``int``\n   :param num_groups: Number of replica groups for collective communication.\n   :type num_groups: ``int``\n   :param force_hbm_cc: If True, force HBM collective communication path even when SBUF path is feasible.\n   :type force_hbm_cc: ``bool``\n   :return: [RANK_N, ...], Column-sharded result tensor in shared HBM. Shape depends on communication path (SBUF vs HBM).\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * tp_degree must be even.\n   * lhs and rhs must have matching K dimension.\n   * M must be divisible by (RANK_N * LNC_N * CHANNEL_N).\n   * Platform target is TRN2 only.\n\n   **Dimensions**:\n\n   * m: Local rows per rank (before all-gather).\n   * M: Total rows after all-gather (m * tp_degree).\n   * K: Shared (contraction) dimension.\n\n"
  },
  {
    "path": "nki/library/api/find-nonzero-indices.rst",
    "content": ".. meta::\n    :description: Find indices of nonzero elements along the T dimension.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.core.subkernels\n\nFind Nonzero Indices Subkernel API Reference\n=============================================\n\nFinds indices of nonzero elements along the T dimension.\n\nThe kernel supports:\n\n* Finding nonzero indices in an input tensor of shape [T, C]\n* LNC2 sharding across columns\n* GpSimd ``nonzero_with_count`` ISA for parallel processing\n* Token counts up to 65536 and column counts up to 128\n* Optional column subsetting via ``col_start_id`` and ``n_cols``\n\nBackground\n-----------\n\nThe ``find_nonzero_indices`` subkernel computes the indices of nonzero elements along the T dimension for each column of an input tensor. It uses the GpSimd ``nonzero_with_count`` ISA instruction for parallel processing of 8 columns at a time, with LNC2 sharding across the column dimension.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `find_nonzero_indices.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/subkernels/find_nonzero_indices.py>`_\n\nfind_nonzero_indices\n^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: find_nonzero_indices(input_tensor: nl.ndarray, col_start_id: nl.ndarray = None, n_cols: int = None, chunk_size: int = None, index_dtype: nki.dtype = nl.int32)\n\n   Find indices of nonzero elements along the T dimension.\n\n   :param input_tensor: [T, C], Input tensor on HBM. Nonzero elements are found along the T dimension for each column.\n   :type input_tensor: ``nl.ndarray``\n   :param col_start_id: [1], Optional HBM tensor containing the starting column index in the C dimension. If specified, only n_cols Columns starting from col_start_id are processed. If None, all C Columns are processed.\n   :type col_start_id: ``nl.ndarray``\n   :param n_cols: Number of columns (in C dimension) to process. Required when col_start_id is specified, ignored otherwise.\n   :type n_cols: ``int``\n   :param chunk_size: Size of chunks for processing T dimension. If None, defaults to T. Must divide T evenly. Smaller chunk sizes reduce memory usage.\n   :type chunk_size: ``int``\n   :param index_dtype: Data type for output indices tensor. Default is nl.int32.\n   :type index_dtype: ``nki.dtype``\n   :return: [C, T] or [n_cols, T], Tensor containing nonzero indices. For each column c, the first N values are the T-indices of nonzero elements, followed by -1 padding values.\n   :rtype: ``nl.ndarray``\n   :return: [C] or [n_cols], Count of nonzero elements per column.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * Requires LNC2 configuration (2 NeuronCores)\n   * C must be divisible by 2 (for LNC2 sharding)\n   * chunk_size must be divisible by 128 (partition size)\n   * Uses GpSimd nonzero_with_count ISA which only operates on partitions [0, 16, 32, ..., 112]\n\n   **Dimensions**:\n\n   * T: Sequence/token dimension (first dimension of input)\n   * C: Column dimension that used to calculate the non zero indices (second dimension of input)\n   * C_full: Full columns dimension from input tensor shape\n\n"
  },
  {
    "path": "nki/library/api/index.rst",
    "content": ".. meta::\n    :description: Reference for the pre-built NKI Library kernels included with the AWS Neuron SDK.\n    :date-modified: 04/09/2026\n\n.. _nkl_api_ref_home:\n\nNKI Library Supported Kernel Reference\n======================================\n\nThe NKI Library provides pre-built reference kernels you can use directly in your model development with the AWS Neuron SDK and NKI. These kernels provide the default classes, functions, and parameters you can use to integrate the NKI Library kernels into your models.\n\n**Source code for these kernel APIs can be found at**: https://github.com/aws-neuron/nki-library\n\nCore Kernels\n-------------\n\nNormalization and Quantization Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`RMSNorm-Quant </nki/library/api/rmsnorm-quant>`\n     - Performs optional RMS normalization followed by quantization to ``fp8``.\n\nQKV Projection Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`QKV </nki/library/api/qkv>`\n     - Performs Query-Key-Value projection with optional normalization and RoPE fusion.\n\nAttention Kernels\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Attention CTE </nki/library/api/attention-cte>`\n     - Implements attention optimized for Context Encoding (prefill) use cases.\n   * - :doc:`Attention TKG </nki/library/api/attention-tkg>`\n     - Implements attention optimized for Token Generation (decode) use cases with small active sequence lengths.\n\nRotary Position Embedding (RoPE) Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`RoPE </nki/library/api/rope>`\n     - Applies Rotary Position Embedding to input embeddings with flexible layout support.\n\nMulti-Layer Perceptron (MLP) Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`MLP </nki/library/api/mlp>`\n     - Implements Multi-Layer Perceptron with optional normalization fusion and quantization support.\n\nOutput Projection Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Output Projection CTE </nki/library/api/output-projection-cte>`\n     - Computes output projection optimized for Context Encoding use cases.\n   * - :doc:`Output Projection TKG </nki/library/api/output-projection-tkg>`\n     - Computes output projection optimized for Token Generation use cases.\n\nMixture of Experts (MoE) Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Router Top-K </nki/library/api/router-topk>`\n     - Computes router logits, applies activation functions, and performs top-K selection for MoE models.\n   * - :doc:`MoE CTE </nki/library/api/moe-cte>`\n     - Implements Mixture of Experts MLP operations optimized for Context Encoding use cases.\n   * - :doc:`MoE TKG </nki/library/api/moe-tkg>`\n     - Implements Mixture of Experts MLP operations optimized for Token Generation use cases.\n\nCumulative Sum Kernels\n~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Cumsum </nki/library/api/cumsum>`\n     - Computes cumulative sum along the last dimension with optimized tiling.\n\nCore Subkernels\n~~~~~~~~~~~~~~~~\n\n.. 
list-table::\n   :widths: 40 60\n\n   * - :doc:`Find Nonzero Indices </nki/library/api/find-nonzero-indices>`\n     - Finds indices of nonzero elements along the T dimension using GpSimd ``nonzero_with_count`` ISA.\n\nExperimental Kernels\n---------------------\n\n.. note::\n   Experimental kernels are under active development and their APIs may change in future releases.\n\nAttention Kernels\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Attention Block TKG </nki/library/api/attention-block-tkg>`\n     - Fused attention block for Token Generation that keeps all intermediate tensors in SBUF to minimize HBM traffic.\n\nTransformer Kernels\n~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Transformer TKG </nki/library/api/transformer-tkg>`\n     - Multi-layer transformer forward pass megakernel for token generation.\n\nConvolution Kernels\n~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Conv1D </nki/library/api/conv1d>`\n     - 1D convolution using tensor engine with replication strategy.\n   * - :doc:`Depthwise Conv1D </nki/library/api/depthwise-conv1d>`\n     - Implements depthwise 1D convolution using implicit GEMM algorithm.\n\nCollective Communication Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Fine-Grained All-Gather </nki/library/api/fg-allgather>`\n     - Ring-based all-gather for TRN2 with double-buffered collective permute.\n   * - :doc:`FGCC (All-Gather + Matmul) </nki/library/api/fgcc>`\n     - Fused all-gather and matrix multiplication for TRN2.\n   * - :doc:`SBUF-to-SBUF All-Gather </nki/library/api/sb2sb-allgather>`\n     - SBUF-to-SBUF all-gather with variants for small and large tensors.\n\nMoE Subkernels\n~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Top-K Reduce </nki/library/api/topk-reduce>`\n     - MoE Top-K reduction across sparse all-to-all collective output.\n\nDynamic Shape Kernels\n~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Dynamic Elementwise Add </nki/library/api/dynamic-elementwise-add>`\n     - Elementwise addition with runtime-variable M-dimension tiling.\n\nLoss Kernels\n~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Cross Entropy </nki/library/api/cross-entropy>`\n     - Memory-efficient cross entropy loss forward and backward passes using online log-sum-exp algorithm.\n\nMoE Backward Kernels\n~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`Blockwise MM Backward </nki/library/api/blockwise-mm-backward>`\n     - Computes backward pass for blockwise matrix multiplication in Mixture of Experts layers.\n\n.. 
toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Attention Block TKG <attention-block-tkg>\n    Attention CTE <attention-cte>\n    Attention TKG <attention-tkg>\n    Blockwise MM Backward <blockwise-mm-backward>\n    Conv1D <conv1d>\n    Cross Entropy <cross-entropy>\n    Cumsum <cumsum>\n    Depthwise Conv1D <depthwise-conv1d>\n    Dynamic Elementwise Add <dynamic-elementwise-add>\n    FGCC <fgcc>\n    Find Nonzero Indices <find-nonzero-indices>\n    Fine-Grained All-Gather <fg-allgather>\n    MLP <mlp>\n    MoE CTE <moe-cte>\n    MoE TKG <moe-tkg>\n    Output Projection CTE <output-projection-cte>\n    Output Projection TKG <output-projection-tkg>\n    QKV <qkv>\n    RMSNorm-Quant <rmsnorm-quant>\n    RoPE <rope>\n    Router Top-K <router-topk>\n    SBUF-to-SBUF All-Gather <sb2sb-allgather>\n    Top-K Reduce <topk-reduce>\n    Transformer TKG <transformer-tkg>\n"
  },
  {
    "path": "nki/library/api/mlp.rst",
    "content": ".. meta::\n    :description: MLP kernel implements Multi-Layer Perceptron with optional normalization fusion and quantization.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.core.mlp\n\nMLP Kernel API Reference\n=========================\n\nImplements Multi-Layer Perceptron with optional normalization fusion and quantization support.\n\nThe kernel supports:\n\n* Both context encoding (CTE) and token generation (TKG) modes\n* Optional normalization fusion (RMSNorm, LayerNorm)\n* Various activation functions\n* Residual connections via fused addition\n* Flexible tensor layouts and column tiling optimizations\n* Bias addition for all projections and normalization\n* FP8 quantization (static and row-wise, TKG mode only)\n* MXFP4/MXFP8 quantization (TKG mode)\n* Gate and up projection result clamping\n* Optional gate projection skipping\n* SBUF output for kernel fusion\n\nBackground\n-----------\n\nThe ``MLP`` kernel is a critical component in transformer architectures, responsible for processing token representations after the attention mechanism. This kernel optimizes the MLP computation by fusing it with optional normalization and supporting various optimizations for both context encoding and token generation scenarios.\n\n.. note::\n    This kernel automatically selects between TKG (Token Generation) and CTE (Context Encoding) implementations based on the batch size × sequence length, ensuring optimal performance across different use cases.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `mlp.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/mlp/mlp.py>`_\n\nmlp\n^^^\n\n.. py:function:: mlp(hidden_tensor: nl.ndarray, gate_proj_weights_tensor: nl.ndarray, up_proj_weights_tensor: nl.ndarray, down_proj_weights_tensor: nl.ndarray, normalization_weights_tensor: Optional[nl.ndarray] = None, gate_proj_bias_tensor: Optional[nl.ndarray] = None, up_proj_bias_tensor: Optional[nl.ndarray] = None, down_proj_bias_tensor: Optional[nl.ndarray] = None, normalization_bias_tensor: Optional[nl.ndarray] = None, fused_add_tensor: Optional[nl.ndarray] = None, store_fused_add_result: bool = False, activation_fn: ActFnType = ActFnType.SiLU, normalization_type: NormType = NormType.NO_NORM, quantization_type: QuantizationType = QuantizationType.NONE, gate_w_scale: Optional[nl.ndarray] = None, up_w_scale: Optional[nl.ndarray] = None, down_w_scale: Optional[nl.ndarray] = None, gate_up_in_scale: Optional[nl.ndarray] = None, down_in_scale: Optional[nl.ndarray] = None, quant_clipping_bound: float = 0.0, output_dtype = None, store_output_in_sbuf: bool = False, eps: float = 1e-6, skip_gate_proj: bool = False, use_tkg_gate_up_proj_column_tiling: bool = True, use_tkg_down_proj_column_tiling: bool = True, use_tkg_down_proj_optimized_layout: bool = False, gate_clamp_upper_limit: Optional[float] = None, gate_clamp_lower_limit: Optional[float] = None, up_clamp_upper_limit: Optional[float] = None, up_clamp_lower_limit: Optional[float] = None, force_cte_mode: bool = False, sbm: Optional[BufferManager] = None) -> list[nl.ndarray]\n\n   MLP (Multi-Layer Perceptron) Kernel implementation.\n\n   Performs the standard MLP computation with support for both context encoding (CTE) and\n   token generation (TKG) modes. 
Automatically selects the appropriate implementation based\n   on input dimensions and supports various optimizations.\n\n   :param hidden_tensor: Input hidden states tensor with shape [B, S, H] or SBUF layout.\n   :type hidden_tensor: ``nl.ndarray``\n   :param gate_proj_weights_tensor: Gate projection weight matrix with shape [H, I].\n   :type gate_proj_weights_tensor: ``nl.ndarray``\n   :param up_proj_weights_tensor: Up projection weight matrix with shape [H, I].\n   :type up_proj_weights_tensor: ``nl.ndarray``\n   :param down_proj_weights_tensor: Down projection weight matrix with shape [I, H].\n   :type down_proj_weights_tensor: ``nl.ndarray``\n   :param normalization_weights_tensor: Normalization weights with shape [1, H].\n   :type normalization_weights_tensor: ``nl.ndarray``, optional\n   :param gate_proj_bias_tensor: Bias tensor for gate projection with shape [1, I].\n   :type gate_proj_bias_tensor: ``nl.ndarray``, optional\n   :param up_proj_bias_tensor: Bias tensor for up projection with shape [1, I].\n   :type up_proj_bias_tensor: ``nl.ndarray``, optional\n   :param down_proj_bias_tensor: Bias tensor for down projection with shape [1, H].\n   :type down_proj_bias_tensor: ``nl.ndarray``, optional\n   :param normalization_bias_tensor: Bias tensor for normalization with shape [1, H]. Only applicable for layer normalization.\n   :type normalization_bias_tensor: ``nl.ndarray``, optional\n   :param fused_add_tensor: Tensor to fuse for the residual connection.\n   :type fused_add_tensor: ``nl.ndarray``, optional\n   :param store_fused_add_result: If True, stores the fused_add output to HBM, and the kernel returns both the fused_add output and the MLP output. Default: False.\n   :type store_fused_add_result: ``bool``\n   :param activation_fn: Activation function type.\n   :type activation_fn: ``ActFnType``\n   :param normalization_type: Type of normalization.\n   :type normalization_type: ``NormType``\n   :param quantization_type: Quantization type to use (default: QuantizationType.NONE). Supported values are QuantizationType.STATIC and QuantizationType.ROW. Quantization is only supported in TKG mode.\n   :type quantization_type: ``QuantizationType``\n   :param gate_w_scale: FP8 dequantization scales for gate weights. Shape is [128, I] for row-wise quantization, [128, 1] for static quantization. Defaults to None.\n   :type gate_w_scale: ``nl.ndarray``, optional\n   :param up_w_scale: FP8 dequantization scales for up weights. Shape is [128, I] for row-wise quantization, [128, 1] for static quantization. Defaults to None.\n   :type up_w_scale: ``nl.ndarray``, optional\n   :param down_w_scale: FP8 dequantization scales for down weights. Shape is [128, I] for row-wise quantization, [128, 1] for static quantization. Defaults to None.\n   :type down_w_scale: ``nl.ndarray``, optional\n   :param gate_up_in_scale: FP8 dequantization scales for gate and up input. Used for static quantization with shape [128, 1]. Defaults to None.\n   :type gate_up_in_scale: ``nl.ndarray``, optional\n   :param down_in_scale: FP8 dequantization scales for down input. Used for static quantization with shape [128, 1]. Defaults to None.\n   :type down_in_scale: ``nl.ndarray``, optional\n   :param quant_clipping_bound: Clipping bound for quantization. Default: 0.0.\n   :type quant_clipping_bound: ``float``\n   :param output_dtype: Output tensor data type. 
Defaults to None; if None, the hidden tensor's ``dtype`` is used.\n   :type output_dtype: ``nki.dtype``\n   :param store_output_in_sbuf: If True, stores the output in SBUF instead of HBM, allowing the next layer to read it directly without an additional load operation. This option is only available in TKG mode, where the output tensor is small enough to fit in SBUF. Default: False.\n   :type store_output_in_sbuf: ``bool``\n   :param eps: Epsilon value for numerical stability.\n   :type eps: ``float``\n   :param skip_gate_proj: Skip gate projection.\n   :type skip_gate_proj: ``bool``\n   :param use_tkg_gate_up_proj_column_tiling: If True, uses column tiling for the gate and up projection in TKG mode. Default: True.\n   :type use_tkg_gate_up_proj_column_tiling: ``bool``\n   :param use_tkg_down_proj_column_tiling: If True, uses column tiling for the down projection in TKG mode. Default: True.\n   :type use_tkg_down_proj_column_tiling: ``bool``\n   :param use_tkg_down_proj_optimized_layout: If True, the standard down_weight tensor (``shape [I, H]``) is reinterpreted as ``[I, lnc, 128, H // (128 * lnc)]``, then transposed to ``[I, lnc, H // (128 * lnc), 128]``. This layout provides unit-stride weight loading, reducing the matrix multiplication initiation interval. Only applied when ``use_tkg_down_proj_column_tiling`` is False. Default: False.\n   :type use_tkg_down_proj_optimized_layout: ``bool``\n   :param gate_clamp_upper_limit: Upper bound for clamping gate projection results; no clamping is performed if the value is set to None.\n   :type gate_clamp_upper_limit: ``float``, optional\n   :param gate_clamp_lower_limit: Lower bound for clamping gate projection results; no clamping is performed if the value is set to None.\n   :type gate_clamp_lower_limit: ``float``, optional\n   :param up_clamp_upper_limit: Upper bound for clamping up projection results; no clamping is performed if the value is set to None.\n   :type up_clamp_upper_limit: ``float``, optional\n   :param up_clamp_lower_limit: Lower bound for clamping up projection results; no clamping is performed if the value is set to None.\n   :type up_clamp_lower_limit: ``float``, optional\n   :param force_cte_mode: If True, forces the use of CTE mode. Default: False.\n   :type force_cte_mode: ``bool``\n   :param sbm: Optional BufferManager instance for custom SBUF memory management. When provided, the kernel uses the given buffer manager instead of creating its own. Default: ``None``.\n   :type sbm: ``BufferManager``, optional\n   :return: The MLP output tensor(s). HBM output: Tensor with shape [B, S, H]. SBUF output: Shape depends on the mode setting. CTE: Not applicable. TKG with ``use_tkg_down_proj_column_tiling`` set to ``True``: ``[BxS, H]``. TKG with ``use_tkg_down_proj_column_tiling`` set to ``False``: ``[128 (p_max), H/128, BxS]``. 
If ``store_fused_add_result`` is ``True``, returns a list containing both the output and the stored fused output.\n   :rtype: ``list[nl.ndarray]``\n\n   **Notes**:\n\n   * Automatically dispatches to either CTE or TKG implementation based on batch size and sequence length.\n   * Token generation mode (TKG) is used for small batch/sequence dimensions (``batch_size × sequence_length ≤ 96``), while context encoding (CTE) handles larger inputs.\n   * Column tiling and tensor layout optimization (``use_tkg_down_proj_optimized_layout``) are valid only in TKG mode.\n   * FP8 quantization support is available only in TKG mode.\n   * Supported input data types: ``nl.bfloat16``, ``nl.float16``, ``nl.float32``\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Dual Implementation Strategy**: Automatically selects between CTE (Context Encoding) and TKG (Token Generation) implementations based on ``batch_size × sequence_length``.\n\n2. **Normalization Fusion**: Optionally fuses RMSNorm or LayerNorm operations with the MLP computation for improved performance.\n\n3. **FP8 Quantization**: Supports FP8 quantization with both static and row-wise dequantization scales. Available only in TKG mode for weights and activations.\n\n4. **Flexible Tensor Layouts**: Supports column tiling optimizations and tensor layout optimizations in TKG mode to improve memory access patterns.\n\n5. **Activation Function Options**: Supports multiple activation functions, including SiLU (Swish), GELU, and ReLU.\n\n6. **Result Clamping**: Provides optional clamping of gate and up projection results with configurable upper and lower bounds.\n\n7. **Gate Projection Skipping**: Allows skipping the gate projection computation when ``skip_gate_proj`` is enabled.\n\n8. **Residual Connection Fusion**: Can incorporate residual connections through fused_add_tensor for improved performance.\n\n9. **SBUF Output Option**: Provides the option to keep output in SBUF for fusion with subsequent operations (TKG mode only).\n\n10. **Bias Addition**: Supports optional bias addition for gate, up, and down projections, as well as for normalization.\n\n11. **Optimized Weight Loading**: In TKG mode, ``use_tkg_down_proj_optimized_layout`` enables unit-stride weight loading to reduce matrix multiplication initiation interval.\n\n12. **Multi-Precision Support**: Supports ``bfloat16``, ``float16``, and ``float32`` input data types for flexible precision requirements.\n\nSee Also\n--------\n\n* :doc:`QKV Kernel API Reference </nki/library/api/qkv>`\n* :doc:`RMSNorm-Quant Kernel API Reference </nki/library/api/rmsnorm-quant>`\n"
  },
  {
    "path": "nki/library/api/moe-cte.rst",
    "content": ".. meta::\n    :description: MoE CTE kernel implements Mixture of Experts MLP optimized for Context Encoding.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.moe_cte\n\nMoE CTE Kernel API Reference\n=============================\n\nImplements Mixture of Experts (MoE) MLP computation optimized for Context Encoding with blockwise matrix multiplication and multiple sharding strategies.\n\nThe kernel supports:\n\n* Unified entry point dispatching to multiple implementation variants\n* Block-sharding and intermediate-dimension-sharding strategies\n* Multiple quantization types (FP8 row/static, MxFP4/MxFP8)\n* Expert affinity scaling (pre-scale and post-scale modes)\n* Various activation functions (SiLU, GELU, ReLU)\n* Optional bias terms for projections\n* Clamping for gate and up projections\n* Activation checkpointing for gradient computation\n* Hybrid static/dynamic loop optimization for padded sequences\n\nBackground\n--------------\n\nThe ``MoE CTE`` kernel is designed for Mixture of Experts models during context encoding (prefill) phase where the sequence length is typically large (T > 128). The kernel performs blockwise MoE MLP computation:\n\n1. **Token Assignment**: Tokens are pre-assigned to blocks via ``token_position_to_id``\n2. **Gate Projection**: ``gate_out = hidden @ gate_weights``\n3. **Up Projection**: ``up_out = hidden @ up_weights``\n4. **Activation**: ``act_gate = activation_fn(gate_out)``\n5. **Element-wise Multiply**: ``intermediate = act_gate * up_out``\n6. **Down Projection**: ``expert_out = intermediate @ down_weights``\n7. **Affinity Scaling**: ``output = expert_out * affinity`` (if enabled)\n8. **Block Accumulation**: Results are accumulated across blocks for multi-expert assignments\n\nThe unified ``moe_cte`` entry point dispatches to the appropriate implementation based on the ``spec`` parameter, which selects between block-sharding and intermediate-dimension-sharding strategies with optional MX quantization support.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `moe_cte.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/moe/moe_cte/moe_cte.py>`_\n\nmoe_cte\n^^^^^^^^\n\n.. py:function:: moe_cte(hidden_states: nl.ndarray, expert_affinities_masked: nl.ndarray, gate_up_proj_weight: nl.ndarray, down_proj_weight: nl.ndarray, token_position_to_id: nl.ndarray, block_to_expert: nl.ndarray, block_size: int, spec: MoECTESpec, conditions: Optional[nl.ndarray] = None, gate_and_up_proj_bias: Optional[nl.ndarray] = None, down_proj_bias: Optional[nl.ndarray] = None, quantization_config: Optional[QuantizationConfig] = None, gate_up_activations_T: Optional[nl.ndarray] = None, down_activations: Optional[nl.ndarray] = None, activation_function: ActFnType = ActFnType.SiLU, skip_dma: SkipMode = SkipMode(False, False), compute_dtype=nl.bfloat16, is_tensor_update_accumulating: bool = True, expert_affinities_scaling_mode: ExpertAffinityScaleMode = ExpertAffinityScaleMode.POST_SCALE, gate_clamp_upper_limit: Optional[float] = None, gate_clamp_lower_limit: Optional[float] = None, up_clamp_upper_limit: Optional[float] = None, up_clamp_lower_limit: Optional[float] = None)\n\n   Unified entry point for MoE CTE blockwise matrix multiplication kernels.\n\n   Dispatches to the appropriate implementation based on ``spec.implementation``. 
Supports multiple\n   sharding strategies and quantization modes for different hardware targets.\n\n   :param hidden_states: Input hidden states tensor with shape ``[T+1, H]`` in HBM. T+1 because padding token position is set to T.\n   :type hidden_states: ``nl.ndarray``\n   :param expert_affinities_masked: Expert affinities for each token with shape ``[(T+1) * E, 1]`` in HBM\n   :type expert_affinities_masked: ``nl.ndarray``\n   :param gate_up_proj_weight: Concatenated gate and up projection weights with shape ``[E, H, 2, I_TP]`` in HBM\n   :type gate_up_proj_weight: ``nl.ndarray``\n   :param down_proj_weight: Down projection weights with shape ``[E, I_TP, H]`` in HBM\n   :type down_proj_weight: ``nl.ndarray``\n   :param token_position_to_id: Block index of corresponding tokens with shape ``[N * B]`` in HBM. Includes padding tokens (N * B >= T). Padding token id is set to T.\n   :type token_position_to_id: ``nl.ndarray``\n   :param block_to_expert: Expert indices of corresponding blocks with shape ``[N, 1]`` in HBM\n   :type block_to_expert: ``nl.ndarray``\n   :param block_size: Number of tokens per block (must be multiple of 256)\n   :type block_size: ``int``\n   :param spec: Implementation selection and configuration. Controls which sharding strategy and implementation variant to use. See ``MoECTESpec`` for details.\n   :type spec: ``MoECTESpec``\n   :param conditions: Block padding indicators with shape ``[N+1]``. Used by hybrid and block_mx implementations to distinguish padded vs non-padded blocks.\n   :type conditions: ``nl.ndarray``, optional\n   :param gate_and_up_proj_bias: Gate and up projection bias with shape ``[E, 2, I_TP]``. For SiLU, up_bias = up_bias + 1.\n   :type gate_and_up_proj_bias: ``nl.ndarray``, optional\n   :param down_proj_bias: Down projection bias with shape ``[E, H]``\n   :type down_proj_bias: ``nl.ndarray``, optional\n   :param quantization_config: Quantization scales configuration containing ``gate_up_proj_scale`` and ``down_proj_scale`` for weight dequantization. See ``QuantizationConfig`` for details.\n   :type quantization_config: ``QuantizationConfig``, optional\n   :param gate_up_activations_T: Pre-allocated storage for gate/up activations (for activation checkpointing). Used when ``spec.shard_on_I.checkpoint_activation=True``.\n   :type gate_up_activations_T: ``nl.ndarray``, optional\n   :param down_activations: Pre-allocated storage for down projection activations (for activation checkpointing). Used when ``spec.shard_on_I.checkpoint_activation=True``.\n   :type down_activations: ``nl.ndarray``, optional\n   :param activation_function: Activation function for MLP block. Default is ``SiLU``.\n   :type activation_function: ``ActFnType``\n   :param skip_dma: DMA skip mode configuration. Default is ``SkipMode(False, False)``.\n   :type skip_dma: ``SkipMode``\n   :param compute_dtype: Compute data type. Default is ``nl.bfloat16``.\n   :type compute_dtype: ``nl.dtype``\n   :param is_tensor_update_accumulating: Whether to accumulate results over multiple blocks. Default is ``True``.\n   :type is_tensor_update_accumulating: ``bool``\n   :param expert_affinities_scaling_mode: Post or pre scaling mode. 
Default is ``POST_SCALE``.\n   :type expert_affinities_scaling_mode: ``ExpertAffinityScaleMode``\n   :param gate_clamp_upper_limit: Upper clamp limit for gate projection\n   :type gate_clamp_upper_limit: ``float``, optional\n   :param gate_clamp_lower_limit: Lower clamp limit for gate projection\n   :type gate_clamp_lower_limit: ``float``, optional\n   :param up_clamp_upper_limit: Upper clamp limit for up projection\n   :type up_clamp_upper_limit: ``float``, optional\n   :param up_clamp_lower_limit: Lower clamp limit for up projection\n   :type up_clamp_lower_limit: ``float``, optional\n   :return: Output hidden states with shape ``[T+1, H]``. When activation checkpointing is enabled, may return a tuple including saved activations.\n   :rtype: ``nl.ndarray`` or ``Tuple[nl.ndarray, ...]``\n\n   **Dimensions**:\n\n   * T: Total number of input tokens (after linearizing across the batch dimension)\n   * H: Hidden dimension size\n   * B: Number of tokens per block\n   * N: Total number of blocks\n   * E: Number of experts\n   * I_TP: Intermediate size / tensor parallelism degree\n\n   **Supported Data Types**:\n\n   * Input: bfloat16, float16\n   * MX implementations: float4_e2m1fn_x4 (MxFP4), float8_e4m3fn (MxFP8)\n\n   **Constraints**:\n\n   * Block size B: 256-1024 tokens (must be multiple of 256)\n   * Total tokens T: Up to 32K tokens per call\n   * Hidden dimension H: 512-8192 (optimal: 2048-4096), must be multiple of 512\n   * Intermediate dimension I_TP: 2048-16384 (optimal: 8192), must be divisible by 16\n   * Number of experts E: 8-64 (optimal: 8-16)\n   * All input/output tensors must have the same floating point dtype\n   * ``token_position_to_id`` and ``block_to_expert`` must be ``nl.int32`` tensors\n\nConfiguration Classes\n-----------------------\n\nMoECTESpec\n^^^^^^^^^^^\n\nSpecification for MoE CTE kernel execution. Selects the implementation variant and provides implementation-specific configuration.\n\n.. code-block:: python\n\n   from nkilib.core.moe.moe_cte.moe_cte import MoECTESpec, MoECTEImplementation\n\n   # Block sharding (default config auto-initialized)\n   spec = MoECTESpec(implementation=MoECTEImplementation.shard_on_block)\n\n   # I-sharding with activation checkpointing\n   spec = MoECTESpec(\n       implementation=MoECTEImplementation.shard_on_i,\n       shard_on_I=ShardOnIConfig(checkpoint_activation=True),\n   )\n\n**Implementation variants**:\n\n* ``shard_on_block``: Shards blocks across cores. Best for many blocks. (TRN2)\n* ``shard_on_i``: Shards intermediate dimension across cores. (TRN2)\n* ``shard_on_i_hybrid``: Shard on I with hybrid static/dynamic loop. (TRN2)\n* ``shard_on_i_dropping``: Shard on I for dropping layer. (TRN2)\n* ``shard_on_block_mx``: Shard on block with MxFP4/MxFP8 quantization. (TRN3)\n* ``shard_on_i_mx``: Shard on I with MxFP4/MxFP8 quantization. (TRN3)\n* ``shard_on_i_mx_hybrid``: Shard on I with MxFP4/MxFP8 and hybrid loop. (TRN3)\n\nQuantizationConfig\n^^^^^^^^^^^^^^^^^^^\n\nConfiguration for quantization-related parameters. Contains dequantization scales for weight tensors.\n\n.. 
code-block:: python\n\n   from nkilib.core.moe.moe_cte.moe_cte import QuantizationConfig\n\n   # No quantization (default)\n   quant_cfg = QuantizationConfig()\n\n   # With per-tensor scales\n   quant_cfg = QuantizationConfig(\n       gate_up_proj_scale=gate_up_scale_tensor,\n       down_proj_scale=down_scale_tensor,\n   )\n\n* ``gate_up_proj_scale`` (``nl.ndarray``, optional): Dequantization scales for gate/up projection weights.\n* ``down_proj_scale`` (``nl.ndarray``, optional): Dequantization scales for down projection weights.\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Unified Dispatch**: The ``moe_cte`` entry point dispatches to the appropriate implementation based on ``spec.implementation``.\n\n2. **Block Sharding**: Distributes blocks across cores for parallel processing. Supports PING_PONG and HI_LO distribution strategies.\n\n3. **Intermediate Dimension Sharding**: Distributes the intermediate dimension (I_TP) across multiple cores with all-reduce operations to combine partial results.\n\n4. **Quantization Support**: Handles multiple quantization schemes:\n   \n   * **FP8 Row Quantization**: Per-row scaling for weights\n   * **FP8 Static Quantization**: Single scale per weight matrix\n   * **MxFP4/MxFP8**: Microscaling formats with block-wise scaling (TRN3)\n\n5. **Expert Affinity Scaling Modes**:\n   \n   * **PRE_SCALE**: Apply affinity scaling before activation\n   * **POST_SCALE**: Apply affinity scaling after down projection (default)\n\n6. **Hybrid Loop Optimization**: For sequences with padding, uses a hybrid static/dynamic loop where non-padded blocks are processed in a compile-time-known static loop and padded blocks in a runtime-dependent dynamic loop.\n\n7. **Activation Checkpointing**: Optionally saves intermediate activations for gradient computation during backward pass.\n\n8. **Optional Clamping**: Supports clamping of gate and up projection outputs for numerical stability.\n\nUsage Examples\n-----------------\n\nBasic usage with block sharding:\n\n.. code-block:: python\n\n   from nkilib.core.moe.moe_cte.moe_cte import moe_cte, MoECTESpec, MoECTEImplementation\n\n   spec = MoECTESpec(implementation=MoECTEImplementation.shard_on_block)\n\n   output = moe_cte(\n       hidden_states=hidden_states,\n       expert_affinities_masked=expert_affinities,\n       gate_up_proj_weight=gate_up_weights,\n       down_proj_weight=down_weights,\n       token_position_to_id=token_position_to_id,\n       block_to_expert=block_to_expert,\n       block_size=512,\n       spec=spec,\n   )\n\nWith quantization:\n\n.. code-block:: python\n\n   from nkilib.core.moe.moe_cte.moe_cte import QuantizationConfig\n\n   quant_cfg = QuantizationConfig(\n       gate_up_proj_scale=gate_up_scale,\n       down_proj_scale=down_scale,\n   )\n\n   output = moe_cte(\n       hidden_states=hidden_states,\n       expert_affinities_masked=expert_affinities,\n       gate_up_proj_weight=gate_up_weights,\n       down_proj_weight=down_weights,\n       token_position_to_id=token_position_to_id,\n       block_to_expert=block_to_expert,\n       block_size=512,\n       spec=spec,\n       quantization_config=quant_cfg,\n   )\n\nSee Also\n-----------\n\n* :doc:`MoE TKG Kernel API Reference </nki/library/api/moe-tkg>`\n* :doc:`Router Top-K Kernel API Reference </nki/library/api/router-topk>`\n* :doc:`MLP Kernel API Reference </nki/library/api/mlp>`\n"
  },
  {
    "path": "nki/library/api/moe-tkg.rst",
    "content": ".. meta::\n    :description: MoE TKG kernel implements Mixture of Experts MLP optimized for Token Generation.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.moe_tkg\n\nMoE TKG Kernel API Reference\n=============================\n\nImplements Mixture of Experts (MoE) MLP computation optimized for Token Generation with support for both all-expert and selective-expert modes.\n\nThe kernel supports:\n\n* All-expert mode (process all experts for all tokens)\n* Selective-expert mode (process only top-K selected experts)\n* Multiple quantization types (FP8 row/static, MxFP4)\n* Expert affinity scaling (post-scale mode)\n* Expert affinity masking for distributed inference\n* Various activation functions (SiLU, GELU, ReLU)\n* Optional bias terms for projections\n* Clamping for gate and up projections\n* SBUF or HBM output allocation\n\nBackground\n--------------\n\nThe ``MoE TKG`` kernel is designed for Mixture of Experts models during token generation (decoding) phase where the batch size and sequence length are typically small (T ≤ 128). The kernel performs the core MoE MLP computation:\n\n1. **Gate Projection**: ``gate_out = hidden @ gate_weights``\n2. **Up Projection**: ``up_out = hidden @ up_weights``\n3. **Activation**: ``act_gate = activation_fn(gate_out)``\n4. **Element-wise Multiply**: ``intermediate = act_gate * up_out``\n5. **Down Projection**: ``expert_out = intermediate @ down_weights``\n6. **Affinity Scaling**: ``output = sum(expert_out * affinity)`` (if enabled)\n\nThe kernel supports two operational modes:\n\n* **All-Expert Mode**: Processes all experts for all tokens, useful for distributed inference scenarios\n* **Selective-Expert Mode**: Processes only the top-K selected experts per token, reducing computation\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `moe_tkg.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/moe/moe_tkg/moe_tkg.py>`_\n\nmoe_tkg\n^^^^^^^^^^^^^^^\n\n.. 
py:function:: moe_tkg(hidden_input: nl.ndarray, expert_gate_up_weights: nl.ndarray, expert_down_weights: nl.ndarray, expert_affinities: nl.ndarray, expert_index: nl.ndarray, is_all_expert: bool, rank_id: Optional[nl.ndarray] = None, expert_gate_up_bias: Optional[nl.ndarray] = None, expert_down_bias: Optional[nl.ndarray] = None, expert_gate_up_weights_scale: Optional[nl.ndarray] = None, expert_down_weights_scale: Optional[nl.ndarray] = None, hidden_input_scale: Optional[nl.ndarray] = None, gate_up_input_scale: Optional[nl.ndarray] = None, down_input_scale: Optional[nl.ndarray] = None, mask_unselected_experts: bool = False, expert_affinities_eager: Optional[nl.ndarray] = None, expert_affinities_scaling_mode: ExpertAffinityScaleMode = ExpertAffinityScaleMode.NO_SCALE, activation_fn: ActFnType = ActFnType.SiLU, output_dtype=None, gate_clamp_upper_limit: Optional[float] = None, gate_clamp_lower_limit: Optional[float] = None, up_clamp_upper_limit: Optional[float] = None, up_clamp_lower_limit: Optional[float] = None, output_in_sbuf: bool = False, is_all_expert_dynamic: bool = False) -> nl.ndarray\n\n   Mixture of Experts (MoE) MLP token generation kernel.\n\n   Performs MoE computation with support for both all-expert and selective-expert modes.\n   Supports various quantization types including FP8 row/static quantization and MxFP4.\n   Optimized for token generation scenarios with T ≤ 128 (except MX all-expert mode).\n\n   :param hidden_input: Input hidden states tensor with shape ``[T, H]`` in HBM or ``[H0, T, H1]`` in SBUF\n   :type hidden_input: ``nl.ndarray``\n   :param expert_gate_up_weights: Fused gate and up projection weights. Shape ``[E_L, H, 2, I]`` for bf16/fp16 or ``[E_L, 128, 2, ceil(H/512), I]`` for MxFP4\n   :type expert_gate_up_weights: ``nl.ndarray``\n   :param expert_down_weights: Down projection weights. Shape ``[E_L, I, H]`` for bf16/fp16 or ``[E_L, I_p, ceil(I/512), H]`` for MxFP4\n   :type expert_down_weights: ``nl.ndarray``\n   :param expert_affinities: Expert routing weights/affinities with shape ``[T, E]``. For all-expert mode with affinity scaling, this will be sliced to ``[T, E_L]`` internally.\n   :type expert_affinities: ``nl.ndarray``\n   :param expert_index: Top-K expert indices per token with shape ``[T, K]``\n   :type expert_index: ``nl.ndarray``\n   :param is_all_expert: If ``True``, process all experts for all tokens; otherwise, process only selected top-K experts\n   :type is_all_expert: ``bool``\n   :param rank_id: Rank ID tensor specifying which worker processes experts ``[E_L * rank_id, E_L * (rank_id + 1))``. Shape ``[1, 1]``. Required for all-expert mode with affinity scaling enabled.\n   :type rank_id: ``nl.ndarray``, optional\n   :param expert_gate_up_bias: Bias for gate/up projections. Shape ``[E_L, 2, I]`` for non-MX or ``[E_L, I_p, 2, ceil(I/512), 4]`` for MX.\n   :type expert_gate_up_bias: ``nl.ndarray``, optional\n   :param expert_down_bias: Bias for down projection with shape ``[E_L, H]``\n   :type expert_down_bias: ``nl.ndarray``, optional\n   :param expert_gate_up_weights_scale: Quantization scales for gate/up weights. Shape ``[E_L, 2, I]`` for FP8 row quantization, ``[E_L, 2, 1]`` for FP8 static quantization, or ``[E_L, 128/8, 2, ceil(H/512), I]`` for MxFP4.\n   :type expert_gate_up_weights_scale: ``nl.ndarray``, optional\n   :param expert_down_weights_scale: Quantization scales for down weights. 
Shape ``[E_L, H]`` for FP8 row quantization, ``[E_L, 1]`` for FP8 static quantization, or ``[E_L, I_p/8, ceil(I/512), H]`` for MxFP4.\n   :type expert_down_weights_scale: ``nl.ndarray``, optional\n   :param hidden_input_scale: FP8 dequantization scale for the hidden input tensor. Used for static quantization of the input.\n   :type hidden_input_scale: ``nl.ndarray``, optional\n   :param gate_up_input_scale: FP8 dequantization scales for gate/up input. Shape ``[E_L, 1]``. Used for static quantization.\n   :type gate_up_input_scale: ``nl.ndarray``, optional\n   :param down_input_scale: FP8 dequantization scales for down input. Shape ``[E_L, 1]``. Used for static quantization.\n   :type down_input_scale: ``nl.ndarray``, optional\n   :param mask_unselected_experts: Whether to apply expert affinity masking based on expert_index. When ``True``, affinities are masked to zero for experts not selected by each token. Only used in all-expert mode with affinity scaling.\n   :type mask_unselected_experts: ``bool``\n   :param expert_affinities_eager: Eager expert affinities with shape ``[T, K]``. Not used in all-expert mode.\n   :type expert_affinities_eager: ``nl.ndarray``, optional\n   :param expert_affinities_scaling_mode: When to apply affinity scaling. Supported values: ``NO_SCALE``, ``POST_SCALE``. Default is ``NO_SCALE``.\n   :type expert_affinities_scaling_mode: ``ExpertAffinityScaleMode``\n   :param activation_fn: Activation function type. Default is ``SiLU``.\n   :type activation_fn: ``ActFnType``\n   :param output_dtype: Output tensor data type. Defaults to ``None``; if ``None``, uses ``hidden_input`` dtype.\n   :type output_dtype: ``nl.dtype``, optional\n   :param gate_clamp_upper_limit: Upper bound value to clamp gate projection results\n   :type gate_clamp_upper_limit: ``float``, optional\n   :param gate_clamp_lower_limit: Lower bound value to clamp gate projection results\n   :type gate_clamp_lower_limit: ``float``, optional\n   :param up_clamp_upper_limit: Upper bound value to clamp up projection results\n   :type up_clamp_upper_limit: ``float``, optional\n   :param up_clamp_lower_limit: Lower bound value to clamp up projection results\n   :type up_clamp_lower_limit: ``float``, optional\n   :param output_in_sbuf: If ``True``, allocate output in SBUF with same shape as hidden_input. If ``False`` (default), allocate output in HBM with shape ``[T, H]``.\n   :type output_in_sbuf: ``bool``\n   :param is_all_expert_dynamic: If ``True``, enables dynamic expert selection in all-expert mode, where the set of active experts can vary per token. Default: ``False``.\n   :type is_all_expert_dynamic: ``bool``\n   :return: Output tensor with MoE computation results. 
Shape ``[T, H]`` or same shape as hidden_input if output_in_sbuf=True.\n   :rtype: ``nl.ndarray``\n\n   **Dimensions**:\n\n   * T: Number of tokens (batch_size × seq_len)\n   * H: Hidden dimension\n   * I: Intermediate dimension\n   * E: Number of global experts\n   * E_L: Number of local experts processed by this kernel\n   * K: Top-K experts per token\n   * I_p: I//4 if I ≤ 512 else 128\n\n   **Supported Data Types**:\n\n   * Input: bfloat16, float16, float4_e2m1fn_x4 (MxFP4)\n\n   **Constraints**:\n\n   * T ≤ 128 (batch_size × seq_len must be ≤ 128, except for MX all-expert mode)\n   * ``PRE_SCALE`` and ``PRE_SCALE_DELAYED`` modes are not supported\n   * Static quantization (``gate_up_input_scale`` and ``down_input_scale``) is not currently supported\n   * MX kernels require ``expert_gate_up_weights_scale`` and ``expert_down_weights_scale`` to be set\n   * All-expert mode with affinity scaling requires ``rank_id`` parameter\n   * All-expert mode does not support ``expert_affinities_eager``\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Dual Mode Operation**: Supports both all-expert and selective-expert modes with separate optimized implementations for each.\n\n2. **Quantization Support**: Handles multiple quantization schemes:\n   \n   * **FP8 Row Quantization**: Per-row scaling for weights\n   * **FP8 Static Quantization**: Single scale per weight matrix\n   * **MxFP4**: Microscaling FP4 format with block-wise scaling\n\n3. **Expert Affinity Masking**: For distributed inference in all-expert mode, masks expert affinities based on rank ID to ensure each worker processes only its assigned experts.\n\n4. **Fused Gate-Up Projection**: Gate and up projection weights are fused into a single tensor for efficient memory access and computation.\n\n5. **Affinity Scaling Modes**:\n   \n   * **NO_SCALE**: No affinity scaling applied\n   * **POST_SCALE**: Apply affinity scaling after expert computation (recommended)\n\n6. **Activation Function Support**: Supports various activation functions including SiLU (default), GELU, and ReLU.\n\n7. **Optional Clamping**: Supports clamping of gate and up projection outputs for numerical stability.\n\n8. **Flexible Output Allocation**: Supports output allocation in either HBM or SBUF for integration with larger kernels.\n\n9. **MX-Specific Optimizations**: MX all-expert mode supports larger batch sizes and includes K-dimension sharding for selective-expert mode.\n\nSee Also\n-----------\n\n* :doc:`MoE CTE Kernel API Reference </nki/library/api/moe-cte>`\n* :doc:`Router Top-K Kernel API Reference </nki/library/api/router-topk>`\n* :doc:`MLP Kernel API Reference </nki/library/api/mlp>`\n
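\nUsage Example\n----------------\n\nA minimal selective-expert invocation is sketched below. The import path is assumed to mirror the source file location referenced above, and ``hidden_states``, ``gate_up_weights``, ``down_weights``, ``affinities``, and ``topk_index`` are placeholder tensors shaped as described in the parameter list; only required arguments are passed, so all other options keep their documented defaults.\n\n.. code-block:: python\n\n   # Assumed import path, mirroring the source file location above.\n   from nkilib.core.moe.moe_tkg.moe_tkg import moe_tkg\n\n   # Placeholder inputs: hidden_states [T, H], affinities [T, E], topk_index [T, K];\n   # gate_up_weights [E_L, H, 2, I] and down_weights [E_L, I, H] for bf16/fp16 weights.\n   output = moe_tkg(\n       hidden_input=hidden_states,\n       expert_gate_up_weights=gate_up_weights,\n       expert_down_weights=down_weights,\n       expert_affinities=affinities,\n       expert_index=topk_index,\n       is_all_expert=False,\n   )\n"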
  },
  {
    "path": "nki/library/api/output-projection-cte.rst",
    "content": ".. meta::\n    :description: Output Projection CTE kernel computes output projection optimized for Context Encoding.\n    :date-modified: 11/28/2025\n\n.. currentmodule:: nkilib.core.output_projection.output_projection_cte\n\nOutput Projection CTE Kernel API Reference\n===========================================\n\nComputes output projection (attention @ weight + bias) optimized for Context Encoding (prefill) use cases.\n\nThe kernel supports:\n\n* Efficient projection of attention outputs\n* Optional bias addition\n* LNC sharding for distributed computation\n* Optimized memory access patterns\n* Head dimension packing for improved performance\n\nBackground\n--------------\n\nThe ``Output Projection CTE`` kernel computes the operation ``out = attention @ weight + bias``, which is commonly used to project the output scores after an attention block in transformer models. This kernel is specifically optimized for Context Encoding (Prefill) use cases, where the sequence length can be large (typically ``S`` ≥ 512).\n\nThe kernel employs efficient tiling strategies and memory access patterns to maximize performance on Neuron hardware, with support for sharding across multiple Logical Neuron Cores (LNCs) to handle large hidden dimensions. When ``LNC>1``, the ``H`` dimension is sharded across the cores, which avoids the need for any inter-core collective operations as each core produces part of the output tensor.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `output_projection_cte.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/output_projection/output_projection_cte.py>`_\n\noutput_projection_cte\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: output_projection_cte(attention: nl.ndarray, weight: nl.ndarray, bias=None, quantization_type: QuantizationType = QuantizationType.NONE, input_scales: Optional[nl.ndarray] = None, weight_scales: Optional[nl.ndarray] = None)\n\n   Output Projection Kernel optimized for Context Encoding (Prefill) use cases.\n\n   This kernel computes ``out = attention @ weight + bias``, typically used to project the output scores after an attention block in transformer models.\n\n   This kernel is optimized for Context Encoding (aka Prefill) use cases where sequence length ``S`` is large. Using this kernel with ``S < 512`` may result in degraded performance.\n\n   This kernel uses a layout also used by other Context Encoding kernels to avoid need for transposes.\n\n   :param attention: Input tensor in HBM, typically the scores output from an attention block. Shape: ``[B, N, D, S]``, where ``B`` is batch size, ``N`` is number of heads, ``D`` is head dimension, and ``S`` is sequence length. Indexing: ``[b, n, d, s]``.\n   :type attention: ``nl.ndarray``\n   :param weight: Weight tensor in HBM. Shape: ``[N*D, H]``, where ``H`` is hidden dimension size. Indexing: ``[n * D + d, h]``.\n   :type weight: ``nl.ndarray``\n   :param bias: Optional bias tensor in HBM. Shape: ``[1, H]``. Indexing: ``[1, h]``.\n   :type bias: ``nl.ndarray``, optional\n   :param quantization_type: Type of quantization (NONE or STATIC for FP8). Default: QuantizationType.NONE.\n   :type quantization_type: ``QuantizationType``\n   :param input_scales: Input scale tensor for FP8 quantization. Shape: ``[128, 1]``.\n   :type input_scales: ``nl.ndarray``, optional\n   :param weight_scales: Weight scale tensor for FP8 quantization. 
Shape: ``[128, 1]``.\n   :type weight_scales: ``nl.ndarray``, optional\n   :return: Output tensor in HBM. Shape: ``[B, S, H]``. Indexing: ``[b, s, h]``.\n   :rtype: ``nl.ndarray``\n\n   **Data Types**:\n     This kernel supports ``nl.float32``, ``nl.float16`` and ``nl.bfloat16`` data types.\n     However, for ``nl.float32``, large inputs may not fit in SBUF.\n\n   **Dimensions**:\n     * ``B``: Batch size\n     * ``N``: Number of heads\n     * ``S``: Sequence length\n     * ``H``: Hidden dimension size\n     * ``D``: Head dimension size\n\n   **Restrictions**:\n\n   * The contract dimension of input and weight tensors must match (``N*D == weight.shape[0]``)\n   * The output projection kernel currently supports ``H`` of at most 32768\n   * Hidden dimension (``H``) needs to be divisible by LNC size since LNC sharding is on the weight hidden dimension\n   * Head dimension (``D``) must be <= 128\n   * Maximum validated ``H`` size is 20705\n   * Maximum validated ``B*S`` size is 131072\n   * Maximum validated ``N`` size is 17\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Dimension Packing**: Optimizes the contraction dimension by folding ``N`` (number of heads) into ``D`` (head dimension) when beneficial, improving computational efficiency.\n\n2. **Efficient Tiling Strategy**: Uses carefully chosen tile sizes for processing batches and sequences to maximize hardware utilization.\n\n3. **LNC Sharding**: Supports sharding across multiple Logical Neuron Cores (LNCs) by dividing the hidden dimension, enabling processing of larger models.\n\n4. **Memory Access Optimization**: Employs optimized memory access patterns to maximize bandwidth utilization and minimize data movement.\n\n5. **PSUM Bank Utilization**: Efficiently utilizes PSUM banks for accumulating partial results during matrix multiplication operations.\n\n6. **Stream Shuffle Broadcast**: Uses stream shuffle broadcast for bias tensors to efficiently distribute them across processing elements.\n\n7. **Specialized Engine Selection**: Alternates between scalar and vector engines for tensor copy operations to balance workload and improve performance.\n\nSee Also\n-----------\n\n* :doc:`Output Projection TKG Kernel API Reference </nki/library/api/output-projection-tkg>`\n* :doc:`QKV Kernel API Reference </nki/library/api/qkv>`\n
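\nUsage Example\n----------------\n\nA minimal invocation is sketched below. The import path follows the module named above, and ``attention`` and ``weight`` are placeholder tensors shaped as described in the parameter list; ``bias`` and the quantization arguments keep their defaults.\n\n.. code-block:: python\n\n   # Assumed import path, following the module referenced above.\n   from nkilib.core.output_projection.output_projection_cte import output_projection_cte\n\n   # Placeholder inputs: attention [B, N, D, S] scores from an attention block, weight [N*D, H].\n   out = output_projection_cte(\n       attention=attention,\n       weight=weight,\n   )  # out: [B, S, H] in HBM\n"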
  },
  {
    "path": "nki/library/api/output-projection-tkg.rst",
    "content": ".. meta::\n    :description: Output Projection TKG kernel computes output projection optimized for Token Generation.\n    :date-modified: 11/28/2025\n\n.. currentmodule:: nkilib.core.output_projection.output_projection_tkg\n\nOutput Projection TKG Kernel API Reference\n===========================================\n\nComputes output projection (attention @ weight + bias) optimized for Token Generation (decode) use cases.\n\nThe kernel supports:\n\n* Efficient projection of attention outputs\n* Optional bias addition\n* LNC sharding for distributed computation\n* Optimized memory access patterns\n* Head dimension packing for improved performance\n* Flexible output tensor layouts\n* SBUF output option for kernel fusion\n\nBackground\n--------------\n\nThe ``Output Projection TKG`` kernel computes the operation ``out = attention @ weight + bias``, which is commonly used to project the output scores after an attention block in transformer models. This kernel is specifically optimized for Token Generation (Decode) use cases, where the sequence length ``S`` is small (often 1 or a small number for speculative decoding).\n\nThe kernel employs efficient tiling strategies and memory access patterns to maximize performance on Neuron hardware, with support for sharding across multiple Logical Neuron Cores (LNCs) to handle large hidden dimensions. When ``LNC>1``, the ``H`` dimension is sharded across the cores, which avoids the need for any inter-core collective operations as each core produces part of the output tensor.\n\nThe input layouts expected for this kernel are different from those for the CTE kernel. In TKG workloads, the ``S`` dimension is small, so placing the ``N`` dimension next to it allows more efficient GQA implementations by loading multiple heads at once.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `output_projection_tkg.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/output_projection/output_projection_tkg.py>`_\n\noutput_projection_tkg\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: output_projection_tkg(attention: nl.ndarray, weight: nl.ndarray, bias: Optional[nl.ndarray] = None, quantization_type: QuantizationType = QuantizationType.NONE, weight_scale: Optional[nl.ndarray] = None, input_scale: Optional[nl.ndarray] = None, TRANSPOSE_OUT=False, OUT_IN_SB=False)\n\n   Output Projection Kernel optimized for Token Generation (Decode) use cases.\n\n   This kernel computes ``out = attention @ weight + bias``, typically used to project the output scores after an attention block in transformer models.\n\n   This kernel is optimized for Token Generation (aka Decode) use cases where sequence length ``S`` is small.\n\n   :param attention: Input tensor in HBM or SBUF, typically the scores output from an attention block. Shape: ``[D, B, N, S]``, where ``D`` is head dimension, ``B`` is batch size, ``N`` is number of heads, and ``S`` is sequence length. Indexing: ``[d, b, n, s]``.\n   :type attention: ``nl.ndarray``\n   :param weight: Weight tensor in HBM. Shape: ``[N*D, H]``, where ``H`` is hidden dimension size. Indexing: ``[n * D + d, h]``.\n   :type weight: ``nl.ndarray``\n   :param bias: Optional bias tensor in HBM. Shape: ``[1, H]``. Indexing: ``[1, h]``.\n   :type bias: ``nl.ndarray``, optional\n   :param quantization_type: Type of quantization to apply. 
Default: QuantizationType.NONE.\n   :type quantization_type: ``QuantizationType``\n   :param weight_scale: Weight scale tensor for quantization.\n   :type weight_scale: ``nl.ndarray``, optional\n   :param input_scale: Input scale tensor for quantization.\n   :type input_scale: ``nl.ndarray``, optional\n   :param TRANSPOSE_OUT: Whether to store the output in transposed shape. If ``False``, output shape is ``[B*S, H]`` with indexing ``[b*S+s, h]``. If ``True``, output shape is ``[H_1, H_0, H_2, B*S]`` with indexing ``[h_1, h_0, h_2, b*S+s]``, where ``H_0 = logical core size (LNC)``, ``H_1 = 128``, ``H_2 = H/(H_0*H_1)``, such that ``h = h_0*H_1*H_2 + h_1*H_2 + h_2``.\n   :type TRANSPOSE_OUT: ``bool``\n   :param OUT_IN_SB: If ``True``, output is in SBUF. Else, it is written out to HBM.\n   :type OUT_IN_SB: ``bool``\n   :return: Output tensor in HBM or SBUF. Shape depends on ``TRANSPOSE_OUT`` parameter.\n   :rtype: ``nl.ndarray``\n\n   **Data Types**:\n     This kernel supports ``nl.float32``, ``nl.float16`` and ``nl.bfloat16`` data types.\n     However, for ``nl.float32``, large inputs may not fit in SBUF.\n\n   **Dimensions**:\n     * ``B``: Batch size\n     * ``N``: Number of heads\n     * ``S``: Sequence length\n     * ``H``: Hidden dimension size\n     * ``D``: Head dimension size\n\n   **Restrictions**:\n\n   * The contract dimension of input and weight tensors must match (``N*D == weight.shape[0]``)\n   * Hidden dimension (``H``) needs to be divisible by LNC size since LNC sharding is on the weight hidden dimension\n   * ``B*S`` must be <= 128\n   * Head dimension (``D``) must be <= 128\n   * When ``TRANSPOSE_OUT`` is ``False``, ``H`` must be a multiple of ``512*LNC``\n   * When ``TRANSPOSE_OUT`` is ``True``, ``H`` must be a multiple of ``128*LNC``\n   * When ``TRANSPOSE_OUT`` is ``True`` and using 32-bit floats, ``N*H`` must be <= 81920\n   * When ``TRANSPOSE_OUT`` is ``True`` and using 16-bit floats, ``N*H`` must be <= 163840\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Dimension Packing**: Optimizes the contraction dimension by folding ``N`` (number of heads) into ``D`` (head dimension) when beneficial, improving computational efficiency.\n\n2. **Efficient Tiling Strategy**: Uses carefully chosen tile sizes for processing batches and sequences to maximize hardware utilization.\n\n3. **LNC Sharding**: Supports sharding across multiple Logical Neuron Cores (LNCs) by dividing the hidden dimension, enabling processing of larger models.\n\n4. **Memory Access Optimization**: Employs optimized memory access patterns to maximize bandwidth utilization and minimize data movement.\n\n5. **PSUM Bank Utilization**: Efficiently utilizes PSUM banks for accumulating partial results during matrix multiplication operations.\n\n6. **Stream Shuffle Broadcast**: Uses stream shuffle broadcast for bias tensors to efficiently distribute them across processing elements.\n\n7. **Flexible Output Layouts**: Supports both standard and transposed output layouts to accommodate different downstream kernel requirements.\n\n8. **SBUF Output Option**: Provides the option to keep output in SBUF for fusion with subsequent operations.\n\n9. 
**Block-based Weight Loading**: Uses block-based loading of weights to encourage prefetching and improve memory access patterns.\n\nSee Also\n-----------\n\n* :doc:`Output Projection CTE Kernel API Reference </nki/library/api/output-projection-cte>`\n* :doc:`QKV Kernel API Reference </nki/library/api/qkv>`\n
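\nUsage Example\n----------------\n\nA minimal decode-time invocation is sketched below. The import path follows the module named above, and ``attention`` and ``weight`` are placeholder tensors shaped as described in the parameter list; all optional arguments keep their defaults, so the output is written to HBM in the non-transposed layout.\n\n.. code-block:: python\n\n   # Assumed import path, following the module referenced above.\n   from nkilib.core.output_projection.output_projection_tkg import output_projection_tkg\n\n   # Placeholder inputs: attention [D, B, N, S] with small S (e.g. S == 1), weight [N*D, H].\n   out = output_projection_tkg(\n       attention=attention,\n       weight=weight,\n   )  # out: [B*S, H] in HBM (TRANSPOSE_OUT=False, OUT_IN_SB=False)\n"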
  },
  {
    "path": "nki/library/api/qkv.rst",
    "content": ".. meta::\n    :description: QKV kernel performs Query-Key-Value projection with optional normalization and RoPE fusion.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.qkv\n\nQKV Kernel API Reference\n==================================\n\nPerforms Query-Key-Value projection with optional normalization and RoPE fusion.\n\nThe kernel supports:\n\n* Optional RMSNorm/LayerNorm fusion\n* Multiple output tensor layouts\n* Residual connections from previous MLP and attention outputs\n* Automatic selection between TKG and CTE implementations based on batch_size * seqlen threshold\n* Optional RoPE (Rotary Position Embedding) fusion\n* Fused FP8 KV cache quantization\n* Block-based KV cache layout support\n* MX quantization support (CTE mode only)\n\nBackground\n-----------\n\nThe ``QKV`` kernel is a critical component in transformer architectures, responsible for projecting the input hidden states into query, key, and value representations. This kernel optimizes the projection operation by fusing it with optional normalization and supporting various output layouts to accommodate different transformer implementations.\n\n.. note::\n    This kernel automatically selects between TKG (Token Generation) and CTE (Context Encoding) implementations based on sequence length, ensuring optimal performance across different use cases. CTE is used for longer sequences, while TKG is optimized for shorter sequences.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `qkv.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/qkv/qkv.py>`_\n\nqkv\n^^^\n\n.. py:function:: qkv(input: nl.ndarray, fused_qkv_weights: nl.ndarray, output_layout: QKVOutputLayout = QKVOutputLayout.BSD, bias: Optional[nl.ndarray] = None, quantization_type: QuantizationType = QuantizationType.NONE, qkv_w_scale: Optional[nl.ndarray] = None, qkv_in_scale: Optional[nl.ndarray] = None, fused_residual_add: Optional[bool] = False, mlp_prev: Optional[nl.ndarray] = None, attention_prev: Optional[nl.ndarray] = None, fused_norm_type: NormType = NormType.NO_NORM, gamma_norm_weights: Optional[nl.ndarray] = None, layer_norm_bias: Optional[nl.ndarray] = None, norm_eps: float = 1e-6, hidden_actual: Optional[int] = None, fused_rope: Optional[bool] = False, cos_cache: Optional[nl.ndarray] = None, sin_cache: Optional[nl.ndarray] = None, d_head: Optional[int] = None, num_q_heads: Optional[int] = None, num_kv_heads: Optional[int] = None, k_cache: Optional[nl.ndarray] = None, v_cache: Optional[nl.ndarray] = None, k_scale: Optional[nl.ndarray] = None, v_scale: Optional[nl.ndarray] = None, fp8_max: Optional[float] = None, fp8_min: Optional[float] = None, kv_dtype: Optional[type] = None, use_block_kv: bool = False, block_size: Optional[int] = None, slot_mapping: Optional[nl.ndarray] = None, store_output_in_sbuf: bool = False, sbm: Optional[SbufManager] = None, use_auto_allocation: bool = False, load_input_with_DMA_transpose: bool = True, is_input_swizzled: bool = False) -> nl.ndarray\n\n   QKV (Query, Key, Value) projection kernel with multiple optional fused operations.\n    \n   Performs matrix multiplication between hidden states and fused QKV weights matrix with optional\n   fused operations including residual addition, normalization, bias addition, and RoPE rotation.\n   Automatically selects between TKG and CTE implementations based on sequence length.\n\n   :param input: Input hidden states tensor. 
Shape: [B, S, H] where B=batch, S=sequence_length, H=hidden_dim.\n   :type input: ``nl.ndarray``\n   :param fused_qkv_weights: Fused QKV weight matrix. Shape: [H, I] where I=fused_qkv_dim=(num_q_heads + 2*num_kv_heads)*d_head.\n   :type fused_qkv_weights: ``nl.ndarray``\n   :param output_layout: Output tensor layout. QKVOutputLayout.BSD=[B, S, I] or QKVOutputLayout.NBSd=[num_heads, B, S, d_head]. Default: QKVOutputLayout.BSD.\n   :type output_layout: ``QKVOutputLayout``\n   :param bias: Bias tensor to add to QKV projection output. Shape: [1, I].\n   :type bias: ``nl.ndarray``, optional\n   :param quantization_type: Type of quantization to apply. Default: QuantizationType.NONE.\n   :type quantization_type: ``QuantizationType``\n   :param qkv_w_scale: Weight scale tensor for quantization.\n   :type qkv_w_scale: ``nl.ndarray``, optional\n   :param qkv_in_scale: Input scale tensor for quantization.\n   :type qkv_in_scale: ``nl.ndarray``, optional\n   :param fused_residual_add: Whether to perform residual addition: input = input + mlp_prev + attention_prev. Default: False.\n   :type fused_residual_add: ``bool``, optional\n   :param mlp_prev: Previous MLP output tensor for residual addition. Shape: [B, S, H].\n   :type mlp_prev: ``nl.ndarray``, optional\n   :param attention_prev: Previous attention output tensor for residual addition. Shape: [B, S, H].\n   :type attention_prev: ``nl.ndarray``, optional\n   :param fused_norm_type: Type of normalization (NO_NORM, RMS_NORM, RMS_NORM_SKIP_GAMMA, LAYER_NORM). Default: NormType.NO_NORM.\n   :type fused_norm_type: ``NormType``\n   :param gamma_norm_weights: Normalization gamma/scale weights. Shape: [1, H]. Required for RMS_NORM and LAYER_NORM.\n   :type gamma_norm_weights: ``nl.ndarray``, optional\n   :param layer_norm_bias: Layer normalization beta/bias weights. Shape: [1, H]. Only for LAYER_NORM.\n   :type layer_norm_bias: ``nl.ndarray``, optional\n   :param norm_eps: Epsilon value for numerical stability in normalization. Default: 1e-6.\n   :type norm_eps: ``float``, optional\n   :param hidden_actual: Actual hidden dimension for padded tensors (if H contains padding).\n   :type hidden_actual: ``int``, optional\n   :param fused_rope: Whether to apply RoPE rotation to Query and Key heads after QKV projection. Default: False.\n   :type fused_rope: ``bool``, optional\n   :param cos_cache: Cosine cache for RoPE. Shape: [B, S, d_head]. Required if fused_rope=True.\n   :type cos_cache: ``nl.ndarray``, optional\n   :param sin_cache: Sine cache for RoPE. Shape: [B, S, d_head]. Required if fused_rope=True.\n   :type sin_cache: ``nl.ndarray``, optional\n   :param d_head: Dimension per attention head. Required for QKVOutputLayout.NBSd and RoPE.\n   :type d_head: ``int``, optional\n   :param num_q_heads: Number of query heads. Required for RoPE.\n   :type num_q_heads: ``int``, optional\n   :param num_kv_heads: Number of key/value heads. Required for RoPE.\n   :type num_kv_heads: ``int``, optional\n   :param k_cache: Key cache tensor for fused FP8 KV cache quantization. Shape: ``[B, max_seq_len, kv_dim]``. Required when ``k_scale`` and ``v_scale`` are provided.\n   :type k_cache: ``nl.ndarray``, optional\n   :param v_cache: Value cache tensor for fused FP8 KV cache quantization. Shape: ``[B, max_seq_len, kv_dim]``. Required when ``k_scale`` and ``v_scale`` are provided.\n   :type v_cache: ``nl.ndarray``, optional\n   :param k_scale: Key quantization scale for FP8 KV cache quantization. 
Enables KV output quantization when both ``k_scale`` and ``v_scale`` are provided.\n   :type k_scale: ``nl.ndarray``, optional\n   :param v_scale: Value quantization scale for FP8 KV cache quantization. Enables KV output quantization when both ``k_scale`` and ``v_scale`` are provided.\n   :type v_scale: ``nl.ndarray``, optional\n   :param fp8_max: Maximum FP8 value for clamping during KV cache quantization. Defaults to the maximum positive value of ``kv_dtype``.\n   :type fp8_max: ``float``, optional\n   :param fp8_min: Minimum FP8 value for clamping during KV cache quantization. Defaults to the negative of ``fp8_max``.\n   :type fp8_min: ``float``, optional\n   :param kv_dtype: Data type for quantized KV cache output. Defaults to the input tensor dtype if not specified.\n   :type kv_dtype: ``type``, optional\n   :param use_block_kv: Whether to use block-based KV cache layout. When ``True``, requires ``block_size`` and ``slot_mapping``. Default: False.\n   :type use_block_kv: ``bool``\n   :param block_size: Number of tokens per block in block KV cache. Required when ``use_block_kv=True``.\n   :type block_size: ``int``, optional\n   :param slot_mapping: Mapping from token positions to block slots for block KV cache. Required when ``use_block_kv=True``.\n   :type slot_mapping: ``nl.ndarray``, optional\n   :param store_output_in_sbuf: Whether to store output in SBUF (currently unsupported, must be False). Default: False.\n   :type store_output_in_sbuf: ``bool``\n   :param sbm: Optional SBUF manager for memory allocation control with pre-specified bounds for SBUF usage.\n   :type sbm: ``SbufManager``, optional\n   :param use_auto_allocation: Whether to use automatic SBUF allocation. Default: False.\n   :type use_auto_allocation: ``bool``\n   :param load_input_with_DMA_transpose: Whether to use DMA transpose optimization. Default: True.\n   :type load_input_with_DMA_transpose: ``bool``\n   :param is_input_swizzled: Whether the input tensor is swizzled (only applicable with MX Quantization). Default: False.\n   :type is_input_swizzled: ``bool``\n   :return: QKV projection output tensor with shape determined by output_layout.\n   :rtype: ``nl.ndarray``\n\n   **Raises**:\n\n   * **ValueError** – Raised when contract dimension mismatch occurs between ``input`` and ``fused_qkv_weights``.\n   * **AssertionError** – Raised when required parameters for fused operations are missing or have incorrect shapes.\n\nImplementation Details\n-----------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Automatic Implementation Selection**: The kernel automatically selects between TKG (Token Generation) and CTE (Context Encoding) implementations based on sequence length. Some features like RoPE fusion and loading input with DMA transpose are only available in CTE mode. TKG mode only supports automatic allocation at the moment.\n\n2. **Fused Operations Support**: \n   \n   - **Residual Addition**: Fuses ``input`` + ``mlp_prev`` + ``attention_prev``\n   - **Normalization**: Supports RMSNorm, LayerNorm, and ``RMS_NORM_SKIP_GAMMA``\n   - **Bias Addition**: Adds bias to QKV projection output\n   - **RoPE Fusion**: Applies Rotary Position Embedding to Query and Key heads\n   - **FP8 KV Cache Quantization**: Quantizes K and V outputs directly into the KV cache, avoiding a separate quantization step. Enabled when ``k_scale`` and ``v_scale`` are provided. 
Only supported with BSD output layout.\n   - **Block KV Cache**: Supports block-based KV cache layout with indirect addressing via ``slot_mapping`` for variable-length sequences.\n\n3. **Flexible Output Layouts**: Supports BSD (``[B, S, I]``) and NBSd (``[num_heads, B, S, d_head]``) output tensor layouts.\n\n4. **Memory Management**: \n   \n   - Optional SBUF manager for controlled memory allocation\n   - DMA transpose optimization for weight loading\n\n5. **Hardware Compatibility**: Supports bf16, fp16, and fp32 data types (fp32 inputs are internally converted to bf16).\n\n6. **Constraints**: \n   \n   - H must be ≤ 24576 and divisible by 128\n   - I must be ≤ 4096\n   - For NBSd output: d_head must equal 128\n   - FP8 KV cache quantization requires BSD output layout\n   - Block KV cache requires ``block_size`` and ``slot_mapping``\n\n
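Usage Example\n----------------\n\nA minimal invocation is sketched below. The import path is assumed to mirror the source file location referenced above, and ``hidden_states`` and ``fused_qkv_weights`` are placeholder tensors shaped as described in the parameter list; no fused operations are enabled and the default ``BSD`` output layout is used.\n\n.. code-block:: python\n\n   # Assumed import path, mirroring the source file location above.\n   from nkilib.core.qkv.qkv import qkv\n\n   # Placeholder inputs: hidden_states [B, S, H],\n   # fused_qkv_weights [H, (num_q_heads + 2*num_kv_heads) * d_head].\n   qkv_out = qkv(\n       input=hidden_states,\n       fused_qkv_weights=fused_qkv_weights,\n   )  # qkv_out: [B, S, I] with the default BSD layout\n"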
  },
  {
    "path": "nki/library/api/rmsnorm-quant.rst",
    "content": ".. meta::\n    :description: RMSNorm-Quant kernel performs optional RMS normalization followed by fp8 quantization.\n    :date-modified: 10/28/2025\n\n.. currentmodule:: nkilib.core.rmsnorm_quant.rmsnorm_quant\n\nRMSNorm-Quant Kernel API Reference\n==================================\n\nPerforms optional RMS normalization followed by quantization to fp8.\n\nThe kernel supports:\n\n* Optional RMS normalization before quantization\n* 8-bit quantization along the last dimension of the input tensor\n* Single program multiple data (SPMD) sharding for distributed computation\n* Flexible input tensor shapes (minimum 2 dimensions)\n* Input validation with configurable dimension limits\n* Lower bound clipping for numerical stability\n\nBackground\n--------------\n\nThe ``RMSNorm-Quant`` kernel processes tensors along their last dimension (processing dimension), with all other dimensions collapsed into a single outer dimension. This design allows for efficient processing of tensors with arbitrary shapes, as long as they have at least 2 dimensions.\n\nFor detailed information about the mathematical operations and implementation details, refer to the :doc:`RMSNorm-Quant Kernel Design Specification </nki/library/specs/design-rmsnorm-quant>`.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `rmsnorm_quant.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/rmsnorm/rmsnorm_quant.py>`_\n\nrmsnorm_quant_kernel\n^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: rmsnorm_quant_kernel(hidden: nl.ndarray, ln_w: nl.ndarray, kargs: RmsNormQuantKernelArgs, input_dequant_scale: nl.ndarray = None)\n\n   Entrypoint NKI kernel that performs one of the following:\n   \n   1. Perform RMSNorm and quantize the normalized hidden over the hidden dimension (``H``, or ``axis=-1``).\n   2. Quantize hidden over dimension ``H``.\n\n   The kernel supports no specialization, or specialization along 1 dimension (1D SPMD grid).\n\n   :param hidden: Input hidden states tensor with minimum 2 dimensions. For 3D inputs, expected layout is ``[B, S, H]``. For 2D inputs, layout is ``[outer_dim, processing_dim]`` where outer_dim is the product of all major dimensions.\n   :type hidden: ``nl.ndarray``\n   :param ln_w: Gamma multiplicative bias vector with ``[H]`` or ``[1, H]`` layout. Required when RMS normalization is enabled.\n   :type ln_w: ``nl.ndarray``\n   :param kargs: Kernel arguments specifying normalization type, bounds, and epsilon values. See :py:class:`RmsNormQuantKernelArgs` for details.\n   :type kargs: ``RmsNormQuantKernelArgs``\n   :param input_dequant_scale: Optional dequantization scale for input tensor.\n   :type input_dequant_scale: ``nl.ndarray``, optional\n   :return: Output tensor with shape ``[..., H + 4]`` on HBM where the last dimension is extended by 4 elements. The first H elements store the possibly normalized and quantized tensor, while the last 4 elements store fp8 floats that can be reinterpreted as fp32 dequantization scales.\n   :rtype: ``nl.ndarray``\n\n   **Constraints**:\n\n   * Input tensor must have at least 2 dimensions\n   * For 3D inputs: batch dimension ≤ MAX_B, sequence length ≤ MAX_S, hidden dimension ≤ MAX_H\n   * For 2D inputs: processing dimension ≤ MAX_H, outer dimension ≤ MAX_B × MAX_S\n   * When RMS normalization is enabled, ln_w must have shape [H] or [1, H] where H matches the processing dimension\n\nRmsNormQuantKernelArgs\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
py:class:: RmsNormQuantKernelArgs\n\n   RMS Norm Quantization Kernel arguments.\n\n   .. py:attribute:: lower_bound\n      :type: float\n\n      Non-negative float used for clipping input values and scale.\n\n   .. py:attribute:: norm_type\n      :type: NormType\n      :value: NormType.RMS_NORM\n\n      Normalization type to use [``RMS_NORM``, ``NO_NORM``]\n\n   .. py:attribute:: quantization_type\n      :type: QuantizationType\n      :value: QuantizationType.ROW\n\n      Quantization type to use [``ROW``, ``STATIC``]\n\n   .. py:attribute:: eps\n      :type: float\n      :value: 1e-6\n\n      Epsilon value for numerical stability, model hyperparameter\n\n   .. py:method:: needs_rms_normalization() -> bool\n\n      Returns True if RMS normalization should be applied, False otherwise.\n\n   .. py:method:: has_lower_bound() -> bool\n\n      Returns True if a positive lower bound is specified, False otherwise.\n\n   **Raises**:\n\n   * **AssertionError** – Raised when unsupported normalization types are used, negative bounds are provided, or invalid epsilon values are specified.\n   * Supports 1D SPMD grid or no specialization\n\n   .. note::\n      The autocast argument may NOT be respected properly. The kernel automatically handles dimension validation and provides detailed error messages for constraint violations.\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Input Tensor Outer Dimension Collapse**: All major dimensions are collapsed into one for simplification, allowing the kernel to process along the minor dimension efficiently.\n\n2. **Tiling**: The kernel is tiled on the major dimension by a size equal to the hardware's maximum partition dimension, ensuring full utilization of the hardware engines' input width.\n\n3. **SBUF/PSUM Allocation**: Uses Stack Allocator for consistent and deterministic memory allocations within the kernel scope.\n\n4. **SPMD Sharding**: Supports splitting computation across the constituent cores of a Logical Neuron Core by sharding on the outer-most dimension with automatic load balancing for non-divisible dimensions.\n\n5. **Gamma Broadcast**: Improves pipeline parallelism by distributing work to the TensorEngine through matrix multiplication against a vector of ones.\n\n6. **Activation Reduce**: Uses specialized instructions to perform reduce-add operations efficiently along with square operations.\n\n7. **Optimized Batch Processing**: Processes tiles in batches of 8 for improved efficiency, with remainder handling for non-divisible cases.\n\n8. **Input Validation**: Comprehensive validation of tensor dimensions against hardware limits (MAX_B, MAX_S, MAX_H) with detailed error messages.\n\n9. **Numerical Stability**: Implements lower bound clipping and minimum dequantization scale clamping to prevent numerical instabilities.\n\nSee Also\n-----------\n\n* :doc:`RMSNorm-Quant Kernel Design Specification </nki/library/specs/design-rmsnorm-quant>`\n"
  },
  {
    "path": "nki/library/api/rope.rst",
    "content": ".. meta::\n    :description: RoPE kernel applies Rotary Position Embedding to input embeddings.\n    :date-modified: 01/21/2026\n\n.. currentmodule:: nkilib.core.rope\n\nRoPE Kernel API Reference\n==========================\n\nApplies Rotary Position Embedding (RoPE) to input embeddings, encoding positional information by rotating embedding dimension pairs using precomputed sine/cosine frequencies.\n\nThe kernel supports:\n\n* Efficient position encoding without absolute position embeddings\n* Optional LNC sharding for parallelization across cores\n* Flexible memory layouts (contiguous or interleaved)\n* Layout conversion strategies (DMA strided access or SBUF matmul)\n* Standalone operation with HBM I/O\n* SBUF-only operation for megakernel fusion\n\nBackground\n--------------\n\nThe ``RoPE`` kernel implements Rotary Position Embedding, which encodes positional information by rotating pairs of embedding dimensions using precomputed sine/cosine frequencies. This approach enables position-aware attention mechanisms without requiring absolute position embeddings.\n\nThe kernel applies the following transformation:\n\n* ``out[even] = x[even] * cos - x[odd] * sin``\n* ``out[odd] = x[odd] * cos + x[even] * sin``\n\nThe kernel supports two memory layouts for the head dimension: contiguous (first half, second half) and interleaved (even, odd, even, odd). Layout conversion can be performed using either strided DMA access or SBUF matmul operations.\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `rope.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/embeddings/rope.py>`_\n\nRoPE\n^^^^^^^^^^^^^^^\n\n.. py:function:: RoPE(x_in, cos, sin, lnc_shard=False, contiguous_layout=True, relayout_in_sbuf=False)\n\n   Apply Rotary Position Embedding (RoPE) to input embeddings.\n   Standalone kernel with HBM I/O and optional LNC sharding.\n\n   :param x_in: Input embeddings tensor with shape ``[d_head, B, n_heads, S]`` in HBM\n   :type x_in: ``nl.ndarray``\n   :param cos: Cosine frequencies tensor with shape ``[d_head//2, B, S]`` in HBM\n   :type cos: ``nl.ndarray``\n   :param sin: Sine frequencies tensor with shape ``[d_head//2, B, S]`` in HBM\n   :type sin: ``nl.ndarray``\n   :param lnc_shard: Parallelize across LNC cores by tiling sequence dimension. Default is ``False``.\n   :type lnc_shard: ``bool``, optional\n   :param contiguous_layout: Memory layout in d_head dimension. ``True`` for ``[first_half, second_half]`` (default, more efficient), ``False`` for ``[even, odd, even, odd, ...]`` (interleaved).\n   :type contiguous_layout: ``bool``, optional\n   :param relayout_in_sbuf: Use SBUF matmul for layout conversion (only for small tensors). Default is ``False``.\n   :type relayout_in_sbuf: ``bool``, optional\n   :return: RoPE applied output tensor with shape ``[d_head, B, n_heads, S]`` in HBM\n   :rtype: ``nl.ndarray``\n\n   **Constraints**:\n\n   * Head dimension (``d_head``) must be 64 or 128\n   * Batch size (``B``) must be in range (0, 64]\n   * Sequence length (``S``) must be in range (0, 512]\n   * Number of heads (``n_heads``) must be in range (0, 16]\n   * When ``lnc_shard=True``, sequence length must be divisible by number of programs\n   * SBUF relayout (``relayout_in_sbuf=True``) requires ``B * n_heads * S <= gemm_moving_fmax``\n\nRoPE_sbuf\n^^^^^^^^^^^^^^^\n\n.. 
py:function:: RoPE_sbuf(x_in_sb, cos_sb, sin_sb, x_out_sb, convert_from_interleaved=False)\n\n   Apply RoPE on tensors in SBUF (for megakernel fusion).\n   Helper function that operates entirely in SBUF without HBM I/O.\n\n   :param x_in_sb: Input embeddings tensor with shape ``[d_head, B, n_heads, S]`` in SBUF\n   :type x_in_sb: ``nl.ndarray``\n   :param cos_sb: Cosine frequencies tensor with shape ``[d_head//2, B, S]`` in SBUF\n   :type cos_sb: ``nl.ndarray``\n   :param sin_sb: Sine frequencies tensor with shape ``[d_head//2, B, S]`` in SBUF\n   :type sin_sb: ``nl.ndarray``\n   :param x_out_sb: Output buffer tensor with shape ``[d_head, B, n_heads, S]`` in SBUF\n   :type x_out_sb: ``nl.ndarray``\n   :param convert_from_interleaved: Convert from interleaved to contiguous layout (only for small tensors: ``B * n_heads * S <= gemm_moving_fmax``). Default is ``False``.\n   :type convert_from_interleaved: ``bool``, optional\n   :return: Output tensor with RoPE applied (modified in-place)\n   :rtype: ``nl.ndarray``\n\n   **Constraints**:\n\n   * Assumes contiguous layout unless ``convert_from_interleaved=True``\n   * For large tensors with interleaved layout, use ``RoPE()`` with strided DMA\n   * Input and output tensors must have matching dtypes\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Layout Conversion Strategies**: Supports two methods for converting between contiguous and interleaved layouts:\n   \n   * **DMA Strided Access**: Uses strided DMA operations with step=2 to gather/scatter even and odd indices separately. Suitable for all tensor sizes.\n   * **SBUF Matmul**: Uses matrix multiplication with a permutation matrix for layout conversion. Limited to small tensors where ``B * n_heads * S <= gemm_moving_fmax``.\n\n2. **LNC Sharding**: Supports parallelization across Logical NeuronCore (LNC) cores by tiling the sequence dimension. Each core processes a tile of size ``S // n_prgs``.\n\n3. **Efficient Tensor Operations**: Uses ``tensor_tensor`` operations with TensorView broadcasting to efficiently apply cos/sin coefficients across the n_heads dimension.\n\n4. **Memory Management**: Carefully manages SBUF allocations for intermediate buffers including separate storage for odd half elements to satisfy tensor_tensor alignment requirements.\n\n5. **Permutation Matrix Generation**: For SBUF layout conversion, generates a permutation matrix using strided access on an identity matrix, enabling efficient transformation via matrix multiplication.\n\n\n\nSee Also\n-----------\n\n* :doc:`RoPE HuggingFace Kernel API Reference </nki/library/api/rope-hf>`\n"
  },
  {
    "path": "nki/library/api/router-topk.rst",
    "content": ".. meta::\n    :description: Router Top-K kernel computes router logits and performs top-K selection for MoE models.\n    :date-modified: 01/21/2026\n\n.. currentmodule:: nkilib.core.router_topk\n\nRouter Top-K Kernel API Reference\n==================================\n\nComputes router logits, applies activation functions, and performs top-K selection with expert affinity scattering for Mixture of Experts (MoE) models.\n\nThe kernel supports:\n\n* Router logits computation (x @ w + bias)\n* Activation functions (SOFTMAX, SIGMOID)\n* Top-K expert selection (K ≤ 8)\n* Expert affinity scattering (one-hot or indirect DMA)\n* Multiple layout configurations and optimization modes\n* Column tiling for small token counts\n* LNC sharding across token dimension\n* Pre-norm and post-norm activation pipelines\n* L1 normalization of top-K probabilities\n\nBackground\n--------------\n\nThe ``Router Top-K`` kernel is a core component of Mixture of Experts (MoE) models, responsible for routing tokens to the most relevant experts. The kernel computes router logits by multiplying input tokens with a weight matrix, applies activation functions, selects the top-K experts for each token, and scatters the expert affinities to the full expert dimension.\n\nThe kernel is optimized for token counts T ≤ 2048, expert counts E ≤ 512, hidden dimensions H that are multiples of 128, and K ≤ 8 top experts per token. It supports both context encoding (CTE) with larger T and token generation (TKG) with T ≤ 128.\n\n**Pipeline Configurations**:\n\nThe kernel supports multiple pipeline configurations:\n\n1. **(topK, ACT2, Scatter)**: Standard pipeline with post-topK activation\n2. **(ACT1, topK)**: Pre-norm activation before topK selection\n3. **(ACT1, topK, Norm, Scatter)**: Pre-norm with L1 normalization and scatter\n\nAPI Reference\n----------------\n\n**Source code for this kernel API can be found at**: `router_topk.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/core/router_topk/router_topk.py>`_\n\nrouter_topk\n^^^^^^^^^^^^^^^\n\n.. py:function:: router_topk(x, w, w_bias, router_logits, expert_affinities, expert_index, act_fn, k, x_hbm_layout, x_sb_layout, output_in_sbuf=False, router_pre_norm=True, norm_topk_prob=False, use_column_tiling=False, use_indirect_dma_scatter=False, return_eager_affi=False, use_PE_broadcast_w_bias=False, shard_on_tokens=False, skip_store_expert_index=False, skip_store_router_logits=False, x_input_in_sbuf=False, expert_affin_in_sb=False)\n\n   Router top-K kernel for Mixture of Experts (MoE) models.\n\n   Computes router logits (x @ w + bias), applies activation functions, performs top-K selection,\n   and scatters expert affinities. Supports multiple layout configurations, sharding strategies,\n   and optimization modes.\n\n   :param x: Input tensor. Shape depends on ``x_hbm_layout`` and ``x_input_in_sbuf``. If in HBM: ``[H, T]`` or ``[T, H]``. 
If in SBUF: a permutation of ``[128, T, H/128]``.\n   :type x: ``nl.ndarray``\n   :param w: Weight tensor with shape ``[H, E]`` in HBM\n   :type w: ``nl.ndarray``\n   :param w_bias: Optional bias tensor with shape ``[1, E]`` or ``[E]`` in HBM\n   :type w_bias: ``nl.ndarray``\n   :param router_logits: Output router logits with shape ``[T, E]`` in HBM\n   :type router_logits: ``nt.mutable_tensor``\n   :param expert_affinities: Output expert affinities with shape ``[T, E]`` in HBM or SBUF\n   :type expert_affinities: ``nt.mutable_tensor``\n   :param expert_index: Output expert indices with shape ``[T, K]`` in HBM or SBUF\n   :type expert_index: ``nt.mutable_tensor``\n   :param act_fn: Activation function (SOFTMAX or SIGMOID)\n   :type act_fn: ``common_types.RouterActFnType``\n   :param k: Number of top experts to select (must be ≤ 8)\n   :type k: ``int``\n   :param x_hbm_layout: Layout of x in HBM (0=[H,T], 1=[T,H])\n   :type x_hbm_layout: ``int``\n   :param x_sb_layout: Layout of x in SBUF (0-3, see notes for details)\n   :type x_sb_layout: ``int``\n   :param output_in_sbuf: If True, outputs are in SBUF (requires T ≤ 128). Default is False.\n   :type output_in_sbuf: ``bool``, optional\n   :param router_pre_norm: If True, apply activation before top-K (ACT1 pipeline). Default is True.\n   :type router_pre_norm: ``bool``, optional\n   :param norm_topk_prob: If True, normalize top-K probabilities with L1 norm. Default is False.\n   :type norm_topk_prob: ``bool``, optional\n   :param use_column_tiling: Enable PE array column tiling for small T. Default is False.\n   :type use_column_tiling: ``bool``, optional\n   :param use_indirect_dma_scatter: Use indirect DMA for expert affinity scatter. Default is False.\n   :type use_indirect_dma_scatter: ``bool``, optional\n   :param return_eager_affi: If True, return top-K affinities in addition to scattered. Default is False.\n   :type return_eager_affi: ``bool``, optional\n   :param use_PE_broadcast_w_bias: Use tensor engine for bias broadcast. Default is False.\n   :type use_PE_broadcast_w_bias: ``bool``, optional\n   :param shard_on_tokens: Enable LNC sharding across token dimension. Default is False.\n   :type shard_on_tokens: ``bool``, optional\n   :param skip_store_expert_index: Skip storing expert indices to HBM. Default is False.\n   :type skip_store_expert_index: ``bool``, optional\n   :param skip_store_router_logits: Skip storing router logits to HBM. Default is False.\n   :type skip_store_router_logits: ``bool``, optional\n   :param x_input_in_sbuf: If True, x is already in SBUF. Default is False.\n   :type x_input_in_sbuf: ``bool``, optional\n   :param expert_affin_in_sb: If True, expert affinities output is in SBUF. 
Default is False.\n   :type expert_affin_in_sb: ``bool``, optional\n   :return: List of ``[router_logits, expert_index, expert_affinities, optional: expert_affinities_topk]``\n   :rtype: ``list``\n\n   **Dimensions**:\n\n   * T: Total number of tokens\n   * H: Hidden dimension size\n   * E: Number of experts\n   * K: Number of top experts to select per token\n\n   **Constraints**:\n\n   * K must be ≤ 8\n   * E must be ≤ 512 (gemm_moving_fmax)\n   * H must be a multiple of 128\n   * SIGMOID activation requires ``use_indirect_dma_scatter=True``\n   * ``router_pre_norm`` requires ``use_indirect_dma_scatter=True``\n   * With ``use_indirect_dma_scatter``, T must be ≤ 128 or multiple of 128\n   * ``shard_on_tokens`` requires n_prgs > 1 and T divisible by 2\n   * ``output_in_sbuf`` requires T ≤ 128\n\n   **SBUF Layout Options** (``x_sb_layout``):\n\n   * 0: ``[128, T, H/128]`` - P-dim contains H elements with stride of H/128\n   * 1: ``[128, T, H/128]`` - P-dim with H/256 chunk interleaving\n   * 2: ``[128, T, H/128]`` - P-dim contains consecutive H elements\n   * 3: ``[128, H/128, T]`` - H-tiles in dim-1, T in dim-2\n\nrouter_topk_input_x_load\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: router_topk_input_x_load(x, hbm_layout=0, sb_layout=1)\n\n   Load input tensor x from HBM to SBUF with specified layout transformations.\n\n   Performs DMA transfer from HBM to SBUF with layout conversion based on hbm_layout\n   and sb_layout parameters. Supports multiple layout combinations optimized for\n   different access patterns in subsequent matmul operations.\n\n   :param x: Input tensor in HBM. Shape ``[H, T]`` if hbm_layout=0, ``[T, H]`` if hbm_layout=1\n   :type x: ``nl.ndarray``\n   :param hbm_layout: Layout of x in HBM (0=[H,T], 1=[T,H]). Default is 0.\n   :type hbm_layout: ``int``, optional\n   :param sb_layout: Target layout in SBUF (0-3). Default is 1.\n   :type sb_layout: ``int``, optional\n   :return: Input tensor in SBUF with transformed layout\n   :rtype: ``nl.ndarray``\n\n   **Constraints**:\n\n   * H must be a multiple of 128\n   * Supported combinations: (hbm_layout=0, sb_layout=3) and (hbm_layout=1, sb_layout=0/1/2)\n\nrouter_topk_input_w_load\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: router_topk_input_w_load(w, x_sb_layout, name='')\n\n   Load weight tensor w from HBM to SBUF with layout matching x tensor.\n\n   :param w: Weight tensor with shape ``[H, E]`` in HBM\n   :type w: ``nl.ndarray``\n   :param x_sb_layout: Layout of x in SBUF (determines w layout)\n   :type x_sb_layout: ``int``\n   :param name: Optional name for the tensor. Default is empty string.\n   :type name: ``str``, optional\n   :return: Weight tensor in SBUF with appropriate layout\n   :rtype: ``nl.ndarray``\n\nImplementation Details\n-------------------------\n\nThe kernel implementation includes several key optimizations:\n\n1. **Tiled Matrix Multiplication**: Tiles computation on both H (contraction dimension) and T (token dimension) for efficient memory access and hardware utilization.\n\n2. **PE Array Column Tiling**: For small token counts (T < 128), splits the PE array column-wise into multiple tiles (32, 64, or 128 columns) to enable parallel execution of independent matmuls.\n\n3. **LNC Sharding**: Supports parallelization across 2 cores by sharding the token dimension. Each core processes T/2 tokens with automatic load balancing for non-divisible token counts.\n\n4. 
**Bias Broadcasting**: Supports two methods for bias application:\n   \n   * Stream shuffle broadcast (default)\n   * Tensor engine matmul with ones mask (``use_PE_broadcast_w_bias=True``)\n\n5. **Top-K Selection**: Uses hardware-accelerated ``max8`` and ``nc_find_index8`` instructions to efficiently find top-8 values and their indices.\n\n6. **Expert Affinity Scattering**: Supports two scattering methods:\n   \n   * **One-hot scatter**: Uses mask-based selection with element-wise operations\n   * **Indirect DMA scatter**: Uses dynamic indexing for efficient scatter to HBM\n\n7. **Activation Pipelines**: Supports multiple activation pipeline configurations (ACT1, ACT2) with optional L1 normalization.\n\n8. **Memory Management**: Carefully manages SBUF allocations with modular allocation and buffer reuse for intermediate tensors.\n\nSee Also\n-----------\n\n* :doc:`Router Top-K PyTorch Reference </nki/library/api/router-topk-torch>`\n
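\nReference Semantics (Sketch)\n------------------------------\n\nThe following NumPy sketch illustrates the router computation described on this page; it is provided for orientation only and is not the kernel implementation. It assumes the SOFTMAX activation, the ``router_pre_norm=True`` pipeline, and the default ``norm_topk_prob=False``; shapes and variable names are illustrative.\n\n.. code-block:: python\n\n   import numpy as np\n\n   T, H, E, K = 8, 256, 16, 2  # tokens, hidden size, experts, top-K\n   x = np.random.randn(T, H).astype(np.float32)\n   w = np.random.randn(H, E).astype(np.float32)\n\n   router_logits = x @ w  # [T, E]\n\n   # SOFTMAX activation applied before top-K selection (router_pre_norm=True path)\n   affinities = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))\n   affinities /= affinities.sum(axis=-1, keepdims=True)\n\n   # Top-K expert ids per token, then scatter their affinities back to [T, E]\n   expert_index = np.argsort(-affinities, axis=-1)[:, :K]  # [T, K]\n   topk_vals = np.take_along_axis(affinities, expert_index, axis=-1)\n   expert_affinities = np.zeros((T, E), dtype=np.float32)\n   np.put_along_axis(expert_affinities, expert_index, topk_vals, axis=-1)\n"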
  },
  {
    "path": "nki/library/api/sb2sb-allgather.rst",
    "content": ".. meta::\n    :description: SBUF-to-SBUF all-gather kernel for gathering tensors across ranks.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.collectives\n\nSBUF-to-SBUF All-Gather Kernel API Reference\n=============================================\n\nPerforms SBUF-to-SBUF all-gather for gathering tensors across ranks.\n\nThe kernel provides two variants:\n\n* ``allgather_sb2sb`` — Optimized for small tensors that fit entirely in SBUF\n* ``allgather_sb2sb_tiled`` — Adds tiling and LNC support for larger tensors\n\nBackground\n-----------\n\nThe ``allgather_sb2sb`` kernels gather input tensors from all ranks along the last dimension (K dimension). Each rank contributes its local tensor, and all ranks receive the concatenated result.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `sb2sb_allgather.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/collectives/sb2sb_allgather.py>`_\n\nallgather_sb2sb\n^^^^^^^^^^^^^^^\n\n.. py:function:: allgather_sb2sb(inp: nl.ndarray, replica_groups: ReplicaGroup, tp_degree: int) -> nl.ndarray\n\n   SBUF-to-SBUF all-gather kernel for gathering tensors across ranks.\n\n   :param inp: [H, W], Input tensor on HBM, where W is the local width per rank.\n   :type inp: ``nl.ndarray``\n   :param replica_groups: ReplicaGroup defining which ranks participate in the collective.\n   :type replica_groups: ``ReplicaGroup``\n   :param tp_degree: Tensor parallelism degree (number of ranks in the group).\n   :type tp_degree: ``int``\n   :return: [H, K], Output tensor on shared HBM containing gathered data from all ranks.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * Input tensor must fit in SBUF (H * W * dtype_size <= SBUF capacity)\n   * Output is stored in shared_hbm for cross-rank visibility\n   * All ranks receive identical output after the collective\n\n   **Dimensions**:\n\n   * H: Height dimension (partition dimension, typically <= 128)\n   * W: Width dimension per rank (local width before gather)\n\nallgather_sb2sb_tiled\n^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: allgather_sb2sb_tiled(inp: nl.ndarray, replica_groups: ReplicaGroup, tp_degree: int) -> nl.ndarray\n\n   SBUF-to-SBUF all-gather with tiling and LNC support for larger tensors.\n\n   :param inp: [M, K], Input tensor on HBM, where K is the local width per rank.\n   :type inp: ``nl.ndarray``\n   :param replica_groups: ReplicaGroup defining which ranks participate in the collective.\n   :type replica_groups: ``ReplicaGroup``\n   :param tp_degree: Tensor parallelism degree (number of ranks in the group).\n   :type tp_degree: ``int``\n   :return: [M, K * tp_degree], Output tensor on shared HBM containing gathered data.\n   :rtype: ``nl.ndarray``\n\n   **Notes**:\n\n   * TILE_M is capped at 128 (SBUF partition size limit)\n   * When launched with LNC grid [lnc], tiles are distributed across LNC cores\n   * Each LNC core processes TILES_PER_CORE = NUM_M_TILES // n_prgs tiles\n   * Assumes M is evenly divisible by 128 when M > 128\n\n   **Dimensions**:\n\n   * M: Height dimension (tiled along this dimension)\n   * K: Width dimension per rank (local width before gather)\n   * TILE_M: Tile size along M dimension (capped at 128)\n\n"
  },
  {
    "path": "nki/library/api/topk-reduce.rst",
    "content": ".. meta::\n    :description: Compute MoE Top-K reduction across sparse all_to_all_v() collective output buffer.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.subkernels\n\nTop-K Reduce Kernel API Reference\n=================================\n\nComputes MoE Top-K reduction across sparse ``all_to_all_v()`` collective output buffer.\n\nThe kernel supports:\n\n* Gathering scattered rows by packed global token index\n* Reduction along the K dimension\n* LNC sharding on the H dimension\n\nBackground\n-----------\n\nThe ``topk_reduce`` kernel gathers scattered rows by packed global token index and reduces along the K dimension. It is used to recombine expert outputs after an ``all_to_all_v()`` collective in Mixture of Experts models.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `topk_reduce.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/subkernels/topk_reduce.py>`_\n\ntopk_reduce\n^^^^^^^^^^^\n\n.. py:function:: topk_reduce(input: nl.ndarray, T: int, K: int)\n\n   Compute MoE Top-K reduction across sparse all_to_all_v() collective output buffer.\n\n   :param input: [TK_padded, H + 2]@HBM, bf16/fp16. Sparse input buffer containing T*K scattered outputs. Global token index is packed as int32 in the final 2x columns of each row.\n   :type input: ``nl.ndarray``\n   :param T: Total number of input tokens.\n   :type T: ``int``\n   :param K: Number of routed experts per token.\n   :type K: ``int``\n   :return: [T, H]@HBM, bf16/fp16. Ordered and reduced output.\n   :rtype: ``nl.ndarray``\n\n   **Dimensions**:\n\n   * TK_padded: n_src_ranks * T, padded input row count\n   * H: Hidden dimension size (must be divisible by LNC)\n   * T: Total number of input tokens (up to 128)\n\n"
  },
  {
    "path": "nki/library/api/transformer-tkg.rst",
    "content": ".. meta::\n    :description: Transformer token generation forward pass megakernel.\n    :date-modified: 04/09/2026\n\n.. currentmodule:: nkilib.experimental.transformer\n\nTransformer TKG Kernel API Reference\n====================================\n\nImplements the transformer token generation forward pass as a single megakernel.\n\nThe kernel supports:\n\n* Configurable number of transformer layers\n* Per-layer attention block (RMSNorm + QKV + RoPE + Attention + Output Projection)\n* Per-layer MLP block (RMSNorm + Gate/Up + Activation + Down Projection)\n* All-reduce collective communication between layers\n* Residual connections\n* Optional FP8 quantization with per-layer weight scales\n* SBUF residual path with SB2SB all-reduce\n\nBackground\n-----------\n\nThe ``transformer_tkg`` kernel performs multiple transformer layers in a single kernel invocation for token generation. Within each layer, it executes: attention block, all-reduce, MLP, all-reduce, and residual connections. This reduces kernel launch overhead and enables cross-layer optimizations.\n\nAPI Reference\n--------------\n\n**Source code for this kernel API can be found at**: `transformer_tkg.py <https://github.com/aws-neuron/nki-library/blob/main/src/nkilib_src/nkilib/experimental/transformer/transformer_tkg.py>`_\n\ntransformer_tkg\n^^^^^^^^^^^^^^^\n\n.. py:function:: transformer_tkg(X: nl.ndarray, W_qkvs: List[nl.ndarray], W_outs: List[nl.ndarray], W_gates: List[nl.ndarray], W_ups: List[nl.ndarray], W_downs: List[nl.ndarray], W_gamma_qkvs: List[nl.ndarray], W_gamma_mlps: List[nl.ndarray], K_caches: List[nl.ndarray], V_caches: List[nl.ndarray], RoPE_cos: nl.ndarray, RoPE_sin: nl.ndarray, mask_cache: nl.ndarray, mask_active: nl.ndarray, position_ids: Optional[nl.ndarray], num_layers: int, eps: float = 1e-06, replica_groups: Optional[List[List[int]]] = None, sbuf_residual_and_cc: bool = False, clamp_bound: float = 0.0, W_gate_scales: Optional[List[nl.ndarray]] = None, W_up_scales: Optional[List[nl.ndarray]] = None, W_down_scales: Optional[List[nl.ndarray]] = None)\n\n   Transformer token generation forward pass megakernel.\n\n   :param X: [B, S_tkg, H], Input hidden states on HBM\n   :type X: ``nl.ndarray``\n   :param W_qkvs: Per-layer QKV projection weights\n   :type W_qkvs: ``List[nl.ndarray]``\n   :param W_outs: Per-layer output projection weights\n   :type W_outs: ``List[nl.ndarray]``\n   :param W_gates: Per-layer MLP gate projection weights\n   :type W_gates: ``List[nl.ndarray]``\n   :param W_ups: Per-layer MLP up projection weights\n   :type W_ups: ``List[nl.ndarray]``\n   :param W_downs: Per-layer MLP down projection weights\n   :type W_downs: ``List[nl.ndarray]``\n   :param W_gamma_qkvs: Per-layer RMSNorm gamma for QKV\n   :type W_gamma_qkvs: ``List[nl.ndarray]``\n   :param W_gamma_mlps: Per-layer RMSNorm gamma for MLP\n   :type W_gamma_mlps: ``List[nl.ndarray]``\n   :param K_caches: Per-layer K caches on HBM\n   :type K_caches: ``List[nl.ndarray]``\n   :param V_caches: Per-layer V caches on HBM\n   :type V_caches: ``List[nl.ndarray]``\n   :param RoPE_cos: [d_head//2, B, S_tkg], RoPE cosine embeddings\n   :type RoPE_cos: ``nl.ndarray``\n   :param RoPE_sin: [d_head//2, B, S_tkg], RoPE sine embeddings\n   :type RoPE_sin: ``nl.ndarray``\n   :param mask_cache: Attention mask for cached KV context\n   :type mask_cache: ``nl.ndarray``\n   :param mask_active: Attention mask for active tokens\n   :type mask_active: ``nl.ndarray``\n   :param position_ids: [B, 1], KV cache write positions (None = skip cache 
update)\n   :type position_ids: ``Optional[nl.ndarray]``\n   :param num_layers: Number of transformer layers to execute\n   :type num_layers: ``int``\n   :param eps: RMSNorm epsilon (default 1e-6)\n   :type eps: ``float``\n   :param replica_groups: Replica groups for collective communication\n   :type replica_groups: ``Optional[List[List[int]]]``\n   :param sbuf_residual_and_cc: Use SBUF residual path with SB2SB all-reduce (default False)\n   :type sbuf_residual_and_cc: ``bool``\n   :param clamp_bound: FP8 quantization clipping boundary (default 0.0, 0 = no clipping)\n   :type clamp_bound: ``float``\n   :param W_gate_scales: Per-layer FP8 gate weight scales\n   :type W_gate_scales: ``Optional[List[nl.ndarray]]``\n   :param W_up_scales: Per-layer FP8 up weight scales\n   :type W_up_scales: ``Optional[List[nl.ndarray]]``\n   :param W_down_scales: Per-layer FP8 down weight scales\n   :type W_down_scales: ``Optional[List[nl.ndarray]]``\n   :return: [B, S_tkg, H], Final hidden states after all transformer layers\n   :rtype: ``nl.ndarray``\n\n   **Dimensions**:\n\n   * B: Batch size\n   * S_tkg: Token generation sequence length (number of new tokens)\n   * H: Hidden dimension (must be multiple of 128)\n   * H0: Partition tile size (pmax = 128)\n   * H1: H // H0\n\n"
  },
  {
    "path": "nki/library/index.rst",
    "content": ".. meta::\n    :description: Home page for the NKI Library  documentation. NKI Library provides pre-built NKI kernels you can use in model development with Neuron.\n    :date-modified: 12/02/2025\n\n.. _nkl_home:\n\nNKI Library Documentation\n==========================\n\nThe NKI Library is a collection of pre-built kernels optimized for AWS Neuron-powered devices. These kernels are designed to accelerate machine learning workloads by providing efficient implementations of common operations used in deep learning models.\n\n**NKI Library GitHub repository**: https://github.com/aws-neuron/nki-library\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: **NKI Library Kernel Design Specs**\n      :class-card: sd-border-1\n      :link: /nki/library/specs/index\n      :link-type: doc\n\n      Review the formal specifications for the pre-built NKI kernels available in the NKI Library.\n\n   .. grid-item-card:: **NKI Library Supported Kernel Reference**\n      :class-card: sd-border-1\n      :link: /nki/library/api/index\n      :link-type: doc\n\n      Use this kernel reference to understand the functions, parameters, and usage of the pre-built NKI kernels in the NKI Library.\n\n.. grid:: 1 \n   :gutter: 3\n\n   .. grid-item-card:: **NKI Library Kernel Utilities**\n      :class-card: sd-border-1\n      :link: /nki/library/kernel-utils/index\n      :link-type: doc\n\n      Utility modules for memory management, tensor views, and iteration helpers used in NKI kernel development.\n\n   .. grid-item-card:: **NKI Library Release Notes**\n      :class-card: sd-border-1\n      :link: /release-notes/components/nki-lib\n      :link-type: doc\n\n      Release notes for the NKI Library kernels and APIs.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Overview <about/index>\n   Kernel Design Specs <specs/index>\n   Kernel API Reference <api/index>\n   Kernel Utilities <kernel-utils/index>\n   Release Notes </release-notes/components/nki-lib>\n\n"
  },
  {
    "path": "nki/library/kernel-utils/allocator.rst",
    "content": ".. meta::\n    :description: API reference for the SbufManager (Allocator) utility in the NKI Library.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.utils.allocator\n\nSbufManager (Allocator) API Reference\n=====================================\n\nThis topic provides the API reference for the ``SbufManager`` utility. It provides stack-based SBUF memory allocation with scope management and multi-buffering support.\n\nWhen to Use\n-----------\n\nUse ``SbufManager`` when you need:\n\n* **Deterministic memory layout**: Manual control over SBUF addresses for predictable memory placement\n* **Scope-based allocation**: Automatic cleanup of temporary buffers when a computation phase ends\n* **Multi-buffering in loops**: Ping-pong buffers for overlapping compute and memory operations\n* **Memory debugging**: Detailed logging of allocation patterns and usage statistics\n\n``SbufManager`` is particularly useful in complex kernels with multiple computation phases where different buffers are needed at different times.\n\nAPI Reference\n-------------\n\n**Source code**: https://github.com/aws-neuron/nki-library\n\nSbufManager\n^^^^^^^^^^^\n\n.. py:class:: SbufManager(sb_lower_bound, sb_upper_bound, logger=None, use_auto_alloc=False, default_stack_alloc=True)\n\n   Stack-based SBUF memory manager with scope support.\n\n   :param sb_lower_bound: Lower bound of available SBUF memory region.\n   :type sb_lower_bound: int\n   :param sb_upper_bound: Upper bound of available SBUF memory region.\n   :type sb_upper_bound: int\n   :param logger: Optional logger instance for allocation tracking.\n   :type logger: Logger, optional\n   :param use_auto_alloc: If True, delegates address assignment to compiler. Default False.\n   :type use_auto_alloc: bool\n   :param default_stack_alloc: If True, ``alloc()`` uses stack; if False, uses heap. Default True.\n   :type default_stack_alloc: bool\n\n   .. py:method:: open_scope(interleave_degree=1, name=\"\")\n\n      Opens a new allocation scope. Allocations within this scope are freed when the scope closes.\n\n      :param interleave_degree: Number of buffer sections for multi-buffering. Default 1.\n      :type interleave_degree: int\n      :param name: Optional scope name for debugging.\n      :type name: str\n      :rtype: None\n\n   .. py:method:: close_scope()\n\n      Closes the current scope and frees all stack allocations made within it.\n\n      :rtype: None\n\n   .. py:method:: increment_section()\n\n      Advances to the next buffer section within a multi-buffer scope. When all sections are used, wraps back to the first section.\n\n      :rtype: None\n\n   .. py:method:: alloc_stack(shape, dtype, buffer=nl.sbuf, name=None, base_partition=0, align=None)\n\n      Allocates a tensor on the stack (freed when scope closes).\n\n      :param shape: Shape of the tensor.\n      :type shape: tuple[int, ...]\n      :param dtype: Data type (e.g., ``nl.bfloat16``, ``nl.float32``).\n      :type dtype: dtype\n      :param buffer: Buffer type. Only ``nl.sbuf`` supported.\n      :type buffer: buffer\n      :param name: Optional tensor name (must be unique).\n      :type name: str, optional\n      :param base_partition: Base partition for allocation. Default 0.\n      :type base_partition: int\n      :param align: Alignment requirement in bytes.\n      :type align: int, optional\n      :return: Allocated SBUF tensor.\n      :rtype: nl.ndarray\n\n   .. 
py:method:: alloc_heap(shape, dtype, buffer=nl.sbuf, name=None, base_partition=0, align=None)\n\n      Allocates a tensor on the heap (must be manually freed with ``pop_heap()``).\n\n      Parameters are identical to ``alloc_stack()``.\n\n      :rtype: nl.ndarray\n\n   .. py:method:: alloc(shape, dtype, buffer=nl.sbuf, name=None, base_partition=0, align=None)\n\n      Allocates a tensor on the stack or heap, depending on the ``default_stack_alloc`` setting.\n\n      Parameters are identical to ``alloc_stack()``.\n\n      :rtype: nl.ndarray\n\n   .. py:method:: pop_heap()\n\n      Frees the most recently allocated heap tensor.\n\n      :rtype: None\n\n   .. py:method:: get_total_space()\n\n      Returns the total number of bytes in the managed region.\n\n      :rtype: int\n\n   .. py:method:: get_free_space()\n\n      Returns the number of free bytes between stack and heap.\n\n      :rtype: int\n\n   .. py:method:: get_used_space()\n\n      Returns the number of bytes currently used by stack and heap allocations.\n\n      :rtype: int\n\n   .. py:method:: get_stack_curr_addr()\n\n      Returns the current stack address. Not supported in auto-allocation mode.\n\n      :rtype: int\n\n   .. py:method:: get_heap_curr_addr()\n\n      Returns the current heap address. Not supported in auto-allocation mode.\n\n      :rtype: int\n\n   .. py:method:: align_stack_curr_addr(align=32)\n\n      Aligns the current stack address to the given alignment. Not supported in auto-allocation mode.\n\n      :param align: Alignment in bytes. Default 32.\n      :type align: int\n      :rtype: None\n\n   .. py:method:: set_name_prefix(prefix)\n\n      Sets a prefix string prepended to all subsequent allocation names.\n\n      :param prefix: Prefix string.\n      :type prefix: str\n      :rtype: None\n\n   .. py:method:: get_name_prefix()\n\n      Returns the current name prefix.\n\n      :rtype: str\n\n   .. py:method:: flush_logs()\n\n      Prints buffered allocation logs in tree format.\n\n      :rtype: None\n\ncreate_auto_alloc_manager\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. py:function:: create_auto_alloc_manager(logger=None)\n\n   Creates an SbufManager that delegates address assignment to the compiler.\n\n   :param logger: Optional logger instance.\n   :type logger: Logger, optional\n   :return: Auto-allocation SbufManager instance.\n   :rtype: SbufManager\n\nExamples\n--------\n\nWithout SbufManager (Manual Allocation)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n   import nki.language as nl\n\n   @nki.jit\n   def kernel_without_sbm(input_hbm, output_hbm):\n       addr = 0\n       \n       # Heap-like allocation at end of SBUF\n       heap_addr = nl.tile_size.total_available_sbuf_size - 512\n       weights = nl.ndarray((128, 256), dtype=nl.bfloat16, buffer=nl.sbuf,\n                            address=(0, heap_addr))\n       print(f\"weights.address = {weights.address}\")  # (0, 261632)\n       \n       # Outer scope\n       buf1 = nl.ndarray((128, 512), dtype=nl.bfloat16, buffer=nl.sbuf,\n                         address=(0, addr))\n       print(f\"buf1.address = {buf1.address}\")  # (0, 0)\n       addr += 512 * 2  # 1024\n       \n       # Inner scope\n       inner_start = addr\n       buf2 = nl.ndarray((128, 256), dtype=nl.bfloat16, buffer=nl.sbuf,\n                         address=(0, addr))\n       print(f\"buf2.address = {buf2.address}\")  # (0, 1024)\n       addr += 256 * 2  # 1536\n       buf3 = nl.ndarray((128, 256), dtype=nl.bfloat16, buffer=nl.sbuf,\n                         address=(0, addr))\n       print(f\"buf3.address = {buf3.address}\")  # (0, 1536)\n       # End inner scope - must manually reset\n       addr = inner_start  # 1024\n       \n       # Back in outer - reuse inner's memory\n       buf4 = nl.ndarray((128, 512), dtype=nl.bfloat16, buffer=nl.sbuf,\n                         address=(0, addr))\n       print(f\"buf4.address = {buf4.address}\")  # (0, 1024)\n\nWith SbufManager\n^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.language as nl\n   from nkilib.core.utils.allocator import SbufManager\n\n   @nki.jit\n   def kernel_with_sbm(input_hbm, output_hbm):\n       sbm = SbufManager(0, nl.tile_size.total_available_sbuf_size)\n       \n       weights = sbm.alloc_heap((128, 256), nl.bfloat16, name=\"weights\")\n       print(f\"weights.address = {weights.address}\")  # (0, 261632)\n       \n       sbm.open_scope(name=\"outer\")\n       buf1 = sbm.alloc_stack((128, 512), nl.bfloat16, name=\"buf1\")\n       print(f\"buf1.address = {buf1.address}\")  # (0, 0)\n       \n       sbm.open_scope(name=\"inner\")\n       buf2 = sbm.alloc_stack((128, 256), nl.bfloat16, name=\"buf2\")\n       print(f\"buf2.address = {buf2.address}\")  # (0, 1024)\n       buf3 = sbm.alloc_stack((128, 256), nl.bfloat16, name=\"buf3\")\n       print(f\"buf3.address = {buf3.address}\")  # (0, 1536)\n       sbm.close_scope()\n       \n       buf4 = sbm.alloc_stack((128, 512), nl.bfloat16, name=\"buf4\")\n       print(f\"buf4.address = {buf4.address}\")  # (0, 1024)\n       sbm.close_scope()\n       \n       sbm.pop_heap()\n\nBoth produce identical memory layouts:\n\n.. code-block:: text\n\n   weights.address = (0, 261632)  # heap at top\n   buf1.address = (0, 0)          # stack grows up\n   buf2.address = (0, 1024)       # inner scope\n   buf3.address = (0, 1536)       # inner scope\n   buf4.address = (0, 1024)       # reuses inner's memory\n\nMulti-Buffering Example\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n   import nki.language as nl\n   from nkilib.core.utils.allocator import SbufManager\n\n   @nki.jit\n   def kernel_multibuffer(input_hbm, output_hbm, N):\n       sbm = SbufManager(0, nl.tile_size.total_available_sbuf_size)\n       \n       # Double-buffering: 2 sections alternate\n       sbm.open_scope(interleave_degree=2, name=\"double_buffer\")\n       \n       for i in nl.affine_range(N):\n           # Allocates to section 0, then 1, then 0, then 1...\n           buf = sbm.alloc_stack((128, 512), nl.bfloat16)\n           # Load to buf[current], compute on buf[previous]\n           sbm.increment_section()\n       \n       sbm.close_scope()\n\nDebug output for ``N=4``:\n\n.. code-block:: text\n\n   [SBM] Allocations:\n       ▶ SCOPE 'double_buffer' [interleave=2] @ 0\n       ├── (unnamed): 1024 B @ 0 (128, 512) bfloat16\n       ├── ↳ section: 1/2 @ 1024\n       ├── (unnamed): 1024 B @ 1024 (128, 512) bfloat16\n       ├── ↻ section: 0/2 @ 0\n       ├── (unnamed): 1024 B @ 0 (128, 512) bfloat16\n       ├── ↳ section: 1/2 @ 1024\n       └── (unnamed): 1024 B @ 1024 (128, 512) bfloat16\n       ◀ END 'double_buffer' freed=2048 B\n\nNote how allocations alternate between addresses 0 and 1024.\n\nSee Also\n--------\n\n* :doc:`TensorView </nki/library/kernel-utils/tensor-view>` - Zero-copy tensor view operations\n"
  },
  {
    "path": "nki/library/kernel-utils/index.rst",
    "content": ".. meta::\n    :description: API reference for kernel utility modules in the NKI Library.\n    :date-modified: 04/09/2026\n\n.. _nkl_kernel_utils_home:\n\nNKI Library Kernel Utilities Reference\n======================================\n\nThe NKI Library provides utility modules to simplify common patterns in NKI kernel development. These utilities help manage memory allocation, tensor views, dimension tiling, and data broadcasting.\n\n**Source code for these utilities can be found at**: https://github.com/aws-neuron/nki-library\n\nMemory Management\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`SbufManager (Allocator) </nki/library/kernel-utils/allocator>`\n     - Stack-based SBUF memory allocator with scope management and multi-buffering support.\n\nTensor Operations\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`TensorView </nki/library/kernel-utils/tensor-view>`\n     - Zero-copy tensor view operations including slicing, permuting, reshaping, and broadcasting.\n   * - :doc:`stream_shuffle_broadcast </nki/library/kernel-utils/stream-shuffle-broadcast>`\n     - Broadcasts a single partition across the partition dimension using hardware shuffle.\n\nIteration Helpers\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: 40 60\n\n   * - :doc:`TiledRange </nki/library/kernel-utils/tiled-range>`\n     - Divides dimensions into tiles with automatic remainder handling.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Memory Management\n\n    SbufManager (Allocator) <allocator>\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Tensor Operations\n\n    TensorView <tensor-view>\n    stream_shuffle_broadcast <stream-shuffle-broadcast>\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n    :caption: Iteration Helpers\n\n    TiledRange <tiled-range>\n"
  },
  {
    "path": "nki/library/kernel-utils/tensor-view.rst",
    "content": ".. meta::\n    :description: API reference for the TensorView utility in the NKI Library.\n    :date-modified: 02/13/2026\n\n.. currentmodule:: nkilib.core.utils.tensor_view\n\nTensorView API Reference\n========================\n\nThis topic provides the API reference for the ``TensorView`` utility. It provides zero-copy tensor view operations for NKI tensors.\n\nWhen to Use\n-----------\n\nUse ``TensorView`` when you need to:\n\n* **Reshape without copying**: Change tensor layout for different computation phases\n* **Slice with strides**: Extract non-contiguous elements efficiently\n* **Permute dimensions**: Transpose or reorder dimensions for matmul compatibility\n* **Broadcast dimensions**: Expand size-1 dimensions without data duplication\n* **Chain operations**: Combine multiple view transformations fluently\n\n``TensorView`` is essential for kernels that need to interpret the same data in multiple layouts (e.g., attention kernels that reshape between ``[B, S, H]`` and ``[B, num_heads, S, head_dim]``).\n\nAPI Reference\n-------------\n\n**Source code**: https://github.com/aws-neuron/nki-library\n\nTensorView\n^^^^^^^^^^\n\n.. py:class:: TensorView(base_tensor)\n\n   A view wrapper around NKI tensors supporting various operations without copying data.\n\n   :param base_tensor: The underlying NKI tensor.\n   :type base_tensor: nl.ndarray\n\n   .. py:attribute:: shape\n      :type: tuple[int, ...]\n\n      Current shape of the view.\n\n   .. py:attribute:: strides\n      :type: tuple[int, ...]\n\n      Stride of each dimension in elements.\n\n   .. py:method:: get_view()\n\n      Generates the actual NKI tensor view using array pattern.\n\n      :return: NKI tensor with the view pattern applied.\n      :rtype: nl.ndarray\n\n   .. py:method:: slice(dim, start, end, step=1)\n\n      Creates a sliced view along a dimension.\n\n      :param dim: Dimension to slice.\n      :type dim: int\n      :param start: Start index (inclusive).\n      :type start: int\n      :param end: End index (exclusive).\n      :type end: int\n      :param step: Step size. Default 1.\n      :type step: int\n      :return: New TensorView with sliced dimension.\n      :rtype: TensorView\n\n   .. py:method:: permute(dims)\n\n      Creates a permuted view by reordering dimensions.\n\n      :param dims: New order of dimensions.\n      :type dims: tuple[int, ...]\n      :return: New TensorView with permuted dimensions.\n      :rtype: TensorView\n\n      **Note**: For SBUF tensors, partition dimension (dim 0) must remain at position 0.\n\n   .. py:method:: broadcast(dim, size)\n\n      Expands a size-1 dimension to a larger size without copying.\n\n      :param dim: Dimension to broadcast (must have size 1).\n      :type dim: int\n      :param size: New size for the dimension.\n      :type size: int\n      :return: New TensorView with broadcasted dimension.\n      :rtype: TensorView\n\n   .. py:method:: reshape_dim(dim, shape)\n\n      Reshapes a single dimension into multiple dimensions.\n\n      :param dim: Dimension to reshape.\n      :type dim: int\n      :param shape: New sizes (can contain one -1 for inference).\n      :type shape: tuple[int, ...]\n      :return: New TensorView with reshaped dimension.\n      :rtype: TensorView\n\n   .. 
py:method:: flatten_dims(start_dim, end_dim)\n\n      Flattens a range of contiguous dimensions into one.\n\n      :param start_dim: First dimension to flatten (inclusive).\n      :type start_dim: int\n      :param end_dim: Last dimension to flatten (inclusive).\n      :type end_dim: int\n      :return: New TensorView with flattened dimensions.\n      :rtype: TensorView\n\n   .. py:method:: expand_dim(dim)\n\n      Inserts a new dimension of size 1.\n\n      :param dim: Position to insert the new dimension.\n      :type dim: int\n      :return: New TensorView with added dimension.\n      :rtype: TensorView\n\n   .. py:method:: squeeze_dim(dim)\n\n      Removes a dimension of size 1.\n\n      :param dim: Dimension to remove (must have size 1).\n      :type dim: int\n      :return: New TensorView with removed dimension.\n      :rtype: TensorView\n\n   .. py:method:: select(dim, index)\n\n      Selects a single element along a dimension, reducing dimensionality.\n\n      :param dim: Dimension to select from.\n      :type dim: int\n      :param index: Index to select (int for static, nl.ndarray for dynamic).\n      :type index: int | nl.ndarray\n      :return: New TensorView with one fewer dimension.\n      :rtype: TensorView\n\n   .. py:method:: rearrange(src_pattern, dst_pattern, fixed_sizes=None)\n\n      Rearranges dimensions using einops-style patterns.\n\n      :param src_pattern: Source dimension pattern with named dimensions.\n      :type src_pattern: tuple[str | tuple[str, ...], ...]\n      :param dst_pattern: Destination dimension pattern.\n      :type dst_pattern: tuple[str | tuple[str, ...], ...]\n      :param fixed_sizes: Dictionary mapping dimension names to sizes.\n      :type fixed_sizes: dict[str, int], optional\n      :return: New TensorView with rearranged dimensions.\n      :rtype: TensorView\n\n   .. py:method:: reshape(new_shape)\n\n      Reshapes the tensor to new dimensions.\n\n      :param new_shape: New dimension shape.\n      :type new_shape: tuple[int, ...]\n      :return: New TensorView with reshaped dimensions.\n      :rtype: TensorView\n\n      .. note:: General reshape is not yet implemented and will raise an error. Use ``reshape_dim`` for single-dimension reshaping.\n\n   .. py:method:: has_dynamic_access()\n\n      Checks if the tensor view uses dynamic indexing (via a prior ``select`` with an ``nl.ndarray`` index).\n\n      :return: True if the view has dynamic access, False otherwise.\n      :rtype: bool\n\nExamples\n--------\n\nReshape and Permute\n^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   import nki.language as nl\n   from nkilib.core.utils.tensor_view import TensorView\n\n   @nki.jit\n   def kernel_reshape_permute(data_sb):\n       view = TensorView(data_sb)  # Shape: (128, 24, 64)\n       \n       reshaped = view.reshape_dim(1, (4, 6))  # (128, 4, 6, 64)\n       transposed = reshaped.permute((0, 2, 1, 3))  # (128, 6, 4, 64)\n       \n       result = transposed.get_view()\n\nSlicing with Step\n^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   from nkilib.core.utils.tensor_view import TensorView\n\n   @nki.jit\n   def kernel_strided_slice(data_sb):\n       view = TensorView(data_sb)  # Shape: (128, 256)\n       \n       # Take every other element: indices 0, 2, 4, ...\n       strided = view.slice(dim=1, start=0, end=256, step=2)  # (128, 128)\n       \n       result = strided.get_view()\n\nBroadcasting\n^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n   from nkilib.core.utils.tensor_view import TensorView\n\n   @nki.jit\n   def kernel_broadcast(scale_sb, data_sb):\n       # scale_sb shape: (128, 1, 64)\n       # data_sb shape: (128, 32, 64)\n       \n       scale_view = TensorView(scale_sb)\n       \n       # Broadcast dim 1 from size 1 to 32\n       broadcasted = scale_view.broadcast(dim=1, size=32)  # (128, 32, 64)\n       \n       # Now can multiply element-wise\n       result = data_sb * broadcasted.get_view()\n\nEinops-Style Rearrange\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   from nkilib.core.utils.tensor_view import TensorView\n\n   @nki.jit\n   def kernel_rearrange(data_sb):\n       view = TensorView(data_sb)  # Shape: (128, 512, 64)\n       \n       # Reshape and transpose: (p, h*w, c) -> (p, c, h, w)\n       # where h=32 (must specify one dimension for -1 inference)\n       rearranged = view.rearrange(\n           src_pattern=('p', ('h', 'w'), 'c'),\n           dst_pattern=('p', 'c', 'h', 'w'),\n           fixed_sizes={'h': 32}\n       )  # (128, 64, 32, 16)\n       \n       result = rearranged.get_view()\n\nChained Operations\n^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n   from nkilib.core.utils.tensor_view import TensorView\n\n   @nki.jit\n   def attention_reshape(qkv_sb, num_heads, head_dim):\n       # qkv_sb shape: (128, seq_len, 3 * num_heads * head_dim)\n       view = TensorView(qkv_sb)\n       \n       # Chain: reshape -> slice Q -> reshape to heads\n       q_view = (view\n           .reshape_dim(2, (3, num_heads, head_dim))  # (128, S, 3, H, D)\n           .select(dim=2, index=0)                     # (128, S, H, D) - select Q\n           .permute((0, 2, 1, 3)))                     # (128, H, S, D)\n       \n       q = q_view.get_view()\n\nSee Also\n--------\n\n* :doc:`stream_shuffle_broadcast </nki/library/kernel-utils/stream-shuffle-broadcast>` - Hardware broadcast for partition dimension\n* :doc:`SbufManager </nki/library/kernel-utils/allocator>` - Memory allocation with scope management\n"
  },
  {
    "path": "nki/library/specs/design-rmsnorm-quant.rst",
    "content": ".. meta::\n    :description: Design specification for the RMSNorm-Quant kernel included in the NKI Library .\n    :date-modified: 12/02/2025\n\n\nRMSNorm-Quant Kernel Design Specification\n==========================================\n\nThis document describes the design of the RMSNorm-Quant kernel. It is intended to be a companion to the code to help readers understand what this kernel does, how it's designed, and how to use it.\n\nFor details on how to use this kernel, see the :doc:`RMSNorm-Quant Kernel API Reference </nki/library/api/rmsnorm-quant>`.\n\nBackground\n----------\n\nThis kernel performs *optional* `RMS normalization <https://arxiv.org/abs/1910.07467>`_ followed by quantization to ``fp8``.\n\nMotivation\n^^^^^^^^^^\nPerformance\n\"\"\"\"\"\"\"\"\"\"\"\n\nIt is expected that this kernel is typically used in an LLM FP8 inference model to replace the RMSNorm and FP8 quantization operators.\n\nThis kernel enables sequence-parallelism (SP) for the RMSNorm_Quant operation. In a non-SP LLM implementation, typically an allReduce collectives operation is followed by RMSNorm_Quant where the computation is duplicated across the entire [S,H] tensor on each TP (tensor parallel) worker. In SP, the allReduce+RMSNorm_Quant operation is instead replaced with reduceScatter + RMSNorm_Quant + allGather. The compute is accelerated because each worker only computes [S/TP_degree,H]. Furthermore, the allGather distributes an FP8 tensor, improving collective performance compared to bf16.\n\nNeuron Support\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\nCurrently the Neuron software stack does not support packing the two tensors with different data types (an FP8 data tensor and FP32 quantization tensor) into one tensor. This kernel showcases how this can be achieved in NKI.\n\nNext we'll examine the math this kernel performs.\n\nRMSNorm\n^^^^^^^\n\nMath\n\"\"\"\"\n\nThe input tensor typically has shape [B, S, H].\n\nRMSNorm is independently performed on each [B,S].\n\nThe equation is:\n\n.. math::\n\n    \\mathrm{RMSNorm}(x_i)=\\frac{x_i}{\\mathrm{RMS}(x)} \\gamma_i \\quad \\text{for } i = 1 \\dots H\n\nwhere:\n\n.. math::\n\n    \\mathrm{RMS}(x)=\\sqrt{(\\frac{1}{H} \\sum_{i=1}^{H} x_i^2) + \\epsilon} \\\\\n    x = \\text{each [B,S] with shape [H]} \\\\\n    \\gamma \\text{ = gamma with shape [H]} \\\\\n    \\epsilon = \\text{ small positive value for numerical stability}\n\nExplained in English using common LLM terminology, each token (i.e. each element of the S dimension) is represented by a vector of shape [H] (i.e. a vector in the so-called 'embedding' space). Each token-vector is normalized by dividing each element in the vector by the RMS factor of the overall token-vector. This **RMS** factor is computed ‘right-to-left', meaning the **S**\\ quares of the vector elements are computed, then the **M**\\ ean, then the square-**R**\\ oot. There is also a learned scaling factor called gamma; this is a shape [H] vector that scales (i.e. multiplied against) every token-vector.\n\nNext we'll look at how the above math is implemented using NKI ISA instructions on the hardware.\n\nOperator Graph\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nThe following diagram depicts the flow of operations. The code is written generically with respect to input tensor shape and tile sizes. But to be more relatable, this diagram instead uses both typical LLM labels ([S,H]) for the code's outer-dimension and processing-dimension as well as tiling sizes that optimally fit Trainium 2. \n\n.. 
figure:: images/RMSNorm.drawio.svg\n   :align: center\n\nQuantization\n^^^^^^^^^^^^\n\nMath\n\"\"\"\"\n\nWe subsequently apply AbsMax quantization to the RMS-Normalized input tensor whose shape is typically [B,S,H].\n\nQuantization is independently performed on each [B,S].\n\nThe equation is:\n\n.. math::\n    M = \\max_{i=1}^{H} |x_i| \\\\\n    D = \\frac{M}{240} \\\\\n    Q = \\frac{1}{D} \\\\\n    \\mathbf{x}_q = xQ\n\nor equivalently\n\n.. math::\n    x_{q,i} = x_iQ \\quad \\text{for } i = 1, \\dots, H\n\nwhere\n\n.. math::\n    x = \\text{each [B,S] with shape [H]} \\\\\n    \\mathbf{x}_q = \\text{quantized } \\mathbf{x} \\\\\n    D = \\text{de-quantization scale} \\\\\n    Q = \\text{quantization scale}\n\nThe above equation omits clipping/flooring details which are instead included later in this document.\n\nEach token-vector is quantized by multiplying each element in the vector by the quantization scale (Q) of the given token-vector; or said equivalently, dividing by the dequantization scale (D). The dequantization scale is computed by finding the absolute-max value in the vector and dividing by 240 (a typical constant for 8-bit quantization).\n\nOperator Graph\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nIn the following operator graph you'll notice that the final output packs the data and scales together into a single tensor, as described in the Motivation section.\n\n.. figure:: images/quant.drawio.svg\n   :align: center\n\nIn summary, we've seen how the RMSNorm and Quantization math operations are implemented using NKI ISA instructions and examined the intermediate shapes and tiling decisions along the way.\n\nNext we'll look at the kernel's high-level design considerations and optimization strategies.\n\nHigh-Level Design Considerations & Optimization Strategies\n----------------------------------------------------------\n\nInput Tensor Outer Dimension Collapse\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe code provides a good description of this but it's briefly summarized here so the idea can be referenced below. The RMSNorm-Quantization computations happen strictly on the minor dimension of the input tensor (called the ‘processing dimension' in the code); therefore all major dimensions are collapsed into one for simplification (called the ‘outer dimension' in the code). In other words, the input is collapsed into a 2D tensor.\n\nExample:\n    [B,S,H] is collapsed into [BxS, H] = [outer_dimension, processing_dimension]\n\nTiling\n^^^^^^\n\nThe overall kernel (both RMSNorm and Quantization steps) is tiled on the major dimension of the 2D input tensor by a size equal to the hardware's maximum partition dimension of a tile. This ensures full utilization of the various hardware engines' input width.\n\nWithin the RMSNorm operation, the RMS-scale and gamma steps are further tiled on the minor dimension by a size equal to the hardware's maximum free dimension of the moving operand of General Matrix Multiplication on TensorEngine. This is because the gamma-broadcast operation is ultimately performed via TensorEngine matrix multiplication, so we maximize our use of the engine with maximally sized tiles. See :doc:`What is Tiling? </nki/get-started/about/tiling-overview>` for more details on tile size constraints.\n\nExample:\n\n    Consider a typical LLM input tensor of the shape [Batch, Sequence, Hidden] with [B=1, S=1024, H=2048]. We'll set B=1 for simplicity so that we can ignore it entirely. 
The tensor is first tiled on the S dimension in a size of 128 (which is the maximum partition dimension of Trainium2), resulting in 1024 / 128 = 8 outer dimension tiles of shape [S=128, H=2048]. The inverse-RMS calculation is performed across the H dimension, meaning it is performed independently on every row of the tile.\n\n    We subsequently tile on the H dimension in a size of 512 (the maximum matrix-multiply free-dimension on Trainium2), resulting in 2048 / 512 = 4 processing dimension tiles of shape [S=128, H=512]. The RMS scale (ScalarE) is applied, gamma is broadcast (TensorE), and gamma is applied (VectorE). You'll notice that pipeline parallelism is implemented by splitting the computation across 3 engines.\n\nSBUF/PSUM Allocation\n^^^^^^^^^^^^^^^^^^^^\n\nThe Stack Allocator is generally recommended for all kernels since it enables consistent and deterministic SBUF/PSUM memory allocations within the scope of the kernel. This is in contrast to the default allocator, which considers a larger scope outside the kernel, potentially resulting in varying allocations and consequent kernel performance variations.\n\nSPMD Sharding\n^^^^^^^^^^^^^\n\nThis kernel supports SPMD sharding as a way to split the computation across the constituent cores of a :doc:`Logical Neuron Core </about-neuron/arch/neuron-features/logical-neuroncore-config>`. It shards on the outer-most dimension.\n\n\nGamma Broadcast\n^^^^^^^^^^^^^^^\n\nThe bulk of the RMSNorm-Quantization operations rely on the Vector and Scalar engines as the core math does not involve matrix-multiplication at all, hence the TensorEngine would otherwise be idle. To improve pipeline parallelism, we use a technique to broadcast the gamma vector across rows of a 2D matrix by performing matrix multiplication against a vector of ones, thereby distributing some of the work to the TensorEngine.\n\nactivation_reduce\n^^^^^^^^^^^^^^^^^\n\nThis :doc:`instruction </nki/api/generated/nki.isa.activation_reduce>` is notable because it allows us to perform the reduce-add for free along with the square operation.\n\n\nDesign Implementation\n---------------------\n\nThe commented code and the above sections should together deliver a good understanding of this kernel. However, this section explains a few additional points to help understand the code.\n\nCPU Golden\n^^^^^^^^^^\n\nThe following is a simple Python equivalent to the kernel which can be another useful way of understanding the kernel's behaviour.\n\n.. 
code-block:: python\n\n   from typing import Tuple\n\n   import numpy as np\n\n   # dt: Neuron data-type helper used below for static_cast and float8_e4m3\n   # (assumed to be available in the surrounding environment).\n\n   def rmsnorm_quant_ref(inp: np.ndarray, gamma: np.ndarray, eps: float = 1e-6) -> Tuple[np.ndarray, np.ndarray]:\n       \"\"\"RMSNorm + Quantization reference impl.\n\n       - inp: shape [B, S, H]\n       - output[0]: shape [B, S, H] in fp8e4, representing the quantized RMSNorm output of input\n       - output[1]: shape [B, S, 4] in fp32 representing the per-row dequantization scale\n       \"\"\"\n       assert(len(inp.shape) == 3)\n       inp = inp.astype(np.float32)\n       gamma = gamma.astype(np.float32)\n\n       # Perform RMSNorm\n       rms = np.sqrt(np.mean(np.square(inp), axis=-1, keepdims=True))\n       norm = inp * np.reciprocal(rms + eps)\n       norm *= gamma\n\n       # Perform quantization\n       norm_abs_max = np.abs(norm).max(axis=-1, keepdims=True)\n       quant_scale = 240.0 / norm_abs_max\n       norm_quant = norm * quant_scale\n       assert(np.allclose(norm, norm_quant * np.reciprocal(quant_scale)))  # dequantization should yield same norm\n\n       # Cast and return\n       norm_quant = dt.static_cast(norm_quant, dt.float8_e4m3)\n       dequant_scale = dt.static_cast(np.reciprocal(quant_scale), np.float32)\n\n       return norm_quant, dequant_scale\n\n\nKernel Code Details\n^^^^^^^^^^^^^^^^^^^\n\n`rms_normalize_tile()` contains a loop to tile across the processing dimension. This loop contains the following directive:\n\n.. code-block:: python\n\n    directives=ncc.multi_buffer(constants.num_hw_psum_banks)\n\nThis enables the compiler to replicate the gamma PSUM allocation (into which the gamma-broadcast matmul result is stored), improving pipeline parallelism by enabling each loop iteration to write into a separate PSUM bank.\n\n.. code-block:: python\n\n    skip_middle_end_transformations\n\nThe compiler middle-end-transformation passes contain heuristic-driven optimizations, including loop-reordering and loop-fusion. While these passes could help improve performance, in some cases they are not predictable. Kernels are generally hand-tuned to achieve optimal performance, so we turn them off.\n\nKernel API\n----------\n\n.. 
autodata:: rmsnorm_quant_kernel\n   :noindex:\n\nEvaluation\n----------\n\nPerformance Targets\n^^^^^^^^^^^^^^^^^^^\n\nThe section includes some example performance targets for real world model configurations on a Trainium 2 with LNC=2 configuration.\n\n**Llama3.3 70B**\n\n+--------------------+-------------+-----------------+--------+\n| Target Latency (us)| Batch Count | Sequence Length | Hidden |\n+====================+=============+=================+========+\n| 458.2              | 1           | 2K              | 8192   |\n+--------------------+-------------+-----------------+--------+\n| 6,287.0            | 1           | 32K             | 8192   |\n+--------------------+-------------+-----------------+--------+\n\n**Llama3.1 405B**\n\n+--------------------+-------------+-----------------+--------+\n| Target Latency (us)| Batch Count | Sequence Length | Hidden |\n+====================+=============+=================+========+\n| 866.81             | 1           | 2K              | 16384  |\n+--------------------+-------------+-----------------+--------+\n| 13,214.40          | 1           | 32K             | 16384  |\n+--------------------+-------------+-----------------+--------+\n\n\nPerformance Analysis\n--------------------\n\nHere we demonstrate a sample execution of this kernel and break it down in the Profiler.\n\n**Test Parameters:**\n\n  * LNC: 2 ( Note, two pairs of instructions in `nc0`, and `nc1` in captured figures )\n  * Batch Size: 1\n  * Sequence Length: 160\n  * Hidden Size: 16,384\n  * Data Type: `dt.bfloat16`\n  * Quantization Data Type: `dt.float8_e4m3`\n  * Quantization Only: `False`\n\nThe following picture shows the overall execution.\n\n.. image:: images/profile_overall.png\n\nPhase 1: Load Inputs\n^^^^^^^^^^^^^^^^^^^^\n\nThis phase involves two DMA load operations: one for the hidden tensor and one for the gamma tensor.\n\n* **Hidden Tensor**: The DMA buffer size is calculated as `hidden_size * sizeof(dtype)`.\n\n* **Gamma Tensor**: The code intends to load the entire `[1, H]` tensor in a single operation. However, it should be noted that the compiler performs optimizations for trivial dimensions, which can result in several small (e.g., 4-byte) DMA buffer loads.\n\nPhase 2: RMSNorm\n^^^^^^^^^^^^^^^^\n\n.. figure:: images/profile_phase_2.png\n   :align: center\n\n* Compute Inverse RMS scale\n\n    * This step involves two ACT (activation) instructions:\n\n        * `activation_reduce`: Squares each element of the hidden tensor and performs a reduction (sum) across the hidden dimension.\n        * `activation`: Adds a small constant `eps` for numerical stability, applies a scaling factor `(1 / H)`, and then computes the reciprocal square root of the result.\n\n* Broadcast Gamma – Part 1 / Part 2\n\n    * As previously mentioned, a multi-buffer strategy is used for PSUM. Assuming there are N PSUM banks, Part 1 of the broadcast operation replicates the gamma values of shape [1, `512`] to [128, 512] tiles, repeating this process N times.\n    * The size `512` corresponds to the **free dimension limit** of the TensorEngine, meaning we must slice the H dimension (processing dimension) into chunks of 512.\n    * The broadcast is divided into Part 1 and Part 2 because the inverse RMS scale value is needed before evicting data from the PSUM buffers after Part 1. The PSUM data is not evicted to the SBUF immediately; instead, it remains in place to be consumed by the `scalar_tensor_tensor` operation once `inverse_rms_scale` is ready. 
This behavior is intentional, as there is limited performance benefit in evicting PSUMs early. Part 2 of the gamma broadcast is fully pipelined with the subsequent `scalar_tensor_tensor` instruction, making early eviction unnecessary.\n\n* Apply gamma and inverse RMS scale\n\n    * This step is performed using the `scalar_tensor_tensor` instruction, with a free dimension size of 512, matching the limit of the TensorEngine. This allows the operation to be *efficiently pipelined* with the TensorEngine activity.\n\nPhase 3: Quantization\n^^^^^^^^^^^^^^^^^^^^^\n\n\n.. figure:: images/profile_phase_3.png\n   :align: center\n\nThe overall quantization process involves heavy use of the VectorEngine, primarily due to the `max` function. These instructions are executed **sequentially with no parallelism**, as each step depends on the result of the previous one.\n\n* Compute absolute maximum\n\n* Compute dequantization scale\n\n    * `activation`: The dequantization scale is derived by dividing the absolute max by `_FP8_RANGE`\n\n* Compute quantized output\n\n    * `tensor_scalar`: clamp to `_MIN_DEQUANT_SCALE_VAL` for numerical stability\n    * `reciprocal`:  compute the reciprocal to get the quantization scale\n    * `tensor_scalar`: Apply quantization scale to produce the quantized result\n\nPhase 4: Store output\n^^^^^^^^^^^^^^^^^^^^^\n\nStore quantized value with dequantizing scale\n\n  * **Hidden Tensor**:\n    The DMA buffer size is calculated as `hidden_size * sizeof(quant_dtype)`.\n  * **Dequantization Scale:**\n    The DMA buffer size is calculated as `4* sizeof(quant_dtype)`."
  },
  {
    "path": "nki/library/specs/index.rst",
    "content": ".. meta::\n    :description: NKI Library specifications for the pre-built kernels included with the AWS Neuron SDK.\n    :date-modified: 12/02/2025\n\n.. _nkl_design_spec_home:\n\nNKI Library Design Specifications\n==================================\n\nThe NKI Library provides pre-built kernels you can review and modify in your own kernel development with the AWS Neuron SDK and NKI. In this section, learn how the NKI Library kernels are designed and optimized so you can apply the same techniques to your own custom NKI kernels.\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 40 30\n\n   * - Kernel\n     - Description\n     - Source Code\n   * - :doc:`RMSNorm-Quant kernel specification </nki/library/specs/design-rmsnorm-quant>`\n     - Performs optional RMS normalization followed by quantization to ``fp8``.\n     - `Source code <https://github.com/aws-neuron/nki-samples/tree/main/src/nki_samples/reference/rmsnorm_quant>`_\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    RMSNorm-Quant <design-rmsnorm-quant>"
  },
  {
    "path": "nki/migration/index.rst",
    "content": ".. _nki_migration_home:\n\n.. meta::\n    :description: NKI Migration Guides for upgrading between NKI versions.\n    :keywords: NKI, AWS Neuron, Migration, Upgrade, Update Guide\n\nNKI Migration Guides\n====================\n\nThese guides help you migrate your NKI kernels between versions.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: NKI 0.3.0 Update Guide\n      :link: nki-0-3-0-update-guide\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Migrate your NKI kernels from 0.2.0 to 0.3.0, including API changes, deprecations, and new features.\n\n   .. grid-item-card:: NKI Block Dimension Migration Guide\n      :link: nki_block_dimension_migration_guide\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Migrate NKI kernels to use block dimensions for improved performance and resource utilization on Trainium devices.\n\n   .. grid-item-card:: NKI Beta 2 Migration Guide\n      :link: nki-beta2-migration-guide\n      :link-type: doc\n      :class-body: sphinx-design-class-title-small\n\n      Migrate NKI kernels from Beta 1 to Beta 2.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    NKI 0.3.0 Update Guide <nki-0-3-0-update-guide>\n    Block Dimension Migration Guide <nki_block_dimension_migration_guide>\n    Beta 2 Migration Guide <nki-beta2-migration-guide>\n"
  },
  {
    "path": "nki/migration/nki-0-3-0-update-guide.rst",
    "content": ".. meta::\n   :description: NKI 0.3.0 Update Guide — update NKI kernels from Beta 2 to NKI 0.3.0\n   :keywords: NKI, Neuron Kernel Interface, update guide, 0.3.0, Trainium, Inferentia\n\n.. _nki-0-3-0-update-guide:\n\nNKI 0.3.0 Update Guide\n=======================\n\nFor developers with existing NKI Beta 2 kernels, this document provides guidance on updating to NKI 0.3.0.\n\nNKI 0.3.0 is a significant update to the Neuron Kernel Interface, available in AWS Neuron SDK 2.29.0.\nThis release moves NKI to General Availability with a new open-source NKI Standard Library (nki-stdlib),\na built-in CPU Simulator, ``nki.language`` APIs, and several API improvements for correctness\nand consistency.\n\nThis guide is intended for NKI developers updating existing kernels from Beta 2 to NKI 0.3.0. It covers\nnew features, deprecated and removed APIs, and breaking changes with before-and-after code examples.\n\n.. note::\n\n   If you are migrating from NKI Beta 1 (``neuronxcc.nki.*``), first complete the\n   :doc:`NKI Beta 2 Migration Guide <nki-beta2-migration-guide>` before following this guide.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\nWhat's New in NKI 0.3.0\n------------------------\n\n\nNKI Standard Library (nki-stdlib)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 ships with the NKI Standard Library (nki-stdlib), which provides developer-visible code for all\nNKI APIs and native language objects (e.g., ``NkiTensor``).\n\n\nNKI CPU Simulator\n~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 introduces ``nki.simulate(kernel)``, which executes NKI kernels entirely on CPU without requiring\nNeuronDevice hardware. The simulator interprets NKI operations using NumPy, producing numerically equivalent\nresults to on-device execution (with minor floating-point differences due to CPU vs NeuronCore arithmetic).\nThis enables local development, debugging, and functional correctness testing on any machine — including\nlaptops and CI environments.\n\n.. note::\n\n   The NKI CPU Simulator is experimental in NKI 0.3.0.\n\nThe simulator can be invoked in two ways:\n\n1. **Set the environment variable** ``NKI_SIMULATOR=1`` to run existing kernels without code changes:\n\n.. code-block:: bash\n\n   NKI_SIMULATOR=1 python my_script.py\n\n2. **Wrap the kernel call** with ``nki.simulate``:\n\n.. code-block:: python\n\n   import nki\n   import numpy as np\n\n   @nki.jit\n   def my_kernel(X, Y):\n       ...\n\n   # Run on CPU — no Neuron device needed\n   X = np.random.randn(128, 512).astype(np.float16)\n   Y = np.zeros((128, 512), dtype=np.float16)\n   nki.simulate(my_kernel)(X, Y)\n\n\n``nki.typing`` Module\n~~~~~~~~~~~~~~~~~~~~~\n\nA new module for type-annotating kernel tensor parameters. Use ``nt.tensor[shape]`` to declare expected\ntensor shapes:\n\n.. code-block:: python\n\n   import nki.typing as nt\n\n   @nki.jit\n   def my_kernel(\n       X: nt.tensor[128, 512],\n       Y: nt.tensor[128, 512]\n   ):\n       ...\n\n\nNew ``nki.isa`` APIs\n~~~~~~~~~~~~~~~~~~~~\n\n* ``nki.isa.exponential`` — Dedicated exponential instruction with max subtraction, faster than ``nisa.activation(op=nl.exp)`` and useful for Softmax calculation. Trn3 (NeuronCore-v4) only.\n\n\nNew ``nki.collectives`` APIs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n* ``nki.collectives.all_to_all_v`` — Variable-length all-to-all collective. 
Unlike ``all_to_all``, uses a metadata tensor to specify per-rank send/recv counts.\n\n\nMatmul Accumulation\n~~~~~~~~~~~~~~~~~~~\n\n``nc_matmul`` and ``nc_matmul_mx`` now have an ``accumulate`` parameter that controls whether the operation\noverwrites or accumulates on the destination PSUM tile. The default (``accumulate=None``) auto-detects:\nthe first write to a PSUM location overwrites, and subsequent writes accumulate. This matches Beta 2\nbehavior.\n\n.. code-block:: python\n\n   nisa.nc_matmul(dst, stationary, moving, accumulate=True)\n   nisa.nc_matmul_mx(dst, stationary, moving, stat_scale, mov_scale, accumulate=True)\n\n\nAddress Placement\n~~~~~~~~~~~~~~~~~\n\nThe ``address`` parameter was added to ``nki.language.ndarray`` as an optional parameter for explicit\nmemory placement.\n\n.. code-block:: python\n\n   buf = nl.ndarray((128, 512), dtype=nl.float16, address=(p_off, f_off))  # explicit placement\n\n\n``nki.language`` APIs\n~~~~~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 introduces ``nki.language`` APIs as convenience wrappers around ``nki.isa`` APIs. These\ninclude operations such as ``nl.load``, ``nl.store``, ``nl.copy``, ``nl.matmul``, ``nl.transpose``,\n``nl.softmax``, and other high-level operations that map to one or more ``nki.isa`` calls.\n\n.. note::\n\n   The ``nki.language`` convenience APIs are experimental in NKI 0.3.0.\n\n\nDeprecated and Removed APIs\n----------------------------\n\n\n``nki.isa.tensor_copy_dynamic_src`` / ``nki.isa.tensor_copy_dynamic_dst``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDeprecated and scheduled for removal. Use ``nisa.tensor_copy()`` with ``.ap()`` and ``scalar_offset`` instead.\n\n\n``nki.jit(platform_target=...)``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``platform_target`` parameter is deprecated. Set the target platform via the\n``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable instead.\n\n.. important::\n\n   This is a breaking change. Passing ``platform_target`` to ``@nki.jit`` raises an error in NKI 0.3.0.\n\n\n``nki.jit(mode=...)``\n~~~~~~~~~~~~~~~~~~~~~\n\nThe ``mode`` parameter is deprecated and ignored. The NKI Compiler now inspects the kernel arguments to\ndetect the appropriate machine learning framework automatically:\n\n1. **Torch tensors**: uses TorchXLA integration.\n2. **JAX arrays**: uses JAX integration.\n3. **NumPy arrays**: runs the kernel in standalone mode without a machine learning framework.\n\nTo run the kernel in the CPU simulator, set the environment variable ``NKI_SIMULATOR=1``, or wrap the\nkernel call in ``nki.simulate``.\n\n.. important::\n\n   This is a breaking change. Code that passes ``mode=`` to ``@nki.jit`` should remove the parameter.\n\n\nAPI Breaking Changes\n--------------------\n\nThis section describes each breaking change with before-and-after code examples.\n\n\n``nisa.dma_copy`` — Reading from PSUM\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``nisa.dma_copy`` no longer supports reading directly from PSUM. Copy the PSUM tensor to SBUF first\nusing ``nisa.tensor_copy``.\n\n.. 
code-block:: python\n\n   # Beta 2\n   nisa.dma_copy(dst=hbm_tensor, src=psum_tensor[0:TILE, 0:N])\n\n   # NKI 0.3.0\n   sbuf_temp = nl.ndarray((TILE, PSUM_SIZE), dtype=nl.float32, buffer=nl.sbuf)\n   nisa.tensor_copy(dst=sbuf_temp[0:TILE, 0:N], src=psum_tensor[0:TILE, 0:N])\n   nisa.dma_copy(dst=hbm_tensor, src=sbuf_temp[0:TILE, 0:N])\n\n\n``nisa.dma_copy`` — ``dge_mode`` Type Matching\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 enforces that source and destination element types must match when using\n``dge_mode=dge_mode.hwdge``. Beta 2 did not validate this, allowing mismatched types to pass silently.\n\nThe DMA hardware moves raw bytes — HWDGE generates descriptors without interpreting data content, so no\ntype casting occurs. To reinterpret data as a different type, use ``.view()`` to match types before the copy.\n\n.. code-block:: python\n\n   # Beta 2 (no validation, undefined behavior)\n   nisa.dma_copy(dst=dst_f4, src=src_ui16, dge_mode=nisa.dge_mode.hwdge)\n\n   # NKI 0.3.0 — use .view() to reinterpret\n   nisa.dma_copy(dst=dst_f4, src=src_ui16.view(nl.float4_e2m1fn_x4), dge_mode=nisa.dge_mode.hwdge)\n\nAlternatively, use ``dge_mode.swdge`` or ``dge_mode.none`` if type casting is intended.\n\n\n``nisa.dma_copy`` — ``dst_rmw_op`` and ``unique_indices`` Removed\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``nisa.dma_copy`` no longer supports read-modify-write operations. The ``dst_rmw_op`` and ``unique_indices``\nparameters have been removed. Use ``nisa.dma_compute`` instead.\n\n.. code-block:: python\n\n   # Beta 2 — simple read-modify-write\n   nisa.dma_copy(dst, src, dst_rmw_op=nl.add)\n\n   # NKI 0.3.0 — use dma_compute\n   nisa.dma_compute(dst, [src], reduce_op=nl.add)\n\nFor accumulation loops with indirect indexing:\n\n.. code-block:: python\n\n   # Beta 2\n   for k_idx in range(K):\n       dst_rmw_op = None if k_idx == 0 else nl.add\n       nisa.dma_copy(\n           src=input.ap(...),\n           dst=reduced_sb[:, :],\n           dst_rmw_op=dst_rmw_op,\n           unique_indices=True,\n       )\n\n   # NKI 0.3.0 — split into dma_copy + dma_compute\n   for k_idx in range(K):\n       src_access = input.ap(...)\n       if k_idx == 0:\n           nisa.dma_copy(dst=reduced_sb[:, :], src=src_access)\n       else:\n           nisa.dma_compute(\n               dst=reduced_sb[:, :],\n               srcs=[src_access, reduced_sb[:, :]],\n               reduce_op=nl.add,\n               unique_indices=True,\n           )\n\n\n``nisa.memset`` — Strict Type Matching\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 enforces that the ``value`` argument must match the destination tensor's dtype. Beta 2 silently\ncast float values to the destination type. For integer-typed tensors, pass an integer literal.\n\n.. code-block:: python\n\n   # Beta 2\n   buf = nl.ndarray((128, 128), dtype=nl.int32, buffer=nl.sbuf)\n   nisa.memset(dst=buf, value=2.0)\n\n   # NKI 0.3.0\n   buf = nl.ndarray((128, 128), dtype=nl.int32, buffer=nl.sbuf)\n   nisa.memset(dst=buf, value=2)\n\n\n``nisa.tensor_reduce`` — Axis Handling Fix\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI 0.3.0 fixes incorrect axis handling that existed in Beta 2. Beta 2 incorrectly allowed ``axis=1`` to\nrefer to the last free dimension even for 3D/4D tensors. 
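\n\nFor example, consider reducing over the last free dimension of a 3D tile whose dimensions are ``(partition, free0, free1)``. A Beta 2 kernel may have passed ``axis=1`` for this; in NKI 0.3.0 the axis must name the actual dimension. The snippet below is a minimal sketch only: the exact ``nisa.tensor_reduce`` parameter names and the reduced output shape are assumptions here and should be confirmed against the API reference.\n\n.. code-block:: python\n\n   src = nl.ndarray((128, 8, 512), dtype=nl.float32, buffer=nl.sbuf)\n   dst = nl.ndarray((128, 8), dtype=nl.float32, buffer=nl.sbuf)\n\n   # Beta 2: axis=1 was (incorrectly) accepted as meaning the last free dimension\n   # nisa.tensor_reduce(dst, src, op=nl.add, axis=1)\n\n   # NKI 0.3.0: pass the index of the dimension actually being reduced\n   nisa.tensor_reduce(dst, src, op=nl.add, axis=2)\n\n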
NKI 0.3.0 corrects this so that axis values\ncorrespond to the actual tensor dimensions.\n\nKernels that relied on the Beta 2 behavior (e.g., using ``axis=1`` to mean the last dimension of a 3D/4D\ntensor) will produce errors in NKI 0.3.0.\n\n\n``nisa.dma_compute`` — Parameter Reorder\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``scales`` and ``reduce_op`` parameters swapped positions. ``scales`` is now optional, and\n``unique_indices`` was added (moved from ``dma_copy``).\n\n.. code-block:: python\n\n   # Beta 2\n   nisa.dma_compute(dst, srcs, scales, reduce_op)\n\n   # NKI 0.3.0\n   nisa.dma_compute(dst, srcs, reduce_op, scales=None, unique_indices=True)\n\n\n``nisa.sendrecv`` — ``dma_engine`` Enum\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe boolean ``use_gpsimd_dma`` parameter is replaced by the ``dma_engine`` enum.\n\n.. code-block:: python\n\n   # Beta 2\n   nisa.sendrecv(..., use_gpsimd_dma=True)\n\n   # NKI 0.3.0\n   from nki.isa import dma_engine\n   nisa.sendrecv(..., dma_engine=dma_engine.gpsimd_dma)\n   nisa.sendrecv(..., dma_engine=dma_engine.dma)      # was use_gpsimd_dma=False\n\n\n``nisa.affine_select`` — ``offset`` Parameter Moved\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``offset`` parameter moved from the 3rd positional argument to a keyword argument with default ``0``.\nExisting positional call sites will break.\n\n.. code-block:: python\n\n   # Beta 2\n   nisa.affine_select(dst, pattern, offset, channel_multiplier, on_true, on_false)\n\n   # NKI 0.3.0\n   nisa.affine_select(dst, pattern, channel_multiplier, on_true, on_false, offset=offset)\n\n\n``nisa.register_move`` — ``imm`` Renamed to ``src``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``imm`` parameter has been renamed to ``src`` and now accepts a ``VirtualRegister`` instead of a\ncompile-time constant. To move a compile-time constant into a register, first allocate a register with\nthe constant value.\n\n.. code-block:: python\n\n   # Beta 2\n   nisa.register_move(dst, imm=42)\n\n   # NKI 0.3.0\n   src = nisa.register_alloc(x=42)\n   nisa.register_move(dst, src=src)\n\n\nCollectives — ``num_channels`` Removed\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``num_channels`` removed from ``collective_permute_implicit_current_processing_rank_id``. The high-level\n``collective_permute_implicit()`` now accepts a ``channel_ids`` list directly.\n\n.. code-block:: python\n\n   # Beta 2\n   rank_id = ncc.collective_permute_implicit_current_processing_rank_id(\n       iteration_id=0, channel_id=ch, num_channels=N, replica_group=rg\n   )\n\n   # NKI 0.3.0\n   rank_id = ncc.collective_permute_implicit_current_processing_rank_id(\n       iteration_id=0, channel_id=ch, replica_group=rg\n   )\n\n   ncc.collective_permute_implicit(\n       srcs_by_channel=[[src0], [src1]],\n       dsts_by_channel=[[dst0], [dst1]],\n       replica_group=rg,\n       channel_ids=[0, 1],  # replaces num_channels=2\n   )\n\n\nOutput Tensors Must Use ``nl.shared_hbm``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAll kernel output (return) tensors must be allocated with ``buffer=nl.shared_hbm``. Using ``nl.hbm``\nfor output tensors will cause compilation failures.\n\n.. 
code-block:: python\n\n   # Beta 2\n   output = nl.ndarray((B, C, L), dtype=x.dtype, buffer=nl.hbm)\n\n   # NKI 0.3.0\n   output = nl.ndarray((B, C, L), dtype=x.dtype, buffer=nl.shared_hbm)\n\n\nInteger Enum Constants No Longer Supported\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRaw integer values (e.g., ``dge_mode=2``) are no longer accepted for enum parameters. Use the named enum\nmembers instead: ``nki.isa.engine``, ``nki.isa.dge_mode``, ``nki.isa.oob_mode``, ``nki.isa.reduce_cmd``,\nand ``nki.isa.nc_version``.\n\n.. code-block:: python\n\n   # Beta 2\n   nisa.dma_copy(src=src_tensor, dst=dst_tensor, dge_mode=2)\n\n   # NKI 0.3.0\n   nisa.dma_copy(src=src_tensor, dst=dst_tensor, dge_mode=nisa.dge_mode.hwdge)\n\n\nString Buffer Names No Longer Supported\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n``nl.ndarray``, ``nl.zeros``, and other creation ops no longer accept strings for the ``buffer`` parameter.\nUse buffer objects from ``nki.language`` instead.\n\n.. code-block:: python\n\n   # Beta 2\n   buf = nl.ndarray((128, 512), dtype=nl.float16, buffer='sbuf')\n\n   # NKI 0.3.0\n   buf = nl.ndarray((128, 512), dtype=nl.float16)  # buffer defaults to sbuf\n   buf = nl.ndarray((128, 512), dtype=nl.float16, buffer=nl.sbuf)\n\n.. list-table:: Buffer type mapping\n   :header-rows: 1\n   :widths: 50 50\n\n   * - Beta 2 (string)\n     - NKI 0.3.0 (object)\n   * - ``\"sbuf\"``\n     - ``nl.sbuf``\n   * - ``\"psum\"``\n     - ``nl.psum``\n   * - ``\"hbm\"``\n     - ``nl.hbm``\n   * - ``\"private_hbm\"``\n     - ``nl.private_hbm``\n   * - ``\"shared_hbm\"``\n     - ``nl.shared_hbm``\n\n\n``nki.isa.dma_engine`` Alias Repurposed\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Beta 2 ``nki.isa.dma_engine`` module-level alias was unused and did not map correctly to a valid engine.\nIn NKI 0.3.0, it has been replaced with the ``nki.isa.dma_engine`` enum, which provides explicit control\nover DMA transfer engines (``dma_engine.dma`` for shared DMA, ``dma_engine.gpsimd_dma`` for GPSIMD's\ninternal DMA engine).\n\n\nLanguage Restrictions\n---------------------\n\nThe NKI 0.3.0 compiler has stricter validation. The following patterns require changes for NKI 0.3.0.\n\n\nRemove Keyword-Only Argument Separator (``*``)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe NKI 0.3.0 compiler does not support the ``*`` separator in kernel function signatures. Move all\nparameters with defaults to the end of the signature.\n\n.. code-block:: python\n\n   # Beta 2\n   @nki.jit\n   def my_kernel(X: nl.ndarray, *, flag: bool = True, scale: float = 1.0):\n       ...\n\n   # NKI 0.3.0\n   @nki.jit\n   def my_kernel(X: nl.ndarray, flag: bool = True, scale: float = 1.0):\n       ...\n\n\nReplace ``is`` / ``is not`` with ``==`` / ``!=``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe NKI 0.3.0 compiler does not support Python's ``is`` / ``is not`` operators. These operators check\nobject identity, which is not meaningful during NKI compilation tracing. Use ``==`` / ``!=`` instead.\n\n.. code-block:: python\n\n   # Beta 2\n   if some_flag is True:\n       ...\n\n   # NKI 0.3.0\n   if some_flag == True:\n       ...\n\n\nReplace List Kernel Arguments with Tuples\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe NKI 0.3.0 compiler does not support ``list`` as a kernel argument type. 
Convert list arguments to\ntuples at the call site.\n\nTuples are immutable and hashable, which more accurately reflects the semantics of compiled kernels and enables \nthe compiler to cache compilations based on the kernel's arguments.\n\n.. code-block:: python\n\n   # Beta 2\n   @nki.jit\n   def my_kernel(img, in_perm, stride=[1, 1]):\n       ...\n   my_kernel(img, in_perm=[0, 3, 1, 2], stride=[1, 1])\n\n   # NKI 0.3.0\n   @nki.jit\n   def my_kernel(img, in_perm, stride=(1, 1)):\n       ...\n   my_kernel(img, in_perm=(0, 3, 1, 2), stride=(1, 1))\n\n\nAPI Improvements\n----------------\n\nThese changes improve correctness or usability but are non-breaking for most kernels.\n\n\n``nisa.memset`` — x4 Packed Type Restriction\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nx4 packed types (``float8_e4m3fn_x4``, ``float8_e5m2_x4``, ``float4_e2m1fn_x4``) now enforce ``value=0``.\nThe ISA memset instruction fills the destination with a single u32 value and has no notion of the\nsub-elements packed inside, so only zero is valid. To initialize x4 packed tensors with non-zero values,\nuse ``nisa.dma_copy`` to load pre-computed x4 data from an HBM kernel argument.\n\n.. code-block:: python\n\n   # Zero-fill works directly\n   buf = nl.ndarray((128, 128), dtype=nl.float8_e4m3fn_x4, buffer=nl.sbuf)\n   nisa.memset(dst=buf, value=0)\n\n   # Non-zero: pass pre-computed x4 data as a kernel argument from HBM\n   # and use nisa.dma_copy to load it into SBUF\n   nisa.dma_copy(dst=buf, src=precomputed_x4_hbm_tensor)\n\n\n``nisa.range_select`` — Parameter Fixes\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBeta 2 silently overrode ``on_false_value`` to ``FP32_MIN`` and ``reduce_cmd`` to ``reset_reduce``,\nregardless of user input. In NKI 0.3.0:\n\n* ``reduce_cmd`` now works as expected (default ``reset_reduce``)\n* ``on_false_value`` must be ``FP32_MIN`` due to hardware constraints, but is now documented as a\n  constraint rather than silently ignored\n\n\nParameter Default Value Updates\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following default values changed in NKI 0.3.0:\n\n* ``nki.isa.iota`` — ``offset`` is now optional with a default of ``0``\n* ``nki.isa.core_barrier`` — ``engine`` default changed from ``unknown`` to ``gpsimd`` (no behavioral change)\n* ``nki.language.num_programs`` — ``axes`` default changed from ``None`` to ``0``\n* ``nki.language.program_id`` — ``axis`` now has a default value of ``0``\n* ``nki.language.ndarray`` — ``buffer`` default changed from ``None`` to ``nl.sbuf``\n* ``nki.language.zeros`` — ``buffer`` default changed from ``None`` to ``nl.sbuf``\n* ``nki.language.sequential_range`` — ``stop`` and ``step`` now have default values (``None`` and ``1``)\n"
  },
  {
    "path": "nki/migration/nki-beta2-migration-guide.rst",
    "content": ".. meta::\n   :description: Best practices for migrating NKI kernels from Beta 1 to the Beta 2 NKI Compiler\n   :keywords: NKI kernels, Neuron Kernel Interface, AWS Neuron SDK, kernel compilation, Trainium, Inferentia, machine learning acceleration\n\n.. _nki-migration-guide:\n\n=========================================\nNKI Migration Guide from Beta 1 to Beta 2\n=========================================\n\nThis topic covers best practices for migrating NKI kernels from the legacy \n``neuronxcc.nki.*`` namespace to the new ``nki.*`` namespace which uses the \nnew NKI Compiler. See :ref:`nki_compiler_about` \nfor more in-depth information.\n\nBackground: NKI has a Compiler!\n==================================\n\nAs of Release 2.27, NKI now has a new standalone compiler. The syntax of NKI \nremains a subset of Python. This means you can largely use Python syntax when \nwriting NKI kernels. However, it is important to remember that your NKI \nfunctions are compiled by the NKI Compiler and not evaluated by the Python \ninterpreter. The goal is to offer a better programming experience with more \nprecise error messages.\n\nWith the NKI Compiler, we have chosen to define the NKI meta-programming language as a subset \nof Python. This means that all NKI programs are valid Python programs, but not \nall Python programs are valid NKI programs. The delineation is the ``nki.jit`` \ndecorator. Just as before, you mark your NKI kernels with the ``nki.jit`` \ndecorator. However, unlike before, the functions under this decorator will be \npassed to the NKI Compiler and not be evaluated by the Python interpreter.\n\n.. code-block:: python\n\n   def a_function(x,y,z):\n     # this is Python code\n\n   @nki.jit\n   def kernel(x,y,z):\n     # this is NKI code\n\nIf you use Python features within a NKI kernel that are not supported, the NKI\nCompiler will give an error. The goal is that programming in NKI is intuitive \nand convenient and all of the features you need are available and behave as \nexpected. However, if you find some curious errors or confusing behavior, \nreach out to us on the NKI Samples repository on AWS Neuron GitHub.\n\nThis document is intended for experienced NKI developers who are looking to \nmigrate their existing kernels to the Beta 2 NKI compiler. 
Most code snippets \nbelow are assumed to be executed within a valid NKI kernel.\n\nKey Migration Items\n===================\n\nThese are the key items to migrate existing kernels to the Beta 2 NKI Compiler.\n\nWhat new features are available in NKI Beta 2?\n----------------------------------------------\n\n* A new namespace for NKI Beta 2, ``nki.*``\n* ``device_print`` is available to inspect tensor values\n* The behavior of loops and branching is consistent with regular Python\n* Lists and dictionaries are available and their behavior in loops is consistent with regular Python\n* Direct allocation APIs have been reworked\n\nWhat features in ``neuronxcc.nki.*`` are not available in ``nki.*``?\n----------------------------------------------------------------------\n\n* ``arange`` has been removed, use slicing or :ref:`nki-aps`\n* The ``mask`` parameter is no longer supported\n* Block dimensions of tensors have been removed\n* Explicit ``dst`` parameter is now required for ``nki.isa`` instructions and is always the first argument\n* ``nl.load`` and ``nl.store`` have been removed, use ``nisa.dma_copy``\n* Nested slicing is not available\n* Dynamic Access syntax has changed\n* Decorators on sub-kernels need to be removed\n* Dictionaries support only string keys\n\nNew Features in NKI Beta 2\n===========================\n\nNew namespace, new APIs\n-----------------------\n\nNKI Beta 2 introduces a number of changes to the language and to the \ncompilation process. While we are deprecating NKI Beta 1, the Beta 2 release \nsupports both versions of the language via namespaces. The Beta 1 APIs can \nbe used via the ``neuronxcc.nki.*`` namespace, while Beta 2 has moved to the \n``nki.*`` namespace.\n\n.. code-block:: python\n\n   # Legacy Beta 1 APIs\n   import neuronxcc.nki as nki\n   import neuronxcc.nki.isa as nisa\n\n   # New Beta 2 APIs\n   import nki\n   import nki.isa as nisa\n\nWe have made improvements to the APIs, like consistent naming, order of \narguments, and matching more closely the hardware ISA so that what developers \nwrite in NKI and what they see in the profiler are the same. There is one \nchange that developers should be aware of: all ISA functions now require a \ndestination parameter.\n\nAll ISA functions require a destination parameter\n--------------------------------------------------\n\nIn Beta 2, all of the ISA functions now require a ``dst`` parameter instead \nof returning a result. So, instead of writing:\n\n.. code-block:: python\n\n   result[...] = nisa.reciprocal(src)\n\nDevelopers must write:\n\n.. code-block:: python\n\n   nisa.reciprocal(dst=result[...], src=src)\n\nThis change makes the behavior of the APIs more consistent and matches cases\nwhere APIs may perform accumulation or return multiple results. It also helps \navoid scenarios where developers might inadvertently write to the wrong buffer \nor introduce additional copy operations.\n\nDynamic control flow\n--------------------\n\nNKI Beta 2 includes support for dynamic (on-chip) control flow. All of the \ndynamic control flow uses on-chip registers to hold the conditional values. \nSee :ref:`trainium_inferentia2_arch` for more information. If a control flow \nconstruct uses a register as a conditional, then the loop will be an on-chip, \ndynamic (or runtime) loop. This is very common in scenarios like Mixture of \nExperts (MoE), where the index space for the expert is known at runtime, but \n
Dynamic control flow with the new NKI APIs unlock this \nuse case.\n\nTo support dynamic control flow, NKI has a new set of ``nki.isa`` APIs for \nreading and writing to hardware registers. See :doc:`/nki/api/index` for \nmore information.\n\n.. code-block:: python\n\n   # Define a register\n   def register_alloc(x: Optional[int]) -> register: ...\n\n   # Fill the register with an immediate value\n   def register_move(dst: imm: int): ...\n\n   # Load SRAM tensor element into the dst register\n   def register_load(dst: register, src: tensor): ...\n\n   # Store the value of the register into SRAM\n   def register_store(dst: tensor, src: register): ...\n\nThe most basic dynamic loop is a ``for`` loop that uses a register value for \nthe iteration value and another register for the upper bound. Developers can \nwrite this kind of loop using ``dynamic_range``:\n\n.. code-block:: python\n\n   # dynamic loop with dynamically computed upper bounds\n   # upper_bound is a hardware register\n   # the loop index, i, is also a hardware register\n   upper_bound = register_alloc()\n   register_load(upper_bound, tensor)\n   for i in dynamic_range(5, upper_bound, 2):\n     ...\n\nDevelopers can also write dynamic while loops. When using a dynamic while loop, \nthe developer should update the register within the body of the loop.\n\n.. code-block:: python\n\n   # initialize a conditional tensor which will be updated in the loop\n   cond = nl.ndarray((1, 1), buffer=nl.sbuf, dtype=np.int32)\n\n   # create register with initial value\n   reg = register_alloc(5)\n\n   while reg: # loop will terminate when the value reaches 0\n     ...\n     # store the register value into SBUF for computation\n     nisa.register_store(cond, reg)\n     # Decrement the condition variable by 1\n     nisa.tensor_scalar(cond, cond, nl.add, -1)\n     # load (updated) value from cond tensor into register\n     nisa.register_load(reg, cond)\n\nUpdate indexing syntax for ``mgrid`` and ``arange``\n---------------------------------------------------\n\nIf using ``nl.mgrid/arange`` to access continuous elements in an existing NKI \nkernel, this should be replaced with integer slicing. Take a look at the \nfollowing example.\n\n.. code-block:: python\n\n   # Example 1\n   t = nl.ndarray(shape=(128, 16, 64), ...)\n   # Old Approach: use mgrid to access continuous elements\n   i_p, if0, if1 = nl.mgrid[0:128, 0:8, 0:64]\n   t[i_p, if0, if1] \n   # Updated: should just use integers to create the slice\n   t[0:128, 0:8, 0:64]\n\n   # Example 2\n   t = nl.ndarray(shape=(128, 16*64))\n   # Old Approach: using mgrid\n   i_p, if0, if1 = nl.mgrid[0:128, 0:8, 0:64]\n   t[i_p, if0*64+i_f1]\n   # should just use integer slicing\n   t[0:128, 0:8*64]\n\nIf your use case cannot be represented with the slicing syntax above, see\n:ref:`nki-aps`.\n\nChanges in NKI Beta 2 from Beta 1\n==================================\n\nConsistent control flow behavior\n--------------------------------\n\nIn NKI Beta 1, range iterators were converted into special objects that allowed \nthe eDSL to capture the loop body. Because of this, loops were only executed once \nby the Python evaluator, which could lead to some surprising results. For example, \nin the code below, the normal Python variable ``var`` ends up with a value of 1 \nrather than the expected value of 8. This has been solved in the new NKI Compiler.\n\n.. 
.. code-block:: python\n\n   val = 0\n   for i in range(8):\n     val += 1\n   print(val) # will print 1 in Beta 1, prints 8 in Beta 2\n\nFor similar reasons, sometimes Python control flow constructs, such as ``if`` \nstatements, could not be handled properly when nested within a ``for`` loop. \nFor example, in Beta 1 the code below produces an undefined result. In Beta 2, \nthis code produces the expected result.\n\n.. code-block:: python\n\n   val = 0\n   for i in range(8):\n     if i == 0:\n       val = 1\n     else:\n       val = 2\n   print(val) # undefined behaviour in Beta 1, prints 2 in Beta 2\n\nMany other examples of troublesome control flow have been fixed, which should \nmake using NKI easier and more intuitive.\n\n.. _nki-mask:\n\nDeprecation of masking\n----------------------\n\nFollow this section if you are using the ``mask`` parameter in your kernel.\n\nIn NKI Beta 1, the concept of masking was introduced in order to modify the \nbehavior of tensor indexing expressions. Masking was almost always \nused to avoid out-of-bounds access. For example, suppose you are tiling \na tensor of size 129 x 513 and want to use tiles of size 128 x 512. A \ntypical way to write a tiling loop in Beta 1 is shown below.\n\n.. code-block:: python\n\n   t = nl.ndarray(shape=(129, 513), ...)\n   result = nl.ndarray(shape=(129, 513), ...)\n   for i in range(2):\n     for j in range(2):\n       i_p, i_f = nl.mgrid[0:128, 0:512]\n       result[i_p+128*i, i_f+512*j] = nisa.tensor_copy(t[i_p+128*i, i_f+512*j],\n        mask=(i_p+128*i<129) & (i_f+512*j<513))\n\nNote, when ``i`` (or ``j``) is equal to 1, then the index expression \n``result[i_p+128*i, i_f+512*j]`` would overflow the tensor dimension. The mask \nexpression ``mask=(i_p+128*i<129) & (i_f+512*j<513)`` modifies the indexing so \nthat the accesses remain in bounds of the tensor. This mechanism \nhas many drawbacks, including being error-prone and non-intuitive for Python \ndevelopers. Therefore, this mechanism has been deprecated in Beta 2.\n\nIn NKI Beta 2, developers can use standard constructs from Python such as \n``min`` and ``slice`` to build indexing expressions that are in bounds for \nthe tensor. For example, the above code can now be written as:\n\n.. code-block:: python\n\n   for i in range(2):\n     p_start = i * 128\n     p_end = min(129, p_start + 128)\n     p = slice(p_start, p_end)  # a.k.a. (p_start:p_end)\n\n     for j in range(2):\n       f_start = j * 512\n       f_end = min(513, f_start + 512)\n       f = slice(f_start, f_end)  # a.k.a. (f_start:f_end)\n\n       nisa.tensor_copy(result[p, f], t[p, f])\n\nThe developer may also choose to inline the slices, if that is more natural. \nThe below syntax is common in NKI Beta 1.\n\n.. code-block:: python\n\n   nisa.tensor_copy(result[p_start:p_end, f_start:f_end],\n                         t[p_start:p_end, f_start:f_end])\n\nImproved Allocation API\n-----------------------\n\nThe manual allocation API has been simplified. In Beta 2 there is a new \nargument to ``nl.ndarray`` that allows the offset of each tensor to be specified: \n(partition_offset, free_offset). As in Beta 1, the partition offset \ncorresponds to a physical partition lane on the hardware, while the free dimension offset \nis the element offset within each partition. The free dimension offset is \ntranslated into a physical SBUF address by the compiler.\n\n
.. code-block:: python\n\n   # creates your buffer on partition 0, offset by 128 elements of your data type\n   a_result = nl.ndarray(dtype=a.dtype, shape=a.shape, name=\"result\", \n     address=(0, 128), buffer=nl.sbuf)\n\nThe address space for PSUM is now also 2D to be consistent with the hardware. \nRecall that PSUM on NeuronCore v2/v3/v4 is organized into 128 partitions, each \nconsisting of 16KB of memory. Each partition is further divided into 8 PSUM banks, \nwith each bank holding up to 2KB worth of values. The allocation for PSUM \ntensors must start at the beginning of a bank - the compiler will throw an \nerror otherwise.\n\nFor example, the following code will allocate a PSUM tensor on bank 3:\n\n.. code-block:: python\n\n   bank_id = 3\n   PSUM_BANK_SIZE = 2048\n   psum_t = nl.ndarray(dtype=nl.bfloat16, shape=(128, 1024), \n     address=(0, bank_id*PSUM_BANK_SIZE))\n\nTranslate from the Beta 1 Direct Allocation API\n-----------------------------------------------\n\nTo translate a kernel that used the Beta 1 direct allocation API, first make sure that no \ndata structure uses the block dimension. This means reformatting tensors to place the \npartition dimension in the left-most position, using either lists or \nmulti-dimensional tensors for the rest of your dimensions. See \n:ref:`nki_block_dimension_migration_guide` for more information.\n\nAfter this, translate the address of each block. For example, consider the \nfollowing Beta 1 tensor that uses the modulo allocator.\n\n.. code-block:: python\n\n   # beta 1 - uses block dimension and mod allocator\n   k_loaded = nl.ndarray((num_512_tiles_cur_section, nl.par_dim(p_k), n_k), \n    dtype=nl.bfloat16, \n    buffer=sb_mod(base_addr=sca, num_free_tiles=(num_512_tiles_cur_section, )))\n\nNow with Beta 2, developers can translate the block dimension into a list \nand compute the address for each block.\n\n.. code-block:: python\n\n   # beta 2 - use lists of tensors and get lists of virtual byte addresses\n   k_loaded_tensors = []\n   for i in range(num_512_tiles_cur_section):\n     k_loaded_tensors.append(nl.ndarray(shape=(p_k,n_k), dtype=nl.bfloat16, \n     buffer=nl.sbuf, address=(0, sca + (i%num_512_tiles_cur_section)*n_k*2)))\n\nRemove nki.jit decorator on sub-kernels\n---------------------------------------\n\nFor kernels that call other kernels, or call any other functions that are \ndecorated with a ``nki.jit`` decorator, the ``nki.jit`` decorator will need to\nbe removed from the sub-kernels.\n\nIn NKI Beta 1, all the sub-kernels called from a top-level kernel could be \ndecorated with the ``nki.jit(mode='trace')`` decorator. This decorator needs to be \nremoved for the new NKI Compiler. Otherwise, you will see an error about classes \nneeding to inherit from ``nl.NKIObject`` thrown from the callsite of the sub-kernels.\n\nIf a kernel is being called by another kernel and it is also called standalone, the \ndecorator can be applied on-the-fly at the call site to avoid this problem.\n\n.. code-block:: python\n\n   # Do not apply the decorator on the kernel definition\n   def my_kernel(...):\n     pass\n\n   # When calling the kernel, apply the decorator\n   a = torch.tensor(...)\n   kernel_decorated = nki.jit(my_kernel)\n   result = kernel_decorated(a)\n\nTranslation of Block Dimensions\n-------------------------------\n\nIf the kernel uses block dimensions, i.e. tensors whose partition dimension is at any \nposition other than the left-most position, these need to be migrated because block \ndimensions have been removed in Beta 2. There are two performance-equivalent ways to translate block \n
dimensions. The first is to use a Python-like list and the second is to use a \ndifferently-shaped tensor.\n\nUse a Python-like list\n^^^^^^^^^^^^^^^^^^^^^^\n\nThe block dimension of tensors in Beta 1 was syntactic sugar for a list of tensors \nmanaged by the compiler. In NKI Beta 2, users can directly code this pattern using \nstandard lists, without extra compiler support.\n\n.. code-block:: python\n\n   # Before migration\n   t = nl.ndarray((8, nl.par_dim(128), 256), dtype=nl.float32, buffer=nl.sbuf)\n   for i in range(8):\n     t[i]\n\n   # After migration\n   # Create an explicit list of tensors\n   t_lst = []\n   for i in range(8):\n     t_lst.append(nl.ndarray((128, 256), dtype=nl.float32, buffer=nl.sbuf))\n   for i in range(8):\n     t_lst[i]\n\nWith this approach, the programs generated before and after migration are \nidentical and should yield the same performance.\n\nNot using a Python list\n^^^^^^^^^^^^^^^^^^^^^^^\n\nIf blocks need to be alive at the same time, move the block dimension into the \nfree dimension.\n\n.. code-block:: python\n\n   a = nl.ndarray((8, par_dim(128), 512), buffer=nl.sbuf, dtype=bfloat16)\n\n   # ----> Migrate to\n   a = nl.ndarray((128, 8, 512), buffer=nl.sbuf, dtype=bfloat16)\n\nAs an example, if all 8 blocks of add_buf need to be live at the same time, then \nthe block dimension needs to be folded into the free dimension.\n\n.. code-block:: python\n\n   @nki.jit\n   def sb_blocks(inp):\n       res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n       add_buf = nl.ndarray(shape=(8, nl.par_dim(128), 512), dtype=inp.dtype, buffer=nl.sbuf)\n       for i in range(8):\n           nisa.dma_copy(add_buf[i], inp[i])\n       for i in range(8):\n           nisa.dma_copy(res[i], add_buf[i])\n       return res\n\n   # should migrate to\n   @nki.jit\n   def sb_blocks_migrated(inp):\n       res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n       add_buf = nl.ndarray(shape=(128, 8, 512), dtype=inp.dtype, buffer=nl.sbuf)\n       for i in range(8):\n           nisa.dma_copy(add_buf[0:128, i, 0:512], inp[i])\n       for i in range(8):\n           nisa.dma_copy(res[i], add_buf[0:128, i, 0:512])\n       return res\n\nIf blocks do not need to be alive at the same time, remove the block \ndimension and relocate the tensor declaration.\n\n.. code-block:: python\n\n   a = nl.ndarray((8, par_dim(128), 256))\n   for i in nl.affine_range(8):\n     <do something with a[i]>\n\n   # should be transformed to ....\n   for i in nl.affine_range(8):\n     a = nl.ndarray((128, 256))\n     <do something with a>\n\nAs an example, if all 8 blocks of add_buf do not need to be live at the same \ntime, then remove the block dimension and relocate the tensor declaration \ninside the loop.\n\n
.. code-block:: python\n\n   @nki.jit\n   def sb_blocks(inp):\n       res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n       add_buf = nl.ndarray(shape=(8, nl.par_dim(128), 512), dtype=inp.dtype, buffer=nl.sbuf)\n       for i in range(8):\n           nisa.dma_copy(add_buf[i], inp[i])\n           nisa.dma_copy(res[i], add_buf[i])\n       return res\n\n   # should migrate to\n   @nki.jit\n   def sb_blocks_migrated(inp):\n       res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n       for i in range(8):\n           add_buf = nl.ndarray(shape=(128, 512), dtype=inp.dtype, buffer=nl.sbuf)\n           nisa.dma_copy(add_buf[0:128, 0:512], inp[i])\n           nisa.dma_copy(res[i], add_buf[0:128, 0:512])\n       return res\n\nIt is important to note that the dependency relationship between loop iterations \nis different in ``sb_blocks_migrated`` and the following ``sb_blocks_migrated_incorrect`` \nshown below.\n\n.. code-block:: python\n\n   @nki.jit\n   def sb_blocks_migrated_incorrect(inp):\n       res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n       add_buf = nl.ndarray(shape=(128, 512), dtype=inp.dtype, buffer=nl.sbuf)\n       for i in range(8):\n           nisa.dma_copy(add_buf[0:128, 0:512], inp[i])\n           nisa.dma_copy(res[i], add_buf[0:128, 0:512])\n       return res\n\nIn ``sb_blocks_migrated``, the compiler could unroll the loop and materialize \nmultiple copies of the tensor ``add_buf``. However, in ``sb_blocks_migrated_incorrect``, \nthe execution will be serialized because the loop carries a dependency on ``add_buf``.\n\nDynamic Access Pattern\n----------------------\n\nFollow this section for a kernel that uses dynamic access, i.e. using a runtime value \nto index another tensor.\n\nThe syntax for representing dynamic access patterns has changed. In NKI Beta 1, \nan access with a dynamic scalar offset could be represented as shown below where \n``batch_idx`` is a dynamic value in the SBUF:\n\n.. code-block:: python\n\n   batch_idx = nl.multiply(nl.bitwise_and(nl.load(dynamic_idx), y=3), 128)\n   result = nl.ndarray((128, 256), A.dtype, buffer=nl.shared_hbm)\n   batch_idx[...] = 4 # set a constant, but batch_idx is a runtime SBUF value\n   i_p, i_f = nl.mgrid[0:128, 0:256]\n   nisa.dma_copy(src=A[batch_idx, i_p, i_f], dst=result[...])\n\nScalar Dynamic Access\n^^^^^^^^^^^^^^^^^^^^^\n\nIn Beta 2, we need to use a physical access pattern, specified with the ``.ap`` \nmethod, to represent this.\n\n.. code-block:: python\n\n   def indirect_scalar_dynamic_dma(A):\n     # Assume input A is of shape (4*128, 512). We want to copy from A[3*128:, 0:256]\n     # The 3*128 offset comes from a dynamic variable in SBUF\n     assert A.shape == [512, 512]\n     batch_idx = nl.ndarray((1, 1), nl.int32, buffer=nl.sbuf)\n     nisa.memset(batch_idx, value=3*128)\n\n     result = nl.ndarray((128, 256), A.dtype, buffer=nl.shared_hbm)\n\n     nisa.dma_copy(src=A.ap(\n       pattern=[[512, 128], [1, 256]], offset=0, \n       scalar_offset=batch_idx, indirect_dim=0\n       ),\n       dst=result[...])\n\n     return result\n\nThe ``scalar_offset`` is an SBUF value that specifies the index on the \n``indirect_dim`` of the tensor. For example, the code block above accesses \n``batch_idx`` on the 0-th dimension of the tensor ``A``. It is important \nto note that the dimension is relative to the **base tensor**, not relative\nto the **pattern** specified.
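\n\nReading the ``pattern`` in the example above together with the canonical loop form later in this section, each pattern entry appears to be a ``[stride, count]`` pair expressed in elements of the base tensor. This reading is an inference from the example rather than a formal definition of ``.ap()``:\n\n.. code-block:: python\n\n   # pattern=[[512, 128], [1, 256]] on the (512, 512) base tensor A:\n   #   [512, 128] -> 128 steps with a stride of 512 elements (one full row of A per step)\n   #   [1,   256] -> 256 contiguous elements within each row\n   # Combined with scalar_offset=batch_idx and indirect_dim=0, this selects\n   # A[batch_idx : batch_idx + 128, 0:256]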
\n\nThis example will access the memory from ``A`` starting at the element offset below.\n\n.. code-block:: python\n\n   # prod(A.shape[indirect_dim+1:]) is the accumulated shape \n   # to the right of indirect_dim\n   offset + scalar_offset * prod(A.shape[indirect_dim+1:])\n\nIn the example above, the access would start from:\n\n.. code-block:: python\n\n   0 + batch_idx * 512\n\nAgain, we should notice that ``512`` is read from the shape of the **base tensor**, not \nfrom the access pattern. The shape of the access pattern is ``(128, 256)``.\n\nIn conventional NumPy syntax, the above means that we are accessing \n``A[batch_idx:batch_idx+128, 0:256]``. Writing this in the canonical loop form, \nthe result of the access is the following:\n\n.. code-block:: python\n\n   result = nl.ndarray(shape=(128, 256), dtype=A.dtype, buffer=nl.sbuf)\n   for x in range(128):\n     for y in range(256):\n       result[x, y] = A.flatten()[0 + batch_idx*512 + x*512 + y*1]\n\nVector Dynamic Access\n^^^^^^^^^^^^^^^^^^^^^\n\nVector dynamic access is similar to scalar dynamic access, except that we need to specify \nthe field ``vector_offset``. Currently, only ``indirect_dim=0`` is supported. \nThe stride on the leading dimension must be the total number of \nelements to the right of the leading dimension in the **base tensor**, and the stride\nspecified in the leading dimension of the pattern in the `.ap()` is currently ignored.\nWe still recommend setting the stride properly so that the code will still work if this\nlimitation is lifted in the future.\n\n.. code-block:: python\n\n   def indirect_vector_dynamic_dma(A):\n     # shape of A is (128, 512)\n     dynamic_idx_legal = nl.ndarray((64, 1), nl.int32, nl.sbuf)\n     nisa.iota(dynamic_idx_legal, [[1, 1]], 0, 2)\n\n     result_sb = nl.ndarray((64, 512), nl.float32, buffer=nl.sbuf)\n     result_hbm = nl.ndarray((64, 512), nl.float32, buffer=nl.shared_hbm)\n\n     nisa.dma_copy(src=A.ap(\n       [[512, 64], [1, 512]], 0, vector_offset=dynamic_idx_legal, indirect_dim=0\n       ), dst=result_sb, name='inst0')\n\n     nisa.dma_copy(result_hbm, result_sb, name=\"copy1\")\n\n     return result_hbm\n\nFor this particular case, the semantics of the access are the following. Note that \nthe stride on the dynamic dimension is read directly from the **base tensor**.\n\n.. code-block:: python\n\n   indirect_dimension = 0\n\n   for w in range(64):\n     for z in range(512):\n       dynamic_idx = dynamic_idx_legal[w]\n       A[\n         # static offset\n         offset +\n         # access pattern with the index on the indirect dimension replaced by dynamic_idx;\n         # note that the 512 stride is read from the shape of the base tensor\n         1 * z + 512 * dynamic_idx\n       ]\n\nFurther reading\n---------------\n\n- :doc:`/nki/deep-dives/nki-compiler`\n- :doc:`/nki/api/index`\n\n"
  },
  {
    "path": "nki/migration/nki_block_dimension_migration_guide.rst",
    "content": ".. _nki_block_dimension_migration_guide:\n\nNKI Block Dimension Migration Guide\n===================================\n\nThe SBUF/PSUM tensors in NKI used to allow block dimensions in front of the partition dimension. The block dimension support has been removed due the following reasons.\n\n* Removing block dimensions does not hurt the expressivity of NKI.\n* Block dimension is a pure software concept and does not have direct hardware mapping.\n* The block dimension is unintuitive and causes confusion.\n* Using block dimension has no inherit performance benefit, particularly using block dimension has no relationship with memory throughput whatsoever.\n* Multi-buffering is implicit with block dimension. Removing block dimension will make multi-buffering more natural.\n\nThis document will first explain the semantics of block dimensions in detail, then it will provide information on how to migrate existing code that uses block dimensions while maintain the functional correctness and performance.\n\nWhat are block dimensions?\n--------------------------\n\nConsider the following NKI tensor.\n\n.. code-block:: python\n  :linenos:\n\n  a = nl.ndarray((4, 8, nl.par_dim(128), 2, 512), buffer=nl.sbuf)\n\n  # - (4, 8): (B) block dimensions\n  # - 128: (P) partition dimension\n  # - (2, 512): (F) free dimension\n\n\nA NKI tensor has three types of dimensions: `(B, P, F)` . The partition dimension maps to the partition dimension of the physical memory, and the free dimensions describe how data is organized in each SBUF/PSUM partition. The block dimensions described how many physical `(P, F)` tiles the tensor has.\n\nThe block dimension of tensors is a **logical** dimension and is a pure software concept. The compiler analyzes the memory dependency and allocates physical address to each tiles. **This means that the physical tiles may not be alive in the memory simultaneously**, and in most of the cases they don not. Consider the following code snippet that access the tensor `a`.\n\n.. code-block:: python\n  :linenos:\n\n  @nki.jit\n  def exp_func(inp):\n    output = nl.ndarray((4, 8, 128, 2, 512), dtype=float32, \n      buffer=nl.shared_hbm)\n    a = nl.ndarray((4, 8, nl.par_dim(128), 2, 512), dtype=float32, buffer=nl.sbuf)\n    for i in range(4):\n      for j in range(8):\n        a[i, j] = nl.load(inp[i, j])\n        a[i, j] = nl.exp(a[i, j])\n        nl.store(output[i, j], value=result)\n\n\nAt the very minimum, only 1 physical tile of `a` needs to be alive. Then the execution is completely serialized. Essentially, all physical tiles would have the exact same memory address.\n\n.. code-block::\n  :linenos:\n\n  Physical Address Map\n\n  output[0, 0] --> Partition 0 - 128, Free 0 - 2048B\n  output[0, 1] --> Partition 0 - 128, Free 0 - 2048B\n  ...\n\n\nInstead, compiler could choose to allocate 2 physical tiles to `a`, then the dma copy from HBM to SBUF can overlap with the exponential operation. In other word, **the block dimension allows compiler to perform space-time tradeoff at liberty.**\n\n.. 
.. code-block::\n  :linenos:\n\n  Physical Address Map\n\n  output[0, 0] --> Partition 0 - 128, Free 0    - 2048B\n  output[0, 1] --> Partition 0 - 128, Free 2048 - 4096B\n  output[0, 2] --> Partition 0 - 128, Free 0    - 2048B\n  output[0, 3] --> Partition 0 - 128, Free 2048 - 4096B\n  ...\n\n\nWhen performing the migration, it is important to understand the dependency relationship between blocks and choose the correct migration method accordingly.\n\nMigration for SBUF tensors\n--------------------------\n\nIf blocks need to be alive at the same time, move the block dimension into the free dimension\n**************************************************************************************************\n\n.. code-block:: python\n  :linenos:\n\n  a = nl.ndarray((8, par_dim(128), 512), buffer=nl.sbuf, dtype=bfloat16)\n\n  # ----> Migrate to\n  a = nl.ndarray((128, 8, 512), buffer=nl.sbuf, dtype=bfloat16)\n\nAs an example, all 8 blocks of ``add_buf`` need to be alive at the same time when the first for loop finishes. Therefore, the block dimension needs to be folded into the free dimension.\n\n.. code-block:: python\n    :linenos:\n\n    @nki.jit\n    def sb_blocks(inp):\n        res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n        add_buf = nl.ndarray(shape=(8, nl.par_dim(128), 512), dtype=inp.dtype, buffer=nl.sbuf)\n        for i in range(8):\n            add_buf[i] = nl.load(inp[i])\n        for i in range(8):\n            nl.store(res[i], add_buf[i])\n        return res\n\n    # should migrate to\n    @nki.jit\n    def sb_blocks_migrated(inp):\n        res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n        add_buf = nl.ndarray(shape=(128, 8, 512), dtype=inp.dtype, buffer=nl.sbuf)\n        for i in range(8):\n            add_buf[0:128, i, 0:512] = nl.load(inp[i])\n        for i in range(8):\n            nl.store(res[i], add_buf[0:128, i, 0:512])\n        return res\n\nIf blocks do not need to be alive at the same time, remove the block dimension and declare the tensor inside the loop\n************************************************************************************************************************\n\n.. code-block:: python\n  :linenos:\n\n  a = nl.ndarray((8, par_dim(128), 256))\n  for i in nl.affine_range(8):\n    <do something with a[i]>\n\n  # should be transformed to ....\n  for i in nl.affine_range(8):\n    a = nl.ndarray((128, 256))\n    <do something with a>\n\nAs an example, all 8 blocks of ``add_buf`` do not need to be alive at the same time. We can remove the block dimension and declare the tensor inside the loop.\n\n.. code-block:: python\n    :linenos:\n\n    @nki.jit\n    def sb_blocks(inp):\n        res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n        add_buf = nl.ndarray(shape=(8, nl.par_dim(128), 512), dtype=inp.dtype, buffer=nl.sbuf)\n        for i in range(8):\n            add_buf[i] = nl.load(inp[i])\n            nl.store(res[i], add_buf[i])\n        return res\n\n    # should migrate to\n    @nki.jit\n    def sb_blocks_migrated(inp):\n        res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n        for i in range(8):\n            add_buf = nl.ndarray(shape=(128, 512), dtype=inp.dtype, buffer=nl.sbuf)\n            add_buf[0:128, 0:512] = nl.load(inp[i])\n            nl.store(res[i], add_buf[0:128, 0:512])\n        return res\n\n
.. warning::\n    To preserve performance, it is important to declare the tensor inside the loop.\n\nIt is important to note that the dependency relationship between loop iterations is different in ``sb_blocks_migrated`` and the following ``sb_blocks_migrated_incorrect``.\n\n.. code-block:: python\n    :linenos:\n\n    @nki.jit\n    def sb_blocks_migrated_incorrect(inp):\n        res = nl.ndarray(shape=(8, 128, 512), dtype=inp.dtype, buffer=nl.shared_hbm)\n        add_buf = nl.ndarray(shape=(128, 512), dtype=inp.dtype, buffer=nl.sbuf)\n        for i in range(8):\n            add_buf[0:128, 0:512] = nl.load(inp[i])\n            nl.store(res[i], add_buf[0:128, 0:512])\n        return res\n\nIn ``sb_blocks_migrated``, the compiler could unroll the loop and materialize multiple copies of the tensor ``add_buf``. However, in ``sb_blocks_migrated_incorrect``, the execution will be serialized because the loop carries a dependency on ``add_buf``.\n\nMigration for PSUM tensors\n--------------------------\n\n.. note:: \n    To be filled in; the backend support for removing blocks in PSUM tensors is still in progress.\n\n\nMigration of direct allocation & multi-buffering\n------------------------------------------------\n\nWhen we have block dimensions, we allocate interleaved addresses for the blocks to achieve multi-buffering.\n\n.. code-block:: python\n  :linenos:\n\n  def interleave_alloc_func(idx, pdim_size, fdim_size):\n    \"\"\"\n    This function assumes a 1d block dimension, and will allocate a unique\n    address by modulo of 2.\n\n    For a tensor of 4 blocks, block 0 and 2 will have the same address, while\n    block 1 and 3 will have the same address that is different to that of 0 and 2.\n    \"\"\"\n    # unpack the tuple\n    idx, = idx\n\n    # hard-code to partition 0, since each tile takes up 128 partitions\n    start_partition = 0\n\n    return (start_partition, (idx % 2) * fdim_size)\n\n  @nki.jit\n  def copy_func(inp):\n    output = nl.ndarray((4, 128, 512), dtype=nl.float32, buffer=nl.shared_hbm)\n    a = nl.ndarray((4, nl.par_dim(128), 512), dtype=nl.float32, buffer=ncc.sbuf.alloc(interleave_alloc_func))\n    for i in range(4):\n        a[i] = nl.load(inp[i])\n        nl.store(output[i], value=a[i])\n\nAfter removing the block dimension, we could write the following to implement the same multi-buffering, which is actually more natural and closer to how the same pattern is written on a CPU.\n\n.. code-block:: python\n  :linenos:\n\n  def interleave_alloc_func(idx, pdim_size, fdim_size):\n    \"\"\"\n    The tensor no longer has a block dimension, so this function simply places\n    it at a fixed address. The two buffers now live side by side in the free\n    dimension and are selected with the `i % 2` index inside the kernel.\n    \"\"\"\n    # there is no block dimension to unpack\n    assert idx == ()\n\n    # hard-code to partition 0, since the tile takes up 128 partitions\n    start_partition = 0\n\n    return (start_partition, 0)\n\n  @nki.compiler.skip_middle_end_transformations\n  @nki.jit\n  def exp_func(inp):\n    output = nl.ndarray((4, 128, 512), dtype=nl.float32, buffer=nl.shared_hbm)\n    a = nl.ndarray((128, 2, 512), dtype=nl.float32, buffer=ncc.sbuf.alloc(interleave_alloc_func))\n    for i in range(4):\n      a[0:128, i % 2, 0:512] = nl.load(inp[i])\n      nl.store(output[i], value=a[0:128, i % 2, 0:512])\n"
  },
  {
    "path": "nki/nki_faq.rst",
    "content": ".. _nki_faq:\n\nNKI FAQ\n=========\n\nWhen should I use NKI?\n~~~~~~~~~~~~~~~~~~~~~~\n\nNKI lets you write custom operators that program directly against the Neuron ISA.\nThere are two common reasons to use NKI:\n\n* **Performance optimization**: When the Neuron Compiler's general-purpose optimizations\n  don't fully exploit the hardware for your specific workload, NKI lets you write\n  hand-tuned operators that maximize compute and memory throughput. For example,\n  the NKI Library provides optimized kernels for attention, MLP, RMSNorm with\n  quantization, and collective communication that outperform compiler-generated\n  equivalents.\n\n* **Novel operators and architectures**: NKI enables you to implement operators that\n  are not yet supported by the Neuron Compiler, letting you self-serve new deep learning\n  architectures and custom operations without waiting for compiler support.\n\nWhich AWS chips does NKI support?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI supports Trainium2 and Trainium3 chips,\navailable in the following instance types: Trn2 and Trn3.\n\nWhich compute engines are supported?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe following AWS Trainium and Inferentia compute engines are\nsupported: Tensor Engine, Vector Engine, Scalar Engine, and GpSimd Engine.\nFor more details, see the :doc:`NeuronDevice Architecture Guide </nki/guides/architecture/index>`,\nand refer to :doc:`nki.isa <api/nki.isa>` APIs to identify which engines are utilized for each instruction.\n\nHow do I launch a NKI kernel onto a logical NeuronCore with Trainium2 or Trainium3 from NKI?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA logical NeuronCore (LNC) can consist of multiple physical NeuronCores. In the current Neuron release, an LNC on Trainium2 or Trainium3 can have up to two physical NeuronCores.\n\nFor more details on NeuronCore configurations, see\n`Logical NeuronCore configurations <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/logical-neuroncore-config.html#logical-neuroncore-config>`__.\n\nIn NKI, users can launch a NKI kernel onto multiple physical NeuronCores within a logical NeuronCore using ``kernel[2]`` to set LNC=2. Each core receives a different ``nl.program_id(0)`` value (0 or 1).\n\nWhat ML Frameworks support NKI kernels?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI is integrated with :ref:`nki_framework_custom_op_pytorch` and :ref:`nki_framework_custom_op_jax`\nframeworks. For more details, see the :ref:`nki_framework_custom_op`.\n\nWhat Neuron software does not currently support NKI?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nNKI does not currently support integration with\nNeuron Custom C++ Operators, Transformers NeuronX, and Neuron Collective Communication.\n\nWhere can I find NKI sample kernels?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNKI provides two open source repositories with kernel examples:\n\n* `NKI Library <https://github.com/aws-neuron/nki-library>`__ — Production-ready, optimized kernels for common operations (matrix multiplication, attention, normalization, quantization, etc.) that you can use directly in your models. See the :doc:`NKI Library documentation </nki/library/index>` for API reference and design specifications.\n\n* `NKI Samples <https://github.com/aws-neuron/nki-samples>`__ — Reference and tutorial kernels that demonstrate NKI programming patterns and concepts. 
These are designed for learning and experimentation rather than production use.\n\nFor step-by-step guides on writing NKI kernels, see the :doc:`NKI tutorials </nki/guides/tutorials/index>`.\n\nWhat should I do if I have trouble resolving a kernel compilation error?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRefer to the `NKI sample GitHub issues <https://github.com/aws-neuron/nki-samples/issues>`__ for guidance on\nresolving common NKI compilation errors.\n\nIf you encounter compilation errors from the Neuron Compiler that you cannot understand or\nresolve, check the NKI samples `GitHub issues <https://github.com/aws-neuron/nki-samples/issues>`__\nand open a new issue if no similar issue exists.\n\nHow can I debug numerical issues in NKI kernels?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWe encourage NKI programmers to build kernels incrementally and verify the output of small operators one at a time.\nNKI also provides a CPU simulation mode that supports printing of kernel intermediate tensor values to the console.\nSee :doc:`nki.simulate </nki/api/generated/nki.simulate>` for a code example.\n\n\nHow can I optimize my NKI kernel?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo learn how to optimize your NKI kernel, see the :ref:`nki_perf_guide`.\n\nDoes NKI support the entire Neuron instruction set?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron will iteratively add support for the Neuron\ninstruction set by adding more :doc:`nki.isa <api/nki.isa>` (Instruction Set\nArchitecture) APIs in upcoming Neuron releases.\n\n\nWill NKI APIs guarantee backwards compatibility?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe :doc:`NKI APIs <api/index>` follow the Neuron Software Maintenance policy for Neuron APIs.\nFor more information, see the\n`SDK Maintenance Policy <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/sdk-policy.html>`__."
  },
  {
    "path": "nki/scripts/markdown2rst.py",
    "content": "import re\nimport sys\nfrom m2r import convert\n\nargs = sys.argv\nassert len(args) == 3\n\n# Useful regex engine for testing: https://www.regexpal.com/\n\n# Gets user input\n# filename = input(\"What's the file you want to insert line breaks?\")\n# new_filename = input(\"What's the file you want to write results to?\")\nfilename = args[1]\nnew_filename = args[2]\n\nf = open(filename, \"r\")\ntext = f.read()\nf.close()\n\n# Step 1: run m2r tool to convert markdown to sphinx\ntext = convert(text)\n\n# Step 2:\n# Replace image with sphinx figure directive\n# There can be two formats for images coming out of markdown\npattern1 = r\"\\[Image: image\\.png\\]\\n(.*)\\n\"\npattern2 = r\"\\[Image: image\\.png\\](.*)\\n\"\n\nreplacement=r'''\n.. _<FIXME>:\n\n.. figure:: img/<FIXME>.png\n   :align: center\n   :width: 60%\n\n   \\1\n'''\n\ntext = re.sub(pattern1, replacement, text)\ntext = re.sub(pattern2, replacement, text)\n\n# Replace code browser URL\npattern_url = r\"https:\\/\\/prod\\.artifactbrowser\\.brazil\\.aws\\.dev(\\S*)_build\\/html\\/\"\nreplacement= \"\"\n\ntext = re.sub(pattern_url, replacement, text)\n\n# Step 3.\n# Insert line breaks\ntext = text.split(\"\\n\")\nmax_char = 120\n\nsplit_lines = []\nfor line in text:\n    words = line.replace(\"\\n\", \"\").split(\" \")\n    i_words = 0\n\n    while (i_words < len(words)):\n        buffer = \"\"\n        while(len(buffer) < 120):\n            buffer += words[i_words] + \" \"\n            i_words += 1\n\n            if i_words >= len(words):\n                break\n\n        split_lines.append(buffer)\n\n\n\nwith open(new_filename, \"w+\") as f:\n    f.write(\"\\n\".join(split_lines))\n"
  },
  {
    "path": "nki/scripts/requirements.txt",
    "content": "m2r"
  },
  {
    "path": "nki/test/test_nki_isa_activation.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.language as nl\nimport neuronxcc.nki.isa as nisa\n# NKI_EXAMPLE_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_activation(a_tensor, b_tensor, c_tensor):\n  a_act_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype, buffer=nl.shared_hbm)\n  b_act_tensor = nl.ndarray(b_tensor.shape, dtype=b_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ##################################################################\n  # Example 1: perform exponential function on matrix a of shape (128, 1024)\n  ##################################################################\n  a = nl.load(a_tensor)\n  activated_a = nisa.activation(op=nl.exp, data=a)\n  nl.store(a_act_tensor, activated_a)\n\n  ##################################################################\n  # Example 2: perform the following operations to matrix b of shape (128, 512)\n  # using a single activation instruction: np.square(b * 2.0) + c\n  # 1) compute `np.square(b * 2.0 + c)`\n  # 2) cast 1) results into bfloat16\n  ##################################################################\n  b = nl.load(b_tensor)\n  c = nl.load(c_tensor)\n  activated_b = nisa.activation(op=np.square, data=b, bias=c, scale=2.0,\n                                dtype=nl.bfloat16)\n  nl.store(b_act_tensor, activated_b)\n  # NKI_EXAMPLE_END\n\n  return a_act_tensor, b_act_tensor\n\n  \nclass TestNkiIsaExamplesActivation(unittest.TestCase):\n  def test_activation(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 1024]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    c = np.random.random_sample([128, 1]).astype(np.float32) * 100\n\n    a_act, b_act = nki_activation(a, b, c)\n\n    a_act_golden = np.exp(a)\n    b_act_golden = np.square(b*2+c)\n\n    self.assertTrue(np.allclose(a_act, a_act_golden))\n    self.assertTrue(np.allclose(b_act, b_act_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_affine_select.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_affine_select(a_tensor):\n  b_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ##################################################################\n  # Example 1: Take tile a of shape [128, 128] and replace its\n  # upper triangle with nl.fp32.min;\n  ##################################################################\n  ix, iy = nl.mgrid[0:128, 0:128]\n  a = nl.load(a_tensor[ix, iy])\n\n  b = nisa.affine_select(pred=(iy <ix), on_true_tile=a[ix, iy], on_false_value=nl.fp32.min)\n\n  nl.store(b_tensor[ix, iy], b)\n  # NKI_EXAMPLE_END\n\n  return b_tensor\n\n\nclass TestNkiIsaExamplesAffineSelect(unittest.TestCase):\n  def test_affine_select(self):\n    a = np.random.random_sample([128, 128]).astype(np.float32) * 100\n    b_golden = np.copy(a)\n\n    b = nki_affine_select(a)\n\n    triui = np.triu_indices_from(b_golden) # upper triangle indicies\n    b_golden[triui] = nl.fp32.min\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n\n\n\n "
  },
  {
    "path": "nki/test/test_nki_isa_bn_stats.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_END\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_bn_stats_bn_aggr_1(a_tensor):\n  mean_a_tensor = nl.ndarray([a_tensor.shape[0], 1], dtype=a_tensor.dtype, buffer=nl.shared_hbm)\n  var_a_tensor = nl.ndarray([a_tensor.shape[0], 1], dtype=a_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ##################################################################\n  # Example 1: Calculate the mean and variance for each partition\n  # of tile a with shape (128, 128)\n  ##################################################################\n  a: tensor[128, 128] = nl.load(a_tensor)\n  stats_a: tensor[128, 6] = nisa.bn_stats(a)\n  mean_var_a: tensor[128, 2] = nisa.bn_aggr(stats_a)\n\n  # Extract mean and variance\n  mean_a = mean_var_a[:, 0]\n  var_a = mean_var_a[:, 1]\n  nl.store(mean_a_tensor, mean_a)\n  nl.store(var_a_tensor, var_a)\n  # NKI_EXAMPLE_END\n\n  return mean_a_tensor, var_a_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_bn_stats_bn_aggr_2(b_tensor):\n  mean_b_tensor = nl.ndarray([b_tensor.shape[0], 1], dtype=b_tensor.dtype, buffer=nl.shared_hbm)\n  var_b_tensor = nl.ndarray([b_tensor.shape[0], 1], dtype=b_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  # ##################################################################\n  # # Example 2: Calculate the mean and variance for each partition of\n  # # tile b with shape [128, 1024]\n  # ##################################################################\n  b: tensor[128, 1024] = nl.load(b_tensor)\n\n  # Run bn_stats in two tiles because b has 1024 elements per partition,\n  # but bn_stats has a limitation of nl.tile_size.bn_stats_fmax\n  # Initialize a bn_stats output tile with shape of [128, 6*2] to\n  # hold outputs of two bn_stats instructions\n  stats_b = nl.ndarray((128, 6 * 2), dtype=nl.float32)\n  bn_tile = nl.tile_size.bn_stats_fmax\n  ix, iy = nl.mgrid[0:128, 0:bn_tile]\n  iz, iw = nl.mgrid[0:128, 0:6]\n\n  for i in range(1024 // bn_tile):\n    stats_b[iz, i * 6 + iw] = nisa.bn_stats(b[ix, i * bn_tile + iy], dtype=nl.float32)\n\n  mean_var_b = nisa.bn_aggr(stats_b)\n\n  # Extract mean and variance\n  mean_b = mean_var_b[:, 0]\n  var_b = mean_var_b[:, 1]\n\n  nl.store(mean_b_tensor, mean_b)\n  nl.store(var_b_tensor, var_b)\n  # NKI_EXAMPLE_END\n\n  return mean_b_tensor, var_b_tensor\n\n\nclass TestNkiIsaExamplesBnStatsBnAggr(unittest.TestCase):\n  def test_bn_stats_bn_aggr(self):\n    a = np.random.random_sample([128, 128]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 1024]).astype(np.float32) * 100\n\n    a_mean, a_var = nki_bn_stats_bn_aggr_1(a)\n    b_mean, b_var = nki_bn_stats_bn_aggr_2(b)\n\n    a_mean_golden = np.mean(a, axis=1, keepdims=True)\n    b_mean_golden = np.mean(b, axis=1, keepdims=True)\n    a_var_golden = np.var(a, axis=1, keepdims=True)\n    b_var_golden = np.var(b, axis=1, keepdims=True)\n\n    self.assertTrue(np.allclose(a_mean, a_mean_golden))\n    self.assertTrue(np.allclose(a_var, a_var_golden))\n    self.assertTrue(np.allclose(b_mean, b_mean_golden))\n    self.assertTrue(np.allclose(b_var, b_var_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_copypredicated.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_21_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_21_END\nimport numpy as np\n...\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_copy_predicated(predicate, on_true_tensor, on_false_tensor):\n  # NKI_EXAMPLE_21_BEGIN\n  ##################################################################\n  # Example 1: Conditionally copies elements from the `on_true` tile to \n  # SBUF/PSUM destination tile using Vector Engine, where copying occurs \n  # only at positions where the predicate evaluates to True.\n  ##################################################################\n  # NKI_EXAMPLE_21_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=on_true_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_21_BEGIN\n  ...\n  pre_tile: tensor[128, 512] = nl.load(predicate)\n  src_tile: tensor[128, 512] = nl.load(on_true_tensor)\n\n  ix, iy = nl.mgrid[0:128, 0:512]\n  dst_tile: tensor[128, 512] = nl.zeros(shape=src_tile.shape, dtype=src_tile.dtype)\n  dst_tile[ix, iy] = nl.load(on_false_tensor)\n\n  nisa.tensor_copy_predicated(src=src_tile, dst=dst_tile, predicate=pre_tile)\n  # NKI_EXAMPLE_21_END\n\n  nl.store(out_tensor, dst_tile)\n  return out_tensor\n\n\nclass TestNkiIsaExamplescopy_predicated(unittest.TestCase):\n  def test_copy_predicated(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    b = nki_copy_predicated(np.less_equal(a, 0.8), a, b)\n    b_golden = np.where(np.less_equal(a, 0.8), a, b)\n\n    self.assertTrue(np.allclose(b, b_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_dma_copy.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n\n# NKI_EXAMPLE_1_BEGIN # NKI_EXAMPLE_2_BEGIN # NKI_EXAMPLE_3_BEGIN # NKI_EXAMPLE_4_BEGIN # NKI_EXAMPLE_5_BEGIN # NKI_EXAMPLE_6_BEGIN # NKI_EXAMPLE_7_END\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\n# NKI_EXAMPLE_0_END\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_1_END # NKI_EXAMPLE_2_END # NKI_EXAMPLE_3_END # NKI_EXAMPLE_4_END # NKI_EXAMPLE_5_END # NKI_EXAMPLE_6_END # NKI_EXAMPLE_7_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_dma_copy(a):\n  b = nl.ndarray(a.shape, dtype=a.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_0_BEGIN\n  ############################################################################\n  # Example 1: Copy over the tensor to another tensor\n  ############################################################################\n  nisa.dma_copy(dst=b, src=a)\n\n  # NKI_EXAMPLE_0_END\n\n  return b\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_indirect_load_oob_err(in_tensor):\n  # NKI_EXAMPLE_1_BEGIN\n  ############################################################################\n  # Example 2: Load elements from HBM with indirect addressing. If addressing \n  # results out-of-bound access, the operation will fail.\n  ############################################################################\n  # NKI_EXAMPLE_1_END\n  ...\n  out_tensor: tensor[64, 512] = nl.ndarray([64, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_1_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n//2, 0:m]\n\n  expr_arange = 2*nl.arange(n//2)[:, None]\n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)\n  nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)\n  # NKI_EXAMPLE_1_END\n\n  nl.store(out_tensor, value=out_tile)\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_indirect_load_oob_error_negative(in_tensor):\n  # NKI_EXAMPLE_2_BEGIN\n  ############################################################################\n  # Example 3: Load elements from HBM with indirect addressing. 
If addressing \n  # results in out-of-bounds access, the operation will fail.\n  ############################################################################\n  # NKI_EXAMPLE_2_END\n  ...\n  out_tensor: tensor[64, 512] = nl.ndarray([64, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_2_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n//2, 0:m]\n\n  # indices are out of range on purpose to demonstrate the error\n  expr_arange = 3*nl.arange(n//2)[:, None] \n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)\n  nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.error)\n\n  # NKI_EXAMPLE_2_END\n\n  nl.store(out_tensor, value=out_tile)\n  return out_tensor\n\n  \n@nki.jit(mode=\"simulation\")\ndef nki_indirect_load_oob_skip(in_tensor):\n  # NKI_EXAMPLE_3_BEGIN\n  ############################################################################\n  # Example 4: Load elements from HBM with indirect addressing. If addressing \n  # results in out-of-bounds access, the operation will skip indices.\n  ############################################################################\n  # NKI_EXAMPLE_3_END\n  ...\n  out_tensor: tensor[64, 512] = nl.ndarray([64, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_3_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n//2, 0:m]\n\n  # indices are out of range on purpose\n  expr_arange = 3*nl.arange(n//2)[:, None] \n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[64, 512] = nisa.memset(shape=(n//2, m), value=-1, dtype=in_tensor.dtype)\n  nisa.dma_copy(src=in_tensor[idx_tile, iy], dst=out_tile[ix, iy], oob_mode=nisa.oob_mode.skip)\n\n  # NKI_EXAMPLE_3_END\n\n  nl.store(out_tensor, value=out_tile)\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_indirect_store_rmw(in_tensor):\n  # NKI_EXAMPLE_4_BEGIN\n  ############################################################################\n  # Example 5: Store elements to HBM with indirect addressing and with \n  # read-modifed-write operation.\n  ############################################################################\n  # NKI_EXAMPLE_4_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_4_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n, 0:m]\n\n  expr_arange = 2*nl.arange(n)[:, None]\n  inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])\n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=1, dtype=in_tensor.dtype)\n  nl.store(out_tensor, value=out_tile)\n  nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, dst_rmw_op=np.add)\n  # NKI_EXAMPLE_4_END\n\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_indirect_store_oob_err(in_tensor):\n  # NKI_EXAMPLE_5_BEGIN\n  ############################################################################\n  # Example 6: Store elements to HBM with indirect addressing. 
If indirect \n  # addressing results out-of-bound access, the operation will fail.\n  ############################################################################\n  # NKI_EXAMPLE_5_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_5_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n, 0:m]\n\n  expr_arange = 2*nl.arange(n)[:, None]\n  inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])\n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)\n  nl.store(out_tensor, value=out_tile)\n  nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)\n  # NKI_EXAMPLE_5_END\n\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_indirect_store_oob_err_negative(in_tensor):\n  # NKI_EXAMPLE_6_BEGIN\n  ############################################################################\n  # Example 7: Store elements to HBM with indirect addressing. If indirect \n  # addressing results out-of-bounds access, the operation will skip indices.\n  ############################################################################\n  # NKI_EXAMPLE_6_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_6_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n, 0:m]\n\n  # indices are out of range on purpose to demonstrate the error\n  expr_arange = 3*nl.arange(n)[:, None] \n  inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])\n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)\n  nl.store(out_tensor, value=out_tile)\n  nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.error)\n\n  # NKI_EXAMPLE_6_END\n\n  return out_tensor\n\n  \n@nki.jit(mode=\"simulation\")\ndef nki_indirect_store_oob_skip(in_tensor):\n  # NKI_EXAMPLE_7_BEGIN\n  ############################################################################\n  # Example 8: Store elements to HBM with indirect addressing. If indirect \n  # addressing results out-of-bounds access, the operation will skip indices.\n  ############################################################################\n  # NKI_EXAMPLE_7_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_7_BEGIN\n  ...\n  n, m = in_tensor.shape\n  ix, iy = nl.mgrid[0:n, 0:m]\n\n  # indices are out of range on purpose\n  expr_arange = 3*nl.arange(n)[:, None] \n  inp_tile: tensor[64, 512] = nl.load(in_tensor[ix, iy])\n  idx_tile: tensor[64, 1] = nisa.iota(expr_arange, dtype=np.int32)\n\n  out_tile: tensor[128, 512] = nisa.memset(shape=(2*n, m), value=-1, dtype=in_tensor.dtype)\n  nl.store(out_tensor, value=out_tile)\n  nisa.dma_copy(dst=out_tensor[idx_tile, iy], src=inp_tile, oob_mode=nisa.oob_mode.skip)\n\n  # NKI_EXAMPLE_7_END\n\n  return out_tensor\n\n@nki.jit(mode='simulation')\ndef nki_dma_copy_swdge(in_tensor):\n  # NKI_EXAMPLE_8_BEGIN\n  ############################################################################\n  # Example 9: Copy data with SWDGE. 
Must follow DGE access pattern requirements\n  # to use DGE.\n  ############################################################################\n  # NKI_EXAMPLE_8_END\n  ...\n  out_tensor: tensor[64, 512] = nl.ndarray([64, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_8_BEGIN\n  ...\n  nisa.dma_copy(dst=out_tensor, src=in_tensor, dge_mode=nisa.dge_mode.swdge)\n\n  # NKI_EXAMPLE_8_END\n\n  return out_tensor\n\n@nki.jit(mode='simulation', platform_target='trn2')\ndef nki_dma_copy_hwdge(in_tensor):\n  # NKI_EXAMPLE_9_BEGIN\n  ############################################################################\n  # Example 10: Copy data with HWDGE. Must follow DGE access pattern requirements,\n  # and further have (1) accessed partitions=128 (2) spill/reload DMA \n  # (3) target=trn2+ to use HWDGE.\n  ############################################################################\n  # NKI_EXAMPLE_9_END\n  ...\n  out_tensor: tensor[128, 512] = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                                            buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_9_BEGIN\n  ...\n  inp_tile: tensor[128, 512] = nl.load(in_tensor)\n  out_tile: tensor[128, 512] = nl.zeros_like(inp_tile, buffer=nl.sbuf)\n  nisa.dma_copy(dst=out_tile, src=inp_tile, dge_mode=nisa.dge_mode.hwdge)\n  nl.store(out_tensor, value=out_tile)\n\n  # NKI_EXAMPLE_9_END\n\n  return out_tensor\n      \nclass TestNkiIsaExamplesTensorCopy(unittest.TestCase):\n  def test_tensor_copy(self):\n    np.random.seed(0)\n    src = np.random.random_sample([256, 1]).astype(np.float32) * 100\n    dst_golden = np.copy(src)\n\n    dst = nki_dma_copy(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n\n  def test_indirect_load_oob_err(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n\n    b = nki_indirect_load_oob_err(a)\n    \n    b_golden = a[2 * np.arange(64, dtype=np.int32)]\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n\n  def test_indirect_load_oob_err_negative(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n\n    with self.assertRaises(IndexError) as cm:\n      b = nki_indirect_load_oob_error_negative(a)\n    exc = cm.exception\n    self.assertEqual(type(exc), IndexError)\n    self.assertIn(str(exc), 'index 66048 is out of bounds for axis 0 with size 65536')\n\n\n  def test_indirect_load_oob_skip(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n\n    b = nki_indirect_load_oob_skip(a)\n\n    n, m = a.shape\n    b_golden = np.full((n//2, m), -1, dtype=a.dtype)\n    indices = 3 * np.arange((n//3) + 1)\n    b_golden[0:len(indices)] = a[indices, :]\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n    \n  def test_indirect_store_rmw(self):\n    np.random.seed(0)\n    a = np.random.random_sample([64, 512]).astype(np.float32)\n\n    b = nki_indirect_store_rmw(a)\n\n    n, m = a.shape\n    b_golden = np.full(shape=(2*n, m), fill_value=1, dtype=a.dtype)\n    b_golden[2 * np.arange(n, dtype=np.int32)] += a\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n\n  def test_indirect_store_oob_err(self):\n    np.random.seed(0)\n    a = np.random.random_sample([64, 512]).astype(np.float32)\n\n    b = nki_indirect_store_oob_err(a)\n\n    n, m = a.shape\n    b_golden = np.full(shape=(2*n, m), fill_value=-1, dtype=a.dtype)\n    b_golden[2 * np.arange(n, dtype=np.int32)] = a\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n\n  def 
test_indirect_store_oob_err_negative(self):\n    np.random.seed(0)\n    a = np.random.random_sample([64, 512]).astype(np.float32)\n\n    with self.assertRaises(IndexError) as cm:\n      b = nki_indirect_store_oob_err_negative(a)\n    exc = cm.exception\n    self.assertEqual(type(exc), IndexError)\n    self.assertIn(str(exc), 'index 66048 is out of bounds for axis 0 with size 65536')\n\n\n  def test_indirect_store_oob_skip(self):\n    np.random.seed(0)\n    a = np.random.random_sample([64, 512]).astype(np.float32)\n\n    b = nki_indirect_store_oob_skip(a)\n\n    n, m = a.shape\n    b_golden = np.full(shape=(2*n, m), fill_value=-1, dtype=a.dtype)\n    indices = 3*np.arange(((2*n)//3) + 1)\n    b_golden[indices, :] = a[0:len(indices), :]\n\n    self.assertTrue(np.allclose(b, b_golden))\n\n  def test_dma_copy_swdge(self):\n    np.random.seed(0)\n    a = np.random.random_sample([64, 512]).astype(np.float32)\n    b = nki_dma_copy_swdge(a)\n    self.assertTrue(np.allclose(b, a))\n\n  def test_dma_copy_hwdge(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n    b = nki_dma_copy_hwdge(a)\n    self.assertTrue(np.allclose(b, a))\n"
  },
  {
    "path": "nki/test/test_nki_isa_dma_transpose.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\nimport pytest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n\n# NKI_EXAMPLE_0_BEGIN NKI_EXAMPLE_1_BEGIN NKI_EXAMPLE_2_BEGIN NKI_EXAMPLE_3_BEGIN NKI_EXAMPLE_4_BEGIN NKI_EXAMPLE_5_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_0_END NKI_EXAMPLE_1_END NKI_EXAMPLE_4_END NKI_EXAMPLE_5_END\nfrom neuronxcc.nki.isa.constants import dge_mode\n# NKI_EXAMPLE_2_END NKI_EXAMPLE_3_END\n\n#############################################################################\n# NOTE: if you modify this file, make sure to update neuron_isa.py file with\n# NOTE: the correct line numbers under .. nki_example:: directive\n#############################################################################\n\n@nki.jit(mode=\"simulation\")\n# NKI_EXAMPLE_0_BEGIN\n############################################################################\n# Example 1: Simple 2D transpose (HBM->SB)\n############################################################################\ndef nki_dma_transpose_2d_hbm2sb(a):\n  b_sb = nisa.dma_transpose(a[:, :])\n  b = nl.ndarray(shape=b_sb.shape, dtype=b_sb.dtype, buffer=nl.hbm)\n  nl.store(dst=b, value=b_sb)\n  return b\n# NKI_EXAMPLE_0_END\n\n@nki.jit(mode=\"simulation\")\n# NKI_EXAMPLE_1_BEGIN\n############################################################################\n# Example 2: Simple 2D transpose (SB->SB)\n############################################################################\ndef nki_dma_transpose_2d_sb2sb(a):\n  a_sb = nl.load(a)\n  b_sb = nisa.dma_transpose(a_sb[:, :])\n  b = nl.ndarray(shape=b_sb.shape, dtype=b_sb.dtype, buffer=nl.hbm)\n  nl.store(dst=b, value=b_sb)\n  return b\n# NKI_EXAMPLE_1_END\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\n# NKI_EXAMPLE_2_BEGIN\n################################################################################\n# Example 3: Simple 2D transpose (HBM->SB) using DGE xbar (NeuronCore-v3+ only)\n################################################################################\ndef nki_dma_transpose_2d_hbm2sb_dge_xbar(a):\n  b_sb = nisa.dma_transpose(a[:, :], dge_mode=dge_mode.hwdge)\n  b = nl.ndarray(shape=b_sb.shape, dtype=b_sb.dtype, buffer=nl.hbm)\n  nl.store(dst=b, value=b_sb)\n  return b\n# NKI_EXAMPLE_2_END\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\n# NKI_EXAMPLE_3_BEGIN\n###############################################################################\n# Example 4: Simple 2D transpose (SB->SB) using DGE xbar (NeuronCore-v3+ only)\n###############################################################################\ndef nki_dma_transpose_2d_sb2sb_dge_xbar(a):\n  a_sb = nl.load(a)\n  b_sb = nisa.dma_transpose(a_sb[:, :], dge_mode=dge_mode.hwdge)\n  b = nl.ndarray(shape=b_sb.shape, dtype=b_sb.dtype, buffer=nl.hbm)\n  nl.store(dst=b, value=b_sb)\n  return b\n# NKI_EXAMPLE_3_END\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\n# NKI_EXAMPLE_4_BEGIN\n############################################################################\n# Example 5: 3D transpose (HBM->SB) w/Indirect Mem Access\n############################################################################\ndef nki_dma_gather_transpose_3d_hbm2sb(src_tensor, idx_tensor):\n  i_p = nl.arange(32)[:, None]\n  idx = nl.load(idx_tensor)\n\n  _, dim1, dim2 = src_tensor.shape\n\n  iy = nl.arange(dim1)[None, :, None]\n  iz = nl.arange(dim2)[None, None, :]\n\n  dst = nisa.dma_transpose(src_tensor[idx[i_p, 0], iy, iz], axes=(2, 
1, 0))\n  dst_tensor = nl.ndarray(shape=(dim2, dim1, idx.shape[0]), dtype=src_tensor.dtype, buffer=nl.shared_hbm)\n    \n  nl.store(dst_tensor, dst)\n  return dst_tensor\n# NKI_EXAMPLE_4_END\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\n# NKI_EXAMPLE_5_BEGIN\n############################################################################\n# Example 6: 3D transpose (SB->SB) w/Indirect Mem Access\n############################################################################\ndef nki_dma_gather_transpose_3d_sb2sb(src_tensor, idx_tensor):\n  src = nl.load(src_tensor)\n  idx = nl.load(idx_tensor)\n\n  dim0, dim1, dim2 = src.shape\n  \n  iy = nl.arange(dim1)[None, :, None]\n  iz = nl.arange(dim2)[None, None, :]\n\n  dst = nisa.dma_transpose(src[idx, iy, iz], axes=(2, 1, 0))\n  dst_tensor = nl.ndarray(shape=(dim2, dim1, dim0), dtype=src.dtype, buffer=nl.shared_hbm)\n  \n  nl.store(dst_tensor, dst)\n  return dst_tensor\n# NKI_EXAMPLE_5_END\n\nclass TestNkiIsaExamplesDmaTranspose(unittest.TestCase):\n  def test_dma_transpose_2d(self):\n    np.random.seed(0)\n    src = np.random.random_sample([16, 128]).astype(np.float16) * 100\n    dst_golden = np.transpose(src)\n\n    dst = nki_dma_transpose_2d_hbm2sb(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n    dst = nki_dma_transpose_2d_sb2sb(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n    dst = nki_dma_transpose_2d_hbm2sb_dge_xbar(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n    dst = nki_dma_transpose_2d_sb2sb_dge_xbar(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n  \n  @pytest.mark.xfail(reason=\"PBE-63\")\n  def test_dma_transpose_indirect(self):\n    np.random.seed(0)\n    src_tensor = np.arange(64 * 4 * 128).reshape(64, 4, 128).astype(nl.uint16)\n    idx_tensor = np.arange(32, dtype=nl.uint32).reshape(32, 1)\n    \n    nki_out = nki_dma_gather_transpose_3d_hbm2sb(src_tensor, idx_tensor)\n    golden_out = np.transpose(src_tensor[idx_tensor.reshape(32)], axes=(2, 1, 0))\n\n    assert np.allclose(nki_out, golden_out)\n"
  },
  {
    "path": "nki/test/test_nki_isa_dropout.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_END\nimport numpy as np\n\n@nki.jit(mode=\"simulation\")\ndef nki_dropout(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ###########################################################################\n  # Example 1: From an input tile a of shape [128, 512], dropout its values\n  # with probabilities in tile b of shape [128, 1] and store the result in c.\n  ###########################################################################\n  a: tensor[128, 512] = nl.load(a_tensor)\n  b: tensor[128, 1] = nl.load(b_tensor)\n\n  c: tensor[128, 512] = nisa.dropout(a, prob=b)\n\n  nl.store(c_tensor, c)\n  # NKI_EXAMPLE_END\n\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_dropout_scalar(in_tensor):\n  import neuronxcc.nki.language as nl\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ######################################################\n  # Example 2: From an input tile a, dropout its values \n  # with probability of 0.2 and store the result in b.\n  ######################################################\n  a = nl.load(in_tensor)\n\n  b = nisa.dropout(a, prob=0.2)\n\n  nl.store(out_tensor, b)\n  # NKI_EXAMPLE_END\n\n  return out_tensor\n\n\nclass TestNkiIsaExamplesDropout(unittest.TestCase):\n  def test_dropout(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 1]).astype(np.float32) * 100 \n    c = np.zeros([128, 512]).astype(np.float32)\n    c_zeros = np.copy(c)\n    \n    c = nki_dropout(a, b)\n\n    self.assertFalse(np.allclose(c, c_zeros))\n    # self.assertFalse(np.allclose(c, a)) # we don't have dropout simulation implementation\n    \n  def test_dropout_scalar(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.zeros([128, 512]).astype(np.float32)\n    b_zeros = np.copy(b)\n    \n    b = nki_dropout_scalar(a)\n\n    self.assertFalse(np.allclose(b, b_zeros))\n    # self.assertFalse(np.allclose(b, a)) # we don't have dropout simulation implementation"
  },
  {
    "path": "nki/test/test_nki_isa_iota.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_iota():\n\n  # NKI_EXAMPLE_BEGIN\n  ##################################################################\n  # Example 1: Generate tile a of 512 constant values in SBUF partition 0\n  # that start at 0 and increment by 1:\n  ##################################################################\n  # a = [0, 1, ..., 511]\n  expr_a = nl.arange(0, 512)[None, :]\n  a: tensor[1, 512] = nisa.iota(expr_a, dtype=nl.int32)\n\n  ##################################################################\n  # Example 2: Generate tile b of 128 constant values across SBUF partitions\n  # that start at 0 and increment by 1, with one value per partition:\n  # b = [[0],\n  #      [1],\n  #      ...,\n  #      [127]]\n  ##################################################################\n  expr_b = nl.arange(0, 128)[:, None]\n  b: tensor[128, 1] = nisa.iota(expr_b, dtype=nl.int32)\n  \n  ##################################################################\n  # Example 3: Generate tile c of 512 constant values in SBUF partition 0\n  # that start at 0 and decrement by 1:\n  # c = [0, -1, ..., -511]\n  ##################################################################\n  expr_c = expr_a * -1\n  c: tensor[1, 512] = nisa.iota(expr_c, dtype=nl.int32)\n\n  ##################################################################\n  # Example 4: Generate tile d of 128 constant values across SBUF\n  # partitions that start at 5 and increment by 2\n  ##################################################################\n  # d = [[5],\n  #      [7],\n  #      ...,\n  #      [259]]\n  expr_d = 5 + expr_b * 2\n  d: tensor[128, 1] = nisa.iota(expr_d, dtype=nl.int32)\n\n  ##################################################################\n  # Example 5: Generate tile e of shape [128, 512] by\n  # broadcast-add expr_a and expr_b\n  # e = [[0, 1, ..., 511],\n  #      [1, 2, ..., 512],\n  #      ...\n  #      [127, 2, ..., 638]]\n  ##################################################################\n  e: tensor[128, 512] = nisa.iota(expr_a + expr_b, dtype=nl.int32)\n  # NKI_EXAMPLE_END\n\n  a_tensor = nl.ndarray([1, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n  b_tensor = nl.ndarray([128, 1], dtype=nl.float32, buffer=nl.shared_hbm)\n  c_tensor = nl.ndarray([1, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n  d_tensor = nl.ndarray([128, 1], dtype=nl.float32, buffer=nl.shared_hbm)\n  e_tensor = nl.ndarray([128, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n  nl.store(a_tensor[0, expr_a], a)\n  nl.store(b_tensor[expr_b, 0], b)\n  nl.store(c_tensor[0, expr_a], c)  \n  nl.store(d_tensor[expr_b, 0], d)\n  nl.store(e_tensor[expr_b, expr_a], e)\n  return a_tensor, b_tensor, c_tensor, d_tensor, e_tensor\n  \n      \nclass TestNkiIsaExamplesIota(unittest.TestCase):\n  def test_iota(self):\n    a, b, c, d, e = nki_iota()\n\n    a_golden = np.expand_dims(np.arange(0, 512), 0)\n    b_golden = np.expand_dims(np.arange(0, 128), 1)\n    c_golden = np.expand_dims(np.arange(0, 512)*-1, 0)\n    d_golden = np.expand_dims(np.arange(5, 260, 2), 1)\n    e_golden = a_golden + b_golden\n\n    self.assertTrue(np.allclose(a, a_golden))\n    self.assertTrue(np.allclose(b, b_golden))\n    self.assertTrue(np.allclose(c, c_golden))\n    
self.assertTrue(np.allclose(d, d_golden))\n    self.assertTrue(np.allclose(e, e_golden))\n\n "
  },
  {
    "path": "nki/test/test_nki_isa_local_gather.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n\n\n# NKI_EXAMPLE_END\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_local_gather(src_buffer, index, num_elem_per_idx, num_valid_indices, output_shape):\n  output = nl.ndarray(output_shape, dtype=src_buffer.dtype,\n                      buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_BEGIN\n  ##################################################################\n  # Example 1: gather src_buffer using index\n  # Gather input: src_buffer_tile with shape (128, 512, 4)\n  # Gather indices: index_tile with shape (128, 4)\n  # We use num_valid_indices indices per core, and read num_elem_per_idx\n  # contiguous elements per partition.\n  ##################################################################\n  src_buffer_tile: tensor[128, 512, 4] = nl.load(src_buffer)\n  index_tile: tensor[128, 4] = nl.load(index)\n  output_tile: tensor[128, 4, 16, 4] = nisa.local_gather(\n    src_buffer_tile, index_tile, num_elem_per_idx, num_valid_indices)\n\n  nl.store(output, output_tile)\n  # NKI_EXAMPLE_END\n\n  return output\n\n\nclass TestNkiIsaExamplesLocalGather(unittest.TestCase):\n  def test_local_gather(self):\n    import numpy as np\n\n    # Engine constants\n    # NUMPY_SEMANTICS_BEGIN\n    num_gpsimd_cores = 8\n    num_partitions_per_core = 16\n    # NUMPY_SEMANTICS_END\n\n    # example gather input: src_buffer = np.array((128, 512, 4))\n    # example gather indices: index = np.array((16, 4))\n    # (optional, default=0) gather valid index count per core: num_valid_indices\n    # (optional, default=1) gather element count per index: num_elem_per_idx\n\n    # NUMPY_SEMANTICS_BEGIN\n    src_buffer = np.random.random_sample([128, 512, 4]).astype(np.float32) * 100\n    index_per_core = np.random.randint(low=0, high=512, size=(16, 4), dtype=np.uint16)\n    # replicate 8 times for 8 GpSimd cores\n    index = np.tile(index_per_core, (num_gpsimd_cores, 1))\n    num_elem_per_idx = 4\n    index_hw = index * num_elem_per_idx\n    num_valid_indices = 64\n    output_shape = (128, 4, 16, 4)\n    # NUMPY_SEMANTICS_END\n\n    # Run NKI\n    output_nki = nki_local_gather(src_buffer, index_hw, num_elem_per_idx,\n                                  num_valid_indices, output_shape)\n\n    # NumPy reference\n    # NUMPY_SEMANTICS_BEGIN\n    num_active_cores = index.shape[0] / num_partitions_per_core\n    num_valid_indices = num_valid_indices if num_valid_indices \\\n      else index.size / num_active_cores\n\n    output_np = np.ndarray(shape=(128, num_valid_indices, num_elem_per_idx),\n                           dtype=src_buffer.dtype)\n\n    for i_core in range(num_gpsimd_cores):\n      start_par = i_core * num_partitions_per_core\n      end_par = (i_core + 1) * num_partitions_per_core\n      indices_1d = index[start_par:end_par].flatten(order='F')[0: num_valid_indices]\n\n      output_np[start_par:end_par, :, :] = np.take(\n        src_buffer[start_par:end_par],\n        indices_1d, axis=1)\n\n    output_np = output_np.reshape(output_shape)\n    # NUMPY_SEMANTICS_END\n    self.assertTrue(np.allclose(output_nki, output_np))\n"
  },
  {
    "path": "nki/test/test_nki_isa_max8.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_0_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_max8():\n  # NKI_EXAMPLE_0_BEGIN\n  ##################################################################\n  # Example 1: Generate tile b of 32 * 128 random floating point values\n  # and get the 8 largest values in each row:\n  ##################################################################\n  expr_a = nl.rand((32, 128))\n  a = nisa.max8(src=expr_a)\n\n  a_tensor = nl.ndarray([32, 8], dtype=nl.float32, buffer=nl.shared_hbm)\n  nl.store(a_tensor, value=a)\n  # NKI_EXAMPLE_0_END\n\n  return a_tensor\n\n\n\nclass TestNkiIsaExamplesMax8(unittest.TestCase):\n  def test_max8(self):\n    a = nki_max8()\n\n    self.assertEqual(a.shape, (32, 8))\n    self.assertTrue(np.all(a >= 0) and np.all(a <= 1))\n    row_diffs = np.diff(a, axis=1)  # Get differences between adjacent elements\n    self.assertTrue(np.all(row_diffs <= 0), \"Values within rows should be descending\")\n\nTestNkiIsaExamplesMax8().test_max8()"
  },
  {
    "path": "nki/test/test_nki_isa_memset.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_7_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n...\n# NKI_EXAMPLE_7_END\nimport numpy as np\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_memset():\n  a_tensor = nl.ndarray([128, 128], dtype=nl.float32, buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_7_BEGIN\n  ##################################################################\n  # Example 1: Initialize a float32 tile a of shape (128, 128)\n  # with a value of 0.2\n  ##################################################################\n  a = nisa.memset(shape=(128, 128), value=0.2, dtype=nl.float32)\n  # NKI_EXAMPLE_7_END\n\n  i_p = nl.arange(128)[:, None]\n  i_f = nl.arange(128)[None, :]\n  nl.store(a_tensor[i_p, i_f], a)\n  return a_tensor\n  \n      \nclass TestNkiIsaExamplesMemset(unittest.TestCase):\n  def test_memset(self):\n    a = nki_memset()\n\n    a_golden = np.full([128, 128], 0.2).astype(np.float32)\n    self.assertTrue(np.allclose(a, a_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_nc_find_index8.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_0_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_max_index8():\n  # NKI_EXAMPLE_0_BEGIN\n  ##################################################################\n  # Example 1: Generate tile b of 32 * 128 random floating point values,\n  # find the 8 largest values in each row, then find their indices:\n  ##################################################################\n  # Generate random data\n  data = nl.rand((32, 128))\n\n  # Find max 8 values per row\n  max_vals = nisa.max8(src=data)\n\n  # Create output tensor for indices\n  indices_tensor = nl.ndarray([32, 8], dtype=nl.uint32, buffer=nl.shared_hbm)\n\n  # Find indices of max values\n  indices = nisa.nc_find_index8(data=data, vals=max_vals)\n\n  # Store results\n  nl.store(indices_tensor, value=indices)\n  # NKI_EXAMPLE_0_END\n\n  return indices_tensor\n\n\n\nclass TestNkiIsaExamplesMaxIndex8(unittest.TestCase):\n  def test_max_index8(self):\n    indices = nki_max_index8()\n\n    self.assertEqual(indices.shape, (32, 8))\n    self.assertEqual(indices.dtype, np.uint32)\n\n    # Verify indices are within valid range (0 to 127)\n    self.assertTrue(np.all(indices >= 0) and np.all(indices < 128))\n\n    # Check that indices point to descending values\n    indices_diffs = np.diff(indices, axis=1)  # Get differences between adjacent indices\n    # Values should be unique, so indices should be different\n    self.assertTrue(np.all(indices_diffs != 0), \"Indices should be unique\")\n\nTestNkiIsaExamplesMaxIndex8().test_max_index8()\n"
  },
  {
    "path": "nki/test/test_nki_isa_nc_match_replace8.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN # NKI_EXAMPLE_1_BEGIN # NKI_EXAMPLE_2_BEGIN # NKI_EXAMPLE_3_BEGIN # NKI_EXAMPLE_4_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nimport neuronxcc.nki.typing as nt\n\n# NKI_EXAMPLE_0_END # NKI_EXAMPLE_1_END # NKI_EXAMPLE_2_END # NKI_EXAMPLE_3_END # NKI_EXAMPLE_4_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_match_replace8():\n  # NKI_EXAMPLE_0_BEGIN\n  ##################################################################\n  # Example 1: Generate tile a of random floating point values,\n  # get the 8 largest values in each row, then replace their first\n  # occurrences with -inf:\n  ##################################################################\n  N = 4\n  M = 16\n  data_tile = nl.rand((N, M))\n  max_vals = nisa.max8(src=data_tile)\n\n  result = nisa.nc_match_replace8(data=data_tile[:, :], vals=max_vals, imm=float('-inf'))\n  result_tensor = nl.ndarray([N, M], dtype=nl.float32, buffer=nl.shared_hbm)\n  nl.store(result_tensor, value=result)\n  # NKI_EXAMPLE_0_END\n\n  return result_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_match_replace_indices8(in_tensor: nt.tensor, imm: np.float32):\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 2: Read the 8 largest values in each row of the tensor,\n  # replace the first occurrence with imm, write indices, and return\n  # the replaced output.\n  ##################################################################\n  n, m = in_tensor.shape\n  # NKI_EXAMPLE_1_END\n  out_tensor = nl.ndarray([n, m], dtype=in_tensor.dtype, buffer=nl.hbm)\n  idx_tensor = nl.ndarray([n, 8], dtype=nl.uint32, buffer=nl.hbm)\n  # NKI_EXAMPLE_1_BEGIN\n  dst_idx = nl.ndarray((n, 8), dtype=idx_tensor.dtype)\n\n  ix, iy = nl.mgrid[0:n, 0:8]\n\n  inp_tile: nt.tensor[n, m] = nl.load(in_tensor)\n  max_vals: nt.tensor[n, 8] = nisa.max8(src=inp_tile)\n\n  out_tile = nisa.nc_match_replace8(\n    dst_idx=dst_idx[ix, iy], data=inp_tile[:, :], vals=max_vals, imm=imm\n  )\n  # NKI_EXAMPLE_1_END\n\n  nl.store(out_tensor, value=out_tile)\n  nl.store(idx_tensor[ix, iy], value=dst_idx[ix, iy])\n  return out_tensor, idx_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_match_replace_indices8_mask(in_tensor: nt.tensor, imm: np.float32):\n  # NKI_EXAMPLE_2_BEGIN\n  ##################################################################\n  # Example 3: Read the 8 largest values in each row of the tensor,\n  # after applying the specified mask, replace the first occurrence\n  # with imm, write indices, and return the replaced output.\n  ##################################################################\n  n, m = in_tensor.shape\n  # NKI_EXAMPLE_2_END\n  out_tensor = nl.ndarray([n, m], dtype=in_tensor.dtype, buffer=nl.hbm)\n  idx_tensor = nl.ndarray([n, 8], dtype=nl.uint32, buffer=nl.hbm)\n  # NKI_EXAMPLE_2_BEGIN\n  idx_tile = nisa.memset(shape=(n, 8), value=0, dtype=nl.uint32)\n\n  ix, iy = nl.mgrid[0:n, 0:m]\n  inp_tile: nt.tensor[n, m] = nl.load(in_tensor)\n  max_vals: nt.tensor[n, 8] = nisa.max8(src=inp_tile[ix, iy], mask=(ix < n //2 and iy < m//2))\n\n  out_tile = nisa.nc_match_replace8(\n    dst_idx=idx_tile[:, :],\n    data=inp_tile[ix, iy],\n    vals=max_vals,\n    imm=imm,\n    mask=(ix < n // 2 and iy < m // 2),  # mask applies to `data`\n  )\n  # NKI_EXAMPLE_2_END\n\n  nl.store(out_tensor, value=out_tile)\n  
nl.store(idx_tensor, value=idx_tile)\n  return out_tensor, idx_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_match_replace_indices8_3d(data_tensor: nt.tensor):\n  # NKI_EXAMPLE_3_BEGIN\n  ##################################################################\n  # Example 4: Read the 8 largest values in each row of the tensor,\n  # replace the first occurrence with 0.0, write indices, and return \n  # the replaced output.\n  ##################################################################\n  n, b, m = data_tensor.shape\n  # NKI_EXAMPLE_3_END\n  out_tensor = nl.ndarray([n, b, m], dtype=data_tensor.dtype, buffer=nl.hbm)\n  # NKI_EXAMPLE_3_BEGIN\n  n, b, m = data_tensor.shape\n\n  out_tensor = nl.ndarray([n, b, m], dtype=data_tensor.dtype, buffer=nl.hbm)\n  idx_tensor = nl.ndarray([n, 8], dtype=nl.uint32, buffer=nl.hbm)\n\n  imm = 0.0\n  idx_tile = nisa.memset(shape=(n, 8), value=0, dtype=nl.uint32)\n  out_tile = nisa.memset(shape=(n, b, m), value=0, dtype=data_tensor.dtype)\n\n  iq, ir, iw = nl.mgrid[0:n, 0:b, 0:m]\n  ip, io = nl.mgrid[0:n, 0:8]\n\n  inp_tile = nl.load(data_tensor[iq, ir, iw])\n  max_vals: nt.tensor[n, 8] = nisa.max8(src=inp_tile)\n\n  out_tile[iq, ir, iw] = nisa.nc_match_replace8(\n    dst_idx=idx_tile[ip, io],\n    data=inp_tile[iq, ir, iw],\n    vals=max_vals[ip, io],\n    imm=imm,\n  )\n\n  # NKI_EXAMPLE_3_END\n  nl.store(out_tensor, value=out_tile)\n  nl.store(idx_tensor, value=idx_tile)\n  return out_tensor, idx_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_match_replace_indices8_3d_inplace(data_tensor: nt.tensor):\n  # NKI_EXAMPLE_4_BEGIN\n  ##################################################################\n  # Example 5: Read the 8 largest values in each row of the tensor,\n  # replace the first occurrence with 0.0 in-place and write indices.\n  ##################################################################\n  n, b, m = data_tensor.shape\n  # NKI_EXAMPLE_4_END\n  out_tensor = nl.ndarray([n, b, m], dtype=data_tensor.dtype, buffer=nl.hbm)\n  # NKI_EXAMPLE_4_BEGIN\n  n, b, m = data_tensor.shape\n\n  out_tensor = nl.ndarray([n, b, m], dtype=data_tensor.dtype, buffer=nl.hbm)\n  idx_tensor = nl.ndarray([n, 8], dtype=nl.uint32, buffer=nl.hbm)\n\n  imm = 0.0\n  idx_tile = nisa.memset(shape=(n, 8), value=0, dtype=nl.uint32)\n\n  iq, ir, iw = nl.mgrid[0:n, 0:b, 0:m]\n  ip, io = nl.mgrid[0:n, 0:8]\n\n  inp_tile = nl.load(data_tensor[iq, ir, iw])\n  max_vals: nt.tensor[n, 8] = nisa.max8(src=inp_tile)\n\n  inp_tile[iq, ir, iw] = nisa.nc_match_replace8(\n    dst_idx=idx_tile[ip, io],\n    data=inp_tile[iq, ir, iw],\n    vals=max_vals[ip, io],\n    imm=imm,\n  )\n\n  # NKI_EXAMPLE_4_END\n  nl.store(out_tensor, value=inp_tile)\n  nl.store(idx_tensor, value=idx_tile)\n  return out_tensor, idx_tensor\n\n\ndef match_and_get_index(data, vals):\n  row = data.copy()\n  vlength = vals.shape[-1]\n\n  result = np.zeros(shape=vals.shape, dtype=np.int32)\n  idx = 0\n  for j in range(vlength):\n    matches = np.where(row == vals[j])[0]\n    if matches:\n      idx = matches[0]\n      row[idx] = np.float32(\"-inf\")\n      result[j] = idx\n  return result\n\n\ndef get_replaced_output_and_max_indices(a, imm=0):\n  axis = -1\n  a_reshaped = a.reshape(a.shape[0], -1)\n  a_sorted = np.sort(a_reshaped, axis=axis)\n  a_sorted_last_8 = a_sorted[:, -8:]\n  max_vals = np.flip(a_sorted_last_8, axis=-1)\n\n  c = a_reshaped.copy()\n  concat_out_golden_max_vals = np.concatenate([c, max_vals], axis=axis)\n  c_idx = np.apply_along_axis(\n    # get index for first occurence of max_vals along 
the specified axis\n    lambda x: match_and_get_index(x[:-8], x[-8:]),\n    axis=axis,\n    arr=concat_out_golden_max_vals,\n  ).astype(np.uint32)\n  np.put_along_axis(c, indices=c_idx, values=imm, axis=axis)\n  c = np.reshape(c, a.shape)\n  return c, c_idx\n\n\nclass TestNkiIsaExamplesMatchReplace8(unittest.TestCase):\n  def test_nc_match_replace8(self):\n    result = nki_nc_match_replace8()\n\n    self.assertEqual(result.shape, (4, 16))\n    self.assertEqual(result.dtype, np.float32)\n\n    # Each row should have exactly 8 -inf values\n    inf_count = np.sum(np.isinf(result) & (result < 0), axis=1)\n    self.assertTrue(np.all(inf_count == 8))\n\n    # Non-inf values should be between 0 and 1 (from rand)\n    non_inf_mask = ~(np.isinf(result) & (result < 0))\n    self.assertTrue(np.all(result[non_inf_mask] >= 0))\n    self.assertTrue(np.all(result[non_inf_mask] <= 1))\n\n  def test_nc_match_replace_indices8(self):\n    imm = np.float32('-inf')\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n\n    b, b_idx = nki_nc_match_replace_indices8(a, imm=imm)\n    c, c_idx = get_replaced_output_and_max_indices(a, imm)\n\n    self.assertTrue(np.allclose(b, c))\n    self.assertTrue(np.allclose(b_idx, c_idx))\n\n  def test_nc_match_replace_indices8_mask(self):\n    imm = np.float32('-inf')\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32)\n    b, b_idx = nki_nc_match_replace_indices8_mask(a, imm=imm)\n    c, c_idx = get_replaced_output_and_max_indices(a[:64, :256], imm) \n\n    self.assertTrue(np.allclose(b[:64, :256], c))\n    self.assertTrue(np.allclose(b_idx[:64, :256], c_idx))\n\n  def test_nc_match_replace_indices8_3d(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 4, 4]).astype(np.float32)\n\n    b, b_idx = nki_nc_match_replace_indices8_3d(a)\n    c, c_idx = get_replaced_output_and_max_indices(a, imm=0)\n\n    self.assertTrue(np.allclose(b, c))\n    self.assertTrue(np.allclose(b_idx, c_idx))\n\n  def test_nc_match_replace_indices8_3d_inplace(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 4, 4]).astype(np.float32)\n\n    b, b_idx = nki_nc_match_replace_indices8_3d_inplace(a)\n    c, c_idx = get_replaced_output_and_max_indices(a, imm=0)\n\n    self.assertTrue(np.allclose(b, c))\n    self.assertTrue(np.allclose(b_idx, c_idx))"
  },
  {
    "path": "nki/test/test_nki_isa_nc_matmul.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_0_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_matmul(a_tensor, b_tensor, d_tensor, e_tensor, g_tensor, h_tensor):\n  c_tensor = nl.ndarray([128, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n  f_tensor = nl.ndarray([128, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n  i_tensor = nl.ndarray([16, 64, 512], dtype=nl.float32, buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_0_BEGIN\n  ##################################################################\n  # Example 1:\n  # multiply matrix a of shape (128, 128) and matrix b of shape (128, 512)\n  # to get matrix c in PSUM of shape (128, 512)\n  ##################################################################\n  a_mgrid = nl.mgrid[0:128, 0:128]\n  b_mgrid = nl.mgrid[0:128, 0:512]\n  c_mgrid = nl.mgrid[0:128, 0:512]\n\n  a = nl.load(a_tensor[a_mgrid.p, a_mgrid.x])\n  b = nl.load(b_tensor[b_mgrid.p, b_mgrid.x])\n\n  c_psum = nisa.nc_matmul(a[a_mgrid.p, a_mgrid.x], b[b_mgrid.p, b_mgrid.x])\n\n  nl.store(c_tensor[c_mgrid.p, c_mgrid.x], c_psum)\n\n  ##################################################################\n  # Example 2:\n  # multiply matrix d of shape (256, 128) and matrix e of shape (256, 512)\n  # to get matrix f in PSUM of shape (128, 512) using psum accumulation\n  ##################################################################\n  d_mgrid = nl.mgrid[0:128, 0:128]\n  e_mgrid = nl.mgrid[0:128, 0:512]\n  f_mgrid = nl.mgrid[0:128, 0:512]\n\n  f_psum = nl.zeros((128, 512), nl.float32, buffer=nl.psum)\n\n  for i_contract in nl.affine_range(2):\n    d = nl.load(d_tensor[i_contract * 128 + d_mgrid.p, d_mgrid.x])\n    e = nl.load(e_tensor[i_contract * 128 + e_mgrid.p, e_mgrid.x])\n    f_psum += nisa.nc_matmul(d[d_mgrid.p, d_mgrid.x], e[e_mgrid.p, e_mgrid.x])\n    \n  nl.store(f_tensor[f_mgrid.p, f_mgrid.x], f_psum)\n\n  ##################################################################\n  # Example 3:\n  # perform batched matrix multiplication on matrix g of shape (16, 64, 64) \n  # and matrix h of shape (16, 64, 512) to get matrix i of (16, 64, 512) \n  # using Tensor Engine PE tiling mode. 
\n  ##################################################################\n  g_mgrid = nl.mgrid[0:64, 0:64]\n  h_mgrid = nl.mgrid[0:64, 0:512]\n  i_mgrid = nl.mgrid[0:64, 0:512]\n\n  for i in nl.affine_range(4):\n    for j in nl.affine_range(4):\n      g = nl.load(g_tensor[i * 4 + j, g_mgrid.p, g_mgrid.x])\n      h = nl.load(h_tensor[i * 4 + j, h_mgrid.p, h_mgrid.x])\n      i_psum = nisa.nc_matmul(g, h, tile_position=((i % 2) * 64, (j % 2) * 64), tile_size=(64, 64))\n      nl.store(i_tensor[i * 4 + j, i_mgrid.p, i_mgrid.x], i_psum)\n\n  return c_tensor, f_tensor, i_tensor\n  # NKI_EXAMPLE_0_END\n\n@nki.jit(mode=\"simulation\", platform_target='trn2')\ndef nki_nc_matmul_double_row_gen3(a_input, b_input):\n  NUM_PARTITIONS_A, TWO_A, FREE_A = a_input.shape\n  NUM_PARTITIONS_B, TWO_B, FREE_B = b_input.shape\n\n  c_output = nl.ndarray([FREE_A, FREE_B], dtype=nl.float32, buffer=nl.shared_hbm)\n\n  assert NUM_PARTITIONS_A == NUM_PARTITIONS_B and TWO_A == 2 and TWO_B == 2\n\n  a_tile = nl.ndarray(\n    (NUM_PARTITIONS_A, TWO_A, max(FREE_A, 16)), dtype=nl.float8_e5m2, buffer=nl.sbuf\n  )\n  a_mgrid = nl.mgrid[0:NUM_PARTITIONS_A, 0:TWO_A, 0:FREE_A]\n  a_tile[a_mgrid.p, a_mgrid.x, a_mgrid.y] = nl.load(a_input.view(nl.float8_e5m2))\n  b_tile = nl.load(b_input.view(nl.float8_e5m2))\n  c_tile = nisa.nc_matmul(\n    a_tile[a_mgrid.p, a_mgrid.x, a_mgrid.y], b_tile, perf_mode=\"double_row_gen3\"\n  )\n  nl.store(c_output, value=c_tile)\n  return c_output\n\n\nclass TestNkiIsaExamplesNcMatmul(unittest.TestCase):\n  def test_nc_matmul(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 128]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    d = np.random.random_sample([256, 128]).astype(np.float32) * 100\n    e = np.random.random_sample([256, 512]).astype(np.float32) * 100\n\n    g = np.random.random_sample([16, 64, 64]).astype(np.float32) * 100\n    h = np.random.random_sample([16, 64, 512]).astype(np.float32) * 100\n    i = np.ndarray(shape=[16, 64, 512], dtype=np.float32)\n\n    c, f, i = nki_nc_matmul(a, b, d, e, g, h)\n\n    c_golden = np.matmul(np.transpose(a), b)\n    f_golden = np.matmul(np.transpose(d), e)\n    i_golden = np.matmul(g.transpose(0, 2, 1), h)\n\n    self.assertTrue(np.allclose(c, c_golden))\n    self.assertTrue(np.allclose(f, f_golden))\n    self.assertTrue(np.allclose(i, i_golden))\n\n  def test_double_row_gen3(self):\n    np.random.seed(0)\n    a = np.ones((128, 2, 1), dtype=nl.float8_e5m2)\n    b = np.ones((128, 2, 512), dtype=nl.float8_e5m2)\n\n    c = nki_nc_matmul_double_row_gen3(a, b)\n\n    c_golden = np.einsum(\"kli,klj->ij\",\n                         a.astype(np.float32),\n                         b.astype(np.float32))\n\n    self.assertTrue(np.allclose(c, c_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_nc_stream_shuffle.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_0_END\nimport numpy as np\nsimulate_kernel = nki.simulate_kernel\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_stream_shuffle(in_tensor):\n  # NKI_EXAMPLE_0_BEGIN\n  #####################################################################\n  # Example 1: \n  # Apply cross-partition data movement to a 32-partition tensor,\n  # in-place shuffling the data in partition[i] to partition[(i+1)%32].\n  #####################################################################\n  # NKI_EXAMPLE_0_END\n  ...\n  out_tensor = nl.ndarray(shape=(32, 128), dtype=np.float32, buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_0_BEGIN\n  ...\n  a: tensor[32, 128] = nl.load(in_tensor)\n  a_mgrid = nl.mgrid[0:32, 0:128]\n  shuffle_mask = [(i - 1) % 32 for i in range(32)]\n  nisa.nc_stream_shuffle(src=a[a_mgrid.p, a_mgrid.x], dst=a[a_mgrid.p, a_mgrid.x], shuffle_mask=shuffle_mask)\n  \n  nl.store(out_tensor, value=a)\n  # NKI_EXAMPLE_0_END\n  return out_tensor\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_stream_shuffle_broadcast_partition(in_tensor):\n  # NKI_EXAMPLE_1_BEGIN\n  #####################################################################\n  # Example 2: \n  # Broadcast data in 1 partition to 32 partitions.\n  #####################################################################\n  # NKI_EXAMPLE_1_END\n  ...\n  out_tensor = nl.ndarray(shape=(32, 128), dtype=np.float32, buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_1_BEGIN\n  ...\n  a: tensor[1, 128] = nl.load(in_tensor)\n  b = nl.ndarray(shape=(32, 128), dtype=np.float32)\n  dst_mgrid = nl.mgrid[0:32, 0:128]\n  src_mgrid = nl.mgrid[0:1, 0:128]\n  shuffle_mask = [0] * 32\n  nisa.nc_stream_shuffle(src=a[0, src_mgrid.x], dst=b[dst_mgrid.p, dst_mgrid.x], shuffle_mask=shuffle_mask)\n  \n  nl.store(out_tensor, value=b)\n  # NKI_EXAMPLE_1_END\n  return out_tensor\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_stream_shuffle_broadcast_mask(in_tensor):\n  # NKI_EXAMPLE_2_BEGIN\n  #####################################################################\n  # Example 3: \n  # In the case where src and dst access more than one quadrant (32 \n  # partitions), the shuffle is applied to each quadrant independently, \n  # and the same shuffle_mask is used for each quadrant.\n  #####################################################################\n  # NKI_EXAMPLE_2_END\n  ...\n  out_tensor = nl.ndarray(shape=(128, 128), dtype=np.float32, buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_2_BEGIN\n  ...\n  a: tensor[128, 128] = nl.load(in_tensor)\n  b = nl.ndarray(shape=(128, 128), dtype=np.float32)\n  mgrid = nl.mgrid[0:128, 0:128]\n  shuffle_mask = [(i - 1) % 32 for i in range(32)]\n  nisa.nc_stream_shuffle(src=a[mgrid.p, mgrid.x], dst=b[mgrid.p, mgrid.x], shuffle_mask=shuffle_mask)\n  \n  nl.store(out_tensor, value=b)\n  # NKI_EXAMPLE_2_END\n  return out_tensor\n\n      \nclass TestNkiIsaExamplesStreamShuffle(unittest.TestCase):\n  def test_stream_shuffle(self):\n    in_tensor = np.random.random_sample([32, 128]).astype(np.float32) * 100\n    out_tensor = simulate_kernel(nki_nc_stream_shuffle, in_tensor)\n    in_tensor[list(range(32))] = in_tensor[[(i - 1) % 32 for i in range(32)]]\n    self.assertTrue(np.allclose(out_tensor, in_tensor))\n\n  def test_broadcast_partition(self):\n    in_tensor = np.random.random_sample([1, 
128]).astype(np.float32) * 100\n    out_tensor = simulate_kernel(nki_nc_stream_shuffle_broadcast_partition, in_tensor)\n    golden = np.broadcast_to(in_tensor[0], (32, 128))\n    self.assertTrue(np.allclose(out_tensor, golden))\n\n  def test_broadcast_mask(self):\n    in_tensor = np.random.random_sample([128, 128]).astype(np.float32) * 100\n    out_tensor = simulate_kernel(nki_nc_stream_shuffle_broadcast_mask, in_tensor)\n    for j in range(4):\n      in_tensor[list(range(j * 32, (j + 1) * 32))] = in_tensor[[(i - 1) % 32 + j * 32 for i in range(32)]]\n    self.assertTrue(np.allclose(out_tensor, in_tensor))\n"
  },
  {
    "path": "nki/test/test_nki_isa_nc_transpose.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n...\n# NKI_EXAMPLE_1_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_nc_transpose(a_tensor, b_tensor):\n  at_tensor = nl.ndarray([a_tensor.shape[1], a_tensor.shape[0]], dtype=a_tensor.dtype,\n                         buffer=nl.shared_hbm)\n  bt_tensor = nl.ndarray([b_tensor.shape[1], b_tensor.shape[0]], dtype=b_tensor.dtype,\n                         buffer=nl.shared_hbm)\n  ##################################################################\n  i_p_a = nl.arange(128)[:, None]\n  i_f_a = nl.arange(64)[None, :]\n  a = nl.load(a_tensor[i_p_a, i_f_a])\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 1: transpose tile a of shape (128, 64)\n  ##################################################################\n  i_p_a = nl.arange(128)[:, None]\n  i_f_a = nl.arange(64)[None, :]\n  aT = nisa.nc_transpose(a[i_p_a, i_f_a])\n\n  # NKI_EXAMPLE_1_END\n  i_p_aT = nl.arange(64)[:, None]\n  i_f_aT = nl.arange(128)[None, :]\n  nl.store(at_tensor[i_p_aT, i_f_aT], aT)\n\n  ##################################################################\n  i_p_b = nl.arange(32)[:, None]\n  i_f_b = nl.arange(2)[None, :]\n  b = nl.load(b_tensor[i_p_b, i_f_b])\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 2: transpose tile b of shape (32, 2) using Vector Engine\n  ##################################################################\n  i_p_b = nl.arange(32)[:, None]\n  i_f_b = nl.arange(2)[None, :]\n  bT = nisa.nc_transpose(b[i_p_b, i_f_b], engine=nisa.vector_engine)\n  # NKI_EXAMPLE_1_END\n\n  i_p_bT = nl.arange(2)[:, None]\n  i_f_bT = nl.arange(32)[None, :]\n  nl.store(bt_tensor[i_p_bT, i_f_bT], bT)\n  return at_tensor, bt_tensor\n\nclass TestNkiIsaExamplesSbTranspose(unittest.TestCase):\n  def test_nc_transpose(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 64]).astype(np.float32) * 100\n    b = np.random.random_sample([32, 2]).astype(np.float32) * 100\n\n    aT, bT = nki_nc_transpose(a, b)\n\n    aT_golden = np.transpose(a)\n    bT_golden = np.transpose(b)\n\n    self.assertTrue(np.allclose(aT, aT_golden))\n    self.assertTrue(np.allclose(bT, bT_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_partition_reduce.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nimport numpy as np\n...\n# NKI_EXAMPLE_1_END\nnki_jit = nki.trace\nsimulate_kernel = nki.simulate_kernel\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n@nki_jit\ndef nki_par_reduce(a_tensor, b_tensor):\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 1: reduce add tile a of shape (128, 32, 4)\n  # in the partition dimension and return\n  # reduction result in tile b of shape (1, 32, 4)\n  ##################################################################\n  a = nl.load(a_tensor[0:128, 0:32, 0:4])  \n  b = nisa.tensor_partition_reduce(np.add, a)\n  nl.store(b_tensor[0:1, 0:32, 0:4], b)\n  # NKI_EXAMPLE_1_END\n\n@nki_jit\ndef nki_par_reduce_nd_b(a_tensor, b_tensor):\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 2: reduce add tile a of shape (b, p, f1, ...)\n  # in the partition dimension p and return\n  # reduction result in tile b of shape (b, 1, f1, ...)\n  ##################################################################\n  for i in nl.affine_range(a_tensor.shape[0]):\n    a = nl.load(a_tensor[i])\n    b = nisa.tensor_partition_reduce(np.add, a)\n    nl.store(b_tensor[i], b)\n  # NKI_EXAMPLE_1_END\n\n\nclass TestNkiIsaExamplesPartitionReduce(unittest.TestCase):\n  def test_par_reduce_nd(self):\n    a = np.random.random_sample([128, 32, 4]).astype(np.float32) * 100\n    b = np.ndarray(shape=(1, 32, 4), dtype=np.float32)\n    simulate_kernel(nki_par_reduce, a, b)\n\n    self.assertTrue(np.allclose(b, np.sum(a, axis=0, keepdims=True)))\n\n  def test_par_reduce_nd_b(self):\n    a = np.random.random_sample([4, 128, 32, 8]).astype(np.float32) * 100\n    b = np.ndarray(shape=(4, 1, 32, 8), dtype=np.float32)\n    simulate_kernel(nki_par_reduce_nd_b, a, b)\n\n    self.assertTrue(np.allclose(b, np.sum(a, axis=1, keepdims=True)))"
  },
  {
    "path": "nki/test/test_nki_isa_range_select.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\"\"\"\nimport unittest\n# NKI_EXAMPLE_0_BEGIN, NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nimport numpy as np\n...\n# NKI_EXAMPLE_0_END, NKI_EXAMPLE_1_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\ndef nki_range_select_example(on_true, bound0, bound1, compare_op0, compare_op1, range_start, dtype):\n    # Create output tensors\n    select_res = nl.ndarray(on_true.shape, dtype=dtype, buffer=nl.hbm)\n    reduce_result = nl.ndarray((on_true.shape[0], 1), dtype=dtype, buffer=nl.hbm)\n    \n    # NKI_EXAMPLE_0_BEGIN\n    ##################################################################\n    # Example 1: # Select elements where \n    # bound0 <= range_start + index < bound1 and compute max reduction\n    # \n    # on_false_value must be nl.fp32.min\n    ##################################################################\n    on_true_tile = nl.load(on_true[...])\n    bound0_tile = nl.load(bound0[...])\n    bound1_tile = nl.load(bound1[...])\n\n    reduce_res_tile = nl.ndarray((on_true.shape[0], 1), dtype=dtype, buffer=nl.sbuf)\n    result = nl.ndarray(on_true.shape, dtype=dtype, buffer=nl.sbuf)\n    \n    result[...] = nisa.range_select(\n        on_true_tile=on_true_tile,\n        comp_op0=compare_op0,\n        comp_op1=compare_op1,\n        bound0=bound0_tile,\n        bound1=bound1_tile,\n        reduce_cmd=nisa.reduce_cmd.reset_reduce,\n        reduce_res=reduce_res_tile,\n        reduce_op=np.max,\n        range_start=range_start,\n        on_false_value=nl.fp32.min,\n        dtype=dtype\n    )\n\n    nl.store(select_res[...], value=result[...])\n    nl.store(reduce_result[...], value=reduce_res_tile[...])\n    # NKI_EXAMPLE_0_END\n\n    return result, reduce_result\n\n@nki.jit(mode=\"simulation\", platform_target=\"trn2\")\ndef nki_range_select_chaining(on_true, bound0, bound1, compare_op0, compare_op1, range_start):\n    # Create output tensors\n    select_res = nl.ndarray(on_true.shape, dtype=np.float32, buffer=nl.hbm)\n    reduce_result = nl.ndarray((on_true.shape[0], 1), dtype=np.float32, buffer=nl.hbm)\n    \n    # NKI_EXAMPLE_1_BEGIN\n    ##################################################################\n    # Example 2.a: Initialize reduction with first range_select\n    # Notice we don't pass reduce_res since the accumulation\n    # register keeps track of the accumulation until we're ready to \n    # read it. Also we use reset_reduce in order to \"clobber\" or zero\n    # out the accumulation register before we start accumulating.\n    #\n    # Note: Since the type of these tensors are fp32, we use nl.fp32.min\n    # for on_false_value due to HW constraints.\n    ##################################################################\n    on_true_tile = nl.load(on_true[...])\n    bound0_tile = nl.load(bound0[...])\n    bound1_tile = nl.load(bound1[...])\n\n    reduce_res_sbuf = nl.ndarray((on_true.shape[0], 1), dtype=np.float32, buffer=nl.sbuf)\n    result_sbuf = nl.ndarray(on_true.shape, dtype=np.float32, buffer=nl.sbuf)\n    \n    result_sbuf[...] 
= nisa.range_select(\n        on_true_tile=on_true_tile,\n        comp_op0=compare_op0,\n        comp_op1=compare_op1,\n        bound0=bound0_tile,\n        bound1=bound1_tile,\n        reduce_cmd=nisa.reduce_cmd.reset_reduce,\n        reduce_op=np.max,\n        range_start=range_start,\n        on_false_value=nl.fp32.min\n    )\n\n    ##################################################################\n    # Example 2.b: Chain multiple range_select operations \n    # with reduction in an affine loop. Adding ones just lets us ensure the reduction \n    # gets updated with new values.\n    ##################################################################\n    ones = nl.full(on_true.shape, fill_value=1, dtype=np.float32, buffer=nl.sbuf)\n    # we are going to loop as if we're tiling on the partition dimension    \n    iteration_step_size = on_true_tile.shape[0]\n    \n    # Perform chained operations using an affine loop index for range_start\n    for i in range(1, 2):\n        # Update input values\n        on_true_tile[...] = nl.add(on_true_tile, ones)\n        \n        # Continue reduction with updated values\n        # notice, we still don't have reduce_res specified\n        result_sbuf[...] = nisa.range_select(\n            on_true_tile=on_true_tile,\n            comp_op0=compare_op0,\n            comp_op1=compare_op1,\n            bound0=bound0_tile,\n            bound1=bound1_tile,\n            reduce_cmd=nisa.reduce_cmd.reduce,\n            reduce_op=np.max,\n            # we can also use index expressions for setting the start of the range\n            range_start=range_start + (i * iteration_step_size),\n            on_false_value=nl.fp32.min\n        )\n\n    range_start = range_start + (2 * iteration_step_size)\n    ##################################################################\n    # Example 2.c: Final iteration, we actually want the results to \n    # return to the user so we pass reduce_res argument so the \n    # reduction  will be written from the accumulation \n    # register to reduce_res_tile\n    ##################################################################\n    on_true_tile[...] = nl.add(on_true_tile, ones)\n    result_sbuf[...] 
= nisa.range_select(\n        on_true_tile=on_true_tile,\n        comp_op0=compare_op0,\n        comp_op1=compare_op1,\n        bound0=bound0_tile,\n        bound1=bound1_tile,\n        reduce_cmd=nisa.reduce_cmd.reduce,\n        reduce_res=reduce_res_sbuf[...],\n        reduce_op=np.max,\n        range_start=range_start,\n        on_false_value=nl.fp32.min\n    )\n\n    nl.store(select_res[...], value=result_sbuf[...])\n    nl.store(reduce_result[...], value=reduce_res_sbuf[...])\n    # NKI_EXAMPLE_1_END\n\n    return select_res, reduce_result\n\nclass TestNkiIsaExamplesRangeSelect(unittest.TestCase):\n    def test_range_select_example(self):\n        bound0 = np.zeros([128, 1], dtype=np.float32)\n        bound1 = np.full([128, 1], 64, dtype=np.float32)\n        range_start = 32\n        for dtype in (nl.float8_e4m3, nl.float8_e5m2, nl.bfloat16, np.float16, np.float32):\n            on_true_data = np.random.random_sample((128, 512)).astype(dtype)\n            result, reduction = nki_range_select_example(on_true_data, bound0, bound1,\n                                                     np.greater_equal, np.less, range_start, dtype)\n\n            # The results should match the numpy equivalent from the docstring:\n            # indices = np.zeros_like(on_true_data, dtype=np.float32)\n            # indices[:] = range_start + np.arange(on_true_data[0].size)\n            # mask = comp_op0(indices, bound0) & comp_op1(indices, bound1)\n            # result = np.where(mask, on_true_data, on_false_value)\n            # reduction = reduce_op(result, axis=1, keepdims=True)\n            indices = np.zeros_like(on_true_data, dtype=np.float32)\n            indices[:] = range_start + np.arange(on_true_data.shape[1])\n\n            mask = np.greater_equal(indices, bound0) & np.less(indices, bound1)\n            golden = np.where(mask, on_true_data, nl.fp32.min)\n            \n            golden_reduce = np.max(golden, axis=1, keepdims=True)\n\n            self.assertTrue(np.allclose(result, golden.astype(dtype).astype(np.float32)))\n            self.assertTrue(np.allclose(reduction, golden_reduce.astype(dtype).astype(np.float32)))\n\n    def test_range_select_chaining(self):\n        on_true_data = np.random.random_sample((128, 512)).astype(np.float32)\n        range_start = 32\n        bound0 = np.zeros([128, 1], dtype=np.float32)\n        bound1 = np.full([128, 1], 350, dtype=np.float32)\n        \n        result, reduction = nki_range_select_chaining(\n            on_true_data, bound0, bound1,\n            np.greater_equal, np.less, range_start\n        )\n\n        # Calculate golden reference\n        indices = np.zeros_like(on_true_data)\n               \n        # Apply the same operations as in the kernel\n        golden = on_true_data.copy()\n        golden_max = np.zeros((on_true_data.shape[0], 1), dtype=on_true_data.dtype)\n        selected = golden_max.copy()\n\n        iteration_step_size = on_true_data.shape[0]\n        for i in range(3):  # 3 iterations\n            indices[:] = range_start + (i * iteration_step_size) + np.arange(on_true_data.shape[1])\n            mask = np.greater_equal(indices, bound0) & np.less(indices, bound1)\n\n            selected = np.where(mask, golden, nl.fp32.min)\n            golden_max = np.maximum(golden_max, np.max(selected, axis=1, keepdims=True))\n            golden = golden + 1\n\n        self.assertTrue(np.allclose(result, selected))\n        self.assertTrue(np.allclose(reduction, golden_max))\n"
  },
  {
    "path": "nki/test/test_nki_isa_reciprocal.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\n# NKI_EXAMPLE_6_BEGIN\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n...\n# NKI_EXAMPLE_6_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef reciprocal_kernel(in_tensor):\n  out_tensor = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_6_BEGIN\n  x = nl.load(in_tensor[nl.mgrid[0:128, 0:512]])\n  \n  y = nisa.reciprocal(x)\n\n  # NKI_EXAMPLE_6_END\n  nl.store(out_tensor[nl.mgrid[0:128, 0:512]], value=y)\n  return out_tensor\n\n\nclass TestNkiExampleNisaReciprocal(unittest.TestCase):\n  def test_nisa_reciprocal(self):\n    np.random.seed(0)\n    src = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    dst_golden = np.reciprocal(src)\n\n    dst = reciprocal_kernel(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n"
  },
  {
    "path": "nki/test/test_nki_isa_reduce.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_2_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nimport numpy as np\n...\n# NKI_EXAMPLE_2_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_reduce(a_tensor):\n  b_tensor = nl.ndarray([a_tensor.shape[0], 1], dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_2_BEGIN\n  ##################################################################\n  # Example 1: reduce add tile a of shape (128, 512)\n  # in the free dimension and return\n  # reduction result in tile b of shape (128, 1)\n  ##################################################################\n  i_p_a = nl.arange(128)[:, None]\n  i_f_a = nl.arange(512)[None, :]\n  # NKI_EXAMPLE_2_END\n  \n  a = nl.load(a_tensor[i_p_a, i_f_a])  \n\n  # NKI_EXAMPLE_2_BEGIN\n  b = nisa.tensor_reduce(np.add, a[i_p_a, i_f_a], axis=[1])\n  # NKI_EXAMPLE_2_END\n\n  i_p_b, i_f_b = nl.mgrid[0:128, 0:1]\n  nl.store(b_tensor[i_p_b, i_f_b], b)\n  return b_tensor\n\n      \nclass TestNkiIsaExamplesReduce(unittest.TestCase):\n  def test_reduce(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.ndarray(shape=(128, 1), dtype=np.float32)\n    b = nki_reduce(a)\n\n    self.assertTrue(np.allclose(b, np.sum(a, axis=1, keepdims=True)))\n "
  },
  {
    "path": "nki/test/test_nki_isa_select_reduce.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_1_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_select_reduce_basic(predicate_data, on_true_data):\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 1: Basic usage of select_reduce\n  # Create source data, predicate, and destination tensors\n  ##################################################################\n  # Create output tensor for result\n  result_tensor = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.hbm)\n  \n  # Load input data to SBUF\n  predicate = nl.load(predicate_data[...])\n  on_true = nl.load(on_true_data[...])\n  \n  # Create destination tensor\n  dst = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n  \n  # Perform select operation - copy from on_true where predicate is true\n  # and set to fp32.min where predicate is false\n  nisa.select_reduce(\n      dst=dst,\n      predicate=predicate,\n      on_true=on_true,\n      on_false=nl.fp32.min,\n  )\n  \n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_1_END\n\n  return result_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_select_reduce_with_reduction(predicate_data, on_true_data, on_false_data):\n  # NKI_EXAMPLE_2_BEGIN\n  ##################################################################\n  # Example 2: Using select_reduce with reduction\n  # Perform selection and compute max reduction per partition\n  ##################################################################\n  # Create output tensors for results\n  result_tensor = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.hbm)\n  reduce_tensor = nl.ndarray((on_true_data.shape[0], 1), dtype=nl.float32, buffer=nl.hbm)\n  \n  # Load input data to SBUF\n  predicate = nl.load(predicate_data)\n  on_true = nl.load(on_true_data)\n  on_false = nl.load(on_false_data)\n\n  # Create destination tensor\n  dst = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n  \n  # Create tensor for reduction results\n  reduce_res = nl.ndarray((on_true_data.shape[0], 1), dtype=nl.float32, buffer=nl.sbuf)\n  \n  # Perform select operation with reduction\n  nisa.select_reduce(\n      dst=dst,\n      predicate=predicate,\n      on_true=on_true,\n      on_false=on_false,\n      reduce_cmd=nisa.reduce_cmd.reset_reduce,\n      reduce_res=reduce_res,\n      reduce_op=nl.max\n  )\n  \n  # Store results to HBM\n  nl.store(result_tensor, value=dst)\n  nl.store(reduce_tensor, value=reduce_res)\n  # NKI_EXAMPLE_2_END\n\n  return result_tensor, reduce_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_select_reduce_reverse_pred(predicate_data, on_true_data):\n  # NKI_EXAMPLE_3_BEGIN\n  ##################################################################\n  # Example 3: Using select_reduce with reverse_pred option\n  # Reverse the meaning of the predicate\n  ##################################################################\n  # Create output tensor for result\n  result_tensor = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.hbm)\n  \n  # Load input data to SBUF\n  predicate = nl.load(predicate_data[...])\n  on_true = nl.load(on_true_data[...])\n  \n  # Create destination tensor\n  dst = nl.ndarray(on_true_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n  \n  # Perform select operation with 
reverse_pred=True\n  # This will select on_true where predicate is FALSE\n  nisa.select_reduce(\n      dst=dst,\n      predicate=predicate,\n      on_true=on_true,\n      on_false=nl.fp32.min,\n      reverse_pred=True  # Reverse the meaning of the predicate\n  )\n  \n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_3_END\n\n  return result_tensor\n\n\nclass TestNkiIsaExamplesSelectReduce(unittest.TestCase):\n  def test_select_reduce_basic(self):\n    # Create input data\n    on_true_data = np.ones((128, 64), dtype=np.float32)\n    predicate_data = np.zeros((128, 64), dtype=np.uint8)\n    predicate_data[0:64, :] = 1  # Set first half to 1 (true)\n    \n    # Run the test\n    result = nki_select_reduce_basic(predicate_data, on_true_data)\n\n    self.assertEqual(result.shape, (128, 64))\n    \n    # First half should be 1.0 (from on_true)\n    self.assertTrue(np.all(result[0:64, :] == 1.0))\n    \n    # Second half should be fp32.min (from on_false)\n    self.assertTrue(np.all(result[64:, :] == nl.fp32.min))\n\n  def test_select_reduce_with_reduction(self):\n    np.random.seed(0)\n    on_true_data = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    on_false_data = np.random.random_sample([128, 1]).astype(np.float32) * 100\n    predicate_data = np.random.randint(low=0, high=2, size=[128, 512], dtype=np.bool_)\n\n    result, reduction = nki_select_reduce_with_reduction(predicate_data, on_true_data, on_false_data)\n\n    self.assertEqual(result.shape, (128, 512))\n    self.assertEqual(reduction.shape, (128, 1))\n\n    golden_result = np.where(predicate_data, on_true_data, on_false_data)\n    golden_reduce = np.max(golden_result, axis=1, keepdims=True)\n\n    self.assertTrue(np.allclose(result, golden_result))\n    self.assertTrue(np.allclose(reduction, golden_reduce))\n\n  def test_select_reduce_reverse_pred(self):\n    # Create input data\n    on_true_data = np.ones((128, 64), dtype=np.float32)\n    predicate_data = np.zeros((128, 64), dtype=np.uint8)\n    predicate_data[0:64, :] = 1  # Set first half to 1 (true)\n    \n    # Run the test\n    result = nki_select_reduce_reverse_pred(predicate_data, on_true_data)\n\n    self.assertEqual(result.shape, (128, 64))\n    \n    # First half should be fp32.min (predicate is 1, but reversed)\n    self.assertTrue(np.all(result[0:64, :] == nl.fp32.min))\n    \n    # Second half should be 1.0 (predicate is 0, but reversed)\n    self.assertTrue(np.all(result[64:, :] == 1.0))\n"
  },
  {
    "path": "nki/test/test_nki_isa_sequence_bounds.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_0_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_sequence_bounds(segment_ids):\n  output = nl.ndarray([1, 2, 32], dtype=segment_ids.dtype, buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_0_BEGIN\n  ######################################################################\n  # Example 1: Generate tile of boundaries of sequence for each element:\n  ######################################################################\n  # Input example\n  # segment_ids = np.array([[0, 1, 1, 2, 2, 2, 0, 3, 3]], dtype=np.int32)\n\n  # Expected output for this example:\n  # [[\n  #   [9, 1, 1, 3, 3, 3, 9, 7, 7]       # start index\n  #   [-1, 3, 3, 6, 6, 6, -1, 9, 9]     # end index\n  #   ]]\n  m, n = segment_ids.shape\n\n  ix, iy, iz = nl.mgrid[0:m, 0:2, 0:n]\n\n  out_tile = nl.ndarray([m, 2, n], dtype=segment_ids.dtype, buffer=nl.sbuf)\n  seq_tile = nl.load(segment_ids)\n  out_tile[ix, iy, iz] = nisa.sequence_bounds(segment_ids=seq_tile)\n  # NKI_EXAMPLE_0_END\n  nl.store(output, value=out_tile)\n  return output\n\n\n\nclass TestNkiIsaExamplesSequenceBounds(unittest.TestCase):\n  def test_sequence_bounds(self):\n    m, n = 1, 32\n    n_seq = m * n\n    length = n\n\n    np.random.seed(0)\n    segment_ids = np.sort(np.random.randint(low=0, high=n_seq, size=length))\n    segment_ids = segment_ids.reshape((m, n), order='F').astype(np.float32)\n    reshaped_segment_ids = segment_ids.reshape(segment_ids.shape[0], -1)\n\n    # NKI_EXAMPLE_1_BEGIN\n    def compute_sequence_bounds(sequence):\n      n = len(sequence)\n\n      min_bounds = np.zeros(n, dtype=sequence.dtype)\n      max_bounds = np.zeros(n, dtype=sequence.dtype)\n\n      min_bound_pad = n\n      max_bound_pad = -1\n\n      min_bounds[0] = 0 if sequence[0] != 0 else min_bound_pad\n      for i in range(1, n):\n        if sequence[i] == 0:\n          min_bounds[i] = min_bound_pad\n        elif sequence[i] == sequence[i - 1]:\n          min_bounds[i] = min_bounds[i - 1]\n        else:\n          min_bounds[i] = i\n\n      max_bounds[-1] = n if sequence[-1] != 0 else max_bound_pad\n      for i in range(n - 2, -1, -1):\n        if sequence[i] == 0:\n          max_bounds[i] = max_bound_pad\n        elif sequence[i] == sequence[i + 1]:\n          max_bounds[i] = max_bounds[i + 1]\n        else:\n          max_bounds[i] = i + 1\n\n      return np.vstack((min_bounds, max_bounds))\n\n    b = (\n      np.apply_along_axis(\n        compute_sequence_bounds, axis=1, arr=reshaped_segment_ids\n      )\n      .reshape(m, 2, n)\n      .astype(np.float32)\n    )\n    # NKI_EXAMPLE_1_END\n\n    a = nki_sequence_bounds(segment_ids=segment_ids)\n    self.assertTrue(np.allclose(a, b))\n"
  },
  {
    "path": "nki/test/test_nki_isa_tensor_copy.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n\n# NKI_EXAMPLE_7_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n...\n\n# NKI_EXAMPLE_7_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_copy(in_tensor):\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_7_BEGIN\n  ############################################################################\n  # Example 1: Copy over the tensor to another tensor using the Vector engine.\n  ############################################################################\n  x = nl.load(in_tensor)\n  x_copy = nisa.tensor_copy(x, engine=nisa.vector_engine)\n  nl.store(out_tensor, value=x_copy)\n  # NKI_EXAMPLE_7_END\n\n  return out_tensor\n\n      \nclass TestNkiIsaExamplesTensorCopy(unittest.TestCase):\n  def test_tensor_copy(self):\n    np.random.seed(0)\n    src = np.random.random_sample([8, 8]).astype(np.float32) * 100\n    dst_golden = np.copy(src)\n\n    dst = nki_tensor_copy(src)\n    self.assertTrue(np.allclose(dst, dst_golden))"
  },
  {
    "path": "nki/test/test_nki_isa_tensor_scalar.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_5_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nimport numpy as np\n...\n# NKI_EXAMPLE_5_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_scalar(a_tensor, c_tensor, e_tensor, f_tensor):\n  b_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  d_tensor = nl.ndarray(c_tensor.shape, dtype=c_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  g_tensor = nl.ndarray(e_tensor.shape, dtype=e_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_5_BEGIN\n  ##################################################################\n  # Example 1: subtract 1.0 from all elements of tile a of\n  # shape (128, 512) and get the output tile in b\n  ##################################################################\n  i_p = nl.arange(128)[:, None]\n  i_f = nl.arange(512)[None, :]\n  # NKI_EXAMPLE_5_END\n  a = nl.load(a_tensor[i_p, i_f])\n  # NKI_EXAMPLE_5_BEGIN\n  b = nisa.tensor_scalar(a[i_p, i_f], np.subtract, 1.0)\n\n  # NKI_EXAMPLE_5_END\n  nl.store(b_tensor[i_p, i_f], b)\n\n  # NKI_EXAMPLE_5_BEGIN\n  ##################################################################\n  # Example 2: broadcast 1.0 into a shape of (128, 512) and subtract\n  # it with tile c to get output tile d\n  ##################################################################\n  i_p = nl.arange(128)[:, None]\n  i_f = nl.arange(512)[None, :]\n  # NKI_EXAMPLE_5_END\n  c = nl.load(c_tensor[i_p, i_f])\n  # NKI_EXAMPLE_5_BEGIN\n  d = nisa.tensor_scalar(c[i_p, i_f], np.subtract, 1.0, reverse0=True)\n\n  # NKI_EXAMPLE_5_END\n  nl.store(d_tensor[i_p, i_f], d)\n\n  # NKI_EXAMPLE_5_BEGIN\n  ##################################################################\n  # Example 3: broadcast multiply tile e with vector f and\n  # then broadcast add with scalar 2.5;\n  # tile e has a shape of (64, 1024) and vector f has a shape of (64, 1)\n  ##################################################################\n  i_p_ef = nl.arange(64)[:, None]\n  i_f_e = nl.arange(1024)[None, :]\n  i_f_f = nl.arange(1)[None, :]\n  # NKI_EXAMPLE_5_END\n  e = nl.load(e_tensor[i_p_ef, i_f_e])\n  f = nl.load(f_tensor[i_p_ef, i_f_f]) \n  # NKI_EXAMPLE_5_BEGIN\n  g = nisa.tensor_scalar(e[i_p_ef, i_f_e], op0=np.multiply, operand0=f[i_p_ef, i_f_f], op1=np.add, operand1=2.5)  \n  # NKI_EXAMPLE_5_END\n\n  nl.store(g_tensor[i_p_ef, i_f_e], g)\n  return b_tensor, d_tensor, g_tensor\n  \n      \nclass TestNkiIsaExamplesTensorScalar(unittest.TestCase):\n  def test_tensor_scalar(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    c = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    e = np.random.random_sample([64, 1024]).astype(np.float32) * 100\n    f = np.random.random_sample([64, 1]).astype(np.float32) * 100\n    \n    b, d, g = nki_tensor_scalar(a, c, e, f)\n    \n    self.assertTrue(np.allclose(b, a-1))\n    self.assertTrue(np.allclose(d, 1-c))\n    self.assertTrue(np.allclose(g, e*f + 2.5))\n "
  },
  {
    "path": "nki/test/test_nki_isa_tensor_scalar_cumulative.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_1_END\nimport numpy as np\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_scalar_cumulative_scalar(\n  src_data,\n  op0,\n  op1,\n  imm0,\n  imm1=None,\n  reduce_cmd=nisa.reduce_cmd.reset_reduce):\n  # NKI_EXAMPLE_1_BEGIN\n  ##################################################################\n  # Example 1: Basic usage of tensor scalar cumulative.\n  # Using scalar as immeidate values.\n  ##################################################################\n  # Create output tensor for result.\n  result_tensor = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.hbm)\n\n  # Load data into SBUF.\n  src = nl.load(src_data[...])\n\n  # Create destination tensor with zeros.\n  dst = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n\n  # Apply cumulative operation on tensor with scalar operations.\n  nisa.tensor_scalar_cumulative(\n    src=src,\n    dst=dst,\n    op0=op0,\n    op1=op1,\n    imm0=imm0,\n    imm1=imm1,\n    reduce_cmd=reduce_cmd\n  )\n\n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_1_END\n\n  return result_tensor\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_scalar_cumulative_vector(\n  src_data,\n  op0,\n  op1,\n  imm0,\n  imm1=None,\n  reduce_cmd=nisa.reduce_cmd.reset_reduce):\n  # NKI_EXAMPLE_2_BEGIN\n  ##################################################################\n  # Example 2: Basic usage of tensor scalar cumulative.\n  # Using vector as immediate values.\n  ##################################################################\n  # Create output tensor for result.\n  result_tensor = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.hbm)\n\n  # Load data into SBUF.\n  src = nl.load(src_data[...])\n  imm0 = nl.load(imm0[...])\n  imm1 = nl.load(imm1[...]) if imm1 else None\n\n  # Create destination tensor with zeros.\n  dst = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n\n  # Apply cumulative operation on tensor with scalar operations.\n  nisa.tensor_scalar_cumulative(\n    src=src,\n    dst=dst,\n    op0=op0,\n    op1=op1,\n    imm0=imm0,\n    imm1=imm1,\n    reduce_cmd=reduce_cmd\n  )\n\n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_2_END\n\n  return result_tensor\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_scalar_cumulative_chain(\n  src_data,\n  op0,\n  op1,\n  imm0,\n  imm1=None,\n  reduce_cmd=nisa.reduce_cmd.reset_reduce):\n  # NKI_EXAMPLE_3_BEGIN\n  ##################################################################\n  # Example 3: Chain two tensor scalar cumulative together.\n  # Using scalar as immeidate values.\n  ##################################################################\n  # Create output tensor for result.\n  result_tensor = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.hbm)\n\n  # Load data into SBUF.\n  src = nl.load(src_data[...])\n\n  # Create destination tensor with zeros.\n  dst = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n\n  # Apply cumulative operation on tensor with scalar operations.\n  nisa.tensor_scalar_cumulative(\n    src=src,\n    dst=dst,\n    op0=op0,\n    op1=op1,\n    imm0=imm0,\n    imm1=imm1,\n    reduce_cmd=reduce_cmd\n  )\n\n  # Apply cumulative operation with reduce as reduce_cmd.\n  nisa.tensor_scalar_cumulative(\n    src=src,\n    dst=dst,\n    
op0=op0,\n    op1=op1,\n    imm0=imm0,\n    imm1=imm1,\n    reduce_cmd=nisa.reduce_cmd.reduce\n  )\n\n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_3_END\n\n  return result_tensor\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_scan(src_data, op, initial):\n  # NKI_EXAMPLE_4_BEGIN\n  ##################################################################\n  # Example 4: Perform tensor scan using tensor scalar cumulative.\n  ##################################################################\n  # Create output tensor for result.\n  result_tensor = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.hbm)\n\n  # Load data into SBUF.\n  src = nl.load(src_data[...])\n\n  # Create destination tensor with zeros.\n  dst = nl.ndarray(src_data.shape, dtype=nl.float32, buffer=nl.sbuf)\n\n  # Apply cumulative operation on tensor with scalar operations.\n  nisa.tensor_scalar_cumulative(\n    src=src,\n    dst=dst,\n    op0=nl.add,\n    op1=op,\n    imm0=np.float32(0.0),\n    imm1=initial,\n    reduce_cmd=nisa.reduce_cmd.load_reduce\n  )\n\n  # Store result to HBM\n  nl.store(result_tensor, value=dst)\n  # NKI_EXAMPLE_4_END\n\n  return result_tensor\n\nclass TestNkiIsaExamplesTensorScalarCumulative(unittest.TestCase):\n  \n  def test_tensor_scalar_cumulative_scalar1(self):\n    \"\"\"Test when op1 is nl.add with scalar imm0.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    result = nki_tensor_scalar_cumulative_scalar(\n      src, op0=nl.add, op1=nl.add, imm0=np.float32(0.0))\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add.accumulate(src, axis=-1)\n\n    self.assertTrue(np.allclose(result, golden))\n\n  def test_tensor_scalar_cumulative_scalar2(self):\n    \"\"\"Test when op1 is nl.multiply with scalar imm0.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    result = nki_tensor_scalar_cumulative_scalar(\n      src, op0=nl.add, op1=nl.multiply, imm0=np.float32(0.0))\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.multiply.accumulate(src, axis=-1)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_vector1(self):\n    \"\"\"Test when op1 is nl.add with vector imm0.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src, op0=nl.add, op1=nl.add, imm0=imm0)\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add.accumulate(np.add(src, imm0), axis=-1)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_vector2(self):\n    \"\"\"Test when op1 is nl.multiply with vector imm0.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src, op0=nl.add, op1=nl.multiply, imm0=imm0)\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.multiply.accumulate(np.add(src, imm0), axis=-1)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_vector3(self):\n    \"\"\"Test when op1 is nl.max with vector imm0.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src, op0=nl.add, op1=nl.max, imm0=imm0)\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.maximum.accumulate(np.add(src, imm0), axis=-1)\n\n    
self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_load_reduce1(self):\n    \"\"\"Test when op1 is nl.add with load_reduce.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    imm1 = np.ones((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src,\n      op0=nl.add,\n      op1=nl.add,\n      imm0=imm0,\n      imm1=imm1,\n      reduce_cmd=nisa.reduce_cmd.load_reduce\n    )\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add(np.add.accumulate(np.add(src, imm0), axis=-1), imm1)\n\n    self.assertTrue(np.allclose(result, golden))\n\n  def test_tensor_scalar_cumulative_load_reduce2(self):\n    \"\"\"Test when op1 is nl.multiply with load_reduce.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    imm1 = np.zeros((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src,\n      op0=nl.add,\n      op1=nl.multiply,\n      imm0=imm0,\n      imm1=imm1,\n      reduce_cmd=nisa.reduce_cmd.load_reduce\n    )\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.zeros((128, 64), dtype=np.float32)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_load_reduce3(self):\n    \"\"\"Test when op1 is nl.min with load_reduce.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    imm0 = np.ones((128, 1), dtype=np.float32)\n    imm1 = np.zeros((128, 1), dtype=np.float32)\n    result = nki_tensor_scalar_cumulative_vector(\n      src,\n      op0=nl.add,\n      op1=nl.min,\n      imm0=imm0,\n      imm1=imm1,\n      reduce_cmd=nisa.reduce_cmd.load_reduce\n    )\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.zeros((128, 1), dtype=np.float32)\n\n    self.assertTrue(np.allclose(result, golden))\n\n  def test_tensor_scalar_cumulative_chain1(self):\n    \"\"\"Test chaining two operations of reset_reduce followed by reduce.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    result = nki_tensor_scalar_cumulative_chain(\n      src,\n      op0=nl.add,\n      op1=nl.add,\n      imm0=np.float32(0.0),\n      reduce_cmd=nisa.reduce_cmd.reset_reduce\n    )\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add(np.add.accumulate(src, axis=-1), 64)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scalar_cumulative_chain2(self):\n    \"\"\"Test chaining two operations of load_reduce followed by reduce.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    result = nki_tensor_scalar_cumulative_chain(\n      src,\n      op0=nl.add,\n      op1=nl.add,\n      imm0=np.float32(0.0),\n      imm1=np.float32(1.0),\n      reduce_cmd=nisa.reduce_cmd.load_reduce\n    )\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add(np.add.accumulate(src, axis=-1), 65)\n\n    self.assertTrue(np.allclose(result, golden))\n  \n  def test_tensor_scan(self):\n    \"\"\"Test tensor scan.\n    \"\"\"\n    src = np.ones((128, 64), dtype=np.float32)\n\n    result = nki_tensor_scan(src, op=nl.add, initial=np.float32(2.0))\n\n    self.assertEqual(result.shape, (128, 64))\n\n    golden = np.add(np.add.accumulate(src, axis=-1), np.float32(2.0))\n\n    self.assertTrue(np.allclose(result, golden))"
  },
  {
    "path": "nki/test/test_nki_isa_tensor_tensor.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_3_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n...\n# NKI_EXAMPLE_3_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_tensor(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n\n  # NKI_EXAMPLE_3_BEGIN\n  ##################################################################\n  # Example 1: add two tiles, a and b, of the same\n  # shape (128, 512) element-wise and get\n  # the addition result in tile c\n  ##################################################################\n  a: tensor[128, 512] = nl.load(a_tensor)\n  b: tensor[128, 512] = nl.load(b_tensor)\n\n  c: tensor[128, 512] = nisa.tensor_tensor(a, b, op=nl.add)\n\n  # NKI_EXAMPLE_3_END\n  nl.store(c_tensor, c)\n  return c_tensor\n\n\nclass TestNkiIsaExamplesTensorTensor(unittest.TestCase):\n  def test_tensor_tensor(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    c = nki_tensor_tensor(a, b)\n    \n    self.assertTrue(np.allclose(c, np.add(a, b)))\n"
  },
  {
    "path": "nki/test/test_nki_isa_tensor_tensor_scan.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_4_BEGIN\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_4_END\n\n\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.isa .py file with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n@nki.jit(mode=\"simulation\")\ndef nki_tensor_tensor_scan(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  a = nl.load(a_tensor)\n  b = nl.load(b_tensor)\n\n  # NKI_EXAMPLE_4_BEGIN\n  ##################################################################\n  # Example 1: scan two tiles, a and b, of the same\n  # shape (128, 1024) using multiply/add and get\n  # the scan result in tile c\n  ##################################################################\n  c = nl.ndarray(shape=(128, 1024), dtype=nl.float32)\n\n  c[:, 0:512] = nisa.tensor_tensor_scan(a[:, 0:512], b[:, 0:512],\n                                        initial=0, op0=np.multiply, op1=np.add)\n\n  c[:, 512:1024] = nisa.tensor_tensor_scan(a[:, 512:1024], b[:, 512:1024],\n                                           initial=c[:, 511],\n                                           op0=np.multiply, op1=np.add)\n  # NKI_EXAMPLE_4_END\n\n  nl.store(c_tensor, c)\n  return c_tensor\n\n\nclass TestNkiIsaExamplesTensorTensorScan(unittest.TestCase):\n  def test_tensor_tensor_scan(self):\n    a = np.random.random_sample([128, 1024]).astype(np.float32)\n    b = np.random.random_sample([128, 1024]).astype(np.float32)\n    c = nki_tensor_tensor_scan(a, b)\n\n    golden = np.zeros(c.shape)\n    golden[:, 0] = a[:, 0] * 0 + b[:, 0]\n    for i in range(1, c.shape[1]):\n      golden[:, i] = a[:, i] * golden[:, i - 1] + b[:, i]\n\n    print(c)\n    print(golden)\n    print(c - golden)\n    self.assertTrue(np.allclose(c, golden))\n"
  },
  {
    "path": "nki/test/test_nki_mask.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.isa as nisa\n# NKI_EXAMPLE_15_BEGIN\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_15_END\nimport numpy as np\n...\n\n########################################################################\n# NOTE: if you modify this file, make sure to update nki.api.shared.rst with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_mask(in_tensor):\n  ...\n  out_tensor = nl.ndarray([64, 256], dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_15_BEGIN\n  ...\n  i_p = nl.arange(128)[:, None]\n  i_f = nl.arange(512)[None, :]\n  # NKI_EXAMPLE_15_END\n  in_tile = nl.load(in_tensor[i_p, i_f])\n  # NKI_EXAMPLE_15_BEGIN\n  out_tile = nl.square(in_tile, mask=((i_p<64) & (i_f<256)))\n  # NKI_EXAMPLE_15_END\n\n  nl.store(out_tensor[i_p, i_f], out_tile[i_p, i_f],\n           mask=((i_p < 64) & (i_f < 256)))\n  return out_tensor\n\n\nclass TestNkiIsaExamplesMask(unittest.TestCase):\n  def test_mask(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    b = nki_mask(a)\n\n    b_golden = np.square(a[:64, :256])\n\n    self.assertTrue(np.allclose(b, b_golden))\n"
  },
  {
    "path": "nki/test/test_nki_memory_semantics.py",
    "content": "import unittest\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.language as nl\nimport numpy as np\n\n# NKI_EXAMPLE_0_BEGIN\n@nki.jit(mode='simulation')\ndef simple_demo_kernel(a_ptr):\n  \n  B, N, M = a_ptr.shape\n\n  a_loaded = nl.ndarray((B, nl.par_dim(N), M), dtype=a_ptr.dtype, buffer=nl.sbuf)\n  exp_out =  nl.ndarray((B, nl.par_dim(N), M), dtype=a_ptr.dtype, buffer=nl.sbuf)\n  out_ptr = nl.ndarray((B, nl.par_dim(N), M), dtype=a_ptr.dtype, buffer=nl.shared_hbm)\n\n  for b in nl.affine_range(B):\n    a_loaded[b] = nl.load(a_ptr[b])\n    exp_out[b] = nl.exp(a_loaded[b])\n    nl.store(out_ptr[b], value=exp_out[b])\n\n  return out_ptr\n# NKI_EXAMPLE_0_END\n\nclass TestNkiMemorySemantics(unittest.TestCase):\n  def test_simulate_kernel(self):\n    np.random.seed(0)\n    a = np.random.random_sample([4, 128, 512]).astype(np.float32) * 100\n    \n    result = simple_demo_kernel(a)\n\n    self.assertTrue(np.allclose(result, np.exp(a)))\n"
  },
  {
    "path": "nki/test/test_nki_nl_add.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_20_BEGIN\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_20_END\nimport numpy as np\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef add_tensors(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:512])\n  b = nl.load(b_tensor[0:128, 0:512])\n  # add a and b element-wise and store in c[128, 512]\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef add_tensor_scalar(a_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:512])\n  b = 2.2\n  # add constant b to each element in a\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef add_broadcast_free_dim(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:512])\n  b = nl.load(b_tensor[0:128, 0:1])\n  # broadcast on free dimension -- [128, 1] is broadcasted to [128, 512]\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef add_broadcast_par_dim(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:512])\n  b = nl.load(b_tensor[0:1, 0:512])\n  # broadcast on partition dimension -- [1, 512] is broadcasted to [128, 512]\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef add_broadcast_both_dims(a_tensor, b_tensor):\n  c_tensor = nl.ndarray(a_tensor.shape, dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:512])\n  b = nl.load(b_tensor[0:1, 0:1])\n  # broadcast on both dimensions -- [1, 1] is broadcasted to [128, 512]\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef add_broadcast_each_dims(a_tensor, b_tensor):\n  c_tensor = nl.ndarray([128, 512], dtype=a_tensor.dtype,\n                        buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  a = nl.load(a_tensor[0:128, 0:1])\n  b = nl.load(b_tensor[0:1, 0:512])\n  # broadcast on each dimensions -- [128, 1] and [1, 512] are broadcasted to [128, 512]\n  c = nl.add(a, b)\n  nl.store(c_tensor[0:128, 0:512], c)\n  # NKI_EXAMPLE_20_END\n  return c_tensor\n\n\nclass TestNkiNlExampleAdd(unittest.TestCase):\n  def test_add(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    c = np.zeros([128, 
512]).astype(np.float32)\n    c_golden = np.add(a, b)\n    \n    c = add_tensors(a, b)\n    self.assertTrue(np.allclose(c, c_golden))\n\n  def test_add_tensor_scalar(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = 2.2\n    c = np.zeros([128, 512]).astype(np.float32)\n    c_golden = np.add(a, b)\n\n    c = add_tensor_scalar(a)\n    self.assertTrue(np.allclose(c, c_golden))\n\n  def test_add_broadcast_free_dim(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([128, 1]).astype(np.float32) * 100\n    c = np.zeros([128, 512]).astype(np.float32)\n    c_golden = np.add(a, b)\n\n    c = add_broadcast_free_dim(a, b)\n    self.assertTrue(np.allclose(c, c_golden))\n\n  def test_add_broadcast_par_dim(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([1, 512]).astype(np.float32) * 100\n    c = np.zeros([128, 512]).astype(np.float32)\n    c_golden = np.add(a, b)\n\n    c = add_broadcast_par_dim(a, b)\n    self.assertTrue(np.allclose(c, c_golden))\n\n  def test_add_broadcast_both_dims(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.random.random_sample([1, 1]).astype(np.float32) * 100\n    c = np.zeros([128, 512]).astype(np.float32)\n    c_golden = np.add(a, b)\n\n    c = add_broadcast_both_dims(a, b)\n    self.assertTrue(np.allclose(c, c_golden))\n\n  def test_add_broadcast_each_dims(self):\n    np.random.seed(0)\n    a = np.random.random_sample([128, 1]).astype(np.float32) * 100\n    b = np.random.random_sample([1, 512]).astype(np.float32) * 100\n    c = np.zeros([128, 512]).astype(np.float32)\n    c_golden = np.add(a, b)\n\n    c = add_broadcast_each_dims(a, b)\n    self.assertTrue(np.allclose(c, c_golden))"
  },
  {
    "path": "nki/test/test_nki_nl_atomic_rmw.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_18_BEGIN\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n...\n# NKI_EXAMPLE_18_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef atomic_rmw_indirect_indices(in_tensor, indices_tensor, value_tensor):\n  rmw_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # Workaround to get simulation working for testing purposes.\n  # reason: the IR builder marks in_out tensor as output only, hence the simulator ignores the input values of the in_out tensor.\n  # workaround: load input value from another input tensor, write that value to our in_out tensor, so we can test atomic_rmw in simulation.\n  in_tile = nl.load(in_tensor)\n  nl.store(rmw_tensor, in_tile)\n\n  N = 128\n  M = 512\n\n  # NKI_EXAMPLE_18_BEGIN\n  value: tensor[N, M] = nl.load(value_tensor)\n\n  # dynamic indices have to be in SBUF, with shape [N, 1]\n  indices_tile: tensor[N, 1] = nl.load(indices_tensor)\n\n  ix = nl.arange(M)[None, :]\n\n  ########################################################################\n  # Atomic read-modify-write example:\n  #   - read: values of rmw_tensor is indexed by values from indices_tile\n  #   - modify: incremented by value\n  #   - write: saved back into rmw_tensor\n  # resulting in rmw_tensor = rmw_tensor + value\n  ########################################################################\n  nl.atomic_rmw(rmw_tensor[indices_tile, ix], value=value, op=np.add)\n  # NKI_EXAMPLE_18_END\n  return rmw_tensor\n\n\nclass TestNkiExampleNlLoad(unittest.TestCase):\n  def test_atomic_rmw_indirect_indices(self):\n    in_tensor = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    indices_tensor = np.arange(128, dtype=np.int32)\n    indices_tensor = np.expand_dims(indices_tensor, axis=1)\n    value_tensor = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    golden = in_tensor + value_tensor\n\n    rmw_tensor = atomic_rmw_indirect_indices(in_tensor, indices_tensor,\n                                             value_tensor)\n\n    self.assertTrue(np.allclose(rmw_tensor, golden))\n"
  },
  {
    "path": "nki/test/test_nki_nl_broadcast.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_5_BEGIN\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_5_END\n...\n\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n@nki.jit(mode=\"simulation\")\ndef test_nl_broadcast(in_tensor):\n  out_tensor = nl.ndarray([128, 64], in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_5_BEGIN\n  ##################################################################\n  # Example 1: Load from in_tensor[P, F] that is on HBM and\n  # copy into out_tile[P, F] that is on SBUF by broadcasting\n  ##################################################################\n  ...\n  # NKI_EXAMPLE_5_END\n  # NKI_EXAMPLE_5_BEGIN\n  ...\n  # broadcast into out_tile[P, F] that is on SBUF\n  # from data_tile[P, F] that is on SBUF\n  in_tile = nl.load(in_tensor, dtype=in_tensor.dtype)\n  out_tile = nl.broadcast_to(in_tile, shape=(128, in_tensor.shape[1]))\n\n  # store output\n  nl.store(out_tensor, out_tile)\n  # NKI_EXAMPLE_5_END\n  return out_tensor\n\n\nclass TestNkiExampleNlBroadcast(unittest.TestCase):\n  def test_nl_broadcast_to(self):\n    src = np.random.random_sample([1, 64]).astype(np.int32) * 100\n\n    dst = test_nl_broadcast(src)\n    self.assertTrue(np.allclose(np.repeat(src, 128, axis=0), dst))\n"
  },
  {
    "path": "nki/test/test_nki_nl_dslice.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_1_BEGIN\nimport neuronxcc.nki.language as nl\n...\n# NKI_EXAMPLE_1\n\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel(in_tensor):\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_1_BEGIN\n  for i in nl.affine_range(in_tensor.shape[1] // 512):\n    tile = nl.load(in_tensor[:, (i * 512):((i + 1) * 512)])\n    # Same as above but use ds (dynamic slice) instead of the native\n    # slice syntax\n    tile = nl.load(in_tensor[:, nl.ds(i * 512, 512)])\n    # NKI_EXAMPLE_1_END\n    nl.store(out_tensor[:, nl.ds(i * 512, 512)], tile)\n\n  return out_tensor\n\n\nclass TestNkiExampleNlLoad(unittest.TestCase):\n  def test_nl_load(self):\n    a = np.random.random_sample([128, 4096]).astype(np.float32) * 100\n\n    b = example_kernel(a)\n    self.assertTrue(np.allclose(a, b))\n"
  },
  {
    "path": "nki/test/test_nki_nl_gather_flattened.py",
    "content": "\"\"\"\nCopyright (C) 2025, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n# NKI_EXAMPLE_0_END\nimport numpy as np\n\n\n@nki.jit(mode=\"simulation\")\ndef nki_gather_flattened():\n    # NKI_EXAMPLE_0_BEGIN\n    ##################################################################\n    # Example 1: Gather values from a tensor using indices\n    ##################################################################\n    # Create source tensor\n    N = 32\n    M = 64\n    data = nl.rand((N, M), dtype=nl.float32)\n\n    # Create indices tensor - gather every 5th element\n    indices = nl.zeros((N, 10), dtype=nl.uint32)\n    for i in nl.static_range(N):\n        for j in nl.static_range(10):\n            indices[i, j] = j * 5\n\n    # Gather values from data according to indices\n    result = nl.gather_flattened(data=data, indices=indices)\n    # NKI_EXAMPLE_0_END\n\n    # Create output tensor and store result\n    data_tensor = nl.ndarray([N, M], dtype=data.dtype, buffer=nl.shared_hbm)\n    nl.store(data_tensor, value=data)\n    indices_tensor = nl.ndarray([N, 10], dtype=nl.int32, buffer=nl.shared_hbm)\n    nl.store(indices_tensor, value=indices)\n    result_tensor = nl.ndarray([N, 10], dtype=data.dtype, buffer=nl.shared_hbm)\n    nl.store(result_tensor, value=result)\n\n    return data_tensor, indices_tensor, result_tensor\n\n\nclass TestNkiExamplesGather(unittest.TestCase):\n    def test_gather_flattened(self):\n        data, indices, result = nki_gather_flattened()\n\n        self.assertEqual(result.shape, (32, 10))\n        expected = np.take_along_axis(data, indices, axis=-1)\n        self.assertTrue(np.allclose(result, expected))\n\n\nTestNkiExamplesGather().test_gather_flattened()\n"
  },
  {
    "path": "nki/test/test_nki_nl_load_store.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_16_BEGIN NKI_EXAMPLE_15_BEGIN NKI_EXAMPLE_14_BEGIN NKI_EXAMPLE_11_BEGIN NKI_EXAMPLE_10_BEGIN\nimport neuronxcc.nki.language as nl\n# NKI_EXAMPLE_16_END NKI_EXAMPLE_10_END NKI_EXAMPLE_11_END NKI_EXAMPLE_14_END NKI_EXAMPLE_15_END\n...\n\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel(in_tensor, use_scalar=False):\n  out_tensor = nl.ndarray(in_tensor.shape, in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_10_BEGIN\n  # load from in_tensor[P, F] that is on HBM\n  # copy into data_tile[P, F] that is on SBUF\n  data_tile = nl.load(in_tensor)\n  ...\n  # NKI_EXAMPLE_10_END\n  if use_scalar:\n    # NKI_EXAMPLE_16_BEGIN\n    ...\n    scalar = 100\n    # store scalar into out_tensor on HBM (effectively a memset)\n    nl.store(out_tensor, scalar)\n    # NKI_EXAMPLE_16_END\n  else:\n    # NKI_EXAMPLE_14_BEGIN\n    ...\n    # store into out_tensor[P, F] that is on HBM\n    # from data_tile[P, F] that is on SBUF\n    nl.store(out_tensor, data_tile)\n    # NKI_EXAMPLE_14_END\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_load_store_b(in_tensor):\n  out_tensor = nl.ndarray(in_tensor.shape, in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_15_BEGIN NKI_EXAMPLE_11_BEGIN\n  for i_b in nl.affine_range(4):\n    data_tile = nl.zeros((128, 512), dtype=in_tensor.dtype) \n    # NKI_EXAMPLE_15_END\n    # load from in_tensor[4, 128, 512] one batch at a time\n    # copy into data_tile[128, 512]\n    i_p, i_f = nl.mgrid[0:128, 0:512]\n    data_tile[i_p, i_f] = nl.load(in_tensor[i_b, i_p, i_f])\n    # NKI_EXAMPLE_15_BEGIN\n    ...\n    # NKI_EXAMPLE_11_END\n    # store into out_tensor[4, 128, 512] one batch at a time\n    # from data_tile[128, 512] \n    i_p, i_f = nl.mgrid[0:128, 0:512]\n    nl.store(out_tensor[i_b, i_p, i_f], value=data_tile[i_p, i_f]) \n    # NKI_EXAMPLE_15_END\n  return out_tensor\n\n\nclass TestNkiExampleNlLoad(unittest.TestCase):\n  def test_nl_load(self):\n    src = np.random.random_sample([128, 512]).astype(np.float32) * 100\n\n    dst = example_kernel(src)\n    self.assertTrue(np.allclose(src, dst))\n\n  def test_nl_load_scalar(self):\n    src = np.ones([128, 512]).astype(np.int32) * 100\n\n    dst = example_kernel(src, use_scalar=True)\n    self.assertTrue(np.allclose(src, dst))\n\n  def test_load_store_3d(self):\n    in_tensor = np.random.random_sample([4, 128, 512]).astype(np.float32) * 100\n\n    out_tensor = example_load_store_b(in_tensor)\n    self.assertTrue(np.allclose(out_tensor, in_tensor))\n"
  },
  {
    "path": "nki/test/test_nki_nl_load_store_indirect.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_17_BEGIN NKI_EXAMPLE_13_BEGIN\nimport neuronxcc.nki.isa as nisa\n# NKI_EXAMPLE_16_BEGIN NKI_EXAMPLE_12_BEGIN\nimport neuronxcc.nki.language as nl\n...\n\n# NKI_EXAMPLE_12_END NKI_EXAMPLE_13_END NKI_EXAMPLE_16_END NKI_EXAMPLE_17_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef example_indirect_load_1(data_tensor, idx_tensor):\n  out_tensor = nl.ndarray([64, 512], dtype=data_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_12_BEGIN\n  ############################################################################################\n  # Indirect DMA read example 1:\n  # - data_tensor on HBM has shape [128 x 512].\n  # - idx_tensor on HBM has shape [64] (with values [0, 2, 4, 6, ...]).\n  # - idx_tensor values read from HBM and stored in SBUF idx_tile of shape [64 x 1]\n  # - data_tensor values read from HBM indexed by values in idx_tile \n  #   and store into SBUF data_tile of shape [64 x 512].\n  ############################################################################################\n  i_p = nl.arange(64)[:, None]\n  i_f = nl.arange(512)[None, :]\n\n  idx_tile = nl.load(idx_tensor[i_p]) # indices have to be in SBUF\n  data_tile = nl.load(data_tensor[idx_tile[i_p, 0], i_f]) \n  ...\n  # NKI_EXAMPLE_12_END\n  nl.store(out_tensor, value=data_tile)\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_indirect_load_2(data_tensor):\n  out_tensor = nl.ndarray([64, 512], dtype=data_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  n, m = data_tensor.shape\n  assert n == 128 and m == 512\n  # NKI_EXAMPLE_13_BEGIN\n  ############################################################################################\n  # Indirect DMA read example 2:\n  # - data_tensor on HBM has shape [128 x 512].\n  # - idx_tile on SBUF has shape [64 x 1] (with values [[0], [2], [4], ...] 
generated by iota)\n  # - data_tensor values read from HBM indexed by values in idx_tile \n  #   and store into SBUF data_tile of shape [64 x 512].\n  ############################################################################################\n  i_f = nl.arange(512)[None, :]\n  \n  idx_expr = 2*nl.arange(64)[:, None]\n  idx_tile = nisa.iota(idx_expr, dtype=np.int32)\n  data_tile = nl.load(data_tensor[idx_tile, i_f]) \n  ...\n  # NKI_EXAMPLE_13_END\n\n  nl.store(out_tensor, value=data_tile)\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_indirect_save_1(in_tensor, idx_tensor):\n  data_tensor = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                           buffer=nl.shared_hbm)\n  data_tile = nl.load(in_tensor)\n  ...\n  # NKI_EXAMPLE_16_BEGIN\n  ##################################################################################\n  # Indirect DMA write example 1:\n  #  - data_tensor has shape [128 x 512].\n  #  - idx_tensor on HBM has shape [64] (with values [0, 2, 4, 6, ...]).\n  #  - idx_tensor values read from HBM and stored in SBUF idx_tile.\n  #  - data_tile of shape [64 x 512] values written into\n  #    HBM data_tensor indexed by values in idx_tile.\n  ##################################################################################\n  i_p = nl.arange(64)[:, None]\n  i_f = nl.arange(512)[None, :]\n  idx_tile = nl.load(idx_tensor[i_p]) # indices have to be in SB\n\n  nl.store(data_tensor[idx_tile[i_p, 0], i_f], value=data_tile[0:64, 0:512])\n  # NKI_EXAMPLE_16_END\n  return data_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_indirect_save_2(in_tensor):\n  data_tensor = nl.ndarray([128, 512], dtype=in_tensor.dtype,\n                           buffer=nl.shared_hbm)\n  n, m = in_tensor.shape\n  i_f = nl.arange(m)[None, :]\n  data_tile = nl.load(in_tensor)\n  assert n == 64 and m == 512\n  ...\n  # NKI_EXAMPLE_17_BEGIN\n  #############################################################################################\n  # Indirect DMA write example 2:\n  #  - data_tensor has shape [128 x 512].\n  #  - idx_tile on SBUF has shape [64 x 1] (with values [[0], [2], [4], ...] 
generated by iota)\n  #  - data_tile of shape [64 x 512] values written into\n  #    HBM data_tensor indexed by values in idx_tile.\n  #############################################################################################\n  idx_expr = 2*nl.arange(64)[:, None]\n  idx_tile = nisa.iota(idx_expr, dtype=np.int32)\n  \n  nl.store(data_tensor[idx_tile, i_f], value=data_tile[0:64, 0:512]) \n  # NKI_EXAMPLE_17_END\n  return data_tensor\n\n\nclass TestNkiExampleNlLoadStoreIndirect(unittest.TestCase):\n  def test_indirect_load_1(self):\n    in_tensor = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    idx_tensor = 2*np.arange(64, dtype=np.int32)\n    golden = in_tensor[idx_tensor]\n\n    out_tensor = example_indirect_load_1(in_tensor, idx_tensor)\n    self.assertTrue(np.allclose(out_tensor, golden))\n\n  def test_indirect_load_2(self):\n    in_tensor = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    idx_tensor = 2*np.arange(64, dtype=np.int32)\n    golden = in_tensor[idx_tensor]\n\n    out_tensor = example_indirect_load_2(in_tensor)\n    self.assertTrue(np.allclose(out_tensor, golden))\n\n  def test_indirect_save_1(self):\n    in_tensor = np.random.random_sample([64, 512]).astype(np.float32) * 100\n    idx_tensor = 2*np.arange(64, dtype=np.int32)\n\n    out_tensor = example_indirect_save_1(in_tensor, idx_tensor)\n    self.assertTrue(np.allclose(out_tensor[idx_tensor], in_tensor))\n\n  def test_indirect_save_2(self):\n    in_tensor = np.random.random_sample([64, 512]).astype(np.float32) * 100\n    idx_tensor = 2*np.arange(64, dtype=np.int32)\n\n    out_tensor = example_indirect_save_2(in_tensor)\n    self.assertTrue(np.allclose(out_tensor[idx_tensor], in_tensor))\n   \n"
  },
  {
    "path": "nki/test/test_nki_nl_load_transpose2d.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_19_BEGIN\nimport neuronxcc.nki.language as nl\nfrom neuronxcc.nki.typing import tensor\n...\n\n# NKI_EXAMPLE_19_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel_0(in_tensor):\n  out_tensor = nl.ndarray([in_tensor.shape[1], in_tensor.shape[0]], dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_19_BEGIN\n  # load from in_tensor[F, P] that is on HBM\n  # transpose and copy into local_tile[P, F] that is on SBUF\n  N, M = in_tensor.shape\n  local_tile: tensor[M, N] = nl.load_transpose2d(in_tensor)\n  ...\n  # NKI_EXAMPLE_19_END\n  nl.store(out_tensor, value=local_tile)\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel_1(in_tensor):\n  out_tensor = nl.ndarray([in_tensor.shape[1], in_tensor.shape[0]], dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_20_BEGIN\n  import neuronxcc.nki.isa as nisa\n  ...\n\n  # load from in_tensor[F, P] that is on HBM\n  # transpose and copy into local_tile[P, F] that is on SBUF\n  # always use the DMA engine\n  N, M = in_tensor.shape\n  local_tile: tensor[M, N] = nisa.dma_transpose(in_tensor)\n  ...\n  # NKI_EXAMPLE_20_END\n  nl.store(out_tensor, value=local_tile)\n  return out_tensor\n\n\n\nclass TestNkiExampleNlLoadTranspose2d(unittest.TestCase):\n  def test_dma_transpose_load_0(self):\n    np.random.seed(0)\n    src = np.random.random_sample([2048, 128]).astype(np.float32) * 100\n\n    dst = example_kernel_0(src)\n\n    dst_golden = np.transpose(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n  def test_dma_transpose_load_1(self):\n    np.random.seed(0)\n    src = np.random.random_sample([2048, 128]).astype(np.float32) * 100\n\n    dst = example_kernel_1(src)\n\n    dst_golden = np.transpose(src)\n    self.assertTrue(np.allclose(dst, dst_golden))\n"
  },
  {
    "path": "nki/test/test_nki_nl_mgrid.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n# NKI_EXAMPLE_9_BEGIN NKI_EXAMPLE_8_BEGIN\nimport neuronxcc.nki.language as nl\n...\n\n# NKI_EXAMPLE_8_END NKI_EXAMPLE_9_END\n\n########################################################################\n# NOTE: if you modify this file, make sure to update the source .py with\n# NOTE: the correct line numbers under .. literalinclude:: directive\n########################################################################\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel(in_tensor):\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_8_BEGIN\n  i_p, i_f = nl.mgrid[0:128, 0:512]\n  tile = nl.load(in_tensor[i_p, i_f])\n  ...\n  nl.store(out_tensor[i_p, i_f], tile)\n\n  # NKI_EXAMPLE_8_END\n  return out_tensor\n\n\n@nki.jit(mode=\"simulation\")\ndef example_kernel_1(in_tensor):\n  out_tensor = nl.ndarray(in_tensor.shape, dtype=in_tensor.dtype,\n                          buffer=nl.shared_hbm)\n  # NKI_EXAMPLE_9_BEGIN\n  grid = nl.mgrid[0:128, 0:512]\n  tile = nl.load(in_tensor[grid.p, grid.x])\n  ...\n  nl.store(out_tensor[grid.p, grid.x], tile)\n  # NKI_EXAMPLE_9_END\n  return out_tensor\n\n\nclass TestNkiExampleNlLoad(unittest.TestCase):\n  def test_nl_load(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.ndarray(shape=(128, 512), dtype=np.float32)\n\n    b = example_kernel(a)\n    self.assertTrue(np.allclose(a, b))\n\n  def test_nl_load_1(self):\n    a = np.random.random_sample([128, 512]).astype(np.float32) * 100\n    b = np.ndarray(shape=(128, 512), dtype=np.float32)\n\n    b = example_kernel_1(a)\n    self.assertTrue(np.allclose(a, b))"
  },
  {
    "path": "nki/test/test_nki_simulate_kernel.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\n# NKI_EXAMPLE_BEGIN\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.language as nl\nimport numpy as np\n\n\n@nki.jit\ndef print_kernel(a_tensor):\n  b = nl.empty_like(a_tensor, buffer=nl.hbm)\n\n  # Load tensor into sbuf\n  a = nl.load(a_tensor)\n\n  # Print tensor y\n  nl.device_print(\"value of a:\", a)\n\n  # Directly store a into hbm\n  nl.store(b, value=a)\n\n  return b\n# NKI_EXAMPLE_END\n\n\nclass TestNkiIsaExamplesSimulateKernel(unittest.TestCase):\n  def test_simulate_kernel(self):\n    # NKI_EXAMPLE_BEGIN\n    np.random.seed(0)\n    a = np.random.random_sample([3, 4]).astype(np.float32) * 10\n\n    b = nki.simulate_kernel(print_kernel, a)\n\n    assert np.allclose(a, b)\n    # NKI_EXAMPLE_END"
  },
  {
    "path": "nki/test/test_nki_spmd_grid.py",
    "content": "\"\"\"\nCopyright (C) 2024, Amazon.com. All Rights Reserved\n\n\"\"\"\nimport unittest\n\nimport numpy as np\nimport neuronxcc.nki as nki\n\n# NKI_EXAMPLE_0_BEGIN\nimport neuronxcc.nki.language as nl\n\n\n@nki.jit\ndef nki_spmd_kernel(a):\n  b = nl.ndarray(a.shape, dtype=a.dtype, buffer=nl.shared_hbm)\n  i = nl.program_id(0)\n  j = nl.program_id(1)\n  \n  a_tile = nl.load(a[i, j])\n  nl.store(b[i, j], a_tile)\n\n  return b\n# NKI_EXAMPLE_0_END\n\n\nnki_spmd_kernel = nki.jit(nki_spmd_kernel, mode='simulation',\n                          platform_target='trn2')\n\n\nclass TestNkiIsaExamplesTensorCopy(unittest.TestCase):\n  def test_spmd_grid(self):\n    np.random.seed(0)\n    src = np.random.random_sample([4, 2, 1, 1]).astype(np.float32) * 100\n    dst_golden = np.copy(src)\n\n    # NKI_EXAMPLE_0_BEGIN\n    ############################################################################\n    # Example 1: Let compiler decide how to distribute the instances of spmd kernel\n    ############################################################################\n    dst = nki_spmd_kernel[4, 2](src)\n    # NKI_EXAMPLE_0_END\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n    # NKI_EXAMPLE_0_BEGIN\n    ############################################################################\n    # Example 2: Distribute SPMD kernel instances to physical NeuronCores with\n    # explicit annotations. Expected physical NeuronCore assignments:\n    #   Physical NC [0]: kernel[0, 0], kernel[0, 1], kernel[1, 0], kernel[1, 1]\n    #   Physical NC [1]: kernel[2, 0], kernel[2, 1], kernel[3, 0], kernel[3, 1]\n    ############################################################################\n    dst = nki_spmd_kernel[nl.spmd_dim(nl.nc(2), 2), 2](src)\n    dst = nki_spmd_kernel[nl.nc(2) * 2, 2](src)  # syntactic sugar\n    # NKI_EXAMPLE_0_END\n    self.assertTrue(np.allclose(dst, dst_golden))\n\n    # NKI_EXAMPLE_0_BEGIN\n    ############################################################################\n    # Example 3: Distribute SPMD kernel instances to physical NeuronCores with\n    # explicit annotations. Expected physical NeuronCore assignments:\n    #   Physical NC [0]: kernel[0, 0], kernel[0, 1], kernel[2, 0], kernel[2, 1]\n    #   Physical NC [1]: kernel[1, 0], kernel[1, 1], kernel[3, 0], kernel[3, 1]\n    ############################################################################\n    dst = nki_spmd_kernel[nl.spmd_dim(2, nl.nc(2)), 2](src)\n    dst = nki_spmd_kernel[2 * nl.nc(2), 2](src)  # syntactic sugar\n    # NKI_EXAMPLE_0_END\n    self.assertTrue(np.allclose(dst, dst_golden))\n"
  },
  {
    "path": "nki/test/test_psum_modulo_alloc.py",
    "content": "# NKI_EXAMPLE_0_BEGIN\nfrom typing import Optional, Tuple\nfrom functools import reduce\nfrom operator import mul\nimport unittest\n\ndef num_elems(shape):\n  return reduce(mul, shape, 1)\n\ndef linearize(shape, indices):\n  return sum(i * num_elems(shape[dim+1:]) for dim, i in enumerate(indices))\n\ndef modulo_allocate_func(base, allocate_shape, scale):\n  def func(indices):\n    if not allocate_shape:\n      # default shape is always (1, 1, ...)\n      allocate_shape_ = (1, ) * len(indices)\n    else:\n      allocate_shape_ = allocate_shape\n    mod_idx = tuple(i % s for i, s in zip(indices, allocate_shape_))\n    return linearize(shape=allocate_shape_, indices=mod_idx) * scale + base\n  return func\n\ndef mod_alloc(base_addr: int, *, \n               base_bank: Optional[int] = 0,\n               num_bank_tiles: Optional[Tuple[int]] = (),\n               base_partition: Optional[int] = 0,\n               num_par_tiles: Optional[Tuple[int]] = (),\n               num_free_tiles: Optional[Tuple[int]] = ()):\n  def psum_modulo_alloc_func(idx, pdim_size, fdim_size):\n    # partial bank allocation is not allowed\n    return (modulo_allocate_func(base_bank, num_bank_tiles, 1)(idx),\n          modulo_allocate_func(base_partition, num_par_tiles, pdim_size)(idx),\n          modulo_allocate_func(base_addr, num_free_tiles, fdim_size)(idx))\n  return psum_modulo_alloc_func\n\n# NKI_EXAMPLE_0_END\n\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.language as nl\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.compiler as ncc\nimport numpy as np\nnki_jit = nki.trace\n\n\n@nki_jit\ndef allocated_loop_transpose(a_ptr, tp_ptr):\n  \n  N, M = a_ptr.shape\n\n  _M, _N = tp_ptr.shape\n  assert _N == N and _M == M\n\n  N0, N1 = N // 128, 128\n  M0, M1 = M // 128, 128\n\n  ix0 = nl.arange(0, M1)[:, None]\n  iy0 = nl.arange(0, N1)[None, :]\n\n  identity = nl.shared_identity_matrix(n=128, dtype=nl.bfloat16)\n\n  for n0 in nl.affine_range(N0):\n    for m0 in nl.affine_range(M0):\n      ix0 = nl.arange(0, 128)[:, None]\n      iy0 = nl.arange(0, 128)[None, :]\n      a_local = nl.ndarray((nl.par_dim(N1), M1), dtype=a_ptr.dtype, \n                           buffer=ncc.sbuf.mod_alloc(base_addr=1024))\n      a_local[ix0, iy0] = nl.load(a_ptr[n0 * N1 + ix0, m0 * M1 + iy0])\n\n      identity_load = nl.ndarray((nl.par_dim(128), 128), dtype=a_ptr.dtype, buffer=ncc.sbuf.mod_alloc(base_addr=0))\n      identity_load[ix0, iy0] = nl.load(identity, dtype=a_ptr.dtype)\n\n      a_local_transpose = nl.ndarray((nl.par_dim(M1), N1), dtype=a_ptr.dtype,\n                                     buffer=ncc.psum.alloc(mod_alloc(base_addr=0)))\n      a_local_transpose[ix0, iy0] = nisa.nc_matmul(a_local[ix0, iy0], identity_load)\n\n      a_t_sbuf = nl.ndarray((nl.par_dim(N1), M1), dtype=a_ptr.dtype,\n                                     buffer=ncc.sbuf.mod_alloc(base_addr=2048))\n      a_t_sbuf[ix0, iy0] = nl.copy(a_local_transpose[ix0, iy0])\n\n      nl.store(tp_ptr[m0 * 128 + ix0, n0 * 128 + iy0], value=a_t_sbuf[ix0, iy0])\n\nclass TestNkiPSUMModuloAllocation(unittest.TestCase):\n  def test_simulate_kernel(self):\n    np.random.seed(0)\n    a = np.random.random_sample([2048, 1024]).astype(np.float32) * 100\n    b = np.ndarray(shape=(1024, 2048), dtype=np.float32)\n\n    nki.simulate_kernel(allocated_loop_transpose, a, b)\n\n    self.assertTrue(np.allclose(b, np.transpose(a)))\n\n"
  },
  {
    "path": "nki/test/test_sbuf_modulo_alloc.py",
    "content": "# NKI_EXAMPLE_0_BEGIN\nfrom typing import Optional, Tuple\nfrom functools import reduce\nfrom operator import mul\nimport unittest\n\ndef num_elms(shape):\n  return reduce(mul, shape, 1)\n\ndef linearize(shape, indices):\n  return sum(i * num_elms(shape[dim+1:]) for dim, i in enumerate(indices))\n\ndef modulo_allocate_func(base, allocate_shape, scale):\n  def func(indices):\n    if not allocate_shape:\n      # default shape is always (1, 1, ...)\n      allocate_shape_ = (1, ) * len(indices)\n    else:\n      allocate_shape_ = allocate_shape\n    mod_idx = tuple(i % s for i, s in zip(indices, allocate_shape_))\n    return linearize(shape=allocate_shape_, indices=mod_idx) * scale + base\n  return func\n\ndef mod_alloc(base_addr: int, *, \n               base_partition: Optional[int] = 0,\n               num_par_tiles: Optional[Tuple[int, ...]] = (),\n               num_free_tiles: Optional[Tuple[int, ...]] = ()):\n  def sbuf_modulo_alloc_func(idx, pdim_size, fdim_size):\n    return (modulo_allocate_func(base_partition, num_par_tiles, pdim_size)(idx),\n          modulo_allocate_func(base_addr, num_free_tiles, fdim_size)(idx))\n  return sbuf_modulo_alloc_func\n\n# NKI_EXAMPLE_0_END\n\n\nimport neuronxcc.nki as nki\nimport neuronxcc.nki.language as nl\nimport neuronxcc.nki.isa as nisa\nimport neuronxcc.nki.compiler as ncc\nimport numpy as np\nnki_jit = nki.trace\n\n\n@nki_jit\ndef allocated_loop_transpose(a_ptr, tp_ptr):\n  \n  N, M = a_ptr.shape\n\n  _M, _N = tp_ptr.shape\n  assert _N == N and _M == M\n\n  N0, N1 = N // 128, 128\n  M0, M1 = M // 128, 128\n\n  ix0 = nl.arange(0, M1)[:, None]\n  iy0 = nl.arange(0, N1)[None, :]\n\n  identity = nl.shared_identity_matrix(n=128, dtype=nl.bfloat16)\n\n  for n0 in nl.affine_range(N0):\n    for m0 in nl.affine_range(M0):\n      ix0 = nl.arange(0, 128)[:, None]\n      iy0 = nl.arange(0, 128)[None, :]\n      a_local = nl.ndarray((nl.par_dim(N1), M1), dtype=a_ptr.dtype, \n                           buffer=ncc.sbuf.alloc(mod_alloc(base_addr=1024)))\n      a_local[ix0, iy0] = nl.load(a_ptr[n0 * N1 + ix0, m0 * M1 + iy0])\n\n      identity_load = nl.ndarray((nl.par_dim(128), 128), dtype=a_ptr.dtype, buffer=ncc.sbuf.alloc(mod_alloc(base_addr=0)))\n      identity_load[ix0, iy0] = nl.load(identity, dtype=a_ptr.dtype)\n\n      a_local_transpose = nl.ndarray((nl.par_dim(M1), N1), dtype=a_ptr.dtype,\n                                     buffer=ncc.psum.mod_alloc(base_bank=0))\n      a_local_transpose[ix0, iy0] = nisa.nc_matmul(a_local[ix0, iy0], identity_load)\n\n      a_t_sbuf = nl.ndarray((nl.par_dim(N1), M1), dtype=a_ptr.dtype,\n                                     buffer=ncc.sbuf.alloc(mod_alloc(base_addr=2048)))\n      a_t_sbuf[ix0, iy0] = nl.copy(a_local_transpose[ix0, iy0])\n\n      nl.store(tp_ptr[m0 * 128 + ix0, n0 * 128 + iy0], value=a_t_sbuf[ix0, iy0])\n\n\nclass TestNkiSBUFModuloAllocation(unittest.TestCase):\n  def test_simulate_kernel(self):\n    np.random.seed(0)\n    a = np.random.random_sample([2048, 1024]).astype(np.float32) * 100\n    b = np.ndarray(shape=(1024, 2048), dtype=np.float32)\n\n    nki.simulate_kernel(allocated_loop_transpose, a, b)\n\n    self.assertTrue(np.allclose(b, np.transpose(a)))\n"
  },
  {
    "path": "release-notes/2.29.0.rst",
    "content": ".. _neuron-2-29-0-whatsnew:\n.. _latest-neuron-release:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version 2.29.0. Release date: 04/09/2026.\n\nAWS Neuron SDK 2.29.0 release notes\n===================================\n\n**Date of release**: April 09, 2026\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch (torch-neuronx) <components/pytorch>\n   NxD Inference/vLLM  <components/nxd-inference>\n   NKI <components/nki>\n   NKI Library <components/nki-lib>\n   Neuron Runtime <components/runtime>\n   Developer tools <components/dev-tools>\n   Deep Learning AMIs <components/dlamis>\n   Deep Learning Containers <components/containers>\n\nThis page provides detailed component release notes for the Neuron SDK 2.29.0. For a an overview of the release content, see :ref:`What's New in AWS Neuron <whats-new-2026-04-02-v2_29>`.\n\nPackage and Library Updates\n---------------------------\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.29.0 release artifacts**\n                ^^^\n                The libraries and packages updated in this Neuron release.\n\nComponent Release Notes\n-----------------------\n\nSelect a card below to review detailed release notes for each component of the Neuron SDK version 2.29.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n* For the full set of component release notes across Neuron versions, see :doc:`/release-notes/components/index`.\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card:: \n                :link: components/pytorch\n                :link-type: doc\n\n                **PyTorch Neuron (torch-neuronx)** 2.29.0 release notes\n                ^^^\n                Integrated, native support for PyTorch on Neuron.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: components/nxd-inference\n                :link-type: doc\n\n                **NxD Inference** 2.29.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference, and the vLLM Plugin for Neuron.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: components/nki\n                :link-type: doc\n\n                **Neuron Kernel Interface (NKI)** 2.29.0 release notes\n                ^^^\n                Neuron's Python-based programming interface for developing and optimizing Neuron kernels.\n                +++\n                Supports: ``Inf2``, ``Trn1``, ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: components/nki-lib\n                :link-type: doc\n\n                **NKI Library (NKI-Lib)** 2.29.0 release notes\n                ^^^\n                Reference kernels and utilities for Neuron kernel development with NKI.\n                +++\n                Supports: ``Inf2``, ``Trn1``, ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. 
grid-item-card:: \n                :link: components/runtime\n                :link-type: doc\n\n                **Neuron Runtime** 2.29.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: components/dev-tools\n                :link-type: doc\n\n                **Neuron Developer Tools** 2.29.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron, including Neuron Explorer.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: components/dlamis\n                :link-type: doc\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.29.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n \n        .. grid-item-card:: \n                :link: components/containers\n                :link-type: doc\n\n                **Neuron Deep Learning Containers (DLCs)** 2.29.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\nPrevious releases\n-----------------\n\n* :doc:`Neuron 2.28.1 </release-notes/prev/2.28.1>`\n* :doc:`Neuron 2.28.0 </release-notes/prev/2.28.0>`\n* :doc:`Neuron 2.27.0 </release-notes/prev/2.27.0/index>`\n* :doc:`Neuron 2.26.0 </release-notes/prev/2.26.0/index>`\n* :doc:`Neuron 2.25.0 </release-notes/prev/2.25.0/index>`\n* :doc:`Earlier releases </release-notes/prev/rn>`\n\n* :ref:`prev-rn`\n* :ref:`pre-release-content`\n* :ref:`prev-n1-rn`"
  },
  {
    "path": "release-notes/archive/customcxxps/gpsimd-customop-lib.rst",
    "content": ".. _gpsimd-customop-lib-rn:\n\nNeuron Custom C++ Library Release Notes\n========================================\n\n.. note::\n\n    Neuron Custom C++ Operators feature is currently supported on NeuronCore-v2 architecture only, which is found in Trainium (Trn1) and second-generation Inferentia (Inf2) chips.\n\naws-neuronx-gpsimd-customop-lib [0.20.7]\n-----------------------------------------\n\nDate: 03/12/2026\n\n* Fixed package dependency issue with version 0.20.4, initially released as part of Neuron release 2.28.0.\n\naws-neuronx-gpsimd-customop-lib [0.13]\n---------------------------------------\n\nDate: 12/12/2024\n\n* Neuron Custom C++ Operators feature is currently supported on NeuronCore-v2 architecture only, which is found in Trainium (Trn1) and second-generation Inferentia (Inf2) chips.\n\naws-neuronx-gpsimd-customop-lib [0.3]\n-------------------------------------\n\nDate: 04/28/2023\n\n* Add initial support for using Multiple GPSIMD Cores for Custom C++ Operators\n* Package name was changed to ``aws-neuronx-gpsimd-customop-lib``\n\naws-neuronx-gpsimd-customop [0.1]\n---------------------------------\n\nDate: 02/08/2023\n\n* First release of aws-neuronx-gpsimd-customop. This release provides tensor library support required for building Neuron Custom C++ operators.\n"
  },
  {
    "path": "release-notes/archive/customcxxps/gpsimd-tools.rst",
    "content": ".. _gpsimd-customop-tools-rn:\n\nNeuron Custom C++ Tools Release Notes\n======================================\n\n.. note::\n\n    Neuron Custom C++ Operators feature is currently supported on NeuronCore-v2 architecture only, which is found in Trainium (Trn1) and second-generation Inferentia (Inf2) chips.\n\naws-neuronx-gpsimd-tools [0.13]\n-------------------------------------\n\nDate: 12/12/2024\n\n* Neuron Custom C++ Operators feature is currently supported on NeuronCore-v2 architecture only, which is found in Trainium (Trn1) and second-generation Inferentia (Inf2) chips.\n\naws-neuronx-gpsimd-tools [0.1]\n------------------------------\n\nDate: 02/08/2023\n\n* First release of aws-neuronx-gpsimd-tools. This release provides the required tools to support the building of Neuron Custom C++ operators.\n"
  },
  {
    "path": "release-notes/archive/index.rst",
    "content": ".. meta::\n    :description: Archived release notes for deprecated Neuron SDK components\n    :keywords: neuron, release notes, archive, deprecated, legacy\n    :date-modified: 02/26/2026\n\nArchived Neuron Component Release Notes\n=======================================\n\nThis page contains links to release notes for Neuron components that are no longer supported or are no longer in active development.\n\n.. list-table::\n   :widths: 40 60\n   :header-rows: 1\n   :align: left\n\n   * - Component\n     - Description\n   * - :doc:`Neuron XLA Pluggable Device <libneuronxla>`\n     - Neuron XLA pluggable device (libneuronxla) - PJRT runtime integration\n   * - :doc:`Apache MXNet Neuron <mxnet-neuron>`\n     - Apache MXNet Neuron framework release notes\n   * - :doc:`TensorBoard Neuron Plugin <tensorboard-neuron>`\n     - Neuron Plugin for TensorBoard release notes\n   * - :doc:`PyTorch Neuron for Inf1 <torch-neuron>`\n     - PyTorch Neuron (torch-neuron) for Inf1 release notes\n   * - :doc:`Custom C++ Operators Library <customcxxps/gpsimd-customop-lib>`\n     - Neuron Custom C++ Operators Library (aws-neuronx-gpsimd-customop-lib)\n   * - :doc:`Custom C++ Operators Tools <customcxxps/gpsimd-tools>`\n     - Neuron Custom C++ Operators Tools (aws-neuronx-gpsimd-tools)\n   * - :doc:`NeMo Megatron <nemo/index>`\n     - AWS Neuron Reference for Nemo Megatron (neuronx-nemo-megatron)\n   * - :doc:`Neuron Compiler for Inf1 <neuron-cc/neuron-cc>`\n     - Neuron Compiler (neuron-cc) for Inferentia 1 chips\n   * - :doc:`Neuron Compiler Supported Operators <neuron-cc/neuron-cc-ops/index>`\n     - List of operators supported by Neuron Compiler for various frameworks\n   * - :doc:`TensorFlow Model Server 1.x <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron>`\n     - TensorFlow Model Server Neuron 1.x release notes\n   * - :doc:`TensorFlow Model Server 2.x <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron-v2>`\n     - TensorFlow Model Server Neuron 2.x release notes\n   * - :doc:`TensorFlow Model Server NeuronX <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuronx>`\n     - TensorFlow Model Server NeuronX (tensorflow-modeslserver-neuronx) release notes\n   * - :doc:`TensorFlow Neuron 1.x <tensorflow/tensorflow-neuron/tensorflow-neuron>`\n     - TensorFlow Neuron (TF1.x) for Inf1 release notes\n   * - :doc:`TensorFlow Neuron 2.x <tensorflow/tensorflow-neuron/tensorflow-neuron-v2>`\n     - TensorFlow 2.x (tensorflow-neuron) for Inf1 release notes\n   * - :doc:`TensorFlow NeuronX <tensorflow/tensorflow-neuronx/tensorflow-neuronx>`\n     - TensorFlow 2.x (tensorflow-neuronx) for Trn1/Inf2 release notes\n   * - :doc:`Neuron SDK 1.x Releases <neuron1/prev/rn>`\n     - Archived release notes for Neuron SDK 1.x versions\n   * - :doc:`Previous Neuron 2.x Release Artifacts <../prev/content>`\n     - Package lists and artifacts for previous Neuron 2.x releases\n\n.. note::\n  You can also access older Neuron documentation by selecting the version selector widget in the lower-right of the browser page and changing it to a prior Neuron version.\n\n  .. image:: /images/version-selector-rtd.png\n\n.. 
toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Neuron XLA Pluggable Device <libneuronxla>\n   Apache MXNet Neuron <mxnet-neuron>\n   TensorBoard Neuron Plugin <tensorboard-neuron>\n   PyTorch Neuron for Inf1 <torch-neuron>\n   Custom C++ Operators Library <customcxxps/gpsimd-customop-lib>\n   Custom C++ Operators Tools <customcxxps/gpsimd-tools>\n   NeMo Megatron <nemo/index>\n   Neuron Compiler for Inf1 <neuron-cc/neuron-cc>\n   Neuron Compiler Supported Operators <neuron-cc/neuron-cc-ops/index>\n   TensorFlow Model Server 1.x <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron>\n   TensorFlow Model Server 2.x <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron-v2>\n   TensorFlow Model Server NeuronX <tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuronx>\n   TensorFlow Neuron 1.x <tensorflow/tensorflow-neuron/tensorflow-neuron>\n   TensorFlow Neuron 2.x <tensorflow/tensorflow-neuron/tensorflow-neuron-v2>\n   TensorFlow NeuronX <tensorflow/tensorflow-neuronx/tensorflow-neuronx>\n   Neuron SDK 1.x Releases <neuron1/prev/rn>\n   Previous Neuron 2.x Release Artifacts <../prev/content>\n\n"
  },
  {
    "path": "release-notes/archive/libneuronxla.rst",
    "content": ".. |Trn1| replace:: :ref:`Trn1 <aws-trn1-arch>`\n.. |Inf2| replace:: :ref:`Inf2 <aws-inf2-arch>`\n\n.. _libneuronxla-rn:\n\nNeuron XLA pluggable device (``libneuronxla``) release notes\n================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 1\n\n``libneuronxla`` is a software package containing Neuron's integration into\nthe `PJRT <https://openxla.org/xla/pjrt_integration>`__ runtime, built using\nthe `PJRT C-API plugin <https://github.com/openxla/xla/blob/5564a9220af230c6c194e37b37938fb40692cfc7/xla/pjrt/c/docs/pjrt_integration_guide.md>`__\nmechanism.\n\nRelease [2.0.5347.0]\n--------------------\nDate: 11/20/2024\n\nSummary\n~~~~~~~\n\nAdd support for torch-xla 2.1.5 which fixes the \"list index out of range\" error when using the Zero Redundancy Optimizer (ZeRO1) checkpoint loading.\n\nRelease [2.0.4986.0]\n--------------------\nDate: 10/25/2024\n\nSummary\n~~~~~~~\n\nThis patch release removes the excessive lock wait time during neuron_parallel_compile graph extraction for large cluster training.\n\nRelease [2.0.4115.0]\n----------------------\nDate: 09/16/2024\n\n\nSummary\n~~~~~~~\n\nThis release of ``libneuronxla`` officially adds beta support for running JAX on AWS Trainium and Inferentia accelerators.\n\n\nWhat’s new in this release\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAnnouncing beta Neuron support for JAX.\n\n- Trainium and Inferentia as PJRT pluggable devices\n- JAX 0.4.31 support (through PJRT C-API version 0.54)\n"
  },
  {
    "path": "release-notes/archive/mxnet-neuron.rst",
    "content": ".. _mxnet-neuron-rn:\n\n\nApache MXNet Neuron Release Notes\n==================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for MXNet-Neuron framework.\n\nApache MXNet Neuron release [1.8.0.2.4.40.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/21/2023\n\nSummary\n-------\n\nMinor updates.\n\nApache MXNet Neuron release [1.8.0.2.4.25.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/15/2023\n\nSummary\n-------\n\nMinor updates.\n\nApache MXNet Neuron release [1.8.0.2.4.10.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 7/19/2023\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron.\n\nApache MXNet Neuron release [1.8.0.2.4.9.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 6/14/2023\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron.\n\nApache MXNet Neuron release [1.8.0.2.4.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 5/1/2023\n\nNew in this release\n-------------------\n\n* Updated Neuron Runtime library to version 2.12\n* Added missing LICENSE.txt\n\nKnown Issues and Limitations\n----------------------------\n\n* Bert-base in 16 NeuronCores pipeline mode has 50% lower performance when running 16 inferences in parallel with Runtime version 2.12.\n\n[1.5.1.1.10.39.0]\n^^^^^^^^^^^^^^^^^\n\nDate: 5/1/2023\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\nThis is the last released version. Please use neuron-cc version 1.15.0 only for this mxnet-neuron version. Also, this version is limited to python 3.9 or below only.\n\n.. code:: bash\n\n   python -m pip install mxnet_neuron==1.5.1.* neuron-cc==1.15.0\n\nApache MXNet Neuron release [1.8.0.2.2.127.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 3/28/2023\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron.\n\n[1.5.1.1.10.37.0]\n^^^^^^^^^^^^^^^^^\n\nDate: 3/28/2023\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\nApache MXNet Neuron release [1.8.0.2.2.43.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron.\n\n[1.5.1.1.10.11.0]\n^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\n[1.5.1.1.10.0.0]\n^^^^^^^^^^^^^^^^\n\nDate: 04/28/2022\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\nApache MXNet Neuron release [1.8.0.2.2.2.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\nNew in this release\n-------------------\n\n* Added support for unloading models from a NeuronDevice by deleting the model instance in user application. Users can now call ``del`` in Python on an executor and to unload the model from a NeuronDevice (provided the deleted executor is the last executor pointing to the given model). This requires the latest ``aws-mx-1.8`` package from ``https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl``. \n\nBug fixes\n---------\n\n* Fixed a memory leak caused by stale unloaded models in NeuronDevice memory. 
For this fix to take effect please install aws-mx package from https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl along with the latest mx-neuron package.\n\n[1.5.1.1.9.0.0]\n^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\n\nApache MXNet Neuron release [1.8.0.2.1.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/20/2022\n\nNew in this release\n-------------------\n\n* Added support of ``mx_neuron.__version__`` to get the build version of MXNet Neuron plugin\n\nBug fixes\n---------\n\n* Fixed assertion errors when inference was completed with NaNs. The expected behavior is to complete inference successfully and warn the \n  user that ``NaN``s were seen during the current inference. \n* Fixed compile issue when individual output nodes have multiple output nodes. Because the output index was being dropped, fewer number \n  of output feature maps were being considered and that caused failures during inference. \n\n\nApache MXNet Neuron release [1.8.0.2.0.276.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing \n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\nApache MXNet Neuron release [1.8.0.2.0.271.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate 10/27/2021\n\nNew in this release\n-------------------\n\n-  MXNet Neuron 1.8 now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n     .. important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why are we making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\n-  Introducing Flexible Execution Groups (FlexEG) feature. See :ref:`flexeg` application note.\n\n\nResolved Issues\n---------------\n\n-  Fixed a bug that prevented compilation of gluon models with multiple\n   cpu and neuron nodes.\n-  Added more debug logic to help with profiling of model load timing.\n\n\n[1.5.1.1.7.0.0]\n^^^^^^^^^^^^^^^\n\nDate 10/27/2021\n\nNew in this release\n-------------------\n\n-  MXNet 1.5 enters maintenance mode. 
Please visit :ref:`maintenance_mxnet_1_5` for more\n   information.\n\nResolved Issues\n---------------\n\n -  Minor bug fixes.\n\n\n[1.5.1.1.6.5.0]\n^^^^^^^^^^^^^^^\n\nDate 08/12/2021\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\n[1.8.0.1.3.4.0]\n^^^^^^^^^^^^^^^\n\nDate 08/12/2021\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron.\n\n\n[1.5.1.1.6.1.0]\n^^^^^^^^^^^^^^^\n\nDate 07/02/2021\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.5 Neuron.\n\n[1.8.0.1.3.0.0]\n^^^^^^^^^^^^^^^\n\nDate 07/02/2021\n\nSummary\n-------\n\nSupport for Autoloop, Cpredict API and minor bug fixes and enhancements for MXNet 1.8 Neuron.\n\nMajor New Features\n------------------\n\n- Added support for Autoloop feature for MXNet 1.8 Neuron.\n\nResolved Issues\n---------------\n\n- Added support for CPredict API.\n\n\n[1.8.0.1.2.1.0]\n^^^^^^^^^^^^^^^\n\nDate 5/28/2021\n\nSummary\n-------\n\nMinor bug fixes and enhancements for MXNet 1.8 Neuron\n\nResolved Issues\n---------------\n- Added support for Neuron profiler \n\n\n[1.8.0.1.1.2.0]\n^^^^^^^^^^^^^^^\n\nDate 4/30/2021\n\nSummary\n-------\n\nInitial release of Apache MXNet 1.8 for Neuron\n\nMajor New Features\n------------------\n\n- Gluon API and Neuron support for NLP BERT models\n\n- Neuron is now a plugin\n\n- Please note new API changes to support plugin mode: :ref:`ref-mxnet-neuron-compilation-python-api`\n\n[1.5.1.1.4.x.x]\n^^^^^^^^^^^^^^^\n\nDate 5/28/2021\n\nSummary\n-------\n\n- Minor enhancements.\n\n[1.5.1.1.4.4.0]\n^^^^^^^^^^^^^^^\n\nDate 4/30/2021\n\nSummary\n-------\n\n- Resolve an issue with Neuron profiling.\n\nResolved Issues\n---------------\n\n- Issue: when Neuron profiling is enabled in MXNet-Neuron 1.5.1 (using NEURON_PROFILE=<dir>), and TensorBoard is used to read in the profiled data, user would see an error messsage \"panic: runtime error: index out of range\". This issue is resolved in this release.\n\n[1.5.1.1.3.8.0]\n^^^^^^^^^^^^^^^\n\nDate 3/4/2021\n\nSummary\n-------\n\nMinor enhancements.\n\n[1.5.1.1.3.7.0]\n^^^^^^^^^^^^^^^\n\nDate 2/24/2021\n\nSummary\n-------\n\nFix for CVE-2021-3177.\n\n[1.5.1.1.3.2.0]\n^^^^^^^^^^^^^^^\n\nDate 1/30/2021\n\nSummary\n-------\n\nVarious minor improvements\n\n[1.5.1.1.2.1.0]\n^^^^^^^^^^^^^^^\n\nDate 12/23/2020\n\nSummary\n-------\n\nVarious minor improvements\n\n[1.5.1.1.1.88.0]\n^^^^^^^^^^^^^^^^\n\nDate 11/17/2020\n\nSummary\n-------\n\nThis release includes the bug fix for MXNet Model Server not being able to clean up\nNeuron RTD states after model is unloaded (deleted) from model server.\n\nResolved Issues\n---------------\n\n-  Issue: MXNet Model Server is not able to clean up Neuron RTD states\n   after model is unloaded (deleted) from model server.\n\n    -  Workaround for earlier versions: run “\\ ``/opt/aws/neuron/bin/neuron-cli reset``\\ “ to\n   clear Neuron RTD states after all models are unloaded and server is\n   shut down.\n\n[1.5.1.1.1.52.0]\n^^^^^^^^^^^^^^^^\n\nDate 09/22/2020\n\nSummary\n-------\n\nVarious minor improvements.\n\nMajor New Features\n------------------\n\nResolved Issues\n---------------\n\n-  Issue: When first importing MXNet into python process and subprocess\n   call is invoked, user may get an OSError exception \"OSError: [Errno\n   14] Bad address\" during subprocess call (see\n   https://github.com/apache/incubator-mxnet/issues/13875 for more\n   details). 
This issue is fixed with a mitigation patch from MXNet for\n   Open-MP fork race conditions.\n\n   -  Workaround for earlier versions: Export KMP_INIT_AT_FORK=false\n      before running python process.\n\n.. _1511110:\n\n[1.5.1.1.1.1.0]\n^^^^^^^^^^^^^^^\n\nDate 08/08/2020\n\n.. _mxnet-summary-1:\n\nSummary\n-------\n\nVarious minor improvements.\n\n.. _mx-major-new-features-1:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-1:\n\nResolved Issues\n---------------\n\n.. _1511021010:\n\n[1.5.1.1.0.2101.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 08/05/2020\n\n.. _mxnet-summary-2:\n\nSummary\n-------\n\nVarious minor improvements.\n\n.. _mx-major-new-features-2:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-2:\n\nResolved Issues\n---------------\n\n.. _1511020930:\n\n[1.5.1.1.0.2093.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 07/16/2020\n\n.. _mxnet-summary-3:\n\nSummary\n-------\n\nThis release contains a few bug fixes and user experience improvements.\n\n.. _mx-major-new-features-3:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-3:\n\nResolved Issues\n---------------\n\n-  User can specify NEURONCORE_GROUP_SIZES without brackets (for\n   example, \"1,1,1,1\"), as can be done in TensorFlow-Neuron and\n   PyTorch-Neuron.\n-  Fixed a memory leak when inferring neuron subgraph properties\n-  Fixed a bug dealing with multi-input subgraphs\n\n.. _1511020330:\n\n[1.5.1.1.0.2033.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 6/11/2020\n\n.. _mxnet-summary-4:\n\nSummary\n-------\n\n-  Added support for profiling during inference\n\n.. _mx-major-new-features-4:\n\nMajor New Features\n------------------\n\n-  Profiling can now be enabled by specifying the profiling work\n   directory using NEURON_PROFILE environment variable during inference.\n   For an example of using profiling, see :ref:`tensorboard-neuron`.\n   (Note that graph view of MXNet graph is not available via\n   TensorBoard).\n\n.. _mx-resolved-issues-4:\n\nResolved Issues\n---------------\n\nKnown Issues and Limitations\n----------------------------\n\nOther Notes\n-----------\n\n.. _1511019000:\n\n[1.5.1.1.0.1900.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 5/11/2020\n\n.. _mxnet-summary-5:\n\nSummary\n-------\n\nImproved support for shared-memory communication with Neuron-Runtime.\n\n.. _mx-major-new-features-5:\n\nMajor New Features\n------------------\n\n-  Added support for the BERT-Base model (base: L-12 H-768 A-12), max\n   sequence length 64 and batch size of 8.\n-  Improved security for usage of shared-memory for data transfer\n   between framework and Neuron-Runtime\n-  Improved allocation and cleanup of shared-memory resource\n-  Improved container support by automatic falling back to GRPC data\n   transfer if shared-memory cannot be allocated by Neuron-Runtime\n\n.. _mx-resolved-issues-5:\n\nResolved Issues\n---------------\n\n-  User is unable to allocate Neuron-Runtime shared-memory resource when\n   using MXNet-Neuron in a container to communicate with Neuron-Runtime\n   in another container. This is resolved by automatic falling back to\n   GRPC data transfer if shared-memory cannot be allocated by\n   Neuron-Runtime.\n-  Fixed issue where some large models could not be loaded on\n   inferentia.\n\n.. _mx-known-issues-and-limitations-1:\n\nKnown Issues and Limitations\n----------------------------\n\n.. _mx-other-notes-1:\n\nOther Notes\n-----------\n\n.. _1511015960:\n\n[1.5.1.1.0.1596.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 3/26/2020\n\n.. _mxnet-summary-6:\n\nSummary\n-------\n\nNo major changes or fixes\n\n.. 
_mx-major-new-features-6:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-6:\n\nResolved Issues\n---------------\n\n.. _mx-known-issues-and-limitations-2:\n\nKnown Issues and Limitations\n----------------------------\n\n.. _mx-other-notes-2:\n\nOther Notes\n-----------\n\n.. _1511014980:\n\n[1.5.1.1.0.1498.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 2/27/2020\n\n.. _mxnet-summary-7:\n\nSummary\n-------\n\nNo major changes or fixes.\n\n.. _mx-major-new-features-7:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-7:\n\nResolved Issues\n---------------\n\nThe issue(s) below are resolved:\n\n-  Latest pip version 20.0.1 breaks installation of MXNet-Neuron pip\n   wheel which has py2.py3 in the wheel name.\n\n.. _mx-known-issues-and-limitations-3:\n\nKnown Issues and Limitations\n----------------------------\n\n-  User is unable to allocate Neuron-Runtime shared-memory resource when\n   using MXNet-Neuron in a container to communicate with Neuron-Runtime\n   in another container. To work-around, please set environment variable\n   NEURON_RTD_USE_SHM to 0.\n\n.. _mx-other-notes-3:\n\nOther Notes\n-----------\n\n.. _1511014010:\n\n[1.5.1.1.0.1401.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 1/27/2020\n\n.. _mxnet-summary-8:\n\nSummary\n-------\n\nNo major changes or fixes.\n\n.. _mx-major-new-features-8:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-8:\n\nResolved Issues\n---------------\n\n-  The following issue is resolved when the latest multi-model-server\n   with version >= 1.1.0 is used with MXNet-Neuron. You would still need\n   to use \"``/opt/aws/neuron/bin/neuron-cli reset``\" to clear all Neuron\n   RTD states after multi-model-server is exited:\n\n   -  Issue: MXNet Model Server is not able to clean up Neuron RTD\n      states after model is unloaded (deleted) from model server and\n      previous workaround \"``/opt/aws/neuron/bin/neuron-cli reset``\" is\n      unable to clear all Neuron RTD states.\n\n.. _mx-known-issues-and-limitations-4:\n\nKnown Issues and Limitations\n----------------------------\n\n-  Latest pip version 20.0.1 breaks installation of MXNet-Neuron pip\n   wheel which has py2.py3 in the wheel name. This breaks all existing\n   released versions. The error looks like:\n\n::\n\n   Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com\n   ERROR: Could not find a version that satisfies the requirement mxnet-neuron (from versions: none)\n   ERROR: No matching distribution found for mxnet-neuron\n\n-  Work around: install the older version of pip using \"pip install\n   pip==19.3.1\".\n\n.. _mx-other-notes-4:\n\nOther Notes\n-----------\n\n.. _1511013250:\n\n[1.5.1.1.0.1325.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/1/2019\n\n.. _mxnet-summary-9:\n\nSummary\n-------\n\n.. _mx-major-new-features-9:\n\nMajor New Features\n------------------\n\n.. _mx-resolved-issues-9:\n\nResolved Issues\n---------------\n\n-  Issue: Compiler flags cannot be passed to compiler during compile\n   call. The fix: compiler flags can be passed to compiler during\n   compile call using “flags” option followed by a list of flags.\n\n-  Issue: Advanced CPU fallback option is a way to attempt to improve\n   the number of operators on Inferentia. The default is currently set\n   to on, which may cause failures. The fix: This option is now off by\n   default.\n\n.. 
_mx-known-issues-and-limitations-5:\n\nKnown Issues and Limitations\n----------------------------\n\n-  Issue: MXNet Model Server is not able to clean up Neuron RTD states\n   after model is unloaded (deleted) from model server and previous\n   workaround \"``/opt/aws/neuron/bin/neuron-cli reset``\" is unable to\n   clear all Neuron RTD states.\n\n   -  Workaround: run “\ ``sudo systemctl restart neuron-rtd``\ “ to\n      clear Neuron RTD states after all models are unloaded and server\n      is shut down.\n\n.. _mx-other-notes-5:\n\nOther Notes\n-----------\n\n.. _1511013490:\n\n[1.5.1.1.0.1349.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/20/2019\n\n.. _mxnet-summary-10:\n\nSummary\n-------\n\nNo major changes or fixes. Released with other Neuron packages.\n\n.. _1511012600:\n\n[1.5.1.1.0.1260.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 11/25/2019\n\n.. _mxnet-summary-12:\n\nSummary\n-------\n\nThis version is available only in released DLAMI v26.0 and is based on\nMXNet version 1.5.1. Please see :ref:`dlami-rn-known-issues` and update\nto the latest version.\n\n.. _mx-major-new-features-11:\n\nMajor new features\n------------------\n\n.. _mx-resolved-issues-11:\n\nResolved issues\n---------------\n\n.. _mx-known-issues-and-limitations-7:\n\nKnown issues and limitations\n----------------------------\n\n-  Issue: Compiler flags cannot be passed to compiler during compile\n   call.\n\n-  Issue: Advanced CPU fallback option is a way to attempt to improve\n   the number of operators on Inferentia. 
The default is currently set\n   to on, which may cause failures.\n\n   -  Workaround: explicitly turn it off by setting compile option\n      op_by_op_compiler_retry to 0.\n\n-  Issue: Temporary files are put in current directory when debug is\n   enabled.\n\n   -  Workaround: create a separate work directory and run the process\n      from within the work directory\n\n-  Issue: MXNet Model Server is not able to clean up Neuron RTD states\n   after model is unloaded (deleted) from model server.\n\n   -  Workaround: run “\\ ``/opt/aws/neuron/bin/neuron-cli reset``\\ “ to\n      clear Neuron RTD states after all models are unloaded and server\n      is shut down.\n\n-  Issue: MXNet 1.5.1 may return inconsistent node names for some\n   operators when they are the primary outputs of a Neuron subgraph.\n   This causes failures during inference.\n\n   -  Workaround : Use the ``excl_node_names`` compilation option to\n      change the partitioning of the graph during compile so that these\n      nodes are not the primary output of a neuron subgraph. See\n      :ref:`ref-mxnet-neuron-compilation-python-api`\n\n   .. code:: python\n\n      compile_args = { 'excl_node_names': [\"node_name_to_exclude\"] }\n\nModels Supported\n----------------\n\nThe following models have successfully run on neuron-inferentia systems\n\n1. Resnet50 V1/V2\n2. Inception-V2/V3/V4\n3. Parallel-WaveNet\n4. Tacotron 2\n5. WaveRNN\n\n.. _mx-other-notes-7:\n\nOther Notes\n-----------\n\n-  Python versions supported:\n\n   -  3.5, 3.6, 3.7\n\n-  Linux distribution supported:\n\n   -  Ubuntu 18, Amazon Linux 2\n"
  },
  {
    "path": "release-notes/archive/nemo/index.rst",
    "content": ".. neuronx-nemo-rn:\n\nNeuron Nemo Release Notes\n==============================================\n\n.. toctree::\n   :maxdepth: 1\n\n   neuronx-nemo"
  },
  {
    "path": "release-notes/archive/nemo/neuronx-nemo.rst",
    "content": ".. _neuronx-nemo-rn:\n\n\nAWS Neuron Reference for Nemo Megatron(``neuronx-nemo-megatron``) Release Notes\n===============================================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for ``neuronx-nemo-megatron`` library.\n\n``neuronx-nemo-megatron`` [0.8.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/20/2024\n\nNew in this release\n-------------------\n\n* Added support for HuggingFace to NeMo checkpoint conversion when virtual pipeline parallel is enabled.\n* Added support for Python 3.11\n* Added support for PyTorch 2.5\n* Added collective compute coalescing for ZeRO-1 optimizer\n* Bug fix for flash attention to ensure proper mixed precision data type handling\n\nKnown Issues and Limitations\n----------------------------\n\nNone at this time.\n\n\n``neuronx-nemo-megatron`` [0.7.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 09/16/2024\n\nNew in this release\n-------------------\n\n* Fixed issue with linear warmup with cosine annealing\n* Fixed indexing issues with MPI job checkpoint conversion.\n* Fixed pipeline parallel bug for NeMo to HF checkpoint conversion.\n\nKnown Issues and Limitations\n----------------------------\n\nNone at this time.\n\n\n``neuronx-nemo-megatron`` [0.6.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 07/03/2024\n\nNew in this release\n-------------------\n\n* Added support for fp32 gradient accumulation.\n* Added support for flash attention kernel.\n* Added option for zero1 with master weights.\n* Checkpoint conversion script improvements.\n* S3 checkpointing improvements.\n* Zero1 checkpointing improvements\n* Various bug fixes and improvements.\n\n\nKnown Issues and Limitations\n----------------------------\n\nNone at this time.\n\n\n``neuronx-nemo-megatron`` [0.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/01/2024\n\nNew in this release\n-------------------\n\n* Added support for LoRA fine tuning.\n* Added support for Mistral 7B and sliding window attention\n* Added support for Zero1 Automatic Mixed Precision.\n* Improved throughput at scale of hundreds of nodes.\n* Improved support for FP32 optimizer states.\n* Merges up and gate projection in Llama for improved throughput.\n* Various bug fixes and improvements.\n* Fixes for checkpoint restoration accuracy issues.\n* Fixes Zero1 checkpointing issues.\n\n\nKnown Issues and Limitations\n----------------------------\n\nNone at this time.\n\n\n``neuronx-nemo-megatron`` [0.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/15/2023\n\nNew in this release\n-------------------\n\n* Added Llama 70B model pre-training and finetuning support that works with tensor-parallelism and pipeline parallelism using Group Query Attention (GQA)\n* Added GPT-NeoX 20B using  tensor parallelism and pipeline parallelism.\n* Added Checkpoint conversion scripts from Nemo to HuggingFace models for LLama 7B, 13B, 70B, GPT-NeoX FineTuning\n* Stability fixes for hangs observed for long running jobs checkpointing at regular time intervals.\n* Enabled python 3.10 support with Nemo.\n\nKnown Issues and Limitations\n----------------------------\n\n* We are seeing few extra graph compilations than before. These are not limiting functionality or performance.\n* Llama2-70B : Tested and validated on 8 nodes. 
Scaling beyond might see memory issues.\n\n``neuronx-nemo-megatron`` [0.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 9/15/2023\n\nNew in this release\n-------------------\n\n* Added Llama 13B model support that works with tensor-parallelism and pipeline parallelism\n* Zero1 Optimizer support that works with tensor-parallelism and pipeline parallelism\n* Fixes for loading/saving checkpoint OOM issues while loading large models\n* Added Docker support\n* Feature to save only the last checkpoint and delete previous ones to conserve disk space\n* Added FP32 OptimizerState option for mixed precision\n* Added Validation loop support\n\nKnown Issues and Limitations\n----------------------------\n\n* Tested validation logic with smaller global batch sizes (32). Not tested larger global batch sizes.\n\n"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc-ops/index.rst",
    "content": ".. _neuron-supported-operators:\n\nNeuron Supported operators\n==========================\n\n.. toctree::\n   :maxdepth: 1\n\n   /archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops\n   neuron-cc-ops-tensorflow\n   neuron-cc-ops-pytorch\n   neuron-cc-ops-mxnet"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-mxnet.rst",
    "content": ".. _neuron-cc-ops-mxnet:\n\n\nNeuron Apache MXNet Supported operators\n====================================================\n\nTo see a list of supported operators for MXNet, run the following command:\n\n``neuron-cc list-operators --framework MXNET``\n\n.. _neuron-compiler-release-1600:\n\nNeuron Compiler Release [1.6.13.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n  amp_cast\n  amp_multicast\n\n.. _neuron-compiler-release-1410:\n\nNeuron Compiler Release [1.4.1.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1400:\n\nNeuron Compiler Release [1.4.0.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1300:\n\nNeuron Compiler Release [1.3.0.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1270:\n\nNeuron Compiler Release [1.2.7.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1220:\n\nNeuron Compiler Release [1.2.2.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1200:\n\nNeuron Compiler Release [1.2.0.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n Deconvolution\n LayerNorm\n Pad\n SwapAxis\n _contrib_arange_like\n _contrib_interleaved_matmul_encdec_qk\n _contrib_interleaved_matmul_encdec_valatt\n _contrib_interleaved_matmul_selfatt_qk\n _contrib_interleaved_matmul_selfatt_valatt\n arctan\n broadcast_like\n cos\n erf\n pad\n sin\n slice_axis\n\n\n.. _neuron-compiler-release-10240450:\n\nNeuron Compiler Release [1.0.24045.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded ``_contrib_div_sqrt_dim``, ``broadcast_axis``\n\n.. _neuron-compiler-release-10180010:\n\nNeuron Compiler Release [1.0.18001.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-10179370:\n\nNeuron Compiler Release [1.0.17937.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-10168610:\n\nNeuron Compiler Release [1.0.16861.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRemoved ``log`` (Was erroneously reported as added in previous release.\n)\n\n.. _neuron-compiler-release-1015275:\n\nNeuron Compiler Release [1.0.15275]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded ``log``\n\n.. _neuron-compiler-release-1012696:\n\nNeuron Compiler Release [1.0.12696]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-109410:\n\nNeuron Compiler Release [1.0.9410]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-107878:\n\nNeuron Compiler Release [1.0.7878]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-106801:\n\nNeuron Compiler Release [1.0.6801]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-105939:\n\nNeuron Compiler Release [1.0.5939]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nno changes\n\n.. _neuron-compiler-release-105301:\n\nNeuron Compiler Release [1.0.5301]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nno changes\n\n.. 
_neuron-compiler-release-1046800:\n\nNeuron Compiler Release [1.0.4680.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   Activation\n   BatchNorm\n   Cast\n   Concat\n   Convolution\n   Convolution_v1\n   Dropout\n   Flatten\n   FullyConnected\n   LeakyReLU\n   Pooling\n   Pooling_v1\n   RNN\n   Reshape\n   SequenceMask\n   SliceChannel\n   Softmax\n   UpSampling\n   __add_scalar__\n   __div_scalar__\n   __mul_scalar__\n   __pow_scalar__\n   __rdiv_scalar__\n   __rpow_scalar__\n   __rsub_scalar__\n   __sub_scalar__\n   _arange\n   _copy\n   _div_scalar\n   _equal_scalar\n   _full\n   _greater_equal_scalar\n   _greater_scalar\n   _lesser_equal_scalar\n   _lesser_scalar\n   _maximum\n   _maximum_scalar\n   _minimum\n   _minimum_scalar\n   _minus_scalar\n   _mul_scalar\n   _not_equal_scalar\n   _ones\n   _plus_scalar\n   _power_scalar\n   _rdiv_scalar\n   _rminus_scalar\n   _rnn_param_concat\n   _zeros\n   batch_dot\n   broadcast_add\n   broadcast_div\n   broadcast_equal\n   broadcast_greater\n   broadcast_greater_equal\n   broadcast_lesser\n   broadcast_lesser_equal\n   broadcast_maximum\n   broadcast_minimum\n   broadcast_mod\n   broadcast_mul\n   broadcast_not_equal\n   broadcast_sub\n   ceil\n   clip\n   concat\n   elemwise_add\n   elemwise_div\n   elemwise_mul\n   elemwise_sub\n   exp\n   expand_dims\n   flatten\n   floor\n   gather_nd\n   log\n   log_softmax\n   max\n   mean\n   min\n   negative\n   ones_like\n   relu\n   repeat\n   reshape\n   reshape_like\n   reverse\n   rsqrt\n   sigmoid\n   slice\n   slice_like\n   softmax\n   split\n   sqrt\n   square\n   squeeze\n   stack\n   sum\n   tanh\n   tile\n   transpose\n   where\n   zeros_like\n"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.rst",
    "content": ".. _neuron-cc-ops-pytorch:\n\nPyTorch Neuron (``torch-neuron``) Supported operators\n=====================================================\n\nCurrent operator lists may be generated with these commands inside\npython:\n\n.. code:: python\n\n   import torch.neuron\n   print(*torch.neuron.get_supported_operations(), sep='\\n')\n\n.. _pytorch-neuron-release-2130:\n\nPyTorch Neuron release [package version 1.*.*.2.9.1.0, SDK 2.13.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 08/28/2023\n\nAdded support for new operators:\n\n- ``aten::clamp_min``\n- ``aten::clamp_max``\n\n.. _pytorch-neuron-release-2900:\n\nPyTorch Neuron release [2.9.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/28/2023\n\nAdded support for new operators:\n\n- ``aten::tensordot``\n- ``aten::adaptive_avg_pool1d``\n- ``aten::prelu``\n- ``aten::reflection_pad2d``\n- ``aten::baddbmm``\n- ``aten::repeat``\n\n\n.. _pytorch-neuron-release-2500:\n\nPyTorch Neuron release [2.5.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\nAdded support for new operators:\n\n- ``aten::threshold``\n- ``aten::roll``\n- ``aten::instance_norm``\n- ``aten::amin``\n- ``aten::amax``\n- ``aten::new_empty``\n- ``aten::new_ones``\n- ``aten::tril``\n- ``aten::triu``\n- ``aten::zero_``\n- ``aten::all``\n- ``aten::broadcast_tensors``\n- ``aten::broadcast_to``\n- ``aten::logical_and``\n- ``aten::logical_not``\n- ``aten::logical_or``\n- ``aten::logical_xor``\n- ``aten::_convolution_mode``\n\nAdded **limited** support for new operators:\n\n- LSTM Operations. See: :ref:`torch_neuron_lstm_support`\n\n  - ``aten::lstm``\n  - ``aten::_pack_padded_sequence``\n  - ``aten::_pad_packed_sequence``\n\n- ``aten::norm``: Supported when ``p`` argument is one of (``1``, ``2``, ``inf``, ``-inf``, ``'fro'``)\n\n\n.. _pytorch-neuron-release-2200:\n\nPyTorch Neuron release [2.2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\nAdded support for new operators:\n\n- ``aten::max_pool2d_with_indices``: Fully supported  (Was previously supported only when indices were unused).\n\n\n.. _pytorch-neuron-release-2170:\n\nPyTorch Neuron release [2.1.7.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/20/2022\n\nAdded support for new operators:\n\n* ``aten::bucketize``\n* ``aten::any``\n* ``aten::remainder``\n* ``aten::clip``\n* ``aten::repeat_interleave``\n* ``aten::tensor_split``\n* ``aten::split_with_sizes``\n* ``aten::isnan``\n* ``aten::embedding_renorm_``\n* ``aten::dot``\n* ``aten::mv``\n* ``aten::hardsigmoid``\n* ``aten::hardswish``\n* ``aten::trunc``\n* ``aten::one_hot``: Supported when ``num_classes`` is known at trace time.\n      The dynamic version of this operation when ``num_classes = -1`` is not supported.\n* ``aten::adaptive_max_pool1d``\n* ``aten::adaptive_max_pool2d``\n\n\n.. _pytorch-neuron-release-205360:\n\nPyTorch Neuron Release [2.0.536.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- The following are operators with limited support on Neuron. Unlike fully \n  supported operators, these operators are not returned when using \n  :func:`torch_neuron.get_supported_operations`. See each operator \n  description for conditional support:\n\n  - ``aten::max_pool2d_with_indices`` - Supported when indices outputs are not used by a downstream operation. This allows the operation to be compiled to Neuron when it is equivalent to an ``aten::max_pool2d``.\n  - ``aten::max_pool3d_with_indices`` - Supported when indices outputs are not used by a downstream operation. 
This allows the operation to be compiled to Neuron when it is equivalent to an ``aten::max_pool3d``.\n  - ``aten::where`` - Supported when used as a conditional selection (3-argument variant). Unsupported when used to generate a dynamic list of indices (1-argument variant). See :func:`torch.where`.\n\n\n.. _pytorch-neuron-release-203180:\n\nPyTorch Neuron Release [2.0.318.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n-  ``aten::empty_like``\n-  ``aten::log``\n-  ``aten::type_as``\n-  ``aten::movedim``\n-  ``aten::einsum``\n-  ``aten::argmax``\n-  ``aten::min``\n-  ``aten::argmin``\n-  ``aten::abs``\n-  ``aten::cos``\n-  ``aten::sin``\n-  ``aten::linear``\n-  ``aten::pixel_shuffle``\n-  ``aten::group_norm``\n-  ``aten::_weight_norm``\n\n\n.. _pytorch-neuron-release-15210:\n\nPyTorch Neuron Release [1.5.21.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. _pytorch-neuron-release-1570:\n\nPyTorch Neuron Release [1.5.7.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::erf``\n- ``prim::DictConstruct``\n\n\n.. _pytorch-neuron-release-1410:\n\nPyTorch Neuron Release [1.4.1.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. _pytorch-neuron-release-1350:\n\nPyTorch Neuron Release [1.3.5.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::numel``\n- ``aten::ones_like``\n- ``aten::reciprocal``\n- ``aten::topk``\n\n\n.. _pytorch-neuron-release-12160:\n\nPyTorch Neuron Release [1.2.16.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. _pytorch-neuron-release-12150:\n\nPyTorch Neuron Release [1.2.15.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. _pytorch-neuron-release-1230:\n\nPyTorch Neuron Release [1.2.3.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::silu``\n- ``aten::zeros_like``\n\n\n.. _pytorch-neuron-release-1170:\n\nPyTorch Neuron Release [1.1.7.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::_shape_as_tensor``\n- ``aten::chunk``\n- ``aten::empty``\n- ``aten::masked_fill``\n\n\n.. _pytorch-neuron-release-10240450:\n\nPyTorch Neuron Release [1.0.24045.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::__and__``\n- ``aten::bmm``\n- ``aten::clone``\n- ``aten::expand_as``\n- ``aten::fill_``\n- ``aten::floor_divide``\n- ``aten::full``\n- ``aten::hardtanh``\n- ``aten::hardtanh_``\n- ``aten::le``\n- ``aten::leaky_relu``\n- ``aten::lt``\n- ``aten::mean``\n- ``aten::ne``\n- ``aten::softplus``\n- ``aten::unbind``\n- ``aten::upsample_bilinear2d``\n\n\n.. _pytorch-neuron-release-10172000:\n\nPyTorch Neuron Release [1.0.1720.00]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::constant_pad_nd``\n- ``aten::meshgrid``\n\n\n.. _pytorch-neuron-release-1015320:\n\nPyTorch Neuron Release [1.0.1532.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::ones``\n\n\n.. _pytorch-neuron-release-1015220:\n\nPyTorch Neuron Release [1.0.1522.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. 
_pytorch-neuron-release-1013860:\n\nPyTorch Neuron Release [1.0.1386.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::ceil``\n- ``aten::clamp``\n- ``aten::eq``\n- ``aten::exp``\n- ``aten::expand_as``\n- ``aten::flip``\n- ``aten::full_like``\n- ``aten::ge``\n- ``aten::gt``\n- ``aten::log2``\n- ``aten::log_softmax``\n- ``aten::max``\n- ``aten::neg``\n- ``aten::relu``\n- ``aten::rsqrt``\n- ``aten::scalarImplicit``\n- ``aten::sqrt``\n- ``aten::squeeze``\n- ``aten::stack``\n- ``aten::sub``\n- ``aten::sum``\n- ``aten::true_divide``\n- ``aten::upsample_nearest2d``\n- ``prim::Constant``\n- ``prim::GetAttr``\n- ``prim::ImplicitTensorToNum``\n- ``prim::ListConstruct``\n- ``prim::ListUnpack``\n- ``prim::NumToTensor``\n- ``prim::TupleConstruct``\n- ``prim::TupleUnpack``\n\nPlease note, primitives are included in this list from this release.\n\n\n.. _pytorch-neuron-release-1011680:\n\nPyTorch Neuron Release [1.0.1168.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::ScalarImplicit``\n\n\n.. _pytorch-neuron-release-1010010:\n\nPyTorch Neuron Release [1.0.1001.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::detach``\n- ``aten::floor``\n- ``aten::gelu``\n- ``aten::pow``\n- ``aten::sigmoid``\n- ``aten::split``\n\nRemove support for operators:\n\n- ``aten::embedding``: Does not meet **performance** criteria\n- ``aten::erf``: Error function does not meet **accuracy** criteria\n- ``aten::tf_dtype_from_torch``: Internal support function, not an operator\n\n\n.. _pytorch-neuron-release-108250:\n\nPyTorch Neuron Release [1.0.825.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. _pytorch-neuron-release-107630:\n\nPyTorch Neuron Release [1.0.763.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::Int``\n- ``aten::arange``\n- ``aten::contiguous``\n- ``aten::div``\n- ``aten::embedding``\n- ``aten::erf``\n- ``aten::expand``\n- ``aten::eye``\n- ``aten::index_select``\n- ``aten::layer_norm``\n- ``aten::matmul``\n- ``aten::mm``\n- ``aten::permute``\n- ``aten::reshape``\n- ``aten::rsub``\n- ``aten::select``\n- ``aten::size``\n- ``aten::slice``\n- ``aten::softmax``\n- ``aten::tf_dtype_from_torch``\n- ``aten::to``\n- ``aten::transpose``\n- ``aten::unsqueeze``\n- ``aten::view``\n- ``aten::zeros``\n\nRemove support for operators:\n\n- ``aten::tf_broadcastable_slice``: Internal support function, not an operator\n- ``aten::tf_padding``: Internal support function, not an operator\n\nThese operators were already supported previously:\n\n- ``aten::_convolution``\n- ``aten::adaptive_avg_pool2d``\n- ``aten::add``\n- ``aten::add_``\n- ``aten::addmm``\n- ``aten::avg_pool2d``\n- ``aten::batch_norm``\n- ``aten::cat``\n- ``aten::dimension_value``\n- ``aten::dropout``\n- ``aten::flatten``\n- ``aten::max_pool2d``\n- ``aten::mul``\n- ``aten::relu_``\n- ``aten::t``\n- ``aten::tanh``\n- ``aten::values``\n- ``prim::Constant``\n- ``prim::GetAttr``\n- ``prim::ListConstruct``\n- ``prim::ListUnpack``\n- ``prim::TupleConstruct``\n- ``prim::TupleUnpack``\n\n\n.. _pytorch-neuron-release-106720:\n\nPyTorch Neuron Release [1.0.672.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo change\n\n\n.. 
_pytorch-neuron-release-105520:\n\nPyTorch Neuron Release [1.0.552.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded support for new operators:\n\n- ``aten::_convolution``\n- ``aten::adaptive_avg_pool2d``\n- ``aten::add``\n- ``aten::add_``\n- ``aten::addmm``\n- ``aten::avg_pool2d``\n- ``aten::batch_norm``\n- ``aten::cat``\n- ``aten::dimension_value``\n- ``aten::dropout``\n- ``aten::flatten``\n- ``aten::max_pool2d``\n- ``aten::mul``\n- ``aten::relu_``\n- ``aten::t``\n- ``aten::tanh``\n- ``aten::tf_broadcastable_slice``\n- ``aten::tf_padding``\n- ``aten::values``\n- ``prim::Constant``\n- ``prim::GetAttr``\n- ``prim::ListConstruct``\n- ``prim::ListUnpack``\n- ``prim::TupleConstruct``\n- ``prim::TupleUnpack``\n"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-tensorflow.rst",
    "content": ".. _neuron-cc-ops-tensorflow:\n\nTensorFlow Neuron (``tensorflow-neuron (TF1.x)``) Supported operators\n=====================================================================\n\nTo see a list of supported operators for TensorFlow 1.x, run the following command:\n\n``neuron-cc list-operators --framework TENSORFLOW``\n\n.. _neuron-compiler-release-1910:\n\nNeuron Compiler Release [1.9.1.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nDate: 01/20/2022\n\n\nAdded\n\n::\n\n isNan\n FusedBatchNormV3\n\n.. _neuron-compiler-release-1730:\n\nNeuron Compiler Release [1.7.3.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n ArgMax\n ArgMin\n\n\n\n.. _neuron-compiler-release-16130:\n\nNeuron Compiler Release [1.6.13.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1550:\n\nNeuron Compiler Release [1.5.5.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1400:\n\nNeuron Compiler Release [1.4.0.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1300:\n\nNeuron Compiler Release [1.3.0.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n Abs\n Cos\n DepthwiseConv2dNative\n Erf\n Rank\n Sin\n Size\n\n\n.. _neuron-compiler-release-1270:\n\nNeuron Compiler Release [1.2.7.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1220:\n\nNeuron Compiler Release [1.2.2.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n AdjustContrastv2\n AdjustSaturation\n BroadcastTo\n Cholesky\n Conv2DBackpropInput\n Conv3D\n CropAndResize\n FloorDiv\n HSVToRGB\n InvertPermutation\n L2Loss\n Log1p\n MatrixBandPart\n MatrixDiag\n MatrixSetDiag\n MatrixTriangularSolve\n MaxPool3D\n MirrorPad\n RGBToHSV\n Range\n SoftmaxCrossEntropyWithLogits\n SquaredDifference\n StopGradient\n Unpack\n UnsortedSegmentSum\n\n\n\n.. _neuron-compiler-release-10240450:\n\nNeuron Compiler Release [1.0.24045.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded ``FloorDiv``, ``Softplus``, ``Unstack``\n\n\n.. _neuron-compiler-release-1018001:\n\nNeuron Compiler Release [1.0.18001]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-1016764:\n\nNeuron Compiler Release [1.0.16764]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded:\n\n::\n\n   LogSoftmax\n   Neg\n   ResizeBilinear\n   ResizeNearestNeighbor\n\n.. _neuron-compiler-release-1015275:\n\nNeuron Compiler Release [1.0.15275]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAdded\n\n::\n\n   Neg \n\nRemoved\n\n::\n\n   Log\n\n(was inadvertently advertised as supported)\n\n.. _neuron-compiler-release-1012696:\n\nNeuron Compiler Release [1.0.12696]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-109410:\n\nNeuron Compiler Release [1.0.9410]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-107878:\n\nNeuron Compiler Release [1.0.7878]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-106801:\n\nNeuron Compiler Release [1.0.6801]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-105939:\n\nNeuron Compiler Release [1.0.5939]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. _neuron-compiler-release-105301:\n\nNeuron Compiler Release [1.0.5301]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nNo changes\n\n.. 
_neuron-compiler-release-1046800:\n\nNeuron Compiler Release [1.0.4680.0]\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   Add\n   AddV2\n   All\n   AvgPool\n   BatchMatMul\n   BatchMatMulV2\n   BatchToSpaceND\n   BiasAdd\n   Cast\n   Ceil\n   Concat\n   ConcatV2\n   Const\n   Conv2D\n   Equal\n   Exp\n   ExpandDims\n   Fill\n   Floor\n   FusedBatchNorm\n   Greater\n   GreaterEqual\n   Identity\n   LRN\n   LeakyRelu\n   Less\n   LessEqual\n   Log\n   LogicalAnd\n   LogicalNot\n   LogicalOr\n   MatMul\n   Max\n   MaxPool\n   Maximum\n   Mean\n   Min\n   Minimum\n   Mul\n   NoOp\n   NotEqual\n   Pack\n   Pad\n   PadV2\n   Placeholder\n   Pow\n   Prod\n   RandomUniform\n   RealDiv\n   Reciprocal\n   Relu\n   Relu6\n   Reshape\n   ReverseV2\n   Round\n   Rsqrt\n   Select\n   Shape\n   Sigmoid\n   Sign\n   Slice\n   Softmax\n   SpaceToBatchND\n   Split\n   SplitV\n   Sqrt\n   Square\n   Squeeze\n   StridedSlice\n   Sub\n   Sum\n   Tanh\n   Tile\n   Transpose\n   ZerosLike\n"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-xla.rst",
    "content": ".. _neuron-cc-ops-xla:\n\nTensorFlow Neuron (``tensorflow-neuron (TF1.x)``) Supported operators [XLA]\n=====================================================================\n\nTo see a list of supported operators for XLA, run the following command:\n\n``neuron-cc list-operators --framework XLA``\n\n+-------------------------+-------------------------------------------+\n| Supported XLA Operators | Notes                                     |\n+=========================+===========================================+\n| Abs                     |                                           |\n+-------------------------+-------------------------------------------+\n| Add                     |                                           |\n+-------------------------+-------------------------------------------+\n| Allgather               |                                           |\n+-------------------------+-------------------------------------------+\n| Allreduce               |                                           |\n+-------------------------+-------------------------------------------+\n| Atan2                   |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnorm               |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnormgrad           |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnorminference      |                                           |\n+-------------------------+-------------------------------------------+\n| Broadcast               |                                           |\n+-------------------------+-------------------------------------------+\n| BroadcastInDim          |                                           |\n+-------------------------+-------------------------------------------+\n| Ceil                    |                                           |\n+-------------------------+-------------------------------------------+\n| Clamp                   |                                           |\n+-------------------------+-------------------------------------------+\n| Compare                 |                                           |\n+-------------------------+-------------------------------------------+\n| Concatenate             |                                           |\n+-------------------------+-------------------------------------------+\n| Constant                |                                           |\n+-------------------------+-------------------------------------------+\n| ConstantLiteral         |                                           |\n+-------------------------+-------------------------------------------+\n| ConvertElementType      |                                           |\n+-------------------------+-------------------------------------------+\n| Cos                     |                                           |\n+-------------------------+-------------------------------------------+\n| Customcall              |                                           |\n+-------------------------+-------------------------------------------+\n| Div                     |                                           |\n+-------------------------+-------------------------------------------+\n| Dot                     |                                           
|\n+-------------------------+-------------------------------------------+\n| DotGeneral              |                                           |\n+-------------------------+-------------------------------------------+\n| DynamicUpdateSlice      | Supports only for constant index          |\n+-------------------------+-------------------------------------------+\n| Eq                      |                                           |\n+-------------------------+-------------------------------------------+\n| Exp                     |                                           |\n+-------------------------+-------------------------------------------+\n| Floor                   |                                           |\n+-------------------------+-------------------------------------------+\n| Gather                  | Supports only disjoint start_index_map    |\n|                         | and remapped_offset_dims                  |\n+-------------------------+-------------------------------------------+\n| Ge                      |                                           |\n+-------------------------+-------------------------------------------+\n| GetTupleElement         |                                           |\n+-------------------------+-------------------------------------------+\n| Gt                      |                                           |\n+-------------------------+-------------------------------------------+\n| Iota                    |                                           |\n+-------------------------+-------------------------------------------+\n| Le                      |                                           |\n+-------------------------+-------------------------------------------+\n| Log                     |                                           |\n+-------------------------+-------------------------------------------+\n| LogicalAnd              |                                           |\n+-------------------------+-------------------------------------------+\n| LogicalNot              |                                           |\n+-------------------------+-------------------------------------------+\n| Lt                      |                                           |\n+-------------------------+-------------------------------------------+\n| Max                     |                                           |\n+-------------------------+-------------------------------------------+\n| Min                     |                                           |\n+-------------------------+-------------------------------------------+\n| Mul                     |                                           |\n+-------------------------+-------------------------------------------+\n| Ne                      |                                           |\n+-------------------------+-------------------------------------------+\n| Neg                     |                                           |\n+-------------------------+-------------------------------------------+\n| Pad                     |                                           |\n+-------------------------+-------------------------------------------+\n| Pow                     | Exponent argument must be a compile-time  |\n|                         | integer constant                          |\n+-------------------------+-------------------------------------------+\n| Reduce                  | Min, Max, Add and Mul are the only        |\n|                         | supported 
computations. Init_values must  |\n|                         | be constant                               |\n+-------------------------+-------------------------------------------+\n| Reshape                 |                                           |\n+-------------------------+-------------------------------------------+\n| RngBitGenerator         | Ignores user seed                         |\n+-------------------------+-------------------------------------------+\n| RngUniform              |                                           |\n+-------------------------+-------------------------------------------+\n| Rsqrt                   |                                           |\n+-------------------------+-------------------------------------------+\n| Scatter                 |                                           |\n+-------------------------+-------------------------------------------+\n| Select                  |                                           |\n+-------------------------+-------------------------------------------+\n| ShiftRightLogical       |                                           |\n+-------------------------+-------------------------------------------+\n| Sign                    |                                           |\n+-------------------------+-------------------------------------------+\n| Sin                     |                                           |\n+-------------------------+-------------------------------------------+\n| Slice                   |                                           |\n+-------------------------+-------------------------------------------+\n| Sqrt                    |                                           |\n+-------------------------+-------------------------------------------+\n| Sub                     |                                           |\n+-------------------------+-------------------------------------------+\n| Tanh                    |                                           |\n+-------------------------+-------------------------------------------+\n| Transpose               |                                           |\n+-------------------------+-------------------------------------------+\n| Tuple                   |                                           |\n+-------------------------+-------------------------------------------+\n\n"
  },
  {
    "path": "release-notes/archive/neuron-cc/neuron-cc.rst",
    "content": ".. _neuron-cc-rn:\n\nNeuron Compiler (``neuron-cc``) for Inf1 Release Notes\n======================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nIntroduction\n^^^^^^^^^^^^\n\nThis document lists the release notes for AWS Neuron compiler. The\nNeuron Compiler is an ahead-of-time compiler that ensures Neuron will\noptimally utilize the Inferentia chips.\n\nOperator-support for each input format is provided directly from the\ncompiler.\n\n::\n\n   neuron-cc list-operators --framework {TENSORFLOW | MXNET | XLA}\n\nThe supported operators are also listed here:\n\nTensorflow: :ref:`neuron-cc-ops-tensorflow`\n\nPytorch: :ref:`neuron-cc-ops-pytorch`\n\nXLA: :ref:`neuron-cc-ops-xla`\n\nApache MXNet: :ref:`neuron-cc-ops-mxnet`\n\nKnown issues and limitations - updated 11/23/2022\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* There is a known issue of increased latency and lower throughput when MLM head is compiled along with BERT model. The workaround is to compile them separately and feed the raw Bert into the head.\n* *TensorFlow 2.x* - In this release supported operators are limited to BERT-like models, specifically no conv2d  or reduce-window operators are available.\n* *Control flow* Neuron only supports control flow operators which are static at compile time. For example static length RNN, top-k, sort.\n* *Data layout* The Neuron compiler supports multiple data layout format (NCHW, NHWC, …). Non-CNHW input/output data-layouts will require Neuron to insert additional transpose operations, causing a degradation in performance.\n* *Primary inputs in NeuronCore Pipeline mode* When a neural network is executed in NeuronCore Pipeline mode, only the first operator in a neural network can receive primary inputs from the host.\n* *Reduce data type* INT8 data type is not currently supported by the Neuron compiler.\n* *NeuronCore Pipeline:* NeuronCorePipeline mode provides low-latency and high-throughput for small batch sizes. We recommend to start testing with batch=1 and gradually increase batch size to fine tune your model throughput and latency performance.\n* *Large input tensors* support varies by model. On some models the large input tensors (eg 1024x1024) may result in lower performance or exceeding hardware or compile-time limits, especially on models where the large input tensor is used by many downstream operators. Workarounds may include use of smaller batch, see\n  :ref:`neuron-batching`\n* *Conv2d operator* is mapped to Inferentia except for specific cases of extremely large tensors and specific parameters.\n* *Conv3d operator* performance is limited when the operator has small number of input channels (< 64).\n* FP64 and INT64 input and output tensors are not supported. Please cast to FP32/INT32 in the machine learning framework, prior compiling for Neuron.\n\nNeuron Compiler release [1.21.0.0]]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 12/21/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.20.3.0]]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 10/26/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.19.0.0]]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 09/15/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.17.0.0]]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 7/19/2023\n\nNew in this release\n-------------------\n\n* This release introduces a new ``--enable-saturate-infinity`` compiler option. 
A computation that can generate +/- infinity is at a high risk of generating Not-a-Number (NaN) values when the infinity value is used in subsequent computations. This option helps avoid this by converting +Inf/-Inf values to MAX/MIN_FLOAT before operations that could produce NaN values for +Inf/-Inf inputs on the target architecture. While this option helps to avoid NaN values, there is a potential performance degradation that occurs during model execution when this conversion is enabled.\n* Minor bug fixes.\n\nNeuron Compiler release [1.16.2.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 6/14/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.15.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 05/01/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.14.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/19/2023\n\n* Minor bug fixes.\n\nNeuron Compiler release [1.13.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 11/23/2022\n\n* Resolved long compile-times when compiling the YOLOv5 and YOLOv6 models. [GitHub · aws-neuron-sdk · #434]\n* Improved the layout algorithm to resolve an issue compiling a transformer-based text recognition model. [GitHub · aws-neuron-sdk · #410]\n* Support was added for additional XLA operators\n\nNeuron Compiler release [1.11.7.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 08/02/2022\n\n* Fixed a bug for correct handling of mxnet dropout instruction when mode is set as 'training' while performing inference.\n\n\nNeuron Compiler release [1.11.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/29/2022\n\n* Solved an issue that caused a \"false positive\" reporting of a data race that may occur due to address overlap.\n* Minor bug fixes.\n\n\nNeuron Compiler release [1.10.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\n* Minor bug fixes.\n\n\nNeuron Compiler release [1.9.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/20/2022\n\n* Fixed an issue with frontend compiler for fused operators that was reported in `github #362 <https://github.com/aws/aws-neuron-sdk/issues/362>`_.\n\nNeuron Compiler release [1.8.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/05/2022\n\n\nNew in this release\n-------------------\n\n* Minor bug fixes.\n\n\nNeuron Compiler release [1.8.2.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/15/2021\n\n\nNew in this release\n-------------------\n\n* Performance enhancements as a result of improved layout and DMA optimizations.\n* Minor bug fixes.\n\n\nNeuron Compiler release [1.7.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/27/2021\n\n\nNew in this release\n-------------------\n\n* The compiler’s list-operators command can now display the supported TensorFlow 2.x operators.\n* Support added for new operators in TensorFlow 1.x -  ArgMax and ArgMin.\n* Introducing the ``–-fast-math`` option for better fine-tuning of accuracy/performance. See :ref:`neuron-cc-training-mixed-precision`\n\n\n[1.6.13.0]\n^^^^^^^^^^\n\nDate 08/12/2021\n\nNew in this release\n-------------------\n\n* TensorFlow 2.x  - First support of TensorFlow 2.x. 
The support is limited to operators in BERT-like models and was tested with Hugging Face BERT small, base, large, and DistilBERT.\n\nResolved issues\n---------------\n\n* Fixed compiler backend issue in Tensor_tensor argument distance, `github #269 <https://github.com/aws/aws-neuron-sdk/issues/269>`_\n\n\n[1.5.5.0]\n^^^^^^^^^\n\nDate 07/02/2021\n\nSummary\n-------\n\n- Robustness and performance improvements.\n\nNew in this release\n-------------------\n\n* Added ``--enable-fast-context-switch`` option to optimize for faster model switching rather than inference latency.\n* Deprecated support for ONNX\n* Improved robustness of Conv3d\n* Corrected compilation error \"too many instructions\" in DLRM model\n\n\n\n[1.4.0.0]\n^^^^^^^^^\n\nDate 5/28/2021\n\nSummary\n-------\n\n- Performance improvements, and usability improvements.\n\nNew in this release\n-------------------\n\n* Added an uncompressed NEFF format for faster loading of models prior to inference. Enable it with ``--enable-fast-loading-neuron-binaries``. Some large models may be detrimentally impacted because they will not be compressed, but many cases will benefit.\n* Corrected compilation error in specific arguments of ResizeBilinear operator\n\n[1.3.0.0]\n^^^^^^^^^\n\nDate 4/30/2021\n\nSummary\n-------\n\n- Performance improvements, new operators, and usability improvements.\n\nNew in this release\n-------------------\n\n- Improved performance of batched CNN models like resnet50 with the default compiler options by 10%.\n\n- Improved performance of BERT base sequence 128 batch 6 by up to 16%.\n\n- Added support for group and depthwise convolution (with limited performance when the number of input channels is small).\n\n- Added more detailed debug names to support TensorBoard.\n\n\nResolved Issues\n---------------\n\n- Corrected potential race condition in overwriting tiles of output tensors.\n\n- Fixed various issues in pipelined inference by enabling fine-grained partitioning by default.\n\n\n\n\n[1.2.7.0]\n^^^^^^^^^\n\nDate 2/24/2021\n\nSummary\n-------\n\nFix for CVE-2021-3177.\n\n[1.2.2.0]\n^^^^^^^^^\n\nDate 1/30/2021\n\nSummary\n-------\n\nAdded support for multiple new operators (see the operator lists) for TensorFlow and MXNet. Improved inference performance of language and object recognition models on single as well as multiple pipelined cores using NeuronCore Pipeline.\n\nNew in this release\n-------------------\n\n- The following models are now supported: Resnext 224x224, specific BERT variations applied to natural language processing and translation.\n\n- A number of new operators are now supported on Inferentia; see the full lists :ref:`neuron-cc-ops-tensorflow`\n and :ref:`neuron-cc-ops-mxnet`\n\n- Improved inference performance on YOLOv4, BERT base sequence 64 (on 16 pipelined cores), and openpose 184.\n\nResolved Issues\n---------------\n\n- Corrected a random failure to compile Resnet50 batch 5\n\n- Corrected numerical inaccuracy in RSQRT and related operators for tensors with very large values (> 1e20)\n\n\n\n[1.1.7.0]\n^^^^^^^^^\n\nDate 12/23/2020\n\nSummary\n-------\n\nAdded support for PyTorch Yolo V4, a new framework-visible progress bar, and improved inference performance. We continue to streamline the compiler usability by removing the need for options passed to control behavior. We are aiming to remove the need for such options entirely. Some tutorials have been updated to reflect this, but Resnet50 remains in need of these options to achieve maximum performance. 
Other usability improvements have been added, such as the compiler progress bar. As always, please let us know if there are other areas that we can improve.\n\n\nNew in this release\n-------------------\n- PyTorch Yolo V4 is now supported.\n\n- Added a compiler progress bar when compilation is invoked from the framework. This allows the user to see that progress continues as compilation proceeds, which is useful when compilation takes several minutes. A dot is printed every 20 seconds.\n\n- Improved inference performance of TensorFlow BERT base seq 256 batch 3 by 10%.\n\nResolved Issues\n---------------\n- Resolved an issue with depthwise convolution that manifested as a type-check error\n\n\n.. _10240450:\n\n[1.0.24045.0]\n^^^^^^^^^^^^^\n\nDate 11/17/2020\n\nSummary\n-------\n\nImproved performance for pipelined execution (NeuronCore Pipeline).\n\nNew in this release\n-------------------\n\n-  NeuronCore Pipeline: improved partitioning to enable better static\n   weights loading to cache.\n\nResolved Issues\n---------------\n\n-  --static-weights: No longer needed. As this is shown in some\n   examples, please remove the option since the compiler now performs\n   this auto-detection by default.\n\n-  --num-neuroncores renamed to --neuroncore-pipeline-cores. The prior\n   option form is still functional (backwards compatible) and will be\n   removed in future releases.\n\n-  --batching_en: Resolved compilation failure of ResNet50 FP32 batch 1\n   on Ubuntu16 when \"--batching_en\" was used.\n\n\n.. _neuron-cc-10206000:\n\n[1.0.20600.0]\n^^^^^^^^^^^^^\n\nDate 9/22/2020\n\nSummary\n-------\n\nVarious performance improvements - both compilation time and inference\nspeed of object recognition models.\n\n-  Compiler optimization '-O2' option is now enabled by default.\n\n.. _cc-major-new-features-0:\n\nNew in this release\n-------------------\n\n-  Improved inference performance of YOLO v3, YOLO v4, VGG16, SSD300.\n   BERT models were improved by an additional 10%.\n\n-  Modified such that -O2 is now the default behavior and does not need\n   to be specified. Note: some tutorials still explicitly specify \"-O2\".\n   These will be modified in forthcoming updates.\n\n.. _cc-resolved-issues-0:\n\nResolved Issues\n---------------\n\n-  Sped up compilation of large models that were taking hours to under 40\n   minutes.\n\n\n.. _neuron-cc-10180010:\n\n[1.0.18001.0]\n^^^^^^^^^^^^^\n\nDate 8/08/2020\n\n.. _cc-summary-1:\n\nSummary\n-------\n\nVarious performance improvements.\n\n.. _cc-major-new-features-1:\n\nNew in this release\n-------------------\n\nImproved performance of BERT base with -O2\n\n.. _cc-resolved-issues-1:\n\nResolved Issues\n---------------\n\n-  n/a\n\n.. _neuron-cc-10179370:\n\n[1.0.17937.0]\n^^^^^^^^^^^^^\n\nDate 8/05/2020\n\n.. _cc-summary-2:\n\nSummary\n-------\n\nVarious improvements.\n\n.. _neuron-cc-10168610:\n\n[1.0.16861.0]\n^^^^^^^^^^^^^\n\nDate 7/16/2020\n\n.. _cc-summary-3:\n\nSummary\n-------\n\nThis release has some bug fixes and some functional and performance\nimprovements to support compilation of several neural networks.\n\n.. _cc-major-new-features-2:\n\nNew in this release\n-------------------\n\nThis release\n\n-  Supports compilation of PoseNet, tested for images of specific\n   resolutions up to 736.\n-  Updates the -O2 option with a new memory allocator to reduce spilling to DRAM\n-  Improves performance of the '-O2' option on BERT base and the openpose\n   pose network (a short usage sketch follows this list).\n\n.. 
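_neuron-cc-o2-usage-sketch:\n\nThe '-O2' option mentioned in the list above is a regular neuron-cc command-line flag, so it can be forwarded from framework entry points that accept compiler arguments. The snippet below is a minimal, hypothetical sketch rather than an official example; it assumes a ``torch-neuron`` installation, and the model and example input are placeholders.\n\n.. code:: python\n\n   import torch\n   import torch.neuron\n\n   # Placeholder model and example input; substitute your own.\n   model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()\n   example = torch.rand(1, 8)\n\n   # Forward '-O2' (or other neuron-cc flags) to the compiler.\n   neuron_model = torch.neuron.trace(model, example, compiler_args=['-O2'])\n\n.. 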
_cc-resolved-issues-2:\n\nResolved Issues\n---------------\n\n-  Resolved compilation error in Vgg16 batch 1\n\nOther Notes\n-----------\n\n-  Some versions of Inception network may fail to compile in Tensorflow\n   on Ubuntu 16 in conda environment. The symptom is neuron-cc backend\n   data race error. As a workaround use Ubuntu 18, Amazon Linux 2, or\n   virtual env, or use neuron-cc with flag -O2.\n\n.. warning::\n\n   :ref:`Starting with Neuron 1.14.0, Ubuntu 16 is no longer supported <eol-ubuntu16>`\n\n.. _neuron-cc-10152750:\n\n[1.0.15275.0]\n^^^^^^^^^^^^^\n\nDate 6/11/2020\n\n.. _cc-summary-4:\n\nSummary\n-------\n\nThis release has some bug fixes and some functional and performance\nimprovements to support compilation of several neural networks.\n\n.. _cc-major-new-features-3:\n\nNew in this release\n-------------------\n\nThis release\n\n-  Supports compilation of PoseNet for images of specific resolutions\n   upto 400x400.\n-  Improves performance of resnet152.\n-  Supports a new command line option '-O2' that can help with handling\n   of large tensor inputs for certain models.\n-  increase NEFF versions to 1.0. This means new NEFFs compiled from\n   this release forward are not compatible with older versions of Neuron\n   Runtime prior to May, 2020 (1.0.6905.0) release. Please update the\n   Neuron Runtime when using NEFF version 1.0.\n\n.. _cc-resolved-issues-3:\n\nResolved Issues\n---------------\n\n-  Compilation issues on prosotron encoder, decoder neural networks.\n\n.. _cc-other-notes-1:\n\nOther Notes\n-----------\n\nDependencies\n------------\n\n-  This version creates NEFF 1.0 thus may require update of neuron-rtd\n   if older than May 2020 release.\n\ndmlc_nnvm==1.0.2574.0 dmlc_topi==1.0.2574.0 dmlc_tvm==1.0.2574.0\ninferentia_hwm==1.0.1362.0 islpy==2018.2\n\n.. _neuron-cc-10126960:\n\n[1.0.12696.0]\n^^^^^^^^^^^^^\n\nDate 5/11/2020\n\n.. _cc-summary-5:\n\nSummary\n-------\n\nBug fixes and some functional and performance improvements to several\nneural networks.\n\n.. _cc-major-new-features-4:\n\nNew in this release\n-------------------\n\n-  This version supports compilation of unmodified Tensorflow BERT with\n   batch size 1, 4, 6 for input sequence 128.\n-  Improved Tensorflow BERT batch 4 sequence 128 performance to 45% of\n   the accelerator peak (from 34%).\n-  Support for MXNET BERT base batch 8 compilation\n-  Support for TF Resnet152 batch 2 compilation\n-  Most compiler messages are migrated from cout to logging mechanisms\n   with verbosity control\n\n.. _cc-resolved-issues-4:\n\nResolved Issues\n---------------\n\n-  Fixed failure to compile unmodified Tensorflow BERT model for small\n   batches\n\n-  Fixed run-to-run-variability in OneHot operator implementation\n\n-  Robustness improvements for ParallelWavenet and transformer decoder\n   networks\n\n.. _cc-other-notes-2:\n\nOther Notes\n-----------\n\n.. _dependencies-1:\n\nDependencies\n------------\n\n::\n\n   dmlc_nnvm==1.0.2356.0\n   dmlc_topi==1.0.2356.0\n   dmlc_tvm==1.0.2356.0\n   inferentia_hwm==1.0.1294.0\n   islpy==2018.2\n\n.. _neuron-cc-1094100:\n\n[1.0.9410.0]\n^^^^^^^^^^^^\n\nDate 3/26/2020\n\n.. _cc-summary-6:\n\nSummary\n-------\n\nBug fixes and some functional and performance improvements to several\nneural networks.\n\n.. _cc-major-new-features-5:\n\nNew in this release\n-------------------\n\n-  Support compilation of modified SSD-300\n   (:ref:`tensorflow-ssd300`)\n-  Improved inference performance in natural language processing\n   networks (such as prosotron encoder) by 45%\n\n.. 
_cc-resolved-issues-5:\n\nResolved Issues\n---------------\n\n-  Eliminated redundant fp32 to bfloat16 cast on input and output\n   tensors\n\nKnown issues and limitations\n----------------------------\n\n-  See previous releases.\n\n.. _cc-other-notes-3:\n\nOther Notes\n-----------\n\n-  Added support for faster iteration on recurrent networks (aka\n   auto-loop)\n\n.. _dependencies-2:\n\nDependencies\n------------\n\n::\n\n   dmlc_nnvm==1.0.2049.0\n   dmlc_topi==1.0.2049.0\n   pip install --upgrade dmlc_tvm==1.0.2049.0\n   inferentia_hwm==1.0.897.0\n   islpy==2018.2\n\n.. _neuron-cc-1078780:\n\n[1.0.7878.0]\n^^^^^^^^^^^^\n\nDate 2/27/2020\n\n.. _cc-summary-7:\n\nSummary\n-------\n\nBug fixes and minor performance improvements.\n\n.. _cc-major-new-features-6:\n\nNew in this release\n-------------------\n\nNone\n\n.. _cc-resolved-issues-6:\n\nResolved Issues\n---------------\n\n-  Corrected image resize operator functionallity\n-  Compiler internal enhancements made that will benefit models such as\n   BERT\n\n.. _cc-known-issues-and-limitations-1:\n\nKnown issues and limitations\n----------------------------\n\n-  See previous releases.\n\n.. _cc-other-notes-4:\n\nOther Notes\n-----------\n\n.. _dependencies-3:\n\nDependencies\n------------\n\n::\n\n   dmlc_nnvm-1.0.1826.0\n   dmlc_topi-1.0.1826.0\n   dmlc_tvm-1.0.1826.0\n   inferentia_hwm-1.0.897.0\n   islpy-2018.2\n\n.. _neuron-cc-1068010:\n\n[1.0.6801.0]\n^^^^^^^^^^^^\n\nDate 1/27/2020\n\n.. _cc-summary-8:\n\nSummary\n-------\n\nBug fixes and some performance enhancement related to data movement for\nBERT-type neural networks.\n\n.. _cc-major-new-features-7:\n\nNew in this release\n-------------------\n\nNone\n\n.. _cc-resolved-issues-7:\n\nResolved Issues\n---------------\n\n-  Improved throughput for operators processed in the Neuron Runtime\n   CPU. As an example: execution of 4 single NeuronCore NEFF models of\n   ResNet50 v2 float16 batch = 5 in parallel on an inf1.1xlarge sped up\n   by 30%.\n-  Corrected shape handling in Gather(TensorFlow)/Take(MXNet) operators\n   that are processed by the Neuron Runtime in the Neuron Runtime vCPU,\n   which resolves a possible crash in Neuron Compiler when compiling\n   models with these operators with some shapes.\n-  Added support for TensorFlow *OneHot* operator (as a Neuron Runtime\n   CPU operator).\n-  Added more internal checking for compiler correctness with newly\n   defined error messages for this case.\n\n::\n\n         “Internal ERROR: Data race between Op1 'Name1(...) [...]' and Op2 'Name2(...) [...]'”\n\n-  Fixed out-of-memory issue introduced in 1.0.5939.0 such that some\n   large models (BERT) compiled on instances with insufficient host\n   memory would cause the runtime to crash with an invalid NEFF. This is\n   actually a compiler error, but due to additional script layers\n   wrapping this in the :ref:`tensorflow-bert-demo`, this would\n   have likely been seen as a runtime error like this:\n\n.. code:: bash\n\n   2020-01-09 13:40:26.002594: E tensorflow/core/framework/op_segment.cc:54] Create kernel failed: Invalid argument: neff is invalid\n   2020-01-09 13:40:26.002637: E tensorflow/core/common_runtime/executor.cc:642] Executor failed to create kernel. Invalid argument: neff is invalid\n   [[{{node bert/NeuronOp}}]]\n\n.. _cc-known-issues-and-limitations-2:\n\nKnown issues and limitations\n----------------------------\n\nSee previous release notes. 
See previous release notes. Some tutorials show the use of specific compiler\noptions and flags; these are needed to help guide the compiler to achieve the\nbest performance in specific cases. Please do not use them in cases other than\nas shown in the specific tutorial, as the results may be undefined. These\noptions should be considered beta and will be removed over time.\n\n.. _cc-other-notes-5:\n\nOther Notes\n-----------\n\n.. _dependencies-4:\n\nDependencies\n------------\n\n::\n\n   dmlc_nnvm-1.0.1619.0\n   dmlc_topi-1.0.1619.0\n   dmlc_tvm-1.0.1619.0\n   inferentia_hwm-1.0.839.0\n   islpy-2018.2\n\n.. _1059390:\n\n[1.0.5939.0]\n^^^^^^^^^^^^\n\nDate 12/20/2019\n\n.. _cc-summary-9:\n\nSummary\n-------\n\nBug fixes and some performance enhancements for NeuronCore Pipeline.\n\n.. _cc-major-new-features-8:\n\nNew in this release\n-------------------\n\n.. _cc-resolved-issues-8:\n\nResolved Issues\n---------------\n\n-  Fixed pipeline execution on more than 10 NeuronCores\n-  Improved NeuronCores Pipeline execution by improving data exchange\n   efficiency between NeuronCores\n-  Added warning for unaligned memory access\n-  Fixed handling of cast on input FP32 tensor\n-  Improved handling of data layouts and transpose\n-  Improved dead-code elimination\n-  Improved efficiency of compute engine synchronization\n-  Improved efficiency of data transfers within the Neuron code\n\n.. _cc-known-issues-and-limitations-3:\n\nKnown issues and limitations\n----------------------------\n\nSee previous release notes. Some tutorials show the use of specific compiler\noptions and flags; these are needed to help guide the compiler to achieve the\nbest performance in specific cases. Please do not use them in cases other than\nas shown in the specific tutorial, as the results may be undefined. These\noptions should be considered beta and will be removed over time.\n\n.. _cc-other-notes-6:\n\nOther Notes\n-----------\n\n.. _dependencies-5:\n\nDependencies\n------------\n\n-  dmlc_nnvm-1.0.1416.0\n\n-  dmlc_topi-1.0.1416.0\n\n-  dmlc_tvm-1.0.1416.0\n\n-  inferentia_hwm-1.0.720.0\n\n-  islpy-2018.2\n\n.. _1053010:\n\n[1.0.5301.0]\n^^^^^^^^^^^^\n\nDate 12/1/2019\n\n.. _cc-summary-10:\n\nSummary\n-------\n\n.. _cc-major-new-features-9:\n\nNew in this release\n-------------------\n\n.. _cc-resolved-issues-9:\n\nResolved Issues\n---------------\n\n-  Added warning for unsupported operators and convolution sizes\n-  Added warning for unsupported layout / upsampling\n-  Added support for Relu6, AddV2, BatchMatmulV2 operators\n-  Added support for default MXNet outputs in --io-config\n-  Improved performance of batched inference for convolutional networks\n-  Fixed MatMult with column size 1\n-  Fixed bf16 constant loading\n-  Fixed Conv2D tile accumulation\n\n.. _cc-known-issues-and-limitations-4:\n\nKnown Issues and Limitations\n----------------------------\n\nSee previous release notes. Resolved issues are shown in Resolved\nIssues.\n\n.. _cc-other-notes-7:\n\nOther Notes\n-----------\n\nPlease install g++ on AMIs without g++ pre-installed (e.g. server AMIs):\n\n.. code:: bash\n\n   # Ubuntu\n   sudo apt-get install -y g++\n\n.. code:: bash\n\n   # Amazon Linux\n   sudo dnf install -y gcc-c++\n\nSupported Python versions:\n\n-  3.5, 3.6, 3.7\n\nSupported Linux distributions:\n\n-  Ubuntu 16, Ubuntu 18, Amazon Linux 2\n\n.. _dependencies-6:\n\nDependencies\n------------\n\n-  dmlc_nnvm-1.0.1328.0\n-  dmlc_topi-1.0.1328.0\n-  dmlc_tvm-1.0.1328.0\n-  inferentia_hwm-1.0.674.0\n-  islpy-2018.2\n\n
.. _1046800:\n\n[1.0.4680.0]\n^^^^^^^^^^^^\n\nDate: 11/25/2019\n\n.. _cc-major-new-features-10:\n\nNew in this release\n-------------------\n\nN/A, this is the first release.\n\n.. _cc-resolved-issues-10:\n\nResolved issues\n---------------\n\nN/A, this is the first release.\n\n.. _cc-known-issues-and-limitations-5:\n\nKnown issues and limitations\n----------------------------\n\n1. **Control flow** Inferentia has limited support for control flow.\n   In general, Neuron can only support control flow operators that are\n   static at compile time, e.g. static-length RNN, top-k, sort, ...\n2. **Size of neural network** The size of a neural network is influenced\n   by a) the type of neural network (CNN, LSTM, MLP), b) the number of layers,\n   c) the sizes of the input (dimension of the tensors, batch size, ...). The\n   current Neuron compiler release has a limitation on the size\n   of neural network it can effectively optimize. As a result, we\n   limit CNN models (e.g. ResNet) to have an input size of up to 480x480\n   FP16, batch size of 4; LSTM models (e.g. GNMT) are limited to a time\n   step limit of up to 900; MLP models (like BERT) are limited to a\n   sequence length of up to 128, batch size of 8.\n3. **Data layout** The Neuron compiler supports multiple data layout\n   formats (NCHW, NHWC, ...). Non-NCHW input/output data-layouts will\n   require Neuron to insert additional *transpose* operations, causing a\n   degradation in performance.\n4. **Object detection models** Computer-vision object detection and\n   segmentation models are not supported by the current release.\n5. **Reduced data types** The INT8 data type is not currently supported by the\n   Neuron compiler.\n6. **Tensor residency** When a sub-graph that is executed on the host is\n   communicating with a sub-graph that is executing on NeuronCores,\n   tensors are copied via the communication queues between the host and\n   Inferentia memory for each inference, which may result in end-to-end\n   performance degradation.\n7. **Primary inputs in NeuronCore Pipeline mode** When a neural network\n   is executed in NeuronCore Pipeline mode, only the first operator in a\n   neural network can receive primary inputs from the host.\n\n.. _cc-other-notes-8:\n\nOther Notes\n-----------\n\n.. _dependencies-7:\n\nDependencies\n------------\n\n-  nnvm: dmlc_nnvm-1.0.1219.0\n-  topi: dmlc_topi-1.0.1219.0\n-  tvm: dmlc_tvm-1.0.1219.0\n-  hwm: inferentia_hwm-1.0.602.0\n-  islpy: islpy-2018.2+aws2018.x.73.0\n"
  },
  {
    "path": "release-notes/archive/neuron1/_legacy-labels.rst",
    "content": ":orphan:\n\n.. This file provides stub labels for archived Neuron 1.x content where\n   the original target pages have been removed. These labels prevent\n   build warnings from the archived release notes.\n\n.. _install-guide-index:\n.. _ndriver_2_3_26_0:\n.. _neuron-containers:\n.. _neuron-tutorials:\n.. _software-end-of-support:\n.. _software-maintenance:\n.. _neuron-mxnet:\n.. _neuron-cc:\n.. _models-inferentia:\n.. _neuron-runtime:\n.. _neuron-k8-scheduler-ext:\n.. _neff-support-table:\n.. _tensorflow-neuron-legacy:\n\nLegacy Neuron 1.x Labels\n=========================\n\nThis page exists solely to provide anchor targets for archived Neuron 1.x release notes.\nThe original pages for these labels have been removed. See the current\n:doc:`Neuron SDK documentation </index>` for up-to-date information.\n"
  },
  {
    "path": "release-notes/archive/neuron1/neuronrelease/previous-content.rst",
    "content": ".. _pre-release-content:\n\nPrevious Releases Content\n=========================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nNeuron 2.5.0 (11/23/2022)\n--------------------------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.5.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=2,5.0\n\n\nNeuron 1.19.1 (05/27/2022)\n--------------------------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.19.1\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\nNeuron 1.19.0 (04/29/2022)\n--------------------------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.19.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\nNeuron 1.18.0 (03/25/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.18.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.18.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\n\nNeuron 1.17.2 (02/18/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.2\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.2\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.17.1 (02/16/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.1\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.17.0 (01/20/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.3 (01/05/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.3\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.3\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\nNeuron 1.16.2 (12/15/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.1 (11/05/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.0 (10/27/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron v1.15.2 (September 22 2021)\n----------------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.2\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\n\nNeuron v1.15.1 (August 30 2021)\n-------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\n\nNeuron v1.15.0 (August 12 2021)\n-------------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\nNeuron v1.14.2 (July 26 2021)\n-----------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\n\nNeuron v1.14.1 (July 2nd 2021)\n------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.1\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\n\n\nNeuron v1.14.0 (May 28th 2021)\n------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\nNeuron v1.13.0 (May 1st 2021)\n-----------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.13.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.13.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n   * - Neuron Conda Packages\n     - * torch-neuron-1.7.1.1.3.5.0 \n     \n       * tensorflow-neuron 1.15.5.1.3.3.0\n\n       * mxnet-neuron-1.5.1.1.4.4.0\n       \n\nNeuron v1.12.2 (Mar 4th 2021)\n------------------------------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - * Python 3.5 (2/24/2021)\n   * - Neuron Conda Packages\n     - * torch-neuron 1.7.1.1.2.16.0 \n     \n       * tensorflow-neuron 1.15.5.1.2.9.0\n\n       * mxnet-neuron 1.5.1.1.3.8.0\n       \n     - \n     - \n\nNeuron v1.12.1 (Feb 24th 2021)\n------------------------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.1\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - * Python 3.5 (2/24/2021)\n   * - Neuron Conda Packages\n     - * torch-neuron 1.7.1.1.2.15.0 \n     \n       * tensorflow-neuron 1.15.5.1.2.8.0\n\n       * mxnet-neuron 1.5.1.1.3.7.0\n       \n     - \n     - \n\n\nNeuron v1.12.0 (Jan 30 2021)\n----------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - \n   * - Neuron Conda Packages\n     - * Conda-PyTorch 1.5.1, Conda-PyTorch 1.7.1, \n     \n       * Conda-TensorFlow 1.5.1, Conda-MXNet 1.5.1\n     - \n     - \n\n"
  },
  {
    "path": "release-notes/archive/neuron1/prev/content.rst",
    "content": ".. _pre-n1-release-content:\n\nPrevious Releases' Content (Neuron 1.x)\n=======================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nNeuron 2.5.0 (11/23/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.5.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.1\n\n\nNeuron 1.19.2 (08/02/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.19.2\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.1\n\n\nNeuron 1.19.1 (05/27/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.19.1\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\n\n\nNeuron 1.19.0 (04/29/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.19.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.19.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\n\n\nNeuron 1.18.0 (03/25/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.18.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.18.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - Python 3.7\n\n\nNeuron 1.17.2 (02/18/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.2\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.2\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\nNeuron 1.17.1 (02/16/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.1\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\n\nNeuron 1.17.0 (01/20/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.17.0\n\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.17.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.3 (01/05/2022)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.3\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.3\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\nNeuron 1.16.2 (12/15/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.1 (11/05/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron 1.16.0 (10/27/2021)\n--------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.16.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.16.0\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n\n\nNeuron v1.15.2 (September 22 2021)\n----------------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.2\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\n\nNeuron v1.15.1 (August 30 2021)\n-------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.1\n\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\n\nNeuron v1.15.0 (August 12 2021)\n-------------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.15.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.15.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n\nNeuron v1.14.2 (July 26 2021)\n-----------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\n\nNeuron v1.14.1 (July 2nd 2021)\n------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.1\n\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\n\n\nNeuron v1.14.0 (May 28th 2021)\n------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.14.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.14.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n       \n\nNeuron v1.13.0 (May 1st 2021)\n-----------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.13.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.13.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n       * Python 3.8 [Beta]\n   * - Neuron Conda Packages\n     - * torch-neuron-1.7.1.1.3.5.0 \n     \n       * tensorflow-neuron 1.15.5.1.3.3.0\n\n       * mxnet-neuron-1.5.1.1.4.4.0\n       \n\nNeuron v1.12.2 (Mar 4th 2021)\n------------------------------------------------\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.2\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.2\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - * Python 3.5 (2/24/2021)\n   * - Neuron Conda Packages\n     - * torch-neuron 1.7.1.1.2.16.0 \n     \n       * tensorflow-neuron 1.15.5.1.2.9.0\n\n       * mxnet-neuron 1.5.1.1.3.8.0\n       \n     - \n     - \n\nNeuron v1.12.1 (Feb 24th 2021)\n------------------------------------------------\n\n\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.1\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.1\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - * Python 3.5 (2/24/2021)\n   * - Neuron Conda Packages\n     - * torch-neuron 1.7.1.1.2.15.0 \n     \n       * tensorflow-neuron 1.15.5.1.2.8.0\n\n       * mxnet-neuron 1.5.1.1.3.7.0\n       \n     - \n     - \n\n\nNeuron v1.12.0 (Jan 30 2021)\n----------------------------\n\nRelease included packages\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=1.12.0\n\nSee :ref:`neuron-maintenance-policy` for more information.\n\n\nRelease supported frameworks\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list frameworks --neuron-version=1.12.0\n\nDependency Software Supported Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Software\n     - Supported\n     - Maintenance\n     - End Of Support\n   * - Python\n     - * Python 3.6\n       * Python 3.7\n     - \n     - \n   * - Neuron Conda Packages\n     - * Conda-PyTorch 1.5.1, Conda-PyTorch 1.7.1, \n     \n       * Conda-TensorFlow 1.5.1, Conda-MXNet 1.5.1\n     - \n     - \n\n"
  },
  {
    "path": "release-notes/archive/neuron1/prev/rn.rst",
    "content": ".. _prev-n1-rn:\n\nPrevious Release Notes (Neuron 1.x)\n===================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nNeuron 1.19.2 (08/02/2022)\n--------------------------\n\n**Neuron 1.19.2** This is a patch release. The release include a :ref:`security update <ndriver_2_3_26_0>` for Neuron Driver (``aws-neuron-dkms``) and includes compiler bug fix that ignore MXNet dropout for 'training' while performing inference. \nPlease update the Neuron Driver to the latest (version 2.3.26 or newer) so that you can benefit from operational and security updates included in this release.\n\n.. important ::\n\n   You must update to the latest Neuron Driver (aws-neuron-dkms version 2.3.26 or newer) before installing or upgrading to latest Neuron release.\n      * Uninstall ``aws-neuron-dkms`` by running: ``sudo apt remove aws-neuron-dkms`` or ``sudo dnf remove aws-neuron-dkms``\n      * Install or upgrade to latest Neuron driver (``aws-neuron-dkms``) by following the \":ref:`install-guide-index`\" instructions.\n\nNeuron 1.19.1 (05/27/2022)\n--------------------------\n\n**Neuron 1.19.1** is a patch release. This release fixes a bug in Neuron Driver (``aws-neuron-dkms``). Neuron driver version 2.3.11 included in this release fixes a bug that causes kernel panic when a large memory allocation on Neuron device fails.  Neuron Driver 2.3.11 also introduces a new functionality required by the upcoming Neuron 1.20.0 release.  Because the new functionality is mandatory for Neuron 1.20.0 support, Neuron Driver 2.3.11 adds a compatibility check that will prevents Neuron 1.20.0 from running with older versions of the driver.   An attempt to run Neuron 1.20.0 with an older version of the driver will result in the application terminating with an error message.\n\nIn addition, this release updates ``tensorflow-neuron`` installation instructions to pin ``protobuf`` version to avoid `compatibility issues <https://github.com/protocolbuffers/protobuf/issues/10051>`__ with older versions of TensorFlow.\n\n.. important ::\n\n   For successful installation or update to next releases (Neuron 1.20.0 and newer):\n      * Uninstall ``aws-neuron-dkms`` by running: ``sudo apt remove aws-neuron-dkms`` or ``sudo dnf remove aws-neuron-dkms``\n      * Install or upgrade to latest Neuron driver (``aws-neuron-dkms``) by following the \":ref:`install-guide-index`\" instructions.\n\n\nNeuron 1.19.1 (05/27/2022)\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n**Neuron 1.19.1** is a patch release. This release fixes a bug in Neuron Driver (``aws-neuron-dkms``). Neuron driver version 2.3.11 included in this release fixes a bug that causes kernel panic when a large memory allocation on Neuron device fails.  Neuron Driver 2.3.11 also introduces a new functionality required by the upcoming Neuron 1.20.0 release.  Because the new functionality is mandatory for Neuron 1.20.0 support, Neuron Driver 2.3.11 adds a compatibility check that will prevents Neuron 1.20.0 from running with older versions of the driver.   An attempt to run Neuron 1.20.0 with an older version of the driver will result in the application terminating with an error message.\n\nIn addition, this release updates ``tensorflow-neuron`` installation instructions to pin ``protobuf`` version to avoid `compatibility issues <https://github.com/protocolbuffers/protobuf/issues/10051>`__ with older versions of TensorFlow.\n\n.. 
important ::\n\n   For successful installation or update to next releases (Neuron 1.20.0 and newer):\n      * Uninstall ``aws-neuron-dkms`` by running: ``sudo apt remove aws-neuron-dkms`` or ``sudo dnf remove aws-neuron-dkms``\n      * Install or upgrade to latest Neuron driver (``aws-neuron-dkms``) by following the \":ref:`install-guide-index`\" instructions.\n\nNeuron 1.19.0 (04/29/2022)\n--------------------------\n\n**Neuron 1.19.0** release adds support for PyTorch version 1.11, updates torch-neuron 1.10 to 1.10.2, and adds support for TensorFlow version 2.8, as well as minor enhancements and bug fixes.\n\nPlease note that starting with this release (*Neuron 1.19.0*), installing ``aws-neuron-runtime-base`` and ``oci-add-hooks`` are no longer required for Neuron Kubernetes device driver plugin. In addition starting with this release, *torch-neuron 1.5* :ref:`will no longer be supported <eol-pt-15>`.\n\n\nNeuron 1.18.0 (03/25/2022)\n--------------------------\n\n**Neuron 1.18.0** release introduces the beta release of :ref:`NeuronPerf <neuronperf>`, NeuronPerf is a Python library with a simple API that enables fast measurements of performance when running models with Neuron. This release adds new 5 models to the :ref:`appnote-performance-benchmark` together with  NeuronPerf scripts used to compile these models and run the benchmarks.\n\n\nThis release also introduces additional ``torch-neuron`` packages that support C++11 ABI, updates TensorFlow-Neuron 2.5 to 2.5.3, adds support for TensorFlow-Neuron 2.6 and 2.7, and introduces Runtime NEURON_RT_NUM_CORES :ref:`environment variable <nrt-configuration>`. In addition this release include minor enhancements and bug fixes in Compiler, Neuron Framework Extensions, Runtime 2.x library and tools. See below detailed release notes.\n\nStarting with this release, *TensorFlow Neuron versions 2.1, 2.2, 2.3 and 2.4* will :ref:`no longer be supported <eol-tf-21-24>` . We will also :ref:`stop supporting PyTorch Neuron version 1.5 <announce-eol-pt-1-5>` starting with Neuron 1.19.0 release, and :ref:`will stop supporting <eol-ncgs-env_2>`  ``NEURONCORE_GROUP_SIZES`` environment variable starting with Neuron 1.20.0 release.\n\nNeuron 1.17.2 (02/18/2022)\n--------------------------\n\n**Neuron 1.17.2** is a patch release. This release fixes a bug in TensorFlow Neuron versions 2.1, 2.2. 2.3 and 2.4. The fixed bug was causing a memory leak of 128B for each inference. Starting this release, TensorFlow Neuron versions 2.1, 2.2, 2.3 and 2.4 are :ref:`entering maintenance mode <maintenance_tf21_tf24>`. Future releases of TensorFlow Neuron versions 2.1, 2.2, 2.3 and 2.4 will address security issues only.\n\nNeuron 1.17.1 (02/16/2022)\n--------------------------\n\n**Neuron 1.17.1** is a patch release. This release fixes a bug in TensorFlow Neuron that caused a memory leak. The memory leak was approximately 128b for each inference and \nexists in all versions of TensorFlow Neuron versions part of Neuron 1.16.0 to Neuron 1.17.0 releases. see :ref:`pre-release-content` for exact versions included in each release.  This release only fixes the memory leak for TensorFlow versions 1.15 and 2.5 from Neuron.  
The other versions of TensorFlow Neuron will be fixed in a shortly upcoming release.\n\nNeuron 1.17.0 (01/20/2022)\n--------------------------\n\n**Neuron 1.17.0** release introduces the support of PyTorch 1.10,  Tensorflow 2.5 update to version 2.5.2, new operators support in PyTorch\nand TensorFlow 1.15, in addition to enhancements and bug fixes in PyTorch, TensorFlow, MxNet, Compiler, Runtime and Tools.\n\n- **PyTorch**\n   * First PyTorch 1.10 support.\n   * Added new operators support.\n   * See :ref:`pytorch-neuron-rn` and :ref:`neuron-cc-ops-pytorch` for more details.\n- **TensorFlow 2.x**\n   * Updated Tensorflow 2.5 to version 2.5.2.\n   * Updated tensorflow-model-server 2.5 to version 2.5.3.\n   * See :ref:`tensorflow-neuron-rn-v2` and :ref:`tensorflow-modelserver-rn-v2` for more details.\n- **TensorFlow 1.15**\n   * Added new operators support.\n   * See :ref:`tensorflow-neuron-rn` and :ref:`neuron-cc-ops-tensorflow` for more details.\n- **MXNet**\n   * Added support for ``mx_neuron.__version__`` to get the build version of MXNet Neuron plugin.\n   * See :ref:`mxnet-neuron-rn` for more details.\n- **Tools 2.x**\n   * ``neuron-top`` - Added “all” tab that aggregates all running Neuron processes into a single view.\n   * ``neuron-top`` - Improved startup time by approximately 1.5 seconds in most cases.\n   * See :ref:`dev-tools_rn` for more details.\n- **Compiler**\n   * Enhancements and minor bug fixes.\n   * See :ref:`neuron-cc-rn` for more details.\n- **Runtime 2.x**\n   * Enhancements and minor bug fixes.\n   * See :ref:`runtime_rn` for more details.\n\nNeuron 1.16.3 (01/05/2022)\n--------------------------\n\n**Neuron 1.16.3** is a minor release. This release includes performance enhancements and operator support in :ref:`PyTorch Neuron <pytorch-neuron-rn>`\nand minor bug fixes in :ref:`Neuron Compiler <neuron-cc-rn>`.\n\n\nNeuron 1.16.2 (12/15/2021)\n--------------------------\n\n**Neuron 1.16.2** is a patch release. This release includes performance enhancements and minor bug fixes in :ref:`Neuron Compiler <neuron-cc-rn>`\nand :ref:`PyTorch Neuron <pytorch-neuron-rn>`.\n\nNeuron 1.16.1 (11/05/2021)\n--------------------------\n\n**Neuron 1.16.1** is a patch release. This release fixes a bug in Neuron Runtime that would have prevented users from launching a container that doesn’t use all of the Neuron Devices in the instance. If you are using Neuron within a container, please update to this new release by updating to latest Neuron ML framework package, Neuron Tools, and/or TensorFlow Neuron Model Server.\n\n\n* To update to latest PyTorch 1.9.1:\n  ``pip install --upgrade torch-neuron neuron-cc[tensorflow] torchvision``\n\n* To update to latest TensorFlow 2.5.1:\n  ``pip install --upgrade tensorflow-neuron[cc]``\n\n* To update to latest TensorFlow 1.15.5:\n  ``pip install --upgrade tensorflow-neuron==1.15.5.* neuron-cc``\n\n* To update to latest MXNet 1.8.0:\n  ``pip install --upgrade mx_neuron neuron-cc``\n\n\nFor more details on how to update the framework packages, please check out our :ref:`setup-guide-index`.\n\n\nNeuron 1.16.0 (10/27/2021)\n--------------------------\n\n**Neuron 1.16.0 is a release that requires your attention**. 
**You must update to the latest Neuron Driver (** ``aws-neuron-dkms`` **version 2.1 or newer)\nfor successful installation or upgrade**.\n\nThis release introduces\n:ref:`Neuron Runtime 2.x <introduce-libnrt>`, upgrades :ref:`PyTorch Neuron <neuron-pytorch>` to\nPyTorch 1.9.1, adds support for new APIs (:func:`torch.neuron.DataParallel` and ``torch_neuron.is_available()``),\nadds new features and capabilities (compiler ``--fast-math`` option for better fine-tuning of accuracy/performance and :ref:`MXNet FlexEG feature <flexeg>`),\nimproves :ref:`tools <neuron-tools>`, adds support for additional :ref:`operators <neuron-supported-operators>`,\nimproves :ref:`performance <appnote-performance-benchmark>`\n(up to 20% additional throughput and up to 25% lower latency),\nand reduces model loading times. It also simplifies :ref:`Neuron installation steps <install-guide-index>`\nand improves the user experience of :ref:`container creation and deployment <neuron-containers>`.\nIn addition, it includes bug fixes, new :ref:`application notes <neuron-appnotes>`, updated :ref:`tutorials <neuron-tutorials>`,\nand announcements of software :ref:`end-of-support <software-end-of-support>` and :ref:`maintenance <software-maintenance>`.\n\n\n-  **Neuron Runtime 2.x**\n\n   - :ref:`introduce-libnrt` - In this release we are introducing Neuron Runtime 2.x.\n     The new runtime is a shared library (``libnrt.so``), replacing Neuron Runtime 1.x,\n     which was a server daemon (``neuron-rtd``).\n\n     Upgrading to ``libnrt.so`` is expected to improve throughput and\n     latency, simplify Neuron installation and the upgrade process,\n     introduce new capabilities for allocating NeuronCores to\n     applications, streamline container creation, and deprecate tools\n     that are no longer needed. The new library-based runtime\n     (``libnrt.so``) is directly integrated into Neuron’s ML Frameworks (with the exception of MXNet 1.5) and Neuron\n     Tools packages. As a result, users no longer need to install/deploy the\n     ``aws-neuron-runtime`` package.\n\n     .. 
important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer)\n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why we are making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\n\n-  **Performance**\n\n   -  Updated :ref:`performance numbers <appnote-performance-benchmark>` - Improved performance: Up to 20% additional throughput\n      and up to 25% lower latency.\n\n-  **Documentation resources**\n\n   -  Improved :ref:`Neuron Setup Guide <install-guide-index>`.\n   -  New :ref:`introduce-libnrt` application note.\n   -  New :ref:`bucketing_app_note` application note.\n   -  New :ref:`neuron-cc-training-mixed-precision` application note.\n   -  New :ref:`torch-neuron-dataparallel-app-note` application note.\n   -  New :ref:`flexeg` application note.\n   -  New :ref:`parallel-exec-ncgs` application note.\n   -  New :ref:`Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving <tensorflow-serving-neuronrt-visible-cores>` tutorial.\n   -  Updated :ref:`ResNet50 model for Inferentia </src/examples/pytorch/resnet50.ipynb>` tutorial to use :func:`torch.neuron.DataParallel`.\n\n-  **PyTorch**\n\n   -  PyTorch now supports Neuron Runtime 2.x only. Please visit :ref:`introduce-libnrt` for\n      more information.\n   -  Introducing PyTorch 1.9.1 support.\n   -  Introducing new APIs: :func:`torch.neuron.DataParallel` (see :ref:`torch-neuron-dataparallel-app-note` application note for more details) and\n      ``torch_neuron.is_available()``.\n   -  Introducing :ref:`new operators support <neuron-cc-ops-pytorch>`.\n   -  For more information visit :ref:`neuron-pytorch`\n\n-  **TensorFlow 2.x**\n\n   -  TensorFlow 2.x now supports Neuron Runtime 2.x only. Please visit\n      :ref:`introduce-libnrt` for more information.\n   -  Updated Tensorflow 2.3.x from Tensorflow 2.3.3 to Tensorflow\n      2.3.4.\n   -  Updated Tensorflow 2.4.x from Tensorflow 2.4.2 to Tensorflow\n      2.4.3.\n   -  Updated Tensorflow 2.5.x from Tensorflow 2.5.0 to Tensorflow\n      2.5.1.\n   -  Introducing :ref:`new operators support <tensorflow-ref-neuron-accelerated-ops>`\n   -  For more information visit :ref:`tensorflow-neuron`\n\n-  **TensorFlow 1.x**\n\n   -  TensorFlow 1.x now supports Neuron Runtime 2.x only. Please visit\n      :ref:`introduce-libnrt` for more information.\n   -  Introducing :ref:`new operators support <neuron-cc-ops-tensorflow>`.\n   -  For more information visit :ref:`tensorflow-neuron`\n\n-  **MXNet 1.8**\n\n   -  MXNet 1.8 now supports Neuron Runtime 2.x only. Please visit\n      :ref:`introduce-libnrt` for more information.\n   -  Introducing Flexible Execution Groups (FlexEG) feature.\n   -  MXNet 1.5 enters maintenance mode. Please visit :ref:`maintenance_mxnet_1_5` for more\n      information.\n   -  For more information visit :ref:`neuron-mxnet`\n\n-  **Neuron Compiler**\n\n   -  Introducing the ``–-fast-math`` option for better fine-tuning of accuracy/performance. See :ref:`neuron-cc-training-mixed-precision`\n   -  Support added for new ArgMax and ArgMin operators. 
See :ref:`neuron-cc-rn`.\n   -  For more information visit :ref:`neuron-cc`\n\n-  **Neuron Tools**\n\n   -  Updates have been made to ``neuron-ls`` and ``neuron-top`` to\n      improve the interface and utility of information\n      provided.\n   -  ``neuron-monitor`` has been enhanced to include additional information when\n      used to monitor the latest Frameworks released with Neuron 1.16.0. See :ref:`dev-tools_rn`.\n   -  ``neuron-cli`` is entering maintenance mode as its use is no longer\n      relevant when using ML Frameworks with an integrated Neuron\n      Runtime (libnrt.so).\n   -  For more information visit :ref:`neuron-tools`\n\n-  **Neuron Containers**\n\n   -  Starting with Neuron 1.16.0, installation of Neuron ML Frameworks now includes\n      an integrated Neuron Runtime library. As a result, it is\n      no longer required to deploy ``neuron-rtd``. Please visit :ref:`introduce-libnrt` for more\n      information.\n   -  When using containers built with components from Neuron 1.16.0, or\n      newer, please use ``aws-neuron-dkms`` version 2.1 or newer and the\n      latest version of ``aws-neuron-runtime-base``. Passing additional\n      system capabilities is no longer required.\n   -  For more information visit :ref:`neuron-containers`\n\n-  **Neuron Driver**\n\n   -  Support is added for Neuron Runtime 2.x (libnrt.so).\n   -  Memory improvements have been made to ensure all allocations are made with\n      4K alignments.\n\n\n-  **Software Deprecation**\n\n   - :ref:`eol-ncgs-env`\n   - :ref:`eol-ncg`\n\n\n-  **Software maintenance mode**\n\n   - :ref:`maintenance_rtd`\n   - :ref:`maintenance_mxnet_1_5`\n   - :ref:`maintenance_neuron-cli`\n\nNeuron 1.15.2 (09/22/2021)\n--------------------------\n\nNeuron 1.15.2 includes bug fixes for the tensorflow-model-server-neuron 2.5.1.1.6.8.0 package and several other bug fixes for tensorflow-neuron/tensorflow-model-server-neuron packages.\n\nNeuron 1.15.1 (08/30/2021)\n--------------------------\n\nNeuron 1.15.1 includes bug fixes for the aws-neuron-dkms package and several other bug fixes for related packages.\n\nNeuron 1.15.0 (08/12/2021)\n--------------------------\n\nNeuron 1.15.0 is the first release to support TensorFlow 2. In this release, TensorFlow 2 supports language transformer base models like BERT. The TensorFlow 2 support will be enhanced in future releases to support additional models.\n\n* **TensorFlow 2.x** - To get started with TensorFlow 2.x:\n\n  *  Run the TensorFlow 2 :ref:`HuggingFace distilBERT Tutorial </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`.\n  *  Read :ref:`tf2_faq`\n  *  See newly introduced TensorFlow 2.x (``tensorflow-neuron``) Tracing API.\n  *  See :ref:`tensorflow-ref-neuron-accelerated-ops`.\n\n\n* **Documentation**\n\n  *  **New** :ref:`models-inferentia` application note added in this release. This application note describes what types of deep learning model architectures perform well out of the box and provides guidance on techniques you can use to optimize your deep learning models for Inferentia.\n  *  **New** :ref:`Neuron inference performance page <appnote-performance-benchmark>` provides performance information for popular models and links to test these models in your own environment. 
The data includes throughput and latency numbers, and cost per inference, for both real-time and offline applications.\n  *  **New** :ref:`TensorFlow 2 HuggingFace distilBERT Tutorial </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`.\n  *  **New** :ref:`Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial </src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb>`.\n\n\n\n* **More information**\n\n  *  :ref:`tensorflow-neuron-rn`\n  *  :ref:`neuron-cc-rn`\n  *  :ref:`tensorflow-modelserver-rn`\n  \n\n.. _07-02-2021-rn:\n\nNeuron 1.14.2 (07/26/2021)\n--------------------------\n\nThis release (Neuron 1.14.2) includes bug fixes and minor enhancements to Neuron Runtime:\n\n    * Neuron Runtime - see :ref:`runtime_rn`\n\nNeuron 1.14.1 (07/02/2021)\n--------------------------\n\nThis release (Neuron 1.14.1) includes bug fixes and minor enhancements:\n\n* PyTorch Neuron - This release adds “Dynamic Batching” feature support; see the PyTorch-Neuron trace Python API for more information. The release also adds support for new operators and includes additional bug fixes and minor enhancements; for more information see :ref:`pytorch-neuron-rn`.\n* TensorFlow Neuron - see :ref:`tensorflow-neuron-rn`.\n* MXNet Neuron - see :ref:`mxnet-neuron-rn`.\n* Neuron Compiler - see :ref:`neuron-cc-rn`.\n* Neuron Runtime - see :ref:`runtime_rn`.\n* Neuron Tools - see :ref:`dev-tools_rn`.\n\n\n.. _05-28-2021-rn:\n\nNeuron 1.14.0 (05/28/2021)\n--------------------------\n\nThis release (Neuron 1.14.0) introduces the first release of PyTorch Neuron 1.8.1, tutorial updates, and performance enhancements and memory optimizations for PyTorch Neuron, TensorFlow Neuron and MXNet Neuron.\n\n\n* PyTorch Neuron - First release of PyTorch Neuron 1.8.1.\n* PyTorch Neuron - Convolution operator support has been extended to include ConvTranspose2d variants.\n* PyTorch Neuron - Updated tutorials to use Hugging Face Transformers 4.6.0.\n* PyTorch Neuron - Additional performance enhancements, memory optimizations, and bug fixes, see :ref:`pytorch-neuron-rn`.\n* Neuron Compiler - New feature - Uncompressed NEFF format for faster loading of models prior to inference. Enable it with ``--enable-fast-loading-neuron-binaries``. Some cases of large models may be detrimentally impacted as the NEFF will not be compressed, but many cases will benefit.\n* Neuron Compiler - Additional performance enhancements, memory optimizations, and bug fixes, see :ref:`neuron-cc-rn`.\n* TensorFlow Neuron - Performance enhancements, memory optimizations, and bug fixes, see :ref:`tensorflow-neuron-rn`.\n* MXNet Neuron - Enhancements and minor bug fixes (MXNet 1.8), see :ref:`mxnet-neuron-rn`.\n* Neuron Runtime - Performance enhancements, memory optimizations, and bug fixes, see 
:ref:`runtime_rn`.\n* Neuron Tools - Minor bug fixes and enhancements.\n* Software Deprecation\n\n    * End of support for Neuron Conda packages in Deep Learning AMI, users should use pip upgrade commands to upgrade to latest Neuron version in DLAMI, see `blog <https://aws.amazon.com/blogs/developer/neuron-conda-packages-eol/>`_.\n    * End of support for Ubuntu 16, see :ref:`documentation <eol-ubuntu16>`.\n\n\nNeuron 1.13.0 (05/01/2021)\n--------------------------\n\nThis release introduces higher performance, updated framework support, new tutorials, and adding models and tools:\n\n* Additional compiler improvements boost performance up to 20% higher throughput compared to previous release across model types.\n* Improving usability for NLP models, with out-of-the-box 12x higher-throughput at 70% lower cost for Hugging Face Transformers pre-trained BERT Base models, see :ref:`pytorch-tutorials-neuroncore-pipeline-pytorch`.\n* Upgrade Apache MXNet to 1.8, where Neuron is now a plugin, see :ref:`mxnet-neuron-rn`.\n* PyTorch ResNext models now functional with new operator support, see :ref:`pytorch-neuron-rn`.\n* PyTorch Yolov5 support, see :ref:`pytorch-neuron-rn`.\n* MXNet: Gluon API and Neuron support for NLP BERT models, see :ref:`mxnet-neuron-rn`.\n* PyTorch Convolution operator support has been extended to include most Conv1d and Conv3d variants, please see :ref:`neuron-cc-ops-pytorch`  for the complete list of operators.\n* First release of Neuron plugin for TensorBoard, see :ref:`neuron-tensorboard-rn`.\n\n**Software Deprecation**\n\n* :ref:`eol-conda-packages`\n* :ref:`eol-ubuntu16`\n* :ref:`eol-classic-tensorboard`\n\n\n.. 
_03-04-2021-rn:\n\nMarch 4, 2021 Release (Patch)\n-----------------------------\n\nThis release includes bug fixes and minor enhancements to the Neuron Runtime and Tools.\n\n\nFebruary 24, 2021 Release (Patch)\n---------------------------------\n\nThis release updates all Neuron packages and libraries in response to the Python Security issue CVE-2021-3177 as described here: https://nvd.nist.gov/vuln/detail/CVE-2021-3177. This vulnerability potentially exists in multiple versions of Python including 3.5, 3.6, 3.7. Python is used by various components of Neuron, including the Neuron compiler as well as Machine Learning frameworks including TensorFlow, PyTorch and Apache MXNet. It is recommended that the Python interpreters used in any AMIs and containers used with Neuron are also updated.\n\nPython 3.5 has reached `end-of-life <https://peps.python.org/pep-0478/>`_; starting with this release, Neuron packages will not support Python 3.5.\nUsers should upgrade to the latest DLAMI, or upgrade to a newer Python version if they are using another AMI.\n\n\nJanuary 30, 2021 Release\n--------------------------\n\nThis release continues to improve the NeuronCore Pipeline performance for BERT models. For example, running BERT Base with the neuroncore-pipeline-cores compile option, at batch=3, seqlen=32 using 16 Neuron Cores, results in throughput of up to 5340 sequences per second and P99 latency of 9ms using Tensorflow Serving.\n\nThis release also adds operator support and performance improvements for the PyTorch based DistilBert model for sequence classification.\n\n\nDecember 23, 2020 Release\n--------------------------\n\nThis release introduces a PyTorch 1.7 based torch-neuron package as a part of the Neuron SDK. Support for PyTorch model serving with TorchServe 0.2 is added and will be demonstrated with a tutorial. This release also provides an example tutorial for a PyTorch based Yolo v4 model for Inferentia.\n\nTo aid visibility into compiler activity, the Neuron-extended Frameworks TensorFlow and PyTorch will display a new compilation status indicator that prints a dot (.) every 20 seconds to the console as compilation is executing.\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. This update continues to support the torch-neuron version of PyTorch 1.5.1 for backwards compatibility.\n2. As Python 3.5 reached end-of-life in October 2020, and many packages including TorchVision and Transformers have\n   stopped support for Python 3.5, we will begin to stop supporting Python 3.5 for frameworks, starting with\n   PyTorch-Neuron version :ref:`neuron-torch-11170` in this release. You can continue to use older versions with Python 3.5.\n\nNovember 17, 2020 Release\n--------------------------\n\nThis release improves NeuronCore Pipeline performance. For example,\nrunning BERT Small, batch=4, seqlen=32 using 4 Neuron Cores, results in\nthroughput of up to 7000 sequences per second and P99 latency of 3ms\nusing Tensorflow Serving.\n\nNeuron tools updated the NeuronCore utilization metric to include all\ninf1 compute engines and DMAs. Added a new neuron-monitor example that\nconnects to Grafana via Prometheus. We've added a new sample script\nwhich exports most of neuron-monitor's metrics to a Prometheus\nmonitoring server. Additionally, we also provided a sample Grafana\ndashboard. More details at :ref:`neuron-tools`.\n\nONNX support is limited, and from this version onwards we are not\nplanning to add any additional capabilities to ONNX. 
We recommend\nrunning models in TensorFlow, PyTorch or MXNet for best performance and\nsupport.\n\nOctober 22, 2020 Release\n--------------------------\n\nThis release adds a Neuron kernel mode driver (KMD). The Neuron KMD\nsimplifies Neuron Runtime deployments by removing the need for elevated\nprivileges, improves memory management by removing the need for huge\npages configuration, and eliminates the need for running neuron-rtd as a\nsidecar container. Documentation throughout the repo has been updated to\nreflect the new support. The new Neuron KMD is backwards compatible with\nprior versions of Neuron ML Frameworks and Compilers - no changes are\nrequired to existing application code.\n\nMore details in the Neuron Runtime release notes at :ref:`neuron-runtime`.\n\nSeptember 22, 2020 Release\n--------------------------\n\nThis release improves performance of YOLO v3 and v4, VGG16, SSD300, and\nBERT. As part of these improvements, the Neuron Compiler doesn’t require any\nspecial compilation flags for most models. Details on how to use the\nprior optimizations are outlined in the neuron-cc :ref:`neuron-cc-rn`.\n\nThe release also improves operational deployments of large scale\ninference applications, with a session management agent incorporated\ninto all supported ML Frameworks and a new Neuron tool called\nneuron-monitor that allows you to easily scale monitoring of large fleets of\ninference applications. A sample script for connecting neuron-monitor to\nAmazon CloudWatch metrics is provided as well. Read more about using\nneuron-monitor at :ref:`neuron-monitor-ug`.\n\nAugust 19, 2020 Release\n--------------------------\n\nBug fix for an error reporting issue with the Neuron Runtime. Previous\nversions of the runtime were only reporting uncorrectable errors on half\nof the DRAM per Inferentia. Other Neuron packages are not changed.\n\nAugust 8, 2020 Release\n--------------------------\n\nThis release of the Neuron SDK delivers performance enhancements for the\nBERT Base model. Sequence lengths including 128, 256 and 512 were found\nto have best performance at batch size 6, 3 and 1 respectively, using\npublicly available versions of both Pytorch (1.5.x) and\nTensorflow-based (1.15.x) models. The compiler option \"-O2\" was used in\nall cases.\n\nA new Kubernetes scheduler extension is included in this release to\nimprove pod scheduling on inf1.6xlarge and inf1.24xlarge instance sizes.\nDetails on how the scheduler works and how to apply the scheduler can be\nfound at :ref:`neuron-k8-scheduler-ext`.\nCheck :ref:`containers_rn` for details on\nchanges to k8s components going forward.\n\nAugust 4, 2020 Release\n--------------------------\n\nBug fix for a latent issue caused by a race condition in Neuron Runtime\nleading to possible crashes. The crash was observed under stress load\nconditions. All customers are encouraged to update to the latest Neuron\nRuntime package (aws-neuron-runtime), version 1.0.8813.0 or newer. Other\nNeuron packages are being updated as well, but are to be considered\nnon-critical updates.\n\nJuly 16, 2020 Release\n--------------------------\n\nThis release of the Neuron SDK adds support for the OpenPose (posenet)\nNeural Network. An example of using OpenPose for end-to-end inference is\navailable at :ref:`/src/examples/tensorflow/openpose_demo/openpose.ipynb`.\n\nA new PyTorch auto-partitioner feature now automatically builds a Neuron\nspecific graph representation of PyTorch models. 
The key benefit of this\nfeature is the automatic partitioning of the model graph to run the supported\noperators on the NeuronCores and the rest on the host. The PyTorch\nauto-partitioner is enabled by default, with the ability to disable it if a\nmanual partition is needed. More details at :ref:`neuron-pytorch`. The\nrelease also includes various bug fixes and increased operator support.\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. This update moves the supported version for PyTorch to the current\n   release (PyTorch 1.5.1)\n2. This release supports Python 3.7 Conda packages in addition to Python\n   3.6 Conda packages\n\nJune 18, 2020 Release\n--------------------------\n\nPoint fix for an error related to yum downgrade/update of Neuron Runtime\npackages. The prior release fails to successfully downgrade/update the\nNeuron Runtime Base package and Neuron Runtime package when using Yum on\nAmazon Linux 2.\n\nPlease remove and then install both packages on AL2 using these\ncommands:\n\n::\n\n   # Amazon Linux 2\n   sudo yum remove aws-neuron-runtime-base\n   sudo yum remove aws-neuron-runtime\n   sudo yum install aws-neuron-runtime-base\n   sudo yum install aws-neuron-runtime\n\nJun 11, 2020 Release\n--------------------------\n\nThis Neuron release provides support for the recent launch of EKS for\nInf1 instance types and numerous other improvements. More details about\nhow to use EKS with the Neuron SDK can be found in AWS documentation\n`here <https://docs.aws.amazon.com/eks/latest/userguide/inferentia-support.html>`__.\n\nThis release adds initial support for OpenPose PoseNet for images with\nresolutions up to 400x400.\n\nThis release also adds a '-O2' option to the Neuron Compiler. '-O2' can\nhelp with handling of large tensor inputs.\n\nIn addition, the Neuron Compiler increments the version of the compiled\nartifacts, called \"NEFF\", to version 1.0. Neuron Runtime versions\nearlier than the 1.0.6905.0 release in May 2020 will not be able to\nexecute NEFFs compiled from this release forward. Please see :ref:`neff-support-table` for\ncompatibility.\n\nStay up to date on future improvements and new features by following the :ref:`neuron_roadmap`.\n\nRefer to the detailed release notes for more information on each Neuron\ncomponent.\n\n.. _important-to-know-1:\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. Size of neural network. The current Neuron compiler release has a\n   limitation in terms of the size of neural network it could\n   effectively optimize for. The size of neural network is influenced by\n   a number of factors including: a) type of neural network (CNN, LSTM,\n   MLP) , b) number of layers, c) sizes of input (dimension of the\n   tensors, batch size, ...). Using the Neuron Compiler '-O2' option can\n   help with handling of large tensor inputs for some models. If not\n   used, Neuron limits the size of CNN models like ResNet to an input\n   size of 480x480 fp16/32, batch size=4; LSTM models like GNMT to have\n   a time step limit of 900; MLP models like BERT to have input size\n   limit of sequence length=128, batch=8.\n\n2. INT8 data type is not currently supported by the Neuron compiler.\n\n3. Neuron does not support TensorFlow 2 or PyTorch 1.4.0.\n\nMay 15, 2020 Release\n--------------------------\n\nPoint fix for an error related to installation of the Neuron Runtime Base\npackage. The prior release fails to successfully start Neuron Discovery\nwhen the Neuron Runtime package is not also installed. 
This scenario of\nrunning Neuron Discovery alone is critical to users of Neuron in\ncontainer environments.\n\nPlease update the aws-neuron-runtime-base package:\n\n::\n\n   # Ubuntu 18 or 16:\n   sudo apt-get update\n   sudo apt-get install aws-neuron-runtime-base\n\n   # Amazon Linux, Centos, RHEL\n   sudo yum update\n   sudo yum install aws-neuron-runtime-base\n\nMay 11, 2020 Release\n--------------------------\n\nThis release provides additional throughput improvements to running\ninference on a variety of models; for example, BERTlarge throughput has\nimproved by an additional 35% compared to the previous release, with\npeak throughput of 360 seq/second on inf1.xlarge (more details at :ref:`tensorflow-bert-demo`).\n\nIn addition to the performance boost, this release adds PyTorch and\nMXNet framework support for BERT models, as well as expands container\nsupport in preparation for an upcoming EKS launch.\n\nWe continue to work on new features and improving performance further;\nto stay up to date, follow this repository and our :ref:`neuron_roadmap`.\n\nRefer to the detailed release notes for more information on each Neuron\ncomponent.\n\n.. _important-to-know-2:\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. Size of neural network. The current Neuron compiler release has a\n   limitation in terms of the size of neural network it could\n   effectively optimize for. The size of neural network is influenced by\n   a number of factors including: a) type of neural network (CNN, LSTM,\n   MLP) , b) number of layers, c) sizes of input (dimension of the\n   tensors, batch size, ...). As a result, we limit the sizes of CNN\n   models like ResNet to have an input size limit of 480x480 fp16/32,\n   batch size=4; LSTM models like GNMT to have a time step limit of 900;\n   MLP models like BERT to have input size limit of sequence length=128,\n   batch=8.\n\n2. INT8 data type is not currently supported by the Neuron compiler.\n\n3. Neuron does not support TensorFlow 2 or PyTorch 1.4.0.\n\nMar 26, 2020 Release\n--------------------------\n\nThis release supports a variant of the SSD object detection network; an\nSSD inference demo is available at :ref:`tensorflow-ssd300`.\n\nThis release also enhances our Tensorboard support to enable CPU-node\nvisibility.\n\nRefer to the detailed release notes for more information on each Neuron\ncomponent.\n\n.. _important-to-know-3:\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. Size of neural network. The current Neuron compiler release has a\n   limitation in terms of the size of neural network it could\n   effectively optimize for. The size of neural network is influenced by\n   a number of factors including: a) type of neural network (CNN, LSTM,\n   MLP) , b) number of layers, c) sizes of input (dimension of the\n   tensors, batch size, ...). As a result, we limit the sizes of CNN\n   models like ResNet to have an input size limit of 480x480 fp16/32,\n   batch size=4; LSTM models like GNMT to have a time step limit of 900;\n   MLP models like BERT to have input size limit of sequence length=128,\n   batch=8.\n\n2. INT8 data type is not currently supported by the Neuron compiler.\n\n3. Neuron does not support TensorFlow 2 or PyTorch 1.4.0.\n\nFeb 27, 2020 Release\n--------------------------\n\nThis release improves performance throughput by up to 10%; for example,\nResNet-50 on inf1.xlarge has increased from 1800 img/sec to 2040\nimg/sec. Neuron logs include more detailed messages and various bug\nfixes. 
Refer to the detailed release notes for more details.\n\nWe continue to work on new features and improving performance further;\nto stay up to date, follow this repository, and watch the `AWS Neuron\ndeveloper\nforum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`__.\n\n.. _important-to-know-4:\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. Size of neural network. The current Neuron compiler release has a\n   limitation in terms of the size of neural network it could\n   effectively optimize for. The size of neural network is influenced by\n   a number of factors including: a) type of neural network (CNN, LSTM,\n   MLP) , b) number of layers, c) sizes of input (dimension of the\n   tensors, batch size, ...). As a result, we limit the sizes of CNN\n   models like ResNet to have an input size limit of 480x480 fp16/32,\n   batch size=4; LSTM models like GNMT to have a time step limit of 900;\n   MLP models like BERT to have input size limit of sequence length=128,\n   batch=8.\n\n2. Computer-vision object detection and segmentation models are not yet\n   supported.\n\n3. INT8 data type is not currently supported by the Neuron compiler.\n\n4. Neuron does not support TensorFlow 2 or PyTorch 1.4.0.\n\nJan 28, 2020 Release\n--------------------------\n\nThis release brings significant throughput improvements to running\ninference on a variety of models; for example, Resnet50 throughput is\nincreased by 63% (measured 1800 img/sec on inf1.xlarge, up from 1100/sec,\nand measured 2300/sec on inf1.2xlarge). BERTbase throughput has improved\nby 36% compared to the re:Invent launch (up to 26100 seq/sec from\n19200 seq/sec on inf1.24xlarge), and BERTlarge improved by 15% (230\nseq/sec, compared to 200 running on inf1.2xlarge). In addition to the\nperformance boost, this release includes various bug fixes as well as\nadditions to GitHub with :ref:`neuron-features-index`\ndiving deep on how Neuron performance features work, and overall improved\ndocumentation following customer input.\n\nWe continue to work on new features and improving performance further;\nto stay up to date, follow this repository, and watch the `AWS Neuron\ndeveloper\nforum <https://forums.aws.amazon.com/forum.jspa?forumID=355>`__.\n\n.. _important-to-know-5:\n\nImportant to know:\n^^^^^^^^^^^^^^^^^^\n\n1. Size of neural network. The current Neuron compiler release has a\n   limitation in terms of the size of neural network it could\n   effectively optimize for. The size of neural network is influenced by\n   a number of factors including: a) type of neural network (CNN, LSTM,\n   MLP) , b) number of layers, c) sizes of input (dimension of the\n   tensors, batch size, ...). As a result, we limit the sizes of CNN\n   models like ResNet to have an input size limit of 480x480 fp16/32,\n   batch size=4; LSTM models like GNMT to have a time step limit of 900;\n   MLP models like BERT to have input size limit of sequence length=128,\n   batch=8.\n\n2. Computer-vision object detection and segmentation models are not yet\n   supported.\n\n3. INT8 data type is not currently supported by the Neuron compiler.\n\n4. Neuron does not support TensorFlow 2 or PyTorch 1.4.0.\n\nNeuron SDK Release Notes Structure\n----------------------------------\n\nThe Neuron SDK is delivered through commonly used package managers\n(e.g. PIP, APT and YUM). 
These packages are then themselves packaged\ninto Conda packages that are integrated into the AWS DLAMI for minimal\ndeveloper overhead.\n\nThe Neuron SDK release notes follow a similar structure, with the core\nimprovements and known-issues reported in the release notes of the\nprimary packages (e.g. Neuron-Runtime or Neuron-Compiler release notes),\nand additional release notes specific to the package-integration are\nreported through their dedicated release notes (e.g. Conda or DLAMI\nrelease notes).\n"
  },
  {
    "path": "release-notes/archive/tensorboard-neuron.rst",
    "content": ".. _neuron-tensorboard-rn:\n\n\nNeuron Plugin for TensorBoard Release Notes\n============================================\n\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 1\n\n\nKnown Issues and Limitations - Updated 11/29/2022\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe following are not limitations in the Neuron plugin, but may affect your ability to\nuse TensorBoard.\n\n- The Neuron plugin for Trn1 (``tensorboard-plugin-neuronx``) is not compatible with the Neuron plugin\n  for Inf1 (``tensorboard-plugin-neuron``).  Please ensure you only have only the correct package installed.\n\nNeuron Plugin for TensorBoard release [2.6.7.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/01/2024\n\nSummary\n-------\n\n- Minor updates.\n\nNeuron Plugin for TensorBoard release [2.6.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/21/2023\n\nSummary\n-------\n\n- Now uses local third-party dependencies instead of relying on a CDN.\n\n\nNeuron Plugin for TensorBoard release [2.5.39.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 7/19/2023\n\nSummary\n-------\n\n- Minor updates.\n\n\n\nNeuron Plugin for TensorBoard release [2.5.37.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 6/14/2023\n\nSummary\n-------\n\n- Minor updates.\n\n\n\nNeuron Plugin for TensorBoard release [2.5.26.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 05/01/2023\n\nSummary\n-------\n\n* Neuron operator timeline view now includes Neuron Runtime setup/teardown time and a collapsed execution of NC engines and DMA - see Tensorboard tutorial for updated views. \n\n* Improved execution categorization to include \"control\" instructions\n\n\n\nNeuron Plugin for TensorBoard release [2.5.25.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/28/2023\n\nSummary\n-------\n\n- Supports INF2 and TRN1.\n\n\nNeuron Plugin for TensorBoard release [2.5.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/09/2022\n\nSummary\n-------\n\n- Added support for PyTorch Neuron on Trn1 (``torch-neuronx``) with new views!  Includes a trace view,\n  an operator view, and an operator timeline view.  For more info, check out the documentation\n  :ref:`neuronx-plugin-tensorboard`.\n\n  .. important::\n\n    - You must update to the latest Neuron Tools (``aws-neuronx-tools`` version 2.6 or newer) and install\n      ``tensorboard-plugin-neuronx`` for proper functionality of the Neuron plugin on Trn1.\n    - For Inf1, please continue to use ``tensorboard-plugin-neuron``.  Refer to the getting started guide\n      on Inf1 :ref:`neuron-plugin-tensorboard`.\n\n\nNeuron Plugin for TensorBoard release [2.4.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/29/2022\n\nSummary\n-------\n\n- Minor updates.\n\n\nNeuron Plugin for TensorBoard release [2.3.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\nSummary\n-------\n\n- Minor updates.\n\n\nNeuron Plugin for TensorBoard release [2.2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/27/2021\n\nNew in this release\n-------------------\n\n   -  Neuron Plugin for TensorBoard now support applications built with Neuron Runtime 2.x (``libnrt.so``).\n\n      .. 
important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read the :ref:`introduce-libnrt`\n           application note that describes :ref:`why we are making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information on how to\n           migrate your application.\n\n\n[2.1.2.0]\n^^^^^^^^^^\n\nDate: 8/12/2021\n\nSummary\n-------\n\n- Adds support for Neuron Tensorflow 2.5+\n\n\n.. _2.1.0.0:\n\n[2.1.0.0]\n^^^^^^^^^^\n\nDate: 5/28/2021\n\nSummary\n-------\n\n- No major changes or fixes. Released with other Neuron packages.\n\n.. _2.0.29.0:\n\n[2.0.29.0]\n^^^^^^^^^^^\n\nDate: 4/30/2021\n\nSummary\n-------\n\n- First release of the Neuron plugin for TensorBoard.  Check it out here:\n  :ref:`neuron-plugin-tensorboard`.\n\n   - The Neuron plugin is now compatible with TensorBoard 2.0 and higher,\n     in addition to TensorBoard 1.15\n\n   - Provides a centralized place to better understand execution using\n     the Neuron SDK.\n\n   - Continues to support visualization for TensorFlow graphs, with support\n     for PyTorch and MXNet coming in future releases.\n\n- The Neuron plugin for TensorBoard is supported for Neuron tools >= 1.5, which was first\n  introduced in the Neuron v1.13.0 release\n- TensorBoard-Neuron is deprecated, and only supported for Neuron tools <= 1.4.12.0.\n  The final version, 1.4.12.0, is part of the Neuron v1.12.2 release.\n\n\n.. _11501260:\n\n[1.15.0.1.2.6.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 2/24/2021\n\nSummary\n-------\n\n-  Fix for CVE-2021-3177.\n\n.. _11501110:\n\n[1.15.0.1.1.1.0]\n^^^^^^^^^^^^^^^^^\n\nDate: 12/23/2020\n\nSummary\n-------\n\n-  Minor internal improvements.\n\n\n.. _1150106150:\n\n[1.15.0.1.0.615.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 11/17/2020\n\nSummary\n-------\n\n-  Fix issue with viewing chrome trace in Neuron profile plugin in\n   Chrome 80+.\n\nResolved Issues\n---------------\n\n-  Updated dependencies to polyfill missing APIs used by chrome trace in\n   newer browser versions.\n\n\n.. _1150106000:\n\n[1.15.0.1.0.600.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 09/22/2020\n\nSummary\n-------\n\n-  Minor internal improvements.\n\n.. _1150105700:\n\n[1.15.0.1.0.570.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 08/08/2020\n\n.. _tb-summary-1:\n\nSummary\n-------\n\n-  Minor internal improvements.\n\n.. _1150105130:\n\n[1.15.0.1.0.513.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 07/16/2020\n\n.. _tb-summary-2:\n\nSummary\n-------\n\n-  Minor internal improvements.\n\n.. _1150104910:\n\n[1.15.0.1.0.491.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 6/11/2020\n\n.. _tb-summary-3:\n\nSummary\n-------\n\nFix issue where utilization was missing in the op-profile view.\n\nResolved Issues\n---------------\n\n-  The op-profile view in the Neuron Profile plugin now correctly shows\n   the overall NeuronCore utilization.\n\n.. _1150104660:\n\n[1.15.0.1.0.466.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 5/11/2020\n\n.. _tb-summary-4:\n\nSummary\n-------\n\nFix potential installation issue when installing both tensorboard and\ntensorboard-neuron.\n\n.. _tb-resolved-issues-1:\n\nResolved Issues\n---------------\n\n-  Added tensorboard as a dependency in tensorboard-neuron. This\n   prevents the issue of overwriting tensorboard-neuron features when\n   tensorboard is installed after tensorboard-neuron.\n\nOther Notes\n-----------\n\n.. 
_1150103920:\n\n[1.15.0.1.0.392.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 3/26/2020\n\n.. _tb-summary-5:\n\nSummary\n-------\n\nAdded ability to view CPU node latency in the Graphs plugin and the\nNeuron Profile plugins.\n\nMajor New Features\n------------------\n\n-  Added an aggregate view in addition to the current Neuron subgraph\n   view for both the Graphs plugin and the Neuron Profile plugin.\n-  When visualizing a graph executed on a Neuron device, CPU node\n   latencies are available when coloring the graph by \"Compute time\"\n   using the \"neuron_profile\" tag.\n-  The Neuron Profile plugin now has an overview page to compare time\n   spent on Neuron device versus on CPU.\n\n.. _tb-other-notes-1:\n\nOther Notes\n-----------\n\n-  Requires Neuron-RTD config option \"enable_node_profiling\" to be set\n   to \"true\"\n\n.. _1150103660:\n\n[1.15.0.1.0.366.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 02/27/2020\n\n.. _tb-summary-6:\n\nSummary\n-------\n\nReduced load times and fixed crashes when loading large models for\nvisualization.\n\n.. _tb-resolved-issues-2:\n\nResolved Issues\n---------------\n\n-  Enable large attribute filtering by default\n-  Reduced load time for graphs with attributes larger than 1 KB\n-  Fixed a fail to load graphs with many large attributes totaling more\n   than 1 GB in size\n\n.. _1150103150:\n\n[1.15.0.1.0.315.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/20/2019\n\n.. _tb-summary-7:\n\nSummary\n-------\n\nNo major chages or fixes. Released with other Neuron packages.\n\n.. _1150103060:\n\n[1.15.0.1.0.306.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/1/2019\n\n.. _tb-summary-8:\n\nSummary\n-------\n\n.. _tb-major-new-features-1:\n\nMajor New Features\n------------------\n\n.. _tb-resolved-issues-3:\n\nResolved Issues\n---------------\n\n.. _known-issues--limits:\n\nKnown Issues & Limits\n---------------------\n\nSame as prior release\n\n.. _tb-other-notes-2:\n\nOther Notes\n-----------\n\n.. _1150102800:\n\n[1.15.0.1.0.280.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 11/29/2019\n\n.. _tb-summary-9:\n\nSummary\n-------\n\nInitial release packaged with DLAMI.\n\n.. _tb-major-new-features-2:\n\nMajor New Features\n------------------\n\nN/A, initial release.\n\nSee user guide here:\nhttps://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-tools/getting-started-tensorboard-neuron.md\n\n.. _tb-resolved-issues-4:\n\nResolved Issues\n---------------\n\nN/A - first release\n\n.. _known-issues--limits-1:\n\nKnown Issues & Limits\n---------------------\n\n-  Must install TensorBoard-Neuron by itself, or after regular\n   TensorBoard is installed. If regular Tensorboard is installed after\n   TensorBoard-Neuron, it may overwrite some needed files.\n-  Utilization missing in Op Profile due to missing FLOPs calculation\n   (see overview page instead)\n-  Neuron Profile plugin may not immediately show up on launch (try\n   reloading the page)\n-  Graphs with NeuronOps may take a long time to load due to attribute\n   size\n-  Instructions that cannot be matched to a framework layer/operator\n   name show as “” (blank)\n-  CPU Usage section in chrome-trace is not applicable\n-  Debugger currently supports TensorFlow only\n-  Visualization requires a TensorFlow-compatible graph\n\n.. _tb-other-notes-3:\n\nOther Notes\n-----------\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron-v2.rst",
    "content": ".. _tensorflow-modelserver-rn-v2:\n\nTensorFlow-Model-Server-Neuron 2.x Release Notes\n================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the\nTensorFlow-Model-Server-Neuron package.\n\nTensorFlow Model Server Neuron 2.x release [2.4.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\n* Deprecated the NEURONCORE_GROUP_SIZES environment variable.\n* Minor bug fixes.\n\n\nTensorFlow Model Server Neuron 2.x release [2.3.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/29/2022\n\n* Added support for tensorflow-model-serving 2.8.0.\n\n\nTensorFlow Model Server Neuron 2.x release [2.2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\n* Updated tensorflow-serving 2.5 to 2.5.4.\n* Add support for tensorflow-model-serving 2.6 and 2.7.\n\n\n\nTensorFlow Model Server Neuron 2.x release [2.1.6.0]\n----------------------------------------------------\n\nDate: 01/20/2022\n\n* Updated tensorflow-model-server 2.5 to version 2.5.3\n\n\nTensorFlow Model Server Neuron 2.x release [2.0.4.0]\n----------------------------------------------------\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing \n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\nTensorFlow Model Server Neuron 2.x release [2.0.3.0]\n----------------------------------------------------\n\nDate: 10/27/2021\n\nNew in this release\n^^^^^^^^^^^^^^^^^^^\n\n* TensorFlow Model Server Neuron 2.x now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n     .. important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why are we making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\n\n.. _2511680:\n\nTensorFlow Model Server Neuron 2.x release [1.6.8.0]\n----------------------------------------------------\n\nDate: 08/12/2021\n\nSummary\n^^^^^^^\n\nTensorFlow 2.x - tensorflow-model-server-neuron now support TensorFlow 2.x,  tensorflow-model-server-neuron package versions 2.1.4, 2.2.2, 2.3.0, 2.4.1, and 2.5.1 support TensorFlow 2.x.\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron.rst",
    "content": ".. _tensorflow-modelserver-rn:\n.. _tensorflow-modeslserver-neuron-rn:\n\nTensorFlow-Model-Server-Neuron 1.x Release Notes\n================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the\nTensorFlow-Model-Server-Neuron package.\n\nTensorFlow Model Server Neuron 1.x release [2.4.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\n* Deprecated the NEURONCORE_GROUP_SIZES environment variable.\n* Minor bug fixes.\n\n\nTensorFlow Model Server Neuron 1.x release [2.2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\n* Minor bug fixes.\n\n\nTensorFlow Model Server Neuron 1.x release [2.0.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing \n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\nTensorFlow Model Server Neuron 1.x release [2.0.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/27/2021\n\nNew in this release\n-------------------\n\n* TensorFlow Model Server Neuron 1.x now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n     .. important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why are we making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\n\n.. _11501510:\n\n[1.15.0.1.5.1.0]\n^^^^^^^^^^^^^^^^\n\nDate: 07/02/2021\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501400:\n\n[1.15.0.1.4.0.0]\n^^^^^^^^^^^^^^^^\n\nDate: 05/24/2021\n\nSummary\n-------\n\n1. Remove SIGINT/SIGTERM handler and rely on mechnisms provided by Neuron runtime for resource cleanup.\n2. Uncap protobuf size limit.\n\n.. _11501330:\n\n[1.15.0.1.3.3.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 05/01/2021\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501290:\n\n[1.15.0.1.2.9.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 03/04/2021\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501280:\n\n[1.15.0.1.2.8.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 02/24/2021\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n\n.. _11501220:\n\n[1.15.0.1.2.2.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 01/30/2021\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n\n.. _11501130:\n\n[1.15.0.1.1.3.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 12/23/2020\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n\n.. _11501021680:\n\n[1.15.0.1.0.2168.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 11/17/2020\n\nSummary\n-------\n\nNo change. 
See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n\n.. _11501020430:\n\n[1.15.0.1.0.2043.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 09/22/2020\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501019650:\n\n[1.15.0.1.0.1965.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 08/08/2020\n\n.. _tms-summary-1:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501019530:\n\n[1.15.0.1.0.1953.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 08/05/2020\n\n.. _tms-summary-2:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501018910:\n\n[1.15.0.1.0.1891.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 07/16/2020\n\n.. _tms-summary-3:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501017960:\n\n[1.15.0.1.0.1796.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate 6/11/2020\n\n.. _tms-summary-4:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501015720:\n\n[1.15.0.1.0.1572.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate 5/11/2020\n\n.. _tms-summary-5:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501013330:\n\n[1.15.0.1.0.1333.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate 3/26/2020\n\n.. _tms-summary-6:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _11501012400:\n\n[1.15.0.1.0.1240.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate 2/27/2020\n\n.. _tms-summary-7:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _1150109970:\n\n[1.15.0.1.0.997.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 1/27/2019\n\n.. _tms-summary-8:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _1150108030:\n\n[1.15.0.1.0.803.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/20/2019\n\n.. _tms-summary-9:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _1150107490:\n\n[1.15.0.1.0.749.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 12/1/2019\n\n.. _tms-summary-10:\n\nSummary\n-------\n\nNo change. See :ref:`tensorflow-neuron-release-notes` for related TensorFlow-Neuron release\nnotes.\n\n.. _1150106630:\n\n[1.15.0.1.0.663.0]\n^^^^^^^^^^^^^^^^^^\n\nDate 11/29/2019\n\n.. _tms-summary-11:\n\nSummary\n-------\n\nThis version is available only in released DLAMI v26.0. See\nTensorFlow-Neuron Release Notes. Please\n:ref:`update <dlami-rn-known-issues>` to latest version.\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuronx.rst",
    "content": ".. _tensorflow-modeslserver-neuronx-rn:\n\nTensorFlow-Model-Server-Neuron (``tensorflow-modeslserver-neuronx``) Release Notes\n==================================================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the\nTensorFlow-Model-Server-Neuron (``tensorflow-modeslserver-neuronx``) package.\n\nTensorFlow Model Server Neuron  [2.9.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 7/19/2023\n\n* Minor updates\n\nTensorFlow Model Server Neuron  [2.8.9.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 6/14/2023\n\n* Minor updates\n\nTensorFlow Model Server Neuron  [2.8.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 5/1/2023\n\n* Minor updates\n\nTensorFlow Model Server Neuron  [2.7.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 3/28/2023\n\n* Minor updates\n\nTensorFlow Model Server Neuron  [2.6.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nDate: 2/24/2023\n\nFirst release of TensorFlow-Model-Server-Neuron (``tensorflow-modeslserver-neuronx``) package.\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron-v2.rst",
    "content": ".. _tensorflow-neuron-rn-v2:\n\nTensorFlow 2.x (``tensorflow-neuron``) Release Notes\n=====================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the tensorflow-neuron 2.x packages.\n\n.. _tf-known-issues-and-limitations-v2:\n\nKnown Issues and Limitations - updated 08/12/2021\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- Support on serialized TensorFlow 2.x custom operators is currently limited. Serializing some operators registered from tensorflow-text through `TensorFlow Hub <https://tfhub.dev/>`_ is going to cause failure in tensorflow.neuron.trace.\n\n- Memory leak exists on latest releases of TensorFlow Neuron for versions 2.1, 2.2, 2.3, and 2.4.\n\n\n-  Issue: When compiling large models, user might run out of memory and\n   encounter this fatal error.\n\n::\n\n   terminate called after throwing an instance of 'std::bad_alloc'\n\nSolution: run compilation on a c5.4xlarge instance type or larger.\n\n-  Issue: When upgrading ``tensorflow-neuron`` with\n   ``pip install tensorflow-neuron --upgrade``, the following error\n   message may appear, which is caused by ``pip`` version being too low.\n\n::\n\n     Could not find a version that satisfies the requirement tensorflow<1.16.0,>=1.15.0 (from tensorflow-neuron)\n\nSolution: run a ``pip install pip --upgrade`` before upgrading\n``tensorflow-neuron``.\n\n-  Issue: Some Keras routines throws the following error:\n\n::\n\n   AttributeError: 'str' object has no attribute 'decode'.\n\nSolution: Please downgrade `h5py` by `pip install 'h5py<3'`. This is caused by https://github.com/TensorFlow/TensorFlow/issues/44467.\n\ntensorflow-neuron 2.x release [2.12.2.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 09/16/2024\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.11.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 07/03/2024\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.10.19.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/01/2024\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.10.8.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 12/21/2023\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.10.2.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/15/2023\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.10.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 09/15/2023\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.9.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 7/19/2023\n\n* Minor updates.\n\n\ntensorflow-neuron 2.x release [2.8.9.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 06/14/2023\n\n* Added Python 3.10 support.\n\ntensorflow-neuron 2.x release [2.8.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 05/01/2023\n\n* Added support for tracing models larger than 2 GB through the environment variable ``NEURON_CC_FLAGS='--extract-weights INSTANCE_TYPE'`` for all inf1 instance types.\n* Neuron release 2.10 release will be the last release that will include support for tensorflow-neuron version 2.7. 
Future Neuron releases will not include tensorflow-neuron version 2.7.\n\ntensorflow-neuron 2.x release [2.7.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/19/2023\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.7.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/28/2023\n\n* Introduce the ``tfn.analyze_model`` function that displays information about the supported and unsupported operators of a traceable model.\n* Introduce the ``on_neuron_ratio`` attribute of AWS Optimized Neuron Models returned by ``tfn.trace``, which is the percentage of ops on neuron after compilation. \n\ntensorflow-neuron 2.x release [2.6.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 02/24/2023\n\n* Minor updates.\n\ntensorflow-neuron 2.x release [2.6.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 2/24/2023\n\n* Minor bug fixes.\n\ntensorflow-neuron 2.x release [2.4.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/22/2022\n\n* Beta support for tracing models larger than 2 GB through environment variable ``NEURON_CC_FLAGS='--extract-weights'``.\n* Introduce ``tfn.auto_multicore`` Python API to enable automatic data parallel on multiple NeuronCores.\n* Introduce ``tf-neuron-auto-multicore`` tool to enable automatic data parallel on multiple NeuronCores.\n* Deprecated the NEURONCORE_GROUP_SIZES environment variable.\n* Minor bug fixes.\n\n\ntensorflow-neuron 2.x release [2.3.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/29/2022\n\n* Added support for Tensorflow 2.8.0.\n* Added support for Slice operator\n* The graph partitioner now prefers to place less compute intensive operators on CPU if the model already contains a large amount of compute intensive operators.\n* Fixed `Github issue #408 <https://github.com/aws/aws-neuron-sdk/issues/408>`_, the fix solves data type handling bug in ``tfn.trace`` when the model contains Conv2D operators.\n\n\ntensorflow-neuron 2.x release [2.2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\n* Updated TensorFlow 2.5 to version 2.5.3.\n* Added support for TensorFlow 2.6 and 2.7.\n* Added a warning message when calling ``tfn.saved_model.compile`` API. In tensorflow-neuron 2.x you should call :ref:`tensorflow.neuron.trace <tensorflow-ref-neuron-tracing-api>`. ``tfn.saved_model.compile`` API supports only partial functionality of :ref:`tensorflow.neuron.trace <tensorflow-ref-neuron-tracing-api>` and will be deprecated in the future.\n\n\n\ntensorflow-neuron 2.x release [2.1.14.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 02/17/2022\n\n* Fixed a bug in TensorFlow Neuron versions 2.1, 2.2. 2.3 and 2.4. The fixed bug was causing a memory leak of 128 bytes for each inference.\n* Improved warning message when calling deprecated compilation API under tensorflow-neuron 2.x. \n\n\ntensorflow-neuron 2.x release [2.1.13.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 02/16/2022\n\n* Fixed a bug that caused a memory leak. The memory leak was approximately 128b for each inference and \n  exists in all versions of TensorFlow Neuron versions part of Neuron 1.16.0 to Neuron 1.17.0 releases. see :ref:`pre-release-content` \n  for exact versions included in each release.  This release only addresses the leak in TensorFlow Neuron 2.5.  
Future release of TensorFlow Neuron will fix the leak in other versions as well (2.1, 2.2, 2.3, 2.4).\n\n\n\ntensorflow-neuron 2.x release [2.1.6.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/20/2022\n\n* Updated TensorFlow 2.5 to version 2.5.2.\n* Enhanced auto data parallel (e.g. when using NEURONCORE_GROUP_SIZES=X,Y,Z,W) to support edge cases.\n* Fixed a bug that may cause tensorflow-neuron to generate in some cases scalar gather instruction with incorrect arguments.\n\n\ntensorflow-neuron 2.x release [2.0.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing \n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\ntensorflow-neuron 2.x release [2.0.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/27/2021\n\nNew in this release\n-------------------\n\n* TensorFlow 2.x (``tensorflow-neuron``) now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n     .. important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why are we making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\n\n* Updated TensorFlow 2.3.x from TensorFlow 2.3.3 to TensorFlow 2.3.4. \n* Updated TensorFlow 2.4.x from TensorFlow 2.4.2 to TensorFlow 2.4.3.\n* Updated TensorFlow 2.5.x from TensorFlow 2.5.0 to TensorFlow 2.5.1.\n\n\nResolved Issues\n---------------\n\n* Fix bug that can cause illegal compiler optimizations\n* Fix bug that can cause dynamic-shape operators be placed on Neuron\n\n.. _2501680:\n\ntensorflow-neuron 2.x release [1.6.8.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 08/12/2021\n\nNew in this release\n-------------------\n\n* First release of TensorFlow 2.x integration, Neuron support now TensorFlow versions 2.1.4, 2.2.3, 2.3.3, 2.4.2, and 2.5.0.\n\n* New public API tensorflow.neuron.trace: trace a TensorFlow 2.x keras.Model or a Python callable that can be decorated by tf.function, and return an AWS-Neuron-optimized keras.Model that can execute on AWS Machine Learning Accelerators.\n **Please note** that TensorFlow 1.x SavedModel compilation API tensorflow.neuron.saved_model.compile is not supported in tensorflow-neuron 2.x . It continues to function in tensorflow-neuron 1.15.x .\n\n* Included versions:\n\n   - tensorflow-neuron-2.5.0.1.6.8.0 \n   - tensorflow-neuron-2.4.2.1.6.8.0\n   - tensorflow-neuron-2.3.3.1.6.8.0\n   - tensorflow-neuron-2.2.3.1.6.8.0\n   - tensorflow-neuron-2.1.4.1.6.8.0\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron.rst",
    "content": ".. _tensorflow-neuron-rn:\n.. _tensorflow-neuron-release-notes:\n\nTensorFlow Neuron (``tensorflow-neuron (TF1.x)``) Release Notes\n===============================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n\nThis document lists the release notes for the tensorflow-neuron 1.x package.\n\n.. _tf-known-issues-and-limitations:\n\nKnown Issues and Limitations - updated 08/12/2021\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- Support on serialized TensorFlow 2.x custom operators is currently limited. Serializing some operators registered from TensorFlow-text through `TensorFlow Hub <https://tfhub.dev/>`_ is going to cause failure in tensorflow.neuron.trace.\n\n\n-  Issue: When compiling large models, user might run out of memory and\n   encounter this fatal error.\n\n::\n\n   terminate called after throwing an instance of 'std::bad_alloc'\n\nSolution: run compilation on a c5.4xlarge instance type or larger.\n\n-  Issue: When upgrading ``tensorflow-neuron`` with\n   ``pip install tensorflow-neuron --upgrade``, the following error\n   message may appear, which is caused by ``pip`` version being too low.\n\n::\n\n     Could not find a version that satisfies the requirement TensorFlow<1.16.0,>=1.15.0 (from tensorflow-neuron)\n\nSolution: run a ``pip install pip --upgrade`` before upgrading\n``tensorflow-neuron``.\n\n-  Issue: Some Keras routines throws the following error:\n\n::\n\n   AttributeError: 'str' object has no attribute 'decode'.\n\nSolution: Please downgrade `h5py` by `pip install 'h5py<3'`. This is caused by https://github.com/TensorFlow/TensorFlow/issues/44467.\n\ntensorflow-neuron 1.x release [2.10.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 8/28/2023\n\n* Minor updates\n\ntensorflow-neuron 1.x release [2.9.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 7/19/2023\n\n* Minor updates\n\ntensorflow-neuron 1.x release [2.8.9.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 6/14/2023\n\n* Minor updates\n\ntensorflow-neuron 1.x release [2.8.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 5/1/2023\n\n* Minor updates\n\ntensorflow-neuron 1.x release [2.7.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 3/28/2023\n\n* Minor updates\n\ntensorflow-neuron 1.x release [2.6.5.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 2/24/2023\n\n* Added support for TensorFlow versions 2.9 and 2.10\n* End-of-support for TensorFlow versions 2.5 and 2.6\n\ntensorflow-neuron 1.x release [2.4.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/23/2022\n\n* Introduce ``tf-neuron-auto-multicore`` tool to enable automatic data parallel on multiple NeuronCores.\n* Deprecated the NEURONCORE_GROUP_SIZES environment variable.\n* Minor bug fixes.\n\n\ntensorflow-neuron 1.x release [2.3.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 04/29/2022\n\n* Minor bug fixes.\n\n\ntensorflow-neuron 1.x release [2.1.14.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/25/2022\n\n* Minor bug fixes.\n\n\ntensorflow-neuron 1.x release [2.1.14.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 02/17/2022\n\n* Minor bug fixes.\n\ntensorflow-neuron 1.x release [2.1.13.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 02/16/2022\n\n* Fixed a bug that caused a memory leak. The memory leak was approximately 128b for each inference and \n  exists in all versions of TensorFlow Neuron versions part of Neuron 1.16.0 to Neuron 1.17.0 releases. 
see :ref:`pre-release-content` \n  for exact versions included in each release.\n\n\n\ntensorflow-neuron 1.x release [2.1.6.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 01/20/2022\n\n* Enhanced auto data parallel (e.g. when using NEURONCORE_GROUP_SIZES=X,Y,Z,W) to support edge cases.\n* Added new operators support. see :ref:`neuron-cc-ops-TensorFlow`.\n\n\ntensorflow-neuron 1.x release [2.0.4.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing \n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\n\ntensorflow-neuron 1.x release [2.0.3.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 10/27/2021\n\nNew in this release\n-------------------\n\n* TensorFlow 1.x (``tensorflow-neuron``) now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n     .. important::\n\n        -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer) \n           for proper functionality of the new runtime library.\n        -  Read :ref:`introduce-libnrt`\n           application note that describes :ref:`why are we making this\n           change <introduce-libnrt-why>` and\n           how :ref:`this change will affect the Neuron\n           SDK <introduce-libnrt-how-sdk>` in detail.\n        -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n           migrate your application.\n\nResolved Issues\n---------------\n\n* Fix neuron-cc argument handling bug when nothing can be compiled.\n* Fixing the support of cast operators applied after constants, by Introducing support of constant-folding pass before Neuron auto-mixed-precision.\n\n.. _11551510:\n\n[1.15.5.1.5.1.0]\n^^^^^^^^^^^^^^^^\n\nDate: 07/02/2021\n\nNew in this release\n-------------------\n\n* Bug fixes regarding scalar inputs/outputs.\n* Minor performance improvements when dynamic batch size is turned on or when model is small.\n\n.. _11551400:\n\n[1.15.5.1.4.0.0]\n^^^^^^^^^^^^^^^^\n\nDate: 05/28/2021\n\nNew in this release\n-------------------\n\n* Reduce the amount of input/output data movement during inference.\n* Improve parallelism for dynamic batch size inference by adopting a new sharding mechanism.\n* Reduce the amount of host memory usage during inference.\n* tfn.saved_model.compile now generates correct code when operator Split is used as output.\n* tfn.saved_model.compile now properly reads input tensor shape information from SignatureDef proto.\n* tfn.saved_model.compile now terminates properly when neuron-cc compiler argument is passed but there is no successful compilation.\n* Fix bug on some wrong internal tensor names when neuron-cc compiler crashes.\n* Other minor bug fixes.\n\n.. _11551330:\n\n[1.15.5.1.3.3.0]\n^^^^^^^^^^^^^^^^\n\nDate: 05/01/2021\n\nNew in this release\n-------------------\n\n1. Minor enhancements.\n\n.. _11551290:\n\n[1.15.5.1.2.9.0]\n^^^^^^^^^^^^^^^^\n\nDate: 03/04/2021\n\nNew in this release\n-------------------\n\n1. Minor enhancements.\n\n\n.. _11551280:\n\n[1.15.5.1.2.8.0]\n^^^^^^^^^^^^^^^^\n\nDate: 02/24/2021\n\nNew in this release\n-------------------\n\n1. Fix for CVE-2021-3177.\n\n\n.. _11551220:\n\n[1.15.5.1.2.2.0]\n^^^^^^^^^^^^^^^^\n\nDate: 01/30/2021\n\nNew in this release\n-------------------\n\n1. Bug fixes and internal refactor.\n\n2. Bump TensorFlow base package version to 1.15.5.\n\n3. 
Introduced a new argument ``convert_constants_to_variables`` to the compilation API ``tfn.saved_model.compile``. Setting it to ``True`` can address the issue of large constants consuming too much memory in the TensorFlow runtime.\n\n\n\n\n.. _11541130:\n\n[1.15.4.1.1.3.0]\n^^^^^^^^^^^^^^^^\n\nDate: 12/23/2020\n\nNew in this release\n-------------------\n\n1. Improved logging during `tfn.saved_model.compile` to display `neuron-cc` compilation progress.\n\n2. Small performance improvement in some edge cases by optimizing the NeuronCore-executable assignment mechanism.\n\n\n\n\n.. _11541021680:\n\n[1.15.4.1.0.2168.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 11/17/2020\n\nNew in this release\n-------------------\n\n1. tensorflow-neuron is now a plugin package that can be used together\n   with TensorFlow~=1.15.0 built with ``GLIBCXX_USE_CXX11_ABI=0``.\n\n2. Improved logging during ``tfn.saved_model.compile`` to display\n   ``neuron-cc`` logging file path, which is useful for tracking\n   ``neuron-cc`` compilation progress.\n\n3. Small performance improvement by utilizing shared memory more\n   efficiently.\n\n\n.. _11531020430:\n\n[1.15.3.1.0.2043.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 09/22/2020\n\nNew in this release\n-------------------\n\n1. tensorflow-neuron now automatically enables data parallel mode on\n   four cores in one Inferentia. In ``TensorFlow-model-server-neuron``,\n   most models can now fully utilize four cores automatically. In Python\n   TensorFlow, running threaded inference using ``>=4`` Python threads\n   in the same TensorFlow Session lead to full utilization of four\n   cores.\n\n2. tensorflow-neuron now tries to enable dynamic batch size\n   automatically for a limited number of models, such as ResNet50.\n\n3. Improved logging during ``tfn.saved_model.compile`` to display\n   input/output information about subgraphs that are going to be\n   compiled by ``neuron-cc``.\n\n.. _11531019650:\n\n[1.15.3.1.0.1965.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 08/08/2020\n\n.. _ts-summary-1:\n\nNew in this release\n-------------------\n\nVarious minor improvements.\n\n.. _11531019530:\n\n[1.15.3.1.0.1953.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 08/05/2020\n\n.. _ts-summary-2:\n\nNew in this release\n-------------------\n\nVarious minor improvements.\n\n.. _11531018910:\n\n[1.15.3.1.0.1891.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 07/16/2020\n\n.. _ts-summary-3:\n\nNew in this release\n-------------------\n\nThis version contains a few bug fixes and user experience improvements.\n\nDependency change\n-----------------\n\n1. Bump TensorFlow base package version number to 1.15.3\n2. Add ``TensorFlow >= 1.15.0, < 1.16.0`` as an installation dependency\n   so that packages depending on TensorFlow can be installed together\n   with tensorflow-neuron without error\n\nNew Features\n------------\n\n1. ``tensorflow-neuron`` now displays a summary of model performance\n   when profiling is enable by setting environment variable\n   ``NEURON_PROFILE``\n\nResolved Issues\n---------------\n\n1. Environment variable ``NEURON_PROFILE`` can now be set to a\n   non-existing path which will be automatically created\n2. Fixed a bug in ``tfn.saved_model.compile`` that causes compilation\n   failure when ``dynamic_batch_size=True`` is specified on a SavedModel\n   with unknown rank inputs.\n\n.. _11521017960:\n\n[1.15.2.1.0.1796.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate 6/11/2020\n\n.. _ts-summary-4:\n\nNew in this release\n-------------------\n\nThis version contains a few bug fixes.\n\nMajor New Features\n------------------\n\n.. 
_tf-resolved-issues-1:\n\nResolved Issues\n---------------\n\n1. Fixed a bug related with device placement. Now models with device\n   information hardcoded to GPU can be successfully compiled with\n   ``tfn.saved_model.compile``\n2. Fixed a bug in ``tfn.saved_model.compile`` that causes models\n   containing Reshape operators not functioning correctly when it is\n   compiled with ``dynamic_batch_size=True``\n3. Fixed a bug in ``tfn.saved_model.compile`` that causes models\n   containing Table related operators to initialize incorrectly after\n   compilation.\n\nKnown Issues and limitations\n----------------------------\n\n.. _11521015720:\n\n[1.15.2.1.0.1572.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 5/11/2020\n\n.. _ts-summary-5:\n\nNew in this release\n-------------------\n\nThis version contains some bug fixes and new features.\n\n.. _tf-major-new-features-1:\n\nMajor New Features\n------------------\n\n-  tensorflow-neuron is now built on TensorFlow 1.15.2 instead of\n   TensorFlow 1.15.0\n\n.. _tf-resolved-issues-2:\n\nResolved Issues\n---------------\n\n-  Fixed a bug that caused Neuron runtime resources to not all be\n   released when a tensorflow-neuron process terminated with in-flight\n   inferences\n-  Inference timeout value set at compile time is now correctly\n   recognized at runtime\n\n\nKnown Issues and limitations\n----------------------------\n\n.. _tf-11501013330:\n\n[1.15.0.1.0.1333.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 3/26/2020\n\n.. _ts-summary-6:\n\nNew in this release\n-------------------\n\n.. _tf-major-new-features-2:\n\nMajor New Features\n------------------\n\n-  Improved performance between TensorFlow to Neuron runtime.\n\n.. _tf-resolved-issues-3:\n\nResolved Issues\n---------------\n\n-  Fixed a bug in Neuron runtime adaptor operator's shape function when\n   dynamic batch size inference is enabled\n-  Framework method (tensorflow.neuron.saved-model.compile) improved\n   handling of compiler timeout termination by letting it clean up\n   before exiting.\n\n.. _tf-known-issues-and-limitations-2:\n\nKnown Issues and limitations\n----------------------------\n\n.. _11501012400:\n\n[1.15.0.1.0.1240.0]\n^^^^^^^^^^^^^^^^^^^\n\nDate: 2/27/2020\n\n.. _ts-summary-7:\n\nNew in this release\n-------------------\n\n.. _tf-major-new-features-3:\n\nMajor New Features\n------------------\n\n-  Enabled runtime memory optimizations by default to improve inference\n   performance, specifically in cases with large input/output tensors\n-  tfn.saved_model.compile now displays warning message instead of\n   \"successfully compiled\" if less than 30% of operators are mapped to\n   Inferentia\n-  Improve error messages. Runtime failure error messages are now more\n   descriptive and also provide instructions to restart neuron-rtd when\n   necessary.\n\n.. _tf-resolved-issues-4:\n\nResolved Issues\n---------------\n\n.. _tf-known-issues-and-limitations-3:\n\nKnown Issues and Limitations\n----------------------------\n\n-  Issue: When compiling a large model, may encounter.\n\n::\n\n   terminate called after throwing an instance of 'std::bad_alloc'\n\nSolution: run compilation on c5.4xlarge instance type or larger.\n\nOther Notes\n-----------\n\n.. _tf-1150109970:\n\n[1.15.0.1.0.997.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 1/27/2020\n\n.. _ts-summary-8:\n\nNew in this release\n-------------------\n\n.. _tf-major-new-features-4:\n\nMajor New Features\n------------------\n\n-  Added support for NCHW pooling operators in tfn.saved_model.compile.\n\n.. 
_tf-resolved-issues-5:\n\nResolved Issues\n---------------\n\n-  Fixed GRPC transient status error issue.\n-  Fixed a graph partitioner issue with control inputs.\n\n.. _tf-known-issues-and-limitations-4:\n\nKnown Issues and Limitations\n----------------------------\n\n-  Issue: When compiling a large model, may encounter.\n\n::\n\n   terminate called after throwing an instance of 'std::bad_alloc'\n\nSolution: run compilation on c5.4xlarge instance type or larger.\n\n.. _tf-other-notes-1:\n\nOther Notes\n-----------\n\n.. _1150108030:\n\n[1.15.0.1.0.803.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 12/20/2019\n\n.. _ts-summary-9:\n\nNew in this release\n-------------------\n\n.. _tf-major-new-features-5:\n\nMajor New Features\n------------------\n\n.. _tf-resolved-issues-6:\n\nResolved Issues\n---------------\n\n-  Improved handling of ``tf.neuron.saved_model.compile`` arguments\n\n.. _tf-known-issues-and-limitations-5:\n\nKnown Issues and Limitations\n----------------------------\n\n.. _tf-other-notes-2:\n\nOther Notes\n-----------\n\n.. _tf-1150107490:\n\n[1.15.0.1.0.749.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 12/1/2019\n\n.. _tf-summary-10:\n\nNew in this release\n-------------------\n\n.. _tf-major-new-features-6:\n\nMajor New Features\n------------------\n\n.. _tf-resolved-issues-7:\n\nResolved Issues\n---------------\n\n-  Fix race condition between model load and model unload when the\n   process is killed\n-  Remove unnecessary GRPC calls when the process is killed\n\n.. _tf-known-issues-and-limitations-6:\n\nKnown Issues and Limitations\n----------------------------\n\n-  When compiling a large model, may encounter “terminate called after\n   throwing an instance of 'std::bad_alloc'”. Solution: run compilation\n   on c5.4xlarge instance type or larger.\n\n-  The pip package ``wrapt`` may have a conflicting version in some\n   installations. This is seen when this error occurs:\n\n.. code:: bash\n\n   ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.\n\nTo solve this, you can update wrapt to the newer version:\n\n.. code:: bash\n\n   python3 -m pip install wrapt --ignore-installed\n   python3 -m pip install tensorflow-neuron\n\nWithin a Conda environment:\n\n.. code:: bash\n\n   conda update wrapt\n   conda update tensorflow-neuron\n\n.. _tf-other-notes-3:\n\nOther Notes\n-----------\n\n.. _1150106630:\n\n[1.15.0.1.0.663.0]\n^^^^^^^^^^^^^^^^^^\n\nDate: 11/25/2019\n\n.. _ts-summary-11:\n\nNew in this release\n-------------------\n\nThis version is available only in released DLAMI v26.0 and is based on\nTensorFlow version 1.15.0. Please\n:ref:`update <dlami-rn-known-issues>` to latest version.\n\n.. _tf-major-new-features-7:\n\nMajor New Features\n------------------\n\n.. _tf-resolved-issues-8:\n\nResolved Issues\n---------------\n\nKnown Issues and Limits\n-----------------------\n\nModels Supported\n----------------\n\nThe following models have successfully run on neuron-inferentia systems\n\n1. BERT_LARGE and BERT_BASE\n2. Transformer\n3. Resnet50 V1/V2\n4. Inception-V2/V3/V4\n\n.. _tf-other-notes-4:\n\nOther Notes\n-----------\n\n-  Python versions supported:\n\n   -  3.5, 3.6, 3.7\n\n-  Linux distribution supported:\n\n   -  Ubuntu 18, Amazon Linux 2\n\n\n\n"
  },
  {
    "path": "release-notes/archive/tensorflow/tensorflow-neuronx/tensorflow-neuronx.rst",
    "content": ".. _tensorflow-neuronx-release-notes:\n\nTensorFlow 2.x (``tensorflow-neuronx``) Release Notes\n========================================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the tensorflow-neuronx 2.x packages.\n\n\n\ntensorflow-neuronx 2.x release [2.1.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 09/15/2023\n\n* Minor updates\n\nDate: 05/1/2023\n\n* Added support for tracing models larger than 2 GB through the environment variable ``NEURON_CC_FLAGS='--extract-weights INSTANCE_TYPE'`` for all trn1 and inf2 instance types.\n* tensorflow-neuronx now supports tensorflow 2.7, 2.8, and 2.9 (In addition to the already supported 2.10).\n* Neuron release 2.10 release will be the last release that will include support for tensorflow-neuronx version 2.7. Future Neuron releases will not include tensorflow-neuronx version 2.7.\n\ntensorflow-neuronx 2.10 release [2.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 03/28/2023\n\nThe second release of tensorflow-neuronx 2.10 includes the following features:\n\n* Dynamic batching\n\nThe following features are not included in this release:\n\n* Support for tracing models larger than 2 GB\n\ntensorflow-neuronx 2.10 release [1.0.0]\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDate: 2/24/2023\n\nThe initial release of tensorflow-neuronx 2.10 includes the following features:\n\n* Initial support for TensorFlow 2.10 inference on Inf2 and Trn1\n* Trace API (tensorflow_neuronx.trace)\n* Automatic partitioning of model into CPU vs NeuronCore parts\n* Automatic data parallel on multiple NeuronCores (beta)\n* Python 3.7, 3.8 and 3.9 support\n* HuggingFace Roberta tutorial\n\nThe following features are not included in this release:\n\n* Dynamic batching\n* Support for tracing models larger than 2 GB\n"
  },
  {
    "path": "release-notes/archive/torch-neuron.rst",
    "content": ".. _pytorch-neuron-rn:\n.. _torch_neuron_core_placement_api:\n.. _pytorch-manual-partitioning-jn-tutorial:\n\nPyTorch Neuron (``torch-neuron``) release notes\n===============================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nThis document lists the release notes for the Pytorch-Neuron package.\n\n\n\nKnown Issues and Limitations - Updated 03/21/2023\n-------------------------------------------------\n\nMin & Max Accuracy\n~~~~~~~~~~~~~~~~~~\n\nThe index outputs of the ``aten::argmin``, ``aten::argmax``, ``aten::min``, and\n``aten::max`` operator implementations are sensitive to precision. For models\nthat contain these operators and have ``float32`` inputs, we recommend using the\n``--fp32-cast=matmult --fast-math no-fast-relayout`` compiler option to avoid\nnumerical imprecision issues. Additionally, the ``aten::min`` and ``aten::max``\noperator implementations do not currently support ``int64`` inputs when\n``dim=0``. For more information on precision and performance-accuracy tuning,\nsee :ref:`neuron-cc-training-mixed-precision`.\n\nPython 3.5\n~~~~~~~~~~\n\nIf you attempt to import torch.neuron from Python 3.5 you will see this error\nin 1.1.7.0 - please use Python 3.6 or greater:\n\n.. code-block::\n\n   File \"/tmp/install_test_env/lib/python3.5/site-packages/torch_neuron/__init__.py\", line 29\n      f'Invalid dependency version torch=={torch.__version__}. '\n                                                             ^\n   SyntaxError: invalid syntax\n\n-  Torchvision has dropped support for Python 3.5\n-  HuggingFace transformers has dropped support for Python 3.5\n\nTorchvision\n~~~~~~~~~~~\n\nWhen versions of ``torchvision`` and ``torch`` are mismatched, this\ncan result in exceptions when compiling ``torchvision`` based\nmodels. Specific versions of ``torchvision`` are built against each release\nof ``torch``. For example:\n\n- ``torch==1.5.1`` matches ``torchvision==0.6.1``\n- ``torch==1.7.1`` matches ``torchvision==0.8.2``\n- etc.\n\nSimultaneously installing both ``torch-neuron`` and ``torchvision`` is the\nrecommended method of correctly resolving versions.\n\n\nDynamic Batching\n~~~~~~~~~~~~~~~~\n\nDynamic batching does not work properly for some models that use the\n``aten::size`` operator. When this issue occurs, the input batch sizes are not\nproperly recorded at inference time, resulting in an error such as:\n\n.. code-block:: text\n\n    RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension 0.\n\nThis error typically occurs when ``aten::size`` operators are partitioned to\nCPU. We are investigating a fix for this issue.\n\nPyTorch Neuron release [package ver. 1.*.*.2.11.6.0, SDK ver. 2.20.0]\n---------------------------------------------------------------------\n\nDate: 09/16/2024\n\n* Minor updates.\n\nPyTorch Neuron release [package ver. 1.*.*.2.10.12.0, SDK ver. 2.19.0]\n----------------------------------------------------------------------\n\nDate: 07/03/2024\n\n* Minor updates.\n\nPyTorch Neuron release [package ver. 1.*.*.2.9.74.0, SDK ver. 2.18.0]\n---------------------------------------------------------------------\n\nDate: 04/01/2024\n\n* Minor updates.\n\nPyTorch Neuron release [package ver. 1.*.*.2.9.17.0, SDK ver. 2.16.0]\n---------------------------------------------------------------------\n\nDate: 12/21/2023\n\n* Minor updates.\n\nPyTorch Neuron release [package ver. 1.*.*.2.9.6.0, SDK ver. 
2.15.0]\n--------------------------------------------------------------------\n\nDate: 10/26/2023\n\n* Minor updates.\n\nPyTorch Neuron release [package ver. 1.*.*.2.9.1.0, SDK ver. 2.13.0]\n--------------------------------------------------------------------\n\nDate: 08/28/2023\n\n* Added support for clamp_min/clamp_max ATEN operators.\n\nPyTorch Neuron release [package ver. 1.*.*.2.8.9.0, SDK ver. 2.12.0]\n--------------------------------------------------------------------\n\nDate: 07/19/2023\n\n* Minor updates.\n\nPyTorch Neuron release [2.7.10.0]\n--------------------------------------------------\n\nDate: 06/14/2023\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added support for Python 3.10\n\nBug fixes\n~~~~~~~~~\n\n* torch.pow Operation now correctly handles mismatch between base and exponent data types\n\nPyTorch Neuron release [2.7.1.0]\n--------------------------------------------------\n\nDate: 05/1/2023\n\n* Minor updates.\n\nPyTorch Neuron release [2.6.5.0]\n--------------------------------------------------\n\nDate: 03/28/2023\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added support for ``torch==1.13.1``\n* New releases of ``torch-neuron`` no longer include versions for ``torch==1.7`` and ``torch==1.8``\n* Added support for Neuron runtime 2.12\n* Added support for new operators:\n\n  * ``aten::tensordot``\n  * ``aten::adaptive_avg_pool1d``\n  * ``aten::prelu``\n  * ``aten::reflection_pad2d``\n  * ``aten::baddbmm``\n  * ``aten::repeat``\n\n* Added a ``separate_weights`` flag to :func:`torch_neuron.trace` to support\n  models that are larger than 2GB\n\n\nBug fixes\n~~~~~~~~~\n\n* Fixed ``aten::_convolution`` with grouping for:\n\n  * :class:`torch.nn.Conv1d`\n  * :class:`torch.nn.Conv3d`\n  * :class:`torch.nn.ConvTranspose2d`\n\n* Fixed ``aten::linear`` to support 1d input tensors\n* Fixed an issue where an input could not be directly returned from the network\n\n\nPyTorch Neuron release [2.5.0.0]\n--------------------------------------------------\n\nDate: 11/23/2022\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added PyTorch 1.12 support\n* Added Python 3.8 support\n* Added new operators support. See :ref:`neuron-cc-ops-pytorch`\n* Added support for ``aten::lstm``. See: :ref:`torch_neuron_lstm_support`\n* Improved logging:\n\n  * Improved error messages for specific compilation failure modes, including out-of-memory errors\n  * Added a warning to show the code location of ``prim::PythonOp`` operations\n  * Removed overly-verbose tracing messages\n  * Added improved error messages for ``neuron-cc`` and ``tensorflow`` dependency issues\n  * Added more debug information when an invalid dynamic batching configuration is used\n\n* Added new beta explicit NeuronCore placement API. See: :ref:`torch_neuron_core_placement_api`\n* Added new guide for NeuronCore placement. See: :ref:`torch_neuron_core_placement_guide`\n* Improved :func:`torch_neuron.trace` performance when using large graphs\n* Reduced host memory usage of loaded models in ``libtorchneuron.so``\n* Added ``single_fusion_ratio_threshold`` argument to :func:`torch_neuron.trace`\n  to give more fine-grained control of partitioned graphs\n\n\n\nBug fixes\n~~~~~~~~~\n\n* Improved handling of tensor mutations which previously caused accuracy issues on certain models (i.e. yolor, yolov5)\n* Fixed an issue where ``inf`` and ``-inf`` values would cause unexpected ``NaN`` values. 
This could occur with newer versions of ``transformers``\n* Fixed an issue where :func:`torch.neuron.DataParallel` would not fully utilize all NeuronCores for specific batch sizes\n* Fixed and improved operators:\n\n  * ``aten::upsample_bilinear2d``: Improved error messages in cases where the operation cannot be supported\n  * ``aten::_convolution``: Added support for ``output_padding`` argument\n  * ``aten::div``: Added support for ``rounding_mode`` argument\n  * ``aten::sum``: Fixed to handle non-numeric data types\n  * ``aten::expand``: Fixed to handle scalar tensors\n  * ``aten::permute``: Fixed to handle negative indices\n  * ``aten::min``: Fixed to support more input types\n  * ``aten::max``: Fixed to support more input types\n  * ``aten::max_pool2d``: Fixed to support both 3-dimensional and 4-dimensional input tensors\n  * ``aten::Int``: Fixed an issue where long values would incorrectly lose precision\n  * ``aten::constant_pad_nd``: Fixed to correctly use non-0 padding values\n  * ``aten::pow``: Fixed to support more input types & values\n  * ``aten::avg_pool2d``: Added support for ``count_include_pad`` argument. Added support for ``ceil_mode`` argument if padding isn’t specified\n  * ``aten::zero``: Fixed to handle scalars correctly\n  * ``prim::Constant``: Fixed an issue where ``-inf`` was incorrectly handled\n  * Improved handling of scalars in arithmetic operators\n\n\nPyTorch Neuron release [2.3.0.0]\n--------------------------------------------------\n\nDate: 04/29/2022\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added support PyTorch 1.11.\n* Updated PyTorch 1.10 to version 1.10.2.\n* End of support for torch-neuron 1.5, see :ref:`eol-pt-15`.\n* Added support for new operators:\n\n  * ``aten::masked_fill_``\n  * ``aten::new_zeros``\n  * ``aten::frobenius_norm``\n\nBug fixes\n~~~~~~~~~\n\n* Improved ``aten::gelu`` accuracy\n* Updated ``aten::meshgrid`` to support optional indexing argument introduced in ``torch 1.10`` , see  `PyTorch issue 50276 <https://github.com/pytorch/pytorch/issues/50276>`_\n\n\n\nPyTorch Neuron release [2.2.0.0]\n--------------------------------------------------\n\nDate: 03/25/2022\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added full support for  ``aten::max_pool2d_with_indices`` -  (Was previously supported only when indices were unused).\n* Added new torch-neuron packages compiled with ``-D_GLIBCXX_USE_CXX11_ABI=1``, the new packages support PyTorch 1.8, PyTorch 1.9, and PyTorch 1.10.\n  To install the additional packages compiled with ``-D_GLIBCXX_USE_CXX11_ABI=1`` please change the package repo index to ``https://pip.repos.neuron.amazonaws.com (https://pip.repos.neuron.amazonaws.com/)/cxx11/``\n  \n\nPyTorch Neuron release [2.1.7.0]\n--------------------------------------------------\n\nDate: 01/20/2022\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added PyTorch 1.10 support\n* Added new operators support, see :ref:`neuron-cc-ops-pytorch`\n* Updated ``aten::_convolution`` to support 2d group convolution\n* Updated ``neuron::forward`` operators to allocate less dynamic memory. This can increase performance on models with many input & output tensors.\n* Updated ``neuron::forward`` to better handle batch sizes when ``dynamic_batch_size=True``. 
This can increase performance at \n  inference time when the input batch size is exactly equal to the traced model batch size.\n\nBug fixes\n~~~~~~~~~\n\n* Added the ability to ``torch.jit.trace`` a ``torch.nn.Module`` where a submodule has already been traced with :func:`torch_neuron.trace` on a CPU-type instance.\n  Previously, if this had been executed on a CPU-type instance, an initialization exception would have been thrown.\n* Fixed ``aten::matmul`` behavior on 1-dimensional by n-dimensional multiplies. Previously, this would cause a validation error.\n* Fixed binary operator type promotion. Previously, in unusual situations, operators like ``aten::mul`` could produce incorrect results due to invalid casting.\n* Fixed ``aten::select`` when index was -1. Previously, this would cause a validation error.\n* Fixed ``aten::adaptive_avg_pool2d`` padding and striding behavior. Previously, this could generate incorrect results with specific configurations.\n* Fixed an issue where dictionary inputs could be incorrectly traced when the tensor values had gradients.\n\n\nPyTorch Neuron release [2.0.536.0]\n--------------------------------------------------\n\nDate: 01/05/2022\n\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added new operator support for specific variants of operations (See :ref:`neuron-cc-ops-pytorch`)\n* Added optional ``optimizations`` keyword to :func:`torch_neuron.trace` which accepts a list of :class:`~torch_neuron.Optimization` passes.\n\n\nPyTorch Neuron release [2.0.468.0]\n--------------------------------------------------\n\nDate: 12/15/2021\n\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n* Added support for ``aten::cumsum`` operation.\n* Fixed ``aten::expand`` to correctly handle adding new dimensions.\n\n\nPyTorch Neuron release [2.0.392.0]\n--------------------------------------------------\n\nDate: 11/05/2021\n\n* Updated Neuron Runtime (which is integrated within this package) to ``libnrt 2.2.18.0`` to fix a container issue that was preventing\n  the use of containers when /dev/neuron0 was not present. See details here :ref:`runtime_rn`.\n\nPyTorch Neuron release [2.0.318.0]\n--------------------------------------------------\n\nDate: 10/27/2021\n\nNew in this release\n~~~~~~~~~~~~~~~~~~~\n\n-  PyTorch Neuron 1.x now support Neuron Runtime 2.x (``libnrt.so`` shared library) only.\n\n   .. 
important::\n\n      -  You must update to the latest Neuron Driver (``aws-neuron-dkms`` version 2.1 or newer)\n         for proper functionality of the new runtime library.\n      -  Read :ref:`introduce-libnrt`\n         application note that describes :ref:`why are we making this\n         change <introduce-libnrt-why>` and\n         how :ref:`this change will affect the Neuron\n         SDK <introduce-libnrt-how-sdk>` in detail.\n      -  Read :ref:`neuron-migrating-apps-neuron-to-libnrt` for detailed information of how to\n         migrate your application.\n\n-  Introducing PyTorch 1.9.1 support (support for ``torch==1.9.1)``\n-  Added ``torch_neuron.DataParallel``, see ResNet-50 tutorial :ref:`[html] </src/examples/pytorch/resnet50.ipynb>` and\n   :ref:`torch-neuron-dataparallel-app-note` application note.\n-  Added support for tracing on GPUs\n-  Added support for ``ConvTranspose1d``\n-  Added support for new operators:\n\n   -  ``aten::empty_like``\n   -  ``aten::log``\n   -  ``aten::type_as``\n   -  ``aten::movedim``\n   -  ``aten::einsum``\n   -  ``aten::argmax``\n   -  ``aten::min``\n   -  ``aten::argmin``\n   -  ``aten::abs``\n   -  ``aten::cos``\n   -  ``aten::sin``\n   -  ``aten::linear``\n   -  ``aten::pixel_shuffle``\n   -  ``aten::group_norm``\n   -  ``aten::_weight_norm``\n\n-  Added ``torch_neuron.is_available()``\n\n\nResolved Issues\n~~~~~~~~~~~~~~~\n\n-  Fixed a performance issue when using both the\n   ``dynamic_batch_size=True`` trace option and\n   ``--neuron-core-pipeline`` compiler option. Dynamic batching now uses\n   ``OpenMP`` to execute pipeline batches concurrently.\n-  Fixed ``torch_neuron.trace`` issues:\n\n   -  Fixed a failure when the same submodule was traced with multiple\n      inputs\n   -  Fixed a failure where some operations would fail to be called with\n      the correct arguments\n   -  Fixed a failure where custom operators (torch plugins) would cause\n      a trace failure\n\n-  Fixed variants of ``aten::upsample_bilinear2d`` when\n   ``scale_factor=1``\n-  Fixed variants of ``aten::expand`` using ``dim=-1``\n-  Fixed variants of ``aten::stack`` using multiple different input data\n   types\n-  Fixed variants of ``aten::max`` using indices outputs\n\n\n[1.8.1.1.5.21.0]\n--------------------------------------------------\n\nDate: 08/12/2021\n\nSummary\n~~~~~~~\n\n- Minor updates.\n\n\n.. _neuron-torch-1570:\n\n[1.8.1.1.5.7.0]\n--------------------------------------------------\n\nDate: 07/02/2021\n\nSummary\n~~~~~~~\n\n- Added support for dictionary outputs using ``strict=False`` flag. See\n  :ref:`/archive/torch-neuron/troubleshooting-guide.rst`.\n- Updated ``aten::batch_norm`` to correctly implement the ``affine`` flag.\n- Added support for ``aten::erf`` and ``prim::DictConstruct``. See\n  :ref:`neuron-cc-ops-pytorch`.\n- Added dynamic batch support. See\n  :ref:`/archive/torch-neuron/api-compilation-python-api.rst`.\n\n\n.. 
_neuron-torch-1410:\n\n[1.8.1.1.4.1.0]\n--------------------------------------------------\n\nDate: 5/28/2021\n\nSummary\n~~~~~~~~\n\n* Added support for PyTorch 1.8.1\n\n  * Models compatibility\n\n    * Models compiled with previous versions of PyTorch Neuron (<1.8.1) are compatible with PyTorch Neuron 1.8.1.\n    * Models compiled with PyTorch Neuron 1.8.1 are not backward compatible with previous versions of PyTorch Neuron (<1.8.1) .\n\n  * Updated  tutorials to use Hugging Face Transformers 4.6.0.\n  * Added a new set of forward operators (forward_v2)\n  * Host memory allocation when loading the same model on multiple NeuronCores is significantly reduced\n  * Fixed an issue where models would not deallocate all memory within a python session after being garbage collected.\n  * Fixed a TorchScript/C++ issue where loading the same model multiple times would not use multiple NeuronCores by default.\n\n\n* Fixed logging to no longer configure the root logger.\n* Removed informative messages that were produced during compilations as warnings.  The number of warnings reduced significantly.\n* Convolution operator support has been extended to include ConvTranspose2d variants.\n* Reduce the amount of host memory usage during inference.\n\n\n.. _neuron-torch-1350:\n\n[1.7.1.1.3.5.0]\n--------------------------------------------------\n\nDate: 4/30/2021\n\nSummary\n~~~~~~~\n\n- ResNext models now functional with new operator support\n- Yolov5 support refer to https://github.com/aws/aws-neuron-sdk/issues/253 note https://github.com/ultralytics/yolov5/pull/2953 which optimized YoloV5 for AWS Neuron\n- Convolution operator support has been extended to include most Conv1d and Conv3d variants\n- New operator support.  Please see :ref:`neuron-cc-ops-pytorch` for the complete list of operators.\n\n.. _neuron-torch-12160:\n\n[1.7.1.1.2.16.0]\n--------------------------------------------------\n\nDate: 3/4/2021\n\nSummary\n~~~~~~~~\n\n-  Minor enhancements.\n\n.. _neuron-torch-12150:\n\n[1.7.1.1.2.15.0]\n--------------------------------------------------\n\nDate: 2/24/2021\n\nSummary\n~~~~~~~\n\n-  Fix for CVE-2021-3177.\n\n.. _neuron-torch-1230:\n\n[1.7.1.1.2.3.0]\n--------------------------------------------------\n\nDate: 1/30/2021\n\nSummary\n~~~~~~~~\n\n-  Made changes to allow models with -inf scalar constants to correctly compile\n-  Added new operator support. Please see :ref:`neuron-cc-ops-pytorch` for the complete list of operators.\n\n.. _neuron-torch-11170:\n\n[1.1.7.0]\n--------------------------------------------------\n\nDate: 12/23/2020\n\nSummary\n~~~~~~~~\n\n-  We are dropping support for Python 3.5 in this release\n-  torch.neuron.trace behavior will now throw a RuntimeError in the case that no operators are compiled for neuron hardware\n-  torch.neuron.trace will now display compilation progress indicators (dots) as default behavior (neuron-cc must updated to the December release to greater to see this feature)\n-  Added new operator support. Please see :ref:`neuron-cc-ops-pytorch` for the complete list of operators.\n-  Extended the BERT pretrained tutorial to demonstrate execution on multiple cores and batch modification, updated the tutorial to accomodate changes in the Hugging Face Transformers code for version 4.0\n-  Added a tutorial for torch-serve which extends the BERT tutorial\n-  Added support for PyTorch 1.7\n\n.. 
_neuron-torch-1019780:\n\n[1.0.1978.0]\n--------------------------------------------------\n\nDate: 11/17/2020\n\nSummary\n~~~~~~~\n\n-  Fixed bugs in comparison operators, and added remaining variantes\n   (eq, ne, gt, ge, lt, le)\n-  Added support for prim::PythonOp - note that this must be run on CPU\n   and not Neuron. We recommend you replace this code with PyTorch\n   operators if possible\n-  Support for a series of new operators. Please see :ref:`neuron-cc-ops-pytorch` for the\n   complete list of operators.\n-  Performance improvements to the runtime library\n-  Correction of a runtime library bug which caused models with large\n   tensors to generate incorrect results in some cases\n\n\n\n.. _neuron-torch-1017210:\n\n[1.0.1721.0]\n--------------------------------------------------\n\nDate: 09/22/2020\n\nSummary\n~~~~~~~\n\n-  Various minor improvements to the Pytorch autopartitioner feature\n-  Support for the operators aten::constant_pad_nd, aten::meshgrid\n-  Improved performance on various torchvision models. Of note are\n   resnet50 and vgg16\n\n.. _neuron-torch-1015320:\n\n[1.0.1532.0]\n--------------------------------------------------\n\nDate: 08/08/2020\n\n.. _torch-summary-1:\n\nSummary\n~~~~~~~\n\n-  Various minor improvements to the Pytorch autopartitioner feature\n-  Support for the aten:ones operator\n\n.. _neuron-torch-1015220:\n\n[1.0.1522.0]\n--------------------------------------------------\n\nDate: 08/05/2020\n\n.. _torch-summary-2:\n\nSummary\n~~~~~~~~\n\nVarious minor improvements.\n\n.. _neuron-torch-1013860:\n\n[1.0.1386.0]\n--------------------------------------------------\n\nDate: 07/16/2020\n\n.. _torch-summary-3:\n\nSummary\n~~~~~~~\n\nThis release adds auto-partitioning, model analysis and PyTorch 1.5.1\nsupport, along with a number of new operators\n\nMajor New Features\n~~~~~~~~~~~~~~~~~~\n\n-  Support for Pytorch 1.5.1\n-  Introduce an automated operator device placement mechanism in\n   torch.neuron.trace to run sub-graphs that contain operators that are\n   not supported by the neuron compiler in native PyTorch. This new\n   mechanism is on by default and can be turned off by adding argument\n   fallback=False to the compiler arguments.\n-  Model analysis to find supported and unsupported operators in a model\n\nResolved Issues\n~~~~~~~~~~~~~~~~\n\n.. _neuron-torch-1011680:\n\n[1.0.1168.0]\n--------------------------------------------------\n\nDate 6/11/2020\n\n.. _torch-summary-4:\n\nSummary\n~~~~~~~\n\n.. _major-new-features-1:\n\nMajor New Features\n~~~~~~~~~~~~~~~~~~\n\n.. _torch-resolved-issues-1:\n\nResolved Issues\n~~~~~~~~~~~~~~~\n\nKnown Issues and Limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. _neuron-torch-1010010:\n\n[1.0.1001.0]\n--------------------------------------------------\n\nDate: 5/11/2020\n\n.. _torch-summary-5:\n\nSummary\n~~~~~~~~\n\nAdditional PyTorch operator support and improved support for model\nsaving and reloading.\n\n.. _major-new-features-2:\n\nMajor New Features\n~~~~~~~~~~~~~~~~~~\n\n-  Added Neuron Compiler support for a number of previously unsupported\n   PyTorch operators. Please see :ref:`neuron-cc-ops-pytorch` for the\n   complete list of operators.\n-  Add support for torch.neuron.trace on models which have previously\n   been saved using torch.jit.save and then reloaded.\n\n.. _torch-resolved-issues-2:\n\nResolved Issues\n~~~~~~~~~~~~~~~~\n\n.. _torch-known-issues-and-limitations-1:\n\nKnown Issues and Limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. 
_neuron-torch-108250:\n\n[1.0.825.0]\n--------------------------------------------------\n\nDate: 3/26/2020\n\n.. _torch-summary-6:\n\nSummary\n~~~~~~~\n\n.. _major-new-features-3:\n\nMajor New Features\n~~~~~~~~~~~~~~~~~~\n\n.. _torch-resolved-issues-3:\n\nResolved Issues\n~~~~~~~~~~~~~~~\n\n.. _torch-known-issues-and-limitations-2:\n\nKnown Issues and limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. _neuron-torch-107630:\n\n[1.0.763.0]\n--------------------------------------------------\n\nDate: 2/27/2020\n\n.. _torch-summary-7:\n\nSummary\n~~~~~~~\n\nAdded Neuron Compiler support for a number of previously unsupported\nPyTorch operators. Please see :ref:`neuron-cc-ops-pytorch` for the complete\nlist of operators.\n\n.. _major-new-features-4:\n\nMajor new features\n~~~~~~~~~~~~~~~~~~\n\n-  None\n\n.. _torch-resolved-issues-4:\n\nResolved issues\n~~~~~~~~~~~~~~~~~\n\n-  None\n\n.. _neuron-torch-106720:\n\n[1.0.672.0]\n--------------------------------------------------\n\nDate: 1/27/2020\n\n.. _torch-summary-8:\n\nSummary\n~~~~~~~~\n\n.. _major-new-features-5:\n\nMajor new features\n~~~~~~~~~~~~~~~~~~\n\n.. _torch-resolved-issues-5:\n\nResolved issues\n~~~~~~~~~~~~~~~~\n\n-  Python 3.5 and Python 3.7 are now supported.\n\n.. _torch-known-issues-and-limitations-3:\n\nKnown issues and limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nOther Notes\n~~~~~~~~~~~\n\n.. _neuron-torch-106270:\n\n[1.0.627.0]\n--------------------------------------------------\n\nDate: 12/20/2019\n\n.. _torch-summary-9:\n\nSummary\n~~~~~~~~\n\nThis is the initial release of torch-neuron. It is not distributed on\nthe DLAMI yet and needs to be installed from the neuron pip repository.\n\nNote that we are currently using a TensorFlow as an intermediate format\nto pass to our compiler. This does not affect any runtime execution from\nPyTorch to Neuron Runtime and Inferentia. This is why the neuron-cc\ninstallation must include [tensorflow] for PyTorch.\n\n.. _major-new-features-6:\n\nMajor new features\n~~~~~~~~~~~~~~~~~~\n\n.. _torch-resolved-issues-6:\n\nResolved issues\n~~~~~~~~~~~~~~~\n\n.. _torch-known-issues-and-limitations-4:\n\nKnown issues and limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nModels TESTED\n~~~~~~~~~~~~~~\n\nThe following models have successfully run on neuron-inferentia systems\n\n1. SqueezeNet\n2. ResNet50\n3. Wide ResNet50\n\nPytorch Serving\n~~~~~~~~~~~~~~~\n\nIn this initial version there is no specific serving support. Inference\nworks correctly through Python on Inf1 instances using the neuron\nruntime. Future releases will include support for production deployment\nand serving of models\n\nProfiler support\n~~~~~~~~~~~~~~~~\n\nProfiler support is not provided in this initial release and will be\navailable in future releases\n\nAutomated partitioning\n~~~~~~~~~~~~~~~~~~~~~~\n\nAutomatic partitioning of graphs into supported and non-supported\noperations is not currently supported. A tutorial is available to\nprovide guidance on how to manually parition a model graph. Please see\n:ref:`pytorch-manual-partitioning-jn-tutorial`\n\nPyTorch dependency\n~~~~~~~~~~~~~~~~~~\n\nCurrently PyTorch support depends on a Neuron specific version of\nPyTorch v1.3.1. Future revisions will add support for 1.4 and future\nreleases.\n\nTrace behavior\n~~~~~~~~~~~~~~\n\nIn order to trace a model it must be in evaluation mode. 
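A minimal sketch of that flow is shown below; ``MyModel`` and the example input shape are illustrative placeholders, not part of this release.\n\n.. code:: python\n\n   import torch\n   import torch_neuron  # provides torch_neuron.trace\n\n   model = MyModel()    # placeholder: any torch.nn.Module the tracer supports\n   model.eval()         # the model must be in evaluation mode before tracing\n   example = torch.rand(1, 3, 224, 224)  # example input matching the model's expected input shape\n\n   model_neuron = torch_neuron.trace(model, example_inputs=[example])\n   model_neuron.save('model_neuron.pt')\n\n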
For examples\nplease see :ref:`/src/examples/pytorch/resnet50.ipynb`\n\nSix pip package is required\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Six package is required for the torch-neuron runtime, but it is not\nmodeled in the package dependencies. This will be fixed in a future\nrelease.\n\nMultiple NeuronCore support\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf the num-neuroncores option is used, the number of cores must be\nmanually set in the calling shell environment variable for compilation\nand inference.\n\nFor example: using the keyword argument\ncompiler_args=['--num-neuroncores', '4'] in the trace call requires\nNEURONCORE_GROUP_SIZES=4 to be set in the environment at compile time\nand at runtime.\n\nCPU execution\n~~~~~~~~~~~~~~\n\nAt compilation time a constant output is generated for the purposes of\ntracing. Running inference on a non-Neuron instance will generate\nincorrect results. These outputs must not be used. The following error message is\ngenerated to stderr:\n\n::\n\n   Warning: Tensor output are ** NOT CALCULATED ** during CPU execution and only\n   indicate tensor shape\n\n.. _other-notes-1:\n\nOther notes\n~~~~~~~~~~~\n\n-  Python version(s) supported:\n\n   -  3.6\n\n-  Linux distribution supported:\n\n   -  DLAMI Ubuntu 18 and Amazon Linux 2 (using Python 3.6 Conda environments)\n   -  Other AMIs based on Ubuntu 18\n   -  For Amazon Linux 2 please install Conda and use Python 3.6 Conda\n      environment\n"
  },
  {
    "path": "release-notes/components/compiler.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron Compiler component across all AWS Neuron SDK versions.\n    :keywords: neuron compiler, neuronx-cc, release notes, aws neuron sdk\n    :date-modified: 12/19/2025\n\n.. _compiler_rn:\n\nComponent Release Notes for NeuronX Graph Compiler\n==================================================\n\n**Latest version (in 2.29.0)**: 2.24.5133.0\n\nThe release notes for the NeuronX Graph Compiler (neuronx-cc) Neuron component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. note::\n    For older Neuron Compiler (neuron-cc) release notes, see :doc:`the archived Neuron Compiler release notes </release-notes/archive/neuron-cc/neuron-cc>` and :doc:`Neuron Compiler operations release notes </release-notes/archive/neuron-cc/neuron-cc-ops/index>`.\n\n.. _compiler-2-27-0-rn:\n\nNeuron Compiler [2.15.54.0] (Neuron 2.27.0 Release)\n----------------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* New error code documentation has been added to help developers better understand and troubleshoot issues encountered during model compilation.\n* Two Neuron Compiler (neuronxcc) flags now have different default behaviors to improve accuracy. The ``--auto-cast`` flag now defaults to ``none`` (previously ``matmul``), and ``--enable-mixed-precision-accumulation`` is now enabled by default.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Python 3.9 no longer supported: The Neuron Compiler requires Python 3.10 or higher. Users currently on Python 3.9 must upgrade to continue using the Neuron Compiler with Python bindings.\n* Compiler accuracy flag defaults updated: These changes optimize accuracy but may impact performance for FP32 models and models using smaller bitwidth dtypes. To restore previous behavior, explicitly set ``--auto-cast=matmul`` and use the new ``--disable-mixed-precision-accumulation`` flag.\n\nBug Fixes\n~~~~~~~~~\n\n* Minor bug fixes and performance enhancements for both the ``trn1`` and ``trn2`` platforms.\n\n\n----\n\n.. _compiler-2-25-0-rn:\n\nNeuron Compiler [2.14.77.0] (Neuron 2.25.0 Release)\n----------------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Minor bug fixes and performance enhancements for both the ``trn1`` and ``trn2`` platforms.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Announcement: The Neuron Compiler default for the ``--auto-cast`` option will change from ``--auto-cast=matmult`` to ``--auto-cast=none`` in a future release.\n\nBug Fixes\n~~~~~~~~~\n\n* Minor bug fixes and performance enhancements for both the trn1 and trn2 platforms.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* The Llama3 70B test has a compile time increase of 16% and 18%, for 16 and 32 nodes respectively.\n\n\n----\n\nNeuron Compiler [2.19.8089.0]\n-------------------------------\nDate: 06/24/2025\n\n* This update enables the Hardware DMA Generation Engine (Hardware DGE) by default on Trainium2 instances. Hardware DGE is a memory optimization that will reduce the generated compiler artifacts size (i.e., NEFFs) and the models’ memory footprint.  Data movement (DMA) descriptors will be generated in the hardware, as needed, during model execution. 
This reduces (HBM) memory usage within the NEFF and allows the use of more DMA queues.\n\n\n----\n\nNeuron Compiler [2.18.121.0]\n----------------------------\nDate: 05/19/2025\n\n* Minor bug fixes and performance enhancements for both the trn1 and trn2 platforms.\n\n\n----\n\nNeuron Compiler [2.17.194.0]\n----------------------------\nDate: 04/03/2025\n\n* Minor bug fixes and performance enhancements for both the trn1 and trn2 platforms.\n\n\n----\n\nNeuron Compiler [2.16.372.0]\n----------------------------\nDate: 01/14/2025\n\n* Minor bug fixes and performance enhancements for the trn2 platform.\n\n\n----\n\nNeuron Compiler [2.16.345.0]\n----------------------------\nDate: 12/20/2024\n\n* Minor bug fixes and performance enhancements for the trn2 platform.\n\n\n----\n\nNeuron Compiler [Neuron 2.21.0 Beta]\n--------------------------------------\nDate: 12/03/2024\n\n* This release introduces the ``trn2`` option argument to the compiler ``--target`` option to specify that the compiler should\n  generate code for a trn2 instance family. Example usage: ``neuronx-cc compile --target=trn2 ...``\n  \n* This release introduces the ``--logical-nc-config`` or ``-lnc`` compiler command line option in support of the Logical NeuronCore Configuration feature available in Trainium2 instances. The compiler's default is LNC=2.  **Note: Use of this option is available only for Trainium2 instances.**\n\n\n----\n\nNeuron Compiler [2.15.128.0]\n----------------------------\nDate: 09/16/2024\n\n* This release introduces memory optimization that will reduce the generated compiler artifacts size (i.e., NEFFs) and the models' memory footprint. It is possible that some models may experience unexpected performance degradation. If this occurs, these optimizations can be disabled using the --disable-dge compiler command line option or the framework-level option ``additional_compile_opt=\" --disable-dge\"``\n\n\n----\n\nNeuron Compiler [2.14.213.0]\n----------------------------\nDate: 07/03/2024\n\n* Minor bug fixes and performance enhancements.\n* Improved flash attention kernel performance.\n\n\n----\n\nNeuron Compiler [2.13.72.0]\n----------------------------\nDate: 04/25/2024\n\n* Minor bug fixes and enhancements.\n\n\n\n----\n\nNeuron Compiler [2.13.68.0]\n----------------------------\nDate: 04/10/2024\n\n* This release fixes hang issues related to Triton Inference Server.\n\n\n\n----\n\nNeuron Compiler [2.13.66.0]\n----------------------------\nDate: 04/01/2024\n\n* This release introduces a new ``--enable-mixed-precision-accumulation`` compiler option. This option instructs the compiler to perform intermediate calculations of reduction operators (such as the dot or reduce operators) in FP32 regardless of the operation's defined datatype. The final result of the operator will be cast from FP32 to the model-designated datatype (e.g., BF16). This helps to improve the operator's resulting acccuracy.\n\n\n\n----\n\nNeuron Compiler [2.12.68.0]\n----------------------------\nDate: 01/18/2024\n\n* Patch release with bug fixes.\n\n\n\n----\n\nNeuron Compiler [2.12.54.0]\n---------------------------\nDate: 12/21/2023\n\n* The compiler now generates instructions to check if a model references an embedding table with an illegal index. The check is made at model execution time. 
If an attempted invalid table index is encountered, the model execution will continue and the user will see an error similar to:\n\n      WARNING: Received notification generated at runtime: failed to run scatter/gather (indirect memory copy with branch_label_id = xx), due to out-of-bound access.\n\nWhen this occurs, users are encouraged to review the model's gather/scatter input values to determine if there is a coding error.\n\n\n\n----\n\nNeuron Compiler [2.11.0.35]\n---------------------------\nDate: 11/17/2023\n\n* This release addresses performance related issues when training through ``neuronx-nemo-megatron`` library.\n\n\n\n----\n\nNeuron Compiler [2.11.0.34]\n-----------------------------\nDate: 10/26/2023\n\n* This release introduces the option-argument ``llm-training`` to the existing ``--distribution_strategy`` compiler option. This option-argument allows the compiler to make specific optimizations related to training distributed models. This new option-argument is equivalent to the previously introduced ``nemo`` option-argument, which will be deprecated in a future release.\n\n\n\n----\n\nNeuron Compiler [2.10.0.35]\n-----------------------------\nDate: 09/26/2023\n\n* This release addresses a compilation regression for certain configurations of Llama and Llama-2 inference models when it fails compilation with this error \"IndirectLoad/Save requires contiguous indirect access per partition\" .\n\nThere is still a known issue for some configurations of the model with the error \"Too many instructions after unroll for function sg0000\" . To mitigate this, recompile using the ``--optlevel 1 (-O1)`` option. A complete fix will be coming in the future release which will not require this option\n\n\n----\n\nNeuron Compiler [2.10.0.34]\n-----------------------------\nDate: 09/15/2023\n\n* This release introduces a new ``--optlevel (-O)`` compiler option. This option allows the user to balance between compile-time and optimizations performed.\n  Three levels are supported. Level ``--optlevel 1 (-O1)`` aims to minimize compile-time and allow for a more rapid model development cycle. Model execution\n  time may be reduced. Level ``--optlevel 3 (-O3)`` performs whole-model optimization. This level will deliver the best performance however there will be longer\n  compile-times and the compiler will use more host DRAM, potentially requiring a larger instance to compile the model.\n  The default is ``--optlevel 2 (-O2)`` which provides a balance between model performance and compile time. \n\n  The previous ``—enable-experimental-O1`` flag introduced in the 02/08/2023 Neuron Compiler [2.4.0.21] release is now deprecated. Using this flag\n  will generate a message similar to:\n  \n      WARNING: Option —enable-experimental-O1 is deprecated and will be removed in a future release.\" Use ``--optlevel 1 (-O1)`` instead.\n\n\n----\n\nNeuron Compiler [2.9.0.16]\n-----------------------------\nDate: 08/28/2023\n\n* This release fixes an issue where any initial seed passed into the Random Number Generator operator was not honored. The RngBitGenerator operator now correctly accepts and uses setting the seed. Note that the current RNG implementation only supports 32-bit seeds.\n\n\n----\n\nNeuron Compiler [2.8.0.25]\n-----------------------------\nDate: 07/19/2023\n\n* This release introduces a new optional ``--distribution_strategy`` compiler option. This option informs the compiler what type of distributed APIs are used to shard the model and allows the compiler to make API-specific optimizations. 
\n\n\n----\n\nNeuron Compiler [2.7.0.40]\n-----------------------------\nDate: 06/14/2023\n\n* This release introduces a new ``--enable-saturate-infinity`` compiler option. A computation that can generate +/- infinity is at a high\n  risk of generating Not-a-Number (NaN) values when the infinity value is used in subsequent computations. This option helps avoid this\n  by converting +Inf/-Inf values to MAX/MIN_FLOAT before operations that could produce NaN values for +Inf/-Inf inputs on the target\n  architecture. While this option helps to avoid NaN values, there is a potential performance degradation that occurs during model\n  execution when this conversion is enabled.\n  \n\n----\n\nNeuron Compiler [2.6.0.19]\n-----------------------------\nDate: 05/01/2023\n\n* This release introduces a new ``model-type`` option argument: ``unet-inference``.\n  This option instructs the compiler to perform model-specific optimizations that produce executable models with improved performance\n  on the specified target instance.\n  \n* Added support for the HLO operator ``BitcastConvertType`` and also added support for the ``TopK`` (sampling mode) operator.\n\n\n----\n\nNeuron Compiler [2.5.0.28]\n-----------------------------\nDate: 03/28/2023\n\n* This release introduces the ``trn1n`` option argument to the compiler ``target`` option to specify that it should\n  generate code for a trn1n instance type. Example usage: ``neuronx-cc compile --target=trn1n ...``\n  \n* The compiler's usage message now includes the ``inf2`` option argument.\n\n* A new 8-bit floating point data type, ``fp8_e4m3``, is now supported and can be specified using the ``--auto-cast-type`` option.\n  This instructs the compiler to convert the FP32 operations selected via the ``--auto-cast`` option to a signed FP8 format\n  with a 4-bit exponent and a 3-bit mantissa. Care must be taken to ensure that the down-casted values are representable within the 8-bit data range.
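\n\n  As a hedged illustration only (not a command line from these notes; the ``all`` selector for ``--auto-cast`` and the model artifact name are assumptions made for this example), combining the two options in a direct compiler invocation might look like::\n\n      neuronx-cc compile --target=inf2 --auto-cast all --auto-cast-type fp8_e4m3 <model file> ...\n\n  As noted above, verify that the values being down-cast remain representable in the 8-bit range before relying on this data type.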
\n\n\n----\n\nNeuron Compiler [2.4.0.21]\n-----------------------------\nDate: 02/24/2023\n\n* This release introduces the ``inf2`` option argument to the compiler ``target`` option to specify that it should\n  generate code for an inf2 instance type. Example usage: ``neuronx-cc compile --target=inf2 ...``\n  The ``inf2`` option argument does not appear in the compiler's usage message. It will be added in the next release.\n\n\n----\n\nNeuron Compiler [2.4.0.21]\n-----------------------------\nDate: 02/08/2023\n\n* Added support for the following HLO operators: ``SelectAndScatter``.\n* Beta: ``--enable-experimental-O1`` flag: This option reduces compile time with a negligible impact on model execution performance.\n  It allows the compiler to execute compiler passes in parallel to perform the compilation. By default, the compiler uses 8 processes.\n  This can be changed via the CLI option ``--num-parallel-jobs``. This option is expected to become the default in a future SDK release.\n\n\n----\n\nNeuron Compiler [2.3.0.4]\n-----------------------------\nDate: 12/09/2022\n\n* Added support for the following HLO operators: ``rev (reverse)``.\n* The ``pow()`` function can now handle both integer and floating-point exponents.\n* Optimization enhancements and bug fixes to improve model execution performance.\n\n\n\n----\n\nNeuron Compiler [2.2.0.73]\n-----------------------------\nDate: 10/27/2022\n\n* Added support for the following HLO operators: ``LogicalNot``, ``atan2`` and ``DynamicUpdateSlice`` (for constant index).\n\n\n----\n\nNeuron Compiler [2.1.0.76]\n-----------------------------\nDate: 10/05/2022\n\n\nThe Neuron Compiler is an Ahead-of-Time compiler that accelerates models for\nexecution on NeuronCores. This release supports compiling models for training\non a Trn1 instance using PyTorch Neuron. Users typically access the compiler via\nthe framework to perform model compilation, although it can also be run\nas a command line tool (*neuronx-cc*).\n\n\nThe Neuron Compiler supports compiling models for mixed precision calculations.\nThe Trn1 hardware supports matrix multiplication using FP16, BF16, and FP32 on\nits Matrix Multiplication Engine, and accumulations using FP32. Operators such as\nactivations or vector operations are supported using FP16, BF16, and FP32.\nTensor transpose can be accomplished in FP16, BF16, FP32, or TF32 datatypes.\nBy default, scalar and vector operations on FP32 values will be done in FP32,\nwhile matrix multiplications are cast to BF16 and transpose operations are cast to FP32.\nThis default casting will generate the highest performance for an FP32-trained model.\n\nBy default, the compiler will target maximum performance by automatically casting\nthe model to mixed precision. It also provides an option (``--auto-cast``) that\nallows the user to make tradeoffs between higher performance and optimal accuracy.\nThe decision on which option argument to use with the ``--auto-cast`` option will be\napplication specific. Compiler CLI options can be passed to the compiler via the framework.\n\nKnown issues\n~~~~~~~~~~~~\n\n-  The Random Number Generator operation can be passed an initial seed\n   value; however, setting the seed is not supported in this release.\n-  The exponent value of the pow() function must be a compile-time\n   integer constant.\n-  The compiler treats INT64 datatypes as INT32 by truncating the\n   high-order bits. If possible, cast these values to 32 bits.\n-  Model compilation time is proportional to the model size and\n   operators used. For some larger NLP models, it may be upwards of 30\n   minutes.\n\n\n\n----\n\nSupported Operators\n-------------------\n\nThe following XLA operators are supported by the Neuron Compiler. 
\nFuture releases will broaden model support by providing additional XLA operators defined in\nhttps://www.tensorflow.org/xla/operation_semantics.\n\nThe list of supported operators can also be retrieved from the command line using :ref:`neuronx-cc list-operators<neuronx-cc-list-operators>`.\n\n+-------------------------+-------------------------------------------+\n| Supported XLA Operators | Notes                                     |\n+=========================+===========================================+\n| Abs                     |                                           |\n+-------------------------+-------------------------------------------+\n| Add                     |                                           |\n+-------------------------+-------------------------------------------+\n| Allgather               |                                           |\n+-------------------------+-------------------------------------------+\n| Allreduce               |                                           |\n+-------------------------+-------------------------------------------+\n| Atan2                   |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnorm               |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnormgrad           |                                           |\n+-------------------------+-------------------------------------------+\n| Batchnorminference      |                                           |\n+-------------------------+-------------------------------------------+\n| BitcastConvertType      |                                           |\n+-------------------------+-------------------------------------------+\n| Broadcast               |                                           |\n+-------------------------+-------------------------------------------+\n| BroadcastInDim          |                                           |\n+-------------------------+-------------------------------------------+\n| Ceil                    |                                           |\n+-------------------------+-------------------------------------------+\n| Clamp                   |                                           |\n+-------------------------+-------------------------------------------+\n| Compare                 |                                           |\n+-------------------------+-------------------------------------------+\n| Concatenate             |                                           |\n+-------------------------+-------------------------------------------+\n| Constant                |                                           |\n+-------------------------+-------------------------------------------+\n| ConstantLiteral         |                                           |\n+-------------------------+-------------------------------------------+\n| ConvertElementType      |                                           |\n+-------------------------+-------------------------------------------+\n| Cos                     |                                           |\n+-------------------------+-------------------------------------------+\n| Customcall              |                                           |\n+-------------------------+-------------------------------------------+\n| Div                     |                                           
|\n+-------------------------+-------------------------------------------+\n| Dot                     |                                           |\n+-------------------------+-------------------------------------------+\n| DotGeneral              |                                           |\n+-------------------------+-------------------------------------------+\n| DynamicUpdateSlice      | Supports only for constant index          |\n+-------------------------+-------------------------------------------+\n| Eq                      |                                           |\n+-------------------------+-------------------------------------------+\n| Exp                     |                                           |\n+-------------------------+-------------------------------------------+\n| Floor                   |                                           |\n+-------------------------+-------------------------------------------+\n| Gather                  | Supports only disjoint start_index_map    |\n|                         | and remapped_offset_dims                  |\n+-------------------------+-------------------------------------------+\n| Ge                      |                                           |\n+-------------------------+-------------------------------------------+\n| GetTupleElement         |                                           |\n+-------------------------+-------------------------------------------+\n| Gt                      |                                           |\n+-------------------------+-------------------------------------------+\n| Iota                    |                                           |\n+-------------------------+-------------------------------------------+\n| Le                      |                                           |\n+-------------------------+-------------------------------------------+\n| Log                     |                                           |\n+-------------------------+-------------------------------------------+\n| LogicalAnd              |                                           |\n+-------------------------+-------------------------------------------+\n| LogicalNot              |                                           |\n+-------------------------+-------------------------------------------+\n| Lt                      |                                           |\n+-------------------------+-------------------------------------------+\n| Max                     |                                           |\n+-------------------------+-------------------------------------------+\n| Min                     |                                           |\n+-------------------------+-------------------------------------------+\n| Mul                     |                                           |\n+-------------------------+-------------------------------------------+\n| Ne                      |                                           |\n+-------------------------+-------------------------------------------+\n| Neg                     |                                           |\n+-------------------------+-------------------------------------------+\n| Pad                     |                                           |\n+-------------------------+-------------------------------------------+\n| Pow                     | Exponent argument must be a compile-time  |\n|                         | integer constant                          
|\n+-------------------------+-------------------------------------------+\n| Reduce                  | Min, Max, Add and Mul are the only        |\n|                         | supported computations. Init_values must  |\n|                         | be constant                               |\n+-------------------------+-------------------------------------------+\n| Reshape                 |                                           |\n+-------------------------+-------------------------------------------+\n| Rev (reverse)           |                                           |\n+-------------------------+-------------------------------------------+\n| RngBitGenerator         | Ignores user seed                         |\n+-------------------------+-------------------------------------------+\n| RngUniform              |                                           |\n+-------------------------+-------------------------------------------+\n| Rsqrt                   |                                           |\n+-------------------------+-------------------------------------------+\n| Scatter                 |                                           |\n+-------------------------+-------------------------------------------+\n| Select                  |                                           |\n+-------------------------+-------------------------------------------+\n| SelectAndScatter        |                                           |\n+-------------------------+-------------------------------------------+\n| ShiftRightLogical       |                                           |\n+-------------------------+-------------------------------------------+\n| Sign                    |                                           |\n+-------------------------+-------------------------------------------+\n| Sin                     |                                           |\n+-------------------------+-------------------------------------------+\n| Slice                   |                                           |\n+-------------------------+-------------------------------------------+\n| Sqrt                    |                                           |\n+-------------------------+-------------------------------------------+\n| Sub                     |                                           |\n+-------------------------+-------------------------------------------+\n| Tanh                    |                                           |\n+-------------------------+-------------------------------------------+\n| Transpose               |                                           |\n+-------------------------+-------------------------------------------+\n| Tuple                   |                                           |\n+-------------------------+-------------------------------------------+\n\n"
  },
  {
    "path": "release-notes/components/containers.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron Containers component across all AWS Neuron SDK versions.\n    :keywords: neuron containers, dlc, kubernetes, k8s, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _containers_rn:\n\nComponent Release Notes for Neuron Containers\n==============================================\n\nThe release notes for the Neuron Containers component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _containers-2-29-0-rn:   \n\nNeuron Containers (Neuron 2.29.0 Release)\n--------------------------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* All Neuron packages and their dependencies have been upgraded to support AWS Neuron SDK version 2.29.0.\n\n\nCallouts\n~~~~~~~~~~~~~~~~\n\n.. important::\n   Announcing maintenance mode for NxDT and NxD Core Training APIs starting next release. Starting with Neuron 2.30.0, NxDT and NxD Core Training APIs are entering maintenance mode. Future releases will address critical security issues only and support will be gradually ended. The ``pytorch-training-neuronx`` DLC has been pinned to include ``neuronx_distributed_training-1.7.0``.\n\n   **How does this impact you?**\n\n   Existing NxDT/NxD Core users should stay on Neuron 2.28 and PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.30.0 and PyTorch 2.10. A migration guide will be published in a coming release.\n\n   For more information, see :doc:`/frameworks/torch/pytorch-native-overview`.\n\n\n.. _containers-2-28-0-rn:   \n\nNeuron Containers (Neuron 2.28.0 Release)\n--------------------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Introduced the Neuron DRA Driver, which enables advanced resource allocation capabilities using the Kubernetes Dynamic Resource Allocation (DRA) API for more flexible and efficient management of Neuron devices. For more details, see :doc:`/containers/neuron-dra`.\n* Added Neuron DRA Driver support to the Neuron Helm Charts. For more details, see :doc:`the updated Helm documentation under the Kubernetes Getting Started page </containers/kubernetes-getting-started>`.\n\n.. _containers-2-27-0-rn:\n\nNeuron Containers (Neuron 2.27.0 Release)\n---------------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Added new pytorch-inference-vllm-neuronx 0.11.0 DLC with PyTorch 2.8, vLLM V1 with the vLLM-Neuron Plugin, tools, NxDI and all dependencies to run vLLM out of the box.\n* Upgraded pytorch-training-neuronx and pytorch-inference-neuronx DLCs to PyTorch 2.9.0 with related dependencies.\n* Upgraded jax-training-neuronx DLC to JAX 0.7.0 with related dependencies.\n* Upgraded base image to Ubuntu 24.04 and Python 3.12 in all DLCs.\n* Upgraded all Neuron packages and dependencies to support AWS Neuron SDK version 2.27.\n\nKnown Issues\n~~~~~~~~~~~~\n\n**Note**: Common Vulnerability and Exposure (CVE) identifiers are assigned to publicly disclosed cybersecurity vulnerabilities. 
CVE identifiers help security professionals and software vendors coordinate their efforts to address and mitigate vulnerabilities.\n\n* ``pytorch-training-neuronx``: 0.9.0 DLC has multiple CRITICAL and HIGH CVEs. We are actively working to resolve them.\n   * `CVE-2021-44906 <https://nvd.nist.gov/vuln/detail/CVE-2021-44906>`_ - Prototype Pollution vulnerability in minimist package\n   * `CVE-2023-38039 <https://nvd.nist.gov/vuln/detail/CVE-2023-38039>`_ - Memory exhaustion vulnerability in curl/libcurl from unlimited header processing\n   * `CVE-2021-35517 <https://nvd.nist.gov/vuln/detail/CVE-2021-35517>`_ - Denial of service vulnerability in Apache Commons Compress TAR archive processing\n   * `CVE-2022-29217 <https://nvd.nist.gov/vuln/detail/CVE-2022-29217>`_ - JWT signing algorithm confusion vulnerability in PyJWT library\n   * `CVE-2025-58056 <https://nvd.nist.gov/vuln/detail/CVE-2025-58056>`_ - HTTP request smuggling vulnerability in Netty codec\n   * `CVE-2024-45337 <https://nvd.nist.gov/vuln/detail/CVE-2024-45337>`_ - Authorization bypass vulnerability in golang.org/x/crypto SSH implementation\n   * `CVE-2024-56201 <https://nvd.nist.gov/vuln/detail/CVE-2024-56201>`_ - Remote code execution vulnerability in Jinja templating engine\n   * `CVE-2025-0725 <https://nvd.nist.gov/vuln/detail/CVE-2025-0725>`_ - Buffer overflow vulnerability in curl/libcurl gzip decompression\n   * `CVE-2023-36665 <https://nvd.nist.gov/vuln/detail/CVE-2023-36665>`_ - Prototype Pollution vulnerability in protobufjs library\n   * `CVE-2023-45288 <https://nvd.nist.gov/vuln/detail/CVE-2023-45288>`_ - HTTP/2 CONTINUATION frame DoS vulnerability in golang.org/x/net\n   * `CVE-2021-33194 <https://nvd.nist.gov/vuln/detail/CVE-2021-33194>`_ - Infinite loop vulnerability in golang.org/x/net ParseFragment\n   * `CVE-2023-41419 <https://nvd.nist.gov/vuln/detail/CVE-2023-41419>`_ - Privilege escalation vulnerability in gevent WSGIServer\n   * `CVE-2021-35516 <https://nvd.nist.gov/vuln/detail/CVE-2021-35516>`_ - Memory exhaustion vulnerability in Apache Commons Compress 7Z processing\n   * `CVE-2022-24771 <https://nvd.nist.gov/vuln/detail/CVE-2022-24771>`_ - RSA signature verification vulnerability in node-forge\n   * `CVE-2022-41723 <https://nvd.nist.gov/vuln/detail/CVE-2022-41723>`_ - HTTP/2 HPACK decoder DoS vulnerability in golang.org/x/net\n   * `CVE-2025-66031 <https://nvd.nist.gov/vuln/detail/CVE-2025-66031>`_ - Uncontrolled recursion DoS vulnerability in node-forge ASN.1 parsing\n   * `CVE-2025-58057 <https://nvd.nist.gov/vuln/detail/CVE-2025-58057>`_ - Memory exhaustion vulnerability in Netty BrotliDecoder\n   * `CVE-2023-50782 <https://nvd.nist.gov/vuln/detail/CVE-2023-50782>`_ - TLS RSA key exchange vulnerability in python-cryptography\n   * `CVE-2022-24772 <https://nvd.nist.gov/vuln/detail/CVE-2022-24772>`_ - RSA signature verification vulnerability in node-forge DigestInfo\n   * `CVE-2022-27664 <https://nvd.nist.gov/vuln/detail/CVE-2022-27664>`_ - HTTP/2 connection hang DoS vulnerability in golang.org/x/net\n   * `CVE-2024-56326 <https://nvd.nist.gov/vuln/detail/CVE-2024-56326>`_ - Sandbox bypass vulnerability in Jinja str.format detection\n   * `CVE-2024-3651 <https://nvd.nist.gov/vuln/detail/CVE-2024-3651>`_ - Quadratic complexity DoS vulnerability in idna.encode() function\n   * `CVE-2023-49083 <https://nvd.nist.gov/vuln/detail/CVE-2023-49083>`_ - NULL-pointer dereference vulnerability in cryptography PKCS7 processing\n   * `CVE-2024-22189 <https://nvd.nist.gov/vuln/detail/CVE-2024-22189>`_ - Memory 
exhaustion vulnerability in quic-go NEW_CONNECTION_ID frames\n   * `CVE-2025-47273 <https://nvd.nist.gov/vuln/detail/CVE-2025-47273>`_ - Path traversal vulnerability in setuptools PackageIndex\n   * `CVE-2025-66418 <https://nvd.nist.gov/vuln/detail/CVE-2025-66418>`_ - Unbounded decompression chain vulnerability in urllib3\n   * `CVE-2021-23337 <https://nvd.nist.gov/vuln/detail/CVE-2021-23337>`_ - Command injection vulnerability in lodash template function\n   * `CVE-2023-29824 <https://nvd.nist.gov/vuln/detail/CVE-2023-29824>`_ - Use-after-free vulnerability in SciPy Py_FindObjects() function\n   * `CVE-2025-12816 <https://nvd.nist.gov/vuln/detail/CVE-2025-12816>`_ - ASN.1 schema validation bypass vulnerability in node-forge\n   * `CVE-2025-22869 <https://nvd.nist.gov/vuln/detail/CVE-2025-22869>`_ - SSH file transfer DoS vulnerability in golang.org/x/crypto\n   * `CVE-2025-59530 <https://nvd.nist.gov/vuln/detail/CVE-2025-59530>`_ - HANDSHAKE_DONE frame DoS vulnerability in quic-go\n   * `CVE-2024-6345 <https://nvd.nist.gov/vuln/detail/CVE-2024-6345>`_ - Remote code execution vulnerability in setuptools package_index\n   * `CVE-2023-27533 <https://nvd.nist.gov/vuln/detail/CVE-2023-27533>`_ - TELNET protocol input validation vulnerability in curl/libcurl\n   * `CVE-2021-36090 <https://nvd.nist.gov/vuln/detail/CVE-2021-36090>`_ - Memory exhaustion vulnerability in Apache Commons Compress ZIP processing\n   * `CVE-2025-66471 <https://nvd.nist.gov/vuln/detail/CVE-2025-66471>`_ - Highly compressed data handling vulnerability in urllib3 Streaming API\n   * `CVE-2023-43804 <https://nvd.nist.gov/vuln/detail/CVE-2023-43804>`_ - Cookie header information leak vulnerability in urllib3 redirects\n   * `CVE-2022-25878 <https://nvd.nist.gov/vuln/detail/CVE-2022-25878>`_ - Prototype Pollution vulnerability in protobufjs util.setProperty\n   * `CVE-2021-35515 <https://nvd.nist.gov/vuln/detail/CVE-2021-35515>`_ - Infinite loop vulnerability in Apache Commons Compress 7Z codec construction\n   * `CVE-2021-38561 <https://nvd.nist.gov/vuln/detail/CVE-2021-38561>`_ - Out-of-bounds read vulnerability in golang.org/x/text BCP 47 parsing\n   * `CVE-2022-43551 <https://nvd.nist.gov/vuln/detail/CVE-2022-43551>`_ - HSTS bypass vulnerability in curl/libcurl IDN handling\n   * `CVE-2022-27191 <https://nvd.nist.gov/vuln/detail/CVE-2022-27191>`_ - SSH server crash vulnerability in golang.org/x/crypto AddHostKey\n   * GHSA-m425-mq94-257g - HTTP/2 concurrent stream limit bypass vulnerability in gRPC-Go\n   * `CVE-2023-39325 <https://nvd.nist.gov/vuln/detail/CVE-2023-39325>`_ - HTTP/2 request reset DoS vulnerability in golang.org/x/net\n   * `CVE-2024-2398 <https://nvd.nist.gov/vuln/detail/CVE-2024-2398>`_ - Memory leak vulnerability in curl/libcurl HTTP/2 server push\n   * `CVE-2023-44487 <https://nvd.nist.gov/vuln/detail/CVE-2023-44487>`_ - HTTP/2 Rapid Reset DoS vulnerability in multiple packages\n   * `CVE-2025-55163 <https://nvd.nist.gov/vuln/detail/CVE-2025-55163>`_ - MadeYouReset DDoS vulnerability in Netty HTTP/2 implementation\n   * `CVE-2023-27534 <https://nvd.nist.gov/vuln/detail/CVE-2023-27534>`_ - SFTP path traversal vulnerability in curl/libcurl tilde handling\n   * `CVE-2022-32149 <https://nvd.nist.gov/vuln/detail/CVE-2022-32149>`_ - Accept-Language header DoS vulnerability in golang.org/x/text\n   * `CVE-2025-47913 <https://nvd.nist.gov/vuln/detail/CVE-2025-47913>`_ - SSH agent panic vulnerability in golang.org/x/crypto\n   * `CVE-2022-40898 <https://nvd.nist.gov/vuln/detail/CVE-2022-40898>`_ - DoS 
vulnerability in Python wheel CLI\n   * `CVE-2023-23914 <https://nvd.nist.gov/vuln/detail/CVE-2023-23914>`_ - HSTS functionality failure vulnerability in curl/libcurl\n   * `CVE-2023-0286 <https://nvd.nist.gov/vuln/detail/CVE-2023-0286>`_ - X.400 address processing vulnerability in cryptography\n   * `CVE-2022-25647 <https://nvd.nist.gov/vuln/detail/CVE-2022-25647>`_ - Deserialization vulnerability in Gson writeReplace() method\n   * `CVE-2021-43565 <https://nvd.nist.gov/vuln/detail/CVE-2021-43565>`_ - SSH server panic vulnerability in golang.org/x/crypto\n   * `CVE-2024-7254 <https://nvd.nist.gov/vuln/detail/CVE-2024-7254>`_ - Stack overflow vulnerability in Protocol Buffers nested groups parsing\n   * `CVE-2023-2976 <https://nvd.nist.gov/vuln/detail/CVE-2023-2976>`_ - Temporary directory access vulnerability in Google Guava FileBackedOutputStream\n   * `CVE-2026-21441 <https://nvd.nist.gov/vuln/detail/CVE-2026-21441>`_ - Decompression bomb vulnerability in urllib3 HTTP redirect responses\n   * `CVE-2023-38545 <https://nvd.nist.gov/vuln/detail/CVE-2023-38545>`_ - Heap buffer overflow vulnerability in curl/libcurl SOCKS5 proxy handshake\n   * GHSA-xpw8-rcwv-8f8p - HTTP/2 RST frame DoS vulnerability in Netty\n   * `CVE-2022-42920 <https://nvd.nist.gov/vuln/detail/CVE-2022-42920>`_ - Arbitrary bytecode generation vulnerability in Apache Commons BCEL\n   * `CVE-2024-24786 <https://nvd.nist.gov/vuln/detail/CVE-2024-24786>`_ - Infinite loop vulnerability in google.golang.org/protobuf JSON unmarshaling\n \n* ``pytorch-inference-vllm-neuronx``: 0.11.0 DLC has multiple HIGH CVEs. We are actively working to resolve these high CVEs:\n   * `CVE-2026-21441 <https://nvd.nist.gov/vuln/detail/CVE-2026-21441>`_ - Decompression bomb vulnerability in urllib3 HTTP redirect responses\n   * `CVE-2025-62164 <https://nvd.nist.gov/vuln/detail/CVE-2025-62164>`_ - Memory corruption vulnerability in vLLM Completions API endpoint\n   * `CVE-2025-69223 <https://nvd.nist.gov/vuln/detail/CVE-2025-69223>`_ - Zip bomb DoS vulnerability in AIOHTTP server\n   * GHSA-mcmc-2m55-j8jj - Insufficient fix for CVE-2025-62164 in vLLM sparse tensor validation\n   * `CVE-2025-66448 <https://nvd.nist.gov/vuln/detail/CVE-2025-66448>`_ - Remote code execution vulnerability in vLLM config class auto_map\n   * `CVE-2025-66418 <https://nvd.nist.gov/vuln/detail/CVE-2025-66418>`_ - Unbounded decompression chain vulnerability in urllib3\n   * `CVE-2025-66471 <https://nvd.nist.gov/vuln/detail/CVE-2025-66471>`_ - Highly compressed data handling vulnerability in urllib3 Streaming API\n\n* ``pytorch-training-neuronx``: 0.9.0 DLC has multiple HIGH CVEs. We are actively working to resolve these high CVEs:\n   * `CVE-2025-66418 <https://nvd.nist.gov/vuln/detail/CVE-2025-66418>`_ - Unbounded decompression chain vulnerability in urllib3\n   * `CVE-2025-66471 <https://nvd.nist.gov/vuln/detail/CVE-2025-66471>`_ - Highly compressed data handling vulnerability in urllib3 Streaming API\n   * `CVE-2026-21441 <https://nvd.nist.gov/vuln/detail/CVE-2026-21441>`_ - Decompression bomb vulnerability in urllib3 HTTP redirect responses\n\n\n----\n\n.. 
_containers-2-26-0-rn:\n\nNeuron Containers (Neuron 2.26.0 Release)\n---------------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Both pytorch-training-neuronx and pytorch-inference-neuronx DLCs have been upgraded to version 2.8.0 along with their related dependencies.\n* Upgraded Python version to 3.11 in all Deep Learning Containers.\n* All Neuron packages and their dependencies have been upgraded to support version 2.26.0 of the AWS Neuron SDK.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* End-of-support for the Transformers NeuronX library starts with the 2.26.0 release of the AWS Neuron SDK. With this support ended, the PyTorch inference Deep Learning Container (DLC) will no longer include the transformers-neuronx package.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* ``pytorch-training-neuronx`` 2.7.0 DLC has two HIGH CVEs related to ``sagemaker-python-sdk`` package. We are actively working to resolve these high CVEs:\n  * `CVE-2024-34072 <https://nvd.nist.gov/vuln/detail/CVE-2024-34072>`_ - Vulnerability in sagemaker-python-sdk package\n  * `CVE-2024-34073 <https://nvd.nist.gov/vuln/detail/CVE-2024-34073>`_ - Vulnerability in sagemaker-python-sdk package\n\n\n----\n\n.. _containers-2-25-0-rn:\n\nNeuron Containers (Neuron 2.25.0 Release)\n---------------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* All Neuron packages and their dependencies have been upgraded to support AWS Neuron SDK version 2.25.0.\n* The pytorch-inference-vllm-neuronx Deep Learning Container has been upgraded to version 0.9.1.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* ``pytorch-training-neuronx`` 2.7.0 DLC has two HIGH CVEs related to ``sagemaker-python-sdk`` package. We are actively working to resolve these high CVEs:\n  * `CVE-2024-34072 <https://nvd.nist.gov/vuln/detail/CVE-2024-34072>`_ - Vulnerability in sagemaker-python-sdk package\n  * `CVE-2024-34073 <https://nvd.nist.gov/vuln/detail/CVE-2024-34073>`_ - Vulnerability in sagemaker-python-sdk package\n* ``pytorch-inference-vllm-neuronx`` 0.9.1 DLC has CRITICAL and HIGH CVEs. We are actively working to resolve them.\n\n\n----\n\n.. _containers-2-24-0-rn:\n\nNeuron Containers (Neuron 2.24.0 Release)\n---------------------------------------------------\n\nDate of Release: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Added new pytorch-inference-vllm-neuronx 0.7.2 DLC that contains all dependencies including drivers, tools, NxDI and other packages to run vLLM out of the box.\n* Upgraded pytorch-training-neuronx DLC to 2.7 version along with its related dependencies.\n* Upgraded pytorch-inference-neuronx DLC to 2.7 version along with its related dependencies.\n* Upgraded jax-training-neuronx DLC to 0.6 version along with its related dependencies.\n* Updated Neuron SDK to latest 2.24.0 release for all Neuron DLCs.\n\n\n----\n\n.. _containers-2-23-0-rn:\n\nNeuron Containers (Neuron 2.23.0 Release)\n---------------------------------------------------\n\nDate of Release: 05/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Upgraded pytorch-training-neuronx DLC to 2.6 version along with its related dependencies.\n* Upgraded pytorch-inference-neuronx DLC to 2.6 version along with its related dependencies.\n* Updated Neuron SDK to latest 2.23.0 release for all Neuron DLCs.\n\n\n----\n\n.. 
_containers-2-22-0-rn:\n\nNeuron Containers (Neuron 2.22.0 Release)\n---------------------------------------------------\n\nDate of Release: 04/04/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Upgraded the jax-training-neuronx DLC to version 0.5.\n* Updated the Neuron SDK to the latest 2.22.0 release for all Neuron DLCs.\n* Restructured all Dockerfiles by combining RUN commands for faster build times.\n\n**Kubernetes Support**\n\n* This release introduces the Neuron Helm Chart, which helps streamline the deployment of AWS Neuron components on Amazon EKS.\n* Adds ECS support for the \"Neuron Node Problem Detector and Recovery\" artifact.\n* Improves scalability and performance of the Neuron Device Plugin and Neuron Scheduler Extension by skipping \"list\" calls from the device plugin to the scheduler in situations where the pod allocation request needs either one or all of the available resources on the node.\n* Ends support for the resource name 'neurondevice' with the Neuron Device Plugin.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Ends support for the resource name 'neurondevice' with the Neuron Device Plugin.\n\n\n----\n\n.. _containers-2-21-1-rn:\n\nNeuron Containers (Neuron 2.21.1 Release)\n---------------------------------------------------\n\nDate of Release: 01/14/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Minor improvements and bug fixes.\n\nBug Fixes\n~~~~~~~~~\n\n* Minor improvements and bug fixes.\n\n\n----\n\n.. _containers-2-21-0-rn:\n\nNeuron Containers (Neuron 2.21.0 Release)\n---------------------------------------------------\n\nDate of Release: 12/19/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Added a new jax-training-neuronx 0.4 Training DLC that contains all dependencies, including drivers, tools, and other packages, to run JAX out of the box.\n* Added new pytorch-inference-neuronx 2.5.1 and pytorch-training-neuronx 2.5.1 DLCs.\n* PyTorch 1.13.1 and 2.1.2 DLCs have reached the end-of-support phase. We now recommend that customers use PyTorch 2.5.1 DLCs by default.\n* All Neuron-supported DLCs now use the latest Neuron SDK 2.21.0 version.\n* All Neuron-supported DLCs are now updated to Ubuntu 22.\n* pytorch-inference-neuronx now supports both the NxD Inference and Transformers NeuronX libraries for inference.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* PyTorch 1.13.1 and 2.1.2 DLCs have reached the end-of-support phase.\n\n\n----\n\n.. _containers-2-20-2-rn:\n\nNeuron Containers (Neuron 2.20.2 Release)\n---------------------------------------------------\n\nDate of Release: 11/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* The Neuron 2.20.2 DLC fixes a dependency bug for the NxDT use case by pinning the correct torch version.\n\n**Kubernetes Support**\n\n* This release addresses a stability issue in the Neuron Scheduler Extension that previously caused crashes shortly after installation.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a dependency bug for the NxDT use case by pinning the correct torch version.\n* Addressed a stability issue in the Neuron Scheduler Extension that previously caused crashes shortly after installation.\n\n\n----\n\n.. _containers-2-20-1-rn:\n\nNeuron Containers (Neuron 2.20.1 Release)\n---------------------------------------------------\n\nDate of Release: 10/25/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* The Neuron 2.20.1 DLC includes prerequisites for NxDT installation. Customers can expect to use NxDT out of the box.
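\n\nAs a purely illustrative sketch of running any of the DLCs described above (the image URI is a placeholder, ``/dev/neuron0`` assumes the Neuron driver and at least one Neuron device are present on the host, and multi-device workloads would pass additional ``--device`` flags), launching a Neuron DLC directly with Docker and checking device visibility might look like::\n\n      docker run --rm --device=/dev/neuron0 <your-neuron-dlc-image-uri> neuron-ls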
\n\n\n----\n\n.. _containers-2-20-0-rn:\n\nNeuron Containers (Neuron 2.20.0 Release)\n---------------------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Updated the Neuron SDK to the latest 2.20.0 release for PyTorch Neuron DLCs.\n* Added the new NxD Training package to pytorch-training-neuronx DLCs.\n\n\n----\n\n.. _containers-2-19-0-rn:\n\nNeuron Containers (Neuron 2.19.0 Release)\n---------------------------------------------------\n\nDate of Release: 07/03/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Updated the Neuron SDK to the latest 2.19.0 release for PyTorch Neuron DLCs.\n* Updated TorchServe to 0.11.0 for PyTorch Neuron DLCs.\n\n**Kubernetes Support**\n\n* Critical Security Patch: Updated the dependencies used by the Neuron Device Plugin and the Neuron Kubernetes Scheduler to fix several important security vulnerabilities.\n* This release introduces the Neuron Node Problem Detector and Recovery artifact to enable fast error detection and recovery in Kubernetes environments. The current version supports EKS managed and self-managed node groups for all EKS-supported Kubernetes versions.\n* This release introduces a container image for neuron-monitor to make it easy to run neuron-monitor along with Prometheus and Grafana to monitor Neuron metrics in Kubernetes environments.\n\nBug Fixes\n~~~~~~~~~\n\n* This release contains changes to improve the performance of the device plugin at scale.\n\n\n----\n\n.. _containers-2-5-0-rn:\n\nNeuron Containers (Neuron 2.5.0 Release)\n-------------------------------------------------\n\nDate of Release: 11/07/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**DLC Support**\n\n* Neuron now supports trn1-based training in SageMaker and Deep Learning Containers using PyTorch.\n\n**Neuron Containers**\n\n* Neuron now supports trn1-based training in SageMaker and Deep Learning Containers using PyTorch.\n\n\n----\n\n.. _containers-2-4-0-rn:\n\nNeuron Containers (Neuron 2.4.0 Release)\n-------------------------------------------------\n\nDate of Release: 10/27/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Neuron Containers**\n\n* Neuron now supports Kubernetes work scheduling at the NeuronCore level. Updates on how to use the new core allocation method are captured in the Kubernetes documentation on this site.\n\n**Kubernetes Support**\n\n* Added support for NeuronCore-based scheduling to the Neuron Kubernetes Scheduler. Learn more about how to use NeuronCores for finer-grained control over container scheduling by following the K8s tutorials documentation.\n\n\n----\n\n.. _containers-2-3-0-rn:\n\nNeuron Containers (Neuron 2.3.0 Release)\n-------------------------------------------------\n\nDate of Release: 10/10/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Neuron Containers**\n\n* TRN1 and INF1 EC2 instance types are now supported as part of Neuron. There is an optional aws-neuronx-oci-hooks package that users may install for convenience; it supports use of the AWS_NEURON_VISIBLE_DEVICES environment variable when launching containers. New DLC containers will be coming soon in support of training workloads on TRN1.\n\n**Kubernetes Support**\n\n* Added support for TRN1 and INF1 EC2 instance types.\n\n\n----\n\n.. 
_containers-1-19-0-rn:\n\nNeuron Containers [1.19.0] (Neuron 1.19.0 Release)\n---------------------------------------------------\n\nDate of Release: 04/29/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Neuron Containers**\n\n* The Neuron Kubernetes device driver plugin can now communicate with the Neuron driver without the OCI hooks. Starting with the Neuron 1.19.0 release, installing aws-neuron-runtime-base and oci-add-hooks is no longer a requirement for the Neuron Kubernetes device driver plugin.\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. _containers-2-16-0-rn:\n\nNeuron Containers (Neuron 2.16.0 Release)\n---------------------------------------------------\n\nDate of Release: 09/01/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* This release enables easier programmability by using 0-based indexing for Neuron Devices and NeuronCores in EKS container environments. Previously, the Neuron Device indexing was assigned randomly. This change requires Neuron Driver version 2.12.14 or newer.\n* Improved logging when the Neuron Driver is not installed or present.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a Neuron Device Plugin crash when the Neuron Driver is not installed or present on the host.\n* Fixed an issue where pods failed to deploy when multiple containers requested Neuron resources.\n* Fixed an issue where launching many pods, each requesting Neuron cores, would fail to deploy.\n\n\n----\n\n.. _containers-2-1-0-rn:\n\nNeuron Containers (Neuron 2.1.0 Release)\n-------------------------------------------------\n\nDate of Release: 10/27/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Added support for NeuronCore-based scheduling to the Neuron Kubernetes Scheduler. Learn more about how to use NeuronCores for finer-grained control over container scheduling by following the K8s tutorials documentation.\n\n\n----\n\n.. _containers-2-0-0-rn:\n\nNeuron Containers (Neuron 2.0.0 Release)\n-------------------------------------------------\n\nDate of Release: 10/10/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Added support for TRN1 and INF1 EC2 instance types.\n\n\n----\n\n.. _containers-1-9-3-rn:\n\nNeuron Containers [1.9.3] (Neuron 1.9.3 Release)\n-------------------------------------------------\n\nDate of Release: 08/02/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. _containers-1-9-2-rn:\n\nNeuron Containers [1.9.2] (Neuron 1.9.2 Release)\n-------------------------------------------------\n\nDate of Release: 05/27/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. _containers-1-9-0-rn:\n\nNeuron Containers [1.9.0] (Neuron 1.9.0 Release)\n-------------------------------------------------\n\nDate of Release: 04/29/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. _containers-1-8-2-rn:\n\nNeuron Containers [1.8.2] (Neuron 1.8.2 Release)\n-------------------------------------------------\n\nDate of Release: 03/25/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. _containers-1-7-7-rn:\n\nNeuron Containers [1.7.7] (Neuron 1.7.7 Release)\n-------------------------------------------------\n\nDate of Release: 01/20/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Kubernetes Support**\n\n* Minor updates.\n\n\n----\n\n.. 
_containers-1-7-3-rn:\n\nNeuron Containers [1.7.3] (Neuron 1.7.3 Release)\n---------------------------------------------------\n\nDate of Release: 10/27/2021\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Neuron Containers**\n\n* Starting with Neuron 1.16.0, the Neuron ML frameworks come with the Neuron Runtime integrated as a library; as a result, it is no longer necessary to deploy neuron-rtd.\n* When using containers built with components from Neuron 1.16.0 or newer, please use aws-neuron-dkms version 2.1 or newer and the latest version of aws-neuron-runtime-base. Passing additional system capabilities is no longer required.\n\n**Kubernetes Support**\n\n* Minor updates.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Starting with Neuron 1.16.0, the Neuron ML frameworks come with the Neuron Runtime integrated as a library; as a result, it is no longer necessary to deploy neuron-rtd.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release."
  },
  {
    "path": "release-notes/components/dev-tools.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron Developer Tools component across all AWS Neuron SDK versions.\n    :keywords: neuron tools, developer tools, profiler, release notes, neuron explorer, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _dev-tools_rn:\n\nComponent Release Notes for Neuron Developer Tools\n==================================================\n\nThe release notes for the Neuron Developer Tools. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n\n.. _dev-tools-2-29-0-rn:   \n\nNeuron Developer Tools & Neuron Explorer (Neuron 2.29.0 Release)\n--------------------------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* The System Trace Viewer now supports the full suite of Device widgets, enabling multi-device profile analysis. Users can analyze hardware events across all linked Device Profiles within a single System Profile. See :doc:`Neuron Explorer system profiling </tools/neuron-explorer/overview-system-profiles>`.\n* Introduced Memory Viewer in Neuron Explorer, enabling you to inspect low-level memory allocation details and usage patterns. See :doc:`Neuron Explorer Memory Viewer </tools/neuron-explorer/overview-memory-viewer>`.\n* Neuron Explorer for Visual Studio Code is now available on the Visual Studio Code Extension Marketplace, enabling simpler installation and automatic updates. See :ref:`download-neuron-explorer-vscode`.\n* Summary Viewer page in Neuron Explorer now includes system-level profile data, enabling you to view summary metrics for system and device profiles in a single view. See :doc:`Neuron Explorer Summary Viewer </tools/neuron-explorer/overview-summary-page>`.\n* System Timeline now supports Neuron device HBM usage, showing a breakdown of memory allocation by usage category (such as tensors, scratchpad), enabling you to debug out-of-memory issues.\n* Introduced Box Selection Summary in Neuron Explorer, enabling you to view aggregated device profile information for a selected bounding box region. See :ref:`box-selection-summary`.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* None reported for this release.\n\nBug Fixes\n~~~~~~~~~\n\n* None reported for this release.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release.\n\n.. _dev-tools-2-28-0-rn:   \n\nNeuron Developer Tools & Neuron Explorer (Neuron 2.28.0 Release)\n--------------------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added system profiling support in Neuron Explorer, enabling you to capture and analyze system-level performance data with drill-down navigation to device profiles. See :doc:`Neuron Explorer system profiling </tools/neuron-explorer/overview-system-profiles>`.\n* Added migration guide from Neuron Profiler/Profiler 2.0 to Neuron Explorer. See :ref:`neuron-profiler-migration-guide`.\n* Added ability to save and search by tags in Neuron Explorer Profile Manager, allowing you to organize and quickly locate profiles across multiple profiling sessions. See :ref:`neuron-explorer-profile-manager`.\n* Added help pop-up for the ``Device Trace Viewer`` in Neuron Explorer to see shortcuts and dependency color legend. 
See :doc:`Device Trace Viewer </tools/neuron-explorer/overview-device-profiles>`.\n* Introduced ``Tensor Viewer`` in Neuron Explorer, enabling you to quickly identify memory bottlenecks by viewing tensor names, shapes, sizes, and memory usage in a single interface. See :ref:`tensor-viewer-overview`.\n* Introduced ``Database Viewer`` in Neuron Explorer as an interactive interface for querying and exploring profiling data using SQL or natural language, allowing you to perform custom analysis without writing code. See :ref:`database-viewer-overview`.\n* Enhanced data integrity checks in ``nccom-test`` by using pseudo-random data patterns instead of fixed patterns, improving detection of data corruption during collective operations. See *Data Integrity* in the nccom-test documentation.\n* Added support for ``alltoallv`` collective operation in ``nccom-test``, enabling benchmarking of variable-sized all-to-all communication patterns. See *AlltoAllV Example* in the nccom-test documentation.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* The ``neuron-profile analyze`` subcommand is no longer supported.  We recommend migrating to Neuron Explorer.  See :doc:`Get Started with Neuron Explorer </tools/neuron-explorer/get-started>`.\n\nBug Fixes\n~~~~~~~~~\n\n* ``neuron-ls`` now handles concurrent queries correctly. Previously, when multiple processes queried Neuron devices simultaneously, ``neuron-ls`` would fail with a driver error, preventing you from viewing device status.\n* Neuron Explorer now correctly calculates PSUM usage for operations spanning multiple partitions. Previously, PSUM usage was underreported, which could lead to incorrect performance optimization decisions.\n\n.. _dev-tools-2-27-0-rn:\n\nNeuron Developer Tools & Neuron Explorer [2.29.0] (Neuron 2.27.0 Release)\n-------------------------------------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Introduced Neuron Explorer - A unified profiling suite that replaces Neuron Profiler and Profiler 2.0.\n* Four core viewers provide insights into model performance: Hierarchy Viewer, AI Recommendation Viewer, Source Code Viewer, and Summary Viewer.\n* Neuron Explorer is available through UI, CLI, and VSCode IDE integration.\n* Added fine-grained collective communication support to nccom-test utility.\n* New tutorials cover profiling NKI kernels, multi-node training jobs, and vLLM inference workloads.\n* Added Trn3 support for neuron-monitor, neuron-top, neuron-ls, and nccom-test.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Neuron Profiler and Profiler 2.0 support ends after Neuron 2.28.\n\nBug Fixes\n~~~~~~~~~\n\n* Improved profiling accuracy and reduced overhead.\n* Fixed visualization issues in multi-process scenarios.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Existing NTFF files are compatible but require reprocessing for new features.\n* Neuron Explorer does not support system level profiling at this time.\n\n\n----\n\n.. 
_dev-tools-2-26-0-rn:\n\nNeuron Developer Tools [2.26.7.0] (Neuron 2.26.0 Release)\n----------------------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* The Profiler UI now allows selecting multiple semaphore values to display simultaneously for a more comprehensive view of activity.\n* The default system profile grouping in Perfetto now uses the global NeuronCore ID instead of the process-local NeuronCore ID for better display of multi-process workloads.\n* Added a warning when system profile events are dropped due to limited buffer space.\n* Added nccom-test support on Trn2 for State Buffer to State Buffer collectives benchmarking for all-reduce, all-gather, and reduce-scatter operations.\n* nccom-test now shows a helpful error message when invalid sizes are used with all-to-all collectives.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed the device memory usage type table and made improvements so that it stays in sync between runtime and tools versions.\n* Fixed a system profile crash when processing long-running workloads.\n* Fixed the display of system profiles in Perfetto to correctly separate rows within the same Logical NeuronCore when using NEURON_LOGICAL_NC_CONFIG=2 on Trn2.\n\n\n----\n\n.. _dev-tools-2-25-0-rn:\n\nNeuron Developer Tools [2.25.100.0] (Neuron 2.25.0 Release)\n------------------------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* neuron-ls now shows NeuronCore IDs associated with each Neuron device as well as CPU and NUMA node affinity in both the text and JSON outputs.\n* Added a summary metric to device profiles for total_active_time (the amount of time the device was not idle during execution).\n* System profiles now show the sync point events that are used to approximate CPU and Neuron device timestamp alignment.\n* Removed metrics for defunct processes from Neuron Monitor's Prometheus output to more accurately reflect the current utilization of NeuronCores.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed an issue in Neuron Profiler summary metrics where dma_active_time was larger than expected.\n* Fixed a type inconsistency for certain event types and attributes in the system profile data that could result in a crash.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* System profile hardware events may be misaligned due to sync point imprecision.\n* System profile events shown in the Neuron Profiler UI for multiprocess workloads are grouped together.\n\n\n----\n\n.. 
_dev-tools-2-24-0-rn:\n\nNeuron Developer Tools [2.24.54.0] (Neuron 2.24.0 Release)\n-----------------------------------------------------------\n\nDate of Release: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Scratchpad memory usage visualization is now available in the Neuron Profiler UI.\n* Framework stack traces are now available in the Neuron Profiler UI.\n* On-device collectives barriers are now shown in the Neuron Profiler UI.\n* HBM throughput visualization over time is now shown in the Neuron Profiler UI.\n* Added an option to filter the NeuronCores on which to capture trace events.\n* Added an option to filter the event types recorded when capturing system traces.\n* Added a flag to nccom-test to get results in JSON (--report-to-json-file <filename>).\n* Added a flag to nccom-test to explicitly show input and output sizes based on the operation (--show-input-output-size).\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed instance ID labeling in the system profile view for framework events.\n* Fixed an issue in the Neuron Profiler UI where the full data was not shown in the NEFF Nodes tab.\n\n\n----\n\n.. _dev-tools-2-23-0-rn:\n\nNeuron Developer Tools [2.23.16.0] (Neuron 2.23.0 Release)\n-----------------------------------------------------------\n\nDate of Release: 05/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Improved Neuron Profiler performance, allowing users to view profile results 5x faster on average.\n* Improved error reporting with timeline support for error signatures via custom notifications in the Neuron Profiler UI.\n* Added execution and out-of-bounds (OOB) error tracking in Neuron Profiler JSON outputs.\n* Updated the default grouping for system profiles to include process ID.\n* Added a neuron-monitor companion script for collecting Kubernetes info in EKS.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a hang during data collection when running nccom-test across multiple instances.\n* Fixed certain cases in Neuron Profiler where DMA sizes were always reported as 0 bytes.\n\n\n----\n\n.. _dev-tools-2-22-0-rn:\n\nNeuron Developer Tools [2.22.66.0] (Neuron 2.22.0 Release)\n-----------------------------------------------------------\n\nDate of Release: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added several enhancements to the Neuron Profiler UI, including NeuronCore barrier annotations, a minimal default view to improve initial load performance, improved usability when updating markers, and better organization of view settings.\n* Added new event types in the system profile for Neuron Profiler 2.0 (Beta) related to out-of-bounds execution errors, execution request submission, and model switch overhead.\n* Updated the system trace output format for Neuron Profiler 2.0 (Beta).\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* neuron-det is no longer supported starting with this release. We recommend that customers transition to Neuron Profiler 2.0 (Beta) for debugging runtime hangs and issues in large-scale settings.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed an issue in the Neuron Profiler UI where dependencies were misaligned in the timeline when highlighted.\n* Fixed an issue where instruction dependency IDs were truncated in the Neuron Profiler JSON output.\n\n\n----\n\n.. _dev-tools-2-21-0-rn:\n\nNeuron Developer Tools [2.20.204.0] (Neuron 2.21.0 Release)\n------------------------------------------------------------\n\nDate of Release: 12/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Trn2 instance types.\n* Added support for Logical NeuronCores. 
neuron-top, neuron-monitor, and neuron-ls now display and aggregate information per Logical NeuronCore based on the LNC configuration.\n* Added Neuron Profiler 2.0 (Beta) with system profiles featuring Neuron Runtime API trace and ML framework trace.\n* Option to view system and device profiles using the Perfetto UI.\n* Support for native JAX and PyTorch profilers.\n* Support for distributed workloads in environments such as EKS and ParallelCluster.\n* Ability to drill down from high-level system profiles to low-level device profiles.\n* Simplified experience for capturing profiles.\n\n\n----\n\n.. _dev-tools-2-20-0-rn:\n\nNeuron Developer Tools [2.19.0.0] (Neuron 2.20.0 Release)\n----------------------------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for the Neuron Kernel Interface (NKI).\n* Updated the neuron-profile JSON output to include information regarding instruction dependencies, DMA throughput, and SRAM usage.\n* Updated the Neuron Profiler UI to display transpose information for DMAs (when applicable).\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed error handling in neuron-top to exit gracefully when passing an unknown argument.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release.\n\n----\n\n.. _dev-tools-2-19-0-rn:\n\nNeuron Developer Tools [2.18.3.0] (Neuron 2.19.0 Release)\n----------------------------------------------------------\n\nDate of Release: 07/03/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Profiles captured with Neuron Runtime 2.20+ now include annotations with additional information such as duration, size, and replica groups around collective operations.\n* Running neuron-profile capture for workloads with collectives will now attempt to use the required number of workers if --collectives-workers-per-node or --collectives-worker-count is not set.\n* The Profiler UI now persists searched information in the URL and provides a summary of the search results.\n* Updated the sampling approach to show more representative data in the profiler UI when zoomed out.\n* Updated groupings for displayed info on click in the profiler UI.\n* Added neuron_device_type and neuron_device_memory_size to neuron-monitor's hardware information output.\n\nBug Fixes\n~~~~~~~~~\n\n* Resolved an issue where NaN would be seen in the JSON output of neuron-profile and result in parsing errors.\n* Resolved inconsistent timeline display issues in the profiler UI that depended on when the profile was processed.\n* neuron-profile view --output-format summary-text will now display in a fixed order.\n* Updated the accuracy of the pending DMA count in the profiler UI.\n* Removed unnecessary calls to exec when capturing memory utilization metrics in neuron-monitor.\n\n\n----\n\n.. _dev-tools-2-18-0-rn:\n\nNeuron Developer Tools [2.17.1.0] (Neuron 2.18.0 Release)\n----------------------------------------------------------\n\nDate of Release: 04/01/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* NeuronPerf 1.8.55.0: Minor updates.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a potential hang during the synchronization step in nccom-test.\n\n\n----\n\n.. 
_dev-tools-2-17-0-rn:\n\nNeuron Developer Tools [2.17.0.0] (Neuron 2.17.0 Release)\n----------------------------------------------------------\n\nDate of Release: 02/13/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support to neuron-profile for collective communication operator improvements in Neuron SDK 2.17.\n* Optimized count query for sampling in neuron-profile UI for up to 3x faster load performance.\n* Introduced warning annotations in neuron-profile UI to automatically highlight potential performance issues.\n\nBug Fixes\n~~~~~~~~~\n\n* Resolved issue of inaccurate execution time reported by neuron-profile as mentioned in Neuron Tools 2.16.1.0 release notes.\n* Fixed NaN display errors in the neuron-profile UI.\n* Fixed file naming issue when capturing collectives profiles with neuron-profile.\n\n\n----\n\n.. _dev-tools-2-16-0-rn:\n\nNeuron Developer Tools [2.16.1.0] (Neuron 2.16.0 Release)\n----------------------------------------------------------\n\nDate of Release: 12/21/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* First release of the Neuron Distributed Event Tracing tool neuron-det to visualize execution for multi-node workloads.\n* neuron-profile now has the ability to capture multi-worker jobs.\n* Added terminology descriptions to neuron-profile summary statistics.\n* Added optional flags to neuron-profile view to change the InfluxDB bucket name (--db-bucket <bucket name>) and profile display name (--display-name <name>).\n* NeuronPerf 1.8.15.0: Minor updates.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed bug where GPSimd summary values were missing in the profile summary.\n* Fixed issue in nccom-test to no longer expect Neuron Device 0 in a container environment.\n* Fixed issue in nccom-test to no longer require the instance launching nccom-test to be participating in the workload.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Execution time reported in neuron-profile is sometimes inaccurate due to a bug in how the time is captured. The bug will be addressed in upcoming Neuron releases.\n\n\n----\n\n.. _dev-tools-2-15-0-rn:\n\nNeuron Developer Tools [2.15.4.0] (Neuron 2.15.0 Release)\n----------------------------------------------------------\n\nDate of Release: 10/26/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Improved visibility of summary stats in the profiler UI with added groupings.\n* Added support for alltoall CC operation in nccom-test.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed bug in neuron-profile that could result in a crash when using the NeuronCore Pipeline feature on Inf1.\n\n\n----\n\n.. _dev-tools-2-14-0-rn:\n\nNeuron Developer Tools [2.14.6.0] (Neuron 2.14.0 Release)\n----------------------------------------------------------\n\nDate of Release: 09/15/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added legend in neuron-ls to clarify wrap around edges for topology view.\n* Improved error messaging when passing invalid arguments to neuron-profile view.\n* Profiler output now includes HLO name in addition to framework layer names.\n* neuron-profile view now has an --output-format json option which will write to a file specified by --output-file <name> (default is ntff.json) instead of writing data to InfluxDB.\n
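\nAs a minimal sketch, the new JSON output path can be exercised as shown below; the output filename is illustrative, and the arguments that identify the captured profile (for example, which NTFF to post-process) are omitted and depend on your capture setup:\n\n.. code-block:: bash\n\n   # Illustrative only: post-process a captured profile into a JSON file\n   # instead of writing the results to InfluxDB.\n   neuron-profile view --output-format json --output-file profile.json\n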
\nBug Fixes\n~~~~~~~~~\n\n* Fixed bug in neuron-profile that incorrectly calculated buffer utilization for more recently compiled NEFFs.\n* Fixed bug in neuron-profile where the profile would sometimes include additional idle time while waiting for execution to start.\n\n\n----\n\n.. _dev-tools-2-13-0-rn:\n\nNeuron Developer Tools [2.13.4.0] (Neuron 2.13.0 Release)\n----------------------------------------------------------\n\nDate of Release: 08/28/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* The --check option of nccom-test now supports more data types (fp16, bf16, (u)int8, (u)int16, and (u)int32 in addition to fp32).\n* NeuronPerf 1.8.7.0: Minor updates.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed bug in nccom-test that would wait indefinitely for execution to end when running on multiple instances (-N 2 and higher).\n* Fixed bug in neuron-profile to prevent a crash during utilization calculation.\n\n\n----\n\n.. _dev-tools-2-12-0-rn:\n\nNeuron Developer Tools [2.12.2.0] (Neuron 2.12.0 Release)\n----------------------------------------------------------\n\nDate of Release: 07/19/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Bumped the max supported profiling NTFF version to version 2 to resolve crashes when postprocessing NTFFs captured with newer versions of the Neuron Runtime Library.\n* When viewing profiles captured using Neuron Runtime Library 2.15 or above, please upgrade tools to 2.12.\n* This version of Neuron tools remains compatible with NTFF version 1.\n\nBug Fixes\n~~~~~~~~~\n\n* Bug fixes for neuron-profile related to the calculation of some summary stats.\n\n\n----\n\n.. _dev-tools-2-11-0-rn:\n\nNeuron Developer Tools [2.11.10.0] (Neuron 2.11.0 Release)\n-----------------------------------------------------------\n\nDate of Release: 06/14/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* nccom-test can now show multiple latency stats in the results table, such as average or percentiles, by specifying the -s option (for example: -s p10 p99 avg p50).\n* First public support for neuron-profile as a standalone tool that can be used to profile executions on Neuron Devices.\n\n\n----\n\n.. _dev-tools-2-10-0-rn:\n\nNeuron Developer Tools [2.10.1.0] (Neuron 2.10.0 Release)\n----------------------------------------------------------\n\nDate of Release: 05/01/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added new Neuron Collectives benchmarking tool, nccom-test, to enable benchmarking sweeps on various Neuron Collective Communication operations.\n* Expanded support for Neuron profiling to include runtime setup/teardown times and collapsed execution of NeuronCore engines and DMA.\n\n\n----\n\n.. _dev-tools-2-9-0-rn:\n\nNeuron Developer Tools [2.9.5.0] (Neuron 2.9.0 Release)\n--------------------------------------------------------\n\nDate of Release: 03/28/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Updated neuron-top to show effective FLOPs across all NeuronCores.\n* NeuronPerf 1.7.0.0: Adds trn1/inf2 support for PyTorch and TensorFlow 2.x. Uses new IMDSv2 for obtaining instance types.\n\n\n----\n\n.. _dev-tools-2-8-0-rn:\n\nNeuron Developer Tools [2.8.2.0] (Neuron 2.8.0 Release)\n--------------------------------------------------------\n\nDate of Release: 02/24/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Updated neuron-top to show aggregated utilization/FLOPs across all NeuronCores.\n\n\n----\n\n.. _dev-tools-2-7-0-rn:\n\nNeuron Developer Tools [2.7.2.0] (Neuron 2.7.0 Release)\n--------------------------------------------------------\n\nDate of Release: 02/08/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for model FLOPS metrics in both neuron-monitor and neuron-top.\n\n\n----\n\n.. 
_dev-tools-2-6-0-rn:\n\nNeuron Developer Tools [2.6.0.0] (Neuron 2.6.0 Release)\n--------------------------------------------------------\n\nDate of Release: 12/09/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for profiling with the Neuron Plugin for TensorBoard on TRN1.\n* Updated profile post-processing for workloads executed on TRN1.\n\n\n----\n\n.. _dev-tools-2-5-0-rn:\n\nNeuron Developer Tools [2.5.19.0] (Neuron 2.5.0 Release)\n---------------------------------------------------------\n\nDate of Release: 11/07/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Minor bug fixes and improvements.\n\nBug Fixes\n~~~~~~~~~\n\n* Minor bug fixes and improvements.\n\n\n----\n\n.. _dev-tools-2-5-0-2-rn:\n\nNeuron Developer Tools [2.5.16.0] (Neuron 2.5.0 Release)\n---------------------------------------------------------\n\nDate of Release: 10/26/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* New neuron-monitor and neuron-top feature: memory utilization breakdown. This new feature provides more details on how memory is being currently used on the Neuron Devices as well as on the host instance.\n* neuron-top's UI layout has been updated to accommodate the new memory utilization breakdown feature.\n* neuron-monitor's inference_stats metric group was renamed to execution_stats. While the previous release still supported inference_stats, starting this release the name inference_stats is considered deprecated and can't be used anymore.\n* NeuronPerf 1.6.0.0: New Evaluation + metrics API. Support map and iterable-type torch datasets. Support custom torch DataLoader args via dataloader_kwargs. New get_report_by_tag utility to identify specific configurations. Python 3.7+ now default from 3.6. Pricing and sizing info updated for inf1 + trn1.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* neuron-monitor's inference_stats metric group was renamed to execution_stats.\n\nBug Fixes\n~~~~~~~~~\n\n* Fix a rare crash in neuron-top when the instance is under heavy CPU load.\n* Fix process names on the bottom tab bar of neuron-top sometimes disappearing for smaller terminal window sizes.\n* NeuronPerf: GPU inputs are now moved correctly.\n\n\n----\n\n.. _dev-tools-2-4-0-rn:\n\nNeuron Developer Tools [2.4.6.0] (Neuron 2.4.0 Release)\n--------------------------------------------------------\n\nDate of Release: 10/10/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for both EC2 INF1 and TRN1 platforms. Name of the package changed from aws-neuron-tools to aws-neuronx-tools.\n* Added support for ECC counters on Trn1.\n* Added version number output to neuron-top.\n* Expanded support for longer process tags in neuron-monitor.\n* Removed hardware counters from the default neuron-monitor config to avoid sending repeated errors - will add back in future release.\n* neuron-ls - Added option neuron-ls --topology with ASCII graphics output showing the connectivity between Neuron Devices on an instance.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Package name changed from aws-neuron-tools to aws-neuronx-tools.\n\nBug Fixes\n~~~~~~~~~\n\n* Fix neuron-monitor and neuron-top to show the correct Neuron Device when running in a container where not all devices are present.\n\n\n----\n\n.. _dev-tools-2-1-0-rn:\n\nNeuron Developer Tools [2.1.4.0] (Neuron 2.1.0 Release)\n--------------------------------------------------------\n\nDate of Release: 04/29/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Minor updates.\n* NeuronPerf 1.3.0.0: Minor updates.\n\n\n----\n\n.. 
_dev-tools-2-0-790-0-rn:\n\nNeuron Developer Tools [2.0.790.0] (Neuron 2.0.0 Release)\n----------------------------------------------------------\n\nDate of Release: 03/25/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* NeuronPerf 1.2.0.0: Initial release of NeuronPerf. Supports PyTorch, TensorFlow, and Apache MXNet. Supports customizable JSON and CSV reports.\n\nBug Fixes\n~~~~~~~~~\n\n* neuron-monitor: fixed a floating point error when calculating CPU utilization.\n\n\n----\n\n.. _dev-tools-2-0-623-0-rn:\n\nNeuron Developer Tools [2.0.623.0] (Neuron 2.0.0 Release)\n----------------------------------------------------------\n\nDate of Release: 01/20/2022\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* neuron-top - Added \"all\" tab that aggregates all running Neuron processes into a single view.\n* neuron-top - Improved startup time to approximately 1.5 seconds in most cases.\n* neuron-ls - Removed header message about updating tools from neuron-ls output.\n\nBug Fixes\n~~~~~~~~~\n\n* neuron-top - Reduced single CPU core usage down to 0.7% from 80% on inf1.xlarge when running neuron-top by switching to an event-driven approach for screen updates.\n\n\n----\n\n.. _dev-tools-2-0-494-0-rn:\n\nNeuron Developer Tools [2.0.494.0] (Neuron 2.0.0 Release)\n----------------------------------------------------------\n\nDate of Release: 12/27/2021\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Security related updates related to log4j vulnerabilities.\n\n\n----\n\n.. _dev-tools-2-0-327-0-rn:\n\nNeuron Developer Tools [2.0.327.0] (Neuron 2.0.0 Release)\n----------------------------------------------------------\n\nDate of Release: 11/05/2021\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Updated Neuron Runtime (which is integrated within this package) to libnrt 2.2.18.0 to fix a container issue that was preventing the use of containers when /dev/neuron0 was not present.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed container issue preventing use of containers when /dev/neuron0 was not present.\n\n\n----\n\n.. _dev-tools-2-0-277-0-rn:\n\nNeuron Developer Tools [2.0.277.0] (Neuron 2.0.0 Release)\n----------------------------------------------------------\n\nDate of Release: 10/27/2021\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Tools now support applications built with Neuron Runtime 2.x (libnrt.so).\n* Updates have been made to neuron-ls and neuron-top to significantly improve the interface and utility of information provided.\n* Expands neuron-monitor to include additional information when used to monitor latest Frameworks released with Neuron 1.16.0.\n* neuron-cli entering maintenance mode as its use is no longer relevant when using ML Frameworks with an integrated Neuron Runtime (libnrt.so).\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* You must update to the latest Neuron Driver (aws-neuron-dkms version 2.1 or newer) for proper functionality of the new runtime library.\n* neuron-cli entering maintenance mode.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release.\n"
  },
  {
    "path": "release-notes/components/dlamis.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron DLAMI component across all AWS Neuron SDK versions.\n    :keywords: neuron dlami, deep learning ami, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _dlamis_rn:\n.. _dlami-rn-known-issues:\n\nComponent Release Notes for Neuron DLAMI\n=========================================\n\nThe release notes for the Neuron DLAMI component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _dlami-2-29-0-rn:   \n\nNeuron DLAMIs (Neuron 2.29.0 Release)\n------------------------------------------------------------------------\n\n\nDate of Release: 04/09/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* All Neuron packages and their dependencies have been upgraded to support AWS Neuron SDK version 2.29.0.\n\n\nCallouts\n~~~~~~~~~~~~~~~~\n\n.. important::\n    Announcing maintenance mode for NxDT and NxD Core Training APIs starting next release. Starting with Neuron 2.30.0, NxDT and NxD Core Training APIs are entering maintenance mode. Future releases will address critical security issues only and support will be gradually ended. The NxDT virtual environment in both single and multi-framework DLAMIs has been pinned to include ``neuronx_distributed_training-1.7.0``.\n\n    **How does this impact you?**\n\n    Existing NxDT/NxD Core users should stay on Neuron 2.28 and PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.30.0 and PyTorch 2.10. A migration guide will be published in a coming release.\n\n    For more information, see :doc:`/frameworks/torch/pytorch-native-overview`.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Ubuntu 22.04 Multi-Framework DLAMI End of Support: The Ubuntu 22.04 multi-framework DLAMI is no longer published starting with this release. Customers are advised to use the Ubuntu 24.04 multi-framework DLAMI instead.\n* PyTorch 2.8 End of Support: PyTorch 2.8 virtual environments have been removed from multi-framework DLAMIs. Customers should use PyTorch 2.9 virtual environments on Ubuntu 24.04.\n\n\n.. _dlami-2-28-0-rn:   \n\nNeuron DLAMIs (Neuron 2.28.0 Release)\n------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\n\nKnown Issues\n~~~~~~~~~~~~\n\n- AL2023-based DLAMIs released alongside version 2.28.0 do not include PyTorch 2.9+ or Multi-Framework environments due to an incompatibility with the default GLIBC version installed on AL2023.\n\n\n.. _dlami-2-27-1-rn:\n\nNeuron DLAMI (Neuron 2.27.1 Release)\n-----------------------------------------------\n\nDate of Release: 01/14/2026\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Support for NKI has been added to all DLAMI virtual environments.\n\n\n----\n\n.. 
_dlami-2-27-0-rn:\n\nNeuron DLAMI (Neuron 2.27.0 Release)\n-----------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Ubuntu 24.04 Support: This release adds support for Ubuntu 24.04 base, single framework, and multi-framework DLAMIs with Python 3.12, providing customers with the latest Ubuntu LTS version for their machine learning workloads.\n* vLLM V1 with vLLM-Neuron Plugin: Published a new vLLM V1 with the vLLM-Neuron Plugin single framework DLAMI and added a virtual environment to multi-framework DLAMIs (Amazon Linux 2023, Ubuntu 24.04).\n* PyTorch 2.9 Support: Added PyTorch 2.9 support for single framework DLAMIs and a virtual environment to multi-framework DLAMIs (Amazon Linux 2023, Ubuntu 24.04).\n* JAX 0.7 Support: Published JAX 0.7 single framework DLAMI and updated multi-framework DLAMI virtual environments to JAX 0.7 (Amazon Linux 2023, Ubuntu 24.04).\n* Neuron SDK Updates: Upgraded all Neuron packages and dependencies to support AWS Neuron SDK version 2.27.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* TensorFlow 2.10 End of Support: The tensorflow_2_10 single framework DLAMI and virtual environment in multi-framework DLAMIs will reach end of support in a future release. Customers are advised to use previously released DLAMIs for TensorFlow support.\n* Ubuntu 22.04 Single Framework End of Support: Ubuntu 22.04 single framework DLAMIs for PyTorch and JAX will reach end of support in a future release. Customers are advised to use multi-framework or previously released DLAMIs for Ubuntu 22.04.\n* Inf1 virtual environments End of Support: Inf1 virtual environments and AMIs have reached end of support. Use Neuron DLAMIs released up to SDK version 2.26 for Inf1 support.\n\n\n----\n\n.. _dlami-2-26-0-rn:\n\nNeuron DLAMI (Neuron 2.26.0 Release)\n-----------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Support for PyTorch 2.8 (Amazon Linux 2023, Ubuntu 22.04) single-framework DLAMI.\n* Updated multi-framework DLAMI virtual environments to support PyTorch 2.8.\n* All Neuron packages and their dependencies have been upgraded to support version 2.26.0 of the AWS Neuron SDK.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* End-of-support for the Transformers NeuronX library starts with the 2.26.0 release of the AWS Neuron SDK. As a result, the PyTorch inference Deep Learning Container (DLC) will no longer provide the transformers-neuronx virtual environment in both single and multi-framework DLAMIs.\n\n\n----\n\n.. _dlami-2-25-0-rn:\n\nNeuron DLAMI (Neuron 2.25.0 Release)\n-----------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* All multi-framework virtual environments for the Deep Learning AMIs have been upgraded with the latest Neuron packages to support the AWS Neuron SDK version 2.25.0.\n\n\n----\n\n.. _dlami-2-24-0-rn:\n\nNeuron DLAMI (Neuron 2.24.0 Release)\n-----------------------------------------------\n\nDate of Release: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for PyTorch 2.7 (Amazon Linux 2023, Ubuntu 22.04) single framework DLAMI.\n* Added support for JAX 0.6 (Amazon Linux 2023, Ubuntu 22.04) single framework DLAMI.\n* Updated multi-framework DLAMI's virtual environments to use PyTorch 2.7 and JAX 0.6.\n
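\nOn the multi-framework DLAMI each framework ships in its own Python virtual environment; the sketch below shows one way to find and activate the updated PyTorch environment. The directory names are an assumption based on the usual /opt/aws_neuronx_venv_* layout and may differ between DLAMI releases, so check the login message of the day (MOTD) for the exact paths:\n\n.. code-block:: bash\n\n   # Illustrative only: list the preinstalled Neuron virtual environments,\n   # then activate the PyTorch one (exact directory names vary by release).\n   ls -d /opt/aws_neuronx_venv_*\n   source /opt/aws_neuronx_venv_pytorch_2_7/bin/activate\n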
\n\n----\n\n.. _dlami-2-23-0-rn:\n\nNeuron DLAMI (Neuron 2.23.0 Release)\n-----------------------------------------------\n\nDate of Release: 05/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for PyTorch 2.6 (Amazon Linux 2023, Ubuntu 22.04) single framework DLAMI.\n* Added support for JAX 0.5 (Amazon Linux 2023, Ubuntu 22.04) single framework DLAMI.\n* Updated multi-framework DLAMI's virtual environments to use PyTorch 2.6 and JAX 0.5.\n* Security improvements: Bump Linux kernel to 6.8.1027 for Ubuntu 22 DLAMIs.\n* Security improvements: Bump Linux kernel to 6.1.134 for Amazon Linux 2023 DLAMIs.\n* Added a setup script within neuronx-distributed-training virtual environment to automate the installation of required dependencies.\n\n\n----\n\n.. _dlami-2-22-0-rn:\n\nNeuron DLAMI (Neuron 2.22.0 Release)\n-----------------------------------------------\n\nDate of Release: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added PyTorch 2.5 (Amazon Linux 2023, Ubuntu 22.04) and PyTorch 1.13 Inf1 (Ubuntu 22.04) single framework DLAMIs.\n* Added PyTorch 1.13 Inf1 virtual environments within the Neuron Multi Framework DLAMIs (Amazon Linux 2023, Ubuntu 22.04).\n* Added Tensorflow 2.10 Inf1 virtual environments within the Multi Framework DLAMI and the Tensorflow single framework DLAMI.\n* Added support for Amazon Linux 2023 in the Base Neuron DLAMI.\n* Security improvements: Bump Linux kernel to 5.19.0-1024-aws for Ubuntu 22 DLAMIs.\n* Optimization: Reduce EBS storage size for all DLAMIs such that the virtual environments and dependencies consume 80% of available block storage. This results in reduced cost and time to launch for the DLAMIs. Customers can always request more storage if needed.\n\nBug Fixes\n~~~~~~~~~\n\n* Updated venv paths in message of the day (MOTD) launch screens for Neuron DLAMIs.\n\n\n----\n\n.. _dlami-2-21-1-rn:\n\nNeuron DLAMI (Neuron 2.21.1 Release)\n-----------------------------------------------\n\nDate of Release: 01/14/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* No changes to DLAMI.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Incompatibility issue reported for Tensorflow 2.10 (inf1) on v2.21.1.\n\n\n----\n\n.. _dlami-2-21-0-rn:\n\nNeuron DLAMI (Neuron 2.21.0 Release)\n-----------------------------------------------\n\nDate of Release: 12/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Trainium2 chips within the Neuron Multi Framework DLAMI.\n* Added support for JAX 0.4 to Neuron Multi Framework DLAMI.\n* Added NxD Training (NxDT), NxD Inference (NxDI) and NxD Core PyTorch 2.5 support within the Neuron Multi Framework DLAMI.\n* Added Single Framework DLAMI for TensorFlow 2.10 on U22 and corresponding SSM Parameter support.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Removing virtual environments for PyTorch 1.13 and 2.1 within Neuron Multi Framework DLAMI.\n* Removing PyTorch 1.13 inf1 virtual environment from Neuron Multi Framework DLAMI.\n* Removing Single Framework DLAMI and corresponding SSM Parameters for PyTorch 1.13 and 2.1.\n* Removing SSM Parameters for AL2 Base DLAMI, PyTorch 1.13 and 2.1 Neuron DLAMI.\n\n\n----\n\n.. _dlami-2-20-1-rn:\n\nNeuron DLAMI (Neuron 2.20.1 Release)\n-----------------------------------------------\n\nDate of Release: 10/25/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Amazon Linux 2023 to Neuron Multi Framework DLAMI. Customers will have two operating system options when using the multi framework DLAMI.\n\n\n----\n\n.. 
_dlami-2-20-0-rn:\n\nNeuron DLAMI (Neuron 2.20.0 Release)\n-----------------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added the neuronx-distributed-training library to PyTorch virtual environments.\n* Updated existing Neuron supported DLAMIs with Neuron 2.20 SDK release.\n\n\n----\n\n.. _dlami-2-19-0-rn:\n\nNeuron DLAMI (Neuron 2.19.0 Release)\n-----------------------------------------------\n\nDate of Release: 07/03/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* New Neuron PyTorch-2.1, PyTorch-1.13 and Base Deep Learning AMIs (DLAMI) for Ubuntu 22.\n* Updated existing Neuron supported DLAMIs with Neuron 2.19 SDK release.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* End of support for Amazon Linux 2 DLAMIs.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release."
  },
  {
    "path": "release-notes/components/index.rst",
    "content": ".. meta::\n    :description: Index of Neuron SDK component release notes with latest versions\n    :keywords: neuron, release notes, components, versions, aws neuron sdk\n    :date-modified: 02/26/2026\n\nNeuron Component Release Notes\n===============================\n\nThis page provides an index of all Neuron SDK component release notes. Each component has its own dedicated release notes page that tracks changes across all Neuron SDK versions.\n\n.. list-table::\n   :widths: 40 30 30\n   :header-rows: 1\n   :align: left\n\n   * - Component\n     - Updated in Neuron Version\n     - Latest Component Version\n   * - :doc:`Neuron Compiler <compiler>`\n     - 2.27.0\n     - 2.24.5133.0\n   * - :doc:`Neuron Containers <containers>`\n     - 2.29.0\n     - 2.29.0\n   * - :doc:`Neuron Developer Tools <dev-tools>`\n     - 2.29.0\n     - 2.29.0\n   * - :doc:`Neuron DLAMI <dlamis>`\n     - 2.29.0\n     - 2.29.0\n   * - :doc:`JAX NeuronX <jax>`\n     - 2.26.0\n     - 0.7.0.1.0.*\n   * - :doc:`NKI Library <nki-lib>`\n     - 2.29.0\n     - 2.29.0\n   * - :doc:`Neuron Kernel Interface <nki>`\n     - 2.29.0\n     - 0.3.0\n   * - :doc:`NxD Core <nxd-core>`\n     - 2.26.0\n     - 0.18.27753\n   * - :doc:`NxD Inference <nxd-inference>`\n     - 2.29.0\n     - 0.9.17334\n   * - :doc:`NxD Training <nxd-training>`\n     - 2.25.0\n     - 1.5.0\n   * - :doc:`PyTorch Neuron Framework (torch-neuronx) <pytorch>`\n     - 2.29.0\n     - 2.9.0.2.13.*\n   * - :doc:`Neuron Runtime Library <runtime>`\n     - 2.29.0\n     - 2.31.24.0\n   * - :doc:`Neuron Driver <runtime>`\n     - 2.29.0\n     - 2.26.10.0\n   * - :doc:`Neuron Collectives <runtime>`\n     - 2.29.0\n     - 2.31.24.0\n   * - :doc:`vLLM Plugin for Neuron <nxd-inference>`\n     - 2.29.0\n     - 0.5.0\n  \n* For older components and features that have not been updated recently or are out of support, see :doc:`../archive/index`.\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Neuron Compiler <compiler>\n   Neuron Containers <containers>\n   Neuron Developer Tools <dev-tools>\n   Neuron DLAMI <dlamis>\n   JAX NeuronX <jax>\n   NKI Library <nki-lib>\n   Neuron Kernel Interface <nki>\n   NxD Core <nxd-core>\n   NxD Inference <nxd-inference>\n   NxD Training <nxd-training>\n   PyTorch Neuron Framework <pytorch>\n   Neuron Runtime <runtime>\n   Older components and features <../archive/index>\n"
  },
  {
    "path": "release-notes/components/jax.rst",
    "content": ".. meta::\n    :description: Complete release notes for the JAX NeuronX component across all AWS Neuron SDK versions.\n    :keywords: jax neuronx, jax, release notes, aws neuron sdk\n    :date-modified: 04/03/2026\n\n.. _jax_rn:\n\nComponent Release Notes for JAX NeuronX\n========================================\n\n**Latest Version (in 2.29.0)**: 0.7.0.1.0.*\n\nThe release notes for the JAX NeuronX component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _jax-2-26-0-rn:\n\nJAX NeuronX [0.6.2.1.0.*] (Neuron 2.26.0 Release)\n---------------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release introduces support for JAX version 0.6.2.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* The Threefry RNG algorithm is not completely supported. Use the rbg algorithm instead. This can be configured by setting the following config option: jax.config.update(\"jax_default_prng_impl\", \"rbg\").\n* For JAX versions older than 0.4.34, caching does not work out of the box.\n* For JAX versions older than 0.4.34, buffer donation does not work out of the box.\n* Mesh configurations which use non-connected Neuron cores may crash during execution.\n* Not all dtypes supported by JAX work on Neuron.\n* jax.random.randint does not produce expected distribution of randint values. Run it on CPU instead.\n* Dynamic loops are not supported for jax.lax.while_loop. Only static while loops are supported.\n* jax.lax.cond is not supported.\n* Host callbacks are not supported.\n* jax.dlpack is not supported.\n* jax.experimental.sparse is not supported.\n* jax.lax.sort only supports comparators with LE, GE, LT and GT operations.\n* jax.lax.reduce_precision is not supported.\n* Certain operations might result in slow compilations.\n* Neuron only supports float8_e4m3 and float8_e5m2 for FP8 dtypes.\n* Complex dtypes (jnp.complex64 and jnp.complex128) are not supported.\n* Variadic reductions are not supported.\n* Out-of-bounds access for scatter/gather operations can result in runtime errors.\n* Dot operations on int dtypes are not supported.\n* lax.DotAlgorithmPreset is not always respected.\n\n\n----\n\n.. _jax-2-25-0-rn:\n\nJAX NeuronX [0.6.1.1.0.*] (Neuron 2.25.0 Release)\n---------------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release introduces support for JAX version 0.6.1.\n\nBug Fixes\n~~~~~~~~~\n\n* Previously, using multiple meshes within a single program wasn't supported. This is fixed to add support for sub-meshes.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Known issues are listed at jax-neuron-known-issues.\n\n\n----\n\n.. _jax-2-24-0-rn:\n\nJAX NeuronX [0.6.0.1.0.*] (Neuron 2.24.0 Release)\n---------------------------------------------------\n\nDate of Release: 06/20/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release supports JAX versions up to ``0.6.0``.\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\n\n----\n\n.. 
_jax-2-23-0-rn:\n\nJAX NeuronX [0.5.3.1.0.*] (Neuron 2.23.0 Release)\n---------------------------------------------------\n\nDate of Release: 05/20/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release supports JAX versions up to ``0.5.3``.\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* ``jax_neuronx.nki_call`` is no longer supported. Use ``neuronxcc.nki.jit`` instead.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\n\n----\n\n.. _jax-2-22-0-rn:\n\nJAX NeuronX [0.1.3] (Neuron 2.22.0 Release)\n---------------------------------------------\n\nDate of Release: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release supports JAX versions up to ``0.5.0``.\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Known issues are listed within :ref:`jax-neuron-known-issues`.\n\n\n----\n\n.. _jax-2-21-0-rn:\n\nJAX NeuronX [0.1.2] (Neuron 2.21.0 Release)\n---------------------------------------------\n\nDate of Release: 12/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release supports JAX versions up to ``0.4.35``.\n* Support for JAX versions up to ``0.4.35``.\n* Support for JAX caching API for versions ``0.4.30+``.\n\n\n----\n\n.. _jax-2-20-0-rn:\n\nJAX NeuronX [0.1.1] (Neuron 2.20.0 Release)\n---------------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This is the initial beta release of JAX NeuronX that contains Neuron-specific JAX features, such as the Neuron NKI JAX interface.\n* Announcing the first JAX NeuronX release.\n* JAX interface for Neuron NKI.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release."
  },
  {
    "path": "release-notes/components/nki-lib.rst",
    "content": ".. meta::\n    :description: Complete release notes for the NKI Library component across all AWS Neuron SDK versions.\n    :keywords: nki library, nki-lib, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _nki-lib_rn:\n\nRelease Notes for Neuron Component: NKI Library\n================================================\n\nThe release notes for the NKI Library Neuron component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _nki-lib-2-29-0-rn:\n\nNKI Library (NKI-Lib) (Neuron 2.29.0 Release)\n--------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\nWhat's New\n~~~~~~~~~~\n\nThis release promotes ``find_nonzero_indices`` from experimental to a core subkernel and adds 7 new experimental kernels (Conv1D, Transformer TKG, 3 collective communication kernels, Top-K Reduce, and Dynamic Elementwise Add). Existing kernels receive sequence packing support, MXFP quantization paths, and expanded dimension limits. PyTorch reference implementations are added for 22 kernels.\n\nNew Core Additions\n^^^^^^^^^^^^^^^^^^\n\n* :doc:`find_nonzero_indices </nki/library/api/find-nonzero-indices>` (promoted from experimental) — Finds indices of nonzero elements along the T dimension using GpSimd ``nonzero_with_count`` ISA. Optimized for LNC2 sharding. Supports token counts up to 65536 and column counts up to 128.\n\nNew Experimental Kernels\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n* :doc:`Conv1D </nki/library/api/conv1d>` — 1D convolution using tensor engine with replication strategy. Supports stride, padding, dilation, optional bias, activation fusion, and LNC sharding.\n* :doc:`Transformer TKG </nki/library/api/transformer-tkg>` — Multi-layer transformer forward pass megakernel for token generation. Executes attention block, all-reduce, MLP, and residual connections across a configurable number of layers.\n* :doc:`Fine-Grained All-Gather </nki/library/api/fg-allgather>` — Ring-based all-gather for TRN2 using collective permute with double buffering to overlap communication and data movement.\n* :doc:`FGCC (All-Gather + Matmul) </nki/library/api/fgcc>` — Fused all-gather and matrix multiplication for TRN2, overlapping communication with compute.\n* :doc:`SBUF-to-SBUF All-Gather </nki/library/api/sb2sb-allgather>` — Two variants: ``allgather_sb2sb`` for small tensors fitting in SBUF and ``allgather_sb2sb_tiled`` with tiling and LNC support for larger tensors.\n* :doc:`Top-K Reduce </nki/library/api/topk-reduce>` — Gathers scattered rows by packed global token index and reduces along the K dimension for MoE output. Supports LNC sharding on the hidden dimension.\n* :doc:`Dynamic Elementwise Add </nki/library/api/dynamic-elementwise-add>` — Elementwise addition with runtime-variable M-dimension tiling using dynamic loop bounds.\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* :doc:`Attention CTE Kernel </nki/library/api/attention-cte>`: Added ``mm_out_dtype`` parameter for controlling matmul output dtype. Added ``bound_min``/``bound_max`` parameters for sequence packing support (per-query KV range bounds). Increased max batch size from 32 to 512. Increased max sequence length from 36864 to 131072.\n* :doc:`Attention BWD Kernel </nki/library/api/attention-cte>`: Added ``bound_min``/``bound_max`` parameters for sequence packing support. 
Added support for large batch size.\n* :doc:`Attention TKG Kernel </nki/library/api/attention-tkg>`: Added ``start_pos_ids`` parameter for explicit KV cache position control to support sliding window masking.\n* :doc:`Attention Block TKG Kernel </nki/library/api/attention-block-tkg>`: Added ``rmsnorm_QK_pre_rope_W_Q``/``rmsnorm_QK_pre_rope_W_K`` parameters for fused QK-norm before RoPE. Added KVDP attention sharding support (``KVDP``, ``KVDP_replica_group``). Added ``enable_fa_s_prior_tiling`` for overriding flash attention s_prior tiling.\n* :doc:`MLP Kernel </nki/library/api/mlp>`: Added ``sbm`` (BufferManager) parameter for custom SBUF memory management. Added MXFP4/MXFP8 quantization path.\n* :doc:`MoE TKG Kernel </nki/library/api/moe-tkg>`: Added new dynamic all-expert algorithm that uses ``block_size`` and ``is_all_expert_dynamic`` args. Expanded support for small I and added support for sharding on T in all-expert MX kernel.\n* :doc:`Output Projection CTE Kernel </nki/library/api/output-projection-cte>`: Added ``output_dtype`` parameter for controlling output data type.\n* :doc:`Output Projection TKG Kernel </nki/library/api/output-projection-tkg>`: Added ``sbm`` (BufferManager) parameter for custom SBUF memory management.\n* :doc:`QKV Kernel </nki/library/api/qkv>`: Added ``is_h_dim_4h_transposed`` and ``weight_layout`` parameters for flexible weight layout support.\n* **rmsnorm_tkg** / **layernorm_tkg**: Added ``shard_on_h`` parameter for sharding on the hidden dimension.\n* Added PyTorch reference implementations for 22 kernels for testing and validation.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* :doc:`Router Top-K Kernel </nki/library/api/router-topk>`: The ``output_in_sbuf``, ``x_input_in_sbuf``, and ``expert_affin_in_sb`` parameters have been removed. The kernel now auto-detects SBUF inputs from the tensor buffer type. Callers passing these keyword arguments must remove them.\n* :doc:`QKV Kernel </nki/library/api/qkv>`: The ``is_input_swizzled`` parameter has been removed and replaced by ``is_h_dim_4h_transposed`` (same position, same default ``False``) and a new ``weight_layout`` parameter. Callers using ``is_input_swizzled`` by name must rename to ``is_h_dim_4h_transposed``.\n* :doc:`QKV Kernel </nki/library/api/qkv>` (TKG variant): New parameter ``is_h_dim_4h_transposed`` has been inserted after ``quantization_type``. Callers using positional arguments for ``qkv_w_scale`` or later parameters must update to use keyword arguments.\n* :doc:`Attention CTE Kernel </nki/library/api/attention-cte>`: New parameter ``mm_out_dtype`` has been inserted between ``softmax_dtype`` and ``cp_offset``. Callers using positional arguments for ``cp_offset``, ``global_cp_deg``, or ``cp_strided_q_slicing`` must update to use keyword arguments.\n* :doc:`Attention TKG Kernel </nki/library/api/attention-tkg>`: New parameter ``start_pos_ids`` has been inserted after ``rope_pos_ids``. Callers using positional arguments beyond ``rope_pos_ids`` must update to use keyword arguments.\n* :doc:`Attention BWD Kernel </nki/library/api/attention-cte>`: New parameters ``bound_min`` and ``bound_max`` have been inserted between ``sinks_ref`` and ``use_causal_mask``. Callers using positional arguments for ``use_causal_mask`` or later parameters must update to use keyword arguments.\n* :doc:`Attention Block TKG Kernel </nki/library/api/attention-block-tkg>`: The keyword-only marker (``*``) has been removed and multiple parameters have been reordered. 
New pre-RoPE QK-norm parameters (``rmsnorm_QK_pre_rope_W_Q``, ``rmsnorm_QK_pre_rope_W_K``) have been added. ``softmax_scale``, ``k_scale``, and ``v_scale`` have been moved to optional parameters with defaults. All callers must review their argument ordering.\n* **rmsnorm_tkg** / **layernorm_tkg**: New parameter ``shard_on_h`` has been inserted before ``use_heap_memory`` and ``sbm``. Callers using positional arguments beyond ``single_core_forced`` (rmsnorm) or ``eps`` (layernorm) must update to use keyword arguments. Helper functions ``process_rmsnorm_tile``, ``rmsnorm_tkg_llama_impl``, and ``layernorm_tkg_llama_impl`` have been made private (prefixed with ``_``).\n* **SbufManager** has been renamed to **BufferManager**. A backward-compatible alias ``SbufManager = BufferManager`` is provided, so existing code using ``SbufManager`` will continue to work.\n* MoE TKG: Replaced boolean sharding flags (``shard_on_I``, ``shard_on_T``) with ``LNCShardingStrategy`` enum in down projection interfaces.\n* MoE TKG MX quantization files restructured: ``down_projection_mx_shard_I.py`` and ``gate_up_projection_mx_shard_I.py`` replaced with ``all_expert_mx_utils.py``, ``down_projection_mx.py``, and ``gate_up_projection_mx.py``. Callers importing from the old file paths must update their imports.\n* ``find_nonzero_indices`` has been moved from ``nkilib.experimental.subkernels`` to ``nkilib.core.subkernels``. A backward-compatible re-export is provided, so imports via the experimental path continue to work.\n* Removed usage of ``nki.language.par_dim`` throughout the library.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed MLP CTE indexing in gate proj row scales.\n* Fixed QKV TKG ``sb2sb_wrapper_kernel`` signature missing QK-norm parameters.\n* Fixed MLP failure for FP4 quantization with specific dimension combinations (``vnc=2, h=3072, i=384``).\n* Fixed ``bwmm_shard_on_H`` with explicit TensorCopy from PSUM to SBUF for NKI 0.3.0 compatibility.\n\nKnown Issues\n~~~~~~~~~~~~\n\n\n\n.. _nki-lib-2-28-0-rn:   \n\nNKI Library (NKI-Lib) (Neuron 2.28.0 Release)\n--------------------------------------------------------------------\n\nWhat's New\n~~~~~~~~~~\n\nThis release expands the NKI Library with 9 new kernels, bringing the total to 16 documented kernel APIs. New core kernels include RoPE, Router Top-K, MoE CTE, MoE TKG, and Cumsum. New experimental kernels include Attention Block TKG (fused attention block for token generation), Cross Entropy (forward and backward passes), Depthwise Conv1D, and Blockwise MM Backward for MoE training.\n\nExisting kernels receive FP8 and MX quantization support across QKV, MLP, and both Output Projection kernels. Kernel utilities gain new TensorView methods, SbufManager logging improvements with tree-formatted allocation tracing, and new utilities including ``interleave_copy``, ``LncSubscriptable``, and ``rmsnorm_mx_quantize_tkg``. 
Note that several breaking changes affect kernel signatures and utility APIs — see the Breaking Changes section for details.\n\nNew Core Kernels\n^^^^^^^^^^^^^^^^\n\n* :doc:`RoPE Kernel </nki/library/api/rope>` — Applies Rotary Position Embedding to input embeddings with optional LNC sharding and flexible layout support (contiguous and interleaved).\n* :doc:`Router Top-K Kernel </nki/library/api/router-topk>` — Computes router logits and top-K expert selection for Mixture of Experts models, with support for multiple layout configurations and sharding strategies.\n* :doc:`MoE CTE Kernel </nki/library/api/moe-cte>` — Implements Mixture of Experts optimized for Context Encoding with multiple sharding strategies (block sharding, intermediate dimension sharding) and MxFP4/MxFP8 quantization.\n* :doc:`MoE TKG Kernel </nki/library/api/moe-tkg>` — Implements Mixture of Experts optimized for Token Generation with all-expert and selective-expert modes, supporting FP8 and MxFP4 quantization.\n* :doc:`Cumsum Kernel </nki/library/api/cumsum>` — Computes cumulative sum along the last dimension, optimized for batch sizes up to 2048.\n\nNew Experimental Kernels\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n* :doc:`Attention Block TKG Kernel </nki/library/api/attention-block-tkg>` — Fused attention block for Token Generation that combines RMSNorm, QKV projection, RoPE, attention, and output projection in SBUF to minimize HBM traffic.\n* :doc:`Cross Entropy Kernel </nki/library/api/cross-entropy>` — Memory-efficient cross entropy loss forward and backward passes for large vocabularies using online log-sum-exp algorithm, optimized for LNC2.\n* :doc:`Depthwise Conv1D Kernel </nki/library/api/depthwise-conv1d>` — Depthwise 1D convolution using implicit GEMM algorithm with support for arbitrary stride and padding values, optimized for TRN2.\n* :doc:`Blockwise MM Backward Kernel </nki/library/api/blockwise-mm-backward>` — Backward pass for blockwise matrix multiplication in MoE layers, computing gradients for all parameters with support for dropless MoE.\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* :doc:`QKV Kernel </nki/library/api/qkv>`: Added FP8 quantization support (``quantization_type``, ``qkv_w_scale``, ``qkv_in_scale``), fused FP8 KV cache quantization (``k_cache``, ``v_cache``, ``k_scale``, ``v_scale``, ``fp8_max``, ``fp8_min``, ``kv_dtype``), block-based KV cache layout (``use_block_kv``, ``block_size``, ``slot_mapping``), and MX quantization input swizzling (``is_input_swizzled``).\n* :doc:`MLP Kernel </nki/library/api/mlp>`: Added FP8 quantization support (``quantization_type``, ``gate_w_scale``, ``up_w_scale``, ``down_w_scale``, ``gate_up_in_scale``, ``down_in_scale``, ``quant_clipping_bound``), gate/up projection clamping (``gate_clamp_upper_limit``, ``gate_clamp_lower_limit``, ``up_clamp_upper_limit``, ``up_clamp_lower_limit``), ``skip_gate_proj`` option, and fp16 support for TKG mode.\n* :doc:`Output Projection CTE Kernel </nki/library/api/output-projection-cte>`: Added FP8 quantization support (``quantization_type``, ``input_scales``, ``weight_scales``).\n* :doc:`Output Projection TKG Kernel </nki/library/api/output-projection-tkg>`: Added FP8 quantization support (``quantization_type``, ``weight_scale``, ``input_scale``) and removed 512 restriction on non-transpose path.\n* :doc:`Attention CTE Kernel </nki/library/api/attention-cte>`: Added strided Q slicing for context parallelism (``cp_strided_q_slicing``).\n* :doc:`RMSNorm-Quant Kernel </nki/library/api/rmsnorm-quant>`: Added input dequantization scale support 
(``input_dequant_scale``).\n\nKernel Utilities\n^^^^^^^^^^^^^^^^\n\nSee :doc:`Kernel Utilities Reference </nki/library/kernel-utils/index>` for full documentation.\n\n* :doc:`TensorView </nki/library/kernel-utils/tensor-view>`: Added ``rearrange`` method for flexible dimension reordering, ``has_dynamic_access`` for checking whether a view requires runtime-dependent addressing, and ``key_in_dict`` helper. The ``slice`` method now clamps the end index to dimension bounds instead of asserting.\n* :doc:`TiledRange </nki/library/kernel-utils/tiled-range>`: ``TiledRangeIterator`` now exposes an ``end_offset`` attribute, enabling kernels to determine the end position of each tile without manual calculation.\n* :doc:`SbufManager (Allocator) </nki/library/kernel-utils/allocator>`: Added ``get_total_space`` and ``get_used_space`` for querying SBUF utilization, ``set_name_prefix`` / ``get_name_prefix`` for scoped naming, and ``flush_logs`` to emit buffered allocation logs. SbufManager now uses ``TreeLogger`` to provide hierarchical, tree-formatted logs of SBUF allocation and deallocation events, making it easier to debug memory usage across nested scopes.\n* **QuantizationType**: Added ``MX`` enum value for microscaling quantization (MxFP4/MxFP8).\n* **common_types**: Added ``GateUpDim`` enum for distinguishing gate vs up projection dimensions.\n* **rmsnorm_tkg / layernorm_tkg**: Both subkernels now accept a ``TensorView`` or ``nl.ndarray`` for input and require an explicit ``output`` tensor parameter, giving callers control over output placement.\n* **New utilities**: Added ``rmsnorm_mx_quantize_tkg`` subkernel for fused RMSNorm with MX quantization in token generation, ``interleave_copy`` for interleaved tensor copy operations, ``LncSubscriptable`` for LNC-aware data access patterns, and ``TreeLogger`` for hierarchical allocation logging.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* The open source repository source directory has been renamed from ``nkilib_standalone`` to ``nkilib_src``.\n* :doc:`MLP Kernel </nki/library/api/mlp>`: The function has been renamed from ``mlp_kernel`` to ``mlp``. New parameters have been inserted in the middle of the signature; callers using positional arguments beyond ``normalization_type`` must update to use keyword arguments.\n* :doc:`QKV Kernel </nki/library/api/qkv>`: New parameters (``quantization_type``, ``qkv_w_scale``, ``qkv_in_scale``) have been inserted after ``bias``; callers using positional arguments beyond ``bias`` must update to use keyword arguments.\n* :doc:`Output Projection TKG Kernel </nki/library/api/output-projection-tkg>`: The ``bias`` parameter is now optional (default ``None``). New parameters (``quantization_type``, ``weight_scale``, ``input_scale``) have been inserted before ``TRANSPOSE_OUT``; callers using positional arguments beyond ``bias`` must update to use keyword arguments.\n* **TiledRangeIterator**: The constructor now requires a fourth positional argument ``end_offset``.\n* **TensorView**: The ``sizes`` attribute has been renamed to ``shape``.\n* **rmsnorm_tkg**: The ``inp`` parameter has been renamed to ``input``. A new required ``output`` parameter has been added as the third argument. The ``output_in_sbuf`` parameter has been removed. New parameters ``hidden_dim_tp`` and ``single_core_forced`` have been added.\n* **layernorm_tkg**: The ``inp`` parameter has been renamed to ``input``. A new required ``output`` parameter has been added as the third argument. 
The ``output_in_sbuf`` parameter has been removed.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed attention TKG compilation and non-determinism issues.\n* Fixed incorrect v_active slice indices in attention TKG block KV path.\n* Fixed batch sharding in gen_mask_tkg active mask loading.\n* Fixed expert_affinities masking when ``mask_unselected_experts`` is True in MoE TKG.\n* Fixed expert_index shape mismatch in MoE TKG for T > 128.\n* Fixed MoE affinity mask handling for T not divisible by 128.\n* Fixed MoE TKG MX weight generation x4 pack size.\n* Fixed MLP CTE ``force_cte_mode`` parameter validation.\n* Fixed output projection CTE mixed precision support.\n* Fixed output projection TKG variable name typo.\n* Fixed router_topk bias shape to satisfy NKI check requirements.\n* Fixed tail iteration bug for sequences not a multiple of 128 in MoE CTE.\n* Fixed reading extra partitions for last rank in MoE CTE.\n\nKnown Issues\n~~~~~~~~~~~~\n\n.. _nki-lib-2-27-0-rn:\n\nNKI Library (NKI-Lib) (Neuron 2.27.0 Release)\n--------------------------------------------------------------------\n\nWhat's New\n~~~~~~~~~~\n\nThis release introduces the NKI Library, which provides pre-built kernels you can use to optimize\nthe performance of your models. The NKI Library offers ready-to-use, pre-optimized kernels that\nleverage the full capabilities of AWS Trainium hardware.\n\nNKI Library kernels are published in the `NKI Library GitHub repository <https://github.com/aws-neuron/nki-library>`_.\nIn Neuron 2.27, these kernels are also shipped as part of neuronx-cc under the ``nkilib.*`` namespace.\n\nAccessing NKI Library Kernels\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can access NKI Library kernels in two ways:\n\n* **Shipped version**: Import from the ``nkilib.*`` namespace (included with neuronx-cc in Neuron 2.27)\n* **Open source repository**: Clone and use kernels from the GitHub repository under the ``nkilib_standalone.nkilib.*`` namespace\n\nNew Kernels\n~~~~~~~~~~~\n\nThis release includes the following pre-optimized kernels:\n\n* **Attention CTE Kernel** — Implements attention with support for multiple variants and optimizations\n* **Attention TKG Kernel** — Implements attention specifically optimized for token generation scenarios\n* **MLP Kernel** — Implements a Multi-Layer Perceptron with optional normalization fusion and various optimizations\n* **Output Projection CTE Kernel** — Computes the output projection operation optimized for Context Encoding use cases\n* **Output Projection TKG Kernel** — Computes the output projection operation optimized for Token Generation use cases\n* **QKV Kernel** — Performs Query-Key-Value projection with optional normalization fusion\n* **RMSNorm-Quant Kernel** — Performs optional RMS normalization followed by quantization to fp8\n\nNKI Library Kernel Migration to New nki.* Namespace in Neuron 2.28\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSome NKI Library kernels currently use the legacy ``neuronxcc.nki.*`` namespace. Starting with\nNeuron 2.28, all NKI Library kernels will migrate to the new ``nki.*`` namespace.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs. 
Customers\nusing NKI Library kernels should review the migration guide for any required changes.\n\nNKI Library Namespace Changes in Neuron 2.28\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStarting with Neuron 2.28, the open source repository namespace will change from\n``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between\nthe open source repository and the shipped version.\n\nCustomers who want to add or modify NKI Library kernels can build and install them to\nreplace the default implementation without changing model imports.\n\n\n\n    "
  },
  {
    "path": "release-notes/components/nki.rst",
    "content": ".. meta::\n    :description: Release notes for the Neuron Kernel Interface (NKI) component across all Neuron SDK versions\n    :keywords: NKI, Neuron Kernel Interface, release notes, nki.language, nki.isa, kernels\n    :date-modified: 04/09/2026\n\n.. _nki_rn:\n\nRelease Notes for Neuron Component: Neuron Kernel Interface (NKI)\n==================================================================\n\nThe release notes for the Neuron Kernel Interface (NKI) component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _nki-2-29-0-rn:   \n\nNeuron Kernel Interface (NKI) [0.3.0] (Neuron 2.29.0 Release)\n---------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\nAWS Neuron SDK 2.29.0 introduces NKI 0.3.0, a significant update to the Neuron Kernel Interface for General Availability. NKI 0.3.0 features NKI Standard Library (nki-stdlib), which provides developer-visible code for all NKI APIs and native language objects (e.g., ``NkiTensor``). This release provides new exposed Trainium capabilities and features in the NKI API and introduces ``nki.language`` APIs. NKI 0.3.0 includes a CPU Simulator, which executes NKI kernels entirely on CPU using NumPy — enabling developers to validate kernel logic on laptops and CI environments without Trainium hardware. NKI 0.3.0 also includes the ``nki.typing`` module for declaring expected tensor shapes, a dedicated ``nki.isa.exponential`` instruction optimized for Softmax computation, matmul accumulation control, explicit memory address placement, and variable-length all-to-all collectives via ``nki.collectives.all_to_all_v``. NKI 0.3.0 includes several API breaking changes that improve correctness and consistency along with updated documentation.\n\nFor the full list of changes and update examples, see the :ref:`NKI 0.3.0 Update Guide <nki-0-3-0-update-guide>`.\n\nNew Features\n~~~~~~~~~~~~\n\n* **NKI Standard Library (nki-stdlib)**: NKI 0.3.0 ships with the NKI Standard Library (nki-stdlib), which provides developer-visible code for all NKI APIs and native language objects (e.g., ``NkiTensor``).\n\n* **NKI CPU Simulator** *(Experimental)*: Executes NKI kernels entirely on CPU using NumPy, enabling local development, debugging, and functional correctness testing without Trainium hardware. Set the environment variable ``NKI_SIMULATOR=1`` to run existing kernels without code changes, or wrap the kernel call with ``nki.simulate(kernel)``. See :doc:`nki.simulate API Reference </nki/api/nki.simulate>`.\n\n* **nki.language APIs** *(Experimental)*: Introduces ``nki.language`` APIs as convenience wrappers around ``nki.isa`` APIs, including ``nl.load``, ``nl.store``, ``nl.copy``, ``nl.matmul``, ``nl.transpose``, ``nl.softmax``, and other high-level operations. See :doc:`nki.language API Reference </nki/api/nki.language>`.\n\n* **nki.typing module**: New module for type-annotating kernel tensor parameters. Use ``nt.tensor[shape]`` to declare expected tensor shapes.\n\n* **nki.isa.exponential**: Dedicated exponential instruction with max subtraction, faster than ``nisa.activation(op=nl.exp)`` and useful for Softmax calculation. Trn3 (NeuronCore-v4) only. See :doc:`nki.isa.exponential </nki/api/generated/nki.isa.exponential>`.\n\n* **nki.collectives.all_to_all_v**: Variable-length all-to-all collective. Unlike ``all_to_all``, uses a metadata tensor to specify per-rank send/recv counts. 
See :doc:`nki.collectives API Reference </nki/api/nki.collectives>`.\n\n* **Matmul accumulation**: ``nc_matmul`` and ``nc_matmul_mx`` now have an ``accumulate`` parameter that controls whether the operation overwrites or accumulates on the destination PSUM tile. The default (``accumulate=None``) auto-detects, matching Beta 2 behavior. See :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>`.\n\n* **Address placement**: The ``address`` parameter was added to ``nki.language.ndarray`` for explicit memory placement. See :doc:`nki.language.ndarray </nki/api/generated/nki.language.ndarray>`.\n\nDeprecated and Removed APIs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n* ``nki.isa.tensor_copy_dynamic_src`` / ``nki.isa.tensor_copy_dynamic_dst`` — Deprecated and scheduled for removal. Use ``nisa.tensor_copy()`` with ``.ap()`` and ``scalar_offset`` instead.\n\n* ``nki.jit(platform_target=...)`` — Deprecated. Set the target platform via the ``NEURON_PLATFORM_TARGET_OVERRIDE`` environment variable instead. This is a breaking change.\n\n.. TODO: Create an NKI environment variables reference page and link from here.\n\n* ``nki.jit(mode=...)`` — Deprecated and ignored. The NKI Compiler now auto-detects the framework from kernel arguments. This is a breaking change.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n.. note::\n\n   NKI 0.3.0 requires all NKI kernels in a model to be updated to NKI 0.3.0. Mixing NKI 0.3.0 and NKI Beta 2 kernels in the same model is not supported. For models that have not yet been updated, continue using Neuron SDK 2.28.\n\n* ``nisa.dma_copy`` — No longer supports reading directly from PSUM. Copy the PSUM tensor to SBUF first using ``nisa.tensor_copy``.\n\n* ``nisa.dma_copy`` — Enforces matching source and destination element types when using ``dge_mode=dge_mode.hwdge``. Use ``.view()`` to reinterpret types.\n\n* ``nisa.dma_copy`` — ``dst_rmw_op`` and ``unique_indices`` parameters removed. Use ``nisa.dma_compute`` instead.\n\n* ``nisa.dma_compute`` — ``scales`` and ``reduce_op`` parameters swapped positions. ``scales`` is now optional. ``unique_indices`` parameter added. Update call sites to use the new parameter order: ``nisa.dma_compute(dst, srcs, reduce_op, scales=None, unique_indices=True)``.\n\n* ``nisa.memset`` — Enforces strict type matching between ``value`` and destination dtype. x4 packed types enforce ``value=0``. Kernels that pass float values to integer-typed tensors (e.g., ``value=2.0`` instead of ``value=2``) will now raise an error at compile time.\n\n* ``nisa.sendrecv`` — ``use_gpsimd_dma`` replaced by ``dma_engine`` enum. Update existing kernels to use the new enum.\n\n* ``nisa.affine_select`` — ``offset`` moved from 3rd positional argument to keyword argument with default ``0``.\n\n* ``nisa.register_move`` — ``imm`` renamed to ``src``, now accepts ``VirtualRegister``. Update keyword argument from ``imm=`` to ``src=``.\n\n* ``nki.collectives.collective_permute_implicit_current_processing_rank_id`` — ``num_channels`` parameter removed. Remove ``num_channels`` from call sites and pass ``channel_ids`` list to ``collective_permute_implicit()`` instead.\n\n* Output tensors must use ``buffer=nl.shared_hbm``. Using ``nl.hbm`` causes compilation failures.\n\n* Raw integer enum constants no longer accepted. Use named enum members.\n\n* String buffer names no longer accepted. Use buffer objects (e.g., ``nl.sbuf``).\n\n* Keyword-only argument separator (``*``) in kernel signatures is not supported.\n\n* ``is`` / ``is not`` operators are not supported. 
Use ``==`` / ``!=``.\n\n* ``list`` kernel arguments are not supported. Convert to tuples.\n\nFor before-and-after code examples, see the :ref:`NKI 0.3.0 Update Guide <nki-0-3-0-update-guide>`.\n\n.. note::\n\n   The previously announced removal of the ``neuronxcc.nki.*`` namespace has been postponed to a future release. Both the ``neuronxcc.nki.*`` and ``nki.*`` namespaces continue to be supported in this release.\n\nOther Changes\n~~~~~~~~~~~~~\n\n* ``nki.isa.dma_engine`` alias repurposed as the ``dma_engine`` enum for DMA transfer engine selection.\n\n* ``nki.isa.iota`` — ``offset`` now optional with default ``0``.\n\n* ``nki.isa.core_barrier`` — ``engine`` default changed from ``unknown`` to ``gpsimd`` (no behavioral change).\n\n* ``nki.language.num_programs`` — ``axes`` default changed from ``None`` to ``0``.\n\n* ``nki.language.program_id`` — ``axis`` now defaults to ``0``.\n\n* ``nki.language.ndarray`` — ``buffer`` default changed from ``None`` to ``nl.sbuf``.\n\n* ``nki.language.zeros`` — ``buffer`` default changed from ``None`` to ``nl.sbuf``.\n\n* ``nki.language.sequential_range`` — ``stop`` and ``step`` now have default values (``None`` and ``1``).\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed incorrect axis handling in ``nisa.tensor_reduce``. Beta 2 incorrectly allowed ``axis=1`` to refer to the last free dimension even for 3D/4D tensors. NKI 0.3.0 corrects this so that axis values correspond to the actual tensor dimensions.\n\n* Fixed ``nisa.range_select`` silently overriding user-specified parameters. The ``on_false_value`` and ``reduce_cmd`` parameters were incorrectly ignored by the compiler — ``on_false_value`` was always set to ``-3.4028235e+38`` and ``reduce_cmd`` was always set to ``reset_reduce``, regardless of the values passed in. NKI 0.3.0 honors the ``reduce_cmd`` parameter and documents the ``FP32_MIN`` hardware constraint for ``on_false_value``.\n\nKnown Issues\n~~~~~~~~~~~~\n\n**Math Operations**\n\n* ``nki.language.divide`` is not supported — Division is not available as a hardware instruction. As a workaround, multiply by the reciprocal: ``nl.multiply(x, nl.reciprocal(y))``.\n\n* ``nki.language.fmod`` and ``nki.language.mod`` are not supported — Modulo operations are not available as hardware instructions. These APIs work in simulation but will fail when compiled for Trainium hardware.\n\n* ``nki.language.power`` does not support scalar exponents — ``nl.power(tile, scalar)`` is not supported. Use ``nl.power(tile, tile)`` instead, where both operands are tiles.\n\n**Broadcasting**\n\n* Binary operations do not support broadcasting — Operations like ``nl.add(a, b)`` require both operands to have the same shape. Broadcasting (e.g., adding a ``(128, 1)`` tile to a ``(128, 512)`` tile) is not yet supported.\n\n* ``nki.language.softmax`` and ``nki.language.rms_norm`` fail on hardware — These functions rely on internal broadcasting between full-size and reduced-size tiles, which is not supported on hardware. They work correctly in simulation.\n\n**Random Number Generation**\n\n* ``nki.language.random_seed`` requires a tensor, not a scalar — Pass a ``[1, 1]`` tensor on SBUF instead of a Python integer. For example: ``nl.random_seed(nl.full((1, 1), 42, dtype=nl.int32, buffer=nl.sbuf))``.\n\n* ``nki.language.rand`` and ``nki.language.random_seed`` engine behavior — On NeuronCore-v4+ (Trn3+), ``rand`` uses ``nisa.rand2`` on the Vector Engine. 
On earlier NeuronCores, ``rand`` uses ``nisa.rng`` which may run on a different engine than ``random_seed``, potentially causing ``random_seed`` to have no effect on ``rand`` output.\n\n**Matrix Operations**\n\n* ``nki.language.matmul`` without ``transpose_x=True`` is not supported — Calling ``nl.matmul(x, y)`` without setting ``transpose_x=True`` will fail. As a workaround, always use ``nl.matmul(x, y, transpose_x=True)`` and pre-arrange data accordingly.\n\n**Data Movement**\n\n* ``nki.language.store`` does not support PSUM tiles directly — Storing a tile that resides in PSUM requires manually copying it to SBUF first using ``nisa.tensor_copy``. A future release will handle this automatically.\n\n* ``nki.language.copy`` uses lossy FP32 casting — ``nl.copy`` uses the Scalar Engine which internally casts through FP32, which is lossy for integer types with values exceeding FP32 precision (e.g., int32 values > 2^23). Additionally, cross-buffer copies (e.g., PSUM to SBUF) are not supported.\n\n**Control Flow**\n\n* ``nki.language.dynamic_range`` loop variable cannot be used in index arithmetic — The induction variable of a ``dynamic_range`` loop is a scalar, not a register. It cannot be used as a ``scalar_offset`` in access patterns or in arithmetic expressions for computing tile offsets. Use ``nl.affine_range`` or ``nl.static_range`` if you need to compute offsets from the loop variable.\n\n**Multi-Core (LNC2)**\n\n* LNC2 requires identical control flow across cores — When running with Logical NeuronCore 2 (LNC2), the NKI compiler expects each physical NeuronCore to execute identical control flow. Programs with dynamic control flow that differs across cores may deadlock or produce incorrect results. This constraint is not enforced at compile time.\n\n**Caching**\n\n* NKI kernel caching assumes kernels are pure functions of their input arguments. If a kernel's output depends on external state (such as global variables or closures over mutable objects), the cache may return stale results. This is undefined behavior. Always ensure kernel outputs are determined solely by kernel arguments.\n\n**Compiler**\n\n* Address rotation cannot be disabled — Address rotation, a backend compiler optimization that rotates tensor addresses for improved memory utilization, is enabled by default and cannot be opted out of in this release.\n\n\n.. _nki-2-28-0-rn:   \n\nNeuron Kernel Interface (NKI) (Beta 2 - 0.2.0) [2.28] (Neuron 2.28.0 Release)\n-----------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\nNew Features\n~~~~~~~~~~~~\n\n* LNC (Large Neuron Core) multi-core support:\n\n  * **Shared buffers and canonical outputs**: The compiler now tracks\n    :doc:`shared_hbm </nki/api/generated/nki.language.shared_hbm>` tensors declared in kernels\n    and canonicalizes LNC kernel outputs into a consistent form. This is foundational\n    infrastructure for multi-core kernel compilation.\n    See :doc:`LNC Overview </nki/get-started/about/lnc>`.\n\n  * **Private HBM tensors**: Users can declare tensors private to a single NeuronCore using the\n    :doc:`private_hbm </nki/api/generated/nki.language.private_hbm>` memory type, distinct from\n    regular and shared HBM.\n\n  * **Intra-LNC collectives**: New ISA instruction types for multi-core collective operations\n    such as cross-core reductions and broadcasts. 
See full API listing under\n    :doc:`nki.collectives </nki/api/nki.collectives>` below.\n\n* New ``nki.isa`` APIs:\n\n  * :doc:`nki.isa.nonzero_with_count </nki/api/generated/nki.isa.nonzero_with_count>` — returns nonzero element indices and their count, useful for sparse computation and dynamic masking\n  * ``nki.isa.exponential`` — computes element-wise exponential on tensors. See :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>`.\n\n* New :doc:`nki.collectives </nki/api/nki.collectives>` module, enabling collective communication across multiple NeuronCores directly from NKI kernels:\n\n  * :doc:`nki.collectives.all_reduce </nki/api/generated/nki.collectives.all_reduce>`\n  * :doc:`nki.collectives.all_gather </nki/api/generated/nki.collectives.all_gather>`\n  * :doc:`nki.collectives.reduce_scatter </nki/api/generated/nki.collectives.reduce_scatter>`\n  * :doc:`nki.collectives.all_to_all </nki/api/generated/nki.collectives.all_to_all>`\n  * :doc:`nki.collectives.collective_permute </nki/api/generated/nki.collectives.collective_permute>`\n  * :doc:`nki.collectives.collective_permute_implicit </nki/api/generated/nki.collectives.collective_permute_implicit>`\n  * :doc:`nki.collectives.collective_permute_implicit_reduce </nki/api/generated/nki.collectives.collective_permute_implicit_reduce>`\n  * :doc:`nki.collectives.rank_id </nki/api/generated/nki.collectives.rank_id>`\n\n* New ``dtypes``:\n\n  * :doc:`nki.language.float8_e4m3fn </nki/api/generated/nki.language.float8_e4m3fn>` — for FP8 inference and training workloads\n\n* New NKI language features:\n\n  * ``no_reorder`` blocks — use ``with no_reorder(): ...`` to prevent the compiler from reordering instructions within a block, for kernels where instruction ordering affects correctness\n  * ``__call__`` special method support — callable objects (classes with ``__call__``) can now be used as functions within NKI kernels\n  * ``tensor.view`` method — tensors now support ``.view()`` for reshaping\n  * ``nl.shared_constant`` can now be passed to kernels as string arguments, not just tensor objects\n\nImprovements\n~~~~~~~~~~~~\n\n* Updated ``nki.isa`` APIs:\n\n  * :doc:`nki.isa.dma_transpose </nki/api/generated/nki.isa.dma_transpose>` now supports indirect addressing\n  * :doc:`nki.isa.dma_copy </nki/api/generated/nki.isa.dma_copy>` now supports ``unique_indices`` parameter\n  * :doc:`nki.isa.register_alloc </nki/api/generated/nki.isa.register_alloc>` now accepts an optional tensor argument to pre-fill the allocated register with initial values\n\n* Compiler output improvements:\n\n  * The compiler no longer truncates diagnostic output; users now receive the full set of warnings and errors\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>` parameter ``psumAccumulateFlag`` has been removed. This parameter had no effect on compilation or execution. Simply remove it from your kernel code.\n\n* :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>` parameter ``is_moving_zero`` has been renamed to ``is_moving_onezero`` to match hardware semantics, consistent with the companion ``is_stationary_onezero`` parameter. Kernels that passed ``is_moving_zero`` by name should update to ``is_moving_onezero``.\n\n* ``nki.tensor`` has moved to ``nki.meta.tensor``. Users should update their imports accordingly.\n\n.. note::\n\n   The previously announced removal of the ``neuronxcc.nki.*`` namespace has \n   been postponed from Neuron 2.28 to Neuron 2.29. 
Both the ``neuronxcc.nki.*`` \n   and ``nki.*`` namespaces continue to be supported in this release. We \n   encourage customers to migrate to the ``nki.*`` namespace using the \n   :doc:`NKI Beta 2 Migration Guide </nki/migration/nki-beta2-migration-guide>`.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed incorrect default value for ``on_false_value`` in ``nki.isa.range_select``. The default\n  was ``0.0`` instead of negative infinity (``-inf``). This caused ``range_select`` to write zeros\n  for out-of-range elements instead of the expected negative-infinity sentinel, which could produce\n  incorrect results in downstream reductions (e.g., max-pooling or top-k).\n  See :doc:`nki.isa.range_select </nki/api/generated/nki.isa.range_select>`.\n\n* Fixed default value parsing for keyword-only arguments in NKI kernels. When a Python function\n  used keyword-only arguments with default values (arguments after ``*`` in the signature), the\n  NKI compiler did not associate the defaults with their corresponding parameter names.\n  This caused keyword-only arguments to appear as required even when they had defaults, leading to\n  \"missing argument\" errors during kernel compilation.\n\n* Fixed wrong default for ``reduce_cmd`` in :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>`.\n  The default was incorrectly set to ``ZeroAccumulate`` instead of ``Idle``, causing the accumulator\n  to be zeroed before every activation call even when no reduction was requested.\n\n* Fixed missing ALU operators (``rsqrt``, ``abs``, ``power``) in\n  :doc:`nki.isa.tensor_scalar </nki/api/generated/nki.isa.tensor_scalar>` and\n  :doc:`nki.isa.tensor_tensor </nki/api/generated/nki.isa.tensor_tensor>`. Passing these operators\n  previously raised an \"unsupported operator\" error.\n  See :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`.\n\n* Fixed ``float8_e4m3fn`` to ``float8_e4m3`` conversion for kernel inputs and outputs. When a\n  tensor with dtype ``float8_e4m3fn`` was passed to the compiler, the automatic conversion to\n  ``float8_e4m3`` could fail with a size-check error. The conversion now validates sizes\n  correctly before casting.\n  See :doc:`nki.language.float8_e4m3 </nki/api/generated/nki.language.float8_e4m3>`.\n\n* Fixed dynamic for loop incorrectly incrementing the loop induction variable. In loops with a\n  runtime-determined trip count (``sequential_range`` with non-constant bounds), the compiler\n  generated incorrect increment code, causing the loop counter to never advance and the loop to\n  run indefinitely or produce incorrect iteration values.\n  See :doc:`nki.language.sequential_range </nki/api/generated/nki.language.sequential_range>`.\n\n* Fixed reshape of ``shared_hbm`` and ``private_hbm`` tensors failing partition size check.\n  Reshape only recognized plain ``hbm`` memory as exempt from partition-dimension size validation.\n  Tensors allocated in ``shared_hbm`` or ``private_hbm`` (used for cross-kernel and\n  kernel-private storage) incorrectly triggered a \"partition size mismatch\" error when reshaped.\n  See :doc:`nki.language.shared_hbm </nki/api/generated/nki.language.shared_hbm>` and\n  :doc:`nki.language.private_hbm </nki/api/generated/nki.language.private_hbm>`.\n\n* Fixed bias shape checking in :doc:`nki.isa.activation </nki/api/generated/nki.isa.activation>`.\n  The ``bias`` parameter was not validated for shape correctness. 
A bias tensor with a free\n  dimension other than 1 (e.g., shape ``(128, 64)`` instead of ``(128, 1)``) was accepted\n  without validation, which could produce incorrect results. The compiler now raises an error if the bias\n  free dimension is not 1.\n\n* Fixed incorrect line numbers in stack traces and error reporting. An off-by-one error in the\n  line offset calculation caused all reported line numbers to be shifted by one.\n  Additionally, error location was sometimes lost when errors propagated across file boundaries.\n\n* Fixed invalid keyword arguments being silently ignored instead of raising an error. When calling\n  an NKI API with a misspelled or unsupported keyword argument, the argument was ignored\n  without warning.\n  The compiler now validates all keyword argument names against the function signature and raises\n  an ``unexpected keyword argument`` error for unrecognized names.\n\n* Fixed ``nki.jit`` in auto-detection mode returning an uncalled kernel object instead of\n  executing the kernel. When ``nki.jit`` was used without specifying a framework mode (e.g.,\n  ``@nki.jit`` with no ``mode`` argument), the auto-detection path constructed the appropriate\n  framework-specific kernel object but returned it without calling it. The user received a kernel\n  object instead of the computed result, requiring an extra manual invocation.\n  See :doc:`nki.jit </nki/api/generated/nki.jit>`.\n\n* Fixed stale kernel object state between trace invocations. When tracing the same kernel\n  multiple times (e.g., with different input shapes), compiler state was not fully reset\n  between invocations, causing name collisions and incorrect results.\n  The trace state is now fully reset before each invocation.\n\n* Improved 'removed during code migration' error messages with clear descriptions of\n  unimplemented features. APIs not available in this release (``nki.baremetal``, ``nki.benchmark``,\n  ``nki.profile``, ``nki.simulate_kernel``) previously raised a generic\n  ``NotImplementedError(\"removed during code migration\")`` message. Each now raises a specific\n  message naming the unsupported API. Additionally, calling an ``nki.jit`` kernel with no\n  arguments now raises a clear error instead.\n  See :doc:`NKI Beta 2 Migration Guide </nki/migration/nki-beta2-migration-guide>`.\n\n* Fixed nested ``nki_jit`` decorators not being allowed. The NKI compiler only recognized\n  ``@nki.jit``-decorated functions when they were plain function objects. Nested decorators\n  (e.g., ``@my_wrapper @nki.jit``) wrapped the function in a non-function object, causing the\n  compiler to skip it. The compiler now correctly unwraps decorator chains to find the underlying\n  kernel function. See :doc:`nki.jit </nki/api/generated/nki.jit>`.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* ``nki.isa.range_select``: The ``on_false_value`` and ``reduce_cmd`` parameters are incorrectly\n  ignored by the NKI compiler. The ``on_false_value`` is always set to ``(-3.4028235e+38)``\n  and ``reduce_cmd`` is always set to ``reduce_cmd.reset_reduce``, regardless of the values passed in.\n\n.. 
_nki-2-27-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta 2 - 0.1.0) [2.27] (Neuron 2.27.0 Release)\n-----------------------------------------------------------------------------\n\nDate: 12/25/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* new ``nki.language`` APIs:\n\n  * ``nki.language.device_print``\n\n* new ``nki.isa`` APIs:\n\n  * ``nki.isa.dma_compute``\n  * ``nki.isa.quantize_mx``\n  * ``nki.isa.nc_matmul``\n  * ``nki.isa.nc_n_gather`` [previously ``nl.gather_flattened`` with the free partition limited to 512]\n  * ``nki.isa.rand2``\n  * ``nki.isa.rand_set_state``\n  * ``nki.isa.rand_get_state``\n  * ``nki.isa.set_rng_seed``\n  * ``nki.isa.rng``\n\n* new ``dtypes``:\n\n  * ``nki.language.float8_e5m2_x4``\n  * ``nki.language.float4_e2m1fn_x4``\n  * ``nki.language.float8_e4m3fn_x4``\n\n* changes to existing APIs:\n\n  * several ``nki.language`` APIs have been removed in NKI Beta 2\n  * all ``nki.isa`` APIs now take ``dst`` as an input parameter\n  * all ``nki.isa`` APIs no longer support the ``dtype`` and ``mask`` parameters\n  * ``nki.isa.memset`` — removed the ``shape`` positional argument; the shape is now derived from ``dst``\n  * ``nki.isa.affine_select`` — takes ``pattern`` and ``cmp_op`` parameters instead of ``pred``\n  * ``nki.isa.iota`` — ``expr`` replaced with ``pattern`` and ``offset``\n  * ``nki.isa.nc_stream_shuffle`` - ``src`` and ``dst`` order changed\n\n* docs improvements:\n\n  * restructured NKI Documentation to align with workflows\n  * added :doc:`Trainium3 Architecture Guide for NKI </nki/guides/architecture/trainium3_arch>`\n  * added :doc:`About Neuron Kernel Interface (NKI) </nki/get-started/about/index>`\n  * added :doc:`NKI Environment Setup Guide </nki/get-started/setup-env>`\n  * added :doc:`Get Started with NKI </nki/get-started/quickstart-implement-run-kernel>`\n  * added :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`\n  * added :doc:`About the NKI Compiler </nki/deep-dives/nki-compiler>`\n  * added :doc:`About NKI Beta 2 Migration </nki/migration/nki-beta2-migration-guide>`\n  * added :doc:`MXFP Matrix Multiplication with NKI </nki/deep-dives/mxfp-matmul>`\n  * updated :doc:`Matrix Multiplication Tutorial </nki/guides/tutorials/matrix_multiplication>`\n  * updated :doc:`Profile a NKI Kernel </nki/guides/use-neuron-profile>`\n  * updated :doc:`NKI APIs </nki/api/index>`\n  * updated :doc:`NKI Library docs </nki/library/index>`\n  * removed NKI Error Guide\n\nKnown Issues\n~~~~~~~~~~~~\n\n* ``nki.isa.nc_matmul`` - ``is_moving_onezero`` was incorrectly named ``is_moving_zero`` in this release\n* NKI ISA semantic checks are not available with Beta 2; as a workaround, reference the API docs\n* NKI Collectives are not available with Beta 2\n* ``nki.benchmark`` and ``nki.profile`` are not available with Beta 2\n\n\n----\n\n.. 
_nki-2-26-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.26] (Neuron 2.26.0 Release)\n--------------------------------------------------------------------\n\nDate: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* new ``nki.language`` APIs:\n\n  * ``nki.language.gelu_apprx_sigmoid`` - Gaussian Error Linear Unit activation function with sigmoid approximation.\n  * ``nki.language.tile_size.total_available_sbuf_size`` to get the total available SBUF size\n\n* new ``nki.isa`` APIs:\n\n  * ``nki.isa.select_reduce`` - selectively copy elements with max reduction\n  * ``nki.isa.sequence_bounds`` - compute sequence bounds of segment IDs\n  * ``nki.isa.dma_transpose``\n\n    * ``axes`` param to define 4D transpose for some supported cases\n    * ``dge_mode`` to specify the Descriptor Generation Engine (DGE).\n\n  * ``nl.gelu_apprx_sigmoid`` op support on ``nki.isa.activation``\n\n* fixes / improvements:\n\n  * ``nki.language.store`` now supports the PSUM buffer, with an additional copy inserted.\n\n* docs/tutorial improvements:\n\n  * ``nki.isa.dma_transpose`` API doc and example\n  * ``nki.simulate_kernel`` example improvement\n  * use ``nl.fp32.min`` in tutorial code instead of a magic number\n\n* better error reporting:\n\n  * indirect indexing on transpose\n  * mask expressions\n\n\n----\n\n.. _nki-2-24-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.24] (Neuron 2.24.0 Release)\n--------------------------------------------------------------------\n\nDate: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* ``sqrt`` valid data range extended, improving accuracy and supporting a wider range of numerical values.\n* ``nki.language.gather_flattened`` new API\n* ``nki.isa.nc_match_replace8`` additional param ``dst_idx``\n* improved docs/examples on ``nki.isa.nc_match_replace8``, ``nki.isa.nc_stream_shuffle``\n* improved error messages\n\n\n----\n\n.. _nki-2-23-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.23] (Neuron 2.23.0 Release)\n--------------------------------------------------------------------\n\nDate: 05/20/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* ``nki.isa.range_select`` (for Trn2) new instruction\n* ``abs``, ``power`` ops now supported on ``nki.isa`` tensor instructions\n* ``abs`` op supported on the ``nki.isa.activation`` instruction\n* GpSIMD engine support added for 32-bit integer ``add`` and ``multiply`` in ``nki.isa`` tensor operations\n* ``nki.isa.tensor_copy_predicated`` support for reversing the predicate.\n* ``nki.isa.tensor_copy_dynamic_src``, ``tensor_copy_dynamic_dst`` engine selection.\n* ``nki.isa.dma_copy`` additional support with ``dge_mode``, ``oob_mode``, and in-place add ``rmw_op``.\n* ``+=, -=, /=, *=`` operators now work consistently across loop types, PSUM, and SBUF.\n* fixed simulation for instructions: ``nki.language.rand``, ``random_seed``, ``nki.isa.dropout``\n* fixed simulation masking behavior\n* Added warning when the block dimension is used for SBUF and PSUM tensors; see :ref:`NKI Block Dimension Migration Guide <nki_block_dimension_migration_guide>`\n\n\n----\n\n.. 
_nki-2-22-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.22] (Neuron 2.22.0 Release)\n--------------------------------------------------------------------\n\nDate: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* New modules and APIs:\n\n  * ``nki.profile``\n  * ``nki.isa`` new APIs:\n\n    * ``tensor_copy_dynamic_dst``\n    * ``tensor_copy_predicated``\n    * ``max8``, ``nc_find_index8``, ``nc_match_replace8``\n    * ``nc_stream_shuffle``\n\n  * ``nki.language`` new APIs: ``mod``, ``fmod``, ``reciprocal``, ``broadcast_to``, ``empty_like``\n\n* Improvements:\n\n  * ``nki.isa.nc_matmul`` now supports the PE tiling feature\n  * ``nki.isa.activation`` updated to support reduce operation and ``reduce`` commands\n  * ``nki.isa.engine`` enum\n  * ``engine`` parameter added to more ``nki.isa`` APIs that support engine selection (i.e., ``tensor_scalar``, ``tensor_tensor``, ``memset``)\n  * Documentation for ``nki.kernels`` has been moved to GitHub: https://aws-neuron.github.io/nki-samples. \n    The source code can be viewed at https://github.com/aws-neuron/nki-samples.\n\n    * These kernels are still shipped as part of the Neuron package in the ``neuronxcc.nki.kernels`` module\n\n* Documentation updates:\n\n  * Kernels public repository https://aws-neuron.github.io/nki-samples\n  * Updated :doc:`profiling guide </nki/guides/use-neuron-profile>` to use ``nki.profile`` instead of ``nki.benchmark``\n  * The NKI ISA Activation functions table now has :ref:`valid input data ranges<tbl-act-func>` listed\n  * The NKI ISA Supported Math operators table now has the :ref:`supported engine<tbl-aluop>` listed\n  * Clarify ``+=`` syntax support/limitation\n\n\n----\n\n.. _nki-2-21-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.21] (Neuron 2.21.0 Release)\n--------------------------------------------------------------------\n\nDate: 12/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* New modules and APIs:\n\n  * ``nki.compiler`` module with Allocation Control and Kernel decorators,\n    see the guide for more info.\n  * ``nki.isa``: new APIs (``activation_reduce``, ``tensor_partition_reduce``,\n    ``scalar_tensor_tensor``, ``tensor_scalar_reduce``, ``tensor_copy``, \n    ``tensor_copy_dynamic_src``, ``dma_copy``), new activation functions (``identity``, \n    ``silu``, ``silu_dx``), and target query APIs (``nc_version``, ``get_nc_version``).\n  * ``nki.language``: new APIs (``shared_identity_matrix``, ``tan``,\n    ``silu``, ``silu_dx``, ``left_shift``, ``right_shift``, ``ds``, ``spmd_dim``, ``nc``).\n  * New :ref:`datatype <nl_datatypes>`: ``float8_e5m2``\n  * New ``kernels`` (``allocated_fused_self_attn_for_SD_small_head_size``,\n    ``allocated_fused_rms_norm_qkv``) added; kernels moved to the public repository.\n\n\n* Improvements:\n\n  * Semantic analysis checks for ``nki.isa`` APIs to validate supported ops, dtypes, and tile shapes.\n  * Standardized naming conventions with keyword arguments for common optional parameters.\n  * Transition from function calls to kernel decorators (``jit``, \n    ``benchmark``, ``baremetal``, ``simulate_kernel``).\n\n* Documentation updates:\n\n  * Tutorial for :doc:`SPMD usage with multiple Neuron Cores on Trn2 </nki/guides/tutorials/spmd_multiple_nc_tensor_addition>`\n\n\n----\n\n.. 
_nki-2-20-1-rn:\n\nNeuron Kernel Interface (NKI) (Beta) (Neuron 2.20.1 Release)\n-------------------------------------------------------------\n\nDate: 12/03/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* NKI support for Trainium2, including full integration with the Neuron Compiler.\n  Users can directly shard NKI kernels across multiple Neuron Cores from an SPMD launch grid.\n  See the :doc:`tutorial </nki/guides/tutorials/spmd_multiple_nc_tensor_addition>` for more info.\n  See the :doc:`Trainium2 Architecture Guide </nki/guides/architecture/trainium2_arch>` for an initial version of the architecture specification\n  (more details to come in future releases).\n* New calling convention in NKI kernels, where kernel output tensors are explicitly returned from the kernel instead\n  of being passed by reference. See any :doc:`NKI tutorial </nki/guides/tutorials/index>` for code examples.\n\n\n----\n\n.. _nki-2-20-0-rn:\n\nNeuron Kernel Interface (NKI) (Beta) [2.20] (Neuron 2.20.0 Release)\n--------------------------------------------------------------------\n\nDate: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This release includes the beta launch of the Neuron Kernel Interface (NKI) (Beta).\n  NKI is a programming interface enabling developers to build optimized compute kernels\n  on top of Trainium and Inferentia. NKI empowers developers to enhance deep learning models\n  with new capabilities, performance optimizations, and scientific innovation.\n  It natively integrates with PyTorch and JAX, providing a Python-based programming environment\n  with Triton-like syntax and tile-level semantics, offering a familiar programming experience\n  for developers. Additionally, to enable bare-metal access for precisely programming the instructions\n  used by the chip, this release includes a set of NKI APIs (``nki.isa``) that directly emit\n  Neuron Instruction Set Architecture (ISA) instructions in NKI kernels.\n\n\n\n\n    "
  },
  {
    "path": "release-notes/components/nxd-core.rst",
    "content": ".. meta::\n    :description: Complete release notes for the NxD Core component across all AWS Neuron SDK versions.\n    :keywords: nxd core, neuronx-distributed, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _nxd-core_rn:\n\nComponent Release Notes for NxD Core\n====================================\n\nThe release notes for the NxD Core (``neuronx-distributed``) Neuron component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _nxd-core-2-26-0-rn:\n\nNxD Core [0.15.22259] (Neuron 2.26.0 Release)\n-----------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**NxD Core inference improvements**\n\n* Non-distributed inference in parallel layers: Updated parallel layers to support non-distributed inference when parallel state isn't initialized. In non-parallel environments, RowParallelLinear and ColumnParallelLinear now function as ``nn.Linear``, and ``ParallelEmbedding`` now functions as ``nn.Embedding``. This change enables you to simplify model code that works on device and on CPU by enabling you to use the parallel layer in both cases.\n* Added a ``compiler_flag_hook`` argument to ModelBuilder, which you can use to override compiler flags for different submodels and buckets.\n\nBug Fixes\n~~~~~~~~~\n\n* Added additional instance types to the ``hardware`` enum. For example, ``inf2`` now maps to ``trn1``.\n* Other minor bug fixes and improvements.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* At high batch size (>=32), we have observed performance degradation with ``shard-on-load`` for some models such as Llama3.1-8B. Our current recommendation is to disable this feature by enabling ``save_sharded_checkpoint`` in ``NeuronConfig`` when you trace and compile the model.\n* ``spmd_mode = True`` does not work when provided to the ``parallel_model_trace`` API. ``parallel_model_trace`` will be deprecated in the next Neuron SDK release.\n\n\n----\n\n.. _nxd-core-2-25-0-rn:\n\nNxD Core [0.14.18461] (Neuron 2.25.0 Release)\n-----------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Inference**\n\n* ModelBuilder V2: ModelBuilder V2 provides a simplified version of the ModelBuilder API that is more flexible and extensible. This API includes basic building blocks that you can use to trace, compile, and load modules to Neuron. For more information, see :ref:`nxd-core-model-builder-v2` and the updated `Llama-3.2-1B reference inference sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`__.\n\n**Training**\n\n* Support for Shared Experts: Shared Experts allow multiple model components to utilize the same expert neural networks. This release adds full support for Shared Experts in training workloads.\n\nDate of Release: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Inference:**\n\n* Add ``--auto-cast=none`` compiler arg by default in ModelBuilder to ensure model dtypes are preserved during compilation.\n* Update ModelBuilder to cast model weights based on dtypes defined in module parameters.\n* Add support for PyTorch 2.7. This release includes support for PyTorch 2.5, 2.6, and 2.7.\n* Other minor fixes and improvements.\n\n**Training:**\n\n* Added support for transformers 4.48.0\n\nBug Fixes\n~~~~~~~~~\n\n* Other minor fixes and improvements.\n\n\n----\n\n.. 
_nxd-core-2-23-0-rn:\n\nNxD Core [0.12.12111] (Neuron 2.23.0 Release)\n-----------------------------------------------\n\nDate of Release: 05/20/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Inference:**\n\n* Improve the Model Builder API. Note: The Model Builder API is in beta.\n\n  * Add Neuron Persistent Cache support to Model Builder. Now, Model Builder caches compiled model artifacts to reduce compilation time.\n  * Improve the performance of weight sharding in Model Builder to support shard-on-load in NxD Inference.\n  * Improve the performance of Model Builder trace when HLO ``debug`` mode is enabled.\n\n* Add a Llama-3.2-1B reference inference sample using NxD Core.\n* Remove the unsupported NxD inference examples. You can use the NxD Inference library to run inference on Neuron using NxD.\n* Other minor fixes and improvements.\n\n**Training:**\n\n* Context parallel support for sequence lengths up to 32k on TRN1 (beta feature)\n\n**General:**\n\n* Update the package version to include additional information.\n\nBug Fixes\n~~~~~~~~~\n\n* Other minor fixes and improvements.\n\n\n----\n\n.. _nxd-core-2-22-0-rn:\n\nNxD Core [0.11.0] (Neuron 2.22.0 Release)\n------------------------------------------\n\nDate of Release: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Inference:**\n\n* Improve the performance of weight sharding by up to 60-70%, depending on the model.\n* You can now configure modules to skip during quantization with the ``modules_to_not_convert`` argument.\n* Other minor fixes and improvements.\n\n**Training:**\n\n* Fixed issue with wikicorpus dataset download\n* Updated model load for LoRA checkpoints\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed issue with wikicorpus dataset download\n* Updated model load for LoRA checkpoints\n* Other minor fixes and improvements.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* With PT2.5, some of the key workloads like Llama3-8B training may show reduced performance when using the ``--llm-training`` compiler flag compared to PT2.1. In such a case, try removing the ``--llm-training`` flag from ``NEURON_CC_FLAGS`` in the run.sh, but only if you are using the Neuron Kernel Interface.\n\n\n----\n\n.. _nxd-core-2-21-1-rn:\n\nNxD Core [0.10.1] (Neuron 2.21.1 Release)\n------------------------------------------\n\nDate of Release: 01/14/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Inference:**\n\n* Fix an issue with sequence parallel support for quantized models.\n\nBug Fixes\n~~~~~~~~~\n\n* Fix an issue with sequence parallel support for quantized models.\n\n\n----\n\n.. _nxd-core-2-21-0-rn:\n\nNxD Core [0.10.0] (Neuron 2.21.0 Release)\n------------------------------------------\n\nDate of Release: 12/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Training:**\n\n* Added support for HuggingFace Llama3 70B with Trn2 instances\n* Added support for PyTorch 2.5\n* Added DPO support for post-training model alignment\n* Added fused QKV optimization in GQA models\n* Support for Mixture-of-Experts with Tensor, Sequence, and Pipeline parallelism\n\nKnown Issues\n~~~~~~~~~~~~\n\n* With PT2.5, some of the key workloads like Llama3-8B training may show reduced performance when using the ``--llm-training`` compiler flag compared to PT2.1. In such a case, try removing the ``--llm-training`` flag from ``NEURON_CC_FLAGS`` in the run.sh\n\n\n----\n\n.. 
_nxd-core-2-20-0-rn:\n\nNxD Core [0.9.0] (Neuron 2.20.0 Release)\n-----------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Training:**\n\n* Added LoRA adapter support\n* Added GPU-compatible precision support using ZeRO-1\n\n**Inference:**\n\n* Added inference examples for DBRX and Mixtral models\n* Improved inference performance with sequence length autobucketing\n* Improved trace time for inference examples\n* Reduced memory usage by sharing weights across prefill and decode traced models\n\n\n----\n\n.. _nxd-core-2-19-0-rn:\n\nNxD Core [0.8.0] (Neuron 2.19.0 Release)\n-----------------------------------------\n\nDate of Release: 07/03/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for interleaved pipeline parallelism. At large cluster sizes, the interleaved pipeline schedule should help to reduce the pipeline bubble, thereby increasing training throughput.\n* Added integration with the flash attention kernel for longer sequence length training. See :ref:`Llama3 8K sequence-length training sample <llama3_tp_zero1_tutorial>`.\n* Added support for naive speculative decoding, enabling assistance during the token generation process by predicting tokens with a draft model and verifying the predicted tokens with the original target model. Refer to the Neuronx Distributed inference developer guide for an example.\n* Added integration with the flash attention kernel for longer sequence length inference. See an end-to-end example of the CodeLlama-13b model with 16K sequence length.\n* Added support for scaled inference for Llama-2 70b or similar-sized models\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Model checkpointing saves sharded checkpoints. Users will have to write a script to combine the shards.\n* Validation/evaluation with the interleaved pipeline feature is not supported.\n* Because weights cannot be shared across the context encoding and token generation traces, inference scale is tested for models up to the size of Llama-2-70b. For model configurations above this, there is a risk of OOM errors.\n* Tracing Llama-2-70b sized models for inference and loading them to device can take close to two hours. This is due to duplicate sharding of weights for both context encoding and token generation traces.\n\n\n----\n\n.. _nxd-core-2-18-0-rn:\n\nNxD Core [0.7.0] (Neuron 2.18.0 Release)\n-----------------------------------------\n\nDate of Release: 04/01/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Pipeline-parallelism training using PyTorch-lightning\n* Added support for fine-tuning a model and running evaluation on the fine-tuned model using optimum-neuron\n* Added support for auto-partitioning the pipeline parallel stages for training large models\n* Added support for async checkpointing, optimizing the checkpoint saving time.\n* Added support for auto-resume from a checkpoint, in case the training job crashes.\n* Added support for sequence length autobucketing in inference\n* Added support for inference with bfloat16\n* Improved performance for the Llama-2-7b inference example.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n\n\n----\n\n.. 
_nxd-core-2-16-0-rn:\n\nNxD Core [0.6.0] (Neuron 2.16.0 Release)\n-----------------------------------------\n\nDate of Release: 12/21/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for a Model/Optimizer wrapper that handles the parallelization in both the model and the optimizer.\n* Added support for PyTorch-lightning. This allows users to train models using Tensor-parallelism and Data-parallelism.\n* Added new checkpoint save/load APIs that handle the parallelization and dump/load the checkpoint.\n* Added a new QKV module which has the ability to replicate the KV heads and produce the query, key, and value states.\n* Reduced the model initialization time when the pipeline-parallel distributed strategy is used.\n* Added support for limiting max parallel compilations in parallel_model_trace. This resolves many out of memory errors by reducing the host memory usage.\n* Added example for Llama-2-7b inference. This is still early in development and is not well-optimized. The current recommendation is to use ``transformers-neuronx`` for optimal performance of Llama inference.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n* Pipeline-parallelism is not supported as part of PyTorch-lightning integration.\n\n\n----\n\n.. _nxd-core-2-15-0-rn:\n\nNxD Core [0.5.0] (Neuron 2.15.0 Release)\n-----------------------------------------\n\nDate of Release: 10/26/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for pipeline-parallelism for distributed training.\n* Added support for serialized checkpoint saving/loading, resulting in better checkpoint saving/loading time.\n* Added support for mixed precision training using ``torch.autocast``.\n* Fixed an issue with Zero1 checkpoint saving/loading.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed an issue with Zero1 checkpoint saving/loading.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n\n\n----\n\n.. _nxd-core-2-14-0-rn:\n\nNxD Core [0.4.0] (Neuron 2.14.0 Release)\n-----------------------------------------\n\nDate of Release: 09/15/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added API for padding attention heads when they are not divisible by tensor-parallel degree\n* Added a constant threadpool for distributed inference\n* Fixed a bug with padding_idx in ParallelEmbedding layer\n* Fixed an issue with checkpoint loading to take into account the stride parameter in tensor parallel layers\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a bug with padding_idx in ParallelEmbedding layer\n* Fixed an issue with checkpoint loading to take into account the stride parameter in tensor parallel layers\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n\n\n----\n\n.. 
_nxd-core-2-13-0-rn:\n\nNxD Core [0.3.0] (Neuron 2.13.0 Release)\n-----------------------------------------\n\nDate of Release: 08/28/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added Zero1 Optimizer support that works with tensor-parallelism\n* Added support for sequence-parallelism that works with tensor-parallelism\n* Added IO aliasing feature in the parallel_trace API, which allows marking certain tensors as state tensors\n* Fixed hangs when tracing models using parallel_trace for higher TP degree\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed hangs when tracing models using parallel_trace for higher TP degree\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n\n\n----\n\n.. _nxd-core-2-12-0-rn:\n\nNxD Core [0.2.0] (Neuron 2.12.0 Release)\n-----------------------------------------\n\nDate of Release: 07/19/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added parallel cross entropy loss function.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards.\n\n\n----\n\n.. _nxd-core-2-11-0-rn:\n\nNxD Core [0.1.0] (Neuron 2.11.0 Release)\n-----------------------------------------\n\nDate of Release: 06/14/2023\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Releasing the Neuron Distributed (``neuronx-distributed``) library for enabling large language model training/inference.\n* Added support for tensor-parallelism training/inference.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Currently the model checkpointing saves a sharded checkpoint, and users have to write a script to combine the shards."
  },
  {
    "path": "release-notes/components/nxd-inference.rst",
    "content": ".. meta::\n    :description: Complete release notes for the NxD Inference component across all AWS Neuron SDK versions.\n    :keywords: nxd inference, neuronx-distributed-inference, vllm, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _nxd-inference_rn:\n\nComponent Release Notes for NxD Inference\n=========================================\n\nThe release notes for the NxD Inference Neuron component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _nxd-inference-2-29-0-rn:\n\nNxD Inference [0.9.17334] + vLLM Neuron Plugin [0.5.0] (Neuron 2.29.0 Release)\n------------------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\nNxD Inference\n~~~~~~~~~~~~~\n\nNeuron SDK 2.29.0 includes the following updates for NxD Inference library 0.9.17334:\n\nImprovements\n^^^^^^^^^^^^\n\n* Qwen2 VL Model support improvements (Beta) - Implements vision data parallelism support and improves QPS throughput by 7% for image-heavy workloads. See :doc:`Tutorial: Qwen2 VL Inference </libraries/nxd-inference/tutorials/qwen2-vl-tutorial>`.\n* Qwen3 VL Model support improvements (Beta) - Implements text-model sequence parallelism and on-device vision patch embedding, parallel merger, and padding/slicing, achieving 2.2x QPS throughput for image-heavy workloads. See :doc:`Tutorial: Qwen3 VL Inference </libraries/nxd-inference/tutorials/qwen3-vl-tutorial>`.\n* Flux.1 Model support improvements (Beta) - Implements CFG parallelism for text-to-image use case, improving E2E latency by 19% and instance throughput by 23%. See :doc:`Tutorial: Flux.1 Inference Tutorial </libraries/nxd-inference/tutorials/flux-inference-tutorial>`.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* NxD Inference no longer supports NKI kernels on Trn1/Inf2 hardware, as NKI 0.3.0 kernels are not supported on Trn1/Inf2. NxD Inference models are now only supported on Trn2 and newer hardware. Customers who require NxD Inference kernel support on Trn1 or Inf2 instances should pin to release 2.28.\n* The BWMM shard-on-hidden kernel previously used during prefill in Mixture-of-Experts models has been removed. Models that depend on this kernel (including Llama 4, Mixtral, and DBRX configurations) should be pinned to release 2.28 for optimal performance.\n\nBug Fixes\n^^^^^^^^^^^^\n\n* Fixed the issue on Qwen2-VL where the ``default_image_width`` and ``default_image_height`` values are overwritten during model loading process.\n\nKnown Issues\n^^^^^^^^^^^^^\n\n* The :doc:`Top-K NKI kernel </nki/library/api/router-topk>` is enabled by default in this release. Release 2.29 has a known accuracy issue at smaller batch sizes (up to 4) when this kernel is disabled.\n\n.. note::\n  Qwen3-MoE 235B may observe degraded decode throughput compared to previous releases. Our team is actively investigating the root cause. In the meantime, we recommend customers use release 2.28 for workloads where Qwen3-MoE 235B decode performance is critical.\n\n.. 
_nxd-inference-2-28-0-rn:\n\nNxD Inference [0.8.16251] + vLLM Neuron Plugin [0.4.0] (Neuron 2.28.0 Release)\n------------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\nNxD Inference\n~~~~~~~~~~~~~\n\nNeuron SDK 2.28.0 includes the following updates for NxD Inference library 0.8.16251:\n\nImprovements\n^^^^^^^^^^^^\n* Qwen2 VL Model Support (Beta) - NxD Inference supports Qwen2 VL vision language model which processes text and image inputs. Please refer to :doc:`Tutorial: Qwen2 VL Inference </libraries/nxd-inference/tutorials/qwen2-vl-tutorial>`.\n  \n  Compatible models include:\n    - `Qwen2-VL-7B-Instruct <https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct>`__\n* Qwen3 VL Model Support (Beta) - NxD Inference supports Qwen3 VL vision language model which processes text and image inputs. Please refer to :doc:`Tutorial: Qwen3 VL Inference </libraries/nxd-inference/tutorials/qwen3-vl-tutorial>`.\n  \n  Compatible models include:\n    - `Qwen3-VL-8B-Thinking <https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking>`__\n* Pixtral Model Support Improvements (Beta) - Adds new functionality support with batch size 32 and sequence length 10240 with vllm v1 on Trn2.\n* Flux.1 Model support improvements (Beta) - Implements CFG parallelism for text-to-image use case, improving E2E latency by 19% and instance throughput by 23%. See :doc:`Tutorial: Flux.1 Inference Tutorial </libraries/nxd-inference/tutorials/flux-inference-tutorial>`.\n\n\nKnown Issues\n^^^^^^^^^^^^\n* Qwen3 MoE only supports batch size >= 16 configurations.\n* Qwen3-VL only supports dynamic image resolution up to vision sequence length 16K, and total vision and text sequence length up to 32K. Qwen2-VL does not support dynamic image resolution yet.\n* Qwen-VL models only support batch size 1 configuration in vision encoder. No video understanding functionality is supported yet.\n* Llama 3.2 11B/90B tutorial and samples not compatible to vLLM V1 are removed.\n\nvLLM Plugin for Neuron\n~~~~~~~~~~~~~~~~~~~~~~\n\nNeuron SDK 2.28.0 includes the following updates for the vLLM Plugin 0.4.1 for Neuron:\n\nImprovements\n^^^^^^^^^^^^\n* Multi-LoRA Serving Enhancements - NxD Inference supports streaming LoRA adapters via vLLM's `load_adapter` serving API, allowing adapters to be loaded into CPU memory dynamically at runtime. This provides more flexibility as users no longer need to specify all adapter checkpoint paths before execution. Additionally, users can now run the base model alone when multi-LoRA serving is enabled. See the :doc:`Llama 3.1 8B Multi-LoRA tutorial </libraries/nxd-inference/tutorials/trn2-llama3.1-8b-multi-lora-tutorial>` for more details.\n* Eagle3 Speculative Decoding - NxD Inference supports Eagle3 speculative decoding on Llama 3.1 8B.\n\n  Supported Eagle3 draft models include:\n    - `EAGLE-LLaMA3-Instruct-8B <https://huggingface.co/yuhuili/EAGLE-LLaMA3-Instruct-8B>`__\n* vLLM v0.13.0 Support - vLLM Neuron Plugin supports vLLM v0.13.0 and Pytorch 2.9.\n\n\nKnown Issues\n^^^^^^^^^^^^\n* This version of the vLLM Neuron Plugin is pinned to vLLM version v0.13.0 and requires PyTorch 2.9. If you must use PyTorch 2.7 or 2.8, you may fall back to the Neuron fork of vLLM that implements a Neuron integration using the vLLM V0 architecture. However, note that this fork is no longer maintained and not all features may be available. 
The fork can be found at https://github.com/aws-neuron/upstreaming-to-vllm/releases/tag/2.26.1.\n\n* When using data parallelism (DP > 1) with large batch sizes (e.g., >=16) and long input sequences (e.g., 28K+ tokens), time-to-first-token (TTFT) may be significantly degraded due to uneven request distribution across engine replicas. This is caused by an interaction between the Neuron plugin's engine client and vLLM's internal DP coordinator. Customers experiencing this issue should use version 0.4.1 of the plugin, or reduce concurrency and input sequence length as a workaround.\n\n* Known issues for the vLLM Neuron plugin are tracked in the `vLLM V1 user guide <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html#known-issues>`__.\n\n.. _nxd-inference-2-27-1-rn:\n\nNxD Inference [0.7.15603] (Neuron 2.27.1 Release)\n---------------------------------------------------\n\nDate of Release: 01/14/2026\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed a stability issue affecting Llama 4 that may occur when changing the model configuration.\n\n\n----\n\n.. _nxd-inference-2-27-0-rn:\n\nNxD Inference [0.6.9230] (Neuron 2.27.0 Release)\n-------------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for running NxD Inference on Trn3 instances.\n* Added support for vLLM V1 through the vllm-neuron plugin.\n* Qwen3 MoE Model Support (Beta) — NxD Inference supports the Qwen3 MoE language model, which supports multilingual text inputs.\n* Pixtral Model Support (Beta) — NxD Inference supports the Pixtral image understanding model, which processes text and image inputs.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Pixtral deployment is supported up to batch size 32 and sequence length 10240 with vLLM v0. vLLM v1 deployment supports up to batch size 4 and sequence length 10240.\n* The performance of Qwen3 MoE and Pixtral on Trn2 is not fully optimized.\n* The vllm-neuron plugin source code on GitHub is currently not compatible with the 2.27 SDK.\n\n\n----\n\n.. _nxd-inference-2-25-0-rn:\n\nNxD Inference [0.5.9230] (Neuron 2.25.0 Release)\n-------------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Qwen3 dense models (0.6B to 32B parameters), which are tested on Trn1.\n* Added simplified functions for validating the accuracy of logits returned by a model: ``check_accuracy_logits_v2`` and ``generated_expected_logits``.\n* Added ``scratchpad_page_size`` attribute to NeuronConfig for configuring the scratchpad page size used during compilation and at runtime.\n* Enabled Chunked Attention as a generic building block for any attention-based model.\n* Published scripts to evaluate model accuracy and benchmark performance against Neuron.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Removed support for Meta checkpoint compatibility in Llama3.2 Multimodal modeling code. You can continue to use Hugging Face checkpoints.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed accuracy issues when using Automatic Prefix Caching (APC) with EAGLE speculation.\n* Fixed continuous batching for Llama3.2 Multimodal where the input batch size is less than the compiled batch size.\n* Added support for continuous batching when running Neuron modeling code on CPU.\n* Set a manual seed in ``benchmark_sampling`` to improve the stability of data-dependent benchmarks like speculation.\n\n\n----\n\n.. 
_nxd-inference-2-24-0-rn:\n\nNxD Inference [0.4.7422] (Neuron 2.24.0 Release)\n---------------------------------------------------\n\nDate of Release: 06/24/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n**Models**\n\n* Qwen2.5 text models, which are tested on Trn1. Compatible models include:\n\n  * `Qwen2.5-0.5B-Instruct <https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct>`__\n  * `Qwen2.5-7B-Instruct <https://huggingface.co/Qwen/Qwen2.5-7B-Instruct>`__\n  * `Qwen2.5-32B-Instruct <https://huggingface.co/Qwen/Qwen2.5-32B-Instruct>`__\n  * `Qwen2.5-72B-Instruct <https://huggingface.co/Qwen/Qwen2.5-72B-Instruct>`__\n\n**Features**\n\n* Automatic Prefix Caching support (APC) through vLLM. APC improves efficiency by reusing KV cache from previous queries if the new query shares a prefix. APC can significantly improve TTFT based on how often different queries share the same prefixes. Performance gains are greater when requests have longer shared prefixes and when there is a higher frequency of prefix sharing across requests. For example, with Llama3.3 70B on Trn2, you can observe a 3.2x TTFT improvement with the math.math dataset (90% cache hit), a 1.6x TTFT improvement with a Sonnet dataset with 2K prompt length (25% cache hit), or no TTFT improvement with the HumanEval dataset (0% cache hit). For more information, see :ref:`nxdi_prefix_caching` and :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-apc-tutorial.ipynb`.\n* Disaggregated Inference (DI) support through vLLM (Beta). Disaggregated Inference is also known as disaggregated serving, disaggregated prefill, or p/d disaggregation. DI separates the prefill and decode phase of inference onto different hardware resources. DI can improve inter token latency (ITL) by by eliminating prefill stall in continuous batching, where decode is paused to perform prefill for a new incoming request. With DI, you can also scale prefill and decode resources independently to further improve performance. For more information, see :ref:`nxdi-disaggregated-inference`.\n* Context parallelism in NeuronAttentionBase (Beta). Context parallelism distributes context processing across multiple NeuronCores. Context parallelism improves TTFT, particularly at higher sequence lengths where the number of KV heads is low. To use context parallelism, set ``cp_degree`` in NeuronConfig.\n* Mixed-precision parameters in modeling code. This feature enables you to configure each module's dtype independently. To use mixed-precision parameters, set ``cast_type=\"as-declared\"`` in NeuronConfig. Note: The default behavior (``cast_type=\"config\"``) is to cast all parameters to the ``torch_dtype`` in NeuronConfig.\n* Output logits when using on-device sampling. To output logits, enable ``output_logits`` in NeuronConfig. Note that this flag impacts performance and should only be used for debugging model logits.\n\n**Other changes**\n\n* Add support for PyTorch 2.7. This release includes support for PyTorch 2.5, 2.6, and 2.7.\n* Upgrade ``transformers`` requirement from v4.48 to v4.51.\n* Re-enable warmup on Trn2. NxD Inference disabled warmup on Trn2 in the previous release due to an issue that prevented certain model configurations from loading correctly. That issue is now fixed.\n* Update the behavior of the ``attn_kernel_enabled`` attribute in NeuronConfig, which configures whether to use the flash attention kernel. Previously, ``True`` meant to enable in all cases where supported, and ``False`` meant to auto-enable where beneficial (defaults to ``False``). 
Now, ``attn_kernel_enabled=False`` disables the flash attention kernel in all cases. To use the previous auto-enable behavior, set ``attn_kernel_enabled=None``. The default value for ``attn_kernel_enabled`` is now ``None`` to retain the same default behavior as before.\n* Enable ``--verify-hlo`` flag during compilation. Now, if an HLO is invalid, compilation will fail. Previously, in certain scenarios, the compiler would successfully compile invalid HLOs.\n* Update the flash attention kernel strategy to use the attention kernel on Trn2 in all cases where it's supported. This change fixes an issue where certain context lengths failed to trace.\n* Add ``logical_nc_config`` as an argument to the ``build_module`` and ``build_function`` test utilities, so you can use these utilities to test modules/functions for Trn2 using LNC2.\n* Other minor fixes and improvements.\n\nBug Fixes\n~~~~~~~~~\n\n* Other minor fixes and improvements.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Increased Device Memory Usage for Certain Configurations: Certain model configurations require slightly more device memory than in previous releases. If your model used close to the maximum amount of device memory in previous releases, this increase could cause it to fail to load after you compile it with this release. This issue is most likely to affect Llama3.1-405B configurations that use a large number of buckets.\n\n\n----\n\n.. _nxd-inference-2-23-0-rn:\n\nNxD Inference [0.3.5591] (Neuron 2.23.0 Release)\n-------------------------------------------------\n\nDate of Release: 05/20/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* NxD Inference is now GA and out of beta in the Neuron 2.23 release.\n\n**Features**\n\n* Shard-on-load for weight sharding is now enabled by default. With this change, end-to-end compile and load time is reduced by up to 70% when sharding weights. This change significantly reduces compile time by skipping weight sharding and serialization during compile, but may lead to increased load time. For example, for Llama 3.1 405B, end-to-end compile and load time is reduced from 40 minutes to 12 minutes. For best load performance, you can continue to serialize sharded weights by enabling ``save_sharded_checkpoint`` in NeuronConfig. For more information, see :ref:`nxdi-weights-sharding-guide`.\n* Neuron Persistent Cache. NxD Inference now supports Neuron Persistent Cache, which caches compiled model artifacts to reduce compilation times. For more information, see :ref:`nxdi-neuron-persistent-cache`.\n* Support for an attention block kernel for token generation. This kernel performs QKV projections, RoPE, attention, and output projections. You can use this kernel with Llama3-like attention on Trn2 to improve token gen performance. To use this kernel, enable ``attn_block_tkg_nki_kernel_enabled`` in NeuronConfig.\n\n  * This kernel can also update the KV cache in parallel with each layer's attention compute to further improve performance. This functionality hides the latency of the KV cache update that is otherwise done for all layers at once at the end of each token generation iteration. To enable in-kernel KV cache updates, enable ``attn_block_tkg_nki_kernel_cache_update`` in NeuronConfig. When in-kernel KV cache updating is enabled, you can also enable ``k_cache_transposed`` to further improve the performance.\n\n* Automatically extract ``target_modules`` and ``max_lora_rank`` from LoRA checkpoints. You no longer need to set these arguments manually.\n* Support fused residual add in the QKV kernel. 
This feature improves the performance of context encoding at short sequence lengths. To use this feature, enable the ``qkv_kernel_fuse_residual_add`` flag in NeuronConfig.\n\nBreaking Changes\n~~~~~~~~~~~~~~~~\n\n* Remove ``set_async_mode(async_mode)`` from NeuronBaseForCausalLM, as this feature didn't work as intended. Async mode cannot be enabled or disabled after the model is loaded. To enable async mode, set ``async_mode=True`` in NeuronConfig.\n\nBug Fixes\n~~~~~~~~~\n\n* Disable warmup for Trn2. This change avoids an issue that prevents certain model configurations from loading correctly.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* None reported for this release."
  },
  {
    "path": "release-notes/components/nxd-training.rst",
    "content": ".. meta::\n    :description: Complete release notes for the NxD Training component across all AWS Neuron SDK versions.\n    :keywords: nxd training, neuronx-distributed-training, release notes, aws neuron sdk\n    :date-modified: 07/31/2025\n\n.. _nxd-training_rn:\n\nComponent Release Notes for NxD Training\n========================================\n\nThe release notes for the NxD Training (``neuronx-distributed-training``) Neuron component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _nxd-training-2-25-0-rn:\n\nNxD Training [1.5.0] (Neuron 2.25.0 Release)\n---------------------------------------------\n\nDate of Release: 07/31/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* None\n\nBug Fixes\n~~~~~~~~~\n\n* Disable ``expert_index`` in Mixture of Experts (MoE) forwarding to limit the output to just hidden states and router logits (as expected).\n\n\n----\n\n.. _nxd-training-2-24-0-rn:\n\nNxD Training [1.4.0] (Neuron 2.24.0 Release)\n---------------------------------------------\n\nDate of Release: 06/26/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for PyTorch 2.7\n\n\n----\n\n.. _nxd-training-2-23-0-rn:\n\nNxD Training [1.3.0] (Neuron 2.23.0 Release)\n---------------------------------------------\n\nDate of Release: 05/16/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* (Beta release) Added autocast for HF based Llama3 8B and Llama3 70B models\n* (Beta release) Added support for context parallel sequence lengths up to 32k on TRN1\n* Added support for ORPO\n* Added support for nemo-toolkit 2.1\n* Added support for Transformers 4.48.0\n* Added support for PyTorch-Lightning 2.5.0\n* Added support for PyTorch 2.6\n\n\n----\n\n.. _nxd-training-2-22-0-rn:\n\nNxD Training [1.2.0] (Neuron 2.22.0 Release)\n---------------------------------------------\n\nDate of Release: 04/03/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for LoRA supervised fine-tuning.\n* Added option to configure collectives data types.\n* Minor fixes to reduce the amount of logs during training.\n* Removes ``--llm-training`` flag by default in all configs, except llama2. Note: this flag should not be enabled when using the Neuron Kernel Interface.\n\nBug Fixes\n~~~~~~~~~\n\n* Minor fixes to reduce the amount of logs during training.\n\n\n----\n\n.. _nxd-training-2-21-1-rn:\n\nNxD Training [1.1.1] (Neuron 2.21.1 Release)\n---------------------------------------------\n\nDate of Release: 01/14/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added a flag in Llama3/3.1 70B config to control the dtype of reduce-scatter operations in Column/Row Parallel linear layers.\n\n\n----\n\n.. 
_nxd-training-2-21-0-rn:\n\nNxD Training [1.1.0] (Neuron 2.21.0 Release)\n---------------------------------------------\n\nDate of Release: 12/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for HuggingFace Llama3/3.1 70B with trn2 instances\n* Added support for custom pipeline parallel cuts in HuggingFace Llama3\n* Added support for PyTorch 2.5\n* Added support for DPO post-training model alignment\n* Added support for Mixtral 8x7B Megatron and HuggingFace models\n* Added option in checkpoint converter to download and convert checkpoints using HuggingFace model identifier\n* Fix the validation loss to properly compute the average loss across the validation epoch\n* Minor bug fixes for error logging and imports\n\nBug Fixes\n~~~~~~~~~\n\n* Fix the validation loss to properly compute the average loss across the validation epoch\n* Minor bug fixes for error logging and imports\n\nKnown Issues\n~~~~~~~~~~~~\n\n* The autocast option may not properly cast all inputs to bf16; it is recommended to use the mixed precision option (currently the default) in configs for best results\n* With PT2.5, some of the key workloads like Llama3-8B training may show reduced performance when using the ``--llm-training`` compiler flag as compared to PT2.1. In such a case, try removing the ``--llm-training`` flag from ``compiler_flags`` in the config.yaml\n\n\n----\n\n.. _nxd-training-2-20-1-rn:\n\nNxD Training [1.0.1] (Neuron 2.20.1 Release)\n---------------------------------------------\n\nDate of Release: 11/20/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for transformers 4.36.0\n\n\n----\n\n.. _nxd-training-2-20-0-rn:\n\nNxD Training [1.0.0] (Neuron 2.20.0 Release)\n---------------------------------------------\n\nDate of Release: 09/16/2024\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* This is the first release of NxD Training (NxDT). NxDT is a PyTorch-based library that adds support for a user-friendly distributed training experience through a YAML configuration file compatible with NeMo, allowing users to easily set up their training workflows. At the same time, NxDT maintains flexibility, enabling users to choose between using the YAML configuration file, PyTorch Lightning Trainer, or writing their own custom training script using the NxD Core.\n* The library supports PyTorch model classes including Hugging Face and Megatron-LM. Additionally, it leverages NeMo's data engineering and data science modules, enabling end-to-end training workflows on NxDT and providing compatibility with NeMo through minimal changes to the YAML configuration file for models that are already supported in NxDT. Furthermore, the functionality of the Neuron NeMo Megatron (NNM) library is now part of NxDT, ensuring a smooth migration path from NNM to NxDT.\n\n**This release of NxDT includes:**\n\n* Installation through ``neuronx-distributed-training`` package.\n* Open Source Github repository: https://github.com/aws-neuron/neuronx-distributed-training\n* Support for YAML based interface allowing users to configure training from a config file.\n* Support for 3D-parallelism, sequence-parallelism and zero1.\n* Support for megatron-model and hugging-face based Llama model.\n* Support for the flash attention kernel.\n* Support for async checkpointing and s3 checkpointing.\n* Examples to pretrain and fine-tune the Llama model\n\nKnown Issues\n~~~~~~~~~~~~\n\n* Model checkpointing saves sharded checkpoints. 
Users will have to write a script to combine the shards\n* Validation/Evaluation with interleaved pipeline feature is not supported.\n* NxDT shows slightly higher memory utilization as compared to NxD based examples."
  },
  {
    "path": "release-notes/components/pytorch.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron PyTorch framework component across all AWS Neuron SDK versions.\n    :keywords: pytorch, torch-neuronx, torch-neuron, transformers-neuronx, release notes, aws neuron sdk\n    :date-modified: 04/09/2026\n\n.. _pytorch_rn:\n\nComponent Release Notes for Neuron PyTorch Framework\n=====================================================\n\nThe release notes for the Neuron PyTorch framework component. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _pytorch-2-29-0-rn:   \n\nPyTorch Framework [2.9.0.2.13.*] (Neuron 2.29.0 Release)\n--------------------------------------------------------\n\n**Date of Release**: 04/09/2026\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nUpdates\n^^^^^^^^\n* **PyTorch 2.7 and 2.8 are now End of Life**: Starting Neuron 2.29, PyTorch 2.7 and PyTorch 2.8 are now End of Life. If you require PyTorch 2.7 or PyTorch 2.8 support, use Neuron SDK 2.28.\n\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n* **PyTorch/XLA replaced by TorchNeuron in PyTorch 2.10**: Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will use PyTorch support via TorchNeuron instead of PyTorch/XLA. PyTorch 2.9 is the last version using PyTorch/XLA. Users will need to update their scripts when upgrading to PyTorch 2.10 or later. See :ref:`native-pytorch-trainium` for complete details.\n\n\nBug Fixes\n^^^^^^^^^\n* No new bug fixes in this release.\n\n\nKnown Issues\n^^^^^^^^^^^^\n* **Segmentation faults with certain vision models**: Vision models including ``yolos``, ``wav2vec2``, and ``convbert`` crash with segmentation faults during model tracing.\n\n  **How to check if affected**: If your model tracing fails with a segmentation fault, you are likely affected by this issue.\n\n  **Workaround**: Downgrade to torch-neuronx 2.8, which does not exhibit this issue.\n\n  See `GitHub issue #1265 <https://github.com/aws-neuron/aws-neuron-sdk/issues/1265>`_ for updates.\n\n\n.. _pytorch-2-28-0-rn:   \n\nPyTorch Framework (Neuron 2.28.0 Release)\n--------------------------------------------------------\n\n**Date of Release**: 02/26/2026\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* No new improvements in this release.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* **PyTorch/XLA replaced by TorchNeuron in PyTorch 2.10**: Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will use PyTorch support via TorchNeuron instead of PyTorch/XLA. PyTorch 2.9 is the last version using PyTorch/XLA. Users will need to update their scripts when upgrading to PyTorch 2.10 or later. 
See :ref:`native-pytorch-trainium` for complete details.\n\nBug Fixes\n^^^^^^^^^\n\n* No new bug fixes in this release.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* **Segmentation faults with certain vision models**: Vision models including ``yolos``, ``wav2vec2``, and ``convbert`` crash with segmentation faults during model tracing.\n\n  **How to check if affected**: If your model tracing fails with a segmentation fault, you are likely affected by this issue.\n\n  **Workaround**: Downgrade to torch-neuronx 2.8, which does not exhibit this issue.\n\n  See `GitHub issue #1265 <https://github.com/aws-neuron/aws-neuron-sdk/issues/1265>`_ for updates.\n\n* **Performance degradation with public PyPI torch-xla 2.8.0**: Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories results in 10-15% performance degradation for BERT and LLaMA models (`pytorch/xla#9605 <https://github.com/pytorch/xla/issues/9605>`_).\n\n  **Workaround**: Upgrade to torch-xla version 2.8.1 from public PyPI repositories, which resolves this performance issue.\n\n  See :doc:`/setup/pytorch/index` for detailed installation instructions.\n\n* **PyTorch NeuronX 2.7 does not support Python 3.12**: torch-neuronx 2.7 supports Python 3.10 and 3.11 only. Python 3.12 is not supported in torch-neuronx 2.7.\n\n  **Impact**: Attempting to install or run torch-neuronx 2.7 with Python 3.12 will fail with dependency errors.\n\n  **Workaround**: Use Python 3.10 or 3.11 with torch-neuronx 2.7, or upgrade to torch-neuronx 2.9, which supports Python 3.12.\n\n  See :ref:`setup-guide-index` for complete system requirements and compatibility information.\n\n\n.. _pytorch-2-27-0-rn:\n\nPyTorch Framework (Neuron 2.27.0 Release)\n--------------------------------------------------------\n\n**Date of Release**: 12/19/2025\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for PyTorch 2.9\n* Improved model tracing performance for large models by up to 90% through trace API optimizations that avoid copying weights and state buffers to the device and guarantee state restoration after tracing.\n* Fixed GitHub issue #1240 impacting torch-neuronx 2.7 to 2.9\n* Fixed GitHub issue #834 impacting torch-neuronx 2.7 to 2.9\n* Fixed an issue in PyTorch 2.8 where PJRT_Client_Destroy was not being called, which prevented NRT:nrt_close from being invoked.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* PyTorch 2.6 reached end of support in the Neuron 2.27 release.\n* Transitioning to PyTorch Native Support: In the next Neuron release that will support PyTorch 2.10, AWS Neuron will transition from PyTorch/XLA to PyTorch support via TorchNeuron. PyTorch 2.9 will be the last version based on PyTorch/XLA.\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed resource leaks and \"nrtucode: internal error: 832 object(s) leaked, improper teardown\" errors by ensuring proper cleanup of Neuron Runtime resources on program exit.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories would result in lower performance for models like BERT and LLaMA.\n* Using the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6.\n* PyTorch NeuronX 2.7 supports Python 3.10 and 3.11 only. Python 3.12 is not supported.\n\n----\n\n.. 
_pytorch-2-26-1-rn:\n\nPyTorch Framework (Neuron 2.26.1 Release)\n--------------------------------------------------------\n\nDate of Release: 10/29/2025\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed an issue with out-of-memory errors by enabling the use of the :doc:`Neuron Runtime API </neuron-runtime/api/index>` to apply direct memory allocation.\n\n\n----\n\n.. _pytorch-2-26-0-rn:\n\nPyTorch Framework (Neuron 2.26.0 Release)\n--------------------------------------------------------\n\nDate of Release: 09/18/2025\n\nReleased Versions: ``2.8.0.2.10.*``, ``2.7.0.2.10.*``, ``2.6.0.2.10.*``\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for PyTorch 2.8 (see :ref:`Introducing PyTorch 2.8 Support<introduce-pytorch-2-8>`)\n\nKnown Issues\n^^^^^^^^^^^^\n\n.. note::\n   * See :ref:`Introducing PyTorch 2.8 Support<introduce-pytorch-2-8>` for a full list of known issues with v2.8.\n   * See :ref:`Introducing PyTorch 2.7 Support<introduce-pytorch-2-7>` for a full list of known issues with v2.7.\n   * See :ref:`Introducing PyTorch 2.6 Support<introduce-pytorch-2-6>` for a full list of known issues with v2.6.\n\n* [PyTorch v2.8] Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories would result in lower performance for models like BERT and LLaMA (https://github.com/pytorch/xla/issues/9605). To fix this, switch to using the updated torch-xla version 2.8.1 from public PyPI repositories.\n\n* [PyTorch v2.7] Using the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\n* Currently, when switching Ubuntu OS kernel version from 5.15 to 6.8, you may see performance differences due to the new kernel scheduler (CFS vs EEVDF). For example, BERT pretraining performance could be lower by up to 10%. You may try using an older OS kernel (i.e. Amazon Linux 2023) or experiment with the kernel real-time scheduler by running ``sudo chrt --fifo 99`` before your command (i.e. ``sudo chrt --fifo 99 <script>``) to improve the performance. Note that adjusting the real-time scheduler can also result in lower performance. See https://www.kernel.org/doc/html/latest/scheduler/sched-eevdf.html for more information.\n\n* Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\n* [PyTorch v2.6] BERT pretraining performance is approximately 10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known regression in torch-xla https://github.com/pytorch/xla/issues/9037 and may affect other models with high graph tracing overhead. This is fixed in torch-xla 2.7 and 2.8. To work around this issue in torch-xla 2.6, build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see :ref:`pytorch-neuronx-install-cxx11` for C++11 ABI version):\n\n.. code:: bash\n\n      # Setup build env (make sure you are in a python virtual env). 
Replace \"apt\" with \"yum\" on AL2023.\n      sudo apt install cmake\n      pip install yapf==0.30.0\n     wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n     sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n\n     # Clone repos\n     git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n     cd pytorch/\n     git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n     _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist\n     cd xla/\n     CXX_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist and can be installed instead of the torch-xla released in pypi.org\n\n* Currently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 through 2.8 although there will be end-of-support warnings (as noted below).\n\n* Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (see the warning raised below). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`).\n\n.. code:: bash\n\n   Warning: ``XLA_DOWNCAST_BF16`` will be deprecated after the 2.5 release, please downcast your model directly\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.device instead``. This is a warning that ``torch_xla.core.xla_model.xla_device()`` is deprecated. Switch to using ``torch_xla.device()`` instead.\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.sync instead``. This is a warning that ``torch_xla.core.xla_model.mark_step()`` is deprecated. Switch to using ``torch_xla.sync()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'``. This is an error that notes that ``torch_xla.core.xla_model.xrt_world_size()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.world_size()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'``. This is an error that notes that ``torch_xla.core.xla_model.get_ordinal()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.global_ordinal()`` instead.\n\n* [PyTorch v2.5+] ``AttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'``. In Torch-XLA 2.5+, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime. See this `PyTorch commit PR on GitHub <https://github.com/pytorch/xla/commit/d6fb5391d09578c8804b1331a5e7a4f72bf981db>`_.\n\n\n----\n\n.. _pytorch-2-25-0-rn:\n\nPyTorch Framework (Neuron 2.25.0 Release)\n--------------------------------------------------------\n\nDate of Release: 07/31/2025\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* The Core Placement API is no longer beta/experimental and the instructions on how to use it have been updated.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* To migrate, replace any function scope ``torch_neuron.experimental.`` with ``torch_neuron.``. 
The change will have no effect on behavior or performance.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* Using the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6.\n* When switching Ubuntu OS kernel version from 5.15 to 6.8, you may see performance differences due to the new kernel scheduler (CFS vs EEVDF).\n* When using the tensor split operation on a 2D array in the second dimension, the resulting tensors don't have the expected data.\n* BERT pretraining performance is ~10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5.\n\n\n----\n\n.. _pytorch-2-21-1-rn:\n\nPyTorch Framework (Neuron 2.21.1 Release)\n--------------------------------------------------------\n\nDate of Release: 01/14/2025\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* The transformers dependency has been pinned to ``transformers<4.48``\n\n\n----\n\n.. _pytorch-2-21-0-rn:\n\nPyTorch Framework (Neuron 2.21.0 Release)\n--------------------------------------------------------\n\nDate of Release: 12/20/2024\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Flash decoding support for speculative decoding\n* Enabled on-device generation support in speculative decoding flows\n* Added support for EAGLE speculative decoding with greedy and lossless sampling\n* Support for CPU compilation and sharded model saving\n* Performance optimized MLP and QKV kernels added for llama models with support for sequence parallel norm\n* Added support to control concurrent compilation workers\n* Added option to skip AllGather using duplicate Q weights during shard over sequence\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed padding issues when requested batch size is smaller than neff compiled size\n* Fixed sequence parallel norm issue when executor is used with speculative decoding flows\n\nKnown Issues\n^^^^^^^^^^^^\n\n* GPT-NeoX is sensitive to ``fp16`` and customers are advised to use only ``amp=\"f32\"`` for GPT-NeoX.\n* Using ``cache_layout=constants.LAYOUT_BSH`` in NeuronConfig has known limitations with compilation. Customers are advised to use ``constants.LAYOUT_SBH`` instead.\n\n\n----\n\n.. 
_pytorch-2-20-0-rn:\n\nPyTorch Framework (Neuron 2.20.0 Release)\n--------------------------------------------------------\n\nDate of Release: 09/16/2024\n\ntorch-neuron\n~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Minor updates.\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* This release adds support for Neuron Kernel Interface (NKI), Python 3.11, and protobuf versions 3.20+, as well as improved BERT performance.\n* Added support for Neuron Kernel Interface (NKI).\n* Added support for Python 3.11.\n* Added support for protobuf versions 3.20+.\n* (Training) Increased performance for BERT-Large pretraining by changing ``NEURON_TRANSFER_WITH_STATIC_RING_OPS`` default.\n* (Training) Improved Neuron Cache locking mechanism for better Neuron Cache performance during multi-node training\n* (Inference) Added support for weight separated models for DataParallel class.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* Error ``cannot import name 'builder' from 'google.protobuf.internal'`` after installing compiler from earlier releases (2.19 or earlier)\n* Lower accuracy when fine-tuning Roberta\n* Slower loss convergence for NxD LLaMA-3 70B pretraining using ZeRO1 tutorial\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Support for model serialization (save and load) of all models except the ``GPTJForSampling`` and ``GPTNeoXForSampling`` model classes, which reduces future model load time by saving a transformed and sharded set of weights as a new safetensors checkpoint.\n* Support for on device sampling (Top P) with continuous batching\n* Support for Scaled RoPE for LLAMA 3.1 models\n* Support for multi-node inference for LLAMA 3.1 405B model for specific sequence lengths\n* Support for FlashDecoding (using ``shard_over_sequence``) to support long context lengths up to 128k\n\nBug Fixes\n^^^^^^^^^\n\n* Fixes to handle ``seq_ids`` consistently across vLLM versions\n* Fixes for KV head full replication logic errors\n\n----\n\n.. 
_pytorch-2-19-0-rn:\n\nPyTorch Framework (Neuron 2.19.0 Release)\n--------------------------------------------------------\n\nDate of Release: 07/03/2024\n\ntorch-neuron\n~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Minor updates.\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Improvements in ZeRO1 to have FP32 master weights support and BF16 all-gather\n* Added custom SILU enabled via ``NEURON_CUSTOM_SILU`` environment variable\n* Neuron Parallel Compile now handles non-UTF-8 characters in the trial-run log and reports compilation time results when enabled with ``NEURON_PARALLEL_COMPILE_DUMP_RESULTS``\n* Support for using DummyStore during PJRT process group initialization by setting ``TORCH_DIST_INIT_BARRIER=0`` and ``XLA_USE_DUMMY_STORE=1``\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Support for compiler optimized flash attention kernel to support context lengths of 16k/32k for Llama models\n* Streamer support enabled for BLOOM, GPTJ, GPT2, GPT-NeoX and LLAMA models\n* Support for on device generation for TopK in Mixtral models\n* Continuous batching support for Mistral v0.2\n* Minor API improvements with type annotations for NeuronConfig, end-of-support warnings for old arguments, and exposing top-level configurations\n* Performance improvements such as an optimized logit ordering for continuous batching in Llama models, optimized QKV padding for certain GQA models, faster implementation of cumsum operation to improve TopP performance\n\nBug Fixes\n^^^^^^^^^\n\n* Removed ``start_ids=None`` from ``generate()``\n* Mistral decoding issue that occurs during multiple sampling runs\n* Mistralv0.1 sliding window error\n* Off-by-one error in window context encoding\n* Better error messaging\n\nKnown Issues\n^^^^^^^^^^^^\n\n* ``on_device_generation=GenerationConfig(do_sample=True)`` has some known failures for Llama models. Customers are advised not to use ``on_device_generation`` in such cases.\n* GPT-NeoX is sensitive to ``fp16`` and customers are advised to use only ``amp=\"f32\"`` for GPT-NeoX.\n* Using ``cache_layout=constants.LAYOUT_BSH`` in NeuronConfig has known limitations with compilation. Customers are advised to use ``constants.LAYOUT_SBH`` instead.\n\n----\n\n.. _pytorch-2-18-0-rn:\n\nPyTorch Framework (Neuron 2.18.0 Release)\n--------------------------------------------------------\n\nDate of Release: 04/10/2024\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* [Beta] Added support for continuous batching and a reference integration with vLLM (Llama models only)\n\nKnown Issues\n^^^^^^^^^^^^\n\n* There is a known compiler issue for inference of some configurations of Llama-2 70B that can cause accuracy degradation. Customers are advised to use the ``--enable-mixed-precision-accumulation`` compiler flag if Llama-2 70B accuracy issues occur.\n* There is a known compiler issue for inference of some configurations of Llama-2 13B that can cause accuracy degradation. Customers are advised to use the ``--enable-saturate-infinity --enable-mixed-precision-accumulation`` compiler flags if Llama-2 13B accuracy issues occur.\n* There is a known compiler issue for inference of some configurations of GPT-2 that can cause accuracy degradation. 
Customers are advised to use the ``--enable-saturate-infinity --enable-mixed-precision-accumulation`` compiler flags if GPT-2 accuracy issues occur.\n* GPT-NeoX is sensitive to ``fp16`` and customers are advised to use only ``amp=\"f32\"`` for GPT-NeoX.\n* Using ``cache_layout=constants.LAYOUT_BSH`` in NeuronConfig has known limitations with compilation. Customers are advised to use ``constants.LAYOUT_SBH`` instead.\n\n----\n\n.. _pytorch-2-17-0-rn:\n\nPyTorch Framework (Neuron 2.17.0 Release)\n--------------------------------------------------------\n\nDate of Release: 04/01/2024\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for on device log-softmax and on device sampling for TopK\n* Added support for on device embedding for all models\n* Added support for Speculative Decoding\n* [Beta] Added support for Mixtral-8x7b MoE\n* [Beta] Added support for mistralai/Mistral-7B-Instruct-v0.2 with no sliding window\n* Added faster checkpoint loading support for both sharded and whole checkpoints\n* Added the ability to download checkpoints directly from huggingface hub repositories\n* Added NeuronAutoModelForCausalLM class which automatically loads architecture-specific classes\n* Added a warmup to all kernels to avoid unexpected initialization latency spikes\n\nBug Fixes\n^^^^^^^^^\n\n* Users no longer need a copy of the original checkpoint and can use safetensor checkpoints for optimal speed.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* There is a known compiler issue for inference of some configurations of Llama-2 70B that can cause accuracy degradation. Customers are advised to use the ``--enable-mixed-precision-accumulation`` compiler flag if Llama-2 70B accuracy issues occur.\n* There is a known compiler issue for inference of some configurations of Llama-2 13B that can cause accuracy degradation. Customers are advised to use the ``--enable-saturate-infinity --enable-mixed-precision-accumulation`` compiler flags if Llama-2 13B accuracy issues occur.\n* There is a known compiler issue for inference of some configurations of GPT-2 that can cause accuracy degradation. Customers are advised to use the ``--enable-saturate-infinity --enable-mixed-precision-accumulation`` compiler flags if GPT-2 accuracy issues occur.\n* GPT-NeoX is sensitive to ``fp16`` and customers are advised to use only ``amp=\"f32\"`` for GPT-NeoX.\n\n\n----\n\n.. _pytorch-2-16-0-rn:\n\nPyTorch Framework (Neuron 2.16.0 Release)\n--------------------------------------------------------\n\nDate of Release: 12/21/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* [Beta] Added support for Llama-2 70B\n* [Beta] Added support for Mistral 7B\n* [Beta] Added support for PyTorch 2.1\n* [Beta] Added support for Grouped Query Attention (GQA)\n* [Beta] Added support for ``safetensors`` serialization\n* [Beta] Added support for early stopping in the ``sample_llama`` function\n* [Beta] Added sparse attention support for GPT2\n* Added support for ``BatchNorm``\n* Use the ``--auto-cast=none`` compiler flag by default for all models. 
This flag improves accuracy for ``float32`` operations.\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* [Beta] Added support for Llama-2 70B\n* [Beta] Added support for Mistral 7B\n* [Beta] Added support for Grouped Query Attention (GQA)\n* [Beta] Added support for ``safetensors`` serialization\n* [Beta] Added support for early stopping in the ``sample_llama`` function\n* [Beta] Added sparse attention support for GPT2\n\nBug Fixes\n^^^^^^^^^\n\n* Resolved an issue in ``top_p`` in the ``sample_llama`` function so that it now selects the same number of tokens that the Hugging Face ``top_p`` implementation selects.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* There is a known compiler issue for inference of some configurations of Llama-2 70B that can cause accuracy degradation. Customers are advised to use the ``--enable-mixed-precision-accumulation`` compiler flag if Llama-2 70B accuracy issues occur.\n* There are known compiler issues impacting inference accuracy of certain model configurations of ``Llama-2-13b`` when ``amp = fp16`` is used. If this issue is observed, ``amp=fp32`` should be used as a workaround. This issue will be addressed in future Neuron releases.\n\n----\n\n.. _pytorch-2-15-0-rn:\n\nPyTorch Framework (Neuron 2.15.0 Release)\n--------------------------------------------------------\n\nDate of Release: 10/26/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* [Beta] Added support for ``int8`` quantization for Llama\n* [Beta] Added multi bucket context encoding support for BLOOM\n* [Beta] Added model serialization for all supported models (except GPT-J and GPT-NeoX)\n* [Beta] Added the ability to return output logit scores during sampling\n* Added support for ``SOLU`` activation and ``GroupNorm``\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* [Beta] Added support for ``int8`` quantization for Llama\n* [Beta] Added multi bucket context encoding support for BLOOM\n* [Beta] Added model serialization for all supported models (except GPT-J and GPT-NeoX)\n* [Beta] Added the ability to return output logit scores during sampling\n\nBug Fixes\n^^^^^^^^^\n\n* [GPT2] Fixed an issue in ``GPT2ForSamplingWithContextBroadcasting`` where the input prompt would get truncated if it was longer than the ``context_length_estimate``.\n\n----\n\n.. _pytorch-2-14-0-rn:\n\nPyTorch Framework (Neuron 2.14.0 Release)\n--------------------------------------------------------\n\nDate of Release: 09/15/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Use the ``--model-type=transformer`` compiler flag by default for all models. This flag improves performance and compilation time for all models. This flag replaces the ``--model-type=transformer-inference`` flag, which is now deprecated.\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed an issue where the ``HuggingFaceGenerationModelAdapter`` class falls back to serial context encoding for models that have parallel context encoding (``GPT2ForSamplingWithContextBroadcasting``, ``LlamaForSampling``, etc.)\n* [GPT2 / OPT] Fixed an issue in the parallel context encoding network where incorrect results could be generated due to incorrect masking logic.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* Some configurations of Llama and Llama-2 inference models fail compilation with the error ``IndirectLoad/Save requires contiguous indirect access per partition``. 
This is fixed in the compiler version 2.10.0.35 (Neuron SDK 2.14.1).\n* Some configurations of Llama and Llama-2 inference models fail compilation with the error ``Too many instructions after unroll for function sg0000``. To mitigate this, please try the ``-O1`` compiler option (or ``--optlevel 1``) by adding ``os.environ[\"NEURON_CC_FLAGS\"] = \"-O1\"`` to your script or setting it in the environment. A complete fix will come in a future release and will not require this option. Note: Using -O1 in the Llama-2 13B tutorial results in about a 50% increase in latency compared to Neuron SDK 2.13.2. If this is not acceptable, please use the compiler version from Neuron SDK 2.13.2.\n\n\n----\n\n.. _pytorch-2-13-0-rn:\n\nPyTorch Framework (Neuron 2.13.0 Release)\n--------------------------------------------------------\n\nDate of Release: 08/28/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for Llama 2 (excluding grouped/multi-query versions, such as Llama 2 70B) [Beta]\n* Improved the performance of BLOOM and Llama models [Beta]\n* Reduced execution latency of token generation in tensor parallel models by improving thread synchronization (supported in Llama only)\n* Added an optimized vector implementation of RoPE positional embedding (supported in Llama only)\n* Added support for faster context encoding on sequences of varying lengths. This is implemented by allowing multiple buckets for parallel context encoding. During inference the best fit bucket is chosen (supported in Llama/GPT-2 only)\n* Added the Neuron Persistent Cache for compilation to automatically load pre-compiled model artifacts (supported by all models)\n* Improved compilation time by compiling models used for different sequence length buckets in parallel (not supported in GPT-NeoX/GPT-J)\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for Llama 2 (excluding grouped/multi-query versions, such as Llama 2 70B) [Beta]\n* Improved the performance of BLOOM and Llama models [Beta]\n* Reduced execution latency of token generation in tensor parallel models by improving thread synchronization (supported in Llama only)\n* Added an optimized vector implementation of RoPE positional embedding (supported in Llama only)\n* Added support for faster context encoding on sequences of varying lengths. This is implemented by allowing multiple buckets for parallel context encoding. During inference the best fit bucket is chosen (supported in Llama/GPT-2 only)\n* Added the Neuron Persistent Cache for compilation to automatically load pre-compiled model artifacts (supported by all models)\n* Improved compilation time by compiling models used for different sequence length buckets in parallel (not supported in GPT-NeoX/GPT-J)\n\nBug Fixes\n^^^^^^^^^\n\n* [Llama] Fixed an issue in the parallel context encoding network where incorrect results could be generated if the context length is shorter than the context length estimate\n* [GPT2 / OPT] Fixed an issue in the parallel context encoding network where incorrect results could be generated\n\nKnown Issues\n^^^^^^^^^^^^\n\n* The ``HuggingFaceGenerationModelAdapter`` class currently falls back to serial context encoding for models that have parallel context encoding (``GPT2ForSamplingWithContextBroadcasting``, ``LlamaForSampling``, etc.)\n* Beam search can introduce memory issues for large models\n* There can be accuracy issues for the GPT-J model for certain use-cases\n\n\n----\n\n.. 
_pytorch-2-12-0-rn:\n\nPyTorch Framework (Neuron 2.12.0 Release)\n--------------------------------------------------------\n\nDate of Release: 07/21/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for GPT-NeoX models [Beta]\n* Added support for BLOOM models [Beta]\n* Added support for Llama models [Alpha]\n* Added support for more flexible tensor-parallel configurations to GPT2, OPT, and BLOOM. The number of attention heads no longer needs to be evenly divisible by ``tp_degree``\n* Added multi-query / multi-group attention support for GPT2\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for GPT-NeoX models [Beta]\n* Added support for BLOOM models [Beta]\n* Added support for Llama models [Alpha]\n* Added support for more flexible tensor-parallel configurations to GPT2, OPT, and BLOOM. The number of attention heads no longer needs to be evenly divisible by ``tp_degree``\n* Added multi-query / multi-group attention support for GPT2\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed NaN issues for GPT2 model\n* Fixed OPT/GPT-NeoX gibberish output\n* Resolved an issue where NaN values could be produced when the ``context_length`` argument was used in GPT2/OPT\n\nKnown Issues\n^^^^^^^^^^^^\n\n* Missing cache reorder support for beam search\n\n\n----\n\n.. _pytorch-2-11-0-rn:\n\nPyTorch Framework (Neuron 2.11.0 Release)\n--------------------------------------------------------\n\nDate of Release: 06/14/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added ``int8`` weight storage for GPT2 models\n* Improved prompt context encoding performance for GPT2 models\n* Improved collective communications performance for tp-degrees 4, 8, and 24 on Inf2\n* Improved collective communications performance for tp-degrees 8 and 32 on Trn1\n* Support for the ``--model-type=transformer-inference`` compiler flag for optimized decoder-only LLM inference\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added ``int8`` weight storage for GPT2 models\n* Improved prompt context encoding performance for GPT2 models\n* Improved collective communications performance for tp-degrees 4, 8, and 24 on Inf2\n* Improved collective communications performance for tp-degrees 8 and 32 on Trn1\n* Support for the ``--model-type=transformer-inference`` compiler flag for optimized decoder-only LLM inference\n\nBug Fixes\n^^^^^^^^^\n\n* Added padding to the GPT-J ``linear`` layer to correctly handle odd vocabulary sizes\n* Issues where the HuggingFace ``generate`` method produces incorrect results when ``beam_search`` is used have been resolved\n\n\n----\n\n.. 
_pytorch-2-10-0-rn:\n\nPyTorch Framework (Neuron 2.10.0 Release)\n--------------------------------------------------------\n\nDate of Release: 05/01/2023\n\ntorch-neuronx\n~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added ``transformers-neuronx`` artifacts to PyPI repository\n* Added support for the HuggingFace ``generate`` method\n* Added model serialization support for GPT2 models, including model saving, loading, and weight swapping\n* Added support for caching compiled artifacts\n* Improved performance by removing unnecessary KV-cache tensor resetting\n* Improved prompt context encoding performance (OPT, GPT2)\n\ntransformers-neuronx\n~~~~~~~~~~~~~~~~~~~~\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added ``transformers-neuronx`` artifacts to PyPI repository\n* Added support for the HuggingFace ``generate`` method\n* Added model serialization support for GPT2 models, including model saving, loading, and weight swapping\n* Added support for caching compiled artifacts\n* Improved performance by removing unnecessary KV-cache tensor resetting\n* Improved prompt context encoding performance (OPT, GPT2)\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed the GPT-J demo to import the correct ``amp_callback`` function\n\nKnown Issues\n^^^^^^^^^^^^\n\n* When the HuggingFace ``generate`` method is configured to use ``beam_search``, this can produce incorrect results for certain configurations. It is recommended to use other generation methods such as ``sample`` or ``greedy_search``. This will be fixed in a future Neuron release.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* None\n"
  },
  {
    "path": "release-notes/components/runtime.rst",
    "content": ".. meta::\n    :description: Complete release notes for the Neuron Runtime component across all AWS Neuron SDK versions.\n    :keywords: neuron runtime, neuron driver, neuron collectives, release notes, aws neuron sdk\n    :date-modified: 02/26/2026\n\n.. _runtime_rn:\n\nComponent Release Notes for Neuron Runtime\n==========================================\n\nThe release notes for the Neuron Runtime Neuron component, including Neuron collectives, the Runtime driver, and the Runtime library. Read them for the details about the changes, improvements, and bug fixes for all release versions of the AWS Neuron SDK.\n\n.. _runtime-2-29-0-rn:\n\nNeuron Runtime (Neuron 2.29.0 Release)\n------------------------------------------------------------------------\n\nDate of Release: 04/09/2026\n\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.31.24.0\n\nNew Features\n^^^^^^^^^^^^\n* Added new |nrt_cc_create_stream|_ API for programmatic host-driven collective stream creation, replacing the previous environment variable approach.\n* Added |nrt_get_attached_efa_bdf|_ API that returns the BDF string of the EFA device attached to a specified Neuron device index, enabling optimal network interface selection.\n* Added |lnc_idx|_ parameter to async tensor APIs, allowing users to select the specific DMA engine for data transfers.\n* New environment variables:\n\n  * ``NEURON_RT_ONE_THREAD_PER_CORE``: Pins each network proxy thread to a dedicated CPU core, providing up to 2x improvement in p50/p99 collective communication latency.\n  * ``NEURON_RT_RANKS_PER_NETWORK_PROXY``: Controls how many ranks share the same network proxy thread, enabling proxy thread consolidation for improved latency in large-scale distributed workloads.\n\n.. |nrt_cc_create_stream| replace:: ``nrt_cc_create_stream``\n.. _nrt_cc_create_stream: https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h\n.. |nrt_get_attached_efa_bdf| replace:: ``nrt_get_attached_efa_bdf``\n.. _nrt_get_attached_efa_bdf: https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt.h\n.. |lnc_idx| replace:: ``lnc_idx``\n.. _lnc_idx: https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h\n\nImprovements\n^^^^^^^^^^^^\n* Added EFA collectives support for Trn3(previously only available on Trn2), enabling cross-instance data transfers.\n* Added profiling support for the :doc:`standalone collectives </neuron-runtime/api/nrt-async-api-overview>`, allowing :doc:`standalone collectives </neuron-runtime/api/nrt-async-api-overview>` traces to appear in the profiler timeline.\n* Added context caching for :doc:`standalone collectives </neuron-runtime/api/nrt-async-api-overview>` operations (all-gather, reduce-scatter, all-reduce) that are run outside of a compiled model/kernel, significantly improving schedule performance by up to 90% for repeated calls.\n* Removed unnecessary memset operations during :doc:`standalone collectives </neuron-runtime/api/nrt-async-api-overview>` request processing flow, eliminating tens of milliseconds of overhead.\n* Removed limit of 512 queue set instances per NEFF which lead to ``NRT_RESOURCE`` errors when loading NEFF with too many queue set instnaces. 
The Neuron Runtime now supports an unbounded number of queue set instance enabling it to load NEFFs that are further optimized for code size reduction.\n* Added Physical Neuron Core ID and Global Rank ID fields to debug tensor read payload for better multi-core/multi-rank debugging.\n* Added async sequence IDs (nrta_seq_t) to system trace events for correlation between async operations and hardware execution events.\n\nBreaking changes\n^^^^^^^^^^^^^^^^\n* Error tracker removed from async API in favor of a simpler status pointer pass-in model. Applications now pass a status pointer directly. See :ref:`this section <nrta-error-handling>` for an example.\n* |nrta_get_completion_handle|_ API removed.\n* Due to the breaking changes, we have performed a version bump from 2.x to 3.0 for the :doc:`NRT Async APIs </neuron-runtime/api/nrt-async-api-overview>` (APIs prefixed with ``nrta``). Applications using the async API will need to be recompiled against the new version.\n\n.. |nrta_get_completion_handle| replace:: ``nrta_get_completion_handle``\n.. _nrta_get_completion_handle: https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/libnrt/include/nrt/nrt_async.h\n\nBug Fixes\n^^^^^^^^^\n* Fixed profile data loss on SIGTERM shutdown (e.g., vLLM worker processes); the signal handler now covers both SIGTERM and SIGINT.\n* Increased :doc:`Collectives XU </neuron-runtime/api/nrt-async-api-overview>` communication wait timeout (increased from 100ms to 30s) to prevent false timeout failures when ranks have timing drift.\n* Fixed a double-free crash during ``nrt_close`` when NEFF execution fails and model unload is not called.\n* Fixed DMA ring allocation in :doc:`Collectives XU </neuron-runtime/api/nrt-async-api-overview>` context caching that caused hangs and invalid cache hits.\n* Fixed :doc:`Collectives XU </neuron-runtime/api/nrt-async-api-overview>` to properly support in-place operations where send and receive buffers are identical.\n* Fixed device memory leak during repeated NEFF load/unload cycles.\n* Fixed crash when ``proxy_queue`` is destroyed before ``start()`` due to ``ncclSetAffinity`` failure.\n* Fixed system trace event correlation by passing correct execution ID to wait events.\n* Fixed NEFF output during inspect profiling to use UUID for distinguishing NEFFs with same hash.\n* Fixed BranchPrefetchHint addressing mode bug where backwards-relative branch hints computed incorrect target addresses on trn2 and later.\n* Fixed dynamically loaded kernel code carveout size on trn2 (16KB → 32KB) to support migrated operations.\n\nCompatibility Support Table\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe Neuron runtime was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.17              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Inf2``                    Amazon Linux  AL2023         6.1               2.34\n``Trn1``                    Ubuntu        U24            6.17              2.39\n``Trn1``                    Ubuntu        U22            6.8            
   2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Amazon Linux  AL2023         6.1               2.34\n``Trn2``                    Ubuntu        U24            6.17              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    Amazon Linux  AL2023         6.1               2.34\n=========================== ============= ============== ================= ===============\n\nNeuron Driver\n~~~~~~~~~~~~~\n\n**Version:** 2.27.4.0\n\nNew Features\n^^^^^^^^^^^^\n* Added support for new :ref:`TRN3 Gen2 Ultraserver <aws-trn3-arch>` configurations: US3 (2-node), US4 (4-node), US16 (4-node), and US18 (4-node).\n\nImprovements\n^^^^^^^^^^^^\n* Added top-level DMA reset support during TPB reset on trn3 and later platforms, improving reset reliability.\n\nCompatibility Support Table\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe Neuron driver was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.17              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Red Hat       RHEL10         6.12              2.39\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Inf2``                    Amazon Linux  AL2023         6.1               2.34\n``Inf2``                    Amazon Linux  AL2            5.10              2.26\n``Trn1``                    Ubuntu        U24            6.17              2.39\n``Trn1``                    Ubuntu        U22            6.8               2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Red Hat       RHEL10         6.12              2.39\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Amazon Linux  AL2023         6.1               2.34\n``Trn1``                    Amazon Linux  AL2            5.10              2.26\n``Trn2``                    Ubuntu        U24            6.17              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Red Hat       RHEL10         6.12              2.39\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    Amazon Linux  AL2023         6.1               2.34\n``Trn2``                    Amazon Linux  AL2            5.10              2.26\n=========================== ============= ============== ================= ===============\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.31.24.0\n\nImprovements\n^^^^^^^^^^^^\n* Restructured EFA device processing to per-stream granularity, simplifying resource management for concurrent streams improving stability.\n* Improved bootstrap error messages with actionable troubleshooting guidance when ranks fail to receive root parameters.\n\nBug 
Fixes\n^^^^^^^^^\n* Fixed incorrect interface selection in multi-ultraserver collectives where EFA was used instead of Ultraserver interfaces, by adding explicit cross-rack-to-rack flag detection.\n* Fixed crash and undefined behavior when channels fail to initialize due to EFA Device plugin errors, now returning clear error messages instead of accessing uninitialized data.\n\n----\n\n.. _runtime-2-28-0-rn:\n\nNeuron Runtime (Neuron 2.28.0 Release)\n------------------------------------------------------------------------\n\nDate of Release: 02/26/2026\n\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.30.50.0\n\nImprovements\n^^^^^^^^^^^^\n\n* Added support for :ref:`TRN3 Gen1 Ultraserver <aws-trn3-arch>` instance type with full system topology\n* Added support for tensors larger than 4GB with 64-bit addressing\n* Introduced experimental async APIs (see :doc:`NRT Async APIs Overview </neuron-runtime/api/nrt-async-api-overview>`)\n* Optimized mesh AllGather on TP8 configurations using destination routing\n* Added bound check support for ``dma_direct2d_xpose`` operations\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed proxy thread signaling condition in topsp barrier\n* Fixed segfaults in NEFF load cleanup and error paths\n* Fixed incompatible network/ultraserver interface selection for inter-node mesh\n* Fixed RDH buffer reservation and AllGather bugs\n* Fixed corrupted memory logs in multi-threaded model loads\n* Improved error handling to return a clear error instead of asserting during ``nrt_init``\n\nCompatibility Support Table\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe Neuron runtime was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.14              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Inf2``                    Amazon Linux  AL2023         6.1               2.34\n``Trn1``                    Ubuntu        U24            6.14              2.39\n``Trn1``                    Ubuntu        U22            6.8               2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Amazon Linux  AL2023         6.1               2.34\n``Trn2``                    Ubuntu        U24            6.14              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    Amazon Linux  AL2023         6.1               2.34\n=========================== ============= ============== ================= ===============\n\n\nNeuron Driver\n~~~~~~~~~~~~~\n\n**Version:** 2.26.10.0\n\nDate of Release: 03/12/2026\n\nBug Fixes\n^^^^^^^^^\n\n* Compatibility fixes for Linux kernel 6.18.\n\n.. 
_neuron-driver-2-26-5-0:\n\n**Version:** 2.26.5.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for detecting :ref:`TRN3 Gen1 Ultraserver <aws-trn3-arch>` platforms\n* Added IOCTL to lookup both the Neuron device and the HBM for a given virtual address, enabling frameworks to identify which device holds a tensor\n* Updated driver uninstall behavior to fail gracefully without uninstalling the driver if driver is in use\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed kernel crash where non-validated input to DMA-related IOCTLs could trigger BUG_ON requiring an instance reboot to recover\n* Added BAR bounds validation during ``ncdev_bar_read`` to prevent out-of-bounds register access through IOCTLs\n* Fixed bounds checks on memory accesses where u64 wraparound attacks can lead to out-of-bounds memory access\n\nWe would like to thank Shaul Ben Hai from SentinelOne Security Research for reporting the above three issues.\n\n* Fixed use-after-free issues in sysfs cleanup flow that caused kernel crashes\n* Fixed race condition in sysfs access during driver initialization\n\nCompatibility Support Table\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nThe Neuron driver was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.14              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Inf2``                    Amazon Linux  AL2023         6.1               2.34\n``Inf2``                    Amazon Linux  AL2            5.10              2.26\n``Trn1``                    Ubuntu        U24            6.14              2.39\n``Trn1``                    Ubuntu        U22            6.8               2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Amazon Linux  AL2023         6.1               2.34\n``Trn1``                    Amazon Linux  AL2            5.10              2.26\n``Trn2``                    Ubuntu        U24            6.14              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    Amazon Linux  AL2023         6.1               2.34\n``Trn2``                    Amazon Linux  AL2            5.10              2.26\n=========================== ============= ============== ================= ===============\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.30.58.0\n\nImprovements\n^^^^^^^^^^^^\n\n* Added support for :ref:`TRN3 Gen1 Ultraserver <aws-trn3-arch>` instance types with optimized topology configurations\n* Added support for Neuron-Switch-v1 topology and proper network interface selection\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed bug where uninitialized socket file descriptors were incorrectly closed during bootstrap, preventing connection errors in multi-context scenarios\n* Improved error handling for channel creation failures due to plugin initialization errors, 
preventing crashes with misconfigured plugins\n* Initialized file descriptor arrays to -1 in bootstrap code to prevent accidental use of uninitialized descriptors\n\n----\n\n.. _runtime-2-27-0-rn:\n\nNeuron Runtime [2.29.40.0] (Neuron 2.27.0 Release)\n---------------------------------------------------\n\nDate of Release: 12/19/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added support for Trainium3 (single node mode)\n* Reduced the overhead of reprogramming the Collectives Engine by up to 100x for NEFFs compiled with the ``-O1`` flag. This improves end-to-end performance of these NEFFs by up to 15%.\n* Reduced NeuronCore branch overhead by up to 3x, decreasing the overhead of starting a NEFF program by up to 5%.\n* Reduced the overhead of starting a NEFF program by up to 50% with an on-device hardware barrier between ranks.\n* Improved all-gather latency by up to 35% for messages greater than 1MB in TP8 (LNC2) and TP16 (LNC1) collectives.\n* Added support for NRT Debug Stream APIs.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed scratchpad page allocation bug that caused excessive page allocations due to a page rounding error.\n* Fixed segfault that occurred when freeing an empty tensor.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB). Passing in a size larger than or equal to 4GB will result in datatype overflow leading to undefined behavior.\n* A hardware bug affecting **Trainium** and **Inferentia2** devices causes numerical errors to become \"sticky\" within the Neuron Core hardware.\n\n\n----\n\n.. _runtime-2-26-0-rn:\n\nNeuron Runtime [2.28.19.0] (Neuron 2.26.0 Release)\n---------------------------------------------------\n\nDate of Release: 09/18/2025\n\nImprovements\n~~~~~~~~~~~~~~~\n\n* Added rank ID to all events emitted from the Profiler 2.0 system trace.\n* Improved timestamp alignment of Profiler 2.0 NeuronCore and CPU system trace events, enhancing the accuracy of the trace timeline.\n\nBug Fixes\n~~~~~~~~~\n\n* Fixed bug where `nrt_unload` returned `NRT_SUCCESS` even when model stop fails due to Neuron Core lockups.\n* Fixed bug where `model_name` was empty in Profiler 2.0 system trace events.\n* Fixed bug where error messages were incorrectly being displayed on machines with no EFA devices.\n\nKnown Issues\n~~~~~~~~~~~~\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB). See the sketch below.\n* A hardware bug affecting **Trainium** and **Inferentia2** devices causes numerical errors to become \"sticky\" within the Neuron Core hardware.
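\n\nThe 4 GB limit above applies to the byte size passed to the tensor allocation APIs. The following is a minimal host-side sketch of guarding against it; the helper and constant names are illustrative and are not part of the Neuron Runtime API:\n\n.. code-block:: python\n\n   # Hypothetical guard for the known issue above: sizes >= 4 GiB overflow the\n   # size handling of the nrt_tensor_allocate APIs.\n   MAX_NRT_TENSOR_BYTES = 1 << 32  # 4 GiB\n\n   def check_nrt_tensor_size(num_elements: int, element_size: int) -> int:\n       # Return the byte size only if it is safe to pass to the allocation APIs.\n       nbytes = num_elements * element_size\n       if nbytes >= MAX_NRT_TENSOR_BYTES:\n           raise ValueError(\n               'tensor of %d bytes reaches the 4 GB nrt_tensor_allocate limit; '\n               'shard or split the tensor instead' % nbytes)\n       return nbytes\n\n   check_nrt_tensor_size(num_elements=2**28, element_size=4)  # OK: 1 GiB\n\n\n----\n\n.. _runtime-2-25-0-rn: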
\n\nNeuron Runtime (Neuron 2.25.0 Release)\n---------------------------------------\n\nDate of Release: 07/31/2025\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.27.23.0\n\nImprovements\n^^^^^^^^^^^^^^\n\n* Introduced ``nrt_get_vnc_memory_stats`` API to retrieve device memory usage.\n* Added State-Buffer to State-Buffer collective support for ``all_reduce``, ``reduce_scatter``, and ``all_gather`` for LNC2, which helps reduce HBM memory pressure.\n* Added support for coalescing of Collectives operations for internode RDH.\n* Introduced a new DGE priority class feature to select preferred packet size for memory transfers.\n* Improved ``nrt_init`` time by up to ~3 seconds on AWS Trainium and Inferentia instances.\n* Added a warning message along with a recommended scratchpad configuration when a loaded NEFF has non-optimal scratchpad usage.\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* Due to a hardware bug that can cause numerical errors to be falsely reported, the runtime has disabled numerical errors by default. Users can re-enable numerical errors by setting ``NEURON_RT_NUMERICAL_ERRORS_VERBOSITY=critical`` or ``NEURON_FAIL_ON_NAN=1``. See the sketch at the end of these notes.\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed profiling APIs to report execution duration from explicit notifications.\n* Fixed race condition which can cause a crash when starting inspect traces.\n\nKnown Issues\n^^^^^^^^^^^^\n\n* A hardware bug affecting **Trainium** and **Inferentia2** devices causes numerical errors to become \"sticky\" within the Neuron Core hardware.\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.25.65.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added multinode collectives support for Trainium2 instances without EFA devices\n* Minor performance improvement to network proxy handshake\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed memory leak when cleaning up communication devices during ``nrt_close``\n\nNeuron Driver\n~~~~~~~~~~~~~\n\n**Version:** 2.21.37.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added the ability for users to read power utilization for each Neuron device via a sysfs interface. This interface shows the minimum, maximum and average power consumed by the device over the past minute, expressed as a percentage of the device's maximum power.\n* Added the ability for users to read the device utilization. This is reported as the number of microseconds between the start and end of the current execution on hardware.
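\n\nA minimal sketch of re-enabling numerical error reporting per the Runtime Library breaking change above, assuming the variables are set before the Neuron Runtime is initialized in the process (exporting them in the shell works equally well):\n\n.. code-block:: python\n\n   import os\n\n   # Re-enable numerical error detection, which this release disables by\n   # default. These variables are read by the Neuron Runtime, so set them\n   # before the runtime is initialized (for example, before loading a model).\n   os.environ['NEURON_RT_NUMERICAL_ERRORS_VERBOSITY'] = 'critical'\n\n   # Alternatively, fail the execution when a NaN is produced:\n   os.environ['NEURON_FAIL_ON_NAN'] = '1'\n\n\n----\n\n.. _runtime-2-24-0-rn: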
\n\nNeuron Runtime (Neuron 2.24.0 Release)\n---------------------------------------\n\nDate of Release: 06/24/2025\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.26.42.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added support for 8x8 collective groups (TP8 + CP8) on **TRN2** for **LNC=2**\n* Added support for direct `State-Buffer` to `State-Buffer` collective ops for **LNC=1**\n* Introduced RDH algorithm for inter-node collective communication\n* Added support for loading NEFFs with different world sizes in the same NRT process\n* Reduced the average latency of 32x2 collective groups by 65%\n* Reduced latency for intra-chip reduce scatter operations on **TRN2** instances by up to 20% for small transfers and 60% for medium to large transfers\n* Improved latency for medium message sizes for intra-chip All Gather operations on **TRN2** by up to 60%\n* Improved the debugging experience by adding logs which print out the value of timed-out, non-zero semaphores on **Trainium2** platforms\n* Improved timeout error messages by displaying the NEFF program counters for the stuck Neuron Core\n* Refined out-of-memory error messages to report a NEFF-level memory breakdown table\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* This version of the Neuron runtime requires `aws-neuron-dkms` version `2.22` or later on **Trainium2** instances.\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed crash caused by race condition during the capture of system profiles\n* Fixed various memory leaks that occur during `nrt_close`\n\nKnown Issues\n^^^^^^^^^^^^\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB).\n* A hardware bug affecting **Trainium** and **Inferentia2** devices causes numerical errors to become \"sticky\" within the Neuron Core hardware.\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.24.59.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Improved interface between ``libnccom`` and ``libnrt``, resulting in stability improvements\n\nNeuron Driver\n~~~~~~~~~~~~~\n\n**Version:** 2.20.28.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* This driver is required to run with Neuron Runtime 2.24 or later on Trainium2 machines. Included in the release is a bug fix to avoid device memory corruption issues leading to undefined Neuron Device behavior.\n* Improved interface between ``libnrt`` and the Driver resulting in stability improvements.\n\n\n----\n\n.. _runtime-2-23-0-rn:
\n\nNeuron Runtime (Neuron 2.23.0 Release)\n---------------------------------------\n\nDate of Release: 05/19/2025\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.25.57.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added ``NEURON_RT_LOW_LATENCY_TASKS_CPU_AFFINITY`` environment variable to allow users to set the thread affinity of low latency tasks that run on the host CPU (see the sketch at the end of these notes)\n* Refined software notification queue overflow detection flow and improved error message\n* Reduced latency for All-Reduce intra-chip collective (TP 4) by 50% for medium message sizes\n* Improved error message when an execution request is passed a tensor allocated on an incorrect HBM\n* Improved NEFF switch latency by up to 95% when using async mode\n* Increased the number of different replica groups supported in the same NEFF on TRN2\n* Explicitly limited the maximum number of in-flight async requests to the hard limit of 63\n* Added traces for Host <-> device data transfer events in system profiles (Neuron Profiler 2.0 Beta)\n* Added pre/post execution hooks to system profiles (Neuron Profiler 2.0 Beta)\n* Significantly reduced the time taken by calls to ``nrt_sys_trace_fetch_events()`` (Neuron Profiler 2.0 Beta)\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed segfault that can occur when applications attempt to load a NEFF with an unsupported number of FMA source descriptors\n\nKnown Issues\n^^^^^^^^^^^^\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB).\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.23.135.0 / 2.23.133.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added Trainium2 support\n* Improved startup times for large-scale training jobs by up to 5 seconds\n* Enhanced error logging for bootstrap failures\n* ``aws-ofi-nccl``: minor performance improvement\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed various memory leaks which occur during process cleanup
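\n\nA minimal sketch of setting the ``NEURON_RT_LOW_LATENCY_TASKS_CPU_AFFINITY`` variable introduced above, applied before the runtime is initialized; the comma-separated CPU list shown is an assumed value format used only for illustration:\n\n.. code-block:: python\n\n   import os\n\n   # Pin the runtime's low-latency host tasks to specific host CPUs. The\n   # comma-separated CPU list is an assumed format for illustration; consult\n   # the Neuron Runtime configuration documentation for the accepted syntax.\n   os.environ['NEURON_RT_LOW_LATENCY_TASKS_CPU_AFFINITY'] = '0,1'\n\n\n----\n\n.. _runtime-2-22-0-rn: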
\n\nNeuron Runtime (Neuron 2.22.0 Release)\n---------------------------------------\n\nDate of Release: 04/03/2025\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.24.53.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Improved dynamic DMA descriptor generation performance by up to 3% for certain workloads\n* Reduced collectives device memory footprint for large NEFFs\n* Improved device latency for memory-bound workloads on TRN2\n* Added support for profiling executions when NRT is launched in Async Execution Mode\n* Added check to detect execution completion queue overflows\n* Reduced overhead of Neuron Profiler 2.0 to <1% of overall latency (Neuron Profiler 2.0 Beta)\n* Added new ``nrt_sys_trace_fetch_events`` API to retrieve system trace events (Neuron Profiler 2.0 Beta)\n* Added out of bound error events to system trace (Neuron Profiler 2.0 Beta)\n* Removed the ``NEURON_RT_INSPECT_DURATION_NSEC`` and ``NEURON_RT_INSPECT_START_OFFSET_NSEC`` configuration options (Neuron Profiler 2.0 Beta)\n* Added dynamic DMA support for block scatter ops (NKI)\n* Added RangeSelect instruction support for the Vector engine (NKI)\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* Removed support for Neuron Distributed Event Tracing\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed bug introduced in NRT 2.23 where the runtime was incorrectly reporting executions that hit \"Out of Bound\" errors as successful executions\n* Fixed segfault when encountering \"out of memory\" errors when starting profiles\n\nKnown Issues\n^^^^^^^^^^^^\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB).\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.22.26.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added check to print out an error message on invalid ``NEURON_RT_ROOT_COMM_ID`` configurations\n\nBug Fixes\n^^^^^^^^^\n\n* Resolved an issue where the ``libnccom.so`` filename was versioned incorrectly as ``libnccom.so.2.y.y``. It is correctly versioned as ``libnccom.so.2.22.26`` in this release.\n\nNeuron Driver\n~~~~~~~~~~~~~\n\n**Version:** 2.22.2.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added workaround for HW DGE descriptor fetching bug\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed typos in certain error log messages\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* Starting with Neuron Release 2.26, Neuron driver versions above 2.24 will only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types).\n* For ``Inf1`` instance users, only Neuron driver version 2.24 will remain supported with regular security patches.\n* ``Inf1`` instance users are advised to pin the Neuron driver version to ``2.24.*`` in their installation script.\n\n\n----\n\n.. _runtime-2-21-0-rn:
\n\nNeuron Runtime (Neuron 2.21.0 Release)\n---------------------------------------\n\nDate of Release: 12/20/2024\n\nNeuron Runtime Library\n~~~~~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.23.110.0 / 2.23.112.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Added Trainium2 support\n* Added runtime support to detect and fail on out-of-bound memory access in DMA operations\n* Added support for 4-rank replica group on adjacent Neuron cores on TRN1/TRN1N\n* Added new profiling API for capturing system and device profiles (Neuron Profiler 2.0 Beta)\n* Reduced runtime host RAM utilization\n* Reduced NEFF context switch overhead, lowering latency by up to 500us\n* Split hardware errors into more granular categories:\n   * ``NRT_EXEC_HW_ERR_HBM_UE`` (1201)\n   * ``NRT_EXEC_HW_ERR_NC_UE`` (1202)\n   * ``NRT_EXEC_HW_ERR_DMA_ABORT`` (1203)\n* Updated runtime to break down DMA ring memory usage into more detailed categories:\n   * dma rings io\n   * dma rings spill\n   * dma rings collectives\n   * dma rings runtime\n* Updated the ``nrt_load`` error path to print a clear error message when failing to load a collectives NEFF instead of aborting\n\nBreaking Changes\n^^^^^^^^^^^^^^^^\n\n* Removed Inf1 support from the Runtime library\n\nBug Fixes\n^^^^^^^^^\n\n* Fixed multiple memory corruptions and exhaustions on the collectives failure path\n* Fixed bug where incorrect execution status was passed to the async execution callback\n* Fixed DMA abort errors on TRN2\n\nKnown Issues\n^^^^^^^^^^^^\n\n* The ``nrt_tensor_allocate`` APIs do not support sizes of 4 GB or larger (>= 4GB).\n\nNeuron Collectives\n~~~~~~~~~~~~~~~~~~\n\n**Version:** 2.21.46.0\n\nImprovements\n^^^^^^^^^^^^^^^\n\n* Bootstrap changes to improve application startup latency for large-scale workloads\n* Logging improvements\n"
  },
  {
    "path": "release-notes/documentation/neuron-documentation.rst",
    "content": ".. _neuron-documentation-rn:\n\nNeuron Documentation Release Notes\n==================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nNeuron 2.21.0\n---------------\nDate: 12/20/2024\n\nNeuron Architectue and Features\n- Added Trainium2 Architectue guide. See :ref:`trainium2-arch`\n- Added Trn2 Architecture guide. See :ref:`aws-trn2-arch`\n- Added Logical NeuronCore configuration guide. See :ref:`logical-neuroncore-config`\n- Added NeuronCore-v3 Architecture guide. See :ref:`neuroncores-v3-arch`\n\nNeuron Compiler\n- Added NKI tutorial for SPMD usage with multiple Neuron Cores on Trn2. See `tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/tutorials/spmd_multiple_nc_tensor_addition.rst>`_\n- Updated NKI FAQ with Trn2 FAQs. See :ref:`nki_faq` \n- Added :doc:`Direct Allocation Developer Guide <nki_direct_allocation_guide>`\n- Updated :doc:`nki.isa <api/nki.isa>` API guide with support for new APIs. \n- Updated :doc:`nki.language <api/nki.language>` API guide with support for new APIs. \n- Updated :doc:`nki.compiler <api/nki.compiler>` API guide with support for new APIs. \n- Updated NKI :ref:`datatype <nl_datatypes>` guide with support for ``float8_e5m2``. \n- Updated :doc:`kernels <api/nki.kernels>` with support for allocated_fused_self_attn_for_SD_small_head_size and allocated_fused_rms_norm_qkv kernels\n\nNeuron Runtime\n- Updated troubleshooting doc with information on device out-of-memory errors after upgrading to Neuron Driver 2.19 or later. See :ref:`small_allocations_mempool`\n\nNeuronX Distributed Inference\n- Added Application Note to introduce NxD Inference. See :ref:`introduce-nxd-inference`\n- Added NxD Inference Supported Features Guide. See :ref:`nxdi-feature-guide`\n- Added NxD Inference Tutorial for Deploying Llama 3.1 405B (Trn2). See :ref:`nxdi-trn2-llama3.1-405b-tutorial`\n- Added NxD Inference API Reference Guide. See :ref:`nxd-inference-api-guides`\n- Added NxD Inference Production Ready Models (Model Hub) Guide. See :ref:`nxdi-model-reference`\n- Added Migration Guide from NxD examples to NxD Inference. See :ref:`nxd-examples-migration-guide`\n- Added Migration Guide from Transformers NeuronX to NeuronX Distributed Inference. See :ref:`nxdi_migrate_from_tnx`\n- Added vLLM User Guide for NxD Inference. See :ref:`nxdi-vllm-user-guide`\n- Added tutorial for deploying Llama3.2 Multimodal Models. See :ref:`/libraries/nxd-inference/tutorials/llama3.2-multimodal-tutorial.ipynb`\n\nNeuronX Distributed Training\n- Updated :ref:`api_guide_nxd_training`, :ref:`llama2_tp_pp_tutorial`, :ref:`llama3_tp_pp_tutorial`, :ref:`nxdt_config_overview`, and :ref:`checkpoint_conversion` with support for fused Q,K,V\n- Updated :ref:`nxdt_config_overview` with support for Trn2 configuration API\n- UpdatedDirect :ref:`checkpoint_conversion` with support for  HuggingFace Model Conversion\n- Added tutorial for HuggingFace Llama3.1/Llama3-70B Pretraining. See :ref:`hf_llama3_70B_pretraining`\n- Added tutorial for HuggingFace Llama3-8B Direct Preference Optimization (DPO) based Fine-tuning. 
\n\nTransformers NeuronX\n\n- Updated :ref:`transformers_neuronx_developer_guide` and :ref:`torch_neuronx_trace_api` with support for CPU compilation.\n- Updated :ref:`transformers_neuronx_developer_guide` to enable skipping the first Allgather introduced by flash decoding at the cost of duplicate Q weights.\n- Updated :ref:`transformers_neuronx_developer_guide` with support for EAGLE speculation\n\nNeuron Tools\n\n- Added Neuron Profiler 2.0 Beta User Guide with support for system profiles, integration with Perfetto, distributed workload support, etc. See :ref:`neuron-profiler-2-0-guide`\n- Updated nccom-test user guide to include support for Trn2. See :ref:`nccom-test`\n- Updated neuron-ls user guide to include support for Trn2. See :ref:`neuron-ls-ug`\n- Updated neuron-monitor user guide to include support for Trn2. See :ref:`neuron-monitor-ug`\n- Updated neuron-top user guide to include support for Trn2. See :ref:`neuron-top-ug`\n- Added Ask Q Developer documentation for general Neuron guidance and jumpstarting NKI kernel development. See :ref:`amazon-q-dev`\n\nPyTorch NeuronX\n\n- Added troubleshooting note for eager debug mode errors. See :ref:`pytorch-neuron-traning-troubleshooting`\n- Added torch-neuronx cxx11 ABI documentation. See :ref:`pytorch-neuronx-install-cxx11`\n- Added Migration Guide from ``XLA_USE_BF16`` / ``XLA_DOWNCAST_BF16``. See :ref:`migration_from_xla_downcast_bf16`\n- Updated BERT tutorial to not use ``XLA_DOWNCAST_BF16`` and updated BERT-Large pretraining phase to BFloat16 BERT-Large pretraining with AdamW and stochastic rounding. See :ref:`hf-bert-pretraining-tutorial`\n- Added Application Note for PyTorch 2.5 support. See :ref:`introduce-pytorch-2-5`\n- Updated PyTorch NeuronX Environment Variables document with support for PyTorch 2.5. See :ref:`pytorch-neuronx-envvars`\n\nMisc\n\n- Added a third-party developer flow solutions page. See :ref:`third-party-devflow-solutions`\n- Added a third-party libraries page. See :ref:`third-party-libraries`\n\nEnd of support announcements\n\n- :ref:`announce-eos-neuron-det`\n- :ref:`announce-eos-nxd-examples`\n- :ref:`announce-python-eos`\n- :ref:`announce-eos-pytorch-eos-113`\n- :ref:`announce-eos-pytorch-2-1`\n- :ref:`announce-u20-dlami-dlc-eos`\n- :ref:`announce-eos-torch-neuron`\n\nNeuron 2.20.0\n---------------\nDate: 09/16/2024\n\nNeuron Compiler\n\n- Added Getting Started with NKI guide for implementing a simple “Hello World” style NKI kernel and running it on a Neuron Device (Trainium/Inferentia2). See :ref:`nki_getting_started`\n- Added NKI Programming Model guide for explaining the three main stages of the NKI programming model. See :ref:`nki_programming_model`\n- Added NKI Kernel as a Framework Custom Operator guide for explaining how to insert a NKI kernel as a custom operator into a PyTorch or JAX model using simple code examples. See :ref:`nki_framework_custom_op`\n- Added NKI Tutorials for the following kernels: Tensor addition, Transpose2D, AveragePool2D, Matrix multiplication, RMSNorm, Fused Self Attention, LayerNorm, and Fused Mamba. See :ref:`nki_kernels`\n- Added NKI Kernels guide for optimized kernel examples. See :ref:`nki_kernels`\n- Added Trainium/Inferentia2 Architecture Guide for NKI. See :ref:`trainium_inferentia2_arch`\n- Added Profiling NKI kernels with Neuron Profile. See :ref:`neuron_profile_for_nki`\n- Added NKI Performance Guide for explaining a recipe to find performance bottlenecks of NKI kernels and apply common software optimizations to address such bottlenecks. See :ref:`nki_perf_guide`
\n- Added NKI API Reference Manual with nki framework and types, nki.language, nki.isa, NKI API Common Fields, and NKI API Errors. See :ref:`nki_api_reference`\n- Added NKI FAQ. See :ref:`nki_faq`\n- Added NKI Known Issues. See :ref:`nki_known_issues`\n- Updated Neuron Glossary with NKI terms. See :ref:`neuron_hw_glossary`\n- Added new `NKI samples repository <https://github.com/aws-neuron/nki-samples>`_\n- Added average_pool2d, fused_mamba, layernorm, matrix_multiplication, rms_norm, sd_attention, tensor_addition, and transpose_2d kernel tutorials to the NKI samples repository. See the `NKI samples repository <https://github.com/aws-neuron/nki-samples>`_\n- Added unit and integration tests for each kernel. See `NKI samples repository <https://github.com/aws-neuron/nki-samples>`_\n- Updated Custom Operators API Reference Guide with updated terminology (HBM). See :ref:`custom-ops-api-ref-guide`\n\nNeuronX Distributed Training (NxDT)\n\n- Added NxDT (Beta) Developer Guide. See :ref:`nxdt_developer_guide`\n- Added NxDT Developer Guide for Migrating from NeMo to Neuronx Distributed Training. See :ref:`nxdt_developer_guide_migration_nemo_nxdt`\n- Added NxDT Developer Guide for Migrating from Neuron-NeMo-Megatron to Neuronx Distributed Training. See :ref:`nxdt_developer_guide_migration_nnm_nxdt`\n- Added NxDT Developer Guide for Integrating a new dataset/dataloader. See :ref:`nxdt_developer_guide_integrate_new_dataloader`\n- Added NxDT Developer Guide for Integrating a new model. See :ref:`nxdt_developer_guide_integrate_new_model`\n- Added NxDT Developer Guide for Registering an optimizer and LR scheduler. See :ref:`Registering an optimizer and LR scheduler`\n- Added NxDT YAML Configuration Overview. See :ref:`nxdt_config_overview`\n- Added Neuronx Distributed Training Library Features documentation. See :ref:`nxdt_features`\n- Added Installation instructions for NxDT. See :ref:`nxdt_installation_guide`\n- Added Known Issues and Workarounds for NxDT. See :ref:`nxdt_known_issues`\n\nNeuronX Distributed Core (NxD Core)\n\n- Updated Developer guide for save/load checkpoint (neuronx-distributed) with ZeRO-1 Optimizer State Offline Conversion. See :ref:`save_load_developer_guide`\n- Added Developer guide for Standard Mixed Precision with NeuronX Distributed. See :ref:`standard_mixed_precision`\n- Updated NeuronX Distributed API Guide with LoRA finetuning support. See :ref:`api_guide`\n- Added Developer guide for LoRA finetuning with NeuronX Distributed. See :ref:`lora_finetune_developer_guide`\n- Updated CodeLlama tutorial with latest package versions. See `tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/neuronx_distributed/llama/codellama_16k_inference.html>`_\n- Added tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed. See :ref:`llama3_8b_tp_ptl_lora_finetune_tutorial`\n- Updated links in Llama2 NxD Finetuning tutorial. See :ref:`llama2_7b_tp_zero1_ptl_finetune_tutorial`\n- Updated tokenizer download command in tutorials. See :ref:`llama2_7b_tp_zero1_tutorial`, :ref:`llama2_tp_pp_tutorial`, and :ref:`codegen25_7b_tp_zero1_tutorial`\n\nJAX Neuron\n\n- Added JAX Neuron Main page. See :ref:`jax-neuron-main`\n- Added JAX Neuron plugin instructions. See :ref:`jax-neuronx-setup`\n- Added JAX Neuron setup instructions. See :ref:`setup-jax-neuronx`\n\nPyTorch NeuronX\n\n- Updated Developer Guide for Training with PyTorch NeuronX with support for convolution in AMP. 
See :ref:`pytorch-neuronx-programming-guide`.\n- Added inference samples for Wav2Vec2 conformer models with Relative Position Embeddings and Rotary Position Embedding. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_relpos_inference_on_inf2.ipynb>`_ and `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_rope_inference_on_inf2.ipynb>`_.\n- Updated the ViT sample with updated accelerate version. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_image_classification/vit.ipynb>`_\n- Updated PyTorch NeuronX Environment Variables with ``NEURON_TRANSFER_WITH_STATIC_RING_OPS``. See :ref:`pytorch-neuronx-envvars`\n- Added inference samples for Pixart Alpha and PixArt Sigma models. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_pixart_alpha_inference_on_inf2.ipynb>`_ and `sample <torch-neuronx/inference/hf_pretrained_pixart_sigma_inference_on_inf2.ipynb>`_\n- Added benchmarking scripts for PixArt alpha. See `benchmarking script <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/benchmark/pytorch/pixart_alpha_benchmark.py>`_\n\nTransformers NeuronX\n\n- Updated Transformers NeuronX Developer Guide with Multi-node inference support (TP/PP). See :ref:`transformers_neuronx_developer_guide`\n- Updated Transformers NeuronX Developer Guide with BDH layout support. See :ref:`transformers_neuronx_developer_guide`\n- Updated Transformers NeuronX Developer Guide with Flash Decoding to support long sequence lengths up to 128k. See :ref:`transformers_neuronx_developer_guide`\n- Updated Transformers NeuronX Developer Guide with presharded weights support. See :ref:`transformers_neuronx_developer_guide`\n- Added Llama 3.1 405b sample with 16k sequence length. See `tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-405b-multinode-16k-sampling.ipynb>`_\n- Added Llama 3.1 70b 64k tutorial. See `tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-64k-sampling.ipynb>`_\n- Added Llama 3.1 8b 128k tutorial. See `tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-128k-sampling.ipynb>`_\n- Removed the sample llama-3-8b-32k-sampling.ipynb and replaced it with Llama-3.1-8B model sample llama-3.1-8b-32k-sampling.ipynb. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-32k-sampling.ipynb>`_\n\nNeuron Runtime\n\n- Updated Neuron Runtime Troubleshooting guide with the latest hardware error codes and logs and with Neuron Runtime execution fails at out-of-bound access. See :ref:`nrt-troubleshooting`\n- Updated Neuron Sysfs User Guide with new sysfs entries and device reset instructions. See :ref:`neuron-sysfs-ug`\n- Added Neuron Runtime Input Dump on Trn1 documentation. See :ref:`nrt-input-dumps`\n\nContainers\n\n- Added Neuron Helm Chart repository to help streamline the deployment of AWS Neuron components on Amazon EKS. See `repo <https://github.com/aws-neuron/neuron-helm-charts>`_\n- Updated Kubernetes container deployment process with Neuron Helm Chart documentation. 
See :ref:`k8s-neuron-helm-chart`\n- Added guide for Deploying Neuron Container on Elastic Container Service (ECS). See :ref:`training-dlc-then-ecs-devflow`\n- Added documentation for Neuron Plugins for Containerized Environments. See :ref:`neuron-container-plugins`\n- Updated guide for locating DLC images. See :ref:`locate-neuron-dlc-image`\n\nNeuron Tools\n\n- Updated Neuron Profiler User Guide with Alternative output formats. See :ref:`neuron-profile-ug`\n\nSoftware Maintenance and Misc\n\n- Updated the Neuron Software Maintenance Policy. See :ref:`sdk-maintenance-policy`\n- Added announcement and updated documentation for end of support start for Tensorflow-Neuron 1.x. See :ref:`announce-tfx-no-support`\n- Added announcement and updated documentation for end of support start for 'neuron-device-version' field. See :ref:`eos-neuron-device-version`\n- Added announcement and updated documentation for end of support start for ‘neurondevice’ resource name. See :ref:`eos-neurondevice`\n- Added announcement and updated documentation for end of support start for AL2. See :ref:`eos-al2`\n- Added announcement for maintenance mode for torch-neuron versions 1.9 and 1.10. See :ref:`announce-torch-neuron-eos`\n- Added supported Protobuf versions to the Neuron Release Artifacts. See :ref:`latest-neuron-release-artifacts`\n- Updated Neuron Github Roadmap. See :ref:`neuron_roadmap`\n\nNeuron 2.19.0\n-------------\nDate: 07/03/2024\n\n\n- Updated Transformers NeuronX Developer guide with support for inference for longer sequence lengths with Flash Attention kernel. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX developer guide with QKV Weight Fusion support. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX continuous batching developer guide with updated vLLM instructions and models supported. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Neuronx Distributed User guide with interleaved pipeline support. See :ref:`api_guide`\n- Added Codellama 13b 16k tutorial with NeuronX Distributed Inference library. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_ \n- Updated PyTorch NeuronX Environment variables with custom SILU enabled via NEURON_CUSTOM_SILU. See :ref:`pytorch-neuronx-envvars`\n- Updated ZeRO1 support to have FP32 master weights support and BF16 all-gather. See :ref:`zero1-gpt2-pretraining-tutorial`.\n- Updated PyTorch 2.1 Appplication note with workaround for slower loss convergence for NxD LLaMA-3 70B pretraining using ZeRO1 tutorial. See :ref:`introduce-pytorch-2-1`.\n- Updated Neuron DLAMI guide with support for new 2.19 DLAMIs. See :ref:`neuron-dlami-overview`.\n- Updated HF-BERT pre-training documentation for port forwarding. See :ref:`hf-bert-pretraining-tutorial`\n- Updated T5 inference tutorial with transformer flag. See  `sample <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html>`_ \n- Added support for Llama3 model training. See :ref:`llama3_tp_pp_tutorial` and :ref:`llama2_7b_tp_zero1_tutorial`\n- Added support for Flash Attention kernel for training longer sequences in NeuronX Distributed. See :ref:`llama2_7b_tp_zero1_tutorial` and :ref:`api_guide`\n- Updated Llama2 inference tutorial using NxD Inference library. 
See `sample <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/neuronx_distributed/llama/llama2_inference.html>`_\n- Added new guide for Neuron node problem detection and recovery tool. See :ref:`configuration <k8s-neuron-problem-detector-and-recovery-irsa>` and :ref:`tutorial <k8s-neuron-problem-detector-and-recovery>`.\n- Added new guide for Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See :ref:`tutorial <k8s-neuron-monitor>`\n- Updated Neuron scheduler extension documentation about enforcing allocation of contiguous Neuron Devices for the pods based on the Neuron instance type. See :ref:`tutorial <neuron_scheduler>`\n- Updated Neuron Profiler User Guide with various UI enhancements. See :ref:`neuron-profile-ug`\n- Added NeuronPerf support in Llama2 inference tutorial in NeuronX Distributed. See `sample <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/neuronx_distributed/llama/llama2_inference.html>`_\n- Added announcement for maintenance mode of MxNet. See :ref:`announce-mxnet-maintenance`\n- Added announcement for end of support of Neuron TensorFlow 1.x (Inf1). See :ref:`announce-tfx-eos`\n- Added announcement for end of support of AL2. See :ref:`announce-eos-al2`\n- Added announcement for end of support of 'neuron-device-version' field in neuron-monitor. See :ref:`announce-eos-neuron-device-version`\n- Added announcement for end of support of 'neurondevice' resource name in Neuron Device K8s plugin. See :ref:`announce-eos-neurondevice`\n- Added announcement for end of support for Protobuf versions <= 3.19 for PyTorch NeuronX. See :ref:`announce-eos-probuf319`\n\nNeuron 2.18.0\n-------------\nDate: 04/01/2024\n\n\n- Updated PyTorch NeuronX developer guide with Snapshotting support. See :ref:`torch-neuronx-snapshotting`.\n- Updated :ref:`api_guide` and :ref:`pp_developer_guide` with support for ``auto_partition`` API.\n- Updated :ref:`api_guide` with enhanced checkpointing support with ``load`` API and ``async_save`` API.\n- Updated documentation for ``PyTorch Lightning`` to train models using ``pipeline parallelism``. See :ref:`API guide <api_guide>` and :ref:`Developer Guide <ptl_developer_guide>`.\n- Updated NeuronX Distributed developer guide with support for :ref:`Autobucketing <nxd-inference-devguide-autobucketing>`\n- Added PyTorch NeuronX developer guide for :ref:`Autobucketing <torch-neuronx-autobucketing-devguide>`.\n- Updated :ref:`api_guide` and :ref:`llama2_tp_pp_tutorial` with support for asynchronous checkpointing.\n- Updated Transformers NeuronX Developer guide with support for streamer and stopping criteria APIs. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX Developer guide with instructions for ``Repeating N-Gram Filtering``. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX developer guide with Top-K on-device sampling support [Beta]. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX developer guide with Checkpointing support and automatic model selection. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Updated Transformers NeuronX Developer guide with support for speculative sampling [Beta]. See :ref:`Developer Guide <transformers_neuronx_readme>`.\n- Added sample for training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer with ``neuronx-distributed``. 
See :ref:`codegen25_7b_tp_zero1_tutorial`.\n- Added Tutorial for codellama/CodeLlama-13b-hf model inference with 16K seq length using Transformers Neuronx. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`_.\n- Added Mixtral-8x7B Inference Sample/Notebook using TNx. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/mixtral-8x7b-sampling.ipynb>`_.\n- Added Mistral-7B-Instruct-v0.2 Inference inference sample using TNx. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/mistralai-Mistral-7b-Instruct-v0.2.ipynb>`_.\n- Added announcement for Maintenance mode of TensorFlow 1.x. See :ref:`announce-tfx-maintenance`.\n- Updated PyTorch 2.1 documentation to reflect stable (out of beta) support. See :ref:`introduce-pytorch-2-1`.\n- Updated PyTorch NeuronX environment variables to reflect stable (out of beta) support. See :ref:`pytorch-neuronx-envvars`.\n- Updated :ref:`latest-neuron-release-artifacts` with supported HuggingFace Transformers versions.\n- Added user guide instructions for ``Neuron DLAMI``. See :ref:`neuron-dlami-overview`.\n- Updated :ref:`torch-hf-bert-finetune` tutorial with latest Hugging Face Trainer API.\n- Updated Neuron Runtime API guide with support for ``nr_tensor_allocate``. See :ref:`nrt-api-guide`.\n- Updated :ref:`neuron-sysfs-ug` with support for ``serial_number`` unique identifier.\n- Updated :ref:`custom-ops-api-ref-guide` limitations and fixed nested sublists. See :ref:`feature-custom-operators-devguide`.\n- Fixed issue in :ref:`zero1-gpt2-pretraining-tutorial`.\n- Fixed potential hang during synchronization step in ``nccom-test``. See :ref:`nccom-test`.\n- Updated troubleshooting guide with an additional hardware error messaging. See :ref:`nrt-troubleshooting`.\n- Updated DLC documentation. See :ref:`containers-dlc-then-customize-devflow` and :ref:`dlc-then-ec2-devflow`.\n\n\nNeuron 2.16.0\n-------------\nDate: 12/21/2023\n\n- Added setup guide instructions for ``AL2023`` OS. See :ref:`setup-guide-index`\n- Added announcement for name change of Neuron Components. See :ref:`announce-component-name-change`\n- Added announcement for End of Support for ``PyTorch 1.10`` . See :ref:`announce-eos_pytorch110`\n- Added announcement for End of Support for ``PyTorch 2.0`` Beta. See :ref:`announce-eos_pytorch2`\n- Added announcement for moving NeuronX Distributed sample model implementations. See :ref:`announce-moving-samples`\n- Updated Transformers NeuronX developer guide with support for Grouped Query Attention(GQA). See :ref:`developer guide <transformers_neuronx_readme>` \n- Added sample for ``Llama-2-70b`` model inference. See `tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-70b-sampling.ipynb>`_ \n- Added documentation for ``PyTorch Lightning``  to train models using ``tensor parallelism`` and ``data parallelism`` . See :ref:`api guide <api_guide>` , :ref:`developer guide <ptl_developer_guide>` and :ref:`tutorial <llama2_7b_tp_zero1_ptl_tutorial>`\n- Added documentation for Model and Optimizer Wrapper training API that handles the parallelization. See :ref:`api guide <api_guide>` and :ref:`model_optimizer_wrapper_developer_guide`\n- Added documentation for New ``save_checkpoint``  and ``load_checkpoint`` APIs to save/load checkpoints during distributed training. 
See :ref:`save_load_developer_guide`\n- Added documentation for a new ``Query-Key-Value(QKV)`` module in NeuronX Distributed for Training. See :ref:`api guide <api_guide>` and :ref:`tutorial <llama2_tp_pp_tutorial>`\n- Added new developer guide for Inference using NeuronX Distributed. :ref:`developer guide<nxd_inference_developer_guide>`\n- Added ``Llama-2-7B`` model inference script (:ref:`[html] </src/examples/pytorch/neuronx_distributed/llama/llama2_inference.ipynb>` :pytorch-neuron-src:`[notebook] <neuronx_distributed/llama/llama2_inference.ipynb>`)\n- Added App note on Support for ``PyTorch 2.1`` (Beta) . See :ref:`introduce-pytorch-2-1`\n- Added developer guide for ``replace_weights`` API to replace the separated weights. See :ref:`torch_neuronx_replace_weights_api` \n- Added [Beta] script for training ``stabilityai/stable-diffusion-2-1-base`` and  ``runwayml/stable-diffusion-v1-5`` models . See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/stable_diffusion/>`_ \n- Added [Beta] script for training ``facebook/bart-large`` model. See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_summarization/BartLarge.ipynb>`_ \n- Added [Beta] script for ``stabilityai/stable-diffusion-2-inpainting`` model inference.  See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_sd2_inpainting_936_624_inference.ipynb>`_ \n- Added documentation for new ``Neuron Distributed Event Tracing (NDET) tool`` to help visualize execution trace logs and diagnose errors in multi-node workloads. See :ref:`neuron-det-ug` \n- Updated Neuron Profile User guide with support for multi-worker jobs. See :ref:`neuron-profile-ug`\n- Minor updates to Custom Ops API reference guide.See :ref:`custom-ops-api-ref-guide`\n\n\n\n\nNeuron 2.15.0\n--------------\nDate: 10/26/2023\n\n- New :ref:`introduce-pytorch-2-0` application note with ``torch-neuronx``\n- New :ref:`llama2_70b_tp_pp_tutorial` and (`sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_pp_llama2_70b_hf_pretrain>`_) using ``neuronx-distributed``\n- New :ref:`model_samples_tutorials` documentation for a consolidated list of code samples and tutorials published by AWS Neuron.\n- New :ref:`sdk-classification` documentation for alpha, beta, and stable Neuron SDK definitions and updated documentation references.\n- New :ref:`pipeline_parallelism_overview` and :ref:`pp_developer_guide` documentation in ``neuronx-distributed``\n- Updated :ref:`Neuron Distributed API Guide <api_guide>` regarding pipeline-parallelism support and checkpointing\n- New :ref:`activation_memory_reduction` application note and :ref:`activation_memory_reduction_developer_guide` in ``neuronx-distributed``\n- New ``Weight Sharing (Deduplication)`` `notebook script <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb>`_\n- Added Finetuning script for `google/electra-small-discriminator <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/ElectraSmall.ipynb>`_ with ``torch-neuronx``\n- Added `ResNet50 training (Beta) <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/resnet50/resnet50.ipynb>`_ tutorial and scripts with ``torch-neuronx``\n- Added `Vision Perceiver training sample 
<https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/VisionPerceiverConv.ipynb>`_ with ``torch-neuronx``\n- Added ``flan-t5-xl`` model inference :pytorch-neuron-src:`tutorial <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>` using ``neuronx-distributed`` \n- Added ``HuggingFace Stable Diffusion 4X Upscaler model Inference on Trn1 / Inf2`` `sample script <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_sd_x4_upscaler_inference.ipynb>`_ with ``torch-neuronx``\n- Updated `GPT-NeoX 6.9B and 20B model scripts <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain>`_ to include selective checkpointing.\n- Added serialization support and removed ``-O1`` flag constraint to ``Llama-2-13B`` model inference script `tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`_ with ``transformers-neuronx``\n- Updated ``BERT`` script and ``Llama-2-7B`` script with Pytorch 2.0 support\n- Added option-argument ``llm-training`` to the existing ``--distribution_strategy`` compiler option to make specific optimizations related to training distributed models in :ref:`neuron-compiler-cli-reference-guide`\n- Updated :ref:`neuron-sysfs-ug` to include mem_ecc_uncorrected and sram_ecc_uncorrected hardware statistics.\n- Updated :ref:`torch_neuronx_trace_api` to include io alias documentation\n- Updated :ref:`transformers_neuronx_developer_guide` with serialization support.\n- Upgraded ``numpy`` version to ``1.22.2`` for various scripts\n- Updated ``LanguagePerceiver`` fine-tuning `script <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_text_classification/LanguagePerceiver.ipynb>`_ to ``stable``\n- Announcing :ref:`End of Support for OPT <announce-intent-eos-opt>`  example in ``transformers-neuronx``\n- Announcing :ref:`End of Support for \"nemo\" option-argument <announce-intent-deprecate-nemo-arg>`  \n\nKnown Issues and Limitations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\nFollowing tutorials are currently not working. These tutorials will be updated once there is a fix.\n\n- `Zero1-gpt2-pretraining-tutorial <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html#zero1-gpt2-pretraining-tutorial>`_\n\nNeuron 2.14.0\n-------------\nDate: 09/15/2023\n\n- Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See :ref:`neuron_calculator`\n- Announcement to deprecate ``--model-type=transformer-inference`` flag. See :ref:`announce-end-of-support-transformer-flag`\n- Updated HF ViT benchmarking script to use ``--model-type=transformer`` flag. See :ref:`[script] <src/benchmark/pytorch/hf-google-vit_benchmark.py>`\n- Updated ``torch_neuronx.analyze`` API documentation. See :ref:`torch_neuronx_analyze_api`\n- Updated Performance benchmarking numbers for models on Inf1,Inf2 and Trn1 instances with 2.14 release bits. 
See :ref:`_benchmark`\n- New tutorial for Training Llama2 7B with Tensor Parallelism and ZeRO-1 Optimizer using ``neuronx-distributed``  :ref:`llama2_7b_tp_zero1_tutorial`\n- New tutorial for ``T5-3B`` model inference using ``neuronx-distributed``  (:pytorch-neuron-src:`tutorial <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`)\n- Updated ``Neuron Persistent Cache`` documentation regarding clarification of flags parsed by ``neuron_cc_wrapper`` tool which is a wrapper over ``Neuron Compiler CLI``. See :ref:`neuron-caching`\n- Added ``tokenizers_parallelism=true`` in various notebook scripts to supress tokenizer warnings making errors easier to detect\n- Updated Neuron device plugin and scheduler YAMLs to point to latest images.  See `yaml configs <https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/k8>`_\n- Added notebook script to fine-tune ``deepmind/language-perceiver`` model using ``torch-neuronx``. See `sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_text_classification/LanguagePerceiver.ipynb>`_\n- Added notebook script to fine-tune ``clip-large`` model using ``torch-neuronx``. See `sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_contrastive_image_text/CLIPLarge.ipynb>`_\n- Added ``SD XL Base+Refiner`` inference sample script using ``torch-neuronx``. See `sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_sdxl_base_and_refiner_1024_inference.ipynb>`_\n- Upgraded default ``diffusers`` library from 0.14.0 to latest 0.20.2 in ``Stable Diffusion 1.5`` and ``Stable Diffusion 2.1`` inference scripts. See `sample scripts <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference>`_\n- Added ``Llama-2-13B`` model training script using ``neuronx-nemo-megatron`` ( `tutorial <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`_ )\n\n\n\n\nNeuron 2.13.0\n-------------\nDate: 08/28/2023\n\n\n- Added tutorials for GPT-NEOX 6.9B and 20B models training using neuronx-distributed. See more at :ref:`tp_tutorials`\n- Added TensorFlow 2.x (``tensorflow-neuronx``) analyze_model API section. See more at :ref:`tensorflow-ref-neuron-analyze_model-api`\n- Updated setup instructions to fix path of existing virtual environments in DLAMIs. See more at :ref:`setup guide <setup-guide-index>`\n- Updated setup instructions to fix pinned versions in upgrade instructions of setup guide. See more at :ref:`setup guide <setup-guide-index>`\n- Updated tensorflow-neuron HF distilbert tutorial to improve performance by removing HF pipeline. See more at :ref:`[html] </src/examples/tensorflow/huggingface_bert/huggingface_bert.html>` :github:`[notebook] </src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb>`\n- Updated training troubleshooting guide in torch-neuronx to describe network Connectivity Issue on trn1/trn1n 32xlarge with Ubuntu. See more at :ref:`pytorch-neuron-traning-troubleshooting`\n- Added \"Unsupported Hardware Operator Code\" section to Neuron Runtime Troubleshooting page. See more at :ref:`nrt-troubleshooting`\n- Removed 'beta' tag from ``neuronx-distributed`` section for training. 
``neuronx-distributed`` Training is now considered stable and ``neuronx-distributed`` inference is considered as beta.\n- Added FLOP count(``flop_count``) and connected Neuron Device ids (``connected_devices``) to sysfs userguide. See :ref:`neuron-sysfs-ug`\n- Added tutorial for ``T5`` model inference.  See more at :pytorch-neuron-src:`[notebook] <torch-neuronx/t5-inference-tutorial.ipynb>`\n- Updated neuronx-distributed api guide and inference tutorial. See more at :ref:`api_guide` and :ref:`tp_inference_tutorial`\n- Announcing End of support for ``AWS Neuron reference for Megatron-LM`` starting Neuron 2.13. See more at :ref:`announce-eol-megatronlm`\n- Announcing end of support for ``torch-neuron`` version 1.9 starting Neuron 2.14. See more at :ref:`announce-eol-pytorch19`\n- Upgraded ``numpy`` version to ``1.21.6`` in various training scripts for `Text Classification <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training>`_\n- Added license for Nemo Megatron to SDK Maintenance Policy. See more at :ref:`sdk-maintenance-policy`\n- Updated ``bert-japanese`` training Script to use ``multilingual-sentiments`` dataset. See `hf-bert-jp <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_bert_jp> `_\n- Added sample script for LLaMA V2 13B model inference using transformers-neuronx. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- Added samples for training GPT-NEOX 20B and 6.9B models using neuronx-distributed. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- Added sample scripts for CLIP and Stable Diffusion XL inference using torch-neuronx. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- Added sample scripts for vision and language Perceiver models inference using torch-neuronx. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- Added camembert training/finetuning example for Trn1 under hf_text_classification in torch-neuronx. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- Updated Fine-tuning Hugging Face BERT Japanese model sample in torch-neuronx. See `neuron samples repo <https://github.com/aws-neuron/aws-neuron-samples/>`_\n- See more neuron samples changes in `neuron samples release notes <https://github.com/aws-neuron/aws-neuron-samples/blob/master/releasenotes.md>`_\n- Added samples for pre-training GPT-3 23B, 46B and 175B models using neuronx-nemo-megatron library. See `aws-neuron-parallelcluster-samples <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`_\n- Announced End of Support for GPT-3 training using aws-neuron-reference-for-megatron-lm library. See `aws-neuron-parallelcluster-samples <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples>`_\n- Updated bert-fine-tuning SageMaker sample by replacing amazon_reviews_multi dataset with amazon_polarity dataset. See `aws-neuron-sagemaker-samples <https://github.com/aws-neuron/aws-neuron-sagemaker-samples>`_\n\n\nNeuron 2.12.0\n-------------\nDate: 07/19/2023\n\n- Added best practices user guide for benchmarking performance of Neuron Devices `Benchmarking Guide and Helper scripts <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/microbenchmark>`_\n- Announcing end of support for Ubuntu 18. 
See more at :ref:`announce-eol-ubuntu18`\n- Improved sidebar navigation in Documentation.\n- Removed support for the Distributed Data Parallel (DDP) Tutorial.\n\nNeuron 2.11.0\n-------------\n\nDate: 06/14/2023\n\n- New :ref:`neuron_calculator` Documentation section to help determine the number of Neuron Cores needed for LLM Inference.\n- Added App Note :ref:`neuron_llm_inference`\n- New ``ML Libraries`` Documentation section that includes :ref:`neuronx-distributed-index` and :ref:`transformers_neuronx_readme`\n- Improved Installation and Setup Guides for the different platforms supported. See more at :ref:`setup-guide-index`\n- Added Tutorial :ref:`setup-trn1-multi-node-execution`\n"
  },
  {
    "path": "release-notes/index.rst",
    "content": ".. _neuron_release_notes:\n\n.. meta::\n   :description: The AWS Neuron SDK release notes home page. Current release version: 2.29.0.\n   :keywords: aws, neuron, what's new, release notes\n\nAWS Neuron SDK Release Notes\n============================\n\n**Last updated**:  April 09, 2026\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    Neuron 2.29.0 </release-notes/2.29.0>\n    Component release notes </release-notes/components/index>\n    Release artifacts </release-notes/releasecontent>\n    Previous versions </release-notes/prev/rn>\n\nCurrent Release Notes\n----------------------\n\nThis is the official home page for the AWS Neuron SDK release notes. Release notes are provided whenever AWS and Annapurna labs releases a new version of the Neuron SDK. Select a release version and review what it brings to you!\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: \n      :class-card: sd-border-2\n      :link: /release-notes/2.29.0\n      :link-type: doc\n\n      **Latest AWS Neuron SDK release: 2.29.0**\n      ^^^\n      On **04/09/2026**, AWS released version **2.29.0** of the Neuron SDK.\n\n      For more details, select this card and browse the release notes. \n\n----\n\nNeuron Component Release Notes\n------------------------------\n\nEach Neuron component has specific release notes across Neuron versions. \n\n.. list-table::\n   :widths: 40 30 30\n   :header-rows: 1\n   :align: left\n\n   * - Component\n     - Updated in Neuron Version\n     - Latest Component Version\n   * - :doc:`Neuron Compiler </release-notes/components/compiler>`\n     - 2.27.0\n     - 2.24.5133.0\n   * - :doc:`Neuron Containers <components/containers>`\n     - **2.29.0**\n     - 2.29.0\n   * - :doc:`Neuron Developer Tools <components/dev-tools>`\n     - **2.29.0**\n     - 2.29.0\n   * - :doc:`Neuron DLAMI <components/dlamis>`\n     - **2.29.0**\n     - 2.29.0\n   * - :doc:`JAX NeuronX <components/jax>`\n     - 2.26.0\n     - 0.7.0.1.0.*\n   * - :doc:`NKI Library <components/nki-lib>`\n     - **2.29.0**\n     - 2.29.0\n   * - :doc:`Neuron Kernel Interface <components/nki>`\n     - **2.29.0**\n     - 0.3.0\n   * - :doc:`NxD Core <components/nxd-core>`\n     - 2.26.0\n     - 0.18.27753\n   * - :doc:`NxD Inference <components/nxd-inference>`\n     - 2.29.0\n     - 0.9.17334\n   * - :doc:`NxD Training <components/nxd-training>`\n     - 2.25.0\n     - 1.5.0\n   * - :doc:`PyTorch Neuron Framework (torch-neuronx) <components/pytorch>`\n     - **2.29.0**\n     - 2.9.0.2.13.*\n   * - :doc:`Neuron Runtime Library <components/runtime>`\n     - **2.29.0**\n     - 2.31.24.0\n   * - :doc:`Neuron Driver <components/runtime>`\n     - **2.29.0**\n     - 2.26.10.0\n   * - :doc:`Neuron Collectives <components/runtime>`\n     - **2.29.0**\n     - 2.31.24.0\n   * - :doc:`vLLM plugin for Neuron <components/nxd-inference>`\n     - **2.29.0**\n     - 0.6.0\n  \n----\n\nCurrent Release Package Versions\n---------------------------------\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.29.0 release artifacts**\n                ^^^\n                The libraries and packages updated in the latest Neuron release.\n\n--\n\n.. _previous-neuron-releases:\n\nPrevious AWS Neuron SDK releases\n--------------------------------\n\nRelease notes for prior versions from the past 12 months.\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Neuron version\n     - Date released\n   * - :doc:`2.28.1 <prev/2.28.1>`\n     - 03/13/26\n   * - :doc:`2.28.0 <prev/2.28.0>`\n     - 02/26/26\n   * - :doc:`2.27.1 <prev/2.27.1>`\n     - 01/14/26\n   * - :doc:`2.27.0 <prev/2.27.0/index>`\n     - 12/19/25\n   * - :doc:`2.26.1 <prev/2.26.1>`\n     - 10/29/25\n   * - :ref:`2.26.0 <neuron-2-26-0-whatsnew>`\n     - 09/18/25  \n   * - :ref:`2.25.0 <neuron-2-25-0-whatsnew>`\n     - 07/31/25\n   * - :ref:`2.24.1 <neuron-2-24-1-whatsnew>`\n     - 06/30/25\n   * - :ref:`2.24.0 <neuron-2-24-0-whatsnew>`\n     - 06/24/25\n   * - :ref:`2.23.0 <neuron-2.23.0-whatsnew>`\n     - 06/10/25\n   * - :ref:`2.22.1 <neuron-2.22.1-whatsnew>`\n     - 05/12/25\n   * - :ref:`2.22.0 <neuron-2.22.0-whatsnew>`\n     - 04/03/25\n   * - :ref:`2.21.1 <neuron-2.21.1-whatsnew>`\n     - 01/14/25\n   * - :ref:`2.21.0 Beta <neuron-2.21.0.beta-whatsnew>`\n     - 12/03/24\n\n.. note::\n    The AWS Neuron SDK is updated regularly with new versions. These releases follow a semantic versioning model of ``(major).(minor).(patch)``. ``major`` versions are more likely to introduce new features and breaking changes over a prior major version. ``minor`` versions add feature and API improvements and may introduce smaller breaking changes. ``patch`` versions typically provide bug fixes and will not have breaking changes.\n\nEarlier Neuron component release notes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* For older components and features that have not been updated recently or are out of support, see :doc:`archive/index`.\n\nOlder Releases\n-----------------\n\nRelease notes are archived when the major version of a release is incremented.\n\n* :doc:`Previous Neuron SDK 2.X release notes </release-notes/prev/rn>`\n* :doc:`Archived Neuron SDK 1.X release notes </release-notes/archive/neuron1/prev/rn>`\n\n"
  },
  {
    "path": "release-notes/prev/2.25.0/compiler.rst",
    "content": ".. _neuron-2-25-0-compiler:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK compiler component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Neuron Compiler release notes\n====================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nAnnouncements\n-------------\nThe Neuron Compiler default for the ``--auto-cast`` option will change from ``--auto-cast=matmult`` to ``--auto-cast=none`` in a future release.\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* Minor bug fixes and performance enhancements for both the ``trn1`` and ``trn2`` platforms.\n\n\nKnown issues\n------------\n\n* The Llama3 70B test has a compile time increase of 16% and 18%, for 16 and 32 nodes respectively. We are investigating the cause of this increase and will provide an update in the future.\n"
  },
  {
    "path": "release-notes/prev/2.25.0/containers.rst",
    "content": ".. _neuron-2-25-0-dlc:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning Containers (DLC) component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Neuron Deep Learning Containers release notes\n====================================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements\n------------\n\n* All Neuron packages and their dependencies have been upgraded to support vAWS Neuron SDK version 2.25.0.\n* The ``pytorch-inference-vllm-neuronx`` Deep Learning Container has been upgraded to version ``0.9.1``.\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* ``pytorch-training-neuronx`` 2.7.0 DLC has two HIGH CVEs related to ``sagemaker-python-sdk`` package. We are actively working to resolve these high CVEs:\n- * `CVE-2024-34072 <https://nvd.nist.gov/vuln/detail/CVE-2024-34072>`_\n- * `CVE-2024-34073 <https://nvd.nist.gov/vuln/detail/CVE-2024-34073>`_\n* ``pytorch-inference-vllm-neuronx`` 0.9.1 DLC has CRITICAL and HIGH CVEs . We are actively working to resolve these high CVEs:\n- * `CVE-2024-35515 <https://nvd.nist.gov/vuln/detail/CVE-2024-35515>`_\n- * `CVE-2022-4296 <https://nvd.nist.gov/vuln/detail/CVE-2022-42969>`_"
  },
  {
    "path": "release-notes/prev/2.25.0/dlami.rst",
    "content": ".. _neuron-2-25-0-dlami:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning AWS Machine Images (DLAMIs) component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Neuron Deep Learning AWS Machine Images release notes\n============================================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements\n------------\n\n* All multi-framework virtual environments for the Deep Learning AMIs have been upgraded with the latest Neuron packages to support the AWS Neuron SDK version 2.25.0.\n\n"
  },
  {
    "path": "release-notes/prev/2.25.0/docs-and-samples.rst",
    "content": ".. _neuron-2-25-0-docs-and-samples:\n\n.. meta::\n   :description: The official release notes for updates to the Neuron SDK developer docs and samples. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Docs and samples release news and details\n================================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements and changes\n------------------------\n\nThis section of the release notes covers improvements and changes we've made to the Neuron SDK documentation and samples in this release.\n\n* We've heard your feedback on the Neuron documentation, and we're starting on a journey to improve them. Once early change is a richer structure for the release notes with a clearer presentation of the details you're looking for. It's a small thing, but we hope it helps you!\n\n* The front page of the AWS Neuron documentation site has been redesigned. This is the first of many improvements to the site design and navigation we'll be making across the next releases.\n\nSupport changes\n---------------\n\nJust a few final notes for this release...\n\n* Support for Wav2Vec2 test sample datasets are currently pinned at version ``3.6.0``, as ``4.0.0`` does not support the ``patrickvonplaten/librispeech_asr_dummy`` dataset at this time."
  },
  {
    "path": "release-notes/prev/2.25.0/index.rst",
    "content": ".. _neuron-2-25-0-whatsnew:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0 release notes\n===================================\n\n**Date of release**: July 31, 2025\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch support <nx-pytorch>\n   JAX support <nx-jax>\n   NxD Inference <nxd-inference>\n   NxD Training <nxd-training>\n   NxD Core <nxd-core>\n   Neuron Compiler <compiler>\n   Neuron Runtime <runtime>\n   Developer tools <tools>\n   Deep Learning AMIs <dlami>\n   Deep Learning Containers <containers>\n   Docs and samples <docs-and-samples>\n   Release artifacts </release-notes/releasecontent>\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\nRelease highlights\n------------------\n\nNeuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.\n\n\nInference Optimizations (NxD Core and NxDI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron 2.25.0 introduces performance optimizations and new capabilities including:\n\n* Context and Data Parallel support for improved batch scaling\n* Chunked Attention for improved long sequence processing\n* Automatic Aliasing (Beta) for fast tensor operations\n* Disaggregated Serving (Beta) improvements\n\nModel Support (NxDI)\n^^^^^^^^^^^^^^^^^^^^\n\nNeuron 2.25.0 expands model support to include:\n\n* Qwen3 dense models (0.6B to 32B parameters)\n* Flux.1-dev model for text-to-image generation (Beta)\n\nMonitoring and Observability\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* ``neuron-ls`` now displays CPU and NUMA node affinity information\n* ``neuron-ls`` adds NeuronCore IDs display for each Neuron Device\n* ``neuron-monitor`` improves accuracy of device utilization metrics\n\nFramework Updates\n^^^^^^^^^^^^^^^^^\n\n* JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5\n* vLLM support upgraded to version 0.9.x V0\n\nDevelopment Environment Updates\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron SDK updated to version 2.25.0 in:\n\n* Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023\n* Multi-framework DLAMI with environments for both PyTorch and JAX\n* PyTorch 2.7 Single Framework DLAMI\n* JAX 0.6 Single Framework DLAMI\n\nContainer Support\n^^^^^^^^^^^^^^^^^\n\nNeuron SDK updated to version 2.25.0 in:\n\n* PyTorch 2.7 Training and Inference DLCs\n* JAX 0.6 Training DLC\n* vLLM 0.9.1 Inference DLC\n* Neuron Device Plugin and Scheduler container images for Kubernetes integration\n\nComponent release notes\n-----------------------\n\nSelect a card below to review detailed release notes for each component of the Neuron SDK version 2.25.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n.. grid:: 1 1 2 2\n        :gutter: 2\n\n        .. 
grid-item-card:: \n                :link: neuron-2-25-0-pytorch\n                :link-type: ref\n\n                **PyTorch framework** 2.25.0 release notes\n                ^^^\n                Neuron features and solutions that support the PyTorch ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-jax\n                :link-type: ref\n\n                **JAX framework** 2.25.0 release notes\n                ^^^\n                Neuron features and solutions that support the JAX ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-nxd-training\n                :link-type: ref\n\n                **NxD Training** 2.25.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model training.\n                +++\n                Supports: ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-nxd-inference\n                :link-type: ref\n\n                **NxD Inference** 2.25.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n        \n        .. grid-item-card::\n                :link: neuron-2-25-0-nxd-core\n                :link-type: ref\n\n                **NxD Core** 2.25.0 release notes\n                ^^^\n                Common features and tools for Neuron-based training and inference.\n                +++\n                Supports: ``Trn1`` / ``Trn1n``, ``Trn2``\n         \n        .. grid-item-card:: \n                :link: neuron-2-25-0-compiler\n                :link-type: ref\n\n                **Neuron Compiler** 2.25.0 release notes\n                ^^^\n                The Neuron compiler for AWS Trainium and Inferentia, and its libraries and tools.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-runtime\n                :link-type: ref\n\n                **Neuron Runtime** 2.25.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-tools\n                :link-type: ref\n\n                **Neuron Developer Tools** 2.25.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n\n        .. grid-item-card:: \n                :link: neuron-2-25-0-dlami\n                :link-type: ref\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.25.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n \n        .. 
grid-item-card:: \n                :link: neuron-2-25-0-dlc\n                :link-type: ref\n\n                **Neuron Deep Learning Containers (DLCs)** 2.25.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n        .. grid-item-card::\n                :link: neuron-2-25-0-docs-and-samples\n                :link-type: ref\n\n                **Documentation and samples** 2.25.0 release notes\n                ^^^\n                Changes to the Neuron docs and code samples.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n        \n                **Neuron 2.25.0 release artifacts**\n                ^^^\n                The libraries and packages updated in this release.\n\nSupport announcements\n---------------------\n\nThis section signals the official end of support for specific features, tools, and APIs.\n\nEnd-of-support announcements\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n*An \"end-of-support (EoS)\" announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!*\n\n* In a future release, the Neuron Compiler default flag ``--auto-cast=matmult`` will change to ``--auto-cast=none``.\n\n  This means the Neuron Compiler will no longer perform auto-casting and will instead use the data types of the operators in the incoming HLO. If the current behavior is desired, users can explicitly pass the ``--auto-cast=matmult`` and ``--auto-cast-type=bf16`` options to the compiler.\n\n  **Note:** This change will not affect Neuron NxDI, NxDT, and TNx Frameworks as these are set to ``--auto-cast=none`` by default. However, Torch-Neuronx users may experience an impact and must adjust their settings if they rely on the previous auto-casting behavior.\n\n* Starting from Neuron Release 2.24, the Hugging Face Transformers NeuronX library is deprecated and in maintenance mode. ``transformers-neuronx`` releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for ``transformers-neuronx``. Current users of ``transformers-neuronx`` are advised to migrate to :doc:`NeuronX Distributed Inference </libraries/nxd-inference/index>`.\n\n* PyTorch version 2.6 will no longer be supported in a coming release. Current users of PyTorch 2.6 are advised to upgrade to PyTorch 2.7, which is supported in this release.\n\n* Support for Python 3.9 will end in a coming release. Currently, we support versions of Python up to 3.11. Current users of Python 3.9 are advised to upgrade to Python 3.11, which is supported in this release.\n\nEnding support in 2.25.0\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n*Items listed here are officially no longer supported starting with Neuron 2.25.0.*\n\n* The following tutorials are no longer supported and have been moved to the :doc:`AWS Neuron SDK doc archive </archive/index>`:\n\n  * :doc:`/archive/tutorials/finetune_t5`\n  * :doc:`/archive/tutorials/ssd300_demo/ssd300_demo`\n  * :doc:`/archive/tutorials/megatron_gpt_pretraining`\n\n* Neuron 2.25 is the last release supporting NxDT Megatron Models. Future Neuron releases will not include support for NxDT Megatron Models. Current users of NxDT Megatron Models are advised to use the Hugging Face models instead by setting the ``CONF_FILE`` variable in the ``train.sh`` file to the configuration for the model you want to use.\n\n* With version 2.25.0, Neuron no longer supports vLLM version 0.7.2. Current users of vLLM 0.7.2 are advised to upgrade to vLLM 0.9.1, which is supported in this release.\n\n* Transformers for NeuronX is no longer supported. For more details, see :doc:`the prior announcement </about-neuron/announcements/neuron2.x/announce-intent-maintenance-tnx>`.\n\nPrevious releases\n-----------------\n\n* :ref:`Neuron 2.24.1 <neuron-2-24-1-whatsnew>`\n* :ref:`Neuron 2.24.0 <neuron-2-24-0-whatsnew>`\n* :doc:`Earlier releases </release-notes/prev/rn>`\n"
  },
  {
    "path": "release-notes/prev/2.25.0/nx-jax.rst",
    "content": ".. _neuron-2-25-0-jax:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK JAX support component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: JAX support release notes\n================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nReleased versions\n-----------------\n* ``0.6.1.1.0.*``\n\nImprovements\n------------\n\n* This release introduces support for JAX version ``0.6.1``.\n\nBug fixes\n---------\n\n* Previously, using multiple meshes within a single program wasn't supported. This is fixed to add support for sub-meshes.\n\nKnown issues\n------------\n\n* Known issues are listed at :ref:`jax-neuron-known-issues`.\n"
  },
  {
    "path": "release-notes/prev/2.25.0/nx-pytorch.rst",
    "content": ".. _neuron-2-25-0-pytorch:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK PyTorch support component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: PyTorch support release notes\n====================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- ``2.7.0.2.9.*``\n- ``2.6.0.2.9.*``\n\nImprovements\n------------\n\n- The :ref:`Core Placement API <torch_neuronx_core_placement_api>` is no longer beta/experimental and the instructions on how to use it have been updated.\n\n  To migrate, replace any function scope ``torch_neuron.experimental.`` with ``torch_neuron.``. The change will have no effect on behavior or performance. For example, replace ``torch_neuronx.experimental.set_neuron_cores`` with ``torch_neuronx.set_neuron_cores``. If you use ``torch_neuron.experimental.`` scope it will work as before but now will also emit this warning: “In a future version torch_neuronx.experimental.<func> will be removed.  Call torch_neuronx.<func> instead.\"\n\nKnown issues\n------------\n\n.. note::\n   * See the :ref:`Introducing PyTorch 2.7 Support<introduce-pytorch-2-7>` for a full list of known issues with v2.7.\n   * See the :ref:`Introducing PyTorch 2.6 Support<introduce-pytorch-2-6>` for a full list of known issues with v2.6.\n\n* [v2.7] Using the latest torch-xla v2.7 may result in increase in host memory usage compared torch-xla v2.6. In on example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\n* Currently, when switching Ubuntu OS kernel version from 5.15 to 6.8, you may see performance differences due to the new kernel scheduler (CFS vs EEVDF). For example, BERT pretraining performance could be lower by up to 10%. You may try using an older OS kernel (i.e. Amazon Linux 2023) or experiment with the kernel real-time scheduler by running ``sudo chrt --fifo 99`` before your command (i.e. ``sudo chrt --fifo 99 <script>``) to improve the performance. Note that adjusting the real-time scheduler can also result in lower performance. See https://www.kernel.org/doc/html/latest/scheduler/sched-eevdf.html for more information.\n\n* Currently, when using tensor split operation on a 2D array in the second dimension, the resulting tensors don't have the expected data (https://github.com/pytorch/xla/issues/8640). The work-around is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another work-around is to use ``torch.tensor_split``.\n\n* [v2.6]  BERT pretraining performance is ~10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known regression in torch-xla https://github.com/pytorch/xla/issues/9037 and can affect other models with high graph tracing overhead. This is fixed in torch-xla v2.7. To work-around this issue in torch-xla v2.6, build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see :ref:`pytorch-neuronx-install-cxx11` for C++11 ABI version):\n\n   .. code:: bash\n\n      # Setup build env (make sure you are in a python virtual env). 
Replace \"apt\" with \"yum\" on AL2023.\n      sudo apt install cmake\n      pip install yapf==0.30.0\n     wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n     sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n\n     # Clone repos\n     git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n     cd pytorch/\n     git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n     _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel\n\n     # pip wheel will be present in ./dist\n     cd xla/\n     CXX_ABI=0 python setup.py bdist_wheel\n\n     # pip wheel will be present in ./dist and can be installed instead of the torch-xla released in pypi.org\n\n* Currently, BERT pretraining performance is ~11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1`` which would still work in torch-neuronx 2.5 and 2.6 although there will be end-of-support warnings (as noted below).\n\n* Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (see the warning raised below). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`).\n\n   Warning: ``XLA_DOWNCAST_BF16`` will be deprecated after the 2.5 release, please downcast your model directly\n\n\n* [v2.6] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'``. This is an error that notes that ``torch_xla.core.xla_model.xrt_world_size()`` is removed in torch-xla version 2.7. Switch to using ``torch_xla.runtime.world_size()`` instead.\n\n* [v2.6] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'``. This is an error that notes that ``torch_xla.core.xla_model.xla_model.get_ordinal()`` is removed in torch-xla version 2.7. Switch to using ``torch_xla.runtime.global_ordinal()`` instead.\n\n* ``AttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'``. In Torch-XLA 2.5+, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime. See this `PyTorch commit PR on GitHub <https://github.com/pytorch/xla/commit/d6fb5391d09578c8804b1331a5e7a4f72bf981db>`_.\n\n"
  },
  {
    "path": "release-notes/prev/2.25.0/nxd-core.rst",
    "content": ".. _neuron-2-25-0-nxd-core:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Core component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: NxD Core release notes\n=============================================\n\n**Date of release**: July 31, 2025\n\n**Version**: 0.14.18461\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced with release 2.25.0 of the AWS Neuron SDK. Read on to learn about them!*\n\nInference\n^^^^^^^^^\n\nModelBuilder V2\n\"\"\"\"\"\"\"\"\"\n\nModelBuilder V2 provides a simplified version of the ModelBuilder API that is more flexible and extensible.\nThis API includes basic building blocks that you can use to trace, compile, and load modules to Neuron.\nFor more information, see :ref:`nxd-core-model-builder-v2` and the updated\n`Llama-3.2-1B reference inference sample <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama>`__. \n\nTraining\n^^^^^^^^\n\nSupport for Shared Experts\n\"\"\"\"\"\"\"\"\"\n\nShared Experts allow multiple model components to utilize the same expert neural networks. This release adds full support for Shared Experts in training workloads.\n  \n"
  },
  {
    "path": "release-notes/prev/2.25.0/nxd-inference.rst",
    "content": ".. _neuron-2-25-0-nxd-inference:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Inference component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: NxD Inference release notes\n==================================================\n\n**Date of release**: July 31, 2025\n\n**Version**: 0.5.9230\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nQwen3 (dense) model support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdd support for Qwen3 dense models, which are tested on Trn1. Compatible models include:\n\n- `Qwen3-0.6B <https://huggingface.co/Qwen/Qwen3-0.6B>`__\n- `Qwen3-1.7B <https://huggingface.co/Qwen/Qwen3-1.7B>`__\n- `Qwen3-4B <https://huggingface.co/Qwen/Qwen3-4B>`__\n- `Qwen3-8B <https://huggingface.co/Qwen/Qwen3-8B>`__\n- `Qwen3-14B <https://huggingface.co/Qwen/Qwen3-14B>`__\n- `Qwen3-32B <https://huggingface.co/Qwen/Qwen3-32B>`__\n\nFor more information, see :ref:`nxdi-model-reference`.\n\nOther improvements\n^^^^^^^^^^^^^^^^^^\n\n- Added simplified functions that you can use to validate the accuracy of\n  logits returned by a model. These new functions include\n  ``check_accuracy_logits_v2`` and ``generated_expected_logits``, which provide more flexibility\n  than ``check_accuracy_logits``. For more information, see :ref:`nxdi-evaluating-models`.\n- Added ``scratchpad_page_size`` attribute to NeuronConfig. You can\n  specify this attribute to configure the scratchpad page size used\n  during compilation and at runtime. The scratchpad is a shared memory buffer\n  used for internal model variables and other data. For more information, see :ref:`nxd-inference-api-guide-neuron-config`.\n- Enabled `Chunked Attention <https://huggingface.co/blog/llama4-release#:~:text=Chunked%20attention%20(in%20RoPE%20layers)>`__ as a generic building block for\n  any attention-based model. Chunked attention limits the KV cache size to chunk size and can be used to enable long-context inference where memory constraint is an issue. \n  NxDI now supports chunked attention for any model that defines ``attention_chunk_size`` in the model's HuggingFace ``config.json``,  such as `Llama 4 Scout <https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E/blob/main/config.json#L11>`__,\n  or in the model's InferenceConfig.\n  Developers using NxDI can then pass ``attention_chunk_size`` to the attention module to enable chunked attention. See `modeling_llama.py <https://github.com/aws-neuron/neuronx-distributed-inference/blob/main/src/neuronx_distributed_inference/models/llama/modeling_llama.py>`__ for example.\n- Published scripts to evaluate model accuracy and benchmark performance against Neuron. For more details, see :doc:`the corresponding documentation </libraries/nxd-inference/tutorials/generating-results-with-performance-cli>` or `go to the Neuron samples GitHub repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/inference-benchmarking>`_.\n  \nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. 
Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\n- Removed support for Meta checkpoint compatibility in Llama3.2 Multimodal modeling\n  code. You can continue to use Hugging Face checkpoints. Hugging Face\n  provides a `conversion\n  script <https://github.com/huggingface/transformers/blob/main/src/transformers/models/mllama/convert_mllama_weights_to_hf.py>`__\n  that you can run to convert a Meta checkpoint to a Hugging Face checkpoint.\n\nBug fixes\n---------\n\n*We're always fixing bugs. It's developer's life!* Here's what we fixed in 2.25.0:\n\n- Fixed accuracy issues when using Automatic Prefix Caching (APC) with\n  EAGLE speculation.\n- Fixed continuous batching for Llama3.2 Multimodal where the input batch size is less\n  than the compiled batch size.\n- Added support for continuous batching when running Neuron modeling code\n  on CPU.\n- Set a manual seed in ``benchmark_sampling`` to improve the stability\n  of data-dependent benchmarks like speculation.\n- Other minor fixes and improvements.\n"
  },
  {
    "path": "release-notes/prev/2.25.0/nxd-training.rst",
    "content": ".. _neuron-2-25-0-nxd-training:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Training component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: NxD Training release notes\n=================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nBug fixes\n---------\n\n* Disable ``expert_index`` in Mixture of Experts (MoE) forwarding to limit the output to just hidden states and router logits (as expected).\n\n\n\n"
  },
  {
    "path": "release-notes/prev/2.25.0/runtime.rst",
    "content": ".. _neuron-2-25-0-runtime:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Runtime component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Neuron Runtime release notes\n===================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- Neuron Collectives: ``2.27.34.0``\n- Neuron Driver: ``2.23.9.0``\n- Neuron Runtime Library: ``2.27.23.0``\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\nNeuron Collectives 2.27.34.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Improved the interface with the Neuron Runtime for minor stability improvements.\n\nNeuron Driver 2.23.9.0\n^^^^^^^^^^^^^^^^^^^^^^\n* Exposed Tensor Engine activity counters in `sysfs`.\n\nNeuron Runtime Library 2.27.23.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Introduced ``nrt_get_vnc_memory_stats`` API to retrieve device memory usage.\n* Added support for State-Buffer to State-Buffer collective support for ``all_reduce``, ``reduce_scatter``, and ``all_gather`` for LNC2, which helps reduce HBM memory pressure.\n* Added support for coalescing of Collectives operations for internode RDH.\n* Introduced a new DGE priority class feature to select preferred packet size for memory transfers.\n* Improved ``nrt_init`` time by up to ~3 seconds on AWS Trainium and Inferentia instances.\n* Added a warning message along with a recommended scratchpad configuration when a loaded NEFF has non-optimial scratchpad usage.\n\nBreaking changes\n----------------\n\n*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*\n\nNeuron Runtime Library 2.27.23.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Due to a hardware bug that can cause numerical errors to be falsely reported (see the **Known Issues** section below), the runtime has disabled numerical errors by default. Users can re-enable numerical errors by setting ``NEURON_RT_NUMERICAL_ERRORS_VERBOSITY=critical`` or ``NEURON_FAIL_ON_NAN=1`` to enable debug flows and to prevent numerical errors from blowing up a training run.\n\nBug fixes\n---------\n\n*We're always fixing bugs. It's developer's life!* Here's what we fixed in 2.25.0:\n\nNeuron Runtime Library 2.27.23.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Fixed profiling APIs to report execution duration from explicit notifications.\n* Fixed race condition which can cause a crash when starting inspect traces.\n\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n\n* A hardware bug affecting **Trainium** and **Inferentia2** devices causes numerical errors to become \"sticky\" within the Neuron Core hardware. When a legitimate numerical error occurs during execution, the error state persists in the hardware, causing all subsequent executions to incorrectly report numerical errors even when the computations are valid. This sticky error state can only be resolved by restarting the application to clear the hardware.\n"
  },
  {
    "path": "release-notes/prev/2.25.0/tools.rst",
    "content": ".. _neuron-2-25-0-tools:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Developer Tools component, version 2.25.0. Release date: 7/31/2025.\n\nAWS Neuron SDK 2.25.0: Developer Tools release notes\n====================================================\n\n**Date of release**: July 31, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.25.0 release notes home <neuron-2-25-0-whatsnew>`\n\nImprovements\n------------\n\n*Improvements are significant new or improved features and solutions introduced this release of the AWS Neuron SDK. Read on to learn about them!*\n\nneuron-ls now shows NeuronCore IDs and CPU affinity\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nFor each Neuron device, ``neuron-ls`` will now show the corresponding NeuronCore IDs as well as CPU and NUMA node affinity in both the text and JSON outputs.\nThese can be used as reference when setting certain Neuron runtime environment variables such as ``NEURON_RT_VISIBLE_CORES``.\nSee :ref:`neuron-ls-ug` for an example.\n\nSystem profiles now show sync point events\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSystem profiles now show the sync point events that are used to approximate CPU and Neuron device timestamp alignment.\nThis can be used as a reference point if any inconsistencies are detected between the runtime and hardware trace timestamps.\nSee :ref:`neuron-profile-system-timestamp-adjustment` for more details.\n\n\nBehavioral changes\n------------------\n\n*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*\n\n* Added a summary metric to device profiles for ``total_active_time`` to help determine if the device was unnecessarily idle during execution.\n* Removed metrics for defunct processes from Neuron Monitor's Prometheus output to more accurately reflect the current utilization of NeuronCores.\n  Only processes that are currently active at the time of reporting will be included in the output.\n\n\nBug fixes\n---------\n\n*We're always fixing bugs. It's developer's life!* Here's what we fixed in 2.25.0:\n\n* Fixed issue in Neuron Profiler summary metrics where ``dma_active_time`` was larger than expected.\n* Fixed type inconsistency for certain event types and attributes in the system profile data that could result in a crash.\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\n* System profile hardware events may be misaligned due to sync point imprecision.  In Perfetto, this may cause events to be interleaved.\n* System profile events shown in the Neuron Profiler UI for multiprocess workloads are grouped together.  Please try the Perfetto output if you encounter this issue.\n* Currently, only a Neuron Runtime trace can be shown when capturing a system profile for a PyTorch workload. (Full framework traces can be shown for JAX workloads, though.) We are working to bring PyTorch traces into parity in a future release."
  },
  {
    "path": "release-notes/prev/2.26.0/containers.rst",
    "content": ".. _neuron-2-26-0-dlc:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning Containers (DLC) component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: Neuron Deep Learning Containers release notes\n====================================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\n.. important::\n   All Neuron packages and their dependencies have been upgraded to support version ``2.26.0`` of the AWS Neuron SDK.\n\nImprovements\n------------\n\nWe've added the following improvements for Deep Learning Container support in this release of the AWS Neuron SDK:\n\n* Both `pytorch-training-neuronx` and `pytorch-inference-neuronx` DLCs have been upgraded to version ``2.8.0`` along with their related dependencies.\n* Upgraded Python version to 3.11 in all Deep Learning Containers.\n\nBehavioral changes\n------------------\n\n* End-of-support for the Transformers NeuronX library starts with the 2.26.0 release of the AWS Neuron SDK. With this support ended, the PyTorch inference Deep Learning Container (DLC) will no longer include the ``transformers-neuronx`` package. For more details, see :ref:`announce-eos-tnx`.\n\nPrevious release notes\n----------------------\n\n* :ref:`containers_rn`\n* :ref:`containers_rn`\n* :ref:`containers_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/dlami.rst",
    "content": ".. _neuron-2-26-0-dlami:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning AWS Machine Images (DLAMIs) component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: Neuron Deep Learning AWS Machine Images release notes\n============================================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nImprovements\n------------\n\nWe've added the following improvements for DLAMI support in this release of the AWS Neuron SDK:\n\n* Support for PyTorch 2.8 (Amazon Linux 2023, Ubuntu 22.04) single-framework DLAMI\n* Updates multi-framework DLAMI virtual environments to support PyTorch 2.8\n* All Neuron packages and their dependencies have been upgraded to support version 2.26.0 of the AWS Neuron SDK\n\nBehavioral changes\n------------------\n* End-of-support for the Transformers NeuronX library starts with the 2.26.0 release of the AWS Neuron SDK. As a result, the PyTorch inference Deep Learning Container (DLC) will no longer provide the ``transformers-neuronx`` virtual environment in both single and multi-framework DLAMIs. For more details, see :ref:`announce-eos-tnx`.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-25-0-dlami`\n* :ref:`dlamis_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/index.rst",
    "content": ".. _neuron-2-26-0-whatsnew:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0 release notes\n===================================\n\n**Date of release**:  September 18, 2025\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch support <nx-pytorch>\n   JAX support <nx-jax>\n   NxD Inference <nxd-inference>\n   NxD Core <nxd-core>\n   NKI <nki>\n   Neuron Runtime <runtime>\n   Developer tools <tools>\n   Deep Learning AMIs <dlami>\n   Deep Learning Containers <containers>\n\nWhat's new?\n-----------\n\n**AWS Neuron SDK 2.26.0** adds support for PyTorch 2.8, JAX 0.6.2, along with support for Python 3.11, and introduces inference improvements on Trainium2 (``Trn2``). This release includes expanded model support, enhanced parallelism features, new Neuron Kernel Interface (NKI) APIs, and improved development tools for optimization and profiling.\n\nInference Updates\n^^^^^^^^^^^^^^^^^\n\n**NxD Inference** - Model support expands with beta releases of Llama 4 Scout and Maverick variants on ``Trn2``. The FLUX.1-dev image generation models are now available in beta on ``Trn2`` instances.\n\nExpert parallelism is now supported in beta, enabling MoE expert distribution across multiple NeuronCores. This release introduces on-device forward pipeline execution in beta and adds sequence parallelism in MoE routers for model deployment flexibility.\n\n.. \n   Sliding Window Attention (SWA) provides performance improvements by attending to recent tokens rather than full context. The feature includes attention sinks support and is automatically enabled for models trained with sliding window attention using the model config ``sliding_window`` attribute.\n\nNeural Kernel Interface (NKI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNew APIs enable additional optimization capabilities:\n\n* ``gelu_apprx_sigmoid``: GELU activation with sigmoid approximation\n* ``select_reduce``: Selective element copying with maximum reduction\n* ``sequence_bounds``: Sequence bounds computation\n\nAPI enhancements include:\n\n* ``tile_size``: Added total_available_sbuf_size field\n* ``dma_transpose``: Added axes parameter for 4D transpose.\n* ``activation``: Added ``gelu_apprx_sigmoid`` operation\n\nDeveloper Tools\n^^^^^^^^^^^^^^^\n\nNeuron Profiler improvements include the ability to select multiple semaphores at once to correlate pending activity with semaphore waits and increments. Additionally, system profile grouping now uses a global NeuronCore ID instead of a process local ID for visibility across distributed workloads. The Profiler also adds warnings for dropped events due to limited buffer space.\n\nThe ``nccom-test`` utility adds State Buffer support on Trn2 for collective operations, including ``all-reduce``, ``all-gather``, and ``reduce-scatter`` operations. Error reporting provides messages for invalid all-to-all collective sizes to help developers identify and resolve issues.\n\nDeep Learning AMI and Containers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Deep Learning AMI now supports PyTorch 2.8 on Amazon Linux 2023 and Ubuntu 22.04. Container updates include PyTorch 2.8.0 and Python 3.11 across all DLCs. The transformers-neuronx environment and package have been removed from PyTorch inference DLAMI/DLC.\n\n.. 
contents:: In this release\n   :local:\n   :depth: 2\n\nComponent release notes\n-----------------------\n\nSelect a card below to review detailed release notes for updated components of the Neuron SDK version 2.26.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n.. grid:: 1 1 2 2\n        :gutter: 2\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-pytorch\n                :link-type: ref\n\n                **PyTorch support** 2.26.0 release notes\n                ^^^\n                Neuron features and solutions that support the PyTorch ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-jax\n                :link-type: ref\n\n                **JAX support** 2.26.0 release notes\n                ^^^\n                Neuron features and solutions that support the JAX ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-nxd-inference\n                :link-type: ref\n\n                **NxD Inference** 2.26.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n        \n        .. grid-item-card::\n                :link: neuron-2-26-0-nxd-core\n                :link-type: ref\n\n                **NxD Core** 2.26.0 release notes\n                ^^^\n                Common features and tools for Neuron-based training and inference.\n                +++\n                Supports: ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-nki\n                :link-type: ref\n\n                **Neuron Kernel Interface (NKI)** 2.26.0 release notes\n                ^^^\n                Neuron's Python-based programming interface for developing and optimizing Neuron kernels.\n                +++\n                Supports:  ``Inf2``, ``Trn1``/ ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-runtime\n                :link-type: ref\n\n                **Neuron Runtime** 2.26.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-tools\n                :link-type: ref\n\n                **Neuron Developer Tools** 2.26.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: neuron-2-26-0-dlami\n                :link-type: ref\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.26.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n \n        .. 
grid-item-card:: \n                :link: neuron-2-26-0-dlc\n                :link-type: ref\n\n                **Neuron Deep Learning Containers (DLCs)** 2.26.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n        \n                Neuron 2.26.0 release artifacts\n                ^^^\n                The libraries and packages updated in this release.\n\nSupport announcements\n---------------------\n\nThis section signals the official end-of-support or end of support for specific features, tools, and APIs.\n\nEnd-of-support announcements\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n*An \"end-of-support (EoS)\" announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!*\n\n* The Neuron Compiler default for the ``--auto-cast`` option will change from ``--auto-cast=matmult`` to ``--auto-cast=none`` in a future release.\n* The Beta versions of the :ref:`PyTorch NeuronCore Placement APIs <torch_neuron_core_placement_guide>` are no longer supported with this release.\n\n* Neuron version 2.26.0 is the last release supporting ``parallel_model_trace``. This NxD Inference function will be deprecated in the next version of the Neuron SDK in favor of the ``ModelBuilder.trace()`` method, which provides a more robust and flexible approach for tracing and compiling models for Neuron devices,  enabling more advanced features such as weight layout optimization support, as well as other quality-of-life and stability improvements for SPMD tracing.\n\n  For customers directly invoking ``parallel_model_trace``, they can now use ModelBuilderV2 APIs. For more details on these APIS, see :ref:`nxd-core-model-builder-v2`. For customers that are directly using models in NxDI, there is  no impact since NxDI models are already built on MBv1 which has no issues.\n\nEnding support in 2.26.0\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n*\" End-of-support\" means that AWS Neuron no longer supports the feature, tool, or API indicated in the note as of this release.*\n\n* End-of-support for the Transformers NeuronX library starts with the 2.26.0 release of the AWS Neuron SDK. As a result, the PyTorch inference Deep Learning Container (DLC) will no longer include the ``transformers-neuronx`` package and Neuron no longer provides the ``transformers_neuronx`` virtual environment in both single and multi-framework DLAMIs. For more details, see :ref:`announce-eos-tnx`.\n* Starting with Neuron Release 2.26, Neuron driver versions above 2.24 will only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types). For ``Inf1`` instance users, only Neuron driver version 2.24 will remain supported with regular security patches.\n* The Beta versions of the :ref:`PyTorch NeuronCore Placement APIs <torch_neuron_core_placement_guide>` are no longer supported with this release.\n\nKnown issues: Samples\n^^^^^^^^^^^^^^^^^^^^^\n\n* When running the `UNet training sample <https://github.com/aws-neuron/aws-neuron-samples-staging/blob/master/torch-neuronx/training/unet_image_segmentation/unet.ipynb>`_ with the Neuron compiler, you may encounter this error: `Estimated peak HBM usage exceeds 16GB.`\n  \n  * To work around this error, include the function ``conv_wrap`` in your model. 
(You can find a usable example of this function in the `UNet sample model code <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/unet_image_segmentation/model.py>`_.) Then, define a custom backward pass for your model following the instructions and example in `the Pytorch documentation <https://docs.pytorch.org/docs/stable/notes/extending.html>`_. The UNet sample also illustrates how this is done for the convolution layers in UNet.\n\nPrevious releases\n-----------------\n\n* :doc:`Neuron 2.25.0 </release-notes/prev/2.25.0/index>`\n* :doc:`Earlier releases </release-notes/prev/rn>`"
  },
  {
    "path": "release-notes/prev/2.26.0/nki.rst",
    "content": ".. _neuron-2-26-0-nki:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron Kernel Interface (NKI) component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: Neuron Kernel Interface (NKI) release notes\n===================================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nImprovements\n------------\n\nNew nki.language APIs\n^^^^^^^^^^^^^^^^^^^^^\n\n* gelu_apprx_sigmoid - Gaussian Error Linear Unit activation function with sigmoid approximation.\n\nUpdated nki.language APIs\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* tile_size.total_available_sbuf_size constant - Added a new field, ``total_available_sbuf_size``, that contains the returned total available SBUF size.\n\nNew nki.isa APIs\n^^^^^^^^^^^^^^^^\n\n* select_reduce - Selectively copy elements with maximum reduction.\n* sequence_bounds - Compute sequence bounds of segment IDs.\n* dma_transpose - Enhanced with:\n\n  * ``axes`` parameter to define 4D transpose for supported cases\n  * ``dge_mode`` parameter to specify Descriptor Generation Engine (DGE)\n\n* activation - Supports the new ``nl.gelu_apprx_sigmoid`` nki.language operation.\n\nImprovements and fixes\n^^^^^^^^^^^^^^^^^^^^^^\n\n* **nki.language.store()** - Supports PSUM buffer with extra additional copy inserted.\n\nDocumentation and tutorial updates\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Added documentation and example for dma_transpose API\n* Improved simulate_kernel example\n* Updated tutorial code to use ``nl.fp32.min`` instead of a magic number\n\nPrevious release notes\n----------------------\n\n* :ref:`nki_rn`\n\n"
  },
  {
    "path": "release-notes/prev/2.26.0/nx-jax.rst",
    "content": ".. _neuron-2-26-0-jax:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK JAX support component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: JAX support release notes\n================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nReleased versions\n-----------------\n* ``0.6.2.1.0.*``\n\nImprovements\n------------\n\n* This release introduces support for JAX version ``0.6.2``.\n\nKnown issues\n------------\n\n* The ``Threefry`` RNG algorithm is not completely supported. Use the ``rbg`` algorithm instead. This can be configured by setting the following config option: ``jax.config.update(\"jax_default_prng_impl\", \"rbg\")``\n* For JAX versions older than ``0.4.34``, caching does not work out of the box. Use this code to enable caching support:\n  \n  .. code:: python\n    \n    import jax\n    import jax_neuronx\n    from jax._src import compilation_cache\n\n    compilation_cache.set_cache_dir('./cache_directory')\n\n* For JAX versions older than ``0.4.34``, buffer donation does not work out of the box. Add the following snippet to your script to enable it * ``jax._src.interpreters.mlir._platforms_with_donation.append('neuron')``\n* Mesh configurations which use non-connected Neuron cores may crash during execution. You may observe compilation or Neuron runtime errors for such configurations. Device connectivity can be determined by using ``neuron-ls --topology``.\n* Not all dtypes supported by JAX work on Neuron. Check :ref:`neuron-data-types` for supported data types.\n* ``jax.random.randint`` does not produce expected distribution of randint values. Run it on CPU instead.\n* Dynamic loops are not supported for ``jax.lax.while_loop``. Only static while loops are supported.\n* ``jax.lax.cond`` is not supported.\n* Host callbacks are not supported. As a result APIs based on callbacks from ``jax.debug`` and ``jax.experimental.checkify`` are not supported.\n* ``jax.dlpack`` is not supported.\n* ``jax.experimental.sparse`` is not supported.\n* ``jax.lax.sort`` only supports comparators with LE, GE, LT and GT operations.\n* ``jax.lax.reduce_precision`` is not supported.\n* Certain operations (for example, rng weight initialization) might result in slow compilations. Try to run such operations on the CPU backend or by setting the following environment variable: ``NEURON_RUN_TRIVIAL_COMPUTATION_ON_CPU=1``.\n* Neuron only supports ``float8_e4m3`` and ``float8_e5m2`` for FP8 dtypes.\n* Complex dtypes (``jnp.complex64`` and ``jnp.complex128``) are not supported.\n* Variadic reductions are not supported.\n* Out-of-bounds access for scatter/gather operations can result in runtime errors.\n* Dot operations on ``int`` dtypes are not supported.\n* ``lax.DotAlgorithmPreset`` is not always respected. Dot operations occur in operand dtypes. This is a configurable parameter for ``jax.lax.dot`` and ``jax.lax.dot_general``.\n\nPrevious release notes\n----------------------\n\n* JAX Neuron release notes\n"
  },
  {
    "path": "release-notes/prev/2.26.0/nx-pytorch.rst",
    "content": ".. _neuron-2-26-0-pytorch:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK PyTorch support component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: PyTorch support release notes\n====================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- ``2.8.0.2.10.*``\n- ``2.7.0.2.10.*``\n- ``2.6.0.2.10.*``\n\nImprovements\n------------\n\n- Added support for PyTorch 2.8 (see :ref:`Introducing PyTorch 2.8 Support<introduce-pytorch-2-8>`)\n\nKnown issues\n------------\n\n.. note::\n   * See :ref:`Introducing PyTorch 2.8 Support<introduce-pytorch-2-8>` for a full list of known issues with v2.8.\n   * See :ref:`Introducing PyTorch 2.7 Support<introduce-pytorch-2-7>` for a full list of known issues with v2.7.\n   * See :ref:`Introducing PyTorch 2.6 Support<introduce-pytorch-2-6>` for a full list of known issues with v2.6.\n\n* [PyTorch v2.8] Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories would result in lower performance for models like BERT and LLaMA (https://github.com/pytorch/xla/issues/9605). To fix this, switch to using the updated torch-xla version 2.8.1 from public PyPI repositories.\n\n* [PyTorch v2.7] Using the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\n* Currently, when switching Ubuntu OS kernel version from 5.15 to 6.8, you may see performance differences due to the new kernel scheduler (CFS vs EEVDF). For example, BERT pretraining performance could be lower by up to 10%. You may try using an older OS kernel (i.e. Amazon Linux 2023) or experiment with the kernel real-time scheduler by running ``sudo chrt --fifo 99`` before your command (i.e. ``sudo chrt --fifo 99 <script>``) to improve the performance. Note that adjusting the real-time scheduler can also result in lower performance. See https://www.kernel.org/doc/html/latest/scheduler/sched-eevdf.html for more information.\n\n* Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\n* [PyTorch v2.6]  BERT pretraining performance is approximately 10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known regression in torch-xla https://github.com/pytorch/xla/issues/9037 and may affect other models with high graph tracing overhead. This is fixed in torch-xla 2.7 and 2.8. To work around this issue in torch-xla 2.6, build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see :ref:`pytorch-neuronx-install-cxx11` for C++11 ABI version):\n\n.. code:: bash\n\n      # Setup build env (make sure you are in a python virtual env). 
Replace \"apt\" with \"yum\" on AL2023.\n      sudo apt install cmake\n      pip install yapf==0.30.0\n     wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n     sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n\n     # Clone repos\n     git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n     cd pytorch/\n     git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n     _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist\n     cd xla/\n     CXX_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist and can be installed instead of the torch-xla released in pypi.org\n\n* Currently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 through 2.8 although there will be end-of-support warnings (as noted below).\n\n* Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (see the warning raised below). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`).\n\n.. code:: bash\n\n   Warning: ``XLA_DOWNCAST_BF16`` will be deprecated after the 2.5 release, please downcast your model directly\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.device instead``. This is a warning that ``torch_xla.core.xla_model.xla_device()`` is deprecated. Switch to using ``torch_xla.device()`` instead.\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.sync instead``. This is a warning that ``torch_xla.core.xla_model.mark_step()`` is deprecated. Switch to using ``torch_xla.sync()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'``. This is an error that notes that ``torch_xla.core.xla_model.xrt_world_size()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.world_size()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'``. This is an error that notes that ``torch_xla.core.xla_model.get_ordinal()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.global_ordinal()`` instead.\n\n* [PyTorch v2.5+] ``AttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'``. In Torch-XLA 2.5+, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime. See this `PyTorch commit PR on GitHub <https://github.com/pytorch/xla/commit/d6fb5391d09578c8804b1331a5e7a4f72bf981db>`_.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-25-0-pytorch`\n* :ref:`pytorch-neuron-rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/nxd-core.rst",
    "content": ".. _neuron-2-26-0-nxd-core:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK NxD Core component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: NxD Core release notes\n=============================================\n\n**Date of release**:  September 18, 2025\n\n**Version**: 0.15.22259\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nNxD Core inference improvements\n-------------------------------\n\nNon-distributed inference in parallel layers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUpdated parallel layers to support non-distributed inference when parallel state isn't initialized.\nIn non-parallel environments, RowParallelLinear and ColumnParallelLinear now function as ``nn.Linear``,\nand ``ParallelEmbedding``now functions as ``nn.Embedding``. This change enables you to simplify model code that\nworks on device and on CPU by enabling you to use the parallel layer in both cases.\n\nOther improvements\n^^^^^^^^^^^^^^^^^^\n\n* Added a ``compiler_flag_hook`` argument to ModelBuilder, which you can use to override compiler flags\n  for different submodels and buckets.\n\nBug fixes\n---------\n\nHere's what we fixed in 2.26.0:\n\nInference\n^^^^^^^^^\n\n* Added additional instance types to the ``hardware`` enum. For example, ``inf2`` now maps to ``trn1``.\n* Other minor bug fixes and improvements.\n\nKnown issues\n------------\n\n*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*\n\nInference\n^^^^^^^^^\n\n* At high batch size (>=32), we have observed performance degradation with ``shard-on-load`` for some models such as Llama3.1-8B. Our current recommendation is to disable this feature by enabling \n  ``save_sharded_checkpoint`` in ``NeuronConfig`` when you trace and compile the model.\n* ``spmd_mode = True`` does not work when provided to the ``parallel_model_trace`` API. ``parallel_model_trace`` will be deprecated in the next Neuron SDK release.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-25-0-nxd-core`\n* :ref:`nxd-core_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/nxd-inference.rst",
    "content": ".. _neuron-2-26-0-nxd-inference:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Transformers for Inference component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: NxD Inference release notes\n==================================================\n\n**Date of release**:  September 18, 2025\n\n**Version**: 0.6.10598\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nImprovements\n------------\n\nLlama 4 model support (beta)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdded beta support for Llama 4, which is a family of multi-modal MoE ope- weight LLMs by Meta that support text\nand image inputs. Llama 4 is tested on ``Trn2``. Compatible models include:\n\n- `Llama 4 Scout <https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct>`__\n- `Llama 4 Maverick <https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct>`__\n\nIn this beta release, Llama 4 model support has the following limitations:\n\n- The model is tested to be accurate up to a sequence length of 8192.\n- Model performance on Trn2 isn't fully optimized.\n- To use Llama 4 with vLLM, you must compile the model outside of vLLM and specify\n  the compiled model path using the ``NEURON_COMPILED_ARTIFACTS`` environment variable.\n\nThese limitations will be addressed in a future release.\n\nFor more information, see :ref:`/libraries/nxd-inference/tutorials/llama4-tutorial.ipynb`\nand :ref:`nxdi-model-reference`.\n\nFLUX.1 model support (beta)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdded beta support for FLUX.1-dev, which is an open weight image generation model\nby Black Forest Labs. Flux.1-dev is tested on Trn2. Compatible models include:\n\n- `Flux.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`__\n\nIn this beta release, the model's performance isn't optimized.\n\nFor more information, see :ref:`/libraries/nxd-inference/tutorials/flux-inference-tutorial.ipynb`\nand :ref:`nxdi-model-reference`.\n\nExpert parallelism support (beta)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdded support for expert parallelism, which distributes expert processing across multiple\nNeuronCores. Expert parallelism improves performance for mixture-of-experts (MoE) models,\nparticularly for models with a large number of experts, such as Llama 4 Maverick. For more\ninformation, see :ref:`nxd-inference-api-guide-moe-neuron-config`.\n\nContext parallelism improvements\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWith this release, context parallelism is out of beta and includes several improvements.\n\n- Added support for sliding window attention (SWA) with context parallelism.\n- Added a strided context parallel flash attention kernel which includes compute elimination.\n  This kernel is more performant than the existing content parallel flash attention kernel,\n  especially at high sequence lengths. To use the kernel,\n  enable ``strided_context_parallel_kernel_enabled`` in NeuronConfig.\n- Fixed an accuracy issue in hybrid sharding configurations that use context parallelism\n  and attention bias. 
Hybrid sharding refers to models with different sharding strategies\n  for context encoding and token generation submodels, such as a configuration that uses\n  context parallelism for context encoding and data parallelism for token generation.\n  \n..\n  Sliding window attention (SWA)\n  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n..\n  Added support for sliding window attention, including support for attention sinks. Sliding window\n  attention improves attention performance by attending to a subset of recent tokens, rather than the\n  full context.\n..\n  NxD Inference uses the ``sliding_window`` attribute from the model config as the window size. The\n  ``sliding_window`` attribute is typically set in the Hugging Face checkpoint config, so NxD Inference\n  automatically enables sliding window attention for models trained with it.\n\nOn-device forward pipeline execution (Beta)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAdded support for a model-forward function that accepts both on-device and on-CPU input tensors. This feature improves performance in pipeline models by eliminating data transfer between device and CPU. For example, you can use this feature with Llama 4 (which accepts image and text inputs) to keep the vision encoder outputs on-device for the context encoding model to process.\n\nTo use pipeline execution, specify ``pipeline_execution=True`` when you initialize a ModelWrapper. For more information, see :ref:`how-to-use-fpem`.\n\nOther improvements\n^^^^^^^^^^^^^^^^^^\n\n* Added support for PyTorch 2.8 and Python 3.11.\n* Added support for sequence parallelism in mixture-of-experts (MoE) routers. This change improves\n  context encoding latency for MoE models that use sequence parallelism.\n* Enabled ``temperature=0`` as a valid option in dynamic on-device sampling. This temperature\n  value specifies to use greedy sampling.\n* Enabled ``top_k`` values of ``0`` and ``-1`` as valid options in dynamic on-device sampling.\n  These ``top_k`` values specify to randomly pick a token from the vocabulary using a uniform\n  distribution.\n\nBug fixes\n---------\n\n* Fixed an issue where HuggingFaceGenerationAdapter performs redundant CPU sampling for models that\n  use on-device sampling and ``output_logits=True``. This fix improves the performance of models with\n  this configuration.\n* Other minor fixes and improvements.\n\nKnown issues\n------------\n\n* ``spmd_mode = True`` does not work when provided to the ``parallel_model_trace`` API. ``parallel_model_trace`` will be deprecated in the next Neuron SDK release.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-25-0-nxd-inference`\n* :ref:`nxd-inference_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/runtime.rst",
    "content": ".. _neuron-2-26-0-runtime:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Runtime component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: Neuron Runtime release notes\n===================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- Neuron Driver: ``2.24.7.0``\n- Neuron Runtime Library: ``2.28.19.0``\n- Neuron Collectives: ``2.28.20.0``\n\nNeuron Runtime Library 2.28.19.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Added rank ID to all events emitted from the Profiler 2.0 system trace.\n* Improved timestamp alignment of Profiler 2.0 NeuronCore and CPU system trace events enhancing the accuracy of the trace timeline.\n\nNeuron Driver 2.24.13.0\n^^^^^^^^^^^^^^^^^^^^^^^^\n* Compatibility fixes for Linux kernel 6.18.\n\n.. note::\n\n   This is a patch release for ``Inf1`` users. Neuron driver version 2.24 is the last driver version to support ``Inf1`` instances.\n\nNeuron Driver 2.24.7.0\n^^^^^^^^^^^^^^^^^^^^^^\n* Fixed installation issue causing builds to fail for Linux Kernels 6.13+.\n\nNeuron Runtime Library 2.28.19.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Fixed bug where `nrt_unload` returned `NRT_SUCCESS` even when model stop fails due to Neuron Core lockups.\n* Fixed bug where `model_name` was empty in Profiler 2.0 system trace events.\n\nNeuron Collectives 2.28.20.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Fixed bug where error messages were incorrectly being displayed on machines with no EFA devices.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-25-0-runtime`\n* :ref:`runtime_rn`\n* :ref:`runtime_rn`\n* :ref:`runtime_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.0/tools.rst",
    "content": ".. _neuron-2-26-0-tools:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Developer Tools component, version 2.26.0. Release date: 9/18/2025.\n\nAWS Neuron SDK 2.26.0: Developer Tools release notes\n====================================================\n\n**Date of release**:  September 18, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.26.0 release notes home <neuron-2-26-0-whatsnew>`\n\nImprovements\n------------\n\nView multiple semaphores simultaneously\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe Neuron Profiler UI now allows you to select multiple semaphore values to display simultaneously for a more comprehensive view of activity.\n\n``nccom-test`` new State Buffer support\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n``nccom-test`` support on Trn2 for State Buffer to State Buffer collectives benchmarking for all-reduce, all-gather, and reduce-scatter operations.\n\nBehavioral changes\n------------------\n\n* System profile grouping default in Perfetto now uses global NeuronCore ID instead of process local NeuronCore ID for better display of multi-process workloads.\n* Added warning when system profile events are dropped due to limited buffer space, and added suggestion of how configure more buffer space if desired.\n* ``nccom-test`` will show helpful error message when invalid sizes are used with all-to-all collectives.\n\nBug fixes\n---------\n\nHere's what we fixed in 2.26.0:\n\n* Fixed device memory usage type table and improvement made to stay in sync between runtime and tools versions.\n* Fixed system profile crash when processing long-running workloads.\n* Fixed display of system profiles in Perfetto to correctly separate rows within the same Logical NeuronCore when using ``NEURON_LOGICAL_NC_CONFIG=2`` on Trn2.\n\nKnown issues\n------------\n\n* System profile hardware events may be misaligned due to sync point imprecision. In Perfetto, this may cause events to be interleaved.\n* System profile events shown in the Neuron Profiler UI for multiprocess workloads are grouped together. Please try the Perfetto output if you encounter this issue.\n* Currently, only a Neuron Runtime trace can be shown when capturing a system profile for a PyTorch workload. (Full framework traces can be shown for JAX workloads, though.) We are working to bring PyTorch traces into parity in a future release.\n\nPrevious versions\n-----------------\n\n* :ref:`neuron-2-25-0-tools`\n* :ref:`dev-tools_rn`\n"
  },
  {
    "path": "release-notes/prev/2.26.1.rst",
    "content": ".. meta::\n    :description: Release notes for AWS Neuron SDK release v2.6.1\n    :date-modified: 10/29/2025\n\n\nAWS Neuron SDK Release Notes - v2.26.1\n=======================================\n\n**Release Date**: October 29, 2025\n\nOverview\n---------    \n\nRelease *2.26.1* of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.26.0. See :ref:`the Neuron SDK v2.26.0 release notes <neuron-2-26-0-whatsnew>` for the full set of changes that shipped with the 2.26.0 release.\n\nBug fixes in this release\n--------------------------\n\n* Fix: To address an issue with out-of-memory errors in **torch-neuronx**, this release enables you to use the Neuron Runtime API to apply direct memory allocation.\n\n\nResources\n----------\n\n* For the set of SDK package version changes in 2.26.1, see :ref:`Release Content <latest-neuron-release-artifacts>`.\n\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/compiler.rst",
    "content": ".. _neuron-2-27-0-compiler:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK compiler component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Neuron Compiler release notes\n====================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n* Review older release notes in the :ref:`Previous Neuron Releases <previous-neuron-releases>` section.\n\nChanges and improvements\n-------------------------\n\n* **Error code docs** New error code documentation has been added to help developers better understand and troubleshoot issues encountered during model compilation. Check them out here: :doc:`Neuron Compiler Error Codes </compiler/error-codes/index>`\n\n* **Compiler accuracy flag defaults updated**: Two Neuron Compiler (neuronxcc) flags now have different default behaviors to improve accuracy. The ``--auto-cast`` flag now defaults to ``none`` (previously ``matmul``), and ``--enable-mixed-precision-accumulation`` is now enabled by default. These changes optimize accuracy but may impact performance for FP32 models and models using smaller bitwidth dtypes. To restore previous behavior, explicitly set ``--auto-cast=matmul`` and use the new ``--disable-mixed-precision-accumulation`` flag.\n\n* **Python 3.9 no longer supported**: The Neuron Compiler requires Python 3.10 or higher. Users currently on Python 3.9 must upgrade to continue using the Neuron Compiler with Python bindings.\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/containers.rst",
    "content": ".. _neuron-2-27-0-dlc:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning Containers (DLC) component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Neuron Deep Learning Containers release notes\n====================================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\n**PyTorch Inference vLLM-Neuronx 0.11.0 DLC** — Added new ``pytorch-inference-vllm-neuronx`` 0.11.0 DLC with PyTorch 2.8, vLLM V1 with the `vLLM-Neuron Plugin <https://github.com/vllm-project/vllm-neuron>`_, tools, NxDI and all dependencies to run :ref:`nxdi-vllm-user-guide-v1` out of the box.\n\n**PyTorch 2.9.0 Support** — Upgraded ``pytorch-training-neuronx`` and ``pytorch-inference-neuronx`` DLCs to PyTorch 2.9.0 with related dependencies.\n\n**JAX 0.7.0 Support** — Upgraded ``jax-training-neuronx`` DLC to JAX 0.7.0 with related dependencies.\n\n**Ubuntu 24.04 and Python 3.12** — Upgraded base image to Ubuntu 24.04 and Python 3.12 in all DLCs.\n\n**Neuron SDK Updates** — Upgraded all Neuron packages and dependencies to support AWS Neuron SDK version 2.27.\n\n\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/dlami.rst",
    "content": ".. _neuron-2-27-0-dlami:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Deep Learning AWS Machine Images (DLAMIs) component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Neuron Deep Learning AWS Machine Images release notes\n============================================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\n**Ubuntu 24.04 Support** — This release adds support for Ubuntu 24.04 base, single framework, and multi-framework DLAMIs with Python 3.12, providing customers with the latest Ubuntu LTS version for their machine learning workloads.\n\n**vLLM V1 with vLLM-Neuron Plugin** — Published new vLLM V1 with the `vLLM-Neuron Plugin <https://github.com/vllm-project/vllm-neuron>`_ single framework DLAMI and added virtual environment to multi-framework DLAMIs (Amazon Linux 2023, Ubuntu 24.04).\n\n**PyTorch 2.9 Support** — Added PyTorch 2.9 support for single framework DLAMIs and virtual environment to multi-framework DLAMIs (Amazon Linux 2023, Ubuntu 24.04).\n\n**JAX 0.7 Support** — Published JAX 0.7 single framework DLAMI and updated multi-framework DLAMI virtual environments to JAX 0.7 (Amazon Linux 2023, Ubuntu 24.04).\n\n**Neuron SDK Updates** — Upgraded all Neuron packages and dependencies to support AWS Neuron SDK version 2.27.\n\nEnd of Support\n--------------\n\n**TensorFlow 2.10 End of Support** — The ``tensorflow_2_10`` single framework DLAMI and virtual environment in multi-framework DLAMIs will reach end of support in a future release. Customers are advised to use previously released DLAMIs for TensorFlow support.\n\n**Ubuntu 22.04 Single Framework End of Support** — Ubuntu 22.04 single framework DLAMIs for PyTorch and JAX will reach end of support in a future release. Customers are advised to use multi-framework or previously released DLAMIs for Ubuntu 22.04.\n\n**Inf1 virtual environments End of Support** — Inf1 virtual environments and AMIs have reached end of support. Use Neuron DLAMIs released up to SDK version 2.26 for Inf1 support.\n\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/index.rst",
    "content": ".. _neuron-2-27-0-whatsnew:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version 2.27.0. Release date: 12/19/2025\n   :date-modified: 01/14/2026\n\nNeuron 2.27.0 Component Release Notes\n=====================================\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch support <nx-pytorch>\n   NxD Inference <nxd-inference>\n   NKI <nki>\n   NKI Library <nki-lib>\n   Neuron Compiler <compiler>\n   Neuron Runtime <runtime>\n   Developer tools <tools>\n   Deep Learning AMIs <dlami>\n   Deep Learning Containers <containers>\n\n.. important:: Neuron 2.27.1 patch release\n        **January 14, 2026**\n        \n        A patch release, Neuron version 2.27.1, is available that includes a fix for an issue with Llama 4 models found in Neuron SDK version 2.27.0. For details, see :doc:`the Neuron SDK v2.27.1 release note </release-notes/prev/2.27.1>`.\n\n----\n\n**On December 19, 2025, AWS Neuron released the 2.27.0 version of the Neuron SDK**. \n\nThis page provides detailed component release notes for the Neuron SDK 2.27.0. For a an overview of the release content, see :ref:`What's New in AWS Neuron <whats-new-2025-12-19-v2_27>`.\n\n**Update for Neuron 2.27.1**: A patch release, Neuron version 2.27.1, is available that includes a fix for an issue with Llama 4 models found in Neuron SDK version 2.27.0. For details, see :doc:`the Neuron SDK v2.27.1 release note </release-notes/prev/2.27.1>`.\n\nSelect a card below to review detailed release notes for each component of the Neuron SDK version 2.27.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.27.0 release artifacts**\n                ^^^\n                The libraries and packages updated in this Neuron release.\n\n.. grid:: 1 1 2 2\n        :gutter: 2\n\n        .. grid-item-card:: \n                :link: neuron-2-27-0-pytorch\n                :link-type: ref\n\n                **PyTorch support** 2.27.0 release notes\n                ^^^\n                Neuron features and solutions that support the PyTorch ML framework.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: neuron-2-27-0-nxd-inference\n                :link-type: ref\n\n                **NxD Inference** 2.27.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n         \n        .. grid-item-card:: \n                :link: neuron-2-27-0-compiler\n                :link-type: ref\n\n                **Neuron Compiler** 2.27.0 release notes\n                ^^^\n                The Neuron compiler for AWS Trainium and Inferentia, and its libraries and tools.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. 
grid-item-card:: \n                :link: neuron-2-27-0-nki\n                :link-type: ref\n\n                **Neuron Kernel Interface (NKI)** 2.27.0 release notes\n                ^^^\n                Neuron's Python-based programming interface for developing and optimizing Neuron kernels.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card::\n                :link: neuron-2-27-0-nkilib\n                :link-type: ref\n\n                **NKI Library (NKI-Lib)** 2.27.0 release notes\n                ^^^\n                A collection of pre-optimized Neuron kernels for common model operations.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: neuron-2-27-0-runtime\n                :link-type: ref\n\n                **Neuron Runtime** 2.27.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: neuron-2-27-0-tools\n                :link-type: ref\n\n                **Neuron Developer Tools** 2.27.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: neuron-2-27-0-dlami\n                :link-type: ref\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.27.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n \n        .. grid-item-card:: \n                :link: neuron-2-27-0-dlc\n                :link-type: ref\n\n                **Neuron Deep Learning Containers (DLCs)** 2.27.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n\nNxD Core and NxD Training Updates for 2.27\n------------------------------------------\n\nNeuron support for PyTorch 2.9 will be the last to include NeuronX Distributed Training (NxDT), NxD Core training APIs, and PyTorch/XLA for training. Starting with Neuron support for PyTorch 2.10, these components will no longer be supported.\n\nExisting NxDT/NxD Core users should stay on PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.28 and PyTorch 2.10. A migration guide will be published in Neuron 2.28.\n\nSoftware maintenance announcements\n----------------------------------\n\nThis section signals the official end-of-support or end of support for specific features, tools, and APIs. 
For the full set of Neuron release announcements, see :doc:`/about-neuron/announcements/index`.\n\nKnown issues: Samples\n---------------------\n\n* When running the `UNet training sample <https://github.com/aws-neuron/aws-neuron-samples-staging/blob/master/torch-neuronx/training/unet_image_segmentation/unet.ipynb>`_ with the Neuron compiler, you may encounter this error: `Estimated peak HBM usage exceeds 16GB.`\n  \n  * To work around this error, include the function ``conv_wrap`` in your model. (You can find a usable example of this function in the `UNet sample model code <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/unet_image_segmentation/model.py>`_.) Then, define a custom backward pass for your model following the instructions and example in `the PyTorch documentation <https://docs.pytorch.org/docs/stable/notes/extending.html>`_. The UNet sample also illustrates how this is done for the convolution layers in UNet.\n\nPrevious releases\n-----------------\n\n* :doc:`Neuron 2.26.0 </release-notes/prev/2.26.0/index>`\n* :doc:`Neuron 2.25.0 </release-notes/prev/2.25.0/index>`\n* :doc:`Earlier releases </release-notes/prev/rn>`\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/nki-lib.rst",
    "content": ".. _neuron-2-27-0-nkilib:\n\n.. meta::\n   :description: The official release notes for the NKI Library component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: NKI Library release notes\n=================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\nThis release introduces the NKI Library, which provides pre-built kernels you can use to optimize\nthe performance of your models. The NKI Library offers ready-to-use, pre-optimized kernels that\nleverage the full capabilities of AWS Trainium hardware.\n\nNKI Library kernels are published in the `NKI Library GitHub repository <https://github.com/aws-neuron/nki-library>`_.\nIn Neuron 2.27, these kernels are also shipped as part of neuronx-cc under the ``nkilib.*`` namespace.\n\nAccessing NKI Library Kernels\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can access NKI Library kernels in two ways:\n\n* **Shipped version**: Import from the ``nkilib.*`` namespace (included with neuronx-cc in Neuron 2.27)\n* **Open source repository**: Clone and use kernels from the GitHub repository under the ``nkilib_standalone.nkilib.*`` namespace\n\nNew Kernels\n^^^^^^^^^^^\n\nThis release includes the following pre-optimized kernels:\n\n* **Attention CTE Kernel** — Implements attention with support for multiple variants and optimizations\n* **Attention TKG Kernel** — Implements attention specifically optimized for token generation scenarios\n* **MLP Kernel** — Implements a Multi-Layer Perceptron with optional normalization fusion and various optimizations\n* **Output Projection CTE Kernel** — Computes the output projection operation optimized for Context Encoding use cases\n* **Output Projection TKG Kernel** — Computes the output projection operation optimized for Token Generation use cases\n* **QKV Kernel** — Performs Query-Key-Value projection with optional normalization fusion\n* **RMSNorm-Quant Kernel** — Performs optional RMS normalization followed by quantization to fp8\n\nNKI Library Kernel Migration to New nki.* Namespace in Neuron 2.28\n-------------------------------------------------------------------\n\nSome NKI Library kernels currently use the legacy ``neuronxcc.nki.*`` namespace. Starting with\nNeuron 2.28, all NKI Library kernels will migrate to the new ``nki.*`` namespace.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs. Customers\nusing NKI Library kernels should review the migration guide for any required changes.\n\nNKI Library Namespace Changes in Neuron 2.28\n---------------------------------------------\n\nStarting with Neuron 2.28, the open source repository namespace will change from\n``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between\nthe open source repository and the shipped version.\n\nCustomers who want to add or modify NKI Library kernels can build and install them to\nreplace the default implementation without changing model imports.\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/nki.rst",
    "content": ".. _neuron-2-27-0-nki:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron Kernel Interface (NKI) component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Neuron Kernel Interface (NKI) release notes\n===================================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\nThis release introduces NKI Beta 2, featuring the new :doc:`NKI Compiler </nki/deep-dives/nki-compiler>`\nand significant enhancements to the NKI language constructs and APIs, including changes to existing APIs. \nFor information about the different NKI Beta versions, see :doc:`About the NKI Compiler </nki/deep-dives/nki-compiler>`.\n\nTo take advantage of Beta 2 with the new compiler, import the ``nki.*`` namespace in your code\nand annotate your top-level kernel function with ``@nki.jit``.\n\n**Backward Compatibility and Migration**\n\nNeuron 2.27 supports both the ``neuronxcc.nki.*`` and ``nki.*`` namespaces side by side,\nallowing existing Beta 1 kernels to continue working seamlessly. However, Neuron 2.27 will\nbe the last release to include support for the ``neuronxcc.nki.*`` namespace. Starting with\nNeuron 2.28, this namespace will no longer be supported.\n\nThe new ``nki.*`` namespace introduces changes to NKI APIs and language constructs. We\nencourage customers to migrate existing kernels from ``neuronxcc.nki.*`` to the new ``nki.*``\nnamespace. A kernel migration guide is available in the Neuron 2.27 documentation to assist\nwith this transition.\n\nNew nki.language APIs\n^^^^^^^^^^^^^^^^^^^^^\n\n* :doc:`nki.language.device_print </nki/api/generated/nki.language.device_print>`\n\nNew nki.isa APIs\n^^^^^^^^^^^^^^^^\n\n* :doc:`nki.isa.dma_compute </nki/api/generated/nki.isa.dma_compute>`\n* :doc:`nki.isa.quantize_mx </nki/api/generated/nki.isa.quantize_mx>`\n* :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>`\n* :doc:`nki.isa.nc_n_gather </nki/api/generated/nki.isa.nc_n_gather>` [used to be ``nl.gather_flattened`` with free partition limited to 512]\n* :doc:`nki.isa.rand2 </nki/api/generated/nki.isa.rand2>`\n* :doc:`nki.isa.rand_set_state </nki/api/generated/nki.isa.rand_set_state>`\n* :doc:`nki.isa.rand_get_state </nki/api/generated/nki.isa.rand_get_state>`\n* :doc:`nki.isa.set_rng_seed </nki/api/generated/nki.isa.set_rng_seed>`\n* :doc:`nki.isa.rng </nki/api/generated/nki.isa.rng>`\n\nNew dtypes\n^^^^^^^^^^^^^^\n\n* :doc:`nki.language.float8_e5m2_x4 </nki/api/generated/nki.language.float8_e5m2_x4>`\n* :doc:`nki.language.float4_e2m1fn_x4 </nki/api/generated/nki.language.float4_e2m1fn_x4>`\n* :doc:`nki.language.float8_e4m3fn_x4 </nki/api/generated/nki.language.float8_e4m3fn_x4>`\n\nChanges to Existing APIs\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Several nki.language APIs have been removed in NKI Beta 2\n* All nki.isa APIs have ``dst`` as an input param\n* All nki.isa APIs removed ``dtype`` and ``mask`` support\n* :doc:`nki.isa.memset </nki/api/generated/nki.isa.memset>` — removed ``shape`` positional arg , since we have ``dst``\n* :doc:`nki.isa.affine_select </nki/api/generated/nki.isa.affine_select>` — instead of ``pred``, we now take ``pattern`` and ``cmp_op`` params\n* :doc:`nki.isa.iota </nki/api/generated/nki.isa.iota>` — ``expr`` replaced with ``pattern`` and ``offset``\n* :doc:`nki.isa.nc_stream_shuffle 
</nki/api/generated/nki.isa.nc_stream_shuffle>` - ``src`` and ``dst`` order changed\n\nDocumentation Updates\n^^^^^^^^^^^^^^^^^^^^^^\n\n* Restructured NKI Documentation to align with workflows\n* Added :doc:`Trainium3 Architecture Guide for NKI </nki/guides/architecture/trainium3_arch>`\n* Added :doc:`About Neuron Kernel Interface (NKI) </nki/get-started/about/index>`\n* Added :doc:`NKI Environment Setup Guide </nki/get-started/setup-env>`\n* Added :doc:`Get Started with NKI </nki/get-started/quickstart-implement-run-kernel>`\n* Added :doc:`NKI Language Guide </nki/get-started/nki-language-guide>`\n* Added :doc:`About the NKI Compiler </nki/deep-dives/nki-compiler>`\n* Added :doc:`MXFP Matrix Multiplication with NKI </nki/deep-dives/mxfp-matmul>`\n* Updated :doc:`Matrix Multiplication Tutorial </nki/guides/tutorials/matrix_multiplication>`\n* Updated :doc:`Profile a NKI Kernel </nki/guides/use-neuron-profile>`\n* Updated :doc:`NKI APIs </nki/api/index>`\n* Updated :doc:`NKI Library docs </nki/library/index>`\n* Removed NKI Error Guide\n\nKnown issues\n------------\n\n* :doc:`nki.isa.nc_matmul </nki/api/generated/nki.isa.nc_matmul>` - ``is_moving_onezero`` was incorrectly named ``is_moving_zero`` in this release\n* NKI ISA semantic checks are not available with Beta 2; the workaround is to reference the API docs\n* NKI Collectives are not available with Beta 2\n* nki.benchmark and nki.profile are not available\n"
  },
  {
    "path": "release-notes/prev/2.27.0/nx-pytorch.rst",
    "content": ".. _neuron-2-27-0-pytorch:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK PyTorch support component, version 2.27.0. Release date: TBD.\n\nAWS Neuron SDK 2.27.0: PyTorch support release notes\n====================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- ``2.9.0.2.11.*``\n- ``2.8.0.2.11.*``\n- ``2.7.0.2.11.*``\n\n\nImprovements\n------------\n\n- Added support for PyTorch 2.9 (see :ref:`Introducing PyTorch 2.9 Support<introduce-pytorch-2-9>`)\n- Improved model tracing performance for large models by up to 90% through trace API optimizations that avoid copying weights and state buffers to the device and guarantee state restoration after tracing.\n- Fixed `GitHub issue #1240 <https://github.com/aws-neuron/aws-neuron-sdk/issues/1240>`_ impacting torch-neuronx 2.7 to 2.9\n- Fixed `GitHub issue #834 <https://github.com/aws-neuron/aws-neuron-sdk/issues/834>`_ impacting torch-neuronx 2.7 to 2.9\n- Fixed issue in PyTorch 2.8 where PJRT_Client_Destroy was not being called, which prevented NRT:nrt_close from being invoked. This caused resource leaks and \"nrtucode: internal error: 832 object(s) leaked, improper teardown\" errors. This fix ensures proper cleanup of Neuron Runtime resources on program exit. (Related: `PyTorch/XLA #9675 <https://github.com/pytorch/xla/pull/9675>`_)\n\n\nTransitioning to PyTorch Native Support for AWS Trainium in the Next Neuron Release Supporting PyTorch 2.10\n------------------------------------------------------------------------------------------------------------\n\nIn the next Neuron release that will support PyTorch 2.10, AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 will be the last version based on PyTorch/XLA.\n\n**What's changing:**\n\n- PyTorch 2.9: Last version using PyTorch/XLA backend\n- PyTorch 2.10 and later: Native PyTorch support via TorchNeuron\n\nCustomers using PyTorch/XLA-based training should migrate to native PyTorch with TorchNeuron, which provides:\n\n- Native PyTorch eager execution mode\n- Standard distributed primitives (DTensor, FSDP, DDP)\n- ``torch.compile`` support\n- Compatibility with frameworks like TorchTitan (PyTorch Training Library)\n\nFor more information about native PyTorch on Neuron and migration guidance, please see :ref:`Native PyTorch for AWS Trainium <native-pytorch-trainium>`.\n\n\nKnown issues\n------------\n\n.. note::\n   * PyTorch 2.6 has reached end-of-support since release 2.27.\n   * See :ref:`Introducing PyTorch 2.9 Support<introduce-pytorch-2-9>` for a full list of known issues with v2.9.\n   * See :ref:`Introducing PyTorch 2.8 Support<introduce-pytorch-2-8>` for a full list of known issues with v2.8.\n   * See :ref:`Introducing PyTorch 2.7 Support<introduce-pytorch-2-7>` for a full list of known issues with v2.7.\n\n* [PyTorch v2.8] Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories would result in lower performance for models like BERT and LLaMA (https://github.com/pytorch/xla/issues/9605). To fix this, switch to using the updated torch-xla version 2.8.1 from public PyPI repositories.\n\n* [PyTorch v2.7] Using the latest torch-xla v2.7 may result in an increase in host memory usage compared to torch-xla v2.6. 
In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.\n\n* [PyTorch v2.7] PyTorch NeuronX 2.7 supports Python 3.10, and 3.11 only. Python 3.12 is not supported. Note that Ubuntu 24.04 comes with Python 3.12 by default, so users must install Python 3.10 or 3.11, or use PyTorch 2.9 which supports Python 3.12.\n\n* [libtorch on Ubuntu 22.04] Using libtorch with PyTorch NeuronX on Ubuntu 22.04 may encounter build errors when compiling tokenizers or other Rust-based dependencies, as Ubuntu 22.04 provides Rust 1.75 by default which is too old for some dependencies that require Rust 1.80+. Ubuntu 24.04 is recommended for libtorch applications.\n\n* Currently, when switching Ubuntu OS kernel version from 5.15 to 6.8, you may see performance differences due to the new kernel scheduler (CFS vs EEVDF). For example, BERT pretraining performance could be lower by up to 10%. You may try using an older OS kernel (i.e. Amazon Linux 2023) or experiment with the kernel real-time scheduler by running ``sudo chrt --fifo 99`` before your command (i.e. ``sudo chrt --fifo 99 <script>``) to improve the performance. Note that adjusting the real-time scheduler can also result in lower performance. See https://www.kernel.org/doc/html/latest/scheduler/sched-eevdf.html for more information.\n\n* Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.\n\n* [PyTorch v2.6]  BERT pretraining performance is approximately 10% lower with torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known regression in torch-xla https://github.com/pytorch/xla/issues/9037 and may affect other models with high graph tracing overhead. This is fixed in torch-xla 2.7 and 2.8. To work around this issue in torch-xla 2.6, build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see :ref:`pytorch-neuronx-install-cxx11` for C++11 ABI version):\n\n.. code:: bash\n\n      # Setup build env (make sure you are in a python virtual env). Replace \"apt\" with \"yum\" on AL2023.\n      sudo apt install cmake\n      pip install yapf==0.30.0\n     wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64\n     sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel\n\n     # Clone repos\n     git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0\n     cd pytorch/\n     git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron\n     _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist\n     cd xla/\n     CXX_ABI=0 python setup.py bdist_wheel\n\n     # The pip wheel will be present in ./dist and can be installed instead of the torch-xla released in pypi.org\n\n* Currently, BERT pretraining performance is approximately 11% lower when switching to using ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. 
As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 through 2.9 although there will be end-of-support warnings (as noted below).\n\n* Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (see the warning raised below). Switch to automatic mixed-precision or use ``model.to(torch.bfloat16)`` command to cast model to BF16. (see :ref:`migration_from_xla_downcast_bf16`).\n\n.. code:: bash\n\n   Warning: ``XLA_DOWNCAST_BF16`` will be deprecated after the 2.5 release, please downcast your model directly\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.device instead``. This is a warning that ``torch_xla.core.xla_model.xla_device()`` is deprecated. Switch to using ``torch_xla.device()`` instead.\n\n* [PyTorch v2.8+] ``DeprecationWarning: Use torch_xla.sync instead``. This is a warning that ``torch_xla.core.xla_model.mark_step()`` is deprecated. Switch to using ``torch_xla.sync()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'xrt_world_size'``. This is an error that notes that ``torch_xla.core.xla_model.xrt_world_size()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.world_size()`` instead.\n\n* [PyTorch v2.7+] ``AttributeError: module 'torch_xla.core.xla_model' ... does not have the attribute 'get_ordinal'``. This is an error that notes that ``torch_xla.core.xla_model.get_ordinal()`` is removed in torch-xla version 2.7+. Switch to using ``torch_xla.runtime.global_ordinal()`` instead.\n\n* [PyTorch v2.5+] ``AttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'``. In Torch-XLA 2.5+, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime. See this `PyTorch commit PR on GitHub <https://github.com/pytorch/xla/commit/d6fb5391d09578c8804b1331a5e7a4f72bf981db>`_.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-26-0-pytorch`\n* :ref:`pytorch-neuron-rn`\n"
  },
  {
    "path": "release-notes/prev/2.27.0/nxd-inference.rst",
    "content": ".. _neuron-2-27-0-nxd-inference:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Transformers for Inference component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: NxD Inference release notes\n==================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\n**Trn3 Platform Support** — Added support for running NxD Inference on Trn3 instances.\n\n**vLLM V1 support** - This release adds support for vLLM V1 through  `vllm-neuron <https://github.com/vllm-project/vllm-neuron>`_ plugin. You can use the vLLM V1 by using the new vLLM V1 based Neuron DLC or using the vLLM virtual environment in Neuron DLAMIs. See :ref:`vLLM V1 guide <nxdi-vllm-user-guide-v1>` for more information.\n\n**Qwen3 MoE Model Support (Beta)** — NxD Inference supports Qwen3 MoE language model which supports multilingual text inputs. You can use HuggingFace checkpoint. For more information about how to run Qwen3 MoE inference, see :doc:`Tutorial: Qwen3 MoE Inference </libraries/nxd-inference/tutorials/qwen3-moe-tutorial>`.\n\nCompatible models include:\n\n* `Qwen3-235B-A22B <https://huggingface.co/Qwen/Qwen3-235B-A22B>`_\n\n**Pixtral Model Support (Beta)** — NxD Inference supports Pixtral image understanding model which processes text and image inputs. You can use HuggingFace checkpoint. For more information about how to run Pixtral inference, see :doc:`Tutorial: Deploy Pixtral Large on Trn2 instances </libraries/nxd-inference/tutorials/pixtral-tutorial>`.\n\nCompatible models include:\n\n* `Pixtral-Large-Instruct-2411 <https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411>`_\n\nKnown Issues\n------------\n\n* Pixtral deployment is supported up to batch size 32 and sequence length 10240 with vLLM v0. vLLM v1 deployment supports up to batch size 4 and sequence length 10240.\n* The performance of Qwen3 MoE and Pixtral on Trn2 is not fully optimized. We will address the issues in the future release.\n* The vllm-neuron plugin source code in github is currently not compatible with 2.27 SDK. Customers are advised to use inference DLAMI and DLC published with 2.27.0 SDK for vLLN V1 support. vllm-neuron github repo source code\n  will be updated soon to be compatible with 2.27 release SDK.\n\n"
  },
  {
    "path": "release-notes/prev/2.27.0/runtime.rst",
    "content": ".. _neuron-2-27-0-runtime:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Runtime component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Neuron Runtime release notes\n===================================================\n\n**Date of release**:  December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nReleased versions\n-----------------\n\n- Neuron Driver: ``2.25.4.0``\n- Neuron Runtime Library: ``2.29.40.0``\n- Neuron Collectives: ``2.29.41.0``\n\nCompatibility Support Tables\n----------------------------\n**Runtime Version:** 2.29.40.0\n\nThe Neuron runtime was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.14              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Ubuntu        U24            6.14              2.39\n``Trn1``                    Ubuntu        U22            6.8               2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    Ubuntu        U24            6.14              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n=========================== ============= ============== ================= ===============\n\n**Driver Version:** 2.25.4.0\n\nThe Neuron driver was tested for the following EC2 instances and configurations:\n\n=========================== ============= ============== ================= ===============\nInstance Family               OS Type       OS Version     Kernel Version    GLIBC Version\n=========================== ============= ============== ================= ===============\n``Inf2``                    Ubuntu        U24            6.14              2.39\n``Inf2``                    Ubuntu        U22            6.8               2.35\n``Inf2``                    Rocky Linux   RL9            5.14              2.34\n``Inf2``                    Amazon Linux  AL2023         6.12              2.34\n``Inf2``                    Amazon Linux  AL2            5.10              2.26\n``Trn1``                    Ubuntu        U24            6.14              2.39\n``Trn1``                    Ubuntu        U22            6.8               2.35\n``Trn1``                    Rocky Linux   RL9            5.14              2.34\n``Trn1``                    Amazon Linux  AL2023         6.12              2.34\n``Trn1``                    Amazon Linux  AL2            5.10              2.26\n``Trn2``                    Ubuntu        U24            6.14              2.39\n``Trn2``                    Ubuntu        U22            6.8               2.35\n``Trn2``                    Amazon Linux  AL2023         6.12              2.34\n``Trn2``                    
Amazon Linux  AL2            5.10              2.26\n=========================== ============= ============== ================= ===============\n\nWhat's New\n----------\n\nNeuron Runtime Library 2.29.40.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Added support for Trainium3 (single node mode)\n\nNeuron Driver 2.25.4.0\n^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Added support for Trainium3\n\nImprovements\n------------\n\nNeuron Runtime Library 2.29.40.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Reduced the overhead of reprogramming the Collectives Engine by up to 100x for NEFFs compiled with the ``-O1`` flag. This improves end-to-end performance of these NEFFs by up to 15%.\n* Reduced NeuronCore branch overhead by up to 3x, decreasing the overhead of starting a NEFF program by up to 5%.\n* Reduced the overhead of starting a NEFF program by up to 50% with an on-device hardware barrier between ranks.\n* Improved all-gather latency by up to 35% for messages greater than 1MB in TP8 (LNC2) and TP16 (LNC1) collectives.\n* Added support for :ref:`NRT Debug Stream APIs <nrt-debug-stream-api>`.\n\nBug fixes\n---------\n\nNeuron Runtime Library 2.29.40.0\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Fixed scratchpad page allocation bug that caused excessive page allocations due to page rounding error.\n* Fixed segfault that occurred when freeing an empty tensor.\n\nPrevious release notes\n----------------------\n\n* :ref:`neuron-2-26-0-runtime`"
  },
  {
    "path": "release-notes/prev/2.27.0/tools.rst",
    "content": ".. _neuron-2-27-0-tools:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK Developer Tools component, version 2.27.0. Release date: 12/19/2025.\n\nAWS Neuron SDK 2.27.0: Developer Tools Release Notes\n====================================================\n\n**Date of release**: December 19, 2025\n\n.. contents:: In this release\n   :local:\n   :depth: 2\n\n* Go back to the :ref:`AWS Neuron 2.27.0 release notes home <neuron-2-27-0-whatsnew>`\n\nWhat's New\n----------\n\nIntroducing Neuron Explorer\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Introduces :doc:`Neuron Explorer </tools/neuron-explorer/index>`, a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. This release adds device profiling support, with new tools like :doc:`Hierarchy Viewer </tools/neuron-explorer/overview-hierarchy-view>`, :doc:`AI Recommendation Viewer </tools/neuron-explorer/overview-ai-recommendations>`, :doc:`Source Code Viewer </tools/neuron-explorer/how-to-link-view-source-code>`, and :doc:`Summary Viewer </tools/neuron-explorer/overview-summary-page>`.\n* Introduced Neuron Explorer UI, CLI, and IDE integration via VSCode\n\nNeuron Explorer includes device profiling support with the following tools:\n\n* :doc:`Hierarchy Viewer </tools/neuron-explorer/overview-hierarchy-view>` — Visualizes the hierarchical structure of your model, allowing you to understand how different components interact and contribute to overall performance\n* :doc:`AI Recommendation Viewer </tools/neuron-explorer/overview-ai-recommendations>` — Provides AI-driven recommendations for optimizing your model based on profiling data, helping you identify bottlenecks and areas for improvement\n* :doc:`Source Code Viewer </tools/neuron-explorer/how-to-link-view-source-code>` — Links profiling data back to your source code, enabling you to quickly identify and address performance issues in your codebase\n* :doc:`Summary Viewer </tools/neuron-explorer/overview-summary-page>` — Offers a high-level overview of your model's performance metrics, resource utilization, and optimization opportunities\n\n* Trn3 support for ``neuron-monitor``, ``neuron-top``, ``neuron-ls``, and ``nccom-test``.\n\nNew Tutorials\n^^^^^^^^^^^^^\n\nNeuron 2.27.0 introduces :doc:`Neuron Explorer </tools/neuron-explorer/index>`, a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. 
Neuron Explorer provides insights into model performance, resource utilization, and optimization opportunities, helping developers to fine-tune their models for optimal performance on Trainium instances.\n\nThis release introduces enhanced in-UI performance, simplified setup, and key features for device profiling:  \n\n- **Hierarchy Viewer**: Visualizes the hierarchical structure of your model, allowing you to understand how different components interact and contribute to overall performance.\n- **AI Recommendation Viewer**: Provides AI-driven recommendations for optimizing your model based on profiling data, helping you identify bottlenecks and areas for improvement.\n- **Source Code Viewer**: Links profiling data back to your source code, enabling you to quickly identify and address performance issues in your codebase.\n- **Summary Viewer**: Offers a high-level overview of your model's performance metrics, resource utilization, and optimization opportunities.\n- Added tutorials: :doc:`How to Profile a NKI Kernel </nki/guides/use-neuron-profile>`, :doc:`Profiling Multi-Node Training Jobs </tools/neuron-explorer/how-to-profile-workload>`, and :doc:`Profiling a vLLM Inference Workload </tools/tutorials/performance-profiling-vllm>`\n\n.. note::\n\n   Neuron Explorer is in active development! At this time, it does not support system level profiling. For a stable user experience and system profiling, see :ref:`Neuron Profiler 2.0 <neuron-profiler-2-0-guide>` and :ref:`Neuron Profiler <neuron-profile-ug>`.\n"
  },
  {
    "path": "release-notes/prev/2.27.1.rst",
    "content": ".. meta::\n    :description: Release notes for AWS Neuron SDK release v2.7.1\n    :date-modified: 01/14/2026\n\n\nAWS Neuron SDK Release Notes - v2.27.1\n=======================================\n\n**Release Date**: January 14, 2026   \n\nRelease **2.27.1** of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.27.0. See :ref:`the Neuron SDK v2.27.0 release notes <neuron-2-27-0-whatsnew>` for the full set of changes that shipped with the 2.27.0 release.\n\nWhat's Changed?\n----------------\n\n**Neuron DLAMIs**\n\n* Support for NKI has been added to all DLAMI virtual environments.\n\n\nBug Fixes\n----------\n\n**NxD Inference**\n\n* Fixed stability issue affecting Llama 4 that may occur when changing model configuration.\n* Removed a debug print statement from the Qwen3-MoE model implementation.\n\n----\n\nFor information about known issues in Neuron DLCs, see the :doc:`Neuron DLC component release notes </release-notes/components/containers>`.\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.27.1 release artifacts**\n                ^^^\n                The libraries and packages updated in this Neuron release.\n"
  },
  {
    "path": "release-notes/prev/2.28.0.rst",
    "content": ".. _neuron-2-28-0-whatsnew:\n\n.. meta::\n   :description: The official release notes for the AWS Neuron SDK, version 2.28.0. Release date: 02/26/2026.\n\nAWS Neuron SDK 2.28.0 release notes\n===================================\n\n**Date of release**: February 26, 2026\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   PyTorch (torch-neuronx) </release-notes/components/pytorch>\n   NxD Inference/vLLM  </release-notes/components/nxd-inference>\n   NKI </release-notes/components/nki>\n   NKI Library </release-notes/components/nki-lib>\n   Neuron Runtime </release-notes/components/runtime>\n   Developer tools </release-notes/components/dev-tools>\n   Deep Learning AMIs </release-notes/components/dlamis>\n   Deep Learning Containers </release-notes/components/containers>\n\nThis page provides detailed component release notes for the Neuron SDK 2.28.0. For a an overview of the release content, see :ref:`What's New in AWS Neuron <whats-new-2026-02-26-v2_28>`.\n\n.. note::\n        On March 13, 2026, Neuron released a patch version for the Neuron SDK, version 2.28.1. Read about what changed here: [:doc:`/release-notes/prev/2.28.1`]\n\nPackage and Library Updates\n---------------------------\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.28.0 release artifacts**\n                ^^^\n                The libraries and packages updated in this Neuron release.\n\nComponent Release Notes\n-----------------------\n\nSelect a card below to review detailed release notes for each component of the Neuron SDK version 2.28.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.\n\n* For the full set of component release notes across Neuron versions, see :doc:`/release-notes/components/index`.\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card:: \n                :link: /release-notes/components/pytorch\n                :link-type: doc\n\n                **PyTorch Neuron (torch-neuronx)** 2.28.0 release notes\n                ^^^\n                Intergated, native support for PyTorch on Neuron.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: /release-notes/components/nxd-inference\n                :link-type: doc\n\n                **NxD Inference** 2.28.0 release notes\n                ^^^\n                Neuron features and tools for LLM and agent ML model inference, and the vLLM Plugin for Neuron.\n                +++\n                Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n        .. grid-item-card:: \n                :link: /release-notes/components/nki\n                :link-type: doc\n\n                **Neuron Kernel Interface (NKI)** 2.28.0 release notes\n                ^^^\n                Neuron's Python-based programming interface for developing and optimizing Neuron kernels.\n                +++\n                Supports: ``Inf2``, ``Trn1``, ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. 
grid-item-card:: \n                :link: /release-notes/components/nki-lib\n                :link-type: doc\n\n                **NKI Library (NKI-Lib)** 2.28.0 release notes\n                ^^^\n                Reference kernels and utilities for Neuron kernel development with NKI.\n                +++\n                Supports: ``Inf2``, ``Trn1``, ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: /release-notes/components/runtime\n                :link-type: doc\n\n                **Neuron Runtime** 2.28.0 release notes\n                ^^^\n                The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: /release-notes/components/dev-tools\n                :link-type: doc\n\n                **Neuron Developer Tools** 2.28.0 release notes\n                ^^^\n                Tools that support end-to-end development for AWS Neuron, including Neuron Explorer.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``, ``Trn3``\n\n        .. grid-item-card:: \n                :link: /release-notes/components/dlamis\n                :link-type: doc\n\n                **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.28.0 release notes\n                ^^^\n                AWS-specific machine images for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n \n        .. grid-item-card:: \n                :link: /release-notes/components/containers\n                :link-type: doc\n\n                **Neuron Deep Learning Containers (DLCs)** 2.28.0 release notes\n                ^^^\n                AWS-specific container definitions for building and deploying Neuron-based ML solutions.\n                +++\n                Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\nPrevious releases\n-----------------\n\n* :doc:`Neuron 2.27.0 </release-notes/prev/2.27.0/index>`\n* :doc:`Neuron 2.26.0 </release-notes/prev/2.26.0/index>`\n* :doc:`Neuron 2.25.0 </release-notes/prev/2.25.0/index>`\n* :doc:`Earlier releases </release-notes/prev/rn>`\n\n* :ref:`prev-rn`\n* :ref:`pre-release-content`\n* :ref:`prev-n1-rn`"
  },
  {
    "path": "release-notes/prev/2.28.1.rst",
    "content": ".. meta::\n    :description: Release notes for AWS Neuron SDK release v2.8.1\n    :date-modified: 03/13/2026\n\n\nAWS Neuron SDK Release Notes - v2.28.1\n=======================================\n\n**Release Date**: March 13, 2026   \n\nRelease **2.28.1** of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.28.0. See :ref:`the Neuron SDK v2.28.0 release notes <neuron-2-28-0-whatsnew>` for the full set of changes that shipped with the 2.28.0 release.\n\n\nBug Fixes\n----------\n\nNeuron Custom C++ Operators Library\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n* Fixed a package dependency issue in version ``0.20.4`` of ``aws-neuronx-gpsimd-customop-lib``, which was released as part of Neuron version 2.28.0.\n\nNeuron Driver\n~~~~~~~~~~~~~~~~\n\n* Fixed a Neuron Runtime driver compatibility issue with Linux kernel 6.18.\n\n----\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: latest-neuron-release-artifacts\n                :link-type: ref\n                :class-card: sd-border-1\n        \n                **Neuron 2.28.1 release artifacts**\n                ^^^\n                The libraries and packages updated in this Neuron release.\n"
  },
  {
    "path": "release-notes/prev/content.rst",
    "content": ".. _pre-release-content:\n\nPrevious release artifacts (Neuron 2.x)\n=======================================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n.. _neuron-2.28.1-artifacts:\n\nNeuron 2.28.1 (03/13/2026)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.1\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports NumPy versions 2.X. Neuron continues to support NumPy versions >= 1.21.6, as well.\n\nSupported vLLM Versions\n^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports vLLM version 0.13.0.\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | >= 4.52                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed-inference    | >= 4.57                          |\n+----------------------------------+----------------------------------+\n| vllm                             | >= 4.56.0, < 5                   |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx 
              | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.28.0-artifacts:\n\nNeuron 2.28.0 (02/26/2026)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.28.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports NumPy versions 2.X. Neuron continues to support NumPy versions >= 1.21.6, as well.\n\nSupported vLLM Versions\n^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports vLLM version 0.13.0.\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | >= 4.52                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed-inference    | >= 4.57                          |\n+----------------------------------+----------------------------------+\n| vllm                             | >= 4.56.0, < 5                   |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          
|\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.27.1-artifacts:\n\nNeuron 2.27.1 (01/14/2026)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.1\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron currently supports NumPy versions 2.X. 
Neuron continues to support NumPy versions >= 1.21.6, as well.\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.27.0-artifacts:\n\nNeuron 2.27.0 (12/19/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.27.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron currently supports NumPy versions 2.X. Neuron continues to support NumPy versions >= 1.21.6, as well.\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.26.1-artifacts:\n\nNeuron 2.26.1 (10/29/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.26.1\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.25.0-artifacts:\n\nNeuron 2.25.0 (07/31/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.25.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          
|\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.24.0-artifacts:\n\nNeuron 2.24.0 (06/24/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.24.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      
|\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.23.0-artifacts:\n\nNeuron 2.23.0 (05/20/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.23.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.22.0-artifacts:\n\nNeuron 2.22.0 (04/03/2025)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.22.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n| nemo-megatron                    | 4.31.0                           |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           
|\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n.. _neuron-2.21.0-artifacts:\n\nNeuron 2.21.0 (10/25/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.21.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n| nemo-megatron                    | 4.31.0                           |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              
|\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\n\n\n\n.. _neuron-2.20.2.beta-artifacts:\n\nNeuron 2.20.2 (11/20/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.2\n\nInf2 packages\n^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.2\n\nInf1 packages\n^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.2\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.2\n\nSupported Python Versions for Inf2/Trn1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.2\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron supports versions >= 1.21.6 and <= 1.22.2\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | < 4.35 and >=4.37.2              |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 4.36.0                        |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Llama      | 4.31                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - GPT NeoX   | 4.26                             |\n| model class                      |                                  |\n+----------------------------------+----------------------------------+\n| neuronx-distributed - Bert model | 4.26                             |\n| class                            |                                  |\n+----------------------------------+----------------------------------+\n| nemo-megatron                    | 4.31.0                           |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| transformers-neuronx             | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n\nSupported Linux Kernel Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nNeuron Driver (``aws-neuronx-dkms``) supports Linux kernel versions >= 5.10\n\n\n\nNeuron 2.20.1 (10/25/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.1\n\nNeuron 2.20.0 (09/16/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.20.0\n\nNeuron 2.19.1 (07/19/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.1\n\nNeuron 2.19.0 (07/03/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.19.0\n\nNeuron 2.18.2 (04/25/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.2\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.2\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.2\n\n\nNeuron 2.18.1 (04/10/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.1\n\nNeuron 2.18.0 (04/01/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.18.0\n\n\nNeuron 2.17.0 (02/13/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.17.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.17.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.17.0\n\n\nNeuron 2.16.1 (01/18/2024)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.1\n\n\nNeuron 2.16.0 (12/21/2023)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.16.0\n\n\n\nNeuron 2.15.2 (11/17/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.2\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.2\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.2\n\n\nNeuron 2.15.1 (11/09/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.1\n\n\nNeuron 2.15.0 (10/26/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.15.0\n\n\n\nNeuron 2.14.1 (09/26/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.1\n\n\n\nNeuron 2.14.0 (09/15/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.14.0\n\n\nNeuron 2.13.2 (09/01/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.2\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.2\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.2\n\n\nNeuron 2.13.1 (08/29/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.1\n\n\nNeuron 2.13.0 (08/28/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.13.0\n\n\nNeuron 2.12.2 (08/20/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.2\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.2\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.2\n\n\nNeuron 2.12.1 (08/09/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.1\n\n\nNeuron 2.12.0 (07/19/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.12.0\n\n\nNeuron 2.11.0 (06/14/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.11.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.11.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.11.0\n\n\nNeuron 2.10.0 (05/01/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.10.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.10.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.10.0\n\n\n\nNeuron 2.9.1 (04/19/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.1\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.1\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.1\n\n\n\nNeuron 2.9.0 (03/28/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.9.0\n\n\n\nNeuron 2.8.0 (02/24/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.8.0\n\nInf2 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.8.0\n\nInf1 packages\n^^^^^^^^^^^^^\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.8.0\n\n\n\nNeuron 2.7.0 (02/08/2023)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.7.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.7.0\n\nNeuron 2.6.0 (12/12/2022)\n--------------------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n* ``aws-neuronx-dkms-2.6.33.0``\n* ``aws-neuronx-oci-hook-2.1.14.0``\n* ``aws-neuronx-runtime-lib-2.10.30.0``\n* ``aws-neuronx-collectives-2.10.37.0``\n* ``aws-neuronx-tools-2.6.1.0``\n* ``aws-neuronx-k8-plugin-2.1.12.0``\n* ``aws-neuronx-k8-scheduler-2.1.12.0``\n* ``tensorboard_plugin_neuronx-2.5.3.0``\n* ``neuronx-cc-2.3.0.4``\n* ``torch-neuronx-1.12.0.1.4.0``\n* ``tensorflow-model-server-neuronx_1.15.0.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.5.4.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.6.3.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.7.0.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.8.0.2.5.6.0``\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.6.0\n\nNeuron 2.5.0 (11/23/2022)\n-------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n* ``aws-neuronx-dkms-2.6.33.0``\n* ``aws-neuronx-oci-hook-2.1.14.0``\n* ``aws-neuronx-runtime-lib-2.10.27.0``\n* ``aws-neuronx-collectives-2.10.34.0``\n* ``aws-neuronx-tools-2.5.19.0``\n* ``aws-neuronx-k8-plugin-2.1.12.0``\n* ``aws-neuronx-k8-scheduler-2.1.12.0``\n* ``neuronx-cc-2.2.0.73``\n* ``torch-neuronx-1.11.0.1.2.0``\n* ``tensorflow-model-server-neuronx_1.15.0.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.5.4.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.6.3.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.7.0.2.5.6.0``\n* ``tensorflow-model-server-neuronx_2.8.0.2.5.6.0``\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.5.0\n   \n\nNeuron 2.4.0 (10/27/2022)\n--------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n* ``aws-neuronx-dkms-2.6.5.0``\n* ``aws-neuronx-oci-hook-2.1.1.0``\n* ``aws-neuronx-runtime-lib-2.10.15.0``\n* ``aws-neuronx-collectives-2.10.17.0``\n* ``aws-neuronx-tools-2.5.16.0``\n* ``aws-neuronx-k8-plugin-2.1.2.0``\n* ``aws-neuronx-k8-scheduler-2.1.2.0``\n* ``neuronx-cc-2.2.0.73``\n* ``torch-neuronx-1.11.0.1.2.0``\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.4.0\n\n\nNeuron 2.3.0 (10/10/2022)\n-------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n* ``aws-neuronx-dkms-2.5.41.0``\n* ``aws-neuronx-oci-hook-2.0.16.0``\n* ``aws-neuronx-runtime-lib-2.9.64.0``\n* ``aws-neuronx-collectives-2.9.86.0``\n* ``aws-neuronx-tools-2.4.14.0``\n* ``aws-neuronx-k8-plugin-2.0.1.0``\n* ``aws-neuronx-k8-scheduler-2.0.1.0``\n* ``neuronx-cc-2.1.0.76``\n* ``torch-neuronx-1.11.0.1.1.1``\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --list packages --neuron-version=2.3.0\n"
  },
  {
    "path": "release-notes/prev/rn.rst",
    "content": ".. _prev-rn:\n\nPrevious release notes (Neuron 2.x)\n====================================\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Neuron 2.28.1 </release-notes/prev/2.28.1>\n   Neuron 2.28.0 </release-notes/prev/2.28.0> \n   Neuron 2.27.1 </release-notes/prev/2.27.1> \n   Neuron 2.27.0 </release-notes/prev/2.27.0/index>\n   Neuron 2.26.1 </release-notes/prev/2.26.1>\n   Neuron 2.26.0 </release-notes/prev/2.26.0/index>\n   Neuron 2.25.0 </release-notes/prev/2.25.0/index>\n\n* **The latest Neuron release is 2.29.0, released on 04/09/2026.** Read the :doc:`2.29.0 release notes </release-notes/2.29.0>` or :doc:`the individual Neuron component release notes </release-notes/components/index>` for more details.\n  \n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\n.. grid:: 1 \n        :gutter: 2\n\n        .. grid-item-card::\n                :link: /release-notes/components/index\n                :link-type: doc\n                :class-card: sd-border-1\n        \n                **Neuron component release notes**\n                ^^^\n                Release notes by component for prior Neuron SDK versions.\n\n\nNeuron 2.28.0 (02/26/2026)\n--------------------------\n\nSee :ref:`neuron-2-28-0-whatsnew` for the full Neuron 2.28.0 release notes or :doc:`the individual Neuron component release notes </release-notes/components/index>`.\n\n* Neuron 2.28.1 was released as a patch for 2.28.0 on 3/13/2026. See the :doc:`2.28.1 (patch) release notes <2.28.1>` for details.\n\nNeuron 2.27.0 (12/19/2025)\n--------------------------\n\nSee :ref:`neuron-2-27-0-whatsnew` for the full Neuron 2.27.0 release notes or :doc:`the individual Neuron component release notes </release-notes/components/index>`.\n\n* Neuron 2.27.1 was released as a patch for 2.27.0 on 1/26/2026. See the :doc:`2.27.1 (patch) release notes <2.27.1>` for details.\n\nNeuron 2.26.1 (10/29/2025)\n--------------------------\n\nSee :doc:`2.26.1` for the updated Neuron 2.26.1 release notes or :doc:`the individual Neuron component release notes </release-notes/components/index>`.\n\nNeuron 2.26.0 (09/18/2025)\n--------------------------\n\nSee :ref:`neuron-2-26-0-whatsnew` for the full Neuron 2.26.0 release notes or :doc:`the individual Neuron component release notes </release-notes/components/index>`.\n\nNeuron 2.25.0 (07/31/2025)\n--------------------------\n\nSee :ref:`neuron-2-25-0-whatsnew` for the full Neuron 2.25.0 release notes or :doc:`the individual Neuron component release notes </release-notes/components/index>`.\n\n.. _neuron-2-24-1-whatsnew:\n\nNeuron 2.24.1 (06/30/2025)\n--------------------------\n\nNeuron version 2.24.1 resolves an installation issue that could prevent NeuronX Distributed Training from being installed successfully.\n\n.. _neuron-2-24-0-whatsnew:\n\nNeuron 2.24.0 (06/24/2025)\n--------------------------\n\nNeuron version 2.24 introduces new inference capabilities including prefix caching, disaggregated inference (Beta), and context parallelization support (Beta). This release also includes NKI language enhancements and enhanced profiling visualizations for improved debugging and performance analysis. Neuron 2.24 adds support for PyTorch 2.7 and JAX 0.6, updates existing DLAMIs and DLCs, and introduces a new vLLM inference container.\n\n.. 
contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhat's New\n^^^^^^^^^^\n\nNxD Inference (NxDI) includes the following enhancements:\n\n- **Prefix caching**: Improves Time To First Token (TTFT) by up to 3x when processing common shared prompts across requests.\n- **Disaggregated inference (Beta)**: Uses 1P1D (1 Prefill, 1 Decode) architecture to reduce prefill-decode interference and improve goodput.\n- **Context parallelism (Beta)**: Improves TTFT for longer sequence lengths by processing context encoding in parallel across multiple NeuronCores.\n- **Model support**: Added beta support for Qwen 2.5 text models.\n- **NxD Inference Library**: Upgraded to support PyTorch 2.7 and Transformers 4.48.\n\nHugging Face Optimum Neuron 0.2.0 now supports PyTorch-based NxD Core backend for LLM inference, simplifying the implementation of new PyTorch model architectures. Models including Llama 3.1-8B and Llama-3.3-70B have migrated from Transformers NeuronX to the NxD backend.\n\nTraining\n^^^^^^^^\n\n**Library Upgrades**\n\n- **NxD Training  (NxDT) Library**: Upgraded to support PyTorch 2.7 and Transformers 4.48.\n- **JAX Training Support**: Upgraded to JAX 0.6.0.\n\nNeuron Kernel Interface (NKI)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- **New nki.language.gather_flattened**: Provides efficient parallel tensor element gathering.\n- **Enhanced accuracy**: Improved valid range of ``nki.language.sqrt`` and ``nki.isa.activation(nl.sqrt)`` \n- **Advanced indexing**: Improved performance for ``nki.isa.nc_match_replace8``.\n\nNeuron Tools\n^^^^^^^^^^^^\n\n**Neuron Profiler Enhancements**\n\n- **Framework stack traces**: Maps device instructions to model source code.\n- **Scratchpad memory usage visualization**: Shows tensor-level memory usage over time with HLO name association.\n- **On-device collectives barriers**: Identifies synchronization overhead.\n- **HBM throughput visualization**: Tracks data movement involving High Bandwidth Memory (HBM) over time.\n\n**NCCOM-TEST Improvements**\n\n- Added ``--report-to-json-file`` flag: Outputs results in JSON format.\n- Added ``--show-input-output-size`` flag: Explicitly displays input and output sizes based on operations.\n\nNeuron Deep Learning Containers (DLCs)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- Updated containers with PyTorch 2.7 support for inference and training.\n- Added new inference container with NxD Inference and vLLM with FastAPI.\n- JAX DLCs now support JAX 0.6.0 training.\n\nNeuron Deep Learning AMIs (DLAMIs)\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- Updated MultiFramework DLAMIs to include PyTorch 2.7 and JAX 0.6.0.\n- Added new Single Framework DLAMIs for PyTorch 2.7 and JAX 0.6.0.\n\nNeuron 2.24 Feature Release Notes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - NxD Core (neuronx-distributed) \n     - * :ref:`nxd-core_rn`   \n     - ``Trn1`` / ``Trn1n``, ``Trn2``\n\n   * - NxD Inference (neuronx-distributed-inference)\n     - * :ref:`nxd-inference_rn` \n     - ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n   * - NxD Training (neuronx-distributed-training)\n     - * :ref:`nxd-training_rn` \n     - ``Trn1`` / ``Trn1n``, ``Trn2``\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * :ref:`pytorch_rn`\n     - ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n   * - Neuron Compiler (neuronx-cc)\n     - * :ref:`compiler_rn`\n     - ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``\n\n   * - Neuron Kernel Interface (NKI)\n     - * :ref:`nki_rn`\n     - ``Inf2``, ``Trn1``/ ``Trn1n``\n\n   * - Neuron Tools\n     - * :ref:`dev-tools_rn`\n     - ``Inf1``, ``Inf2``, ``Trn1``/ ``Trn1n``\n\n   * - Neuron Runtime\n     - * :ref:`runtime_rn`\n     - ``Inf1``, ``Inf2``, ``Trn1``/ ``Trn1n``\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * :ref:`nxd-inference_rn` \n     - ``Inf2``, ``Trn1`` / ``Trn1n``\n\n   * - Neuron Deep Learning AMIs (DLAMIs)\n     - * :ref:`neuron-dlami-overview`\n     - ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n   * - Neuron Deep Learning Containers (DLCs)\n     - * :ref:`containers_rn`\n     - ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``\n\n   * - Release Announcements\n     - * :ref:`announce-no-longer-support-beta-pytorch-neuroncore-placement-apis`\n       * :ref:`announce-eos-block-dimension-nki`\n       * :ref:`announce-eos-tensorflow-tutorial`\n       * :ref:`announce-eos-tnx`\n       * :ref:`announce-eos-longer-support-xla-bf16-vars`\n       * :ref:`announce-no-longer-support-llama-32-meta-checkpoint`\n       * :ref:`announce-no-longer-support-nki-jit`\n       * See more at :ref:`announcements-main`.\n     - ``Inf1``, ``Inf2``, ``Trn1``/ ``Trn1n``\n\n.. _neuron-2.23.0-whatsnew:\n\nNeuron 2.23.0 (05/20/2025)\n---------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhat's New\n^^^^^^^^^^\n\nWith the Neuron 2.23 release, we move the NxD Inference (NxDI) library out of beta. It is now recommended for all multi-chip inference use-cases. In addition, Neuron has new training capabilities, including Context Parallelism and ORPO, NKI improvements (new operators and ISA features), and new Neuron Profiler debugging and performance analysis optimizations. Finally, Neuron now supports :ref:`PyTorch 2.6 <introduce-pytorch-2-6>` and JAX 0.5.3.\n\nInference: NxD Inference (NxDI) moves from beta to GA. NxDI now supports Persistent Cache to reduce compilation times, and optimizes model loading with improved weight sharding performance.\n\nTraining: NxD Training (NxDT) added Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. NxDT now supports ORPO model alignment using DPO-style datasets. NxDT has upgraded support for third-party libraries, specifically PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.\n\nNeuron Kernel Interface (NKI): New support for 32-bit integer nki.language.add and nki.language.multiply on GPSIMD Engine. NKI.ISA improvements include range_select for Trainium2, fine-grained engine control, and enhanced tensor operations. New performance tuning API ``no_reorder`` has been added to enable user-scheduling of instructions. 
When combined with allocation, this enables software pipelining. Language consistency has been improved for arithmetic operators (``+=``, ``-=``, ``/=``, ``*=``) across loop types, PSUM, and SBUF.\n\nNeuron Profiler: Profiling performance has improved, allowing users to view profile results 5x faster on average. New features include timeline-based error tracking and JSON error event reporting, supporting execution and OOB error detection. Additionally, this release improves multiprocess visualization with Perfetto. \n\nNeuron Monitoring: Added Kubernetes context information (pod_name, namespace, and container_name) to neuron monitor prometheus output, enabling resource utilization tracking by pod, namespace, and container.\n\nNeuron DLCs: This release updates containers with PyTorch 2.6 support for inference and training. For JAX DLC, this release adds JAX 0.5.0 training support.\n\nNeuron DLAMIs: This release updates MultiFramework AMIs to include PyTorch 2.6, JAX 0.5, and TensorFlow 2.10 and Single Framework AMIs for PyTorch 2.6 and JAX 0.5.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - NxD Core (neuronx-distributed) \n     - * :ref:`nxd-core_rn`   \n     - Trn1/Trn1n,Trn2\n\n   * - NxD Inference (neuronx-distributed-inference)\n     - * :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n,Trn2\n\n   * - NxD Training (neuronx-distributed-training)\n     - * :ref:`nxd-training_rn` \n     - Trn1/Trn1n,Trn2\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n\n   * - Neuron Kernel Interface (NKI)\n     - * :ref:`nki_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Tools\n     - * :ref:`dev-tools_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Neuron Runtime\n     - * :ref:`runtime_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - Neuron Deep Learning AMIs (DLAMIs)\n     - * :ref:`neuron-dlami-overview`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Deep Learning Containers (DLCs)\n     - * :ref:`containers_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Release Announcements\n     - * :ref:`announce-eos-block-dimension-nki`\n       * :ref:`announce-eos-mllama-checkpoint`\n       * :ref:`announce-eos-torch-neuronx-nki-jit`\n       * :ref:`announce-eos-xla-bf`\n       * :ref:`announce-eos-jax-neuronx-features`\n       * :ref:`announce-no-support-nemo-megatron`\n       * :ref:`announce-no-support-tensorflow-eos`\n       * :ref:`announce-u20-base-no-support`\n       * :ref:`announce-tnx-maintenance`\n       * :ref:`announce-eol-nxd-examples`\n       * See more at :ref:`announcements-main`\n     - Inf1, Inf2, Trn1/Trn1n\n\nFor detailed release artifacts, see :ref:`Release Artifacts <latest-neuron-release-artifacts>`.\n\n\n\n.. _neuron-2.22.1-whatsnew:\n\nNeuron 2.22.1 (05/12/2025)\n---------------------------\n\nNeuron 2.22.1 release includes a Neuron Driver update that resolves DMA abort errors on Trainium2 devices. These errors were previously occurring in the Neuron Runtime during specific workload executions.\n\n\n.. _neuron-2.22.0-whatsnew:\n\nNeuron 2.22.0 (04/03/2025)\n---------------------------\n\n.. 
contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhat's New\n^^^^^^^^^^\n\nThe Neuron 2.22 release includes performance optimizations, enhancements and new capabilities across the Neuron software stack. \n\nFor inference workloads, the NxD Inference library now supports Llama-3.2-11B model and supports multi-LoRA serving, allowing customers to load and serve multiple LoRA adapters. Flexible quantization features have been added, enabling users to specify which model layers or NxDI modules to quantize. Asynchronous inference mode has also been introduced, improving performance by overlapping Input preparation with model execution.\n\nFor training, we added LoRA supervised fine-tuning to NxD Training to enable additional model customization and adaptation.\n\nNeuron Kernel Interface (NKI): This release adds new APIs in nki.isa, nki.language, and nki.profile. These enhancements provide customers with greater flexibility and control.\n\nThe updated Neuron Runtime includes optimizations for reduced latency and improved device memory footprint. On the tooling side, the Neuron Profiler 2.0 (beta) has added UI enhancements and new event type support.\n\nNeuron DLCs: this release reduces DLC image size by up to 50% and enables faster build times with updated Dockerfiles structure. On the Neuron DLAMI side, new PyTorch 2.5 single framework DLAMIs have been added for Ubuntu 22.04 and Amazon Linux 2023, along with several new virtual environments within the Neuron Multi Framework DLAMIs.\n\n\nMore release content can be found in the table below and each component release notes.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - NxD Core (neuronx-distributed) \n     - * :ref:`nxd-core_rn`   \n     - Trn1/Trn1n,Trn2\n\n   * - NxD Inference (neuronx-distributed-inference)\n     - * :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n,Trn2\n\n   * - NxD Training (neuronx-distributed-training)\n     - * :ref:`nxd-training_rn` \n     - Trn1/Trn1n,Trn2\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n\n   * - NeuronX Nemo Megatron for Training\n     - * `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_  and  :ref:`neuronx-nemo-rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n\n   * - Neuron Kernel Interface (NKI)\n     - * :ref:`nki_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Tools\n     - * :ref:`dev-tools_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Neuron Runtime\n     - * :ref:`runtime_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - Neuron Deep Learning AMIs (DLAMIs)\n     - * :ref:`neuron-dlami-overview`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Deep Learning Containers (DLCs)\n     - * :ref:`containers_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Release Announcements\n     - * :ref:`announce-eos-neuron-det`\n       * :ref:`announce-eos-nxd-examples`\n       * :ref:`announce-python-eos`\n       * :ref:`announce-eos-pytorch-eos-113`\n       * :ref:`announce-eos-pytorch-2-1`\n       * :ref:`announce-u20-dlami-dlc-eos`\n       * :ref:`announce-no-support-torch-neuron`\n       * See more at :ref:`announcements-main`\n     - Inf1, Inf2, Trn1/Trn1n\n\nFor detailed release artifacts, 
see :ref:`Release Artifacts <latest-neuron-release-artifacts>`.\n\n.. _neuron-2.21.1-whatsnew:\n\nNeuron 2.21.1 (01/14/2025)\n---------------------------\n\nNeuron 2.21.1 release pins Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.\n\nAdditionally, this release addresses NxD Core and Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See :ref:`NxD Training Release Notes <nxd-training_rn>` (neuronx-distributed-training) for details.\n\nNxD Inference update includes minor bug fixes for sampling parameters. See :ref:`NxD Inference Release Notes <nxd-inference_rn>`.\n\nNeuron supported DLAMIs and DLCs have been updated to Neuron 2.21.1 SDK. Users should be aware of an incompatibility between Tensorflow-Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See :ref:`Neuron DLAMI Release Notes <dlamis_rn>`.\n\nThe Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.\n\n.. _neuron-2.21.0-whatsnew:\n\nNeuron 2.21.0 (12/20/2024)\n---------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 1\n\nWhat's New\n^^^^^^^^^^\n\n**Overview**: Neuron 2.21.0 introduces support for :ref:`AWS Trainium2 <trainium2-arch>` and\n:ref:`Trn2 instances <aws-trn2-arch>`, including the trn2.48xlarge instance type and Trn2\nUltraServer (Preview). The release adds new capabilities in both training and\ninference of large-scale models. It introduces :ref:`NxD Inference (beta) <introduce-nxd-inference>`, a\nPyTorch-based library for deployment, :ref:`Neuron Profiler 2.0 (beta) <neuron-profiler-2-0-guide>`, and\n:ref:`PyTorch 2.5 <introduce-pytorch-2-5>` support across the Neuron SDK, and :ref:`Logical NeuronCore\nConfiguration (LNC) <logical-neuroncore-config>` for optimizing NeuronCore allocation. The release\nenables :ref:`Llama 3.1 405B model inference <nxdi-trn2-llama3.1-405b-tutorial>` on a single trn2.48xlarge\ninstance.\n\n**NxD Inference**: :ref:`NxD Inference (beta) <nxdi-overview>` is a new PyTorch-based inference library for\ndeploying large-scale models on AWS Inferentia and Trainium instances.\nIt enables PyTorch model onboarding with minimal code changes and\nintegrates with :doc:`vLLM </libraries/nxd-inference/developer_guides/vllm-user-guide>`. NxDI supports various model architectures,\nincluding Llama versions for text processing (Llama 2, Llama 3, Llama\n3.1, Llama 3.2, and Llama 3.3), and Mixture-of-Experts (MoE) model architectures including\nMixtral and DBRX. The library supports quantization methods, includes\ndynamic sampling, and is compatible with HuggingFace checkpoints and\ngenerate() API. NxDI also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). 
The\nrelease includes :ref:`Llama 3.1 405B model sample <nxdi-trn2-llama3.1-405b-tutorial>`, :ref:`Llama 3.3 70B model sample <nxdi-trn2-llama3.3-70b-tutorial>` \nand :ref:`Llama 3.1 405B model with speculative decoding <nxdi-trn2-llama3.1-405b-speculative-tutorial>` for inference on a single trn2.48xlarge instance.\n\nFor more information, see :ref:`NxD Inference documentation <nxdi-overview>` and check the NxD\nInference Github repository: `aws-neuron/neuronx-distributed-inference <https://github.com/aws-neuron/neuronx-distributed-inference>`_\n\n**Transformers NeuronX (TNx)**: This release introduces several new features, including flash decoding support for speculative decoding, and on-device generation in speculative decoding flows. It adds :ref:`Eagle speculative decoding <cb-eagle-speculative-decoding>` with greedy and lossless sampling, as well as support for :ref:`CPU compilation <transformers_neuronx_readme>` and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm and control over concurrent compilation workers.\n\n**Training Highlights:** NxD Training in this release adds support for\nHuggingFace :ref:`Llama3/3.1 70B <hf_llama3_70B_pretraining>` on trn2 instances, introduces :doc:`DPO support </libraries/nxd-training/tutorials/hf_llama3_8B_DPO_ORPO>` for\npost-training model alignment, and adds support for Mixture-of-Experts\n(MoE) models including Mixtral 7B. The release includes improved\n:ref:`checkpoint conversion <checkpoint_conversion>` capabilities and supports MoE with Tensor,\nSequence, Pipeline, and Expert parallelism.\n\n**ML Frameworks:** Neuron 2.21.0 adds support for :ref:`PyTorch 2.5 <introduce-pytorch-2-5>` and \nJAX 0.4.35.\n\n.. note::\n  The CVEs\n  `CVE-2024-31583 <https://github.com/advisories/GHSA-pg7h-5qx3-wjr3>`__\n  and\n  `CVE-2024-31580 <https://github.com/advisories/GHSA-5pcm-hx3q-hm94>`__\n  affect PyTorch versions 2.1 and earlier. Based on Amazon’s analysis,\n  executing models on Trainium and Inferentia is not exposed to either of\n  these vulnerabilities. We recommend upgrading to the new version of\n  Torch-NeuronX by following the Neuron setup instructions.\n\n**Logical NeuronCore Configuration (LNC)**: This release introduces :ref:`LNC <logical-neuroncore-config>`\nfor Trainium2 instances, optimizing NeuronCore allocation for ML\napplications. LNC offers two configurations: default (LNC=2) combining\ntwo physical cores, and alternative (LNC=1) mapping each physical core\nindividually. This feature allows users to efficiently manage resources\nfor large-scale model training and deployment through runtime variables\nand compiler flags.\n\n**Neuron Profiler 2.0:** The new :ref:`profiler <neuron-profiler-2-0-guide>` provides system and\ndevice-level profiling, timeline annotations, container integration, and\nsupport for distributed workloads. It includes trace export capabilities\nfor Perfetto visualization and integration with JAX and PyTorch\nprofilers, and support for :ref:`Logical NeuronCore\nConfiguration (LNC) <logical-neuroncore-config>`.\n\n**Neuron Kernel Interface (NKI)**: NKI now supports Trainium2 including\n:ref:`Logical NeuronCore Configuration (LNC) <logical-neuroncore-config>`, adds SPMD capabilities for\nmulti-core operations, and includes new modules and APIs including\nsupport for float8_e5m2 datatype.\n\n**Deep Learning Containers (DLAMIs)**: This release expands support for\nJAX 0.4 within the :ref:`Multi Framework DLAMI <neuron-dlami-overview>`. 
It also introduces NxD Training, NxD Inference, and NxD Core with\n:ref:`PyTorch 2.5 <introduce-pytorch-2-5>` support. Additionally, a new Single Framework DLAMI for\nTensorFlow 2.10 on Ubuntu 22 is now available.\n\n**Deep Learning Containers (DLCs):** This release introduces new DLCs\nfor :doc:`JAX 0.4 </setup/jax/index>` training and PyTorch 2.5.1 inference and training. All DLCs\nhave been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC\nnow supports both NxD Inference and TNx libraries.\n\n**Documentation**: Documentation updates include architectural details\nabout Trainium2 and :ref:`NeuronCore-v3 <neuroncores-v3-arch>`, along with specifications and\ntopology information for the trn2.48xlarge instance type and Trn2\nUltraServer.\n\n**Software Maintenance**: This release includes the following  :ref:`announcements <announcements-main>`:\n\n-  Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release\n-  Announcing end of support for Neuron DET tool starting next release\n-  PyTorch Neuron versions 1.9 and 1.10 no longer supported\n-  Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release \n-  Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release\n-  Announcing end of support for Python 3.8 in future releases\n-  Announcing end of support for Ubuntu20 DLCs and DLAMIs\n\n**Amazon Q**: `Use Q Developer <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/amazonq-getstarted.html#amazon-q-dev>`__\nas your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.\n\n\nMore release content can be found in the table below and each component release notes.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.21.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * Flash decoding support for speculative decoding\n       * Added support for EAGLE speculative decoding with greedy and lossless sampling\n       * Enabled on-device generation support in speculative decoding flows\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n, Trn2\n\n\n   * - NxD Core (neuronx-distributed) \n     - **Training:**\n\n       * Added support for HuggingFace Llama3 70B with Trn2 instances\n       * Added DPO support for post-training model alignment\n       * See more at :ref:`nxd-core_rn`   \n     - Trn1/Trn1n,Trn2\n\n   * - NxD Inference (neuronx-distributed-inference)\n     - * Introduced new NxD Inference Library. See :ref:`introduce-nxd-inference`\n       * Added Llama3.1 405B Inference Example on Trn2. See :ref:`nxdi-trn2-llama3.1-405b-tutorial`\n       * Added support for vLLM integration for NxD Inference. See :ref:`nxdi-vllm-user-guide-v1`\n       * Introduced Open Source Github repository for NxD Inference. 
See `aws-neuron/neuronx-distributed-inference <https://github.com/aws-neuron/neuronx-distributed-inference>`_\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n,Trn2\n\n   * - NxD Training (neuronx-distributed-training)\n     - * Added support for HuggingFace Llama3/3.1 70B with Trn2 instances\n       * Added support for Mixtral 8x7B Megatron and HuggingFace models\n       * Added support for custom pipeline parallel cuts in HuggingFace Llama3\n       * Added support for DPO post-training model alignment\n       * See more at :ref:`nxd-training_rn` \n     - Trn1/Trn1n,Trn2\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * Introduced PyTorch 2.5 support \n       * See more at :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n\n   * - NeuronX Nemo Megatron for Training\n     - * Added support for HuggingFace to NeMo checkpoint conversion when virtual pipeline parallel is enabled.\n       * Added collective compute coalescing for ZeRO-1 optimizer\n       * See more at `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_  and  :ref:`neuronx-nemo-rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * Minor bug fixes and performance enhancements for the Trn2 platform.\n       * See more at :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2,Trn2\n  \n   * - Neuron Kernel Interface (NKI)\n     - * Added ``nki.compiler`` module with Allocation Control and Kernel decorators\n       * Added new nki.isa APIs. \n       * Added new nki.language APIs. \n       * Added new kernels (``allocated_fused_self_attn_for_SD_small_head_size``, ``allocated_fused_rms_norm_qkv``).\n       * See more at :ref:`nki_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Deep Learning AMIs (DLAMIs)\n     - * Added support for Trainium2 chips within the Neuron Multi Framework DLAMI.\n       * Added support for JAX 0.4 to Neuron Multi Framework DLAMI.\n       * Added NxD Training (NxDT), NxD Inference (NxDI) and NxD Core PyTorch 2.5 support within the Neuron Multi Framework DLAMI.\n       * See more at :ref:`neuron-dlami-overview`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Deep Learning Containers (DLCs)\n     - * Added new pytorch-inference-neuronx 2.5.1 and pytorch-training-neuronx 2.5.1 DLCs\n       * Added new jax-training-neuronx 0.4 Training DLC\n       * See more at :ref:`containers_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Tools\n     - * Introduced Neuron Profiler 2.0. See :ref:`neuron-profiler-2-0-guide`\n       * See more at :ref:`dev-tools_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Neuron Runtime\n     - * Added runtime support to fail in case of out-of-bound memory access when DGE is enabled.\n       * Added support for 4-rank replica group on adjacent Neuron cores on TRN1/TRN1N\n       * See more at :ref:`runtime_rn`\n     - Inf1,Inf2,Trn1/Trn1n,Trn2\n\n   * - Release Annoucements\n     - * :ref:`announce-eos-neuron-det`\n       * :ref:`announce-eos-nxd-examples`\n       * :ref:`announce-python-eos`\n       * :ref:`announce-eos-pytorch-eos-113`\n       * :ref:`announce-eos-pytorch-2-1`\n       * :ref:`announce-u20-dlami-dlc-eos`\n       * :ref:`announce-no-support-torch-neuron`\n       * See more at :ref:`announcements-main`\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1, Trn2\n\n.. 
_neuron-2.21.0-known-issues:\n\n2.21.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* See component release notes below for any additional known issues.\n\n\n.. _neuron-2.21.0.beta-whatsnew:\n\nNeuron 2.21.0 Beta (12/03/2024)\n--------------------------------\n\n.. note::\n  This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).\n\n  For access to this release (Neuron 2.21 Beta), please contact your account manager.\n\nThis release (Neuron 2.21 beta) introduces support for :ref:`AWS Trainium2 <trainium2-arch>` and :ref:`Trn2 instances <aws-trn2-arch>`, including the trn2.48xlarge instance type and Trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.\n\n:ref:`NxD Inference <nxdi-index>`, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for `AXLearn <https://github.com/apple/axlearn>`_ training for JAX models.\n\nThe new :ref:`Neuron Profiler 2.0 <neuron-profiler-2-0-guide>` introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.\n\nThe documentation has been updated to include architectural details about :ref:`Trainium2 <trainium2-arch>` and :ref:`NeuronCore-v3 <neuroncores-v3-arch>`, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.\n\n:ref:`Use Q Developer <amazon-q-dev>` as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.\n\n.. note::\n  For the latest release that supports Trn1, Inf2 and Inf1 instances, please see :ref:`Neuron Release 2.20.2 <neuron-2-20-2-whatsnew>`\n\n\n\n.. _neuron-2-20-2-whatsnew:\n\nNeuron 2.20.2 (11/20/2024)\n---------------------------\n\nThe Neuron 2.20.2 release fixes a stability issue in the Neuron Scheduler Extension that previously caused crashes in Kubernetes (K8) deployments. See :ref:`containers_rn`.\n\nThis release also includes a security patch update to the Neuron Driver that fixes a kernel address leak issue. \nSee more at :ref:`runtime_rn`.\n\nAdditionally, the Neuron 2.20.2 release updates the ``torch-neuronx`` and ``libneuronxla`` packages to add support for the ``torch-xla`` 2.1.5 package, \nwhich fixes checkpoint loading issues with Zero Redundancy Optimizer (ZeRO-1). See :ref:`pytorch_rn` and :ref:`libneuronxla-rn`.\n\nNeuron supported DLAMIs and DLCs are updated with this release (Neuron 2.20.2 SDK). The Training DLC is also updated to address the \nversion dependency issues in the NxD Training library. See :ref:`containers_rn`.\n\nThe NxD Training library in the Neuron 2.20.2 release is updated to the transformers 4.36.0 package. See :ref:`nxd-training_rn`.\n\n\nNeuron 2.20.1 (10/25/2024)\n---------------------------\n\nThe Neuron 2.20.1 release addresses an issue with the Neuron Persistent Cache that was introduced in the 2.20 release. 
In the 2.20 release, the Neuron persistent cache issue resulted in a cache-miss scenario when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs can be loaded correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.\n\nThis release also addresses the excessive lock wait time issue during neuron_parallel_compile graph extraction for large cluster training. See :ref:`pytorch_rn` and :ref:`libneuronxla-rn`.\n\nAdditionally, Neuron 2.20.1 introduces new Multi Framework DLAMI for Amazon Linux 2023 (AL2023) that customers can use to easily get started with latest Neuron SDK on multiple frameworks that Neuron supports. See :ref:`dlamis_rn`.\n\nNeuron 2.20.1 Training DLC is also updated to pre-install the necessary dependencies and support NxD Training library out of the box. See :ref:`containers_rn`\n\n.. _neuron-2.20-whatsnew:\n\nNeuron 2.20.0 (09/16/2024)\n---------------------------\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\n**Overview**: Neuron 2.20 release introduces usability improvements and new capabilities across training and inference workloads. A key highlight is the introduction of :ref:`Neuron Kernel Interface (beta) <neuron-nki>`. NKI, pronounced 'Nicky', is enabling developers to build optimized custom compute kernels for Trainium and Inferentia. Additionally, this release introduces :ref:`NxD Training (beta) <nxdt>`, a PyTorch-based library enabling efficient distributed training, with a user-friendly interface compatible with NeMo. This release also introduces the support for the :ref:`JAX framework (beta) <jax-neuron-main>`.\n\nNeuron 2.20 also adds inference support for Pixart-alpha and Pixart-sigma Diffusion-Transformers (DiT) models, and adds support for Llama 3.1 8B, 70B and 405B models inference supporting up to 128K context length.\n\n**Neuron Kernel Interface**: NKI is a programming interface enabling developers to build optimized compute custom kernels on top of Trainium and Inferentia. NKI empowers developers to enhance deep learning models with new capabilities, performance optimizations, and scientific innovation. It natively integrates with PyTorch and JAX, providing a Python-based programming environment with Triton-like syntax and tile-level semantics, offering a familiar programming experience for developers. \nAll of our NKI work is shared as open source, enabling the community developers to collaborate and use these kernels in their projects, improve existing kernels, and contribute new NKI kernels. The list of kernels we are introducing includes Optimized Flash Attention NKI kernel (``flash_attention``), a NKI kernel with an optimized implementation of Mamba model architecture (``mamba_nki_kernels``) and Optimized Stable Diffusion Attention kernel (``fused_sd_attention_small_head``). 
These are in addition to NKI kernel samples for ``average_pool2d``, ``rmsnorm``, ``tensor_addition``, ``layernorm``, ``transpose_2d``, and ``matrix_multiplication``.\n\nFor more information see :ref:`NKI section <neuron-nki>` and check the NKI samples Github repository: https://github.com/aws-neuron/nki-samples\n\n**NxD Training (NxDT)**: NxDT is a PyTorch-based library that adds support for a user-friendly distributed training experience through a YAML configuration file compatible with NeMo, allowing users to easily set up their training workflows. At the same time, NxDT maintains flexibility, enabling users to choose between using the YAML configuration file, PyTorch Lightning Trainer, or writing their own custom training script using the NxD Core.\nThe library supports PyTorch model classes including Hugging Face and Megatron-LM. Additionally, it leverages NeMo's data engineering and data science modules enabling end-to-end training workflows on NxDT, and providing compatibility with NeMo through minimal changes to the YAML configuration file for models that are already supported in NxDT. Furthermore, the functionality of the Neuron NeMo Megatron (NNM) library is now part of NxDT, ensuring a smooth migration path from NNM to NxDT.\n\nFor more information see :ref:`NxD Training (beta) <nxdt>` and check the NxD Training Github repository: https://github.com/aws-neuron/neuronx-distributed-training \n\n**Training Highlights**: This release adds support for Llama 3.1 8B and 70B model training up to 32K sequence length (beta). It also adds support for torch.autocast() for native PyTorch mixed precision support and PEFT LoRA model training.\n\n**Inference Highlights**: Neuron 2.20 adds support for Llama 3.1 models (405b, 70b, and 8b variants) and introduces new features like on-device top-p sampling for improved performance, support for up to 128K context length through Flash Decoding, and multi-node inference for large models like Llama-3.1-405B.\nFurthermore, this release improves model loading in Transformers Neuronx for models like Llama-3 by loading the pre-sharded or pre-transformed weights and adds support for Diffusion-Transformers (DiT) models such as Pixart-alpha and Pixart-sigma.\n\n**Compiler**: This release introduces Neuron Compiler support for RMSNorm and RMSNormDx operators, along with enhanced performance for the sort operator. \n\n**System Tools**: Neuron Tools enables NKI profiling support in the Neuron Profiler and introduces improvements to the Neuron Profiler UI.\n\n**Neuron Driver**: This release adds support for the Rocky Linux 9.0 operating system. \n\n**Neuron Containers**: This release introduces the Neuron Helm Chart, which helps streamline the deployment of AWS Neuron components on Amazon EKS. See the Neuron Helm Chart Github repository: https://github.com/aws-neuron/neuron-helm-charts. \nAdditionally, this release adds ECS support for the \"Neuron Node Problem Detector and Recovery\" artifact. 
See :ref:`ecs-neuron-problem-detector-and-recovery`.\n\n**Neuron DLAMIs and DLCs**: This release includes the addition of the NxDT package to various Neuron DLAMIs (Multi-Framework Neuron DLAMI, PyTorch 1.13 Neuron DLAMI, and PyTorch 2.1 Neuron DLAMI) and the inclusion of NxDT in the PyTorch 1.13 Training Neuron DLC and PyTorch 2.1 Training Neuron DLC.\n\n**Software Maintenance Policy**: This release also updates the Neuron SDK software maintenance policy. For more information, see :ref:`sdk-maintenance-policy`.\n\n\nMore release content can be found in the table below and each component release notes.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.20.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * Support for on-device sampling (Top P) and dynamic sampling (per request parameters) with Continuous batching. See :ref:`developer guide <transformers_neuronx_readme>`\n       * Support for Flash Decoding to enable inference for higher sequence lengths of up to 128K. See :ref:`developer guide <transformers_neuronx_readme>`.\n       * Support for multi-node inference for large models like ``Llama-3.1-405B``. See :ref:`developer guide <transformers_neuronx_readme>`.\n       * Support for bucketing, multi-node inference, on-device sampling and other improvements in Neuron vLLM integration. See :ref:`developer guide <transformers_neuronx_readme>` \n       * Support for Llama 3.1 models (405B, 70B, and 8B variants). See samples for `Llama-3.1-405B <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-405b-multinode-16k-sampling.ipynb>`_ , `Llama-3.1-70B <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-70b-64k-sampling.ipynb>`_  and  `Llama-3.1-8B <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-3.1-8b-128k-sampling.ipynb>`_\n       * Support for improved model loading for models like Llama-3 by loading the pre-sharded or pre-transformed weights. See :ref:`serialization support in developer guide <transformers_neuronx_readme>`. \n       * Support for ROPE scaling for Llama 3 and Llama 3.1 models. \n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n\n   * - NxD Core (neuronx-distributed) \n     - **Training:**\n\n       * Support for LoRA finetuning\n       * Support for Mixed precision enhancements\n\n       **Inference:**\n       \n       * Support for DBRX and Mixtral inference samples. 
See samples for `DBRX <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/dbrx>`_ and `Mixtral <https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/mixtral>`_\n       * Support for sequence length autobucketing to improve inference performance.\n       * Support for improved tracing in the inference samples.\n       * See more at :ref:`nxd-core_rn` \n     - Trn1/Trn1n\n\n\n   * - NxD Training (neuronx-distributed-training)\n     - * First release of NxD Training (beta)\n       * See more at :ref:`nxd-training_rn` \n     - Trn1/Trn1n\n\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * Support for inference of Diffusion-Transformers (DiT) models such as ``Pixart-alpha`` and ``Pixart-sigma``. See samples for `Pixart-alpha <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_pixart_alpha_inference_on_inf2.ipynb>`_ and `Pixart-sigma <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_pixart_sigma_inference_on_inf2.ipynb>`_.\n       * Support for inference of ``wav2vec2-conformer`` models. See samples for inference of ``wav2vec2-conformer`` with `relative position embeddings <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_relpos_inference_on_inf2.ipynb>`_ and `rotary position embeddings <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_wav2vec2_conformer_rope_inference_on_inf2.ipynb>`_\n       * See more at :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - NeuronX Nemo Megatron for Training\n     - * Fixed issue with linear warmup with cosine annealing\n       * Fixed indexing issues with MPI job checkpoint conversion.\n       * Fixed pipeline parallel bug for NeMo to HF checkpoint conversion\n       * See more at `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_  and  :ref:`neuronx-nemo-rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * Memory optimization that reduces the size of the generated compiler artifacts (i.e., NEFFs)\n       * See more at :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2\n  \n   * - Neuron Kernel Interface (NKI)\n     - * First release of Neuron Kernel Interface (NKI)\n       * See more at :ref:`nki_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Deep Learning AMIs (DLAMIs)\n     - * Support for ``neuronx-distributed-training`` library in PyTorch Neuron DLAMI virtual environments. See :ref:`neuron-dlami-overview`\n       * Updated existing Neuron supported DLAMIs with Neuron 2.20 SDK release.\n       * See more at :ref:`Neuron DLAMI Release Notes <neuron-dlami-overview>`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Deep Learning Containers (DLCs)\n     - * Updated existing PyTorch Neuron DLCs with Neuron 2.20 SDK release.\n       * Support for ``neuronx-distributed-training`` library in `pytorch-training-neuronx DLCs <https://github.com/aws-neuron/deep-learning-containers/tree/main?tab=readme-ov-file#pytorch-training-neuronx>`_. 
\n       * See more at :ref:`containers_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Tools\n     - * Improvements in Neuron Profile\n       * See more at :ref:`dev-tools_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Runtime\n     - * Introduced a sysfs memory usage counter for DMA rings (:ref:`reference <neuron-sysfs-ug>`)\n       * See more at :ref:`runtime_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Release Announcements\n     - * :ref:`announce-component-name-change-nxdcore`\n       * :ref:`eos-neurondevice`\n       * :ref:`eos-neuron-device-version`\n       * :ref:`announce-tfx-no-support`\n       * :ref:`announce-torch-neuron-eos`\n       * :ref:`eos-al2`\n       * See more at :ref:`announcements-main`\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n.. _neuron-2.20.0-known-issues:\n\n2.20.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Known issues when using ``on_device_generation`` flag in Transformers NeuronX config for Llama models. Customers are advised not to use the flag when they see an issue. See more at :ref:`nxd-inference_rn`  \n* See component release notes below for any additional known issues.\n\nNeuron Components Release Notes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nInf1, Trn1/Trn1n and Inf2 common packages\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - Neuron Runtime\n     - Trn1/Trn1n, Inf1, Inf2\n     - * Trn1/Trn1n: ``aws-neuronx-runtime-lib`` (.deb, .rpm)\n\n       * Inf1: Runtime is linked into the ML frameworks packages\n       \n     - * :ref:`runtime_rn`\n\n   * - Neuron Runtime Driver\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-dkms``  (.deb, .rpm)\n\n     - * :ref:`runtime_rn`\n\n   * - Neuron System Tools\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-tools``  (.deb, .rpm)\n     - * :ref:`dev-tools_rn`\n\n\n\n   * - Containers\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-k8-plugin`` (.deb, .rpm)\n\n       * ``aws-neuronx-k8-scheduler`` (.deb, .rpm)\n       \n       * ``aws-neuronx-oci-hooks`` (.deb, .rpm)\n\n     - * :ref:`containers_rn`\n\n       * :ref:`containers_rn`\n\n   * - NeuronPerf (Inference only)\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``neuronperf`` (.whl)\n     - * :ref:`dev-tools_rn`\n\n   * - TensorFlow Model Server Neuron\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``tensorflow-model-server-neuronx`` (.deb, .rpm)\n     - * :ref:`tensorflow-modeslserver-neuronx-rn`\n\n\n\nTrn1/Trn1n and Inf2 only packages\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - PyTorch Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``torch-neuronx`` (.whl)\n     - * :ref:`pytorch_rn`\n       * :ref:`pytorch-neuron-supported-operators`\n       \n\n   * - TensorFlow Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``tensorflow-neuronx`` (.whl)\n     - * :ref:`tensorflow-neuronx-release-notes`\n\n \n   * - Neuron Compiler (Trn1/Trn1n, Inf2 only)\n     - Trn1/Trn1n, Inf2\n     - * ``neuronx-cc`` (.whl)\n     - * :ref:`compiler_rn`\n\n\n   * - Neuron Kernel Interface (NKI) Compiler (Trn1/Trn1n, Inf2 only)\n     - Trn1/Trn1n, Inf2\n     - * Supported within ``neuronx-cc`` (.whl)\n     - * :ref:`nki_rn`\n\n   * - Collective Communication library\n     - Trn1/Trn1n, Inf2    \n     - * ``aws-neuronx-collective`` (.deb, .rpm)\n     - * :ref:`runtime_rn`\n\n\n   * - Neuron Custom C++ Operators\n     - Trn1/Trn1n, Inf2\n  \n     - * ``aws-neuronx-gpsimd-customop`` (.deb, .rpm)\n  \n       * ``aws-neuronx-gpsimd-tools`` (.deb, .rpm)\n  \n     - * :ref:`gpsimd-customop-lib-rn`\n\n       * :ref:`gpsimd-customop-tools-rn`\n\n\n   * - Transformers Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``transformers-neuronx`` (.whl)\n     - * :ref:`nxd-inference_rn`\n\n   * - NxD Training\n     - Trn1/Trn1n, Inf2\n     - * ``neuronx-distributed-training`` (.whl)\n     - * :ref:`nxd-training_rn`\n\n\n   * - NxD Core\n     - Trn1/Trn1n, Inf2\n     - * ``neuronx-distributed`` (.whl)\n     - * :ref:`nxd-core_rn`\n\n   * - AWS Neuron Reference for NeMo Megatron\n     - Trn1/Trn1n\n     - * `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n     - * :ref:`neuronx-nemo-rn`\n\n\n\n\nInf1 only packages\n~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n\n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - PyTorch Neuron\n     - Inf1\n     - * ``torch-neuron`` (.whl)\n     - * :ref:`pytorch-neuron-rn`\n\n       * :ref:`neuron-cc-ops-pytorch`\n\n\n   * - TensorFlow Neuron\n     - Inf1\n     - * ``tensorflow-neuron`` (.whl)\n     - * :ref:`tensorflow-neuron-rn`\n\n       * :ref:`neuron-cc-ops-tensorflow`\n       \n       * :ref:`tensorflow-neuron-rn-v2` \n\n\n\n   * - Apache MXNet\n     - Inf1\n     - * ``mx_neuron`` (.whl)\n     - * :doc:`MXNet Neuron Release Notes </release-notes/archive/mxnet-neuron>`\n\n       * :ref:`neuron-cc-ops-mxnet`\n\n\n   * - Neuron Compiler (Inf1 only)\n     - Inf1\n     - * ``neuron-cc`` (.whl)\n     - * :ref:`neuron-cc-rn`\n\n       * :ref:`neuron-supported-operators`\n\n.. _neuron-2.19.0-whatsnew:\n\nNeuron 2.19.1 (07/19/2024)\n---------------------------\n\nThis release (Neuron 2.19.1) addresses an issue with the Neuron Persistent Cache that was introduced in the previous release, Neuron 2.19. The issue resulted in a cache-miss scenario when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs can be loaded correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.\n\n\n\nNeuron 2.19.0 (07/03/2024)\n---------------------------\n.. 
contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nNeuron 2.19 release adds Llama 3 training support and introduces Flash Attention kernel support to enable LLM training and inference for\nlarge sequence lengths. Neuron 2.19 also introduces new features and performance\nimprovements to LLM training, improves LLM inference performance for the Llama 3 model by up to 20%, and adds tools for monitoring, problem detection and recovery in Kubernetes (EKS) environments, improving efficiency and reliability.\n\n**Training highlights**: LLM model training user experience using\nNeuronX Distributed (NxD) is improved by support for Flash Attention to\nenable training with longer sequence lengths >= 8K. Neuron 2.19 adds support for Llama 3 model training. This release also\nadds support for Interleaved pipeline parallelism to reduce idle time\n(bubble size) and enhance training efficiency and resource utilization for large cluster sizes.\n\n**Inference highlights**: Flash Attention kernel support in the Transformers NeuronX library enables LLM inference for context lengths of up to 32k. This release also adds [Beta] support for continuous batching with ``mistralai/Mistral-7B-v0.2`` in Transformers NeuronX.\n\n**Tools and Neuron DLAMI/DLC highlights**: This release introduces the new Neuron Node\nProblem Detector and Recovery plugin in EKS supported Kubernetes\nenvironments: a tool that monitors the health of Neuron instances and\ntriggers automatic node replacement upon detecting an unrecoverable\nerror. Neuron 2.19 introduces the new Neuron Monitor container to\nenable easy monitoring of Neuron metrics in Kubernetes, and adds monitoring support with Prometheus and Grafana.\nThis release also introduces new PyTorch 2.1 and PyTorch 1.13 single framework DLAMIs for Ubuntu 22. Neuron DLAMIs and Neuron DLCs are also updated to support this release (Neuron 2.19).\n\nMore release content can be found in the table below and each component release notes.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.19.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * Support for Flash Attention kernel in Llama models to enable inference for higher sequence lengths. See :ref:`developer guide <transformers_neuronx_readme>`.\n       * Support for running Top-K sampling on Neuron device for generation in Mixtral models. See ``Mixtral-8x7b`` `sample <https://github.com/aws-neuron/transformers-neuronx/blob/main/src/transformers_neuronx/mixtral/model.py>`__.\n       * [Beta] Support for Continuous batching with ``mistralai/Mistral-7B-Instruct-v0.2`` model inference. See :ref:`developer guide <transformers_neuronx_readme>`.\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Training\n     - * Support for Interleaved pipeline parallelism to reduce idle time (bubble size) and enhance training efficiency and resource utilization for large cluster sizes. 
See :ref:`api guide <api_guide>` , :ref:`developer guide <pp_developer_guide>`\n       * Support for Flash Attention kernel to enable training with longer sequence lengths.\n       * See more at :ref:`nxd-core_rn` \n     - Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Inference\n     - * Support for Flash Attention kernel for longer sequence length inference. See :pytorch-neuron-src:`[CodeLlama-13b Inference with 16k sequence length] <neuronx_distributed/llama/codellama_16k_inference.ipynb>`\n       * [Beta] Support for speculative decoding. See :ref:`developer guide <neuronx_distributed_inference_developer_guide>`.\n       * See more at :ref:`nxd-core_rn` \n     - Inf2,Trn1/Trn1n\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * Support for FP32 master weights and BF16 all-gather during Zero1 training to enhance training efficiency.\n       * Support to add custom SILU activation functions by configuring NEURON_CUSTOM_SILU variable\n       * See more at :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - NeuronX Nemo Megatron for Training\n     - * Support for FP32 gradient accumulation enhancing accuracy for large model training.\n       * Support for Zero1 training with master weights\n       * Support for Flash Attention kernel to train with longer sequence lengths (greater than 8K)\n       * See more at `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_  and  :ref:`neuronx-nemo-rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * Support for Flash Attention kernel to enable usage of long sequence lengths during training and inference.\n       * See more at :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron DLAMI and DLC\n     - * Neuron DLAMIs are updated with latest 2.19 Neuron SDK. See :ref:`neuron-dlami-overview`\n       * New Neuron Single Framework DLAMIs with PyTorch-2.1 and PyTorch-1.13 for Ubuntu 22. See :ref:`neuron-dlami-overview`\n       * New Base Deep Learning AMI (DLAMI) for Ubuntu 22. See :ref:`neuron-dlami-overview`\n       * PyTorch 1.13 and PyTorch 2.1 Inference and Training DLCs are updated with latest 2.19 Neuron SDK. See :ref:`neuron_containers`\n       * PyTorch 1.13 Inference and PyTorch 2.1 Inference DLCs are updated with TorchServe v0.11.0. See :ref:`neuron_containers`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Tools\n     - * Support for new Neuron Node Problem Detector and Recovery plugin in EKS supported kubernetes environments that monitors health of Neuron instances and triggers automatic node replacement upon detecting an unrecoverable error. See :doc:`configuration </containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa>` and :ref:`tutorial <k8s-neuron-problem-detector-and-recovery>`.\n       * Support for new Neuron Monitor container to enable easy monitoring of Neuron metrics in Kubernetes. Supports monitoring with Prometheus and Grafana. See :ref:`tutorial <k8s-neuron-monitor>`\n       * Support for Neuron scheduler extension to enforce allocation of contiguous Neuron Devices for the pods based on the Neuron instance type. 
See :ref:`tutorial <neuron_scheduler>`\n       * Neuron Profiler bugfixes and UI updates, including improvements to visualizing collective operations and to the consistency of information being displayed\n       * Added memory usage metrics and device count information to neuron-monitor \n       * See more at :ref:`dev-tools_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n\n   * - Neuron Runtime\n     - * Support for dynamic Direct Memory Access (DMA) that reduces memory usage during runtime.\n       * Runtime Enhancements that improve collectives performance\n       * See more at :ref:`runtime_rn`\n     - Inf1,Inf2,Trn1/Trn1n\n  \n   * - Other Documentation Updates\n     - * Announced maintenance mode of MxNet. See :ref:`announce-mxnet-maintenance`\n       * Announced End of support of Neuron TensorFlow 1.x (Inf1). See :ref:`announce-tfx-eos`\n       * Announce End of support for AL2. See :ref:`announce-eos-al2`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n.. _neuron-2.19.0-known-issues:\n\n2.19.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n* Known issues when using ``on_device_generation`` flag in Transformers NeuronX config for Llama models. Customers are advised not to use the flag when they see an issue. See more at :ref:`nxd-inference_rn`  \n* See component release notes below for any additional known issues.\n\n\nNeuron Components Release Notes\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nInf1, Trn1/Trn1n and Inf2 common packages\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - Neuron Runtime\n     - Trn1/Trn1n, Inf1, Inf2\n     - * Trn1/Trn1n: ``aws-neuronx-runtime-lib`` (.deb, .rpm)\n\n       * Inf1: Runtime is linked into the ML frameworks packages\n       \n     - * :ref:`runtime_rn`\n\n   * - Neuron Runtime Driver\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-dkms``  (.deb, .rpm)\n\n     - * :ref:`runtime_rn`\n\n   * - Neuron System Tools\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-tools``  (.deb, .rpm)\n     - * :ref:`dev-tools_rn`\n\n   * - Neuron DLAMI\n     - Trn1/Trn1n, Inf1, Inf2\n     - * \n     - * `Neuron DLAMI Release Notes <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html>`_.\n\n   * - Neuron DLC\n     - Trn1/Trn1n, Inf1, Inf2\n     - *\n     - * :ref:`containers_rn`\n\n   * - Containers\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``aws-neuronx-k8-plugin`` (.deb, .rpm)\n\n       * ``aws-neuronx-k8-scheduler`` (.deb, .rpm)\n       \n       * ``aws-neuronx-oci-hooks`` (.deb, .rpm)\n\n     - * :ref:`containers_rn`\n\n       * :ref:`containers_rn`\n\n   * - NeuronPerf (Inference only)\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``neuronperf`` (.whl)\n     - * :ref:`dev-tools_rn`\n\n   * - TensorFlow Model Server Neuron\n     - Trn1/Trn1n, Inf1, Inf2\n     - * ``tensorflow-model-server-neuronx`` (.deb, .rpm)\n     - * :ref:`tensorflow-modeslserver-neuronx-rn`\n\nTrn1/Trn1n and Inf2 only packages\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - PyTorch Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``torch-neuronx`` (.whl)\n     - * :ref:`pytorch_rn`\n       * :ref:`pytorch-neuron-supported-operators`\n       \n\n   * - TensorFlow Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``tensorflow-neuronx`` (.whl)\n     - * :ref:`tensorflow-neuronx-release-notes`\n\n \n   * - Neuron Compiler (Trn1/Trn1n, Inf2 only)\n     - Trn1/Trn1n, Inf2\n     - * ``neuronx-cc`` (.whl)\n     - * :ref:`compiler_rn`\n\n   * - Collective Communication library\n     - Trn1/Trn1n, Inf2    \n     - * ``aws-neuronx-collective`` (.deb, .rpm)\n     - * :ref:`runtime_rn`\n\n\n   * - Neuron Custom C++ Operators\n     - Trn1/Trn1n, Inf2\n  \n     - * ``aws-neuronx-gpsimd-customop`` (.deb, .rpm)\n  \n       * ``aws-neuronx-gpsimd-tools`` (.deb, .rpm)\n  \n     - * :ref:`gpsimd-customop-lib-rn`\n\n       * :ref:`gpsimd-customop-tools-rn`\n\n\n   * - Transformers Neuron\n     - Trn1/Trn1n, Inf2\n     - * ``transformers-neuronx`` (.whl)\n     - * :ref:`nxd-inference_rn`\n\n   * - Neuron Distributed\n     - Trn1/Trn1n, Inf2\n     - * ``neuronx-distributed`` (.whl)\n     - * :ref:`nxd-core_rn`\n\n   * - AWS Neuron Reference for NeMo Megatron\n     - Trn1/Trn1n\n     - * `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n     - * :ref:`neuronx-nemo-rn`\n\n\n\n.. note::\n\n   In next releases ``aws-neuronx-tools`` and ``aws-neuronx-runtime-lib`` will add support for Inf1.\n\n\nInf1 only packages\n~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n   \n\n   * - Component\n     - Instance/s\n     - Package/s\n     - Details\n\n\n   * - PyTorch Neuron\n     - Inf1\n     - * ``torch-neuron`` (.whl)\n     - * :ref:`pytorch-neuron-rn`\n\n       * :ref:`neuron-cc-ops-pytorch`\n\n\n   * - TensorFlow Neuron\n     - Inf1\n     - * ``tensorflow-neuron`` (.whl)\n     - * :ref:`tensorflow-neuron-rn`\n\n       * :ref:`neuron-cc-ops-tensorflow`\n       \n       * :ref:`tensorflow-neuron-rn-v2` \n\n\n\n   * - Apache MXNet\n     - Inf1\n     - * ``mx_neuron`` (.whl)\n     - * :doc:`MXNet Neuron Release Notes </release-notes/archive/mxnet-neuron>`\n\n       * :ref:`neuron-cc-ops-mxnet`\n\n\n   * - Neuron Compiler (Inf1 only)\n     - Inf1\n     - * ``neuron-cc`` (.whl)\n     - * :ref:`neuron-cc-rn`\n\n       * :ref:`neuron-supported-operators`\n\n\n.. _neuron-2.18.0-whatsnew:\n\n\nNeuron 2.18.2 (04/25/2024)\n--------------------------\nPatch release with minor Neuron Compiler bug fixes and enhancements. See more in  :ref:`compiler_rn`\n\n\n\nNeuron 2.18.1 (04/10/2024)\n--------------------------\n\nNeuron 2.18.1 release introduces :ref:`Continuous batching(beta) <transformers_neuronx_readme>` and Neuron vLLM integration(beta) support in Transformers NeuronX library that improves LLM inference throughput. This release also fixes hang issues related to Triton Inference Server as well as updating Neuron DLAMIs and DLCs with this release(2.18.1). \nSee more in  :ref:`nxd-inference_rn` and :ref:`compiler_rn` \n\n\n\nNeuron 2.18.0 (04/01/2024)\n--------------------------\n\n.. 
contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nNeuron 2.18 release introduces stable support (out of beta) for PyTorch 2.1, introduces new features and performance improvements to LLM training and inference, and updates Neuron DLAMIs and Neuron DLCs to support this release (Neuron 2.18).\n\n**Training highlights**: LLM model training user experience using NeuronX Distributed (NxD) is improved by introducing asynchronous checkpointing. This release also adds support for auto partitioning pipeline parallelism in NxD and introduces Pipeline Parallelism in PyTorch Lightning Trainer (beta).\n\n**Inference highlights**: Speculative Decoding support (beta) in TNx library improves LLM inference throughput and output token latency(TPOT) by up to 25% (for LLMs such as Llama-2-70B). TNx also improves weight loading performance by adding support for SafeTensor checkpoint format. Inference using Bucketing in PyTorch NeuronX and NeuronX Distributed is improved by introducing auto-bucketing feature.\nThis release also adds a new sample for ``Mixtral-8x7B-v0.1`` and ``mistralai/Mistral-7B-Instruct-v0.2`` in TNx.\n\n**Neuron DLAMI and Neuron DLC support highlights**: This release introduces new Multi Framework DLAMI for Ubuntu 22 that customers can use to easily get started with latest Neuron SDK on multiple frameworks that Neuron supports as well as SSM parameter support for DLAMIs to automate the retrieval of latest DLAMI ID in cloud automation flows. Support for new Neuron Training and Inference Deep Learning containers (DLCs) for PyTorch 2.1, as well as a new dedicated GitHub repository to host Neuron container dockerfiles and a public Neuron container registry to host Neuron container images.\n\nMore release content can be found in the table below and each component release notes.\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * [Beta] Support for Speculative Decoding API. See :ref:`developer guide <transformers_neuronx_readme>` \n       * Support for SafeTensors checkpoint format with improved weight loading performance.  See :ref:`developer guide <transformers_neuronx_readme>` \n       * Support for running  Top-K sampling on Neuron Device for improved performance.  See :ref:`developer guide <transformers_neuronx_readme>` \n       * Code Llama model inference sample with 16K input seq length. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/codellama-13b-16k-sampling.ipynb>`__\n       * [Beta] Support for streaming API and stopping criteria API. See :ref:`developer guide <transformers_neuronx_readme>`\n       * Support for ``Mixtral-8x7B-v0.1`` model inference. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/mixtral-8x7b-sampling.ipynb>`__\n       * [Beta] Support for ``mistralai/Mistral-7B-Instruct-v0.2`` model inference. See `sample <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/mistralai-Mistral-7b-Instruct-v0.2.ipynb>`__\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Training\n     - * Support for Pipeline Parallelism training using PyTorch Lightning. 
See :ref:`api guide <api_guide>` , :ref:`developer guide <ptl_developer_guide>` and :doc:`tutorial </archive/tutorials/training_llama2_tp_pp_ptl>`\n       * Support for auto partitioning pipeline parallel stages when training large models.  See :ref:`api guide <api_guide>` and :ref:`pp_developer_guide`\n       * Support for asynchronous checkpointing to improve the time it takes to save the checkpoint.  See :ref:`api guide <api_guide>` , :ref:`save_load_developer_guide` and :doc:`tutorial </archive/tutorials/training_llama2_tp_pp_ptl>`\n       * Tutorial to fine-tune Llama-2-7B model using PyTorch Lightning and running evaluation on the fine-tuned model using Hugging Face optimum-neuron. See :ref:`tutorial <llama2_7b_tp_zero1_ptl_finetune_tutorial>`\n       * ``codegen25-7b-mono`` model training tutorial and script. See :ref:`codegen25_7b_tp_zero1_tutorial` \n       * See more at :ref:`nxd-core_rn` \n     - Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Inference\n     - * Support for auto bucketing in inference using a custom bucket kernel that can be passed as a bucket configuration to Tracing API. See :ref:`api guide <api_guide>` and :ref:`neuronx_distributed_inference_developer_guide`\n       * Support for inference with bf16 data type using XLA_USE_BF16=1 flag.\n       * See more at :ref:`nxd-core_rn` \n     - Inf2,Trn1/Trn1n\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * PyTorch 2.1 support is now stable (out of beta). \n       * Support for auto bucketing in inference using a custom bucket kernel that can be passed as a bucket configuration to Tracing API. See :ref:`torch-neuronx-autobucketing-devguide`\n       * See more at :ref:`pytorch_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - NeuronX Nemo Megatron for Training\n     - * Support for LoRa finetuning. See `sample script <https://github.com/aws-neuron/neuronx-nemo-megatron/tree/main/nemo/examples/nlp/language_modeling/test_llama_lora.sh>`__\n       * Support for Mistral-7B training. See `sample script <https://github.com/aws-neuron/neuronx-nemo-megatron/tree/main/nemo/examples/nlp/language_modeling/test_mistral.sh>`__\n       * Support for asynchronous checkpointing to improve the time it takes to save the checkpoint.\n       * See more at `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_  and  :ref:`neuronx-nemo-rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * New ``--enable-mixed-precision-accumulation`` compiler option to perform intermediate computations of an operation in FP32 regardless of the operation's defined datatype. See :ref:`neuron-compiler-cli-reference-guide`\n       * See more at :ref:`compiler_rn`\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron DLAMI and DLC\n     - * New Neuron Multi Framework Deep Learning AMI (DLAMI) for Ubuntu 22 with separate virtual environments for PyTorch 2.1, PyTorch 1.13, Transformers NeuronX and Tensorflow 2.10.  See :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>` and :ref:`neuron-dlami-overview`\n       * Neuron Multi Framework Deep Learning AMI (DLAMI) is now the default Neuron AMI in QuickStart AMI list when launching Neuron instances for Ubuntu through AWS console. See :ref:`setup guide <setup-ubuntu22-multi-framework-dlami>`\n       * Neuron DLAMIs for PyTorch 1.13 and Tensorflow 2.10 are updated with 2.18 Neuron SDK for both Ubuntu 20 and AL2. See :ref:`neuron-dlami-overview`\n       * SSM parameter support for Neuron DLAMIs to find the DLAMI id with latest Neuron release SDK. 
See :ref:`neuron-dlami-overview`\n       * New Neuron Deep Learning Containers (DLCs) for PyTorch 2.1 Inference and Training. See :ref:`neuron_containers`\n       * PyTorch 1.13 Inference and Training DLCs are updated with the latest 2.18 Neuron SDK and now also come with the NeuronX Distributed library pre-installed. See :ref:`neuron_containers`\n       * Neuron DLCs are now hosted both in public Neuron ECR and as private images. Private images are only needed when used with SageMaker. See :ref:`neuron_containers`\n       * New Neuron Github Repository to host dockerfiles for Neuron DLCs. See `neuron deep learning containers github repo <https://github.com/aws-neuron/deep-learning-containers>`_\n     - Inf1,Inf2,Trn1/Trn1n\n  \n   * - Other Documentation Updates\n     - * App Note on snapshotting models with PyTorch NeuronX 2.1 to support dumping debug information. See :ref:`pytorch-neuronx-debug`\n       * Added announcement for Maintenance mode of TensorFlow 1.x. See :ref:`announce-tfx-maintenance`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.18.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n.. _neuron-2.18.0-known-issues:\n\n2.18.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* For PyTorch 2.1 (NeuronX), slow convergence for LLaMA-2 70B training when using Zero Redundancy Optimizer (ZeRO1) can be resolved by removing all compiler flags.\n* For PyTorch 2.1 (NeuronX), torch-xla 2.1 is incompatible with the default GLibC on AL2. Users are advised to migrate to Amazon Linux 2023, Ubuntu 22 or Ubuntu 20 operating systems.\n* See component release notes below for any additional known issues.\n\n\n.. _neuron-2.17.0-whatsnew:\n\n\nNeuron 2.17.0 (02/13/2024)\n--------------------------\n\nWhat's New\n^^^^^^^^^^\n\nNeuron 2.17 release improves small collective communication operators (smaller than 16MB) by up to 30%, which improves large language model (LLM) Inference performance by up to 10%.\nThis release also includes improvements in :ref:`Neuron Profiler <neuron-profile-ug>` and other minor enhancements and bug fixes.\n\n\n.. _neuron-2.16.0-whatsnew:\n\n\n\nNeuron 2.16.1 (01/18/2024)\n--------------------------\nPatch release with compiler bug fixes, updates to :ref:`Neuron Device Plugin and Neuron Kubernetes Scheduler <containers_rn>`.\n\n\nNeuron 2.16.0 (12/21/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nNeuron 2.16 adds support for Llama-2-70B training and inference, upgrades to PyTorch 2.1 (beta), adds new support for PyTorch Lightning Trainer (beta) along with performance improvements, and adds Amazon Linux 2023 support.\n\n**Training highlights**: LLM model training performance with the NeuronX Distributed library is improved by up to 15%. The LLM model training user experience is improved by introducing support for PyTorch Lightning Trainer (beta) and a new model optimizer wrapper that minimizes the changes needed to partition models using NeuronX Distributed primitives.\n\n**Inference highlights**: PyTorch inference now allows dynamically swapping different fine-tuned weights for an already loaded model, along with overall improvements to LLM inference throughput and latency with Transformers NeuronX. This release also adds two new reference model samples for Llama-2-70b and Mistral-7b model inference.\n\n
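As a rough illustration of the fine-tuned weight swapping mentioned above, the sketch below assumes a model traced with ``torch_neuronx.trace``; the keyword used here to keep weights out of the compiled NEFF and the exact ``replace_weights`` call shape are assumptions for illustration only, so refer to :ref:`torch_neuronx_replace_weights_api` and :ref:`torch_neuronx_trace_api` for the authoritative API.\n\n.. code-block:: python\n\n   import torch\n   import torch_neuronx\n\n   model = torch.nn.Linear(128, 64).eval()   # stand-in for a fine-tunable model\n   example = torch.randn(1, 128)\n\n   # Trace once, keeping the weights outside the compiled NEFF\n   # (keyword name is an assumption; see the trace API reference).\n   traced = torch_neuronx.trace(model, example, inline_weights_to_neff=False)\n\n   # Later, swap in another fine-tuned checkpoint without re-tracing\n   # (call shape is an assumption; see the replace_weights API reference).\n   finetuned_state_dict = torch.load('finetuned_checkpoint.pt')  # hypothetical path\n   torch_neuronx.replace_weights(traced, finetuned_state_dict)\n\n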
**User experience**: This release introduces two new capabilities: a new tool, Neuron Distributed Event Tracing (NDET), which improves debuggability, and support for profiling collective communication operators in the Neuron Profiler tool.\n\nMore release content can be found in the table below and each component release notes.\n\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n\n   * - Transformers NeuronX (transformers-neuronx) for Inference\n     - * [Beta] Support for Grouped Query Attention (GQA). See :ref:`developer guide <transformers_neuronx_readme>` \n       * [Beta] Support for ``Llama-2-70b`` model inference using ``Grouped Query Attention``. See `tutorial <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference/llama-70b-sampling.ipynb>`__ \n       * [Beta] Support for ``Mistral-7B-Instruct-v0.1`` model inference. See :ref:`sample code <mistral_gqa_code_sample>`\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Training\n     - * [Beta] Support for ``PyTorch Lightning`` to train models using ``tensor parallelism`` and ``data parallelism``. See :ref:`api guide <api_guide>` , :ref:`developer guide <ptl_developer_guide>` and tutorial\n       * Support for Model and Optimizer Wrapper training API that handles the parallelization. See :ref:`api guide <api_guide>` and :ref:`model_optimizer_wrapper_developer_guide`\n       * New ``save_checkpoint`` and ``load_checkpoint`` APIs to save/load checkpoints during distributed training. See :ref:`save_load_developer_guide`\n       * Support for a new ``Query-Key-Value(QKV)`` module that provides the ability to replicate the Key Value heads and adds flexibility to use higher Tensor parallel degree during Training. See :ref:`api guide <api_guide>` and :doc:`tutorial </archive/tutorials/training_llama2_tp_pp_ptl>`\n       * See more at :ref:`nxd-core_rn` \n     - Trn1/Trn1n\n\n   * - NeuronX Distributed (neuronx-distributed) for Inference\n     - * Support for weight deduplication amongst TP shards by providing the ability to save weights separately from the NEFF files. See developer guide\n       * See more at :ref:`nxd-core_rn` and  :ref:`api_guide`\n     - Inf2,Trn1/Trn1n\n\n   * - PyTorch NeuronX (torch-neuronx)\n     - * [Beta] Support for ``PyTorch 2.1``. See PyTorch 2.1 support documentation. See `llama-2-13b inference <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`_ sample.\n       * Support to separate out model weights from NEFF files and new ``replace_weights`` API to replace the separated weights. See :ref:`torch_neuronx_replace_weights_api` and :ref:`torch_neuronx_trace_api`\n       * [Beta] Script for training ``stabilityai/stable-diffusion-2-1-base`` and ``runwayml/stable-diffusion-v1-5`` models. See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/stable_diffusion/>`__ \n       * [Beta] Script for training ``facebook/bart-large`` model. See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_summarization/BartLarge.ipynb>`__ \n       * [Beta] Script for ``stabilityai/stable-diffusion-2-inpainting`` model inference. 
See `script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_sd2_inpainting_936_624_inference.ipynb>`__ \n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Tools\n     - * New ``Neuron Distributed Event Tracing (NDET) tool`` to help visualize execution trace logs and diagnose errors in multi-node workloads.\n       * Support for multi-worker jobs in ``neuron-profile``. See :ref:`neuron-profile-ug`\n       * See more at :ref:`dev-tools_rn`\n     - Inf1/Inf2/Trn1/Trn1n\n  \n   * - Documentation Updates\n     - * Added setup guide instructions for ``AL2023`` OS. See :ref:`setup-guide-index`\n       * Added announcement for name change of Neuron Components. See :ref:`announce-component-name-change`\n       * Added announcement for End of Support for ``PyTorch 1.10``. See :ref:`announce-eos_pytorch110`\n       * Added announcement for End of Support for ``PyTorch 2.0`` Beta. See :ref:`announce-eos_pytorch2`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.16.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n.. _neuron-2.16.0-known-issues:\n\n2.16.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* We recommend running multi-node training jobs on AL2023 using Amazon EKS. Parallel Cluster currently does not support AL2023.\n* There are known compiler issues impacting inference accuracy of certain model configurations of ``Llama-2-13b`` when ``amp = fp16`` is used. If this issue is observed, ``amp=fp32`` should be used as a workaround. This issue will be addressed in future Neuron releases.\n* Execution time reported in the ``neuron-profile`` tool is sometimes inaccurate due to a bug in how the time is captured. The bug will be addressed in upcoming Neuron releases.\n* See component release notes below for any additional known issues.\n\n\n\n.. _neuron-2.15.0-whatsnew:\n\n\nNeuron 2.15.2 (11/17/2023)\n--------------------------\nPatch release that fixes compiler issues related to performance when training using the ``neuronx-nemo-megatron`` library.\n\n\nNeuron 2.15.1 (11/09/2023)\n--------------------------\nPatch release to fix execution overhead issues in Neuron Runtime that were inadvertently introduced in the 2.15 release.\n\n\n\nNeuron 2.15.0 (10/26/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release adds support for PyTorch 2.0 (Beta), increases performance for both training and inference workloads, and adds the ability to train models like ``Llama-2-70B`` using ``neuronx-distributed``. With this release, we are also adding pipeline parallelism support for ``neuronx-distributed``, enabling full 3D parallelism support to easily scale training to large model sizes.\nNeuron 2.15 also introduces support for training ``resnet50``, ``milesial/Pytorch-UNet`` and ``deepmind/vision-perceiver-conv`` models using ``torch-neuronx``, as well as new sample code for ``flan-t5-xl`` model inference using ``neuronx-distributed``, in addition to other performance optimizations, minor enhancements and bug fixes.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - Neuron Distributed (neuronx-distributed) for Training\n     - * Pipeline parallelism support. 
See :ref:`api_guide` , :ref:`pp_developer_guide` and :ref:`pipeline_parallelism_overview`\n       * ``Llama-2-70B`` model training script (`sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/llama2/tp_pp_llama2_70b_hf_pretrain>`__) (tutorial)\n       * Mixed precision support. See :ref:`pp_developer_guide`\n       * Support for serialized checkpoint saving and loading using ``save_xser`` and ``load_xser`` parameters. See :ref:`api_guide` \n       * See more at :ref:`nxd-core_rn` \n     - Trn1/Trn1n\n\n   * - Neuron Distributed (neuronx-distributed) for Inference\n     - * ``flan-t5-xl`` model inference script (:pytorch-neuron-src:`tutorial <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`)\n       * See more at :ref:`nxd-core_rn` and  :ref:`api_guide`\n     - Inf2,Trn1/Trn1n\n\n   * - Transformers Neuron (transformers-neuronx) for Inference\n     - * Serialization support for ``Llama``, ``Llama-2``, ``GPT2`` and ``BLOOM`` models. See :ref:`developer guide <transformers_neuronx_readme>`\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - PyTorch Neuron (torch-neuronx)\n     - * Introducing ``PyTorch 2.0`` Beta support. See PyTorch 2.0 support documentation. See `bert training <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/dp_bert_hf_pretrain>`_ and `t5-3b inference <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.html>`_ samples.\n       * Scripts for training `resnet50[Beta] <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/resnet50>`_ ,\n         `milesial/Pytorch-UNet[Beta] <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/unet_image_segmentation>`_ and `deepmind/vision-perceiver-conv[Beta] <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/hf_image_classification/VisionPerceiverConv.ipynb>`_ models.\n     - Trn1/Trn1n,Inf2\n\n   * - AWS Neuron Reference for Nemo Megatron library (``neuronx-nemo-megatron``)\n     - * ``Llama-2-70B`` model training sample using pipeline parallelism and tensor parallelism ( `tutorial <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`__)\n       * ``GPT-NeoX-20B`` model training using pipeline parallelism and tensor parallelism\n       * See more at :ref:`neuronx-nemo-rn` and `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n     - Trn1/Trn1n\n\n   * - Neuron Compiler (neuronx-cc)\n     - * New ``llm-training`` option argument to ``--distribution_strategy`` compiler option for optimizations related to distributed training. See more at :ref:`neuron-compiler-cli-reference-guide`\n       * See more at :ref:`compiler_rn`\n     - Inf2/Trn1/Trn1n\n\n   * - Neuron Tools\n     - * ``alltoall`` Collective Communication operation for intra-node (within the instance), previously released in Neuron Collectives v2.15.13, was added as a testable operation in ``nccom-test``. 
See :ref:`nccom-test`\n       * See more at :ref:`dev-tools_rn`\n     - Inf1/Inf2/Trn1/Trn1n\n  \n   * - Documentation Updates\n     - * New :ref:`App Note <activation_memory_reduction>` and :ref:`Developer Guide <activation_memory_reduction_developer_guide>` about Activation memory reduction using ``sequence parallelism`` and ``activation recomputation`` in ``neuronx-distributed``\n       * Added a new Model Samples and Tutorials summary page. See :ref:`model_samples_tutorials`\n       * Added Neuron SDK Classification guide. See :ref:`sdk-classification`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n\n\n.. _neuron-2.14.0-whatsnew:\n\n\nNeuron 2.14.1 (09/26/2023)\n--------------------------\n\nThis is a patch release that fixes compiler issues in certain configurations of ``Llama`` and ``Llama-2`` model inference using ``transformers-neuronx``.\n\n.. note::\n\n   There is still a known compiler issue for inference of some configurations of ``Llama`` and ``Llama-2`` models that will be addressed in a future Neuron release.\n   Customers are advised to use the ``--optlevel 1 (or -O1)`` compiler flag to mitigate this known compiler issue.\n\n   See :ref:`neuron-compiler-cli-reference-guide` on the usage of the ``--optlevel 1`` compiler flag. Please see more on the compiler fix and known issues in :ref:`compiler_rn` and :ref:`nxd-inference_rn`\n\n
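One way to apply the suggested flag, sketched below, is to forward it to the Neuron compiler through the ``NEURON_CC_FLAGS`` environment variable before the model is compiled; using this variable is an assumption about a common Neuron workflow rather than something prescribed by the note above, and :ref:`neuron-compiler-cli-reference-guide` remains the authoritative reference for the flag itself.\n\n.. code-block:: python\n\n   import os\n\n   # Append the mitigation flag to any compiler flags already set\n   # (NEURON_CC_FLAGS usage here is an assumption; adapt it to your workflow).\n   existing_flags = os.environ.get('NEURON_CC_FLAGS', '')\n   os.environ['NEURON_CC_FLAGS'] = (existing_flags + ' --optlevel 1').strip()\n\n   # ... then build and compile the transformers-neuronx model as usual.\n\n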
Neuron 2.14.0 (09/15/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces support for ``Llama-2-7B`` model training and ``T5-3B`` model inference using ``neuronx-distributed``. It also adds support for ``Llama-2-13B`` model training using ``neuronx-nemo-megatron``. Neuron 2.14 also adds support for ``Stable Diffusion XL (Refiner and Base)`` model inference using ``torch-neuronx``. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes.\nThis release introduces the following:\n\n.. note::\n   This release deprecates the ``--model-type=transformer-inference`` compiler flag. Users are highly encouraged to migrate to the ``--model-type=transformer`` compiler flag.\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - AWS Neuron Reference for Nemo Megatron library (``neuronx-nemo-megatron``)\n     - * ``Llama-2-13B`` model training support ( `tutorial <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`__ )\n       * ZeRO-1 Optimizer support that works with tensor parallelism and pipeline parallelism\n       * See more at :ref:`neuronx-nemo-rn` and `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n     - Trn1/Trn1n\n   \n   * - Neuron Distributed (neuronx-distributed) for Training\n     - * ``pad_model`` API to pad attention heads that do not divide evenly by the number of NeuronCores; this allows users to use any supported tensor-parallel degree. See :ref:`api_guide`\n       * ``Llama-2-7B`` model training support (`sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/tp_zero1_llama2_7b_hf_pretrain>`__)\n       * See more at :ref:`nxd-core_rn` and :ref:`api_guide`\n     - Trn1/Trn1n\n\n   * - Neuron Distributed (neuronx-distributed) for Inference\n     - * ``T5-3B`` model inference support (:pytorch-neuron-src:`tutorial <neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb>`)\n       * ``pad_model`` API to pad attention heads that do not divide evenly by the number of NeuronCores; this allows users to use any supported tensor-parallel degree. See :ref:`api_guide` \n       * See more at :ref:`nxd-core_rn` and :ref:`api_guide`\n     - Inf2,Trn1/Trn1n\n\n   * - Transformers Neuron (transformers-neuronx) for Inference\n     - * Introducing the ``--model-type=transformer`` compiler flag that deprecates the ``--model-type=transformer-inference`` compiler flag. \n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - PyTorch Neuron (torch-neuronx)\n     - * Performance optimizations in ``torch_neuronx.analyze`` API. See :ref:`torch_neuronx_analyze_api`\n       * ``Stable Diffusion XL (Refiner and Base)`` model inference support (`sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_sdxl_base_and_refiner_1024_inference.ipynb>`__)\n     - Trn1/Trn1n,Inf2\n\n   * - Neuron Compiler (neuronx-cc)\n     - * New ``--optlevel`` (or ``-O``) compiler option that enables different optimizations with a tradeoff between faster model compile time and faster model execution. See more at :ref:`neuron-compiler-cli-reference-guide`\n       * See more at :ref:`compiler_rn`\n     - Inf2/Trn1/Trn1n\n\n   * - Neuron Tools\n     - * Neuron SysFS support for showing connected devices on ``trn1.32xl``, ``inf2.24xl`` and ``inf2.48xl`` instances. See :ref:`neuron-sysfs-ug`\n       * See more at :ref:`dev-tools_rn`\n     - Inf1/Inf2/Trn1/Trn1n\n  \n   * - Documentation Updates\n     - * Neuron Calculator now supports multiple model configurations for Tensor Parallel Degree computation. See :ref:`neuron_calculator`\n       * Announcement to deprecate ``--model-type=transformer-inference`` flag. See :ref:`announce-end-of-support-transformer-flag`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n\n\n.. _neuron-2.13.0-whatsnew:\n\nNeuron 2.13.2 (09/01/2023)\n---------------------------\n\nThis is a patch release that fixes issues in Kubernetes (K8) deployments related to Neuron Device Plugin crashes and other pod scheduling issues. 
This release also adds support for zero-based Neuron Device indexing in K8 deployments; see the :ref:`Neuron K8 release notes <containers_rn>` for more details on the specific bug fixes.\n\nUpdating to the latest Neuron Kubernetes components and Neuron Driver is highly encouraged for customers using Kubernetes.\n\nPlease :ref:`follow these instructions in setup guide <setup-guide-index>` to upgrade to the latest Neuron release.\n\n\nNeuron 2.13.1 (08/29/2023)\n--------------------------\nThis release adds support for ``Llama 2`` model training (`tutorial <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md>`__) using the `neuronx-nemo-megatron <https://github.com/aws-neuron/neuronx-nemo-megatron>`_ library, and adds support for ``Llama 2`` model inference using the ``transformers-neuronx`` library (`tutorial <https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb>`__).\n\nPlease :ref:`follow these instructions in setup guide <setup-guide-index>` to upgrade to the latest Neuron release.\n\n.. note::\n\n   Please install ``transformers-neuronx`` from https://pip.repos.neuron.amazonaws.com to get the latest features and improvements.\n   \n   This release does not support the Llama 2 model with Grouped-Query Attention.\n\n\nNeuron 2.13.0 (08/28/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces support for ``GPT-NeoX`` 20B model training in ``neuronx-distributed``, including Zero-1 optimizer capability. It also adds support for ``Stable Diffusion XL`` and ``CLIP`` model inference in ``torch-neuronx``. Neuron 2.13 also introduces the `AWS Neuron Reference for Nemo Megatron <https://github.com/aws-neuron/neuronx-nemo-megatron>`_ library supporting distributed training of LLMs like ``GPT-3 175B``. This release also introduces other new features, performance optimizations, minor enhancements and bug fixes.\nThis release introduces the following:\n\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - AWS Neuron Reference for Nemo Megatron library\n     - * Modified versions of the open-source packages `NeMo <https://github.com/NVIDIA/NeMo>`_ and `Apex <https://github.com/NVIDIA/apex>`_ that have been adapted for use with AWS Neuron and AWS EC2 Trn1 instances.\n       * ``GPT-3`` model training support ( `tutorial <https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-gpt-job.md>`__ )\n       * See more at `neuronx-nemo-megatron github repo <https://github.com/aws-neuron/neuronx-nemo-megatron>`_\n     - Trn1/Trn1n\n\n   * - Transformers Neuron (transformers-neuronx) for Inference\n     - * Latency optimizations for ``Llama`` and ``GPT-2`` model inference.\n       * Neuron Persistent Cache support (:ref:`developer guide <transformers_neuronx_readme>`)\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n   \n   * - Neuron Distributed (neuronx-distributed) for Training\n     - * Now Stable, removed beta support\n       * ZeRO-1 Optimizer support with tensor parallel. (:ref:`tutorial <gpt_neox_tp_zero1_tutorial>`)\n       * Sequence Parallel support. 
(:ref:`api guide <api_guide>`)\n       * See more at :ref:`nxd-core_rn` and  :ref:`api_guide`\n     - Trn1/Trn1n\n\n   * - Neuron Distributed (neuronx-distributed) for Inference\n     - * KV Cache Support for LLM Inference (:ref:`release notes <nxd-core_rn>`)\n     - Inf2,Trn1/Trn1n\n\n\n   * - PyTorch Neuron (torch-neuronx)\n     - * Seedable dropout enabled by default for training\n       * KV Cache inference support ( :pytorch-neuron-src:`tutorial <torch-neuronx/t5-inference-tutorial.ipynb>` )\n       * ``camembert-base`` training script. (`sample script <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_text_classification/CamembertBase.ipynb>`__)\n       * New models inference support that include `Stable Diffusion XL <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_sdxl_1024_inference.ipynb>`_ , CLIP (`clip-vit-base-patch32 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_clip_base_inference_on_inf2.ipynb>`_ , `clip-vit-large-patch14 <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_clip_large_inference_on_inf2.ipynb>`_ ) , `Vision Perceiver <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_perceiver_vision_inference.ipynb>`_ , `Language Perceiver <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/inference/hf_pretrained_perceiver_language_inference.ipynb>`_ and :pytorch-neuron-src:`T5 <torch-neuronx/t5-inference-tutorial.ipynb>`\n     - Trn1/Trn1n,Inf2\n\n\n   * - Neuron Tools\n     - * New data types support for Neuron Collective Communication Test Utility (NCCOM-TEST)  --check option: fp16, bf16, (u)int8, (u)int16, and (u)int32 \n       * Neuron SysFS support for FLOP count(flop_count) and connected Neuron Device ids (connected_devices).  See :ref:`neuron-sysfs-ug`\n       * See more at :ref:`dev-tools_rn`\n     - Inf1/Inf2/Trn1/Trn1n\n  \n   * - Neuron Runtime \n     - * Runtime version and Capture Time support to NTFF\n       * Async DMA copies support to improve Neuron Device copy times for all instance types\n       * Logging and error messages improvements for Collectives timeouts and when loading NEFFs.\n       * See more at :ref:`runtime_rn`\n     - Inf1, Inf2, Trn1/Trn1n\n  \n   * - End of Support Announcements and Documentation Updates \n     - * Announcing End of support for ``AWS Neuron reference for Megatron-LM`` starting Neuron 2.13. See more at :ref:`announce-eol-megatronlm`\n       * Announcing end of support for ``torch-neuron`` version 1.9 starting Neuron 2.14. See more at :ref:`announce-eol-pytorch19`\n       * Added TensorFlow 2.x (``tensorflow-neuronx``) analyze_model API section. See more at :ref:`tensorflow-ref-neuron-analyze_model-api`\n       * Upgraded ``numpy`` version to ``1.21.6`` in various training scripts for `Text Classification <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training>`_\n       * Updated ``bert-japanese`` training Script to use ``multilingual-sentiments`` dataset. 
See `hf-bert-jp <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/training/hf_bert_jp>`_\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.13.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n\n.. _neuron-2.13.0-known-issues:\n\n2.13.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* Currently we see a NaN generated when the model implementation uses torch.dtype(float32.min) or torch.dtype(float32.max) along with XLA_USE_BF16/XLA_DOWNCAST_BF16. This is because, float32.min or float32.max gets downcasted to Inf in bf16 thereby producing a NaN. Short term fix is that we can use a small/large fp32 number instead of using float32.min/float32.max. Example, for mask creation, we can use -/+1e4 instead of min/max values. The issue will be addressed in future Neuron releases.   \n\n\n\n.. _neuron-2.12.0-whatsnew:\n\n\nNeuron 2.12.2 (08/19/2023)\n--------------------------\nPatch release to fix a jemalloc conflict for all Neuron customers that use Ubuntu 22.  The previous releases shipped with a dependency on jemalloc that may lead to compilation failures in Ubuntu 22 only.  \nPlease :ref:`follow these instructions in setup guide<setup-guide-index>` to upgrade to latest Neuron release.\n\n\nNeuron 2.12.1 (08/09/2023)\n--------------------------\nPatch release to improve reliability of Neuron Runtime when running applications on memory constrained instances. The Neuron Runtime has reduced the contiguous memory requirement for initializing the Neuron Cores associated with applications.\nThis reduction allows bringup when only small amounts of contiguous memory remain on an instance.  Please :ref:`upgrade to latest Neuron release<setup-guide-index>` to use the latest Neuron Runtime.\n\n\nNeuron 2.12.0 (07/19/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces  ZeRO-1 optimizer for model training in ``torch-neuronx`` , introduces beta support for ``GPT-NeoX``, ``BLOOM`` , ``Llama`` and ``Llama 2(coming soon)`` models in ``transformers-neuronx``. This release also adds support for model inference serving on Triton Inference Server for Inf2 & Trn1 instances, ``lazy_load`` API and ``async_load`` API for model loading in ``torch-neuronx``, as well as other new features,\nperformance optimizations, minor enhancements and bug fixes. This release introduces the following:\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - ZeRO-1 optimizer for model training in ``torch-neuronx``\n     - * Support of ZeRO-Stage-1 optimizer ( ZeroRedundancyOptimizer() API) for training models using ``torch-neuronx``\n       * See tutorial at  :ref:`zero1-gpt2-pretraining-tutorial`\n     - Inf2, Trn1/Trn1n\n\n   * - Support for new models and Enhancements in ``transformers-neuronx``\n     - * [Beta] Support for inference of ``GPT-NeoX``, ``BLOOM`` and ``Llama`` models. \n       * [Beta] Support for ``Llama 2`` coming soon. 
Please monitor the `transformers-neuronx repository <https://github.com/aws-neuron/transformers-neuronx/tree/main/src/transformers_neuronx>`_ for updates.\n       * Removed constraints on ``tp_degree`` in tensor-parallel configurations for ``GPT2``, ``OPT``, and ``BLOOM`` . See more at :ref:`nxd-inference_rn`\n       * Added multi-query / multi-group attention support for ``GPT2``.\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n   \n   * - Support for Inf2 and Trn1 instances on Triton Inference Server\n     - * Support for Model Inference serving on Triton for Inf2 and Trn1 instances. See more at `Triton Server Python Backend <https://github.com/triton-inference-server/python_backend/tree/main/inferentia#using-triton-with-inferentia-2-or-trn1>`_\n       * See tutorial at `Triton on SageMaker - Deploying on Inf2 <https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-triton/inferentia2>`_\n     - Inf2, Trn1\n\n   * - Support for new computer vision models \n     - * Performance optimizations in Stable Diffusion 2.1 model script and added [beta] support for Stable Diffusion 1.5 models.\n       * [Beta] Script for training CLIP model for Image Classification.\n       * [Beta] Script for inference of Multimodal perceiver model\n       * Please check `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`__\n     - Inf2, Trn1/Trn1n\n\n   * - New Features in ``neuronx-distributed`` for training\n     - * Added parallel cross entropy loss function.\n       * See more at tensor parallelism API guide\n     - Trn1/Trn1n\n\n   * - ``lazy_load`` and ``async_load`` API for model loading in inference and performance enhancements in ``torch-neuronx`` \n     - * Added ``lazy_load`` and ``async_load`` API to accelerate model loading for Inference. See more at :ref:`torch_neuronx_lazy_async_load_api`\n       * Optimize DataParallel API to load onto multiple cores simultaneously when device IDs specified are consecutive.\n       * See more at :ref:`pytorch_rn`\n     - Inf2, Trn1/Trn1n\n  \n   * - [Beta] Asynchronous Execution support and Enhancements in Neuron Runtime \n     - * Added beta asynchronous execution feature which can reduce latency by roughly 12% for training workloads. See more at :ref:`nrt-configuration`\n       * AllReduce with All-to-all communication pattern enabled for 16 ranks on TRN1/TRN1N within the instance (intranode)\n       * See more at :ref:`runtime_rn`\n     - Inf1, Inf2, Trn1/Trn1n\n  \n   * - Support for ``distribution_strategy`` compiler option in ``neuronx-cc``\n     - * Support for optional ``--distribution_strategy`` compiler option to enable compiler specific optimizations based on distribution strategy used.\n       * See more at :ref:`neuron-compiler-cli-reference-guide`\n     - Inf2, Trn1/Trn1n\n\n   * - New Micro Benchmarking Performance User Guide and Documentation Updates \n     - * Added best practices user guide for benchmarking performance of Neuron devices. See more at `Benchmarking Guide and Helper scripts <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/microbenchmark>`_\n       * Announcing end of support for Ubuntu 18. 
See more at :ref:`announce-eol-ubuntu18`\n       * Removed support for Distributed Data Parallel(DDP) Tutorial.\n       * Improved sidebar navigation in Documentation.\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n   \n   * - Known Issues and Limitations\n     - * See :ref:`neuron-2.12.0-known-issues`\n     - Trn1/Trn1n , Inf2, Inf1\n  \n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n\n.. _neuron-2.12.0-known-issues:\n\n2.12.0 Known Issues and Limitations \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nKnown Issues in Ubuntu 22 Support\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n* Several Vision and NLP models on Ubuntu 22 are not supported due to Compilation issues. Issues will be addressed in upcoming releases.\n* CustomOp feature failing with seg fault on Ubuntu 22.  Issue will be addressed in upcoming releases.\n  \nKnown issues in certain resnet models on Ubuntu 20\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n* Known issue with support for resnet-18, resnet-34, resnet-50, resnet-101 and resnet-152 models on Ubuntu 20. Issues will be addressed in upcoming releases.\n\n\n\n.. _neuron-2.11.0-whatsnew:\n\nNeuron 2.11.0 (06/14/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces Neuron Distributed, a new python library to simplify training and inference of large models, improving usability with features like S3 model caching, standalone profiler tool, support for Ubuntu22, as well as other new features,\nperformance optimizations, minor enhancements and bug fixes. This release introduces the following:\n\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n  \n   * - New Features and Performance Enhancements in ``transformers-neuronx``\n     - * Support for ``int8`` inference. See example at :ref:`int8_weight_storage_support`\n       * Improved prompt context encoding performance. 
See more at :ref:`transformers_neuronx_developer_guide`\n       * Improved collective communications performance for Tensor Parallel inference on Inf2 and Trn1.\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - Neuron Profiler Tool \n     - * Profiling and visualization of model execution on Trainium and Inferentia devices now supported as a stand-alone tool.\n       * See more at :ref:`neuron-profile-ug`\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Neuron Compilation Cache through S3\n     - * Support for sharing compiled models across Inf2 and Trn1 nodes through S3\n       * See more at :ref:`pytorch-neuronx-parallel-compile-cli`\n     - Inf2, Trn1/Trn1n\n\n   * - New script to scan a model for supported/unsupported operators\n     - * Script to scan a model for supported/unsupported operators before training, scan output includes supported and unsupported operators at both XLA operators and PyTorch operators level.\n       * See a sample tutorial at :ref:`torch-analyze-for-training-tutorial`\n     - Inf2, Trn1/Trn1n\n\n   * - Neuron Distributed Library [Beta]\n     - * New Python Library based on PyTorch enabling distributed training and inference of large models.\n       * Initial support for tensor-parallelism.\n       * See more at :doc:`NeuronX Distributed </libraries/neuronx-distributed/index-training>`\n     - Inf2, Trn1/Trn1n\n\n   * - Neuron Calculator and Documentation Updates  \n     - * New :ref:`neuron_calculator` Documentation section to help determine number of Neuron Cores needed for LLM Inference.\n       * Added App Note :ref:`neuron_llm_inference`\n       * --\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Enhancements to Neuron SysFS\n     - * Support for detailed breakdown of memory usage across the NeuronCores\n       * See more at :ref:`neuron-sysfs-ug`\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Support for Ubuntu 22\n     - * See more at :ref:`setup-guide-index` for setup instructions on Ubuntu22\n     - Inf1, Inf2, Trn1/Trn1n\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n\n\n\n.. _neuron-2.10.0-whatsnew:\n\nNeuron 2.10.0 (05/01/2023)\n--------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n\n   * - Initial support for computer vision models inference\n     - * Added Stable Diffusion 2.1 model script for Text to Image Generation\n       * Added VGG model script for Image Classification Task\n       * Added UNet model script for Image Segmentation Task\n       * Please check `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`__\n     - Inf2, Trn1/Trn1n\n\n   * - Profiling support in PyTorch Neuron(``torch-neuronx``) for Inference with TensorBoard\n     - * See more at :ref:`torch-neuronx-profiling-with-tb`\n     - Inf2, Trn1/Trn1n\n  \n   * - New Features and Performance Enhancements in transformers-neuronx\n     - * Support for the HuggingFace generate function. \n       * Model Serialization support for GPT2 models. 
(including model saving, loading, and weight swapping)\n       * Improved prompt context encoding performance.\n       * See :ref:`transformers_neuronx_readme` for examples and usage\n       * See more at :ref:`nxd-inference_rn` \n     - Inf2, Trn1/Trn1n\n\n   * - Support models larger than 2GB in TensorFlow 2.x Neuron (``tensorflow-neuronx``) \n     - * See :ref:`tensorflow-neuronx-special-flags` for details. (``tensorflow-neuronx``) \n     - Trn1/Trn1n, Inf2\n\n   * - Support models larger than 2GB in TensorFlow 2.x Neuron (``tensorflow-neuron``) \n     - * See :ref:`Special Flags <tensorflow-ref-neuron-tracing-api>` for details. (``tensorflow-neuron``)\n     - Inf1\n  \n   * - Performance Enhancements in PyTorch C++ Custom Operators (Beta)\n     - * Support for using multiple GPSIMD Cores in Custom C++ Operators\n       * See :ref:`custom-ops-api-ref-guide`\n     - Trn1/Trn1n\n   \n   * - Weight Deduplication Feature (Inf1) \n     - * Support for Sharing weights when loading multiple instance versions of the same model on different NeuronCores.\n       * See more at :ref:`nrt-configuration`\n     - Inf1\n\n   * - ``nccom-test`` - Collective Communication Benchmarking Tool\n     - * Supports enabling benchmarking sweeps on various Neuron Collective Communication operations. See :ref:`nccom-test` for more details.\n     - Trn1/Trn1n , Inf2\n\n   * - Announcing end of support for tensorflow-neuron 2.7 & mxnet-neuron 1.5 versions\n     - * See :ref:`announce-eol-tf-before-2-7`\n       * See :ref:`announce-eol-mxnet-before-1-5`\n     - Inf1\n\n   * - Release Artifacts\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1/Trn1n , Inf2, Inf1\n\n.. _neuron-2.9.0-whatsnew:\n\n\nNeuron 2.9.1 (04/19/2023)\n-------------------------\nMinor patch release to add support for deserialized torchscript model compilation and support for multi-node training in EKS. Fixes included in this release are critical to enable training\nand deploying models with Amazon Sagemaker or Amazon EKS.\n\n\nNeuron 2.9.0 (03/28/2023)\n-------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release adds support for EC2 Trn1n instances, introduces new features, performance optimizations, minor enhancements and bug fixes. This release introduces the following:\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n     - Instances\n\n   * - Support for EC2 Trn1n instances\n     - * Updated Neuron Runtime for Trn1n instances     \n      \n       * Overall documentation update to include Trn1n instances\n     - Trn1n\n\n   * - New Analyze API in PyTorch Neuron (``torch-neuronx``)  \n     - * A new API that return list of supported and unsupported PyTorch operators for a model. 
See :ref:`torch_neuronx_analyze_api`\n     - Trn1, Inf2\n  \n   * - Support models that are larger than 2GB in PyTorch Neuron (``torch-neuron``) on Inf1\n     - * See ``separate_weights`` flag to :func:`torch_neuron.trace` to support models that are larger than 2GB\n     - Inf1\n\n   * - Performance Improvements\n     - * Up to 10% higher throughput when training GPT3 6.7B model on multi-node\n     - Trn1\n\n   * - Dynamic Batching support in TensorFlow 2.x Neuron (``tensorflow-neuronx``)\n     - * See :ref:`tensorflow-neuronx-special-flags` for details.\n     - Trn1, Inf2\n\n   * - NeuronPerf support for Trn1/Inf2 instances\n     - * Added Trn1/Inf2 support for PyTorch Neuron (``torch-neuronx``) and TensorFlow 2.x Neuron (``tensorflow-neuronx``)\n     - Trn1, Inf2\n\n   * - Hierarchical All-Reduce and Reduce-Scatter collective communication\n     - * Added support for hierarchical All-Reduce and Reduce-Scatter in Neuron Runtime to enable better scalability of distributed workloads .\n     - Trn1, Inf2\n  \n   * - New Tutorials added\n     - * :ref:`Added tutorial to fine-tune T5 model <torch-hf-t5-finetune>`\n       * Added tutorial to demonstrate use of Libtorch with PyTorch Neuron (``torch-neuronx``) for inference :ref:`[html] <pytorch-tutorials-libtorch>`\n     - Trn1, Inf2\n\n   * - Release included packages\n     - * see :ref:`latest-neuron-release-artifacts`\n     - Trn1, Inf2, Inf1\n.. _neuron-2.8.0-whatsnew:\n\nNeuron 2.8.0 (02/24/2023)\n-------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release adds support for `EC2 Inf2 <https://aws.amazon.com/ec2/instance-types/inf2/>`_ instances, introduces initial inference support with TensorFlow 2.x Neuron (``tensorflow-neuronx``) on Trn1 and Inf2, and introduces minor enhancements and bug fixes.\n\nThis release introduces the following:\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n\n   * - Support for `EC2 Inf2 <https://aws.amazon.com/ec2/instance-types/inf2/>`_ instances\n     - * Inference support for Inf2 instances in PyTorch Neuron (``torch-neuronx``)      \n    \n       * Inference support for Inf2 instances in TensorFlow 2.x Neuron (``tensorflow-neuronx``)\n        \n       * Overall documentation update to include Inf2 instances\n  \n\n   * - TensorFlow 2.x Neuron (``tensorflow-neuronx``) support\n     - * This releases introduces initial inference support with TensorFlow 2.x Neuron (``tensorflow-neuronx``) on Trn1 and Inf2\n\n\n   * - New Neuron GitHub samples\n     - * New sample scripts for deploying LLM models with ``transformer-neuronx`` under       `aws-neuron-samples <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx/transformers-neuronx/inference>`__  GitHub repository.\n      \n       * New sample scripts for deploying models with ``torch-neuronx`` under `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`__  GitHub repository.\n\n   * - Release included packages\n     - * see :ref:`latest-neuron-release-artifacts`\n\n.. _neuron-2.7.0-whatsnew:\n\nNeuron 2.7.0 (02/08/2023)\n-------------------------\n\n.. contents:: Table of contents\n   :local:\n   :depth: 3\n\nWhat's New\n^^^^^^^^^^\n\nThis release introduces new capabilities and libraries, as well as features and tools that improves usability. This release introduces the following:\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n   * - What's New\n     - Details\n\n   * - PyTorch 1.13\n     - Support of PyTorch 1.13 version for PyTorch Neuron (``torch-neuronx``). For resources see :ref:`pytorch-neuronx-main`\n\n   * - PyTorch DistributedDataParallel (DDP) API\n     - Support of PyTorch DistributedDataParallel (DDP) API in PyTorch Neuron (``torch-neuronx``). For resources how to use PyTorch DDP API with Neuron, please check the DDP tutorial.\n\n   * - Inference support in ``torch-neuronx``\n     - For more details, see Neuron Inference samples `<https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`_ in the ``aws-neuron-samples`` GitHub repo.     \n\n   * - Neuron Custom C++ Operators[Beta]\n     - Initial support for Neuron Custom C++ Operators [Beta] , with Neuron Custom C++ Operators (“CustomOps”) you can now write CustomOps that run on NeuronCore-v2 chips. For more resources please check :ref:`neuron_c++customops` section.\n\n\n   * - ``transformers-neuronx`` [Beta] \n     - ``transformers-neuronx``  is a new library enabling LLM model inference. It contains models that are checkpoint-compatible with HuggingFace Transformers, and currently supports Transformer Decoder models like GPT2, GPT-J and OPT. Please check `aws-neuron-samples repository <https://github.com/aws-neuron/transformers-neuronx>`__  \n\n\n   * - Neuron sysfs filesystem\n     - Neuron sysfs filesystem exposes Neuron Devices under ``/sys/devices/virtual/neuron_device`` providing visibility to Neuron Driver and Runtime at the system level. By performing several simple CLIs such as reading or writing to a sysfs file, you can get information such as Neuron Runtime status, memory usage, Driver info etc. For resources about Neuron sysfs filesystem visit :ref:`neuron-sysfs-ug`.\n\n\n   * - TFLOPS support in Neuron System Tools\n     - Neuron System Tools now also report model actual TFLOPs rate in both ``neuron-monitor`` and ``neuron-top``. More details can be found in the :ref:`Neuron Tools documentation <neuron-tools>`.\n\n   * - New sample scripts for training\n     - This release adds multiple new sample scripts for training models with ``torch-neuronx``, Please check `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`__\n\n   * - New sample scripts for inference\n     - This release adds multiple new sample scripts for deploying models with ``torch-neuronx``, Please check `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`__\n\n   * - Neuron GitHub samples repository for Amazon EKS\n     - A new AWS Neuron GitHub samples repository for Amazon EKS, Please check `aws-neuron-samples repository <https://github.com/aws-neuron/aws-neuron-eks-samples>`__\n\n.. _neuron-2.6.0-whatsnew:\n\nNeuron 2.6.0 (12/12/2022)\n-------------------------\n\nThis release introduces the support of PyTorch 1.12 version, and introduces PyTorch Neuron (``torch-neuronx``) profiling through Neuron Plugin for TensorBoard. 
Pytorch Neuron (``torch-neuronx``) users can now profile their models through the following TensorBoard views:\n\n* Operator Framework View\n* Operator HLO View\n* Operator Trace View\n\nThis release introduces the support of LAMB optimizer for FP32 mode, and adds support for :ref:`capturing snapshots <torch-neuronx-snapshotting>` of inputs, outputs and graph HLO for debugging.\n\nIn addition, this release introduces the support of new operators and resolves issues that improve stability for Trn1 customers.\n\n.. _neuron-2.5.0-whatsnew:\n\nNeuron 2.5.0 (11/23/2022)\n-------------------------\n\nNeuron 2.5.0 is a major release which introduces new features and resolves issues that improve stability for Inf1 customers.\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n   :class: table-smaller-font-size\n\n\n   * - Component\n     - New in this release\n\n   * - PyTorch Neuron ``(torch-neuron)``\n     - * PyTorch 1.12 support\n       \n       * Python 3.8 support\n     \n       * :ref:`LSTM <torch_neuron_lstm_support>` support on Inf1\n\n       * :ref:`R-CNN <torch-neuron-r-cnn-app-note>` support on Inf1\n\n       * Support for new :doc:`API for core placement </archive/torch-neuron/api-core-placement>`\n      \n       * Support for :ref:`improved logging <pytorch-neuron-rn>` \n        \n       * Improved :func:`torch_neuron.trace` performance when using large graphs\n      \n       * Reduced host memory usage of loaded models in ``libtorchneuron.so``\n      \n       * :ref:`Additional operators <neuron-cc-ops-pytorch>` support\n       \n\n   * - TensorFlow Neuron ``(tensorflow-neuron)``\n     - * ``tf-neuron-auto-multicore`` tool to enable automatic data parallel on multiple NeuronCores.\n      \n       * Beta support for tracing models larger than 2GB using ``extract-weights`` flag (TF2.x only), see :ref:`tensorflow-ref-neuron-tracing-api`\n\n       * ``tfn.auto_multicore`` Python API to enable automatic data parallel (TF2.x only)\n    \n\nThis Neuron release is the last release that will include ``torch-neuron`` :ref:`versions 1.7 and 1.8 <announce-eol-pt-before-1-8>`, and that will include ``tensorflow-neuron`` :ref:`versions 2.5 and 2.6 <announce-eol-tf-before-2-5>`.\n\nIn addition, this release introduces changes to the Neuron packaging and installation instructions for Inf1 customers, see :ref:`neuron250-packages-changes` for more information.\n\n.. _neuron-2.4.0-whatsnew:\n\nNeuron 2.4.0 (10/27/2022)\n-------------------------\n\nThis release introduces new features and resolves issues that improve stability. The release introduces \"memory utilization breakdown\" feature in both :ref:`Neuron Monitor <neuron-monitor-ug>` and :ref:`Neuron Top <neuron-top-ug>` system tools. The release introduces support for \"NeuronCore Based Sheduling\" capability to the Neuron Kubernetes Scheduler and introduces new operators support in :ref:`Neuron Compiler <neuronx-cc-index>` and :ref:`PyTorch Neuron <pytorch_rn>`. This release introduces also additional eight (8) samples of models' fine tuning using PyTorch Neuron. The new samples can be found in the `AWS Neuron Samples GitHub <https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx>`_ repository.\n"
  },
  {
    "path": "release-notes/releasecontent.rst",
    "content": "\n.. _latest-neuron-release-artifacts:\n\nRelease Content\n===============\n\nThis page contains the packages, libraries, and other artifacts (and the versions of them) that ship in the latest AWS Neuron SDK release.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n<< :ref:`Back to the release notes <latest-neuron-release>`\n\nNeuron 2.29.0 (04/09/2026)\n---------------------------\n\nTrn1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\nTrn2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=trn2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\n\nInf2 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\nInf1 packages\n^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=packages --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\nSupported Python Versions for Inf1 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf1 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\nSupported Python Versions for Inf2/Trn1/Trn2 packages\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. program-output:: python3 src/helperscripts/n2-helper.py --list=pyversions --instance=inf2 --file=src/helperscripts/n2-manifest.json --neuron-version=2.29.0\n\nSupported NumPy Versions\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports NumPy versions 2.X. 
Neuron continues to support NumPy versions >= 1.21.6, as well.\n\nSupported vLLM Versions\n^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron currently supports vLLM version 0.16.0.\n\nSupported Hugging Face Transformers Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Hugging Face           |\n|                                  | Transformers Versions            |\n+==================================+==================================+\n| torch-neuronx                    | >= 4.52                          |\n+----------------------------------+----------------------------------+\n| neuronx-distributed-inference    | == 4.57.*                        |\n+----------------------------------+----------------------------------+\n| vllm                             | >= 4.56.0, < 5                   |\n+----------------------------------+----------------------------------+\n\nSupported Protobuf Versions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n+----------------------------------+----------------------------------+\n| Package                          | Supported Protobuf versions      |\n+==================================+==================================+\n| neuronx-cc                       | > 3                              |\n+----------------------------------+----------------------------------+\n| torch-neuronx                    | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| jax-neuronx                      | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| torch-neuron                     | < 3.20                           |\n+----------------------------------+----------------------------------+\n| neuronx-distributed              | >= 3.20                          |\n+----------------------------------+----------------------------------+\n| tensorflow-neuronx               | < 3.20                           |\n+----------------------------------+----------------------------------+\n| tensorflow-neuron                | < 3.20                           |\n+----------------------------------+----------------------------------+\n  \nPrevious Neuron Releases Content\n--------------------------------\n\n* :ref:`pre-release-content`\n* :ref:`pre-n1-release-content`\n"
  },
  {
    "path": "requirements-python310.txt",
    "content": "#this requirement file is for Python 3.10\n\nenchant\nSphinx==5\nsphinx-book-theme==1.0.0\nsphinx_design==0.3.0\npydata-sphinx-theme==0.13.0\nsphinxcontrib.htmlhelp\nJinja2\nnbconvert\nMarkupSafe\nablog\nsphinx_plotly_directive\nsphinx-copybutton\nnbsphinx\nsphinxcontrib-programoutput\nsphinxcontrib-contentui\nsphinxcontrib-ansi\nsphinxcontrib-applehelp\nsphinxcontrib-devhelp==1\nsphinxcontrib-htmlhelp\nsphinxcontrib-jsmath\nsphinxcontrib-qthelp\nsphinxcontrib-serializinghtml\nsphinxcontrib-contentui\ntraitlets\nnbformat\nnumpy==1.21.2\nml_dtypes~=0.2.0\nsphinxcontrib-googleanalytics\nipython\nsphinxcontrib.datatemplates\nsphinxcontrib.spelling\nsphinx-tabs"
  },
  {
    "path": "requirements-python38.txt",
    "content": "#this requirement file is for Python 3.7/3.8\n\nenchant\nSphinx==4.5.0\nsphinx-book-theme==0.3.3\npydata-sphinx-theme==0.8.1\nsphinx_design==0.3.0\nJinja2==2.11.3\nMarkupSafe==1.1.1\nablog==0.10.29\nipython-genutils==0.2.0\nipython==7.26.0\nnbconvert\nnbformat\nnbsphinx==0.8.9\npandas==1.3.1\nplotly==5.1.0\nreadthedocs-sphinx-search==0.1.1\nsphinx-panels==0.6.0\nsphinx-rtd-theme==0.5.2\nsphinx-tabs==3.2.0\nsphinx-copybutton==0.5.2\nsphinxcontrib-ansi\nsphinxcontrib-applehelp\nsphinxcontrib-contentui==0.2.5\nsphinxcontrib-devhelp\nsphinxcontrib-htmlhelp\nsphinxcontrib-jsmath\nsphinxcontrib-programoutput==0.17\nsphinxcontrib-qthelp\nsphinxcontrib-serializinghtml\ntraitlets\nml_dtypes~=0.2.0\nsphinxcontrib-googleanalytics\nipython\nsphinxcontrib.datatemplates\nsphinxcontrib.spelling\nsphinx-tabs\n\n"
  },
  {
    "path": "requirements.txt",
    "content": "#this requirement file is for Python 3.10\n\nenchant\nSphinx==5.3\nsphinx-book-theme==1.0.0\nsphinx_design==0.3.0\npydata-sphinx-theme==0.13.0\nsphinxcontrib.htmlhelp\nJinja2\nnbconvert\nMarkupSafe\nablog\nsphinx_plotly_directive\nsphinx-copybutton\nnbsphinx\nsphinxcontrib-programoutput\nsphinxcontrib-contentui\nsphinxcontrib-ansi\nsphinxcontrib-applehelp\nsphinxcontrib-devhelp\nsphinxcontrib-htmlhelp\nsphinxcontrib-jsmath\nsphinxcontrib-qthelp\nsphinxcontrib-serializinghtml\ntraitlets\nnbformat\nnumpy<2.0\nml_dtypes>=0.5.0\npandas\nsphinxcontrib-googleanalytics\nipython\nsphinxcontrib.datatemplates\nsphinxcontrib.spelling\nsphinx-tabs\nexhale\n"
  },
  {
    "path": "setup/index.rst",
    "content": ".. meta::\n   :description: Install AWS Neuron SDK for PyTorch and JAX on Inferentia and Trainium instances\n   :keywords: neuron, installation, setup, pytorch, jax, inferentia, trainium, inf2, trn1, trn2, trn3\n   :instance-types: inf2, trn1, trn2, trn3, inf1\n   :content-type: navigation-hub\n   :date-modified: 2026-03-03\n\n.. _setup-guide-index:\n\nInstall AWS Neuron SDK\n======================\n\nInstall the AWS Neuron SDK to enable deep learning acceleration on Inferentia and Trainium instances.\n\n.. note::\n   \n   **New to Neuron?** Start with the :doc:`quickstart guide </about-neuron/quick-start/index>` \n   for a complete end-to-end tutorial.\n\nQuick Start Decision Tree\n--------------------------\n\nAnswer these questions to find your installation path:\n\n**1. What's your use case?**\n\n- **Training ML models** → Use Trn1, Trn2, or Trn3\n- **Running inference** → Use Inf2, Trn1, Trn2, or Trn3\n- **Legacy Inf1 support** → See :ref:`legacy-inf1-support`\n\n**2. Which framework?**\n\n- :ref:`PyTorch <pytorch-setup>` (recommended for most users)\n- :ref:`JAX <jax-setup>`\n\n**3. Installation method?**\n\n- **AWS Deep Learning AMI** — fastest setup, pre-configured with all dependencies. Best for getting started and single-user development.\n- **Deep Learning Container** — Docker-based, portable across EC2, ECS, EKS. Best for production deployments and CI/CD pipelines.\n- **Manual installation** — full control over packages and versions. Best for custom OS images and shared clusters.\n\nInstance Comparison\n-------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 15 15 35 15\n   \n   * - Instance\n     - NeuronCore\n     - Use Case\n     - Status\n   * - Trn3\n     - :doc:`v4 </about-neuron/arch/neuron-hardware/neuron-core-v4>`\n     - Training and inference (latest generation)\n     - Current\n   * - Trn2\n     - :doc:`v3 </about-neuron/arch/neuron-hardware/neuron-core-v3>`\n     - Training and inference\n     - Current\n   * - Trn1\n     - :doc:`v2 </about-neuron/arch/neuron-hardware/neuron-core-v2>`\n     - Training and inference\n     - Current\n   * - Inf2\n     - :doc:`v2 </about-neuron/arch/neuron-hardware/neuron-core-v2>`\n     - Inference\n     - Current\n   * - Inf1\n     - :doc:`v1 </about-neuron/arch/neuron-hardware/neuron-core-v1>`\n     - Legacy inference\n     - Legacy\n\nInstallation by Framework\n--------------------------\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: PyTorch\n      :link: pytorch/index\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **Recommended for most users**\n      \n      - PyTorch 2.9+ with Native Neuron support\n      - Eager mode and torch.compile\n      - Supports: Inf2, Trn1, Trn2, Trn3\n      \n      :bdg-success:`Most Popular`\n\n   .. grid-item-card:: JAX\n      :link: jax/index\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **For JAX users**\n      \n      - JAX 0.7+ with Neuron backend\n      - XLA compilation\n      - Supports: Inf2, Trn1, Trn2, Trn3\n\nMulti-framework DLAMI\n----------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: 🚀 Neuron multi-framework DLAMI\n      :link: multiframework-dlami\n      :link-type: doc\n      :class-card: sd-border-2\n\n      Pre-configured AMI with PyTorch, JAX, and vLLM virtual environments ready to use. The fastest way to get started with any framework.\n\nCommon Issues\n-------------\n\n.. 
dropdown:: ⚠️ Module not found errors\n   :color: info\n   :animate: fade-in\n   \n   If you see \"No module named 'torch_neuronx'\" or similar:\n   \n   1. Verify virtual environment is activated\n   2. Check Python version: ``python --version`` (should be 3.10+)\n   3. Reinstall: ``pip install --force-reinstall torch-neuronx``\n   \n   See :doc:`troubleshooting` for more details.\n\n.. dropdown:: ⚠️ Instance type not recognized\n   :color: info\n   :animate: fade-in\n   \n   Ensure you're using a Neuron-supported instance:\n   \n   - Check with: ``aws ec2 describe-instance-types --instance-types <type>``\n   - Verify Neuron devices: ``neuron-ls``\n   \n   See :doc:`troubleshooting` for more details.\n\n.. dropdown:: ⚠️ Version compatibility issues\n   :color: info\n   :animate: fade-in\n   \n   Check version compatibility:\n   \n   - PyTorch 2.9+ requires neuronx-cc 2.15+\n   - See :doc:`/release-notes/index` for compatibility matrix\n   \n   See :doc:`troubleshooting` for more details.\n\n.. _legacy-inf1-support:\n\nLegacy Inf1 Support\n-------------------\n\n.. warning::\n   \n   **Inf1 uses legacy NeuronCore v1 architecture.** For new projects, use Inf2, Trn1, Trn2, or Trn3 with NeuronCore v2.\n   \n   - Inf2 offers 3x better price-performance than Inf1\n   - Broader framework support (PyTorch 2.x, JAX)\n   - Active development and feature updates\n\n.. grid:: 1\n\n   .. grid-item-card:: Inf1 Installation (Legacy)\n      :link: legacy-inf1/index\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      Install Neuron SDK for Inferentia 1 instances\n      \n      :bdg-warning:`Legacy Hardware`\n\nAdditional Resources\n--------------------\n\n- :doc:`/devflows/ec2-flows` - Launch Inf/Trn instances on Amazon EC2\n- :doc:`/containers/index` - Use Deep Learning Containers\n- :doc:`troubleshooting` - Installation troubleshooting guide\n- :doc:`/release-notes/index` - Version compatibility information\n\nOther Platforms\n---------------\n\n.. grid:: 1\n\n   .. grid-item-card:: Rocky Linux 9\n      :link: setup-rocky-linux-9\n      :link-type: ref\n      :class-card: sd-border-2\n\n      Install PyTorch Neuron on Rocky Linux 9 using the Rocky-9-EC2-Base AMI. Covers driver and tools setup, then follows the Amazon Linux 2023 guide for framework installation.\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n   \n   PyTorch <pytorch/index>\n   JAX <jax/index>\n   Multi-framework <multiframework-dlami>\n   torch-neuron (Legacy) <legacy-inf1/index>\n   Use Rocky Linux 9 <setup-rocky-linux-9>\n   Troubleshooting <troubleshooting>\n"
  },
  {
    "path": "setup/index.txt-back",
    "content": ".. _setup-guide-index:\n\nSetup Guide\n===========\nThis section walks you through the various options to install Neuron. You have to install Neuron on Trainium and Inferentia powered instances to enable deep-learning acceleration. \n\n.. dropdown::  Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /setup/install-templates/launch-instance.txt\n\n.. dropdown::  Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. tab-set::\n\n       .. tab-item:: Amazon Linux 2\n\n        .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 2\n            :end-line: 3\n\n       .. tab-item:: Ubuntu 20\n\n        .. include :: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 5\n            :end-line: 6\n\n\n.. tab-set::\n\n\n   .. tab-item:: Pytorch\n        :name:\n\n        .. dropdown::  torch-neuron (``Inf1``)\n                :class-title: sphinx-design-class-title-med\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n\n                * :ref:`Fresh install <install-neuron-pytorch>`\n                * :ref:`Update to latest release <update-neuron-pytorch>`\n                * :ref:`Install previous releases <install-prev-neuron-pytorch>`\n                * :ref:`pytorch-install-cxx11`\n\n                .. include:: /setup/install-templates/trn1-ga-warning.txt\n\n        .. dropdown::  torch-neuronx (``Trn1, Inf2``)\n                :class-title: sphinx-design-class-title-med\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n\n                * :ref:`Fresh install <pytorch-neuronx-install>`\n                * :ref:`Update to latest release <pytorch-neuronx-update>`\n                * :ref:`Install previous releases <pytorch-neuronx-install-prev>`\n\n                .. include:: /setup/install-templates/trn1-ga-warning.txt\n\n   .. tab-item:: Tensorflow\n        :name: \n\n        .. dropdown::  tensorflow-neuron (``Inf1``)\n                :class-title: sphinx-design-class-title-med\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n\n                * :ref:`Fresh install <install-neuron-tensorflow>`\n                * :ref:`Update to Latest release <update-neuron-tensorflow>`\n                * :ref:`Install previous releases <install-prev-neuron-tensorflow>`\n\n        .. dropdown::  tensorflow-neuronx (``Trn1, Inf2``)\n                :class-title: sphinx-design-class-title-med\n                :class-body: sphinx-design-class-body-small\n                :animate: fade-in\n\n                * :ref:`Fresh install <install-tensorflow-neuronx>`\n                * :ref:`Update to Latest release <update-tensorflow-neuronx>`\n\n   .. tab-item:: MXNet\n        :name:\n\n        .. dropdown::  mxnet-neuron (``Inf1``)\n            :class-title: sphinx-design-class-title-med\n            :class-body: sphinx-design-class-body-small\n            :animate: fade-in\n\n            * :ref:`Fresh install <install-neuron-mxnet>`\n            * :ref:`Update to latest release <update-neuron-mxnet>`\n            * :ref:`Install previous releases <install-prev-neuron-mxnet>`\n"
  },
  {
    "path": "setup/install-templates/al2-python.rst",
    "content": ".. note::\n\n  Please make sure to ``upgrade`` from ``python 3.7`` to ``python 3.8`` to use Neuron SDK on ``Amazon Linux 2``. Starting from ``Neuron Release 2.13``, ``python 3.7`` is no longer supported as mentioned :ref:`here <announce-eol-python37>`. \n  Also,  we do not have support for ``torch-neuronx 2.1.2`` on Amaznon Linux 2. "
  },
  {
    "path": "setup/install-templates/inf1/compile_mode.rst",
    "content": "If model compilation occurs outside the model deployment environment, you can \ninstall only the Neuron framework extensions and the compiler on any compute \ninstance. This setup is helpful when compiling large complex models that require \nlarge amount of memory or during a CICD process where models are compiled in a \nseparate step, prior to deployment.\n\n\n   \n"
  },
  {
    "path": "setup/install-templates/inf1/deploy_mode.rst",
    "content": "During deployment it can be beneficial to reduce the number of components installed in the system.\nFor use-cases where only inference is necessary (compilation is already complete), only the\nframework and runtime should be installed.\n\nNote:\nIf you are using a regular U18, U20, or AL2 AMI, follow the same setup instructions as the Base DLAMIs respectively.\n\n\n\n\n\n   \n"
  },
  {
    "path": "setup/install-templates/inf1/develop_mode.rst",
    "content": "The simplest environment setup for model development installs all Neuron SDK components\ndirectly on an AWS ML accelerator instance: the Neuron framework extensions, compiler, runtime, and tools. This will\nallow you to compile, execute, and performance tune your model, all in the same instance. This is the recommended\nworkflow when first starting to work with Neuron device or when optimizing a model.\n\nNote:\nIf you are using a regular U18, U20, or AL2 AMI, follow the same setup instructions as the Base DLAMIs respectively.\n\n\n\n   \n"
  },
  {
    "path": "setup/install-templates/inf1/dlami-enable-neuron-mxnet.rst",
    "content": "\n\n.. tab-set::\n\n   .. tab-item:: MXNet 1.8.0\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux\n\n   .. tab-item:: MXNet 1.5.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --framework-version=mxnet-1.5.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --framework-version=mxnet-1.5.1\n"
  },
  {
    "path": "setup/install-templates/inf1/dlami-enable-neuron-pytorch.rst",
    "content": ".. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --framework=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --framework=pytorch-1.8.1\n\n\n   \n"
  },
  {
    "path": "setup/install-templates/inf1/launch-inf1-ami.rst",
    "content": "* Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to Launch an Inf1 instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type. To get more information about Inf1 instances sizes and pricing see `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_.\n\n* Select your Amazon Machine Image (AMI) of choice, please note that Neuron supports Ubuntu 18 AMI or Amazon Linux 2 AMI, you can also choose\n  Ubuntu 18 or Amazon Linux 2 Deep Learning AMI (DLAMI)\n\n* After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-connect-to-instance-linux>`_ to connect to the instance \n\n\n\n"
  },
  {
    "path": "setup/install-templates/inf1/launch-inf1-dlami-aws-cli.rst",
    "content": ".. _launch-inf1-dlami-aws-cli:\n\nAWS CLI commands to launch inf1 instances\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code:: bash\n\n  # Launch instance\n  # The following are the different Deep Learning AMIs to get started and is recommended\n  # for the tutorials.\n  # \"Deep Learning AMI (Amazon Linux)*\"\n  # \"Deep Learning AMI (Amazon Linux 2)*\"\n  # \"Deep Learning AMI (Ubuntu 18.04)*\"\n  #\n\n  # You can get the latest AMI ID for any of the above ones using the following command\n  AWS_REGION=\"<aws region name like us-east-1>\"\n  AMIID=$(aws ec2 describe-images --owners amazon --filters \"Name=name,Values=Deep Learning Base AMI (Ubuntu 18.04)*\" --query 'sort_by(Images, &CreationDate)[].[Name,ImageId]' --region $AWS_REGION --output text | tail -n 1  | awk '{print $(NF)}')\n\n  INSTANCE_ID=$(aws ec2 run-instances --image-id $AMIID --count 1 --instance-type <inf1.xlarge type> --key-name MyKeyPair --region $AWS_REGION [--subnet-id <subnet id>]| python -c 'import sys, json; print(json.load(sys.stdin)[\"Instances\"][0][\"InstanceId\"])')\n  echo \"Instance ID of launched instance\" $INSTANCE_ID\n\n  # Wait for few seconds to a minute for the instance to get created and have public DNS/ip.\n\n  # The following command will get the public DNS name of the launched instance to which\n  # you can then log in to using your key pair.\n  INSTANCE_PUBLIC_DNS=$(aws ec2 describe-instances --instance-id $INSTANCE_ID --region $AWS_REGION | python -c 'import sys, json; print(json.load(sys.stdin)[\"Reservations\"][0][\"Instances\"][0][\"PublicDnsName\"])')\n  echo \"DNS name of the launched instance\" $INSTANCE_PUBLIC_DNS\n\n  # Wait for couple of minutes for the instance to be ready and then login:\n  ssh -i <key.pem> <ubuntu/ec2-user>@$INSTANCE_PUBLIC_DNS\n\n"
  },
  {
    "path": "setup/install-templates/inf1/launch-inf1-dlami.rst",
    "content": "* Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to Launch an Inf1 instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type. To get more information about Inf1 instances sizes and pricing see `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_.\n\n* When choosing an Amazon Machine Image (AMI) make sure to select `Deep Learning AMI with Conda Options <https://docs.aws.amazon.com/dlami/latest/devguide/conda.html>`_. Please note that Neuron Conda environments are supported only in Ubuntu 18 DLAMI and Amazon Linux2 DLAMI, Neuron Conda environments are not supported in Amazon Linux DLAMI.\n\n\n\n* After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-connect-to-instance-linux>`_ to connect to the instance \n\n.. note::\n\n  You can also launch the instance from AWS CLI, please see :ref:`AWS CLI commands to launch inf1 instances <launch-inf1-dlami-aws-cli>`.\n\n"
  },
  {
    "path": "setup/install-templates/inf1/neuron-pip-install.rst",
    "content": "\nIt is recommended to use a virtual environment when installing Neuron\npip packages. The following steps show how to setup the virtual\nenvironment on Ubuntu or Amazon Linux:\n\n.. code:: bash\n\n   # Ubuntu\n   sudo apt-get update\n   sudo apt-get install -y python3-venv g++\n\n.. code:: bash\n\n   # Amazon Linux\n   sudo dnf update\n   sudo dnf install -y python3 gcc-c++\n\nSetup a new Python virtual environment:\n\n.. code:: bash\n\n   python3 -m venv test_venv\n   source test_venv/bin/activate\n   pip install -U pip\n\n.. include:: /setup/install-templates/inf1/neuron-pip-setup.rst\n\n.. note::\n\n   .. container:: toggle-header\n\n      .. code:: bash\n\n         curl https://pip.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | gpg --import\n         pip download --no-deps neuron-cc\n         # The above shows you the name of the package downloaded\n         # Use it in the following command\n         wget https://pip.repos.neuron.amazonaws.com/neuron-cc/neuron_cc-<VERSION FROM FILE>.whl.asc\n         gpg --verify neuron_cc-<VERSION FROM FILE>.whl.asc neuron_cc-<VERSION FROM FILE>.whl\n\nThe following Pip installation commands assume you are using a virtual\nPython environment (see above for instructions on how to setup a virtual\nPython environment). If not using virtual Python environment, please\nswitch 'pip' with 'pip3' as appropriate for your Python environment.\n"
  },
  {
    "path": "setup/install-templates/inf1/neuron-pip-setup.rst",
    "content": "Modify Pip repository configurations to point to the Neuron repository:\n\n.. code:: bash\n\n   tee $VIRTUAL_ENV/pip.conf > /dev/null <<EOF\n   [global]\n   extra-index-url = https://pip.repos.neuron.amazonaws.com\n   EOF"
  },
  {
    "path": "setup/install-templates/inf1/note-setup-cntr.rst",
    "content": ".. note::\n\n  * Instructions in this page only apply to setting up Neuron components on Linux host running Ubuntu or Amazon Linux AMI.\n  * For an example of how to install Neuron components in a container, see :ref:`tutorial-docker-env-setup` and our\n    :ref:`neuron-containers` documentation for more details.\n"
  },
  {
    "path": "setup/install-templates/inf1/note-setup-general.rst",
    "content": ".. note::\n\n   For a successful installation or update, execute each line of the instructions below separately or \n   copy the contents of the code block into a script file and source its contents.\n"
  },
  {
    "path": "setup/install-templates/inf1/note-setup-libnrt-warning.rst",
    "content": ".. important ::\n\n   For successful installation or update to next releases (Neuron 1.20.0 and newer):\n      * Uninstall ``aws-neuron-dkms`` by running: ``sudo apt remove aws-neuron-dkms`` or ``sudo dnf remove aws-neuron-dkms``\n      * Install or upgrade to latest Neuron driver (``aws-neuron-dkms``) by following the \"Setup Guide\" instructions."
  },
  {
    "path": "setup/install-templates/inf1/tensorboard-plugin-neuron-pip-install.rst",
    "content": "\nIf you are using the DLAMI TensorFlow-Neuron Conda environment,\nplease run the following to update TensorBoard before installing\nthe Neuron plugin.\n\n.. code:: bash\n\n    pip install \"tensorboard<=2.4.0\" --force-reinstall\n\n.. include:: /setup/install-templates/inf1/neuron-pip-setup.rst\n\n.. code:: bash\n\n    pip install tensorboard-plugin-neuron"
  },
  {
    "path": "setup/install-templates/inf2/dlami-enable-neuron-pytorch.rst",
    "content": ".. tab-set::\n\n   .. tab-item:: PyTorch 1.9.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux\n\n   .. tab-item:: PyTorch 1.8.1\n\n      .. tab-set::\n\n         .. tab-item:: Ubuntu DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --framework=pytorch-1.8.1\n\n         .. tab-item:: Amazon Linux DLAMI\n\n            .. include :: /setup/install-templates/inf1/note-setup-general.rst\n\n            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --framework=pytorch-1.8.1\n\n\n   \n"
  },
  {
    "path": "setup/install-templates/inf2/launch-inf2-dlami.rst",
    "content": "* Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an Inf2 instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type. To get more information about Inf2 instances sizes and pricing see `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_.\n\n* When choosing an Amazon Machine Image (AMI) make sure to select `Deep Learning AMI with Conda Options <https://docs.aws.amazon.com/dlami/latest/devguide/conda.html>`_. Please note that Neuron Conda environments are supported only in Ubuntu 18 DLAMI and Amazon Linux2 DLAMI, Neuron Conda environments are not supported in Amazon Linux DLAMI.\n\n\n\n* After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-connect-to-instance-linux>`_ to connect to the instance \n\n.. note::\n\n  You can also launch the instance from AWS CLI, please see :ref:`AWS CLI commands to launch inf2 instances <launch-inf1-dlami-aws-cli>`.\n\n"
  },
  {
    "path": "setup/install-templates/inf2/note-setup-libnrt-warning.rst",
    "content": ".. important ::\n\n   For successful installation or update to next releases (Neuron 1.20.0 and newer):\n      * Uninstall ``aws-neuron-dkms`` by running: ``sudo apt remove aws-neuron-dkms`` or ``sudo dnf remove aws-neuron-dkms``\n      * Install or upgrade to latest Neuron driver (``aws-neuron-dkms``) by following the \"Setup Guide\" instructions."
  },
  {
    "path": "setup/install-templates/launch-instance.txt",
    "content": "* Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to Launch an instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type.\n* To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_, `Inf1 web page <https://aws.amazon.com/ec2/instance-types/inf1/>`_\n* Select your Amazon Machine Image (AMI) of choice, please note that Neuron supports Amazon Linux 2 AMI(HVM) - Kernel 5.10.\n\n* When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n\n* After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-connect-to-instance-linux>`_ to connect to the instance \n\n.. include:: /setup/install-templates/trn1-ga-warning.txt"
  },
  {
    "path": "setup/install-templates/launch-trn1-dlami.rst",
    "content": "* Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to Launch an Trn1 instance, when choosing the instance type at the EC2 console. Please make sure to select the correct instance type. To get more information about Trn1 instances sizes and pricing see `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_.\n\n* Select your Amazon Machine Image (AMI) of choice, please note that Neuron support Ubuntu 18 AMI or Amazon Linux 2 AMI, you can also choose \n  Ubuntu 18 or Amazon Linux 2 Deep Learning AMI (DLAMI)\n\n* When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n\n* After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-connect-to-instance-linux>`_ to connect to the instance \n\n.. include:: /setup/install-templates/trn1-ga-warning.txt\n\n  \n"
  },
  {
    "path": "setup/install-templates/trn1/dlami-notes.rst",
    "content": "\n.. note::\n  * Please refer to the instructions under the tab ``Amazon Linux 2 DLAMI Base``.\n\n.. note::\n  * Please refer to the instructions under the tab ``Ubuntu 20 DLAMI Base``.\n\n.. note::\n  * Coming soon, meanwhile please refer to the instructions under the tab ``Amazon Linux 2 DLAMI Base``.\n\n.. note::\n  * Coming soon, meanwhile please refer to the instructions under the tab ``Ubuntu 20 DLAMI Base``.\n\n.. note::\n  * For a successful installation or update, execute each line of the instructions below separately or copy the contents of the code block into a script file and source its contents.\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning Base Neuron AMI (Amazon Linux 2) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n\n.. note::\n  * For a successful installation or update, execute each line of the instructions below separately or copy the contents of the code block into a script file and source its contents.\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning Base Neuron AMI (Ubuntu 20.04) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n\n.. note::\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning AMI Neuron PyTorch 1.13.1 (Amazon Linux 2) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n\n.. note::\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning AMI Neuron PyTorch 1.13.1 (Ubuntu 20.04) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n\n.. warning::\n   * Please note the this DLALMI might not have latest PyTorch version. After activating the python venv, please check your current version using : ``pip list installed | grep torch-neuronx``\n   * To see the latest PyTorch version :ref:`check here <latest-neuron-release-artifacts>` and to update to latest release see :doc:`/setup/pytorch/manual`\n\n.. note::\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning AMI Neuron TensorFlow 2.10.1 (Ubuntu 20.04) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n\n.. 
note::\n  * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n  * While launching the instance, please use the AMI with the name ``Deep Learning AMI Neuron TensorFlow 2.10.1 (Amazon Linux 2) <Latest_Date>``.\n  * To launch an instance using a specific AMI, please refer to the instructions mentioned `here <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html#finding-an-ami-console>`__.\n"
  },
  {
    "path": "setup/install-templates/trn1-ga-warning.txt",
    "content": ".. note::\n\n  If you are facing a connectivity issue during the model loading process on a Trn1 instance with Ubuntu, that could probably be because of Ubuntu limitations with multiple interfaces. To solve this problem, please follow the steps mentioned :ref:`here<trn1_ubuntu_troubleshooting>`.\n\n  Users are highly encouraged to use DLAMI to launch the instances, since DLAMIs come with the required fix."
  },
  {
    "path": "setup/jax/dlami.rst",
    "content": ".. meta::\n   :description: Install JAX Neuron using AWS Deep Learning AMI on Inf2, Trn1, Trn2, Trn3\n   :keywords: jax, neuron, dlami, installation, ami\n   :framework: jax\n   :installation-method: dlami\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :python-versions: 3.10, 3.11, 3.12\n   :content-type: installation-guide\n   :estimated-time: 5 minutes\n   :date-modified: 2026-03-03\n\nInstall JAX via Deep Learning AMI\n===================================\n\nInstall JAX with Neuron support using pre-configured AWS Deep Learning AMIs. \n\n⏱️ **Estimated time**: 5 minutes\n\n.. note::\n   Want to read about Neuron's Deep Learning machine images (DLAMIs) before diving in? Check out the :doc:`/dlami/index`.\n\n----\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - AWS Account\n     - With EC2 permissions\n   * - SSH Key Pair\n     - For instance access\n   * - AWS CLI\n     - Configured with credentials (optional)\n\nInstallation Steps\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest JAX DLAMI for Ubuntu 24.04:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron JAX * (Ubuntu 24.04)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n      \n      **Step 2: Launch Instance**\n      \n      Launch a Trn1 or Inf2 instance with the AMI:\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      Replace:\n      \n      - ``ami-xxxxxxxxxxxxxxxxx`` with AMI ID from Step 1\n      - ``your-key-pair`` with your SSH key pair name\n      - ``sg-xxxxxxxxx`` with your security group ID\n      - ``subnet-xxxxxxxxx`` with your subnet ID\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ubuntu@<instance-public-ip>\n      \n      **Step 4: Activate Environment**\n      \n      The DLAMI includes a pre-configured virtual environment:\n      \n      .. code-block:: bash\n         \n         source /opt/aws_neuronx_venv_jax/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      **Expected output**:\n      \n      .. 
code-block:: text\n         \n         JAX version: 0.7.0\n         Devices: [NeuronDevice(id=0), NeuronDevice(id=1)]\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'jax_neuronx'``:\n         \n         1. Verify virtual environment is activated:\n            \n            .. code-block:: bash\n               \n               which python\n               # Should show: /opt/aws_neuronx_venv_jax/bin/python\n         \n         2. Check Python version:\n            \n            .. code-block:: bash\n               \n               python --version\n               # Should be 3.10 or higher\n         \n         3. Reinstall jax-neuronx:\n            \n            .. code-block:: bash\n               \n               pip install --force-reinstall jax-neuronx\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         1. Verify instance type:\n            \n            .. code-block:: bash\n               \n               curl http://169.254.169.254/latest/meta-data/instance-type\n               # Should show trn1.*, trn2.*, trn3.*, or inf2.*\n         \n         2. Check Neuron driver:\n            \n            .. code-block:: bash\n               \n               lsmod | grep neuron\n               # Should show neuron driver loaded\n         \n         3. Restart Neuron runtime:\n            \n            .. code-block:: bash\n               \n               sudo systemctl restart neuron-monitor\n               neuron-ls\n\n   .. tab-item:: Ubuntu 22.04\n      :sync: ubuntu-22-04\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest JAX DLAMI for Ubuntu 22.04:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron JAX * (Ubuntu 22.04)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n      \n      **Step 2: Launch Instance**\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ubuntu@<instance-public-ip>\n      \n      **Step 4: Activate Environment**\n      \n      .. code-block:: bash\n         \n         source /opt/aws_neuronx_venv_jax/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. 
code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'jax_neuronx'``:\n         \n         1. Verify virtual environment is activated\n         2. Check Python version: ``python --version`` (should be 3.10+)\n         3. Reinstall: ``pip install --force-reinstall jax-neuronx``\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         1. Verify instance type\n         2. Check Neuron driver: ``lsmod | grep neuron``\n         3. Restart runtime: ``sudo systemctl restart neuron-monitor``\n\n   .. tab-item:: Amazon Linux 2023\n      :sync: al2023\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest JAX DLAMI for Amazon Linux 2023:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron JAX * (Amazon Linux 2023)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n      \n      **Step 2: Launch Instance**\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ec2-user@<instance-public-ip>\n      \n      .. note::\n         \n         Amazon Linux 2023 uses ``ec2-user`` instead of ``ubuntu``.\n      \n      **Step 4: Activate Environment**\n      \n      .. code-block:: bash\n         \n         source /opt/aws_neuronx_venv_jax/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'jax_neuronx'``:\n         \n         1. Verify virtual environment is activated\n         2. Check Python version: ``python --version`` (should be 3.10+)\n         3. Reinstall: ``pip install --force-reinstall jax-neuronx``\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         1. 
Verify instance type\n         2. Check Neuron driver: ``lsmod | grep neuron``\n         3. Restart runtime: ``sudo systemctl restart neuron-monitor``\n\nNext Steps\n----------\n\nNow that JAX is installed:\n\n1. **Try a Quick Example**:\n   \n   .. code-block:: python\n      \n      import jax\n      import jax.numpy as jnp\n      \n      # Simple operation on Neuron\n      x = jnp.array([1.0, 2.0, 3.0])\n      y = jnp.array([4.0, 5.0, 6.0])\n      result = jax.numpy.multiply(x, y)\n      print(result)\n\n2. **Read Documentation**:\n   \n   - :doc:`/frameworks/jax/index`\n   - :doc:`/frameworks/jax/api-reference-guide/index`\n\n3. **Explore Setup Guide**:\n   \n   - :doc:`/frameworks/jax/setup/jax-setup`\n\nAdditional Resources\n--------------------\n\n- :doc:`/dlami/index` - DLAMI documentation\n- :doc:`/containers/index` - Container-based deployment\n- :doc:`../troubleshooting` - Common issues and solutions\n- :doc:`/release-notes/index` - Version compatibility information\n"
  },
  {
    "path": "setup/jax/dlc.rst",
    "content": ".. meta::\n   :description: Install JAX Neuron using Deep Learning Containers on Inf2, Trn1, Trn2, Trn3\n   :keywords: jax, neuron, dlc, container, docker, installation\n   :framework: jax\n   :installation-method: container\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :content-type: installation-guide\n   :estimated-time: 10 minutes\n   :date-modified: 2026-03-03\n\nInstall JAX via Deep Learning Container\n=========================================\n\nInstall JAX with Neuron support using pre-configured AWS Deep Learning Containers (DLCs).\n\n⏱️ **Estimated time**: ~10 minutes\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - Neuron Driver on Host\n     - ``aws-neuronx-dkms`` installed on the host instance\n   * - Docker Installed\n     - Docker engine running on the host instance\n   * - AWS Account\n     - With EC2 permissions\n\nAvailable container images\n--------------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Image\n     - ECR URI\n   * - JAX Training\n     - ``public.ecr.aws/neuron/jax-training-neuronx``\n\n.. note::\n\n   JAX DLCs are currently available for training workloads. For the full list of available images and tags, see `JAX Training Containers <https://github.com/aws-neuron/deep-learning-containers#jax-training-neuronx>`_.\n\nFor more information, see :doc:`/containers/locate-neuron-dlc-image`.\n\nInstallation steps\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n      \n      **Step 1: Install Neuron driver on host**\n      \n      Configure the Neuron repository and install the driver:\n      \n      .. code-block:: bash\n         \n         . /etc/os-release\n         sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n         deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n         EOF\n         wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n      \n      **Step 2: Install and verify Docker**\n      \n      Install Docker and add your user to the ``docker`` group:\n      \n      .. code-block:: bash\n         \n         sudo apt-get install -y docker.io\n         sudo usermod -aG docker $USER\n      \n      Log out and log back in to refresh group membership, then verify:\n      \n      .. code-block:: bash\n         \n         docker run hello-world\n      \n      **Step 3: Pull the DLC image from ECR**\n      \n      Pull the JAX Training DLC image:\n      \n      .. code-block:: bash\n         \n         docker pull public.ecr.aws/neuron/jax-training-neuronx:<image_tag>\n      \n      Replace ``<image_tag>`` with the desired tag from the `JAX Training Containers <https://github.com/aws-neuron/deep-learning-containers#jax-training-neuronx>`_ repository.\n      \n      **Step 4: Run the container**\n      \n      Launch the container with access to Neuron devices:\n      \n      .. code-block:: bash\n         \n         docker run -it \\\n           --device=/dev/neuron0 \\\n           --cap-add SYS_ADMIN \\\n           --cap-add IPC_LOCK \\\n           public.ecr.aws/neuron/jax-training-neuronx:<image_tag> \\\n           bash\n      \n      .. 
note::\n         \n         Adjust the ``--device`` flags based on your instance type. Use ``ls /dev/neuron*`` on the host to list available devices. For example, a ``trn1.32xlarge`` has 16 devices (``/dev/neuron0`` through ``/dev/neuron15``).\n      \n      **Step 5: Verify inside the container**\n      \n      Run the following commands inside the container to confirm Neuron devices are visible and JAX is installed:\n      \n      .. code-block:: bash\n         \n         neuron-ls\n      \n      .. code-block:: python\n         \n         python3 -c \"import jax; print(f'JAX version: {jax.__version__}'); print(f'Devices: {jax.devices()}')\"\n      \n      **Expected output**:\n      \n      .. code-block:: text\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n         \n         JAX version: 0.7.0\n         Devices: [NeuronDevice(id=0), NeuronDevice(id=1)]\n      \n      .. dropdown:: ⚠️ Troubleshooting: Device not found in container\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices inside the container:\n         \n         1. Verify the Neuron driver is installed on the host:\n            \n            .. code-block:: bash\n               \n               # Run on the host (not inside the container)\n               neuron-ls\n         \n         2. Confirm you passed the correct ``--device`` flag:\n            \n            .. code-block:: bash\n               \n               ls /dev/neuron*\n         \n         3. Restart the container with the correct device path:\n            \n            .. code-block:: bash\n               \n               docker run -it --device=/dev/neuron0 \\\n                 --cap-add SYS_ADMIN --cap-add IPC_LOCK \\\n                 public.ecr.aws/neuron/jax-training-neuronx:<image_tag> bash\n      \n      .. dropdown:: ⚠️ Troubleshooting: Permission denied\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``permission denied`` errors when running Docker commands:\n         \n         1. Verify your user is in the ``docker`` group:\n            \n            .. code-block:: bash\n               \n               groups\n               # Should include \"docker\"\n         \n         2. If not, add yourself and re-login:\n            \n            .. code-block:: bash\n               \n               sudo usermod -aG docker $USER\n               # Log out and log back in\n         \n         3. Alternatively, run Docker with ``sudo``:\n            \n            .. code-block:: bash\n               \n               sudo docker run -it --device=/dev/neuron0 \\\n                 --cap-add SYS_ADMIN --cap-add IPC_LOCK \\\n                 public.ecr.aws/neuron/jax-training-neuronx:<image_tag> bash\n      \n      .. dropdown:: ⚠️ Troubleshooting: Image pull failure\n         :color: warning\n         :animate: fade-in\n         \n         If ``docker pull`` fails with a network or authentication error:\n         \n         1. Verify internet connectivity:\n            \n            .. code-block:: bash\n               \n               curl -s https://public.ecr.aws/v2/ | head -1\n         \n         2. Check that the image tag exists by browsing the `ECR Public Gallery <https://gallery.ecr.aws/neuron/jax-training-neuronx>`_.\n         \n         3. 
If you are behind a proxy, configure Docker proxy settings:\n            \n            .. code-block:: bash\n               \n               sudo mkdir -p /etc/systemd/system/docker.service.d\n               sudo tee /etc/systemd/system/docker.service.d/proxy.conf > /dev/null <<EOF\n               [Service]\n               Environment=\"HTTP_PROXY=http://proxy:port\"\n               Environment=\"HTTPS_PROXY=http://proxy:port\"\n               EOF\n               sudo systemctl daemon-reload\n               sudo systemctl restart docker\n\n   .. tab-item:: Ubuntu 22.04\n      :sync: ubuntu-22-04\n      \n      **Step 1: Install Neuron driver on host**\n      \n      Configure the Neuron repository and install the driver:\n      \n      .. code-block:: bash\n         \n         . /etc/os-release\n         sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n         deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n         EOF\n         wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n      \n      **Step 2: Install and verify Docker**\n      \n      Install Docker and add your user to the ``docker`` group:\n      \n      .. code-block:: bash\n         \n         sudo apt-get install -y docker.io\n         sudo usermod -aG docker $USER\n      \n      Log out and log back in to refresh group membership, then verify:\n      \n      .. code-block:: bash\n         \n         docker run hello-world\n      \n      **Step 3: Pull the DLC image from ECR**\n      \n      Pull the JAX Training DLC image:\n      \n      .. code-block:: bash\n         \n         docker pull public.ecr.aws/neuron/jax-training-neuronx:<image_tag>\n      \n      Replace ``<image_tag>`` with the desired tag from the `JAX Training Containers <https://github.com/aws-neuron/deep-learning-containers#jax-training-neuronx>`_ repository.\n      \n      **Step 4: Run the container**\n      \n      Launch the container with access to Neuron devices:\n      \n      .. code-block:: bash\n         \n         docker run -it \\\n           --device=/dev/neuron0 \\\n           --cap-add SYS_ADMIN \\\n           --cap-add IPC_LOCK \\\n           public.ecr.aws/neuron/jax-training-neuronx:<image_tag> \\\n           bash\n      \n      .. note::\n         \n         Adjust the ``--device`` flags based on your instance type. Use ``ls /dev/neuron*`` on the host to list available devices. For example, a ``trn1.32xlarge`` has 16 devices (``/dev/neuron0`` through ``/dev/neuron15``).\n      \n      **Step 5: Verify inside the container**\n      \n      Run the following commands inside the container to confirm Neuron devices are visible and JAX is installed:\n      \n      .. code-block:: bash\n         \n         neuron-ls\n      \n      .. code-block:: python\n         \n         python3 -c \"import jax; print(f'JAX version: {jax.__version__}'); print(f'Devices: {jax.devices()}')\"\n      \n      .. dropdown:: ⚠️ Troubleshooting: Device not found in container\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices inside the container:\n         \n         1. Verify the Neuron driver is installed on the host\n         2. Confirm you passed the correct ``--device`` flag: ``ls /dev/neuron*``\n         3. Restart the container with the correct device path\n      \n      .. 
dropdown:: ⚠️ Troubleshooting: Permission denied\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``permission denied`` errors when running Docker commands:\n         \n         1. Verify your user is in the ``docker`` group: ``groups``\n         2. If not, add yourself: ``sudo usermod -aG docker $USER`` and re-login\n         3. Alternatively, run Docker with ``sudo``\n      \n      .. dropdown:: ⚠️ Troubleshooting: Image pull failure\n         :color: warning\n         :animate: fade-in\n         \n         If ``docker pull`` fails with a network or authentication error:\n         \n         1. Verify internet connectivity: ``curl -s https://public.ecr.aws/v2/ | head -1``\n         2. Check that the image tag exists in the `ECR Public Gallery <https://gallery.ecr.aws/neuron/jax-training-neuronx>`_\n         3. If behind a proxy, configure Docker proxy settings\n\n   .. tab-item:: Amazon Linux 2023\n      :sync: al2023\n      \n      **Step 1: Install Neuron driver on host**\n      \n      Configure the Neuron repository and install the driver:\n      \n      .. code-block:: bash\n         \n         sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n         [neuron]\n         name=Neuron YUM Repository\n         baseurl=https://yum.repos.neuron.amazonaws.com\n         enabled=1\n         metadata_expire=0\n         EOF\n         sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n         sudo dnf update -y\n         sudo dnf install -y \"kernel-devel-uname-r == $(uname -r)\"\n         sudo dnf install -y aws-neuronx-dkms\n      \n      **Step 2: Install and verify Docker**\n      \n      Install Docker and add your user to the ``docker`` group:\n      \n      .. code-block:: bash\n         \n         sudo dnf install -y docker\n         sudo usermod -aG docker $USER\n      \n      Log out and log back in to refresh group membership, then verify:\n      \n      .. code-block:: bash\n         \n         docker run hello-world\n      \n      **Step 3: Pull the DLC image from ECR**\n      \n      Pull the JAX Training DLC image:\n      \n      .. code-block:: bash\n         \n         docker pull public.ecr.aws/neuron/jax-training-neuronx:<image_tag>\n      \n      Replace ``<image_tag>`` with the desired tag from the `JAX Training Containers <https://github.com/aws-neuron/deep-learning-containers#jax-training-neuronx>`_ repository.\n      \n      **Step 4: Run the container**\n      \n      Launch the container with access to Neuron devices:\n      \n      .. code-block:: bash\n         \n         docker run -it \\\n           --device=/dev/neuron0 \\\n           --cap-add SYS_ADMIN \\\n           --cap-add IPC_LOCK \\\n           public.ecr.aws/neuron/jax-training-neuronx:<image_tag> \\\n           bash\n      \n      .. note::\n         \n         Adjust the ``--device`` flags based on your instance type. Use ``ls /dev/neuron*`` on the host to list available devices. For example, a ``trn1.32xlarge`` has 16 devices (``/dev/neuron0`` through ``/dev/neuron15``).\n      \n      **Step 5: Verify inside the container**\n      \n      Run the following commands inside the container to confirm Neuron devices are visible and JAX is installed:\n      \n      .. code-block:: bash\n         \n         neuron-ls\n      \n      .. code-block:: python\n         \n         python3 -c \"import jax; print(f'JAX version: {jax.__version__}'); print(f'Devices: {jax.devices()}')\"\n      \n      .. 
dropdown:: ⚠️ Troubleshooting: Device not found in container\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices inside the container:\n         \n         1. Verify the Neuron driver is installed on the host\n         2. Confirm you passed the correct ``--device`` flag: ``ls /dev/neuron*``\n         3. Restart the container with the correct device path\n      \n      .. dropdown:: ⚠️ Troubleshooting: Permission denied\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``permission denied`` errors when running Docker commands:\n         \n         1. Verify your user is in the ``docker`` group: ``groups``\n         2. If not, add yourself: ``sudo usermod -aG docker $USER`` and re-login\n         3. Alternatively, run Docker with ``sudo``\n      \n      .. dropdown:: ⚠️ Troubleshooting: Image pull failure\n         :color: warning\n         :animate: fade-in\n         \n         If ``docker pull`` fails with a network or authentication error:\n         \n         1. Verify internet connectivity: ``curl -s https://public.ecr.aws/v2/ | head -1``\n         2. Check that the image tag exists in the `ECR Public Gallery <https://gallery.ecr.aws/neuron/jax-training-neuronx>`_\n         3. If behind a proxy, configure Docker proxy settings\n\nNext steps\n----------\n\nNow that JAX is running in a container:\n\n1. **Find more container images**: Browse the full list of available Neuron DLC images at :doc:`/containers/locate-neuron-dlc-image`.\n\n2. **Customize your container**: Learn how to extend a DLC with additional packages at :ref:`containers-dlc-then-customize-devflow`.\n\n3. **Read the JAX documentation**: Explore the :doc:`/frameworks/jax/index` for JAX framework documentation and tutorials.\n\nAdditional resources\n--------------------\n\n- :doc:`/containers/locate-neuron-dlc-image` - Full DLC image list\n- :doc:`/containers/index` - Container documentation overview\n- :doc:`../troubleshooting` - Common issues and solutions\n- :doc:`/release-notes/index` - Version compatibility information\n"
  },
  {
    "path": "setup/jax/index.rst",
    "content": ".. _jax-setup:\n\n.. meta::\n   :description: Install JAX for AWS Neuron on Inf2, Trn1, Trn2, Trn3 instances\n   :keywords: jax, neuron, installation, trn1, trn2, trn3, inf2\n   :framework: jax\n   :instance-types: inf2, trn1, trn2, trn3\n   :content-type: framework-setup-hub\n   :date-modified: 2026-03-03\n\nInstall JAX for Neuron\n=======================\n\nInstall JAX with AWS Neuron support for training and inference on Inferentia and Trainium instances.\n\n**Supported Instances**: Inf2, Trn1, Trn2, Trn3\n\n**JAX Version**: 0.7+ with Neuron PJRT plugin\n\n.. admonition:: Beta Release\n   :class: note\n\n   JAX NeuronX is currently in beta. Some JAX functionality may not be fully supported. We welcome your feedback and contributions.\n\nChoose Installation Method\n---------------------------\n\n.. grid:: 1 1 3 3\n   :gutter: 3\n\n   .. grid-item-card:: 🚀 AWS Deep Learning AMI\n      :link: dlami\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **Recommended for most users**\n      \n      Pre-configured environment with all dependencies\n      \n      ✅ All dependencies included\n      \n      ✅ Tested configurations\n      \n      ✅ Multiple Python versions\n      \n      ⏱️ **Setup time**: ~5 minutes\n\n   .. grid-item-card:: 🐳 Deep Learning Container\n      :link: dlc\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **For containerized deployments**\n      \n      Pre-configured Docker images from AWS ECR\n      \n      ✅ Docker-based isolation\n      \n      ✅ Training and inference images\n      \n      ✅ Training images available\n      \n      ⏱️ **Setup time**: ~10 minutes\n\n   .. grid-item-card:: 🔧 Manual Installation\n      :link: manual\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **For custom environments**\n      \n      Install on existing systems or custom setups\n      \n      ✅ Existing system integration\n      \n      ✅ Custom Python versions\n      \n      ✅ Full control over dependencies\n      \n      ⏱️ **Setup time**: ~15 minutes\n\nPrerequisites\n-------------\n\nBefore installing, ensure you have:\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3 instance\n   * - Operating System\n     - Ubuntu 24.04, Ubuntu 22.04, or Amazon Linux 2023\n   * - Python Version\n     - Python 3.10, 3.11, or 3.12\n   * - AWS Account\n     - With EC2 launch permissions\n   * - SSH Access\n     - Key pair for instance connection\n\nWhat You'll Get\n---------------\n\nAfter installation, you'll have:\n\n- **JAX 0.7+** with Neuron PJRT plugin\n- **jax-neuronx** package for Neuron-specific features\n- **libneuronxla** PJRT plugin for native JAX device integration\n- **neuronx-cc** compiler for model optimization\n- **Neuron Runtime** for model execution\n\nVersion Information\n-------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Component\n     - Version\n   * - JAX\n     - 0.7.0+\n   * - jax-neuronx\n     - 0.7.0+\n   * - libneuronxla\n     - latest\n   * - neuronx-cc\n     - 2.15.0+\n   * - Python\n     - 3.10, 3.11, 3.12\n\nNext Steps\n----------\n\nAfter installation:\n\n1. **Verify Installation**: Run verification commands in the installation guide\n2. **Read the Guide**: :doc:`/frameworks/jax/setup/jax-setup`\n3. **Explore JAX on Neuron**: :doc:`/frameworks/jax/index`\n4. **API Reference**: :doc:`/frameworks/jax/api-reference-guide/index`\n\n.. 
toctree::\n   :hidden:\n   :maxdepth: 1\n   \n   dlami\n   dlc\n   manual\n"
  },
  {
    "path": "setup/jax/manual.rst",
    "content": ".. meta::\n   :description: Manually install JAX Neuron on Inf2, Trn1, Trn2, Trn3 instances\n   :keywords: jax, neuron, manual installation, pip\n   :framework: jax\n   :installation-method: manual\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :python-versions: 3.10, 3.11, 3.12\n   :content-type: installation-guide\n   :estimated-time: 15 minutes\n   :date-modified: 2026-03-03\n\nInstall JAX Manually\n=====================\n\nInstall JAX with Neuron support on existing systems using pip.\n\n⏱️ **Estimated time**: 15 minutes\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - Operating System\n     - Ubuntu 24.04, Ubuntu 22.04, or Amazon Linux 2023\n   * - Python\n     - Python 3.10, 3.11, or 3.12\n   * - Sudo Access\n     - Required for driver installation\n   * - Internet Access\n     - For downloading packages\n\nInstallation Steps\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n      \n      **Step 1: Update System Packages**\n      \n      .. code-block:: bash\n         \n         sudo apt-get update\n         sudo apt-get install -y python3-pip python3-venv\n      \n      **Step 2: Configure Neuron Repository**\n      \n      .. code-block:: bash\n         \n         # Add Neuron repository\n         . /etc/os-release\n         sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n         deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n         EOF\n         \n         # Add repository key\n         wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n         \n         # Update package list\n         sudo apt-get update\n      \n      **Step 3: Install Neuron Driver and Runtime**\n      \n      .. code-block:: bash\n         \n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n      \n      **Step 4: Create Virtual Environment**\n      \n      .. code-block:: bash\n         \n         python3.10 -m venv ~/neuron_venv_jax\n         source ~/neuron_venv_jax/bin/activate\n      \n      **Step 5: Install JAX and Neuron Packages**\n      \n      .. code-block:: bash\n         \n         pip install -U pip\n         pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n      \n      **Step 6: Verify Installation**\n      \n      .. code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      .. dropdown:: ⚠️ Troubleshooting: GPG key error\n         :color: warning\n         :animate: fade-in\n         \n         If you see \"EXPKEYSIG\" error during apt-get update:\n         \n         .. code-block:: bash\n            \n            wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n            sudo apt-get update -y\n      \n      .. 
dropdown:: ⚠️ Troubleshooting: Driver installation failed\n         :color: warning\n         :animate: fade-in\n         \n         If driver installation fails:\n         \n         1. Check kernel headers are installed:\n            \n            .. code-block:: bash\n               \n               sudo apt-get install -y linux-headers-$(uname -r)\n         \n         2. Retry driver installation:\n            \n            .. code-block:: bash\n               \n               sudo apt-get install --reinstall aws-neuronx-dkms\n\n   .. tab-item:: Ubuntu 22.04\n      :sync: ubuntu-22-04\n      \n      **Step 1: Update System Packages**\n      \n      .. code-block:: bash\n         \n         sudo apt-get update\n         sudo apt-get install -y python3-pip python3-venv\n      \n      **Step 2: Configure Neuron Repository**\n      \n      .. code-block:: bash\n         \n         # Add Neuron repository\n         . /etc/os-release\n         sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n         deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n         EOF\n         \n         # Add repository key\n         wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n         \n         # Update package list\n         sudo apt-get update\n      \n      **Step 3: Install Neuron Driver and Runtime**\n      \n      .. code-block:: bash\n         \n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n      \n      **Step 4: Create Virtual Environment**\n      \n      .. code-block:: bash\n         \n         python3.10 -m venv ~/neuron_venv_jax\n         source ~/neuron_venv_jax/bin/activate\n      \n      **Step 5: Install JAX and Neuron Packages**\n      \n      .. code-block:: bash\n         \n         pip install -U pip\n         pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n      \n      **Step 6: Verify Installation**\n      \n      .. code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      .. dropdown:: ⚠️ Troubleshooting: GPG key error\n         :color: warning\n         :animate: fade-in\n         \n         If you see \"EXPKEYSIG\" error, update the GPG key and retry.\n      \n      .. dropdown:: ⚠️ Troubleshooting: Driver installation failed\n         :color: warning\n         :animate: fade-in\n         \n         Ensure kernel headers are installed before retrying driver installation.\n\n   .. tab-item:: Amazon Linux 2023\n      :sync: al2023\n      \n      **Step 1: Update System Packages**\n      \n      .. code-block:: bash\n         \n         sudo yum update -y\n         sudo yum install -y python3-pip python3-devel\n      \n      **Step 2: Configure Neuron Repository**\n      \n      .. 
code-block:: bash\n         \n         # Add Neuron repository\n         sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF\n         [neuron]\n         name=Neuron YUM Repository\n         baseurl=https://yum.repos.neuron.amazonaws.com\n         enabled=1\n         metadata_expire=0\n         EOF\n         \n         # Import GPG key\n         sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB\n      \n      **Step 3: Install Neuron Driver and Runtime**\n      \n      .. code-block:: bash\n         \n         sudo yum install -y aws-neuronx-dkms\n         sudo yum install -y aws-neuronx-runtime-lib\n         sudo yum install -y aws-neuronx-collectives\n      \n      **Step 4: Create Virtual Environment**\n      \n      .. code-block:: bash\n         \n         python3.10 -m venv ~/neuron_venv_jax\n         source ~/neuron_venv_jax/bin/activate\n      \n      **Step 5: Install JAX and Neuron Packages**\n      \n      .. code-block:: bash\n         \n         pip install -U pip\n         pip install jax-neuronx[stable] --extra-index-url=https://pip.repos.neuron.amazonaws.com\n      \n      **Step 6: Verify Installation**\n      \n      .. code-block:: python\n         \n         python3 << EOF\n         import jax\n         import jax_neuronx\n         \n         print(f\"JAX version: {jax.__version__}\")\n         print(f\"Devices: {jax.devices()}\")\n         \n         # Check Neuron devices\n         import subprocess\n         result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n         print(result.stdout)\n         EOF\n      \n      .. dropdown:: ⚠️ Troubleshooting: Repository access error\n         :color: warning\n         :animate: fade-in\n         \n         If you cannot access the Neuron repository:\n         \n         1. Verify network connectivity\n         2. Check proxy settings if behind corporate firewall\n         3. Ensure GPG key is imported correctly\n      \n      .. dropdown:: ⚠️ Troubleshooting: Driver installation failed\n         :color: warning\n         :animate: fade-in\n         \n         Ensure kernel-devel package is installed:\n         \n         .. code-block:: bash\n            \n            sudo yum install -y kernel-devel-$(uname -r)\n\nNext Steps\n----------\n\nNow that JAX is installed:\n\n1. **Try a Quick Example**:\n   \n   .. code-block:: python\n      \n      import jax\n      import jax.numpy as jnp\n      \n      # Simple operation on Neuron\n      x = jnp.array([1.0, 2.0, 3.0])\n      y = jnp.array([4.0, 5.0, 6.0])\n      result = jax.numpy.multiply(x, y)\n      print(result)\n\n2. **Read Documentation**:\n   \n   - :doc:`/frameworks/jax/index`\n   - :doc:`/frameworks/jax/api-reference-guide/index`\n\n3. **Explore Setup Guide**:\n   \n   - :doc:`/frameworks/jax/setup/jax-setup`\n\nAdditional Resources\n--------------------\n\n- :doc:`dlami` - Use pre-configured DLAMI instead\n- :doc:`dlc` - Use pre-configured Docker containers\n- :doc:`/containers/index` - Container-based deployment\n- :doc:`../troubleshooting` - Common issues and solutions\n- :doc:`/release-notes/index` - Version compatibility information\n"
  },
  {
    "path": "setup/jax-neuronx.rst",
    "content": ".. meta::\n   :description: Install and set up JAX NeuronX plugin for AWS Trainium and Inferentia instances. Complete setup guide for JAX on Ubuntu 22 with Neuron SDK integration.\n\n.. _setup-jax-neuronx:\n\nJAX Setup\n=========\n\nThis guide provides step-by-step instructions for installing and configuring JAX with the NeuronX plugin on AWS Trainium and Inferentia instances. JAX NeuronX enables high-performance machine learning workloads by integrating JAX with AWS's custom ML accelerators.\n\nFor more installation and deployment options, see :ref:`jax-neuron-setup`.\n\n.. note::\n   This setup guide is relevant for ``Inf2`` & ``Trn1`` / ``Trn1n`` / ``Trn2`` instances.\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n``JAX`` setup on Ubuntu 22\n---------------------------\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   JAX NeuronX on Ubuntu 22 </frameworks/jax/setup/jax-setup>\n\n.. card:: Ubuntu 22 (Ubuntu22 AMI)\n        :link: /frameworks/jax/setup/jax-setup\n        :link-type: doc\n        :class-body: sphinx-design-class-title-small\n"
  },
  {
    "path": "setup/legacy-inf1/index.rst",
    "content": ".. meta::\n   :description: Legacy installation guide for AWS Inferentia 1 (Inf1) instances\n   :keywords: neuron, inf1, legacy, installation, inferentia\n   :instance-types: inf1\n   :status: legacy\n   :content-type: legacy-guide\n   :date-modified: 2026-03-30\n\n.. _legacy-inf1:\n\nInf1 installation (legacy)\n===========================\n\n.. warning::\n   \n   **Legacy hardware**: Inf1 instances use NeuronCore v1 architecture.\n   \n   **For new projects, use Inf2, Trn1, Trn2, or Trn3 instances** with NeuronCore v2 for:\n   \n   - 3x better price-performance than Inf1\n   - Broader framework support (PyTorch 2.x, JAX)\n   - Active development and feature updates\n   - Latest Neuron SDK features\n   \n   See :ref:`setup-guide-index` for current instance options.\n\n.. admonition:: When to use Inf1\n   :class: tip\n   \n   Use Inf1 only if you:\n   \n   - Maintain existing Inf1 deployments\n   - Have compiled models for NeuronCore v1\n   - Require specific Inf1 cost optimization for inference workloads\n\nMigration to Inf2\n-----------------\n\nConsider migrating to Inf2 for better performance and support:\n\n- Inf2 offers 3x better price-performance\n- Broader framework support including PyTorch 2.x and JAX\n- Active development with monthly SDK releases\n- See :ref:`setup-guide-index` for current installation options\n\nChoose your framework\n---------------------\n\n.. note::\n   \n   JAX is not supported on Inf1 instances. Use Inf2, Trn1, Trn2, or Trn3 for JAX workloads.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: PyTorch (Inf1)\n      :link: pytorch\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      PyTorch 1.x with torch-neuron\n      \n      Inference on Inf1 instances using NeuronCore v1\n      \n      :bdg-warning:`Legacy`\n\n   .. grid-item-card:: TensorFlow (Inf1)\n      :link: /archive/tensorflow/setup-legacy-inf1-tensorflow\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      TensorFlow 2.x with tensorflow-neuron (archived)\n      \n      :bdg-danger:`Archived`\n\nAdditional resources\n--------------------\n\n- :doc:`/setup/torch-neuron` - Original PyTorch Neuron setup (Inf1)\n- :doc:`/archive/tensorflow/tensorflow-neuron-inference` - TensorFlow Neuron inference (Inf1)\n- :doc:`/release-notes/index` - Version compatibility\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n   \n   pytorch\n"
  },
  {
    "path": "setup/legacy-inf1/pytorch.rst",
    "content": ".. meta::\n   :description: Legacy PyTorch installation guide for AWS Inferentia 1 (Inf1) instances\n   :keywords: pytorch, neuron, inf1, legacy, installation, torch-neuron\n   :framework: pytorch\n   :instance-types: inf1\n   :status: legacy\n   :content-type: legacy-guide\n   :date-modified: 2026-03-30\n\nPyTorch on Inf1 (legacy)\n=========================\n\n.. warning::\n   \n   **Legacy hardware**: Inf1 instances use NeuronCore v1 with PyTorch 1.x (``torch-neuron``).\n   \n   For new projects, use **Inf2, Trn1, Trn2, or Trn3** with PyTorch 2.9+ (``torch-neuronx``).\n   See :doc:`/setup/pytorch/index` for current PyTorch setup.\n\nKey differences from current PyTorch\n--------------------------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 35 35\n\n   * - Feature\n     - Inf1 (torch-neuron)\n     - Inf2, Trn1, Trn2, Trn3 (torch-neuronx)\n   * - PyTorch version\n     - 1.x\n     - 2.9+\n   * - Backend\n     - PyTorch/XLA (``torch_neuron``)\n     - Native Neuron (``torch_neuronx``)\n   * - Compilation\n     - ``torch_neuron.trace()``\n     - ``torch.compile(backend='neuronx')``\n   * - Training support\n     - No\n     - Yes\n   * - NeuronCore version\n     - v1\n     - v2\n\nSetup instructions\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 20.04\n\n      **Launch Instance**\n\n      * `Launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ and select an Inf1 instance type.\n      * Select Ubuntu Server 20 AMI.\n      * `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_.\n\n      **Install Drivers and Tools**\n\n      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n      .. include:: /includes/setup/tab-inference-torch-neuron-u20.txt\n\n   .. tab-item:: Ubuntu 22.04\n\n      **Launch Instance**\n\n      * `Launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ and select an Inf1 instance type.\n      * Select Ubuntu Server 22 AMI.\n      * `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_.\n\n      **Install Drivers and Tools**\n\n      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n      .. include:: /includes/setup/tab-inference-torch-neuron-u22.txt\n\n   .. tab-item:: Amazon Linux 2023\n\n      **Launch Instance**\n\n      * `Launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ and select an Inf1 instance type.\n      * Select Amazon Linux 2023 AMI.\n      * `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_.\n\n      **Install Drivers and Tools**\n\n      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools\n\n      .. 
include:: /includes/setup/tab-inference-torch-neuron-al2023.txt\n\nUpdate an Existing Installation\n-------------------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 20.04\n\n      .. include:: /archive/torch-neuron/setup/pytorch-update-u20.rst\n\n   .. tab-item:: Ubuntu 22.04\n\n      .. include:: /archive/torch-neuron/setup/pytorch-update-u22.rst\n\n   .. tab-item:: Amazon Linux 2023\n\n      .. include:: /archive/torch-neuron/setup/pytorch-update-al2023.rst\n\nPrevious Versions\n-----------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 20.04\n\n      .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst\n\n   .. tab-item:: Ubuntu 22.04\n\n      .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u22.rst\n\n   .. tab-item:: Amazon Linux 2023\n\n      .. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2023.rst\n\nVerification\n------------\n\nAfter installation, verify with:\n\n.. code-block:: python\n   \n   import torch\n   import torch_neuron\n   \n   print(f\"torch-neuron version: {torch_neuron.__version__}\")\n\n.. code-block:: bash\n   \n   neuron-ls\n\nNext steps\n----------\n\n- :doc:`/archive/torch-neuron/api-reference-guide-torch-neuron` - torch-neuron API reference\n- :doc:`/frameworks/torch/inference-torch-neuronx` - Inference guides\n- :ref:`setup-guide-index` - Current setup options (Inf2, Trn1, Trn2, Trn3)\n"
  },
  {
    "path": "setup/multiframework-dlami.rst",
    "content": ".. meta::\n   :description: Get started with the Neuron Multi-Framework Deep Learning AMI for PyTorch, JAX, and vLLM on Inf2, Trn1, Trn2, Trn3\n   :keywords: neuron, dlami, multi-framework, pytorch, jax, vllm, installation\n   :instance-types: inf2, trn1, trn2, trn3\n   :content-type: installation-guide\n   :date-modified: 2026-03-30\n\n.. _setup-multiframework-dlami:\n\nGet started with the Neuron multi-framework DLAMI\n===================================================\n\nThe Neuron multi-framework Deep Learning AMI (DLAMI) provides a pre-configured environment\nwith multiple frameworks and libraries ready to use. Each framework has its own virtual\nenvironment with all Neuron components pre-installed.\n\nThe multi-framework DLAMI supports Inf2, Trn1, Trn1n, Trn2, and Trn3 instances and is\nupdated with each Neuron SDK release.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\nStep 1: Launch the instance\n----------------------------\n\n.. important::\n   Currently, only Ubuntu 24.04 is supported for multi-framework DLAMIs.\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n\n      Open the `EC2 Console <https://console.aws.amazon.com/ec2>`_, select your desired\n      AWS region, and choose \"Launch Instance\". Under AMI selection, choose \"Quick Start\"\n      then \"Ubuntu\", and select **Deep Learning AMI Neuron (Ubuntu 24.04)**.\n\n      .. image:: /images/neuron-multi-framework-dlami-U24-quick-start.png\n         :scale: 20%\n         :align: center\n\n      Select your desired Neuron instance type (Inf2, Trn1, Trn1n, Trn2, or Trn3),\n      configure disk size (minimum 512 GB for Trn instances), and launch the instance.\n\n.. note::\n\n   To retrieve the latest DLAMI ID programmatically for automation flows, use\n   :ref:`SSM parameters <ssm-parameter-neuron-dlami>`.\n\nStep 2: Activate a virtual environment\n----------------------------------------\n\nThe multi-framework DLAMI includes pre-configured virtual environments for each\nsupported framework and library. Activate the one that matches your use case:\n\n1. Find the virtual environment name for your framework or library in the\n   :ref:`Neuron DLAMI overview <neuron-dlami-multifw-venvs>`.\n\n2. Activate the virtual environment:\n\n   .. code-block:: bash\n\n      source /opt/<name_of_virtual_environment>/bin/activate\n\nCommon virtual environments include:\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 40 30\n\n   * - Framework\n     - Virtual environment\n     - Use case\n   * - PyTorch 2.9\n     - ``aws_neuronx_venv_pytorch``\n     - Training and inference\n   * - PyTorch vLLM\n     - ``aws_neuronx_venv_pytorch_inference_vllm``\n     - LLM inference serving\n   * - JAX\n     - ``aws_neuronx_venv_jax``\n     - Training and inference\n\n.. note::\n\n   Virtual environment names and available frameworks may vary by DLAMI version.\n   See :ref:`neuron-dlami-multifw-venvs` for the complete list.\n\nStep 3: Verify and start\n--------------------------\n\nAfter activating a virtual environment, verify the installation:\n\n.. tab-set::\n\n   .. tab-item:: PyTorch\n\n      .. code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the framework, versions, instance IDs, and details should match your expected ones, not the ones in this example):\n\n      .. 
code-block::\n\n         PyTorch 2.9.1+cu128, torch-neuronx 2.9.0.2.13.23887+8e870898\n         $ neuron-ls\n         instance-type: trn1.2xlarge\n         instance-id: i-0bea223b1afb7e159\n         +--------+--------+----------+--------+--------------+----------+------+\n         | NEURON | NEURON |  NEURON  | NEURON |     PCI      |   CPU    | NUMA |\n         | DEVICE | CORES  | CORE IDS | MEMORY |     BDF      | AFFINITY | NODE |\n         +--------+--------+----------+--------+--------------+----------+------+\n         | 0      | 2      | 0-1      | 32 GB  | 0000:00:1e.0 | 0-7      | -1   |\n         +--------+--------+----------+--------+--------------+----------+------+\n\n   .. tab-item:: JAX\n\n      .. code-block:: bash\n\n         python3 -c \"import jax; print(f'JAX {jax.__version__}'); print(f'Devices: {jax.devices()}')\"\n         neuron-ls\n\n      You should see output similar to this (the framework, versions, instance IDs, and details should match your expected ones, not the ones in this example):\n\n      .. code-block::\n\n         JAX 0.6.2.1.0.1\n         Devices: [NeuronDevice(id=0), NeuronDevice(id=1)]\n         $ neuron-ls\n         instance-type: trn1.2xlarge\n         instance-id: i-0bea223b1afb7e159\n         +--------+--------+----------+--------+--------------+----------+------+\n         | NEURON | NEURON |  NEURON  | NEURON |     PCI      |   CPU    | NUMA |\n         | DEVICE | CORES  | CORE IDS | MEMORY |     BDF      | AFFINITY | NODE |\n         +--------+--------+----------+--------+--------------+----------+------+\n         | 0      | 2      | 0-1      | 32 GB  | 0000:00:1e.0 | 0-7      | -1   |\n         +--------+--------+----------+--------+--------------+----------+------+\n
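\nAs an optional smoke test from the PyTorch virtual environment, you can run a tiny computation on a NeuronCore. This is a minimal sketch that relies on the ``torch_xla`` integration shipped with ``torch-neuronx``; the exact versions and device behavior on your instance may differ:\n\n.. code-block:: bash\n\n   # Runs a small tensor op on an XLA device backed by a NeuronCore (PyTorch venv)\n   python3 -c \"import torch; import torch_xla.core.xla_model as xm; d = xm.xla_device(); print((torch.rand(2, 2, device=d) + 1).cpu())\"\n\nNext steps\n----------\n\nAfter setup, explore the framework documentation:\n\n- :doc:`/frameworks/torch/index` - PyTorch on Neuron\n- :doc:`/frameworks/jax/index` - JAX on Neuron\n- :doc:`/libraries/nxd-inference/vllm/index` - vLLM on Neuron\n- :doc:`/dlami/index` - Full DLAMI documentation and SSM parameters\n"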
  },
  {
    "path": "setup/mxnet-neuron.rst",
    "content": ".. _setup-mxnet-neuron:\n\nMxNet Neuron (``mxnet-neuron``) Setup\n=====================================\n\n.. warning::\n\n   MXNet Neuron has been archived. For new projects, use PyTorch or JAX on Inf2, Trn1, Trn2, or Trn3 instances.\n   See :ref:`setup-guide-index` for current setup options.\n\n.. note::\n   This Setup guide is relevant for ``Inf1`` instances only.\n\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n\n``mxnet-neuron`` setup on Ubuntu 20 \n-----------------------------------\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   MXNet Neuron on Ubuntu 20 </archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20>\n   MXNet Neuron on DLAMI Base (Ubuntu 20) </archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20-base-dlami>\n\n.. card:: Ubuntu 20 (Ubuntu20 AMI)\n        :link: setup-mxnet-neuron-u20\n        :link-type: ref\n        :class-body: sphinx-design-class-title-small\n\n.. card:: Ubuntu 20 (DLAMI Base AMI)\n        :link: setup-mxnet-neuron-u20-base-dlami\n        :link-type: ref\n        :class-body: sphinx-design-class-title-small\n\n``mxnet-neuron`` setup on Ubuntu 22\n-----------------------------------\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   MXNet Neuron on Ubuntu 22 </archive/mxnet-neuron/setup/mxnet-neuron-ubuntu22>\n\n.. card:: Ubuntu 22 (Ubuntu22 AMI)\n        :link: setup-mxnet-neuron-u22\n        :link-type: ref\n        :class-body: sphinx-design-class-title-small\n\n\n``mxnet-neuron`` setup on Amazon Linux 2023 (AL2023)\n----------------------------------------------------\n\n.. toctree::\n   :maxdepth: 1\n   :hidden:\n\n   MXNet Neuron on Amazon Linux 2023 </archive/mxnet-neuron/setup/mxnet-neuron-al2023>\n\n.. card:: Amazon Linux 2023 (Amazon Linux 2023 AMI)\n        :link: setup-mxnet-neuron-al2023\n        :link-type: ref\n        :class-body: sphinx-design-class-title-small\n"
  },
  {
    "path": "setup/notebook/running-jupyter-notebook-as-script.rst",
    "content": ".. _running-jupyter-notebook-as-script:\n\nRunning Jupyter Notebook as script\n==================================\n\nConverting the Jupyter Notebook and running\n-------------------------------------------\n\nGo into the aws-neuron-sdk repository directory containing the Jupyter Notebook (.ipynb file),\n\n.. code:: bash\n\n   cd aws-neuron-sdk/src/examples/<framework like pytorch, tensorflow, etc>\n\nThe Jupyter Notebook (.ipynb) can be converted to python script using jupyter-nbconvert. For example,\n\n.. code:: bash\n\n  jupyter nbconvert --to script tutorial_pretrained_bert.ipynb\n\nand can be run in the virtual env (if needed),\n\n.. code:: bash\n\n  # if not already in the virtual env,\n  source activate <virtual env>\n  # Run the converted script\n  python <tutorial.py>\n\n\n"
  },
  {
    "path": "setup/notebook/setup-jupyter-notebook-steps-troubleshooting.rst",
    "content": ".. _setup-jupyter-notebook-steps-troubleshooting:\n.. _Running Jupyter Notebook Browser:\n\nJupyter Notebook QuickStart\n===========================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nSSH Tunnel to the Inf1/Trn1 instance\n------------------------------------\nThe Jupyter notebook can be run via a browser on port 8888 by default. For simplicity we will use ssh port forwarding from your machine to the instance.\n\n::\n\n   ssh -i \"<pem file>\" <user>@<instance DNS name> -L 8888:127.0.0.1:8888\n\nOn an Ubuntu image the user will be ubuntu@, while on AL2 you should use\nec2-user@\n\nThis additional argument forwards connections to port 8888 on your\nmachine to the new Inf1/Trn1 instance.\n\n\nStarting the Jupyter Notebook on the instance\n---------------------------------------------\nFrom your ssh prompt on the Inf1/Trn1 instance run\n\n::\n\n   jupyter notebook\n\nYou should see logging in your ssh session similar to:\n\n.. code:: bash\n\n   [I 21:53:11.729 NotebookApp] Using EnvironmentKernelSpecManager...\n   [I 21:53:11.730 NotebookApp] Started periodic updates of the kernel list (every 3 minutes).\n   [I 21:53:11.867 NotebookApp] Loading IPython parallel extension\n   [I 21:53:11.884 NotebookApp] JupyterLab beta preview extension loaded from /home/ubuntu/anaconda3/lib/python3.6/site-packages/jupyterlab\n   [I 21:53:11.884 NotebookApp] JupyterLab application directory is /home/ubuntu/anaconda3/share/jupyter/lab\n   [I 21:53:12.002 NotebookApp] [nb_conda] enabled\n   [I 21:53:12.004 NotebookApp] Serving notebooks from local directory: /home/ubuntu/tutorial\n   [I 21:53:12.004 NotebookApp] 0 active kernels\n   [I 21:53:12.004 NotebookApp] The Jupyter Notebook is running at:\n   [I 21:53:12.004 NotebookApp] http://localhost:8888/?token=f9ad4086afd3c91f33d5587781f9fd8143b4cafbbf121a16\n   [I 21:53:12.004 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).\n   [W 21:53:12.004 NotebookApp] No web browser found: could not locate runnable browser.\n\n\nCopy/paste this URL into your browser when you connect for the first\ntime, to login with a token:\n``http://localhost:8888/?token=f9ad4086afd3c91f33d5587781f9fd8143b4cafbbf121a16&token=f9ad4086afd3c91f33d5587781f9fd8143b4cafbbf121a16``\n\n.. code:: bash\n\n   [I 21:53:12.004 NotebookApp] Starting initial scan of virtual environments...\n   [I 21:53:13.507 NotebookApp] Found new kernels in environments: conda_tensorflow2_p27, conda_aws_neuron_mxnet_p36, conda_anaconda3, conda_tensorflow_p27, conda_chainer_p27, conda_python3, conda_tensorflow_p36, conda_aws_neuron_tensorflow_p36, conda_mxnet_p27, **conda_my_notebook_env**, conda_tensorflow2_p36, conda_pytorch_p27, conda_python2, conda_chainer_p36, conda_mxnet_p36, conda_pytorch_p36\n\n\n\nRunning the Jupyter Notebook from your local browser\n----------------------------------------------------\n\nIf you copy and paste the link that looks like\n``http://localhost:8888/?token=f9ad4086afd3c91f33d5587781f9fd8143b4cafbbf121a16&token=f9ad4086afd3c91f33d5587781f9fd8143b4cafbbf121a16``\ninto your local browser the Notebook navigation pane should pop up.\n\nThis works because ssh is forwarding you local port 8888 through to the\nInf1/Trn1 instance port 8888 where the notebook is running. Note that our new\nconda environment is visible as “kernel” with the “conda\\_” prefix\n(highlighted)\n\n1) In notebook browser select the tutorial.\n2) This will pop up a new tab. 
In that tab, use the menus:\n\nKernel → Change Kernel → Environment (conda_my_notebook_env)\n\n3) Start reading through the self-documenting notebook tutorial.\n\nTroubleshooting\n---------------\n\nIf your Jupyter notebook does not start, try the following:\n\n::\n\n   mv ~/.jupyter ~/.jupyter.old\n   mkdir -p ~/.jupyter\n   echo \"c.NotebookApp.iopub_data_rate_limit = 10000000000\" > ~/.jupyter/jupyter_notebook_config.py\n\n   # Install the Jupyter notebook kernel\n   pip install ipykernel\n   python3 -m ipykernel install --user --name aws_neuron_venv_pytorch --display-name \"Python Neuronx\"\n   pip install jupyter notebook\n   pip install environment_kernels\n\n   jupyter notebook\n"
  },
  {
    "path": "setup/pytorch/dlami.rst",
    "content": ".. meta::\n   :description: Install PyTorch Neuron using AWS Deep Learning AMI on Inf2, Trn1, Trn2, Trn3\n   :keywords: pytorch, neuron, dlami, installation, ami\n   :framework: pytorch\n   :installation-method: dlami\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :python-versions: 3.10, 3.11, 3.12\n   :content-type: installation-guide\n   :estimated-time: 5 minutes\n   :date-modified: 2026-03-03\n\nInstall PyTorch via Deep Learning AMI\n======================================\n\nInstall PyTorch with Neuron support using pre-configured AWS Deep Learning AMIs.\n\n⏱️ **Estimated time**: 5 minutes\n\n.. note::\n   Want to read about Neuron's Deep Learning machine images (DLAMIs) before diving in? Check out the :doc:`/dlami/index`.\n\n----\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - AWS Account\n     - With EC2 permissions\n   * - SSH Key Pair\n     - For instance access\n   * - AWS CLI\n     - Configured with credentials (optional)\n\nInstallation Steps\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest PyTorch DLAMI for Ubuntu 24.04 using the AWS CLI:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n\n      You can also use the AWS EC2 parameter store to find the ID of a DLAMI. See `https://docs.aws.amazon.com/dlami/latest/devguide/find-dlami-id.html`__ for details. Record the ID (``image-id``) for the next step.\n      \n      **Step 2: Launch Instance**\n      \n      Launch a Trn1 or Inf2 instance with the AMI using the AWS CLI:\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      Replace:\n      \n      - ``ami-xxxxxxxxxxxxxxxxx`` with AMI ID from Step 1\n      - ``your-key-pair`` with your SSH key pair name\n      - ``sg-xxxxxxxxx`` with your security group ID\n      - ``subnet-xxxxxxxxx`` with your subnet ID\n\n       You can also launch your DLAMI through the AWS EC2 web console, which also provides hints for security group and subnet IDs. For more details, see `https://docs.aws.amazon.com/dlami/latest/devguide/launch.html`__.\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ubuntu@<instance-public-ip>\n      \n      **Step 4: Activate Environment**\n      \n      The DLAMI includes a pre-configured virtual environment:\n      \n      .. code-block:: bash\n         \n         source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. 
code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n      **Expected output**:\n      \n      .. code-block:: text\n         \n         PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'torch_neuronx'``:\n         \n         1. Verify virtual environment is activated:\n            \n            .. code-block:: bash\n               \n               which python\n               # Should show:  source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate\n         \n         2. Check Python version:\n            \n            .. code-block:: bash\n               \n               python --version\n               # Should be 3.10 or higher\n         \n         3. Reinstall torch-neuronx:\n            \n            .. code-block:: bash\n               \n               pip install --force-reinstall torch-neuronx\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         4. Verify instance type:\n            \n            .. code-block:: bash\n               \n               curl http://169.254.169.254/latest/meta-data/instance-type\n               # Should show trn1.*, trn2.*, trn3.*, or inf2.*\n         \n         5. Check Neuron driver:\n            \n            .. code-block:: bash\n               \n               lsmod | grep neuron\n               # Should show neuron driver loaded\n         \n         6. Restart Neuron runtime:\n            \n            .. code-block:: bash\n               \n               sudo systemctl restart neuron-monitor\n               neuron-ls\n\n   .. tab-item:: Ubuntu 22.04\n      :sync: ubuntu-22-04\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest PyTorch DLAMI for Ubuntu 22.04:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 22.04)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n      \n      **Step 2: Launch Instance**\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ubuntu@<instance-public-ip>\n      \n      **Step 4: Activate Environment**\n      \n      .. 
code-block:: bash\n         \n         source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n      **Expected output**:\n      \n      .. code-block:: text\n         \n         PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'torch_neuronx'``:\n         \n         1. Verify virtual environment is activated\n         2. Check Python version: ``python --version`` (should be 3.10+)\n         3. Reinstall: ``pip install --force-reinstall torch-neuronx``\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         1. Verify instance type\n         2. Check Neuron driver: ``lsmod | grep neuron``\n         3. Restart runtime: ``sudo systemctl restart neuron-monitor``\n\n   .. tab-item:: Amazon Linux 2023\n      :sync: al2023\n      \n      **Step 1: Find the Latest AMI**\n      \n      Get the latest PyTorch DLAMI for Amazon Linux 2023:\n      \n      .. code-block:: bash\n         \n         aws ec2 describe-images \\\n           --owners amazon \\\n           --filters \"Name=name,Values=Deep Learning AMI Neuron PyTorch 2.9 (Amazon Linux 2023)*\" \\\n           --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \\\n           --output text\n      \n      **Step 2: Launch Instance**\n      \n      .. code-block:: bash\n         \n         aws ec2 run-instances \\\n           --image-id ami-xxxxxxxxxxxxxxxxx \\\n           --instance-type trn1.2xlarge \\\n           --key-name your-key-pair \\\n           --security-group-ids sg-xxxxxxxxx \\\n           --subnet-id subnet-xxxxxxxxx\n      \n      **Step 3: Connect to Instance**\n      \n      .. code-block:: bash\n         \n         ssh -i your-key-pair.pem ec2-user@<instance-public-ip>\n      \n      .. note::\n         \n         Amazon Linux 2023 uses ``ec2-user`` instead of ``ubuntu``.\n      \n      **Step 4: Activate Environment**\n      \n      .. code-block:: bash\n         \n         source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate\n      \n      **Step 5: Verify Installation**\n      \n      .. code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n      **Expected output**:\n      \n      .. 
code-block:: text\n         \n         PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n      \n      .. dropdown:: ⚠️ Troubleshooting: Module not found\n         :color: warning\n         :animate: fade-in\n         \n         If you see ``ModuleNotFoundError: No module named 'torch_neuronx'``:\n         \n         1. Verify virtual environment is activated\n         2. Check Python version: ``python --version`` (should be 3.10+)\n         3. Reinstall: ``pip install --force-reinstall torch-neuronx``\n      \n      .. dropdown:: ⚠️ Troubleshooting: No Neuron devices found\n         :color: warning\n         :animate: fade-in\n         \n         If ``neuron-ls`` shows no devices:\n         \n         1. Verify instance type\n         2. Check Neuron driver: ``lsmod | grep neuron``\n         3. Restart runtime: ``sudo systemctl restart neuron-monitor``\n\nUpdate an existing installation\n--------------------------------\n\nTo update PyTorch versions or Neuron drivers on an existing DLAMI, see\n:doc:`update-dlami`.\n\n\n.. tip:: **vLLM for LLM inference**\n   \n   Neuron provides a dedicated vLLM DLAMI with vLLM and the vLLM-Neuron Plugin pre-installed.\n   Launch the **Deep Learning AMI Neuron PyTorch Inference vLLM (Ubuntu 24.04)** and activate\n   the pre-configured environment:\n   \n   .. code-block:: bash\n      \n      source /opt/aws_neuronx_venv_pytorch_inference_vllm_0_16/bin/activate\n   \n   vLLM provides an OpenAI-compatible API, continuous batching, and supports models like\n   Llama 2/3.1/3.3/4, Qwen 2.5/3, and multimodal models with quantization support (INT8/FP8).\n   \n   The vLLM environment is also available in the multi-framework DLAMI. For more details\n   on available DLAMIs and SSM parameters, see :doc:`/dlami/index`.\n\nNext Steps\n----------\n\nNow that PyTorch is installed:\n\n1. **Try a Quick Example**:\n   \n   .. code-block:: python\n      \n      import torch\n      import torch_neuronx\n\n      # Simple tensor operation on Neuron\n      x = torch.randn(3, 3)\n      model = torch.nn.Linear(3, 3)\n\n      # Compile for Neuron\n      trace = torch_neuronx.trace(model, x)\n      print(trace(x))\n\n2. **Follow Tutorials**:\n   \n   - :doc:`/frameworks/torch/training-torch-neuronx`\n   - :doc:`/frameworks/torch/inference-torch-neuronx`\n\n3. **Read Documentation**:\n   \n   - :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/index`\n   - :doc:`/frameworks/torch/index`\n\n4. **Explore Tools**:\n   \n   - :doc:`/tools/profiler/neuron-profile-user-guide`\n   - :doc:`/tools/neuron-sys-tools/neuron-top-user-guide`\n\n5. **Deploy LLM inference**: :doc:`/dlami/index` (vLLM on Neuron)\n\nAdditional Resources\n--------------------\n\n- :doc:`/dlami/index` - DLAMI documentation\n- :doc:`/containers/index` - Container-based deployment\n- :doc:`../troubleshooting` - Common issues and solutions\n- :doc:`/release-notes/index` - Version compatibility information\n"
  },
  {
    "path": "setup/pytorch/dlc.rst",
    "content": ".. meta::\n   :description: Install PyTorch Neuron using AWS Deep Learning Containers on Inf2, Trn1, Trn2, Trn3\n   :keywords: pytorch, neuron, dlc, deep learning containers, docker, installation, vllm, inference, training\n   :framework: pytorch\n   :installation-method: dlc\n   :instance-types: inf2, trn1, trn2, trn3\n   :content-type: installation-guide\n   :estimated-time: 10 minutes\n   :date-modified: 2026-03-30\n\nInstall PyTorch via Deep Learning Container\n=============================================\n\nDeploy PyTorch with Neuron support using pre-configured Docker images from AWS ECR.\n\n⏱️ **Estimated time**: 10 minutes\n\n.. note::\n   For a non-containerized setup, consider the :doc:`DLAMI-based installation <dlami>` or\n   :doc:`manual installation <manual>` instead.\n\n----\n\nWhat are Neuron DLCs?\n---------------------\n\nAWS Neuron Deep Learning Containers (DLCs) are pre-configured Docker images with the Neuron SDK\nand ML frameworks pre-installed. They provide Docker-based isolation, reproducibility, and\nportability across deployment platforms including EC2, EKS, ECS, and SageMaker.\n\nAvailable PyTorch Neuron DLC images:\n\n.. list-table::\n   :header-rows: 1\n   :widths: 40 30 30\n\n   * - Container Type\n     - Use Case\n     - Links\n   * - PyTorch Inference (NeuronX)\n     - Model serving on Inf2/Trn1/Trn2/Trn3\n     - `Inference images <https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx>`__\n   * - PyTorch Inference vLLM (NeuronX)\n     - LLM serving with vLLM\n     - `vLLM images <https://github.com/aws-neuron/deep-learning-containers#vllm-inference-neuronx>`__\n   * - PyTorch Training (NeuronX)\n     - Model training on Trn1/Trn2/Trn3\n     - `Training images <https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx>`__\n   * - PyTorch Inference (Neuron)\n     - Legacy inference on Inf1\n     - `Inf1 images <https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuron>`__\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n\n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - Docker\n     - Docker Engine installed and running\n   * - AWS CLI\n     - Configured with ECR access permissions\n   * - Neuron Driver\n     - ``aws-neuronx-dkms`` installed on the host\n\nQuick Start: vLLM Inference Container\n--------------------------------------\n\nThe fastest way to get started with LLM inference on Neuron:\n\n.. code-block:: bash\n\n   # Authenticate with ECR\n   aws ecr get-login-password --region us-east-1 | \\\n     docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com\n\n   # Pull the vLLM inference container\n   docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04\n\n   # Run with Neuron device access\n   docker run -it --device=/dev/neuron0 \\\n     763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04\n\nFor the latest image tags and a step-by-step walkthrough, see\n:doc:`/containers/get-started/quickstart-configure-deploy-dlc`.\n\nQuick Start: Training Container\n--------------------------------\n\n.. 
code-block:: bash\n\n   # Authenticate with ECR\n   aws ecr get-login-password --region us-east-1 | \\\n     docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com\n\n   # Pull the training container\n   docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04\n\n   # Run with all Neuron devices\n   docker run -it --device=/dev/neuron0 --device=/dev/neuron1 \\\n     763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04\n\n.. note::\n   The image tags above are examples. For the latest available images, see the\n   `Neuron DLC repository <https://github.com/aws-neuron/deep-learning-containers>`__.\n\nCustomizing a DLC\n-----------------\n\nYou can extend a Neuron DLC with additional packages by creating a custom Dockerfile:\n\n.. code-block:: dockerfile\n\n   FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuronx:2.1.2-neuronx-py310-sdk2.20.2-ubuntu20.04\n\n   # Install additional packages\n   RUN pip install transformers datasets\n\n   # Copy your application code\n   COPY app/ /app/\n\nFor more details, see :doc:`/containers/dlc-then-customize-devflow`.\n\nDeployment Platforms\n--------------------\n\nNeuron DLCs can be deployed across multiple AWS services:\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Amazon EC2\n      :link: /containers/dlc-then-ec2-devflow\n      :link-type: doc\n      :class-card: sd-rounded-3\n\n      Deploy containers directly on EC2 instances with Neuron devices.\n\n   .. grid-item-card:: Amazon EKS\n      :link: /containers/dlc-then-eks-devflow\n      :link-type: doc\n      :class-card: sd-rounded-3\n\n      Run containers on managed Kubernetes with the Neuron device plugin.\n\n   .. grid-item-card:: Amazon ECS\n      :link: /containers/dlc-then-ecs-devflow\n      :link-type: doc\n      :class-card: sd-rounded-3\n\n      Deploy containers using Amazon Elastic Container Service.\n\nNext Steps\n----------\n\n- :doc:`/containers/get-started/quickstart-configure-deploy-dlc` - Full vLLM DLC deployment walkthrough\n- :doc:`/containers/locate-neuron-dlc-image` - Find the right DLC image for your workload\n- :doc:`/containers/index` - Full containers documentation\n- :doc:`/frameworks/torch/training-torch-neuronx` - Training tutorials\n- :doc:`/frameworks/torch/inference-torch-neuronx` - Inference tutorials\n"
  },
  {
    "path": "setup/pytorch/index.rst",
    "content": ".. meta::\n   :description: Install PyTorch for AWS Neuron on Inf2, Trn1, Trn2, Trn3 instances\n   :keywords: pytorch, neuron, installation, trn1, trn2, trn3, inf2\n   :framework: pytorch\n   :instance-types: inf2, trn1, trn2, trn3\n   :content-type: framework-setup-hub\n   :date-modified: 2026-03-03\n\n.. _pytorch-setup:\n\nInstall PyTorch for Neuron\n===========================\n\nInstall PyTorch with AWS Neuron support for training and inference on Inferentia and Trainium instances.\n\n**Supported Instances**: Inf2, Trn1, Trn2, Trn3\n\n**PyTorch Version**: 2.9+ with Native Neuron backend\n\nChoose installation method\n---------------------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 20 40 40\n\n   * - Method\n     - Best for\n     - Considerations\n   * - :doc:`DLAMI <dlami>`\n     - Getting started quickly, prototyping, single-user development\n     - Pre-configured with tested dependency versions; launch a new AMI to update\n   * - :doc:`DLC <dlc>`\n     - Production deployments, CI/CD pipelines, multi-tenant environments\n     - Requires Docker and Neuron driver on host; portable across EC2, ECS, EKS\n   * - :doc:`Manual <manual>`\n     - Custom OS images, shared clusters, integrating into existing environments\n     - Full control over versions and dependencies; requires manual dependency management\n\n.. grid:: 1 1 3 3\n   :gutter: 3\n\n   .. grid-item-card:: 🚀 AWS Deep Learning AMI\n      :link: dlami\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **Recommended for most users**\n      \n      Pre-configured environment with all dependencies\n      \n      ✅ All dependencies included\n      \n      ✅ Tested configurations\n      \n      ✅ Multiple Python versions\n      \n      ⏱️ **Setup time**: ~5 minutes\n\n   .. grid-item-card:: 🐳 Deep Learning Container\n      :link: dlc\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **For containerized deployments**\n      \n      Pre-configured Docker images from AWS ECR\n      \n      ✅ Docker-based isolation\n      \n      ✅ Training and inference images\n      \n      ✅ vLLM-ready images available\n      \n      ⏱️ **Setup time**: ~10 minutes\n\n   .. grid-item-card:: 🔧 Manual Installation\n      :link: manual\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      **For custom environments**\n      \n      Install on existing systems or custom setups\n      \n      ✅ Existing system integration\n      \n      ✅ Custom Python versions\n      \n      ✅ Full control over dependencies\n      \n      ⏱️ **Setup time**: ~15 minutes\n\nPrerequisites\n-------------\n\nBefore installing, ensure you have:\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3 instance\n   * - Operating System\n     - Ubuntu 24.04, Ubuntu 22.04, or Amazon Linux 2023\n   * - Python Version\n     - Python 3.10, 3.11, or 3.12\n   * - AWS Account\n     - With EC2 launch permissions\n   * - SSH Access\n     - Key pair for instance connection\n\nWhat You'll Get\n---------------\n\nAfter installation, you'll have:\n\n- **PyTorch 2.9+** with Native Neuron backend\n- **torch-neuronx** package for Neuron-specific operations\n- **neuronx-cc** compiler for model optimization\n- **Neuron Runtime** for model execution\n- **Neuron Tools** for profiling and debugging\n\nVersion Information\n-------------------\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 30 70\n   \n   * - Component\n     - Version\n   * - PyTorch\n     - 2.9.0+\n   * - torch-neuronx\n     - 2.9.0+\n   * - neuronx-cc\n     - 2.15.0+\n   * - Python\n     - 3.10, 3.11, 3.12\n\nNext Steps\n----------\n\nAfter installation:\n\n1. **Verify Installation**: Run verification commands in the installation guide\n2. **Try a Tutorial**: \n   \n   * **Inference**: :doc:`/libraries/nxd-inference/vllm/quickstart-vllm-online-serving`\n   * **Training**: :doc:`/frameworks/torch/torch-neuronx/tutorials/training/mlp`\n  \n3. **Read the torch-neuronx Programming Guide**: :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/index`\n4. **Explore Examples**: :doc:`/frameworks/torch/index`\n\nUpdate an existing installation\n--------------------------------\n\nAlready have PyTorch Neuron installed and need to update to a newer PyTorch version or Neuron SDK release? Select the guide that matches your installation method.\n\n.. grid:: 1 1 3 3\n   :gutter: 3\n\n   .. grid-item-card:: 🔄 Update DLAMI\n      :link: update-dlami\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      Update PyTorch and drivers on an existing Deep Learning AMI\n\n   .. grid-item-card:: 🔄 Update DLC\n      :link: update-dlc\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      Pull the latest container image and update the host driver\n\n   .. grid-item-card:: 🔄 Update manual install\n      :link: update-manual\n      :link-type: doc\n      :class-card: sd-border-2\n      \n      Update PyTorch packages and drivers on a manual installation\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n   \n   New DLAMI <dlami>\n   Update Existing DLAMI <update-dlami>\n   New DLC <dlc>\n   Update Existing DLC <update-dlc>\n   New Manual Configuration <manual>\n   Update Manual Configuration <update-manual>\n"
  },
  {
    "path": "setup/pytorch/manual.rst",
    "content": ".. meta::\n   :description: Manual installation of PyTorch Neuron on Inf2, Trn1, Trn2, Trn3 instances\n   :keywords: pytorch, neuron, manual, installation, pip\n   :framework: pytorch\n   :installation-method: manual\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :python-versions: 3.10, 3.11, 3.12\n   :content-type: installation-guide\n   :estimated-time: 15 minutes\n   :date-modified: 2026-03-30\n\nInstall PyTorch via manual installation\n========================================\n\nInstall PyTorch with Neuron support on a bare OS AMI or existing system.\n\n⏱️ **Estimated time**: 15 minutes\n\n.. note::\n   For a faster setup, consider using the :doc:`DLAMI-based installation <dlami>` instead.\n\n.. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n----\n\nPrerequisites\n-------------\n\n.. list-table::\n   :header-rows: 1\n   :widths: 30 70\n\n   * - Requirement\n     - Details\n   * - Instance Type\n     - Inf2, Trn1, Trn2, or Trn3\n   * - Operating System\n     - Ubuntu 24.04, Ubuntu 22.04, or Amazon Linux 2023\n   * - Python Version\n     - Python 3.10, 3.11, or 3.12\n   * - AWS Account\n     - With EC2 permissions\n   * - SSH Key Pair\n     - For instance access\n\nInstallation steps\n------------------\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n      :sync: ubuntu-24-04\n\n      **Step 1: Launch instance**\n\n      * Follow the instructions to `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_.\n      * Select Ubuntu Server 24 AMI.\n      * For Trn1, adjust your primary EBS volume size to a minimum of 512GB.\n      * `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_.\n\n      **Step 2: Install drivers and tools**\n\n      .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n          :start-line: 299\n          :end-line: 300\n\n      **Step 3: Install EFA** (Trn1/Trn1n/Trn2/Trn3 only)\n\n      .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n          :start-line: 290\n          :end-line: 293\n\n      **Step 4: Install PyTorch and Neuron packages**\n\n      .. tab-set::\n\n          .. tab-item:: PyTorch 2.9.0\n\n              .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n                  :start-line: 296\n                  :end-line: 297\n\n          .. tab-item:: PyTorch 2.8.0\n\n              .. note::\n                PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu24.html#setup-torch-neuronx-ubuntu24>`__.\n\n      **Step 5: Verify installation**\n\n      .. code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n      **Expected output**:\n      \n      .. 
code-block:: text\n         \n         PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n\n   .. tab-item:: Ubuntu 22.04\n      :sync: ubuntu-22-04\n\n      **Step 1: Launch instance**\n\n      * Follow the instructions to `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_.\n      * Select Ubuntu Server 22 AMI.\n      * For Trn1, adjust your primary EBS volume size to a minimum of 512GB.\n      * `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_.\n\n      **Step 2: Install drivers and tools**\n\n      .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n          :start-line: 242\n          :end-line: 243\n\n      **Step 3: Install EFA** (Trn1/Trn1n/Trn2/Trn3 only)\n\n      .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n          :start-line: 248\n          :end-line: 249\n\n      **Step 4: Install PyTorch and Neuron packages**\n\n      .. tab-set::\n\n          .. tab-item:: PyTorch 2.9.0\n\n              .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n                  :start-line: 286\n                  :end-line: 287\n\n          .. tab-item:: PyTorch 2.8.0\n\n              .. note::\n                  PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22>`__.\n\n      **Step 5: Verify installation**\n\n      .. code-block:: bash\n\n         python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n         neuron-ls\n\n      You should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n      **Expected output**:\n      \n      .. code-block:: text\n         \n         PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n         \n         +--------+--------+--------+-----------+\n         | DEVICE | CORES  | MEMORY | CONNECTED |\n         +--------+--------+--------+-----------+\n         | 0      | 2      | 32 GB  | Yes       |\n         | 1      | 2      | 32 GB  | Yes       |\n         +--------+--------+--------+-----------+\n\n   .. tab-item:: Amazon Linux 2023\n      :sync: al2023\n\n      .. note::\n         Currently, PyTorch 2.9 is not available on Amazon Linux 2023, and PyTorch 2.7 and 2.8 are no longer supported on Neuron. Use Ubuntu 24.04 for PyTorch 2.9 support. If you are using Neuron 2.28.0, `see the Amazon Linux 2023 setup documentation in the 2.28.0 version of the Neuron docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.html>`__.\n\n\n.. 
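\n\nThe installation steps above pull the exact ``pip`` commands from an included helper script. If you are adapting them to your own environment, the general pattern is to create a Python virtual environment and install the Neuron packages from the Neuron pip repository. A minimal sketch (the virtual environment path and package list are illustrative, not the exact commands from the helper script):\n\n.. code-block:: bash\n\n   # Create and activate a virtual environment (path is an example)\n   python3 -m venv ~/aws_neuron_venv_pytorch\n   source ~/aws_neuron_venv_pytorch/bin/activate\n   python -m pip install -U pip\n\n   # Install the PyTorch Neuron packages from the Neuron pip repository\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nAfter installing, re-run the verification commands from Step 5 to confirm that the packages import and ``neuron-ls`` lists your devices.\n\n.. 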
tip:: **vLLM for LLM inference**\n\n   After completing the manual installation, you can add vLLM for inference serving\n   using the ``vllm-neuron`` plugin:\n\n   .. code-block:: bash\n\n      git clone https://github.com/vllm-project/vllm-neuron.git\n      cd vllm-neuron\n      pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com -e .\n\n   Or use the pre-configured vLLM DLC image for a containerized deployment.\n   See :doc:`/libraries/nxd-inference/vllm/index` for all deployment options.\n\nUpdate an existing installation\n--------------------------------\n\nTo update PyTorch versions or Neuron drivers on an existing manual installation, see\n:doc:`update-manual`.\n\nNext steps\n----------\n\n- :doc:`/frameworks/torch/training-torch-neuronx` - Training on Trn1/Trn2\n- :doc:`/frameworks/torch/inference-torch-neuronx` - Inference on Inf2/Trn1/Trn2\n- :doc:`/tools/profiler/neuron-profile-user-guide` - Profile your workloads\n- :doc:`/tools/neuron-sys-tools/neuron-top-user-guide` - Monitor system resources\n\nAdvanced\n--------\n\n- :doc:`/frameworks/torch/torch-neuronx/setup/pytorch-neuronx-install-cxx11` - Build torch-xla from source with CXX11 ABI\n\nAdditional resources\n--------------------\n\n- :doc:`dlami` - Use pre-configured DLAMI instead\n- :doc:`dlc` - Use pre-configured Docker containers\n- :doc:`/containers/index` - Container-based deployment\n- :doc:`../troubleshooting` - Common issues and solutions\n- :doc:`/release-notes/index` - Version compatibility information\n"
  },
  {
    "path": "setup/pytorch/update-dlami.rst",
    "content": ".. meta::\n   :description: Update PyTorch Neuron on an existing AWS Deep Learning AMI\n   :keywords: pytorch, neuron, dlami, update, upgrade, driver\n   :framework: pytorch\n   :installation-method: dlami\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :content-type: installation-guide\n   :date-modified: 2026-03-30\n\nUpdate PyTorch on a Deep Learning AMI\n=======================================\n\nUpdate PyTorch and Neuron components on an existing DLAMI to the latest release.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\n\nUpdate PyTorch on Ubuntu 24.04\n-------------------------------\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.9.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 293\n            :end-line: 294\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n\n\n\nUpdate PyTorch on Ubuntu 22.04\n-------------------------------\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.9.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n            :start-line: 284\n            :end-line: 285\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.html#setup-torch-neuronx-ubuntu22>`__.\n\n\nUpdate PyTorch on Amazon Linux 2023\n-------------------------------------\n\nIf you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands for your environment.\n\n.. tab-set::\n\n    .. tab-item:: PyTorch 2.8.0\n\n        .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst\n\n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.html#id2>`__.\n\n    .. tab-item:: PyTorch 2.7.0\n\n        .. note::\n            PyTorch versions 2.7 and 2.8 are no longer supported on Neuron. 
If you are looking for setup instructions specific to PyTorch 2.7 and 2.8 on Amazon Linux 2023, Ubuntu 24.04, or Ubuntu 22.04, see `the Neuron release 2.28.0 version of the setup docs <https://awsdocs-neuron.readthedocs-hosted.com/en/v2.28.0/setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.html#id2>`__.\n\n\nUpdate Neuron driver and runtime\n---------------------------------\n\nUpdate the Neuron driver, runtime, and tools on your DLAMI host. This is recommended\nwhen updating to a new Neuron SDK release.\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n         sudo apt-get install -y aws-neuronx-tools\n\n   .. tab-item:: Ubuntu 22.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n         sudo apt-get install -y aws-neuronx-tools\n\n   .. tab-item:: Amazon Linux 2023\n\n      .. code-block:: bash\n\n         sudo dnf install -y aws-neuronx-dkms\n         sudo dnf install -y aws-neuronx-runtime-lib\n         sudo dnf install -y aws-neuronx-collectives\n         sudo dnf install -y aws-neuronx-tools\n\n\nVerify the update\n------------------\n\nAfter updating, activate your virtual environment:\n\n.. code-block:: bash\n\n   source /opt/aws_neuronx_venv_pytorch/bin/activate\n\n\nAnd verify the update: \n\n.. code-block:: bash\n\n   python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n   neuron-ls\n\nYou should see output similar to this (the versions, instance IDs, and details should match your expected ones, not the ones in this example):\n      \n**Expected output**:\n\n.. code-block:: text\n   \n   PyTorch 2.9.0+cpu, torch-neuronx 2.9.0.1.0\n   \n   +--------+--------+--------+-----------+\n   | DEVICE | CORES  | MEMORY | CONNECTED |\n   +--------+--------+--------+-----------+\n   | 0      | 2      | 32 GB  | Yes       |\n   | 1      | 2      | 32 GB  | Yes       |\n   +--------+--------+--------+-----------+\n\n\nPrevious releases\n------------------\n\nTo install a specific previous Neuron SDK release:\n\n- :doc:`Previous releases for Ubuntu 24.04 </frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u24>`\n- :doc:`Previous releases for Ubuntu 22.04 </frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22>`\n- :doc:`Previous releases for Amazon Linux 2023 </frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023>`\n"
  },
  {
    "path": "setup/pytorch/update-dlc.rst",
    "content": ".. meta::\n   :description: Update PyTorch Neuron in a Deep Learning Container deployment\n   :keywords: pytorch, neuron, dlc, container, docker, update, upgrade, driver\n   :framework: pytorch\n   :installation-method: container\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :content-type: installation-guide\n   :date-modified: 2026-03-30\n\nUpdate PyTorch in a Deep Learning Container\n=============================================\n\nUpdate your DLC-based PyTorch Neuron deployment to the latest release.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\n\nUpdate the container image\n---------------------------\n\nDLC images are versioned and tagged with the Neuron SDK version. To update, pull the\nlatest image tag from ECR:\n\n.. code-block:: bash\n\n   # Training\n   docker pull public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag>\n\n   # Inference\n   docker pull public.ecr.aws/neuron/pytorch-inference-neuronx:<new_image_tag>\n\n   # vLLM Inference\n   docker pull public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:<new_image_tag>\n\nReplace ``<new_image_tag>`` with the tag for the desired SDK version (e.g.,\n``2.9.0-neuronx-py312-sdk2.29.0-ubuntu24.04``).\n\nCheck available tags at the ECR Public Gallery:\n\n- `PyTorch Training <https://gallery.ecr.aws/neuron/pytorch-training-neuronx>`_\n- `PyTorch Inference <https://gallery.ecr.aws/neuron/pytorch-inference-neuronx>`_\n- `PyTorch vLLM Inference <https://gallery.ecr.aws/neuron/pytorch-inference-vllm-neuronx>`_\n\nFor the full list of available images and tags, see :doc:`/containers/locate-neuron-dlc-image`.\n\n\nUpdate Neuron driver on the host\n---------------------------------\n\nThe Neuron driver runs on the host, not inside the container. Update it separately\nwhen moving to a new Neuron SDK release.\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n\n   .. tab-item:: Ubuntu 22.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n\n   .. tab-item:: Amazon Linux 2023\n\n      .. code-block:: bash\n\n         sudo dnf install -y aws-neuronx-dkms\n\n\nVerify the update\n------------------\n\nLaunch the new container and verify:\n\n.. code-block:: bash\n\n   docker run -it \\\n     --device=/dev/neuron0 \\\n     --cap-add SYS_ADMIN \\\n     --cap-add IPC_LOCK \\\n     public.ecr.aws/neuron/pytorch-training-neuronx:<new_image_tag> \\\n     bash\n\nInside the container:\n\n.. code-block:: bash\n\n   python3 -c \"import torch; import torch_neuronx; print(f'PyTorch {torch.__version__}, torch-neuronx {torch_neuronx.__version__}')\"\n   neuron-ls\n\n\n.. dropdown:: ⚠️ Troubleshooting: Version mismatch between host driver and container\n   :color: warning\n   :animate: fade-in\n\n   If you see runtime errors after updating the container image but not the host driver:\n\n   1. Check the host driver version: ``modinfo neuron`` on the host\n   2. Update the host driver to match the SDK version in the container\n   3. Reboot if the driver update requires it: ``sudo reboot``\n"
  },
  {
    "path": "setup/pytorch/update-manual.rst",
    "content": ".. meta::\n   :description: Update a manual PyTorch Neuron installation to the latest release\n   :keywords: pytorch, neuron, manual, update, upgrade, driver, pip\n   :framework: pytorch\n   :installation-method: manual\n   :instance-types: inf2, trn1, trn2, trn3\n   :os: ubuntu-24.04, ubuntu-22.04, al2023\n   :content-type: installation-guide\n   :date-modified: 2026-03-30\n\nUpdate a manual PyTorch installation\n======================================\n\nUpdate PyTorch and Neuron components on an existing manual installation to the latest release.\n\n.. contents:: On this page\n   :local:\n   :depth: 2\n\n\nUpdate PyTorch on Ubuntu 24.04\n-------------------------------\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u24.rst\n   :start-after: _pytorch-neuronx-ubuntu24-update:\n\n\nUpdate PyTorch on Ubuntu 22.04\n-------------------------------\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u22.rst\n   :start-after: _pytorch-neuronx-ubuntu22-update:\n\n\nUpdate PyTorch on Amazon Linux 2023\n-------------------------------------\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.rst\n   :start-after: _pytorch-neuronx-al2023-update:\n\n\nUpdate Neuron driver and runtime\n---------------------------------\n\nUpdate the Neuron driver, runtime, and tools on your host. This is recommended\nwhen updating to a new Neuron SDK release.\n\n.. tab-set::\n\n   .. tab-item:: Ubuntu 24.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n         sudo apt-get install -y aws-neuronx-tools\n\n   .. tab-item:: Ubuntu 22.04\n\n      .. code-block:: bash\n\n         sudo apt-get update\n         sudo apt-get install -y aws-neuronx-dkms\n         sudo apt-get install -y aws-neuronx-runtime-lib\n         sudo apt-get install -y aws-neuronx-collectives\n         sudo apt-get install -y aws-neuronx-tools\n\n   .. tab-item:: Amazon Linux 2023\n\n      .. code-block:: bash\n\n         sudo dnf install -y aws-neuronx-dkms\n         sudo dnf install -y aws-neuronx-runtime-lib\n         sudo dnf install -y aws-neuronx-collectives\n         sudo dnf install -y aws-neuronx-tools\n\n\nVerify the update\n------------------\n\nAfter updating, activate your virtual environment and verify:\n\n.. code-block:: bash\n\n   source ~/neuron_venv/bin/activate\n\n.. code-block:: python\n\n   python3 << EOF\n   import torch\n   import torch_neuronx\n\n   print(f\"PyTorch version: {torch.__version__}\")\n   print(f\"torch-neuronx version: {torch_neuronx.__version__}\")\n\n   import subprocess\n   result = subprocess.run(['neuron-ls'], capture_output=True, text=True)\n   print(result.stdout)\n   EOF\n\nYou should see output similar to this (the instance IDS and details should match your expected ones, not the ones in this example):\n\n.. 
code-block::\n\n   PyTorch version: 2.9.1+cu128, torch-neuronx version: 2.9.0.2.13.23887+8e870898\n   $ neuron-ls\n   instance-type: trn1.2xlarge\n   instance-id: i-0bea223b1afb7e159\n   +--------+--------+----------+--------+--------------+----------+------+\n   | NEURON | NEURON |  NEURON  | NEURON |     PCI      |   CPU    | NUMA |\n   | DEVICE | CORES  | CORE IDS | MEMORY |     BDF      | AFFINITY | NODE |\n   +--------+--------+----------+--------+--------------+----------+------+\n   | 0      | 2      | 0-1      | 32 GB  | 0000:00:1e.0 | 0-7      | -1   |\n   +--------+--------+----------+--------+--------------+----------+------+\n\n----\n\n.. _install-prev-releases:\n\nInstall previous releases\n-------------------------\n\nIf you need to install older Neuron releases on your instance, follow the instructions below.\n\nInstall previous releases on Ubuntu 24.04\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u24.rst\n   :start-after: _pytorch-neuronx-install-prev-u24:\n\n\nInstall previous releases on Ubuntu 22.04\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.rst\n   :start-after: _pytorch-neuronx-install-prev-u22:\n\n\nInstall previous releases on Amazon Linux 2023\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.rst\n   :start-after: _pytorch-neuronx-install-prev-al2023:\n"
  },
  {
    "path": "setup/setup-rocky-linux-9.rst",
    "content": ".. _setup-rocky-linux-9:\n\n.. card:: Select a Different Platform for Setup\n    :link: setup-guide-index\n    :link-type: ref\n    :class-body: sphinx-design-class-title-small\n\nPyTorch Neuron Setup Guide for Rocky Linux 9\n============================================\n\n\n.. contents:: Table of contents\n    :local:\n    :depth: 1\n\nGet Started with Latest Release of PyTorch Neuron \n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThis section provides links that will assist you to quickly start with a fresh installation of PyTorch Neuron (``torch-neuronx`` , ``torch-neuron``).\n\n\n.. dropdown:: Launch the Instance\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    * Please follow the instructions at `launch an Amazon EC2 Instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance>`_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.\n    * To get more information about instances sizes and pricing see: `Trn1 web page <https://aws.amazon.com/ec2/instance-types/trn1/>`_, `Inf2 web page <https://aws.amazon.com/ec2/instance-types/inf2/>`_\n    * Select Rocky-9-EC2-Base AMI\n    * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512GB.\n    * After launching the instance, follow the instructions in `Connect to your instance <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html>`_ to connect to the instance\n\n.. dropdown:: Install Drivers and Tools\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n\n    .. include:: /src/helperscripts/installationScripts/python_instructions.txt\n        :start-line: 218\n        :end-line: 219\n\nPlease continue with the installation instructions for EFA and PyTorch Neuron by following the corresponding AL2023 setup guide below (please skip the \"Launch the Instance\" and \"Install Drivers and Tools\" sections). \n\n\n.. card:: Pytorch Neuron (``torch-neuronx``) Setup\n            :link: /setup/pytorch/manual\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small\n\n.. card:: Pytorch Neuron (``torch-neuron``) Setup for Inf1\n            :link: /setup/legacy-inf1/pytorch\n            :link-type: doc\n            :class-body: sphinx-design-class-title-small"
  },
  {
    "path": "setup/setup-troubleshooting.rst",
    "content": ".. _neuron-setup-troubleshooting:\n\nNeuron Setup Troubleshooting\n============================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\n.. _gpg_key_update:\n\nHow to update Neuron repository GNU Privacy Guard (GPG) key for Ubuntu installation\n-----------------------------------------------------------------------------------\n\nDescription\n^^^^^^^^^^^\n\nThe GPG key for the Neuron repository (https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB) is installed on the Ubuntu (Canonical) server, the key was uploaded originally with an expiry date of three (3) years, which has expired on 11/10/22.\n\nAny customer of Ubuntu or Debian using Neuron ``apt`` repository will get the following error:\n\n.. code::\n\n   While running an apt-get update command on an AWS deep learning image (us-east-1/ami-01fce297f68912e45) I get this output:\n\n   Err:6 https://apt.repos.neuron.amazonaws.com (https://apt.repos.neuron.amazonaws.com/) bionic InRelease\n   The following signatures were invalid: EXPKEYSIG 5749CAD8646D9185 Amazon AWS Neuron <neuron-maintainers@amazon.com>\n   Fetched 172 kB in 1s (161 kB/s)\n   Reading package lists... Done\n   W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error:https://apt.repos.neuron.amazonaws.com (https://apt.repos.neuron.amazonaws.com/) bionic InRelease: The following signatures were invalid: EXPKEYSIG 5749CAD8646D9185 Amazon AWS Neuron <neuron-maintainers@amazon.com>\n\nSolution\n^^^^^^^^\n\nTo solve this issue, you need to run the following commands to fetch the new key before running ``apt-get update``\n\n\n.. code::\n\n   wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n\n   # Update OS packages\n   sudo apt-get update -y\n\n\n\n\n``pip install --upgrade`` wouldn't upgrade ``neuron-cc``\n--------------------------------------------------------\n\nDescription\n^^^^^^^^^^^\n\nWhen trying to upgrade to a newer Neuron release, for example by calling: \n\n``pip install --upgrade torch-neuron neuron-cc[tensorflow] torchvision``\n\n``neuron-cc`` is not upgraded.\n\nThis can be a result of a bug in certain ``pip`` versions, for example `pip install upgrade will not upgrade package if extras_require specified <https://github.com/pypa/pip/issues/10173>`_\n\nSolution\n^^^^^^^^\n\nTo solve this issue you can either upgrade to a newer ``pip`` version or use ``--force`` when trying to upgrade, for example:\n\n``pip install --force torch-neuron neuron-cc[tensorflow] torchvision``\n\n"
  },
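One way to confirm that the forced reinstall described above actually picked up a newer compiler is to read the installed distribution versions directly. This is a minimal check using only the Python standard library; it is not a command from the guide.

```python
# Quick check that the upgrade took effect (illustrative).
# importlib.metadata reads the version recorded for each installed distribution.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("neuron-cc", "torch-neuron"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```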
  {
    "path": "setup/torch-neuron-ubuntu20.rst",
    "content": "\n.. dropdown::  Select a Different Framework or Platform\n    :class-title: sphinx-design-class-title-small\n    :class-body: sphinx-design-class-body-small\n    :animate: fade-in\n     \n\n.. raw:: html\n\n        <script>\n            var framework = document.getElementById(\"framework-select\");\n            framework.value = \"torch-neuron\";\n\n            var platform = document.getElementById(\"platform-select\");\n            platform.value = \"ubuntu-20\";\n\n        </script>\n\n\nGet Started with PyTorch Neuron (\"torch-neuron\") on Ubuntu 20\n=============================================================="
  },
  {
    "path": "setup/torch-neuron.rst",
    "content": ".. _setup-torch-neuron:\n\nPyTorch Neuron (``torch-neuron``) Setup\n=======================================\n\n.. warning::\n\n   ``torch-neuron`` is for Inf1 instances only (legacy NeuronCore v1).\n   For new projects, use Inf2, Trn1, Trn2, or Trn3 with ``torch-neuronx``.\n   See :doc:`/setup/pytorch/index` for current setup.\n\nFor Inf1 setup instructions, see :doc:`/setup/legacy-inf1/pytorch`.\n"
  },
  {
    "path": "setup/torch-neuronx.rst",
    "content": ".. _setup-torch-neuronx:\n\n.. meta::\n   :description: Install PyTorch NeuronX (torch-neuronx) on AWS Trainium and Inferentia instances using DLAMI, DLC, or manual pip installation\n   :keywords: pytorch, neuron, torch-neuronx, installation, setup, trainium, inferentia, trn1, trn2, trn3, inf2, DLAMI, pip\n   :date-modified: 2026-03-30\n\nPyTorch Neuron (``torch-neuronx``) Setup \n========================================\n\nInstall PyTorch with Neuron support for training and inference on Inf2, Trn1, Trn2, and Trn3 instances. Choose from a pre-configured DLAMI, a Docker container, or a manual pip installation.\n\nFor the full setup guide with all options, see :doc:`Install PyTorch for Neuron </setup/pytorch/index>`.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: 🚀 DLAMI Installation\n      :link: /setup/pytorch/dlami\n      :link-type: doc\n      :class-card: sd-border-2\n\n      Pre-configured environment with all dependencies. Recommended for most users.\n\n   .. grid-item-card:: 🚀 Multi-Framework DLAMI\n      :link: /setup/multiframework-dlami\n      :link-type: doc\n      :class-card: sd-border-2\n\n      Pre-configured AMI with PyTorch, JAX, and vLLM virtual environments ready to use.\n\n   .. grid-item-card:: � Deep Learning Container\n      :link: /setup/pytorch/dlc\n      :link-type: doc\n      :class-card: sd-border-2\n\n      Pre-configured Docker images from AWS ECR for containerized deployments.\n\n   .. grid-item-card:: 🔧 Manual Installation\n      :link: /setup/pytorch/manual\n      :link-type: doc\n      :class-card: sd-border-2\n\n      Install on bare OS AMIs or existing systems with full control over dependencies.\n\n   .. grid-item-card:: Rocky Linux 9\n      :link: setup-rocky-linux-9\n      :link-type: ref\n      :class-card: sd-border-2\n\n      Install on Rocky Linux 9 using the Rocky-9-EC2-Base AMI.\n"
  },
  {
    "path": "setup/troubleshooting.rst",
    "content": ".. meta::\n   :description: Troubleshooting guide for AWS Neuron SDK installation issues\n   :keywords: neuron, troubleshooting, installation, errors, debugging\n   :content-type: troubleshooting\n   :date-modified: 2026-03-03\n\nInstallation Troubleshooting\n=============================\n\nCommon issues and solutions for Neuron SDK installation.\n\nModule Import Errors\n--------------------\n\nModuleNotFoundError: No module named 'torch_neuronx'\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: Python cannot find torch_neuronx module after installation.\n\n**Causes**:\n\n- Virtual environment not activated\n- Wrong Python version\n- Installation failed silently\n- Multiple Python installations\n\n**Solutions**:\n\n1. **Verify virtual environment**:\n   \n   .. code-block:: bash\n      \n      which python\n      # Should show virtual environment path, not system Python\n\n2. **Check Python version**:\n   \n   .. code-block:: bash\n      \n      python --version\n      # Should be 3.10, 3.11, or 3.12\n\n3. **Reinstall torch-neuronx**:\n   \n   .. code-block:: bash\n      \n      pip install --force-reinstall torch-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\n4. **Verify installation**:\n   \n   .. code-block:: bash\n      \n      pip list | grep neuron\n\nImportError: cannot import name 'neuron' from 'torch'\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: Import error when trying to use Neuron features.\n\n**Cause**: Using PyTorch/XLA syntax with Native PyTorch backend.\n\n**Solution**: Update code to use Native PyTorch syntax:\n\n.. code-block:: python\n   \n   # Old (PyTorch/XLA)\n   import torch_xla.core.xla_model as xm\n   device = xm.xla_device()\n   \n   # New (Native PyTorch)\n   import torch\n   device = torch.device('neuron')\n\nSee :doc:`/frameworks/torch/index` for complete migration guide.\n\nDevice and Runtime Errors\n--------------------------\n\nNo Neuron devices found\n~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: ``neuron-ls`` shows no devices or returns error.\n\n**Causes**:\n\n- Wrong instance type\n- Neuron driver not loaded\n- Runtime not started\n\n**Solutions**:\n\n1. **Verify instance type**:\n   \n   .. code-block:: bash\n      \n      curl http://169.254.169.254/latest/meta-data/instance-type\n      # Should show inf2.*, trn1.*, trn2.*, trn3.*, or inf1.*\n\n2. **Check Neuron driver**:\n   \n   .. code-block:: bash\n      \n      lsmod | grep neuron\n      # Should show neuron driver loaded\n\n3. **Install/reload driver**:\n   \n   .. code-block:: bash\n      \n      # Ubuntu/Debian\n      sudo apt-get install -y aws-neuronx-dkms\n      \n      # Amazon Linux\n      sudo yum install -y aws-neuronx-dkms\n\n4. **Restart runtime**:\n   \n   .. code-block:: bash\n      \n      sudo systemctl restart neuron-monitor\n      neuron-ls\n\nRuntimeError: Neuron runtime initialization failed\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: Runtime fails to initialize when running models.\n\n**Causes**:\n\n- Insufficient permissions\n- Runtime version mismatch\n- Corrupted runtime state\n\n**Solutions**:\n\n1. **Check runtime status**:\n   \n   .. code-block:: bash\n      \n      sudo systemctl status neuron-monitor\n\n2. **Verify permissions**:\n   \n   .. code-block:: bash\n      \n      ls -l /dev/neuron*\n      # Should be accessible by current user\n\n3. **Reinstall runtime**:\n   \n   .. 
code-block:: bash\n      \n      sudo apt-get install --reinstall aws-neuronx-runtime-lib\n\nVersion Compatibility Issues\n-----------------------------\n\nCompiler version mismatch\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: Error about incompatible compiler version.\n\n**Cause**: neuronx-cc version incompatible with framework version.\n\n**Solution**: Install compatible versions:\n\n.. code-block:: bash\n   \n   # For PyTorch 2.9\n   pip install neuronx-cc==2.15.* --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nSee :doc:`/release-notes/index` for version compatibility matrix.\n\nPackage dependency conflicts\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: pip reports conflicting dependencies.\n\n**Solution**: Use fresh virtual environment:\n\n.. code-block:: bash\n   \n   python3 -m venv ~/fresh_neuron_venv\n   source ~/fresh_neuron_venv/bin/activate\n   pip install -U pip\n   # Install packages in correct order\n   pip install torch==2.9.0\n   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\n\nNetwork and Repository Issues\n------------------------------\n\nCannot connect to Neuron repository\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**Symptoms**: apt-get or pip cannot reach Neuron repositories.\n\n**Solutions**:\n\n1. **Verify network connectivity**:\n   \n   .. code-block:: bash\n      \n      curl -I https://apt.repos.neuron.amazonaws.com\n      curl -I https://pip.repos.neuron.amazonaws.com\n\n2. **Check proxy settings** (if behind corporate proxy):\n   \n   .. code-block:: bash\n      \n      export https_proxy=http://proxy.example.com:8080\n      export http_proxy=http://proxy.example.com:8080\n\n3. **Use alternative index URL**:\n   \n   .. code-block:: bash\n      \n      pip install torch-neuronx --index-url=https://pip.repos.neuron.amazonaws.com\n\nGPG key expired\n~~~~~~~~~~~~~~~\n\n**Symptoms**: \"EXPKEYSIG\" error during apt-get update.\n\n**Solution**:\n\n.. code-block:: bash\n   \n   wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n   sudo apt-get update -y\n\nGetting Help\n------------\n\nIf issues persist:\n\n1. **Check release notes**: :doc:`/release-notes/index`\n2. **Review documentation**: :doc:`/frameworks/torch/index`\n3. **GitHub Issues**: `aws-neuron-sdk/aws-neuron-sdk <https://github.com/aws-neuron/aws-neuron-sdk/issues>`_\n4. **AWS Support**: Open support case if you have AWS Support plan\n\nDiagnostic Information\n----------------------\n\nWhen reporting issues, include:\n\n.. code-block:: bash\n   \n   # System information\n   uname -a\n   cat /etc/os-release\n   \n   # Instance type\n   curl http://169.254.169.254/latest/meta-data/instance-type\n   \n   # Neuron devices\n   neuron-ls\n   \n   # Package versions\n   pip list | grep -E \"(torch|neuron)\"\n   \n   # Driver status\n   lsmod | grep neuron\n   sudo systemctl status neuron-monitor\n"
  },
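The diagnostic information above is listed as individual shell commands; when filing an issue it can be convenient to capture all of it in one pass. A small sketch that runs the same commands and concatenates their output, assuming the same tools are available on the instance's PATH:

```python
# Sketch: collect the diagnostics listed above into a single report (illustrative).
import subprocess

COMMANDS = [
    "uname -a",
    "cat /etc/os-release",
    "curl -s http://169.254.169.254/latest/meta-data/instance-type",
    "neuron-ls",
    "pip list | grep -E '(torch|neuron)'",
    "lsmod | grep neuron",
]

def collect_diagnostics() -> str:
    sections = []
    for cmd in COMMANDS:
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        # Keep whatever the command produced, stdout preferred, stderr as fallback
        sections.append(f"$ {cmd}\n{proc.stdout or proc.stderr}")
    return "\n".join(sections)

if __name__ == "__main__":
    print(collect_diagnostics())
```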
  {
    "path": "src/benchmark/helper_scripts/llmperf_dp.patch",
    "content": "diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py\nindex f2e0a91..74c4027 100644\n--- a/src/llmperf/ray_clients/openai_chat_completions_client.py\n+++ b/src/llmperf/ray_clients/openai_chat_completions_client.py\n@@ -1,5 +1,6 @@\n import json\n import os\n+import random\n import time\n from typing import Any, Dict\n \n@@ -14,6 +15,9 @@ from llmperf import common_metrics\n @ray.remote\n class OpenAIChatCompletionsClient(LLMClient):\n     \"\"\"Client for OpenAI Chat Completions API.\"\"\"\n+    def __init__(self):\n+        self.addr_id = 0\n+        self.addr_select_strategy = 'round-robin'\n \n     def llm_request(self, request_config: RequestConfig) -> Dict[str, Any]:\n         prompt = request_config.prompt\n@@ -50,6 +54,13 @@ class OpenAIChatCompletionsClient(LLMClient):\n         address = os.environ.get(\"OPENAI_API_BASE\")\n         if not address:\n             raise ValueError(\"the environment variable OPENAI_API_BASE must be set.\")\n+        # if several addresses of model server exist, select one for each request (1) randomly or (2) round-robin\n+        address_list = address.split(\";\")\n+        if self.addr_select_strategy == 'round-robin':\n+            address = address_list[self.addr_id]\n+            self.addr_id = (self.addr_id + 1) % len(address_list)\n+        else:\n+            address = random.choice(address_list)\n         key = os.environ.get(\"OPENAI_API_KEY\")\n         if not key:\n             raise ValueError(\"the environment variable OPENAI_API_KEY must be set.\")\n"
  },
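The patch above spreads llmperf requests across several model-server endpoints listed in `OPENAI_API_BASE` (separated by `;`), choosing one endpoint per request either round-robin or at random. A standalone sketch of that selection logic; the class and strategy names here are illustrative, not llmperf APIs:

```python
# Standalone sketch of the address-selection logic the patch adds (names illustrative).
import random

class AddressSelector:
    def __init__(self, addresses: str, strategy: str = "round-robin"):
        # Several endpoints may be packed into one string, separated by ';'
        self.addresses = addresses.split(";")
        self.strategy = strategy
        self.next_id = 0

    def pick(self) -> str:
        if self.strategy == "round-robin":
            address = self.addresses[self.next_id]
            self.next_id = (self.next_id + 1) % len(self.addresses)
            return address
        # Fallback: pick an endpoint uniformly at random
        return random.choice(self.addresses)

selector = AddressSelector("http://host-a:8000/v1;http://host-b:8000/v1")
print([selector.pick() for _ in range(4)])  # alternates between the two endpoints
```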
  {
    "path": "src/benchmark/helper_scripts/llmperf_reasoning.patch",
    "content": "diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py\nindex aeb5fbf..f1b4473 100644\n--- a/src/llmperf/ray_clients/openai_chat_completions_client.py\n+++ b/src/llmperf/ray_clients/openai_chat_completions_client.py\n@@ -100,7 +100,7 @@ class OpenAIChatCompletionsClient(LLMClient):\n                         raise RuntimeError(data[\"error\"][\"message\"])\n\n                     delta = data[\"choices\"][0][\"delta\"]\n-                    if delta.get(\"content\", None):\n+                    if delta.get(\"content\", None) or delta.get(\"reasoning_content\", None):\n                         if not ttft:\n                             ttft = time.monotonic() - start_time\n                             # time_to_next_token.append(ttft)\n@@ -109,7 +109,11 @@ class OpenAIChatCompletionsClient(LLMClient):\n                                 time.monotonic() - most_recent_received_token_time\n                             )\n                         most_recent_received_token_time = time.monotonic()\n-                        generated_text += delta[\"content\"]\n+                        if \"reasoning_content\" in delta and delta[\"reasoning_content\"]:\n+                            chunk_content = delta[\"reasoning_content\"]\n+                        else:\n+                            chunk_content = delta[\"content\"]\n+                        generated_text += chunk_content\n\n             total_request_time = time.monotonic() - start_time\n             output_throughput = tokens_received / total_request_time"
  },
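The patch above makes streamed `reasoning_content` chunks count toward time-to-first-token and the generated text, falling back to the normal `content` field otherwise. The per-chunk extraction it performs boils down to something like the following sketch (the function name is illustrative):

```python
# Sketch of the per-chunk extraction the patch performs (illustrative).
# A streamed delta may carry either normal "content" or "reasoning_content".
def extract_chunk(delta: dict) -> str:
    if delta.get("reasoning_content"):
        return delta["reasoning_content"]
    return delta.get("content") or ""

generated_text = ""
for delta in [{"reasoning_content": "thinking... "}, {"content": "final answer"}]:
    chunk = extract_chunk(delta)
    if chunk:  # only chunks carrying text count toward latency metrics
        generated_text += chunk
print(generated_text)
```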
  {
    "path": "src/benchmark/helper_scripts/neuron_perf.patch",
    "content": "diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py\nindex f2e0a91..644d5a6 100644\n--- a/src/llmperf/ray_clients/openai_chat_completions_client.py\n+++ b/src/llmperf/ray_clients/openai_chat_completions_client.py\n@@ -92,7 +92,7 @@ class OpenAIChatCompletionsClient(LLMClient):\n                     if delta.get(\"content\", None):\n                         if not ttft:\n                             ttft = time.monotonic() - start_time\n-                            time_to_next_token.append(ttft)\n+                            # time_to_next_token.append(ttft)\n                         else:\n                             time_to_next_token.append(\n                                 time.monotonic() - most_recent_received_token_time\ndiff --git a/token_benchmark_ray.py b/token_benchmark_ray.py\nindex 63216b1..11e0116 100644\n--- a/token_benchmark_ray.py\n+++ b/token_benchmark_ray.py\n@@ -32,6 +32,7 @@ def get_token_throughput_latencies(\n     stddev_input_tokens: int,\n     mean_output_tokens: int,\n     stddev_output_tokens: int,\n+    tokenizer: str,\n     additional_sampling_params: Optional[Dict[str, Any]] = None,\n     num_concurrent_requests: int = 1,\n     max_num_completed_requests: int = 500,\n@@ -60,10 +61,8 @@ def get_token_throughput_latencies(\n     \"\"\"\n     random.seed(11111)\n \n-    tokenizer = LlamaTokenizerFast.from_pretrained(\n-        \"hf-internal-testing/llama-tokenizer\"\n-    )\n-    get_token_length = lambda text: len(tokenizer.encode(text))\n+    hf_tokenizer = LlamaTokenizerFast.from_pretrained(tokenizer)\n+    get_token_length = lambda text: len(hf_tokenizer.encode(text))\n     \n     if not additional_sampling_params:\n         additional_sampling_params = {}\n@@ -84,7 +83,7 @@ def get_token_throughput_latencies(\n             prompt_tokens_mean=mean_input_tokens,\n             prompt_tokens_stddev=stddev_input_tokens,\n             expect_output_tokens=num_output_tokens,\n-            tokenizer=tokenizer\n+            tokenizer=hf_tokenizer\n         ))\n     start_time = time.monotonic()\n     pbar = tqdm(total=max_num_completed_requests)\n@@ -118,7 +117,7 @@ def get_token_throughput_latencies(\n                 with completed_requests_lock:\n                     if num_completed_requests < max_num_completed_requests:\n                         if num_output_tokens:\n-                            request_metrics[common_metrics.INTER_TOKEN_LAT] /= request_metrics[common_metrics.NUM_OUTPUT_TOKENS]\n+                            request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens - 1\n                         else:\n                             request_metrics[common_metrics.INTER_TOKEN_LAT] = 0\n                         request_metrics[common_metrics.NUM_OUTPUT_TOKENS] = num_output_tokens\n@@ -155,7 +154,7 @@ def get_token_throughput_latencies(\n         with completed_requests_lock:\n             if num_completed_requests < max_num_completed_requests:\n                 if num_output_tokens:\n-                    request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens\n+                    request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens - 1\n                 else:\n                     request_metrics[common_metrics.INTER_TOKEN_LAT] = 0\n                 request_metrics[common_metrics.NUM_OUTPUT_TOKENS] = num_output_tokens\n@@ -292,6 +291,7 @@ def run_token_benchmark(\n     additional_sampling_params: str,\n     
results_dir: str,\n     user_metadata: Dict[str, Any],\n+    tokenizer: str,\n ):\n     \"\"\"\n     Args:\n@@ -327,6 +327,7 @@ def run_token_benchmark(\n         stddev_output_tokens=stddev_output_tokens,\n         num_concurrent_requests=num_concurrent_requests,\n         additional_sampling_params=json.loads(additional_sampling_params),\n+        tokenizer=tokenizer,\n     )\n \n     if results_dir:\n@@ -462,6 +463,11 @@ args.add_argument(\n         \"name=foo,bar=1. These will be added to the metadata field of the results. \"\n     ),\n )\n+args.add_argument(\n+    \"--tokenizer\",\n+    type=str,\n+    default=\"hf-internal-testing/llama-tokenizer\",\n+)\n \n if __name__ == \"__main__\":\n     env_vars = dict(os.environ)\n@@ -488,4 +494,5 @@ if __name__ == \"__main__\":\n         additional_sampling_params=args.additional_sampling_params,\n         results_dir=args.results_dir,\n         user_metadata=user_metadata,\n+        tokenizer=args.tokenizer,\n     )\n"
  },
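The patch above stops counting time-to-first-token as an inter-token gap and normalizes inter-token latency by `num_output_tokens - 1`, since N streamed tokens have only N - 1 gaps between them. A small worked example with made-up timestamps:

```python
# Worked example of the normalization the patch switches to (illustrative numbers).
# With N output tokens there are only N - 1 inter-token gaps; the first token's
# latency is reported separately as TTFT rather than as a gap.
token_times = [0.80, 0.85, 0.90, 0.96, 1.01]   # seconds since request start
ttft = token_times[0]
gaps = [b - a for a, b in zip(token_times, token_times[1:])]
num_output_tokens = len(token_times)

inter_token_latency = sum(gaps) / (num_output_tokens - 1)
print(f"TTFT={ttft:.2f}s, ITL={inter_token_latency:.4f}s over {len(gaps)} gaps")
```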
  {
    "path": "src/benchmark/tensorflow/distilbert-base-uncased-finetuned-sst-2-english_benchmark.py",
    "content": "# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased-finetuned-sst-2-english\"]\nsequence_lengths = [128]\nbatch_sizes = [128]\npipeline_sizes = [1]\n\n# Silence an irrelevant warning from transformers library\nimport os\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\nimport numpy as np\nimport neuronperf as npf\nimport neuronperf.tensorflow\nfrom transformers import AutoTokenizer, TFAutoModelForSequenceClassification\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence = \"I am sorry. I really want to like it, but I just can not stand sushi.\"\n    paraphrase = tokenizer.encode_plus(\n        sequence,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"np\",\n    )\n    inputs = {\n        \"input_ids\": np.concatenate([paraphrase[\"input_ids\"]] * batch_size, axis=0),\n        \"attention_mask\": np.concatenate([paraphrase[\"attention_mask\"]] * batch_size, axis=0),\n    }\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Benchmark\n            print(\"Benchmarking {}\".format(filename))\n            reports = npf.tensorflow.benchmark(filename, inputs)\n\n            # View and save results\n            print(\"======== {} ========\".format(filename))\n            npf.print_reports(reports)\n            npf.write_csv(reports)\n            npf.write_json(reports)\n"
  },
  {
    "path": "src/benchmark/tensorflow/distilbert-base-uncased-finetuned-sst-2-english_compile.py",
    "content": "# Add to these lists or change as needed\nmodel_names = [\"distilbert-base-uncased-finetuned-sst-2-english\"]\nsequence_lengths = [128]\nbatch_sizes = [128]\npipeline_sizes = [1]\n\n# Silence an irrelevant warning from transformers library\nimport os\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\nimport numpy as np\nimport neuronperf as npf\nimport neuronperf.tensorflow\nfrom transformers import AutoTokenizer, TFAutoModelForSequenceClassification\n\n\ndef get_batch(tokenizer, sequence_length, batch_size):\n    sequence = \"I am sorry. I really want to like it, but I just can not stand sushi.\"\n    paraphrase = tokenizer.encode_plus(\n        sequence,\n        max_length=sequence_length,\n        padding=\"max_length\",\n        truncation=True,\n        return_tensors=\"np\",\n    )\n    inputs = {\n        \"input_ids\": np.concatenate([paraphrase[\"input_ids\"]] * batch_size, axis=0),\n        \"attention_mask\": np.concatenate([paraphrase[\"attention_mask\"]] * batch_size, axis=0),\n    }\n    return inputs\n\n\nif __name__ == \"__main__\":\n    for model_name in model_names:\n        tokenizer = AutoTokenizer.from_pretrained(model_name)\n        model = TFAutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)\n        for sequence_length in sequence_lengths:\n            inputs = [\n                get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes\n            ]\n            filename = f\"{model_name}_sl{sequence_length}.json\"\n\n            # Compile\n            print(\"Compiling {}\".format(filename))\n            npf.tensorflow.compile(\n                model,\n                inputs,\n                batch_sizes=batch_sizes,\n                pipeline_sizes=pipeline_sizes,\n                filename=filename,\n                model_name=model_name,\n            )\n"
  },
  {
    "path": "src/examples/mxnet/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/mxnet/data_parallel/benchmark_utils.py",
    "content": "import math\nfrom collections import Counter\n\nimport numpy as np\n\nclass Results():\n\n    def __init__(self, batch_size, num_cores=1):\n        self.latency_array = []\n        self.end_times = []\n        self.start_times = []\n        self.batch_size = batch_size\n        self.num_cores = num_cores\n\n    def add_result(self, latency_array, end_times, start_times):\n        self.latency_array.extend(latency_array)\n        self.end_times.extend(end_times)\n        self.start_times.extend(start_times)\n\n    def report(self, f, window_size=1):\n        assert(len(self.latency_array) != 0)\n        p50_latency = np.percentile(self.latency_array, 50)\n        p90_latency = np.percentile(self.latency_array, 90)\n        p95_latency = np.percentile(self.latency_array, 95)\n        p99_latency = np.percentile(self.latency_array, 99)\n        p100_latency = np.percentile(self.latency_array, 100)\n\n\n        def get_bucket(start, end):\n            bucketed_start = math.floor(start / window_size) * window_size\n            bucketed_end = math.ceil(end / window_size) * window_size\n            # The check is to make sure that we ignore timestamps that are larger than the window size\n            if bucketed_end - bucketed_start == window_size:\n                return bucketed_start\n            else:\n                return None\n            \n        # Divide the timestamps into different buckets\n        bucketed_timestamps = [get_bucket(start, end)\n                            for start, end in zip(self.start_times, self.end_times)]\n        # Count the values in each bucket\n        counted_buckets = Counter(\n            item for item in bucketed_timestamps if item is not None)\n        # Normalize each bucket\n        bucket_throughputs = [(key, value / window_size)\n                            for key, value in sorted(counted_buckets.items())]\n        \n        busy_throughputs = [value for _, value in bucket_throughputs]\n        max_throughput = max(busy_throughputs) * self.batch_size\n        avg_throughput = sum(busy_throughputs) * self.batch_size / len(busy_throughputs)\n        \n        f.write(\"\\n\")\n        f.write(\n            \"Maximum throughput = {} sentences/sec\\n\".format(int(max_throughput)))\n        f.write(\"Average throughput = {} sentences/sec\\n\".format(int(avg_throughput)))\n\n        f.write(\"\\n\")\n        f.write(\"Latency Percentiles:\\n\")\n        f.write(\"===\\n\")\n        f.write(\"P50  = {} milliseconds\\n\".format(int(1000*p50_latency)))\n        f.write(\"P90  = {} milliseconds\\n\".format(int(1000*p90_latency)))\n        f.write(\"P95  = {} milliseconds\\n\".format(int(1000*p95_latency)))\n        f.write(\"P99  = {} milliseconds\\n\".format(int(1000*p99_latency)))\n        f.write(\"P100 = {} milliseconds\\n\".format(int(1000*p100_latency)))\n        f.write(\"\\n\")\n        f.write(\"Sanity test:\\n\")\n        f.write(\"===\\n\")\n        f.write(\"Processed - num batches {}\\n\".format(len(self.latency_array)))\n        f.write(\"          - batch size {}\\n\".format(self.batch_size))\n        f.write(\"          - num cores {}\\n\".format(self.num_cores))"
  },
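A minimal usage sketch for the `Results` aggregator above, using synthetic timings instead of real inference. It assumes the script runs next to `benchmark_utils.py`, as the tutorial notebook does:

```python
# Minimal usage sketch for the Results class above, with synthetic timings.
import io
from benchmark_utils import Results

results = Results(batch_size=1, num_cores=1)
# Pretend 10 requests each took ~5 ms, spread evenly over one second
starts = [i * 0.1 for i in range(10)]
ends = [s + 0.005 for s in starts]
latencies = [e - s for s, e in zip(starts, ends)]
results.add_result(latencies, ends, starts)

# report() writes to any file-like object; a StringIO keeps it in memory
report = io.StringIO()
results.report(report, window_size=1)
print(report.getvalue())
```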
  {
    "path": "src/examples/mxnet/data_parallel/data_parallel_tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Using Data Parallel Mode with Gluon MXNet\\n\",\n    \"\\n\",\n    \"In this tutorial, you will compile a Gluon BERT model and run in data-parallel mode to completely utilize the NeuronCores. Here you will benchmark a multi-worker setup and compare it with a single worker.\\n\",\n    \"\\n\",\n    \"This tutorial is intended only for MXNet-1.8.\\n\",\n    \"\\n\",\n    \"In this tutorial, we will be using an inf1.2xlarge with the latest AWS Deep Learning AMI (DLAMI). The inf1.2xlarge instance has 1 AWS Inferentia Chip with 4 NeuronCores.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Setting up your environment\\n\",\n    \"\\n\",\n    \"To run this tutorial, please make sure you deactivate any existing MXNet conda environments you already using. Install MXNet 1.8 by following the instructions at [MXNet Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/mxnet-setup/mxnet-install.html#develop-on-aws-ml-accelerator-instance). You would also need to change your kernel to use the correct Python environment setup earlier by clicking Kerenel->Change Kernel->Python (Neuron MXNet)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Install dependencies\\n\",\n    \"\\n\",\n    \"We have to install gluon-nlp to get the BERT model. Run the following command to install:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!python -m pip install gluonnlp\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Compiling BERT Model\\n\",\n    \"\\n\",\n    \"Next, we compile the Gluon BERT model and save it. 
Once the model is compiled, we use the same model across the entire tutorial.\\n\",\n    \"In this tutorial, we will be using a BERT model with sequence length 32.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import mxnet as mx\\n\",\n    \"import mx_neuron\\n\",\n    \"import gluonnlp as nlp\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"BERT_MODEL = 'bert_12_768_12'\\n\",\n    \"BERT_DATA = 'book_corpus_wiki_en_uncased'\\n\",\n    \"batch_size = 1\\n\",\n    \"seq_len = 32\\n\",\n    \"num_cores = 1\\n\",\n    \"dtype = 'float32'\\n\",\n    \"\\n\",\n    \"compiled_model_path = '{}.compiled.{}.{}'.format(BERT_MODEL, batch_size, seq_len)\\n\",\n    \"\\n\",\n    \"model, vocab = nlp.model.get_model(BERT_MODEL,\\n\",\n    \"                                   dataset_name=BERT_DATA,\\n\",\n    \"                                   use_classifier=False,\\n\",\n    \"                                   use_decoder=False, ctx=mx.cpu())\\n\",\n    \"  \\n\",\n    \"# Create sample inputs for compilation\\n\",\n    \"words = mx.nd.ones([batch_size, seq_len], name='words', dtype=dtype)\\n\",\n    \"valid_len = mx.nd.ones([batch_size,], name='valid_len', dtype=dtype)\\n\",\n    \"segments = mx.nd.ones([batch_size, seq_len], name='segments', dtype=dtype)\\n\",\n    \"inputs = {'data0': words, 'data1': segments, 'data2': valid_len}\\n\",\n    \"\\n\",\n    \"# Compiler Args ~~ \\n\",\n    \"options = {}\\n\",\n    \"embeddingNames = ['bertmodel0_word_embed_embedding0_fwd', 'bertmodel0_token_type_embed_embedding0_fwd', 'bertencoder0_embedding0']\\n\",\n    \"options.update({'force_incl_node_names': embeddingNames})\\n\",\n    \"options.update({'flags': ['--fp32-cast matmult']}) \\n\",\n    \"\\n\",\n    \"# Compile and save ~~ \\n\",\n    \"model = mx_neuron.compile(model, inputs=inputs, **options)\\n\",\n    \"model.export(compiled_model_path)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Data Parallel Mode\\n\",\n    \"\\n\",\n    \"Data Parallel Mode is a setup in which you launch multiple copies of the same model, such that each model is running independently of the others. In other words, each model has its own resources to run inference. \\n\",\n    \"\\n\",\n    \"On an inf1.2xlarge instance, we have 4 NeuronCores. Hence, we can launch 4 models such that each model is loaded on a single NeuronCore. This enables us to process 4 requests concurrently without a linear increase in latency. As a result, the throughput of the system increases when compared to a single model inference. 
This would also allow us to utilize all the 4 NeuronCores on the instance.\\n\",\n    \"\\n\",\n    \"Run through the next set of cells to see the difference in throughput as we scale from one model to 4 models running in parallel.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"def get_sample_inputs(batch_size, seq_len):\\n\",\n    \"    words = np.ones([batch_size, seq_len], dtype=np.float32)\\n\",\n    \"    valid_len = np.ones([batch_size,], dtype=np.float32)\\n\",\n    \"    segments = np.ones([batch_size, seq_len], dtype=np.float32)\\n\",\n    \"    inputs = {'data0': words, 'data1': segments, 'data2': valid_len}\\n\",\n    \"    return inputs\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Next for comparison purposes, we run the setup with 1 worker. To do this, we set the num_cores=1. This would launch only 1 model running on a single NeuronCore. After running the below cell, note down the latency and throughput for the system\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from parallel import NeuronSimpleDataParallel\\n\",\n    \"from benchmark_utils import Results\\n\",\n    \"import time\\n\",\n    \"import functools\\n\",\n    \"import os\\n\",\n    \"import numpy as np\\n\",\n    \"import warnings\\n\",\n    \"\\n\",\n    \"num_cores = 1\\n\",\n    \"batch_size=1\\n\",\n    \"\\n\",\n    \"# Each worker process should use one core, hence we set\\n\",\n    \"#    os.environ['NEURON_RT_NUM_CORES'] = \\\"1\\\"\\n\",\n    \"os.environ[\\\"NEURON_RT_NUM_CORES\\\"] = \\\"1\\\"\\n\",\n    \"\\n\",\n    \"#Result aggregation class (code in bert_benchmark_utils.py)\\n\",\n    \"results = Results(batch_size, num_cores)\\n\",\n    \"def result_handler(output, start, end):\\n\",\n    \"    elapsed = end - start\\n\",\n    \"    results.add_result([elapsed], [end], [start])\\n\",\n    \"\\n\",\n    \"inputs = get_sample_inputs(batch_size, seq_len)\\n\",\n    \"parallel_neuron_model = NeuronSimpleDataParallel(compiled_model_path, num_cores, inputs)\\n\",\n    \"\\n\",\n    \"#Starting the inference threads\\n\",\n    \"parallel_neuron_model.start_continuous_inference()\\n\",\n    \"\\n\",\n    \"# Warm up the cores\\n\",\n    \"for _ in range(num_cores*4):\\n\",\n    \"    parallel_neuron_model.warmup(inputs)\\n\",\n    \"    \\n\",\n    \"# Need to run for high number of iterations to benchmark the models\\n\",\n    \"for _ in range(1000):\\n\",\n    \"    parallel_neuron_model.infer(inputs)\\n\",\n    \"    # Passing the result_handler as a callback function\\n\",\n    \"    parallel_neuron_model.add_result(result_handler)\\n\",\n    \"\\n\",\n    \"# Stop inference                \\n\",\n    \"parallel_neuron_model.stop()\\n\",\n    \"# Since we are using a multi-process execution with a shared queue, some inferences\\n\",\n    \"# may still be in execution phase. 
Hence we need to wait till all the inputs are processed\\n\",\n    \"# add_all_results() will collect all the results of requests which are in this state\\n\",\n    \"parallel_neuron_model.add_all_results(result_handler)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"w\\\") as f:\\n\",\n    \"    results.report(f, window_size=1)\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"r\\\") as f:\\n\",\n    \"    for line in f:\\n\",\n    \"        print(line)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now we run the setup with 4 workers. To do this, we set num_cores=4. This would launch 4 models, each running on an individual NeuronCore. All 4 models are running in individual processes; in other words, the models are running in parallel. \\n\",\n    \"\\n\",\n    \"To feed the models efficiently, we use the producer-consumer setup, in which all processes running a model act as consumers. All consumers are fed using a shared input queue.\\n\",\n    \"\\n\",\n    \"Now we run the setup below. You may notice that the throughput increases by >2x when compared to a single-worker setup.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from parallel import NeuronSimpleDataParallel\\n\",\n    \"from benchmark_utils import Results\\n\",\n    \"import time\\n\",\n    \"import functools\\n\",\n    \"import os\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"num_cores = 4\\n\",\n    \"batch_size=1\\n\",\n    \"\\n\",\n    \"os.environ[\\\"NEURON_RT_NUM_CORES\\\"] = \\\"1\\\"\\n\",\n    \"\\n\",\n    \"#Result aggregation class (code in benchmark_utils.py)\\n\",\n    \"results = Results(batch_size, num_cores)\\n\",\n    \"def result_handler(output, start, end):\\n\",\n    \"    elapsed = end - start\\n\",\n    \"    results.add_result([elapsed], [end], [start])\\n\",\n    \"\\n\",\n    \"inputs = get_sample_inputs(batch_size, seq_len)\\n\",\n    \"parallel_neuron_model = NeuronSimpleDataParallel(compiled_model_path, num_cores, inputs)\\n\",\n    \"\\n\",\n    \"#Starting the inference threads\\n\",\n    \"parallel_neuron_model.start_continuous_inference()\\n\",\n    \"\\n\",\n    \"# Warm up the cores\\n\",\n    \"for _ in range(num_cores*4):\\n\",\n    \"    parallel_neuron_model.warmup(inputs)\\n\",\n    \"    \\n\",\n    \"# Need to run for high number of iterations to benchmark the models\\n\",\n    \"for _ in range(5000):\\n\",\n    \"    parallel_neuron_model.infer(inputs)\\n\",\n    \"    # Passing the result_handler as a callback function\\n\",\n    \"    parallel_neuron_model.add_result(result_handler)\\n\",\n    \"\\n\",\n    \"# Stop inference                \\n\",\n    \"parallel_neuron_model.stop()\\n\",\n    \"# Since we are using a multi-process execution with a shared queue, some inferences\\n\",\n    \"# may still be in execution phase. 
Hence we need to wait till all the inputs are processed\\n\",\n    \"# add_all_results() will collect all the results of requests which are in this state\\n\",\n    \"parallel_neuron_model.add_all_results(result_handler)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"w\\\") as f:\\n\",\n    \"    results.report(f, window_size=1)\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"r\\\") as f:\\n\",\n    \"    for line in f:\\n\",\n    \"        print(line)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.9\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/mxnet/data_parallel/parallel.py",
    "content": "import mxnet as mx\nimport mx_neuron\nimport os\nfrom time import time\nfrom queue import Queue\nfrom multiprocessing import Process, Manager\n\n\ndef consumer(model_file, sample_input, input_queue, result_queue):\n    sym, args, aux = mx.model.load_checkpoint(model_file, 0)\n    sample_input = {key: mx.nd.array(v) for key, v in sample_input.items()}\n    args.update(sample_input)\n    model = sym.bind(mx.cpu(), args=args, aux_states=aux, grad_req=\"null\")\n\n    while True:\n        inputs, input_id = input_queue.get()\n        input_queue.task_done()\n        # Stop execution if stopping condition is recieved\n        if inputs == \"stop\":\n            break\n        inputs = {key: mx.nd.array(v) for key, v in inputs.items()}\n        start = time()\n        results = model.forward(**inputs)\n        results[0].wait_to_read()\n\n        # Make the output iterable - if it is not already a tuple or list\n        if not isinstance(results, tuple) or isinstance(results, list):\n            results = [results]\n        end = time()\n\n        if input_id != -1:\n            result_queue.put((results, start, end, input_id))\n\n\nclass NeuronSimpleDataParallel:\n    def __init__(self, model_file, num_neuron_cores, sample_input):\n        self.num_neuron_cores = num_neuron_cores\n        self.sample_input = sample_input\n        self.model_path = model_file\n        # Create shared input queue and output queue\n        manager = Manager()\n        self.input_queue = manager.Queue(maxsize=num_neuron_cores * 16)\n        self.result_queue = manager.Queue(maxsize=num_neuron_cores * 16)\n\n        self.processes = [\n            Process(\n                target=consumer,\n                args=(\n                    self.model_path,\n                    self.sample_input,\n                    self.input_queue,\n                    self.result_queue,\n                ),\n            )\n            for _ in range(num_neuron_cores)\n        ]\n        self.input_id = 0\n        self.input_dict = set()\n\n    def start_continuous_inference(self):\n        for p in self.processes:\n            p.start()\n\n    def warmup(self, batch):\n        self.input_queue.put((batch, -1))\n\n    def infer(self, batch):\n        self.input_id += 1\n        self.input_dict.add(self.input_id)\n        self.input_queue.put((batch, self.input_id))\n\n    def stop(self):\n        for _ in range(self.num_neuron_cores):\n            self.input_queue.put((\"stop\", -1))\n\n    def add_result(self, callback_fn):\n        if not self.result_queue.empty():\n            result, start, end, input_id = self.result_queue.get()\n            self.input_dict.remove(input_id)\n            self.result_queue.task_done()\n            callback_fn(result, start, end)\n\n    def add_all_results(self, callback_fn):\n        results = []\n        while len(self.input_dict):\n            self.add_result(callback_fn)\n        for p in self.processes:\n            p.join()\n"
  },
  {
    "path": "src/examples/mxnet/mxnet-gluon-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4dcf9bb1\",\n   \"metadata\": {},\n   \"source\": [\n    \"## MXNet 1.8: Getting Started with Gluon Tutorial\\n\",\n    \"\\n\",\n    \"In this tutorial you will compile and deploy resnet-50 using the newly supported MXNet 1.8 and Gluon API on an Inf1 instance. This tutorial is only supported with MXNet 1.8.\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on an inf1.6xlarge instance since you will be loading and compiling several large models.\\n\",\n    \"\\n\",\n    \"To run this tutorial, please make sure you deactivate any existing MXNet conda environments you already using. Install MXNet 1.8 by following the instructions at [MXNet Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/mxnet-setup/mxnet-install.html#install-neuron-mxnet). You would also need to change your kernel to use the correct Python environment setup earlier by clicking Kerenel->Change Kernel->Python (Neuron MXNet)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"83eb578b\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile\\n\",\n    \"\\n\",\n    \"A trained model must be compiled to Inferentia target before it can run on Inferentia. In this step we compile a pre-trained ResNet50 and export it as a compiled MXNet checkpoint.\\n\",\n    \"\\n\",\n    \"Compilation will take a few minutes. At the end of compilation, the files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in local directory.\\n\",\n    \"\\n\",\n    \"To check the supported operations for the uncompiled model or information on Neuron subgraphs for the compiled model, please see [Neuron Check Model](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/tutorial-neuron-check-model.html#neuron-check-model).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"88c41e01\",\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import mxnet as mx\\n\",\n    \"import mx_neuron as neuron\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"path='http://data.mxnet.io/models/imagenet/'\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\\n\",\n    \"block = mx.gluon.nn.SymbolBlock.imports('resnet-50-symbol.json',\\\\\\n\",\n    \"    ['data', 'softmax_label'], 'resnet-50-0000.params', ctx=mx.cpu())\\n\",\n    \"\\n\",\n    \"block.hybridize()\\n\",\n    \"\\n\",\n    \"# Compile for Inferentia using Neuron\\n\",\n    \"inputs = { \\\"data\\\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32'), 'softmax_label' : mx.nd.ones([1], name='data', dtype='float32') }\\n\",\n    \"block = neuron.compile(block, inputs=inputs)\\n\",\n    \"\\n\",\n    \"#save compiled model\\n\",\n    \"block.export(\\\"resnet-50_compiled\\\", 0, block)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"6337e0ec\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5a9af0c7\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Deploy\\n\",\n    \"\\n\",\n    \"Deply on Infenrentia to see the inference results as below:\\n\",\n    \"```\\n\",\n    \"probability=0.643591, class=n02123045 tabby, tabby cat\\n\",\n    
\"probability=0.184392, class=n02123159 tiger cat\\n\",\n    \"probability=0.105063, class=n02124075 Egyptian cat\\n\",\n    \"probability=0.030101, class=n02127052 lynx, catamount\\n\",\n    \"probability=0.016112, class=n02129604 tiger, Panthera tigris\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"960c6aa9\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"import mxnet as mx\\n\",\n    \"import mx_neuron as neuron\\n\",\n    \"\\n\",\n    \"path='http://data.mxnet.io/models/imagenet/'\\n\",\n    \"mx.test_utils.download(path+'synset.txt')\\n\",\n    \"\\n\",\n    \"fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\\n\",\n    \"img = mx.image.imread(fname)# convert into format (batch, RGB, width, height)\\n\",\n    \"img = mx.image.imresize(img, 224, 224) # resize\\n\",\n    \"img = img.transpose((2, 0, 1)) # Channel first\\n\",\n    \"img = img.expand_dims(axis=0) # batchify\\n\",\n    \"img = img.astype(dtype='float32')\\n\",\n    \"\\n\",\n    \"block = mx.gluon.nn.SymbolBlock.imports('resnet-50_compiled-symbol.json',\\\\\\n\",\n    \"    ['data', 'softmax_label'], 'resnet-50_compiled-0000.params', ctx=mx.cpu())\\n\",\n    \"softmax = mx.nd.random_normal(shape=(1,))\\n\",\n    \"\\n\",\n    \"out = block(img, softmax).asnumpy()\\n\",\n    \"\\n\",\n    \"with open('synset.txt', 'r') as f:\\n\",\n    \"    labels = [l.rstrip() for l in f]\\n\",\n    \"\\n\",\n    \"out = block(img, softmax).asnumpy()\\n\",\n    \"\\n\",\n    \"prob = np.squeeze(out)\\n\",\n    \"a = np.argsort(prob)[::-1]\\n\",\n    \"for i in a[0:5]:\\n\",\n    \"    print('probability=%f, class=%s' %(prob[i], labels[i]))\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"id\": \"4f15e776\",\n   \"metadata\": {},\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 2\",\n   \"language\": \"python\",\n   \"name\": \"python2\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.9\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/mxnet/resnet50/resnet50.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"wrapped-soccer\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Running Neuron Apache MXNet ResNet50 on Inferentia \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"appreciated-daily\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction:\\n\",\n    \"In this tutorial we will compile and deploy ResNet50 model for Inferentia.\\n\",\n    \"In this tutorial we provide two main sections:\\n\",\n    \"\\n\",\n    \"1.Compile the ResNet50 model.\\n\",\n    \"\\n\",\n    \"2.Infer the compiled model.\\n\",\n    \"\\n\",\n    \"Before running the following verify this Jupyter notebook is running “conda_aws_neuron_mxnet_p36” kernel. You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\\n\",\n    \"Neuron supports Python module, Symbol APIs and the C predict API. The following quick start example uses the Symbol API.\\n\",\n    \"\\n\",\n    \"### Warning\\n\",\n    \"This tutorial was tested on MXNet-1.5\\n\",\n    \"\\n\",\n    \"MXNet-1.5 entered maintenance mode and require Neuron runtime 1.0, please see : [MXNet-1.5 enters maintainence mode](../../../../release-notes/maintenance.html)\\n\",\n    \"\\n\",\n    \"To setup development environment for MXNet-1.5 see installation instructions for Neuron 1.15.1 : [Neuron-1.15.1 MXNet install](../../../../archive/mxnet-neuron/setup/mxnet-install.html)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"advance-rebound\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile model on Neuron\\n\",\n    \"The following step will compile the resnet50 model. Compilation will take a few minutes on inf1.6xlarge. At the end of compilation, the files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in local directory.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"alpha-publication\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import mxnet as mx\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"path='http://data.mxnet.io/models/imagenet/'\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\\n\",\n    \"sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)\\n\",\n    \"\\n\",\n    \"# Compile for Inferentia using Neuron\\n\",\n    \"inputs = { \\\"data\\\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }\\n\",\n    \"sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs)\\n\",\n    \"\\n\",\n    \"#save compiled model\\n\",\n    \"mx.model.save_checkpoint(\\\"resnet-50_compiled\\\", 0, sym, args, aux)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"technical-reason\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"meaningful-substance\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Deploy on Inferentia\\n\",\n    \"Using same instance to deploy the model.        
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"cooked-jonathan\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import mxnet as mx\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"path='http://data.mxnet.io/models/imagenet/'\\n\",\n    \"mx.test_utils.download(path+'synset.txt')\\n\",\n    \"\\n\",\n    \"fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\\n\",\n    \"img = mx.image.imread(fname)# convert into format (batch, RGB, width, height)\\n\",\n    \"img = mx.image.imresize(img, 224, 224) # resize\\n\",\n    \"img = img.transpose((2, 0, 1)) # Channel first\\n\",\n    \"img = img.expand_dims(axis=0) # batchify\\n\",\n    \"img = img.astype(dtype='float32')\\n\",\n    \"\\n\",\n    \"sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)\\n\",\n    \"softmax = mx.nd.random_normal(shape=(1,))\\n\",\n    \"args['softmax_label'] = softmax\\n\",\n    \"args['data'] = img\\n\",\n    \"\\n\",\n    \"# Inferentia context\\n\",\n    \"ctx = mx.neuron()\\n\",\n    \"\\n\",\n    \"exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')\\n\",\n    \"\\n\",\n    \"with open('synset.txt', 'r') as f:\\n\",\n    \"     labels = [l.rstrip() for l in f]\\n\",\n    \"\\n\",\n    \"exe.forward(data=img)\\n\",\n    \"prob = exe.outputs[0].asnumpy()# print the top-5\\n\",\n    \"prob = np.squeeze(prob)\\n\",\n    \"a = np.argsort(prob)[::-1]\\n\",\n    \"for i in a[0:5]:\\n\",\n    \"     print('probability=%f, class=%s' %(prob[i], labels[i]))\\n\",\n    \"        \\n\",\n    \"# Sample output will look like below:\\n\",\n    \"#probability=0.634792, class=n02123045 tabby, tabby cat\\n\",\n    \"#probability=0.193601, class=n02123159 tiger cat\\n\",\n    \"#probability=0.103627, class=n02124075 Egyptian cat\\n\",\n    \"#probability=0.031604, class=n02127052 lynx, catamount\\n\",\n    \"#probability=0.015892, class=n02129604 tiger, Panthera tigris\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Environment (conda_aws_neuron_mxnet_p36)\",\n   \"language\": \"python\",\n   \"name\": \"conda_aws_neuron_mxnet_p36\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/mxnet/resnet50_neuroncore_groups.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Neuron Apache MXNet - Configurations for NeuronCore Groups Using Resnet50\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"## Introduction:\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy Resnet-50 model in parallel using the concept of NeuronCore Groups on an Inf1 instance. This Jupyter notebook should be run on an instance which is inf1.6xlarge or larger. For simplicity we will run this tutorial on inf1.6xlarge but in real life scenario the compilation should be done on a compute instance and the deployment on inf1 instance to save costs. \\n\",\n    \"\\n\",\n    \"Set environment variable NEURON_RT_NUM_CORES to the total number of Neuron cores that will be utilized. The consecutive NeuronCore groups will be created by Neuron Runtime and place the models to the cores according to the compiled size.\\n\",\n    \"\\n\",\n    \"Note that in order to map a model to a group, the model must be compiled to fit within the group size. To limit the number of NeuronCores during compilation, use compiler_args dictionary with field “–neuroncore-pipeline-cores“ set to the group size. For exmaple, if NEURON_RT_NUM_CORES=4 and two models compiled with “–neuroncore-pipeline-cores=3“ and “–neuroncore-pipeline-cores=1“ were loaded, the first model would occupy NC0-2 and the second model would occupy NC3. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"```\\n\",\n    \"compile_args = {'--neuroncore-pipeline-cores' : 2}\\n\",\n    \"sym, args, auxs = neuron.compile(sym, args, auxs, inputs, **compile_args)\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"In this tutorial we provide two main sections:\\n\",\n    \"\\n\",\n    \"1. Compile the Resnet50 model for Neuron\\n\",\n    \"\\n\",\n    \"2. Run inference using NeuronCore Groups\\n\",\n    \"\\n\",\n    \"Please use environment `conda_aws_neuron_mxnet_p36`.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile model for Neuron\\n\",\n    \"\\n\",\n    \"Model must be compiled to Inferentia target before it can be used on Inferentia. In the following we will compile the the flag, --neuroncore-pipeline-cores set to 2 and run it. 
The files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in local directory\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from packaging import version\\n\",\n    \"import mxnet as mx\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"import mx_neuron as neuron\\n\",\n    \"\\n\",\n    \"path='http://data.mxnet.io/models/imagenet/'\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\\n\",\n    \"mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\\n\",\n    \"sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)\\n\",\n    \"\\n\",\n    \"# Compile for Inferentia using Neuron, fit to NeuronCore group size of 2\\n\",\n    \"inputs = { \\\"data\\\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }\\n\",\n    \"compile_args = {'--neuroncore-pipeline-cores' : 2}\\n\",\n    \"sym, args, aux = neuron.compile(sym, args, aux, inputs, **compile_args)\\n\",\n    \"\\n\",\n    \"#save compiled model\\n\",\n    \"mx.model.save_checkpoint(\\\"resnet-50_compiled\\\", 0, sym, args, aux)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference using NeuronCore Groups\\n\",\n    \"\\n\",\n    \"Within the framework, the model can be mapped to specific cores using ```ctx=mx.neuron(N)``` context where N specifies the index of the Neuron core to deploy. For more information, see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/appnotes/perf/flex-eg.html .\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import warnings\\n\",\n    \"\\n\",\n    \"mx.test_utils.download(path+'synset.txt')\\n\",\n    \"\\n\",\n    \"fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\\n\",\n    \"img = mx.image.imread(fname) # convert into format (batch, RGB, width, height)\\n\",\n    \"img = mx.image.imresize(img, 224, 224) # resize\\n\",\n    \"img = img.transpose((2, 0, 1)) # Channel first\\n\",\n    \"img = img.expand_dims(axis=0) # batchify\\n\",\n    \"img = img.astype(dtype='float32')\\n\",\n    \"\\n\",\n    \"sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)\\n\",\n    \"softmax = mx.nd.random_normal(shape=(1,))\\n\",\n    \"args['softmax_label'] = softmax\\n\",\n    \"args['data'] = img\\n\",\n    \"\\n\",\n    \"os.environ[\\\"NEURON_RT_NUM_CORES\\\"] = '4'\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Inferentia context - group index 1 (size 2) would skip NC0 and place the \\n\",\n    \"# compiled model onto NC1,2\\n\",\n    \"ctx = mx.neuron(1)\\n\",\n    \"\\n\",\n    \"exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')\\n\",\n    \"\\n\",\n    \"with open('synset.txt', 'r') as f:\\n\",\n    \"     labels = [l.rstrip() for l in f]\\n\",\n    \"\\n\",\n    \"exe.forward(data=img)\\n\",\n    \"prob = exe.outputs[0].asnumpy()# print the top-5\\n\",\n    \"prob = np.squeeze(prob)\\n\",\n    \"a = np.argsort(prob)[::-1]\\n\",\n    \"for i in a[0:5]:\\n\",\n    \"     print('probability=%f, class=%s' %(prob[i], labels[i]))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can experiment with different Neuron core group combinations and 
different models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Troubleshooting\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"If not enough NeuronCores are provided, an error message will be displayed:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"mxnet.base.MXNetError: [04:01:39] src/operator/subgraph/neuron/./neuron_util.h:541: Check failed: rsp.status().code() == 0: Failed load model with Neuron-RTD Error. Neuron-RTD Status Code: 9, details: \\\"\\\"\\n\",\n    \"\\n\",\n    \"```\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Environment (conda_aws_neuron_mxnet_p36)\",\n   \"language\": \"python\",\n   \"name\": \"conda_aws_neuron_mxnet_p36\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/neuron-monitor/neuron-monitor-grafana.json",
    "content": "{\n  \"annotations\": {\n    \"list\": [\n      {\n        \"builtIn\": 1,\n        \"datasource\": \"-- Grafana --\",\n        \"enable\": true,\n        \"hide\": true,\n        \"iconColor\": \"rgba(0, 211, 255, 1)\",\n        \"name\": \"Annotations & Alerts\",\n        \"type\": \"dashboard\"\n      }\n    ]\n  },\n  \"editable\": true,\n  \"gnetId\": null,\n  \"graphTooltip\": 0,\n  \"id\": 2,\n  \"iteration\": 1605138719380,\n  \"links\": [],\n  \"panels\": [\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {\n            \"align\": null,\n            \"filterable\": false\n          },\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          }\n        },\n        \"overrides\": [\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"Value\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"custom.width\",\n                \"value\": 163\n              }\n            ]\n          },\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"Field\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"custom.width\",\n                \"value\": 450\n              }\n            ]\n          },\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"ami_id\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"custom.width\",\n                \"value\": 217\n              }\n            ]\n          },\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"instance_type\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"custom.width\",\n                \"value\": 391\n              }\n            ]\n          },\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"Prometheus instance\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"custom.width\",\n                \"value\": 641\n              }\n            ]\n          }\n        ]\n      },\n      \"gridPos\": {\n        \"h\": 8,\n        \"w\": 24,\n        \"x\": 0,\n        \"y\": 0\n      },\n      \"id\": 8,\n      \"options\": {\n        \"showHeader\": true,\n        \"sortBy\": []\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"repeat\": null,\n      \"targets\": [\n        {\n          \"expr\": \"instance_info\",\n          \"format\": \"table\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Instance Info\",\n      \"transformations\": [\n        {\n          \"id\": \"organize\",\n          \"options\": {\n            \"excludeByName\": {\n              \"Time\": true,\n              \"Value\": true,\n              \"__name__\": true,\n              \"ami_id\": false,\n              \"instance\": true,\n            
  \"job\": true\n            },\n            \"indexByName\": {\n              \"Time\": 0,\n              \"Value\": 7,\n              \"__name__\": 1,\n              \"availability_zone\": 8,\n              \"instance\": 5,\n              \"instance_id\": 2,\n              \"instance_name\": 3,\n              \"instance_type\": 4,\n              \"job\": 6,\n              \"region\": 9,\n              \"subnet_id\": 10\n            },\n            \"renameByName\": {\n              \"Value\": \"\",\n              \"availability_zone\": \"Availability Zone\",\n              \"instance\": \"\",\n              \"instance_id\": \"Instance ID\",\n              \"instance_name\": \"Instance Name\",\n              \"instance_type\": \"Instance Type\",\n              \"region\": \"Region\",\n              \"subnet_id\": \"Subnet\"\n            }\n          }\n        }\n      ],\n      \"type\": \"table\"\n    },\n    {\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"super-light-yellow\",\n                \"value\": null\n              }\n            ]\n          }\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 0,\n        \"y\": 8\n      },\n      \"id\": 36,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"none\",\n        \"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"last\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"count(instance_info)\\n\",\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Instance Count\",\n      \"type\": \"stat\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"light-blue\",\n                \"value\": null\n              }\n            ]\n          },\n          \"unit\": \"none\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 3,\n        \"y\": 8\n      },\n      \"id\": 10,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"none\",\n        \"justifyMode\": \"center\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"sum (system_vcpu_count)\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"vCPU Count\",\n      \"type\": \"stat\"\n    },\n    
{\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"percentage\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"#EAB839\",\n                \"value\": 70\n              },\n              {\n                \"color\": \"orange\",\n                \"value\": 80\n              },\n              {\n                \"color\": \"semi-dark-red\",\n                \"value\": 90\n              }\n            ]\n          },\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 6,\n        \"y\": 8\n      },\n      \"id\": 20,\n      \"options\": {\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"showThresholdLabels\": true,\n        \"showThresholdMarkers\": true\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"avg(sum by (instance_id) (system_vcpu_usage_ratio))\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"vCPU Utilization\",\n      \"type\": \"gauge\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"percentage\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"yellow\",\n                \"value\": 70\n              },\n              {\n                \"color\": \"orange\",\n                \"value\": 80\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 90\n              }\n            ]\n          },\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 9,\n        \"y\": 8\n      },\n      \"id\": 16,\n      \"options\": {\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"showThresholdLabels\": true,\n        \"showThresholdMarkers\": true\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"avg(system_memory_used_bytes / system_memory_total_bytes)\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Host Memory Usage\",\n      \"type\": \"gauge\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n         
       \"color\": \"rgb(191, 151, 105)\",\n                \"value\": null\n              }\n            ]\n          }\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 12,\n        \"y\": 8\n      },\n      \"id\": 12,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"none\",\n        \"justifyMode\": \"center\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"count(neuroncore_utilization_ratio > 0)\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"NeuronCores in Use\",\n      \"transformations\": [],\n      \"type\": \"stat\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {\n            \"align\": null,\n            \"filterable\": false\n          },\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"percentage\",\n            \"steps\": [\n              {\n                \"color\": \"red\",\n                \"value\": null\n              },\n              {\n                \"color\": \"orange\",\n                \"value\": 5\n              },\n              {\n                \"color\": \"yellow\",\n                \"value\": 20\n              },\n              {\n                \"color\": \"green\",\n                \"value\": 35\n              }\n            ]\n          },\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 15,\n        \"y\": 8\n      },\n      \"id\": 4,\n      \"interval\": \"\",\n      \"options\": {\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"showThresholdLabels\": true,\n        \"showThresholdMarkers\": true\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"avg(neuroncore_utilization_ratio)\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"NeuronCore Utilization\",\n      \"type\": \"gauge\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"description\": \"\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"percentage\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              }\n            ]\n          },\n          \"unit\": \"cps\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 18,\n        \"y\": 8\n      },\n      \"id\": 6,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"area\",\n        
\"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"sum(rate(execution_status_total{status_type=\\\"completed\\\"}[1m]))\",\n          \"hide\": false,\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Execution Success Rate\",\n      \"transformations\": [],\n      \"type\": \"stat\"\n    },\n    {\n      \"datasource\": \"Prometheus\",\n      \"description\": \"\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 1\n              }\n            ]\n          },\n          \"unit\": \"cps\"\n        },\n        \"overrides\": []\n      },\n      \"gridPos\": {\n        \"h\": 5,\n        \"w\": 3,\n        \"x\": 21,\n        \"y\": 8\n      },\n      \"id\": 18,\n      \"options\": {\n        \"colorMode\": \"value\",\n        \"graphMode\": \"area\",\n        \"justifyMode\": \"auto\",\n        \"orientation\": \"auto\",\n        \"reduceOptions\": {\n          \"calcs\": [\n            \"mean\"\n          ],\n          \"fields\": \"\",\n          \"values\": false\n        },\n        \"textMode\": \"auto\"\n      },\n      \"pluginVersion\": \"7.2.1\",\n      \"targets\": [\n        {\n          \"expr\": \"sum(rate(execution_status_total{status_type!=\\\"completed\\\"}[1m]))\",\n          \"instant\": true,\n          \"interval\": \"\",\n          \"legendFormat\": \"\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"timeFrom\": null,\n      \"timeShift\": null,\n      \"title\": \"Execution Error Rate\",\n      \"type\": \"stat\"\n    },\n    {\n      \"aliasColors\": {\n        \"Inf Error Rate\": \"semi-dark-red\",\n        \"Inf Success Rate\": \"light-green\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {}\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 12,\n        \"x\": 0,\n        \"y\": 13\n      },\n      \"hiddenSeries\": false,\n      \"id\": 32,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          
\"expr\": \"sum(rate(execution_status_total{status_type=\\\"completed\\\"}[1m]))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Execution Success Rate\",\n          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"sum(rate(execution_status_total{status_type!=\\\"completed\\\"}[1m]))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"Execution Error Rate\",\n          \"refId\": \"B\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Execution Status Rates\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:547\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:548\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"p0\": \"dark-green\",\n        \"p1\": \"semi-dark-green\",\n        \"p100\": \"semi-dark-red\",\n        \"p25\": \"light-green\",\n        \"p50\": \"super-light-green\",\n        \"p75\": \"super-light-red\",\n        \"p99\": \"light-red\",\n        \"{percentile=\\\"p0\\\"}\": \"dark-green\",\n        \"{percentile=\\\"p1\\\"}\": \"semi-dark-green\",\n        \"{percentile=\\\"p100\\\"}\": \"dark-red\",\n        \"{percentile=\\\"p25\\\"}\": \"light-green\",\n        \"{percentile=\\\"p50\\\"}\": \"super-light-green\",\n        \"{percentile=\\\"p75\\\"}\": \"light-red\",\n        \"{percentile=\\\"p99\\\"}\": \"semi-dark-red\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": null,\n      \"description\": \"\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          },\n          \"unit\": \"s\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 0,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 12,\n        \"x\": 12,\n        \"y\": 13\n      },\n      \"hiddenSeries\": false,\n      \"id\": 34,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 1,\n      \"points\": true,\n      
\"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg by (percentile) (execution_latency_seconds)\",\n          \"interval\": \"\",\n          \"legendFormat\": \"{{percentile}}\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Execution Latency\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:61\",\n          \"format\": \"s\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:62\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {},\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 0,\n        \"y\": 25\n      },\n      \"hiddenSeries\": false,\n      \"id\": 30,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg by (neuroncore) (neuroncore_utilization_ratio)\",\n          \"interval\": \"\",\n          \"legendFormat\": \"nc{{neuroncore}}\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"NeuronCore Utilization\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:493\",\n          \"format\": \"percentunit\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": \"1\",\n          \"min\": \"0\",\n          \"show\": true\n        },\n        {\n          
\"$$hashKey\": \"object:494\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": \"100\",\n          \"min\": \"0\",\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"Runtime system CPU usage \": \"light-red\",\n        \"Runtime user CPU usage \": \"light-green\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"mappings\": [],\n          \"thresholds\": {\n            \"mode\": \"absolute\",\n            \"steps\": [\n              {\n                \"color\": \"green\",\n                \"value\": null\n              },\n              {\n                \"color\": \"red\",\n                \"value\": 80\n              }\n            ]\n          },\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 8,\n        \"y\": 25\n      },\n      \"hiddenSeries\": false,\n      \"id\": 2,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": true,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg by (usage_type) (neuron_runtime_vcpu_usage_ratio)\",\n          \"format\": \"time_series\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"Neuron Runtime {{usage_type}} CPU usage \",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Neuron Runtime vCPU Usage\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:385\",\n          \"format\": \"percentunit\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": \"1\",\n          \"min\": \"0\",\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:386\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"host\": \"rgb(0, 217, 255)\",\n        \"neuron_device\": \"super-light-orange\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": 
false,\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"unit\": \"bytes\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 16,\n        \"y\": 25\n      },\n      \"hiddenSeries\": false,\n      \"id\": 28,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg by (memory_location) (sum by (instance_id, memory_location) (neuron_runtime_memory_used_bytes))\",\n          \"interval\": \"\",\n          \"legendFormat\": \"{{memory_location}}\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Neuron Runtime Used Memory\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:439\",\n          \"format\": \"bytes\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:440\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"Memory Usage\": \"rgb(0, 217, 255)\",\n        \"NeuronCore Usage\": \"light-orange\",\n        \"vCPU Usage\": \"light-blue\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": null,\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 0,\n        \"y\": 37\n      },\n      \"hiddenSeries\": false,\n      \"id\": 22,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": 
\"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg(system_memory_used_bytes / system_memory_total_bytes)\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"Memory Usage\",\n          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"avg(sum by (instance_id) (system_vcpu_usage_ratio))\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"vCPU Usage\",\n          \"refId\": \"B\"\n        },\n        {\n          \"expr\": \"avg(neuroncore_utilization_ratio)\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"NeuronCore Usage\",\n          \"refId\": \"C\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Host System Utilization\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:664\",\n          \"format\": \"percentunit\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": \"1\",\n          \"min\": \"0\",\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:665\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"system\": \"light-red\",\n        \"user\": \"light-green\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"unit\": \"percentunit\"\n        },\n        \"overrides\": []\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 8,\n        \"y\": 37\n      },\n      \"hiddenSeries\": false,\n      \"id\": 24,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [],\n      \"spaceLength\": 10,\n      \"stack\": true,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg by (usage_type) (system_vcpu_usage_ratio)\",\n          \"interval\": \"\",\n          \"legendFormat\": \"{{usage_type}}\",\n          \"refId\": \"A\"\n        }\n      ],\n      \"thresholds\": [],\n      \"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Host vCPU Usage\",\n      
\"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:876\",\n          \"format\": \"percentunit\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": \"1\",\n          \"min\": \"0\",\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:877\",\n          \"format\": \"short\",\n          \"label\": null,\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": null,\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    },\n    {\n      \"aliasColors\": {\n        \"Memory Usage Bytes\": \"rgb(223, 180, 0)\",\n        \"Memory Usage Percent\": \"rgb(0, 217, 255)\"\n      },\n      \"bars\": false,\n      \"dashLength\": 10,\n      \"dashes\": false,\n      \"datasource\": \"Prometheus\",\n      \"fieldConfig\": {\n        \"defaults\": {\n          \"custom\": {},\n          \"unit\": \"short\"\n        },\n        \"overrides\": [\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"Memory Usage Percent\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"unit\",\n                \"value\": \"percentunit\"\n              }\n            ]\n          },\n          {\n            \"matcher\": {\n              \"id\": \"byName\",\n              \"options\": \"Memory Usage Bytes\"\n            },\n            \"properties\": [\n              {\n                \"id\": \"unit\",\n                \"value\": \"bytes\"\n              }\n            ]\n          }\n        ]\n      },\n      \"fill\": 1,\n      \"fillGradient\": 0,\n      \"gridPos\": {\n        \"h\": 12,\n        \"w\": 8,\n        \"x\": 16,\n        \"y\": 37\n      },\n      \"hiddenSeries\": false,\n      \"id\": 26,\n      \"legend\": {\n        \"avg\": false,\n        \"current\": false,\n        \"max\": false,\n        \"min\": false,\n        \"show\": true,\n        \"total\": false,\n        \"values\": false\n      },\n      \"lines\": true,\n      \"linewidth\": 1,\n      \"nullPointMode\": \"null\",\n      \"options\": {\n        \"alertThreshold\": true\n      },\n      \"percentage\": false,\n      \"pluginVersion\": \"7.2.1\",\n      \"pointradius\": 2,\n      \"points\": false,\n      \"renderer\": \"flot\",\n      \"seriesOverrides\": [\n        {\n          \"$$hashKey\": \"object:711\"\n        },\n        {\n          \"$$hashKey\": \"object:931\",\n          \"alias\": \"Memory Usage Bytes\",\n          \"yaxis\": 2\n        }\n      ],\n      \"spaceLength\": 10,\n      \"stack\": false,\n      \"steppedLine\": false,\n      \"targets\": [\n        {\n          \"expr\": \"avg(system_memory_used_bytes / system_memory_total_bytes)\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"Memory Usage Percent\",\n          \"refId\": \"A\"\n        },\n        {\n          \"expr\": \"avg(system_memory_used_bytes)\",\n          \"instant\": false,\n          \"interval\": \"\",\n          \"legendFormat\": \"Memory Usage Bytes\",\n          \"refId\": \"B\"\n        }\n      ],\n      \"thresholds\": [],\n      
\"timeFrom\": null,\n      \"timeRegions\": [],\n      \"timeShift\": null,\n      \"title\": \"Host Memory Usage\",\n      \"tooltip\": {\n        \"shared\": true,\n        \"sort\": 0,\n        \"value_type\": \"individual\"\n      },\n      \"type\": \"graph\",\n      \"xaxis\": {\n        \"buckets\": null,\n        \"mode\": \"time\",\n        \"name\": null,\n        \"show\": true,\n        \"values\": []\n      },\n      \"yaxes\": [\n        {\n          \"$$hashKey\": \"object:689\",\n          \"format\": \"percentunit\",\n          \"label\": \"\",\n          \"logBase\": 1,\n          \"max\": \"1\",\n          \"min\": \"0\",\n          \"show\": true\n        },\n        {\n          \"$$hashKey\": \"object:690\",\n          \"decimals\": null,\n          \"format\": \"bytes\",\n          \"label\": \"\",\n          \"logBase\": 1,\n          \"max\": null,\n          \"min\": \"0\",\n          \"show\": true\n        }\n      ],\n      \"yaxis\": {\n        \"align\": false,\n        \"alignLevel\": null\n      }\n    }\n  ],\n  \"refresh\": \"5s\",\n  \"schemaVersion\": 26,\n  \"style\": \"dark\",\n  \"tags\": [],\n  \"templating\": {\n    \"list\": [\n      {\n        \"datasource\": \"Prometheus\",\n        \"filters\": [],\n        \"hide\": 0,\n        \"label\": \"\",\n        \"name\": \"Filters\",\n        \"skipUrlSync\": false,\n        \"type\": \"adhoc\"\n      }\n    ]\n  },\n  \"time\": {\n    \"from\": \"now-6h\",\n    \"to\": \"now\"\n  },\n  \"timepicker\": {},\n  \"timezone\": \"\",\n  \"title\": \"neuron-monitor\",\n  \"uid\": \"EqWNYf5Mz\",\n  \"version\": 68\n}"
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/THIRD PARTY LICENSE.txt",
    "content": "** transformers; version 2.8.0 -- https://github.com/huggingface/transformers\r\nCopyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\r\nCopyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.\r\n\r\n                                 Apache License\r\n                           Version 2.0, January 2004\r\n                        http://www.apache.org/licenses/\r\n\r\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\r\n\r\n   1. Definitions.\r\n\r\n      \"License\" shall mean the terms and conditions for use, reproduction,\r\n      and distribution as defined by Sections 1 through 9 of this document.\r\n\r\n      \"Licensor\" shall mean the copyright owner or entity authorized by\r\n      the copyright owner that is granting the License.\r\n\r\n      \"Legal Entity\" shall mean the union of the acting entity and all\r\n      other entities that control, are controlled by, or are under common\r\n      control with that entity. For the purposes of this definition,\r\n      \"control\" means (i) the power, direct or indirect, to cause the\r\n      direction or management of such entity, whether by contract or\r\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\r\n      outstanding shares, or (iii) beneficial ownership of such entity.\r\n\r\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\r\n      exercising permissions granted by this License.\r\n\r\n      \"Source\" form shall mean the preferred form for making modifications,\r\n      including but not limited to software source code, documentation\r\n      source, and configuration files.\r\n\r\n      \"Object\" form shall mean any form resulting from mechanical\r\n      transformation or translation of a Source form, including but\r\n      not limited to compiled object code, generated documentation,\r\n      and conversions to other media types.\r\n\r\n      \"Work\" shall mean the work of authorship, whether in Source or\r\n      Object form, made available under the License, as indicated by a\r\n      copyright notice that is included in or attached to the work\r\n      (an example is provided in the Appendix below).\r\n\r\n      \"Derivative Works\" shall mean any work, whether in Source or Object\r\n      form, that is based on (or derived from) the Work and for which the\r\n      editorial revisions, annotations, elaborations, or other modifications\r\n      represent, as a whole, an original work of authorship. For the purposes\r\n      of this License, Derivative Works shall not include works that remain\r\n      separable from, or merely link (or bind by name) to the interfaces of,\r\n      the Work and Derivative Works thereof.\r\n\r\n      \"Contribution\" shall mean any work of authorship, including\r\n      the original version of the Work and any modifications or additions\r\n      to that Work or Derivative Works thereof, that is intentionally\r\n      submitted to Licensor for inclusion in the Work by the copyright owner\r\n      or by an individual or Legal Entity authorized to submit on behalf of\r\n      the copyright owner. 
For the purposes of this definition, \"submitted\"\r\n      means any form of electronic, verbal, or written communication sent\r\n      to the Licensor or its representatives, including but not limited to\r\n      communication on electronic mailing lists, source code control systems,\r\n      and issue tracking systems that are managed by, or on behalf of, the\r\n      Licensor for the purpose of discussing and improving the Work, but\r\n      excluding communication that is conspicuously marked or otherwise\r\n      designated in writing by the copyright owner as \"Not a Contribution.\"\r\n\r\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\r\n      on behalf of whom a Contribution has been received by Licensor and\r\n      subsequently incorporated within the Work.\r\n\r\n   2. Grant of Copyright License. Subject to the terms and conditions of\r\n      this License, each Contributor hereby grants to You a perpetual,\r\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\r\n      copyright license to reproduce, prepare Derivative Works of,\r\n      publicly display, publicly perform, sublicense, and distribute the\r\n      Work and such Derivative Works in Source or Object form.\r\n\r\n   3. Grant of Patent License. Subject to the terms and conditions of\r\n      this License, each Contributor hereby grants to You a perpetual,\r\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\r\n      (except as stated in this section) patent license to make, have made,\r\n      use, offer to sell, sell, import, and otherwise transfer the Work,\r\n      where such license applies only to those patent claims licensable\r\n      by such Contributor that are necessarily infringed by their\r\n      Contribution(s) alone or by combination of their Contribution(s)\r\n      with the Work to which such Contribution(s) was submitted. If You\r\n      institute patent litigation against any entity (including a\r\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\r\n      or a Contribution incorporated within the Work constitutes direct\r\n      or contributory patent infringement, then any patent licenses\r\n      granted to You under this License for that Work shall terminate\r\n      as of the date such litigation is filed.\r\n\r\n   4. Redistribution. 
You may reproduce and distribute copies of the\r\n      Work or Derivative Works thereof in any medium, with or without\r\n      modifications, and in Source or Object form, provided that You\r\n      meet the following conditions:\r\n\r\n      (a) You must give any other recipients of the Work or\r\n          Derivative Works a copy of this License; and\r\n\r\n      (b) You must cause any modified files to carry prominent notices\r\n          stating that You changed the files; and\r\n\r\n      (c) You must retain, in the Source form of any Derivative Works\r\n          that You distribute, all copyright, patent, trademark, and\r\n          attribution notices from the Source form of the Work,\r\n          excluding those notices that do not pertain to any part of\r\n          the Derivative Works; and\r\n\r\n      (d) If the Work includes a \"NOTICE\" text file as part of its\r\n          distribution, then any Derivative Works that You distribute must\r\n          include a readable copy of the attribution notices contained\r\n          within such NOTICE file, excluding those notices that do not\r\n          pertain to any part of the Derivative Works, in at least one\r\n          of the following places: within a NOTICE text file distributed\r\n          as part of the Derivative Works; within the Source form or\r\n          documentation, if provided along with the Derivative Works; or,\r\n          within a display generated by the Derivative Works, if and\r\n          wherever such third-party notices normally appear. The contents\r\n          of the NOTICE file are for informational purposes only and\r\n          do not modify the License. You may add Your own attribution\r\n          notices within Derivative Works that You distribute, alongside\r\n          or as an addendum to the NOTICE text from the Work, provided\r\n          that such additional attribution notices cannot be construed\r\n          as modifying the License.\r\n\r\n      You may add Your own copyright statement to Your modifications and\r\n      may provide additional or different license terms and conditions\r\n      for use, reproduction, or distribution of Your modifications, or\r\n      for any such Derivative Works as a whole, provided Your use,\r\n      reproduction, and distribution of the Work otherwise complies with\r\n      the conditions stated in this License.\r\n\r\n   5. Submission of Contributions. Unless You explicitly state otherwise,\r\n      any Contribution intentionally submitted for inclusion in the Work\r\n      by You to the Licensor shall be under the terms and conditions of\r\n      this License, without any additional terms or conditions.\r\n      Notwithstanding the above, nothing herein shall supersede or modify\r\n      the terms of any separate license agreement you may have executed\r\n      with Licensor regarding such Contributions.\r\n\r\n   6. Trademarks. This License does not grant permission to use the trade\r\n      names, trademarks, service marks, or product names of the Licensor,\r\n      except as required for reasonable and customary use in describing the\r\n      origin of the Work and reproducing the content of the NOTICE file.\r\n\r\n   7. Disclaimer of Warranty. 
Unless required by applicable law or\r\n      agreed to in writing, Licensor provides the Work (and each\r\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\r\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\r\n      implied, including, without limitation, any warranties or conditions\r\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\r\n      PARTICULAR PURPOSE. You are solely responsible for determining the\r\n      appropriateness of using or redistributing the Work and assume any\r\n      risks associated with Your exercise of permissions under this License.\r\n\r\n   8. Limitation of Liability. In no event and under no legal theory,\r\n      whether in tort (including negligence), contract, or otherwise,\r\n      unless required by applicable law (such as deliberate and grossly\r\n      negligent acts) or agreed to in writing, shall any Contributor be\r\n      liable to You for damages, including any direct, indirect, special,\r\n      incidental, or consequential damages of any character arising as a\r\n      result of this License or out of the use or inability to use the\r\n      Work (including but not limited to damages for loss of goodwill,\r\n      work stoppage, computer failure or malfunction, or any and all\r\n      other commercial damages or losses), even if such Contributor\r\n      has been advised of the possibility of such damages.\r\n\r\n   9. Accepting Warranty or Additional Liability. While redistributing\r\n      the Work or Derivative Works thereof, You may choose to offer,\r\n      and charge a fee for, acceptance of support, warranty, indemnity,\r\n      or other liability obligations and/or rights consistent with this\r\n      License. However, in accepting such obligations, You may act only\r\n      on Your own behalf and on Your sole responsibility, not on behalf\r\n      of any other Contributor, and only if You agree to indemnify,\r\n      defend, and hold each Contributor harmless for any liability\r\n      incurred by, or claims asserted against, such Contributor by reason\r\n      of your accepting any such warranty or additional liability.\r\n\r\n   END OF TERMS AND CONDITIONS\r\n\r\n   APPENDIX: How to apply the Apache License to your work.\r\n\r\n      To apply the Apache License to your work, attach the following\r\n      boilerplate notice, with the fields enclosed by brackets \"[]\"\r\n      replaced with your own identifying information. (Don't include\r\n      the brackets!)  The text should be enclosed in the appropriate\r\n      comment syntax for the file format. We also recommend that a\r\n      file or class name and description of purpose be included on the\r\n      same \"printed page\" as the copyright notice for easier\r\n      identification within third-party archives.\r\n\r\n   Copyright [yyyy] [name of copyright owner]\r\n\r\n   Licensed under the Apache License, Version 2.0 (the \"License\");\r\n   you may not use this file except in compliance with the License.\r\n   You may obtain a copy of the License at\r\n\r\n       http://www.apache.org/licenses/LICENSE-2.0\r\n\r\n   Unless required by applicable law or agreed to in writing, software\r\n   distributed under the License is distributed on an \"AS IS\" BASIS,\r\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\r\n   See the License for the specific language governing permissions and\r\n   limitations under the License."
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/bert_benchmark_utils.py",
    "content": "import torch\nimport torch.neuron\nimport os\nimport sys\nimport csv\nimport math\nfrom collections import Counter\n\nimport numpy as np\n\nclass BertTestDataset(torch.utils.data.Dataset):\n    \"\"\"Bert test dataset.\"\"\"\n\n    def __init__(self, tsv_file, tokenizer, max_length=128, transform=None):\n        \"\"\"\n        Args:\n            csv_file (string): Path to the csv file with annotations.\n            tokenizer (callable = hugging face tokenizer):  Takes a string and encodes to standard input tensor set\n            max_length (int): Maximum length that all input tensors will be padded to\n            transform (callable, optional): Optional transform to be applied\n                on a sample.\n        \"\"\"\n        with open(tsv_file, \"r\") as f:\n            reader = csv.reader(f, delimiter=\"\\t\", quotechar=None)\n            self.lines = list(reader)\n\n        self.lines.pop(0)\n\n        self.tokenizer = tokenizer\n        self.max_length = max_length\n        self.transform = transform\n\n    def __len__(self):\n        return len(self.lines)\n\n    def __getitem__(self, idx):\n        if torch.is_tensor(idx):\n            idx = idx.tolist()\n\n        s1_raw = self.lines[idx][3]\n        if isinstance(s1_raw, bytes):\n            s1_raw = s1_raw.decode(\"utf-8\", \"ignore\")\n        s2_raw = self.lines[idx][4]\n        if isinstance(s2_raw, bytes):\n            s2_raw = s2_raw.decode(\"utf-8\", \"ignore\")\n\n        quality = self.lines[idx][0]\n\n        encoded = self.tokenizer.encode_plus(s1_raw, s2_raw, add_special_tokens=True,\n                                             return_tensors='pt', max_length=self.max_length, \n                                             padding='max_length', truncation=True)\n\n        sample = {'encoded': encoded, 'quality': quality}\n\n        if self.transform:\n            sample = self.transform(sample)\n\n        return sample\n\n\nclass BertResults():\n\n    def __init__(self, batch_size, num_cores=1):\n        self.correct_count = 0\n        self.inference_count = 0\n        self.latency_array = []\n        self.end_times = []\n        self.start_times = []\n        self.batch_size = batch_size\n        self.num_cores = num_cores\n\n    def add_result(self, correct_count, inference_count, latency_array, end_times, start_times):\n        self.correct_count += correct_count\n\n        self.inference_count += inference_count\n        self.latency_array.extend(latency_array)\n        self.end_times.extend(end_times)\n        self.start_times.extend(start_times)\n\n    def report(self, f, window_size=1):\n        assert(len(self.latency_array) != 0)\n        p50_latency = np.percentile(self.latency_array, 50)\n        p90_latency = np.percentile(self.latency_array, 90)\n        p95_latency = np.percentile(self.latency_array, 95)\n        p99_latency = np.percentile(self.latency_array, 99)\n        p100_latency = np.percentile(self.latency_array, 100)\n\n\n        def get_bucket(start, end):\n            bucketed_start = math.floor(start / window_size) * window_size\n            bucketed_end = math.ceil(end / window_size) * window_size\n            # The check is to make sure that we ignore timestamps that are larger than the window size\n            if bucketed_end - bucketed_start == window_size:\n                return bucketed_start\n            else:\n                return None\n            \n        # Divide the timestamps into different buckets\n        bucketed_timestamps = [get_bucket(start, end)\n 
                           for start, end in zip(self.start_times, self.end_times)]\n        # Count the values in each bucket\n        counted_buckets = Counter(\n            item for item in bucketed_timestamps if item is not None)\n        # Normalize each bucket\n        bucket_throughputs = [(key, value / window_size)\n                            for key, value in sorted(counted_buckets.items())]\n        \n        busy_throughputs = [value for _, value in bucket_throughputs]\n        max_throughput = max(busy_throughputs) * self.batch_size\n        avg_throughput = sum(busy_throughputs) * self.batch_size / len(busy_throughputs)\n        \n        f.write(\"\\n\")\n        f.write(\n            \"Maximum throughput = {} sentences/sec\\n\".format(int(max_throughput)))\n        f.write(\"Average throughput = {} sentences/sec\\n\".format(int(avg_throughput)))\n\n        f.write(\"\\n\")\n        f.write(\"Latency Percentiles:\\n\")\n        f.write(\"===\\n\")\n        f.write(\"P50  = {} milliseconds\\n\".format(int(1000*p50_latency)))\n        f.write(\"P90  = {} milliseconds\\n\".format(int(1000*p90_latency)))\n        f.write(\"P95  = {} milliseconds\\n\".format(int(1000*p95_latency)))\n        f.write(\"P99  = {} milliseconds\\n\".format(int(1000*p99_latency)))\n        f.write(\"P100 = {} milliseconds\\n\".format(int(1000*p100_latency)))\n        f.write(\"\\n\")\n        f.write(\"Accuracy:\\n\")\n        f.write(\"===\\n\")\n        if self.inference_count == 0:\n            self.inference_count = 1\n        accuracy = float(self.correct_count) / float(self.inference_count)\n        f.write(\"Accuracy = {}% \\n\".format(round(100*accuracy, 2)))\n        f.write(\"\\n\")\n        f.write(\"Sanity test:\\n\")\n        f.write(\"===\\n\")\n        f.write(\"Processed - num batches {}\\n\".format(len(self.latency_array)))\n        f.write(\"          - batch size {}\\n\".format(self.batch_size))\n        f.write(\"          - num cores {}\\n\".format(self.num_cores))\n"
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/glue_mrpc_dev.tsv",
    "content": "Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n1\t1355540\t1355592\tHe said the foodservice pie business doesn 't fit the company 's long-term growth strategy .\t\" The foodservice pie business does not fit our long-term growth strategy .\n0\t2029631\t2029565\tMagnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .\tHis wife said he was \" 100 percent behind George Bush \" and looked forward to using his years of training in the war .\n0\t487993\t487952\tThe dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .\tThe dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .\n1\t1989515\t1989458\tThe AFL-CIO is waiting until October to decide if it will endorse a candidate .\tThe AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .\n0\t1783137\t1782659\tNo dates have been set for the civil or the criminal trial .\tNo dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .\n1\t3039165\t3039036\tWal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed .\tIt has also said it would review all of its domestic employees more than 1 million to ensure they have legal status .\n0\t1490811\t1490840\tWhile dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell .\tThe Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s .\n1\t426112\t426210\tThis integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET.\tIBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net.\n1\t1439663\t1439808\tThe top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 .\tFor residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent .\n1\t3147370\t3147525\tThe results appear in the January issue of Cancer , an American Cancer Society journal , being published online today .\tThe results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday .\n1\t3300040\t3299992\tThe delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\tBin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\n0\t524136\t524119\t\" Sanitation is poor ... there could be typhoid and cholera , \" he said .\t\" Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . \"\n0\t969512\t969295\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\tThe technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 .\n1\t1685339\t1685429\tThe only announced Republican to replace Davis is Rep. 
Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall .\tSo far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall .\n1\t1967578\t1967664\tThe decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July .\tScotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July .\n1\t2047034\t2046820\tUnable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California .\tThe judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California .\n1\t2046630\t2046644\tThe decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing .\tThe decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget .\n0\t2221603\t2221633\tIn midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 .\tThe Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 .\n1\t129995\t129864\tMorgan Stanley raised its rating on the beverage maker to \" overweight \" from \" equal-weight \" saying in part that pricing power with its bottlers should improve in 2004 .\tMorgan Stanley raised its rating on the company to \" overweight \" from \" equal-weight , \" saying the beverage maker 's pricing power with bottlers should improve in 2004 .\n0\t919683\t919782\tThe pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 .\tThe British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 .\n0\t970740\t971209\tFriday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 .\tStanford ( 46-15 ) has a team full of such players this season .\n1\t2745055\t2745022\tLast month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion .\tAt the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion .\n0\t2199097\t2199072\tThe driver , Eugene Rogers , helped to remove children from the bus , Wood said .\tAt the accident scene , the driver was \" covered in blood \" but helped to remove children , Wood said .\n1\t1609290\t1609098\tONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader .\tTens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader .\n1\t1597193\t1597119\tSaddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. 
soldiers .\tHussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers .\n1\t2758944\t2758975\tIts closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean .\tIts closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean .\n0\t2584416\t2584653\tCooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo .\tLee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad .\n1\t86007\t86373\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . \"\t\" Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , \" Graham said .\n1\t1602860\t1602844\tHe said they lied on a sworn affidavit that requires them to list prior marriages .\tMorgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages .\n1\t1201306\t1201329\tThe association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes .\tThe Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes .\n0\t461779\t461815\tWith these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 .\tFunny Cide is looking to become horse racing 's first Triple Crown winner in a generation .\n1\t1438666\t1438643\tIntel was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel , \" spokesman Chuck Mulloy said .\tIntel spokesman Chuck Mulloy said the company was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel . 
\"\n1\t3261484\t3261306\tMr Annan also warned the US should not use the war on terror as an excuse to suppress \" long-cherished freedoms \" .\tAnnan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress \" long-cherished \" freedoms .\n1\t1277539\t1277527\tAt community colleges , tuition will jump to $ 2,800 from $ 2,500 .\tCommunity college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent .\n1\t3035788\t3035918\tHe made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol .\tThough Dean made a point of saying during the debate that the Confederate flag is a racist symbol .\n0\t132553\t132725\tBush wanted \" to see an aircraft landing the same way that the pilots saw an aircraft landing , \" White House press secretary Ari Fleischer said yesterday .\tOn Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing .\n0\t2259788\t2259747\tOn Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office .\tPalestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise .\n0\t2307064\t2307235\tThe civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 .\tThe civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health .\n1\t3046488\t3046824\tPer-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning .\tWorkplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 .\n1\t86020\t86007\t\" Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , \" Mr. Graham said .\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . 
\"\n0\t1100998\t1100441\tSARS has killed about 800 people and affected more than 8400 since being detected in China in November .\tSARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia .\n1\t2268396\t2268480\tAuthorities had no evidence to suggest the two incidents were connected .\tThere was no immediate evidence that the two incidents were connected , police said .\n0\t1984039\t1983986\t\" Jeremy 's a good guy , \" Barber said , adding : \" Jeremy is living the dream life of the New York athlete .\tHe also said Shockey is \" living the dream life of a New York athlete .\n0\t2697659\t2697747\tRatliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death .\tPeterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial .\n0\t2175939\t2176090\tAfter losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 .\tIn midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 .\n1\t886618\t886456\tRumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals .\tRumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals .\n1\t588637\t588864\tConsumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 .\tConsumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April .\n0\t2252795\t2252970\tHe has no immediate plans for television advertising , believing it is unnecessary this early .\tA Lieberman aide said there were no immediate plans for television advertising .\n1\t1756329\t1756394\t\" I think it happened very quickly , \" Houston Police Department homicide investigator Phil Yochum said of the crime .\t\" I think it happened very quickly , \" said Investigator Phil Yochum of the Houston Police Department 's homicide division .\n1\t1673112\t1673068\tUnited issued a statement saying it will \" work professionally and cooperatively with all its unions . \"\tSenior vice president Sara Fields said the airline \" will work professionally and cooperatively with all our unions . \"\n1\t2357324\t2357271\t\" But they never climb out of the pot of beer again . \"\tIt 's just that they never climb out of the beer again . 
\"\n1\t780408\t780363\tChief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected .\tBryant has said that hike had a greater effect on demand than officials expected .\n1\t821523\t821385\tRobert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD .\tNCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection .\n1\t2304696\t2304863\tHP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell .\tHPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell .\n1\t2531749\t2531607\tChirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family .\tChirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family .\n1\t3180014\t3179967\tThe charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country .\tThe government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries .\n1\t726966\t726945\tIn the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points .\tIt has a margin of error of plus or minus three to four percentage points .\n1\t2638861\t2638982\tMr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities .\tClinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities .\n1\t2495223\t2495307\t\" This decision is clearly incorrect , \" FTC Chairman Timothy Muris said in a written statement .\tThe decision is \" clearly incorrect , \" FTC Chairman Tim Muris said .\n1\t55187\t54831\tProsecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building .\tProsecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building .\n0\t2763381\t2763517\tTerri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years .\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\n1\t1990975\t1991132\tSecretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday .\tU.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens .\n1\t2204353\t2204418\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . \"\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . 
\"\n1\t60122\t60445\tThat would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\tThe inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\n1\t961836\t962243\tPeopleSoft also said its board had officially rejected Oracle 's offer .\tThursday morning , PeopleSoft 's board rejected the Oracle takeover offer .\n0\t3140260\t3140288\tThe Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday .\tThe Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 .\n1\t1720166\t1720115\tCortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest .\tCortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest .\n1\t2573262\t2573319\t\" The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , \" Mr Howard said .\t\" The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . \"\n0\t1353356\t1353174\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" Fraley said , adding that 18 countries have adopted biotechnology .\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" said Robert Fraley , Monsanto 's executive vice president .\n1\t2738677\t2738741\tThe rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study .\tThe study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s .\n1\t1638813\t1639087\tWe acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .\tRather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" .\n1\t1605350\t1605425\tTrans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat .\tTrans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat .\n1\t2494149\t2494073\tHowever , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market .\tA 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market .\n1\t3023029\t3023229\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\tPeterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner .\n1\t1351550\t1351155\tCarlson on Tuesday said he would not recuse himself from the case .\tService officials said Carlson refused to recuse himself from the case .\n1\t981185\t981234\tThe program will grow to include ports in Dubai , Turkey and Malaysia , among others .\tThe program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. 
Ridge said .\n0\t2111629\t2111786\tMcCabe said he was considered a witness , not a suspect .\t\" He is not considered a suspect , \" McCabe said .\n1\t655498\t655391\tThe woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health .\tThe woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health .\n1\t533823\t533909\tHe added that those \" are not solely American principles , nor are they exclusively Western . \"\t\" These are not solely American principles nor are they exclusively Western , \" Rumsfeld said .\n1\t581592\t581570\t\" If we don 't march into Tehran , I think we will be in pretty good shape , \" he said .\t\" As long as we don 't march on Tehran , I think we are going to be in pretty good shape , \" he said .\n0\t1010655\t1010430\tOn Saturday , a 149mph serve against Agassi equalled Rusedski 's world record .\tOn Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi .\n1\t2241925\t2242066\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently .\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate .\n1\t2796978\t2797024\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thai Prime Minister Thaksin Shinawatra told business leaders .\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thaksin said .\n0\t101746\t101775\tDanbury prosecutor Warren Murray could not be reached for comment Monday .\tProsecutors could not be reached for comment after the legal papers were obtained late Monday afternoon .\n1\t327839\t327748\tWittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business .\tWittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company .\n0\t2988297\t2988555\tShattered Glass , \" starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters .\t\" Shattered Glass \" _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters .\n1\t2217613\t2217659\tHe was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston .\tHe was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife .\n0\t2128530\t2128455\tHowever , EPA officials would not confirm the 20 percent figure .\tOnly in the past few weeks have officials settled on the 20 percent figure .\n1\t2208376\t2208198\tUniversity of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , \" Our fundamental values haven 't changed .\t\" Our fundamental values haven 't changed , \" Mary Sue Coleman , president of the university , said in a statement in Ann Arbor .\n1\t1980654\t1980641\tThe first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs .\tThe first products will likely be dongles costing $ 100 to $ 150 
that will establish connections between consumer electronics devices and PCs .\n0\t589579\t589557\tHowever , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co .\tLapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda .\n1\t1636060\t1635946\tMichel , who remains in the government , denied that US pressure had provoked the government 's move .\tMichel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move .\n1\t1630585\t1630657\tSome of the computers also are used to send spam e-mail messages to drum up traffic to the sites .\tSome are also used to send spam e-mail messages to boost traffic to the sites .\n0\t447728\t447699\tIndonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations .\tIndonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied .\n1\t1606495\t1606619\tBush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease .\tPresident Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic .\n1\t1550897\t1550977\tLater this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions .\tThis fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence .\n0\t490376\t490490\tThe reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday .\tStocks slipped at the open after the euro hit record highs against the dollar .\n1\t3084554\t3084612\tSales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros .\tSales rose 37 per cent year-on-year to 1.76bn , beating expectations .\n1\t315647\t315778\tIf the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back .\tIf the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change .\n1\t3428298\t3428362\tRobert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus .\tWalsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night .\n1\t2523564\t2523358\tThe Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box .\tThe µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box .\n1\t2079200\t2079131\tU.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America .\tU.S. 
stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America .\n1\t818091\t817811\tThe company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results .\tThe company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July .\n1\t1580638\t1580663\t\" I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . \"\tI stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , \" Blair said .\n0\t1919740\t1919926\t\" I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , \" Parrish said .\t\" I don 't know whether the person I 'm talking to now may end up being someone else , \" Parrish said .\n1\t2748287\t2748550\t\" I think it 's going to be a close vote , but I think the grant proposal is going to win , \" McConnell said .\t\" I think it 's going to be a close vote , but I think the grant proposal 's going to win , \" said Sen. Mitch McConnell , assistant majority leader .\n1\t3394891\t3394775\tTwenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins .\tTwenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through .\n0\t2963943\t2963880\tOne , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday .\tHer 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition .\n0\t1865364\t1865251\tThe United States finally relented during President Bush 's visit to Africa earlier this month .\tDuring President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase .\n1\t263690\t263819\t\" There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , \" he said .\tHe also said there is no conscious policy by the United States to move the value of the dollar .\n1\t283751\t283290\tIt 's the first such drill since the September 11 terrorist attacks on New York and Washington .\tIt is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks .\n1\t2517014\t2516995\tMyanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said .\tMyanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday .\n1\t1330643\t1330622\tAccording to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\tThe Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\n1\t3111452\t3111428\tIn an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites .\tIn an unusual move that critics contend could disrupt millions of Web sites , the U.S. 
Patent and Trademark Office is reconsidering a patent affecting Internet pages .\n0\t1167835\t1167651\tKansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year .\tStatistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year .\n0\t1423836\t1423708\tA European Union spokesman said the Commission was consulting EU member states \" with a view to taking appropriate action if necessary \" on the matter .\tLaos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter .\n1\t2090911\t2091154\tWaiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades .\tBut waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades .\n1\t2265271\t2265152\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States .\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market .\n1\t3062202\t3062308\tBy skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is \" less predictable \" than for those obtained in the United States .\tBy skirting the FDA 's oversight , Eagan said the quality of the imported drugs is \" less predictable \" than U.S. drugs .\n1\t2155514\t2155377\tHe said : \" For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . \"\t\" For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , \" Dyke said .\n1\t1552068\t1551928\tThree such vigilante-style attacks forced the hacker organizer , who identified himself only as \" Eleonora [ 67 ] , \" to extend the contest until 7 p.m. 
EST Sunday .\tThree such vigilante-style attacks forced the hacker organiser , who identified himself only as \" Eleonora67 ] , \" to extend the contest until 8am ( AEST ) today .\n1\t936978\t937500\tEric Gagne pitched a perfect ninth for his 23rd save in as many opportunities .\tGagne struck out two in a perfect ninth inning for his 23rd save .\n0\t985015\t984975\tOne way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday .\tJust about everything about \" Harry Potter and the Order of the Phoenix \" will set records .\n1\t1430357\t1430425\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" said Josh Lichter , a meteorologist with the Houston-Galveston weather office .\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" Lichter said .\n1\t3039310\t3039413\tToday , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks .\tOn Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 .\n1\t34513\t34742\tPolice say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States .\tMr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US .\n1\t368067\t368018\tChiron already has nearly 20 percent acceptances from PowderJect 's shareholders .\tChiron has acceptances from holders of nearly 20 percent of PowderJect shares .\n0\t611663\t611716\tErnst & Young has denied any wrongdoing and plans to fight the allegations .\tErnst & Young has denied the SEC 's claims , and called its recommendations \" irresponsible \" .\n1\t98432\t98657\tThe attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence .\tThe attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence .\n1\t3039007\t3038845\tNo company employee has received an individual target letter at this time .\tShe said no company official had received \" an individual target letter at this time . \"\n1\t1708040\t1708062\tSecond-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share .\tThe second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share .\n0\t1757264\t1757375\tHe allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement .\tThe two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement .\n1\t383417\t383558\tWorldwide , more than 50 million people have seen \" Les Miz , \" with gross receipts of $ 1.8 billion .\tWorldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion .\n0\t2766112\t2766084\tIn fiction : Edward P. Jones ( \" The Known World \" ) and Scott Spencer ( \" A Ship Made of Paper \" ) .\tThe fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper .\n1\t1261116\t1261234\t\" Overwhelmingly the Windows brand really resonated with them . 
\"\t\" Windows was the part of the experience that really resonated with people . \"\n1\t3028143\t3028234\tThe Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes .\tThe Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year .\n0\t249699\t249623\tVivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing .\tDuring difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing .\n0\t3448488\t3448449\tThe Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months .\tThe Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 .\n1\t2749322\t2749663\tThe Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission .\tThe Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission .\n0\t2204592\t2204588\tSun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition .\tThe vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) .\n1\t2889005\t2888954\tProsecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings .\tProsecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings .\n0\t1657632\t1657619\tThe Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today .\tGoodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment .\n0\t555617\t555528\tThe 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t2396937\t2396818\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the Fed said in a statement accompanying the unanimous decision .\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the policy-setting Federal Open Market Committee said .\n0\t2339738\t2339771\t\" It is bad for Symbian , \" said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein .\t\" Motorola has displayed clear disloyalty \" to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London .\n0\t1616174\t1616206\tBob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling .\tBob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment .\n1\t635783\t635802\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that there could be another downgrade if Southcorp breached any of its banking covenants .\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that 
there could be a rating downgrade if Southcorp did breach any banking covenants .\n1\t3444633\t3444733\tHe added : ``I 've never heard of more reprehensiblebehaviour by a doctor .\tThe Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor .\n1\t555553\t555528\tBroomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t1112021\t1111925\tOther staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue .\tSome E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue .\n0\t2749410\t2749625\tPresident Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday .\tPresident Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday .\n1\t1629064\t1629043\tAn episode is declared when the ozone reaches .20 parts per million parts of air for one hour .\tA Stage 1 episode is declared when ozone levels reach 0.20 parts per million .\n1\t789691\t789665\t\" He may not have been there , \" the defence official said on Thursday .\t\" He may not have been there , \" said a defence official speaking on condition of anonymity .\n1\t844421\t844679\tThe U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence .\tThe troops - whose mandate is to protect U.N. 
installations and personnel - can only fire in self-defense and have been unable to stem the violence .\n1\t58540\t58567\tNorth American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight .\tNorth American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight .\n1\t781439\t781461\tXerox itself paid a $ 10 million fine last year to settle similar SEC charges .\tXerox itself previously paid a $ 10-million penalty to settle the SEC accusations .\n1\t1909579\t1909408\t\" This deal makes sense for both companies , \" said National Chief Executive Brian Halla .\t\" This deal makes sense for both companies , \" Halla said in a prepared statement .\n0\t787432\t787464\tThe blasts killed two people and injured more than 150 others .\tThe Atlanta Olympic Games attack killed one woman and injured more than 100 other people .\n0\t52758\t52343\tMorrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service .\tAt the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her .\n1\t1675025\t1675047\tSpansion products are to be available from both AMD and Fujitsu , AMD said .\tSpansion Flash memory solutions are available worldwide from AMD and Fujitsu .\n1\t2131318\t2131372\tAbout 1,500 police will be deployed for the visit .\tAround 1,500 police are to be deployed at Niigata for the ferry 's visit .\n1\t325763\t325928\tGamarekian told The News she remembers only the woman 's first name - and refused to reveal it .\tShe told the New York Daily News she remembers only the intern 's first name , which she refused to reveal .\n1\t2638975\t2638855\tOne of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding .\tOne of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding .\n1\t2198694\t2198937\tA nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year .\tA nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 .\n1\t1825432\t1825301\tA man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday .\tThe Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said .\n1\t2906104\t2906322\tThey were being held Sunday in the Camden County Jail on $ 100,000 bail .\tThey remained in Camden County Jail on Sunday on $ 100,000 bail .\n1\t722278\t722383\tMs Stewart , the chief executive , was not expected to attend .\tMs Stewart , 61 , its chief executive officer and chairwoman , did not attend .\n0\t101747\t101777\tChristina 's aunt , Shelley Riling , said the defense 's claims were preposterous .\tChristina 's aunt , Shelley Riling , said she will address the court .\n1\t2224884\t2224819\tThe Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights .\tThe Justice Department on Aug. 19 sanctioned the Oct. 
7 date for recall election , saying it would not affect voting rights .\n0\t977938\t978162\tLord Falconer hailed the changes as \" a new beginning as far as the courts , Crown Prosecution Service and police are concerned \" .\t\" It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . \"\n0\t1015010\t1014963\tGE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t1513190\t1513246\tAt least 27 US troops have been killed in hostile fire since Bush 's statement .\tAt least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 .\n1\t2385348\t2385394\tA recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday .\tA recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday .\n1\t2317018\t2317252\tNovember 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 .\tNovember 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 .\n0\t1831696\t1831660\tThe agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies .\tThe agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify .\n1\t1528383\t1528083\tZulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards .\tWitness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards .\n1\t917965\t918315\tFor the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase .\tFor the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral .\n0\t3218713\t3218830\tQ : Can I buy coverage for prescription drugs right away ?\tCongress has added a new benefit - an option to buy insurance coverage for prescription drugs .\n1\t221079\t221003\tThe airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers .\tThe airline has the option to buy 380 more , split evenly between the two manufacturers .\n1\t2546175\t2546198\tDr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions .\tDr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function .\n0\t799346\t799268\tThe chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion .\tThe chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue .\n0\t2673104\t2673130\tAll patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea .\tSymptoms of the E. 
coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping .\n1\t1354501\t1354476\tFederal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc .\tFederal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream .\n1\t3070979\t3070949\tEnvironmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK .\tEnvironmental campaigners used the eclipse to highlight the surge in light pollution across Britain .\n0\t1264509\t1264471\tAvailable July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems .\tThe OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers .\n1\t103280\t103431\tJustice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use .\tJustice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot .\n0\t110731\t110648\tBut Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic .\tBillups scored 77 points in the final two games of the first-round series against the Magic .\n1\t2274844\t2274714\tKelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war .\tHe killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq .\n0\t1050307\t1050144\tAnd it 's going to be a wild ride , \" said Allan Hoffenblum , a Republican consultant .\tNow the rest is just mechanical , \" said Allan Hoffenblum , a Republican consultant .\n1\t2810634\t2810670\tWhile the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each .\tInstead of one long operation to separate the twins , Goodrich and Dr. 
David Staffenberg plan about three , with several weeks between each .\n1\t3073773\t3073779\tLay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination .\tLay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination .\n0\t261202\t260995\tThe WHO experts didn 't say how many cases in Hebei were in rural areas .\tHebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas .\n1\t1824224\t1824209\tNearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours .\tMutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired .\n1\t548867\t548785\tIn three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th .\tIn the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list .\n0\t2796658\t2796682\tAbout two hours later , his body , wrapped in a blanket , was found dumped a few blocks away .\tThen his body was dumped a few blocks away , found in a driveway on Argyle Road .\n1\t1808166\t1808434\tColumbia broke up over Texas upon re-entry on Feb. 1 .\tColumbia broke apart in the skies above Texas on Feb. 1 .\n1\t853475\t853342\tA year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs .\tWithin two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs .\n0\t977772\t977804\tThe Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign .\tFalconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign .\n1\t577854\t578500\tCindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents .\tShe started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents .\n1\t2829194\t2829229\tThe two are not related , but have referred to each other as father and son .\tHe 's not related to Malvo , but the two have referred to each other as father and son .\n1\t2074182\t2074668\tGibson said last month in a press statement that \" neither I nor my film are anti-Semitic .\tGibson said in a June statement that he and his film are not anti-Semitic .\n0\t2758265\t2758282\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates .\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them .\n1\t1958079\t1958143\tThe Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data .\tThe blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 .\n1\t544217\t544325\tThe vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council .\tThe vote for mayor followed City Council elections that gave Kurds the 
largest block of votes on the 30-seat council .\n1\t2385288\t2385256\tLarge swells and dangerous surf already were being felt along sections of the coast .\tAlready large swells and dangerous surf have arrived along the mid-Atlantic .\n0\t2324708\t2325028\tBased on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent .\tLabor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent .\n1\t2139506\t2139427\t\" We will work with the board to ensure a smooth transition . \"\tHe said federal regulators would work with the corporation to ensure a \" smooth transition . \"\n1\t2965576\t2965701\tGasps could be heard in the courtroom when the photo was displayed .\tGasps could be heard as the photo was projected onto the screen .\n1\t2931098\t2931144\tGilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter .\tQuarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said .\n0\t644788\t644816\t\" I had one bad stretch of holes that put me out of contention to win , \" Woods said .\t\" I had one bad stretch of holes that put me out of contention , \" Woods said , referring to his 42 on the front nine Saturday .\n0\t2551891\t2551563\tThe poll had a margin of error of plus or minus 2 percentage points .\tIt had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday .\n1\t1089053\t1089297\tSen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic .\tSen. Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered .\n1\t3435735\t3435717\tThe broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 .\tThe Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 .\n0\t1954\t2142\tWatertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country .\tAlong with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday .\n1\t3400796\t3400822\tThat is evident from their failure , three times in a row , to get a big enough turnout to elect a president .\tThree times in a row , they failed to get a big _ enough turnout to elect a president .\n1\t1220668\t1220801\tWe firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . \"\tWe firmly believe that we have an absolute right to use the common word ' spike ' to name our network .\n1\t1889954\t1889847\tSources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE .\tLate last week , sources told Reuters cable TV company Comcast Corp. 
CMCSA.O also was looking at buying VUE assets .\n1\t315785\t315653\tBut MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found .\tMTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said .\n0\t1521034\t1520582\tWhite , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman .\tWhite , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke .\n1\t2083598\t2083810\tAbout 10 percent of high school and 16 percent of elementary students must be proficient at math .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t1910610\t1910455\tThe legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company .\tThe legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company .\n1\t3113791\t3113782\tThe European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached .\tThe European Commission is expected to issue its decision in the case next spring — unless a settlement is reached .\n1\t3214517\t3214483\t\" So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , \" she told jurors .\t\" Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , \" Ms. 
Richardson declared .\n0\t2083612\t2083810\tTwenty percent of Latino students and 23 percent of black students performed at proficient or higher .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t661390\t661218\tHe is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama .\tHe is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama .\n1\t1269572\t1269682\tThe men were remanded in custody and are due to appear again before court on July 8 .\tThey were remanded in custody and will appear in court again on July 8 .\n1\t1095780\t1095652\t\" No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , \" Wheeler said in a statement .\tNo matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday .\n1\t116294\t116332\tThe Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 .\tThe Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 .\n1\t941617\t941673\tHe said his hatred for such people grew from these discussions and had helped convince him violence was the answer .\tHis hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea .\n1\t2640607\t2640576\t\" There is no need for one deadline for all to create the ASEAN Economic Community , \" Thaksin said .\tThus , he said , there did not have to one deadline to create the economic community .\n1\t3310210\t3310286\tThe announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said .\tThe broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said .\n1\t3376093\t3376101\tThe additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes .\tThe donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 .\n1\t1549586\t1549609\tLeon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville .\tThe dead man , Leon Williams , was found in his third-floor apartment .\n1\t460211\t460445\tThe player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\tHe failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\n1\t1196962\t1197061\tBut Virgin wants to operate Concorde on routes to New York , Barbados and Dubai .\tBranson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai .\n0\t862804\t862715\tHe tried to fight off officers and was taken to a hospital after a police dog bit him but was later released .\tCruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. 
Steve Dixon said .\n1\t1726935\t1726879\tThe announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs .\tEconomists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs .\n0\t331980\t332110\tAsked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission .\tAsked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : \" Of course they may not go .\n1\t173879\t173832\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar .\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates .\n0\t2834988\t2835026\tIran has until the end of the month to satisfy the agency it has no plans for nuclear weapons .\tThe Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities .\n1\t2587300\t2587243\tHer father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will .\tHer father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will .\n0\t554905\t554627\tClaire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee .\tOne by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee .\n1\t1912524\t1912648\tCitigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group .\tCitigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business .\n1\t3255597\t3255668\t\" They 've been in the stores for over six weeks , \" says Carney .\tThe quarterlies usually stay in stores for between six to eight weeks , \" Carney added .\n1\t629316\t629289\tLet me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community .\t\" The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , \" he said .\n1\t54181\t53570\tRidge said no actual explosives or other harmful substances will be used .\tRidge said no real explosives or harmful devices will be used in the exercise .\n1\t723557\t724115\tThus far , Stewart 's company appears ready to stand behind her .\tFor now , the company 's management appears to be standing behind Stewart .\n0\t2607718\t2607708\tBut late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement .\tBut late yesterday , the campaign and the state Democratic Party said there would be no news conference .\n1\t753858\t753890\tThere 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t587009\t586969\tAnother $ 100-million in savings will come from management layoffs and pay cuts .\tThe airline expects to save another $ 100-million a year through management 
layoffs and pay cuts .\n1\t308567\t308525\tHe called on Prime Minister John Howard to establish a royal commission on child sex abuse .\tThe Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse .\n0\t665419\t665612\t\" We think that the United States of America should support the free speech of all groups , \" Mr. White said , objecting to Mr. Olson 's recommendation .\tWe think that the United States of America should support the free speech of all groups , he said .\n1\t2763517\t2763576\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\tThe tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years .\n0\t3107118\t3107136\tAfter 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries .\tAfter 18 months , the atorvastatin patients had no change in the plaque in their arteries .\n1\t780604\t780466\tToll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail .\tToll last week offered to buy the company for NZ75c a share , or $ NZ158 million .\n0\t1989213\t1989116\t\" This child was literally neglected to death , \" Armstrong County District Attorney Scott Andreassi said .\tArmstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen .\n1\t1462409\t1462504\tWal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday .\tWal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday .\n1\t260952\t260924\tMetro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported .\tSubway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said .\n1\t1224743\t1225510\tIn the undergraduate case , Rehnquist said the use of race was not \" narrowly tailored \" to achieve the university 's asserted interest in diversity .\tRehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity .\n0\t3329379\t3329416\tSP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) .\tThe firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) .\n1\t2362761\t2362698\tA landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said .\tIn central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said .\n0\t1465073\t1464854\tThey will help draft a plan to attack obesity that Kraft will implement over three to four years .\tThe team will help draft a plan by the end of the year to attack obesity .\n1\t195728\t196099\tBut that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion .\tSuch an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion .\n1\t2587767\t2587673\tIn the 
clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs .\tIn Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs .\n0\t1490044\t1489975\tCorixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market .\tShares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 .\n1\t958161\t957782\tCommittee approval , expected today , would set the stage for debate on the Senate floor beginning Monday .\tThat would clear the way for debate in the full Senate beginning on Monday .\n1\t1033204\t1033365\tO 'Brien was charged with leaving the scene of a fatal accident , a felony .\tBishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident .\n0\t2996241\t2996734\tTom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning .\tBethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday .\n0\t2015389\t2015410\tThe Calgary woman , who is in her twenties , donated blood on Aug. 7 .\tThe woman -- who has no symptoms of illness -- donated blood Aug. 7 .\n1\t221515\t221509\tQuattrone lawyer John W. Keker said his client is innocent .\tIn a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent .\n0\t2283737\t2283794\tIn the weeks leading up to the execution , several Florida officials received anonymous threatening letters .\tSeveral Florida officials connected to the case have received threatening letters , accompanied by rifle bullets .\n1\t2826681\t2826474\tThe disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday .\tThe fight over online music sales was disclosed in documents made available Monday by the court .\n1\t2249237\t2249305\tParson was charged with intentionally causing and attempting to cause damage to protected computers .\tParson is charged with one count of intentionally causing damage to a protected computer .\n1\t389239\t389299\t\" The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , \" the judge said .\t\" The court and the public need to know more of the defendants ' seemingly massive fraud , \" he said .\n1\t2652187\t2652218\tThe U.S. 
Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users .\tThe high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users .\n1\t2945693\t2945847\tThe IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts .\tThe IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts .\n1\t2065523\t2065836\t\" More than 70,000 men and women from bases in Southern California were deployed in Iraq .\tIn all , more than 70,000 troops based in Southern California were deployed to Iraq .\n1\t2222998\t2223097\tBP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange .\tBP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange .\n1\t2561999\t2561941\tBecause of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 .\tIncluding the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 .\n0\t2324704\t2325023\tFriday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate .\tU.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery .\n1\t2336453\t2336545\tFederal Emergency Management Administration designated $ 20 million to establish the registry .\tThe registry was launched with $ 20 million from the Federal Emergency Management Agency .\n1\t720572\t720486\tBREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday .\tCases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time .\n1\t1605818\t1605806\t\" It was never our intention to sell the product , \" said Health Minister Anne McClellan , a skeptic of medical marijuana use .\t\" It was never the intention of us to sell product , \" federal Health Minister Anne McLellan said yesterday in Edmonton .\n0\t2440680\t2440474\tGM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses .\tThey cover more than 300,000 UAW workers and 500,000 retirees and spouses .\n0\t726399\t726078\tRosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , \" Breyer said to tumultuous cheers in the courtroom .\t\" Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . \"\n1\t533903\t533818\t\" We are committed to helping the Iraqi people get on the path to a free society , \" Rumsfeld said in a speech to the Council on Foreign Relations .\t\" We are committed to helping the Iraqi people get on the path to a free society , \" he said .\n1\t1166473\t1166857\tMr. 
Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money .\tYoung said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money .\n1\t144089\t143697\tThe 12-nation currency has risen by 33 percent against the dollar over the past 15 months .\tThe euro is up 9 percent against the dollar in the past six weeks .\n1\t3439854\t3439874\tIn February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing .\tThe officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges .\n1\t3464314\t3464302\tI was surprised it turned out me talking and the president just listening .\t\" I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . \"\n1\t2008984\t2009175\tThe state 's House delegation currently consists of 17 Democrats and 15 Republicans .\tDemocrats hold a 17-15 edge in the state 's U.S. House delegation .\n0\t816867\t816831\tFreddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board .\tHe replaces Leland Brendsel , 61 , who retired as chairman and chief executive .\n1\t192285\t192327\tWe 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting .\t\" We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . \"\n1\t2688145\t2688162\tIn that position , Elias will report to Joe Tucci , president and CEO of EMC .\tAs executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive .\n1\t3294207\t3294290\tBut with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made .\tBut with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made .\n0\t205100\t205145\tA pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote .\tMiodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent .\n0\t3242051\t3241897\tMr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board .\tKerkorian and Tracinda had also tried to take over Chrysler in 1995 .\n0\t1076861\t1077018\tGlover spoke at a news conference that included about 20 relatives of the victims .\tAbout 20 family members of the victims were invited to the news conference .\n1\t2095803\t2095786\tDrax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe .\tDrax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe .\n1\t2112330\t2112376\tBut I would rather be talking about high standards than low standards . 
\"\t\" I would rather be talking about positive numbers rather than negative .\n1\t3389318\t3389271\tIt was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew .\tIt was still not known exactly how many people were on the plane , which could carry 141 passengers and crew .\n1\t698948\t698933\tThe market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March .\tThe market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March .\n1\t539585\t539355\tWitnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew .\tWitnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew .\n1\t684848\t684557\tAs Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times .\tAs he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted \" Takbir ! \" , or \" Proclaim ! \" , a religious rallying cry .\n1\t347017\t347002\tIn hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet .\tIn hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty .\n1\t1592037\t1592076\tIn a statement , Lee said he \" no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . \"\tSpike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture \" Spike TV , \" according to a statement read in court Tuesday .\n0\t3013483\t3013540\tSingapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries .\tHAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia .\n1\t2020252\t2020081\tThe worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about .\tThe worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July .\n0\t2614947\t2614904\tThe premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 .\tThe premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 .\n0\t1744257\t1744378\tIn the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion .\tIn the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share .\n0\t1119721\t1119714\tSony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning .\tIts capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning .\n1\t1186754\t1187056\tAmazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history .\tAmazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history 
.\n1\t2842562\t2842582\tThe show 's closure affected third-quarter earnings per share by a penny .\tThe company said this impacted earnings by a penny a share .\n0\t431076\t431242\tAfter the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances .\tThe committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd .\n1\t1393764\t1393984\tIt 's been a busy couple of days for security gurus assigned to keep their companies safe and sound .\tIt 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound .\n0\t2916199\t2916164\tLu reclined in a soft chair wearing a woolly coat near the blackened capsule .\t\" It 's great to be back home , \" said Lu , dressed in a woolly coat near the blackened capsule .\n1\t2530671\t2530542\tGov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 .\tAfter Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs .\n1\t219064\t218969\t\" It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , \" he said .\t\" It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , \" Parsons told reporters at NASA headquarters .\n0\t2377289\t2377259\tEstonia 's place in the European mainstream and safeguard its independence regained in 1991 .\tEstonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 .\n0\t2110220\t2110199\tFranklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center .\tA county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center .\n0\t1864253\t1863810\tPolice suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs .\tNobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs .\n0\t3150803\t3150839\tDuring this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations .\tDuring the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states .\n0\t969381\t969512\tThe technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 .\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\n1\t271891\t271839\tSony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots .\tIt also features a 4.5 in back-lit LCD screen and memory expansion facilities .\n0\t2829648\t2829613\tClinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill .\tTwo Democrats , Sen. 
Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans .\n1\t886904\t887158\tSome of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit .\tSome of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit .\n0\t2632692\t2632767\tWal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County .\tAt least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County .\n1\t2240399\t2240149\tCintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process .\tCintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process .\n1\t805457\t805985\tThe opposition would resort to rolling mass action \" at strategic times of our choice and without warning to the dictatorship , \" he said .\t\" From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , \" he said .\n1\t2896308\t2896334\tFederal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 .\tHe said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 .\n1\t2110775\t2110924\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering .\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario .\n1\t1762569\t1762526\tHester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers .\tHester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers .\n0\t2706154\t2706185\tThe other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said .\tAfter the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said .\n1\t1057995\t1057778\tThe hearing , expected to last a week , will determine whether Akbar faces a court-martial .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n1\t1386884\t1386857\tHe said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed .\tHe said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed .\n1\t3093023\t3092996\tSpeaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried .\tBrigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney .\n1\t1661381\t1661317\t\" Close co-operation between our law enforcement agencies , close 
co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . \"\tClose cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said .\n0\t2926039\t2925982\tThe mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks .\tThe parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month .\n0\t637168\t637447\tWe strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community .\tMcBride characterized Novell 's move as \" a desperate measure to curry favor with the Linux community . \"\n1\t696677\t696932\tAfter more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday .\tAfter more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion .\n1\t3122429\t3122305\tMr Russell , 46 , a coal miner from Brisbane , said : \" They are obviously hurting , so we are basically going over there to help them . \"\t\" They are obviously hurting so we are basically going over there to help them , \" Russell , 46 , said .\n1\t1348909\t1348954\tThe New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years .\tThe former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on .\n0\t162203\t162101\tIt does not affect the current Windows Media Player 9.0 Series .\tWindows Media Player has had security problems before .\n0\t71501\t71627\tThe seizure took place at 4 a.m. on March 18 , just hours before the first American air assault .\tThe time was about 4 a.m. on March 18 , just hours before the first pinpoint missiles rained down on the capital .\n1\t2907762\t2907649\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively .\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent .\n1\t2167771\t2167744\tIn May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown .\tLast May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood .\n1\t3320577\t3320553\t\" I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , \" he said .\t\" If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . 
\"\n1\t849291\t849442\tIBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets .\tIBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets .\n0\t763948\t763991\tCosta 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final .\tCosta will play Juan Carlos Ferrero next in a rematch of last year 's final .\n1\t1908763\t1908744\tA former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year .\tA former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics .\n0\t1876120\t1876059\tThyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat .\tThyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too .\n1\t518089\t518133\tJudge Craig Doran said it wasn 't his role to determine if Hovan was \" an evil man \" but maintained that \" he has committed an evil act . \"\tJudge Craig Doran said he couldn 't determine if Hovan was \" an evil man \" but said he \" has committed an evil act . \"\n0\t224932\t224868\tThe Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange .\tShares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading .\n1\t1771131\t1771091\tIt also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip .\tThe S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip .\n0\t2728425\t2728251\tIt decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency .\tIt decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status .\n0\t953733\t953537\tAltria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser .\tIts shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser .\n1\t349215\t349241\tIt will be followed in November by a third movie , \" The Matrix Revolutions . \"\tThe film is the second of a trilogy , which will wrap up in November with \" The Matrix Revolutions . 
\"\n1\t2919853\t2919804\tMassachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading .\tState and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal .\n1\t954526\t954607\tHe is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise .\tHe is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard .\n1\t69773\t69792\tCisco pared spending to compensate for sluggish sales .\tIn response to sluggish sales , Cisco pared spending .\n0\t2823575\t2823513\tThe study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said .\tThe study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research .\n1\t2455942\t2455978\tMy decision today is not based on any one event . \"\tGovernor Rowland said his decision was \" not based on any one event . \"\n1\t131979\t131957\tNelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death .\tNelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death .\n0\t2010705\t2010779\t\" The government elements who have been causing trouble are still in place .\tThe government elements who have been causing trouble are still in place , they are attacking us . \"\n1\t54142\t53641\tNext Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\tAround the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\n1\t1015249\t1015204\tWal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations .\tWal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. 
posted May sales that fell below Wall Street 's modest expectations .\n0\t753928\t753890\tThe patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t3022833\t3023029\tPeterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying .\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\n0\t751520\t751373\tSPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems .\tThe DirectBand network was developed with the assistance of SCA Data Systems .\n0\t218848\t218851\tHe replaces Ron Dittemore , who announced his resignation in April .\tDittemore announced his plans to resign on April 23 .\n1\t3181118\t3181443\tDetectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended .\tShortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development .\n1\t515581\t515752\tThey were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches .\tHe said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches .\n1\t347022\t347003\tTaiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket .\tTaiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April .\n1\t3311600\t3311633\tMr. Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty .\tRowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan .\n0\t3439114\t3439084\tRoss Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue .\tRoss Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal .\n0\t487951\t488007\tThe euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session .\tThe euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session .\n0\t314997\t315030\tOn the stand Wednesday , she said she was referring only to the kissing .\tOn the stand Wednesday , she testified that she was referring to the kissing before the alleged rape .\n0\t4733\t4557\tGarner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader .\tThe group has already met several times and Gen. 
Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader .\n1\t2820371\t2820525\tBlair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union .\tBlair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week .\n1\t801552\t801516\t\" There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , \" Baker said .\t\" There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills .\n1\t1704987\t1705268\tCharles O. Prince , 53 , was named as Mr. Weill 's successor .\tMr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor .\n1\t396041\t396188\tOfficials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\tCanadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\n0\t1014983\t1014963\tGE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t2320654\t2320666\tThe Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\tThe Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\n1\t1057876\t1057778\tThe hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n0\t2116843\t2116883\tIn the United States , heart attacks kill about 460,000 year , in Canada about 80,000 .\tIn the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health .\n1\t1461629\t1461781\tNinety-five percent of international cargo to the United States is carried by ship .\tShips carry 95 percent of international cargo to the United States .\n0\t374015\t374162\t\" It 's a major victory for Maine , and it 's a major victory for other states .\tThe Maine program could be a model for other states .\n1\t2493369\t2493428\tNews that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street .\tNews that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday .\n1\t490355\t490378\tThey note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher .\tAfter several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery .\n1\t2691044\t2691264\tMost economists had expected a more dire report , with many anticipating the fifth month of job losses in six months .\tMost economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September .\n1\t1831453\t1831491\tBut software license revenues , a measure financial analysts watch closely , 
decreased 21 percent to $ 107.6 million .\tLicense sales , a key measure of demand , fell 21 percent to $ 107.6 million .\n1\t2380695\t2380822\tKing , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters .\tStephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation .\n1\t2577517\t2577531\tThe Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission .\tThe natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC .\n1\t3267026\t3266930\tThe steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned .\tThe U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned .\n1\t360875\t360943\tBusiness Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday .\tBusinessWeek Online has learned that the settlement could come as early as Monday , May 19 .\n1\t162632\t162653\tOnly one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site .\tOnly one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site .\n1\t1128884\t1128865\tShares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 .\tSince the initial takeover offer , Salix shares have risen about 35 percent .\n1\t3264732\t3264648\tThe jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself .\tThe quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself .\n1\t1721433\t1721267\tIt 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season .\tIt 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season .\n0\t146112\t146127\tThe broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 .\tThe technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 .\n1\t389117\t389052\tThe company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States .\tMcDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States .\n1\t872784\t872834\tGregory Parseghian , a former investment banker , was appointed chief executive .\tGreg Parseghian was appointed the new chief executive .\n0\t2977500\t2977547\tTheir contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. 
Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 .\t\" It has outraged the membership , \" said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 .\n1\t3107137\t3107119\tBut plaque volume increased by 2.7 percent in pravastatin patients .\tThe volume of plaque in Pravachol patients ' arteries rose by 3 % .\n1\t1619244\t1619274\tToday in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores .\tTomorrow the book , kept under wraps by G. P. Putnam 's Sons since its inception , will appear in bookstores .\n0\t3061836\t3062031\tThe S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points .\tOn the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points .\n1\t485999\t486011\tEx-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' .\tIn Soviet times the Beatles ' music \" was considered propaganda of an alien ideology ."
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/parallel.py",
    "content": "from concurrent import futures\nimport torch\nimport torch.neuron\nimport os\nfrom time import time\nfrom queue import Queue\nimport warnings\n\ndef consumer(model, input_queue):\n    while True:\n        inputs, input_id, callback_fn = input_queue.get()\n        input_queue.task_done()\n        # Stop execution if stopping condition is recieved\n        if inputs == \"stop\":\n            break\n        start = time()\n        results = model(*inputs)\n        # Make the output iterable - if it is not already a tuple or list\n        if not isinstance(results, tuple) or isinstance(results, list):\n            results = [results]\n        end = time()\n        if callback_fn is not None:\n            callback_fn(results, input_id, start, end)\n              \nclass NeuronSimpleDataParallel():\n\n    def __init__(self, model_file, num_neuron_cores, batch_size=1):\n        self.num_neuron_cores = num_neuron_cores\n        self.batch_size = batch_size\n        \n        os.environ['NEURON_RT_NUM_CORES'] = str(num_neuron_cores)\n        \n        # Construct a list of models\n        self.models = [torch.jit.load(model_file)\n                       for i in range(num_neuron_cores)]\n        \n        # Create shared input queue\n        self.input_queue = Queue(maxsize=num_neuron_cores*16)\n\n        self.executor = futures.ThreadPoolExecutor(\n            max_workers=num_neuron_cores)\n\n    def eval(self):\n        for model in self.models:\n            model.eval()\n            \n    def train(self):\n        for model in self.models:\n            model.train()\n            \n    def start_continuous_inference(self):\n        for model in self.models:\n            self.executor.submit(consumer, model, self.input_queue)\n    \n    def infer(self, batch, input_id, callback_fn):\n        self.input_queue.put((batch, input_id, callback_fn))\n        \n    def stop(self):\n        for _ in range(self.num_neuron_cores):\n            self.input_queue.put((\"stop\", -1, None))\n"
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Compiling and Deploying HuggingFace Pretrained BERT\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Introduction\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy BERT-base version of HuggingFace 🤗 Transformers BERT for Inferentia. The full list of HuggingFace's pretrained BERT models can be found in the BERT section on this page https://huggingface.co/transformers/pretrained_models.html. \\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on an instance which is inf1.6xlarge or larger. The compile part of this tutorial requires inf1.6xlarge and not the inference itself. For simplicity we will run this tutorial on inf1.6xlarge but in real life scenario the compilation should be done on a compute instance and the deployment on inf1 instance to save costs.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"- `transformers`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. 
The additional dependencies must be installed here.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install --upgrade \\\"transformers==4.6.0\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Compile the model into an AWS Neuron optimized TorchScript\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import tensorflow  # to work around a protobuf version conflict issue\\n\",\n    \"import torch\\n\",\n    \"import torch.neuron\\n\",\n    \"from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\\n\",\n    \"import transformers\\n\",\n    \"import os\\n\",\n    \"import warnings\\n\",\n    \"\\n\",\n    \"# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\\n\",\n    \"num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\\n\",\n    \"os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)\\n\",\n    \"\\n\",\n    \"# Build tokenizer and model\\n\",\n    \"tokenizer = AutoTokenizer.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\")\\n\",\n    \"model = AutoModelForSequenceClassification.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\", return_dict=False)\\n\",\n    \"\\n\",\n    \"# Set up some example inputs\\n\",\n    \"sequence_0 = \\\"The company HuggingFace is based in New York City\\\"\\n\",\n    \"sequence_1 = \\\"Apples are especially bad for your health\\\"\\n\",\n    \"sequence_2 = \\\"HuggingFace's headquarters are situated in Manhattan\\\"\\n\",\n    \"\\n\",\n    \"max_length=128\\n\",\n    \"paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"\\n\",\n    \"# Run the original PyTorch model on the compilation example\\n\",\n    \"paraphrase_classification_logits = model(**paraphrase)[0]\\n\",\n    \"\\n\",\n    \"# Convert example inputs to a format that is compatible with TorchScript tracing\\n\",\n    \"example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\\n\",\n    \"example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']\\n\",\n    \"\\n\",\n    \"# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\\n\",\n    \"model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)\\n\",\n    \"\\n\",\n    \"# Verify the TorchScript works on both example inputs\\n\",\n    \"paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\\n\",\n    \"not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\\n\",\n    \"\\n\",\n    \"# Save the TorchScript for later use\\n\",\n    \"model_neuron.save('bert_neuron.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You may inspect `model_neuron.graph` to see which part is running on CPU versus running on the accelerator. 
All native `aten` operators in the graph will be running on CPU.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(model_neuron.graph)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"\\n\",\n    \"### Deploy the AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"To deploy the AWS Neuron optimized TorchScript, you may choose to load the saved TorchScript from disk and skip the slow compilation.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Load TorchScript back\\n\",\n    \"model_neuron = torch.jit.load('bert_neuron.pt')\\n\",\n    \"# Verify the TorchScript works on both example inputs\\n\",\n    \"paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\\n\",\n    \"not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\\n\",\n    \"classes = ['not paraphrase', 'paraphrase']\\n\",\n    \"paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()\\n\",\n    \"not_paraphrase_prediction = not_paraphrase_classification_logits_neuron[0][0].argmax().item()\\n\",\n    \"print('BERT says that \\\"{}\\\" and \\\"{}\\\" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))\\n\",\n    \"print('BERT says that \\\"{}\\\" and \\\"{}\\\" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now let's run the model in parallel on four cores\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def get_input_with_padding(batch, batch_size, max_length):\\n\",\n    \"    ## Reformulate the batch into three batch tensors - default batch size batches the outer dimension\\n\",\n    \"    encoded = batch['encoded']\\n\",\n    \"    inputs = torch.squeeze(encoded['input_ids'], 1)\\n\",\n    \"    attention = torch.squeeze(encoded['attention_mask'], 1)\\n\",\n    \"    token_type = torch.squeeze(encoded['token_type_ids'], 1)\\n\",\n    \"    quality = list(map(int, batch['quality']))\\n\",\n    \"\\n\",\n    \"    if inputs.size()[0] != batch_size:\\n\",\n    \"        print(\\\"Input size = {} - padding\\\".format(inputs.size()))\\n\",\n    \"        remainder = batch_size - inputs.size()[0]\\n\",\n    \"        zeros = torch.zeros( [remainder, max_length], dtype=torch.long )\\n\",\n    \"        inputs = torch.cat( [inputs, zeros] )\\n\",\n    \"        attention = torch.cat( [attention, zeros] )\\n\",\n    \"        token_type = torch.cat( [token_type, zeros] )\\n\",\n    \"\\n\",\n    \"    assert(inputs.size()[0] == batch_size and inputs.size()[1] == max_length)\\n\",\n    \"    assert(attention.size()[0] == batch_size and attention.size()[1] == max_length)\\n\",\n    \"    assert(token_type.size()[0] == batch_size and token_type.size()[1] == max_length)\\n\",\n    \"\\n\",\n    \"    return (inputs, attention, token_type), quality\\n\",\n    \"\\n\",\n    \"def count(output, quality):\\n\",\n    \"    assert output.size(0) >= len(quality)\\n\",\n    \"    correct_count = 0\\n\",\n    \"    count = len(quality)\\n\",\n    \"    \\n\",\n    \"    batch_predictions = [ row.argmax().item() for row in output ]\\n\",\n    
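\"    # Compare each predicted class index against its reference (quality) label\\n\",\n    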
\"\\n\",\n    \"    for a, b in zip(batch_predictions, quality):\\n\",\n    \"        if int(a)==int(b):\\n\",\n    \"            correct_count += 1\\n\",\n    \"\\n\",\n    \"    return correct_count, count\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Data parallel inference\\n\",\n    \"In the below cell, we use the data parallel approach for inference. In this approach, we load multiple models, all of them running in parallel. Each model is loaded onto a single NeuronCore. In the below implementation, we launch 16 models, thereby utilizing all the 16 cores on an inf1.6xlarge.\\n\",\n    \"\\n\",\n    \"> Note: Now if you try to decrease the num_cores in the above cells, please restart the notebook and run `!sudo rmmod neuron; sudo modprobe neuron` step in cell 2 to clear the Neuron cores.\\n\",\n    \"\\n\",\n    \"Since, we can run more than 1 model concurrently, the throughput for the system goes up. To achieve maximum gain in throughput, we need to efficiently feed the models so as to keep them busy at all times. In the below setup, this is done by using a producer-consumer model. We maintain a common python queue shared across all the models. The common queue enables feeding data continuously to the models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from parallel import NeuronSimpleDataParallel\\n\",\n    \"from bert_benchmark_utils import BertTestDataset, BertResults\\n\",\n    \"import time\\n\",\n    \"import functools\\n\",\n    \"\\n\",\n    \"max_length = 128\\n\",\n    \"num_cores = 16\\n\",\n    \"batch_size = 1\\n\",\n    \"\\n\",\n    \"tsv_file=\\\"glue_mrpc_dev.tsv\\\"\\n\",\n    \"\\n\",\n    \"data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\\n\",\n    \"data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\\n\",\n    \"\\n\",\n    \"#Result aggregation class (code in bert_benchmark_utils.py)\\n\",\n    \"results = BertResults(batch_size, num_cores)\\n\",\n    \"def result_handler(output, result_id, start, end, input_dict):\\n\",\n    \"    correct_count, inference_count = count(output[0], input_dict.pop(result_id))\\n\",\n    \"    elapsed = end - start\\n\",\n    \"    results.add_result(correct_count, inference_count, [elapsed], [end], [start])\\n\",\n    \"\\n\",\n    \"parallel_neuron_model = NeuronSimpleDataParallel('bert_neuron.pt', num_cores)\\n\",\n    \"\\n\",\n    \"#Starting the inference threads\\n\",\n    \"parallel_neuron_model.start_continuous_inference()\\n\",\n    \"\\n\",\n    \"# Warm up the cores\\n\",\n    \"z = torch.zeros( [batch_size, max_length], dtype=torch.long )\\n\",\n    \"batch = (z, z, z)\\n\",\n    \"for _ in range(num_cores*4):\\n\",\n    \"    parallel_neuron_model.infer(batch, -1, None)\\n\",\n    \"    \\n\",\n    \"input_dict = {}\\n\",\n    \"input_id = 0\\n\",\n    \"for _ in range(30):\\n\",\n    \"    for batch in data_loader:\\n\",\n    \"        batch, quality = get_input_with_padding(batch, batch_size, max_length)\\n\",\n    \"        input_dict[input_id] = quality\\n\",\n    \"        callback_fn = functools.partial(result_handler, input_dict=input_dict)\\n\",\n    \"        parallel_neuron_model.infer(batch, input_id, callback_fn)\\n\",\n    \"        input_id+=1\\n\",\n    \"\\n\",\n    \"# Stop inference                \\n\",\n    \"parallel_neuron_model.stop()\\n\",\n    \"\\n\",\n   
 \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"w\\\") as f:\\n\",\n    \"    results.report(f, window_size=1)\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"r\\\") as f:\\n\",\n    \"    for line in f:\\n\",\n    \"        print(line)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now recompile with a larger batch size of six sentence pairs\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"batch_size = 6\\n\",\n    \"\\n\",\n    \"example_inputs_paraphrase = (\\n\",\n    \"    torch.cat([paraphrase['input_ids']] * batch_size,0), \\n\",\n    \"    torch.cat([paraphrase['attention_mask']] * batch_size,0), \\n\",\n    \"    torch.cat([paraphrase['token_type_ids']] * batch_size,0)\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\\n\",\n    \"model_neuron_batch = torch.neuron.trace(model, example_inputs_paraphrase)\\n\",\n    \"\\n\",\n    \"## Save the batched model\\n\",\n    \"model_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Rerun inference with batch 6\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"from parallel import NeuronSimpleDataParallel\\n\",\n    \"from bert_benchmark_utils import BertTestDataset, BertResults\\n\",\n    \"import time\\n\",\n    \"import functools\\n\",\n    \"\\n\",\n    \"max_length = 128\\n\",\n    \"num_cores = 16\\n\",\n    \"batch_size = 6\\n\",\n    \"\\n\",\n    \"data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\\n\",\n    \"data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\\n\",\n    \"\\n\",\n    \"#Result aggregation class (code in bert_benchmark_utils.py)\\n\",\n    \"results = BertResults(batch_size, num_cores)\\n\",\n    \"def result_handler(output, result_id, start, end, input_dict):\\n\",\n    \"    correct_count, inference_count = count(output[0], input_dict.pop(result_id))\\n\",\n    \"    elapsed = end - start\\n\",\n    \"    results.add_result(correct_count, inference_count, [elapsed], [end], [start])\\n\",\n    \"\\n\",\n    \"parallel_neuron_model = NeuronSimpleDataParallel('bert_neuron_b{}.pt'.format(batch_size), num_cores)\\n\",\n    \"\\n\",\n    \"#Starting the inference threads\\n\",\n    \"parallel_neuron_model.start_continuous_inference()\\n\",\n    \"\\n\",\n    \"# Adding to the input queue to warm all cores\\n\",\n    \"z = torch.zeros( [batch_size, max_length], dtype=torch.long )\\n\",\n    \"batch = (z, z, z)\\n\",\n    \"for _ in range(num_cores*4):\\n\",\n    \"    parallel_neuron_model.infer(batch, -1, None)\\n\",\n    \"\\n\",\n    \"input_dict = {}\\n\",\n    \"input_id = 0\\n\",\n    \"for _ in range(30):\\n\",\n    \"    for batch in data_loader:\\n\",\n    \"        batch, quality = get_input_with_padding(batch, batch_size, max_length)\\n\",\n    \"        input_dict[input_id] = quality\\n\",\n    \"        callback_fn = functools.partial(result_handler, input_dict=input_dict)\\n\",\n    \"        parallel_neuron_model.infer(batch, input_id, callback_fn)\\n\",\n    \"        input_id+=1\\n\",\n    
\"\\n\",\n    \"# Stop inference                \\n\",\n    \"parallel_neuron_model.stop()\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark_b{}.txt\\\".format(batch_size), \\\"w\\\") as f:\\n\",\n    \"    results.report(f, window_size=1)\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark_b{}.txt\\\".format(batch_size), \\\"r\\\") as f:\\n\",\n    \"    for line in f:\\n\",\n    \"        print(line)\\n\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Data Parallel HuggingFace Pretrained BERT with Weight Sharing (Deduplication)\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Introduction\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy BERT-base version of HuggingFace 🤗 Transformers BERT for Inferentia, with additional demonstration of using Weight Sharing (Deduplication) feature.\\n\",\n    \"\\n\",\n    \"To use the [Weight Sharing (Deduplication) feature](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-configurable-parameters.html#shared-weights-neuron-rt-multi-instance-shared-weights), you must set the Neuron Runtime environmental variable NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS to \\\"TRUE\\\" together with the [core placement API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/api-core-placement.html) (``torch_neuron.experimental.neuron_cores_context()``).\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on an instance which is inf1.6xlarge or larger. The compile part of this tutorial requires inf1.6xlarge and not the inference itself. For simplicity we will run this tutorial on inf1.6xlarge but in real life scenario the compilation should be done on a compute instance and the deployment on inf1 instance to save costs.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"- `transformers`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. The additional dependencies must be installed here.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install --upgrade \\\"transformers==4.6.0\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Compile the model into an AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"This step compiles the model into an AWS Neuron optimized TorchScript, and saves it in the filed ``bert_neuron.pt``. This step is the same as the pretrained BERT tutorial without Shared Weights feature. 
We use batch 1 for simplicity.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import tensorflow  # to workaround a protobuf version conflict issue\\n\",\n    \"import torch\\n\",\n    \"import torch.neuron\\n\",\n    \"from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\\n\",\n    \"import transformers\\n\",\n    \"import os\\n\",\n    \"import warnings\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Build tokenizer and model\\n\",\n    \"tokenizer = AutoTokenizer.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\")\\n\",\n    \"model = AutoModelForSequenceClassification.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\", return_dict=False)\\n\",\n    \"\\n\",\n    \"# Setup some example inputs\\n\",\n    \"sequence_0 = \\\"The company HuggingFace is based in New York City\\\"\\n\",\n    \"sequence_1 = \\\"Apples are especially bad for your health\\\"\\n\",\n    \"sequence_2 = \\\"HuggingFace's headquarters are situated in Manhattan\\\"\\n\",\n    \"\\n\",\n    \"max_length=128\\n\",\n    \"paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"\\n\",\n    \"# Run the original PyTorch model on compilation exaple\\n\",\n    \"paraphrase_classification_logits = model(**paraphrase)[0]\\n\",\n    \"\\n\",\n    \"# Convert example inputs to a format that is compatible with TorchScript tracing\\n\",\n    \"example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\\n\",\n    \"example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']\\n\",\n    \"\\n\",\n    \"# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\\n\",\n    \"model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)\\n\",\n    \"\\n\",\n    \"# Verify the TorchScript works on both example inputs\\n\",\n    \"paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\\n\",\n    \"not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\\n\",\n    \"\\n\",\n    \"# Save the TorchScript for later use\\n\",\n    \"model_neuron.save('bert_neuron.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"\\n\",\n    \"### Deploy the AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"To deploy the AWS Neuron optimized TorchScript, you may choose to load the saved TorchScript from disk and skip the slow compilation. 
This step is the same as the pretrained BERT tutorial without Shared Weights feature\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Load TorchScript back\\n\",\n    \"model_neuron = torch.jit.load('bert_neuron.pt')\\n\",\n    \"# Verify the TorchScript works on both example inputs\\n\",\n    \"paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\\n\",\n    \"not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\\n\",\n    \"classes = ['not paraphrase', 'paraphrase']\\n\",\n    \"paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()\\n\",\n    \"not_paraphrase_prediction = not_paraphrase_classification_logits_neuron[0][0].argmax().item()\\n\",\n    \"print('BERT says that \\\"{}\\\" and \\\"{}\\\" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))\\n\",\n    \"print('BERT says that \\\"{}\\\" and \\\"{}\\\" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We define two helper functions to pad input and to count correct results.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def get_input_with_padding(batch, batch_size, max_length):\\n\",\n    \"    ## Reformulate the batch into three batch tensors - default batch size batches the outer dimension\\n\",\n    \"    encoded = batch['encoded']\\n\",\n    \"    inputs = torch.squeeze(encoded['input_ids'], 1)\\n\",\n    \"    attention = torch.squeeze(encoded['attention_mask'], 1)\\n\",\n    \"    token_type = torch.squeeze(encoded['token_type_ids'], 1)\\n\",\n    \"    quality = list(map(int, batch['quality']))\\n\",\n    \"\\n\",\n    \"    if inputs.size()[0] != batch_size:\\n\",\n    \"        print(\\\"Input size = {} - padding\\\".format(inputs.size()))\\n\",\n    \"        remainder = batch_size - inputs.size()[0]\\n\",\n    \"        zeros = torch.zeros( [remainder, max_length], dtype=torch.long )\\n\",\n    \"        inputs = torch.cat( [inputs, zeros] )\\n\",\n    \"        attention = torch.cat( [attention, zeros] )\\n\",\n    \"        token_type = torch.cat( [token_type, zeros] )\\n\",\n    \"\\n\",\n    \"    assert(inputs.size()[0] == batch_size and inputs.size()[1] == max_length)\\n\",\n    \"    assert(attention.size()[0] == batch_size and attention.size()[1] == max_length)\\n\",\n    \"    assert(token_type.size()[0] == batch_size and token_type.size()[1] == max_length)\\n\",\n    \"\\n\",\n    \"    return (inputs, attention, token_type), quality\\n\",\n    \"\\n\",\n    \"def count(output, quality):\\n\",\n    \"    assert output.size(0) >= len(quality)\\n\",\n    \"    correct_count = 0\\n\",\n    \"    count = len(quality)\\n\",\n    \"    \\n\",\n    \"    batch_predictions = [ row.argmax().item() for row in output ]\\n\",\n    \"\\n\",\n    \"    for a, b in zip(batch_predictions, quality):\\n\",\n    \"        if int(a)==int(b):\\n\",\n    \"            correct_count += 1\\n\",\n    \"\\n\",\n    \"    return correct_count, count\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Data parallel inference\\n\",\n    \"In the below cell, we use the data parallel approach for inference. 
In this approach, we load multiple models, all of them running in parallel. Each model is loaded onto a single NeuronCore via the core placement API (``torch_neuron.experimental.neuron_cores_context()``). We also set Neuron Runtime environment variable ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS`` to \\\"TRUE\\\" as required to use the Weight Sharing feature.\\n\",\n    \"\\n\",\n    \"In the below implementation, we launch 16 models, thereby utilizing all the 16 cores on an inf1.6xlarge.\\n\",\n    \"\\n\",\n    \"> Note: Now if you try to decrease the num_cores in the below cells, please restart the notebook and run `!sudo rmmod neuron; sudo modprobe neuron` step in cell 2 to clear the Neuron cores.\\n\",\n    \"\\n\",\n    \"Since, we can run more than 1 model concurrently, the throughput for the system goes up. To achieve maximum gain in throughput, we need to efficiently feed the models so as to keep them busy at all times. In the below setup, we use parallel threads to feed data continuously to the models.\\n\",\n    \"\\n\",\n    \"When running the cell below, you can monitor the Inferentia device activities by running ``neuron-top`` in another terminal. You will see that \\\"Device Used Memory\\\" is 1.6GB total, and the model instance loaded onto NeuronDevice 0 NeuronCore 0 uses the most device memory (272MB) while the other model instances loaded onto other NeuronCores use less device memory (92MB). This shows the effect of using Shared Weights as the device memory usage is lower. If you change ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS`` to \\\"FALSE\\\" you will see that \\\"Device Used Memory\\\" is 3.2GB, and the model instances loaded onto  NeuronDevice 0 NeuronCore 0 and 1 use the most device memory (360MB) while the other model instances now use 180MB each.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from bert_benchmark_utils import BertTestDataset, BertResults\\n\",\n    \"import time\\n\",\n    \"import functools\\n\",\n    \"import os\\n\",\n    \"import torch.neuron as torch_neuron\\n\",\n    \"from concurrent import futures\\n\",\n    \"\\n\",\n    \"# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\\n\",\n    \"num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\\n\",\n    \"os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)\\n\",\n    \"os.environ['NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS'] = 'TRUE'\\n\",\n    \"#os.environ['NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS'] = 'FALSE'\\n\",\n    \"\\n\",\n    \"max_length = 128\\n\",\n    \"num_cores = 16\\n\",\n    \"batch_size = 1\\n\",\n    \"\\n\",\n    \"tsv_file=\\\"glue_mrpc_dev.tsv\\\"\\n\",\n    \"\\n\",\n    \"data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\\n\",\n    \"data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\\n\",\n    \"\\n\",\n    \"#Result aggregation class (code in bert_benchmark_utils.py)\\n\",\n    \"results = BertResults(batch_size, num_cores)\\n\",\n    \"def result_handler(output, result_id, start, end, input_dict):\\n\",\n    \"    correct_count, inference_count = count(output[0], input_dict.pop(result_id))\\n\",\n    \"    elapsed = end - start\\n\",\n    \"    results.add_result(correct_count, inference_count, [elapsed], [end], [start])\\n\",\n    \"\\n\",\n    \"with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=num_cores):\\n\",\n    \"    model = 
torch.jit.load('bert_neuron.pt')\\n\",\n    \"\\n\",\n    \"# Warm up the cores\\n\",\n    \"z = torch.zeros( [batch_size, max_length], dtype=torch.long )\\n\",\n    \"batch = (z, z, z)\\n\",\n    \"for _ in range(num_cores*4):\\n\",\n    \"    model(*batch)\\n\",\n    \"\\n\",\n    \"# Prepare the input data\\n\",\n    \"batch_list = []\\n\",\n    \"for batch in data_loader:\\n\",\n    \"    batch, quality = get_input_with_padding(batch, batch_size, max_length)\\n\",\n    \"    batch_list.append((batch, quality))\\n\",\n    \"\\n\",\n    \"# One thread running a model on one core\\n\",\n    \"def one_thread(feed_data, quality):\\n\",\n    \"    start = time.time()\\n\",\n    \"    result = model(*feed_data)\\n\",\n    \"    end = time.time()   \\n\",\n    \"    return result[0], quality, start, end\\n\",\n    \"\\n\",\n    \"# Launch more threads than models/cores to keep them busy\\n\",\n    \"processes = []\\n\",\n    \"with futures.ThreadPoolExecutor(max_workers=num_cores*2) as executor:\\n\",\n    \"    # extra loops to help you see activities in neuron-top\\n\",\n    \"    for _ in range(10):\\n\",\n    \"        for input_id, (batch, quality) in enumerate(batch_list):\\n\",\n    \"            processes.append(executor.submit(one_thread, batch, quality))\\n\",\n    \"\\n\",\n    \"results = BertResults(batch_size, num_cores)\\n\",\n    \"for _ in futures.as_completed(processes):   \\n\",\n    \"    (output, quality, start, end) = _.result()     \\n\",\n    \"    correct_count, inference_count = count(output, quality)\\n\",\n    \"    results.add_result(correct_count, inference_count, [end - start], [start], [end])\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"w\\\") as f:\\n\",\n    \"    results.report(f, window_size=1)\\n\",\n    \"\\n\",\n    \"with open(\\\"benchmark.txt\\\", \\\"r\\\") as f:\\n\",\n    \"    for line in f:\\n\",\n    \"        print(line)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python (torch-neuron)\",\n   \"language\": \"python\",\n   \"name\": \"aws_neuron_venv_pytorch_inf1\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/pytorch/byoc_sm_bert_tutorial/code/inference.py",
    "content": "import os\nimport json\nimport tensorflow  # to workaround a protobuf version conflict issue\nimport torch\nimport torch.neuron\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\n\nJSON_CONTENT_TYPE = 'application/json'\n\n\ndef model_fn(model_dir):\n    tokenizer_init = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\n    model_file =os.path.join(model_dir, 'neuron_compiled_model.pt')\n    model_neuron = torch.jit.load(model_file)\n#    print(\"using {}\".format(model_file))\n\n    return (model_neuron, tokenizer_init)\n\n\ndef input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE):\n    if content_type == JSON_CONTENT_TYPE:\n        input_data = json.loads(serialized_input_data)\n#        print(input_data)\n        return input_data\n\n    else:\n        raise Exception('Requested unsupported ContentType in Accept: ' + content_type)\n        return\n\n\ndef predict_fn(input_data, models):\n#    print('Got input Data: {}'.format(input_data))\n\n    model_bert, tokenizer = models\n    sequence_0 = input_data[0] \n    sequence_1 = input_data[1]\n    \n    max_length=128\n    paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n    # Convert example inputs to a format that is compatible with TorchScript tracing\n    example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']  \n\n    # Verify the TorchScript works on example inputs\n    paraphrase_classification_logits_neuron = model_bert(*example_inputs_paraphrase)\n    classes = ['not paraphrase', 'paraphrase']\n    paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()\n    out_str = 'BERT says that \"{}\" and \"{}\" are {}'.format(sequence_0, sequence_1, classes[paraphrase_prediction])\n    \n    return out_str\n\ndef output_fn(prediction_output, accept=JSON_CONTENT_TYPE):\n    if accept == JSON_CONTENT_TYPE:\n        return json.dumps(prediction_output), accept\n\n    raise Exception('Requested unsupported ContentType in Accept: ' + accept)\n\n"
  },
  {
    "path": "src/examples/pytorch/byoc_sm_bert_tutorial/container/Dockerfile",
    "content": "FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:1.7.1-neuron-py36-ubuntu18.04\n\n# Install packages \nRUN pip install \"transformers==4.7.0\"\n# CMD [\"/usr/local/bin/entrypoint.sh\"]\n\n"
  },
  {
    "path": "src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4674f667\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Deploy a pretrained PyTorch BERT model from HuggingFace on Amazon SageMaker with Neuron container\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b3e39838\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Overview\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a92c454f\",\n   \"metadata\": {},\n   \"source\": [\n    \"In this tutotial we will deploy on SageMaker a pretraine BERT Base model from HuggingFace Transformers, using the [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers). We will use the same model as shown in the [Neuron Tutorial \\\"PyTorch - HuggingFace Pretrained BERT Tutorial\\\"](../../../../frameworks/torch/torch-neuronx/tutorials/training/bert.html#). We will compile the model and build a custom AWS Deep Learning Container, to include the HuggingFace Transformers Library. \\n\",\n    \"\\n\",\n    \"This Jupyter Notebook should run on a ml.c5.4xlarge SageMaker Notebook instance. You can set up your SageMaker Notebook instance by following the [Get Started with Amazon SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html) documentation. \\n\",\n    \"\\n\",\n    \"> We recommend increasing the size of the base root volume of you SM notebook instance, to accomodate the models and containers built locally. A root volume of 10Gb should suffice. \\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"37445ad2\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"3ecd765f\",\n   \"metadata\": {},\n   \"source\": [\n    \"This tutorial requires the following pip packages:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"cae3092c\",\n   \"metadata\": {},\n   \"source\": [\n    \"- torch-neuron\\n\",\n    \"- neuron-cc[tensorflow]\\n\",\n    \"- transformers\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"066c3731\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch --extra-index-url=https://pip.repos.neuron.amazonaws.com\\n\",\n    \"!pip install --upgrade --no-cache-dir 'transformers==4.6.0'\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a4796d3a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile the model into an AWS Neuron optimized TorchScript\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"6fe85f8e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuron\\n\",\n    \"\\n\",\n    \"from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0c5c253a\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Build tokenizer and model\\n\",\n    \"tokenizer = AutoTokenizer.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\")\\n\",\n    \"model = AutoModelForSequenceClassification.from_pretrained(\\\"bert-base-cased-finetuned-mrpc\\\", return_dict=False)\\n\",\n    \"\\n\",\n    \"# Setup 
some example inputs\\n\",\n    \"sequence_0 = \\\"The company HuggingFace is based in New York City\\\"\\n\",\n    \"sequence_1 = \\\"Apples are especially bad for your health\\\"\\n\",\n    \"sequence_2 = \\\"HuggingFace's headquarters are situated in Manhattan\\\"\\n\",\n    \"\\n\",\n    \"max_length=128\\n\",\n    \"paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\\\"pt\\\")\\n\",\n    \"\\n\",\n    \"# Run the original PyTorch model on compilation exaple\\n\",\n    \"paraphrase_classification_logits = model(**paraphrase)[0]\\n\",\n    \"\\n\",\n    \"# Convert example inputs to a format that is compatible with TorchScript tracing\\n\",\n    \"example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\\n\",\n    \"example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"44255ada\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\\n\",\n    \"# This step may need 3-5 min\\n\",\n    \"model_neuron = torch.neuron.trace(model, example_inputs_paraphrase, verbose=1, compiler_workdir='./compilation_artifacts')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5c4752ac\",\n   \"metadata\": {},\n   \"source\": [\n    \"You may inspect **model_neuron.graph** to see which part is running on CPU versus running on the accelerator. All native **aten** operators in the graph will be running on CPU.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"dc00889e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# See  which part is running on CPU versus running on the accelerator.\\n\",\n    \"print(model_neuron.graph)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"775fb30d\",\n   \"metadata\": {},\n   \"source\": [\n    \"Save the compiled model, so it can be packaged and sent to S3.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"027c4f53\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Save the TorchScript for later use\\n\",\n    \"model_neuron.save('neuron_compiled_model.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d362c579\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Package the pre-trained model and upload it to S3\\n\",\n    \"\\n\",\n    \"To make the model available for the SageMaker deployment, you will TAR the serialized graph and upload it to the default Amazon S3 bucket for your SageMaker session. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"29c7f7b4\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Now you'll create a model.tar.gz file to be used by SageMaker endpoint\\n\",\n    \"!tar -czvf model.tar.gz neuron_compiled_model.pt\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1beadca0\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import boto3\\n\",\n    \"import time\\n\",\n    \"from sagemaker.utils import name_from_base\\n\",\n    \"import sagemaker\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"06ad87d4\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# upload model to S3\\n\",\n    \"role = sagemaker.get_execution_role()\\n\",\n    \"sess=sagemaker.Session()\\n\",\n    \"region=sess.boto_region_name\\n\",\n    \"bucket=sess.default_bucket()\\n\",\n    \"sm_client=boto3.client('sagemaker')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5205ec55\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"model_key = '{}/model/model.tar.gz'.format('inf1_compiled_model')\\n\",\n    \"model_path = 's3://{}/{}'.format(bucket, model_key)\\n\",\n    \"boto3.resource('s3').Bucket(bucket).upload_file('model.tar.gz', model_key)\\n\",\n    \"print(\\\"Uploaded model to S3:\\\")\\n\",\n    \"print(model_path)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e8b425d4\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Build and Push the container\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"430e6ed2\",\n   \"metadata\": {},\n   \"source\": [\n    \"The following shell code shows how to build the container image using docker build and push the container image to ECR using docker push.\\n\",\n    \"The Dockerfile in this example is available in the ***container*** folder.\\n\",\n    \"Here's an example of the Dockerfile:\\n\",\n    \"\\n\",\n    \"```Dockerfile\\n\",\n    \"FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:1.7.1-neuron-py36-ubuntu18.04\\n\",\n    \"\\n\",\n    \"# Install packages \\n\",\n    \"RUN pip install \\\"transformers==4.7.0\\\"\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"3970025d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!cat container/Dockerfile\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"62f78b0f\",\n   \"metadata\": {},\n   \"source\": [\n    \"Before running the next cell, make sure your SageMaker IAM role has access to ECR. If not, you can attache the role `AmazonEC2ContainerRegistryPowerUser` to your IAM role ARN, which allows you to upload image layers to ECR.  
\\n\",\n    \"\\n\",\n    \"It takes 5 minutes to build docker images and upload image to ECR\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ecd51acf\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%sh\\n\",\n    \"\\n\",\n    \"# The name of our algorithm\\n\",\n    \"algorithm_name=neuron-py36-inference\\n\",\n    \"\\n\",\n    \"cd container\\n\",\n    \"\\n\",\n    \"account=$(aws sts get-caller-identity --query Account --output text)\\n\",\n    \"\\n\",\n    \"# Get the region defined in the current configuration (default to us-west-2 if none defined)\\n\",\n    \"region=$(aws configure get region)\\n\",\n    \"region=${region:-us-west-2}\\n\",\n    \"\\n\",\n    \"fullname=\\\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\\\"\\n\",\n    \"\\n\",\n    \"# If the repository doesn't exist in ECR, create it.\\n\",\n    \"\\n\",\n    \"aws ecr describe-repositories --repository-names \\\"${algorithm_name}\\\" > /dev/null 2>&1\\n\",\n    \"\\n\",\n    \"if [ $? -ne 0 ]\\n\",\n    \"then\\n\",\n    \"    aws ecr create-repository --repository-name \\\"${algorithm_name}\\\" > /dev/null\\n\",\n    \"fi\\n\",\n    \"\\n\",\n    \"# Get the login command from ECR in order to pull down the SageMaker PyTorch image\\n\",\n    \"aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com\\n\",\n    \"# Build the docker image locally with the image name and then push it to ECR\\n\",\n    \"# with the full name.\\n\",\n    \"docker build  -t ${algorithm_name} . --build-arg REGION=${region}\\n\",\n    \"docker tag ${algorithm_name} ${fullname}\\n\",\n    \"\\n\",\n    \"# Get the login command from ECR and execute it directly\\n\",\n    \"aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com\\n\",\n    \"docker push ${fullname}\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e4f6bbda\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Deploy Container and run inference based on the pretrained model\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"64e65e31\",\n   \"metadata\": {},\n   \"source\": [\n    \"To deploy a pretrained PyTorch model, you'll need to use the PyTorch estimator object to create a PyTorchModel object and set a different entry_point.\\n\",\n    \"\\n\",\n    \"You'll use the PyTorchModel object to deploy a PyTorchPredictor. 
This creates a SageMaker Endpoint -- a hosted prediction service that we can use to perform inference.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"f343d3b1\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import sys\\n\",\n    \"\\n\",\n    \"!{sys.executable} -m pip install Transformers\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"2bd73b77\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import boto3\\n\",\n    \"import sagemaker\\n\",\n    \"\\n\",\n    \"role = sagemaker.get_execution_role()\\n\",\n    \"sess = sagemaker.Session()\\n\",\n    \"\\n\",\n    \"bucket = sess.default_bucket()\\n\",\n    \"prefix = \\\"inf1_compiled_model/model\\\"\\n\",\n    \"\\n\",\n    \"# Get container name in ECR\\n\",\n    \"client=boto3.client('sts')\\n\",\n    \"account=client.get_caller_identity()['Account']\\n\",\n    \"\\n\",\n    \"my_session=boto3.session.Session()\\n\",\n    \"region=my_session.region_name\\n\",\n    \"\\n\",\n    \"algorithm_name=\\\"neuron-py36-inference\\\"\\n\",\n    \"ecr_image='{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)\\n\",\n    \"print(ecr_image)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"9298f2a7\",\n   \"metadata\": {},\n   \"source\": [\n    \"An implementation of *model_fn* is required for inference script.\\n\",\n    \"We are going to implement our own **model_fn** and **predict_fn** for Hugging Face Bert, and use default implementations of **input_fn** and **output_fn** defined in sagemaker-pytorch-containers.\\n\",\n    \"\\n\",\n    \"In this example, the inference script is put in ***code*** folder. Run the next cell to see it:\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"cfea75b6\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pygmentize code/inference.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"1b31a7b8\",\n   \"metadata\": {},\n   \"source\": [\n    \"Path of compiled pretrained model in S3:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"61f3556e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"key = os.path.join(prefix, \\\"model.tar.gz\\\")\\n\",\n    \"pretrained_model_data = \\\"s3://{}/{}\\\".format(bucket, key)\\n\",\n    \"print(pretrained_model_data)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e7557a5f\",\n   \"metadata\": {},\n   \"source\": [\n    \"The model object is defined by using the SageMaker Python SDK's PyTorchModel and pass in the model from the estimator and the entry_point. The endpoint's entry point for inference is defined by model_fn as seen in the previous code block that prints out **inference.py**. 
The model_fn function will load the model and required tokenizer.\\n\",\n    \"\\n\",\n    \"Note, **image_uri** must be user's own ECR images.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0bd99768\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from sagemaker.pytorch.model import PyTorchModel\\n\",\n    \"\\n\",\n    \"pytorch_model = PyTorchModel(\\n\",\n    \"    model_data=pretrained_model_data,\\n\",\n    \"    role=role,\\n\",\n    \"    source_dir=\\\"code\\\",\\n\",\n    \"    framework_version=\\\"1.7.1\\\",\\n\",\n    \"    entry_point=\\\"inference.py\\\",\\n\",\n    \"    image_uri=ecr_image\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"# Let SageMaker know that we've already compiled the model via neuron-cc\\n\",\n    \"pytorch_model._is_compiled_model = True\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"67439fe7\",\n   \"metadata\": {},\n   \"source\": [\n    \"The arguments to the deploy function allow us to set the number and type of instances that will be used for the Endpoint.\\n\",\n    \"\\n\",\n    \"Here you will deploy the model to a single **ml.inf1.2xlarge** instance.\\n\",\n    \"It may take 6-10 min to deploy.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d771fc7c\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"\\n\",\n    \"predictor = pytorch_model.deploy(initial_instance_count=1, instance_type=\\\"ml.inf1.2xlarge\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ab6342f3\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print(predictor.endpoint_name)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"059537d9\",\n   \"metadata\": {},\n   \"source\": [\n    \"Since in the input_fn we declared that the incoming requests are json-encoded, we need to use a json serializer, to encode the incoming data into a json string. 
Also, since we declared the return content type to be a json string, we need to use a json deserializer to parse the response.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"29e82f90\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"predictor.serializer = sagemaker.serializers.JSONSerializer()\\n\",\n    \"predictor.deserializer = sagemaker.deserializers.JSONDeserializer()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d006ea03\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now the SageMaker endpoint is invoked with a list of sentences to get predictions.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"325a87f8\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"result = predictor.predict(\\n\",\n    \"    [\\n\",\n    \"        \\\"Never allow the same bug to bite you twice.\\\",\\n\",\n    \"        \\\"The best part of Amazon SageMaker is that it makes machine learning easy.\\\",\\n\",\n    \"    ]\\n\",\n    \")\\n\",\n    \"print(result)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"4a12410d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"result = predictor.predict(\\n\",\n    \"    [\\n\",\n    \"        \\\"The company HuggingFace is based in New York City\\\",\\n\",\n    \"        \\\"HuggingFace's headquarters are situated in Manhattan\\\",\\n\",\n    \"    ]\\n\",\n    \")\\n\",\n    \"print(result)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"a72dfd16\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Benchmarking your endpoint\\n\",\n    \"\\n\",\n    \"The following cells create a load test for your endpoint. You first define some helper functions: `inference_latency` runs the endpoint request and collects client-side latency and any errors, and `random_sentence` builds random sentences to be sent to the endpoint.  
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"088d0e75\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np \\n\",\n    \"import datetime\\n\",\n    \"import math\\n\",\n    \"import time\\n\",\n    \"import boto3   \\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"from joblib import Parallel, delayed\\n\",\n    \"import numpy as np\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"import random\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"038d9953\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def inference_latency(model,*inputs):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    infetence_time is a simple method to return the latency of a model inference.\\n\",\n    \"\\n\",\n    \"        Parameters:\\n\",\n    \"            model: torch model onbject loaded using torch.jit.load\\n\",\n    \"            inputs: model() args\\n\",\n    \"\\n\",\n    \"        Returns:\\n\",\n    \"            latency in seconds\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    error = False\\n\",\n    \"    start = time.time()\\n\",\n    \"    try:\\n\",\n    \"        results = model(*inputs)\\n\",\n    \"    except:\\n\",\n    \"        error = True\\n\",\n    \"        results = []\\n\",\n    \"    return {'latency':time.time() - start, 'error': error, 'result': results}\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"d6b200ac\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def random_sentence():\\n\",\n    \"    \\n\",\n    \"    s_nouns = [\\\"A dude\\\", \\\"My mom\\\", \\\"The king\\\", \\\"Some guy\\\", \\\"A cat with rabies\\\", \\\"A sloth\\\", \\\"Your homie\\\", \\\"This cool guy my gardener met yesterday\\\", \\\"Superman\\\"]\\n\",\n    \"    p_nouns = [\\\"These dudes\\\", \\\"Both of my moms\\\", \\\"All the kings of the world\\\", \\\"Some guys\\\", \\\"All of a cattery's cats\\\", \\\"The multitude of sloths living under your bed\\\", \\\"Your homies\\\", \\\"Like, these, like, all these people\\\", \\\"Supermen\\\"]\\n\",\n    \"    s_verbs = [\\\"eats\\\", \\\"kicks\\\", \\\"gives\\\", \\\"treats\\\", \\\"meets with\\\", \\\"creates\\\", \\\"hacks\\\", \\\"configures\\\", \\\"spies on\\\", \\\"retards\\\", \\\"meows on\\\", \\\"flees from\\\", \\\"tries to automate\\\", \\\"explodes\\\"]\\n\",\n    \"    p_verbs = [\\\"eat\\\", \\\"kick\\\", \\\"give\\\", \\\"treat\\\", \\\"meet with\\\", \\\"create\\\", \\\"hack\\\", \\\"configure\\\", \\\"spy on\\\", \\\"retard\\\", \\\"meow on\\\", \\\"flee from\\\", \\\"try to automate\\\", \\\"explode\\\"]\\n\",\n    \"    infinitives = [\\\"to make a pie.\\\", \\\"for no apparent reason.\\\", \\\"because the sky is green.\\\", \\\"for a disease.\\\", \\\"to be able to make toast explode.\\\", \\\"to know more about archeology.\\\"]\\n\",\n    \"    \\n\",\n    \"    return (random.choice(s_nouns) + ' ' + random.choice(s_verbs) + ' ' + random.choice(s_nouns).lower() or random.choice(p_nouns).lower() + ' ' + random.choice(infinitives))\\n\",\n    \"\\n\",\n    \"print([random_sentence(), random_sentence()])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e2945dde\",\n   \"metadata\": {},\n   \"source\": [\n    \"The following cell creates `number_of_clients` concurrent threads to run `number_of_runs` requests. 
Once completed, a `boto3` CloudWatch client will query for the server side latency metrics for comparison.   \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"69c047e3\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Defining Auxiliary variables\\n\",\n    \"number_of_clients = 2\\n\",\n    \"number_of_runs = 1000\\n\",\n    \"t = tqdm(range(number_of_runs),position=0, leave=True)\\n\",\n    \"\\n\",\n    \"# Starting parallel clients\\n\",\n    \"cw_start = datetime.datetime.utcnow()\\n\",\n    \"\\n\",\n    \"results = Parallel(n_jobs=number_of_clients,prefer=\\\"threads\\\")(delayed(inference_latency)(predictor.predict,[random_sentence(), random_sentence()]) for mod in t)\\n\",\n    \"avg_throughput = t.total/t.format_dict['elapsed']\\n\",\n    \"\\n\",\n    \"cw_end = datetime.datetime.utcnow() \\n\",\n    \"\\n\",\n    \"# Computing metrics and print\\n\",\n    \"latencies = [res['latency'] for res in results]\\n\",\n    \"errors = [res['error'] for res in results]\\n\",\n    \"error_p = sum(errors)/len(errors) *100\\n\",\n    \"p50 = np.quantile(latencies[-1000:],0.50) * 1000\\n\",\n    \"p90 = np.quantile(latencies[-1000:],0.95) * 1000\\n\",\n    \"p95 = np.quantile(latencies[-1000:],0.99) * 1000\\n\",\n    \"\\n\",\n    \"print(f'Avg Throughput: :{avg_throughput:.1f}\\\\n')\\n\",\n    \"print(f'50th Percentile Latency:{p50:.1f} ms')\\n\",\n    \"print(f'90th Percentile Latency:{p90:.1f} ms')\\n\",\n    \"print(f'95th Percentile Latency:{p95:.1f} ms\\\\n')\\n\",\n    \"print(f'Errors percentage: {error_p:.1f} %\\\\n')\\n\",\n    \"\\n\",\n    \"# Querying CloudWatch\\n\",\n    \"print('Getting Cloudwatch:')\\n\",\n    \"cloudwatch = boto3.client('cloudwatch')\\n\",\n    \"statistics=['SampleCount', 'Average', 'Minimum', 'Maximum']\\n\",\n    \"extended=['p50', 'p90', 'p95', 'p100']\\n\",\n    \"\\n\",\n    \"# Give 5 minute buffer to end\\n\",\n    \"cw_end += datetime.timedelta(minutes=5)\\n\",\n    \"\\n\",\n    \"# Period must be 1, 5, 10, 30, or multiple of 60\\n\",\n    \"# Calculate closest multiple of 60 to the total elapsed time\\n\",\n    \"factor = math.ceil((cw_end - cw_start).total_seconds() / 60)\\n\",\n    \"period = factor * 60\\n\",\n    \"print('Time elapsed: {} seconds'.format((cw_end - cw_start).total_seconds()))\\n\",\n    \"print('Using period of {} seconds\\\\n'.format(period))\\n\",\n    \"\\n\",\n    \"cloudwatch_ready = False\\n\",\n    \"# Keep polling CloudWatch metrics until datapoints are available\\n\",\n    \"while not cloudwatch_ready:\\n\",\n    \"  time.sleep(30)\\n\",\n    \"  print('Waiting 30 seconds ...')\\n\",\n    \"  # Must use default units of microseconds\\n\",\n    \"  model_latency_metrics = cloudwatch.get_metric_statistics(MetricName='ModelLatency',\\n\",\n    \"                                             Dimensions=[{'Name': 'EndpointName',\\n\",\n    \"                                                          'Value': predictor.endpoint_name},\\n\",\n    \"                                                         {'Name': 'VariantName',\\n\",\n    \"                                                          'Value': \\\"AllTraffic\\\"}],\\n\",\n    \"                                             Namespace=\\\"AWS/SageMaker\\\",\\n\",\n    \"                                             StartTime=cw_start,\\n\",\n    \"                                             EndTime=cw_end,\\n\",\n    \"                                             Period=period,\\n\",\n    \"          
                                   Statistics=statistics,\\n\",\n    \"                                             ExtendedStatistics=extended\\n\",\n    \"                                             )\\n\",\n    \"  # Should be 1000\\n\",\n    \"  if len(model_latency_metrics['Datapoints']) > 0:\\n\",\n    \"    print('{} latency datapoints ready'.format(model_latency_metrics['Datapoints'][0]['SampleCount']))\\n\",\n    \"    side_avg = model_latency_metrics['Datapoints'][0]['Average'] / number_of_runs\\n\",\n    \"    side_p50 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p50'] / number_of_runs\\n\",\n    \"    side_p90 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p90'] / number_of_runs\\n\",\n    \"    side_p95 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p95'] / number_of_runs\\n\",\n    \"    side_p100 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p100'] / number_of_runs\\n\",\n    \"    \\n\",\n    \"    print(f'50th Percentile Latency:{side_p50:.1f} ms')\\n\",\n    \"    print(f'90th Percentile Latency:{side_p90:.1f} ms')\\n\",\n    \"    print(f'95th Percentile Latency:{side_p95:.1f} ms\\\\n')\\n\",\n    \"\\n\",\n    \"    cloudwatch_ready = True\\n\",\n    \"\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"9035e681\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Cleanup\\n\",\n    \"Endpoints should be deleted when no longer in use, to avoid costs.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1284ef3f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"predictor.delete_endpoint(predictor.endpoint)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5af53873\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
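The client-side benchmarking cell in the notebook above prints values labeled as 90th and 95th percentile latencies while actually computing the 0.95 and 0.99 quantiles. A minimal consistent sketch (assuming `latencies` is the list of per-request latencies, in seconds, collected by the parallel clients; the placeholder values are illustrative only):

```python
import numpy as np

# Placeholder latencies in seconds; in the notebook these come from the parallel clients.
latencies = [0.012, 0.015, 0.011, 0.020, 0.013]

# Quantile levels match the labels that are printed.
p50 = np.quantile(latencies, 0.50) * 1000  # ms
p90 = np.quantile(latencies, 0.90) * 1000  # ms
p95 = np.quantile(latencies, 0.95) * 1000  # ms

print(f'50th Percentile Latency: {p50:.1f} ms')
print(f'90th Percentile Latency: {p90:.1f} ms')
print(f'95th Percentile Latency: {p95:.1f} ms')
```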
  {
    "path": "src/examples/pytorch/libtorch_demo/bert_neuronx/compile.py",
    "content": "import torch\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\nimport transformers\nimport os\nimport warnings\n\nfrom detect_instance import get_instance_type, get_num_neuroncores\n\ninstance_type = get_instance_type() \n\nprint(f\"Detected instance type: {instance_type}\")\n\nif 'inf1' in instance_type:\n    print(\" - using torch_neuron.trace\")\n    from torch_neuron import trace\nelse:\n    print(\" - using torch_neuronx.xla_impl.trace\")\n    from torch_neuronx.xla_impl.trace import trace\nprint()\n\nos.environ['TOKENIZERS_PARALLELISM']='false'\nbatch_size = 6\n\n# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\nnum_cores = get_num_neuroncores(instance_type)\nprint(f\"Number of cores = {num_cores}\")\nos.environ['NEURON_RT_NUM_CORES'] = str(num_cores)\n\n# Build tokenizer and model\ntokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n\n# Setup some example inputs\nsequence_0 = \"The company HuggingFace is based in New York City\"\nsequence_1 = \"Apples are especially bad for your health\"\nsequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n\nmax_length=128\nparaphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\nnot_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n\n# Convert example inputs to a format that is compatible with TorchScript tracing\nexample_inputs_paraphrase = (\n    torch.cat([paraphrase['input_ids']] * batch_size,0),\n    torch.cat([paraphrase['attention_mask']] * batch_size,0),\n    torch.cat([paraphrase['token_type_ids']] * batch_size,0)\n)\n\n# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\ntry:\n    model_neuron = trace(model, example_inputs_paraphrase)\nexcept Exception as e:\n    print(e)\n    print(\"libtorch_demo: Model tracing failed - check tutorial steps and preconditions\")\n    print(\"libtorch_demo: If this does not resolve your issue - Report a bug at \")\n    print(\"https://github.com/aws-neuron/aws-neuron-sdk/issues\")\n    exit(1)\n\n# Verify the TorchScript works on both example inputs\ntry:\n    paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\nexcept:\n    print(\"libtorch_demo: Neuron runtime failed - check tutorial steps and preconditions\")\n    print(\"libtorch_demo: If this does not resolve your issue - Report a bug at \")\n    print(\"https://github.com/aws-neuron/aws-neuron-sdk/issues\")\n    exit(1)\n\n# Save the TorchScript for later use\nmodel_neuron.save(f'bert_neuron_b{batch_size}.pt')\n"
  },
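A minimal sketch of loading and running the TorchScript that `compile.py` saves, assuming an Inf2/Trn1 instance with `torch_neuronx` installed and `bert_neuron_b6.pt` in the working directory (on Inf1, import `torch_neuron` instead):

```python
import torch
import torch_neuronx  # registers the Neuron ops required to load the traced model
from transformers import AutoTokenizer

batch_size = 6
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
encoded = tokenizer.encode_plus(
    "The company HuggingFace is based in New York City",
    "HuggingFace's headquarters are situated in Manhattan",
    max_length=128, padding="max_length", truncation=True, return_tensors="pt",
)
# Repeat the single example to fill the batch size the model was traced with.
batch = tuple(torch.cat([encoded[k]] * batch_size, 0)
              for k in ("input_ids", "attention_mask", "token_type_ids"))

model = torch.jit.load("bert_neuron_b6.pt")
logits = model(*batch)[0]
# For the MRPC head, index 1 is the "paraphrase" class.
print("paraphrase" if logits[0].argmax().item() == 1 else "not paraphrase")
```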
  {
    "path": "src/examples/pytorch/libtorch_demo/bert_neuronx/detect_instance.py",
    "content": "import torch\nimport torch_neuronx\nfrom typing import Optional\n\nINSTANCETYPE_TO_NEURONCORES = {\n    \"inf1.xlarge\": 4,\n    \"inf1.2xlarge\": 4,\n    \"inf1.6xlarge\": 16,\n    \"inf2.xlarge\": 2,\n    \"inf2.8xlarge\": 2,\n    \"inf2.24xlarge\": 12,\n    \"inf2.48xlarge\": 24,\n    \"inf1.24xlarge\": 64,\n    \"trn1.2xlarge\": 2,\n    \"trn1.32xlarge\": 32,\n}\n\ndef get_instance_type() -> str:\n    \"\"\"Try to obtain the instance type.\"\"\"\n    try:\n        from urllib.request import Request, urlopen\n\n        req = Request(\"http://169.254.169.254/latest/api/token\", method=\"PUT\")\n        req.add_header(\"X-aws-ec2-metadata-token-ttl-seconds\", \"21600\")\n        with urlopen(req) as response:\n            token = response.read().decode(\"utf-8\")\n\n        req = Request(\"http://169.254.169.254/latest/meta-data/instance-type\")\n        req.add_header(\"X-aws-ec2-metadata-token\", token)\n        with urlopen(req) as response:\n            instance_type = response.read().decode(\"utf-8\")\n\n        return instance_type\n    except:  # noqa: E722, there are various ways above code can fail and we don't care\n        return None\n\n\ndef get_num_neuroncores(instance_type: Optional[str] = None) -> int:\n    \"\"\"\n    Try to obtain the maximum number of NeuronCores available on this instance.\n\n    Args:\n        instance_type: The Neuron instance type. Autodetermined from current instance\n            if not provided.\n\n    Returns:\n        The number of NeuronCores (or 2 if the type is unknown).\n    \"\"\"\n\n    try:\n        if not instance_type:\n            instance_type = get_instance_type()\n        return INSTANCETYPE_TO_NEURONCORES[instance_type]\n    except KeyError:\n        num_cores = get_num_neuroncores_v3()\n        return num_cores\n\n\ndef get_num_neuroncores_v3() -> int:\n    \"\"\"\n    Retrieve the number of NeuronCores visible to this process.\n\n    Returns:\n        The number of visible neuron cores.\n\n    Raises:\n        RuntimeError: If the Neuron runtime cannot be initialized. This most\n            commonly occurs when executing on an instance with no Neuron\n            devices available or when no Neuron devices are visible to the\n            process.\n    \"\"\"\n    runtime = torch.classes.neuron.Runtime()\n    try:\n        nc_count = runtime.get_visible_nc_count()\n    except RuntimeError as e:\n        raise RuntimeError(\n            \"Neuron runtime cannot be initialized; cannot determine the number of available NeuronCores\"  # noqa: E501\n        ) from e\n    return nc_count\n"
  },
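A short usage sketch for the helpers defined in `detect_instance.py`; this assumes it runs on an EC2 Neuron instance (off EC2, `get_instance_type()` returns `None` and the core count falls back to the runtime query):

```python
from detect_instance import get_instance_type, get_num_neuroncores

instance_type = get_instance_type()             # e.g. "inf2.8xlarge", or None off EC2
num_cores = get_num_neuroncores(instance_type)  # table lookup, falling back to the Neuron runtime

print(f"instance type: {instance_type}, NeuronCores: {num_cores}")
```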
  {
    "path": "src/examples/pytorch/libtorch_demo/clean.sh",
    "content": "#!/bin/bash\n\necho \"Clean up constructed files\"\nrm -rf bert_neuron_b6.pt example-app tokenizers venv/ libtorch/ tokenizers_binding/lib/ tokenizers_binding/venv all_metrics.csv venv\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/README.txt",
    "content": "AWS NEURON TORCHLIB DEMO FOR C++\n================================\n\nFor the full tutorial, please refer to:\nhttps://awsdocs-neuron.readthedocs-hosted.com\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/build.sh",
    "content": "#!/bin/bash\n\n# Installation script to build with torch dependency from /usr/local\nset -x\n\n# Find paths for local packages\nPATH_TOKENIZERS_LIB=../tokenizers_binding/lib\nPATH_TORCH=../libtorch\nPATH_TORCH_INC=${PATH_TORCH}/include\nPATH_TORCH_LIB=${PATH_TORCH}/lib\nPATH_NEURON_LIB=${PATH_TORCH}/lib\n\nif [ ! -e \"${PATH_TORCH_LIB}/libnrt.so.1\" ] && [ -e \"/opt/aws/neuron/lib/libnrt.so.1\" ]\nthen\n    PATH_NEURON_LIB=/opt/aws/neuron/lib/\nfi\n\ng++ utils.cpp example_app.cpp \\\n    -o ../example-app \\\n    -O2 \\\n    -D_GLIBCXX_USE_CXX11_ABI=1 \\\n    -I${PATH_TORCH_INC} \\\n    -L${PATH_TOKENIZERS_LIB} \\\n    -L${PATH_NEURON_LIB} \\\n    -L${PATH_TORCH_LIB} \\\n    -Wl,-rpath,libtorch/lib \\\n    -Wl,-rpath,tokenizers_binding/lib \\\n    -Wl,-rpath,$PATH_NEURON_LIB \\\n    -Wl,-no-as-needed \\\n    -ltokenizers \\\n    -ltorchneuron \\\n    -ltorch_cpu \\\n    -lc10 \\\n    -lpthread \\\n    -lnrt \\\n    -std=c++17\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/core_count.hpp",
    "content": "#pragma once\n\n/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n \n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\ntypedef enum {\n    NRT_SUCCESS = 0,\n    NRT_FAILURE = 1,                        \n    NRT_INVALID = 2,                        \n    NRT_INVALID_HANDLE = 3,                 \n    NRT_RESOURCE = 4,                      \n    NRT_TIMEOUT = 5,                        \n    NRT_HW_ERROR = 6,                      \n    NRT_QUEUE_FULL = 7,                    \n    NRT_LOAD_NOT_ENOUGH_NC = 9,             \n    NRT_UNSUPPORTED_NEFF_VERSION = 10,    \n    NRT_FAIL_HOST_MEM_ALLOC = 11,           \n    NRT_EXEC_BAD_INPUT = 1002,              \n    NRT_EXEC_COMPLETED_WITH_NUM_ERR = 1003, \n    NRT_EXEC_COMPLETED_WITH_ERR = 1004,     \n    NRT_EXEC_NC_BUSY = 1005,                \n    NRT_COLL_PENDING = 1100,                \n} NRT_STATUS;\n\nNRT_STATUS nrt_get_total_nc_count(uint32_t *nc_count);\n\n#ifdef __cplusplus\n}\n#endif"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/example_app.cpp",
    "content": "#include <atomic>\n#include <chrono>\n#include <iostream>\n#include <thread>\n\n#include \"utils.hpp\"\n#include \"core_count.hpp\"\n#include \"../tokenizers_binding/remote_rust_tokenizer.h\"\n\ntypedef std::vector<std::vector<long>> Input;\n\nnamespace\n{\n    // some hardcoded parameters that could be read from a config file\n    const size_t seq_len = 128;\n    const size_t batch_size = 6;\n    uint32_t num_neuron_cores = 0;\n    const size_t cores_per_model = 1;\n    const size_t num_runs_per_neuron_core = 2000;\n\n    // these token ids are particular to a vocabulary, could be parsed from vocab file\n    const long start_token = 101;\n    const long end_token = 102;\n}\n\n// construct a single input: input_ids, attention_mask, and token_type_ids from two input sentences\nInput get_input(const std::string& sentence_1, const std::string& sentence_2)\n{\n    // ensure the concatenated sentences + separator tokens do not exceed the compiled sequence length\n    assert(sentence_1.size() + sentence_2.size() + 3 <= seq_len);\n\n    // tokenize the input sentence using the HuggingFace Tokenizers library\n    std::vector<long> input_ids(seq_len, 0);\n    input_ids[0] = start_token;\n    size_t pos = 1; // current write position in input_ids\n\n    // tokenize sentence_1 and copy to output buffer\n    std::vector<uint32_t> buffer(seq_len, 0);\n    remote_rust_encode(sentence_1.c_str(), buffer.data(), buffer.size());\n    for (size_t i = 0; i < seq_len && buffer[i]; i++, pos++) {\n        input_ids[pos] = buffer[i];\n    }\n\n    // mark end of sentence_1\n    input_ids[pos++] = end_token;\n    const size_t sentence_2_start = pos;\n\n    // tokenize sentence_2 and copy to output buffer\n    std::fill(buffer.begin(), buffer.end(), 0);\n    remote_rust_encode(sentence_2.c_str(), buffer.data(), buffer.size());\n    for (size_t i = 0; i < seq_len && buffer[i]; i++, pos++) {\n        input_ids[pos] = buffer[i];\n    }\n\n    // mark end of sentence_2\n    input_ids[pos++] = end_token;\n\n    // construct attention mask\n    std::vector<long> attention_mask(seq_len, 0);\n    for (size_t i = 0; i < seq_len; ++i) attention_mask[i] = input_ids[i] ? 1 : 0;\n\n    // token type ids are 0s for sentence_1 (incl. 
separators), 1s for sentence_2\n    std::vector<long> token_type_ids(seq_len, 0);\n    for (size_t i = sentence_2_start; i < seq_len; i++) {\n        if (!attention_mask[i]) break;\n        token_type_ids[i] = 1;\n    }\n\n    return {input_ids, attention_mask, token_type_ids};\n}\n\n// reshape a vector of inputs into a proper batch\nstd::vector<torch::jit::IValue> get_batch(const std::vector<Input>& inputs)\n{\n    // must be given a full batch\n    assert(inputs.size() == batch_size);\n\n    torch::Tensor input_ids_tensor = torch::zeros({batch_size, seq_len}, at::kLong);\n    torch::Tensor attention_mask_tensor = torch::zeros({batch_size, seq_len}, at::kLong);\n    torch::Tensor token_type_ids_tensor = torch::zeros({batch_size, seq_len}, at::kLong);\n\n    const auto opts = torch::TensorOptions().dtype(torch::kLong);\n    for (size_t i = 0; i < batch_size; i++) {\n        input_ids_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][0].data(), {seq_len}, opts);\n        attention_mask_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][1].data(), {seq_len}, opts);\n        token_type_ids_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][2].data(), {seq_len}, opts);\n    }\n\n    return {input_ids_tensor, attention_mask_tensor, token_type_ids_tensor};\n}\n\nint sanity_check(const std::string& model_filename)\n{\n    // load the model\n    auto model = get_model(model_filename);\n\n    // construct some example inputs\n    const std::string sentence_1 = \"The company HuggingFace is based in New York City\";\n    const std::string sentence_2 = \"Apples are especially bad for your health\";\n    const std::string sentence_3 = \"HuggingFace's headquarters are situated in Manhattan\";\n    const auto paraphrase = get_input(sentence_1, sentence_3);\n    const auto not_paraphrase = get_input(sentence_1, sentence_2);\n\n    // batch the inputs 50/50 positive/negative\n    std::vector<Input> inputs(batch_size);\n    for (size_t i = 0; i < batch_size; ++i) {\n        if (i < batch_size / 2) {\n            inputs[i] = paraphrase;\n        } else {\n            inputs[i] = not_paraphrase;\n        }\n    }\n    const auto batch = get_batch(inputs);\n\n    // forward pass\n    const auto output = model.forward(batch);\n\n    // interpret output\n    const auto output_tensor = output.toTuple()->elements()[0].toTensor();\n    const auto paraphrase_probabilities = torch::softmax(output_tensor[0], 0);\n    const auto not_paraphrase_probabilities = torch::softmax(output_tensor[batch_size-1], 0);\n    const auto paraphrase_0 = std::round(paraphrase_probabilities[0].item<double>() * 100);\n    const auto paraphrase_1 = std::round(paraphrase_probabilities[1].item<double>() * 100);\n    const auto not_paraphrase_0 = std::round(not_paraphrase_probabilities[0].item<double>() * 100);\n    const auto not_paraphrase_1 = std::round(not_paraphrase_probabilities[1].item<double>() * 100);\n\n    std::cout << sentence_1 << std::endl << sentence_3 << std::endl;\n    std::cout << \"not paraphrase: \" << paraphrase_0 << \"%\" << std::endl;\n    std::cout << \"paraphrase: \" << paraphrase_1 << \"%\" << std::endl;\n    if (paraphrase_0 >= paraphrase_1) return -1;\n\n    std::cout << std::endl;\n\n    std::cout << sentence_1 << std::endl << sentence_2 << std::endl;\n    std::cout << \"not paraphrase: \" << not_paraphrase_0 << \"%\" << std::endl;\n    std::cout << \"paraphrase: \" << not_paraphrase_1 << \"%\" << std::endl;\n    if (not_paraphrase_0 <= not_paraphrase_1) return -2;\n\n    return 
0;\n}\n\nvoid benchmark(const std::string& model_filename, const std::vector<torch::jit::IValue>& batch,\n               std::condition_variable& warmup_cv, std::atomic_size_t& warmup_count,\n               std::condition_variable& ready_cv)\n{\n    // load model and warmup\n    auto model = get_model(model_filename);\n    model.forward(batch);\n    std::cout << \".\" << std::flush;\n    --warmup_count;\n    warmup_cv.notify_one();\n\n    // wait for ready signal\n    std::mutex ready_mutex;\n    std::unique_lock<std::mutex> lk(ready_mutex);\n    ready_cv.wait(lk);\n\n    // benchmark\n    for (size_t i = 0; i < num_runs_per_neuron_core; i++) {\n        if (i == num_runs_per_neuron_core/2) std::cout << \".\" << std::flush;\n        model.forward(batch);\n    }\n}\n\nint main(int argc, char *argv[])\n{\n    if (argc < 2) {\n        std::cerr << \"Usage: ./example_app neuron_traced_model.pt [--sanity]\" << std::endl;\n        return -1;\n    }\n\n    if( nrt_get_total_nc_count( &num_neuron_cores ) != NRT_SUCCESS ) {\n        std::cerr << \"Could not determine number of cores - aborting!\" << std::endl;\n        return -1;\n    }\n\n    // let runtime know we want M models / core for N cores (e.g. \"1,1,1,1\")\n    setenv(\"NEURON_RT_VISIBLE_CORES\", get_visible_cores_str(num_neuron_cores, cores_per_model).c_str(), true);\n\n    if (argc >= 3 && std::string(\"--sanity\") == argv[2]) {\n        return sanity_check(argv[1]);\n    }\n\n    /*************************************************************************/\n    // prepare inputs, prepare models, and perform warmup inference\n\n    std::cout << \"Getting ready\" << std::flush;\n\n    const auto input = get_input(\"This sentence is for benchmarking.\", \"For benchmarking, use this sentence.\");\n    const auto batch = get_batch(std::vector<Input>(batch_size, input));\n\n    std::condition_variable warmup_cv, ready_cv;\n    std::atomic_size_t warmup_count(num_neuron_cores);\n    std::vector<std::thread> threads(num_neuron_cores);\n    for (size_t i = 0; i < threads.size(); i++) {\n        threads[i] = std::move(std::thread(benchmark, argv[1], batch, std::ref(warmup_cv),\n                                std::ref(warmup_count), std::ref(ready_cv)));\n    }\n\n    // wait for warmup to complete\n    auto is_warmup_complete = [](std::atomic_size_t& warmup_count) { return warmup_count.load() == 0; };\n    std::mutex warmup_mutex;\n    std::unique_lock<std::mutex> lk(warmup_mutex);\n    warmup_cv.wait(lk, std::bind(is_warmup_complete, std::ref(warmup_count)));\n    std::cout << std::endl;\n\n    /*************************************************************************/\n    // begin timed benchmarking\n\n    std::cout << \"Benchmarking\" << std::flush;\n\n    // signal workers to begin benchmarking and wait for completion\n    const auto start_time = std::chrono::high_resolution_clock::now();\n    ready_cv.notify_all();\n    for (auto& thread : threads) thread.join();\n    const auto end_time = std::chrono::high_resolution_clock::now();\n    std::cout << std::endl;\n\n    // report statistics\n    const float elapsed = (end_time - start_time) / std::chrono::seconds(1);\n    const size_t num_inferences = num_neuron_cores * num_runs_per_neuron_core;\n    const float throughput = (float)(num_inferences * batch_size) / elapsed;\n    std::cout << \"Completed \" << num_inferences << \" operations in \" << elapsed << \" seconds => \" << throughput << \" pairs / second\" << std::endl;\n\n    std::cout << std::endl;\n    std::cout << 
\"====================\" << std::endl;\n    std::cout << \"Summary information:\" << std::endl;\n    std::cout << \"====================\" << std::endl;\n    std::cout << \"Batch size = \" << batch_size << std::endl;\n    std::cout << \"Num neuron cores = \" << num_neuron_cores << std::endl;\n    std::cout << \"Num runs per neuron core = \" << num_runs_per_neuron_core << std::endl;\n\n    return 0;\n}\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/utils.cpp",
    "content": "#include \"utils.hpp\"\n#include \"../tokenizers_binding/remote_rust_tokenizer.h\"\n\n#include <random>\n#include <sstream>\n\n#include <torch/csrc/jit/passes/inliner.h>\n#include <ATen/ATen.h>\n\nstd::string get_visible_cores_str(size_t num_neuron_cores, size_t cores_per_model)\n{\n    std::ostringstream oss;\n    oss << \"0-\" << ((num_neuron_cores * cores_per_model) - 1);\n    return oss.str();\n}\n\nstd::string get_uuid()\n{\n    // xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx\n    // M = version = 4, (4 bits, 0100 = 0x4)\n    // N = variant = 1, (2 bits, 10XX = 0x{8, 9, A, B})\n\n    static const char *chars = \"0123456789abcdef\";\n    static std::random_device rd;\n    static std::mt19937 mt(rd());\n    static std::uniform_int_distribution<> dist(0, 15);\n\n    std::stringstream ss;\n    for (size_t i = 0; i < 37; i++) {\n        const int index = dist(mt);\n        ss << chars[index];\n    }\n\n    // variant bits are 10XX\n    std::stringstream variant_ss;\n    size_t variant;\n    variant_ss << std::hex << chars[dist(mt)];\n    variant_ss >> variant;\n    variant = 0x8 | (0x3 & variant);\n\n    ss.seekp(9); ss << \"-\";\n    ss.seekp(14); ss << \"-4\";\n    ss.seekp(19); ss << \"-\" << std::hex << variant;\n    ss.seekp(24); ss << \"-\";\n    return ss.str();\n}\n\ntorch::jit::script::Module get_model(const std::string& filename)\n{\n    torch::jit::script::Module model = torch::jit::load(filename);\n\n    // If you're using a model traced with torch-neuron >= 1.8, \n    // the section below is no longer necessary. It was a workaround \n    // for a runtime issue when loading identical copies of a model.\n\n    // This is redundant in the new flow, but left to provide future \n    // pointer on torchscript graph manipulation if needed\n\n    // this next section adds a unique uuid to the graph, so that the neuron runtime\n    // will load the graph multiple times instead of reusing a previously loaded copy\n\n    /*\n    auto fwd = model.get_method(\"forward\");\n    auto& fn = static_cast<torch::jit::GraphFunction&>(fwd.function());\n    auto graph = fn.graph();\n\n    torch::jit::Inline(*graph);\n    for (auto node : graph->nodes()) {\n        if (std::string(node->kind().toQualString()).rfind(\"neuron::forward\") == 0) {\n            auto uuid_input_tensor = node->inputs()[1];\n            if (std::string(uuid_input_tensor->node()->kind().toQualString()).rfind(\"prim::Constant\") == 0) {\n                // we clone the tensor to retain ownership of \"the blob\" after it goes out of scope\n                const std::string uuid = get_uuid();\n                torch::Tensor t = torch::from_blob((void*)uuid.c_str(), {36}, torch::kUInt8).clone();\n\n                // if we don't move the insertion point so that the copy of the constant appears after the operator,\n                // the inference will crash\n                graph->setInsertPoint(node);\n                torch::jit::Value *val = graph->insertConstant(t);\n                node->replaceInputWith(uuid_input_tensor, val);\n\n                // ensure a valid graph\n                graph->lint();\n            }\n        }\n    }\n    */\n\n    return model;\n}\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/example_app/utils.hpp",
    "content": "#ifndef __UTILS_HPP__\n#define __UTILS_HPP__\n\n#include <torch/script.h>\n\nstd::string get_visible_cores_str(size_t num_neuron_cores, size_t cores_per_model);\nstd::string get_uuid();\ntorch::jit::script::Module get_model(const std::string& filename);\n\n#endif // __UTILS_HPP__\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/neuron.patch",
    "content": "\nFrom 3f126613c47e4261d0e86520cb6e85c5713e2b15 Mon Sep 17 00:00:00 2001\nFrom: Stephen Dunn <stdun@amazon.com>\nDate: Tue, 26 Jan 2021 22:55:40 +0000\nSubject: [PATCH] Adds AWS Neuron native C++ interface\n\n---\ndiff --git a/tokenizers/Cargo.toml b/tokenizers/Cargo.toml\nindex c0f1aff..9767da7 100644\n--- a/tokenizers/Cargo.toml\n+++ b/tokenizers/Cargo.toml\n@@ -19,6 +19,7 @@ exclude = [ \"rust-toolchain\", \"target/*\", \"Cargo.lock\", \"benches/*.txt\", \"benche\n name = \"tokenizers\"\n path = \"src/lib.rs\"\n bench = false\n+crate-type = [\"rlib\", \"cdylib\"]\n \n [[bench]]\n name = \"bpe_benchmark\"\ndiff --git a/tokenizers/src/lib.rs b/tokenizers/src/lib.rs\nindex eb89b93..2392f28 100644\n--- a/tokenizers/src/lib.rs\n+++ b/tokenizers/src/lib.rs\n@@ -145,6 +145,8 @@ pub mod tokenizer;\n // Re-export from tokenizer\n pub use tokenizer::*;\n \n+mod neuron;\n+\n // Re-export also parallelism utils\n pub use utils::parallelism;\n \ndiff --git a/b_tokenizers/tokenizers/src/neuron.rs b/tokenizers/src/neuron.rs\nnew file mode 100644\nindex 0000000..af4a679\n--- /dev/null\n+++ b/tokenizers/src/neuron.rs\n@@ -0,0 +1,25 @@\n+use crate::tokenizer::Tokenizer;\n+use std::ffi::CStr;\n+use std::os::raw::c_char;\n+\n+// cached tokenizer\n+static mut TOKENIZER: Option<Tokenizer> = None;\n+\n+#[no_mangle]\n+pub unsafe extern \"C\" fn remote_rust_encode(input_arr: *const c_char, output_arr: *mut u32, output_arr_len: u32) {\n+    // load the pretrained tokenizer up if we haven't already\n+    let tokenizer = TOKENIZER.get_or_insert_with(|| Tokenizer::from_file(\"./tokenizer.json\").unwrap());\n+\n+    // convert input from C -> Rust\n+    let cstr = CStr::from_ptr(input_arr);\n+    let input = cstr.to_str().unwrap();\n+\n+    // tokenize raw text\n+    let encoding = tokenizer.encode(input, false).unwrap();\n+\n+    // hand the output back to C across shared memory\n+    let output = std::slice::from_raw_parts_mut(output_arr, output_arr_len as usize);\n+    for (i, token) in &mut encoding.get_ids().to_vec().iter().enumerate() {\n+        output[i] = *token;\n+    }\n+}\n\\ No newline at end of file\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/run_tests.sh",
    "content": "#!/bin/bash\n\nset -e\n\nif [ \"$#\" -ne 1 ]; then\n    echo \"usage: ./run_tests.sh model_filename.pt\"\n    exit 1\nfi\n\necho -e \"\\nRunning tokenization sanity checks.\\n\"\npushd tokenizers_binding 2>&1 >/dev/null\nchmod +x run_python.sh run.sh\n(./run_python.sh && ./run.sh) || { echo \"Sanity checks failed.\"; exit 2; }\npopd 2>&1 >/dev/null\necho -e \"\\nTokenization sanity checks passed.\"\n\necho -e \"Running end-to-end sanity check.\\n\"\n(./example-app $1 --sanity) || { echo \"Sanity check failed.\"; exit 3; }\necho -e \"\\nSanity check passed.\\n\""
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/setup.sh",
    "content": "#!/bin/bash\n\nset -eEx\n\n# Fail on error\nset -e\n\nTORCH_VERSION=$(python -c \"import torch; v=torch.__version__.split('+')[0]; print(f'{v}')\")\n\n#Parse cli\nwhile [ \"$1\" != \"\" ]; do\n  case $1 in\n    --torch-version ) shift\n        TORCH_VERSION=$1\n        ;;\n  esac\n  shift\ndone\n\necho \"Using PyTorch version ${TORCH_VERSION}\"\n\n# Python setup\nPYTHON=python3\nPYTHON_VERSION=$($PYTHON --version | cut -f2 -d' ' | cut -f1,2 -d'.')\necho \"Python version is '$PYTHON_VERSION'\"\n\nOLD_TOOL_CHAIN=$($PYTHON -c \\\n    \"from bert_neuronx.detect_instance import get_instance_type; print('inf1' in get_instance_type())\")\n\nif [ \"$OLD_TOOL_CHAIN\" == \"True\" ]; then\n    TORCH_VERSION=\"1.13\"\n    echo \"- Detected inf1 - using version ${TORCH_VERSION}\"\nelse\n    echo \"- Detected inf2 or trn1 - using version ${TORCH_VERSION}\"\nfi\n\n# checkout tokenizers and apply neuron patch\nif [ ! -e \"tokenizers\" ]; then\n    git clone https://github.com/huggingface/tokenizers.git\n    cp neuron.patch tokenizers/neuron.patch\n    pushd tokenizers\n    git checkout d8c4388166cad8f0216dfc485efd6207a3275af2\n    git apply neuron.patch\n    rm neuron.patch\n    popd\nfi\n\n# build tests\npushd tokenizers_binding\nchmod +x build.sh\n./build.sh\npopd\ncp -f tokenizers_binding/tokenizer.json .\n\n# setup torch\nif [ ! -e \"libtorch\" ]; then\n    # Use different download paths based on PyTorch version\n    MAJOR_VERSION=$(echo \"${TORCH_VERSION}\" | cut -d. -f1)\n    MINOR_VERSION=$(echo \"${TORCH_VERSION}\" | cut -d. -f2)\n    \n    if [ \"$MAJOR_VERSION\" -gt 2 ] || ([ \"$MAJOR_VERSION\" -eq 2 ] && [ \"$MINOR_VERSION\" -ge 8 ]); then\n        wget -q https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-${TORCH_VERSION}%2Bcpu.zip\n        unzip -q libtorch-shared-with-deps-${TORCH_VERSION}+cpu.zip\n        rm -f libtorch-shared-with-deps-${TORCH_VERSION}+cpu.zip\n    else\n        wget -q https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}%2Bcpu.zip\n        unzip -q libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}+cpu.zip\n        rm -f libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}+cpu.zip\n    fi\nfi\n\n# get libneuron_op.so and install into libtorch\n$PYTHON -m pip install --upgrade \"transformers==4.40.0\"\n$PYTHON bert_neuronx/compile.py\n\nsite_pkgs_dir=$($PYTHON -c \"import site; print(site.getsitepackages()[0])\")\nif [ \"$OLD_TOOL_CHAIN\" == \"True\" ]\n  then\n    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libtorchneuron.so' \\; -quit | grep torch_neuron) libtorch/lib/\n    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libnrt.so' \\; -quit ) libtorch/lib/\n    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libnrt.so.1' \\; -quit ) libtorch/lib/\n  else\n    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libtorchneuron.so' \\; -quit | grep torch_neuronx) libtorch/lib/\nfi\n\n# compile example app\npushd example_app\nchmod +x build.sh\n./build.sh\npopd\n\nchmod +x run_tests.sh\necho \"Successfully completed setup\"\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/build.sh",
    "content": "#!/bin/bash\n\n# clean old artifacts\nrm tokenizer_test 2>&1 >/dev/null\nrm -rf lib 2>&1 >/dev/null\n\n# build shared library\nif [ $# -eq 0 ]; then\n    pushd ../tokenizers/tokenizers\n    echo \"Building release test...\"\n    cargo build --release\n    popd\n    cp -r ../tokenizers/tokenizers/target/release lib\n    g++ -O3 -o tokenizer_test tokenizer_test.cpp -L./lib -ltokenizers\nelse\n    pushd ../tokenizers/tokenizers\n    echo \"Building debug test...\"\n    cargo build\n    popd\n    cp -r ../tokenizers/tokenizers/target/debug lib\n    g++ -O0 -o tokenizer_test tokenizer_test.cpp -L./lib -ltokenizers\nfi\n\nif [ ! -e \"tokenizer.json\" ]; then\n    wget https://huggingface.co/bert-base-cased-finetuned-mrpc/raw/main/tokenizer.json\nfi\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/remote_rust_tokenizer.h",
    "content": "#ifndef __REMOTE_RUST_TOKENIZER_H__\n#define __REMOTE_RUST_TOKENIZER_H__\n\n#include <cstdint>\n\nextern \"C\" {\n    extern void remote_rust_encode(const char *input_arr, uint32_t* output_arr, uint32_t output_arr_len);\n}\n\n#endif // __REMOTE_RUST_TOKENIZER_H__\n"
  },
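The header above declares the C entry point exported by the patched Rust tokenizer. As a rough illustration of the same interface from Python, here is a `ctypes` sketch; it assumes `build.sh` has produced `lib/libtokenizers.so` and that `tokenizer.json` is in the current directory (run from `tokenizers_binding/`):

```python
import ctypes

lib = ctypes.CDLL("./lib/libtokenizers.so")
lib.remote_rust_encode.argtypes = [
    ctypes.c_char_p,                  # input_arr: NUL-terminated text
    ctypes.POINTER(ctypes.c_uint32),  # output_arr: caller-allocated token-id buffer
    ctypes.c_uint32,                  # output_arr_len
]
lib.remote_rust_encode.restype = None

seq_len = 128
output = (ctypes.c_uint32 * seq_len)()  # zero-initialized buffer
lib.remote_rust_encode(
    b"If everything goes smoothly, this text will be tokenized inside Rust.",
    output, seq_len,
)
print([t for t in output if t != 0])    # non-zero entries are the token ids
```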
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/run.sh",
    "content": "#!/bin/bash\n\nset -e\n\nLD_LIBRARY_PATH=./lib ./tokenizer_test\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/run_python.sh",
    "content": "#!/bin/bash\n\nset -e\n\npython tokenizer_test.py\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/tokenizer_test.cpp",
    "content": "#include <iostream>\n#include <chrono> // timing\n#include <cstring> // rust interface\n#include <iomanip> // std::setprecision\n#include <sstream> // parse args\n#include <vector>\n\n#include \"remote_rust_tokenizer.h\"\n\n#define DEFAULT_NUM_TESTS 10000u\n\nint main(int argc, char *argv[]) {\n    // prepare some input to tokenize\n    const uint32_t seq_len = 128;\n    const std::vector<uint32_t> ground_truth = { 1409, 1917, 2947, 16193, 117, 1142, 3087, 1209, 1129, 22559, 2200, 1656, 155, 8954, 119 };\n    const char *input_arr = \"If everything goes smoothly, this text will be tokenized inside Rust.\";\n    uint32_t* output_arr = new uint32_t[seq_len];\n    std::memset(output_arr, 0, sizeof(uint32_t) * seq_len);\n\n    // call rust tokenizer\n    remote_rust_encode(input_arr, output_arr, seq_len);\n\n    // check output\n    std::cout << \"Sanity check \";\n    for (auto i = 0; i < ground_truth.size(); ++i) {\n        if (output_arr[i] != ground_truth[i]) {\n            std::cerr << \"failed at: \" << i << \", \" << output_arr[i] << \" != \" << ground_truth[i] << std::endl;\n            return -1;\n        }\n    }\n    std::cout << \"passed.\" << std::endl;\n\n    // run timed test\n    uint32_t num_tests = DEFAULT_NUM_TESTS;\n    if (argc >= 3 && !strcmp(\"--num_tests\", argv[1])) {\n        std::istringstream iss(argv[2]);\n        iss >> num_tests;\n    }\n\n    const uint32_t ten_percent = uint32_t(0.1 * num_tests);\n    std::cout << \"Begin \" << num_tests << \" timed tests.\" << std::endl;\n    auto start = std::chrono::high_resolution_clock::now();\n\n    for (auto test_num = 0; test_num < num_tests; ++test_num) {\n        if (test_num % ten_percent == 0) {\n            std::cout << \".\" << std::flush;\n        }\n        remote_rust_encode(input_arr, output_arr, seq_len);\n    }\n\n    auto end = std::chrono::high_resolution_clock::now();\n    auto duration = std::chrono::duration<double>(end - start);\n    std::cout << std::endl << \"End timed tests.\" << std::endl << \"C++ took \"\n        << std::setprecision(3) << duration.count()\n        << \" seconds.\" <<  std::endl;\n\n    return 0;\n}\n"
  },
  {
    "path": "src/examples/pytorch/libtorch_demo/tokenizers_binding/tokenizer_test.py",
    "content": "from transformers import AutoTokenizer\nimport argparse\nimport time\nfrom tqdm import tqdm\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--num_tests', type=int, default=10_000)\nargs = parser.parse_args()\n\ntokenizer = AutoTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc')\n\nstart = time.time()\nfor _ in tqdm(range(args.num_tests), desc='Tokenizing'):\n    tokenizer.encode(\"If everything goes smoothly, this text will be tokenized inside Rust.\")\nend = time.time()\nprint('Python took {:.2f} seconds.'.format(end - start))\n"
  },
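`tokenizer_test.cpp` hard-codes the expected token ids for its sanity check, while the Python test above only measures timing. A small sketch that cross-checks the Python tokenizer against those same ids (assuming both use the `bert-base-cased-finetuned-mrpc` vocabulary and, like the Rust binding, encode without special tokens):

```python
from transformers import AutoTokenizer

# Ground-truth ids copied from tokenizer_test.cpp
ground_truth = [1409, 1917, 2947, 16193, 117, 1142, 3087, 1209, 1129,
                22559, 2200, 1656, 155, 8954, 119]

tokenizer = AutoTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc')
ids = tokenizer.encode(
    "If everything goes smoothly, this text will be tokenized inside Rust.",
    add_special_tokens=False,
)
assert ids == ground_truth, f"mismatch: {ids}"
print("Python tokenization matches the C++ ground truth.")
```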
  {
    "path": "src/examples/pytorch/libtorch_demo/trace_bert_neuron.py",
    "content": "import torch\nimport torch_neuron\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Build tokenizer and model\ntokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n\n# Setup some example inputs\nsequence_0 = \"The company HuggingFace is based in New York City\"\nsequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n\nmax_length = 128\nbatch_size = 6\n\nparaphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n\nexample_inputs_paraphrase = (\n    torch.cat([paraphrase['input_ids']] * batch_size, 0),\n    torch.cat([paraphrase['attention_mask']] * batch_size, 0),\n    torch.cat([paraphrase['token_type_ids']] * batch_size, 0)\n)\n\n# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\nmodel_neuron_batch = torch_neuron.trace(model, example_inputs_paraphrase)\n\n# Save the batched model\nmodel_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size))"
  },
  {
    "path": "src/examples/pytorch/mnist_mlp/train_monitor.py",
    "content": "import os\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom torchvision.datasets import mnist\nfrom torch.optim import SGD\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor\n\n# XLA imports\nimport torch_xla.core.xla_model as xm\n\n# Declare 3-layer MLP for MNIST dataset\nclass MLP(nn.Module):\n    def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]):\n        super(MLP, self).__init__()\n        self.fc1 = nn.Linear(input_size, layers[0])\n        self.fc2 = nn.Linear(layers[0], layers[1])\n        self.fc3 = nn.Linear(layers[1], output_size)\n\n    def forward(self, x):\n        x = F.relu(self.fc1(x))\n        x = F.relu(self.fc2(x))\n        x = self.fc3(x)\n        return F.log_softmax(x, dim=1)\n\n# Load MNIST train dataset\ntrain_dataset = mnist.MNIST(root='./MNIST_DATA_train', \\\n                            train=True, download=True, transform=ToTensor())\n\ndef main():\n    # Prepare data loader\n    train_loader = DataLoader(train_dataset, batch_size=32)\n\n    # Fix the random number generator seeds for reproducibility\n    torch.manual_seed(0)\n\n    # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)\n    device = 'xla'\n\n    # Move model to device and declare optimizer and loss function\n    model = MLP().to(device)\n    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n    loss_fn = torch.nn.NLLLoss()\n\n    # Run the training loop\n    print('----------Training ---------------')\n    for run in range(0, 1000):\n        print(f'Run {run}')\n        model.train()\n        for idx, (train_x, train_label) in enumerate(train_loader):\n            optimizer.zero_grad()\n            train_x = train_x.view(train_x.size(0), -1)\n            train_x = train_x.to(device)\n            train_label = train_label.to(device)\n            output = model(train_x)\n            loss = loss_fn(output, train_label)\n            loss.backward()\n            optimizer.step()\n            xm.mark_step() # XLA: collect ops and run them in XLA runtime\n            if idx < 2: # skip warmup iterations\n                start = time.time()\n\n    # Save checkpoint for evaluation\n    os.makedirs(\"checkpoints\", exist_ok=True)\n    checkpoint = {'state_dict': model.state_dict()}\n    # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu\n    # This can prevent \"XRT memory handle not found\" at end of test.py execution\n    xm.save(checkpoint,'checkpoints/checkpoint.pt')\n\n    print('----------End Training ---------------')"
  },
  {
    "path": "src/examples/pytorch/mnist_mlp/train_tb.py",
    "content": "import os\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom torchvision.datasets import mnist\nfrom torch.optim import SGD\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor\n\n# XLA imports\nimport torch_xla.core.xla_model as xm\n\nfrom torch.utils.tensorboard import SummaryWriter\n\n# Declare 3-layer MLP for MNIST dataset\nclass MLP(nn.Module):\n  def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]):\n      super(MLP, self).__init__()\n      self.fc1 = nn.Linear(input_size, layers[0])\n      self.fc2 = nn.Linear(layers[0], layers[1])\n      self.fc3 = nn.Linear(layers[1], output_size)\n\n  def forward(self, x):\n      x = F.relu(self.fc1(x))\n      x = F.relu(self.fc2(x))\n      x = self.fc3(x)\n      return F.log_softmax(x, dim=1)\n\n# Load MNIST train dataset\ntrain_dataset = mnist.MNIST(root='./MNIST_DATA_train', \\\n                            train=True, download=True, transform=ToTensor())\n\ndef main():\n    # Prepare data loader\n    train_loader = DataLoader(train_dataset, batch_size=32)\n\n    # Fix the random number generator seeds for reproducibility\n    torch.manual_seed(0)\n    \n    # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)\n    device = 'xla'\n    \n    # Move model to device and declare optimizer and loss function\n    model = MLP().to(device)\n    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n    loss_fn = torch.nn.NLLLoss()\n\n    # Use SummaryWriter to generate logs for TensorBoard\n    writer = SummaryWriter('./output')\n\n    # Run the training loop\n    print('----------Training ---------------')\n    model.train()\n    start = time.time()\n    for idx, (train_x, train_label) in enumerate(train_loader):\n        optimizer.zero_grad()\n        train_x = train_x.view(train_x.size(0), -1)\n        train_x = train_x.to(device)\n        train_label = train_label.to(device)\n        output = model(train_x)\n        loss = loss_fn(output, train_label)\n        writer.add_scalar(\"step loss\", loss, idx) # add the step loss to the TensorBoard logs\n        loss.backward()\n        optimizer.step()\n        xm.mark_step() # XLA: collect ops and run them in XLA runtime\n        if idx < 2: # skip warmup iterations\n            start = time.time()\n    \n    # Compute statistics\n    interval = idx - 2 # skip warmup iterations\n    throughput = interval / (time.time() - start)\n    print(\"Train throughput (iter/sec): {}\".format(throughput))\n    print(\"Final loss is {:0.4f}\".format(loss.detach().to('cpu')))\n    \n    # Ensure TensorBoard logs are all written\n    writer.flush()\n\n    # Save checkpoint for evaluation\n    os.makedirs(\"checkpoints\", exist_ok=True)\n    checkpoint = {'state_dict': model.state_dict()}\n    # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu\n    # This can prevent \"XRT memory handle not found\" at end of test.py execution\n    xm.save(checkpoint,'checkpoints/checkpoint.pt')\n    \n    print('----------End Training ---------------')\n    \nif __name__ == '__main__':\n    main()"
  },
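`train_tb.py` writes a `step loss` scalar for every training step. A brief sketch for reading those scalars back from the `./output` event files after training, using TensorBoard's `EventAccumulator` (a verification aid, not part of the tutorial flow):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator('./output')
ea.Reload()  # parse the event files written by SummaryWriter

for event in ea.Scalars('step loss')[:5]:
    print(f"step {event.step}: loss {event.value:.4f}")
```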
  {
    "path": "src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# [Broken] T5 inference with Tensor Parallelism\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This is an extension to the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html). Here we will use NeuronxDistributed to improve the inference performance using tensor parallelism.\\n\",\n    \"\\n\",\n    \"This tutorial has the following main sections:\\n\",\n    \"\\n\",\n    \"1. Install dependencies\\n\",\n    \"1. Plug in `NeuronxDistributed` layers into T5\\n\",\n    \"1. Compile the T5 model\\n\",\n    \"1. Run distributed inference with beam search \\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on a Inf2 instance (`inf2.24xlarge`) or Trn1 isntance (`trn1.32xlarge`)\\n\",\n    \"\\n\",\n    \"> The tutorial works for t5 and flan-t5 models. In this notebook we will run distributed inference with flan-t5-xl.\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install dependencies\\n\",\n    \"\\n\",\n    \"The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\\n\",\n    \"can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"Run the notebook by cloning aws-neuron-sdk\\n\",\n    \"```\\n\",\n    \"git clone https://github.com/aws-neuron/aws-neuron-sdk.git\\n\",\n    \"cd aws-neuron-sdk/src/examples/pytorch/neuronx_distributed/t5-inference/\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"Once done execute `t5-inference-tutorial.ipynb`\\n\",\n    \"\\n\",\n    \"It is recommended to go through the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html) before you start this tutorial. \\n\",\n    \"In addition to the dependencies in the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html), we need to install neuronx-distributed. \\n\",\n    \"\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuronx`\\n\",\n    \"- `neuronx-cc`\\n\",\n    \"- `transformers`\\n\",\n    \"- `optimum-neuron`\\n\",\n    \"- `neuronx-distributed`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Trn1/Inf2 [ setup guide ](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20.html#setup-torch-neuronx-ubuntu20). The additional dependencies must be installed here:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"! pip install --upgrade transformers==4.33.1 optimum-neuron neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Pull the latest version of the compiler\\n\",\n    \"! 
pip install --upgrade neuronx-cc>=2.11 --no-deps\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Lets update numpy to a newer version \\n\",\n    \"! pip install --upgrade \\\"numpy>=1.22.2,<2\\\" --no-deps\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": []\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Plug in NeuronxDistributed layers into T5\\n\",\n    \"\\n\",\n    \"We extend the huggingface's T5 model to use the `NeuronxDistributed` parallel layers. To do so, we simply swap linear layers in `T5LayerSelfAttention`, `T5LayerCrossAttention`, and `T5LayerFF` definitions with `ColumnParallelLinear` and `RowParallelLinear`. We also need to swap the `Embedding` layer with `ParallelEmbedding`.\\n\",\n    \"\\n\",\n    \"Let us take the example of T5Attention. The [attention block](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L363-L366) has q, k, v, and o linear layers. \\n\",\n    \"The multi-head attention block uses q, k and v to compute the attention scores. The attention scores are then passed through o to compute the attention block output. \\n\",\n    \"So let us swap q, k and v layers with `ColumnParallelLinear` and o with `RowParallelLinear`. Having `RowParallelLinear` following a `ColumnParallelLinear` is a performance optimization. The attention scores computed with q, k and v are already split across Neuron devices. The row parallel layer can use this shared output directly. \\n\",\n    \"The embedding layer is simply swapped with the `ParallelEmbedding`.\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"class ParallelAttention(T5Attention):\\n\",\n    \"    def __init__(self, config: T5Config, has_relative_attention_bias=False):\\n\",\n    \"        super().__init__(config, has_relative_attention_bias)\\n\",\n    \"        # Per attention head and per partition values\\n\",\n    \"        world_size = parallel_state.get_tensor_model_parallel_size()\\n\",\n    \"        self.num_attention_heads_per_partition = divide(self.n_heads, world_size)\\n\",\n    \"        self.hidden_size_per_partition = self.num_attention_heads_per_partition * self.key_value_proj_dim\\n\",\n    \"\\n\",\n    \"        # Mesh TensorFlow initialization to avoid scaling before softmax\\n\",\n    \"        self.q = ColumnParallelLinear(self.d_model,\\n\",\n    \"                                      self.inner_dim,\\n\",\n    \"                                      bias=False,\\n\",\n    \"                                      gather_output=False)\\n\",\n    \"        self.k = ColumnParallelLinear(self.d_model,\\n\",\n    \"                                      self.inner_dim,\\n\",\n    \"                                      bias=False,\\n\",\n    \"                                      gather_output=False)\\n\",\n    \"        self.v = ColumnParallelLinear(self.d_model,\\n\",\n    \"                                      self.inner_dim,\\n\",\n    \"                                      bias=False,\\n\",\n    \"                                      gather_output=False)\\n\",\n    \"        self.o = RowParallelLinear(self.inner_dim,\\n\",\n    \"                                   self.d_model,\\n\",\n    \"                                   bias=False,\\n\",\n    \"                                   input_is_parallel=True)\\n\",\n    \"\\n\",\n    \"        if 
self.has_relative_attention_bias:\\n\",\n    \"            self.relative_attention_bias = ParallelEmbedding(self.relative_attention_num_buckets, self.n_heads)\\n\",\n    \"        self.n_heads = self.num_attention_heads_per_partition\\n\",\n    \"...\\n\",\n    \"```\\n\",\n    \"\\n\",\n    \"You can find the all modified T5 layers defined in [t5_model_layers.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5_model_layers.py).  \\n\",\n    \"\\n\",\n    \"\\n\",\n    \"Once we have the modified T5 layers, we can plug in the T5Attention and T5LayerFF into the pretrained model. Here is how you do that. \\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"def load_pretrained_with_parallel_attn(model_name):\\n\",\n    \"    \\n\",\n    \"    model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype=\\\"auto\\\")\\n\",\n    \"\\n\",\n    \"    # Parallel implementation of Attention modules.\\n\",\n    \"    from t5_model_layers import ParallelSelfAttention, ParallelFF, ParallelCrossAttention\\n\",\n    \"\\n\",\n    \"    for index, block in enumerate(model.decoder.block):\\n\",\n    \"        if index == 0:\\n\",\n    \"            block.layer[0] = ParallelSelfAttention(model.config,\\n\",\n    \"                                                   has_relative_attention_bias=True)\\n\",\n    \"        else:\\n\",\n    \"            block.layer[0] = ParallelSelfAttention(model.config)\\n\",\n    \"        block.layer[1] = ParallelCrossAttention(model.config)\\n\",\n    \"        block.layer[2] = ParallelFF(model.config)\\n\",\n    \"    # Load the weights into the parallel layers        \\n\",\n    \"    neuronx_distributed.parallel_layers.load(model_name + \\\".pt\\\", model, sharded=False)\\n\",\n    \"\\n\",\n    \"    return model\\n\",\n    \"\\n\",\n    \"```\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile the parallel T5 model\\n\",\n    \"\\n\",\n    \"Let us set some model parameters.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"model_name = \\\"google/flan-t5-xl\\\" \\n\",\n    \"max_length = 128\\n\",\n    \"num_beams = 4\\n\",\n    \"tp_degree = 8 # tensor parallelism degree\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Download and save the model that we want to trace. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"from transformers import T5ForConditionalGeneration\\n\",\n    \"\\n\",\n    \"model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype=\\\"auto\\\")\\n\",\n    \"torch.save({\\\"model\\\":model.state_dict()}, model_name.split(\\\"/\\\")[-1] + \\\".pt\\\")\\n\",\n    \"model.config.use_cache = True\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To run HuggingFace T5 models on Neuron, we need to make a couple of changes. Let us reuse the code from the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html) which makes T5 compatible with Neuron. 
For your convenience, the code copied into [wrapper.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/wrapper.py) and [t5_models.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5_models.py). This notebook will import these files. \\n\",\n    \"\\n\",\n    \"The only change made to this code is that we use `neuronx_distributed.trace` instead of `torch_neuronx.trace`. \\n\",\n    \"\\n\",\n    \"Let us trace the encoder and decoder. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import t5_models  \\n\",\n    \"import neuronx_distributed\\n\",\n    \"import time \\n\",\n    \"\\n\",\n    \"# This can take up to 20 minutes\\n\",\n    \"encoder_compile_start_time = time.time()\\n\",\n    \"traced_encoder = t5_models.parallel_trace_encoder(model_name, max_length, num_beams, tp_degree)\\n\",\n    \"print(\\\"Encoder compilation time {}\\\".format(time.time() - encoder_compile_start_time))\\n\",\n    \"\\n\",\n    \"neuronx_distributed.trace.parallel_model_save(traced_encoder, \\\"TracedParallelEncoder.pt\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# This can take up to 15 minutes\\n\",\n    \"decoder_compile_start_time = time.time()\\n\",\n    \"traced_decoder = t5_models.parallel_trace_decoder(model, model_name, num_beams, max_length, tp_degree)\\n\",\n    \"print(\\\"Decoder compilation time {}\\\".format(time.time() - decoder_compile_start_time))\\n\",\n    \"\\n\",\n    \"neuronx_distributed.trace.parallel_model_save(traced_decoder, \\\"TracedParallelDecoder.pt\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Inference with the traced parallel T5 model\\n\",\n    \"\\n\",\n    \"With the traced model, let us try using beam search for inference.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Results:\\n\",\n      \"1 Lassen Sie uns gutes Essen essen.\\n\",\n      \"2 Lassen Sie uns gut essen.\\n\",\n      \"3 Lassen Sie uns gutes Essen zu essen.\\n\",\n      \"4 Lassen Sie uns gutes Essen zu sich nehmen.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"import neuronx_distributed\\n\",\n    \"from wrapper import T5Wrapper\\n\",\n    \"from transformers import T5Tokenizer\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"num_return_sequences = 4\\n\",\n    \"\\n\",\n    \"traced_encoder = neuronx_distributed.trace.parallel_model_load(\\\"TracedParallelEncoder.pt\\\")\\n\",\n    \"traced_decoder = neuronx_distributed.trace.parallel_model_load(\\\"TracedParallelDecoder.pt\\\")\\n\",\n    \"\\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name)\\n\",\n    \"model = T5Wrapper.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"model.encoder = traced_encoder\\n\",\n    \"model.decoder = traced_decoder\\n\",\n    \"setattr(model.encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"\\n\",\n    \"output = model.parallel_infer(tokenizer=tokenizer,\\n\",\n    \"                              prompt=\\\"translate English to German: Lets eat good food.\\\",\\n\",\n    \"                              
max_length=max_length,\\n\",\n    \"                              num_beams=num_beams,\\n\",\n    \"                              num_return_sequences=num_return_sequences,\\n\",\n    \"                              device=\\\"xla\\\")\\n\",\n    \"\\n\",\n    \"results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\\n\",\n    \"\\n\",\n    \"print('Results:')\\n\",\n    \"for i, summary in enumerate(results):\\n\",\n    \"    print(i + 1, summary)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Benchmarking\\n\",\n    \"\\n\",\n    \"Let us benchmark the per token decoder latency\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Let us install NeuronPerf. We will use it to measure the performance.\\n\",\n    \"! pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os \\n\",\n    \"import neuronperf as npf\\n\",\n    \"\\n\",\n    \"d_model = model.config.d_model\\n\",\n    \"model_dir = \\\"TracedParallelDecoder.pt\\\"\\n\",\n    \"decoder_run_count = 128\\n\",\n    \"\\n\",\n    \"def load_fn(model_path, **kwargs):\\n\",\n    \"    return neuronx_distributed.trace.parallel_model_load(model_path)\\n\",\n    \"    \\n\",\n    \"# NeuronPerf can't see tp_degree at the moment, so just expose all cores\\n\",\n    \"def env_setup_fn(*_):\\n\",\n    \"    del os.environ[\\\"NEURON_RT_VISIBLE_CORES\\\"]\\n\",\n    \"\\n\",\n    \"def benchmark():\\n\",\n    \"\\n\",\n    \"    # Create some sample inputs for the decoder\\n\",\n    \"    decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64)\\n\",\n    \"    decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32)\\n\",\n    \"    encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64)\\n\",\n    \"    encoder_hidden_states = torch.ones((num_beams, max_length, d_model), dtype=torch.float32)\\n\",\n    \"    beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\\n\",\n    \"    beam_scores = torch.zeros((num_beams,), dtype=torch.float)\\n\",\n    \"\\n\",\n    \"    inputs = (decoder_input_ids,\\n\",\n    \"               decoder_attention_mask,\\n\",\n    \"               encoder_hidden_states,\\n\",\n    \"               encoder_attention_mask,\\n\",\n    \"               beam_idx,\\n\",\n    \"               beam_scores)\\n\",\n    \"\\n\",\n    \"    reports = npf.benchmark(\\n\",\n    \"        load_fn,\\n\",\n    \"        model_dir,\\n\",\n    \"        [inputs],       \\n\",\n    \"        batch_sizes=1,\\n\",\n    \"        n_models=1,\\n\",\n    \"        max_infers=decoder_run_count,\\n\",\n    \"        workers_per_model=1,  # no bottleneck on model inputs, so 1 is fine\\n\",\n    \"        env_setup_fn=env_setup_fn,\\n\",\n    \"        multiprocess=False,\\n\",\n    \"    )\\n\",\n    \"    \\n\",\n    \"    report = reports[0]\\n\",\n    \"\\n\",\n    \"    # let's update throughput to be tokens / second and add a new recor\\n\",\n    \"    latency_in_s = report[\\\"latency_ms_avg\\\"] / 1000\\n\",\n    \"    tokens_per_s = decoder_run_count / latency_in_s\\n\",\n    \"    report[\\\"throughput_avg\\\"] = tokens_per_s\\n\",\n    \"    \\n\",\n    \"    # display and save results\\n\",\n    \"    npf.print_reports(reports, 
cols=[\\\"throughput_avg\\\", \\\"latency_ms_p50\\\", \\\"latency_ms_p99\\\"])\\n\",\n    \"    print(f\\\"Results saved to: {npf.write_json(reports[0])}\\\")\\n\",\n    \"\\n\",\n    \"benchmark()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now lets benchmark inference as a whole including sampling. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import torch\\n\",\n    \"import neuronx_distributed\\n\",\n    \"import neuronperf as npf\\n\",\n    \"\\n\",\n    \"from transformers import T5Tokenizer\\n\",\n    \"from wrapper import T5Wrapper\\n\",\n    \"\\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"generated_token_count = 0\\n\",\n    \"\\n\",\n    \"class Wrapper(torch.nn.Module):\\n\",\n    \"    def __init__(self, \\n\",\n    \"                 traced_encoder,\\n\",\n    \"                 traced_decoder):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.model = T5Wrapper.from_pretrained(model_name)\\n\",\n    \"        self.model.encoder = traced_encoder\\n\",\n    \"        self.model.decoder = traced_decoder\\n\",\n    \"        setattr(self.model.encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"\\n\",\n    \"    def forward(self, *inputs):\\n\",\n    \"        input_ids = inputs[0]['input_ids']\\n\",\n    \"        attention_mask = inputs[0]['attention_mask']\\n\",\n    \"        return self.model.parallel_infer(input_ids=input_ids,\\n\",\n    \"                                         attention_mask=attention_mask,\\n\",\n    \"                                         max_length=max_length,\\n\",\n    \"                                         num_beams=num_beams,\\n\",\n    \"                                         num_return_sequences=num_return_sequences)\\n\",\n    \"\\n\",\n    \"def load_fn(filename, **kwargs):\\n\",\n    \"    traced_encoder = neuronx_distributed.trace.parallel_model_load(filename + \\\"TracedParallelEncoder.pt\\\")\\n\",\n    \"    traced_decoder = neuronx_distributed.trace.parallel_model_load(filename + \\\"TracedParallelDecoder.pt\\\")\\n\",\n    \"    return Wrapper(traced_encoder, traced_decoder)\\n\",\n    \"\\n\",\n    \"# NeuronPerf can't see tp_degree at the moment, so just expose all cores\\n\",\n    \"def env_setup_fn(*_):\\n\",\n    \"    del os.environ[\\\"NEURON_RT_VISIBLE_CORES\\\"]\\n\",\n    \"\\n\",\n    \"def preprocess_fn(inputs):\\n\",\n    \"    \\n\",\n    \"    encoding = []\\n\",\n    \"    for text in inputs:\\n\",\n    \"        batch_encoding = tokenizer(text, \\n\",\n    \"                                   max_length=max_length, \\n\",\n    \"                                   truncation=True, \\n\",\n    \"                                   padding='max_length',\\n\",\n    \"                                   return_tensors=\\\"pt\\\")\\n\",\n    \"        input_ids = batch_encoding['input_ids']\\n\",\n    \"        attention_mask = batch_encoding['attention_mask']\\n\",\n    \"        encoding.append({\\\"input_ids\\\": input_ids,\\n\",\n    \"                         \\\"attention_mask\\\": attention_mask})\\n\",\n    \"    return encoding\\n\",\n    \"\\n\",\n    \"def postprocess_fn(outputs):\\n\",\n    \"    output = [tokenizer.decode(seq) for seq in outputs]\\n\",\n    \"    global generated_token_count \\n\",\n    \"    
generated_token_count = len(outputs[0])\\n\",\n    \"    return output\\n\",\n    \"\\n\",\n    \"def benchmark():\\n\",\n    \"    inputs = [\\\"summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes.\\\"]\\n\",\n    \"    reports = npf.benchmark(\\n\",\n    \"        load_fn,\\n\",\n    \"        \\\"\\\",   # Model dir\\n\",\n    \"        [inputs], \\n\",\n    \"        batch_sizes=1,\\n\",\n    \"        n_models=1,\\n\",\n    \"        max_infers=5,\\n\",\n    \"        max_duration=0,       # sampling can take a while, so let's not timeout\\n\",\n    \"        workers_per_model=1,  \\n\",\n    \"        env_setup_fn=env_setup_fn,\\n\",\n    \"        preprocess_fn=preprocess_fn,\\n\",\n    \"        postprocess_fn=postprocess_fn,\\n\",\n    \"        multiprocess=False,\\n\",\n    \"    )\\n\",\n    \"    \\n\",\n    \"    report = reports[0]\\n\",\n    \"\\n\",\n    \"    report[\\\"throughput_avg\\\"] = round(generated_token_count / (report[\\\"latency_ms_avg\\\"] / 1000), 2)\\n\",\n    \"    report[\\\"latency_per_token_ms_p50\\\"] = round((report[\\\"latency_ms_p50\\\"])/generated_token_count, 2)\\n\",\n    \"    report[\\\"latency_per_token_ms_p99\\\"] = round((report[\\\"latency_ms_p99\\\"])/generated_token_count, 2)\\n\",\n    \"\\n\",\n    \"    # display and save results\\n\",\n    \"    npf.print_reports(reports, cols=[\\\"throughput_avg\\\", \\\"latency_per_token_ms_p50\\\", \\\"latency_per_token_ms_p99\\\"])\\n\",\n    \"    print(f\\\"Results saved to: {npf.write_json(report)}\\\")\\n\",\n    \"\\n\",\n    \"benchmark()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"aws_neuron_venv_pytorch\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  },\n  \"orig_nbformat\": 4\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "src/examples/pytorch/neuronx_distributed/t5-inference/t5_model_layers.py",
    "content": "from neuronx_distributed.parallel_layers import parallel_state\nfrom neuronx_distributed.parallel_layers.layers import BaseParallelLinear, ColumnParallelLinear, RowParallelLinear, ParallelEmbedding\nfrom neuronx_distributed.parallel_layers.utils import divide\n\nimport torch\nfrom torch import nn\nfrom torch.nn.parameter import Parameter\nfrom transformers import T5Config\nfrom transformers.activations import ACT2FN\nfrom transformers.pytorch_utils import find_pruneable_heads_and_indices\nfrom transformers.models.t5.modeling_t5 import T5Attention, T5LayerSelfAttention, T5LayerNorm,\\\n    T5LayerCrossAttention, T5LayerFF, T5DenseGatedActDense, T5DenseActDense\n\nfrom transformers import T5ForConditionalGeneration\nimport neuronx_distributed\n\ndef prune_linear_layer(layer: BaseParallelLinear, index: torch.LongTensor,\n                       dim: int = 0) -> BaseParallelLinear:\n    \"\"\"\n    Prune a linear layer to keep only entries in index.\n\n    Used to remove heads.\n\n    Args:\n        layer (`BaseParallelLinear`): The layer to prune.\n        index (`torch.LongTensor`): The indices to keep in the layer.\n        dim (`int`, *optional*, defaults to 0): The dimension on which to keep the indices.\n\n    Returns:\n        `BaseParallelLinear`: The pruned layer as a new layer with `requires_grad=True`.\n    \"\"\"\n    index = index.to(layer.weight.device)\n    W = layer.weight.index_select(dim, index).clone().detach()\n    if layer.bias is not None:\n        if dim == 1:\n            b = layer.bias.clone().detach()\n        else:\n            b = layer.bias[index].clone().detach()\n    new_size = list(layer.weight.size())\n    new_size[dim] = len(index)\n    new_layer = ColumnParallelLinear(new_size[1],\n                                     new_size[0],\n                                     bias=layer.bias is not None,\n                                     gather_output=False).to(layer.weight.device)\n    new_layer.weight.requires_grad = False\n    new_layer.weight.copy_(W.contiguous())\n    new_layer.weight.requires_grad = True\n    if layer.bias is not None:\n        new_layer.bias.requires_grad = False\n        new_layer.bias.copy_(b.contiguous())\n        new_layer.bias.requires_grad = True\n    return new_layer\n\n\nclass ParallelAttention(T5Attention):\n    def __init__(self, config: T5Config, has_relative_attention_bias=False):\n        super().__init__(config, has_relative_attention_bias)\n        # Per attention head and per partition values\n        world_size = parallel_state.get_tensor_model_parallel_size()\n        self.num_attention_heads_per_partition = divide(\n            self.n_heads, world_size)\n        self.hidden_size_per_partition = self.num_attention_heads_per_partition * self.key_value_proj_dim\n\n        # Mesh TensorFlow initialization to avoid scaling before softmax\n        self.q = ColumnParallelLinear(self.d_model,\n                                      self.inner_dim,\n                                      bias=False,\n                                      gather_output=False)\n        self.k = ColumnParallelLinear(self.d_model,\n                                      self.inner_dim,\n                                      bias=False,\n                                      gather_output=False)\n        self.v = ColumnParallelLinear(self.d_model,\n                                      self.inner_dim,\n                                      bias=False,\n                                      gather_output=False)\n        self.o = 
RowParallelLinear(self.inner_dim,\n                                   self.d_model,\n                                   bias=False,\n                                   input_is_parallel=True)\n\n        if self.has_relative_attention_bias:\n            self.relative_attention_bias = ParallelEmbedding(self.relative_attention_num_buckets, self.n_heads)\n        self.n_heads = self.num_attention_heads_per_partition\n\n    def prune_heads(self, heads):\n        if len(heads) == 0:\n            return\n        heads, index = find_pruneable_heads_and_indices(\n            heads, self.num_attention_heads_per_partition, self.key_value_proj_dim, self.pruned_heads\n        )\n        # Prune linear layers\n        self.q = prune_linear_layer(self.q, index)\n        self.k = prune_linear_layer(self.k, index)\n        self.v = prune_linear_layer(self.v, index)\n        self.o = prune_linear_layer(self.o, index, dim=1)\n        # Update hyper params\n        self.num_attention_heads_per_partition = self.num_attention_heads_per_partition - len(heads)\n        self.hidden_size_per_partition = self.key_value_proj_dim * self.num_attention_heads_per_partition\n        self.pruned_heads = self.pruned_heads.union(heads)\n\n    def compute_bias(self, query_length, key_length, device=None):\n        \"\"\"Compute binned relative position bias\"\"\"\n        if device is None:\n            device = self.relative_attention_bias.weight.device\n        context_position = torch.arange(query_length, dtype=torch.long, device=device)[:, None]\n        memory_position = torch.arange(key_length, dtype=torch.long, device=device)[None, :]\n        relative_position = memory_position - context_position  # shape (query_length, key_length)\n        relative_position_bucket = self._relative_position_bucket(\n            relative_position,  # shape (query_length, key_length)\n            bidirectional=(not self.is_decoder),\n            num_buckets=self.relative_attention_num_buckets,\n            max_distance=self.relative_attention_max_distance,\n        )\n        values = self.relative_attention_bias(\n            relative_position_bucket)\n        tp_rank = parallel_state.get_tensor_model_parallel_rank()\n        values = values[:, :, tp_rank * self.num_attention_heads_per_partition:(tp_rank + 1)\n                                                                     * self.num_attention_heads_per_partition]\n\n        # values = self.relative_attention_bias(\n        #     relative_position_bucket)  # shape (query_length, key_length, num_heads)\n        values = values.permute([2, 0, 1]).unsqueeze(\n            0)  # shape (1, num_heads, query_length, key_length)\n        # print(\"Values shape is: \", values.shape)\n        return values\n\n    def forward(\n        self,\n        hidden_states,\n        mask=None,\n        key_value_states=None,\n        position_bias=None,\n        past_key_value=None,\n        layer_head_mask=None,\n        query_length=None,\n        use_cache=False,\n        output_attentions=False,\n    ):\n        \"\"\"\n        Self-attention (if key_value_states is None) or attention over source sentence (provided by key_value_states).\n        \"\"\"\n        # Input is (batch_size, seq_length, dim)\n        # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length)\n        # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head)\n        self.is_decoder = True\n        batch_size, seq_length = hidden_states.shape[:2]\n\n        real_seq_length = 
seq_length\n\n        if past_key_value is not None:\n            assert (\n                    len(past_key_value) == 2\n            ), f\"past_key_value should have 2 past states: keys and values. Got {len(past_key_value)} past states\"\n            real_seq_length += past_key_value[0].shape[2] if query_length is None else query_length\n\n        key_length = real_seq_length if key_value_states is None else key_value_states.shape[1]\n\n        def shape(states):\n            \"\"\"projection\"\"\"\n            return states.view(batch_size, -1, self.num_attention_heads_per_partition,\n                               self.key_value_proj_dim).transpose(1, 2)\n\n        def unshape(states):\n            \"\"\"reshape\"\"\"\n            return states.transpose(1, 2).contiguous().view(batch_size, -1,\n                                                            self.hidden_size_per_partition)\n\n        def project(hidden_states, proj_layer, key_value_states, past_key_value):\n            \"\"\"projects hidden states correctly to key/query states\"\"\"\n            if key_value_states is None:\n                # self-attn\n                # (batch_size, n_heads, seq_length, dim_per_head)\n                hidden_states = shape(proj_layer(hidden_states))\n            elif past_key_value is None:\n                # cross-attn\n                # (batch_size, n_heads, seq_length, dim_per_head)\n                hidden_states = shape(proj_layer(key_value_states))\n\n            if past_key_value is not None:\n                # import pdb; pdb.set_trace()\n                if key_value_states is None:\n                    # self-attn\n                    # (batch_size, n_heads, key_length, dim_per_head)\n                    hidden_states = torch.cat([past_key_value, hidden_states], dim=2)\n                elif past_key_value.shape[2] != key_value_states.shape[1]:\n                    # checking that the `sequence_length` of the `past_key_value` is the same as\n                    # the provided `key_value_states` to support prefix tuning\n                    # cross-attn\n                    # (batch_size, n_heads, seq_length, dim_per_head)\n                    hidden_states = shape(proj_layer(key_value_states))\n                else:\n                    # cross-attn\n                    hidden_states = past_key_value\n            return hidden_states\n\n        # get query states\n        query_states = shape(\n            self.q(hidden_states))  # (batch_size, n_heads, seq_length, dim_per_head)\n\n        # get key/value states\n        key_states = project(\n            hidden_states, self.k, key_value_states,\n            past_key_value[0] if past_key_value is not None else None\n        )\n        value_states = project(\n            hidden_states, self.v, key_value_states,\n            past_key_value[1] if past_key_value is not None else None\n        )\n\n        # compute scores\n        scores = torch.matmul(\n            query_states, key_states.transpose(3, 2)\n        )  # equivalent of torch.einsum(\"bnqd,bnkd->bnqk\", query_states, key_states), compatible with onnx op>9\n\n        if position_bias is None:\n            if not self.has_relative_attention_bias:\n                position_bias = torch.zeros(\n                    (1, self.num_attention_heads_per_partition, real_seq_length, key_length),\n                    device=scores.device,\n                    dtype=scores.dtype\n                )\n                if self.gradient_checkpointing and self.training:\n                    
position_bias.requires_grad = True\n            else:\n                position_bias = self.compute_bias(real_seq_length, key_length, device=scores.device)\n\n            # if key and values are already calculated\n            # we want only the last query position bias\n            if past_key_value is not None:\n                position_bias = position_bias[:, :, -hidden_states.size(1):, :]\n\n            if mask is not None:\n                print(position_bias.shape, mask.shape, flush=True)\n                position_bias = position_bias + mask  # (batch_size, n_heads, seq_length, key_length)\n\n        if self.pruned_heads:\n            mask = torch.ones(position_bias.shape[1])\n            mask[list(self.pruned_heads)] = 0\n            position_bias_masked = position_bias[:, mask.bool()]\n        else:\n            position_bias_masked = position_bias\n\n        # print(\"Scores is: \", scores.shape)\n        # print(\"position_bias_masked: \", position_bias_masked.shape)\n        # print(scores.dtype, position_bias_masked.dtype)\n\n        scores += position_bias_masked\n        attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(\n            scores\n        )  # (batch_size, n_heads, seq_length, key_length)\n        attn_weights = nn.functional.dropout(\n            attn_weights, p=self.dropout, training=self.training\n        )  # (batch_size, n_heads, seq_length, key_length)\n\n        # Mask heads if we want to\n        if layer_head_mask is not None:\n            attn_weights = attn_weights * layer_head_mask\n\n        attn_output = unshape(\n            torch.matmul(attn_weights, value_states))  # (batch_size, seq_length, dim)\n        attn_output = self.o(attn_output)\n\n        print(self.is_decoder,use_cache, flush=True)\n        present_key_value_state = (key_states, value_states) if (\n                self.is_decoder and use_cache) else None\n        outputs = (attn_output,) + (present_key_value_state,) + (position_bias,)\n\n        if output_attentions:\n            outputs = outputs + (attn_weights,)\n        return outputs\n\n\nclass ParallelSelfAttention(T5LayerSelfAttention):\n    def __init__(self, config, has_relative_attention_bias=False):\n        super().__init__(config, has_relative_attention_bias=False)\n        self.SelfAttention = ParallelAttention(config,\n                                         has_relative_attention_bias=has_relative_attention_bias)\n        self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon)\n        self.dropout = nn.Dropout(config.dropout_rate)\n\n\nclass ParallelCrossAttention(T5LayerCrossAttention):\n    def __init__(self, config):\n        super().__init__(config)\n        self.EncDecAttention = ParallelAttention(config, has_relative_attention_bias=False)\n        self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon)\n        self.dropout = nn.Dropout(config.dropout_rate)\n\n\nclass ParallelDenseActDense(T5DenseActDense):\n    def __init__(self, config: T5Config):\n        super().__init__(config)\n        self.wi = ColumnParallelLinear(config.d_model, config.d_ff, gather_output=False, bias=False)\n        self.wo = RowParallelLinear(config.d_ff, config.d_model, input_is_parallel=True, bias=False)\n        self.dropout = nn.Dropout(config.dropout_rate)\n        self.act = ACT2FN[config.dense_act_fn]\n\n\nclass ParallelDenseGatedActDense(T5DenseGatedActDense):\n    def __init__(self, config: T5Config):\n        super().__init__(config)\n        self.wi_0 = 
ColumnParallelLinear(config.d_model,\n                                      config.d_ff,\n                                         gather_output=False,\n                                      bias=False)\n        self.wi_1 = ColumnParallelLinear(config.d_model,\n                                      config.d_ff,\n                                        gather_output=False,\n                                      bias=False)\n        self.wo = RowParallelLinear(config.d_ff,\n                                    config.d_model,\n                                    input_is_parallel=True,\n                                    bias=False)\n        self.dropout = nn.Dropout(config.dropout_rate)\n        self.act = ACT2FN[config.dense_act_fn]\n\n\nclass ParallelFF(T5LayerFF):\n    def __init__(self, config: T5Config):\n        super().__init__(config)\n        if config.is_gated_act:\n            self.DenseReluDense = ParallelDenseGatedActDense(config)\n        else:\n            self.DenseReluDense = ParallelDenseActDense(config)\n\n        self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon)\n        self.dropout = nn.Dropout(config.dropout_rate)\n\n\ndef load_pretrained_with_parallel_attn(model_name):\n    \n    model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype=\"auto\")\n\n    # Parallel implementation of Attention modules.\n    from t5_model_layers import ParallelSelfAttention, ParallelFF, ParallelCrossAttention\n\n    for index, block in enumerate(model.decoder.block):\n        if index == 0:\n            block.layer[0] = ParallelSelfAttention(model.config,\n                                                   has_relative_attention_bias=True)\n        else:\n            block.layer[0] = ParallelSelfAttention(model.config)\n        block.layer[1] = ParallelCrossAttention(model.config)\n        block.layer[2] = ParallelFF(model.config)\n    # Load the weights into the parallel layers        \n    neuronx_distributed.parallel_layers.load(model_name.split(\"/\")[-1] + \".pt\", model, sharded=False)\n\n    return model\n"
  },
  {
    "path": "src/examples/pytorch/neuronx_distributed/t5-inference/t5_models.py",
    "content": "import torch\nimport neuronx_distributed\n\nfrom functools import partial\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\nfrom wrapper import EncoderWrapper, DecoderWrapper\nfrom t5_model_layers import load_pretrained_with_parallel_attn\n\ndef get_wrapped_encoder(max_length, num_beams, tp_degree, model_name):\n    \n    model = load_pretrained_with_parallel_attn(model_name)\n\n    encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, \"xla\", num_beams, tp_degree=tp_degree)\n    encoder.eval()\n    \n    # We are alaising the cache, so that way we keep the cache always on device.\n    aliases = {}\n    for i in range(len(encoder.past_key_values_sa)):\n        aliases[encoder.past_key_values_sa[i]] = i\n    \n    for i in range(len(encoder.past_key_values_ca)):\n        aliases[encoder.past_key_values_ca[i]] = len(encoder.past_key_values_sa) + i\n\n    return encoder, aliases\n\n\ndef get_wrapped_decoder(max_length, num_beams, tp_degree, model_name):\n    \n    model = load_pretrained_with_parallel_attn(model_name)\n\n    decoder = DecoderWrapper(decoder=model.decoder,\n                             lm_head=model.lm_head,\n                             model_config=model.config,\n                             num_beams=num_beams,\n                             max_length=max_length,\n                             device=\"xla\",\n                             tp_degree=tp_degree)\n    \n    decoder.eval()\n    num_outputs_from_trace = 3 if num_beams > 1 else 1\n    aliases = {}\n    for i in range(len(decoder.past_key_values_sa)):\n        aliases[decoder.past_key_values_sa[i]] = i + num_outputs_from_trace\n    for i in range(len(decoder.past_key_values_ca)):\n        aliases[decoder.past_key_values_ca[i]] = len(decoder.past_key_values_sa) + i + num_outputs_from_trace\n\n    return decoder, aliases\n\ndef parallel_trace_encoder(model_name: str,\n                           max_length: int,\n                           num_beams: int,\n                           tp_degree: int):\n    \n    print(\"starting encoder parallel trace\")\n    \n    tokenizer = T5Tokenizer.from_pretrained(model_name)\n    get_encoder_callable = partial(get_wrapped_encoder, max_length, num_beams, tp_degree, model_name)\n\n    # Trace encoder\n    batch_encoding = tokenizer(\"translate English to German: Lets go home now\",\n                               max_length=max_length, truncation=True, padding='max_length', return_tensors=\"pt\")\n    input_ids = batch_encoding['input_ids']\n    attention_mask = batch_encoding['attention_mask']\n\n    # Here we are tracing the encoder and cache together. 
Cache is marked as state and we are aliasing.\n    traced_encoder = neuronx_distributed.trace.parallel_model_trace(get_encoder_callable, (\n            input_ids,\n            attention_mask,\n        ), \n        tp_degree=tp_degree, \n        compiler_workdir=\"/tmp/encoder/\",\n        )\n    setattr(traced_encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\n\n    print(\"completed encoder parallel trace\")\n\n    return traced_encoder\n\n\ndef parallel_trace_decoder(model: T5ForConditionalGeneration,\n                           model_name: str,\n                           num_beams: int,\n                           max_length: int,\n                           tp_degree: int):\n\n    print(\"starting decoder trace\")\n\n    get_decoder_callable = partial(get_wrapped_decoder, max_length, num_beams, tp_degree, model_name)\n  \n    # We create mock inputs so we can trace the decoder\n    decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64)\n    decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32)\n    encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64)\n    encoder_hidden_states = torch.ones((num_beams, max_length, model.config.d_model), dtype=torch.float32)\n\n    beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\n    beam_scores = torch.zeros((num_beams,), dtype=torch.float)\n\n    traced_decoder = neuronx_distributed.trace.parallel_model_trace(get_decoder_callable, (\n            decoder_input_ids,\n            decoder_attention_mask,\n            encoder_hidden_states,\n            encoder_attention_mask,\n            beam_idx,\n            beam_scores\n        ), \n        tp_degree=tp_degree,\n        compiler_workdir=\"/tmp/decoder/\",\n        )\n\n    print(\"complete decoder trace\")\n\n    return traced_decoder\n"
  },
  {
    "path": "src/examples/pytorch/neuronx_distributed/t5-inference/wrapper.py",
    "content": "import torch\nimport neuronx_distributed\nimport torch_xla.core.xla_model as xm\n\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\nfrom transformers.modeling_outputs import BaseModelOutput, Seq2SeqLMOutput\nfrom transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention\nfrom transformers.generation.utils import ModelOutput\nfrom typing import Any, Dict, List, Optional, Tuple, Union\nfrom transformers.generation.beam_search import BeamScorer, BeamSearchScorer\n\nfrom optimum.neuron.generation import NeuronGenerationMixin\n\nfrom transformers.generation.logits_process import (\n    LogitsProcessorList,\n)\nfrom transformers.generation.stopping_criteria import (\n    MaxLengthCriteria,\n    MaxTimeCriteria,\n    StoppingCriteriaList,\n    validate_stopping_criteria,\n)\n\nfrom transformers.generation.utils import (\n    BeamSearchDecoderOnlyOutput,\n    BeamSearchEncoderDecoderOutput,\n    BeamSearchOutput,\n    GreedySearchOutput,\n)\n\nclass T5Wrapper(T5ForConditionalGeneration, NeuronGenerationMixin):\n\n    def _prepare_encoder_decoder_kwargs_for_generation(\n        self, \n        inputs_tensor: torch.Tensor, \n        model_kwargs, \n        model_input_name: Optional[str] = None\n    ) -> Dict[str, Any]:\n        encoder = self.get_encoder()\n        model_kwargs[\"encoder_outputs\"]: ModelOutput = encoder(inputs_tensor, model_kwargs[\"attention_mask\"])\n        return model_kwargs\n\n    # Override to cut the input_ids to just last token\n    def prepare_inputs_for_generation(\n        self,\n        input_ids,\n        past_key_values=None,\n        attention_mask=None,\n        head_mask=None,\n        decoder_head_mask=None,\n        decoder_attention_mask=None,\n        cross_attn_head_mask=None,\n        use_cache=None,\n        encoder_outputs=None,\n        **kwargs,\n    ):\n        # cut decoder_input_ids as past is cached\n        input_ids = input_ids[:, -1:]\n\n        return {\n            \"decoder_input_ids\": input_ids,\n            \"past_key_values\": past_key_values,\n            \"encoder_outputs\": encoder_outputs,\n            \"attention_mask\": attention_mask,\n            \"head_mask\": head_mask,\n            \"decoder_head_mask\": decoder_head_mask,\n            \"decoder_attention_mask\": decoder_attention_mask,\n            \"cross_attn_head_mask\": cross_attn_head_mask,\n            \"use_cache\": use_cache,\n        }\n    \n    '''\n        We update the cache in the decoder trace, so lets override the _update_model_kwargs_for_xla_generation in NeuronGenerationMixin\n    '''\n    def _update_model_kwargs_for_xla_generation(\n        self,\n        model_kwargs: Dict[str, Any],\n        batch_size: int,\n        is_encoder_decoder: bool = False,\n        standardize_cache_format: bool = False,\n        max_length: Optional[int] = None,\n        seq_length: Optional[int] = None,\n        use_cache: bool = True,\n    ) -> Dict[str, Any]:\n\n        def _update_attention(model_kwargs, is_encoder_decoder):\n            \"\"\"Updates the appropriate attention mask -- encoder-decoder models use `decoder_attention_mask`\"\"\"\n\n            attention_mask_name = \"decoder_attention_mask\" if is_encoder_decoder else \"attention_mask\"\n            attention_mask = model_kwargs.pop(attention_mask_name)\n            attention_mask_update_slice = torch.ones(\n                (batch_size, 1), dtype=attention_mask.dtype, device=attention_mask.device\n            )\n            attention_mask = 
torch.cat([attention_mask[:, 1:], attention_mask_update_slice], dim=-1)\n            mask = {attention_mask_name: attention_mask}\n            return mask\n\n        mask = _update_attention(model_kwargs, is_encoder_decoder)\n        # sets the updated variables (mask and past_key_values)\n        model_kwargs.update(mask)\n\n        # Set a mock cache tensor\n        model_kwargs[\"past_key_values\"] = torch.tensor([])\n\n        return model_kwargs\n    \n    def _reorder_cache(self, past_key_values, beam_idx):\n        '''\n            This is needed for beam search and not greedy sampling\n            We reorder the cache within the trace so we can skip it in modeling_t5.py. So we override the _reorder_cache\n        '''\n        self.beam_idx = beam_idx\n        return past_key_values\n\n    def infer(self,\n              tokenizer: T5Tokenizer,\n              prompt: str,\n              max_length: int,\n              num_beams: int,\n              num_return_sequences: int,\n              device: str):\n\n        batch_encoding = tokenizer(prompt, max_length=max_length, truncation=True, padding='max_length',\n                                return_tensors=\"pt\")\n\n        past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask'])\n \n        decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32),\n                                            torch.ones((1, 1), dtype=torch.int32)], axis=1)\n\n        # copy the new cache state to the decoder\n        if device == \"xla\":\n            for state, tensor in zip(self.decoder.parameters(), past_key_values):\n                state.copy_(tensor)\n        else:\n            # First half of the cache is self attention and the rest is cross attention\n            self.decoder.past_key_values_sa = past_key_values[:len(past_key_values)//2]\n            self.decoder.past_key_values_ca = past_key_values[len(past_key_values)//2:]\n        \n        output = self.generate(**batch_encoding,\n                                max_length=max_length,\n                                num_beams=num_beams,\n                                num_return_sequences=num_return_sequences,\n                                do_sample=False,\n                                use_cache=True,\n                                decoder_attention_mask=decoder_attention_mask, \n                                encoder_outputs={\"last_hidden_state\": torch.ones((1,128,1))}) # Pass fake encoder_outputs so the transformers code will not invoke the encoder\n        return output\n\n    def parallel_infer(self,\n                       max_length: int,\n                       num_beams: int,\n                       num_return_sequences: int,\n                       device: str = None,\n                       tokenizer: T5Tokenizer = None,\n                       prompt: str = None, \n                       input_ids: torch.Tensor = None,\n                       attention_mask: torch.Tensor = None):\n\n        if input_ids is None or attention_mask is None: \n            batch_encoding = tokenizer(prompt, \n                                    max_length=max_length, \n                                    truncation=True, \n                                    padding='max_length',\n                                    return_tensors=\"pt\")\n        else: \n            batch_encoding = {\n                'input_ids' : input_ids,\n                'attention_mask': attention_mask\n            }\n\n        \n        
past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask'])\n \n        decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32),\n                                            torch.ones((1, 1), dtype=torch.int32)], axis=1)\n\n        # Here the encoder now returns the cache as a device tensor, so we directly assign\n        # the cache device tensor to the decoder's cache (which is also a device tensor). \n        # We thereby avoid the copy and always use pre-allocated memory\n        for model_tp_decoder, model_tp_encoder in zip(self.decoder.models, self.encoder.models):\n            model_tp_decoder.load_state_dict(model_tp_encoder.state_dict(), strict=True)\n            \n        # Pass fake encoder_outputs so the transformers code will not invoke the encoder\n        output = self.generate(**batch_encoding,\n                                max_length=max_length,\n                                num_beams=num_beams,\n                                num_return_sequences=num_return_sequences,\n                                do_sample=False,\n                                use_cache=True,\n                                decoder_attention_mask=decoder_attention_mask, \n                                encoder_outputs={\"last_hidden_state\": torch.ones((1,128,1))}) \n        return output\n\n    def forward(\n        self,\n        attention_mask: Optional[torch.FloatTensor] = None,\n        decoder_input_ids: Optional[torch.LongTensor] = None,\n        decoder_attention_mask: Optional[torch.BoolTensor] = None,\n        encoder_outputs: Optional[Tuple[Tuple[torch.Tensor]]] = None,\n        beam_scores = None,\n        **kwargs\n    ) -> Union[Tuple[torch.FloatTensor], Seq2SeqLMOutput]:\n\n        hidden_states = encoder_outputs[\"last_hidden_state\"]\n\n        if not hasattr(self, 'beam_idx'):\n            # Inferring the number of beams from the attention mask\n            num_beams = attention_mask.shape[0]\n            self.beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\n\n        decoder_outputs = self.decoder(\n            decoder_input_ids,\n            decoder_attention_mask,\n            hidden_states,\n            attention_mask,\n            self.beam_idx,\n            beam_scores\n        )\n\n        # lm_logits = decoder_outputs[0]\n        next_token_scores = decoder_outputs[0]\n        next_tokens = decoder_outputs[1]\n        next_indices = decoder_outputs[2]\n\n        return next_token_scores, next_tokens, next_indices\n\n    def beam_search(\n        self,\n        input_ids: torch.LongTensor,\n        beam_scorer: BeamScorer,\n        logits_processor: Optional[LogitsProcessorList] = None,\n        stopping_criteria: Optional[StoppingCriteriaList] = None,\n        max_length: Optional[int] = None,\n        pad_token_id: Optional[int] = None,\n        eos_token_id: Optional[Union[int, List[int]]] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        output_scores: Optional[bool] = None,\n        return_dict_in_generate: Optional[bool] = None,\n        synced_gpus: Optional[bool] = False,\n        seq_length: Optional[int] = None,\n        **model_kwargs,\n    ) -> Union[BeamSearchOutput, torch.LongTensor]:\n\n        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\n        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\n        
pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\n        eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\n        if isinstance(eos_token_id, int):\n            eos_token_id = [eos_token_id]\n        output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\n        output_attentions = (\n            output_attentions if output_attentions is not None else self.generation_config.output_attentions\n        )\n        output_hidden_states = (\n            output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\n        )\n\n        batch_size = len(beam_scorer._beam_hyps)\n        num_beams = beam_scorer.num_beams\n\n        batch_beam_size, cur_len = input_ids.shape\n\n        # Overwrite cur_len\n        cur_len = seq_length\n\n        if num_beams * batch_size != batch_beam_size:\n            raise ValueError(\n                f\"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}.\"\n            )\n\n        # init attention / hidden states / scores tuples\n        scores = () if (return_dict_in_generate and output_scores) else None\n        beam_indices = (\n            tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None\n        )\n\n        # initialise score of first beam with 0 and the rest with -1e9. This makes sure that only tokens\n        # of the first beam are considered to avoid sampling the exact same tokens across all beams.\n        # beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=input_ids.device)\n        beam_scores_device = \"cpu\"\n        beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=beam_scores_device)\n        beam_scores[:, 1:] = -1e9\n        beam_scores = beam_scores.view((batch_size * num_beams,))\n\n        while True:\n            # prepare model inputs\n            # From max_length-sized input_ids, select first\n            # cur_len - 1 values.\n            update_indices = torch.stack(\n                [torch.arange(input_ids.size(0)), torch.tensor(cur_len - 1).repeat(input_ids.size(0))], dim=-1\n            )\n            input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\n            model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\n\n            next_token_scores, next_tokens, next_indices = self(\n                **model_inputs,\n                return_dict=True,\n                output_attentions=output_attentions,\n                output_hidden_states=output_hidden_states,\n                beam_scores=beam_scores\n            )\n\n            # stateless\n            beam_outputs = beam_scorer.process(\n                input_ids.to(\"cpu\")[:, :cur_len],\n                next_token_scores.to(\"cpu\"),\n                next_tokens.to(\"cpu\"),\n                next_indices.to(\"cpu\"),\n                pad_token_id=pad_token_id,\n                eos_token_id=eos_token_id,\n                beam_indices=beam_indices,\n            )\n\n            beam_scores = beam_outputs[\"next_beam_scores\"]\n            beam_next_tokens = beam_outputs[\"next_beam_tokens\"]\n            beam_idx = beam_outputs[\"next_beam_indices\"]\n\n            update_indices = torch.stack(\n                [torch.arange(batch_beam_size), torch.tensor(cur_len - 1).repeat(batch_beam_size)], dim=-1\n 
           )\n            update_indices_2 = torch.stack(\n                [torch.arange(batch_beam_size), torch.tensor(cur_len).repeat(batch_beam_size)], dim=-1\n            )\n            # First select beam_indices\n            device = input_ids.device\n            beam_idx_device = beam_idx.to(device=input_ids.device)\n            input_ids[:, :] = input_ids[beam_idx_device.long(), :]\n\n            # Then append new tokens\n            input_ids[update_indices_2[:, 0], update_indices_2[:, 1], None] = beam_next_tokens.unsqueeze(-1).to(device).to(torch.long)\n            input_ids = input_ids * 1  # Hack to materialize tensor\n\n            # update generated ids, model inputs, and length for next step\n            model_kwargs = self._update_model_kwargs_for_xla_generation(\n                model_kwargs,\n                batch_size=batch_beam_size,\n                is_encoder_decoder=self.config.is_encoder_decoder,\n                max_length=stopping_criteria.max_length,\n                seq_length=cur_len,\n                use_cache=model_kwargs[\"use_cache\"],\n            )\n            if model_kwargs[\"past_key_values\"] is not None:\n                model_kwargs[\"past_key_values\"] = self._reorder_cache(model_kwargs[\"past_key_values\"], beam_idx.to(torch.int64))\n\n            if return_dict_in_generate and output_scores:\n                beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices))))\n\n            # increase cur_len\n            cur_len = cur_len + 1\n\n            # stop when each sentence is finished, or if we exceed the maximum length\n            stop_criterion_1 = beam_scorer.is_done\n            if isinstance(stopping_criteria, list):\n                if len(stopping_criteria) == 1:\n                    stopping_criteria = stopping_criteria[0]\n\n            # Cases that can be handled in XLA without requiring\n            # non-padded input_ids\n            if isinstance(stopping_criteria, MaxLengthCriteria):\n                stop_criterion_2 = cur_len >= stopping_criteria.max_length\n            elif isinstance(stopping_criteria, MaxTimeCriteria):\n                stop_criterion_2 = stopping_criteria(input_ids, scores)\n            else:\n                # Other cases will be handled on CPU\n                batch_size, _ = input_ids.shape\n                input_ids_cpu = input_ids.to(\"cpu\")\n                mask = torch.cat(\n                    [torch.ones(batch_size, cur_len), torch.zeros(batch_size, input_ids.shape[1] - cur_len)], dim=1\n                ).bool()\n                input_ids_cpu = torch.masked_select(input_ids_cpu, mask).reshape((batch_size, cur_len))\n                scores_cpu = scores.to(\"cpu\") if torch.is_tensor(scores) else scores\n                stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\n\n            if stop_criterion_1 or stop_criterion_2:\n                if not synced_gpus:\n                    break\n                else:\n                    this_peer_finished = True\n\n        sequence_outputs = beam_scorer.finalize(\n            input_ids.to(\"cpu\"),\n            beam_scores.to(\"cpu\"),\n            next_tokens.to(\"cpu\"),\n            next_indices.to(\"cpu\"),\n            pad_token_id=pad_token_id,\n            eos_token_id=eos_token_id,\n            max_length=stopping_criteria.max_length,\n            beam_indices=beam_indices,\n        )\n\n        for k, v in sequence_outputs.items():\n            if type(v) == torch.Tensor:\n                
sequence_outputs[k] = sequence_outputs[k].to(input_ids.device)\n\n        return sequence_outputs[\"sequences\"]\n\n\n    def greedy_search(\n        self,\n        input_ids: torch.LongTensor,\n        logits_processor: Optional[LogitsProcessorList] = None,\n        stopping_criteria: Optional[StoppingCriteriaList] = None,\n        max_length: Optional[int] = None,\n        pad_token_id: Optional[int] = None,\n        eos_token_id: Optional[Union[int, List[int]]] = None,\n        output_attentions: Optional[bool] = None,\n        output_hidden_states: Optional[bool] = None,\n        output_scores: Optional[bool] = None,\n        return_dict_in_generate: Optional[bool] = None,\n        seq_length: Optional[int] = int,\n        streamer: Optional[\"BaseStreamer\"] = None,\n        **model_kwargs,\n    ) -> Union[GreedySearchOutput, torch.LongTensor]:\n        \"\"\"\n            Overriding greedy sampling to use next tokens returned from neuron device instead of logits.\n        \"\"\"\n        # init values\n        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\n        use_cache = model_kwargs[\"use_cache\"] if \"use_cache\" in model_kwargs else False\n        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\n        pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\n        eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\n        if isinstance(eos_token_id, int):\n            eos_token_id = [eos_token_id]\n        eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None\n        output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\n        output_attentions = (\n            output_attentions if output_attentions is not None else self.generation_config.output_attentions\n        )\n        output_hidden_states = (\n            output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\n        )\n\n        # init attention / hidden states / scores tuples\n        scores = () if (return_dict_in_generate and output_scores) else None\n        decoder_attentions = () if (return_dict_in_generate and output_attentions) else None\n        cross_attentions = () if (return_dict_in_generate and output_attentions) else None\n        decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None\n\n\n        # keep track of which sequences are already finished\n        unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)\n\n        this_peer_finished = False  # used by synced_gpus only\n        while True:\n\n            # prepare model inputs\n            # From max_length-sized input_ids, select first\n            # seq_length - 1 values.\n\n            if model_kwargs.get(\"past_key_values\") is None:\n                input_ids_ = input_ids[:, :seq_length]\n            else:\n                update_indices = torch.stack(\n                    [torch.arange(input_ids.size(0)), torch.tensor(seq_length - 1).repeat(input_ids.size(0))],\n                    dim=-1,\n                )\n                input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\n\n            model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\n        \n            # forward 
pass to get next token\n            output = self(\n               **model_inputs,\n                return_dict=True,\n                output_attentions=output_attentions,\n                output_hidden_states=output_hidden_states,\n            )\n            next_tokens = output[0]\n\n            # finished sentences should have their next token be a padding token\n            if eos_token_id is not None:\n                if pad_token_id is None:\n                    raise ValueError(\"If `eos_token_id` is defined, make sure that `pad_token_id` is defined.\")\n                next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)\n\n            # update generated ids, model inputs, and length for next step\n\n            batch_size, _ = input_ids.shape\n            update_indices = torch.stack(\n                [torch.arange(batch_size), torch.tensor(seq_length).repeat(batch_size)], dim=-1\n            )\n            input_ids[update_indices[:, 0], update_indices[:, 1]] = next_tokens[:]\n            model_kwargs = self._update_model_kwargs_for_xla_generation(\n                model_kwargs,\n                batch_size=batch_size,\n                is_encoder_decoder=self.config.is_encoder_decoder,\n                max_length=stopping_criteria.max_length,\n                seq_length=seq_length,\n                use_cache=use_cache,\n            )\n\n            seq_length += 1\n\n            # if eos_token was found in one sentence, set sentence to finished\n            if eos_token_id_tensor is not None:\n                unfinished_sequences = unfinished_sequences.mul(\n                    next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)\n                )\n\n            # stop when each sentence is finished, or if we exceed the maximum length\n            stop_criterion_1 = unfinished_sequences.max() == 0\n\n            if isinstance(stopping_criteria, list):\n                if len(stopping_criteria) == 1:\n                    stopping_criteria = stopping_criteria[0]\n\n            # Cases that can be handled in XLA without requiring\n            # non-padded input_ids\n            if isinstance(stopping_criteria, MaxLengthCriteria):\n                stop_criterion_2 = seq_length >= stopping_criteria.max_length\n            elif isinstance(stopping_criteria, MaxTimeCriteria):\n                stop_criterion_2 = stopping_criteria(input_ids, scores)\n            else:\n                # Other cases will be handled on CPU\n                batch_size, _ = input_ids.shape\n                mask = torch.cat(\n                    [torch.ones(batch_size, seq_length), torch.zeros(batch_size, input_ids.shape[1] - seq_length)],\n                    dim=1,\n                ).bool()\n                input_ids_cpu = torch.masked_select(input_ids, mask).reshape((batch_size, seq_length)).to(\"cpu\")\n                scores_cpu = scores.to(\"cpu\") if torch.is_tensor(scores) else scores\n                stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\n\n            if stop_criterion_1 or stop_criterion_2:\n                this_peer_finished = True\n\n            if this_peer_finished:\n                break\n\n        if streamer is not None:\n            streamer.end()\n\n        return input_ids\n    \nclass EncoderWrapper(torch.nn.Module):\n    '''\n        This wrapper converts positional args to kwargs\n    '''\n\n    def __init__(self, \n                 encoder,\n                 decoder, \n       
          model_config, \n                 batch_size, \n                 max_length, \n                 device, \n                 num_beams,\n                 tp_degree=None):\n        \n        super().__init__()\n        self.encoder = encoder\n        self.decoder = decoder\n        self.batch_size = batch_size\n        self.max_length = max_length\n        self.model_config = model_config\n        self.device = device\n        self.num_beams = num_beams\n        self.num_attention_heads_per_partition = model_config.num_heads\n        self.tp_degree = tp_degree\n        if self.tp_degree is not None:\n            self.num_attention_heads_per_partition = model_config.num_heads // neuronx_distributed.parallel_layers.parallel_state.get_tensor_model_parallel_size()\n            self.past_key_values_sa = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((self.num_beams,self.num_attention_heads_per_partition,self.max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(model_config.num_decoder_layers * 2)])\n            self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((self.num_beams,self.num_attention_heads_per_partition,self.max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(model_config.num_decoder_layers * 2)])\n\n    def forward(self, input_ids, attention_mask):\n        '''\n            This is the core functionality we want to trace. \n        '''\n        encoder_output =  self.encoder(input_ids=input_ids,\n                                       attention_mask=attention_mask,\n                                       output_attentions=False,\n                                       output_hidden_states=False)\n\n        last_hidden_state = encoder_output[\"last_hidden_state\"]\n        encoder_hidden_states = torch.concat([tensor.unsqueeze(0).repeat(self.num_beams, 1, 1) for tensor in last_hidden_state])\n\n        decoder_blocks = self.decoder.block\n        present_key_value_states_sa = []\n        present_key_value_states_ca = []\n\n        for i, block in enumerate(decoder_blocks):\n\n            # Cross attention has to be initialized with the encoder hidden state\n            cross_attention: T5LayerCrossAttention = block.layer[1]\n            attention = cross_attention.EncDecAttention\n\n            def shape(states):\n                \"\"\"projection\"\"\"\n                return states.view(self.batch_size, -1, self.num_attention_heads_per_partition, attention.key_value_proj_dim).transpose(1, 2)\n\n            key_states = shape(attention.k(encoder_hidden_states))\n            value_states = shape(attention.v(encoder_hidden_states))\n\n            if self.tp_degree is None:\n                # cross_attn_kv_state\n                present_key_value_states_ca.append(key_states) \n                present_key_value_states_ca.append(value_states) \n                \n                # Self attention kv states are initialized to zeros.\n                present_key_value_states_sa.append(torch.zeros((self.batch_size,                                                     # key states\n                                                                self.model_config.num_heads, \n                                                                self.max_length-1, \n                                                                self.model_config.d_kv), dtype=torch.float32, device=self.device)) \n                present_key_value_states_sa.append(torch.zeros((self.batch_size,             
                                        # value states\n                                                                self.model_config.num_heads, \n                                                                self.max_length-1, \n                                                                self.model_config.d_kv), dtype=torch.float32, device=self.device))\n            else:\n                # We want to copy the cross attention states (key_states and value_states) into the decoder trace. \n                # One way of doing it is to get the encoder trace to return the kv states as an output and then we can pass it to the decoder trace  \n                # as an input. But this requires a copy from device to cpu and back. \n                #\n                # There is no good way to keep the output within the device yet. Until we build that feature, we use this workaround. \n                # The workaround uses input_output_aliasing to map the output kv state to an input parameter. The output present_key_value_states_ca\n                # represents the cross attention kv states and is aliased to a similarly named parameter. \n                # \n                # Why are we multiplying past_key_values_ca with 0 and adding to the key or value state?  \n                # The trace api will remove any variables that are not used to compute the output tensor. As the past_key_values parameter is not \n                # being used to compute the kv cache, it would be removed. To avoid that, we use it in an operation that computes the output \n                # but at the same time does not affect the output. \n                present_key_value_states_ca.append((self.past_key_values_ca[i*2] * 0) + key_states)\n                present_key_value_states_ca.append((self.past_key_values_ca[i*2+1] * 0) + value_states)\n                present_key_value_states_sa.append(self.past_key_values_sa[i*2]*torch.zeros((self.batch_size, self.num_attention_heads_per_partition, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device=\"xla\"))\n                present_key_value_states_sa.append(self.past_key_values_sa[i*2+1]*torch.zeros((self.batch_size, self.num_attention_heads_per_partition, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device=\"xla\"))\n\n        return present_key_value_states_sa + present_key_value_states_ca\n\nclass DecoderWrapper(torch.nn.Module):\n\n    def __init__(self, \n                 decoder: T5Stack, \n                 lm_head: torch.nn.Linear,\n                 model_config,\n                 num_beams: int, \n                 max_length: int,\n                 device: str,\n                 tp_degree=None):\n        super().__init__()        \n        self.decoder = decoder\n        self.lm_head = lm_head\n        self.model_dim=model_config.d_model\n        self.device = device\n        self.num_beams = num_beams\n        self.batch_size = 1\n        self.config = model_config\n\n        num_heads=model_config.num_heads\n        num_decoder_layers=model_config.num_decoder_layers\n\n        self.num_attention_heads_per_partition = num_heads\n        if tp_degree is not None:\n            self.num_attention_heads_per_partition = num_heads // neuronx_distributed.parallel_layers.parallel_state.get_tensor_model_parallel_size()\n\n        # (num_beams, n_heads, seq_length, dim_per_head)\n        if device == \"cpu\":\n            self.past_key_values_sa = [torch.ones((num_beams,num_heads,max_length-1,model_config.d_kv), 
dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\n            self.past_key_values_ca = [torch.ones((num_beams,num_heads,max_length,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\n        elif device == \"xla\":\n            self.past_key_values_sa = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\n            self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\n\n    def update_past(self, past_key_values):\n        new_past_sa = []\n        new_past_ca = []\n        for past_layer in past_key_values:\n            new_past_layer = list(past_layer)\n            for i in range(len(new_past_layer[:2])):\n                new_past_layer[i] = past_layer[i][:, :, 1:]\n            new_past_sa += [new_past_layer[:2],]\n            new_past_ca += [new_past_layer[2:],]\n        return new_past_sa, new_past_ca\n    \n    def reorder_cache(self, past_key_values, beam_idx):\n        for i in range(len(past_key_values)):\n             past_key_values[i] = torch.index_select(past_key_values[i], 0, beam_idx)\n        return past_key_values\n\n    def forward(self,\n                input_ids,\n                decoder_attention_mask,\n                encoder_hidden_states,\n                encoder_attention_mask,\n                beam_idx,\n                beam_scores,\n                **kwargs):\n\n        if self.num_beams > 1:\n            # We reorder the cache based on the beams selected in each iteration. Required step for beam search.\n            past_key_values_sa = self.reorder_cache(self.past_key_values_sa, beam_idx)\n            past_key_values_ca = self.reorder_cache(self.past_key_values_ca, beam_idx)\n        else:\n            # We do not need to reorder for greedy sampling\n            past_key_values_sa = self.past_key_values_sa\n            past_key_values_ca = self.past_key_values_ca\n\n        # The cache is stored in a flatten form. We order the cache per layer before passing it to the decoder. \n        # Each layer has 4 tensors, so we group by 4. \n        past_key_values = [[*past_key_values_sa[i*2:i*2+2], *past_key_values_ca[i*2:i*2+2]] for i in range(0, int(len(past_key_values_ca)/2))]\n\n        decoder_output = self.decoder(\n            input_ids=input_ids,\n            attention_mask=decoder_attention_mask,\n            past_key_values=past_key_values,\n            encoder_hidden_states=encoder_hidden_states,\n            encoder_attention_mask=encoder_attention_mask,\n            use_cache=True,\n            output_attentions=False,\n            output_hidden_states=False)\n\n        last_hidden_state = decoder_output['last_hidden_state']\n        past_key_values = decoder_output['past_key_values']\n\n        if self.config.tie_word_embeddings:\n            last_hidden_state = last_hidden_state * (self.model_dim**-0.5)\n\n        lm_logits = self.lm_head(last_hidden_state)\n\n        past_key_values_sa, past_key_values_ca = self.update_past(past_key_values)\n\n        # We flatten the cache to a single array. 
This is required for the input output aliasing to work\n        past_key_values_sa = [vec for kv_per_layer in past_key_values_sa for vec in kv_per_layer]\n        past_key_values_ca = [vec for kv_per_layer in past_key_values_ca for vec in kv_per_layer]\n\n        if self.device == \"cpu\":\n            self.past_key_values_sa = past_key_values_sa\n            self.past_key_values_ca = past_key_values_ca\n\n        # Moving the topk inside \n        next_token_logits = lm_logits[:, -1, :]\n\n        if self.num_beams > 1:\n            logit_max, _ = torch.max(next_token_logits, dim=-1, keepdim=True)\n            logsumexp = torch.log(torch.exp(next_token_logits - logit_max).sum(dim=-1, keepdim=True))\n            next_token_scores = next_token_logits - logit_max - logsumexp\n            next_token_scores = next_token_scores + beam_scores[:, None].expand_as(next_token_scores)\n\n            # reshape for beam search\n            vocab_size = next_token_scores.shape[-1]\n            next_token_scores = next_token_scores.view(self.batch_size, self.num_beams * vocab_size)\n            next_token_scores = next_token_scores * 1\n\n            # Sample 2 next tokens for each beam (so we have some spare tokens and match output of beam search)\n            next_token_scores, next_tokens = torch.topk(\n                next_token_scores, 2 * self.num_beams, dim=1, largest=True, sorted=True\n            ) \n\n            next_indices = torch.div(next_tokens, vocab_size, rounding_mode=\"floor\")\n            next_tokens = next_tokens % vocab_size\n\n            return [next_token_scores, next_tokens, next_indices] + past_key_values_sa + past_key_values_ca\n        else:\n            # Greedy    \n            next_tokens = torch.argmax(next_token_logits, dim=-1)\n            return [next_tokens] + past_key_values_sa + past_key_values_ca\n    \n"
  },
  {
    "path": "src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"variable-character\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Using NeuronCore Pipeline with PyTorch\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"valued-economics\",\n   \"metadata\": {},\n   \"source\": [\n    \"In this tutorial you compile a pretrained BERT base model from HuggingFace 🤗 Transformers, using the NeuronCore Pipeline feature of the AWS Neuron SDK. You benchmark model latency of the pipeline parallel mode and compare with the usual data parallel (multi-worker) deployment.\\n\",\n    \"\\n\",\n    \"This tutorial is intended to run in an inf1.6xlarge, running the latest AWS Deep Learning AMI (DLAMI). The inf1.6xlarge instance size has AWS Inferentia chips for a total of 16 NeuronCores.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python or Conda kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\\n\",\n    \"\\n\",\n    \"> __Note:__ Do not execute this tutorial using \\\"Run -> Run all cells\\\" option.  \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"private-authentication\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"- `transformers`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. The additional HuggingFace 🤗 Transformers dependency must be installed here.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"romantic-accident\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install --upgrade \\\"transformers==4.6.0\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"prompt-australian\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compiling a BERT base model for a single NeuronCore\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"aging-biodiversity\",\n   \"metadata\": {},\n   \"source\": [\n    \"To run a HuggingFace [BERTModel](https://huggingface.co/transformers/model_doc/bert.html#bertmodel) on Inferentia, you only need to add a single extra line of code to the usual 🤗 Transformers PyTorch implementation, after importing the torch_neuron framework. \\n\",\n    \"\\n\",\n    \"Add the argument `return_dict=False` to the BERT transformers model so it can be traced with [TorchScript](https://pytorch.org/docs/stable/jit.html). TorchScript is a way to create serializable and optimizable models from PyTorch code. \\n\",\n    \"\\n\",\n    \"Enable padding to a maximum sequence length of 128, to test the model's performance with a realistic payload size. You can adapt this sequence length to your application's requirement. 
\\n\",\n    \"\\n\",\n    \"You can adapt the original example on the [BertModel forward pass docstring](https://huggingface.co/transformers/model_doc/bert.html#transformers.BertModel.forward) according to the following cell\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"stretch-preview\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuron\\n\",\n    \"from transformers import BertTokenizer, BertModel\\n\",\n    \"\\n\",\n    \"from joblib import Parallel, delayed  \\n\",\n    \"import numpy as np\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"import time \\n\",\n    \"\\n\",\n    \"\\n\",\n    \"tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\\n\",\n    \"model = BertModel.from_pretrained('bert-base-uncased',return_dict=False)\\n\",\n    \"\\n\",\n    \"inputs = tokenizer(\\\"Hello, my dog is cute\\\",return_tensors=\\\"pt\\\",max_length=128,padding='max_length',truncation=True)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"conceptual-aberdeen\",\n   \"metadata\": {},\n   \"source\": [\n    \"The one extra line required is the call to torch.neuron.trace() method. This call compiles the model and returns the forwad method of the torch `nn.Model` method, which you can use to run inference. \\n\",\n    \"\\n\",\n    \"The compiled graph can be saved using the `torch.jit.save` function and restored using `torch.jit.load` function for inference on Inf1 instances. During inference, the previously compiled artifacts will be loaded into the Neuron Runtime for inference execution.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"secondary-exclusive\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"neuron_model = torch.neuron.trace(model, \\n\",\n    \"                                  example_inputs = (inputs['input_ids'],inputs['attention_mask']),\\n\",\n    \"                                  verbose=1)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"atmospheric-stewart\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Running the BERT base model on a single NeuronCore\\n\",\n    \"With the model already available in memory, you can time one execution and check for the latency on the single inference call. You will load the model into Inferentia with a single inference call. A large \\\"wall time\\\" is expected when you first run the next cell, running the cell twice will show the actual inference latency:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"approved-reputation\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"# The following line tests inference and should be executed on Inf1 instance family. \\n\",\n    \"outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask']))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"great-collective\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can also check for the throughput of the single model running on a single NeuronCore.\\n\",\n    \"\\n\",\n    \"The sequential inference test (for loop) does not measure all the performance one can achieve in an instance with multiple NeuronCores. 
To improve hardware utilization you can run parallel inference requests over multiple model workers, which you'll test in the Data Parallel Bonus Section below.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"framed-reference\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"for _ in tqdm(range(100)):\\n\",\n    \"    outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask'])) \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"super-innocent\",\n   \"metadata\": {},\n   \"source\": [\n    \"Save the compiled model for later use:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"express-greensboro\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"neuron_model.save('bert-base-uncased-neuron.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"modified-government\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compiling a BERT base model for 16 NeuronCores\\n\",\n    \"\\n\",\n    \"Our next step is to compile the same model for all 16 NeuronCores available in the inf1.6xlarge and check the performance difference when running pipeline parallel inferences. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"compound-initial\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuron\\n\",\n    \"from transformers import BertTokenizer, BertModel\\n\",\n    \"\\n\",\n    \"from joblib import Parallel, delayed  \\n\",\n    \"import numpy as np\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"import time \\n\",\n    \"\\n\",\n    \"\\n\",\n    \"tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\\n\",\n    \"model = BertModel.from_pretrained('bert-base-uncased',return_dict=False)\\n\",\n    \"\\n\",\n    \"inputs = tokenizer(\\\"Hello, my dog is cute\\\",return_tensors=\\\"pt\\\",max_length=128,padding='max_length',truncation=True)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"universal-desperate\",\n   \"metadata\": {},\n   \"source\": [\n    \"To enable pipeline mode during compilation, you only need to add the compiler flag `--neuroncore-pipeline-cores` and set the number of desired cores. The cell below sets up a `neuroncore_pipeline_cores` variable, which you can set to the number of NeuronCores available on the instance: _inf1.6xlarge_ has 16 NeuronCores in 4 Inferentia chips. 
\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"passing-masters\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Number of Cores in the Pipeline Mode\\n\",\n    \"neuroncore_pipeline_cores = 16 # This string should be '4' on an inf1.xlarge\\n\",\n    \"\\n\",\n    \"# Compiling for neuroncore-pipeline-cores='16'\\n\",\n    \"neuron_pipeline_model = torch.neuron.trace(model,\\n\",\n    \"                                           example_inputs = (inputs['input_ids'],inputs['attention_mask']),\\n\",\n    \"                                           verbose=1,\\n\",\n    \"                                           compiler_args = ['--neuroncore-pipeline-cores', str(neuroncore_pipeline_cores)]\\n\",\n    \"                                          )\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"enhanced-swedish\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Running the BERT base model on 16 NeuronCores\\n\",\n    \"Next, time one execution and check for the latency on the single inference call over 16 cores. You will load the model into Inferentia with a single inference call. A large \\\"wall time\\\" is expected when you first run the next cell, running the cell twice will show the actual inference latency:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"expressed-trinity\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%time\\n\",\n    \"# The following line tests inference and should be executed on Inf1 instance family. \\n\",\n    \"outputs = neuron_pipeline_model(*(inputs['input_ids'],inputs['attention_mask']))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"located-graphic\",\n   \"metadata\": {},\n   \"source\": [\n    \"Check also for the throughput of the single model running over a 16 NeuronCores. \\n\",\n    \"\\n\",\n    \"The sequential inference test (for loop) does not measure all the performance one can achieve with Pipeline mode. As the inference runs in streaming fashion, at least 15 cores are waiting for a new call until the last one processes the first call. This results in low NeuronCore utilization. To improve hardware utilization you will require parallel inference requests, which you'll test in the next section.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"hydraulic-calcium\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"for _ in tqdm(range(100)):\\n\",\n    \"    outputs = neuron_pipeline_model(*(inputs['input_ids'],inputs['attention_mask']))\\n\",\n    \"    \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"patent-victoria\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Load Testing the Pipeline Parallel Mode\\n\",\n    \"\\n\",\n    \"To put the 16 NeuronCores group to test, a client has to run concurrent requests to the model. In this Notebook setup you achieve it by creating a thread pool with `Joblib.Parallel`, with all workers on the pool runing one inference call. 
\\n\",\n    \"\\n\",\n    \"You can define a new method called `inference_latency()` so that you measure the amount of time each inference calls take.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"appointed-adventure\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def inference_latency(model,*inputs):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    infetence_time is a simple method to return the latency of a model inference.\\n\",\n    \"        \\n\",\n    \"        Parameters:\\n\",\n    \"            model: torch model onbject loaded using torch.jit.load\\n\",\n    \"            inputs: model() args\\n\",\n    \"        \\n\",\n    \"        Returns:\\n\",\n    \"            latency in seconds\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    start = time.time()\\n\",\n    \"    _ = model(*inputs)\\n\",\n    \"    return time.time() - start\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"environmental-guinea\",\n   \"metadata\": {},\n   \"source\": [\n    \"Use `tqdm` to measure total throughput of your experiment, with a nice side-effect of \\\"cool progress bar!\\\". The total throughput is expected to be high, so set your experiment range to a large number, here 30k inferences. \\n\",\n    \"\\n\",\n    \"To calculate the latency statistics over the returned 30k list of latencies use `numpy.qunatile()` method.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"played-catch\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"t = tqdm(range(30000), position=0, leave=True)\\n\",\n    \"latency = Parallel(n_jobs=12,prefer=\\\"threads\\\")(delayed(inference_latency)(neuron_pipeline_model,*(inputs['input_ids'],inputs['attention_mask'])) for i in t)\\n\",\n    \"\\n\",\n    \"p50 = np.quantile(latency[-10000:],0.50) * 1000\\n\",\n    \"p95 = np.quantile(latency[-10000:],0.95) * 1000\\n\",\n    \"p99 = np.quantile(latency[-10000:],0.99) * 1000\\n\",\n    \"avg_throughput = t.total/t.format_dict['elapsed']\\n\",\n    \"print(f'Avg Throughput: :{avg_throughput:.1f}')\\n\",\n    \"print(f'50th Percentile Latency:{p50:.1f} ms')\\n\",\n    \"print(f'95th Percentile Latency:{p95:.1f} ms')\\n\",\n    \"print(f'99th Percentile Latency:{p99:.1f} ms')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"exposed-northern\",\n   \"metadata\": {},\n   \"source\": [\n    \"Save compile model for later use:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"imperial-complex\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Save the TorchScript graph\\n\",\n    \"neuron_pipeline_model.save('bert-base-uncased-neuron-pipeline.pt')\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"abroad-earthquake\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Bonus Section - Load Testing Data Parallel Mode\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"therapeutic-detector\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuron\\n\",\n    \"from transformers import BertTokenizer \\n\",\n    \"\\n\",\n    \"from joblib import Parallel, delayed  \\n\",\n    \"import numpy as np\\n\",\n    \"from tqdm import tqdm\\n\",\n    \"\\n\",\n    \"import os\\n\",\n    \"import time \\n\",\n    \"\\n\",\n    \"def inference_latency(model,*inputs):\\n\",\n    \"    
\\\"\\\"\\\"\\n\",\n    \"    infetence_time is a simple method to return the latency of a model inference.\\n\",\n    \"        \\n\",\n    \"        Parameters:\\n\",\n    \"            model: torch model onbject loaded using torch.jit.load\\n\",\n    \"            inputs: model() args\\n\",\n    \"        \\n\",\n    \"        Returns:\\n\",\n    \"            latency in seconds\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    start = time.time()\\n\",\n    \"    _ = model(*inputs)\\n\",\n    \"    return time.time() - start\\n\",\n    \"\\n\",\n    \"tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\\n\",\n    \"\\n\",\n    \"inputs = tokenizer(\\\"Hello, my dog is cute\\\",return_tensors=\\\"pt\\\",max_length=128,padding='max_length',truncation=True)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"legal-terrorist\",\n   \"metadata\": {},\n   \"source\": [\n    \"You use the `'NEURON_RT_NUM_CORES'` environment variable to define how many Neuron cores to be used. Set the environment variable to the number of individual workers you want to test in parallel.\\n\",\n    \"\\n\",\n    \"`torch_neuron` will load one model per NeuronCore group until it runs out of cores. At that point, if the Python process continues to spawn more model objest using `torch.jit.load`, `torch_neuron` will start stacking more than one model per core, until the Inferentia chip memory is full. \\n\",\n    \"\\n\",\n    \"Inferentia is able to run inference over all the loaded models, but only one at a time. The Neuron Runtime takes care of dynamically switching the model context as requests come in, no extra worker process management required. Use 1 model per NeuronCore to achieve maximum performance.\\n\",\n    \"\\n\",\n    \"The following cell creates a list with as many models as NeuronCore Groups and execute one single dummy inference to load the models into Inferentia. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"current-mechanics\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import warnings\\n\",\n    \"# Number of data parallel workers\\n\",\n    \"number_of_workers=16 # This number should be 4 on an inf1.xlarge\\n\",\n    \"\\n\",\n    \"# Setting up a data parallel group\\n\",\n    \"os.environ['NEURON_RT_NUM_CORES'] = str(number_of_workers)\\n\",\n    \"\\n\",\n    \"# Loading 'number_of_workers' amount of models in Python memory\\n\",\n    \"model_list = [torch.jit.load('bert-base-uncased-neuron.pt') for _ in range(number_of_workers)]\\n\",\n    \"\\n\",\n    \"# Dummy inference to load models to Inferentia\\n\",\n    \"_ = [mod(*(inputs['input_ids'],inputs['attention_mask'])) for mod in model_list]\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"threatened-swaziland\",\n   \"metadata\": {},\n   \"source\": [\n    \"Adapt the call to `joblib.Parallel()` iterating over a concatenated version of the `model_list`, to run 'round-robin' calls to each of the model workers.  
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fleet-month\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"t = tqdm(model_list*1500,position=0, leave=True)\\n\",\n    \"latency = Parallel(n_jobs=number_of_workers,prefer=\\\"threads\\\")(delayed(inference_latency)(mod,*(inputs['input_ids'],inputs['attention_mask'])) for mod in t)\\n\",\n    \"\\n\",\n    \"p50 = np.quantile(latency[-10000:],0.50) * 1000\\n\",\n    \"p95 = np.quantile(latency[-10000:],0.95) * 1000\\n\",\n    \"p99 = np.quantile(latency[-10000:],0.99) * 1000\\n\",\n    \"avg_throughput = t.total/t.format_dict['elapsed']\\n\",\n    \"print(f'Avg Throughput: :{avg_throughput:.1f}')\\n\",\n    \"print(f'50th Percentile Latency:{p50:.1f} ms')\\n\",\n    \"print(f'95th Percentile Latency:{p95:.1f} ms')\\n\",\n    \"print(f'99th Percentile Latency:{p99:.1f} ms')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"aggressive-stevens\",\n   \"metadata\": {},\n   \"source\": [\n    \"For this model, despite the larger number of workers, the per-worker latency increases when running a single model per core, which in turn reduces the total throughput. \\n\",\n    \"\\n\",\n    \"This behavior may not repeat if the model memory footprint or the input payload size changes, i.e batch size > 1. We encourage you to experiment with the data parallel and pipeline parallel modes to optimize your application performance. \"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Environment (conda_aws_neuron_pytorch_p36)\",\n   \"language\": \"python\",\n   \"name\": \"conda_aws_neuron_pytorch_p36\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/pytorch/resnet50.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# ResNet50 model for Inferentia\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"## Introduction:\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy a ResNet50 model for inference on Inferentia. \\n\",\n    \"\\n\",\n    \"This Jupyter notebook should run on an inf1.6xlarge instance. The inference part of this tutorial requires an inf1 instance, not the compilation stage. For simplicity we will run this tutorial on an inf1.6xlarge, but in real life scenarios the compilation should be done on a compute instance and the deployment on an inf1 instance to save costs. \\n\",\n    \"\\n\",\n    \"In this tutorial we provide three main sections:\\n\",\n    \"\\n\",\n    \"1. Compile the ResNet50 model and infer with a batch size of 1\\n\",\n    \"\\n\",\n    \"2. Run the same compiled model on multiple NeuronCores using `torch.neuron.DataParallel` and dynamic batching\\n\",\n    \"\\n\",\n    \"3. Compile the ResNet50 model with a batch size of 5 and run it on multiple NeuronCores using `torch.neuron.DataParallel` for optimal performance on Inferentia\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch>=1.8`\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `torchvision`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"\\n\",\n    \"These will be installed by default when configuring your environment using the Neuron PyTorch setup guide.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile model for Neuron\\n\",\n    \"\\n\",\n    \"The following step will compile the ResNet50 model for Inferentia. This will take a few minutes. 
At the end of script execution, the compiled model is saved as `resnet50_neuron.pt` in your local directory\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"from torchvision import models, transforms, datasets\\n\",\n    \"import torch_neuron\\n\",\n    \"\\n\",\n    \"# Create an example input for compilation\\n\",\n    \"image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)\\n\",\n    \"\\n\",\n    \"# Load a pretrained ResNet50 model\\n\",\n    \"model = models.resnet50(pretrained=True)\\n\",\n    \"\\n\",\n    \"# Tell the model we are using it for evaluation (not training)\\n\",\n    \"model.eval()\\n\",\n    \"\\n\",\n    \"# Analyze the model - this will show operator support and operator count\\n\",\n    \"torch.neuron.analyze_model(model, example_inputs=[image])\\n\",\n    \"\\n\",\n    \"# Compile the model using torch.neuron.trace to create a Neuron model\\n\",\n    \"# that that is optimized for the Inferentia hardware\\n\",\n    \"model_neuron = torch.neuron.trace(model, example_inputs=[image])\\n\",\n    \"\\n\",\n    \"# The output of the compilation step will report the percentage of operators that \\n\",\n    \"# are compiled to Neuron, for example:\\n\",\n    \"#\\n\",\n    \"# INFO:Neuron:The neuron partitioner created 1 sub-graphs\\n\",\n    \"# INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%\\n\",\n    \"# \\n\",\n    \"# We will also be warned if there are operators that are not placed on the Inferentia hardware\\n\",\n    \"\\n\",\n    \"# Save the compiled model\\n\",\n    \"model_neuron.save(\\\"resnet50_neuron.pt\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference on Inferentia\\n\",\n    \"\\n\",\n    \"We can use the compiled Neuron model to run inference on Inferentia.\\n\",\n    \"\\n\",\n    \"In the following example, we preprocess a sample image for inference using the CPU model and Neuron model. 
We compare the predicted labels from the CPU model and Neuron model to verify that they are the same.\\n\",\n    \"\\n\",\n    \"Important: Do not perform inference with a Neuron traced model on a non-Neuron supported instance, as the results will not be calculated properly.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Define a preprocessing function\\n\",\n    \"\\n\",\n    \"We define a basic image preprocessing function that loads a sample image and labels, normalizes and batches the image, and transforms the image into a tensor for inference using the compiled Neuron model.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"import os\\n\",\n    \"from urllib import request\\n\",\n    \"\\n\",\n    \"# Create an image directory containing a sample image of a small kitten\\n\",\n    \"os.makedirs(\\\"./torch_neuron_test/images\\\", exist_ok=True)\\n\",\n    \"request.urlretrieve(\\\"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\\\",\\n\",\n    \"                    \\\"./torch_neuron_test/images/kitten_small.jpg\\\")\\n\",\n    \"\\n\",\n    \"# Fetch labels to output the top classifications\\n\",\n    \"request.urlretrieve(\\\"https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\\\",\\\"imagenet_class_index.json\\\")\\n\",\n    \"idx2label = []\\n\",\n    \"\\n\",\n    \"# Read the labels and create a list to hold them for classification \\n\",\n    \"with open(\\\"imagenet_class_index.json\\\", \\\"r\\\") as read_file:\\n\",\n    \"    class_idx = json.load(read_file)\\n\",\n    \"    idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"def preprocess(batch_size=1, num_neuron_cores=1):\\n\",\n    \"    # Define a normalization function using the ImageNet mean and standard deviation\\n\",\n    \"    normalize = transforms.Normalize(\\n\",\n    \"        mean=[0.485, 0.456, 0.406],\\n\",\n    \"        std=[0.229, 0.224, 0.225])\\n\",\n    \"\\n\",\n    \"    # Resize the sample image to [1, 3, 224, 224], normalize it, and turn it into a tensor\\n\",\n    \"    eval_dataset = datasets.ImageFolder(\\n\",\n    \"        os.path.dirname(\\\"./torch_neuron_test/\\\"),\\n\",\n    \"        transforms.Compose([\\n\",\n    \"        transforms.Resize([224, 224]),\\n\",\n    \"        transforms.ToTensor(),\\n\",\n    \"        normalize,\\n\",\n    \"        ])\\n\",\n    \"    )\\n\",\n    \"    image, _ = eval_dataset[0]\\n\",\n    \"    image = torch.tensor(image.numpy()[np.newaxis, ...])\\n\",\n    \"\\n\",\n    \"    # Create a \\\"batched\\\" image with enough images to go on each of the available NeuronCores\\n\",\n    \"    # batch_size is the per-core batch size\\n\",\n    \"    # num_neuron_cores is the number of NeuronCores being used\\n\",\n    \"    batch_image = image\\n\",\n    \"    for i in range(batch_size * num_neuron_cores - 1):\\n\",\n    \"        batch_image = torch.cat([batch_image, image], 0)\\n\",\n    \"     \\n\",\n    \"    return batch_image\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Run inference using the Neuron model\\n\",\n    \"\\n\",\n    
\"We import the necessary python modules, load the torch-neuron compiled model, and run inference on Inferentia. \\n\",\n    \"\\n\",\n    \"By default, the Neuron model will run on a single NeuronCore. In the next section, we will see how to run the Neuron model on multiple NeuronCores to fully saturate our hardware for optimal performance on Inferentia. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"from torchvision import models, transforms, datasets\\n\",\n    \"import torch_neuron\\n\",\n    \"\\n\",\n    \"# Get a sample image\\n\",\n    \"image = preprocess()\\n\",\n    \"\\n\",\n    \"# Run inference using the CPU model\\n\",\n    \"output_cpu = model(image)\\n\",\n    \"\\n\",\n    \"# Load the compiled Neuron model\\n\",\n    \"model_neuron = torch.jit.load('resnet50_neuron.pt')\\n\",\n    \"\\n\",\n    \"# Run inference using the Neuron model\\n\",\n    \"output_neuron = model_neuron(image)\\n\",\n    \"\\n\",\n    \"# Verify that the CPU and Neuron predictions are the same by comparing\\n\",\n    \"# the top-5 results\\n\",\n    \"top5_cpu = output_cpu[0].sort()[1][-5:]\\n\",\n    \"top5_neuron = output_neuron[0].sort()[1][-5:]\\n\",\n    \"\\n\",\n    \"# Lookup and print the top-5 labels\\n\",\n    \"top5_labels_cpu = [idx2label[idx] for idx in top5_cpu]\\n\",\n    \"top5_labels_neuron = [idx2label[idx] for idx in top5_neuron]\\n\",\n    \"print(\\\"CPU top-5 labels: {}\\\".format(top5_labels_cpu))\\n\",\n    \"print(\\\"Neuron top-5 labels: {}\\\".format(top5_labels_neuron))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run Inference using torch.neuron.DataParallel\\n\",\n    \"\\n\",\n    \"To fully leverage the Inferentia hardware we want to use all avaialable NeuronCores. An inf1.xlarge and inf1.2xlarge have four NeuronCores, an inf1.6xlarge has 16 NeuronCores, and an inf1.24xlarge has 64 NeuronCores. For maximum performance on Inferentia hardware, we can use `torch.neuron.DataParallel` to utilize all available NeuronCores.\\n\",\n    \"\\n\",\n    \"`torch.neuron.DataParallel` implements data parallelism at the module level by duplicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.\\n\",\n    \"\\n\",\n    \"In the following section, we will run inference using the `torch.neuron.DataParallel` module to fully saturate the Inferentia hardware. We benchmark the model to collect throughput and latency statistics.\\n\",\n    \"\\n\",\n    \"Note: `torch.neuron.DataParallel` is new with Neuron 1.16.0. Please ensure you are using the latest Neuron package to run the following sections. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Define a benchmarking function\\n\",\n    \"\\n\",\n    \"We create a function that handles benchmarking the Neuron model to collect throughput and latency metrics. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from time import time\\n\",\n    \"\\n\",\n    \"def benchmark(model, image):\\n\",\n    \"    print('Input image shape is {}'.format(list(image.shape)))\\n\",\n    \"    \\n\",\n    \"    # The first inference loads the model so exclude it from timing \\n\",\n    \"    results = model(image)\\n\",\n    \"    \\n\",\n    \"    # Collect throughput and latency metrics\\n\",\n    \"    latency = []\\n\",\n    \"    throughput = []\\n\",\n    \"\\n\",\n    \"    # Run inference for 100 iterations and calculate metrics\\n\",\n    \"    num_infers = 100\\n\",\n    \"    for _ in range(num_infers):\\n\",\n    \"        delta_start = time()\\n\",\n    \"        results = model(image)\\n\",\n    \"        delta = time() - delta_start\\n\",\n    \"        latency.append(delta)\\n\",\n    \"        throughput.append(image.size(0)/delta)\\n\",\n    \"    \\n\",\n    \"    # Calculate and print the model throughput and latency\\n\",\n    \"    print(\\\"Avg. Throughput: {:.0f}, Max Throughput: {:.0f}\\\".format(np.mean(throughput), np.max(throughput)))\\n\",\n    \"    print(\\\"Latency P50: {:.0f}\\\".format(np.percentile(latency, 50)*1000.0))\\n\",\n    \"    print(\\\"Latency P90: {:.0f}\\\".format(np.percentile(latency, 90)*1000.0))\\n\",\n    \"    print(\\\"Latency P95: {:.0f}\\\".format(np.percentile(latency, 95)*1000.0))\\n\",\n    \"    print(\\\"Latency P99: {:.0f}\\\\n\\\".format(np.percentile(latency, 99)*1000.0))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Run Inference using torch.neuron.DataParallel\\n\",\n    \"\\n\",\n    \"We create the `torch.neuron.DataParallel` module using the compiled Neuron model, get a sample image, and benchmark the parallelized model on Neuron.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Create a torch.neuron.DataParallel module using the compiled Neuron model\\n\",\n    \"# By default, torch.neuron.DataParallel will use four cores on an inf1.xlarge\\n\",\n    \"# or inf1.2xlarge, 16 cores on an inf1.6xlarge, and 24 cores on an inf1.24xlarge\\n\",\n    \"model_neuron_parallel = torch.neuron.DataParallel(model_neuron)\\n\",\n    \"\\n\",\n    \"# Get sample image with batch size=1 per NeuronCore\\n\",\n    \"batch_size = 1\\n\",\n    \"\\n\",\n    \"# For an inf1.xlarge or inf1.2xlarge, set num_neuron_cores = 4\\n\",\n    \"num_neuron_cores = 16\\n\",\n    \"\\n\",\n    \"image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\\n\",\n    \"\\n\",\n    \"# Benchmark the model\\n\",\n    \"benchmark(model_neuron_parallel, image)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference with dynamic batch sizes\\n\",\n    \"\\n\",\n    \"Batch size has a direct impact on model performance. The Inferentia chip is optimized to run with small batch sizes. 
This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\\n\",\n    \"\\n\",\n    \"As a general best practice, we recommend optimizing your model's throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Inferentia.\\n\",\n    \"\\n\",\n    \"Dynamic batching is a feature that allows you to use tensor batch sizes that the Neuron model was not originally compiled against. This is necessary because the underlying Inferentia hardware will always execute inferences with the batch size used during compilation. Fixed batch size execution allows tuning the input batch size for optimal performance. For example, batch size 1 may be best suited for an ultra-low latency on-demand inference application, while batch size > 1 can be used to maximize throughput for offline inferencing. Dynamic batching is implemented by slicing large input tensors into chunks that match the batch size used during the `torch.neuron.trace` compilation call. \\n\",\n    \"\\n\",\n    \"The `torch.neuron.DataParallel` class automatically enables dynamic batching on eligible models. This allows us to run inference in applications that have inputs with a variable batch size without needing to recompile the model.\\n\",\n    \"\\n\",\n    \"In the following example, we use the same `torch.neuron.DataParallel` module to run inference using several different batch sizes. Notice that latency increases consistently as the batch size increases. Throughput increases as well, up until a certain point where the input size becomes too large to be efficient.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# using the same DataParallel model_neuron_parallel model, we can run\\n\",\n    \"# inference on inputs with a variable batch size without recompiling\\n\",\n    \"batch_sizes = [2, 3, 4, 5, 6, 7]\\n\",\n    \"for batch_size in batch_sizes:\\n\",\n    \"    print('Batch size: {}'.format(batch_size))\\n\",\n    \"    image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\\n\",\n    \"    \\n\",\n    \"    # Benchmark the model for each input batch size\\n\",\n    \"    benchmark(model_neuron_parallel, image)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile and Infer with different batch sizes on multiple NeuronCores\\n\",\n    \"\\n\",\n    \"Dynamic batching using small batch sizes can result in sub-optimal throughput because it involves slicing tensors into chunks and iteratively sending data to the hardware. Using a larger batch size at compilation time can use the Inferentia hardware more efficiently in order to maximize throughput. 
You can test the tradeoff between individual request latency and total throughput by fine-tuning the input batch size.\\n\",\n    \"\\n\",\n    \"In the following example, we recompile our model using a batch size of 5 and run the model using `torch.neuron.DataParallel` to fully saturate our Inferentia hardware for optimal performance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Create an input with batch size 5 for compilation\\n\",\n    \"batch_size = 5\\n\",\n    \"image = torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\\n\",\n    \"\\n\",\n    \"# Recompile the ResNet50 model for inference with batch size 5\\n\",\n    \"model_neuron = torch.neuron.trace(model, example_inputs=[image])\\n\",\n    \"\\n\",\n    \"# Export to saved model\\n\",\n    \"model_neuron.save(\\\"resnet50_neuron_b{}.pt\\\".format(batch_size))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run inference with batch size of 5 using the Neuron model compiled for a batch size of 5.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"batch_size = 5\\n\",\n    \"\\n\",\n    \"# Load compiled Neuron model\\n\",\n    \"model_neuron = torch.jit.load(\\\"resnet50_neuron_b{}.pt\\\".format(batch_size))\\n\",\n    \"\\n\",\n    \"# Create DataParallel model\\n\",\n    \"model_neuron_parallel = torch.neuron.DataParallel(model_neuron)\\n\",\n    \"\\n\",\n    \"# Get sample image with batch size=5\\n\",\n    \"image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\\n\",\n    \"\\n\",\n    \"# Benchmark the model\\n\",\n    \"benchmark(model_neuron_parallel, image)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can experiment with different batch size values to see what gives the best overall throughput on Inferentia.\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/pytorch/resnet50_partition.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Manual Partitioning Tutorial\\n\",\n    \"\\n\",\n    \"In this tutorial we will run through how to manually partition a graph.  There are six steps:\\n\",\n    \"\\n\",\n    \"1. Import ResNet50 code from torchvision and set to evaluation mode\\n\",\n    \"1. Download a test image and preprocess it\\n\",\n    \"1. Run inference on CPU as a baseline\\n\",\n    \"1. Manually partition the graph using Neuron\\n\",\n    \"1. Save the model to be loaded on another instance\\n\",\n    \"1. Inspect the graph to deepen our understanding\\n\",\n    \"\\n\",\n    \"The following is a ResNet50 implementation copied from `torchvision.models.resnet`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 1:** Import torchvision ResNet50 and run the model on CPU\\n\",\n    \"\\n\",\n    \"Note that training code can be inserted before `model.eval()` if retraining/fine-tuning is necessary.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch.nn as nn\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):\\n\",\n    \"    \\\"\\\"\\\"3x3 convolution with padding\\\"\\\"\\\"\\n\",\n    \"    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\\n\",\n    \"                     padding=dilation, groups=groups, bias=False, dilation=dilation)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def conv1x1(in_planes, out_planes, stride=1):\\n\",\n    \"    \\\"\\\"\\\"1x1 convolution\\\"\\\"\\\"\\n\",\n    \"    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Bottleneck(nn.Module):\\n\",\n    \"    expansion = 4\\n\",\n    \"\\n\",\n    \"    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,\\n\",\n    \"                 base_width=64, dilation=1, norm_layer=None):\\n\",\n    \"        super(Bottleneck, self).__init__()\\n\",\n    \"        if norm_layer is None:\\n\",\n    \"            norm_layer = nn.BatchNorm2d\\n\",\n    \"        width = int(planes * (base_width / 64.)) * groups\\n\",\n    \"        # Both self.conv2 and self.downsample layers downsample the input when stride != 1\\n\",\n    \"        self.conv1 = conv1x1(inplanes, width)\\n\",\n    \"        self.bn1 = norm_layer(width)\\n\",\n    \"        self.conv2 = conv3x3(width, width, stride, groups, dilation)\\n\",\n    \"        self.bn2 = norm_layer(width)\\n\",\n    \"        self.conv3 = conv1x1(width, planes * self.expansion)\\n\",\n    \"        self.bn3 = norm_layer(planes * self.expansion)\\n\",\n    \"        self.relu = nn.ReLU(inplace=True)\\n\",\n    \"        self.downsample = downsample\\n\",\n    \"        self.stride = stride\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        identity = x\\n\",\n    \"\\n\",\n    \"        out = self.conv1(x)\\n\",\n    \"        out = self.bn1(out)\\n\",\n    \"        out = self.relu(out)\\n\",\n    \"\\n\",\n    \"        out = self.conv2(out)\\n\",\n    \"        out = self.bn2(out)\\n\",\n    \"        out = self.relu(out)\\n\",\n    \"\\n\",\n    \"        out = self.conv3(out)\\n\",\n    \"        out = self.bn3(out)\\n\",\n    \"\\n\",\n    \"        if self.downsample is not None:\\n\",\n    \"           
 identity = self.downsample(x)\\n\",\n    \"\\n\",\n    \"        out += identity\\n\",\n    \"        out = self.relu(out)\\n\",\n    \"\\n\",\n    \"        return out\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class ResNet(nn.Module):\\n\",\n    \"\\n\",\n    \"    def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,\\n\",\n    \"                 groups=1, width_per_group=64, replace_stride_with_dilation=None,\\n\",\n    \"                 norm_layer=None):\\n\",\n    \"        super(ResNet, self).__init__()\\n\",\n    \"        if norm_layer is None:\\n\",\n    \"            norm_layer = nn.BatchNorm2d\\n\",\n    \"        self._norm_layer = norm_layer\\n\",\n    \"\\n\",\n    \"        self.inplanes = 64\\n\",\n    \"        self.dilation = 1\\n\",\n    \"        if replace_stride_with_dilation is None:\\n\",\n    \"            # each element in the tuple indicates if we should replace\\n\",\n    \"            # the 2x2 stride with a dilated convolution instead\\n\",\n    \"            replace_stride_with_dilation = [False, False, False]\\n\",\n    \"        if len(replace_stride_with_dilation) != 3:\\n\",\n    \"            raise ValueError(\\\"replace_stride_with_dilation should be None \\\"\\n\",\n    \"                             \\\"or a 3-element tuple, got {}\\\".format(replace_stride_with_dilation))\\n\",\n    \"        self.groups = groups\\n\",\n    \"        self.base_width = width_per_group\\n\",\n    \"        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,\\n\",\n    \"                               bias=False)\\n\",\n    \"        self.bn1 = norm_layer(self.inplanes)\\n\",\n    \"        self.relu = nn.ReLU(inplace=True)\\n\",\n    \"        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\\n\",\n    \"        self.layer1 = self._make_layer(block, 64, layers[0])\\n\",\n    \"        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,\\n\",\n    \"                                       dilate=replace_stride_with_dilation[0])\\n\",\n    \"        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,\\n\",\n    \"                                       dilate=replace_stride_with_dilation[1])\\n\",\n    \"        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,\\n\",\n    \"                                       dilate=replace_stride_with_dilation[2])\\n\",\n    \"        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))\\n\",\n    \"        self.fc = nn.Linear(512 * block.expansion, num_classes)\\n\",\n    \"\\n\",\n    \"        for m in self.modules():\\n\",\n    \"            if isinstance(m, nn.Conv2d):\\n\",\n    \"                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\\n\",\n    \"            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):\\n\",\n    \"                nn.init.constant_(m.weight, 1)\\n\",\n    \"                nn.init.constant_(m.bias, 0)\\n\",\n    \"\\n\",\n    \"        # Zero-initialize the last BN in each residual branch,\\n\",\n    \"        # so that the residual branch starts with zeros, and each residual block behaves like an identity.\\n\",\n    \"        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677\\n\",\n    \"        if zero_init_residual:\\n\",\n    \"            for m in self.modules():\\n\",\n    \"                if isinstance(m, Bottleneck):\\n\",\n    \"                    nn.init.constant_(m.bn3.weight, 0)\\n\",\n    \"                elif isinstance(m, 
BasicBlock):\\n\",\n    \"                    nn.init.constant_(m.bn2.weight, 0)\\n\",\n    \"\\n\",\n    \"    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):\\n\",\n    \"        norm_layer = self._norm_layer\\n\",\n    \"        downsample = None\\n\",\n    \"        previous_dilation = self.dilation\\n\",\n    \"        if dilate:\\n\",\n    \"            self.dilation *= stride\\n\",\n    \"            stride = 1\\n\",\n    \"        if stride != 1 or self.inplanes != planes * block.expansion:\\n\",\n    \"            downsample = nn.Sequential(\\n\",\n    \"                conv1x1(self.inplanes, planes * block.expansion, stride),\\n\",\n    \"                norm_layer(planes * block.expansion),\\n\",\n    \"            )\\n\",\n    \"\\n\",\n    \"        layers = []\\n\",\n    \"        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,\\n\",\n    \"                            self.base_width, previous_dilation, norm_layer))\\n\",\n    \"        self.inplanes = planes * block.expansion\\n\",\n    \"        for _ in range(1, blocks):\\n\",\n    \"            layers.append(block(self.inplanes, planes, groups=self.groups,\\n\",\n    \"                                base_width=self.base_width, dilation=self.dilation,\\n\",\n    \"                                norm_layer=norm_layer))\\n\",\n    \"\\n\",\n    \"        return nn.Sequential(*layers)\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        x = self.conv1(x)\\n\",\n    \"        x = self.bn1(x)\\n\",\n    \"        x = self.relu(x)\\n\",\n    \"        x = self.maxpool(x)\\n\",\n    \"\\n\",\n    \"        x = self.layer1(x)\\n\",\n    \"        x = self.layer2(x)\\n\",\n    \"        x = self.layer3(x)\\n\",\n    \"        x = self.layer4(x)\\n\",\n    \"\\n\",\n    \"        x = self.avgpool(x)\\n\",\n    \"        x = torch.flatten(x, 1)\\n\",\n    \"        x = self.fc(x)\\n\",\n    \"\\n\",\n    \"        return x\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from torch.utils.model_zoo import load_url as load_state_dict_from_url\\n\",\n    \"\\n\",\n    \"model = ResNet(Bottleneck, [3, 4, 6, 3])\\n\",\n    \"state_dict = load_state_dict_from_url('https://download.pytorch.org/models/resnet50-19c8e357.pth', progress=True)\\n\",\n    \"model.load_state_dict(state_dict)\\n\",\n    \"# you can do some training here, before calling model.eval()\\n\",\n    \"model.eval()\\n\",\n    \"print('ResNet50 model is turned into inference mode')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 2:** Download a cat image and preprocess it\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import numpy as np\\n\",\n    \"from torchvision import transforms, datasets\\n\",\n    \"from tensorflow.keras.applications import resnet50\\n\",\n    \"import urllib.request\\n\",\n    \"\\n\",\n    \"imagedir = './images'\\n\",\n    \"os.makedirs(imagedir, exist_ok=True)\\n\",\n    \"urllib.request.urlretrieve(\\n\",\n    \"    'https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg',\\n\",\n    \"    os.path.join(imagedir, 'kitten_small.jpg'),\\n\",\n    \")\\n\",\n    \"normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\\n\",\n    \"                        
         std=[0.229, 0.224, 0.225])\\n\",\n    \"eval_dataset = datasets.ImageFolder(\\n\",\n    \"    '.',\\n\",\n    \"    transforms.Compose([\\n\",\n    \"        transforms.Resize([224, 224]),\\n\",\n    \"        transforms.ToTensor(),\\n\",\n    \"        normalize,\\n\",\n    \"    ])\\n\",\n    \")\\n\",\n    \"image, label = eval_dataset[0]\\n\",\n    \"image = torch.tensor(image.numpy()[np.newaxis, ...])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 3:** Run inference without neuron for comparison\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"print('model inference result:')\\n\",\n    \"print(resnet50.decode_predictions(model(image).detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"STEP 4: Run the same inference using torch.jit.trace - then we can save and load the model\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"jit_trace = torch.jit.trace(model, example_inputs=image)\\n\",\n    \"print('jit.trace inference result:')\\n\",\n    \"print(resnet50.decode_predictions(jit_trace(image).detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"jit_trace_filename = 'resnet50_jit_trace.pt'\\n\",\n    \"jit_trace.save(jit_trace_filename)\\n\",\n    \"jit_trace_loaded = torch.jit.load(jit_trace_filename)\\n\",\n    \"print('loaded jit.trace inferenced result:')\\n\",\n    \"print(resnet50.decode_predictions(jit_trace_loaded(image).detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 4:** Manually partition the ResNet50 model and execute it\\n\",\n    \"\\n\",\n    \"To generate a Neuron-optimized TorchScript with only layers 1~4 placed on Neuron runtime, we first define a new module class `ResNetNeuron` inheriting from `ResNet`. 
We add `torch.neuron.trace` calls in the `trace` method of this module in order to turn layer submodules into Neuron-optimized ones.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch.neuron\\n\",\n    \"\\n\",\n    \"class ResNetNeuron(ResNet):\\n\",\n    \"\\n\",\n    \"    def trace(self, x):\\n\",\n    \"        x = self.conv1(x)\\n\",\n    \"        x = self.bn1(x)\\n\",\n    \"        x = self.relu(x)\\n\",\n    \"        x = self.maxpool(x)\\n\",\n    \"\\n\",\n    \"        self.layer1 = torch.neuron.trace(self.layer1, x, fallback=False)\\n\",\n    \"        x = self.layer1(x)\\n\",\n    \"        \\n\",\n    \"        self.layer2 = torch.neuron.trace(self.layer2, x, fallback=False)\\n\",\n    \"        x = self.layer2(x)\\n\",\n    \"        \\n\",\n    \"        self.layer3 = torch.neuron.trace(self.layer3, x, fallback=False)\\n\",\n    \"        x = self.layer3(x)\\n\",\n    \"        \\n\",\n    \"        self.layer4 = torch.neuron.trace(self.layer4, x, fallback=False)\\n\",\n    \"        \\n\",\n    \"    def forward(self, x):\\n\",\n    \"        x = self.conv1(x)\\n\",\n    \"        x = self.bn1(x)\\n\",\n    \"        x = self.relu(x)\\n\",\n    \"        x = self.maxpool(x)\\n\",\n    \"        \\n\",\n    \"        # After running ResNetNeuron::trace, these layers will be placed on Neuron\\n\",\n    \"        x = self.layer1(x)\\n\",\n    \"        x = self.layer2(x)\\n\",\n    \"        x = self.layer3(x)\\n\",\n    \"        x = self.layer4(x)\\n\",\n    \"        \\n\",\n    \"        x = self.avgpool(x)\\n\",\n    \"        x = torch.flatten(x, 1)\\n\",\n    \"        x = self.fc(x)\\n\",\n    \"\\n\",\n    \"        return x\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now construct the class and run an inference to trigger the `neuron-cc` compiler.  
Watch for the [ \\* ] icon to the left of this cell to disappear and show a number - this will take a minute or two to run\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"model_neuron = ResNetNeuron(Bottleneck, [3, 4, 6, 3])\\n\",\n    \"model_neuron.load_state_dict(state_dict)\\n\",\n    \"model_neuron.eval()\\n\",\n    \"model_neuron.trace(image) # this line triggers neuron-cc compiler\\n\",\n    \"result = model_neuron(image)\\n\",\n    \"print('Neuron optimized model inference result:')\\n\",\n    \"print(resnet50.decode_predictions(result.detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 6:** Save the model as TorchScript ready to deploy\\n\",\n    \"\\n\",\n    \"To deploy the Neuron-optimized model as TorchScript, we use `torch.jit.trace` again to generate TorchScript for the entire model, including the Neuron-optimized `ScriptModule`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"neuron_trace = torch.jit.trace(model_neuron, example_inputs=image)\\n\",\n    \"print('neuron.trace inference result:')\\n\",\n    \"print(resnet50.decode_predictions(neuron_trace(image).detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This Neuron-optimized `ScriptModule` can be easily saved/loaded and deployed on Inf1 instances.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"neuron_trace_filename = 'resnet50_neuron_trace.pt'\\n\",\n    \"neuron_trace.save(neuron_trace_filename)\\n\",\n    \"neuron_trace_loaded = torch.jit.load(neuron_trace_filename)\\n\",\n    \"print('loaded neuron.trace inference result:')\\n\",\n    \"print(resnet50.decode_predictions(neuron_trace_loaded(image).detach().numpy(), top=5)[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"**STEP 7:** Understanding the Neuron graph\\n\",\n    \"\\n\",\n    \"We can inspect the graph property of the Neuron-optimized `ScriptModule` to get an idea of how Neuron-optimization is performed. Each `torch.neuron.trace` call fuses a submodule (layer) into a `neuron::forward`/`NeuronModule` operator.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"neuron_trace.graph\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.2\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"e11b2ce1\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Compiling and Deploying HuggingFace Pretrained BERT on Trn1 or Inf2\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"59a44364\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy a HuggingFace 🤗 Transformers BERT model for accelerated inference on Neuron. In this tutorial, we will be deploying directly on Trn1/Inf2 instances. If you are looking to deploy this model through SageMaker on Inf2 instance, please visit the [Sagemaker samples repository](https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker). \\n\",\n    \"\\n\",\n    \"This tutorial will use the [bert-base-cased-finetuned-mrpc](https://huggingface.co/bert-base-cased-finetuned-mrpc) model. This model has 12 layers, 768 hidden dimensions, 12 attention heads, and 110M total parameters. The final layer is a binary classification head that has been trained on the Microsoft Research Paraphrase Corpus (`mrpc`). The input to the model is two sentences and the output of the model is whether or not those sentences are a paraphrase of each other. \\n\",\n    \"\\n\",\n    \"This tutorial has the following main sections:\\n\",\n    \"\\n\",\n    \"1. Install dependencies\\n\",\n    \"1. Compile the BERT model\\n\",\n    \"1. Run inference on Neuron and compare results to CPU\\n\",\n    \"1. Benchmark the model using multicore inference\\n\",\n    \"1. Finding the optimal batch size\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger.) or Inf2 instance (`inf2.xlarge` or larger.)\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"9ceecb92\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install dependencies\\n\",\n    \"\\n\",\n    \"The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\\n\",\n    \"can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuronx`\\n\",\n    \"- `neuronx-cc`\\n\",\n    \"- `transformers`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Trn1/Inf2 setup guide. 
The additional dependencies must be installed here:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"66392b0b\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True # Suppresses tokenizer warnings making errors easier to detect\\n\",\n    \"%env HF_HUB_DISABLE_PROGRESS_BARS=1 # Avoids xet progress bar model download error\\n\",\n    \"!pip install --upgrade transformers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"82533d8e\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile the model into an AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"In the following section, we load the BERT model and tokenizer, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.\\n\",\n    \"\\n\",\n    \"`torch_neuronx.trace()` expects a tensor or tuple of tensor inputs to use for tracing, so we unpack the tokenizer output using the `encode` function. \\n\",\n    \"\\n\",\n    \"The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur.\\n\",\n    \"\\n\",\n    \"For language models, the shape of the tokenizer tensors can vary based on the length of the input sentence. We can satisfy the Neuron restriction of using a fixed shape input by padding all varying input tensors to a specified length. In a deployment scenario, the padding size should be chosen based on the maximum token length that is expected to occur for the application.\\n\",\n    \"\\n\",\n    \"In the following section we will assume that we will receive a maximum of 128 tokens at inference time. 
We will pad our example inputs by using `padding='max_length'` and to avoid potential errors caused by creating a tensor that is larger than `max_length=128`, we will always tokenize using `truncation=True`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0c9aac5e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuronx\\n\",\n    \"from transformers import AutoTokenizer, AutoModelForSequenceClassification\\n\",\n    \"import transformers\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def encode(tokenizer, *inputs, max_length=128, batch_size=1):\\n\",\n    \"    tokens = tokenizer.encode_plus(\\n\",\n    \"        *inputs,\\n\",\n    \"        max_length=max_length,\\n\",\n    \"        padding='max_length',\\n\",\n    \"        truncation=True,\\n\",\n    \"        return_tensors=\\\"pt\\\"\\n\",\n    \"    )\\n\",\n    \"    return (\\n\",\n    \"        torch.repeat_interleave(tokens['input_ids'], batch_size, 0),\\n\",\n    \"        torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),\\n\",\n    \"        torch.repeat_interleave(tokens['token_type_ids'], batch_size, 0),\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Create the tokenizer and model\\n\",\n    \"name = \\\"bert-base-cased-finetuned-mrpc\\\"\\n\",\n    \"tokenizer = AutoTokenizer.from_pretrained(name)\\n\",\n    \"model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\\n\",\n    \"\\n\",\n    \"# Set up some example inputs\\n\",\n    \"sequence_0 = \\\"The company HuggingFace is based in New York City\\\"\\n\",\n    \"sequence_1 = \\\"Apples are especially bad for your health\\\"\\n\",\n    \"sequence_2 = \\\"HuggingFace's headquarters are situated in Manhattan\\\"\\n\",\n    \"\\n\",\n    \"paraphrase = encode(tokenizer, sequence_0, sequence_2)\\n\",\n    \"not_paraphrase = encode(tokenizer, sequence_0, sequence_1)\\n\",\n    \"\\n\",\n    \"# Run the original PyTorch BERT model on CPU\\n\",\n    \"cpu_paraphrase_logits = model(*paraphrase)[0]\\n\",\n    \"cpu_not_paraphrase_logits = model(*not_paraphrase)[0]\\n\",\n    \"\\n\",\n    \"# Compile the model for Neuron\\n\",\n    \"model_neuron = torch_neuronx.trace(model, paraphrase)\\n\",\n    \"\\n\",\n    \"# Save the TorchScript for inference deployment\\n\",\n    \"filename = 'model.pt'\\n\",\n    \"torch.jit.save(model_neuron, filename)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"53e9605d\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference and compare results\\n\",\n    \"\\n\",\n    \"In this section we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs.\\n\",\n    \"\\n\",\n    \"NOTE: Although this tutorial section uses one NeuronCore (and the next section uses two NeuronCores), by default each Jupyter notebook Python process will attempt to take ownership of all NeuronCores visible on the instance. 
For multi-process applications where each process should only use a subset of the NeuronCores on the instance you can use NEURON_RT_NUM_CORES=N or NEURON_RT_VISIBLE_CORES=< list of NeuronCore IDs > when starting the Jupyter notebook as described in [NeuronCore Allocation and Model Placement for Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"a8d509aa\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Load the TorchScript compiled model\\n\",\n    \"model_neuron = torch.jit.load(filename)\\n\",\n    \"\\n\",\n    \"# Verify the TorchScript works on both example inputs\\n\",\n    \"neuron_paraphrase_logits = model_neuron(*paraphrase)[0]\\n\",\n    \"neuron_not_paraphrase_logits = model_neuron(*not_paraphrase)[0]\\n\",\n    \"\\n\",\n    \"# Compare the results\\n\",\n    \"print('CPU paraphrase logits:        ', cpu_paraphrase_logits.detach().numpy())\\n\",\n    \"print('Neuron paraphrase logits:    ', neuron_paraphrase_logits.detach().numpy())\\n\",\n    \"print('CPU not-paraphrase logits:    ', cpu_not_paraphrase_logits.detach().numpy())\\n\",\n    \"print('Neuron not-paraphrase logits: ', neuron_not_paraphrase_logits.detach().numpy())\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"a4553cc9\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Benchmarking\\n\",\n    \"\\n\",\n    \"In this section we benchmark the performance of the BERT model on Neuron. By default, models compiled with `torch_neuronx` will always execute on a *single* NeuronCore. When loading *multiple* models, the default behavior of the Neuron runtime is to evenly distribute models across all available NeuronCores. The runtime places models on the NeuronCore that has the fewest models loaded to it first. In the following section, we will `torch.jit.load` multiple instances of the model which should each be loaded onto their own NeuronCore. It is not useful to load more copies of a model than the number of NeuronCores on the instance since an individual NeuronCore can only execute one model at a time.\\n\",\n    \"\\n\",\n    \"To ensure that we are maximizing hardware utilization, we must run inferences using multiple threads in parallel. It is nearly always recommended to use some form of threading/multiprocessing and some form of model replication since even the smallest Neuron EC2 instance has 2 NeuronCores available. Applications with no form of threading are only capable of `1 / num_neuron_cores` hardware utilization which becomes especially problematic on large instances.\\n\",\n    \"\\n\",\n    \"One way to view the hardware utilization is by executing the `neuron-top` application in the terminal while the benchmark is executing. If the monitor shows >90% utilization on all NeuronCores, this is a good indication that the hardware is being utilized effectively.\\n\",\n    \"\\n\",\n    \"In this example we load two models, which utilizes all NeuronCores (2) on a `trn1.2xlarge` or `inf2.xlarge` instance. 
Additional models can be loaded and run in parallel on larger Trn1 or Inf2 instance sizes to increase throughput.\\n\",\n    \"\\n\",\n    \"We define a benchmarking function that loads two optimized BERT models onto two separate NeuronCores, runs multithreaded inference, and calculates the corresponding latency and throughput.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"c9e14b0d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import time\\n\",\n    \"import concurrent.futures\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=1000):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Record performance statistics for a serialized model and its input example.\\n\",\n    \"\\n\",\n    \"    Arguments:\\n\",\n    \"        filename: The serialized torchscript model to load for benchmarking.\\n\",\n    \"        example: An example model input.\\n\",\n    \"        n_models: The number of models to load.\\n\",\n    \"        n_threads: The number of simultaneous threads to execute inferences on.\\n\",\n    \"        batches_per_thread: The number of example batches to run per thread.\\n\",\n    \"\\n\",\n    \"    Returns:\\n\",\n    \"        A dictionary of performance statistics.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    # Load models\\n\",\n    \"    models = [torch.jit.load(filename) for _ in range(n_models)]\\n\",\n    \"\\n\",\n    \"    # Warmup\\n\",\n    \"    for _ in range(8):\\n\",\n    \"        for model in models:\\n\",\n    \"            model(*example)\\n\",\n    \"\\n\",\n    \"    latencies = []\\n\",\n    \"\\n\",\n    \"    # Thread task\\n\",\n    \"    def task(model):\\n\",\n    \"        for _ in range(batches_per_thread):\\n\",\n    \"            start = time.time()\\n\",\n    \"            model(*example)\\n\",\n    \"            finish = time.time()\\n\",\n    \"            latencies.append((finish - start) * 1000)\\n\",\n    \"\\n\",\n    \"    # Submit tasks\\n\",\n    \"    begin = time.time()\\n\",\n    \"    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:\\n\",\n    \"        for i in range(n_threads):\\n\",\n    \"            pool.submit(task, models[i % len(models)])\\n\",\n    \"    end = time.time()\\n\",\n    \"\\n\",\n    \"    # Compute metrics\\n\",\n    \"    boundaries = [50, 95, 99]\\n\",\n    \"    percentiles = {}\\n\",\n    \"\\n\",\n    \"    for boundary in boundaries:\\n\",\n    \"        name = f'latency_p{boundary}'\\n\",\n    \"        percentiles[name] = np.percentile(latencies, boundary)\\n\",\n    \"    duration = end - begin\\n\",\n    \"    batch_size = 0\\n\",\n    \"    for tensor in example:\\n\",\n    \"        if batch_size == 0:\\n\",\n    \"            batch_size = tensor.shape[0]\\n\",\n    \"    inferences = len(latencies) * batch_size\\n\",\n    \"    throughput = inferences / duration\\n\",\n    \"\\n\",\n    \"    # Metrics\\n\",\n    \"    metrics = {\\n\",\n    \"        'filename': str(filename),\\n\",\n    \"        'batch_size': batch_size,\\n\",\n    \"        'batches': len(latencies),\\n\",\n    \"        'inferences': inferences,\\n\",\n    \"        'threads': n_threads,\\n\",\n    \"        'models': n_models,\\n\",\n    \"        'duration': duration,\\n\",\n    \"        'throughput': throughput,\\n\",\n    \"        **percentiles,\\n\",\n    \"    }\\n\",\n    \"\\n\",\n    \"    
display(metrics)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def display(metrics):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Display the metrics produced by `benchmark` function.\\n\",\n    \"\\n\",\n    \"    Args:\\n\",\n    \"        metrics: A dictionary of performance statistics.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    pad = max(map(len, metrics)) + 1\\n\",\n    \"    for key, value in metrics.items():\\n\",\n    \"\\n\",\n    \"        parts = key.split('_')\\n\",\n    \"        parts = list(map(str.title, parts))\\n\",\n    \"        title = ' '.join(parts) + \\\":\\\"\\n\",\n    \"\\n\",\n    \"        if isinstance(value, float):\\n\",\n    \"            value = f'{value:0.3f}'\\n\",\n    \"\\n\",\n    \"        print(f'{title :<{pad}} {value}')\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Benchmark BERT on Neuron\\n\",\n    \"benchmark(filename, paraphrase)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"fc374b12\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Finding the optimal batch size\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"113acb55\",\n   \"metadata\": {},\n   \"source\": [\n    \"Batch size has a direct impact on model performance. The NeuronCore architecture is optimized to maximize throughput with relatively small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\\n\",\n    \"\\n\",\n    \"As a general best practice, we recommend optimizing your model’s throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Neuron. To minimize latency, using `batch size = 1` will nearly always be optimal. This batch size configuration is typically used for on-demand inference applications. To maximize throughput, *usually* `1 < batch_size < 10` is optimal. A configuration which uses a larger batch size is generally ideal for batched on-demand inference or offline batch processing.\\n\",\n    \"\\n\",\n    \"In the following section, we compile BERT for multiple batch size inputs. We then run inference on each batch size and benchmark the performance. Notice that latency increases consistently as the batch size increases. 
Throughput increases as well, up until a certain point where the input size becomes too large to be efficient.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"be26aafc\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Compile BERT for different batch sizes\\n\",\n    \"for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\\n\",\n    \"    tokenizer = AutoTokenizer.from_pretrained(name)\\n\",\n    \"    model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\\n\",\n    \"    example = encode(tokenizer, sequence_0, sequence_2, batch_size=batch_size)\\n\",\n    \"    model_neuron = torch_neuronx.trace(model, example)\\n\",\n    \"    filename = f'model_batch_size_{batch_size}.pt'\\n\",\n    \"    torch.jit.save(model_neuron, filename)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"8f0f6ed2\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Benchmark BERT for different batch sizes\\n\",\n    \"for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\\n\",\n    \"    print('-'*50)\\n\",\n    \"    example = encode(tokenizer, sequence_0, sequence_2, batch_size=batch_size)\\n\",\n    \"    filename = f'model_batch_size_{batch_size}.pt'\\n\",\n    \"    benchmark(filename, example)\\n\",\n    \"    print()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python (Neuron PyTorch)\",\n   \"language\": \"python\",\n   \"name\": \"pytorch_venv\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.16\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"6a30ffd9\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Compiling and Deploying ResNet50 on Trn1 or Inf2\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"ea682fbe\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy a TorchVision ResNet50 model for accelerated inference on Neuron. To get started with\\n\",\n    \"Jupyter Notebook on Neuron Instance you launched, please use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"This tutorial will use the [resnet50](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html) model, which is primarily used for arbitrary image classification tasks.\\n\",\n    \"\\n\",\n    \"This tutorial has the following main sections:\\n\",\n    \"\\n\",\n    \"1. Install dependencies\\n\",\n    \"1. Compile the ResNet model\\n\",\n    \"1. Run inference on Neuron and compare results to CPU\\n\",\n    \"1. Benchmark the model using multicore inference\\n\",\n    \"1. Finding the optimal batch size\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger.) or Inf2 instance (`inf2.xlarge` or larger.)\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"5f60760a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies\\n\",\n    \"The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\\n\",\n    \"can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuronx`\\n\",\n    \"- `neuronx-cc`\\n\",\n    \"- `torchvision`\\n\",\n    \"- `Pillow`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Trn1 setup guide. The additional dependencies must be installed here:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"c44c5df5\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env HF_HUB_DISABLE_PROGRESS_BARS=1 # Avoids xet progress bar model download error\\n\",\n    \"!pip install Pillow\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"de2efba5\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile the model into an AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"In the following section, we load the model, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.\\n\",\n    \"\\n\",\n    \"`torch_neuronx.trace()` expects a tensor or tuple of tensor inputs to use for tracing, so we convert the input image into a tensor using the `get_image` function.\\n\",\n    \"\\n\",\n    \"The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. 
If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur. \\n\",\n    \"\\n\",\n    \"In the following section, we assume that we will receive an image shape of `[1, 3, 224, 224]` at inference time.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1650de1f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import urllib\\n\",\n    \"from PIL import Image\\n\",\n    \"\\n\",\n    \"import torch\\n\",\n    \"import torch_neuronx\\n\",\n    \"from torchvision import models\\n\",\n    \"from torchvision.transforms import functional\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_image(batch_size=1, image_shape=(224, 224)):\\n\",\n    \"    # Get an example input\\n\",\n    \"    filename = \\\"000000039769.jpg\\\"\\n\",\n    \"    if not os.path.exists(filename):\\n\",\n    \"        url = \\\"http://images.cocodataset.org/val2017/000000039769.jpg\\\"\\n\",\n    \"        urllib.request.urlretrieve(url, filename)\\n\",\n    \"    image = Image.open(filename).convert('RGB')\\n\",\n    \"    image = functional.resize(image, (image_shape))\\n\",\n    \"    image = functional.to_tensor(image)\\n\",\n    \"    image = torch.unsqueeze(image, 0)\\n\",\n    \"    image = torch.repeat_interleave(image, batch_size, 0)\\n\",\n    \"    return (image, )\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Create the model\\n\",\n    \"model = models.resnet50(pretrained=True)\\n\",\n    \"model.eval()\\n\",\n    \"\\n\",\n    \"# Get an example input\\n\",\n    \"image = get_image()\\n\",\n    \"\\n\",\n    \"# Run inference on CPU\\n\",\n    \"output_cpu = model(*image)\\n\",\n    \"\\n\",\n    \"# Compile the model\\n\",\n    \"model_neuron = torch_neuronx.trace(model, image)\\n\",\n    \"\\n\",\n    \"# Save the TorchScript for inference deployment\\n\",\n    \"filename = 'model.pt'\\n\",\n    \"torch.jit.save(model_neuron, filename)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"25f453f8\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference and compare results\\n\",\n    \"\\n\",\n    \"In this section we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs using the ImageNet classes.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"b4a203aa\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"\\n\",\n    \"# Load the TorchScript compiled model\\n\",\n    \"model_neuron = torch.jit.load(filename)\\n\",\n    \"\\n\",\n    \"# Run inference using the Neuron model\\n\",\n    \"output_neuron = model_neuron(*image)\\n\",\n    \"\\n\",\n    \"# Compare the results\\n\",\n    \"print(f\\\"CPU tensor:    {output_cpu[0][0:10]}\\\")\\n\",\n    \"print(f\\\"Neuron tensor: {output_neuron[0][0:10]}\\\")\\n\",\n    \"\\n\",\n    \"# Download and read the ImageNet classes\\n\",\n    \"urllib.request.urlretrieve(\\\"https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\\\",\\\"imagenet_class_index.json\\\")\\n\",\n    \"with open(\\\"imagenet_class_index.json\\\", \\\"r\\\") as file:\\n\",\n    \"    class_id = json.load(file)\\n\",\n    \"    id2label = [class_id[str(i)][1] for i in range(len(class_id))]\\n\",\n    \"\\n\",\n    \"# Lookup and print the top-5 labels\\n\",\n    \"top5_cpu = output_cpu[0].sort()[1][-5:]\\n\",\n    \"top5_neuron = output_neuron[0].sort()[1][-5:]\\n\",\n  
  \"top5_labels_cpu = [id2label[idx] for idx in top5_cpu]\\n\",\n    \"top5_labels_neuron = [id2label[idx] for idx in top5_neuron]\\n\",\n    \"print(f\\\"CPU top-5 labels:    {top5_labels_cpu}\\\")\\n\",\n    \"print(f\\\"Neuron top-5 labels: {top5_labels_neuron}\\\")\"\n   ]\n  },\n  {\n   \"attachments\": {},\n   \"cell_type\": \"markdown\",\n   \"id\": \"c96389ae\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Benchmarking\\n\",\n    \"\\n\",\n    \"In this section we benchmark the performance of the ResNet model on Neuron. By default, models compiled with `torch_neuronx` will always execute on a *single* NeuronCore. When loading *multiple* models, the default behavior of the Neuron runtime is to evenly distribute models across all available NeuronCores. The runtime places models on the NeuronCore that has the fewest models loaded to it first. In the following section, we will `torch.jit.load` multiple instances of the model which should each be loaded onto their own NeuronCore. It is not useful to load more copies of a model than the number of NeuronCores on the instance since an individual NeuronCore can only execute one model at a time.\\n\",\n    \"\\n\",\n    \"To ensure that we are maximizing hardware utilization, we must run inferences using multiple threads in parallel. It is nearly always recommended to use some form of threading/multiprocessing and some form of model replication since even the smallest Neuron EC2 instance has 2 NeuronCores available. Applications with no form of threading are only capable of `1 / num_neuron_cores` hardware utilization which becomes especially problematic on large instances.\\n\",\n    \"\\n\",\n    \"One way to view the hardware utilization is by executing the `neuron-top` application in the terminal while the benchmark is executing. If the monitor shows >90% utilization on all NeuronCores, this is a good indication that the hardware is being utilized effectively.\\n\",\n    \"\\n\",\n    \"In this example we load two models, which utilizes all NeuronCores (2) on a `trn1.2xlarge` or `inf2.xlarge` instance. 
Additional models can be loaded and run in parallel on larger Trn1 or Inf2 instance sizes to increase throughput.\\n\",\n    \"\\n\",\n    \"We define a benchmarking function that loads two optimized ResNet models onto two separate NeuronCores, runs multithreaded inference, and calculates the corresponding latency and throughput.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"9657ae4f\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import time\\n\",\n    \"import concurrent.futures\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=1000):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Record performance statistics for a serialized model and its input example.\\n\",\n    \"\\n\",\n    \"    Arguments:\\n\",\n    \"        filename: The serialized torchscript model to load for benchmarking.\\n\",\n    \"        example: An example model input.\\n\",\n    \"        n_models: The number of models to load.\\n\",\n    \"        n_threads: The number of simultaneous threads to execute inferences on.\\n\",\n    \"        batches_per_thread: The number of example batches to run per thread.\\n\",\n    \"\\n\",\n    \"    Returns:\\n\",\n    \"        A dictionary of performance statistics.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    # Load models\\n\",\n    \"    models = [torch.jit.load(filename) for _ in range(n_models)]\\n\",\n    \"\\n\",\n    \"    # Warmup\\n\",\n    \"    for _ in range(8):\\n\",\n    \"        for model in models:\\n\",\n    \"            model(*example)\\n\",\n    \"\\n\",\n    \"    latencies = []\\n\",\n    \"\\n\",\n    \"    # Thread task\\n\",\n    \"    def task(model):\\n\",\n    \"        for _ in range(batches_per_thread):\\n\",\n    \"            start = time.time()\\n\",\n    \"            model(*example)\\n\",\n    \"            finish = time.time()\\n\",\n    \"            latencies.append((finish - start) * 1000)\\n\",\n    \"\\n\",\n    \"    # Submit tasks\\n\",\n    \"    begin = time.time()\\n\",\n    \"    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:\\n\",\n    \"        for i in range(n_threads):\\n\",\n    \"            pool.submit(task, models[i % len(models)])\\n\",\n    \"    end = time.time()\\n\",\n    \"\\n\",\n    \"    # Compute metrics\\n\",\n    \"    boundaries = [50, 95, 99]\\n\",\n    \"    percentiles = {}\\n\",\n    \"\\n\",\n    \"    for boundary in boundaries:\\n\",\n    \"        name = f'latency_p{boundary}'\\n\",\n    \"        percentiles[name] = np.percentile(latencies, boundary)\\n\",\n    \"    duration = end - begin\\n\",\n    \"    batch_size = 0\\n\",\n    \"    for tensor in example:\\n\",\n    \"        if batch_size == 0:\\n\",\n    \"            batch_size = tensor.shape[0]\\n\",\n    \"    inferences = len(latencies) * batch_size\\n\",\n    \"    throughput = inferences / duration\\n\",\n    \"\\n\",\n    \"    # Metrics\\n\",\n    \"    metrics = {\\n\",\n    \"        'filename': str(filename),\\n\",\n    \"        'batch_size': batch_size,\\n\",\n    \"        'batches': len(latencies),\\n\",\n    \"        'inferences': inferences,\\n\",\n    \"        'threads': n_threads,\\n\",\n    \"        'models': n_models,\\n\",\n    \"        'duration': duration,\\n\",\n    \"        'throughput': throughput,\\n\",\n    \"        **percentiles,\\n\",\n    \"    }\\n\",\n    \"\\n\",\n    \"    
display(metrics)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def display(metrics):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Display the metrics produced by `benchmark` function.\\n\",\n    \"\\n\",\n    \"    Args:\\n\",\n    \"        metrics: A dictionary of performance statistics.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    pad = max(map(len, metrics)) + 1\\n\",\n    \"    for key, value in metrics.items():\\n\",\n    \"\\n\",\n    \"        parts = key.split('_')\\n\",\n    \"        parts = list(map(str.title, parts))\\n\",\n    \"        title = ' '.join(parts) + \\\":\\\"\\n\",\n    \"\\n\",\n    \"        if isinstance(value, float):\\n\",\n    \"            value = f'{value:0.3f}'\\n\",\n    \"\\n\",\n    \"        print(f'{title :<{pad}} {value}')\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"# Benchmark ResNet on Neuron\\n\",\n    \"benchmark(filename, image)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"795d2fca\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Finding the optimal batch size\\n\",\n    \"\\n\",\n    \"Batch size has a direct impact on model performance. The NeuronCore architecture is optimized to maximize throughput with relatively small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\\n\",\n    \"\\n\",\n    \"As a general best practice, we recommend optimizing your model’s throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Neuron. To minimize latency, using `batch size = 1` will nearly always be optimal. This batch size configuration is typically used for on-demand inference applications. To maximize throughput, *usually* `1 < batch_size < 10` is optimal. A configuration which uses a larger batch size is generally ideal for batched on-demand inference or offline batch processing.\\n\",\n    \"\\n\",\n    \"In the following section, we compile ResNet for multiple batch size inputs. We then run inference on each batch size and benchmark the performance. Notice that latency increases consistently as the batch size increases. 
Throughput increases as well, up until a certain point where the input size becomes too large to be efficient.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"fdef1805\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Compile ResNet for different batch sizes\\n\",\n    \"for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\\n\",\n    \"    model = models.resnet50(pretrained=True)\\n\",\n    \"    model.eval()\\n\",\n    \"    example = get_image(batch_size=batch_size)\\n\",\n    \"    model_neuron = torch_neuronx.trace(model, example)\\n\",\n    \"    filename = f'model_batch_size_{batch_size}.pt'\\n\",\n    \"    torch.jit.save(model_neuron, filename)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ec244d4e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Benchmark ResNet for different batch sizes\\n\",\n    \"for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\\n\",\n    \"    print('-'*50)\\n\",\n    \"    example = get_image(batch_size=batch_size)\\n\",\n    \"    filename = f'model_batch_size_{batch_size}.pt'\\n\",\n    \"    benchmark(filename, example)\\n\",\n    \"    print()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python (Neuron PyTorch)\",\n   \"language\": \"python\",\n   \"name\": \"pytorch_venv\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.16\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# T5 model inference on Trn1 or Inf2\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy a pretrained T5 model for accelerated inference on Neuron. \\n\",\n    \"\\n\",\n    \"This tutorial will use the [t5-large](https://huggingface.co/t5-large) model. The T5 model can be used for machine translation, document summarization, question answering, and classification tasks. \\n\",\n    \"\\n\",\n    \"This tutorial has the following main sections:\\n\",\n    \"\\n\",\n    \"1. Install dependencies\\n\",\n    \"1. Compile the T5 model\\n\",\n    \"1. Run inference with greedy decoding on Neuron\\n\",\n    \"1. Run inference with beam search on Neuron\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger.) or Inf2 instance (`inf2.xlarge` or larger.)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install dependencies\\n\",\n    \"\\n\",\n    \"The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\\n\",\n    \"can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\\n\",\n    \"\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuronx`\\n\",\n    \"- `neuronx-cc`\\n\",\n    \"- `transformers`\\n\",\n    \"- `optimum-neuron`\\n\",\n    \"\\n\",\n    \"Most of these packages will be installed when configuring your environment using the Trn1/Inf2 setup guide. The additional dependencies must be installed here:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env HF_HUB_DISABLE_PROGRESS_BARS=1 # Avoids xet progress bar model download error\\n\",\n    \"!pip install --upgrade transformers==4.31.0 optimum-neuron==0.0.8 sentencepiece\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS Accelerators including AWS Trainium and AWS Inferentia. It provides a set of tools enabling easy model loading, training and inference on single- and multi-Accelerator settings for different downstream tasks. In this tutorial we use 🤗 HuggingFace Optimum Neuron's generate() method instead of 🤗 [transformers's generate()](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) to perform greedy decoding. Optimum Neuron takes care of padding the inputs which is necessary to infer on Neuron.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile the model into an AWS Neuron optimized TorchScript\\n\",\n    \"\\n\",\n    \"In the following section, we load the T5 model, compile the model's encoder and decoder for Neuron using `torch_neuronx.trace()`, and save the optimized encoder and decoder as `TorchScript`. \\n\",\n    \"\\n\",\n    \"Before we trace the model, we need to make a couple of changes. \\n\",\n    \"\\n\",\n    \"1. 
We need to write encoder and decoder wrappers - `torch_neuronx` can only trace functions with positional arguments. But the T5 encoder and decoder both use keyword arguments. So, in order to trace them, we have to write wrappers that convert keyword arguments to positional arguments.\\n\",\n    \"2. We modify the T5 code to maximize the computation on the Neuron device - having sections of code running on the CPU will reduce performance. Moreover, we do not want to move data between the Neuron device and the CPU during inference. The code we trace with `torch_neuronx` is the code that runs on the Neuron device, so we refactor the T5 code to run computationally heavy operations within the wrapper.\\n\",\n    \"\\n\",\n    \"Let us start with the EncoderWrapper. \\n\",\n    \"\\n\",\n    \"In the HuggingFace T5 implementation, the encoder block takes in the input ids and returns the encoder hidden states. These hidden states are then used to initialize the KV cache in the decoder blocks during the first decoder invocation. We could trace both the encoder and the cache initialization step separately. But there is a better way: we can simply compute the initial KV cache state within the encoder wrapper. This way, we remove the overhead of moving the hidden states from the Neuron device to the CPU and back. This also allows the Neuron compiler to optimize execution across both the encoder and cache initialization. \\n\",\n    \"\\n\",\n    \"*Why don't we just initialize the cache on the first decoder run?* \\n\",\n    \"\\n\",\n    \"This is harder to do on Neuron. Similar to `torch.jit.trace()`, `torch_neuronx.trace()` produces a function that has a fixed control flow, i.e. there are no conditional executions. So we cannot choose to conditionally initialize the cache in the first decoder iteration. Instead, we can compute the initial cache state outside the generation flow and pass the cache to it. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"\\n\",\n    \"from transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention\\n\",\n    \"\\n\",\n    \"class EncoderWrapper(torch.nn.Module):\\n\",\n    \"    '''\\n\",\n    \"        We will trace an instance of the EncoderWrapper. \\n\",\n    \"        This wrapper just converts positional args to kwargs. \\n\",\n    \"    '''\\n\",\n    \"\\n\",\n    \"    def __init__(self, \\n\",\n    \"                 encoder,\\n\",\n    \"                 decoder, \\n\",\n    \"                 model_config, \\n\",\n    \"                 batch_size, \\n\",\n    \"                 max_length, \\n\",\n    \"                 device, \\n\",\n    \"                 num_beams,\\n\",\n    \"                 tp_degree=None):\\n\",\n    \"        \\n\",\n    \"        super().__init__()\\n\",\n    \"        self.encoder = encoder\\n\",\n    \"        self.decoder = decoder\\n\",\n    \"        self.batch_size = batch_size\\n\",\n    \"        self.max_length = max_length\\n\",\n    \"        self.model_config = model_config\\n\",\n    \"        self.device = device\\n\",\n    \"        self.num_beams = num_beams\\n\",\n    \"        self.num_attention_heads_per_partition = model_config.num_heads\\n\",\n    \"        self.tp_degree = tp_degree\\n\",\n    \"\\n\",\n    \"    def forward(self, input_ids, attention_mask):\\n\",\n    \"        '''\\n\",\n    \"            This is the core functionality we want to trace. 
\\n\",\n    \"        '''\\n\",\n    \"        encoder_output =  self.encoder(input_ids=input_ids,\\n\",\n    \"                                       attention_mask=attention_mask,\\n\",\n    \"                                       output_attentions=False,\\n\",\n    \"                                       output_hidden_states=False)\\n\",\n    \"\\n\",\n    \"        last_hidden_state = encoder_output[\\\"last_hidden_state\\\"]\\n\",\n    \"        encoder_hidden_states = torch.concat([tensor.unsqueeze(0).repeat(self.num_beams, 1, 1) for tensor in last_hidden_state])\\n\",\n    \"\\n\",\n    \"        decoder_blocks = self.decoder.block\\n\",\n    \"        present_key_value_states_sa = []\\n\",\n    \"        present_key_value_states_ca = []\\n\",\n    \"\\n\",\n    \"        for i, block in enumerate(decoder_blocks):\\n\",\n    \"\\n\",\n    \"            # Cross attention has to be initialized with the encoder hidden state\\n\",\n    \"            cross_attention: T5LayerCrossAttention = block.layer[1]\\n\",\n    \"            attention = cross_attention.EncDecAttention\\n\",\n    \"\\n\",\n    \"            def shape(states):\\n\",\n    \"                \\\"\\\"\\\"projection\\\"\\\"\\\"\\n\",\n    \"                return states.view(self.batch_size, -1, self.num_attention_heads_per_partition, attention.key_value_proj_dim).transpose(1, 2)\\n\",\n    \"\\n\",\n    \"            key_states = shape(attention.k(encoder_hidden_states))\\n\",\n    \"            value_states = shape(attention.v(encoder_hidden_states))\\n\",\n    \"\\n\",\n    \"            # cross_attn_kv_state\\n\",\n    \"            present_key_value_states_ca.append(key_states) \\n\",\n    \"            present_key_value_states_ca.append(value_states) \\n\",\n    \"            \\n\",\n    \"            # Self attention kv states are initialized to zeros. This is done to keep the size of the kv cache tensor constant. \\n\",\n    \"            # The kv cache will be an input to the decoder trace. Any traced function will have a fixed control flow. What this means \\n\",\n    \"            # is that the trace performs the exact same computations on inputs of the same shape in each invocation. So the attention \\n\",\n    \"            # kv cache is padded here to keep a fixed shape. 
\\n\",\n    \"            present_key_value_states_sa.append(torch.zeros((self.batch_size,                                                     # key states\\n\",\n    \"                                                            self.model_config.num_heads, \\n\",\n    \"                                                            self.max_length-1, \\n\",\n    \"                                                            self.model_config.d_kv), dtype=torch.float32, device=self.device)) \\n\",\n    \"            present_key_value_states_sa.append(torch.zeros((self.batch_size,                                                     # value states\\n\",\n    \"                                                            self.model_config.num_heads, \\n\",\n    \"                                                            self.max_length-1, \\n\",\n    \"                                                            self.model_config.d_kv), dtype=torch.float32, device=self.device))\\n\",\n    \"\\n\",\n    \"        return present_key_value_states_sa + present_key_value_states_ca\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"In the decoder wrapper, in addition to converting keyword arguments to positional arguments we add support for attention caching. Generating text from the encoder decoder models is an autoregressive process. For each invocation, we have to compute the key and value states of the attention heads repeatedly. To improve the performance, we cache the key and value states. This cache is what HuggingFace transformers code refers to as `past_key_values`.\\n\",\n    \"\\n\",\n    \"In HuggingFace transformers, the `past_key_values` are updated outside the decoder. This works for training and evaluation but for inference we want to perform them within a single trace. This way, we can optimize across both the decoder execution and cache update. 
So, we move the cache update within the decoder wrapper.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class DecoderWrapper(torch.nn.Module):\\n\",\n    \"\\n\",\n    \"    def __init__(self, \\n\",\n    \"                 decoder: T5Stack, \\n\",\n    \"                 lm_head: torch.nn.Linear,\\n\",\n    \"                 model_config,\\n\",\n    \"                 num_beams: int, \\n\",\n    \"                 max_length: int,\\n\",\n    \"                 device: str,\\n\",\n    \"                 tp_degree=None):\\n\",\n    \"        super().__init__()        \\n\",\n    \"        self.decoder = decoder\\n\",\n    \"        self.lm_head = lm_head\\n\",\n    \"        self.model_dim=model_config.d_model\\n\",\n    \"        self.device = device\\n\",\n    \"        self.num_beams = num_beams\\n\",\n    \"        self.batch_size = 1\\n\",\n    \"        self.config = model_config\\n\",\n    \"        \\n\",\n    \"        num_heads=model_config.num_heads\\n\",\n    \"        num_decoder_layers=model_config.num_decoder_layers\\n\",\n    \"\\n\",\n    \"        self.num_attention_heads_per_partition = num_heads\\n\",\n    \"\\n\",\n    \"        # (num_beams, n_heads, seq_length, dim_per_head)\\n\",\n    \"        if device == \\\"cpu\\\":\\n\",\n    \"            self.past_key_values_sa = [torch.ones((num_beams,num_heads,max_length-1,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\\n\",\n    \"            self.past_key_values_ca = [torch.ones((num_beams,num_heads,max_length,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\\n\",\n    \"        elif device == \\\"xla\\\":\\n\",\n    \"            self.past_key_values_sa = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\\n\",\n    \"            self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\\n\",\n    \"\\n\",\n    \"    def update_past(self, past_key_values):\\n\",\n    \"        new_past_sa = []\\n\",\n    \"        new_past_ca = []\\n\",\n    \"        for past_layer in past_key_values:\\n\",\n    \"            new_past_layer = list(past_layer)\\n\",\n    \"            for i in range(len(new_past_layer[:2])):\\n\",\n    \"                new_past_layer[i] = past_layer[i][:, :, 1:]\\n\",\n    \"            new_past_sa += [new_past_layer[:2],]\\n\",\n    \"            new_past_ca += [new_past_layer[2:],]\\n\",\n    \"        return new_past_sa, new_past_ca\\n\",\n    \"    \\n\",\n    \"    def reorder_cache(self, past_key_values, beam_idx):\\n\",\n    \"        for i in range(len(past_key_values)):\\n\",\n    \"            gather_index = beam_idx.view([beam_idx.shape[0],1,1,1]).expand_as(past_key_values[i])\\n\",\n    \"            past_key_values[i] = torch.gather(past_key_values[i], dim = 0, index=gather_index)\\n\",\n    \"        return past_key_values\\n\",\n    \"\\n\",\n    \"    def forward(self,\\n\",\n    \"                input_ids,\\n\",\n    \"                decoder_attention_mask,\\n\",\n    \"                encoder_hidden_states,\\n\",\n    \"                encoder_attention_mask,\\n\",\n    \"  
              beam_idx,\\n\",\n    \"                beam_scores,\\n\",\n    \"                **kwargs):\\n\",\n    \"\\n\",\n    \"        if self.num_beams > 1:\\n\",\n    \"            # We reorder the cache based on the beams selected in each iteration. Required step for beam search.\\n\",\n    \"            past_key_values_sa = self.reorder_cache(self.past_key_values_sa, beam_idx)\\n\",\n    \"            past_key_values_ca = self.reorder_cache(self.past_key_values_ca, beam_idx)\\n\",\n    \"        else:\\n\",\n    \"            # We do not need to reorder for greedy sampling\\n\",\n    \"            past_key_values_sa = self.past_key_values_sa\\n\",\n    \"            past_key_values_ca = self.past_key_values_ca\\n\",\n    \"\\n\",\n    \"        # The cache is stored in a flatten form. We order the cache per layer before passing it to the decoder. \\n\",\n    \"        # Each layer has 4 tensors, so we group by 4. \\n\",\n    \"        past_key_values = [[*past_key_values_sa[i*2:i*2+2], *past_key_values_ca[i*2:i*2+2]] for i in range(0, int(len(past_key_values_ca)/2))]\\n\",\n    \"\\n\",\n    \"        decoder_output = self.decoder(\\n\",\n    \"            input_ids=input_ids,\\n\",\n    \"            attention_mask=decoder_attention_mask,\\n\",\n    \"            past_key_values=past_key_values,\\n\",\n    \"            encoder_hidden_states=encoder_hidden_states,\\n\",\n    \"            encoder_attention_mask=encoder_attention_mask,\\n\",\n    \"            use_cache=True,\\n\",\n    \"            output_attentions=False,\\n\",\n    \"            output_hidden_states=False)\\n\",\n    \"\\n\",\n    \"        last_hidden_state = decoder_output['last_hidden_state']\\n\",\n    \"        past_key_values = decoder_output['past_key_values']\\n\",\n    \"\\n\",\n    \"        if self.config.tie_word_embeddings:\\n\",\n    \"            # Rescale output before projecting on vocab\\n\",\n    \"            # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/transformer/transformer.py#L586\\n\",\n    \"            last_hidden_state = last_hidden_state * (self.model_dim**-0.5)\\n\",\n    \"        \\n\",\n    \"        lm_logits = self.lm_head(last_hidden_state)\\n\",\n    \"\\n\",\n    \"        past_key_values_sa, past_key_values_ca = self.update_past(past_key_values)\\n\",\n    \"\\n\",\n    \"        # We flatten the cache to a single array. This is required for the input output aliasing to work\\n\",\n    \"        past_key_values_sa = [vec for kv_per_layer in past_key_values_sa for vec in kv_per_layer]\\n\",\n    \"        past_key_values_ca = [vec for kv_per_layer in past_key_values_ca for vec in kv_per_layer]\\n\",\n    \"\\n\",\n    \"        if self.device == \\\"cpu\\\":\\n\",\n    \"            self.past_key_values_sa = past_key_values_sa\\n\",\n    \"            self.past_key_values_ca = past_key_values_ca\\n\",\n    \"\\n\",\n    \"        # We calculate topk inside the wrapper\\n\",\n    \"        next_token_logits = lm_logits[:, -1, :]\\n\",\n    \"\\n\",\n    \"        if self.num_beams > 1:\\n\",\n    \"            # This section of beam search is run outside the decoder in the huggingface t5 implementation. 
\\n\",\n    \"            # To maximize the computation within the neuron device, we move this within the wrapper\\n\",\n    \"            logit_max, _ = torch.max(next_token_logits, dim=-1, keepdim=True)\\n\",\n    \"            logsumexp = torch.log(torch.exp(next_token_logits - logit_max).sum(dim=-1, keepdim=True))\\n\",\n    \"            next_token_scores = next_token_logits - logit_max - logsumexp\\n\",\n    \"            next_token_scores = next_token_scores + beam_scores[:, None].expand_as(next_token_scores)\\n\",\n    \"\\n\",\n    \"            # reshape for beam search\\n\",\n    \"            vocab_size = next_token_scores.shape[-1]\\n\",\n    \"            next_token_scores = next_token_scores.view(self.batch_size, self.num_beams * vocab_size)\\n\",\n    \"            next_token_scores = next_token_scores * 1\\n\",\n    \"\\n\",\n    \"            # Sample 2 next tokens for each beam (so we have some spare tokens and match output of beam search)\\n\",\n    \"            next_token_scores, next_tokens = torch.topk(\\n\",\n    \"                next_token_scores, 2 * self.num_beams, dim=1, largest=True, sorted=True\\n\",\n    \"            ) \\n\",\n    \"\\n\",\n    \"            next_indices = torch.div(next_tokens, vocab_size, rounding_mode=\\\"floor\\\")\\n\",\n    \"            next_tokens = next_tokens % vocab_size\\n\",\n    \"\\n\",\n    \"            return [next_token_scores, next_tokens, next_indices] + past_key_values_sa + past_key_values_ca\\n\",\n    \"        else:\\n\",\n    \"            # Greedy    \\n\",\n    \"            next_tokens = torch.argmax(next_token_logits, dim=-1)\\n\",\n    \"            return [next_tokens] + past_key_values_sa + past_key_values_ca\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now let's create a T5 model wrapper to make it compatible with our traced encoder and decoder. \\n\",\n    \"\\n\",\n    \"There are two reasons for having this wrapper, \\n\",\n    \"\\n\",\n    \"1. The encoder and decoder traces can only be invoked with positional arguments. But the HuggingFace transformers code is written with keyword arguments. So we override the functions that invoke encoder and decoder to call with positional arguments. \\n\",\n    \"1. The generate() function in the NeuronGenerationMixin performs cache update within the CPU. As we are handling the cache within the DecoderWrapper, we disable the cache update on CPU. \\n\",\n    \"1. The topK computation to determine the next tokens for beam search was moved into the decoder wrapper. So, we need to override the huggingface's beam search implementation to accept the next tokens and the beam scores from the decoder. 
\\n\",\n    \"\\n\",\n    \"Let's also override the `generate()` function so that it will intialize the cache using the cache initalizer before starting the greedy decoding.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_xla.core.xla_model as xm\\n\",\n    \"\\n\",\n    \"from transformers import T5Tokenizer, T5ForConditionalGeneration\\n\",\n    \"from transformers.modeling_outputs import BaseModelOutput, Seq2SeqLMOutput\\n\",\n    \"from transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention\\n\",\n    \"from transformers.generation.utils import ModelOutput\\n\",\n    \"from typing import Any, Dict, List, Optional, Tuple, Union\\n\",\n    \"from transformers.generation.beam_search import BeamScorer, BeamSearchScorer\\n\",\n    \"\\n\",\n    \"from optimum.neuron.generation import NeuronGenerationMixin\\n\",\n    \"\\n\",\n    \"from transformers.generation.logits_process import (\\n\",\n    \"    LogitsProcessorList,\\n\",\n    \")\\n\",\n    \"from transformers.generation.stopping_criteria import (\\n\",\n    \"    MaxLengthCriteria,\\n\",\n    \"    MaxTimeCriteria,\\n\",\n    \"    StoppingCriteriaList,\\n\",\n    \"    validate_stopping_criteria,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"from transformers.generation.utils import (\\n\",\n    \"    BeamSearchOutput,\\n\",\n    \"    GreedySearchOutput,\\n\",\n    \")\\n\",\n    \"\\n\",\n    \"class T5Wrapper(T5ForConditionalGeneration, NeuronGenerationMixin):\\n\",\n    \"\\n\",\n    \"    def _prepare_encoder_decoder_kwargs_for_generation(\\n\",\n    \"        self, \\n\",\n    \"        inputs_tensor: torch.Tensor, \\n\",\n    \"        model_kwargs, \\n\",\n    \"        model_input_name: Optional[str] = None\\n\",\n    \"    ) -> Dict[str, Any]:\\n\",\n    \"        encoder = self.get_encoder()\\n\",\n    \"        model_kwargs[\\\"encoder_outputs\\\"]: ModelOutput = encoder(inputs_tensor, model_kwargs[\\\"attention_mask\\\"])\\n\",\n    \"        return model_kwargs\\n\",\n    \"\\n\",\n    \"    # Override to cut the input_ids to just last token\\n\",\n    \"    def prepare_inputs_for_generation(\\n\",\n    \"        self,\\n\",\n    \"        input_ids,\\n\",\n    \"        past_key_values=None,\\n\",\n    \"        attention_mask=None,\\n\",\n    \"        head_mask=None,\\n\",\n    \"        decoder_head_mask=None,\\n\",\n    \"        decoder_attention_mask=None,\\n\",\n    \"        cross_attn_head_mask=None,\\n\",\n    \"        use_cache=None,\\n\",\n    \"        encoder_outputs=None,\\n\",\n    \"        **kwargs,\\n\",\n    \"    ):\\n\",\n    \"        # cut decoder_input_ids as past is cached\\n\",\n    \"        input_ids = input_ids[:, -1:]\\n\",\n    \"\\n\",\n    \"        return {\\n\",\n    \"            \\\"decoder_input_ids\\\": input_ids,\\n\",\n    \"            \\\"past_key_values\\\": past_key_values,\\n\",\n    \"            \\\"encoder_outputs\\\": encoder_outputs,\\n\",\n    \"            \\\"attention_mask\\\": attention_mask,\\n\",\n    \"            \\\"head_mask\\\": head_mask,\\n\",\n    \"            \\\"decoder_head_mask\\\": decoder_head_mask,\\n\",\n    \"            \\\"decoder_attention_mask\\\": decoder_attention_mask,\\n\",\n    \"            \\\"cross_attn_head_mask\\\": cross_attn_head_mask,\\n\",\n    \"            \\\"use_cache\\\": use_cache,\\n\",\n    \"        }\\n\",\n    \"    \\n\",\n    \"    '''\\n\",\n    \"        We 
update the cache in the decoder trace, so lets override the _update_model_kwargs_for_xla_generation in NeuronGenerationMixin\\n\",\n    \"    '''\\n\",\n    \"    def _update_model_kwargs_for_xla_generation(\\n\",\n    \"        self,\\n\",\n    \"        model_kwargs: Dict[str, Any],\\n\",\n    \"        batch_size: int,\\n\",\n    \"        is_encoder_decoder: bool = False,\\n\",\n    \"        standardize_cache_format: bool = False,\\n\",\n    \"        max_length: Optional[int] = None,\\n\",\n    \"        seq_length: Optional[int] = None,\\n\",\n    \"        use_cache: bool = True,\\n\",\n    \"    ) -> Dict[str, Any]:\\n\",\n    \"\\n\",\n    \"        def _update_attention(model_kwargs, is_encoder_decoder):\\n\",\n    \"            \\\"\\\"\\\"Updates the appropriate attention mask -- encoder-decoder models use `decoder_attention_mask`\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"            attention_mask_name = \\\"decoder_attention_mask\\\" if is_encoder_decoder else \\\"attention_mask\\\"\\n\",\n    \"            attention_mask = model_kwargs.pop(attention_mask_name)\\n\",\n    \"            attention_mask_update_slice = torch.ones(\\n\",\n    \"                (batch_size, 1), dtype=attention_mask.dtype, device=attention_mask.device\\n\",\n    \"            )\\n\",\n    \"            attention_mask = torch.cat([attention_mask[:, 1:], attention_mask_update_slice], dim=-1)\\n\",\n    \"            mask = {attention_mask_name: attention_mask}\\n\",\n    \"            return mask\\n\",\n    \"\\n\",\n    \"        mask = _update_attention(model_kwargs, is_encoder_decoder)\\n\",\n    \"        # sets the updated variables (mask and past_key_values)\\n\",\n    \"        model_kwargs.update(mask)\\n\",\n    \"\\n\",\n    \"        # Set a mock cache tensor\\n\",\n    \"        model_kwargs[\\\"past_key_values\\\"] = torch.tensor([])\\n\",\n    \"\\n\",\n    \"        return model_kwargs\\n\",\n    \"    \\n\",\n    \"    def _reorder_cache(self, past_key_values, beam_idx):\\n\",\n    \"        '''\\n\",\n    \"            This is needed for beam search and not greedy sampling\\n\",\n    \"            We reorder the cache within the trace so we can skip it in modelling_t5.py. 
So we override the _reorder_cache\\n\",\n    \"        '''\\n\",\n    \"        self.beam_idx = beam_idx\\n\",\n    \"        return past_key_values\\n\",\n    \"\\n\",\n    \"    def generate(self,\\n\",\n    \"                tokenizer: T5Tokenizer,\\n\",\n    \"                prompt: str,\\n\",\n    \"                max_length: int,\\n\",\n    \"                num_beams: int,\\n\",\n    \"                num_return_sequences: int,\\n\",\n    \"                device: str):\\n\",\n    \"\\n\",\n    \"        batch_encoding = tokenizer(prompt, max_length=max_length, truncation=True, padding='max_length',\\n\",\n    \"                                return_tensors=\\\"pt\\\")\\n\",\n    \"\\n\",\n    \"        past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask'])\\n\",\n    \" \\n\",\n    \"        decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32),\\n\",\n    \"                                            torch.ones((1, 1), dtype=torch.int32)], axis=1)\\n\",\n    \"\\n\",\n    \"        # copy the new cache state to the decoder\\n\",\n    \"        if device == \\\"xla\\\":\\n\",\n    \"            for state, tensor in zip(self.decoder.parameters(), past_key_values):\\n\",\n    \"                state.copy_(tensor)\\n\",\n    \"        else:\\n\",\n    \"            # First half of the cache is self attention and the rest is cross attention\\n\",\n    \"            self.decoder.past_key_values_sa = past_key_values[:len(past_key_values)//2]\\n\",\n    \"            self.decoder.past_key_values_ca = past_key_values[len(past_key_values)//2:]\\n\",\n    \"        \\n\",\n    \"        output = super().generate(**batch_encoding,\\n\",\n    \"                                max_length=max_length,\\n\",\n    \"                                num_beams=num_beams,\\n\",\n    \"                                num_return_sequences=num_return_sequences,\\n\",\n    \"                                do_sample=False,\\n\",\n    \"                                use_cache=True,\\n\",\n    \"                                decoder_attention_mask=decoder_attention_mask, \\n\",\n    \"                                encoder_outputs={\\\"last_hidden_state\\\": torch.ones((1,128,1))}) # Pass fake encoder_outputs so the transfomers code will not invoke the encoder\\n\",\n    \"        return output\\n\",\n    \"\\n\",\n    \"    def forward(\\n\",\n    \"        self,\\n\",\n    \"        attention_mask: Optional[torch.FloatTensor] = None,\\n\",\n    \"        decoder_input_ids: Optional[torch.LongTensor] = None,\\n\",\n    \"        decoder_attention_mask: Optional[torch.BoolTensor] = None,\\n\",\n    \"        encoder_outputs: Optional[Tuple[Tuple[torch.Tensor]]] = None,\\n\",\n    \"        beam_scores = None,\\n\",\n    \"        **kwargs\\n\",\n    \"    ) -> Union[Tuple[torch.FloatTensor], Seq2SeqLMOutput]:\\n\",\n    \"\\n\",\n    \"        hidden_states = encoder_outputs[\\\"last_hidden_state\\\"]\\n\",\n    \"\\n\",\n    \"        if not hasattr(self, 'beam_idx'):\\n\",\n    \"            # Infering the number of beams from the attention mask\\n\",\n    \"            num_beams = attention_mask.shape[0]\\n\",\n    \"            self.beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\\n\",\n    \"\\n\",\n    \"        decoder_outputs = self.decoder(\\n\",\n    \"            decoder_input_ids,\\n\",\n    \"            decoder_attention_mask,\\n\",\n    \"            hidden_states,\\n\",\n    \"            
attention_mask,\\n\",\n    \"            self.beam_idx,\\n\",\n    \"            beam_scores\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        # lm_logits = decoder_outputs[0]\\n\",\n    \"        next_token_scores = decoder_outputs[0]\\n\",\n    \"        next_tokens = decoder_outputs[1]\\n\",\n    \"        next_indices = decoder_outputs[2]\\n\",\n    \"\\n\",\n    \"        return next_token_scores, next_tokens, next_indices\\n\",\n    \"\\n\",\n    \"    def beam_search(\\n\",\n    \"        self,\\n\",\n    \"        input_ids: torch.LongTensor,\\n\",\n    \"        beam_scorer: BeamScorer,\\n\",\n    \"        logits_processor: Optional[LogitsProcessorList] = None,\\n\",\n    \"        stopping_criteria: Optional[StoppingCriteriaList] = None,\\n\",\n    \"        max_length: Optional[int] = None,\\n\",\n    \"        pad_token_id: Optional[int] = None,\\n\",\n    \"        eos_token_id: Optional[Union[int, List[int]]] = None,\\n\",\n    \"        output_attentions: Optional[bool] = None,\\n\",\n    \"        output_hidden_states: Optional[bool] = None,\\n\",\n    \"        output_scores: Optional[bool] = None,\\n\",\n    \"        return_dict_in_generate: Optional[bool] = None,\\n\",\n    \"        synced_gpus: Optional[bool] = False,\\n\",\n    \"        seq_length: Optional[int] = None,\\n\",\n    \"        **model_kwargs,\\n\",\n    \"    ) -> Union[BeamSearchOutput, torch.LongTensor]:\\n\",\n    \"\\n\",\n    \"        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\\n\",\n    \"        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\\n\",\n    \"        pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\\n\",\n    \"        eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\\n\",\n    \"        if isinstance(eos_token_id, int):\\n\",\n    \"            eos_token_id = [eos_token_id]\\n\",\n    \"        output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\\n\",\n    \"        output_attentions = (\\n\",\n    \"            output_attentions if output_attentions is not None else self.generation_config.output_attentions\\n\",\n    \"        )\\n\",\n    \"        output_hidden_states = (\\n\",\n    \"            output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        batch_size = len(beam_scorer._beam_hyps)\\n\",\n    \"        num_beams = beam_scorer.num_beams\\n\",\n    \"\\n\",\n    \"        batch_beam_size, cur_len = input_ids.shape\\n\",\n    \"\\n\",\n    \"        # Overwrite cur_len\\n\",\n    \"        cur_len = seq_length\\n\",\n    \"\\n\",\n    \"        if num_beams * batch_size != batch_beam_size:\\n\",\n    \"            raise ValueError(\\n\",\n    \"                f\\\"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}.\\\"\\n\",\n    \"            )\\n\",\n    \"\\n\",\n    \"        # init attention / hidden states / scores tuples\\n\",\n    \"        scores = () if (return_dict_in_generate and output_scores) else None\\n\",\n    \"        beam_indices = (\\n\",\n    \"            tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        # initialise score of first beam with 0 
and the rest with -1e9. This makes sure that only tokens\\n\",\n    \"        # of the first beam are considered to avoid sampling the exact same tokens across all beams.\\n\",\n    \"        # beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=input_ids.device)\\n\",\n    \"        beam_scores_device = \\\"cpu\\\"\\n\",\n    \"        beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=beam_scores_device)\\n\",\n    \"        beam_scores[:, 1:] = -1e9\\n\",\n    \"        beam_scores = beam_scores.view((batch_size * num_beams,))\\n\",\n    \"\\n\",\n    \"        while True:\\n\",\n    \"            # prepare model inputs\\n\",\n    \"            # From max_length-sized input_ids, select first\\n\",\n    \"            # cur_len - 1 values.\\n\",\n    \"            update_indices = torch.stack(\\n\",\n    \"                [torch.arange(input_ids.size(0)), torch.tensor(cur_len - 1).repeat(input_ids.size(0))], dim=-1\\n\",\n    \"            )\\n\",\n    \"            input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\\n\",\n    \"            model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\\n\",\n    \"\\n\",\n    \"            next_token_scores, next_tokens, next_indices = self(\\n\",\n    \"                **model_inputs,\\n\",\n    \"                return_dict=True,\\n\",\n    \"                output_attentions=output_attentions,\\n\",\n    \"                output_hidden_states=output_hidden_states,\\n\",\n    \"                beam_scores=beam_scores\\n\",\n    \"            )\\n\",\n    \"\\n\",\n    \"            # stateless\\n\",\n    \"            beam_outputs = beam_scorer.process(\\n\",\n    \"                input_ids.to(\\\"cpu\\\")[:, :cur_len],\\n\",\n    \"                next_token_scores.to(\\\"cpu\\\"),\\n\",\n    \"                next_tokens.to(\\\"cpu\\\"),\\n\",\n    \"                next_indices.to(\\\"cpu\\\"),\\n\",\n    \"                pad_token_id=pad_token_id,\\n\",\n    \"                eos_token_id=eos_token_id,\\n\",\n    \"                beam_indices=beam_indices,\\n\",\n    \"            )\\n\",\n    \"\\n\",\n    \"            beam_scores = beam_outputs[\\\"next_beam_scores\\\"]\\n\",\n    \"            beam_next_tokens = beam_outputs[\\\"next_beam_tokens\\\"]\\n\",\n    \"            beam_idx = beam_outputs[\\\"next_beam_indices\\\"]\\n\",\n    \"\\n\",\n    \"            update_indices = torch.stack(\\n\",\n    \"                [torch.arange(batch_beam_size), torch.tensor(cur_len - 1).repeat(batch_beam_size)], dim=-1\\n\",\n    \"            )\\n\",\n    \"            update_indices_2 = torch.stack(\\n\",\n    \"                [torch.arange(batch_beam_size), torch.tensor(cur_len).repeat(batch_beam_size)], dim=-1\\n\",\n    \"            )\\n\",\n    \"            # First select beam_indices\\n\",\n    \"            device = input_ids.device\\n\",\n    \"            beam_idx_device = beam_idx.to(device=input_ids.device)\\n\",\n    \"            input_ids[:, :] = input_ids[beam_idx_device.long(), :]\\n\",\n    \"\\n\",\n    \"            # Then append new tokens\\n\",\n    \"            input_ids[update_indices_2[:, 0], update_indices_2[:, 1], None] = beam_next_tokens.unsqueeze(-1).to(device).to(torch.long)\\n\",\n    \"            input_ids = input_ids * 1  # Hack to materialize tensor\\n\",\n    \"\\n\",\n    \"            # update generated ids, model inputs, and length for next step\\n\",\n    \"            model_kwargs = 
self._update_model_kwargs_for_xla_generation(\\n\",\n    \"                model_kwargs,\\n\",\n    \"                batch_size=batch_beam_size,\\n\",\n    \"                is_encoder_decoder=self.config.is_encoder_decoder,\\n\",\n    \"                max_length=stopping_criteria.max_length,\\n\",\n    \"                seq_length=cur_len,\\n\",\n    \"                use_cache=model_kwargs[\\\"use_cache\\\"],\\n\",\n    \"            )\\n\",\n    \"            if model_kwargs[\\\"past_key_values\\\"] is not None:\\n\",\n    \"                model_kwargs[\\\"past_key_values\\\"] = self._reorder_cache(model_kwargs[\\\"past_key_values\\\"], beam_idx.to(torch.int64))\\n\",\n    \"\\n\",\n    \"            if return_dict_in_generate and output_scores:\\n\",\n    \"                beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices))))\\n\",\n    \"\\n\",\n    \"            # increase cur_len\\n\",\n    \"            cur_len = cur_len + 1\\n\",\n    \"\\n\",\n    \"            # stop when each sentence is finished, or if we exceed the maximum length\\n\",\n    \"            stop_criterion_1 = beam_scorer.is_done\\n\",\n    \"            if isinstance(stopping_criteria, list):\\n\",\n    \"                if len(stopping_criteria) == 1:\\n\",\n    \"                    stopping_criteria = stopping_criteria[0]\\n\",\n    \"\\n\",\n    \"            # Cases that can be handled in XLA without requiring\\n\",\n    \"            # non-padded input_ids\\n\",\n    \"            if isinstance(stopping_criteria, MaxLengthCriteria):\\n\",\n    \"                stop_criterion_2 = cur_len >= stopping_criteria.max_length\\n\",\n    \"            elif isinstance(stopping_criteria, MaxTimeCriteria):\\n\",\n    \"                stop_criterion_2 = stopping_criteria(input_ids, scores)\\n\",\n    \"            else:\\n\",\n    \"                # Other cases will be handled on CPU\\n\",\n    \"                batch_size, _ = input_ids.shape\\n\",\n    \"                input_ids_cpu = input_ids.to(\\\"cpu\\\")\\n\",\n    \"                mask = torch.cat(\\n\",\n    \"                    [torch.ones(batch_size, cur_len), torch.zeros(batch_size, input_ids.shape[1] - cur_len)], dim=1\\n\",\n    \"                ).bool()\\n\",\n    \"                input_ids_cpu = torch.masked_select(input_ids_cpu, mask).reshape((batch_size, cur_len))\\n\",\n    \"                scores_cpu = scores.to(\\\"cpu\\\") if torch.is_tensor(scores) else scores\\n\",\n    \"                stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\\n\",\n    \"\\n\",\n    \"            if stop_criterion_1 or stop_criterion_2:\\n\",\n    \"                if not synced_gpus:\\n\",\n    \"                    break\\n\",\n    \"                else:\\n\",\n    \"                    this_peer_finished = True\\n\",\n    \"\\n\",\n    \"        sequence_outputs = beam_scorer.finalize(\\n\",\n    \"            input_ids.to(\\\"cpu\\\"),\\n\",\n    \"            beam_scores.to(\\\"cpu\\\"),\\n\",\n    \"            next_tokens.to(\\\"cpu\\\"),\\n\",\n    \"            next_indices.to(\\\"cpu\\\"),\\n\",\n    \"            pad_token_id=pad_token_id,\\n\",\n    \"            eos_token_id=eos_token_id,\\n\",\n    \"            max_length=stopping_criteria.max_length,\\n\",\n    \"            beam_indices=beam_indices,\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        for k, v in sequence_outputs.items():\\n\",\n    \"            if type(v) == torch.Tensor:\\n\",\n    \"                
sequence_outputs[k] = sequence_outputs[k].to(input_ids.device)\\n\",\n    \"\\n\",\n    \"        return sequence_outputs[\\\"sequences\\\"]\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"    def greedy_search(\\n\",\n    \"        self,\\n\",\n    \"        input_ids: torch.LongTensor,\\n\",\n    \"        logits_processor: Optional[LogitsProcessorList] = None,\\n\",\n    \"        stopping_criteria: Optional[StoppingCriteriaList] = None,\\n\",\n    \"        max_length: Optional[int] = None,\\n\",\n    \"        pad_token_id: Optional[int] = None,\\n\",\n    \"        eos_token_id: Optional[Union[int, List[int]]] = None,\\n\",\n    \"        output_attentions: Optional[bool] = None,\\n\",\n    \"        output_hidden_states: Optional[bool] = None,\\n\",\n    \"        output_scores: Optional[bool] = None,\\n\",\n    \"        return_dict_in_generate: Optional[bool] = None,\\n\",\n    \"        seq_length: Optional[int] = int,\\n\",\n    \"        streamer: Optional[\\\"BaseStreamer\\\"] = None,\\n\",\n    \"        **model_kwargs,\\n\",\n    \"    ) -> Union[GreedySearchOutput, torch.LongTensor]:\\n\",\n    \"        \\\"\\\"\\\"\\n\",\n    \"            Overriding greedy sampling to use next tokens returned from neuron device instead of logits.\\n\",\n    \"        \\\"\\\"\\\"\\n\",\n    \"        # init values\\n\",\n    \"        logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\\n\",\n    \"        use_cache = model_kwargs[\\\"use_cache\\\"] if \\\"use_cache\\\" in model_kwargs else False\\n\",\n    \"        stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\\n\",\n    \"        pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\\n\",\n    \"        eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\\n\",\n    \"        if isinstance(eos_token_id, int):\\n\",\n    \"            eos_token_id = [eos_token_id]\\n\",\n    \"        eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None\\n\",\n    \"        output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\\n\",\n    \"        output_attentions = (\\n\",\n    \"            output_attentions if output_attentions is not None else self.generation_config.output_attentions\\n\",\n    \"        )\\n\",\n    \"        output_hidden_states = (\\n\",\n    \"            output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        # init attention / hidden states / scores tuples\\n\",\n    \"        scores = () if (return_dict_in_generate and output_scores) else None\\n\",\n    \"        decoder_attentions = () if (return_dict_in_generate and output_attentions) else None\\n\",\n    \"        cross_attentions = () if (return_dict_in_generate and output_attentions) else None\\n\",\n    \"        decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"        # keep track of which sequences are already finished\\n\",\n    \"        unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)\\n\",\n    \"\\n\",\n    \"        this_peer_finished = False  # used by synced_gpus only\\n\",\n    \"        while True:\\n\",\n    \"\\n\",\n    \"            # prepare 
model inputs\\n\",\n    \"            # From max_length-sized input_ids, select first\\n\",\n    \"            # seq_length - 1 values.\\n\",\n    \"\\n\",\n    \"            if model_kwargs.get(\\\"past_key_values\\\") is None:\\n\",\n    \"                input_ids_ = input_ids[:, :seq_length]\\n\",\n    \"            else:\\n\",\n    \"                update_indices = torch.stack(\\n\",\n    \"                    [torch.arange(input_ids.size(0)), torch.tensor(seq_length - 1).repeat(input_ids.size(0))],\\n\",\n    \"                    dim=-1,\\n\",\n    \"                )\\n\",\n    \"                input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\\n\",\n    \"\\n\",\n    \"            model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\\n\",\n    \"        \\n\",\n    \"            # forward pass to get next token\\n\",\n    \"            output = self(\\n\",\n    \"               **model_inputs,\\n\",\n    \"                return_dict=True,\\n\",\n    \"                output_attentions=output_attentions,\\n\",\n    \"                output_hidden_states=output_hidden_states,\\n\",\n    \"            )\\n\",\n    \"            next_tokens = output[0]\\n\",\n    \"\\n\",\n    \"            # finished sentences should have their next token be a padding token\\n\",\n    \"            if eos_token_id is not None:\\n\",\n    \"                if pad_token_id is None:\\n\",\n    \"                    raise ValueError(\\\"If `eos_token_id` is defined, make sure that `pad_token_id` is defined.\\\")\\n\",\n    \"                next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)\\n\",\n    \"\\n\",\n    \"            # update generated ids, model inputs, and length for next step\\n\",\n    \"\\n\",\n    \"            batch_size, _ = input_ids.shape\\n\",\n    \"            update_indices = torch.stack(\\n\",\n    \"                [torch.arange(batch_size), torch.tensor(seq_length).repeat(batch_size)], dim=-1\\n\",\n    \"            )\\n\",\n    \"            input_ids[update_indices[:, 0], update_indices[:, 1]] = next_tokens[:]\\n\",\n    \"            model_kwargs = self._update_model_kwargs_for_xla_generation(\\n\",\n    \"                model_kwargs,\\n\",\n    \"                batch_size=batch_size,\\n\",\n    \"                is_encoder_decoder=self.config.is_encoder_decoder,\\n\",\n    \"                max_length=stopping_criteria.max_length,\\n\",\n    \"                seq_length=seq_length,\\n\",\n    \"                use_cache=use_cache,\\n\",\n    \"            )\\n\",\n    \"\\n\",\n    \"            seq_length += 1\\n\",\n    \"\\n\",\n    \"            # if eos_token was found in one sentence, set sentence to finished\\n\",\n    \"            if eos_token_id_tensor is not None:\\n\",\n    \"                unfinished_sequences = unfinished_sequences.mul(\\n\",\n    \"                    next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)\\n\",\n    \"                )\\n\",\n    \"\\n\",\n    \"            # stop when each sentence is finished, or if we exceed the maximum length\\n\",\n    \"            stop_criterion_1 = unfinished_sequences.max() == 0\\n\",\n    \"\\n\",\n    \"            if isinstance(stopping_criteria, list):\\n\",\n    \"                if len(stopping_criteria) == 1:\\n\",\n    \"                    stopping_criteria = stopping_criteria[0]\\n\",\n    \"\\n\",\n    \"            # Cases that can be handled in 
XLA without requiring\\n\",\n    \"            # non-padded input_ids\\n\",\n    \"            if isinstance(stopping_criteria, MaxLengthCriteria):\\n\",\n    \"                stop_criterion_2 = seq_length >= stopping_criteria.max_length\\n\",\n    \"            elif isinstance(stopping_criteria, MaxTimeCriteria):\\n\",\n    \"                stop_criterion_2 = stopping_criteria(input_ids, scores)\\n\",\n    \"            else:\\n\",\n    \"                # Other cases will be handled on CPU\\n\",\n    \"                batch_size, _ = input_ids.shape\\n\",\n    \"                mask = torch.cat(\\n\",\n    \"                    [torch.ones(batch_size, seq_length), torch.zeros(batch_size, input_ids.shape[1] - seq_length)],\\n\",\n    \"                    dim=1,\\n\",\n    \"                ).bool()\\n\",\n    \"                input_ids_cpu = torch.masked_select(input_ids, mask).reshape((batch_size, seq_length)).to(\\\"cpu\\\")\\n\",\n    \"                scores_cpu = scores.to(\\\"cpu\\\") if torch.is_tensor(scores) else scores\\n\",\n    \"                stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\\n\",\n    \"\\n\",\n    \"            if stop_criterion_1 or stop_criterion_2:\\n\",\n    \"                this_peer_finished = True\\n\",\n    \"\\n\",\n    \"            if this_peer_finished:\\n\",\n    \"                break\\n\",\n    \"\\n\",\n    \"        if streamer is not None:\\n\",\n    \"            streamer.end()\\n\",\n    \"\\n\",\n    \"        return input_ids\\n\",\n    \"    \\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now let's test inference on CPU with all the wrappers before tracing.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Let's set some run parameters\\n\",\n    \"\\n\",\n    \"model_name = \\\"t5-large\\\"\\n\",\n    \"num_beams = 1\\n\",\n    \"num_return_sequences = 1\\n\",\n    \"max_length = 128\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Results:\\n\",\n      \"1 Lassen Sie uns gutes Essen essen.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"from transformers import T5Tokenizer\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"prompt=\\\"translate English to German: Lets eat good food.\\\"\\n\",\n    \"        \\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\\n\",\n    \"model = T5Wrapper.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"model.encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, \\\"cpu\\\", num_beams)\\n\",\n    \"setattr(model.encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"\\n\",\n    \"model.decoder = DecoderWrapper(decoder=model.decoder,\\n\",\n    \"                                lm_head=model.lm_head,\\n\",\n    \"                                model_config=model.config,\\n\",\n    \"                                num_beams=num_beams,\\n\",\n    \"                                max_length=max_length,\\n\",\n    \"                                device=\\\"cpu\\\")\\n\",\n    \"\\n\",\n    \"output = model.generate(tokenizer=tokenizer,\\n\",\n    \"                        prompt=prompt,\\n\",\n    \"                        max_length=max_length,\\n\",\n  
  \"                        num_beams=num_beams,\\n\",\n    \"                        num_return_sequences=num_return_sequences,\\n\",\n    \"                        device=\\\"cpu\\\")\\n\",\n    \"\\n\",\n    \"results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\\n\",\n    \"\\n\",\n    \"print('Results:')\\n\",\n    \"for i, summary in enumerate(results):\\n\",\n    \"    print(i + 1, summary)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Now that the wrappers are running as expected, let's trace the encoder, and decoder. To trace these functions, we pass the function and a sample input to the trace function. The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur.\\n\",\n    \"\\n\",\n    \"The decoder wrapper returns the new state of the cache as an output which is copied back to the CPU. As the cache is a large tensor, copying it to and from the XLA device for each decoder invocation will significantly slow down the inference. Instead, we can use input output aliasing, a feature of `torch_neuronx` to keep these tensors on device rather than copying back to the CPU. To use input output aliasing, we need to map the outputs to input parameters while tracing. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuronx\\n\",\n    \"\\n\",\n    \"from transformers import T5Tokenizer, T5ForConditionalGeneration\\n\",\n    \"\\n\",\n    \"def trace_encoder(model: T5ForConditionalGeneration,\\n\",\n    \"                  tokenizer: T5Tokenizer,\\n\",\n    \"                  max_length: int,\\n\",\n    \"                  num_beams: int):\\n\",\n    \"    \\n\",\n    \"    # Trace encoder\\n\",\n    \"    batch_encoding = tokenizer(\\\"translate English to German: Lets go home now\\\",\\n\",\n    \"                               max_length=max_length, truncation=True, padding='max_length', return_tensors=\\\"pt\\\")\\n\",\n    \"    input_ids = batch_encoding['input_ids']\\n\",\n    \"    attention_mask = batch_encoding['attention_mask']\\n\",\n    \"\\n\",\n    \"    encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, \\\"xla\\\", num_beams)\\n\",\n    \"    traced_encoder = torch_neuronx.trace(encoder, (input_ids, attention_mask), compiler_workdir=\\\"/tmp/encoder/\\\")\\n\",\n    \"    setattr(traced_encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"\\n\",\n    \"    return traced_encoder\\n\",\n    \"\\n\",\n    \"def trace_decoder(model: T5ForConditionalGeneration,\\n\",\n    \"                  num_beams: int,\\n\",\n    \"                  max_length: int):\\n\",\n    \"\\n\",\n    \"    decoder = DecoderWrapper(decoder=model.decoder,\\n\",\n    \"                             lm_head=model.lm_head,\\n\",\n    \"                             model_config=model.config,\\n\",\n    \"                             num_beams=num_beams,\\n\",\n    \"                             max_length=max_length,\\n\",\n    \"                           
  device=\\\"xla\\\")\\n\",\n    \"\\n\",\n    \"    # We create mock inputs so we can trace the decoder\\n\",\n    \"    decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64)\\n\",\n    \"    decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32)\\n\",\n    \"    encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64)\\n\",\n    \"    encoder_hidden_states = torch.ones((num_beams, max_length, model.config.d_model), dtype=torch.float32)\\n\",\n    \"\\n\",\n    \"    beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\\n\",\n    \"    beam_scores = torch.zeros((num_beams,), dtype=torch.float)\\n\",\n    \"\\n\",\n    \"    num_outputs_from_trace = 3 if num_beams > 1 else 1\\n\",\n    \"\\n\",\n    \"    aliases = {}\\n\",\n    \"    for i in range(len(decoder.past_key_values_sa)):\\n\",\n    \"        aliases[decoder.past_key_values_sa[i]] = i + num_outputs_from_trace\\n\",\n    \"    for i in range(len(decoder.past_key_values_ca)):\\n\",\n    \"        aliases[decoder.past_key_values_ca[i]] = len(decoder.past_key_values_sa) + i + num_outputs_from_trace\\n\",\n    \"\\n\",\n    \"    traced_decoder = torch_neuronx.trace(decoder, (\\n\",\n    \"        decoder_input_ids,\\n\",\n    \"        decoder_attention_mask,\\n\",\n    \"        encoder_hidden_states,\\n\",\n    \"        encoder_attention_mask,\\n\",\n    \"        beam_idx,\\n\",\n    \"        beam_scores,\\n\",\n    \"    ), input_output_aliases=aliases, compiler_workdir=\\\"/tmp/decoder/\\\")\\n\",\n    \"\\n\",\n    \"    return traced_decoder\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\\n\",\n    \"model = T5ForConditionalGeneration.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"# We enable this flag to ensure model uses attention key value caching\\n\",\n    \"model.config.use_cache = True\\n\",\n    \"\\n\",\n    \"traced_encoder = trace_encoder(model, tokenizer, max_length, num_beams)\\n\",\n    \"traced_decoder = trace_decoder(model, num_beams, max_length)\\n\",\n    \"\\n\",\n    \"torch.jit.save(traced_encoder, \\\"TracedEncoder.pt\\\")\\n\",\n    \"torch.jit.save(traced_decoder, \\\"TracedDecoder.pt\\\")\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference with greedy decoding\\n\",\n    \"Now that we have the traced model, let's use it for inference. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Results:\\n\",\n      \"1 Lassen Sie uns gutes Essen essen.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"runtime = torch.classes.neuron.Runtime()\\n\",\n    \"runtime.initialize()\\n\",\n    \"runtime.set_default_neuron_cores(0, 1)\\n\",\n    \"\\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name)\\n\",\n    \"model = T5Wrapper.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"model.encoder = torch.jit.load(\\\"TracedEncoder.pt\\\")\\n\",\n    \"# Attribute required by beam search\\n\",\n    \"setattr(model.encoder, 'main_input_name', 'input_ids')  \\n\",\n    \"\\n\",\n    \"model.decoder = torch.jit.load(\\\"TracedDecoder.pt\\\")\\n\",\n    \"torch_neuronx.move_trace_to_device(model.decoder, 0)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"output = model.generate(tokenizer=tokenizer,\\n\",\n    \"                        prompt=\\\"translate English to German: Lets eat good food.\\\",\\n\",\n    \"                        max_length=max_length,\\n\",\n    \"                        num_beams=num_beams,\\n\",\n    \"                        num_return_sequences=num_return_sequences,\\n\",\n    \"                        device=\\\"xla\\\")\\n\",\n    \"\\n\",\n    \"results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\\n\",\n    \"\\n\",\n    \"print('Results:')\\n\",\n    \"for i, summary in enumerate(results):\\n\",\n    \"    print(i + 1, summary)\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run inference with beam search\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# Let's set some run parameters for beam search\\n\",\n    \"\\n\",\n    \"model_name = \\\"t5-large\\\"\\n\",\n    \"num_beams = 4\\n\",\n    \"num_return_sequences = 4\\n\",\n    \"max_length = 128\\n\",\n    \"\\n\",\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\\n\",\n    \"model = T5ForConditionalGeneration.from_pretrained(model_name)\\n\",\n    \"model.config.use_cache = True\\n\",\n    \"\\n\",\n    \"traced_encoder = trace_encoder(model, tokenizer, max_length, num_beams)\\n\",\n    \"traced_decoder = trace_decoder(model, num_beams, max_length)\\n\",\n    \"\\n\",\n    \"torch.jit.save(traced_encoder, \\\"TracedEncoder.pt\\\")\\n\",\n    \"torch.jit.save(traced_decoder, \\\"TracedDecoder.pt\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Results:\\n\",\n      \"1 Lassen Sie uns gutes Essen essen.\\n\",\n      \"2 Lassen Sie uns gutes Essen zu essen.\\n\",\n      \"3 Lassen Sie uns essen gutes Essen.\\n\",\n      \"4 Lassen Sie uns gutes Essen.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"tokenizer = T5Tokenizer.from_pretrained(model_name)\\n\",\n    \"model = T5Wrapper.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"model.encoder = torch.jit.load(\\\"TracedEncoder.pt\\\")\\n\",\n    \"# Attribute required by beam search\\n\",\n    \"setattr(model.encoder, 'main_input_name', 'input_ids')  \\n\",\n    \"\\n\",\n    \"model.decoder = 
torch.jit.load(\\\"TracedDecoder.pt\\\")\\n\",\n    \"torch_neuronx.move_trace_to_device(model.decoder, 0)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"output = model.generate(tokenizer=tokenizer,\\n\",\n    \"                        prompt=\\\"translate English to German: Lets eat good food.\\\",\\n\",\n    \"                        max_length=max_length,\\n\",\n    \"                        num_beams=num_beams,\\n\",\n    \"                        num_return_sequences=num_return_sequences,\\n\",\n    \"                        device=\\\"xla\\\")\\n\",\n    \"\\n\",\n    \"results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\\n\",\n    \"\\n\",\n    \"print('Results:')\\n\",\n    \"for i, summary in enumerate(results):\\n\",\n    \"    print(i + 1, summary)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"venv\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  },\n  \"orig_nbformat\": 4\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
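The notebook above relies on `torch_neuronx.trace` with `input_output_aliases` to keep the decoder's KV cache resident on the Neuron device instead of copying it back to the CPU on every decoding step. As a standalone illustration of that aliasing pattern only, here is a minimal sketch; the `CachedStep` module, its tensor shapes, and the variable names are hypothetical (not part of the tutorial), and actually compiling it requires a Neuron instance with `torch_neuronx` installed.

```python
import torch
import torch_neuronx


class CachedStep(torch.nn.Module):
    """Toy module with a persistent cache tensor, used only to illustrate aliasing."""

    def __init__(self):
        super().__init__()
        # Cache kept as a non-trainable Parameter so it shows up in parameters()
        self.cache = torch.nn.Parameter(torch.zeros(4, 8), requires_grad=False)

    def forward(self, x):
        out = x + self.cache           # use the current cache
        new_cache = self.cache + 1.0   # compute the next cache state
        # Return the primary result first, then the updated cache
        return out, new_cache


step = CachedStep()
example = torch.ones(4, 8)

# Map the cache parameter to output index 1 (its updated value), so the runtime
# writes the new cache back in place on the Neuron device instead of returning it.
aliases = {step.cache: 1}
traced = torch_neuronx.trace(step, (example,), input_output_aliases=aliases)
```

The dictionary maps each cache parameter to the index of the output holding its updated value, mirroring how the notebook aliases the entries of `past_key_values_sa` and `past_key_values_ca` after the primary decoder outputs.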
  {
    "path": "src/examples/pytorch/torchserve/benchmark_bert.py",
    "content": "import os\nimport argparse\nimport time\nimport numpy as np\nimport requests\nimport sys\nfrom concurrent import futures\n\nimport torch\n\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--url', help='Torchserve model URL', type=str, default=f'http://127.0.0.1:8080/predictions/bert-max_length128-batch_size6')\nparser.add_argument('--num_thread', type=int, default=64, help='Number of threads invoking the model URL')\nparser.add_argument('--batch_size', type=int, default=6)\nparser.add_argument('--sequence_length', type=int, default=128)\nparser.add_argument('--latency_window_size', type=int, default=1000)\nparser.add_argument('--throughput_time', type=int, default=300)\nparser.add_argument('--throughput_interval', type=int, default=10)\nargs = parser.parse_args()\n\ndata = { 'seq_0': 'A completely made up sentence.',\n    'seq_1': 'Well, I suppose they are all made up.' }\nlive = True\nnum_infer = 0\nlatency_list = []\n\n\ndef one_thread(pred, feed_data):\n    global latency_list\n    global num_infer\n    global live\n    session = requests.Session()\n    while True:\n        start = time.time()\n        result = session.post(pred, data=feed_data)\n        latency = time.time() - start\n        latency_list.append(latency)\n        num_infer += 1\n        if not live:\n            break\n\n\ndef current_performance():\n    last_num_infer = num_infer\n    for _ in range(args.throughput_time // args.throughput_interval):\n        current_num_infer = num_infer\n        throughput = (current_num_infer - last_num_infer) / args.throughput_interval\n        p50 = 0.0\n        p90 = 0.0\n        if latency_list:\n            p50 = np.percentile(latency_list[-args.latency_window_size:], 50)\n            p90 = np.percentile(latency_list[-args.latency_window_size:], 90)\n        print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90))\n        sys.stdout.flush()\n        last_num_infer = current_num_infer\n        time.sleep(args.throughput_interval)\n    global live\n    live = False\n\n\nwith futures.ThreadPoolExecutor(max_workers=args.num_thread+1) as executor:\n    executor.submit(current_performance)\n    for _ in range(args.num_thread):\n        executor.submit(one_thread, args.url, data)"
  },
  {
    "path": "src/examples/pytorch/torchserve/config.json",
    "content": "{\n    \"model_name\": \"bert-base-cased-finetuned-mrpc\",\n    \"max_length\": 128,\n    \"batch_size\": 6\n}"
  },
  {
    "path": "src/examples/pytorch/torchserve/handler_bert.py",
    "content": "import os\nimport json\nimport sys\nimport logging\nfrom abc import ABC\n\nimport torch\nimport torch_neuron\n\nfrom transformers import AutoTokenizer\nfrom ts.torch_handler.base_handler import BaseHandler\n\n\n# one core per worker\nos.environ['NEURON_RT_NUM_CORES'] = '1'\n\nlogger = logging.getLogger(__name__)\n\nclass BertEmbeddingHandler(BaseHandler, ABC):\n    \"\"\"\n    Handler class for Bert Embedding computations.\n    \"\"\"\n    def __init__(self):\n        super(BertEmbeddingHandler, self).__init__()\n        self.initialized = False\n\n    def initialize(self, ctx):\n        self.manifest = ctx.manifest\n        properties = ctx.system_properties\n        self.device = 'cpu'\n        model_dir = properties.get('model_dir')\n        serialized_file = self.manifest['model']['serializedFile']\n        model_pt_path = os.path.join(model_dir, serialized_file)\n\n        # point sys.path to our config file\n        with open('config.json') as fp:\n            config = json.load(fp)\n        self.max_length = config['max_length']\n        self.batch_size = config['batch_size']\n        self.classes = ['not paraphrase', 'paraphrase']\n\n        self.model = torch.jit.load(model_pt_path)\n        logger.debug(f'Model loaded from {model_dir}')\n        self.model.to(self.device)\n        self.model.eval()\n\n        self.tokenizer = AutoTokenizer.from_pretrained(config['model_name'])\n        self.initialized = True\n\n    def preprocess(self, input_data):\n        \"\"\"\n        Tokenization pre-processing\n        \"\"\"\n\n        input_ids = []\n        attention_masks = []\n        token_type_ids = []\n        for row in input_data:\n            seq_0 = row['seq_0'].decode('utf-8')\n            seq_1 = row['seq_1'].decode('utf-8')\n            logger.debug(f'Received text: \"{seq_0}\", \"{seq_1}\"')\n\n            inputs = self.tokenizer.encode_plus(\n                    seq_0,\n                    seq_1,\n                    max_length=self.max_length,\n                    padding='max_length',\n                    truncation=True,\n                    return_tensors='pt'\n                    )\n\n            input_ids.append(inputs['input_ids'])\n            attention_masks.append(inputs['attention_mask'])\n            token_type_ids.append(inputs['token_type_ids'])\n\n        batch = (torch.cat(input_ids, 0),\n                torch.cat(attention_masks, 0),\n                torch.cat(token_type_ids, 0))\n\n        return batch\n\n    def inference(self, inputs):\n        \"\"\"\n        Predict the class of a text using a trained transformer model.\n        \"\"\"\n\n        # sanity check dimensions\n        assert(len(inputs) == 3)\n        num_inferences = len(inputs[0])\n        assert(num_inferences <= self.batch_size)\n\n        # insert padding if we received a partial batch\n        padding = self.batch_size - num_inferences\n        if padding > 0:\n            pad = torch.nn.ConstantPad1d((0, 0, 0, padding), value=0)\n            inputs = [pad(x) for x in inputs]\n\n        outputs = self.model(*inputs)[0]\n        predictions = []\n        for i in range(num_inferences):\n            prediction = self.classes[outputs[i].argmax().item()]\n            predictions.append([prediction])\n            logger.debug(\"Model predicted: '%s'\", prediction)\n        return predictions\n\n    def postprocess(self, inference_output):\n        return inference_output\n"
  },
  {
    "path": "src/examples/pytorch/torchserve/handler_bert_neuronx.py",
    "content": "import os\nimport json\nimport sys\nimport logging\nfrom abc import ABC\n\nimport torch\nimport torch_neuronx\n\nfrom transformers import AutoTokenizer\nfrom ts.torch_handler.base_handler import BaseHandler\n\n\n# one core per worker\nos.environ['NEURON_RT_NUM_CORES'] = '1'\n\nlogger = logging.getLogger(__name__)\n\nclass BertEmbeddingHandler(BaseHandler, ABC):\n    \"\"\"\n    Handler class for Bert Embedding computations.\n    \"\"\"\n    def __init__(self):\n        super(BertEmbeddingHandler, self).__init__()\n        self.initialized = False\n\n    def initialize(self, ctx):\n        self.manifest = ctx.manifest\n        properties = ctx.system_properties\n        self.device = 'cpu'\n        model_dir = properties.get('model_dir')\n        serialized_file = self.manifest['model']['serializedFile']\n        model_pt_path = os.path.join(model_dir, serialized_file)\n\n        # point sys.path to our config file\n        with open('config.json') as fp:\n            config = json.load(fp)\n        self.max_length = config['max_length']\n        self.batch_size = config['batch_size']\n        self.classes = ['not paraphrase', 'paraphrase']\n\n        self.model = torch.jit.load(model_pt_path)\n        logger.debug(f'Model loaded from {model_dir}')\n        self.model.to(self.device)\n        self.model.eval()\n\n        self.tokenizer = AutoTokenizer.from_pretrained(config['model_name'])\n        self.initialized = True\n\n    def preprocess(self, input_data):\n        \"\"\"\n        Tokenization pre-processing\n        \"\"\"\n\n        input_ids = []\n        attention_masks = []\n        token_type_ids = []\n        for row in input_data:\n            seq_0 = row['seq_0'].decode('utf-8')\n            seq_1 = row['seq_1'].decode('utf-8')\n            logger.debug(f'Received text: \"{seq_0}\", \"{seq_1}\"')\n\n            inputs = self.tokenizer.encode_plus(\n                    seq_0,\n                    seq_1,\n                    max_length=self.max_length,\n                    padding='max_length',\n                    truncation=True,\n                    return_tensors='pt'\n                    )\n\n            input_ids.append(inputs['input_ids'])\n            attention_masks.append(inputs['attention_mask'])\n            token_type_ids.append(inputs['token_type_ids'])\n\n        batch = (torch.cat(input_ids, 0),\n                torch.cat(attention_masks, 0),\n                torch.cat(token_type_ids, 0))\n\n        return batch\n\n    def inference(self, inputs):\n        \"\"\"\n        Predict the class of a text using a trained transformer model.\n        \"\"\"\n\n        # sanity check dimensions\n        assert(len(inputs) == 3)\n        num_inferences = len(inputs[0])\n        assert(num_inferences <= self.batch_size)\n\n        # insert padding if we received a partial batch\n        padding = self.batch_size - num_inferences\n        if padding > 0:\n            pad = torch.nn.ConstantPad1d((0, 0, 0, padding), value=0)\n            inputs = [pad(x) for x in inputs]\n\n        outputs = self.model(*inputs)[0]\n        predictions = []\n        for i in range(num_inferences):\n            prediction = self.classes[outputs[i].argmax(dim=-1).item()]\n            predictions.append([prediction])\n            logger.debug(\"Model predicted: '%s'\", prediction)\n        return predictions\n\n    def postprocess(self, inference_output):\n        return inference_output\n"
  },
  {
    "path": "src/examples/pytorch/torchserve/infer_bert.py",
    "content": "import json\nimport concurrent.futures\nimport requests\n\nwith open('config.json') as fp:\n    config = json.load(fp)\nmax_length = config['max_length']\nbatch_size = config['batch_size']\nname = f'bert-max_length{max_length}-batch_size{batch_size}'\n\n# dispatch requests in parallel\nurl = f'http://localhost:8080/predictions/{name}'\nparaphrase = {'seq_0': \"HuggingFace's headquarters are situated in Manhattan\",\n        'seq_1': \"The company HuggingFace is based in New York City\"}\nnot_paraphrase = {'seq_0': paraphrase['seq_0'], 'seq_1': 'This is total nonsense.'}\n\nwith concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as executor:\n    def worker_thread(worker_index):\n        # we'll send half the requests as not_paraphrase examples for sanity\n        data = paraphrase if worker_index < batch_size//2 else not_paraphrase\n        try:\n            response = requests.post(url, data=data)\n\n            # Check if the response status code indicates success\n            if response.status_code == 200:\n                print(worker_index, response.json())\n            else:\n                # If the response is not successful, raise an exception with the status code and error message\n                error_message = response.json().get('message', 'Unknown Error')\n                raise Exception(f\"Failed request with status code {response.status_code}: {error_message}\")\n        except Exception as e:\n            # Catch all other exceptions that may be raised\n            print(f\"An unexpected error occurred: {e}\")\n            raise\n\n    for worker_index in range(batch_size):\n        executor.submit(worker_thread, worker_index)\n"
  },
  {
    "path": "src/examples/pytorch/torchserve/torchserve.config",
    "content": "# bind inference API to all network interfaces with SSL enabled\ninference_address=http://0.0.0.0:8080\ndefault_workers_per_model=1"
  },
  {
    "path": "src/examples/pytorch/torchserve/trace_bert_neuron.py",
    "content": "import torch\nimport torch_neuron\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Build tokenizer and model\ntokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n\n# Setup some example inputs\nsequence_0 = \"The company HuggingFace is based in New York City\"\nsequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n\nmax_length = 128\nbatch_size = 6\n\nparaphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n\nexample_inputs_paraphrase = (\n    torch.cat([paraphrase['input_ids']] * batch_size, 0),\n    torch.cat([paraphrase['attention_mask']] * batch_size, 0),\n    torch.cat([paraphrase['token_type_ids']] * batch_size, 0)\n)\n\n# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\nmodel_neuron_batch = torch_neuron.trace(model, example_inputs_paraphrase)\n\n# Save the batched model\nmodel_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size))\n"
  },
  {
    "path": "src/examples/pytorch/torchserve/trace_bert_neuronx.py",
    "content": "import torch\nimport torch_neuronx\n\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\n\n\n# Build tokenizer and model\ntokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\nmodel = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n\n# Setup some example inputs\nsequence_0 = \"The company HuggingFace is based in New York City\"\nsequence_1 = \"HuggingFace's headquarters are situated in Manhattan\"\n\nmax_length = 128\nbatch_size = 6\n\nparaphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n\nexample_inputs_paraphrase = (\n    torch.cat([paraphrase['input_ids']] * batch_size, 0),\n    torch.cat([paraphrase['attention_mask']] * batch_size, 0),\n    torch.cat([paraphrase['token_type_ids']] * batch_size, 0)\n)\n\n# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\nmodel_neuron_batch = torch_neuronx.trace(model, example_inputs_paraphrase)\n\n# Save the batched model\nmodel_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size))\n"
  },
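  {
    "path": "src/examples/pytorch/torchserve/package_and_serve.py",
    "content": "# Hypothetical packaging sketch (not part of the original example). It assumes the handler\n# shown earlier is saved as handler_bert.py and that config.json was generated with\n# max_length=128 and batch_size=6, so the registered model name matches the endpoint that\n# infer_bert.py posts to. Adjust the file names to your local layout.\nimport os\nimport subprocess\n\nmodel_name = 'bert-max_length128-batch_size6'\nos.makedirs('model_store', exist_ok=True)\n\n# Bundle the traced TorchScript, the handler and config.json into a .mar archive\nsubprocess.run([\n    'torch-model-archiver',\n    '--model-name', model_name,\n    '--version', '1.0',\n    '--serialized-file', 'bert_neuron_b6.pt',\n    '--handler', 'handler_bert.py',\n    '--extra-files', 'config.json',\n    '--export-path', 'model_store',\n    '--force',\n], check=True)\n\n# Start TorchServe with the settings from torchserve.config and register the archive\nsubprocess.run([\n    'torchserve', '--start', '--ncs',\n    '--ts-config', 'torchserve.config',\n    '--model-store', 'model_store',\n    '--models', f'{model_name}={model_name}.mar',\n], check=True)\n"
  },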
  {
    "path": "src/examples/pytorch/transformers-marianmt.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Transformers MarianMT Tutorial\\n\",\n    \"\\n\",\n    \"In this tutorial, you will deploy the [HuggingFace MarianMT](https://huggingface.co/transformers/v4.0.1/model_doc/marian.html) model for text translation.\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on an inf1.6xlarge instance since you will be loading and compiling several large models.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\\n\",\n    \"\\n\",\n    \"To generate text, you will be using the beam search algorithm to incrementally generate token candidates until the full output text has been created. Unlike simple single-pass models, this algorithm divides the work into two distinct phases:\\n\",\n    \"\\n\",\n    \"- **Encoder**: Convert the input text into an encoded representation. (Executed once)\\n\",\n    \"- **Decoder**: Use the encoded representation of the input text and the current output tokens to incrementally generate the set of next best candidate tokens. (Executed many times)\\n\",\n    \"\\n\",\n    \"In this tutorial you will perform the following steps:\\n\",\n    \"\\n\",\n    \"- **Compile**: Compile both the Encoder and Decoder for Neuron using simplified interfaces for inference.\\n\",\n    \"- **Infer**: Run on CPU and Neuron and compare results.\\n\",\n    \"\\n\",\n    \"Finally, a completely unrolled decoder will be built which simplifies the implementation at the cost of performing fixed-length inferences.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"\\n\",\n    \"This tutorial has the following dependencies:\\n\",\n    \"\\n\",\n    \"- `transformers==4.25.1`\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `sentencepiece`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"\\n\",\n    \"The following will install the required `transformers` version. Note that encoder/decoder API changes across different minor versions requires that you are specific about the version used. Also note that the `torch-neuron` version is pinned due to `transformer` compatibility issues.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install sentencepiece transformers==4.26.1\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Parameters\\n\",\n    \"\\n\",\n    \"The parameters of a generative model can be tuned for different use-cases. In this example, you'll tailor the parameters to a single inference beam search for an on-demand inference use-case. See the [MarianConfig](https://huggingface.co/transformers/v4.0.1/model_doc/marian.html#marianconfig) for parameter details.\\n\",\n    \"\\n\",\n    \"Rather than varying the encoder/decoder token sizes at runtime, you must define these parameters prior to compilation. 
The encoder/decoder token sizes are important tunable parameters as a large token sequence will offer greater sentence length flexibility but perform worse than a small token sequence.\\n\",\n    \"\\n\",\n    \"To maximize performance on Neuron, the `num_beams`, `max_encode_length` and `max_decoder_length` should be made as small as possible for the use-case.\\n\",\n    \"\\n\",\n    \"For this tutorial you will use a model that translates sentences of up to 32 token from English to German.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"model_name = \\\"Helsinki-NLP/opus-mt-en-de\\\" # English -> German model\\n\",\n    \"num_texts = 1                             # Number of input texts to decode\\n\",\n    \"num_beams = 4                             # Number of beams per input text\\n\",\n    \"max_encoder_length = 32                   # Maximum input token length\\n\",\n    \"max_decoder_length = 32                   # Maximum output token length\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## CPU Model Inference\\n\",\n    \"\\n\",\n    \"Start by executing the model on CPU to test its execution.\\n\",\n    \"\\n\",\n    \"The following defines the inference function which will be used to compare the Neuron and CPU output. In this example you will display all beam search sequences that were generated. For a real on-demand use case, set the `num_beams` to `1` to return only the top result.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def infer(model, tokenizer, text):\\n\",\n    \"\\n\",\n    \"    # Truncate and pad the max length to ensure that the token size is compatible with fixed-sized encoder (Not necessary for pure CPU execution)\\n\",\n    \"    batch = tokenizer(text, max_length=max_decoder_length, truncation=True, padding='max_length', return_tensors=\\\"pt\\\")\\n\",\n    \"    output = model.generate(**batch, max_length=max_decoder_length, num_beams=num_beams, num_return_sequences=num_beams)\\n\",\n    \"    results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\\n\",\n    \"\\n\",\n    \"    print('Texts:')\\n\",\n    \"    for i, summary in enumerate(results):\\n\",\n    \"        print(i + 1, summary)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Note that after loading the model, we also set the maximum length. 
This will later be used to limit the size of the compiled model.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from transformers import MarianMTModel, MarianTokenizer\\n\",\n    \"\\n\",\n    \"model_cpu = MarianMTModel.from_pretrained(model_name)\\n\",\n    \"model_cpu.config.max_length = max_decoder_length\\n\",\n    \"model_cpu.eval()\\n\",\n    \"\\n\",\n    \"tokenizer = MarianTokenizer.from_pretrained(model_name)\\n\",\n    \"\\n\",\n    \"sample_text = \\\"I am a small frog.\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"infer(model_cpu, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Padded Model\\n\",\n    \"In order to perform inference on Neuron, the model must be changed in a way that it supports tracing and fixed-sized inputs. One way in which this is possible is to use a pad the model inputs to the maximum possible tensor sizes. The benefit of using a padded model is that it supports variable length text generation up to a specified length `max_decoder_length`. A consequence of padding is that it can negatively impact performance due to large data transfers.\\n\",\n    \"\\n\",\n    \"### PaddedEncoder & PaddedDecoder Modules\\n\",\n    \"Here you will define wrappers around the encoder and decoder portions of the generation model that are compatible with `torch.jit.trace` as well as fixed-sized inputs.\\n\",\n    \"\\n\",\n    \"The following are important features which are distinct from the default configuration:\\n\",\n    \"\\n\",\n    \"1. Disabled `return_dict`. When this is enabled, the network uses `dataclass` type outputs which are not compatible with `torch.jit.trace`.\\n\",\n    \"2. Disabled `use_cache`. When this option is enabled, the network expects a collection of cache tensors which grow upon each iteration. Since Neuron requires fixed sized inputs, this must be disabled.\\n\",\n    \"3. The `GenerationMixin:beam_search` implementation uses only the logits for the current iteration index from the original decoder layer output. Since inputs must be padded, performance can be improved by selecting only a subset of the hidden state prior to the final linear layer. For efficiency on Neuron, this reduction uses an elementwise-multiply to mask out the unused hidden values and then sums along an axis.\\n\",\n    \"4. Since a reduction step is insterted between the decoder output and the final logit calculation, the original `model` attribute is not used. Instead the `PaddedDecoder` class combines the decoder, reducer, and linear layers into a combined forward pass. In the original model there is a clear distinction between the decoder layer and the final linear layer. 
These layers are fused together to get one large fully optimized graph.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"from torch.nn import functional as F\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class PaddedEncoder(torch.nn.Module):\\n\",\n    \"\\n\",\n    \"    def __init__(self, model):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.encoder = model.model.encoder\\n\",\n    \"        self.main_input_name = 'input_ids'\\n\",\n    \"        \\n\",\n    \"    def forward(self, input_ids, attention_mask):\\n\",\n    \"        return self.encoder(input_ids, attention_mask=attention_mask, return_dict=False)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class PaddedDecoder(torch.nn.Module):\\n\",\n    \"\\n\",\n    \"    def __init__(self, model):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.weight = model.model.shared.weight.clone().detach()\\n\",\n    \"        self.bias = model.final_logits_bias.clone().detach()\\n\",\n    \"        self.decoder = model.model.decoder\\n\",\n    \"\\n\",\n    \"    def forward(self, input_ids, attention_mask, encoder_outputs, index):\\n\",\n    \"\\n\",\n    \"        # Invoke the decoder\\n\",\n    \"        hidden, = self.decoder(\\n\",\n    \"            input_ids=input_ids,\\n\",\n    \"            encoder_hidden_states=encoder_outputs,\\n\",\n    \"            encoder_attention_mask=attention_mask,\\n\",\n    \"            return_dict=False,\\n\",\n    \"            use_cache=False,\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"        _, n_length, _ = hidden.shape\\n\",\n    \"\\n\",\n    \"        # Create selection mask\\n\",\n    \"        mask = torch.arange(n_length, dtype=torch.float32) == index\\n\",\n    \"        mask = mask.view(1, -1, 1)\\n\",\n    \"\\n\",\n    \"        # Broadcast mask\\n\",\n    \"        masked = torch.multiply(hidden, mask)\\n\",\n    \"\\n\",\n    \"        # Reduce along 1st dimension\\n\",\n    \"        hidden = torch.sum(masked, 1, keepdims=True)\\n\",\n    \"\\n\",\n    \"        # Compute final linear layer for token probabilities\\n\",\n    \"        logits = F.linear(\\n\",\n    \"            hidden,\\n\",\n    \"            self.weight,\\n\",\n    \"            bias=self.bias\\n\",\n    \"        )\\n\",\n    \"        return logits\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### PaddedGenerator - GenerationMixin Class\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"On text generation tasks, HuggingFace Transformers defines a [GenerationMixin](https://huggingface.co/transformers/v4.0.1/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin) base class which provides standard methods and algorithms to generate text. For this tutorial, you will be using the beam search algorithm on encoder/decoder architectures.\\n\",\n    \"\\n\",\n    \"To be able to use these methods, you will be defining your own class derived from the GenerationMixin class to run a beam search. This will invoke the encoder and decoder layers in a way that is compatible with fixed sized inputs and traced modules. 
This means you must import the base class and the output objects ([Seq2SeqLMOutput](https://huggingface.co/transformers/v4.0.1/main_classes/output.html#transformers.modeling_outputs.Seq2SeqLMOutput), [BaseModelOutput](https://huggingface.co/transformers/v4.0.1/main_classes/output.html#transformers.modeling_outputs.BaseModelOutput)) used by the [beam_search](https://huggingface.co/transformers/v4.0.1/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin.beam_search) algorithm.\\n\",\n    \"\\n\",\n    \"The `GenerationMixin:generate` method will use `GenerationMixin:beam_search` which requires that you to define your own class implementation that invokes the `PaddedEncoder` and `PaddedDecoder` modules using padded inputs. The standard generator model implementation will not work by default because it is intended to infer with variable-sized (growing) input tensors. \\n\",\n    \"\\n\",\n    \"The `from_model` method is defined to create the `PaddedGenerator` from an existing pretrained generator class.\\n\",\n    \"\\n\",\n    \"To invoke the Encoder and Decoder traced modules in a way that is compatible with the `GenerationMixin:beam_search` implementation, the `get_encoder`, `__call__`, and  `prepare_inputs_for_generation` methods are overriden.\\n\",\n    \"\\n\",\n    \"Lastly, the class defines methods for serialization so that the model can be easily saved and loaded.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"from transformers import GenerationMixin, AutoConfig\\n\",\n    \"from transformers.modeling_outputs import Seq2SeqLMOutput, BaseModelOutput\\n\",\n    \"from transformers.modeling_utils import PreTrainedModel\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class PaddedGenerator(PreTrainedModel, GenerationMixin):\\n\",\n    \"\\n\",\n    \"    @classmethod\\n\",\n    \"    def from_model(cls, model):\\n\",\n    \"        generator = cls(model.config)\\n\",\n    \"        generator.encoder = PaddedEncoder(model)\\n\",\n    \"        generator.decoder = PaddedDecoder(model)\\n\",\n    \"        return generator\\n\",\n    \"    \\n\",\n    \"    def prepare_inputs_for_generation(\\n\",\n    \"            self,\\n\",\n    \"            input_ids,\\n\",\n    \"            encoder_outputs=None,\\n\",\n    \"            attention_mask=None,\\n\",\n    \"            **kwargs,\\n\",\n    \"    ):\\n\",\n    \"        # Pad the inputs for Neuron\\n\",\n    \"        current_length = input_ids.shape[1]\\n\",\n    \"        pad_size = self.config.max_length - current_length\\n\",\n    \"        return dict(\\n\",\n    \"            input_ids=F.pad(input_ids, (0, pad_size)),\\n\",\n    \"            attention_mask=attention_mask,\\n\",\n    \"            encoder_outputs=encoder_outputs.last_hidden_state,\\n\",\n    \"            current_length=torch.tensor(current_length - 1),\\n\",\n    \"        )\\n\",\n    \"\\n\",\n    \"    def get_encoder(self):\\n\",\n    \"        def encode(input_ids, attention_mask, **kwargs):        \\n\",\n    \"            output, = self.encoder(input_ids, attention_mask)\\n\",\n    \"            return BaseModelOutput(\\n\",\n    \"                last_hidden_state=output,\\n\",\n    \"            )\\n\",\n    \"        return encode\\n\",\n    \"\\n\",\n    \"    def forward(self, input_ids, attention_mask, encoder_outputs, current_length, **kwargs):\\n\",\n    \"        logits = 
self.decoder(input_ids, attention_mask, encoder_outputs, current_length)\\n\",\n    \"        return Seq2SeqLMOutput(logits=logits)\\n\",\n    \"\\n\",\n    \"    @property\\n\",\n    \"    def device(self):  # Attribute required by beam search\\n\",\n    \"        return torch.device('cpu')\\n\",\n    \"    \\n\",\n    \"    def save_pretrained(self, directory):\\n\",\n    \"        if os.path.isfile(directory):\\n\",\n    \"            print(f\\\"Provided path ({directory}) should be a directory, not a file\\\")\\n\",\n    \"            return\\n\",\n    \"        os.makedirs(directory, exist_ok=True)\\n\",\n    \"        torch.jit.save(self.encoder, os.path.join(directory, 'encoder.pt'))\\n\",\n    \"        torch.jit.save(self.decoder, os.path.join(directory, 'decoder.pt'))\\n\",\n    \"        self.config.save_pretrained(directory)\\n\",\n    \"\\n\",\n    \"    @classmethod\\n\",\n    \"    def from_pretrained(cls, directory):\\n\",\n    \"        config = AutoConfig.from_pretrained(directory)\\n\",\n    \"        obj = cls(config)\\n\",\n    \"        obj.encoder = torch.jit.load(os.path.join(directory, 'encoder.pt'))\\n\",\n    \"        obj.decoder = torch.jit.load(os.path.join(directory, 'decoder.pt'))\\n\",\n    \"        setattr(obj.encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"        return obj\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Padded CPU Inference\\n\",\n    \"To start, it is important to ensure that the transformations we have made to the model were successful. Using the classes defined above we can test that the padded model execution on CPU is identical to the original output also running on CPU.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"padded_model_cpu = PaddedGenerator.from_model(model_cpu)\\n\",\n    \"infer(padded_model_cpu, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Padded Neuron Tracing & Inference\\n\",\n    \"\\n\",\n    \"Now that the padded version of model is confirmed to produce the same outputs as the non-padded version, the model can be compiled for Neuron.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch_neuron\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def trace(model, num_texts, num_beams, max_decoder_length, max_encoder_length):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Traces the encoder and decoder modules for use on Neuron.\\n\",\n    \"\\n\",\n    \"    This function fixes the network to the given sizes. 
Once the model has been\\n\",\n    \"    compiled to a given size, the inputs to these networks must always be of\\n\",\n    \"    fixed size.\\n\",\n    \"\\n\",\n    \"    Args:\\n\",\n    \"        model (PaddedGenerator): The padded generator to compile for Neuron\\n\",\n    \"        num_texts (int): The number of input texts to translate at once\\n\",\n    \"        num_beams (int): The number of beams to compute per text\\n\",\n    \"        max_decoder_length (int): The maximum number of tokens to be generated\\n\",\n    \"        max_encoder_length (int): The maximum number of input tokens that will be encoded\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    # Trace the encoder\\n\",\n    \"    inputs = (\\n\",\n    \"        torch.ones((num_texts, max_encoder_length), dtype=torch.long),\\n\",\n    \"        torch.ones((num_texts, max_encoder_length), dtype=torch.long),\\n\",\n    \"    )\\n\",\n    \"    encoder = torch_neuron.trace(model.encoder, inputs)\\n\",\n    \"\\n\",\n    \"    # Trace the decoder (with expanded inputs)\\n\",\n    \"    batch_size = num_texts * num_beams\\n\",\n    \"    inputs = (\\n\",\n    \"        torch.ones((batch_size, max_decoder_length), dtype=torch.long),\\n\",\n    \"        torch.ones((batch_size, max_encoder_length), dtype=torch.long),\\n\",\n    \"        torch.ones((batch_size, max_encoder_length, model.config.d_model), dtype=torch.float),\\n\",\n    \"        torch.tensor(0),\\n\",\n    \"    )\\n\",\n    \"    decoder = torch_neuron.trace(model.decoder, inputs)\\n\",\n    \"    \\n\",\n    \"    traced = PaddedGenerator(model.config)\\n\",\n    \"    traced.encoder = encoder\\n\",\n    \"    traced.decoder = decoder\\n\",\n    \"    setattr(encoder, 'main_input_name', 'input_ids')  # Attribute required by beam search\\n\",\n    \"    return traced\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"padded_model_neuron = trace(padded_model_cpu, num_texts, num_beams, max_decoder_length, max_encoder_length)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Comparing the Neuron execution to the original CPU implementation, you will see the exact same generated text.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"# CPU execution for comparison\\n\",\n    \"infer(padded_model_neuron, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Padded Neuron Serialization\\n\",\n    \"Finally, we can test that we can serialize and reload the model so that it can be used later in its precompiled format.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"padded_model_neuron.save_pretrained('NeuronPaddedMarianMT')\\n\",\n    \"padded_model_loaded = PaddedGenerator.from_pretrained('NeuronPaddedMarianMT')\\n\",\n    \"infer(padded_model_loaded, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Greedy Unrolled Model\\n\",\n    \"An unrolled version of the model can achieve better performance in some cases since all operations will be executed on the Neuron hardware without returning to CPU. 
The consequence of this type of model is that since the generation loop execution never returns to CPU, the entire sequence up to `max_decoder_length` is performed in a single forward pass.\\n\",\n    \"\\n\",\n    \"The following module performs greedy text generation. Unlike the original beam search text generation, this implementation always selects the most probable token and does not generate multiple result texts.\\n\",\n    \"\\n\",\n    \"### GreedyUnrolledGenerator Module\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class GreedyUnrolledGenerator(torch.nn.Module):\\n\",\n    \"    \\n\",\n    \"    def __init__(self, model):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.config = model.config\\n\",\n    \"        self.model = model\\n\",\n    \"    \\n\",\n    \"    def forward(self, input_ids, attention_mask):\\n\",\n    \"        \\n\",\n    \"        # Generate the encoder state for the input tokens. This is only done once and the state is reused.\\n\",\n    \"        encoder_outputs, = self.model.model.encoder(input_ids, attention_mask=attention_mask, return_dict=False)\\n\",\n    \"        \\n\",\n    \"        # Set the intial state for the decode loop. This will grow per decoder iteration\\n\",\n    \"        tokens = torch.full((input_ids.size(0), 2), self.config.decoder_start_token_id)\\n\",\n    \"        \\n\",\n    \"        # Iteratively invoke the decoder on incrementally generated `tokens` to generate a `next_token`.\\n\",\n    \"        # Note that unlike the GeneratorMixin.generate function, there is no early-exit if the stop token \\n\",\n    \"        # has been reached. This will always run a fixed number of iterations.\\n\",\n    \"        for i in range(self.config.max_length):\\n\",\n    \"            \\n\",\n    \"            hidden, = self.model.model.decoder(\\n\",\n    \"                input_ids=tokens,\\n\",\n    \"                encoder_hidden_states=encoder_outputs,\\n\",\n    \"                encoder_attention_mask=attention_mask,\\n\",\n    \"                return_dict=False,\\n\",\n    \"                use_cache=False,\\n\",\n    \"            ) # size: [batch, current_length, vocab_size]\\n\",\n    \"                        \\n\",\n    \"            logits = F.linear(\\n\",\n    \"                hidden[:, -1, :],\\n\",\n    \"                self.model.model.shared.weight,\\n\",\n    \"                bias=self.model.final_logits_bias\\n\",\n    \"            )\\n\",\n    \"            next_tokens = torch.argmax(logits, dim=1, keepdims=True)\\n\",\n    \"            tokens = torch.cat([tokens, next_tokens], dim=1)\\n\",\n    \"        \\n\",\n    \"        return tokens\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Greedy CPU Inference\\n\",\n    \"The inference code must be updated since the `generate` method is no longer used. 
This is because the entire generative inference loop occurs within the `GreedyUnrolledGenerator.forward` method.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def infer_greedy(model, tokenizer, text):\\n\",\n    \"    batch = tokenizer(text, max_length=max_decoder_length, truncation=True, padding='max_length', return_tensors=\\\"pt\\\")\\n\",\n    \"    inputs = batch['input_ids'], batch['attention_mask']\\n\",\n    \"    tokens = greedy_cpu(*inputs)\\n\",\n    \"    print('Texts:')\\n\",\n    \"    for i, t in enumerate(tokens):\\n\",\n    \"        result = tokenizer.decode(t, skip_special_tokens=True)\\n\",\n    \"        print(i + 1, result)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Like in previous section of this tutorial, first the greedy model is executed on CPU to validate that the correct results were produced. In this example, the generated text matches the first result of the original beam search.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"model_cpu.config.max_length = 8 # This controls the number of decoder loops. Reduced to improve compilation speed.\\n\",\n    \"greedy_cpu = GreedyUnrolledGenerator(model_cpu)\\n\",\n    \"infer_greedy(greedy_cpu, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Greedy Neuron Tracing & Inference\\n\",\n    \"Similarly the tracing is simplified since the now the `GreedyUnrolledGenerator.forward` can be compiled as a single unit. \\n\",\n    \"\\n\",\n    \"For compilation efficiency, two changes will be made compared to normal compilaition:\\n\",\n    \"- `torch.jit.freeze` is used because it can *sometimes* speed up compilation by in the case where a module is re-used multiple times. In this case, it is more efficient because the `self.model.model.decoder` is used in a loop. \\n\",\n    \"- The `torch_neuron.trace` option `fallback` is set to `False`. This forces all operations to execute on Neuron. Most of the time this is not recommended or efficient. In this case, it is more efficient because it means a single subgraph is produced rather than many. Usually one subgraph would be produced per decoder iteration since `aten::embedding` is executed in a loop. The `aten::embedding` operation is otherwise exected on CPU by default since this is usually more efficient than executing on Neuron.\\n\",\n    \"\\n\",\n    \"You may notice that compilation will take significantly longer with the unrolled model since the model inserts new operations into the compute graph for every single decoder iteration. 
This creates a much larger model graph even though the weights are re-used.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"example = (\\n\",\n    \"    torch.ones((num_texts, max_encoder_length), dtype=torch.long),\\n\",\n    \"    torch.ones((num_texts, max_encoder_length), dtype=torch.long),\\n\",\n    \")\\n\",\n    \"greedy_cpu.eval()\\n\",\n    \"greedy_trace = torch.jit.trace(greedy_cpu, example)\\n\",\n    \"greedy_frozen = torch.jit.freeze(greedy_trace)\\n\",\n    \"greedy_neuron = torch_neuron.trace(greedy_frozen, example, fallback=False)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"infer_greedy(greedy_neuron, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Greedy Neuron Serialization\\n\",\n    \"Unlike the previous version of the model that used the `GenerationMixin` base class. This greedy version of the model can be serialized using the regular `torch.jit.save` and `torch.jit.load` utilities since it is a pure torchscript module.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"torch.jit.save(greedy_neuron, 'greedy_neuron.pt')\\n\",\n    \"loaded_greedy_neuron = torch.jit.load('greedy_neuron.pt')\\n\",\n    \"infer_greedy(loaded_greedy_neuron, tokenizer, sample_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Appendix\\n\",\n    \"### BART (Mask Filling Task)\\n\",\n    \"\\n\",\n    \"These `PaddedGenerator` class can be applied to the BART model for the task of filling in mask tokens.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"from transformers import BartForConditionalGeneration, BartTokenizer\\n\",\n    \"bart_name = \\\"facebook/bart-large\\\"\\n\",\n    \"bart_model = BartForConditionalGeneration.from_pretrained(bart_name)\\n\",\n    \"bart_model.config.max_length = max_decoder_length\\n\",\n    \"bart_tokenizer = BartTokenizer.from_pretrained(bart_name)\\n\",\n    \"bart_text = \\\"UN Chief Says There Is No <mask> in Syria\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"# CPU Execution\\n\",\n    \"infer(bart_model, bart_tokenizer, bart_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"# Neuron Execution\\n\",\n    \"paddded_bart = PaddedGenerator.from_model(bart_model)\\n\",\n    \"bart_neuron = trace(paddded_bart, num_texts, num_beams, max_decoder_length, max_encoder_length)\\n\",\n    \"infer(bart_neuron, bart_tokenizer, bart_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Pegasus (Summarization Task)\\n\",\n    \"\\n\",\n    \"These `PaddedGenerator` class can be applied to the Pegasus model for summarization.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   
\"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"from transformers import PegasusForConditionalGeneration, PegasusTokenizer\\n\",\n    \"pegasus_name = 'google/pegasus-xsum'\\n\",\n    \"pegasus_model = PegasusForConditionalGeneration.from_pretrained(pegasus_name)\\n\",\n    \"pegasus_model.config.max_length = max_decoder_length\\n\",\n    \"pegasus_tokenizer = PegasusTokenizer.from_pretrained(pegasus_name)\\n\",\n    \"pegasus_text = \\\"PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires.\\\"\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"# CPU Execution\\n\",\n    \"infer(pegasus_model, pegasus_tokenizer, pegasus_text)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"pycharm\": {\n     \"name\": \"#%%\\n\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"# Neuron Execution\\n\",\n    \"paddded_pegasus = PaddedGenerator.from_model(pegasus_model)\\n\",\n    \"pegasus_neuron = trace(paddded_pegasus, num_texts, num_beams, max_decoder_length, max_encoder_length)\\n\",\n    \"infer(pegasus_neuron, pegasus_tokenizer, pegasus_text)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
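  {
    "path": "src/examples/pytorch/marianmt_mask_reduce_demo.py",
    "content": "# Hypothetical standalone sketch (not part of the original tutorial): demonstrates the\n# mask-and-reduce trick described in transformers-marianmt.ipynb, where the PaddedDecoder\n# selects the hidden state at the current generation index with an elementwise multiply\n# followed by a sum instead of dynamic indexing, keeping tensor shapes fixed for Neuron.\nimport torch\n\nbatch, length, hidden_size = 2, 8, 4\nhidden = torch.randn(batch, length, hidden_size)\nindex = torch.tensor(3)\n\n# Build a {0, 1} mask that is non-zero only at the current index\nmask = (torch.arange(length, dtype=torch.float32) == index).view(1, -1, 1)\n\n# Multiply-and-sum reduction: the result keeps a fixed shape of [batch, 1, hidden_size]\nselected = torch.sum(hidden * mask, dim=1, keepdim=True)\n\n# Equivalent dynamic indexing, used here only to verify the reduction gives the same result\nexpected = hidden[:, index:index + 1, :]\nassert torch.allclose(selected, expected)\n"
  },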
  {
    "path": "src/examples/pytorch/yolo_v4.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Evaluate YOLO v4 on Inferentia\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"This tutorial walks through compiling and evaluating YOLO v4 model implemented in PyTorch on Inferentia. \\n\",\n    \"\\n\",\n    \"The tutorial has five main sections:\\n\",\n    \"\\n\",\n    \"1. Define YOLO v4 model in PyTorch\\n\",\n    \"2. Download the COCO 2017 evaluation dataset and define the data loader function\\n\",\n    \"3. Build, Compile, and Save Neuron-Optimized YOLO v4 TorchScript\\n\",\n    \"4. Evaluate Accuracy on the COCO 2017 Dataset\\n\",\n    \"5. Benchmark COCO Dataset Performance of the Neuron-Optimized TorchScript\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \\\"Kernel -> Change Kernel\\\" option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies:\\n\",\n    \"This tutorial requires the following pip packages:\\n\",\n    \"\\n\",\n    \"- `torch-neuron`\\n\",\n    \"- `torchvision`\\n\",\n    \"- `pillow`\\n\",\n    \"- `pycocotools`\\n\",\n    \"- `neuron-cc[tensorflow]`\\n\",\n    \"\\n\",\n    \"Many of these packages will be installed by default when configuring your environment using the Neuron PyTorch setup guide. The additional dependencies must be installed here.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install --upgrade pillow pycocotools \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 1: Define YOLO v4 model in PyTorch \\n\",\n    \"The following PyTorch model definition is from https://github.com/Tianxiaomo/pytorch-YOLOv4/.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"import torch\\n\",\n    \"import torch.neuron\\n\",\n    \"from torch import nn\\n\",\n    \"import torch.nn.functional as F\\n\",\n    \"import os\\n\",\n    \"import warnings\\n\",\n    \"\\n\",\n    \"# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\\n\",\n    \"n_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\\n\",\n    \"os.environ['NEURON_RT_NUM_CORES'] = str(n_cores)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Mish(torch.nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        x = x * (torch.tanh(torch.nn.functional.softplus(x)))\\n\",\n    \"        return x\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Upsample(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super(Upsample, self).__init__()\\n\",\n    \"\\n\",\n    \"    def forward(self, x, target_size, inference=False):\\n\",\n    \"        assert (x.data.dim() == 4)\\n\",\n    \"\\n\",\n    \"        if inference:\\n\",\n    \"\\n\",\n    \"            return x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\\\\\\n\",\n    \"                    
expand(x.size(0), x.size(1), x.size(2), target_size[2] // x.size(2), x.size(3), target_size[3] // x.size(3)).\\\\\\n\",\n    \"                    contiguous().view(x.size(0), x.size(1), target_size[2], target_size[3])\\n\",\n    \"        else:\\n\",\n    \"            return F.interpolate(x, size=(target_size[2], target_size[3]), mode='nearest')\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Conv_Bn_Activation(nn.Module):\\n\",\n    \"    def __init__(self, in_channels, out_channels, kernel_size, stride, activation, bn=True, bias=False):\\n\",\n    \"        super().__init__()\\n\",\n    \"        pad = (kernel_size - 1) // 2\\n\",\n    \"\\n\",\n    \"        self.conv = nn.ModuleList()\\n\",\n    \"        if bias:\\n\",\n    \"            self.conv.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad))\\n\",\n    \"        else:\\n\",\n    \"            self.conv.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad, bias=False))\\n\",\n    \"        if bn:\\n\",\n    \"            self.conv.append(nn.BatchNorm2d(out_channels))\\n\",\n    \"        if activation == \\\"mish\\\":\\n\",\n    \"            self.conv.append(Mish())\\n\",\n    \"        elif activation == \\\"relu\\\":\\n\",\n    \"            self.conv.append(nn.ReLU(inplace=True))\\n\",\n    \"        elif activation == \\\"leaky\\\":\\n\",\n    \"            self.conv.append(nn.LeakyReLU(0.1, inplace=True))\\n\",\n    \"        elif activation == \\\"linear\\\":\\n\",\n    \"            pass\\n\",\n    \"        else:\\n\",\n    \"            print(\\\"activate error !!! {} {} {}\\\".format(sys._getframe().f_code.co_filename,\\n\",\n    \"                                                       sys._getframe().f_code.co_name, sys._getframe().f_lineno))\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        for l in self.conv:\\n\",\n    \"            x = l(x)\\n\",\n    \"        return x\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class ResBlock(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Sequential residual blocks each of which consists of \\\\\\n\",\n    \"    two convolution layers.\\n\",\n    \"    Args:\\n\",\n    \"        ch (int): number of input and output channels.\\n\",\n    \"        nblocks (int): number of residual blocks.\\n\",\n    \"        shortcut (bool): if True, residual tensor addition is enabled.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    def __init__(self, ch, nblocks=1, shortcut=True):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.shortcut = shortcut\\n\",\n    \"        self.module_list = nn.ModuleList()\\n\",\n    \"        for i in range(nblocks):\\n\",\n    \"            resblock_one = nn.ModuleList()\\n\",\n    \"            resblock_one.append(Conv_Bn_Activation(ch, ch, 1, 1, 'mish'))\\n\",\n    \"            resblock_one.append(Conv_Bn_Activation(ch, ch, 3, 1, 'mish'))\\n\",\n    \"            self.module_list.append(resblock_one)\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        for module in self.module_list:\\n\",\n    \"            h = x\\n\",\n    \"            for res in module:\\n\",\n    \"                h = res(h)\\n\",\n    \"            x = x + h if self.shortcut else h\\n\",\n    \"        return x\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class DownSample1(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(3, 32, 3, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        
self.conv2 = Conv_Bn_Activation(32, 64, 3, 2, 'mish')\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\\n\",\n    \"        # [route]\\n\",\n    \"        # layers = -2\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(64, 32, 1, 1, 'mish')\\n\",\n    \"        self.conv6 = Conv_Bn_Activation(32, 64, 3, 1, 'mish')\\n\",\n    \"        # [shortcut]\\n\",\n    \"        # from=-3\\n\",\n    \"        # activation = linear\\n\",\n    \"\\n\",\n    \"        self.conv7 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\\n\",\n    \"        # [route]\\n\",\n    \"        # layers = -1, -7\\n\",\n    \"        self.conv8 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x2)\\n\",\n    \"        # route -2\\n\",\n    \"        x4 = self.conv4(x2)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        x6 = self.conv6(x5)\\n\",\n    \"        # shortcut -3\\n\",\n    \"        x6 = x6 + x4\\n\",\n    \"\\n\",\n    \"        x7 = self.conv7(x6)\\n\",\n    \"        # [route]\\n\",\n    \"        # layers = -1, -7\\n\",\n    \"        x7 = torch.cat([x7, x3], dim=1)\\n\",\n    \"        x8 = self.conv8(x7)\\n\",\n    \"        return x8\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class DownSample2(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(64, 128, 3, 2, 'mish')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\\n\",\n    \"        # r -2\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        self.resblock = ResBlock(ch=64, nblocks=2)\\n\",\n    \"\\n\",\n    \"        # s -3\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\\n\",\n    \"        # r -1 -10\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(128, 128, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x1)\\n\",\n    \"\\n\",\n    \"        r = self.resblock(x3)\\n\",\n    \"        x4 = self.conv4(r)\\n\",\n    \"\\n\",\n    \"        x4 = torch.cat([x4, x2], dim=1)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        return x5\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class DownSample3(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(128, 256, 3, 2, 'mish')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(256, 128, 1, 1, 'mish')\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(256, 128, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        self.resblock = ResBlock(ch=128, nblocks=8)\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(128, 128, 1, 1, 'mish')\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(256, 256, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x1)\\n\",\n    \"\\n\",\n    \"        r = self.resblock(x3)\\n\",\n    \"        x4 = self.conv4(r)\\n\",\n    \"\\n\",\n    \"        x4 = torch.cat([x4, x2], dim=1)\\n\",\n    \"        x5 = 
self.conv5(x4)\\n\",\n    \"        return x5\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class DownSample4(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(256, 512, 3, 2, 'mish')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(512, 256, 1, 1, 'mish')\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(512, 256, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        self.resblock = ResBlock(ch=256, nblocks=8)\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(256, 256, 1, 1, 'mish')\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(512, 512, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x1)\\n\",\n    \"\\n\",\n    \"        r = self.resblock(x3)\\n\",\n    \"        x4 = self.conv4(r)\\n\",\n    \"\\n\",\n    \"        x4 = torch.cat([x4, x2], dim=1)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        return x5\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class DownSample5(nn.Module):\\n\",\n    \"    def __init__(self):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(512, 1024, 3, 2, 'mish')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(1024, 512, 1, 1, 'mish')\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(1024, 512, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"        self.resblock = ResBlock(ch=512, nblocks=4)\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(512, 512, 1, 1, 'mish')\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(1024, 1024, 1, 1, 'mish')\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x1)\\n\",\n    \"\\n\",\n    \"        r = self.resblock(x3)\\n\",\n    \"        x4 = self.conv4(r)\\n\",\n    \"\\n\",\n    \"        x4 = torch.cat([x4, x2], dim=1)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        return x5\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Neck(nn.Module):\\n\",\n    \"    def __init__(self, inference=False):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.inference = inference\\n\",\n    \"\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        # SPP\\n\",\n    \"        self.maxpool1 = nn.MaxPool2d(kernel_size=5, stride=1, padding=5 // 2)\\n\",\n    \"        self.maxpool2 = nn.MaxPool2d(kernel_size=9, stride=1, padding=9 // 2)\\n\",\n    \"        self.maxpool3 = nn.MaxPool2d(kernel_size=13, stride=1, padding=13 // 2)\\n\",\n    \"\\n\",\n    \"        # R -1 -3 -5 -6\\n\",\n    \"        # SPP\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(2048, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\\n\",\n    \"        self.conv6 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv7 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        # UP\\n\",\n    \"        self.upsample1 = Upsample()\\n\",\n    \"        # R 85\\n\",\n    \"        self.conv8 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        # R -1 -3\\n\",\n    \"        self.conv9 = Conv_Bn_Activation(512, 256, 1, 1, 
'leaky')\\n\",\n    \"        self.conv10 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\\n\",\n    \"        self.conv11 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        self.conv12 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\\n\",\n    \"        self.conv13 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        self.conv14 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\\n\",\n    \"        # UP\\n\",\n    \"        self.upsample2 = Upsample()\\n\",\n    \"        # R 54\\n\",\n    \"        self.conv15 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\\n\",\n    \"        # R -1 -3\\n\",\n    \"        self.conv16 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\\n\",\n    \"        self.conv17 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\\n\",\n    \"        self.conv18 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\\n\",\n    \"        self.conv19 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\\n\",\n    \"        self.conv20 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\\n\",\n    \"\\n\",\n    \"    def forward(self, input, downsample4, downsample3, inference=False):\\n\",\n    \"        x1 = self.conv1(input)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"        x3 = self.conv3(x2)\\n\",\n    \"        # SPP\\n\",\n    \"        m1 = self.maxpool1(x3)\\n\",\n    \"        m2 = self.maxpool2(x3)\\n\",\n    \"        m3 = self.maxpool3(x3)\\n\",\n    \"        spp = torch.cat([m3, m2, m1, x3], dim=1)\\n\",\n    \"        # SPP end\\n\",\n    \"        x4 = self.conv4(spp)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        x6 = self.conv6(x5)\\n\",\n    \"        x7 = self.conv7(x6)\\n\",\n    \"        # UP\\n\",\n    \"        up = self.upsample1(x7, downsample4.size(), self.inference)\\n\",\n    \"        # R 85\\n\",\n    \"        x8 = self.conv8(downsample4)\\n\",\n    \"        # R -1 -3\\n\",\n    \"        x8 = torch.cat([x8, up], dim=1)\\n\",\n    \"\\n\",\n    \"        x9 = self.conv9(x8)\\n\",\n    \"        x10 = self.conv10(x9)\\n\",\n    \"        x11 = self.conv11(x10)\\n\",\n    \"        x12 = self.conv12(x11)\\n\",\n    \"        x13 = self.conv13(x12)\\n\",\n    \"        x14 = self.conv14(x13)\\n\",\n    \"\\n\",\n    \"        # UP\\n\",\n    \"        up = self.upsample2(x14, downsample3.size(), self.inference)\\n\",\n    \"        # R 54\\n\",\n    \"        x15 = self.conv15(downsample3)\\n\",\n    \"        # R -1 -3\\n\",\n    \"        x15 = torch.cat([x15, up], dim=1)\\n\",\n    \"\\n\",\n    \"        x16 = self.conv16(x15)\\n\",\n    \"        x17 = self.conv17(x16)\\n\",\n    \"        x18 = self.conv18(x17)\\n\",\n    \"        x19 = self.conv19(x18)\\n\",\n    \"        x20 = self.conv20(x19)\\n\",\n    \"        return x20, x13, x6\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Yolov4Head(nn.Module):\\n\",\n    \"    def __init__(self, output_ch, n_classes, inference=False):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.inference = inference\\n\",\n    \"\\n\",\n    \"        self.conv1 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\\n\",\n    \"        self.conv2 = Conv_Bn_Activation(256, output_ch, 1, 1, 'linear', bn=False, bias=True)\\n\",\n    \"\\n\",\n    \"        self.yolo1 = YoloLayer(\\n\",\n    \"                                anchor_mask=[0, 1, 2], num_classes=n_classes,\\n\",\n    \"                                anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\\n\",\n    \"                                num_anchors=9, stride=8)\\n\",\n 
   \"\\n\",\n    \"        # R -4\\n\",\n    \"        self.conv3 = Conv_Bn_Activation(128, 256, 3, 2, 'leaky')\\n\",\n    \"\\n\",\n    \"        # R -1 -16\\n\",\n    \"        self.conv4 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        self.conv5 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\\n\",\n    \"        self.conv6 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        self.conv7 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\\n\",\n    \"        self.conv8 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\\n\",\n    \"        self.conv9 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\\n\",\n    \"        self.conv10 = Conv_Bn_Activation(512, output_ch, 1, 1, 'linear', bn=False, bias=True)\\n\",\n    \"        \\n\",\n    \"        self.yolo2 = YoloLayer(\\n\",\n    \"                                anchor_mask=[3, 4, 5], num_classes=n_classes,\\n\",\n    \"                                anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\\n\",\n    \"                                num_anchors=9, stride=16)\\n\",\n    \"\\n\",\n    \"        # R -4\\n\",\n    \"        self.conv11 = Conv_Bn_Activation(256, 512, 3, 2, 'leaky')\\n\",\n    \"\\n\",\n    \"        # R -1 -37\\n\",\n    \"        self.conv12 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv13 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\\n\",\n    \"        self.conv14 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv15 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\\n\",\n    \"        self.conv16 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\\n\",\n    \"        self.conv17 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\\n\",\n    \"        self.conv18 = Conv_Bn_Activation(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)\\n\",\n    \"        \\n\",\n    \"        self.yolo3 = YoloLayer(\\n\",\n    \"                                anchor_mask=[6, 7, 8], num_classes=n_classes,\\n\",\n    \"                                anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\\n\",\n    \"                                num_anchors=9, stride=32)\\n\",\n    \"\\n\",\n    \"    def forward(self, input1, input2, input3):\\n\",\n    \"        x1 = self.conv1(input1)\\n\",\n    \"        x2 = self.conv2(x1)\\n\",\n    \"\\n\",\n    \"        x3 = self.conv3(input1)\\n\",\n    \"        # R -1 -16\\n\",\n    \"        x3 = torch.cat([x3, input2], dim=1)\\n\",\n    \"        x4 = self.conv4(x3)\\n\",\n    \"        x5 = self.conv5(x4)\\n\",\n    \"        x6 = self.conv6(x5)\\n\",\n    \"        x7 = self.conv7(x6)\\n\",\n    \"        x8 = self.conv8(x7)\\n\",\n    \"        x9 = self.conv9(x8)\\n\",\n    \"        x10 = self.conv10(x9)\\n\",\n    \"\\n\",\n    \"        # R -4\\n\",\n    \"        x11 = self.conv11(x8)\\n\",\n    \"        # R -1 -37\\n\",\n    \"        x11 = torch.cat([x11, input3], dim=1)\\n\",\n    \"\\n\",\n    \"        x12 = self.conv12(x11)\\n\",\n    \"        x13 = self.conv13(x12)\\n\",\n    \"        x14 = self.conv14(x13)\\n\",\n    \"        x15 = self.conv15(x14)\\n\",\n    \"        x16 = self.conv16(x15)\\n\",\n    \"        x17 = self.conv17(x16)\\n\",\n    \"        x18 = self.conv18(x17)\\n\",\n    \"        \\n\",\n    \"        if self.inference:\\n\",\n    \"            y1 = self.yolo1(x2)\\n\",\n    \"            y2 = self.yolo2(x10)\\n\",\n    \"            y3 = self.yolo3(x18)\\n\",\n    \"\\n\",\n    \"            return 
get_region_boxes([y1, y2, y3])\\n\",\n    \"        \\n\",\n    \"        else:\\n\",\n    \"            return [x2, x10, x18]\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class Yolov4(nn.Module):\\n\",\n    \"    def __init__(self, yolov4conv137weight=None, n_classes=80, inference=False):\\n\",\n    \"        super().__init__()\\n\",\n    \"\\n\",\n    \"        output_ch = (4 + 1 + n_classes) * 3\\n\",\n    \"\\n\",\n    \"        # backbone\\n\",\n    \"        self.down1 = DownSample1()\\n\",\n    \"        self.down2 = DownSample2()\\n\",\n    \"        self.down3 = DownSample3()\\n\",\n    \"        self.down4 = DownSample4()\\n\",\n    \"        self.down5 = DownSample5()\\n\",\n    \"        # neck\\n\",\n    \"        self.neek = Neck(inference)\\n\",\n    \"        # yolov4conv137\\n\",\n    \"        if yolov4conv137weight:\\n\",\n    \"            _model = nn.Sequential(self.down1, self.down2, self.down3, self.down4, self.down5, self.neek)\\n\",\n    \"            pretrained_dict = torch.load(yolov4conv137weight)\\n\",\n    \"\\n\",\n    \"            model_dict = _model.state_dict()\\n\",\n    \"            # 1. filter out unnecessary keys\\n\",\n    \"            pretrained_dict = {k1: v for (k, v), k1 in zip(pretrained_dict.items(), model_dict)}\\n\",\n    \"            # 2. overwrite entries in the existing state dict\\n\",\n    \"            model_dict.update(pretrained_dict)\\n\",\n    \"            _model.load_state_dict(model_dict)\\n\",\n    \"        \\n\",\n    \"        # head\\n\",\n    \"        self.head = Yolov4Head(output_ch, n_classes, inference)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"    def forward(self, input):\\n\",\n    \"        d1 = self.down1(input)\\n\",\n    \"        d2 = self.down2(d1)\\n\",\n    \"        d3 = self.down3(d2)\\n\",\n    \"        d4 = self.down4(d3)\\n\",\n    \"        d5 = self.down5(d4)\\n\",\n    \"\\n\",\n    \"        x20, x13, x6 = self.neek(d5, d4, d3)\\n\",\n    \"\\n\",\n    \"        output = self.head(x20, x13, x6)\\n\",\n    \"        return output\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def yolo_forward_dynamic(output, conf_thresh, num_classes, anchors, num_anchors, scale_x_y, only_objectness=1,\\n\",\n    \"                              validation=False):\\n\",\n    \"    # Output would be invalid if it does not satisfy this assert\\n\",\n    \"    # assert (output.size(1) == (5 + num_classes) * num_anchors)\\n\",\n    \"\\n\",\n    \"    # print(output.size())\\n\",\n    \"\\n\",\n    \"    # Slice the second dimension (channel) of output into:\\n\",\n    \"    # [ 2, 2, 1, num_classes, 2, 2, 1, num_classes, 2, 2, 1, num_classes ]\\n\",\n    \"    # And then into\\n\",\n    \"    # bxy = [ 6 ] bwh = [ 6 ] det_conf = [ 3 ] cls_conf = [ num_classes * 3 ]\\n\",\n    \"    # batch = output.size(0)\\n\",\n    \"    # H = output.size(2)\\n\",\n    \"    # W = output.size(3)\\n\",\n    \"\\n\",\n    \"    bxy_list = []\\n\",\n    \"    bwh_list = []\\n\",\n    \"    det_confs_list = []\\n\",\n    \"    cls_confs_list = []\\n\",\n    \"\\n\",\n    \"    for i in range(num_anchors):\\n\",\n    \"        begin = i * (5 + num_classes)\\n\",\n    \"        end = (i + 1) * (5 + num_classes)\\n\",\n    \"        \\n\",\n    \"        bxy_list.append(output[:, begin : begin + 2])\\n\",\n    \"        bwh_list.append(output[:, begin + 2 : begin + 4])\\n\",\n    \"        det_confs_list.append(output[:, begin + 4 : begin + 5])\\n\",\n    \"        cls_confs_list.append(output[:, begin + 5 : end])\\n\",\n    \"\\n\",\n    \"    # 
Shape: [batch, num_anchors * 2, H, W]\\n\",\n    \"    bxy = torch.cat(bxy_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors * 2, H, W]\\n\",\n    \"    bwh = torch.cat(bwh_list, dim=1)\\n\",\n    \"\\n\",\n    \"    # Shape: [batch, num_anchors, H, W]\\n\",\n    \"    det_confs = torch.cat(det_confs_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors * H * W]\\n\",\n    \"    det_confs = det_confs.view(output.size(0), num_anchors * output.size(2) * output.size(3))\\n\",\n    \"\\n\",\n    \"    # Shape: [batch, num_anchors * num_classes, H, W]\\n\",\n    \"    cls_confs = torch.cat(cls_confs_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors, num_classes, H * W]\\n\",\n    \"    cls_confs = cls_confs.view(output.size(0), num_anchors, num_classes, output.size(2) * output.size(3))\\n\",\n    \"    # Shape: [batch, num_anchors, num_classes, H * W] --> [batch, num_anchors * H * W, num_classes] \\n\",\n    \"    cls_confs = cls_confs.permute(0, 1, 3, 2).reshape(output.size(0), num_anchors * output.size(2) * output.size(3), num_classes)\\n\",\n    \"\\n\",\n    \"    # Apply sigmoid(), exp() and softmax() to slices\\n\",\n    \"    #\\n\",\n    \"    bxy = torch.sigmoid(bxy) * scale_x_y - 0.5 * (scale_x_y - 1)\\n\",\n    \"    bwh = torch.exp(bwh)\\n\",\n    \"    det_confs = torch.sigmoid(det_confs)\\n\",\n    \"    cls_confs = torch.sigmoid(cls_confs)\\n\",\n    \"\\n\",\n    \"    # Prepare C-x, C-y, P-w, P-h (None of them are torch related)\\n\",\n    \"    grid_x = np.expand_dims(np.expand_dims(np.expand_dims(np.linspace(0, output.size(3) - 1, output.size(3)), axis=0).repeat(output.size(2), 0), axis=0), axis=0)\\n\",\n    \"    grid_y = np.expand_dims(np.expand_dims(np.expand_dims(np.linspace(0, output.size(2) - 1, output.size(2)), axis=1).repeat(output.size(3), 1), axis=0), axis=0)\\n\",\n    \"    # grid_x = torch.linspace(0, W - 1, W).reshape(1, 1, 1, W).repeat(1, 1, H, 1)\\n\",\n    \"    # grid_y = torch.linspace(0, H - 1, H).reshape(1, 1, H, 1).repeat(1, 1, 1, W)\\n\",\n    \"\\n\",\n    \"    anchor_w = []\\n\",\n    \"    anchor_h = []\\n\",\n    \"    for i in range(num_anchors):\\n\",\n    \"        anchor_w.append(anchors[i * 2])\\n\",\n    \"        anchor_h.append(anchors[i * 2 + 1])\\n\",\n    \"\\n\",\n    \"    device = None\\n\",\n    \"    cuda_check = output.is_cuda\\n\",\n    \"    if cuda_check:\\n\",\n    \"        device = output.get_device()\\n\",\n    \"\\n\",\n    \"    bx_list = []\\n\",\n    \"    by_list = []\\n\",\n    \"    bw_list = []\\n\",\n    \"    bh_list = []\\n\",\n    \"\\n\",\n    \"    # Apply C-x, C-y, P-w, P-h\\n\",\n    \"    for i in range(num_anchors):\\n\",\n    \"        ii = i * 2\\n\",\n    \"        # Shape: [batch, 1, H, W]\\n\",\n    \"        bx = bxy[:, ii : ii + 1] + torch.tensor(grid_x, device=device, dtype=torch.float32) # grid_x.to(device=device, dtype=torch.float32)\\n\",\n    \"        # Shape: [batch, 1, H, W]\\n\",\n    \"        by = bxy[:, ii + 1 : ii + 2] + torch.tensor(grid_y, device=device, dtype=torch.float32) # grid_y.to(device=device, dtype=torch.float32)\\n\",\n    \"        # Shape: [batch, 1, H, W]\\n\",\n    \"        bw = bwh[:, ii : ii + 1] * anchor_w[i]\\n\",\n    \"        # Shape: [batch, 1, H, W]\\n\",\n    \"        bh = bwh[:, ii + 1 : ii + 2] * anchor_h[i]\\n\",\n    \"\\n\",\n    \"        bx_list.append(bx)\\n\",\n    \"        by_list.append(by)\\n\",\n    \"        bw_list.append(bw)\\n\",\n    \"        bh_list.append(bh)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"    
########################################\\n\",\n    \"    #   Figure out bboxes from slices     #\\n\",\n    \"    ########################################\\n\",\n    \"    \\n\",\n    \"    # Shape: [batch, num_anchors, H, W]\\n\",\n    \"    bx = torch.cat(bx_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors, H, W]\\n\",\n    \"    by = torch.cat(by_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors, H, W]\\n\",\n    \"    bw = torch.cat(bw_list, dim=1)\\n\",\n    \"    # Shape: [batch, num_anchors, H, W]\\n\",\n    \"    bh = torch.cat(bh_list, dim=1)\\n\",\n    \"\\n\",\n    \"    # Shape: [batch, 2 * num_anchors, H, W]\\n\",\n    \"    bx_bw = torch.cat((bx, bw), dim=1)\\n\",\n    \"    # Shape: [batch, 2 * num_anchors, H, W]\\n\",\n    \"    by_bh = torch.cat((by, bh), dim=1)\\n\",\n    \"\\n\",\n    \"    # normalize coordinates to [0, 1]\\n\",\n    \"    bx_bw /= output.size(3)\\n\",\n    \"    by_bh /= output.size(2)\\n\",\n    \"\\n\",\n    \"    # Shape: [batch, num_anchors * H * W, 1]\\n\",\n    \"    bx = bx_bw[:, :num_anchors].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\\n\",\n    \"    by = by_bh[:, :num_anchors].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\\n\",\n    \"    bw = bx_bw[:, num_anchors:].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\\n\",\n    \"    bh = by_bh[:, num_anchors:].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\\n\",\n    \"\\n\",\n    \"    bx1 = bx - bw * 0.5\\n\",\n    \"    by1 = by - bh * 0.5\\n\",\n    \"    bx2 = bx1 + bw\\n\",\n    \"    by2 = by1 + bh\\n\",\n    \"\\n\",\n    \"    # Shape: [batch, num_anchors * h * w, 4] -> [batch, num_anchors * h * w, 1, 4]\\n\",\n    \"    boxes = torch.cat((bx1, by1, bx2, by2), dim=2).view(output.size(0), num_anchors * output.size(2) * output.size(3), 1, 4)\\n\",\n    \"    # boxes = boxes.repeat(1, 1, num_classes, 1)\\n\",\n    \"\\n\",\n    \"    # boxes:     [batch, num_anchors * H * W, 1, 4]\\n\",\n    \"    # cls_confs: [batch, num_anchors * H * W, num_classes]\\n\",\n    \"    # det_confs: [batch, num_anchors * H * W]\\n\",\n    \"\\n\",\n    \"    det_confs = det_confs.view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\\n\",\n    \"    confs = cls_confs * det_confs\\n\",\n    \"\\n\",\n    \"    # boxes: [batch, num_anchors * H * W, 1, 4]\\n\",\n    \"    # confs: [batch, num_anchors * H * W, num_classes]\\n\",\n    \"\\n\",\n    \"    return  boxes, confs\\n\",\n    \"\\n\",\n    \"class YoloLayer(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Yolo layer\\n\",\n    \"    model_out: while inference,is post-processing inside or outside the model\\n\",\n    \"        true:outside\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    def __init__(self, anchor_mask=[], num_classes=0, anchors=[], num_anchors=1, stride=32, model_out=False):\\n\",\n    \"        super(YoloLayer, self).__init__()\\n\",\n    \"        self.anchor_mask = anchor_mask\\n\",\n    \"        self.num_classes = num_classes\\n\",\n    \"        self.anchors = anchors\\n\",\n    \"        self.num_anchors = num_anchors\\n\",\n    \"        self.anchor_step = len(anchors) // num_anchors\\n\",\n    \"        self.coord_scale = 1\\n\",\n    \"        self.noobject_scale = 1\\n\",\n    \"        self.object_scale = 5\\n\",\n    \"        self.class_scale = 1\\n\",\n    \"        self.thresh = 0.6\\n\",\n    \"        self.stride = stride\\n\",\n    \"        self.seen = 0\\n\",\n    \"        
self.scale_x_y = 1\\n\",\n    \"\\n\",\n    \"        self.model_out = model_out\\n\",\n    \"\\n\",\n    \"    def forward(self, output, target=None):\\n\",\n    \"        if self.training:\\n\",\n    \"            return output\\n\",\n    \"        masked_anchors = []\\n\",\n    \"        for m in self.anchor_mask:\\n\",\n    \"            masked_anchors += self.anchors[m * self.anchor_step:(m + 1) * self.anchor_step]\\n\",\n    \"        masked_anchors = [anchor / self.stride for anchor in masked_anchors]\\n\",\n    \"\\n\",\n    \"        return yolo_forward_dynamic(output, self.thresh, self.num_classes, masked_anchors, len(self.anchor_mask),scale_x_y=self.scale_x_y)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_region_boxes(boxes_and_confs):\\n\",\n    \"\\n\",\n    \"    # print('Getting boxes from boxes and confs ...')\\n\",\n    \"\\n\",\n    \"    boxes_list = []\\n\",\n    \"    confs_list = []\\n\",\n    \"\\n\",\n    \"    for item in boxes_and_confs:\\n\",\n    \"        boxes_list.append(item[0])\\n\",\n    \"        confs_list.append(item[1])\\n\",\n    \"\\n\",\n    \"    # boxes: [batch, num1 + num2 + num3, 1, 4]\\n\",\n    \"    # confs: [batch, num1 + num2 + num3, num_classes]\\n\",\n    \"    boxes = torch.cat(boxes_list, dim=1)\\n\",\n    \"    confs = torch.cat(confs_list, dim=1)\\n\",\n    \"        \\n\",\n    \"    return boxes, confs\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 2: Download the COCO 2017 evaluation dataset and define the data loader function\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Download dataset\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"!curl -LO http://images.cocodataset.org/zips/val2017.zip\\n\",\n    \"!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\\n\",\n    \"!unzip -q val2017.zip\\n\",\n    \"!unzip annotations_trainval2017.zip\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Define data loader\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import json\\n\",\n    \"import time\\n\",\n    \"import torchvision\\n\",\n    \"import torchvision.transforms as transforms\\n\",\n    \"import torchvision.datasets as dset\\n\",\n    \"from pycocotools.coco import COCO\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_image_filenames(root=os.getcwd()):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Generate paths to the coco dataset image files.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        root (str): The root folder contains.\\n\",\n    \"    \\n\",\n    \"    Yields:\\n\",\n    \"        filename (str): The path to an image file.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    image_path = os.path.join(root, 'val2017')\\n\",\n    \"    for root, dirs, files in os.walk(image_path):\\n\",\n    \"        for filename in files:\\n\",\n    \"            yield os.path.join(image_path, filename)\\n\",\n    \"\\n\",\n    \"            \\n\",\n    \"def get_coco_dataloader(coco2017_root, transform, subset_indices=None):\\n\",\n    
\"    \\\"\\\"\\\"\\n\",\n    \"    Create the dataset loader and ground truth coco dataset.\\n\",\n    \"    \\n\",\n    \"    Arguments:\\n\",\n    \"        coco2017_root (str): The root directory to load the data/labels from.\\n\",\n    \"        transform (torchvision.Transform): A transform to apply to the images.\\n\",\n    \"        subset_indices (list): Indices used to create a subset of the dataset.\\n\",\n    \"\\n\",\n    \"    Returns: \\n\",\n    \"        loader (iterable): Produces transformed images and labels.\\n\",\n    \"        cocoGt (pycocotools.coco.COCO): Contains the ground truth in coco \\n\",\n    \"            format.\\n\",\n    \"        label_info (dict): A mapping from label id to the human-readable name.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    # Create the dataset\\n\",\n    \"    coco2017_img_path = os.path.join(coco2017_root, 'val2017')\\n\",\n    \"    coco2017_ann_path = os.path.join(\\n\",\n    \"        coco2017_root, 'annotations/instances_val2017.json')\\n\",\n    \"\\n\",\n    \"    # check the number of images in val2017 - Should be 5000\\n\",\n    \"    num_files = len(list(get_image_filenames(coco2017_root)))\\n\",\n    \"    print('\\\\nNumber of images in val2017 = {}\\\\n'.format(num_files))\\n\",\n    \"\\n\",\n    \"    # load annotations to decode classification results\\n\",\n    \"    with open(coco2017_ann_path) as f:\\n\",\n    \"        annotate_json = json.load(f)\\n\",\n    \"    label_info = {label[\\\"id\\\"]: label[\\\"name\\\"]\\n\",\n    \"                  for label in annotate_json['categories']}\\n\",\n    \"\\n\",\n    \"    # initialize COCO ground truth dataset\\n\",\n    \"    cocoGt = COCO(coco2017_ann_path)\\n\",\n    \"\\n\",\n    \"    # create the dataset using torchvision's coco detection dataset\\n\",\n    \"    coco_val_data = dset.CocoDetection(\\n\",\n    \"        root=coco2017_img_path, \\n\",\n    \"        annFile=coco2017_ann_path, \\n\",\n    \"        transform=transform\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"    if subset_indices is not None:\\n\",\n    \"        # Create a smaller subset of the data for testing - e.g. 
to pinpoint error at image 516\\n\",\n    \"        coco_val_data = torch.utils.data.Subset(coco_val_data, subset_indices)\\n\",\n    \"\\n\",\n    \"    # create the dataloader using torch dataloader\\n\",\n    \"    loader = torch.utils.data.DataLoader(coco_val_data, batch_size=1, shuffle=False)\\n\",\n    \"\\n\",\n    \"    return loader, cocoGt, label_info\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Load dataset\\n\",\n    \"Here 2 dataset loaders are created and the resulting data is displayed\\n\",\n    \"- `orig_coco_val_data_loader`: Contains the original unmodified image\\n\",\n    \"- `coco_val_data_loader`: Contains images of a standardized size of 608x608 pixels \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"coco2017_root = './'\\n\",\n    \"orig_coco_val_data_loader, *_ = get_coco_dataloader(coco2017_root, transforms.ToTensor())\\n\",\n    \"transform = transforms.Compose([transforms.Resize([608, 608]), transforms.ToTensor()])\\n\",\n    \"coco_val_data_loader, cocoGt, label_info = get_coco_dataloader(coco2017_root, transform)\\n\",\n    \"image_orig, _ = next(iter(orig_coco_val_data_loader))\\n\",\n    \"print(image_orig.shape)\\n\",\n    \"image, image_info = next(iter(coco_val_data_loader))\\n\",\n    \"image_id = image_info[0][\\\"image_id\\\"].item()\\n\",\n    \"print(image.shape)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Define some helper functions for deployment (inference)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def postprocess(boxes, scores, score_threshold=0.05, iou_threshold=0.5):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Classifies and filters bounding boxes from Yolo V4 output.\\n\",\n    \"    \\n\",\n    \"    Performs classification, filtering, and non-maximum suppression to remove\\n\",\n    \"    boxes that are irrelevant. The result is the filtered set of boxes, the \\n\",\n    \"    associated label confidence score, and the predicted label.\\n\",\n    \"    \\n\",\n    \"    See: https://pytorch.org/docs/stable/torchvision/ops.html#torchvision.ops.nms\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        boxes (torch.Tensor): The Yolo V4 bounding boxes.\\n\",\n    \"        scores (torch.Tensor): The categories scores for each box.\\n\",\n    \"        score_threshold (float): Ignore boxes with scores below threshold.\\n\",\n    \"        iou_threshold (float): Discards boxes with intersection above threshold. 
\\n\",\n    \"            \\n\",\n    \"    Returns:\\n\",\n    \"        boxes (torch.Tensor): The filtered Yolo V4 bounding boxes.\\n\",\n    \"        scores (torch.Tensor): The label score for each box.\\n\",\n    \"        labels (torch.Tensor): The label for each box.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    # shape: [n_batch, n_boxes, 1, 4] => [n_boxes, 4]  # Assumes n_batch size is 1\\n\",\n    \"    boxes = boxes.squeeze()\\n\",\n    \"\\n\",\n    \"    # shape: [n_batch, n_boxes, 80] => [n_boxes, 80]  # Assumes n_batch size is 1\\n\",\n    \"    scores = scores.squeeze()\\n\",\n    \"\\n\",\n    \"    # Classify each box according to the maximum category score\\n\",\n    \"    score, column = torch.max(scores, dim=1)\\n\",\n    \"\\n\",\n    \"    # Filter out rows for scores which are below threshold\\n\",\n    \"    mask = score > score_threshold\\n\",\n    \"\\n\",\n    \"    # Filter model output data\\n\",\n    \"    boxes = boxes[mask]\\n\",\n    \"    score = score[mask]\\n\",\n    \"    idxs = column[mask]\\n\",\n    \"\\n\",\n    \"    # Perform non-max suppression on all categories at once. shape: [n_keep,]\\n\",\n    \"    keep = torchvision.ops.batched_nms(\\n\",\n    \"        boxes=boxes, \\n\",\n    \"        scores=score, \\n\",\n    \"        idxs=idxs,\\n\",\n    \"        iou_threshold=iou_threshold,\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"    # The image category id associated with each column\\n\",\n    \"    categories = torch.tensor([\\n\",\n    \"        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16,\\n\",\n    \"        17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31,\\n\",\n    \"        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,\\n\",\n    \"        44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,\\n\",\n    \"        57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72,\\n\",\n    \"        73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85,\\n\",\n    \"        86, 87, 88, 89, 90\\n\",\n    \"    ])\\n\",\n    \"    \\n\",\n    \"    boxes = boxes[keep]       # shape: [n_keep, 4]\\n\",\n    \"    score = score[keep]       # shape: [n_keep,]\\n\",\n    \"    idxs = idxs[keep]\\n\",\n    \"    label = categories[idxs]  # shape: [n_keep,]\\n\",\n    \"    \\n\",\n    \"    return boxes, score, label\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_results_as_dict(boxes, scores, labels, image_orig):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Transforms post-processed output into dictionary output.\\n\",\n    \"    \\n\",\n    \"    This translates the model coordinate bounding boxes (x1, y1, x2, y2) \\n\",\n    \"    into a rectangular description (x, y, width, height) scaled to the \\n\",\n    \"    original image size.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        boxes (torch.Tensor): The Yolo V4 bounding boxes.\\n\",\n    \"        scores (torch.Tensor): The label score for each box.\\n\",\n    \"        labels (torch.Tensor): The label for each box.\\n\",\n    \"        image_orig (torch.Tensor): The image to scale the bounding boxes to.\\n\",\n    \"        \\n\",\n    \"    Returns:\\n\",\n    \"        output (dict): The dictionary of rectangle bounding boxes.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    h_size, w_size = image_orig.shape[-2:]\\n\",\n    \"\\n\",\n    \"    x1 = boxes[:, 0] * w_size\\n\",\n    \"    y1 = boxes[:, 1] * h_size\\n\",\n    \"    x2 = boxes[:, 2] * w_size\\n\",\n    \"    y2 = boxes[:, 3] * h_size\\n\",\n    \"\\n\",\n    \"    width = x2 - x1\\n\",\n    \"    height = y2 
- y1\\n\",\n    \"\\n\",\n    \"    boxes = torch.stack([x1, y1, width, height]).T\\n\",\n    \"    return {\\n\",\n    \"        'boxes': boxes.detach().numpy(),\\n\",\n    \"        'labels': labels.detach().numpy(),\\n\",\n    \"        'scores': scores.detach().numpy(),\\n\",\n    \"    }\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def prepare_for_coco_detection(predictions):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Convert dictionary model predictions into an expected COCO dataset format.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        predictions (dict): The list of box coordinates, scores, and labels.\\n\",\n    \"        \\n\",\n    \"    Returns:\\n\",\n    \"        output (list[dict]): The list of bounding boxes.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    coco_results = []\\n\",\n    \"    for original_id, prediction in predictions.items():\\n\",\n    \"        if len(prediction) == 0:\\n\",\n    \"            continue\\n\",\n    \"\\n\",\n    \"        boxes = prediction[\\\"boxes\\\"].tolist()\\n\",\n    \"        scores = prediction[\\\"scores\\\"].tolist()\\n\",\n    \"        labels = prediction[\\\"labels\\\"].tolist()\\n\",\n    \"\\n\",\n    \"        coco_results.extend(\\n\",\n    \"            [\\n\",\n    \"                {\\n\",\n    \"                    \\\"image_id\\\": original_id,\\n\",\n    \"                    \\\"category_id\\\": labels[k],\\n\",\n    \"                    \\\"bbox\\\": box,\\n\",\n    \"                    \\\"score\\\": scores[k],\\n\",\n    \"                }\\n\",\n    \"                for k, box in enumerate(boxes)\\n\",\n    \"            ]\\n\",\n    \"        )\\n\",\n    \"    return coco_results\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Download pretrained checkpoint\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import requests\\n\",\n    \"\\n\",\n    \"def download_file_from_google_drive(id, destination):\\n\",\n    \"    response = requests.post('https://drive.google.com/uc?id='+id+'&confirm=t')\\n\",\n    \"    save_response_content(response, destination)\\n\",\n    \"\\n\",\n    \"def save_response_content(response, destination):\\n\",\n    \"    CHUNK_SIZE = 32768\\n\",\n    \"    with open(destination, \\\"wb\\\") as f:\\n\",\n    \"        for chunk in response.iter_content(CHUNK_SIZE):\\n\",\n    \"            if chunk:  # filter out keep-alive new chunks\\n\",\n    \"                f.write(chunk)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"download_file_from_google_drive('1wv_LiFeCRYwtpkqREPeI13-gPELBDwuJ', './yolo_v4.pth')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 3: Build, Compile, and Save Neuron-Optimized YOLO v4 TorchScript\\n\",\n    \"### Construct model and load pretrained checkpoint\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"tags\": []\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"model = Yolov4(yolov4conv137weight=None, n_classes=80, inference=True)\\n\",\n    \"weightfile = \\\"./yolo_v4.pth\\\"\\n\",\n    \"pretrained_dict = torch.load(weightfile, map_location=torch.device('cpu'))\\n\",\n    \"model.load_state_dict(pretrained_dict)\\n\",\n    \"model.eval()\"\n   ]\n  },\n  {\n   
\"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Execute inference for a single image and display output\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"tags\": []\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import matplotlib.patches as patches\\n\",\n    \"\\n\",\n    \"image_orig, _ = next(iter(orig_coco_val_data_loader))\\n\",\n    \"image, _ = next(iter(coco_val_data_loader))\\n\",\n    \"boxes, scores = model(image)\\n\",\n    \"boxes, scores, labels = postprocess(boxes, scores)\\n\",\n    \"result_dict = get_results_as_dict(boxes, scores, labels, image_orig)\\n\",\n    \"\\n\",\n    \"fig, ax = plt.subplots(figsize=(10, 10))\\n\",\n    \"ax.imshow(image_orig.numpy().squeeze(0).transpose(1, 2, 0))\\n\",\n    \"for xywh, _ in zip(result_dict['boxes'], result_dict['labels']):\\n\",\n    \"    x, y, w, h = xywh\\n\",\n    \"    rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='g', facecolor='none')\\n\",\n    \"    ax.add_patch(rect)\\n\",\n    \"plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"tags\": []\n   },\n   \"source\": [\n    \"### Run compilation with manually specified device placement\\n\",\n    \"\\n\",\n    \"First, inspect the model without running compilation by adding the `skip_compiler=True` argument to the `torch.neuron.trace` call.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"tags\": []\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"model_neuron_for_inspection = torch.neuron.trace(model, image, skip_compiler=True)\\n\",\n    \"print(model_neuron_for_inspection)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Inspecting the model, we discover that there are many `aten::slice` operations in some submodules called `YoloLayer`. Although these operations are supported by the neuron-cc compiler, they are not going to run efficiently on the Inferentia hardware. To work it around, we recommend to manually place these operators on CPU.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To manually place `YoloLayer` on CPU, we may make use of the `subgraph_builder_function` argument in `torch.neuron.trace`. It is a callback function that returns `True` or `False` based on information available in `node`. The typical use is a condition based on either `node.name` or `node.type_string`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"tags\": []\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"def subgraph_builder_function(node):\\n\",\n    \"    return 'YoloLayer' not in node.name\\n\",\n    \"\\n\",\n    \"model_neuron = torch.neuron.trace(model, image, subgraph_builder_function=subgraph_builder_function)\\n\",\n    \"model_neuron.save('yolo_v4_neuron.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Compilation is now finished and the compiled model has been saved to a local file called 'yolo_v4_neuron.pt'. 
Saving is important due to the slow compilation process.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 4: Evaluate Accuracy on the COCO 2017 Dataset\\n\",\n    \"### Load compiled model and run inference\\n\",\n    \"To validate accuracy of the compiled model, lets run inference on the COCO 2017 validation dataset. We start by defining a helper function `run_inference`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def run_inference(dataloader, dataloader_orig, model, convert=True, modelName=''):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Run Yolo V4 inference on the COCO dataset.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        dataloader (iterable): Data loader of input processed images and labels.\\n\",\n    \"        dataloader_orig (iterable): Data loader with original images.\\n\",\n    \"        model (torch.nn.Module): The torch model to run inference against.\\n\",\n    \"        convert (bool): Set to False when using a vanilla torchvision model that \\n\",\n    \"            does not need to be transformed into coco format.\\n\",\n    \"        \\n\",\n    \"    Returns: \\n\",\n    \"        imgIds (list): The list of images with predictions.\\n\",\n    \"        cocoDt (pycocotools.coco.COCO): Contains the predictions from the model \\n\",\n    \"            in coco format.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    print('\\\\n================ Starting Inference on {} Images using {} model ================\\\\n'.format(\\n\",\n    \"        len(dataloader), modelName))\\n\",\n    \"\\n\",\n    \"    modelName = str(modelName).replace(\\\" \\\", \\\"_\\\")\\n\",\n    \"\\n\",\n    \"    # convert predicition to cocoDt\\n\",\n    \"    # code from def evaluate in https://github.com/pytorch/vision/blob/master/references/detection/engine.py\\n\",\n    \"    imgIds = []\\n\",\n    \"    results = []\\n\",\n    \"    skippedImages = []\\n\",\n    \"\\n\",\n    \"    # time inference\\n\",\n    \"    inference_time = 0.0\\n\",\n    \"    for idx, ((image, targets), (image_orig, _)) in enumerate(zip(dataloader, dataloader_orig)):\\n\",\n    \"        # if target is empty, skip the image because it breaks the scripted model\\n\",\n    \"        if not targets:\\n\",\n    \"            skippedImages.append(idx)\\n\",\n    \"            continue\\n\",\n    \"\\n\",\n    \"        # get the predictions\\n\",\n    \"        start_time = time.time()\\n\",\n    \"        boxes, scores = model(image)\\n\",\n    \"        delta = time.time() - start_time\\n\",\n    \"        inference_time += delta\\n\",\n    \"        boxes, scores, labels = postprocess(boxes, scores)\\n\",\n    \"        outputs = get_results_as_dict(boxes, scores, labels, image_orig)\\n\",\n    \"\\n\",\n    \"        res = {target[\\\"image_id\\\"].item(): output for target,\\n\",\n    \"               output in zip(targets, [outputs])}\\n\",\n    \"\\n\",\n    \"        # add the image id to imgIds\\n\",\n    \"        image_id = targets[0][\\\"image_id\\\"].item()\\n\",\n    \"        imgIds.append(image_id)\\n\",\n    \"\\n\",\n    \"        # convert the predicition into cocoDt results\\n\",\n    \"        pred = prepare_for_coco_detection(res)\\n\",\n    \"        results.extend(pred)\\n\",\n    \"\\n\",\n    \"    print('\\\\n==================== Performance Measurement ====================')\\n\",\n    \"    print('Finished inference 
on {} images in {:.2f} seconds'.format(\\n\",\n    \"        len(dataloader), inference_time))\\n\",\n    \"    print('=================================================================\\\\n')\\n\",\n    \"\\n\",\n    \"    # create bbox detections file\\n\",\n    \"    # following code in https://github.com/aws/aws-neuron-sdk/blob/master/src/examples/tensorflow/yolo_v4_demo/evaluate.ipynb\\n\",\n    \"    resultsfile = modelName + '_bbox_detections.json'\\n\",\n    \"    print('Generating json file...')\\n\",\n    \"    with open(resultsfile, 'w') as f:\\n\",\n    \"        json.dump(results, f)\\n\",\n    \"\\n\",\n    \"    # return COCO api object with loadRes\\n\",\n    \"    cocoDt = cocoGt.loadRes(resultsfile)\\n\",\n    \"\\n\",\n    \"    return imgIds, cocoDt\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The next step is to simply load the compiled model from disk and then run inference.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"model_neuron = torch.jit.load('yolo_v4_neuron.pt')\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"imgIds, cocoDt = run_inference(coco_val_data_loader, orig_coco_val_data_loader, model_neuron)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We then use the standard `pycocotools` routines to generate a report of bounding box precision/recall.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from pycocotools.cocoeval import COCOeval\\n\",\n    \"\\n\",\n    \"cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')\\n\",\n    \"cocoEval.params.imgIds = imgIds\\n\",\n    \"cocoEval.evaluate()\\n\",\n    \"cocoEval.accumulate()\\n\",\n    \"cocoEval.summarize()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"For reference, we may perform the same evaluation on the CPU model.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"imgIdsRef, cocoDtRef = run_inference(coco_val_data_loader, orig_coco_val_data_loader, model)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"cocoEval = COCOeval(cocoGt, cocoDtRef, 'bbox')\\n\",\n    \"cocoEval.params.imgIds = imgIdsRef\\n\",\n    \"cocoEval.evaluate()\\n\",\n    \"cocoEval.accumulate()\\n\",\n    \"cocoEval.summarize()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 5: Benchmark COCO Dataset Performance of the Neuron-Optimized TorchScript\\n\",\n    \"The following code snippet sets up data parallel on 16 NeuronCores and runs saturated multi-threaded inference on the Inferentia accelerator. 
Note that the number of cores (`n_cores`) should be set to the number of available NeuronCores on the current instance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import torch\\n\",\n    \"import torch.neuron\\n\",\n    \"import torchvision\\n\",\n    \"import torchvision.transforms as transforms\\n\",\n    \"import torchvision.datasets as dset\\n\",\n    \"import multiprocessing as mp\\n\",\n    \"from concurrent.futures import ThreadPoolExecutor\\n\",\n    \"import PIL\\n\",\n    \"import os\\n\",\n    \"import time\\n\",\n    \"\\n\",\n    \"n_threads = 16\\n\",\n    \"\\n\",\n    \"def get_image_filenames(root=os.getcwd()):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Generate paths to the coco dataset image files.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        root (str): The root folder contains.\\n\",\n    \"    \\n\",\n    \"    Yields:\\n\",\n    \"        filename (str): The path to an image file.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    image_path = os.path.join(root, 'val2017')\\n\",\n    \"    for root, dirs, files in os.walk(image_path):\\n\",\n    \"        for filename in files:\\n\",\n    \"            yield os.path.join(image_path, filename)\\n\",\n    \"\\n\",\n    \"def preprocess(path):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Load an image and convert to the expected Yolo V4 tensor format.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        path (str): The image file to load from disk.  \\n\",\n    \"        \\n\",\n    \"    Returns:\\n\",\n    \"        result (torch.Tensor): The image for prediction. Shape: [1, 3, 608, 608]\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    image = PIL.Image.open(path).convert('RGB')\\n\",\n    \"    resized = torchvision.transforms.functional.resize(image, [608, 608])\\n\",\n    \"    tensor = torchvision.transforms.functional.to_tensor(resized)\\n\",\n    \"    return tensor.unsqueeze(0).to(torch.float32)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def load_model(filename='yolo_v4_neuron.pt'):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Load and pre-warm the Yolo V4 model.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        filename (str): The location to load the model from.\\n\",\n    \"        \\n\",\n    \"    Returns:\\n\",\n    \"        model (torch.nn.Module): The torch model.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    # Load model from disk\\n\",\n    \"    model = torch.jit.load(filename)\\n\",\n    \"\\n\",\n    \"    # Warm up model on neuron by running a single example image\\n\",\n    \"    filename = next(iter(get_image_filenames()))\\n\",\n    \"    image = preprocess(filename)\\n\",\n    \"    model(image)\\n\",\n    \"\\n\",\n    \"    return model\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def task(model, filename):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    The thread task to perform prediction.\\n\",\n    \"    \\n\",\n    \"    This does the full end-to-end processing of an image from loading from disk\\n\",\n    \"    all the way to classifying and filtering bounding boxes.\\n\",\n    \"    \\n\",\n    \"    Args:\\n\",\n    \"        model (torch.nn.Module): The model to run processing with\\n\",\n    \"        filename (str): The image file to load from disk.  
\\n\",\n    \"    \\n\",\n    \"    Returns:\\n\",\n    \"        boxes (torch.Tensor): The Yolo V4 bounding boxes.\\n\",\n    \"        scores (torch.Tensor): The label score for each box.\\n\",\n    \"        labels (torch.Tensor): The label for each box.        \\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    image = preprocess(filename)\\n\",\n    \"    begin = time.time()\\n\",\n    \"    boxes, scores = model(image)\\n\",\n    \"    delta = time.time() - begin\\n\",\n    \"    return postprocess(boxes, scores), delta\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def benchmark():\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Run a benchmark on the entire COCO dataset against the neuron model.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    # Load a model into each NeuronCore\\n\",\n    \"    models = [load_model() for _ in range(n_cores)]\\n\",\n    \"    \\n\",\n    \"    # Create input/output lists\\n\",\n    \"    filenames = list(get_image_filenames())\\n\",\n    \"    results = list()\\n\",\n    \"    latency = list()\\n\",\n    \"    \\n\",\n    \"    # We want to keep track of average completion time per thread\\n\",\n    \"    sum_time = 0.0\\n\",\n    \"    \\n\",\n    \"    # Submit all tasks and wait for them to finish\\n\",\n    \"    with ThreadPoolExecutor(n_threads) as pool:\\n\",\n    \"        for i, filename in enumerate(filenames):\\n\",\n    \"            result = pool.submit(task, models[i % len(models)], filename)\\n\",\n    \"            results.append(result)\\n\",\n    \"        for result in results:\\n\",\n    \"            results, times = result.result() # Note: Outputs unused for benchmark\\n\",\n    \"            latency.append(times)\\n\",\n    \"            sum_time += times\\n\",\n    \"    \\n\",\n    \"    print('Duration: ', sum_time / n_threads)\\n\",\n    \"    print('Images Per Second:', len(filenames) / (sum_time / n_threads))\\n\",\n    \"    print(\\\"Latency P50: {:.1f}\\\".format(np.percentile(latency[1000:], 50)*1000.0))\\n\",\n    \"    print(\\\"Latency P90: {:.1f}\\\".format(np.percentile(latency[1000:], 90)*1000.0))\\n\",\n    \"    print(\\\"Latency P95: {:.1f}\\\".format(np.percentile(latency[1000:], 95)*1000.0))\\n\",\n    \"    print(\\\"Latency P99: {:.1f}\\\".format(np.percentile(latency[1000:], 99)*1000.0))\\n\",\n    \"\\n\",\n    \"benchmark()\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/LICENSE",
    "content": "Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n  \nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify,\nmerge, publish, distribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A\nPARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\nOF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/bert_client.py",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nimport sys\nimport os\nimport argparse\nimport random\nimport time\nimport grpc\nimport mrpc_pb2\nsys.path.append(os.path.dirname(__file__))\nimport mrpc_pb2_grpc\nimport mrpc_feature\n\nlatencies = []\n\ndef client():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--port', default=60061, help='gRPC port')\n    parser.add_argument('--pair', default=None, help='Text pair')\n    parser.add_argument('--cycle', type=int, default=1, help='Number of inference cycles')\n    parser.add_argument('--save-accuracy', default=None, help='Save accuracy to file')\n    args = parser.parse_args()\n    text_pair = mrpc_pb2.TextPair()\n    if args.pair is not None:\n        text_a, text_b = args.pair\n        text_pair.text_a = text_a.encode()\n        text_pair.text_b = text_b.encode()\n    else:\n        eval_data_path = os.path.join(os.path.dirname(__file__), 'glue_mrpc_dev.tsv')\n        tsv = mrpc_feature.read_tsv(eval_data_path)\n    with grpc.insecure_channel('127.0.0.1:{}'.format(args.port)) as channel:\n        stub = mrpc_pb2_grpc.mrpcStub(channel)\n        num_correct = 0\n        very_start = time.time()\n        for _ in range(args.cycle):\n            if args.pair is None:\n                data = random.choice(tsv[1:])\n                text_pair.text_a = data[3].encode()\n                text_pair.text_b = data[4].encode()\n            start = time.time()\n            yes_no = stub.paraphrase(text_pair)\n            elapsed = time.time() - start\n            if data is None:\n                evaluation = ''\n            else:\n                if yes_no.prediction.decode() == data[0]:\n                    num_correct += 1\n                evaluation = 'correct, ' if yes_no.prediction.decode() == data[0] else 'incorrect, '\n            print('{} ({}latency {} s)'.format(yes_no.message.decode(), evaluation, elapsed))\n            latencies.append(elapsed)\n        if args.cycle > 1:\n            accuracy = num_correct / args.cycle\n            print('took {} s for {} cycles, accuracy {}'.format(time.time() - very_start, args.cycle, accuracy))\n            if args.save_accuracy is not None:\n                with open(args.save_accuracy, 'w') as f:\n                    f.write(str(accuracy))\n\ndef write_latencies():\n    with open('latencies.txt', 'a') as f:\n        for l in latencies:\n            f.write(str(l) + '\\n')\n\nif __name__ == '__main__':\n    client()\n    write_latencies()\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/bert_model.py",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nimport os\nimport argparse\nimport shlex\nimport numpy as np\nimport tensorflow as tf\nfrom tensorflow.neuron import fuse\nfrom tensorflow.core.framework import attr_value_pb2\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--input_saved_model', required=True, help='Original SaveModel')\n    parser.add_argument('--output_saved_model', required=True, help='Output SavedModel that runs on Inferentia')\n    parser.add_argument('--dtype', default='float16', help='Data type for weights')\n    parser.add_argument('--batch_size', type=int, default=4)\n    parser.add_argument('--sequence_length', type=int, default=128)\n    parser.add_argument('--crude_gelu', action='store_true')\n    parser.add_argument('--aggressive_optimizations', action='store_true')\n    args = parser.parse_args()\n    if os.path.exists(args.output_saved_model):\n        raise OSError('output_saved_model {} already exists'.format(args.output_saved_model))\n    dtype = tf.float16 if args.dtype == 'float16' else tf.float32\n    if args.aggressive_optimizations:\n        args.crude_gelu = True\n    bert = NeuronBERTMRPC(\n        args.input_saved_model,\n        dtype=dtype,\n        batch_size=args.batch_size,\n        seq_len=args.sequence_length,\n        crude_gelu=args.crude_gelu,\n        aggressive_fp16_cast=args.aggressive_optimizations,\n    )\n\n    fuser = fuse(compiler_args=['--fp32-cast', 'matmult'], timeout=360000)\n    bert.encoder = fuser(bert.encoder)\n\n    input_ids = bert.input_ids\n    input_mask = bert.input_mask\n    segment_ids = bert.segment_ids\n    with tf.Session(graph=tf.Graph()) as sess:\n        input_ids_ph_shape = input_ids.shape.as_list()\n        input_ids_ph_shape[0] = None\n        input_ids_ph = tf.placeholder(input_ids.dtype, input_ids_ph_shape, name='input_ids')\n\n        input_mask_ph_shape = input_mask.shape.as_list()\n        input_mask_ph_shape[0] = None\n        input_mask_ph = tf.placeholder(input_mask.dtype, input_mask_ph_shape, name='input_mask')\n\n        segment_ids_ph_shape = segment_ids.shape.as_list()\n        segment_ids_ph_shape[0] = None\n        segment_ids_ph = tf.placeholder(segment_ids.dtype, segment_ids_ph_shape, name='segment_ids')\n\n        dummy_reshapes = []\n        discard_op_names = set()\n\n        with tf.name_scope('bert/embeddings'):\n            expand_dims = tf.expand_dims(input_ids_ph, axis=-1)\n            batch_size = tf.shape(input_ids_ph)[0]\n            reshape = tf.reshape(expand_dims, [batch_size * bert.seq_len])\n            gatherv2 = tf.gather(bert.weights_dict['bert/embeddings/word_embeddings:0'], reshape, axis=0)\n            reshape_1 = tf.reshape(gatherv2, [batch_size, bert.seq_len, bert.hid_size])\n            reshape_2 = tf.reshape(segment_ids_ph, [batch_size * bert.seq_len])\n            one_hot = tf.one_hot(reshape_2, depth=2)\n            matmul = tf.matmul(one_hot, bert.weights_dict['bert/embeddings/token_type_embeddings:0'])\n            reshape_3 = tf.reshape(matmul, [batch_size, bert.seq_len, bert.hid_size])\n            slice0 = tf.slice(bert.weights_dict['bert/embeddings/position_embeddings:0'], begin=[0, 0], size=[bert.seq_len, -1])\n            add_1 = reshape_1 + reshape_3 + slice0\n            input_tensor = tf.reshape(add_1, [batch_size, bert.seq_len, bert.hid_size])\n        with 
tf.name_scope('bert/encoder'):\n            reshape = tf.reshape(input_mask_ph, [batch_size, 1, 1, bert.seq_len])\n            bias_tensor = tf.cast(reshape, tf.float32)\n            bias_tensor = 1.0 - bias_tensor\n            bias_tensor = bias_tensor * -10000.0\n            bias_tensor = tf.cast(bias_tensor, bert.dtype)\n        tensor = bert.layer_norm(input_tensor, 'embeddings', force_float32=True)\n\n        tensor = tf.reshape(tensor, [bert.batch_size, bert.seq_len, bert.hid_size])\n        dummy_reshapes.append(tensor)\n        discard_op_names.add(tensor.op.name)\n        bias_tensor = tf.reshape(bias_tensor, [bert.batch_size, 1, 1, bert.seq_len])\n        dummy_reshapes.append(bias_tensor)\n        discard_op_names.add(bias_tensor.op.name)\n\n        logits = bert.encoder(tensor, bias_tensor)\n        with tf.name_scope('loss'):\n            if bert.dtype is not tf.float32:\n                logits = tf.cast(logits, tf.float32)\n            probabilities = tf.nn.softmax(logits)\n        for rts in dummy_reshapes:\n            neuron_op = rts.consumers()[0]\n            neuron_op._update_input(list(neuron_op.inputs).index(rts), rts.op.inputs[0])\n        try:\n            sess.run(probabilities)\n        except:\n            pass\n        graph_def = sess.graph.as_graph_def()\n    new_graph_def = tf.GraphDef()\n    new_graph_def.node.MergeFrom(node for node in graph_def.node if node.name not in discard_op_names)\n    neuron_op_node = [node for node in new_graph_def.node if node.op == 'NeuronOp'][0]\n    neuron_op_node.attr['input_batch_axis'].list.i[:] = [0, 0]\n    neuron_op_node.attr['output_batch_axis'].list.i[:] = [0]\n\n    with tf.Session(graph=tf.Graph()) as sess:\n        tf.import_graph_def(new_graph_def, name='')\n        inputs = {\n            'input_ids': sess.graph.get_tensor_by_name(input_ids_ph.name),\n            'input_mask': sess.graph.get_tensor_by_name(input_mask_ph.name),\n            'segment_ids': sess.graph.get_tensor_by_name(segment_ids_ph.name),\n        }\n        outputs = {\n            'probabilities': sess.graph.get_tensor_by_name(probabilities.name)\n        }\n        try:\n            sess.run(probabilities)\n        except:\n            pass\n        neuron_op = [op for op in sess.graph.get_operations() if op.type == 'NeuronOp'][0]\n        if not neuron_op.get_attr('executable'):\n            raise AttributeError('Neuron executable (neff) is empty. 
Please check neuron-cc is installed and working properly (`pip install neuron-cc` to install neuron-cc).')\n        tf.saved_model.simple_save(sess, args.output_saved_model, inputs, outputs)\n\n\nclass NeuronBERTMRPC:\n\n    def __init__(self, bert_saved_model, dtype=tf.float16, batch_size=4, seq_len=128, crude_gelu=False, aggressive_fp16_cast=False):\n        predictor = tf.contrib.predictor.from_saved_model(bert_saved_model)\n        sess = predictor.session\n        self.input_ids = predictor.feed_tensors['input_ids']\n        self.input_mask = predictor.feed_tensors['input_mask']\n        self.segment_ids = predictor.feed_tensors['segment_ids']\n        weights_dict = {}\n        for op in sess.graph.get_operations():\n            if op.type == 'Const':\n                tensor = op.outputs[0]\n                weights_dict[tensor.name] = tensor\n            if op.type == 'Identity' and op.name.endswith('read'):\n                tensor = op.outputs[0]\n                weights_dict[tensor.op.inputs[0].name] = tensor\n        self.weights_dict = sess.run(weights_dict)\n        self.dtype = dtype\n        self.batch_size = batch_size\n        self.seq_len = seq_len\n        self.hid_size, self.inter_size = self.weights_dict['bert/encoder/layer_0/intermediate/dense/kernel:0'].shape\n        self.num_heads = sess.graph.get_tensor_by_name('bert/encoder/layer_0/attention/self/Reshape:0').shape.as_list()[2]\n        self.head_size = self.hid_size // self.num_heads\n        self.eps = self.weights_dict['bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add/y:0']\n        self.crude_gelu = crude_gelu\n        self.layer_norm_dtype = tf.float16 if aggressive_fp16_cast else tf.float32\n        sess.close()\n\n    def encoder(self, tensor, bias_tensor):\n        tensor = tf.reshape(tensor, [self.batch_size * self.seq_len, self.hid_size])\n        for layer_id in range(24):\n            mid_layer_name = 'layer_{}'.format(layer_id)\n            tensor = self.self_attention(tensor, bias_tensor, mid_layer_name)\n            tensor = self.layer_norm(tensor, 'encoder/' + mid_layer_name + '/attention/output')\n            tensor = self.fully_connected(tensor, mid_layer_name)\n            tensor = self.layer_norm(tensor, 'encoder/' + mid_layer_name + '/output')\n        logits = self.pooler_loss(tensor)\n        return logits\n\n    def fully_connected(self, input_tensor, layer_name):\n        inter_kernel = self.weights_dict['bert/encoder/{}/intermediate/dense/kernel:0'.format(layer_name)]\n        inter_bias = self.weights_dict['bert/encoder/{}/intermediate/dense/bias:0'.format(layer_name)]\n        out_kernel = self.weights_dict['bert/encoder/{}/output/dense/kernel:0'.format(layer_name)]\n        out_bias = self.weights_dict['bert/encoder/{}/output/dense/bias:0'.format(layer_name)]\n        with tf.name_scope('bert/encoder/{}/fully_connected/intermediate/dense'.format(layer_name)):\n            matmul = tf.matmul(input_tensor, inter_kernel.astype(self.dtype.as_numpy_dtype))\n            bias_add = tf.nn.bias_add(matmul, inter_bias.astype(self.dtype.as_numpy_dtype))\n            gelu = self.gelu_sigmoid(bias_add) if self.crude_gelu else self.gelu_tanh(bias_add)\n        with tf.name_scope('bert/encoder/{}/fully_connected/output/dense'.format(layer_name)):\n            matmul = tf.matmul(gelu, out_kernel.astype(self.dtype.as_numpy_dtype))\n            bias_add = tf.nn.bias_add(matmul, out_bias.astype(self.dtype.as_numpy_dtype))\n            output_tensor = bias_add + input_tensor\n        return 
output_tensor\n\n    def self_attention(self, input_tensor, bias_tensor, layer_name):\n        query_kernel = self.weights_dict['bert/encoder/{}/attention/self/query/kernel:0'.format(layer_name)] * 0.125\n        query_bias = self.weights_dict['bert/encoder/{}/attention/self/query/bias:0'.format(layer_name)] * 0.125\n        key_kernel = self.weights_dict['bert/encoder/{}/attention/self/key/kernel:0'.format(layer_name)]\n        key_bias = self.weights_dict['bert/encoder/{}/attention/self/key/bias:0'.format(layer_name)]\n        value_kernel = self.weights_dict['bert/encoder/{}/attention/self/value/kernel:0'.format(layer_name)]\n        value_bias = self.weights_dict['bert/encoder/{}/attention/self/value/bias:0'.format(layer_name)]\n        output_kernel = self.weights_dict['bert/encoder/{}/attention/output/dense/kernel:0'.format(layer_name)]\n        output_bias = self.weights_dict['bert/encoder/{}/attention/output/dense/bias:0'.format(layer_name)]\n        with tf.name_scope('bert/encoder/{}/attention/self'.format(layer_name)):\n            matmul = tf.matmul(input_tensor, query_kernel.astype(self.dtype.as_numpy_dtype))\n            query = tf.nn.bias_add(matmul, query_bias.astype(self.dtype.as_numpy_dtype))\n            query_r = tf.reshape(query, [self.batch_size, self.seq_len, self.num_heads, self.head_size])\n            query_rt = tf.transpose(query_r, [0, 2, 1, 3])\n            matmul = tf.matmul(input_tensor, key_kernel.astype(self.dtype.as_numpy_dtype))\n            key = tf.nn.bias_add(matmul, key_bias.astype(self.dtype.as_numpy_dtype))\n            key_r = tf.reshape(key, [self.batch_size, self.seq_len, self.num_heads, self.head_size])\n            key_rt = tf.transpose(key_r, [0, 2, 1, 3])  # [b, n, l, h]\n            query_key = tf.matmul(query_rt, key_rt, transpose_b=True)  # [b, n, lq, h] @ [b, n, lk, h] -> [b, n, lq, lk]\n            bias_query_key = tf.add(query_key, bias_tensor)\n            softmax_weights = tf.nn.softmax(bias_query_key)\n            matmul = tf.matmul(input_tensor, value_kernel.astype(self.dtype.as_numpy_dtype))\n            value = tf.nn.bias_add(matmul, value_bias.astype(self.dtype.as_numpy_dtype))\n            value_r = tf.reshape(value, [self.batch_size, self.seq_len, self.num_heads, self.head_size])\n            value_rt = tf.transpose(value_r, [0, 2, 3, 1])\n            weighted_value_rt = tf.matmul(softmax_weights, value_rt, transpose_b=True)  # [b, n, lq, lk] @ [b, n, h, lv] -> [b, n, lq, h]\n            weighted_value_r = tf.transpose(weighted_value_rt, [0, 2, 1, 3])  # [b, lq, n, h]\n            weighted_value = tf.reshape(weighted_value_r, [self.batch_size * self.seq_len, self.hid_size])\n        with tf.name_scope('bert/encoder/{}/attention/output'.format(layer_name)):\n            matmul = tf.matmul(weighted_value, output_kernel.astype(self.dtype.as_numpy_dtype))\n            unnorm_output = tf.nn.bias_add(matmul, output_bias.astype(self.dtype.as_numpy_dtype))\n            output_tensor = tf.add(input_tensor, unnorm_output)\n        return output_tensor\n\n    def layer_norm(self, input_tensor, layer_name, force_float32=False):\n        dtype = tf.float32 if force_float32 else self.layer_norm_dtype\n        gamma = dtype.as_numpy_dtype(self.weights_dict['bert/{}/LayerNorm/gamma:0'.format(layer_name)])\n        beta = dtype.as_numpy_dtype(self.weights_dict['bert/{}/LayerNorm/beta:0'.format(layer_name)])\n        with tf.name_scope('bert/{}/LayerNorm'.format(layer_name)):\n            input_tensor = tf.cast(input_tensor, dtype)\n            
mean = tf.reduce_mean(input_tensor, axis=[-1], keepdims=True, name='mean')\n            residuals = tf.subtract(input_tensor, mean, name='residuals')\n            var = tf.reduce_mean(residuals * residuals, axis=[-1], keepdims=True, name='var')\n            rsqrt = tf.rsqrt(var + dtype.as_numpy_dtype(self.eps))\n            norm_output = tf.multiply(residuals, rsqrt, name='normalized')\n            output_tensor = norm_output * gamma + beta\n            output_tensor = tf.cast(output_tensor, self.dtype)\n        return output_tensor\n\n    def pooler_loss(self, input_tensor):\n        pooler_kernel = self.weights_dict['bert/pooler/dense/kernel:0']\n        pooler_bias = self.weights_dict['bert/pooler/dense/bias:0']\n        loss_kernel = self.weights_dict['output_weights:0'].T\n        loss_bias = self.weights_dict['output_bias:0']\n        with tf.name_scope('bert/pooler_loss'):\n            reshape = tf.reshape(input_tensor, [self.batch_size, self.seq_len, self.hid_size])\n            reshape_1 = tf.reshape(reshape[:, 0:1, :], [self.batch_size, self.hid_size])\n            matmul = tf.matmul(reshape_1, pooler_kernel.astype(self.dtype.as_numpy_dtype))\n            bias_add = tf.nn.bias_add(matmul, pooler_bias.astype(self.dtype.as_numpy_dtype))\n            tanh = tf.tanh(bias_add)\n            matmul = tf.matmul(tanh, loss_kernel.astype(self.dtype.as_numpy_dtype))\n            output_tensor = tf.nn.bias_add(matmul, loss_bias.astype(self.dtype.as_numpy_dtype))\n        return output_tensor\n\n    def gelu_tanh(self, tensor):\n        pow3 = 0.044714998453855515 * tensor * tensor * tensor + tensor\n        shifted = (tf.tanh(0.7978845834732056 * pow3) + 1.0) * tensor\n        return tf.multiply(shifted, 0.5)\n\n    def gelu_sigmoid(self, tensor):\n        return tf.sigmoid(1.702 * tensor) * tensor\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/bert_model_server.py",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nimport os\nimport argparse\nimport subprocess\nimport time\n\n\n_ONE_DAY_IN_SECONDS = 60 * 60 * 24\n\n\ndef serve():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--serving', required=True, help='Path to tf-serving binary')\n    parser.add_argument('--dir', required=True, help='TensorFlow SavedModel dir')\n    parser.add_argument('--port', default=8500, help='gRPC port')\n    parser.add_argument('--parallel', type=int, default=8, help='Number of predictors')\n    args = parser.parse_args()\n    model = os.path.abspath(args.dir)\n    model_with_version = os.path.join(model, '1')\n    if not os.path.exists(model_with_version):\n        os.makedirs(model_with_version)\n        os.symlink(os.path.join(model, 'variables'), os.path.join(model_with_version, 'variables'))\n        os.symlink(os.path.join(model, 'saved_model.pb'), os.path.join(model_with_version, 'saved_model.pb'))\n    process_list = []\n    for _ in range(args.parallel):\n        proc = subprocess.Popen([\n            args.serving, '--model_base_path={}'.format(model), '--port={}'.format(args.port),\n            '--tensorflow_intra_op_parallelism=1', '--tensorflow_inter_op_parallelism=1'\n        ])\n        process_list.append(proc)\n    try:\n        time.sleep(_ONE_DAY_IN_SECONDS)\n    except KeyboardInterrupt:\n        for proc in process_list:\n            proc.terminate()\n            proc.wait()\n\n\nif __name__ == '__main__':\n    serve()\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/bert_no_model.py",
    "content": "# bert_no_model.py\nimport argparse\nimport tensorflow as tf\nimport tensorflow.neuron as tfn\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--input_saved_model', required=True, help='Original SaveModel')\n    parser.add_argument('--output_saved_model', required=True, help='Output SavedModel that runs on Inferentia')\n    parser.add_argument('--batch_size', type=int, default=1)\n    args = parser.parse_args()\n    pred = tf.contrib.predictor.from_saved_model(args.input_saved_model)\n    no_fuse_ops = [op.name for op in pred.graph.get_operations()]\n\n    def force_fuse_condition(op_name):\n        exclude_scopes = [\n            'bert/encoder/strided_slice',\n            'bert/encoder/ones',\n            'bert/encoder/Reshape',\n            'bert/encoder/Shape',\n            'bert/encoder/Cast',\n        ]\n        for scope in exclude_scopes:\n            if op_name == scope or op_name.startswith('{}/'.format(scope)):\n                return False\n        return op_name.startswith('bert/encoder') or op_name.startswith('bert/pooler')\n\n    force_fuse_ops = [op.name for op in pred.graph.get_operations() if force_fuse_condition(op.name)]\n    compilation_result = tfn.saved_model.compile(\n        args.input_saved_model, args.output_saved_model,\n        batch_size=args.batch_size,\n        no_fuse_ops=no_fuse_ops,\n        force_fuse_ops=force_fuse_ops,\n    )\n    print(compilation_result)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/bert_server.py",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nimport sys\nimport os\nimport collections\nimport argparse\nimport time\nimport csv\nimport random\nfrom concurrent import futures\nimport multiprocessing\nfrom multiprocessing.dummy import Pool\nfrom threading import Lock\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport grpc\nimport numpy as np\nimport tensorflow as tf\nimport mrpc_feature\nimport tokenization\nimport mrpc_pb2\nsys.path.append(os.path.dirname(__file__))\nimport mrpc_pb2_grpc\n\n\n_ONE_DAY_IN_SECONDS = 60 * 60 * 24\n\ntotal_tpt = 0\nnum_tpt = 0\n\nclass BERTService(mrpc_pb2_grpc.mrpcServicer):\n\n    def __init__(self, model_path, parallel, batch_size, bootstrap, vocab_txt, num_thread_per_predictor=2):\n        num_queues = parallel * num_thread_per_predictor\n        config = tf.ConfigProto(inter_op_parallelism_threads=num_queues, intra_op_parallelism_threads=1)\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version >= LooseVersion('1.15.0.1.0.1333.0'):\n            neuroncore_group_sizes = '{}x1'.format(parallel)\n            predictor = tf.contrib.predictor.from_saved_model(model_path, config=config)\n            self.predictor_list = [predictor for _ in range(num_queues)]\n        else:\n            neuroncore_group_sizes = ','.join('1' for _ in range(parallel))\n            predictor_list = [tf.contrib.predictor.from_saved_model(model_path, config=config) for _ in range(parallel)]\n            self.predictor_list = []\n            for pred in predictor_list:\n                self.predictor_list.extend(pred for _ in range(num_thread_per_predictor))\n        os.environ['NEURONCORE_GROUP_SIZES'] = neuroncore_group_sizes\n        if self.predictor_list[0].feed_tensors['input_ids'].shape.is_fully_defined():\n            self.batch_size = self.predictor_list[0].feed_tensors['input_ids'].shape.as_list()[0]\n        else:\n            self.batch_size = batch_size\n        self.bootstrap = bootstrap\n        self.tokenizer = tokenization.FullTokenizer(vocab_file=vocab_txt, do_lower_case=True)\n        self.num_infer = 0\n        self.num_correct = 0\n        self.output_name = list(self.predictor_list[0].fetch_tensors.keys())[0]\n        self.iid = 0\n        self.throughput_list = []\n        self.latency_list = []\n        self.max_len_latency_list = 1000\n        self.iid_lock = Lock()\n        if bootstrap:\n            self.request_queue_list = [collections.deque() for _ in self.predictor_list]\n            eval_data_path = os.path.join(os.path.dirname(__file__), 'glue_mrpc_dev.tsv')\n            tsv = mrpc_feature.read_tsv(eval_data_path)\n            for request_queue in self.request_queue_list:\n                for _ in range(1024):\n                    data_list = random.choices(tsv[1:], k=self.batch_size)\n                    model_feed_dict_list = [mrpc_feature.text_pair_to_model_feed_dict(data[3], data[4], self.tokenizer) for data in data_list]\n                    label_list = [int(data[0]) for data in data_list]\n                    batch_labels = np.array(label_list)\n                    batch_feeds = {\n                        key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)\n                        for key in model_feed_dict_list[0].keys()\n                    }\n                    
request_queue.append((batch_feeds, batch_labels))\n        else:\n            self.request_queue_list = [[] for _ in self.predictor_list]\n        self.result_map = {}\n        self.alive = True\n        dummy_feed = {\n            'input_ids': np.zeros([1, 128], dtype=np.int32),\n            'input_mask': np.zeros([1, 128], dtype=np.int32),\n            'segment_ids': np.zeros([1, 128], dtype=np.int32),\n        }\n        self.dummy_feeds = [(None, dummy_feed) for _ in range(self.batch_size)]\n        model_feed_dict_list = [dummy_feed for _ in range(self.batch_size)]\n        batch_feeds = {\n            key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)\n            for key in model_feed_dict_list[0].keys()\n        }\n        pool = Pool(len(self.predictor_list))\n        for pred in self.predictor_list:\n            pool.apply_async(pred, (batch_feeds,))\n            time.sleep(1)\n        pool.close()\n        pool.join()\n\n    def cleanup(self):\n        for pred in self.predictor_list:\n            print(pred)\n            pred.session.close()\n\n    def current_throughput(self):\n        last_num_infer = self.num_infer\n        global total_tpt\n        global num_tpt\n        while self.alive:\n            current_num_infer = self.num_infer\n            throughput = current_num_infer - last_num_infer\n            self.throughput_list.append(throughput)\n            print('current throughput {}'.format(throughput))\n            last_num_infer = current_num_infer\n            if throughput != 0:\n                total_tpt += throughput\n                num_tpt += 1\n            time.sleep(1)\n\n    def current_throughput_accuracy(self):\n        last_num_infer = self.num_infer\n        global total_tpt\n        global num_tpt\n        while self.alive:\n            current_num_infer = self.num_infer\n            throughput = current_num_infer - last_num_infer\n            accuracy = 0.0 if self.num_infer == 0 else self.num_correct / self.num_infer\n            print('current throughput {}, accuracy {}'.format(throughput, accuracy))\n            last_num_infer = current_num_infer\n            if throughput != 0:\n                total_tpt += throughput\n                num_tpt += 1\n            time.sleep(1)\n\n    def paraphrase(self, text_pair, context):\n        iid = self.put_input(text_pair.text_a, text_pair.text_b)\n        yes_no = mrpc_pb2.YesNo()\n        if self.get_output(iid) == 1:\n            yes_no.message = b'paraphrase!'\n            yes_no.prediction = b'1'\n        else:\n            yes_no.message = b'not paraphrase!'\n            yes_no.prediction = b'0'\n        return yes_no\n\n    def put_input(self, text_a, text_b):\n        model_feed_dict = mrpc_feature.text_pair_to_model_feed_dict(text_a, text_b, self.tokenizer)\n        with self.iid_lock:\n            self.iid += 1\n            iid = self.iid\n        self.request_queue_list[iid % len(self.request_queue_list)].append((iid, model_feed_dict))\n        return iid\n\n    def process_input(self, idx):\n        print('input processor is waiting')\n        request_queue = self.request_queue_list[idx]\n        predictor = self.predictor_list[idx]\n        while self.alive:\n            if len(request_queue) > 0:\n                sublist = request_queue[:self.batch_size]\n                request_queue[:self.batch_size] = []\n                if len(sublist) != self.batch_size:\n                    print('batch with {} garbage entries!'.format(self.batch_size - len(sublist)))\n                if len(sublist) < self.batch_size:\n                    pad_batch_size = 
self.batch_size - len(sublist)\n                    sublist.extend(self.dummy_feeds[:pad_batch_size])\n                iid_list = [iid for iid, _ in sublist]\n                model_feed_dict_list = [feed for _, feed in sublist]\n                batch_feeds = {\n                    key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)\n                    for key in model_feed_dict_list[0].keys()\n                }\n                start = time.time()\n                batch_predictions = predictor(batch_feeds)[self.output_name].argmax(-1)\n                latency = time.time() - start\n                if len(self.latency_list) < self.max_len_latency_list:\n                    self.latency_list.append(latency)\n                self.result_map.update({iid: pred for iid, pred in zip(iid_list, batch_predictions)})\n            time.sleep(0.001)\n\n    def process_input_bootstrap(self, idx):\n        print('input processor is waiting')\n        request_queue = self.request_queue_list[idx]\n        predictor = self.predictor_list[idx]\n        while self.alive:\n            if len(request_queue) > 0:\n                batch_feeds, batch_labels = request_queue.popleft()\n                batch_predictions = predictor(batch_feeds)[self.output_name].argmax(-1)\n                self.num_infer += self.batch_size\n                self.num_correct += (batch_predictions == batch_labels).sum()\n                continue\n            time.sleep(0.0001)\n\n    def get_output(self, iid):\n        while iid not in self.result_map:\n            time.sleep(0.001)\n        self.num_infer += 1\n        return self.result_map.pop(iid)\n\n\ndef serve():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--port', default=60061, help='gRPC port')\n    parser.add_argument('--dir', required=True, help='TensorFlow SavedModel dir')\n    parser.add_argument('--parallel', type=int, default=4, help='Number of predictors')\n    parser.add_argument('--thread', type=int, default=2, help='Number of threads used by each predictor')\n    parser.add_argument('--batch', type=int, default=4, help='Batch size')\n    parser.add_argument('--bootstrap', action='store_true',\n                        help='Server loads a dataset and run inference itself')\n    args = parser.parse_args()\n    vocab_txt = os.path.join(os.path.dirname(__file__), 'uncased_L-24_H-1024_A-16.vocab.txt')\n    bert_service = BERTService(args.dir, args.parallel, args.batch, args.bootstrap, vocab_txt, args.thread)\n    server = grpc.server(\n        futures.ThreadPoolExecutor(max_workers=128),\n        options=[('grpc.max_send_message_length', -1),\n                 ('grpc.max_receive_message_length', -1)])\n    mrpc_pb2_grpc.add_mrpcServicer_to_server(bert_service, server)\n    server.add_insecure_port('[::]:{}'.format(args.port))\n    server.start()\n    try:\n        pool = Pool(len(bert_service.predictor_list) + 1)  # +1 for bert_service.current_throughput\n        if args.bootstrap:\n            monitor_func = bert_service.current_throughput_accuracy\n            process_func = bert_service.process_input_bootstrap\n        else:\n            monitor_func = bert_service.current_throughput\n            process_func = bert_service.process_input\n        pool.apply_async(monitor_func)\n        if args.parallel == 1:\n            process_func(0)\n        else:\n            for idx in range(len(bert_service.predictor_list)):\n                pool.apply_async(process_func, (idx,))\n        pool.close()\n        
time.sleep(_ONE_DAY_IN_SECONDS)\n    except KeyboardInterrupt:\n        pass\n    bert_service.cleanup()\n    bert_service.alive = False\n    server.stop(0)\n\n\nif __name__ == '__main__':\n    serve()\n    if num_tpt:\n        print(f'Average Throughput: {total_tpt/num_tpt}')\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/download_mrpc_data.py",
    "content": "import os\nimport sys\nimport argparse\nimport urllib.request\n\nMRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'\nMRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'\n\n# This function is taken from https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e.\ndef format_mrpc(data_dir, path_to_data, path_to_dev_tsv):\n    print(\"Processing MRPC...\")\n    mrpc_dir = os.path.join(data_dir, \"MRPC\")\n    if not os.path.isdir(mrpc_dir):\n        os.mkdir(mrpc_dir)\n    if path_to_data:\n        mrpc_train_file = os.path.join(path_to_data, \"msr_paraphrase_train.txt\")\n        mrpc_test_file = os.path.join(path_to_data, \"msr_paraphrase_test.txt\")\n    else:\n        try:\n            mrpc_train_file = os.path.join(mrpc_dir, \"msr_paraphrase_train.txt\")\n            mrpc_test_file = os.path.join(mrpc_dir, \"msr_paraphrase_test.txt\")\n            urllib.request.urlretrieve(MRPC_TRAIN, mrpc_train_file)\n            urllib.request.urlretrieve(MRPC_TEST, mrpc_test_file)\n        except urllib.error.HTTPError:\n            print(\"Error downloading MRPC\")\n            return\n    assert os.path.isfile(mrpc_train_file), \"Train data not found at %s\" % mrpc_train_file\n    assert os.path.isfile(mrpc_test_file), \"Test data not found at %s\" % mrpc_test_file\n\n    with open(mrpc_test_file, encoding='utf-8') as data_fh, \\\n            open(os.path.join(mrpc_dir, \"test.tsv\"), 'w', encoding='utf-8') as test_fh:\n        header = data_fh.readline()\n        test_fh.write(\"index\\t#1 ID\\t#2 ID\\t#1 String\\t#2 String\\n\")\n        for idx, row in enumerate(data_fh):\n            label, id1, id2, s1, s2 = row.strip().split('\\t')\n            test_fh.write(\"%d\\t%s\\t%s\\t%s\\t%s\\n\" % (idx, id1, id2, s1, s2))\n\n    dev_ids = []\n    with open(path_to_dev_tsv, encoding='utf-8') as dev_fh:\n        header = dev_fh.readline()\n        for row in dev_fh:\n            _, id1, id2, _, _ = row.strip().split('\\t')\n            dev_ids.append([id1, id2])\n\n    with open(mrpc_train_file, encoding='utf-8') as data_fh, \\\n        open(os.path.join(mrpc_dir, \"train.tsv\"), 'w', encoding='utf-8') as train_fh, \\\n        open(os.path.join(mrpc_dir, \"dev.tsv\"), 'w', encoding='utf-8') as dev_fh:\n        header = data_fh.readline()\n        train_fh.write(header)\n        dev_fh.write(header)\n        for row in data_fh:\n            label, id1, id2, s1, s2 = row.strip().split('\\t')\n            if [id1, id2] in dev_ids:\n                dev_fh.write(\"%s\\t%s\\t%s\\t%s\\t%s\\n\" % (label, id1, id2, s1, s2))\n            else:\n                train_fh.write(\"%s\\t%s\\t%s\\t%s\\t%s\\n\" % (label, id1, id2, s1, s2))\n                \n    print(\"\\tCompleted!\")\n\ndef main(arguments):\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--data_dir', help='directory to save data to', type=str, default='glue_data')\n    parser.add_argument('--path_to_mrpc', help='path to directory containing extracted MRPC data, msr_paraphrase_train.txt and msr_paraphrase_text.txt',\n                        type=str, default='')\n    parser.add_argument('--path_to_dev_tsv', help='path to directory containing the glue_mrpc_dev.tsv', type=str, default='glue_mrpc_dev.tsv')\n    args = parser.parse_args(arguments)\n\n    if not os.path.isdir(args.data_dir):\n        os.mkdir(args.data_dir)\n    format_mrpc(args.data_dir, args.path_to_mrpc, args.path_to_dev_tsv)\n\nif __name__ == '__main__':\n    
sys.exit(main(sys.argv[1:]))"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/glue_mrpc_dev.tsv",
    "content": "﻿Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n1\t1355540\t1355592\tHe said the foodservice pie business doesn 't fit the company 's long-term growth strategy .\t\" The foodservice pie business does not fit our long-term growth strategy .\n0\t2029631\t2029565\tMagnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .\tHis wife said he was \" 100 percent behind George Bush \" and looked forward to using his years of training in the war .\n0\t487993\t487952\tThe dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .\tThe dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .\n1\t1989515\t1989458\tThe AFL-CIO is waiting until October to decide if it will endorse a candidate .\tThe AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .\n0\t1783137\t1782659\tNo dates have been set for the civil or the criminal trial .\tNo dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .\n1\t3039165\t3039036\tWal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed .\tIt has also said it would review all of its domestic employees more than 1 million to ensure they have legal status .\n0\t1490811\t1490840\tWhile dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell .\tThe Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s .\n1\t426112\t426210\tThis integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET.\tIBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net.\n1\t1439663\t1439808\tThe top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 .\tFor residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent .\n1\t3147370\t3147525\tThe results appear in the January issue of Cancer , an American Cancer Society journal , being published online today .\tThe results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday .\n1\t3300040\t3299992\tThe delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\tBin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .\n0\t524136\t524119\t\" Sanitation is poor ... there could be typhoid and cholera , \" he said .\t\" Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . \"\n0\t969512\t969295\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\tThe technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 .\n1\t1685339\t1685429\tThe only announced Republican to replace Davis is Rep. 
Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall .\tSo far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall .\n1\t1967578\t1967664\tThe decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July .\tScotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July .\n1\t2047034\t2046820\tUnable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California .\tThe judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California .\n1\t2046630\t2046644\tThe decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing .\tThe decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget .\n0\t2221603\t2221633\tIn midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 .\tThe Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 .\n1\t129995\t129864\tMorgan Stanley raised its rating on the beverage maker to \" overweight \" from \" equal-weight \" saying in part that pricing power with its bottlers should improve in 2004 .\tMorgan Stanley raised its rating on the company to \" overweight \" from \" equal-weight , \" saying the beverage maker 's pricing power with bottlers should improve in 2004 .\n0\t919683\t919782\tThe pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 .\tThe British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 .\n0\t970740\t971209\tFriday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 .\tStanford ( 46-15 ) has a team full of such players this season .\n1\t2745055\t2745022\tLast month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion .\tAt the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion .\n0\t2199097\t2199072\tThe driver , Eugene Rogers , helped to remove children from the bus , Wood said .\tAt the accident scene , the driver was \" covered in blood \" but helped to remove children , Wood said .\n1\t1609290\t1609098\tONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader .\tTens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader .\n1\t1597193\t1597119\tSaddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. 
soldiers .\tHussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers .\n1\t2758944\t2758975\tIts closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean .\tIts closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean .\n0\t2584416\t2584653\tCooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo .\tLee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad .\n1\t86007\t86373\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . \"\t\" Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , \" Graham said .\n1\t1602860\t1602844\tHe said they lied on a sworn affidavit that requires them to list prior marriages .\tMorgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages .\n1\t1201306\t1201329\tThe association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes .\tThe Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes .\n0\t461779\t461815\tWith these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 .\tFunny Cide is looking to become horse racing 's first Triple Crown winner in a generation .\n1\t1438666\t1438643\tIntel was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel , \" spokesman Chuck Mulloy said .\tIntel spokesman Chuck Mulloy said the company was disappointed and assessing its \" options in the event Mr. Hamidi resumes his spamming activity against Intel . 
\"\n1\t3261484\t3261306\tMr Annan also warned the US should not use the war on terror as an excuse to suppress \" long-cherished freedoms \" .\tAnnan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress \" long-cherished \" freedoms .\n1\t1277539\t1277527\tAt community colleges , tuition will jump to $ 2,800 from $ 2,500 .\tCommunity college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent .\n1\t3035788\t3035918\tHe made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol .\tThough Dean made a point of saying during the debate that the Confederate flag is a racist symbol .\n0\t132553\t132725\tBush wanted \" to see an aircraft landing the same way that the pilots saw an aircraft landing , \" White House press secretary Ari Fleischer said yesterday .\tOn Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing .\n0\t2259788\t2259747\tOn Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office .\tPalestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise .\n0\t2307064\t2307235\tThe civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 .\tThe civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health .\n1\t3046488\t3046824\tPer-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning .\tWorkplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 .\n1\t86020\t86007\t\" Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , \" Mr. Graham said .\t\" Instead of pursuing the most imminent and real threats - international terrorists , \" Graham said , \" this Bush administration chose to settle old scores . 
\"\n0\t1100998\t1100441\tSARS has killed about 800 people and affected more than 8400 since being detected in China in November .\tSARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia .\n1\t2268396\t2268480\tAuthorities had no evidence to suggest the two incidents were connected .\tThere was no immediate evidence that the two incidents were connected , police said .\n0\t1984039\t1983986\t\" Jeremy 's a good guy , \" Barber said , adding : \" Jeremy is living the dream life of the New York athlete .\tHe also said Shockey is \" living the dream life of a New York athlete .\n0\t2697659\t2697747\tRatliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death .\tPeterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial .\n0\t2175939\t2176090\tAfter losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 .\tIn midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 .\n1\t886618\t886456\tRumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals .\tRumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals .\n1\t588637\t588864\tConsumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 .\tConsumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April .\n0\t2252795\t2252970\tHe has no immediate plans for television advertising , believing it is unnecessary this early .\tA Lieberman aide said there were no immediate plans for television advertising .\n1\t1756329\t1756394\t\" I think it happened very quickly , \" Houston Police Department homicide investigator Phil Yochum said of the crime .\t\" I think it happened very quickly , \" said Investigator Phil Yochum of the Houston Police Department 's homicide division .\n1\t1673112\t1673068\tUnited issued a statement saying it will \" work professionally and cooperatively with all its unions . \"\tSenior vice president Sara Fields said the airline \" will work professionally and cooperatively with all our unions . \"\n1\t2357324\t2357271\t\" But they never climb out of the pot of beer again . \"\tIt 's just that they never climb out of the beer again . 
\"\n1\t780408\t780363\tChief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected .\tBryant has said that hike had a greater effect on demand than officials expected .\n1\t821523\t821385\tRobert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD .\tNCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection .\n1\t2304696\t2304863\tHP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell .\tHPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell .\n1\t2531749\t2531607\tChirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family .\tChirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family .\n1\t3180014\t3179967\tThe charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country .\tThe government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries .\n1\t726966\t726945\tIn the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points .\tIt has a margin of error of plus or minus three to four percentage points .\n1\t2638861\t2638982\tMr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities .\tClinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities .\n1\t2495223\t2495307\t\" This decision is clearly incorrect , \" FTC Chairman Timothy Muris said in a written statement .\tThe decision is \" clearly incorrect , \" FTC Chairman Tim Muris said .\n1\t55187\t54831\tProsecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building .\tProsecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building .\n0\t2763381\t2763517\tTerri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years .\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\n1\t1990975\t1991132\tSecretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday .\tU.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens .\n1\t2204353\t2204418\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . \"\t\" Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . 
\"\n1\t60122\t60445\tThat would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\tThe inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .\n1\t961836\t962243\tPeopleSoft also said its board had officially rejected Oracle 's offer .\tThursday morning , PeopleSoft 's board rejected the Oracle takeover offer .\n0\t3140260\t3140288\tThe Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday .\tThe Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 .\n1\t1720166\t1720115\tCortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest .\tCortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest .\n1\t2573262\t2573319\t\" The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , \" Mr Howard said .\t\" The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . \"\n0\t1353356\t1353174\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" Fraley said , adding that 18 countries have adopted biotechnology .\t\" Biotech products , if anything , may be safer than conventional products because of all the testing , \" said Robert Fraley , Monsanto 's executive vice president .\n1\t2738677\t2738741\tThe rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study .\tThe study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s .\n1\t1638813\t1639087\tWe acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , \" Rumsfeld said .\tRather , the US acted because the administration saw \" existing evidence in a new light , through the prism of our experience on September 11 \" .\n1\t1605350\t1605425\tTrans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat .\tTrans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat .\n1\t2494149\t2494073\tHowever , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market .\tA 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market .\n1\t3023029\t3023229\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\tPeterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner .\n1\t1351550\t1351155\tCarlson on Tuesday said he would not recuse himself from the case .\tService officials said Carlson refused to recuse himself from the case .\n1\t981185\t981234\tThe program will grow to include ports in Dubai , Turkey and Malaysia , among others .\tThe program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. 
Ridge said .\n0\t2111629\t2111786\tMcCabe said he was considered a witness , not a suspect .\t\" He is not considered a suspect , \" McCabe said .\n1\t655498\t655391\tThe woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health .\tThe woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health .\n1\t533823\t533909\tHe added that those \" are not solely American principles , nor are they exclusively Western . \"\t\" These are not solely American principles nor are they exclusively Western , \" Rumsfeld said .\n1\t581592\t581570\t\" If we don 't march into Tehran , I think we will be in pretty good shape , \" he said .\t\" As long as we don 't march on Tehran , I think we are going to be in pretty good shape , \" he said .\n0\t1010655\t1010430\tOn Saturday , a 149mph serve against Agassi equalled Rusedski 's world record .\tOn Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi .\n1\t2241925\t2242066\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently .\tChad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate .\n1\t2796978\t2797024\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thai Prime Minister Thaksin Shinawatra told business leaders .\t\" APEC leaders are painfully aware that security and prosperity are inseparable , \" Thaksin said .\n0\t101746\t101775\tDanbury prosecutor Warren Murray could not be reached for comment Monday .\tProsecutors could not be reached for comment after the legal papers were obtained late Monday afternoon .\n1\t327839\t327748\tWittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business .\tWittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company .\n0\t2988297\t2988555\tShattered Glass , \" starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters .\t\" Shattered Glass \" _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters .\n1\t2217613\t2217659\tHe was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston .\tHe was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife .\n0\t2128530\t2128455\tHowever , EPA officials would not confirm the 20 percent figure .\tOnly in the past few weeks have officials settled on the 20 percent figure .\n1\t2208376\t2208198\tUniversity of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , \" Our fundamental values haven 't changed .\t\" Our fundamental values haven 't changed , \" Mary Sue Coleman , president of the university , said in a statement in Ann Arbor .\n1\t1980654\t1980641\tThe first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs .\tThe first products will likely be dongles costing $ 100 to $ 150 
that will establish connections between consumer electronics devices and PCs .\n0\t589579\t589557\tHowever , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co .\tLapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda .\n1\t1636060\t1635946\tMichel , who remains in the government , denied that US pressure had provoked the government 's move .\tMichel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move .\n1\t1630585\t1630657\tSome of the computers also are used to send spam e-mail messages to drum up traffic to the sites .\tSome are also used to send spam e-mail messages to boost traffic to the sites .\n0\t447728\t447699\tIndonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations .\tIndonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied .\n1\t1606495\t1606619\tBush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease .\tPresident Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic .\n1\t1550897\t1550977\tLater this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions .\tThis fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence .\n0\t490376\t490490\tThe reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday .\tStocks slipped at the open after the euro hit record highs against the dollar .\n1\t3084554\t3084612\tSales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros .\tSales rose 37 per cent year-on-year to 1.76bn , beating expectations .\n1\t315647\t315778\tIf the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back .\tIf the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change .\n1\t3428298\t3428362\tRobert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus .\tWalsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night .\n1\t2523564\t2523358\tThe Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box .\tThe µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box .\n1\t2079200\t2079131\tU.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America .\tU.S. 
stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America .\n1\t818091\t817811\tThe company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results .\tThe company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July .\n1\t1580638\t1580663\t\" I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . \"\tI stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , \" Blair said .\n0\t1919740\t1919926\t\" I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , \" Parrish said .\t\" I don 't know whether the person I 'm talking to now may end up being someone else , \" Parrish said .\n1\t2748287\t2748550\t\" I think it 's going to be a close vote , but I think the grant proposal is going to win , \" McConnell said .\t\" I think it 's going to be a close vote , but I think the grant proposal 's going to win , \" said Sen. Mitch McConnell , assistant majority leader .\n1\t3394891\t3394775\tTwenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins .\tTwenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through .\n0\t2963943\t2963880\tOne , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday .\tHer 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition .\n0\t1865364\t1865251\tThe United States finally relented during President Bush 's visit to Africa earlier this month .\tDuring President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase .\n1\t263690\t263819\t\" There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , \" he said .\tHe also said there is no conscious policy by the United States to move the value of the dollar .\n1\t283751\t283290\tIt 's the first such drill since the September 11 terrorist attacks on New York and Washington .\tIt is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks .\n1\t2517014\t2516995\tMyanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said .\tMyanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday .\n1\t1330643\t1330622\tAccording to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\tThe Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .\n1\t3111452\t3111428\tIn an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites .\tIn an unusual move that critics contend could disrupt millions of Web sites , the U.S. 
Patent and Trademark Office is reconsidering a patent affecting Internet pages .\n0\t1167835\t1167651\tKansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year .\tStatistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year .\n0\t1423836\t1423708\tA European Union spokesman said the Commission was consulting EU member states \" with a view to taking appropriate action if necessary \" on the matter .\tLaos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter .\n1\t2090911\t2091154\tWaiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades .\tBut waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades .\n1\t2265271\t2265152\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States .\tBarry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market .\n1\t3062202\t3062308\tBy skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is \" less predictable \" than for those obtained in the United States .\tBy skirting the FDA 's oversight , Eagan said the quality of the imported drugs is \" less predictable \" than U.S. drugs .\n1\t2155514\t2155377\tHe said : \" For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . \"\t\" For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , \" Dyke said .\n1\t1552068\t1551928\tThree such vigilante-style attacks forced the hacker organizer , who identified himself only as \" Eleonora [ 67 ] , \" to extend the contest until 7 p.m. 
EST Sunday .\tThree such vigilante-style attacks forced the hacker organiser , who identified himself only as \" Eleonora67 ] , \" to extend the contest until 8am ( AEST ) today .\n1\t936978\t937500\tEric Gagne pitched a perfect ninth for his 23rd save in as many opportunities .\tGagne struck out two in a perfect ninth inning for his 23rd save .\n0\t985015\t984975\tOne way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday .\tJust about everything about \" Harry Potter and the Order of the Phoenix \" will set records .\n1\t1430357\t1430425\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" said Josh Lichter , a meteorologist with the Houston-Galveston weather office .\t\" Allison just proves you don 't need to wait until August or September to have a disaster , \" Lichter said .\n1\t3039310\t3039413\tToday , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks .\tOn Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 .\n1\t34513\t34742\tPolice say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States .\tMr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US .\n1\t368067\t368018\tChiron already has nearly 20 percent acceptances from PowderJect 's shareholders .\tChiron has acceptances from holders of nearly 20 percent of PowderJect shares .\n0\t611663\t611716\tErnst & Young has denied any wrongdoing and plans to fight the allegations .\tErnst & Young has denied the SEC 's claims , and called its recommendations \" irresponsible \" .\n1\t98432\t98657\tThe attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence .\tThe attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence .\n1\t3039007\t3038845\tNo company employee has received an individual target letter at this time .\tShe said no company official had received \" an individual target letter at this time . \"\n1\t1708040\t1708062\tSecond-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share .\tThe second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share .\n0\t1757264\t1757375\tHe allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement .\tThe two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement .\n1\t383417\t383558\tWorldwide , more than 50 million people have seen \" Les Miz , \" with gross receipts of $ 1.8 billion .\tWorldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion .\n0\t2766112\t2766084\tIn fiction : Edward P. Jones ( \" The Known World \" ) and Scott Spencer ( \" A Ship Made of Paper \" ) .\tThe fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper .\n1\t1261116\t1261234\t\" Overwhelmingly the Windows brand really resonated with them . 
\"\t\" Windows was the part of the experience that really resonated with people . \"\n1\t3028143\t3028234\tThe Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes .\tThe Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year .\n0\t249699\t249623\tVivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing .\tDuring difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing .\n0\t3448488\t3448449\tThe Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months .\tThe Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 .\n1\t2749322\t2749663\tThe Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission .\tThe Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission .\n0\t2204592\t2204588\tSun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition .\tThe vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) .\n1\t2889005\t2888954\tProsecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings .\tProsecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings .\n0\t1657632\t1657619\tThe Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today .\tGoodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment .\n0\t555617\t555528\tThe 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t2396937\t2396818\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the Fed said in a statement accompanying the unanimous decision .\t\" The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , \" the policy-setting Federal Open Market Committee said .\n0\t2339738\t2339771\t\" It is bad for Symbian , \" said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein .\t\" Motorola has displayed clear disloyalty \" to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London .\n0\t1616174\t1616206\tBob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling .\tBob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment .\n1\t635783\t635802\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that there could be another downgrade if Southcorp breached any of its banking covenants .\tBut Ms Ward said the headroom under its financial covenants was \" tight \" and that 
there could be a rating downgrade if Southcorp did breach any banking covenants .\n1\t3444633\t3444733\tHe added : ``I 've never heard of more reprehensiblebehaviour by a doctor .\tThe Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor .\n1\t555553\t555528\tBroomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson .\tBroomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .\n1\t1112021\t1111925\tOther staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue .\tSome E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue .\n0\t2749410\t2749625\tPresident Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday .\tPresident Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday .\n1\t1629064\t1629043\tAn episode is declared when the ozone reaches .20 parts per million parts of air for one hour .\tA Stage 1 episode is declared when ozone levels reach 0.20 parts per million .\n1\t789691\t789665\t\" He may not have been there , \" the defence official said on Thursday .\t\" He may not have been there , \" said a defence official speaking on condition of anonymity .\n1\t844421\t844679\tThe U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence .\tThe troops - whose mandate is to protect U.N. 
installations and personnel - can only fire in self-defense and have been unable to stem the violence .\n1\t58540\t58567\tNorth American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight .\tNorth American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight .\n1\t781439\t781461\tXerox itself paid a $ 10 million fine last year to settle similar SEC charges .\tXerox itself previously paid a $ 10-million penalty to settle the SEC accusations .\n1\t1909579\t1909408\t\" This deal makes sense for both companies , \" said National Chief Executive Brian Halla .\t\" This deal makes sense for both companies , \" Halla said in a prepared statement .\n0\t787432\t787464\tThe blasts killed two people and injured more than 150 others .\tThe Atlanta Olympic Games attack killed one woman and injured more than 100 other people .\n0\t52758\t52343\tMorrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service .\tAt the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her .\n1\t1675025\t1675047\tSpansion products are to be available from both AMD and Fujitsu , AMD said .\tSpansion Flash memory solutions are available worldwide from AMD and Fujitsu .\n1\t2131318\t2131372\tAbout 1,500 police will be deployed for the visit .\tAround 1,500 police are to be deployed at Niigata for the ferry 's visit .\n1\t325763\t325928\tGamarekian told The News she remembers only the woman 's first name - and refused to reveal it .\tShe told the New York Daily News she remembers only the intern 's first name , which she refused to reveal .\n1\t2638975\t2638855\tOne of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding .\tOne of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding .\n1\t2198694\t2198937\tA nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year .\tA nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 .\n1\t1825432\t1825301\tA man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday .\tThe Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said .\n1\t2906104\t2906322\tThey were being held Sunday in the Camden County Jail on $ 100,000 bail .\tThey remained in Camden County Jail on Sunday on $ 100,000 bail .\n1\t722278\t722383\tMs Stewart , the chief executive , was not expected to attend .\tMs Stewart , 61 , its chief executive officer and chairwoman , did not attend .\n0\t101747\t101777\tChristina 's aunt , Shelley Riling , said the defense 's claims were preposterous .\tChristina 's aunt , Shelley Riling , said she will address the court .\n1\t2224884\t2224819\tThe Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights .\tThe Justice Department on Aug. 19 sanctioned the Oct. 
7 date for recall election , saying it would not affect voting rights .\n0\t977938\t978162\tLord Falconer hailed the changes as \" a new beginning as far as the courts , Crown Prosecution Service and police are concerned \" .\t\" It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . \"\n0\t1015010\t1014963\tGE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t1513190\t1513246\tAt least 27 US troops have been killed in hostile fire since Bush 's statement .\tAt least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 .\n1\t2385348\t2385394\tA recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday .\tA recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday .\n1\t2317018\t2317252\tNovember 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 .\tNovember 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 .\n0\t1831696\t1831660\tThe agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies .\tThe agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify .\n1\t1528383\t1528083\tZulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards .\tWitness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards .\n1\t917965\t918315\tFor the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase .\tFor the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral .\n0\t3218713\t3218830\tQ : Can I buy coverage for prescription drugs right away ?\tCongress has added a new benefit - an option to buy insurance coverage for prescription drugs .\n1\t221079\t221003\tThe airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers .\tThe airline has the option to buy 380 more , split evenly between the two manufacturers .\n1\t2546175\t2546198\tDr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions .\tDr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function .\n0\t799346\t799268\tThe chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion .\tThe chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue .\n0\t2673104\t2673130\tAll patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea .\tSymptoms of the E. 
coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping .\n1\t1354501\t1354476\tFederal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc .\tFederal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream .\n1\t3070979\t3070949\tEnvironmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK .\tEnvironmental campaigners used the eclipse to highlight the surge in light pollution across Britain .\n0\t1264509\t1264471\tAvailable July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems .\tThe OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers .\n1\t103280\t103431\tJustice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use .\tJustice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot .\n0\t110731\t110648\tBut Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic .\tBillups scored 77 points in the final two games of the first-round series against the Magic .\n1\t2274844\t2274714\tKelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war .\tHe killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq .\n0\t1050307\t1050144\tAnd it 's going to be a wild ride , \" said Allan Hoffenblum , a Republican consultant .\tNow the rest is just mechanical , \" said Allan Hoffenblum , a Republican consultant .\n1\t2810634\t2810670\tWhile the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each .\tInstead of one long operation to separate the twins , Goodrich and Dr. 
David Staffenberg plan about three , with several weeks between each .\n1\t3073773\t3073779\tLay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination .\tLay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination .\n0\t261202\t260995\tThe WHO experts didn 't say how many cases in Hebei were in rural areas .\tHebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas .\n1\t1824224\t1824209\tNearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours .\tMutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired .\n1\t548867\t548785\tIn three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th .\tIn the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list .\n0\t2796658\t2796682\tAbout two hours later , his body , wrapped in a blanket , was found dumped a few blocks away .\tThen his body was dumped a few blocks away , found in a driveway on Argyle Road .\n1\t1808166\t1808434\tColumbia broke up over Texas upon re-entry on Feb. 1 .\tColumbia broke apart in the skies above Texas on Feb. 1 .\n1\t853475\t853342\tA year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs .\tWithin two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs .\n0\t977772\t977804\tThe Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign .\tFalconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign .\n1\t577854\t578500\tCindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents .\tShe started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents .\n1\t2829194\t2829229\tThe two are not related , but have referred to each other as father and son .\tHe 's not related to Malvo , but the two have referred to each other as father and son .\n1\t2074182\t2074668\tGibson said last month in a press statement that \" neither I nor my film are anti-Semitic .\tGibson said in a June statement that he and his film are not anti-Semitic .\n0\t2758265\t2758282\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates .\tThe world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them .\n1\t1958079\t1958143\tThe Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data .\tThe blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 .\n1\t544217\t544325\tThe vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council .\tThe vote for mayor followed City Council elections that gave Kurds the 
largest block of votes on the 30-seat council .\n1\t2385288\t2385256\tLarge swells and dangerous surf already were being felt along sections of the coast .\tAlready large swells and dangerous surf have arrived along the mid-Atlantic .\n0\t2324708\t2325028\tBased on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent .\tLabor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent .\n1\t2139506\t2139427\t\" We will work with the board to ensure a smooth transition . \"\tHe said federal regulators would work with the corporation to ensure a \" smooth transition . \"\n1\t2965576\t2965701\tGasps could be heard in the courtroom when the photo was displayed .\tGasps could be heard as the photo was projected onto the screen .\n1\t2931098\t2931144\tGilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter .\tQuarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said .\n0\t644788\t644816\t\" I had one bad stretch of holes that put me out of contention to win , \" Woods said .\t\" I had one bad stretch of holes that put me out of contention , \" Woods said , referring to his 42 on the front nine Saturday .\n0\t2551891\t2551563\tThe poll had a margin of error of plus or minus 2 percentage points .\tIt had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday .\n1\t1089053\t1089297\tSen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic .\tSen. Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered .\n1\t3435735\t3435717\tThe broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 .\tThe Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 .\n0\t1954\t2142\tWatertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country .\tAlong with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday .\n1\t3400796\t3400822\tThat is evident from their failure , three times in a row , to get a big enough turnout to elect a president .\tThree times in a row , they failed to get a big _ enough turnout to elect a president .\n1\t1220668\t1220801\tWe firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . \"\tWe firmly believe that we have an absolute right to use the common word ' spike ' to name our network .\n1\t1889954\t1889847\tSources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE .\tLate last week , sources told Reuters cable TV company Comcast Corp. 
CMCSA.O also was looking at buying VUE assets .\n1\t315785\t315653\tBut MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found .\tMTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said .\n0\t1521034\t1520582\tWhite , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman .\tWhite , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke .\n1\t2083598\t2083810\tAbout 10 percent of high school and 16 percent of elementary students must be proficient at math .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t1910610\t1910455\tThe legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company .\tThe legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company .\n1\t3113791\t3113782\tThe European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached .\tThe European Commission is expected to issue its decision in the case next spring — unless a settlement is reached .\n1\t3214517\t3214483\t\" So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , \" she told jurors .\t\" Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , \" Ms. 
Richardson declared .\n0\t2083612\t2083810\tTwenty percent of Latino students and 23 percent of black students performed at proficient or higher .\tIn math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .\n1\t661390\t661218\tHe is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama .\tHe is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama .\n1\t1269572\t1269682\tThe men were remanded in custody and are due to appear again before court on July 8 .\tThey were remanded in custody and will appear in court again on July 8 .\n1\t1095780\t1095652\t\" No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , \" Wheeler said in a statement .\tNo matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday .\n1\t116294\t116332\tThe Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 .\tThe Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 .\n1\t941617\t941673\tHe said his hatred for such people grew from these discussions and had helped convince him violence was the answer .\tHis hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea .\n1\t2640607\t2640576\t\" There is no need for one deadline for all to create the ASEAN Economic Community , \" Thaksin said .\tThus , he said , there did not have to one deadline to create the economic community .\n1\t3310210\t3310286\tThe announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said .\tThe broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said .\n1\t3376093\t3376101\tThe additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes .\tThe donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 .\n1\t1549586\t1549609\tLeon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville .\tThe dead man , Leon Williams , was found in his third-floor apartment .\n1\t460211\t460445\tThe player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\tHe failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said .\n1\t1196962\t1197061\tBut Virgin wants to operate Concorde on routes to New York , Barbados and Dubai .\tBranson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai .\n0\t862804\t862715\tHe tried to fight off officers and was taken to a hospital after a police dog bit him but was later released .\tCruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. 
Steve Dixon said .\n1\t1726935\t1726879\tThe announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs .\tEconomists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs .\n0\t331980\t332110\tAsked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission .\tAsked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : \" Of course they may not go .\n1\t173879\t173832\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar .\tDealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates .\n0\t2834988\t2835026\tIran has until the end of the month to satisfy the agency it has no plans for nuclear weapons .\tThe Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities .\n1\t2587300\t2587243\tHer father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will .\tHer father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will .\n0\t554905\t554627\tClaire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee .\tOne by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee .\n1\t1912524\t1912648\tCitigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group .\tCitigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business .\n1\t3255597\t3255668\t\" They 've been in the stores for over six weeks , \" says Carney .\tThe quarterlies usually stay in stores for between six to eight weeks , \" Carney added .\n1\t629316\t629289\tLet me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community .\t\" The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , \" he said .\n1\t54181\t53570\tRidge said no actual explosives or other harmful substances will be used .\tRidge said no real explosives or harmful devices will be used in the exercise .\n1\t723557\t724115\tThus far , Stewart 's company appears ready to stand behind her .\tFor now , the company 's management appears to be standing behind Stewart .\n0\t2607718\t2607708\tBut late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement .\tBut late yesterday , the campaign and the state Democratic Party said there would be no news conference .\n1\t753858\t753890\tThere 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t587009\t586969\tAnother $ 100-million in savings will come from management layoffs and pay cuts .\tThe airline expects to save another $ 100-million a year through management 
layoffs and pay cuts .\n1\t308567\t308525\tHe called on Prime Minister John Howard to establish a royal commission on child sex abuse .\tThe Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse .\n0\t665419\t665612\t\" We think that the United States of America should support the free speech of all groups , \" Mr. White said , objecting to Mr. Olson 's recommendation .\tWe think that the United States of America should support the free speech of all groups , he said .\n1\t2763517\t2763576\tTerri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .\tThe tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years .\n0\t3107118\t3107136\tAfter 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries .\tAfter 18 months , the atorvastatin patients had no change in the plaque in their arteries .\n1\t780604\t780466\tToll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail .\tToll last week offered to buy the company for NZ75c a share , or $ NZ158 million .\n0\t1989213\t1989116\t\" This child was literally neglected to death , \" Armstrong County District Attorney Scott Andreassi said .\tArmstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen .\n1\t1462409\t1462504\tWal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday .\tWal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday .\n1\t260952\t260924\tMetro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported .\tSubway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said .\n1\t1224743\t1225510\tIn the undergraduate case , Rehnquist said the use of race was not \" narrowly tailored \" to achieve the university 's asserted interest in diversity .\tRehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity .\n0\t3329379\t3329416\tSP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) .\tThe firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) .\n1\t2362761\t2362698\tA landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said .\tIn central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said .\n0\t1465073\t1464854\tThey will help draft a plan to attack obesity that Kraft will implement over three to four years .\tThe team will help draft a plan by the end of the year to attack obesity .\n1\t195728\t196099\tBut that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion .\tSuch an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion .\n1\t2587767\t2587673\tIn the 
clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs .\tIn Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs .\n0\t1490044\t1489975\tCorixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market .\tShares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 .\n1\t958161\t957782\tCommittee approval , expected today , would set the stage for debate on the Senate floor beginning Monday .\tThat would clear the way for debate in the full Senate beginning on Monday .\n1\t1033204\t1033365\tO 'Brien was charged with leaving the scene of a fatal accident , a felony .\tBishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident .\n0\t2996241\t2996734\tTom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning .\tBethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday .\n0\t2015389\t2015410\tThe Calgary woman , who is in her twenties , donated blood on Aug. 7 .\tThe woman -- who has no symptoms of illness -- donated blood Aug. 7 .\n1\t221515\t221509\tQuattrone lawyer John W. Keker said his client is innocent .\tIn a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent .\n0\t2283737\t2283794\tIn the weeks leading up to the execution , several Florida officials received anonymous threatening letters .\tSeveral Florida officials connected to the case have received threatening letters , accompanied by rifle bullets .\n1\t2826681\t2826474\tThe disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday .\tThe fight over online music sales was disclosed in documents made available Monday by the court .\n1\t2249237\t2249305\tParson was charged with intentionally causing and attempting to cause damage to protected computers .\tParson is charged with one count of intentionally causing damage to a protected computer .\n1\t389239\t389299\t\" The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , \" the judge said .\t\" The court and the public need to know more of the defendants ' seemingly massive fraud , \" he said .\n1\t2652187\t2652218\tThe U.S. 
Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users .\tThe high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users .\n1\t2945693\t2945847\tThe IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts .\tThe IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts .\n1\t2065523\t2065836\t\" More than 70,000 men and women from bases in Southern California were deployed in Iraq .\tIn all , more than 70,000 troops based in Southern California were deployed to Iraq .\n1\t2222998\t2223097\tBP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange .\tBP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange .\n1\t2561999\t2561941\tBecause of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 .\tIncluding the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 .\n0\t2324704\t2325023\tFriday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate .\tU.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery .\n1\t2336453\t2336545\tFederal Emergency Management Administration designated $ 20 million to establish the registry .\tThe registry was launched with $ 20 million from the Federal Emergency Management Agency .\n1\t720572\t720486\tBREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday .\tCases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time .\n1\t1605818\t1605806\t\" It was never our intention to sell the product , \" said Health Minister Anne McClellan , a skeptic of medical marijuana use .\t\" It was never the intention of us to sell product , \" federal Health Minister Anne McLellan said yesterday in Edmonton .\n0\t2440680\t2440474\tGM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses .\tThey cover more than 300,000 UAW workers and 500,000 retirees and spouses .\n0\t726399\t726078\tRosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , \" Breyer said to tumultuous cheers in the courtroom .\t\" Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . \"\n1\t533903\t533818\t\" We are committed to helping the Iraqi people get on the path to a free society , \" Rumsfeld said in a speech to the Council on Foreign Relations .\t\" We are committed to helping the Iraqi people get on the path to a free society , \" he said .\n1\t1166473\t1166857\tMr. 
Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money .\tYoung said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money .\n1\t144089\t143697\tThe 12-nation currency has risen by 33 percent against the dollar over the past 15 months .\tThe euro is up 9 percent against the dollar in the past six weeks .\n1\t3439854\t3439874\tIn February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing .\tThe officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges .\n1\t3464314\t3464302\tI was surprised it turned out me talking and the president just listening .\t\" I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . \"\n1\t2008984\t2009175\tThe state 's House delegation currently consists of 17 Democrats and 15 Republicans .\tDemocrats hold a 17-15 edge in the state 's U.S. House delegation .\n0\t816867\t816831\tFreddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board .\tHe replaces Leland Brendsel , 61 , who retired as chairman and chief executive .\n1\t192285\t192327\tWe 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting .\t\" We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . \"\n1\t2688145\t2688162\tIn that position , Elias will report to Joe Tucci , president and CEO of EMC .\tAs executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive .\n1\t3294207\t3294290\tBut with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made .\tBut with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made .\n0\t205100\t205145\tA pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote .\tMiodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent .\n0\t3242051\t3241897\tMr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board .\tKerkorian and Tracinda had also tried to take over Chrysler in 1995 .\n0\t1076861\t1077018\tGlover spoke at a news conference that included about 20 relatives of the victims .\tAbout 20 family members of the victims were invited to the news conference .\n1\t2095803\t2095786\tDrax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe .\tDrax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe .\n1\t2112330\t2112376\tBut I would rather be talking about high standards than low standards . 
\"\t\" I would rather be talking about positive numbers rather than negative .\n1\t3389318\t3389271\tIt was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew .\tIt was still not known exactly how many people were on the plane , which could carry 141 passengers and crew .\n1\t698948\t698933\tThe market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March .\tThe market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March .\n1\t539585\t539355\tWitnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew .\tWitnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew .\n1\t684848\t684557\tAs Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times .\tAs he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted \" Takbir ! \" , or \" Proclaim ! \" , a religious rallying cry .\n1\t347017\t347002\tIn hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet .\tIn hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty .\n1\t1592037\t1592076\tIn a statement , Lee said he \" no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . \"\tSpike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture \" Spike TV , \" according to a statement read in court Tuesday .\n0\t3013483\t3013540\tSingapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries .\tHAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia .\n1\t2020252\t2020081\tThe worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about .\tThe worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July .\n0\t2614947\t2614904\tThe premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 .\tThe premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 .\n0\t1744257\t1744378\tIn the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion .\tIn the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share .\n0\t1119721\t1119714\tSony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning .\tIts capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning .\n1\t1186754\t1187056\tAmazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history .\tAmazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history 
.\n1\t2842562\t2842582\tThe show 's closure affected third-quarter earnings per share by a penny .\tThe company said this impacted earnings by a penny a share .\n0\t431076\t431242\tAfter the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances .\tThe committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd .\n1\t1393764\t1393984\tIt 's been a busy couple of days for security gurus assigned to keep their companies safe and sound .\tIt 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound .\n0\t2916199\t2916164\tLu reclined in a soft chair wearing a woolly coat near the blackened capsule .\t\" It 's great to be back home , \" said Lu , dressed in a woolly coat near the blackened capsule .\n1\t2530671\t2530542\tGov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 .\tAfter Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs .\n1\t219064\t218969\t\" It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , \" he said .\t\" It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , \" Parsons told reporters at NASA headquarters .\n0\t2377289\t2377259\tEstonia 's place in the European mainstream and safeguard its independence regained in 1991 .\tEstonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 .\n0\t2110220\t2110199\tFranklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center .\tA county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center .\n0\t1864253\t1863810\tPolice suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs .\tNobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs .\n0\t3150803\t3150839\tDuring this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations .\tDuring the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states .\n0\t969381\t969512\tThe technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 .\tThe broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .\n1\t271891\t271839\tSony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots .\tIt also features a 4.5 in back-lit LCD screen and memory expansion facilities .\n0\t2829648\t2829613\tClinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill .\tTwo Democrats , Sen. 
Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans .\n1\t886904\t887158\tSome of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit .\tSome of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit .\n0\t2632692\t2632767\tWal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County .\tAt least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County .\n1\t2240399\t2240149\tCintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process .\tCintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process .\n1\t805457\t805985\tThe opposition would resort to rolling mass action \" at strategic times of our choice and without warning to the dictatorship , \" he said .\t\" From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , \" he said .\n1\t2896308\t2896334\tFederal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 .\tHe said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 .\n1\t2110775\t2110924\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering .\tTom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario .\n1\t1762569\t1762526\tHester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers .\tHester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers .\n0\t2706154\t2706185\tThe other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said .\tAfter the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said .\n1\t1057995\t1057778\tThe hearing , expected to last a week , will determine whether Akbar faces a court-martial .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n1\t1386884\t1386857\tHe said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed .\tHe said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed .\n1\t3093023\t3092996\tSpeaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried .\tBrigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney .\n1\t1661381\t1661317\t\" Close co-operation between our law enforcement agencies , close 
co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . \"\tClose cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said .\n0\t2926039\t2925982\tThe mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks .\tThe parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month .\n0\t637168\t637447\tWe strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community .\tMcBride characterized Novell 's move as \" a desperate measure to curry favor with the Linux community . \"\n1\t696677\t696932\tAfter more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday .\tAfter more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion .\n1\t3122429\t3122305\tMr Russell , 46 , a coal miner from Brisbane , said : \" They are obviously hurting , so we are basically going over there to help them . \"\t\" They are obviously hurting so we are basically going over there to help them , \" Russell , 46 , said .\n1\t1348909\t1348954\tThe New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years .\tThe former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on .\n0\t162203\t162101\tIt does not affect the current Windows Media Player 9.0 Series .\tWindows Media Player has had security problems before .\n0\t71501\t71627\tThe seizure took place at 4 a.m. on March 18 , just hours before the first American air assault .\tThe time was about 4 a.m. on March 18 , just hours before the first pinpoint missiles rained down on the capital .\n1\t2907762\t2907649\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively .\tDonations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent .\n1\t2167771\t2167744\tIn May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown .\tLast May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood .\n1\t3320577\t3320553\t\" I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , \" he said .\t\" If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . 
\"\n1\t849291\t849442\tIBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets .\tIBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets .\n0\t763948\t763991\tCosta 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final .\tCosta will play Juan Carlos Ferrero next in a rematch of last year 's final .\n1\t1908763\t1908744\tA former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year .\tA former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics .\n0\t1876120\t1876059\tThyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat .\tThyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too .\n1\t518089\t518133\tJudge Craig Doran said it wasn 't his role to determine if Hovan was \" an evil man \" but maintained that \" he has committed an evil act . \"\tJudge Craig Doran said he couldn 't determine if Hovan was \" an evil man \" but said he \" has committed an evil act . \"\n0\t224932\t224868\tThe Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange .\tShares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading .\n1\t1771131\t1771091\tIt also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip .\tThe S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip .\n0\t2728425\t2728251\tIt decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency .\tIt decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status .\n0\t953733\t953537\tAltria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser .\tIts shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser .\n1\t349215\t349241\tIt will be followed in November by a third movie , \" The Matrix Revolutions . \"\tThe film is the second of a trilogy , which will wrap up in November with \" The Matrix Revolutions . 
\"\n1\t2919853\t2919804\tMassachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading .\tState and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal .\n1\t954526\t954607\tHe is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise .\tHe is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard .\n1\t69773\t69792\tCisco pared spending to compensate for sluggish sales .\tIn response to sluggish sales , Cisco pared spending .\n0\t2823575\t2823513\tThe study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said .\tThe study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research .\n1\t2455942\t2455978\tMy decision today is not based on any one event . \"\tGovernor Rowland said his decision was \" not based on any one event . \"\n1\t131979\t131957\tNelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death .\tNelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death .\n0\t2010705\t2010779\t\" The government elements who have been causing trouble are still in place .\tThe government elements who have been causing trouble are still in place , they are attacking us . \"\n1\t54142\t53641\tNext Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\tAround the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .\n1\t1015249\t1015204\tWal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations .\tWal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. 
posted May sales that fell below Wall Street 's modest expectations .\n0\t753928\t753890\tThe patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box .\tThe second vulnerability is a result of IE not implementing a block on a file download dialog box .\n1\t3022833\t3023029\tPeterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying .\tPeterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .\n0\t751520\t751373\tSPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems .\tThe DirectBand network was developed with the assistance of SCA Data Systems .\n0\t218848\t218851\tHe replaces Ron Dittemore , who announced his resignation in April .\tDittemore announced his plans to resign on April 23 .\n1\t3181118\t3181443\tDetectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended .\tShortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development .\n1\t515581\t515752\tThey were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches .\tHe said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches .\n1\t347022\t347003\tTaiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket .\tTaiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April .\n1\t3311600\t3311633\tMr. Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty .\tRowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan .\n0\t3439114\t3439084\tRoss Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue .\tRoss Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal .\n0\t487951\t488007\tThe euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session .\tThe euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session .\n0\t314997\t315030\tOn the stand Wednesday , she said she was referring only to the kissing .\tOn the stand Wednesday , she testified that she was referring to the kissing before the alleged rape .\n0\t4733\t4557\tGarner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader .\tThe group has already met several times and Gen. 
Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader .\n1\t2820371\t2820525\tBlair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union .\tBlair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week .\n1\t801552\t801516\t\" There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , \" Baker said .\t\" There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills .\n1\t1704987\t1705268\tCharles O. Prince , 53 , was named as Mr. Weill 's successor .\tMr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor .\n1\t396041\t396188\tOfficials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\tCanadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world .\n0\t1014983\t1014963\tGE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange .\tGE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .\n1\t2320654\t2320666\tThe Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\tThe Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague .\n1\t1057876\t1057778\tThe hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding .\tThe purpose of the hearing is to determine whether Akbar should be court-martialled .\n0\t2116843\t2116883\tIn the United States , heart attacks kill about 460,000 year , in Canada about 80,000 .\tIn the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health .\n1\t1461629\t1461781\tNinety-five percent of international cargo to the United States is carried by ship .\tShips carry 95 percent of international cargo to the United States .\n0\t374015\t374162\t\" It 's a major victory for Maine , and it 's a major victory for other states .\tThe Maine program could be a model for other states .\n1\t2493369\t2493428\tNews that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street .\tNews that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday .\n1\t490355\t490378\tThey note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher .\tAfter several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery .\n1\t2691044\t2691264\tMost economists had expected a more dire report , with many anticipating the fifth month of job losses in six months .\tMost economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September .\n1\t1831453\t1831491\tBut software license revenues , a measure financial analysts watch closely , 
decreased 21 percent to $ 107.6 million .\tLicense sales , a key measure of demand , fell 21 percent to $ 107.6 million .\n1\t2380695\t2380822\tKing , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters .\tStephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation .\n1\t2577517\t2577531\tThe Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission .\tThe natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC .\n1\t3267026\t3266930\tThe steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned .\tThe U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned .\n1\t360875\t360943\tBusiness Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday .\tBusinessWeek Online has learned that the settlement could come as early as Monday , May 19 .\n1\t162632\t162653\tOnly one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site .\tOnly one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site .\n1\t1128884\t1128865\tShares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 .\tSince the initial takeover offer , Salix shares have risen about 35 percent .\n1\t3264732\t3264648\tThe jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself .\tThe quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself .\n1\t1721433\t1721267\tIt 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season .\tIt 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season .\n0\t146112\t146127\tThe broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 .\tThe technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 .\n1\t389117\t389052\tThe company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States .\tMcDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States .\n1\t872784\t872834\tGregory Parseghian , a former investment banker , was appointed chief executive .\tGreg Parseghian was appointed the new chief executive .\n0\t2977500\t2977547\tTheir contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. 
Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 .\t\" It has outraged the membership , \" said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 .\n1\t3107137\t3107119\tBut plaque volume increased by 2.7 percent in pravastatin patients .\tThe volume of plaque in Pravachol patients ' arteries rose by 3 % .\n1\t1619244\t1619274\tToday in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores .\tTomorrow the book , kept under wraps by G. P. Putnam 's Sons since its inception , will appear in bookstores .\n0\t3061836\t3062031\tThe S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points .\tOn the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points .\n1\t485999\t486011\tEx-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' .\tIn Soviet times the Beatles ' music \" was considered propaganda of an alien ideology .\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/latency_printer.py",
    "content": "latency_list = []\nwith open('latencies.txt', 'r') as f:\n    for line in f:\n        latency_list.append(float(line.rstrip()))\n\nlatency_list = sorted(latency_list)\nl = len(latency_list)\n\nprint(f'p50 latency is {latency_list[int(.5 * l)]} seconds')\nprint(f'p90 latency is {latency_list[int(.9 * l)]} seconds')\nprint(f'p95 latency is {latency_list[int(.95 * l)]} seconds')\nprint(f'p99 latency is {latency_list[int(.99 * l)]} seconds')\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/mrpc.proto",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\nsyntax = \"proto3\";\n\npackage mrpc;\n\nservice mrpc {\n    rpc paraphrase (TextPair) returns (YesNo) {}\n}\n\nmessage TextPair {\n    bytes text_a = 1;\n    bytes text_b = 2;\n}\n\nmessage YesNo {\n    bytes message = 1;\n    bytes prediction = 2;\n}\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/mrpc_feature.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Extract pre-computed feature vectors from BERT.\"\"\"\n\nimport os\nimport csv\nimport time\nimport numpy as np\nimport tokenization\n\n\nclass InputExample(object):\n  \"\"\"A single training/test example for simple sequence classification.\"\"\"\n\n  def __init__(self, guid, text_a, text_b=None, label=None):\n    \"\"\"Constructs a InputExample.\n\n    Args:\n      guid: Unique id for the example.\n      text_a: string. The untokenized text of the first sequence. For single\n        sequence tasks, only this sequence must be specified.\n      text_b: (Optional) string. The untokenized text of the second sequence.\n        Only must be specified for sequence pair tasks.\n      label: (Optional) string. The label of the example. This should be\n        specified for train and dev examples, but not for test examples.\n    \"\"\"\n    self.guid = guid\n    self.text_a = text_a\n    self.text_b = text_b\n    self.label = label\n\n\nclass PaddingInputExample(object):\n  \"\"\"Fake example so the num input examples is a multiple of the batch size.\n\n  When running eval/predict on the TPU, we need to pad the number of examples\n  to be a multiple of the batch size, because the TPU requires a fixed batch\n  size. 
The alternative is to drop the last batch, which is bad because it means\n  the entire output data won't be generated.\n\n  We use this class instead of `None` because treating `None` as padding\n  battches could cause silent errors.\n  \"\"\"\n\n\nclass InputFeatures(object):\n  \"\"\"A single set of features of data.\"\"\"\n\n  def __init__(self,\n               input_ids,\n               input_mask,\n               segment_ids,\n               label_id,\n               is_real_example=True):\n    self.input_ids = input_ids\n    self.input_mask = input_mask\n    self.segment_ids = segment_ids\n    self.label_id = label_id\n    self.is_real_example = is_real_example\n\n\ndef convert_single_example(ex_index, example, label_list, max_seq_length,\n                           tokenizer):\n  \"\"\"Converts a single `InputExample` into a single `InputFeatures`.\"\"\"\n\n  if isinstance(example, PaddingInputExample):\n    return InputFeatures(\n        input_ids=[0] * max_seq_length,\n        input_mask=[0] * max_seq_length,\n        segment_ids=[0] * max_seq_length,\n        label_id=0,\n        is_real_example=False)\n\n  label_map = {}\n  for (i, label) in enumerate(label_list):\n    label_map[label] = i\n\n  tokens_a = tokenizer.tokenize(example.text_a)\n  tokens_b = None\n  if example.text_b:\n    tokens_b = tokenizer.tokenize(example.text_b)\n\n  if tokens_b:\n    # Modifies `tokens_a` and `tokens_b` in place so that the total\n    # length is less than the specified length.\n    # Account for [CLS], [SEP], [SEP] with \"- 3\"\n    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\n  else:\n    # Account for [CLS] and [SEP] with \"- 2\"\n    if len(tokens_a) > max_seq_length - 2:\n      tokens_a = tokens_a[0:(max_seq_length - 2)]\n\n  # The convention in BERT is:\n  # (a) For sequence pairs:\n  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\n  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\n  # (b) For single sequences:\n  #  tokens:   [CLS] the dog is hairy . [SEP]\n  #  type_ids: 0     0   0   0  0     0 0\n  #\n  # Where \"type_ids\" are used to indicate whether this is the first\n  # sequence or the second sequence. The embedding vectors for `type=0` and\n  # `type=1` were learned during pre-training and are added to the wordpiece\n  # embedding vector (and position vector). This is not *strictly* necessary\n  # since the [SEP] token unambiguously separates the sequences, but it makes\n  # it easier for the model to learn the concept of sequences.\n  #\n  # For classification tasks, the first vector (corresponding to [CLS]) is\n  # used as the \"sentence vector\". Note that this only makes sense because\n  # the entire model is fine-tuned.\n  tokens = []\n  segment_ids = []\n  tokens.append(\"[CLS]\")\n  segment_ids.append(0)\n  for token in tokens_a:\n    tokens.append(token)\n    segment_ids.append(0)\n  tokens.append(\"[SEP]\")\n  segment_ids.append(0)\n\n  if tokens_b:\n    for token in tokens_b:\n      tokens.append(token)\n      segment_ids.append(1)\n    tokens.append(\"[SEP]\")\n    segment_ids.append(1)\n\n  input_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n  # The mask has 1 for real tokens and 0 for padding tokens. 
Only real\n  # tokens are attended to.\n  input_mask = [1] * len(input_ids)\n\n  # Zero-pad up to the sequence length.\n  while len(input_ids) < max_seq_length:\n    input_ids.append(0)\n    input_mask.append(0)\n    segment_ids.append(0)\n\n  assert len(input_ids) == max_seq_length\n  assert len(input_mask) == max_seq_length\n  assert len(segment_ids) == max_seq_length\n\n  label_id = label_map[example.label]\n\n  feature = InputFeatures(\n      input_ids=input_ids,\n      input_mask=input_mask,\n      segment_ids=segment_ids,\n      label_id=label_id,\n      is_real_example=True)\n  return feature\n\n\ndef read_tsv(input_file, quotechar=None):\n    \"\"\"Reads a tab separated value file.\"\"\"\n    with open(input_file, \"r\") as f:\n        reader = csv.reader(f, delimiter=\"\\t\", quotechar=quotechar)\n        lines = []\n        for line in reader:\n            lines.append(line)\n        return lines\n\n\ndef create_examples(lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = tokenization.convert_to_unicode(line[3])\n      text_b = tokenization.convert_to_unicode(line[4])\n      if set_type == \"test\":\n        label = \"0\"\n      else:\n        label = tokenization.convert_to_unicode(line[0])\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\ndef _truncate_seq_pair(tokens_a, tokens_b, max_length):\n  \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\n\n  # This is a simple heuristic which will always truncate the longer sequence\n  # one token at a time. This makes more sense than truncating an equal percent\n  # of tokens from each, since if one sequence is very short then each token\n  # that's truncated likely contains more information than a longer sequence.\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_length:\n      break\n    if len(tokens_a) > len(tokens_b):\n      tokens_a.pop()\n    else:\n      tokens_b.pop()\n\n\ndef get_eval_model_feed_dict_list(mrpc_tsv, vocab_txt):\n    tsv = read_tsv(mrpc_tsv)\n    result = create_examples(tsv, \"dev\")\n    model_feed_dict_list = []\n    for example in result:\n        tokenizer = tokenization.FullTokenizer(vocab_file=vocab_txt, do_lower_case=True)\n        label_list = ['0', '1']\n        feature = convert_single_example(ex_index=0, example=example, label_list=label_list,\n                                         max_seq_length=128, tokenizer=tokenizer)\n        pre_model_feed_dict = {\n            'input_ids': feature.input_ids,\n            'input_mask': feature.input_mask,\n            'segment_ids': feature.segment_ids,\n            'label_id': feature.label_id,\n            'is_real_example': feature.is_real_example,\n        }\n        model_feed_dict = {}\n        for key, value in pre_model_feed_dict.items():\n            if key in {'label_id', 'is_real_example'}:\n                value = np.tile(np.int32(value), reps=[1])\n            else:\n                value = np.tile(np.int32(value), reps=[1, 1])\n            model_feed_dict[key] = value\n        model_feed_dict_list.append(model_feed_dict)\n    return model_feed_dict_list\n\n\ndef text_pair_to_model_feed_dict(text_a, text_b, tokenizer):\n    fake_tsv = [['index', '#1 ID', '#2 ID', '#1 String', '#2 String'],\n                ['', '', '', 
text_a, text_b]]\n    result = create_examples(fake_tsv, \"test\")\n    example = result[0]\n    label_list = ['0', '1']\n    feature = convert_single_example(ex_index=0, example=example, label_list=label_list,\n                                     max_seq_length=128, tokenizer=tokenizer)\n    return {\n        'input_ids': np.tile(np.int32(feature.input_ids), reps=[1, 1]),\n        'input_mask': np.tile(np.int32(feature.input_mask), reps=[1, 1]),\n        'segment_ids': np.tile(np.int32(feature.segment_ids), reps=[1, 1]),\n    }\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/mrpc_pb2.py",
    "content": "# -*- coding: utf-8 -*-\n# Generated by the protocol buffer compiler.  DO NOT EDIT!\n# source: mrpc.proto\n\nimport sys\n_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))\nfrom google.protobuf import descriptor as _descriptor\nfrom google.protobuf import message as _message\nfrom google.protobuf import reflection as _reflection\nfrom google.protobuf import symbol_database as _symbol_database\n# @@protoc_insertion_point(imports)\n\n_sym_db = _symbol_database.Default()\n\n\n\n\nDESCRIPTOR = _descriptor.FileDescriptor(\n  name='mrpc.proto',\n  package='mrpc',\n  syntax='proto3',\n  serialized_options=None,\n  serialized_pb=_b('\\n\\nmrpc.proto\\x12\\x04mrpc\\\"*\\n\\x08TextPair\\x12\\x0e\\n\\x06text_a\\x18\\x01 \\x01(\\x0c\\x12\\x0e\\n\\x06text_b\\x18\\x02 \\x01(\\x0c\\\",\\n\\x05YesNo\\x12\\x0f\\n\\x07message\\x18\\x01 \\x01(\\x0c\\x12\\x12\\n\\nprediction\\x18\\x02 \\x01(\\x0c\\x32\\x33\\n\\x04mrpc\\x12+\\n\\nparaphrase\\x12\\x0e.mrpc.TextPair\\x1a\\x0b.mrpc.YesNo\\\"\\x00\\x62\\x06proto3')\n)\n\n\n\n\n_TEXTPAIR = _descriptor.Descriptor(\n  name='TextPair',\n  full_name='mrpc.TextPair',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='text_a', full_name='mrpc.TextPair.text_a', index=0,\n      number=1, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=_b(\"\"),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='text_b', full_name='mrpc.TextPair.text_b', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=_b(\"\"),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=20,\n  serialized_end=62,\n)\n\n\n_YESNO = _descriptor.Descriptor(\n  name='YesNo',\n  full_name='mrpc.YesNo',\n  filename=None,\n  file=DESCRIPTOR,\n  containing_type=None,\n  fields=[\n    _descriptor.FieldDescriptor(\n      name='message', full_name='mrpc.YesNo.message', index=0,\n      number=1, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=_b(\"\"),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n    _descriptor.FieldDescriptor(\n      name='prediction', full_name='mrpc.YesNo.prediction', index=1,\n      number=2, type=12, cpp_type=9, label=1,\n      has_default_value=False, default_value=_b(\"\"),\n      message_type=None, enum_type=None, containing_type=None,\n      is_extension=False, extension_scope=None,\n      serialized_options=None, file=DESCRIPTOR),\n  ],\n  extensions=[\n  ],\n  nested_types=[],\n  enum_types=[\n  ],\n  serialized_options=None,\n  is_extendable=False,\n  syntax='proto3',\n  extension_ranges=[],\n  oneofs=[\n  ],\n  serialized_start=64,\n  serialized_end=108,\n)\n\nDESCRIPTOR.message_types_by_name['TextPair'] = _TEXTPAIR\nDESCRIPTOR.message_types_by_name['YesNo'] = _YESNO\n_sym_db.RegisterFileDescriptor(DESCRIPTOR)\n\nTextPair = _reflection.GeneratedProtocolMessageType('TextPair', (_message.Message,), {\n 
 'DESCRIPTOR' : _TEXTPAIR,\n  '__module__' : 'mrpc_pb2'\n  # @@protoc_insertion_point(class_scope:mrpc.TextPair)\n  })\n_sym_db.RegisterMessage(TextPair)\n\nYesNo = _reflection.GeneratedProtocolMessageType('YesNo', (_message.Message,), {\n  'DESCRIPTOR' : _YESNO,\n  '__module__' : 'mrpc_pb2'\n  # @@protoc_insertion_point(class_scope:mrpc.YesNo)\n  })\n_sym_db.RegisterMessage(YesNo)\n\n\n\n_MRPC = _descriptor.ServiceDescriptor(\n  name='mrpc',\n  full_name='mrpc.mrpc',\n  file=DESCRIPTOR,\n  index=0,\n  serialized_options=None,\n  serialized_start=110,\n  serialized_end=161,\n  methods=[\n  _descriptor.MethodDescriptor(\n    name='paraphrase',\n    full_name='mrpc.mrpc.paraphrase',\n    index=0,\n    containing_service=None,\n    input_type=_TEXTPAIR,\n    output_type=_YESNO,\n    serialized_options=None,\n  ),\n])\n_sym_db.RegisterServiceDescriptor(_MRPC)\n\nDESCRIPTOR.services_by_name['mrpc'] = _MRPC\n\n# @@protoc_insertion_point(module_scope)\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/mrpc_pb2_grpc.py",
    "content": "# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\n# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!\nimport grpc\n\nimport mrpc_pb2 as mrpc__pb2\n\n\nclass mrpcStub(object):\n  # missing associated documentation comment in .proto file\n  pass\n\n  def __init__(self, channel):\n    \"\"\"Constructor.\n\n    Args:\n      channel: A grpc.Channel.\n    \"\"\"\n    self.paraphrase = channel.unary_unary(\n        '/mrpc.mrpc/paraphrase',\n        request_serializer=mrpc__pb2.TextPair.SerializeToString,\n        response_deserializer=mrpc__pb2.YesNo.FromString,\n        )\n\n\nclass mrpcServicer(object):\n  # missing associated documentation comment in .proto file\n  pass\n\n  def paraphrase(self, request, context):\n    # missing associated documentation comment in .proto file\n    pass\n    context.set_code(grpc.StatusCode.UNIMPLEMENTED)\n    context.set_details('Method not implemented!')\n    raise NotImplementedError('Method not implemented!')\n\n\ndef add_mrpcServicer_to_server(servicer, server):\n  rpc_method_handlers = {\n      'paraphrase': grpc.unary_unary_rpc_method_handler(\n          servicer.paraphrase,\n          request_deserializer=mrpc__pb2.TextPair.FromString,\n          response_serializer=mrpc__pb2.YesNo.SerializeToString,\n      ),\n  }\n  generic_handler = grpc.method_handlers_generic_handler(\n      'mrpc.mrpc', rpc_method_handlers)\n  server.add_generic_rpc_handlers((generic_handler,))\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/protoc.sh",
    "content": "#!/bin/bash\n# coding=utf-8\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n    Program to gather information from a system\n\"\"\"\n\npython -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. mrpc.proto\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/setup.py",
    "content": "\nimport setuptools\n\n\nsetuptools.setup(\n    name='bert-demo',\n    version='2019.12.13',\n    description='BERT Client-Server Demo',\n    author='Amazon AWS',\n    author_email='aws-neuron-support@amazon.com',\n    license='BSD',\n    classifiers=[\n        'Development Status :: 1 - Planning',\n        'Intended Audience :: Developers',\n        'Topic :: Scientific/Engineering :: Artificial Intelligence',\n        'License :: OSI Approved :: BSD License',\n        'Programming Language :: Python :: 3',\n        'Programming Language :: Python :: 3.5',\n        'Programming Language :: Python :: 3.6',\n        'Programming Language :: Python :: 3.7',\n    ],\n    keywords='bert',\n    include_package_data=True,\n    packages=setuptools.PEP420PackageFinder.find(),\n    package_data={'': [\n        '*',\n    ]},\n    entry_points={\n        'console_scripts': [\n            'neuron_bert_model=bert_demo.bert_model:main',\n            'bert_server=bert_demo.bert_server:serve',\n            'bert_client=bert_demo.bert_client:client',\n        ],\n    },\n    install_requires=[\n    ],\n)\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/tokenization.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Tokenization classes.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport re\nimport unicodedata\nimport six\n\n\ndef validate_case_matches_checkpoint(do_lower_case, init_checkpoint):\n  \"\"\"Checks whether the casing config is consistent with the checkpoint name.\"\"\"\n\n  # The casing has to be passed in by the user and there is no explicit check\n  # as to whether it matches the checkpoint. The casing information probably\n  # should have been stored in the bert_config.json file, but it's not, so\n  # we have to heuristically detect it to validate.\n\n  if not init_checkpoint:\n    return\n\n  m = re.match(\"^.*?([A-Za-z0-9_-]+)/bert_model.ckpt\", init_checkpoint)\n  if m is None:\n    return\n\n  model_name = m.group(1)\n\n  lower_models = [\n      \"uncased_L-24_H-1024_A-16\", \"uncased_L-12_H-768_A-12\",\n      \"multilingual_L-12_H-768_A-12\", \"chinese_L-12_H-768_A-12\"\n  ]\n\n  cased_models = [\n      \"cased_L-12_H-768_A-12\", \"cased_L-24_H-1024_A-16\",\n      \"multi_cased_L-12_H-768_A-12\"\n  ]\n\n  is_bad_config = False\n  if model_name in lower_models and not do_lower_case:\n    is_bad_config = True\n    actual_flag = \"False\"\n    case_name = \"lowercased\"\n    opposite_flag = \"True\"\n\n  if model_name in cased_models and do_lower_case:\n    is_bad_config = True\n    actual_flag = \"True\"\n    case_name = \"cased\"\n    opposite_flag = \"False\"\n\n  if is_bad_config:\n    raise ValueError(\n        \"You passed in `--do_lower_case=%s` with `--init_checkpoint=%s`. \"\n        \"However, `%s` seems to be a %s model, so you \"\n        \"should pass in `--do_lower_case=%s` so that the fine-tuning matches \"\n        \"how the model was pre-training. 
If this error is wrong, please \"\n        \"just comment out this check.\" % (actual_flag, init_checkpoint,\n                                          model_name, case_name, opposite_flag))\n\n\ndef convert_to_unicode(text):\n  \"\"\"Converts `text` to Unicode (if it's not already), assuming utf-8 input.\"\"\"\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return text.decode(\"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text.decode(\"utf-8\", \"ignore\")\n    elif isinstance(text, unicode):\n      return text\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef printable_text(text):\n  \"\"\"Returns text encoded in a way suitable for print or `tf.logging`.\"\"\"\n\n  # These functions want `str` for both Python2 and Python3, but in one case\n  # it's a Unicode string and in the other it's a byte string.\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return text.decode(\"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, unicode):\n      return text.encode(\"utf-8\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef load_vocab(vocab_file):\n  \"\"\"Loads a vocabulary file into a dictionary.\"\"\"\n  vocab = collections.OrderedDict()\n  index = 0\n  with open(vocab_file, \"r\") as reader:\n    while True:\n      token = convert_to_unicode(reader.readline())\n      if not token:\n        break\n      token = token.strip()\n      vocab[token] = index\n      index += 1\n  return vocab\n\n\ndef convert_by_vocab(vocab, items):\n  \"\"\"Converts a sequence of [tokens|ids] using the vocab.\"\"\"\n  output = []\n  for item in items:\n    output.append(vocab[item])\n  return output\n\n\ndef convert_tokens_to_ids(vocab, tokens):\n  return convert_by_vocab(vocab, tokens)\n\n\ndef convert_ids_to_tokens(inv_vocab, ids):\n  return convert_by_vocab(inv_vocab, ids)\n\n\ndef whitespace_tokenize(text):\n  \"\"\"Runs basic whitespace cleaning and splitting on a piece of text.\"\"\"\n  text = text.strip()\n  if not text:\n    return []\n  tokens = text.split()\n  return tokens\n\n\nclass FullTokenizer(object):\n  \"\"\"Runs end-to-end tokenziation.\"\"\"\n\n  def __init__(self, vocab_file, do_lower_case=True):\n    self.vocab = load_vocab(vocab_file)\n    self.inv_vocab = {v: k for k, v in self.vocab.items()}\n    self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)\n    self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)\n\n  def tokenize(self, text):\n    split_tokens = []\n    for token in self.basic_tokenizer.tokenize(text):\n      for sub_token in self.wordpiece_tokenizer.tokenize(token):\n        split_tokens.append(sub_token)\n\n    return split_tokens\n\n  def convert_tokens_to_ids(self, tokens):\n    return convert_by_vocab(self.vocab, tokens)\n\n  def convert_ids_to_tokens(self, ids):\n    return convert_by_vocab(self.inv_vocab, ids)\n\n\nclass BasicTokenizer(object):\n  \"\"\"Runs basic tokenization (punctuation splitting, lower casing, etc.).\"\"\"\n\n  def 
__init__(self, do_lower_case=True):\n    \"\"\"Constructs a BasicTokenizer.\n\n    Args:\n      do_lower_case: Whether to lower case the input.\n    \"\"\"\n    self.do_lower_case = do_lower_case\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text.\"\"\"\n    text = convert_to_unicode(text)\n    text = self._clean_text(text)\n\n    # This was added on November 1st, 2018 for the multilingual and Chinese\n    # models. This is also applied to the English models now, but it doesn't\n    # matter since the English models were not trained on any Chinese data\n    # and generally don't have any Chinese data in them (there are Chinese\n    # characters in the vocabulary because Wikipedia does have some Chinese\n    # words in the English Wikipedia.).\n    text = self._tokenize_chinese_chars(text)\n\n    orig_tokens = whitespace_tokenize(text)\n    split_tokens = []\n    for token in orig_tokens:\n      if self.do_lower_case:\n        token = token.lower()\n        token = self._run_strip_accents(token)\n      split_tokens.extend(self._run_split_on_punc(token))\n\n    output_tokens = whitespace_tokenize(\" \".join(split_tokens))\n    return output_tokens\n\n  def _run_strip_accents(self, text):\n    \"\"\"Strips accents from a piece of text.\"\"\"\n    text = unicodedata.normalize(\"NFD\", text)\n    output = []\n    for char in text:\n      cat = unicodedata.category(char)\n      if cat == \"Mn\":\n        continue\n      output.append(char)\n    return \"\".join(output)\n\n  def _run_split_on_punc(self, text):\n    \"\"\"Splits punctuation on a piece of text.\"\"\"\n    chars = list(text)\n    i = 0\n    start_new_word = True\n    output = []\n    while i < len(chars):\n      char = chars[i]\n      if _is_punctuation(char):\n        output.append([char])\n        start_new_word = True\n      else:\n        if start_new_word:\n          output.append([])\n        start_new_word = False\n        output[-1].append(char)\n      i += 1\n\n    return [\"\".join(x) for x in output]\n\n  def _tokenize_chinese_chars(self, text):\n    \"\"\"Adds whitespace around any CJK character.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if self._is_chinese_char(cp):\n        output.append(\" \")\n        output.append(char)\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n  def _is_chinese_char(self, cp):\n    \"\"\"Checks whether CP is the codepoint of a CJK character.\"\"\"\n    # This defines a \"chinese character\" as anything in the CJK Unicode block:\n    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)\n    #\n    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,\n    # despite its name. The modern Korean Hangul alphabet is a different block,\n    # as is Japanese Hiragana and Katakana. 
Those alphabets are used to write\n    # space-separated words, so they are not treated specially and handled\n    # like the all of the other languages.\n    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #\n        (cp >= 0x3400 and cp <= 0x4DBF) or  #\n        (cp >= 0x20000 and cp <= 0x2A6DF) or  #\n        (cp >= 0x2A700 and cp <= 0x2B73F) or  #\n        (cp >= 0x2B740 and cp <= 0x2B81F) or  #\n        (cp >= 0x2B820 and cp <= 0x2CEAF) or\n        (cp >= 0xF900 and cp <= 0xFAFF) or  #\n        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #\n      return True\n\n    return False\n\n  def _clean_text(self, text):\n    \"\"\"Performs invalid character removal and whitespace cleanup on text.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if cp == 0 or cp == 0xfffd or _is_control(char):\n        continue\n      if _is_whitespace(char):\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n\nclass WordpieceTokenizer(object):\n  \"\"\"Runs WordPiece tokenziation.\"\"\"\n\n  def __init__(self, vocab, unk_token=\"[UNK]\", max_input_chars_per_word=200):\n    self.vocab = vocab\n    self.unk_token = unk_token\n    self.max_input_chars_per_word = max_input_chars_per_word\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text into its word pieces.\n\n    This uses a greedy longest-match-first algorithm to perform tokenization\n    using the given vocabulary.\n\n    For example:\n      input = \"unaffable\"\n      output = [\"un\", \"##aff\", \"##able\"]\n\n    Args:\n      text: A single token or whitespace separated tokens. This should have\n        already been passed through `BasicTokenizer.\n\n    Returns:\n      A list of wordpiece tokens.\n    \"\"\"\n\n    text = convert_to_unicode(text)\n\n    output_tokens = []\n    for token in whitespace_tokenize(text):\n      chars = list(token)\n      if len(chars) > self.max_input_chars_per_word:\n        output_tokens.append(self.unk_token)\n        continue\n\n      is_bad = False\n      start = 0\n      sub_tokens = []\n      while start < len(chars):\n        end = len(chars)\n        cur_substr = None\n        while start < end:\n          substr = \"\".join(chars[start:end])\n          if start > 0:\n            substr = \"##\" + substr\n          if substr in self.vocab:\n            cur_substr = substr\n            break\n          end -= 1\n        if cur_substr is None:\n          is_bad = True\n          break\n        sub_tokens.append(cur_substr)\n        start = end\n\n      if is_bad:\n        output_tokens.append(self.unk_token)\n      else:\n        output_tokens.extend(sub_tokens)\n    return output_tokens\n\n\ndef _is_whitespace(char):\n  \"\"\"Checks whether `chars` is a whitespace character.\"\"\"\n  # \\t, \\n, and \\r are technically contorl characters but we treat them\n  # as whitespace since they are generally considered as such.\n  if char == \" \" or char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return True\n  cat = unicodedata.category(char)\n  if cat == \"Zs\":\n    return True\n  return False\n\n\ndef _is_control(char):\n  \"\"\"Checks whether `chars` is a control character.\"\"\"\n  # These are technically control characters but we count them as whitespace\n  # characters.\n  if char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return False\n  cat = unicodedata.category(char)\n  if cat in (\"Cc\", \"Cf\"):\n    return True\n  return False\n\n\ndef _is_punctuation(char):\n  \"\"\"Checks whether `chars` is a punctuation 
character.\"\"\"\n  cp = ord(char)\n  # We treat all non-letter/number ASCII as punctuation.\n  # Characters such as \"^\", \"$\", and \"`\" are not in the Unicode\n  # Punctuation class but we treat them as punctuation anyways, for\n  # consistency.\n  if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or\n      (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):\n    return True\n  cat = unicodedata.category(char)\n  if cat.startswith(\"P\"):\n    return True\n  return False\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/tune_save.sh",
    "content": "#!/bin/bash\n\npushd $BERT_REPO_DIR\n\npython run_classifier.py \\\n  --task_name=MRPC \\\n  --do_train=true \\\n  --do_eval=true \\\n  --do_predict=true \\\n  --data_dir=$GLUE_DIR/MRPC \\\n  --vocab_file=$BERT_BASE_DIR/vocab.txt \\\n  --bert_config_file=$BERT_BASE_DIR/bert_config.json \\\n  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \\\n  --max_seq_length=128 \\\n  --train_batch_size=32 \\\n  --learning_rate=2e-5 \\\n  --num_train_epochs=3.0 \\\n  --output_dir=$BERT_REPO_DIR/MRPC_finetune\n\npython run_classifier.py \\\n  --task_name=MRPC \\\n  --do_predict=true \\\n  --data_dir=$GLUE_DIR/MRPC \\\n  --vocab_file=$BERT_BASE_DIR/vocab.txt \\\n  --bert_config_file=$BERT_BASE_DIR/bert_config.json \\\n  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \\\n  --max_seq_length=128 \\\n  --output_dir=$BERT_REPO_DIR/MRPC_finetune\n\npopd\n\n"
  },
  {
    "path": "src/examples/tensorflow/bert_demo/uncased_L-24_H-1024_A-16.vocab.txt",
    "content": "[PAD]\n[unused0]\n[unused1]\n[unused2]\n[unused3]\n[unused4]\n[unused5]\n[unused6]\n[unused7]\n[unused8]\n[unused9]\n[unused10]\n[unused11]\n[unused12]\n[unused13]\n[unused14]\n[unused15]\n[unused16]\n[unused17]\n[unused18]\n[unused19]\n[unused20]\n[unused21]\n[unused22]\n[unused23]\n[unused24]\n[unused25]\n[unused26]\n[unused27]\n[unused28]\n[unused29]\n[unused30]\n[unused31]\n[unused32]\n[unused33]\n[unused34]\n[unused35]\n[unused36]\n[unused37]\n[unused38]\n[unused39]\n[unused40]\n[unused41]\n[unused42]\n[unused43]\n[unused44]\n[unused45]\n[unused46]\n[unused47]\n[unused48]\n[unused49]\n[unused50]\n[unused51]\n[unused52]\n[unused53]\n[unused54]\n[unused55]\n[unused56]\n[unused57]\n[unused58]\n[unused59]\n[unused60]\n[unused61]\n[unused62]\n[unused63]\n[unused64]\n[unused65]\n[unused66]\n[unused67]\n[unused68]\n[unused69]\n[unused70]\n[unused71]\n[unused72]\n[unused73]\n[unused74]\n[unused75]\n[unused76]\n[unused77]\n[unused78]\n[unused79]\n[unused80]\n[unused81]\n[unused82]\n[unused83]\n[unused84]\n[unused85]\n[unused86]\n[unused87]\n[unused88]\n[unused89]\n[unused90]\n[unused91]\n[unused92]\n[unused93]\n[unused94]\n[unused95]\n[unused96]\n[unused97]\n[unused98]\n[UNK]\n[CLS]\n[SEP]\n[MASK]\n[unused99]\n[unused100]\n[unused101]\n[unused102]\n[unused103]\n[unused104]\n[unused105]\n[unused106]\n[unused107]\n[unused108]\n[unused109]\n[unused110]\n[unused111]\n[unused112]\n[unused113]\n[unused114]\n[unused115]\n[unused116]\n[unused117]\n[unused118]\n[unused119]\n[unused120]\n[unused121]\n[unused122]\n[unused123]\n[unused124]\n[unused125]\n[unused126]\n[unused127]\n[unused128]\n[unused129]\n[unused130]\n[unused131]\n[unused132]\n[unused133]\n[unused134]\n[unused135]\n[unused136]\n[unused137]\n[unused138]\n[unused139]\n[unused140]\n[unused141]\n[unused142]\n[unused143]\n[unused144]\n[unused145]\n[unused146]\n[unused147]\n[unused148]\n[unused149]\n[unused150]\n[unused151]\n[unused152]\n[unused153]\n[unused154]\n[unused155]\n[unused156]\n[unused157]\n[unused158]\n[unused159]\n[unused160]\n[unused161]\n[unused162]\n[unused163]\n[unused164]\n[unused165]\n[unused166]\n[unused167]\n[unused168]\n[unused169]\n[unused170]\n[unused171]\n[unused172]\n[unused173]\n[unused174]\n[unused175]\n[unused176]\n[unused177]\n[unused178]\n[unused179]\n[unused180]\n[unused181]\n[unused182]\n[unused183]\n[unused184]\n[unused185]\n[unused186]\n[unused187]\n[unused188]\n[unused189]\n[unused190]\n[unused191]\n[unused192]\n[unused193]\n[unused194]\n[unused195]\n[unused196]\n[unused197]\n[unused198]\n[unused199]\n[unused200]\n[unused201]\n[unused202]\n[unused203]\n[unused204]\n[unused205]\n[unused206]\n[unused207]\n[unused208]\n[unused209]\n[unused210]\n[unused211]\n[unused212]\n[unused213]\n[unused214]\n[unused215]\n[unused216]\n[unused217]\n[unused218]\n[unused219]\n[unused220]\n[unused221]\n[unused222]\n[unused223]\n[unused224]\n[unused225]\n[unused226]\n[unused227]\n[unused228]\n[unused229]\n[unused230]\n[unused231]\n[unused232]\n[unused233]\n[unused234]\n[unused235]\n[unused236]\n[unused237]\n[unused238]\n[unused239]\n[unused240]\n[unused241]\n[unused242]\n[unused243]\n[unused244]\n[unused245]\n[unused246]\n[unused247]\n[unused248]\n[unused249]\n[unused250]\n[unused251]\n[unused252]\n[unused253]\n[unused254]\n[unused255]\n[unused256]\n[unused257]\n[unused258]\n[unused259]\n[unused260]\n[unused261]\n[unused262]\n[unused263]\n[unused264]\n[unused265]\n[unused266]\n[unused267]\n[unused268]\n[unused269]\n[unused270]\n[unused271]\n[unused272]\n[unused273]\n[unused274]\n[unused275]\n[unused276]\n[unused277]\
n[unused278]\n[unused279]\n[unused280]\n[unused281]\n[unused282]\n[unused283]\n[unused284]\n[unused285]\n[unused286]\n[unused287]\n[unused288]\n[unused289]\n[unused290]\n[unused291]\n[unused292]\n[unused293]\n[unused294]\n[unused295]\n[unused296]\n[unused297]\n[unused298]\n[unused299]\n[unused300]\n[unused301]\n[unused302]\n[unused303]\n[unused304]\n[unused305]\n[unused306]\n[unused307]\n[unused308]\n[unused309]\n[unused310]\n[unused311]\n[unused312]\n[unused313]\n[unused314]\n[unused315]\n[unused316]\n[unused317]\n[unused318]\n[unused319]\n[unused320]\n[unused321]\n[unused322]\n[unused323]\n[unused324]\n[unused325]\n[unused326]\n[unused327]\n[unused328]\n[unused329]\n[unused330]\n[unused331]\n[unused332]\n[unused333]\n[unused334]\n[unused335]\n[unused336]\n[unused337]\n[unused338]\n[unused339]\n[unused340]\n[unused341]\n[unused342]\n[unused343]\n[unused344]\n[unused345]\n[unused346]\n[unused347]\n[unused348]\n[unused349]\n[unused350]\n[unused351]\n[unused352]\n[unused353]\n[unused354]\n[unused355]\n[unused356]\n[unused357]\n[unused358]\n[unused359]\n[unused360]\n[unused361]\n[unused362]\n[unused363]\n[unused364]\n[unused365]\n[unused366]\n[unused367]\n[unused368]\n[unused369]\n[unused370]\n[unused371]\n[unused372]\n[unused373]\n[unused374]\n[unused375]\n[unused376]\n[unused377]\n[unused378]\n[unused379]\n[unused380]\n[unused381]\n[unused382]\n[unused383]\n[unused384]\n[unused385]\n[unused386]\n[unused387]\n[unused388]\n[unused389]\n[unused390]\n[unused391]\n[unused392]\n[unused393]\n[unused394]\n[unused395]\n[unused396]\n[unused397]\n[unused398]\n[unused399]\n[unused400]\n[unused401]\n[unused402]\n[unused403]\n[unused404]\n[unused405]\n[unused406]\n[unused407]\n[unused408]\n[unused409]\n[unused410]\n[unused411]\n[unused412]\n[unused413]\n[unused414]\n[unused415]\n[unused416]\n[unused417]\n[unused418]\n[unused419]\n[unused420]\n[unused421]\n[unused422]\n[unused423]\n[unused424]\n[unused425]\n[unused426]\n[unused427]\n[unused428]\n[unused429]\n[unused430]\n[unused431]\n[unused432]\n[unused433]\n[unused434]\n[unused435]\n[unused436]\n[unused437]\n[unused438]\n[unused439]\n[unused440]\n[unused441]\n[unused442]\n[unused443]\n[unused444]\n[unused445]\n[unused446]\n[unused447]\n[unused448]\n[unused449]\n[unused450]\n[unused451]\n[unused452]\n[unused453]\n[unused454]\n[unused455]\n[unused456]\n[unused457]\n[unused458]\n[unused459]\n[unused460]\n[unused461]\n[unused462]\n[unused463]\n[unused464]\n[unused465]\n[unused466]\n[unused467]\n[unused468]\n[unused469]\n[unused470]\n[unused471]\n[unused472]\n[unused473]\n[unused474]\n[unused475]\n[unused476]\n[unused477]\n[unused478]\n[unused479]\n[unused480]\n[unused481]\n[unused482]\n[unused483]\n[unused484]\n[unused485]\n[unused486]\n[unused487]\n[unused488]\n[unused489]\n[unused490]\n[unused491]\n[unused492]\n[unused493]\n[unused494]\n[unused495]\n[unused496]\n[unused497]\n[unused498]\n[unused499]\n[unused500]\n[unused501]\n[unused502]\n[unused503]\n[unused504]\n[unused505]\n[unused506]\n[unused507]\n[unused508]\n[unused509]\n[unused510]\n[unused511]\n[unused512]\n[unused513]\n[unused514]\n[unused515]\n[unused516]\n[unused517]\n[unused518]\n[unused519]\n[unused520]\n[unused521]\n[unused522]\n[unused523]\n[unused524]\n[unused525]\n[unused526]\n[unused527]\n[unused528]\n[unused529]\n[unused530]\n[unused531]\n[unused532]\n[unused533]\n[unused534]\n[unused535]\n[unused536]\n[unused537]\n[unused538]\n[unused539]\n[unused540]\n[unused541]\n[unused542]\n[unused543]\n[unused544]\n[unused545]\n[unused546]\n[unused547]\n[unused548]\n[unused549]\n[unused550]\n[unus
ed551]\n[unused552]\n[unused553]\n[unused554]\n[unused555]\n[unused556]\n[unused557]\n[unused558]\n[unused559]\n[unused560]\n[unused561]\n[unused562]\n[unused563]\n[unused564]\n[unused565]\n[unused566]\n[unused567]\n[unused568]\n[unused569]\n[unused570]\n[unused571]\n[unused572]\n[unused573]\n[unused574]\n[unused575]\n[unused576]\n[unused577]\n[unused578]\n[unused579]\n[unused580]\n[unused581]\n[unused582]\n[unused583]\n[unused584]\n[unused585]\n[unused586]\n[unused587]\n[unused588]\n[unused589]\n[unused590]\n[unused591]\n[unused592]\n[unused593]\n[unused594]\n[unused595]\n[unused596]\n[unused597]\n[unused598]\n[unused599]\n[unused600]\n[unused601]\n[unused602]\n[unused603]\n[unused604]\n[unused605]\n[unused606]\n[unused607]\n[unused608]\n[unused609]\n[unused610]\n[unused611]\n[unused612]\n[unused613]\n[unused614]\n[unused615]\n[unused616]\n[unused617]\n[unused618]\n[unused619]\n[unused620]\n[unused621]\n[unused622]\n[unused623]\n[unused624]\n[unused625]\n[unused626]\n[unused627]\n[unused628]\n[unused629]\n[unused630]\n[unused631]\n[unused632]\n[unused633]\n[unused634]\n[unused635]\n[unused636]\n[unused637]\n[unused638]\n[unused639]\n[unused640]\n[unused641]\n[unused642]\n[unused643]\n[unused644]\n[unused645]\n[unused646]\n[unused647]\n[unused648]\n[unused649]\n[unused650]\n[unused651]\n[unused652]\n[unused653]\n[unused654]\n[unused655]\n[unused656]\n[unused657]\n[unused658]\n[unused659]\n[unused660]\n[unused661]\n[unused662]\n[unused663]\n[unused664]\n[unused665]\n[unused666]\n[unused667]\n[unused668]\n[unused669]\n[unused670]\n[unused671]\n[unused672]\n[unused673]\n[unused674]\n[unused675]\n[unused676]\n[unused677]\n[unused678]\n[unused679]\n[unused680]\n[unused681]\n[unused682]\n[unused683]\n[unused684]\n[unused685]\n[unused686]\n[unused687]\n[unused688]\n[unused689]\n[unused690]\n[unused691]\n[unused692]\n[unused693]\n[unused694]\n[unused695]\n[unused696]\n[unused697]\n[unused698]\n[unused699]\n[unused700]\n[unused701]\n[unused702]\n[unused703]\n[unused704]\n[unused705]\n[unused706]\n[unused707]\n[unused708]\n[unused709]\n[unused710]\n[unused711]\n[unused712]\n[unused713]\n[unused714]\n[unused715]\n[unused716]\n[unused717]\n[unused718]\n[unused719]\n[unused720]\n[unused721]\n[unused722]\n[unused723]\n[unused724]\n[unused725]\n[unused726]\n[unused727]\n[unused728]\n[unused729]\n[unused730]\n[unused731]\n[unused732]\n[unused733]\n[unused734]\n[unused735]\n[unused736]\n[unused737]\n[unused738]\n[unused739]\n[unused740]\n[unused741]\n[unused742]\n[unused743]\n[unused744]\n[unused745]\n[unused746]\n[unused747]\n[unused748]\n[unused749]\n[unused750]\n[unused751]\n[unused752]\n[unused753]\n[unused754]\n[unused755]\n[unused756]\n[unused757]\n[unused758]\n[unused759]\n[unused760]\n[unused761]\n[unused762]\n[unused763]\n[unused764]\n[unused765]\n[unused766]\n[unused767]\n[unused768]\n[unused769]\n[unused770]\n[unused771]\n[unused772]\n[unused773]\n[unused774]\n[unused775]\n[unused776]\n[unused777]\n[unused778]\n[unused779]\n[unused780]\n[unused781]\n[unused782]\n[unused783]\n[unused784]\n[unused785]\n[unused786]\n[unused787]\n[unused788]\n[unused789]\n[unused790]\n[unused791]\n[unused792]\n[unused793]\n[unused794]\n[unused795]\n[unused796]\n[unused797]\n[unused798]\n[unused799]\n[unused800]\n[unused801]\n[unused802]\n[unused803]\n[unused804]\n[unused805]\n[unused806]\n[unused807]\n[unused808]\n[unused809]\n[unused810]\n[unused811]\n[unused812]\n[unused813]\n[unused814]\n[unused815]\n[unused816]\n[unused817]\n[unused818]\n[unused819]\n[unused820]\n[unused821]\n[unused822]\n[unused823]\n[unused824]
\n[unused825]\n[unused826]\n[unused827]\n[unused828]\n[unused829]\n[unused830]\n[unused831]\n[unused832]\n[unused833]\n[unused834]\n[unused835]\n[unused836]\n[unused837]\n[unused838]\n[unused839]\n[unused840]\n[unused841]\n[unused842]\n[unused843]\n[unused844]\n[unused845]\n[unused846]\n[unused847]\n[unused848]\n[unused849]\n[unused850]\n[unused851]\n[unused852]\n[unused853]\n[unused854]\n[unused855]\n[unused856]\n[unused857]\n[unused858]\n[unused859]\n[unused860]\n[unused861]\n[unused862]\n[unused863]\n[unused864]\n[unused865]\n[unused866]\n[unused867]\n[unused868]\n[unused869]\n[unused870]\n[unused871]\n[unused872]\n[unused873]\n[unused874]\n[unused875]\n[unused876]\n[unused877]\n[unused878]\n[unused879]\n[unused880]\n[unused881]\n[unused882]\n[unused883]\n[unused884]\n[unused885]\n[unused886]\n[unused887]\n[unused888]\n[unused889]\n[unused890]\n[unused891]\n[unused892]\n[unused893]\n[unused894]\n[unused895]\n[unused896]\n[unused897]\n[unused898]\n[unused899]\n[unused900]\n[unused901]\n[unused902]\n[unused903]\n[unused904]\n[unused905]\n[unused906]\n[unused907]\n[unused908]\n[unused909]\n[unused910]\n[unused911]\n[unused912]\n[unused913]\n[unused914]\n[unused915]\n[unused916]\n[unused917]\n[unused918]\n[unused919]\n[unused920]\n[unused921]\n[unused922]\n[unused923]\n[unused924]\n[unused925]\n[unused926]\n[unused927]\n[unused928]\n[unused929]\n[unused930]\n[unused931]\n[unused932]\n[unused933]\n[unused934]\n[unused935]\n[unused936]\n[unused937]\n[unused938]\n[unused939]\n[unused940]\n[unused941]\n[unused942]\n[unused943]\n[unused944]\n[unused945]\n[unused946]\n[unused947]\n[unused948]\n[unused949]\n[unused950]\n[unused951]\n[unused952]\n[unused953]\n[unused954]\n[unused955]\n[unused956]\n[unused957]\n[unused958]\n[unused959]\n[unused960]\n[unused961]\n[unused962]\n[unused963]\n[unused964]\n[unused965]\n[unused966]\n[unused967]\n[unused968]\n[unused969]\n[unused970]\n[unused971]\n[unused972]\n[unused973]\n[unused974]\n[unused975]\n[unused976]\n[unused977]\n[unused978]\n[unused979]\n[unused980]\n[unused981]\n[unused982]\n[unused983]\n[unused984]\n[unused985]\n[unused986]\n[unused987]\n[unused988]\n[unused989]\n[unused990]\n[unused991]\n[unused992]\n[unused993]\n!\n\"\n#\n$\n%\n&\n'\n(\n)\n*\n+\n,\n-\n.\n/\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n:\n;\n<\n=\n>\n?\n@\n[\n\\\n]\n^\n_\n`\na\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\np\nq\nr\ns\nt\nu\nv\nw\nx\ny\nz\n{\n|\n}\n~\n¡\n¢\n£\n¤\n¥\n¦\n§\n¨\n©\nª\n«\n¬\n®\n°\n±\n²\n³\n´\nµ\n¶\n·\n¹\nº\n»\n¼\n½\n¾\n¿\n×\nß\næ\nð\n÷\nø\nþ\nđ\nħ\nı\nł\nŋ\nœ\nƒ\nɐ\nɑ\nɒ\nɔ\nɕ\nə\nɛ\nɡ\nɣ\nɨ\nɪ\nɫ\nɬ\nɯ\nɲ\nɴ\nɹ\nɾ\nʀ\nʁ\nʂ\nʃ\nʉ\nʊ\nʋ\nʌ\nʎ\nʐ\nʑ\nʒ\nʔ\nʰ\nʲ\nʳ\nʷ\nʸ\nʻ\nʼ\nʾ\nʿ\nˈ\nː\nˡ\nˢ\nˣ\nˤ\nα\nβ\nγ\nδ\nε\nζ\nη\nθ\nι\nκ\nλ\nμ\nν\nξ\nο\nπ\nρ\nς\nσ\nτ\nυ\nφ\nχ\nψ\nω\nа\nб\nв\nг\nд\nе\nж\nз\nи\nк\nл\nм\nн\nо\nп\nр\nс\nт\nу\nф\nх\nц\nч\nш\nщ\nъ\nы\nь\nэ\nю\nя\nђ\nє\nі\nј\nљ\nњ\nћ\nӏ\nա\nբ\nգ\nդ\nե\nթ\nի\nլ\nկ\nհ\nմ\nյ\nն\nո\nպ\nս\nվ\nտ\nր\nւ\nք\n־\nא\nב\nג\nד\nה\nו\nז\nח\nט\nי\nך\nכ\nל\nם\nמ\nן\nנ\nס\nע\nף\nפ\nץ\nצ\nק\nר\nש\nת\n،\nء\nا\nب\nة\nت\nث\nج\nح\nخ\nد\nذ\nر\nز\nس\nش\nص\nض\nط\nظ\nع\nغ\nـ\nف\nق\nك\nل\nم\nن\nه\nو\nى\nي\nٹ\nپ\nچ\nک\nگ\nں\nھ\nہ\nی\nے\nअ\nआ\nउ\nए\nक\nख\nग\nच\nज\nट\nड\nण\nत\nथ\nद\nध\nन\nप\nब\nभ\nम\nय\nर\nल\nव\nश\nष\nस\nह\nा\nि\nी\nो\n।\n॥\nং\nঅ\nআ\nই\nউ\nএ\nও\nক\nখ\nগ\nচ\nছ\nজ\nট\nড\nণ\nত\nথ\nদ\nধ\nন\nপ\nব\nভ\nম\nয\nর\nল\nশ\nষ\nস\nহ\nা\nি\nী\nে\nக\nச\nட\nத\nந\nன\nப\nம\nய\nர\nல\nள\nவ\nா\nி\nு\nே\nை\nನ\nರ\nಾ\nක\nය\nර\nල\nව\nා\nก\nง\nต\nท\nน\nพ\nม\nย\nร\nล\nว\nส\nอ\nา\nเ\n་\n།\nག\nང\nད\nན\nཔ\nབ\nམ\nའ\nར\nལ\nས\nမ\nა\nბ\nგ\nდ\nე\nვ\nთ\nი\nკ\nლ\nმ\nნ\nო
\nრ\nს\nტ\nუ\nᄀ\nᄂ\nᄃ\nᄅ\nᄆ\nᄇ\nᄉ\nᄊ\nᄋ\nᄌ\nᄎ\nᄏ\nᄐ\nᄑ\nᄒ\nᅡ\nᅢ\nᅥ\nᅦ\nᅧ\nᅩ\nᅪ\nᅭ\nᅮ\nᅯ\nᅲ\nᅳ\nᅴ\nᅵ\nᆨ\nᆫ\nᆯ\nᆷ\nᆸ\nᆼ\nᴬ\nᴮ\nᴰ\nᴵ\nᴺ\nᵀ\nᵃ\nᵇ\nᵈ\nᵉ\nᵍ\nᵏ\nᵐ\nᵒ\nᵖ\nᵗ\nᵘ\nᵢ\nᵣ\nᵤ\nᵥ\nᶜ\nᶠ\n‐\n‑\n‒\n–\n—\n―\n‖\n‘\n’\n‚\n“\n”\n„\n†\n‡\n•\n…\n‰\n′\n″\n›\n‿\n⁄\n⁰\nⁱ\n⁴\n⁵\n⁶\n⁷\n⁸\n⁹\n⁺\n⁻\nⁿ\n₀\n₁\n₂\n₃\n₄\n₅\n₆\n₇\n₈\n₉\n₊\n₍\n₎\nₐ\nₑ\nₒ\nₓ\nₕ\nₖ\nₗ\nₘ\nₙ\nₚ\nₛ\nₜ\n₤\n₩\n€\n₱\n₹\nℓ\n№\nℝ\n™\n⅓\n⅔\n←\n↑\n→\n↓\n↔\n↦\n⇄\n⇌\n⇒\n∂\n∅\n∆\n∇\n∈\n−\n∗\n∘\n√\n∞\n∧\n∨\n∩\n∪\n≈\n≡\n≤\n≥\n⊂\n⊆\n⊕\n⊗\n⋅\n─\n│\n■\n▪\n●\n★\n☆\n☉\n♠\n♣\n♥\n♦\n♭\n♯\n⟨\n⟩\nⱼ\n⺩\n⺼\n⽥\n、\n。\n〈\n〉\n《\n》\n「\n」\n『\n』\n〜\nあ\nい\nう\nえ\nお\nか\nき\nく\nけ\nこ\nさ\nし\nす\nせ\nそ\nた\nち\nっ\nつ\nて\nと\nな\nに\nぬ\nね\nの\nは\nひ\nふ\nへ\nほ\nま\nみ\nむ\nめ\nも\nや\nゆ\nよ\nら\nり\nる\nれ\nろ\nを\nん\nァ\nア\nィ\nイ\nウ\nェ\nエ\nオ\nカ\nキ\nク\nケ\nコ\nサ\nシ\nス\nセ\nタ\nチ\nッ\nツ\nテ\nト\nナ\nニ\nノ\nハ\nヒ\nフ\nヘ\nホ\nマ\nミ\nム\nメ\nモ\nャ\nュ\nョ\nラ\nリ\nル\nレ\nロ\nワ\nン\n・\nー\n一\n三\n上\n下\n不\n世\n中\n主\n久\n之\n也\n事\n二\n五\n井\n京\n人\n亻\n仁\n介\n代\n仮\n伊\n会\n佐\n侍\n保\n信\n健\n元\n光\n八\n公\n内\n出\n分\n前\n劉\n力\n加\n勝\n北\n区\n十\n千\n南\n博\n原\n口\n古\n史\n司\n合\n吉\n同\n名\n和\n囗\n四\n国\n國\n土\n地\n坂\n城\n堂\n場\n士\n夏\n外\n大\n天\n太\n夫\n奈\n女\n子\n学\n宀\n宇\n安\n宗\n定\n宣\n宮\n家\n宿\n寺\n將\n小\n尚\n山\n岡\n島\n崎\n川\n州\n巿\n帝\n平\n年\n幸\n广\n弘\n張\n彳\n後\n御\n德\n心\n忄\n志\n忠\n愛\n成\n我\n戦\n戸\n手\n扌\n政\n文\n新\n方\n日\n明\n星\n春\n昭\n智\n曲\n書\n月\n有\n朝\n木\n本\n李\n村\n東\n松\n林\n森\n楊\n樹\n橋\n歌\n止\n正\n武\n比\n氏\n民\n水\n氵\n氷\n永\n江\n沢\n河\n治\n法\n海\n清\n漢\n瀬\n火\n版\n犬\n王\n生\n田\n男\n疒\n発\n白\n的\n皇\n目\n相\n省\n真\n石\n示\n社\n神\n福\n禾\n秀\n秋\n空\n立\n章\n竹\n糹\n美\n義\n耳\n良\n艹\n花\n英\n華\n葉\n藤\n行\n街\n西\n見\n訁\n語\n谷\n貝\n貴\n車\n軍\n辶\n道\n郎\n郡\n部\n都\n里\n野\n金\n鈴\n镇\n長\n門\n間\n阝\n阿\n陳\n陽\n雄\n青\n面\n風\n食\n香\n馬\n高\n龍\n龸\nﬁ\nﬂ\n！\n（\n）\n，\n－\n．\n／\n：\n？\n～\nthe\nof\nand\nin\nto\nwas\nhe\nis\nas\nfor\non\nwith\nthat\nit\nhis\nby\nat\nfrom\nher\n##s\nshe\nyou\nhad\nan\nwere\nbut\nbe\nthis\nare\nnot\nmy\nthey\none\nwhich\nor\nhave\nhim\nme\nfirst\nall\nalso\ntheir\nhas\nup\nwho\nout\nbeen\nwhen\nafter\nthere\ninto\nnew\ntwo\nits\n##a\ntime\nwould\nno\nwhat\nabout\nsaid\nwe\nover\nthen\nother\nso\nmore\n##e\ncan\nif\nlike\nback\nthem\nonly\nsome\ncould\n##i\nwhere\njust\n##ing\nduring\nbefore\n##n\ndo\n##o\nmade\nschool\nthrough\nthan\nnow\nyears\nmost\nworld\nmay\nbetween\ndown\nwell\nthree\n##d\nyear\nwhile\nwill\n##ed\n##r\n##y\nlater\n##t\ncity\nunder\naround\ndid\nsuch\nbeing\nused\nstate\npeople\npart\nknow\nagainst\nyour\nmany\nsecond\nuniversity\nboth\nnational\n##er\nthese\ndon\nknown\noff\nway\nuntil\nre\nhow\neven\nget\nhead\n...\ndidn\n##ly\nteam\namerican\nbecause\nde\n##l\nborn\nunited\nfilm\nsince\nstill\nlong\nwork\nsouth\nus\nbecame\nany\nhigh\nagain\nday\nfamily\nsee\nright\nman\neyes\nhouse\nseason\nwar\nstates\nincluding\ntook\nlife\nnorth\nsame\neach\ncalled\nname\nmuch\nplace\nhowever\ngo\nfour\ngroup\nanother\nfound\nwon\narea\nhere\ngoing\n10\naway\nseries\nleft\nhome\nmusic\nbest\nmake\nhand\nnumber\ncompany\nseveral\nnever\nlast\njohn\n000\nvery\nalbum\ntake\nend\ngood\ntoo\nfollowing\nreleased\ngame\nplayed\nlittle\nbegan\ndistrict\n##m\nold\nwant\nthose\nside\nheld\nown\nearly\ncounty\nll\nleague\nuse\nwest\n##u\nface\nthink\n##es\n2010\ngovernment\n##h\nmarch\ncame\nsmall\ngeneral\ntown\njune\n##on\nline\nbased\nsomething\n##k\nseptember\nthought\nlooked\nalong\ninternational\n2011\nair\njuly\nclub\nwent\njanuary\noctober\nour\naugust\napril\nyork\n12\nfew\n2012\n2008\neast\nshow\nmember\ncollege\n2009\nfather\npublic\n##us\ncome\nmen\nfive\nset\nstation\nchurch\n##c\nnext\nformer\nnovember\nroom\nparty\nlocated\ndecember\n2013\nage\ngot\n2007\n##g\nsystem\nlet\nlove\n2006\nthough\nevery\n2014\nlook\nsong\nwater\n
century\nwithout\nbody\nblack\nnight\nwithin\ngreat\nwomen\nsingle\nve\nbuilding\nlarge\npopulation\nriver\nnamed\nband\nwhite\nstarted\n##an\nonce\n15\n20\nshould\n18\n2015\nservice\ntop\nbuilt\nbritish\nopen\ndeath\nking\nmoved\nlocal\ntimes\nchildren\nfebruary\nbook\nwhy\n11\ndoor\nneed\npresident\norder\nfinal\nroad\nwasn\nalthough\ndue\nmajor\ndied\nvillage\nthird\nknew\n2016\nasked\nturned\nst\nwanted\nsay\n##p\ntogether\nreceived\nmain\nson\nserved\ndifferent\n##en\nbehind\nhimself\nfelt\nmembers\npower\nfootball\nlaw\nvoice\nplay\n##in\nnear\npark\nhistory\n30\nhaving\n2005\n16\n##man\nsaw\nmother\n##al\narmy\npoint\nfront\nhelp\nenglish\nstreet\nart\nlate\nhands\ngames\naward\n##ia\nyoung\n14\nput\npublished\ncountry\ndivision\nacross\ntold\n13\noften\never\nfrench\nlondon\ncenter\nsix\nred\n2017\nled\ndays\ninclude\nlight\n25\nfind\ntell\namong\nspecies\nreally\naccording\ncentral\nhalf\n2004\nform\noriginal\ngave\noffice\nmaking\nenough\nlost\nfull\nopened\nmust\nincluded\nlive\ngiven\ngerman\nplayer\nrun\nbusiness\nwoman\ncommunity\ncup\nmight\nmillion\nland\n2000\ncourt\ndevelopment\n17\nshort\nround\nii\nkm\nseen\nclass\nstory\nalways\nbecome\nsure\nresearch\nalmost\ndirector\ncouncil\nla\n##2\ncareer\nthings\nusing\nisland\n##z\ncouldn\ncar\n##is\n24\nclose\nforce\n##1\nbetter\nfree\nsupport\ncontrol\nfield\nstudents\n2003\neducation\nmarried\n##b\nnothing\nworked\nothers\nrecord\nbig\ninside\nlevel\nanything\ncontinued\ngive\njames\n##3\nmilitary\nestablished\nnon\nreturned\nfeel\ndoes\ntitle\nwritten\nthing\nfeet\nwilliam\nfar\nco\nassociation\nhard\nalready\n2002\n##ra\nchampionship\nhuman\nwestern\n100\n##na\ndepartment\nhall\nrole\nvarious\nproduction\n21\n19\nheart\n2001\nliving\nfire\nversion\n##ers\n##f\ntelevision\nroyal\n##4\nproduced\nworking\nact\ncase\nsociety\nregion\npresent\nradio\nperiod\nlooking\nleast\ntotal\nkeep\nengland\nwife\nprogram\nper\nbrother\nmind\nspecial\n22\n##le\nam\nworks\nsoon\n##6\npolitical\ngeorge\nservices\ntaken\ncreated\n##7\nfurther\nable\nreached\ndavid\nunion\njoined\nupon\ndone\nimportant\nsocial\ninformation\neither\n##ic\n##x\nappeared\nposition\nground\nlead\nrock\ndark\nelection\n23\nboard\nfrance\nhair\ncourse\narms\nsite\npolice\ngirl\ninstead\nreal\nsound\n##v\nwords\nmoment\n##te\nsomeone\n##8\nsummer\nproject\nannounced\nsan\nless\nwrote\npast\nfollowed\n##5\nblue\nfounded\nal\nfinally\nindia\ntaking\nrecords\namerica\n##ne\n1999\ndesign\nconsidered\nnorthern\ngod\nstop\nbattle\ntoward\neuropean\noutside\ndescribed\ntrack\ntoday\nplaying\nlanguage\n28\ncall\n26\nheard\nprofessional\nlow\naustralia\nmiles\ncalifornia\nwin\nyet\ngreen\n##ie\ntrying\nblood\n##ton\nsouthern\nscience\nmaybe\neverything\nmatch\nsquare\n27\nmouth\nvideo\nrace\nrecorded\nleave\nabove\n##9\ndaughter\npoints\nspace\n1998\nmuseum\nchange\nmiddle\ncommon\n##0\nmove\ntv\npost\n##ta\nlake\nseven\ntried\nelected\nclosed\nten\npaul\nminister\n##th\nmonths\nstart\nchief\nreturn\ncanada\nperson\nsea\nrelease\nsimilar\nmodern\nbrought\nrest\nhit\nformed\nmr\n##la\n1997\nfloor\nevent\ndoing\nthomas\n1996\nrobert\ncare\nkilled\ntraining\nstar\nweek\nneeded\nturn\nfinished\nrailway\nrather\nnews\nhealth\nsent\nexample\nran\nterm\nmichael\ncoming\ncurrently\nyes\nforces\ndespite\ngold\nareas\n50\nstage\nfact\n29\ndead\nsays\npopular\n2018\noriginally\ngermany\nprobably\ndeveloped\nresult\npulled\nfriend\nstood\nmoney\nrunning\nmi\nsigned\nword\nsongs\nchild\neventually\nmet\ntour\naverage\nteams\nminutes\nfestival\ncurrent\ndeep\nkind\n1995\ndecided\nusually\nea
stern\nseemed\n##ness\nepisode\nbed\nadded\ntable\nindian\nprivate\ncharles\nroute\navailable\nidea\nthroughout\ncentre\naddition\nappointed\nstyle\n1994\nbooks\neight\nconstruction\npress\nmean\nwall\nfriends\nremained\nschools\nstudy\n##ch\n##um\ninstitute\noh\nchinese\nsometimes\nevents\npossible\n1992\naustralian\ntype\nbrown\nforward\ntalk\nprocess\nfood\ndebut\nseat\nperformance\ncommittee\nfeatures\ncharacter\narts\nherself\nelse\nlot\nstrong\nrussian\nrange\nhours\npeter\narm\n##da\nmorning\ndr\nsold\n##ry\nquickly\ndirected\n1993\nguitar\nchina\n##w\n31\nlist\n##ma\nperformed\nmedia\nuk\nplayers\nsmile\n##rs\nmyself\n40\nplaced\ncoach\nprovince\ntowards\nwouldn\nleading\nwhole\nboy\nofficial\ndesigned\ngrand\ncensus\n##el\neurope\nattack\njapanese\nhenry\n1991\n##re\n##os\ncross\ngetting\nalone\naction\nlower\nnetwork\nwide\nwashington\njapan\n1990\nhospital\nbelieve\nchanged\nsister\n##ar\nhold\ngone\nsir\nhadn\nship\n##ka\nstudies\nacademy\nshot\nrights\nbelow\nbase\nbad\ninvolved\nkept\nlargest\n##ist\nbank\nfuture\nespecially\nbeginning\nmark\nmovement\nsection\nfemale\nmagazine\nplan\nprofessor\nlord\nlonger\n##ian\nsat\nwalked\nhill\nactually\ncivil\nenergy\nmodel\nfamilies\nsize\nthus\naircraft\ncompleted\nincludes\ndata\ncaptain\n##or\nfight\nvocals\nfeatured\nrichard\nbridge\nfourth\n1989\nofficer\nstone\nhear\n##ism\nmeans\nmedical\ngroups\nmanagement\nself\nlips\ncompetition\nentire\nlived\ntechnology\nleaving\nfederal\ntournament\nbit\npassed\nhot\nindependent\nawards\nkingdom\nmary\nspent\nfine\ndoesn\nreported\n##ling\njack\nfall\nraised\nitself\nstay\ntrue\nstudio\n1988\nsports\nreplaced\nparis\nsystems\nsaint\nleader\ntheatre\nwhose\nmarket\ncapital\nparents\nspanish\ncanadian\nearth\n##ity\ncut\ndegree\nwriting\nbay\nchristian\nawarded\nnatural\nhigher\nbill\n##as\ncoast\nprovided\nprevious\nsenior\nft\nvalley\norganization\nstopped\nonto\ncountries\nparts\nconference\nqueen\nsecurity\ninterest\nsaying\nallowed\nmaster\nearlier\nphone\nmatter\nsmith\nwinning\ntry\nhappened\nmoving\ncampaign\nlos\n##ley\nbreath\nnearly\nmid\n1987\ncertain\ngirls\ndate\nitalian\nafrican\nstanding\nfell\nartist\n##ted\nshows\ndeal\nmine\nindustry\n1986\n##ng\neveryone\nrepublic\nprovide\ncollection\nlibrary\nstudent\n##ville\nprimary\nowned\nolder\nvia\nheavy\n1st\nmakes\n##able\nattention\nanyone\nafrica\n##ri\nstated\nlength\nended\nfingers\ncommand\nstaff\nskin\nforeign\nopening\ngovernor\nokay\nmedal\nkill\nsun\ncover\njob\n1985\nintroduced\nchest\nhell\nfeeling\n##ies\nsuccess\nmeet\nreason\nstandard\nmeeting\nnovel\n1984\ntrade\nsource\nbuildings\n##land\nrose\nguy\ngoal\n##ur\nchapter\nnative\nhusband\npreviously\nunit\nlimited\nentered\nweeks\nproducer\noperations\nmountain\ntakes\ncovered\nforced\nrelated\nroman\ncomplete\nsuccessful\nkey\ntexas\ncold\n##ya\nchannel\n1980\ntraditional\nfilms\ndance\nclear\napproximately\n500\nnine\nvan\nprince\nquestion\nactive\ntracks\nireland\nregional\nsilver\nauthor\npersonal\nsense\noperation\n##ine\neconomic\n1983\nholding\ntwenty\nisbn\nadditional\nspeed\nhour\nedition\nregular\nhistoric\nplaces\nwhom\nshook\nmovie\nkm²\nsecretary\nprior\nreport\nchicago\nread\nfoundation\nview\nengine\nscored\n1982\nunits\nask\nairport\nproperty\nready\nimmediately\nlady\nmonth\nlisted\ncontract\n##de\nmanager\nthemselves\nlines\n##ki\nnavy\nwriter\nmeant\n##ts\nruns\n##ro\npractice\nchampionships\nsinger\nglass\ncommission\nrequired\nforest\nstarting\nculture\ngenerally\ngiving\naccess\nattended\ntest\ncouple\nstand\ncatholic\nmartin\ncaught\nexecuti
ve\n##less\neye\n##ey\nthinking\nchair\nquite\nshoulder\n1979\nhope\ndecision\nplays\ndefeated\nmunicipality\nwhether\nstructure\noffered\nslowly\npain\nice\ndirection\n##ion\npaper\nmission\n1981\nmostly\n200\nnoted\nindividual\nmanaged\nnature\nlives\nplant\n##ha\nhelped\nexcept\nstudied\ncomputer\nfigure\nrelationship\nissue\nsignificant\nloss\ndie\nsmiled\ngun\nago\nhighest\n1972\n##am\nmale\nbring\ngoals\nmexico\nproblem\ndistance\ncommercial\ncompletely\nlocation\nannual\nfamous\ndrive\n1976\nneck\n1978\nsurface\ncaused\nitaly\nunderstand\ngreek\nhighway\nwrong\nhotel\ncomes\nappearance\njoseph\ndouble\nissues\nmusical\ncompanies\ncastle\nincome\nreview\nassembly\nbass\ninitially\nparliament\nartists\nexperience\n1974\nparticular\nwalk\nfoot\nengineering\ntalking\nwindow\ndropped\n##ter\nmiss\nbaby\nboys\nbreak\n1975\nstars\nedge\nremember\npolicy\ncarried\ntrain\nstadium\nbar\nsex\nangeles\nevidence\n##ge\nbecoming\nassistant\nsoviet\n1977\nupper\nstep\nwing\n1970\nyouth\nfinancial\nreach\n##ll\nactor\nnumerous\n##se\n##st\nnodded\narrived\n##ation\nminute\n##nt\nbelieved\nsorry\ncomplex\nbeautiful\nvictory\nassociated\ntemple\n1968\n1973\nchance\nperhaps\nmetal\n##son\n1945\nbishop\n##et\nlee\nlaunched\nparticularly\ntree\nle\nretired\nsubject\nprize\ncontains\nyeah\ntheory\nempire\n##ce\nsuddenly\nwaiting\ntrust\nrecording\n##to\nhappy\nterms\ncamp\nchampion\n1971\nreligious\npass\nzealand\nnames\n2nd\nport\nancient\ntom\ncorner\nrepresented\nwatch\nlegal\nanti\njustice\ncause\nwatched\nbrothers\n45\nmaterial\nchanges\nsimply\nresponse\nlouis\nfast\n##ting\nanswer\n60\nhistorical\n1969\nstories\nstraight\ncreate\nfeature\nincreased\nrate\nadministration\nvirginia\nel\nactivities\ncultural\noverall\nwinner\nprograms\nbasketball\nlegs\nguard\nbeyond\ncast\ndoctor\nmm\nflight\nresults\nremains\ncost\neffect\nwinter\n##ble\nlarger\nislands\nproblems\nchairman\ngrew\ncommander\nisn\n1967\npay\nfailed\nselected\nhurt\nfort\nbox\nregiment\nmajority\njournal\n35\nedward\nplans\n##ke\n##ni\nshown\npretty\nirish\ncharacters\ndirectly\nscene\nlikely\noperated\nallow\nspring\n##j\njunior\nmatches\nlooks\nmike\nhouses\nfellow\n##tion\nbeach\nmarriage\n##ham\n##ive\nrules\noil\n65\nflorida\nexpected\nnearby\ncongress\nsam\npeace\nrecent\niii\nwait\nsubsequently\ncell\n##do\nvariety\nserving\nagreed\nplease\npoor\njoe\npacific\nattempt\nwood\ndemocratic\npiece\nprime\n##ca\nrural\nmile\ntouch\nappears\ntownship\n1964\n1966\nsoldiers\n##men\n##ized\n1965\npennsylvania\ncloser\nfighting\nclaimed\nscore\njones\nphysical\neditor\n##ous\nfilled\ngenus\nspecific\nsitting\nsuper\nmom\n##va\ntherefore\nsupported\nstatus\nfear\ncases\nstore\nmeaning\nwales\nminor\nspain\ntower\nfocus\nvice\nfrank\nfollow\nparish\nseparate\ngolden\nhorse\nfifth\nremaining\nbranch\n32\npresented\nstared\n##id\nuses\nsecret\nforms\n##co\nbaseball\nexactly\n##ck\nchoice\nnote\ndiscovered\ntravel\ncomposed\ntruth\nrussia\nball\ncolor\nkiss\ndad\nwind\ncontinue\nring\nreferred\nnumbers\ndigital\ngreater\n##ns\nmetres\nslightly\ndirect\nincrease\n1960\nresponsible\ncrew\nrule\ntrees\ntroops\n##no\nbroke\ngoes\nindividuals\nhundred\nweight\ncreek\nsleep\nmemory\ndefense\nprovides\nordered\ncode\nvalue\njewish\nwindows\n1944\nsafe\njudge\nwhatever\ncorps\nrealized\ngrowing\npre\n##ga\ncities\nalexander\ngaze\nlies\nspread\nscott\nletter\nshowed\nsituation\nmayor\ntransport\nwatching\nworkers\nextended\n##li\nexpression\nnormal\n##ment\nchart\nmultiple\nborder\n##ba\nhost\n##ner\ndaily\nmrs\nwalls\npiano\n##ko\nheat\ncannot\n##ate\
nearned\nproducts\ndrama\nera\nauthority\nseasons\njoin\ngrade\n##io\nsign\ndifficult\nmachine\n1963\nterritory\nmainly\n##wood\nstations\nsquadron\n1962\nstepped\niron\n19th\n##led\nserve\nappear\nsky\nspeak\nbroken\ncharge\nknowledge\nkilometres\nremoved\nships\narticle\ncampus\nsimple\n##ty\npushed\nbritain\n##ve\nleaves\nrecently\ncd\nsoft\nboston\nlatter\neasy\nacquired\npoland\n##sa\nquality\nofficers\npresence\nplanned\nnations\nmass\nbroadcast\njean\nshare\nimage\ninfluence\nwild\noffer\nemperor\nelectric\nreading\nheaded\nability\npromoted\nyellow\nministry\n1942\nthroat\nsmaller\npolitician\n##by\nlatin\nspoke\ncars\nwilliams\nmales\nlack\npop\n80\n##ier\nacting\nseeing\nconsists\n##ti\nestate\n1961\npressure\njohnson\nnewspaper\njr\nchris\nolympics\nonline\nconditions\nbeat\nelements\nwalking\nvote\n##field\nneeds\ncarolina\ntext\nfeaturing\nglobal\nblock\nshirt\nlevels\nfrancisco\npurpose\nfemales\net\ndutch\nduke\nahead\ngas\ntwice\nsafety\nserious\nturning\nhighly\nlieutenant\nfirm\nmaria\namount\nmixed\ndaniel\nproposed\nperfect\nagreement\naffairs\n3rd\nseconds\ncontemporary\npaid\n1943\nprison\nsave\nkitchen\nlabel\nadministrative\nintended\nconstructed\nacademic\nnice\nteacher\nraces\n1956\nformerly\ncorporation\nben\nnation\nissued\nshut\n1958\ndrums\nhousing\nvictoria\nseems\nopera\n1959\ngraduated\nfunction\nvon\nmentioned\npicked\nbuild\nrecognized\nshortly\nprotection\npicture\nnotable\nexchange\nelections\n1980s\nloved\npercent\nracing\nfish\nelizabeth\ngarden\nvolume\nhockey\n1941\nbeside\nsettled\n##ford\n1940\ncompeted\nreplied\ndrew\n1948\nactress\nmarine\nscotland\nsteel\nglanced\nfarm\nsteve\n1957\nrisk\ntonight\npositive\nmagic\nsingles\neffects\ngray\nscreen\ndog\n##ja\nresidents\nbus\nsides\nnone\nsecondary\nliterature\npolish\ndestroyed\nflying\nfounder\nhouseholds\n1939\nlay\nreserve\nusa\ngallery\n##ler\n1946\nindustrial\nyounger\napproach\nappearances\nurban\nones\n1950\nfinish\navenue\npowerful\nfully\ngrowth\npage\nhonor\njersey\nprojects\nadvanced\nrevealed\nbasic\n90\ninfantry\npair\nequipment\nvisit\n33\nevening\nsearch\ngrant\neffort\nsolo\ntreatment\nburied\nrepublican\nprimarily\nbottom\nowner\n1970s\nisrael\ngives\njim\ndream\nbob\nremain\nspot\n70\nnotes\nproduce\nchampions\ncontact\ned\nsoul\naccepted\nways\ndel\n##ally\nlosing\nsplit\nprice\ncapacity\nbasis\ntrial\nquestions\n##ina\n1955\n20th\nguess\nofficially\nmemorial\nnaval\ninitial\n##ization\nwhispered\nmedian\nengineer\n##ful\nsydney\n##go\ncolumbia\nstrength\n300\n1952\ntears\nsenate\n00\ncard\nasian\nagent\n1947\nsoftware\n44\ndraw\nwarm\nsupposed\ncom\npro\n##il\ntransferred\nleaned\n##at\ncandidate\nescape\nmountains\nasia\npotential\nactivity\nentertainment\nseem\ntraffic\njackson\nmurder\n36\nslow\nproduct\norchestra\nhaven\nagency\nbbc\ntaught\nwebsite\ncomedy\nunable\nstorm\nplanning\nalbums\nrugby\nenvironment\nscientific\ngrabbed\nprotect\n##hi\nboat\ntypically\n1954\n1953\ndamage\nprincipal\ndivided\ndedicated\nmount\nohio\n##berg\npick\nfought\ndriver\n##der\nempty\nshoulders\nsort\nthank\nberlin\nprominent\naccount\nfreedom\nnecessary\nefforts\nalex\nheadquarters\nfollows\nalongside\ndes\nsimon\nandrew\nsuggested\noperating\nlearning\nsteps\n1949\nsweet\ntechnical\nbegin\neasily\n34\nteeth\nspeaking\nsettlement\nscale\n##sh\nrenamed\nray\nmax\nenemy\nsemi\njoint\ncompared\n##rd\nscottish\nleadership\nanalysis\noffers\ngeorgia\npieces\ncaptured\nanimal\ndeputy\nguest\norganized\n##lin\ntony\ncombined\nmethod\nchallenge\n1960s\nhuge\nwants\nbattalion\nsons\nrise\ncrime\ntyp
es\nfacilities\ntelling\npath\n1951\nplatform\nsit\n1990s\n##lo\ntells\nassigned\nrich\npull\n##ot\ncommonly\nalive\n##za\nletters\nconcept\nconducted\nwearing\nhappen\nbought\nbecomes\nholy\ngets\nocean\ndefeat\nlanguages\npurchased\ncoffee\noccurred\ntitled\n##q\ndeclared\napplied\nsciences\nconcert\nsounds\njazz\nbrain\n##me\npainting\nfleet\ntax\nnick\n##ius\nmichigan\ncount\nanimals\nleaders\nepisodes\n##line\ncontent\n##den\nbirth\n##it\nclubs\n64\npalace\ncritical\nrefused\nfair\nleg\nlaughed\nreturning\nsurrounding\nparticipated\nformation\nlifted\npointed\nconnected\nrome\nmedicine\nlaid\ntaylor\nsanta\npowers\nadam\ntall\nshared\nfocused\nknowing\nyards\nentrance\nfalls\n##wa\ncalling\n##ad\nsources\nchosen\nbeneath\nresources\nyard\n##ite\nnominated\nsilence\nzone\ndefined\n##que\ngained\nthirty\n38\nbodies\nmoon\n##ard\nadopted\nchristmas\nwidely\nregister\napart\niran\npremier\nserves\ndu\nunknown\nparties\n##les\ngeneration\n##ff\ncontinues\nquick\nfields\nbrigade\nquiet\nteaching\nclothes\nimpact\nweapons\npartner\nflat\ntheater\nsupreme\n1938\n37\nrelations\n##tor\nplants\nsuffered\n1936\nwilson\nkids\nbegins\n##age\n1918\nseats\narmed\ninternet\nmodels\nworth\nlaws\n400\ncommunities\nclasses\nbackground\nknows\nthanks\nquarter\nreaching\nhumans\ncarry\nkilling\nformat\nkong\nhong\nsetting\n75\narchitecture\ndisease\nrailroad\ninc\npossibly\nwish\narthur\nthoughts\nharry\ndoors\ndensity\n##di\ncrowd\nillinois\nstomach\ntone\nunique\nreports\nanyway\n##ir\nliberal\nder\nvehicle\nthick\ndry\ndrug\nfaced\nlargely\nfacility\ntheme\nholds\ncreation\nstrange\ncolonel\n##mi\nrevolution\nbell\npolitics\nturns\nsilent\nrail\nrelief\nindependence\ncombat\nshape\nwrite\ndetermined\nsales\nlearned\n4th\nfinger\noxford\nproviding\n1937\nheritage\nfiction\nsituated\ndesignated\nallowing\ndistribution\nhosted\n##est\nsight\ninterview\nestimated\nreduced\n##ria\ntoronto\nfootballer\nkeeping\nguys\ndamn\nclaim\nmotion\nsport\nsixth\nstayed\n##ze\nen\nrear\nreceive\nhanded\ntwelve\ndress\naudience\ngranted\nbrazil\n##well\nspirit\n##ated\nnoticed\netc\nolympic\nrepresentative\neric\ntight\ntrouble\nreviews\ndrink\nvampire\nmissing\nroles\nranked\nnewly\nhousehold\nfinals\nwave\ncritics\n##ee\nphase\nmassachusetts\npilot\nunlike\nphiladelphia\nbright\nguns\ncrown\norganizations\nroof\n42\nrespectively\nclearly\ntongue\nmarked\ncircle\nfox\nkorea\nbronze\nbrian\nexpanded\nsexual\nsupply\nyourself\ninspired\nlabour\nfc\n##ah\nreference\nvision\ndraft\nconnection\nbrand\nreasons\n1935\nclassic\ndriving\ntrip\njesus\ncells\nentry\n1920\nneither\ntrail\nclaims\natlantic\norders\nlabor\nnose\nafraid\nidentified\nintelligence\ncalls\ncancer\nattacked\npassing\nstephen\npositions\nimperial\ngrey\njason\n39\nsunday\n48\nswedish\navoid\nextra\nuncle\nmessage\ncovers\nallows\nsurprise\nmaterials\nfame\nhunter\n##ji\n1930\ncitizens\nfigures\ndavis\nenvironmental\nconfirmed\nshit\ntitles\ndi\nperforming\ndifference\nacts\nattacks\n##ov\nexisting\nvotes\nopportunity\nnor\nshop\nentirely\ntrains\nopposite\npakistan\n##pa\ndevelop\nresulted\nrepresentatives\nactions\nreality\npressed\n##ish\nbarely\nwine\nconversation\nfaculty\nnorthwest\nends\ndocumentary\nnuclear\nstock\ngrace\nsets\neat\nalternative\n##ps\nbag\nresulting\ncreating\nsurprised\ncemetery\n1919\ndrop\nfinding\nsarah\ncricket\nstreets\ntradition\nride\n1933\nexhibition\ntarget\near\nexplained\nrain\ncomposer\ninjury\napartment\nmunicipal\neducational\noccupied\nnetherlands\nclean\nbillion\nconstitution\nlearn\n1914\nmaximum\nclassical\nfrancis\
nlose\nopposition\njose\nontario\nbear\ncore\nhills\nrolled\nending\ndrawn\npermanent\nfun\n##tes\n##lla\nlewis\nsites\nchamber\nryan\n##way\nscoring\nheight\n1934\n##house\nlyrics\nstaring\n55\nofficials\n1917\nsnow\noldest\n##tic\norange\n##ger\nqualified\ninterior\napparently\nsucceeded\nthousand\ndinner\nlights\nexistence\nfans\nheavily\n41\ngreatest\nconservative\nsend\nbowl\nplus\nenter\ncatch\n##un\neconomy\nduty\n1929\nspeech\nauthorities\nprincess\nperformances\nversions\nshall\ngraduate\npictures\neffective\nremembered\npoetry\ndesk\ncrossed\nstarring\nstarts\npassenger\nsharp\n##ant\nacres\nass\nweather\nfalling\nrank\nfund\nsupporting\ncheck\nadult\npublishing\nheads\ncm\nsoutheast\nlane\n##burg\napplication\nbc\n##ura\nles\ncondition\ntransfer\nprevent\ndisplay\nex\nregions\nearl\nfederation\ncool\nrelatively\nanswered\nbesides\n1928\nobtained\nportion\n##town\nmix\n##ding\nreaction\nliked\ndean\nexpress\npeak\n1932\n##tte\ncounter\nreligion\nchain\nrare\nmiller\nconvention\naid\nlie\nvehicles\nmobile\nperform\nsquad\nwonder\nlying\ncrazy\nsword\n##ping\nattempted\ncenturies\nweren\nphilosophy\ncategory\n##ize\nanna\ninterested\n47\nsweden\nwolf\nfrequently\nabandoned\nkg\nliterary\nalliance\ntask\nentitled\n##ay\nthrew\npromotion\nfactory\ntiny\nsoccer\nvisited\nmatt\nfm\nachieved\n52\ndefence\ninternal\npersian\n43\nmethods\n##ging\narrested\notherwise\ncambridge\nprogramming\nvillages\nelementary\ndistricts\nrooms\ncriminal\nconflict\nworry\ntrained\n1931\nattempts\nwaited\nsignal\nbird\ntruck\nsubsequent\nprogramme\n##ol\nad\n49\ncommunist\ndetails\nfaith\nsector\npatrick\ncarrying\nlaugh\n##ss\ncontrolled\nkorean\nshowing\norigin\nfuel\nevil\n1927\n##ent\nbrief\nidentity\ndarkness\naddress\npool\nmissed\npublication\nweb\nplanet\nian\nanne\nwings\ninvited\n##tt\nbriefly\nstandards\nkissed\n##be\nideas\nclimate\ncausing\nwalter\nworse\nalbert\narticles\nwinners\ndesire\naged\nnortheast\ndangerous\ngate\ndoubt\n1922\nwooden\nmulti\n##ky\npoet\nrising\nfunding\n46\ncommunications\ncommunication\nviolence\ncopies\nprepared\nford\ninvestigation\nskills\n1924\npulling\nelectronic\n##ak\n##ial\n##han\ncontaining\nultimately\noffices\nsinging\nunderstanding\nrestaurant\ntomorrow\nfashion\nchrist\nward\nda\npope\nstands\n5th\nflow\nstudios\naired\ncommissioned\ncontained\nexist\nfresh\namericans\n##per\nwrestling\napproved\nkid\nemployed\nrespect\nsuit\n1925\nangel\nasking\nincreasing\nframe\nangry\nselling\n1950s\nthin\nfinds\n##nd\ntemperature\nstatement\nali\nexplain\ninhabitants\ntowns\nextensive\nnarrow\n51\njane\nflowers\nimages\npromise\nsomewhere\nobject\nfly\nclosely\n##ls\n1912\nbureau\ncape\n1926\nweekly\npresidential\nlegislative\n1921\n##ai\n##au\nlaunch\nfounding\n##ny\n978\n##ring\nartillery\nstrike\nun\ninstitutions\nroll\nwriters\nlanding\nchose\nkevin\nanymore\npp\n##ut\nattorney\nfit\ndan\nbillboard\nreceiving\nagricultural\nbreaking\nsought\ndave\nadmitted\nlands\nmexican\n##bury\ncharlie\nspecifically\nhole\niv\nhoward\ncredit\nmoscow\nroads\naccident\n1923\nproved\nwear\nstruck\nhey\nguards\nstuff\nslid\nexpansion\n1915\ncat\nanthony\n##kin\nmelbourne\nopposed\nsub\nsouthwest\narchitect\nfailure\nplane\n1916\n##ron\nmap\ncamera\ntank\nlisten\nregarding\nwet\nintroduction\nmetropolitan\nlink\nep\nfighter\ninch\ngrown\ngene\nanger\nfixed\nbuy\ndvd\nkhan\ndomestic\nworldwide\nchapel\nmill\nfunctions\nexamples\n##head\ndeveloping\n1910\nturkey\nhits\npocket\nantonio\npapers\ngrow\nunless\ncircuit\n18th\nconcerned\nattached\njournalist\nselection\njourney\nconverte
d\nprovincial\npainted\nhearing\naren\nbands\nnegative\naside\nwondered\nknight\nlap\nsurvey\nma\n##ow\nnoise\nbilly\n##ium\nshooting\nguide\nbedroom\npriest\nresistance\nmotor\nhomes\nsounded\ngiant\n##mer\n150\nscenes\nequal\ncomic\npatients\nhidden\nsolid\nactual\nbringing\nafternoon\ntouched\nfunds\nwedding\nconsisted\nmarie\ncanal\nsr\nkim\ntreaty\nturkish\nrecognition\nresidence\ncathedral\nbroad\nknees\nincident\nshaped\nfired\nnorwegian\nhandle\ncheek\ncontest\nrepresent\n##pe\nrepresenting\nbeauty\n##sen\nbirds\nadvantage\nemergency\nwrapped\ndrawing\nnotice\npink\nbroadcasting\n##ong\nsomehow\nbachelor\nseventh\ncollected\nregistered\nestablishment\nalan\nassumed\nchemical\npersonnel\nroger\nretirement\njeff\nportuguese\nwore\ntied\ndevice\nthreat\nprogress\nadvance\n##ised\nbanks\nhired\nmanchester\nnfl\nteachers\nstructures\nforever\n##bo\ntennis\nhelping\nsaturday\nsale\napplications\njunction\nhip\nincorporated\nneighborhood\ndressed\nceremony\n##ds\ninfluenced\nhers\nvisual\nstairs\ndecades\ninner\nkansas\nhung\nhoped\ngain\nscheduled\ndowntown\nengaged\naustria\nclock\nnorway\ncertainly\npale\nprotected\n1913\nvictor\nemployees\nplate\nputting\nsurrounded\n##ists\nfinishing\nblues\ntropical\n##ries\nminnesota\nconsider\nphilippines\naccept\n54\nretrieved\n1900\nconcern\nanderson\nproperties\ninstitution\ngordon\nsuccessfully\nvietnam\n##dy\nbacking\noutstanding\nmuslim\ncrossing\nfolk\nproducing\nusual\ndemand\noccurs\nobserved\nlawyer\neducated\n##ana\nkelly\nstring\npleasure\nbudget\nitems\nquietly\ncolorado\nphilip\ntypical\n##worth\nderived\n600\nsurvived\nasks\nmental\n##ide\n56\njake\njews\ndistinguished\nltd\n1911\nsri\nextremely\n53\nathletic\nloud\nthousands\nworried\nshadow\ntransportation\nhorses\nweapon\narena\nimportance\nusers\ntim\nobjects\ncontributed\ndragon\ndouglas\naware\nsenator\njohnny\njordan\nsisters\nengines\nflag\ninvestment\nsamuel\nshock\ncapable\nclark\nrow\nwheel\nrefers\nsession\nfamiliar\nbiggest\nwins\nhate\nmaintained\ndrove\nhamilton\nrequest\nexpressed\ninjured\nunderground\nchurches\nwalker\nwars\ntunnel\npasses\nstupid\nagriculture\nsoftly\ncabinet\nregarded\njoining\nindiana\n##ea\n##ms\npush\ndates\nspend\nbehavior\nwoods\nprotein\ngently\nchase\nmorgan\nmention\nburning\nwake\ncombination\noccur\nmirror\nleads\njimmy\nindeed\nimpossible\nsingapore\npaintings\ncovering\n##nes\nsoldier\nlocations\nattendance\nsell\nhistorian\nwisconsin\ninvasion\nargued\npainter\ndiego\nchanging\negypt\n##don\nexperienced\ninches\n##ku\nmissouri\nvol\ngrounds\nspoken\nswitzerland\n##gan\nreform\nrolling\nha\nforget\nmassive\nresigned\nburned\nallen\ntennessee\nlocked\nvalues\nimproved\n##mo\nwounded\nuniverse\nsick\ndating\nfacing\npack\npurchase\nuser\n##pur\nmoments\n##ul\nmerged\nanniversary\n1908\ncoal\nbrick\nunderstood\ncauses\ndynasty\nqueensland\nestablish\nstores\ncrisis\npromote\nhoping\nviews\ncards\nreferee\nextension\n##si\nraise\narizona\nimprove\ncolonial\nformal\ncharged\n##rt\npalm\nlucky\nhide\nrescue\nfaces\n95\nfeelings\ncandidates\njuan\n##ell\ngoods\n6th\ncourses\nweekend\n59\nluke\ncash\nfallen\n##om\ndelivered\naffected\ninstalled\ncarefully\ntries\nswiss\nhollywood\ncosts\nlincoln\nresponsibility\n##he\nshore\nfile\nproper\nnormally\nmaryland\nassistance\njump\nconstant\noffering\nfriendly\nwaters\npersons\nrealize\ncontain\ntrophy\n800\npartnership\nfactor\n58\nmusicians\ncry\nbound\noregon\nindicated\nhero\nhouston\nmedium\n##ure\nconsisting\nsomewhat\n##ara\n57\ncycle\n##che\nbeer\nmoore\nfrederick\ngotten\neleven\nworst\nwea
k\napproached\narranged\nchin\nloan\nuniversal\nbond\nfifteen\npattern\ndisappeared\n##ney\ntranslated\n##zed\nlip\narab\ncapture\ninterests\ninsurance\n##chi\nshifted\ncave\nprix\nwarning\nsections\ncourts\ncoat\nplot\nsmell\nfeed\ngolf\nfavorite\nmaintain\nknife\nvs\nvoted\ndegrees\nfinance\nquebec\nopinion\ntranslation\nmanner\nruled\noperate\nproductions\nchoose\nmusician\ndiscovery\nconfused\ntired\nseparated\nstream\ntechniques\ncommitted\nattend\nranking\nkings\nthrow\npassengers\nmeasure\nhorror\nfan\nmining\nsand\ndanger\nsalt\ncalm\ndecade\ndam\nrequire\nrunner\n##ik\nrush\nassociate\ngreece\n##ker\nrivers\nconsecutive\nmatthew\n##ski\nsighed\nsq\ndocuments\nsteam\nedited\nclosing\ntie\naccused\n1905\n##ini\nislamic\ndistributed\ndirectors\norganisation\nbruce\n7th\nbreathing\nmad\nlit\narrival\nconcrete\ntaste\n08\ncomposition\nshaking\nfaster\namateur\nadjacent\nstating\n1906\ntwin\nflew\n##ran\ntokyo\npublications\n##tone\nobviously\nridge\nstorage\n1907\ncarl\npages\nconcluded\ndesert\ndriven\nuniversities\nages\nterminal\nsequence\nborough\n250\nconstituency\ncreative\ncousin\neconomics\ndreams\nmargaret\nnotably\nreduce\nmontreal\nmode\n17th\nears\nsaved\njan\nvocal\n##ica\n1909\nandy\n##jo\nriding\nroughly\nthreatened\n##ise\nmeters\nmeanwhile\nlanded\ncompete\nrepeated\ngrass\nczech\nregularly\ncharges\ntea\nsudden\nappeal\n##ung\nsolution\ndescribes\npierre\nclassification\nglad\nparking\n##ning\nbelt\nphysics\n99\nrachel\nadd\nhungarian\nparticipate\nexpedition\ndamaged\ngift\nchildhood\n85\nfifty\n##red\nmathematics\njumped\nletting\ndefensive\nmph\n##ux\n##gh\ntesting\n##hip\nhundreds\nshoot\nowners\nmatters\nsmoke\nisraeli\nkentucky\ndancing\nmounted\ngrandfather\nemma\ndesigns\nprofit\nargentina\n##gs\ntruly\nli\nlawrence\ncole\nbegun\ndetroit\nwilling\nbranches\nsmiling\ndecide\nmiami\nenjoyed\nrecordings\n##dale\npoverty\nethnic\ngay\n##bi\ngary\narabic\n09\naccompanied\n##one\n##ons\nfishing\ndetermine\nresidential\nacid\n##ary\nalice\nreturns\nstarred\nmail\n##ang\njonathan\nstrategy\n##ue\nnet\nforty\ncook\nbusinesses\nequivalent\ncommonwealth\ndistinct\nill\n##cy\nseriously\n##ors\n##ped\nshift\nharris\nreplace\nrio\nimagine\nformula\nensure\n##ber\nadditionally\nscheme\nconservation\noccasionally\npurposes\nfeels\nfavor\n##and\n##ore\n1930s\ncontrast\nhanging\nhunt\nmovies\n1904\ninstruments\nvictims\ndanish\nchristopher\nbusy\ndemon\nsugar\nearliest\ncolony\nstudying\nbalance\nduties\n##ks\nbelgium\nslipped\ncarter\n05\nvisible\nstages\niraq\nfifa\n##im\ncommune\nforming\nzero\n07\ncontinuing\ntalked\ncounties\nlegend\nbathroom\noption\ntail\nclay\ndaughters\nafterwards\nsevere\njaw\nvisitors\n##ded\ndevices\naviation\nrussell\nkate\n##vi\nentering\nsubjects\n##ino\ntemporary\nswimming\nforth\nsmooth\nghost\naudio\nbush\noperates\nrocks\nmovements\nsigns\neddie\n##tz\nann\nvoices\nhonorary\n06\nmemories\ndallas\npure\nmeasures\nracial\npromised\n66\nharvard\nceo\n16th\nparliamentary\nindicate\nbenefit\nflesh\ndublin\nlouisiana\n1902\n1901\npatient\nsleeping\n1903\nmembership\ncoastal\nmedieval\nwanting\nelement\nscholars\nrice\n62\nlimit\nsurvive\nmakeup\nrating\ndefinitely\ncollaboration\nobvious\n##tan\nboss\nms\nbaron\nbirthday\nlinked\nsoil\ndiocese\n##lan\nncaa\n##mann\noffensive\nshell\nshouldn\nwaist\n##tus\nplain\nross\norgan\nresolution\nmanufacturing\nadding\nrelative\nkennedy\n98\nwhilst\nmoth\nmarketing\ngardens\ncrash\n72\nheading\npartners\ncredited\ncarlos\nmoves\ncable\n##zi\nmarshall\n##out\ndepending\nbottle\nrepresents\nrejected\nresponded\
nexisted\n04\njobs\ndenmark\nlock\n##ating\ntreated\ngraham\nroutes\ntalent\ncommissioner\ndrugs\nsecure\ntests\nreign\nrestored\nphotography\n##gi\ncontributions\noklahoma\ndesigner\ndisc\ngrin\nseattle\nrobin\npaused\natlanta\nunusual\n##gate\npraised\nlas\nlaughing\nsatellite\nhungary\nvisiting\n##sky\ninteresting\nfactors\ndeck\npoems\nnorman\n##water\nstuck\nspeaker\nrifle\ndomain\npremiered\n##her\ndc\ncomics\nactors\n01\nreputation\neliminated\n8th\nceiling\nprisoners\nscript\n##nce\nleather\naustin\nmississippi\nrapidly\nadmiral\nparallel\ncharlotte\nguilty\ntools\ngender\ndivisions\nfruit\n##bs\nlaboratory\nnelson\nfantasy\nmarry\nrapid\naunt\ntribe\nrequirements\naspects\nsuicide\namongst\nadams\nbone\nukraine\nabc\nkick\nsees\nedinburgh\nclothing\ncolumn\nrough\ngods\nhunting\nbroadway\ngathered\nconcerns\n##ek\nspending\nty\n12th\nsnapped\nrequires\nsolar\nbones\ncavalry\n##tta\niowa\ndrinking\nwaste\nindex\nfranklin\ncharity\nthompson\nstewart\ntip\nflash\nlandscape\nfriday\nenjoy\nsingh\npoem\nlistening\n##back\neighth\nfred\ndifferences\nadapted\nbomb\nukrainian\nsurgery\ncorporate\nmasters\nanywhere\n##more\nwaves\nodd\nsean\nportugal\norleans\ndick\ndebate\nkent\neating\npuerto\ncleared\n96\nexpect\ncinema\n97\nguitarist\nblocks\nelectrical\nagree\ninvolving\ndepth\ndying\npanel\nstruggle\n##ged\npeninsula\nadults\nnovels\nemerged\nvienna\nmetro\ndebuted\nshoes\ntamil\nsongwriter\nmeets\nprove\nbeating\ninstance\nheaven\nscared\nsending\nmarks\nartistic\npassage\nsuperior\n03\nsignificantly\nshopping\n##tive\nretained\n##izing\nmalaysia\ntechnique\ncheeks\n##ola\nwarren\nmaintenance\ndestroy\nextreme\nallied\n120\nappearing\n##yn\nfill\nadvice\nalabama\nqualifying\npolicies\ncleveland\nhat\nbattery\nsmart\nauthors\n10th\nsoundtrack\nacted\ndated\nlb\nglance\nequipped\ncoalition\nfunny\nouter\nambassador\nroy\npossibility\ncouples\ncampbell\ndna\nloose\nethan\nsupplies\n1898\ngonna\n88\nmonster\n##res\nshake\nagents\nfrequency\nsprings\ndogs\npractices\n61\ngang\nplastic\neasier\nsuggests\ngulf\nblade\nexposed\ncolors\nindustries\nmarkets\npan\nnervous\nelectoral\ncharts\nlegislation\nownership\n##idae\nmac\nappointment\nshield\ncopy\nassault\nsocialist\nabbey\nmonument\nlicense\nthrone\nemployment\njay\n93\nreplacement\ncharter\ncloud\npowered\nsuffering\naccounts\noak\nconnecticut\nstrongly\nwright\ncolour\ncrystal\n13th\ncontext\nwelsh\nnetworks\nvoiced\ngabriel\njerry\n##cing\nforehead\nmp\n##ens\nmanage\nschedule\ntotally\nremix\n##ii\nforests\noccupation\nprint\nnicholas\nbrazilian\nstrategic\nvampires\nengineers\n76\nroots\nseek\ncorrect\ninstrumental\nund\nalfred\nbacked\nhop\n##des\nstanley\nrobinson\ntraveled\nwayne\nwelcome\naustrian\nachieve\n67\nexit\nrates\n1899\nstrip\nwhereas\n##cs\nsing\ndeeply\nadventure\nbobby\nrick\njamie\ncareful\ncomponents\ncap\nuseful\npersonality\nknee\n##shi\npushing\nhosts\n02\nprotest\nca\nottoman\nsymphony\n##sis\n63\nboundary\n1890\nprocesses\nconsidering\nconsiderable\ntons\n##work\n##ft\n##nia\ncooper\ntrading\ndear\nconduct\n91\nillegal\napple\nrevolutionary\nholiday\ndefinition\nharder\n##van\njacob\ncircumstances\ndestruction\n##lle\npopularity\ngrip\nclassified\nliverpool\ndonald\nbaltimore\nflows\nseeking\nhonour\napproval\n92\nmechanical\ntill\nhappening\nstatue\ncritic\nincreasingly\nimmediate\ndescribe\ncommerce\nstare\n##ster\nindonesia\nmeat\nrounds\nboats\nbaker\northodox\ndepression\nformally\nworn\nnaked\nclaire\nmuttered\nsentence\n11th\nemily\ndocument\n77\ncriticism\nwished\nvessel\nspiritual\nbent\nvirgin\npar
ker\nminimum\nmurray\nlunch\ndanny\nprinted\ncompilation\nkeyboards\nfalse\nblow\nbelonged\n68\nraising\n78\ncutting\n##board\npittsburgh\n##up\n9th\nshadows\n81\nhated\nindigenous\njon\n15th\nbarry\nscholar\nah\n##zer\noliver\n##gy\nstick\nsusan\nmeetings\nattracted\nspell\nromantic\n##ver\nye\n1895\nphoto\ndemanded\ncustomers\n##ac\n1896\nlogan\nrevival\nkeys\nmodified\ncommanded\njeans\n##ious\nupset\nraw\nphil\ndetective\nhiding\nresident\nvincent\n##bly\nexperiences\ndiamond\ndefeating\ncoverage\nlucas\nexternal\nparks\nfranchise\nhelen\nbible\nsuccessor\npercussion\ncelebrated\nil\nlift\nprofile\nclan\nromania\n##ied\nmills\n##su\nnobody\nachievement\nshrugged\nfault\n1897\nrhythm\ninitiative\nbreakfast\ncarbon\n700\n69\nlasted\nviolent\n74\nwound\nken\nkiller\ngradually\nfilmed\n°c\ndollars\nprocessing\n94\nremove\ncriticized\nguests\nsang\nchemistry\n##vin\nlegislature\ndisney\n##bridge\nuniform\nescaped\nintegrated\nproposal\npurple\ndenied\nliquid\nkarl\ninfluential\nmorris\nnights\nstones\nintense\nexperimental\ntwisted\n71\n84\n##ld\npace\nnazi\nmitchell\nny\nblind\nreporter\nnewspapers\n14th\ncenters\nburn\nbasin\nforgotten\nsurviving\nfiled\ncollections\nmonastery\nlosses\nmanual\ncouch\ndescription\nappropriate\nmerely\ntag\nmissions\nsebastian\nrestoration\nreplacing\ntriple\n73\nelder\njulia\nwarriors\nbenjamin\njulian\nconvinced\nstronger\namazing\ndeclined\nversus\nmerchant\nhappens\noutput\nfinland\nbare\nbarbara\nabsence\nignored\ndawn\ninjuries\n##port\nproducers\n##ram\n82\nluis\n##ities\nkw\nadmit\nexpensive\nelectricity\nnba\nexception\nsymbol\n##ving\nladies\nshower\nsheriff\ncharacteristics\n##je\naimed\nbutton\nratio\neffectively\nsummit\nangle\njury\nbears\nfoster\nvessels\npants\nexecuted\nevans\ndozen\nadvertising\nkicked\npatrol\n1889\ncompetitions\nlifetime\nprinciples\nathletics\n##logy\nbirmingham\nsponsored\n89\nrob\nnomination\n1893\nacoustic\n##sm\ncreature\nlongest\n##tra\ncredits\nharbor\ndust\njosh\n##so\nterritories\nmilk\ninfrastructure\ncompletion\nthailand\nindians\nleon\narchbishop\n##sy\nassist\npitch\nblake\narrangement\ngirlfriend\nserbian\noperational\nhence\nsad\nscent\nfur\ndj\nsessions\nhp\nrefer\nrarely\n##ora\nexists\n1892\n##ten\nscientists\ndirty\npenalty\nburst\nportrait\nseed\n79\npole\nlimits\nrival\n1894\nstable\nalpha\ngrave\nconstitutional\nalcohol\narrest\nflower\nmystery\ndevil\narchitectural\nrelationships\ngreatly\nhabitat\n##istic\nlarry\nprogressive\nremote\ncotton\n##ics\n##ok\npreserved\nreaches\n##ming\ncited\n86\nvast\nscholarship\ndecisions\ncbs\njoy\nteach\n1885\neditions\nknocked\neve\nsearching\npartly\nparticipation\ngap\nanimated\nfate\nexcellent\n##ett\nna\n87\nalternate\nsaints\nyoungest\n##ily\nclimbed\n##ita\n##tors\nsuggest\n##ct\ndiscussion\nstaying\nchoir\nlakes\njacket\nrevenue\nnevertheless\npeaked\ninstrument\nwondering\nannually\nmanaging\nneil\n1891\nsigning\nterry\n##ice\napply\nclinical\nbrooklyn\naim\ncatherine\nfuck\nfarmers\nfigured\nninth\npride\nhugh\nevolution\nordinary\ninvolvement\ncomfortable\nshouted\ntech\nencouraged\ntaiwan\nrepresentation\nsharing\n##lia\n##em\npanic\nexact\ncargo\ncompeting\nfat\ncried\n83\n1920s\noccasions\npa\ncabin\nborders\nutah\nmarcus\n##isation\nbadly\nmuscles\n##ance\nvictorian\ntransition\nwarner\nbet\npermission\n##rin\nslave\nterrible\nsimilarly\nshares\nseth\nuefa\npossession\nmedals\nbenefits\ncolleges\nlowered\nperfectly\nmall\ntransit\n##ye\n##kar\npublisher\n##ened\nharrison\ndeaths\nelevation\n##ae\nasleep\nmachines\nsigh\nash\nhardly\nargument\noccasi
on\nparent\nleo\ndecline\n1888\ncontribution\n##ua\nconcentration\n1000\nopportunities\nhispanic\nguardian\nextent\nemotions\nhips\nmason\nvolumes\nbloody\ncontroversy\ndiameter\nsteady\nmistake\nphoenix\nidentify\nviolin\n##sk\ndeparture\nrichmond\nspin\nfuneral\nenemies\n1864\ngear\nliterally\nconnor\nrandom\nsergeant\ngrab\nconfusion\n1865\ntransmission\ninformed\nop\nleaning\nsacred\nsuspended\nthinks\ngates\nportland\nluck\nagencies\nyours\nhull\nexpert\nmuscle\nlayer\npractical\nsculpture\njerusalem\nlatest\nlloyd\nstatistics\ndeeper\nrecommended\nwarrior\narkansas\nmess\nsupports\ngreg\neagle\n1880\nrecovered\nrated\nconcerts\nrushed\n##ano\nstops\neggs\nfiles\npremiere\nkeith\n##vo\ndelhi\nturner\npit\naffair\nbelief\npaint\n##zing\nmate\n##ach\n##ev\nvictim\n##ology\nwithdrew\nbonus\nstyles\nfled\n##ud\nglasgow\ntechnologies\nfunded\nnbc\nadaptation\n##ata\nportrayed\ncooperation\nsupporters\njudges\nbernard\njustin\nhallway\nralph\n##ick\ngraduating\ncontroversial\ndistant\ncontinental\nspider\nbite\n##ho\nrecognize\nintention\nmixing\n##ese\negyptian\nbow\ntourism\nsuppose\nclaiming\ntiger\ndominated\nparticipants\nvi\n##ru\nnurse\npartially\ntape\n##rum\npsychology\n##rn\nessential\ntouring\nduo\nvoting\ncivilian\nemotional\nchannels\n##king\napparent\nhebrew\n1887\ntommy\ncarrier\nintersection\nbeast\nhudson\n##gar\n##zo\nlab\nnova\nbench\ndiscuss\ncosta\n##ered\ndetailed\nbehalf\ndrivers\nunfortunately\nobtain\n##lis\nrocky\n##dae\nsiege\nfriendship\nhoney\n##rian\n1861\namy\nhang\nposted\ngovernments\ncollins\nrespond\nwildlife\npreferred\noperator\n##po\nlaura\npregnant\nvideos\ndennis\nsuspected\nboots\ninstantly\nweird\nautomatic\nbusinessman\nalleged\nplacing\nthrowing\nph\nmood\n1862\nperry\nvenue\njet\nremainder\n##lli\n##ci\npassion\nbiological\nboyfriend\n1863\ndirt\nbuffalo\nron\nsegment\nfa\nabuse\n##era\ngenre\nthrown\nstroke\ncolored\nstress\nexercise\ndisplayed\n##gen\nstruggled\n##tti\nabroad\ndramatic\nwonderful\nthereafter\nmadrid\ncomponent\nwidespread\n##sed\ntale\ncitizen\ntodd\nmonday\n1886\nvancouver\noverseas\nforcing\ncrying\ndescent\n##ris\ndiscussed\nsubstantial\nranks\nregime\n1870\nprovinces\nswitch\ndrum\nzane\nted\ntribes\nproof\nlp\ncream\nresearchers\nvolunteer\nmanor\nsilk\nmilan\ndonated\nallies\nventure\nprinciple\ndelivery\nenterprise\n##ves\n##ans\nbars\ntraditionally\nwitch\nreminded\ncopper\n##uk\npete\ninter\nlinks\ncolin\ngrinned\nelsewhere\ncompetitive\nfrequent\n##oy\nscream\n##hu\ntension\ntexts\nsubmarine\nfinnish\ndefending\ndefend\npat\ndetail\n1884\naffiliated\nstuart\nthemes\nvilla\nperiods\ntool\nbelgian\nruling\ncrimes\nanswers\nfolded\nlicensed\nresort\ndemolished\nhans\nlucy\n1881\nlion\ntraded\nphotographs\nwrites\ncraig\n##fa\ntrials\ngenerated\nbeth\nnoble\ndebt\npercentage\nyorkshire\nerected\nss\nviewed\ngrades\nconfidence\nceased\nislam\ntelephone\nretail\n##ible\nchile\nm²\nroberts\nsixteen\n##ich\ncommented\nhampshire\ninnocent\ndual\npounds\nchecked\nregulations\nafghanistan\nsung\nrico\nliberty\nassets\nbigger\noptions\nangels\nrelegated\ntribute\nwells\nattending\nleaf\n##yan\nbutler\nromanian\nforum\nmonthly\nlisa\npatterns\ngmina\n##tory\nmadison\nhurricane\nrev\n##ians\nbristol\n##ula\nelite\nvaluable\ndisaster\ndemocracy\nawareness\ngermans\nfreyja\n##ins\nloop\nabsolutely\npaying\npopulations\nmaine\nsole\nprayer\nspencer\nreleases\ndoorway\nbull\n##ani\nlover\nmidnight\nconclusion\n##sson\nthirteen\nlily\nmediterranean\n##lt\nnhl\nproud\nsample\n##hill\ndrummer\nguinea\n##ova\nmurphy\nclimb\n##ston\ninstant\
nattributed\nhorn\nain\nrailways\nsteven\n##ao\nautumn\nferry\nopponent\nroot\ntraveling\nsecured\ncorridor\nstretched\ntales\nsheet\ntrinity\ncattle\nhelps\nindicates\nmanhattan\nmurdered\nfitted\n1882\ngentle\ngrandmother\nmines\nshocked\nvegas\nproduces\n##light\ncaribbean\n##ou\nbelong\ncontinuous\ndesperate\ndrunk\nhistorically\ntrio\nwaved\nraf\ndealing\nnathan\nbat\nmurmured\ninterrupted\nresiding\nscientist\npioneer\nharold\naaron\n##net\ndelta\nattempting\nminority\nmini\nbelieves\nchorus\ntend\nlots\neyed\nindoor\nload\nshots\nupdated\njail\n##llo\nconcerning\nconnecting\nwealth\n##ved\nslaves\narrive\nrangers\nsufficient\nrebuilt\n##wick\ncardinal\nflood\nmuhammad\nwhenever\nrelation\nrunners\nmoral\nrepair\nviewers\narriving\nrevenge\npunk\nassisted\nbath\nfairly\nbreathe\nlists\ninnings\nillustrated\nwhisper\nnearest\nvoters\nclinton\nties\nultimate\nscreamed\nbeijing\nlions\nandre\nfictional\ngathering\ncomfort\nradar\nsuitable\ndismissed\nhms\nban\npine\nwrist\natmosphere\nvoivodeship\nbid\ntimber\n##ned\n##nan\ngiants\n##ane\ncameron\nrecovery\nuss\nidentical\ncategories\nswitched\nserbia\nlaughter\nnoah\nensemble\ntherapy\npeoples\ntouching\n##off\nlocally\npearl\nplatforms\neverywhere\nballet\ntables\nlanka\nherbert\noutdoor\ntoured\nderek\n1883\nspaces\ncontested\nswept\n1878\nexclusive\nslight\nconnections\n##dra\nwinds\nprisoner\ncollective\nbangladesh\ntube\npublicly\nwealthy\nthai\n##ys\nisolated\nselect\n##ric\ninsisted\npen\nfortune\nticket\nspotted\nreportedly\nanimation\nenforcement\ntanks\n110\ndecides\nwider\nlowest\nowen\n##time\nnod\nhitting\n##hn\ngregory\nfurthermore\nmagazines\nfighters\nsolutions\n##ery\npointing\nrequested\nperu\nreed\nchancellor\nknights\nmask\nworker\neldest\nflames\nreduction\n1860\nvolunteers\n##tis\nreporting\n##hl\nwire\nadvisory\nendemic\norigins\nsettlers\npursue\nknock\nconsumer\n1876\neu\ncompound\ncreatures\nmansion\nsentenced\nivan\ndeployed\nguitars\nfrowned\ninvolves\nmechanism\nkilometers\nperspective\nshops\nmaps\nterminus\nduncan\nalien\nfist\nbridges\n##pers\nheroes\nfed\nderby\nswallowed\n##ros\npatent\nsara\nillness\ncharacterized\nadventures\nslide\nhawaii\njurisdiction\n##op\norganised\n##side\nadelaide\nwalks\nbiology\nse\n##ties\nrogers\nswing\ntightly\nboundaries\n##rie\nprepare\nimplementation\nstolen\n##sha\ncertified\ncolombia\nedwards\ngarage\n##mm\nrecalled\n##ball\nrage\nharm\nnigeria\nbreast\n##ren\nfurniture\npupils\nsettle\n##lus\ncuba\nballs\nclient\nalaska\n21st\nlinear\nthrust\ncelebration\nlatino\ngenetic\nterror\n##cia\n##ening\nlightning\nfee\nwitness\nlodge\nestablishing\nskull\n##ique\nearning\nhood\n##ei\nrebellion\nwang\nsporting\nwarned\nmissile\ndevoted\nactivist\nporch\nworship\nfourteen\npackage\n1871\ndecorated\n##shire\nhoused\n##ock\nchess\nsailed\ndoctors\noscar\njoan\ntreat\ngarcia\nharbour\njeremy\n##ire\ntraditions\ndominant\njacques\n##gon\n##wan\nrelocated\n1879\namendment\nsized\ncompanion\nsimultaneously\nvolleyball\nspun\nacre\nincreases\nstopping\nloves\nbelongs\naffect\ndrafted\ntossed\nscout\nbattles\n1875\nfilming\nshoved\nmunich\ntenure\nvertical\nromance\npc\n##cher\nargue\n##ical\ncraft\nranging\nwww\nopens\nhonest\ntyler\nyesterday\nvirtual\n##let\nmuslims\nreveal\nsnake\nimmigrants\nradical\nscreaming\nspeakers\nfiring\nsaving\nbelonging\nease\nlighting\nprefecture\nblame\nfarmer\nhungry\ngrows\nrubbed\nbeam\nsur\nsubsidiary\n##cha\narmenian\nsao\ndropping\nconventional\n##fer\nmicrosoft\nreply\nqualify\nspots\n1867\nsweat\nfestivals\n##ken\nimmigration\nphysician\ndisco
ver\nexposure\nsandy\nexplanation\nisaac\nimplemented\n##fish\nhart\ninitiated\nconnect\nstakes\npresents\nheights\nhouseholder\npleased\ntourist\nregardless\nslip\nclosest\n##ction\nsurely\nsultan\nbrings\nriley\npreparation\naboard\nslammed\nbaptist\nexperiment\nongoing\ninterstate\norganic\nplayoffs\n##ika\n1877\n130\n##tar\nhindu\nerror\ntours\ntier\nplenty\narrangements\ntalks\ntrapped\nexcited\nsank\nho\nathens\n1872\ndenver\nwelfare\nsuburb\nathletes\ntrick\ndiverse\nbelly\nexclusively\nyelled\n1868\n##med\nconversion\n##ette\n1874\ninternationally\ncomputers\nconductor\nabilities\nsensitive\nhello\ndispute\nmeasured\nglobe\nrocket\nprices\namsterdam\nflights\ntigers\ninn\nmunicipalities\nemotion\nreferences\n3d\n##mus\nexplains\nairlines\nmanufactured\npm\narchaeological\n1873\ninterpretation\ndevon\ncomment\n##ites\nsettlements\nkissing\nabsolute\nimprovement\nsuite\nimpressed\nbarcelona\nsullivan\njefferson\ntowers\njesse\njulie\n##tin\n##lu\ngrandson\nhi\ngauge\nregard\nrings\ninterviews\ntrace\nraymond\nthumb\ndepartments\nburns\nserial\nbulgarian\nscores\ndemonstrated\n##ix\n1866\nkyle\nalberta\nunderneath\nromanized\n##ward\nrelieved\nacquisition\nphrase\ncliff\nreveals\nhan\ncuts\nmerger\ncustom\n##dar\nnee\ngilbert\ngraduation\n##nts\nassessment\ncafe\ndifficulty\ndemands\nswung\ndemocrat\njennifer\ncommons\n1940s\ngrove\n##yo\ncompleting\nfocuses\nsum\nsubstitute\nbearing\nstretch\nreception\n##py\nreflected\nessentially\ndestination\npairs\n##ched\nsurvival\nresource\n##bach\npromoting\ndoubles\nmessages\ntear\n##down\n##fully\nparade\nflorence\nharvey\nincumbent\npartial\nframework\n900\npedro\nfrozen\nprocedure\nolivia\ncontrols\n##mic\nshelter\npersonally\ntemperatures\n##od\nbrisbane\ntested\nsits\nmarble\ncomprehensive\noxygen\nleonard\n##kov\ninaugural\niranian\nreferring\nquarters\nattitude\n##ivity\nmainstream\nlined\nmars\ndakota\nnorfolk\nunsuccessful\n##°\nexplosion\nhelicopter\ncongressional\n##sing\ninspector\nbitch\nseal\ndeparted\ndivine\n##ters\ncoaching\nexamination\npunishment\nmanufacturer\nsink\ncolumns\nunincorporated\nsignals\nnevada\nsqueezed\ndylan\ndining\nphotos\nmartial\nmanuel\neighteen\nelevator\nbrushed\nplates\nministers\nivy\ncongregation\n##len\nslept\nspecialized\ntaxes\ncurve\nrestricted\nnegotiations\nlikes\nstatistical\narnold\ninspiration\nexecution\nbold\nintermediate\nsignificance\nmargin\nruler\nwheels\ngothic\nintellectual\ndependent\nlistened\neligible\nbuses\nwidow\nsyria\nearn\ncincinnati\ncollapsed\nrecipient\nsecrets\naccessible\nphilippine\nmaritime\ngoddess\nclerk\nsurrender\nbreaks\nplayoff\ndatabase\n##ified\n##lon\nideal\nbeetle\naspect\nsoap\nregulation\nstrings\nexpand\nanglo\nshorter\ncrosses\nretreat\ntough\ncoins\nwallace\ndirections\npressing\n##oon\nshipping\nlocomotives\ncomparison\ntopics\nnephew\n##mes\ndistinction\nhonors\ntravelled\nsierra\nibn\n##over\nfortress\nsa\nrecognised\ncarved\n1869\nclients\n##dan\nintent\n##mar\ncoaches\ndescribing\nbread\n##ington\nbeaten\nnorthwestern\n##ona\nmerit\nyoutube\ncollapse\nchallenges\nem\nhistorians\nobjective\nsubmitted\nvirus\nattacking\ndrake\nassume\n##ere\ndiseases\nmarc\nstem\nleeds\n##cus\n##ab\nfarming\nglasses\n##lock\nvisits\nnowhere\nfellowship\nrelevant\ncarries\nrestaurants\nexperiments\n101\nconstantly\nbases\ntargets\nshah\ntenth\nopponents\nverse\nterritorial\n##ira\nwritings\ncorruption\n##hs\ninstruction\ninherited\nreverse\nemphasis\n##vic\nemployee\narch\nkeeps\nrabbi\nwatson\npayment\nuh\n##ala\nnancy\n##tre\nvenice\nfastest\nsexy\nbanned\nadrian\n
properly\nruth\ntouchdown\ndollar\nboards\nmetre\ncircles\nedges\nfavour\ncomments\nok\ntravels\nliberation\nscattered\nfirmly\n##ular\nholland\npermitted\ndiesel\nkenya\nden\noriginated\n##ral\ndemons\nresumed\ndragged\nrider\n##rus\nservant\nblinked\nextend\ntorn\n##ias\n##sey\ninput\nmeal\neverybody\ncylinder\nkinds\ncamps\n##fe\nbullet\nlogic\n##wn\ncroatian\nevolved\nhealthy\nfool\nchocolate\nwise\npreserve\npradesh\n##ess\nrespective\n1850\n##ew\nchicken\nartificial\ngross\ncorresponding\nconvicted\ncage\ncaroline\ndialogue\n##dor\nnarrative\nstranger\nmario\nbr\nchristianity\nfailing\ntrent\ncommanding\nbuddhist\n1848\nmaurice\nfocusing\nyale\nbike\naltitude\n##ering\nmouse\nrevised\n##sley\nveteran\n##ig\npulls\ntheology\ncrashed\ncampaigns\nlegion\n##ability\ndrag\nexcellence\ncustomer\ncancelled\nintensity\nexcuse\n##lar\nliga\nparticipating\ncontributing\nprinting\n##burn\nvariable\n##rk\ncurious\nbin\nlegacy\nrenaissance\n##my\nsymptoms\nbinding\nvocalist\ndancer\n##nie\ngrammar\ngospel\ndemocrats\nya\nenters\nsc\ndiplomatic\nhitler\n##ser\nclouds\nmathematical\nquit\ndefended\noriented\n##heim\nfundamental\nhardware\nimpressive\nequally\nconvince\nconfederate\nguilt\nchuck\nsliding\n##ware\nmagnetic\nnarrowed\npetersburg\nbulgaria\notto\nphd\nskill\n##ama\nreader\nhopes\npitcher\nreservoir\nhearts\nautomatically\nexpecting\nmysterious\nbennett\nextensively\nimagined\nseeds\nmonitor\nfix\n##ative\njournalism\nstruggling\nsignature\nranch\nencounter\nphotographer\nobservation\nprotests\n##pin\ninfluences\n##hr\ncalendar\n##all\ncruz\ncroatia\nlocomotive\nhughes\nnaturally\nshakespeare\nbasement\nhook\nuncredited\nfaded\ntheories\napproaches\ndare\nphillips\nfilling\nfury\nobama\n##ain\nefficient\narc\ndeliver\nmin\nraid\nbreeding\ninducted\nleagues\nefficiency\naxis\nmontana\neagles\n##ked\nsupplied\ninstructions\nkaren\npicking\nindicating\ntrap\nanchor\npractically\nchristians\ntomb\nvary\noccasional\nelectronics\nlords\nreaders\nnewcastle\nfaint\ninnovation\ncollect\nsituations\nengagement\n160\nclaude\nmixture\n##feld\npeer\ntissue\nlogo\nlean\n##ration\n°f\nfloors\n##ven\narchitects\nreducing\n##our\n##ments\nrope\n1859\nottawa\n##har\nsamples\nbanking\ndeclaration\nproteins\nresignation\nfrancois\nsaudi\nadvocate\nexhibited\narmor\ntwins\ndivorce\n##ras\nabraham\nreviewed\njo\ntemporarily\nmatrix\nphysically\npulse\ncurled\n##ena\ndifficulties\nbengal\nusage\n##ban\nannie\nriders\ncertificate\n##pi\nholes\nwarsaw\ndistinctive\njessica\n##mon\nmutual\n1857\ncustoms\ncircular\neugene\nremoval\nloaded\nmere\nvulnerable\ndepicted\ngenerations\ndame\nheir\nenormous\nlightly\nclimbing\npitched\nlessons\npilots\nnepal\nram\ngoogle\npreparing\nbrad\nlouise\nrenowned\n##₂\nliam\n##ably\nplaza\nshaw\nsophie\nbrilliant\nbills\n##bar\n##nik\nfucking\nmainland\nserver\npleasant\nseized\nveterans\njerked\nfail\nbeta\nbrush\nradiation\nstored\nwarmth\nsoutheastern\nnate\nsin\nraced\nberkeley\njoke\nathlete\ndesignation\ntrunk\n##low\nroland\nqualification\narchives\nheels\nartwork\nreceives\njudicial\nreserves\n##bed\nwoke\ninstallation\nabu\nfloating\nfake\nlesser\nexcitement\ninterface\nconcentrated\naddressed\ncharacteristic\namanda\nsaxophone\nmonk\nauto\n##bus\nreleasing\negg\ndies\ninteraction\ndefender\nce\noutbreak\nglory\nloving\n##bert\nsequel\nconsciousness\nhttp\nawake\nski\nenrolled\n##ress\nhandling\nrookie\nbrow\nsomebody\nbiography\nwarfare\namounts\ncontracts\npresentation\nfabric\ndissolved\nchallenged\nmeter\npsychological\nlt\nelevated\nrally\naccurate\n##tha\nhospital
s\nundergraduate\nspecialist\nvenezuela\nexhibit\nshed\nnursing\nprotestant\nfluid\nstructural\nfootage\njared\nconsistent\nprey\n##ska\nsuccession\nreflect\nexile\nlebanon\nwiped\nsuspect\nshanghai\nresting\nintegration\npreservation\nmarvel\nvariant\npirates\nsheep\nrounded\ncapita\nsailing\ncolonies\nmanuscript\ndeemed\nvariations\nclarke\nfunctional\nemerging\nboxing\nrelaxed\ncurse\nazerbaijan\nheavyweight\nnickname\neditorial\nrang\ngrid\ntightened\nearthquake\nflashed\nmiguel\nrushing\n##ches\nimprovements\nboxes\nbrooks\n180\nconsumption\nmolecular\nfelix\nsocieties\nrepeatedly\nvariation\naids\ncivic\ngraphics\nprofessionals\nrealm\nautonomous\nreceiver\ndelayed\nworkshop\nmilitia\nchairs\ntrump\ncanyon\n##point\nharsh\nextending\nlovely\nhappiness\n##jan\nstake\neyebrows\nembassy\nwellington\nhannah\n##ella\nsony\ncorners\nbishops\nswear\ncloth\ncontents\nxi\nnamely\ncommenced\n1854\nstanford\nnashville\ncourage\ngraphic\ncommitment\ngarrison\n##bin\nhamlet\nclearing\nrebels\nattraction\nliteracy\ncooking\nruins\ntemples\njenny\nhumanity\ncelebrate\nhasn\nfreight\nsixty\nrebel\nbastard\n##art\nnewton\n##ada\ndeer\n##ges\n##ching\nsmiles\ndelaware\nsingers\n##ets\napproaching\nassists\nflame\n##ph\nboulevard\nbarrel\nplanted\n##ome\npursuit\n##sia\nconsequences\nposts\nshallow\ninvitation\nrode\ndepot\nernest\nkane\nrod\nconcepts\npreston\ntopic\nchambers\nstriking\nblast\narrives\ndescendants\nmontgomery\nranges\nworlds\n##lay\n##ari\nspan\nchaos\npraise\n##ag\nfewer\n1855\nsanctuary\nmud\nfbi\n##ions\nprogrammes\nmaintaining\nunity\nharper\nbore\nhandsome\nclosure\ntournaments\nthunder\nnebraska\nlinda\nfacade\nputs\nsatisfied\nargentine\ndale\ncork\ndome\npanama\n##yl\n1858\ntasks\nexperts\n##ates\nfeeding\nequation\n##las\n##ida\n##tu\nengage\nbryan\n##ax\num\nquartet\nmelody\ndisbanded\nsheffield\nblocked\ngasped\ndelay\nkisses\nmaggie\nconnects\n##non\nsts\npoured\ncreator\npublishers\n##we\nguided\nellis\nextinct\nhug\ngaining\n##ord\ncomplicated\n##bility\npoll\nclenched\ninvestigate\n##use\nthereby\nquantum\nspine\ncdp\nhumor\nkills\nadministered\nsemifinals\n##du\nencountered\nignore\n##bu\ncommentary\n##maker\nbother\nroosevelt\n140\nplains\nhalfway\nflowing\ncultures\ncrack\nimprisoned\nneighboring\nairline\n##ses\n##view\n##mate\n##ec\ngather\nwolves\nmarathon\ntransformed\n##ill\ncruise\norganisations\ncarol\npunch\nexhibitions\nnumbered\nalarm\nratings\ndaddy\nsilently\n##stein\nqueens\ncolours\nimpression\nguidance\nliu\ntactical\n##rat\nmarshal\ndella\narrow\n##ings\nrested\nfeared\ntender\nowns\nbitter\nadvisor\nescort\n##ides\nspare\nfarms\ngrants\n##ene\ndragons\nencourage\ncolleagues\ncameras\n##und\nsucked\npile\nspirits\nprague\nstatements\nsuspension\nlandmark\nfence\ntorture\nrecreation\nbags\npermanently\nsurvivors\npond\nspy\npredecessor\nbombing\ncoup\n##og\nprotecting\ntransformation\nglow\n##lands\n##book\ndug\npriests\nandrea\nfeat\nbarn\njumping\n##chen\n##ologist\n##con\ncasualties\nstern\nauckland\npipe\nserie\nrevealing\nba\n##bel\ntrevor\nmercy\nspectrum\nyang\nconsist\ngoverning\ncollaborated\npossessed\nepic\ncomprises\nblew\nshane\n##ack\nlopez\nhonored\nmagical\nsacrifice\njudgment\nperceived\nhammer\nmtv\nbaronet\ntune\ndas\nmissionary\nsheets\n350\nneutral\noral\nthreatening\nattractive\nshade\naims\nseminary\n##master\nestates\n1856\nmichel\nwounds\nrefugees\nmanufacturers\n##nic\nmercury\nsyndrome\nporter\n##iya\n##din\nhamburg\nidentification\nupstairs\npurse\nwidened\npause\ncared\nbreathed\naffiliate\nsantiago\nprevented\nceltic\nfishe
r\n125\nrecruited\nbyzantine\nreconstruction\nfarther\n##mp\ndiet\nsake\nau\nspite\nsensation\n##ert\nblank\nseparation\n105\n##hon\nvladimir\narmies\nanime\n##lie\naccommodate\norbit\ncult\nsofia\narchive\n##ify\n##box\nfounders\nsustained\ndisorder\nhonours\nnortheastern\nmia\ncrops\nviolet\nthreats\nblanket\nfires\ncanton\nfollowers\nsouthwestern\nprototype\nvoyage\nassignment\naltered\nmoderate\nprotocol\npistol\n##eo\nquestioned\nbrass\nlifting\n1852\nmath\nauthored\n##ual\ndoug\ndimensional\ndynamic\n##san\n1851\npronounced\ngrateful\nquest\nuncomfortable\nboom\npresidency\nstevens\nrelating\npoliticians\nchen\nbarrier\nquinn\ndiana\nmosque\ntribal\ncheese\npalmer\nportions\nsometime\nchester\ntreasure\nwu\nbend\ndownload\nmillions\nreforms\nregistration\n##osa\nconsequently\nmonitoring\nate\npreliminary\nbrandon\ninvented\nps\neaten\nexterior\nintervention\nports\ndocumented\nlog\ndisplays\nlecture\nsally\nfavourite\n##itz\nvermont\nlo\ninvisible\nisle\nbreed\n##ator\njournalists\nrelay\nspeaks\nbackward\nexplore\nmidfielder\nactively\nstefan\nprocedures\ncannon\nblond\nkenneth\ncentered\nservants\nchains\nlibraries\nmalcolm\nessex\nhenri\nslavery\n##hal\nfacts\nfairy\ncoached\ncassie\ncats\nwashed\ncop\n##fi\nannouncement\nitem\n2000s\nvinyl\nactivated\nmarco\nfrontier\ngrowled\ncurriculum\n##das\nloyal\naccomplished\nleslie\nritual\nkenny\n##00\nvii\nnapoleon\nhollow\nhybrid\njungle\nstationed\nfriedrich\ncounted\n##ulated\nplatinum\ntheatrical\nseated\ncol\nrubber\nglen\n1840\ndiversity\nhealing\nextends\nid\nprovisions\nadministrator\ncolumbus\n##oe\ntributary\nte\nassured\norg\n##uous\nprestigious\nexamined\nlectures\ngrammy\nronald\nassociations\nbailey\nallan\nessays\nflute\nbelieving\nconsultant\nproceedings\ntravelling\n1853\nkit\nkerala\nyugoslavia\nbuddy\nmethodist\n##ith\nburial\ncentres\nbatman\n##nda\ndiscontinued\nbo\ndock\nstockholm\nlungs\nseverely\n##nk\nciting\nmanga\n##ugh\nsteal\nmumbai\niraqi\nrobot\ncelebrity\nbride\nbroadcasts\nabolished\npot\njoel\noverhead\nfranz\npacked\nreconnaissance\njohann\nacknowledged\nintroduce\nhandled\ndoctorate\ndevelopments\ndrinks\nalley\npalestine\n##nis\n##aki\nproceeded\nrecover\nbradley\ngrain\npatch\nafford\ninfection\nnationalist\nlegendary\n##ath\ninterchange\nvirtually\ngen\ngravity\nexploration\namber\nvital\nwishes\npowell\ndoctrine\nelbow\nscreenplay\n##bird\ncontribute\nindonesian\npet\ncreates\n##com\nenzyme\nkylie\ndiscipline\ndrops\nmanila\nhunger\n##ien\nlayers\nsuffer\nfever\nbits\nmonica\nkeyboard\nmanages\n##hood\nsearched\nappeals\n##bad\ntestament\ngrande\nreid\n##war\nbeliefs\ncongo\n##ification\n##dia\nsi\nrequiring\n##via\ncasey\n1849\nregret\nstreak\nrape\ndepends\nsyrian\nsprint\npound\ntourists\nupcoming\npub\n##xi\ntense\n##els\npracticed\necho\nnationwide\nguild\nmotorcycle\nliz\n##zar\nchiefs\ndesired\nelena\nbye\nprecious\nabsorbed\nrelatives\nbooth\npianist\n##mal\ncitizenship\nexhausted\nwilhelm\n##ceae\n##hed\nnoting\nquarterback\nurge\nhectares\n##gue\nace\nholly\n##tal\nblonde\ndavies\nparked\nsustainable\nstepping\ntwentieth\nairfield\ngalaxy\nnest\nchip\n##nell\ntan\nshaft\npaulo\nrequirement\n##zy\nparadise\ntobacco\ntrans\nrenewed\nvietnamese\n##cker\n##ju\nsuggesting\ncatching\nholmes\nenjoying\nmd\ntrips\ncolt\nholder\nbutterfly\nnerve\nreformed\ncherry\nbowling\ntrailer\ncarriage\ngoodbye\nappreciate\ntoy\njoshua\ninteractive\nenabled\ninvolve\n##kan\ncollar\ndetermination\nbunch\nfacebook\nrecall\nshorts\nsuperintendent\nepiscopal\nfrustration\ngiovanni\nnineteenth\nlaser\nprivately\na
rray\ncirculation\n##ovic\narmstrong\ndeals\npainful\npermit\ndiscrimination\n##wi\naires\nretiring\ncottage\nni\n##sta\nhorizon\nellen\njamaica\nripped\nfernando\nchapters\nplaystation\npatron\nlecturer\nnavigation\nbehaviour\ngenes\ngeorgian\nexport\nsolomon\nrivals\nswift\nseventeen\nrodriguez\nprinceton\nindependently\nsox\n1847\narguing\nentity\ncasting\nhank\ncriteria\noakland\ngeographic\nmilwaukee\nreflection\nexpanding\nconquest\ndubbed\n##tv\nhalt\nbrave\nbrunswick\ndoi\narched\ncurtis\ndivorced\npredominantly\nsomerset\nstreams\nugly\nzoo\nhorrible\ncurved\nbuenos\nfierce\ndictionary\nvector\ntheological\nunions\nhandful\nstability\nchan\npunjab\nsegments\n##lly\naltar\nignoring\ngesture\nmonsters\npastor\n##stone\nthighs\nunexpected\noperators\nabruptly\ncoin\ncompiled\nassociates\nimproving\nmigration\npin\n##ose\ncompact\ncollegiate\nreserved\n##urs\nquarterfinals\nroster\nrestore\nassembled\nhurry\noval\n##cies\n1846\nflags\nmartha\n##del\nvictories\nsharply\n##rated\nargues\ndeadly\nneo\ndrawings\nsymbols\nperformer\n##iel\ngriffin\nrestrictions\nediting\nandrews\njava\njournals\narabia\ncompositions\ndee\npierce\nremoving\nhindi\ncasino\nrunway\ncivilians\nminds\nnasa\nhotels\n##zation\nrefuge\nrent\nretain\npotentially\nconferences\nsuburban\nconducting\n##tto\n##tions\n##tle\ndescended\nmassacre\n##cal\nammunition\nterrain\nfork\nsouls\ncounts\nchelsea\ndurham\ndrives\ncab\n##bank\nperth\nrealizing\npalestinian\nfinn\nsimpson\n##dal\nbetty\n##ule\nmoreover\nparticles\ncardinals\ntent\nevaluation\nextraordinary\n##oid\ninscription\n##works\nwednesday\nchloe\nmaintains\npanels\nashley\ntrucks\n##nation\ncluster\nsunlight\nstrikes\nzhang\n##wing\ndialect\ncanon\n##ap\ntucked\n##ws\ncollecting\n##mas\n##can\n##sville\nmaker\nquoted\nevan\nfranco\naria\nbuying\ncleaning\neva\ncloset\nprovision\napollo\nclinic\nrat\n##ez\nnecessarily\nac\n##gle\n##ising\nvenues\nflipped\ncent\nspreading\ntrustees\nchecking\nauthorized\n##sco\ndisappointed\n##ado\nnotion\nduration\ntrumpet\nhesitated\ntopped\nbrussels\nrolls\ntheoretical\nhint\ndefine\naggressive\nrepeat\nwash\npeaceful\noptical\nwidth\nallegedly\nmcdonald\nstrict\ncopyright\n##illa\ninvestors\nmar\njam\nwitnesses\nsounding\nmiranda\nmichelle\nprivacy\nhugo\nharmony\n##pp\nvalid\nlynn\nglared\nnina\n102\nheadquartered\ndiving\nboarding\ngibson\n##ncy\nalbanian\nmarsh\nroutine\ndealt\nenhanced\ner\nintelligent\nsubstance\ntargeted\nenlisted\ndiscovers\nspinning\nobservations\npissed\nsmoking\nrebecca\ncapitol\nvisa\nvaried\ncostume\nseemingly\nindies\ncompensation\nsurgeon\nthursday\narsenal\nwestminster\nsuburbs\nrid\nanglican\n##ridge\nknots\nfoods\nalumni\nlighter\nfraser\nwhoever\nportal\nscandal\n##ray\ngavin\nadvised\ninstructor\nflooding\nterrorist\n##ale\nteenage\ninterim\nsenses\nduck\nteen\nthesis\nabby\neager\novercome\n##ile\nnewport\nglenn\nrises\nshame\n##cc\nprompted\npriority\nforgot\nbomber\nnicolas\nprotective\n360\ncartoon\nkatherine\nbreeze\nlonely\ntrusted\nhenderson\nrichardson\nrelax\nbanner\ncandy\npalms\nremarkable\n##rio\nlegends\ncricketer\nessay\nordained\nedmund\nrifles\ntrigger\n##uri\n##away\nsail\nalert\n1830\naudiences\npenn\nsussex\nsiblings\npursued\nindianapolis\nresist\nrosa\nconsequence\nsucceed\navoided\n1845\n##ulation\ninland\n##tie\n##nna\ncounsel\nprofession\nchronicle\nhurried\n##una\neyebrow\neventual\nbleeding\ninnovative\ncure\n##dom\ncommittees\naccounting\ncon\nscope\nhardy\nheather\ntenor\ngut\nherald\ncodes\ntore\nscales\nwagon\n##oo\nluxury\ntin\nprefer\nfountain\ntriangle\nbond
s\ndarling\nconvoy\ndried\ntraced\nbeings\ntroy\naccidentally\nslam\nfindings\nsmelled\njoey\nlawyers\noutcome\nsteep\nbosnia\nconfiguration\nshifting\ntoll\nbrook\nperformers\nlobby\nphilosophical\nconstruct\nshrine\naggregate\nboot\ncox\nphenomenon\nsavage\ninsane\nsolely\nreynolds\nlifestyle\n##ima\nnationally\nholdings\nconsideration\nenable\nedgar\nmo\nmama\n##tein\nfights\nrelegation\nchances\natomic\nhub\nconjunction\nawkward\nreactions\ncurrency\nfinale\nkumar\nunderwent\nsteering\nelaborate\ngifts\ncomprising\nmelissa\nveins\nreasonable\nsunshine\nchi\nsolve\ntrails\ninhabited\nelimination\nethics\nhuh\nana\nmolly\nconsent\napartments\nlayout\nmarines\n##ces\nhunters\nbulk\n##oma\nhometown\n##wall\n##mont\ncracked\nreads\nneighbouring\nwithdrawn\nadmission\nwingspan\ndamned\nanthology\nlancashire\nbrands\nbatting\nforgive\ncuban\nawful\n##lyn\n104\ndimensions\nimagination\n##ade\ndante\n##ship\ntracking\ndesperately\ngoalkeeper\n##yne\ngroaned\nworkshops\nconfident\nburton\ngerald\nmilton\ncircus\nuncertain\nslope\ncopenhagen\nsophia\nfog\nphilosopher\nportraits\naccent\ncycling\nvarying\ngripped\nlarvae\ngarrett\nspecified\nscotia\nmature\nluther\nkurt\nrap\n##kes\naerial\n750\nferdinand\nheated\nes\ntransported\n##shan\nsafely\nnonetheless\n##orn\n##gal\nmotors\ndemanding\n##sburg\nstartled\n##brook\nally\ngenerate\ncaps\nghana\nstained\ndemo\nmentions\nbeds\nap\nafterward\ndiary\n##bling\nutility\n##iro\nrichards\n1837\nconspiracy\nconscious\nshining\nfootsteps\nobserver\ncyprus\nurged\nloyalty\ndeveloper\nprobability\nolive\nupgraded\ngym\nmiracle\ninsects\ngraves\n1844\nourselves\nhydrogen\namazon\nkatie\ntickets\npoets\n##pm\nplanes\n##pan\nprevention\nwitnessed\ndense\njin\nrandy\ntang\nwarehouse\nmonroe\nbang\narchived\nelderly\ninvestigations\nalec\ngranite\nmineral\nconflicts\ncontrolling\naboriginal\ncarlo\n##zu\nmechanics\nstan\nstark\nrhode\nskirt\nest\n##berry\nbombs\nrespected\n##horn\nimposed\nlimestone\ndeny\nnominee\nmemphis\ngrabbing\ndisabled\n##als\namusement\naa\nfrankfurt\ncorn\nreferendum\nvaries\nslowed\ndisk\nfirms\nunconscious\nincredible\nclue\nsue\n##zhou\ntwist\n##cio\njoins\nidaho\nchad\ndevelopers\ncomputing\ndestroyer\n103\nmortal\ntucker\nkingston\nchoices\nyu\ncarson\n1800\nos\nwhitney\ngeneva\npretend\ndimension\nstaged\nplateau\nmaya\n##une\nfreestyle\n##bc\nrovers\nhiv\n##ids\ntristan\nclassroom\nprospect\n##hus\nhonestly\ndiploma\nlied\nthermal\nauxiliary\nfeast\nunlikely\niata\n##tel\nmorocco\npounding\ntreasury\nlithuania\nconsiderably\n1841\ndish\n1812\ngeological\nmatching\nstumbled\ndestroying\nmarched\nbrien\nadvances\ncake\nnicole\nbelle\nsettling\nmeasuring\ndirecting\n##mie\ntuesday\nbassist\ncapabilities\nstunned\nfraud\ntorpedo\n##list\n##phone\nanton\nwisdom\nsurveillance\nruined\n##ulate\nlawsuit\nhealthcare\ntheorem\nhalls\ntrend\naka\nhorizontal\ndozens\nacquire\nlasting\nswim\nhawk\ngorgeous\nfees\nvicinity\ndecrease\nadoption\ntactics\n##ography\npakistani\n##ole\ndraws\n##hall\nwillie\nburke\nheath\nalgorithm\nintegral\npowder\nelliott\nbrigadier\njackie\ntate\nvarieties\ndarker\n##cho\nlately\ncigarette\nspecimens\nadds\n##ree\n##ensis\n##inger\nexploded\nfinalist\ncia\nmurders\nwilderness\narguments\nnicknamed\nacceptance\nonwards\nmanufacture\nrobertson\njets\ntampa\nenterprises\nblog\nloudly\ncomposers\nnominations\n1838\nai\nmalta\ninquiry\nautomobile\nhosting\nviii\nrays\ntilted\ngrief\nmuseums\nstrategies\nfurious\neuro\nequality\ncohen\npoison\nsurrey\nwireless\ngoverned\nridiculous\nmoses\n##esh\n##room\nvanished\n##
ito\nbarnes\nattract\nmorrison\nistanbul\n##iness\nabsent\nrotation\npetition\njanet\n##logical\nsatisfaction\ncustody\ndeliberately\nobservatory\ncomedian\nsurfaces\npinyin\nnovelist\nstrictly\ncanterbury\noslo\nmonks\nembrace\nibm\njealous\nphotograph\ncontinent\ndorothy\nmarina\ndoc\nexcess\nholden\nallegations\nexplaining\nstack\navoiding\nlance\nstoryline\nmajesty\npoorly\nspike\ndos\nbradford\nraven\ntravis\nclassics\nproven\nvoltage\npillow\nfists\nbutt\n1842\ninterpreted\n##car\n1839\ngage\ntelegraph\nlens\npromising\nexpelled\ncasual\ncollector\nzones\n##min\nsilly\nnintendo\n##kh\n##bra\ndownstairs\nchef\nsuspicious\nafl\nflies\nvacant\nuganda\npregnancy\ncondemned\nlutheran\nestimates\ncheap\ndecree\nsaxon\nproximity\nstripped\nidiot\ndeposits\ncontrary\npresenter\nmagnus\nglacier\nim\noffense\nedwin\n##ori\nupright\n##long\nbolt\n##ois\ntoss\ngeographical\n##izes\nenvironments\ndelicate\nmarking\nabstract\nxavier\nnails\nwindsor\nplantation\noccurring\nequity\nsaskatchewan\nfears\ndrifted\nsequences\nvegetation\nrevolt\n##stic\n1843\nsooner\nfusion\nopposing\nnato\nskating\n1836\nsecretly\nruin\nlease\n##oc\nedit\n##nne\nflora\nanxiety\nruby\n##ological\n##mia\ntel\nbout\ntaxi\nemmy\nfrost\nrainbow\ncompounds\nfoundations\nrainfall\nassassination\nnightmare\ndominican\n##win\nachievements\ndeserve\norlando\nintact\narmenia\n##nte\ncalgary\nvalentine\n106\nmarion\nproclaimed\ntheodore\nbells\ncourtyard\nthigh\ngonzalez\nconsole\ntroop\nminimal\nmonte\neveryday\n##ence\n##if\nsupporter\nterrorism\nbuck\nopenly\npresbyterian\nactivists\ncarpet\n##iers\nrubbing\nuprising\n##yi\ncute\nconceived\nlegally\n##cht\nmillennium\ncello\nvelocity\nji\nrescued\ncardiff\n1835\nrex\nconcentrate\nsenators\nbeard\nrendered\nglowing\nbattalions\nscouts\ncompetitors\nsculptor\ncatalogue\narctic\nion\nraja\nbicycle\nwow\nglancing\nlawn\n##woman\ngentleman\nlighthouse\npublish\npredicted\ncalculated\n##val\nvariants\n##gne\nstrain\n##ui\nwinston\ndeceased\n##nus\ntouchdowns\nbrady\ncaleb\nsinking\nechoed\ncrush\nhon\nblessed\nprotagonist\nhayes\nendangered\nmagnitude\neditors\n##tine\nestimate\nresponsibilities\n##mel\nbackup\nlaying\nconsumed\nsealed\nzurich\nlovers\nfrustrated\n##eau\nahmed\nkicking\nmit\ntreasurer\n1832\nbiblical\nrefuse\nterrified\npump\nagrees\ngenuine\nimprisonment\nrefuses\nplymouth\n##hen\nlou\n##nen\ntara\ntrembling\nantarctic\nton\nlearns\n##tas\ncrap\ncrucial\nfaction\natop\n##borough\nwrap\nlancaster\nodds\nhopkins\nerik\nlyon\n##eon\nbros\n##ode\nsnap\nlocality\ntips\nempress\ncrowned\ncal\nacclaimed\nchuckled\n##ory\nclara\nsends\nmild\ntowel\n##fl\n##day\n##а\nwishing\nassuming\ninterviewed\n##bal\n##die\ninteractions\neden\ncups\nhelena\n##lf\nindie\nbeck\n##fire\nbatteries\nfilipino\nwizard\nparted\n##lam\ntraces\n##born\nrows\nidol\nalbany\ndelegates\n##ees\n##sar\ndiscussions\n##ex\nnotre\ninstructed\nbelgrade\nhighways\nsuggestion\nlauren\npossess\norientation\nalexandria\nabdul\nbeats\nsalary\nreunion\nludwig\nalright\nwagner\nintimate\npockets\nslovenia\nhugged\nbrighton\nmerchants\ncruel\nstole\ntrek\nslopes\nrepairs\nenrollment\npolitically\nunderlying\npromotional\ncounting\nboeing\n##bb\nisabella\nnaming\n##и\nkeen\nbacteria\nlisting\nseparately\nbelfast\nussr\n450\nlithuanian\nanybody\nribs\nsphere\nmartinez\ncock\nembarrassed\nproposals\nfragments\nnationals\n##fs\n##wski\npremises\nfin\n1500\nalpine\nmatched\nfreely\nbounded\njace\nsleeve\n##af\ngaming\npier\npopulated\nevident\n##like\nfrances\nflooded\n##dle\nfrightened\npour\ntrainer\nframed\nvisitor\n
challenging\npig\nwickets\n##fold\ninfected\nemail\n##pes\narose\n##aw\nreward\necuador\noblast\nvale\nch\nshuttle\n##usa\nbach\nrankings\nforbidden\ncornwall\naccordance\nsalem\nconsumers\nbruno\nfantastic\ntoes\nmachinery\nresolved\njulius\nremembering\npropaganda\niceland\nbombardment\ntide\ncontacts\nwives\n##rah\nconcerto\nmacdonald\nalbania\nimplement\ndaisy\ntapped\nsudan\nhelmet\nangela\nmistress\n##lic\ncrop\nsunk\nfinest\n##craft\nhostile\n##ute\n##tsu\nboxer\nfr\npaths\nadjusted\nhabit\nballot\nsupervision\nsoprano\n##zen\nbullets\nwicked\nsunset\nregiments\ndisappear\nlamp\nperforms\napp\n##gia\n##oa\nrabbit\ndigging\nincidents\nentries\n##cion\ndishes\n##oi\nintroducing\n##ati\n##fied\nfreshman\nslot\njill\ntackles\nbaroque\nbacks\n##iest\nlone\nsponsor\ndestiny\naltogether\nconvert\n##aro\nconsensus\nshapes\ndemonstration\nbasically\nfeminist\nauction\nartifacts\n##bing\nstrongest\ntwitter\nhalifax\n2019\nallmusic\nmighty\nsmallest\nprecise\nalexandra\nviola\n##los\n##ille\nmanuscripts\n##illo\ndancers\nari\nmanagers\nmonuments\nblades\nbarracks\nspringfield\nmaiden\nconsolidated\nelectron\n##end\nberry\nairing\nwheat\nnobel\ninclusion\nblair\npayments\ngeography\nbee\ncc\neleanor\nreact\n##hurst\nafc\nmanitoba\n##yu\nsu\nlineup\nfitness\nrecreational\ninvestments\nairborne\ndisappointment\n##dis\nedmonton\nviewing\n##row\nrenovation\n##cast\ninfant\nbankruptcy\nroses\naftermath\npavilion\n##yer\ncarpenter\nwithdrawal\nladder\n##hy\ndiscussing\npopped\nreliable\nagreements\nrochester\n##abad\ncurves\nbombers\n220\nrao\nreverend\ndecreased\nchoosing\n107\nstiff\nconsulting\nnaples\ncrawford\ntracy\nka\nribbon\ncops\n##lee\ncrushed\ndeciding\nunified\nteenager\naccepting\nflagship\nexplorer\npoles\nsanchez\ninspection\nrevived\nskilled\ninduced\nexchanged\nflee\nlocals\ntragedy\nswallow\nloading\nhanna\ndemonstrate\n##ela\nsalvador\nflown\ncontestants\ncivilization\n##ines\nwanna\nrhodes\nfletcher\nhector\nknocking\nconsiders\n##ough\nnash\nmechanisms\nsensed\nmentally\nwalt\nunclear\n##eus\nrenovated\nmadame\n##cks\ncrews\ngovernmental\n##hin\nundertaken\nmonkey\n##ben\n##ato\nfatal\narmored\ncopa\ncaves\ngovernance\ngrasp\nperception\ncertification\nfroze\ndamp\ntugged\nwyoming\n##rg\n##ero\nnewman\n##lor\nnerves\ncuriosity\ngraph\n115\n##ami\nwithdraw\ntunnels\ndull\nmeredith\nmoss\nexhibits\nneighbors\ncommunicate\naccuracy\nexplored\nraiders\nrepublicans\nsecular\nkat\nsuperman\npenny\ncriticised\n##tch\nfreed\nupdate\nconviction\nwade\nham\nlikewise\ndelegation\ngotta\ndoll\npromises\ntechnological\nmyth\nnationality\nresolve\nconvent\n##mark\nsharon\ndig\nsip\ncoordinator\nentrepreneur\nfold\n##dine\ncapability\ncouncillor\nsynonym\nblown\nswan\ncursed\n1815\njonas\nhaired\nsofa\ncanvas\nkeeper\nrivalry\n##hart\nrapper\nspeedway\nswords\npostal\nmaxwell\nestonia\npotter\nrecurring\n##nn\n##ave\nerrors\n##oni\ncognitive\n1834\n##²\nclaws\nnadu\nroberto\nbce\nwrestler\nellie\n##ations\ninfinite\nink\n##tia\npresumably\nfinite\nstaircase\n108\nnoel\npatricia\nnacional\n##cation\nchill\neternal\ntu\npreventing\nprussia\nfossil\nlimbs\n##logist\nernst\nfrog\nperez\nrene\n##ace\npizza\nprussian\n##ios\n##vy\nmolecules\nregulatory\nanswering\nopinions\nsworn\nlengths\nsupposedly\nhypothesis\nupward\nhabitats\nseating\nancestors\ndrank\nyield\nhd\nsynthesis\nresearcher\nmodest\n##var\nmothers\npeered\nvoluntary\nhomeland\n##the\nacclaim\n##igan\nstatic\nvalve\nluxembourg\nalto\ncarroll\nfe\nreceptor\nnorton\nambulance\n##tian\njohnston\ncatholics\ndepicting\njointly\nelephant\ngl
oria\nmentor\nbadge\nahmad\ndistinguish\nremarked\ncouncils\nprecisely\nallison\nadvancing\ndetection\ncrowded\n##10\ncooperative\nankle\nmercedes\ndagger\nsurrendered\npollution\ncommit\nsubway\njeffrey\nlesson\nsculptures\nprovider\n##fication\nmembrane\ntimothy\nrectangular\nfiscal\nheating\nteammate\nbasket\nparticle\nanonymous\ndeployment\n##ple\nmissiles\ncourthouse\nproportion\nshoe\nsec\n##ller\ncomplaints\nforbes\nblacks\nabandon\nremind\nsizes\noverwhelming\nautobiography\nnatalie\n##awa\nrisks\ncontestant\ncountryside\nbabies\nscorer\ninvaded\nenclosed\nproceed\nhurling\ndisorders\n##cu\nreflecting\ncontinuously\ncruiser\ngraduates\nfreeway\ninvestigated\nore\ndeserved\nmaid\nblocking\nphillip\njorge\nshakes\ndove\nmann\nvariables\nlacked\nburden\naccompanying\nque\nconsistently\norganizing\nprovisional\ncomplained\nendless\n##rm\ntubes\njuice\ngeorges\nkrishna\nmick\nlabels\nthriller\n##uch\nlaps\narcade\nsage\nsnail\n##table\nshannon\nfi\nlaurence\nseoul\nvacation\npresenting\nhire\nchurchill\nsurprisingly\nprohibited\nsavannah\ntechnically\n##oli\n170\n##lessly\ntestimony\nsuited\nspeeds\ntoys\nromans\nmlb\nflowering\nmeasurement\ntalented\nkay\nsettings\ncharleston\nexpectations\nshattered\nachieving\ntriumph\nceremonies\nportsmouth\nlanes\nmandatory\nloser\nstretching\ncologne\nrealizes\nseventy\ncornell\ncareers\nwebb\n##ulating\namericas\nbudapest\nava\nsuspicion\n##ison\nyo\nconrad\n##hai\nsterling\njessie\nrector\n##az\n1831\ntransform\norganize\nloans\nchristine\nvolcanic\nwarrant\nslender\nsummers\nsubfamily\nnewer\ndanced\ndynamics\nrhine\nproceeds\nheinrich\ngastropod\ncommands\nsings\nfacilitate\neaster\nra\npositioned\nresponses\nexpense\nfruits\nyanked\nimported\n25th\nvelvet\nvic\nprimitive\ntribune\nbaldwin\nneighbourhood\ndonna\nrip\nhay\npr\n##uro\n1814\nespn\nwelcomed\n##aria\nqualifier\nglare\nhighland\ntiming\n##cted\nshells\neased\ngeometry\nlouder\nexciting\nslovakia\n##sion\n##iz\n##lot\nsavings\nprairie\n##ques\nmarching\nrafael\ntonnes\n##lled\ncurtain\npreceding\nshy\nheal\ngreene\nworthy\n##pot\ndetachment\nbury\nsherman\n##eck\nreinforced\nseeks\nbottles\ncontracted\nduchess\noutfit\nwalsh\n##sc\nmickey\n##ase\ngeoffrey\narcher\nsqueeze\ndawson\neliminate\ninvention\n##enberg\nneal\n##eth\nstance\ndealer\ncoral\nmaple\nretire\npolo\nsimplified\n##ht\n1833\nhid\nwatts\nbackwards\njules\n##oke\ngenesis\nmt\nframes\nrebounds\nburma\nwoodland\nmoist\nsantos\nwhispers\ndrained\nsubspecies\n##aa\nstreaming\nulster\nburnt\ncorrespondence\nmaternal\ngerard\ndenis\nstealing\n##load\ngenius\nduchy\n##oria\ninaugurated\nmomentum\nsuits\nplacement\nsovereign\nclause\nthames\n##hara\nconfederation\nreservation\nsketch\nyankees\nlets\nrotten\ncharm\nhal\nverses\nultra\ncommercially\ndot\nsalon\ncitation\nadopt\nwinnipeg\nmist\nallocated\ncairo\n##boy\njenkins\ninterference\nobjectives\n##wind\n1820\nportfolio\narmoured\nsectors\n##eh\ninitiatives\n##world\nintegrity\nexercises\nrobe\ntap\nab\ngazed\n##tones\ndistracted\nrulers\n111\nfavorable\njerome\ntended\ncart\nfactories\n##eri\ndiplomat\nvalued\ngravel\ncharitable\n##try\ncalvin\nexploring\nchang\nshepherd\nterrace\npdf\npupil\n##ural\nreflects\nups\n##rch\ngovernors\nshelf\ndepths\n##nberg\ntrailed\ncrest\ntackle\n##nian\n##ats\nhatred\n##kai\nclare\nmakers\nethiopia\nlongtime\ndetected\nembedded\nlacking\nslapped\nrely\nthomson\nanticipation\niso\nmorton\nsuccessive\nagnes\nscreenwriter\nstraightened\nphilippe\nplaywright\nhaunted\nlicence\niris\nintentions\nsutton\n112\nlogical\ncorrectly\n##weight\nbrand
ed\nlicked\ntipped\nsilva\nricky\nnarrator\nrequests\n##ents\ngreeted\nsupernatural\ncow\n##wald\nlung\nrefusing\nemployer\nstrait\ngaelic\nliner\n##piece\nzoe\nsabha\n##mba\ndriveway\nharvest\nprints\nbates\nreluctantly\nthreshold\nalgebra\nira\nwherever\ncoupled\n240\nassumption\npicks\n##air\ndesigners\nraids\ngentlemen\n##ean\nroller\nblowing\nleipzig\nlocks\nscrew\ndressing\nstrand\n##lings\nscar\ndwarf\ndepicts\n##nu\nnods\n##mine\ndiffer\nboris\n##eur\nyuan\nflip\n##gie\nmob\ninvested\nquestioning\napplying\n##ture\nshout\n##sel\ngameplay\nblamed\nillustrations\nbothered\nweakness\nrehabilitation\n##of\n##zes\nenvelope\nrumors\nminers\nleicester\nsubtle\nkerry\n##ico\nferguson\n##fu\npremiership\nne\n##cat\nbengali\nprof\ncatches\nremnants\ndana\n##rily\nshouting\npresidents\nbaltic\nought\nghosts\ndances\nsailors\nshirley\nfancy\ndominic\n##bie\nmadonna\n##rick\nbark\nbuttons\ngymnasium\nashes\nliver\ntoby\noath\nprovidence\ndoyle\nevangelical\nnixon\ncement\ncarnegie\nembarked\nhatch\nsurroundings\nguarantee\nneeding\npirate\nessence\n##bee\nfilter\ncrane\nhammond\nprojected\nimmune\npercy\ntwelfth\n##ult\nregent\ndoctoral\ndamon\nmikhail\n##ichi\nlu\ncritically\nelect\nrealised\nabortion\nacute\nscreening\nmythology\nsteadily\n##fc\nfrown\nnottingham\nkirk\nwa\nminneapolis\n##rra\nmodule\nalgeria\nmc\nnautical\nencounters\nsurprising\nstatues\navailability\nshirts\npie\nalma\nbrows\nmunster\nmack\nsoup\ncrater\ntornado\nsanskrit\ncedar\nexplosive\nbordered\ndixon\nplanets\nstamp\nexam\nhappily\n##bble\ncarriers\nkidnapped\n##vis\naccommodation\nemigrated\n##met\nknockout\ncorrespondent\nviolation\nprofits\npeaks\nlang\nspecimen\nagenda\nancestry\npottery\nspelling\nequations\nobtaining\nki\nlinking\n1825\ndebris\nasylum\n##20\nbuddhism\nteddy\n##ants\ngazette\n##nger\n##sse\ndental\neligibility\nutc\nfathers\naveraged\nzimbabwe\nfrancesco\ncoloured\nhissed\ntranslator\nlynch\nmandate\nhumanities\nmackenzie\nuniforms\nlin\n##iana\n##gio\nasset\nmhz\nfitting\nsamantha\ngenera\nwei\nrim\nbeloved\nshark\nriot\nentities\nexpressions\nindo\ncarmen\nslipping\nowing\nabbot\nneighbor\nsidney\n##av\nrats\nrecommendations\nencouraging\nsquadrons\nanticipated\ncommanders\nconquered\n##oto\ndonations\ndiagnosed\n##mond\ndivide\n##iva\nguessed\ndecoration\nvernon\nauditorium\nrevelation\nconversations\n##kers\n##power\nherzegovina\ndash\nalike\nprotested\nlateral\nherman\naccredited\nmg\n##gent\nfreeman\nmel\nfiji\ncrow\ncrimson\n##rine\nlivestock\n##pped\nhumanitarian\nbored\noz\nwhip\n##lene\n##ali\nlegitimate\nalter\ngrinning\nspelled\nanxious\noriental\nwesley\n##nin\n##hole\ncarnival\ncontroller\ndetect\n##ssa\nbowed\neducator\nkosovo\nmacedonia\n##sin\noccupy\nmastering\nstephanie\njaneiro\npara\nunaware\nnurses\nnoon\n135\ncam\nhopefully\nranger\ncombine\nsociology\npolar\nrica\n##eer\nneill\n##sman\nholocaust\n##ip\ndoubled\nlust\n1828\n109\ndecent\ncooling\nunveiled\n##card\n1829\nnsw\nhomer\nchapman\nmeyer\n##gin\ndive\nmae\nreagan\nexpertise\n##gled\ndarwin\nbrooke\nsided\nprosecution\ninvestigating\ncomprised\npetroleum\ngenres\nreluctant\ndifferently\ntrilogy\njohns\nvegetables\ncorpse\nhighlighted\nlounge\npension\nunsuccessfully\nelegant\naided\nivory\nbeatles\namelia\ncain\ndubai\nsunny\nimmigrant\nbabe\nclick\n##nder\nunderwater\npepper\ncombining\nmumbled\natlas\nhorns\naccessed\nballad\nphysicians\nhomeless\ngestured\nrpm\nfreak\nlouisville\ncorporations\npatriots\nprizes\nrational\nwarn\nmodes\ndecorative\novernight\ndin\ntroubled\nphantom\n##ort\nmonarch\nsheer\n##dorf\ngen
erals\nguidelines\norgans\naddresses\n##zon\nenhance\ncurling\nparishes\ncord\n##kie\nlinux\ncaesar\ndeutsche\nbavaria\n##bia\ncoleman\ncyclone\n##eria\nbacon\npetty\n##yama\n##old\nhampton\ndiagnosis\n1824\nthrows\ncomplexity\nrita\ndisputed\n##₃\npablo\n##sch\nmarketed\ntrafficking\n##ulus\nexamine\nplague\nformats\n##oh\nvault\nfaithful\n##bourne\nwebster\n##ox\nhighlights\n##ient\n##ann\nphones\nvacuum\nsandwich\nmodeling\n##gated\nbolivia\nclergy\nqualities\nisabel\n##nas\n##ars\nwears\nscreams\nreunited\nannoyed\nbra\n##ancy\n##rate\ndifferential\ntransmitter\ntattoo\ncontainer\npoker\n##och\nexcessive\nresides\ncowboys\n##tum\naugustus\ntrash\nproviders\nstatute\nretreated\nbalcony\nreversed\nvoid\nstorey\npreceded\nmasses\nleap\nlaughs\nneighborhoods\nwards\nschemes\nfalcon\nsanto\nbattlefield\npad\nronnie\nthread\nlesbian\nvenus\n##dian\nbeg\nsandstone\ndaylight\npunched\ngwen\nanalog\nstroked\nwwe\nacceptable\nmeasurements\ndec\ntoxic\n##kel\nadequate\nsurgical\neconomist\nparameters\nvarsity\n##sberg\nquantity\nella\n##chy\n##rton\ncountess\ngenerating\nprecision\ndiamonds\nexpressway\nga\n##ı\n1821\nuruguay\ntalents\ngalleries\nexpenses\nscanned\ncolleague\noutlets\nryder\nlucien\n##ila\nparamount\n##bon\nsyracuse\ndim\nfangs\ngown\nsweep\n##sie\ntoyota\nmissionaries\nwebsites\n##nsis\nsentences\nadviser\nval\ntrademark\nspells\n##plane\npatience\nstarter\nslim\n##borg\ntoe\nincredibly\nshoots\nelliot\nnobility\n##wyn\ncowboy\nendorsed\ngardner\ntendency\npersuaded\norganisms\nemissions\nkazakhstan\namused\nboring\nchips\nthemed\n##hand\nllc\nconstantinople\nchasing\nsystematic\nguatemala\nborrowed\nerin\ncarey\n##hard\nhighlands\nstruggles\n1810\n##ifying\n##ced\nwong\nexceptions\ndevelops\nenlarged\nkindergarten\ncastro\n##ern\n##rina\nleigh\nzombie\njuvenile\n##most\nconsul\n##nar\nsailor\nhyde\nclarence\nintensive\npinned\nnasty\nuseless\njung\nclayton\nstuffed\nexceptional\nix\napostolic\n230\ntransactions\n##dge\nexempt\nswinging\ncove\nreligions\n##ash\nshields\ndairy\nbypass\n190\npursuing\nbug\njoyce\nbombay\nchassis\nsouthampton\nchat\ninteract\nredesignated\n##pen\nnascar\npray\nsalmon\nrigid\nregained\nmalaysian\ngrim\npublicity\nconstituted\ncapturing\ntoilet\ndelegate\npurely\ntray\ndrift\nloosely\nstriker\nweakened\ntrinidad\nmitch\nitv\ndefines\ntransmitted\nming\nscarlet\nnodding\nfitzgerald\nfu\nnarrowly\nsp\ntooth\nstandings\nvirtue\n##₁\n##wara\n##cting\nchateau\ngloves\nlid\n##nel\nhurting\nconservatory\n##pel\nsinclair\nreopened\nsympathy\nnigerian\nstrode\nadvocated\noptional\nchronic\ndischarge\n##rc\nsuck\ncompatible\nlaurel\nstella\nshi\nfails\nwage\ndodge\n128\ninformal\nsorts\nlevi\nbuddha\nvillagers\n##aka\nchronicles\nheavier\nsummoned\ngateway\n3000\neleventh\njewelry\ntranslations\naccordingly\nseas\n##ency\nfiber\npyramid\ncubic\ndragging\n##ista\ncaring\n##ops\nandroid\ncontacted\nlunar\n##dt\nkai\nlisbon\npatted\n1826\nsacramento\ntheft\nmadagascar\nsubtropical\ndisputes\nta\nholidays\npiper\nwillow\nmare\ncane\nitunes\nnewfoundland\nbenny\ncompanions\ndong\nraj\nobserve\nroar\ncharming\nplaque\ntibetan\nfossils\nenacted\nmanning\nbubble\ntina\ntanzania\n##eda\n##hir\nfunk\nswamp\ndeputies\ncloak\nufc\nscenario\npar\nscratch\nmetals\nanthem\nguru\nengaging\nspecially\n##boat\ndialects\nnineteen\ncecil\nduet\ndisability\nmessenger\nunofficial\n##lies\ndefunct\neds\nmoonlight\ndrainage\nsurname\npuzzle\nhonda\nswitching\nconservatives\nmammals\nknox\nbroadcaster\nsidewalk\ncope\n##ried\nbenson\nprinces\npeterson\n##sal\nbedford\nsharks\neli\nwrec
k\nalberto\ngasp\narchaeology\nlgbt\nteaches\nsecurities\nmadness\ncompromise\nwaving\ncoordination\ndavidson\nvisions\nleased\npossibilities\neighty\njun\nfernandez\nenthusiasm\nassassin\nsponsorship\nreviewer\nkingdoms\nestonian\nlaboratories\n##fy\n##nal\napplies\nverb\ncelebrations\n##zzo\nrowing\nlightweight\nsadness\nsubmit\nmvp\nbalanced\ndude\n##vas\nexplicitly\nmetric\nmagnificent\nmound\nbrett\nmohammad\nmistakes\nirregular\n##hing\n##ass\nsanders\nbetrayed\nshipped\nsurge\n##enburg\nreporters\ntermed\ngeorg\npity\nverbal\nbulls\nabbreviated\nenabling\nappealed\n##are\n##atic\nsicily\nsting\nheel\nsweetheart\nbart\nspacecraft\nbrutal\nmonarchy\n##tter\naberdeen\ncameo\ndiane\n##ub\nsurvivor\nclyde\n##aries\ncomplaint\n##makers\nclarinet\ndelicious\nchilean\nkarnataka\ncoordinates\n1818\npanties\n##rst\npretending\nar\ndramatically\nkiev\nbella\ntends\ndistances\n113\ncatalog\nlaunching\ninstances\ntelecommunications\nportable\nlindsay\nvatican\n##eim\nangles\naliens\nmarker\nstint\nscreens\nbolton\n##rne\njudy\nwool\nbenedict\nplasma\neuropa\nspark\nimaging\nfilmmaker\nswiftly\n##een\ncontributor\n##nor\nopted\nstamps\napologize\nfinancing\nbutter\ngideon\nsophisticated\nalignment\navery\nchemicals\nyearly\nspeculation\nprominence\nprofessionally\n##ils\nimmortal\ninstitutional\ninception\nwrists\nidentifying\ntribunal\nderives\ngains\n##wo\npapal\npreference\nlinguistic\nvince\noperative\nbrewery\n##ont\nunemployment\nboyd\n##ured\n##outs\nalbeit\nprophet\n1813\nbi\n##rr\n##face\n##rad\nquarterly\nasteroid\ncleaned\nradius\ntemper\n##llen\ntelugu\njerk\nviscount\nmenu\n##ote\nglimpse\n##aya\nyacht\nhawaiian\nbaden\n##rl\nlaptop\nreadily\n##gu\nmonetary\noffshore\nscots\nwatches\n##yang\n##arian\nupgrade\nneedle\nxbox\nlea\nencyclopedia\nflank\nfingertips\n##pus\ndelight\nteachings\nconfirm\nroth\nbeaches\nmidway\nwinters\n##iah\nteasing\ndaytime\nbeverly\ngambling\nbonnie\n##backs\nregulated\nclement\nhermann\ntricks\nknot\n##shing\n##uring\n##vre\ndetached\necological\nowed\nspecialty\nbyron\ninventor\nbats\nstays\nscreened\nunesco\nmidland\ntrim\naffection\n##ander\n##rry\njess\nthoroughly\nfeedback\n##uma\nchennai\nstrained\nheartbeat\nwrapping\novertime\npleaded\n##sworth\nmon\nleisure\noclc\n##tate\n##ele\nfeathers\nangelo\nthirds\nnuts\nsurveys\nclever\ngill\ncommentator\n##dos\ndarren\nrides\ngibraltar\n##nc\n##mu\ndissolution\ndedication\nshin\nmeals\nsaddle\nelvis\nreds\nchaired\ntaller\nappreciation\nfunctioning\nniece\nfavored\nadvocacy\nrobbie\ncriminals\nsuffolk\nyugoslav\npassport\nconstable\ncongressman\nhastings\nvera\n##rov\nconsecrated\nsparks\necclesiastical\nconfined\n##ovich\nmuller\nfloyd\nnora\n1822\npaved\n1827\ncumberland\nned\nsaga\nspiral\n##flow\nappreciated\nyi\ncollaborative\ntreating\nsimilarities\nfeminine\nfinishes\n##ib\njade\nimport\n##nse\n##hot\nchampagne\nmice\nsecuring\ncelebrities\nhelsinki\nattributes\n##gos\ncousins\nphases\nache\nlucia\ngandhi\nsubmission\nvicar\nspear\nshine\ntasmania\nbiting\ndetention\nconstitute\ntighter\nseasonal\n##gus\nterrestrial\nmatthews\n##oka\neffectiveness\nparody\nphilharmonic\n##onic\n1816\nstrangers\nencoded\nconsortium\nguaranteed\nregards\nshifts\ntortured\ncollision\nsupervisor\ninform\nbroader\ninsight\ntheaters\narmour\nemeritus\nblink\nincorporates\nmapping\n##50\n##ein\nhandball\nflexible\n##nta\nsubstantially\ngenerous\nthief\n##own\ncarr\nloses\n1793\nprose\nucla\nromeo\ngeneric\nmetallic\nrealization\ndamages\nmk\ncommissioners\nzach\ndefault\n##ther\nhelicopters\nlengthy\nstems\nspa\npartnered\
nspectators\nrogue\nindication\npenalties\nteresa\n1801\nsen\n##tric\ndalton\n##wich\nirving\nphotographic\n##vey\ndell\ndeaf\npeters\nexcluded\nunsure\n##vable\npatterson\ncrawled\n##zio\nresided\nwhipped\nlatvia\nslower\necole\npipes\nemployers\nmaharashtra\ncomparable\nva\ntextile\npageant\n##gel\nalphabet\nbinary\nirrigation\nchartered\nchoked\nantoine\noffs\nwaking\nsupplement\n##wen\nquantities\ndemolition\nregain\nlocate\nurdu\nfolks\nalt\n114\n##mc\nscary\nandreas\nwhites\n##ava\nclassrooms\nmw\naesthetic\npublishes\nvalleys\nguides\ncubs\njohannes\nbryant\nconventions\naffecting\n##itt\ndrain\nawesome\nisolation\nprosecutor\nambitious\napology\ncaptive\ndowns\natmospheric\nlorenzo\naisle\nbeef\nfoul\n##onia\nkidding\ncomposite\ndisturbed\nillusion\nnatives\n##ffer\nemi\nrockets\nriverside\nwartime\npainters\nadolf\nmelted\n##ail\nuncertainty\nsimulation\nhawks\nprogressed\nmeantime\nbuilder\nspray\nbreach\nunhappy\nregina\nrussians\n##urg\ndetermining\n##tation\ntram\n1806\n##quin\naging\n##12\n1823\ngarion\nrented\nmister\ndiaz\nterminated\nclip\n1817\ndepend\nnervously\ndisco\nowe\ndefenders\nshiva\nnotorious\ndisbelief\nshiny\nworcester\n##gation\n##yr\ntrailing\nundertook\nislander\nbelarus\nlimitations\nwatershed\nfuller\noverlooking\nutilized\nraphael\n1819\nsynthetic\nbreakdown\nklein\n##nate\nmoaned\nmemoir\nlamb\npracticing\n##erly\ncellular\narrows\nexotic\n##graphy\nwitches\n117\ncharted\nrey\nhut\nhierarchy\nsubdivision\nfreshwater\ngiuseppe\naloud\nreyes\nqatar\nmarty\nsideways\nutterly\nsexually\njude\nprayers\nmccarthy\nsoftball\nblend\ndamien\n##gging\n##metric\nwholly\nerupted\nlebanese\nnegro\nrevenues\ntasted\ncomparative\nteamed\ntransaction\nlabeled\nmaori\nsovereignty\nparkway\ntrauma\ngran\nmalay\n121\nadvancement\ndescendant\n2020\nbuzz\nsalvation\ninventory\nsymbolic\n##making\nantarctica\nmps\n##gas\n##bro\nmohammed\nmyanmar\nholt\nsubmarines\ntones\n##lman\nlocker\npatriarch\nbangkok\nemerson\nremarks\npredators\nkin\nafghan\nconfession\nnorwich\nrental\nemerge\nadvantages\n##zel\nrca\n##hold\nshortened\nstorms\naidan\n##matic\nautonomy\ncompliance\n##quet\ndudley\natp\n##osis\n1803\nmotto\ndocumentation\nsummary\nprofessors\nspectacular\nchristina\narchdiocese\nflashing\ninnocence\nremake\n##dell\npsychic\nreef\nscare\nemploy\nrs\nsticks\nmeg\ngus\nleans\n##ude\naccompany\nbergen\ntomas\n##iko\ndoom\nwages\npools\n##nch\n##bes\nbreasts\nscholarly\nalison\noutline\nbrittany\nbreakthrough\nwillis\nrealistic\n##cut\n##boro\ncompetitor\n##stan\npike\npicnic\nicon\ndesigning\ncommercials\nwashing\nvillain\nskiing\nmicro\ncostumes\nauburn\nhalted\nexecutives\n##hat\nlogistics\ncycles\nvowel\napplicable\nbarrett\nexclaimed\neurovision\neternity\nramon\n##umi\n##lls\nmodifications\nsweeping\ndisgust\n##uck\ntorch\naviv\nensuring\nrude\ndusty\nsonic\ndonovan\noutskirts\ncu\npathway\n##band\n##gun\n##lines\ndisciplines\nacids\ncadet\npaired\n##40\nsketches\n##sive\nmarriages\n##⁺\nfolding\npeers\nslovak\nimplies\nadmired\n##beck\n1880s\nleopold\ninstinct\nattained\nweston\nmegan\nhorace\n##ination\ndorsal\ningredients\nevolutionary\n##its\ncomplications\ndeity\nlethal\nbrushing\nlevy\ndeserted\ninstitutes\nposthumously\ndelivering\ntelescope\ncoronation\nmotivated\nrapids\nluc\nflicked\npays\nvolcano\ntanner\nweighed\n##nica\ncrowds\nfrankie\ngifted\naddressing\ngranddaughter\nwinding\n##rna\nconstantine\ngomez\n##front\nlandscapes\nrudolf\nanthropology\nslate\nwerewolf\n##lio\nastronomy\ncirca\nrouge\ndreaming\nsack\nknelt\ndrowned\nnaomi\nprolific\ntracked\nfree
zing\nherb\n##dium\nagony\nrandall\ntwisting\nwendy\ndeposit\ntouches\nvein\nwheeler\n##bbled\n##bor\nbatted\nretaining\ntire\npresently\ncompare\nspecification\ndaemon\nnigel\n##grave\nmerry\nrecommendation\nczechoslovakia\nsandra\nng\nroma\n##sts\nlambert\ninheritance\nsheikh\nwinchester\ncries\nexamining\n##yle\ncomeback\ncuisine\nnave\n##iv\nko\nretrieve\ntomatoes\nbarker\npolished\ndefining\nirene\nlantern\npersonalities\nbegging\ntract\nswore\n1809\n175\n##gic\nomaha\nbrotherhood\n##rley\nhaiti\n##ots\nexeter\n##ete\n##zia\nsteele\ndumb\npearson\n210\nsurveyed\nelisabeth\ntrends\n##ef\nfritz\n##rf\npremium\nbugs\nfraction\ncalmly\nviking\n##birds\ntug\ninserted\nunusually\n##ield\nconfronted\ndistress\ncrashing\nbrent\nturks\nresign\n##olo\ncambodia\ngabe\nsauce\n##kal\nevelyn\n116\nextant\nclusters\nquarry\nteenagers\nluna\n##lers\n##ister\naffiliation\ndrill\n##ashi\npanthers\nscenic\nlibya\nanita\nstrengthen\ninscriptions\n##cated\nlace\nsued\njudith\nriots\n##uted\nmint\n##eta\npreparations\nmidst\ndub\nchallenger\n##vich\nmock\ncf\ndisplaced\nwicket\nbreaths\nenables\nschmidt\nanalyst\n##lum\nag\nhighlight\nautomotive\naxe\njosef\nnewark\nsufficiently\nresembles\n50th\n##pal\nflushed\nmum\ntraits\n##ante\ncommodore\nincomplete\nwarming\ntitular\nceremonial\nethical\n118\ncelebrating\neighteenth\ncao\nlima\nmedalist\nmobility\nstrips\nsnakes\n##city\nminiature\nzagreb\nbarton\nescapes\numbrella\nautomated\ndoubted\ndiffers\ncooled\ngeorgetown\ndresden\ncooked\nfade\nwyatt\nrna\njacobs\ncarlton\nabundant\nstereo\nboost\nmadras\ninning\n##hia\nspur\nip\nmalayalam\nbegged\nosaka\ngroan\nescaping\ncharging\ndose\nvista\n##aj\nbud\npapa\ncommunists\nadvocates\nedged\ntri\n##cent\nresemble\npeaking\nnecklace\nfried\nmontenegro\nsaxony\ngoose\nglances\nstuttgart\ncurator\nrecruit\ngrocery\nsympathetic\n##tting\n##fort\n127\nlotus\nrandolph\nancestor\n##rand\nsucceeding\njupiter\n1798\nmacedonian\n##heads\nhiking\n1808\nhanding\nfischer\n##itive\ngarbage\nnode\n##pies\nprone\nsingular\npapua\ninclined\nattractions\nitalia\npouring\nmotioned\ngrandma\ngarnered\njacksonville\ncorp\nego\nringing\naluminum\n##hausen\nordering\n##foot\ndrawer\ntraders\nsynagogue\n##play\n##kawa\nresistant\nwandering\nfragile\nfiona\nteased\nvar\nhardcore\nsoaked\njubilee\ndecisive\nexposition\nmercer\nposter\nvalencia\nhale\nkuwait\n1811\n##ises\n##wr\n##eed\ntavern\ngamma\n122\njohan\n##uer\nairways\namino\ngil\n##ury\nvocational\ndomains\ntorres\n##sp\ngenerator\nfolklore\noutcomes\n##keeper\ncanberra\nshooter\nfl\nbeams\nconfrontation\n##lling\n##gram\nfeb\naligned\nforestry\npipeline\njax\nmotorway\nconception\ndecay\n##tos\ncoffin\n##cott\nstalin\n1805\nescorted\nminded\n##nam\nsitcom\npurchasing\ntwilight\nveronica\nadditions\npassive\ntensions\nstraw\n123\nfrequencies\n1804\nrefugee\ncultivation\n##iate\nchristie\nclary\nbulletin\ncrept\ndisposal\n##rich\n##zong\nprocessor\ncrescent\n##rol\nbmw\nemphasized\nwhale\nnazis\naurora\n##eng\ndwelling\nhauled\nsponsors\ntoledo\nmega\nideology\ntheatres\ntessa\ncerambycidae\nsaves\nturtle\ncone\nsuspects\nkara\nrusty\nyelling\ngreeks\nmozart\nshades\ncocked\nparticipant\n##tro\nshire\nspit\nfreeze\nnecessity\n##cos\ninmates\nnielsen\ncouncillors\nloaned\nuncommon\nomar\npeasants\nbotanical\noffspring\ndaniels\nformations\njokes\n1794\npioneers\nsigma\nlicensing\n##sus\nwheelchair\npolite\n1807\nliquor\npratt\ntrustee\n##uta\nforewings\nballoon\n##zz\nkilometre\ncamping\nexplicit\ncasually\nshawn\nfoolish\nteammates\nnm\nhassan\ncarrie\njudged\nsatisfy\nvanessa\
nknives\nselective\ncnn\nflowed\n##lice\neclipse\nstressed\neliza\nmathematician\ncease\ncultivated\n##roy\ncommissions\nbrowns\n##ania\ndestroyers\nsheridan\nmeadow\n##rius\nminerals\n##cial\ndownstream\nclash\ngram\nmemoirs\nventures\nbaha\nseymour\narchie\nmidlands\nedith\nfare\nflynn\ninvite\ncanceled\ntiles\nstabbed\nboulder\nincorporate\namended\ncamden\nfacial\nmollusk\nunreleased\ndescriptions\nyoga\ngrabs\n550\nraises\nramp\nshiver\n##rose\ncoined\npioneering\ntunes\nqing\nwarwick\ntops\n119\nmelanie\ngiles\n##rous\nwandered\n##inal\nannexed\nnov\n30th\nunnamed\n##ished\norganizational\nairplane\nnormandy\nstoke\nwhistle\nblessing\nviolations\nchased\nholders\nshotgun\n##ctic\noutlet\nreactor\n##vik\ntires\ntearing\nshores\nfortified\nmascot\nconstituencies\nnc\ncolumnist\nproductive\ntibet\n##rta\nlineage\nhooked\noct\ntapes\njudging\ncody\n##gger\nhansen\nkashmir\ntriggered\n##eva\nsolved\ncliffs\n##tree\nresisted\nanatomy\nprotesters\ntransparent\nimplied\n##iga\ninjection\nmattress\nexcluding\n##mbo\ndefenses\nhelpless\ndevotion\n##elli\ngrowl\nliberals\nweber\nphenomena\natoms\nplug\n##iff\nmortality\napprentice\nhowe\nconvincing\naaa\nswimmer\nbarber\nleone\npromptly\nsodium\ndef\nnowadays\narise\n##oning\ngloucester\ncorrected\ndignity\nnorm\nerie\n##ders\nelders\nevacuated\nsylvia\ncompression\n##yar\nhartford\npose\nbackpack\nreasoning\naccepts\n24th\nwipe\nmillimetres\nmarcel\n##oda\ndodgers\nalbion\n1790\noverwhelmed\naerospace\noaks\n1795\nshowcase\nacknowledge\nrecovering\nnolan\nashe\nhurts\ngeology\nfashioned\ndisappearance\nfarewell\nswollen\nshrug\nmarquis\nwimbledon\n124\nrue\n1792\ncommemorate\nreduces\nexperiencing\ninevitable\ncalcutta\nintel\n##court\nmurderer\nsticking\nfisheries\nimagery\nbloom\n280\nbrake\n##inus\ngustav\nhesitation\nmemorable\npo\nviral\nbeans\naccidents\ntunisia\nantenna\nspilled\nconsort\ntreatments\naye\nperimeter\n##gard\ndonation\nhostage\nmigrated\nbanker\naddiction\napex\nlil\ntrout\n##ously\nconscience\n##nova\nrams\nsands\ngenome\npassionate\ntroubles\n##lets\n##set\namid\n##ibility\n##ret\nhiggins\nexceed\nvikings\n##vie\npayne\n##zan\nmuscular\n##ste\ndefendant\nsucking\n##wal\nibrahim\nfuselage\nclaudia\nvfl\neuropeans\nsnails\ninterval\n##garh\npreparatory\nstatewide\ntasked\nlacrosse\nviktor\n##lation\nangola\n##hra\nflint\nimplications\nemploys\nteens\npatrons\nstall\nweekends\nbarriers\nscrambled\nnucleus\ntehran\njenna\nparsons\nlifelong\nrobots\ndisplacement\n5000\n##bles\nprecipitation\n##gt\nknuckles\nclutched\n1802\nmarrying\necology\nmarx\naccusations\ndeclare\nscars\nkolkata\nmat\nmeadows\nbermuda\nskeleton\nfinalists\nvintage\ncrawl\ncoordinate\naffects\nsubjected\norchestral\nmistaken\n##tc\nmirrors\ndipped\nrelied\n260\narches\ncandle\n##nick\nincorporating\nwildly\nfond\nbasilica\nowl\nfringe\nrituals\nwhispering\nstirred\nfeud\ntertiary\nslick\ngoat\nhonorable\nwhereby\nskip\nricardo\nstripes\nparachute\nadjoining\nsubmerged\nsynthesizer\n##gren\nintend\npositively\nninety\nphi\nbeaver\npartition\nfellows\nalexis\nprohibition\ncarlisle\nbizarre\nfraternity\n##bre\ndoubts\nicy\ncbc\naquatic\nsneak\nsonny\ncombines\nairports\ncrude\nsupervised\nspatial\nmerge\nalfonso\n##bic\ncorrupt\nscan\nundergo\n##ams\ndisabilities\ncolombian\ncomparing\ndolphins\nperkins\n##lish\nreprinted\nunanimous\nbounced\nhairs\nunderworld\nmidwest\nsemester\nbucket\npaperback\nminiseries\ncoventry\ndemise\n##leigh\ndemonstrations\nsensor\nrotating\nyan\n##hler\narrange\nsoils\n##idge\nhyderabad\nlabs\n##dr\nbrakes\ngrandchildren\n##nde\
nnegotiated\nrover\nferrari\ncontinuation\ndirectorate\naugusta\nstevenson\ncounterpart\ngore\n##rda\nnursery\nrican\nave\ncollectively\nbroadly\npastoral\nrepertoire\nasserted\ndiscovering\nnordic\nstyled\nfiba\ncunningham\nharley\nmiddlesex\nsurvives\ntumor\ntempo\nzack\naiming\nlok\nurgent\n##rade\n##nto\ndevils\n##ement\ncontractor\nturin\n##wl\n##ool\nbliss\nrepaired\nsimmons\nmoan\nastronomical\ncr\nnegotiate\nlyric\n1890s\nlara\nbred\nclad\nangus\npbs\n##ience\nengineered\nposed\n##lk\nhernandez\npossessions\nelbows\npsychiatric\nstrokes\nconfluence\nelectorate\nlifts\ncampuses\nlava\nalps\n##ep\n##ution\n##date\nphysicist\nwoody\n##page\n##ographic\n##itis\njuliet\nreformation\nsparhawk\n320\ncomplement\nsuppressed\njewel\n##½\nfloated\n##kas\ncontinuity\nsadly\n##ische\ninability\nmelting\nscanning\npaula\nflour\njudaism\nsafer\nvague\n##lm\nsolving\ncurb\n##stown\nfinancially\ngable\nbees\nexpired\nmiserable\ncassidy\ndominion\n1789\ncupped\n145\nrobbery\nfacto\namos\nwarden\nresume\ntallest\nmarvin\ning\npounded\nusd\ndeclaring\ngasoline\n##aux\ndarkened\n270\n650\nsophomore\n##mere\nerection\ngossip\ntelevised\nrisen\ndial\n##eu\npillars\n##link\npassages\nprofound\n##tina\narabian\nashton\nsilicon\nnail\n##ead\n##lated\n##wer\n##hardt\nfleming\nfirearms\nducked\ncircuits\nblows\nwaterloo\ntitans\n##lina\natom\nfireplace\ncheshire\nfinanced\nactivation\nalgorithms\n##zzi\nconstituent\ncatcher\ncherokee\npartnerships\nsexuality\nplatoon\ntragic\nvivian\nguarded\nwhiskey\nmeditation\npoetic\n##late\n##nga\n##ake\nporto\nlisteners\ndominance\nkendra\nmona\nchandler\nfactions\n22nd\nsalisbury\nattitudes\nderivative\n##ido\n##haus\nintake\npaced\njavier\nillustrator\nbarrels\nbias\ncockpit\nburnett\ndreamed\nensuing\n##anda\nreceptors\nsomeday\nhawkins\nmattered\n##lal\nslavic\n1799\njesuit\ncameroon\nwasted\ntai\nwax\nlowering\nvictorious\nfreaking\noutright\nhancock\nlibrarian\nsensing\nbald\ncalcium\nmyers\ntablet\nannouncing\nbarack\nshipyard\npharmaceutical\n##uan\ngreenwich\nflush\nmedley\npatches\nwolfgang\npt\nspeeches\nacquiring\nexams\nnikolai\n##gg\nhayden\nkannada\n##type\nreilly\n##pt\nwaitress\nabdomen\ndevastated\ncapped\npseudonym\npharmacy\nfulfill\nparaguay\n1796\nclicked\n##trom\narchipelago\nsyndicated\n##hman\nlumber\norgasm\nrejection\nclifford\nlorraine\nadvent\nmafia\nrodney\nbrock\n##ght\n##used\n##elia\ncassette\nchamberlain\ndespair\nmongolia\nsensors\ndevelopmental\nupstream\n##eg\n##alis\nspanning\n165\ntrombone\nbasque\nseeded\ninterred\nrenewable\nrhys\nleapt\nrevision\nmolecule\n##ages\nchord\nvicious\nnord\nshivered\n23rd\narlington\ndebts\ncorpus\nsunrise\nbays\nblackburn\ncentimetres\n##uded\nshuddered\ngm\nstrangely\ngripping\ncartoons\nisabelle\norbital\n##ppa\nseals\nproving\n##lton\nrefusal\nstrengthened\nbust\nassisting\nbaghdad\nbatsman\nportrayal\nmara\npushes\nspears\nog\n##cock\nreside\nnathaniel\nbrennan\n1776\nconfirmation\ncaucus\n##worthy\nmarkings\nyemen\nnobles\nku\nlazy\nviewer\ncatalan\nencompasses\nsawyer\n##fall\nsparked\nsubstances\npatents\nbraves\narranger\nevacuation\nsergio\npersuade\ndover\ntolerance\npenguin\ncum\njockey\ninsufficient\ntownships\noccupying\ndeclining\nplural\nprocessed\nprojection\npuppet\nflanders\nintroduces\nliability\n##yon\ngymnastics\nantwerp\ntaipei\nhobart\ncandles\njeep\nwes\nobservers\n126\nchaplain\nbundle\nglorious\n##hine\nhazel\nflung\nsol\nexcavations\ndumped\nstares\nsh\nbangalore\ntriangular\nicelandic\nintervals\nexpressing\nturbine\n##vers\nsongwriting\ncrafts\n##igo\njasmine\nditch\nrite
\n##ways\nentertaining\ncomply\nsorrow\nwrestlers\nbasel\nemirates\nmarian\nrivera\nhelpful\n##some\ncaution\ndownward\nnetworking\n##atory\n##tered\ndarted\ngenocide\nemergence\nreplies\nspecializing\nspokesman\nconvenient\nunlocked\nfading\naugustine\nconcentrations\nresemblance\nelijah\ninvestigator\nandhra\n##uda\npromotes\nbean\n##rrell\nfleeing\nwan\nsimone\nannouncer\n##ame\n##bby\nlydia\nweaver\n132\nresidency\nmodification\n##fest\nstretches\n##ast\nalternatively\nnat\nlowe\nlacks\n##ented\npam\ntile\nconcealed\ninferior\nabdullah\nresidences\ntissues\nvengeance\n##ided\nmoisture\npeculiar\ngroove\nzip\nbologna\njennings\nninja\noversaw\nzombies\npumping\nbatch\nlivingston\nemerald\ninstallations\n1797\npeel\nnitrogen\nrama\n##fying\n##star\nschooling\nstrands\nresponding\nwerner\n##ost\nlime\ncasa\naccurately\ntargeting\n##rod\nunderway\n##uru\nhemisphere\nlester\n##yard\noccupies\n2d\ngriffith\nangrily\nreorganized\n##owing\ncourtney\ndeposited\n##dd\n##30\nestadio\n##ifies\ndunn\nexiled\n##ying\nchecks\n##combe\n##о\n##fly\nsuccesses\nunexpectedly\nblu\nassessed\n##flower\n##ه\nobserving\nsacked\nspiders\nkn\n##tail\nmu\nnodes\nprosperity\naudrey\ndivisional\n155\nbroncos\ntangled\nadjust\nfeeds\nerosion\npaolo\nsurf\ndirectory\nsnatched\nhumid\nadmiralty\nscrewed\ngt\nreddish\n##nese\nmodules\ntrench\nlamps\nbind\nleah\nbucks\ncompetes\n##nz\n##form\ntranscription\n##uc\nisles\nviolently\nclutching\npga\ncyclist\ninflation\nflats\nragged\nunnecessary\n##hian\nstubborn\ncoordinated\nharriet\nbaba\ndisqualified\n330\ninsect\nwolfe\n##fies\nreinforcements\nrocked\nduel\nwinked\nembraced\nbricks\n##raj\nhiatus\ndefeats\npending\nbrightly\njealousy\n##xton\n##hm\n##uki\nlena\ngdp\ncolorful\n##dley\nstein\nkidney\n##shu\nunderwear\nwanderers\n##haw\n##icus\nguardians\nm³\nroared\nhabits\n##wise\npermits\ngp\nuranium\npunished\ndisguise\nbundesliga\nelise\ndundee\nerotic\npartisan\npi\ncollectors\nfloat\nindividually\nrendering\nbehavioral\nbucharest\nser\nhare\nvalerie\ncorporal\nnutrition\nproportional\n##isa\nimmense\n##kis\npavement\n##zie\n##eld\nsutherland\ncrouched\n1775\n##lp\nsuzuki\ntrades\nendurance\noperas\ncrosby\nprayed\npriory\nrory\nsocially\n##urn\ngujarat\n##pu\nwalton\ncube\npasha\nprivilege\nlennon\nfloods\nthorne\nwaterfall\nnipple\nscouting\napprove\n##lov\nminorities\nvoter\ndwight\nextensions\nassure\nballroom\nslap\ndripping\nprivileges\nrejoined\nconfessed\ndemonstrating\npatriotic\nyell\ninvestor\n##uth\npagan\nslumped\nsquares\n##cle\n##kins\nconfront\nbert\nembarrassment\n##aid\naston\nurging\nsweater\nstarr\nyuri\nbrains\nwilliamson\ncommuter\nmortar\nstructured\nselfish\nexports\n##jon\ncds\n##him\nunfinished\n##rre\nmortgage\ndestinations\n##nagar\ncanoe\nsolitary\nbuchanan\ndelays\nmagistrate\nfk\n##pling\nmotivation\n##lier\n##vier\nrecruiting\nassess\n##mouth\nmalik\nantique\n1791\npius\nrahman\nreich\ntub\nzhou\nsmashed\nairs\ngalway\nxii\nconditioning\nhonduras\ndischarged\ndexter\n##pf\nlionel\n129\ndebates\nlemon\ntiffany\nvolunteered\ndom\ndioxide\nprocession\ndevi\nsic\ntremendous\nadvertisements\ncolts\ntransferring\nverdict\nhanover\ndecommissioned\nutter\nrelate\npac\nracism\n##top\nbeacon\nlimp\nsimilarity\nterra\noccurrence\nant\n##how\nbecky\ncapt\nupdates\narmament\nrichie\npal\n##graph\nhalloween\nmayo\n##ssen\n##bone\ncara\nserena\nfcc\ndolls\nobligations\n##dling\nviolated\nlafayette\njakarta\nexploitation\n##ime\ninfamous\niconic\n##lah\n##park\nkitty\nmoody\nreginald\ndread\nspill\ncrystals\nolivier\nmodeled\nbluff\nequilibrium\nsep
arating\nnotices\nordnance\nextinction\nonset\ncosmic\nattachment\nsammy\nexpose\nprivy\nanchored\n##bil\nabbott\nadmits\nbending\nbaritone\nemmanuel\npoliceman\nvaughan\nwinged\nclimax\ndresses\ndenny\npolytechnic\nmohamed\nburmese\nauthentic\nnikki\ngenetics\ngrandparents\nhomestead\ngaza\npostponed\nmetacritic\nuna\n##sby\n##bat\nunstable\ndissertation\n##rial\n##cian\ncurls\nobscure\nuncovered\nbronx\npraying\ndisappearing\n##hoe\nprehistoric\ncoke\nturret\nmutations\nnonprofit\npits\nmonaco\n##ي\n##usion\nprominently\ndispatched\npodium\n##mir\nuci\n##uation\n133\nfortifications\nbirthplace\nkendall\n##lby\n##oll\npreacher\nrack\ngoodman\n##rman\npersistent\n##ott\ncountless\njaime\nrecorder\nlexington\npersecution\njumps\nrenewal\nwagons\n##11\ncrushing\n##holder\ndecorations\n##lake\nabundance\nwrath\nlaundry\n£1\ngarde\n##rp\njeanne\nbeetles\npeasant\n##sl\nsplitting\ncaste\nsergei\n##rer\n##ema\nscripts\n##ively\nrub\nsatellites\n##vor\ninscribed\nverlag\nscrapped\ngale\npackages\nchick\npotato\nslogan\nkathleen\narabs\n##culture\ncounterparts\nreminiscent\nchoral\n##tead\nrand\nretains\nbushes\ndane\naccomplish\ncourtesy\ncloses\n##oth\nslaughter\nhague\nkrakow\nlawson\ntailed\nelias\nginger\n##ttes\ncanopy\nbetrayal\nrebuilding\nturf\n##hof\nfrowning\nallegiance\nbrigades\nkicks\nrebuild\npolls\nalias\nnationalism\ntd\nrowan\naudition\nbowie\nfortunately\nrecognizes\nharp\ndillon\nhorrified\n##oro\nrenault\n##tics\nropes\n##α\npresumed\nrewarded\ninfrared\nwiping\naccelerated\nillustration\n##rid\npresses\npractitioners\nbadminton\n##iard\ndetained\n##tera\nrecognizing\nrelates\nmisery\n##sies\n##tly\nreproduction\npiercing\npotatoes\nthornton\nesther\nmanners\nhbo\n##aan\nours\nbullshit\nernie\nperennial\nsensitivity\nilluminated\nrupert\n##jin\n##iss\n##ear\nrfc\nnassau\n##dock\nstaggered\nsocialism\n##haven\nappointments\nnonsense\nprestige\nsharma\nhaul\n##tical\nsolidarity\ngps\n##ook\n##rata\nigor\npedestrian\n##uit\nbaxter\ntenants\nwires\nmedication\nunlimited\nguiding\nimpacts\ndiabetes\n##rama\nsasha\npas\nclive\nextraction\n131\ncontinually\nconstraints\n##bilities\nsonata\nhunted\nsixteenth\nchu\nplanting\nquote\nmayer\npretended\nabs\nspat\n##hua\nceramic\n##cci\ncurtains\npigs\npitching\n##dad\nlatvian\nsore\ndayton\n##sted\n##qi\npatrols\nslice\nplayground\n##nted\nshone\nstool\napparatus\ninadequate\nmates\ntreason\n##ija\ndesires\n##liga\n##croft\nsomalia\nlaurent\nmir\nleonardo\noracle\ngrape\nobliged\nchevrolet\nthirteenth\nstunning\nenthusiastic\n##ede\naccounted\nconcludes\ncurrents\nbasil\n##kovic\ndrought\n##rica\nmai\n##aire\nshove\nposting\n##shed\npilgrimage\nhumorous\npacking\nfry\npencil\nwines\nsmells\n144\nmarilyn\naching\nnewest\nclung\nbon\nneighbours\nsanctioned\n##pie\nmug\n##stock\ndrowning\n##mma\nhydraulic\n##vil\nhiring\nreminder\nlilly\ninvestigators\n##ncies\nsour\n##eous\ncompulsory\npacket\n##rion\n##graphic\n##elle\ncannes\n##inate\ndepressed\n##rit\nheroic\nimportantly\ntheresa\n##tled\nconway\nsaturn\nmarginal\nrae\n##xia\ncorresponds\nroyce\npact\njasper\nexplosives\npackaging\naluminium\n##ttered\ndenotes\nrhythmic\nspans\nassignments\nhereditary\noutlined\noriginating\nsundays\nlad\nreissued\ngreeting\nbeatrice\n##dic\npillar\nmarcos\nplots\nhandbook\nalcoholic\njudiciary\navant\nslides\nextract\nmasculine\nblur\n##eum\n##force\nhomage\ntrembled\nowens\nhymn\ntrey\nomega\nsignaling\nsocks\naccumulated\nreacted\nattic\ntheo\nlining\nangie\ndistraction\nprimera\ntalbot\n##key\n1200\nti\ncreativity\nbilled\n##hey\ndeacon\neduardo\niden
tifies\nproposition\ndizzy\ngunner\nhogan\n##yam\n##pping\n##hol\nja\n##chan\njensen\nreconstructed\n##berger\nclearance\ndarius\n##nier\nabe\nharlem\nplea\ndei\ncircled\nemotionally\nnotation\nfascist\nneville\nexceeded\nupwards\nviable\nducks\n##fo\nworkforce\nracer\nlimiting\nshri\n##lson\npossesses\n1600\nkerr\nmoths\ndevastating\nladen\ndisturbing\nlocking\n##cture\ngal\nfearing\naccreditation\nflavor\naide\n1870s\nmountainous\n##baum\nmelt\n##ures\nmotel\ntexture\nservers\nsoda\n##mb\nherd\n##nium\nerect\npuzzled\nhum\npeggy\nexaminations\ngould\ntestified\ngeoff\nren\ndevised\nsacks\n##law\ndenial\nposters\ngrunted\ncesar\ntutor\nec\ngerry\nofferings\nbyrne\nfalcons\ncombinations\nct\nincoming\npardon\nrocking\n26th\navengers\nflared\nmankind\nseller\nuttar\nloch\nnadia\nstroking\nexposing\n##hd\nfertile\nancestral\ninstituted\n##has\nnoises\nprophecy\ntaxation\neminent\nvivid\npol\n##bol\ndart\nindirect\nmultimedia\nnotebook\nupside\ndisplaying\nadrenaline\nreferenced\ngeometric\n##iving\nprogression\n##ddy\nblunt\nannounce\n##far\nimplementing\n##lav\naggression\nliaison\ncooler\ncares\nheadache\nplantations\ngorge\ndots\nimpulse\nthickness\nashamed\naveraging\nkathy\nobligation\nprecursor\n137\nfowler\nsymmetry\nthee\n225\nhears\n##rai\nundergoing\nads\nbutcher\nbowler\n##lip\ncigarettes\nsubscription\ngoodness\n##ically\nbrowne\n##hos\n##tech\nkyoto\ndonor\n##erty\ndamaging\nfriction\ndrifting\nexpeditions\nhardened\nprostitution\n152\nfauna\nblankets\nclaw\ntossing\nsnarled\nbutterflies\nrecruits\ninvestigative\ncoated\nhealed\n138\ncommunal\nhai\nxiii\nacademics\nboone\npsychologist\nrestless\nlahore\nstephens\nmba\nbrendan\nforeigners\nprinter\n##pc\nached\nexplode\n27th\ndeed\nscratched\ndared\n##pole\ncardiac\n1780\nokinawa\nproto\ncommando\ncompelled\noddly\nelectrons\n##base\nreplica\nthanksgiving\n##rist\nsheila\ndeliberate\nstafford\ntidal\nrepresentations\nhercules\nou\n##path\n##iated\nkidnapping\nlenses\n##tling\ndeficit\nsamoa\nmouths\nconsuming\ncomputational\nmaze\ngranting\nsmirk\nrazor\nfixture\nideals\ninviting\naiden\nnominal\n##vs\nissuing\njulio\npitt\nramsey\ndocks\n##oss\nexhaust\n##owed\nbavarian\ndraped\nanterior\nmating\nethiopian\nexplores\nnoticing\n##nton\ndiscarded\nconvenience\nhoffman\nendowment\nbeasts\ncartridge\nmormon\npaternal\nprobe\nsleeves\ninterfere\nlump\ndeadline\n##rail\njenks\nbulldogs\nscrap\nalternating\njustified\nreproductive\nnam\nseize\ndescending\nsecretariat\nkirby\ncoupe\ngrouped\nsmash\npanther\nsedan\ntapping\n##18\nlola\ncheer\ngermanic\nunfortunate\n##eter\nunrelated\n##fan\nsubordinate\n##sdale\nsuzanne\nadvertisement\n##ility\nhorsepower\n##lda\ncautiously\ndiscourse\nluigi\n##mans\n##fields\nnoun\nprevalent\nmao\nschneider\neverett\nsurround\ngovernorate\nkira\n##avia\nwestward\n##take\nmisty\nrails\nsustainability\n134\nunused\n##rating\npacks\ntoast\nunwilling\nregulate\nthy\nsuffrage\nnile\nawe\nassam\ndefinitions\ntravelers\naffordable\n##rb\nconferred\nsells\nundefeated\nbeneficial\ntorso\nbasal\nrepeating\nremixes\n##pass\nbahrain\ncables\nfang\n##itated\nexcavated\nnumbering\nstatutory\n##rey\ndeluxe\n##lian\nforested\nramirez\nderbyshire\nzeus\nslamming\ntransfers\nastronomer\nbanana\nlottery\nberg\nhistories\nbamboo\n##uchi\nresurrection\nposterior\nbowls\nvaguely\n##thi\nthou\npreserving\ntensed\noffence\n##inas\nmeyrick\ncallum\nridden\nwatt\nlangdon\ntying\nlowland\nsnorted\ndaring\ntruman\n##hale\n##girl\naura\noverly\nfiling\nweighing\ngoa\ninfections\nphilanthropist\nsaunders\neponymous\n##owski\nlatitude
\nperspectives\nreviewing\nmets\ncommandant\nradial\n##kha\nflashlight\nreliability\nkoch\nvowels\namazed\nada\nelaine\nsupper\n##rth\n##encies\npredator\ndebated\nsoviets\ncola\n##boards\n##nah\ncompartment\ncrooked\narbitrary\nfourteenth\n##ctive\nhavana\nmajors\nsteelers\nclips\nprofitable\nambush\nexited\npackers\n##tile\nnude\ncracks\nfungi\n##е\nlimb\ntrousers\njosie\nshelby\ntens\nfrederic\n##ος\ndefinite\nsmoothly\nconstellation\ninsult\nbaton\ndiscs\nlingering\n##nco\nconclusions\nlent\nstaging\nbecker\ngrandpa\nshaky\n##tron\neinstein\nobstacles\nsk\nadverse\nelle\neconomically\n##moto\nmccartney\nthor\ndismissal\nmotions\nreadings\nnostrils\ntreatise\n##pace\nsqueezing\nevidently\nprolonged\n1783\nvenezuelan\nje\nmarguerite\nbeirut\ntakeover\nshareholders\n##vent\ndenise\ndigit\nairplay\nnorse\n##bbling\nimaginary\npills\nhubert\nblaze\nvacated\neliminating\n##ello\nvine\nmansfield\n##tty\nretrospective\nbarrow\nborne\nclutch\nbail\nforensic\nweaving\n##nett\n##witz\ndesktop\ncitadel\npromotions\nworrying\ndorset\nieee\nsubdivided\n##iating\nmanned\nexpeditionary\npickup\nsynod\nchuckle\n185\nbarney\n##rz\n##ffin\nfunctionality\nkarachi\nlitigation\nmeanings\nuc\nlick\nturbo\nanders\n##ffed\nexecute\ncurl\noppose\nankles\ntyphoon\n##د\n##ache\n##asia\nlinguistics\ncompassion\npressures\ngrazing\nperfection\n##iting\nimmunity\nmonopoly\nmuddy\nbackgrounds\n136\nnamibia\nfrancesca\nmonitors\nattracting\nstunt\ntuition\n##ии\nvegetable\n##mates\n##quent\nmgm\njen\ncomplexes\nforts\n##ond\ncellar\nbites\nseventeenth\nroyals\nflemish\nfailures\nmast\ncharities\n##cular\nperuvian\ncapitals\nmacmillan\nipswich\noutward\nfrigate\npostgraduate\nfolds\nemploying\n##ouse\nconcurrently\nfiery\n##tai\ncontingent\nnightmares\nmonumental\nnicaragua\n##kowski\nlizard\nmal\nfielding\ngig\nreject\n##pad\nharding\n##ipe\ncoastline\n##cin\n##nos\nbeethoven\nhumphrey\ninnovations\n##tam\n##nge\nnorris\ndoris\nsolicitor\nhuang\nobey\n141\n##lc\nniagara\n##tton\nshelves\naug\nbourbon\ncurry\nnightclub\nspecifications\nhilton\n##ndo\ncentennial\ndispersed\nworm\nneglected\nbriggs\nsm\nfont\nkuala\nuneasy\nplc\n##nstein\n##bound\n##aking\n##burgh\nawaiting\npronunciation\n##bbed\n##quest\neh\noptimal\nzhu\nraped\ngreens\npresided\nbrenda\nworries\n##life\nvenetian\nmarxist\nturnout\n##lius\nrefined\nbraced\nsins\ngrasped\nsunderland\nnickel\nspeculated\nlowell\ncyrillic\ncommunism\nfundraising\nresembling\ncolonists\nmutant\nfreddie\nusc\n##mos\ngratitude\n##run\nmural\n##lous\nchemist\nwi\nreminds\n28th\nsteals\ntess\npietro\n##ingen\npromoter\nri\nmicrophone\nhonoured\nrai\nsant\n##qui\nfeather\n##nson\nburlington\nkurdish\nterrorists\ndeborah\nsickness\n##wed\n##eet\nhazard\nirritated\ndesperation\nveil\nclarity\n##rik\njewels\nxv\n##gged\n##ows\n##cup\nberkshire\nunfair\nmysteries\norchid\nwinced\nexhaustion\nrenovations\nstranded\nobe\ninfinity\n##nies\nadapt\nredevelopment\nthanked\nregistry\nolga\ndomingo\nnoir\ntudor\nole\n##atus\ncommenting\nbehaviors\n##ais\ncrisp\npauline\nprobable\nstirling\nwigan\n##bian\nparalympics\npanting\nsurpassed\n##rew\nluca\nbarred\npony\nfamed\n##sters\ncassandra\nwaiter\ncarolyn\nexported\n##orted\nandres\ndestructive\ndeeds\njonah\ncastles\nvacancy\nsuv\n##glass\n1788\norchard\nyep\nfamine\nbelarusian\nsprang\n##forth\nskinny\n##mis\nadministrators\nrotterdam\nzambia\nzhao\nboiler\ndiscoveries\n##ride\n##physics\nlucius\ndisappointing\noutreach\nspoon\n##frame\nqualifications\nunanimously\nenjoys\nregency\n##iidae\nstade\nrealism\nveterinary\nrodgers\ndump\nalain
\nchestnut\ncastile\ncensorship\nrumble\ngibbs\n##itor\ncommunion\nreggae\ninactivated\nlogs\nloads\n##houses\nhomosexual\n##iano\nale\ninforms\n##cas\nphrases\nplaster\nlinebacker\nambrose\nkaiser\nfascinated\n850\nlimerick\nrecruitment\nforge\nmastered\n##nding\nleinster\nrooted\nthreaten\n##strom\nborneo\n##hes\nsuggestions\nscholarships\npropeller\ndocumentaries\npatronage\ncoats\nconstructing\ninvest\nneurons\ncomet\nentirety\nshouts\nidentities\nannoying\nunchanged\nwary\n##antly\n##ogy\nneat\noversight\n##kos\nphillies\nreplay\nconstance\n##kka\nincarnation\nhumble\nskies\nminus\n##acy\nsmithsonian\n##chel\nguerrilla\njar\ncadets\n##plate\nsurplus\naudit\n##aru\ncracking\njoanna\nlouisa\npacing\n##lights\nintentionally\n##iri\ndiner\nnwa\nimprint\naustralians\ntong\nunprecedented\nbunker\nnaive\nspecialists\nark\nnichols\nrailing\nleaked\npedal\n##uka\nshrub\nlonging\nroofs\nv8\ncaptains\nneural\ntuned\n##ntal\n##jet\nemission\nmedina\nfrantic\ncodex\ndefinitive\nsid\nabolition\nintensified\nstocks\nenrique\nsustain\ngenoa\noxide\n##written\nclues\ncha\n##gers\ntributaries\nfragment\nvenom\n##rity\n##ente\n##sca\nmuffled\nvain\nsire\nlaos\n##ingly\n##hana\nhastily\nsnapping\nsurfaced\nsentiment\nmotive\n##oft\ncontests\napproximate\nmesa\nluckily\ndinosaur\nexchanges\npropelled\naccord\nbourne\nrelieve\ntow\nmasks\noffended\n##ues\ncynthia\n##mmer\nrains\nbartender\nzinc\nreviewers\nlois\n##sai\nlegged\narrogant\nrafe\nrosie\ncomprise\nhandicap\nblockade\ninlet\nlagoon\ncopied\ndrilling\nshelley\npetals\n##inian\nmandarin\nobsolete\n##inated\nonward\narguably\nproductivity\ncindy\npraising\nseldom\nbusch\ndiscusses\nraleigh\nshortage\nranged\nstanton\nencouragement\nfirstly\nconceded\novers\ntemporal\n##uke\ncbe\n##bos\nwoo\ncertainty\npumps\n##pton\nstalked\n##uli\nlizzie\nperiodic\nthieves\nweaker\n##night\ngases\nshoving\nchooses\nwc\n##chemical\nprompting\nweights\n##kill\nrobust\nflanked\nsticky\nhu\ntuberculosis\n##eb\n##eal\nchristchurch\nresembled\nwallet\nreese\ninappropriate\npictured\ndistract\nfixing\nfiddle\ngiggled\nburger\nheirs\nhairy\nmechanic\ntorque\napache\nobsessed\nchiefly\ncheng\nlogging\n##tag\nextracted\nmeaningful\nnumb\n##vsky\ngloucestershire\nreminding\n##bay\nunite\n##lit\nbreeds\ndiminished\nclown\nglove\n1860s\n##ن\n##ug\narchibald\nfocal\nfreelance\nsliced\ndepiction\n##yk\norganism\nswitches\nsights\nstray\ncrawling\n##ril\nlever\nleningrad\ninterpretations\nloops\nanytime\nreel\nalicia\ndelighted\n##ech\ninhaled\nxiv\nsuitcase\nbernie\nvega\nlicenses\nnorthampton\nexclusion\ninduction\nmonasteries\nracecourse\nhomosexuality\n##right\n##sfield\n##rky\ndimitri\nmichele\nalternatives\nions\ncommentators\ngenuinely\nobjected\npork\nhospitality\nfencing\nstephan\nwarships\nperipheral\nwit\ndrunken\nwrinkled\nquentin\nspends\ndeparting\nchung\nnumerical\nspokesperson\n##zone\njohannesburg\ncaliber\nkillers\n##udge\nassumes\nneatly\ndemographic\nabigail\nbloc\n##vel\nmounting\n##lain\nbentley\nslightest\nxu\nrecipients\n##jk\nmerlin\n##writer\nseniors\nprisons\nblinking\nhindwings\nflickered\nkappa\n##hel\n80s\nstrengthening\nappealing\nbrewing\ngypsy\nmali\nlashes\nhulk\nunpleasant\nharassment\nbio\ntreaties\npredict\ninstrumentation\npulp\ntroupe\nboiling\nmantle\n##ffe\nins\n##vn\ndividing\nhandles\nverbs\n##onal\ncoconut\nsenegal\n340\nthorough\ngum\nmomentarily\n##sto\ncocaine\npanicked\ndestined\n##turing\nteatro\ndenying\nweary\ncaptained\nmans\n##hawks\n##code\nwakefield\nbollywood\nthankfully\n##16\ncyril\n##wu\namendments\n##bahn\nconsultation\ns
tud\nreflections\nkindness\n1787\ninternally\n##ovo\ntex\nmosaic\ndistribute\npaddy\nseeming\n143\n##hic\npiers\n##15\n##mura\n##verse\npopularly\nwinger\nkang\nsentinel\nmccoy\n##anza\ncovenant\n##bag\nverge\nfireworks\nsuppress\nthrilled\ndominate\n##jar\nswansea\n##60\n142\nreconciliation\n##ndi\nstiffened\ncue\ndorian\n##uf\ndamascus\namor\nida\nforemost\n##aga\nporsche\nunseen\ndir\n##had\n##azi\nstony\nlexi\nmelodies\n##nko\nangular\ninteger\npodcast\nants\ninherent\njaws\njustify\npersona\n##olved\njosephine\n##nr\n##ressed\ncustomary\nflashes\ngala\ncyrus\nglaring\nbackyard\nariel\nphysiology\ngreenland\nhtml\nstir\navon\natletico\nfinch\nmethodology\nked\n##lent\nmas\ncatholicism\ntownsend\nbranding\nquincy\nfits\ncontainers\n1777\nashore\naragon\n##19\nforearm\npoisoning\n##sd\nadopting\nconquer\ngrinding\namnesty\nkeller\nfinances\nevaluate\nforged\nlankan\ninstincts\n##uto\nguam\nbosnian\nphotographed\nworkplace\ndesirable\nprotector\n##dog\nallocation\nintently\nencourages\nwilly\n##sten\nbodyguard\nelectro\nbrighter\n##ν\nbihar\n##chev\nlasts\nopener\namphibious\nsal\nverde\narte\n##cope\ncaptivity\nvocabulary\nyields\n##tted\nagreeing\ndesmond\npioneered\n##chus\nstrap\ncampaigned\nrailroads\n##ович\nemblem\n##dre\nstormed\n501\n##ulous\nmarijuana\nnorthumberland\n##gn\n##nath\nbowen\nlandmarks\nbeaumont\n##qua\ndanube\n##bler\nattorneys\nth\nge\nflyers\ncritique\nvillains\ncass\nmutation\nacc\n##0s\ncolombo\nmckay\nmotif\nsampling\nconcluding\nsyndicate\n##rell\nneon\nstables\nds\nwarnings\nclint\nmourning\nwilkinson\n##tated\nmerrill\nleopard\nevenings\nexhaled\nemil\nsonia\nezra\ndiscrete\nstove\nfarrell\nfifteenth\nprescribed\nsuperhero\n##rier\nworms\nhelm\nwren\n##duction\n##hc\nexpo\n##rator\nhq\nunfamiliar\nantony\nprevents\nacceleration\nfiercely\nmari\npainfully\ncalculations\ncheaper\nign\nclifton\nirvine\ndavenport\nmozambique\n##np\npierced\n##evich\nwonders\n##wig\n##cate\n##iling\ncrusade\nware\n##uel\nenzymes\nreasonably\nmls\n##coe\nmater\nambition\nbunny\neliot\nkernel\n##fin\nasphalt\nheadmaster\ntorah\naden\nlush\npins\nwaived\n##care\n##yas\njoao\nsubstrate\nenforce\n##grad\n##ules\nalvarez\nselections\nepidemic\ntempted\n##bit\nbremen\ntranslates\nensured\nwaterfront\n29th\nforrest\nmanny\nmalone\nkramer\nreigning\ncookies\nsimpler\nabsorption\n205\nengraved\n##ffy\nevaluated\n1778\nhaze\n146\ncomforting\ncrossover\n##abe\nthorn\n##rift\n##imo\n##pop\nsuppression\nfatigue\ncutter\n##tr\n201\nwurttemberg\n##orf\nenforced\nhovering\nproprietary\ngb\nsamurai\nsyllable\nascent\nlacey\ntick\nlars\ntractor\nmerchandise\nrep\nbouncing\ndefendants\n##yre\nhuntington\n##ground\n##oko\nstandardized\n##hor\n##hima\nassassinated\nnu\npredecessors\nrainy\nliar\nassurance\nlyrical\n##uga\nsecondly\nflattened\nios\nparameter\nundercover\n##mity\nbordeaux\npunish\nridges\nmarkers\nexodus\ninactive\nhesitate\ndebbie\nnyc\npledge\nsavoy\nnagar\noffset\norganist\n##tium\nhesse\nmarin\nconverting\n##iver\ndiagram\npropulsion\npu\nvalidity\nreverted\nsupportive\n##dc\nministries\nclans\nresponds\nproclamation\n##inae\n##ø\n##rea\nein\npleading\npatriot\nsf\nbirch\nislanders\nstrauss\nhates\n##dh\nbrandenburg\nconcession\nrd\n##ob\n1900s\nkillings\ntextbook\nantiquity\ncinematography\nwharf\nembarrassing\nsetup\ncreed\nfarmland\ninequality\ncentred\nsignatures\nfallon\n370\n##ingham\n##uts\nceylon\ngazing\ndirective\nlaurie\n##tern\nglobally\n##uated\n##dent\nallah\nexcavation\nthreads\n##cross\n148\nfrantically\nicc\nutilize\ndetermines\nrespiratory\nthoughtful\nreceptions\n
##dicate\nmerging\nchandra\nseine\n147\nbuilders\nbuilds\ndiagnostic\ndev\nvisibility\ngoddamn\nanalyses\ndhaka\ncho\nproves\nchancel\nconcurrent\ncuriously\ncanadians\npumped\nrestoring\n1850s\nturtles\njaguar\nsinister\nspinal\ntraction\ndeclan\nvows\n1784\nglowed\ncapitalism\nswirling\ninstall\nuniversidad\n##lder\n##oat\nsoloist\n##genic\n##oor\ncoincidence\nbeginnings\nnissan\ndip\nresorts\ncaucasus\ncombustion\ninfectious\n##eno\npigeon\nserpent\n##itating\nconclude\nmasked\nsalad\njew\n##gr\nsurreal\ntoni\n##wc\nharmonica\n151\n##gins\n##etic\n##coat\nfishermen\nintending\nbravery\n##wave\nklaus\ntitan\nwembley\ntaiwanese\nransom\n40th\nincorrect\nhussein\neyelids\njp\ncooke\ndramas\nutilities\n##etta\n##print\neisenhower\nprincipally\ngranada\nlana\n##rak\nopenings\nconcord\n##bl\nbethany\nconnie\nmorality\nsega\n##mons\n##nard\nearnings\n##kara\n##cine\nwii\ncommunes\n##rel\ncoma\ncomposing\nsoftened\nsevered\ngrapes\n##17\nnguyen\nanalyzed\nwarlord\nhubbard\nheavenly\nbehave\nslovenian\n##hit\n##ony\nhailed\nfilmmakers\ntrance\ncaldwell\nskye\nunrest\ncoward\nlikelihood\n##aging\nbern\nsci\ntaliban\nhonolulu\npropose\n##wang\n1700\nbrowser\nimagining\ncobra\ncontributes\ndukes\ninstinctively\nconan\nviolinist\n##ores\naccessories\ngradual\n##amp\nquotes\nsioux\n##dating\nundertake\nintercepted\nsparkling\ncompressed\n139\nfungus\ntombs\nhaley\nimposing\nrests\ndegradation\nlincolnshire\nretailers\nwetlands\ntulsa\ndistributor\ndungeon\nnun\ngreenhouse\nconvey\natlantis\naft\nexits\noman\ndresser\nlyons\n##sti\njoking\neddy\njudgement\nomitted\ndigits\n##cts\n##game\njuniors\n##rae\ncents\nstricken\nune\n##ngo\nwizards\nweir\nbreton\nnan\ntechnician\nfibers\nliking\nroyalty\n##cca\n154\npersia\nterribly\nmagician\n##rable\n##unt\nvance\ncafeteria\nbooker\ncamille\nwarmer\n##static\nconsume\ncavern\ngaps\ncompass\ncontemporaries\nfoyer\nsoothing\ngraveyard\nmaj\nplunged\nblush\n##wear\ncascade\ndemonstrates\nordinance\n##nov\nboyle\n##lana\nrockefeller\nshaken\nbanjo\nizzy\n##ense\nbreathless\nvines\n##32\n##eman\nalterations\nchromosome\ndwellings\nfeudal\nmole\n153\ncatalonia\nrelics\ntenant\nmandated\n##fm\nfridge\nhats\nhonesty\npatented\nraul\nheap\ncruisers\naccusing\nenlightenment\ninfants\nwherein\nchatham\ncontractors\nzen\naffinity\nhc\nosborne\npiston\n156\ntraps\nmaturity\n##rana\nlagos\n##zal\npeering\n##nay\nattendant\ndealers\nprotocols\nsubset\nprospects\nbiographical\n##cre\nartery\n##zers\ninsignia\nnuns\nendured\n##eration\nrecommend\nschwartz\nserbs\nberger\ncromwell\ncrossroads\n##ctor\nenduring\nclasped\ngrounded\n##bine\nmarseille\ntwitched\nabel\nchoke\nhttps\ncatalyst\nmoldova\nitalians\n##tist\ndisastrous\nwee\n##oured\n##nti\nwwf\nnope\n##piration\n##asa\nexpresses\nthumbs\n167\n##nza\ncoca\n1781\ncheating\n##ption\nskipped\nsensory\nheidelberg\nspies\nsatan\ndangers\nsemifinal\n202\nbohemia\nwhitish\nconfusing\nshipbuilding\nrelies\nsurgeons\nlandings\nravi\nbaku\nmoor\nsuffix\nalejandro\n##yana\nlitre\nupheld\n##unk\nrajasthan\n##rek\ncoaster\ninsists\nposture\nscenarios\netienne\nfavoured\nappoint\ntransgender\nelephants\npoked\ngreenwood\ndefences\nfulfilled\nmilitant\nsomali\n1758\nchalk\npotent\n##ucci\nmigrants\nwink\nassistants\nnos\nrestriction\nactivism\nniger\n##ario\ncolon\nshaun\n##sat\ndaphne\n##erated\nswam\ncongregations\nreprise\nconsiderations\nmagnet\nplayable\nxvi\n##р\noverthrow\ntobias\nknob\nchavez\ncoding\n##mers\npropped\nkatrina\norient\nnewcomer\n##suke\ntemperate\n##pool\nfarmhouse\ninterrogation\n##vd\ncommitting\n##vert\nforth
coming\nstrawberry\njoaquin\nmacau\nponds\nshocking\nsiberia\n##cellular\nchant\ncontributors\n##nant\n##ologists\nsped\nabsorb\nhail\n1782\nspared\n##hore\nbarbados\nkarate\nopus\noriginates\nsaul\n##xie\nevergreen\nleaped\n##rock\ncorrelation\nexaggerated\nweekday\nunification\nbump\ntracing\nbrig\nafb\npathways\nutilizing\n##ners\nmod\nmb\ndisturbance\nkneeling\n##stad\n##guchi\n100th\npune\n##thy\ndecreasing\n168\nmanipulation\nmiriam\nacademia\necosystem\noccupational\nrbi\n##lem\nrift\n##14\nrotary\nstacked\nincorporation\nawakening\ngenerators\nguerrero\nracist\n##omy\ncyber\nderivatives\nculminated\nallie\nannals\npanzer\nsainte\nwikipedia\npops\nzu\naustro\n##vate\nalgerian\npolitely\nnicholson\nmornings\neducate\ntastes\nthrill\ndartmouth\n##gating\ndb\n##jee\nregan\ndiffering\nconcentrating\nchoreography\ndivinity\n##media\npledged\nalexandre\nrouting\ngregor\nmadeline\n##idal\napocalypse\n##hora\ngunfire\nculminating\nelves\nfined\nliang\nlam\nprogrammed\ntar\nguessing\ntransparency\ngabrielle\n##gna\ncancellation\nflexibility\n##lining\naccession\nshea\nstronghold\nnets\nspecializes\n##rgan\nabused\nhasan\nsgt\nling\nexceeding\n##₄\nadmiration\nsupermarket\n##ark\nphotographers\nspecialised\ntilt\nresonance\nhmm\nperfume\n380\nsami\nthreatens\ngarland\nbotany\nguarding\nboiled\ngreet\npuppy\nrusso\nsupplier\nwilmington\nvibrant\nvijay\n##bius\nparalympic\ngrumbled\npaige\nfaa\nlicking\nmargins\nhurricanes\n##gong\nfest\ngrenade\nripping\n##uz\ncounseling\nweigh\n##sian\nneedles\nwiltshire\nedison\ncostly\n##not\nfulton\ntramway\nredesigned\nstaffordshire\ncache\ngasping\nwatkins\nsleepy\ncandidacy\n##group\nmonkeys\ntimeline\nthrobbing\n##bid\n##sos\nberth\nuzbekistan\nvanderbilt\nbothering\noverturned\nballots\ngem\n##iger\nsunglasses\nsubscribers\nhooker\ncompelling\nang\nexceptionally\nsaloon\nstab\n##rdi\ncarla\nterrifying\nrom\n##vision\ncoil\n##oids\nsatisfying\nvendors\n31st\nmackay\ndeities\noverlooked\nambient\nbahamas\nfelipe\nolympia\nwhirled\nbotanist\nadvertised\ntugging\n##dden\ndisciples\nmorales\nunionist\nrites\nfoley\nmorse\nmotives\ncreepy\n##₀\nsoo\n##sz\nbargain\nhighness\nfrightening\nturnpike\ntory\nreorganization\n##cer\ndepict\nbiographer\n##walk\nunopposed\nmanifesto\n##gles\ninstitut\nemile\naccidental\nkapoor\n##dam\nkilkenny\ncortex\nlively\n##13\nromanesque\njain\nshan\ncannons\n##ood\n##ske\npetrol\nechoing\namalgamated\ndisappears\ncautious\nproposes\nsanctions\ntrenton\n##ر\nflotilla\naus\ncontempt\ntor\ncanary\ncote\ntheirs\n##hun\nconceptual\ndeleted\nfascinating\npaso\nblazing\nelf\nhonourable\nhutchinson\n##eiro\n##outh\n##zin\nsurveyor\ntee\namidst\nwooded\nreissue\nintro\n##ono\ncobb\nshelters\nnewsletter\nhanson\nbrace\nencoding\nconfiscated\ndem\ncaravan\nmarino\nscroll\nmelodic\ncows\nimam\n##adi\n##aneous\nnorthward\nsearches\nbiodiversity\ncora\n310\nroaring\n##bers\nconnell\ntheologian\nhalo\ncompose\npathetic\nunmarried\ndynamo\n##oot\naz\ncalculation\ntoulouse\ndeserves\nhumour\nnr\nforgiveness\ntam\nundergone\nmartyr\npamela\nmyths\nwhore\ncounselor\nhicks\n290\nheavens\nbattleship\nelectromagnetic\n##bbs\nstellar\nestablishments\npresley\nhopped\n##chin\ntemptation\n90s\nwills\nnas\n##yuan\nnhs\n##nya\nseminars\n##yev\nadaptations\ngong\nasher\nlex\nindicator\nsikh\ntobago\ncites\ngoin\n##yte\nsatirical\n##gies\ncharacterised\ncorrespond\nbubbles\nlure\nparticipates\n##vid\neruption\nskate\ntherapeutic\n1785\ncanals\nwholesale\ndefaulted\nsac\n460\npetit\n##zzled\nvirgil\nleak\nravens\n256\nportraying\n##yx\nghetto\ncreators\ndam
s\nportray\nvicente\n##rington\nfae\nnamesake\nbounty\n##arium\njoachim\n##ota\n##iser\naforementioned\naxle\nsnout\ndepended\ndismantled\nreuben\n480\n##ibly\ngallagher\n##lau\n##pd\nearnest\n##ieu\n##iary\ninflicted\nobjections\n##llar\nasa\ngritted\n##athy\njericho\n##sea\n##was\nflick\nunderside\nceramics\nundead\nsubstituted\n195\neastward\nundoubtedly\nwheeled\nchimney\n##iche\nguinness\ncb\n##ager\nsiding\n##bell\ntraitor\nbaptiste\ndisguised\ninauguration\n149\ntipperary\nchoreographer\nperched\nwarmed\nstationary\neco\n##ike\n##ntes\nbacterial\n##aurus\nflores\nphosphate\n##core\nattacker\ninvaders\nalvin\nintersects\na1\nindirectly\nimmigrated\nbusinessmen\ncornelius\nvalves\nnarrated\npill\nsober\nul\nnationale\nmonastic\napplicants\nscenery\n##jack\n161\nmotifs\nconstitutes\ncpu\n##osh\njurisdictions\nsd\ntuning\nirritation\nwoven\n##uddin\nfertility\ngao\n##erie\nantagonist\nimpatient\nglacial\nhides\nboarded\ndenominations\ninterception\n##jas\ncookie\nnicola\n##tee\nalgebraic\nmarquess\nbahn\nparole\nbuyers\nbait\nturbines\npaperwork\nbestowed\nnatasha\nrenee\noceans\npurchases\n157\nvaccine\n215\n##tock\nfixtures\nplayhouse\nintegrate\njai\noswald\nintellectuals\n##cky\nbooked\nnests\nmortimer\n##isi\nobsession\nsept\n##gler\n##sum\n440\nscrutiny\nsimultaneous\nsquinted\n##shin\ncollects\noven\nshankar\npenned\nremarkably\n##я\nslips\nluggage\nspectral\n1786\ncollaborations\nlouie\nconsolidation\n##ailed\n##ivating\n420\nhoover\nblackpool\nharness\nignition\nvest\ntails\nbelmont\nmongol\nskinner\n##nae\nvisually\nmage\nderry\n##tism\n##unce\nstevie\ntransitional\n##rdy\nredskins\ndrying\nprep\nprospective\n##21\nannoyance\noversee\n##loaded\nfills\n##books\n##iki\nannounces\nfda\nscowled\nrespects\nprasad\nmystic\ntucson\n##vale\nrevue\nspringer\nbankrupt\n1772\naristotle\nsalvatore\nhabsburg\n##geny\ndal\nnatal\nnut\npod\nchewing\ndarts\nmoroccan\nwalkover\nrosario\nlenin\npunjabi\n##ße\ngrossed\nscattering\nwired\ninvasive\nhui\npolynomial\ncorridors\nwakes\ngina\nportrays\n##cratic\narid\nretreating\nerich\nirwin\nsniper\n##dha\nlinen\nlindsey\nmaneuver\nbutch\nshutting\nsocio\nbounce\ncommemorative\npostseason\njeremiah\npines\n275\nmystical\nbeads\nbp\nabbas\nfurnace\nbidding\nconsulted\nassaulted\nempirical\nrubble\nenclosure\nsob\nweakly\ncancel\npolly\nyielded\n##emann\ncurly\nprediction\nbattered\n70s\nvhs\njacqueline\nrender\nsails\nbarked\ndetailing\ngrayson\nriga\nsloane\nraging\n##yah\nherbs\nbravo\n##athlon\nalloy\ngiggle\nimminent\nsuffers\nassumptions\nwaltz\n##itate\naccomplishments\n##ited\nbathing\nremixed\ndeception\nprefix\n##emia\ndeepest\n##tier\n##eis\nbalkan\nfrogs\n##rong\nslab\n##pate\nphilosophers\npeterborough\ngrains\nimports\ndickinson\nrwanda\n##atics\n1774\ndirk\nlan\ntablets\n##rove\nclone\n##rice\ncaretaker\nhostilities\nmclean\n##gre\nregimental\ntreasures\nnorms\nimpose\ntsar\ntango\ndiplomacy\nvariously\ncomplain\n192\nrecognise\narrests\n1779\ncelestial\npulitzer\n##dus\nbing\nlibretto\n##moor\nadele\nsplash\n##rite\nexpectation\nlds\nconfronts\n##izer\nspontaneous\nharmful\nwedge\nentrepreneurs\nbuyer\n##ope\nbilingual\ntranslate\nrugged\nconner\ncirculated\nuae\neaton\n##gra\n##zzle\nlingered\nlockheed\nvishnu\nreelection\nalonso\n##oom\njoints\nyankee\nheadline\ncooperate\nheinz\nlaureate\ninvading\n##sford\nechoes\nscandinavian\n##dham\nhugging\nvitamin\nsalute\nmicah\nhind\ntrader\n##sper\nradioactive\n##ndra\nmilitants\npoisoned\nratified\nremark\ncampeonato\ndeprived\nwander\nprop\n##dong\noutlook\n##tani\n##rix\n##eye\nchiang\nd
arcy\n##oping\nmandolin\nspice\nstatesman\nbabylon\n182\nwalled\nforgetting\nafro\n##cap\n158\ngiorgio\nbuffer\n##polis\nplanetary\n##gis\noverlap\nterminals\nkinda\ncentenary\n##bir\narising\nmanipulate\nelm\nke\n1770\nak\n##tad\nchrysler\nmapped\nmoose\npomeranian\nquad\nmacarthur\nassemblies\nshoreline\nrecalls\nstratford\n##rted\nnoticeable\n##evic\nimp\n##rita\n##sque\naccustomed\nsupplying\ntents\ndisgusted\nvogue\nsipped\nfilters\nkhz\nreno\nselecting\nluftwaffe\nmcmahon\ntyne\nmasterpiece\ncarriages\ncollided\ndunes\nexercised\nflare\nremembers\nmuzzle\n##mobile\nheck\n##rson\nburgess\nlunged\nmiddleton\nboycott\nbilateral\n##sity\nhazardous\nlumpur\nmultiplayer\nspotlight\njackets\ngoldman\nliege\nporcelain\nrag\nwaterford\nbenz\nattracts\nhopeful\nbattling\nottomans\nkensington\nbaked\nhymns\ncheyenne\nlattice\nlevine\nborrow\npolymer\nclashes\nmichaels\nmonitored\ncommitments\ndenounced\n##25\n##von\ncavity\n##oney\nhobby\nakin\n##holders\nfutures\nintricate\ncornish\npatty\n##oned\nillegally\ndolphin\n##lag\nbarlow\nyellowish\nmaddie\napologized\nluton\nplagued\n##puram\nnana\n##rds\nsway\nfanny\nłodz\n##rino\npsi\nsuspicions\nhanged\n##eding\ninitiate\ncharlton\n##por\nnak\ncompetent\n235\nanalytical\nannex\nwardrobe\nreservations\n##rma\nsect\n162\nfairfax\nhedge\npiled\nbuckingham\nuneven\nbauer\nsimplicity\nsnyder\ninterpret\naccountability\ndonors\nmoderately\nbyrd\ncontinents\n##cite\n##max\ndisciple\nhr\njamaican\nping\nnominees\n##uss\nmongolian\ndiver\nattackers\neagerly\nideological\npillows\nmiracles\napartheid\nrevolver\nsulfur\nclinics\nmoran\n163\n##enko\nile\nkaty\nrhetoric\n##icated\nchronology\nrecycling\n##hrer\nelongated\nmughal\npascal\nprofiles\nvibration\ndatabases\ndomination\n##fare\n##rant\nmatthias\ndigest\nrehearsal\npolling\nweiss\ninitiation\nreeves\nclinging\nflourished\nimpress\nngo\n##hoff\n##ume\nbuckley\nsymposium\nrhythms\nweed\nemphasize\ntransforming\n##taking\n##gence\n##yman\naccountant\nanalyze\nflicker\nfoil\npriesthood\nvoluntarily\ndecreases\n##80\n##hya\nslater\nsv\ncharting\nmcgill\n##lde\nmoreno\n##iu\nbesieged\nzur\nrobes\n##phic\nadmitting\napi\ndeported\nturmoil\npeyton\nearthquakes\n##ares\nnationalists\nbeau\nclair\nbrethren\ninterrupt\nwelch\ncurated\ngalerie\nrequesting\n164\n##ested\nimpending\nsteward\nviper\n##vina\ncomplaining\nbeautifully\nbrandy\nfoam\nnl\n1660\n##cake\nalessandro\npunches\nlaced\nexplanations\n##lim\nattribute\nclit\nreggie\ndiscomfort\n##cards\nsmoothed\nwhales\n##cene\nadler\ncountered\nduffy\ndisciplinary\nwidening\nrecipe\nreliance\nconducts\ngoats\ngradient\npreaching\n##shaw\nmatilda\nquasi\nstriped\nmeridian\ncannabis\ncordoba\ncertificates\n##agh\n##tering\ngraffiti\nhangs\npilgrims\nrepeats\n##ych\nrevive\nurine\netat\n##hawk\nfueled\nbelts\nfuzzy\nsusceptible\n##hang\nmauritius\nsalle\nsincere\nbeers\nhooks\n##cki\narbitration\nentrusted\nadvise\nsniffed\nseminar\njunk\ndonnell\nprocessors\nprincipality\nstrapped\ncelia\nmendoza\neverton\nfortunes\nprejudice\nstarving\nreassigned\nsteamer\n##lund\ntuck\nevenly\nforeman\n##ffen\ndans\n375\nenvisioned\nslit\n##xy\nbaseman\nliberia\nrosemary\n##weed\nelectrified\nperiodically\npotassium\nstride\ncontexts\nsperm\nslade\nmariners\ninflux\nbianca\nsubcommittee\n##rane\nspilling\nicao\nestuary\n##nock\ndelivers\niphone\n##ulata\nisa\nmira\nbohemian\ndessert\n##sbury\nwelcoming\nproudly\nslowing\n##chs\nmusee\nascension\nruss\n##vian\nwaits\n##psy\nafricans\nexploit\n##morphic\ngov\neccentric\ncrab\npeck\n##ull\nentrances\nformidable\nmarketplace\ngro
om\nbolted\nmetabolism\npatton\nrobbins\ncourier\npayload\nendure\n##ifier\nandes\nrefrigerator\n##pr\nornate\n##uca\nruthless\nillegitimate\nmasonry\nstrasbourg\nbikes\nadobe\n##³\napples\nquintet\nwillingly\nniche\nbakery\ncorpses\nenergetic\n##cliffe\n##sser\n##ards\n177\ncentimeters\ncentro\nfuscous\ncretaceous\nrancho\n##yde\nandrei\ntelecom\ntottenham\noasis\nordination\nvulnerability\npresiding\ncorey\ncp\npenguins\nsims\n##pis\nmalawi\npiss\n##48\ncorrection\n##cked\n##ffle\n##ryn\ncountdown\ndetectives\npsychiatrist\npsychedelic\ndinosaurs\nblouse\n##get\nchoi\nvowed\n##oz\nrandomly\n##pol\n49ers\nscrub\nblanche\nbruins\ndusseldorf\n##using\nunwanted\n##ums\n212\ndominique\nelevations\nheadlights\nom\nlaguna\n##oga\n1750\nfamously\nignorance\nshrewsbury\n##aine\najax\nbreuning\nche\nconfederacy\ngreco\noverhaul\n##screen\npaz\nskirts\ndisagreement\ncruelty\njagged\nphoebe\nshifter\nhovered\nviruses\n##wes\nmandy\n##lined\n##gc\nlandlord\nsquirrel\ndashed\n##ι\nornamental\ngag\nwally\ngrange\nliteral\nspurs\nundisclosed\nproceeding\nyin\n##text\nbillie\norphan\nspanned\nhumidity\nindy\nweighted\npresentations\nexplosions\nlucian\n##tary\nvaughn\nhindus\n##anga\n##hell\npsycho\n171\ndaytona\nprotects\nefficiently\nrematch\nsly\ntandem\n##oya\nrebranded\nimpaired\nhee\nmetropolis\npeach\ngodfrey\ndiaspora\nethnicity\nprosperous\ngleaming\ndar\ngrossing\nplayback\n##rden\nstripe\npistols\n##tain\nbirths\nlabelled\n##cating\n172\nrudy\nalba\n##onne\naquarium\nhostility\n##gb\n##tase\nshudder\nsumatra\nhardest\nlakers\nconsonant\ncreeping\ndemos\nhomicide\ncapsule\nzeke\nliberties\nexpulsion\npueblo\n##comb\ntrait\ntransporting\n##ddin\n##neck\n##yna\ndepart\ngregg\nmold\nledge\nhangar\noldham\nplayboy\ntermination\nanalysts\ngmbh\nromero\n##itic\ninsist\ncradle\nfilthy\nbrightness\nslash\nshootout\ndeposed\nbordering\n##truct\nisis\nmicrowave\ntumbled\nsheltered\ncathy\nwerewolves\nmessy\nandersen\nconvex\nclapped\nclinched\nsatire\nwasting\nedo\nvc\nrufus\n##jak\nmont\n##etti\npoznan\n##keeping\nrestructuring\ntransverse\n##rland\nazerbaijani\nslovene\ngestures\nroommate\nchoking\nshear\n##quist\nvanguard\noblivious\n##hiro\ndisagreed\nbaptism\n##lich\ncoliseum\n##aceae\nsalvage\nsociete\ncory\nlocke\nrelocation\nrelying\nversailles\nahl\nswelling\n##elo\ncheerful\n##word\n##edes\ngin\nsarajevo\nobstacle\ndiverted\n##nac\nmessed\nthoroughbred\nfluttered\nutrecht\nchewed\nacquaintance\nassassins\ndispatch\nmirza\n##wart\nnike\nsalzburg\nswell\nyen\n##gee\nidle\nligue\nsamson\n##nds\n##igh\nplayful\nspawned\n##cise\ntease\n##case\nburgundy\n##bot\nstirring\nskeptical\ninterceptions\nmarathi\n##dies\nbedrooms\naroused\npinch\n##lik\npreferences\ntattoos\nbuster\ndigitally\nprojecting\nrust\n##ital\nkitten\npriorities\naddison\npseudo\n##guard\ndusk\nicons\nsermon\n##psis\n##iba\nbt\n##lift\n##xt\nju\ntruce\nrink\n##dah\n##wy\ndefects\npsychiatry\noffences\ncalculate\nglucose\n##iful\n##rized\n##unda\nfrancaise\n##hari\nrichest\nwarwickshire\ncarly\n1763\npurity\nredemption\nlending\n##cious\nmuse\nbruises\ncerebral\naero\ncarving\n##name\npreface\nterminology\ninvade\nmonty\n##int\nanarchist\nblurred\n##iled\nrossi\ntreats\nguts\nshu\nfoothills\nballads\nundertaking\npremise\ncecilia\naffiliates\nblasted\nconditional\nwilder\nminors\ndrone\nrudolph\nbuffy\nswallowing\nhorton\nattested\n##hop\nrutherford\nhowell\nprimetime\nlivery\npenal\n##bis\nminimize\nhydro\nwrecked\nwrought\npalazzo\n##gling\ncans\nvernacular\nfriedman\nnobleman\nshale\nwalnut\ndanielle\n##ection\n##tley\nsears\n##kum
ar\nchords\nlend\nflipping\nstreamed\npor\ndracula\ngallons\nsacrifices\ngamble\norphanage\n##iman\nmckenzie\n##gible\nboxers\ndaly\n##balls\n##ان\n208\n##ific\n##rative\n##iq\nexploited\nslated\n##uity\ncircling\nhillary\npinched\ngoldberg\nprovost\ncampaigning\nlim\npiles\nironically\njong\nmohan\nsuccessors\nusaf\n##tem\n##ught\nautobiographical\nhaute\npreserves\n##ending\nacquitted\ncomparisons\n203\nhydroelectric\ngangs\ncypriot\ntorpedoes\nrushes\nchrome\nderive\nbumps\ninstability\nfiat\npets\n##mbe\nsilas\ndye\nreckless\nsettler\n##itation\ninfo\nheats\n##writing\n176\ncanonical\nmaltese\nfins\nmushroom\nstacy\naspen\navid\n##kur\n##loading\nvickers\ngaston\nhillside\nstatutes\nwilde\ngail\nkung\nsabine\ncomfortably\nmotorcycles\n##rgo\n169\npneumonia\nfetch\n##sonic\naxel\nfaintly\nparallels\n##oop\nmclaren\nspouse\ncompton\ninterdisciplinary\nminer\n##eni\n181\nclamped\n##chal\n##llah\nseparates\nversa\n##mler\nscarborough\nlabrador\n##lity\n##osing\nrutgers\nhurdles\ncomo\n166\nburt\ndivers\n##100\nwichita\ncade\ncoincided\n##erson\nbruised\nmla\n##pper\nvineyard\n##ili\n##brush\nnotch\nmentioning\njase\nhearted\nkits\ndoe\n##acle\npomerania\n##ady\nronan\nseizure\npavel\nproblematic\n##zaki\ndomenico\n##ulin\ncatering\npenelope\ndependence\nparental\nemilio\nministerial\natkinson\n##bolic\nclarkson\nchargers\ncolby\ngrill\npeeked\narises\nsummon\n##aged\nfools\n##grapher\nfaculties\nqaeda\n##vial\ngarner\nrefurbished\n##hwa\ngeelong\ndisasters\nnudged\nbs\nshareholder\nlori\nalgae\nreinstated\nrot\n##ades\n##nous\ninvites\nstainless\n183\ninclusive\n##itude\ndiocesan\ntil\n##icz\ndenomination\n##xa\nbenton\nfloral\nregisters\n##ider\n##erman\n##kell\nabsurd\nbrunei\nguangzhou\nhitter\nretaliation\n##uled\n##eve\nblanc\nnh\nconsistency\ncontamination\n##eres\n##rner\ndire\npalermo\nbroadcasters\ndiaries\ninspire\nvols\nbrewer\ntightening\nky\nmixtape\nhormone\n##tok\nstokes\n##color\n##dly\n##ssi\npg\n##ometer\n##lington\nsanitation\n##tility\nintercontinental\napps\n##adt\n¹⁄₂\ncylinders\neconomies\nfavourable\nunison\ncroix\ngertrude\nodyssey\nvanity\ndangling\n##logists\nupgrades\ndice\nmiddleweight\npractitioner\n##ight\n206\nhenrik\nparlor\norion\nangered\nlac\npython\nblurted\n##rri\nsensual\nintends\nswings\nangled\n##phs\nhusky\nattain\npeerage\nprecinct\ntextiles\ncheltenham\nshuffled\ndai\nconfess\ntasting\nbhutan\n##riation\ntyrone\nsegregation\nabrupt\nruiz\n##rish\nsmirked\nblackwell\nconfidential\nbrowning\namounted\n##put\nvase\nscarce\nfabulous\nraided\nstaple\nguyana\nunemployed\nglider\nshay\n##tow\ncarmine\ntroll\nintervene\nsquash\nsuperstar\n##uce\ncylindrical\nlen\nroadway\nresearched\nhandy\n##rium\n##jana\nmeta\nlao\ndeclares\n##rring\n##tadt\n##elin\n##kova\nwillem\nshrubs\nnapoleonic\nrealms\nskater\nqi\nvolkswagen\n##ł\ntad\nhara\narchaeologist\nawkwardly\neerie\n##kind\nwiley\n##heimer\n##24\ntitus\norganizers\ncfl\ncrusaders\nlama\nusb\nvent\nenraged\nthankful\noccupants\nmaximilian\n##gaard\npossessing\ntextbooks\n##oran\ncollaborator\nquaker\n##ulo\navalanche\nmono\nsilky\nstraits\nisaiah\nmustang\nsurged\nresolutions\npotomac\ndescend\ncl\nkilograms\nplato\nstrains\nsaturdays\n##olin\nbernstein\n##ype\nholstein\nponytail\n##watch\nbelize\nconversely\nheroine\nperpetual\n##ylus\ncharcoal\npiedmont\nglee\nnegotiating\nbackdrop\nprologue\n##jah\n##mmy\npasadena\nclimbs\nramos\nsunni\n##holm\n##tner\n##tri\nanand\ndeficiency\nhertfordshire\nstout\n##avi\naperture\norioles\n##irs\ndoncaster\nintrigued\nbombed\ncoating\notis\n##mat\ncocktail\n##jit\n##e
to\namir\narousal\nsar\n##proof\n##act\n##ories\ndixie\npots\n##bow\nwhereabouts\n159\n##fted\ndrains\nbullying\ncottages\nscripture\ncoherent\nfore\npoe\nappetite\n##uration\nsampled\n##ators\n##dp\nderrick\nrotor\njays\npeacock\ninstallment\n##rro\nadvisors\n##coming\nrodeo\nscotch\n##mot\n##db\n##fen\n##vant\nensued\nrodrigo\ndictatorship\nmartyrs\ntwenties\n##н\ntowed\nincidence\nmarta\nrainforest\nsai\nscaled\n##cles\noceanic\nqualifiers\nsymphonic\nmcbride\ndislike\ngeneralized\naubrey\ncolonization\n##iation\n##lion\n##ssing\ndisliked\nlublin\nsalesman\n##ulates\nspherical\nwhatsoever\nsweating\navalon\ncontention\npunt\nseverity\nalderman\natari\n##dina\n##grant\n##rop\nscarf\nseville\nvertices\nannexation\nfairfield\nfascination\ninspiring\nlaunches\npalatinate\nregretted\n##rca\nferal\n##iom\nelk\nnap\nolsen\nreddy\nyong\n##leader\n##iae\ngarment\ntransports\nfeng\ngracie\noutrage\nviceroy\ninsides\n##esis\nbreakup\ngrady\norganizer\nsofter\ngrimaced\n222\nmurals\ngalicia\narranging\nvectors\n##rsten\nbas\n##sb\n##cens\nsloan\n##eka\nbitten\nara\nfender\nnausea\nbumped\nkris\nbanquet\ncomrades\ndetector\npersisted\n##llan\nadjustment\nendowed\ncinemas\n##shot\nsellers\n##uman\npeek\nepa\nkindly\nneglect\nsimpsons\ntalon\nmausoleum\nrunaway\nhangul\nlookout\n##cic\nrewards\ncoughed\nacquainted\nchloride\n##ald\nquicker\naccordion\nneolithic\n##qa\nartemis\ncoefficient\nlenny\npandora\ntx\n##xed\necstasy\nlitter\nsegunda\nchairperson\ngemma\nhiss\nrumor\nvow\nnasal\nantioch\ncompensate\npatiently\ntransformers\n##eded\njudo\nmorrow\npenis\nposthumous\nphilips\nbandits\nhusbands\ndenote\nflaming\n##any\n##phones\nlangley\nyorker\n1760\nwalters\n##uo\n##kle\ngubernatorial\nfatty\nsamsung\nleroy\noutlaw\n##nine\nunpublished\npoole\njakob\n##ᵢ\n##ₙ\ncrete\ndistorted\nsuperiority\n##dhi\nintercept\ncrust\nmig\nclaus\ncrashes\npositioning\n188\nstallion\n301\nfrontal\narmistice\n##estinal\nelton\naj\nencompassing\ncamel\ncommemorated\nmalaria\nwoodward\ncalf\ncigar\npenetrate\n##oso\nwillard\n##rno\n##uche\nillustrate\namusing\nconvergence\nnoteworthy\n##lma\n##rva\njourneys\nrealise\nmanfred\n##sable\n410\n##vocation\nhearings\nfiance\n##posed\neducators\nprovoked\nadjusting\n##cturing\nmodular\nstockton\npaterson\nvlad\nrejects\nelectors\nselena\nmaureen\n##tres\nuber\n##rce\nswirled\n##num\nproportions\nnanny\npawn\nnaturalist\nparma\napostles\nawoke\nethel\nwen\n##bey\nmonsoon\noverview\n##inating\nmccain\nrendition\nrisky\nadorned\n##ih\nequestrian\ngermain\nnj\nconspicuous\nconfirming\n##yoshi\nshivering\n##imeter\nmilestone\nrumours\nflinched\nbounds\nsmacked\ntoken\n##bei\nlectured\nautomobiles\n##shore\nimpacted\n##iable\nnouns\nnero\n##leaf\nismail\nprostitute\ntrams\n##lace\nbridget\nsud\nstimulus\nimpressions\nreins\nrevolves\n##oud\n##gned\ngiro\nhoneymoon\n##swell\ncriterion\n##sms\n##uil\nlibyan\nprefers\n##osition\n211\npreview\nsucks\naccusation\nbursts\nmetaphor\ndiffusion\ntolerate\nfaye\nbetting\ncinematographer\nliturgical\nspecials\nbitterly\nhumboldt\n##ckle\nflux\nrattled\n##itzer\narchaeologists\nodor\nauthorised\nmarshes\ndiscretion\n##ов\nalarmed\narchaic\ninverse\n##leton\nexplorers\n##pine\ndrummond\ntsunami\nwoodlands\n##minate\n##tland\nbooklet\ninsanity\nowning\ninsert\ncrafted\ncalculus\n##tore\nreceivers\n##bt\nstung\n##eca\n##nched\nprevailing\ntravellers\neyeing\nlila\ngraphs\n##borne\n178\njulien\n##won\nmorale\nadaptive\ntherapist\nerica\ncw\nlibertarian\nbowman\npitches\nvita\n##ional\ncrook\n##ads\n##entation\ncaledonia\nmutiny\n##sible\n1840s\nauto
mation\n##ß\nflock\n##pia\nironic\npathology\n##imus\nremarried\n##22\njoker\nwithstand\nenergies\n##att\nshropshire\nhostages\nmadeleine\ntentatively\nconflicting\nmateo\nrecipes\neuros\nol\nmercenaries\nnico\n##ndon\nalbuquerque\naugmented\nmythical\nbel\nfreud\n##child\ncough\n##lica\n365\nfreddy\nlillian\ngenetically\nnuremberg\ncalder\n209\nbonn\noutdoors\npaste\nsuns\nurgency\nvin\nrestraint\ntyson\n##cera\n##selle\nbarrage\nbethlehem\nkahn\n##par\nmounts\nnippon\nbarony\nhappier\nryu\nmakeshift\nsheldon\nblushed\ncastillo\nbarking\nlistener\ntaped\nbethel\nfluent\nheadlines\npornography\nrum\ndisclosure\nsighing\nmace\ndoubling\ngunther\nmanly\n##plex\nrt\ninterventions\nphysiological\nforwards\nemerges\n##tooth\n##gny\ncompliment\nrib\nrecession\nvisibly\nbarge\nfaults\nconnector\nexquisite\nprefect\n##rlin\npatio\n##cured\nelevators\nbrandt\nitalics\npena\n173\nwasp\nsatin\nea\nbotswana\ngraceful\nrespectable\n##jima\n##rter\n##oic\nfranciscan\ngenerates\n##dl\nalfredo\ndisgusting\n##olate\n##iously\nsherwood\nwarns\ncod\npromo\ncheryl\nsino\n##ة\n##escu\ntwitch\n##zhi\nbrownish\nthom\nortiz\n##dron\ndensely\n##beat\ncarmel\nreinforce\n##bana\n187\nanastasia\ndownhill\nvertex\ncontaminated\nremembrance\nharmonic\nhomework\n##sol\nfiancee\ngears\nolds\nangelica\nloft\nramsay\nquiz\ncolliery\nsevens\n##cape\nautism\n##hil\nwalkway\n##boats\nruben\nabnormal\nounce\nkhmer\n##bbe\nzachary\nbedside\nmorphology\npunching\n##olar\nsparrow\nconvinces\n##35\nhewitt\nqueer\nremastered\nrods\nmabel\nsolemn\nnotified\nlyricist\nsymmetric\n##xide\n174\nencore\npassports\nwildcats\n##uni\nbaja\n##pac\nmildly\n##ease\nbleed\ncommodity\nmounds\nglossy\norchestras\n##omo\ndamian\nprelude\nambitions\n##vet\nawhile\nremotely\n##aud\nasserts\nimply\n##iques\ndistinctly\nmodelling\nremedy\n##dded\nwindshield\ndani\nxiao\n##endra\naudible\npowerplant\n1300\ninvalid\nelemental\nacquisitions\n##hala\nimmaculate\nlibby\nplata\nsmuggling\nventilation\ndenoted\nminh\n##morphism\n430\ndiffered\ndion\nkelley\nlore\nmocking\nsabbath\nspikes\nhygiene\ndrown\nrunoff\nstylized\ntally\nliberated\naux\ninterpreter\nrighteous\naba\nsiren\nreaper\npearce\nmillie\n##cier\n##yra\ngaius\n##iso\ncaptures\n##ttering\ndorm\nclaudio\n##sic\nbenches\nknighted\nblackness\n##ored\ndiscount\nfumble\noxidation\nrouted\n##ς\nnovak\nperpendicular\nspoiled\nfracture\nsplits\n##urt\npads\ntopology\n##cats\naxes\nfortunate\noffenders\nprotestants\nesteem\n221\nbroadband\nconvened\nfrankly\nhound\nprototypes\nisil\nfacilitated\nkeel\n##sher\nsahara\nawaited\nbubba\norb\nprosecutors\n186\nhem\n520\n##xing\nrelaxing\nremnant\nromney\nsorted\nslalom\nstefano\nulrich\n##active\nexemption\nfolder\npauses\nfoliage\nhitchcock\nepithet\n204\ncriticisms\n##aca\nballistic\nbrody\nhinduism\nchaotic\nyouths\nequals\n##pala\npts\nthicker\nanalogous\ncapitalist\nimprovised\noverseeing\nsinatra\nascended\nbeverage\n##tl\nstraightforward\n##kon\ncurran\n##west\nbois\n325\ninduce\nsurveying\nemperors\nsax\nunpopular\n##kk\ncartoonist\nfused\n##mble\nunto\n##yuki\nlocalities\n##cko\n##ln\ndarlington\nslain\nacademie\nlobbying\nsediment\npuzzles\n##grass\ndefiance\ndickens\nmanifest\ntongues\nalumnus\narbor\ncoincide\n184\nappalachian\nmustafa\nexaminer\ncabaret\ntraumatic\nyves\nbracelet\ndraining\nheroin\nmagnum\nbaths\nodessa\nconsonants\nmitsubishi\n##gua\nkellan\nvaudeville\n##fr\njoked\nnull\nstraps\nprobation\n##ław\nceded\ninterfaces\n##pas\n##zawa\nblinding\nviet\n224\nrothschild\nmuseo\n640\nhuddersfield\n##vr\ntactic\n##storm\nbrackets\ndazed\n
incorrectly\n##vu\nreg\nglazed\nfearful\nmanifold\nbenefited\nirony\n##sun\nstumbling\n##rte\nwillingness\nbalkans\nmei\nwraps\n##aba\ninjected\n##lea\ngu\nsyed\nharmless\n##hammer\nbray\ntakeoff\npoppy\ntimor\ncardboard\nastronaut\npurdue\nweeping\nsouthbound\ncursing\nstalls\ndiagonal\n##neer\nlamar\nbryce\ncomte\nweekdays\nharrington\n##uba\nnegatively\n##see\nlays\ngrouping\n##cken\n##henko\naffirmed\nhalle\nmodernist\n##lai\nhodges\nsmelling\naristocratic\nbaptized\ndismiss\njustification\noilers\n##now\ncoupling\nqin\nsnack\nhealer\n##qing\ngardener\nlayla\nbattled\nformulated\nstephenson\ngravitational\n##gill\n##jun\n1768\ngranny\ncoordinating\nsuites\n##cd\n##ioned\nmonarchs\n##cote\n##hips\nsep\nblended\napr\nbarrister\ndeposition\nfia\nmina\npolicemen\nparanoid\n##pressed\nchurchyard\ncovert\ncrumpled\ncreep\nabandoning\ntr\ntransmit\nconceal\nbarr\nunderstands\nreadiness\nspire\n##cology\n##enia\n##erry\n610\nstartling\nunlock\nvida\nbowled\nslots\n##nat\n##islav\nspaced\ntrusting\nadmire\nrig\n##ink\nslack\n##70\nmv\n207\ncasualty\n##wei\nclassmates\n##odes\n##rar\n##rked\namherst\nfurnished\nevolve\nfoundry\nmenace\nmead\n##lein\nflu\nwesleyan\n##kled\nmonterey\nwebber\n##vos\nwil\n##mith\n##на\nbartholomew\njustices\nrestrained\n##cke\namenities\n191\nmediated\nsewage\ntrenches\nml\nmainz\n##thus\n1800s\n##cula\n##inski\ncaine\nbonding\n213\nconverts\nspheres\nsuperseded\nmarianne\ncrypt\nsweaty\nensign\nhistoria\n##br\nspruce\n##post\n##ask\nforks\nthoughtfully\nyukon\npamphlet\names\n##uter\nkarma\n##yya\nbryn\nnegotiation\nsighs\nincapable\n##mbre\n##ntial\nactresses\ntaft\n##mill\nluce\nprevailed\n##amine\n1773\nmotionless\nenvoy\ntestify\ninvesting\nsculpted\ninstructors\nprovence\nkali\ncullen\nhorseback\n##while\ngoodwin\n##jos\ngaa\nnorte\n##ldon\nmodify\nwavelength\nabd\n214\nskinned\nsprinter\nforecast\nscheduling\nmarries\nsquared\ntentative\n##chman\nboer\n##isch\nbolts\nswap\nfisherman\nassyrian\nimpatiently\nguthrie\nmartins\nmurdoch\n194\ntanya\nnicely\ndolly\nlacy\nmed\n##45\nsyn\ndecks\nfashionable\nmillionaire\n##ust\nsurfing\n##ml\n##ision\nheaved\ntammy\nconsulate\nattendees\nroutinely\n197\nfuse\nsaxophonist\nbackseat\nmalaya\n##lord\nscowl\ntau\n##ishly\n193\nsighted\nsteaming\n##rks\n303\n911\n##holes\n##hong\nching\n##wife\nbless\nconserved\njurassic\nstacey\nunix\nzion\nchunk\nrigorous\nblaine\n198\npeabody\nslayer\ndismay\nbrewers\nnz\n##jer\ndet\n##glia\nglover\npostwar\nint\npenetration\nsylvester\nimitation\nvertically\nairlift\nheiress\nknoxville\nviva\n##uin\n390\nmacon\n##rim\n##fighter\n##gonal\njanice\n##orescence\n##wari\nmarius\nbelongings\nleicestershire\n196\nblanco\ninverted\npreseason\nsanity\nsobbing\n##due\n##elt\n##dled\ncollingwood\nregeneration\nflickering\nshortest\n##mount\n##osi\nfeminism\n##lat\nsherlock\ncabinets\nfumbled\nnorthbound\nprecedent\nsnaps\n##mme\nresearching\n##akes\nguillaume\ninsights\nmanipulated\nvapor\nneighbour\nsap\ngangster\nfrey\nf1\nstalking\nscarcely\ncallie\nbarnett\ntendencies\naudi\ndoomed\nassessing\nslung\npanchayat\nambiguous\nbartlett\n##etto\ndistributing\nviolating\nwolverhampton\n##hetic\nswami\nhistoire\n##urus\nliable\npounder\ngroin\nhussain\nlarsen\npopping\nsurprises\n##atter\nvie\ncurt\n##station\nmute\nrelocate\nmusicals\nauthorization\nrichter\n##sef\nimmortality\ntna\nbombings\n##press\ndeteriorated\nyiddish\n##acious\nrobbed\ncolchester\ncs\npmid\nao\nverified\nbalancing\napostle\nswayed\nrecognizable\noxfordshire\nretention\nnottinghamshire\ncontender\njudd\ninvitational\nshrimp\nuh
f\n##icient\ncleaner\nlongitudinal\ntanker\n##mur\nacronym\nbroker\nkoppen\nsundance\nsuppliers\n##gil\n4000\nclipped\nfuels\npetite\n##anne\nlandslide\nhelene\ndiversion\npopulous\nlandowners\nauspices\nmelville\nquantitative\n##xes\nferries\nnicky\n##llus\ndoo\nhaunting\nroche\ncarver\ndowned\nunavailable\n##pathy\napproximation\nhiroshima\n##hue\ngarfield\nvalle\ncomparatively\nkeyboardist\ntraveler\n##eit\ncongestion\ncalculating\nsubsidiaries\n##bate\nserb\nmodernization\nfairies\ndeepened\nville\naverages\n##lore\ninflammatory\ntonga\n##itch\nco₂\nsquads\n##hea\ngigantic\nserum\nenjoyment\nretailer\nverona\n35th\ncis\n##phobic\nmagna\ntechnicians\n##vati\narithmetic\n##sport\nlevin\n##dation\namtrak\nchow\nsienna\n##eyer\nbackstage\nentrepreneurship\n##otic\nlearnt\ntao\n##udy\nworcestershire\nformulation\nbaggage\nhesitant\nbali\nsabotage\n##kari\nbarren\nenhancing\nmurmur\npl\nfreshly\nputnam\nsyntax\naces\nmedicines\nresentment\nbandwidth\n##sier\ngrins\nchili\nguido\n##sei\nframing\nimplying\ngareth\nlissa\ngenevieve\npertaining\nadmissions\ngeo\nthorpe\nproliferation\nsato\nbela\nanalyzing\nparting\n##gor\nawakened\n##isman\nhuddled\nsecrecy\n##kling\nhush\ngentry\n540\ndungeons\n##ego\ncoasts\n##utz\nsacrificed\n##chule\nlandowner\nmutually\nprevalence\nprogrammer\nadolescent\ndisrupted\nseaside\ngee\ntrusts\nvamp\ngeorgie\n##nesian\n##iol\nschedules\nsindh\n##market\netched\nhm\nsparse\nbey\nbeaux\nscratching\ngliding\nunidentified\n216\ncollaborating\ngems\njesuits\noro\naccumulation\nshaping\nmbe\nanal\n##xin\n231\nenthusiasts\nnewscast\n##egan\njanata\ndewey\nparkinson\n179\nankara\nbiennial\ntowering\ndd\ninconsistent\n950\n##chet\nthriving\nterminate\ncabins\nfuriously\neats\nadvocating\ndonkey\nmarley\nmuster\nphyllis\nleiden\n##user\ngrassland\nglittering\niucn\nloneliness\n217\nmemorandum\narmenians\n##ddle\npopularized\nrhodesia\n60s\nlame\n##illon\nsans\nbikini\nheader\norbits\n##xx\n##finger\n##ulator\nsharif\nspines\nbiotechnology\nstrolled\nnaughty\nyates\n##wire\nfremantle\nmilo\n##mour\nabducted\nremoves\n##atin\nhumming\nwonderland\n##chrome\n##ester\nhume\npivotal\n##rates\narmand\ngrams\nbelievers\nelector\nrte\napron\nbis\nscraped\n##yria\nendorsement\ninitials\n##llation\neps\ndotted\nhints\nbuzzing\nemigration\nnearer\n##tom\nindicators\n##ulu\ncoarse\nneutron\nprotectorate\n##uze\ndirectional\nexploits\npains\nloire\n1830s\nproponents\nguggenheim\nrabbits\nritchie\n305\nhectare\ninputs\nhutton\n##raz\nverify\n##ako\nboilers\nlongitude\n##lev\nskeletal\nyer\nemilia\ncitrus\ncompromised\n##gau\npokemon\nprescription\nparagraph\neduard\ncadillac\nattire\ncategorized\nkenyan\nweddings\ncharley\n##bourg\nentertain\nmonmouth\n##lles\nnutrients\ndavey\nmesh\nincentive\npractised\necosystems\nkemp\nsubdued\noverheard\n##rya\nbodily\nmaxim\n##nius\napprenticeship\nursula\n##fight\nlodged\nrug\nsilesian\nunconstitutional\npatel\ninspected\ncoyote\nunbeaten\n##hak\n34th\ndisruption\nconvict\nparcel\n##cl\n##nham\ncollier\nimplicated\nmallory\n##iac\n##lab\nsusannah\nwinkler\n##rber\nshia\nphelps\nsediments\ngraphical\nrobotic\n##sner\nadulthood\nmart\nsmoked\n##isto\nkathryn\nclarified\n##aran\ndivides\nconvictions\noppression\npausing\nburying\n##mt\nfederico\nmathias\neileen\n##tana\nkite\nhunched\n##acies\n189\n##atz\ndisadvantage\nliza\nkinetic\ngreedy\nparadox\nyokohama\ndowager\ntrunks\nventured\n##gement\ngupta\nvilnius\nolaf\n##thest\ncrimean\nhopper\n##ej\nprogressively\narturo\nmouthed\narrondissement\n##fusion\nrubin\nsimulcast\noceania\n##orum\n##stra\n##
rred\nbusiest\nintensely\nnavigator\ncary\n##vine\n##hini\n##bies\nfife\nrowe\nrowland\nposing\ninsurgents\nshafts\nlawsuits\nactivate\nconor\ninward\nculturally\ngarlic\n265\n##eering\neclectic\n##hui\n##kee\n##nl\nfurrowed\nvargas\nmeteorological\nrendezvous\n##aus\nculinary\ncommencement\n##dition\nquota\n##notes\nmommy\nsalaries\noverlapping\nmule\n##iology\n##mology\nsums\nwentworth\n##isk\n##zione\nmainline\nsubgroup\n##illy\nhack\nplaintiff\nverdi\nbulb\ndifferentiation\nengagements\nmultinational\nsupplemented\nbertrand\ncaller\nregis\n##naire\n##sler\n##arts\n##imated\nblossom\npropagation\nkilometer\nviaduct\nvineyards\n##uate\nbeckett\noptimization\ngolfer\nsongwriters\nseminal\nsemitic\nthud\nvolatile\nevolving\nridley\n##wley\ntrivial\ndistributions\nscandinavia\njiang\n##ject\nwrestled\ninsistence\n##dio\nemphasizes\nnapkin\n##ods\nadjunct\nrhyme\n##ricted\n##eti\nhopeless\nsurrounds\ntremble\n32nd\nsmoky\n##ntly\noils\nmedicinal\npadded\nsteer\nwilkes\n219\n255\nconcessions\nhue\nuniquely\nblinded\nlandon\nyahoo\n##lane\nhendrix\ncommemorating\ndex\nspecify\nchicks\n##ggio\nintercity\n1400\nmorley\n##torm\nhighlighting\n##oting\npang\noblique\nstalled\n##liner\nflirting\nnewborn\n1769\nbishopric\nshaved\n232\ncurrie\n##ush\ndharma\nspartan\n##ooped\nfavorites\nsmug\nnovella\nsirens\nabusive\ncreations\nespana\n##lage\nparadigm\nsemiconductor\nsheen\n##rdo\n##yen\n##zak\nnrl\nrenew\n##pose\n##tur\nadjutant\nmarches\nnorma\n##enity\nineffective\nweimar\ngrunt\n##gat\nlordship\nplotting\nexpenditure\ninfringement\nlbs\nrefrain\nav\nmimi\nmistakenly\npostmaster\n1771\n##bara\nras\nmotorsports\ntito\n199\nsubjective\n##zza\nbully\nstew\n##kaya\nprescott\n1a\n##raphic\n##zam\nbids\nstyling\nparanormal\nreeve\nsneaking\nexploding\nkatz\nakbar\nmigrant\nsyllables\nindefinitely\n##ogical\ndestroys\nreplaces\napplause\n##phine\npest\n##fide\n218\narticulated\nbertie\n##thing\n##cars\n##ptic\ncourtroom\ncrowley\naesthetics\ncummings\ntehsil\nhormones\ntitanic\ndangerously\n##ibe\nstadion\njaenelle\nauguste\nciudad\n##chu\nmysore\npartisans\n##sio\nlucan\nphilipp\n##aly\ndebating\nhenley\ninteriors\n##rano\n##tious\nhomecoming\nbeyonce\nusher\nhenrietta\nprepares\nweeds\n##oman\nely\nplucked\n##pire\n##dable\nluxurious\n##aq\nartifact\npassword\npasture\njuno\nmaddy\nminsk\n##dder\n##ologies\n##rone\nassessments\nmartian\nroyalist\n1765\nexamines\n##mani\n##rge\nnino\n223\nparry\nscooped\nrelativity\n##eli\n##uting\n##cao\ncongregational\nnoisy\ntraverse\n##agawa\nstrikeouts\nnickelodeon\nobituary\ntransylvania\nbinds\ndepictions\npolk\ntrolley\n##yed\n##lard\nbreeders\n##under\ndryly\nhokkaido\n1762\nstrengths\nstacks\nbonaparte\nconnectivity\nneared\nprostitutes\nstamped\nanaheim\ngutierrez\nsinai\n##zzling\nbram\nfresno\nmadhya\n##86\nproton\n##lena\n##llum\n##phon\nreelected\nwanda\n##anus\n##lb\nample\ndistinguishing\n##yler\ngrasping\nsermons\ntomato\nbland\nstimulation\navenues\n##eux\nspreads\nscarlett\nfern\npentagon\nassert\nbaird\nchesapeake\nir\ncalmed\ndistortion\nfatalities\n##olis\ncorrectional\npricing\n##astic\n##gina\nprom\ndammit\nying\ncollaborate\n##chia\nwelterweight\n33rd\npointer\nsubstitution\nbonded\numpire\ncommunicating\nmultitude\npaddle\n##obe\nfederally\nintimacy\n##insky\nbetray\nssr\n##lett\n##lean\n##lves\n##therapy\nairbus\n##tery\nfunctioned\nud\nbearer\nbiomedical\nnetflix\n##hire\n##nca\ncondom\nbrink\nik\n##nical\nmacy\n##bet\nflap\ngma\nexperimented\njelly\nlavender\n##icles\n##ulia\nmunro\n##mian\n##tial\nrye\n##rle\n60th\ngigs\nhottest\nrotated\n
predictions\nfuji\nbu\n##erence\n##omi\nbarangay\n##fulness\n##sas\nclocks\n##rwood\n##liness\ncereal\nroe\nwight\ndecker\nuttered\nbabu\nonion\nxml\nforcibly\n##df\npetra\nsarcasm\nhartley\npeeled\nstorytelling\n##42\n##xley\n##ysis\n##ffa\nfibre\nkiel\nauditor\nfig\nharald\ngreenville\n##berries\ngeographically\nnell\nquartz\n##athic\ncemeteries\n##lr\ncrossings\nnah\nholloway\nreptiles\nchun\nsichuan\nsnowy\n660\ncorrections\n##ivo\nzheng\nambassadors\nblacksmith\nfielded\nfluids\nhardcover\nturnover\nmedications\nmelvin\nacademies\n##erton\nro\nroach\nabsorbing\nspaniards\ncolton\n##founded\noutsider\nespionage\nkelsey\n245\nedible\n##ulf\ndora\nestablishes\n##sham\n##tries\ncontracting\n##tania\ncinematic\ncostello\nnesting\n##uron\nconnolly\nduff\n##nology\nmma\n##mata\nfergus\nsexes\ngi\noptics\nspectator\nwoodstock\nbanning\n##hee\n##fle\ndifferentiate\noutfielder\nrefinery\n226\n312\ngerhard\nhorde\nlair\ndrastically\n##udi\nlandfall\n##cheng\nmotorsport\nodi\n##achi\npredominant\nquay\nskins\n##ental\nedna\nharshly\ncomplementary\nmurdering\n##aves\nwreckage\n##90\nono\noutstretched\nlennox\nmunitions\ngalen\nreconcile\n470\nscalp\nbicycles\ngillespie\nquestionable\nrosenberg\nguillermo\nhostel\njarvis\nkabul\nvolvo\nopium\nyd\n##twined\nabuses\ndecca\noutpost\n##cino\nsensible\nneutrality\n##64\nponce\nanchorage\natkins\nturrets\ninadvertently\ndisagree\nlibre\nvodka\nreassuring\nweighs\n##yal\nglide\njumper\nceilings\nrepertory\nouts\nstain\n##bial\nenvy\n##ucible\nsmashing\nheightened\npolicing\nhyun\nmixes\nlai\nprima\n##ples\nceleste\n##bina\nlucrative\nintervened\nkc\nmanually\n##rned\nstature\nstaffed\nbun\nbastards\nnairobi\npriced\n##auer\nthatcher\n##kia\ntripped\ncomune\n##ogan\n##pled\nbrasil\nincentives\nemanuel\nhereford\nmusica\n##kim\nbenedictine\nbiennale\n##lani\neureka\ngardiner\nrb\nknocks\nsha\n##ael\n##elled\n##onate\nefficacy\nventura\nmasonic\nsanford\nmaize\nleverage\n##feit\ncapacities\nsantana\n##aur\nnovelty\nvanilla\n##cter\n##tour\nbenin\n##oir\n##rain\nneptune\ndrafting\ntallinn\n##cable\nhumiliation\n##boarding\nschleswig\nfabian\nbernardo\nliturgy\nspectacle\nsweeney\npont\nroutledge\n##tment\ncosmos\nut\nhilt\nsleek\nuniversally\n##eville\n##gawa\ntyped\n##dry\nfavors\nallegheny\nglaciers\n##rly\nrecalling\naziz\n##log\nparasite\nrequiem\nauf\n##berto\n##llin\nillumination\n##breaker\n##issa\nfestivities\nbows\ngovern\nvibe\nvp\n333\nsprawled\nlarson\npilgrim\nbwf\nleaping\n##rts\n##ssel\nalexei\ngreyhound\nhoarse\n##dler\n##oration\nseneca\n##cule\ngaping\n##ulously\n##pura\ncinnamon\n##gens\n##rricular\ncraven\nfantasies\nhoughton\nengined\nreigned\ndictator\nsupervising\n##oris\nbogota\ncommentaries\nunnatural\nfingernails\nspirituality\ntighten\n##tm\ncanadiens\nprotesting\nintentional\ncheers\nsparta\n##ytic\n##iere\n##zine\nwiden\nbelgarath\ncontrollers\ndodd\niaaf\nnavarre\n##ication\ndefect\nsquire\nsteiner\nwhisky\n##mins\n560\ninevitably\ntome\n##gold\nchew\n##uid\n##lid\nelastic\n##aby\nstreaked\nalliances\njailed\nregal\n##ined\n##phy\nczechoslovak\nnarration\nabsently\n##uld\nbluegrass\nguangdong\nquran\ncriticizing\nhose\nhari\n##liest\n##owa\nskier\nstreaks\ndeploy\n##lom\nraft\nbose\ndialed\nhuff\n##eira\nhaifa\nsimplest\nbursting\nendings\nib\nsultanate\n##titled\nfranks\nwhitman\nensures\nsven\n##ggs\ncollaborators\nforster\norganising\nui\nbanished\nnapier\ninjustice\nteller\nlayered\nthump\n##otti\nroc\nbattleships\nevidenced\nfugitive\nsadie\nrobotics\n##roud\nequatorial\ngeologist\n##iza\nyielding\n##bron\n##sr\ninternational
e\nmecca\n##diment\nsbs\nskyline\ntoad\nuploaded\nreflective\nundrafted\nlal\nleafs\nbayern\n##dai\nlakshmi\nshortlisted\n##stick\n##wicz\ncamouflage\ndonate\naf\nchristi\nlau\n##acio\ndisclosed\nnemesis\n1761\nassemble\nstraining\nnorthamptonshire\ntal\n##asi\nbernardino\npremature\nheidi\n42nd\ncoefficients\ngalactic\nreproduce\nbuzzed\nsensations\nzionist\nmonsieur\nmyrtle\n##eme\narchery\nstrangled\nmusically\nviewpoint\nantiquities\nbei\ntrailers\nseahawks\ncured\npee\npreferring\ntasmanian\nlange\nsul\n##mail\n##working\ncolder\noverland\nlucivar\nmassey\ngatherings\nhaitian\n##smith\ndisapproval\nflaws\n##cco\n##enbach\n1766\nnpr\n##icular\nboroughs\ncreole\nforums\ntechno\n1755\ndent\nabdominal\nstreetcar\n##eson\n##stream\nprocurement\ngemini\npredictable\n##tya\nacheron\nchristoph\nfeeder\nfronts\nvendor\nbernhard\njammu\ntumors\nslang\n##uber\ngoaltender\ntwists\ncurving\nmanson\nvuelta\nmer\npeanut\nconfessions\npouch\nunpredictable\nallowance\ntheodor\nvascular\n##factory\nbala\nauthenticity\nmetabolic\ncoughing\nnanjing\n##cea\npembroke\n##bard\nsplendid\n36th\nff\nhourly\n##ahu\nelmer\nhandel\n##ivate\nawarding\nthrusting\ndl\nexperimentation\n##hesion\n##46\ncaressed\nentertained\nsteak\n##rangle\nbiologist\norphans\nbaroness\noyster\nstepfather\n##dridge\nmirage\nreefs\nspeeding\n##31\nbarons\n1764\n227\ninhabit\npreached\nrepealed\n##tral\nhonoring\nboogie\ncaptives\nadminister\njohanna\n##imate\ngel\nsuspiciously\n1767\nsobs\n##dington\nbackbone\nhayward\ngarry\n##folding\n##nesia\nmaxi\n##oof\n##ppe\nellison\ngalileo\n##stand\ncrimea\nfrenzy\namour\nbumper\nmatrices\nnatalia\nbaking\ngarth\npalestinians\n##grove\nsmack\nconveyed\nensembles\ngardening\n##manship\n##rup\n##stituting\n1640\nharvesting\ntopography\njing\nshifters\ndormitory\n##carriage\n##lston\nist\nskulls\n##stadt\ndolores\njewellery\nsarawak\n##wai\n##zier\nfences\nchristy\nconfinement\ntumbling\ncredibility\nfir\nstench\n##bria\n##plication\n##nged\n##sam\nvirtues\n##belt\nmarjorie\npba\n##eem\n##made\ncelebrates\nschooner\nagitated\nbarley\nfulfilling\nanthropologist\n##pro\nrestrict\nnovi\nregulating\n##nent\npadres\n##rani\n##hesive\nloyola\ntabitha\nmilky\nolson\nproprietor\ncrambidae\nguarantees\nintercollegiate\nljubljana\nhilda\n##sko\nignorant\nhooded\n##lts\nsardinia\n##lidae\n##vation\nfrontman\nprivileged\nwitchcraft\n##gp\njammed\nlaude\npoking\n##than\nbracket\namazement\nyunnan\n##erus\nmaharaja\nlinnaeus\n264\ncommissioning\nmilano\npeacefully\n##logies\nakira\nrani\nregulator\n##36\ngrasses\n##rance\nluzon\ncrows\ncompiler\ngretchen\nseaman\nedouard\ntab\nbuccaneers\nellington\nhamlets\nwhig\nsocialists\n##anto\ndirectorial\neaston\nmythological\n##kr\n##vary\nrhineland\nsemantic\ntaut\ndune\ninventions\nsucceeds\n##iter\nreplication\nbranched\n##pired\njul\nprosecuted\nkangaroo\npenetrated\n##avian\nmiddlesbrough\ndoses\nbleak\nmadam\npredatory\nrelentless\n##vili\nreluctance\n##vir\nhailey\ncrore\nsilvery\n1759\nmonstrous\nswimmers\ntransmissions\nhawthorn\ninforming\n##eral\ntoilets\ncaracas\ncrouch\nkb\n##sett\n295\ncartel\nhadley\n##aling\nalexia\nyvonne\n##biology\ncinderella\neton\nsuperb\nblizzard\nstabbing\nindustrialist\nmaximus\n##gm\n##orus\ngroves\nmaud\nclade\noversized\ncomedic\n##bella\nrosen\nnomadic\nfulham\nmontane\nbeverages\ngalaxies\nredundant\nswarm\n##rot\n##folia\n##llis\nbuckinghamshire\nfen\nbearings\nbahadur\n##rom\ngilles\nphased\ndynamite\nfaber\nbenoit\nvip\n##ount\n##wd\nbooking\nfractured\ntailored\nanya\nspices\nwestwood\ncairns\nauditions\ninflammation\n
steamed\n##rocity\n##acion\n##urne\nskyla\nthereof\nwatford\ntorment\narchdeacon\ntransforms\nlulu\ndemeanor\nfucked\nserge\n##sor\nmckenna\nminas\nentertainer\n##icide\ncaress\noriginate\nresidue\n##sty\n1740\n##ilised\n##org\nbeech\n##wana\nsubsidies\n##ghton\nemptied\ngladstone\nru\nfirefighters\nvoodoo\n##rcle\nhet\nnightingale\ntamara\nedmond\ningredient\nweaknesses\nsilhouette\n285\ncompatibility\nwithdrawing\nhampson\n##mona\nanguish\ngiggling\n##mber\nbookstore\n##jiang\nsouthernmost\ntilting\n##vance\nbai\neconomical\nrf\nbriefcase\ndreadful\nhinted\nprojections\nshattering\ntotaling\n##rogate\nanalogue\nindicted\nperiodical\nfullback\n##dman\nhaynes\n##tenberg\n##ffs\n##ishment\n1745\nthirst\nstumble\npenang\nvigorous\n##ddling\n##kor\n##lium\noctave\n##ove\n##enstein\n##inen\n##ones\nsiberian\n##uti\ncbn\nrepeal\nswaying\n##vington\nkhalid\ntanaka\nunicorn\notago\nplastered\nlobe\nriddle\n##rella\nperch\n##ishing\ncroydon\nfiltered\ngraeme\ntripoli\n##ossa\ncrocodile\n##chers\nsufi\nmined\n##tung\ninferno\nlsu\n##phi\nswelled\nutilizes\n£2\ncale\nperiodicals\nstyx\nhike\ninformally\ncoop\nlund\n##tidae\nala\nhen\nqui\ntransformations\ndisposed\nsheath\nchickens\n##cade\nfitzroy\nsas\nsilesia\nunacceptable\nodisha\n1650\nsabrina\npe\nspokane\nratios\nathena\nmassage\nshen\ndilemma\n##drum\n##riz\n##hul\ncorona\ndoubtful\nniall\n##pha\n##bino\nfines\ncite\nacknowledging\nbangor\nballard\nbathurst\n##resh\nhuron\nmustered\nalzheimer\ngarments\nkinase\ntyre\nwarship\n##cp\nflashback\npulmonary\nbraun\ncheat\nkamal\ncyclists\nconstructions\ngrenades\nndp\ntraveller\nexcuses\nstomped\nsignalling\ntrimmed\nfutsal\nmosques\nrelevance\n##wine\nwta\n##23\n##vah\n##lter\nhoc\n##riding\noptimistic\n##´s\ndeco\nsim\ninteracting\nrejecting\nmoniker\nwaterways\n##ieri\n##oku\nmayors\ngdansk\noutnumbered\npearls\n##ended\n##hampton\nfairs\ntotals\ndominating\n262\nnotions\nstairway\ncompiling\npursed\ncommodities\ngrease\nyeast\n##jong\ncarthage\ngriffiths\nresidual\namc\ncontraction\nlaird\nsapphire\n##marine\n##ivated\namalgamation\ndissolve\ninclination\nlyle\npackaged\naltitudes\nsuez\ncanons\ngraded\nlurched\nnarrowing\nboasts\nguise\nwed\nenrico\n##ovsky\nrower\nscarred\nbree\ncub\niberian\nprotagonists\nbargaining\nproposing\ntrainers\nvoyages\nvans\nfishes\n##aea\n##ivist\n##verance\nencryption\nartworks\nkazan\nsabre\ncleopatra\nhepburn\nrotting\nsupremacy\nmecklenburg\n##brate\nburrows\nhazards\noutgoing\nflair\norganizes\n##ctions\nscorpion\n##usions\nboo\n234\nchevalier\ndunedin\nslapping\n##34\nineligible\npensions\n##38\n##omic\nmanufactures\nemails\nbismarck\n238\nweakening\nblackish\nding\nmcgee\nquo\n##rling\nnorthernmost\nxx\nmanpower\ngreed\nsampson\nclicking\n##ange\n##horpe\n##inations\n##roving\ntorre\n##eptive\n##moral\nsymbolism\n38th\nasshole\nmeritorious\noutfits\nsplashed\nbiographies\nsprung\nastros\n##tale\n302\n737\nfilly\nraoul\nnw\ntokugawa\nlinden\nclubhouse\n##apa\ntracts\nromano\n##pio\nputin\ntags\n##note\nchained\ndickson\ngunshot\nmoe\ngunn\nrashid\n##tails\nzipper\n##bas\n##nea\ncontrasted\n##ply\n##udes\nplum\npharaoh\n##pile\naw\ncomedies\ningrid\nsandwiches\nsubdivisions\n1100\nmariana\nnokia\nkamen\nhz\ndelaney\nveto\nherring\n##words\npossessive\noutlines\n##roup\nsiemens\nstairwell\nrc\ngallantry\nmessiah\npalais\nyells\n233\nzeppelin\n##dm\nbolivar\n##cede\nsmackdown\nmckinley\n##mora\n##yt\nmuted\ngeologic\nfinely\nunitary\navatar\nhamas\nmaynard\nrees\nbog\ncontrasting\n##rut\nliv\nchico\ndisposition\npixel\n##erate\nbecca\ndmitry\nyeshiva\nnarrati
ves\n##lva\n##ulton\nmercenary\nsharpe\ntempered\nnavigate\nstealth\namassed\nkeynes\n##lini\nuntouched\n##rrie\nhavoc\nlithium\n##fighting\nabyss\ngraf\nsouthward\nwolverine\nballoons\nimplements\nngos\ntransitions\n##icum\nambushed\nconcacaf\ndormant\neconomists\n##dim\ncosting\ncsi\nrana\nuniversite\nboulders\nverity\n##llon\ncollin\nmellon\nmisses\ncypress\nfluorescent\nlifeless\nspence\n##ulla\ncrewe\nshepard\npak\nrevelations\n##م\njolly\ngibbons\npaw\n##dro\n##quel\nfreeing\n##test\nshack\nfries\npalatine\n##51\n##hiko\naccompaniment\ncruising\nrecycled\n##aver\nerwin\nsorting\nsynthesizers\ndyke\nrealities\nsg\nstrides\nenslaved\nwetland\n##ghan\ncompetence\ngunpowder\ngrassy\nmaroon\nreactors\nobjection\n##oms\ncarlson\ngearbox\nmacintosh\nradios\nshelton\n##sho\nclergyman\nprakash\n254\nmongols\ntrophies\noricon\n228\nstimuli\ntwenty20\ncantonese\ncortes\nmirrored\n##saurus\nbhp\ncristina\nmelancholy\n##lating\nenjoyable\nnuevo\n##wny\ndownfall\nschumacher\n##ind\nbanging\nlausanne\nrumbled\nparamilitary\nreflex\nax\namplitude\nmigratory\n##gall\n##ups\nmidi\nbarnard\nlastly\nsherry\n##hp\n##nall\nkeystone\n##kra\ncarleton\nslippery\n##53\ncoloring\nfoe\nsocket\notter\n##rgos\nmats\n##tose\nconsultants\nbafta\nbison\ntopping\n##km\n490\nprimal\nabandonment\ntransplant\natoll\nhideous\nmort\npained\nreproduced\ntae\nhowling\n##turn\nunlawful\nbillionaire\nhotter\npoised\nlansing\n##chang\ndinamo\nretro\nmessing\nnfc\ndomesday\n##mina\nblitz\ntimed\n##athing\n##kley\nascending\ngesturing\n##izations\nsignaled\ntis\nchinatown\nmermaid\nsavanna\njameson\n##aint\ncatalina\n##pet\n##hers\ncochrane\ncy\nchatting\n##kus\nalerted\ncomputation\nmused\nnoelle\nmajestic\nmohawk\ncampo\noctagonal\n##sant\n##hend\n241\naspiring\n##mart\ncomprehend\niona\nparalyzed\nshimmering\nswindon\nrhone\n##eley\nreputed\nconfigurations\npitchfork\nagitation\nfrancais\ngillian\nlipstick\n##ilo\noutsiders\npontifical\nresisting\nbitterness\nsewer\nrockies\n##edd\n##ucher\nmisleading\n1756\nexiting\ngalloway\n##nging\nrisked\n##heart\n246\ncommemoration\nschultz\n##rka\nintegrating\n##rsa\nposes\nshrieked\n##weiler\nguineas\ngladys\njerking\nowls\ngoldsmith\nnightly\npenetrating\n##unced\nlia\n##33\nignited\nbetsy\n##aring\n##thorpe\nfollower\nvigorously\n##rave\ncoded\nkiran\nknit\nzoology\ntbilisi\n##28\n##bered\nrepository\ngovt\ndeciduous\ndino\ngrowling\n##bba\nenhancement\nunleashed\nchanting\npussy\nbiochemistry\n##eric\nkettle\nrepression\ntoxicity\nnrhp\n##arth\n##kko\n##bush\nernesto\ncommended\noutspoken\n242\nmca\nparchment\nsms\nkristen\n##aton\nbisexual\nraked\nglamour\nnavajo\na2\nconditioned\nshowcased\n##hma\nspacious\nyouthful\n##esa\nusl\nappliances\njunta\nbrest\nlayne\nconglomerate\nenchanted\nchao\nloosened\npicasso\ncirculating\ninspect\nmontevideo\n##centric\n##kti\npiazza\nspurred\n##aith\nbari\nfreedoms\npoultry\nstamford\nlieu\n##ect\nindigo\nsarcastic\nbahia\nstump\nattach\ndvds\nfrankenstein\nlille\napprox\nscriptures\npollen\n##script\nnmi\noverseen\n##ivism\ntides\nproponent\nnewmarket\ninherit\nmilling\n##erland\ncentralized\n##rou\ndistributors\ncredentials\ndrawers\nabbreviation\n##lco\n##xon\ndowning\nuncomfortably\nripe\n##oes\nerase\nfranchises\n##ever\npopulace\n##bery\n##khar\ndecomposition\npleas\n##tet\ndaryl\nsabah\n##stle\n##wide\nfearless\ngenie\nlesions\nannette\n##ogist\noboe\nappendix\nnair\ndripped\npetitioned\nmaclean\nmosquito\nparrot\nrpg\nhampered\n1648\noperatic\nreservoirs\n##tham\nirrelevant\njolt\nsummarized\n##fp\nmedallion\n##taff\n##−\nclawed\nharlow\
nnarrower\ngoddard\nmarcia\nbodied\nfremont\nsuarez\naltering\ntempest\nmussolini\nporn\n##isms\nsweetly\noversees\nwalkers\nsolitude\ngrimly\nshrines\nhk\nich\nsupervisors\nhostess\ndietrich\nlegitimacy\nbrushes\nexpressive\n##yp\ndissipated\n##rse\nlocalized\nsystemic\n##nikov\ngettysburg\n##js\n##uaries\ndialogues\nmuttering\n251\nhousekeeper\nsicilian\ndiscouraged\n##frey\nbeamed\nkaladin\nhalftime\nkidnap\n##amo\n##llet\n1754\nsynonymous\ndepleted\ninstituto\ninsulin\nreprised\n##opsis\nclashed\n##ctric\ninterrupting\nradcliffe\ninsisting\nmedici\n1715\nejected\nplayfully\nturbulent\n##47\nstarvation\n##rini\nshipment\nrebellious\npetersen\nverification\nmerits\n##rified\ncakes\n##charged\n1757\nmilford\nshortages\nspying\nfidelity\n##aker\nemitted\nstorylines\nharvested\nseismic\n##iform\ncheung\nkilda\ntheoretically\nbarbie\nlynx\n##rgy\n##tius\ngoblin\nmata\npoisonous\n##nburg\nreactive\nresidues\nobedience\n##евич\nconjecture\n##rac\n401\nhating\nsixties\nkicker\nmoaning\nmotown\n##bha\nemancipation\nneoclassical\n##hering\nconsoles\nebert\nprofessorship\n##tures\nsustaining\nassaults\nobeyed\naffluent\nincurred\ntornadoes\n##eber\n##zow\nemphasizing\nhighlanders\ncheated\nhelmets\n##ctus\ninternship\nterence\nbony\nexecutions\nlegislators\nberries\npeninsular\ntinged\n##aco\n1689\namplifier\ncorvette\nribbons\nlavish\npennant\n##lander\nworthless\n##chfield\n##forms\nmariano\npyrenees\nexpenditures\n##icides\nchesterfield\nmandir\ntailor\n39th\nsergey\nnestled\nwilled\naristocracy\ndevotees\ngoodnight\nraaf\nrumored\nweaponry\nremy\nappropriations\nharcourt\nburr\nriaa\n##lence\nlimitation\nunnoticed\nguo\nsoaking\nswamps\n##tica\ncollapsing\ntatiana\ndescriptive\nbrigham\npsalm\n##chment\nmaddox\n##lization\npatti\ncaliph\n##aja\nakron\ninjuring\nserra\n##ganj\nbasins\n##sari\nastonished\nlauncher\n##church\nhilary\nwilkins\nsewing\n##sf\nstinging\n##fia\n##ncia\nunderwood\nstartup\n##ition\ncompilations\nvibrations\nembankment\njurist\n##nity\nbard\njuventus\ngroundwater\nkern\npalaces\nhelium\nboca\ncramped\nmarissa\nsoto\n##worm\njae\nprincely\n##ggy\nfaso\nbazaar\nwarmly\n##voking\n229\npairing\n##lite\n##grate\n##nets\nwien\nfreaked\nulysses\nrebirth\n##alia\n##rent\nmummy\nguzman\njimenez\nstilled\n##nitz\ntrajectory\ntha\nwoken\narchival\nprofessions\n##pts\n##pta\nhilly\nshadowy\nshrink\n##bolt\nnorwood\nglued\nmigrate\nstereotypes\ndevoid\n##pheus\n625\nevacuate\nhorrors\ninfancy\ngotham\nknowles\noptic\ndownloaded\nsachs\nkingsley\nparramatta\ndarryl\nmor\n##onale\nshady\ncommence\nconfesses\nkan\n##meter\n##placed\nmarlborough\nroundabout\nregents\nfrigates\nio\n##imating\ngothenburg\nrevoked\ncarvings\nclockwise\nconvertible\nintruder\n##sche\nbanged\n##ogo\nvicky\nbourgeois\n##mony\ndupont\nfooting\n##gum\npd\n##real\nbuckle\nyun\npenthouse\nsane\n720\nserviced\nstakeholders\nneumann\nbb\n##eers\ncomb\n##gam\ncatchment\npinning\nrallies\ntyping\n##elles\nforefront\nfreiburg\nsweetie\ngiacomo\nwidowed\ngoodwill\nworshipped\naspirations\nmidday\n##vat\nfishery\n##trick\nbournemouth\nturk\n243\nhearth\nethanol\nguadalajara\nmurmurs\nsl\n##uge\nafforded\nscripted\n##hta\nwah\n##jn\ncoroner\ntranslucent\n252\nmemorials\npuck\nprogresses\nclumsy\n##race\n315\ncandace\nrecounted\n##27\n##slin\n##uve\nfiltering\n##mac\nhowl\nstrata\nheron\nleveled\n##ays\ndubious\n##oja\n##т\n##wheel\ncitations\nexhibiting\n##laya\n##mics\n##pods\nturkic\n##lberg\ninjunction\n##ennial\n##mit\nantibodies\n##44\norganise\n##rigues\ncardiovascular\ncushion\ninverness\n##zquez\ndia\ncocoa\nsibli
ng\n##tman\n##roid\nexpanse\nfeasible\ntunisian\nalgiers\n##relli\nrus\nbloomberg\ndso\nwestphalia\nbro\ntacoma\n281\ndownloads\n##ours\nkonrad\nduran\n##hdi\ncontinuum\njett\ncompares\nlegislator\nsecession\n##nable\n##gues\n##zuka\ntranslating\nreacher\n##gley\n##ła\naleppo\n##agi\ntc\norchards\ntrapping\nlinguist\nversatile\ndrumming\npostage\ncalhoun\nsuperiors\n##mx\nbarefoot\nleary\n##cis\nignacio\nalfa\nkaplan\n##rogen\nbratislava\nmori\n##vot\ndisturb\nhaas\n313\ncartridges\ngilmore\nradiated\nsalford\ntunic\nhades\n##ulsive\narcheological\ndelilah\nmagistrates\nauditioned\nbrewster\ncharters\nempowerment\nblogs\ncappella\ndynasties\niroquois\nwhipping\n##krishna\nraceway\ntruths\nmyra\nweaken\njudah\nmcgregor\n##horse\nmic\nrefueling\n37th\nburnley\nbosses\nmarkus\npremio\nquery\n##gga\ndunbar\n##economic\ndarkest\nlyndon\nsealing\ncommendation\nreappeared\n##mun\naddicted\nezio\nslaughtered\nsatisfactory\nshuffle\n##eves\n##thic\n##uj\nfortification\nwarrington\n##otto\nresurrected\nfargo\nmane\n##utable\n##lei\n##space\nforeword\nox\n##aris\n##vern\nabrams\nhua\n##mento\nsakura\n##alo\nuv\nsentimental\n##skaya\nmidfield\n##eses\nsturdy\nscrolls\nmacleod\n##kyu\nentropy\n##lance\nmitochondrial\ncicero\nexcelled\nthinner\nconvoys\nperceive\n##oslav\n##urable\nsystematically\ngrind\nburkina\n287\n##tagram\nops\n##aman\nguantanamo\n##cloth\n##tite\nforcefully\nwavy\n##jou\npointless\n##linger\n##tze\nlayton\nportico\nsuperficial\nclerical\noutlaws\n##hism\nburials\nmuir\n##inn\ncreditors\nhauling\nrattle\n##leg\ncalais\nmonde\narchers\nreclaimed\ndwell\nwexford\nhellenic\nfalsely\nremorse\n##tek\ndough\nfurnishings\n##uttered\ngabon\nneurological\nnovice\n##igraphy\ncontemplated\npulpit\nnightstand\nsaratoga\n##istan\ndocumenting\npulsing\ntaluk\n##firmed\nbusted\nmarital\n##rien\ndisagreements\nwasps\n##yes\nhodge\nmcdonnell\nmimic\nfran\npendant\ndhabi\nmusa\n##nington\ncongratulations\nargent\ndarrell\nconcussion\nlosers\nregrets\nthessaloniki\nreversal\ndonaldson\nhardwood\nthence\nachilles\nritter\n##eran\ndemonic\njurgen\nprophets\ngoethe\neki\nclassmate\nbuff\n##cking\nyank\nirrational\n##inging\nperished\nseductive\nqur\nsourced\n##crat\n##typic\nmustard\nravine\nbarre\nhorizontally\ncharacterization\nphylogenetic\nboise\n##dit\n##runner\n##tower\nbrutally\nintercourse\nseduce\n##bbing\nfay\nferris\nogden\namar\nnik\nunarmed\n##inator\nevaluating\nkyrgyzstan\nsweetness\n##lford\n##oki\nmccormick\nmeiji\nnotoriety\nstimulate\ndisrupt\nfiguring\ninstructional\nmcgrath\n##zoo\ngroundbreaking\n##lto\nflinch\nkhorasan\nagrarian\nbengals\nmixer\nradiating\n##sov\ningram\npitchers\nnad\ntariff\n##cript\ntata\n##codes\n##emi\n##ungen\nappellate\nlehigh\n##bled\n##giri\nbrawl\nduct\ntexans\n##ciation\n##ropolis\nskipper\nspeculative\nvomit\ndoctrines\nstresses\n253\ndavy\ngraders\nwhitehead\njozef\ntimely\ncumulative\nharyana\npaints\nappropriately\nboon\ncactus\n##ales\n##pid\ndow\nlegions\n##pit\nperceptions\n1730\npicturesque\n##yse\nperiphery\nrune\nwr\n##aha\nceltics\nsentencing\nwhoa\n##erin\nconfirms\nvariance\n425\nmoines\nmathews\nspade\nrave\nm1\nfronted\nfx\nblending\nalleging\nreared\n##gl\n237\n##paper\ngrassroots\neroded\n##free\n##physical\ndirects\nordeal\n##sław\naccelerate\nhacker\nrooftop\n##inia\nlev\nbuys\ncebu\ndevote\n##lce\nspecialising\n##ulsion\nchoreographed\nrepetition\nwarehouses\n##ryl\npaisley\ntuscany\nanalogy\nsorcerer\nhash\nhuts\nshards\ndescends\nexclude\nnix\nchaplin\ngaga\nito\nvane\n##drich\ncauseway\nmisconduct\nlimo\norchestrated\nglands\njana\
n##kot\nu2\n##mple\n##sons\nbranching\ncontrasts\nscoop\nlonged\n##virus\nchattanooga\n##75\nsyrup\ncornerstone\n##tized\n##mind\n##iaceae\ncareless\nprecedence\nfrescoes\n##uet\nchilled\nconsult\nmodelled\nsnatch\npeat\n##thermal\ncaucasian\nhumane\nrelaxation\nspins\ntemperance\n##lbert\noccupations\nlambda\nhybrids\nmoons\nmp3\n##oese\n247\nrolf\nsocietal\nyerevan\nness\n##ssler\nbefriended\nmechanized\nnominate\ntrough\nboasted\ncues\nseater\n##hom\nbends\n##tangle\nconductors\nemptiness\n##lmer\neurasian\nadriatic\ntian\n##cie\nanxiously\nlark\npropellers\nchichester\njock\nev\n2a\n##holding\ncredible\nrecounts\ntori\nloyalist\nabduction\n##hoot\n##redo\nnepali\n##mite\nventral\ntempting\n##ango\n##crats\nsteered\n##wice\njavelin\ndipping\nlaborers\nprentice\nlooming\ntitanium\n##ː\nbadges\nemir\ntensor\n##ntation\negyptians\nrash\ndenies\nhawthorne\nlombard\nshowers\nwehrmacht\ndietary\ntrojan\n##reus\nwelles\nexecuting\nhorseshoe\nlifeboat\n##lak\nelsa\ninfirmary\nnearing\nroberta\nboyer\nmutter\ntrillion\njoanne\n##fine\n##oked\nsinks\nvortex\nuruguayan\nclasp\nsirius\n##block\naccelerator\nprohibit\nsunken\nbyu\nchronological\ndiplomats\nochreous\n510\nsymmetrical\n1644\nmaia\n##tology\nsalts\nreigns\natrocities\n##ия\nhess\nbared\nissn\n##vyn\ncater\nsaturated\n##cycle\n##isse\nsable\nvoyager\ndyer\nyusuf\n##inge\nfountains\nwolff\n##39\n##nni\nengraving\nrollins\natheist\nominous\n##ault\nherr\nchariot\nmartina\nstrung\n##fell\n##farlane\nhorrific\nsahib\ngazes\nsaetan\nerased\nptolemy\n##olic\nflushing\nlauderdale\nanalytic\n##ices\n530\nnavarro\nbeak\ngorilla\nherrera\nbroom\nguadalupe\nraiding\nsykes\n311\nbsc\ndeliveries\n1720\ninvasions\ncarmichael\ntajikistan\nthematic\necumenical\nsentiments\nonstage\n##rians\n##brand\n##sume\ncatastrophic\nflanks\nmolten\n##arns\nwaller\naimee\nterminating\n##icing\nalternately\n##oche\nnehru\nprinters\noutraged\n##eving\nempires\ntemplate\nbanners\nrepetitive\nza\n##oise\nvegetarian\n##tell\nguiana\nopt\ncavendish\nlucknow\nsynthesized\n##hani\n##mada\nfinalized\n##ctable\nfictitious\nmayoral\nunreliable\n##enham\nembracing\npeppers\nrbis\n##chio\n##neo\ninhibition\nslashed\ntogo\norderly\nembroidered\nsafari\nsalty\n236\nbarron\nbenito\ntotaled\n##dak\npubs\nsimulated\ncaden\ndevin\ntolkien\nmomma\nwelding\nsesame\n##ept\ngottingen\nhardness\n630\nshaman\ntemeraire\n620\nadequately\npediatric\n##kit\nck\nassertion\nradicals\ncomposure\ncadence\nseafood\nbeaufort\nlazarus\nmani\nwarily\ncunning\nkurdistan\n249\ncantata\n##kir\nares\n##41\n##clusive\nnape\ntownland\ngeared\ninsulted\nflutter\nboating\nviolate\ndraper\ndumping\nmalmo\n##hh\n##romatic\nfirearm\nalta\nbono\nobscured\n##clave\nexceeds\npanorama\nunbelievable\n##train\npreschool\n##essed\ndisconnected\ninstalling\nrescuing\nsecretaries\naccessibility\n##castle\n##drive\n##ifice\n##film\nbouts\nslug\nwaterway\nmindanao\n##buro\n##ratic\nhalves\n##ل\ncalming\nliter\nmaternity\nadorable\nbragg\nelectrification\nmcc\n##dote\nroxy\nschizophrenia\n##body\nmunoz\nkaye\nwhaling\n239\nmil\ntingling\ntolerant\n##ago\nunconventional\nvolcanoes\n##finder\ndeportivo\n##llie\nrobson\nkaufman\nneuroscience\nwai\ndeportation\nmasovian\nscraping\nconverse\n##bh\nhacking\nbulge\n##oun\nadministratively\nyao\n580\namp\nmammoth\nbooster\nclaremont\nhooper\nnomenclature\npursuits\nmclaughlin\nmelinda\n##sul\ncatfish\nbarclay\nsubstrates\ntaxa\nzee\noriginals\nkimberly\npackets\npadma\n##ality\nborrowing\nostensibly\nsolvent\n##bri\n##genesis\n##mist\nlukas\nshreveport\nveracruz\n##ь\n##lou\n##wive
s\ncheney\ntt\nanatolia\nhobbs\n##zyn\ncyclic\nradiant\nalistair\ngreenish\nsiena\ndat\nindependents\n##bation\nconform\npieter\nhyper\napplicant\nbradshaw\nspores\ntelangana\nvinci\ninexpensive\nnuclei\n322\njang\nnme\nsoho\nspd\n##ign\ncradled\nreceptionist\npow\n##43\n##rika\nfascism\n##ifer\nexperimenting\n##ading\n##iec\n##region\n345\njocelyn\nmaris\nstair\nnocturnal\ntoro\nconstabulary\nelgin\n##kker\nmsc\n##giving\n##schen\n##rase\ndoherty\ndoping\nsarcastically\nbatter\nmaneuvers\n##cano\n##apple\n##gai\n##git\nintrinsic\n##nst\n##stor\n1753\nshowtime\ncafes\ngasps\nlviv\nushered\n##thed\nfours\nrestart\nastonishment\ntransmitting\nflyer\nshrugs\n##sau\nintriguing\ncones\ndictated\nmushrooms\nmedial\n##kovsky\n##elman\nescorting\ngaped\n##26\ngodfather\n##door\n##sell\ndjs\nrecaptured\ntimetable\nvila\n1710\n3a\naerodrome\nmortals\nscientology\n##orne\nangelina\nmag\nconvection\nunpaid\ninsertion\nintermittent\nlego\n##nated\nendeavor\nkota\npereira\n##lz\n304\nbwv\nglamorgan\ninsults\nagatha\nfey\n##cend\nfleetwood\nmahogany\nprotruding\nsteamship\nzeta\n##arty\nmcguire\nsuspense\n##sphere\nadvising\nurges\n##wala\nhurriedly\nmeteor\ngilded\ninline\narroyo\nstalker\n##oge\nexcitedly\nrevered\n##cure\nearle\nintroductory\n##break\n##ilde\nmutants\npuff\npulses\nreinforcement\n##haling\ncurses\nlizards\nstalk\ncorrelated\n##fixed\nfallout\nmacquarie\n##unas\nbearded\ndenton\nheaving\n802\n##ocation\nwinery\nassign\ndortmund\n##lkirk\neverest\ninvariant\ncharismatic\nsusie\n##elling\nbled\nlesley\ntelegram\nsumner\nbk\n##ogen\n##к\nwilcox\nneedy\ncolbert\nduval\n##iferous\n##mbled\nallotted\nattends\nimperative\n##hita\nreplacements\nhawker\n##inda\ninsurgency\n##zee\n##eke\ncasts\n##yla\n680\nives\ntransitioned\n##pack\n##powering\nauthoritative\nbaylor\nflex\ncringed\nplaintiffs\nwoodrow\n##skie\ndrastic\nape\naroma\nunfolded\ncommotion\nnt\npreoccupied\ntheta\nroutines\nlasers\nprivatization\nwand\ndomino\nek\nclenching\nnsa\nstrategically\nshowered\nbile\nhandkerchief\npere\nstoring\nchristophe\ninsulting\n316\nnakamura\nromani\nasiatic\nmagdalena\npalma\ncruises\nstripping\n405\nkonstantin\nsoaring\n##berman\ncolloquially\nforerunner\nhavilland\nincarcerated\nparasites\nsincerity\n##utus\ndisks\nplank\nsaigon\n##ining\ncorbin\nhomo\nornaments\npowerhouse\n##tlement\nchong\nfastened\nfeasibility\nidf\nmorphological\nusable\n##nish\n##zuki\naqueduct\njaguars\nkeepers\n##flies\naleksandr\nfaust\nassigns\newing\nbacterium\nhurled\ntricky\nhungarians\nintegers\nwallis\n321\nyamaha\n##isha\nhushed\noblivion\naviator\nevangelist\nfriars\n##eller\nmonograph\node\n##nary\nairplanes\nlabourers\ncharms\n##nee\n1661\nhagen\ntnt\nrudder\nfiesta\ntranscript\ndorothea\nska\ninhibitor\nmaccabi\nretorted\nraining\nencompassed\nclauses\nmenacing\n1642\nlineman\n##gist\nvamps\n##ape\n##dick\ngloom\n##rera\ndealings\neasing\nseekers\n##nut\n##pment\nhelens\nunmanned\n##anu\n##isson\nbasics\n##amy\n##ckman\nadjustments\n1688\nbrutality\nhorne\n##zell\nsui\n##55\n##mable\naggregator\n##thal\nrhino\n##drick\n##vira\ncounters\nzoom\n##01\n##rting\nmn\nmontenegrin\npackard\n##unciation\n##♭\n##kki\nreclaim\nscholastic\nthugs\npulsed\n##icia\nsyriac\nquan\nsaddam\nbanda\nkobe\nblaming\nbuddies\ndissent\n##lusion\n##usia\ncorbett\njaya\ndelle\nerratic\nlexie\n##hesis\n435\namiga\nhermes\n##pressing\n##leen\nchapels\ngospels\njamal\n##uating\ncompute\nrevolving\nwarp\n##sso\n##thes\narmory\n##eras\n##gol\nantrim\nloki\n##kow\n##asian\n##good\n##zano\nbraid\nhandwriting\nsubdistrict\nfunky\npantheon\n##icul
ate\nconcurrency\nestimation\nimproper\njuliana\n##his\nnewcomers\njohnstone\nstaten\ncommunicated\n##oco\n##alle\nsausage\nstormy\n##stered\n##tters\nsuperfamily\n##grade\nacidic\ncollateral\ntabloid\n##oped\n##rza\nbladder\nausten\n##ellant\nmcgraw\n##hay\nhannibal\nmein\naquino\nlucifer\nwo\nbadger\nboar\ncher\nchristensen\ngreenberg\ninterruption\n##kken\njem\n244\nmocked\nbottoms\ncambridgeshire\n##lide\nsprawling\n##bbly\neastwood\nghent\nsynth\n##buck\nadvisers\n##bah\nnominally\nhapoel\nqu\ndaggers\nestranged\nfabricated\ntowels\nvinnie\nwcw\nmisunderstanding\nanglia\nnothin\nunmistakable\n##dust\n##lova\nchilly\nmarquette\ntruss\n##edge\n##erine\nreece\n##lty\n##chemist\n##connected\n272\n308\n41st\nbash\nraion\nwaterfalls\n##ump\n##main\nlabyrinth\nqueue\ntheorist\n##istle\nbharatiya\nflexed\nsoundtracks\nrooney\nleftist\npatrolling\nwharton\nplainly\nalleviate\neastman\nschuster\ntopographic\nengages\nimmensely\nunbearable\nfairchild\n1620\ndona\nlurking\nparisian\noliveira\nia\nindictment\nhahn\nbangladeshi\n##aster\nvivo\n##uming\n##ential\nantonia\nexpects\nindoors\nkildare\nharlan\n##logue\n##ogenic\n##sities\nforgiven\n##wat\nchildish\ntavi\n##mide\n##orra\nplausible\ngrimm\nsuccessively\nscooted\n##bola\n##dget\n##rith\nspartans\nemery\nflatly\nazure\nepilogue\n##wark\nflourish\n##iny\n##tracted\n##overs\n##oshi\nbestseller\ndistressed\nreceipt\nspitting\nhermit\ntopological\n##cot\ndrilled\nsubunit\nfrancs\n##layer\neel\n##fk\n##itas\noctopus\nfootprint\npetitions\nufo\n##say\n##foil\ninterfering\nleaking\npalo\n##metry\nthistle\nvaliant\n##pic\nnarayan\nmcpherson\n##fast\ngonzales\n##ym\n##enne\ndustin\nnovgorod\nsolos\n##zman\ndoin\n##raph\n##patient\n##meyer\nsoluble\nashland\ncuffs\ncarole\npendleton\nwhistling\nvassal\n##river\ndeviation\nrevisited\nconstituents\nrallied\nrotate\nloomed\n##eil\n##nting\namateurs\naugsburg\nauschwitz\ncrowns\nskeletons\n##cona\nbonnet\n257\ndummy\nglobalization\nsimeon\nsleeper\nmandal\ndifferentiated\n##crow\n##mare\nmilne\nbundled\nexasperated\ntalmud\nowes\nsegregated\n##feng\n##uary\ndentist\npiracy\nprops\n##rang\ndevlin\n##torium\nmalicious\npaws\n##laid\ndependency\n##ergy\n##fers\n##enna\n258\npistons\nrourke\njed\ngrammatical\ntres\nmaha\nwig\n512\nghostly\njayne\n##achal\n##creen\n##ilis\n##lins\n##rence\ndesignate\n##with\narrogance\ncambodian\nclones\nshowdown\nthrottle\ntwain\n##ception\nlobes\nmetz\nnagoya\n335\nbraking\n##furt\n385\nroaming\n##minster\namin\ncrippled\n##37\n##llary\nindifferent\nhoffmann\nidols\nintimidating\n1751\n261\ninfluenza\nmemo\nonions\n1748\nbandage\nconsciously\n##landa\n##rage\nclandestine\nobserves\nswiped\ntangle\n##ener\n##jected\n##trum\n##bill\n##lta\nhugs\ncongresses\njosiah\nspirited\n##dek\nhumanist\nmanagerial\nfilmmaking\ninmate\nrhymes\ndebuting\ngrimsby\nur\n##laze\nduplicate\nvigor\n##tf\nrepublished\nbolshevik\nrefurbishment\nantibiotics\nmartini\nmethane\nnewscasts\nroyale\nhorizons\nlevant\niain\nvisas\n##ischen\npaler\n##around\nmanifestation\nsnuck\nalf\nchop\nfutile\npedestal\nrehab\n##kat\nbmg\nkerman\nres\nfairbanks\njarrett\nabstraction\nsaharan\n##zek\n1746\nprocedural\nclearer\nkincaid\nsash\nluciano\n##ffey\ncrunch\nhelmut\n##vara\nrevolutionaries\n##tute\ncreamy\nleach\n##mmon\n1747\npermitting\nnes\nplight\nwendell\n##lese\ncontra\nts\nclancy\nipa\nmach\nstaples\nautopsy\ndisturbances\nnueva\nkarin\npontiac\n##uding\nproxy\nvenerable\nhaunt\nleto\nbergman\nexpands\n##helm\nwal\n##pipe\ncanning\nceline\ncords\nobesity\n##enary\nintrusion\nplanner\n##phate\nreasoned\ns
equencing\n307\nharrow\n##chon\n##dora\nmarred\nmcintyre\nrepay\ntarzan\ndarting\n248\nharrisburg\nmargarita\nrepulsed\n##hur\n##lding\nbelinda\nhamburger\nnovo\ncompliant\nrunways\nbingham\nregistrar\nskyscraper\nic\ncuthbert\nimprovisation\nlivelihood\n##corp\n##elial\nadmiring\n##dened\nsporadic\nbeliever\ncasablanca\npopcorn\n##29\nasha\nshovel\n##bek\n##dice\ncoiled\ntangible\n##dez\ncasper\nelsie\nresin\ntenderness\nrectory\n##ivision\navail\nsonar\n##mori\nboutique\n##dier\nguerre\nbathed\nupbringing\nvaulted\nsandals\nblessings\n##naut\n##utnant\n1680\n306\nfoxes\npia\ncorrosion\nhesitantly\nconfederates\ncrystalline\nfootprints\nshapiro\ntirana\nvalentin\ndrones\n45th\nmicroscope\nshipments\ntexted\ninquisition\nwry\nguernsey\nunauthorized\nresigning\n760\nripple\nschubert\nstu\nreassure\nfelony\n##ardo\nbrittle\nkoreans\n##havan\n##ives\ndun\nimplicit\ntyres\n##aldi\n##lth\nmagnolia\n##ehan\n##puri\n##poulos\naggressively\nfei\ngr\nfamiliarity\n##poo\nindicative\n##trust\nfundamentally\njimmie\noverrun\n395\nanchors\nmoans\n##opus\nbritannia\narmagh\n##ggle\npurposely\nseizing\n##vao\nbewildered\nmundane\navoidance\ncosmopolitan\ngeometridae\nquartermaster\ncaf\n415\nchatter\nengulfed\ngleam\npurge\n##icate\njuliette\njurisprudence\nguerra\nrevisions\n##bn\ncasimir\nbrew\n##jm\n1749\nclapton\ncloudy\nconde\nhermitage\n278\nsimulations\ntorches\nvincenzo\nmatteo\n##rill\nhidalgo\nbooming\nwestbound\naccomplishment\ntentacles\nunaffected\n##sius\nannabelle\nflopped\nsloping\n##litz\ndreamer\ninterceptor\nvu\n##loh\nconsecration\ncopying\nmessaging\nbreaker\nclimates\nhospitalized\n1752\ntorino\nafternoons\nwinfield\nwitnessing\n##teacher\nbreakers\nchoirs\nsawmill\ncoldly\n##ege\nsipping\nhaste\nuninhabited\nconical\nbibliography\npamphlets\nsevern\nedict\n##oca\ndeux\nillnesses\ngrips\n##pl\nrehearsals\nsis\nthinkers\ntame\n##keepers\n1690\nacacia\nreformer\n##osed\n##rys\nshuffling\n##iring\n##shima\neastbound\nionic\nrhea\nflees\nlittered\n##oum\nrocker\nvomiting\ngroaning\nchamp\noverwhelmingly\ncivilizations\npaces\nsloop\nadoptive\n##tish\nskaters\n##vres\naiding\nmango\n##joy\nnikola\nshriek\n##ignon\npharmaceuticals\n##mg\ntuna\ncalvert\ngustavo\nstocked\nyearbook\n##urai\n##mana\ncomputed\nsubsp\nriff\nhanoi\nkelvin\nhamid\nmoors\npastures\nsummons\njihad\nnectar\n##ctors\nbayou\nuntitled\npleasing\nvastly\nrepublics\nintellect\n##η\n##ulio\n##tou\ncrumbling\nstylistic\nsb\n##ی\nconsolation\nfrequented\nh₂o\nwalden\nwidows\n##iens\n404\n##ignment\nchunks\nimproves\n288\ngrit\nrecited\n##dev\nsnarl\nsociological\n##arte\n##gul\ninquired\n##held\nbruise\nclube\nconsultancy\nhomogeneous\nhornets\nmultiplication\npasta\nprick\nsavior\n##grin\n##kou\n##phile\nyoon\n##gara\ngrimes\nvanishing\ncheering\nreacting\nbn\ndistillery\n##quisite\n##vity\ncoe\ndockyard\nmassif\n##jord\nescorts\nvoss\n##valent\nbyte\nchopped\nhawke\nillusions\nworkings\nfloats\n##koto\n##vac\nkv\nannapolis\nmadden\n##onus\nalvaro\nnoctuidae\n##cum\n##scopic\navenge\nsteamboat\nforte\nillustrates\nerika\n##trip\n570\ndew\nnationalities\nbran\nmanifested\nthirsty\ndiversified\nmuscled\nreborn\n##standing\narson\n##lessness\n##dran\n##logram\n##boys\n##kushima\n##vious\nwilloughby\n##phobia\n286\nalsace\ndashboard\nyuki\n##chai\ngranville\nmyspace\npublicized\ntricked\n##gang\nadjective\n##ater\nrelic\nreorganisation\nenthusiastically\nindications\nsaxe\n##lassified\nconsolidate\niec\npadua\nhelplessly\nramps\nrenaming\nregulars\npedestrians\naccents\nconvicts\ninaccurate\nlowers\nmana\n##pati\nbarrie\nbjp\no
utta\nsomeplace\nberwick\nflanking\ninvoked\nmarrow\nsparsely\nexcerpts\nclothed\nrei\n##ginal\nwept\n##straße\n##vish\nalexa\nexcel\n##ptive\nmembranes\naquitaine\ncreeks\ncutler\nsheppard\nimplementations\nns\n##dur\nfragrance\nbudge\nconcordia\nmagnesium\nmarcelo\n##antes\ngladly\nvibrating\n##rral\n##ggles\nmontrose\n##omba\nlew\nseamus\n1630\ncocky\n##ament\n##uen\nbjorn\n##rrick\nfielder\nfluttering\n##lase\nmethyl\nkimberley\nmcdowell\nreductions\nbarbed\n##jic\n##tonic\naeronautical\ncondensed\ndistracting\n##promising\nhuffed\n##cala\n##sle\nclaudius\ninvincible\nmissy\npious\nbalthazar\nci\n##lang\nbutte\ncombo\norson\n##dication\nmyriad\n1707\nsilenced\n##fed\n##rh\ncoco\nnetball\nyourselves\n##oza\nclarify\nheller\npeg\ndurban\netudes\noffender\nroast\nblackmail\ncurvature\n##woods\nvile\n309\nillicit\nsuriname\n##linson\noverture\n1685\nbubbling\ngymnast\ntucking\n##mming\n##ouin\nmaldives\n##bala\ngurney\n##dda\n##eased\n##oides\nbackside\npinto\njars\nracehorse\ntending\n##rdial\nbaronetcy\nwiener\nduly\n##rke\nbarbarian\ncupping\nflawed\n##thesis\nbertha\npleistocene\npuddle\nswearing\n##nob\n##tically\nfleeting\nprostate\namulet\neducating\n##mined\n##iti\n##tler\n75th\njens\nrespondents\nanalytics\ncavaliers\npapacy\nraju\n##iente\n##ulum\n##tip\nfunnel\n271\ndisneyland\n##lley\nsociologist\n##iam\n2500\nfaulkner\nlouvre\nmenon\n##dson\n276\n##ower\nafterlife\nmannheim\npeptide\nreferees\ncomedians\nmeaningless\n##anger\n##laise\nfabrics\nhurley\nrenal\nsleeps\n##bour\n##icle\nbreakout\nkristin\nroadside\nanimator\nclover\ndisdain\nunsafe\nredesign\n##urity\nfirth\nbarnsley\nportage\nreset\nnarrows\n268\ncommandos\nexpansive\nspeechless\ntubular\n##lux\nessendon\neyelashes\nsmashwords\n##yad\n##bang\n##claim\ncraved\nsprinted\nchet\nsomme\nastor\nwrocław\norton\n266\nbane\n##erving\n##uing\nmischief\n##amps\n##sund\nscaling\nterre\n##xious\nimpairment\noffenses\nundermine\nmoi\nsoy\ncontiguous\narcadia\ninuit\nseam\n##tops\nmacbeth\nrebelled\n##icative\n##iot\n590\nelaborated\nfrs\nuniformed\n##dberg\n259\npowerless\npriscilla\nstimulated\n980\nqc\narboretum\nfrustrating\ntrieste\nbullock\n##nified\nenriched\nglistening\nintern\n##adia\nlocus\nnouvelle\nollie\nike\nlash\nstarboard\nee\ntapestry\nheadlined\nhove\nrigged\n##vite\npollock\n##yme\nthrive\nclustered\ncas\nroi\ngleamed\nolympiad\n##lino\npressured\nregimes\n##hosis\n##lick\nripley\n##ophone\nkickoff\ngallon\nrockwell\n##arable\ncrusader\nglue\nrevolutions\nscrambling\n1714\ngrover\n##jure\nenglishman\naztec\n263\ncontemplating\ncoven\nipad\npreach\ntriumphant\ntufts\n##esian\nrotational\n##phus\n328\nfalkland\n##brates\nstrewn\nclarissa\nrejoin\nenvironmentally\nglint\nbanded\ndrenched\nmoat\nalbanians\njohor\nrr\nmaestro\nmalley\nnouveau\nshaded\ntaxonomy\nv6\nadhere\nbunk\nairfields\n##ritan\n1741\nencompass\nremington\ntran\n##erative\namelie\nmazda\nfriar\nmorals\npassions\n##zai\nbreadth\nvis\n##hae\nargus\nburnham\ncaressing\ninsider\nrudd\n##imov\n##mini\n##rso\nitalianate\nmurderous\ntextual\nwainwright\narmada\nbam\nweave\ntimer\n##taken\n##nh\nfra\n##crest\nardent\nsalazar\ntaps\ntunis\n##ntino\nallegro\ngland\nphilanthropic\n##chester\nimplication\n##optera\nesq\njudas\nnoticeably\nwynn\n##dara\ninched\nindexed\ncrises\nvilliers\nbandit\nroyalties\npatterned\ncupboard\ninterspersed\naccessory\nisla\nkendrick\nentourage\nstitches\n##esthesia\nheadwaters\n##ior\ninterlude\ndistraught\ndraught\n1727\n##basket\nbiased\nsy\ntransient\ntriad\nsubgenus\nadapting\nkidd\nshortstop\n##umatic\ndimly\nspiked\nmcl
eod\nreprint\nnellie\npretoria\nwindmill\n##cek\nsingled\n##mps\n273\nreunite\n##orous\n747\nbankers\noutlying\n##omp\n##ports\n##tream\napologies\ncosmetics\npatsy\n##deh\n##ocks\n##yson\nbender\nnantes\nserene\n##nad\nlucha\nmmm\n323\n##cius\n##gli\ncmll\ncoinage\nnestor\njuarez\n##rook\nsmeared\nsprayed\ntwitching\nsterile\nirina\nembodied\njuveniles\nenveloped\nmiscellaneous\ncancers\ndq\ngulped\nluisa\ncrested\nswat\ndonegal\nref\n##anov\n##acker\nhearst\nmercantile\n##lika\ndoorbell\nua\nvicki\n##alla\n##som\nbilbao\npsychologists\nstryker\nsw\nhorsemen\nturkmenistan\nwits\n##national\nanson\nmathew\nscreenings\n##umb\nrihanna\n##agne\n##nessy\naisles\n##iani\n##osphere\nhines\nkenton\nsaskatoon\ntasha\ntruncated\n##champ\n##itan\nmildred\nadvises\nfredrik\ninterpreting\ninhibitors\n##athi\nspectroscopy\n##hab\n##kong\nkarim\npanda\n##oia\n##nail\n##vc\nconqueror\nkgb\nleukemia\n##dity\narrivals\ncheered\npisa\nphosphorus\nshielded\n##riated\nmammal\nunitarian\nurgently\nchopin\nsanitary\n##mission\nspicy\ndrugged\nhinges\n##tort\ntipping\ntrier\nimpoverished\nwestchester\n##caster\n267\nepoch\nnonstop\n##gman\n##khov\naromatic\ncentrally\ncerro\n##tively\n##vio\nbillions\nmodulation\nsedimentary\n283\nfacilitating\noutrageous\ngoldstein\n##eak\n##kt\nld\nmaitland\npenultimate\npollard\n##dance\nfleets\nspaceship\nvertebrae\n##nig\nalcoholism\nals\nrecital\n##bham\n##ference\n##omics\nm2\n##bm\ntrois\n##tropical\n##в\ncommemorates\n##meric\nmarge\n##raction\n1643\n670\ncosmetic\nravaged\n##ige\ncatastrophe\neng\n##shida\nalbrecht\narterial\nbellamy\ndecor\nharmon\n##rde\nbulbs\nsynchronized\nvito\neasiest\nshetland\nshielding\nwnba\n##glers\n##ssar\n##riam\nbrianna\ncumbria\n##aceous\n##rard\ncores\nthayer\n##nsk\nbrood\nhilltop\nluminous\ncarts\nkeynote\nlarkin\nlogos\n##cta\n##ا\n##mund\n##quay\nlilith\ntinted\n277\nwrestle\nmobilization\n##uses\nsequential\nsiam\nbloomfield\ntakahashi\n274\n##ieving\npresenters\nringo\nblazed\nwitty\n##oven\n##ignant\ndevastation\nhaydn\nharmed\nnewt\ntherese\n##peed\ngershwin\nmolina\nrabbis\nsudanese\n001\ninnate\nrestarted\n##sack\n##fus\nslices\nwb\n##shah\nenroll\nhypothetical\nhysterical\n1743\nfabio\nindefinite\nwarped\n##hg\nexchanging\n525\nunsuitable\n##sboro\ngallo\n1603\nbret\ncobalt\nhomemade\n##hunter\nmx\noperatives\n##dhar\nterraces\ndurable\nlatch\npens\nwhorls\n##ctuated\n##eaux\nbilling\nligament\nsuccumbed\n##gly\nregulators\nspawn\n##brick\n##stead\nfilmfare\nrochelle\n##nzo\n1725\ncircumstance\nsaber\nsupplements\n##nsky\n##tson\ncrowe\nwellesley\ncarrot\n##9th\n##movable\nprimate\ndrury\nsincerely\ntopical\n##mad\n##rao\ncallahan\nkyiv\nsmarter\ntits\nundo\n##yeh\nannouncements\nanthologies\nbarrio\nnebula\n##islaus\n##shaft\n##tyn\nbodyguards\n2021\nassassinate\nbarns\nemmett\nscully\n##mah\n##yd\n##eland\n##tino\n##itarian\ndemoted\ngorman\nlashed\nprized\nadventist\nwrit\n##gui\nalla\ninvertebrates\n##ausen\n1641\namman\n1742\nalign\nhealy\nredistribution\n##gf\n##rize\ninsulation\n##drop\nadherents\nhezbollah\nvitro\nferns\nyanking\n269\nphp\nregistering\nuppsala\ncheerleading\nconfines\nmischievous\ntully\n##ross\n49th\ndocked\nroam\nstipulated\npumpkin\n##bry\nprompt\n##ezer\nblindly\nshuddering\ncraftsmen\nfrail\nscented\nkatharine\nscramble\nshaggy\nsponge\nhelix\nzaragoza\n279\n##52\n43rd\nbacklash\nfontaine\nseizures\nposse\ncowan\nnonfiction\ntelenovela\nwwii\nhammered\nundone\n##gpur\nencircled\nirs\n##ivation\nartefacts\noneself\nsearing\nsmallpox\n##belle\n##osaurus\nshandong\nbreached\nupland\nblushing\nrankin\
ninfinitely\npsyche\ntolerated\ndocking\nevicted\n##col\nunmarked\n##lving\ngnome\nlettering\nlitres\nmusique\n##oint\nbenevolent\n##jal\nblackened\n##anna\nmccall\nracers\ntingle\n##ocene\n##orestation\nintroductions\nradically\n292\n##hiff\n##باد\n1610\n1739\nmunchen\nplead\n##nka\ncondo\nscissors\n##sight\n##tens\napprehension\n##cey\n##yin\nhallmark\nwatering\nformulas\nsequels\n##llas\naggravated\nbae\ncommencing\n##building\nenfield\nprohibits\nmarne\nvedic\ncivilized\neuclidean\njagger\nbeforehand\nblasts\ndumont\n##arney\n##nem\n740\nconversions\nhierarchical\nrios\nsimulator\n##dya\n##lellan\nhedges\noleg\nthrusts\nshadowed\ndarby\nmaximize\n1744\ngregorian\n##nded\n##routed\nsham\nunspecified\n##hog\nemory\nfactual\n##smo\n##tp\nfooled\n##rger\nortega\nwellness\nmarlon\n##oton\n##urance\ncasket\nkeating\nley\nenclave\n##ayan\nchar\ninfluencing\njia\n##chenko\n412\nammonia\nerebidae\nincompatible\nviolins\ncornered\n##arat\ngrooves\nastronauts\ncolumbian\nrampant\nfabrication\nkyushu\nmahmud\nvanish\n##dern\nmesopotamia\n##lete\nict\n##rgen\ncaspian\nkenji\npitted\n##vered\n999\ngrimace\nroanoke\ntchaikovsky\ntwinned\n##analysis\n##awan\nxinjiang\narias\nclemson\nkazakh\nsizable\n1662\n##khand\n##vard\nplunge\ntatum\nvittorio\n##nden\ncholera\n##dana\n##oper\nbracing\nindifference\nprojectile\nsuperliga\n##chee\nrealises\nupgrading\n299\nporte\nretribution\n##vies\nnk\nstil\n##resses\nama\nbureaucracy\nblackberry\nbosch\ntestosterone\ncollapses\ngreer\n##pathic\nioc\nfifties\nmalls\n##erved\nbao\nbaskets\nadolescents\nsiegfried\n##osity\n##tosis\nmantra\ndetecting\nexistent\nfledgling\n##cchi\ndissatisfied\ngan\ntelecommunication\nmingled\nsobbed\n6000\ncontroversies\noutdated\ntaxis\n##raus\nfright\nslams\n##lham\n##fect\n##tten\ndetectors\nfetal\ntanned\n##uw\nfray\ngoth\nolympian\nskipping\nmandates\nscratches\nsheng\nunspoken\nhyundai\ntracey\nhotspur\nrestrictive\n##buch\namericana\nmundo\n##bari\nburroughs\ndiva\nvulcan\n##6th\ndistinctions\nthumping\n##ngen\nmikey\nsheds\nfide\nrescues\nspringsteen\nvested\nvaluation\n##ece\n##ely\npinnacle\nrake\nsylvie\n##edo\nalmond\nquivering\n##irus\nalteration\nfaltered\n##wad\n51st\nhydra\nticked\n##kato\nrecommends\n##dicated\nantigua\narjun\nstagecoach\nwilfred\ntrickle\npronouns\n##pon\naryan\nnighttime\n##anian\ngall\npea\nstitch\n##hei\nleung\nmilos\n##dini\neritrea\nnexus\nstarved\nsnowfall\nkant\nparasitic\ncot\ndiscus\nhana\nstrikers\nappleton\nkitchens\n##erina\n##partisan\n##itha\n##vius\ndisclose\nmetis\n##channel\n1701\ntesla\n##vera\nfitch\n1735\nblooded\n##tila\ndecimal\n##tang\n##bai\ncyclones\neun\nbottled\npeas\npensacola\nbasha\nbolivian\ncrabs\nboil\nlanterns\npartridge\nroofed\n1645\nnecks\n##phila\nopined\npatting\n##kla\n##lland\nchuckles\nvolta\nwhereupon\n##nche\ndevout\neuroleague\nsuicidal\n##dee\ninherently\ninvoluntary\nknitting\nnasser\n##hide\npuppets\ncolourful\ncourageous\nsouthend\nstills\nmiraculous\nhodgson\nricher\nrochdale\nethernet\ngreta\nuniting\nprism\numm\n##haya\n##itical\n##utation\ndeterioration\npointe\nprowess\n##ropriation\nlids\nscranton\nbillings\nsubcontinent\n##koff\n##scope\nbrute\nkellogg\npsalms\ndegraded\n##vez\nstanisław\n##ructured\nferreira\npun\nastonishing\ngunnar\n##yat\narya\nprc\ngottfried\n##tight\nexcursion\n##ographer\ndina\n##quil\n##nare\nhuffington\nillustrious\nwilbur\ngundam\nverandah\n##zard\nnaacp\n##odle\nconstructive\nfjord\nkade\n##naud\ngenerosity\nthrilling\nbaseline\ncayman\nfrankish\nplastics\naccommodations\nzoological\n##fting\ncedric\nqb\nmotorized\n##
dome\n##otted\nsquealed\ntackled\ncanucks\nbudgets\nsitu\nasthma\ndail\ngabled\ngrasslands\nwhimpered\nwrithing\njudgments\n##65\nminnie\npv\n##carbon\nbananas\ngrille\ndomes\nmonique\nodin\nmaguire\nmarkham\ntierney\n##estra\n##chua\nlibel\npoke\nspeedy\natrium\nlaval\nnotwithstanding\n##edly\nfai\nkala\n##sur\nrobb\n##sma\nlistings\nluz\nsupplementary\ntianjin\n##acing\nenzo\njd\nric\nscanner\ncroats\ntranscribed\n##49\narden\ncv\n##hair\n##raphy\n##lver\n##uy\n357\nseventies\nstaggering\nalam\nhorticultural\nhs\nregression\ntimbers\nblasting\n##ounded\nmontagu\nmanipulating\n##cit\ncatalytic\n1550\ntroopers\n##meo\ncondemnation\nfitzpatrick\n##oire\n##roved\ninexperienced\n1670\ncastes\n##lative\nouting\n314\ndubois\nflicking\nquarrel\nste\nlearners\n1625\niq\nwhistled\n##class\n282\nclassify\ntariffs\ntemperament\n355\nfolly\nliszt\n##yles\nimmersed\njordanian\nceasefire\napparel\nextras\nmaru\nfished\n##bio\nharta\nstockport\nassortment\ncraftsman\nparalysis\ntransmitters\n##cola\nblindness\n##wk\nfatally\nproficiency\nsolemnly\n##orno\nrepairing\namore\ngroceries\nultraviolet\n##chase\nschoolhouse\n##tua\nresurgence\nnailed\n##otype\n##×\nruse\nsaliva\ndiagrams\n##tructing\nalbans\nrann\nthirties\n1b\nantennas\nhilarious\ncougars\npaddington\nstats\n##eger\nbreakaway\nipod\nreza\nauthorship\nprohibiting\nscoffed\n##etz\n##ttle\nconscription\ndefected\ntrondheim\n##fires\nivanov\nkeenan\n##adan\n##ciful\n##fb\n##slow\nlocating\n##ials\n##tford\ncadiz\nbasalt\nblankly\ninterned\nrags\nrattling\n##tick\ncarpathian\nreassured\nsync\nbum\nguildford\niss\nstaunch\n##onga\nastronomers\nsera\nsofie\nemergencies\nsusquehanna\n##heard\nduc\nmastery\nvh1\nwilliamsburg\nbayer\nbuckled\ncraving\n##khan\n##rdes\nbloomington\n##write\nalton\nbarbecue\n##bians\njustine\n##hri\n##ndt\ndelightful\nsmartphone\nnewtown\nphoton\nretrieval\npeugeot\nhissing\n##monium\n##orough\nflavors\nlighted\nrelaunched\ntainted\n##games\n##lysis\nanarchy\nmicroscopic\nhopping\nadept\nevade\nevie\n##beau\ninhibit\nsinn\nadjustable\nhurst\nintuition\nwilton\ncisco\n44th\nlawful\nlowlands\nstockings\nthierry\n##dalen\n##hila\n##nai\nfates\nprank\ntb\nmaison\nlobbied\nprovocative\n1724\n4a\nutopia\n##qual\ncarbonate\ngujarati\npurcell\n##rford\ncurtiss\n##mei\novergrown\narenas\nmediation\nswallows\n##rnik\nrespectful\nturnbull\n##hedron\n##hope\nalyssa\nozone\n##ʻi\nami\ngestapo\njohansson\nsnooker\ncanteen\ncuff\ndeclines\nempathy\nstigma\n##ags\n##iner\n##raine\ntaxpayers\ngui\nvolga\n##wright\n##copic\nlifespan\novercame\ntattooed\nenactment\ngiggles\n##ador\n##camp\nbarrington\nbribe\nobligatory\norbiting\npeng\n##enas\nelusive\nsucker\n##vating\ncong\nhardship\nempowered\nanticipating\nestrada\ncryptic\ngreasy\ndetainees\nplanck\nsudbury\nplaid\ndod\nmarriott\nkayla\n##ears\n##vb\n##zd\nmortally\n##hein\ncognition\nradha\n319\nliechtenstein\nmeade\nrichly\nargyle\nharpsichord\nliberalism\ntrumpets\nlauded\ntyrant\nsalsa\ntiled\nlear\npromoters\nreused\nslicing\ntrident\n##chuk\n##gami\n##lka\ncantor\ncheckpoint\n##points\ngaul\nleger\nmammalian\n##tov\n##aar\n##schaft\ndoha\nfrenchman\nnirvana\n##vino\ndelgado\nheadlining\n##eron\n##iography\njug\ntko\n1649\nnaga\nintersections\n##jia\nbenfica\nnawab\n##suka\nashford\ngulp\n##deck\n##vill\n##rug\nbrentford\nfrazier\npleasures\ndunne\npotsdam\nshenzhen\ndentistry\n##tec\nflanagan\n##dorff\n##hear\nchorale\ndinah\nprem\nquezon\n##rogated\nrelinquished\nsutra\nterri\n##pani\nflaps\n##rissa\npoly\n##rnet\nhomme\naback\n##eki\nlinger\nwomb\n##kson\n##lewood\ndoorstep\nort
hodoxy\nthreaded\nwestfield\n##rval\ndioceses\nfridays\nsubsided\n##gata\nloyalists\n##biotic\n##ettes\nletterman\nlunatic\nprelate\ntenderly\ninvariably\nsouza\nthug\nwinslow\n##otide\nfurlongs\ngogh\njeopardy\n##runa\npegasus\n##umble\nhumiliated\nstandalone\ntagged\n##roller\nfreshmen\nklan\n##bright\nattaining\ninitiating\ntransatlantic\nlogged\nviz\n##uance\n1723\ncombatants\nintervening\nstephane\nchieftain\ndespised\ngrazed\n317\ncdc\ngalveston\ngodzilla\nmacro\nsimulate\n##planes\nparades\n##esses\n960\n##ductive\n##unes\nequator\noverdose\n##cans\n##hosh\n##lifting\njoshi\nepstein\nsonora\ntreacherous\naquatics\nmanchu\nresponsive\n##sation\nsupervisory\n##christ\n##llins\n##ibar\n##balance\n##uso\nkimball\nkarlsruhe\nmab\n##emy\nignores\nphonetic\nreuters\nspaghetti\n820\nalmighty\ndanzig\nrumbling\ntombstone\ndesignations\nlured\noutset\n##felt\nsupermarkets\n##wt\ngrupo\nkei\nkraft\nsusanna\n##blood\ncomprehension\ngenealogy\n##aghan\n##verted\nredding\n##ythe\n1722\nbowing\n##pore\n##roi\nlest\nsharpened\nfulbright\nvalkyrie\nsikhs\n##unds\nswans\nbouquet\nmerritt\n##tage\n##venting\ncommuted\nredhead\nclerks\nleasing\ncesare\ndea\nhazy\n##vances\nfledged\ngreenfield\nservicemen\n##gical\narmando\nblackout\ndt\nsagged\ndownloadable\nintra\npotion\npods\n##4th\n##mism\nxp\nattendants\ngambia\nstale\n##ntine\nplump\nasteroids\nrediscovered\nbuds\nflea\nhive\n##neas\n1737\nclassifications\ndebuts\n##eles\nolympus\nscala\n##eurs\n##gno\n##mute\nhummed\nsigismund\nvisuals\nwiggled\nawait\npilasters\nclench\nsulfate\n##ances\nbellevue\nenigma\ntrainee\nsnort\n##sw\nclouded\ndenim\n##rank\n##rder\nchurning\nhartman\nlodges\nriches\nsima\n##missible\naccountable\nsocrates\nregulates\nmueller\n##cr\n1702\navoids\nsolids\nhimalayas\nnutrient\npup\n##jevic\nsquat\nfades\nnec\n##lates\n##pina\n##rona\n##ου\nprivateer\ntequila\n##gative\n##mpton\napt\nhornet\nimmortals\n##dou\nasturias\ncleansing\ndario\n##rries\n##anta\netymology\nservicing\nzhejiang\n##venor\n##nx\nhorned\nerasmus\nrayon\nrelocating\n£10\n##bags\nescalated\npromenade\nstubble\n2010s\nartisans\naxial\nliquids\nmora\nsho\nyoo\n##tsky\nbundles\noldies\n##nally\nnotification\nbastion\n##ths\nsparkle\n##lved\n1728\nleash\npathogen\nhighs\n##hmi\nimmature\n880\ngonzaga\nignatius\nmansions\nmonterrey\nsweets\nbryson\n##loe\npolled\nregatta\nbrightest\npei\nrosy\nsquid\nhatfield\npayroll\naddict\nmeath\ncornerback\nheaviest\nlodging\n##mage\ncapcom\nrippled\n##sily\nbarnet\nmayhem\nymca\nsnuggled\nrousseau\n##cute\nblanchard\n284\nfragmented\nleighton\nchromosomes\nrisking\n##md\n##strel\n##utter\ncorinne\ncoyotes\ncynical\nhiroshi\nyeomanry\n##ractive\nebook\ngrading\nmandela\nplume\nagustin\nmagdalene\n##rkin\nbea\nfemme\ntrafford\n##coll\n##lun\n##tance\n52nd\nfourier\nupton\n##mental\ncamilla\ngust\niihf\nislamabad\nlongevity\n##kala\nfeldman\nnetting\n##rization\nendeavour\nforaging\nmfa\norr\n##open\ngreyish\ncontradiction\ngraz\n##ruff\nhandicapped\nmarlene\ntweed\noaxaca\nspp\ncampos\nmiocene\npri\nconfigured\ncooks\npluto\ncozy\npornographic\n##entes\n70th\nfairness\nglided\njonny\nlynne\nrounding\nsired\n##emon\n##nist\nremade\nuncover\n##mack\ncomplied\nlei\nnewsweek\n##jured\n##parts\n##enting\n##pg\n293\nfiner\nguerrillas\nathenian\ndeng\ndisused\nstepmother\naccuse\ngingerly\nseduction\n521\nconfronting\n##walker\n##going\ngora\nnostalgia\nsabres\nvirginity\nwrenched\n##minated\nsyndication\nwielding\neyre\n##56\n##gnon\n##igny\nbehaved\ntaxpayer\nsweeps\n##growth\nchildless\ngallant\n##ywood\namplified\ngeraldine\n
scrape\n##ffi\nbabylonian\nfresco\n##rdan\n##kney\n##position\n1718\nrestricting\ntack\nfukuoka\nosborn\nselector\npartnering\n##dlow\n318\ngnu\nkia\ntak\nwhitley\ngables\n##54\n##mania\nmri\nsoftness\nimmersion\n##bots\n##evsky\n1713\nchilling\ninsignificant\npcs\n##uis\nelites\nlina\npurported\nsupplemental\nteaming\n##americana\n##dding\n##inton\nproficient\nrouen\n##nage\n##rret\nniccolo\nselects\n##bread\nfluffy\n1621\ngruff\nknotted\nmukherjee\npolgara\nthrash\nnicholls\nsecluded\nsmoothing\nthru\ncorsica\nloaf\nwhitaker\ninquiries\n##rrier\n##kam\nindochina\n289\nmarlins\nmyles\npeking\n##tea\nextracts\npastry\nsuperhuman\nconnacht\nvogel\n##ditional\n##het\n##udged\n##lash\ngloss\nquarries\nrefit\nteaser\n##alic\n##gaon\n20s\nmaterialized\nsling\ncamped\npickering\ntung\ntracker\npursuant\n##cide\ncranes\nsoc\n##cini\n##typical\n##viere\nanhalt\noverboard\nworkout\nchores\nfares\norphaned\nstains\n##logie\nfenton\nsurpassing\njoyah\ntriggers\n##itte\ngrandmaster\n##lass\n##lists\nclapping\nfraudulent\nledger\nnagasaki\n##cor\n##nosis\n##tsa\neucalyptus\ntun\n##icio\n##rney\n##tara\ndax\nheroism\nina\nwrexham\nonboard\nunsigned\n##dates\nmoshe\ngalley\nwinnie\ndroplets\nexiles\npraises\nwatered\nnoodles\n##aia\nfein\nadi\nleland\nmulticultural\nstink\nbingo\ncomets\nerskine\nmodernized\ncanned\nconstraint\ndomestically\nchemotherapy\nfeatherweight\nstifled\n##mum\ndarkly\nirresistible\nrefreshing\nhasty\nisolate\n##oys\nkitchener\nplanners\n##wehr\ncages\nyarn\nimplant\ntoulon\nelects\nchildbirth\nyue\n##lind\n##lone\ncn\nrightful\nsportsman\njunctions\nremodeled\nspecifies\n##rgh\n291\n##oons\ncomplimented\n##urgent\nlister\not\n##logic\nbequeathed\ncheekbones\nfontana\ngabby\n##dial\namadeus\ncorrugated\nmaverick\nresented\ntriangles\n##hered\n##usly\nnazareth\ntyrol\n1675\nassent\npoorer\nsectional\naegean\n##cous\n296\nnylon\nghanaian\n##egorical\n##weig\ncushions\nforbid\nfusiliers\nobstruction\nsomerville\n##scia\ndime\nearrings\nelliptical\nleyte\noder\npolymers\ntimmy\natm\nmidtown\npiloted\nsettles\ncontinual\nexternally\nmayfield\n##uh\nenrichment\nhenson\nkeane\npersians\n1733\nbenji\nbraden\npep\n324\n##efe\ncontenders\npepsi\nvalet\n##isches\n298\n##asse\n##earing\ngoofy\nstroll\n##amen\nauthoritarian\noccurrences\nadversary\nahmedabad\ntangent\ntoppled\ndorchester\n1672\nmodernism\nmarxism\nislamist\ncharlemagne\nexponential\nracks\nunicode\nbrunette\nmbc\npic\nskirmish\n##bund\n##lad\n##powered\n##yst\nhoisted\nmessina\nshatter\n##ctum\njedi\nvantage\n##music\n##neil\nclemens\nmahmoud\ncorrupted\nauthentication\nlowry\nnils\n##washed\nomnibus\nwounding\njillian\n##itors\n##opped\nserialized\nnarcotics\nhandheld\n##arm\n##plicity\nintersecting\nstimulating\n##onis\ncrate\nfellowships\nhemingway\ncasinos\nclimatic\nfordham\ncopeland\ndrip\nbeatty\nleaflets\nrobber\nbrothel\nmadeira\n##hedral\nsphinx\nultrasound\n##vana\nvalor\nforbade\nleonid\nvillas\n##aldo\nduane\nmarquez\n##cytes\ndisadvantaged\nforearms\nkawasaki\nreacts\nconsular\nlax\nuncles\nuphold\n##hopper\nconcepcion\ndorsey\nlass\n##izan\narching\npassageway\n1708\nresearches\ntia\ninternationals\n##graphs\n##opers\ndistinguishes\njavanese\ndivert\n##uven\nplotted\n##listic\n##rwin\n##erik\n##tify\naffirmative\nsignifies\nvalidation\n##bson\nkari\nfelicity\ngeorgina\nzulu\n##eros\n##rained\n##rath\novercoming\n##dot\nargyll\n##rbin\n1734\nchiba\nratification\nwindy\nearls\nparapet\n##marks\nhunan\npristine\nastrid\npunta\n##gart\nbrodie\n##kota\n##oder\nmalaga\nminerva\nrouse\n##phonic\nbellowed\npagoda\nporta
ls\nreclamation\n##gur\n##odies\n##⁄₄\nparentheses\nquoting\nallergic\npalette\nshowcases\nbenefactor\nheartland\nnonlinear\n##tness\nbladed\ncheerfully\nscans\n##ety\n##hone\n1666\ngirlfriends\npedersen\nhiram\nsous\n##liche\n##nator\n1683\n##nery\n##orio\n##umen\nbobo\nprimaries\nsmiley\n##cb\nunearthed\nuniformly\nfis\nmetadata\n1635\nind\n##oted\nrecoil\n##titles\n##tura\n##ια\n406\nhilbert\njamestown\nmcmillan\ntulane\nseychelles\n##frid\nantics\ncoli\nfated\nstucco\n##grants\n1654\nbulky\naccolades\narrays\ncaledonian\ncarnage\noptimism\npuebla\n##tative\n##cave\nenforcing\nrotherham\nseo\ndunlop\naeronautics\nchimed\nincline\nzoning\narchduke\nhellenistic\n##oses\n##sions\ncandi\nthong\n##ople\nmagnate\nrustic\n##rsk\nprojective\nslant\n##offs\ndanes\nhollis\nvocalists\n##ammed\ncongenital\ncontend\ngesellschaft\n##ocating\n##pressive\ndouglass\nquieter\n##cm\n##kshi\nhowled\nsalim\nspontaneously\ntownsville\nbuena\nsouthport\n##bold\nkato\n1638\nfaerie\nstiffly\n##vus\n##rled\n297\nflawless\nrealising\ntaboo\n##7th\nbytes\nstraightening\n356\njena\n##hid\n##rmin\ncartwright\nberber\nbertram\nsoloists\n411\nnoses\n417\ncoping\nfission\nhardin\ninca\n##cen\n1717\nmobilized\nvhf\n##raf\nbiscuits\ncurate\n##85\n##anial\n331\ngaunt\nneighbourhoods\n1540\n##abas\nblanca\nbypassed\nsockets\nbehold\ncoincidentally\n##bane\nnara\nshave\nsplinter\nterrific\n##arion\n##erian\ncommonplace\njuris\nredwood\nwaistband\nboxed\ncaitlin\nfingerprints\njennie\nnaturalized\n##ired\nbalfour\ncraters\njody\nbungalow\nhugely\nquilt\nglitter\npigeons\nundertaker\nbulging\nconstrained\ngoo\n##sil\n##akh\nassimilation\nreworked\n##person\npersuasion\n##pants\nfelicia\n##cliff\n##ulent\n1732\nexplodes\n##dun\n##inium\n##zic\nlyman\nvulture\nhog\noverlook\nbegs\nnorthwards\now\nspoil\n##urer\nfatima\nfavorably\naccumulate\nsargent\nsorority\ncorresponded\ndispersal\nkochi\ntoned\n##imi\n##lita\ninternacional\nnewfound\n##agger\n##lynn\n##rigue\nbooths\npeanuts\n##eborg\nmedicare\nmuriel\nnur\n##uram\ncrates\nmillennia\npajamas\nworsened\n##breakers\njimi\nvanuatu\nyawned\n##udeau\ncarousel\n##hony\nhurdle\n##ccus\n##mounted\n##pod\nrv\n##eche\nairship\nambiguity\ncompulsion\nrecapture\n##claiming\narthritis\n##osomal\n1667\nasserting\nngc\nsniffing\ndade\ndiscontent\nglendale\nported\n##amina\ndefamation\nrammed\n##scent\nfling\nlivingstone\n##fleet\n875\n##ppy\napocalyptic\ncomrade\nlcd\n##lowe\ncessna\neine\npersecuted\nsubsistence\ndemi\nhoop\nreliefs\n710\ncoptic\nprogressing\nstemmed\nperpetrators\n1665\npriestess\n##nio\ndobson\nebony\nrooster\nitf\ntortricidae\n##bbon\n##jian\ncleanup\n##jean\n##øy\n1721\neighties\ntaxonomic\nholiness\n##hearted\n##spar\nantilles\nshowcasing\nstabilized\n##nb\ngia\nmascara\nmichelangelo\ndawned\n##uria\n##vinsky\nextinguished\nfitz\ngrotesque\n£100\n##fera\n##loid\n##mous\nbarges\nneue\nthrobbed\ncipher\njohnnie\n##a1\n##mpt\noutburst\n##swick\nspearheaded\nadministrations\nc1\nheartbreak\npixels\npleasantly\n##enay\nlombardy\nplush\n##nsed\nbobbie\n##hly\nreapers\ntremor\nxiang\nminogue\nsubstantive\nhitch\nbarak\n##wyl\nkwan\n##encia\n910\nobscene\nelegance\nindus\nsurfer\nbribery\nconserve\n##hyllum\n##masters\nhoratio\n##fat\napes\nrebound\npsychotic\n##pour\niteration\n##mium\n##vani\nbotanic\nhorribly\nantiques\ndispose\npaxton\n##hli\n##wg\ntimeless\n1704\ndisregard\nengraver\nhounds\n##bau\n##version\nlooted\nuno\nfacilitates\ngroans\nmasjid\nrutland\nantibody\ndisqualification\ndecatur\nfootballers\nquake\nslacks\n48th\nrein\nscribe\nstabilize\ncommits\nexempla
ry\ntho\n##hort\n##chison\npantry\ntraversed\n##hiti\ndisrepair\nidentifiable\nvibrated\nbaccalaureate\n##nnis\ncsa\ninterviewing\n##iensis\n##raße\ngreaves\nwealthiest\n343\nclassed\njogged\n£5\n##58\n##atal\nilluminating\nknicks\nrespecting\n##uno\nscrubbed\n##iji\n##dles\nkruger\nmoods\ngrowls\nraider\nsilvia\nchefs\nkam\nvr\ncree\npercival\n##terol\ngunter\ncounterattack\ndefiant\nhenan\nze\n##rasia\n##riety\nequivalence\nsubmissions\n##fra\n##thor\nbautista\nmechanically\n##heater\ncornice\nherbal\ntemplar\n##mering\noutputs\nruining\nligand\nrenumbered\nextravagant\nmika\nblockbuster\neta\ninsurrection\n##ilia\ndarkening\nferocious\npianos\nstrife\nkinship\n##aer\nmelee\n##anor\n##iste\n##may\n##oue\ndecidedly\nweep\n##jad\n##missive\n##ppel\n354\npuget\nunease\n##gnant\n1629\nhammering\nkassel\nob\nwessex\n##lga\nbromwich\negan\nparanoia\nutilization\n##atable\n##idad\ncontradictory\nprovoke\n##ols\n##ouring\n##tangled\nknesset\n##very\n##lette\nplumbing\n##sden\n##¹\ngreensboro\noccult\nsniff\n338\nzev\nbeaming\ngamer\nhaggard\nmahal\n##olt\n##pins\nmendes\nutmost\nbriefing\ngunnery\n##gut\n##pher\n##zh\n##rok\n1679\nkhalifa\nsonya\n##boot\nprincipals\nurbana\nwiring\n##liffe\n##minating\n##rrado\ndahl\nnyu\nskepticism\nnp\ntownspeople\nithaca\nlobster\nsomethin\n##fur\n##arina\n##−1\nfreighter\nzimmerman\nbiceps\ncontractual\n##herton\namend\nhurrying\nsubconscious\n##anal\n336\nmeng\nclermont\nspawning\n##eia\n##lub\ndignitaries\nimpetus\nsnacks\nspotting\ntwigs\n##bilis\n##cz\n##ouk\nlibertadores\nnic\nskylar\n##aina\n##firm\ngustave\nasean\n##anum\ndieter\nlegislatures\nflirt\nbromley\ntrolls\numar\n##bbies\n##tyle\nblah\nparc\nbridgeport\ncrank\nnegligence\n##nction\n46th\nconstantin\nmolded\nbandages\nseriousness\n00pm\nsiegel\ncarpets\ncompartments\nupbeat\nstatehood\n##dner\n##edging\nmarko\n730\nplatt\n##hane\npaving\n##iy\n1738\nabbess\nimpatience\nlimousine\nnbl\n##talk\n441\nlucille\nmojo\nnightfall\nrobbers\n##nais\nkarel\nbrisk\ncalves\nreplicate\nascribed\ntelescopes\n##olf\nintimidated\n##reen\nballast\nspecialization\n##sit\naerodynamic\ncaliphate\nrainer\nvisionary\n##arded\nepsilon\n##aday\n##onte\naggregation\nauditory\nboosted\nreunification\nkathmandu\nloco\nrobyn\n402\nacknowledges\nappointing\nhumanoid\nnewell\nredeveloped\nrestraints\n##tained\nbarbarians\nchopper\n1609\nitaliana\n##lez\n##lho\ninvestigates\nwrestlemania\n##anies\n##bib\n690\n##falls\ncreaked\ndragoons\ngravely\nminions\nstupidity\nvolley\n##harat\n##week\nmusik\n##eries\n##uously\nfungal\nmassimo\nsemantics\nmalvern\n##ahl\n##pee\ndiscourage\nembryo\nimperialism\n1910s\nprofoundly\n##ddled\njiangsu\nsparkled\nstat\n##holz\nsweatshirt\ntobin\n##iction\nsneered\n##cheon\n##oit\nbrit\ncausal\nsmyth\n##neuve\ndiffuse\nperrin\nsilvio\n##ipes\n##recht\ndetonated\niqbal\nselma\n##nism\n##zumi\nroasted\n##riders\ntay\n##ados\n##mament\n##mut\n##rud\n840\ncompletes\nnipples\ncfa\nflavour\nhirsch\n##laus\ncalderon\nsneakers\nmoravian\n##ksha\n1622\nrq\n294\n##imeters\nbodo\n##isance\n##pre\n##ronia\nanatomical\nexcerpt\n##lke\ndh\nkunst\n##tablished\n##scoe\nbiomass\npanted\nunharmed\ngael\nhousemates\nmontpellier\n##59\ncoa\nrodents\ntonic\nhickory\nsingleton\n##taro\n451\n1719\naldo\nbreaststroke\ndempsey\noch\nrocco\n##cuit\nmerton\ndissemination\nmidsummer\nserials\n##idi\nhaji\npolynomials\n##rdon\ngs\nenoch\nprematurely\nshutter\ntaunton\n£3\n##grating\n##inates\narchangel\nharassed\n##asco\n326\narchway\ndazzling\n##ecin\n1736\nsumo\nwat\n##kovich\n1086\nhonneur\n##ently\n##nostic\n##ttal\n##i
don\n1605\n403\n1716\nblogger\nrents\n##gnan\nhires\n##ikh\n##dant\nhowie\n##rons\nhandler\nretracted\nshocks\n1632\narun\nduluth\nkepler\ntrumpeter\n##lary\npeeking\nseasoned\ntrooper\n##mara\nlaszlo\n##iciencies\n##rti\nheterosexual\n##inatory\n##ssion\nindira\njogging\n##inga\n##lism\nbeit\ndissatisfaction\nmalice\n##ately\nnedra\npeeling\n##rgeon\n47th\nstadiums\n475\nvertigo\n##ains\niced\nrestroom\n##plify\n##tub\nillustrating\npear\n##chner\n##sibility\ninorganic\nrappers\nreceipts\nwatery\n##kura\nlucinda\n##oulos\nreintroduced\n##8th\n##tched\ngracefully\nsaxons\nnutritional\nwastewater\nrained\nfavourites\nbedrock\nfisted\nhallways\nlikeness\nupscale\n##lateral\n1580\nblinds\nprequel\n##pps\n##tama\ndeter\nhumiliating\nrestraining\ntn\nvents\n1659\nlaundering\nrecess\nrosary\ntractors\ncoulter\nfederer\n##ifiers\n##plin\npersistence\n##quitable\ngeschichte\npendulum\nquakers\n##beam\nbassett\npictorial\nbuffet\nkoln\n##sitor\ndrills\nreciprocal\nshooters\n##57\n##cton\n##tees\nconverge\npip\ndmitri\ndonnelly\nyamamoto\naqua\nazores\ndemographics\nhypnotic\nspitfire\nsuspend\nwryly\nroderick\n##rran\nsebastien\n##asurable\nmavericks\n##fles\n##200\nhimalayan\nprodigy\n##iance\ntransvaal\ndemonstrators\nhandcuffs\ndodged\nmcnamara\nsublime\n1726\ncrazed\n##efined\n##till\nivo\npondered\nreconciled\nshrill\nsava\n##duk\nbal\ncad\nheresy\njaipur\ngoran\n##nished\n341\nlux\nshelly\nwhitehall\n##hre\nisraelis\npeacekeeping\n##wled\n1703\ndemetrius\nousted\n##arians\n##zos\nbeale\nanwar\nbackstroke\nraged\nshrinking\ncremated\n##yck\nbenign\ntowing\nwadi\ndarmstadt\nlandfill\nparana\nsoothe\ncolleen\nsidewalks\nmayfair\ntumble\nhepatitis\nferrer\nsuperstructure\n##gingly\n##urse\n##wee\nanthropological\ntranslators\n##mies\ncloseness\nhooves\n##pw\nmondays\n##roll\n##vita\nlandscaping\n##urized\npurification\nsock\nthorns\nthwarted\njalan\ntiberius\n##taka\nsaline\n##rito\nconfidently\nkhyber\nsculptors\n##ij\nbrahms\nhammersmith\ninspectors\nbattista\nfivb\nfragmentation\nhackney\n##uls\narresting\nexercising\nantoinette\nbedfordshire\n##zily\ndyed\n##hema\n1656\nracetrack\nvariability\n##tique\n1655\naustrians\ndeteriorating\nmadman\ntheorists\naix\nlehman\nweathered\n1731\ndecreed\neruptions\n1729\nflaw\nquinlan\nsorbonne\nflutes\nnunez\n1711\nadored\ndownwards\nfable\nrasped\n1712\nmoritz\nmouthful\nrenegade\nshivers\nstunts\ndysfunction\nrestrain\ntranslit\n327\npancakes\n##avio\n##cision\n##tray\n351\nvial\n##lden\nbain\n##maid\n##oxide\nchihuahua\nmalacca\nvimes\n##rba\n##rnier\n1664\ndonnie\nplaques\n##ually\n337\nbangs\nfloppy\nhuntsville\nloretta\nnikolay\n##otte\neater\nhandgun\nubiquitous\n##hett\neras\nzodiac\n1634\n##omorphic\n1820s\n##zog\ncochran\n##bula\n##lithic\nwarring\n##rada\ndalai\nexcused\nblazers\nmcconnell\nreeling\nbot\neste\n##abi\ngeese\nhoax\ntaxon\n##bla\nguitarists\n##icon\ncondemning\nhunts\ninversion\nmoffat\ntaekwondo\n##lvis\n1624\nstammered\n##rest\n##rzy\nsousa\nfundraiser\nmarylebone\nnavigable\nuptown\ncabbage\ndaniela\nsalman\nshitty\nwhimper\n##kian\n##utive\nprogrammers\nprotections\nrm\n##rmi\n##rued\nforceful\n##enes\nfuss\n##tao\n##wash\nbrat\noppressive\nreykjavik\nspartak\nticking\n##inkles\n##kiewicz\nadolph\nhorst\nmaui\nprotege\nstraighten\ncpc\nlandau\nconcourse\nclements\nresultant\n##ando\nimaginative\njoo\nreactivated\n##rem\n##ffled\n##uising\nconsultative\n##guide\nflop\nkaitlyn\nmergers\nparenting\nsomber\n##vron\nsupervise\nvidhan\n##imum\ncourtship\nexemplified\nharmonies\nmedallist\nrefining\n##rrow\n##ка\namara\n##hum\n780\ngo
alscorer\nsited\novershadowed\nrohan\ndispleasure\nsecretive\nmultiplied\nosman\n##orth\nengravings\npadre\n##kali\n##veda\nminiatures\nmis\n##yala\nclap\npali\nrook\n##cana\n1692\n57th\nantennae\nastro\noskar\n1628\nbulldog\ncrotch\nhackett\nyucatan\n##sure\namplifiers\nbrno\nferrara\nmigrating\n##gree\nthanking\nturing\n##eza\nmccann\nting\nandersson\nonslaught\ngaines\nganga\nincense\nstandardization\n##mation\nsentai\nscuba\nstuffing\nturquoise\nwaivers\nalloys\n##vitt\nregaining\nvaults\n##clops\n##gizing\ndigger\nfurry\nmemorabilia\nprobing\n##iad\npayton\nrec\ndeutschland\nfilippo\nopaque\nseamen\nzenith\nafrikaans\n##filtration\ndisciplined\ninspirational\n##merie\nbanco\nconfuse\ngrafton\ntod\n##dgets\nchampioned\nsimi\nanomaly\nbiplane\n##ceptive\nelectrode\n##para\n1697\ncleavage\ncrossbow\nswirl\ninformant\n##lars\n##osta\nafi\nbonfire\nspec\n##oux\nlakeside\nslump\n##culus\n##lais\n##qvist\n##rrigan\n1016\nfacades\nborg\ninwardly\ncervical\nxl\npointedly\n050\nstabilization\n##odon\nchests\n1699\nhacked\nctv\northogonal\nsuzy\n##lastic\ngaulle\njacobite\nrearview\n##cam\n##erted\nashby\n##drik\n##igate\n##mise\n##zbek\naffectionately\ncanine\ndisperse\nlatham\n##istles\n##ivar\nspielberg\n##orin\n##idium\nezekiel\ncid\n##sg\ndurga\nmiddletown\n##cina\ncustomized\nfrontiers\nharden\n##etano\n##zzy\n1604\nbolsheviks\n##66\ncoloration\nyoko\n##bedo\nbriefs\nslabs\ndebra\nliquidation\nplumage\n##oin\nblossoms\ndementia\nsubsidy\n1611\nproctor\nrelational\njerseys\nparochial\nter\n##ici\nesa\npeshawar\ncavalier\nloren\ncpi\nidiots\nshamrock\n1646\ndutton\nmalabar\nmustache\n##endez\n##ocytes\nreferencing\nterminates\nmarche\nyarmouth\n##sop\nacton\nmated\nseton\nsubtly\nbaptised\nbeige\nextremes\njolted\nkristina\ntelecast\n##actic\nsafeguard\nwaldo\n##baldi\n##bular\nendeavors\nsloppy\nsubterranean\n##ensburg\n##itung\ndelicately\npigment\ntq\n##scu\n1626\n##ound\ncollisions\ncoveted\nherds\n##personal\n##meister\n##nberger\nchopra\n##ricting\nabnormalities\ndefective\ngalician\nlucie\n##dilly\nalligator\nlikened\n##genase\nburundi\nclears\ncomplexion\nderelict\ndeafening\ndiablo\nfingered\nchampaign\ndogg\nenlist\nisotope\nlabeling\nmrna\n##erre\nbrilliance\nmarvelous\n##ayo\n1652\ncrawley\nether\nfooted\ndwellers\ndeserts\nhamish\nrubs\nwarlock\nskimmed\n##lizer\n870\nbuick\nembark\nheraldic\nirregularities\n##ajan\nkiara\n##kulam\n##ieg\nantigen\nkowalski\n##lge\noakley\nvisitation\n##mbit\nvt\n##suit\n1570\nmurderers\n##miento\n##rites\nchimneys\n##sling\ncondemn\ncuster\nexchequer\nhavre\n##ghi\nfluctuations\n##rations\ndfb\nhendricks\nvaccines\n##tarian\nnietzsche\nbiking\njuicy\n##duced\nbrooding\nscrolling\nselangor\n##ragan\n352\nannum\nboomed\nseminole\nsugarcane\n##dna\ndepartmental\ndismissing\ninnsbruck\narteries\nashok\nbatavia\ndaze\nkun\novertook\n##rga\n##tlan\nbeheaded\ngaddafi\nholm\nelectronically\nfaulty\ngalilee\nfractures\nkobayashi\n##lized\ngunmen\nmagma\naramaic\nmala\neastenders\ninference\nmessengers\nbf\n##qu\n407\nbathrooms\n##vere\n1658\nflashbacks\nideally\nmisunderstood\n##jali\n##weather\nmendez\n##grounds\n505\nuncanny\n##iii\n1709\nfriendships\n##nbc\nsacrament\naccommodated\nreiterated\nlogistical\npebbles\nthumped\n##escence\nadministering\ndecrees\ndrafts\n##flight\n##cased\n##tula\nfuturistic\npicket\nintimidation\nwinthrop\n##fahan\ninterfered\n339\nafar\nfrancoise\nmorally\nuta\ncochin\ncroft\ndwarfs\n##bruck\n##dents\n##nami\nbiker\n##hner\n##meral\nnano\n##isen\n##ometric\n##pres\n##ан\nbrightened\nmeek\nparcels\nsecurely\ngunners\n##jhl\
n##zko\nagile\nhysteria\n##lten\n##rcus\nbukit\nchamps\nchevy\ncuckoo\nleith\nsadler\ntheologians\nwelded\n##section\n1663\njj\nplurality\nxander\n##rooms\n##formed\nshredded\ntemps\nintimately\npau\ntormented\n##lok\n##stellar\n1618\ncharred\nems\nessen\n##mmel\nalarms\nspraying\nascot\nblooms\ntwinkle\n##abia\n##apes\ninternment\nobsidian\n##chaft\nsnoop\n##dav\n##ooping\nmalibu\n##tension\nquiver\n##itia\nhays\nmcintosh\ntravers\nwalsall\n##ffie\n1623\nbeverley\nschwarz\nplunging\nstructurally\nm3\nrosenthal\nvikram\n##tsk\n770\nghz\n##onda\n##tiv\nchalmers\ngroningen\npew\nreckon\nunicef\n##rvis\n55th\n##gni\n1651\nsulawesi\navila\ncai\nmetaphysical\nscrewing\nturbulence\n##mberg\naugusto\nsamba\n56th\nbaffled\nmomentary\ntoxin\n##urian\n##wani\naachen\ncondoms\ndali\nsteppe\n##3d\n##app\n##oed\n##year\nadolescence\ndauphin\nelectrically\ninaccessible\nmicroscopy\nnikita\n##ega\natv\n##cel\n##enter\n##oles\n##oteric\n##ы\naccountants\npunishments\nwrongly\nbribes\nadventurous\nclinch\nflinders\nsouthland\n##hem\n##kata\ngough\n##ciency\nlads\nsoared\n##ה\nundergoes\ndeformation\noutlawed\nrubbish\n##arus\n##mussen\n##nidae\n##rzburg\narcs\n##ingdon\n##tituted\n1695\nwheelbase\nwheeling\nbombardier\ncampground\nzebra\n##lices\n##oj\n##bain\nlullaby\n##ecure\ndonetsk\nwylie\ngrenada\n##arding\n##ης\nsquinting\neireann\nopposes\n##andra\nmaximal\nrunes\n##broken\n##cuting\n##iface\n##ror\n##rosis\nadditive\nbritney\nadultery\ntriggering\n##drome\ndetrimental\naarhus\ncontainment\njc\nswapped\nvichy\n##ioms\nmadly\n##oric\n##rag\nbrant\n##ckey\n##trix\n1560\n1612\nbroughton\nrustling\n##stems\n##uder\nasbestos\nmentoring\n##nivorous\nfinley\nleaps\n##isan\napical\npry\nslits\nsubstitutes\n##dict\nintuitive\nfantasia\ninsistent\nunreasonable\n##igen\n##vna\ndomed\nhannover\nmargot\nponder\n##zziness\nimpromptu\njian\nlc\nrampage\nstemming\n##eft\nandrey\ngerais\nwhichever\namnesia\nappropriated\nanzac\nclicks\nmodifying\nultimatum\ncambrian\nmaids\nverve\nyellowstone\n##mbs\nconservatoire\n##scribe\nadherence\ndinners\nspectra\nimperfect\nmysteriously\nsidekick\ntatar\ntuba\n##aks\n##ifolia\ndistrust\n##athan\n##zle\nc2\nronin\nzac\n##pse\ncelaena\ninstrumentalist\nscents\nskopje\n##mbling\ncomical\ncompensated\nvidal\ncondor\nintersect\njingle\nwavelengths\n##urrent\nmcqueen\n##izzly\ncarp\nweasel\n422\nkanye\nmilitias\npostdoctoral\neugen\ngunslinger\n##ɛ\nfaux\nhospice\n##for\nappalled\nderivation\ndwarves\n##elis\ndilapidated\n##folk\nastoria\nphilology\n##lwyn\n##otho\n##saka\ninducing\nphilanthropy\n##bf\n##itative\ngeek\nmarkedly\nsql\n##yce\nbessie\nindices\nrn\n##flict\n495\nfrowns\nresolving\nweightlifting\ntugs\ncleric\ncontentious\n1653\nmania\nrms\n##miya\n##reate\n##ruck\n##tucket\nbien\neels\nmarek\n##ayton\n##cence\ndiscreet\nunofficially\n##ife\nleaks\n##bber\n1705\n332\ndung\ncompressor\nhillsborough\npandit\nshillings\ndistal\n##skin\n381\n##tat\n##you\nnosed\n##nir\nmangrove\nundeveloped\n##idia\ntextures\n##inho\n##500\n##rise\nae\nirritating\nnay\namazingly\nbancroft\napologetic\ncompassionate\nkata\nsymphonies\n##lovic\nairspace\n##lch\n930\ngifford\nprecautions\nfulfillment\nsevilla\nvulgar\nmartinique\n##urities\nlooting\npiccolo\ntidy\n##dermott\nquadrant\narmchair\nincomes\nmathematicians\nstampede\nnilsson\n##inking\n##scan\nfoo\nquarterfinal\n##ostal\nshang\nshouldered\nsquirrels\n##owe\n344\nvinegar\n##bner\n##rchy\n##systems\ndelaying\n##trics\nars\ndwyer\nrhapsody\nsponsoring\n##gration\nbipolar\ncinder\nstarters\n##olio\n##urst\n421\nsignage\n##nty\naground\n
figurative\nmons\nacquaintances\nduets\nerroneously\nsoyuz\nelliptic\nrecreated\n##cultural\n##quette\n##ssed\n##tma\n##zcz\nmoderator\nscares\n##itaire\n##stones\n##udence\njuniper\nsighting\n##just\n##nsen\nbritten\ncalabria\nry\nbop\ncramer\nforsyth\nstillness\n##л\nairmen\ngathers\nunfit\n##umber\n##upt\ntaunting\n##rip\nseeker\nstreamlined\n##bution\nholster\nschumann\ntread\nvox\n##gano\n##onzo\nstrive\ndil\nreforming\ncovent\nnewbury\npredicting\n##orro\ndecorate\ntre\n##puted\nandover\nie\nasahi\ndept\ndunkirk\ngills\n##tori\nburen\nhuskies\n##stis\n##stov\nabstracts\nbets\nloosen\n##opa\n1682\nyearning\n##glio\n##sir\nberman\neffortlessly\nenamel\nnapoli\npersist\n##peration\n##uez\nattache\nelisa\nb1\ninvitations\n##kic\naccelerating\nreindeer\nboardwalk\nclutches\nnelly\npolka\nstarbucks\n##kei\nadamant\nhuey\nlough\nunbroken\nadventurer\nembroidery\ninspecting\nstanza\n##ducted\nnaia\ntaluka\n##pone\n##roids\nchases\ndeprivation\nflorian\n##jing\n##ppet\nearthly\n##lib\n##ssee\ncolossal\nforeigner\nvet\nfreaks\npatrice\nrosewood\ntriassic\nupstate\n##pkins\ndominates\nata\nchants\nks\nvo\n##400\n##bley\n##raya\n##rmed\n555\nagra\ninfiltrate\n##ailing\n##ilation\n##tzer\n##uppe\n##werk\nbinoculars\nenthusiast\nfujian\nsqueak\n##avs\nabolitionist\nalmeida\nboredom\nhampstead\nmarsden\nrations\n##ands\ninflated\n334\nbonuses\nrosalie\npatna\n##rco\n329\ndetachments\npenitentiary\n54th\nflourishing\nwoolf\n##dion\n##etched\npapyrus\n##lster\n##nsor\n##toy\nbobbed\ndismounted\nendelle\ninhuman\nmotorola\ntbs\nwince\nwreath\n##ticus\nhideout\ninspections\nsanjay\ndisgrace\ninfused\npudding\nstalks\n##urbed\narsenic\nleases\n##hyl\n##rrard\ncollarbone\n##waite\n##wil\ndowry\n##bant\n##edance\ngenealogical\nnitrate\nsalamanca\nscandals\nthyroid\nnecessitated\n##!\n##\"\n###\n##$\n##%\n##&\n##'\n##(\n##)\n##*\n##+\n##,\n##-\n##.\n##/\n##:\n##;\n##<\n##=\n##>\n##?\n##@\n##[\n##\\\n##]\n##^\n##_\n##`\n##{\n##|\n##}\n##~\n##¡\n##¢\n##£\n##¤\n##¥\n##¦\n##§\n##¨\n##©\n##ª\n##«\n##¬\n##®\n##±\n##´\n##µ\n##¶\n##·\n##º\n##»\n##¼\n##¾\n##¿\n##æ\n##ð\n##÷\n##þ\n##đ\n##ħ\n##ŋ\n##œ\n##ƒ\n##ɐ\n##ɑ\n##ɒ\n##ɔ\n##ɕ\n##ə\n##ɡ\n##ɣ\n##ɨ\n##ɪ\n##ɫ\n##ɬ\n##ɯ\n##ɲ\n##ɴ\n##ɹ\n##ɾ\n##ʀ\n##ʁ\n##ʂ\n##ʃ\n##ʉ\n##ʊ\n##ʋ\n##ʌ\n##ʎ\n##ʐ\n##ʑ\n##ʒ\n##ʔ\n##ʰ\n##ʲ\n##ʳ\n##ʷ\n##ʸ\n##ʻ\n##ʼ\n##ʾ\n##ʿ\n##ˈ\n##ˡ\n##ˢ\n##ˣ\n##ˤ\n##β\n##γ\n##δ\n##ε\n##ζ\n##θ\n##κ\n##λ\n##μ\n##ξ\n##ο\n##π\n##ρ\n##σ\n##τ\n##υ\n##φ\n##χ\n##ψ\n##ω\n##б\n##г\n##д\n##ж\n##з\n##м\n##п\n##с\n##у\n##ф\n##х\n##ц\n##ч\n##ш\n##щ\n##ъ\n##э\n##ю\n##ђ\n##є\n##і\n##ј\n##љ\n##њ\n##ћ\n##ӏ\n##ա\n##բ\n##գ\n##դ\n##ե\n##թ\n##ի\n##լ\n##կ\n##հ\n##մ\n##յ\n##ն\n##ո\n##պ\n##ս\n##վ\n##տ\n##ր\n##ւ\n##ք\n##־\n##א\n##ב\n##ג\n##ד\n##ו\n##ז\n##ח\n##ט\n##י\n##ך\n##כ\n##ל\n##ם\n##מ\n##ן\n##נ\n##ס\n##ע\n##ף\n##פ\n##ץ\n##צ\n##ק\n##ר\n##ש\n##ת\n##،\n##ء\n##ب\n##ت\n##ث\n##ج\n##ح\n##خ\n##ذ\n##ز\n##س\n##ش\n##ص\n##ض\n##ط\n##ظ\n##ع\n##غ\n##ـ\n##ف\n##ق\n##ك\n##و\n##ى\n##ٹ\n##پ\n##چ\n##ک\n##گ\n##ں\n##ھ\n##ہ\n##ے\n##अ\n##आ\n##उ\n##ए\n##क\n##ख\n##ग\n##च\n##ज\n##ट\n##ड\n##ण\n##त\n##थ\n##द\n##ध\n##न\n##प\n##ब\n##भ\n##म\n##य\n##र\n##ल\n##व\n##श\n##ष\n##स\n##ह\n##ा\n##ि\n##ी\n##ो\n##।\n##॥\n##ং\n##অ\n##আ\n##ই\n##উ\n##এ\n##ও\n##ক\n##খ\n##গ\n##চ\n##ছ\n##জ\n##ট\n##ড\n##ণ\n##ত\n##থ\n##দ\n##ধ\n##ন\n##প\n##ব\n##ভ\n##ম\n##য\n##র\n##ল\n##শ\n##ষ\n##স\n##হ\n##া\n##ি\n##ী\n##ে\n##க\n##ச\n##ட\n##த\n##ந\n##ன\n##ப\n##ம\n##ய\n##ர\n##ல\n##ள\n##வ\n##ா\n##ி\n##ு\n##ே\n##ை\n##ನ\n##ರ\n##ಾ\n##ක\n##ය\n##ර\n##ල\n##ව\n##ා\n##ก\n##ง\n##ต\n##ท\n##น\n##พ\n##ม\n##ย\n##ร\n##ล\n##ว\n##ส\n##อ\n##า\n##เ\n##་\n##།\n##ག\n##ང\
n##ད\n##ན\n##པ\n##བ\n##མ\n##འ\n##ར\n##ལ\n##ས\n##မ\n##ა\n##ბ\n##გ\n##დ\n##ე\n##ვ\n##თ\n##ი\n##კ\n##ლ\n##მ\n##ნ\n##ო\n##რ\n##ს\n##ტ\n##უ\n##ᄀ\n##ᄂ\n##ᄃ\n##ᄅ\n##ᄆ\n##ᄇ\n##ᄉ\n##ᄊ\n##ᄋ\n##ᄌ\n##ᄎ\n##ᄏ\n##ᄐ\n##ᄑ\n##ᄒ\n##ᅡ\n##ᅢ\n##ᅥ\n##ᅦ\n##ᅧ\n##ᅩ\n##ᅪ\n##ᅭ\n##ᅮ\n##ᅯ\n##ᅲ\n##ᅳ\n##ᅴ\n##ᅵ\n##ᆨ\n##ᆫ\n##ᆯ\n##ᆷ\n##ᆸ\n##ᆼ\n##ᴬ\n##ᴮ\n##ᴰ\n##ᴵ\n##ᴺ\n##ᵀ\n##ᵃ\n##ᵇ\n##ᵈ\n##ᵉ\n##ᵍ\n##ᵏ\n##ᵐ\n##ᵒ\n##ᵖ\n##ᵗ\n##ᵘ\n##ᵣ\n##ᵤ\n##ᵥ\n##ᶜ\n##ᶠ\n##‐\n##‑\n##‒\n##–\n##—\n##―\n##‖\n##‘\n##’\n##‚\n##“\n##”\n##„\n##†\n##‡\n##•\n##…\n##‰\n##′\n##″\n##›\n##‿\n##⁄\n##⁰\n##ⁱ\n##⁴\n##⁵\n##⁶\n##⁷\n##⁸\n##⁹\n##⁻\n##ⁿ\n##₅\n##₆\n##₇\n##₈\n##₉\n##₊\n##₍\n##₎\n##ₐ\n##ₑ\n##ₒ\n##ₓ\n##ₕ\n##ₖ\n##ₗ\n##ₘ\n##ₚ\n##ₛ\n##ₜ\n##₤\n##₩\n##€\n##₱\n##₹\n##ℓ\n##№\n##ℝ\n##™\n##⅓\n##⅔\n##←\n##↑\n##→\n##↓\n##↔\n##↦\n##⇄\n##⇌\n##⇒\n##∂\n##∅\n##∆\n##∇\n##∈\n##∗\n##∘\n##√\n##∞\n##∧\n##∨\n##∩\n##∪\n##≈\n##≡\n##≤\n##≥\n##⊂\n##⊆\n##⊕\n##⊗\n##⋅\n##─\n##│\n##■\n##▪\n##●\n##★\n##☆\n##☉\n##♠\n##♣\n##♥\n##♦\n##♯\n##⟨\n##⟩\n##ⱼ\n##⺩\n##⺼\n##⽥\n##、\n##。\n##〈\n##〉\n##《\n##》\n##「\n##」\n##『\n##』\n##〜\n##あ\n##い\n##う\n##え\n##お\n##か\n##き\n##く\n##け\n##こ\n##さ\n##し\n##す\n##せ\n##そ\n##た\n##ち\n##っ\n##つ\n##て\n##と\n##な\n##に\n##ぬ\n##ね\n##の\n##は\n##ひ\n##ふ\n##へ\n##ほ\n##ま\n##み\n##む\n##め\n##も\n##や\n##ゆ\n##よ\n##ら\n##り\n##る\n##れ\n##ろ\n##を\n##ん\n##ァ\n##ア\n##ィ\n##イ\n##ウ\n##ェ\n##エ\n##オ\n##カ\n##キ\n##ク\n##ケ\n##コ\n##サ\n##シ\n##ス\n##セ\n##タ\n##チ\n##ッ\n##ツ\n##テ\n##ト\n##ナ\n##ニ\n##ノ\n##ハ\n##ヒ\n##フ\n##ヘ\n##ホ\n##マ\n##ミ\n##ム\n##メ\n##モ\n##ャ\n##ュ\n##ョ\n##ラ\n##リ\n##ル\n##レ\n##ロ\n##ワ\n##ン\n##・\n##ー\n##一\n##三\n##上\n##下\n##不\n##世\n##中\n##主\n##久\n##之\n##也\n##事\n##二\n##五\n##井\n##京\n##人\n##亻\n##仁\n##介\n##代\n##仮\n##伊\n##会\n##佐\n##侍\n##保\n##信\n##健\n##元\n##光\n##八\n##公\n##内\n##出\n##分\n##前\n##劉\n##力\n##加\n##勝\n##北\n##区\n##十\n##千\n##南\n##博\n##原\n##口\n##古\n##史\n##司\n##合\n##吉\n##同\n##名\n##和\n##囗\n##四\n##国\n##國\n##土\n##地\n##坂\n##城\n##堂\n##場\n##士\n##夏\n##外\n##大\n##天\n##太\n##夫\n##奈\n##女\n##子\n##学\n##宀\n##宇\n##安\n##宗\n##定\n##宣\n##宮\n##家\n##宿\n##寺\n##將\n##小\n##尚\n##山\n##岡\n##島\n##崎\n##川\n##州\n##巿\n##帝\n##平\n##年\n##幸\n##广\n##弘\n##張\n##彳\n##後\n##御\n##德\n##心\n##忄\n##志\n##忠\n##愛\n##成\n##我\n##戦\n##戸\n##手\n##扌\n##政\n##文\n##新\n##方\n##日\n##明\n##星\n##春\n##昭\n##智\n##曲\n##書\n##月\n##有\n##朝\n##木\n##本\n##李\n##村\n##東\n##松\n##林\n##森\n##楊\n##樹\n##橋\n##歌\n##止\n##正\n##武\n##比\n##氏\n##民\n##水\n##氵\n##氷\n##永\n##江\n##沢\n##河\n##治\n##法\n##海\n##清\n##漢\n##瀬\n##火\n##版\n##犬\n##王\n##生\n##田\n##男\n##疒\n##発\n##白\n##的\n##皇\n##目\n##相\n##省\n##真\n##石\n##示\n##社\n##神\n##福\n##禾\n##秀\n##秋\n##空\n##立\n##章\n##竹\n##糹\n##美\n##義\n##耳\n##良\n##艹\n##花\n##英\n##華\n##葉\n##藤\n##行\n##街\n##西\n##見\n##訁\n##語\n##谷\n##貝\n##貴\n##車\n##軍\n##辶\n##道\n##郎\n##郡\n##部\n##都\n##里\n##野\n##金\n##鈴\n##镇\n##長\n##門\n##間\n##阝\n##阿\n##陳\n##陽\n##雄\n##青\n##面\n##風\n##食\n##香\n##馬\n##高\n##龍\n##龸\n##ﬁ\n##ﬂ\n##！\n##（\n##）\n##，\n##－\n##．\n##／\n##：\n##？\n##～\n"
  },
  {
    "path": "src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e91cf83b\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Running Huggingface DistilBERT with TensorFlow-Neuron\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"71394e1e\",\n   \"metadata\": {},\n   \"source\": [\n    \"In this tutorial you will compile and deploy DistilBERT version of HuggingFace 🤗 Transformers BERT for Inferentia using TensorFlow-Neuron. The full list of HuggingFace's pretrained BERT models can be found in the BERT section on this page https://huggingface.co/transformers/pretrained_models.html. you can also read about HuggingFace's pipeline feature here: https://huggingface.co/transformers/main_classes/pipelines.html\\n\",\n    \"\\n\",\n    \"This Jupyter notebook should be run on an instance which is inf1.6xlarge or larger, but in real life scenario the compilation should be done on a compute instance and the deployment on inf1 instance to save costs.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"828ef9bd\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Setup\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5becc549\",\n   \"metadata\": {},\n   \"source\": [\n    \"To run this tutorial please follow the instructions for [TensorFlow-Neuron Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/tensorflow-neuron.html#setup-tensorflow-neuron) and the [Jupyter Notebook Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html) and set your kernel to \\\"Python (tensorflow-neuron)\\\" .\\n\",\n    \"\\n\",\n    \"Next, install some additional dependencies.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ee1a3b84\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install transformers==4.30.2\\n\",\n    \"!pip install ipywidgets\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"c301cfce\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Download From Huggingface and Compile for AWS-Neuron\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"92e8050d\",\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow_neuron as tfn\\n\",\n    \"from transformers import DistilBertTokenizer, TFDistilBertModel\\n\",\n    \"\\n\",\n    \"# Create a wrapper for the roberta model that will accept inputs as a list\\n\",\n    \"# instead of a dictionary. 
This will allow the compiled model to be saved\\n\",\n    \"# to disk with the model.save() fucntion.\\n\",\n    \"class DistilBertWrapper(tf.keras.Model):\\n\",\n    \"    def __init__(self, model):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.model = model\\n\",\n    \"    def __call__(self, example_inputs):\\n\",\n    \"        return self.model({'input_ids' : example_inputs[0], 'attention_mask' : example_inputs[1]})\\n\",\n    \"        \\n\",\n    \"\\n\",\n    \"tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')\\n\",\n    \"model = DistilBertWrapper(TFDistilBertModel.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english'))\\n\",\n    \"\\n\",\n    \"batch_size = 16\\n\",\n    \"\\n\",\n    \"# create example inputs with a batch size of 16\\n\",\n    \"text = [\\\"Paris is the <mask> of France.\\\"] * batch_size\\n\",\n    \"encoded_input = tokenizer(text, return_tensors='tf', padding='max_length', max_length=64)\\n\",\n    \"\\n\",\n    \"# turn inputs into a list\\n\",\n    \"example_input = [encoded_input['input_ids'], encoded_input['attention_mask']]\\n\",\n    \"\\n\",\n    \"#compile\\n\",\n    \"model_neuron = tfn.trace(model, example_input)\\n\",\n    \"\\n\",\n    \"print(\\\"Running on neuron:\\\", model_neuron(example_input))\\n\",\n    \"\\n\",\n    \"# save the model to disk to save recompilation time for next usage\\n\",\n    \"model_neuron.save('./distilbert-neuron-b16')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0f2e159a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run Basic Inference Benchmarking\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ccf22e74\",\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"import concurrent.futures\\n\",\n    \"import time\\n\",\n    \"\\n\",\n    \"reloaded_neuron_model = tf.keras.models.load_model('./distilbert-neuron-b16')\\n\",\n    \"print(\\\"Reloaded model running on neuron:\\\", reloaded_neuron_model(example_input))\\n\",\n    \"\\n\",\n    \"num_threads = 4\\n\",\n    \"num_inferences = 1000\\n\",\n    \"\\n\",\n    \"latency_list = []\\n\",\n    \"def inference_with_latency_calculation(example_input):\\n\",\n    \"    global latency_list\\n\",\n    \"    start = time.time()\\n\",\n    \"    result = reloaded_neuron_model(example_input)\\n\",\n    \"    end = time.time()\\n\",\n    \"    latency_list.append((end-start) * 1000)\\n\",\n    \"    return result\\n\",\n    \"\\n\",\n    \"start = time.time()\\n\",\n    \"with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:\\n\",\n    \"    futures = []\\n\",\n    \"    for i in range(num_inferences):\\n\",\n    \"        futures.append(executor.submit(inference_with_latency_calculation, example_input))\\n\",\n    \"    for future in concurrent.futures.as_completed(futures):\\n\",\n    \"        get_result = future.result()\\n\",\n    \"end = time.time()\\n\",\n    \"\\n\",\n    \"total_time = end - start\\n\",\n    \"throughput = (num_inferences * batch_size)/total_time\\n\",\n    \"\\n\",\n    \"print(f\\\"Throughput was {throughput} samples per second.\\\")\\n\",\n    \"print(f\\\"Latency p50 was {np.percentile(latency_list, 50)} ms\\\")\\n\",\n    \"print(f\\\"Latency p90 was {np.percentile(latency_list, 90)} ms\\\")\\n\",\n    \"print(f\\\"Latency p95 was {np.percentile(latency_list, 95)} ms\\\")\\n\",\n    
\"print(f\\\"Latency p99 was {np.percentile(latency_list, 99)} ms\\\")\\n\",\n    \"assert(throughput >= 1930.0)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"b31b82fc\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.0\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/tensorflow/k8s_bert_demo/Dockerfile.tfserving_example",
    "content": "From ubuntu:16.04\nRUN apt-get update\nRUN apt-get install -y wget apt-transport-https ca-certificates awscli\nRUN echo \"deb https://apt.repos.neuron.amazonaws.com xenial main\" > /etc/apt/sources.list.d/neuron.list\nRUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -\n\nRUN apt-get update\nRUN apt-get install -y tensorflow-model-server-neuron"
  },
  {
    "path": "src/examples/tensorflow/k8s_bert_demo/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/tensorflow/k8s_bert_demo/bert_client.py",
    "content": "import numpy as np\nimport grpc\nimport tensorflow as tf\nfrom tensorflow_serving.apis import predict_pb2\nfrom tensorflow_serving.apis import prediction_service_pb2_grpc\nimport time\n\nif __name__ == '__main__':\n    channel = grpc.insecure_channel('localhost:9000')\n    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n    request = predict_pb2.PredictRequest()\n    request.model_spec.name = 'bert_mrpc_hc_gelus_b4_l24_0926_02'\n    i = np.zeros([1, 128], dtype=np.int32)\n    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))\n    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))\n    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))\n\n    latencies = []\n    for i in range(100):\n        start = time.time()\n        result = stub.Predict(request)\n        latencies.append(time.time() - start)\n        print(\"Inference successful: {}\".format(i))\n\n    print (\"Ran {} inferences successfully. Latency average = {}\".format(len(latencies), np.average(latencies)))\n"
  },
  {
    "path": "src/examples/tensorflow/k8s_bert_demo/bert_service.yml",
    "content": "---\nkind: Service\napiVersion: v1\nmetadata:\n  name: inf-k8s-test\n  labels:\n    app: inf-k8s-test\nspec:\n  ports:\n    - name: http-tf-serving\n      port: 8500\n      targetPort: 8500\n    - name: grpc-tf-serving\n      port: 9000\n      targetPort: 9000\n  selector:\n    app: inf-k8s-test\n    role: master\n  type: ClusterIP\n---\nkind: Deployment\napiVersion: apps/v1\nmetadata:\n  name: inf-k8s-test\n  labels:\n    app: inf-k8s-test\n    role: master\nspec:\n  replicas: 1 # Number of desired replicas. Increase to desired number.\n  selector:\n    matchLabels:\n      app: inf-k8s-test\n      role: master\n  template:\n    metadata:\n      labels:\n        app: inf-k8s-test\n        role: master\n    spec:\n      volumes:\n        - name: sock\n          emptyDir: {}\n      containers:\n        - name: inf-k8s-test\n          image: tf-serving-ctr\n          imagePullPolicy: IfNotPresent\n          command: [\"/bin/sh\",\"-c\"]\n\n          # Pull model from s3, then start tensorflow_model_server_neuron with the model.\n          args:\n            - \"aws s3 sync s3://<your-bert-bucket>/bert /tmp/bert && \\\n           tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp//bert/\"\n\n          # Open grpc and rest API ports\n          ports:\n            - containerPort: 8500\n            - containerPort: 9000\n\n          # Informs tensorflow_model_server_neuron of UDS socket location\n          env:\n            - name: NEURON_RTD_ADDRESS\n              value: unix:/sock/neuron.sock\n\n          # Arbitrary resource requirements\n          resources:\n            limits:\n              cpu: 4\n              memory: 4Gi\n            requests:\n              cpu: \"1\"\n              memory: 1Gi\n\n          # Shared volume mount, for UDS socket\n          volumeMounts:\n            - name: sock\n              mountPath: /sock\n\n        # Neuron-rtd container\n        - name: neuron-rtd\n          image: 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:latest # neuron-rtd image.\n          imagePullPolicy: IfNotPresent\n\n          # Neuron-rtd required capabilities\n          securityContext:\n            capabilities:\n              add:\n                - SYS_ADMIN\n                - IPC_LOCK\n\n          # Shared volume mount, for UDS socket\n          volumeMounts:\n            - name: sock\n              mountPath: /sock\n\n          resources:\n            limits:\n              hugepages-2Mi: 256Mi    # configure to 256 * desired number of Inferentia devices.\n              aws.amazon.com/neuron: 1  # desired number of Inferentia devices.\n            requests:\n              memory: 1024Mi          # Desired amount of memory. Should be larger than hugepages-2Mi limit.\n"
  },
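The Service and Deployment above expose two TensorFlow Serving ports: gRPC on 9000 (used by bert_client.py) and the REST API on 8500. As a complement, here is a minimal REST-client sketch that is not part of the original demo; it assumes port 8500 has been made reachable locally (for example with `kubectl port-forward svc/inf-k8s-test 8500:8500`) and that the model is served under the same name as in bert_service.yml.

```python
# Hedged sketch of a REST client for the deployment above (assumes local
# port-forwarding to the Service's port 8500 and the model name from
# bert_service.yml). Uses TensorFlow Serving's standard predict endpoint.
import requests

MODEL_NAME = "bert_mrpc_hc_gelus_b4_l24_0926_02"
URL = f"http://localhost:8500/v1/models/{MODEL_NAME}:predict"

# The REST API takes a JSON body with an "instances" list; the per-instance
# keys mirror the tensors fed by the gRPC client (bert_client.py).
zeros = [0] * 128
payload = {"instances": [{"input_ids": zeros, "input_mask": zeros, "segment_ids": zeros}]}

response = requests.post(URL, json=payload)
response.raise_for_status()
print(response.json())
```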
  {
    "path": "src/examples/tensorflow/keras_resnet50/LICENSE",
    "content": "Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n  \nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify,\nmerge, publish, distribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A\nPARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\nOF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/fp32tofp16.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport re\nimport argparse\nimport tensorflow as tf\nimport numpy as np\n\nfrom google.protobuf import text_format\nfrom tensorflow.core.framework import graph_pb2\nfrom tensorflow.core.framework import node_def_pb2\nfrom tensorflow.python.platform import gfile\n\nfrom tensorflow.core.framework import attr_value_pb2\nfrom tensorflow.python.framework import tensor_util\n\ndef ConvertFP32ToOther(graphdef):\n  \"\"\"Converts an FP32 network by casting all constants (weights) to a lower\n     precision floating point type (FP16) and updating the dtypes\n     everywhere.\"\"\"\n  cast_type = \"float16\"\n  sess = tf.Session(graph=tf.import_graph_def(graphdef))\n  output_graph_def = graph_pb2.GraphDef()\n  dummy_tensor = sess.run(tf.constant([0.1]))\n  dummy_tensor_proto = tensor_util.make_tensor_proto(dummy_tensor, \\\n      dtype=cast_type, shape=dummy_tensor.shape)\n  dummy_tensor32 = sess.run(tf.constant([0.1]))\n  dummy_tensor_proto32 = tensor_util.make_tensor_proto(dummy_tensor, \\\n      dtype=tf.float32, shape=dummy_tensor.shape)\n  dt_float_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto32.dtype)\n  dt_half_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto.dtype)\n  for node in graphdef.node:\n    output_node = node_def_pb2.NodeDef()\n    output_node.CopyFrom(node)\n    if (node.op == \"Const\"):\n      if (node.attr[\"dtype\"] == dt_float_type_attr):\n        a = tensor_util.MakeNdarray(node.attr[\"value\"].tensor)\n        a = tf.cast(a, cast_type)\n        a = sess.run(a)\n        output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n        output_node.attr[\"value\"].CopyFrom(\n            attr_value_pb2.AttrValue(\n              tensor=tensor_util.make_tensor_proto(a,\\\n                dtype=cast_type, shape=a.shape)))\n    else:\n      if (\"T\" in node.attr.keys()):\n        if (output_node.attr[\"T\"] == dt_float_type_attr):\n          output_node.attr[\"T\"].CopyFrom(dt_half_type_attr)\n      if (\"Tparams\" in node.attr.keys()):\n        if (output_node.attr[\"Tparams\"] == dt_float_type_attr):\n          output_node.attr[\"Tparams\"].CopyFrom(dt_half_type_attr)\n      if (\"dtype\" in node.attr.keys()):\n        if (node.attr[\"dtype\"] == dt_float_type_attr):\n          output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n      if (\"SrcT\" in node.attr.keys()):\n        if (node.attr[\"SrcT\"] == dt_float_type_attr):\n          output_node.attr[\"SrcT\"].CopyFrom(dt_half_type_attr)\n      if (\"DstT\" in node.attr.keys()):\n        if (node.attr[\"DstT\"] == dt_float_type_attr):\n          output_node.attr[\"DstT\"].CopyFrom(dt_half_type_attr)\n    output_graph_def.node.extend([output_node])\n  return output_graph_def\n\ndef load_graph(model_file):\n  graph_def = tf.GraphDef()\n\n  with open(model_file, \"rb\") as f:\n    graph_def.ParseFromString(f.read())\n\n  return graph_def\n\nif __name__ == \"__main__\":\n  parser = argparse.ArgumentParser()\n  parser.add_argument(\"--graph\", help=\"graph/model to be executed\",\n      required=True)\n  parser.add_argument(\"--out_graph\", help=\"graph/model to be generated\",\n      required=True)\n  args = parser.parse_args()\n\n  graph_f32 = load_graph(args.graph)\n  graph_f16 = ConvertFP32ToOther(graph_f32)\n  output_xformed_graph_name = args.out_graph\n  with gfile.GFile(output_xformed_graph_name, \"wb\") as f:\n    f.write(graph_f16.SerializeToString())\n  
#with gfile.GFile(output_xformed_graph_name+\"txt\", 'w') as f:\n  #  f.write(text_format.MessageToString(graph_f16))\n"
  },
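fp32tofp16.py rewrites every float32 constant in a frozen graph as float16 and flips the relevant dtype attributes. A quick way to sanity-check the result is to reload the output graph and count the dtypes of its Const nodes; the sketch below does that, assuming the file names used by the ResNet50 flow in this directory.

```python
# Verification sketch (an addition, not part of the original scripts): after
#   python fp32tofp16.py --graph resnet50_fp32_keras_opt.pb --out_graph resnet50_fp16_keras_opt.pb
# confirm that the constant (weight) nodes now hold float16 tensors.
import collections
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with open("resnet50_fp16_keras_opt.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Count Const-node dtypes; after conversion, float32 entries should be gone.
dtype_counts = collections.Counter(
    tf.dtypes.as_dtype(node.attr["dtype"].type).name
    for node in graph_def.node
    if node.op == "Const"
)
print(dtype_counts)
```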
  {
    "path": "src/examples/tensorflow/keras_resnet50/full_sweep",
    "content": "#!/usr/bin/env bash\n\n##########################################################################\n#  Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n#  SPDX-License-Identifier: MIT-0\n##########################################################################\n\necho \"\" > full_sweep.log\necho \"\" > full_sweep_results.txt\n\nresults=()\nfor b in $(seq 1 5); do \n    for i in 1 2 4 8 12 16; do \n        python pb2sm_compile.py --batch_size=$b --neuroncore-pipeline-cores=$i | tee -a full_sweep.log;\n        results[$b]+=\", \"`tail -1 full_sweep.log`\n    done\ndone\n\nhead=\"batch\"\nfor i in 1 2 4 8 12 16; do\n    head+=\", nc${i}\"\ndone \necho $head | tee -a full_sweep_results.txt\nfor b in $(seq 1 5); do \n    echo $b${results[$b]} | tee -a full_sweep_results.txt\ndone\n"
  },
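full_sweep drives pb2sm_compile.py (referenced by this directory but not shown in this listing) over batch sizes 1–5 and NeuronCore pipeline sizes 1–16, collecting the last stdout line of each run into a small table. For readers who prefer to drive the sweep from Python, here is a rough equivalent; it assumes pb2sm_compile.py accepts the same flags and prints its status on the last line of stdout, exactly as the bash script's `tail -1` assumes.

```python
# Python rendering of the bash sweep above (a sketch under the same assumptions:
# pb2sm_compile.py takes --batch_size and --neuroncore-pipeline-cores and prints
# its result on the last stdout line).
import subprocess

batch_sizes = range(1, 6)
core_counts = [1, 2, 4, 8, 12, 16]
results = {}

for b in batch_sizes:
    for nc in core_counts:
        proc = subprocess.run(
            ["python", "pb2sm_compile.py",
             f"--batch_size={b}", f"--neuroncore-pipeline-cores={nc}"],
            capture_output=True, text=True)
        lines = proc.stdout.strip().splitlines()
        results[(b, nc)] = lines[-1] if lines else "n/a"

# Emit the same "batch, nc1, nc2, ..." table that full_sweep_results.txt holds.
print("batch, " + ", ".join(f"nc{nc}" for nc in core_counts))
for b in batch_sizes:
    print(f"{b}, " + ", ".join(results[(b, nc)] for nc in core_counts))
```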
  {
    "path": "src/examples/tensorflow/keras_resnet50/gen_resnet50_keras.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport re\nimport argparse\nimport tensorflow as tf\nimport numpy as np\n\nfrom tensorflow.keras.applications.resnet50 import ResNet50\nfrom tensorflow.keras.preprocessing import image\nfrom tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions\n\nfrom google.protobuf import text_format\nimport tensorflow.python.saved_model\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--fp16\", action='store_true', help=\"use float16 parameters and operations\")\n    args = parser.parse_args()\n\n    # set Keras global configurations\n    tf.keras.backend.set_learning_phase(0)\n    tf.keras.backend.set_image_data_format('channels_last')\n    if (args.fp16):\n        float_type = 'float16'\n        float_type2 = 'fp16'\n    else:\n        float_type = 'float32'\n        float_type2 = 'fp32'\n    tf.keras.backend.set_floatx(float_type)\n\n    # load pre-trained model using Keras\n    model_name = 'resnet50_%s_keras'%float_type2\n    model = ResNet50(weights='imagenet')\n\n    # various save files\n    frozen_file = model_name + '.pb'\n    opt_file = model_name + '_opt.pb'\n\n    # obtain parameters\n    model_input = model.input.name.replace(':0', '')\n    model_output = model.output.name.replace(':0', '')\n    batch, height, width, channels = model.input.shape\n\n    print (\"model, frozen file, optimized file, input size, input node, output node,\")\n    print (\"%s, %s, %s, %dx%dx%d, %s, %s\" %(model_name, frozen_file, opt_file, width, height, channels, model_input, model_output) ) \n\n    # obtain the TF session\n    sess = tf.compat.v1.keras.backend.get_session()\n\n    # save checkpoint files for freeze_graph\n    ckpt_file = '/tmp/' + model_name + '/' + model_name + '.ckpt'\n    graph_file = '/tmp/' + model_name + '/' + model_name + '.pb'\n    tf.compat.v1.train.Saver().save(sess, ckpt_file)\n    tf.io.write_graph(sess.graph.as_graph_def(), logdir='.', name=graph_file, as_text=False)\n\n    print(model_output)\n    with tf.compat.v1.Session(graph=tf.Graph()) as sess:\n          saver = tf.compat.v1.train.import_meta_graph(ckpt_file + '.meta')\n          saver.restore(sess, ckpt_file)\n          output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(\n              sess, tf.compat.v1.get_default_graph().as_graph_def(), [model_output])\n          output_graph_def = tf.compat.v1.graph_util.remove_training_nodes(\n              output_graph_def, protected_nodes=[model_output])\n          with open(frozen_file, 'wb') as f:\n              f.write(output_graph_def.SerializeToString())\n\n"
  },
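gen_resnet50_keras.py freezes Keras ResNet50 and prints the input and output node names that the other scripts here hard-code ('input_1' and 'probs/Softmax'). If you are unsure those defaults match your frozen graph, the following sketch re-derives them directly from the protobuf; it assumes resnet50_fp32_keras.pb from the step above is in the working directory.

```python
# Inspection sketch (an addition, not part of the original scripts): list the
# placeholder (input) nodes and the nodes nothing else consumes (outputs) in
# the frozen graph written by gen_resnet50_keras.py.
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with open("resnet50_fp32_keras.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

inputs = [n.name for n in graph_def.node if n.op == "Placeholder"]

# A node that never appears as another node's input is a graph output;
# strip ":0"-style ports and "^" control-dependency markers before comparing.
consumed = {i.split(":")[0].lstrip("^") for n in graph_def.node for i in n.input}
outputs = [n.name for n in graph_def.node if n.name not in consumed]

print("input nodes :", inputs)
print("output nodes:", outputs)
```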
  {
    "path": "src/examples/tensorflow/keras_resnet50/infer_resnet50_keras.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport os\nimport time\nimport shutil\nimport argparse\n\nimport numpy as np\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing import image\nfrom tensorflow.keras.applications import resnet50\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\"--graph\", default=\"resnet50_fp32_keras.pb\", help=\"Graph to use for inference\", required=True)\nparser.add_argument(\"--input\", default=\"input_1\", help=\"Input of graph\")\nparser.add_argument(\"--output\", default=\"probs/Softmax\", help=\"Output of graph\")\nargs = parser.parse_args()\n\ntf.keras.backend.set_image_data_format('channels_last')\n\ndef pb_to_saved_model(pb_path, input_names, output_names, model_dir):\n    graph_def = tf.compat.v1.GraphDef()\n    graph_def.ParseFromString(open(pb_path, 'rb').read())\n    with tf.compat.v1.Session(graph=tf.Graph()) as sess:\n        tf.import_graph_def(graph_def, name='')\n        inputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in input_names.items()}\n        outputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in output_names.items()}\n        tf.saved_model.simple_save(sess, model_dir, inputs, outputs)\n\nSAVED_MODEL_DIR = './rn50_fp16'\nshutil.rmtree(SAVED_MODEL_DIR, ignore_errors=True)\ninput_tname=\"{}:0\".format(args.input)\noutput_tname=\"{}:0\".format(args.output)\npb_to_saved_model(args.graph, {input_tname : input_tname}, {output_tname : output_tname}, SAVED_MODEL_DIR)\n\n# Create input from image\nimg_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))\nimg_arr = image.img_to_array(img_sgl)\nimg_arr2 = np.expand_dims(img_arr, axis=0)\nimg_arr3 = resnet50.preprocess_input(np.repeat(img_arr2, 1, axis=0))\n\n# Load model\npredictor_host = tf.contrib.predictor.from_saved_model(SAVED_MODEL_DIR)\n\n# Run inference\nmodel_feed_dict={'input_1:0': img_arr3}\ninfa_rslts = predictor_host(model_feed_dict);\nprint(resnet50.decode_predictions(infa_rslts[output_tname], top=5)[0])\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/infer_resnet50_keras_loadtest.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport shutil\nimport tensorflow as tf\nimport os\nimport time\nfrom concurrent import futures\nimport numpy as np\nimport statistics\nimport argparse\nimport requests\nimport tensorflow as tf\nimport tensorflow.neuron\nfrom tensorflow.keras.preprocessing import image\nfrom tensorflow.keras.applications import resnet50\nimport warnings\nimport subprocess\nimport json\n\ntf.keras.backend.set_image_data_format('channels_last')\n\narg_parser = argparse.ArgumentParser()\narg_parser.add_argument('--batch_size', type=int, default=5, choices=range(1, 6), help='Batch size of model as it was compiled')\narg_parser.add_argument('--neuroncore-pipeline-cores', type=int, default=1, choices=range(1, 17), help='Number of NeuronCores limit for each partitioned graph')\nargs = arg_parser.parse_args()\n\nneuron_ls_output = subprocess.run([\"neuron-ls\",\"-j\"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True, encoding=\"utf-8\")\nneuron_ls_json = json.loads(neuron_ls_output.stdout)\navail_neuroncores = neuron_ls_json[0][\"nc_count\"]\n\nUSER_BATCH_SIZE = 2 * args.batch_size\nNUM_LOOPS_PER_THREAD = 400\nCOMPILED_MODEL_DIR = \"./rn50_fp16_compiled_b\" + str(args.batch_size) + \"_nc\" + str(args.neuroncore_pipeline_cores) + \"/1\"\n\n# Ensure there's enough buffer capacity to hold in-flight requests in runtime\nNUM_INFERS_IN_FLIGHT = args.neuroncore_pipeline_cores + 3\nos.environ['NEURON_MAX_NUM_INFERS'] = str(NUM_INFERS_IN_FLIGHT)\n\nnum_groups = avail_neuroncores // args.neuroncore_pipeline_cores\ngroup_sizes = [str(args.neuroncore_pipeline_cores)] * num_groups\nwarnings.warn(\"NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please \\\nsee https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes \\\nfor more details.\", DeprecationWarning)\nos.environ['NEURONCORE_GROUP_SIZES'] = ','.join(group_sizes)\n\n# Create input from image\nimg_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))\nimg_arr = image.img_to_array(img_sgl, dtype='float16')\nimg_arr2 = np.expand_dims(img_arr, axis=0)\nimg_arr3 = np.repeat(img_arr2, USER_BATCH_SIZE, axis=0)\n\n# Load model\nNUM_THREADS_PER_PREDICTOR = args.neuroncore_pipeline_cores\npred_list = [tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR) for _ in range(num_groups)]\npred_list = pred_list * NUM_THREADS_PER_PREDICTOR\nnum_threads = len(pred_list)\n\nnum_infer_per_thread = []\ntot_latency_per_thread = []\nthread_active = []\nlatency_list = []\nfor i in range(num_threads):\n    num_infer_per_thread.append(0)\n    tot_latency_per_thread.append(0)\n    thread_active.append(0)\n\ndef one_thread(pred, model_feed_dict, index):\n    global num_infer_per_thread\n    thread_active[index] = 1\n    for i in range(NUM_LOOPS_PER_THREAD):\n        start = time.time()\n        result = pred(model_feed_dict)\n        delta = time.time() - start\n        latency_list.append(delta)\n        # skip first warmup run\n        if i > 0:\n            tot_latency_per_thread[index] += delta\n        num_infer_per_thread[index] += USER_BATCH_SIZE\n        #print(num_infer_per_thread[index])\n    thread_active[index] = 0\n\ndef current_throughput():\n    global num_infer_per_thread\n    global args\n    iteration = 0\n    num_infer = 0\n    last_num_infer = num_infer\n    throughput_stats = []\n    
print(\"Run with {} NeuronCores\".format(avail_neuroncores))\n    print(\"NEURON_MAX_NUM_INFERS (env): \" + os.environ.get('NEURON_MAX_NUM_INFERS', '<unset>'))\n    print(\"NEURONCORE_GROUP_SIZES (env): \" + os.environ.get('NEURONCORE_GROUP_SIZES', '<unset>'))\n    print(\"NUM THREADS: \", num_threads)\n    print(\"NUM_LOOPS_PER_THREAD: \", NUM_LOOPS_PER_THREAD)\n    print(\"USER_BATCH_SIZE: \", USER_BATCH_SIZE)\n    while num_infer < NUM_LOOPS_PER_THREAD * USER_BATCH_SIZE * num_threads:\n        num_infer = 0\n        total_thread_cnt = 0\n        for i in range(num_threads):\n            num_infer = num_infer + num_infer_per_thread[i]\n            total_thread_cnt = total_thread_cnt + thread_active[i]\n        current_num_infer = num_infer\n        throughput = current_num_infer - last_num_infer\n        #print('Active threads: {}, current throughput: {} images/sec'.format(total_thread_cnt, throughput))\n        # track throughput over time, after warmup\n        if iteration > 4 and total_thread_cnt == num_threads:\n            throughput_stats.append(throughput)\n        last_num_infer = current_num_infer\n        iteration += 1\n        time.sleep(1.0)\n    time.sleep(1.0)\n    tot_latency = 0\n    for i in range(num_threads):\n        tot_latency += tot_latency_per_thread[i]\n    # adjust loop count to remove the first warmup run\n    print(\"Throughput values collected:\")\n    print(throughput_stats)\n\n    print(\"\\nCompiled batch size {:}, user batch size {:}, Throughput stats (images/sec): Avg={:0.0f} Max={:}, Latency stats (msec/user-batch): P50={:0.1f} P90={:0.1f} P95={:0.1f} P99={:0.1f} \\n\".format( args.batch_size, USER_BATCH_SIZE, np.mean(throughput_stats), np.max(throughput_stats), \n        (np.percentile(latency_list, 50))*1000.0, (np.percentile(latency_list, 90))*1000.0, (np.percentile(latency_list, 95))*1000.0, (np.percentile(latency_list, 99))*1000.0)\n    )\n\n\nprint(\"\\n*** Compiled batch size {}, user batch size {}, num NeuronCores {} (input shape: {}, saved model dir: {}) ***\\n\".format(args.batch_size, USER_BATCH_SIZE, args.neuroncore_pipeline_cores, img_arr3.shape, COMPILED_MODEL_DIR))\n\n# Run inference\nmodel_feed_dict={'input_1:0': img_arr3}\n\nexecutor = futures.ThreadPoolExecutor(max_workers = num_threads + 1)\nexecutor.submit(current_throughput)\nfor i,pred in enumerate(pred_list):\n    executor.submit(one_thread, pred, model_feed_dict, i)\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"spectacular-payroll\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Tensorflow ResNet 50 Optimization Tutorial\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"equivalent-stack\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Note: this tutorial runs on tensorflow-neuron 1.x only\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"alpine-aside\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction: \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"In this tutorial we provide three main sections:\\n\",\n    \"\\n\",\n    \"* Take a Resnet 50 model and perform optimizations on it\\n\",\n    \"\\n\",\n    \"* Compile the model with different batch sizes and Neuroncore Group sizes (read about Neuroncore Group sizes here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-theory-of-operation.html#neuron-core-group)\\n\",\n    \"\\n\",\n    \"* Run inference on our multiple compiled models to see which has the best throughput\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"opened-forty\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Install Dependencies\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"meaningful-algebra\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install pillow requests # Necessary for loading images\\n\",\n    \"!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\\n\",\n    \"!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"remarkable-exercise\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"consecutive-right\",\n   \"metadata\": {},\n   \"source\": [\n    \"The following example shows how to compile a FP16 ResNet50 network using various batching parameters to find the optimal solution. 
On inf1.6xlarge, run through the following steps to get a optimized Resnet 50 model.\\n\",\n    \"First, extract Keras ResNet50 FP32 (resnet50_fp32_keras.pb will be generated):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"vertical-finland\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import re\\n\",\n    \"import argparse\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import numpy as np\\n\",\n    \"\\n\",\n    \"from tensorflow.keras.applications.resnet50 import ResNet50\\n\",\n    \"from tensorflow.keras.preprocessing import image\\n\",\n    \"from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions\\n\",\n    \"\\n\",\n    \"from google.protobuf import text_format\\n\",\n    \"import tensorflow.python.saved_model\\n\",\n    \"\\n\",\n    \"# set Keras global configurations\\n\",\n    \"tf.keras.backend.set_learning_phase(0)\\n\",\n    \"tf.keras.backend.set_image_data_format('channels_last')\\n\",\n    \"\\n\",\n    \"float_type = 'float32'\\n\",\n    \"float_type2 = 'fp32'\\n\",\n    \"tf.keras.backend.set_floatx(float_type)\\n\",\n    \"\\n\",\n    \"# load pre-trained model using Keras\\n\",\n    \"model_name = 'resnet50_%s_keras'%float_type2\\n\",\n    \"model = ResNet50(weights='imagenet')\\n\",\n    \"\\n\",\n    \"# various save files\\n\",\n    \"frozen_file = model_name + '.pb'\\n\",\n    \"opt_file = model_name + '_opt.pb'\\n\",\n    \"\\n\",\n    \"# obtain parameters\\n\",\n    \"model_input = model.input.name.replace(':0', '')\\n\",\n    \"model_output = model.output.name.replace(':0', '')\\n\",\n    \"batch, height, width, channels = model.input.shape\\n\",\n    \"\\n\",\n    \"print (\\\"model, frozen file, optimized file, input size, input node, output node,\\\")\\n\",\n    \"print (\\\"%s, %s, %s, %dx%dx%d, %s, %s\\\" %(model_name, frozen_file, opt_file, width, height, channels, model_input, model_output) ) \\n\",\n    \"\\n\",\n    \"# obtain the TF session\\n\",\n    \"sess = tf.compat.v1.keras.backend.get_session()\\n\",\n    \"\\n\",\n    \"# save checkpoint files for freeze_graph\\n\",\n    \"ckpt_file = '/tmp/' + model_name + '/' + model_name + '.ckpt'\\n\",\n    \"graph_file = '/tmp/' + model_name + '/' + model_name + '.pb'\\n\",\n    \"tf.compat.v1.train.Saver().save(sess, ckpt_file)\\n\",\n    \"tf.io.write_graph(sess.graph.as_graph_def(), logdir='.', name=graph_file, as_text=False)\\n\",\n    \"\\n\",\n    \"print(model_output)\\n\",\n    \"with tf.compat.v1.Session(graph=tf.Graph()) as sess:\\n\",\n    \"      saver = tf.compat.v1.train.import_meta_graph(ckpt_file + '.meta')\\n\",\n    \"      saver.restore(sess, ckpt_file)\\n\",\n    \"      output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(\\n\",\n    \"          sess, tf.compat.v1.get_default_graph().as_graph_def(), [model_output])\\n\",\n    \"      output_graph_def = tf.compat.v1.graph_util.remove_training_nodes(\\n\",\n    \"          output_graph_def, protected_nodes=[model_output])\\n\",\n    \"      with open(frozen_file, 'wb') as f:\\n\",\n    \"          f.write(output_graph_def.SerializeToString())\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"romance-cyprus\",\n   \"metadata\": {},\n   \"source\": [\n    \"Optimize the extracted Keras ResNet50 FP32 graph for inference before casting (resnet50_fp32_keras_opt.pb will be generated) with the following transformations to the graph:\\n\",\n    \"\\n\",\n    \"* Remove Identity and CheckNumerics 
nodes\\n\",\n    \"* Fold FusedBatchNorm constants into previous Conv2D weights\\n\",\n    \"* Fold other constants\\n\",\n    \"* Strip unused nodes\\n\",\n    \"* Sort by execution order\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"higher-grant\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import copy\\n\",\n    \"import string\\n\",\n    \"\\n\",\n    \"from google.protobuf import text_format\\n\",\n    \"from tensorflow.core.framework import node_def_pb2\\n\",\n    \"from tensorflow.core.framework import attr_value_pb2\\n\",\n    \"from tensorflow.python.framework import tensor_util\\n\",\n    \"from tensorflow.tools.graph_transforms import TransformGraph\\n\",\n    \"\\n\",\n    \"def clear_input(node):\\n\",\n    \"  for i in range(len(node.input)):\\n\",\n    \"    node.input.pop()\\n\",\n    \"\\n\",\n    \"def replace_name(node, name):\\n\",\n    \"  node.name = name\\n\",\n    \"     \\n\",\n    \"def replace_input(node, input_name, new_name):\\n\",\n    \"  # node.input.replace(input_name, new_name)\\n\",\n    \"  temp = []\\n\",\n    \"  for i in node.input:\\n\",\n    \"    temp.extend([new_name if i == input_name else i])\\n\",\n    \"  clear_input(node)\\n\",\n    \"  for i in temp:\\n\",\n    \"    node.input.extend([i])\\n\",\n    \"\\n\",\n    \"def swap_names(node1, node2):\\n\",\n    \"  temp = node2.name\\n\",\n    \"  node2.name = node1.name\\n\",\n    \"  node1.name = temp\\n\",\n    \"\\n\",\n    \"def get_const_node(const_node_name, const_by_name):\\n\",\n    \"  name = re.sub(\\\"/read$\\\", \\\"\\\", const_node_name)\\n\",\n    \"  return const_by_name[name]\\n\",\n    \"\\n\",\n    \"def get_const_ndarray(const_node_name, const_by_name):\\n\",\n    \"  name = re.sub(\\\"/read$\\\", \\\"\\\", const_node_name)\\n\",\n    \"  node = const_by_name[name]\\n\",\n    \"  return tf.make_ndarray(node.attr.get(\\\"value\\\").tensor)\\n\",\n    \"\\n\",\n    \"def adjust_bias_values(bias_node, fbn_node, const_by_name):\\n\",\n    \"  bias_val = get_const_ndarray(bias_node.input[1], const_by_name)  \\n\",\n    \"  gamma_val = get_const_ndarray(fbn_node.input[1], const_by_name)  \\n\",\n    \"  mean_val = get_const_ndarray(fbn_node.input[3], const_by_name)  \\n\",\n    \"  variance_val = get_const_ndarray(fbn_node.input[4], const_by_name) \\n\",\n    \"  new_bias = bias_val * gamma_val / np.sqrt(variance_val)\\n\",\n    \"  new_tensor = tensor_util.make_tensor_proto(new_bias, new_bias.dtype, new_bias.shape)\\n\",\n    \"  bias_const_node = get_const_node(bias_node.input[1], const_by_name)\\n\",\n    \"  bias_const_node.attr[\\\"value\\\"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor))\\n\",\n    \"\\n\",\n    \"def MoveBiasAddAfterFusedBatchNorm(graphdef):\\n\",\n    \"  \\\"\\\"\\\"fold_batch_norm function of TransformGraph is unable to fold Keras ResNet50\\n\",\n    \"  because of BiasAdd between Conv2D and FusedBatchNorm (BiasAdd is not needed\\n\",\n    \"  if FusedBatchNorm is used, but it exists in Keras ResNet50). 
Here, we \\n\",\n    \"  move BiasAdd to after FusedBatchNorm, and adjust bias value by gamma/sqrt(variance).\\n\",\n    \"  \\\"\\\"\\\"\\n\",\n    \"  sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\\n\",\n    \"  output_graph_def = tf.compat.v1.GraphDef()\\n\",\n    \"  node_by_name = {}\\n\",\n    \"  const_by_name = {}\\n\",\n    \"  for node in graphdef.node:\\n\",\n    \"    # Hack: use FusedBatchNormV2 so fold_batch_norm can recognize\\n\",\n    \"    if node.op == \\\"FusedBatchNormV3\\\":\\n\",\n    \"      node.op = \\\"FusedBatchNorm\\\"\\n\",\n    \"      del(node.attr[\\\"U\\\"])\\n\",\n    \"      #import pdb; pdb.set_trace()\\n\",\n    \"    copied_node = node_def_pb2.NodeDef()\\n\",\n    \"    copied_node.CopyFrom(node)\\n\",\n    \"    node_by_name[node.name] = copied_node\\n\",\n    \"    skip_add_node = False\\n\",\n    \"    # Switch Mul/BiasAdd in Keras RN50 so fold_batch_norm transform would work\\n\",\n    \"    if node.op == \\\"Const\\\":\\n\",\n    \"      const_by_name[node.name] = copied_node  \\n\",\n    \"    elif node.op.startswith(\\\"FusedBatchNorm\\\"):\\n\",\n    \"      inputs = node.input\\n\",\n    \"      for i in inputs:\\n\",\n    \"        input_node = node_by_name[i]\\n\",\n    \"        if input_node.op == \\\"BiasAdd\\\":\\n\",\n    \"          output_graph_def.node.remove(input_node)\\n\",\n    \"          input_node_input0 = input_node.input[0]\\n\",\n    \"          # Adjust bias values (multiply by scale/sqrt(variance))\\n\",\n    \"          adjust_bias_values(input_node, node, const_by_name)\\n\",\n    \"          # Hack: swap names to avoid changing input of activation\\n\",\n    \"          swap_names(copied_node, input_node)\\n\",\n    \"          # Fix inputs for these two ops\\n\",\n    \"          replace_input(copied_node, i, input_node_input0)\\n\",\n    \"          replace_input(input_node, input_node_input0, copied_node.name)\\n\",\n    \"          # Fix order in node list\\n\",\n    \"          output_graph_def.node.extend([copied_node])\\n\",\n    \"          output_graph_def.node.extend([input_node])\\n\",\n    \"          skip_add_node = True\\n\",\n    \"    # Add maybe-modified nodes if not already done\\n\",\n    \"    if not skip_add_node:\\n\",\n    \"      output_graph_def.node.extend([copied_node])\\n\",\n    \"  return output_graph_def\\n\",\n    \"\\n\",\n    \"def FoldFusedBatchNorm(graph_def):\\n\",\n    \"  \\\"\\\"\\\"Optimize training graph for inference:\\n\",\n    \"    - Remove Identity and CheckNumerics nodes\\n\",\n    \"    - Fold FusedBatchNorm constants into previous Conv2D weights\\n\",\n    \"    - Fold other constants\\n\",\n    \"    - Strip unused nodes\\n\",\n    \"    - Sort by execution order\\n\",\n    \"  \\\"\\\"\\\"\\n\",\n    \"  transformed_graph_def = TransformGraph (\\n\",\n    \"         graph_def,\\n\",\n    \"         ['input_1'],\\n\",\n    \"         ['probs/Softmax'],\\n\",\n    \"         [\\n\",\n    \"            'add_default_attributes',\\n\",\n    \"            'remove_nodes(op=Identity, op=CheckNumerics)',\\n\",\n    \"            'fold_constants(ignore_errors=true)',\\n\",\n    \"            'fold_batch_norms',\\n\",\n    \"            'fold_old_batch_norms',\\n\",\n    \"            'strip_unused_nodes',\\n\",\n    \"            'sort_by_execution_order',\\n\",\n    \"         ])\\n\",\n    \"  return transformed_graph_def\\n\",\n    \"\\n\",\n    \"def load_graph(model_file):\\n\",\n    \"  graph_def = tf.compat.v1.GraphDef()\\n\",\n    \"\\n\",\n    \"  
with open(model_file, \\\"rb\\\") as f:\\n\",\n    \"    graph_def.ParseFromString(f.read())\\n\",\n    \"  return graph_def\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"graph_orig = load_graph('resnet50_fp32_keras.pb')\\n\",\n    \"graph_mod = MoveBiasAddAfterFusedBatchNorm(graph_orig)\\n\",\n    \"graph_mod2 = FoldFusedBatchNorm(graph_mod)\\n\",\n    \"with tf.io.gfile.GFile('resnet50_fp32_keras_opt.pb', \\\"wb\\\") as f:\\n\",\n    \"    f.write(graph_mod2.SerializeToString())\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"corresponding-acquisition\",\n   \"metadata\": {},\n   \"source\": [\n    \"Convert full graph to FP16 (resnet50_fp16_keras_opt.pb will be generated.\\n\",\n    \"This will take about a minute.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"detected-training\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from tensorflow.core.framework import graph_pb2\\n\",\n    \"from tensorflow.python.platform import gfile\\n\",\n    \"\\n\",\n    \"def ConvertFP32ToOther(graphdef):\\n\",\n    \"  \\\"\\\"\\\"Converts an FP32 network by casting all constants (weights) to a lower\\n\",\n    \"     precision floating point type (FP16) and updating the dtypes\\n\",\n    \"     everywhere.\\\"\\\"\\\"\\n\",\n    \"  cast_type = \\\"float16\\\"\\n\",\n    \"  sess = tf.Session(graph=tf.import_graph_def(graphdef))\\n\",\n    \"  output_graph_def = graph_pb2.GraphDef()\\n\",\n    \"  dummy_tensor = sess.run(tf.constant([0.1]))\\n\",\n    \"  dummy_tensor_proto = tensor_util.make_tensor_proto(dummy_tensor, \\\\\\n\",\n    \"      dtype=cast_type, shape=dummy_tensor.shape)\\n\",\n    \"  dummy_tensor32 = sess.run(tf.constant([0.1]))\\n\",\n    \"  dummy_tensor_proto32 = tensor_util.make_tensor_proto(dummy_tensor, \\\\\\n\",\n    \"      dtype=tf.float32, shape=dummy_tensor.shape)\\n\",\n    \"  dt_float_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto32.dtype)\\n\",\n    \"  dt_half_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto.dtype)\\n\",\n    \"  for node in graphdef.node:\\n\",\n    \"    output_node = node_def_pb2.NodeDef()\\n\",\n    \"    output_node.CopyFrom(node)\\n\",\n    \"    if (node.op == \\\"Const\\\"):\\n\",\n    \"      if (node.attr[\\\"dtype\\\"] == dt_float_type_attr):\\n\",\n    \"        a = tensor_util.MakeNdarray(node.attr[\\\"value\\\"].tensor)\\n\",\n    \"        a = tf.cast(a, cast_type)\\n\",\n    \"        a = sess.run(a)\\n\",\n    \"        output_node.attr[\\\"dtype\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"        output_node.attr[\\\"value\\\"].CopyFrom(\\n\",\n    \"            attr_value_pb2.AttrValue(\\n\",\n    \"              tensor=tensor_util.make_tensor_proto(a,\\\\\\n\",\n    \"                dtype=cast_type, shape=a.shape)))\\n\",\n    \"    else:\\n\",\n    \"      if (\\\"T\\\" in node.attr.keys()):\\n\",\n    \"        if (output_node.attr[\\\"T\\\"] == dt_float_type_attr):\\n\",\n    \"          output_node.attr[\\\"T\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"      if (\\\"Tparams\\\" in node.attr.keys()):\\n\",\n    \"        if (output_node.attr[\\\"Tparams\\\"] == dt_float_type_attr):\\n\",\n    \"          output_node.attr[\\\"Tparams\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"      if (\\\"dtype\\\" in node.attr.keys()):\\n\",\n    \"        if (node.attr[\\\"dtype\\\"] == dt_float_type_attr):\\n\",\n    \"          output_node.attr[\\\"dtype\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"      if (\\\"SrcT\\\" in 
node.attr.keys()):\\n\",\n    \"        if (node.attr[\\\"SrcT\\\"] == dt_float_type_attr):\\n\",\n    \"          output_node.attr[\\\"SrcT\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"      if (\\\"DstT\\\" in node.attr.keys()):\\n\",\n    \"        if (node.attr[\\\"DstT\\\"] == dt_float_type_attr):\\n\",\n    \"          output_node.attr[\\\"DstT\\\"].CopyFrom(dt_half_type_attr)\\n\",\n    \"    output_graph_def.node.extend([output_node])\\n\",\n    \"  return output_graph_def\\n\",\n    \"\\n\",\n    \"def load_graph(model_file):\\n\",\n    \"  graph_def = tf.GraphDef()\\n\",\n    \"\\n\",\n    \"  with open(model_file, \\\"rb\\\") as f:\\n\",\n    \"    graph_def.ParseFromString(f.read())\\n\",\n    \"\\n\",\n    \"  return graph_def\\n\",\n    \"\\n\",\n    \"graph_f32 = load_graph('resnet50_fp32_keras_opt.pb')\\n\",\n    \"graph_f16 = ConvertFP32ToOther(graph_f32)\\n\",\n    \"output_xformed_graph_name = 'resnet50_fp16_keras_opt.pb'\\n\",\n    \"with gfile.GFile(output_xformed_graph_name, \\\"wb\\\") as f:\\n\",\n    \"    f.write(graph_f16.SerializeToString())\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"correct-travel\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run the compilation script to sweep through various batch sizes up to 5 and several NeuronCore Group sizes up to 16. The script calls the compilation script pb2sm_compile.py which tries to perform compilation. Some error messages are expected due to known issues (see Known Issues section in the tutorial). If you run all the configurations it will take about 45 minutes.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"shared-ratio\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%%bash\\n\",\n    \"#!/usr/bin/env bash\\n\",\n    \"\\n\",\n    \"echo \\\"\\\" > full_sweep.log\\n\",\n    \"echo \\\"\\\" > full_sweep_results.txt\\n\",\n    \"\\n\",\n    \"results=()\\n\",\n    \"for b in $(seq 1 5); do \\n\",\n    \"    for i in 1 2 4 8 12 16; do \\n\",\n    \"        python pb2sm_compile.py --batch_size=$b --neuroncore-pipeline-cores=$i | tee -a full_sweep.log;\\n\",\n    \"        results[$b]+=\\\", \\\"`tail -1 full_sweep.log`\\n\",\n    \"    done\\n\",\n    \"done\\n\",\n    \"\\n\",\n    \"head=\\\"batch\\\"\\n\",\n    \"for i in 1 2 4 8 12 16; do\\n\",\n    \"    head+=\\\", nc${i}\\\"\\n\",\n    \"done \\n\",\n    \"echo $head | tee -a full_sweep_results.txt\\n\",\n    \"for b in $(seq 1 5); do \\n\",\n    \"    echo $b${results[$b]} | tee -a full_sweep_results.txt\\n\",\n    \"done\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"attached-austin\",\n   \"metadata\": {},\n   \"source\": [\n    \"You should see some output like this:\\n\",\n    \"```\\n\",\n    \"INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\\n\",\n    \"\\n\",\n    \"1\\n\",\n    \"\\n\",\n    \"*** Batch size 1, num NeuronCores 2 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc2) ***\\n\",\n    \"\\n\",\n    \"INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\\n\",\n    \"\\n\",\n    \"1\\n\",\n    \"\\n\",\n    \"*** Batch size 1, num NeuronCores 4 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc4) ***\\n\",\n    \"\\n\",\n    \"INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\\n\",\n    \"\\n\",\n    \"1\\n\",\n    \"\\n\",\n    \"... 
(outputs removed)\\n\",\n    \"\\n\",\n    \"*** Batch size 5, num NeuronCores 16 (input shape: (5, 224, 224, 3), saved model dir: rn50_fp16_compiled_b5_nc16) ***\\n\",\n    \"\\n\",\n    \"ERROR: Compilation finished in 120 seconds with less than 50% operations placed on Inferentia (0.0%)\\n\",\n    \"\\n\",\n    \"INFO: Retry compilation without static weights\\n\",\n    \"\\n\",\n    \"ERROR: Retry compilation finished in 137 seconds with less than 50% operations placed on Inferentia (0.0%)\\n\",\n    \"\\n\",\n    \"0\\n\",\n    \"\\n\",\n    \"The file full_sweep_results.txt shows a summary of the sweep results with latest Neuron 1/27/20 release (0 means compilation unsuccessful and 0 ops mapped to Inferentia, 1 means most ops mapped to Inferentia and non-static weights, 2 means most ops mapped to Inferentia and using static weights):\\n\",\n    \"\\n\",\n    \"batch, nc1, nc2, nc4, nc8, nc12, nc16\\n\",\n    \"1, 1, 1, 1, 2, 2, 2\\n\",\n    \"2, 1, 1, 0, 1, 2, 2\\n\",\n    \"3, 1, 1, 1, 1, 1, 1\\n\",\n    \"4, 1, 1, 0, 1, 1, 1\\n\",\n    \"5, 1, 1, 0, 0, 0, 0\\n\",\n    \"```\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"surprised-abortion\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Inference\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"departmental-surprise\",\n   \"metadata\": {},\n   \"source\": [\n    \"Run inference over different batch sizes and NeuronCore groups to obtain throughput and latency results for ResNet50. To apply dynamic batching, the user batch size is set to 10x the compiled batch size, in order to keep the input queue full and to amortize framework-to-Neuron overhead.\\n\",\n    \"\\n\",\n    \"Note: The results are based on the Neuron v1.12.2 (Mar 4th 2021) release. These will continue to improve as we increase Neuron performance.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"requested-inspiration\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!cd ~/aws-neuron-sdk/src/examples/tensorflow/keras_resnet50/\\n\",\n    \"!echo \\\"\\\" > batch.log\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=1 | tee -a batch.log; done\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=2 | tee -a batch.log; done\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=4 | tee -a batch.log; done\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=8 | tee -a batch.log; done\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=12 | tee -a batch.log; done\\n\",\n    \"!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=16 | tee -a batch.log; done\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"split-genesis\",\n   \"metadata\": {},\n   \"source\": [\n    \"The file batch.log now contains the results for each batch size. We can look at the throughput values to get an idea of which configurations are performing well. The output should look something like this:\\n\",\n    \"\\n\",\n    \"The best model configuration for throughput (if you run on an Inf1.6xlarge as suggested in the tutorial) is batch size 5, NeuronCore group size 2. 
Increasing batch size usually helps to increase throughput (up to a certain extent).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"filled-township\",\n   \"metadata\": {},\n   \"source\": [\n    \"```\\n\",\n    \"*** Compiled batch size 5, user batch size 10, num NeuronCores 2 (input shape: (10, 224, 224, 3), saved model dir: ./rn50_fp16_compiled_b5_nc2/1) ***\\n\",\n    \"\\n\",\n    \"Instance type inf1.6xlarge with 16 NeuronCores\\n\",\n    \"NEURON_MAX_NUM_INFERS (env): 5\\n\",\n    \"NEURONCORE_GROUP_SIZES (env): 2,2,2,2,2,2,2,2\\n\",\n    \"NUM THREADS:  16\\n\",\n    \"NUM_LOOPS_PER_THREAD:  400\\n\",\n    \"USER_BATCH_SIZE:  10\\n\",\n    \"Throughput values collected:\\n\",\n    \"[10680, 10700, 10660]\\n\",\n    \"\\n\",\n    \"(rest of outputs removed)\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"189c4f0e-1a4e-4067-921f-95449c45dedd\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Known Issues\\n\",\n    \"\\n\",\n    \"### Unable to compile with batch and num NeuronCores combination\\n\",\n    \"\\n\",\n    \"For some combinations of batch size and number of NeuronCores, you may\\n\",\n    \"see an internal compiler error as below. Please see the sweep results\\n\",\n    \"above for the Neuron 1/27/20 release. Furthermore, using auto-casting to\\n\",\n    \"bfloat16 from an FP32 network with a batch size larger than 1 results\\n\",\n    \"in the same error.\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"```bash\\n\",\n    \"\\n\",\n    \"INFO:tensorflow:fusing subgraph neuron_op_a73aed4b95ca5d5b with neuron-cc; log file is at /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neuron-cc.log\\n\",\n    \"   WARNING:tensorflow:Failed to fuse subgraph neuron_op_a73aed4b95ca5d5b with '/home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config \\\"{\\\\\\\"inputs\\\\\\\": {\\\\\\\"input_10/_0:0\\\\\\\": [[6, 224, 224, 3], \\\\\\\"float16\\\\\\\"]}, \\\\\\\"outputs\\\\\\\": [\\\\\\\"probs/Softmax:0\\\\\\\"]}\\\" --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True'\\n\",\n    \"   WARNING:tensorflow:neuron-cc error message:\\n\",\n    \"   WARNING:tensorflow:01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:  An Internal Compiler Error has occurred\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error message:  Non-zero exit status (134) for command: /home/ubuntu/test_venv/lib/python3.6/site-packages/neuroncc/starfish/bin/list_sch --hhir hh-tr-external-move.json --verbose 0 --sb_size 120 --arith_intensity_target 2300 --sb_watermark_low 0.250000 --sb_watermark_high 0.750000 --sb_size_tol 1 --alloc simple1 --alloc_opt --depth_diff 0.100000 --verbose_start_cycle 0 
--tt_dist --mm_meet_cnt 1 --load_speed_factor 0.300000 --schir sch_tmp.json --spill_depth_limit 5 --spill_dis --true_dep --mm_order --batching_en --rematerialization_en\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error class:    CompilerInternalError\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error location: job.Scheduler.3\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Command line:   /home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config '{\\\"inputs\\\": {\\\"input_10/_0:0\\\": [[6, 224, 224, 3], \\\"float16\\\"]}, \\\"outputs\\\": [\\\"probs/Softmax:0\\\"]}' --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Internal details:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:   File \\\"neuroncc/driver/Job.py\\\", line 207, in neuroncc.driver.Job.runSingleInputFn\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:   File \\\"neuroncc/driver/jobs/Scheduler.py\\\", line 58, in neuroncc.driver.jobs.Scheduler.Scheduler.runSingleInput\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:   File \\\"neuroncc/driver/Job.py\\\", line 145, in neuroncc.driver.Job.Job.shellCommand\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:40 AM ERROR [neuron-cc]: Version information:\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   Neuron Compiler version 1.0.6632.0+6001610955\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   HWM version 1.0.839.0-6001300654\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   NEFF version 0.6\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   TVM version 1.0.1589.0+6001610955\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   NumPy version 1.16.5\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   MXNet not available\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:   TF version 1.15.0\\n\",\n    \"   01/23/2020 01:15:41 AM ERROR [neuron-cc]:\\n\",\n    \"\\n\",\n    \"```\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"gentle-census\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/optimize_for_inference.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport re\nimport copy\nimport argparse\nimport tensorflow as tf\nimport numpy as np\nimport string\n\nfrom google.protobuf import text_format\nfrom tensorflow.core.framework import node_def_pb2\nfrom tensorflow.core.framework import attr_value_pb2\nfrom tensorflow.python.framework import tensor_util\nfrom tensorflow.tools.graph_transforms import TransformGraph\n\ndef clear_input(node):\n  for i in range(len(node.input)):\n    node.input.pop()\n\ndef replace_name(node, name):\n  node.name = name\n     \ndef replace_input(node, input_name, new_name):\n  # node.input.replace(input_name, new_name)\n  temp = []\n  for i in node.input:\n    temp.extend([new_name if i == input_name else i])\n  clear_input(node)\n  for i in temp:\n    node.input.extend([i])\n\ndef swap_names(node1, node2):\n  temp = node2.name\n  node2.name = node1.name\n  node1.name = temp\n\ndef get_const_node(const_node_name, const_by_name):\n  name = re.sub(\"/read$\", \"\", const_node_name)\n  return const_by_name[name]\n\ndef get_const_ndarray(const_node_name, const_by_name):\n  name = re.sub(\"/read$\", \"\", const_node_name)\n  node = const_by_name[name]\n  return tf.make_ndarray(node.attr.get(\"value\").tensor)\n\ndef adjust_bias_values(bias_node, fbn_node, const_by_name):\n  bias_val = get_const_ndarray(bias_node.input[1], const_by_name)  \n  gamma_val = get_const_ndarray(fbn_node.input[1], const_by_name)  \n  mean_val = get_const_ndarray(fbn_node.input[3], const_by_name)  \n  variance_val = get_const_ndarray(fbn_node.input[4], const_by_name) \n  new_bias = bias_val * gamma_val / np.sqrt(variance_val)\n  new_tensor = tensor_util.make_tensor_proto(new_bias, new_bias.dtype, new_bias.shape)\n  bias_const_node = get_const_node(bias_node.input[1], const_by_name)\n  bias_const_node.attr[\"value\"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor))\n\ndef MoveBiasAddAfterFusedBatchNorm(graphdef):\n  \"\"\"fold_batch_norm function of TransformGraph is unable to fold Keras ResNet50\n  because of BiasAdd between Conv2D and FusedBatchNorm (BiasAdd is not needed\n  if FusedBatchNorm is used, but it exists in Keras ResNet50). 
Here, we \n  move BiasAdd to after FusedBatchNorm, and adjust bias value by gamma/sqrt(variance).\n  \"\"\"\n  sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\n  output_graph_def = tf.compat.v1.GraphDef()\n  node_by_name = {}\n  const_by_name = {}\n  for node in graphdef.node:\n    # Hack: use FusedBatchNormV2 so fold_batch_norm can recognize\n    if node.op == \"FusedBatchNormV3\":\n      node.op = \"FusedBatchNorm\"\n      del(node.attr[\"U\"])\n      #import pdb; pdb.set_trace()\n    copied_node = node_def_pb2.NodeDef()\n    copied_node.CopyFrom(node)\n    node_by_name[node.name] = copied_node\n    skip_add_node = False\n    # Switch Mul/BiasAdd in Keras RN50 so fold_batch_norm transform would work\n    if node.op == \"Const\":\n      const_by_name[node.name] = copied_node  \n    elif node.op.startswith(\"FusedBatchNorm\"):\n      inputs = node.input\n      for i in inputs:\n        input_node = node_by_name[i]\n        if input_node.op == \"BiasAdd\":\n          output_graph_def.node.remove(input_node)\n          input_node_input0 = input_node.input[0]\n          # Adjust bias values (multiply by scale/sqrt(variance))\n          adjust_bias_values(input_node, node, const_by_name)\n          # Hack: swap names to avoid changing input of activation\n          swap_names(copied_node, input_node)\n          # Fix inputs for these two ops\n          replace_input(copied_node, i, input_node_input0)\n          replace_input(input_node, input_node_input0, copied_node.name)\n          # Fix order in node list\n          output_graph_def.node.extend([copied_node])\n          output_graph_def.node.extend([input_node])\n          skip_add_node = True\n    # Add maybe-modified nodes if not already done\n    if not skip_add_node:\n      output_graph_def.node.extend([copied_node])\n  return output_graph_def\n\ndef FoldFusedBatchNorm(graph_def):\n  \"\"\"Optimize training graph for inference:\n    - Remove Identity and CheckNumerics nodes\n    - Fold FusedBatchNorm constants into previous Conv2D weights\n    - Fold other constants\n    - Strip unused nodes\n    - Sort by execution order\n  \"\"\"\n  transformed_graph_def = TransformGraph (\n         graph_def,\n         ['input_1'],\n         ['probs/Softmax'],\n         [\n            'add_default_attributes',\n            'remove_nodes(op=Identity, op=CheckNumerics)',\n            'fold_constants(ignore_errors=true)',\n            'fold_batch_norms',\n            'fold_old_batch_norms',\n            'strip_unused_nodes',\n            'sort_by_execution_order',\n         ])\n  return transformed_graph_def\n\ndef load_graph(model_file):\n  graph_def = tf.compat.v1.GraphDef()\n\n  with open(model_file, \"rb\") as f:\n    graph_def.ParseFromString(f.read())\n  return graph_def\n\nif __name__ == \"__main__\":\n  parser = argparse.ArgumentParser()\n  parser.add_argument(\"--graph\", help=\"graph/model to be executed\",\n      required=True)\n  parser.add_argument(\"--out_graph\", help=\"graph/model to be generated\",\n      required=True)\n  args = parser.parse_args()\n\n  graph_orig = load_graph(args.graph)\n  graph_mod = MoveBiasAddAfterFusedBatchNorm(graph_orig)\n  graph_mod2 = FoldFusedBatchNorm(graph_mod)\n  with tf.io.gfile.GFile(args.out_graph, \"wb\") as f:\n    f.write(graph_mod2.SerializeToString())\n  #with tf.io.gfile.GFile(args.out_graph + \"txt\", 'w') as f:\n  #  f.write(text_format.MessageToString(graph_mod2))\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/pb2sm_compile.py",
    "content": "\"\"\" Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n    SPDX-License-Identifier: MIT-0\n\"\"\"\n\nimport time\nimport shutil\nimport numpy as np\nimport argparse\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing import image\nfrom tensorflow.keras.applications import resnet50\nimport tensorflow.neuron as tfn\n\ntf.keras.backend.set_image_data_format('channels_last')\n\narg_parser = argparse.ArgumentParser()\narg_parser.add_argument('--batch_size', type=int, default=5, choices=range(1, 6), help='Input data batch size for compilation of model')\narg_parser.add_argument('--neuroncore-pipeline-cores', type=int, default=1, choices=range(1, 17), help='Number of NeuronCores limit for each partitioned graph')\narg_parser.add_argument('--debug_args', type=str, default=\"\", help='Optional Compiler debug args')\narg_parser.add_argument('--workdir', type=str, default=\"compiler_workdir\", help='Compiler work directory')\n\nargs = arg_parser.parse_args()\n\ndef pb_to_saved_model(pb_path, input_names, output_names, model_dir):\n    graph_def = tf.GraphDef()\n    graph_def.ParseFromString(open(pb_path, 'rb').read())\n    with tf.Session(graph=tf.Graph()) as sess:\n        tf.import_graph_def(graph_def, name='')\n        inputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in input_names.items()}\n        outputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in output_names.items()}\n        tf.saved_model.simple_save(sess, model_dir, inputs, outputs)\n\nsaved_model_dir = \"rn50_fp16\"\n\nshutil.rmtree(saved_model_dir, ignore_errors=True)\n\npb_to_saved_model(\"resnet50_fp16_keras_opt.pb\", {\"input_1:0\": \"input_1:0\"}, {\"probs/Softmax:0\" : \"probs/Softmax:0\"}, saved_model_dir)\n\nbatch_size = args.batch_size\nimg_arr = np.zeros([batch_size, 224, 224, 3], dtype='float16')\ncompiled_saved_model_dir = saved_model_dir + \"_compiled_b\" + str(batch_size) + \"_nc\" + str(args.neuroncore_pipeline_cores)\nshutil.rmtree(compiled_saved_model_dir + \"/1\", ignore_errors=True)\n\nprint(\"\\n*** Batch size {}, num NeuronCores {} (input shape: {}, saved model dir: {}) ***\\n\".format(batch_size, args.neuroncore_pipeline_cores, img_arr.shape, compiled_saved_model_dir))\ncompiler_args = ['--neuroncore-pipeline-cores', str(args.neuroncore_pipeline_cores)]\nif args.debug_args:\n    compiler_args.extend(args.debug_args.split(\" \"))\n\nstatic_weights = False\nif args.neuroncore_pipeline_cores >= 8:\n    static_weights = True\n\nshutil.rmtree(args.workdir, ignore_errors=True)\nstart = time.time()\nrslts = tfn.saved_model.compile(saved_model_dir, compiled_saved_model_dir + \"/1\",\n               model_feed_dict={'input_1:0' : img_arr},\n               compiler_workdir=args.workdir,\n               dynamic_batch_size=True,\n               compiler_args = compiler_args)\ndelta = time.time() - start\nperc_on_inf = rslts['OnNeuronRatio'] * 100\n\ncompile_success = False\nif perc_on_inf < 50:\n    print(\"\\nERROR: Compilation finished in {:.0f} seconds with less than 50% operations placed on Inferentia ({:.1f}%)\\n\".format(delta, perc_on_inf))\n    if '--static-weights' in compiler_args:\n        print(\"INFO: Retry compilation without static weights\")\n        compiler_args.remove('--static-weights')\n        static_weights = False\n        shutil.rmtree(compiled_saved_model_dir + \"/1\", ignore_errors=True)\n        shutil.rmtree('compiler_workdir2', ignore_errors=True)\n        start = time.time()\n        rslts = 
tfn.saved_model.compile(saved_model_dir, compiled_saved_model_dir + \"/1\",\n                   model_feed_dict={'input_1:0' : img_arr},\n                   compiler_workdir='compiler_workdir2',\n                   dynamic_batch_size=True,\n                   compiler_args = compiler_args)\n        delta = time.time() - start\n        perc_on_inf = rslts['OnNeuronRatio'] * 100\n        if perc_on_inf < 50:\n            print(\"\\nERROR: Retry compilation finished in {:.0f} seconds with less than 50% operations placed on Inferentia ({:.1f}%)\\n\".format(delta, perc_on_inf))\n        else:\n            print(\"\\nINFO: Retry compilation finished in {:.0f} seconds with {:.1f}% operations placed on Inferentia\\n\".format(delta, perc_on_inf))\n            compile_success = True\nelse:\n    print(\"\\nINFO: Compilation finished in {:.0f} seconds with {:.1f}% operations placed on Inferentia\\n\".format(delta, perc_on_inf))\n    compile_success = True\n\n# Prepare SavedModel for uploading to Inf1 instance\ncompletion_code = 0\nif compile_success:\n    shutil.make_archive('./' + compiled_saved_model_dir, 'zip', './', compiled_saved_model_dir)\n    completion_code = 1 + int(static_weights)\n\nprint(completion_code)\n\nexit(int(not compile_success))\n"
  },
  {
    "path": "src/examples/tensorflow/keras_resnet50/run_all",
    "content": "#!/usr/bin/env bash\n\n##########################################################################\n#  Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n#  SPDX-License-Identifier: MIT-0\n##########################################################################\n\npip install pillow\n\n# Extract Keras ResNet50 FP32 and check inference\npython gen_resnet50_keras.py\npython infer_resnet50_keras.py --graph resnet50_fp32_keras.pb\n# Optimize fp32 graph for inference before casting\npython optimize_for_inference.py --graph resnet50_fp32_keras.pb --out_graph resnet50_fp32_keras_opt.pb\npython infer_resnet50_keras.py --graph resnet50_fp32_keras_opt.pb\n# Cast full graph to FP16\npython fp32tofp16.py  --graph resnet50_fp32_keras_opt.pb --out_graph resnet50_fp16_keras_opt.pb\npython infer_resnet50_keras.py --graph resnet50_fp16_keras_opt.pb\n# Compile\npython pb2sm_compile.py\n# Infer\npython infer_resnet50_keras_loadtest.py\n\n"
  },
  {
    "path": "src/examples/tensorflow/openpose_demo/openpose.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"caff04ba\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Running OpenPose on Inferentia\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"09b2919a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Note: this tutorial runs on tensorflow-neuron 1.x only\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4dcf9bb1\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction:\\n\",\n    \"\\n\",\n    \"In this tutorial we will compile and deploy Openpose model for Inferentia. This jupyter notebook should run on an inf1.6xlarge instance for compilation and inference. The inference part of this tutorial requires inf1.6xlarge and not the compilation itself. For simplicity we will run this tutorial on a single instance but in real life scenario the compilation can be done on a compute c5.4xlarge instance and the deployment on the inf1 instance family.\\n\",\n    \"\\n\",\n    \"In this tutorial we provide two main sections:\\n\",\n    \"1. Compile the OpenPose model on inf1x6large.\\n\",\n    \"2. Infer the same compiled model on inf1x6large.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"04ae0838\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Acknowledgement:\\n\",\n    \"\\n\",\n    \"Many thanks to https://github.com/ildoonet for providing pretrained model as well as the image preprocessing/pose estimating infrastructure.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"d0d6d08e\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Download tensorflow pose net frozen graph.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"1926d4e3\",\n   \"metadata\": {\n    \"scrolled\": false\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"!wget -c --tries=2 $( wget -q -O - http://www.mediafire.com/file/qlzzr20mpocnpa3/graph_opt.pb | grep -o 'http*://download[^\\\"]*' | tail -n 1 ) -O graph_opt.pb\\n\",\n    \"\\n\",\n    \"!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\\n\",\n    \"!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"83eb578b\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Compile\\n\",\n    \"Compile the pose net frozen graph into AWS Neuron compatible form. Network input image resolution is adjustable with argument --net_resolution (e. g., --net_resolution=656x368). 
The compiled model can accept arbitrary batch size input at runtime.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"362f322e\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"\\\"\\\"\\\"\\n\",\n    \"Usage: python convert_graph_opt.py /path/to/graph_opt.pb /path/to/graph_opt_neuron.pb\\n\",\n    \"\\\"\\\"\\\"\\n\",\n    \"#import argparse\\n\",\n    \"import numpy as np\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from tensorflow.core.framework.tensor_shape_pb2 import TensorShapeProto\\n\",\n    \"import tensorflow.neuron as tfn\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def compile():\\n\",\n    \"    #parser = argparse.ArgumentParser()\\n\",\n    \"    #parser.add_argument('input_pb_path', help='Input serialized GraphDef protobuf')\\n\",\n    \"    #parser.add_argument('output_pb_path', help='Ouput serialized GraphDef protobuf')\\n\",\n    \"    #parser.add_argument('--net_resolution', default='656x368', help='Network resolution in WxH format, e. g., --net_resolution=656x368')\\n\",\n    \"    #parser.add_argument('--debug_verify', action='store_true')\\n\",\n    \"    #args = parser.parse_args()\\n\",\n    \"   \\n\",\n    \"    input_pb_path = './graph_opt.pb'\\n\",\n    \"    net_resolution = '656x368'\\n\",\n    \"    output_pb_path = './graph_opt_neuron_' + net_resolution + '.pb'\\n\",\n    \"    \\n\",\n    \"    debug_verify = 'store_true'\\n\",\n    \"    dim_w, dim_h = net_resolution.split('x')\\n\",\n    \"    dim_w = int(dim_w)\\n\",\n    \"    dim_h = int(dim_h)\\n\",\n    \"    graph_def = tf.GraphDef()\\n\",\n    \"    with open(input_pb_path, 'rb') as f:\\n\",\n    \"        graph_def.ParseFromString(f.read())\\n\",\n    \"\\n\",\n    \"    if debug_verify:\\n\",\n    \"        np.random.seed(0)\\n\",\n    \"        feed_dict = {'image:0': np.random.rand(1, dim_h, dim_w, 3)}\\n\",\n    \"        output_name = 'Openpose/concat_stage7:0'\\n\",\n    \"        with tf.Session(graph=tf.Graph()) as sess:\\n\",\n    \"            tf.import_graph_def(graph_def, name='')\\n\",\n    \"            result_reference = sess.run(output_name, feed_dict)\\n\",\n    \"\\n\",\n    \"    preprocessing_ops = {'preprocess_divide', 'preprocess_divide/y', 'preprocess_subtract', 'preprocess_subtract/y'}\\n\",\n    \"    graph_def = nhwc_to_nchw(graph_def, preprocessing_ops)\\n\",\n    \"    graph_def = inline_float32_to_float16(graph_def, preprocessing_ops)\\n\",\n    \"    with tf.Session(graph=tf.Graph()) as sess:\\n\",\n    \"        tf.import_graph_def(graph_def, name='')\\n\",\n    \"        no_fuse_ops = preprocessing_ops.union({'Openpose/concat_stage7'})\\n\",\n    \"        infer_graph = tfn.graph_util.inference_graph_from_session(\\n\",\n    \"            sess, shape_feed_dict={'image:0': [1, dim_h, dim_w, 3]}, output_tensors=['Openpose/concat_stage7:0'],\\n\",\n    \"            no_fuse_ops=no_fuse_ops, dynamic_batch_size=True,\\n\",\n    \"        )\\n\",\n    \"    with open(output_pb_path, 'wb') as f:\\n\",\n    \"        f.write(infer_graph.as_graph_def().SerializeToString())\\n\",\n    \"\\n\",\n    \"    if debug_verify:\\n\",\n    \"        with tf.Session(graph=infer_graph) as sess:\\n\",\n    \"            result_compiled = sess.run(output_name, feed_dict)\\n\",\n    \"        np.testing.assert_allclose(result_compiled, result_reference, rtol=1e-2, atol=1e-3)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def inline_float32_to_float16(graph_def, preprocessing_ops):\\n\",\n    \"    float32_enum = 
tf.float32.as_datatype_enum\\n\",\n    \"    float16_enum = tf.float16.as_datatype_enum\\n\",\n    \"    graph = tf.Graph()\\n\",\n    \"    with graph.as_default():\\n\",\n    \"        tf.import_graph_def(graph_def, name='')\\n\",\n    \"    graph_def = graph.as_graph_def()\\n\",\n    \"    for node in graph_def.node:\\n\",\n    \"        if node.name in preprocessing_ops or node.op == 'Placeholder':\\n\",\n    \"            cast_input_node_name = node.name\\n\",\n    \"            continue\\n\",\n    \"        if node.op == 'Const':\\n\",\n    \"            if node.attr['dtype'].type == float32_enum:\\n\",\n    \"                node.attr['dtype'].type = float16_enum\\n\",\n    \"                tensor_def = node.attr['value'].tensor\\n\",\n    \"                tensor_def.dtype = float16_enum\\n\",\n    \"                if tensor_def.tensor_content:\\n\",\n    \"                    const_np = np.frombuffer(tensor_def.tensor_content, dtype=np.float32).astype(np.float16)\\n\",\n    \"                    tensor_def.tensor_content = const_np.tobytes()\\n\",\n    \"                elif len(tensor_def.float_val):\\n\",\n    \"                    const_np = np.array(tensor_def.float_val).astype(np.float16).view(np.uint16)\\n\",\n    \"                    tensor_def.float_val[:] = []\\n\",\n    \"                    tensor_def.half_val[:] = list(const_np)\\n\",\n    \"                else:\\n\",\n    \"                    raise NotImplementedError\\n\",\n    \"        elif 'T' in node.attr and node.attr['T'].type == float32_enum:\\n\",\n    \"            node.attr['T'].type = float16_enum\\n\",\n    \"    for node in graph_def.node:\\n\",\n    \"        if node.name == cast_input_node_name:\\n\",\n    \"            node.name = '{}_PreCastFloat32ToFlot16'.format(node.name)\\n\",\n    \"            input_node = node\\n\",\n    \"            break\\n\",\n    \"    cast_input_node = _gen_cast_node_def(cast_input_node_name, tf.float16, input_node)\\n\",\n    \"\\n\",\n    \"    output_node = graph_def.node[-1]\\n\",\n    \"    cast_output_node_name = output_node.name\\n\",\n    \"    output_node.name = '{}_PreCastFloat16ToFlot32'.format(output_node.name)\\n\",\n    \"    cast_output_node = _gen_cast_node_def(cast_output_node_name, tf.float32, output_node)\\n\",\n    \"\\n\",\n    \"    preprocessing_ops.add(input_node.name)\\n\",\n    \"    new_graph_def = tf.GraphDef()\\n\",\n    \"    new_graph_def.node.extend(graph_def.node)\\n\",\n    \"    new_graph_def.node.append(cast_input_node)\\n\",\n    \"    new_graph_def.node.append(cast_output_node)\\n\",\n    \"    graph = tf.Graph()\\n\",\n    \"    with graph.as_default():\\n\",\n    \"        tf.import_graph_def(new_graph_def, name='')\\n\",\n    \"    return graph.as_graph_def()\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def nhwc_to_nchw(graph_def, preprocessing_ops):\\n\",\n    \"    graph = tf.Graph()\\n\",\n    \"    with graph.as_default():\\n\",\n    \"        tf.import_graph_def(graph_def, name='')\\n\",\n    \"    graph_def = graph.as_graph_def()\\n\",\n    \"    node_name_to_node = {node.name: node for node in graph_def.node}\\n\",\n    \"    for node in graph_def.node:\\n\",\n    \"        if node.name in preprocessing_ops or node.op == 'Placeholder':\\n\",\n    \"            transpose_input_node_name = node.name\\n\",\n    \"            continue\\n\",\n    \"        if node.op == 'Conv2D':\\n\",\n    \"            node.attr['data_format'].s = b'NCHW'\\n\",\n    \"            strides = node.attr['strides'].list.i\\n\",\n    \"            
strides[:] = [strides[0], strides[3], strides[1], strides[2]]\\n\",\n    \"        elif node.op == 'BiasAdd':\\n\",\n    \"            if node.name != 'probs/BiasAdd':\\n\",\n    \"                node.attr['data_format'].s = b'NCHW'\\n\",\n    \"        elif node.op == 'MaxPool':\\n\",\n    \"            node.attr['data_format'].s = b'NCHW'\\n\",\n    \"            ksize = node.attr['ksize'].list.i\\n\",\n    \"            ksize[:] = [ksize[0], ksize[3], ksize[1], ksize[2]]\\n\",\n    \"            strides = node.attr['strides'].list.i\\n\",\n    \"            strides[:] = [strides[0], strides[3], strides[1], strides[2]]\\n\",\n    \"        elif node.op in {'Concat', 'ConcatV2'}:\\n\",\n    \"            node_axes = node_name_to_node[node.input[-1]]\\n\",\n    \"            node_axes.attr['value'].tensor.int_val[:] = [1]\\n\",\n    \"    for node in graph_def.node:\\n\",\n    \"        if node.name == transpose_input_node_name:\\n\",\n    \"            node.name = '{}_PreTransposeNHWC2NCHW'.format(node.name)\\n\",\n    \"            input_node = node\\n\",\n    \"            break\\n\",\n    \"    transpose_input_node, transpose_input_perm_node = _gen_transpose_def(transpose_input_node_name, [0, 3, 1, 2], input_node)\\n\",\n    \"\\n\",\n    \"    output_node = graph_def.node[-1]\\n\",\n    \"    transpose_output_node_name = output_node.name\\n\",\n    \"    output_node.name = '{}_PreTransposeNCHW2NHWC'.format(output_node.name)\\n\",\n    \"    transpose_output_node, transpose_output_perm_node = _gen_transpose_def(transpose_output_node_name, [0, 2, 3, 1], output_node)\\n\",\n    \"\\n\",\n    \"    preprocessing_ops.add(input_node.name)\\n\",\n    \"    preprocessing_ops.add(transpose_input_perm_node.name)\\n\",\n    \"    new_graph_def = tf.GraphDef()\\n\",\n    \"    new_graph_def.node.extend(graph_def.node)\\n\",\n    \"    new_graph_def.node.append(transpose_input_perm_node)\\n\",\n    \"    new_graph_def.node.append(transpose_input_node)\\n\",\n    \"    new_graph_def.node.append(transpose_output_perm_node)\\n\",\n    \"    new_graph_def.node.append(transpose_output_node)\\n\",\n    \"    graph = tf.Graph()\\n\",\n    \"    with graph.as_default():\\n\",\n    \"        tf.import_graph_def(new_graph_def, name='')\\n\",\n    \"    return graph.as_graph_def()\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def _gen_cast_node_def(name, target_dtype, input_node):\\n\",\n    \"    cast_node = tf.NodeDef(name=name, op='Cast')\\n\",\n    \"    cast_node.input.append(input_node.name)\\n\",\n    \"    cast_node.attr['DstT'].type = target_dtype.as_datatype_enum\\n\",\n    \"    cast_node.attr['SrcT'].type = input_node.attr['T'].type\\n\",\n    \"    cast_node.attr['Truncate'].b = False\\n\",\n    \"    return cast_node\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def _gen_transpose_def(name, perm, input_node):\\n\",\n    \"    perm_node = tf.NodeDef(name='{}/perm'.format(name), op='Const')\\n\",\n    \"    perm_node.attr['dtype'].type = tf.int32.as_datatype_enum\\n\",\n    \"    tensor_def = perm_node.attr['value'].tensor\\n\",\n    \"    tensor_def.dtype = tf.int32.as_datatype_enum\\n\",\n    \"    tensor_def.tensor_shape.dim.append(TensorShapeProto.Dim(size=4))\\n\",\n    \"    tensor_def.tensor_content = np.array(perm, dtype=np.int32).tobytes()\\n\",\n    \"    transpose_node = tf.NodeDef(name=name, op='Transpose')\\n\",\n    \"    transpose_node.input.append(input_node.name)\\n\",\n    \"    transpose_node.input.append(perm_node.name)\\n\",\n    \"    transpose_node.attr['T'].type = 
input_node.attr['T'].type\\n\",\n    \"    transpose_node.attr['Tperm'].type = tf.int32.as_datatype_enum\\n\",\n    \"    return transpose_node, perm_node\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"88c41e01\",\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"compile()\\n\",\n    \"\\n\",\n    \"# Sample output will look like below:\\n\",\n    \"# WARNING:tensorflow:From <ipython-input-3-27d3844cd753>:47: inference_graph_from_session (from tensorflow_neuron.python.graph_util) is deprecated and will be removed in a future version.\\n\",\n    \"# Instructions for updating:\\n\",\n    \"# Please refer to AWS documentation on Neuron integrated TensorFlow 2.0.\\n\",\n    \"# INFO:tensorflow:Froze 0 variables.\\n\",\n    \"# INFO:tensorflow:Converted 0 variables to const ops.\\n\",\n    \"# INFO:tensorflow:fusing subgraph {subgraph neuron_op_ed41d2deb8c54255 with input tensors [\\\"<tf.Tensor 'preprocess_subtract0/_0:0' shape=(1, 3, 368, 656) dtype=float16>\\\"], output tensors [\\\"<tf.Tensor 'Openpose/concat_stage7_PreCastFloat16ToFlot32:0' shape=(1, 46, 82, 57) dtype=float16>\\\"]} with neuron-cc\\n\",\n    \"# INFO:tensorflow:Number of operations in TensorFlow session: 474\\n\",\n    \"# INFO:tensorflow:Number of operations after tf.neuron optimizations: 474\\n\",\n    \"# INFO:tensorflow:Number of operations placed on Neuron runtime: 465\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5a9af0c7\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Deploy\\n\",\n    \"We use the same instance to deploy the model.\\n\",\n    \"If deploying on a different instance, launch a deployment inf1 instance and copy the AWS Neuron optimized TensorFlow frozen graph graph_opt_neuron_656x368.pb to the deployment inf1 instance. The smallest instance type inf1.xlarge is sufficient for this demo.\\n\",\n    \"\\n\",\n    \"Your graph_opt_neuron_656x368.pb can now be plugged into https://github.com/ildoonet seamlessly if you have tensorflow-neuron installed. When it is used at runtime, please ensure that the image resolution is the same as the compile-time image resolution, i.e., 656x368.\\n\",\n    \"\\n\",\n    \"Measure performance on the compiled frozen graph using dummy inputs.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0481d049\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"\\\"\\\"\\\"\\n\",\n    \"Copyright (C) 2020, Amazon.com. 
All Rights Reserved\\n\",\n    \"\\\"\\\"\\\"\\n\",\n    \"import os\\n\",\n    \"import atexit\\n\",\n    \"import time\\n\",\n    \"import math\\n\",\n    \"import json\\n\",\n    \"from collections import OrderedDict, Counter\\n\",\n    \"from contextlib import contextmanager, ContextDecorator\\n\",\n    \"from functools import wraps\\n\",\n    \"from tensorflow.python.client import session\\n\",\n    \"from tensorflow.python.platform import tf_logging as logging\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class measure_performance(ContextDecorator):\\n\",\n    \"    \\\"\\\"\\\"Convenient tool for performance measurements.\\n\",\n    \"    Can be apply on tensorflow session.run, tf-serving unary gRPC calls, or a given custom function.\\n\",\n    \"    Usage:\\n\",\n    \"    To generate performance report for the entire Python or gRPC-client process, insert\\n\",\n    \"    the following function call before running inferences:\\n\",\n    \"    `tfn.measure_performance()`\\n\",\n    \"    Then latency/throughput report will be generated when the process terminates.\\n\",\n    \"    Alternatively, it is possible to use `tfn.measure_performance` programmatically\\n\",\n    \"    as a context manager. Performance measurement will be done for all inferences\\n\",\n    \"    happening under this context. Report will be displayed as INFO level log when exiting\\n\",\n    \"    the context. It is also possible to obtain a JSON format report in Python.\\n\",\n    \"    For example:\\n\",\n    \"    ```\\n\",\n    \"    with tfn.measure_performance() as perf:\\n\",\n    \"        ... (run some inferences) ...\\n\",\n    \"    report_json = perf.report()\\n\",\n    \"    report_full_json = perf.report(verbosity=1)\\n\",\n    \"    ```\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    def __init__(self, func=None, window_size=1):\\n\",\n    \"        self.perf_tracker = PerformanceTracker(window_size)\\n\",\n    \"        atexit.register(self.perf_tracker.report)\\n\",\n    \"        self._original_run = session.Session.run\\n\",\n    \"        self._original_grpc_call = None\\n\",\n    \"        if callable(func):\\n\",\n    \"            self.perf_tracker.register_func(self._track_performance(func))\\n\",\n    \"        else:\\n\",\n    \"            session.Session.run = self._track_performance(session.Session.run)\\n\",\n    \"            try:\\n\",\n    \"                import grpc\\n\",\n    \"                from tensorflow_serving.apis import prediction_service_pb2_grpc\\n\",\n    \"                dummy_stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel(''))\\n\",\n    \"                self._grpc_callable_type = type(dummy_stub.Predict)\\n\",\n    \"                self._original_grpc_call = self._grpc_callable_type.__call__\\n\",\n    \"            except ImportError:\\n\",\n    \"                pass\\n\",\n    \"            if callable(self._original_grpc_call):\\n\",\n    \"                self._grpc_callable_type.__call__ = self._track_performance(\\n\",\n    \"                    grpc._channel._UnaryUnaryMultiCallable.__call__\\n\",\n    \"                )\\n\",\n    \"\\n\",\n    \"    def __enter__(self):\\n\",\n    \"        return self.perf_tracker\\n\",\n    \"\\n\",\n    \"    def __exit__(self, *exc):\\n\",\n    \"        atexit.unregister(self.perf_tracker.report)\\n\",\n    \"        self.perf_tracker.report()\\n\",\n    \"        session.Session.run = self._original_run\\n\",\n    \"        if self._original_grpc_call is not 
None:\\n\",\n    \"            self._grpc_callable_type.__call__ = self._original_grpc_call\\n\",\n    \"        return False\\n\",\n    \"\\n\",\n    \"    def _track_performance(self, func):\\n\",\n    \"        @wraps(func)\\n\",\n    \"        def wrapper(*args, **kwargs):\\n\",\n    \"            start = time.time()\\n\",\n    \"            result = func(*args, **kwargs)\\n\",\n    \"            end = time.time()\\n\",\n    \"            self.perf_tracker.add_timestamps(start, end)\\n\",\n    \"            return result\\n\",\n    \"        return wrapper\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"class PerformanceTracker(ContextDecorator):\\n\",\n    \"\\n\",\n    \"    description = (\\n\",\n    \"        \\\"Latency unit: second. Throughput unit: number of batched inferences per second. \\\"\\n\",\n    \"        \\\"Reported throughput is a lower bound of the actual throughput as inferences \\\"\\n\",\n    \"        \\\"spanning across window boundaries are not counted towards any of the windows. \\\"\\n\",\n    \"        \\\"'Quiet' periods (i. e., window buckets where the inference function is not called) \\\"\\n\",\n    \"        \\\"are not counted towards the reported average throughput.\\\"\\n\",\n    \"    )\\n\",\n    \"\\n\",\n    \"    def __init__(self, window_size):\\n\",\n    \"        self.window_size = window_size\\n\",\n    \"        self.timestamps_list = []\\n\",\n    \"        self._func = None\\n\",\n    \"\\n\",\n    \"    def __call__(self, *args, **kwargs):\\n\",\n    \"        return self._func(*args, **kwargs)\\n\",\n    \"\\n\",\n    \"    def register_func(self, func):\\n\",\n    \"        self._func = func\\n\",\n    \"\\n\",\n    \"    def add_timestamps(self, start, end):\\n\",\n    \"        self.timestamps_list.append([start, end])\\n\",\n    \"\\n\",\n    \"    def report(self, verbosity=0):\\n\",\n    \"        if self.timestamps_list:\\n\",\n    \"            latency_list = [end - start for start, end in self.timestamps_list]\\n\",\n    \"            latency_json = {\\n\",\n    \"                'p50': percentile(latency_list, 50),\\n\",\n    \"                'p90': percentile(latency_list, 90),\\n\",\n    \"                'p99': percentile(latency_list, 99),\\n\",\n    \"                'p100': percentile(latency_list, 100),\\n\",\n    \"            }\\n\",\n    \"            bucketed_timestamps = [self._get_bucket(start, end) for start, end in self.timestamps_list]\\n\",\n    \"            counted_buckets = Counter(item for item in bucketed_timestamps if item is not None)\\n\",\n    \"            bucket_throughputs = [(key, value / self.window_size) for key, value in sorted(counted_buckets.items())]\\n\",\n    \"            busy_throughputs = list(OrderedDict((key, value) for key, value in bucket_throughputs).values())\\n\",\n    \"            throughput_json = {\\n\",\n    \"                'peak': max(busy_throughputs),\\n\",\n    \"                'median': percentile(busy_throughputs, 50),\\n\",\n    \"                'average': sum(busy_throughputs) / len(busy_throughputs),\\n\",\n    \"            }\\n\",\n    \"            if verbosity > 0:\\n\",\n    \"                throughput_json['trend'] = busy_throughputs\\n\",\n    \"            report_json = {\\n\",\n    \"                'pid': os.getpid(),\\n\",\n    \"                'throughput': throughput_json,\\n\",\n    \"                'latency': latency_json,\\n\",\n    \"                'description': PerformanceTracker.description,\\n\",\n    \"            }\\n\",\n    \"            
with _logging_show_info():\\n\",\n    \"                logging.info('performance report:\\\\n{}'.format(json.dumps(report_json, indent=4)))\\n\",\n    \"            return report_json\\n\",\n    \"\\n\",\n    \"    def _get_bucket(self, start, end):\\n\",\n    \"        bucketed_start = math.floor(start / self.window_size) * self.window_size\\n\",\n    \"        bucketed_end = math.ceil(end / self.window_size) * self.window_size\\n\",\n    \"        if bucketed_end - bucketed_start == self.window_size:\\n\",\n    \"            return bucketed_start\\n\",\n    \"        else:\\n\",\n    \"            return None\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def percentile(number_list, percent):\\n\",\n    \"    pos_float = len(number_list) * percent / 100\\n\",\n    \"    max_pos = len(number_list) - 1\\n\",\n    \"    pos_floor = min(math.floor(pos_float), max_pos)\\n\",\n    \"    pos_ceil = min(math.ceil(pos_float), max_pos)\\n\",\n    \"    number_list = sorted(number_list)\\n\",\n    \"    return number_list[pos_ceil] if pos_float - pos_floor > 0.5 else number_list[pos_floor]\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"@contextmanager\\n\",\n    \"def _logging_show_info():\\n\",\n    \"    try:\\n\",\n    \"        verbosity = logging.get_verbosity()\\n\",\n    \"        logging.set_verbosity(logging.INFO)\\n\",\n    \"        yield\\n\",\n    \"    finally:\\n\",\n    \"        logging.set_verbosity(verbosity)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"960c6aa9\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"\\\"\\\"\\\"\\n\",\n    \"Below are the inputs for compiled frozen graph \\n\",\n    \"\\n\",\n    \"pb_path is a /path/graph_opt_neuron_656x368.pb\\n\",\n    \"num_thread = 8 ( Number of threads that work on each tensorflow session ) \\n\",\n    \"batch_size =1 ( batch_size )\\n\",\n    \"net_resolution ,default=656x368\\n\",\n    \"num_inferences = 200\\n\",\n    \"\\\"\\\"\\\"\\n\",\n    \"import os\\n\",\n    \"from concurrent import futures\\n\",\n    \"import numpy as np\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow.neuron as tfn\\n\",\n    \"\\n\",\n    \"def run_with_dummy(sess, dummy_feed_dict, num_inferences):\\n\",\n    \"    for _ in range(num_inferences):\\n\",\n    \"        sess.run('Openpose/concat_stage7:0', dummy_feed_dict)\\n\",\n    \"        \\n\",\n    \"def main():\\n\",\n    \"    NUM_NEURON_CORES = 16\\n\",\n    \"    pb_path = './graph_opt_neuron_656x368.pb'\\n\",\n    \"    num_thread = 8\\n\",\n    \"    batch_size = 1\\n\",\n    \"    net_resolution = '656x368'\\n\",\n    \"    num_inferences = 200\\n\",\n    \"    dim_w, dim_h = net_resolution.split('x')\\n\",\n    \"    dim_w = int(dim_w)\\n\",\n    \"    dim_h = int(dim_h)\\n\",\n    \"    graph_def = tf.GraphDef()\\n\",\n    \"    with open(pb_path, 'rb') as f:\\n\",\n    \"        graph_def.ParseFromString(f.read())\\n\",\n    \"    \\n\",\n    \"    graph_def = tfn.graph_util.tag_multicore(graph_def, NUM_NEURON_CORES)\\n\",\n    \"    \\n\",\n    \"    with tfn.measure_performance() as perf:\\n\",\n    \"        with tf.Session(graph=tf.Graph()) as sess:\\n\",\n    \"            tf.import_graph_def(graph_def, name='')\\n\",\n    \"            input_name = 'image:0'\\n\",\n    \"            input_shape = sess.graph.get_tensor_by_name(input_name).shape.as_list()\\n\",\n    \"            input_shape[0] = batch_size\\n\",\n    \"            input_shape[1] = dim_h\\n\",\n    \"            input_shape[2] = 
dim_w\\n\",\n    \"            dummy_feed_dict = {input_name: np.zeros(input_shape).astype(np.float32)}\\n\",\n    \"            with futures.ThreadPoolExecutor(max_workers=num_thread) as executor:\\n\",\n    \"                fut_list = [executor.submit(run_with_dummy, sess, dummy_feed_dict, num_inferences) for _ in range(num_thread)]\\n\",\n    \"                res_list = [fut.result() for fut in fut_list]   \\n\",\n    \"\\n\",\n    \"main()\\n\",\n    \"\\n\",\n    \"# Sample output will look like below:\\n\",\n    \"# INFO:tensorflow:performance report:\\n\",\n    \"# {\\n\",\n    \"#    \\\"pid\\\": 17713,\\n\",\n    \"#    \\\"throughput\\\": {\\n\",\n    \"#        \\\"peak\\\": 66.0,\\n\",\n    \"#        \\\"median\\\": 64.0,\\n\",\n    \"#        \\\"average\\\": 61.56521739130435\\n\",\n    \"#    },\\n\",\n    \"#    \\\"latency\\\": {\\n\",\n    \"#        \\\"p50\\\": 0.1106414794921875,\\n\",\n    \"#        \\\"p90\\\": 0.11212301254272461,\\n\",\n    \"#        \\\"p99\\\": 0.11337876319885254,\\n\",\n    \"#        \\\"p100\\\": 7.08282732963562\\n\",\n    \"#    },\\n\",\n    \"#    \\\"description\\\": \\\"Latency unit: second. Throughput unit: number of batched inferences per second. Reported throughput is a lower bound of the actual throughput as inferences spanning across window boundaries are not counted towards any of the windows. 'Quiet' periods (i. e., window buckets where the inference function is not called) are not counted towards the reported average throughput.\\\"\\n\",\n    \"# }\"\n   ]\n  },\n  {\n   \"cell_type\": \"raw\",\n   \"id\": \"4f15e776\",\n   \"metadata\": {},\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/tensorflow/ssd300_demo/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/tensorflow/ssd300_demo/ssd300_detection.py",
    "content": "import argparse\nimport json\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nfrom PIL import Image\nimport matplotlib.pyplot as plt\nimport matplotlib.patches as patches\nimport tensorflow as tf\nimport tensorflow.neuron as tfn\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image', required=True, help='Path to image that is to be detected. Support jpeg and png format.')\n    parser.add_argument('--image_with_detections', required=True, help='Path to save image after detection (with bounding boxes drawn). Png format.')\n    parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel')\n    parser.add_argument('--score_threshold', type=float, default=0.15, help='Minimum required score for drawing a bounding box')\n    parser.add_argument('--instances_val2017_json', default=None, help='Json file that contains labeling information')\n    parser.add_argument('--save_results', default=None)\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if not args.disable_version_check:\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.0.1.0.1333.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. Please upgrade '\n                'by \"pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n\n    with open(args.image, 'rb') as f:\n        img_jpg_bytes = f.read()\n    model_feed_dict = {'batch_image': [img_jpg_bytes]}\n\n    predictor = tf.contrib.predictor.from_saved_model(args.saved_model)\n    results = predictor(model_feed_dict)\n    if args.save_results is not None:\n        np.savez(args.save_results, **results)\n    boxes_np = results['boxes']\n    scores_np = results['scores']\n    classes_np = results['classes']\n\n    if args.instances_val2017_json is not None:\n        with open(args.instances_val2017_json) as f:\n            annotate_json = json.load(f)\n        label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\n\n    plt.switch_backend('agg')\n    fig, ax = plt.subplots(1)\n    ax.imshow(Image.open(args.image).convert('RGB'))\n\n    wanted = scores_np[0] > args.score_threshold\n    for xywh, label_no_bg in zip(boxes_np[0][wanted], classes_np[0][wanted]):\n        rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\n        ax.add_patch(rect)\n        rx, ry = rect.get_xy()\n        rx = rx + rect.get_width() / 2.0\n        if args.instances_val2017_json is not None:\n            ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\n                        ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\n    plt.savefig(args.image_with_detections)\n    plt.close(fig)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/ssd300_demo/ssd300_evaluation.py",
    "content": "import argparse\nimport os\nimport json\nimport glob\nfrom concurrent import futures\nimport time\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nimport tensorflow.neuron as tfn\nfrom pycocotools.cocoeval import COCOeval\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection\n\n\ndef get_val_dataset(val_annotate, val_coco_root):\n    dboxes = dboxes300_coco()\n    val_trans = SSDTransformer(dboxes, (300, 300), val=True)\n    val_coco = COCODetection(val_coco_root, val_annotate, val_trans)\n    return val_coco\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel')\n    parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset')\n    parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information')\n    parser.add_argument('--num_sessions', type=int, default=1, help='Number of tensorflow sessions')\n    parser.add_argument('--num_threads', type=int, default=4, help='Number of threads')\n    parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput')\n    parser.add_argument('--save_results', default=None)\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if not args.disable_version_check:\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.0.1.0.1333.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. 
Please upgrade '\n                'by \"pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n    predictor_list = [tf.contrib.predictor.from_saved_model(args.saved_model) for _ in range(args.num_sessions)]\n\n    val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017)\n    inv_map = {v: k for k, v in val_dataset.label_map.items()}\n    model_feed_dict_list = []\n    for img_id in val_dataset.img_keys:\n        img_path = os.path.join(args.val2017, val_dataset.images[img_id][0])\n        with open(img_path, 'rb') as f:\n            img_jpg_bytes = f.read()\n        model_feed_dict_list.append({'batch_image': [img_jpg_bytes]})\n\n    latency_list = []\n    throughput_list = []\n    def predict(pred, model_feed_dict):\n        start = time.time()\n        result = pred(model_feed_dict)\n        latency_list.append(time.time() - start)\n        return result\n\n    def performance():\n        last_num_infer = len(latency_list)\n        while len(latency_list) < len(model_feed_dict_list):\n            current_num_infer = len(latency_list)\n            throughput = (current_num_infer - last_num_infer) / args.throughput_interval\n            throughput_list.append(throughput)\n            p50 = 0.0\n            p90 = 0.0\n            if latency_list:\n                p50 = np.percentile(latency_list, 50)\n                p90 = np.percentile(latency_list, 90)\n            print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90))\n            last_num_infer = current_num_infer\n            time.sleep(args.throughput_interval)\n\n    executor = futures.ThreadPoolExecutor(max_workers=(args.num_sessions*args.num_threads)+1)\n    performance_future = executor.submit(performance)\n    eval_futures = []\n    for idx, model_feed_dict in enumerate(model_feed_dict_list):\n        eval_fut = executor.submit(predict, predictor_list[idx%len(predictor_list)], model_feed_dict)\n        eval_futures.append(eval_fut)\n    waited_results = []\n    for idx, eval_fut in enumerate(eval_futures):\n        if idx % 100 == 0:\n            print('evaluating image {}/{}'.format(idx, len(eval_futures)))\n        waited_results.append(eval_fut.result())\n    eval_results = []\n    for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)):\n        boxes = results['boxes']\n        for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]):\n            res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]]  # +1 to account for background\n            eval_results.append(res)\n    performance_future.result()\n\n    coco_gt = COCO(annotation_file=args.instances_val2017_json)\n    coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32))\n    coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')\n    coco_eval.evaluate()\n    coco_eval.accumulate()\n    coco_eval.summarize()\n    if args.save_results is not None:\n        np.save(args.save_results, coco_eval.stats)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/ssd300_demo/ssd300_evaluation_client.py",
    "content": "import argparse\nimport os\nimport json\nimport glob\nfrom concurrent import futures\nimport time\nimport subprocess\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nimport grpc\nfrom tensorflow_serving.apis import predict_pb2\nfrom tensorflow_serving.apis import prediction_service_pb2_grpc\nfrom pycocotools.cocoeval import COCOeval\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer\nfrom DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection\n\n\ndef get_val_dataset(val_annotate, val_coco_root):\n    dboxes = dboxes300_coco()\n    val_trans = SSDTransformer(dboxes, (300, 300), val=True)\n    val_coco = COCODetection(val_coco_root, val_annotate, val_trans)\n    return val_coco\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--server_address', default='localhost:8500', help='tensorflow-model-server-neuron grpc address')\n    parser.add_argument('--model_name', default='default', help='Serving model name')\n    parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset')\n    parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information')\n    parser.add_argument('--num_threads', type=int, default=4, help='Number of threads')\n    parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput')\n    parser.add_argument('--save_results', default=None)\n    args = parser.parse_args()\n\n    channel = grpc.insecure_channel(args.server_address)\n    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n\n    val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017)\n    inv_map = {v: k for k, v in val_dataset.label_map.items()}\n    request_list = []\n    for img_id in val_dataset.img_keys:\n        img_path = os.path.join(args.val2017, val_dataset.images[img_id][0])\n        with open(img_path, 'rb') as f:\n            img_jpg_bytes = f.read()\n        data = np.array([img_jpg_bytes], dtype=object)\n        data = tf.contrib.util.make_tensor_proto(data, shape=data.shape)\n        request = predict_pb2.PredictRequest()\n        request.model_spec.name = args.model_name\n        request.inputs['batch_image'].CopyFrom(data)\n        request_list.append(request)\n\n    latency_list = []\n    throughput_list = []\n    def predict(request):\n        start = time.time()\n        result = stub.Predict(request).outputs\n        latency_list.append(time.time() - start)\n        return result\n\n    def performance():\n        last_num_infer = len(latency_list)\n        while len(latency_list) < len(request_list):\n            current_num_infer = len(latency_list)\n            throughput = (current_num_infer - last_num_infer) / args.throughput_interval\n            throughput_list.append(throughput)\n            p50 = 0.0\n            p90 = 0.0\n            if latency_list:\n                p50 = np.percentile(latency_list, 50)\n                p90 = np.percentile(latency_list, 90)\n            print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90))\n            last_num_infer = current_num_infer\n            time.sleep(args.throughput_interval)\n\n    executor = 
futures.ThreadPoolExecutor(max_workers=args.num_threads+1)\n    performance_future = executor.submit(performance)\n    eval_futures = []\n    for idx, request in enumerate(request_list):\n        eval_fut = executor.submit(predict, request)\n        eval_futures.append(eval_fut)\n    waited_results = []\n    for idx, eval_fut in enumerate(eval_futures):\n        if idx % 100 == 0:\n            print('evaluating image {}/{}'.format(idx, len(eval_futures)))\n        waited_results.append(eval_fut.result())\n    eval_results = []\n    for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)):\n        results = {key: tf.make_ndarray(value) for key, value in results.items()}\n        boxes = results['boxes']\n        for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]):\n            res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]]  # +1 to account for background\n            eval_results.append(res)\n    performance_future.result()\n\n    coco_gt = COCO(annotation_file=args.instances_val2017_json)\n    coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32))\n    coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')\n    coco_eval.evaluate()\n    coco_eval.accumulate()\n    coco_eval.summarize()\n    if args.save_results is not None:\n        np.save(args.save_results, coco_eval.stats)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/ssd300_demo/ssd300_model.py",
    "content": "import sys\nimport os\nimport argparse\nimport time\nimport itertools\nfrom functools import partial\nfrom collections import Counter\nimport json\nimport shutil\nimport pkg_resources\nfrom distutils.version import LooseVersion\nimport numpy as np\nimport tensorflow as tf\nfrom tensorflow.core.framework import attr_value_pb2\nimport tensorflow.neuron as tfn\nimport torch\n\n\ndef decode_jpeg_resize(input_tensor, image_size):\n    # decode jpeg\n    tensor = tf.image.decode_png(input_tensor, channels=3)\n\n    # resize\n    decoded_shape = tf.shape(tensor)\n    tensor = tf.cast(tensor, tf.float32)\n    decoded_shape_hw = decoded_shape[0:2]\n    decoded_shape_hw_float32 = tf.cast(decoded_shape_hw, tf.float32)\n    tensor = tf.image.resize(tensor, image_size)\n\n    # normalize\n    tensor -= np.array([0.485, 0.456, 0.406]).astype(np.float32) * 255.0\n    return tensor, decoded_shape_hw_float32[::-1]\n\n\ndef preprocessor(input_tensor, image_size):\n    with tf.name_scope('Preprocessor'):\n        tensor, bbox_scale_hw = tf.map_fn(\n            partial(decode_jpeg_resize, image_size=image_size), input_tensor,\n            dtype=(tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n    return tensor, bbox_scale_hw\n\n\ndef tf_Conv2d(input_tensor, module, first_conv=False):\n    np_dtype = input_tensor.dtype.as_numpy_dtype\n    kernel_np = module.weight.detach().numpy().transpose([2, 3, 1, 0])\n    if first_conv:\n        kernel_np /= (np.array([0.229, 0.224, 0.225]).astype(np.float32) * 255.0)[:, np.newaxis]\n    kernel = tf.constant(kernel_np.astype(np_dtype))\n    if any(module.padding):\n        pad_h, pad_w = module.padding\n        padding = [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]]\n        input_tensor = tf.pad(input_tensor, padding)\n    stride_h, stride_w = module.stride\n    tensor = tf.nn.conv2d(input_tensor, kernel, strides=[1, stride_h, stride_w, 1], padding='VALID')\n    if module.bias is not None:\n        bias = tf.constant(module.bias.detach().numpy().astype(np_dtype))\n        tensor = tf.nn.bias_add(tensor, bias)\n    return tensor\n\ndef tf_BatchNorm2d(input_tensor, module):\n    def _norm_np(ts):\n        return ts.astype(input_tensor.dtype.as_numpy_dtype)\n    mean = _norm_np(module.running_mean.detach().numpy())\n    offset = _norm_np(module.bias.detach().numpy())\n    inv_std = np.sqrt(module.running_var.detach().numpy() + module.eps)\n    scale_inv_std = _norm_np(module.weight.detach().numpy() / inv_std)\n    return scale_inv_std * (input_tensor - mean) + offset\n\ndef tf_MaxPool2d(input_tensor, module):\n    pad = module.padding\n    tensor = tf.pad(input_tensor, [[0, 0], [pad, pad], [pad, pad], [0, 0]])\n    return tf.nn.max_pool2d(tensor, ksize=module.kernel_size, strides=module.stride, padding='VALID')\n\ndef tf_Bottleneck(input_tensor, module):\n    tensor = tf_Conv2d(input_tensor, module.conv1)\n    tensor = tf_BatchNorm2d(tensor, module.bn1)\n    tensor = tf.nn.relu(tensor)\n    tensor = tf_Conv2d(tensor, module.conv2)\n    tensor = tf_BatchNorm2d(tensor, module.bn2)\n    tensor = tf.nn.relu(tensor)\n    tensor = tf_Conv2d(tensor, module.conv3)\n    tensor = tf_BatchNorm2d(tensor, module.bn3)\n    if module.downsample is not None:\n        input_tensor = tf_Conv2d(input_tensor, module.downsample[0])\n        input_tensor = tf_BatchNorm2d(input_tensor, module.downsample[1])\n    return tf.nn.relu(input_tensor + tensor)\n\ndef tf_SequentialBottleneck(tensor, seq, resnet):\n    with tf.name_scope('{}.Sequential'.format(seq)):\n  
      for idx, module in enumerate(resnet[seq]):\n            with tf.name_scope('{}.BasicBlock'.format(idx)):\n                tensor = tf_Bottleneck(tensor, module)\n    return tensor\n\ndef tf_bbox_view(detection_feed, modules, ndim):\n    results = []\n    for idx, (tensor, mod) in enumerate(zip(detection_feed, modules)):\n        with tf.name_scope('branch{}'.format(idx)):\n            tensor = tf_Conv2d(tensor, mod)\n            tensor = tf.transpose(tensor, [0, 3, 1, 2])\n            tensor = tf.cast(tensor, tf.float32)\n\n            shape = tensor.shape.as_list()\n            batch_size = -1 if shape[0] is None else shape[0]\n            new_shape = [batch_size, ndim, np.prod(shape[1:]) // ndim]\n            results.append(tf.reshape(tensor, new_shape))\n    tensor = tf.concat(results, axis=-1)\n    return tensor\n\n\ndef tf_feature_extractor(input_tensor, resnet):\n    with tf.name_scope('FeatureExtractor'):\n        with tf.name_scope('0.Conv2d'):\n            tensor = tf_Conv2d(input_tensor, resnet[0], first_conv=True)\n        with tf.name_scope('1.BatchNorm2d'):\n            tensor = tf_BatchNorm2d(tensor, resnet[1])\n        with tf.name_scope('2.ReLU'):\n            tensor = tf.nn.relu(tensor)\n        with tf.name_scope('3.MaxPool2d'):\n            tensor = tf_MaxPool2d(tensor, resnet[3])\n        tensor = tf_SequentialBottleneck(tensor, 4, resnet)\n        tensor = tf_SequentialBottleneck(tensor, 5, resnet)\n        tensor = tf_SequentialBottleneck(tensor, 6, resnet)\n        tensor = tf.cast(tensor, tf.float16)\n    return tensor\n\n\ndef tf_box_predictor(tensor, ssd300_torch):\n    with tf.name_scope('BoxPredictor'):\n        detection_feed = [tensor]\n        for idx, block in enumerate(ssd300_torch.additional_blocks):\n            with tf.name_scope('{}.Sequential'.format(idx)):\n                tensor = tf_Conv2d(tensor, block[0])\n                tensor = tf_BatchNorm2d(tensor, block[1])\n                tensor = tf.nn.relu(tensor)\n                tensor = tf_Conv2d(tensor, block[3])\n                tensor = tf_BatchNorm2d(tensor, block[4])\n                tensor = tf.nn.relu(tensor)\n                detection_feed.append(tensor)\n        with tf.name_scope('Boxes'):\n            loc = tf_bbox_view(detection_feed, ssd300_torch.loc, ndim=4)\n        with tf.name_scope('Probabilities'):\n            conf = tf_bbox_view(detection_feed, ssd300_torch.conf, ndim=ssd300_torch.label_num)\n    return loc, conf\n\n\n@tfn.fuse(batch_size=1, dynamic_batch_size=True)\ndef tf_ssd300(input_tensor, ssd300_torch):\n    with tf.name_scope('SSD300'):\n        tensor = tf_feature_extractor(input_tensor, ssd300_torch.feature_extractor.feature_extractor)\n        loc, conf = tf_box_predictor(tensor, ssd300_torch)\n    return loc, conf\n\n\ndef scale_back_batch(bboxes_in, scores_in, scale_xy, scale_wh, dboxes_xywh):\n    \"\"\"\n        Do scale and transform from xywh to ltrb\n        suppose input Nx4xnum_bbox Nxlabel_numxnum_bbox\n    \"\"\"\n    with tf.name_scope('ScaleBackBatch'):\n        bboxes_in = tf.transpose(bboxes_in, [0, 2, 1])\n        scores_in = tf.transpose(scores_in, [0, 2, 1])\n\n        bboxes_xy = bboxes_in[:, :, :2]\n        bboxes_wh = bboxes_in[:, :, 2:]\n        bboxes_xy *= scale_xy\n        bboxes_wh *= scale_wh\n\n        bboxes_xy = bboxes_xy * dboxes_xywh[:, :, 2:] + dboxes_xywh[:, :, :2]\n        bboxes_wh = tf.exp(bboxes_wh) * dboxes_xywh[:, :, 2:]\n\n        bboxes_wh_half = 0.5 * bboxes_wh\n        bboxes_lt = bboxes_xy - bboxes_wh_half\n        
bboxes_rb = bboxes_xy + bboxes_wh_half\n\n        bboxes_in = tf.concat([bboxes_lt, bboxes_rb], axis=-1)\n\n        return bboxes_in, tf.nn.softmax(scores_in, axis=-1)\n\ndef select_nms_outputs(input_tensors):\n    boxes_xywh, scores, classes, valid_detections = input_tensors\n    return boxes_xywh[:valid_detections], scores[:valid_detections], classes[:valid_detections]\n\ndef postprocessor(ploc_ts, plabel_ts, bbox_scale_hw_ts, scale_xy, scale_wh, dboxes_xywh):\n    with tf.name_scope('Postprocessor'):\n        ploc_ts = tf.cast(ploc_ts, tf.float32)\n        plabel_ts = tf.cast(plabel_ts, tf.float32)\n        bboxes_ts, probs_ts = scale_back_batch(ploc_ts, plabel_ts, scale_xy, scale_wh, dboxes_xywh)\n        bboxes_ts = bboxes_ts[:, :, tf.newaxis, :]\n        probs_ts = probs_ts[:, :, 1:]\n        nms_outputs = tf.image.combined_non_max_suppression(\n            bboxes_ts,\n            probs_ts,\n            max_output_size_per_class=200,\n            max_total_size=200,\n            iou_threshold=0.5,\n            score_threshold=0.05,\n            pad_per_class=False,\n            clip_boxes=False,\n            name='CombinedNonMaxSuppression',\n        )\n        nmsed_boxes_x0y0x1y1, nmsed_scores, nmsed_classes, valid_detections = nms_outputs\n        nmsed_boxes_x0y0 = nmsed_boxes_x0y0x1y1[..., :2]\n        nmsed_boxes_x1y1 = nmsed_boxes_x0y0x1y1[..., 2:]\n        bbox_scale_hw_ts = bbox_scale_hw_ts[:, tf.newaxis, :]\n        nmsed_boxes_xy = nmsed_boxes_x0y0 * bbox_scale_hw_ts\n        nmsed_boxes_wh = (nmsed_boxes_x1y1 - nmsed_boxes_x0y0) * bbox_scale_hw_ts\n        nmsed_boxes_xywh = tf.concat([nmsed_boxes_xy, nmsed_boxes_wh], axis=-1)\n        nmsed_boxes_xywh, nmsed_scores, nmsed_classes = tf.map_fn(\n            select_nms_outputs, (nmsed_boxes_xywh, nmsed_scores, nmsed_classes, valid_detections),\n            dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n    return nmsed_boxes_xywh, nmsed_scores, nmsed_classes\n\n\nclass DefaultBoxes(object):\n\n    def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios,\n                 scale_xy=0.1, scale_wh=0.2):\n\n        self.feat_size = feat_size\n        self.fig_size = fig_size\n\n        self.scale_xy_ = scale_xy\n        self.scale_wh_ = scale_wh\n\n        # According to https://github.com/weiliu89/caffe\n        # Calculation method slightly different from paper\n        self.steps = steps\n        self.scales = scales\n\n        fk = fig_size/np.array(steps)\n        self.aspect_ratios = aspect_ratios\n\n        self.default_boxes = []\n        # size of feature and number of feature\n        for idx, sfeat in enumerate(self.feat_size):\n\n            sk1 = scales[idx]/fig_size\n            sk2 = scales[idx+1]/fig_size\n            sk3 = np.sqrt(sk1*sk2)\n            all_sizes = [(sk1, sk1), (sk3, sk3)]\n\n            for alpha in aspect_ratios[idx]:\n                w, h = sk1*np.sqrt(alpha), sk1/np.sqrt(alpha)\n                all_sizes.append((w, h))\n                all_sizes.append((h, w))\n            for w, h in all_sizes:\n                for i, j in itertools.product(range(sfeat), repeat=2):\n                    cx, cy = (j+0.5)/fk[idx], (i+0.5)/fk[idx]\n                    self.default_boxes.append((cx, cy, w, h))\n\n        self.dboxes = np.array(self.default_boxes)\n        self.dboxes = self.dboxes.clip(min=0, max=1)\n        # For IoU calculation\n        self.dboxes_ltrb = self.dboxes.copy()\n        self.dboxes_ltrb[:, 0] = self.dboxes[:, 0] - 0.5 * 
self.dboxes[:, 2]\n        self.dboxes_ltrb[:, 1] = self.dboxes[:, 1] - 0.5 * self.dboxes[:, 3]\n        self.dboxes_ltrb[:, 2] = self.dboxes[:, 0] + 0.5 * self.dboxes[:, 2]\n        self.dboxes_ltrb[:, 3] = self.dboxes[:, 1] + 0.5 * self.dboxes[:, 3]\n\n    @property\n    def scale_xy(self):\n        return self.scale_xy_\n\n    @property\n    def scale_wh(self):\n        return self.scale_wh_\n\n    def __call__(self, order=\"ltrb\"):\n        if order == \"ltrb\": return self.dboxes_ltrb\n        if order == \"xywh\": return self.dboxes\n\n\ndef dboxes300_coco():\n    figsize = 300\n    feat_size = [38, 19, 10, 5, 3, 1]\n    steps = [8, 16, 32, 64, 100, 300]\n    # use the scales here: https://github.com/amdegroot/ssd.pytorch/blob/master/data/config.py\n    scales = [21, 45, 99, 153, 207, 261, 315]\n    aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]\n    dboxes = DefaultBoxes(figsize, feat_size, steps, scales, aspect_ratios)\n    return dboxes\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--torch_checkpoint', required=True, help='Path to PyTorch SSD300 model checkpoint')\n    parser.add_argument('--output_saved_model', required=True, help='Output TensorFlow SavedModel that runs on Inferentia')\n    parser.add_argument('--disable_version_check', action='store_true')\n    args = parser.parse_args()\n    if os.path.exists(args.output_saved_model):\n        raise OSError('SavedModel dir {} already exists'.format(args.output_saved_model))\n\n    if not args.disable_version_check:\n        neuroncc_version = LooseVersion(pkg_resources.get_distribution('neuron-cc').version)\n        if neuroncc_version < LooseVersion('1.0.18000'):\n            raise RuntimeError(\n                'neuron-cc version {} is too low for this demo. Please upgrade '\n                'by \"pip install -U neuron-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com\"'.format(neuroncc_version))\n        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)\n        if tfn_version < LooseVersion('1.15.3.1.0.1900.0'):\n            raise RuntimeError(\n                'tensorflow-neuron version {} is too low for this demo. 
Please upgrade '\n                'by \"pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com\"'.format(tfn_version))\n\n    sys.path.append(os.getcwd())\n    from DeepLearningExamples.PyTorch.Detection.SSD.src import model as torch_ssd300_model\n    ssd300_torch = torch_ssd300_model.SSD300()\n    ckpt = torch.load(args.torch_checkpoint, map_location=torch.device('cpu'))\n    ssd300_torch.load_state_dict(ckpt['model'])\n    ssd300_torch.eval()\n\n    input_tensor = tf.placeholder(tf.string, [None])\n    image_tensor, bbox_scale_hw_tensor = preprocessor(input_tensor, [300, 300])\n\n    dboxes = dboxes300_coco()\n    dboxes_xywh = dboxes(order=\"xywh\")[np.newaxis, ...]\n\n    ploc_tensor, plabel_tensor = tf_ssd300(image_tensor, ssd300_torch)\n    boxes_tensor, scores_tensor, classes_tensor = postprocessor(\n        ploc_tensor, plabel_tensor, bbox_scale_hw_tensor, dboxes.scale_xy, dboxes.scale_wh, dboxes_xywh)\n    outputs = {\n        'boxes': boxes_tensor,\n        'scores': scores_tensor,\n        'classes': classes_tensor,\n    }\n\n    sess = tf.Session()\n    try:\n        sess.run(outputs)\n    except:\n        pass\n\n    for op in sess.graph.get_operations():\n        if op.type == 'NeuronOp':\n            if not op.get_attr('executable'):\n                raise AttributeError(\n                    'Neuron executable (neff) is empty. Please check neuron-cc is installed and working properly '\n                    '(\"pip install neuron-cc --force --extra-index-url=https://pip.repos.neuron.amazonaws.com\" '\n                    'to force reinstall neuron-cc).')\n            model_config = op.node_def.attr['model_config'].list\n            if model_config.i:\n                model_config.i[0] = 1\n            else:\n                model_config.i.extend([1, 1, 1, 10])\n            op._set_attr('model_config', attr_value_pb2.AttrValue(list=model_config))\n    tf.saved_model.simple_save(sess, args.output_saved_model, {'batch_image': input_tensor}, outputs)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e91cf83b\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Running Huggingface Roberta-Base with TensorFlow-NeuronX\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"71394e1e\",\n   \"metadata\": {},\n   \"source\": [\n    \"This tutorial demonstrates how to compile the Huggingface roberta-base model and infer on a trn1.2xlarge instance with \\n\",\n    \"```tensorflow-neuronx```. To compile larger models like roberta-large, please consider using an inf2 instance.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"828ef9bd\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Setup\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5becc549\",\n   \"metadata\": {},\n   \"source\": [\n    \"To run this tutorial please follow the instructions for [TensorFlow-NeuronX Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html) and the [Jupyter Notebook Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html) and set your kernel to \\\"Python (tensorflow-neuronx)\\\".\\n\",\n    \"\\n\",\n    \"Next, install some additional dependencies.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ee1a3b84\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\\n\",\n    \"!pip install transformers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"c301cfce\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Download From Huggingface and Compile for AWS-Neuron\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"92e8050d\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow_neuronx as tfnx\\n\",\n    \"from transformers import RobertaTokenizer, TFRobertaModel\\n\",\n    \"from transformers import BertTokenizer, TFBertModel\\n\",\n    \"\\n\",\n    \"# Create a wrapper for the roberta model that will accept inputs as a list\\n\",\n    \"# instead of a dictionary. 
This will allow the compiled model to be saved\\n\",\n    \"# to disk with the model.save() function.\\n\",\n    \"class RobertaWrapper(tf.keras.Model):\\n\",\n    \"    def __init__(self, model):\\n\",\n    \"        super().__init__()\\n\",\n    \"        self.model = model\\n\",\n    \"    def __call__(self, example_inputs):\\n\",\n    \"        return self.model({'input_ids' : example_inputs[0], 'attention_mask' : example_inputs[1]})\\n\",\n    \"        \\n\",\n    \"\\n\",\n    \"tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\\n\",\n    \"model = RobertaWrapper(TFRobertaModel.from_pretrained('roberta-base'))\\n\",\n    \"\\n\",\n    \"batch_size = 16\\n\",\n    \"\\n\",\n    \"# create example inputs with a batch size of 16\\n\",\n    \"text = [\\\"Paris is the <mask> of France.\\\"] * batch_size\\n\",\n    \"encoded_input = tokenizer(text, return_tensors='tf', padding='max_length', max_length=64)\\n\",\n    \"\\n\",\n    \"# turn inputs into a list\\n\",\n    \"example_input = [encoded_input['input_ids'], encoded_input['attention_mask']]\\n\",\n    \"\\n\",\n    \"# compile\\n\",\n    \"model_neuron = tfnx.trace(model, example_input)\\n\",\n    \"\\n\",\n    \"print(\\\"Running on neuron:\\\", model_neuron(example_input))\\n\",\n    \"\\n\",\n    \"# save the model to disk to save recompilation time for next usage\\n\",\n    \"model_neuron.save('./roberta-neuron-b16')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"0f2e159a\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Run Basic Inference Benchmarking\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"ccf22e74\",\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import numpy as np\\n\",\n    \"import concurrent.futures\\n\",\n    \"import time\\n\",\n    \"\\n\",\n    \"reloaded_neuron_model = tf.keras.models.load_model('./roberta-neuron-b16')\\n\",\n    \"print(\\\"Reloaded model running on neuron:\\\", reloaded_neuron_model(example_input))\\n\",\n    \"\\n\",\n    \"num_threads = 4\\n\",\n    \"num_inferences = 1000\\n\",\n    \"\\n\",\n    \"latency_list = []\\n\",\n    \"def inference_with_latency_calculation(example_input):\\n\",\n    \"    global latency_list\\n\",\n    \"    start = time.time()\\n\",\n    \"    result = reloaded_neuron_model(example_input)\\n\",\n    \"    end = time.time()\\n\",\n    \"    latency_list.append((end-start) * 1000)\\n\",\n    \"    return result\\n\",\n    \"\\n\",\n    \"start = time.time()\\n\",\n    \"with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:\\n\",\n    \"    futures = []\\n\",\n    \"    for i in range(num_inferences):\\n\",\n    \"        futures.append(executor.submit(inference_with_latency_calculation, example_input))\\n\",\n    \"    for future in concurrent.futures.as_completed(futures):\\n\",\n    \"        get_result = future.result()\\n\",\n    \"end = time.time()\\n\",\n    \"\\n\",\n    \"total_time = end - start\\n\",\n    \"\\n\",\n    \"print(f\\\"Throughput was {(num_inferences * batch_size)/total_time} samples per second.\\\")\\n\",\n    \"print(f\\\"Latency p50 was {np.percentile(latency_list, 50)} ms\\\")\\n\",\n    \"print(f\\\"Latency p90 was {np.percentile(latency_list, 90)} ms\\\")\\n\",\n    \"print(f\\\"Latency p99 was {np.percentile(latency_list, 99)} ms\\\")\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python (Neuron TensorFlow)\",\n   \"language\": \"python\",\n   \"name\": 
\"aws_neuron_venv_tf\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.10\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 5\n}\n"
  },
  {
    "path": "src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"a3bskVXPvchm\"\n   },\n   \"source\": [\n    \"# Running ResNet50 on Inferentia\\n\",\n    \"## Note: this tutorial runs on tensorflow-neuron 1.x only\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\"\n   },\n   \"source\": [\n    \"## Introduction:\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"Rb5rSpcZvYbX\"\n   },\n   \"source\": [\n    \"In this tutorial we will compile and deploy ResNet50 model for Inferentia.\\n\",\n    \"In this tutorial we provide two main sections:\\n\",\n    \"1. Compile the ResNet50 model.\\n\",\n    \"2. Infer the same compiled model.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\\n\",\n    \"\\n\",\n    \"Instructions of how to setup Neuron Tensorflow environment and run the tutorial as a Jupyter notebook are available in the [Tensorflow Quick Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.html#tensorflow-tutorial-setup)\\n\",\n    \"\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"vscode\": {\n     \"languageId\": \"shellscript\"\n    }\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\\n\",\n    \"!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"E8FhiMivhcYB\"\n   },\n   \"source\": [\n    \"## Compile for Neuron\\n\",\n    \"\\n\",\n    \"A trained model must be compiled to Inferentia target before it can be deployed on Inferentia instances. 
In this step we compile the Keras ResNet50 model and export it as a SavedModel which is an interchange format for TensorFlow models.\\n\",\n    \"At the end of compilation, the compiled SavedModel is saved in resnet50_neuron local directory:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import time\\n\",\n    \"import shutil\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow.neuron as tfn\\n\",\n    \"import tensorflow.compat.v1.keras as keras\\n\",\n    \"from tensorflow.keras.applications.resnet50 import ResNet50\\n\",\n    \"from tensorflow.keras.applications.resnet50 import preprocess_input\\n\",\n    \"\\n\",\n    \"# Create a workspace\\n\",\n    \"WORKSPACE = './ws_resnet50'\\n\",\n    \"os.makedirs(WORKSPACE, exist_ok=True)\\n\",\n    \"\\n\",\n    \"# Prepare export directory (old one removed)\\n\",\n    \"model_dir = os.path.join(WORKSPACE, 'resnet50')\\n\",\n    \"compiled_model_dir = os.path.join(WORKSPACE, 'resnet50_neuron')\\n\",\n    \"shutil.rmtree(model_dir, ignore_errors=True)\\n\",\n    \"shutil.rmtree(compiled_model_dir, ignore_errors=True)\\n\",\n    \"\\n\",\n    \"# Instantiate Keras ResNet50 model\\n\",\n    \"keras.backend.set_learning_phase(0)\\n\",\n    \"keras.backend.set_image_data_format('channels_last')\\n\",\n    \"\\n\",\n    \"model = ResNet50(weights='imagenet')\\n\",\n    \"\\n\",\n    \"# Export SavedModel\\n\",\n    \"tf.saved_model.simple_save(\\n\",\n    \"    session            = keras.backend.get_session(),\\n\",\n    \"    export_dir         = model_dir,\\n\",\n    \"    inputs             = {'input': model.inputs[0]},\\n\",\n    \"    outputs            = {'output': model.outputs[0]})\\n\",\n    \"\\n\",\n    \"# Compile using Neuron\\n\",\n    \"tfn.saved_model.compile(model_dir, compiled_model_dir)\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"colab_type\": \"text\",\n    \"id\": \"I52jQOyO8vAn\"\n   },\n   \"source\": [\n    \"## Deploy on Inferentia\\n\",\n    \"\\n\",\n    \"Using same instance to deploy the model.\\n\",\n    \"In case of different deployment instance, launch a deployment inf1 instance and copy compiled model to the deployment inf1 instance.\\n\",\n    \"\\n\",\n    \"Download the example image, and install pillow module for inference on deployement instance\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\\n\",\n    \"!pip install pillow  # Necessary for loading images\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### After downloading the example image, run the inference.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import time\\n\",\n    \"import numpy as np\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from tensorflow.keras.preprocessing import image\\n\",\n    \"from tensorflow.keras.applications import resnet50\\n\",\n    \"\\n\",\n    \"tf.keras.backend.set_image_data_format('channels_last')\\n\",\n    \"\\n\",\n    
\"# Create input from image\\n\",\n    \"img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))\\n\",\n    \"img_arr = image.img_to_array(img_sgl)\\n\",\n    \"img_arr2 = np.expand_dims(img_arr, axis=0)\\n\",\n    \"img_arr3 = resnet50.preprocess_input(img_arr2)\\n\",\n    \"\\n\",\n    \"# Load model\\n\",\n    \"COMPILED_MODEL_DIR = './ws_resnet50/resnet50_neuron/'\\n\",\n    \"predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR)\\n\",\n    \"\\n\",\n    \"# Run inference\\n\",\n    \"model_feed_dict={'input': img_arr3}\\n\",\n    \"infa_rslts = predictor_inferentia(model_feed_dict);\\n\",\n    \"\\n\",\n    \"# Display results\\n\",\n    \"print(resnet50.decode_predictions(infa_rslts[\\\"output\\\"], top=5)[0])\\n\",\n    \"\\n\",\n    \"# Sample output will look like below:\\n\",\n    \"#[('n02123045', 'tabby', 0.68817204), ('n02127052', 'lynx', 0.12701613), ('n02123159', 'tiger_cat', 0.08736559), ('n02124075', 'Egyptian_cat', 0.063844085), ('n02128757', 'snow_leopard', 0.009240591)]\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"colab\": {\n   \"default_view\": {},\n   \"name\": \"Untitled\",\n   \"provenance\": [],\n   \"version\": \"0.3.2\",\n   \"views\": {}\n  },\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.2\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 1\n}\n"
  },
  {
    "path": "src/examples/tensorflow/tensorflow_serving_tutorial.rst",
    "content": ".. _tensorflow-serving-neuronrt-visible-cores:\n\nUsing NEURON_RT_VISIBLE_CORES with TensorFlow Serving\n=====================================================\n\nTensorFlow serving allows customers to scale-up inference workloads\nacross a network. TensorFlow Neuron Serving uses the same API as normal\nTensorFlow Serving with two differences: (a) the saved model must be\ncompiled for Inferentia and (b) the entry point is a different binary\nnamed ``tensorflow_model_server_neuron``. Follow the steps below \nto install the package using apt-get or yum. This will be pre-installed in a future relase.\n\nInstall TensorFlow Model Server and Serving API\n-----------------------------------------------\n\nFollow the steps in the :ref:`install-neuron-tensorflow`.\n\nThen ensure you install using either apt-get or yum.\nIf using TF 1.x, install the appropriate version (see above).:\n\n.. code:: bash\n\n   sudo apt-get install tensorflow-model-server-neuron\n\nor\n\n.. code:: bash\n\n   sudo dnf install tensorflow-model-server-neuron\n\nAlso, you would need TensorFlow Serving API (use --no-deps to prevent\ninstallation of regular tensorflow). Depending on the version of Tensorflow\nyou wish to use:\n\nFor Tensorflow 1.x:\n\n.. code:: bash\n\n   pip install --no-deps tensorflow_serving_api==1.15\n\nFor Tensorflow 2.x:\n\n.. code:: bash\n\n   pip install --no-deps tensorflow_serving_api\n\nFor the example image preprocessing using Keras preprocessing, the\nPython Imaging Library Pillow is required:\n\n.. code:: bash\n\n   pip install pillow\n\nTo workaround h5py issue https://github.com/aws/aws-neuron-sdk/issues/220:\n\n.. code:: bash\n\n   pip install \"h5py<3.0.0\"\n\n\nExport and Compile Saved Model\n------------------------------\n\nThe following example shows graph construction followed by the addition\nof Neuron compilation step before exporting to saved model.\n\nFor Tensorflow 1.x:\n\n.. code:: python\n\n   import tensorflow as tf\n   import tensorflow.neuron\n\n   tf.keras.backend.set_learning_phase(0)\n   tf.keras.backend.set_image_data_format('channels_last')\n   model = tf.keras.applications.ResNet50(weights='imagenet')\n   sess = tf.keras.backend.get_session()\n   inputs = {'input': model.inputs[0]}\n   outputs = {'output': model.outputs[0]}\n\n   # save the model using tf.saved_model.simple_save\n   modeldir = \"./resnet50/1\"\n   tf.saved_model.simple_save(sess, modeldir, inputs, outputs)\n\n   # compile the model for Inferentia\n   neuron_modeldir = \"./resnet50_inf1/1\"\n   tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=1)\n\nFor Tensorflow 2.x:\n\n.. code:: python\n\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n    import numpy as np\n\n    tf.keras.backend.set_learning_phase(0)\n    tf.keras.backend.set_image_data_format('channels_last')\n    image_sizes = [224, 224]\n    model = tf.keras.applications.ResNet50(weights='imagenet')\n    example_inputs = tf.random.uniform([1, *image_sizes, 3], dtype=tf.float32)\n\n    # run the model once to define the forward pass and allow for saving\n    model_neuron(example_inputs)\n    model_neuron = tfn.trace(model, example_inputs)\n    tf.keras.models.save_model(model_neuron, './resnet50_inf1/1')\n\n\n\nServing Saved Model\n-------------------\n\nUser can now serve the saved model with the\ntensorflow_model_server_neuron binary. To utilize multiple NeuronCores,\nit is recommended to launch multiple tensorflow model servers that\nlisten to the same gRPC port:\n\n.. 
code:: bash\n\n   export NEURON_RT_VISIBLE_CORES=0  # important to set this environment variable before launching model servers\n   tensorflow_model_server_neuron --model_name=resnet50_inf1 \\\n        --model_base_path=$(pwd)/resnet50_inf1/ --port=8500\n\n   #then to run another server on a different neuron core open another\n   #window and run this, except this time set NEURON_RT_VISIBLE_CORES=1\n   #you can keep doing this up to the number of Neuron Cores on your machine\n\n   export NEURON_RT_VISIBLE_CORES=1\n   tensorflow_model_server_neuron --model_name=resnet50_inf1 \\\n        --model_base_path=$(pwd)/resnet50_inf1/ --port=8500\n\nThe compiled model is staged in Inferentia DRAM by the server to prepare\nfor inference.\n\nGenerate inference requests to the model server\n-----------------------------------------------\n\nNow run inferences via GRPC as shown in the following sample client\ncode:\n\nFor Tensorflow 1.x:\n\n.. code:: python\n\n  import numpy as np\n  import grpc\n  import tensorflow as tf\n  from tensorflow.keras.preprocessing import image\n  from tensorflow.keras.applications.resnet50 import preprocess_input\n  from tensorflow.keras.applications.resnet50 import decode_predictions\n  from tensorflow_serving.apis import predict_pb2\n  from tensorflow_serving.apis import prediction_service_pb2_grpc\n\n  if __name__ == '__main__':\n      channel = grpc.insecure_channel('localhost:8500')\n      stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n      img_file = tf.keras.utils.get_file(\n          \"./kitten_small.jpg\",\n          \"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\")\n      img = image.load_img(img_file, target_size=(224, 224))\n      img_array = preprocess_input(image.img_to_array(img)[None, ...])\n      request = predict_pb2.PredictRequest()\n      request.model_spec.name = 'resnet50_inf1'\n      request.inputs['input'].CopyFrom(\n          tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))\n      result = stub.Predict(request)\n      prediction = tf.make_ndarray(result.outputs['output'])\n      print(decode_predictions(prediction))\n\nFor Tensorflow 2.x:\n\n.. 
code:: python\n\n    import numpy as np\n    import grpc\n    import tensorflow as tf\n    from tensorflow.keras.preprocessing import image\n    from tensorflow.keras.applications.resnet50 import preprocess_input\n    from tensorflow_serving.apis import predict_pb2\n    from tensorflow_serving.apis import prediction_service_pb2_grpc\n    from tensorflow.keras.applications.resnet50 import decode_predictions\n\n    tf.keras.backend.set_image_data_format('channels_last')\n\n    if __name__ == '__main__':\n        channel = grpc.insecure_channel('localhost:8500')\n        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)\n        img_file = tf.keras.utils.get_file(\n            \"./kitten_small.jpg\",\n            \"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\")\n        img = image.load_img(img_file, target_size=(224, 224))\n        img_array = preprocess_input(image.img_to_array(img)[None, ...])\n        request = predict_pb2.PredictRequest()\n        request.model_spec.name = 'resnet50_inf1'\n        request.inputs['input_1'].CopyFrom(\n            tf.make_tensor_proto(img_array, shape=img_array.shape))\n        result = stub.Predict(request)\n        prediction = tf.make_ndarray(result.outputs['output_1'])\n        print(decode_predictions(prediction))\n"
  },
  {
    "path": "src/examples/tensorflow/yolo_v3_demo/yolo_v3.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# [Broken] Evaluate YOLO v3 on Inferentia\\n\",\n    \"## Note: this tutorial runs on tensorflow-neuron 1.x only\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"This tutorial walks through compiling and evaluating YOLO v3 model on Inferentia using the AWS Neuron SDK.\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"In this tutorial we provide two main sections:\\n\",\n    \"\\n\",\n    \"1. Download Dataset and Generate Pretrained SavedModel\\n\",\n    \"\\n\",\n    \"2. Compile the YOLO v3 model.\\n\",\n    \"\\n\",\n    \"3. Deploy the same compiled model.\\n\",\n    \"\\n\",\n    \"Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\\n\",\n    \"\\n\",\n    \"Instructions of how to setup Neuron Tensorflow environment and run the tutorial as a Jupyter notebook are available in the Tutorial main page [Tensorflow-YOLO_v3 Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v3_demo/yolo_v3_demo.html)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This demo requires the following pip packages:\\n\",\n    \"\\n\",\n    \"`pillow matplotlib pycocotools`\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"%pip install tensorflow_neuron==1.15.5.2.8.9.0 neuron_cc==1.13.5.0 requests pillow matplotlib pycocotools==2.0.1 numpy==1.18.2 torch~=1.5.0 --force \\\\\\n\",\n    \"    --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 1:  Download Dataset and Generate Pretrained SavedModel\\n\",\n    \"### Download COCO 2017 validation dataset\\n\",\n    \"\\n\",\n    \"We start by downloading the COCO validation dataset, which we will use to validate our model. 
The COCO 2017 dataset is widely used for object-detection, segmentation and image captioning.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"!curl -LO http://images.cocodataset.org/zips/val2017.zip\\n\",\n    \"!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\\n\",\n    \"!unzip -q val2017.zip\\n\",\n    \"!unzip annotations_trainval2017.zip\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"\\n\",\n    \"## Generate YOLO v3 tensorflow SavedModel (pretrained on COCO 2017 dataset)\\n\",\n    \"\\n\",\n    \"Script yolo_v3_coco_saved_model.py will generate a tensorflow SavedModel using pretrained weights from https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"%run yolo_v3_coco_saved_model.py ./yolo_v3_coco_saved_model\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This tensorflow SavedModel can be loaded as a tensorflow predictor. When a JPEG format image is provided as input, the output result of the tensorflow predictor contains information for drawing bounding boxes and classification results.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from PIL import Image\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import matplotlib.patches as patches\\n\",\n    \"\\n\",\n    \"# launch predictor and run inference on an arbitrary image in the validation dataset\\n\",\n    \"yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v3_coco_saved_model')\\n\",\n    \"image_path = './val2017/000000581781.jpg'\\n\",\n    \"with open(image_path, 'rb') as f:\\n\",\n    \"    feeds = {'image': [f.read()]}\\n\",\n    \"results = yolo_pred_cpu(feeds)\\n\",\n    \"\\n\",\n    \"# load annotations to decode classification result\\n\",\n    \"with open('./annotations/instances_val2017.json') as f:\\n\",\n    \"    annotate_json = json.load(f)\\n\",\n    \"label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\\n\",\n    \"\\n\",\n    \"# draw picture and bounding boxes\\n\",\n    \"fig, ax = plt.subplots(figsize=(10, 10))\\n\",\n    \"ax.imshow(Image.open(image_path).convert('RGB'))\\n\",\n    \"wanted = results['scores'][0] > 0.1\\n\",\n    \"for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):\\n\",\n    \"    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]\\n\",\n    \"    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\\n\",\n    \"    ax.add_patch(rect)\\n\",\n    \"    rx, ry = rect.get_xy()\\n\",\n    \"    rx = rx + rect.get_width() / 2.0\\n\",\n    \"    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\\n\",\n    \"                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', 
alpha=0.5))\\n\",\n    \"plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 2:  Compile the Pretrained SavedModel for Neuron\\n\",\n    \"\\n\",\n    \"We make use of the Python compilation API `tfn.saved_model.compile` that is available in `tensorflow-neuron<2`. For the purpose of reducing Neuron runtime overhead, it is necessary to make use of arguments `no_fuse_ops` and `minimum_segment_size`.\\n\",\n    \"Compiled model is saved in ./yolo_v3_coco_saved_model_neuron.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import shutil\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow.neuron as tfn\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def no_fuse_condition(op):\\n\",\n    \"    return op.name.startswith('Preprocessor') or op.name.startswith('Postprocessor')\\n\",\n    \"\\n\",\n    \"with tf.Session(graph=tf.Graph()) as sess:\\n\",\n    \"    tf.saved_model.loader.load(sess, ['serve'], './yolo_v3_coco_saved_model')\\n\",\n    \"    no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]\\n\",\n    \"shutil.rmtree('./yolo_v3_coco_saved_model_neuron', ignore_errors=True)\\n\",\n    \"result = tfn.saved_model.compile(\\n\",\n    \"    './yolo_v3_coco_saved_model', './yolo_v3_coco_saved_model_neuron',\\n\",\n    \"    # to enforce trivial compilable subgraphs to run on CPU\\n\",\n    \"    no_fuse_ops=no_fuse_ops,\\n\",\n    \"    minimum_segment_size=100,\\n\",\n    \"    batch_size=2,\\n\",\n    \"    dynamic_batch_size=True,\\n\",\n    \")\\n\",\n    \"print(result)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Deploy the model on Inferentia\\n\",\n    \"## Part 3:Evaluate Model Quality after Compilation\\n\",\n    \"\\n\",\n    \"### Define evaluation functions\\n\",\n    \"We first define some handy helper functions for running evaluation on the COCO 2017 dataset.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import json\\n\",\n    \"import time\\n\",\n    \"import numpy as np\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from pycocotools.coco import COCO\\n\",\n    \"from pycocotools.cocoeval import COCOeval\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def cocoapi_eval(jsonfile,\\n\",\n    \"                 style,\\n\",\n    \"                 coco_gt=None,\\n\",\n    \"                 anno_file=None,\\n\",\n    \"                 max_dets=(100, 300, 1000)):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Args:\\n\",\n    \"        jsonfile: Evaluation json file, eg: bbox.json, mask.json.\\n\",\n    \"        style: COCOeval style, can be `bbox` , `segm` and `proposal`.\\n\",\n    \"        coco_gt: Whether to load COCOAPI through anno_file,\\n\",\n    \"                 eg: coco_gt = COCO(anno_file)\\n\",\n    \"        anno_file: COCO annotations file.\\n\",\n    \"        max_dets: COCO evaluation maxDets.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    assert coco_gt is not None or anno_file is not None\\n\",\n    \"\\n\",\n    \"    if coco_gt is None:\\n\",\n    \"        coco_gt = COCO(anno_file)\\n\",\n    \"    print(\\\"Start evaluate...\\\")\\n\",\n    \"    coco_dt = coco_gt.loadRes(jsonfile)\\n\",\n    \"    if style == 
'proposal':\\n\",\n    \"        coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')\\n\",\n    \"        coco_eval.params.useCats = 0\\n\",\n    \"        coco_eval.params.maxDets = list(max_dets)\\n\",\n    \"    else:\\n\",\n    \"        coco_eval = COCOeval(coco_gt, coco_dt, style)\\n\",\n    \"    coco_eval.evaluate()\\n\",\n    \"    coco_eval.accumulate()\\n\",\n    \"    coco_eval.summarize()\\n\",\n    \"    return coco_eval.stats\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def bbox_eval(anno_file, bbox_list):\\n\",\n    \"    coco_gt = COCO(anno_file)\\n\",\n    \"\\n\",\n    \"    outfile = 'bbox_detections.json'\\n\",\n    \"    print('Generating json file...')\\n\",\n    \"    with open(outfile, 'w') as f:\\n\",\n    \"        json.dump(bbox_list, f)\\n\",\n    \"\\n\",\n    \"    map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)\\n\",\n    \"    return map_stats\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_image_as_bytes(images, eval_pre_path):\\n\",\n    \"    batch_im_id_list = []\\n\",\n    \"    batch_im_name_list = []\\n\",\n    \"    batch_img_bytes_list = []\\n\",\n    \"    n = len(images)\\n\",\n    \"    batch_im_id = []\\n\",\n    \"    batch_im_name = []\\n\",\n    \"    batch_img_bytes = []\\n\",\n    \"    for i, im in enumerate(images):\\n\",\n    \"        im_id = im['id']\\n\",\n    \"        file_name = im['file_name']\\n\",\n    \"        if i % eval_batch_size == 0 and i != 0:\\n\",\n    \"            batch_im_id_list.append(batch_im_id)\\n\",\n    \"            batch_im_name_list.append(batch_im_name)\\n\",\n    \"            batch_img_bytes_list.append(batch_img_bytes)\\n\",\n    \"            batch_im_id = []\\n\",\n    \"            batch_im_name = []\\n\",\n    \"            batch_img_bytes = []\\n\",\n    \"        batch_im_id.append(im_id)\\n\",\n    \"        batch_im_name.append(file_name)\\n\",\n    \"\\n\",\n    \"        with open(os.path.join(eval_pre_path, file_name), 'rb') as f:\\n\",\n    \"            batch_img_bytes.append(f.read())\\n\",\n    \"    return batch_im_id_list, batch_im_name_list, batch_img_bytes_list\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def analyze_bbox(results, batch_im_id, _clsid2catid):\\n\",\n    \"    bbox_list = []\\n\",\n    \"    k = 0\\n\",\n    \"    for boxes, scores, classes in zip(results['boxes'], results['scores'], results['classes']):\\n\",\n    \"        if boxes is not None:\\n\",\n    \"            im_id = batch_im_id[k]\\n\",\n    \"            n = len(boxes)\\n\",\n    \"            for p in range(n):\\n\",\n    \"                clsid = classes[p]\\n\",\n    \"                score = scores[p]\\n\",\n    \"                xmin, ymin, xmax, ymax = boxes[p]\\n\",\n    \"                catid = (_clsid2catid[int(clsid)])\\n\",\n    \"                w = xmax - xmin + 1\\n\",\n    \"                h = ymax - ymin + 1\\n\",\n    \"\\n\",\n    \"                bbox = [xmin, ymin, w, h]\\n\",\n    \"                # Round to the nearest 10th to avoid huge file sizes, as COCO suggests\\n\",\n    \"                bbox = [round(float(x) * 10) / 10 for x in bbox]\\n\",\n    \"                bbox_res = {\\n\",\n    \"                    'image_id': im_id,\\n\",\n    \"                    'category_id': catid,\\n\",\n    \"                    'bbox': bbox,\\n\",\n    \"                    'score': float(score),\\n\",\n    \"                }\\n\",\n    \"                bbox_list.append(bbox_res)\\n\",\n    \"        k += 1\\n\",\n    \"    return bbox_list\"\n   ]\n  },\n  {\n   \"cell_type\": 
\"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Here is the actual evaluation loop. To fully utilize all four cores on one Inferentia, the optimal setup is to run multi-threaded inference using a `ThreadPoolExecutor`. The following cell is a multi-threaded adaptation of the evaluation routine at https://github.com/miemie2013/Keras-YOLOv4/blob/910c4c6f7265f5828fceed0f784496a0b46516bf/tools/cocotools.py#L97.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from concurrent import futures\\n\",\n    \"\\n\",\n    \"def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):\\n\",\n    \"    batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)\\n\",\n    \"\\n\",\n    \"    # warm up\\n\",\n    \"    yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})\\n\",\n    \"\\n\",\n    \"    with futures.ThreadPoolExecutor(4) as exe:\\n\",\n    \"        fut_im_list = []\\n\",\n    \"        fut_list = []\\n\",\n    \"        start_time = time.time()\\n\",\n    \"        for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):\\n\",\n    \"            if len(batch_img_bytes) != eval_batch_size:\\n\",\n    \"                continue\\n\",\n    \"            fut = exe.submit(yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})\\n\",\n    \"            fut_im_list.append((batch_im_id, batch_im_name))\\n\",\n    \"            fut_list.append(fut)\\n\",\n    \"        bbox_list = []\\n\",\n    \"        count = 0\\n\",\n    \"        for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):\\n\",\n    \"            results = fut.result()\\n\",\n    \"            bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))\\n\",\n    \"            for _ in batch_im_id:\\n\",\n    \"                count += 1\\n\",\n    \"                if count % 100 == 0:\\n\",\n    \"                    print('Test iter {}'.format(count))\\n\",\n    \"        print('==================== Performance Measurement ====================')\\n\",\n    \"        print('Finished inference on {} images in {} seconds'.format(len(images), time.time() - start_time))\\n\",\n    \"        print('=================================================================')\\n\",\n    \"    # start evaluation\\n\",\n    \"    box_ap_stats = bbox_eval(anno_file, bbox_list)\\n\",\n    \"    return box_ap_stats\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Evaluate mean average precision (mAP) score\\n\",\n    \"Here is the code to calculate mAP scores of the YOLO v3 model. 
The expected mAP score is around 0.328 if we use the pretrained weights.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v3_coco_saved_model_neuron')\\n\",\n    \"\\n\",\n    \"val_coco_root = './val2017'\\n\",\n    \"val_annotate = './annotations/instances_val2017.json'\\n\",\n    \"clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,\\n\",\n    \"               15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,\\n\",\n    \"               27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,\\n\",\n    \"               39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,\\n\",\n    \"               51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,\\n\",\n    \"               63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,\\n\",\n    \"               75: 86, 76: 87, 77: 88, 78: 89, 79: 90}\\n\",\n    \"eval_batch_size = 8\\n\",\n    \"with open(val_annotate, 'r', encoding='utf-8') as f2:\\n\",\n    \"    for line in f2:\\n\",\n    \"        line = line.strip()\\n\",\n    \"        dataset = json.loads(line)\\n\",\n    \"        images = dataset['images']\\n\",\n    \"box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3.8.9 64-bit\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.9\"\n  },\n  \"vscode\": {\n   \"interpreter\": {\n    \"hash\": \"31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6\"\n   }\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/tensorflow/yolo_v3_demo/yolo_v3_coco_saved_model.py",
    "content": "import argparse\nimport os\nimport urllib.request\nimport tempfile\nimport shutil\nfrom functools import partial\nimport numpy as np\nimport tensorflow as tf\n\n\nSTRIDES = [8, 16, 32]\nANCHORS = np.array([1.25,1.625, 2.0,3.75, 4.125,2.875, 1.875,3.8125, 3.875,2.8125, 3.6875,7.4375, 3.625,2.8125, 4.875,6.1875, 11.65625,10.1875]).astype(np.float32).reshape([3, 3, 2])\nANCHOR_PER_SCALE = 3\nBOX_SCORE_THRESH = 0.3\nUPSAMPLE_METHOD = \"resize\"\nNUM_CLASSES = 80\n\n\nclass YOLOV3(object):\n    \"\"\"Implement tensoflow yolov3 here\"\"\"\n    def __init__(self, input_data, input_size, trainable):\n\n        self.trainable        = trainable\n        self.num_class        = NUM_CLASSES\n        self.strides          = STRIDES\n        self.anchors          = ANCHORS\n        self.anchor_per_scale = ANCHOR_PER_SCALE\n        self.box_score_thresh = BOX_SCORE_THRESH\n        self.upsample_method  = UPSAMPLE_METHOD\n\n        input_data, decoded_shape = preprocessor(input_data, [input_size, input_size])\n        self.conv_lbbox, self.conv_mbbox, self.conv_sbbox = self.__build_nework(input_data)\n\n        def decode_boxes(bboxes_and_decoded_shape):\n            conv_lbbox, conv_mbbox, conv_sbbox, decoded_shape = bboxes_and_decoded_shape\n            conv_lbbox = tf.cast(conv_lbbox, tf.float32)\n            conv_mbbox = tf.cast(conv_mbbox, tf.float32)\n            conv_sbbox = tf.cast(conv_sbbox, tf.float32)\n            conv_lbbox = conv_lbbox[tf.newaxis, ...]\n            conv_mbbox = conv_mbbox[tf.newaxis, ...]\n            conv_sbbox = conv_sbbox[tf.newaxis, ...]\n            decoded_shape = decoded_shape[tf.newaxis, ...]\n            with tf.variable_scope('pred_sbbox'):\n                pred_sbbox_coors, pred_sbbox_class_scores = self.decode(conv_sbbox, self.anchors[0], self.strides[0], decoded_shape, input_size)\n\n            with tf.variable_scope('pred_mbbox'):\n                pred_mbbox_coors, pred_mbbox_class_scores = self.decode(conv_mbbox, self.anchors[1], self.strides[1], decoded_shape, input_size)\n\n            with tf.variable_scope('pred_lbbox'):\n                pred_lbbox_coors, pred_lbbox_class_scores = self.decode(conv_lbbox, self.anchors[2], self.strides[2], decoded_shape, input_size)\n\n            with tf.variable_scope('pred_bbox_filter'):\n                pred_bbox_coors = tf.concat([pred_sbbox_coors, pred_mbbox_coors, pred_lbbox_coors], axis=1)\n                pred_bbox_class_scores = tf.concat([pred_sbbox_class_scores, pred_mbbox_class_scores, pred_lbbox_class_scores], axis=1)\n                nms_top_k = 100\n                nms_thresh= 0.45\n                coors, scores, classes, valid_detections = tf.image.combined_non_max_suppression(\n                    pred_bbox_coors,\n                    pred_bbox_class_scores,\n                    max_output_size_per_class=nms_top_k,\n                    max_total_size=nms_top_k,\n                    iou_threshold=nms_thresh,\n                    score_threshold=self.box_score_thresh,\n                    pad_per_class=False,\n                    clip_boxes=False,\n                    name='CombinedNonMaxSuppression',\n                )\n                scores = scores[..., tf.newaxis]\n                classes = classes[..., tf.newaxis]\n            return coors[0], scores[0], classes[0]\n\n        with tf.name_scope('Postprocessor'):\n            coors, scores, classes = tf.map_fn(\n                decode_boxes, [self.conv_lbbox, self.conv_mbbox, self.conv_sbbox, decoded_shape],\n                
dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n\n        with tf.variable_scope('pred_bbox'):\n            self.pred_bbox_boxes = tf.identity(coors, name='boxes')\n            self.pred_bbox_scores = tf.identity(scores[..., 0], name='scores')\n            self.pred_bbox_classes = tf.identity(classes[..., 0], name='classes')\n\n    def __build_nework(self, input_data):\n        route_1, route_2, input_data = darknet53(input_data, self.trainable)\n\n        input_data = convolutional(input_data, (1, 1, 1024,  512), self.trainable, 'conv52')\n        input_data = convolutional(input_data, (3, 3,  512, 1024), self.trainable, 'conv53')\n        input_data = convolutional(input_data, (1, 1, 1024,  512), self.trainable, 'conv54')\n        input_data = convolutional(input_data, (3, 3,  512, 1024), self.trainable, 'conv55')\n        input_data = convolutional(input_data, (1, 1, 1024,  512), self.trainable, 'conv56')\n\n        conv_lobj_branch = convolutional(input_data, (3, 3, 512, 1024), self.trainable, name='conv_lobj_branch')\n        conv_lbbox = convolutional(conv_lobj_branch, (1, 1, 1024, 3*(self.num_class + 5)),\n                                   trainable=self.trainable, name='conv_lbbox', activate=False, bn=False)\n\n        input_data = convolutional(input_data, (1, 1,  512,  256), self.trainable, 'conv57')\n        input_data = upsample(input_data, name='upsample0', method=self.upsample_method)\n\n        with tf.variable_scope('route_1'):\n            input_data = tf.concat([input_data, route_2], axis=-1)\n\n        input_data = convolutional(input_data, (1, 1, 768, 256), self.trainable, 'conv58')\n        input_data = convolutional(input_data, (3, 3, 256, 512), self.trainable, 'conv59')\n        input_data = convolutional(input_data, (1, 1, 512, 256), self.trainable, 'conv60')\n        input_data = convolutional(input_data, (3, 3, 256, 512), self.trainable, 'conv61')\n        input_data = convolutional(input_data, (1, 1, 512, 256), self.trainable, 'conv62')\n\n        conv_mobj_branch = convolutional(input_data, (3, 3, 256, 512),  self.trainable, name='conv_mobj_branch' )\n        conv_mbbox = convolutional(conv_mobj_branch, (1, 1, 512, 3*(self.num_class + 5)),\n                                   trainable=self.trainable, name='conv_mbbox', activate=False, bn=False)\n\n        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv63')\n        input_data = upsample(input_data, name='upsample1', method=self.upsample_method)\n\n        with tf.variable_scope('route_2'):\n            input_data = tf.concat([input_data, route_1], axis=-1)\n\n        input_data = convolutional(input_data, (1, 1, 384, 128), self.trainable, 'conv64')\n        input_data = convolutional(input_data, (3, 3, 128, 256), self.trainable, 'conv65')\n        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv66')\n        input_data = convolutional(input_data, (3, 3, 128, 256), self.trainable, 'conv67')\n        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv68')\n\n        conv_sobj_branch = convolutional(input_data, (3, 3, 128, 256), self.trainable, name='conv_sobj_branch')\n        conv_sbbox = convolutional(conv_sobj_branch, (1, 1, 256, 3*(self.num_class + 5)),\n                                   trainable=self.trainable, name='conv_sbbox', activate=False, bn=False)\n\n        return conv_lbbox, conv_mbbox, conv_sbbox\n\n    def decode(self, conv_output, anchors, stride, decoded_shape, 
input_size):\n        conv_output = tf.cast(conv_output, tf.float32)\n        \"\"\"\n        return tensor of shape [batch_size, output_size, output_size, anchor_per_scale, 5 + num_classes]\n               contains (x, y, w, h, score, probability)\n        \"\"\"\n\n        conv_shape       = tf.shape(conv_output)\n        batch_size       = conv_shape[0]\n        output_size      = conv_shape[1]\n        anchor_per_scale = len(anchors)\n\n        conv_output = tf.reshape(conv_output, (batch_size, output_size, output_size, anchor_per_scale, 5 + self.num_class))\n\n        conv_raw_dxdy = conv_output[:, :, :, :, 0:2]\n        conv_raw_dwdh = conv_output[:, :, :, :, 2:4]\n        conv_raw_conf = conv_output[:, :, :, :, 4:5]\n        conv_raw_prob = conv_output[:, :, :, :, 5: ]\n\n        y = tf.tile(tf.range(output_size, dtype=tf.int32)[:, tf.newaxis], [1, output_size])\n        x = tf.tile(tf.range(output_size, dtype=tf.int32)[tf.newaxis, :], [output_size, 1])\n\n        xy_grid = tf.concat([x[:, :, tf.newaxis], y[:, :, tf.newaxis]], axis=-1)\n        xy_grid = tf.tile(xy_grid[tf.newaxis, :, :, tf.newaxis, :], [batch_size, 1, 1, anchor_per_scale, 1])\n        xy_grid = tf.cast(xy_grid, tf.float32)\n\n        pred_xy = (tf.sigmoid(conv_raw_dxdy) + xy_grid) * stride\n        pred_wh = (tf.exp(conv_raw_dwdh) * anchors) * stride\n        pred_xywh = tf.concat([pred_xy, pred_wh], axis=-1)\n\n        pred_conf = tf.sigmoid(conv_raw_conf)\n        pred_prob = tf.sigmoid(conv_raw_prob)\n\n        pred_xywh = tf.reshape(pred_xywh, (-1, output_size*output_size*3, pred_xywh.shape[-1]))\n        pred_conf = tf.reshape(pred_conf, (-1, output_size*output_size*3))\n        pred_prob = tf.reshape(pred_prob, (-1, output_size*output_size*3, pred_prob.shape[-1]))\n\n        return tf_postprocess_boxes(pred_xywh, pred_conf, pred_prob, decoded_shape, input_size, self.box_score_thresh)\n\n\ndef darknet53(input_data, trainable):\n\n    with tf.variable_scope('darknet'):\n\n        input_data = convolutional(input_data, filters_shape=(3, 3,  3,  32), trainable=trainable, name='conv0')\n        input_data = convolutional(input_data, filters_shape=(3, 3, 32,  64), trainable=trainable, name='conv1', downsample=True)\n\n        for i in range(1):\n            input_data = residual_block(input_data,  64,  32, 64, trainable=trainable, name='residual%d' %(i+0))\n\n        input_data = convolutional(input_data, filters_shape=(3, 3,  64, 128), trainable=trainable, name='conv4', downsample=True)\n\n        for i in range(2):\n            input_data = residual_block(input_data, 128,  64, 128, trainable=trainable, name='residual%d' %(i+1))\n\n        input_data = convolutional(input_data, filters_shape=(3, 3, 128, 256), trainable=trainable, name='conv9', downsample=True)\n\n        for i in range(8):\n            input_data = residual_block(input_data, 256, 128, 256, trainable=trainable, name='residual%d' %(i+3))\n\n        route_1 = input_data\n        input_data = convolutional(input_data, filters_shape=(3, 3, 256, 512), trainable=trainable, name='conv26', downsample=True)\n\n        for i in range(8):\n            input_data = residual_block(input_data, 512, 256, 512, trainable=trainable, name='residual%d' %(i+11))\n\n        route_2 = input_data\n        input_data = convolutional(input_data, filters_shape=(3, 3, 512, 1024), trainable=trainable, name='conv43', downsample=True)\n\n        for i in range(4):\n            input_data = residual_block(input_data, 1024, 512, 1024, trainable=trainable, name='residual%d' 
%(i+19))\n\n        return route_1, route_2, input_data\n\n\ndef convolutional(input_data, filters_shape, trainable, name, downsample=False, activate=True, bn=True):\n\n    with tf.variable_scope(name):\n        if downsample:\n            pad_h, pad_w = (filters_shape[0] - 2) // 2 + 1, (filters_shape[1] - 2) // 2 + 1\n            paddings = tf.constant([[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]])\n            input_data = tf.pad(input_data, paddings, 'CONSTANT')\n            strides = (1, 2, 2, 1)\n            padding = 'VALID'\n        else:\n            strides = (1, 1, 1, 1)\n            padding = \"SAME\"\n\n        weight = tf.get_variable(name='weight', dtype=tf.float32, trainable=True,\n                                 shape=filters_shape, initializer=tf.random_normal_initializer(stddev=0.01))\n        weight = tf.cast(weight, tf.float16)\n        conv = tf.nn.conv2d(input=input_data, filter=weight, strides=strides, padding=padding)\n\n        if bn:\n            conv = tf.layers.batch_normalization(conv, beta_initializer=tf.zeros_initializer(),\n                                                 gamma_initializer=tf.ones_initializer(),\n                                                 moving_mean_initializer=tf.zeros_initializer(),\n                                                 moving_variance_initializer=tf.ones_initializer(), training=trainable,\n                                                 fused=False)\n        else:\n            bias = tf.get_variable(name='bias', shape=filters_shape[-1], trainable=True,\n                                   dtype=tf.float32, initializer=tf.constant_initializer(0.0))\n            bias = tf.cast(bias, tf.float16)\n            conv = tf.nn.bias_add(conv, bias)\n\n        if activate == True: conv = tf.nn.leaky_relu(conv, alpha=0.1)\n\n    return conv\n\n\ndef residual_block(input_data, input_channel, filter_num1, filter_num2, trainable, name):\n    short_cut = input_data\n    with tf.variable_scope(name):\n        input_data = convolutional(input_data, filters_shape=(1, 1, input_channel, filter_num1),\n                                   trainable=trainable, name='conv1')\n        input_data = convolutional(input_data, filters_shape=(3, 3, filter_num1,   filter_num2),\n                                   trainable=trainable, name='conv2')\n        residual_output = input_data + short_cut\n    return residual_output\n\n\ndef upsample(input_data, name, method=\"deconv\"):\n    assert method in [\"resize\", \"deconv\"]\n\n    if method == \"resize\":\n        with tf.variable_scope(name):\n            input_shape = tf.shape(input_data)\n            output = tf.image.resize_nearest_neighbor(input_data, (input_shape[1] * 2, input_shape[2] * 2))\n\n    if method == \"deconv\":\n        # replace resize_nearest_neighbor with conv2d_transpose To support TensorRT optimization\n        numm_filter = input_data.shape.as_list()[-1]\n        output = tf.layers.conv2d_transpose(input_data, numm_filter, kernel_size=2, padding='same',\n                                            strides=(2,2), kernel_initializer=tf.random_normal_initializer())\n\n    return output\n\n\ndef decode_jpeg_resize(input_tensor, image_size):\n    tensor = tf.image.decode_png(input_tensor, channels=3)\n    shape = tf.shape(tensor)\n    tensor = tf.cast(tensor, tf.float32)\n    tensor = tf.image.resize_image_with_pad(tensor, image_size[0], image_size[1])\n    tensor /= 255.0\n    return tf.cast(tensor, tf.float16), shape\n\n\ndef preprocessor(input_tensor, image_size):\n    with 
tf.name_scope('Preprocessor'):\n        batch_tensor, batch_shape = tf.map_fn(\n            partial(decode_jpeg_resize, image_size=image_size), input_tensor,\n            dtype=(tf.float16, tf.int32), back_prop=False, parallel_iterations=16)\n    return batch_tensor, batch_shape\n\n\ndef tf_postprocess_boxes(pred_xywh, pred_conf, pred_prob, org_img_shape, input_size, score_threshold):\n    batch_size = tf.shape(pred_xywh)[0]\n\n    # # (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)\n    pred_coor = tf.concat([pred_xywh[:, :, :2] - pred_xywh[:, :, 2:] * 0.5,\n                           pred_xywh[:, :, :2] + pred_xywh[:, :, 2:] * 0.5], axis=-1)\n    # # (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)\n    org_wh = org_img_shape[:, tf.newaxis, 1::-1]\n    org_whwh = tf.concat([org_wh, org_wh], axis=-1)\n    org_whwh = tf.cast(org_whwh, tf.float32)\n    input_size = np.float32(input_size)\n    resize_ratio = input_size / tf.reduce_max(org_whwh, axis=-1)\n    dwhwh = (input_size - resize_ratio * org_whwh) / 2\n    pred_coor = (pred_coor - dwhwh) / resize_ratio\n\n    # # (5) discard some boxes with low scores\n    scores = pred_conf * tf.reduce_max(pred_prob, axis=-1)\n    score_mask = scores > score_threshold\n    coors = pred_coor[score_mask]\n    pred_conf = pred_conf[score_mask]\n    pred_conf = tf.reshape(pred_conf, [batch_size, -1, 1])\n    pred_prob = pred_prob[score_mask]\n    pred_prob = tf.reshape(pred_prob, [batch_size, -1, pred_prob.shape[-1]])\n    class_scores = pred_conf * pred_prob\n    coors = tf.reshape(coors, [batch_size, -1, 1, coors.shape[-1]])\n    class_scores = tf.reshape(class_scores, [batch_size, -1, class_scores.shape[-1]])\n    return coors, class_scores\n\n\ndef convert_weights(org_weights_path, cur_weights_path, input_size):\n    org_weights_mess = []\n    with tf.Session(graph=tf.Graph()) as sess:\n        load = tf.train.import_meta_graph(org_weights_path + '.meta')\n        load.restore(sess, org_weights_path)\n        for var in tf.global_variables():\n            var_name = var.op.name\n            var_name_mess = str(var_name).split('/')\n            var_shape = var.shape\n            org_weights_mess.append([var_name, var_shape])\n            print(\"=> \" + str(var_name).ljust(50), var_shape)\n        print()\n\n    cur_weights_mess = []\n    with tf.Session(graph=tf.Graph()) as sess:\n        with tf.name_scope('input'):\n            input_data = tf.placeholder(dtype=tf.string, shape=(None,), name='input_data')\n            training = tf.placeholder(dtype=tf.bool, name='trainable')\n        model = YOLOV3(input_data, input_size, training)\n        for var in tf.global_variables():\n            var_name = var.op.name\n            var_name_mess = str(var_name).split('/')\n            var_shape = var.shape\n            print(var_name_mess[0])\n            cur_weights_mess.append([var_name, var_shape])\n            print(\"=> \" + str(var_name).ljust(50), var_shape)\n\n        org_weights_num = len(org_weights_mess)\n        cur_weights_num = len(cur_weights_mess)\n        if cur_weights_num != org_weights_num:\n            raise RuntimeError\n\n        print('=> Number of weights that will rename:\\t%d' % cur_weights_num)\n        cur_to_org_dict = {}\n        for index in range(org_weights_num):\n            org_name, org_shape = org_weights_mess[index]\n            cur_name, cur_shape = cur_weights_mess[index]\n            if cur_shape != org_shape:\n                print(org_weights_mess[index])\n                
print(cur_weights_mess[index])\n                raise RuntimeError\n            cur_to_org_dict[cur_name] = org_name\n            print(\"=> \" + str(cur_name).ljust(50) + ' : ' + org_name)\n\n        with tf.name_scope('load_save'):\n            name_to_var_dict = {var.op.name: var for var in tf.global_variables()}\n            restore_dict = {cur_to_org_dict[cur_name]: name_to_var_dict[cur_name] for cur_name in cur_to_org_dict}\n            load = tf.train.Saver(restore_dict)\n            save = tf.train.Saver(tf.global_variables())\n            for var in tf.global_variables():\n                print(\"=> \" + var.op.name)\n\n        sess.run(tf.global_variables_initializer())\n        print('=> Restoring weights from:\\t %s' % org_weights_path)\n        load.restore(sess, org_weights_path)\n        save.save(sess, cur_weights_path)\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('model_dir')\n    args = parser.parse_args()\n    if os.path.exists(args.model_dir):\n        raise OSError('Directory {} already exists; please specify a different path for the tensorflow SavedModel'.format(args.model_dir))\n    with tempfile.TemporaryDirectory() as workdir:\n        ckpt_file = os.path.join(workdir, './yolov3_coco_demo.ckpt')\n        input_size = 416\n        if not os.path.isfile(ckpt_file + '.meta'):\n            yolov3_coco_tar_gz = os.path.join(workdir, './yolov3_coco.tar.gz')\n            url = 'https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz'\n            print('Downloading from {}'.format(url))\n            urllib.request.urlretrieve(url, yolov3_coco_tar_gz)\n            shutil.unpack_archive(yolov3_coco_tar_gz, extract_dir=workdir)\n            convert_weights(os.path.join(workdir, './yolov3_coco.ckpt'), ckpt_file, input_size)\n\n        input_tensor_name = 'input/input_data:0'\n        output_names = ['boxes', 'scores', 'classes']\n        output_tensor_names = ['pred_bbox/boxes:0', 'pred_bbox/scores:0', 'pred_bbox/classes:0']\n        with tf.Session(graph=tf.Graph()) as sess:\n            with tf.name_scope('input'):\n                input_data = tf.placeholder(dtype=tf.string, shape=[None], name='input_data')\n            model = YOLOV3(input_data, input_size, trainable=False)\n            print(model.conv_sbbox, model.conv_mbbox, model.conv_lbbox)\n            saver = tf.train.Saver()\n            saver.restore(sess, ckpt_file)\n            input_tensor = sess.graph.get_tensor_by_name(input_tensor_name)\n            inputs = {'image': input_tensor}\n            outputs = {name: sess.graph.get_tensor_by_name(tensor_name) for name, tensor_name in zip(output_names, output_tensor_names)}\n            tf.saved_model.simple_save(sess, args.model_dir, inputs, outputs)\n    print('tensorflow YOLO v3 SavedModel generated at {}'.format(args.model_dir))\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "src/examples/tensorflow/yolo_v4_demo/README.md",
    "content": "</br>\n</br>\n\nPlease view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** \n\n"
  },
  {
    "path": "src/examples/tensorflow/yolo_v4_demo/evaluate.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Evaluate YOLO v4 on Inferentia\\n\",\n    \"## Note: this tutorial runs on tensorflow-neuron 1.x only\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Introduction\\n\",\n    \"This tutorial walks through compiling and evaluating YOLO v4 model on Inferentia using the AWS Neuron SDK 09/2020 release. We recommend running this tutorial on an EC2 `inf1.2xlarge` instance which contains one Inferentia and 8 vCPU cores, as well as 16 GB of memory.Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow) You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prerequisites\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This demo requires the following pip packages:\\n\",\n    \"\\n\",\n    \"`neuron-cc tensorflow-neuron<2 requests pillow matplotlib pycocotools torch`\\n\",\n    \"\\n\",\n    \"and debian/rpm package `aws-neuron-runtime`.\\n\",\n    \"\\n\",\n    \"On DLAMI, `aws-neuron-runtime` is already pre-installed.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!pip install tensorflow_neuron==1.15.5.2.8.9.0 neuron_cc==1.13.5.0 requests pillow matplotlib pycocotools==2.0.1 numpy==1.18.2 torch~=1.5.0 --force \\\\\\n\",\n    \"    --extra-index-url=https://pip.repos.neuron.amazonaws.com\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 1: Download Dataset and Generate Pretrained SavedModel\\n\",\n    \"### Download COCO 2017 validation dataset\\n\",\n    \"We start by downloading the COCO validation dataset, which we will use to validate our model. The COCO 2017 dataset is widely used for object-detection, segmentation and image captioning.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!curl -LO http://images.cocodataset.org/zips/val2017.zip\\n\",\n    \"!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\\n\",\n    \"!unzip -q val2017.zip\\n\",\n    \"!unzip annotations_trainval2017.zip\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!ls\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Check required package versions\\n\",\n    \"Here are the minimum required versions of AWS Neuron packages. 
We run a check.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import pkg_resources\\n\",\n    \"from distutils.version import LooseVersion\\n\",\n    \"\\n\",\n    \"assert LooseVersion(pkg_resources.get_distribution('neuron-cc').version) > LooseVersion('1.0.20000')\\n\",\n    \"assert LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) > LooseVersion('1.15.3.1.0.2000')\\n\",\n    \"print('passed package version checks')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Generate YOLO v4 tensorflow SavedModel (pretrained on COCO 2017 dataset)\\n\",\n    \"Script `yolo_v4_coco_saved_model.py` will generate a tensorflow SavedModel using pretrained weights from https://github.com/Tianxiaomo/pytorch-YOLOv4.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"!python3 yolo_v4_coco_saved_model.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This tensorflow SavedModel can be loaded as a tensorflow predictor. When a JPEG format image is provided as input, the output result of the tensorflow predictor contains information for drawing bounding boxes and classification results.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import json\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from PIL import Image\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"import matplotlib.patches as patches\\n\",\n    \"\\n\",\n    \"# launch predictor and run inference on an arbitrary image in the validation dataset\\n\",\n    \"yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')\\n\",\n    \"image_path = './val2017/000000581781.jpg'\\n\",\n    \"with open(image_path, 'rb') as f:\\n\",\n    \"    feeds = {'image': [f.read()]}\\n\",\n    \"results = yolo_pred_cpu(feeds)\\n\",\n    \"\\n\",\n    \"# load annotations to decode classification result\\n\",\n    \"with open('./annotations/instances_val2017.json') as f:\\n\",\n    \"    annotate_json = json.load(f)\\n\",\n    \"label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\\n\",\n    \"\\n\",\n    \"# draw picture and bounding boxes\\n\",\n    \"fig, ax = plt.subplots(figsize=(10, 10))\\n\",\n    \"ax.imshow(Image.open(image_path).convert('RGB'))\\n\",\n    \"wanted = results['scores'][0] > 0.1\\n\",\n    \"for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):\\n\",\n    \"    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]\\n\",\n    \"    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\\n\",\n    \"    ax.add_patch(rect)\\n\",\n    \"    rx, ry = rect.get_xy()\\n\",\n    \"    rx = rx + rect.get_width() / 2.0\\n\",\n    \"    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\\n\",\n    \"                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\\n\",\n    \"plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 2: Compile the Pretrained SavedModel for Inferentia\\n\",\n    \"We make use of the Python 
compilation API `tfn.saved_model.compile` that is avaiable in `tensorflow-neuron<2`. For the purpose of reducing Neuron runtime overhead, it is necessary to make use of arguments `no_fuse_ops` and `minimum_segment_size`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import shutil\\n\",\n    \"import tensorflow as tf\\n\",\n    \"import tensorflow.neuron as tfn\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def no_fuse_condition(op):\\n\",\n    \"    return any(op.name.startswith(pat) for pat in ['reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast'])\\n\",\n    \"\\n\",\n    \"with tf.Session(graph=tf.Graph()) as sess:\\n\",\n    \"    tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')\\n\",\n    \"    no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]\\n\",\n    \"shutil.rmtree('./yolo_v4_coco_saved_model_neuron', ignore_errors=True)\\n\",\n    \"result = tfn.saved_model.compile(\\n\",\n    \"    './yolo_v4_coco_saved_model', './yolo_v4_coco_saved_model_neuron',\\n\",\n    \"    # we partition the graph before casting from float16 to float32, to help reduce the output tensor size by 1/2\\n\",\n    \"    no_fuse_ops=no_fuse_ops,\\n\",\n    \"    # to enforce trivial compilable subgraphs to run on CPU\\n\",\n    \"    minimum_segment_size=100,\\n\",\n    \"    batch_size=1,\\n\",\n    \"    dynamic_batch_size=True,\\n\",\n    \")\\n\",\n    \"print(result)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Part 3: Evaluate Model Quality after Compilation\\n\",\n    \"### Define evaluation functions\\n\",\n    \"We first define some handy helper functions for running evaluation on the COCO 2017 dataset.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"import json\\n\",\n    \"import time\\n\",\n    \"import numpy as np\\n\",\n    \"import tensorflow as tf\\n\",\n    \"from pycocotools.coco import COCO\\n\",\n    \"from pycocotools.cocoeval import COCOeval\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def cocoapi_eval(jsonfile,\\n\",\n    \"                 style,\\n\",\n    \"                 coco_gt=None,\\n\",\n    \"                 anno_file=None,\\n\",\n    \"                 max_dets=(100, 300, 1000)):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    Args:\\n\",\n    \"        jsonfile: Evaluation json file, eg: bbox.json, mask.json.\\n\",\n    \"        style: COCOeval style, can be `bbox` , `segm` and `proposal`.\\n\",\n    \"        coco_gt: Whether to load COCOAPI through anno_file,\\n\",\n    \"                 eg: coco_gt = COCO(anno_file)\\n\",\n    \"        anno_file: COCO annotations file.\\n\",\n    \"        max_dets: COCO evaluation maxDets.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    assert coco_gt is not None or anno_file is not None\\n\",\n    \"\\n\",\n    \"    if coco_gt is None:\\n\",\n    \"        coco_gt = COCO(anno_file)\\n\",\n    \"    print(\\\"Start evaluate...\\\")\\n\",\n    \"    coco_dt = coco_gt.loadRes(jsonfile)\\n\",\n    \"    if style == 'proposal':\\n\",\n    \"        coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')\\n\",\n    \"        coco_eval.params.useCats = 0\\n\",\n    \"        coco_eval.params.maxDets = list(max_dets)\\n\",\n    \"    else:\\n\",\n    \"        coco_eval = COCOeval(coco_gt, coco_dt, style)\\n\",\n    
\"    coco_eval.evaluate()\\n\",\n    \"    coco_eval.accumulate()\\n\",\n    \"    coco_eval.summarize()\\n\",\n    \"    return coco_eval.stats\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def bbox_eval(anno_file, bbox_list):\\n\",\n    \"    coco_gt = COCO(anno_file)\\n\",\n    \"\\n\",\n    \"    outfile = 'bbox_detections.json'\\n\",\n    \"    print('Generating json file...')\\n\",\n    \"    with open(outfile, 'w') as f:\\n\",\n    \"        json.dump(bbox_list, f)\\n\",\n    \"\\n\",\n    \"    map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)\\n\",\n    \"    return map_stats\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def get_image_as_bytes(images, eval_pre_path):\\n\",\n    \"    batch_im_id_list = []\\n\",\n    \"    batch_im_name_list = []\\n\",\n    \"    batch_img_bytes_list = []\\n\",\n    \"    n = len(images)\\n\",\n    \"    batch_im_id = []\\n\",\n    \"    batch_im_name = []\\n\",\n    \"    batch_img_bytes = []\\n\",\n    \"    for i, im in enumerate(images):\\n\",\n    \"        im_id = im['id']\\n\",\n    \"        file_name = im['file_name']\\n\",\n    \"        if i % eval_batch_size == 0 and i != 0:\\n\",\n    \"            batch_im_id_list.append(batch_im_id)\\n\",\n    \"            batch_im_name_list.append(batch_im_name)\\n\",\n    \"            batch_img_bytes_list.append(batch_img_bytes)\\n\",\n    \"            batch_im_id = []\\n\",\n    \"            batch_im_name = []\\n\",\n    \"            batch_img_bytes = []\\n\",\n    \"        batch_im_id.append(im_id)\\n\",\n    \"        batch_im_name.append(file_name)\\n\",\n    \"\\n\",\n    \"        with open(os.path.join(eval_pre_path, file_name), 'rb') as f:\\n\",\n    \"            batch_img_bytes.append(f.read())\\n\",\n    \"    return batch_im_id_list, batch_im_name_list, batch_img_bytes_list\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def analyze_bbox(results, batch_im_id, _clsid2catid):\\n\",\n    \"    bbox_list = []\\n\",\n    \"    k = 0\\n\",\n    \"    for boxes, scores, classes in zip(results['boxes'], results['scores'], results['classes']):\\n\",\n    \"        if boxes is not None:\\n\",\n    \"            im_id = batch_im_id[k]\\n\",\n    \"            n = len(boxes)\\n\",\n    \"            for p in range(n):\\n\",\n    \"                clsid = classes[p]\\n\",\n    \"                score = scores[p]\\n\",\n    \"                xmin, ymin, xmax, ymax = boxes[p]\\n\",\n    \"                catid = (_clsid2catid[int(clsid)])\\n\",\n    \"                w = xmax - xmin + 1\\n\",\n    \"                h = ymax - ymin + 1\\n\",\n    \"\\n\",\n    \"                bbox = [xmin, ymin, w, h]\\n\",\n    \"                # Round to the nearest 10th to avoid huge file sizes, as COCO suggests\\n\",\n    \"                bbox = [round(float(x) * 10) / 10 for x in bbox]\\n\",\n    \"                bbox_res = {\\n\",\n    \"                    'image_id': im_id,\\n\",\n    \"                    'category_id': catid,\\n\",\n    \"                    'bbox': bbox,\\n\",\n    \"                    'score': float(score),\\n\",\n    \"                }\\n\",\n    \"                bbox_list.append(bbox_res)\\n\",\n    \"        k += 1\\n\",\n    \"    return bbox_list\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Here is the actual evaluation loop. To fully utilize all four cores on one Inferentia, the optimal setup is to run multi-threaded inference using a `ThreadPoolExecutor`. 
The following cell is a multi-threaded adaptation of the evaluation routine at https://github.com/miemie2013/Keras-YOLOv4/blob/910c4c6f7265f5828fceed0f784496a0b46516bf/tools/cocotools.py#L97.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"from concurrent import futures\\n\",\n    \"\\n\",\n    \"NUM_THREADS = 4\\n\",\n    \"\\n\",\n    \"def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):\\n\",\n    \"    batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)\\n\",\n    \"\\n\",\n    \"    # warm up\\n\",\n    \"    yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})\\n\",\n    \"    \\n\",\n    \"    def yolo_predictor_timer(yolo_pred, image):\\n\",\n    \"        begin = time.time()\\n\",\n    \"        result = yolo_pred(image)\\n\",\n    \"        delta = time.time() - begin\\n\",\n    \"        return result, delta\\n\",\n    \"\\n\",\n    \"    latency = []\\n\",\n    \"    with futures.ThreadPoolExecutor(NUM_THREADS) as exe:\\n\",\n    \"        fut_im_list = []\\n\",\n    \"        fut_list = []\\n\",\n    \"\\n\",\n    \"        start_time = time.time()\\n\",\n    \"        for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):\\n\",\n    \"            if len(batch_img_bytes) != eval_batch_size:\\n\",\n    \"                continue\\n\",\n    \"            fut = exe.submit(yolo_predictor_timer, yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})\\n\",\n    \"            fut_im_list.append((batch_im_id, batch_im_name))\\n\",\n    \"            fut_list.append(fut)\\n\",\n    \"        bbox_list = []\\n\",\n    \"        sum_time = 0.0\\n\",\n    \"        count = 0\\n\",\n    \"        for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):\\n\",\n    \"            results, times = fut.result()\\n\",\n    \"            # Adjust latency since we are in batch\\n\",\n    \"            latency.append(times / eval_batch_size)\\n\",\n    \"            sum_time += times\\n\",\n    \"            bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))\\n\",\n    \"            for _ in batch_im_id:\\n\",\n    \"                count += 1\\n\",\n    \"                if count % 1000 == 0:\\n\",\n    \"                    print('Test iter {}'.format(count))\\n\",\n    \"\\n\",\n    \"        throughput = len(images) / (sum_time / NUM_THREADS)\\n\",\n    \"\\n\",\n    \"        \\n\",\n    \"    print('Average Images Per Second:', throughput)\\n\",\n    \"    print(\\\"Latency P50: {:.1f} ms\\\".format(np.percentile(latency, 50)*1000.0))\\n\",\n    \"    print(\\\"Latency P90: {:.1f} ms\\\".format(np.percentile(latency, 90)*1000.0))\\n\",\n    \"    print(\\\"Latency P95: {:.1f} ms\\\".format(np.percentile(latency, 95)*1000.0))\\n\",\n    \"    print(\\\"Latency P99: {:.1f} ms\\\".format(np.percentile(latency, 99)*1000.0))\\n\",\n    \"\\n\",\n    \"    # start evaluation\\n\",\n    \"    box_ap_stats = bbox_eval(anno_file, bbox_list)\\n\",\n    \"    return box_ap_stats\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Evaluate mean average precision (mAP) score\\n\",\n    \"Here is the code to calculate mAP scores of the YOLO v4 model. 
The expected mAP score is around 0.487 if we use the pretrained weights.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron')\\n\",\n    \"\\n\",\n    \"val_coco_root = './val2017'\\n\",\n    \"val_annotate = './annotations/instances_val2017.json'\\n\",\n    \"clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,\\n\",\n    \"               15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,\\n\",\n    \"               27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,\\n\",\n    \"               39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,\\n\",\n    \"               51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,\\n\",\n    \"               63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,\\n\",\n    \"               75: 86, 76: 87, 77: 88, 78: 89, 79: 90}\\n\",\n    \"eval_batch_size = 8\\n\",\n    \"with open(val_annotate, 'r', encoding='utf-8') as f2:\\n\",\n    \"    for line in f2:\\n\",\n    \"        line = line.strip()\\n\",\n    \"        dataset = json.loads(line)\\n\",\n    \"        images = dataset['images']\\n\",\n    \"box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Environment (conda_aws_neuron_tensorflow_p36)\",\n   \"language\": \"python\",\n   \"name\": \"conda_aws_neuron_tensorflow_p36\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.13\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "src/examples/tensorflow/yolo_v4_demo/yolo_v4_coco_saved_model.py",
    "content": "import os\nimport io\nfrom functools import partial\nimport requests\nimport numpy as np\nimport torch\nimport tensorflow as tf\nfrom tensorflow import keras\nfrom tensorflow.keras import layers\n\n\n\ndef rename_weights(checkpoint):\n    name_mapping = {\n        'down1.conv1.conv.0.weight': 'models.0.conv1.weight',\n        'down1.conv1.conv.1.weight': 'models.0.bn1.weight',\n        'down1.conv1.conv.1.bias': 'models.0.bn1.bias',\n        'down1.conv1.conv.1.running_mean': 'models.0.bn1.running_mean',\n        'down1.conv1.conv.1.running_var': 'models.0.bn1.running_var',\n        'down1.conv1.conv.1.num_batches_tracked': 'models.0.bn1.num_batches_tracked',\n        'down1.conv2.conv.0.weight': 'models.1.conv2.weight',\n        'down1.conv2.conv.1.weight': 'models.1.bn2.weight',\n        'down1.conv2.conv.1.bias': 'models.1.bn2.bias',\n        'down1.conv2.conv.1.running_mean': 'models.1.bn2.running_mean',\n        'down1.conv2.conv.1.running_var': 'models.1.bn2.running_var',\n        'down1.conv2.conv.1.num_batches_tracked': 'models.1.bn2.num_batches_tracked',\n        'down1.conv3.conv.0.weight': 'models.2.conv3.weight',\n        'down1.conv3.conv.1.weight': 'models.2.bn3.weight',\n        'down1.conv3.conv.1.bias': 'models.2.bn3.bias',\n        'down1.conv3.conv.1.running_mean': 'models.2.bn3.running_mean',\n        'down1.conv3.conv.1.running_var': 'models.2.bn3.running_var',\n        'down1.conv3.conv.1.num_batches_tracked': 'models.2.bn3.num_batches_tracked',\n        'down1.conv4.conv.0.weight': 'models.4.conv4.weight',\n        'down1.conv4.conv.1.weight': 'models.4.bn4.weight',\n        'down1.conv4.conv.1.bias': 'models.4.bn4.bias',\n        'down1.conv4.conv.1.running_mean': 'models.4.bn4.running_mean',\n        'down1.conv4.conv.1.running_var': 'models.4.bn4.running_var',\n        'down1.conv4.conv.1.num_batches_tracked': 'models.4.bn4.num_batches_tracked',\n        'down1.conv5.conv.0.weight': 'models.5.conv5.weight',\n        'down1.conv5.conv.1.weight': 'models.5.bn5.weight',\n        'down1.conv5.conv.1.bias': 'models.5.bn5.bias',\n        'down1.conv5.conv.1.running_mean': 'models.5.bn5.running_mean',\n        'down1.conv5.conv.1.running_var': 'models.5.bn5.running_var',\n        'down1.conv5.conv.1.num_batches_tracked': 'models.5.bn5.num_batches_tracked',\n        'down1.conv6.conv.0.weight': 'models.6.conv6.weight',\n        'down1.conv6.conv.1.weight': 'models.6.bn6.weight',\n        'down1.conv6.conv.1.bias': 'models.6.bn6.bias',\n        'down1.conv6.conv.1.running_mean': 'models.6.bn6.running_mean',\n        'down1.conv6.conv.1.running_var': 'models.6.bn6.running_var',\n        'down1.conv6.conv.1.num_batches_tracked': 'models.6.bn6.num_batches_tracked',\n        'down1.conv7.conv.0.weight': 'models.8.conv7.weight',\n        'down1.conv7.conv.1.weight': 'models.8.bn7.weight',\n        'down1.conv7.conv.1.bias': 'models.8.bn7.bias',\n        'down1.conv7.conv.1.running_mean': 'models.8.bn7.running_mean',\n        'down1.conv7.conv.1.running_var': 'models.8.bn7.running_var',\n        'down1.conv7.conv.1.num_batches_tracked': 'models.8.bn7.num_batches_tracked',\n        'down1.conv8.conv.0.weight': 'models.10.conv8.weight',\n        'down1.conv8.conv.1.weight': 'models.10.bn8.weight',\n        'down1.conv8.conv.1.bias': 'models.10.bn8.bias',\n        'down1.conv8.conv.1.running_mean': 'models.10.bn8.running_mean',\n        'down1.conv8.conv.1.running_var': 'models.10.bn8.running_var',\n        'down1.conv8.conv.1.num_batches_tracked': 
'models.10.bn8.num_batches_tracked',\n        'down2.conv1.conv.0.weight': 'models.11.conv9.weight',\n        'down2.conv1.conv.1.weight': 'models.11.bn9.weight',\n        'down2.conv1.conv.1.bias': 'models.11.bn9.bias',\n        'down2.conv1.conv.1.running_mean': 'models.11.bn9.running_mean',\n        'down2.conv1.conv.1.running_var': 'models.11.bn9.running_var',\n        'down2.conv1.conv.1.num_batches_tracked': 'models.11.bn9.num_batches_tracked',\n        'down2.conv2.conv.0.weight': 'models.12.conv10.weight',\n        'down2.conv2.conv.1.weight': 'models.12.bn10.weight',\n        'down2.conv2.conv.1.bias': 'models.12.bn10.bias',\n        'down2.conv2.conv.1.running_mean': 'models.12.bn10.running_mean',\n        'down2.conv2.conv.1.running_var': 'models.12.bn10.running_var',\n        'down2.conv2.conv.1.num_batches_tracked': 'models.12.bn10.num_batches_tracked',\n        'down2.conv3.conv.0.weight': 'models.14.conv11.weight',\n        'down2.conv3.conv.1.weight': 'models.14.bn11.weight',\n        'down2.conv3.conv.1.bias': 'models.14.bn11.bias',\n        'down2.conv3.conv.1.running_mean': 'models.14.bn11.running_mean',\n        'down2.conv3.conv.1.running_var': 'models.14.bn11.running_var',\n        'down2.conv3.conv.1.num_batches_tracked': 'models.14.bn11.num_batches_tracked',\n        'down2.resblock.module_list.0.0.conv.0.weight': 'models.15.conv12.weight',\n        'down2.resblock.module_list.0.0.conv.1.weight': 'models.15.bn12.weight',\n        'down2.resblock.module_list.0.0.conv.1.bias': 'models.15.bn12.bias',\n        'down2.resblock.module_list.0.0.conv.1.running_mean': 'models.15.bn12.running_mean',\n        'down2.resblock.module_list.0.0.conv.1.running_var': 'models.15.bn12.running_var',\n        'down2.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.15.bn12.num_batches_tracked',\n        'down2.resblock.module_list.0.1.conv.0.weight': 'models.16.conv13.weight',\n        'down2.resblock.module_list.0.1.conv.1.weight': 'models.16.bn13.weight',\n        'down2.resblock.module_list.0.1.conv.1.bias': 'models.16.bn13.bias',\n        'down2.resblock.module_list.0.1.conv.1.running_mean': 'models.16.bn13.running_mean',\n        'down2.resblock.module_list.0.1.conv.1.running_var': 'models.16.bn13.running_var',\n        'down2.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.16.bn13.num_batches_tracked',\n        'down2.resblock.module_list.1.0.conv.0.weight': 'models.18.conv14.weight',\n        'down2.resblock.module_list.1.0.conv.1.weight': 'models.18.bn14.weight',\n        'down2.resblock.module_list.1.0.conv.1.bias': 'models.18.bn14.bias',\n        'down2.resblock.module_list.1.0.conv.1.running_mean': 'models.18.bn14.running_mean',\n        'down2.resblock.module_list.1.0.conv.1.running_var': 'models.18.bn14.running_var',\n        'down2.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.18.bn14.num_batches_tracked',\n        'down2.resblock.module_list.1.1.conv.0.weight': 'models.19.conv15.weight',\n        'down2.resblock.module_list.1.1.conv.1.weight': 'models.19.bn15.weight',\n        'down2.resblock.module_list.1.1.conv.1.bias': 'models.19.bn15.bias',\n        'down2.resblock.module_list.1.1.conv.1.running_mean': 'models.19.bn15.running_mean',\n        'down2.resblock.module_list.1.1.conv.1.running_var': 'models.19.bn15.running_var',\n        'down2.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.19.bn15.num_batches_tracked',\n        'down2.conv4.conv.0.weight': 'models.21.conv16.weight',\n        
'down2.conv4.conv.1.weight': 'models.21.bn16.weight',\n        'down2.conv4.conv.1.bias': 'models.21.bn16.bias',\n        'down2.conv4.conv.1.running_mean': 'models.21.bn16.running_mean',\n        'down2.conv4.conv.1.running_var': 'models.21.bn16.running_var',\n        'down2.conv4.conv.1.num_batches_tracked': 'models.21.bn16.num_batches_tracked',\n        'down2.conv5.conv.0.weight': 'models.23.conv17.weight',\n        'down2.conv5.conv.1.weight': 'models.23.bn17.weight',\n        'down2.conv5.conv.1.bias': 'models.23.bn17.bias',\n        'down2.conv5.conv.1.running_mean': 'models.23.bn17.running_mean',\n        'down2.conv5.conv.1.running_var': 'models.23.bn17.running_var',\n        'down2.conv5.conv.1.num_batches_tracked': 'models.23.bn17.num_batches_tracked',\n        'down3.conv1.conv.0.weight': 'models.24.conv18.weight',\n        'down3.conv1.conv.1.weight': 'models.24.bn18.weight',\n        'down3.conv1.conv.1.bias': 'models.24.bn18.bias',\n        'down3.conv1.conv.1.running_mean': 'models.24.bn18.running_mean',\n        'down3.conv1.conv.1.running_var': 'models.24.bn18.running_var',\n        'down3.conv1.conv.1.num_batches_tracked': 'models.24.bn18.num_batches_tracked',\n        'down3.conv2.conv.0.weight': 'models.25.conv19.weight',\n        'down3.conv2.conv.1.weight': 'models.25.bn19.weight',\n        'down3.conv2.conv.1.bias': 'models.25.bn19.bias',\n        'down3.conv2.conv.1.running_mean': 'models.25.bn19.running_mean',\n        'down3.conv2.conv.1.running_var': 'models.25.bn19.running_var',\n        'down3.conv2.conv.1.num_batches_tracked': 'models.25.bn19.num_batches_tracked',\n        'down3.conv3.conv.0.weight': 'models.27.conv20.weight',\n        'down3.conv3.conv.1.weight': 'models.27.bn20.weight',\n        'down3.conv3.conv.1.bias': 'models.27.bn20.bias',\n        'down3.conv3.conv.1.running_mean': 'models.27.bn20.running_mean',\n        'down3.conv3.conv.1.running_var': 'models.27.bn20.running_var',\n        'down3.conv3.conv.1.num_batches_tracked': 'models.27.bn20.num_batches_tracked',\n        'down3.resblock.module_list.0.0.conv.0.weight': 'models.28.conv21.weight',\n        'down3.resblock.module_list.0.0.conv.1.weight': 'models.28.bn21.weight',\n        'down3.resblock.module_list.0.0.conv.1.bias': 'models.28.bn21.bias',\n        'down3.resblock.module_list.0.0.conv.1.running_mean': 'models.28.bn21.running_mean',\n        'down3.resblock.module_list.0.0.conv.1.running_var': 'models.28.bn21.running_var',\n        'down3.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.28.bn21.num_batches_tracked',\n        'down3.resblock.module_list.0.1.conv.0.weight': 'models.29.conv22.weight',\n        'down3.resblock.module_list.0.1.conv.1.weight': 'models.29.bn22.weight',\n        'down3.resblock.module_list.0.1.conv.1.bias': 'models.29.bn22.bias',\n        'down3.resblock.module_list.0.1.conv.1.running_mean': 'models.29.bn22.running_mean',\n        'down3.resblock.module_list.0.1.conv.1.running_var': 'models.29.bn22.running_var',\n        'down3.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.29.bn22.num_batches_tracked',\n        'down3.resblock.module_list.1.0.conv.0.weight': 'models.31.conv23.weight',\n        'down3.resblock.module_list.1.0.conv.1.weight': 'models.31.bn23.weight',\n        'down3.resblock.module_list.1.0.conv.1.bias': 'models.31.bn23.bias',\n        'down3.resblock.module_list.1.0.conv.1.running_mean': 'models.31.bn23.running_mean',\n        'down3.resblock.module_list.1.0.conv.1.running_var': 'models.31.bn23.running_var',\n  
      'down3.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.31.bn23.num_batches_tracked',\n        'down3.resblock.module_list.1.1.conv.0.weight': 'models.32.conv24.weight',\n        'down3.resblock.module_list.1.1.conv.1.weight': 'models.32.bn24.weight',\n        'down3.resblock.module_list.1.1.conv.1.bias': 'models.32.bn24.bias',\n        'down3.resblock.module_list.1.1.conv.1.running_mean': 'models.32.bn24.running_mean',\n        'down3.resblock.module_list.1.1.conv.1.running_var': 'models.32.bn24.running_var',\n        'down3.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.32.bn24.num_batches_tracked',\n        'down3.resblock.module_list.2.0.conv.0.weight': 'models.34.conv25.weight',\n        'down3.resblock.module_list.2.0.conv.1.weight': 'models.34.bn25.weight',\n        'down3.resblock.module_list.2.0.conv.1.bias': 'models.34.bn25.bias',\n        'down3.resblock.module_list.2.0.conv.1.running_mean': 'models.34.bn25.running_mean',\n        'down3.resblock.module_list.2.0.conv.1.running_var': 'models.34.bn25.running_var',\n        'down3.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.34.bn25.num_batches_tracked',\n        'down3.resblock.module_list.2.1.conv.0.weight': 'models.35.conv26.weight',\n        'down3.resblock.module_list.2.1.conv.1.weight': 'models.35.bn26.weight',\n        'down3.resblock.module_list.2.1.conv.1.bias': 'models.35.bn26.bias',\n        'down3.resblock.module_list.2.1.conv.1.running_mean': 'models.35.bn26.running_mean',\n        'down3.resblock.module_list.2.1.conv.1.running_var': 'models.35.bn26.running_var',\n        'down3.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.35.bn26.num_batches_tracked',\n        'down3.resblock.module_list.3.0.conv.0.weight': 'models.37.conv27.weight',\n        'down3.resblock.module_list.3.0.conv.1.weight': 'models.37.bn27.weight',\n        'down3.resblock.module_list.3.0.conv.1.bias': 'models.37.bn27.bias',\n        'down3.resblock.module_list.3.0.conv.1.running_mean': 'models.37.bn27.running_mean',\n        'down3.resblock.module_list.3.0.conv.1.running_var': 'models.37.bn27.running_var',\n        'down3.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.37.bn27.num_batches_tracked',\n        'down3.resblock.module_list.3.1.conv.0.weight': 'models.38.conv28.weight',\n        'down3.resblock.module_list.3.1.conv.1.weight': 'models.38.bn28.weight',\n        'down3.resblock.module_list.3.1.conv.1.bias': 'models.38.bn28.bias',\n        'down3.resblock.module_list.3.1.conv.1.running_mean': 'models.38.bn28.running_mean',\n        'down3.resblock.module_list.3.1.conv.1.running_var': 'models.38.bn28.running_var',\n        'down3.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.38.bn28.num_batches_tracked',\n        'down3.resblock.module_list.4.0.conv.0.weight': 'models.40.conv29.weight',\n        'down3.resblock.module_list.4.0.conv.1.weight': 'models.40.bn29.weight',\n        'down3.resblock.module_list.4.0.conv.1.bias': 'models.40.bn29.bias',\n        'down3.resblock.module_list.4.0.conv.1.running_mean': 'models.40.bn29.running_mean',\n        'down3.resblock.module_list.4.0.conv.1.running_var': 'models.40.bn29.running_var',\n        'down3.resblock.module_list.4.0.conv.1.num_batches_tracked': 'models.40.bn29.num_batches_tracked',\n        'down3.resblock.module_list.4.1.conv.0.weight': 'models.41.conv30.weight',\n        'down3.resblock.module_list.4.1.conv.1.weight': 'models.41.bn30.weight',\n        'down3.resblock.module_list.4.1.conv.1.bias': 
'models.41.bn30.bias',\n        'down3.resblock.module_list.4.1.conv.1.running_mean': 'models.41.bn30.running_mean',\n        'down3.resblock.module_list.4.1.conv.1.running_var': 'models.41.bn30.running_var',\n        'down3.resblock.module_list.4.1.conv.1.num_batches_tracked': 'models.41.bn30.num_batches_tracked',\n        'down3.resblock.module_list.5.0.conv.0.weight': 'models.43.conv31.weight',\n        'down3.resblock.module_list.5.0.conv.1.weight': 'models.43.bn31.weight',\n        'down3.resblock.module_list.5.0.conv.1.bias': 'models.43.bn31.bias',\n        'down3.resblock.module_list.5.0.conv.1.running_mean': 'models.43.bn31.running_mean',\n        'down3.resblock.module_list.5.0.conv.1.running_var': 'models.43.bn31.running_var',\n        'down3.resblock.module_list.5.0.conv.1.num_batches_tracked': 'models.43.bn31.num_batches_tracked',\n        'down3.resblock.module_list.5.1.conv.0.weight': 'models.44.conv32.weight',\n        'down3.resblock.module_list.5.1.conv.1.weight': 'models.44.bn32.weight',\n        'down3.resblock.module_list.5.1.conv.1.bias': 'models.44.bn32.bias',\n        'down3.resblock.module_list.5.1.conv.1.running_mean': 'models.44.bn32.running_mean',\n        'down3.resblock.module_list.5.1.conv.1.running_var': 'models.44.bn32.running_var',\n        'down3.resblock.module_list.5.1.conv.1.num_batches_tracked': 'models.44.bn32.num_batches_tracked',\n        'down3.resblock.module_list.6.0.conv.0.weight': 'models.46.conv33.weight',\n        'down3.resblock.module_list.6.0.conv.1.weight': 'models.46.bn33.weight',\n        'down3.resblock.module_list.6.0.conv.1.bias': 'models.46.bn33.bias',\n        'down3.resblock.module_list.6.0.conv.1.running_mean': 'models.46.bn33.running_mean',\n        'down3.resblock.module_list.6.0.conv.1.running_var': 'models.46.bn33.running_var',\n        'down3.resblock.module_list.6.0.conv.1.num_batches_tracked': 'models.46.bn33.num_batches_tracked',\n        'down3.resblock.module_list.6.1.conv.0.weight': 'models.47.conv34.weight',\n        'down3.resblock.module_list.6.1.conv.1.weight': 'models.47.bn34.weight',\n        'down3.resblock.module_list.6.1.conv.1.bias': 'models.47.bn34.bias',\n        'down3.resblock.module_list.6.1.conv.1.running_mean': 'models.47.bn34.running_mean',\n        'down3.resblock.module_list.6.1.conv.1.running_var': 'models.47.bn34.running_var',\n        'down3.resblock.module_list.6.1.conv.1.num_batches_tracked': 'models.47.bn34.num_batches_tracked',\n        'down3.resblock.module_list.7.0.conv.0.weight': 'models.49.conv35.weight',\n        'down3.resblock.module_list.7.0.conv.1.weight': 'models.49.bn35.weight',\n        'down3.resblock.module_list.7.0.conv.1.bias': 'models.49.bn35.bias',\n        'down3.resblock.module_list.7.0.conv.1.running_mean': 'models.49.bn35.running_mean',\n        'down3.resblock.module_list.7.0.conv.1.running_var': 'models.49.bn35.running_var',\n        'down3.resblock.module_list.7.0.conv.1.num_batches_tracked': 'models.49.bn35.num_batches_tracked',\n        'down3.resblock.module_list.7.1.conv.0.weight': 'models.50.conv36.weight',\n        'down3.resblock.module_list.7.1.conv.1.weight': 'models.50.bn36.weight',\n        'down3.resblock.module_list.7.1.conv.1.bias': 'models.50.bn36.bias',\n        'down3.resblock.module_list.7.1.conv.1.running_mean': 'models.50.bn36.running_mean',\n        'down3.resblock.module_list.7.1.conv.1.running_var': 'models.50.bn36.running_var',\n        'down3.resblock.module_list.7.1.conv.1.num_batches_tracked': 'models.50.bn36.num_batches_tracked',\n        
'down3.conv4.conv.0.weight': 'models.52.conv37.weight',\n        'down3.conv4.conv.1.weight': 'models.52.bn37.weight',\n        'down3.conv4.conv.1.bias': 'models.52.bn37.bias',\n        'down3.conv4.conv.1.running_mean': 'models.52.bn37.running_mean',\n        'down3.conv4.conv.1.running_var': 'models.52.bn37.running_var',\n        'down3.conv4.conv.1.num_batches_tracked': 'models.52.bn37.num_batches_tracked',\n        'down3.conv5.conv.0.weight': 'models.54.conv38.weight',\n        'down3.conv5.conv.1.weight': 'models.54.bn38.weight',\n        'down3.conv5.conv.1.bias': 'models.54.bn38.bias',\n        'down3.conv5.conv.1.running_mean': 'models.54.bn38.running_mean',\n        'down3.conv5.conv.1.running_var': 'models.54.bn38.running_var',\n        'down3.conv5.conv.1.num_batches_tracked': 'models.54.bn38.num_batches_tracked',\n        'down4.conv1.conv.0.weight': 'models.55.conv39.weight',\n        'down4.conv1.conv.1.weight': 'models.55.bn39.weight',\n        'down4.conv1.conv.1.bias': 'models.55.bn39.bias',\n        'down4.conv1.conv.1.running_mean': 'models.55.bn39.running_mean',\n        'down4.conv1.conv.1.running_var': 'models.55.bn39.running_var',\n        'down4.conv1.conv.1.num_batches_tracked': 'models.55.bn39.num_batches_tracked',\n        'down4.conv2.conv.0.weight': 'models.56.conv40.weight',\n        'down4.conv2.conv.1.weight': 'models.56.bn40.weight',\n        'down4.conv2.conv.1.bias': 'models.56.bn40.bias',\n        'down4.conv2.conv.1.running_mean': 'models.56.bn40.running_mean',\n        'down4.conv2.conv.1.running_var': 'models.56.bn40.running_var',\n        'down4.conv2.conv.1.num_batches_tracked': 'models.56.bn40.num_batches_tracked',\n        'down4.conv3.conv.0.weight': 'models.58.conv41.weight',\n        'down4.conv3.conv.1.weight': 'models.58.bn41.weight',\n        'down4.conv3.conv.1.bias': 'models.58.bn41.bias',\n        'down4.conv3.conv.1.running_mean': 'models.58.bn41.running_mean',\n        'down4.conv3.conv.1.running_var': 'models.58.bn41.running_var',\n        'down4.conv3.conv.1.num_batches_tracked': 'models.58.bn41.num_batches_tracked',\n        'down4.resblock.module_list.0.0.conv.0.weight': 'models.59.conv42.weight',\n        'down4.resblock.module_list.0.0.conv.1.weight': 'models.59.bn42.weight',\n        'down4.resblock.module_list.0.0.conv.1.bias': 'models.59.bn42.bias',\n        'down4.resblock.module_list.0.0.conv.1.running_mean': 'models.59.bn42.running_mean',\n        'down4.resblock.module_list.0.0.conv.1.running_var': 'models.59.bn42.running_var',\n        'down4.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.59.bn42.num_batches_tracked',\n        'down4.resblock.module_list.0.1.conv.0.weight': 'models.60.conv43.weight',\n        'down4.resblock.module_list.0.1.conv.1.weight': 'models.60.bn43.weight',\n        'down4.resblock.module_list.0.1.conv.1.bias': 'models.60.bn43.bias',\n        'down4.resblock.module_list.0.1.conv.1.running_mean': 'models.60.bn43.running_mean',\n        'down4.resblock.module_list.0.1.conv.1.running_var': 'models.60.bn43.running_var',\n        'down4.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.60.bn43.num_batches_tracked',\n        'down4.resblock.module_list.1.0.conv.0.weight': 'models.62.conv44.weight',\n        'down4.resblock.module_list.1.0.conv.1.weight': 'models.62.bn44.weight',\n        'down4.resblock.module_list.1.0.conv.1.bias': 'models.62.bn44.bias',\n        'down4.resblock.module_list.1.0.conv.1.running_mean': 'models.62.bn44.running_mean',\n        
'down4.resblock.module_list.1.0.conv.1.running_var': 'models.62.bn44.running_var',\n        'down4.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.62.bn44.num_batches_tracked',\n        'down4.resblock.module_list.1.1.conv.0.weight': 'models.63.conv45.weight',\n        'down4.resblock.module_list.1.1.conv.1.weight': 'models.63.bn45.weight',\n        'down4.resblock.module_list.1.1.conv.1.bias': 'models.63.bn45.bias',\n        'down4.resblock.module_list.1.1.conv.1.running_mean': 'models.63.bn45.running_mean',\n        'down4.resblock.module_list.1.1.conv.1.running_var': 'models.63.bn45.running_var',\n        'down4.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.63.bn45.num_batches_tracked',\n        'down4.resblock.module_list.2.0.conv.0.weight': 'models.65.conv46.weight',\n        'down4.resblock.module_list.2.0.conv.1.weight': 'models.65.bn46.weight',\n        'down4.resblock.module_list.2.0.conv.1.bias': 'models.65.bn46.bias',\n        'down4.resblock.module_list.2.0.conv.1.running_mean': 'models.65.bn46.running_mean',\n        'down4.resblock.module_list.2.0.conv.1.running_var': 'models.65.bn46.running_var',\n        'down4.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.65.bn46.num_batches_tracked',\n        'down4.resblock.module_list.2.1.conv.0.weight': 'models.66.conv47.weight',\n        'down4.resblock.module_list.2.1.conv.1.weight': 'models.66.bn47.weight',\n        'down4.resblock.module_list.2.1.conv.1.bias': 'models.66.bn47.bias',\n        'down4.resblock.module_list.2.1.conv.1.running_mean': 'models.66.bn47.running_mean',\n        'down4.resblock.module_list.2.1.conv.1.running_var': 'models.66.bn47.running_var',\n        'down4.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.66.bn47.num_batches_tracked',\n        'down4.resblock.module_list.3.0.conv.0.weight': 'models.68.conv48.weight',\n        'down4.resblock.module_list.3.0.conv.1.weight': 'models.68.bn48.weight',\n        'down4.resblock.module_list.3.0.conv.1.bias': 'models.68.bn48.bias',\n        'down4.resblock.module_list.3.0.conv.1.running_mean': 'models.68.bn48.running_mean',\n        'down4.resblock.module_list.3.0.conv.1.running_var': 'models.68.bn48.running_var',\n        'down4.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.68.bn48.num_batches_tracked',\n        'down4.resblock.module_list.3.1.conv.0.weight': 'models.69.conv49.weight',\n        'down4.resblock.module_list.3.1.conv.1.weight': 'models.69.bn49.weight',\n        'down4.resblock.module_list.3.1.conv.1.bias': 'models.69.bn49.bias',\n        'down4.resblock.module_list.3.1.conv.1.running_mean': 'models.69.bn49.running_mean',\n        'down4.resblock.module_list.3.1.conv.1.running_var': 'models.69.bn49.running_var',\n        'down4.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.69.bn49.num_batches_tracked',\n        'down4.resblock.module_list.4.0.conv.0.weight': 'models.71.conv50.weight',\n        'down4.resblock.module_list.4.0.conv.1.weight': 'models.71.bn50.weight',\n        'down4.resblock.module_list.4.0.conv.1.bias': 'models.71.bn50.bias',\n        'down4.resblock.module_list.4.0.conv.1.running_mean': 'models.71.bn50.running_mean',\n        'down4.resblock.module_list.4.0.conv.1.running_var': 'models.71.bn50.running_var',\n        'down4.resblock.module_list.4.0.conv.1.num_batches_tracked': 'models.71.bn50.num_batches_tracked',\n        'down4.resblock.module_list.4.1.conv.0.weight': 'models.72.conv51.weight',\n        
'down4.resblock.module_list.4.1.conv.1.weight': 'models.72.bn51.weight',\n        'down4.resblock.module_list.4.1.conv.1.bias': 'models.72.bn51.bias',\n        'down4.resblock.module_list.4.1.conv.1.running_mean': 'models.72.bn51.running_mean',\n        'down4.resblock.module_list.4.1.conv.1.running_var': 'models.72.bn51.running_var',\n        'down4.resblock.module_list.4.1.conv.1.num_batches_tracked': 'models.72.bn51.num_batches_tracked',\n        'down4.resblock.module_list.5.0.conv.0.weight': 'models.74.conv52.weight',\n        'down4.resblock.module_list.5.0.conv.1.weight': 'models.74.bn52.weight',\n        'down4.resblock.module_list.5.0.conv.1.bias': 'models.74.bn52.bias',\n        'down4.resblock.module_list.5.0.conv.1.running_mean': 'models.74.bn52.running_mean',\n        'down4.resblock.module_list.5.0.conv.1.running_var': 'models.74.bn52.running_var',\n        'down4.resblock.module_list.5.0.conv.1.num_batches_tracked': 'models.74.bn52.num_batches_tracked',\n        'down4.resblock.module_list.5.1.conv.0.weight': 'models.75.conv53.weight',\n        'down4.resblock.module_list.5.1.conv.1.weight': 'models.75.bn53.weight',\n        'down4.resblock.module_list.5.1.conv.1.bias': 'models.75.bn53.bias',\n        'down4.resblock.module_list.5.1.conv.1.running_mean': 'models.75.bn53.running_mean',\n        'down4.resblock.module_list.5.1.conv.1.running_var': 'models.75.bn53.running_var',\n        'down4.resblock.module_list.5.1.conv.1.num_batches_tracked': 'models.75.bn53.num_batches_tracked',\n        'down4.resblock.module_list.6.0.conv.0.weight': 'models.77.conv54.weight',\n        'down4.resblock.module_list.6.0.conv.1.weight': 'models.77.bn54.weight',\n        'down4.resblock.module_list.6.0.conv.1.bias': 'models.77.bn54.bias',\n        'down4.resblock.module_list.6.0.conv.1.running_mean': 'models.77.bn54.running_mean',\n        'down4.resblock.module_list.6.0.conv.1.running_var': 'models.77.bn54.running_var',\n        'down4.resblock.module_list.6.0.conv.1.num_batches_tracked': 'models.77.bn54.num_batches_tracked',\n        'down4.resblock.module_list.6.1.conv.0.weight': 'models.78.conv55.weight',\n        'down4.resblock.module_list.6.1.conv.1.weight': 'models.78.bn55.weight',\n        'down4.resblock.module_list.6.1.conv.1.bias': 'models.78.bn55.bias',\n        'down4.resblock.module_list.6.1.conv.1.running_mean': 'models.78.bn55.running_mean',\n        'down4.resblock.module_list.6.1.conv.1.running_var': 'models.78.bn55.running_var',\n        'down4.resblock.module_list.6.1.conv.1.num_batches_tracked': 'models.78.bn55.num_batches_tracked',\n        'down4.resblock.module_list.7.0.conv.0.weight': 'models.80.conv56.weight',\n        'down4.resblock.module_list.7.0.conv.1.weight': 'models.80.bn56.weight',\n        'down4.resblock.module_list.7.0.conv.1.bias': 'models.80.bn56.bias',\n        'down4.resblock.module_list.7.0.conv.1.running_mean': 'models.80.bn56.running_mean',\n        'down4.resblock.module_list.7.0.conv.1.running_var': 'models.80.bn56.running_var',\n        'down4.resblock.module_list.7.0.conv.1.num_batches_tracked': 'models.80.bn56.num_batches_tracked',\n        'down4.resblock.module_list.7.1.conv.0.weight': 'models.81.conv57.weight',\n        'down4.resblock.module_list.7.1.conv.1.weight': 'models.81.bn57.weight',\n        'down4.resblock.module_list.7.1.conv.1.bias': 'models.81.bn57.bias',\n        'down4.resblock.module_list.7.1.conv.1.running_mean': 'models.81.bn57.running_mean',\n        'down4.resblock.module_list.7.1.conv.1.running_var': 
'models.81.bn57.running_var',\n        'down4.resblock.module_list.7.1.conv.1.num_batches_tracked': 'models.81.bn57.num_batches_tracked',\n        'down4.conv4.conv.0.weight': 'models.83.conv58.weight',\n        'down4.conv4.conv.1.weight': 'models.83.bn58.weight',\n        'down4.conv4.conv.1.bias': 'models.83.bn58.bias',\n        'down4.conv4.conv.1.running_mean': 'models.83.bn58.running_mean',\n        'down4.conv4.conv.1.running_var': 'models.83.bn58.running_var',\n        'down4.conv4.conv.1.num_batches_tracked': 'models.83.bn58.num_batches_tracked',\n        'down4.conv5.conv.0.weight': 'models.85.conv59.weight',\n        'down4.conv5.conv.1.weight': 'models.85.bn59.weight',\n        'down4.conv5.conv.1.bias': 'models.85.bn59.bias',\n        'down4.conv5.conv.1.running_mean': 'models.85.bn59.running_mean',\n        'down4.conv5.conv.1.running_var': 'models.85.bn59.running_var',\n        'down4.conv5.conv.1.num_batches_tracked': 'models.85.bn59.num_batches_tracked',\n        'down5.conv1.conv.0.weight': 'models.86.conv60.weight',\n        'down5.conv1.conv.1.weight': 'models.86.bn60.weight',\n        'down5.conv1.conv.1.bias': 'models.86.bn60.bias',\n        'down5.conv1.conv.1.running_mean': 'models.86.bn60.running_mean',\n        'down5.conv1.conv.1.running_var': 'models.86.bn60.running_var',\n        'down5.conv1.conv.1.num_batches_tracked': 'models.86.bn60.num_batches_tracked',\n        'down5.conv2.conv.0.weight': 'models.87.conv61.weight',\n        'down5.conv2.conv.1.weight': 'models.87.bn61.weight',\n        'down5.conv2.conv.1.bias': 'models.87.bn61.bias',\n        'down5.conv2.conv.1.running_mean': 'models.87.bn61.running_mean',\n        'down5.conv2.conv.1.running_var': 'models.87.bn61.running_var',\n        'down5.conv2.conv.1.num_batches_tracked': 'models.87.bn61.num_batches_tracked',\n        'down5.conv3.conv.0.weight': 'models.89.conv62.weight',\n        'down5.conv3.conv.1.weight': 'models.89.bn62.weight',\n        'down5.conv3.conv.1.bias': 'models.89.bn62.bias',\n        'down5.conv3.conv.1.running_mean': 'models.89.bn62.running_mean',\n        'down5.conv3.conv.1.running_var': 'models.89.bn62.running_var',\n        'down5.conv3.conv.1.num_batches_tracked': 'models.89.bn62.num_batches_tracked',\n        'down5.resblock.module_list.0.0.conv.0.weight': 'models.90.conv63.weight',\n        'down5.resblock.module_list.0.0.conv.1.weight': 'models.90.bn63.weight',\n        'down5.resblock.module_list.0.0.conv.1.bias': 'models.90.bn63.bias',\n        'down5.resblock.module_list.0.0.conv.1.running_mean': 'models.90.bn63.running_mean',\n        'down5.resblock.module_list.0.0.conv.1.running_var': 'models.90.bn63.running_var',\n        'down5.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.90.bn63.num_batches_tracked',\n        'down5.resblock.module_list.0.1.conv.0.weight': 'models.91.conv64.weight',\n        'down5.resblock.module_list.0.1.conv.1.weight': 'models.91.bn64.weight',\n        'down5.resblock.module_list.0.1.conv.1.bias': 'models.91.bn64.bias',\n        'down5.resblock.module_list.0.1.conv.1.running_mean': 'models.91.bn64.running_mean',\n        'down5.resblock.module_list.0.1.conv.1.running_var': 'models.91.bn64.running_var',\n        'down5.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.91.bn64.num_batches_tracked',\n        'down5.resblock.module_list.1.0.conv.0.weight': 'models.93.conv65.weight',\n        'down5.resblock.module_list.1.0.conv.1.weight': 'models.93.bn65.weight',\n        'down5.resblock.module_list.1.0.conv.1.bias': 
'models.93.bn65.bias',\n        'down5.resblock.module_list.1.0.conv.1.running_mean': 'models.93.bn65.running_mean',\n        'down5.resblock.module_list.1.0.conv.1.running_var': 'models.93.bn65.running_var',\n        'down5.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.93.bn65.num_batches_tracked',\n        'down5.resblock.module_list.1.1.conv.0.weight': 'models.94.conv66.weight',\n        'down5.resblock.module_list.1.1.conv.1.weight': 'models.94.bn66.weight',\n        'down5.resblock.module_list.1.1.conv.1.bias': 'models.94.bn66.bias',\n        'down5.resblock.module_list.1.1.conv.1.running_mean': 'models.94.bn66.running_mean',\n        'down5.resblock.module_list.1.1.conv.1.running_var': 'models.94.bn66.running_var',\n        'down5.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.94.bn66.num_batches_tracked',\n        'down5.resblock.module_list.2.0.conv.0.weight': 'models.96.conv67.weight',\n        'down5.resblock.module_list.2.0.conv.1.weight': 'models.96.bn67.weight',\n        'down5.resblock.module_list.2.0.conv.1.bias': 'models.96.bn67.bias',\n        'down5.resblock.module_list.2.0.conv.1.running_mean': 'models.96.bn67.running_mean',\n        'down5.resblock.module_list.2.0.conv.1.running_var': 'models.96.bn67.running_var',\n        'down5.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.96.bn67.num_batches_tracked',\n        'down5.resblock.module_list.2.1.conv.0.weight': 'models.97.conv68.weight',\n        'down5.resblock.module_list.2.1.conv.1.weight': 'models.97.bn68.weight',\n        'down5.resblock.module_list.2.1.conv.1.bias': 'models.97.bn68.bias',\n        'down5.resblock.module_list.2.1.conv.1.running_mean': 'models.97.bn68.running_mean',\n        'down5.resblock.module_list.2.1.conv.1.running_var': 'models.97.bn68.running_var',\n        'down5.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.97.bn68.num_batches_tracked',\n        'down5.resblock.module_list.3.0.conv.0.weight': 'models.99.conv69.weight',\n        'down5.resblock.module_list.3.0.conv.1.weight': 'models.99.bn69.weight',\n        'down5.resblock.module_list.3.0.conv.1.bias': 'models.99.bn69.bias',\n        'down5.resblock.module_list.3.0.conv.1.running_mean': 'models.99.bn69.running_mean',\n        'down5.resblock.module_list.3.0.conv.1.running_var': 'models.99.bn69.running_var',\n        'down5.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.99.bn69.num_batches_tracked',\n        'down5.resblock.module_list.3.1.conv.0.weight': 'models.100.conv70.weight',\n        'down5.resblock.module_list.3.1.conv.1.weight': 'models.100.bn70.weight',\n        'down5.resblock.module_list.3.1.conv.1.bias': 'models.100.bn70.bias',\n        'down5.resblock.module_list.3.1.conv.1.running_mean': 'models.100.bn70.running_mean',\n        'down5.resblock.module_list.3.1.conv.1.running_var': 'models.100.bn70.running_var',\n        'down5.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.100.bn70.num_batches_tracked',\n        'down5.conv4.conv.0.weight': 'models.102.conv71.weight',\n        'down5.conv4.conv.1.weight': 'models.102.bn71.weight',\n        'down5.conv4.conv.1.bias': 'models.102.bn71.bias',\n        'down5.conv4.conv.1.running_mean': 'models.102.bn71.running_mean',\n        'down5.conv4.conv.1.running_var': 'models.102.bn71.running_var',\n        'down5.conv4.conv.1.num_batches_tracked': 'models.102.bn71.num_batches_tracked',\n        'down5.conv5.conv.0.weight': 'models.104.conv72.weight',\n        'down5.conv5.conv.1.weight': 
'models.104.bn72.weight',\n        'down5.conv5.conv.1.bias': 'models.104.bn72.bias',\n        'down5.conv5.conv.1.running_mean': 'models.104.bn72.running_mean',\n        'down5.conv5.conv.1.running_var': 'models.104.bn72.running_var',\n        'down5.conv5.conv.1.num_batches_tracked': 'models.104.bn72.num_batches_tracked',\n        'neek.conv1.conv.0.weight': 'models.105.conv73.weight',\n        'neek.conv1.conv.1.weight': 'models.105.bn73.weight',\n        'neek.conv1.conv.1.bias': 'models.105.bn73.bias',\n        'neek.conv1.conv.1.running_mean': 'models.105.bn73.running_mean',\n        'neek.conv1.conv.1.running_var': 'models.105.bn73.running_var',\n        'neek.conv1.conv.1.num_batches_tracked': 'models.105.bn73.num_batches_tracked',\n        'neek.conv2.conv.0.weight': 'models.106.conv74.weight',\n        'neek.conv2.conv.1.weight': 'models.106.bn74.weight',\n        'neek.conv2.conv.1.bias': 'models.106.bn74.bias',\n        'neek.conv2.conv.1.running_mean': 'models.106.bn74.running_mean',\n        'neek.conv2.conv.1.running_var': 'models.106.bn74.running_var',\n        'neek.conv2.conv.1.num_batches_tracked': 'models.106.bn74.num_batches_tracked',\n        'neek.conv3.conv.0.weight': 'models.107.conv75.weight',\n        'neek.conv3.conv.1.weight': 'models.107.bn75.weight',\n        'neek.conv3.conv.1.bias': 'models.107.bn75.bias',\n        'neek.conv3.conv.1.running_mean': 'models.107.bn75.running_mean',\n        'neek.conv3.conv.1.running_var': 'models.107.bn75.running_var',\n        'neek.conv3.conv.1.num_batches_tracked': 'models.107.bn75.num_batches_tracked',\n        'neek.conv4.conv.0.weight': 'models.114.conv76.weight',\n        'neek.conv4.conv.1.weight': 'models.114.bn76.weight',\n        'neek.conv4.conv.1.bias': 'models.114.bn76.bias',\n        'neek.conv4.conv.1.running_mean': 'models.114.bn76.running_mean',\n        'neek.conv4.conv.1.running_var': 'models.114.bn76.running_var',\n        'neek.conv4.conv.1.num_batches_tracked': 'models.114.bn76.num_batches_tracked',\n        'neek.conv5.conv.0.weight': 'models.115.conv77.weight',\n        'neek.conv5.conv.1.weight': 'models.115.bn77.weight',\n        'neek.conv5.conv.1.bias': 'models.115.bn77.bias',\n        'neek.conv5.conv.1.running_mean': 'models.115.bn77.running_mean',\n        'neek.conv5.conv.1.running_var': 'models.115.bn77.running_var',\n        'neek.conv5.conv.1.num_batches_tracked': 'models.115.bn77.num_batches_tracked',\n        'neek.conv6.conv.0.weight': 'models.116.conv78.weight',\n        'neek.conv6.conv.1.weight': 'models.116.bn78.weight',\n        'neek.conv6.conv.1.bias': 'models.116.bn78.bias',\n        'neek.conv6.conv.1.running_mean': 'models.116.bn78.running_mean',\n        'neek.conv6.conv.1.running_var': 'models.116.bn78.running_var',\n        'neek.conv6.conv.1.num_batches_tracked': 'models.116.bn78.num_batches_tracked',\n        'neek.conv7.conv.0.weight': 'models.117.conv79.weight',\n        'neek.conv7.conv.1.weight': 'models.117.bn79.weight',\n        'neek.conv7.conv.1.bias': 'models.117.bn79.bias',\n        'neek.conv7.conv.1.running_mean': 'models.117.bn79.running_mean',\n        'neek.conv7.conv.1.running_var': 'models.117.bn79.running_var',\n        'neek.conv7.conv.1.num_batches_tracked': 'models.117.bn79.num_batches_tracked',\n        'neek.conv8.conv.0.weight': 'models.120.conv80.weight',\n        'neek.conv8.conv.1.weight': 'models.120.bn80.weight',\n        'neek.conv8.conv.1.bias': 'models.120.bn80.bias',\n        'neek.conv8.conv.1.running_mean': 
'models.120.bn80.running_mean',\n        'neek.conv8.conv.1.running_var': 'models.120.bn80.running_var',\n        'neek.conv8.conv.1.num_batches_tracked': 'models.120.bn80.num_batches_tracked',\n        'neek.conv9.conv.0.weight': 'models.122.conv81.weight',\n        'neek.conv9.conv.1.weight': 'models.122.bn81.weight',\n        'neek.conv9.conv.1.bias': 'models.122.bn81.bias',\n        'neek.conv9.conv.1.running_mean': 'models.122.bn81.running_mean',\n        'neek.conv9.conv.1.running_var': 'models.122.bn81.running_var',\n        'neek.conv9.conv.1.num_batches_tracked': 'models.122.bn81.num_batches_tracked',\n        'neek.conv10.conv.0.weight': 'models.123.conv82.weight',\n        'neek.conv10.conv.1.weight': 'models.123.bn82.weight',\n        'neek.conv10.conv.1.bias': 'models.123.bn82.bias',\n        'neek.conv10.conv.1.running_mean': 'models.123.bn82.running_mean',\n        'neek.conv10.conv.1.running_var': 'models.123.bn82.running_var',\n        'neek.conv10.conv.1.num_batches_tracked': 'models.123.bn82.num_batches_tracked',\n        'neek.conv11.conv.0.weight': 'models.124.conv83.weight',\n        'neek.conv11.conv.1.weight': 'models.124.bn83.weight',\n        'neek.conv11.conv.1.bias': 'models.124.bn83.bias',\n        'neek.conv11.conv.1.running_mean': 'models.124.bn83.running_mean',\n        'neek.conv11.conv.1.running_var': 'models.124.bn83.running_var',\n        'neek.conv11.conv.1.num_batches_tracked': 'models.124.bn83.num_batches_tracked',\n        'neek.conv12.conv.0.weight': 'models.125.conv84.weight',\n        'neek.conv12.conv.1.weight': 'models.125.bn84.weight',\n        'neek.conv12.conv.1.bias': 'models.125.bn84.bias',\n        'neek.conv12.conv.1.running_mean': 'models.125.bn84.running_mean',\n        'neek.conv12.conv.1.running_var': 'models.125.bn84.running_var',\n        'neek.conv12.conv.1.num_batches_tracked': 'models.125.bn84.num_batches_tracked',\n        'neek.conv13.conv.0.weight': 'models.126.conv85.weight',\n        'neek.conv13.conv.1.weight': 'models.126.bn85.weight',\n        'neek.conv13.conv.1.bias': 'models.126.bn85.bias',\n        'neek.conv13.conv.1.running_mean': 'models.126.bn85.running_mean',\n        'neek.conv13.conv.1.running_var': 'models.126.bn85.running_var',\n        'neek.conv13.conv.1.num_batches_tracked': 'models.126.bn85.num_batches_tracked',\n        'neek.conv14.conv.0.weight': 'models.127.conv86.weight',\n        'neek.conv14.conv.1.weight': 'models.127.bn86.weight',\n        'neek.conv14.conv.1.bias': 'models.127.bn86.bias',\n        'neek.conv14.conv.1.running_mean': 'models.127.bn86.running_mean',\n        'neek.conv14.conv.1.running_var': 'models.127.bn86.running_var',\n        'neek.conv14.conv.1.num_batches_tracked': 'models.127.bn86.num_batches_tracked',\n        'neek.conv15.conv.0.weight': 'models.130.conv87.weight',\n        'neek.conv15.conv.1.weight': 'models.130.bn87.weight',\n        'neek.conv15.conv.1.bias': 'models.130.bn87.bias',\n        'neek.conv15.conv.1.running_mean': 'models.130.bn87.running_mean',\n        'neek.conv15.conv.1.running_var': 'models.130.bn87.running_var',\n        'neek.conv15.conv.1.num_batches_tracked': 'models.130.bn87.num_batches_tracked',\n        'neek.conv16.conv.0.weight': 'models.132.conv88.weight',\n        'neek.conv16.conv.1.weight': 'models.132.bn88.weight',\n        'neek.conv16.conv.1.bias': 'models.132.bn88.bias',\n        'neek.conv16.conv.1.running_mean': 'models.132.bn88.running_mean',\n        'neek.conv16.conv.1.running_var': 'models.132.bn88.running_var',\n        
'neek.conv16.conv.1.num_batches_tracked': 'models.132.bn88.num_batches_tracked',\n        'neek.conv17.conv.0.weight': 'models.133.conv89.weight',\n        'neek.conv17.conv.1.weight': 'models.133.bn89.weight',\n        'neek.conv17.conv.1.bias': 'models.133.bn89.bias',\n        'neek.conv17.conv.1.running_mean': 'models.133.bn89.running_mean',\n        'neek.conv17.conv.1.running_var': 'models.133.bn89.running_var',\n        'neek.conv17.conv.1.num_batches_tracked': 'models.133.bn89.num_batches_tracked',\n        'neek.conv18.conv.0.weight': 'models.134.conv90.weight',\n        'neek.conv18.conv.1.weight': 'models.134.bn90.weight',\n        'neek.conv18.conv.1.bias': 'models.134.bn90.bias',\n        'neek.conv18.conv.1.running_mean': 'models.134.bn90.running_mean',\n        'neek.conv18.conv.1.running_var': 'models.134.bn90.running_var',\n        'neek.conv18.conv.1.num_batches_tracked': 'models.134.bn90.num_batches_tracked',\n        'neek.conv19.conv.0.weight': 'models.135.conv91.weight',\n        'neek.conv19.conv.1.weight': 'models.135.bn91.weight',\n        'neek.conv19.conv.1.bias': 'models.135.bn91.bias',\n        'neek.conv19.conv.1.running_mean': 'models.135.bn91.running_mean',\n        'neek.conv19.conv.1.running_var': 'models.135.bn91.running_var',\n        'neek.conv19.conv.1.num_batches_tracked': 'models.135.bn91.num_batches_tracked',\n        'neek.conv20.conv.0.weight': 'models.136.conv92.weight',\n        'neek.conv20.conv.1.weight': 'models.136.bn92.weight',\n        'neek.conv20.conv.1.bias': 'models.136.bn92.bias',\n        'neek.conv20.conv.1.running_mean': 'models.136.bn92.running_mean',\n        'neek.conv20.conv.1.running_var': 'models.136.bn92.running_var',\n        'neek.conv20.conv.1.num_batches_tracked': 'models.136.bn92.num_batches_tracked',\n        'head.conv1.conv.0.weight': 'models.137.conv93.weight',\n        'head.conv1.conv.1.weight': 'models.137.bn93.weight',\n        'head.conv1.conv.1.bias': 'models.137.bn93.bias',\n        'head.conv1.conv.1.running_mean': 'models.137.bn93.running_mean',\n        'head.conv1.conv.1.running_var': 'models.137.bn93.running_var',\n        'head.conv1.conv.1.num_batches_tracked': 'models.137.bn93.num_batches_tracked',\n        'head.conv2.conv.0.weight': 'models.138.conv94.weight',\n        'head.conv2.conv.0.bias': 'models.138.conv94.bias',\n        'head.conv3.conv.0.weight': 'models.141.conv95.weight',\n        'head.conv3.conv.1.weight': 'models.141.bn95.weight',\n        'head.conv3.conv.1.bias': 'models.141.bn95.bias',\n        'head.conv3.conv.1.running_mean': 'models.141.bn95.running_mean',\n        'head.conv3.conv.1.running_var': 'models.141.bn95.running_var',\n        'head.conv3.conv.1.num_batches_tracked': 'models.141.bn95.num_batches_tracked',\n        'head.conv4.conv.0.weight': 'models.143.conv96.weight',\n        'head.conv4.conv.1.weight': 'models.143.bn96.weight',\n        'head.conv4.conv.1.bias': 'models.143.bn96.bias',\n        'head.conv4.conv.1.running_mean': 'models.143.bn96.running_mean',\n        'head.conv4.conv.1.running_var': 'models.143.bn96.running_var',\n        'head.conv4.conv.1.num_batches_tracked': 'models.143.bn96.num_batches_tracked',\n        'head.conv5.conv.0.weight': 'models.144.conv97.weight',\n        'head.conv5.conv.1.weight': 'models.144.bn97.weight',\n        'head.conv5.conv.1.bias': 'models.144.bn97.bias',\n        'head.conv5.conv.1.running_mean': 'models.144.bn97.running_mean',\n        'head.conv5.conv.1.running_var': 'models.144.bn97.running_var',\n        
'head.conv5.conv.1.num_batches_tracked': 'models.144.bn97.num_batches_tracked',\n        'head.conv6.conv.0.weight': 'models.145.conv98.weight',\n        'head.conv6.conv.1.weight': 'models.145.bn98.weight',\n        'head.conv6.conv.1.bias': 'models.145.bn98.bias',\n        'head.conv6.conv.1.running_mean': 'models.145.bn98.running_mean',\n        'head.conv6.conv.1.running_var': 'models.145.bn98.running_var',\n        'head.conv6.conv.1.num_batches_tracked': 'models.145.bn98.num_batches_tracked',\n        'head.conv7.conv.0.weight': 'models.146.conv99.weight',\n        'head.conv7.conv.1.weight': 'models.146.bn99.weight',\n        'head.conv7.conv.1.bias': 'models.146.bn99.bias',\n        'head.conv7.conv.1.running_mean': 'models.146.bn99.running_mean',\n        'head.conv7.conv.1.running_var': 'models.146.bn99.running_var',\n        'head.conv7.conv.1.num_batches_tracked': 'models.146.bn99.num_batches_tracked',\n        'head.conv8.conv.0.weight': 'models.147.conv100.weight',\n        'head.conv8.conv.1.weight': 'models.147.bn100.weight',\n        'head.conv8.conv.1.bias': 'models.147.bn100.bias',\n        'head.conv8.conv.1.running_mean': 'models.147.bn100.running_mean',\n        'head.conv8.conv.1.running_var': 'models.147.bn100.running_var',\n        'head.conv8.conv.1.num_batches_tracked': 'models.147.bn100.num_batches_tracked',\n        'head.conv9.conv.0.weight': 'models.148.conv101.weight',\n        'head.conv9.conv.1.weight': 'models.148.bn101.weight',\n        'head.conv9.conv.1.bias': 'models.148.bn101.bias',\n        'head.conv9.conv.1.running_mean': 'models.148.bn101.running_mean',\n        'head.conv9.conv.1.running_var': 'models.148.bn101.running_var',\n        'head.conv9.conv.1.num_batches_tracked': 'models.148.bn101.num_batches_tracked',\n        'head.conv10.conv.0.weight': 'models.149.conv102.weight',\n        'head.conv10.conv.0.bias': 'models.149.conv102.bias',\n        'head.conv11.conv.0.weight': 'models.152.conv103.weight',\n        'head.conv11.conv.1.weight': 'models.152.bn103.weight',\n        'head.conv11.conv.1.bias': 'models.152.bn103.bias',\n        'head.conv11.conv.1.running_mean': 'models.152.bn103.running_mean',\n        'head.conv11.conv.1.running_var': 'models.152.bn103.running_var',\n        'head.conv11.conv.1.num_batches_tracked': 'models.152.bn103.num_batches_tracked',\n        'head.conv12.conv.0.weight': 'models.154.conv104.weight',\n        'head.conv12.conv.1.weight': 'models.154.bn104.weight',\n        'head.conv12.conv.1.bias': 'models.154.bn104.bias',\n        'head.conv12.conv.1.running_mean': 'models.154.bn104.running_mean',\n        'head.conv12.conv.1.running_var': 'models.154.bn104.running_var',\n        'head.conv12.conv.1.num_batches_tracked': 'models.154.bn104.num_batches_tracked',\n        'head.conv13.conv.0.weight': 'models.155.conv105.weight',\n        'head.conv13.conv.1.weight': 'models.155.bn105.weight',\n        'head.conv13.conv.1.bias': 'models.155.bn105.bias',\n        'head.conv13.conv.1.running_mean': 'models.155.bn105.running_mean',\n        'head.conv13.conv.1.running_var': 'models.155.bn105.running_var',\n        'head.conv13.conv.1.num_batches_tracked': 'models.155.bn105.num_batches_tracked',\n        'head.conv14.conv.0.weight': 'models.156.conv106.weight',\n        'head.conv14.conv.1.weight': 'models.156.bn106.weight',\n        'head.conv14.conv.1.bias': 'models.156.bn106.bias',\n        'head.conv14.conv.1.running_mean': 'models.156.bn106.running_mean',\n        'head.conv14.conv.1.running_var': 
'models.156.bn106.running_var',\n        'head.conv14.conv.1.num_batches_tracked': 'models.156.bn106.num_batches_tracked',\n        'head.conv15.conv.0.weight': 'models.157.conv107.weight',\n        'head.conv15.conv.1.weight': 'models.157.bn107.weight',\n        'head.conv15.conv.1.bias': 'models.157.bn107.bias',\n        'head.conv15.conv.1.running_mean': 'models.157.bn107.running_mean',\n        'head.conv15.conv.1.running_var': 'models.157.bn107.running_var',\n        'head.conv15.conv.1.num_batches_tracked': 'models.157.bn107.num_batches_tracked',\n        'head.conv16.conv.0.weight': 'models.158.conv108.weight',\n        'head.conv16.conv.1.weight': 'models.158.bn108.weight',\n        'head.conv16.conv.1.bias': 'models.158.bn108.bias',\n        'head.conv16.conv.1.running_mean': 'models.158.bn108.running_mean',\n        'head.conv16.conv.1.running_var': 'models.158.bn108.running_var',\n        'head.conv16.conv.1.num_batches_tracked': 'models.158.bn108.num_batches_tracked',\n        'head.conv17.conv.0.weight': 'models.159.conv109.weight',\n        'head.conv17.conv.1.weight': 'models.159.bn109.weight',\n        'head.conv17.conv.1.bias': 'models.159.bn109.bias',\n        'head.conv17.conv.1.running_mean': 'models.159.bn109.running_mean',\n        'head.conv17.conv.1.running_var': 'models.159.bn109.running_var',\n        'head.conv17.conv.1.num_batches_tracked': 'models.159.bn109.num_batches_tracked',\n        'head.conv18.conv.0.weight': 'models.160.conv110.weight',\n        'head.conv18.conv.0.bias': 'models.160.conv110.bias',\n    }\n    pth_weights = torch.load(checkpoint)\n    pt_weights = type(pth_weights)()\n    for name, new_name in name_mapping.items():\n        pt_weights[new_name] = pth_weights[name]\n    return pt_weights\n\n\ndef convert_pt_checkpoint_to_keras_h5(state_dict):\n    print('============================================================')\n\n    def copy1(conv, bn, idx):\n        keyword1 = 'conv%d.weight' % idx\n        keyword2 = 'bn%d.weight' % idx\n        keyword3 = 'bn%d.bias' % idx\n        keyword4 = 'bn%d.running_mean' % idx\n        keyword5 = 'bn%d.running_var' % idx\n        for key in state_dict:\n            value = state_dict[key].numpy()\n            if keyword1 in key:\n                w = value\n            elif keyword2 in key:\n                y = value\n            elif keyword3 in key:\n                b = value\n            elif keyword4 in key:\n                m = value\n            elif keyword5 in key:\n                v = value\n        w = w.transpose(2, 3, 1, 0)\n        conv.set_weights([w])\n        bn.set_weights([y, b, m, v])\n\n    def copy2(conv, idx):\n        keyword1 = 'conv%d.weight' % idx\n        keyword2 = 'conv%d.bias' % idx\n        for key in state_dict:\n            value = state_dict[key].numpy()\n            if keyword1 in key:\n                w = value\n            elif keyword2 in key:\n                b = value\n        w = w.transpose(2, 3, 1, 0)\n        conv.set_weights([w, b])\n\n    num_classes = 80\n    num_anchors = 3\n\n    with tf.Session(graph=tf.Graph()):\n        inputs = layers.Input(shape=[], dtype='string')\n        model_body = YOLOv4(inputs, num_classes, num_anchors)\n        model_body.summary()\n        layer_name_to_idx = {layer.name: idx for idx, layer in enumerate(model_body.layers)}\n\n        print('\\nCopying...')\n        i1 = layer_name_to_idx['conv2d']\n        i2 = layer_name_to_idx['batch_normalization']\n        copy1(model_body.layers[i1], model_body.layers[i2], 1)\n        
for i in range(2, 94, 1):\n            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]\n            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 1)]\n            copy1(model_body.layers[i1], model_body.layers[i2], i)\n        for i in range(95, 102, 1):\n            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]\n            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 2,)]\n            copy1(model_body.layers[i1], model_body.layers[i2], i)\n        for i in range(103, 110, 1):\n            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]\n            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 3,)]\n            copy1(model_body.layers[i1], model_body.layers[i2], i)\n\n        i1 = layer_name_to_idx['conv2d_93']\n        copy2(model_body.layers[i1], 94)\n        i1 = layer_name_to_idx['conv2d_101']\n        copy2(model_body.layers[i1], 102)\n        i1 = layer_name_to_idx['conv2d_109']\n        copy2(model_body.layers[i1], 110)\n\n        weights = model_body.get_weights()\n    print('\\nDone.')\n    return weights\n\n\nclass Mish(layers.Layer):\n\n    def __init__(self):\n        super(Mish, self).__init__()\n\n    def compute_output_shape(self, input_shape):\n        return input_shape\n\n    def call(self, x):\n        return x * tf.tanh(tf.math.softplus(x))\n\n\ndef conv2d_unit(x, filters, kernels, strides=1, padding='valid', bn=1, act='mish'):\n    use_bias = (bn != 1)\n    x = layers.Conv2D(filters, kernels,\n                      padding=padding,\n                      strides=strides,\n                      use_bias=use_bias,\n                      activation='linear',\n                      kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.01))(x)\n    if bn:\n        x = layers.BatchNormalization(fused=False)(x)\n    if act == 'leaky':\n        x = keras.layers.LeakyReLU(alpha=0.1)(x)\n    elif act == 'mish':\n        x = Mish()(x)\n    return x\n\n\ndef residual_block(inputs, filters_1, filters_2):\n    x = conv2d_unit(inputs, filters_1, 1, strides=1, padding='valid')\n    x = conv2d_unit(x, filters_2, 3, strides=1, padding='same')\n    x = layers.add([inputs, x])\n    return x\n\n\ndef stack_residual_block(inputs, filters_1, filters_2, n):\n    x = residual_block(inputs, filters_1, filters_2)\n    for i in range(n - 1):\n        x = residual_block(x, filters_1, filters_2)\n    return x\n\n\ndef spp(x):\n    x_1 = x\n    x_2 = layers.MaxPooling2D(pool_size=5, strides=1, padding='same')(x)\n    x_3 = layers.MaxPooling2D(pool_size=9, strides=1, padding='same')(x)\n    x_4 = layers.MaxPooling2D(pool_size=13, strides=1, padding='same')(x)\n    out = layers.Concatenate()([x_4, x_3, x_2, x_1])\n    return out\n\n\ndef YOLOv4(inputs, num_classes, num_anchors, input_shape=(608, 608), initial_filters=32,\n           fast=False, anchors=None, conf_thresh=0.05, nms_thresh=0.45, keep_top_k=100, nms_top_k=100):\n    i32 = initial_filters\n    i64 = i32 * 2\n    i128 = i32 * 4\n    i256 = i32 * 8\n    i512 = i32 * 16\n    i1024 = i32 * 32\n\n    x, image_shape = layers.Lambda(lambda t: preprocessor(t, input_shape))(inputs)\n\n    # cspdarknet53\n    x = conv2d_unit(x, i32, 3, strides=1, padding='same')\n\n    # ============================= s2 =============================\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)\n    x = conv2d_unit(x, i64, 3, strides=2)\n    s2 = conv2d_unit(x, i64, 1, strides=1)\n    x = conv2d_unit(x, i64, 1, strides=1)\n    x = stack_residual_block(x, i32, i64, n=1)\n    x = conv2d_unit(x, i64, 1, 
strides=1)\n    x = layers.Concatenate()([x, s2])\n    s2 = conv2d_unit(x, i64, 1, strides=1)\n\n    # ============================= s4 =============================\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s2)\n    x = conv2d_unit(x, i128, 3, strides=2)\n    s4 = conv2d_unit(x, i64, 1, strides=1)\n    x = conv2d_unit(x, i64, 1, strides=1)\n    x = stack_residual_block(x, i64, i64, n=2)\n    x = conv2d_unit(x, i64, 1, strides=1)\n    x = layers.Concatenate()([x, s4])\n    s4 = conv2d_unit(x, i128, 1, strides=1)\n\n    # ============================= s8 =============================\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s4)\n    x = conv2d_unit(x, i256, 3, strides=2)\n    s8 = conv2d_unit(x, i128, 1, strides=1)\n    x = conv2d_unit(x, i128, 1, strides=1)\n    x = stack_residual_block(x, i128, i128, n=8)\n    x = conv2d_unit(x, i128, 1, strides=1)\n    x = layers.Concatenate()([x, s8])\n    s8 = conv2d_unit(x, i256, 1, strides=1)\n\n    # ============================= s16 =============================\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s8)\n    x = conv2d_unit(x, i512, 3, strides=2)\n    s16 = conv2d_unit(x, i256, 1, strides=1)\n    x = conv2d_unit(x, i256, 1, strides=1)\n    x = stack_residual_block(x, i256, i256, n=8)\n    x = conv2d_unit(x, i256, 1, strides=1)\n    x = layers.Concatenate()([x, s16])\n    s16 = conv2d_unit(x, i512, 1, strides=1)\n\n    # ============================= s32 =============================\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s16)\n    x = conv2d_unit(x, i1024, 3, strides=2)\n    s32 = conv2d_unit(x, i512, 1, strides=1)\n    x = conv2d_unit(x, i512, 1, strides=1)\n    x = stack_residual_block(x, i512, i512, n=4)\n    x = conv2d_unit(x, i512, 1, strides=1)\n    x = layers.Concatenate()([x, s32])\n    s32 = conv2d_unit(x, i1024, 1, strides=1)\n\n    # fpn\n    x = conv2d_unit(s32, i512, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n    x = spp(x)\n\n    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')\n    fpn_s32 = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n\n    # pan01\n    x = conv2d_unit(fpn_s32, i256, 1, strides=1, act='leaky')\n    x = layers.UpSampling2D(2)(x)\n    s16 = conv2d_unit(s16, i256, 1, strides=1, act='leaky')\n    x = layers.Concatenate()([s16, x])\n    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')\n    fpn_s16 = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n\n    # pan02\n    x = conv2d_unit(fpn_s16, i128, 1, strides=1, act='leaky')\n    x = layers.UpSampling2D(2)(x)\n    s8 = conv2d_unit(s8, i128, 1, strides=1, act='leaky')\n    x = layers.Concatenate()([s8, x])\n    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')\n\n    # output_s, doesn't need concat()\n    output_s = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')\n    output_s = conv2d_unit(output_s, num_anchors * (num_classes + 5), 1, 
strides=1, bn=0, act=None)\n\n    # output_m, need concat()\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)\n    x = conv2d_unit(x, i256, 3, strides=2, act='leaky')\n    x = layers.Concatenate()([x, fpn_s16])\n    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')\n    output_m = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')\n    output_m = conv2d_unit(output_m, num_anchors * (num_classes + 5), 1, strides=1, bn=0, act=None)\n\n    # output_l, need concat()\n    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)\n    x = conv2d_unit(x, i512, 3, strides=2, act='leaky')\n    x = layers.Concatenate()([x, fpn_s32])\n    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')\n    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')\n    output_l = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')\n    output_l = conv2d_unit(output_l, num_anchors * (num_classes + 5), 1, strides=1, bn=0, act=None)\n\n    def cast_float32(tensor):\n        return tf.cast(tensor, tf.float32)\n\n    output_l = layers.Lambda(cast_float32)(output_l)\n    output_m = layers.Lambda(cast_float32)(output_m)\n    output_s = layers.Lambda(cast_float32)(output_s)\n\n    # originally reshape in multi_thread_post\n    output_lr = layers.Reshape((1, input_shape[0] // 32, input_shape[1] // 32, 3, 5 + num_classes))(output_l)\n    output_mr = layers.Reshape((1, input_shape[0] // 16, input_shape[1] // 16, 3, 5 + num_classes))(output_m)\n    output_sr = layers.Reshape((1, input_shape[0] // 8, input_shape[1] // 8, 3, 5 + num_classes))(output_s)\n\n    # originally _yolo_out\n    masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]\n    anchors = [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55],\n               [72, 146], [142, 110], [192, 243], [459, 401]]\n\n    def batch_process_feats(out, anchors, mask):\n        grid_h, grid_w, num_boxes = map(int, out.shape[2:5])\n\n        anchors = [anchors[i] for i in mask]\n        anchors_tensor = np.array(anchors).reshape(1, 1, len(anchors), 2)\n\n        # Reshape to batch, height, width, num_anchors, box_params.\n        box_xy = tf.sigmoid(out[..., :2])\n        box_wh = tf.exp(out[..., 2:4])\n        box_wh = box_wh * anchors_tensor\n\n        box_confidence = tf.sigmoid(out[..., 4])\n        box_confidence = tf.expand_dims(box_confidence, axis=-1)\n        box_class_probs = tf.sigmoid(out[..., 5:])\n\n        col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)\n        row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)\n\n        col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)\n        row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)\n        grid = np.concatenate((col, row), axis=-1).astype(np.float32)\n\n        box_xy += grid\n        box_xy /= (grid_w, grid_h)\n        box_wh /= input_shape\n        box_xy -= (box_wh / 2.)  
# normalized xywh\n        boxes = tf.concat((box_xy, box_xy + box_wh), axis=-1)\n\n        box_scores = box_confidence * box_class_probs\n        num_boxes = np.prod(boxes.shape[1:-1])\n        boxes = tf.reshape(boxes, [-1, num_boxes, boxes.shape[-1]])\n        box_scores = tf.reshape(box_scores, [-1, num_boxes, box_scores.shape[-1]])\n        return boxes, box_scores\n\n    def filter_boxes(outputs):\n        boxes_l, boxes_m, boxes_s, box_scores_l, box_scores_m, box_scores_s, image_shape = outputs\n        boxes_l, box_scores_l = filter_boxes_one_size(boxes_l, box_scores_l)\n        boxes_m, box_scores_m = filter_boxes_one_size(boxes_m, box_scores_m)\n        boxes_s, box_scores_s = filter_boxes_one_size(boxes_s, box_scores_s)\n        boxes = tf.concat([boxes_l, boxes_m, boxes_s], axis=0)\n        box_scores = tf.concat([box_scores_l, box_scores_m, box_scores_s], axis=0)\n        image_shape_wh = image_shape[1::-1]\n        image_shape_whwh = tf.concat([image_shape_wh, image_shape_wh], axis=-1)\n        image_shape_whwh = tf.cast(image_shape_whwh, tf.float32)\n        boxes *= image_shape_whwh\n        boxes = tf.expand_dims(boxes, 0)\n        box_scores = tf.expand_dims(box_scores, 0)\n        boxes = tf.expand_dims(boxes, 2)\n        nms_boxes, nms_scores, nms_classes, valid_detections = tf.image.combined_non_max_suppression(\n            boxes,\n            box_scores,\n            max_output_size_per_class=nms_top_k,\n            max_total_size=nms_top_k,\n            iou_threshold=nms_thresh,\n            score_threshold=conf_thresh,\n            pad_per_class=False,\n            clip_boxes=False,\n            name='CombinedNonMaxSuppression',\n        )\n        return nms_boxes[0], nms_scores[0], nms_classes[0]\n\n    def filter_boxes_one_size(boxes, box_scores):\n        box_class_scores = tf.reduce_max(box_scores, axis=-1)\n        keep = box_class_scores > conf_thresh\n        boxes = boxes[keep]\n        box_scores = box_scores[keep]\n        return boxes, box_scores\n\n    def batch_yolo_out(outputs):\n        with tf.name_scope('yolo_out'):\n            b_output_lr, b_output_mr, b_output_sr, b_image_shape = outputs\n            with tf.name_scope('process_feats'):\n                b_boxes_l, b_box_scores_l = batch_process_feats(b_output_lr, anchors, masks[0])\n            with tf.name_scope('process_feats'):\n                b_boxes_m, b_box_scores_m = batch_process_feats(b_output_mr, anchors, masks[1])\n            with tf.name_scope('process_feats'):\n                b_boxes_s, b_box_scores_s = batch_process_feats(b_output_sr, anchors, masks[2])\n            with tf.name_scope('filter_boxes'):\n                b_nms_boxes, b_nms_scores, b_nms_classes = tf.map_fn(\n                    filter_boxes, [b_boxes_l, b_boxes_m, b_boxes_s, b_box_scores_l, b_box_scores_m, b_box_scores_s, b_image_shape],\n                    dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16)\n        return b_nms_boxes, b_nms_scores, b_nms_classes\n\n    boxes_scores_classes = layers.Lambda(batch_yolo_out)([output_lr, output_mr, output_sr, image_shape])\n\n    model_body = keras.models.Model(inputs=inputs, outputs=boxes_scores_classes)\n    return model_body\n\n\ndef decode_jpeg_resize(input_tensor, image_size):\n    tensor = tf.image.decode_png(input_tensor, channels=3)\n    shape = tf.shape(tensor)\n    tensor = tf.cast(tensor, tf.float32)\n    tensor = tf.image.resize(tensor, image_size)\n    tensor /= 255.0\n    return tf.cast(tensor, tf.float16), 
shape\n\n\ndef preprocessor(input_tensor, image_size):\n    with tf.name_scope('Preprocessor'):\n        tensor = tf.map_fn(\n            partial(decode_jpeg_resize, image_size=image_size), input_tensor,\n            dtype=(tf.float16, tf.int32), back_prop=False, parallel_iterations=16)\n    return tensor\n\n\ndef main():\n    os.system('aws s3 cp s3://neuron-s3/training_checkpoints/pytorch/yolov4/yolov4.pth . --no-sign-request')\n    torch_weights = rename_weights('./yolov4.pth')\n    keras_weights = convert_pt_checkpoint_to_keras_h5(torch_weights)\n    keras.backend.set_learning_phase(0)\n    num_anchors = 3\n    num_classes = 80\n    input_shape = (608, 608)\n    conf_thresh = 0.001\n    nms_thresh = 0.45\n    inputs = layers.Input(shape=[], dtype='string')\n    yolo = YOLOv4(inputs, num_classes, num_anchors, input_shape, conf_thresh=conf_thresh, nms_thresh=nms_thresh)\n    yolo.set_weights(keras_weights)\n    sess = keras.backend.get_session()\n    inputs = {'image': yolo.inputs[0]}\n    output_names = ['boxes', 'scores', 'classes']\n    outputs = {name: ts for name, ts in zip(output_names, yolo.outputs)}\n    tf.saved_model.simple_save(sess, './yolo_v4_coco_saved_model', inputs, outputs)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
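The script above exports a TF 1.x SavedModel whose single input named `image` is a batch of encoded image strings and whose outputs are named `boxes`, `scores`, and `classes`. As a rough illustration only (not part of the repository files), the sketch below shows one way the exported `./yolo_v4_coco_saved_model` directory might be invoked, assuming a TensorFlow 1.x environment where `tf.contrib` is available; the file name `example.png` is hypothetical, and an encoded PNG is used because the graph above decodes its input with `tf.image.decode_png`.

```python
# Minimal sketch (assumption: TensorFlow 1.x with tf.contrib; 'example.png' is hypothetical).
# The signature keys 'image', 'boxes', 'scores', 'classes' come from the simple_save call above.
import tensorflow as tf

predictor = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')

with open('example.png', 'rb') as f:
    image_bytes = f.read()

# The model input is a 1-D batch of encoded image strings; feed a batch of one.
result = predictor({'image': [image_bytes]})
print(result['boxes'].shape, result['scores'].shape, result['classes'].shape)
```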
  {
    "path": "src/helperscripts/installationScripts/python_instructions.txt",
    "content": "\n# AL2 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# U20 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# AL2 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx Upgrade(1.13)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(1.13)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx Upgrade(1.12)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.12.0 --neuron-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(1.12)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.12.0 --neuron-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx Upgrade(1.11)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --neuron-version=2.4.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(1.11)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --neuron-version=2.4.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 tensorflow Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx Install\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 EFA Installation\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 EFA Installation\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 PyTorch DLAMI\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework\n\n# U20 PyTorch DLAMI\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework\n\n# AL2 tensorflow Neuronx upgrade(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx upgrade(2.9)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade(2.9)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx upgrade(2.8)\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade(2.8)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx Install(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx Install(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx Install(2.8)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.8 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx Install(2.8)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 tensorflow Neuronx Install(2.7)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.7 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 tensorflow Neuronx Install(2.7)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# AL2 Tensorflow DLAMI\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=tensorflow --framework-version=2.10 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework\n\n# U20 Tensorflow DLAMI\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=tensorflow --framework-version=2.10 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework\n\n# AL2 PyTorch Neuron DLAMI\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=dlami-framework\n\n# U20 PyTorch Neuron DLAMI\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=dlami-framework\n\n# U22 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U22 Tensorflow Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U22 Pytorch Neuron Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n# U22 Tensorflow Neuron Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx DLAMI Upgrade(1.13)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework\n\n# U20 Pytorch Neuronx DLAMI Upgrade(1.13)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework\n\n# AL2 tensorflow Neuronx upgrade DLAMI(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# AL2 tensorflow Neuronx upgrade DLAMI(2.9)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# AL2 tensorflow Neuronx upgrade DLAMI(2.8)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade DLAMI(2.10)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade DLAMI(2.9)\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# U20 tensorflow Neuronx upgrade(2.8)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework\n\n# U20 Pytorch Neuronx 2.0 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx 2.0 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U22 Pytorch Neuronx 2.0 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx Upgrade(2.0)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(2.0)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# U22 Pytorch Neuronx Upgrade(2.0)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2 Pytorch Neuronx DLAMI Upgrade(2.0)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework\n\n# U20 Pytorch Neuronx DLAMI Upgrade(2.0)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework\n\n# AL2023 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# AL2023 tensorflow Neuronx Install\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# Al2023 Pytorch Neuronx 2.0 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# AL2023 tensorflow Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=compiler_framework\n\n# U20 Pytorch Neuronx 2.1 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2023 Pytorch Neuronx 2.1 Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.5 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 Pytorch Neuronx Upgrade(2.1)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(2.1)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# U22 2.5.1 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 Pytorch Neuronx DLAMI Upgrade(2.1)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=dlami-framework\n\n# U20 Pytorch Neuronx DLAMI Upgrade(2.1)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework\n\n# U22 Neuron DLAMI - Torch-Neuronx-1.13.1\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron\n\n# U22 Neuron DLAMI - Torch-Neuronx- 2.1.1\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron\n\n# U22 Neuron DLAMI - Tensorflow-Neuronx- 2.10.1\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron\n\n# U22 Neuron DLAMI - Transofrmers-Neuronx\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=transformers-neuronx --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron\n\n# U22 Neuron DLAMI - Torch-Neuron-1.13.1\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=dlami-neuron\n\n# U22 Neuron DLAMI - Tensorflow-Neuron- 2.10.1\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=dlami-neuron\n\n# Rocky Linux 9 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=rockylinux9 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# AL2023 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# U22 2.1 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.1 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U20 2.1 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# U20 Pytorch Neuronx Upgrade(2.1)\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.5.1 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# AL2023 Driver and Tools\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# U22 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# AL2 EFA Installation\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 EFA Installation\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U22 2.6.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.6.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.6.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.7.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.7.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.7.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.7.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 2.8.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.8.0 Pytorch Neuronx Upgrade\n.. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# AL2023 Latest Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami\n\n# U22 2.8.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U22 2.9.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U22 Latest Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami\n\n# U24 EFA Installation\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami\n\n# U24 2.9.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami\n\n# U24 2.9.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami\n\n# U24 Driver and Tools\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools\n\n# U24 2.8.0 Pytorch Neuronx Upgrade\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami\n\n# U24 2.8.0 Pytorch Neuronx Install\n.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami\n\n\n\n"
  },
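Each entry in `python_instructions.txt` above is a Sphinx `program-output` directive: at documentation build time it runs the listed `n2-helper.py` command and splices the captured stdout (the generated install commands) into the page. As a hedged sketch, the snippet below reproduces one of these entries outside of Sphinx with `subprocess`; the flags are copied verbatim from the "U20 Pytorch Neuronx Install" entry, and running from the repository root (so the relative paths resolve) is an assumption.

```python
# Sketch: run one of the directives above by hand and print the generated instructions.
# Flags copied from the "U20 Pytorch Neuronx Install" entry; the working directory is
# assumed to be the repository root so that the relative paths resolve.
import subprocess

cmd = [
    "python3", "src/helperscripts/n2-helper.py",
    "--install-type=install", "--category=compiler_framework",
    "--framework=pytorch", "--framework-version=1.13.1",
    "--file=src/helperscripts/n2-manifest.json",
    "--os=ubuntu20", "--instance=trn1", "--ami=non-dlami",
]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)
```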
  {
    "path": "src/helperscripts/n2-helper.py",
    "content": "import json\nimport argparse\nfrom packaging.version import Version, parse\nimport pandas as pd\nfrom pandas import json_normalize\n\n\nclass manifest:\n    def __init__(self, manifest_file):\n\n        self.manifest_file = manifest_file\n        self.df_packages = pd.DataFrame()\n\n    def parse_manifest(self):\n\n        with open(self.manifest_file, 'r') as f:\n            manifest = json.load(f)\n\n        # repos\n        self.df_repos = json_normalize(manifest['repos_n2'])\n\n        # latest release\n        self.df_latest_release = json_normalize(manifest['latest_release'])\n\n        # os properties\n        self.df_os_properties = json_normalize(manifest['os_properties'])\n\n        # ami properties\n        self.df_ami_properties = json_normalize(manifest['ami_properties'])\n\n        # dlami properties\n        self.df_dlami_properties = json_normalize(manifest['dlami_properties'])\n\n        # major version properties\n        self.df_major_version_properties = json_normalize(manifest['major_version_properties'])\n\n        # package properties\n        self.df_package_properties = json_normalize(manifest['package_properties'])\n\n        # neuron releases\n        for release in manifest['neuron_releases']:\n            df_release = json_normalize(release['packages'])\n            df_release['neuron_version'] = release['neuron_version']\n            self.df_packages = pd.concat([self.df_packages, df_release])\n\n        # merge release packages\n        self.df_release_packages = self.df_packages.merge(self.df_package_properties, how='left', on='name')\n        self.df_release_packages['supported_instances'] = self.df_release_packages['supported_instances'].tolist()\n\n    def merge_release_packages(self):\n\n        self.df_release_packages = self.df_packages.merge(self.df_package_properties, how='left', on='name')\n\n    def extract_major_minor_version(self, version):\n\n        return str(version.major) + '.' + str(version.minor)\n\n    def get_pip_packages_supporting_python_versions(self, args):\n        '''\n        Get supported python version by packages (compiler and framework)\n        e.g., {\"3.6\",\"3.7\",\"3.8\"}\n        '''\n\n        if args.neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(args.instance)\n        else:\n            neuron_version = args.neuron_version\n\n        df_instance = self.df_release_packages[\n            (self.df_release_packages['supported_instances'].map(lambda x: args.instance in x)) & (\n                    self.df_release_packages['neuron_version'] == neuron_version)]\n\n        # Compiler supporting Python versions\n        compiler_python_versions = \\\n            df_instance.loc[df_instance['component'] == 'Compiler']['supported_python_versions'].values[0]\n\n        # Specific framework version supporting Python versions\n        df_framework = df_instance.loc[df_instance['category'] == args.framework].copy()\n        df_framework['version'] = df_framework['version'].map(lambda x: Version(x))\n        df_framework['major_minor_version'] = df_framework['version'].map(lambda x: str(x.major) + '.' 
+ str(x.minor))\n\n        framework_python_versions = df_framework.loc[\n            df_framework['major_minor_version'] == self.extract_major_minor_version(Version(args.framework_version))][\n            'supported_python_versions'].values[0]\n        return list(set(compiler_python_versions) & set(framework_python_versions))\n\n    def get_major_version(self, package_name, instance):\n        return self.df_major_version_properties.loc[(self.df_major_version_properties['name'] == package_name)][\n            args.instance].values[0]\n\n    def generate_script(self, args):\n        '''\n        It generates:\n        (1) str_preamble\n        (2) str_driver\n        (3) str_runtime\n        (4) str_tools\n        (5) str_python\n        (6) str_compiler\n        (7) str_framework\n        '''\n\n        str_preamble = ''\n\n        # Install and enable EPEL (required only for rocky linux 9 currently)\n        str_preamble += self.install_and_enable_epel(args)\n\n        # Configure Neuron repository\n        str_preamble += self.config_neuron_repository(args)\n\n        # Update OS packages\n        str_preamble += self.update_os_packages(args)\n\n        # Install OS headers\n        str_preamble += self.install_os_headers(args)\n\n        # Install git\n        str_preamble += self.install_git(args)\n\n        # Install Neuron driver\n        str_driver = self.install_neuron_driver(args)\n\n        # Install Neuron runtime\n        str_runtime = self.install_neuron_runtime(args)\n\n        # Install EFA driver\n        str_efa = self.install_efa_driver(args)\n\n        # Install Neuron Tools\n        str_tools = self.install_neuron_system_tools(args)\n\n        # Add PATH\n        if args.mode != 'compile' or args.ami != 'dlami-framework':\n            str_tools += '\\n# Add PATH\\n'\n            str_tools += 'export PATH=/opt/aws/neuron/bin:$PATH\\n'\n\n        # Install Python virtual environment\n        str_python = self.set_python_venv(args)\n\n        # Activate Pythohn venv\n        str_python += self.activate_python_venv(args)\n\n        # install jupyter notebook\n        str_python += self.jupyter_notebook(args)\n\n        # Set pip repository\n        str_python += self.set_pip_repository()\n\n        # Install wget, awscli\n        str_python += self.install_aux(args)\n\n        # install extra dependencies\n        str_deps = self.install_extra_dependencies(args)\n\n        # Install Neuron compiler\n        str_compiler = self.install_neuron_compiler(args)\n\n        # Install Neuron framework\n        str_framework = self.install_neuron_framework(args)\n\n        # install neuron compiler and framework\n        str_compiler_framework = self.install_neuron_compiler_and_framework(args)\n        if args.ami == 'dlami-framework':\n            # dlami instructions\n            str_dlami = self.install_dlami(args)\n            return str_dlami\n        elif args.ami == 'dlami-neuron':\n            str_dlami = self.install_neuron_dlami(args)\n            return str_dlami\n        elif args.category == 'all':\n            if args.instance == 'trn1':\n                str_runtime += str_efa\n            return str_preamble + str_driver + str_runtime + str_tools + str_deps + str_python + str_compiler_framework\n        elif args.category == 'driver_runtime_tools':\n            return str_preamble + str_driver + str_runtime + str_tools\n        elif args.category == 'compiler_framework':\n            return str_deps + str_python + str_compiler_framework\n        elif args.category 
== 'driver':\n            return str_preamble + str_driver\n        elif args.category == 'runtime':\n            return str_runtime\n        elif args.category == 'tools':\n            return str_tools\n        elif args.category == 'compiler':\n            if args.instance != 'inf1':\n                return str_python + str_compiler\n            else:\n                return str_python\n        elif args.category == 'framework':\n            return str_framework\n        elif args.category == 'efa':\n            return str_efa\n\n    def install_dlami(self, args):\n        latest_release_for_instance = \\\n            self.df_latest_release.loc[self.df_latest_release['instance'] == args.instance]['version'].values[0]\n        latest_release_for_dlami = self.df_dlami_properties[\n            (self.df_dlami_properties['framework'] == args.framework) & (\n                self.df_dlami_properties['supported_instances'].map(lambda x: args.instance in x))][\n            'neuron_released_version'].values[0]\n\n        if (latest_release_for_instance == latest_release_for_dlami):\n            return self.activate_python_venv(args)\n        else:\n            args.install_type = 'update'\n            str_dlami = self.activate_python_venv(args)\n            str_dlami += self.jupyter_notebook(args)\n            str_dlami += self.set_pip_repository()\n            str_dlami += self.install_neuron_compiler_and_framework(args)\n        return str_dlami\n\n\n    def install_neuron_dlami(self, args):\n        str_dlami = \"\"\n        if ((args.instance == 'trn1' or args.instance == 'inf2') and args.category == \"transformers-neuronx\"):\n            str_dlami = '\\n# Activate Python venv for Transformers-NeuronX \\n'\n            str_dlami += \"source /opt/aws_neuronx_venv_transformers_neuronx/bin/activate\"\n        elif ((args.instance == 'trn1' or args.instance == 'inf2') and args.framework == \"pytorch\" and args.framework_version == \"1.13.1\"):\n            str_dlami = '\\n# Activate Python venv for Pytorch 1.13 \\n'\n            str_dlami += \"source /opt/aws_neuronx_venv_pytorch_1_13/bin/activate\"\n        elif ((args.instance == 'trn1' or args.instance == 'inf2') and args.framework == \"pytorch\" and args.framework_version == \"2.1\"):\n            str_dlami = '\\n# Activate Python venv for Pytorch 2.1 \\n'\n            str_dlami += \"source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate\"\n        elif ((args.instance == 'trn1' or args.instance == 'inf2') and args.framework == \"tensorflow\" and args.framework_version == \"2.10.1\"):\n            str_dlami = '\\n# Activate Python venv for Tensorflow 2.10 \\n'\n            str_dlami += \"source /opt/aws_neuronx_venv_tensorflow_2_10/bin/activate\"\n        elif (args.instance == 'inf1' and args.framework == \"tensorflow\" and args.framework_version == \"2.10.1\"):\n            str_dlami = '\\n# Activate Python venv for Tensorflow 2.10 \\n'\n            str_dlami += \"source /opt/aws_neuron_venv_tensorflow_2_10_inf1/bin/activate\"\n        elif (args.instance == 'inf1' and args.framework == \"pytorch\" and args.framework_version == \"1.13.1\"):\n            str_dlami = '\\n# Activate Python venv for Pytorch 1.13 \\n'\n            str_dlami += \"source /opt/aws_neuron_venv_pytorch_1_13_inf1/bin/activate\"\n        return str_dlami\n\n    def jupyter_notebook(self, args):\n        os_default_python_version = \\\n            self.df_os_properties.loc[self.df_os_properties['os'] == args.os]['default_python_version'].values[0]\n        
packages_supporting_python_versions = self.get_pip_packages_supporting_python_versions(args)\n\n        if os_default_python_version in packages_supporting_python_versions:\n            target_python_version = os_default_python_version\n        else:\n            target_python_version = max(packages_supporting_python_versions)\n\n        framework_name = self.get_package_names(category=args.framework, instance=args.instance)[0]\n\n        str_jupiter = '\\n# Install Jupyter notebook kernel\\n'\n        str_jupiter += 'pip install ipykernel ' + '\\n'\n        str_jupiter += 'python' + target_python_version + ' -m ipykernel install --user --name '\n        str_jupiter += 'aws_neuron_venv_' + args.framework\n        if args.instance == 'inf1':\n            str_jupiter += '_inf1'\n        str_jupiter += ' --display-name \"Python (' + framework_name + ')\"' + '\\n'\n        str_jupiter += 'pip install jupyter notebook' + '\\n'\n        str_jupiter += 'pip install environment_kernels' + '\\n'\n        return str_jupiter\n\n    def install_and_enable_epel(self, args):\n        str = ''\n        if args.mode != 'compile':\n            if args.install_type == 'install':\n                if args.os == 'rockylinux9':\n                    str += '\\n# Install and enable EPEL\\n'\n                    str += 'sudo dnf config-manager --set-enabled crb\\n'\n                    str += 'sudo dnf install epel-release -y\\n'\n        return str\n\n    def config_neuron_repository(self, args):\n        \"\"\"\n        Reads OS type from the arguments and generates scripts for configuration of Neuron repository\n        \"\"\"\n        str = ''\n        if args.mode != 'compile':\n            # Neuron repository needs when mode is 'develop' or 'deploy'\n            if args.install_type == 'install':\n                str += '\\n# Configure Linux for Neuron repository updates' + '\\n'\n                if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                    str += '. /etc/os-release' + '\\n'\n                    str += 'sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF' + '\\n'\n                    str += 'deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main' + '\\n'\n                    str += 'EOF' + '\\n'\n                    str += 'wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -' + '\\n'\n                elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                    str += 'sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF' + '\\n'\n                    str += '[neuron]' + '\\n'\n                    str += 'name=Neuron YUM Repository' + '\\n'\n                    str += 'baseurl=https://yum.repos.neuron.amazonaws.com' + '\\n'\n                    str += 'enabled=1' + '\\n'\n                    str += 'metadata_expire=0' + '\\n'\n                    str += 'EOF' + '\\n'\n                    str += 'sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB' + '\\n'\n        return str\n\n\n    def get_repo(self):\n        str = '\\n# Configure Linux for Neuron repository updates' + '\\n'\n        if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n            str += '. 
/etc/os-release' + '\\n'\n            str += 'sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF' + '\\n'\n            str += 'deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main' + '\\n'\n            str += 'EOF' + '\\n'\n            str += 'wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -' + '\\n'\n        elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n            str += 'sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF' + '\\n'\n            str += '[neuron]' + '\\n'\n            str += 'name=Neuron YUM Repository' + '\\n'\n            str += 'baseurl=https://yum.repos.neuron.amazonaws.com' + '\\n'\n            str += 'enabled=1' + '\\n'\n            str += 'metadata_expire=0' + '\\n'\n            str += 'EOF' + '\\n'\n            str += 'sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB' + '\\n'\n        return str\n\n    def update_os_packages(self, args):\n        \"\"\"\n        Reads mode and OS type and updates OS packages accordingly.\n        \"\"\"\n        str = ''\n        if args.mode != 'compile':\n            # OS packages need to be updated in \"develop\" or \"deploy\" mode\n            str += '\\n# Update OS packages \\n'\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                str += 'sudo apt-get update -y' + '\\n'\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                str += 'sudo dnf update -y' + '\\n'\n            if args.os == 'rockylinux9':\n                str += '# Reboot instance to ensure kernel is updated\\n'\n                str += 'sudo reboot\\n'\n        return str\n\n    def install_os_headers(self, args):\n        \"\"\"\n        Reads mode and OS type and install OS headers accordingly.\n        \"\"\"\n        str = ''\n        if args.mode != 'compile':\n            # OS headers need to be installed in \"develop\" or \"deploy\" mode\n            if args.install_type == 'install':\n                str += '\\n# Install OS headers \\n'\n            elif args.install_type == 'update':\n                str += '\\n# Update OS headers \\n'\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22'or args.os == 'ubuntu24':\n                str += 'sudo apt-get install linux-headers-$(uname -r) -y' + '\\n'\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                str += 'sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"' + '\\n'\n\n        return str\n\n    def install_git(self, args):\n\n        str = '\\n# Install git \\n'\n        if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n            str += 'sudo apt-get install git -y\\n'\n        elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n            str += 'sudo dnf install git -y\\n'\n\n        return str\n\n    def install_neuron_driver(self, args):\n        \"\"\"\n        Neuron driver install script will be generated based on mode, AMI, and OS.\n        mode: when develop or deploy\n        AMI: when not dlami-base\n        OS: for different command\n        \"\"\"\n        str = ''\n\n        if args.ami == 'dlami-base':\n            return str\n\n        # get driver package names for release 
version, instance\n        # we take only the first element in the list since there should be ond driver package.\n        driver_package = self.get_package_names(category='driver', instance=args.instance)[0]\n\n        if args.mode != 'compile':\n            # if args.ami != 'dlami-base':\n            install = 'install' if args.install_type == 'install' else 'upgrade'\n            str += f'\\n# {install} Neuron Driver\\n'\n\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                if args.neuron_version == None:\n                    if self.df_package_properties.loc[self.df_package_properties['name'] == driver_package][\n                        'pin_major'].values[0] == 'true':\n                        version = '=' + self.get_major_version(driver_package, args.instance) + '.'\n                elif (args.neuron_version != None) & (args.install_type == 'install'):\n                    version = '=' + self.get_package_version(category='driver', name=driver_package,\n                                                             neuron_version=args.neuron_version)\n                elif args.install_type == 'update':\n                    if self.df_package_properties.loc[self.df_package_properties['name'] == driver_package][\n                        'pin_major'].values[0] == 'true':\n                        version = '=' + self.get_package_version(category='driver', name=driver_package,\n                                                                 neuron_version=args.neuron_version)\n                str += f'sudo apt-get {install} {driver_package}{version}* -y'\n                if args.install_type == 'update':\n                    str += ' --allow-change-held-packages'\n                str += '\\n'\n\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os =='rockylinux9':\n                yum_install = 'install' if args.install_type == 'install' else 'update'\n\n                if args.install_type == 'install':\n\n                    if args.neuron_version == None:\n                        if self.df_package_properties.loc[self.df_package_properties['name'] == driver_package][\n                            'pin_major'].values[0] == 'true':\n                            version = '-' + self.get_major_version(driver_package, args.instance) + '.'\n                    else:\n                        version = '-' + self.get_package_version(category='driver', name=driver_package,\n                                                                 neuron_version=args.neuron_version)\n                elif args.install_type == 'update':\n                    if self.df_package_properties.loc[self.df_package_properties['name'] == driver_package][\n                        'pin_major'].values[0] == 'true':\n                        version = '-' + self.get_major_version(driver_package, args.instance)\n\n                str += f'sudo dnf {yum_install} {driver_package}{version}* -y\\n'\n        '''\n        if args.ami == 'dlami-base':\n            str += '--allow-change-held-packages'\n        '''\n\n        return str\n\n    def install_neuron_runtime(self, args):\n        \"\"\"\n        Neuron runtime install script will be generated based on instace, mode, AMI, and OS.\n        instance: trn1\n        mode: when develop or deploy\n        AMI: when not dlami-base\n        OS: for different command\n        \"\"\"\n        str = ''\n\n        # get runtime package names for release verion, 
instance\n\n        runtime_packages = self.get_package_names(category='runtime', instance=args.instance,\n                                                  neuron_version=args.neuron_version)\n        # install neuron runtime on trn1\n        if args.mode != 'compile':\n            install = 'install' if args.install_type == 'install' else 'upgrade'\n            if len(runtime_packages) != 0:\n                if args.install_type == 'install':\n                    str += '\\n# Install Neuron Runtime \\n'\n                elif args.install_type == 'update':\n                    str += '\\n# Update Neuron Runtime\\n'\n\n                for runtime_package in runtime_packages:\n                    # if args.ami != 'dlami-base':\n                    if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                        str += (f'sudo apt-get {install} ' + runtime_package)\n                        if args.neuron_version == None:\n                            if self.df_package_properties.loc[self.df_package_properties['name'] == runtime_package][\n                                'pin_major'].values[0] == 'true':\n                                str += '=' + self.get_major_version(runtime_package, args.instance) + '.* -y'\n                                if args.install_type == 'update':\n                                    str += ' --allow-change-held-packages'\n                                str += '\\n'\n                        elif (args.neuron_version != None) & (args.install_type == 'install'):\n                            str += '=' + self.get_package_version(category='runtime', name=runtime_package,\n                                                                  neuron_version=args.neuron_version) + '* -y\\n'\n                        else:\n                            str += '\\n'\n\n                    elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                        str += 'sudo dnf '\n                        if args.install_type == 'install':\n                            str += 'install '\n                            str += runtime_package\n                            if args.neuron_version == None:\n                                if \\\n                                        self.df_package_properties.loc[\n                                            self.df_package_properties['name'] == runtime_package][\n                                            'pin_major'].values[0] == 'true':\n                                    str += '-' + self.get_major_version(runtime_package, args.instance) + '.* -y\\n'\n                            else:\n                                str += '-' + self.get_package_version(category='driver', name=runtime_package,\n                                                                      neuron_version=args.neuron_version) + '* -y\\n'\n                        elif args.install_type == 'update':\n                            str += 'update '\n                            str += runtime_package\n                            if self.df_package_properties.loc[self.df_package_properties['name'] == runtime_package][\n                                'pin_major'].values[0] == 'true':\n                                str += '-' + self.get_major_version(runtime_package, args.instance) + '.* -y\\n'\n        return str\n\n    def install_efa_driver(self, args):\n        str = ''\n        # install EFA driver on trn1\n        if args.instance == 'trn1' and 
args.mode == 'develop':\n            str += '\\n# Install EFA Driver (only required for multi-instance training)\\n'\n            str += 'curl -O https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz \\n'\n            str += 'wget https://efa-installer.amazonaws.com/aws-efa-installer.key && gpg --import aws-efa-installer.key \\n'\n            str += 'cat aws-efa-installer.key | gpg --fingerprint \\n'\n            str += 'wget https://efa-installer.amazonaws.com/aws-efa-installer-latest.tar.gz.sig && gpg --verify ./aws-efa-installer-latest.tar.gz.sig \\n'\n            str += 'tar -xvf aws-efa-installer-latest.tar.gz \\n'\n            str += 'cd aws-efa-installer && sudo bash efa_installer.sh --yes \\n'\n            str += 'cd \\n'\n            str += 'sudo rm -rf aws-efa-installer-latest.tar.gz aws-efa-installer \\n'\n        return str\n\n    def install_neuron_system_tools(self, args):\n        \"\"\"\n        Neuron tools will be installed in develop mode.\n        \"\"\"\n        str = ''\n        if args.mode == 'develop':\n            # get runtime package names for release verion, instance\n            install = 'install' if args.install_type == 'install' else 'upgrade'\n            system_tool_packages = self.get_package_names(category='system-tools', instance=args.instance)\n            if len(system_tool_packages) != 0:\n                if args.install_type == 'install':\n                    str += '\\n# Install Neuron Tools \\n'\n                elif args.install_type == 'update':\n                    str += '\\n# Update Neuron Tools\\n'\n\n                for system_tool in system_tool_packages:\n                    if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                        str += (f'sudo apt-get {install} ' + system_tool)\n                        if args.neuron_version == None:\n                            if self.df_package_properties.loc[self.df_package_properties['name'] == system_tool][\n                                'pin_major'].values[0] == 'true':\n                                str += '=' + self.get_major_version(system_tool, args.instance) + '.* -y'\n                                if args.install_type == 'update':\n                                    str += ' --allow-change-held-packages'\n                                str += '\\n'\n\n                        elif (args.neuron_version != None) & (args.install_type == 'install'):\n                            str += '=' + self.get_package_version(category='system-tools', name=system_tool,\n                                                                  neuron_version=args.neuron_version) + '* -y\\n'\n\n                    elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                        str += 'sudo dnf '\n                        if args.install_type == 'install':\n                            str += 'install '\n                            str += system_tool\n                            if args.neuron_version == None:\n                                if self.df_package_properties.loc[self.df_package_properties['name'] == system_tool][\n                                    'pin_major'].values[0] == 'true':\n                                    str += '-' + self.get_major_version(system_tool, args.instance) + '.* -y\\n'\n                            else:\n                                str += '-' + self.get_package_version(category='driver', name=system_tool,\n                             
                                         neuron_version=args.neuron_version) + '* -y\\n'\n                        elif args.install_type == 'update':\n                            str += 'update '\n                            str += system_tool\n                            if self.df_package_properties.loc[self.df_package_properties['name'] == system_tool][\n                                'pin_major'].values[0] == 'true':\n                                str += '-' + self.get_major_version(system_tool, args.instance) + '.* -y\\n'\n        return str\n\n    def install_extra_dependencies(self, args):\n        \"\"\"\n        Any extra dependencies must be added in this function\n        \"\"\"\n        str = ''\n        if args.os == 'amazonlinux2023':\n            str += '# Install External Dependency\\n'\n            str += 'sudo dnf '\n            if args.mode == 'develop':\n                str += 'install -y '\n            elif args.install_type == 'update':\n                str += 'update '\n            str += 'libxcrypt-compat\\n'\n        return str\n\n    def set_python_venv(self, args):\n        # find the right python version that Neuron framework supports\n        # (for fresh install) install the Python venv\n        # (for fresh install and update) activate the venv\n        str = ''\n\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n\n        os_default_python_version = \\\n            self.df_os_properties.loc[self.df_os_properties['os'] == args.os]['default_python_version'].values[0]\n        packages_supporting_python_versions = self.get_pip_packages_supporting_python_versions(args)\n\n        if os_default_python_version in packages_supporting_python_versions:\n            target_python_version = os_default_python_version\n        else:\n            target_python_version = max(packages_supporting_python_versions)\n\n        if args.install_type == 'install':\n            # Install Python: if the default Python version of OS does not support Neuron packages, we install the supporting version\n            if os_default_python_version not in packages_supporting_python_versions:\n                str += '\\n# Install Python \\n'\n                if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                    str += 'sudo add-apt-repository ppa:deadsnakes/ppa\\n'\n                    str += 'sudo apt-get install python' + target_python_version + '\\n'\n                elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                    str += 'sudo dnf install python' + target_python_version + '\\n'\n                elif args.os == 'rockylinux9':\n                    str += 'sudo dnf install python' + target_python_version + '\\n'\n\n            # Install Python venv\n            \"\"\"\n            if os_default_python_version in packages_supporting_python_versions:\n                str += '\\n# Install Python venv \\n'\n                str +='python'+target_python_version+' -m venv '+args.framework+'_venv \\n'\n            else:\n            \"\"\"\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                str += '\\n# Install Python venv \\n'\n                str += 'sudo apt-get install -y python' + target_python_version + '-venv g++ \\n'\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023' or args.os == 'rockylinux9':\n                str += '\\n# Install Python venv 
\\n'\n                if args.os == 'amazonlinux2' or args.os == 'rockylinux9':\n                    str += 'sudo dnf install -y python' + target_python_version + '-venv gcc-c++ \\n'\n                else:\n                    str += 'sudo dnf install -y gcc-c++ \\n'\n\n            # when venv_install_type is parellel cluster, we need to change the directory\n            if args.venv_install_type == 'parallel-cluster':\n                if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                    str += '\\ncd /home/ubuntu\\n'\n                elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                    str += '\\ncd /home/ec2-user\\n'\n\n                str += '. \"/etc/parallelcluster/cfnconfig\"\\n'\n                str += '\\nif [[ $cfn_node_type == \"HeadNode\" ]]; then\\n'\n\n            # Create Python venv\n            str += f'\\n{indentation}# Create Python venv\\n'\n            str_venv_name = 'aws_neuron_venv_' + args.framework\n            if args.instance == 'inf1':\n                str_venv_name += '_inf1'\n\n            str += f'{indentation}python{target_python_version} -m venv ' + str_venv_name + ' \\n'\n\n        return str\n\n    def activate_python_venv(self, args):\n\n        str = ''\n\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n        str_venv_name = ''\n        str += f'\\n{indentation}# Activate Python venv \\n'\n\n        if args.ami == 'dlami-framework':\n            str_venv_name += '/opt/'\n\n        str_venv_name += 'aws_neuron_venv_' + args.framework\n\n        if args.instance == 'inf1':\n            str_venv_name += '_inf1'\n\n        str += f'{indentation}source ' + str_venv_name + '/bin/activate \\n'\n\n        # install python packages\n        if (args.install_type == 'install' and args.ami != 'dlami-framework'):\n            str += f'{indentation}python -m pip install -U pip \\n'\n\n        return str\n\n    def set_pip_repository(self):\n        str = ''\n\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n\n        str += f'\\n{indentation}# Set pip repository pointing to the Neuron repository \\n'\n        str += f'{indentation}python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com\\n'\n\n        return str\n\n    def install_aux(self, args):\n        str = ''\n\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n\n        if args.instance == 'trn1':\n            str += f'\\n{indentation}# Install wget, awscli \\n'\n            str += f'{indentation}python -m pip install wget \\n'\n            str += f'{indentation}python -m pip install awscli \\n'\n\n        return str\n\n    def install_neuron_compiler_and_framework(self, args):\n        str = ''\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n        compiler_package = self.get_package_names(category='compiler', instance=args.instance)[0]\n        framework_name = self.get_package_names(category=args.framework, instance=args.instance)[0]\n        # if args.instance == 'inf1':\n        #     return ''\n\n        str = ''\n        if args.mode != 'deploy':\n            if args.install_type == 'install':\n                str += f'\\n{indentation}# Install Neuron Compiler and Framework\\n'\n            elif args.install_type == 'update':\n                str += f'\\n{indentation}# Update Neuron Compiler and Framework\\n'\n\n            
str += f'{indentation}python -m pip install '\n            if args.install_type == 'update':\n                str += '--upgrade '\n\n\n        str += compiler_package\n\n        if args.neuron_version == None or args.install_type == 'update':\n            if self.df_package_properties.loc[self.df_package_properties['name'] == compiler_package][\n                'pin_major'].values[0] == 'true':\n                str += '==' + self.get_major_version(compiler_package, args.instance) + '.* '\n        else:\n            str += '==' + self.get_package_version(category='compiler', name=compiler_package,\n                                                   neuron_version=args.neuron_version) + ' '\n\n        if args.neuron_version != None:  # prev install\n            str += framework_name + '=='\n            str += self.get_package_version(category=args.framework, name=framework_name,\n                                            neuron_version=args.neuron_version,\n                                            framework_version=args.framework_version)\n        else:  # fresh install\n            if args.framework == 'pytorch':\n                str += framework_name\n                if args.framework_version == \"1.13.1\":\n                    str += '=='\n                    str += \"1.13.*\"\n                elif args.framework_version == \"2.1.2\":\n                    str += '=='\n                    str += \"2.1.*\"\n                elif args.framework_version == \"2.5.1\":\n                    str += '=='\n                    str += \"2.5.*\"\n                elif args.framework_version == \"2.6.0\":\n                    str += '=='\n                    str += \"2.6.*\"\n                elif args.framework_version == \"2.7.0\":\n                    str += '=='\n                    str += \"2.7.*\"\n                elif args.framework_version == \"2.8.0\":\n                    str += '=='\n                    str += \"2.8.*\"\n                elif args.framework_version == \"2.9.0\":\n                    str += '=='\n                    str += \"2.9.*\"\n                str += ' torchvision\\n'\n            else:\n                str += framework_name\n\n        if args.instance == 'inf1':\n\n            install = 'Install' if args.install_type == 'install' else 'Update'\n            upgrade = '--upgrade ' if args.install_type == 'update' else ''\n\n            if args.neuron_version != None:  # in case of previous neuron version\n                version = '==' + self.get_package_version(category=args.framework, neuron_version=args.neuron_version,\n                                                          framework_version=args.framework_version)\n            else:  # in case of latest neuron version (fresh install)\n                if args.framework_version.startswith(\n                        self.get_main_framework_version(instance=args.instance, framework=args.framework,\n                                                        neuron_version=args.neuron_version)) == False:\n                    version = '==' + args.framework_version + '.*'\n                else:\n                    version = ''\n\n            if args.framework == 'pytorch':\n\n                pytorch_aux = ' neuron-cc[tensorflow] \"protobuf\"' if args.mode != 'deploy' else ''\n\n                str = f'\\n# {install} PyTorch Neuron\\n'\n                str += f'python -m pip install {upgrade}torch-neuron{version}{pytorch_aux} torchvision\\n'\n\n            elif args.framework == 'tensorflow':\n\n                if 
args.neuron_version != None:  # in case of previous neuron version\n\n                    ms_version = '=' + self.get_package_version(category='model-server',\n                                                                neuron_version=args.neuron_version,\n                                                                framework_version=args.framework_version)\n                else:  # in case of latest neuron version (fresh install)\n                    if args.framework_version != self.get_main_framework_version(instance=args.instance,\n                                                                                 framework=args.framework,\n                                                                                 neuron_version=args.neuron_version):\n                        ms_version = '=' + self.get_package_version(category='model-server',\n                                                                    neuron_version=args.neuron_version,\n                                                                    framework_version=args.framework_version)\n                    else:\n                        ms_version = ''\n\n                str = f'\\n# {install} TensorFlow Neuron\\n'\n                str += f'python -m pip install {upgrade}tensorflow-neuron[cc]{version} \"protobuf\"\\n'\n\n                str += f'\\n# {install} Neuron TensorBoard\\n'\n                str += f'python -m pip install {upgrade}tensorboard-plugin-neuron\\n'\n\n                if args.mode != 'compile':\n                    str += f'\\n# Optional: {install} Tensorflow Neuron model server\\n'\n                    if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                        str += f'sudo apt-get install tensorflow-model-server-neuronx{ms_version} -y\\n'\n                    elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                        str += f'sudo dnf install tensorflow-model-server-neuronx{ms_version} -y\\n'\n\n            elif args.framework == 'mxnet':\n\n                mxnet_framework = ''\n\n                neuron_cc_version = ''\n                if args.framework_version == '1.8.0':\n                    mxnet_framework = 'mx_neuron'\n                elif args.framework_version == '1.5.1':\n                    mxnet_framework = 'mxnet_neuron'\n                    neuron_cc_version='==1.15.0'\n\n                str = f'\\n# {install} MXNet Neuron\\n'\n                str += 'wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\\n'\n                str += 'pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\\n'\n                str += f'python -m pip install {upgrade}{mxnet_framework}{version} neuron-cc{neuron_cc_version}\\n'\n\n        if args.venv_install_type == 'parallel-cluster':\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                str += f'\\n\\n{indentation}chown ubuntu:ubuntu -R {args.framework}_venv\\n'\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                str += f'\\n\\n{indentation}chown ec2-user:ec2-user -R {args.framework}_venv\\n'\n\n            str += 'fi'\n\n        return str\n\n    def install_neuron_compiler(self, args):\n        '''\n        Neuron compiler will be installed in develop or compile mode based on the instance.\n        '''\n        str = ''\n\n        indentation = '\\t' if 
args.venv_install_type == 'parallel-cluster' else ''\n\n        compiler_package = self.get_package_names(category='compiler', instance=args.instance)[0]\n\n        if args.instance == 'inf1':\n            return ''\n\n        str = ''\n        if args.mode != 'deploy':\n            if args.install_type == 'install':\n                str += f'\\n{indentation}# Install Neuron Compiler\\n'\n            elif args.install_type == 'update':\n                str += f'\\n{indentation}# Update Neuron Compiler\\n'\n\n            str += f'{indentation}python -m pip install '\n            if args.install_type == 'update':\n                str += '--upgrade '\n\n            str += compiler_package\n\n            if args.neuron_version == None:\n                if self.df_package_properties.loc[self.df_package_properties['name'] == compiler_package][\n                    'pin_major'].values[0] == 'true':\n                    str += '==' + self.get_major_version(compiler_package, args.instance) + '.* \\n'\n                else:\n                    str += '\\n'\n            else:\n                str += '==' + self.get_package_version(category='compiler', name=compiler_package,\n                                                       neuron_version=args.neuron_version) + '\\n'\n\n        return str\n\n    def install_neuron_framework(self, args):\n        '''\n        Neuron framework is installed based on:\n        instance\n        framework\n        framework-version\n        '''\n        str = ''\n        indentation = '\\t' if args.venv_install_type == 'parallel-cluster' else ''\n\n        framework_name = self.get_package_names(category=args.framework, instance=args.instance)[0]\n\n        if args.install_type == 'install':\n            str += f'\\n{indentation}# Install Neuron Framework\\n'\n        elif args.install_type == 'update':\n            str += f'\\n{indentation}# Update Neuron Framework\\n'\n\n        str += f'{indentation}python -m pip install '\n        if args.install_type == 'update':\n            str += '--upgrade '\n\n        if args.neuron_version != None:  # prev install\n            str += framework_name + '=='\n            str += self.get_package_version(category=args.framework, name=framework_name,\n                                            neuron_version=args.neuron_version,\n                                            framework_version=args.framework_version)\n        else:  # fresh install\n            str += framework_name\n\n        if args.framework == 'pytorch':\n            str += ' torchvision\\n'\n\n        if args.instance == 'inf1':\n\n            install = 'Install' if args.install_type == 'install' else 'Update'\n            upgrade = '--upgrade ' if args.install_type == 'update' else ''\n\n            if args.neuron_version != None:  # in case of previous neuron version\n                version = '==' + self.get_package_version(category=args.framework, neuron_version=args.neuron_version,\n                                                          framework_version=args.framework_version)\n            else:  # in case of latest neuron version (fresh install)\n                if args.framework_version.startswith(\n                        self.get_main_framework_version(instance=args.instance, framework=args.framework,\n                                                        neuron_version=args.neuron_version)) == False:\n                    version = '==' + args.framework_version + '.*'\n                else:\n                    version = ''\n\n            if 
args.framework == 'pytorch':\n\n                pytorch_aux = ' neuron-cc[tensorflow] \"protobuf\"' if args.mode != 'deploy' else ''\n\n                str = f'\\n# {install} PyTorch Neuron\\n'\n                str += f'python -m pip install {upgrade}torch-neuron{version}{pytorch_aux} torchvision\\n'\n\n\n            elif args.framework == 'tensorflow':\n\n                if args.neuron_version != None:  # in case of previous neuron version\n\n                    ms_version = '=' + self.get_package_version(category='model-server',\n                                                                neuron_version=args.neuron_version,\n                                                                framework_version=args.framework_version)\n                else:  # in case of latest neuron version (fresh install)\n                    if args.framework_version != self.get_main_framework_version(instance=args.instance,\n                                                                                 framework=args.framework,\n                                                                                 neuron_version=args.neuron_version):\n                        ms_version = '=' + self.get_package_version(category='model-server',\n                                                                    neuron_version=args.neuron_version,\n                                                                    framework_version=args.framework_version)\n                    else:\n                        ms_version = ''\n\n                str = f'\\n# {install} TensorFlow Neuron\\n'\n                str += f'python -m pip install {upgrade}tensorflow-neuron[cc]{version} \"protobuf\"\\n'\n\n                if args.mode != 'compile':\n                    str += f'\\n# Optional: {install} Tensorflow Neuron model server\\n'\n                    if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                        str += f'sudo apt-get install tensorflow-model-server-neuronx{ms_version} -y\\n'\n                    elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                        str += f'sudo dnf install tensorflow-model-server-neuronx{ms_version} -y\\n'\n\n            elif args.framework == 'mxnet':\n\n                mxnet_framework = ''\n\n                if args.framework_version == '1.8.0':\n                    mxnet_framework = 'mx_neuron'\n                elif args.framework_version == '1.5.1':\n                    mxnet_framework = 'mxnet_neuron'\n\n                str = f'\\n# {install} MXNet Neuron\\n'\n                str += 'wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\\n'\n                str += 'pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\\n'\n                str += f'python -m pip install {upgrade}{mxnet_framework}{version} neuron-cc\\n'\n\n        if args.venv_install_type == 'parallel-cluster':\n            if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':\n                str += f'\\n\\n{indentation}chown ubuntu:ubuntu -R {args.framework}_venv\\n'\n            elif args.os == 'amazonlinux2' or args.os == 'amazonlinux2023':\n                str += f'\\n\\n{indentation}chown ec2-user:ec2-user -R {args.framework}_venv\\n'\n\n            str += 'fi'\n\n        return str\n\n    def get_latest_neuron_version_per_instance(self, instance):\n        return 
self.df_latest_release.loc[self.df_latest_release['instance'] == instance]['version'].values[0]\n\n    def get_package_names(self, category, instance, neuron_version=None):\n\n        if neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(instance)\n\n        df_instance = self.df_release_packages[\n            self.df_release_packages['supported_instances'].map(lambda x: instance in x)]\n\n        return \\\n            df_instance.loc[(df_instance['category'] == category) & (df_instance['neuron_version'] == neuron_version)][\n                'name'].tolist()\n\n    def get_package_version(self, category, neuron_version, name=None, framework_version=None):\n        if neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(args.instance)\n\n        if name != None:\n            df_package = self.df_release_packages.loc[(self.df_release_packages['neuron_version'] == neuron_version) & (\n                    self.df_release_packages['name'] == name)]\n        else:\n            df_package = self.df_release_packages.loc[self.df_release_packages['neuron_version'] == neuron_version]\n\n        if (category == 'pytorch') or (category == 'tensorflow') or (category == 'mxnet') or (\n                category == 'model-server'):\n            df_package = df_package.loc[df_package['category'] == category]\n            fv = self.extract_major_minor_version(Version(framework_version))\n            df_package = df_package.loc[df_package['version'].map(lambda x: x.startswith(fv))]\n        return df_package['version'].values[0]\n\n    def get_main_framework_version(self, instance, framework, neuron_version):\n\n        if neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(instance)\n\n        df_instance = self.df_release_packages[\n            self.df_release_packages['supported_instances'].map(lambda x: instance in x)]\n\n        df_version = df_instance.loc[\n            (df_instance['category'] == framework) & (df_instance['neuron_version'] == neuron_version)].copy()\n\n        df_version['version'] = df_version['version'].map(lambda x: Version(x))\n\n        main_version = sorted(df_version['version'], reverse=True)[0]\n\n        return str(main_version.major) + '.' 
+ str(main_version.minor)\n\n    def list_packages(self, args):\n\n        str = ''\n\n        if args.neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(args.instance)\n        else:\n            neuron_version = args.neuron_version\n\n        if (args.list == 'packages'):  # list packages by neuron version\n\n            df_instance = self.df_release_packages[\n                self.df_release_packages['supported_instances'].map(lambda x: args.instance in x)]\n\n            df_version = df_instance.loc[\n                (df_instance['neuron_version'] == neuron_version) & (df_instance['category'] != 'efa')].copy()\n\n            str += '\\nList of packages in Neuron ' + neuron_version + ':\\n\\n'\n            str += '{0:35} {1:50}\\n'.format(\"Component\", \"Package\")\n\n            for index, row in df_version.iterrows():\n                if row['category'] == 'libnrt':\n                    str += f\"{row['component']:<35} {row['name'] + ' (Version ' + row['version']})\\n\"\n                else:\n                    str += f\"{row['component']:<35} {row['name'] + '-' + row['version']} \\n\"\n\n            df_version['package'] = (df_version['name'] + '-' + df_version['version'])\n\n        return str\n\n    def list_pyversions(self, args):\n\n        str = ''\n\n        if args.neuron_version == None:\n            neuron_version = self.get_latest_neuron_version_per_instance(args.instance)\n        else:\n            neuron_version = args.neuron_version\n\n        if (args.list == 'pyversions'):  # list supported Python versions by neuron version\n\n            df_instance = self.df_release_packages[\n                self.df_release_packages['supported_instances'].map(lambda x: args.instance in x)]\n\n            df_version = df_instance.loc[\n                (df_instance['neuron_version'] == neuron_version) & (df_instance['category'] != 'efa')].copy()\n\n            str += '\\nList of packages in Neuron ' + neuron_version + ':\\n\\n'\n            str += '{0:35} {1:50}\\n'.format(\"Package\", \"           Supported Python Versions\")\n\n            for index, row in df_version.iterrows():\n                python_version_str = ''\n                for i, pversion in enumerate(row['supported_python_versions']):\n                    if i != len(row['supported_python_versions'])-1:\n                        python_version_str += pversion + \", \"\n                    else:\n                        python_version_str += pversion\n                if len(row['supported_python_versions']) != 0:\n                    if row['category'] == 'libnrt':\n                        str += f\"{row['name'] + ' Version ' + row['version']:<50}{python_version_str} \\n\"\n                    else:\n                        str += f\"{row['name'] + '-' + row['version']:<50}{python_version_str} \\n\"\n\n                df_version['package'] = (df_version['name'] + '-' + df_version['version'])\n\n        return str\n\n\n\n################\n# Sanity Checks\n################\ndef cli_validate(args):\n    # in the case of parallel-cluster, the instance should not be inf1\n    if (args.venv_install_type == 'parallel-cluster') & (args.instance == 'inf1'):\n        print(__name__, \": error: \", \"parallel-cluster script is not compatible with inf1\")\n        exit(-1)\n\n\n########################################\n# parse_arguments\n########################################\n\ndef cli_parse_arguments():\n    __name__ = 'n2-helper.py'\n    parser = argparse.ArgumentParser(prog=__name__\n                         
            ,\n                                     usage='\\npython3 %(prog)s --list={packages} [--neuron-version=X.Y.Z] [--instance=INSTANCE]\\n'\n                                           + 'python3 %(prog)s --list={pyversions} [--neuron-version=X.Y.Z] [--instance=INSTANCE]\\n'\n                                           + 'python3 %(prog)s --install-type={install,update}\\n'\n                                           + 'python3 %(prog)s --instance={inf1,trn1,inf2,trn2}\\n'\n                                           + 'python3 %(prog)s --os={ubuntu18,ubuntu20,ubuntu22,amazonlinux2,amazonlinux2023,rockylinux9,ubuntu24}\\n'\n                                           + 'python3 %(prog)s --ami={non-dlami,dlami-base,dlami-conda,dlami-framework,dlami-neuron}\\n'\n                                           + 'python3 %(prog)s --framework={pytorch,tensorflow,mxnet}\\n'\n                                           + 'python3 %(prog)s --framework-version=[X.Y.Z] [options]\\n'\n                                           + 'python3 %(prog)s --mode={develop,compile,deploy} [options]\\n'\n                                           + 'python3 %(prog)s --category={framework,driver,runtime,compiler,tools,all,driver_runtime_tools,compiler_framework,efa, transformers-neuronx}\\n'\n                                           + 'options= [--file=FILE]\\n'\n                                     , description='Installer helper for Neuron SDK')\n\n    group = parser.add_mutually_exclusive_group(required=True)\n    parser.add_argument(\"--neuron-version\", metavar='X.Y.Z')\n    group.add_argument(\"--list\", choices=['neuron_versions', 'pyversions','packages', 'components', 'frameworks'])\n    group.add_argument(\"--install-type\", choices=['install', 'update'])\n    parser.add_argument(\"--instance\", choices=['inf1', 'trn1', 'inf2', 'trn2'])\n    parser.add_argument(\"--os\", choices=['ubuntu18', 'ubuntu20', 'ubuntu22', 'amazonlinux2', 'amazonlinux2023', 'rockylinux9', 'ubuntu24'], )\n    parser.add_argument(\"--ami\", choices=['non-dlami', 'dlami-base', 'dlami-conda', 'dlami-framework', 'dlami-neuron'],\n                        default='non-dlami', help='default=non-dlami')\n    parser.add_argument(\"--mode\", choices=['develop', 'compile', 'deploy', 'initialize'], default='develop')\n    parser.add_argument(\"--category\",\n                        choices=['framework', 'driver', 'runtime', 'compiler', 'tools', 'all', 'driver_runtime_tools',\n                                 'compiler_framework', 'efa', 'transformers-neuronx'])\n    parser.add_argument(\"--framework\", choices=['pytorch', 'tensorflow', 'mxnet'])\n    parser.add_argument(\"--framework-version\", metavar='X.Y.Z')\n    parser.add_argument(\"--venv-install-type\", choices=['single-node', 'parallel-cluster'], default='single-node')\n    parser.add_argument(\"--file\", default='n2-manifest.json', help='default=n2-manifest.json')\n\n    return parser.parse_args()\n\n\nif __name__ == '__main__':\n    setup_cmd = ''\n    args = cli_parse_arguments()\n\n    # arguments sanity check\n    cli_validate(args)\n\n    # parse the manifest file\n    n2_manifest = manifest(manifest_file=args.file)\n    n2_manifest.parse_manifest()\n\n    # framework version sanity check\n    # generate install script\n    if (args.list == 'packages'):\n        print(n2_manifest.list_packages(args))\n    elif (args.list == 'pyversions'):\n        print(n2_manifest.list_pyversions(args))\n    else:\n        print(n2_manifest.generate_script(args))\n"
  },
  {
    "path": "src/helperscripts/n2-manifest.json",
    "content": "{\n    \"repos_n2\": [\n      {\"repo_type\":\"whl\", \"repo_url\":\"https://pip.repos.neuron.amazonaws.com/\"},\n      {\"repo_type\":\"rpm\", \"repo_url\":\"https://yum.repos.neuron.amazonaws.com/\"},\n      {\"repo_type\":\"deb\", \"repo_url\":\"https://apt.repos.neuron.amazonaws.com/\"}\n    ],\n    \"manifest_date\": \"04/09/2026\",\n    \"manifest_version\": \"2.29.0\",\n    \"latest_release\": [\n      {\"instance\":\"inf1\", \"version\":\"2.29.0\"},\n      {\"instance\":\"trn1\", \"version\":\"2.29.0\"},\n      {\"instance\":\"trn2\", \"version\":\"2.29.0\"},\n      {\"instance\":\"inf2\", \"version\":\"2.29.0\"},\n      {\"instance\":\"trn1n\", \"version\":\"2.29.0\"}\n    ],\n    \"os_properties\": [\n      {\"os\":\"ubuntu18\", \"default_python_version\":\"3.7\"},\n      {\"os\":\"ubuntu20\", \"default_python_version\":\"3.8\"},\n      {\"os\":\"ubuntu22\", \"default_python_version\":\"3.10\"},\n      {\"os\":\"ubuntu24\", \"default_python_version\":\"3.12\"},\n      {\"os\":\"amazonlinux2\", \"default_python_version\":\"3.8\"},\n      {\"os\":\"amazonlinux2023\", \"default_python_version\":\"3.9\"},\n      {\"os\":\"rockylinux9\", \"default_python_version\":\"3.9\"}\n    ],\n    \"ami_properties\": [\n      {\"ami\":\"non-dlami\", \"package_categories\": [\"driver\",\"runtime\",\"tools\",\"compiler\",\"framework\"]},\n      {\"ami\":\"dlami-base\", \"package_categories\": [\"tools\",\"compiler\",\"framework\"]},\n      {\"ami\":\"dlami-conda\", \"package_categories\": [\"driver\",\"runtime\",\"tools\",\"compiler\",\"framework\"]},\n      {\"ami\":\"dlami-<framework>\", \"package_categories\": [\"driver\",\"runtime\",\"tools\",\"compiler\"]}\n    ],\n    \"dlami_properties\": [\n      {\"framework\":\"pytorch\", \"dlami\": \"1.13\", \"neuron_released_version\": \"2.17.0\", \"supported_instances\":[\"trn1\",\"inf2\",\"inf1\"]},\n      {\"framework\":\"tensorflow\", \"dlami\": \"2.10\", \"neuron_released_version\": \"2.17.0\", \"supported_instances\":[\"trn1\",\"inf2\"]}\n    ],\n    \"major_version_properties\": [\n      {\"name\":\"neuronx-cc\",\"inf1\":\"\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-oci-hooks\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"tensorflow-neuronx\",\"inf1\":\"\",\"trn1\":\"1\",\"inf2\":\"1\"},\n      {\"name\":\"torch-neuronx\",\"inf1\":\"\",\"trn1\":\"1\",\"inf2\":\"1\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-dkms\",\"inf1\":\"2.21\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-collectives\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"inf1\":\"\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"aws-neuronx-tools\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\"},\n      {\"name\":\"neuronperf\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\"},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"inf1\":\"2\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\"},\n      
{\"name\":\"nki\",\"trn1\":\"2\",\"inf2\":\"2\",\"trn2\":\"2\",\"trn3\":\"2\"}\n    ],\n    \"package_properties\": [\n      {\"name\":\"aws-neuronx-runtime-discovery\", \"component\":\"General\",\"category\":\"general\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws_neuron_sdk_release_version\", \"component\":\"Github\",\"category\":\"github\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"libneuronxla\",\"component\":\"Framework\",\"category\":\"general\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuron-cc\",\"component\":\"Compiler\",\"category\":\"compiler\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuronx-cc\",\"component\":\"Compiler\",\"category\":\"compiler\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"neuronx-cc-stubs\",\"component\":\"Compiler\",\"category\":\"compiler\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"component\":\"Kubernetes Plugin\",\"category\":\"container\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"component\":\"Kubernetes Scheduler\",\"category\":\"container\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"aws-neuronx-oci-hooks\",\"component\":\"OCI Hooks\",\"category\":\"container\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"mxnet-neuron\",\"component\":\"MXNet\",\"category\":\"mxnet\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"tensorflow-neuron\",\"component\":\"TensorFlow\",\"category\":\"tensorflow\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"tensorflow\",\"component\":\"TensorFlow\",\"category\":\"tensorflow\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"tensorflow-neuronx\",\"component\":\"TensorFlow\",\"category\":\"tensorflow\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"torch-neuron\",\"component\":\"PyTorch\",\"category\":\"pytorch\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"torch-neuronx\",\"component\":\"PyTorch\",\"category\":\"pytorch\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"transformers-neuronx\",\"component\":\"Transformers Neuron\",\"category\":\"transformers-neuronx\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"mxnet_neuron\",\"component\":\"MXNet\",\"category\":\"mxnet\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"mx_neuron\",\"component\":\"MXNet\",\"category\":\"mxnet\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws-neuronx-dkms\",\"component\":\"Driver\",\"category\":\"driver\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      
{\"name\":\"aws-neuronx-collectives\",\"component\":\"Collective Communication Library\",\"category\":\"runtime\",\"package_type\":\"os\",\"use_cases\":[\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"efa-installer\",\"component\":\"EFA\",\"category\":\"efa\",\"package_type\":\"na\",\"use_cases\":[\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"component\":\"Runtime Library\",\"category\":\"runtime\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"aws-neuron-tools\",\"component\":\"System Tools\",\"category\":\"system-tools\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"true\"},\n      {\"name\":\"aws-neuronx-tools\",\"component\":\"System Tools\",\"category\":\"system-tools\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"tensorflow-model-server-neuron\",\"component\":\"TensorFlow Model Server\",\"category\":\"model-server\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"component\":\"TensorFlow Model Server\",\"category\":\"model-server\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"neuronperf\",\"component\":\"Perf Tools\",\"category\":\"helper-tools\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"tensorboard-plugin-neuron\",\"component\":\"TensorBoard\",\"category\":\"profiling-tools\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"true\"},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"component\":\"TensorBoard\",\"category\":\"profiling-tools\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"},\n      {\"name\":\"libnrt.so\",\"component\":\"Runtime Library\",\"category\":\"libnrt\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"torch_xla\",\"component\":\"PyTorch\",\"category\":\"helper-lib\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"component\":\"CustomOps Tools\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"component\":\"CustomOps\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"aws-neuronx-oci-hook\",\"component\":\"OCI\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"dmlc_nnvm\",\"component\":\"Compiler\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuronx_hwm\",\"component\":\"Compiler\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"dmlc_topi\",\"component\":\"Compiler\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"dmlc_tvm\",\"component\":\"Compiler\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      
{\"name\":\"inferentia_hwm\",\"component\":\"Compiler\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuronx_distributed\",\"component\":\"Neuron Distributed\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuronx_distributed_training\",\"component\":\"Neuron Distributed Training\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"false\"},\n      {\"name\":\"neuronx_distributed_inference\",\"component\":\"Neuron Distributed Inference\",\"category\":\"na\",\"package_type\":\"os\",\"use_cases\":[\"inference\"],\"pin_major\":\"false\"},\n      {\"name\":\"jax_neuronx\",\"component\":\"Jax\",\"category\":\"jax\",\"package_type\":\"pip\",\"use_cases\":[\"inference\"],\"pin_major\":\"true\"},\n      {\"name\":\"nki\",\"component\":\"NKI\",\"category\":\"nki\",\"package_type\":\"pip\",\"use_cases\":[\"inference\",\"training\"],\"pin_major\":\"true\"}\n    ],\n    \"neuron_releases\": [\n      {\"neuron_version\":\"2.29.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.31.24.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.27.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.21.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"14.09.x\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.29.147.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn3\"] ,\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.29.147.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.15.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.31.24.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.29.18.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.7.0.1.0.8181\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.16408.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.24.5133.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.24.5133.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.18.27753\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.9.17334\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"nki\",\"version\":\"0.3.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.918.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.9.0.2.13.24727\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\",\"trn3\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"efa-installer\",\"version\":\"1.47\",\"supported_instances\":[\"trn1\",\"trn2\",\"trn3\"],\"supported_python_versions\":[]}\n        ]}, \n      {\"neuron_version\":\"2.28.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.30.59.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.26.10.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.20.7.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.20.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.29.71.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.29.71.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.14.102.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.30.51.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.28.23.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.7.0.1.0.7584\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.15515.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.23.6484.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.23.6484.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.17.26814\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.7.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.8.16251\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"nki\",\"version\":\"0.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.918.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuronx\",\"version\":\"2.9.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"efa-installer\",\"version\":\"1.47\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[]}\n        ]}, \n      {\"neuron_version\":\"2.28.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.30.59.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.26.5.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.20.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.20.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.29.71.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.29.71.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.14.102.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.30.51.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.28.23.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.7.0.1.0.7584\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.15515.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.23.6484.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.23.6484.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        
{\"name\":\"neuronx_distributed\",\"version\":\"0.17.26814\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.7.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.8.16251\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"nki\",\"version\":\"0.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.918.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.9.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.12.22436\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"efa-installer\",\"version\":\"1.47\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[]}\n        ]}, \n      {\"neuron_version\":\"2.27.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.29.41.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.25.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.19.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.19.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.29.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.29.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.13.52.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.29.40.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.27.33.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.7.0.1.0.7377\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.14584.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.22.12471.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.22.12471.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.16.25997\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.7.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.7.15063\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"nki\",\"version\":\"0.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.918.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.9.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n        ]}, \n      {\"neuron_version\":\"2.27.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.29.41.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.25.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.19.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.19.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.29.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.29.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.13.52.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.29.40.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.27.33.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.7.0.1.0.7377\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.14584.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"neuronx-cc\",\"version\":\"2.22.12471.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.22.12471.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.16.25997\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.7.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.7.14366\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"nki\",\"version\":\"0.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.918.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.9.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.11.19912\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\",\"3.12\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n        ]}, \n      {\"neuron_version\":\"2.26.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.28.27.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.24.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.18.0.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.18.0.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.28.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.28.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.12.36.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.28.23.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.26.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.6.2.1.0.6446\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.12677.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.21.33363.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.21.33363.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.15.22404\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.6.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.6.10598\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.837.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.10.16998\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.10.16998\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.10.16998\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.1315\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]}, \n      {\"neuron_version\":\"2.26.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.28.27.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.24.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.18.0.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.18.0.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.28.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.28.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.12.36.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.28.23.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.26.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.6.2.1.0.6446\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.12677.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n      
  {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.21.18209.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.21.18209.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.15.22404\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.6.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.6.10598\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.837.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.10.13553\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.10.13553\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.8.0.2.10.13553\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.1315\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]}, \n      {\"neuron_version\":\"2.25.0\", \"packages\": [\n 
       {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.27.34.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.27.34.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.23.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.17.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.17.0.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.27.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.27.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.11.42.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.27.23.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.25.145.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.6.1.1.0.3499\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.8201.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.20.9961.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.20.9961.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.14.18461\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.5.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"neuronx_distributed_inference\",\"version\":\"0.5.9230\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.813.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.9.9357\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.9.9357\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.1216\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.24.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.26.43.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.26.43.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.22.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.16.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.16.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.26.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.26.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.10.56.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.26.42.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.24.54.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.6.0.1.0.1296\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.4410.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.19.8089.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.19.8089.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.13.14393\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.4.1\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.4.7422\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.760.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.985\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.24.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.26.43.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.26.43.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.22.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.16.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.16.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.26.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.26.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.10.56.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.26.42.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.24.54.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.6.0.1.0.1296\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.4410.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.19.8089.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.19.8089.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.13.14393\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.4.0\",\"supported_instances\":[\"trn1\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.4.7422\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.760.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.7.0.2.8.6734\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.985\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.23.0\", \"packages\": [\n        
{\"name\":\"aws-neuronx-collectives\",\"version\":\"2.25.65.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.21.37.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.15.12.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.15.1.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.25.24.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.25.24.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.9.88.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.25.57.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.23.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.5.3.1.0.719\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.3493.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.18.121.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.18.121.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        
{\"name\":\"neuronx_distributed\",\"version\":\"0.12.12111\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.3.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.3.5591\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.0.670.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.7.5413\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.6.0.2.7.5413\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.798\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.22.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.24.59.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.24.59.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.20.28.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.14.12.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.14.6.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.24.23.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.24.23.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.7.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.24.53.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.24.53.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.22.61.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.2.1630.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.3396\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.17.194.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.17.194.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.11.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"neuronx_distributed_inference\",\"version\":\"0.2.0\",\"supported_instances\":[\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.117.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.6.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.470\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.21.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.23.135.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.19.64.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.13.16.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.13.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.23.45.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.23.45.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.6.36.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.23.112.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.20.204.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.1.714.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.3396\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.16.372.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.16.372.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.10.1\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"neuronx_distributed_training\",\"version\":\"1.1.1\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.1.1\",\"supported_instances\":[\"inf2\",\"trn2\",\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.52.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},      \n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.17.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurong\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.380\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.21.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.23.133.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.19.64.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.13.16.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.13.2.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.23.30.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.23.30.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.6.36.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.23.110.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.20.204.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.1.681.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"libneuronxla\",\"version\":\"0.5.3388\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.16.345.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.16.345.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.10.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.1.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_inference\",\"version\":\"0.1.0\",\"supported_instances\":[\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.52.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},      \n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.17.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.5.1.2.4.0\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurong\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.6\",\"supported_instances\":[\"trn1\",\"inf2\",\"trn2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.13.322\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      \n      {\"neuron_version\":\"2.20.2\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.22.33.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.18.20.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.12.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.12.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.22.20.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.22.20.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.5.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.22.19.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.19.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.5347.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.3278\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.15.143.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.15.143.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.0.1\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.63.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},      \n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.11.13.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.3.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurong\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.5\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.12.313\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.20.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.22.26.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.18.12.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.12.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.12.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.22.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.22.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.5.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.22.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.19.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.4986.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.2978\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.15.141.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.15.141.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"neuronx_distributed_training\",\"version\":\"1.0.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.63.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},      \n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.3.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurong\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.4\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        
{\"name\":\"transformers-neuronx\",\"version\":\"0.12.313\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.20.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.22.26.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.18.12.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.12.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.12.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.22.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.22.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.5.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.22.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.19.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"jax_neuronx\",\"version\":\"0.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.9\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.4115.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.2978\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.24.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.15.128.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx-cc-stubs\",\"version\":\"2.15.128.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"neuronx_distributed_training\",\"version\":\"1.0.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.63.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.12.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.12.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.12.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.12.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.12.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.12.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},      \n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurong\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.4\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.12.313\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\",\"3.11\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.19.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.21.46.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.17.17.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.11.4.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.11.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.21.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.21.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.4.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.21.41.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"libneuronxla\",\"version\":\"2.0.2335\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.1795\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.23.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.14.227.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.63.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.15.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronf\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.11.351\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.19.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.21.46.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.17.17.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.11.4.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.11.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.21.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.21.14.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.4.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.21.41.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"inferentia_hwm\",\"version\":\"1.17.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.2335\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.1795\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.147.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.23.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.93.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.14.213.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.63.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.11.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.11.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.10.12.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.15.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronf\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.11.351\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.18.2\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.16.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.3.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.17.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n    
    {\"name\":\"dmlc_tvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.965\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.971\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.50.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.22.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.55.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.13.72.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurone\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.10.0.360\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.18.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.16.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.3.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.17.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.965\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.971\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.50.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.22.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.55.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.13.68.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        
{\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurone\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.10.0.360\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.18.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.16.7.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.20.13.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.3.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.20.22.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.17.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"dmlc_nnvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_topi\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"dmlc_tvm\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"inferentia_hwm\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"2.0.965\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.971\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.50.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.22.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronperf\",\"version\":\"1.8.55.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.13.66.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"neuronx_distributed\",\"version\":\"0.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.19.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.74.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"2.1.2.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurone\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"torch_xla\",\"version\":\"2.1.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"transformers-neuronx\",\"version\":\"0.10.0.21\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.17.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.20.11.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.15.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.45.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.20.11.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.17.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"2.0.755\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.809\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.40.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.21.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.15.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.12.68.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.6.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.12.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.13.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.1.1.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurond\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"torch_xla\",\"version\":\"2.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.9.474\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.16.1\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.19.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.15.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.45.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.19.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.16.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"2.0.498\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.669\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.40.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.21.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"neuronperf\",\"version\":\"1.8.15.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.12.68.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.6.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.12.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.1.1.2.0.0b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurond\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"2.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.9.474\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.16.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.19.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.15.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.9.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.9.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.19.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.45.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.19.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.16.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"libneuronxla\",\"version\":\"2.0.498\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.669\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.40.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.21.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.15.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.12.54.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.6.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.12.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.17.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.1.1.2.0.0b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurond\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"2.1.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.9.474\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.15.2\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.18.19.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.14.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.8.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.8.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.27.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.18.15.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n     
 {\"name\":\"aws-neuronx-tools\",\"version\":\"2.15.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"1.0.680\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.570\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.25.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.20.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.11.0.35\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.11.0.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.43.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.12.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronc\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"2.0.0+torchneuron0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.8.268\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.18.15\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.15.1\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.18.19.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.14.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.8.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.8.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.27.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.18.15.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.15.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"1.0.680\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.570\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.25.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.20.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.11.0.34\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.11.0.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.43.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.12.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.1b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronc\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"2.0.0+torchneuron0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.8.268\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      
{\"name\":\"libnrt.so\",\"version\":\"2.18.15\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.15.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.18.18.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.14.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.8.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.8.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.18.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.27.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.18.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.15.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.18.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"1.0.663\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.538\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.25.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.20.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.11.0.34\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"neuronx_distributed\",\"version\":\"0.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.11.0.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.43.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf.\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.12.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"torch-neuronx\",\"version\":\"2.0.0.2.0.0b0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronc\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"2.0.0+torchneuron0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.8.268\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.18.14\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.14.1\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.17.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.7.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.7.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.17.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.14.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.476\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.10.0.5\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronb\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.10.0.35\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.17.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.17.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.22.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.11.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"transformers-neuronx\",\"version\":\"0.7.84\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.15.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.4.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.14.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.17.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.7.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.7.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.17.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.14.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.476\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.10.0.5\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuronb\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.10.0.34\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.19.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.17.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.17.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.22.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      
{\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.11.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.7.84\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.17.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.15.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.4.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},        \n      {\"neuron_version\":\"2.13.2\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.16.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.12.18.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.6.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.16.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.440\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.9.0.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurona\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.9.0.40\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.18.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.16.18.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.16.18.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.25.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.10.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.6.106\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.13.1\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.16.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.12.11.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.6.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.16.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.425\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.9.0.2\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurona\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"neuronx-cc\",\"version\":\"2.9.0.40\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.18.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.16.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.16.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.21.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.10.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.6.106\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.13.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.16.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.12.11.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.6.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.6.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.16.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.425\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"neuronx_hwm\",\"version\":\"2.9.0.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneurona\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.9.0.16\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.18.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.16.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.16.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.21.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.9.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.10.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.6.106\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.12.2\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.15.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.11.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.5.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.5.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.15.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-tools\",\"version\":\"2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.413\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.8.0.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuron8\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.8.0.25\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.9.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.5.58\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.14.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},        \n      {\"neuron_version\":\"2.12.1\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.15.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.11.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.5.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.5.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.15.14.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.413\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.8.0.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuron8\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.8.0.25\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.5.58\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.14.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.12.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.15.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      
{\"name\":\"aws-neuronx-dkms\",\"version\":\"2.11.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.5.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.5.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.15.11.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.12.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.391\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.8.0.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuron8\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.8.0.25\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.17.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.15.6.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.16.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.9.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.39.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.5.58\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.14.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      
{\"name\":\"neuronx_distributed\",\"version\":\"0.2.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n    ]},\n      {\"neuron_version\":\"2.11.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.14.9.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.10.11.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.4.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.4.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.14.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.11.10.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.326\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.7.0.3\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1+torchneuron7\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.7.0.40\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.16.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.14.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.14.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.8.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      
{\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.8.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.8.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.8.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.8.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.8.9.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.37.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.7.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.7.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.7.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.7.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.7.10.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.9.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.8.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.4.60\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.16.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      
{\"name\":\"dmlc_topi\",\"version\":\"1.16.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.16.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.14.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"islpy\",\"version\":\"2021.1+aws2021.x.169.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_distributed\",\"version\":\"0.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.10.0\", \"packages\": [\n      {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.13.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.9.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-customop-lib\",\"version\":\"0.3.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.3.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.13.6.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-tools\",\"version\":\"2.10.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"libneuronxla\",\"version\":\"0.5.207\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx_hwm\",\"version\":\"2.6.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch_xla\",\"version\":\"1.13.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronx-cc\",\"version\":\"2.6.0.19\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"neuron-cc\",\"version\":\"1.15.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.13.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"aws-neuronx-oci-hook\",\"version\":\"2.2.0.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.8.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.8.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.8.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      
{\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.8.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.8.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.7.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.4.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-neuronx\",\"version\":\"2.9.3.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.8.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.8.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.8.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.8.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.8.1.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.26.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.39.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.4.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n      {\"name\":\"torch-neuronx\",\"version\":\"1.13.1.1.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\",\"3.10\"]},\n      {\"name\":\"transformers-neuronx\",\"version\":\"0.3.32\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n      
{\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n      {\"name\":\"neuronperf\",\"version\":\"1.8.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"libnrt.so\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_nnvm\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_topi\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"dmlc_tvm\",\"version\":\"1.15.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"inferentia_hwm\",\"version\":\"1.14.1\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n      {\"name\":\"islpy\",\"version\":\"2021.1\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.9.1\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.35.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.8.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop\",\"version\":\"0.2.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.23.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.9.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.205\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx_hwm\",\"version\":\"2.5.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.5.0.28\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.14.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.12.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.12.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.97.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.7.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        
{\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.7.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.7.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.7.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.7.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.7.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.7.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.7.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.7.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.7.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.25.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.6.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.6.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.6.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.6.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.6.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.37.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.127.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.0.1.6.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.12.16.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.9.0\", \"packages\": [\n        
{\"name\":\"aws-neuronx-collectives\",\"version\":\"2.12.27.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.8.4.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop\",\"version\":\"0.2.3.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.2.1.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.12.16.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.9.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.173\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx_hwm\",\"version\":\"2.5.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.5.0.28\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.14.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.12.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.12.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.97.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.7.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.7.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.7.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.7.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.7.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.2.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.7.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.7.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.7.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.7.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.7.3.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.25.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.13.1.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.37.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.127.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.0.1.6.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.7.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.12.16.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.8.0\", \"packages\": [\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.11.47.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.7.33.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-customop\",\"version\":\"0.1.23.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-gpsimd-tools\",\"version\":\"0.1.7.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-discovery\",\"version\":\"2.9\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.11.43.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-tools\",\"version\":\"2.8.2.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"libneuronxla\",\"version\":\"0.5.144\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx_hwm\",\"version\":\"2.4.0.1\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch_xla\",\"version\":\"1.13.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.4.0.21\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"neuron-cc\",\"version\":\"1.13.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.81.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.4.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.4.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.9.3.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.10.1.2.6.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.10.1.1.0.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.4.2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.4.2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.9.3.2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.10.1.2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.19.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuron\",\"version\":\"2.4.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.11.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.43.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.13.0.1.5.0\",\"supported_instances\":[\"trn1\",\"inf2\"],\"supported_python_versions\":[\"3.7\",\"3.8\",\"3.9\"]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.6.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.10.30.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.7.0\", \"packages\": [\n        {\"name\":\"neuronx-cc\",\"version\":\"2.4.0.21\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.60.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.2.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.5.4.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.6.3.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.15.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx-gpsimd-customop\",\"version\":\"0.1.23.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronx-gpsimd-tools\",\"version\":\"0.1.7.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"torch-neuronx\",\"version\":\"1.13.0.1.4.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-xla\",\"version\":\"1.13.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.7.15.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.11.47.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.11.43.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.7.2.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.6.0\", \"packages\": [\n        {\"name\":\"neuronx-cc\",\"version\":\"2.3.0.4\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.14.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.2.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.5.4.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.6.3.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuronx\",\"version\":\"2.5.3.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.12.0.1.4.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"torch-xla\",\"version\":\"1.12.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.6.33.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.10.37.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.10.30.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.6.1.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.5.0\", \"packages\": [\n        {\"name\":\"neuron-cc\",\"version\":\"1.13.5.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.2.0.73\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.1.12.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.14.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.5.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.5.3.2.5.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.6.5.2.5.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.3.2.5.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.2.2.5.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.2.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"1.15.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.5.4.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.6.3.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.7.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuronx\",\"version\":\"2.8.0.2.5.6.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuron\",\"version\":\"2.4.6.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.12.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.7.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        
{\"name\":\"torch-neuron\",\"version\":\"1.8.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.5.8.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.11.0.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.11.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.43.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.6.33.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.10.34.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.10.27.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.5.19.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.6.1.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.10.27.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.4.0\", \"packages\": [\n        {\"name\":\"neuron-cc\",\"version\":\"1.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.2.0.73\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.1.2.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.1.2.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.1.2.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.5.3.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.6.3.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.1.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.0.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.2.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"1.15.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.5.4.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.6.3.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.7.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.8.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuron\",\"version\":\"2.4.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.7.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.8.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.11.0.1.2.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.6.5.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.10.17.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.10.15.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.5.16.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.2.51.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]},\n      {\"neuron_version\":\"2.3.0\", \"packages\": [\n        {\"name\":\"neuron-cc\",\"version\":\"1.11.7.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"neuronx-cc\",\"version\":\"2.1.0.76\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"aws-neuronx-k8-plugin\",\"version\":\"2.0.1.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-k8-scheduler\",\"version\":\"2.0.1.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-oci-hooks\",\"version\":\"2.0.1.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"1.15.5.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.5.3.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.6.3.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.7.1.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuron\",\"version\":\"2.8.0.2.3.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-neuronx\",\"version\":\"2.8.2.1.1.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"1.15.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.5.4.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.6.3.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.7.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorflow-model-server-neuron\",\"version\":\"2.8.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"tensorboard-plugin-neuron\",\"version\":\"2.4.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.7.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.8.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.9.1.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.10.2.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuron\",\"version\":\"1.11.0.2.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"torch-neuronx\",\"version\":\"1.11.0.1.1.1\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[\"3.7\",\"3.8\"]},\n        {\"name\":\"mxnet_neuron\",\"version\":\"1.5.1.1.10.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"mx_neuron\",\"version\":\"1.8.0.2.2.2.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[\"3.7\"]},\n        {\"name\":\"aws-neuronx-dkms\",\"version\":\"2.5.41.0\",\"supported_instances\":[\"inf1\",\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-collectives\",\"version\":\"2.9.86.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"efa-installer\",\"version\":\"na\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        
{\"name\":\"aws-neuronx-runtime-lib\",\"version\":\"2.9.64.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuron-tools\",\"version\":\"2.1.4.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"aws-neuronx-tools\",\"version\":\"2.4.14.0\",\"supported_instances\":[\"trn1\"],\"supported_python_versions\":[]},\n        {\"name\":\"neuronperf\",\"version\":\"1.3.0.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]},\n        {\"name\":\"libnrt.so\",\"version\":\"2.2.51.0\",\"supported_instances\":[\"inf1\"],\"supported_python_versions\":[]}\n      ]}\n    ]\n}\n"
  },
  {
    "path": "src/helperscripts/neuron-releases-manifest.json",
    "content": "{\n  \"repos\": {\n    \"whl\": \"https://pip.repos.neuron.amazonaws.com/\",\n    \"rpm\": \"https://yum.repos.neuron.amazonaws.com/\",\n    \"deb\": \"https://apt.repos.neuron.amazonaws.com/\"\n  },\n  \"manifest_date\": \"2022-12-12\",\n  \"manifest_version\": \"1.0.1\",\n  \"dlami_conda_env\": {\n    \"tensorflow\": {\n      \"1.15.5\": [\n        \"aws_neuron_tensorflow_p36\",\n        \"aws_neuron_tensorflow_p36\"\n      ],\n      \"2.1.4\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.2.3\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.3.4\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.4.3\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.5.1\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.5.2\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.5.3\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.6.3\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.6.5\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.7.1\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.7.3\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.8.0\": [\n        \"None\",\n        \"None\"\n      ],\n      \"2.8.2\": [\n        \"None\",\n        \"None\"\n      ]\n    },\n    \"pytorch\": {\n      \"1.5.1\": [\n        \"None\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.6.0\": [\n        \"None\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.7.1\": [\n        \"None\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.8.1\": [\n        \"aws_neuron_pytorch_p36\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.9.1\": [\n        \"None\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.10.1\": [\n        \"None\",\n        \"aws_neuron_pytorch_p36\"\n      ],\n      \"1.10.2\": [\n        \"None\",\n        \"None\"\n      ],\n      \"1.11.0\": [\n        \"None\",\n        \"None\"\n      ]\n    },\n    \"mxnet\": {\n      \"1.5.1\": [\n        \"aws_neuron_mxnet_p36\",\n        \"aws_neuron_mxnet_p36\"\n      ],\n      \"1.8.0\": [\n        \"None\",\n        \"aws_neuron_mxnet_p36\"\n      ]\n    }\n  },\n  \"latest_version_of_maintained_packages\": {\n    \"runtime-server\": {\n      \"framework\": false,\n      \"package-name\": \"aws-neuron-runtime\",\n      \"package-version\": \"1.6.24.0\",\n      \"neuron-version\": \"1.15.2\"\n    },\n    \"mxnet-1.5.1\": {\n      \"framework\": true,\n      \"package-name\": \"mxnet_neuron\",\n      \"package-version\": \"1.5.1.1.6.5.1\",\n      \"neuron-version\": \"1.16.0\"\n    }\n  },\n  \"fal_supported_runtime\": {\n    \"tensorflow\": {\n      \"1.15.5\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"1.15.5.1.6.10.0\"\n        ],\n        \"libnrt\": [\n          \"1.15.5.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.1.4\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"2.1.4.1.6.10.0\"\n        ],\n        \"libnrt\": [\n          \"2.1.4.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.2.3\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"2.2.3.1.6.10.0\"\n        ],\n        \"libnrt\": [\n          \"2.2.3.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.3.3\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          
\"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.3.4\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.3.4.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.4.2\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.4.3\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.4.3.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.5.0\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.5.1\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.5.2\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.5.3\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.6.3\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.6.5\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.7.1\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.7.3\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.8.0\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"2.8.2\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"2.5.1.2.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      }\n    },\n    \"pytorch\": {\n      \"1.5.1\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"1.5.1.1.5.21.0\"\n        ],\n        \"libnrt\": [\n          \"1.5.1.1.5.21.1\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.7.1\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          
\"1.7.1.1.5.21.0\"\n        ],\n        \"libnrt\": [\n          \"1.7.1.1.5.21.1\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.8.1\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"1.8.1.1.5.21.0\"\n        ],\n        \"libnrt\": [\n          \"1.8.1.1.5.21.1\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.9.1\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"1.9.1.0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.10.1\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"1.9.1.0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.10.2\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"1.9.1.0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.11.0\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"1.9.1.0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.12.1\": {\n        \"neuron-rtd\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"1.9.1.0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      }\n    },\n    \"mxnet\": {\n      \"1.5.1\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"99.99.99.99.99.99.99\"\n        ],\n        \"libnrt\": [\n          \"99.99.99.99.99.99.99\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      },\n      \"1.8.0\": {\n        \"neuron-rtd\": [\n          \"0.0.0.0\",\n          \"1.8.0.1.3.4.0\"\n        ],\n        \"libnrt\": [\n          \"1.8.0.1.3.4.1\",\n          \"99.99.99.99.99.99.99\"\n        ]\n      }\n    }\n  },\n  \"latest_release\": {\n    \"inf1\": {\n      \"version\": \"2.8.0\"\n    }\n  },\n  \"neuron_versions\": {\n    \"2.6.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.6.33.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.10.27.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n 
                   \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.13.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                
\"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.12.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            
\"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.3.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.2.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuronx\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n   
                 \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.11.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.43.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                   
 \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"2.7.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.7.15.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.10.27.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          
\"packages\": {\n            \"aws-neuronx-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.7.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.13.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    
\"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.12.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.3.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    
\"whl\"\n                  ]\n                },\n                \"2.8.2.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuronx\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.6.0\": {\n                  \"main_version\": true,\n      
            \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.11.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.43.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"2.8.0\": {\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.7.33.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.10.30.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n   
     \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.8.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.13.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n               
   ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.12.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.6.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  
\"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.4.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.4.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.9.3.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.10.1.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuronx\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.6.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n      
              \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.4.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.4.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.9.3.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.10.1.2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n             
       \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.11.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.43.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      },\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"python_ver\": [\n        \"3.7\"\n      ]\n    },\n    \"2.5.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.6.33.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.10.27.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n         
         ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.5.19.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.13.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n        
          ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.5.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.12.1.2.5.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  
\"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.5.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.3.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.2.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuronx\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.5.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  
\"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.11.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.43.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n      
          }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"2.4.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.5.16.0\": {\n                  \"main_version\": true,\n       
           \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.11.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.3.0.0\": {\n                  \"main_version\": true,\n     
             \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            
\"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.0.0\": {\n            
      \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"2.3.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.5.41.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n        
          \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuronx-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.11.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    
\"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    
\"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n        
            \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.19.2\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.3.26.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n    
      \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.11.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n           
       \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                
\"1.15.5.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  
\"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n        
          \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.19.1\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.3.11.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": 
{\n                \"2.1.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.11.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n  
              \"1.10.2.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          
\"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              
\"versions\": {\n                \"1.5.1.1.10.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.19.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.3.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": 
[\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.9.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.11.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  
\"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.2.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.11.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    
\"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.3.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.8.0.2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n        
            \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.10.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.18.0\": {\n      \"python_ver\": [\n        \"3.7\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.14.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            
}\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.51.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.8.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.8.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.790.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.10.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n      
            \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.1.2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              
\"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.3.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.6.3.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.7.1.2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.4.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.6.3.2.2.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n             
       \"rpm\"\n                  ]\n                },\n                \"2.7.0.2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.9.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.17.2\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n        
          ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.31.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.623.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              
\"versions\": {\n                \"1.9.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.1.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.1.2.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  
\"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.2.2.1.14.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  
\"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.1.14.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.3.2.1.14.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": 
{\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.8.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.1.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.17.1\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.31.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n      
            \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.623.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.9.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.1.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.1.7.0\": {\n                  
\"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.1.2.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.1.13.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                
  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.2.2.1.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.1.13.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  
\"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.3.2.1.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.8.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.1.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  
]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.17.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.31.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.623.0\": {\n                  \"main_version\": 
true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.9.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.1.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.1.7.0\": {\n                  
\"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.10.1.2.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.1.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.2.2.1.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": 
[],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.1.6.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.3.2.1.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n         
 \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.8.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.1.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.16.3\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.18.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n            
      \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.494.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              
\"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.0.85.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.0.536.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.0.536.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.0.536.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.0.536.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n    
              \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.1.2.0.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n               
     \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.2.2.0.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.7.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.0.290.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n    
              ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.16.2\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.18.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n  
          }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.327.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.0.85.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.0.468.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.0.468.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.0.468.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n       
             \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.0.468.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.1.2.0.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n       
           ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.2.2.0.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n       
       \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.7.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.0.276.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.16.1\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.18.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n     
               \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.327.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.0.85.0\": {\n                  \"main_version\": true,\n                  
\"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.0.392.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.0.392.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.0.392.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.0.392.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                 
 ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.1.2.0.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n           
     },\n                \"2.3.4.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.2.2.0.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.7.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.0.276.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  
\"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.16.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"libnrt\": {\n          \"framework\": false,\n          \"packages\": {\n            \"libnrt\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.15.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"lib\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n         
     \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.277.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.7.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"neuronperf\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuronperf\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.0.85.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.2.0.318.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.2.0.318.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.2.0.318.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  
\"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.9.1.2.0.318.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.4.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.3.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.1.2.0.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n 
         }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.3.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.4.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.3.2.0.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.2.2.0.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n            
      \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.7.0.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.2.0.271.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.15.2\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.24.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n              
      \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.22.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.22.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.21.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.25.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.6.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": 
[],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.5.21.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n         
       \"2.3.3.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.2.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.0.1.6.10.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.2.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    
\"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.0.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.1.1.6.10.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.1.1.6.10.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.6.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.3.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.15.1\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": 
{\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.24.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.22.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.22.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.21.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n        
          ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.25.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.6.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.5.21.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n             
 \"versions\": {\n                \"1.15.5.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.3.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.2.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.0.1.6.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          
\"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.2.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.0.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.1.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.1.1.6.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.6.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    
\"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.3.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.15.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.450.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.19.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.17.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n      
          }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.17.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.16.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.20.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.6.13.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  
],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.5.21.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.5.21.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.1.4.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.2.3.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.3.3.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.4.2.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n               
   \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"2.5.0.1.6.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.1.4.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.2.2.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.3.0.1.6.8.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.4.1.1.6.8.0\": {\n                  
\"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                },\n                \"2.5.1.1.6.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.6.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.3.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.14.2\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.386.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          
\"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.10.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n             
       \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.12.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.5.12.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.5.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.5.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.0.0\": {\n              
    \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.5.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.6.1.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.14.1\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n 
                 ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.7.4.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": 
[],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.12.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.5.12.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.5.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.5.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n      
  \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.5.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.6.1.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.3.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.14.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.0.0\": {\n                  \"main_version\": true,\n         
         \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n         
 \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.6.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.4.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.4.1.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.4.1.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.8.1.1.4.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n   
               \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.1.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.4.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.5.1.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.2.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.13.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          
\"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.17.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n 
                 \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.5.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.3.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.3.5.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.3.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.3.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n            
  }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-plugin-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"2.0.29.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.3.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.4.4.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            },\n            \"mx_neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.8.0.1.1.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [\n                    \"wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\",\n                    \"pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl\"\n                  ],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.12.3\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": 
{\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        
\"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.2.11.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.2.24.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.2.24.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.2.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              
\"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.3.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.12.2\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n           
         \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.12.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              
\"versions\": {\n                \"1.2.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.2.16.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.2.16.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.2.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    
\"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.3.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.12.1\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.9.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          
\"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.2.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.2.15.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n               
   \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.2.15.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.2.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.6.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.8.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.3.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    
\"1.12.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.2.0\": {\n                  \"main_version\": true,\n                  
\"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.4.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.2.3.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.2.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.5.1.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n      
              \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.2.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.3.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.11.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          
\"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.2.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.3.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n          
          \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.1.7.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                },\n                \"1.7.1.1.1.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.4.1.1.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.1.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n   
       \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.1.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.2.1.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    },\n    \"1.10.0\": {\n      \"python_ver\": [\n        \"3.6\"\n      ],\n      \"instance_support\": [\n        \"inf1\"\n      ],\n      \"arch\": [\n        \"x86_64\"\n      ],\n      \"components\": {\n        \"driver\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-dkms\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.3.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\",\n                    \"src\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.5.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-plugin\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-plugin\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n  
                \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"k8-scheduler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-k8-scheduler\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"runtime-base\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-runtime-base\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.0.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-rtd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tools\": {\n          \"framework\": false,\n          \"packages\": {\n            \"aws-neuron-tools\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.2.7.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"neuron-monitor\",\n                    \"neuron-cli\",\n                    \"neuron-top\",\n                    \"neuron-htop\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"compiler\": {\n          \"framework\": false,\n          \"packages\": {\n            \"neuron-cc\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.0.24045.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": \"whl\"\n                }\n              }\n            }\n          }\n        },\n        \"pytorch\": {\n          \"framework\": true,\n          \"packages\": {\n            \"torch-neuron\": {\n              
\"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.0.1978.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"torch-neuron\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow\": {\n          \"framework\": true,\n          \"packages\": {\n            \"tensorflow-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.15.4.1.0.2168.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorboard\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorboard-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.0.615.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"tensorflow-model-server\": {\n          \"framework\": false,\n          \"packages\": {\n            \"tensorflow-model-server-neuron\": {\n              \"install_on_compute_instance\": false,\n              \"versions\": {\n                \"1.15.0.1.0.2168.0\": {\n                  \"main_version\": true,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"deb\",\n                    \"rpm\"\n                  ]\n                }\n              }\n            }\n          }\n        },\n        \"mxnet\": {\n          \"framework\": true,\n          \"packages\": {\n            \"mxnet-neuron\": {\n              \"install_on_compute_instance\": true,\n              \"versions\": {\n                \"1.5.1.1.1.88.0\": {\n                  \"main_version\": false,\n                  \"pre_install_cmds\": [],\n                  \"post_install_cmds\": [],\n                  \"format\": [\n                    \"bin\"\n                  ],\n                  \"content\": [\n                    \"tbd\"\n                  ],\n                  \"package_type\": [\n                    \"whl\"\n                  ]\n                }\n              }\n            }\n          }\n        }\n      }\n    }\n  }\n}"
  },
  {
    "path": "src/helperscripts/neuron-setup-example.py",
    "content": "from neuronsetuphelper import neuron_setup_helper\n\n\nnr_setup=neuron_setup_helper(manifest_file='default',neuron_version='latest')\n\nsetup_cmd = nr_setup.instructions(framework='tensorflow',action='Install',os='ubuntu',ami='non-dlami',mode='develop',framework_version='latest')\nprint (setup_cmd)\n"
  },
  {
    "path": "src/helperscripts/neuronsetuphelper.py",
    "content": "import json\nimport argparse\nfrom packaging.version import Version, parse\n\n\n\n########################################\n# neuron_setup_helper\n########################################\n\nclass neuron_release_info:\n    def __init__(self):\n\n        self.release_frameworks_all = {}\n        self.release_frameworks_main = {}\n        self.release_packages_all ={}\n        self.release_package_main={}\n        self.release_frameworks_list=[]\n        self.release_components_list = []\n        self.release_tf_package_to_model_server_package={}\n        self.release_os_install_list =[]\n        self.python_ver=\"\"\n\n\n\n# release_frameworks_all\n# Desc: Dictionary - all framewors included in the release\n#   example: 'pytorch-1.5.1': {'framework': 'pytorch', 'package': 'torch-neuron', 'version': '1.5.1.1.5.3.0', 'main': False, 'framework_version': '1.5.1', 'package_name': 'torch-neuron-1.5.1.1.5.3.0', 'pre_install_cmds': [], 'post_install_cmds': []}\n# release_frameworks_all = {}\n\n# release_frameworks_main\n# Desc: Dictionary - the main frameworks in each rlease (single  version of the same framework)\n#   example: 'mxnet': {'framework': 'mxnet-1.8.0', 'package': 'mx_neuron', 'version': '1.8.0.1.3.0.0', 'framework_version': '1.5.1', 'full_package_name': 'mx_neuron-1.8.0.1.3.0.0', 'pre_install_cmds': ['wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl', 'pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl'], 'post_install_cmds': []}\n# release_frameworks_main = {}\n\n# release_packages_all\n# Desc: Dictionary -  all packages included in the release\n#   example: 'aws-neuron-dkms-1.5.0.0': {'component': 'driver', 'package': 'aws-neuron-dkms', 'version': '1.5.0.0', 'main': True, 'pre_install_cmds': [], 'post_install_cmds': []}\n# release_packages_all ={}\n\n# release_package_main\n# Desc: Dictionary - only single package from each component\n#   example: 'driver': {'package': 'aws-neuron-dkms', 'version': '1.5.0.0', 'full_package_name': 'aws-neuron-dkms-1.5.0.0', 'pre_install_cmds': [], 'post_install_cmds': []}\n# release_package_main={}\n\n\n# list of all framewoks included in the specific neuron release\n# release_frameworks_list=[]\n\n# list of all neuron components included in the specific neuron release\n# release_components_list = []\n\n# dictionary to correlate tf version with model server version\n# release_tf_package_to_model_server_package = {}\n\n\n# list of all Neuron versions included in the manifest\nneuron_ver_list = []\n\n\n# release_os_install_list =[]\n\ndlami_conda_env= {}\n\n\n\n\npackage_formal_name= {\n    \"compiler\":\"Neuron Compiler\",\n    \"tensorflow\":\"Neuron TensorFlow\",\n    \"pytorch\":\"Neuron PyTorch\",\n    \"mxnet\":\"Neuron MXNet\",\n    \"runtime-server\":\"Neuron Runtime server\",\n    \"libnrt\":\"Neuron Runtime library\",\n    \"runtime-base\":\"Neuron Runtime base\",\n    \"driver\":\"Neuron Driver\",\n    \"tools\":\"Neuron Tools\",\n    \"tensorboard\":\"Neuron TensorBoard\",\n    \"tensorflow-model-server\":\"Neuron TensorFlow model server\"\n    }\n\n\n\n\n########################################\n# parse_arguments\n########################################\n\ndef cli_parse_arguments():\n    __name__='neuron-install-helper.py'\n    parser = argparse.ArgumentParser(prog=__name__\n    ,usage='\\npython3 %(prog)s --list {neuron_versions,packages,components,frameworks} [--neuron-version=X.Y.Z]  [--file FILE] \\n'\n    +'python3 %(prog)s --install 
{pytorch,tensorflow,mxnet} [--neuron-version=X.Y.Z] [--framework-version=FRAMEWORK-X.Y.Z] [options]\\n'\n    +'python3 %(prog)s --install {driver,runtime,tools} [--neuron-version=X.Y.Z] [options]\\n'\n    +'python3 %(prog)s --update {pytorch,tensorflow,mxnet} [--framework-version=framework-X.Y.Z]  [options]\\n'\n    +'python3 %(prog)s --update {driver,runtime,tools} [options]\\n'\n    +'options= [--file FILE] [--ami {dlami,non-dlami}] [--os {ubuntu,amazonlinux}]\\n'\n    ,description='Installer helper for Neuron SDK')\n\n    group = parser.add_mutually_exclusive_group(required=True)\n    parser.add_argument(\"--neuron-version\",metavar='X.Y.Z')\n    group.add_argument(\"--list\",choices=['neuron_versions','packages','components','frameworks'])\n    group.add_argument(\"--install\",choices=['pytorch','tensorflow','mxnet'])\n    group.add_argument(\"--update\",choices=['pytorch','tensorflow','mxnet'])\n    parser.add_argument(\"--mode\",choices=['develop','compile','deploy'],default='develop')\n    parser.add_argument(\"--framework-version\",metavar='framework-X.Y.Z')\n    parser.add_argument(\"--os\",choices=['ubuntu','amazonlinux'],default='ubuntu',help='default=ubuntu')\n    parser.add_argument(\"--ami\",choices=['dlami','non-dlami'],default='non-dlami',help='default=non-dlami')\n    parser.add_argument(\"--file\",default='neuron-releases-manifest.json',help='default=neuron-releases-manifest.json')\n\n    return parser.parse_args()\n\n\n\n\ndef enumerate_release_manifest(nr_setup, in_neuron_version):\n\n    ########################################\n    # Enumerate the Json file\n    ########################################\n\n    if nr_setup.file==None:\n        nr_setup.file='neuron-releases-manifest.json'\n\n    try:\n        read_file = open(nr_setup.file, \"r\")\n    except:\n        print(__name__,\": error:\",\"Can't open \" + nr_setup.file + \" \")\n        exit(-1)\n\n    neuron_releases = json.load (read_file)\n\n    latest_neuron_version = neuron_releases[\"latest_release\"][\"inf1\"][\"version\"]\n\n    nr_setup.dlami_conda_env = neuron_releases[\"dlami_conda_env\"]\n\n    nr_setup.fal_supported_runtime = neuron_releases[\"fal_supported_runtime\"]\n\n    if (in_neuron_version == None) | (in_neuron_version == 'latest'):\n        neuron_version=latest_neuron_version\n    else:\n        neuron_version = in_neuron_version\n\n\n\n    for n_ver in neuron_releases[\"neuron_versions\"]:\n        neuron_ver_list.append(n_ver)\n\n\n\n    for neuron_release_ver in neuron_releases[\"neuron_versions\"]:\n        m_release=neuron_releases[\"neuron_versions\"][neuron_release_ver][\"components\"]\n        n_info=neuron_release_info()\n        n_info.python_ver=  neuron_releases[\"neuron_versions\"][neuron_release_ver][\"python_ver\"][0]\n\n        for component_name in m_release:\n            if m_release[component_name][\"framework\"]==False:\n                n_info.release_components_list.append(component_name)\n            m_packages=m_release[component_name][\"packages\"]\n            for package_name in m_packages:\n                for package_ver in m_packages[package_name][\"versions\"]:\n                    m_package_ver=m_packages[package_name][\"versions\"][package_ver]\n\n                    full_package_name=package_name+'-'+package_ver\n\n                    n_info.release_packages_all[full_package_name]= 
{\"component\":component_name,\"package\":package_name,\"version\":package_ver,\"main\":m_package_ver[\"main_version\"],\"pre_install_cmds\":m_package_ver[\"pre_install_cmds\"],\"post_install_cmds\":m_package_ver[\"post_install_cmds\"],\"package_type\":m_package_ver[\"package_type\"]}\n\n                    if m_package_ver[\"main_version\"]:\n                        n_info.release_package_main[component_name]={\"package\":package_name,\"version\":package_ver,\"full_package_name\":full_package_name,\"pre_install_cmds\":m_package_ver[\"pre_install_cmds\"],\"post_install_cmds\":m_package_ver[\"post_install_cmds\"],\"package_type\":m_package_ver[\"package_type\"]}\n\n                    if m_release[component_name][\"framework\"]:\n                        ver_digits = package_ver.rsplit('.')\n                        fw_ver=ver_digits[0]+'.'+ver_digits[1]+'.'+ver_digits[2]\n                        fw_name_ver=component_name+'-'+fw_ver\n\n                        if m_release[component_name][\"framework\"]:\n                            n_info.release_components_list.append(fw_name_ver)\n                            n_info.release_frameworks_list.append(fw_name_ver)\n\n                        if m_package_ver[\"main_version\"]:\n                            n_info.release_frameworks_main[component_name]={\"framework\":fw_name_ver,\"package\":package_name,\"version\":package_ver,\"framework_version\":fw_ver,\"package_name\":full_package_name,\"full_package_name\":full_package_name,\"pre_install_cmds\":m_package_ver[\"pre_install_cmds\"],\"post_install_cmds\":m_package_ver[\"post_install_cmds\"],\"package_type\":m_package_ver[\"package_type\"]}\n\n\n                        n_info.release_frameworks_all[fw_name_ver]={\"framework\":component_name,\"package\":package_name,\"version\":package_ver,\"main\":m_package_ver[\"main_version\"],\"framework_version\":fw_ver,\"package_name\":full_package_name,\"pre_install_cmds\":m_package_ver[\"pre_install_cmds\"],\"post_install_cmds\":m_package_ver[\"post_install_cmds\"],\"package_type\":m_package_ver[\"package_type\"]}\n\n        if 'driver' in n_info.release_components_list:\n            n_info.release_os_install_list.append('driver')\n        if 'runtime-server' in n_info.release_components_list:\n            n_info.release_os_install_list.append('runtime-server')\n        if 'tools' in n_info.release_components_list:\n            n_info.release_os_install_list.append('tools')\n        if 'tensorflow-model-server' in n_info.release_components_list:\n            n_info.release_os_install_list.append('tensorflow-model-server')\n\n        # correlate TF and TF model server versions\n        for pkg in n_info.release_packages_all.keys():\n            if n_info.release_packages_all[pkg]['component'] == 'tensorflow':\n                package_ver=n_info.release_packages_all[pkg]['version']\n                ver_digits = package_ver.rsplit('.')\n                tf_small_ver=ver_digits[0]+'.'+ver_digits[1]\n                for pkg2 in n_info.release_packages_all.keys():\n                    if n_info.release_packages_all[pkg2]['component'] == 'tensorflow-model-server':\n                        package_ver=n_info.release_packages_all[pkg2]['version']\n                        ver_digits = package_ver.rsplit('.')\n                        tf_model_server_small_ver=ver_digits[0]+'.'+ver_digits[1]\n                        if tf_model_server_small_ver==tf_small_ver:\n                            n_info.release_tf_package_to_model_server_package[pkg]=pkg2\n                      
      break\n\n        nr_setup.releases_info[neuron_release_ver]=n_info\n\n\n    try:\n        m_release=neuron_releases[\"neuron_versions\"][neuron_version][\"components\"]\n    except:\n        print(__name__,\": error: \",\"Version \" + neuron_version + \" is not a Neuron version or it is not supported\")\n        exit(-1)\n\n\n\n\n    return (neuron_version,latest_neuron_version)\n\n\n\n\n\n################\n# Sanity Checks\n################\ndef cli_validate(update,neuron_version,framework_version,is_latest_neuron,ami):\n    # --update_cmd Sanity check\n    # When choosing update, it always updates to latest; neuron_version should not be provided\n    if (update!=None) & (is_latest_neuron == False):\n        print (__name__,\": error: \",\"--update always updates to the latest Neuron version, can't specify a Neuron version\")\n        exit(-1)\n\n    #if neuron_version != None:\n    #    if ami == 'dlami':\n    #        print (__name__,\": error: \",\"--neuron_version should not be specified together with --ami=dlami\")\n    #        exit(-1)\n\n    if (framework_version != None):\n        if (framework_version not in  nr_setup.releases_info[neuron_version].release_frameworks_list):\n            print (__name__,\": error: \",\" \" + framework_version + \" is not a supported framework\")\n            exit(-1)\n\n########################################\n# version to tuple\n########################################\n\ndef versiontuple(v):\n   filled = []\n   for point in v.split(\".\"):\n      filled.append(point.zfill(8))\n   return tuple(filled)\n\n\n########################################\n# --list command\n########################################\ndef cli_list_cmd(nr_setup, neuron_version, list):\n\n\n    str =''\n\n    if (list == 'neuron_versions'):\n        str += '\\nList of Neuron release versions supported by this helper:\\n' + '\\n'\n        for ver in neuron_ver_list:\n            str += 'neuron-'+ver + '\\n'\n\n    #TODO: add \"[main]\" label to main packages\n    if (list == 'packages'):\n        str += '\\nList of Neuron packages included in Neuron release version ' + neuron_version + ':\\n' + '\\n'\n        for package in nr_setup.releases_info[neuron_version].release_packages_all:\n            if len( nr_setup.releases_info[neuron_version].release_packages_all[package]['package_type']):\n                #FIXME Runtime library hardcode print\n                if (nr_setup.releases_info[neuron_version].release_packages_all[package][\"component\"] == 'libnrt'):\n                    str += nr_setup.releases_info[neuron_version].release_packages_all[package][\"component\"] +' : \\t' +     \\\n                        \"libnrt.so (version \"+  \\\n                        nr_setup.releases_info[neuron_version].release_packages_all[package][\"version\"] +  \")\"  + '\\n'\n                else:\n                    str += nr_setup.releases_info[neuron_version].release_packages_all[package][\"component\"] +' : \\t' + package + '\\n'\n\n    if (list == 'components'):\n        str += '\\nList of Neuron components included in Neuron release version ' + neuron_version + ':\\n' + '\\n'\n        for comp in nr_setup.releases_info[neuron_version].release_components_list:\n            str += comp + '\\n'\n\n    #TODO: add \"[main]\" label to main frameworks\n    if (list == 'frameworks'):\n        str += '\\nList of frameworks included in Neuron release version ' + neuron_version + ':\\n' + '\\n'\n        for fw in nr_setup.releases_info[neuron_version].release_frameworks_all:\n            str += 
nr_setup.releases_info[neuron_version].release_frameworks_all[fw][\"framework\"] +' : \\t' + fw + '\\n'\n\n    return str\n\n\n########################################\n# Print configuration\n########################################\n\ndef hlpr_print_config(nr_setup, neuron_version):\n    str = ''\n    str += '\\n'\n    str += '###########################################################################' + '\\n'\n    str += '# ' + nr_setup.action + ' ' + nr_setup.framework + ' '\n    if (nr_setup.framework_version != 'latest') & (nr_setup.framework_version != None):\n        str += '(' + nr_setup.framework_version + ')' + ' '\n    if nr_setup.action == 'Update':\n        str += 'from latest Neuron version ' + neuron_version\n    else:\n        str += 'from Neuron version ' + neuron_version\n\n    str += '\\n# '\n\n    str += 'On '\n    if (nr_setup.os == 'ubuntu'):\n        str += 'Ubuntu '\n    elif (nr_setup.os == 'amazonlinux'):\n        str += 'Amazon Linux '\n\n    if (nr_setup.ami == 'dlami'):\n       str += 'DLAMI'\n    else:\n        str += 'AMI'\n\n    str += ' for '\n    if (nr_setup.mode == 'compile'):\n       str += 'compilation on compute instance'\n    elif (nr_setup.mode == 'develop'):\n       str += 'development on inf1 instance'\n    elif (nr_setup.mode == 'deploy'):\n       str += 'deployment on inf1 instance'\n    str += '\\n'\n    str += '###########################################################################' + '\\n'\n    str += '\\n'\n\n    return str\n\n###################################\n# Build Pip command\n###################################\ndef hlpr_build_pip_command(nr_setup, neuron_version, component,include_compiler,optional):\n\n\n    package_dict= nr_setup.releases_info[neuron_version].release_package_main\n\n    if (nr_setup.framework_version==None):\n        fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_main\n        fw_comp=component\n    else:\n        fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_all\n        fw_comp=nr_setup.framework_version\n\n    pip_cmd_prefix=''\n    pip_cmd =''\n\n\n    if nr_setup.action=='Install':\n        pip_cmd_prefix = 'pip install '\n    else:\n        pip_cmd_prefix = 'pip install --upgrade '\n\n    cmd=pip_cmd_prefix\n\n    if (component == 'mxnet') | (component == 'pytorch') | (component == 'tensorflow'):\n\n        # Framework installation\n        if (component == 'mxnet') | (component == 'pytorch'):\n            pip_cmd += cmd + fw_package_dict[fw_comp]['package']\n            if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n                pip_cmd += '=='+fw_package_dict[fw_comp]['version']\n            elif (nr_setup.is_latest_neuron==True)&(nr_setup.framework_version!=None):\n                pip_cmd += '=='+fw_package_dict[fw_comp]['framework_version']+'.*'\n\n        elif (component == 'tensorflow'):\n            if ((parse(neuron_version)<parse('1.15.0')) | (parse(fw_package_dict[fw_comp]['framework_version'])<parse('2.0.0'))):\n                pip_cmd += cmd + fw_package_dict[fw_comp]['package']\n            else:\n                pip_cmd = cmd + fw_package_dict[fw_comp]['package']\n                if (include_compiler == True):\n                    pip_cmd +=  '[cc]'\n\n            if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n                pip_cmd += '=='+fw_package_dict[fw_comp]['version']\n            elif (nr_setup.is_latest_neuron==True)&(nr_setup.framework_version!=None):\n           
     pip_cmd += '=='+fw_package_dict[fw_comp]['framework_version']+'.*'\n\n        # Compiler installation\n        if (include_compiler == True):\n            if (component == 'tensorflow'):\n                if ((parse(neuron_version)<parse('1.15.0')) | (parse(fw_package_dict[fw_comp]['framework_version'])<parse('2.0.0'))):\n                    pip_cmd += ' ' + package_dict['compiler']['package']\n                    if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n                        pip_cmd += '=='+package_dict['compiler']['version']\n            if (component == 'mxnet'):\n                pip_cmd += ' ' + package_dict['compiler']['package']\n                if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n                    pip_cmd += '=='+package_dict['compiler']['version']\n\n            if (component == 'pytorch'):\n                pip_cmd += ' ' + package_dict['compiler']['package']\n                pip_cmd += '[tensorflow] \"protobuf==3.20.1\"'\n                if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n                    pip_cmd += '=='+package_dict['compiler']['version']\n\n        # Additional packages installation\n        if (component == 'pytorch'):\n                pip_cmd += ' torchvision'\n\n        if component == 'tensorflow':\n            pip_cmd += ' \"protobuf\"'\n\n    else:\n        pip_cmd += '\\n'\n        if optional==False:\n            pip_cmd += '# ' + nr_setup.action  + ' ' + package_formal_name[component] + '\\n'\n        else:\n            pip_cmd += '# Optional: ' + nr_setup.action  + ' ' + package_formal_name[component] + '\\n'\n        pip_cmd += cmd + package_dict[component]['package']\n        if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True):\n            pip_cmd += '=='+package_dict[component]['version']\n\n\n\n\n    pip_cmd += '\\n'\n    return pip_cmd\n\n\n\n\n#################################################\n##  pip_setup_repos\n#################################################\ndef hlpr_pip_repos_setup():\n    str = '\\n'\n    str += '# Set Pip repository  to point to the Neuron repository' + '\\n'\n    str += 'pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com'+ '\\n'\n    return str\n\n#################################################\n##  hlpr_pip_install_create_python_venv\n#################################################\n\ndef hlpr_pip_install_create_python_venv(nr_setup, neuron_version):\n\n    py_ver=nr_setup.releases_info[neuron_version].python_ver\n    str = ''\n    str += '\\n'\n\n    if nr_setup.os == 'ubuntu':\n        str += '######################################################' + '\\n'\n        str += '#   Only for Ubuntu 20 - Install Python' + py_ver + '\\n'\n        str += '#' + '\\n'\n        str += '# sudo add-apt-repository ppa:deadsnakes/ppa' + '\\n'\n        str += '# sudo apt-get install python' + py_ver + '\\n'\n        str += '#' + '\\n'\n        str += '######################################################' + '\\n'\n\n    str += '# Install Python venv and activate Python virtual environment to install    ' + '\\n'\n    str += '# Neuron pip packages.' 
+ '\\n'\n\n    if nr_setup.os == 'ubuntu':\n        str += 'sudo apt-get install -y python'+ py_ver + '-venv g++' + '\\n'\n    elif nr_setup.os == 'amazonlinux':\n        str += 'sudo dnf install -y python'+ py_ver + '-venv gcc-c++' + '\\n'\n    str += 'python'+ py_ver + ' -m venv ' + nr_setup.framework +'_venv' + '\\n'\n    str += 'source '+ nr_setup.framework  + '_venv/bin/activate' + '\\n'\n    str += 'pip install -U pip' + '\\n'\n    str += '\\n'\n\n\n    if (nr_setup.mode == 'develop') & (nr_setup.action =='Install'):\n        if ((nr_setup.ami=='dlami') & (nr_setup.conda_env == 'None')) | \\\n            (nr_setup.ami !='dlami'):\n\n            str += '\\n'\n            str += '# Install Jupyter notebook kernel '+ '\\n'\n            str += 'pip install ipykernel ' + '\\n'\n            str += 'python'+ py_ver + ' -m ipykernel install --user --name '\n            str += nr_setup.framework  + '_venv '\n            str += '--display-name \"Python (' + package_formal_name[nr_setup.framework] + ')\"' + '\\n'\n            str += 'pip install jupyter notebook' + '\\n'\n            str += 'pip install environment_kernels' + '\\n'\n            str += '\\n'\n\n    return str\n\n#################################################\n##  hlpr_pip_activate_python_venv\n#################################################\n\ndef hlpr_pip_activate_python_venv(nr_setup, neuron_version):\n\n    py_ver=nr_setup.releases_info[neuron_version].python_ver\n\n    str = ''\n    str += '\\n'\n    str += '# Activate a Python ' + py_ver + ' virtual environment where Neuron pip packages were installed ' + '\\n'\n    str += 'source '+ nr_setup.framework  + '_venv/bin/activate' + '\\n'\n    str += '\\n'\n\n    return str\n\n######################################################################\n##  Framework/Compiler installation / Update  instructions (non-DLAMI)\n#######################################################################\n\ndef hlpr_framework_compiler_setup(nr_setup, neuron_version, include_compiler):\n\n    cmd_inst = ''\n    cmd_inst += '\\n'\n    cmd_inst += '# ' + nr_setup.action  + ' ' + package_formal_name[nr_setup.framework] + '\\n'\n\n    if (nr_setup.action=='Install'):\n        if len(nr_setup.fw_package_dict[nr_setup.fw_comp]['pre_install_cmds']):\n            for cmd_pre in nr_setup.releases_info[neuron_version].release_package_main[nr_setup.framework]['pre_install_cmds']:\n                cmd_inst += cmd_pre  + '\\n'\n\n\n    cmd_inst += hlpr_build_pip_command(nr_setup=nr_setup,neuron_version=neuron_version, component=nr_setup.framework,include_compiler=include_compiler,optional=False)\n\n    return cmd_inst\n\n\n######################################################################\n##  hlpr_framework_dlami_activate\n#######################################################################\n\ndef hlpr_framework_dlami_activate(nr_setup):\n\n    str = ''\n\n    str += '\\n'\n    if (nr_setup.framework == 'pytorch'):\n            str += '# Activate PyTorch' + '\\n'\n    elif (nr_setup.framework == 'tensorflow'):\n        str += '# Activate TensorFlow' + '\\n'\n\n    elif (nr_setup.framework == 'mxnet'):\n        str += '# Activate MXNet' + '\\n'\n\n    str += 'source activate '\n    str +=  nr_setup.generic_conda_env + '\\n'\n\n    return str\n\n\n#################################################\n##  hlpr_os_packages_update\n#################################################\n\ndef hlpr_os_packages_update(nr_setup):\n\n    str = ''\n    str += '\\n'\n    str += '# Update OS packages' + '\\n'\n    if 
nr_setup.os == 'ubuntu':\n        str += 'sudo apt-get update -y' + '\\n'\n    elif nr_setup.os == 'amazonlinux':\n        str += 'sudo dnf update -y' + '\\n'\n\n    return str\n\n#################################################\n##  hlpr_os_headers_update\n#################################################\n\ndef hlpr_os_headers_update(nr_setup):\n    str = ''\n    str = '\\n'\n    str += '# ' + nr_setup.action + ' OS headers'\n    str += '\\n'\n    if nr_setup.os == 'ubuntu':\n        str += 'sudo apt-get install linux-headers-$(uname -r) -y' + '\\n'\n    elif nr_setup.os == 'amazonlinux':\n        str += 'sudo dnf install -y \"kernel-devel-uname-r = $(uname -r)\"' + '\\n'\n    return str\n\n#################################################\n##  hlpr_os_export_path\n#################################################\n\ndef hlpr_os_export_path(nr_setup):\n    str = ''\n    str += '\\n'\n    if nr_setup.os == 'ubuntu':\n        str += 'export PATH=/opt/aws/neuron/bin:$PATH' + '\\n'\n    elif nr_setup.os == 'amazonlinux':\n        str += 'export PATH=/opt/aws/neuron/bin:$PATH' + '\\n'\n    return str\n\n\n#################################################\n##  hlpr_os_packages_first_setup\n#################################################\n\ndef hlpr_os_packages_first_setup(nr_setup):\n\n    str = ''\n    str += '\\n# Configure Linux for Neuron repository updates' + '\\n'\n    if nr_setup.os == 'ubuntu':\n        str += '. /etc/os-release' + '\\n'\n        str += 'sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF' + '\\n'\n        str += 'deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main' + '\\n'\n        str += 'EOF' + '\\n'\n        str += 'wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -' + '\\n'\n    elif nr_setup.os == 'amazonlinux':\n        str += 'sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF' + '\\n'\n        str += '[neuron]' + '\\n'\n        str += 'name=Neuron YUM Repository' + '\\n'\n        str += 'baseurl=https://yum.repos.neuron.amazonaws.com' + '\\n'\n        str += 'enabled=1' + '\\n'\n        str += 'metadata_expire=0' + '\\n'\n        str += 'EOF' + '\\n'\n        str += 'sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB' + '\\n'\n\n    return str\n\n\n#################################################\n##  os_comp_setup\n#################################################\n\ndef hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp,optional,pkg):\n\n    os_cmd = ''\n\n    if pkg==None:\n        key=comp\n        pkg_dict= nr_setup.releases_info[neuron_version].release_package_main\n    else:\n        key=pkg\n        pkg_dict= nr_setup.releases_info[neuron_version].release_packages_all\n\n\n\n\n    if (comp=='driver'):\n        #os_cmd += '\\n'\n        #os_cmd += '###############################################################################################################\\n'\n        #os_cmd += '# Before installing or updating aws-neuron-dkms:'+ '\\n'\n        #os_cmd += '# - Stop any existing Neuron runtime 1.0 daemon (neuron-rtd) by calling: \\'sudo systemctl stop neuron-rtd\\'' + '\\n'\n        #os_cmd += '###############################################################################################################\\n'\n        # WARNING: Exception\n        # Starting Neuron 1.16.0 , new kernel is needed to work with Runtime 2.x (library mode)\n\n\n        if (parse(neuron_version)>=parse('2.99.99')):\n            os_cmd += '\\n'\n 
           os_cmd += '################################################################################################################\\n'\n            os_cmd += '# To install or update to Neuron versions 2.99.99 and newer from previous releases:'+ '\\n'\n            if (nr_setup.os=='ubuntu'):\n                os_cmd += '# - Uninstall aws-neuron-dkms by calling \\`sudo apt-get remove aws-neuron-dkms -y\\`'+ '\\n'\n            elif (nr_setup.os=='amazonlinux'):\n                os_cmd += '# - Uninstall aws-neuron-dkms by calling \\`sudo dnf remove aws-neuron-dkms -y\\`'+ '\\n'\n            os_cmd += '# - DO NOT skip \\'aws-neuronx-dkms\\' install or upgrade step, you MUST install or upgrade to latest Neuron driver'+ '\\n'\n            os_cmd += '################################################################################################################\\n'\n        elif (parse(neuron_version)>=parse('1.19.1')):\n            os_cmd += '\\n'\n            os_cmd += '################################################################################################################\\n'\n            os_cmd += '# To install or update to Neuron versions 1.19.1 and newer from previous releases:'+ '\\n'\n            os_cmd += '# - DO NOT skip \\'aws-neuron-dkms\\' install or upgrade step, you MUST install or upgrade to latest Neuron driver'+ '\\n'\n            os_cmd += '################################################################################################################\\n'\n\n\n    # Update header files if driver should be installed or updated\n    if (comp=='driver'):\n        os_cmd += hlpr_os_headers_update(nr_setup)\n\n\n\n\n    if nr_setup.os=='ubuntu':\n        os_cmd_prefix = 'sudo apt-get install '\n    elif (nr_setup.action=='Install')&(nr_setup.os=='amazonlinux'):\n        os_cmd_prefix = 'sudo dnf install '\n    elif (nr_setup.action=='Update')&(nr_setup.os=='amazonlinux'):\n        os_cmd_prefix = 'sudo dnf update '\n\n    if comp in nr_setup.releases_info[neuron_version].release_os_install_list:\n        # install only if there is a package associated with the component\n        if (len(pkg_dict[key]['package_type']) != 0):\n            #os_cmd = build_os_command(cmd=os_cmd_prefix,component=comp,is_latest_release=is_latest_neuron)\n            os_cmd += '\\n'\n            if (optional==False):\n                os_cmd += '# ' + nr_setup.action + ' ' + package_formal_name[comp]\n            else:\n                os_cmd += '# Optional: ' + nr_setup.action + ' ' + package_formal_name[comp]\n\n            if (nr_setup.is_latest_neuron==False)&(nr_setup.os=='ubuntu'):\n                os_cmd += '\\n'\n                os_cmd += '# If you are downgrading from a newer version, please add the \\'--allow-downgrades\\' option to \\'sudo apt-get install\\' '\n            if (nr_setup.is_latest_neuron==False)&(nr_setup.os=='amazonlinux'):\n                os_cmd += '\\n'\n                os_cmd += '# If you are downgrading from a newer version, please remove the existing package using \\'sudo dnf remove\\' before installing the older package'\n            os_cmd += '\\n'\n            # Amazon Linux DLAMI will not allow updating tensorflow-model-server and aws-neuron-dkms without adding sudo dnf versionlock delete\n            if ((comp=='tensorflow-model-server') | (comp=='driver'))  & (nr_setup.ami == 'dlami') & (nr_setup.os == 'amazonlinux'):\n                os_cmd += 'sudo dnf versionlock delete '\n                os_cmd += pkg_dict[key]['package']\n                os_cmd += '\\n'\n\n        
    os_cmd += os_cmd_prefix + pkg_dict[key]['package']\n\n            # Amazon Linux yum package versioning is set via hyphen, not equals\n            version_key = \"=\"\n            if (nr_setup.os=='amazonlinux'):\n                version_key = \"-\"\n\n            if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions):\n                os_cmd += version_key + pkg_dict[key]['version']\n            elif (pkg!=None):\n                if ( nr_setup.releases_info[neuron_version].release_package_main[comp]['version']!= nr_setup.releases_info[neuron_version].release_packages_all[pkg]['version']):\n                    os_cmd += version_key + pkg_dict[key]['version']\n\n            # Ubuntu DLAMI will not allow updating tensorflow-model-server and aws-neuron-dkms without adding --allow-change-held-packages\n            if ((comp=='tensorflow-model-server') | (comp=='driver'))  & (nr_setup.ami == 'dlami') & (nr_setup.os == 'ubuntu'):\n                os_cmd += ' --allow-change-held-packages'\n\n            os_cmd += ' -y'\n            os_cmd += '\\n'\n\n    # Update header files if driver should be installed or updated\n    if (comp=='driver'):\n        os_cmd += '\\n'\n        os_cmd += '####################################################################################\\n'\n        os_cmd += '# Warning: If Linux kernel is updated as a result of OS package update'+ '\\n'\n        if (parse(neuron_version)>=parse('2.99.99')):\n            os_cmd += '#          Neuron driver (aws-neuronx-dkms) should be re-installed after reboot'+ '\\n'\n        else:\n            os_cmd += '#          Neuron driver (aws-neuron-dkms) should be re-installed after reboot'+ '\\n'\n        os_cmd += '####################################################################################\\n'\n\n    if (comp=='tools'):\n        if (parse(neuron_version)>=parse('2.99.99')):\n            os_cmd += '\\n'\n            os_cmd += '################################################################################################################\\n'\n            os_cmd += '# To install or update to Neuron versions 2.99.99 and newer from previous releases:'+ '\\n'\n            if (nr_setup.os=='ubuntu'):\n                os_cmd += '# - Uninstall aws-neuron-tools by calling \\`sudo apt-get remove aws-neuron-tools -y\\`'+ '\\n'\n            elif (nr_setup.os=='amazonlinux'):\n                os_cmd += '# - Uninstall aws-neuron-tools by calling \\`sudo dnf remove aws-neuron-tools -y\\`'+ '\\n'\n            os_cmd += '################################################################################################################\\n'\n\n    return os_cmd\n\n\n########################################\n##  Installation / Update  instructions\n########################################\ndef hlpr_instructions(nr_setup, neuron_version):\n\n    cmd_string = ''\n\n    setup_mode=nr_setup.mode\n\n\n\n    # look for conda environment for this framework version\n    for fw_env in nr_setup.dlami_conda_env:\n        if fw_env != nr_setup.framework:\n            continue\n        fw_ver_conda_env=nr_setup.dlami_conda_env[fw_env]\n        for conda_env_fw_ver in fw_ver_conda_env:\n            if (conda_env_fw_ver == nr_setup.fw_package_dict[nr_setup.fw_comp]['framework_version']):\n                nr_setup.conda_env=nr_setup.dlami_conda_env[fw_env][conda_env_fw_ver][0]\n                nr_setup.generic_conda_env=nr_setup.dlami_conda_env[fw_env][conda_env_fw_ver][1]\n                break\n\n\n    # look what runtime works with 
this framework version\n    fal_rtd=False\n    fal_libnrt=False\n    for fw in nr_setup.fal_supported_runtime:\n        if fw != nr_setup.framework:\n            continue\n        if fw == nr_setup.framework:\n            if (nr_setup.framework_version == None):\n                fw_ver= nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['framework_version']\n                fal_version= nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['version']\n            else:\n                fw_ver= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['framework_version']\n                fal_version= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['version']\n            fal_supported_rtd=nr_setup.fal_supported_runtime[fw][fw_ver]['neuron-rtd']\n            fal_supported_libnrt=nr_setup.fal_supported_runtime[fw][fw_ver]['libnrt']\n            if (parse(fal_version) >= parse(fal_supported_rtd[0])) &  \\\n                (parse(fal_version) <= parse(fal_supported_rtd[1])):\n                fal_rtd=True\n            elif (parse(fal_version) >= parse(fal_supported_libnrt[0])) &  \\\n                (parse(fal_version) <= parse(fal_supported_libnrt[1])):\n                fal_libnrt=True\n\n    if nr_setup.conda_env == \"None\":\n        dlami_ev_exists=False\n    else:\n        dlami_ev_exists=True\n\n    #cmd_string += hlpr_print_config(nr_setup, neuron_version)\n\n    if (nr_setup.framework_version==None):\n        fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_main\n        fw_comp=nr_setup.framework\n    else:\n        fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_all\n        fw_comp=nr_setup.framework_version\n\n\n\n    if (nr_setup.framework !=None): #if install or update\n        # If we are not using DLAMI\n        if (nr_setup.ami=='non-dlami') | \\\n            ((nr_setup.ami=='dlami') & \\\n                (\n                (nr_setup.action == 'Update') | \\\n                (dlami_ev_exists==False) | \\\n                (nr_setup.is_latest_neuron==False)) \\\n                ):\n\n\n\n            if (nr_setup.ami=='dlami') & (dlami_ev_exists==False):\n                cmd_string += '\\n'\n                cmd_string += '# Note: There is no DLAMI Conda environment for this framework version'+ '\\n'\n                cmd_string += '#       Framework will be installed/updated inside a Python environment'+ '\\n'\n\n\n            if (setup_mode == 'develop') | (setup_mode == 'deploy'):\n                if (nr_setup.action =='Install')&(nr_setup.ami!='dlami'):\n                    # For first install, set up Neuron OS packages repo (dnf or apt)\n                    cmd_string += hlpr_os_packages_first_setup(nr_setup)\n\n                # Always update to latest OS packages\n                cmd_string += hlpr_os_packages_update(nr_setup)\n\n                cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='driver',optional=False,pkg=None)\n\n\n                #FIXME Temporary check for MXNET 1.5 in maintenance mode\n                if (neuron_version == \"1.16.0\") & (nr_setup.framework==\"mxnet\")&    \\\n                    (fw_package_dict[fw_comp]['framework_version']==\"1.5.1\"):\n                    cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version=\"1.15.2\", comp='runtime-server',optional=False,pkg=None)\n                elif (fal_rtd):\n                    
cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='runtime-server',optional=False,pkg=None)\n\n                #if mode = develop, install tools\n                if (setup_mode == 'develop'):\n                    cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='tools',optional=False,pkg=None)\n                    if (nr_setup.framework == 'tensorflow'):\n                        cmd_string +=  hlpr_build_pip_command(nr_setup, neuron_version, component='tensorboard',include_compiler=False,optional=False)\n\n                if (nr_setup.action =='Install'):\n                    cmd_string += hlpr_os_export_path(nr_setup)\n\n            if (nr_setup.ami=='non-dlami') | \\\n                ((nr_setup.ami=='dlami')&(nr_setup.generic_conda_env==\"None\")):\n\n                if (nr_setup.action =='Install'):\n                    # For first install, install python venv and activate a venv\n                    cmd_string += hlpr_pip_install_create_python_venv(nr_setup, neuron_version)\n                elif (nr_setup.action =='Update'):\n                    # For next times, activate the venv used for the initial install\n                    cmd_string += hlpr_pip_activate_python_venv(nr_setup, neuron_version)\n            elif (nr_setup.ami=='dlami'):\n                cmd_string += hlpr_framework_dlami_activate(nr_setup)\n\n            # Setup Neuron pip packages\n            cmd_string += hlpr_pip_repos_setup()\n\n            # Now install framework\n            if (setup_mode == 'deploy'):\n                # do not install compiler when deploying\n                cmd_string += hlpr_framework_compiler_setup(nr_setup, neuron_version,  include_compiler=False)\n            else:\n                # install compiler when mode = developer or mode = compile\n                cmd_string += hlpr_framework_compiler_setup(nr_setup, neuron_version,  include_compiler=True)\n\n\n            #if mode != compile, install model server\n            if (setup_mode != 'compile'):\n                    if (nr_setup.framework == 'tensorflow'):\n                        if (nr_setup.framework_version==None):\n                            tf_package= nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['package_name']\n                        else:\n                            tf_package= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['package_name']\n                        cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='tensorflow-model-server',optional=True,pkg= nr_setup.releases_info[neuron_version].release_tf_package_to_model_server_package[tf_package])\n\n\n        # if running DLAMI\n        elif (nr_setup.ami=='dlami'):\n            if (nr_setup.action =='Install'):\n\n                cmd_string += '\\n'\n                cmd_string += '# Neuron is pre-installed on Deep Learning AMI (DLAMI), latest DLAMI version may not include latest Neuron versions '+ '\\n'\n                cmd_string += '# To update to latest Neuron version, follow \"Update to latest release\" instructions in the Neuron documentation'+ '\\n'\n\n                # WARNING: Exception\n                # Starting Neuron 1.16.0, a new kernel is needed to work with Runtime 2.x (library mode)\n                if (parse(neuron_version)>=parse('1.16.0')):\n                    if (setup_mode == 'develop') | (setup_mode == 'deploy'):\n                        cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, 
comp='driver',optional=False,pkg=None)\n\n                #FIXME Temporary check for MXNET 1.5 in maintenance mode\n                if (neuron_version == \"1.16.0\") & (nr_setup.framework==\"mxnet\")&    \\\n                    (fw_package_dict[fw_comp]['framework_version']==\"1.5.1\"):\n                    cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version=\"1.15.2\", comp='runtime-server',optional=False,pkg=None)\n\n                cmd_string += '\\n'\n                cmd_string += hlpr_framework_dlami_activate(nr_setup)\n\n\n\n    return cmd_string\n\n\n\n\n\n########################################\n# neuron_setup_helper\n########################################\n\nclass neuron_setup_helper:\n    def __init__(self, manifest_file,neuron_version):\n\n        # All Neuron releases\n        self.releases_info = {}\n\n        if (manifest_file== None) | (manifest_file== 'default')  :\n            self.file = 'neuron-releases-manifest.json'\n        else:\n            self.file = manifest_file\n\n        ver_tuple = enumerate_release_manifest(nr_setup=self,in_neuron_version=neuron_version)\n        self.neuron_version = ver_tuple[0]\n        self.latest_neuron_version = ver_tuple[1]\n\n        self.conda_env=\"\"\n        self.python_ver=\"\"\n        self.generic_conda_env=\"\"\n\n        if self.neuron_version == self.latest_neuron_version:\n            self.is_latest_neuron=True\n        else:\n            self.is_latest_neuron=False\n\n        if (self.is_latest_neuron) & (neuron_version !=None) & (neuron_version !='latest'):\n            # User explicitly specified the version, although it is the latest version\n            # in this case the instructions will include the exact versions of the packages\n            self.force_versions=True\n        else:\n            self.force_versions=False\n\n\n    def instructions(self,framework,action,framework_version,os,ami,mode):\n\n        self.framework=framework\n        self.action=action\n        self.mode=mode\n        self.os=os\n        self.ami=ami\n        if (framework_version=='latest'):\n            self.framework_version=None\n        else:\n            self.framework_version=framework_version\n        setup_cmd = \"\"\n\n        if (self.framework_version==None):\n            self.fw_package_dict= self.releases_info[self.neuron_version].release_frameworks_main\n            self.fw_comp=self.framework\n        else:\n            self.fw_package_dict= self.releases_info[self.neuron_version].release_frameworks_all\n            self.fw_comp=self.framework_version\n\n        setup_cmd=hlpr_instructions(self,self.neuron_version)\n\n        return setup_cmd\n\nif __name__ == '__main__':\n    setup_cmd =''\n    args = cli_parse_arguments()\n    nr_setup=neuron_setup_helper(manifest_file=args.file,neuron_version=args.neuron_version)\n\n    cli_validate(update=args.update,neuron_version=nr_setup.neuron_version,framework_version=args.framework_version,is_latest_neuron=nr_setup.is_latest_neuron,ami=args.ami)\n    if (args.list):\n        setup_cmd += cli_list_cmd(nr_setup=nr_setup,neuron_version=nr_setup.neuron_version, list=args.list)\n    else:\n        if (args.install != None)|(args.update !=None):\n            if args.install:\n                framework=args.install\n                action = 'Install'\n            elif args.update:\n                framework=args.update\n                action = 'Update'\n        else:\n            action = None\n            framework=None\n\n        setup_cmd += 
nr_setup.instructions(framework=framework,action=action,framework_version=args.framework_version,os=args.os,ami=args.ami,mode=args.mode)\n    print (setup_cmd)\n\n\n\n\n\n\n"
  },
  {
    "path": "src/helperscripts/release-manifest-def.py",
    "content": "\nneuron_releases={\n    \"repos\":{\n        \"whl\":\"_url\",           # url of the wheel repo\n        \"rpm\":\"_url\",           # url of the rpm repo (yum)\n        \"deb\":\"_url\",           # url of the debian repo (apt)\n    }\n    \"manifest_date\": \"_date\",\n    \"manifest_version\":\"_ver\"   # Will increment when the format changes\n    \"latest_release\":{\n        \"_instance\":{           # can be \"inf1\", \"trn1\", etc.. \n            \"version\":\"_ver\"    # latest neuron release that supports the _instance \n        }\n    }\n    \"neuron_versions\":{         # all neuron release versions supported by this manifest\n        \"_neuron_version\":{     # Neuron release version entry e.g. \"1.14.0\"\n            \"python_ver\": [\"_ver\"]              # list of python versions supported by this neuron release, e.g. \"3.6\"\n            \"instance_support\": [\"_instance\"]   # list of instances supported by this neuron release\n            \"arch\":[\"_arch\"]                    # list of architectures supported by this neuron release (e.g. x86)\n            \"components\":{                      # all components included in this neuron release \n                                                # (e.g. compiler, driver , pytorch ...)\n                \"_component_name\":{             # component entry (e.g. driver, compiler)\n                    \"framework\":_boolean        # is this component a framework ? \n                                                # needed since there are differences in versioning and content etc .. \n                    \"packages\":{                # all packages of this component that are included in this release \n                                                # e.g. mxnet supports mx_neuron and mxnet-neuron\n                        \"_package_name\":{       # package entry (e.g. mx_neuron)\n                            \"install_on_compute_instance\":_boolean     # can this package be installed on a compute instance?\n                            \"versions\":{                            # all versions of the specific package\n                                                                    # e.g. torch-neuron may include multiple versions\n                                \"_ver\":{                            # package version entry (e.g. 1.4.1.0)\n                                    \"pre_install_cmds\":[\"_cmd\"]     # a list of commands to call before installing\n                                                                    # the package, e.g. when a plugin needs to install the\n                                                                    # framework first, as in mx_neuron\n                                    \"post_install_cmds\":[\"_cmd\"]    # a list of commands to call after installing the package\n                                    \"format\":[\"_format\"]            # package format (e.g. bin or src)\n                                    \"content\":[\"_content\"]          # package content \n                                                                    # (e.g. tools include neuron-top, neuron monitor etc .. )\n                                    \"package_type\":[\"_type\"]        # list of package types supported ( e.g. 
whl, rpm, deb)\n                                }\n                            }                     \n                        }\n                    }\n                }\n            }\n        }\n    },\n    \"softwarelifecycle\":{           # Status of neuron software releases (supported, maintained, deprecated)\n                                    # Releases that are not listed under \"maintained\" or \"deprecated\" are considered \"supported\"\n        \"maintained\":{              # Releases that are being maintained, no active development, bug fixes can be provided\n                                    # releases can be Neuron release, component (e.g. runtime), or a framework (e.g. pytorch-1.5.x)\n            \"neuron_versions\":{     # Neuron versions that are under maintenance status\n                \"from\":\"_ver\"       # from neuron release version\n                \"to\":\"_ver\"         # to neuron release version\n            },\n            \"components\":{              # Components that are under maintenance status\n                \"_component_name\":{     # packages in that component\n                    \"_package_name\":{   # package entry\n                        \"from\":\"_ver\"   # from version\n                        \"to\":\"_ver\"     # to version\n                    }\n                }\n\n            },\n            \"frameworks\":{              # Frameworks that are under maintenance status\n                \"pytorch\":{             # PyTorch versions that are under maintenance status\n                    \"from\":\"_ver\"       # from version\n                    \"to\":\"_ver\"         # to version\n                },\n                \"tensorflow\":{          # TensorFlow versions that are under maintenance status\n                    \"from\":\"_ver\"       # from version\n                    \"to\":\"_ver\"         # to version\n                },\n                \"mxnet\":{               # MXNet versions that are under maintenance status\n                    \"from\":\"_ver\"       # from version\n                    \"to\":\"_ver\"         # to version\n                }\n            }\n        },\n        \"deprecated\":{                  # Releases that are deprecated, no bug fixes\n                                        # format similar to \"maintained\" section\n        },\n    },\n    \"compatability\": {                  # compatibility section\n        \"_component_name\": {            # component entry\n            \"_package_name\": {          # package entry\n                \"_ver_to__ver\": {       # compatibility entry\n                    \"from\": \"_ver\",     # from version\n                    \"to\": \"_ver\",       # to version\n                    \"instance_support\": [   # instance compatibility\n                        \"_instance\"\n                    ],\n                    \"arch\": [               # arch compatibility\n                        \"_arch\"\n                    ],\n                    \"components\": {                 # components compatibility section \n                        \"_component_name\": {        # component entry\n                            \"_package_name\": {      # package entry\n                                \"from\": \"_ver\",     # from version\n                                \"to\": \"_ver\"        # to version\n                            }\n                        }\n                    }\n                }\n            }\n        }\n    }\n}\n"
  },
  {
    "path": "src/k8/bert_service.yml",
    "content": "---\nkind: Service\napiVersion: v1\nmetadata:\n  name: inf-k8s-test\n  labels:\n    app: inf-k8s-test\nspec:\n  ports:\n    - name: http-tf-serving\n      port: 8500\n      targetPort: 8500\n    - name: grpc-tf-serving\n      port: 9000\n      targetPort: 9000\n  selector:\n    app: inf-k8s-test\n    role: master\n  type: ClusterIP\n---\nkind: Deployment\napiVersion: apps/v1\nmetadata:\n  name: inf-k8s-test\n  labels:\n    app: inf-k8s-test\n    role: master\nspec:\n  replicas: 1 # Number of desired replicas. Increase to desired number.\n  selector:\n    matchLabels:\n      app: inf-k8s-test\n      role: master\n  template:\n    metadata:\n      labels:\n        app: inf-k8s-test\n        role: master\n    spec:\n      volumes:\n        - name: sock\n          emptyDir: {}\n      containers:\n        - name: inf-k8s-test\n          image: tf-serving-ctr\n          imagePullPolicy: IfNotPresent\n          command: [\"/bin/sh\",\"-c\"]\n\n          # Pull model from s3, then start tensorflow_model_server_neuron with the model.\n          args:\n            - \"aws s3 sync s3://<your-bert-bucket>/bert /tmp/bert && \\\n           tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp/bert/\"\n\n          # Open grpc and rest API ports\n          ports:\n            - containerPort: 8500\n            - containerPort: 9000\n\n          # Arbitrary resource requirements\n          resources:\n            limits:\n              cpu: 4\n              memory: 4Gi\n              aws.amazon.com/neuron: 1  # desired number of Inferentia devices.\n            requests:\n              cpu: \"1\"\n              memory: 1Gi\n              aws.amazon.com/neuron: 1  # desired number of Inferentia devices.\n"
  },
  {
    "path": "src/k8/k8s-neuron-device-plugin-rbac.yml",
    "content": "# rbac.yaml\n---\nkind: ClusterRole\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: neuron-device-plugin\nrules:\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes\n  verbs:\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - events\n  verbs:\n  - create\n  - patch\n- apiGroups:\n  - \"\"\n  resources:\n  - pods\n  verbs:\n  - update\n  - patch\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes/status\n  verbs:\n  - patch\n  - update\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: neuron-device-plugin\n  namespace: kube-system\n---\nkind: ClusterRoleBinding\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: neuron-device-plugin\n  namespace: kube-system\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: neuron-device-plugin\nsubjects:\n- kind: ServiceAccount\n  name: neuron-device-plugin\n  namespace: kube-system\n"
  },
  {
    "path": "src/k8/k8s-neuron-device-plugin.yml",
    "content": "# https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/\napiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: neuron-device-plugin-daemonset\n  namespace: kube-system\nspec:\n  selector:\n    matchLabels:\n      name:  neuron-device-plugin-ds\n  updateStrategy:\n    type: RollingUpdate\n  template:\n    metadata:\n      # Uncomment the annotation below if k8s version is 1.13 or lower\n      # annotations:\n      #  scheduler.alpha.kubernetes.io/critical-pod: \"\"\n      labels:\n        name: neuron-device-plugin-ds\n    spec:\n      serviceAccount: neuron-device-plugin\n      tolerations:\n      - key: CriticalAddonsOnly\n        operator: Exists\n      - key: aws.amazon.com/neuron\n        operator: Exists\n        effect: NoSchedule\n      # Mark this pod as a critical add-on; when enabled, the critical add-on\n      # scheduler reserves resources for critical add-on pods so that they can\n      # be rescheduled after a failure.\n      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/\n      priorityClassName: \"system-node-critical\"\n      affinity:\n        nodeAffinity:\n          requiredDuringSchedulingIgnoredDuringExecution:\n            nodeSelectorTerms:\n              # Uncomment following matchExpressions if using k8s 1.16 or lower\n              #- matchExpressions:\n              #    - key: \"beta.kubernetes.io/instance-type\"\n              #      operator: In\n              #      values:\n              #        - inf1.xlarge\n              #        - inf1.2xlarge\n              #        - inf1.6xlarge\n              #        - inf1.24xlarge\n              #        - inf2.xlarge\n              #        - inf2.8xlarge\n              #        - inf2.24xlarge\n              #        - inf2.48xlarge\n              #        - trn1.2xlarge\n              #        - trn1.32xlarge\n              #        - trn1n.32xlarge\n              - matchExpressions:\n                  - key: \"node.kubernetes.io/instance-type\"\n                    operator: In\n                    values:\n                      - inf1.xlarge\n                      - inf1.2xlarge\n                      - inf1.6xlarge\n                      - inf1.24xlarge\n                      - inf2.xlarge\n                      - inf2.8xlarge\n                      - inf2.24xlarge\n                      - inf2.48xlarge\n                      - trn1.2xlarge\n                      - trn1.32xlarge\n                      - trn1n.32xlarge\n      containers:\n        # Find all neuron-device-plugin images at https://gallery.ecr.aws/neuron/neuron-device-plugin\n      - image: public.ecr.aws/neuron/neuron-device-plugin:2.22.4.0\n        imagePullPolicy: Always\n        name: neuron-device-plugin\n        env:\n        - name: KUBECONFIG\n          value: /etc/kubernetes/kubelet.conf\n        - name: NODE_NAME\n          valueFrom:\n            fieldRef:\n              fieldPath: spec.nodeName\n        securityContext:\n          allowPrivilegeEscalation: false\n          capabilities:\n            drop: [\"ALL\"]\n        volumeMounts:\n          - name: device-plugin\n            mountPath: /var/lib/kubelet/device-plugins\n          - name: infa-map\n            mountPath: /run\n      volumes:\n        - name: device-plugin\n          hostPath:\n            path: /var/lib/kubelet/device-plugins\n        - name: infa-map\n          hostPath:\n            path: /run\n\n\n\n"
  },
  {
    "path": "src/k8/k8s-neuron-monitor-daemonset.yml",
    "content": "apiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: neuron-monitor\n  namespace: neuron-monitor\n  labels:\n    app: neuron-monitor\n    version: v1\nspec:\n  selector:\n    matchLabels:\n      app: neuron-monitor\n  template:\n    metadata:\n      labels:\n        app: neuron-monitor\n        version: v1\n    spec:\n      affinity:\n        nodeAffinity:\n          requiredDuringSchedulingIgnoredDuringExecution:\n            nodeSelectorTerms:\n              - matchExpressions:\n                  - key: kubernetes.io/os\n                    operator: In\n                    values:\n                      - linux\n                  - key: node.kubernetes.io/instance-type\n                    operator: In\n                    values:\n                      - trn1.2xlarge\n                      - trn1.32xlarge\n                      - trn1n.32xlarge\n                      - inf1.xlarge\n                      - inf1.2xlarge\n                      - inf1.6xlarge\n                      - inf2.xlarge\n                      - inf2.8xlarge\n                      - inf2.24xlarge\n                      - inf2.48xlarge\n      containers:\n        - name: neuron-monitor\n          image: public.ecr.aws/neuron/neuron-monitor:1.3.0\n          ports:\n            - containerPort: 8000\n          command:\n             - \"/opt/bin/entrypoint.sh\"\n          args: \n            - \"--port\"\n            - \"8000\"\n            - \"--neuron-monitor-config\"\n            - \"/opt/aws/neuron/bin/neuron-monitor.conf\"\n          resources:\n            limits:\n              cpu: 500m\n              memory: 256Mi\n            requests:\n              cpu: 256m\n              memory: 128Mi\n          env:\n          - name: GOMEMLIMIT\n            value: 160MiB\n          securityContext:\n            privileged: true\n"
  },
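  {
    "path": "src/k8/examples/neuron-monitor-service-example.yml",
    "content": "# Illustrative example only (not shipped with the Neuron SDK): a ClusterIP Service that\n# fronts the neuron-monitor DaemonSet pods above, assuming the container serves its\n# metrics endpoint on the port (8000) passed to the entrypoint. The Service name is an\n# arbitrary placeholder; the selector and namespace match the DaemonSet labels.\napiVersion: v1\nkind: Service\nmetadata:\n  name: neuron-monitor\n  namespace: neuron-monitor\n  labels:\n    app: neuron-monitor\nspec:\n  selector:\n    app: neuron-monitor\n  ports:\n    - name: metrics\n      port: 8000\n      targetPort: 8000\n"
  },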
  {
    "path": "src/k8/k8s-neuron-scheduler-configmap.yml",
    "content": "apiVersion: v1\ndata:\n  policy.cfg: |\n    {\n      \"kind\": \"Policy\",\n      \"apiVersion\": \"v1\",\n      \"extenders\": [\n        {\n          \"urlPrefix\": \"http://127.0.0.1:32700\",\n          \"filterVerb\": \"filter\",\n          \"bindVerb\":   \"bind\",\n          \"enableHttps\": false,\n          \"nodeCacheCapable\": true,\n          \"managedResources\": [\n            {\n              \"name\": \"aws.amazon.com/neuron\",\n              \"ignoredByScheduler\": false\n            },\n            {\n              \"name\": \"aws.amazon.com/neurondevice\",\n              \"ignoredByScheduler\": false\n            },\n            {\n              \"name\": \"aws.amazon.com/neuroncore\",\n              \"ignoredByScheduler\": false\n            }\n          ],\n          \"ignorable\": false\n        }\n      ]\n    }\nkind: ConfigMap\nmetadata:\n  name: scheduler-policy\n  namespace: kube-system\n"
  },
  {
    "path": "src/k8/k8s-neuron-scheduler-eks.yml",
    "content": "# rbac.yaml\n---\nkind: ClusterRole\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: k8s-neuron-scheduler\nrules:\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes\n  verbs:\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes/status\n  verbs:\n  - update\n  - patch\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - events\n  verbs:\n  - create\n  - patch\n- apiGroups:\n  - \"\"\n  resources:\n  - pods\n  verbs:\n  - update\n  - patch\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - bindings\n  - pods/binding\n  verbs:\n  - create\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n---\nkind: ClusterRoleBinding\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: k8s-neuron-scheduler\nsubjects:\n- kind: ServiceAccount\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n\n# deployment yaml\n---\nkind: Deployment\napiVersion: apps/v1\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\nspec:\n  replicas: 1\n  strategy:\n    type: Recreate\n  selector:\n    matchLabels:\n        app: neuron-scheduler\n        component: k8s-neuron-scheduler\n  template:\n    metadata:\n      labels:\n        app: neuron-scheduler\n        component: k8s-neuron-scheduler\n      annotations:\n        scheduler.alpha.kubernetes.io/critical-pod: ''\n    spec:\n      serviceAccount: k8s-neuron-scheduler\n      schedulerName: my-scheduler\n      containers:\n        - name: neuron-scheduler-exp\n          # Find all neuron-scheduler images at https://gallery.ecr.aws/neuron/neuron-scheduler\n          image: public.ecr.aws/neuron/neuron-scheduler:2.22.4.0\n          env:\n          - name: PORT\n            value: \"12345\"\n\n# service.yaml            \n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n  labels:\n    app: neuron-scheduler\n    component: k8s-neuron-scheduler\nspec:\n  ports:\n  - port: 12345\n    name: http\n    targetPort: 12345\n  selector:\n    # select app=ingress-nginx pods\n    app: neuron-scheduler\n    component: k8s-neuron-scheduler   \n"
  },
  {
    "path": "src/k8/k8s-neuron-scheduler.yml",
    "content": "# rbac.yaml\n---\nkind: ClusterRole\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: k8s-neuron-scheduler\nrules:\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes\n  verbs:\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - events\n  verbs:\n  - create\n  - patch\n- apiGroups:\n  - \"\"\n  resources:\n  - pods\n  verbs:\n  - update\n  - patch\n  - get\n  - list\n  - watch\n- apiGroups:\n  - \"\"\n  resources:\n  - bindings\n  - pods/binding\n  verbs:\n  - create\n---\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n---\nkind: ClusterRoleBinding\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: k8s-neuron-scheduler\nsubjects:\n- kind: ServiceAccount\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n\n# deployment yaml\n---\nkind: Deployment\napiVersion: apps/v1\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\nspec:\n  replicas: 1\n  strategy:\n    type: Recreate\n  selector:\n    matchLabels:\n        app: neuron-scheduler\n        component: k8s-neuron-scheduler\n  template:\n    metadata:\n      labels:\n        app: neuron-scheduler\n        component: k8s-neuron-scheduler\n      annotations:\n        scheduler.alpha.kubernetes.io/critical-pod: ''\n    spec:\n      hostNetwork: true\n      tolerations:\n      - effect: NoSchedule\n        operator: Exists\n        key: node-role.kubernetes.io/master\n      - effect: NoSchedule\n        operator: Exists\n        key: node.cloudprovider.kubernetes.io/uninitialized\n      nodeSelector:\n         node-role.kubernetes.io/master: \"\"\n      serviceAccount: k8s-neuron-scheduler\n      containers:\n        - name: neuron-scheduler\n          # Find all neuron-scheduler images at https://gallery.ecr.aws/neuron/neuron-scheduler\n          image: public.ecr.aws/neuron/neuron-scheduler:2.22.4.0\n          env:\n          - name: PORT\n            value: \"12345\"\n\n# service.yaml            \n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: k8s-neuron-scheduler\n  namespace: kube-system\n  labels:\n    app: neuron-scheduler\n    component: k8s-neuron-scheduler\nspec:\n  type: NodePort\n  ports:\n  - port: 12345\n    name: http\n    targetPort: 12345\n    nodePort: 32700\n  selector:\n    # select app=ingress-nginx pods\n    app: neuron-scheduler\n    component: k8s-neuron-scheduler   \n"
  },
  {
    "path": "src/k8/k8s-ultraserver-init-script.sh",
    "content": "#!/bin/bash\n\nMPI_HOST_FILE=/etc/mpi/hostfile\n\nNEURON_ULTRASERVER_MODE_UNSET=0\nNEURON_ULTRASERVER_MODE_X4=1\nNEURON_ULTRASERVER_MODE_X2H=2\nNEURON_ULTRASERVER_MODE_X2V=3\nNEURON_ULTRASERVER_MODE_X1=4\n\nULTRASERVER_INIT_DIR=/root/ultraserver_init\nSORTED_NODES_FILE=$ULTRASERVER_INIT_DIR/sorted_nodes.txt\nFQDN_MODE_FILE=$ULTRASERVER_INIT_DIR/fqdn_mode.txt\nENV_VARS_FILE=$ULTRASERVER_INIT_DIR/us_env_vars.txt\nNEW_HOST_FILE=$ULTRASERVER_INIT_DIR/new_hostfile\n\nexport NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE=\"0000000000000000\"\nexport NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE=-1\n\nexport NEURON_GLOBAL_TOPOID0_HOST=\"\"\n\nexport NUM_WORKERS=0\n\ncat /dev/null > $SORTED_NODES_FILE\ncat /dev/null > $FQDN_MODE_FILE\ncat /dev/null > $ENV_VARS_FILE\ncat /dev/null > $NEW_HOST_FILE\n\nsave_sorted_node_list() {\n    # Gather ultraserver information from each worker node\n    mpirun --allow-run-as-root \\\n        --mca orte_keep_fqdn_hostnames 1 \\\n        -host $ip_list \\\n        -x NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE \\\n        -x NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE \\\n        -x NEURON_ULTRASERVER_NODE_CONFIG \\\n        sh -c '\n            if [ -f \"/sys/class/neuron_device/server_id_${NEURON_ULTRASERVER_NODE_CONFIG}\" ]; then\n                NEURON_ULTRASERVER_SERVER_ID=$(cat /sys/class/neuron_device/server_id_${NEURON_ULTRASERVER_NODE_CONFIG})\n            else\n                NEURON_ULTRASERVER_SERVER_ID=$NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE\n            fi\n\n            if [ -f \"/sys/class/neuron_device/node_id_${NEURON_ULTRASERVER_NODE_CONFIG}\" ]; then\n                NEURON_ULTRASERVER_NODE_ID=$(cat /sys/class/neuron_device/node_id_${NEURON_ULTRASERVER_NODE_CONFIG})\n            else\n                NEURON_ULTRASERVER_NODE_ID=$NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE\n            fi\n\n            FQDN=$(hostname --fqdn)\n            echo $NEURON_ULTRASERVER_SERVER_ID:$NEURON_ULTRASERVER_NODE_ID:$FQDN\n        ' | sort -t':' -k1,1 -k2,2 -k3,3 > $SORTED_NODES_FILE\n\n    # Set the topology ids for each worker node\n    local i=0\n    while IFS= read -r line; do\n        echo \"${i}:${line}\"\n        ((i++))\n    done < $SORTED_NODES_FILE > temp && mv temp $SORTED_NODES_FILE\n    NEURON_GLOBAL_TOPOID0_HOST=$(head -n1 $SORTED_NODES_FILE | cut -d: -f4)\n}\n\nvalidate_node_config() {\n    while read -r server_id; do\n        # Server id and node id are only valid for node configs > 1\n        if [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 1 ]; then\n            # Validate server id exists\n            if [ \"$server_id\" = \"$NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE\" ]; then\n                echo \"$NEURON_ULTRASERVER_NODE_CONFIG-node config is not supported\"\n                exit 1\n            fi\n\n            # Validate there is the correct amount of nodes that share the same server id\n            count=$(grep \"$server_id\" \"$SORTED_NODES_FILE\" | wc -l)\n            if [ $count -ne $NEURON_ULTRASERVER_NODE_CONFIG ]; then\n                echo \"Error: Incorrect number of nodes with server id $server_id, need $NEURON_ULTRASERVER_NODE_CONFIG nodes but saw $count\"\n                exit 1\n            fi\n\n            # Validate all the node ids are unique\n            node_ids_count=$(grep \"$server_id\" \"$SORTED_NODES_FILE\" | cut -d':' -f3 | sort | uniq | wc -l)\n            if [ $node_ids_count -ne $NEURON_ULTRASERVER_NODE_CONFIG ]; then\n                echo \"Error: Found $node_ids_count unique node IDs, expected 
$NEURON_ULTRASERVER_NODE_CONFIG\"\n                exit 1\n            fi\n        fi\n\n        while IFS=':' read -r tid sid nid fqdn; do\n            # Validate mode is valid for each node\n            modes=\"${fqdn_modes_map[$fqdn]}\"\n            if [ $NEURON_ULTRASERVER_NODE_CONFIG -eq 4 ]; then\n                if echo \"$modes\" | grep -q \"\\b$NEURON_ULTRASERVER_MODE_X4\\b\"; then\n                    mode=$NEURON_ULTRASERVER_MODE_X4\n                else\n                    echo \"Error: Node $fqdn does not support 4-node config\"\n                    exit 1\n                fi\n            elif [ $NEURON_ULTRASERVER_NODE_CONFIG -eq 2 ]; then\n                if echo \"$modes\" | grep -q \"\\b$NEURON_ULTRASERVER_MODE_X2V\\b\"; then\n                    mode=$NEURON_ULTRASERVER_MODE_X2V\n                elif echo \"$modes\" | grep -q \"\\b$NEURON_ULTRASERVER_MODE_X2H\\b\"; then\n                    mode=$NEURON_ULTRASERVER_MODE_X2H\n                else\n                    echo \"Error: Node $fqdn does not support 2-node config\"\n                    exit 1\n                fi\n            else\n                mode=$NEURON_ULTRASERVER_MODE_X1\n            fi\n\n            # Save each worker node's environments variables to a file\n            echo \"${tid}:${mode}:${sid}:${nid}:${fqdn}\" >> \"$ENV_VARS_FILE\"\n        done < <(grep \"$server_id\" \"$SORTED_NODES_FILE\")\n    done < <(cut -d':' -f2 \"$SORTED_NODES_FILE\" | sort | uniq)\n}\n\nreorder_hostfile() {\n    # Check if files exist\n    if [ ! -f \"$MPI_HOST_FILE\" ] || [ ! -f \"$SORTED_NODES_FILE\" ]; then\n        echo \"Error: One or both input files do not exist\"\n        exit 1\n    fi\n\n    # Extract FQDNs from SORTED_NODES_FILE and reorder entries\n    while IFS=: read -r _ _ _ fqdn; do\n        # Remove .cluster.local suffix\n        clean_fqdn=${fqdn%.cluster.local}\n\n        # Find the matching line in original file\n        while read -r line; do\n            if [[ \"$line\" == \"$clean_fqdn\"* ]]; then\n                echo \"$line\" >> \"$NEW_HOST_FILE\"\n                break\n            fi\n        done < \"$MPI_HOST_FILE\"\n    done < \"$SORTED_NODES_FILE\"\n}\n\n# Validate node config\nif [ -z \"${NEURON_ULTRASERVER_NODE_CONFIG}\" ]; then\n    NEURON_ULTRASERVER_NODE_CONFIG=4\nfi\nif [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 1 ] && [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 2 ] && [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 4 ]; then\n    echo \"Error: Invalid ultraserver node config: $NEURON_ULTRASERVER_NODE_CONFIG. 
Must be 1, 2, or 4.\"\n    exit 1\nfi\necho \"Using $NEURON_ULTRASERVER_NODE_CONFIG-node config\"\n\necho -e \"\\nCurrent hostfile:\"\ncat $MPI_HOST_FILE\n\n# Read the file, extract the first column, resolve IPs, and build the comma-separated string\nip_list=\"\"\nwhile read line; do\n    ip=$(getent hosts \"$line\" | awk '{print $1}')\n    if [ -z \"$ip\" ]; then\n        echo \"error: Unable to resolve IP address for host: $line\"\n        exit 1\n    fi\n    if [ -z \"$ip_list\" ]; then\n        ip_list=\"$ip\"\n    else\n        ip_list=\"${ip_list},${ip}\"\n    fi\ndone < <(cut -d' ' -f1 $MPI_HOST_FILE)\necho \"Worker pod IPs:\" \"$ip_list\"\n\n# Count unique IPs from ip_list and store in NUM_WORKERS\nNUM_WORKERS=$(echo \"$ip_list\" | tr -cd ',' | wc -c)\nNUM_WORKERS=$((NUM_WORKERS + 1))\necho \"Number of worker nodes: $NUM_WORKERS\"\n\n# Validate that the number of workers is a multiple of the node config\nif [ $((NUM_WORKERS % NEURON_ULTRASERVER_NODE_CONFIG)) -ne 0 ]; then\n    echo \"Error: Invalid number of worker nodes for $NEURON_ULTRASERVER_NODE_CONFIG-node config: $NUM_WORKERS.\"\n    exit 1\nfi\n\n# Create a map of workers to their possible ultraserver modes\nmpirun --allow-run-as-root \\\n    --mca orte_keep_fqdn_hostnames 1 \\\n    -host $ip_list \\\n    sh -c '\n        FQDN=$(hostname --fqdn)\n        NEURON_ULTRASERVER_MODE=$(cat /sys/class/neuron_device/ultraserver_mode)\n        echo $FQDN:$NEURON_ULTRASERVER_MODE\n    ' | sort -t':' -k1 > $FQDN_MODE_FILE\ndeclare -A fqdn_modes_map\nwhile IFS=':' read -r fqdn mode; do\n    fqdn_modes_map[\"$fqdn\"]=\"$mode\"\ndone < $FQDN_MODE_FILE\n(echo \"FQDN:Modes\" && cat $FQDN_MODE_FILE) | tr ':' '    '\n\n# Validate worker nodes\necho -e \"\\nSorted nodes:\"\nsave_sorted_node_list\n(echo \"TOPO_ID:SERVER_ID:NODE_ID:FQDN\" && cat $SORTED_NODES_FILE) |  tr ':' '    '\necho -e \"\\nNEURON_GLOBAL_TOPOID0 node will be: $NEURON_GLOBAL_TOPOID0_HOST\"\nvalidate_node_config\n\n# Update hostlist\necho -e \"\\nUpdated hostfile:\"\nreorder_hostfile\ncat $NEW_HOST_FILE\n\n# Write environment variables to each worker node\nfor line in `cat $ENV_VARS_FILE`; do\n    IFS=':' read -r topo_id mode server_id node_id fqdn <<< \"$line\"\n    export mode server_id node_id fqdn topo_id\n    mpirun --allow-run-as-root \\\n        --mca orte_keep_fqdn_hostnames 1 \\\n        -host $fqdn \\\n        -x topo_id \\\n        -x NEURON_GLOBAL_TOPOID0_HOST \\\n        -x mode \\\n        -x server_id \\\n        -x node_id \\\n        sh -c '\n            sed -i \"/^NEURON_GLOBAL_TOPOID=/d\" /etc/environment\n            sed -i \"/^NEURON_GLOBAL_TOPOID0_HOST=/d\" /etc/environment\n            sed -i \"/^NEURON_RT_ULTRASERVER_MODE=/d\" /etc/environment\n            sed -i \"/^NEURON_RT_ULTRASERVER_SERVER_ID=/d\" /etc/environment\n            sed -i \"/^NEURON_RT_ULTRASERVER_NODE_ID=/d\" /etc/environment\n\n            echo \"NEURON_GLOBAL_TOPOID=$topo_id\" >> /etc/environment\n            echo \"NEURON_GLOBAL_TOPOID0_HOST=$NEURON_GLOBAL_TOPOID0_HOST\" >> /etc/environment\n            echo \"NEURON_RT_ULTRASERVER_MODE=$mode\" >> /etc/environment\n            echo \"NEURON_RT_ULTRASERVER_SERVER_ID=$server_id\" >> /etc/environment\n            echo \"NEURON_RT_ULTRASERVER_NODE_ID=$node_id\" >> /etc/environment\n\n            echo \"Node $(hostname --fqdn): Variables set and persisted\"\n            echo \"NEURON_GLOBAL_TOPOID=$topo_id\"\n            echo \"NEURON_GLOBAL_TOPOID0_HOST=$NEURON_GLOBAL_TOPOID0_HOST\"\n            echo 
\"NEURON_RT_ULTRASERVER_MODE=$mode\"\n            echo \"NEURON_RT_ULTRASERVER_SERVER_ID=$server_id\"\n            echo \"NEURON_RT_ULTRASERVER_NODE_ID=$node_id\"\n        '\ndone\n"
  },
  {
    "path": "src/k8/my-scheduler.yml",
    "content": "apiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: my-scheduler\n  namespace: kube-system\n---\napiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRoleBinding\nmetadata:\n  name: my-scheduler-as-kube-scheduler\nsubjects:\n- kind: ServiceAccount\n  name: my-scheduler\n  namespace: kube-system\nroleRef:\n  kind: ClusterRole\n  name: system:kube-scheduler\n  apiGroup: rbac.authorization.k8s.io\n---\napiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRoleBinding\nmetadata:\n  name: my-scheduler-as-volume-scheduler\nsubjects:\n- kind: ServiceAccount\n  name: my-scheduler\n  namespace: kube-system\nroleRef:\n  kind: ClusterRole\n  name: system:volume-scheduler\n  apiGroup: rbac.authorization.k8s.io\n---\nkind: ClusterRole\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: my-scheduler\nrules:\n- apiGroups:\n  - \"\"\n  resources:\n  - configmaps\n  verbs:\n  - get\n  - list\n  - watch\n- apiGroups:\n  - coordination.k8s.io\n  resources:\n  - leases\n  verbs:\n  - create\n  - get\n  - list\n  - update\n---\nkind: ClusterRoleBinding\napiVersion: rbac.authorization.k8s.io/v1\nmetadata:\n  name: my-scheduler\n  namespace: kube-system\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: my-scheduler\nsubjects:\n- kind: ServiceAccount\n  name: my-scheduler\n  namespace: kube-system\n---\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: my-scheduler-config\n  namespace: kube-system\ndata:\n  my-scheduler-config.yaml: |\n    apiVersion: kubescheduler.config.k8s.io/v1\n    kind: KubeSchedulerConfiguration\n    profiles:\n      - schedulerName: my-scheduler\n    extenders:\n      - urlPrefix: 'http://k8s-neuron-scheduler.kube-system.svc.cluster.local:12345'\n        filterVerb: filter\n        bindVerb: bind\n        enableHTTPS: false\n        nodeCacheCapable: true\n        managedResources:\n          - name: 'aws.amazon.com/neuron'\n            ignoredByScheduler: false\n          - name: 'aws.amazon.com/neuroncore'\n            ignoredByScheduler: false\n          - name: 'aws.amazon.com/neurondevice'\n            ignoredByScheduler: false\n        ignorable: false\n    leaderElection:\n      leaderElect: true\n      resourceNamespace: kube-system    \n      resourceName: my-scheduler\n---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  labels:\n    component: scheduler\n    tier: control-plane\n  name: my-scheduler\n  namespace: kube-system\nspec:\n  selector:\n    matchLabels:\n      component: scheduler\n      tier: control-plane\n  replicas: 1\n  template:\n    metadata:\n      labels:\n        component: scheduler\n        tier: control-plane\n        version: second\n    spec:\n      serviceAccountName: my-scheduler\n      containers:\n      - args:\n        - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml\n        - --leader-elect=true\n        - --v=2\n        command:\n        - /usr/local/bin/kube-scheduler\n        image: public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.28.5-eks-1-28-latest\n        # or use below for your version of k8s\n        # image: registry.k8s.io/kube-scheduler:<version>\n        livenessProbe:\n          httpGet:\n            path: /healthz\n            port: 10259\n            scheme: HTTPS\n          initialDelaySeconds: 15\n        name: kube-second-scheduler\n        readinessProbe:\n          httpGet:\n            path: /healthz\n            port: 10259\n            scheme: HTTPS\n        resources:\n          requests:\n            cpu: '0.1'\n        
securityContext:\n          privileged: false\n        volumeMounts:\n          - name: config-volume\n            mountPath: /etc/kubernetes/my-scheduler\n      hostNetwork: false\n      hostPID: false\n      volumes:\n        - name: config-volume\n          configMap:\n            name: my-scheduler-config\n\n"
  },
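  {
    "path": "src/k8/examples/neuroncore-pod-my-scheduler-example.yml",
    "content": "# Illustrative example only (not shipped with the Neuron SDK): a pod scheduled by the\n# custom \"my-scheduler\" defined above, which delegates Neuron resource filtering and\n# binding to the k8s-neuron-scheduler extender. The pod name and image are placeholders;\n# aws.amazon.com/neuroncore is one of the managedResources listed in the scheduler config.\napiVersion: v1\nkind: Pod\nmetadata:\n  name: neuroncore-workload-example\nspec:\n  schedulerName: my-scheduler\n  containers:\n    - name: app\n      image: <your-neuron-application-image>\n      resources:\n        limits:\n          # Requesting two NeuronCores (core-level allocation instead of whole devices).\n          aws.amazon.com/neuroncore: 2\n"
  },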
  {
    "path": "src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-config.yml",
    "content": "apiVersion: v1\ndata:\n  kernel-monitor.json: |\n    {\n        \"plugin\": \"kmsg\",\n        \"logPath\": \"/dev/kmsg\",\n        \"lookback\": \"5m\",\n        \"bufferSize\": 10,\n        \"source\": \"kernel-monitor\",\n        \"conditions\": [\n            {\n                \"type\": \"NeuronHealth\",\n                \"reason\": \"NeuronHasNoError\",\n                \"message\": \"Neuron has no error\"\n            }\n        ],\n        \"rules\": [\n            {\n                \"type\": \"permanent\",\n                \"condition\": \"NeuronHealth\",\n                \"reason\": \"NeuronHasError_SRAM_UNCORRECTABLE_ERROR\",\n                \"pattern\": \".* NEURON_HW_ERR=SRAM_UNCORRECTABLE_ERROR .*\"\n            },\n            {\n                \"type\": \"permanent\",\n                \"condition\": \"NeuronHealth\",\n                \"reason\": \"NeuronHasError_NC_UNCORRECTABLE_ERROR\",\n                \"pattern\": \".* NEURON_HW_ERR=NC_UNCORRECTABLE_ERROR .*\"\n            },\n            {\n                \"type\": \"permanent\",\n                \"condition\": \"NeuronHealth\",\n                \"reason\": \"NeuronHasError_HBM_UNCORRECTABLE_ERROR\",\n                \"pattern\": \".* NEURON_HW_ERR=HBM_UNCORRECTABLE_ERROR .*\"\n            },\n            {\n                \"type\": \"permanent\",\n                \"condition\": \"NeuronHealth\",\n                \"reason\": \"NeuronHasError_DMA_ERROR\",\n                \"pattern\": \".* NEURON_HW_ERR=DMA_ERROR .*\"\n            }\n        ]\n    }\nkind: ConfigMap\nmetadata:\n  name: node-problem-detector-config\n  namespace: neuron-healthcheck-system\n\n"
  },
  {
    "path": "src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-rbac.yml",
    "content": "apiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: node-problem-detector\n  namespace: neuron-healthcheck-system\n\n---\napiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRoleBinding\nmetadata:\n  name: npd-binding\nroleRef:\n  apiGroup: rbac.authorization.k8s.io\n  kind: ClusterRole\n  name: system:node-problem-detector\nsubjects:\n  - kind: ServiceAccount\n    name: node-problem-detector\n    namespace: neuron-healthcheck-system\n\n---\napiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n  labels:\n    kubernetes.io/bootstrapping: rbac-defaults\n  name: system:node-problem-detector\nrules:\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes\n  verbs:\n  - get\n- apiGroups:\n  - \"\"\n  resources:\n  - nodes/status\n  verbs:\n  - patch\n- apiGroups:\n  - \"\"\n  - events.k8s.io\n  resources:\n  - events\n  verbs:\n  - create\n  - patch\n  - update\n"
  },
  {
    "path": "src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery.yml",
    "content": "apiVersion: apps/v1\nkind: DaemonSet\nmetadata:\n  name: node-problem-detector\n  namespace: neuron-healthcheck-system\n  labels:\n    app: node-problem-detector\nspec:\n  selector:\n    matchLabels:\n      app: node-problem-detector\n  template:\n    metadata:\n      labels:\n        app: node-problem-detector\n    spec:\n      affinity:\n        nodeAffinity:\n          requiredDuringSchedulingIgnoredDuringExecution:\n            nodeSelectorTerms:\n              - matchExpressions:\n                  - key: \"node.kubernetes.io/instance-type\"\n                    operator: In\n                    values:\n                      - inf1.xlarge\n                      - inf1.2xlarge\n                      - inf1.6xlarge\n                      - inf1.24xlarge\n                      - inf2.xlarge\n                      - inf2.8xlarge\n                      - inf2.24xlarge\n                      - inf2.48xlarge\n                      - trn1.2xlarge\n                      - trn1.32xlarge\n                      - trn1n.32xlarge\n\n      containers:\n      - name: node-problem-detector\n        command:\n        - /node-problem-detector\n        - --logtostderr\n        - --config.system-log-monitor=/config/kernel-monitor.json\n        image: registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19\n        ports:\n        - containerPort: 20257\n        resources:\n          limits:\n            cpu: 10m\n            memory: 80Mi\n          requests:\n            cpu: 10m\n            memory: 80Mi\n        imagePullPolicy: Always\n        securityContext:\n          privileged: true\n        env:\n        - name: NODE_NAME\n          valueFrom:\n            fieldRef:\n              fieldPath: spec.nodeName\n        volumeMounts:\n        - name: log\n          mountPath: /var/log\n          readOnly: true\n        - name: kmsg\n          mountPath: /dev/kmsg\n          readOnly: true\n        # Make sure node problem detector is in the same timezone\n        # with the host.\n        - name: localtime\n          mountPath: /etc/localtime\n          readOnly: true\n        - name: config\n          mountPath: /config\n          readOnly: true\n      - name: node-recovery\n        command:\n        - /bin/sh\n        - -c\n        - \"sleep 60 && /scripts/check-health.py\"\n        image: public.ecr.aws/neuron/neuron-node-recovery:1.3.0\n        resources:\n          limits:\n            cpu: 10m\n            memory: 150Mi\n          requests:\n            cpu: 10m\n            memory: 150Mi\n        imagePullPolicy: Always\n        env:\n        - name: NODE_NAME\n          valueFrom:\n            fieldRef:\n              fieldPath: spec.nodeName\n        - name: ENABLE_RECOVERY\n          value: \"false\"\n      serviceAccountName: node-problem-detector\n      volumes:\n      - name: log\n        # Config `log` to your system log directory\n        hostPath:\n          path: /var/log/\n      - name: kmsg\n        hostPath:\n          path: /dev/kmsg\n      - name: localtime\n        hostPath:\n          path: /etc/localtime\n      - name: config\n        configMap:\n          name: node-problem-detector-config\n          defaultMode: 0555\n          items:\n          - key: kernel-monitor.json\n            path: kernel-monitor.json\n      tolerations:\n        - effect: NoSchedule\n          operator: Exists\n        - effect: NoExecute\n          operator: Exists\n"
  },
  {
    "path": "src/libnrt/README.md",
    "content": "# NeuronX Runtime Header Files\n\n## Overview\n\nThe NeuronX Runtime Library provides C APIs for initializing the Neuron hardware,\nloading models and input data, executing iterations on loaded models, and\nretrieving output data.\n\nThis library is provided to customers via a shared object (libnrt.so) that is installed\nthrough the `aws-neuronx-runtime-lib` package. This directory exposes the header files\nthat customers can use to write custom applications utilizing the NeuronX Runtime Library.\n\n## File Location\n\nThese header files will be installed to the user's system under `/opt/aws/neuron/include`\nwhen installing the `aws-neuronx-runtime-lib` package and the `libnrt.so` library is \ninstalled under the `/opt/aws/neuron/lib` directory.\n\n## Experimental Headers\n\nThe following files contain experimental function declarations and are subject to change in \nfuture releases.\n\n- nrt_async.h\n- nrt_async_sendrecv.h\n- nrt_experimental.h\n\n## Documentation\n\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-api-guide.html\n"
  },
  {
    "path": "src/libnrt/include/ndl/ndl.h",
    "content": "/*\n * Copyright 2020-2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <stdint.h>\n#include <stdbool.h>\n#include <sys/types.h>\n#include <pthread.h>\n\n#include \"neuron_driver_shared.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\ntypedef enum NQ_DEV_TYPE {\n    NQ_DEV_TYPE_NEURON_CORE = 0,\n    NQ_DEV_TYPE_TOPSP,\n    NQ_DEV_TYPE_MAX,\n} ndl_nq_dev_t;\n\n#define NEURON_MAX_DEVICES MAX_NEURON_DEVICE_COUNT\n#define NEURON_DEVICE_PREFIX \"/dev/neuron\"\n#define NEURON_DRIVER_LIBRARY_MAJOR 1\n#define NEURON_DRIVER_LIB_MINOR 0\n#define MAX_HBM_PER_DEVICE 4\n\n#define DRIVER_VERSION_MAX_SIZE 32\ntypedef struct ndl_version_info {\n    uint16_t driver_major_version;      // Major version of the driver\n    uint16_t driver_minor_version;      // Minor version of the driver\n    char driver_full_version[DRIVER_VERSION_MAX_SIZE];\n    uint16_t library_major_version;     // Major version of the library\n    uint16_t library_minor_version;     // Minor version of the library\n} ndl_version_info_t;\n\n/** Get version info.\n *\n * @param[out] version       - Buffer to store the version information.\n *\n * @return 0 on success.\n *         -1 on failed to read driver version.\n */\nint ndl_get_version(ndl_version_info_t *version);\n\n/** Gets the range of compatible version\n *\n * @param min_compatible_version_min [out]  - Lowest supported version\n * @param max_compatible_version_max [out]  - Highest supported version\n *\n * @return 0 on success.\n *\n */\nint ndl_get_compatible_version(uint32_t *min_compatible_version, uint32_t *max_compatible_version);\n\ntypedef struct ndl_device_init_param {\n    bool initialize_device; // if set to true, device is initialized as part of open()\n    int num_dram_regions; // splits device DRAMs into given number of regions.\n    bool map_hbm; // if set to true, HBM will be mapped during device open\n} ndl_device_init_param_t;\n\n\n#define NDL_COPY_BUF_SIZE (2ull * 1024 * 1024)\ntypedef struct ndl_copy_buf {\n    uint64_t mem_handle;\n    void *mmap_va;\n    pthread_mutex_t lock;\n} ndl_copy_buf_t;\n\n// Maximum neuron devices supported on a system.\n#define MAX_NEURON_DEVICE_COUNT 64\n\n// Maximum neuron cores per device\n#define MAX_NC_PER_DEVICE 8\n\ntypedef struct ndl_device {\n    uint8_t device_index;                               // Device Index\n    uint8_t device_type;                                // Device Type (V1, V2..)\n    uint16_t device_revision;                           // Revision id of board\n    uint8_t connected_device_count;                     // Number of devices connected to this device\n    uint8_t connected_devices[MAX_NEURON_DEVICE_COUNT]; // Array of devices(IDs) connected to this device\n    uint64_t csr_base[2];                               // BAR0/BAR2 base\n    uint64_t csr_size[2];                               // BAR0/BAR2 size\n    ndl_copy_buf_t cpy_bufs[MAX_NC_PER_DEVICE];         // MMAP buffers for efficiently copying data in/out of the device\n    void *hbm_va[MAX_HBM_PER_DEVICE];                   // HBM virtual addresses\n    size_t hbm_size;                                    // HBM sizes\n    uint32_t hbm_va_cnt;                                // Number of active HBM regions\n    uint32_t shift_hbm_size;                            // Cached number of bits to shift\n    uint64_t hbm_offset[MAX_HBM_PER_DEVICE];            // HBM offsets\n    uint8_t context[];                                  // Library reserved fields\n} ndl_device_t;\n\ntypedef 
struct ndl_device_nc {\n    ndl_device_t *device;\n    uint32_t nc_id;\n} ndl_device_nc_t;\n\ntypedef struct ndl_device_context {\n    int nd_fd;\n} ndl_device_context_t;\n\ntypedef struct ndl_mem_info {\n    ndl_device_t *device;\n    __u64 driver_handle;\n    uint64_t pa;\n    uint64_t mmap_offset;\n    uint64_t size;\n    uint32_t align;\n    void *mmap_va;\n    uint32_t host_memory;\n    int nc_id;\n} ndl_mem_info_t;\n\ntypedef struct ndl_notification_context {\n    union {\n        uint8_t nc_id; // neuron core index\n        uint8_t nq_dev_id; // notification device index\n    };\n    ndl_nq_dev_t nq_dev_type; // notification device type\n    uint8_t nq_type; // type of the notification queue\n    uint8_t engine_index; // engine index\n    uint32_t size; // size of the NQ in bytes\n    int fd; // file descriptor of /dev/ndX/ncY/nqZ\n    uint64_t offset; //mmap offset in the nd\n    uint64_t mem_handle;\n    void *va; // mmapped address\n\n    ndl_mem_info_t *mem_info; // NQ memory info\n} ndl_notification_context_t;\n\n/**\n * Called by app the first time when it accesses the device.\n *\n * @param[in] device_index       - device index that is to be opened\n * @param[in] num_tdram_regions  - number of tdram regions\n * @param[out] device            - device specific information\n *\n * @return 0 on success.\n *         -1 on failure\n */\nint ndl_open_device(int device_index, ndl_device_init_param_t *params, ndl_device_t **device);\n\n/**\n * Called by app when it is done. After this, device cannot be accessed\n *\n * @param[in] device    - Device to close.\n *\n * @return 0 on success.\n *         -1 on failure\n */\nint ndl_close_device(ndl_device_t *device);\n\n/**\n * Get all the device index\n *\n * @param[out] device_indexes       - Buffer to store device indexes.\n * @param[in] device_indexes_size   - Size of the buffer in dwords.\n *\n * @return Number of devices found.\n */\nint ndl_available_devices(int *device_indexes, int device_indexes_size);\n\n/** Read from one or more registers.\n *\n * @param device[in]        - Device handle.\n * @param bar[in]           - BAR to read.\n * @param addresses[in]     - Array of register addresses.\n * @param count[in]         - Number of registers in the array.\n * @param buffer[out]       - Buffer to store read data.\n *\n * @return 0 on success.\n */\nint ndl_bar_read(ndl_device_t *device, uint8_t bar, uint64_t *addresses, uint32_t count, uint32_t *buffer);\n\n/** Write to one or more registers.\n *\n * @param device[in]        - Device handle.\n * @param bar[in]           - BAR to write.\n * @param addresses[in]     - Array of register addresses.\n * @param count[in]         - Number of registers in the array.\n * @param data[in]          - Data to write.\n *\n * @return 0 on success.\n */\nint ndl_bar_write(ndl_device_t *device, uint8_t bar, uint64_t *addresses, uint32_t count, uint32_t *data);\n\n/** Read hw counters from one or more addresses\n *\n * @param device[in]        - Device handle.\n * @param addresses[in]     - Array of register addresses.\n * @param count[in]         - Number of registers in the array.\n * @param buffer[out]       - Buffer to store read data.\n *\n * @return 0 on success.\n */\nint ndl_read_hw_counters(ndl_device_t *device, uint64_t *addresses, uint32_t count, uint32_t *data);\n\n/**\n * Retrieves the cached HBM virtual address for the specified device.\n *\n * @param device[in]        - Device handle.\n * @param hbm_idx[in]       - HBM index.\n * @param va[out]           - Resulting virtual 
address.\n * @param size[out]         - Size of the HBM\n * \n * @return 0 on success, -EINVAL on failure, and -ENOENT when there are no more entries to be found.\n */\nint ndl_get_hbm_va(ndl_device_t *device, int hbm_idx, void **va, size_t *size);\n\n/** Allocates memory.\n *\n * @param device[in]        - Device to be associated with the allocation.\n * @param size[in]          - Number of bytes to allocate.\n * @param host_memory[in]   - If true allocate from host memory instead of using device memory.\n * @param dram_channel[in]  - DRAM channel to use in the device memory.\n * @param dram_region[in]   - DRAM region to use in the device memory.\n * @param nc_id[in]         - NC ID to use in the device\n * @param mem_alloc_type[in]- Type of memory allocation \n * @param mem_handle[out]   - Allocated memory handle would be stored here.\n *\n * @return 0 on success.\n */\nint ndl_memory_alloc(ndl_device_t *device, size_t size, uint64_t align, uint32_t host_memory, uint32_t dram_channel, uint32_t dram_region,\n                        uint32_t nc_id, uint32_t mem_alloc_type, uint64_t *mem_handle);\n\n/** Given a mem handle gets it PA - HACK to be removed\n * @param mem_handle[in]     - Memory handle\n * @parama pa[out]           - Physical address of handle\n *\n * @return the PA\n */\nint ndl_memory_get_pa(uint64_t mem_handle, uint64_t *pa);\n\n/** Map given m memory handle into virtual address space.\n *\n * @param mem_handle[in]     - Handle to map.\n * @param va[out]            - Resulting virtual address.\n *\n * @return 0 on success\n */\nint ndl_memory_map(uint64_t mem_handle, void **va);\n\n/** Unmap given memory handle from virtual address space.\n *\n * @param mem_handle[in]     - Handle to unmap.\n *\n * @return 0 on success\n */\nint ndl_memory_unmap(uint64_t mem_handle);\n\n/** Frees already allocated memory.\n *\n * @param mem_handle[in]   - Memory handle to be freed.\n *\n * @return 0 on success.\n */\nint ndl_memory_free(uint64_t mem_handle);\n\n/** Copy data from buffer to mem_handle.\n *\n * @param mem_handle[in]    - Handle on which data needs to be copied in.\n * @param buffer            - Buffer from which data needs to be copied.\n * @param offset            - Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n *\n * @return 0 on success.\n */\nint ndl_memory_buf_copyin(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size);\n\n/** Copy data from mem_handle to buffer.\n *\n * @param mem_handle[in]    - Handle from which data needs to be copied out.\n * @param buffer            - Buffer to which data needs to be copied.\n * @param offset            - Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n *\n * @return 0 on success.\n */\nint ndl_memory_buf_copyout(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size);\n\n/** Copy data from buffer to mem_handle (zero copy, buffer is pinned and used directly).\n *\n * @param mem_handle[in]    - Handle on which data needs to be copied in.\n * @param buffer            - Buffer from which data needs to be copied.\n * @param offset            - Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n *\n * @return 0 on success.\n */\nint ndl_memory_buf_zerocopyin(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size, int qid, uint32_t bar4_wr_threshold);\n\n/** Copy data from mem_handle to buffer (zero copy, buffer is pinned and used directly).\n *\n * @param mem_handle[in]    - Handle from which data 
needs to be copied out.\n * @param buffer            - Buffer to which data needs to be copied.\n * @param offset            - Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n * @param qid               - H2T queue to use.  NEURON_DMA_H2T_DEFAULT_QID is default\n *\n * @return 0 on success.\n */\nint ndl_memory_buf_zerocopyout(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size, int qid);\n\n/** Batch transfer data between host buffers and device memory.\n *\n * @param mem_handle[in]    - Device memory handle\n * @param ops[in]           - Array of batch operations\n * @param num_ops[in]       - Number of operations in batch\n * @param direction[in]     - Transfer direction (0=write to device, 1=read from device)\n * @param qid[in]           - H2T queue to use (-1 for default)\n *\n * @return 0 on success.\n */\nint ndl_memory_buf_batch_copy(neuron_memcpy_batch_t *batches, uint64_t num_batches, uint32_t direction, int qid);\n\n/** Copy data from buffer to addr in engine.\n *\n * @param device[in]        - Device information.\n * @param nc_id [in]        - Neuron core id.\n * @param dst [in]          - Address on which data needs to be copied in.\n * @param buffer            - Buffer from which data needs to be copied.\n * @param offset            - Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n * @param qid               - H2T queue to use.  NEURON_DMA_H2T_DEFAULT_QID is default\n *\n * @return 0 on success.\n */\nint ndl_program_engine(ndl_device_t *device, uint32_t nc_id, uint64_t dst, void *buffer, uint64_t offset, size_t size);\n\n/** Memset the given memhandle with passed byte value\n *\n * @param src_mem_handle[in]- Handle which needs to be filled with byte value\n * @param offset            - Src Offset in the mem handle.\n * @param value             - Byte value to fill the memory with\n * @param size              - Size in bytes to be copied.\n *\n * @return 0 on success.\n */\nint ndl_memset(const uint64_t addr, uint64_t offset, const int value, const size_t size);\n\n/** Copy data between mem_handles.\n *\n * @param src_mem_handle[in]- Handle from which data needs to be copied out.\n * @param dst_mem_handle[in]- Handle from which data needs to be copied to.\n * @param src_offset        - Src Offset in the mem handle.\n * @param dst_offset        - Dest Offset in the mem handle.\n * @param size              - Size in bytes to be copied.\n *\n * @return 0 on success.\n */\nint ndl_memory_copy(uint64_t src_mem_handle, uint64_t dst_mem_handle, uint64_t src_offset, uint64_t dst_offset,\n                    size_t size);\n\n\n/** Copy data between mem_handles asynchronously.\n *\n * @param src_mem_handle[in]   - Handle from which data needs to be copied out.\n * @param dst_mem_handle[in]   - Handle from which data needs to be copied to.\n * @param src_offset           - Src Offset in the mem handle.\n * @param dst_offset           - Dest Offset in the mem handle.\n * @param size                 - Size in bytes to be copied.\n * @param prefetch_addr        - Host destination address associate with copy out operation to prefetch\n * @param wait_handle [in/out] - wait_handle [in] is for prev request, [out] is handle for this request\n *\n * @return 0 on success.\n */\nint ndl_memory_copy_as(uint64_t src_mem_handle, uint64_t dst_mem_handle, uint64_t src_offset, uint64_t dst_offset, \n                       size_t size, uint64_t prefetch_addr, int * wait_handle);\n\n\n/** Copy data between 
mem_handles.\n *\n * @param mem_handle[in]  - Handle from which data for this tran (either src or dst)\n * @param wait_handle     - wait_handle for an async dma\n *\n * @return 0 on success.\n */\nint ndl_memory_copy_as_wait(uint64_t mem_handle, int wait_handle);\n\n/** Set the dma engine state\n *\n * @param device_index[in]  - Device index.\n * @param eng_id[in]        - Eng ID that is initialized.\n * @param state[in]         - State that is set UDMA_NORMAL/UDMA_DISABLE etc\n *\n * @return 0 on success.\n */\nint ndl_dma_eng_set_state(int device_index, uint32_t eng_id, uint32_t state);\n\n/** Get the dma engine state\n *\n * @param device_index[in]  - Device index.\n * @param eng_id[in]        - Engine index which status needs to be collected.\n * @param state[out]        - Buffer to store engine state.\n *\n * @return 0 on success.\n */\nint ndl_dma_eng_get_state(int device_index, uint32_t eng_id, struct neuron_dma_eng_state *state);\n\n/** Get DMA queue state\n *\n * @param device_index[in]  - Device index.\n * @param eng_id [in]       - DMA engine index.\n * @param qid [in]          - DMA queue index.\n * @param tx [out]          - Tx queue state.\n * @param rx [out]          - Rx queue state.\n *\n * @return 0 on success.\n */\nint ndl_dma_queue_get_state(int device_index, uint8_t eng_id, uint8_t qid, struct neuron_dma_queue_state *tx, struct neuron_dma_queue_state *rx);\n\n/** Copy DMA descriptors to userspace.\n *\n *  This API needs root privilege.\n *\n * @param device_index[in]  - Device index.\n * @param eng_id [in]       - DMA engine index.\n * @param qid [in]          - DMA queue index.\n * @param type [in]         - Type of the queue.\n * @param index [in]        - Start descriptor index.\n * @param count [in]        - Number of descriptor needs to be copied.\n * @param buffer [out]      - Buffer to store the descriptors.\n *\n * @return 0 on success.\n */\nint ndl_dma_descriptor_copyout(int device_index, uint8_t eng_id, uint8_t qid, enum neuron_dma_queue_type type, uint32_t start_index, uint32_t count, void *buffer);\n\n/** Initialize the dma queue for a given engine\n *\n * @param device_index[in]  - Device index\n * @param eng_id[in]        - Engine for which the queue is initialized\n * @param qid[in]           - Queue id that needs to be initialized\n * @param tx_desc_count[in] - number of tx desc's need to be allocated\n * @param rx_desc_count[in] - number of rx desc's need to be allocated\n * @param tx_handle[in]     - TX mem handle\n * @param rx_handle[in]     - RX mem handle\n * @param rxc_handle[in]    - Completion mem handle\n *\n * @return 0 on success.\n */\nint ndl_dma_queue_init(int device_index, uint32_t eng_id, uint32_t qid, uint32_t tx_desc_count, uint32_t rx_desc_count,\n                       uint64_t tx_handle, uint64_t rx_handle, uint64_t rxc_handle, uint32_t axi_port);\n\nstruct ndl_queue_init {\n    __u32 eng_id; // [in] DMA engine index\n    __u32 qid; // [in] Queue index in the DMA engine\n    __u32 tx_desc_count; // [in] number of tx desc's need to be allocated\n    __u32 rx_desc_count; // [in] number of rx desc's need to be allocated\n    __u64 tx_handle; // [in] mem handle for the tx ring\n    __u64 rx_handle; // [in] mem handle for the rx ring\n    __u64 rxc_handle; // [in] mem handle for the rxc ring\n    __u32 axi_port; // [in] axi port\n};\n\n#define MAX_NDL_QUEUE_INIT_BATCH 256\nstruct ndl_queue_init_batch {\n    __u32 count;\n    struct ndl_queue_init entries[MAX_NDL_QUEUE_INIT_BATCH];\n};\n\n/** Initialize a batch of dma queues\n *\n * 
@param device_index[in]  - Device index\n * @param batch[in]         - Batch of dma queue initialization requests\n *\n * @return 0 on success.\n */\nint ndl_dma_queue_init_batch(int device_idx, struct ndl_queue_init_batch *batch);\n\n/** Release the dma queue for a given engine - only used in tests\n *\n * @param device_index[in]  - Device index\n * @param eng_id[in]        - Engine for which the queue is initialized\n * @param qid[in]           - Queue id that needs to be initialized\n *\n * @return 0 on success.\n */\nint ndl_dma_queue_release(int device_index, uint32_t eng_id, uint32_t qid);\n\n/** Starts DMA by copying the given number of descriptors or prefetch s2m\n *\n * @param device_index[in]  - Device index\n * @param eng_id[in]        - Engine for which the queue is initialized\n * @param qid[in]           - Queue id that needs to be initialized\n * @param tx_desc_count[in] - number of tx desc's need to be copied, could be 0 if called for s2m prefetch\n * @param rx_desc_count[in] - number of rx desc's need to be copied\n *\n * @return 0 on success.\n */\nint ndl_dma_queue_copy_start(int device_index, uint32_t eng_id, uint32_t qid, uint32_t tx_desc_count, uint32_t rx_desc_count);\n\n/** Acks the completed desc count for the eng/queue - only used in tests\n *\n * @param device_index[in]  - Device index\n * @param eng_id[in]        - Engine for which the queue is initialized\n * @param qid[in]           - Queue id that needs to be initialized\n * @param count[in]         - Number of desc's to ack\n *\n * @return 0 on success.\n */\nint ndl_dma_ack_completed_desc(int device_index, uint32_t eng_id, uint32_t qid, uint32_t count);\n\n/** Copy data from buffer to mem_handle. Buffer has dma desc\n *\n * @param mem_handle[in]        - Handle on which data needs to be copied in.\n * @param buffer[in]            - Buffer from which data needs to be copied. 
Buffer has dma desc\n * @param offset[in]            - Offset in the mem handle.\n * @param num_descs[in]         - Number of descriptors to copy\n * @param queue_type[in]        - From which queue copy descriptors.\n *\n * @return 0 on success.\n */\nint ndl_dma_copy_descriptors(uint64_t mem_handle, void *buffer, uint64_t offset, uint32_t num_descs, enum neuron_dma_queue_type queue_type);\n\n/** Reset given NCs within a device.\n *\n * @param device_index[in]  - Device to reset.\n * @param nc_map[in]        - NCs to reset (-1 to reset entire device)\n * @param request_id[out]   - ID for this reset request\n *\n * @return 0 on success.\n */\nint ndl_reset_ncs(int device_index, int nc_map, uint32_t *request_id);\n\n/** Register the callback to NRT to warn/nudge users when hitting soft incompatibility\n *\n * @param callback  - the call back function\n * @return int - 0 on success, otherwise on failure\n */\nint ndl_register_soft_incompat_callback(void (*callback)(const char *));\n\n/** Waits for readiness of given NCs within a device.\n *\n * @param device_index[in]  - Device index.\n * @param request_id[in]    - ID for the reset request to wait on\n * @param result[out]       - Buffer to store the result.\n *                            If the device is ready then this would be set to 1.\n *\n * @return 0 on success.\n *\n */\nint ndl_ready_ncs(int device_index, uint32_t request_id, uint8_t *result);\n\n/** Get info on all the apps that are currently using the device, caller needs to free returned info (*info)\n *\n * @param device_index[in]  - Device index.\n * @param info[out] - Pointer to a pointer which will hold app data, needs to be deallocated by caller\n * @param size[out] - Number of entries in neuron_app_info\n *\n * @return 0   - on success\n */\nint ndl_get_all_apps_info(ndl_device_t *device, struct neuron_app_info **info, size_t *count, uint16_t apps_info_flags);\n\n/** Increment a semaphore in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param semaphore_index[in]   - Semaphore which needs to be incremented.\n * @param value[in]             - Value to decrement.\n *\n * @return 0 on success\n */\nint ndl_nc_semaphore_increment(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t value);\n\n/** Decrement a semaphore in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param semaphore_index[in]   - Semaphore which needs to be decremented.\n * @param value[in]             - Value to increment.\n *\n * @return 0 on success\n */\nint ndl_nc_semaphore_decrement(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t value);\n\n/** Get semaphore value in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param semaphore_index[in]   - Semaphore index.\n * @param value[out]            - Buffer where read value would be stored.\n *\n * @return 0 on success\n */\nint ndl_nc_semaphore_read(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t *value);\n\n\n/** Write given value into the semaphore in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param semaphore_index[in]   - Semaphore index.\n * @param value[in]             - Value to write.\n *\n * @return 0 on success\n */\nint ndl_nc_semaphore_write(ndl_device_t *device, int nc_index, uint32_t semaphore_index, 
uint32_t value);\n\n\n/** Get event value in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param semaphore_index[in]   - Semaphore index.\n * @param value[out]            - Buffer where read value would be stored.\n *\n * @return 0 on success\n */\nint ndl_nc_event_get(ndl_device_t *device, int nc_index, uint32_t event_index, uint32_t *value);\n\n\n/** Set a event in Neuron Core.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron Core index\n * @param event_index[in]       - Event index.\n * @param value[in]             - Value to write.\n *\n * @return 0 on success\n */\nint ndl_nc_event_set(ndl_device_t *device, int nc_index, uint32_t event_index, uint32_t value);\n\n\n/** Configure notification queue\n *\n * Neuron device has multiple of neuron cores and TOP_SPs. If nq_dev_type is\n * NQ_DEV_TYPE_NEURON_CORE, nq_dev_index conveys neuron core index. In case of\n * NQ_DEV_TYPE_NEURON_TOPSP, nq_dev_index means TOP_SP index.\n *\n * @param device[in]            - Device\n * @param nq_dev_id[in]         - Notification device index\n * @param nq_dev_type[in]       - Notification device type\n * @param nq_type[in]           - Notification queue type\n * @param engine_index[in]      - Engine index\n * @param size[in]              - Size in bytes\n * @param on_host_memory[in]    - If true, NQ is created on host memory\n * @param dram_channel          - If NQ is created on device, DRAM channel to use\n * @param dram_region           - If NQ is created on device, DRAM region to use\n * @param force_alloc_mem       - If true, force allocate new memory (and delete already allocated memory, if any)\n * @param context[out]          - Resulting NQ context.\n *\n * @return 0 on success.\n */\nint ndl_notification_init(ndl_device_t *device, int nq_dev_id, ndl_nq_dev_t nq_dev_type, uint8_t nq_type, uint8_t engine_index,\n                          uint32_t size, bool on_host_memory, uint32_t dram_channel, uint32_t dram_region,\n                          uint64_t *notification_context);\n\n/** Configure notification queue with option to force re-allocate/re-size\n *\n * Neuron device has multiple of neuron cores and TOP_SPs. If nq_dev_type is\n * NQ_DEV_TYPE_NEURON_CORE, nq_dev_index conveys neuron core index. 
In case of\n * NQ_DEV_TYPE_NEURON_TOPSP, nq_dev_index means TOP_SP index.\n *\n * @param device[in]            - Device\n * @param nq_dev_id[in]         - Notification device index\n * @param nq_dev_type[in]       - Notification device type\n * @param nq_type[in]           - Notification queue type\n * @param engine_index[in]      - Engine index\n * @param size[in]              - Size in bytes\n * @param on_host_memory[in]    - If true, NQ is created on host memory\n * @param dram_channel          - If NQ is created on device, DRAM channel to use\n * @param dram_region           - If NQ is created on device, DRAM region to use\n * @param force_alloc_mem       - If true, force allocate new memory (and delete already allocated memory, if any)\n * @param context[out]          - Resulting NQ context.\n *\n * @return 0 on success.\n */\nint ndl_notification_init_with_realloc(ndl_device_t *device, int nq_dev_id, ndl_nq_dev_t nq_dev_type, uint8_t nq_type, uint8_t engine_index,\n                          uint32_t size, bool on_host_memory, uint32_t dram_channel, uint32_t dram_region, bool force_alloc_mem,\n                          uint64_t *notification_context);\n\n/** Returns mem_handle associated with the NQ\n *\n * @param notification_context[in]  - Notification context\n * @param mem_handle[out]           - Notification's memory handle would be stored here.\n *\n * @return 0 on success, 1 on failure\n */\nint ndl_notification_get_mem_handle(uint64_t notification_context, uint64_t *mem_handle);\n\n/** Returns size associated with the NQ\n *\n * @param notification_context[in]  - Notification context\n * @param size[out]           - Notification's size would be stored here.\n *\n * @return 0 on success, 1 on failure\n */\n\nint ndl_notification_get_size(uint64_t notification_context, uint32_t *size);\n\n/** Maps NQ to virtual address.\n *\n * @param notification_context[in]  - Notification context.\n * @param va [out]                  - Virtual address where the mapping is done.\n * @return 0 on success\n */\nint ndl_notification_map(uint64_t notification_context, void **va);\n\n/** Stops and destroys already configured notification queue.\n *\n * @param notification_context[in] - Notification context.\n *\n * @return 0 on success.\n */\nint ndl_notification_destroy(uint64_t notification_context);\n\n/** Makes neuron ds available for use and returns a valid pointer in **data and a valid size in *size\n *\n * @param device[in]            - Device\n * @param pid[in]               - PID for this NDS (if 0 it allocates a new one)\n * @param data[out]             - Will contain a valid pointer to the datastore\n * @param size[out]             - Will contain a valid size for the datastore\n *\n * @return 0 on success.\n */\nint ndl_nds_open(ndl_device_t *device, int32_t pid, void **data, size_t *size);\n\n/** Decreases ref count for the given pid\n *\n * @param device                - Device\n * @param pid                   - PID owning the datastore\n * @param data                  - Pointer to datastore raw data (returned by ndl_nds_open)\n * @param size                  - Size of datastore (returned by ndl_nds_open)\n *\n * @return 0 on success.\n */\nint ndl_nds_close(ndl_device_t *device, int32_t pid, void *data, size_t size);\n\n/** Enter inference critical section.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron core index\n * @param uuid[in]              - UUID of the model expected to be loaded\n *\n * This function would fail if the UUID is different 
or PID\n * which loaded the UUID is different.\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_reader_enter(ndl_device_t *device, int nc_index, struct neuron_uuid uuid);\n\n/** Exit inference critical section.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron core index\n * @param uuid[in]              - UUID of the model expected to be loaded\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_reader_exit(ndl_device_t *device, int nc_index, struct neuron_uuid uuid);\n\n/** Enter model load critical section.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron core index\n * @param uuid[in]              - UUID of the model to be loaded\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_writer_enter(ndl_device_t *device, int nc_index, struct neuron_uuid uuid);\n\n/** Exit model load critical section and enter inference critical section.\n *\n * @param device[in]            - Device\n * @param nc_index[in]          - Neuron core index\n * @param uuid[in]              - UUID of the loaded model\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_writer_downgrade(ndl_device_t *device, int nc_index, struct neuron_uuid uuid);\n\n/** Find given number of free NCs and mark them as used.\n *\n * @param nc_count[in]          - Number of free neuron cores needed.\n * @param start_nc[in]          - From where to start the free core search.\n * @param end_nc[in]            - Last NC where to stop the free core search.\n * @param max_nc_available[out] - Maximum number of free cores available.\n * @param bitmap[out]           - Bitmap of marked neuron core indexes.\n * @param size[in]              - size of the bitmap in bytes\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_nc_range_mark(uint32_t nc_count, uint32_t start_nc, uint32_t end_nc,\n                           uint32_t *max_nc_available, uint64_t *bitmap, size_t size);\n\n/** Unmark neuron cores as free.\n *\n * @param bitmap[in]           - Bitmap of marked neuron core indexes.\n * @param size[in]             - size of the bitmap in bytes\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_crwl_nc_range_unmark(uint64_t *bitmap, size_t size);\n\n/** Gets the info for the copy buffer for copying data to/from device\n *\n * To dma data in and out of the device, app needs a host dram buffer allocated\n * by the driver. Allocating this every-time is expensive especially if we want\n * a bigger copy size. To avoid this performance penalty, applications can use\n * this preallocated buffer.\n *\n * @param device[in]    - Device\n * @param nc_id[in]     - nc id the copy buffer is from\n * @param cpy_buf[out]  - Pointer to copy buffer\n *\n * @return 0 on success\n */\nint ndl_get_copy_buf(ndl_device_t *device, uint32_t nc_id, ndl_copy_buf_t **cpy_buf);\n\n/** Set the neuron core init state\n * Initially the state is set to started and then app intializes the neuron core. Then\n * it sets the state to completed. If any other app tries to set the state to started when it\n * is already started then this routine will block until the init is done or timeout\n *\n * @param device[in]            - Device\n * @param state[in]             - State that will be state\n * @param new_state[out]        - State after the set is done\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_nc_init_set_state(ndl_device_t *device, uint32_t nc_id, uint32_t state, uint32_t *new_state);\n\n/** Gets the state of model start. 
If this is the first model that will be loaded in the nc.\n *\n * @param device[in]            - Device\n * @param nc_id[in]             - nc id\n * @param started_count[out]    - number of times model started in that nc\n *\n * @return 0 on success, -1 on failure.\n */\nint ndl_nc_model_started_count(ndl_device_t *device, uint32_t nc_id, uint64_t *started_count);\n\n/** Gets the architecture & revision of the board\n *\n * @param architecture[out]          - Architecture of the board\n * @param revision[out]              - Revision of the board\n *\n * @return 0 on success\n */\nint ndl_get_board_info(uint32_t *architecture, uint32_t *revision);\n\n/** Gets BDF for a device - only for devices opened by the calling process - DEPRECATED don't use\n *\n * @param bus_num[out]               - Bus number for this device\n * @param pci_slot[out]              - PCI slot for this device\n * @param dev_func[out]              - Device function for this device\n *\n * @return 0 on success\n */\nint ndl_get_device_bdf(int device_index, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func);\n\n/**\n * @brief Get the anonymous file-descriptor of dma-buf associated with\n * a Neuron device memory region if it was registered for EFA peer direct\n *\n * @param addr[in]        - Device buffer virtual address\n * @param size[in]        - Device buffer size (in bytes)\n * @param fd[out]         - dma-buf fd\n *\n * @return 0 on success\n */\nint ndl_get_dmabuf_fd(uint64_t addr, uint64_t size, int* fd);\n\n/** Gets BDF for a device\n *\n * @param device_index[in]           - Neuron device index\n * @param domain[out]                - PCIe domain for the device\n * @param bus_num[out]               - Bus number for the device\n * @param pci_slot[out]              - PCI slot for the device\n * @param dev_func[out]              - Device function for the device\n *\n * @return 0 on success\n */\nint ndl_get_device_bdf_ext(int device_index, uint32_t *domain, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func);\n\n/** retrieve offset/size where to mmap around a physical address\n * \n * @param device[in]             - Neuron device\n * @param pa[in]                 - physical address in device mem to retrieve mc mmap info for\n * @param mmap_offset[out]       - mmap offset\n * @param mem_handle[out]        - The handle for the given physical address.\n *                                 Set to 0 when using backwards compatible interface with old drivers.\n * @param size[out]              - size\n *\n */\nint ndl_mem_get_mc_mmap_info(ndl_device_t *device, uint64_t pa, uint64_t *mmap_offset, uint64_t *size, uint64_t *mem_handle);\n\n/** mmap a bar region into user address\n *\n * @param device[in]          - Neuron device\n * @param block[in]           - block type containing the resource\n * @param block_id[in]        - id of the block if is more than one block\n * @param resource[in]        - resource the caller wants to mmap\n * @param va[out]             - virual address of the resource\n * @param size[out]           - size of the resource\n *\n */\nint ndl_mmap_bar_region( ndl_device_t *device, enum neuron_dm_block_type block, uint32_t block_id, enum neuron_dm_resource_type resource, \n                        void ** va, uint64_t * size);\n\n/** Close all cached FDs\n *\n */\nvoid ndl_device_cached_fd_close_all(void);\n\n/** Log an error message to kernel messages/serial console\n * \n * @param str[in]             - The error message\n * @param size[in]            - The size of the error message including 
null terminator\n * @param action[in]          - Additional action to perform\n * \n * @return On success: 0\n *         On failure: -1 and:\n *           * errno == EFAULT when size is too large\n *           * errno == EBADMSG when str is not null terminated\n */\nint ndl_printk(char *str, uint32_t size, uint32_t action);\n\n/** get the host device id for an open device (for containers)\n * \n * @param device[in]           - Neuron device\n * @param host_device_id[out]  - host device id \n * \n */\nint ndl_get_host_device_id(ndl_device_t *device, uint32_t *host_device_id);\n\n/** return device id to routing id mapping table along with number of entries in the table\n *\n * @param count[in/out]             - [in] size of map in entries.  [out] # entries returned\n * @param host_did_to_rid_map[out]  - map of host device id to routing ids \n *\n */\nint ndl_get_host_device_id_to_rid_map(uint32_t *count, uint32_t *host_did_to_rid_map);\n\nint ndl_dump_device_allocation_info(ndl_device_t *device, uint32_t hbm_index, struct neuron_ioctl_mem_chunk_info *data, uint32_t *num_entries);\n\n/** ask the driver to dump neuron core process info \n *\n * @param nc_id[in]             - [in] neuron core to dump process info for\n * @param filter_log_owner[in]  - [in] only dump log entries for the owner pid of the neuron core\n * @param log_dump_limit[in]    - [in] max number of log entries to dump\n *\n */\nint ndl_dump_nc_pid_info(uint32_t nc_id, bool filter_log_owner, uint32_t log_dump_limit);\n\n/** write a value to entire HBM accessible to Neuron (so excludes firmware carveout)\n *\n *  @param hbm_index     - HBM to write to\n *  @param init_val      - value to write\n */\nint ndl_hbm_scrub_start(ndl_device_t *device, uint32_t nc_id, uint32_t hbm_index, uint32_t axi_port, uint32_t init_val);\nint ndl_hbm_scrub_wait(ndl_device_t *device, uint32_t nc_id, uint32_t hbm_index);\n\n/** Gets the tpb mapping.\n *\n * @param map[out]              - Location to store the mapping information\n * @param max_num_entries[in]   - Maximum number of entries we can store in `map`\n * @param mapping_version[in]   - Flavor of mapping to get from the driver\n *\n * @return 0 on success\n */\nint ndl_get_logical_to_physical_nc_map(struct neuron_ioctl_nc_map *map, uint32_t max_num_entries, enum neuron_ioctl_nc_mapping_type mapping_version);\n\n/** return pod information\n *\n * @param pod_type[out] - type of pod\n * @param pod_sz[out]   - size of the pod\n *\n */\nint ndl_pod_info(uint32_t * pod_type, uint32_t * pod_sz);\n\n/** return pod election state\n *\n * @param state[out] - election state\n *\n */\nint ndl_pod_election_state(uint32_t * state);\n\n/** return pod mapping information.\n *\n * @param node_id[out]          - node id of the pod node.  -1 if the node is not part of a configured pod\n *\n */\nint ndl_pod_mapping_info(int * node_id);\n\n\n/** return pod status\n *\n * @param pod_id[out]           - pod id.  Only valid it the pod is configured as a pod\n * @param state[out]            - state of the pod election \n * @param pod_type[out]         - type of pod \n * @param pod_sz[out]           - size of the pod.  0 if the node is not part of a pod\n * @param node_id[out]          - node id of the pod node.  
-1 if the node is not part of a configured pod\n * @param mode[out]             - current operating mode\n * @param modes_supported[out]  - supported operating modes\n *\n */\nint ndl_pod_status(uint8_t *pod_id, uint32_t *state, uint32_t *pod_type, uint32_t *pod_sz, int *node_id, \n                   enum neuron_ultraserver_mode *mode, uint32_t *modes_supported);\n\n\n/** control pod election state\n *\n * @param ctrl[in]           - control request.  (enum neuron_pod_ctrl_req)\n * @param mode[in]           - requested operating mode\n * @param timeout[in]        - timeout for control operation\n * @param state[out]         - state of the pod election \n *\n */\nint ndl_pod_ctrl(uint32_t ctrl, enum neuron_ultraserver_mode mode, uint32_t timeout, uint32_t *state);\n\nint ndl_alloc_contiguous_scratchpad(ndl_device_t *device, uint64_t size, uint32_t hbm_index, uint32_t nc_id, uint64_t *mem_handle);\n\n/** Similar to ndl_memory_map - only difference is that a contiguous scratchpad var may span multiple contiguous memchunks. So size of memory mapping is different from just the size of the first contiguous memchunk.\n *\n * @param mem_handle[in]     - Handle to map.\n * @param va[out]            - Resulting virtual address.\n * @param size[in]           - Size to map \n *\n * @return 0 on success\n */\n\nint ndl_memory_map_contiguous_scratchpad(uint64_t mem_handle, void **va, uint64_t size);\n\n/** Set performance profile\n *\n * @param device[in]        - Device handle.\n * @param profile[in]       - Performance profile to set.\n *\n * @return 0 on success.\n */\nint ndl_set_performance_profile(ndl_device_t *device, uint32_t profile);\n\n/** Enable or disable throttling notifications\n *\n * @param device[in]        - Device handle.\n * @param enable[in]        - true to enable, false to disable.\n *\n * @return 0 on success.\n */\nint ndl_enable_throttling_notifications(ndl_device_t *device, bool enable);\n\nbool ndl_feature_supported(int nd_fd, uint64_t feature);\n\n/** dynamically allocate h2t queues (rings)\n *\n * @param device[in]                - Neuron device\n * @param nc_id[in]                 - neuron core to allocate h2t queues for\n * @param copy_queue_cnt[in]        - number of h2t copy queues to allocate\n * @param service_queue_cnt[in]     - number of service queues to allocate\n * @param copy_queue_bmap[out]      - bitmap of the allocated copy queues\n * @param servic_equeue_bmap[out]   - bitmap of the allocated service queues\n * @param copy_default_queue[out]   - default h2t copy queue\n *\n */\nint ndl_h2t_dma_queue_alloc(ndl_device_t *device,  uint32_t nc_id, uint32_t copy_queue_cnt, uint32_t service_queue_cnt,\n                            uint32_t *copy_queue_bmap, uint32_t *service_queue_bmap, uint32_t *copy_default_queue);\n\n/** free dynamically allocated h2t queues\n *\n * @param device[in]           - Neuron device\n * @param nc_id[in]            - [in] neuron core to free queues for\n * @param queue_bmap[in]       - [in] bitmap of queues to free\n *\n */\nint ndl_h2t_dma_queue_free(ndl_device_t *device,  uint32_t nc_id, uint32_t queue_bmap);\n\n/** control metrics posting behavior\n *\n * @param device[in]            - Neuron device\n * @param mode[in]              - how to modify posting behavior (enable or disable periodic posting)\n */\nint ndl_metrics_ctrl(ndl_device_t *device, enum neuron_metrics_mode mode);\n\n/** get Neuron device and HBM index pointed by VA\n *\n * @param va[in]                - VA of Neuron memory\n * @param device_index[out]     - Neuron 
device\n * @param hbm_index[out]        - HBM index\n */\nint ndl_get_va_placement(const void *va, int *device_index, int *hbm_index);\n\n/**\n * arbitrary size bitmap support\n *\n */\n#define NBM_NR_BITS(t)     (sizeof(t)*8)\n#define NBM_NR_ENT(nr,t)   (((nr)+NBM_NR_BITS(t)-1) / NBM_NR_BITS(t))\n\nstatic inline uint32_t nbitmap_test_bit(uint32_t nr, uint64_t *addr)\n{\n    return (addr[nr/NBM_NR_BITS(*addr)] & (1ull << (nr % NBM_NR_BITS(*addr)))) != 0ull;\n}\n\nstatic inline void nbitmap_set_bit(uint32_t nr, uint64_t *addr)\n{\n    addr[nr/NBM_NR_BITS(*addr)] |= (1ull << (nr % NBM_NR_BITS(*addr)));\n}\n\nstatic inline uint32_t nbitmap_ffs1(uint32_t nr, uint64_t *addr)\n{\n    int i;\n    for (i=0; i < NBM_NR_ENT(nr, *addr); i++) {\n        uint32_t x = __builtin_ffsl(addr[i]);\n        if (x) \n            return i * NBM_NR_BITS(*addr) + x;\n    }\n    return 0;\n}\n\nstatic inline uint32_t nbitmap_popcount(uint32_t nr, uint64_t *addr)\n{\n    int i;\n    uint32_t cnt = 0;\n    for (i=0; i < NBM_NR_ENT(nr, *addr); i++) {\n        cnt += __builtin_popcountll(addr[i]);\n    }\n    return cnt;\n}\nstatic inline void nbitmap_clr_bit(uint32_t nr, uint64_t *addr)\n{\n    addr[nr/NBM_NR_BITS(*addr)] &= ~(1ull << (nr % NBM_NR_BITS(*addr)));\n}\n\n#ifdef __cplusplus\n}\n#endif\n"
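/*
 * Illustrative usage sketch (not part of the original ndl.h): how the CRWL
 * critical-section calls and the arbitrary-size bitmap helpers above appear
 * intended to compose, based only on their doc comments. The ndl_device_t
 * handle is assumed to come from the ndl device-open API earlier in ndl.h,
 * the UUID from the model being loaded; core counts are placeholders.
 */
#include <stdint.h>
#include <ndl/ndl.h>

static int example_load_then_infer(ndl_device_t *dev, int nc_index,
                                   struct neuron_uuid uuid)
{
    /* Take the model-load (writer) critical section for this core. */
    if (ndl_crwl_writer_enter(dev, nc_index, uuid) != 0)
        return -1;

    /* ... load the model onto the core here ... */

    /* Leave the load section and enter the inference (reader) section. */
    if (ndl_crwl_writer_downgrade(dev, nc_index, uuid) != 0)
        return -1;

    /* ... run inference here ... */

    /* Leave the inference critical section; later inferences would bracket
     * themselves with ndl_crwl_reader_enter()/ndl_crwl_reader_exit(). */
    return ndl_crwl_reader_exit(dev, nc_index, uuid);
}

static void example_reserve_free_cores(void)
{
    /* Bitmap sized with the arbitrary-size bitmap macros defined above. */
    uint64_t bitmap[NBM_NR_ENT(64, uint64_t)] = { 0 };
    uint32_t max_nc_available = 0;

    /* Try to reserve two free cores out of cores 0..15. */
    if (ndl_crwl_nc_range_mark(2, 0, 15, &max_nc_available,
                               bitmap, sizeof(bitmap)) == 0) {
        uint32_t marked = nbitmap_popcount(64, bitmap); /* cores actually marked */
        (void)marked;
        ndl_crwl_nc_range_unmark(bitmap, sizeof(bitmap)); /* release them */
    }
}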
  },
  {
    "path": "src/libnrt/include/ndl/neuron_driver_shared.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#ifndef NEURON_DRIVER_SHARED_H\n#define NEURON_DRIVER_SHARED_H\n\n#include <linux/types.h>\n#include \"neuron_driver_shared_tensor_batch_op.h\"\n\nenum neuron_driver_feature_flag {\n\tNEURON_DRIVER_FEATURE_DMABUF = 1ull <<  0, \n\tNEURON_DRIVER_FEATURE_ASYNC_DMA = 1ull <<  1, \n\tNEURON_DRIVER_FEATURE_BATCH_DMAQ_INIT = 1ull <<  2, \n\tNEURON_DRIVER_FEATURE_BIG_CORE_MAPS   = 1ull <<  3, \n\tNEURON_DRIVER_FEATURE_MEM_ALLOC_TYPE  = 1ull <<  4,\n\tNEURON_DRIVER_FEATURE_HBM_SCRUB = 1ull << 5,\n\tNEURON_DRIVER_FEATURE_MEM_ALLOC64 = 1ull << 6,\n\tNEURON_DRIVER_FEATURE_CONTIGUOUS_SCRATCHPAD = 1ull << 7,\n\tNEURON_DRIVER_FEATURE_ZEROCOPY = 1ull << 8,\n};\n\n// FIXME  this should be more generic - like node type.\nenum {\n\tNEURON_POD_TYPE_NONE = 0,\n\tNEURON_POD_TYPE_P2P,\n\tNEURON_POD_TYPE_SWITCH\n};\n\nenum {\n\tNEURON_POD_E_STATE_NOT_STARTED= 0,\n\tNEURON_POD_E_STATE_IN_PROGRESS,\n\tNEURON_POD_E_STATE_ULTRASERVER,\n\tNEURON_POD_E_STATE_FAILED,\t\t\t\t// TODO we currently don't discriminate between failed and single node (todo for diagnostic/debug purposes)\n\tNEURON_POD_E_STATE_SINGLE_NODE,\n};\n\nenum neuron_pod_ctrl_req {\n\tNEURON_NPE_POD_CTRL_REQ_POD = 0,  \t\t // request pod state to pod (on-demand election request)\n\tNEURON_NPE_POD_CTRL_REQ_SINGLE_NODE = 1, // request pod state to single node\n\tNEURON_NPE_POD_CTRL_REQ_KILL = 2,\t\t // request to kill the election\n\tNEURON_NPE_POD_CTRL_SET_MODE = 3,\t\t // request to ultraserver mode\n};\n\nenum neuron_ultraserver_mode {\n\tNEURON_ULTRASERVER_MODE_UNSET = 0,  \t // no configuration set\n\tNEURON_ULTRASERVER_MODE_X4 = 1,  \t\t // 4 node US configuration\n\tNEURON_ULTRASERVER_MODE_X2H = 2,  \t\t // 2 node US configuration using horizontal links \n\tNEURON_ULTRASERVER_MODE_X2V = 3,  \t\t // 2 node US configuration using vertical links \n\tNEURON_ULTRASERVER_MODE_X1 = 4,  \t\t // 1 node US configuration (standalone)\n};\n\nenum neuron_metrics_mode {\n    NEURON_METRICS_MODE_PERIODIC_ENABLE = 0,    // enable periodic posting\n    NEURON_METRICS_MODE_PERIODIC_DISABLE = 1,   // disable periodic posting\n};\n\n#define NEURON_NC_MAP_DEVICE (0xffffffff)\n\nenum neuron_dma_queue_type {\n\tNEURON_DMA_QUEUE_TYPE_TX = 0, // transmit queue\n\tNEURON_DMA_QUEUE_TYPE_RX, // receive queue\n\tNEURON_DMA_QUEUE_TYPE_COMPLETION, // completion queue\n};\n\nenum neuron_cinit_state {\n\tNEURON_CINIT_STATE_STARTED = 1, // Core Init is initiated\n\tNEURON_CINIT_STATE_COMPLETED, // Core Init is completed successfully\n\tNEURON_CINIT_STATE_INVALID // Core Init is not valid\n};\n\n\nstruct neuron_dma_eng_state {\n\t__u32 revision_id; // revision id\n\t__u32 max_queues; // maximum queues supported\n\t__u32 num_queues; // number of queues configured\n\t__u32 tx_state; // Tx statue\n\t__u32 rx_state; // Rx state\n};\n\nstruct neuron_dma_queue_state {\n\t__u32 hw_status; // hardware status\n\t__u32 sw_status; // software status\n\t__u64 base_addr; // base address of the queue\n\t__u32 length; // size of the queue\n\t__u32 head_pointer; // hardware pointer index\n\t__u32 tail_pointer; // software pointer index\n\t__u64 completion_base_addr; // completion queue base address\n\t__u32 completion_head; // completion head\n};\n\nenum neuron_dma_h2t_ctx_handle_type {\n\tNEURON_DMA_H2T_CTX_HANDLE_NONE   = -1,  // no handle - used as prev handle to start an async dma\n\tNEURON_DMA_H2T_CTX_HANDLE_SYNC   =  0,  // handle for doing synchronous DMA\n\tNEURON_DMA_H2T_CTX_HANDLE_ASYNC1 
=  1,  // first of two async handles\n\tNEURON_DMA_H2T_CTX_HANDLE_ASYNC2 =  2,  // second of two async handles\n\tNEURON_DMA_H2T_CTX_HANDLE_CNT    =  3   // number of dma \n};\n\n/*\n * H2T DMA Default Queue id\n */\n#define NEURON_DMA_H2T_DEFAULT_QID (-1)\n\n/*\n * NOTE: In runtime version 5, this enum was passed in as a bool instead -\n * true if top_sp and false if NC. Match the enum values to the bool to\n * maintain compatibility with older runtime. Do not change these values\n * until the min compatibility version is updated to >=6.\n */\nenum NQ_DEVICE_TYPE {\n    NQ_DEVICE_TYPE_NEURON_CORE = 0,\n    NQ_DEVICE_TYPE_TOPSP,\n    NQ_DEVICE_TYPE_MAX\n};\n\nenum NQ_TYPE {\n\tNQ_TYPE_TRACE = 0, /**< Implicit notifications generated during execution. */\n\tNQ_TYPE_NOTIFY, /**< Explicit notifications generated by NOTIFY instruction */\n\tNQ_TYPE_EVENT, /**< Notifications triggered by event set/clear operations. */\n\tNQ_TYPE_ERROR, /**< Notifications triggered by an error condition. */\n\tNQ_TYPE_TRACE_DMA, /**< Implicit notifications generated by DMA transfers.*/\n\tNQ_TYPE_THROTTLE,   /**< Notifications triggered by throttling activity. */\n\tNQ_TYPE_MAX\n};\n\n/**\n * memory mapping enums for selecting what bar0 resources to map.\n * Bar0 mmapping is restricted to a limited set of regions.\n * Resources are selected by block type, block id and resource within the block.\n * TPB 1 State buffer, for example - where type is TPB, block id is 1 and \n * resource is state buffer.\n * NEURON_DM_RESOURCE_ALL resource mapping is restricted to read only.\n *\n */\nenum neuron_dm_block_type {\n   NEURON_DM_BLOCK_INVALID = -1,  // invalid - tag last entry in the table\n   NEURON_DM_BLOCK_TPB     =  0,\n   NEURON_DM_BLOCK_TOPSP   =  1,\n   NEURON_DM_BLOCK_HBM     =  2\n};\n\nenum neuron_dm_resource_type {\n   NEURON_DM_RESOURCE_SEMAPHORE = 0,  // resource to mmap is semaphore region\n   NEURON_DM_RESOURCE_ALL = 1,        // resource to mmap is the entire block (read only). 
Only available for TOPSP\n   NEURON_DM_RESOURCE_SBUF = 2,       // resource to mmap is state buffer\n   NEURON_DM_RESOURCE_DMEM = 3\t      // resource to mmap is device memory\n};\n\nstruct neuron_uuid {\n\t__u8 value[32];\n};\n\n#define NEURON_MAX_PROCESS_PER_DEVICE 16 // 2 per core (arbitrary but needs to small number for fast lookup)\n\n#define APP_INFO_PID_NC_LOCK_INFO\t(1)\n#define APP_INFO_PID_MEM_USAGE\t\t(1 << 1)\n#define APP_INFO_ALL\t\t\t(0xF)\n\n#define APP_INFO_MAX_MODELS_PER_DEVICE\t(4)\n#define NDS_INVALID_ID (-1)\n\nstruct neuron_app_info {\n\t__s32 pid;\t\t\t\t\t\t\t// PID of this app\n\t__u8 nc_lock_map;\t\t\t\t\t\t// NCs which are locked by it (one bit set for each locked NC)\n\tstruct neuron_uuid uuid_data[APP_INFO_MAX_MODELS_PER_DEVICE];\t// UUIDs running for this app for each neuroncore\n\tsize_t host_mem_size;\t\t\t\t\t\t// Amount of host memory used by this PID\n\tsize_t device_mem_size;\t\t\t\t\t\t// Amount of device memory used by this PID\n};\n\ntypedef union nmetric_version {\n\tstruct {\n\t\t__u64 build_num : 32;\n\t\t__u64 minor_ver : 8;\n\t\t__u64 major_ver : 8;\n\t\t__u64 reserved : 16;\n\t};\n\t__u64 all;\n} nmetric_version_t;\n\nstruct neuron_ioctl_mem_chunk_info {\n\t__u64 pa;\n\t__u64 size;\n\t__u32 mem_type;\n};\n\n// Max number of entries this version of the driver\n// will ever give back to the user\n#define NEURON_NC_MAP_MAX_ENTRIES 128\nenum neuron_ioctl_nc_mapping_type {\n    NEURON_IOCTL_NC_MAPPING_TYPE_V0 = 0,           // seng swap mapping\n};\nstruct neuron_ioctl_nc_map_entry {\n    __u32 device_id;\n    __u32 device_nc_idx;\n};\nstruct neuron_ioctl_nc_map {\n    __u32 num_entries;\n    struct neuron_ioctl_nc_map_entry mappings[];\n};\n\n/* A batch of copy operations */\ntypedef struct neuron_memcpy_batch {\n\t__u64 mem_handle;               // [in] Source or Destination memory handle from/to data needs to be copied.\n\t__u64 mem_handle_offset;        // [in] Memory offset of the memory handle\n\tconst nrt_tensor_batch_op_t *ops_ptr; // [in] Pointer to array of operations\n\t__u32 num_ops;                  // [in] Number of neuron_memcpy_op operations.\n\t__u16 bar4_wr_threshold;        // [in] Threshold below which we will use bar4 direct write vs. DMA. Subject to driver limits.\n\t__u16 flags;                    // [in] TBD.\n\tvoid *context;                  // [in] TBD. 
opaque context pointer passed back in completion queue\n} neuron_memcpy_batch_t;\n\n/*\n * Memory allocation categories for sysfs counters\n*/\ntypedef enum {\n\tNEURON_MEMALLOC_TYPE_UNKNOWN_HOST, // only for old runtimes, do not use elsewhere\n\tNEURON_MEMALLOC_TYPE_CODE_HOST,\n\tNEURON_MEMALLOC_TYPE_TENSORS_HOST,\n\tNEURON_MEMALLOC_TYPE_CONSTANTS_HOST,\n\tNEURON_MEMALLOC_TYPE_MISC_HOST,\n\tNEURON_MEMALLOC_TYPE_NCDEV_HOST,\n\tNEURON_MEMALLOC_TYPE_NOTIFICATION_HOST,\n\n\tNEURON_MEMALLOC_TYPE_UNKNOWN_DEVICE, // only for old runtimes, do not use elsewhere\n\tNEURON_MEMALLOC_TYPE_CODE_DEVICE,\n\tNEURON_MEMALLOC_TYPE_TENSORS_DEVICE,\n\tNEURON_MEMALLOC_TYPE_CONSTANTS_DEVICE,\n\tNEURON_MEMALLOC_TYPE_SCRATCHPAD_DEVICE,\n\tNEURON_MEMALLOC_TYPE_MISC_DEVICE,\n\tNEURON_MEMALLOC_TYPE_NCDEV_DEVICE,\n\tNEURON_MEMALLOC_TYPE_COLLECTIVES_DEVICE,\n\tNEURON_MEMALLOC_TYPE_SCRATCHPAD_NONSHARED_DEVICE,\n\tNEURON_MEMALLOC_TYPE_NOTIFICATION_DEVICE,\n\n\tNEURON_MEMALLOC_TYPE_DMA_RINGS_HOST,\n\tNEURON_MEMALLOC_TYPE_DMA_RINGS_DEVICE,\n\n\tNEURON_MEMALLOC_TYPE_CONTIGUOUS_SCRATCHPAD_DEVICE, // uses same sysfs counter as NEURON_MEMALLOC_TYPE_SCRATCHPAD_DEVICE\n\n\tNEURON_MEMALLOC_TYPE_MAX\n} mem_alloc_category_t;\n\n/*\n * NDS stats\n * Note: \n * \tTo add a new counter type inside the enum, \n * \t\t1. you need to manually decrease NDS_ND_COUNTER_RESERVED or NDS_NC_COUNTER_RESERVED by 1\n * \t\t2. you need to update NDS_ND_COUNTER_COUNT or NDS_NC_COUNTER_COUNT\n * \tTo prevent compatability issues, you need to always append the new counter type to the end of the enum\n */\n#define NDS_ND_COUNTER_RESERVED 18\n\n// Device counter types\nenum {\n\tNDS_ND_COUNTER_RUNTIME_VERSION,\n\tNDS_ND_COUNTER_FRAMEWORK_VERSION,\n\tNDS_ND_COUNTER_FAL_VERSION,\n\tNDS_ND_COUNTER_FEATURE_BITMAP,\n\tNDS_ND_COUNTER_MIN_NEFF_VERSION,\n\tNDS_ND_COUNTER_MAX_NEFF_VERSION,\n\n\t// memory usage counters\n\tNDS_ND_COUNTER_MEM_USAGE_CODE_HOST,\n\tNDS_ND_COUNTER_MEM_USAGE_TENSORS_HOST,\n\tNDS_ND_COUNTER_MEM_USAGE_CONSTANTS_HOST,\n\tNDS_ND_COUNTER_MEM_USAGE_SCRATCHPAD_HOST,\n\tNDS_ND_COUNTER_MEM_USAGE_MISC_HOST,\n\n\tNDS_ND_COUNTER_DYNAMIC_SYSFS_METRIC_BITMAP,\n\n\tNDS_ND_COUNTER_DEVICE_CLUSTER_ID,\n\n\tNDS_ND_COUNTER_COUNT = NDS_ND_COUNTER_DEVICE_CLUSTER_ID + NDS_ND_COUNTER_RESERVED + 1\n};\n\n#define NDS_NC_COUNTER_RESERVED 0\n\n// Neuroncore counter types\nenum {\n\tNDS_NC_COUNTER_TIME_IN_USE = 0,\n\n\tNDS_NC_COUNTER_INFER_COMPLETED,\n\tNDS_NC_COUNTER_INFER_COMPLETED_WITH_ERR,\n\tNDS_NC_COUNTER_INFER_COMPLETED_WITH_NUM_ERR,\n\tNDS_NC_COUNTER_INFER_TIMED_OUT,\n\tNDS_NC_COUNTER_INFER_INCORRECT_INPUT,\n\tNDS_NC_COUNTER_INFER_FAILED_TO_QUEUE,\n\n\t// these must be in this specifc order\n\t// runtime assumes these are offset by\n\t// error code\n\tNDS_NC_COUNTER_ERR_GENERIC,\n\tNDS_NC_COUNTER_ERR_NUMERICAL,\n\tNDS_NC_COUNTER_ERR_MODEL,\n\tNDS_NC_COUNTER_ERR_TRANSIENT,\n\tNDS_NC_COUNTER_ERR_HW,\n\tNDS_NC_COUNTER_ERR_RT,\n\n\tNDS_NC_COUNTER_LATENCY_DEVICE,\n\tNDS_NC_COUNTER_LATENCY_TOTAL,\n\tNDS_NC_COUNTER_NC_TIME,\n\n\t// these are new counters\n\t// these shall be placed at the\n\t// end so there offsets are always\n\t// greater than old counters\n\t// This will ensure\n\t// new runtime + old driver will\n\t// write to reserved setions and not\n\t// break 
anything\n\tNDS_NC_COUNTER_GENERIC_FAIL,\n\tNDS_NC_COUNTER_ERR_RESOURCE,\n\tNDS_NC_COUNTER_ERR_RESOURCE_NC,\n\tNDS_NC_COUNTER_ERR_INVALID,\n\tNDS_NC_COUNTER_ERR_UNSUPPORTED_NEFF_VERSION,\n\n\tNDS_NC_COUNTER_CC_TIME,\n\n\tNDS_NC_COUNTER_MEM_USAGE_CODE_DEVICE,\n\tNDS_NC_COUNTER_MEM_USAGE_TENSORS_DEVICE,\n\tNDS_NC_COUNTER_MEM_USAGE_CONSTANTS_DEVICE,\n\tNDS_NC_COUNTER_MEM_USAGE_SCRATCHPAD_DEVICE,\n\tNDS_NC_COUNTER_MEM_USAGE_MISC_DEVICE,\n\n\tNDS_NC_COUNTER_MODEL_LOAD_COUNT,\n\tNDS_NC_COUNTER_INFERENCE_COUNT,\n\n\tNDS_NC_COUNTER_MAC_COUNT,\n\n\tNDS_NC_COUNTER_OOB,\n\n\tNDS_NC_COUNTER_COUNT = NDS_NC_COUNTER_OOB + NDS_NC_COUNTER_RESERVED + 1\n};\n\n#define NDS_MAX_NEURONCORE_COUNT     (4)\n#define NDS_EXT_MAX_NEURONCORE_COUNT (12)\n\n// Additional NC storage\n// | NDS_EXT_NC_COUNTER_COUNT | ... | NDS_EXT_NC_COUNTER_COUNT | (x NDS_MAX_NEURONCORE_COUNT) - this will only store the 'overflow' from the original counters\n// | NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_COUNT | ... (x NDS_EXT_MAX_NEURONCORE_COUNT)   - this will store complete data for additional NCs (up to a max of 16)\n#define NDS_EXT_NC_COUNTER_ADDED_RESERVED 54\n// Index of NC counter extensions start at NDS_NC_COUNTER_COUNT not at 0\nenum {\n\tNDS_EXT_NC_COUNTER_HW_ERR_COLLECTIVES = NDS_NC_COUNTER_COUNT,\n\tNDS_EXT_NC_COUNTER_HW_ERR_HBM_UE,\n\tNDS_EXT_NC_COUNTER_HW_ERR_NC_UE,\n\tNDS_EXT_NC_COUNTER_HW_ERR_DMA_ABORT,\n\tNDS_EXT_NC_COUNTER_ERR_SW_NQ_OVERFLOW,\n\tNDS_EXT_NC_COUNTER_ERR_SW_SEMAPHORE_ERROR,\n\tNDS_EXT_NC_COUNTER_ERR_SW_EVENT_ERROR,\n\tNDS_EXT_NC_COUNTER_ERR_SW_PSUM_COLLISION,\n\tNDS_EXT_NC_COUNTER_ERR_SW_SEQUENCER_FATAL,\n\tNDS_EXT_NC_COUNTER_HW_ERR_REPAIRABLE_HBM_UE,\n\tNDS_EXT_NC_COUNTER_LAST,\n\tNDS_EXT_NC_COUNTER_COUNT =  NDS_EXT_NC_COUNTER_LAST - NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_ADDED_RESERVED\n};\n\n#define NDS_TOTAL_NC_COUNTER_COUNT (NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_COUNT) // 31 original + 64 extended = 95 counters\n\ntypedef struct nds_header {\n\tchar signature[4];      // Fixed signature: 'n', 'd', 's', 0\n\tint  version;           // Version of the datastore's format\n} nds_header_t;\n\n/* --------------------------------------------\n * NDS shared data offsets\n * --------------------------------------------\n */\n\n#define NDS_HEADER_START (0)\n#define NDS_HEADER_SIZE (sizeof(nds_header_t))\n\n#define NDS_ND_COUNTERS_START (NDS_HEADER_START + NDS_HEADER_SIZE)\n#define NDS_ND_COUNTERS_SIZE (NDS_ND_COUNTER_COUNT * sizeof(uint64_t))\n#define NDS_ND_COUNTERS(base_addr) ((uint64_t *)(base_addr + NDS_ND_COUNTERS_START))\n\n// original NC counter section\n#define NDS_NEURONCORE_COUNTERS_COUNT (NDS_NC_COUNTER_COUNT)\n#define NDS_NEURONCORE_COUNTERS_START (NDS_ND_COUNTERS_START + NDS_ND_COUNTERS_SIZE)\n#define NDS_NEURONCORE_COUNTERS_SIZE (NDS_NEURONCORE_COUNTERS_COUNT * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))\n#define NDS_NEURONCORE_COUNTERS(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_NEURONCORE_COUNTERS_START) + (nc_index * NDS_NEURONCORE_COUNTERS_COUNT))\n\n// additional NC counter section at the end of all existing structures in the datastore (i.e. 
after NDS_PROCESS_EXT_INFO)\n// NDS_PROCESS_EXT_INFO_START + NDS_PROCESS_EXT_INFO_SIZE = 44588 (hardcoded because it's easier than to move all the structs here and sizeof them)\n#define NDS_EXT_NC_COUNTER_COUNT_OLD (65)\n#define NDS_TOTAL_NC_COUNTER_COUNT_OLD (96)\n\n#define NDS_EXT_NEURONCORE_COUNTERS_SIZE_OLD (NDS_EXT_NC_COUNTER_COUNT_OLD * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))\n#define NDS_EXT_NEURONCORE_NC_DATA_SIZE_OLD\t(NDS_TOTAL_NC_COUNTER_COUNT_OLD * NDS_EXT_MAX_NEURONCORE_COUNT * sizeof(uint64_t))\n#define NDS_EXT_SECTION_SIZE_OLD (NDS_EXT_NEURONCORE_COUNTERS_SIZE_OLD + NDS_EXT_NEURONCORE_NC_DATA_SIZE_OLD)\n#define NDS_EXT_OFFSET_OLD (44588)\n\n#define NDS_EXT_ALIGNMENT (64)\n#define NDS_ALIGN(v) ((v) + (-(v) & (NDS_EXT_ALIGNMENT - 1)))\n#define NDS_EXT_OFFSET (NDS_ALIGN(NDS_EXT_OFFSET_OLD + NDS_EXT_SECTION_SIZE_OLD))\n\n#define NDS_EXT_NEURONCORE_COUNTERS_COUNT (NDS_EXT_NC_COUNTER_COUNT) // number of extended counters\n#define NDS_EXT_NEURONCORE_COUNTERS_START (NDS_EXT_OFFSET)\n#define NDS_EXT_NEURONCORE_COUNTERS_SIZE (NDS_EXT_NC_COUNTER_COUNT * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))\n#define NDS_EXT_NEURONCORE_COUNTERS(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_EXT_NEURONCORE_COUNTERS_START) + (nc_index * NDS_EXT_NC_COUNTER_COUNT))\n\n// additional NC data for extra Neuron Cores (12 extra sets which include all 95 counters + 1 for padding)\n#define NDS_EXT_NEURONCORE_NC_DATA_PADDING (1) // 1 added as padding for 64 byte alignment per NC\n#define NDS_EXT_NEURONCORE_NC_DATA_COUNT (NDS_TOTAL_NC_COUNTER_COUNT + NDS_EXT_NEURONCORE_NC_DATA_PADDING) // full set of counters (base + extended) + padding\n#define NDS_EXT_NEURONCORE_NC_DATA_START (NDS_ALIGN(NDS_EXT_NEURONCORE_COUNTERS_START + NDS_EXT_NEURONCORE_COUNTERS_SIZE))\n#define NDS_EXT_NEURONCORE_NC_DATA_SIZE (NDS_EXT_MAX_NEURONCORE_COUNT * NDS_EXT_NEURONCORE_NC_DATA_COUNT * sizeof(uint64_t))\n\n#define NDS_EXT_NEURONCORE_NC_DATA(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_EXT_NEURONCORE_NC_DATA_START) + (nc_index * NDS_EXT_NEURONCORE_NC_DATA_COUNT))\n\n#endif  // NEURON_DRIVER_SHARED_H\n"
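/*
 * Illustrative usage sketch (not part of the original header): testing a
 * driver feature flag and reading a per-NeuronCore counter through the NDS
 * offset macros above. `nd_fd` is assumed to be an open Neuron device file
 * descriptor and `base` the start of a mapped datastore region; how either is
 * obtained is outside this header. The include paths mirror this repo layout.
 */
#include <stdint.h>
#include <string.h>
#include <ndl/ndl.h>                    /* ndl_feature_supported() */
#include <ndl/neuron_driver_shared.h>

static int example_has_dmabuf(int nd_fd)
{
    return ndl_feature_supported(nd_fd, NEURON_DRIVER_FEATURE_DMABUF) ? 1 : 0;
}

static uint64_t example_read_infer_completed(char *base, int nc_index)
{
    /* The datastore begins with an nds_header_t whose signature is "nds". */
    nds_header_t hdr;
    memcpy(&hdr, base + NDS_HEADER_START, sizeof(hdr));
    if (memcmp(hdr.signature, "nds", 4) != 0)
        return 0;

    /* Per-NC counters follow the per-device counters; the macro yields a
     * uint64_t* to the first counter of the requested core. */
    uint64_t *nc_counters = NDS_NEURONCORE_COUNTERS(base, nc_index);
    return nc_counters[NDS_NC_COUNTER_INFER_COMPLETED];
}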
  },
  {
    "path": "src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h",
    "content": "/*\n * Shared tensor batch operation between runtime and driver.\n */\n\n#ifndef NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H\n#define NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H\n\n#ifdef __KERNEL__\n#include <linux/types.h>\ntypedef __u64 nrt_tensor_batch_offset_t;\ntypedef __u64 nrt_tensor_batch_size_t;\n#else\n#include <stdint.h>\ntypedef uint64_t nrt_tensor_batch_offset_t;\ntypedef uint64_t nrt_tensor_batch_size_t;\n#endif\n\ntypedef struct nrt_tensor_batch_op {\n    nrt_tensor_batch_offset_t offset;\n    nrt_tensor_batch_size_t size;\n    void *buffer;\n} nrt_tensor_batch_op_t;\n\n#endif  // NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H\n"
  },
  {
    "path": "src/libnrt/include/nrt/ndebug_stream.h",
    "content": "/*\n * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n/**\n * Overview:\n * The `ndebug_stream` APIs provide applications a way to consume debug events from the runtime (see\n * `ndebug_stream_event_type_t` for the different event types). These debug events are emitted by the\n * runtime per Logical Neuron Core and can be used by applications to get information on events that\n * occured on the device (ie prints, breakpoints, etc.).\n *\n * Connecting, polling, and consuming:\n * Applications that want to consume debug events will first need to connect to a Logical Neuron Core's debug stream via a call to\n * `nrt_debug_client_connect`. Once a client is connected to a core's debug stream, the runtime will will push debug events emitted\n * by the Logical Neuron Core to the stream for clients to consume. To be notified of emitted debug events, clients can utilize the\n * polling APIs provided by the Linux kernel. The `stream_fd` handle obtained from the `nrt_debug_client_connect` is a typical Linux\n * file descriptor and can be passed into any Linux polling API. It is important to note though, that while the `stream_fd` is pollable,\n * all other non-polling related functionality must go through the provided `nrt_debug_client*` APIs. For example, the stream contents\n * can only be accessed from the `nrt_debug_client_read*` API(s) and any other methods of accessing the stream data leads undefined/undesireable\n * behavior.\n *\n * Closing a Connection:\n * Once a connection is not needed anymore, clients can close the connection using the `nrt_debug_client_connect_close` API.\n *\n * Events:\n * Events consist of a header describing the payload type, and a payload representing the contents of the event. Events can be consumed by\n * clients via the `nrt_debug_client_read*` API(s).\n *\n * Notes:\n *  * These APIs do not allow for interprocess communication. Debug events are only pushed to the process that owns the Logical Neuron Core.\n *  * These APIs do not provide thread safety for multiple threads accessing the SAME stream (thread safety for different streams is guarenteed).\n *  * There can only be one outstanding connection per stream. Any attempts to initialize multiple connectiongs will result in an error.\n *  * Events are only emitted AFTER a client connects to a Logical Neuron Core's stream. Any event that would have been emitted before connectioning\n *    to the stream is dropped.\n *  * Events will be dropped if the number of unconsumed events in a stream exceeds the stream's buffer size. Clients must consume events fast\n *    enough to prevent dropped events. Additionally, Clients can configure the stream's buffer size via the `NEURON_RT_DEBUG_STREAM_BUFFER_SIZE`\n *    environment variable. 
The buffer size currently defaults to 64K debug events.\n */\n\n#pragma once\n\n#include <stdbool.h>\n#include <stddef.h>\n#include <stdint.h>\n#include <nrt/nrt_status.h>\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\ntypedef enum ndebug_stream_event_type {\n    NDEBUG_STREAM_EVENT_TYPE_INVALID = 0,\n    NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ = 1,\n} ndebug_stream_event_type_t;\n\ntypedef struct ndebug_stream_event_header {\n    uint64_t data_size;\n    uint32_t type;\n    char reserved[52];\n} ndebug_stream_event_header_t;\n\ntypedef struct ndebug_stream_payload_debug_tensor_read {\n    char prefix[512];\n    uint32_t logical_nc_id;\n    uint32_t pipe;\n    char tensor_dtype[16];\n    uint64_t tensor_shape[8];\n    uint64_t tensor_data_size;\n    char reserved0[416];\n    char tensor_data[];\n} ndebug_stream_payload_debug_tensor_read_t;\n\n/** Establish a connection to a specified Logical Neuron Core's debug stream.\n *\n * @param logical_nc_idx[in]    - Core's debug stream to connect to.\n * @param stream_fd[out]        - Connection handle to reference and interact with the stream.\n *\n * @return NRT_SUCCESS on success.\n *\n * @note Only one client can connect to a Logical Neuron Core's stream at any given time.\n *       Attempts to connect to a stream with multiple clients will result in a NRT_INVALID\n *       return status.\n *\n */\nNRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd);\n\n/** Closes connection created by `nrt_debug_client_connect`\n *\n * @param stream_fd[in] - Connection handle to close.\n *\n */\nvoid nrt_debug_client_connect_close(int stream_fd);\n\n/** Consumes a single event from the stream.\n *\n * @param stream_fd[in] - Stream to consume an event from\n * @param header[out]   - Comsuned event's header. See `ndebug_stream_event_header_t`.\n * @param payload[out]  - Consumed event's payload. See `ndebug_stream_payload*` and `ndebug_stream_event_type_t`.\n *                        **IMPORTANT**: it is the user's responsibility to free this payload pointer.\n *\n * @return NRT_SUCCESS on success.\n *\n * @note This function must be called from the same process that owns the Logical Neuron Core. Calling this\n *       function from any other process results in undefined behavior.\n *\n */\nNRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload);\n\n#ifdef __cplusplus\n}\n#endif\n"
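/*
 * Illustrative usage sketch (not part of the original header): connect to
 * logical core 0's debug stream, wait for one event with poll(), consume it,
 * and close the connection. Releasing the payload with free() is an
 * assumption based on the note that the caller owns the payload pointer.
 */
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <nrt/ndebug_stream.h>

static void example_drain_one_debug_event(void)
{
    int stream_fd = -1;
    if (nrt_debug_client_connect(0, &stream_fd) != NRT_SUCCESS)
        return;

    /* The handle is a regular file descriptor, so any Linux polling API works. */
    struct pollfd pfd = { .fd = stream_fd, .events = POLLIN };
    if (poll(&pfd, 1, 1000 /* ms */) > 0 && (pfd.revents & POLLIN)) {
        ndebug_stream_event_header_t header;
        void *payload = NULL;
        if (nrt_debug_client_read_one_event(stream_fd, &header, &payload) == NRT_SUCCESS) {
            if (header.type == NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ) {
                ndebug_stream_payload_debug_tensor_read_t *ev = payload;
                printf("debug tensor '%s': %llu bytes\n",
                       ev->prefix, (unsigned long long)ev->tensor_data_size);
            }
            free(payload); /* caller-owned, per the read API note above */
        }
    }
    nrt_debug_client_connect_close(stream_fd);
}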
  },
  {
    "path": "src/libnrt/include/nrt/nds/neuron_ds.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <stddef.h>\n#include <stdint.h>\n#include <stdbool.h>\n#include <sys/types.h>\n#include <ndl/ndl.h>\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n// Main NDS object handle\ntypedef void *nds_obj_handle_t;\n\n// NDS object types\n#define OBJECT_TYPE_MODEL_NODE_INFO     (0)\n#define OBJECT_TYPE_PROCESS_INFO        (1)\n#define OBJECT_TYPE_PROCESS_INFO_EXT    (2)\n\n// Model-related structs\n#define MODEL_MEM_USAGE_LOCATION_COUNT 2\n\n/*\n * Number of slots for mem_usage_type in Neuron Datastore (also used by tools)\n *\n * In the current version of the neuron datastore's format, there are only 12 slots for storing\n * memory usage type, so we aggregate them using the same logic as for the 'per NC' memory tracker.\n * Monitor always aggregated them even further by adding them together, so we aren't breaking any feature.\n *\n * For usage types definiton, go to \"inc/tdrv/dma_mem_usage_type.h\"\n *\n */\nenum {\n    NDS_DMA_MEM_USAGE_SLOT_CODE,\n    NDS_DMA_MEM_USAGE_SLOT_TENSORS,\n    NDS_DMA_MEM_USAGE_SLOT_CONSTANTS,\n    NDS_DMA_MEM_USAGE_SLOT_SCRATCHPAD,\n    NDS_DMA_MEM_USAGE_SLOT_MISC,\n    NDS_DMA_MEM_USAGE_SLOT_COUNT = 12 // do not change\n};\n\n// Aggregated data for all chunks of the same type/location\ntypedef struct nds_mem_usage_info {\n    size_t total_size;        // Total size\n    uint32_t chunk_count;       // Number chunks that make up the total size\n} nds_mem_usage_info_t;\n\n// Loaded model node information\ntypedef struct nds_model_node_info {\n    uint32_t model_id;           // parent model id\n    uint32_t model_node_id;      // node id\n    char name[256];              // model name\n    char uuid[16];               // uuid\n    uint8_t nc_index;            // nc index\n    uint8_t sg_index;            // subgraph index\n} nds_model_node_info_t;\n\n// Loaded model node memory usage information\ntypedef struct nds_model_node_mem_usage_info {\n    // MODEL_MEM_USAGE_LOCATION_COUNT per each usage type\n    nds_mem_usage_info_t model_mem_usage[MODEL_MEM_USAGE_LOCATION_COUNT][NDS_DMA_MEM_USAGE_SLOT_COUNT];\n} nds_model_node_mem_usage_info_t;\n\n// Version information\ntypedef struct nds_version_info {\n    uint8_t major;\n    uint8_t minor;\n    uint32_t build;\n} nds_version_info_t;\n\n// Process information-related struct\ntypedef struct nds_process_info {\n    int8_t  framework_type;\n    char    tag[32];\n    nds_version_info_t framework_version;\n    nds_version_info_t fal_version;\n    nds_version_info_t runtime_version;\n} nds_process_info_t;\n\n// Extended process information\ntypedef struct nds_process_info_ext {\n    char tag[256];\n} nds_process_info_ext_t;\n\ntypedef struct nds_instance nds_instance_t;\ntypedef struct ndl_device ndl_device_t;\n\n\n// Feature bitmap's bit index information\ntypedef enum feature_bitmap_bit_index {\n    BIT_INDEX_TEST_FEATURE = 0,\n    BIT_INDEX_MULTICORE_FEATURE = 1,\n\n    BIT_INDEX_COUNT = BIT_INDEX_MULTICORE_FEATURE + 1\n} feature_bitmap_bit_index_t;\n\n\n/** Opens NDS for the given pid. If pid == 0, it acquires it for the current PID\n *  and it's opened in read-write mode. 
If pid != 0, it acquires it for the provided PID\n *  and it's opened as read-only.\n *\n * @param device[in]            - ndl_device used to open this NDS\n * @pid pid[in]                 - pid for which to open the NDS, if 0 - it's opened as r/w for the current process\n * @inst[out]                   - address of a pointer which will contain the instance handle\n *\n * @return non zero in case of error\n */\nint nds_open(ndl_device_t *device, pid_t pid, nds_instance_t **inst);\n\n/** Releases the NDS instance and frees the data associated with it (mandatory for readers)\n *\n * @param inst[in]              - NDS instance to close\n *\n * @return non zero in case of error, the pointer gets deleted regardless\n */\nint nds_close(nds_instance_t *inst);\n\n/* --------------------------------------------\n * NDS Neuroncore Counters\n * --------------------------------------------\n */\n\n/** Increments a simple per-nc counter\n *\n * @param inst[in]              - NDS instance\n * @param pnc_index[in]         - Neuroncore index\n * @param counter_index[in]     - Counter index\n * @param increment[in]         - Amount to increment\n *\n * @return 0 on success.\n */\nint nds_increment_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t increment);\n\n/** Decrements a simple per-nc counter\n *\n * @param inst[in]              - NDS instance\n * @param pnc_index[in]         - Neuroncore index\n * @param counter_index[in]     - Counter index\n * @param increment[in]         - Amount to increment\n *\n * @return 0 on success.\n */\nint nds_decrement_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t decrement);\n\n/** Gets a simple per-nc counter\n *\n * @param inst[in]              - NDS instance\n * @param pnc_index[in]         - Neuroncore index\n * @param counter_index[in]     - Counter index\n * @param value[out]            - Counter value\n *\n * @return 0 on success.\n */\nint nds_get_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t *value);\n\n/** Sets a simple per-nc counter\n *\n * @param inst[in]              - NDS instance\n * @param pnc_index[in]         - Neuroncore index\n * @param counter_index[in]     - Counter index\n * @param value[in]             - Value to set the counter to\n *\n * @return 0 on success.\n */\nint nds_set_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t *value);\n\n/* --------------------------------------------\n * NDS Neuron Device Counters\n * --------------------------------------------\n */\n\n/** Increments a simple per-nd counter - may overflow\n *\n * @param inst[in]              - NDS instance\n * @param counter_index[in]     - Counter index\n * @param increment[in]         - Amount to increment\n *\n * @return 0 on success.\n */\nint nds_increment_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t increment);\n\n/** Decrements a simple per-nd counter - may overflow\n *\n * @param inst[in]              - NDS instance\n * @param counter_index[in]     - Counter index\n * @param decrement[in]         - Amount to decrement\n *\n * @return 0 on success.\n */\nint nds_decrement_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t decrement);\n\n/** Bitwise inclusive OR operation on counter\n * \n * @param inst[in]              - NDS instance\n * @param counter_index[in]     - Counter index\n * @param 1ull << bit_index     - bit mask on the feature bitmap\n * \n * @return 0 on success.\n */\nint 
nds_or_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t bit_index);\n\n/** Gets a simple per-nd counter\n *\n * @param inst[in]              - NDS instance\n * @param counter_index[in]     - Counter index\n * @param value[out]            - Counter value\n *\n * @return 0 on success.\n */\nint nds_get_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t *value);\n\n/** Sets a simple per-nd counter\n *\n * @param inst[in]              - NDS instance\n * @param counter_index[in]     - Counter index\n * @param value[in]             - Value to set the counter to\n *\n * @return 0 on success.\n */\nint nds_set_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t *value);\n\n/* --------------------------------------------\n * NDS objects\n * --------------------------------------------\n */\n\n/** Writes an NDS object to the NDS memory\n *\n * @param obj[in]               - NDS object handle\n *\n * @return 0 on success.\n */\nint nds_obj_commit(nds_obj_handle_t obj);\n\n/** Creates a new NDS object with the given type\n *\n * @param inst[in]              - NDS instance\n * @param type[in]              - type of object to create\n *\n * @return handle for newly created object\n */\nnds_obj_handle_t nds_obj_new(nds_instance_t *inst, int type);\n\n/** Deletes a NDS object from NDS (and local memory)\n *\n * @param obj[in]               - NDS object handle\n *\n * @return 0 on success.\n */\nint nds_obj_delete(nds_obj_handle_t obj);\n\n/** Casts this NDS object to a mode_node_info_t which can be used for r/w\n *\n * @param obj[in]               - NDS object handle\n *\n * @return non-NULL on success.\n */\nnds_model_node_info_t *nds_obj_handle_to_model_node_info(nds_obj_handle_t obj);\n\n/** Casts this NDS object to a nds_model_node_mem_usage_info_t which can be used for r/w\n *\n * @param obj[in]               - NDS object handle\n *\n * @return non-NULL on success.\n */\nnds_model_node_mem_usage_info_t *nds_obj_handle_to_model_node_mem_usage(nds_obj_handle_t obj);\n\n/** Reads all model info data and returns it as an array (needs to be deleted by caller)\n *\n * @param inst[in]              - NDS instance\n * @param models[out]           - Pointer where to write the address of an array of length count containing object handles\n * @param count[out]            - Number of models loaded (present in the models array)\n *\n * @return non-NULL on success.\n */\nint nds_read_all_model_nodes(nds_instance_t *inst, nds_obj_handle_t **models, size_t *count);\n\n/** Casts this NDS object to a nds_process_info_t which can be used for r/w\n *\n * @param obj[in]               - NDS object handle\n *\n * @return non-NULL on success.\n */\nnds_process_info_t *nds_obj_handle_to_process_info(nds_obj_handle_t obj);\n\n/** Casts this NDS object to a nds_process_info_ext_t which can be used for r/w\n *\n * @param obj[in]               - NDS object handle\n *\n * @return non-NULL on success.\n */\nnds_process_info_ext_t *nds_obj_handle_to_process_info_ext(nds_obj_handle_t obj);\n\n/** Reads process info and returns a nds_obj_handle\n *\n * @param inst[in]              - NDS instance\n *\n * @return non-NULL on success.\n */\nnds_obj_handle_t nds_read_process_info(nds_instance_t *inst);\n\n/** Reads extended process info and returns a nds_obj_handle\n *\n * @param inst[in]              - NDS instance\n *\n * @return non-NULL on success.\n */\nnds_obj_handle_t nds_read_process_info_ext(nds_instance_t *inst);\n\n#ifdef __cplusplus\n}\n#endif\n"
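/*
 * Illustrative usage sketch (not part of the original header): open another
 * process's Neuron datastore read-only, read one device-level counter, and
 * list its loaded model nodes. Assumptions: `dev` comes from the ndl
 * device-open API, the counter index space is the NDS_ND_COUNTER_* enum from
 * ndl/neuron_driver_shared.h, and the returned model array is released with
 * free() ("needs to be deleted by caller").
 */
#include <stdio.h>
#include <stdlib.h>
#include <ndl/neuron_driver_shared.h>
#include <nrt/nds/neuron_ds.h>

static void example_dump_models(ndl_device_t *dev, pid_t pid)
{
    nds_instance_t *inst = NULL;
    if (nds_open(dev, pid, &inst) != 0)     /* pid != 0: opened read-only */
        return;

    uint64_t runtime_version = 0;
    if (nds_get_nd_counter(inst, NDS_ND_COUNTER_RUNTIME_VERSION, &runtime_version) == 0)
        printf("runtime version counter: 0x%llx\n", (unsigned long long)runtime_version);

    nds_obj_handle_t *models = NULL;
    size_t count = 0;
    if (nds_read_all_model_nodes(inst, &models, &count) == 0) {
        for (size_t i = 0; i < count; i++) {
            nds_model_node_info_t *info = nds_obj_handle_to_model_node_info(models[i]);
            if (info)
                printf("model '%s' on nc %u\n", info->name, (unsigned)info->nc_index);
        }
        free(models); /* assumption: array is heap allocated for the reader */
    }

    nds_close(inst);
}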
  },
  {
    "path": "src/libnrt/include/nrt/nec.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <time.h>\n#include <stdint.h>\n#include <stddef.h>\n#include <stdbool.h>\n#include \"nrt/nrt_status.h\"\n#include <pthread.h>\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n#define NEC_MAX_CHANNELS 32 /* matches MAXCHANNELS in NCCL */\n#define NEC_MAX_NR_CHANNEL_CHUNKS 32 /* Channel buffers for reduce operation */\n#define NEC_MAX_FOLD_N 16\n\n/*\n * We can set max communicator to anything here but ultimately we will be\n * limited by how much HW resources (such as TOP_SP semaphores or NX DRAM\n * space etc) get used up as number of communicators go up.\n */\n#define NEC_MAX_COMM_N 12   /* Max supported replica-groups in NEFF */\n\n#define NEC_MAX_NET_BUFFERS (2 * NEC_MAX_COMM_N) /* 2(hier & ring) x (# replica groups) */\n\n/*\n * We can set max communicator to anything here but ultimately we will be\n * limited by how much HW resources (such as TOP_SP semaphores or NX DRAM\n * space etc) get used up as number of communicators go up.\n */\n#define NEC_MAX_COMM_N 12   /* Max supported replica-groups in NEFF */\n\n#define NEC_MAX_NET_BUFFERS (2 * NEC_MAX_COMM_N) /* 2(hier & ring) x (# replica groups) */\n\n/*\n * We can set max communicator to anything here but ultimately we will be\n * limited by how much HW resources (such as TOP_SP semaphores or NX DRAM\n * space etc) get used up as number of communicators go up.\n */\n#define NEC_MAX_COMM_N 12   /* Max supported replica-groups in NEFF */\n\n#define NEC_MAX_NET_BUFFERS (2 * NEC_MAX_COMM_N) /* 2(hier & ring) x (# replica groups) */\n\n#define NEC_CACHE_LINE_SIZE 128\n\n/* Rank ID to denote network connector */\n#define NEC_NET_CONNECTOR_RANK -1\n/* MLA dev ID to denote network connector */\n#define NEC_NET_MLA_DEV -1\n/* MLA dev ID to denote POD connector */\n#define NEC_POD_MLA_DEV -2\n/* Rank ID to denote an unknown connector -> possibly not reachable */\n#define NEC_UNKNOWN_RANK -3\n/* MLA dev ID to denote an unknown connector -> possibly not reachable */\n#define NEC_UNKNOWN_MLA_DEV -3\n\n/* the number of hierarchical cc pipeline stage */\n#define NEC_HIER_CC_PIPELINE_STAGE_N    (3)\n\n/* the max number of outgoing requests in the recv/send proxy */\n#define NCCL_NET_NEURON_MAX_REQUESTS 128\n\n/**\n * The maximum number of concurrent cc execution. As NCCL needs this\n * information, define the size in the common header file.\n */\n#define NEC_MAX_STREAM_N       4\n\n/**\n * The different types of ofi communicators that are in the netResources\n * object that is used in the recv/send proxy\n */\ntypedef enum ofi_comm_type {\n    NET_SEND_COMM,\n    NET_RECV_COMM,\n    NET_RECV_LISTEN_COMM,\n    LOCAL_RECV_COMM,\n    LOCAL_SEND_COMM\n} ofi_comm_type_t;\n\nenum enc_comm_type {\n    H_COMM_INTRA_ID = 0,\n    H_COMM_INTER_ID = 1,\n    H_COMM_MAX_ID\n};\n\n/**\n * Neuron Elastic Collectives (NEC)\n *\n * This is the main component for Neuron Elastic Collectives in Neuron Runtime\n * (NRT). 
This is to provide collective operations to applications offloaded by\n * the device including collective comm init, receiving (post) operations,\n * building resources for the operation, triggering the operation and polling\n * its completion.\n *\n *     +-----------------------+\n *     |  Collectives App      |\n *     +-----------------------+\n *     |  Collectives Library  |\n *     +-----------------------+\n *     |       NEC / NRT       |\n *     +-----------------------+\n *     |        DEVICE         |\n *     +-----------------------+\n *\n * TODO: ENC will be renamed to NEC\n */\n\n/* Translated from what KaenaDriver returns */\ntypedef enum nec_pod_type {\n    NEC_POD_TYPE_NONE,\n    NEC_POD_TYPE_P2P,\n    NEC_POD_TYPE_SWITCH,\n    NEC_POD_TYPE_INVALID\n} nec_pod_type_t;\n\ntypedef struct enc_comm* nec_comm_t;\ntypedef struct enc_channel* nec_channel_t;\ntypedef uint64_t dma_addr_t;\n\nstruct enc_net_host_memory_index {\n    union {\n        volatile uint32_t index;\n        char pad[NEC_CACHE_LINE_SIZE]; /* Avoid false-sharing */\n    };\n};\n\n/**\n * Host memory structure for network transport\n *\n * The proxy-thread progress function first waits for the device to be ready by\n * polling the host index on fold 0 until it is (-1). Once (-1) is polled, the\n * proxy-thread resets the host index to 0 and notifies the device that the\n * proxy-thread is ready by incrementing the handshake semaphore by 1.\n *\n * On the sender side, the device increases the host index to post a buffer to\n * send to a remote device. The proxy-thread send progress function polls the\n * host index and sends posted buffers to the respective remote device. The\n * proxy-thread polls for send request completions and notifies the device of\n * these completions by increasing the send_complete semaphore by the number of\n * completed send requests. The device may, in response to this notification,\n * increase the host index further to post additional buffers to send. The\n * proxy-thread recognizes the last entry in the FIFO by the fact that it is\n * specially marked (see mark_fifo_end()).\n *\n * On the receiver side, the device increases the host index to post receive\n * buffers to be filled with data from a remote device. The proxy-thread recv\n * progress function polls the host index and posts the receive buffers to the\n * network plugin. The proxy-thread polls for receive completions and notifies\n * the device of these completions by increasing the recv_complete semaphore by\n * the number of completed recv requests. The device uses this notification to\n * know that data is available for processing in device memory. The device\n * may also, in response to this notification, increase the host index further to\n * post additional buffers as receive buffers. 
The proxy-thread\n * recognize the last entry in the FIFO by the fact it is specially marked.\n *\n * For the ring algorithm:\n * The sender's handshake and send_complete semaphores\n * are the send-credit semaphore.\n * The receiver's handshake and recv_complete semaphores are the recv-cnt\n * semaphore.\n *\n * For the mesh algorithm:\n * The handshake semaphore is the local-handshake event semaphore for both\n * sender and receiver.\n * The recevier's recv_complete semaphore is the broadcast event semaphore.\n * The sender's send_complete semaphore is the sync event semaphore.\n */\nstruct enc_net_host_memory {\n    union {\n        struct {\n            struct enc_net_host_memory_index post_recv[NEC_MAX_FOLD_N];\n        } recv;\n        struct {\n            struct enc_net_host_memory_index post_send[NEC_MAX_FOLD_N];\n        } send;\n    };\n};\n\ntypedef struct enc_host_mem {\n    void *mem_handle;\n    void *va;\n    dma_addr_t pa;\n    size_t size;\n} enc_host_mem_t;\n\n\ntypedef struct enc_host_mem_shared {\n    enc_host_mem_t mem;\n    int refcnt;\n} enc_host_mem_shared_t;\n\n/**\n * Network connector structure containing allocated resources for network transport\n */\nstruct enc_net_connector {\n    int fold_n;\n\n    enc_host_mem_t net_host_mem; /* Used to signal proxy thread */\n    enc_host_mem_shared_t *dynamic_input_host_mem; /* Used to pass info only available during execution */\n\n    /* Network transport buffer, allocated only for sender */\n    void *devmem_res;\n    void *nccl_mhandle;\n\n    /* Address and mhandle for event semaphores and pre-registered buffers */\n    void *inc_recv_sem_nccl_mhandle;\n    uint32_t *inc_recv_sem_values_buffer;\n    void *inc_recv_sem_values_buffer_mhandle;\n\n    /*\n     * NCCL network connector data structure. 
When one proxy worker is used for\n     * the same type (recv or send) network operation, connector information\n     * should be included in each transaction.\n     */\n    void *nccl_connector;\n};\n\ntypedef enum enc_pattern {\n    ENC_PATTERN_RING,\n    ENC_PATTERN_MESH,\n    ENC_PATTERN_INVALID,\n} enc_pattern_t;\n\ntypedef enum enc_net_connectivity {\n    ENC_CONNECTIVITY_MESH,\n    ENC_CONNECTIVITY_RDH,\n    ENC_CONNECTIVITY_DEFAULT\n} enc_net_connectivity_t;\n\nstruct enc_channel {\n    /*\n     * Application parameters for init\n     */\n    int id;\n    enc_pattern_t pattern;\n\n    /* Applicable only in case of remote neighbor */\n    struct enc_net_connector *net_recv; /* if receving from rank over the network */\n    struct enc_net_connector *net_send; /* if sending to rank over the network */\n\n    /*\n     * Neuron Runtime context\n     */\n    void *devmem_res;\n    void *two_step_pod_mesh_devmem_res;\n    /* Gateway buffer is allocated only when hybrid ring is supported */\n    void *devmem_gw_buf_res;\n    void *nccl_mhandle;\n\n    dma_addr_t gw_recv_buffer;\n    dma_addr_t gw_send_buffer;\n\n    struct enc_channel_context *ch_ctx;\n    struct encd_dma_channel *drv_channel;\n};\n\nstruct enc_peer_info {\n    int neuron_dev;\n    int rid;\n    int tpb_index;\n    int pod_node_id;\n};\n\ntypedef enum enc_topology_mode {\n    ENC_TOPO_NULL = 0,\n    ENC_TOPO_4_DEVS_IN_ROW,\n    ENC_TOPO_4_DEVS_IN_COLUMN,\n} enc_topology_mode_t;\n\nstruct enc_comm_info {\n    int neuron_dev;\n    int rank;\n    int rank_n;\n    int local_rank_n;\n    int local_rack_rank_n;\n\n    int node;\n    int node_n;\n\n    enc_topology_mode_t enc_topo_mode;\n\n    /* Pod information received from NCCL */\n    bool enable_pod;\n    bool use_net; /* Whether network interface is used or not with the communicator */\n    int pod;\n    int pod_n;\n    int pod_node;\n    int pod_node_n;\n\n    struct enc_peer_info *peers;\n};\n\nstruct enc_ring {\n    int prev;\n    int next;\n    int *user_ranks;\n\n    /* used by one_rank_per_device rings only */\n    bool duplicate;\n};\n\n/* Kangaring */\n#define NEC_KANGARING_MAX_NUM_RANKS (256)\n#define KANGARING_NUM_SENG_PER_DEV  (4)\n#define KANGARING_NUM_TPB_PER_DEV   (8)\n#define KANGARING_MAX_SECONDARIES   (3)\n\nenum SEngine {\n    S0 = 0,\n    S1 = 1,\n    S2 = 2,\n    S3 = 3,\n    SENGS_PER_DIE = 2,\n    SENGS_PER_MLA = 4\n};\n\nstruct enc_kangaring {\n    int vnc;                                        // virtual neuron core size\n    int logical_path[NEC_KANGARING_MAX_NUM_RANKS];  // the logical kangaring path: p0 s0 p1 s1 ...\n    int prev;                                       // upstream\n    int next;                                       // downstream\n    int port;                                       // port to go to next\n\n    /* In VNC 2 case, this is the only peer. 
For primary ranks, it refer to their secondary rank;\n     * for secondary ranks, this refer to their primary rank.\n     * In VNC 1 case, it refers specifically to the peer over rmtv with the same tpb index.\n     */\n    int peer_rmtv;\n    /* In VNC 1 case, we have these 2 additional peers.\n     * peer_over_rmtv2 refers to the peer over rmtv with a different tpb index.\n     * peer_local refers to the local peer with a different tpb index\n     */\n    int peer_rmtv2;\n    int peer_local;\n\n    int next_peer_rmtv;                              // next's peer over rmtv\n\n    bool is_primary;                                // is self rank on data path?\n    bool is_next_pcie;                              // is next primary reached via pcie or d2d?\n    bool duplicate;                                 // is this a duplicate channel?\n    bool pattern2;                                  // is pattern 2?\n};\n\ntypedef enum metaring_type {\n    RING,\n    KANGARING,\n    SINGLE_CYCLE_RING,\n    RDH,\n    INVALID_METARING\n} metaring_type_t;\n\nstruct enc_alg_metaring {\n    int channel_n;\n    struct enc_channel channels[NEC_MAX_CHANNELS];\n\n    struct enc_ring ring_ranks[NEC_MAX_CHANNELS];\n    struct enc_kangaring kangaring_ranks[NEC_MAX_CHANNELS];\n    metaring_type_t type;\n\n    /* Does the group contain only on rank per device? This variable is set to true when NCCL\n     * returns device level H-cycles to runtime. In this case, we will parse that device H-cycle\n     * and generate ring paths on runtime side. We do this because we need to enforce certain\n     * pre-defined patterns in the paths so that we avoid dead locks between concurrent groups.\n     */\n    bool one_rank_per_device;\n    /* Hybrid ring is supported when RG have 4 H-cycles of one_rank_per_device */\n    bool is_hybrid_ring;\n    bool tokens_exchanged;    /* reinitialzed tokens from old metaring config*/\n    bool deadlock_free_rank_list;\n\n    struct enc_comm *comm; /* Backward reference to ENC comm */\n    struct encd_alg_metaring *drv_alg;\n\n    /* For use by src/tgt pairs only */\n    bool skip_send;\n    bool skip_recv;\n};\n\n/*\n * The order of the events matter here, so while adding a new event make sure the event is added\n * to the right section of the list\n * \n * ENC_COMMON_NUM_EVENT_TYPE:                           contains all common events between RDH-Mesh or A2A-mesh\n * ENC_MESH_NUM_EVENT_TYPE-ENC_COMMON_NUM_EVENT_TYPE:   contains events used by mesh\n * ENC_A2A_NUM_EVENT_TYPE-ENC_MESH_NUM_EVENT_TYPE:      contains events used by A2A only\n * ENC_RDH_NUM_EVENT_TYPE-ENC_A2A_NUM_EVENT_TYPE:       contains events used by RDH only\n *\n */\ntypedef enum enc_mesh_event_type {\n    EVT_SYNC,\n    EVT_GLOBAL_HNDSHK,\n    EVT_LOCAL_HNDSHK,\n    EVT_INTER_GRP_BRDCST,\n    EVT_FUNCTION_BARRIER_FIRST_COLL,\n    EVT_FUNCTION_BARRIER_LAST_COLL,\n    EVT_REDUCE_LOCAL_HNDSHK,\n    EVT_INTRA_GRP_BRDCST,\n    ENC_COMMON_NUM_EVENT_TYPE,\n\n    ENC_MESH_NUM_EVENT_START = ENC_COMMON_NUM_EVENT_TYPE,\n    EVT_REDUCE_COPY = ENC_COMMON_NUM_EVENT_TYPE,\n    EVT_REDUCE_COPY_2,\n    EVT_REDUCE_WRITE,\n    EVT_INTER_GRP_BRDCST_2,\n    EVT_LOCAL_AND_POD_GRP_BRDCST,\n    EVT_LOCAL_AND_POD_GRP_BRDCST_2,\n    ENC_MESH_NUM_EVENT_TYPE,\n\n    ENC_A2A_NUM_EVENT_START = ENC_MESH_NUM_EVENT_TYPE,\n    EVT_LOCAL_HNDSHK_1 = ENC_MESH_NUM_EVENT_TYPE,\n    EVT_LOCAL_HNDSHK_2,\n    EVT_GLOBAL_HNDSHK_1,\n    EVT_INTER_GRP_BRDCST_1,\n    EVT_INTRA_GRP_BRDCST_1,\n    EVT_2DEV_BRDCST,\n    EVT_2DEV_HNDSHK,\n    EVT_COPY_FROM_HOST,\n    
ENC_A2A_NUM_EVENT_TYPE,\n\n    ENC_RDH_NUM_EVENT_START = ENC_A2A_NUM_EVENT_TYPE,\n    EVT_RH_STEP_0 = ENC_A2A_NUM_EVENT_TYPE,\n    EVT_RH_STEP_1,\n    EVT_RH_STEP_2,\n    EVT_RH_STEP_3,\n    EVT_RH_STEP_4,\n    EVT_RH_STEP_5,\n    EVT_RH_STEP_6,\n    EVT_RH_STEP_7,\n    EVT_RH_STEP_8,\n    EVT_RH_STEP_9,\n    EVT_RDH_LOCAL_HANDSHAKE = EVT_RH_STEP_9,\n    EVT_RDH_AXES_HANDSHAKE,\n    EVT_RD_STEP_0,\n    EVT_RD_STEP_1,\n    EVT_RD_STEP_2,\n    EVT_RD_STEP_3,\n    EVT_RD_STEP_4,\n    EVT_RD_STEP_5,\n    EVT_RD_STEP_6,\n    EVT_RDH_AXES_HANDSHAKE_2,\n    EVT_1DEV_RDH_STEP_1,\n    EVT_1DEV_RDH_STEP_2,\n    EVT_1DEV_RD_STEP_1,\n    EVT_1DEV_RD_STEP_2,\n    EVT_1DEV_RH_STEP_1,\n    EVT_2DEV_RD_STEP_0,\n    EVT_2DEV_RD_STEP_1,\n    EVT_2DEV_RD_STEP_2,\n    EVT_2DEV_RD_STEP_3,\n    EVT_2DEV_RD_STEP_4,\n    EVT_RDH_LOCAL_PEER_HANDSHAKE,\n    ENC_RDH_NUM_EVENT_TYPE    // We assume each event is used only once\n                              // Enforced by encd_init_mesh_event()\n} enc_mesh_event_type_t;\n\n#define ENC_MESH_MAX_NUM_EVENTS 64\n\n#define KiB     (1024)\n#define MiB     (1024 * KiB)\n#define GiB     (1024 * MiB)\n\nstruct enc_mesh_nbr_grp {\n    int *ranks;\n    int ranks_n;\n};\n\nstruct enc_mesh_event {\n    struct enc_mesh_nbr_grp src_neighbor_grp;\n    struct enc_mesh_nbr_grp dst_neighbor_grp;\n    bool valid;\n    enc_mesh_event_type_t evt_type;\n};\n\ntypedef enum enc_alg_mesh_type {\n    ENC_ALG_FULL_MESH,\n    ENC_ALG_GROUPED_MESH,\n    ENC_ALG_MESH_TRN2,\n    ENC_ALG_MESH_SWITCH,\n    ENC_ALG_MESH_INVALID\n} enc_alg_mesh_type_t;\n\n/* TODO: In a separate commit we will change this to a cpp\n * file so we can have classes\n */\n#define ENC_MAX_OP_TYPES     (13)\nstruct enc_alg_mesh_subtype {\n    struct enc_mesh_event events[ENC_MESH_MAX_NUM_EVENTS];\n    int num_events;\n    struct encd_alg_mesh_subtype *drv_mesh;\n    struct enc_alg_mesh *mesh; /* backward reference */\n    size_t op_max_limit[ENC_MAX_OP_TYPES]; /* upper limit below which we will use mesh */\n    size_t op_min_limit[ENC_MAX_OP_TYPES]; /* lower limit above which we will use mesh */\n    size_t op_max_limit_sbuf[ENC_MAX_OP_TYPES]; /* upper limit below which we will use mesh for 2D tensors */\n    size_t op_min_limit_sbuf[ENC_MAX_OP_TYPES]; /* lower limit above which we will use mesh for 2D Tensors */\n    bool no_inplace_support;\n    bool is_use_chnl_buffer; /* Whether channel bufer will be used or not */\n    bool is_rdh;\n    bool is_single_step_mesh;\n    bool is_two_step_pod_mesh;\n    bool is_latency_opt;\n    bool is_bw_opt;\n    bool is_rmv_dst_routing;\n    uint32_t alltoall_iteration;\n};\n\n#define ENC_MAX_MESH_SUBTYPES         (20)\n#define ENC_MESH_MAX_NUM_DEVICES      (128)\n\nstruct enc_alg_mesh {\n    enc_alg_mesh_type_t mesh_type;\n\n    union {\n        struct {\n            uint32_t devid_to_rankid[ENC_MESH_MAX_NUM_DEVICES];\n            /* Whether it is a single or a multi chip mesh */\n            bool is_multi_chip;\n        } trn2;\n        struct {\n            int num_non_net_node_local_groups;\n        } trn1;\n        struct {\n            bool root_rank;\n            int num_intra_group_roots;\n            int local_root_ids[ENC_MESH_MAX_NUM_DEVICES];\n            int global_root_ids[ENC_MESH_MAX_NUM_DEVICES];\n        } inf2;\n    };\n    int group_id;\n    int num_groups;\n    /* Mesh uses only a single channel */\n    struct enc_channel channel;\n    struct enc_alg_mesh_subtype mesh_subtype[ENC_MAX_MESH_SUBTYPES];\n\n    /* Holds maximum amt of data a single group is allowed to 
deposit into\n     * the channel buffer. The definition of a group varies by platform type.\n     * On TRN1, TRN2 a group currently consists of all or some ranks from a\n     * single chip but on INF2 it refers to a collection of chips. The concept\n     * of a group exists to avoid traffic replication on the wire by combining\n     * input data from multiple ranks within a group before sending it outside\n     * of the group. Therefore at the destination side we only receive a single\n     * chunk of data per group.\n     */\n    size_t max_chbuf_space_per_group;\n    /* Valid only for TRN2. For TRN2 to prevent AXI deadlock we avoid on-chip\n     * routing at the destination chip and deposit data in the HBM closest to\n     * the entry port. So the rank owning that HBM receives data on behalf of\n     * other ranks on that same chip. This is why we need to carve out dedicated\n     * channel buf space for each of the other s-engines on the same chip.\n     */\n    size_t max_chbuf_space_per_seng;\n    /* Valid only for single step mesh where we directly copy the entire input\n     * buffer into another rank's channel buffer.\n     */\n    size_t max_chbuf_space_per_rank;\n\n    /* Whether to use double buffer to skip global handshake */\n    bool double_buffer;\n\n    /* Whether to build RDH */\n    bool build_rdh;\n    bool rdh_double_buffer;\n    void *rdh_devmem_res;  /*intra rdh channel buffer */\n    bool use_2dev_proxy;\n\n    bool tokens_exchanged;    /* reinitialzed tokens from old mesh config*/\n\n    bool use_net;               /* Whether inter-node mesh with network proxy is used or not */\n\n    /* Backward references to NCCL comm and general cluster info.\n     * These might come from enc_comm or enc_alg_hier\n     */\n    struct enc_nccl_comm_node *nccl_comm_node; /* Reference to NCCL comm */\n    struct enc_comm_info *ci; /* General cluster information */\n\n    struct enc_comm *comm; /* Backward reference to ENC comm */\n    struct encd_alg_mesh *drv_alg;\n\n    /*\n     * DMA mapped memory to host dedicated for A2Av metadata available only during\n     * execution.\n     */\n    enc_host_mem_t alltoallv_host_input;\n};\n\nstruct enc_alg_hier {\n    struct {\n        struct enc_nccl_comm_node *nccl_comm_node;\n        struct enc_comm_info ci;\n\n        struct enc_alg_metaring ring;\n        struct enc_alg_metaring kangaring;\n        struct enc_alg_mesh mesh;\n    } intra;\n\n    struct {\n        struct enc_nccl_comm_node *nccl_comm_node;\n        struct enc_comm_info ci;\n\n        struct enc_alg_metaring ring;\n        struct enc_alg_metaring rdh;\n        struct enc_alg_mesh mesh;\n    } inter;\n\n    struct {\n        struct {\n            struct enc_nccl_comm_node *nccl_comm_node;\n            struct enc_comm_info ci;\n\n            struct enc_alg_metaring ring;\n        } stage[NEC_HIER_CC_PIPELINE_STAGE_N];\n    } pipeline;\n\n    void* devmem_res; /* Hierarchical Reduce Scatter uses intermediate buffer */\n\n    struct enc_comm *comm; /* Backward reference to ENC comm */\n    struct encd_alg_hier *drv_alg;\n};\n\n/**\n * Comm info to query from NCCL\n */\ntypedef struct nccl_comm_info {\n    /* General cluster information */\n    uint64_t cluster_id; // randomly generated id used to identify unique clusters in log metrics\n    time_t epoch; // the epoch of the initial barrier at the start of a collectives execution. 
Used when generating core dumps so that all ranks agree on a datetime.\n\n    int neuron_dev;\n    int rank;\n    int rank_n;\n    int local_rank_n;\n    int local_rack_rank_n;\n\n    int node;\n    int node_n;\n\n    bool enable_pod;\n    bool use_net; /* Whether network interface is used or not with the communicator */\n    int pod;\n    int pod_n;\n    int pod_node;\n    int pod_node_n;\n\n    struct enc_peer_info *peers; /* Needs to be allocated before calling ncclGetCommInfo() or NULL if peers info is not needed */\n\n    /* Ring algorithm information */\n    int channel_n;\n    struct enc_ring rings[NEC_MAX_CHANNELS];\n\n    /* Kangaring algorithm information */\n    int kangaring_channel_n;\n    int* kangaring_paths[NEC_MAX_CHANNELS];\n\n    /* Hamiltonian cycles of MLAs, used to construct 1-rank-per-mla rings */\n    int mla_cycle_n;\n    int* mla_cycles[NEC_MAX_CHANNELS];\n} nccl_comm_info_t;\n\ntypedef struct enc_nccl_comm_node {\n    void *nccl_comm;\n    char *key;\n    size_t key_sz;\n    /* Tracking the graph information in the nccl_comm. We can use\n     * ncclGetCommInfo() but it's expensive. Instead, simply track the graph\n     * information here. This flag can only be changed from true to false. The\n     * other way is not possible.\n     */\n    bool disable_graph;\n    bool global_nccl_comm_node;\n    int refcnt;\n    uint32_t stream_id;\n    uint32_t context_id;\n\n    uint32_t num_local_participants;\n    uint32_t num_local_leaders;\n    uint32_t my_local_leader;\n    uint32_t *local_participants;\n    uint32_t *local_leaders;\n    struct bp_barrier *local_barrier;\n    bool intra_pod_interface; /* When intra-pod interface is used, we can't skip the execution barrier */\n} enc_nccl_comm_node_t;\n\n/* Neuron Device information. This data structure is used to send the device information from NRT to\n * nccom for nccl communicator building.\n */\n#define ENC_PROXY_HISTOGRAM_OUTPUT_PATH_LENGTH_MAX (128)\ntypedef struct enc_proxy_histogram_config {\n    bool enable;\n    size_t bucket_usecs;\n    size_t num_buckets;\n    size_t per_neff_warmup;\n    size_t warmup;\n    char output_path[ENC_PROXY_HISTOGRAM_OUTPUT_PATH_LENGTH_MAX];\n} enc_proxy_histogram_config_t;\n \ntypedef struct enc_neuron_device_info {\n    int nec_dev_id;\n    int mla_idx;\n    int tpb_idx;\n    int host_device_id;\n    int routing_id;\n    uint64_t pod_id;\n    nec_pod_type_t pod_type;\n    uint32_t pod_node_id;\n    uint32_t virtual_server_id;\n    enc_proxy_histogram_config_t histogram_config;\n} enc_neuron_device_info_t;\n\n/**\n * Collective communicator corresponding to ncclComm structure\n *\n * enc_comm is the Collective Comm that holds all the necessary information to\n * execute a collective operation. This should be pre-set before operations are\n * posted mainly because of the topology information built upon physical\n * connectivity. Collective operations are executed on multiple channels and a\n * channel is a path for data transfer along a pre-built topology.\n */\nstruct enc_comm {\n    struct enc_nccl_comm_node *nccl_comm_node; /* Reference to NCCL comm */\n    struct enc_comm_info ci; /* General cluster information */\n    int id;\n    int stream_id;\n\n    /*\n     * Algorithms\n     */\n    struct enc_alg_metaring ring;\n    struct enc_alg_metaring kangaring;\n    struct enc_alg_metaring rdh;\n    struct enc_alg_hier hier;\n    struct enc_alg_mesh mesh;\n\n    /**\n     * Use these handles to share network connector buffers across NEFFs.\n     * Only used in global comm. 
Other comms will refer to the global comm to reuse them.\n     * We use net_conn_count to sequentially assign these reservations to network connectors\n     * to make sure:\n     * 1) different comms in a NEFF don't reuse the same buffer (for multi-stream cases)\n     * 2) for each NEFF, we always start with index 0 and go up for the most overlap and\n     *    reusability. We reset net_conn_count to 0 in enc_load_operations\n     */\n    int net_conn_count;\n    void* net_connector_devmem_res[NEC_MAX_NET_BUFFERS];\n\n    // TODO: nr_channel_chunks and chunk_size should not be a comm property anymore\n    int nr_channel_chunks; /* Channel buffer depth, applies to all channels */\n    size_t chunk_size; /* Unit of transfer, applies to all channels */\n\n    struct encd_comm *drv_comm; /* Reference to driver comm */\n\n    char topology[1024]; /* Used for debugging purposes only to print the topology in case of an error */\n};\n\n/**\n * Global communicator\n */\nstruct enc_glb_comm {\n    uint32_t g_device_id; /* Same as comm->rank */\n    uint32_t g_device_cnt; /* Same as comm->rank_n */\n    uint32_t vtpb_idx;\n    int nec_dev_id;\n    int mla_idx;\n    /* Absolute neuron device hw id. This is the ID that the driver\n       exposes the neuron device on to the host system aka OS. Neuron devices\n       are exposed to RT by a different ID in case docker remaps\n       devices */\n    int host_device_id;\n    int routing_id;\n    uint32_t virtual_server_id;\n    nec_pod_type_t pod_type;\n    uint32_t pod_node_id;\n    uint32_t pod_sz;\n    uint64_t pod_id;\n    const char *root_comm_id; /* By getenv in nrt_config */\n    bool check_sigs;          /* By getenv in nrt_config */\n\n    uint32_t *rank_nodes; /* The node index of each rank */\n    uint32_t *local_ranks; /* The intra-node rank of each rank */\n\n    enc_nccl_comm_node_t nccl_comm_node; /* nccl_comm node can be used by any stream */\n\n    struct bananaphone *local_rings;\n    struct bp_handle *local_peer_handles;\n\n    /**\n     * A set of buffers containing values that are used to\n     * increment semaphores over efa transactions.\n     */\n    uint32_t *inc_recv_sem_values_buffer;\n    size_t inc_recv_sem_values_buffer_size;\n\n    struct enc_comm comm;\n\n    /* TODO: manage all the devmem reservations in a single place\n     * Today we share the buffers under the below path:\n     * enc_glb_comm->comm->ring.channels[ring_channel_id].devmem_res\n     * We need to move the above reservations and the one below to a\n     * singleton class e.g. 
enc_glb_comm->devmem_res_pool\n     */\n    void* inter_rdh_devmem_res[NEC_MAX_STREAM_N];\n    /* TODO: manage all the devmem reservations in a single place\n     * this mem res is referred by comm->rdh.rdh_devmem_res\n     */\n    void* intra_rdh_devmem_res[NEC_MAX_STREAM_N];\n\n    void* mesh_devmem_res_per_rg[NEC_MAX_STREAM_N * NEC_MAX_COMM_N * H_COMM_MAX_ID];\n\n    void* rdh_devmem_res_per_rg[NEC_MAX_STREAM_N * NEC_MAX_COMM_N];\n\n    void *gateway_devmem_res[NEC_MAX_STREAM_N][NEC_MAX_CHANNELS];\n\n    pthread_mutex_t gcomm_setup_mtx;\n\n    void *proxy_queue;   // opaque pointer to enc_proxy_queue\n\n    void *device_barrier_table;\n};\n\n/**\n * Network transport FIFOs\n *\n * Host send proxy should know the EFA buffer index, offset in the buffer and the size of\n * each data tranfer to send to remote device and recv proxy\n * needs destination addresses for each data from sender to submit network receive request.\n * Send and recv proxy should know when to report the completion of using\n * EFA buffer and complete is used to notify it.\n *\n * Such information is recorded when operation is loaded and becomes available on execution. Host\n * proxy uses these APIs to query the recorded FIFO.\n */\n\n/**\n * A net_ops_info_t entry corresponds to a set of smaller operations that are defined by multiple\n * net_src_addr_t and net_dest_addr_t. These sub operations can correspond to different types of\n * actions, so store a net_addr_mark_t identifier in each net_src_addr_t or net_dest_addr_t entry\n * to denote the purpose of the sub-operation.\n */\ntypedef enum net_addr_mark {\n    NET_TRANSFER,       /* Will drive data transfer over EFA */\n    NET_OP_COMPLETE,    /* Will mark final completion of a collective operation */\n    EXEC_COMPLETE       /* Will mark final completion of a collective load execution */\n} net_addr_mark_t;\n\ntypedef struct net_src_addr {\n    uint32_t net_op_idx;\n    int complete;\n    dma_addr_t dev_addr;\n    void *host_addr;\n    void *nccl_mhandle;\n    uint32_t size;\n    net_addr_mark_t mark;\n    void* proxy_histogram_tag;\n     /* Fields below are for mesh only */\n    int dst_rank;\n    /* For local RDMA read */\n    void *dst_addr;\n    void *dst_mhandle;\n} net_src_addr_t;\n\ntypedef struct net_dest_addr {\n    uint32_t net_op_idx;\n    int complete;\n    dma_addr_t dev_addr;\n    void *host_addr;\n    void *nccl_mhandle;\n    uint32_t size;\n    net_addr_mark_t mark;\n    /* Fields below are for mesh only */\n    int src_rank;\n} net_dest_addr_t;\n\ntypedef struct net_ops_info {\n    uint16_t sema_shift_offset;\n    bool early_send_completion;\n    bool early_recv_posting;\n    volatile uint32_t *inc_send_handshake;\n    volatile uint32_t *inc_send_complete;\n    volatile uint32_t *inc_recv_handshake;\n    volatile uint32_t *inc_recv_complete;\n    uint32_t tx_entry_cnt;\n    uint32_t rx_entry_cnt;\n    uint32_t net_idx_loop_size;\n    uint32_t initial_send_credits;\n    uint32_t ending_recv_credits;\n    size_t data_type_sz;\n    bool is_dynamic_send_recv_sz;\n    bool variable_peer;\n    bool add_to_histogram;\n    /*\n     * proxy uses this pointer to get connector information from transaction\n     * saddr/daddr fifo entry of each operation.\n     */\n    void *enc_channel;\n} net_ops_info_t;\n\n/**\n * API for proxy-thread to increase handshake and send/recv semaphores by writing directly to the\n * memory mapped semaphore inc register.\n * For more information, see documentation on struct enc_net_host_memory definition.\n */\nvoid 
nec_inc_semaphore(volatile uint32_t *sem_inc_addr, uint32_t val);\n\n/**\n * API for proxy-thread to get dynamic send size and offset for the case where message\n * size is determined by data only available during execution.\n */\nsize_t nec_get_dynamic_send_size_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int dst_rank, int rank_n);\nsize_t nec_get_dynamic_send_offset_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int dst_rank, int rank_n);\nsize_t nec_get_dynamic_recv_offset_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int src_rank, int rank_n);\nvoid nec_set_recv_size_bytes(enc_host_mem_t *dyn_input, size_t recv_size_bytes, size_t data_type_sz, int src_rank, int rank_n);\n\n/**\n * Query device information\n */\nint nec_get_device_count(int *available_devices_array, uint32_t array_size);\nint nec_get_device_pci_bdf(int neuron_dev, uint32_t *domain, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func);\n\n/**\n * Query vcore size\n */\nNRT_STATUS nec_get_virtual_core_size(uint32_t *virtual_core_size);\n\ntypedef struct nec_version_info {\n    uint64_t major;\n    uint64_t minor;\n    uint64_t patch;\n    uint64_t maintenance;\n    char git_hash[16];\n    uint64_t compatibility_version;\n    // Any new fields added need to be here. The fields before this cannot be\n    // changed to maintain backward compatibility\n    uint8_t future_fields[];\n} nec_version_info_t;\n\nNRT_STATUS nec_get_version_info(nec_version_info_t *version_info);\n\nNRT_STATUS nec_build_port_and_rid_map(int local_nec_dev_id, int *mla_indexes, int *host_device_ids, int count);\n\nbool nec_is_mla_available(int local_nec_dev_id, int mla_idx);\n\nint nec_mla_idx_to_rid(int local_nec_dev_id, int mla_idx);\n\nint nec_rid_to_mla_idx(int local_nec_dev_id, int rid);\n\nint nec_get_peer_mla_idx(int local_nec_dev_id, int mla_idx, int port);\n\nint nec_get_p2p_pod_peer_node(uint32_t nec_dev_id, int node, uint32_t port_distance,\n                              int *peer_node);\n\nNRT_STATUS nec_pod_node_can_access_peer_node(nec_pod_type_t pod_type,\n                                             uint32_t local_rid, uint32_t local_node_id,\n                                             uint32_t remote_rid, uint32_t remote_node_id,\n                                             int *can_access_peer);\n\nvoid nec_ndl_printk(char *str, uint32_t size, uint32_t action);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
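  {
    "path": "docs/examples/nec_vcore_query_sketch.c",
    "content": "/*\n * Illustrative usage sketch (hypothetical file, not part of the runtime sources).\n * Shows how a caller could query the Virtual Core size exposed by the Neuron\n * collectives layer. The prototype below is re-declared from the internal\n * collectives header in this repository; real code would include that header\n * instead. Linking against the runtime library is assumed.\n */\n#include <stdint.h>\n#include <stdio.h>\n#include \"nrt/nrt_status.h\"\n\n/* Re-declared from the internal collectives header for this sketch. */\nNRT_STATUS nec_get_virtual_core_size(uint32_t *virtual_core_size);\n\nint main(void)\n{\n    uint32_t vcore_size = 0;\n    NRT_STATUS st = nec_get_virtual_core_size(&vcore_size);\n    if (st != NRT_SUCCESS) {\n        fprintf(stderr, \"nec_get_virtual_core_size failed: %d\\n\", (int)st);\n        return 1;\n    }\n    printf(\"virtual core size: %u\\n\", vcore_size);\n    return 0;\n}\n"
  },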
  {
    "path": "src/libnrt/include/nrt/nrt.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <stdbool.h>\n#include <stddef.h>\n#include <stdint.h>\n// Use quoted includes in nrt headers including other nrt headers. Most clients\n// (ptxla, jax, etc.) build with bazel, and bazel has issue with angle-brackets.\n// See https://bazel.build/docs/bazel-and-cpp#include-paths for details.\n#include \"nrt/nrt_status.h\"\n#include \"ndl/neuron_driver_shared_tensor_batch_op.h\"\n\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n/** Major and minor version of runtime. */\n#define NRT_MAJOR_VERSION 2\n#define NRT_MINOR_VERSION 0\n\ntypedef struct nrt_model nrt_model_t;\n\ntypedef struct nrt_tensor nrt_tensor_t;\n\ntypedef struct nrt_cc_context nrt_cc_context_t;\n\n/**\n * WARNING: Do not change the value of existing enums!\n * These values will be used by libnrt consumers, we\n * cannot change the defines under them, only append.\n */\ntypedef enum {\n    NRT_TENSOR_PLACEMENT_DEVICE,\n    NRT_TENSOR_PLACEMENT_HOST,\n    NRT_TENSOR_PLACEMENT_VIRTUAL,\n} nrt_tensor_placement_t;\n\ntypedef enum {\n    NRT_FRAMEWORK_TYPE_INVALID = 0,             // Invalid\n    NRT_FRAMEWORK_TYPE_NO_FW = 1,               // Framework less execution\n    NRT_FRAMEWORK_TYPE_TENSORFLOW,              // Tensorflow\n    NRT_FRAMEWORK_TYPE_PYTORCH,                 // Pytorch\n    NRT_FRAMEWORK_TYPE_MXNET,                   // Mxnet\n    NRT_FRAMEWORK_TYPE_PRECHECK,                // Neuron Node Precheck\n} nrt_framework_type_t;\n\nenum {\n    NRT_INSTANCE_UNKNOWN    = 0,\n    NRT_INSTANCE_INF1       = 1,\n    NRT_INSTANCE_TRN1       = 2,\n    NRT_INSTANCE_TRN1N      = 3,\n    NRT_INSTANCE_INF2       = 4,\n    NRT_INSTANCE_TRN2       = 5,\n    NRT_INSTANCE_TRN2N      = 6,\n    NRT_INSTANCE_INF2E      = 7,\n    NRT_INSTANCE_TRN2P      = 8,\n    NRT_INSTANCE_TRN2U      = 9,\n    NRT_INSTANCE_TRN2E      = 10,\n    NRT_INSTANCE_TRN2EU     = 11,\n    NRT_INSTANCE_TRN2AC     = 12,\n    NRT_INSTANCE_TRN2UAC    = 13,\n    NRT_INSTANCE_TRN3       = 14,\n    NRT_INSTANCE_TRN3PDS98  = 15\n};\n\nenum {\n    NRT_INSTANCE_SIZE_1XL,\n    NRT_INSTANCE_SIZE_2XL,\n    NRT_INSTANCE_SIZE_4XL,\n    NRT_INSTANCE_SIZE_6XL,\n    NRT_INSTANCE_SIZE_8XL,\n    NRT_INSTANCE_SIZE_24XL,\n    NRT_INSTANCE_SIZE_32XL,\n    NRT_INSTANCE_SIZE_48XL,\n    NRT_INSTANCE_SIZE_3XL,\n    // Note: Add new sizes right above this line to prevent breaking backward compatibility\n\n    NRT_INSTANCE_SIZE_UNKNOWN,\n    NRT_INSTANCE_SIZE_NUM = NRT_INSTANCE_SIZE_UNKNOWN,\n};\n\ntypedef enum nrt_op_type {\n    NRT_OP_ADD     = 0x0,\n    NRT_OP_FMA     = 0x1,\n    NRT_OP_MAX     = 0x2,\n    NRT_OP_MIN     = 0x3,\n    NRT_OP_INVALID = 0xF,\n} nrt_op_type_t;\n\ntypedef enum nrt_dtype {\n    NRT_DTYPE_UNKNOWN  = 0x0,\n    NRT_DTYPE_INVALID  = 0x0,\n    NRT_DTYPE_FP8_E3   = 0xD,\n    NRT_DTYPE_FP8_E4   = 0xE,\n    NRT_DTYPE_FP8_E5   = 0xF,\n    NRT_DTYPE_FLOAT16  = 0x7,\n    NRT_DTYPE_BFLOAT16 = 0x6,\n    NRT_DTYPE_FLOAT32  = 0xA,\n    NRT_DTYPE_FP32R    = 0xB,\n    NRT_DTYPE_UINT8    = 0x3,\n    NRT_DTYPE_UINT16   = 0x5,\n    NRT_DTYPE_UINT32   = 0x9,\n    NRT_DTYPE_UINT64   = 0x1,\n    NRT_DTYPE_INT8     = 0x2,\n    NRT_DTYPE_INT16    = 0x4,\n    NRT_DTYPE_INT32    = 0x8,\n    NRT_DTYPE_INT64    = 0xC,\n} nrt_dtype_t;\n\ntypedef enum nrt_cc_op_type {\n    NRT_CC_ALLGATHER,\n    NRT_CC_ALLREDUCE,\n    NRT_CC_REDUCESCATTER\n} nrt_cc_op_type_t;\n\ntypedef struct nrt_instance_info {\n    uint32_t family;\n    uint32_t size;\n    char arch_name[16];\n    
char device_revision[8];\n} nrt_instance_info_t;\n\nNRT_STATUS nrt_get_instance_info(nrt_instance_info_t *info, size_t instance_info_len);\n\n/** Initialize neuron runtime.\n *\n * @param framework[in]      - Type of the framework.\n * @param fw_version[in]     - Framework version as string. (eg 2.1)\n * @param fal_version[in]    - Framework Abstraction Layer version as string.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_init(nrt_framework_type_t framework, const char *fw_version, const char *fal_version);\n\n/** Closes all the devices and cleans up the runtime state.\n */\nvoid nrt_close();\n\n/** Load given NEFF and place it in one or more neuron cores.\n *\n * @param neff_bytes[in]    - Pointer to NEFF data.\n * @param size[in]          - Length of the NEFF data.\n * @param vnc[in]           - VNC index where the NEFF should be loaded(-1 means runtime would automatically load in first free VNC).\n * @param vnc_count[in]     - DEPRECATED: always use -1\n * @param model[out]        - Resulting model would be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_load(const void *neff_bytes, size_t size, int32_t vnc, int32_t vnc_count, nrt_model_t **model);\n\n/** Load given NEFF for collective operations and place it in one or more neuron cores.\n *\n * If global NCCL communicator was not previously created, we will create it inside this API with the assumption that\n * global device id is same as ctx_device_id and global device count is same as ctx_device_count.\n *\n * @param neff_bytes[in]        - Pointer to NEFF data.\n * @param size[in]              - Length of the NEFF data.\n * @param vnc[in]               - VNC index where the NEFF should be loaded(-1 means runtime would automatically load in first free VNC).\n * @param vnc_count[in]         - DEPRECATED: always use -1\n * @param ctx_device_id[in]     - Device ID relative to the number of devices participating in this NEFF\n * @param ctx_device_count[in]  - Number of devices participating in collectives operations in this NEFF\n * @param model[out]            - Resulting model would be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_load_collectives(const void *neff_bytes, size_t size, int32_t vnc, int32_t vnc_count,\n                                uint32_t ctx_device_id, uint32_t ctx_device_count, nrt_model_t **model);\n\n/** Unload given model and free up device and host resources.\n *\n * @param model - Model to unload.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_unload(nrt_model_t *model);\n\n/** Get the number of VNCs used by a loaded model. (deprecated)\n *\n * @param model[in] - Model.\n * @param vnc_count[out] - The number of VNCs used by the model.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_model_nc_count(const nrt_model_t *model, uint32_t *vnc_count);\n\n/** Get the number of VNCs used by a loaded model. (deprecated)\n *\n * @param model[in] - Model.\n * @param vnc_count[out] - The number of VNCs used by the model.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_model_vnc_count(const nrt_model_t *model, uint32_t *vnc_count);\n\n/** Returns VirtualNeuronCores available in instance. 
(deprecated)\n *\n * @param vnc_count[out] - VirtualNeuronCores available in instance.\n *\n * @note This API can be called before nrt_init().\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_total_nc_count(uint32_t *vnc_count);\n\n/** Returns VirtualNeuronCores available in instance.\n *\n * @param vnc_count[out] - VirtualNeuronCores available in instance.\n *\n * @note This API can be called before nrt_init().\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_total_vnc_count(uint32_t *vnc_count);\n\n/** Returns VirtualNeuronCores visible to the application. (deprecated)\n *\n * @param vnc_count[out] - VirtualNeuronCores visible to the application.\n *\n * @note This API can be called before nrt_init().\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_visible_nc_count(uint32_t *vnc_count);\n\n/** Returns VirtualNeuronCores visible to the application.\n *\n * @param vnc_count[out] - VirtualNeuronCores visible to the application.\n *\n * @note This API can be called before nrt_init().\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_visible_vnc_count(uint32_t *vnc_count);\n\n/** A container to hold multiple tensors */\ntypedef void nrt_tensor_set_t;\n\n/** Allocates a new tensor set.\n *\n * @param result[out]       - Pointer to newly allocated tensor set would be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_allocate_tensor_set(nrt_tensor_set_t **result);\n\n/** Destroys given tensor_set and frees memory.\n *\n * @param tensor_set[in]    - Tensors set to be freed.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nvoid nrt_destroy_tensor_set(nrt_tensor_set_t **tensor_set);\n\n/** Add/replace given tensor to tensor set\n *\n * @param tensor_set[in]    - Tensor set to which the tensor is added.\n * @param tensor_name[in]   - Name of the tensor.\n * @param tensor[in]        - Pointer to tensor. 
This pointer should be valid till nrt_destroy_tensor_set() is called.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_add_tensor_to_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t *tensor);\n\n/** Get a tensor's info from a tensor set.\n *\n * @param tensor_set[in]    - Tensor set.\n * @param tensor_name[in]   - Name of the tensor.\n * @param tensor[out]       - Pointer to tensor would be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_tensor_from_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t **tensor);\n\n/** Execute given model with given inputs and collect outputs.\n *\n * @param model[in] - Model to execute.\n * @param input_set[in] - Set of input tensors.\n * @param output_set[in] - Set of output tensors.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_execute(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set);\n\n/** Execute given model with given inputs, repeat execution specified number of times and collect outputs.\n *\n * @param model[in] - Model to execute.\n * @param input_set[in] - Set of input tensors.\n * @param output_set[in] - Set of output tensors.\n * @param repeat_count[in] - Number of times to repeat execution.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_execute_repeat(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set, int repeat_count);\n\n/** Build (initialize and setup) NCCL global communicator.\n *\n * @param vnc[in]               - Local VNC (within the instance)\n * @param g_device_id[in]       - Global device id\n * @param g_device_count[in]    - Max world size of all neffs that will be executed\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_build_global_comm(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count);\n\n/** Allocates a tensor that can be passed and used by a model for compute.\n *\n * @param tensor_placement[in]  - Where the tensor would be allocated (device, host, or virtual memory)\n * @param vnc[in]               - Virtual Neuron Core id to allocate the tensor on. Pass in -1 if allocating tensors on host memory.\n * @param size[in]              - Size in bytes of the tensor to allocate.\n * @param name[in]              - OPTIONAL. 
Name of the tensor.\n * @param tensor[out]           - Pointer to newly created tensor will be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_allocate(nrt_tensor_placement_t tensor_placement, int vnc, size_t size, const char *name, nrt_tensor_t **tensor);\n\n/** Deallocates a tensor created by \"nrt_tensor_allocate\".\n *\n * @param tensor[in]    - Deallocates given tensor.\n *\n * @return None\n */\nvoid nrt_tensor_free(nrt_tensor_t **tensor);\n\n/** Copies data from tensor to passed in buffer.\n *\n * @param tensor[in]    - Tensor used to reference the tensor to read from.\n * @param buf[out]      - Buffer used to store data read from the tensor.\n * @param offset[in]    - Offset into the tensor to read from.\n * @param size[in]      - Number of bytes to read.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_read(const nrt_tensor_t *tensor, void *buf, size_t offset, size_t size);\n\n/** Copies data from passed in buffer to tensor.\n *\n * @param tensor[in/out]    - Tensor used to reference the tensor to write to.\n * @param buf[in]           - Buffer used to store data to write to the tensor.\n * @param offset[in]        - Offset into the tensor to write to.\n * @param size[in]          - Number of bytes to write.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_write(nrt_tensor_t *tensor, const void *buf, size_t offset, size_t size);\n\n/** A batch of tensor operations on a single tensor */\n// the definition of nrt_tensor_batch_op_t is in neuron_driver_shared_tensor_batch_op.h\ntypedef struct nrt_tensor_batch {\n    const nrt_tensor_t *tensor;        // Tensor handle\n    const nrt_tensor_batch_op_t *ops;  // Array of operations for this tensor\n    uint32_t num_ops;            // Number of operations for this tensor\n} nrt_tensor_batch_t;\n\n/** Batch read data from multiple tensors.\n *\n * @param batches[in]     - An array of batches, each of which describes operations on one tensor\n * @param num_batches[in] - Number of batches (tensors) in the array\n * @param unsafe[in]      - If true, skip tensor tracking/blocking (use with caution)\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_read_batch(const nrt_tensor_batch_t *batches, uint64_t num_batches, bool unsafe);\n\n/** Batch write data to multiple tensors.\n *\n * @param batches[in]     - An array of batches, each of which describes operations on one tensor\n * @param num_batches[in] - Number of batches (tensors) in the array\n * @param unsafe[in]      - If true, skip tensor tracking/blocking (use with caution)\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_write_batch(const nrt_tensor_batch_t *batches, uint64_t num_batches, bool unsafe);\n\n/** Copies data between tensors.\n *\n * When copying between two device tensors, they must both be allocated on the SAME Neuron Core.\n * A NRT_INVALID will be returned in the failing case.\n *\n * @param src[in]           - Tensor to copy from.\n * @param src_offset[in]    - Offset into the source tensor to copy from.\n * @param dst[out]          - Tensor to copy to.\n * @param dst_offset[in]    - Offset into the destination tensor to copy to.\n * @param size[in]          - Number of bytes to copy.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_copy(const nrt_tensor_t *src, size_t src_offset, nrt_tensor_t *dst, size_t dst_offset, size_t size);\n\n/** Gets the size of the passed in tensor.\n *\n * @param tensor[in]    - 
Tensor used to reference the tensor to get size of.\n *\n * @return Size of the tensor.\n */\nsize_t nrt_tensor_get_size(const nrt_tensor_t *tensor);\n\n/** Set the memory + offset pointed to by tensor to value\n *\n * @param tensor[in]        - allocated tensor\n * @param offset[in]        - offset within the tensor\n * @param value[in]         - value to set with\n * @param size[in]          - size of memory to set\n *\n * @return 0 on success.\n */\nNRT_STATUS nrt_tensor_memset(nrt_tensor_t *tensor, uint64_t offset, int value, size_t size);\n\n/** Allocates an empty tensor, i.e. the tensor structure w/o any attached storage\n *\n * @param name[in]              - OPTIONAL. Name of the tensor.\n * @param tensor[out]           - Pointer to newly created tensor will be stored here.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_allocate_empty(const char *name, nrt_tensor_t **tensor);\n\n/** Attaches caller supplied buffer to a tensor.  Any storage previously attached to the tensor is detached\n *  and freed if was owned by the tensor.\n *  The buffer is supplied by the caller and must persist through the entire lifetime of the tensor.\n *\n * @param tensor[in]            - Tensor\n * @param buffer[in]            - Caller supplied buffer to use as tensor's storage\n * @param size[in]              - Buffer Size\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_tensor_attach_buffer(nrt_tensor_t *tensor, void *buffer, size_t size);\n\n/** Creates a tensor to point to a slice of another tensor\n *  does not do a deep copy, just points the \"slice\" tensor storage to the \"source\" tensor storage\n *\n * @param tensor_source[in] - Tensor to point at\n * @param offset[in]        - Offset from the beginning of the source tensor to point at\n * @param size[in]          - Size of the slice\n * @param name[in]          - Optional name for the new tensor\n * @param tensor_slice[in]  - Newly allocated tensor to point to the storage of the source tensor\n *\n */\nNRT_STATUS nrt_tensor_allocate_slice( const nrt_tensor_t *tensor_source, size_t offset, size_t size, const char *name, nrt_tensor_t **tensor_slice);\n\n/** Given a tensor get the virtual address.\n *\n * @param tensor[in]        - Tensor for which the VA needs to be obtained\n *\n * @return va on success, NULL on failure.\n */\nvoid *nrt_tensor_get_va(const nrt_tensor_t *tensor);\n\n/** Returns on device allocation info for a tensor\n *\n * @param tensor[in]        - Tensor for which the information needs to be obtained\n * @param alloc_info[out]   - On device allocation information\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\ntypedef struct nrt_tensor_device_allocation_info {\n    uint64_t physical_address; // physical address in device memory space\n    size_t size;               // allocation size, could be larger than the tensor size\n    int hbm_index;             // which of the HBMs the tensor is placed\n} nrt_tensor_device_allocation_info_t;\nNRT_STATUS nrt_tensor_get_device_allocation_info(const nrt_tensor_t *tensor, nrt_tensor_device_allocation_info_t *alloc_info);\n\n/**\n * @brief A Runtime API to check if a given output tensor is fully written/complete.\n *        If timeout is given as unbounded, it emits a warning at the first 30 seconds.\n *\n * @param output_tensor:  The given output tensor.\n * @param timeout:        The maximum total duration to wait for tensor completion in microseconds.\n *                        If timeout is negative, the wait is unbounded. 
The caller is in charge of handling the timeout behaviors.\n *                        Otherwise, it checks completion until the timeout.\n * @param expected_completion_count:  The number of completions expected by the caller.\n *\n * @return NRT_STATUS:    It returns NRT_SUCCESS if the tensor is complete;\n *                        It returns NRT_INVALID, if the output tensor is given as NULL;\n *                        It returns NRT_TIMEOUT if the tensor does not reach the expected_completion_count within the timeout.\n */\nNRT_STATUS nrt_tensor_check_output_completion(const nrt_tensor_t *output_tensor,\n                                              int64_t timeout,\n                                              uint64_t expected_completion_count);\n\n/**\n * @brief A Runtime API to reset the completion counter inside an output tensor to 0.\n *\n * @param output_tensor:  The given output tensor.\n * @return NRT_STATUS:    It returns NRT_SUCCESS if reset is successful;\n *                        It returns NRT_INVALID, if the output tensor is given as NULL.\n */\nNRT_STATUS nrt_tensor_reset_output_completion(nrt_tensor_t *output_tensor);\n\n/**\n * @brief Get the anonymous file-descriptor of dma-buf associated with\n * a Neuron device memory region if it was registered for EFA peer direct\n *\n * @param addr[in]          - Device buffer virtual address\n * @param size[in]          - Device buffer size (in bytes)\n * @param fd[out]           - dma-buf fd\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_get_dmabuf_fd(uint64_t va, uint64_t size, int* fd);\n\n\n/**  Get the host based device id from the device id presented to runtime (which may be a container based device id)\n * @param neuron_dev[in]      - device id\n * @param host_device_id[out] - host device id\n * @return NRT_SUCCESS if call was successful, NRT_INVALID otherwise\n */\nNRT_STATUS nrt_host_device_id_get( int neuron_dev, uint32_t *host_device_id);\n\n/**  Return array of routing IDs indexed by host device ID. This is the definitive routing ID mapping provided from the driver\n * @param count[in/out]           - [in] number of entries in the mapping table provided. 
[out] count of entries returned\n * @param host_did_to_rid_map[in] - table/map of routing IDs indexed by host device ID\n * @return NRT_SUCCESS if call was successful, NRT_INVALID otherwise\n */\nNRT_STATUS nrt_host_device_id_rid_map_get(uint32_t *count, uint32_t *host_did_to_rid_map);\n\n/**\n * Get the HBM virtual address and size for a specific HBM index.\n * @param device_id[in]         - Device ID\n * @param hbm_idx[in]           - HBM index\n * @param addr[out]             - Pointer to store the virtual address\n * @param size[out]             - Pointer to store the size of the HBM region\n * @return NRT_SUCCESS if call was successful and HBM region was mapped\n *         NRT_INVALID_HANDLE if there are no more HBM regions to map for this device\n *         NRT_INVALID if the interface isn't supported or for invalid parameters\n *         NRT_FAILURE for other errors\n */\nNRT_STATUS nrt_get_hbm_mmap_va(int device_id, int hbm_idx, void **addr, size_t *size);\n\n\ntypedef struct nrt_vnc_memory_stats {\n    size_t bytes_used;\n    size_t bytes_limit;\n    // NOTE: For backward compatibility, when making updates, don't delete existing fields, and\n    //  ALWAYS add to the end of this struct!\n} nrt_vnc_memory_stats_t;\n\n/** Get the NRT memory stats for a VNC.\n *\n * @param vnc[in]             - Local VNC (within the instance)\n * @param stats[out]          - Pointer to a nrt_vnc_memory_stats struct\n * @param stats_size_in[in]   - Caller expected size of the nrt_vnc_memory_stats struct, for compatibility purposes\n * @param stats_size_out[out] - Library written size of the nrt_vnc_memory_stats struct, for compatibility purposes\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\n\nNRT_STATUS nrt_get_vnc_memory_stats(uint32_t vnc, nrt_vnc_memory_stats_t *stats, size_t stats_size_in, size_t *stats_size_out);\n\n/** Get BDF of the EFA device attached to a Neuron device identified by VA of HBM allocation on that device\n *\n * @param va[in]            - VA of a memory allocated on a Neuron devices\n * @param efa_bdf[out]      - a buffer (of sufficient size) to store BDF of the connected EFA device\n * @param len[in/out]       - in: length of buffer (including NULL), out: length of string (excluding NULL)\n *\n * @return NRT_SUCCESS on success\n *         NRT_RESOUCE if the buffer is not large enough to store the BDF string\n *         NRT_FAILURE for other errors\n */\n\nNRT_STATUS nrt_get_attached_efa_bdf(const void *va, char *efa_bdf, size_t *len);\n\n/******************************\n * Out-of-NEFF collectives    *\n ******************************/\n\ntypedef struct nrt_cc_comm {\n    uint32_t *replica_group; /* a list of participants */\n    uint32_t rank; /* my rank in the replica_group */\n    uint32_t rank_n; /* size of replica_group */\n\n    uint32_t ctx_device_id;\n    uint32_t ctx_device_count;\n    uint32_t vnc;\n} nrt_cc_comm_t;\n\ntypedef struct nrt_tensor_list {\n    nrt_tensor_t **tensors;\n    size_t num_tensors;\n} nrt_tensor_list_t;\n\n/** Build (initialize and setup) global communicator for host-driven collective operations.\n *\n * @param vnc[in]               - Local VNC (within the instance)\n * @param g_device_id[in]       - Global device id\n * @param g_device_count[in]    - Max world size of all participating workers\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_cc_global_comm_init(uint32_t vnc, uint32_t g_device_id, uint32_t g_device_count);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
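  {
    "path": "docs/examples/nrt_basic_flow_sketch.c",
    "content": "/*\n * Illustrative usage sketch (hypothetical file, not part of the runtime sources).\n * Walks the basic single-model flow declared in src/libnrt/include/nrt/nrt.h:\n * nrt_init -> nrt_load -> tensor and tensor-set setup -> nrt_execute -> teardown.\n * The NEFF path \"model.neff\", the tensor names \"input0\"/\"output0\", the 1 KiB\n * sizes and the use of VNC 0 are assumptions; real names and sizes come from the\n * compiled NEFF.\n */\n#include <stdio.h>\n#include <stdlib.h>\n#include \"nrt/nrt.h\"\n\nint main(void)\n{\n    /* Error handling is abbreviated; production code should check every call. */\n    if (nrt_init(NRT_FRAMEWORK_TYPE_NO_FW, \"2.x\", \"2.x\") != NRT_SUCCESS) return 1;\n\n    /* Read the NEFF bytes from disk (path is a placeholder). */\n    FILE *f = fopen(\"model.neff\", \"rb\");\n    if (f == NULL) { nrt_close(); return 1; }\n    fseek(f, 0, SEEK_END);\n    long neff_size = ftell(f);\n    fseek(f, 0, SEEK_SET);\n    void *neff = malloc(neff_size);\n    if (fread(neff, 1, neff_size, f) != (size_t)neff_size) {\n        free(neff); fclose(f); nrt_close(); return 1;\n    }\n    fclose(f);\n\n    /* Load on VNC 0; vnc_count is deprecated and must be -1. */\n    nrt_model_t *model = NULL;\n    NRT_STATUS st = nrt_load(neff, neff_size, 0, -1, &model);\n    free(neff);\n    if (st != NRT_SUCCESS) { nrt_close(); return 1; }\n\n    /* Allocate device tensors on the same VNC and stage the input data. */\n    nrt_tensor_t *in = NULL, *out = NULL;\n    nrt_tensor_allocate(NRT_TENSOR_PLACEMENT_DEVICE, 0, 1024, \"input0\", &in);\n    nrt_tensor_allocate(NRT_TENSOR_PLACEMENT_DEVICE, 0, 1024, \"output0\", &out);\n    char host_buf[1024] = {0};\n    nrt_tensor_write(in, host_buf, 0, sizeof(host_buf));\n\n    /* Group tensors into sets keyed by the NEFF tensor names. */\n    nrt_tensor_set_t *inputs = NULL, *outputs = NULL;\n    nrt_allocate_tensor_set(&inputs);\n    nrt_allocate_tensor_set(&outputs);\n    nrt_add_tensor_to_tensor_set(inputs, \"input0\", in);\n    nrt_add_tensor_to_tensor_set(outputs, \"output0\", out);\n\n    /* Run one execution and copy the result back to the host. */\n    st = nrt_execute(model, inputs, outputs);\n    if (st == NRT_SUCCESS) nrt_tensor_read(out, host_buf, 0, sizeof(host_buf));\n\n    /* Teardown in reverse order of creation. */\n    nrt_destroy_tensor_set(&inputs);\n    nrt_destroy_tensor_set(&outputs);\n    nrt_tensor_free(&in);\n    nrt_tensor_free(&out);\n    nrt_unload(model);\n    nrt_close();\n    return st == NRT_SUCCESS ? 0 : 1;\n}\n"
  },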
  {
    "path": "src/libnrt/include/nrt/nrt_async.h",
    "content": "/*\n * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <stdbool.h>\n#include <stddef.h>\n#include <stdint.h>\n// Use quoted includes in nrt headers including other nrt headers. Most clients\n// (ptxla, jax, etc.) build with bazel, and bazel has issue with angle-brackets.\n// See https://bazel.build/docs/bazel-and-cpp#include-paths for details.\n#include \"nrt/nrt.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n// execution units\ntypedef enum {\n  NRTA_XU_TENSOR_READ = 0,\n  NRTA_XU_TENSOR_WRITE,\n  NRTA_XU_TENSOR_OP, // For tensor ops other than read and write\n  NRTA_XU_COMPUTE,\n  NRTA_XU_COLLECTIVES,\n\n  // For new XU types, must only add after existing ones\n  NRTA_XU_TYPE_NUM\n} nrta_xu_t;\n\n\n// nrta_seq_t's are monotomically increasing ids of executions\n// The first 16 bits are a Execution Unit ID, while the last\n// 48 bits are a strictly ordered Sequence Number\ntypedef uint64_t nrta_seq_t;\ntypedef uint16_t nrta_xu_id_t;\n\n#define NRTA_SEQ_NUM_MAX      ((1ull << 48) - 1)\n#define NRTA_SEQ_NUM_MASK     NRTA_SEQ_NUM_MAX\n#define NRTA_SEQ_GET_SEQ_NUM(seq_id)  (seq_id & NRTA_SEQ_NUM_MASK)\n#define NRTA_SEQ_GET_XU_ID(seq_id)    (seq_id >> 48)\n\n\ntypedef struct nrta_error {\n    nrta_seq_t seq_id;\n    uint64_t error_code; // NRT_STATUS, but typed as uint64 to ensure consistent representation across compilers\n} nrta_error_t;\nstatic_assert(sizeof(nrta_error_t) == 16, \"nrta_error_t must be of size 16\");\n\n// data structure used to store errors encountered during execution\ntypedef struct nrta_error_tracker nrta_error_tracker_t;\n\n/** Enqueues a tensor write request.  Copies the data from a host buffer to a\n *  tensor allocated on a Neuron device.  Uses TENSOR_WRITE execution unit based\n *  on the LNC that allocated the tensor.\n *\n * @param tensor[in]          - Destination tensor\n * @param buf[in]             - Host buffer containing source data\n * @param offset[in]          - Offset into the tensor\n * @param size[in]            - Number of bytes to write\n * @param queue[in]           - XU queue to use,\n * @param err[in]             - error tracker\n * @param req_sequence[out]   - Sequence number of the scheduled request\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_tensor_write(nrt_tensor_t *tensor,\n                             const void *buf,\n                             uint64_t offset,\n                             uint64_t size,\n                             int queue,\n                             nrta_error_tracker_t *err,\n                             nrta_seq_t *req_sequence);\n\n/** Enqueues a tensor read request.  Copies the data from a tensor allocated on a Neuron device\n *  to a host buffer. 
Uses TENSOR_READ execution unit based\n *  on the LNC that allocated the tensor.\n *\n * @param buf[in]             - Destination Host buffer\n * @param tensor[in]          - Source tensor\n * @param offset[in]          - Offset into the tensor\n * @param size[in]            - Number of bytes to read\n * @param queue[in]           - XU queue to use,\n * @param err[in]             - error tracker\n * @param req_sequence[out]   - Sequence number of the scheduled request\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_tensor_read(void *buf,\n                            nrt_tensor_t *tensor,\n                            uint64_t offset,\n                            uint64_t size,\n                            int queue,\n                            nrta_error_tracker_t *err,\n                            nrta_seq_t *req_sequence);\n\n/** Enqueues a tensor copy request.  Copies data between two tensors allocated\n *  on the same Logical Neuron Core.  Uses TENSOR_OP execution unit.\n *\n * NOTE: the tensors must be allocated until the copy completes\n *\n * @param src[in]             - Source tensor\n * @param src_offset[in]      - Offset into the source tensor\n * @param dst[in]             - Destination tensor\n * @param dst_offset[in]      - Offset into the destination tensor\n * @param size[in]            - Number of bytes to copy\n * @param queue[in]           - XU queue to use\n * @param err[in]             - error tracker\n * @param req_sequence[out]   - Sequence number of the scheduled request\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_tensor_copy(nrt_tensor_t *src,\n                            uint64_t src_offset,\n                            nrt_tensor_t *dst,\n                            uint64_t dst_offset,\n                            uint64_t size,\n                            int queue,\n                            nrta_error_tracker_t *err,\n                            nrta_seq_t *req_sequence);\n\n/** Schedules an asynchronous request to execute a model with specified inputs\n *  and outputs. 
Uses COMPUTE execution unit of an LNC of the loaded model.\n *\n * @param model[in]           - The model to schedule for execution\n * @param input_set[in]       - Set of input tensors for the model\n * @param output_set[in]      - Set of tensors to receive the outputs\n * @param queue[in]           - XU queue to use, must be 0\n * @param err[in]             - error tracker\n * @param req_sequence[out]   - Sequence number of the scheduled request\n *\n * @return NRT_SUCCESS on successful preparation, appropriate error code otherwise\n */\nNRT_STATUS nrta_execute_schedule(nrt_model_t *model,\n                                 const nrt_tensor_set_t *input,\n                                 nrt_tensor_set_t *output,\n                                 int queue,\n                                 nrta_error_tracker_t *err,\n                                 nrta_seq_t *req_sequence);\n\n/** Prepares collective context and HW configuration needed for collectives operation.\n *  Allocates a collective context handle that is returned to the caller\n *  which is freed in the schedule thread post CC op execution.\n *\n * @param comm[in]              - Communicator containing the replica group\n * @param input[in]             - Input tensor list\n * @param output[out]           - Output tensor list\n * @param dtype[in]             - Data type of elements\n * @param op[in]                - Reduction operation (e.g., SUM, MAX) if applicable\n * @param cc_op[in]             - Collective operation (e.g., ALLREDUCE, ALLGATHER)\n * @param cc_ctx[out]           - Collective context\n *\n * @return NRT_SUCCESS on successful preparation, appropriate error code otherwise\n */\nNRT_STATUS nrta_cc_prepare(nrt_cc_comm_t *comm,\n                           nrt_tensor_list_t *input,\n                           nrt_tensor_list_t *output,\n                           nrt_dtype_t dtype,\n                           nrt_op_type_t op,\n                           nrt_cc_op_type_t cc_op,\n                           nrt_cc_context_t **cc_ctx);\n\n/** Schedules an asynchronous request to execute collective operation\n *\n * @param cc_ctx[in]           - Collective context\n * @param queue[in]            - XU queue to use, must be 0\n * @param err[in]              - error tracker\n * @param req_sequence[out]    - Sequence number of the scheduled request\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrta_cc_schedule(nrt_cc_context_t **cc_ctx,\n                            int queue,\n                            nrta_error_tracker_t *err,\n                            nrta_seq_t *req_sequence);\n\n// completion status\n\n/** Checks completion status of a scheduled request\n *\n * @param seq[in]           - Scheduled request sequence id\n * @param is_completed[out] - true if the request is completed, false otherwise\n *\n * @return NRT_SUCCESS if the request is completed, NRT_INVALID if the seq is not valid\n */\nNRT_STATUS nrta_is_completed(nrta_seq_t seq, bool *is_completed);\n\n\n/** Returns sequence number of the last completed request\n *\n * @param lnc[in]           - LNC\n * @param xu[in]            - XU\n * @param queue[in]         - XU's queue\n * @param seq[out]          - last completed sequence number\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_get_sequence(uint32_t lnc, nrta_xu_t xu, int queue, nrta_seq_t *seq);\n\n\n/** Returns a pollable file descriptor that is READABLE when the execution request\n * specified by seq is complete.\n *\n * Note that users should only use the `poll` 
family of functions and `close` on this file\n * descriptor. Any other FD function is invalid and can lead to undefined behavior.\n *\n * The file descriptor must be passed to `close` to free the handle once the handle is not\n * needed anymore.\n *\n * @param seq[in]           - sequence to track completion\n * @param fd[out]           - FD associated with the sequence.\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_get_completion_handle(nrta_seq_t seq, int *fd);\n\n\n/** Creates an error tracker list\n *\n * @param lnc_idx[in]           - Logical Neuron Core this list will be used for\n * @param error_tracker[out]    - Created list.\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_error_tracker_create(uint32_t lnc_idx, nrta_error_tracker_t **error_tracker);\n\n/** Frees an error tracker list\n *\n * @param error_tracker[in] - Error tracker list to free\n *\n */\nvoid nrta_error_tracker_destroy(nrta_error_tracker_t *error_tracker);\n\n/** Gets list of errors from error tracker list\n *\n * @param error_tracker[in] - Error tracker list to get errors from\n * @param list[out]         - Array of errors obtained from the error tracker\n * @param error_count[out]  - Number of errors in the list\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrta_error_tracker_get_list(nrta_error_tracker_t *error_tracker, const nrta_error_t **list, size_t *error_count);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
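  {
    "path": "docs/examples/nrt_async_flow_sketch.c",
    "content": "/*\n * Illustrative usage sketch (hypothetical file, not part of the runtime sources).\n * Shows the asynchronous flow declared in src/libnrt/include/nrt/nrt_async.h:\n * create an error tracker, enqueue a tensor write and a model execution, wait on\n * the pollable completion handles, then drain the error tracker. The model,\n * tensors and tensor sets are assumed to have been set up with the nrt.h APIs;\n * LNC index 0 and queue 0 are assumptions.\n */\n#include <poll.h>\n#include <stdio.h>\n#include <unistd.h>\n#include \"nrt/nrt.h\"\n#include \"nrt/nrt_async.h\"\n\n/* Hypothetical helper: block until the request identified by seq completes. */\nstatic NRT_STATUS wait_for(nrta_seq_t seq)\n{\n    int fd = -1;\n    NRT_STATUS st = nrta_get_completion_handle(seq, &fd);\n    if (st != NRT_SUCCESS) return st;\n    struct pollfd pfd = { .fd = fd, .events = POLLIN };\n    poll(&pfd, 1, -1);  /* becomes readable once the request is complete */\n    close(fd);\n    return NRT_SUCCESS;\n}\n\n/* Hypothetical helper: one async input write followed by one async execution. */\nNRT_STATUS run_async_once(nrt_model_t *model, nrt_tensor_t *input_tensor,\n                          const void *host_data, uint64_t size,\n                          const nrt_tensor_set_t *inputs, nrt_tensor_set_t *outputs)\n{\n    nrta_error_tracker_t *tracker = NULL;\n    NRT_STATUS st = nrta_error_tracker_create(0 /* lnc_idx */, &tracker);\n    if (st != NRT_SUCCESS) return st;\n\n    /* Enqueue the host->device copy on the TENSOR_WRITE execution unit. */\n    nrta_seq_t write_seq = 0;\n    st = nrta_tensor_write(input_tensor, host_data, 0, size, 0 /* queue */, tracker, &write_seq);\n    if (st == NRT_SUCCESS) st = wait_for(write_seq);\n\n    /* Enqueue the model execution; queue must be 0 for nrta_execute_schedule. */\n    nrta_seq_t exec_seq = 0;\n    if (st == NRT_SUCCESS) st = nrta_execute_schedule(model, inputs, outputs, 0, tracker, &exec_seq);\n    if (st == NRT_SUCCESS) st = wait_for(exec_seq);\n\n    /* Report any errors the execution units recorded against this tracker. */\n    const nrta_error_t *errors = NULL;\n    size_t error_count = 0;\n    if (nrta_error_tracker_get_list(tracker, &errors, &error_count) == NRT_SUCCESS) {\n        for (size_t i = 0; i < error_count; i++) {\n            fprintf(stderr, \"seq %llu failed with status %llu\\n\",\n                    (unsigned long long)errors[i].seq_id,\n                    (unsigned long long)errors[i].error_code);\n        }\n    }\n    nrta_error_tracker_destroy(tracker);\n    return st;\n}\n"
  },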
  {
    "path": "src/libnrt/include/nrt/nrt_async_sendrecv.h",
    "content": "#pragma once\n\n#include \"nrt/nrt.h\"\n#include \"nrt/nrt_status.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\ntypedef struct nrt_async_sendrecv_comm nrt_async_sendrecv_comm_t;\ntypedef struct nrt_async_sendrecv_request nrt_async_sendrecv_request_t;\n\n/**\n * Get the maximum number of async sendrecv communicators per logical neuron core\n *\n * @param num[out]   - The maximum number of async sendrecv communicators per logical neuron core\n * @return NRT_SUCCESS on success\n *         NRT_FAILURE for errors\n */\nNRT_STATUS nrt_async_sendrecv_get_max_num_communicators_per_lnc(int* num);\n\n/**\n * Get the maximum number of pending requests per async sendrecv communicator\n *\n * @param num[out]   - The maximum number of pending requests per async sendrecv  communicator\n * @return NRT_SUCCESS on success\n *         NRT_FAILURE for errors\n */\nNRT_STATUS nrt_async_sendrecv_get_max_num_pending_request(int* num);\n\n/** Initialize asynchronous tensor send and receive on logical neuron core\n *\n * Logical neuron core ID is the absolute ID of the logical core on\n * the host machine. The ID is uneffected by device remapping via\n * docker and selection of visible logical cores.\n *\n * This function may only be called when runtime is initialized. This\n * function must have a matching call to nrt_async_sendrecv_close() before\n * nrt_close() is called.\n * This function returns error in case preceeding call to\n * nrt_async_sendrecv_close() on the logical neuron core returned error.\n *\n * @param lnc[in]   - Logical neuron core ID on the current server\n * @return NRT_SUCCESS if logical core has been initialized successfully\n *         NRT_FAILURE for errors\n */\nNRT_STATUS nrt_async_sendrecv_init(int lnc);\n\n/** Closes asynchronous tensor send and receive of logical neuron core and cleans up resources\n *\n * A call to this function must have a preceeding matching call to\n * nrt_async_sendrecv_init().  After this function was invoked, all sendrecv\n * communicators and requests associated with this logical neuron core\n * are closed and cannot be accessed anymore invoking functions with those\n * communicators or requests is regarded undefined behavior.\n * Cases where this function is called and one of the communicators is\n * not connected yet are considered an error. Cases where this\n * function is called and send or receive requests are still inflight\n * are considered an error.\n *\n * @param lnc[in]   - Logical neuron core ID on the current server\n * @return NRT_SUCCESS if logical core has been closed successfully\n *         NRT_FAILURE for errors\n */\nNRT_STATUS nrt_async_sendrecv_close(int lnc);\n\n/** Create send communicator\n *\n * Before send communicator can be used to initiate sending a tensor,\n * connection to receive communicator must be established. 
Use\n * function nrt_async_sendrecv_test_comm() to test whether connection is\n * established.\n * Async sendrecv for logical neuron core lnc must have been\n * initialized via call to nrt_async_sendrecv_init() before this function is\n * invoked.\n * This function is thread-safe.\n *\n * @param peer_ip[in]    - IP address of peer logical neuron core\n * @param peer_lnc[in]   - Logical neuron core ID on the peer server\n * @param lnc[in]        - Logical neuron core ID on the current server\n * @param send_comm[out] - Pointer to send communicator\n * @return NRT_SUCCESS  if the communicator has been created successfully\n *         NRT_RESOURCE if the number of created communicators exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_COMMUNICATORS_PER_LNC\n *         NRT_FAILURE  for other errors\n */\nNRT_STATUS nrt_async_sendrecv_connect(const char* peer_ip, int peer_lnc, int lnc, nrt_async_sendrecv_comm_t** send_comm);\n\n/** Create receive communicator\n *\n * Before the receive communicator can be used to initiate receiving a tensor,\n * a connection to the peer send communicator must be established. Use\n * function nrt_async_sendrecv_test_comm() to test whether connection is\n * established.\n * Async sendrecv for logical neuron core lnc must have been\n * initialized via call to nrt_async_sendrecv_init() before this function is\n * invoked.\n * This function is thread-safe.\n *\n * @param peer_ip[in]    - IP address of peer logical neuron core\n * @param peer_lnc[in]   - Logical neuron core ID on the peer server\n * @param lnc[in]        - Logical neuron core ID on the current server\n * @param recv_comm[out] - Pointer to receive communicator\n * @return NRT_SUCCESS  if the communicator has been created successfully\n *         NRT_RESOURCE if the number of created communicators exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_COMMUNICATORS_PER_LNC\n *         NRT_FAILURE  for other errors\n */\nNRT_STATUS nrt_async_sendrecv_accept(const char* peer_ip, int peer_lnc, int lnc, nrt_async_sendrecv_comm_t** recv_comm);\n\n/** Test whether connection has been established\n *\n * @param comm[in]  - The send or receive communicator\n * @param done[out] - True if connection to peer communicator is established\n * @return NRT_SUCCESS if test performed without error\n *         NRT_INVALID_HANDLE if handle is invalid\n *         NRT_TIMEOUT        if the communicator fails to establish connection within time limit\n *         NRT_FAILURE        for other errors\n */\nNRT_STATUS nrt_async_sendrecv_test_comm(nrt_async_sendrecv_comm_t* comm, bool* done);\n\n/** Asynchronously send a tensor\n *\n * This is a non-blocking function.\n *\n * This function is thread-safe. 
This function is only allowed to be\n * invoked on a communicator that is successfully tested to be\n * connected via call to nrt_async_sendrecv_test_comm().\n *\n * @param tensor[in]        - Tensor to send from\n * @param offset[in]        - Offset into the tensor to send from\n * @param length[in]        - Number of bytes to send\n * @param send_comm[in]     - Send communicator\n * @param request[out]      - Pointer to send request\n * @return NRT_SUCCESS        on success\n *         NRT_INVALID_HANDLE if handle is invalid\n *         NRT_RESOURCE       if the number of pending requests exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_PENDING_REQUEST\n *         NRT_FAILURE        for other errors\n */\nNRT_STATUS nrt_async_sendrecv_send_tensor(nrt_tensor_t* tensor, size_t offset, size_t length, nrt_async_sendrecv_comm_t* send_comm, nrt_async_sendrecv_request_t** request);\n\n/** Asynchronously receive a tensor\n *\n * This is a non-blocking function.\n *\n * This function is thread-safe. This function is only allowed to be\n * invoked on a communicator that is successfully tested to be\n * connected via call to nrt_async_sendrecv_test_comm().\n *\n * @param tensor[in]        - Tensor to receive to\n * @param offset[in]        - Offset into the tensor to receive to\n * @param length[in]        - Number of bytes to receive\n * @param recv_comm[in]     - Receive communicator\n * @param request[out]      - Pointer to receive request\n * @return NRT_SUCCESS        on success\n *         NRT_INVALID_HANDLE if handle is invalid\n *         NRT_RESOURCE       if the number of pending requests exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_PENDING_REQUEST\n *         NRT_FAILURE        for other errors\n */\nNRT_STATUS nrt_async_sendrecv_recv_tensor(nrt_tensor_t* tensor, size_t offset, size_t length, nrt_async_sendrecv_comm_t* recv_comm, nrt_async_sendrecv_request_t** request);\n\n/** Test the completion status of an asynchronous request\n *\n * This function is thread-safe when invoked with different\n * requests. This function is not allowed to be invoked concurrently\n * by multiple threads with the same request at the same time. Once\n * this function has reported the request as completed, it is\n * not allowed to be invoked again with the same request.\n *\n * @param request[in]       - Request to test\n * @param done[out]         - Whether the request has completed\n * @param size[out]         - Number of bytes sent/received\n * @return NRT_SUCCESS        on success\n *         NRT_INVALID_HANDLE if handle is invalid\n *         NRT_TIMEOUT        if the request fails to complete data transfer within time limit\n *         NRT_FAILURE        for other errors\n */\nNRT_STATUS nrt_async_sendrecv_test_request(nrt_async_sendrecv_request_t* request, bool* done, size_t* size);\n\n/** Flush received messages to ensure full arrival in memory\n *\n * Ensures that messages received by async sendrecv receive operations that\n * were successfully tested prior to the call to this function have fully\n * arrived in memory after this function completes.\n *\n * @param lnc[in]        - Receiving logical neuron core ID\n * @return NRT_SUCCESS  if flush operation succeeded\n *         NRT_FAILURE  for other errors\n */\nNRT_STATUS nrt_async_sendrecv_flush(int lnc);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
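A minimal sender-side sketch of the async sendrecv API documented above may help tie the calls together. This is illustrative only: the include path for this header, the peer IP, the core IDs, and the busy-wait polling are assumptions, and per-status error handling is abbreviated.

```c
/* Illustrative sketch of the async sendrecv flow (init -> connect -> send -> close).
 * The header path, peer IP ("10.0.0.2"), and core IDs are assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include "nrt/nrt.h"
#include "nrt/nrt_async_sendrecv.h"   /* assumed header name for the API above */

static NRT_STATUS send_one_tensor(nrt_tensor_t *tensor, size_t len)
{
    const int lnc = 0;                       /* local logical neuron core */
    nrt_async_sendrecv_comm_t *comm = NULL;
    nrt_async_sendrecv_request_t *req = NULL;
    bool done = false;
    size_t sent = 0;
    NRT_STATUS st;

    st = nrt_async_sendrecv_init(lnc);
    if (st != NRT_SUCCESS) return st;

    /* "10.0.0.2" / peer core 0 stand in for the receiving worker. */
    st = nrt_async_sendrecv_connect("10.0.0.2", 0, lnc, &comm);
    if (st != NRT_SUCCESS) return st;

    /* Poll until the connection to the peer's receive communicator is up. */
    do {
        st = nrt_async_sendrecv_test_comm(comm, &done);
        if (st != NRT_SUCCESS) return st;
    } while (!done);

    /* Kick off the non-blocking send of the whole tensor. */
    st = nrt_async_sendrecv_send_tensor(tensor, 0, len, comm, &req);
    if (st != NRT_SUCCESS) return st;

    /* Poll the request until it reports completion. */
    done = false;
    do {
        st = nrt_async_sendrecv_test_request(req, &done, &sent);
        if (st != NRT_SUCCESS) return st;
    } while (!done);

    return nrt_async_sendrecv_close(lnc);
}
```

The receiving side would mirror this with nrt_async_sendrecv_accept() and nrt_async_sendrecv_recv_tensor(), followed by nrt_async_sendrecv_flush() once the receive request tests complete.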
  {
    "path": "src/libnrt/include/nrt/nrt_experimental.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include <stddef.h>\n#include <stdint.h>\n#include \"nrt/nrt_status.h\"\n#include \"nrt/nrt.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n\n/** Usage of a Tensor in the NEFF\n */\ntypedef enum nrt_tensor_usage {\n    NRT_TENSOR_USAGE_INPUT = 0,     // Tensor is used for ifmap\n    NRT_TENSOR_USAGE_OUTPUT,        // Tensor is used for ofmap\n} nrt_tensor_usage_t;\n\n#define NRT_TENSOR_NAME_MAX 256\n\ntypedef struct nrt_tensor_info {\n    char name[NRT_TENSOR_NAME_MAX];     // Name of the tensor\n    nrt_tensor_usage_t usage;           // Type of the tensor\n    size_t size;                        // Tensor size in bytes\n    nrt_dtype_t dtype;                  // data type\n    uint32_t *shape;                    // an array representing data shape\n    uint32_t ndim;                      // the number of dimensions\n} nrt_tensor_info_t;\n\ntypedef struct nrt_tensor_info_array {\n    uint64_t tensor_count;              // Total number of tensors in the NEFF\n    nrt_tensor_info_t tensor_array[];   // Array of tensor info\n} nrt_tensor_info_array_t;\n\n/* Function definition for async exec status callbacks */\ntypedef void (*NRT_ASYNC_EXEC_STATUS_CALLBACK)(void *params, uint32_t model_id, uint32_t vnc, uint64_t job_id, NRT_STATUS status);\n\n/** Return input/output tensor information for a given model.\n*\n* @param model[in]         - Model for which tensor information needs to be extracted.\n* @param tensor_info[out]  - Pointer to store the result.\n*\n* @return NRT_STATUS_SUCCESS on success.\n*/\nNRT_STATUS nrt_get_model_tensor_info(nrt_model_t *model, nrt_tensor_info_array_t **tensor_info);\n\n/** Return the instance count for this model handle (optimal number of concurrent threads that can call nrt_execute). (deprecated)\n*\n* @param model[in]         - Model for the instance count needs to be returned.\n* @param instance[out]     - Pointer to store the result.\n*\n* @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_model_instance_count(nrt_model_t *model, uint32_t *instance_count);\n\n\n/** Free input/output tensor information for a given model.\n*\n* @param tensor_info[in]  - Pointer to store the result.\n*\n* @return NRT_STATUS_SUCCESS on success.\n*/\nNRT_STATUS nrt_free_model_tensor_info(nrt_tensor_info_array_t *tensor_info);\n\n/** Enable tracing for all VNCs visible to the app\n *\n * @param trace_mem[in] - collect memory allocation info\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_trace_start(bool trace_mem);\n\n/** Serialize all data and disable tracing\n *\n * @param filename[in] - filename to write to\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_trace_stop(const char *filename);\n\n/** temporary, to be removed. See comment in neuron_nccl.cc\n*/\nvoid *nrt_get_libnccl_net(int *err, char *err_msg, size_t err_msg_size);\n\n/** Structs to pass around ucode image info\n*/\ntypedef struct nrt_ucode_img {\n    uint8_t *bin;\n    size_t size;\n} nrt_ucode_img;\n\ntypedef struct nrt_ucode_info {\n    nrt_ucode_img iram;\n    nrt_ucode_img dram;\n} nrt_ucode_info;\n\n/** Specify pooling engine ucode iram and dram images that will get loaded by nrt_init().\n*   To use this API, it MUST be called BEFORE nrt_init().\n*   Swapping ucode after nrt_init() is NOT supported. 
Ucode images are only loaded once.\n*   This API provides a temporary workaround for swapping ucode.\n*/\nNRT_STATUS nrt_set_pool_eng_ucode(const nrt_ucode_info *ucode_info);\n\n/** Copies data to memory mapped Neuron device memory\n*\n* @param dest[in]          - Pointer to destination memory (mmaped device memory)\n* @param src[in]           - Pointer to source memory\n* @param size[in]          - Copy size\n*\n* @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_memcpy_to_device(void *dest, const void *src, size_t size);\n\n/** Register a return status callback to post exec status to when running in async exec mode.\n *  Calling this multiple times will replace the previouly registered callback.\n *\n * @param callback[in]  - Callback to post nrt exec status to for async execution.\n * @param params[in]    - Params for the async exec thread to pass to the callback upon\n *                        execution completion. Can be NULL.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_register_async_exec_callback(NRT_ASYNC_EXEC_STATUS_CALLBACK callback, void *params);\n\n/** Implements a barrier by running a small all-reduce over all workers\n*\n* @param vnc[in]                 - local VNC (within the instance)\n* @param global_device_id[in]    - global worker ID\n* @param global_device_count[in] - total number of workers\n*\n* @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_barrier(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count);\n\n/** Perform all-rank AllGather\n*\n* @param vnc[in]              - local VNC (within the instance)\n* @param g_device_id[in]      - global worker ID\n* @param g_device_count[in]   - total number of workers\n* @param rank_input_size[in]  - input size\n* @param input[in]            - ptr to input data from this rank\n* @param output[out]          - ptr to output buffer of size (g_device_count*rank_input_size)\n*\n* @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_all_gather(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count,\n                          uint32_t rank_input_size, void *input, void *output);\n\n/** Blocks caller until all queued executions on async worker thread are drained.\n *\n * @param vnc - VNC index to block on.\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_async_drain_queued_execs(int32_t vnc);\n\ntypedef struct nrt_model_info {\n    uint32_t vnc;\n    // additional fields can be added here in the future\n    // do not remove previously added fields because it will cause\n    // memory corruption if the caller was compiled using a different \n    // version of this header.\n} nrt_model_info_t;\n/** Returns information about loaded model\n *\n * @param model [in]          - the model\n * @param info [out]          - the information about the model\n * @param info_size_in [in]   - the size of the info structure (used for version control)\n * @param info_size_out [out] - the number of bytes written (for version control)\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_get_model_info(const nrt_model_t *model, nrt_model_info_t *info, size_t info_size_in, size_t *info_size_out);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
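As a quick orientation to the tensor-info API in nrt_experimental.h above, the sketch below queries a loaded model's input/output tensors and prints their names and sizes. It assumes a model handle obtained elsewhere (model loading is elided) and skips detailed error handling.

```c
/* Illustrative only: enumerate a loaded model's I/O tensors via nrt_experimental.h. */
#include <stdio.h>
#include "nrt/nrt.h"
#include "nrt/nrt_experimental.h"

static void print_model_tensors(nrt_model_t *model)
{
    nrt_tensor_info_array_t *info = NULL;

    if (nrt_get_model_tensor_info(model, &info) != NRT_SUCCESS) {
        return;
    }
    for (uint64_t i = 0; i < info->tensor_count; i++) {
        const nrt_tensor_info_t *t = &info->tensor_array[i];
        printf("%s %s: %zu bytes, %u dims\n",
               t->usage == NRT_TENSOR_USAGE_INPUT ? "input" : "output",
               t->name, t->size, t->ndim);
    }
    nrt_free_model_tensor_info(info);   /* release the array allocated by the query */
}
```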
  {
    "path": "src/libnrt/include/nrt/nrt_profile.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#include \"nrt/nrt.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n/** Enable profiling for a model\n *\n * @param model[in]     - model to profile\n * @param filename[in]  - output filename that will be used with nrt_profile_stop()\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_start(nrt_model_t *model, const char *filename);\n\n/** Collect results and disable profiling for a model\n *\n * @param filename[in] - output filename to save the NTFF profile to\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_stop(const char *filename);\n\n/** Options for continuous device profiling.\n *\n * Opaque struct used to preserve compatibility and enforce proper usage.\n * Use nrt_profile_continuous_options_set_* functions set options.\n * Default options:\n * - output_dir: \"./output\"\n *\n * Usage:\n *   nrt_profile_continuous_options_t *options;\n *   nrt_profile_continuous_options_allocate(&options);\n *   nrt_profile_continuous_options_set_output_dir(options, \"./output\");\n */\ntypedef struct nrt_profile_continuous_options nrt_profile_continuous_options_t;\n\n/** Allocate memory for the nrt_profile_continuous_options_t struct and set all options to defaults.\n *\n * @param options[in] - pointer to a pointer to nrt_profile_continuous_options_t struct\n */\nNRT_STATUS nrt_profile_continuous_options_allocate(nrt_profile_continuous_options_t **options);\n\n/** Free up memory allocated for the options struct needed for continuous device profiling.\n *\n * @param options[in] - pointer to a nrt_profile_continuous_options struct\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_continuous_options_free(nrt_profile_continuous_options_t *options);\n\n/** Sets the output directory for results of continuous device profiling.\n *\n * The filename is set automatically.\n *\n * @param[in,out] options Pointer to the options struct.\n * @param[in] output_dir Path to the output directory.\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_continuous_options_set_output_dir(nrt_profile_continuous_options_t *options, const char *output_dir);\n\n/** @brief Start continuous device profiling.\n *\n * When continuous device profiling is started, profiling is enabled for every model but notifications\n * will only be serialized to disk when the user calls nrt_profile_continuous_save(). This gives\n * the user control over which profiles are saved to disk. When a profile is not saved, the overhead\n * of trace serialization and disk write is avoided. Continuous profiling is ideal for scenarios where you\n * only want to save profiles for specific executions. In this mode you do not need to call\n * nrt_profile_start() and nrt_profile_stop() because they are called internally. Continuous profiling\n * will not start if inspect device profiling is already enabled or async execution is enabled.\n *\n * @param options[in] - options to control continuous device profiling\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_continuous_start(nrt_profile_continuous_options_t *options);\n\n/** Save NTFF profile to disk for the latest model executed on requested NeuronCore.\n *\n * Output directory will be set according to the options passed into this function. The filenames of\n * NTFFs within the output directory are chosen automatically to avoid conflicts. 
Calling save does\n * not stop continuous profiling.\n *\n * @param vnc[in]      - (start) NeuronCore id to collect profile for\n * @param options[in]  - options to control continuous device profiling\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_continuous_save(uint32_t vnc, nrt_profile_continuous_options_t *options);\n\n/** Stops continuous device profiling.\n *\n * Calling stop does not save a profile.\n *\n * @return NRT_SUCCESS on success.\n */\nNRT_STATUS nrt_profile_continuous_stop();\n\n/* Begin tracing/profiling\n *\n * Users of this API must set options through environment variables:\n * \n * - NEURON_RT_INSPECT_ENABLE: Set to 1 to enable system and device profiles.\n *   For control over which profile types are captured, use NEURON_RT_INSPECT_SYSTEM_PROFILE \n *   and NEURON_RT_INSPECT_DEVICE_PROFILE.\n * - NEURON_RT_INSPECT_OUTPUT_DIR: The directory where captured profile data will be saved to.\n *   Defaults to ./output.\n * - NEURON_RT_INSPECT_SYSTEM_PROFILE: Set to 0 to disable the capture of system profiles. \n *   Defaults to 1 when NEURON_RT_INSPECT_ENABLE is set to 1.\n * - NEURON_RT_INSPECT_DEVICE_PROFILE: Set to 0 to disable the capture of device profiles.\n *   Defaults to 1 when NEURON_RT_INSPECT_ENABLE is set to 1.\n * - NEURON_RT_INSPECT_ON_FAIL: Set to 1 to enable dumping of device profiles in case of an error \n *   during graph execution. Defaults to 0.\n * \n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_begin();\n\n\n/* Stop tracing/profiling and dump profile data.\n * Does nothing if `duration` is given to nrt_inspect_begin() and already elapsed\n *\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_stop();\n\n\n/** @brief Options for nrt_inspect_begin_with_options API.\n *\n * Opaque struct used to preserve compatibility and enforce proper usage.\n * Use nrt_inspect_config_set_* functions to set options or \n * nrt_inspect_config_set_defaults to set use default options.\n *\n * Example Usage:\n *  nrt_inspect_config_t *options;\n *  nrt_inspect_config_allocate(&options);\n *  nrt_inspect_config_set_output_dir(options, \"./output\");\n */\ntypedef struct nrt_inspect_config nrt_inspect_config_t;\n\n\n/** Allocate memory for the options structure which is needed to\n * start profiling using nrt_inspect_begin_with_options. This will set all options to defaults\n * \n * @param options[out] - pointer to a pointer to options nrt_inspect_config struct\n * \n */\nNRT_STATUS nrt_inspect_config_allocate(nrt_inspect_config_t **options);\n\n/** @brief all fields of the nrt_inspect_config structure to their default values.\n * \n * Default behavior after calling this function:\n * - Session ID: 1\n * - Output directory: \"./output\" (when not explicitly set)\n * - Activity types: All activity types enabled (system_profile, device_profile, host_memory, cpu_util)\n * - System trace: All NeuronCores and event types enabled for capture\n * - Inspect mode: Disabled (profiles not captured automatically)\n * - Inspect on failure: Disabled (profiles not captured on execution failures)\n * \n * @param options[in,out] - Pointer to an nrt_inspect_config structure.\n * \n * @return NRT_SUCCESS on success\n * \n * @note These default values set here are NOT influenced by the environment variables. 
\n * If you are using the environment variables to set the values you do not need to use this method \n * or any of the nrt_inspect_config_set_* functions.\n */\nNRT_STATUS nrt_inspect_config_set_defaults(nrt_inspect_config_t *options);\n\n/** Free up memory allocated for the options structure which is needed to\n * start profiling using nrt_inspect_begin_with_options\n * \n * @param options[in] - pointer to an options nrt_inspect_config struct\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_config_free(nrt_inspect_config_t *options);\n\n/**\n * @brief Sets the session ID for the nrt_inspect_config_t which is needed to\n * start profiling using nrt_inspect_begin_with_options\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] session_id Session ID to set.\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_config_set_session_id(nrt_inspect_config_t *options, int session_id);\n\n/**\n * @brief Sets the output directory for results of \n * profiling using nrt_inspect_begin_with_options\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] output_dir Path to the output directory. Must be a valid non-empty string \n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, NRT_RESOURCE for memory allocation failure.\n * \n * @note The function makes an internal copy of the string, so the caller\n *       does not need to keep the original string alive.\n * @note Call nrt_inspect_config_free() to properly clean up allocated memory.\n */\nNRT_STATUS nrt_inspect_config_set_output_dir(nrt_inspect_config_t *options, const char *output_dir);\n\n/**\n * @brief Sets max number of system trace events that can be stored across all ring buffers\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] sys_trace_max_events_per_nc Max number of system trace events that can be stored across all ring buffers.\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_config_set_sys_trace_max_events_per_nc(nrt_inspect_config_t *options, uint64_t sys_trace_max_events_per_nc);\n\n/**\n * @brief Sets system trace capture enabled for a specific NeuronCore\n * ring buffers won't be allocated for disabled NeuronCores \n * \n * @param[in,out] options Pointer to the options structure.\n * @param[in] nc_idx Index of the NeuronCore.\n * @param[in] enabled Boolean value to enable or disable system trace capture.\n * @return NRT_SUCCESS on success\n */\nNRT_STATUS nrt_inspect_config_set_capture_enabled_for_nc(nrt_inspect_config_t *options, uint32_t nc_idx, bool enabled);\n\n/** \n * @brief Sets system trace capture enabled for a specific event type\n * can save memory and reduce output size\n * @param[in,out] options Pointer to the options structure.\n * @param[in] event_type Valid event types.\n * @param[in] enabled Capture enabled flag.\n * @return NRT_SUCCESS on success\n * \n * @note Event type must be a string from the list of supported event types. 
To get the list of supported event types, \n * use nrt_sys_trace_get_event_types in the nrt_sys_trace.h header file.\n */\nNRT_STATUS nrt_inspect_config_set_capture_enabled_for_event_type_string(nrt_inspect_config_t *options, const char *event_type, bool enabled);\n\n/**\n * @brief Enable both system and device profiling for normal execution\n * \n * When disabled (default), no profiles are captured during normal execution.\n * This flag controls whether profiles are captured automatically for each execution.\n * Note: If both enable_inspect and enable_inspect_on_fail are false, no profiling occurs.\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] enable_inspect Boolean value to enable or disable inspect profiling.\n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters.\n */\nNRT_STATUS nrt_inspect_config_set_enable_inspect(nrt_inspect_config_t *options, bool enable_inspect);\n\n/**\n * @brief Enable dumping of device profiles in case of execution failures\n * \n * When enabled, device profiles will be captured and saved when graph execution fails.\n * This is disabled by default. If both enable_inspect and enable_inspect_on_fail are false,\n * no profiling occurs at all.\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] enable_inspect_on_fail Boolean value to enable or disable inspect on failure.\n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters.\n */\nNRT_STATUS nrt_inspect_config_set_enable_inspect_on_fail(nrt_inspect_config_t *options, bool enable_inspect_on_fail);\n\n /**\n * Begin tracing/profiling with configurable options\n *\n * Parameters:\n * @param[in] options - A pointer to an nrt_inspect_config struct containing configuration options\n *                     for profiling. Use nrt_inspect_config_set_* functions to set options.\n *                     If NULL is passed, default options will be used.\n * @return NRT_SUCCESS on success\n * \n * @note This API ignores all the NEURON_RT_INSPECT_* environment variables.\n * If you are using the environment variables to set the values you do not need to use this method \n * or any of the nrt_inspect_config_set_* functions. Use nrt_inspect_begin() instead.\n */\nNRT_STATUS nrt_inspect_begin_with_options(nrt_inspect_config_t *options);\n\n/**\n * @brief Returns all available activity type strings\n *\n * This function allocates and returns an array of all supported activity type\n * strings. 
The caller is responsible for freeing both the individual strings\n * and the array itself, or can use nrt_inspect_config_free_activity_types().\n *\n * @param[out] activity_types Pointer to store the allocated array of activity type strings.\n * @param[out] count Pointer to store the number of activity types returned.\n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, \n *         NRT_RESOURCE for memory allocation failure.\n */\nNRT_STATUS nrt_inspect_config_get_all_activity_types(const char ***activity_types, size_t *count);\n\n/**\n * @brief Returns the currently enabled activity type strings\n *\n * This function examines the enabled_activities bitmask in the configuration\n * and returns an array of strings for only the currently enabled activity types.\n * The caller is responsible for freeing both the individual strings and the array itself,\n * or can use nrt_inspect_config_free_activity_types().\n *\n * @param[in] options Pointer to the options structure.\n * @param[out] activity_types Pointer to store the allocated array of enabled activity type strings.\n * @param[out] count Pointer to store the number of enabled activity types returned.\n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, \n *         NRT_RESOURCE for memory allocation failure.\n */\nNRT_STATUS nrt_inspect_config_get_enabled_activity_types(nrt_inspect_config_t *options, const char ***activity_types, size_t *count);\n\n/**\n * @brief Free the activity types array allocated by nrt_inspect_config_get_all_activity_types\n * or nrt_inspect_config_get_enabled_activity_types.\n * This function properly frees both the array and all individual strings.\n * \n * @param[in] activity_types Pointer to the activity types array to be freed.\n * @param[in] count Number of activity types in the array.\n */\nvoid nrt_inspect_config_free_activity_types(const char **activity_types, size_t count);\n\n/**\n * @brief Sets or clears a specific activity type in the configuration\n *\n * This function enables or disables a specific activity type by name. It converts\n * the activity type string to the corresponding enum value and updates the\n * enabled_activities bitmask accordingly.\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] activity_type String name of the activity type. Valid values are:\n *                         \"system_profile\", \"device_profile\", \"host_memory\", \n *                         \"cpu_util\", \"all\"\n * @param[in] enabled True to enable the activity, false to disable it.\n * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters or unknown activity type.\n */\nNRT_STATUS nrt_inspect_config_set_activity(nrt_inspect_config_t *options, const char *activity_type, bool enabled);\n\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
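The continuous-profiling functions above are easiest to read as a start/save/stop cycle. The sketch below is a minimal, hedged example: the output directory, NeuronCore id 0, and the placement of the nrt_execute() calls are assumptions, and only the happy path is shown.

```c
/* Illustrative only: capture one continuous-profiling snapshot for NeuronCore 0. */
#include "nrt/nrt.h"
#include "nrt/nrt_profile.h"

static void profile_one_iteration(void)
{
    nrt_profile_continuous_options_t *opts = NULL;

    if (nrt_profile_continuous_options_allocate(&opts) != NRT_SUCCESS) {
        return;
    }
    nrt_profile_continuous_options_set_output_dir(opts, "./profiles");

    if (nrt_profile_continuous_start(opts) == NRT_SUCCESS) {
        /* ... run one or more nrt_execute() calls here ... */
        nrt_profile_continuous_save(0, opts);   /* save an NTFF for NeuronCore 0 */
        nrt_profile_continuous_stop();
    }
    nrt_profile_continuous_options_free(opts);
}
```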
  {
    "path": "src/libnrt/include/nrt/nrt_status.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n// NOTE: if making changes here please also keep\n// KaenaTools: KaenaTools: pkg/rt/rt.go in sync\n\ntypedef enum {\n    NRT_SUCCESS = 0,\n    NRT_FAILURE = 1,                        // non specific failure, don't use if there is more descriptive type\n    NRT_INVALID = 2,                        // e.g. invalid NEFF, bad instruction, bad DMA descriptor, input tensor name/size does not match the model, etc.\n\n                                            // TODO invalid_handle is no longer useful because handles are not passed in nrt API\n                                            // remove\n    NRT_INVALID_HANDLE = 3,                 // make this one explicit instead of using more generic INVALID_INPUT because it could be a common caller mistake\n    NRT_RESOURCE = 4,                       // failed to allocate a resource for requested operation\n\n                                            // TODO separate exec timeout from others\n    NRT_TIMEOUT = 5,                        // operation timed out\n    NRT_HW_ERROR = 6,                       // Hardware failure\n    NRT_QUEUE_FULL = 7,                     // not enough space in the execution input queue\n    NRT_LOAD_NOT_ENOUGH_NC = 9,             // Failed to allocate enough NCs for loading a NEFF\n    NRT_UNSUPPORTED_NEFF_VERSION = 10,      // Unsupported version of NEFF\n\n    // DO NOT USE - keep for backward compat\n    NRT_FAIL_HOST_MEM_ALLOC = 11,           // failed to allocate host memory\n\n    // Unique retcodes to help the caller identify when nrt apis are called outside the scope of nrt_init() and nrt_close()\n    NRT_UNINITIALIZED = 13,\n    NRT_CLOSED = 14,\n\n    NRT_QUEUE_EMPTY = 15, // Accessed a queue with no data\n\n\n    NRT_EXEC_UNIT_UNRECOVERABLE = 101,      // Encountered fatal error and Execution Unit is in limbo, cannot recover. Need to reset\n\n\n    NRT_EXEC_BAD_INPUT = 1002,              // invalid input has been submitted to exec()\n    NRT_EXEC_COMPLETED_WITH_NUM_ERR = 1003, // execution was completed with numerical errors (produced NaN)\n    NRT_EXEC_COMPLETED_WITH_ERR = 1004,     // execution was completed with other errors,\n                                            // either logical - event double clear, or physical - parity error\n    NRT_EXEC_NC_BUSY = 1005,                // the neuron core is locked (in use) by another model/process\n    NRT_EXEC_OOB = 1006,                    // one or more indirect memcopies and/or embedding updates are out of bound\n    NRT_COLL_PENDING = 1100,                // collective operation is still pending\n\n    // classify different types of execution hangs/timeouts. For unknown/generic hang, use NRT_TIMEOUT.\n    NRT_EXEC_HW_ERR_COLLECTIVES = 1200,     // Stuck in collectives op (missing notification(s)). 
Possibly caused by a hardware error on another worker.\n    NRT_EXEC_HW_ERR_HBM_UE      = 1201,     // An HBM encountered an unrepairable uncorrectable error and produced incorrect results.\n    NRT_EXEC_HW_ERR_NC_UE       = 1202,     // An on-chip memory of Neuron Core encountered a parity error and produced incorrect results.\n    NRT_EXEC_HW_ERR_DMA_ABORT   = 1203,     // A DMA engine encountered an unrecoverable error.\n\n    NRT_EXEC_SW_NQ_OVERFLOW     = 1204,     // Software notification queue overflow.\n    NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE = 1205,  // An HBM encountered a repairable uncorrectable error and produced incorrect results.\n\n    NRT_NETWORK_PROXY_FAILURE   = 1206,    // EFA network proxy operation failed.\n} NRT_STATUS;\n\nconst char *nrt_get_status_as_str(NRT_STATUS status);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
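Since every API in these headers returns an NRT_STATUS, a small helper that maps a failing status to its string form (using nrt_get_status_as_str() declared above) is a common pattern; this is a sketch, not part of the library.

```c
/* Illustrative only: report a failing NRT_STATUS in human-readable form. */
#include <stdio.h>
#include "nrt/nrt_status.h"

static void report_status(NRT_STATUS st)
{
    if (st != NRT_SUCCESS) {
        fprintf(stderr, "nrt call failed: %s (%d)\n",
                nrt_get_status_as_str(st), (int)st);
    }
}
```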
  {
    "path": "src/libnrt/include/nrt/nrt_sys_trace.h",
    "content": "/*\n * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n#pragma once\n#include <nrt/nrt.h>\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n/*\n * This is a public interface used by both the fetch api (which allows near\n * real-time querying of captured events), and inspect profiling (which saves\n * captured events to disk), as well as other profiling functions.\n */\n\n//------------------------------------------------\n// Section: System Trace Capture\n//------------------------------------------------\n\ntypedef struct nrt_sys_trace_config nrt_sys_trace_config_t;\n\n/** Allocate memory for the options structure which is needed to\n * start profiling using nrt_sys_trace_start. This will set all options to\n * defaults. The reason we use an _allocate function is so that users don't need\n * to know the size or implementation details of the config struct.\n *\n * @param options[in] - pointer to a pointer to options nrt_sys_trace_config struct\n *\n */\nNRT_STATUS nrt_sys_trace_config_allocate(nrt_sys_trace_config_t **options);\n\n/** Set all fields of the nrt_sys_trace_config structure to their default values.\n *\n * @param options[in,out] - Pointer to an nrt_sys_trace_config structure.\n */\nvoid nrt_sys_trace_config_set_defaults(nrt_sys_trace_config_t *options);\n\n/** Free up memory allocated for the options structure which is needed to\n * start profiling using nrt_sys_trace_start\n *\n * @param options[in] - pointer to an options nrt_sys_trace_config struct\n *\n */\nvoid nrt_sys_trace_config_free(nrt_sys_trace_config_t *options);\n\n/**\n * @brief Sets max number of events that can be stored across all ring buffers\n *\n * @param[in,out] options Pointer to the options structure.\n * @param[in] max_events_per_nc Max number of events that can be stored in each ring buffer.\n */\nvoid nrt_sys_trace_config_set_max_events_per_nc(nrt_sys_trace_config_t *options, uint64_t max_events_per_nc);\n\n/**\n * @brief Sets system trace capture enabled for a specific NeuronCore\n * ring buffers won't be allocated for disabled NeuronCores.\n * Can save memory, reduce output size, and speed up trace processing.\n * @param[in,out] options Pointer to the options structure.\n * @param[in] nc_idx NeuronCore index.\n * @param[in] enabled Capture enabled flag.\n */\nvoid nrt_sys_trace_config_set_capture_enabled_for_nc(nrt_sys_trace_config_t *options, uint32_t nc_idx, bool enabled);\n\n/**\n * @brief Sets system trace capture enabled for a specific event type.\n * Can save memory, reduce output size, and speed up trace processing.\n * @param[in,out] options Pointer to the options structure.\n * @param[in] event_type Event type string, possible values are from nrt_sys_trace_get_event_types\n * @param[in] enabled Capture enabled flag.\n */\nNRT_STATUS nrt_sys_trace_config_set_capture_enabled_for_event_type(nrt_sys_trace_config_t *options, const char *event_type, bool enabled);\n\n/**\n * @brief Returns an allocated array of all valid event type strings.\n * @param[out] event_types Pointer to array of const char* (allocated).\n * @param[out] count Number of event types.\n * @return NRT_SUCCESS on success, error code otherwise.\n * @note The user is responsible for freeing the array and each string, or can use\n *       nrt_sys_trace_free_event_types() for convenience.\n *\n * Example usage:\n *      const char **event_types = nullptr;\n *      size_t count = 0;\n *      NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count);\n *      // Manual 
cleanup:\n *      for (size_t i = 0; i < count; ++i) {\n *          free((void*)event_types[i]);\n *      }\n *      free((void*)event_types);\n *      // Or use convenience function:\n *      nrt_sys_trace_free_event_types(event_types, count);\n */\nNRT_STATUS nrt_sys_trace_get_event_types(const char ***event_types, size_t *count);\n\n/**\n * @brief Free the event types array allocated by nrt_sys_trace_get_event_types.\n * This function properly frees both the array and all individual strings.\n *\n * @param[in] event_types Pointer to the event types array to be freed.\n * @param[in] count Number of event types in the array.\n */\nvoid nrt_sys_trace_free_event_types(const char **event_types, size_t count);\n\n/**\n * @brief Returns an allocated array of enabled event type strings for the given config.\n * @param[in] options Pointer to the nrt_sys_trace_config_t structure.\n * @param[out] event_types Pointer to array of const char* (allocated).\n * @param[out] count Number of enabled event types.\n * @return NRT_SUCCESS on success, error code otherwise.\n * @note The user is responsible for freeing the array and each string.\n */\nNRT_STATUS nrt_sys_trace_config_get_enabled_event_types(nrt_sys_trace_config_t *options, const char ***event_types, size_t *count);\n\n// Initiailization for system trace capture including allocating memory for event ring buffers\nNRT_STATUS nrt_sys_trace_start(nrt_sys_trace_config_t *options);\n\n// Teardown for system trace capture including freeing allocated memory for event ring buffers\nNRT_STATUS nrt_sys_trace_stop();\n\n//------------------------------------------------\n// Section: System Trace Fetch\n//------------------------------------------------\n\ntypedef struct nrt_sys_trace_fetch_options nrt_sys_trace_fetch_options_t;\nNRT_STATUS nrt_sys_trace_fetch_options_allocate(nrt_sys_trace_fetch_options_t **options);\nvoid nrt_sys_trace_fetch_options_set_defaults(nrt_sys_trace_fetch_options_t *options);\nvoid nrt_sys_trace_fetch_options_free(nrt_sys_trace_fetch_options_t *options);\n// Max number of events to fetch per NeuronCore\nvoid nrt_sys_trace_fetch_options_set_max_events_per_nc(nrt_sys_trace_fetch_options_t *options, uint64_t max_events_per_nc);\n// Fetch events only for specified NeuronCore\nvoid nrt_sys_trace_fetch_options_set_nc_idx(nrt_sys_trace_fetch_options_t *options, uint64_t nc_idx);\n\n/**\n * Fetches system trace events from process memory and returns them as a JSON-formatted string.\n * Once events are fetched, they cannot be fetched again.\n *\n * @param[out] buffer       On successful return, will point to a dynamically allocated, null-terminated\n *                          JSON string containing the trace events. Memory for the output buffer is\n *                          allocated internally; therefore, the caller should not allocate or initialize\n *                          the buffer before calling this function. 
Instead, the caller must initialize\n *                          the buffer pointer to NULL and, after a successful call, is responsible for\n *                          freeing the allocated memory by calling nrt_sys_trace_buffer_free(buffer).\n *\n * @param[out] written_size A pointer to a size_t variable that will be set to the number of bytes written\n *                          into the allocated buffer.\n *\n * @param[in] options       Pointer to options such as max number of events to fetch.\n *\n * @return NRT_SUCCESS on success.\n *\n * Usage example:\n *     char *buffer;\n *     size_t written_size;\n *     nrt_sys_trace_fetch_options_t *options;\n *     nrt_sys_trace_fetch_options_allocate(&options);\n *     nrt_sys_trace_fetch_options_set_nc_idx(options, 0); // Fetch events from NeuronCore 0 only instead of all\n *     nrt_sys_trace_fetch_options_set_max_events_per_nc(options, 10000); // Fetch up to 10,000 events instead of all\n *     nrt_sys_trace_fetch_events(&buffer, &written_size, options);\n *     // or if you want to use the default options:\n *     nrt_sys_trace_fetch_events(&buffer, &written_size, NULL);\n *     // finally free the buffer when the events are no longer needed:\n *     nrt_sys_trace_buffer_free(buffer)\n */\nNRT_STATUS nrt_sys_trace_fetch_events(char **buffer, size_t *written_size, const nrt_sys_trace_fetch_options_t *options);\n\n/** Free the buffer allocated by nrt_sys_trace_fetch_events. Should be called after the events are no longer needed.\n *\n * @param buffer [in]        - Pointer to buffer to be freed.\n *\n * @return NRT_SUCCESS on success.\n */\nvoid nrt_sys_trace_buffer_free(char *buffer);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
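The capture half of nrt_sys_trace.h above (config allocate, tweak, start) can be summarized with a short sketch. The 10,000-event cap, restricting capture to NeuronCore 0, and freeing the config immediately after nrt_sys_trace_start() are all assumptions made for illustration.

```c
/* Illustrative only: enable system-trace capture for NeuronCore 0 with a capped
 * per-core event count, based on the declarations above. */
#include <stdbool.h>
#include <stdint.h>
#include "nrt/nrt_sys_trace.h"

static NRT_STATUS start_trace_for_nc0(uint32_t total_ncs)
{
    nrt_sys_trace_config_t *cfg = NULL;
    NRT_STATUS st = nrt_sys_trace_config_allocate(&cfg);
    if (st != NRT_SUCCESS) return st;

    nrt_sys_trace_config_set_defaults(cfg);
    nrt_sys_trace_config_set_max_events_per_nc(cfg, 10000);

    /* Disable capture on every core except core 0 to save ring-buffer memory. */
    for (uint32_t nc = 1; nc < total_ncs; nc++) {
        nrt_sys_trace_config_set_capture_enabled_for_nc(cfg, nc, false);
    }

    st = nrt_sys_trace_start(cfg);
    nrt_sys_trace_config_free(cfg);   /* assumed safe once capture has started */
    return st;
}
```

Captured events would then be pulled with nrt_sys_trace_fetch_events() as shown in the header's own usage comment, and capture torn down with nrt_sys_trace_stop().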
  {
    "path": "src/libnrt/include/nrt/nrt_version.h",
    "content": "/*\n * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved\n */\n\n#pragma once\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n#define RT_VERSION_DETAIL_LEN 128\n#define GIT_HASH_LEN 64\n\ntypedef struct nrt_version {\n    uint64_t rt_major;\n    uint64_t rt_minor;\n    uint64_t rt_patch;\n    uint64_t rt_maintenance;\n    char rt_detail[RT_VERSION_DETAIL_LEN];\n    char git_hash[GIT_HASH_LEN];\n} nrt_version_t;\n\n/** Get the NRT library version\n *\n * @param ver[out]          - Pointer to nrt version struct\n * @param size[in]          - Length of the data needed to be filled in the nrt_version_struct\n *\n * @return NRT_STATUS_SUCCESS on success.\n */\nNRT_STATUS nrt_get_version(nrt_version_t *ver, size_t size);\n\n#ifdef __cplusplus\n}\n#endif\n"
  },
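A brief sketch of the version query above: pass a zeroed nrt_version_t and its size so the runtime can fill only the fields it knows about. The formatting choices here are illustrative, not part of the library.

```c
/* Illustrative only: print the Neuron runtime library version via nrt_version.h. */
#include <stdio.h>
#include <string.h>
#include "nrt/nrt.h"          /* NRT_STATUS and fixed-width integer types */
#include "nrt/nrt_version.h"

static void print_runtime_version(void)
{
    nrt_version_t ver;
    memset(&ver, 0, sizeof(ver));
    if (nrt_get_version(&ver, sizeof(ver)) == NRT_SUCCESS) {
        printf("Neuron runtime %llu.%llu.%llu.%llu (%s)\n",
               (unsigned long long)ver.rt_major, (unsigned long long)ver.rt_minor,
               (unsigned long long)ver.rt_patch, (unsigned long long)ver.rt_maintenance,
               ver.rt_detail);
    }
}
```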
  {
    "path": "src/neuron-gatherinfo/LICENSE",
    "content": "Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify,\nmerge, publish, distribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A\nPARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\nOF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "src/neuron-gatherinfo/clear_params_tfpb.py",
    "content": "import re\nimport copy\nimport argparse\nimport tensorflow as tf\nimport numpy as np\nimport string\n\nfrom google.protobuf import text_format\nfrom tensorflow.core.framework import node_def_pb2\nfrom tensorflow.core.framework import attr_value_pb2\nfrom tensorflow.python.framework import tensor_util\nfrom tensorflow.tools.graph_transforms import TransformGraph\n\ndef zero_const(node):\n  val = tf.make_ndarray(node.attr.get(\"value\").tensor)\n  new_val = val * 0.0\n  new_tensor = tensor_util.make_tensor_proto(new_val, new_val.dtype, new_val.shape)\n  node.attr[\"value\"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor))\n\ndef ZeroAllConst(graphdef):\n  sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\n  const_by_name = {}\n  node_by_name = {}\n  for node in graphdef.node:\n    node_by_name[node.name] = node  \n    if node.op == \"Const\":\n      const_by_name[node.name] = node  \n    if node.op == \"BiasAdd\" or node.op == \"MatMul\" \\\n            or node.op.startswith(\"Conv\") \\\n            or node.op.startswith(\"FusedBatchNorm\"):\n      for i in node.input:  \n        i_node = node_by_name[i]\n        if i_node.op == \"Const\":\n          zero_const(i_node)\n        if i_node.op == \"Identity\":\n          x_node = node_by_name[i_node.input[0]]\n          if x_node.op == \"Const\":\n            zero_const(x_node)\n  return graphdef\n\ndef load_graph(model_file):\n  graph_def = tf.compat.v1.GraphDef()\n\n  with open(model_file, \"rb\") as f:\n    graph_def.ParseFromString(f.read())\n  return graph_def\n\nif __name__ == \"__main__\":\n  parser = argparse.ArgumentParser(description=\"Zero-out parameters of BiasAdd, MatMul, Conv*, and FusedBatchNorm of TensorFlow frozen graph.\")\n  parser.add_argument(\"--graph\", help=\"File name of frozen graph to be converted\",\n      required=True)\n  parser.add_argument(\"--out_graph\", help=\"File name to save converted frozen graph\",\n      required=True)\n  args = parser.parse_args()\n\n  graph_orig = load_graph(args.graph)\n  graph_mod = ZeroAllConst(graph_orig)\n  with tf.io.gfile.GFile(args.out_graph, \"wb\") as f:\n    f.write(graph_mod.SerializeToString())\n  #with tf.io.gfile.GFile(args.out_graph + \"txt\", 'w') as f:\n  #  f.write(text_format.MessageToString(graph_mod))\n"
  },
  {
    "path": "src/neuron-gatherinfo/mx_neuron_check_model.py",
    "content": "import os\nimport json\nimport sys\nimport struct\nimport argparse\nimport subprocess\nfrom collections import Counter\n \nclass neuron_parser:\n  def __init__(self):\n    self.parser = argparse.ArgumentParser()\n    self.parser.add_argument('model_path', type=str, help='path prefix to MXNet model (the part before -symbol.json).')\n    self.parser.add_argument('--show_names', action='store_true', help='list operation by name instead of summarizing by type (caution: this option will generate many lines of output for a large model).')\n    self.parser.add_argument('--expand_subgraph', action='store_true', help='show subgraph operations.')\n    self.parser_args = self.parser.parse_args()\n    self.neuronop_info = {}\n    self.total_pipeline_cores = 0\n    self.min_required_pipeline_cores = 0\n    path = self.parser_args.model_path\n\n    if os.path.exists(path + '-symbol.json'):\n      self.load_mxnet_model(path)\n    elif os.path.isdir(path):\n      self.load_tensorflow_model(path)\n    else:\n      raise RuntimeError('Cannot determine framework type from model path argument.')\n    self.supported = self.get_neuron_supported()\n    self.supported.extend(self.addl_support)\n    for name, executable, (sg_nodetypes, sg_nodenames) in self.neuron_nodes:\n      num_cores, requested_cores, _ = self.get_cores_from_executable(executable)\n      self.neuronop_info[name] = (num_cores, requested_cores, sg_nodetypes, sg_nodenames)\n      self.total_pipeline_cores += num_cores\n      if num_cores > self.min_required_pipeline_cores:\n          self.min_required_pipeline_cores = num_cores\n\n  def get_neuron_supported(self):\n    exec_cmd = [\"neuron-cc\", \"list-operators\", \"--framework\", self.framework]\n    oplist = subprocess.check_output(' '.join(exec_cmd), shell=True)\n    oplist = str(oplist, 'utf-8')\n    oplist = oplist.split(\"\\n\")\n    return oplist[:-1]  # Remove the last element which is ''\n \n  def get_tf_subgraph_types_names(self, node):\n    from tensorflow.core.framework import graph_pb2\n    graph_def = graph_pb2.GraphDef()\n    graph_def.ParseFromString(node.attr['graph_def'].s)\n    sg_nodes = graph_def.node\n    sg_nodes = [sg_node for sg_node in sg_nodes if sg_node.op not in self.excl_types]\n    nodetypes = [sg_node.op for sg_node in sg_nodes]\n    nodenames = [sg_node.name for sg_node in sg_nodes]\n    return nodetypes, nodenames\n\n  def load_tensorflow_model(self, path):\n    import tensorflow as tf\n    import tensorflow_hub as hub\n    self.framework = 'TENSORFLOW'\n    self.neuron_optype = \"NeuronOp\"\n    self.excl_types = ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2']\n    self.addl_support = ['FusedBatchNormV3', 'BatchMatMulV2', 'AddV2', 'StopGradient', self.neuron_optype]\n    model = hub.load(path)\n    graph_def = model.graph.as_graph_def()\n    nodes = graph_def.node\n    nodes = [node for node in nodes if node.op not in self.excl_types]\n    self.nodetypes = [node.op for node in nodes]\n    self.nodenames = [node.name for node in nodes]\n    self.neuron_nodes = [(node.name, node.attr['executable'].s, self.get_tf_subgraph_types_names(node)) for node in nodes if node.op == self.neuron_optype]\n\n  def get_mx_subgraph_types_names(self, node):\n    nodetypes = []\n    nodenames = []\n    for sg in node['subgraphs']:\n      filtered_nodes = [sg_node for sg_node in 
sg['nodes'] if sg_node['op'] not in self.excl_types]\n      nodetypes.extend([sg_node['op'] for sg_node in filtered_nodes])\n      nodenames.extend([sg_node['name'] for sg_node in filtered_nodes])\n    return nodetypes, nodenames\n\n  def load_mxnet_model(self, path):      \n    import mxnet as mx\n    if mx.__version__ != \"1.5.1\":\n      try:\n        import mx_neuron as mxn\n      except:\n        raise \"Please install mxnetneuron package.\"\n    self.framework = 'MXNET'\n    self.neuron_optype = \"_neuron_subgraph_op\"\n    self.excl_types = ['null']\n    self.addl_support = [self.neuron_optype]\n    sym, args, auxs = mx.model.load_checkpoint(path, 0)\n    nodes = json.loads(sym.tojson())[\"nodes\"]\n    nodes = [node for node in nodes if node['op'] not in self.excl_types]\n    self.nodetypes = [node['op'] for node in nodes]\n    self.nodenames = [node['name'] for node in nodes]\n    neuron_nodes_tmp = [node for node in nodes if node['op'] == self.neuron_optype]\n    self.neuron_nodes = [(node['name'], bytearray(args[node['name']+\"_neuronbin\"].asnumpy()), self.get_mx_subgraph_types_names(node)) for node in neuron_nodes_tmp]\n\n  @staticmethod\n  def get_cores_from_executable(executable):\n    _NC_HEADER_SIZE = 544\n    header = executable[:_NC_HEADER_SIZE]\n    info = list(struct.unpack('168xI304xI64B', header))\n    numCores = info.pop(0)\n    numCoresRequested = info.pop(0)\n    coresPerNode = info\n    return  numCores, numCoresRequested, coresPerNode\n\n  # Display table of operation type or name and whether supported or not\n  def print_node_type_info(self):\n    self.cnt_total = len(self.nodetypes)\n    self.cnt_supported = 0\n    if self.parser_args.show_names:\n      widthn = max(max(map(len, self.nodenames)), 8)\n      widtht = max(max(map(len, self.nodetypes)), 8)\n      format_str = \"{:<\" + str(widthn) + \"}  {:<\" + str(widtht) + \"}  {:<4}\"\n      pp = lambda x: print(format_str.format(*x))\n      pp(['Op Name', 'Op Type', 'Neuron Supported ?'])\n      pp(['-------', '-------', '------------------'])\n      for idx, opname in enumerate(self.nodenames):\n        optype = self.nodetypes[idx]\n        if optype in self.supported:\n          pp([opname, optype, 'Yes'])\n          self.cnt_supported += 1\n      for idx, opname in enumerate(self.nodenames):\n        optype = self.nodetypes[idx]\n        if optype not in self.supported:\n          pp([opname, optype, 'No'])\n    else:\n      count = Counter(self.nodetypes)\n      width = max(max(map(len, self.nodetypes)), 8)\n      format_str = \"{:<\" + str(width) + \"}  {:<14}  {:<4}\"\n      pp = lambda x: print(format_str.format(*x))\n      pp(['Op Type', 'Num Instances', 'Neuron Supported ?'])\n      pp(['-------', '-------------', '------------------'])\n      for key in count:\n        if key in self.supported:\n          pp([key, count[key], 'Yes'])\n          self.cnt_supported += count[key]\n      for key in count:\n        if key not in self.supported:\n          pp([key, count[key], 'No'])\n    print()\n\n  def print_subgraph_ops(self, sg_nodetypes, sg_nodenames):\n    if self.parser_args.show_names:\n      widthn = max(max(map(len, sg_nodenames)), 8)\n      widtht = max(max(map(len, sg_nodetypes)), 8)\n      format_str = \"{:<\" + str(widthn) + \"}  {:<\" + str(widtht) + \"}\"\n      pp = lambda x: print('    ', format_str.format(*x))\n      pp(['Op Name', 'Op Type'])\n      pp(['-------', '-------'])\n      for idx, opname in enumerate(sg_nodenames):\n        optype = sg_nodetypes[idx]\n        pp([opname, 
optype])\n    else:\n      count = Counter(sg_nodetypes)\n      width = max(max(map(len, sg_nodetypes)), 8)\n      format_str = \"{:<\" + str(width) + \"}  {:<14}\"\n      pp = lambda x: print('    ', format_str.format(*x))\n      pp(['Op Type', 'Num Instances'])\n      pp(['-------', '-------------'])\n      for key in count:\n        pp([key, count[key]])\n\n  def print_neuron_node_info(self):\n    idx = 0\n    width = max(max(map(len, self.neuronop_info)), 14) \n    format_str = \"{:<\" + str(width) + \"}  {:<14}\"\n    pp = lambda x: print(format_str.format(*x))\n    pp(['Subgraph Name', 'Num Pipelined NeuronCores'])\n    pp(['-------------', '-------------------------'])\n    core_cnt_list = []\n    for name, (num_cores, _, sg_nodetypes, sg_nodenames) in self.neuronop_info.items():\n      pp([name, num_cores])\n      core_cnt_list.append(num_cores)\n      idx += 1\n      if self.parser_args.expand_subgraph:\n        self.print_subgraph_ops(sg_nodetypes, sg_nodenames)\n    print()\n\n  def print_neuron_support_stats(self):\n    print(\"* Total inference operations: {}\".format(self.cnt_total))\n    print(\"* Total Neuron supported inference operations: {}\".format(self.cnt_supported))\n    if self.cnt_total > 0:\n      perc = self.cnt_supported / self.cnt_total * 100\n    else:\n      perc = 0\n    print(\"* Percent of total inference operations supported by Neuron: {:.1f}\".format(perc))\n    print()\n\n  def print_common_desc(self):\n    if self.parser_args.show_names:\n      print(\"* Each line shows an operation name and whether the type of that operation is supported in Neuron.\")\n    else:\n      print(\"* Each line shows an operation type, the number of instances of that type within model,\\n\" \\\n            \"* and whether the type is supported in Neuron.\")\n    print(\"* Some operation types are excluded from table because they are no-operations or training-related operations:\\n\", \\\n            self.excl_types, \"\\n\")\n\n  def run(self):\n    if len(self.neuronop_info) > 0:\n      print(\"\\n* Found {} Neuron subgraph(s) ({}(s)) in this compiled model.\\n\" \\\n            \"* Use this tool on the original uncompiled model to see Neuron supported operations.\\n\" \\\n            \"* The following table shows all operations, including Neuron subgraphs.\".format(len(self.neuronop_info), self.neuron_optype))\n      self.print_common_desc()\n      self.print_node_type_info()\n      print('* Please run this model on Inf1 instance with at least {} NeuronCore(s).'.format(self.min_required_pipeline_cores))\n      print(\"* The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\\n\"\\\n            \"* (and subgraph operations if --expand_subgraph is used):\\n\")\n      self.print_neuron_node_info()\n    else:\n      print(\"\\n* The following table shows the supported and unsupported operations within this uncompiled model.\")\n      self.print_common_desc()\n      self.print_node_type_info()\n      self.print_neuron_support_stats()\n \nif __name__=='__main__':\n  toolkit = neuron_parser()\n  toolkit.run()\n"
  },
  {
    "path": "src/neuron-gatherinfo/neuron-gatherinfo.py",
    "content": "#!/usr/bin/env python3\n# coding=utf-8\n\n\n\"\"\" Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\n    SPDX-License-Identifier: MIT-0\n\n    Program to gather information from a system\n\"\"\"\nimport sys\nimport os\nimport argparse\nimport shutil\nimport subprocess\nimport re\n\nACTUAL_CMD = os.path.realpath(sys.argv[0])\n\nUSAGE_MSG = \"\"\"\n    Usage: {} [options]\n    This program is used to gather information from this system for analysis\n    and debugging\n    \"\"\".format(ACTUAL_CMD)\n\nEXCLUDE_FILES_BY_NAME = \"weight files, model, NEFF (Neuron Executable File Format)\"\n\nHELP_CC_FILES = \"\"\" Location of the neuron-cc generated files \"\"\"\nDEFAULT_CCFILES_LOCATION = \"~/bin\"\n\nSYSLOG_SEARCH_PATTERNS = r\"nrtd|neuron|kernel:\"\n\nEXTERNAL_CMDS = [\"lscpu\", \"lshw\",\n                 \"lspci | grep -i Amazon\",\n                 \"neuron-cc --version\",\n                 \"neuron-ls\",\n                 \"top -b -n 1\",\n                 \"uname -a\", \"uptime\",\n                 ]\n\nPROC_FILES = [\"/proc/cmdline\",\n              \"/proc/cpuinfo\",\n              \"/proc/filesystems\",\n              \"/proc/interrupts\",\n              \"/proc/iomem\",\n              \"/proc/loadavg\",\n              \"/proc/meminfo\",\n              \"/proc/modules\",\n              \"/proc/mtrr\",\n              \"/proc/version\",\n              ]\n\nHELP_ADDITIONAL_FILE_OR_DIR = \"\"\" Additional file or directory that the user wants to provide in\n    the archive. The user can sanitize this file or directory before sharing \"\"\"\n\nINCLUDE_MSG = \"\"\"\n    By default, only the lines containing (grep) patterns like '{}' from the syslog are copied.\n    Other lines are excluded. Using this option allows the timestamp section of other lines\n    to be included. The rest of the contents of the line itself are elided. Providing the\n    timestamp section may provide time continuity while viewing the copied syslog file\n    \"\"\".format(SYSLOG_SEARCH_PATTERNS)\n\nHELP_RT_FILES = \"\"\" Location of the neuron runtime generated files \"\"\"\nMISCINFO_FILE = 'miscinfo.txt'\n\nHELP_VERBOSE = \"\"\" Verbose mode displays commands executed and any additional information\n                   which may be useful in debugging the tool itself\n               \"\"\"\n\nINCLUDE_EXTNS = ('.pb')\n\nHELP_INCLUDE_EXTN_FILES = \"\"\" Include files with these extensions from the compiler work\n    directory in the archive:\n    {}\n    \"\"\".format(INCLUDE_EXTNS)\n\nHELP_STDOUT = \"\"\" The file where the stdout of the compiler run was saved \"\"\"\n\nHELP_OUTDIR_MSG = \"\"\"\n    The output directory where all the files and other information will be stored.\n    The output will be stored as an archive as well as the actual directory where all the\n    contents are copied. 
This will allow a simple  audit of the files, if necessary.\n    *** N O T E ***: Make sure that this directory has enough space to hold the files\n    and resulting archive\n    \"\"\"\n\nUSERCMDFILE = \"how-the-user-executed-the-script-{}.txt\".format(os.path.basename(ACTUAL_CMD))\n\nNEURONDUMPPROGRAM = \"/opt/aws/neuron/bin/neuron-dump.py\"\nNEURONDUMPFILE = os.path.splitext(os.path.basename(NEURONDUMPPROGRAM))[0]\n\nNEURON_ERRMSG = \"Error: File {} doesn't exist, aws-neuron-tool package isn't installed?\".format(\n        NEURONDUMPPROGRAM)\n\nNEURON_INFO_TARBALL = \"{}\".format(os.path.splitext(os.path.basename(ACTUAL_CMD))[0])\nNEURONTMPDIR = NEURON_INFO_TARBALL\n\nARCHIVE_MSG = \"\\n\\n\\t******\\n\\tArchive created at:\\n\\t\\t{}\\n\\tFrom directory:\\n\\t\\t{}\\n\\t******\\n\\n\"\n\nNOT_IMPLEMENTED_MSG = \", nothing to see here, folks (not implemented as yet)\"\n\n# these are the only compiler-generated files that are included by default\nCOMPILER_FILES = ['graph_def.neuron-cc.log', 'all_metrics.csv', 'hh-tr-operand-tensortensor.json']\n\nCOMPILER_FILES_USER_OPT_IN = ['exp_and_others.json', 'graph_def.neff', 'graph_def.pb',\n                              'hh-spilled.json', 'hh-tr-accDN2virtDN.json',\n                              'hh-tr-external-move.json', 'hh-tr-internal-move.json',\n                              'hh-tr-removeDN.json', 'hh-transforms.json', 'wavegraph.json',\n                              'hh.json', 'pass03_scheduling.json',\n                              'relay_graph_opt_pre_color.txt', 'relay_graph_post_opt_kelp.txt',\n                              'relay_graph_post_opt_unit_level.txt', 'relay_graph_pre_opt.txt',\n                              'saved_model.pb', 'sch.json', 'sch_tmp.json',\n                              'schedule_trace.json',\n                              'wavegraph-bin.json']\n\nMODEL_DATA_MSG = \"\"\"\n    By using this option, the entire compiler work directory's contents will be\n    included (excluding the {} files, unless an additional option is used). 
This would\n    include model information, etc.\n    The files that are included, by default, are these: {}\n\n    \"\"\".format(INCLUDE_EXTNS, \", \".join(COMPILER_FILES))\n\nMODEL_DATA_MSG_INFO = \"\"\"\n\\t**************************\n\\tBased on your command line option, we're also packaging these files:\n\n\\t\\t{}\n\n\\tAnd this directory: {}\n\n\\t**************************\n\"\"\"\n\ndef get_os_version():\n\n    ''' function to obtain the Linux version\n        Args:\n\n        Output:\n\n        Returns:\n            string with value 'Ubuntu' or 'RedHat'\n    '''\n\n    try:\n        with open(\"/proc/version\") as fdin:\n            data = fdin.read()\n            if data.find('Ubuntu') == -1:\n                osver = 'RedHat'\n            else:\n                osver = 'Ubuntu'\n    except FileNotFoundError:\n        osver = 'Ubuntu'\n\n    return osver\n\n\ndef get_files(*, basedir, matchfiles, verbose):\n    ''' function to get the files based on a base directory and file extension\n\n        Args:\n            basedir     : base directory where files reside\n            matchfiles  : set of files to match\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n\n        Returns:\n            list of files found\n\n    '''\n\n    myfiles = list()\n    for dpath, _, files in os.walk(basedir):\n        for mfile in files:\n            if mfile in matchfiles:\n                mfile = os.path.realpath(os.path.join(dpath, mfile))\n                if os.path.isfile(mfile):\n                    myfiles.append(mfile)\n                else:\n                    if verbose:\n                        print(\"Warning: {} is not a file\".format(mfile))\n\n    return myfiles\n\n\ndef dump_compiler_info(*, outdir, location, allowmodel=False, addfldir=None, verbose=False):\n    ''' function to gather the following information:\n            Framework:\n                - TensorFlow\n                - MXNet\n                - PyTorch\n            Compiler:\n        Args:\n            outdir      : output directory\n            location    : location of compiler-generated files\n            allowmodel  : if True, allow gathering of additional files\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output: compiler-generated files copied to outdir\n\n        Returns:\n    '''\n\n    if location is not None:\n        if allowmodel:  # copy the entire directory\n            try:\n                shutil.copytree(location, os.path.join(outdir, os.path.basename(location)),\n                                ignore_dangling_symlinks=True)\n            except shutil.Error:\n                pass\n        else:\n            fileset = set(COMPILER_FILES)\n            l1data = get_files(basedir=location, matchfiles=fileset, verbose=verbose)\n            copy_files(outdir=outdir, basedir=location, filelist=l1data, verbose=verbose)\n\n        if addfldir is not None:\n            if os.path.isfile(addfldir):\n                shutil.copy(addfldir, outdir)\n            else:  # directory copy\n                try:\n                    shutil.copytree(addfldir, os.path.join(outdir, os.path.basename(addfldir)),\n                                    ignore_dangling_symlinks=True)\n                except shutil.Error:\n                    pass\n\n    # print(\"Function: \", sys._getframe().f_code.co_name,  # pylint: disable=W0212\n    #       NOT_IMPLEMENTED_MSG)\n\n\ndef copy_stdout(*, outdir, stdout, verbose):\n    ''' function to 
copy the stdout file to the destination location\n\n        Args:\n            outdir  : destination location (output directory)\n            stdout  : file containing the output of running neuron-cc\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n\n        Returns:\n    '''\n\n    if verbose:\n        print(\"Copying {} to {}\".format(stdout, outdir))\n\n    shutil.copy(stdout, outdir)\n\n\ndef copy_syslog(*, outdir, include_flag=False, verbose):\n    '''\n        function to copy contents of the syslog to the output directory\n\n        Args:\n            outdir          : output directory location where the syslog's contents\n                              are to be copied\n            include_flag    : if True, include lines that do not match\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n            copy of syslog's contents with just \"Neuron-specific\" lines\n\n        Returns:\n    '''\n\n    # syslog looks like this:\n    # 2019-11-21T19:32:50.347183+00:00 ink neuron-rtd[17977]: nrtd[17977]: <SNIP>\n    # The first regex (regex1) is used to match lines that we want to see in our copy\n\n    regex1 = re.compile(r'^(\\S+)\\s.*?({})'.format(SYSLOG_SEARCH_PATTERNS))\n    regex2 = re.compile(r'^(\\S+)\\s')\n\n    osver = get_os_version()\n    if osver == 'Ubuntu':\n        syslog = '/var/log/syslog'\n    else:\n        syslog = '/var/log/messages'\n\n    try:\n        with open(syslog) as fdin,\\\n            open(os.path.join(outdir, 'copy-of-syslog'), 'w') as fdout:\n            for line in fdin:\n                match = regex1.search(line)\n                if match is not None:\n                    fdout.write(line)\n                else:\n                    if include_flag:\n                        match = regex2.match(line)\n                        if match is not None:\n                            # exclude the rest of the line\n                            fdout.write(match.group(1) + ' XXX contents elided XXX\\n')\n                        else:\n                            print(\"Error in parsing this line: {}\".format(line))\n    except FileNotFoundError:\n        print(\"Error, /var/log/syslog not found\")\n\n\ndef dump_rt_info(*, location, verbose):\n    ''' function to dump the following information:\n            - runtime\n            - Framework (??)\n        Args:\n            location: location of runtime files\n            verbose : flag to indicate if verbose messages need to be displayed\n        Returns:\n            list of info\n    '''\n\n    # l1data = get_files(basedir=location, file_extn=('.sh'))\n    print(\"Function: \", sys._getframe().f_code.co_name,  # pylint: disable=W0212\n          NOT_IMPLEMENTED_MSG)\n\n\ndef allow_capture_of_files():\n    '''\n        function to allow the capture of files from the customer's environment\n        This is OFF by default and has to be explicitly enabled by the command-line\n        option by the user\n\n        Args:\n\n        Output:\n\n        Returns:\n\n    '''\n\n    print(\"Function: \", sys._getframe().f_code.co_name,  # pylint: disable=W0212\n          NOT_IMPLEMENTED_MSG)\n\n\ndef add_additional_filters(filterfile):\n    '''\n        function to apply additional filters to files that are being captured\n\n        Args:\n            filterfile  : text file with patterns (regexs), one per line, to use as filters\n\n\n        Output:\n\n        Returns:\n\n    '''\n\n    print(\"Function: \", 
sys._getframe().f_code.co_name,  # pylint: disable=W0212\n          NOT_IMPLEMENTED_MSG)\n\n\ndef dump_miscinfo(*, outdir, verbose):\n    ''' function to dump miscellaneous information, including:\n            - system info (uname -a)\n            - package info (??? list of packages installed)\n            - neuron-ls\n            - neuron-top\n\n        Args:\n            outdir  : output directory\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n            Creates various reports in the outdir location\n\n        Returns:\n\n    '''\n\n    osver = get_os_version()\n    if osver == 'Ubuntu':\n        pkgcmds = [\"apt list | egrep '^aws'\",\n                   \"pip list | egrep '^neuron|^numpy|^tensor|^scipy'\"]\n    else:\n        pkgcmds = [\"rpm -qa | egrep '^aws|^neuron|^numpy|^tensor|^scipy'\"]\n\n    cmds = EXTERNAL_CMDS + pkgcmds\n\n    for cmd in cmds:\n        cmdname = cmd.split(' ')[0]  # get just the command name for creating the file\n        cmdfile = os.path.join(outdir, \"report-{}.txt\".format(cmdname))\n\n        with open(cmdfile, \"w\") as fdout:\n\n            if verbose:\n                print(\"Running cmd: {} and capturing output in file: {}\".format(cmd, cmdfile))\n\n            try:\n                res = subprocess.Popen(cmd, stdout=subprocess.PIPE,\n                                       stderr=subprocess.STDOUT, universal_newlines=True,\n                                       shell=True)\n                stdout, stderr = res.communicate()\n                if stderr is not None:\n                    fdout.write(\"Error in executing cmd: {}\\nError: {}\\n\".format(cmd, str(stderr)))\n                else:\n                    fdout.write(\"Output from executing cmd: {}\\n\\n{}\\n\".format(cmd, str(stdout)))\n            except (OSError, ValueError) as err:\n                fdout.write(\"Error in executing cmd: {}\\nError: {}\\n\".format(cmd, err))\n\n\ndef dump_proc_info(*, outdir, verbose):\n    '''\n        function to dump information related to \"/proc\"\n\n        Args:\n            outdir  : output directory\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n            Creates various reports in the outdir location\n\n        Returns:\n\n    '''\n\n    for procfile in PROC_FILES:\n        fname = procfile.split('/')  # use the 2nd and 3rd items from this (canonical form)\n        pfile = os.path.join(outdir, \"report-{}-{}.txt\".format(fname[1], fname[2]))\n        if verbose:\n            print(\"Copying contents of: {} to: {}\".format(procfile, pfile))\n\n        try:\n            with open(pfile, \"w\") as fdout, open(procfile) as fdin:\n                fdout.write(\"Contents of {}\\n\\n\".format(procfile))\n                fdout.write(fdin.read())\n        except FileNotFoundError:\n            print(\"Error: file {} not found\\n\".format(procfile))\n\n\ndef sanity_check(options):\n    '''\n        function to check if command-line arguments are valid\n\n        Args:\n            options : options from argparse parser\n\n        Output:\n\n        Returns:\n            0 : success\n            1 : failure\n    '''\n\n    # the script has to be run as root or \"sudo\"\n    if os.getuid() != 0:\n        print(\"*** Rerun this script as user 'root' or as sudo **\\n\\n\")\n        return 1\n\n    outdir = options.outdir\n\n    retval = 0\n    if os.path.isfile(outdir) or os.path.isdir(outdir):\n        print(\"Error: {} already exists, please provide a 
non-existing directory\".format(outdir))\n        retval = 1\n\n    if not os.path.isfile(options.stdout):\n        print(\"Error: {} doesn't exist, please provide an existing file\".format(options.stdout))\n        retval = 1\n\n    if options.addfldir is not None:\n        if not os.path.isfile(options.addfldir) and not os.path.isdir(options.addfldir):\n            print(\"Error: {} isn't a file nor a directory\".format(options.addfldir))\n            retval = 1\n\n    for mydir in [options.ccdir, options.rtdir]:\n        if mydir is not None and not os.path.isdir(mydir):\n            print(\"Error: {} is not a directory, please provide a directory\".format(mydir))\n            retval = 1\n\n    if options.allowmodel and options.ccdir is None:\n        print(\"Error: you need to specify a compiler work directory along with the 'm' option\")\n        retval = 1\n    return retval\n\n\ndef copy_files(*, outdir, basedir, filelist, verbose):\n    '''\n        function to copy files from the original source area\n        into the destination. This is also the place for any\n        massaging or eliding of file contents\n\n        Args:\n            outdir  : destination location\n            basedir : base directory from where the files are to be copied\n            filelist: list of files to be copied\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n            Copy of files (possibly altered) from the source\n\n        Returns:\n\n    '''\n\n    for thisfile in filelist:\n        myfile = '.' + thisfile[len(basedir):]\n        mydir = os.path.dirname(os.path.join(outdir, myfile))\n        if not os.path.isdir(mydir):\n            os.makedirs(mydir)\n        shutil.copy(thisfile, mydir, follow_symlinks=True)\n\n\ndef write_miscinfo(*, outdir, data):\n    '''\n        function to write out the contents of the miscellaneous commands\n\n        Args:\n            outdir  : destination location\n            data    : list of strings to be stored in a file\n\n        Output:\n            MISCINFO_FILE created with the contents of the output of the various\n            commands\n    '''\n\n    flname = os.path.join(outdir, MISCINFO_FILE)\n\n    with open(flname, \"w\") as fdout:\n        fdout.write(\"\\n\".join(data))\n\n\ndef run_neuron_dump(outdir, verbose):\n    '''\n        function to call the existing neuron-dump.py tool\n\n        Args:\n            outdir  : destination location\n            verbose : flag to indicate if verbose messages need to be displayed\n\n        Output:\n            tarball created by this tool\n\n        Returns:\n\n    '''\n\n    if not os.path.isfile(NEURONDUMPPROGRAM):\n        print(NEURON_ERRMSG)\n        return\n\n    cmd = \"{} -o {}\".format(NEURONDUMPPROGRAM, os.path.join(outdir, NEURONDUMPFILE))\n\n    if verbose:\n        print(\"Executing command: {}\".format(cmd))\n\n    try:\n        res = subprocess.Popen(cmd, stdout=subprocess.PIPE,\n                               stderr=subprocess.STDOUT, universal_newlines=True,\n                               shell=True)\n        stdout, stderr = res.communicate()\n        if stderr is not None:\n            print(\"Error in executing cmd: {}\\nError: {}\\n\".format(cmd, str(stderr)))\n    except (OSError, ValueError) as err:\n        print(\"Error in executing cmd: {}\\nError: {}\\n\".format(cmd, err))\n\n    if verbose:\n        print(\"Output of cmd: {}\\n{}\".format(cmd, stdout))\n\n\ndef package_tarball(*, outdir, allowmodel, ccdir, verbose):\n    '''\n    
    function to package everything into a tarball\n\n        Args:\n            outdir      : output directory\n            allowmodel  : flag to indicate whether the user has allowed\n                          gathering of model data\n\n        Output:\n            A tar ball created in directory one level above outdir\n            this would be the directory provided by the user\n\n        Returns:\n    '''\n\n    mytarball = os.path.join(os.path.split(outdir)[0], NEURON_INFO_TARBALL)\n\n    if verbose:\n        print(\"Creating archive: {}\".format(mytarball))\n\n    archivefile = shutil.make_archive(mytarball, 'gztar', outdir)\n    print(ARCHIVE_MSG.format(archivefile, outdir))\n\n    if allowmodel:\n        print(MODEL_DATA_MSG_INFO.format(\"\\n\\t\\t\".join(COMPILER_FILES),\n                                         ccdir))\n\n\ndef add_cmdline_args():\n    '''\n        function to add the command line arguments and options\n\n        Args:\n\n        Output:\n\n        Returns:\n            parser for cmd line\n\n    '''\n\n    parser = argparse.ArgumentParser(\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        description=USAGE_MSG)\n\n    parser.add_argument('--additionalfileordir',\n                        dest='addfldir',\n                        help=HELP_ADDITIONAL_FILE_OR_DIR,\n                        default=None)\n\n    parser.add_argument('-c', '--compileroutdir',\n                        dest='ccdir',\n                        help=HELP_CC_FILES,\n                        default=None)\n\n    parser.add_argument('-i', '--include',\n                        dest='includemismatch',\n                        help=INCLUDE_MSG,\n                        action='store_true',\n                        default=False)\n\n    parser.add_argument('-f', '--filter',\n                        dest='filterfile',\n                        default=None)\n\n    parser.add_argument('-m', \"--modeldata\",  # data related to model, etc. 
will be gathered\n                        dest='allowmodel',\n                        action='store_true',\n                        help=MODEL_DATA_MSG,\n                        default=False)\n\n    parser.add_argument('-o', '--out',\n                        dest='outdir',\n                        help=HELP_OUTDIR_MSG,\n                        required=True)\n\n    parser.add_argument('-r', '--runtimeoutdir',\n                        dest='rtdir',\n                        help=HELP_RT_FILES,\n                        default=None)\n\n    parser.add_argument('-s', '--stdout',\n                        dest='stdout',\n                        help=HELP_STDOUT,\n                        required=True)\n\n    parser.add_argument('-v', '--verbose',\n                        dest='verbose',\n                        help=HELP_VERBOSE,\n                        action='store_true',\n                        default=False)\n\n    return parser\n\n\ndef main():\n    \"\"\" main function\n        creates command-line option parser, sanity checks, and then executes code\n        based on command-line options\n    \"\"\"\n\n    parser = add_cmdline_args()\n\n    if len(sys.argv) == 1:\n        parser.print_help()\n        sys.exit(1)\n\n    options = parser.parse_args()\n    # append the directory where we'll create files to what the user provides\n    options.outdir = os.path.realpath(os.path.join(options.outdir, NEURONTMPDIR))\n\n    if options.ccdir is not None:\n        options.ccdir = os.path.realpath(options.ccdir)\n\n    if options.addfldir is not None:\n        options.addfldir = os.path.realpath(options.addfldir)\n\n    if options.rtdir is not None:\n        options.rtdir = os.path.realpath(options.rtdir)\n\n    options.stdout = os.path.realpath(options.stdout)\n\n    if sanity_check(options):\n        parser.print_help()\n        sys.exit(1)\n\n    # create the base directory\n    try:\n        os.makedirs(options.outdir)\n    except FileNotFoundError:\n        print(\"Error in creating directory {}\".format(options.outdir))\n        sys.exit(1)\n\n    # if options.allow:\n    #     allow_capture_of_files()\n\n    if options.filterfile is not None:\n        add_additional_filters(os.path.realpath(options.filterfile))\n\n    # record the command as executed by the user\n    with open(os.path.join(options.outdir, USERCMDFILE), \"w\") as fdout:\n        fdout.write(\"Command executed as: {}\\n\".format(\" \".join(sys.argv)))\n\n    dump_compiler_info(outdir=options.outdir, location=options.ccdir,\n                       allowmodel=options.allowmodel,\n                       addfldir=options.addfldir,\n                       verbose=options.verbose)\n\n    # Not being used now. 
neuron-dump.py would do this\n    # dump_rt_info(location=options.rtdir, verbose=options.verbose)\n\n    dump_miscinfo(outdir=options.outdir, verbose=options.verbose)\n    dump_proc_info(outdir=options.outdir, verbose=options.verbose)\n\n    copy_stdout(outdir=options.outdir, stdout=options.stdout, verbose=options.verbose)\n    copy_syslog(outdir=options.outdir, include_flag=options.includemismatch,\n                verbose=options.verbose)\n\n    # run the existing tool neuron-dump.py as well\n    run_neuron_dump(outdir=options.outdir, verbose=options.verbose)\n\n    package_tarball(outdir=options.outdir, allowmodel=options.allowmodel,\n                    ccdir=options.ccdir, verbose=options.verbose)\n\n    # change permissions for the directory and output\n    os.system(\"chown -R {} {}\".format(os.getlogin(), os.path.split(options.outdir)[0]))\n\n    # write_miscinfo(outdir=options.outdir, data=l3)\n\n\nif __name__ == \"__main__\":\n\n    main()\n"
  },
  {
    "path": "src/neuron-gatherinfo/tf_neuron_check_model.py",
    "content": "import os\nimport json\nimport sys\nimport struct\nimport argparse\nimport subprocess\nfrom collections import Counter\n \nclass neuron_parser:\n  def __init__(self):\n    self.parser = argparse.ArgumentParser()\n    self.parser.add_argument('model_path', type=str, help='a TensorFlow SavedModel directory (currently supporting TensorFlow v1 SaveModel only).')\n    self.parser.add_argument('--show_names', action='store_true', help='list operation by name instead of summarizing by type (caution: this option will generate many lines of output for a large model).')\n    self.parser.add_argument('--expand_subgraph', action='store_true', help='show subgraph operations.')\n    self.parser_args = self.parser.parse_args()\n    self.neuronop_info = {}\n    self.total_pipeline_cores = 0\n    self.min_required_pipeline_cores = 0\n    path = self.parser_args.model_path\n    if os.path.exists(path + '-symbol.json'):\n      self.load_mxnet_model(path)\n    elif os.path.isdir(path):\n      self.load_tensorflow_model(path)\n    else:\n      raise RuntimeError('Cannot determine framework type from model path argument.')\n    self.supported = self.get_neuron_supported()\n    self.supported.extend(self.addl_support)\n    for name, executable, (sg_nodetypes, sg_nodenames) in self.neuron_nodes:\n      num_cores, requested_cores, _ = self.get_cores_from_executable(executable)\n      self.neuronop_info[name] = (num_cores, requested_cores, sg_nodetypes, sg_nodenames)\n      self.total_pipeline_cores += num_cores\n      if num_cores > self.min_required_pipeline_cores:\n          self.min_required_pipeline_cores = num_cores\n\n  def get_neuron_supported(self):\n    exec_cmd = [\"neuron-cc\", \"list-operators\", \"--framework\", self.framework]\n    oplist = subprocess.check_output(' '.join(exec_cmd), shell=True)\n    oplist = str(oplist, 'utf-8')\n    oplist = oplist.split(\"\\n\")\n    return oplist[:-1]  # Remove the last element which is ''\n \n  def get_tf_subgraph_types_names(self, node):\n    from tensorflow.core.framework import graph_pb2\n    graph_def = graph_pb2.GraphDef()\n    graph_def.ParseFromString(node.attr['graph_def'].s)\n    sg_nodes = graph_def.node\n    sg_nodes = [sg_node for sg_node in sg_nodes if sg_node.op not in self.excl_types]\n    nodetypes = [sg_node.op for sg_node in sg_nodes]\n    nodenames = [sg_node.name for sg_node in sg_nodes]\n    return nodetypes, nodenames\n\n  def load_tensorflow_model(self, path):\n    import tensorflow as tf\n    import tensorflow_hub as hub\n    self.framework = 'TENSORFLOW'\n    self.neuron_optype = \"NeuronOp\"\n    self.excl_types = ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2']\n    self.addl_support = ['FusedBatchNormV3', 'BatchMatMulV2', 'AddV2', 'StopGradient', self.neuron_optype]\n    model = hub.load(path)\n    graph_def = model.graph.as_graph_def()\n    nodes = graph_def.node\n    nodes = [node for node in nodes if node.op not in self.excl_types]\n    self.nodetypes = [node.op for node in nodes]\n    self.nodenames = [node.name for node in nodes]\n    self.neuron_nodes = [(node.name, node.attr['executable'].s, self.get_tf_subgraph_types_names(node)) for node in nodes if node.op == self.neuron_optype]\n\n  def get_mx_subgraph_types_names(self, node):\n    nodetypes = []\n    nodenames = []\n    for sg in node['subgraphs']:\n      filtered_nodes = 
[sg_node for sg_node in sg['nodes'] if sg_node['op'] not in self.excl_types]\n      nodetypes.extend([sg_node['op'] for sg_node in filtered_nodes])\n      nodenames.extend([sg_node['name'] for sg_node in filtered_nodes])\n    return nodetypes, nodenames\n\n  def load_mxnet_model(self, path):\n    import mxnet as mx\n    if mx.__version__ != \"1.5.1\":\n      try:\n        import mxnetneuron as mxn\n      except ImportError:\n        raise RuntimeError(\"Please install mxnetneuron package.\")\n    self.framework = 'MXNET'\n    self.neuron_optype = \"_neuron_subgraph_op\"\n    self.excl_types = ['null']\n    self.addl_support = [self.neuron_optype]\n    sym, args, auxs = mx.model.load_checkpoint(path, 0)\n    nodes = json.loads(sym.tojson())[\"nodes\"]\n    nodes = [node for node in nodes if node['op'] not in self.excl_types]\n    self.nodetypes = [node['op'] for node in nodes]\n    self.nodenames = [node['name'] for node in nodes]\n    neuron_nodes_tmp = [node for node in nodes if node['op'] == self.neuron_optype]\n    self.neuron_nodes = [(node['name'], bytearray(args[node['name']+\"_neuronbin\"].asnumpy()), self.get_mx_subgraph_types_names(node)) for node in neuron_nodes_tmp]\n\n  @staticmethod\n  def get_cores_from_executable(executable):\n    _NC_HEADER_SIZE = 544\n    header = executable[:_NC_HEADER_SIZE]\n    info = list(struct.unpack('168xI304xI64B', header))\n    numCores = info.pop(0)\n    numCoresRequested = info.pop(0)\n    coresPerNode = info\n    return numCores, numCoresRequested, coresPerNode\n\n  # Display table of operation type or name and whether supported or not\n  def print_node_type_info(self):\n    self.cnt_total = len(self.nodetypes)\n    self.cnt_supported = 0\n    if self.parser_args.show_names:\n      widthn = max(max(map(len, self.nodenames)), 8)\n      widtht = max(max(map(len, self.nodetypes)), 8)\n      format_str = \"{:<\" + str(widthn) + \"}  {:<\" + str(widtht) + \"}  {:<4}\"\n      pp = lambda x: print(format_str.format(*x))\n      pp(['Op Name', 'Op Type', 'Neuron Supported ?'])\n      pp(['-------', '-------', '------------------'])\n      for idx, opname in enumerate(self.nodenames):\n        optype = self.nodetypes[idx]\n        if optype in self.supported:\n          pp([opname, optype, 'Yes'])\n          self.cnt_supported += 1\n      for idx, opname in enumerate(self.nodenames):\n        optype = self.nodetypes[idx]\n        if optype not in self.supported:\n          pp([opname, optype, 'No'])\n    else:\n      count = Counter(self.nodetypes)\n      width = max(max(map(len, self.nodetypes)), 8)\n      format_str = \"{:<\" + str(width) + \"}  {:<14}  {:<4}\"\n      pp = lambda x: print(format_str.format(*x))\n      pp(['Op Type', 'Num Instances', 'Neuron Supported ?'])\n      pp(['-------', '-------------', '------------------'])\n      for key in count:\n        if key in self.supported:\n          pp([key, count[key], 'Yes'])\n          self.cnt_supported += count[key]\n      for key in count:\n        if key not in self.supported:\n          pp([key, count[key], 'No'])\n    print()\n\n  def print_subgraph_ops(self, sg_nodetypes, sg_nodenames):\n    if self.parser_args.show_names:\n      widthn = max(max(map(len, sg_nodenames)), 8)\n      widtht = max(max(map(len, sg_nodetypes)), 8)\n      format_str = \"{:<\" + str(widthn) + \"}  {:<\" + str(widtht) + \"}\"\n      pp = lambda x: print('    ', format_str.format(*x))\n      pp(['Op Name', 'Op Type'])\n      pp(['-------', '-------'])\n      for idx, opname in enumerate(sg_nodenames):\n        optype = 
sg_nodetypes[idx]\n        pp([opname, optype])\n    else:\n      count = Counter(sg_nodetypes)\n      width = max(max(map(len, sg_nodetypes)), 8)\n      format_str = \"{:<\" + str(width) + \"}  {:<14}\"\n      pp = lambda x: print('    ', format_str.format(*x))\n      pp(['Op Type', 'Num Instances'])\n      pp(['-------', '-------------'])\n      for key in count:\n        pp([key, count[key]])\n\n  def print_neuron_node_info(self):\n    idx = 0\n    width = max(max(map(len, self.neuronop_info)), 14) \n    format_str = \"{:<\" + str(width) + \"}  {:<14}\"\n    pp = lambda x: print(format_str.format(*x))\n    pp(['Subgraph Name', 'Num Pipelined NeuronCores'])\n    pp(['-------------', '-------------------------'])\n    core_cnt_list = []\n    for name, (num_cores, _, sg_nodetypes, sg_nodenames) in self.neuronop_info.items():\n      pp([name, num_cores])\n      core_cnt_list.append(num_cores)\n      idx += 1\n      if self.parser_args.expand_subgraph:\n        self.print_subgraph_ops(sg_nodetypes, sg_nodenames)\n    print()\n\n  def print_neuron_support_stats(self):\n    print(\"* Total inference operations: {}\".format(self.cnt_total))\n    print(\"* Total Neuron supported inference operations: {}\".format(self.cnt_supported))\n    if self.cnt_total > 0:\n      perc = self.cnt_supported / self.cnt_total * 100\n    else:\n      perc = 0\n    print(\"* Percent of total inference operations supported by Neuron: {:.1f}\".format(perc))\n    print()\n\n  def print_common_desc(self):\n    if self.parser_args.show_names:\n      print(\"* Each line shows an operation name and whether the type of that operation is supported in Neuron.\")\n    else:\n      print(\"* Each line shows an operation type, the number of instances of that type within model,\\n\" \\\n            \"* and whether the type is supported in Neuron.\")\n    print(\"* Some operation types are excluded from table because they are no-operations or training-related operations:\\n\", \\\n            self.excl_types, \"\\n\")\n\n  def run(self):\n    if len(self.neuronop_info) > 0:\n      print(\"\\n* Found {} Neuron subgraph(s) ({}(s)) in this compiled model.\\n\" \\\n            \"* Use this tool on the original uncompiled model to see Neuron supported operations.\\n\" \\\n            \"* The following table shows all operations, including Neuron subgraphs.\".format(len(self.neuronop_info), self.neuron_optype))\n      self.print_common_desc()\n      self.print_node_type_info()\n      print('* Please run this model on Inf1 instance with at least {} NeuronCore(s).'.format(self.min_required_pipeline_cores))\n      print(\"* The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph\\n\"\\\n            \"* (and subgraph operations if --expand_subgraph is used):\\n\")\n      self.print_neuron_node_info()\n    else:\n      print(\"\\n* The following table shows the supported and unsupported operations within this uncompiled model.\")\n      self.print_common_desc()\n      self.print_node_type_info()\n      self.print_neuron_support_stats()\n \nif __name__=='__main__':\n  toolkit = neuron_parser()\n  toolkit.run()\n"
  },
  {
    "path": "src/neuronperf/LICENSE",
    "content": "AWS Neuron License Agreement\n\nTHIS IS AN AGREEMENT BETWEEN YOU AND AMAZON WEB SERVICES, INC. (WITH ITS AFFILIATES, \"AWS\" OR \"WE\") THAT GOVERNS YOUR USE OF THE AWS NEURON SOFTWARE (TOGETHER WITH ANY UPDATES AND UPGRADES TO IT, AND ACCOMPANYING DOCUMENTATION, THE “SOFTWARE”) THAT WE MAKE AVAILABLE TO YOU. IF YOU DOWNLOAD, INSTALL, OR USE THE SOFTWARE, YOU ACCEPT AND AGREE TO BE BOUND BY THIS AGREEMENT AND REPRESENT THAT YOU HAVE THE AUTHORITY TO BIND YOURSELF OR THE ENTITY YOU REPRESENT TO THIS AGREEMENT.\n\n1.\tUse of the Software\nWe hereby grant you a personal, limited, nonexclusive, non-transferable, non-sublicenseable, revocable, royalty-free, worldwide license during the term of this Agreement to install and use the Software in connection with AWS Services. You may not use the Software if you do not have an account in good standing with AWS. Some components of the Software (whether developed by AWS or third parties) may also be governed by applicable open source software licenses located in the software component's source code. Your license rights with respect to these individual components are defined by the applicable open source software license, and nothing in this Agreement will restrict, limit, or otherwise affect any rights or obligations you may have, or conditions to which you may be subject, under such open source software licenses. “AWS Services” means each of the services made available by AWS as may be updated by AWS from time to time in its sole discretion at https://aws.amazon.com/service-terms/ and are subject to your AWS Customer Agreement or AWS Enterprise Agreement.\n\n2.\tLimitations\n\nYou may not, and you will not encourage, assist or authorize any other person to (a) sell, rent, lease, lend, loan, distribute, act as a service bureau, publicly communicate, transform, or sub-license the Software or otherwise assign any rights to the Software in whole or in part, (b) modify, alter, tamper with, repair, or otherwise create derivative works of the Software, (c) reverse engineer, disassemble, or decompile the Software or apply any other process or procedure to derive the source code of any software included in the Software, or (d) access or use the Software or the AWS Service in a way intended to avoid incurring fees or exceeding usage limits or quotas. All rights granted to you are conditioned on your continued compliance with this Agreement, and will immediately and automatically terminate if you do not comply with any term or condition of this Agreement or the AWS Customer Agreement or AWS Enterprise Agreement, including any failure to remit timely payment for the Software or the AWS Service. You will not use the Software with any software or other materials that are subject to licenses or restrictions (e.g., open source software licenses) that, when combined with the Software, would require us to disclose, license, distribute or otherwise make all or any part of such Software available to anyone.  You will not remove, modify, or obscure any copyright, patent, trademark or other proprietary or attribution notices on or in any Software.\n\n3.\tReservation of Rights\n\nYou may not use the Software for any illegal purpose. The Software is the intellectual property of AWS or its licensors. The structure, organization, and code of the Software are valuable trade secrets and AWS confidential information. The Software is protected by applicable law, including without limitation copyright laws and international treaty provisions. 
Except for the rights expressly granted to you in this Agreement, all right, title and interest in the Software are reserved and retained by AWS and our licensors. You do not acquire any intellectual property or other rights in the Software as a result of downloading, installing, or using the Software.\n\n4.\tUpdates\n\nIn order to keep the Software up-to-date, we may offer automatic or manual updates at any time. If we elect to provide maintenance or support of any kind, we may terminate that maintenance or support at any time without notice to you.\n\n5.\tTermination\n\nYou may terminate this Agreement at any time by uninstalling and destroying all copies of the Software that are in your possession or control. This Agreement (including any rights granted to you under this Agreement) will immediately and automatically terminate without notice from us if (a) you fail to comply with any term or condition of this Agreement or any other agreement you have with AWS, or (b) you fail to make timely payment for any AWS Service. In the case of termination, you must cease all downloading, installation, and use of the Software and uninstall and destroy all copies of the Software that are in your possession or control. We may modify, suspend, discontinue, or terminate your right to use part or all of the Software at any time without notice to you, and in that event we may modify the Software to make it inoperable. AWS will not be liable to you should it exercise those rights. Our failure to insist upon or enforce your strict compliance with this Agreement will not constitute a waiver of any of our rights. No waiver of any provision of this Agreement shall be effective unless in writing.\n\n6.\tDisclaimer of Warranties and Limitation of Liability\n\na. YOU EXPRESSLY ACKNOWLEDGE AND AGREE THAT INSTALLATION AND USE OF, AND ANY OTHER ACCESS TO, THE SOFTWARE IS AT YOUR SOLE RISK. THE SOFTWARE IS DELIVERED TO YOU “AS IS” WITH ALL FAULTS AND WITHOUT WARRANTY OF ANY KIND, AND AWS, ITS LICENSORS AND DISTRIBUTORS, AND EACH OF THEIR RESPECTIVE AFFILIATES AND SUPPLIERS (COLLECTIVELY, THE “RELEASED PARTIES”) DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, ACCURACY, QUIET ENJOYMENT, AND NON-INFRINGEMENT. NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY A RELEASED PARTY OR AN AUTHORIZED REPRESENTATIVE OF A RELEASED PARTY WILL CREATE A WARRANTY. THE LAWS OF CERTAIN JURISDICTIONS DO NOT ALLOW THE DISCLAIMER OF IMPLIED WARRANTIES. IF THESE LAWS APPLY TO YOU, SOME OR ALL OF THE ABOVE DISCLAIMERS, EXCLUSIONS, OR LIMITATIONS MAY NOT APPLY TO YOU, AND YOU MAY HAVE ADDITIONAL RIGHTS.\n\nb. TO THE EXTENT NOT PROHIBITED BY LAW, NO RELEASED PARTY WILL BE LIABLE TO YOU FOR ANY INCIDENTAL OR CONSEQUENTIAL DAMAGES FOR BREACH OF ANY EXPRESS OR IMPLIED WARRANTY, BREACH OF CONTRACT, NEGLIGENCE, STRICT LIABILITY, OR ANY OTHER LEGAL THEORY RELATED TO THE SOFTWARE, INCLUDING WITHOUT LIMITATION ANY DAMAGES ARISING OUT OF LOSS OF PROFITS, REVENUE, DATA, OR USE OF THE APPLICATION, EVEN IF A RELEASED PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN ANY CASE, ANY RELEASED PARTY’S AGGREGATE LIABILITY UNDER THE AGREEMENT WILL BE LIMITED TO $50.00. THE LAWS OF CERTAIN JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES. 
IF THESE LAWS APPLY TO YOU, SOME OR ALL OF THE ABOVE EXCLUSIONS OR LIMITATIONS MAY NOT APPLY TO YOU, AND YOU MAY HAVE ADDITIONAL RIGHTS.\n\n7.\tIndemnification\n\nYou are liable for and will defend, indemnify, and hold harmless the Released Parties and their officers, directors, agents, and employees, from and against any liability, loss, damage, cost, or expense (including reasonable attorneys’ fees) arising out of your use of the Software, violation of the Agreement, violation of applicable law, or violation of any right of any person or entity, including without limitation intellectual property rights.\n\n8.\tCompliance with Laws; Export Regulations\n\nYou will comply with all export and re-export restrictions and regulations of the United States Department of Commerce and other United States and foreign agencies and authorities that may apply to the Software, and not to transfer, or encourage, assist, or authorize the transfer of the Software to a prohibited country or otherwise in violation of any applicable restrictions or regulations.\n\n9.\tU.S. Government End Users\n\nThe Software is provided to the U.S. Government as “commercial items,” “commercial computer software,” “commercial computer software documentation,” and “technical data” with the same rights and restrictions generally applicable to the Software. If you are using the Software on behalf of the U.S. Government and these terms fail to meet the U.S. Government’s needs or are inconsistent in any respect with federal law, you will immediately discontinue your use of the Software. The terms “commercial item,” “commercial computer software,” “commercial computer software documentation,” and “technical data” are defined in the Federal Acquisition Regulation and the Defense Federal Acquisition Regulation Supplement.\n\n10.\tAmendment\n\nWe may amend this Agreement at our sole discretion by posting the revised terms on the AWS website (aws.amazon.com) or within the Software. Your continued use of the Software after any amendment's effective date evidences your agreement to be bound by it. If you do not agree to a change, you must stop using the Software and terminate this Agreement.\n\n13.\tConflicts\n\nIn the event of any conflict or inconsistency among the terms and conditions of this Agreement and the existing AWS Customer Agreement or your AWS Enterprise Agreement, such conflict or inconsistency will be resolved by giving precedence to this Agreement.\n\n14.\tEntire Agreement and Severability\n\nThis is the entire agreement between AWS and you regarding the Software and supersedes all prior understandings regarding such subject matter (including any Evaluation Agreement). If any term or condition of this Agreement is deemed invalid, void, or for any reason unenforceable, that part will be deemed severable and will not affect the validity and enforceability of any remaining term or condition.\n"
  },
  {
    "path": "src/neuronperf/README.md",
    "content": "# NeuronPerf\n\nA library for benchmarking machine learning models on accelerators.\n\n## Documentation\n\nhttps://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuronperf/index.html"
  },
  {
    "path": "src/neuronperf/build.sh",
    "content": "#!/bin/bash\n\nset -ex\n\npython3 -m pytest -vv \\\n    --verbose \\\n    --ignore=build/private \\\n    --cov=neuronperf \\\n    --cov-report term-missing \\\n    --cov-report html:build/brazil-documentation/coverage \\\n    --cov-report xml:build/brazil-documentation/coverage/coverage.xml \\\n    --color=yes \\\n    -x \\\n    test \\\n    -m \"sanity or slow\"\n\npython3 setup.py bdist_wheel --dist-dir build/pip/public/neuronperf\n"
  },
  {
    "path": "src/neuronperf/conf.py",
    "content": "\"\"\"Sphinx configuration.\"\"\"\n\nimport datetime\nimport os\nimport shutil\n\nfrom amazon_doc_utils import brazil_info\n\n# Get metadata from brazil\nbrazil_version, intersphinx_factory = brazil_info.get(\n    [brazil_info.PackageVersion, brazil_info.IntersphinxFactory]\n)\n\n\ndef run_apidoc(app):\n    \"\"\"Generate doc stubs using sphinx-apidoc.\"\"\"\n    module_dir = os.path.join(app.srcdir, \"../src/\")\n    output_dir = os.path.join(app.srcdir, \"_apidoc\")\n    excludes = []\n\n    # Ensure that any stale apidoc files are cleaned up first.\n    if os.path.exists(output_dir):\n        shutil.rmtree(output_dir)\n\n    cmd = [\n        \"--separate\",\n        \"--module-first\",\n        \"--doc-project=API Reference\",\n        \"-o\",\n        output_dir,\n        module_dir,\n    ]\n    cmd.extend(excludes)\n\n    try:\n        from sphinx.ext import apidoc  # Sphinx >= 1.7\n\n        apidoc.main(cmd)\n    except ImportError:\n        from sphinx import apidoc  # Sphinx < 1.7\n\n        cmd.insert(0, apidoc.__file__)\n        apidoc.main(cmd)\n\n\ndef setup(app):\n    \"\"\"Register our sphinx-apidoc hook.\"\"\"\n    app.connect(\"builder-inited\", run_apidoc)\n\n\n# Sphinx configuration below.\nproject = brazil_version.name\nversion = brazil_version.mv\nrelease = brazil_version.full_version\ncopyright = \"{}, Amazon.com\".format(datetime.datetime.now().year)\n\nintersphinx_mapping = intersphinx_factory.get_mapping()\n\nextensions = [\n    \"sphinx.ext.autodoc\",\n    \"sphinx.ext.intersphinx\",\n    \"sphinx.ext.napoleon\",\n    \"sphinx.ext.todo\",\n    \"sphinx.ext.viewcode\",\n]\n\nsource_suffix = \".rst\"\nmaster_doc = \"index\"\n\nautoclass_content = \"class\"\nautodoc_member_order = \"bysource\"\ndefault_role = \"py:obj\"\n\nhtml_theme = \"haiku\"\nhtmlhelp_basename = \"{}doc\".format(project)\n\nnapoleon_use_rtype = False\n"
  },
  {
    "path": "src/neuronperf/model_neuron_b1.csv",
    "content": "n_models,workers_per_model,pipeline_size,batch_size,throughput_avg,throughput_peak,latency_ms_p0,latency_ms_p50,latency_ms_p90,latency_ms_p95,latency_ms_p99,latency_ms_p100,load_avg_ms,warmup_avg_ms,e2e_avg_ms,input_avg_ms,preprocess_avg_ms,postprocess_avg_ms,infer_avg_ms,worker_avg_s,total_infs,total_s,status,model_filename,multiprocess,multiinterpreter,device_type,instance_type\n1,1,1,1,31346.0,31408.0,0.03,0.03,0.031,0.032,0.037,0.732,62.217,2.625,0.031,0.001,0.0,0.0,0.028,4.93,154704,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge\n16,16,1,1,380604.75,380923.0,0.03,0.032,0.054,0.054,0.057,0.938,293.806,3.266,0.043,0.001,0.0,0.0,0.039,4.7,1799549,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge\n1,2,1,1,51178.0,51319.0,0.035,0.036,0.037,0.039,0.047,1.13,114.118,2.713,0.037,0.001,0.0,0.0,0.033,4.88,248984,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge\n16,32,1,1,381098.75,383905.0,0.03,0.058,0.067,0.073,0.121,48.07,303.916,4.42,0.08,0.001,0.0,0.0,0.074,4.69,1804925,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge\n"
  },
  {
    "path": "src/neuronperf/pyproject.toml",
    "content": "[tool.black]\nline-length = 100\n\n[tool.isort]\nknown_first_party = [\"neuronperf\"]\n\n[tool.pytest.ini_options]\nmarkers = [\n    \"sanity\",\n    \"slow\",\n]\n\n# required for compatibility with black:\nprofile = \"black\"\n\n# To maintain consistency with other settings\nline_length = 100\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/__init__.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nNeuronPerf Library\n~~~~~~~~~~~~~~~~~~\n\nA library for benchmarking machine learning models on accelerators.\n\n:copyright: (c) 2022 Amazon Inc.\n:license: See LICENSE.\n\"\"\"\n\nfrom .__version__ import __title__, __description__, __url__, __version__\nfrom .__version__ import __author__, __author_email__, __license__\nfrom .__version__ import __copyright__\n\n# setup logging first\nimport logging\n\n_log_level = logging.DEBUG\nlog = logging.getLogger(__name__)\nlog.setLevel(_log_level)\n\nfrom .logging import _get_stream_handlers\n\nfor handler in _get_stream_handlers(_log_level):\n    log.addHandler(handler)\n\nfrom .benchmarking import compile, benchmark, set_verbosity\nfrom .cpu import cpu\nfrom .cpu.cpu import DummyModel\nfrom .reporting import CSV_COLS, PRINT_COLS, get_reports, print_reports, write_csv, write_json\nfrom .timing import timestamp_convert, Timer\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/__version__.py",
    "content": "__title__ = \"neuronperf\"\n__description__ = \"A benchmarking library for machine learning accelerators.\"\n__url__ = \"https://awsdocs-neuron.readthedocs-hosted.com/en/neuronperf\"\n__version__ = \"0.0.0.0\"\n__author__ = \"AWS\"\n__author_email__ = \"neuronperf@amazon.com\"\n__license__ = \"Proprietary\"\n__copyright__ = \"Copyright Amazon Web Services and its Affiliates. All rights reserved.\"\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/benchmarking.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.benchmarking\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides utility functions and classes that underlie the framework benchmarkers.\n\"\"\"\n\nfrom typing import Any, Callable, Dict, List, Union\n\nimport collections\nimport concurrent\nimport concurrent.futures\nimport copy\nimport functools\nimport logging\nimport multiprocessing\nimport os\nimport psutil\nimport subprocess\nimport sys\nimport tempfile\nimport threading\nimport time\nimport traceback\n\nimport dill\n\n\nfrom . import model_index\nfrom .compile_constants import NEURONCORE_PIPELINE_CORES, FAST_MATH, FAST_MATH_OPTIONS\nfrom .reporting import get_reports\nfrom .scripts import run_benchmark_file\nfrom .timing import Timer\n\n\nlog = logging.getLogger(__name__)\n\n# Wrapper for sending back subprocess failure info. Needs to be at top level for pickle.\nBenchmarkerErrorWrapper = collections.namedtuple(\"BenchmarkerErrorWrapper\", \"trace\")\n\nERROR = \"error\"\nSUPPORTED_DEVICE_TYPES = [\"neuron\", \"cpu\", \"cuda\", \"gpu\"]  # TODO: \"tpu\"]\nBENCHMARK_SECS = 120\n\n\nclass Benchmarker(threading.Thread):\n    r\"\"\"\n    :class:`benchmarking:Benchmarker` benchmarks a single model.\n\n    This class is a `threading.Thread`. Call `start` to launch a non-blocking\n    benchmarking thread. Calling `stop` will end the benchmarking and block\n    until all subroutines complete.\n\n    An object of this class may be serialized and sent to multiple subprocesses\n    for parallel use. After benchmarking, results can be obtained with\n    `results`.\n    \"\"\"\n\n    def __init__(\n        self,\n        id: int,\n        device_id: int,\n        load_fn: Callable[[str], Any],\n        model_filename: str,\n        inputs,\n        workers_per_model: int,\n        env_setup_fn: Callable[[int, Dict, Any], None] = None,\n        setup_fn: Callable[[int, Dict, Any], None] = None,\n        preprocess_fn: Callable[[Any], Any] = None,\n        postprocess_fn: Callable[[Any], Any] = None,\n        dataset_loader_fn: Callable[[Any, int], Any] = None,\n        model_class_name: str = None,\n        model_class_file: str = None,\n    ):\n        super().__init__()\n\n        self.id = id\n        self.device_id = device_id\n        self.load_fn = load_fn\n        self.model_filename = model_filename\n        self.inputs = inputs\n        self.input_iter = None  # Prepared in setup()\n        self.input_lock = threading.Lock()\n        self.workers_per_model = workers_per_model\n        self.env_setup_fn = env_setup_fn\n        self.setup_fn = setup_fn\n        self.preprocess_fn = preprocess_fn\n        self.postprocess_fn = postprocess_fn\n        self.dataset_loader_fn = dataset_loader_fn\n        self.model_class_name = model_class_name\n        self.model_class_file = model_class_file\n\n        # Mutable internal state.\n        self.model = None\n        self.benchmark_timer = Timer()\n        self.env_setup_timer = Timer()\n        self.setup_timer = Timer()\n        self.load_timer = Timer()\n        self.warmup_timer = Timer()\n        self.input_timer = Timer()\n        self.preprocess_timers = [Timer() for _ in range(workers_per_model)]\n        self.infer_timers = [Timer() for _ in range(workers_per_model)]\n        self.postprocess_timers = [Timer() for _ in range(workers_per_model)]\n        self.e2e_timers = [Timer() for _ in range(workers_per_model)]\n        self.worker_timers = [Timer() for _ in range(workers_per_model)]\n        self.n_infs = [0] * workers_per_model\n        
self.process_id = 0  # set at launch time\n        self.benchmarking = False\n        self.benchmarking_lock = threading.Lock()\n        self.status_lock = threading.Lock()\n        self.status = \"ready\"\n        self.error = None\n\n    def _status(self, status, error=None):\n        \"\"\"Update internal status, unless a previous error has occurred.\"\"\"\n        with self.status_lock:\n            if self.status == ERROR:\n                return\n            self.status = status\n            if error:\n                self.error = error\n\n    def next_input(self):\n        self.input_lock.acquire()\n        self.input_timer.start()\n        try:\n            return next(self.input_iter)\n        finally:\n            self.input_timer.stop()\n            self.input_lock.release()\n\n    def prepare_inputs(self):\n        \"\"\"Prepares input iterator; runs an optional custom setup function.\"\"\"\n        if self.dataset_loader_fn:\n\n            def input_iter():\n                dataset_loader = self.dataset_loader_fn(self.inputs, self.workers_per_model)\n                while True:\n                    inputs = next(dataset_loader)\n                    yield inputs if isinstance(inputs, tuple) else (inputs,)\n\n            self.input_iter = input_iter()\n        else:\n\n            def input_iter():\n                inputs = self.inputs if isinstance(self.inputs, tuple) else (self.inputs,)\n                while True:\n                    yield inputs\n\n            self.input_iter = input_iter()\n\n    def load(self):\n        \"\"\"Loads the model that will be used for benchmarking.\"\"\"\n        with self.load_timer:\n            self.model = self.load_fn(self.model_filename, device_id=self.device_id)\n\n    def warmup(self):\n        \"\"\"Warmup the model with a single e2e inference.\"\"\"\n        with self.warmup_timer:\n            inputs = self.next_input()\n            if self.preprocess_fn:\n                inputs = self.preprocess_fn(*inputs)\n            outputs = self.model(*inputs if isinstance(inputs, tuple) else inputs)\n            if self.postprocess_fn:\n                self.postprocess_fn(outputs)\n        self.n_infs[0] += 1  # track warmup infs in worker 0\n\n    def setup(self):\n        \"\"\"Perform all setup work prior to benchmarking.\"\"\"\n        self.prepare_inputs()\n\n        if self.env_setup_fn:\n            with self.env_setup_timer:\n                self.env_setup_fn()\n\n        self.load()\n\n        if self.setup_fn:\n            with self.setup_timer:\n                self.setup_fn(self.model)\n\n        self.warmup()\n\n    def infer(self, worker_id) -> tuple:\n        \"\"\"Execute a single inference.\"\"\"\n        with self.e2e_timers[worker_id]:\n            inputs = self.next_input()\n            if self.preprocess_fn:\n                with self.preprocess_timers[worker_id]:\n                    inputs = self.preprocess_fn(*inputs)\n            with self.infer_timers[worker_id]:\n                outputs = self.model(*inputs if isinstance(inputs, tuple) else inputs)\n            if self.postprocess_fn:\n                with self.postprocess_timers[worker_id]:\n                    outputs = self.postprocess_fn(outputs)\n        return outputs\n\n    def worker_thread(self, worker_id):\n        \"\"\"A single worker thread that runs inference until signalled to stop.\"\"\"\n        n_infs = 0\n        try:\n            log.debug(f\"Benchmarker {self.id}, Worker {worker_id} started.\")\n            with self.worker_timers[worker_id]:\n 
               while self.benchmarking and self.status != ERROR:\n                    self.infer(worker_id)\n                    n_infs += 1\n            if self.status == ERROR:\n                log.debug(\n                    f\"Benchmarker {self.id}, Worker {worker_id} stopped early due to an error after {n_infs} inferences.\"\n                )\n        except StopIteration:\n            pass\n        except:\n            trace = \"\".join(traceback.format_exception(*sys.exc_info()))\n            log.error(\n                f\"Benchmarker {self.id}, Worker {worker_id} encountered an error during benchmarking:\\n{trace}\"\n            )\n            self._status(ERROR, BenchmarkerErrorWrapper(trace))\n        finally:\n            self.n_infs[worker_id] += n_infs\n            log.debug(\n                f\"Benchmarker {self.id}, Worker {worker_id} finished after {self.n_infs[worker_id]} inferences.\"\n            )\n\n    def run(self):\n        with self.benchmarking_lock:\n            if self.benchmarking:\n                raise RuntimeError(\n                    f\"Benchmarker {self.id} can't start because it is already running.\"\n                )\n            self.benchmarking = True\n            self._status(\"running\")\n\n        # Set our process id, now that we are launched.\n        self.process_id = os.getpid()\n\n        # Launch all workers and begin benchmarking.\n        # If any individual worker reports an error, self.status will reflect\n        # that after this method.\n        with self.benchmark_timer:\n            try:\n                self.setup()\n            except:\n                trace = \"\".join(traceback.format_exception(*sys.exc_info()))\n                log.error(f\"Benchmarker {self.id} encountered an error during prep:\\n{trace}\")\n                self._status(ERROR, BenchmarkerErrorWrapper(trace))\n            else:\n                with concurrent.futures.ThreadPoolExecutor(max_workers=self.workers_per_model) as exe:\n                    for worker_id in range(self.workers_per_model):\n                        exe.submit(self.worker_thread, worker_id)\n\n        # There are three ways to reach the next section:\n        # 1. We ran out of benchmarking examples in a provided dataset (graceful quit on StopIteration).\n        # 2. We were asked to stop().\n        # 3. 
We encountered an error.\n\n        # In cases 1 and 3, we can acquire the lock, update our state if necessary, and quit.\n        # In case 2, we already hold the lock, so we can skip this section and let stop() handle cleanup.\n        if self.benchmarking_lock.acquire(blocking=False):\n            try:\n                self.benchmarking = False\n                self._status(\"finished\")\n            finally:\n                self.benchmarking_lock.release()\n\n    def stop(self):\n        # Setting self.benchmarking = False triggers workers to terminate gracefully.\n        # We must hold the benchmarking_lock until the thread has joined to ensure\n        # consistent use of the self.benchmarking flag.\n        with self.benchmarking_lock:\n            if not self.benchmarking:\n                return\n            self._status(\"stopping\")\n            self.benchmarking = False\n            self.join()\n            self._status(\"finished\")\n\n    def results(self) -> dict:\n        with self.benchmarking_lock:\n            if self.benchmarking:\n                raise RuntimeError(\"Cannot produce results until benchmarking has completed.\")\n            return {\n                \"id\": self.id,\n                \"device_id\": self.device_id,\n                \"workers_per_model\": self.workers_per_model,\n                \"n_infs\": sum(self.n_infs),\n                \"status\": self.status,\n                \"process_id\": self.process_id,\n                \"total_s\": self.benchmark_timer.total_duration(\"s\"),\n                \"timers\": {\n                    \"env_setup\": [self.env_setup_timer],\n                    \"setup\": [self.setup_timer],\n                    \"load\": [self.load_timer],\n                    \"input\": [self.input_timer],\n                    \"warmup\": [self.warmup_timer],\n                    \"preprocess\": self.preprocess_timers,\n                    \"infer\": self.infer_timers,\n                    \"postprocess\": self.postprocess_timers,\n                    \"e2e\": self.e2e_timers,\n                    \"worker\": self.worker_timers,\n                },\n            }\n\n\nclass StatsThread(threading.Thread):\n    \"\"\"A thread to collect some system metrics duirng benchmarking.\"\"\"\n\n    def __init__(self, interval: float):\n        super().__init__()\n        self.interval = interval  # interval (in seconds) to collect metrics\n        self.cpu_percents = []\n        self.mem_percents = []\n        self.running = True\n\n    def run(self):\n        while self.running:\n            cpu_percent = psutil.cpu_percent(interval=self.interval, percpu=False)\n            mem_percent = psutil.virtual_memory()[2]\n            self.cpu_percents.append(cpu_percent)\n            self.mem_percents.append(mem_percent)\n\n    def join(self, **kwargs):\n        self.running = False\n        super().join(**kwargs)\n\n\ndef _combine_results(results: List[dict]) -> dict:\n    \"\"\"Combines the results of multiple benchmarkers into a single results structure.\"\"\"\n    combined_results = {}\n    for result in results:\n        # workers_per_model should be the same across all benchmarkers, so we only need it once.\n        combined_results.setdefault(\"workers_per_model\", result[\"workers_per_model\"])\n        # If an error occurred anywhere, preserve it.\n        combined_results[\"status\"] = (\n            result[\"status\"] if combined_results.get(\"status\", \"\") != ERROR else ERROR\n        )\n        combined_results[\"n_infs\"] = 
combined_results.get(\"n_infs\", 0) + result[\"n_infs\"]\n        # Keep the longest subprocess duration.\n        combined_results[\"total_s\"] = max(combined_results.get(\"total_s\", 0), result[\"total_s\"])\n        # Concatenate all timing info.\n        timers = combined_results.get(\"timers\", {})\n        for k, v in result[\"timers\"].items():\n            timer_list = timers.get(k, [])\n            timer_list.extend(v)\n            timers[k] = timer_list\n        combined_results[\"timers\"] = timers\n    return combined_results\n\n\ndef _get_num_workers(pipeline_size: int) -> int:\n    \"\"\"Returns a best-guess number of worker threads for a single benchmarking process.\"\"\"\n    return 2 if pipeline_size == 1 else pipeline_size - 1\n\n\ndef get_instance_type() -> str:\n    \"\"\"Try to obtain the maximum number of NeuronCores available on this instance.\"\"\"\n    try:\n        import urllib.request\n\n        with urllib.request.urlopen(\n            \"http://169.254.169.254/latest/meta-data/instance-type\"\n        ) as response:\n            instance_type = response.read().decode(\"utf-8\")\n        log.debug(\"Automatically determined instance type: {}\".format(instance_type))\n        return instance_type\n    except:\n        return None\n\n\ndef _get_cost_per_hour(instance_type: str) -> float:\n    # Hourly rates\n    instancetype_to_cost = {\n        \"inf1.xlarge\": 0.228,\n        \"inf1.2xlarge\": 0.362,\n        \"inf1.6xlarge\": 1.18,\n        \"inf1.24xlarge\": 4.721,\n    }\n    try:\n        return instancetype_to_cost[instance_type]\n    except:\n        # Just ignore unknown instance types for now\n        return None\n\n\ndef _get_max_neuroncores(instance_type: str = None) -> int:\n    \"\"\"Try to obtain the maximum number of NeuronCores available on this instance.\"\"\"\n    instancetype_to_neuroncores = {\n        \"inf1.xlarge\": 4,\n        \"inf1.2xlarge\": 4,\n        \"inf1.6xlarge\": 16,\n        \"inf1.24xlarge\": 64,\n    }\n    try:\n        if not instance_type:\n            instance_type = get_instance_type()\n        return instancetype_to_neuroncores[instance_type]\n    except:\n        num_cores = 2\n        log.warning(f\"Unknown Neuron device size. Assuming {num_cores} NeuronCores is the maximum.\")\n        return num_cores\n\n\ndef _get_num_gpus(instance_type: str = None) -> int:\n    \"\"\"Try to obtain the maximum number of NeuronCores available on this instance.\"\"\"\n    instancetype_to_gpus = {\n        \"g4dn.xlarge\": 1,\n        \"g4dn.2xlarge\": 1,\n        \"g4dn.4xlarge\": 1,\n        \"g4dn.8xlarge\": 1,\n        \"g4dn.16xlarge\": 1,\n        \"g4dn.12xlarge\": 4,\n        \"g4dn.metal\": 8,\n        \"g4ad.xlarge\": 1,\n        \"g4ad.2xlarge\": 1,\n        \"g4ad.4xlarge\": 1,\n        \"g4ad.8xlarge\": 2,\n        \"g4ad.16xlarge\": 4,\n        \"p4d.24xlarge\": 8,\n    }\n    try:\n        if not instance_type:\n            instance_type = get_instance_type()\n        return instancetype_to_gpus[instance_type]\n    except:\n        log.warning(\"Unknown GPU device size. 
Assuming 1 GPU is available.\")\n        return 1\n\n\ndef _get_num_devices(device_type: str, instance_type: str = None) -> int:\n    \"\"\"This is a stub, to be populated later for other instance types.\"\"\"\n    if device_type == \"neuron\":\n        return _get_max_neuroncores(instance_type)\n    elif device_type == \"cpu\":\n        return multiprocessing.cpu_count()\n    elif device_type == \"cuda\" or device_type == \"gpu\":\n        return _get_num_gpus(instance_type)\n    else:\n        log.warning(\"An unknown device_type was passed: {}\".format(device_type))\n        return None\n\n\ndef _sanitize_inputs(inputs, batch_sizes: Union[int, List[int]], dataset_inputs=False) -> List[int]:\n    \"\"\"Return inputs and batch_sizes with matching lengths, or throw an error.\"\"\"\n    if not isinstance(inputs, list):\n        inputs = [inputs]\n    if isinstance(batch_sizes, int):\n        batch_sizes = [batch_sizes]\n    if not batch_sizes:\n        log.warning(\n            \"Batch sizes were not provided, so assuming 1 and only the first input will be benchmarked.\"\n        )\n        batch_sizes = [1]\n    if not dataset_inputs:\n        if len(batch_sizes) < len(inputs):\n            delta = len(inputs) - len(batch_sizes)\n            log.warning(\n                \"Received {} inputs, but only {} batch sizes. Discarding last {} inputs.\".format(\n                    len(inputs), len(batch_sizes), delta\n                )\n            )\n            inputs = inputs[: len(batch_sizes)]\n        elif len(inputs) < len(batch_sizes):\n            delta = len(batch_sizes) - len(inputs)\n            log.warning(\n                \"Received {} batch sizes, but only {} inputs. Discarding last {} batch sizes.\".format(\n                    len(batch_sizes), len(inputs), delta\n                )\n            )\n            batch_sizes = batch_sizes[: len(inputs)]\n    return inputs, batch_sizes\n\n\ndef set_verbosity(verbosity: int):\n    r\"\"\"\n    Controls the verbosity of NeuronPerf logging.\n\n    :param int verbosity: 0 = error, 1 = info, 2 = debug\n    \"\"\"\n    if 0 == verbosity:\n        log.setLevel(logging.ERROR)\n    elif 1 == verbosity:\n        log.setLevel(logging.INFO)\n    else:\n        log.setLevel(logging.DEBUG)\n\n\ndef compile(\n    compile_fn,\n    model,\n    inputs,\n    batch_sizes: Union[int, List[int]] = None,\n    pipeline_sizes: Union[int, List[int]] = None,\n    performance_levels: Union[int, List[int]] = None,\n    models_dir: str = \"models\",\n    model_name: str = None,\n    filename: str = None,\n    compiler_args: dict = None,\n    verbosity: int = 1,\n    **kwargs,\n) -> str:\n    r\"\"\"\n    Compiles the provided model with each provided example input, pipeline size, and performance level.\n\n    :param model: The model to compile.\n    :param list inputs: A list of example inputs.\n    :param Union[int, List[int]] batch_sizes: A list of batch sizes that correspond to the example inputs.\n    :param Union[int, List[int]] pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.\n    :param Union[int, List[int]] performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default).  See :ref:`mixed-precision`.\n    :param str models_dir: The directory where compilation artifacts will be stored.\n    :param str model_name: An optional model name tag to apply to compiled artifacts.\n    :param str filename: The name of the model index to write out. 
If not provided, a name will be generated and returned.\n    :param dict compiler_args: Additional compiler arguments to be forwarded with every compilation.\n    :param int verbosity: 0 = error, 1 = info, 2 = debug\n    :return: A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.\n    :rtype: str\n    \"\"\"\n    # Set NeuronPerf logging verbosity.\n    set_verbosity(verbosity)\n\n    # Standardize arguments.\n    if not pipeline_sizes:\n        pipeline_sizes = [1]\n    if not performance_levels:\n        performance_levels = []\n    if not compiler_args:\n        compiler_args = {}\n    if not model_name:\n        if isinstance(model, str):\n            model_name = model\n        else:\n            try:\n                model_name = model.__name__\n            except AttributeError:\n                log.warning(\"Unable to determine a model name, using 'Model'.\")\n                model_name = \"Model\"\n    if isinstance(pipeline_sizes, int):\n        pipeline_sizes = [pipeline_sizes]\n    if isinstance(performance_levels, int):\n        performance_levels = [performance_levels]\n\n    inputs, batch_sizes = _sanitize_inputs(inputs, batch_sizes)\n\n    # Sanity check and sanitize compiler_args.\n    if NEURONCORE_PIPELINE_CORES in compiler_args:\n        if pipeline_sizes:\n            log.warning(\n                (\n                    \"You provided NeuronCore Pipeline Core sizes using both \"\n                    \"compiler_args and pipeline_sizes. Ignoring flag in compiler_args.\"\n                )\n            )\n        else:\n            pipeline_sizes = [compiler_args[NEURONCORE_PIPELINE_CORES]]\n        del compiler_args[NEURONCORE_PIPELINE_CORES]\n\n    if FAST_MATH in compiler_args:\n        if performance_levels:\n            log.warning(\n                (\n                    f\"You provided performance_levels and {FAST_MATH}. \"\n                    \"Ignoring flag in compiler_args.\"\n                )\n            )\n        del compiler_args[FAST_MATH]\n\n    # Check if performance levels are within expected bounds.\n    max_performance = max(FAST_MATH_OPTIONS)\n    performance_levels_invalid = list(\n        filter(\n            lambda level: level < min(FAST_MATH_OPTIONS) or level > max_performance,\n            performance_levels,\n        )\n    )\n    if performance_levels_invalid:\n        log.warning(\n            \"You provided some invalid performance_levels. 
Ignoring: {}\".format(\n                performance_levels_invalid\n            )\n        )\n        performance_levels = [\n            level\n            for level in performance_levels\n            if (level in performance_levels) and (level not in performance_levels_invalid)\n        ]\n\n    # If we still have no values, set default to max performance.\n    if not performance_levels:\n        performance_levels.append(max_performance)\n\n    # Create standard output dir, if it doesn't exit.\n    os.makedirs(models_dir, exist_ok=True)\n\n    # Compile all requested model combinations.\n    model_idxs = []\n\n    # TODO: Support appending to existing index by filtering already-compiled configs.\n    def make_index():\n        \"\"\"Create a model index file that contains info about all compiled models.\"\"\"\n        index = model_index.append(*model_idxs)\n        # Return the name of the new index file.\n        return model_index.save(index, filename=filename)\n\n    compile_idx = 1\n    n_compiles = len(inputs) * len(pipeline_sizes) * len(performance_levels)\n    for input_idx, example_input in enumerate(inputs):\n        batch_size = batch_sizes[input_idx]\n        for pipeline_size in pipeline_sizes:\n            for performance_level in performance_levels:\n                _compiler_args = copy.copy(compiler_args)\n                _compiler_args[FAST_MATH] = FAST_MATH_OPTIONS[performance_level]\n                if pipeline_size != 1:\n                    _compiler_args[NEURONCORE_PIPELINE_CORES] = str(pipeline_size)\n\n                # Construct a more informative model name with some config info\n                model_name_ex = \"{}_b{}_p{}_{}\".format(\n                    model_name,\n                    batch_size,\n                    pipeline_size,\n                    model_index.generate_id(),\n                )\n                log.info(\n                    (\n                        f\"Compiling batch size {batch_size} for {pipeline_size} NeuronCore(s) with performance level \"\n                        f\"{performance_level}/{max_performance}. 
[{compile_idx}/{n_compiles}]\"\n                    )\n                )\n                status = \"ready\"\n                timer = Timer()\n                with timer:\n                    try:\n                        model_filename = compile_fn(\n                            model,\n                            example_input,\n                            models_dir,\n                            model_name_ex,\n                            compiler_args=_compiler_args,\n                            **kwargs,\n                        )\n                        status = \"finished\"\n                    except KeyboardInterrupt:\n                        status = \"error\"\n                        model_filename = None\n                        log.error(\"Compilation interrupted, terminating.\")\n                        return make_index()\n                    except:\n                        status = \"error\"\n                        model_filename = None\n                        log.exception(\n                            (\n                                f\"Failed to compile input={input_idx}, \"\n                                f\"batch_size={batch_size}, \"\n                                f\"pipeline_size={pipeline_size}, \"\n                                f\"performance_level={performance_level}.\"\n                            )\n                        )\n                    finally:\n                        model_idx = model_index.create(\n                            model_filename,\n                            model_name=model_name,\n                            batch_size=batch_size,\n                            pipeline_size=pipeline_size,\n                            performance_level=performance_level,\n                            compile_s=round(timer.total_duration(\"s\"), 2),\n                            status=status,\n                        )\n                        model_idxs.append(model_idx)\n                        filename = make_index()\n                compile_idx += 1\n    return filename\n\n\ndef run_benchmarker(benchmarker, duration, pipe=None):\n    def _send(results):\n        if pipe:\n            pipe.send(results)\n            pipe.close()\n        else:\n            return results\n\n    try:\n        log.debug(f\"Benchmarker {benchmarker.id} started.\")\n        check_freq = 0.1  # Check progress every 0.1 seconds.\n        start_time = time.time()\n        benchmarker.start()\n        elapsed = 0\n        while (elapsed < duration) and benchmarker.benchmarking:\n            elapsed = time.time() - start_time\n            remaining = max(0, duration - elapsed)\n            time.sleep(min(check_freq, remaining))\n        benchmarker.stop()\n    except:\n        trace = \"\".join(traceback.format_exception(*sys.exc_info()))\n        error = BenchmarkerErrorWrapper(trace)\n        return _send(error)\n    else:\n        results = benchmarker.results() if benchmarker.status != ERROR else benchmarker.error\n        return _send(results)\n    finally:\n        log.debug(f\"Benchmarker {benchmarker.id} finished.\")\n\n\ndef _run_benchmarker_new_interpreter(benchmarker, duration):\n    \"\"\"\n    This function is a workaround for frameworks that cannot be safely forked.\n    The premise is to launch a new Python interpreter and run benchmarking\n    from within the new interpreter. It works by writing serialized benchmarkers\n    to temporary files, and then launching run_benchmark_file.py. 
The script\n    writes back serialized results.\n    \"\"\"\n\n    # Temporary serialization workaround. This attribute is inherited from Thread.\n    # TODO: Separate data from benchmarking.\n    setattr(benchmarker, \"_stderr\", None)\n\n    script = run_benchmark_file.__file__\n\n    # Serialize the benchmarker to a file.\n    f = tempfile.NamedTemporaryFile(delete=False)\n    log.debug(\"Dumping Benchmarker {} to file '{}'.\".format(benchmarker.id, f.name))\n    try:\n        dill.dump(benchmarker, f)\n    except dill.PicklingError:\n        raise dill.PicklingError(\n            (\n                \"NeuronPerf was unable to serialize the benchmarker. This is probably because your model \"\n                \"could not be serialized. Make sure to use top-level classes instead of locals. You may \"\n                \"need to wrap your model and manually load it using Python's importlib.\"\n            )\n        )\n    f.close()\n\n    # Run the benchmarking script in a clean Python process.\n    command = [\n        sys.executable,\n        script,\n        f.name,\n        str(duration),\n    ]\n\n    # If we are manually loading a model class file in subprocesses, we need to let them know.\n    if benchmarker.model_class_name and benchmarker.model_class_file:\n        command.append(f\"--model_class_name={benchmarker.model_class_name}\")\n        command.append(f\"--model_class_file={benchmarker.model_class_file}\")\n\n    proc = subprocess.Popen(\n        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding=\"utf-8\"\n    )\n\n    # Interpreter and framework overhead add a delay to processing. We should ensure\n    # that during multiinterpreter benchmarking, sufficient time is allowed for results.\n    timeout = 60 + duration\n\n    try:\n        outs, errs = proc.communicate(timeout=timeout)\n        with open(f.name, \"rb\") as fp:\n            result = dill.load(fp)\n        if isinstance(result, BenchmarkerErrorWrapper):\n            raise ChildProcessError(\n                \"Benchmarker {} encountered an error:\\n{}\".format(benchmarker.id, result.trace)\n            )\n        if isinstance(result, Benchmarker):\n            # If we still have a benchmarker object instead of results, something\n            # went wrong that wasn't handled by the benchmarker routine.\n            from pathlib import Path\n\n            path = Path(f.name)\n            logs = os.path.join(path.parent, \"neuronperf_error_{}\".format(str(path.stem)))\n            if os.path.exists(logs):\n                with open(logs, \"rt\") as logs_fp:\n                    err_logs = logs_fp.readlines()\n                os.unlink(logs)\n                raise ChildProcessError(\n                    \"Benchmarker {} failed. Logs from child process:\\n{}\".format(\n                        benchmarker.id, \"\".join(err_logs)\n                    )\n                )\n            else:\n                raise ChildProcessError(\n                    (\n                        f\"Benchmarker {benchmarker.id} failed and no error logs were found. A child process may have \"\n                        \"aborted. 
To obtain a stack trace, try running a single configuration inside a \"\n                        \"single process by passing multiprocess=False, multiinterpreter=False\"\n                    )\n                )\n\n        return result\n    except subprocess.TimeoutExpired:\n        proc.kill()\n        raise ChildProcessError(\n            \"Benchmarker {} stopped responding after {} seconds.\".format(benchmarker.id, timeout)\n        )\n    finally:\n        os.unlink(f.name)\n\n\ndef _run_benchmarkers_multiprocess(\n    benchmarkers: List[Benchmarker], duration: int, benchmark_func=run_benchmarker\n) -> dict:\n    results = []\n    # Hand each benchmarker object to a subprocess.\n    pipes, procs = [], []\n    for benchmarker in benchmarkers:\n        parent_pipe, child_pipe = multiprocessing.Pipe()\n        pipes.append(parent_pipe)\n        proc = multiprocessing.Process(\n            target=benchmark_func, args=(benchmarker, duration, child_pipe)\n        )\n        procs.append(proc)\n    # Launch benchmarking.\n    for proc in procs:\n        proc.start()\n    # Collect results.\n    for id, (pipe, proc) in enumerate(zip(pipes, procs)):\n        try:\n            proc_result = pipe.recv()\n            if isinstance(proc_result, BenchmarkerErrorWrapper):\n                log.error(\"Child process encountered an error:\\n{}\".format(proc_result.trace))\n                raise ChildProcessError()\n            proc.join()\n            results.append(proc_result)\n        except KeyboardInterrupt:\n            log.error(\"Benchmarking interrupted, terminating.\")\n            for proc in procs:\n                proc.terminate()\n            raise KeyboardInterrupt()\n        except EOFError:\n            log.error(\n                (\n                    f\"Child process {id} was killed by the host OS during benchmarking.\\n\"\n                    \"You may have run out of memory.\\n\"\n                    \"Verify that your model can perform inference without NeuronPerf or try n_models=1.\"\n                )\n            )\n    return _combine_results(results)\n\n\ndef _run_benchmarkers_multithreaded(\n    benchmarkers: List[Benchmarker], duration: int, benchmark_func=run_benchmarker\n) -> dict:\n    results = []\n    timeout = 60 + duration  # Add some time for setup overhead and cleanup.\n    try:\n        args = ((benchmarker, duration) for benchmarker in benchmarkers)\n        with concurrent.futures.ThreadPoolExecutor(max_workers=len(benchmarkers)) as exe:\n            results.extend(exe.map(lambda arg: benchmark_func(*arg), args, timeout=timeout))\n        for result in results:\n            if isinstance(result, BenchmarkerErrorWrapper):\n                raise RuntimeError(\"Worker thread encountered an error:\\n{}\".format(result.trace))\n    except concurrent.futures.TimeoutError:\n        log.error(\"Benchmarking timed out after {} seconds.\".format(timeout))\n    except KeyboardInterrupt:\n        raise KeyboardInterrupt(\"Benchmarking interrupted, terminating.\")\n    return _combine_results(results)\n\n\ndef run_benchmarkers(\n    benchmarkers: List[Benchmarker],\n    duration: int,\n    stats_interval: float = 0.5,\n    multiprocess: bool = True,\n    multiinterpreter: bool = False,\n) -> dict:\n    results = {}\n\n    # Launch a background thread to collect system stats during benchmarking.\n    stats_thread = StatsThread(stats_interval)\n    stats_thread.start()\n\n    try:\n        if multiinterpreter:\n            if not sys.executable:\n                raise 
ValueError(\n                    (\n                        \"Unable to benchmark in multi-interpreter mode because \"\n                        \"the Python interpreter cannot be located (sys.executable is empty).\"\n                    )\n                )\n            # We can safely re-use the multithreaded path here by using a custom benchmarking\n            # function that spawns fresh interpreters.\n            results = _run_benchmarkers_multithreaded(\n                benchmarkers, duration, benchmark_func=_run_benchmarker_new_interpreter\n            )\n        elif multiprocess:\n            results = _run_benchmarkers_multiprocess(benchmarkers, duration)\n        else:\n            results = _run_benchmarkers_multithreaded(benchmarkers, duration)\n    finally:\n        stats_thread.join()\n        results[\"cpu_percents\"] = stats_thread.cpu_percents\n        results[\"mem_percents\"] = stats_thread.mem_percents\n\n    return results\n\n\ndef _get_env_setup_fn(benchmarker_id: int, benchmarker_config: dict, env_setup_fn):\n    \"\"\"Wrap an environment setup function with device-specific requirements.\"\"\"\n    device_type = str(benchmarker_config[\"device_type\"]).lower().strip()\n    legacy = bool(os.environ.get(\"NEURONCORE_GROUP_SIZES\"))\n    if \"neuron\" == device_type:\n\n        @functools.wraps(env_setup_fn)\n        def _env_setup_fn():\n            import os\n\n            id = benchmarker_id\n            config = benchmarker_config\n            pipeline_size = config[\"pipeline_size\"]\n            if config[\"multiprocess\"] or config[\"multiinterpreter\"]:\n                # In multiprocess mode, need to specify the exact cores for the process.\n                min_core = pipeline_size * id\n                max_core = min_core + (pipeline_size - 1)\n                visible_cores = f\"{min_core}-{max_core}\"\n\n                if legacy:\n                    os.environ[\"NEURONCORE_GROUP_SIZES\"] = str(pipeline_size)\n                else:\n                    os.environ[\"NEURON_RT_VISIBLE_CORES\"] = visible_cores\n            else:\n                # In multithreaded mode, all required cores are allocated in this process.\n                n_models = config[\"n_models\"]\n                if legacy:\n                    os.environ[\"NEURONCORE_GROUP_SIZES\"] = \",\".join([str(pipeline_size)] * n_models)\n                else:\n                    os.environ[\"NEURON_RT_VISIBLE_CORES\"] = \"0-{}\".format(\n                        n_models * pipeline_size - 1\n                    )\n\n            # Finally, call any additional custom setup function provided.\n            if env_setup_fn:\n                env_setup_fn(id, config)\n\n        return _env_setup_fn\n    elif device_type == \"cpu\":\n        return env_setup_fn\n    elif device_type == \"cuda\" or device_type == \"gpu\":\n\n        @functools.wraps(env_setup_fn)\n        def _env_setup_fn():\n            import os\n\n            os.environ[\"CUDA_VISIBLE_DEVICES\"] = str(benchmarker_id)\n\n            if env_setup_fn:\n                env_setup_fn(benchmarker_id, benchmarker_config)\n\n        return _env_setup_fn\n    else:\n        log.warning(\n            (\n                f\"NeuronPerf does not implement a proper environment setup for {device_type}. 
\"\n                \"You may need to provide your own.\"\n            )\n        )\n        return env_setup_fn\n\n\ndef _get_setup_fn(benchmarker_id: int, benchmarker_config: dict, setup_fn):\n    \"\"\"Wraps a customer-provided setup function with additional info from the benchmarker.\"\"\"\n    if not setup_fn:\n        return None\n\n    @functools.wraps(setup_fn)\n    def _setup_fn(model):\n        setup_fn(benchmarker_id, benchmarker_config, model)\n\n    return _setup_fn\n\n\ndef _get_device_id(benchmarker_id: int, benchmarker_config: dict):\n    \"\"\"Calculate an appropriate device id for a benchmarker object.\"\"\"\n    device_id = benchmarker_id\n    device_type = str(benchmarker_config[\"device_type\"]).lower().strip()\n    if device_type in SUPPORTED_DEVICE_TYPES:\n        if not (benchmarker_config[\"multiprocess\"] or benchmarker_config[\"multiinterpreter\"]):\n            device_id = benchmarker_id * benchmarker_config[\"pipeline_size\"]\n        return device_id\n    else:\n        log.warning(\n            \"Assuming device_id={} for benchmarker_id={} for unknown device_type={}\".format(\n                device_id, benchmarker_id, device_type\n            )\n        )\n    return device_id\n\n\ndef benchmark(\n    load_fn: Callable[[str, int], Any],\n    model_filename: str,\n    inputs: Any,\n    batch_sizes: Union[int, List[int]] = None,\n    duration: float = BENCHMARK_SECS,\n    n_models: Union[int, List[int]] = None,\n    pipeline_sizes: Union[int, List[int]] = None,\n    performance_levels: Union[int, List[int]] = None,\n    workers_per_model: Union[int, None] = None,\n    env_setup_fn: Callable[[int, Dict], None] = None,\n    setup_fn: Callable[[int, Dict, Any], None] = None,\n    preprocess_fn: Callable[[Any], Any] = None,\n    postprocess_fn: Callable[[Any], Any] = None,\n    dataset_loader_fn: Callable[[Any, int], Any] = None,\n    multiprocess: bool = True,\n    multiinterpreter: bool = False,\n    return_timers: bool = False,\n    stats_interval: float = 0.5,\n    device_type: str = \"neuron\",\n    cost_per_hour: float = None,\n    model_name: str = None,\n    model_class_name: str = None,\n    model_class_file: str = None,\n    verbosity: int = 1,\n) -> List[Dict]:\n    r\"\"\"\n    Benchmarks the model index or individual model using the provided inputs.\n    If a model index is provided, additional fields such as ``pipeline_sizes`` and\n    ``performance_levels`` can be used to filter the models to benchmark. The default\n    behavior is to benchmark all configurations in the model index.\n\n    :param Callable[[str, int], Any] load_fn: A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. ``neuronperf.torch.benchmark``).\n    :param str model_filename: A path to a model index from compile or path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. ``MyModelClass``).\n    :param list inputs: A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.\n    :param Union[int, List[int]] batch_sizes: A list of ints indicating batch sizes that correspond to the inputs. 
Assumes 1 if not provided.\n    :param float duration: The number of seconds to benchmark each model.\n    :param Union[int, List[int]] n_models: The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from ``device_type``, instance size, or other environment state.\n    :param Union[int, List[int]] pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.\n    :param Union[int, List[int]] performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See :ref:`mixed-precision`.\n    :param Union[int, List[int]] workers_per_model: The number of workers to use per model loaded. If ``None``, this is automatically selected.\n    :param Callable[[int, Dict], None] env_setup_fn: A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.\n    :param Callable[[int, Dict, Any], None] setup_fn: A function that receives the benchmarker id, config, and model to perform last minute configuration before inference.\n    :param Callable[[Any], Any] preprocess_fn: A custom preprocessing function to perform on each input before inference.\n    :param Callable[[Any], Any] postprocess_fn: A custom postprocessing function to perform on each input after inference.\n    :param bool multiprocess: When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.\n    :param bool multiinterpreter: When True, benchmarking is performed in a new Python interpreter per model. All parameters must be serializable. Overrides multiprocess.\n    :param bool return_timers: When True, the return of this function is a list of tuples ``(config, results)`` with detailed information. This can be converted to reports with ``get_reports(results)``.\n    :param float stats_interval: Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.\n    :param str device_type: This will be set automatically to one of the ``SUPPORTED_DEVICE_TYPES``.\n    :param float cost_per_hour: The price of this device / hour. Used to estimate cost / 1 million infs in reports.\n    :param str model_name: A friendly name for the model to use in reports.\n    :param str model_class_name: Internal use.\n    :param str model_class_file: Internal use.\n    :param int verbosity: 0 = error, 1 = info, 2 = debug\n    :return: A list of benchmarking results.\n    :rtype: List[Dict]\n    \"\"\"\n    # Set NeuronPerf logging verbosity.\n    set_verbosity(verbosity)\n\n    # --------------------------------------------\n    # Input validation\n    # --------------------------------------------\n    # Validate that enough information was provided.\n    if not load_fn:\n        raise ValueError(\n            \"You should call benchmark() through a framework submodule, e.g. 
neuronperf.torch.benchmark().\"\n        )\n    if not isinstance(model_filename, str):\n        raise ValueError(\n            \"You must provide the path to a saved model or the path to a model index from neuronperf.compile().\"\n        )\n\n    # Useful for debugging.\n    if not multiprocess and not multiinterpreter:\n        log.warning(\"Benchmarking in a single process.\")\n\n    # Standardize inputs.\n    dataset_inputs = dataset_loader_fn is not None\n    if (not dataset_inputs) and (not isinstance(inputs, list)):\n        inputs = [inputs]\n    if isinstance(n_models, int):\n        n_models = [n_models]\n    if isinstance(pipeline_sizes, int):\n        pipeline_sizes = [pipeline_sizes]\n    if isinstance(performance_levels, int):\n        performance_levels = [performance_levels]\n    if workers_per_model is None:\n        workers_per_model = []\n    elif isinstance(workers_per_model, int):\n        workers_per_model = [workers_per_model]\n    if duration < BENCHMARK_SECS:\n        log.warning(\"Results may be unreliable with short test durations.\")\n\n    # If the model_filename is JSON, attempt to interpret it as a model index.\n    index = None\n    if model_filename.endswith(model_index.MODEL_INDEX_SUFFIX):\n        index = model_index.load(model_filename)\n\n    # If we loaded a model_index, ensure provided inputs are compatible\n    # and use it to refine the benchmarking combinations we will run.\n    if index:\n        # Extract a model name from the index, if possible.\n        if not model_name:\n            model_name = index[\"model_name\"]\n\n        # If batch_sizes, pipeline_sizes and/or performance_levels were provided,\n        # treat them as filters on the index. A value of None is treated as no filter.\n        # See the docs for model_index.filter().\n        index = model_index.filter(\n            index,\n            status=\"finished\",  # only take compiled models\n            batch_size=batch_sizes,  # select all requested batch sizes\n            pipeline_size=pipeline_sizes,\n            performance_level=performance_levels,\n        )\n\n        if 0 == len(index[\"model_configs\"]):\n            raise ValueError(\n                \"No models were found in the model index matching requested criteria. 
Check that compilation succeeded.\"\n            )\n\n        # If a model index was provided without batch_sizes, extract the sizes from the index.\n        if not batch_sizes:\n            # Select unique batch_sizes in model index.\n            batch_sizes = set(config[\"batch_size\"] for config in index[\"model_configs\"])\n            batch_sizes = sorted(list(batch_sizes))\n\n    # Validate batch sizes after attempting to extract from the model index.\n    inputs, batch_sizes = _sanitize_inputs(inputs, batch_sizes, dataset_inputs)\n\n    # If we still don't have a model name, use the filename.\n    if not model_name:\n        model_name = model_filename\n\n    # If no pipeline_sizes are provided, we'll assume it's 1 for a single model unless told otherwise.\n    if not pipeline_sizes:\n        log.debug(\"Pipeline size was not specified, assuming 1.\")\n        pipeline_sizes = [1]\n\n    # Assume max performance is desired.\n    if not performance_levels:\n        max_performance = max(FAST_MATH_OPTIONS)\n        log.debug(f\"Performance level was not specified, assuming {max_performance}.\")\n        performance_levels = [max_performance]\n\n    # If a model was provided directly without a model index, build a dummy model index.\n    # A single model can not possibly have been compiled for more than 1 configuration,\n    # hence why we can assume index [0].\n    if not index:\n        index = model_index.create(\n            filename=model_filename,\n            model_name=model_name,\n            batch_size=batch_sizes[0],\n            pipeline_size=pipeline_sizes[0],\n            performance_level=performance_levels[0],\n        )\n\n    model_configs = index[\"model_configs\"]\n\n    # --------------------------------------------\n    # Benchmarking\n    # --------------------------------------------\n\n    # Estimate time remaining based on configs requested to run.\n    # If n_models wasn't provided, the default benchmarks [min, max].\n    n_models_est = 2 if not n_models else len(n_models)\n    # If workers_per_model wasn't provided, the default benchmarks [1, 2].\n    n_models_est *= 2 if not workers_per_model else len(workers_per_model)\n    secs_remaining = len(model_configs) * n_models_est * duration\n    mins_remaining = None if secs_remaining < 60 else round(secs_remaining / 60.0, 1)\n    etr = f\"{mins_remaining} minutes\" if mins_remaining else f\"{int(round(secs_remaining))} seconds\"\n    log.info(\"Benchmarking '{}', ~{} remaining.\".format(model_filename, etr))\n\n    # Try to determine instance type.\n    instance_type = get_instance_type()\n    if not instance_type:\n        instance_type = \"unknown\"\n\n    # Try to automatically determine the maximum number of devices available.\n    max_devices = _get_num_devices(device_type, instance_type)\n    log.debug(\"Automatically determined number of devices: {}\".format(max_devices))\n\n    # Try to detect cost / hour for this device.\n    if not cost_per_hour:\n        cost_per_hour = _get_cost_per_hour(instance_type)\n\n    # Run through all requested combinations and generate a report.\n    # This will produce a list of tuples, (config, results).\n    all_results = []\n\n    def make_reports():\n        \"\"\"Helper to generate reports from available results.\"\"\"\n        # If all_results was set, we return the unmodified benchmarking results.\n        return all_results if return_timers else get_reports(all_results, cost_per_hour)\n\n    for model_config in model_configs:\n        batch_size = 
model_config[\"batch_size\"]\n        pipeline_size = model_config[\"pipeline_size\"]\n\n        # Determine the number of model copies for each benchmarking session.\n        model_counts = n_models\n        # If the user didn't provide n_models, choose reasonable defaults.\n        if not model_counts:\n            # Try to run a single model and the max models supported on this hardware.\n            if max_devices and (max_devices // pipeline_size > 1):\n                model_counts = [1, max_devices // pipeline_size]\n            else:\n                model_counts = [1]\n        # If the user provided model counts and we determine they are too large, emit a warning.\n        else:\n            if max_devices:\n                model_counts_too_large = list(\n                    filter(\n                        lambda model_count: model_count * pipeline_size > max_devices, model_counts\n                    )\n                )\n                if model_counts_too_large:\n                    log.warning(\n                        (\n                            \"Some values of n_models exceed the number of devices available: \"\n                            f\"{model_counts_too_large} > {max_devices}\"\n                        )\n                    )\n\n        # Compute number of workers for this pipeline size, if not specified.\n        n_workers = workers_per_model\n        if not n_workers:\n            n_workers = [_get_num_workers(pipeline_size)]\n            # 1 worker thread == min latency\n            if 1 not in n_workers:\n                n_workers.insert(0, 1)\n\n        for _workers_per_model in n_workers:\n            # We now know everything we need to benchmark.\n            #   1. Build a comprehensive benchmarker config,\n            #   2. build one benchmarker per model,\n            #   3. run the benchmarkers in parallel,\n            #   4. and collect the results for this configuration.\n            for model_count in model_counts:\n                # 1. Benchmarker config\n                config = {\n                    \"model_filename\": model_config[\"filename\"],\n                    \"model_name\": model_name,\n                    \"device_type\": device_type,\n                    \"instance_type\": instance_type,\n                    \"batch_size\": batch_size,\n                    \"n_models\": model_count,\n                    \"workers_per_model\": _workers_per_model,\n                    \"pipeline_size\": pipeline_size,\n                    \"n_devices\": model_count * pipeline_size,\n                    \"performance_level\": model_config[\"performance_level\"],\n                    \"multiprocess\": multiprocess,\n                    \"multiinterpreter\": multiinterpreter,\n                    \"stats_interval\": str(stats_interval),\n                    \"start_dts\": time.strftime(\"%Y%m%d-%H%M%S\"),\n                    \"duration\": str(duration),\n                }\n\n                # 2. 
Build the benchmarkers\n                benchmarkers = []\n                for benchmarker_id in range(model_count):\n                    benchmarker = Benchmarker(\n                        id=benchmarker_id,\n                        device_id=_get_device_id(benchmarker_id, config),\n                        load_fn=load_fn,\n                        model_filename=model_config[\"filename\"],\n                        inputs=inputs if dataset_inputs else inputs[batch_sizes.index(batch_size)],\n                        workers_per_model=_workers_per_model,\n                        env_setup_fn=_get_env_setup_fn(benchmarker_id, config, env_setup_fn),\n                        setup_fn=_get_setup_fn(benchmarker_id, config, setup_fn),\n                        preprocess_fn=preprocess_fn,\n                        postprocess_fn=postprocess_fn,\n                        dataset_loader_fn=dataset_loader_fn,\n                        model_class_name=model_class_name,\n                        model_class_file=model_class_file,\n                    )\n                    benchmarkers.append(benchmarker)\n\n                # 3. Run benchmarkers in parallel\n                log.debug(\"Running model config: {}\".format(config))\n                try:\n                    results = run_benchmarkers(\n                        benchmarkers,\n                        duration,\n                        stats_interval=stats_interval,\n                        multiprocess=multiprocess,\n                        multiinterpreter=multiinterpreter,\n                    )\n\n                    # 4. Collect results\n                    config[\"stop_dts\"] = time.strftime(\"%Y%m%d-%H%M%S\")\n                    all_results.append((config, results))\n                except KeyboardInterrupt:\n                    # If we are interrupted, return whatever we have on hand.\n                    return make_reports()\n                except:\n                    # If something else goes wrong with the model, we should\n                    # log this configuration and move on.\n                    log.exception(\"Failure benchmarking config: {}\".format(config))\n\n    return make_reports()\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/compile_constants.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.compile_constants\n~~~~~~~~~~~~~~~~~~~~~~~\nHolds constants used at compile time.\n\"\"\"\n\nNEURONCORE_PIPELINE_CORES = \"--neuroncore-pipeline-cores\"\nFAST_MATH = \"--fast-math\"\nFAST_MATH_OPTIONS = {\n    0: \"none\",\n    1: \"fp32-cast-matmult no-fast-relayout\",\n    2: \"fp32-cast-matmult\",\n    3: \"all\",\n}\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/cpu/__init__.py",
    "content": "from neuronperf.cpu.cpu import benchmark\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/cpu/cpu.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.cpu\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides CPU support.\n\"\"\"\n\nimport functools\nimport logging\n\nfrom .. import benchmarking\n\n\nlog = logging.getLogger(__name__)\n\n\nclass DummyModel:\n    def __call__(self, x):\n        x *= 5\n        x += 3\n        return x\n\n\ndef benchmark(model_class, inputs, *args, **kwargs):\n    if not isinstance(model_class, type):\n        raise TypeError(\"For CPU benchmarking, you must provide a class to instantiate.\")\n\n    device_type = kwargs.pop(\"device_type\", \"cpu\")\n    multiinterpreter = kwargs.pop(\"multiinterpreter\", False)\n    if multiinterpreter:\n        log.warning(\n            \"CPU + multiinterpreter is not yet fully supported. You need to provide a custom load_fn that can import your class and instantiate it.\"\n        )\n\n    # Create a custom load_fn that instantiates the model.\n    def load_fn(*args, **kwargs):\n        return model_class()\n\n    kwargs[\"device_type\"] = device_type\n    kwargs[\"multiinterpreter\"] = multiinterpreter\n\n    return benchmarking.benchmark(\n        load_fn,\n        model_class.__name__,\n        inputs,\n        *args,\n        **kwargs,\n    )\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/logging.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.logging\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides logging utility functions.\n\"\"\"\n\nimport logging\n\n\nFORMAT_STRING = '%(levelname)s:%(name)s - %(message)s'\n\n\ndef _get_stream_handlers(level=logging.DEBUG):\n    formatter = logging.Formatter(FORMAT_STRING)\n    sh = logging.StreamHandler()\n    sh.setLevel(level)\n    sh.setFormatter(formatter)\n    return [sh]\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/model_index.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.model_index\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides utilities for working with model indexes.\n\"\"\"\n\nfrom typing import Any, List, Union\n\nimport builtins\nimport copy as copy_module\nimport itertools\nimport json\nimport logging\nimport os\nimport pathlib\nimport random\nimport shutil\n\n\nfrom .__version__ import __version__\nfrom .compile_constants import FAST_MATH_OPTIONS\n\n\nlog = logging.getLogger(__name__)\n\nMODEL_INDEX_SUFFIX = \".json\"\n\n\ndef generate_id(length: int = 8):\n    \"\"\"Generate a random-enough sequence to append to model names and prevent collisions.\"\"\"\n    id_chars = \"abcdefghijklmnopqrstuvwxyz0123456789\"\n    new_id = [id_chars[random.randrange(len(id_chars))] for _ in range(length)]\n    return \"\".join(new_id)\n\n\ndef generate_name(model_name: str):\n    \"\"\"Generate a model index name from a model name.\"\"\"\n    return model_name + \"_\" + generate_id() + MODEL_INDEX_SUFFIX\n\n\ndef _create(model_name: str, compile_info: list) -> dict:\n    if not isinstance(compile_info, list):\n        log.exception(\n            \"Expected a list of compile info dicts, received '{}'.\".format(str(type(compile_info)))\n        )\n    model_index = {\n        \"NeuronPerf_version\": __version__,\n        \"model_name\": model_name,\n        \"model_configs\": compile_info,\n    }\n    return model_index\n\n\ndef create(\n    filename: str,\n    model_name: str = None,\n    batch_size: int = 1,\n    pipeline_size: int = 1,\n    performance_level: int = max(FAST_MATH_OPTIONS),\n    compile_s: float = None,\n    status: str = \"finished\",\n) -> dict:\n    r\"\"\"\n    Create a new model index from a pre-compiled model.\n\n    :param str filename: The path to the compiled model.\n    :param str model_name: A friendly name for the model. Will default to filename.\n    :param int batch_size: The batch size at compilation for this model.\n    :param int pipeline_size: The pipeline size used at compilation for this model.\n    :param int performance_level: The performance level this model was compiled with.\n    :param float compile_s: Seconds spent compiling.\n    :param str status: A string describing compilation result. 
Can be \"finished\" or \"error\".\n    :return: A new dictionary representing a model index.\n    :rtype: dict\n    \"\"\"\n    if not model_name:\n        model_name = filename\n    compile_info = [\n        {\n            \"filename\": filename,\n            \"batch_size\": batch_size,\n            \"pipeline_size\": pipeline_size,\n            \"performance_level\": performance_level,\n            \"compile_s\": compile_s,\n            \"status\": status,\n        }\n    ]\n    return _create(model_name, compile_info)\n\n\ndef delete(filename: str):\n    \"\"\"Deletes the model index and all associated models referenced by the index.\"\"\"\n    if not os.path.exists(filename):\n        log.warning(\"Asked to delete '{}', but it can't be located.\".format(filename))\n        return\n\n    # Load the index\n    configs = load(filename)[\"model_configs\"]\n\n    # Remove all referenced models\n    model_filenames = map(lambda x: x[\"filename\"], itertools.chain(configs))\n    for model_filename in model_filenames:\n        log.debug(f\"Deleting '{model_filename}'.\")\n        if os.path.exists(model_filename):\n            if os.path.isdir(model_filename):\n                shutil.rmtree(model_filename)\n            else:\n                os.remove(model_filename)\n\n    # Finally, remove the model index itself\n    log.debug(f\"Deleting '{filename}'\")\n    os.remove(filename)\n\n\ndef copy(old_index: Union[str, dict], new_index: str, new_dir: str) -> str:\n    r\"\"\"\n    Copy an index to a new location. Will rename ``old_index``\n    to ``new_index`` and copy all model files into ``new_dir``,\n    updating the index paths.\n\n    This is useful for pulling individual models out of a pool.\n\n    Returns the path to the new index.\n    \"\"\"\n    os.makedirs(new_dir, exist_ok=True)\n    index = _sanitize(old_index)[0].copy()\n\n    configs = index[\"model_configs\"]\n    for config in configs:\n        path = pathlib.Path(config[\"filename\"])\n        config[\"filename\"] = str(shutil.copy2(path, new_dir))\n\n    return save(index, new_index)\n\n\ndef move(old_index: str, new_index: str, new_dir: str) -> str:\n    \"\"\"This is the same as ``copy`` followed by ``delete`` on the old index.\"\"\"\n    index = copy(old_index, new_index, new_dir)\n    delete(old_index)\n    return index\n\n\ndef _sanitize(*model_indexes: Union[str, dict]) -> List[dict]:\n    r\"\"\"\n    Helper function to load indexes if strings are provided.\n    If already loaded, this is a no-op.\n    \"\"\"\n    if not model_indexes:\n        raise ValueError(\"No model indexes were provided.\")\n    indexes = []\n    # Load any paths provided and sanity check all inputs.\n    for index in model_indexes:\n        if not index:\n            raise ValueError(\"An empty value was received, but expected a model index.\")\n        if isinstance(index, str):\n            index = load(index)\n        if not isinstance(index, dict):\n            raise TypeError(\"Expected a model index, but received '{}'.\".format(str(type(None))))\n        if not len(index) > 0:\n            raise ValueError(\"Received an empty model index.\")\n        indexes.append(index)\n    # Check versions are all the same, and emit a warning if they aren't.\n    versions = set(map(lambda x: x[\"NeuronPerf_version\"], indexes))\n    if len(versions) > 1:\n        log.warning(\"Received model with different versions: '{}'.\".format(str(versions)))\n    model_name = indexes[0][\"model_name\"]\n    # Ensure model names are matching.\n    if not 
all(model_name == index[\"model_name\"] for index in indexes):\n        model_names = list(set(map(lambda x: x[\"model_name\"], indexes)))\n        log.warning(\"Received model indexes with different model names: {}\".format(model_names))\n    return indexes\n\n\ndef append(*model_indexes: Union[str, dict]) -> dict:\n    r\"\"\"\n    Appends the model indexes non-destructively into a new model index, without\n    modifying any of the internal data.\n\n    This is useful if you have benchmarked multiple related models and wish to\n    combine their respective model indexes into a single index.\n\n    Model name will be taken from the first index provided.\n    Duplicate configs will be filtered.\n\n    :param Union[str, dict] model_indexes: Model indexes or paths to model indexes to combine.\n    :return: A new dictionary representing the combined model index.\n    :rtype: dict\n    \"\"\"\n    indexes = _sanitize(*model_indexes)\n    # Extract the model configs from the indexes\n    config_iter = map(lambda index: copy_module.deepcopy(index[\"model_configs\"]), indexes)\n    # Combine the model configs\n    combined = list(itertools.chain.from_iterable(config_iter))\n    # Split unique and duplicate configs\n    duplicate = []\n    unique = []\n    for config in combined:\n        if config in unique:\n            duplicate.append(config)\n        else:\n            unique.append(config)\n    if len(duplicate) > 0:\n        log.warning(\n            (\n                f\"There were {len(duplicate)} duplicate model configs \"\n                \"filtered. The duplicates were:\\n\"\n                \"{}\".format(\"\\n\".join(map(lambda c: str(c), duplicate)))\n            )\n        )\n    # Build new index from configs\n    return _create(indexes[0][\"model_name\"], unique)\n\n\ndef save(model_index: dict, filename: str = None, root_dir=None) -> str:\n    r\"\"\"Save a NeuronPerf model index to a file.\"\"\"\n    if not filename:\n        model_name = model_index[\"model_name\"]\n        filename = generate_name(model_name)\n    if not filename.lower().endswith(MODEL_INDEX_SUFFIX):\n        filename += MODEL_INDEX_SUFFIX\n    if not root_dir:\n        root_dir = \".\"\n    try:\n        with open(os.path.join(root_dir, filename), \"w\") as fp:\n            json.dump(model_index, fp)\n    except OSError:\n        log.exception(\"Failed to write '{}'.\".format(filename))\n    return filename\n\n\ndef load(filename) -> dict:\n    \"\"\"Load a NeuronPerf model index from a file.\"\"\"\n    model_index = None\n    try:\n        with open(filename, \"r\") as fp:\n            model_index = json.load(fp)\n    except OSError:\n        # file is probably not a model index\n        log.exception(\"Failed to load model index '{}'\".format(filename))\n    else:\n        from distutils.version import LooseVersion\n\n        try:\n            if LooseVersion(model_index[\"NeuronPerf_version\"]) > LooseVersion(__version__):\n                log.warning(\n                    \"Model index newer than NeuronPerf (version {} > {}). 
Try updating NeuronPerf.\".format(\n                        model_index[\"NeuronPerf_version\"], __version__\n                    )\n                )\n        except TypeError:\n            log.warning(\n                \"Couldn't compare model index version ({}) to NeuronPerf version ({}), continuing anyway.\".format(\n                    model_index[\"NeuronPerf_version\"], __version__\n                )\n            )\n\n    return model_index\n\n\ndef filter_configs(configs, filter_name, filter_values) -> List:\n    \"\"\"Filters provided configs on specified filter and value and returns a new config list.\"\"\"\n    if filter_values is None:\n        return configs.copy()\n    # Filter on configs that have the filter_name and value is in filter_values\n    if not isinstance(filter_values, list):\n        filter_values = [filter_values]\n    return list(\n        builtins.filter(\n            lambda config: filter_name in config and config[filter_name] in filter_values, configs\n        )\n    )\n\n\ndef filter(index: Union[str, dict], **kwargs) -> dict:\n    r\"\"\"\n    Filters provided model index on provided criteria and returns a new index.\n    Each kwarg is a standard (k, v) pair, where k is treated as a filter name\n    and v may be one or more values used to filter model configs.\n    \"\"\"\n    index = _sanitize(index)[0].copy()\n\n    # Filter each config on provided kwargs pairs.\n    configs = index[\"model_configs\"]\n    for k, v in kwargs.items():\n        configs = filter_configs(configs, k, v)\n\n    index[\"model_configs\"] = configs\n    return index\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/mxnet/__init__.py",
    "content": "from neuronperf.mxnet.mxnet import benchmark, compile\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/mxnet/mxnet.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.mxnet\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides Apache MXNet support.\n\"\"\"\n\nimport contextlib\nimport functools\nimport os\nimport threading\n\n# handle different API versions of mxnet\nimport mxnet as mx\nfrom distutils.version import LooseVersion\n\nif LooseVersion(mx.__version__) >= LooseVersion(\"1.8\"):\n    _mx_version = 1.8\n    import mx_neuron as neuron\nelse:\n    _mx_version = 1.5\n    from mxnet.contrib import neuron\n\nfrom .. import benchmarking\n\n\nclass _MXNetModelWrapper:\n    def __init__(self, device_id, sym, args, aux):\n        self.device_id = device_id\n        self.sym = sym\n        self.args = args\n        self.aux = aux\n        self.ctx = None\n        self.exes = {}\n        self.lock = threading.Lock()\n\n    def __call__(self, inputs):\n        # on the first inference, do prep work\n        if not self.ctx:\n            self.ctx = mx.neuron(self.device_id)\n\n        # prepare inputs for model\n        for k, v in inputs.items():\n            inputs[k] = mx.nd.array(v)\n        self.args.update(inputs)\n\n        # obtain an executor for this thread\n        thread_id = threading.get_ident()\n        if thread_id not in self.exes:\n            with self.lock:\n                exe = self.sym.bind(\n                    ctx=self.ctx, args=self.args, aux_states=self.aux, grad_req=\"null\"\n                )\n            self.exes[thread_id] = exe\n        else:\n            exe = self.exes[thread_id]\n\n        # run inference\n        outputs = exe.forward(**inputs)\n        mx.nd.waitall()\n        return outputs[0]\n\n\n@contextlib.contextmanager\ndef change_dir(new_dir):\n    old_dir = os.getcwd()\n    os.chdir(os.path.join(old_dir, new_dir))\n    try:\n        yield\n    finally:\n        os.chdir(old_dir)\n\n\ndef _load_fn(model_filename, **kwargs):\n    device_id = kwargs.get(\"device_id\", 0)\n    sym, args, aux = mx.model.load_checkpoint(model_filename, 0)\n    return _MXNetModelWrapper(device_id, sym, args, aux)\n\n\ndef _compile_fn(model, example_inputs, models_dir, model_name, **kwargs):\n    _sym, _args, _aux = model\n    model_filename = os.path.join(models_dir, model_name)\n    compiler_args = kwargs.pop(\"compiler_args\", {})\n\n    # MXNet passes additional kwargs directly to compiler\n    _sym, _args, _aux = neuron.compile(\n        _sym,\n        _args,\n        _aux,\n        example_inputs,\n        **compiler_args,\n    )\n\n    with change_dir(models_dir):\n        mx.model.save_checkpoint(model_name, 0, _sym, _args, _aux)\n    return model_filename\n\n\ndef compile(model, inputs, *args, **kwargs):\n    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)\n\n\ndef benchmark(model_filename, inputs, *args, **kwargs):\n    env_setup_fn = kwargs.pop(\"env_setup_fn\", lambda *_: None)\n\n    # Use a custom setup function to handle MXNet concurrency requirements.\n    @functools.wraps(env_setup_fn)\n    def _env_setup_fn(id, config):\n        workers_per_model = str(config[\"workers_per_model\"])\n        os.environ[\"MXNET_CPU_TEMP_COPY\"] = workers_per_model\n        os.environ[\"MXNET_EXEC_NUM_TEMP\"] = workers_per_model\n        os.environ[\"MXNET_CPU_WORKER_NTHREADS\"] = workers_per_model\n        os.environ[\"MXNET_MP_WORKER_NTHREADS\"] = workers_per_model\n\n        # Remember to call any additional custom setup provided.\n        env_setup_fn(id, config)\n\n    kwargs[\"env_setup_fn\"] = _env_setup_fn\n\n    return benchmarking.benchmark(_load_fn, 
model_filename, inputs, *args, **kwargs)\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/py.typed",
    "content": "# Marker file that indicates this package supports typing\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/reporting.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.reporting\n~~~~~~~~~~~~~~~~~~~~\nProvides utilities for producing reports from benchmarking results.\n\"\"\"\n\nfrom typing import List\n\nimport csv\nimport itertools\nimport json\nimport logging\nimport time\n\nimport numpy as np\n\nfrom . import __version__\n\n\nlog = logging.getLogger(__name__)\n\nCSV_COLS = [\n    \"model_name\",\n    \"n_models\",\n    \"workers_per_model\",\n    \"pipeline_size\",\n    \"batch_size\",\n    \"throughput_avg\",\n    \"throughput_peak\",\n    \"latency_ms_p0\",\n    \"latency_ms_p50\",\n    \"latency_ms_p90\",\n    \"latency_ms_p95\",\n    \"latency_ms_p99\",\n    \"latency_ms_p100\",\n    \"cpu_avg_percent\",\n    \"cpu_percent_p50\",\n    \"mem_avg_percent\",\n    \"mem_percent_p50\",\n    \"e2e_avg_ms\",\n    \"infer_avg_ms\",\n    \"total_infs\",\n    \"total_s\",\n    \"performance_level\",\n    \"model_filename\",\n    \"device_type\",\n    \"instance_type\",\n    \"cost_per_1m_inf\",\n]\n\nPRINT_COLS = [\n    \"throughput_avg\",\n    \"latency_ms_p50\",\n    \"latency_ms_p99\",\n    \"n_models\",\n    \"pipeline_size\",\n    \"workers_per_model\",\n    \"batch_size\",\n    \"model_filename\",\n]\n\nREQUIRED_CONFIG_KEYS = [\n    \"multiprocess\",\n    \"multiinterpreter\",\n    \"device_type\",\n    \"batch_size\",\n    \"model_filename\",\n    \"model_name\",\n    \"n_models\",\n    \"pipeline_size\",\n]\n\nREQUIRED_RESULTS_KEYS = [\n    \"workers_per_model\",\n    \"status\",\n    \"timers\",\n    \"n_infs\",\n    \"total_s\",\n]\n\n\ndef _validate_config(config):\n    for required_key in REQUIRED_CONFIG_KEYS:\n        if required_key not in config:\n            raise ValueError(\n                (\n                    f\"Model config is missing required key '{required_key}'. \"\n                    \"Something probably went wrong during benchmarking. Provided:\\n{config}\"\n                )\n            )\n\n\ndef _validate_results(results):\n    for required_key in REQUIRED_RESULTS_KEYS:\n        if required_key not in results:\n            raise ValueError(\n                (\n                    f\"Benchmarking results are missing required key '{required_key}'. \"\n                    \"Something probably went wrong during benchmarking. Provided:\\n{results}\"\n                )\n            )\n\n\ndef _get_report_name(model_name: str) -> str:\n    return \"{}.results-{}\".format(model_name, time.strftime(\"%Y%m%d-%H%M%S\"))\n\n\ndef get_report(\n    benchmark_results, cost_per_hour: float = None, window_size: int = 1, verbosity: int = 0\n) -> dict:\n    r\"\"\"Get a performance report from benchmarker results.\n\n    :param benchmark_results: Results from a :class:`benchmarking:Benchmarker` object.\n    :param float cost_per_hour: The cost / hour for this device.\n    :param int window_size: Window size in seconds used to measure throughput.\n    :param int verbosity: Controls logging during report generation. 
Use 0 (default), 1, or 2.\n    :returns: A dictionary containing performance information.\n    \"\"\"\n    report = {}\n    config, results = benchmark_results\n    _validate_config(config)\n    _validate_results(results)\n    try:\n        report[\"NeuronPerf_version\"] = __version__\n\n        # copy benchmarker info from config into report\n        for k, v in config.items():\n            report[k] = v\n\n        # number of intervals is the same across all stats, so we can use this as a proxy\n        report[\"n_stats_intervals\"] = len(results[\"cpu_percents\"])\n\n        report[\"workers_per_model\"] = results[\"workers_per_model\"]\n        report[\"status\"] = results[\"status\"]\n\n        # timing stats\n        report[\"load_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"load\"]), float\n        ).mean()\n        report[\"input_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"input\"]), float\n        ).mean()\n        report[\"warmup_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"warmup\"]), float\n        ).mean()\n        report[\"env_setup_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"env_setup\"]), float\n        ).mean()\n        report[\"setup_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"setup\"]), float\n        ).mean()\n        report[\"preprocess_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"preprocess\"]), float\n        ).mean()\n        report[\"infer_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"infer\"]), float\n        ).mean()\n        report[\"postprocess_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"postprocess\"]), float\n        ).mean()\n        report[\"e2e_avg_ms\"] = np.fromiter(\n            (t.avg(\"ms\") for t in results[\"timers\"][\"e2e\"]), float\n        ).mean()\n        report[\"worker_avg_s\"] = round(\n            np.fromiter((t.avg(\"s\") for t in results[\"timers\"][\"worker\"]), float).mean(), 2\n        )\n        report[\"total_infs\"] = results[\"n_infs\"] * config[\"batch_size\"]\n        report[\"total_s\"] = round(results[\"total_s\"], 2)\n\n        percentiles = [0, 50, 90, 95, 99, 100]\n\n        cpu_percents = np.fromiter(results[\"cpu_percents\"], float)\n        if cpu_percents.size > 2:\n            cpu_percentiles = np.percentile(cpu_percents[1:-1], percentiles)\n            report[\"cpu_avg_percent\"] = cpu_percentiles.mean()\n            for i, p in enumerate(percentiles):\n                report[f\"cpu_percent_p{p}\"] = cpu_percentiles[i]\n\n        mem_percents = np.fromiter(results[\"mem_percents\"], float)\n        if mem_percents.size > 2:\n            mem_percentiles = np.percentile(mem_percents[1:-1], percentiles)\n            report[\"mem_avg_percent\"] = mem_percentiles.mean()\n            for i, p in enumerate(percentiles):\n                report[f\"mem_percent_p{p}\"] = mem_percentiles[i]\n\n        # latency\n        latencies = np.fromiter(\n            itertools.chain.from_iterable(t.durations(\"ms\") for t in results[\"timers\"][\"e2e\"]),\n            float,\n        )\n        latency_percentiles = np.percentile(latencies, percentiles)\n        for i, p in enumerate(percentiles):\n            report[\"latency_ms_p{}\".format(p)] = latency_percentiles[i]\n\n        # bucketize ending timestamps\n        
end_timestamps = np.fromiter(\n            itertools.chain.from_iterable(t.end_timestamps(\"s\") for t in results[\"timers\"][\"e2e\"]),\n            float,\n        )\n        bucket_ends = np.floor(end_timestamps / window_size)\n        # group timestamps by window and correct for batch size\n        _, bucket_counts = np.unique(bucket_ends, return_counts=True)\n        bucket_counts *= config[\"batch_size\"]\n        # find max and normalize by window size\n        report[\"throughput_peak\"] = bucket_counts.max() / window_size\n        report[\"throughput_avg\"] = bucket_counts[1:-1].mean() / window_size\n\n        if verbosity > 0:\n            report[\"throughput_hist\"] = bucket_counts\n        if verbosity > 1:\n            report[\"e2e_durations_ms\"] = np.fromiter(\n                (t.durations(\"ms\") for t in results[\"timers\"][\"e2e\"]), float\n            )\n\n        # Try to estimte cost / inference\n        if cost_per_hour:\n            try:\n                infs_per_hour = 3600 * report[\"throughput_avg\"]\n                report[\"cost_per_1m_inf\"] = cost_per_hour * (1_000_000 / infs_per_hour)\n            except:\n                # We'll ignore this, as it's caused by a missing field that would have\n                # already generated an earlier error log. We should continue producing\n                # a report nonetheless.\n                pass\n\n        # Truncate floats to 3 places for readability.\n        for key, value in report.items():\n            if isinstance(value, float):\n                report[key] = round(value, 3)\n\n    except:\n        log.exception(\n            (\n                \"Failed to produce a report from benchmarking results. \"\n                \"Something probably went wrong during benchmarking.\"\n            )\n        )\n    return report\n\n\ndef get_reports(results, cost_per_hour: float = None) -> List[dict]:\n    r\"\"\"\n    Summarizes and combines the detailed results from\n    ``neuronperf.benchmark``, when run with ``return_timers=True``.\n    One report dictionary is produced per model configuration benchmarked.\n    The list of reports can be fed directly to other reporting utilities,\n    such as ``neuronperf.write_csv``.\n\n    :param results: Benchmarker results.\n    :param float cost_per_hour: The cost / hour for this device.\n    \"\"\"\n    reports = []\n    for idx, (config, result) in enumerate(results):\n        try:\n            _validate_config(config)\n            _validate_results(result)\n        except ValueError:\n            log.exception(f\"Result {idx} is missing required information, skipping.\")\n            continue\n        report = get_report((config, result), cost_per_hour)\n        reports.append(report)\n    return reports\n\n\ndef print_reports(reports: List[dict], cols=PRINT_COLS, sort_by=\"throughput_peak\", reverse=False):\n    r\"\"\"Print a subset of report cols to the terminal.\n\n    :param reports: Results from `get_reports`.\n    :param cols: The columns in the report to be displayed.\n    :param sort_by: Sort the cols by the specified key.\n    :param reverse: Sort order.\n    \"\"\"\n    if not reports:\n        print(\"No reports were found. 
Did benchmarking succeed?\")\n        return\n    # Print headers.\n    col_width = max(map(lambda col: len(col), cols)) + 1\n    row_format = \"{{:<{}}}\".format(col_width) * len(cols)\n    print(row_format.format(*cols))\n    # Extract all rows.\n    rows = []\n    for report in reports:\n        row = []\n        for col in cols:\n            row.append(report[col] if col in report else \"N/A\")\n        rows.append(row)\n    # Sort rows by the specified key, if the key exists.\n    if sort_by in cols:\n        sort_index = cols.index(sort_by)\n        rows = sorted(rows, key=lambda row: row[sort_index], reverse=reverse)\n    # Print all rows.\n    for row in rows:\n        print(row_format.format(*row))\n\n\ndef write_csv(reports: List[dict], filename: str = None, cols=CSV_COLS):\n    r\"\"\"Write a benchmarking report to CSV file.\n\n    :param reports: Results from `get_reports`.\n    :param filename: File name to write out. If not provided, generated from model_name in report and current timestamp.\n    :param cols: The columns in the report to be kept.\n    \"\"\"\n    if not filename:\n        filename = \"{}.csv\".format(_get_report_name(reports[0][\"model_name\"]))\n    try:\n        with open(filename, \"w\", newline=\"\", encoding=\"utf-8\") as csvfile:\n            writer = csv.writer(csvfile)\n            writer.writerow(cols)\n            for idx, report in enumerate(reports):\n                row = []\n                for col in cols:\n                    if col in report:\n                        row.append(report[col] if report[col] is not None else \"N/A\")\n                    else:\n                        log.debug(f\"Report {idx} is missing field '{col}'.\")\n                        row.append(\"N/A\")\n                writer.writerow(row)\n        return filename\n    except OSError:\n        log.exception(f\"Failed to write '{filename}'. Check that you have write permissions.\")\n\n\ndef write_json(reports: List[dict], filename: str = None):\n    if not filename:\n        filename = \"{}.json\".format(_get_report_name(reports[0][\"model_name\"]))\n    try:\n        with open(filename, \"w\", encoding=\"utf-8\") as jsonfile:\n            json.dump(reports, jsonfile)\n        return filename\n    except OSError:\n        log.exception(\n            (\n                f\"Failed to write '{filename}'. Check that the report \"\n                \"contains data and that you have write permissions.\"\n            )\n        )\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/scripts/__init__.py",
    "content": ""
  },
  {
    "path": "src/neuronperf/src/neuronperf/scripts/run_benchmark_file.py",
    "content": "import argparse\nimport dill\nimport neuronperf\n\n\ndef main():\n    parser = argparse.ArgumentParser(\n        prog=\"benchmark\",\n        description=\"Run a serialized Benchmarker for a given `duration`. Upon \"\n        \"success overwrite `filename` with the updated Benchmarker\",\n    )\n    parser.add_argument(\"filename\", type=str, help=\"The serialized Benchmarker\")\n    parser.add_argument(\"duration\", type=float, help=\"The duration of each config (seconds)\")\n    parser.add_argument(\"--model_class_name\", type=str, help=\"The name of a model class to load\")\n    parser.add_argument(\"--model_class_file\", type=str, help=\"Path to Python module defining model_class_name\")\n    args = parser.parse_args()\n\n    try:\n        # If we were provided with a model class to import before deserialization, we need\n        # to handle that now. The class will be manually imported.\n        if args.model_class_name and args.model_class_file:\n            import importlib.util\n\n            spec = importlib.util.spec_from_file_location(\n                args.model_class_name, args.model_class_file\n            )\n            module = importlib.util.module_from_spec(spec)\n            spec.loader.exec_module(module)\n            globals()[args.model_class_name] = getattr(module, args.model_class_name)\n\n        # Load the benchmarker object\n        with open(args.filename, \"rb\") as f:\n            benchmarker = dill.load(f)\n\n        # Execute the benchmarker\n        result = neuronperf.benchmarking.run_benchmarker(benchmarker, args.duration)\n\n        # Write the result back to the same file\n        with open(args.filename, \"wb\") as f:\n            dill.dump(result, f)\n    except:\n        # Dump traceback to a file for debugging.\n        import os\n        import sys\n        import traceback\n        from pathlib import Path\n\n        path = Path(args.filename)\n        filename = os.path.join(path.parent, \"neuronperf_error_{}\".format(path.stem))\n        trace = \"\".join(traceback.format_exception(*sys.exc_info()))\n        with open(filename, \"wt\") as err_fp:\n            err_fp.write(trace)\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/tensorflow/__init__.py",
    "content": "from neuronperf.tensorflow.tensorflow import benchmark, compile\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/tensorflow/tensorflow.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.tensorflow\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides TensorFlow support.\n\"\"\"\n\nimport itertools\nimport logging\nimport os\nimport threading\n\n\nfrom .. import benchmarking\n\n\nlog = logging.getLogger(__name__)\n_lock = threading.Lock()\n\n\ndef _load_fn(model_file, **kwargs):\n    with _lock:\n        import tensorflow as tf\n\n        if tf.__version__.startswith(\"1\"):\n            return tf.contrib.predictor.from_saved_model(model_file)\n        else:\n            import tensorflow.keras as keras\n\n            return keras.models.load_model(model_file)\n\n\ndef _compile_fn(model, inputs, models_dir, model_name, **kwargs):\n    import tensorflow as tf\n    import tensorflow.neuron as tfn\n\n    model_filename = os.path.join(models_dir, model_name)\n\n    # NeuronPerf provides compiler_args as a dictionary, but framework expects a different format.\n    compiler_args = kwargs.pop(\"compiler_args\", {})\n\n    if tf.__version__.startswith(\"1\"):\n        compiler_args_flattened = list(itertools.chain.from_iterable(compiler_args.items()))\n        kwargs[\"compiler_args\"] = compiler_args_flattened\n        kwargs[\"model_feed_dict\"] = inputs\n\n        # For TF 1.x, the saved model path is expected instead of a loaded model.\n        tfn.saved_model.compile(model, model_filename, **kwargs)\n    else:\n        if compiler_args:\n            compiler_args_flattened = \" \".join(\n                [\"{}={}\".format(k, v) for k, v in compiler_args.items()]\n            )\n            os.environ[\"NEURON_CC_FLAGS\"] = compiler_args_flattened\n        else:\n            os.environ[\"NEURON_CC_FLAGS\"] = \"\"\n\n        model_neuron = tfn.trace(model, inputs, **kwargs)\n        model_neuron.save(model_filename)\n    return model_filename\n\n\ndef compile(model, inputs, *args, **kwargs):\n    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)\n\n\ndef benchmark(model_filename, inputs, *args, **kwargs):\n    # Tensorflow-neuron is not currently fork safe, so we workaround this during benchmarking\n    # by spawning a fresh interpreter session for each model we benchmark.\n    if \"multiinterpreter\" in kwargs and not kwargs[\"multiinterpreter\"]:\n        log.warning(\n            \"Setting multiinterpreter=False is not safe with TensorFlow. Use at your own risk.\"\n        )\n    else:\n        kwargs[\"multiinterpreter\"] = True\n\n    return benchmarking.benchmark(_load_fn, model_filename, inputs, *args, **kwargs)\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/timing.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf._timing\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides utility functions for timing and time unit conversions.\n\"\"\"\n\nfrom typing import Any, Callable\n\nimport sys\nimport time\nimport typing\n\nimport numpy as np\n\n\ntime_unit_ratios = {\n    'ns': { 'ns': 1, 'us': 1e-3, 'ms': 1e-6, 's': 1e-9 },\n    'us': { 'ns': 1e3, 'us': 1, 'ms': 1e-3, 's': 1e-6 },\n    'ms': { 'ns': 1e6, 'us': 1e3, 'ms': 1, 's': 1e-3 },\n    's': { 'ns': 1e9, 'us': 1e6, 'ms': 1e3, 's': 1 }\n}\n\n\nsupported_time_units = time_unit_ratios.keys()\n\n\ndef timestamp_convert(timestamps,\n                      input_time_unit: str,\n                      output_time_unit: str):\n    \"\"\"Convert timestamp(s) from one time unit to another.\n\n    :param ts: A timestamp or iterable of timestamps.\n    :param input_time_unit: A string specifying the input time unit.\n    :param output_time_unit: A string specifying the output time unit.\n    :returns: A single timestamp or container of timestamps in the output time unit.\n    \"\"\"\n    try:\n        ratio = time_unit_ratios[input_time_unit][output_time_unit]\n    except:\n        raise ValueError(f\"Can't convert {input_time_unit} to {output_time_unit}\")\n\n    return timestamps * ratio\n\n\nclass Timer():\n    def __init__(self,\n                 timer_fn: Callable[[], Any] = time.perf_counter,\n                 timer_unit: str = 's'):\n        self.timer_fn = timer_fn\n        self.timer_unit = timer_unit\n        self._start = []\n        self._end = []\n\n    def __enter__(self):\n        self.start()\n\n    def __exit__(self, type, value, traceback):\n        self.stop()\n\n    def __delitem__(self, index):\n        del self._start[index]\n        del self._end[index]\n\n    def __getitem__(self, index):\n        # it's possible that start and end won't match if negative indices are used,\n        # b/c timer may have started and not stopped yet\n        if index < 0: index = index % len(self._end)\n        return self._start[index], self._end[index]\n\n    def __iter__(self):\n        return zip(self._start, self._end)\n\n    def __len__(self):\n        return len(self._end)\n\n    def __str__(self):\n        return str(self.timestamps())\n\n    def start(self):\n        # If we've already started, consider this a request to restart.\n        # This also handles partial timestamps due to a Timer-unrelated error.\n        if len(self._start) > len(self._end): self._start.pop()\n        self._start.append(self.timer_fn())\n\n    def stop(self):\n        # if we haven't started, ignore this\n        if 0 == len(self._start): return\n        self._end.append(self.timer_fn())\n\n    def next(self):\n        \"\"\"Manually advance the timer to the next timestamp measurement.\"\"\"\n        self.stop()\n        self.start()\n\n    def reset(self):\n        self._start.clear()\n        self._end.clear()\n\n    def insert(self, timestamps: tuple, time_unit: str):\n        \"\"\"Manually insert a timestamp pair. 
Does not affect ongoing timing.\n\n        :param timestamps: Timestamp pair to insert.\n        :param time_unit: The time unit of the incoming timestamps.\n        \"\"\"\n        if len(timestamps) != 2 or not time_unit: raise ValueError()\n        timestamps = timestamp_convert(np.array(timestamps), time_unit, self.timer_unit)\n        self._start.insert(0, timestamps[0])\n        self._end.insert(0, timestamps[1])\n\n    def start_timestamps(self, time_unit: str = None):\n        if not time_unit: return np.array(self._start)\n        return timestamp_convert(np.array(self._start), self.timer_unit, time_unit)\n\n    def end_timestamps(self, time_unit: str = None):\n        if not time_unit: return np.array(self._end)\n        return timestamp_convert(np.array(self._end), self.timer_unit, time_unit)\n\n    def timestamps(self, time_unit: str = None):\n        \"\"\"Returns a list of pairs of timestamps (start, end).\n\n        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.\n        \"\"\"\n        starts, ends = self.start_timestamps(time_unit), self.end_timestamps(time_unit)\n        return np.stack((starts[:len(ends)], ends), axis=-1)\n\n    def durations(self, time_unit: str = None):\n        \"\"\"Returns an `ndarray` of timestamp deltas, optionally converted into a provided time unit.\n\n        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.\n        :returns: An `ndarray` of timestamp deltas.\n        \"\"\"\n        starts, ends = self.start_timestamps(), self.end_timestamps()\n        deltas = ends - starts[:len(ends)]\n        return deltas if not time_unit else timestamp_convert(deltas, self.timer_unit, time_unit)\n\n    def total_duration(self, time_unit: str = None):\n        \"\"\"Returns total duration of all time measurements, optionally converted into a provided time unit.\n\n        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.\n        :returns: The total duration.\n        \"\"\"\n        starts, ends = self.start_timestamps(), self.end_timestamps()\n        total = np.sum(ends - starts[:len(ends)])\n        return total if not time_unit else timestamp_convert(total, self.timer_unit, time_unit)\n\n    def avg(self, time_unit: str = None):\n        \"\"\"Returns average duration, optionally converted into a provided time unit.\n\n        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.\n        :returns: The average duration.\n        \"\"\"\n        return self.durations(time_unit).mean() if len(self._end) > 0 else 0\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/torch/__init__.py",
    "content": "from neuronperf.torch.torch import benchmark, compile\n"
  },
  {
    "path": "src/neuronperf/src/neuronperf/torch/torch.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"\nneuronperf.torch\n~~~~~~~~~~~~~~~~~~~~~~~\nProvides PyTorch support.\n\"\"\"\n\nimport functools\nimport itertools\nimport logging\nimport math\nimport os\nimport types\n\nimport torch\n\nfrom .. import benchmarking\n\n\nlog = logging.getLogger(__name__)\n\n\ndef _compile_fn(model, example_inputs, models_dir, model_name, **kwargs):\n    import torch_neuron\n\n    \"\"\"Compiles a model for Neuron.\"\"\"\n    model_filename = os.path.join(models_dir, \"{}.pt\".format(model_name))\n    model.eval()\n\n    # NeuronPerf provides compiler_args as a dictionary, but framework expects a different format.\n    compiler_args = kwargs.get(\"compiler_args\", {})\n    compiler_args_flattened = list(itertools.chain.from_iterable(compiler_args.items()))\n    kwargs[\"compiler_args\"] = compiler_args_flattened\n\n    model_neuron = torch.neuron.trace(\n        model,\n        example_inputs,\n        **kwargs,\n    )\n    model_neuron.save(model_filename)\n    return model_filename\n\n\ndef _load_fn(model_filename, **kwargs):\n    import torch_neuron\n\n    model = torch.jit.load(model_filename)\n    model.eval()\n    return model\n\n\ndef _class_load_fn(model_class, **kwargs):\n    model = model_class()\n    model.eval()\n    return model\n\n\ndef compile(model, inputs, *args, **kwargs):\n    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)\n\n\n# See: https://pytorch.org/docs/stable/data.html#dataset-types\ndef _get_dataset_loader_fn(dataset, loop):\n    def _worker_init_fn(worker_id):\n        # This function will be called for each worker by torch.\n        worker_info = torch.utils.data.get_worker_info()\n        worker_id = worker_info.id\n        num_workers = worker_info.num_workers\n        dataset = worker_info.dataset  # the dataset copy in this worker process\n        per_worker = int(math.ceil(len(dataset) / float(num_workers)))\n        start = worker_id * per_worker\n        end = min(start + per_worker, len(dataset))\n        log.debug(\n            \"worker_id={}, num_workers={}, per_worker={}, start={}, end={}\".format(\n                worker_id, num_workers, per_worker, start, end\n            )\n        )\n\n        # We monkey-patch the dataset __iter__ function to support a multi-worker config.\n        def _iter(self, start, end, loop):\n            if loop:\n                return itertools.cycle(range(start, end))\n            else:\n                return iter(range(start, end))\n\n        __iter__ = functools.partial(_iter, start, end, loop)\n        dataset.__iter__ = types.MethodType(__iter__, dataset)\n\n    def dataset_loader_fn(dataset, num_workers):\n        return iter(\n            torch.utils.data.DataLoader(\n                dataset, num_workers=num_workers, worker_init_fn=_worker_init_fn\n            )\n        )\n\n    return dataset_loader_fn\n\n\ndef benchmark(model_filename, inputs, *args, dataset_inputs=False, loop_dataset=False, **kwargs):\n    # These functions may need to be overridden or wrapped, depending upon config requested.\n    load_fn = _load_fn\n    setup_fn = kwargs.get(\"setup_fn\", lambda *args, **kwargs: None)\n    preprocess_fn = kwargs.get(\"preprocess_fn\", lambda *args: (*args,))\n\n    # If cuda is requested, ensure it's available and provide smart wrappers for CUDA device loading.\n    device_type = kwargs.get(\"device_type\", None)\n    use_cuda = device_type and (\"cuda\" in device_type.lower() or \"gpu\" == device_type.lower())\n    if use_cuda:\n        if not 
torch.cuda.is_available():\n            raise ValueError(\n                \"You requested CUDA benchmarking, but torch is unable to locate a CUDA device.\"\n            )\n\n        # Must use multiinterpreter for CUDA.\n        if \"multiinterpreter\" in kwargs and not kwargs[\"multiinterpreter\"]:\n            log.warning(\n                (\n                    \"You set multiinterpreter to False, but it is required for safe CUDA benchmarking.\\n\"\n                    \"Your preference has been overridden so that benchmarking may continue.\"\n                )\n            )\n        kwargs[\"multiinterpreter\"] = True\n\n        # If we received a non-string, use class-based load function\n        if not isinstance(model_filename, str):\n            # In GPU benchmarking, a model class is expected. This line is for clarity.\n            model_class = model_filename\n            if not isinstance(model_class, type):\n                raise TypeError(\"GPU benchmarking expects a model class to be provided instead of a filename.\")\n\n            # We must also know the name of the file to import from, so that serialization can succeed.\n            import inspect\n\n            try:\n                model_class_file = inspect.getfile(model_class)\n                kwargs[\"model_class_file\"] = model_class_file\n                kwargs[\"model_class_name\"] = model_class.__name__\n            except:\n                raise ValueError(\n                    (\n                        \"Your model class must be defined in a Python module so that it can be serialized properly.\\n\"\n                        \"Please add your model to a simple Python file along with any required imports.\"\n                    )\n                )\n\n            @functools.wraps(_class_load_fn)\n            def load_fn(*args, **kwargs):\n                return _class_load_fn(model_class, **kwargs)\n\n            # Now swap the class object for its name so the benchmarker still receives a string.\n            model_filename = model_class.__name__\n\n        # Wrap setup_fn so that it moves the model to CUDA device.\n        @functools.wraps(setup_fn)\n        def _setup_fn(id, config, model):\n            setup_fn(id, config, model)\n            model.to(\"cuda\")\n\n        kwargs[\"setup_fn\"] = _setup_fn\n\n        # Wrap preprocess_fn with one that moves inputs to CUDA.\n        @functools.wraps(preprocess_fn)\n        def _preprocess_fn(*inputs):\n            inputs = preprocess_fn(*inputs)\n            # Tensor.to() is not in-place, so collect the moved copies.\n            inputs = tuple(input.to(\"cuda\") for input in inputs)\n            return inputs\n\n        kwargs[\"preprocess_fn\"] = _preprocess_fn\n\n    # When custom datasets are used, a loader function will need to be available in subprocesses.\n    dataset_loader_fn = None\n    if dataset_inputs:\n        dataset_loader_fn = _get_dataset_loader_fn(inputs, loop_dataset)\n    kwargs[\"dataset_loader_fn\"] = dataset_loader_fn\n\n    with torch.no_grad():\n        return benchmarking.benchmark(\n            load_fn,\n            model_filename,\n            inputs,\n            *args,\n            **kwargs,\n        )\n"
  },
  {
    "path": "src/neuronperf/test/test_neuronperf.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport json\nimport os\nimport pathlib\nimport shutil\nimport time\n\nimport numpy as np\nimport pytest\n\nimport neuronperf\n\n\n@pytest.mark.sanity\ndef test_timer():\n    timer = neuronperf.Timer()\n    with timer:\n        time.sleep(1)\n\n    # sanity check\n    assert timer.total_duration(\"s\") > 0.5 and timer.total_duration(\"s\") < 1.5\n\n    # check conversions are functional\n    assert (\n        timer.total_duration(\"ns\")\n        > timer.total_duration(\"us\")\n        > timer.total_duration(\"ms\")\n        > timer.total_duration(\"s\")\n    )\n\n    # check timestamp deltas are close to total\n    assert timer.total_duration(\"s\") == pytest.approx(timer.durations(\"s\").sum())\n\n    # check iteration functions\n    for _ in range(10):\n        with timer:\n            time.sleep(0.01)\n    assert len(timer) > 10\n\n    # check that timer always returns pairs\n    timestamps = timer.timestamps()\n    for pair in timestamps:\n        assert 2 == len(pair)\n        assert pair[1] > pair[0]\n\n    # check that len is functional\n    assert len(timer) == len(timestamps)\n\n\n@pytest.mark.sanity\ndef test_timestamp_convert():\n    # test scalar behavior\n    assert 1000 == pytest.approx(neuronperf.timestamp_convert(1, \"s\", \"ms\"))\n    assert 1.5 == pytest.approx(neuronperf.timestamp_convert(1500, \"ms\", \"s\"))\n    assert 2.3e6 == pytest.approx(neuronperf.timestamp_convert(2.3, \"s\", \"us\"))\n\n    # test array behavior\n    times = np.array([1, 2, 3])\n    times_ms = neuronperf.timestamp_convert(times, \"s\", \"ms\")\n    assert 1000 == pytest.approx(times_ms[0])\n\n\n@pytest.mark.sanity\ndef test_model_index_create_from_file():\n    filename = \"dummy_model.ext\"\n    model_name = \"dummy\"\n    index = neuronperf.model_index.create(filename, model_name=model_name)\n    assert index[\"model_name\"] == model_name\n    assert len(index[\"model_configs\"]) == 1\n    assert index[\"model_configs\"][0][\"filename\"] == filename\n\n\n@pytest.mark.sanity\ndef test_model_index_create_delete_save_load():\n    filename = \"dummy_index.json\"\n    if os.path.exists(filename):\n        neuronperf.model_index.delete(filename)\n\n    model_name = \"Dummy\"\n    model_filename = os.path.join(\"models\", \"dummy.model\")\n    model_index = neuronperf.model_index.create(model_filename, model_name=model_name)\n    neuronperf.model_index.save(model_index, filename=filename)\n    assert os.path.exists(filename)\n\n    model_index_loaded = neuronperf.model_index.load(filename)\n    assert model_index_loaded == model_index\n    assert model_index_loaded[\"model_name\"] == model_name\n    assert model_index_loaded[\"model_configs\"][0][\"batch_size\"] == 1\n\n    neuronperf.model_index.delete(filename)\n    assert not os.path.exists(filename)\n\n\n@pytest.mark.sanity\ndef test_model_index_copy():\n    filename = \"dummy_index.json\"\n    if os.path.exists(filename):\n        neuronperf.model_index.delete(filename)\n\n    model_filename = os.path.join(\"models\", \"dummy.model\")\n    os.makedirs(\"models\", exist_ok=True)\n    pathlib.Path(model_filename).touch()\n    model_name = \"Dummy\"\n    model_index = neuronperf.model_index.create(model_filename, model_name=model_name)\n    neuronperf.model_index.save(model_index, filename=filename)\n\n    # Test copy API using a pre-loaded model inndex\n    neuronperf.model_index.copy(model_index, \"new_index.json\", \"new_models\")\n    assert os.path.exists(\"models\")\n    assert 
os.path.exists(model_filename)\n    assert os.path.exists(\"new_index.json\")\n    assert os.path.exists(os.path.join(\"new_models\", \"dummy.model\"))\n\n    new_index = neuronperf.model_index.load(\"new_index.json\")\n    assert new_index[\"model_configs\"][0][\"filename\"] == os.path.join(\"new_models\", \"dummy.model\")\n\n    neuronperf.model_index.delete(filename)\n    neuronperf.model_index.delete(\"new_index.json\")\n    shutil.rmtree(\"new_models\")\n    shutil.rmtree(\"models\")\n\n\n@pytest.mark.sanity\ndef test_model_index_copy_2():\n    filename = \"dummy_index.json\"\n    if os.path.exists(filename):\n        neuronperf.model_index.delete(filename)\n\n    model_filename = os.path.join(\"models\", \"dummy.model\")\n    os.makedirs(\"models\", exist_ok=True)\n    pathlib.Path(model_filename).touch()\n    model_name = \"Dummy\"\n    model_index = neuronperf.model_index.create(model_filename, model_name=model_name)\n    neuronperf.model_index.save(model_index, filename=filename)\n\n    # Test copy API using a file\n    neuronperf.model_index.copy(filename, \"new_index.json\", \"new_models\")\n    assert os.path.exists(\"models\")\n    assert os.path.exists(model_filename)\n    assert os.path.exists(\"new_index.json\")\n    assert os.path.exists(os.path.join(\"new_models\", \"dummy.model\"))\n\n    new_index = neuronperf.model_index.load(\"new_index.json\")\n    assert new_index[\"model_configs\"][0][\"filename\"] == os.path.join(\"new_models\", \"dummy.model\")\n\n    neuronperf.model_index.delete(filename)\n    neuronperf.model_index.delete(\"new_index.json\")\n    shutil.rmtree(\"new_models\")\n    shutil.rmtree(\"models\")\n\n\n@pytest.mark.sanity\ndef test_model_index_move():\n    filename = \"dummy_index.json\"\n    if os.path.exists(filename):\n        neuronperf.model_index.delete(filename)\n\n    model_filename = os.path.join(\"models\", \"dummy.model\")\n    os.makedirs(\"models\", exist_ok=True)\n    pathlib.Path(model_filename).touch()\n    model_name = \"Dummy\"\n    model_index = neuronperf.model_index.create(model_filename, model_name=model_name)\n    neuronperf.model_index.save(model_index, filename=filename)\n\n    neuronperf.model_index.move(filename, \"new_index.json\", \"new_models\")\n    assert not os.path.exists(filename)\n    assert not os.path.exists(model_filename)\n    assert os.path.exists(\"new_index.json\")\n    assert os.path.exists(os.path.join(\"new_models\", \"dummy.model\"))\n\n    new_index = neuronperf.model_index.load(\"new_index.json\")\n    assert new_index[\"model_configs\"][0][\"filename\"] == os.path.join(\"new_models\", \"dummy.model\")\n\n    neuronperf.model_index.delete(\"new_index.json\")\n    shutil.rmtree(\"new_models\")\n    shutil.rmtree(\"models\")\n\n\n@pytest.mark.sanity\ndef test_model_index_append():\n    model_indexes = [\n        neuronperf.model_index.create(f\"Dummy_{x}\", model_name=\"Dummy\") for x in range(10)\n    ]\n    combined_index = neuronperf.model_index.append(*model_indexes)\n    # Assert that combination apparently did happen.\n    assert len(combined_index[\"model_configs\"]) == len(model_indexes)\n    # Check that batch_sizes haven't been modified.\n    assert all(1 == config[\"batch_size\"] for config in combined_index[\"model_configs\"])\n\n    # Test for duplicate filtering behavior\n    model_indexes = [neuronperf.model_index.create(\"Dummy\") for _ in range(10)]\n    combined_index = neuronperf.model_index.append(*model_indexes)\n    assert len(combined_index[\"model_configs\"]) == 
1\n\n\n@pytest.mark.sanity\ndef test_model_index_filter():\n    idx_1 = neuronperf.model_index.create(\"fake\", performance_level=2, compile_s=1)\n    idx_2 = neuronperf.model_index.create(\"fake2\", compile_s=2)\n    idx = neuronperf.model_index.append(idx_1, idx_2)\n\n    filtered = neuronperf.model_index.filter(idx, filename=\"fake\")\n    print(filtered)\n    assert 1 == len(filtered[\"model_configs\"])\n    assert \"fake\" == filtered[\"model_name\"]\n\n    filtered = neuronperf.model_index.filter(idx, performance_level=2)\n    assert 1 == len(filtered[\"model_configs\"])\n    assert \"fake\" == filtered[\"model_name\"]\n\n    # None key should filter nothing\n    filtered = neuronperf.model_index.filter(idx, compile_s=None)\n    assert 2 == len(filtered[\"model_configs\"])\n\n\n@pytest.mark.sanity\n@pytest.mark.slow\ndef test_benchmarker():\n    dummy_model = lambda x: None\n    dummy_load = lambda path, device_id: dummy_model\n    b = neuronperf.benchmarking.Benchmarker(\n        id=0, device_id=0, load_fn=dummy_load, model_filename=\"test\", inputs=[], workers_per_model=2\n    )\n    b.start()\n    time.sleep(1.5)\n    b.stop()\n\n    assert b.status == \"finished\"\n    assert all(n_infs > 100 for n_infs in b.n_infs)\n\n\n@pytest.mark.slow\ndef test_benchmark_multithread():\n    benchmarker_results = neuronperf.cpu.benchmark(\n        neuronperf.DummyModel,\n        [np.array([1, 2, 3, 4])],\n        duration=2,\n        n_models=4,\n        multiprocess=False,\n        multiinterpreter=False,\n        verbosity=2,\n        return_timers=True,\n    )\n\n    # Return value is a list of tuples:\n    # [(config, results), (config, results), ...]\n    # Each config is a dict. Each result is a dict.\n\n    # A single configuration without workers_per_model set will produce 2 results\n    assert len(benchmarker_results) == 2\n\n    for benchmarker_result in benchmarker_results:\n        config, results = benchmarker_result\n        assert \"cpu_percents\" in results\n        assert \"mem_percents\" in results\n        assert not config[\"multiprocess\"]\n        assert not config[\"multiinterpreter\"]\n        assert results[\"status\"] == \"finished\"\n        assert results[\"n_infs\"] > 100\n\n\n@pytest.mark.slow\ndef test_benchmark_multithread_2():\n    dummy_model = lambda x: None\n    dummy_load = lambda path, device_id: dummy_model\n    reports = neuronperf.benchmark(\n        load_fn=dummy_load,\n        model_filename=\"dummy_filename\",\n        inputs=[[1]],\n        duration=2,\n        n_models=4,\n        multiprocess=False,\n        multiinterpreter=False,\n        verbosity=2,\n    )\n\n    # A single configuration without workers_per_model set will produce 2 results\n    assert len(reports) == 2\n    report = reports[0]\n    assert not report[\"multiprocess\"]\n    assert not report[\"multiinterpreter\"]\n    assert report[\"status\"] == \"finished\"\n    assert report[\"total_infs\"] > 100\n\n\n@pytest.mark.slow\ndef test_benchmark_multiprocess():\n    n_models = 16\n    benchmarker_results = neuronperf.cpu.benchmark(\n        neuronperf.DummyModel,\n        inputs=[np.array([1, 2])],\n        batch_sizes=[1],\n        duration=2,\n        n_models=n_models,\n        multiprocess=True,\n        multiinterpreter=False,\n        verbosity=2,\n        return_timers=True,\n    )\n\n    # A single configuration will produce a single result tuple\n    assert len(benchmarker_results) == 2\n    # Extract the benchmarker results\n    config, results = benchmarker_results[0]\n   
 # Confirm that there is at least 1 timer / model for each benchmarker\n    assert len(next(iter(results[\"timers\"].values()))) >= n_models\n    assert config[\"multiprocess\"]\n    assert not config[\"multiinterpreter\"]\n    assert results[\"status\"] == \"finished\"\n    assert results[\"n_infs\"] > 100\n\n\n@pytest.mark.slow\ndef test_benchmark_multiinterpreter():\n    benchmarker_results = neuronperf.cpu.benchmark(\n        neuronperf.DummyModel,\n        inputs=[np.array([1, 2])],\n        duration=2.5,\n        n_models=2,\n        multiprocess=False,\n        multiinterpreter=True,\n        verbosity=2,\n        return_timers=True,\n    )\n\n    # A single configuration without workers_per_model set will produce 2 results\n    assert len(benchmarker_results) == 2\n    # Extract the benchmarker results\n    config, results = benchmarker_results[0]\n    assert config[\"multiinterpreter\"]\n    assert results[\"status\"] == \"finished\"\n    assert results[\"n_infs\"] > 100\n\n\n@pytest.mark.slow\ndef test_reporting():\n    benchmarker_results = neuronperf.cpu.benchmark(\n        neuronperf.DummyModel,\n        inputs=[np.array([1, 2, 3, 4])],\n        n_models=[1, 4],\n        duration=2,\n        verbosity=2,\n        return_timers=True,\n    )\n\n    assert len(benchmarker_results) == 4\n    reports = neuronperf.get_reports(benchmarker_results)\n    assert len(reports) == len(benchmarker_results)\n    assert all(\"total_infs\" in report for report in reports)\n\n    neuronperf.print_reports(reports)\n    csv_file = neuronperf.write_csv(reports)\n    os.remove(csv_file)\n    json_file = neuronperf.write_json(reports)\n    with open(json_file, \"rt\") as fp:\n        json.load(fp)\n    os.remove(json_file)\n"
  },
  {
    "path": "static/google673a8c4fbaa024d8.html",
    "content": "google-site-verification: google673a8c4fbaa024d8.html"
  },
  {
    "path": "static/robots.txt",
    "content": "User-agent: *\n\nDisallow: /en/v2.24.0/\n\nDisallow: /en/v2.23.0/\n\nDisallow: /en/v2.22.1/\n\nDisallow: /en/v2.22.0/\n\nDisallow: /en/v2.21.1/\n\nDisallow: /en/v2.21.0/\n\nDisallow: /en/v2.20.2/\n\nDisallow: /en/v2.20.1/\n\nDisallow: /en/v2.20.0/\n\nDisallow: /en/v2.19.1/\n\nDisallow: /en/v2.19.0/\n\nDisallow: /en/v2.18.2/\n\nDisallow: /en/v2.18.1/\n\nDisallow: /en/v2.18.0/\n\nDisallow: /en/v2.17.0/\n\nDisallow: /en/v2.16.1/\n\nDisallow: /en/v2.16.0/\n\nDisallow: /en/v2.15.2/\n\nDisallow: /en/v2.15.1/\n\nDisallow: /en/v2.15.0/\n\nDisallow: /en/v2.14.1/\n\nDisallow: /en/v2.14.0/\n\nDisallow: /en/v2.13.2/\n\nDisallow: /en/v2.13.1/\n\nDisallow: /en/v2.13.0/\n\nDisallow: /en/v2.12.2/\n\nDisallow: /en/v2.12.1/\n\nDisallow: /en/v2.12.0/\n\nDisallow: /en/v2.11.0/\n\nDisallow: /en/v2.10.0/\n\nDisallow: /en/v2.9.0/\n\nDisallow: /en/v2.8.0/\n\nDisallow: /en/v2.7.0/\n\nDisallow: /en/v2.6.0/\n\nDisallow: /en/v2.5.0/\n\nDisallow: /en/v2.4.0/\n\nDisallow: /en/v2.3.0/\n\nDisallow: /en/v1.19.2/\n\nDisallow: /en/v1.19.1/\n\nDisallow: /en/v1.19.0/\n\nDisallow: /en/v1.18.0/\n\nDisallow: /en/v1.17.2/\n\nDisallow: /en/v1.17.1/\n\nDisallow: /en/v1.17.0/\n\nDisallow: /en/v1.16.3/\n\nDisallow: /en/v1.16.2/\n\nDisallow: /en/v1.16.1/\n\nDisallow: /en/v1.16.0/\n\nDisallow: /en/v1.15.2/\n\nDisallow: /en/1.15.1/\n\nDisallow: /en/1.15.0/\n\nDisallow: /en/1.14.2/\n\nDisallow: /en/1.14.1/\n\nDisallow: /en/1.14.0/\n\nDisallow: /en/1.13.0/\n\nDisallow: /en/1.12.2/\n\nDisallow: /en/1.12.1/\n\nDisallow: /en/1.12.0/\n\nDisallow: /en/1.11.0/\n\n\nSitemap: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/sitemap1.xml\n"
  },
  {
    "path": "static/sitemap1.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/index.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/misc-customops.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/third-party-solutions.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/nki_faq.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron-ubuntu20.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/mxnet-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax-neuronx.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-troubleshooting.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/multiframework-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-rocky-linux-9.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/troubleshooting.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ecs-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/third-party-solutions.html</loc>\n    
<lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/dlc-then-customize-devflow.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/eks-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/aws-batch-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/sagemaker-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/parallelcluster-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ec2-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/releasecontent.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/2.29.0.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-developer-guide.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-configurable-parameters.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/configuration-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/rn.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/faq.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-ecs-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-customize-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neo-then-hosting-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-k8s-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorial-docker-runtime1.0.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n  
  <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-ec2-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/faq-troubleshooting-releasenote.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neuron-dra.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/ec2.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-eks-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/container-deployment-flows.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/ec2-then-ec2-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/k8.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/troubleshooting.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/developerflows.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/locate-neuron-dlc-image.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/faq.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neuron-plugins.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/container-sm-hosting-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/getting-started.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/beta-participation.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/amazonq-getstarted.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/troubleshooting.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/profiling-tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/monitoring-tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/sdk-policy.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/what-is-neuron.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/security.html</loc>\n    <lastmod>2026-02-13</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/tensorflow_serving_tutorial.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/how-to-convolution-in-unet.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/developer-guide.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/faq.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/api-reference-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/command-line-reference.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/developer-guide.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/faq.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF005.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF011.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESPP047.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF010.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF004.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF006.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF007.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF013.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF017.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EBVF030.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF016.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EHCA005.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EARG001.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF015.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF001.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EBIR023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EUOC002.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EXTP004.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EOOM001.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESPP004.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EOOM002.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF018.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF024.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF031.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF019.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF022.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF009.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EXSP001.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESFH002.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/api-reference-guide/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/calculator/neuron-calculator.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/neuron2-intro-faq.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/contributing-faq.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/onnx-faq.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/index.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/inference-inf1-samples.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/training-trn1-samples.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/inference-inf2-trn1-samples.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/index.html</loc>\n    <lastmod>2025-11-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/glossary.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/oss/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/mxnet-neuron.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/torch-neuron-tab-training.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/tensorflow-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/docs-quicklinks.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/user-guide-quickstart.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/github-samples.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/tab-inference-tensorflow-neuron.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/inference-quickstart.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/training-quickstart.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/news-and-blogs/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn1-arch.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inferentia2.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium3.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inf1-arch.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium2.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inferentia.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn2-arch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn3-arch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v4.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v1.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  
</url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inf2-arch.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v2.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v3.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/rounding-modes.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuron-caching.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/custom-c++-operators.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuroncore-pipeline.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuroncore-batching.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/logical-neuroncore-config.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/inf2/inf2-performance.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/trn1/trn1-training-performance.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/trn1/trn1-inference-performance.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/inf1/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-transition-pytorch-trainium.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-multiframework-dlamis-inf1.html</loc>\n    <lastmod>2025-10-17</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-nxdt-nxd-core-training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-7-2-8.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow-2-8-9.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-1-3.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-python38-no-longer-support.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/github-changes.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/sm-training-trn1-introduce.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-nemo-megatron.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nemo.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-beta-pytorch-neuroncore-placement-apis.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pt2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nxdi-changes.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-megatron-lm.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-device-version.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-torch-neuronx-nki-jit.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-xla-bf16.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-llama3-2-checkpoint.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-jax-neuronx-nki-call.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-ubuntu-20-base.html</loc>\n    
<lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-tensorflow-tutorial-inf.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-pytorch-2-6.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-nxdi-nxd-core-inference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-neurondevice.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eol-nemo-arg.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-tnx.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-python38.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-113.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-nki-library-kernel-migration.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-ubuntu-18.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-v230.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-9.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-al2.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-u20-dlamis.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-inf1-virtual-environments.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-mxnet.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-nki-library-namespace-changes.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-profiling-api.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron2-intro.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-nxd-examples.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-dlami-ubuntu-22-04.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/gpg-expiration.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-tensorflow-inf2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-torch-neuron-versions.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorboard-tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-component-change.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nxd-examples.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow1-x.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-package-change.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-mllama-checkpoint.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-nxdt-nxd-core.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-pt2-6.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8-v229.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-driver-support-inf1.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-neuron-2.10.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-neuronxcc-nki.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow-inf2.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pt-versions.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-torch-neuron.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-neuron-2.12.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tf-versions.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-1.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-2.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron250-packages-changes.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-neuron-det.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-vllm-v0.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorboard-plugin.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neurondevice.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-vllm-v0.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-nxd-path-trace-api.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-nki-jit-torch.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-tf.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-python-3-9-eol.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-transformer-flag.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-det.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nki-library-namespace-changes-2-28.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorflow2-10.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-moving-samples.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-opt.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-megatronlm-2-13.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neurondevice-version.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-block-dimension-nki.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-1.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-parallel-model-trace.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-probuf.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-containers-rtd.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-correction-neuron-driver-support-inf1.html</loc>\n 
   <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/end-of-support-pt2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron230-packages-changes.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-al2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nxdt-nxd-core-training.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-maintenance-tnx.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorflow1-x.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron-rtd-eol.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/release-neuron2.4.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-u20-dlc-dlami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-jax-neuronx-nki-call.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-python-3-7.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-bf16-vars.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nki-namespace-migration.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-dlami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-pt-version.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-pt-15.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-pt-before-1-8.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-tf-21-24.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-pt-1-5.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-mx-before-1-5.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-5.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announcements.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-ncgs-env_2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-7.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-7.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/torch-neuronx-graph-partitioner-app-note.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-6.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-8.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-9.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-x.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/migration-from-xla-downcast-bf16.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/mxnet-neuron/flex-eg.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuron-cc/mixed-precision.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/index.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/bucketing-app-note.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/torch-neuron-dataparallel-app-note.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/rcnn-app-note.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuron1x/introducing-libnrt.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/perf/neuron-cc/performance-tuning.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/perf/neuron-cc/parallel-ncgs.html</loc>\n    <lastmod>2025-10-20</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/training/neuron-training.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/inference/neuron-faq.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/inference/trouble-shooting-faq.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/inference-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/pytorch-native-overview.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/training-torch-neuronx.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-neuronx-known-issues.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/neuron-envvars.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/training-troubleshooting.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/misc-inference-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/additional-examples-training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/additional-examples-inference-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/misc-training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/about/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u24.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u24.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/note-setup-general.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-neuronx-install-cxx11.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u20-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/torch-neuronx-profiling-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/torch-neuronx-profiling-dev-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/analyze_for_training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/finetune_hftrainer.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/mlp.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorials-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-debug.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/autobucketing-dev-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/trace-vs-xla-lazytensor.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/pytorch-neuron-parallel-compile.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/torch-neuron-envvars.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-async-lazy-load.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-replace-weights.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-core-placement.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-data-parallel.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-analyze.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.8.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.7.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.9.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/.git/logs/refs/remotes/origin/VRF004.html</loc>\n    <lastmod>2026-01-27</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/how-to/how-to-ultraserver.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/files/index-dra.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/tutorial-oci-hook.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-device-plugin.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-monitor.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/build-run-neuron-container.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-multiple-scheduler.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  
<url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-scheduler.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-problem-detector-and-recovery.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/tutorial-docker-env-setup.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-default-scheduler.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-helm-chart.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-setup.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-prerequisite.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-scheduler-flow.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/get-started/quickstart-configure-deploy-dlc.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/get-started/quickstart-pytorch-inference-dlc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/training/Dockerfile-trainium-dlc.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/training/mlp.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/config-properties.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/torchserve-neuron.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/dockerd-libmode-entrypoint.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-tf-serving.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-libmode.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-inference-dlc.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-torch-neuron.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-app-rt-same.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-app-rt-diff.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/dockerd-entrypoint-app-rt-same.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-neuron-rtd.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/tutorial-training.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/k8s_mlp_train_demo.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/k8s_rn50_demo.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/tutorial-infer.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/core-dump.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/collectives.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/compute-comm-overlap.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/work-with-neff-files.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/direct-hbm-tensor-alloc.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/core-dump-deep-dive.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n 
   <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/intranode-collective-comm.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/device-memory.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/runtime-performance-tips.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/internode-collective-comm.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_async_sendrecv.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_status.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-best-practices.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_async.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/ndebug_stream.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_sys_trace.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/debug-stream-api.html</loc>\n    <lastmod>2026-02-05</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-examples.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/ndl.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-overview.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nec.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_driver_shared.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_experimental.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_ds.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_profile.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_driver_shared_tensor_batch_op.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_version.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.1.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/content.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.1.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.28.1.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.28.0.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/rn.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/mxnet-neuron.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorboard-neuron.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/libneuronxla.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/index.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/runtime.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-inference.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nki-lib.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/dlamis.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/dev-tools.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nki.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/containers.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-training.html</loc>\n    
<lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/compiler.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/jax.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/pytorch.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-core.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/nemo/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/nemo/neuronx-nemo.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/customcxxps/gpsimd-tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/customcxxps/gpsimd-customop-lib.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/prev/content.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/prev/rn.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/neuronrelease/previous-content.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron-v2.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuronx.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron-v2.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuronx/tensorflow-neuronx.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/index.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-xla.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-tensorflow.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-mxnet.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nx-jax.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/runtime.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/dlami.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-inference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nx-pytorch.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/tools.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/docs-and-samples.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/containers.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-training.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/compiler.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-core.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nx-jax.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/runtime.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/dlami.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nxd-inference.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nx-pytorch.html</loc>\n    <lastmod>2025-12-19</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nki.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/containers.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nxd-core.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/runtime.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/dlami.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nxd-inference.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nki-lib.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nx-pytorch.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/tools.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nki.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/containers.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/compiler.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/plugins/npd-ecs-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/dlc-then-ecs-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/aws-batch-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/sagemaker-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/ec2-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/setup/ecs-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/setup/eks-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-ecs-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/byoc-hosting-devflow-inf2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/neo-then-hosting-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-k8s-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-ec2-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/aws-batch-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-eks-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-then-ec2-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/sagemaker-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dev-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/byoc-hosting-devflow.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/parallelcluster-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/env-setup-text.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-then-ec2-devflow-inf2.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-flows.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/container-sm-hosting-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/sm-devflow/sm-training-devflow.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/batch/batch-training.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/ec2/ec2-training.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nemo-megatron/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/index.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/index.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/overview-index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/neuron-inference-overview.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/api-reference-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/overview.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/misc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer-guide.html</loc>\n    <lastmod>2025-11-11</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/context_parallelism_overview.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pp_developer_guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction_developer_guide.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide-training.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/neuronx_distributed_inference_developer_guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide-inference.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide-inference.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/model_builder_v2_api_reference.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide-training.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/lora_finetune_developer_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/app_notes.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/standard_mixed_precision.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/ptl_developer_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index-training.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/save_load_developer_guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index-inference.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/neuronx-distributed-misc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/setup/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/inference.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_tutorials.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_pp.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/inference_tutorials.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/finetune_llama3_8b_ptl_lora.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/config_overview.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/features.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/installation_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/known_issues.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-tp-appnote.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-cp-appnote.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-amr-appnote.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_70B_pretraining.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_DPO_ORPO.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_SFT.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_SFT_LORA.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/checkpoint_conversion.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_pretraining.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/index.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/migration_nnm_nxdt.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/cpu_mode_developer_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/optimizer_lr_scheduler_flow.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/new_model_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/new_dataloader_guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/migration_nemo_nxdt.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/misc/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/misc/nxdi-troubleshooting.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/app-notes/parallelism.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/app-notes/index.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/index.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-offline-serving.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-online-serving.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/disaggregated-inference-tutorial.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn3-gpt-oss-120b-tutorial.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/sd-inference-tutorial.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-fp8.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/disaggregated-inference-tutorial-1p1d.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/index.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/nxd-examples-migration-guide.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/model-reference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/weights-sharding-guide.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/performance-cli-params.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/moe-arch-deep-dive.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html</loc>\n    <lastmod>2026-02-26</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/llm-inference-benchmarking-guide.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/disaggregated-inference.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/custom-quantization.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/migrate-from-tnx-to-nxdi.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/how-to-use-fpem.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/accuracy-eval-with-datasets.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/writing-tests.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/api-guides/index.html</loc>\n    <lastmod>2025-11-12</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/api-guides/api-guide.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/llama3/llama_33_70b.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/qwen3/qwen3_moe_235b.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/al2-python.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/launch-trn1-dlami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/legacy-inf1/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/legacy-inf1/pytorch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/running-jupyter-notebook-as-script.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlc.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/manual.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-manual.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlc.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/manual.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlc.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/note-setup-libnrt-warning.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/launch-inf2-dlami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/dlami-enable-neuron-pytorch.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/trn1/dlami-notes.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/neuron-pip-install.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-cntr.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-dlami-aws-cli.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-dlami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-ami.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-general.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/neuron-pip-setup.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/compile_mode.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/develop_mode.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/tensorboard-plugin-neuron-pip-install.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/dlami-enable-neuron-mxnet.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-libnrt-warning.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/dlami-enable-neuron-pytorch.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/deploy_mode.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dge.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/index.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dynamic-loops.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-compiler.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/mxfp-matmul.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/use-neuron-profile.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_block_dimension_migration_guide.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dma-bandwidth-guide.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-beta2-migration-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-aps.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-hbm-crc-hashing.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-0-3-0-update-guide.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n 
 <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_perf_guide.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/how-to-scheduling-apis.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/index.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/nki_simulator.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/framework_custom_op.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.isa.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.collectives.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.api.shared.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.simulate.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.tile_size.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/quickstart-implement-run-kernel.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/nki-language-guide.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/setup-env.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/index.html</loc>\n    <lastmod>2026-04-08</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/tiling-overview.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/indexing-overview.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/data-representation-overview.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/lnc.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/nki-dma-overview.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/memory-hierarchy-overview.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.set_rng_seed.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.memset.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_n_gather.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_tanh.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_sigmoid.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.bn_stats.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.erf_dx.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ceil.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_hbm.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float32.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dge_mode.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rms_norm.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.arctan.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_engine.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.greater.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.mish.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.store.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.maximum.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.activation.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  
</url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sequential_range.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dropout.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_store.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_identity_matrix.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar_cumulative.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.trunc.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rng.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_transpose.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tan.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.affine_select.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.matmul_perf_mode.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.VirtualRegister.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.exp.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_compute.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.less_equal.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_matmul_mx.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_move.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_sigmoid_dx.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.broadcast_to.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.sendrecv.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.max.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_not.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.reduce_scatter.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.softplus.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.static_range.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_gather.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.subtract.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_transpose.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e5m2.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.load.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_constant.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_or.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.quantize_mx.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float4_e2m1fn_x4.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.softmax.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.program_id.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int8.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.greater_equal.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_version.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.invert.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint32.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.core_barrier.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sign.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_hbm.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.negative.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.affine_range.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nonzero_with_count.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tfloat32.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.zeros.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.oob_mode.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.square.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_on_chip.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ndarray.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.matmul.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ones.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3.html</loc>\n    
<lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.num_programs.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.rank_id.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.exponential.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.where.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_psum.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3fn.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.power.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_stream_shuffle.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.erf.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_alloc.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand_set_state.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rand.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.reciprocal.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.load_transpose2d.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int32.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sbuf.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sum.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.log.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.get_nc_version.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.equal.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.select_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_copy.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_load.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int16.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.engine.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.less.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3fn_x4.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.scalar_tensor_tensor.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bfloat16.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_copy_predicated.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.private_hbm.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.not_equal.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_copy.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_and.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_or.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.multiply.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint8.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_find_index8.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.max8.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.floor.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.right_shift.html</loc>\n    
<lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e5m2_x4.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.prod.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_to_all.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rsqrt.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bool_.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.mean.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.expand_dims.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.min.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.left_shift.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.add.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.hbm.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand_get_state.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.jit.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.relu.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.range_select.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.ReplicaGroup.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_match_replace8.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_partition_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_to_all_v.html</loc>\n    <lastmod>2026-04-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_xor.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint16.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_matmul.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_tensor_scan.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.program_ndim.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.local_gather.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tanh.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.bn_aggr.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sigmoid.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_and.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_tensor.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_sbuf.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.transpose.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.simulate.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ds.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.reduce_cmd.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.no_reorder.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.cos.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.copy.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  
</url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.activation_reduce.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.device_print.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.full.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.silu_dx.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit_current_processing_rank_id.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.psum.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tile_size.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.empty_like.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sin.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.silu.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.iota.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.sequence_bounds.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.minimum.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.var.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.abs.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.reciprocal.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gather_flattened.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_xor.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.random_seed.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand2.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.zeros_like.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sqrt.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_dx.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.dropout.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.dynamic_range.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.all.html</loc>\n    <lastmod>2026-04-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float16.html</loc>\n    <lastmod>2026-02-24</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/index.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium2_arch.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium3_arch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium_inferentia2_arch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/matrix_multiplication.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_multiple_nc_tensor_addition.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/fused_mamba.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/kernel-optimization.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/transpose2d.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/average_pool2d.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_tensor_addition.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/index.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tiled-range.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  
<url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tensor-view.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/allocator.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/stream-shuffle-broadcast.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/specs/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/specs/design-rmsnorm-quant.html</loc>\n    <lastmod>2026-02-17</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/about/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/transformer-tkg.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/index.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cross-entropy.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/conv1d.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/find-nonzero-indices.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-cte.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/dynamic-elementwise-add.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/blockwise-mm-backward.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/mlp.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fgcc.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-cte.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/depthwise-conv1d.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rmsnorm-quant.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/qkv.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-tkg.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-tkg.html</loc>\n    
<lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-block-tkg.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fg-allgather.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/topk-reduce.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/sb2sb-allgather.html</loc>\n    <lastmod>2026-04-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cumsum.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-cte.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-tkg.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/router-topk.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rope.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/api-reference-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/mxnet-neuron-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/misc-mxnet-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/neo-then-hosting-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/ec2-then-ec2-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/api-compilation-python-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/developer-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/troubleshooting-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/inference-mxnet-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_terminology.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_benchmark_guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_troubleshooting.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_overview.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_examples.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_model_index_guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_evaluate_guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_faq.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/rn.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_compile_guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_framework_notes.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_install.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/api-reference-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/index.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-tutorials.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-developer-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-api-reference.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-misc.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/developer-guide.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching.html</loc>\n    <lastmod>2025-10-28</lastmod>\n  </url>\n  
<url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-reference-guide-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/inference-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-torch-neuron-dataparallel-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/developer-guide-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/additional-examples-inference-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-compilation-python-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-default.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/troubleshooting-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-core-placement.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/misc-inference-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/index.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/tutorial-neuron-check-model.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/tutorial-neuron-gatherinfo.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/setup-legacy-inf1-tensorflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron-inference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx-inference.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorboard/getting-started-tensorboard-neuron-plugin.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training-gpt-neox.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/gpt3_neuronx_nemo_megatron_pretraining.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training_llama2_tp_pp_ptl.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/megatron_gpt_pretraining.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training-gpt-neox-20b.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/finetuning_llama2_7b_ptl.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/finetune_t5.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/multinode-training-model-profiling.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training_codegen25_7b.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/ssd300_demo/ssd300_demo.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-reference-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-tracing-python-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/additional-examples.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-ecs-devflow.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/neo-then-hosting-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-ec2-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-auto-replication-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-eks-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/ec2-then-ec2-devflow.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tf2_faq.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-tfn-analyze-model-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-compilation-python-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/api-reference-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-nlp.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.0-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.16.3-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.1-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.2-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.1-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.18.0-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.19.0-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.0-tensorflow-install.html</loc>\n    
<lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.14.2-tensorflow-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-cxx11.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-al2-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u20-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/guides/torch-lstm-support.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-torch-neuron-nlp.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/transformers-marianmt.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-utilizing-neuron-capabilities.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-torch-neuron-computervision.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/neuroncore_pipeline_pytorch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-libtorch.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/pytorch-tutorial-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-torchserve.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/guides/core-placement/torch-core-placement.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.19.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.17.2-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.4.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.2-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.1-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.18.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.3.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.1-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.5.0-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.2-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.3-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.14.2-pytorch-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/setup/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-update-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2-base-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-al2.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-u20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-u22.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20-base-dlami.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-al2023.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-update.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-utilizing-neuron-capabilities.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-computervision.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorial-model-serving.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-nlp.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/mxnet-tutorial-setup.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.17.2-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.2-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.14.2-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.19.0-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.16.3-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.18.0-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.0-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.1-mxnet-install.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/index.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/nccom-test.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-sysfs-user-guide.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-ls.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tensorboard/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tensorboard/getting-started-tensorboard-neuronx-plugin.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/torch-neuronx-profiling-with-tb.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/tutorial-neuron-monitor-mnist.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/tutorial-tensorboard-scalars-mnist.html</loc>\n    <lastmod>2025-10-09</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/performance-profiling-vllm.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/profiler/neuron-profile-user-guide.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/profiler/neuron-profiler-2-0-beta-user-guide.html</loc>\n    <lastmod>2025-12-01</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/index.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-system-profiles.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-hierarchy-view.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-database-viewer.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-link-view-source-code.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/migration-faq.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-device-profiles.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/get-started.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-profile-workload.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-summary-page.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-memory-viewer.html</loc>\n    <lastmod>2026-04-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/view-perfetto.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-tensor-viewer.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    
<loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-ai-recommendations.html</loc>\n    <lastmod>2026-02-25</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/api-reference-guide/api-reference-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/api-reference-guide/custom-ops-ref-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/custom-c++-operators-devguide.html</loc>\n    <lastmod>2026-02-03</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/programming-guide.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/customop-mlp-perf-opt.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/tutorials.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n  <url>\n    <loc>https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/customop-mlp-training.html</loc>\n    <lastmod>2025-10-07</lastmod>\n  </url>\n</urlset>"
  },
  {
    "path": "tools/index.rst",
    "content": ".. _neuron-tools:\n\n.. meta::\n   :description: Developer tools for profiling, monitoring, and analyzing machine learning workloads on AWS Neuron devices.\n   :keywords: AWS Neuron, developer tools, profiler, monitoring, analysis, TensorBoard, visualization, debugging, optimization\n   :date-modified: 12/02/2025\n\nDeveloper Tools\n================\n\nAWS Neuron provides a comprehensive suite of developer tools for optimizing, monitoring, and debugging machine learning workloads on AWS Inferentia and Trainium accelerators. These tools enable developers to gain deep insights into model performance, system utilization, and hardware behavior to maximize the efficiency of ML applications running on Neuron-enabled instances.\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Neuron Explorer\n      :link: /tools/neuron-explorer/index\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n        \n      Neuron Explorer is a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium, from model development through debugging, profiling, analysis, and optimization.\n\n   .. grid-item-card:: Neuron Profiler 2.0\n      :link: /tools/profiler/neuron-profiler-2-0-beta-user-guide\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n        \n      Neuron Profiler 2.0 offers a user-friendly experience for capturing and analyzing application performance through both high-level system profiles and detailed device-level profiles.\n\n   .. grid-item-card:: Neuron Profiler\n      :link: /tools/profiler/neuron-profile-user-guide\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n        \n      The Neuron Profiler is a tool to profile and analyze performance of a ML model compiled with the Neuron compiler and run on NeuronDevices.\n\n   .. grid-item-card:: System Tools\n      :link: /tools/neuron-sys-tools/index\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n        \n      Command-line utilities for monitoring, debugging, and managing AWS Neuron devices, including neuron-monitor, neuron-top, neuron-ls, and more.\n\n   .. grid-item-card:: Third Party Tools\n      :link: /tools/third-party-solutions\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n        \n      Third-party tools and integrations that support the AWS Neuron development experience, including monitoring, visualization, and optimization solutions.\n\n..\n   .. grid-item-card:: AP Visualizer\n      :link: ap-visualizer/ap-visualizer.html\n      :link-type: url\n      :class-header: sd-bg-primary sd-text-white\n        \n      Visualize access patterns of tensors on Neuron devices.\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Tutorials\n      :link: /tools/tutorials/index\n      :link-type: doc\n      :class-header: sd-bg-secondary sd-text-white\n\n      Tutorials for how to utilize all Neuron Tools.\n\n   .. grid-item-card:: Release Notes\n      :link: /release-notes/components/dev-tools\n      :link-type: doc\n      :class-header: sd-bg-secondary sd-text-white\n\n      Latest updates, new features, and improvements to Neuron Tools and Neuron Explorer.\n\n.. 
toctree::\n   :maxdepth: 1\n   :hidden:\n\n   Neuron Profiler 2.0 </tools/profiler/neuron-profiler-2-0-beta-user-guide>\n   Neuron Profiler </tools/profiler/neuron-profile-user-guide>\n   System Tools </tools/neuron-sys-tools/index>\n   Third-party Tools </tools/third-party-solutions>\n   Tutorials </tools/tutorials/index>\n   Release Notes </release-notes/components/dev-tools>\n"
  },
  {
    "path": "tools/neuron-explorer/get-started.rst",
    "content": ".. meta::\n   :description: Setup and get started guide for new Neuron SDK profiler\n   :date_updated: 12/02/2025\n\n.. _new-neuron-profiler-setup:\n\nGet Started with Neuron Explorer\n========================================\n\nIn this guide, you'll learn how to set up and launch Neuron Explorer, including the web-based UI for interactive analysis. By the end of this guide, you'll be able to visualize and analyze performance data for your models directly in your browser.\n\nOverview\n---------\n\nIn this guide, you'll launch an AWS Trainium or Inferentia EC2 instance using the AWS Deep Learning AMI (DLAMI) for Neuron, install and verify Neuron Explorer, start both the API and UI servers, and set up secure SSH tunneling to view the Neuron Explorer interface in your local browser.\n\nUse this tool when you want to collect, inspect, and visualize Neuron profiling data from model training or inference jobs running on Neuron-compatible instances. At a high level, you will:\n\n1. Launch a Neuron DLAMI instance\n2. Verify Neuron Explorer installation\n3. Start the Neuron Explorer servers\n4. Configure SSH tunneling\n5. Access the Neuron Explorer UI locally\n\n\nPrerequisites\n--------------\n\n* An AWS account with permissions to launch EC2 instances.\n* Access to an AWS Trainium or Inferentia instance type (such as trn1.2xlarge, inf2.xlarge).\n* AWS Neuron DLAMI with the latest Neuron SDK preinstalled.\n* SSH key pair (``.pem`` file) to securely connect to your EC2 instance.\n* Local machine with SSH client and web browser installed.\n\n\nBefore you begin\n-----------------\n\nComplete these steps before starting the task in this document:\n\n1. Make sure you have an active AWS account and `a default VPC available in your region <https://console.aws.amazon.com/vpc/>`_. \n2. Create or locate your SSH key pair (``.pem`` file) that allows access to your EC2 instance.\n\nInstructions\n-------------\n\n1. Launch a Neuron-compatible EC2 instance\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nLaunch an EC2 instance with either a Trainium or Inferentia instance type using the AWS Neuron DLAMI.\nYou can do this from the AWS Management Console or CLI. For more instructions on how to launch an instance with Neuron DLAMI, refer to the instructions here.\n\n**Expected outcome**\n\nYour instance should start and appear in the EC2 dashboard as \"Running.\"\n\n\n2. Verify that Neuron Explorer is installed\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOnce you've connected to your EC2 instance with SSH, verify that Neuron Explorer and the associated tools are installed:\n\n.. code-block:: bash\n\n   apt list --installed | grep neuronx-tools\n\n**Expected outcome**\n\nYou should see neuronx-tools listed among the installed packages, confirming that Neuron Explorer is available on your instance.\n\n\n3. Launch the API and UI SPA servers\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nStart the Neuron Explorer web servers using the following command:\n\n.. code-block:: bash\n\n   neuron-explorer view -v 2 --data-path ./parquet_files\n\n\nThis command starts:\n\n* The UI SPA (Single Page Application) server (default port: 3001)\n* The API server (default port: 3002)\n\n\n**Expected outcome**\n\nYou'll see terminal logs confirming that both the UI and API servers are running.\n\n\n4. Set up SSH tunneling\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nBy default, Neuron Explorer runs locally on the EC2 instance. 
To securely access it from your local computer, you must create SSH tunnels for ports 3001 and 3002.\n\nRun the following command from your local machine terminal (replace placeholders such as ``your-key`` and ``public_ip_address_of_your_instance``):\n\n.. code-block:: bash\n\n   ssh -i ~/your-key.pem -L 3001:localhost:3001 -L 3002:localhost:3002 ubuntu@[public_ip_address_of_your_instance] -fN\n\n**Explanation:**\n\n* ``-L 3001:localhost:3001`` forwards the UI server.\n* ``-L 3002:localhost:3002`` forwards the API server.\n* ``-fN`` keeps the tunnel open in the background.\n\n\n**Expected outcome**\n\nNo error messages should appear, indicating that your SSH tunnels are active.\n\n.. note::\n   Replace ``ubuntu`` with the appropriate username for your AMI (for example, ``ec2-user`` on Amazon Linux).\n\n5. Connect to the Neuron Explorer UI\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nOnce your tunnel is active, open your preferred web browser and navigate to:\n\n.. code-block:: text\n\n   http://localhost:3001\n\n\n**Expected outcome**\n\nThe Neuron Explorer UI loads in your browser, displaying an interactive dashboard for exploring profiling data.\n\nConfirm your work\n------------------\n\nYou've successfully set up Neuron Explorer! To confirm everything is working:\n\n1. The browser should display the Neuron Explorer interface.\n2. The terminal running the profiler command should show log activity when interacting with the UI.\n3. You can explore profiling sessions from your ``./parquet_files`` directory.\n\nIf all these checks pass, you are ready to begin analyzing performance data using Neuron Explorer.\n\nCommon issues\n---------------\n\nIf you encounter an error or other issue while working through this task, here are some commonly encountered issues and how to address them:\n\n* **Neuron Explorer UI doesn't load**: Check that your SSH tunnel is configured correctly. Make sure ports 3001 and 3002 are forwarded using the ``-L`` flags in your SSH command, and verify the EC2 instance is running.\n* **No profiling data displayed**: Double-check that the directory passed to ``--data-path`` contains valid .parquet profiling files generated by a prior Neuron profiling run.\n* **neuron-explorer command not found**: Ensure that the Neuron SDK tools are installed. Make sure you launched your instance with a Neuron DLAMI, or that you set up your instance following the Neuron SDK setup instructions.\n* **Connection refused on port 3001 or 3002**: Confirm that your EC2 security group allows outbound traffic and that the SSH tunnel was created from your local machine, not from inside the instance (see the quick check below).\n
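\nAs a quick check when troubleshooting connectivity, you can request the UI server through the tunnel from your local machine. This is a minimal sketch that assumes the default port 3001 used above:\n\n.. code-block:: bash\n\n   # Run on your local machine after starting the SSH tunnel.\n   # A 200 status code indicates the UI server is reachable through the tunnel.\n   curl -s -o /dev/null -w '%{http_code}' http://localhost:3001\n"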
  },
  {
    "path": "tools/neuron-explorer/how-to-link-view-source-code.rst",
    "content": ".. meta::\n    :description: Learn how to use source code linking in Neuron Explorer to understand code performance and optimize your applications\n    :date-modified: 11/21/2025\n\n.. _neuron-explorer-source-code:\n\nSource Code Viewer\n====================\n\nIn this guide, you'll learn how to use Neuron Explorer's source code linking feature to visualize connections between your application code and device performance. Discover how to navigate between source code and device instructions, highlight performance-critical sections, view framework stack traces, and leverage interactive code decorations to optimize your AWS Neuron applications for maximum efficiency.\n\nOverview\n--------\n\nSource code linking helps you understand how your code changes affect device performance and identify ways to optimize it. This feature creates interactive connections between source code files and other Neuron Explorer widgets. You can zoom to device instructions from selected code lines, navigate between instructions and source code, and highlight instructions for specific loop iterations. You can use source code linking in both the VS Code extension and standalone web application. This gives you flexibility for different developer workflows.\n\nThe Framework Stack Trace feature shows up in the Event Details when an instruction on the device profile is clicked. This feature is used to map the device instructions back to framework level code in JAX or PyTorch to better understand what part of the application code resulted in a particular device instruction.\n\n.. image:: /tools/profiler/images/view-link-1.gif\n\nInstructions\n-------------\n\nTo enable the addition of the \"NKI Source Location\" field to a profile enable set this environment variable: ``NEURON_FRAMEWORK_DEBUG=1``\n\nTo enable tracking of the stack trace information, you set these environment variables before compiling your NEFF:\n\n.. code-block:: bash\n\n    export XLA_IR_DEBUG=1\n    export XLA_HLO_DEBUG=1\n\nOnce you have the NEFF, you can simply capture the profile as usual. To view your source code while viewing the profile, use the ``--framework-source-root`` flag to pass the path to framework source files. This is optional and is only needed if you want to view your code alongside the displayedprofile.\n\n.. code-block:: bash\n\n    neuron-explorer view -n file.neff -s profile.ntff --framework-source-root /path/to/framework/source/files\n\nCode Viewer Widget\n-------------------\n\nHighlighting Instructions\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSelect source code lines to highlight their corresponding instructions in the profiler view. You can select individual lines or multiple lines through block selection or multiple cursors.\n\n.. image:: /tools/profiler/images/view-link-2.png\n\nNavigating to Source Code\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n(Ctrl/Cmd)+Click any instruction to jump to it's location in source code. If there are multiple matches, you will be prompted to select which file to navigate to.\n\n.. image:: /tools/profiler/images/view-link-3.png\n\nSource Code Decorations\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nPerformance metrics appear as decorations directly in your source code, updating automatically with the instruction profiler's time range. \n\nConfigure which metrics to display and in the settings panel. Currently only instruction count and PE element count are supported.\n\n.. 
image:: /tools/profiler/images/view-link-4.png\n\nNavigating to Instructions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSelect lines in your source code and navigate to their corresponding instructions using Ctrl+Shift+G, the context menu, or the \"Zoom into Instructions\" command from the command palette. \n\nThe Device Trace Viewer will then zoom to show all instructions associated with your selection.\n\n.. image:: /tools/profiler/images/view-link-5.png\n\nDependency Annotations\n~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen enabled, selecting an instruction will highlight its dependent source code lines. The selected instruction's line will be highlighted in one color, with its dependencies shown in a different color.\n\n.. image:: /tools/profiler/images/view-link-6.png\n"
  },
  {
    "path": "tools/neuron-explorer/how-to-profile-workload.rst",
    "content": ".. meta::\n    :description: Learn how to capture a profile, launch the Neuron Explorer UI, and use the Profile Manager to analyze your workload performance.\n    :date-modified: 12/02/2025\n\nCapture and View Profiles in Neuron Explorer\n================================================\n\nCapturing Profiles\n------------------\nIn this guide, you'll learn how to capture a profile, launch the Neuron Explorer, use the Profile Manager, and view Neuron Explorer in your IDE.\n\nTo get a better understanding of your workload's performance, you must collect the raw device traces and runtime metadata in the form of an NTFF (Neuron Trace File Format) which you can then correlate with the compiled NEFF (Neuron Executable File Format) to derive insights.\n\nSet the following environment variables before compiling to capture more descriptive layer names and stack frame information.\n\n.. code-block:: bash\n\n   export XLA_IR_DEBUG=1\n   export XLA_HLO_DEBUG=1\n\nFor NKI developers, set ``NEURON_FRAMEWORK_DEBUG`` in addition to the two above to enable kernel source code tracking:\n\n.. code-block:: bash\n\n   export NEURON_FRAMEWORK_DEBUG=1\n\nIf profiling was successful, you will see NEFF (``.neff``) and NTFF (``.ntff``) artifacts in the specified output directory similar to the following:\n\n.. code-block:: bash\n\n   output\n   └── i-0ade06f040a13f2bf_pid_210229\n       ├── 395760075800974_instid_0_vnc_0.ntff\n       └── neff_395760075800974.neff\n\nDevice profiles for the first execution of each NEFF per NeuronCore are captured, and NEFF/NTFF pairs with the same prefix (for PyTorch) or unique hash (for JAX or CLI) must be uploaded together. See the section on :ref:`uploading profiles <profile-manager-upload-profile>` for more details.\n\nJAX Profiling API\n~~~~~~~~~~~~~~~~~\n\nWhen using the JAX context-managed profiling API, set two extra environment variables to signal the profile plugin to begin capturing device profile data when the profiling API is invoked.\n\n.. code-block:: python\n\n   os.environ[\"NEURON_RT_INSPECT_DEVICE_PROFILE\"] = \"1\"\n   os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"] = \"./output\"\n\nThen, profile a block of code:\n\n.. code-block:: python\n\n   with jax.profiler.trace(os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"]):\n\nFull code example:\n\n.. 
code-block:: python\n\n   from functools import partial\n   import os\n   import jax\n   import jax.numpy as jnp\n\n   from jax.sharding import Mesh, NamedSharding, PartitionSpec as P\n   from jax.experimental.shard_map import shard_map\n   from time import sleep\n\n   os.environ[\"NEURON_RT_INSPECT_DEVICE_PROFILE\"] = \"1\"\n   os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"] = \"./output\"\n\n   jax.config.update(\"jax_default_prng_impl\", \"rbg\")\n\n   mesh = Mesh(jax.devices(), ('i',))\n\n   def device_put(x, pspec):\n     return jax.device_put(x, NamedSharding(mesh, pspec))\n\n   lhs_spec = P('i', None)\n   lhs = device_put(jax.random.normal(jax.random.key(0), (128, 128)), lhs_spec)\n\n   rhs_spec = P('i', None)\n   rhs = device_put(jax.random.normal(jax.random.key(1), (128, 16)), rhs_spec)\n\n\n   @jax.jit\n   @partial(shard_map, mesh=mesh, in_specs=(lhs_spec, rhs_spec),\n            out_specs=rhs_spec)\n   def matmul_allgather(lhs_block, rhs_block):\n     rhs = jax.lax.all_gather(rhs_block, 'i', tiled=True)\n     return lhs_block @ rhs\n\n   with jax.profiler.trace(os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"]):\n     out = matmul_allgather(lhs, rhs)\n     for i in range(10):\n         with jax.profiler.TraceAnnotation(\"my_label\"+str(i)):\n             out = matmul_allgather(lhs, rhs)\n         sleep(0.001)\n\n\n   expected = lhs @ rhs\n   with jax.default_device(jax.devices('cpu')[0]):\n     equal = jnp.allclose(jax.device_get(out), jax.device_get(expected), atol=1e-3, rtol=1e-3)\n     print(\"Tensors are the same\") if equal else print(\"Tensors are different\")\n\n\n.. _neuron-explorer-capture-environment-variables:\n.. _neuron-explorer-non-framework-user-experience:\n\nEnvironment Variables\n~~~~~~~~~~~~~~~~~~~~~\n\nYou can also control profiling with environment variables. This is useful when you can’t easily change your \napplication code, such as when running an executable which calls the Neuron Runtime or in a containerized \nenvironment where the application code is built into the container image.\n\n.. _neuron-explorer-core-control-variables:\n\nCore Control Variables\n^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Description\n     - Default behavior\n   * - ``NEURON_RT_INSPECT_ENABLE``\n     - Set to ``1`` to enable profiling\n     - Enables system profiling and disables device profiling. To control which profile types are captured, see :ref:`Profile type selection <neuron-explorer-profile-type-selection>`\n   * - ``NEURON_RT_INSPECT_OUTPUT_DIR``\n     - Directory for profile data output\n     - Default directory for captured profile data is ``./output``\n\n.. _neuron-explorer-profile-type-selection:\n\nDevice or System Profile Type Selection\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. note:: \n    \n    When ``NEURON_RT_INSPECT_ENABLE`` is set to ``1``, ``NEURON_RT_INSPECT_SYSTEM_PROFILE`` is enabled by default (set to ``1``) and ``NEURON_RT_INSPECT_DEVICE_PROFILE`` is disabled by default (set to ``0``).\n\nWhen ``NEURON_RT_INSPECT_ENABLE`` = 1, two different profile types are available:\n\n.. 
list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Profile type\n     - Description\n     - Enable capture\n     - Disable capture\n   * - ``NEURON_RT_INSPECT_SYSTEM_PROFILE``\n     - System-level\n     - Captures runtime system events and operations\n     - Set to ``1``\n     - Set to ``0``\n   * - ``NEURON_RT_INSPECT_DEVICE_PROFILE``\n     - Device-level\n     - Captures detailed NeuronCore hardware metrics\n     - Set to ``1``\n     - Set to ``0``\n\n.. note::\n\n    These variables have no effect if ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``.\n\n.. _neuron-explorer-advanced-config-vars:\n  \nAdvanced configuration for System Profiles\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Profile type\n     - Description\n     - Default behavior\n   * - ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``\n     - System-level\n     - Maximum trace events per NeuronCore before oldest events are overwritten\n     - 1,000,000\n\n.. note:: \n    \n    Increasing the event limit will consume more host memory.\n\nCapture using nccom-test with Environment Variables\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nProfiling can be enabled using environment variables. For simplicity, a quick way to generate a Neuron workload is to use :ref:`nccom-test <nccom-test>`. nccom-test is a benchmarking tool that is already available with the Neuron AMI.\n\n.. code-block:: shell\n\n    export NEURON_RT_INSPECT_ENABLE=1\n    export NEURON_RT_INSPECT_OUTPUT_DIR=./output\n    nccom-test allr allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\n\n.. note::\n    If you have problems with nccom-test, add the ``--debug`` flag.\n    If using a trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.\n\nTo understand the profiling output, see this section: :ref:`Inspect Output <neuron-explorer-inspect-output>`\n\nCapture with EKS\n^^^^^^^^^^^^^^^^\n\nCapturing a profile on EKS is most easily done by setting environment variables, as described in the section \n:ref:`Non-framework specific User Experience <neuron-explorer-non-framework-user-experience>`. By using environment \nvariables, users do not need to change application code in their container image or modify their run commands. \n\nUpdate the deployment yaml to include the ``NEURON_RT_INSPECT_ENABLE`` and ``NEURON_RT_INSPECT_OUTPUT_DIR`` \nenvironment variables. For distributed workloads, it’s important that ``NEURON_RT_INSPECT_OUTPUT_DIR`` points to a \ndirectory on a shared volume which all workers have access to.\n\n.. code-block:: yaml\n\n    apiVersion: v1\n    kind: Pod\n    metadata:\n      name: trn1-mlp\n    spec:\n      restartPolicy: Never\n      schedulerName: default-scheduler\n      nodeSelector:\n        beta.kubernetes.io/instance-type: trn1.32xlarge\n      containers:\n        - name: trn1-mlp\n          env:\n            - name: NEURON_RT_INSPECT_ENABLE\n              value: \"1\"\n            - name: NEURON_RT_INSPECT_OUTPUT_DIR\n              value: \"/shared/output\"\n          command: ['torchrun']\n          args:\n            - '--nnodes=1'\n            - '--nproc_per_node=32'\n            - 'train_torchrun.py'\n          image: ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:mlp\n          imagePullPolicy: IfNotPresent\n          resources:\n            limits:\n              aws.amazon.com/neuron: 16\n
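\n\nFor distributed or containerized runs, it can be useful to confirm that profile artifacts were written to the shared volume before you download and process them. A minimal check, assuming ``kubectl`` access to the cluster and the pod name (``trn1-mlp``) and shared output path from the example above, is:\n\n.. code-block:: shell\n\n    # List the captured profile artifacts on the pod's shared output volume\n    kubectl exec trn1-mlp -- ls /shared/output\n\n.. 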
note::\n\n    EKS users running PyTorch and JAX applications are still free to change their application code \n    and use the PyTorch or JAX Python profiling APIs if they want finer-grained control over profiling. \n    However, using the environment variables conveniently allows profiling without modifying the \n    container image or application code.\n\n\nCLI\n~~~\n\nIn certain cases, you may want to profile the application without requiring code modifications, such as when deploying a containerized application through EKS. Note that when capturing with the CLI, profiling will be enabled for the entire lifetime of the application. If more granular control is required for profiling specific sections of the model, it is recommended to use the PyTorch or JAX APIs.\n\nTo enable profiling without code change, run your workload with the following environment variables set:\n\n.. code-block:: bash\n\n   export NEURON_RT_INSPECT_ENABLE=1\n   export NEURON_RT_INSPECT_DEVICE_PROFILE=1\n   export NEURON_RT_INSPECT_OUTPUT_DIR=./output\n   python train.py\n\nCLI reference for System Profiles\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIn addition to controlling profiling with environment variables, you can use the ``neuron-explorer inspect`` command line interface \nfor profiling applications. This provides the same functionality as environment variables but helps you avoid typos and invalid arguments, \nand provides a useful ``--help`` option to explain the available options.\n\n.. code-block:: shell\n\n   Usage:\n   neuron-explorer [OPTIONS] inspect [inspect-OPTIONS] [userscript...]\n\n   Application Options:\n   -v, --version               Show version and exit\n\n   Help Options:\n   -h, --help                  Show this help message\n\n   [inspect command options]\n         -o, --output-dir=       Output directory for the inspection results (default: .)\n         -n, --num-trace-events= Maximum number of trace events to capture when profiling. Once hitting this limit, old events are dropped\n\n   [inspect command arguments]\n   userscript:                 Run command/script that launches a Neuron workload. E.g. 'python app.py' or './runscript.sh'\n\nExample of using System Profiles CLI\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can provide any script that generates a Neuron workload (such as a PyTorch training script) to the System Profiles CLI. \nFor simplicity, a quick way to generate a Neuron workload \nis to use ``nccom-test``. ``nccom-test`` is a benchmarking tool that is already available with the Neuron AMI and the ``aws-neuronx-tools`` package.\n\n.. code-block:: shell\n\n    ubuntu@ip-172-31-63-210:~$ neuron-explorer inspect -o inspect-output-nccom-test nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\n    INFO[0000] Running command \"nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\" with profiling enabled\n        size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n        524288          131072    fp32           24.15          21.71          21.03\n    Avg bus bandwidth:    21.0339GB/s\n\n.. note::\n    If you have problems with nccom-test, add the ``--debug`` flag.\n    If using a trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.\n\n.. _neuron-explorer-inspect-output:\n\n``neuron-explorer inspect`` Output\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe above command traces a Neuron workload execution and saves the output to the ``inspect-output-nccom-test`` directory. 
\nYou will see that the output directory contains a single NEFF file and a device profile (NTFF) for each NeuronCore that executed that NEFF. \nYou will also see ``ntrace.pb`` and ``trace_info.pb`` files storing the system profile data.\nThe following shows what the output will look like:\n\n.. code-block:: shell\n\n    ubuntu@ip-172-31-63-210:~$ tree inspect-output-nccom-test\n    inspect-output-nccom-test\n        ├── i-012590440bb9fd263_pid_98399\n        │   ├── 14382885777943380728_instid_0_vnc_0.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_1.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_10.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_11.ntff\n        ...\n        │   ├── 14382885777943380728_instid_0_vnc_8.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_9.ntff\n        │   ├── cpu_util.pb\n        │   ├── host_mem.pb\n        │   ├── neff_14382885777943380728.neff\n        │   ├── ntrace.pb\n        │   └── trace_info.pb\n        └──\n\n    2 directories, 74 files\n\n\nTo view a summary of the captured profile data, run the command:\n\n.. code-block:: shell\n\n    neuron-explorer view -d inspect-output-nccom-test --output-format summary-text\n\n\n.. _neuron-explorer-filtering-system-profiles:\n\nCapture-time Filtering\n----------------------\n\n**Capture-time filtering** reduces memory usage and trace file size by only collecting specific events, but filtered data cannot be recovered later.\nConfigure filters before trace capture using environment variables or API functions. \nYou can use NeuronCore filters to only capture events for specific NeuronCores (for example only events associated with NeuronCore 0 or all the NeuronCores on a specific NeuronDevice). \nYou can use event type filters to only capture specific events (for example model execute or collectives events). \nIt is possible to combine both NeuronCore and event type filters.\n\nNeuronCore\n~~~~~~~~~~\n\nIf capture is enabled for a NeuronCore, then a ring buffer will be allocated in host memory for storing that core's events. Thus filtering by NeuronCore decreases host memory usage during capture.\n\nDefault Behavior\n^^^^^^^^^^^^^^^^\n\nBy default, all visible NeuronCores are enabled for capture. \n\nUsing Environment Variables\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: shell\n\n    # Filter to capture events only from NeuronCore 0\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0\n\n    # Filter to capture events from NeuronCores 0, 2, and 4\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0,2,4\n\n    # Filter to capture events from a range of NeuronCores (0 through 3)\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0-3\n\n    # Reset to default behavior\n    unset NEURON_RT_INSPECT_EVENT_FILTER_NC # Back to capturing all visible cores\n\nUsing API Functions\n^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Allocate and configure trace options\n    nrt_sys_trace_config_t *config;\n    nrt_sys_trace_config_allocate(&config);\n    nrt_sys_trace_config_set_defaults(config);\n\n    // Enable capture only for specific NeuronCores\n\n    // Disable all cores since by default they are all enabled\n    int num_cores = 128;\n    for (int i=0; i<num_cores; i++) {\n      nrt_sys_trace_config_set_capture_enabled_for_nc(config, i, false); // disable NC i\n    }\n\n    // Then enable specific cores\n    nrt_sys_trace_config_set_capture_enabled_for_nc(config, 0, true);  // Enable NC 0\n    nrt_sys_trace_config_set_capture_enabled_for_nc(config, 2, true);  // Enable NC 2\n\n    // Start tracing with the configuration\n    nrt_sys_trace_start(config);\n\n    // Your application code here...\n\n    // Stop tracing and cleanup\n    nrt_sys_trace_stop();\n    nrt_sys_trace_config_free(config);\n\nEvent Type\n~~~~~~~~~~\n\nDefault Behavior\n^^^^^^^^^^^^^^^^\n\nBy default, all event types are enabled for capture.\n\nGetting Available Event Types\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can discover all available event types using the ``nrt_sys_trace_get_event_types`` API.\n\n.. code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Get all available event types\n    const char **event_types = nullptr;\n    size_t count = 0;\n    NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count);\n\n    if (status == NRT_SUCCESS) {\n        printf(\"Available event types:\\n\");\n        for (size_t i = 0; i < count; ++i) {\n            printf(\"  %s\\n\", event_types[i]);\n        }\n        \n        // Free the event types array\n        for (size_t i = 0; i < count; ++i) {\n            free((void*)event_types[i]);\n        }\n        free((void*)event_types);\n    }\n\nUsing Environment Variables\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``NEURON_RT_INSPECT_EVENT_FILTER_TYPE`` environment variable supports:\n\n* **Default**: If not set, all event types are captured\n* **Specific event types**: Use exact event names from ``nrt_sys_trace_get_event_types()``\n* **Event categories**: Use ``hardware`` or ``software`` to filter by category\n* **Exclusion**: Use ``^`` prefix to exclude specific events from a category\n\n.. code-block:: shell\n\n    # Filter to capture only specific event types\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=model_load,nrt_execute,runtime_execute\n\n    # Filter to capture all hardware events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware\n\n    # Filter to capture all software events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software\n\n    # Filter to capture all hardware events EXCEPT cc_exec\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,^cc_exec\n\n    # Filter to capture all software events EXCEPT model_load\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software,^model_load\n\n    # Mix categories and specific events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,tensor_read,tensor_write\n\n    # Reset to default behavior\n    unset NEURON_RT_INSPECT_EVENT_FILTER_TYPE  # Back to capturing all event types\n\nThe ``hardware`` group contains events that are executed on the NeuronCore. 
\nThese are ``nc_exec_running``, ``cc_running``, ``cc_exec_barrier``, ``numerical_err``, ``nrt_model_switch``, ``timestamp_sync_point``, ``hw_notify``.\nThe ``software`` group contains all other events.\n\nUsing API Functions\n^^^^^^^^^^^^^^^^^^^\n\nUse the ``nrt_sys_trace_config_set_capture_enabled_for_event_type`` API to filter by event type.\n\n.. code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Configure trace options\n    nrt_sys_trace_config_t *config;\n    nrt_sys_trace_config_allocate(&config);\n    nrt_sys_trace_config_set_defaults(config); // By default, all event types are enabled\n\n    // Disable specific event types (others remain enabled)\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"device_exec\", false);\n\n    // Or disable all first, then enable only specific ones\n    const char **all_event_types = nullptr;\n    size_t all_count = 0;\n    nrt_sys_trace_get_event_types(&all_event_types, &all_count);\n\n    // Disable all event types first\n    for (size_t i = 0; i < all_count; ++i) {\n        nrt_sys_trace_config_set_capture_enabled_for_event_type(config, all_event_types[i], false);\n    }\n\n    // Enable only specific event types\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"model_load\", true);\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"nrt_execute\", true);\n\n    // Verify which event types are enabled\n    const char **enabled_types = nullptr;\n    size_t enabled_count = 0;\n    nrt_sys_trace_config_get_enabled_event_types(config, &enabled_types, &enabled_count);\n    printf(\"Enabled event types: %zu\\n\", enabled_count);\n    for (size_t i = 0; i < enabled_count; ++i) {\n        printf(\"  %s\\n\", enabled_types[i]);\n    }\n\n    // Clean up memory (caller is responsible)\n    for (size_t i = 0; i < enabled_count; ++i) {\n        free((void*)enabled_types[i]);\n    }\n    free((void*)enabled_types);\n\n    for (size_t i = 0; i < all_count; ++i) {\n        free((void*)all_event_types[i]);\n    }\n    free((void*)all_event_types);\n\n    // Start tracing\n    nrt_sys_trace_start(config);\n\n    // Your application code here...\n\n    // Cleanup\n    nrt_sys_trace_stop();\n    nrt_sys_trace_config_free(config);\n\n\nProcessing-time Filtering\n--------------------------\n\n**Processing-time filtering** preserves the complete trace and allows flexible analysis with different filters, but requires more memory and storage during capture.\nApply filters when viewing or processing already captured profiles. This approach allows you to \nanalyze the same trace data in different ways without recapturing. The filters can be used for any \n``neuron-explorer`` output format including ``--output-format json`` and ``--output-format perfetto``.\n\nNeuronCore\n~~~~~~~~~~\n\nUse the ``--system-trace-filter-neuron-core`` to only process events for specific NeuronCores. The IDs are local to the instance and not global IDs. \n\nIf the ``--system-trace-filter-neuron-core`` argument is not set then events from all NeuronCores will be included in the processed trace.\n\n\n**Single neuron core**\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-neuron-core \"0\"\n\n**Multiple neuron cores**\n\n.. 
code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-neuron-core \"0,1,2,3\"\n\nEvent Type\n~~~~~~~~~~\nUse the ``--system-trace-filter-event-type`` option to only process specific trace event types.\n\nIf the ``--system-trace-filter-event-type`` argument is not set then all event types will be included in the processed trace.\n\n**Single event type**\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-event-type \"nrt_execute\"\n\n**Multiple event types**\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-event-type \"nrt_execute,nrt_load\"\n\nInstance ID\n~~~~~~~~~~~\n\nUse the ``--system-trace-filter-instance-id`` option to only process events for specific EC2 instances.\n\nIf the ``--system-trace-filter-instance-id`` argument is not set then events from all instances will be included in the processed trace.\n\n**Single instance**\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-instance-id \"i-abc123\"\n\n**Multiple instances**\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --system-trace-filter-instance-id \"i-abc123,i-def456,i-ghi789\"\n\nProcessing only system or device profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can reduce processing times by skipping the processing of system or device profiles. Choose this when you are interested in only a specific profile, or when you want to start with a limited set of profiling data before exploring the full profile.\n\nTo skip processing of device profiles, use the ``--ignore-device-profile`` option. To skip processing of system profiles, use the ``--ignore-system-profile`` option. These options can be used with the ``--output-format`` values ``parquet`` (default), ``perfetto``, or ``json``.\n\nFor example:\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --ignore-device-profile --output-format perfetto\n\n\nView Profiles\n-------------\n\nTo view a profile in Neuron Explorer, follow these steps:\n\n1. **Start the Neuron Explorer UI and API servers** using the ``neuron-explorer`` tool from ``aws-neuronx-tools``:\n\n   .. code-block:: bash\n\n      neuron-explorer view --data-path /absolute/path/to/db\n\n   By default, the UI will be launched on port 3001 and the API server will be launched on port 3002.\n\n2. **Set up port-forwarding** (if running on a remote EC2 instance) to enable local viewing:\n\n   .. code-block:: bash\n\n      ssh -i <key.pem> <user>@<ip> -L 3001:localhost:3001 -L 3002:localhost:3002\n\n   .. note::\n      It is necessary to forward both port 3001 (for the UI server) and port 3002 (for the data server).\n\n3. **Open the UI** by navigating to ``localhost:3001`` in your browser.\n\n4. **Upload your profile** by clicking the **\"Upload Profile\"** button in the Profile Manager page. You can either:\n\n   * Upload the NEFF (``.neff``) and NTFF (``.ntff``) files individually using the \"Individual Files\" upload mode, or\n   * Upload the folder containing the NEFF and NTFF files using the \"Directory Upload\" mode.\n\nNeuron Explorer Browser UI\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. _neuron-explorer-profile-manager:\n\nProfile Manager\n^^^^^^^^^^^^^^^\n\nProfile Manager is a page for uploading artifacts (NEFF, NTFF, and source code) and selecting profiles to access.\n\n.. image:: /tools/profiler/images/profile-workload-3.png\n\n.. 
_profile-manager-upload-profile:\n\n\n\nClick on the \"Upload Profile\" button to open the Upload Profile modal.\n\n\n**Device Profile Upload**\n\nSelect \"Individual Files\" upload mode to upload NEFF, NTFF, and source code individually.\n\nSelect \"Directory Upload\" to upload profile files from a directory.\n\n.. note::\n   * \"Profile name\" is a required field. You cannot upload a profile with an existing name unless the \"Force Upload\" option is checked at the bottom. Force Upload currently overwrites the existing profile with the same name.\n   * For uploading source code, the UI only supports the upload of folders, individual files, or compressed files in the gzipped tar ``.tar.gz`` archive format.\n\n.. image:: /tools/neuron-explorer/images/device-profile-upload-ui.png\n\n\n.. _profile-manager-system-profile-upload:\n\n**System Profile Upload**\n\nSelect \"Directory Upload\", then in the Profile Directory drag and drop area, select the directory containing the system profile files.\n\nThe directory should contain instance sub-directories with the following: ``ntrace.pb``, ``trace_info.pb``, ``cpu_util.pb``, and ``host_mem.pb``.\nFor an example, see the output in :ref:`neuron-explorer inspect <neuron-explorer-inspect-output>`.\n\n.. note::\n   System Profile uploads only support \"Directory Upload\".\n\n.. image:: /tools/neuron-explorer/images/system-profile-upload-ui.png\n\n\n**Processing Status**\n\nAfter uploading a profile, the processing task is shown in the \"User Uploaded\" table. Use the \"Refresh\" button in the top-right to fetch the latest processing status and verify completion.\n\n\n**Listing profiles**\n\nAll uploaded profiles are listed on the Profile Manager page with details such as the processing status and upload time, along with various quick access actions.\n\n.. image:: /tools/profiler/images/profile-workload-5.png\n\n* **Pencil button**: Rename a profile.\n* **Star button**: Mark this profile as a favorite. The profile will be shown in the user's favorites list.\n* **Bulb button**: Navigate to the summary page of this profile. For more details on the summary page, see :doc:`this overview of the Neuron Explorer Summary Page </tools/neuron-explorer/overview-summary-page>`.\n\nClicking on the name of a profile takes you to its corresponding profile page.\n\nNeuron Explorer for Visual Studio Code\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe UI is also available as a VSCode extension, enabling better native integration for features such as code linking.\n\nInstall the Neuron Explorer extension from the Visual Studio Code Marketplace. Open the Extensions view in VSCode by pressing **Ctrl+Shift+X** (Windows/Linux) or **Cmd+Shift+X** (macOS), and search for ``AWS Neuron Explorer`` or ``amazonwebservices.neuron-explorer``. Select the extension published by **Amazon Web Services** in the sidebar, then click the blue **Install** button.\n\n.. image:: /tools/profiler/images/profile-workload-1.png\n\nEnsure the SSH tunnel is established by following the steps above. Otherwise, specify a custom endpoint by selecting the extension in the left activity bar. Then, navigate to the \"Endpoint\" action on the bottom bar of your VSCode session, select \"Custom endpoint\", and enter ``localhost:3002``. \n\n.. 
image:: /tools/profiler/images/profile-workload-2.png\n\nFrom there, navigate to the **Profile Manager** page through the extension UI in the left activity bar.\n\nJSON Output\n~~~~~~~~~~~\n\nThe ``--output-format json`` option writes processed profile data to human-readable JSON that can be used for scripting and manual inspection.\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --output-format json\n\nThis will generate a ``system_profile.json`` file containing the system profile data and a ``device_profile_model_<model_id>.json`` file for each unique compiled model that was executed on a Neuron Device. \n\nThe ``system_profile.json`` file contains the following data types:\n\n* ``trace_events``: Neuron Runtime API trace events and Framework/Application trace events containing timestamps, durations, names, and the EC2 instance ID to differentiate between events from different compute nodes in a distributed workload.\n\n.. code-block:: json\n\n    {\n        \"Neuron_Runtime_API_Event\": {\n            \"duration\": 27094,\n            \"group\": \"nrt-nc-000\",\n            \"id\": 1,\n            \"instance_id\": \"i-0f207fb2a99bd2d08\",\n            \"lnc_idx\": \"0\",\n            \"name\": \"nrt_tensor_write\",\n            \"parent_id\": 0,\n            \"process_id\": \"1627711\",\n            \"size\": \"4\",\n            \"tensor_id\": \"4900392441224765051\",\n            \"tensor_name\": \"_unknown_\",\n            \"thread_id\": 1627711,\n            \"timestamp\": 1729888371056597613,\n            \"type\": 11\n        },\n        \"Framework_Event\": {\n            \"duration\": 3758079,\n            \"group\": \"framework-80375131\",\n            \"instance_id\": \"i-0f207fb2a99bd2d08\",\n            \"name\": \"PjitFunction(matmul_allgather)\",\n            \"process_id\": \"701\",\n            \"thread_id\": 80375131,\n            \"timestamp\": 1729888382798557372,\n            \"type\": 99999\n        }\n    }\n\n* ``mem_usage``: Sampled host memory usage.\n\n.. code-block:: json\n\n    {\n        \"duration\": 1,\n        \"instance_id\": \"i-0f207fb2a99bd2d08\",\n        \"percent_usage\": 9.728179797845964,\n        \"timestamp\": 1729888369286687792,\n        \"usage\": 51805806592\n    }\n\n* ``cpu_util``: Sampled CPU utilization. Results are provided per core and per EC2 instance involved in a distributed workload.\n\n.. code-block:: json\n\n    {\n        \"cpu_id\": \"47\",\n        \"duration\": 1,\n        \"instance_id\": \"i-0f207fb2a99bd2d08\",\n        \"timestamp\": 1729888371287337243,\n        \"util\": 2.3255813\n    }\n\n\nView in Perfetto\n~~~~~~~~~~~~~~~~\n\nUsers can view their Neuron Explorer profiles in Perfetto. Please see :doc:`view-perfetto` for more information.\n\n.. note::\n    New Neuron Explorer features released in 2.27 and onwards may not be supported in Perfetto. For the full user experience and feature set, please use the Neuron Explorer UI or VSCode Integration.\n\n\nTroubleshooting\n---------------\n\nIncomplete JAX Profiles\n~~~~~~~~~~~~~~~~~~~~~~~\n\nIf your JAX profile has fewer events than expected or lacks the Runtime API trace, check whether \n``jax.profiler.stop_trace`` is being called inside a ``with jax.profiler.trace`` context block. \nThis can prematurely stop tracing. 
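For example, a minimal sketch of the two supported patterns (``my_model`` and ``inputs`` are illustrative placeholders for your own workload, and the output directory is an example path):\n\n.. code-block:: python\n\n    import jax\n\n    # Context-managed API: tracing stops automatically when the block exits,\n    # so do not call jax.profiler.stop_trace() inside or after this block.\n    with jax.profiler.trace(\"./profile-output\"):\n        result = my_model(inputs)\n        result.block_until_ready()\n\n    # Explicit API: pair jax.profiler.start_trace with jax.profiler.stop_trace.\n    jax.profiler.start_trace(\"./profile-output\")\n    result = my_model(inputs)\n    result.block_until_ready()\n    jax.profiler.stop_trace()\n\n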
Use ``jax.profiler.stop_trace`` only when profiling was started \nwith ``jax.profiler.start_trace``, not when using the context-managed ``with jax.profiler.trace`` API.\n\nAlso, when using ``jax.profiler`` within your script, ensure that the \nenvironment variable ``NEURON_RT_INSPECT_ENABLE`` is not set to 1. \nAdditionally, ensure that ``NEURON_RT_INSPECT_OUTPUT_DIR`` is set to \nthe correct output directory and that this is the same output directory passed to \n``with jax.profiler.trace``.\n\nDropped Events in System Profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen processing a system profile, you may see a warning indicating that some trace events were dropped during capture.\n\n.. code-block:: shell\n\n    WARN[0000] Warning: 1001 trace events were dropped during capture (stored 530560 out of 531561 total events). Consider increasing buffer size, reducing trace duration, or filtering events.\n\nThis means that during capture the trace event buffers filled up and the oldest events were overwritten. If you need to avoid dropping events for the full duration of your workload, consider the following adjustments:\n\n* Increase the buffer size by setting ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC`` (see :ref:`Profile Capture Environment Variables <neuron-explorer-capture-environment-variables>`). This will increase host memory usage.\n* Apply capture-time filters (NeuronCores / event types) (see :ref:`Filtering System Profiles <neuron-explorer-filtering-system-profiles>`).\n* Shorten the profiled region: limit the code span under the profiling context / runtime.\n"
  },
  {
    "path": "tools/neuron-explorer/index.rst",
    "content": ".. meta::\n   :description: Neuron Explorer documentation for performance profiling, debugging, and optimization of ML workloads on AWS Trainium and Inferentia.\n   :date-modified: 12/02/2025\n\n.. _neuron-explorer-home:\n\nNeuron Explorer\n=================\n\n.. important::\n\n    Neuron Explorer is the recommended profiling tool for AWS Neuron workloads. It provides end-to-end profiling support along with the latest features and an improved user experience. \n    \n    **Note:** Neuron will end support for :ref:`Neuron Profiler 2.0 <neuron-profiler-2-0-guide>` and :ref:`Neuron Profiler <neuron-profile-ug>` in Neuron 2.29 release. Users are encouraged to migrate to Neuron Explorer. Please see :doc:`migration-faq` and :ref:`neuron-explorer-faq` for more details.\n    \nNeuron Explorer is a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. Neuron Explorer helps developers maintain context, iterate efficiently, and focus on building and optimizing high-performance models. Developers can access Neuron Explorer from CLI, UI, or inside their IDE through VSCode integration.\n\nProfiling Viewers\n--------------------\n\nNeuron Explorer enables ML performance engineers to trace execution from source code down to hardware operations, enabling detailed analysis of model behavior at every layer of the stack. The suite of tools supports both single-node and distributed applications, allowing developers to analyze workloads at scale. \n\nGetting Started\n---------------\n\n.. grid:: 1 2 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Get Started\n      :link: get-started\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Set up Neuron Explorer, launch the web UI, and configure SSH tunneling for secure access to profiling data.\n\n   .. grid-item-card:: Capture and View Profiles\n      :link: how-to-profile-workload\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to capture and view profiles in the Neuron Explorer UI or directly in your IDE via VSCode Integration.\n\nVisualization and Analysis\n---------------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Device Trace Viewer\n      :link: overview-device-profiles\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Explore hardware-level execution with timeline view, operator table, event details, annotations, dependency highlighting, search, and more analysis features.\n\n   .. grid-item-card:: System Trace Viewer\n      :link: overview-system-profiles\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Explore system-level execution with timeline view and more analysis features.\n\n\n.. grid:: 1 2 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Hierarchy Viewer\n      :link: overview-hierarchy-view\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Visualize the entire execution from model layers down to hardware execution, supporting interactivity with device viewer and source code linking.\n\n   .. grid-item-card:: Source Code Viewer\n      :link: how-to-link-view-source-code\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Navigate between NKI and PyTorch source code and profile data with bidirectional linking and highlighting.\n\n   .. 
grid-item-card:: Summary Viewer\n      :link: overview-summary-page\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Get streamlined performance insights and optimization recommendations with high-level metrics and visualizations.\n\n   .. grid-item-card:: Database Viewer\n      :link: overview-database-viewer\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Develop your own analyses, examine profiling data stored in database tables, or run ad-hoc queries during performance analysis. \n\n   .. grid-item-card:: Tensor Viewer\n      :link: overview-tensor-viewer\n      :link-type: doc\n      :class-card: sd-border-1\n\n      View tensor information including names, sizes, shapes, and memory usage details.\n\n   .. grid-item-card:: Memory Viewer\n      :link: overview-memory-viewer\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Analyze memory allocation, usage patterns, and potential inefficiencies across SBUF partitions.\n\n   .. grid-item-card:: AI Recommendation Viewer\n      :link: overview-ai-recommendations\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Get AI-powered bottleneck analysis and optimization recommendations for NKI profiles.\n\nTutorials\n----------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Profile a NKI Kernel\n      :link: /nki/guides/use-neuron-profile\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to profile a NKI kernel with Neuron Explorer.\n\n.. grid:: 1 2 2 2\n   :gutter: 3\n\n   .. grid-item-card:: vLLM Performance\n      :link: /tools/tutorials/performance-profiling-vllm\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Capture and analyze system-level and device-level profiles for vLLM inference workloads on Trainium.\n\n\nAdditional Resources\n--------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: Viewing Profiles with Perfetto\n      :link: view-perfetto\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to view Neuron Explorer profiles using the Perfetto UI for trace analysis.\n\n.. _download-neuron-explorer-vscode:\n\nNeuron Explorer for Visual Studio Code\n------------------------------------------------\n\nThe Neuron Explorer VSCode extension is available on the Visual Studio Code Extension Marketplace.\n\nTo install the extension, open the Extensions view in VSCode by pressing **Ctrl+Shift+X** (Windows/Linux) or **CMD+Shift+X** (MacOS), and search for ``AWS Neuron Explorer`` or ``amazonwebservices.neuron-explorer``. Select the extension published by **Amazon Web Services** in the sidebar, then click the blue **Install** button.\n\nYou can also install the extension directly from the `Visual Studio Code Marketplace <https://marketplace.visualstudio.com/items?itemName=AmazonWebServices.neuron-explorer>`_.\n\n.. _neuron-explorer-faq:\n\nNeuron Explorer FAQ\n-------------------\n\nWhat can I expect from the Neuron Explorer?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron Explorer provides a comprehensive profiling experience with both device-level and system-level profiling support. Neuron Explorer features an enhanced profiling experience with hierarchical profiling, bidirectional code linking, AI-powered recommendations, IDE integration, and more. 
In future releases, Neuron Explorer will continue to expand with additional profiling viewers and features, debugging capabilities, and enhanced recommendation and analysis tools to support the entire ML development journey on Trainium.\n\nWhat is the difference between device-level and system-level profiling?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDevice-level profiling captures hardware execution data from NeuronCores, including compute engine instructions, DMA operations, and hardware utilization. Use device-level profiling to analyze hardware performance, identify compute or memory bottlenecks, and optimize kernel implementations.\n\nSystem-level profiling captures software execution data, including framework operations, Neuron Runtime API calls, CPU utilization, and memory usage. Use system-level profiling to analyze framework overhead, identify CPU bottlenecks, and debug runtime issues.\n\nIs Neuron Explorer going to replace Neuron Profiler and Neuron Profiler 2.0?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYes. Neuron Explorer is the recommended profiling tool and replaces both Neuron Profiler and Profiler 2.0.\n\nNeuron Profiler and Profiler 2.0 are supported for one final release. In Neuron 2.29 release, they will enter end-of-support and will no longer receive updates or technical support, though they will remain accessible through the ``neuron-profile`` package in previous releases. Users should migrate to Neuron Explorer now.\n\nAre my existing profiles compatible with Neuron Explorer?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYes. Neuron Explorer is backwards compatible with profile data captured using Neuron Profiler or Profiler 2.0. Existing profile files must be reprocessed before viewing in Neuron Explorer, but you do not need to recapture them. See :ref:`new-neuron-profiler-setup`.\n\nFor detailed migration guidance, including CLI command mappings and feature comparisons, see the :doc:`migration-faq`.\n\n\n.. toctree::\n   :hidden:\n   :maxdepth: 1\n\n   Get Started <get-started>\n   Neuron Profiler to Neuron Explorer Migration Guide <migration-faq>\n   Capture and View Profiles <how-to-profile-workload>\n   Device Trace Viewer <overview-device-profiles>\n   System Trace Viewer <overview-system-profiles>\n   Hierarchy Viewer <overview-hierarchy-view>\n   Source Code Viewer <how-to-link-view-source-code>\n   Summary Viewer <overview-summary-page>\n   Database Viewer <overview-database-viewer>\n   Tensor Viewer <overview-tensor-viewer>\n   Memory Viewer <overview-memory-viewer>\n   AI Recommendation Viewer <overview-ai-recommendations>\n   View Profiles with Perfetto <view-perfetto>\n   \n"
  },
  {
    "path": "tools/neuron-explorer/migration-faq.rst",
    "content": ".. _neuron-profiler-migration-guide:\n\nMigration Guide from Neuron Profiler to Neuron Explorer\n========================================================\n\nThis guide provides detailed information for migrating from Neuron Profiler or Neuron Profiler 2.0 to Neuron Explorer.\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\nNeuron Explorer is the recommended profiling tool for AWS Neuron workloads, replacing both Neuron Profiler and Neuron Profiler 2.0. This guide helps you transition your profiling workflows to Neuron Explorer.\n\nKey Differences\n---------------\n\nThe following table summarizes the key differences between Neuron Profiler/Profiler 2.0 and Neuron Explorer:\n\n.. list-table::\n   :widths: 30 35 35\n   :header-rows: 1\n   :align: left\n\n   * - Feature\n     - Neuron Profiler / Profiler 2.0\n     - Neuron Explorer\n   * - CLI tool\n     - ``neuron-profile``\n     - ``neuron-explorer``\n   * - Device Profiling\n     - Yes\n     - Yes (enhanced)\n   * - System Profiling\n     - Yes (Profiler 2.0 only)\n     - Yes\n   * - Hierarchy Viewer\n     - No\n     - Yes\n   * - Source Code Viewer\n     - Yes (Device profiles)\n     - Yes (Device profiles)\n   * - AI Recommendation Viewer\n     - No\n     - Yes (for NKI profiles)\n   * - IDE Integration\n     - No\n     - Yes (VSCode Extension)\n   * - Database Viewer\n     - No\n     - Yes\n   * - Tensor Viewer\n     - No\n     - Yes\n   * - Additional Installation Requirements\n     - InfluxDB installation required\n     - None\n\n\nUpdate CLI Commands\n--------------------\n\nReplace ``neuron-profile`` with ``neuron-explorer`` in your scripts and workflows. The following commands are subject to change before GA:\n\n.. list-table::\n   :widths: 50 50\n   :header-rows: 1\n   :align: left\n\n   * - Neuron Profiler Command\n     - Neuron Explorer Command\n   * - ``neuron-profile view -d ./output``\n     - ``neuron-explorer view -d ./output``\n   * - ``neuron-profile view -n file.neff -s profile.ntff``\n     - ``neuron-explorer view -n file.neff -s profile.ntff``\n   * - ``neuron-profile capture -n file.neff -s profile.ntff``\n     - ``neuron-explorer capture -n file.neff -s profile.ntff``\n\n\nFrequently Asked Questions\n--------------------------\n\nDo I need to install InfluxDB for Neuron Explorer?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNo. Unlike Neuron Profiler, Neuron Explorer requires no external installation or setup.\n\nHow do I view existing profiles captured with Neuron Profiler?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nExisting NEFF and NTFF files captured with Neuron Profiler are fully compatible with Neuron Explorer. To view them:\n\n.. code-block:: bash\n\n   # View a single device profile\n   neuron-explorer view -n file.neff -s profile.ntff\n\nThe profiles will be reprocessed using Neuron Explorer's processing pipeline, which may provide additional insights not available in the original Neuron Profiler view.\n\nHow do I capture profiles with Neuron Explorer?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron Explorer provides the ``neuron-explorer capture`` command for standalone NEFF profiling, similar to ``neuron-profile capture``:\n\n.. code-block:: bash\n\n   # Capture a device profile\n   neuron-explorer capture -n file.neff -s profile.ntff\n\nYou can also use the framework profiling APIs or environment variables to capture profiles during your actual workload execution. 
For NKI kernel profiling, continue using the ``nki.benchmark`` or ``nki.profile`` APIs as documented in the :ref:`NKI profiling guide <use-neuron-profile>`.\n\nWhat new features does Neuron Explorer provide?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNeuron Explorer introduces several new capabilities:\n\n- **Hierarchy Viewer**: Visualize execution from model layers down to hardware operations. See :doc:`overview-hierarchy-view`.\n- **Source Code Viewer**: Navigate between source code and profile data. See :doc:`how-to-link-view-source-code`.\n- **AI Recommendation Viewer**: Get AI-powered optimization suggestions for NKI profiles. See :doc:`overview-ai-recommendations`.\n- **Database Viewer**: Run custom queries on profiling data. See :doc:`overview-database-viewer`.\n- **Memory Viewer**: Get insight into memory allocation, usage patterns, and potential memory usage inefficiencies.\n- **Tensor Viewer**: Examine tensor information including shapes and memory usage. See :doc:`overview-tensor-viewer`.\n- **VSCode Extension**: View profiles directly in your IDE with native code linking support.\n- **System Trace Viewer**: Enhanced system-level profiling visualization. See :doc:`overview-system-profiles`.\n\nHow do I get help during migration?\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n- Review the :doc:`get-started` guide for initial setup\n- See :doc:`how-to-profile-workload` for detailed capture and viewing instructions\n- Check submitted issues and file new issues via the `AWS Neuron GitHub issues <https://github.com/aws-neuron/aws-neuron-sdk/issues>`_\n"
  },
  {
    "path": "tools/neuron-explorer/overview-ai-recommendations.rst",
    "content": ".. meta::\n    :description: AI Recommendation feature helps identify and understand bottlenecks and optimization opportunities for NKI kernels through AI-powered analysis\n    :date-modified: 11/21/2025\n\nAI Recommendation Viewer\n=========================\n\nIn this guide, you'll learn how to use the AI Recommendation Viewer to identify and understand bottlenecks and optimization opportunities for NKI kernels through AI-powered analysis of the user's profile and source code. Users receive actionable recommendations through the Neuron Explorer UI, CLI, or via their IDE. Each report provides the top 2-3 optimization opportunities ranked by effort and impact, including the symptom with quantified metrics, the optimization with implementation guidance, expected speedup estimates, and implementation tradeoffs. \n\nThe feature is entirely opt-in and only enabled for profiles that the user explicitly requests a recommendation for.\n\n.. warning:: \n    * Responses in this AWS Bedrock-powered feature are AI-generated. Verify accuracy and appropriateness before use. \n    * This feature is available in US Regions only. Neuron may securely transmit data across Regions within your geography for processing. \n    * Your AWS account will be billed for Bedrock usage. Each time you generate an AI Recommendation for a profile, a single Bedrock request is made with up to 30,000 input tokens and 10,000 output tokens. \n    * At the moment, this feature may only be used with Claude Sonnet 4.5.\n\n.. _local_setup_directions:\n\nLocal setup directions\n----------------------------------------------------\n\nAI Recommendations use Amazon Bedrock. To enable this feature, you must configure AWS credentials on the system you are running neuron-explorer on. The AWS credentials should have bedrock:InvokeModel permissions and access to Claude Sonnet 4.5. For information on configuring Bedrock access, refer to the `AWS Bedrock model access documentation <https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html>`_.\n\nGetting an AI Recommendation From the UI\n----------------------------------------------------\n\nTo generate an AI Recommendation from the UI open your profile, click the \"Add Widget\" dropdown, and select **AI Recommendation**.\n\n.. image:: /tools/profiler/images/recommendation-button.png\n\nGo to the **AI Recommendation** widget box and click the **Get AI Recommendation** button. This will perform additional analysis and send the recommendation request to AWS Bedrock and can take up to a minute to generate. Avoid refreshing the page during this time.\n\n.. image:: /tools/profiler/images/recommendation-widget.png\n\nOnce the recommendation has been generated it will be displayed in the widget box. For each recommendation you will see the performance inefficiency symptoms that were observed, the suggested optimization to make, and potential tradeoffs to look out for when implementing the optimizations.\n\n.. image:: /tools/profiler/images/recommendation-view.png\n\nGetting an AI Recommendation from the CLI\n----------------------------------------------------\n\nUsers may also get AI recommendations with the ``neuron-explorer recommend`` CLI command. \n\nBefore you start, ensure that you have followed the :ref:`local setup directions <local_setup_directions>` to enable Bedrock access on your configured AWS account. ``neuron-explorer`` uses the default AWS credentials you have configured. 
If you want to use different credentials, you can specify an AWS profile by setting environment variables: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html.\n\nTo generate a recommendation, provide the following to the ``neuron-explorer recommend`` command:\n\n* A NEFF file for your compiled NKI kernel\n* An NTFF file for your captured profile\n* The location where your NKI source files can be found\n\nExample:\n\n.. code-block:: shell\n\n   neuron-explorer recommend -n </path/to/neff> -s </path/to/ntff> --nki-source-root </path/to/src/dir>\n\nRunning this command processes the profile and prints the AI-generated recommendation to the console in Markdown format. You can save this output to a file and view it in any text editor or Markdown viewer.\n"
  },
  {
    "path": "tools/neuron-explorer/overview-database-viewer.rst",
    "content": ".. meta::\n    :description: Learn about the Database Viewer tool in Neuron Explorer for querying and exploring profiling data using SQL or natural language queries.\n    :date-modified: 01/27/2026\n\n.. _database-viewer-overview:\n\nDatabase Viewer\n=====================\n\nThe Database Viewer offers an interactive interface providing visibility to all underlying data that the Neuron Explorer\nprocesses from a :doc:`NEFF </neuron-runtime/explore/work-with-neff-files>` and NTFF. \nUse this tool to develop your own analyses, examine profiling data stored in database tables, or run ad-hoc queries during performance analysis. \nYou can access this data through natural language queries or raw SQL.\n\n\n.. image:: /tools/profiler/images/database-viewer.png\n\nTable Selection and Schema Inspection\n-------------------------------------\n\nWhen the tool loads, it fetches the list of available database tables. Select a table from the dropdown to view its schema.\n\nThe schema table displays:\n\n* **Field Name** - Column name (hover for description tooltip).\n* **Data Type** - The data type of the field.\n* **Required** - Whether the field is required.\n* **Unit** - Measurement unit (if applicable).\n* **Example** - Example value for the field.\n\nQuerying Data\n-------------\n\nThe query input supports two modes:\n\n1. **SQL queries** - Write standard SQL starting with ``SELECT``.\n2. **Natural language queries** - Describe what you want in plain English.\n\nExamples:\n\nNatural language query to get the first 5 rows::\n\n    Get the first 5 rows\n\nSQL query to filter with conditions::\n\n    SELECT field_name FROM table_name WHERE condition\n\nPress **Enter** or click **Execute Query** to run. Use **Shift+Enter** for multi-line input.\n\nQuery Results\n-------------\n\nResults appear below the query input in reverse chronological order (newest first). Each result shows:\n\n* The original query text.\n* The generated SQL (for natural language queries).\n* A scrollable results table.\n\nClick **Export CSV** to download any result set as a CSV file.\n\n.. image:: /tools/profiler/images/database-viewer-query-result.png"
  },
  {
    "path": "tools/neuron-explorer/overview-device-profiles.rst",
    "content": ".. meta::\n    :description: Learn about Neuron Explorer widgets for device profiling including timeline views, event details, annotations, and performance analysis tools.\n    :date-modified: 12/02/2025\n\nDevice Trace Viewer\n===================\n\nThe Neuron Device Trace Viewer displays a hardware instruction level granularity of execution on a NeuronCore. Neuron Explorer collects the timestamped start and end events that occur on the device into a NTFF. As a post-processing step, the profiler will correlate these events with information in the compiled NEFF to generate a detailed report of the hardware performance. The Neuron Explorer UI provides several different tools for an extensible and customizable workflow.\n\n.. image:: /tools/profiler/images/device-profile-1.png\n\nTools\n------\n\nDevice Trace Viewer\n~~~~~~~~~~~~~~~~~~~~~\n\nThe Device Trace Viewer presents a timeline view of the device execution, including activity on the DMA and compute engines, Hardware FLOPs Utilization (HFU) and device memory utilization over time, and more.\n\n.. image:: /tools/profiler/images/device-profile-2.png\n\nHover\n^^^^^\n\n.. image:: /tools/profiler/images/device-profile-3.png\n\nHover on events in the timeline to see important identifying information at a glance, such as the time window, the hierarchy, and the hardware instruction that was executed.\n\nFor more details, clicking the event will display the full details in the Event Details widget.\n\nColor Scheme\n^^^^^^^^^^^^\n\n.. list-table::\n   :header-rows: 0\n   :widths: 50 50\n\n   * - .. image:: /tools/profiler/images/device-profile-4.png\n          :width: 100%\n     - .. image:: /tools/profiler/images/device-profile-5.png\n          :width: 100%\n\nInstructions are color-coded according to their associated PyTorch operator. All instructions derived from the same PyTorch operator share an identical color.\n\n.. note::\n   In future releases, we will introduce more customizable options for color-coding.\n\nPanning\n^^^^^^^\n\n.. image:: /tools/profiler/images/device-profile-6.gif\n\nPanning is supported in a couple of ways:\n\n* Left-clicking the x-axis and dragging it\n* Spinning scroll-wheel while holding down shift\n* With the keyboard:\n    * A/D keys for left/right movement\n    * Left/right arrow keys for left/right movement\n\nThe amount panned depends on the current zoom level.\n\nEvent Details\n~~~~~~~~~~~~~\n\nUpon clicking an event in the Device Trace Viewer, all details related to the event will appear in the Event Details. The information shown will be a superset of the information available on hover, allowing us to dive deeper into what is happening on the hardware.\n\n* The Event Details table will populate with field data from clicked events from the instruction widget.\n* When filtering by fields through Search, all matching events will be rendered as pages in the Event Details. Users can navigate through each page to analyze data for each matching event.\n  \n.. image:: /tools/profiler/images/device-profile-7.png\n\nAnnotations\n~~~~~~~~~~~\n\nUsers can create annotations by right-clicking in the Device Trace Viewer. These annotations can be moved by clicking and dragging the vertical line, and will snap to the closest events when applicable.\n\nThe annotations tab will show more details on all available annotations in the profile, such as the time difference and summary metrics that occur between two markers. The option of which two annotations to compare is configurable in the diff vs column. 
You can also quickly zoom in to the region between two annotations by selecting the checkbox on the left. Users can rename, delete, save, and load annotations for better readability and collaboration.\n\n.. image:: /tools/profiler/images/device-profile-8.png\n\nOperator Table\n~~~~~~~~~~~~~~\n\nThe Operator Table aggregates the hardware level metrics into framework layers and operations, such as the MFU and the amount of data being moved. Users can progressively expand each row to get a further breakdown of each nested operator.\n\nFilters can be applied and columns can be sorted for more streamlined viewing.\n\n.. image:: /tools/profiler/images/device-profile-9.png\n\nOverall Summary\n~~~~~~~~~~~~~~~\n\nThe Overall Summary displays performance metrics across the entire profile run, with metrics broken down into different categories such as by the NeuronCore engines. These can be used for quick insights into how well the model performed.\n\n.. image:: /tools/profiler/images/device-profile-10.png\n\nCurrent Selection Summary\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe Current Selection Summary provides metrics for the current time window. Zooming in and out in the Device Trace Viewer will update the summary. This can be used in conjunction with the zoom feature of Annotations for easy access to a region of interest.\n\n.. image:: /tools/profiler/images/device-profile-11.png\n\n.. _box-selection-summary:\n\nBox Selection Summary\n~~~~~~~~~~~~~~~~~~~~~\n\nThe Box Selection Summary provides metrics within a bounding box region. Select and drag regions within the timeline widget to update the summary.\n\n.. image:: /tools/profiler/images/box-select.gif\n\nBox selection is supported in a couple of ways:\n\n* Toggling the box selection button within the timeline widget\n* Clearing the selection with the ``Esc`` key\n\nCorresponding summary information for the selected region is displayed within the box selection widget.\n\nCode Viewer\n~~~~~~~~~~~\n\nProfiles that are uploaded with source code files enable users to quickly navigate between NKI and application level source code and the corresponding hardware level instructions.\n\nIn the Device Trace Viewer, you can click on an event to highlight the source code line in the Code Viewer. A (Ctrl/Cmd) + click on the event will scroll to the corresponding source code line.\n\nIn the Code Viewer, clicking on a line in the source code will automatically highlight all associated events in the Device Trace Viewer. Similarly, highlighting multiple lines of the source code will also highlight all events in the timeline.\n\n.. image:: /tools/profiler/images/device-profile-12.png\n\nSee :ref:`neuron-explorer-source-code` for instructions on how to enable source code viewing.\n\nLayout Customization\n~~~~~~~~~~~~~~~~~~~~\n\nUnderstanding and optimizing performance with the profiler can be overwhelming given the amount of information being processed and displayed. As part of preparing for optimization work, you can cross-reference different information, such as the Device Trace Viewer with the application source code. With the widget-based UI, you can customize the layout to best fit a specific workflow. Each widget can be added, removed, dragged around, and resized. Once you are happy with the layout, you can save it through the Layout dropdown at the top right. The layouts are not tied to a specific profile, so they can be loaded and re-used for future profiles as well.\n\n.. image:: /tools/profiler/images/device-profile-13.png\n"
  },
  {
    "path": "tools/neuron-explorer/overview-hierarchy-view.rst",
    "content": ".. meta::\n    :description: Learn about the Hierarchy View in Neuron Explorer for analyzing framework layers and HLO operations with zooming, highlighting, and display options.\n    :date-modified: 12/02/2025\n\nHierarchy Viewer\n===================\n\nThe Hierarchy Viewer shows an up-leveled representation of the hardware execution organized by the framework layers and HLO operations. It enables you to progressively drill down into nested layers or operators and map the execution of application level constructs to the Neuron device. This view interacts with other tools such as the Device Trace Viewer.\n\n.. image:: /tools/profiler/images/hierarchy-view-1.gif\n\n\nZooming\n-------\n\n.. image:: /tools/profiler/images/hierarchy-view-2.png\n\nYou can zoom in on the Hierarchy Viewer in a couple of ways:\n\n* Click-drag your mouse across the graph (support in both directions)\n* Scroll down using your mouse wheel, with the mouse cursor on the x-axis\n* Zoom in and out buttons in the top-right corner\n* With the keyboard:\n  \n    * W and S for zooming in and out, respectively\n    * Up and down arrow keys for zooming in and out, respectively\n\nTo zoom out, simply scroll up with your mouse wheel when you place your mouse cursor on the x-axis.\n\nChange Displayed Layers\n-----------------------\n\n.. image:: /tools/profiler/images/hierarchy-view-3.png\n\nThe display options menu, accessed with the button in the top-right corner, allows you to selectively show or hide different layers. For instance, in the example shown above, the framework layer is hidden while displaying the hierarchy starting from HLO.\n\nHighlighting\n------------\n\n.. image:: /tools/profiler/images/hierarchy-view-4.png\n\nRight-clicking on an operator in Hierarchy Viewer will highlight all the corresponding instructions in the Device Trace Viewer for the operator using the same color. Multiple operators can be highlighted at once.\n\n.. image:: /tools/profiler/images/hierarchy-view-5.png\n\n"
  },
  {
    "path": "tools/neuron-explorer/overview-memory-viewer.rst",
    "content": ".. meta::\n    :description: Learn about the Memory View in Neuron Explorer for analyzing all the memory allocations on SBUF.\n    :date-modified: 03/24/2026\n\nMemory Viewer\n===================\n\nThe Memory Viewer in Neuron Explorer offers deep, low-level insight into memory allocation, usage patterns, and potential inefficiencies — going well beyond surface-level metrics. With comprehensive visibility into how memory is consumed across the device, it enables kernel and performance engineers to make informed optimization decisions, reduce debugging time, and improve overall system performance.\n\n.. image:: /tools/neuron-explorer/images/memory_viewer_overview.png\n   :alt: Memory Viewer overview showing memory allocation patterns across SBUF partitions\n\n\nEnable Memory Viewer during Profile Upload\n--------------------------------------------\n\nTo enable the Memory Viewer feature, check the option 'Enable Memory Viewer' when you upload your profile:\n\n.. image:: /tools/neuron-explorer/images/memory_viewer_enable.png\n\nView the Memory Viewer Widget\n------------------------------\n\nOnce your profile finishes processing and is ready to view, click the Add Widget button and select 'Memory Viewer':\n\n.. image:: /tools/neuron-explorer/images/memory_viewer_add_widget.png\n\n\nBy hovering your mouse over each allocation, you can see the detailed information about this allocation. For allocations triggered by instructions, hover informations includes: \n* Start time and end time\n* Duration\n* Start address and end address \n* Opcode\n* Operands \n\nFor allocations triggered by DMAs, hover information includes: \n* Partition number \n* Start time and end time \n* Duration\n* Start address and end address \n* DMA queue name \n* Block ID\n\nBy analyzing memory allocations, you can address memory fragmentation by identifying sparse allocation patterns and potentially rescheduling instructions or DMAs to different addresses to maintain memory compactness. Additionally, you can perform spill/reload analysis to identify opportunities for reducing spills by relocating allocations to available space at alternative addresses.\n\nYou can also use the dropdown menu to inspect the memory allocations on different partitions and NC cores:  \n\n.. image:: /tools/neuron-explorer/images/memory_viewer_hover.png\n"
  },
  {
    "path": "tools/neuron-explorer/overview-summary-page.rst",
    "content": ".. meta::\n    :description: Learn how to use the Neuron Explorer summary page to quickly identify performance issues, view key metrics, and get actionable optimization recommendations for your profiles.\n    :date-modified: 03/20/2026\n\nSummary Viewer\n================\n\nThe Neuron Explorer summary viewer provides a streamlined view of your profile's most critical performance insights, enabling quick identification of issues and optimization opportunities without navigating through detailed data.\n\n.. image:: /tools/profiler/images/explorer-summary-page.png\n\nBenefits\n--------\n\nBoth new and experienced users benefit from this streamlined view of profiling data.\n\n* Identify performance issues quickly\n* Understand your profile's most critical metrics at a glance\n* Get actionable recommendations for optimization\n\nHow to use\n-------------\n\n1. **Open your profile** - The Summary Viewer is accessible via the Profile Manager or Neuron Explorer UI.\n2. **Examine key metrics** - Review the metrics and graphs to understand your profile's performance characteristics.\n3. **Review recommendations** - Start with the **Performance Insights & Recommendations** section. This section highlights the most important performance issues.\n4. **Select specific time regions** - Use the \"Region Selection\" menu to view specific timeslices corresponding to network layers. This helps you drill down into specific sections of your profile. You can generate custom time regions using the \"Add Region\" button.\n5. **Take action** - Apply the recommended optimizations to your model or workload.\n\nUnderstanding region-level insights\n-----------------------------------\n\nWhen you work with profiles from entire networks or network subgraphs, different regions will have different performance characteristics. The landing page enables performance analysis on a per-layer basis and provides:\n\n* Layer-specific recommendations\n* Time-range indication of where problems occur\n* More accurate insights for complex profiles\n\nUse the 'Region Selection' menu to navigate between different layers and view their individual performance data.\n\nWhat the landing page displays\n------------------------------\n\nPerformance Insights and Recommendations\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis section shows 2-4 recommendations to help you improve performance. The profiler analyzes your data and identifies the most important issues to address. The profiler prioritizes recommendations by criticality and shows you the most critical ones first.\n\nExample recommendations\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
list-table::\n   :header-rows: 1\n   :widths: 30 35 35\n\n   * - Condition\n     - Root Cause\n     - Recommended Action\n   * - Low Model FLOPS relative to Active FLOPS (< 50%)\n     - Tensor engine is active but not performing useful matrix operations\n     - Ensure instructions use the entire tensor engine and are pipelined correctly\n   * - NKI instruction coverage < 50% on tensor, vector, or scalar engine\n     - Compiler-generated instructions dominate the engine\n     - Write NKI kernel code for the network operations present in that profile section\n   * - Active FLOPS throttling detected\n     - FLOPS lost due to throttling during active tensor engine periods\n     - Investigate the root cause of throttling to recover tensor engine utilization\n   * - Transpose FLOPS > 10% of total hardware FLOPS\n     - Excessive data movement within the tensor engine\n     - Improve memory layout to reduce transpose operations\n   * - Collective operation outliers detected\n     - Significantly underperforming collective operations relative to their group median\n     - Check for overlapping instructions that might be causing delays\n   * - Spill reload bytes > 25% of total HBM reads\n     - Excessive spill/reload operations consuming memory bandwidth\n     - Check for data dependencies causing excessive spill/reload operations\n\nKey Metrics\n~~~~~~~~~~~\n\nThis section displays tables and graphs that summarize your profile's performance metrics.\n\nCompute Performance Statistics\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* **total_time** - Total duration of on-device time for the run in seconds. This doesn't include host-device data movement overhead or host runtime/framework overhead.\n* **mm_arithmetic_intensity** - The ratio of regular Matrix Multiplication (MATMUL) Floating Point Operations (FLOPs) to total Dynamic Random Access Memory (DRAM) transfer size. This metric helps you determine if your workload is memory-bound or compute-bound.\n* **hfu_estimated_percent** - Hardware FLOPs Utilization reflects the Tensor Engine utilization calculated from all Tensor Engine instructions.\n* **mfu_estimated_percent** - Model FLOPs Utilization reflects the Tensor Engine utilization for useful compute (matrix multiplications from your model definition).\n\nMemory Bandwidth Utilization\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n* **total_bandwidth_available** - The total number of bytes that could be transferred within the given time region for the current Neuron hardware specification.\n* **mbu_estimated_percent** - Memory Bandwidth Utilization (MBU) shows the achieved High Bandwidth Memory (HBM) bandwidth utilization on the current Neuron hardware.\n* **average_dma_size** - The average DMA transfer size (higher is better).\n* **useful_read_percent** - The fraction of HBM reads that are useful: (``hbm_read_bytes`` - ``hbm_reload_bytes``) / ``hbm_read_bytes``. Note that \"useful\" here is not an inherent property of the memory itself, but a measurement of how efficiently the memory is being utilized by a specific workload or application. Low numbers may indicate inefficient memory access patterns and suboptimal layouts.\n\nFLOPs Utilization\n^^^^^^^^^^^^^^^^^\n\nFor each compute engine (tensor, vector, scalar, gpsimd), this section displays how well utilized the engine is. 
You can view all cores simultaneously or select a specific Neuron Core from the dropdown.\n\nTensor Engine\n\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nThe Tensor engine has a detailed breakdown of how the FLOPs are being used:\n\n* **model_flops**: The percentage of tensor flops spent performing useful matrix operations, contributing to model progress\n* **transpose_flops**: The percentage of tensor flops spent performing transpose operations / data movement\n* **active_flops** - Percentage of tensor flops that correspond to the ``active_time`` of the tensor engine, but where the engine was not effectively utilized.\n* **throttled_flops (active and inactive)** - Percentage of FLOPs wasted due to throttling, either during active or inactive tensor engine periods.\n\nThere are a few key things to look for in this graph:\n\n1. **model_flops relative to active_flops**. Large differences could indicate that the tensor engine is being poorly utilized with small tensor sizes, or that operations are not being pipelined effectively.\n2. **model_flops relative to transpose_flops**. It is desirable to have little to no ``transpose_flops`` consuming tensor engine utilization. Ideally, the ``model_flops`` amount is much larger than the amount of transposes.\n3. **active_throttled_flops**: FLOPs lost due to throttling during active periods are undesirable. It is worth identifying the root cause of the throttling if there is any indication of this happening.\n\nOther Engines (Scalar, Vector, GpSimd)\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nThese engines do not yet have detailed FLOP utilization breakdowns; they only show the active period of operation for the engine.\n\n* **active_flops** - Percentage of FLOPs when the engine processes at least one instruction (excluding semaphore waits).\n\nNKI Engine Statistics\n^^^^^^^^^^^^^^^^^^^^^\n\nThis chart shows the instruction count breakdown between NKI-generated instructions and compiler-generated instructions for each compute engine (tensor, vector, scalar). The stacked bar chart helps you understand how much of your workload is running NKI kernel code versus compiler-generated code.\n\nHovering over a bar displays a detailed breakdown of instruction counts by opcode for that engine and source type.\n\nWhen NKI instruction coverage is below 50% for a given engine, the summary page generates a recommendation to write NKI kernel code for the network operations in that profile section.\n\nDMA Utilization\n^^^^^^^^^^^^^^^\n\nThis chart shows how the DMA engines are being utilized, displayed as a percentage of the total available bandwidth. Two dropdown menus control the chart's aggregation:\n\n* **Outer aggregation** - Choose between viewing data per DMA engine (\"All Engines\") or per Neuron Core (\"Neuron Cores\").\n* **Inner aggregation** - Choose between grouping by data type or source type:\n\n  * **Data Type** groups transfers into Instruction, IO, Weights, and Dynamic categories.\n  * **Source Type** groups transfers into Static (compiler-generated), Software Dynamic (GpSimd-generated), and Hardware Dynamic (DGE hardware-generated) categories.\n\nEach category shows two bar segments: a solid bar representing bandwidth utilization and a striped bar representing active time utilization beyond the bandwidth portion. 
This helps distinguish between time spent transferring data and time the DMA engine is active but not fully utilizing bandwidth.\n\nMemory Bandwidth Breakdown\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nShows how the available HBM memory bandwidth was used as a doughnut chart:\n\n* HBM Read — effective read bytes (excluding spill reloads)\n* HBM Write — effective write bytes (excluding spill saves)\n* SBUF Spill Reload — bytes reloaded from HBM due to state buffer spills\n* SBUF Spill Save — bytes saved to HBM due to state buffer spills\n* Unused — remaining available bandwidth\n\nCollective Operations Duration\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nDisplays the duration of each collective operation in the profile, grouped by operation type and size. Two visualization modes are available via a dropdown:\n\n* **Scatter** - Shows individual operation durations as scatter points, with each operation type on a separate row. Hovering over a point displays detailed information including algorithm, operation, duration, start/end timestamps, element count, input/output sizes, and trigger engine. Clicking a point pins the tooltip for easy text selection.\n* **Box Plot** - Shows the statistical distribution (min, Q1, median, mean, Q3, max, variance, count) of operation durations per operation type. This is useful for quickly identifying the spread and central tendency of each operation group.\n\nBoth modes are useful for identifying outliers in collective runtime, which can be used to investigate specific sections of the profile more deeply. It is possible to filter out datasets by clicking on the datasets in the legend of the graph.\n\nSystem Information\n^^^^^^^^^^^^^^^^^^\n\nDisplays metadata about the system and software versions used during profiling:\n\n* Instance Type\n* Compiler Version\n* Explorer Version\n* Driver Version\n* Runtime Version\n* Collectives Version\n\nSystem Profile Summary\n======================\n\nWhen a system profile is loaded, the Summary Viewer automatically switches to the System Profile Summary view. System profiles capture data across multiple devices, processes, and instances, providing a holistic view of distributed workload performance.\n\nOverview\n--------\n\nThe System Profile Summary provides:\n\n* A high-level overview of the entire system's profiling session\n* HBM memory usage trends across logical NeuronCores\n* A detailed table of all device profiles with key performance metrics\n* The ability to drill down into individual device profiles for detailed analysis\n\nSystem Overview Card\n--------------------\n\nDisplays aggregate information about the profiling session:\n\n* **Instances** - Number of unique instances captured in the profile\n* **Processes** - Number of unique processes captured\n* **System Profile Time** - Total wall-clock duration of the system profiling session\n* **Total Device Runtime** - Cumulative on-device execution time across all device profiles\n* **Total Device Profiles** - Number of individual device profiles in the system profile\n\nHBM Memory Usage Chart\n-----------------------\n\nA line chart showing HBM memory usage over time. When per-NeuronCore data is available, the chart displays a separate line for each logical NeuronCore (HBM index), color-coded for easy identification. When only aggregate data is available, a single filled area chart shows total HBM usage.\n\nThe x-axis shows time (in the profiling session's time domain) and the y-axis shows memory usage in bytes. 
Hovering over the chart displays the exact timestamp and memory usage for each NeuronCore.\n\nDevice Profiles Table\n---------------------\n\nA table listing all device profiles captured in the system profile. The table supports:\n\n* **Process filtering** - Use the dropdown to filter profiles by process ID, or select \"All Processes\" to view everything.\n* **Expandable rows** - Click the expand arrow on any row to see additional per-profile metrics including tensor/vector/scalar engine active time percentages, DMA active time, and HBM read/write bytes.\n* **Column tooltips** - Hover over column headers to see descriptions of each metric from the profile schema.\n\nTable columns:\n\n* **Profile Name** - Clickable link that navigates to the detailed device profile view\n* **LNC** - Logical NeuronCore ID\n* **Neuron Cores** - Number of physical NeuronCores used by this profile\n* **Total Duration** - Total on-device execution time for this profile's events\n* **Calls** - Number of execution events for this profile\n* **Duration** - Total profiled time for this device profile\n* **MFU** - Model FLOPs Utilization\n* **HFU** - Hardware FLOPs Utilization\n* **MBU** - Memory Bandwidth Utilization\n* **CC Active** - Collective communication active time percentage\n\nDevice Profile Detail View\n--------------------------\n\nClicking a device profile name in the table navigates to a detail view that embeds the standard Summary Viewer for that specific device profile. This provides the full set of per-device metrics, charts, and recommendations described in the sections above.\n\nA \"Back to System Overview\" button at the top returns you to the system-level summary.\n"
  },
  {
    "path": "tools/neuron-explorer/overview-system-profiles.rst",
    "content": ".. meta::\n    :description: Learn about the System Profile in Neuron Explorer for analyzing system-level execution across instances and workers with runtime and hardware events.\n    :date-modified: 01/30/2026\n\nSystem Profile\n================\n\nThe Neuron System Profile show a system-level granularity of execution across instances and workers in your workload. This provides visibility into Neuron Runtime API calls and ML framework function calls (PyTorch or JAX) to help identify bottlenecks in distributed workloads. The Neuron Explorer UI provides system-level widgets for an extensible and customizable workflow.\n\n.. image:: /tools/neuron-explorer/images/neuron-explorer-system-viewer.png\n\nSystem Trace Viewer\n---------------------\n\nThe System Trace Viewer provides an interactive timeline interface with time range selection, configurable event grouping, system event details on hover, and linking of hardware events to Device Trace Viewer widgets.\n\nYou can see events in the Neuron Runtime and correlate them with hardware execution events on the Neuron Devices.\n\n.. image:: /tools/neuron-explorer/images/system-timeline-widget.png\n\nYou can also see the device memory (HBM) allocations for each Neuron device over time. Hovering over these memory usage events shows a breakdown by usage category.\n\n.. image:: /tools/neuron-explorer/images/system-timeline-widget-hbm-usage.png\n\n\nAdding Widgets\n---------------\nThe System Profile supports both System and Device widgets, enabling multi-profile analysis, for example comparing annotated device events across different devices.\n\nTo add a widget:\n\n1. Click the **Add Widget** button to open the Add Widget modal.\n2. Select a Device or System widget.\n3. Click a widget tile to load it with the selected profile. Each tile is tagged with its supported profile type (system, device, or both).\n\nTo load multiple instances of the same widget type for different profiles, repeat the steps above and select a different profile each time.\n\n.. image:: /tools/neuron-explorer/images/system-timeline-add-widget.gif\n\n\nAfter adding a widget, you can switch to a different profile by using the profile dropdown at the top of the widget.\n\n.. image:: /tools/neuron-explorer/images/widget_switch_profiles.png\n\n.. note::\n\n   Adding duplicate widgets for the same profile is not currently supported.\n\n\n\nSettings\n----------\n\nThe System Trace Viewer supports multiple grouping modes to organize events for different analysis perspectives.\nYou can switch between the following grouping modes in the settings to focus your analysis on different aspects of system performance:\n\n.. 
list-table:: Grouping Options\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Grouping Option\n     - Description\n     - Example\n   * - CPU vs Device Grouping (Default)\n     - Groups events by event source (CPU or Neuron device events)\n     - Runtime events: ``i-0b1ea78ca2865fd32/PID:1765325/TID:0/neuron_rt``, Hardware events: ``i-0b1ea78ca2865fd32/PID:1765325/Worker:0/neuron_hw``\n   * - NeuronCore Grouping\n     - Groups events by individual NeuronCore\n     - ``i-0b1ea78ca2865fd32/NC:0``, ``i-0b1ea78ca2865fd32/NC:1``\n   * - Thread Grouping\n     - Groups events by thread identifier\n     - ``i-0b1ea78ca2865fd32/PID:1765325/TID:0``\n   * - Process Grouping\n     - Groups events by process identifier\n     - ``i-0b1ea78ca2865fd32/PID:1765325``\n   * - Instance Grouping\n     - Groups all events by instance only\n     - ``i-0b1ea78ca2865fd32``\n\n.. image:: /tools/neuron-explorer/images/system-timeline-settings.png\n\nEvent Details\n--------------\n\nClicking on trace events in the timeline populates the Event Details widget with a list of properties for the system trace event.\n\n.. image:: /tools/neuron-explorer/images/system-event-details.png\n\nDevice Profile Linking\n------------------------\n\nThe System Trace Viewer links hardware events to the Device Trace Viewer, which renders the corresponding device traces.\n\nNavigating from the System Trace Viewer to a Device Trace Viewer can be accomplished in two ways:\n\nOpen the Device Profile List Modal\n------------------------------------\n\nTo see a list of all device profiles captured during your workload:\n\n1. **Click the \"Device Profiles List\" button** in the top right action bar of the System Trace Viewer to open a modal containing a list of device profiles\n2. **Select a Device Profile and click Submit** to open the Device Trace Viewer with the selected device profile\n\n.. image:: /tools/neuron-explorer/images/system-timeline-device-profiles-list-modal.png\n\nDrill-down from Hardware Events\n---------------------------------\n\nTo drill-down from a hardware event to the Device Trace Viewer:\n\n1. Find a hardware event such as ``nc_exec_running``\n2. Click on the hardware event\n3. Wait for the Device Trace Viewer to open\n\nThis will open a new Device Trace Viewer with the selected device profile showing detailed hardware events. To learn about device profiles, see :doc:`Device Profiles in Neuron Explorer <overview-device-profiles>`.\n\n.. image:: /tools/neuron-explorer/images/system-timeline-hardware-event-linking.gif\n"
  },
  {
    "path": "tools/neuron-explorer/overview-tensor-viewer.rst",
    "content": ".. meta::\n    :description: Learn about the Tensor Viewer in Neuron Explorer for viewing tensor information including names, sizes, shapes, and memory usage details.\n    :date-modified: 01/27/2026\n\n.. _tensor-viewer-overview:\n\nTensor Viewer\n=================\n\nThe Tensor Viewer contains the following information about all tensors in the NEFF file:\n\n* **variable_name** - The tensor name.\n* **type** - How the system uses the tensor. Examples include input tensor, output tensor, or weight tensor.\n* **format** - How the tensor arranges in memory. For example, \"NHWC\" shows a specific dimension arrangement. Letters include N (batch size), H (height), W (width), C (channel).\n* **shape** - The tensor's multi-dimensional shape.\n* **size** - The tensor's total size in bytes.\n* **node** - NEFF node.\n* **pcore_idx** - Index of the physical NeuronCore within a Logical NeuronCore (LNC). A Logical NeuronCore groups physical NeuronCores. For LNC2, this field shows either 0 or 1.\n* **load_to_sbuf_avg_size_bytes** - The average size in bytes of each DMA transfer when the system loads this tensor into the State Buffer.\n* **load_to_sbuf_total_size_bytes** - The total size in bytes of all DMA transfers when the system loads this tensor into the State Buffer.\n* **load_to_sbuf_dma_count** - The total number of DMAs that loaded this tensor into the State Buffer.\n* **load_to_sbuf_repeat_factor** - How many times the system loaded this tensor into the State Buffer. A value of 1 means one load, 2 means two loads, and so on.\n\n.. image:: /tools/profiler/images/tensor-viewer-table.png\n\nYou can use this data to match with framework-level instructions or for kernel development. You can also use it to search for instructions in the Device Timeline Viewer. \nThe SBUF loading information in the table can help you verify tensors are loaded efficiently.\n\nSearching\n---------\n\nYou can use the Tensor Viewer with the Device Timeline Viewer and Search tool to match tensor information in the table with instructions that run on the device. \nEnter the variable_name from the table, into the DMA search field to see all DMA instructions that relate to that tensor.\nThe example below shows a complete search for the tensor token_position_to_id:\n\n.. image:: /tools/profiler/images/tensor-viewer-search-example.png\n"
  },
  {
    "path": "tools/neuron-explorer/view-perfetto.rst",
    "content": ".. meta::\n    :description: Learn about using Neuron Explorer with Perfetto\n    :date-modified: 02/05/2026\n\nViewing Profiles with Perfetto\n==============================\n\n.. note::\n    New Neuron Explorer features released in 2.27 and onwards may not be supported in Perfetto. For the full user experience and features set, please use the Neuron Explorer UI or VSCode Integration.\n\nPerfetto is an open-source trace analysis toolkit with a powerful UI for visualizing and analyzing trace data.\nUsers of Neuron Profiler have the option of viewing their profiles in the Perfetto UI.\n\nThe ``--output-format perfetto`` option writes processed data to Perfetto's native protobuf-based tracing format which can be visualized in the Perfetto UI at https://ui.perfetto.dev/.\n\nExample:\n\n.. code-block:: shell\n\n    neuron-explorer view -d ./output --output-format perfetto\n\nThis will generate a ``system_profile.pftrace`` file for the system profile and a ``device_profile_model_<model_id>.pftrace`` file for each unique compiled model that was executed on a Neuron Device.\n\nTo view the system profile, go to https://ui.perfetto.dev/ and open the ``system_profile.pftrace`` file.\n\n.. note::\n    When loading trace files in the Perfetto UI, your data is processed locally and not uploaded to Perfetto’s servers.\n\n|neuron-explorer-perfetto-timeline|\n\nTo view a device profile go to https://ui.perfetto.dev/ and open the  ``device_profile_model_<model_id>.pftrace`` file. This will show a detailed view of hardware activity on the NeuronCore during execution of this graph.\n\n|neuron-explorer-perfetto-device-timeline|\n\n.. note::\n    Your browser may run out of memory when viewing ``*.pftrace`` (Perfetto trace) files that are more than a few hundred MB. See the section :ref:`Viewing Large Profiles in Perfetto <neuron-profile-large-perfetto-profiles>` for directions on how to view large traces using the trace processor.\n\n\nPerfetto Output View Options\n----------------------------\n\nWhen outputting to Perfetto it is possible to group your traces by different attributes. This is useful for\nlarger profiles involving many NeuronCores and instances. The following options are available:\n\n.. list-table:: Perfetto output view options\n     :header-rows: 1\n     :widths: 30 70\n\n     * - CLI option\n       - Description\n     * - ``--system-trace-primary-group``\n       - First-order grouping of trace events (maps to a Perfetto process / process group of rows). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``thread_id``, ``lnc_idx``, ``process_id``. Default: ``instance_id,process_id``.\n     * - ``--system-trace-secondary-group``\n       - Second-order grouping of trace events (maps to a Perfetto thread / single row). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``worker_gid``, ``thread_id``, ``lnc_idx``, ``process_id``. 
Default: ``worker_gid,lnc_idx, thread_id``.\n\n\nFor example, the following profile uses ``neuron-explorer view --output-format=perfetto --system-trace-primary-group=instance_id,process_id --system-trace-secondary-group=lnc_idx,thread_id`` to group the system profile first by unique combinations\nof instance_id and process_id, and then in each of those groups there are rows of events with unique combinations of lnc_idx and thread_id.\n\n|neuron-explorer-perfetto-grouping|\n\nGrouping By Global Worker ID\n----------------------------\n\nBy default, Perfetto traces are grouped by ``worker_gid`` which is a unique global identifier for each NeuronCore across all instances in a distributed workload.\nWhen clicking on an event in the trace you will see fields for both ``lnc_idx`` (local NeuronCore index on that process) and ``worker_gid`` (global NeuronCore index across all instances).\nIt is possible for ``lnc_idx`` to be the same for different processes on the same instance or across different instances in a distributed workload. However, ``worker_gid`` is unique for each NeuronCore across all instances.\nThe image below shows how to correlate the naming of tracks (rows) in the Perfetto UI to both ``lnc_idx`` and ``worker_gid``.\n\n|neuron-explorer-perfetto-gid|\n\n\n.. |neuron-explorer-perfetto-timeline| image:: /images/neuron-profiler2-perfetto-timeline.png\n.. |neuron-explorer-perfetto-device-timeline| image:: /images/neuron-profiler2-perfetto-device-timeline.png\n.. |neuron-explorer-perfetto-grouping| image:: /images/neuron-profiler2-perfetto-grouping.png\n.. |neuron-explorer-perfetto-gid| image:: /images/neuron-profiler2-perfetto-gid.png\n"
  },
  {
    "path": "tools/neuron-sys-tools/index.rst",
    "content": "System Tools\n============\n\nNeuron system tools provide essential utilities for monitoring, debugging, and managing AWS Neuron devices and workloads. These command-line tools offer real-time insights into device utilization, process management, hardware health, and performance metrics across Neuron instances.\n\n.. toctree:: \n    :maxdepth: 1\n    :hidden:\n\n    Neuron-Monitor User Guide </tools/neuron-sys-tools/neuron-monitor-user-guide>\n    Neuron-Top User Guide </tools/neuron-sys-tools/neuron-top-user-guide>\n    Neuron-LS User Guide </tools/neuron-sys-tools/neuron-ls>\n    Neuron-Sysfs User Guide </tools/neuron-sys-tools/neuron-sysfs-user-guide>\n    NCCOM-TEST User Guide </tools/neuron-sys-tools/nccom-test>\n    TensorBoard </tools/tensorboard/index>\n\n.. grid:: 1 1 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Neuron-Monitor User Guide\n      :link: /tools/neuron-sys-tools/neuron-monitor-user-guide\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Real-time monitoring tool for tracking NeuronCore utilization, memory usage, and thermal metrics across Neuron devices with customizable output formats.\n\n   .. grid-item-card:: Neuron-Top User Guide\n      :link: /tools/neuron-sys-tools/neuron-top-user-guide\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Interactive process viewer similar to htop that displays running processes on Neuron devices with real-time resource consumption metrics.\n\n   .. grid-item-card:: Neuron-LS User Guide\n      :link: /tools/neuron-sys-tools/neuron-ls\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Device discovery and listing tool that provides detailed information about available Neuron devices, their capabilities, and current status.\n\n   .. grid-item-card:: Neuron-Sysfs User Guide\n      :link: /tools/neuron-sys-tools/neuron-sysfs-user-guide\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Low-level system interface tool for accessing Neuron device information through the Linux sysfs filesystem interface.\n\n   .. grid-item-card:: NCCOM-TEST User Guide\n      :link: /tools/neuron-sys-tools/nccom-test\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Collective communication testing and benchmarking tool for validating and measuring performance of multi-device communication patterns.\n\n   .. grid-item-card:: TensorBoard\n      :link: /tools/tensorboard/index\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      TensorBoard Neuron plugin for Trn1 instances, including installation, configuration, and advanced visualization features.\n\n   .. grid-item-card:: Tutorials\n      :link: /tools/tutorials/index\n      :link-type: doc\n      :class-header: sd-bg-secondary sd-text-white\n\n      Tutorials for how to utilize the Neuron system tools suite.\n\n   .. grid-item-card:: What's New\n      :link: /release-notes/prev/2.27.0/index\n      :link-type: doc\n      :class-header: sd-bg-secondary sd-text-white\n\n      Latest updates, new features, and improvements to the Neuron system tools suite.\n"
  },
  {
    "path": "tools/neuron-sys-tools/nccom-test.rst",
    "content": ".. _nccom-test:\n\n======================\nNCCOM-TEST User Guide\n======================\n\n.. contents:: Table of contents\n    :local:\n    :depth: 2\n\nOverview\n--------\n\n**nccom-test** is a benchmarking tool for evaluating Collective Communication operations on AWS Trainium and Inferentia instances. It supports Trn1, Trn2, Trn3, and Inf2 instance types. The tool can assess performance across multiple instances or perform quick environment sanity checks before running more complex workloads. While single-instance benchmarking is supported for all compatible instance types, multi-instance benchmarking is limited to Trainium instances (Trn1, Trn2, and Trn3). \nTo execute collective operations, **nccom-test** will generate, and then execute, NEFFs (Neuron Executable File Format) containing several collective operation instructions.\n\n.. note::\n\n    On Inf2 instances, only single-instance benchmarking is supported. Running a multi-node nccom-test benchmark\n    will result in an error.\n\nUsing nccom-test\n----------------\n\nHere is a simple example which will run a 2 worker (ranks) all-reduce with a total size of 32MB:\n\n\n.. code-block::\n\n    nccom-test -r 2 allr\n         size(B)    count(elems)     type    time(us)    algbw(GB/s)    busbw(GB/s)\n        33554432        33554432    uint8         768          40.69          40.69\n    Avg bus bandwidth:      40.6901GB/s\n\n\nOutput description\n^^^^^^^^^^^^^^^^^^\n\nThe command will output a table containing several columns containing performance metrics.\nThere will be a line for every requested data size (by default the data size is 32MB as\nseen in the previous example).\n\n.. list-table::\n    :widths: 40 260\n    :header-rows: 1\n\n    * - Column name\n      - Description\n    * - size(B)\n      - Size in bytes for the data involved in this collective operation\n    * - count(elems)\n      - Number of elements in the data involved in this collective operation. For example, if **size(B)** is 4 and **type** is fp32,\n        then **count** will be 1 since one single fp32 element has been processed.\n    * - type\n      - Data type for the processed data. Can be: **uint8**, **int8**, **uint16**, **int16**, **fp16**, **bf16**, **int32**, **uint32**, **fp32**\n    * - time(us)\n      - Time in microseconds representing the average of all durations for the Collective Communication operations executed during the benchmark.\n    * - algbw(GB/s)\n      - Algorithm bandwidth in gibibytes (1GiB = 1,073,741,824 bytes) per second which is calculated as **size(B)** / **time(us)**\n    * - busbw(GB/s)\n      - Bus bandwidth - bandwidth per data line in gibibytes per second - it provides a bandwidth number that is independent from the number of ranks (unlike **algbw**).\n        For a more in-depth explanation on bus Bandwidth, please refer to `Bus Bandwidth Calculation`_\n    * - algorithm (optional)\n      - Algorithm used to execute this collective operation (e.g. Ring, Mesh, RDH)\n    * - Avg bus bandwidth\n      - Average of the values in the busbw column\n  \n\n.. 
_Bus Bandwidth Calculation:\n\n**Bus Bandwidth Calculation:**\n\nThe purpose of bus bandwidth is to provide a number reflecting how optimally hardware is used, normalizing for different rank counts.\n\nGiven the following:\n\n- ``r`` as the number of ranks participating in a collective operation\n- ``s`` as the size of the collective operation\n- ``B`` as the bus bandwidth of a single rank\n- ``t`` as the latency of the operation\n\nLet's take an AllGather operation as an example. To complete an AllGather operation with ``r`` ranks, each rank must transfer ``r-1`` data chunks of size ``s/r``. Therefore, with a bandwidth of ``B``, the latency (``t``)\nof the operation would be:\n\n.. code-block::\n\n    t = ((number of chunks to transfer) * (size of each chunk)) / (bandwidth of rank)\n    t = ((r-1) * (s/r)) / B\n\nHowever, for a given collective operation result, we have the latency, but not the bandwidth of each rank. Rearranging to solve for bus bandwidth, we get:\n\n.. code-block::\n\n  B = ((r-1) * (s/r)) / t\n\nwhich, given ``algbw = s / t``, can also be rewritten as:\n\n.. code-block::\n\n  B = ((r-1) / r) * algbw\n\nUsing this formula, we can calculate the bus bandwidth, ``B``, for an AllGather collective operation among ``r`` ranks with size ``s`` that took ``t`` seconds.\n\nWe can now directly compare the calculated bus bandwidth to the actual hardware bandwidth to see how well the hardware is being utilized. For different operations that transfer a different\nnumber of chunks, the bandwidth calculation changes slightly, with the factor applied to ``algbw`` (``(r-1) / r`` in the All-Gather case) changing depending on the collective operation:\n\n.. list-table::\n    :widths: 40 40\n    :header-rows: 1\n\n    * - Collective Operation\n      - Bus Bandwidth Factor\n    * - All-Reduce\n      - ``(2 * (r-1)) / r``\n    * - All-Gather\n      - ``(r-1) / r``\n    * - Reduce-Scatter\n      - ``(r-1) / r``\n    * - Send-Receive\n      - 1\n    * - All-to-All\n      - ``(r-1) / r``\n    * - Permute\n      - 1\n    * - All-to-Allv\n      - ``(r-1) / r``\n\n\n\nCLI arguments\n^^^^^^^^^^^^^\n\nRequired Arguments:\n~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - <cc operation>\n      - N/A, required argument\n      - The type of Collective Communication operation to execute for this benchmark.\n        Supported types:\n\n            - ``all_reduce`` / ``allr``: All-Reduce\n            - ``all_gather`` / ``allg``: All-Gather\n            - ``reduce_scatter`` / ``redsct``: Reduce-Scatter\n            - ``sendrecv``: Send-Receive\n            - ``alltoall``: All-to-All\n            - ``permute``: Permute\n            - ``alltoallv``: All-to-Allv (Currently only supported for inter-node configurations)\n    * - ``-r, --nworkers``\n      - N/A, required argument\n      - Total number of workers (ranks) to use\n\nBenchmark Configuration:\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``-N, --nnodes``\n      - 1\n      - Total number of nodes (instances) to use. The number of workers will be divided equally across all nodes.\n        If this argument is greater than 1, `MPI Execution`_ or `Slurm Execution`_ will need to be used.\n    * - ``-b, --minbytes``\n      - 32M\n      - The starting size for the benchmark\n    * - ``-e, --maxbytes``\n      - 32M\n      - The end size for the benchmark. 
**nccom-test** will run benchmarks for all sizes between ``-b, --minbytes`` and\n        ``-e, --maxbytes``, increasing the size by either ``-i, --stepbytes`` or ``-f, --stepfactor`` with every run.\n    * - ``-i, --stepbytes``\n      - (``--maxbytes`` - ``--minbytes``) / 10\n      - Number of bytes by which to increase the benchmark's size on every subsequent run.\n        For example, for this combination of arguments: ``-b 8 -e 16 -i 4``, the benchmark will\n        be run for the following sizes: 8 bytes, 12 bytes, 16 bytes.\n    * - ``-f, --stepfactor``\n      - N/A\n      - Factor by which to increase the benchmark's size on every subsequent run.\n        For example, for this combination of argument values: ``-b 8 -e 32 -f 2``, the benchmark will\n        be run for the following sizes: 8 bytes, 16 bytes, 32 bytes.\n\n.. note::\n\n    All arguments that take a size in bytes will also accept larger size units, for example:\n    ``-f 2048`` can be written as ``-f 2kb`` or ``-f 1048576`` can be written as ``-f 1MB``.\n\nIteration Configuration:\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``-n, --iters``\n      - 20\n      - Number of Collective Communication operations to execute during the benchmark.\n    * - ``-w, --warmup_iters``\n      - 5\n      - Number of Collective Communication operations to execute as warmup during the benchmark.\n        The warmup operations will execute prior to any of the measured operations and their performance will not be used to calculate the reported statistics.\n    * - ``-I, --neff_iters``\n      - N/A\n      - Number of times to execute the NEFF with Collective Communication operations during the benchmark.\n    * - ``-W, --neff_warmup_iters``\n      - N/A\n      - Number of times to execute the NEFF with Collective Communication operations as warmup during the benchmark. All collective operations in a warmup NEFF execution will be ignored when calculating statistics.\n\nTo execute collective operations, ``nccom-test`` will generate, and then execute, NEFFs (Neuron Executable File Format) containing several collective operation instructions.\nThe above flags control how many collective operations are generated, run, and measured.\n\nThere are two primary modes for controlling the number of collective operations run:\n\n1. If neither the ``neff_iters`` nor the ``neff_warmup_iters`` flag is supplied, ``iters + warmup_iters`` will be treated as the desired total number of\n   operations to be run. If necessary, ``nccom-test`` will spread this total number of operations out across several NEFFs.\n\n2. If the user desires more control over how collective operation execution should be organized, they should use the ``neff_iters`` and ``neff_warmup_iters``\n   flags. When these flags are used, the ``iters`` and ``warmup_iters`` flags instead represent the number of operations in a single NEFF. 
The NEFF itself will be repeatedly run\n   ``neff_iters + neff_warmup_iters`` times.\n\nExamples:\n\n- ``-n 15``, ``-w 5``, ``-I 10``, would result in 200 Collective Communication operations being run with 150 being measured:\n  The generated NEFF will have 20 (15 measured, 5 warmup) ops and the NEFF will be run 10 times.\n- ``-n 15``, ``-w 5``, ``-I 10``, ``-W 5``, would result in 300 Collective Communication operations being run with 150 being measured:\n  The generated NEFF will have 20 (15 measured, 5 warmup) ops and the NEFF will be run 15 (10 measured, 5 warmup) times\n    \n\nInput/Output Data:\n~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``-d, --datatype``\n      - ``uint8``\n      - Data type for the data used by the benchmark. Supported types: ``uint8``, ``int8``, ``uint16``, ``int16``,\n        ``fp16``, ``bf16``, ``uint32``, ``int32``, ``fp32``. Input data will be zero filled, unless ``--check`` is\n        provided in which case it will be filled with either pseudo-random data or ones.\n    * - ``-c, --check``\n      - N/A\n      - If provided, validates correctness of the operations. Can additionally specify options: ``random`` (default) or ``all_ones``.\n        For an explanation of these options, see `Data Integrity`_.\n        This will not impact device execution time and collective operation performance (time, algbw, and busbw),\n        but will slightly increase the overall execution time.\n    * - ``--seed``\n      - N/A\n      - Seed to use while generating pseudo-random data for ``random`` correctness check with ``--check`` flag\n    * - ``--unique-buff``\n      - false\n      - Use a unique buffer for the input and output of every collective operation. When using this flag, each collective operation in a NEFF will use a\n        different in-memory input/output buffer than every other operation. For All-Gather operations run with certain algorithms (e.g. Mesh, RDH),\n        there is additional handshaking for output buffers, and using unique buffers may improve collective operation performance.\n    * - ``--coalesced-cc-size-ratio``\n      - N/A\n      - List representing the ratio with which to split the input tensor into multiple tensors for coalesced, collective operations. Given a size of ``4MB`` and a ``coalesced-cc-size-ratio`` of ``[1,2,1]``, each collective\n        operation would actually consist of 3 parallel, coalesced operations of sizes: ``1MB``, ``2MB``, and ``1MB``.\n    * - ``--shared-output-buff``\n      - false\n      - For the CC operation, use a single, shared, HBM output buffer between 2 neuron cores in the same HBM domain.\n    * - ``--alltoallv-metadata``\n      - N/A\n      - For ``alltoallv`` collective operation, a ``json`` file containing send counts, send displacements, receive counts, and receive displacements for the collective operation. \n        Counts specify number of elements to send/receive between ranks, displacements specify where in buffer to send/receive data.\n        Length of count and displacement arrays should equal size of replica group over which ``alltoallv`` collective operation is performed. \n        If one metadata entry is provided, it applies to all ranks, otherwise, specify one entry per rank. `AlltoAllV Example`_.\n\n.. _Data Integrity:\n\nData Integrity:\n\nIf the ``--check`` flag is provided when running ``nccom-test``, the correctness of the CC operations will be verified. 
There are currently two modes for verification: ``random`` (the default used when only ``--check`` is provided)\nand ``all_ones``.\n\n1. The ``random`` mode will fill each input tensor with pseudo-random data and then, on the CPU, calculate an expected golden output. After collective operation execution,\n   the output tensor of the operation will be compared against the calculated golden tensor. For non-integral types (e.g. ``fp16``, ``fp32``), golden comparison will use tolerances.\n   For operations in which all participating ranks should finish with identical outputs (e.g. ``allr``, ``allg``), there will also be a check between ranks to ensure this.\n   If the ``random`` check fails, input, output, and golden tensors will be saved to disk for further investigation. The ``--seed`` flag can be used to set the seed for the\n   pseudo-random input tensor generation. Otherwise, the seed value will be based on the current time and logged.\n\n2. The ``all_ones`` mode will fill each input tensor with the value ``1``. A single golden value, ``G``, will be calculated based on the operation. For example, the golden value ``G``\n   for an All-Reduce with 16 ranks will be ``16``. After operation execution, ``nccom-test`` will verify each output tensor is filled with ``G``.\n\n``random`` mode should be preferred for more rigorous verification, while ``all_ones`` is better suited to quicker, more easily understood verification.\n\n.. _MPI Execution:\n\nMPI Execution:\n~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``-s, --hosts``\n      - N/A\n      - Hosts on which to run execution.\n    * - ``--hosts-file``\n      - N/A\n      - File containing hosts on which to run execution. One host specified per line.\n    * - ``--mpi-log-dir``\n      - N/A\n      - If specified, logs from each node in ``mpi`` multi-node benchmark will be saved to a unique file within the specified directory\n\nTo use ``mpi`` mode, provide all hosts for your invocation, either with the ``--hosts`` flag or a ``~/hosts`` file, and set the ``NEURON_RT_ROOT_COMM_ID`` environment variable to the IP address of the first host listed and any free port.\nDepending on your environment, ``mpi`` may require passwordless SSH access to each host in your invocation. See the `Open MPI SSH documentation <https://docs.open-mpi.org/en/v5.0.x/launching-apps/ssh.html#launching-with-ssh>`_ for details.\n\nExample:\n\n``NEURON_RT_ROOT_COMM_ID=10.1.4.145:45654 nccom-test -r 64 -N 2 -d fp32 allr --hosts 10.1.4.145 10.1.4.138``\n\nThe above command will invoke a ``neuron-bench`` process on both hosts listed to execute the collective operations, using 32 ranks from each host.\nLatency data will be reported back from each host and collected on the host on which the ``nccom-test`` command was invoked.\nThe host on which the ``nccom-test`` command is invoked should usually be one of the provided hosts, but it can be another unrelated host, as long as it can invoke MPI processes\non the provided hosts.\n
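\nFor example, the same benchmark can be launched by listing the hosts in a file and passing it with ``--hosts-file`` (the file name and IP addresses below are placeholders):\n\n.. code-block::\n\n    cat hosts.txt\n    10.1.4.145\n    10.1.4.138\n\n    NEURON_RT_ROOT_COMM_ID=10.1.4.145:45654 nccom-test -r 64 -N 2 -d fp32 allr --hosts-file hosts.txt\n\n\n.. _Slurm Execution:\n\nSlurm Execution:\n~~~~~~~~~~~~~~~~\n\n.. 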
list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``-S, --slurm-mode``\n      - false\n      - Use ``srun`` to run benchmark on ``slurm``-based cluster\n    * - ``-u, --slurm-vcpus-per-node``\n      - Minimum CPU count amongst all nodes\n      - Number of vCPUs available per node in ``slurm`` allocation\n    * - ``--slurm-setup-script``\n      - N/A\n      - Script to run on each node in ``slurm`` allocation before executing benchmark. Can use ``default`` to run\n        a default script installing the latest Neuron software.\n    * - ``--slurm-job-id``\n      - alloc\n      - Specify jobId for ``slurm`` allocation to execute benchmark on. By default, will create a new allocation to execute benchmark on.\n    * - ``--slurm-use-head-node-neuron-bench``\n      - false\n      - Copy ``neuron-bench`` binary from head node to all nodes in allocation\n\nTo use ``slurm`` mode, specify the ``--slurm-mode`` flag. When using slurm mode, ``nccom-test`` invocations should be run from the head node of the slurm cluster.\nUsers can either use an existing slurm job by providing a job ID, or have ``nccom-test`` allocate one for them.\nAdditionally, users can provide a path to a setup script to run on each slurm node before execution. Users can alternatively specify ``default`` to use a supplied default setup script.\n\nExamples:\n\n``nccom-test -r 64 -N 2 allr --slurm-mode --slurm-setup-script path/to/my/custom-setup-script.sh``\n\nThe above command will execute collective operations across two nodes using Slurm. Slurm will allocate a job with two nodes before beginning execution and will run the ``custom-setup-script.sh``\non each node before executing any collective operations.\n\n\n``nccom-test -r 64 -N 2 allr --slurm-mode --slurm-job-id 12345``\n\nThe above command will use an existing slurm allocation (``jobId: 12345``) with no setup.\n\n\nOutput:\n~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``--non-interactive``\n      - false\n      - Do not display any animation or progress indicator.\n    * - ``--report-to-json-file``\n      - N/A\n      - Persist config and results to specified JSON file if a filepath is provided.\n    * - ``-t, --stats``\n      - avg\n      - Latency (time) statistics to display in the final output. Currently supports ``avg`` and any percentile (e.g. ``p15``, ``p50``, ``p90``).\n    * - ``--show-algorithm``\n      - false\n      - Show which algorithm (e.g. Ring, Mesh, RDH) was used to execute the collective operation in ``nccom-test`` output.\n        Currently, any hierarchical algorithms used will be displayed as ``hier``, and will not include any sub-algorithms.\n    * - ``--show-input-output-size``\n      - false\n      - Print (or save to JSON) per-rank input and output sizes in bytes.\n    * - ``--debug``\n      - false\n      - Show debug logs from execution of ``nccom-test`` and ``neuron-bench`` in realtime. Enables ``non-interactive`` mode implicitly.\n\n\nSBUF Collectives:\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``--sb2sb``\n      - false\n      - Indicates whether to allocate input, output, and scratch-buffer on SBUF (rather than HBM). 
This may result in improved performance.\n    * - ``--input-shape``\n      - N/A\n      - Provide input tensor dimensions in format: ``[step0,step1][num_elem0,num_elem1]``. ``step0/num_elem0`` correspond to the free dimension of the SBUF, while ``step1/num_elem1`` correspond to the partition dimension of the SBUF.\n    * - ``--output-shape``\n      - N/A\n      - Provide output tensor dimensions in format: ``[step0,step1][num_elem0,num_elem1]``. ``step0/num_elem0`` correspond to the free dimension of the SBUF, while ``step1/num_elem1`` correspond to the partition dimension of the SBUF.\n    * - ``--cc-dim``\n      - 1\n      - Control dimensions of tensor concatenation. Either concatenate tensor in free dimension (``cc-dim = 0``) or concatenate in partition dimension first and wrap around in free dimension second (``cc-dim = 1``)\n\n\nReplica Group:\n~~~~~~~~~~~~~~\n\nFlags to control which subset of ranks a collective operation will be executed on.\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    * - ``--data-parallel-dimension``\n      - N/A\n      - Run the given collective operation in parallel across multiple sub-groups of size ``data-parallel-dimension``. For 128 ranks and data parallel dimension of 2, \n        there would be 64 parallel collective operations happening at the same time, each with 2 ranks. Primarily intended for multi-node executions with one-rank-per-node\n        replica groups.\n    * - ``--custom-replica-group``\n      - N/A\n      - Provide the JSON file for custom-defined replica groups.\n    * - ``--custom-src-target-pairs``\n      - N/A\n      - Provide the JSON file for custom-defined source_target_pairs for the collective permute operation.\n\nAdditional Flags:\n~~~~~~~~~~~~~~~~~\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Argument\n      - Default value\n      - Description\n    \n    * - ``--vcpu-pin-mode``\n      - false\n      - Pin CPU thread for each rank to a given CPU.  \n    * - ``--data-collector-port``\n      - 60006\n      - If running ``nccom-test`` in multi-node mode or on another node, a data collector is used to gather latencies from all nodes in benchmark.\n        Port to use for data collector.\n    * - ``--data-collector-host``\n      - current host\n      - Hostname or IP address of node to use as data collector, all latencies from other nodes will be sent to this host\n\nEnvironment Variables\n^^^^^^^^^^^^^^^^^^^^^\nIn addition to CLI arguments, there are also several environment variables which can be used to alter how collectives run inside ``nccom-test``\n\n.. list-table::\n    :widths: 40 80 260\n    :header-rows: 1\n\n    * - Environment Variable\n      - Default value\n      - Description\n    * - ``NEURON_LOGICAL_NC_CONFIG``\n      - 2 for ``trn2`` and ``trn3``. 1 for ``inf2`` and ``trn1``\n      - Controls how many physical NeuronCores are grouped to make up a logical NeuronCore.\n\nUsers may also find certain Neuron Runtime environment variables useful with ``nccom-test`` executions. See :ref:`nrt-configuration`\n\nExamples\n^^^^^^^^\n\n.. note::\n\n    Performance data shown in these examples should not be considered up-to-date. For the latest performance\n    data, please refer to the performance section.\n\n\nSingle Instance Examples\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n- Quick environment validation\n\n    .. 
code-block::\n\n        nccom-test -r 2 allr\n            size(B)    count(elems)     type    time(us)    algbw(GB/s)    busbw(GB/s)\n            33554432        33554432    uint8         768          40.69          40.69\n        Avg bus bandwidth:      40.6901GB/s\n\n\n    If a problem was found, it can be reported in two possible ways:\n\n    - Immediately:\n\n        .. code-block::\n\n            nccom-test -r 2 allr\n            Neuron DKMS Driver is not running! Read the troubleshooting guide at: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html#neuron-driver-installation-fails\n\n\n    - After a benchmark attempt:\n\n        .. code-block::\n\n            nccom-test -r 2 allr\n                 size(B)    count(elems)    type    time(us)    algbw(GB/s)    busbw(GB/s)\n                33554432    Failure running neuron-bench - log file /tmp/nccom_test_log_7pqpdfjf.log\n            1 errors found - test failed\n\n\n        In this case, further information about the error can be found in the ``neuron-bench`` log file.\n\n- 2-rank all-reduce on a single instance for sizes ranging from 1KiB to 1GiB with a step of 4x\n\n    .. code-block::\n\n        nccom-test -r 2 --minbytes 1kb --maxbytes 1gb --stepfactor 4 --datatype fp32 allr\n               size(B)    count(elems)    type    time(us)    algbw(GB/s)    busbw(GB/s)\n                  1024             256    fp32          58           0.02           0.02\n                  4096            1024    fp32          58           0.07           0.07\n                 16384            4096    fp32          58           0.26           0.26\n                 65536           16384    fp32          58           1.05           1.05\n                262144           65536    fp32          60           4.07           4.07\n               1048576          262144    fp32          68          14.36          14.36\n               4194304         1048576    fp32         107          36.51          36.51\n              16777216         4194304    fp32         332          47.06          47.06\n              67108864        16777216    fp32        1214          51.48          51.48\n             268435456        67108864    fp32        4750          52.63          52.63\n            1073741824       268435456    fp32       18930          52.83          52.83\n        Avg bus bandwidth:      23.6671GB/s\n\n\n- 32-rank all-gather on a single instance for sizes ranging from 1KiB to 1MiB with a step of 8x, with correctness checking\n\n\n.. code-block::\n\n        nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg\n        size(B)    count(elems)    type    time(us)    algbw(GB/s)    busbw(GB/s)\n        1024             256    fp32         151           0.01           0.01\n        8192            2048    fp32         149           0.05           0.05\n       65536           16384    fp32         150           0.41           0.39\n      524288          131072    fp32         179           2.73           2.64\n    Avg bus bandwidth:      0.7731GB/s\n\n- Specify the custom source target pairs as a JSON file for the collective permute operation using ``--custom-src-target-pairs``.\n\n.. 
code-block::\n\n    nccom-test -r 8 --custom-src-target-pairs pairs.json permute\n    size(B)    count(elems)     type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n    33554432        33554432    uint8          894.24          37.52          37.52\n    Avg bus bandwidth:\t37.5230GB/s\n\n    cat pairs.json\n    {\n        \"src_target_pairs\": [\n            [\n                [0, 1],\n                [1, 0],\n                [2, 3],\n                [3, 2],\n                [4, 4],\n                [5, 5],\n                [6, 6],\n                [7, 7]\n            ]\n        ]\n    }\n\n\n- Reporting the input and output size explicitly with ``--show-input-output-size``.\n\n.. code-block::\n\n    nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg --show-input-output-size\n    size(B)    count(elems)    total_input_size(B)    total_output_size(B)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n       1024             256                     32                    1024    fp32            6.16           0.17           0.16\n       8192            2048                    256                    8192    fp32            6.48           1.26           1.23\n      65536           16384                   2048                   65536    fp32            8.17           8.02           7.77\n     524288          131072                  16384                  524288    fp32           23.16          22.64          21.93\n    Avg bus bandwidth:      7.7715GB/s\n\n- Getting percentile latency results with ``--stats``\n\n.. code-block::\n\n    nccom-test -r 8 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --stats avg p25 p50 p90 p99 --iters 1000 allg\n    size(B)    count(elems)    type    time:avg(us)    time:p25(us)    time:p50(us)    time:p90(us)    time:p99(us)    algbw(GB/s)    busbw(GB/s)\n       1024             256    fp32            10.0              10              10              11              12           0.10           0.09\n       8192            2048    fp32           10.22              10              10              11              12           0.80           0.70\n      65536           16384    fp32           11.31              11              11              13              13           5.80           5.07\n     524288          131072    fp32           14.83              14              15              16              17          35.34          30.92\n    Avg bus bandwidth:\t9.1966GB/s\n\n- Example results as JSON with ``--report-to-json-file``\n\n.. 
code-block::\n\n    nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg --report-to-json-file nccom-results.json\n    size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n       1024             256    fp32            6.19           0.17           0.16\n       8192            2048    fp32            6.55           1.25           1.21\n      65536           16384    fp32            8.18           8.01           7.76\n     524288          131072    fp32           23.11          22.69          21.98\n    Avg bus bandwidth:      7.7775GB/s\n\n    python3 -m json.tool nccom-results.json\n    {\n        \"results\": [\n            {\n                \"size(B)\": 1024,\n                \"count(elems)\": 256,\n                \"type\": \"fp32\",\n                \"algbw(GB/s)\": 0.16553675170497603,\n                \"busbw(GB/s)\": 0.16036372821419553,\n                \"time:avg(us)\": 6.19\n            },\n            {\n                \"size(B)\": 8192,\n                \"count(elems)\": 2048,\n                \"type\": \"fp32\",\n                \"algbw(GB/s)\": 1.2500906056270864,\n                \"busbw(GB/s)\": 1.21102527420124,\n                \"time:avg(us)\": 6.55\n            },\n            {\n                \"size(B)\": 65536,\n                \"count(elems)\": 16384,\n                \"type\": \"fp32\",\n                \"algbw(GB/s)\": 8.008982241741455,\n                \"busbw(GB/s)\": 7.758701546687035,\n                \"time:avg(us)\": 8.18\n            },\n            {\n                \"size(B)\": 524288,\n                \"count(elems)\": 131072,\n                \"type\": \"fp32\",\n                \"algbw(GB/s)\": 22.688776793562784,\n                \"busbw(GB/s)\": 21.97975251876395,\n                \"time:avg(us)\": 23.11\n            }\n        ]\n    }\n\n- Example results with ``--show-algorithm`` flag\n\n.. code-block::\n\n    nccom-test -r 16 allr -b 4 -e 1gb -f 16 -d fp32 --show-algorithm\n    size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)    algorithm\n            4               1    fp32          299.91           0.00           0.00         mesh\n           32               8    fp32          299.69           0.00           0.00         mesh\n          512             128    fp32          299.82           0.00           0.00         mesh\n         8192            2048    fp32          299.74           0.03           0.05         mesh\n       131072           32768    fp32          574.15           0.23           0.43         mesh\n      2097152          524288    fp32          686.32           3.06           5.73          rdh\n     33554432         8388608    fp32         2754.15          12.18          22.84    kangaring\n    536870912       134217728    fp32         9689.51          55.41         103.89    kangaring\n    Avg bus bandwidth:\t16.6181GB/s\n\n\nMultiple Instances Example\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- 64 rank all-reduce on two instances for sizes ranging from 8 bytes to 1GiB with a step of 2x, running 50 ops\n\n    .. 
code-block::\n\n        NEURON_RT_ROOT_COMM_ID=10.1.4.145:45654 nccom-test -r 64 -N 2 -b 8 -e 1GB -f 2 -n 50 -w 5 -d fp32 allr --hosts 127.0.0.1 10.1.4.138\n               size(B)    count(elems)    type    time(us)    algbw(GB/s)    busbw(GB/s)\n                     8               2    fp32         520           0.00           0.00\n                    16               4    fp32         520           0.00           0.00\n                    32               8    fp32         523           0.00           0.00\n                    64              16    fp32         525           0.00           0.00\n                   128              32    fp32         553           0.00           0.00\n                   256              64    fp32         709           0.00           0.00\n                   512             128    fp32         782           0.00           0.00\n                  1024             256    fp32         840           0.00           0.00\n                  2048             512    fp32         881           0.00           0.00\n                  4096            1024    fp32         916           0.00           0.01\n                  8192            2048    fp32        1013           0.01           0.01\n                 16384            4096    fp32        1031           0.01           0.03\n                 32768            8192    fp32        1174           0.03           0.05\n                 65536           16384    fp32        1315           0.05           0.09\n                131072           32768    fp32        1315           0.09           0.18\n                262144           65536    fp32        1311           0.19           0.37\n                524288          131072    fp32        1312           0.37           0.73\n               1048576          262144    fp32        1328           0.74           1.45\n               2097152          524288    fp32        1329           1.47           2.89\n               4194304         1048576    fp32        1378           2.83           5.58\n               8388608         2097152    fp32        1419           5.51          10.84\n              16777216         4194304    fp32        2138           7.31          14.39\n              33554432         8388608    fp32        2711          11.53          22.69\n              67108864        16777216    fp32        3963          15.77          31.05\n             134217728        33554432    fp32        6279          19.91          39.19\n             268435456        67108864    fp32       11954          20.91          41.17\n             536870912       134217728    fp32       21803          22.93          45.15\n            1073741824       268435456    fp32       41806          23.92          47.09\n        Avg bus bandwidth:      9.3924GB/s\n\n\n.. _AlltoAllV Example:\n- Specify alltoallv-metadata as JSON for ``alltoallv`` operation ``--alltoallv-metadata``.\n.. 
code-block::\n\n    NEURON_RT_ROOT_COMM_ID=172.32.137.79:44444 nccom-test -r 2 -N 2 -d fp32 alltoallv -b 1MB -e 1MB --hosts 127.0.0.1 172.32.253.16 --alltoallv-metadata alltoallv_metadata.json\n    size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n    1048608          262152    fp32          955.05           1.10           0.55\n    Avg bus bandwidth:\t0.5490GB/s\n\n    cat alltoallv_metadata.json\n    {\n      \"alltoallv_metadata\": [\n        {\n          \"send_counts\": [512, 1024],\n          \"send_displs\": [0, 512],\n          \"recv_counts\": [256, 768],\n          \"recv_displs\": [0, 256]\n        }\n      ]\n    }\n"
  },
  {
    "path": "tools/neuron-sys-tools/neuron-ls.rst",
    "content": ".. _neuron-ls-ug:\n\nNeuron LS User Guide\n---------------------\n\nThe neuron-ls command is a tool for managing Neuron devices in your instance.\nThis command serves two key purposes: it identifies all Neuron devices present in the current instance \nand provides information about the processes running on each device along with the command that launched that process.\nTo use this command, simply type ``neuron-ls`` in your terminal.\n\n.. rubric:: neuron-ls CLI\n\n.. code-block:: text\n\n    neuron-ls [options]\n\n**Options**\n\n``--wide, -w``\n    Displays the table in a wider format.\n\n``--show-all-procs, -a``\n    Show all processes using the Neuron Devices, including processes that aren't using\n    Neuron Runtime 2.x such as ``neuron-monitor`` or ``neuron-ls`` itself.\n\n``--topology, -t``\n    Display topology information about the system's Neuron Devices.\n\n``--json-output, -j``\n    Output in JSON format.\n\n.. note::\n\n  ``neuron-ls`` fully supports the newly launched Trn2 instances.\n\nExamples\n^^^^^^^^\n\n``neuron-ls`` is compatible with all Neuron instance types: inf1, inf2, trn1 and trn2.\nThese are a few examples on running the tool on a trn2n.48xlarge:\n\n::\n\n  $ neuron-ls\n  instance-type: trn2n.48xlarge\n  instance-id: i-aabbccdd123456789\n  logical-neuroncore-config: 2\n  +--------+--------+----------+--------+---------------+--------------+---------------+------+\n  | NEURON | NEURON |  NEURON  | NEURON |   CONNECTED   |     PCI      |      CPU      | NUMA |\n  | DEVICE | CORES  | CORE IDS | MEMORY |    DEVICES    |     BDF      |   AFFINITY    | NODE |\n  +--------+--------+----------+--------+---------------+--------------+---------------+------+\n  | 0      | 4      | 0-3      | 96 GB  | 12, 3, 4, 1   | 0000:cc:00.0 | 48-95,144-191 | 1    |\n  | 1      | 4      | 4-7      | 96 GB  | 13, 0, 5, 2   | 0000:b5:00.0 | 48-95,144-191 | 1    |\n  | 2      | 4      | 8-11     | 96 GB  | 14, 1, 6, 3   | 0000:b6:00.0 | 48-95,144-191 | 1    |\n  | 3      | 4      | 12-15    | 96 GB  | 15, 2, 7, 0   | 0000:cb:00.0 | 48-95,144-191 | 1    |\n  | 4      | 4      | 16-19    | 96 GB  | 0, 7, 8, 5    | 0000:6f:00.0 | 0-47,96-143   | 0    |\n  | 5      | 4      | 20-23    | 96 GB  | 1, 4, 9, 6    | 0000:58:00.0 | 0-47,96-143   | 0    |\n  | 6      | 4      | 24-27    | 96 GB  | 2, 5, 10, 7   | 0000:59:00.0 | 0-47,96-143   | 0    |\n  | 7      | 4      | 28-31    | 96 GB  | 3, 6, 11, 4   | 0000:6e:00.0 | 0-47,96-143   | 0    |\n  | 8      | 4      | 32-35    | 96 GB  | 4, 11, 12, 9  | 0000:9b:00.0 | 0-47,96-143   | 0    |\n  | 9      | 4      | 36-39    | 96 GB  | 5, 8, 13, 10  | 0000:84:00.0 | 0-47,96-143   | 0    |\n  | 10     | 4      | 40-43    | 96 GB  | 6, 9, 14, 11  | 0000:85:00.0 | 0-47,96-143   | 0    |\n  | 11     | 4      | 44-47    | 96 GB  | 7, 10, 15, 8  | 0000:9a:00.0 | 0-47,96-143   | 0    |\n  | 12     | 4      | 48-51    | 96 GB  | 8, 15, 0, 13  | 0000:f8:00.0 | 48-95,144-191 | 1    |\n  | 13     | 4      | 52-55    | 96 GB  | 9, 12, 1, 14  | 0000:e1:00.0 | 48-95,144-191 | 1    |\n  | 14     | 4      | 56-59    | 96 GB  | 10, 13, 2, 15 | 0000:e2:00.0 | 48-95,144-191 | 1    |\n  | 15     | 4      | 60-63    | 96 GB  | 11, 14, 3, 12 | 0000:f7:00.0 | 48-95,144-191 | 1    |\n  +--------+--------+----------+--------+---------------+--------------+---------------+------+\n\n::\n\n  $ neuron-ls --wide\n  instance-type: trn2n.48xlarge\n  instance-id: i-aabbccdd123456789\n  logical-neuroncore-config: 2\n  
+--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+\n  | NEURON | NEURON | NEURON |   CONNECTED   |   PCI   |  PID   |                                     COMMAND                                      | RUNTIME |\n  | DEVICE | CORES  | MEMORY |    DEVICES    |   BDF   |        |                                                                                  | VERSION |\n  +--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+\n  | 0      | 4      | 96 GB  | 12, 3, 4, 1   | cc:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 1      | 4      | 96 GB  | 13, 0, 5, 2   | b5:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 2      | 4      | 96 GB  | 14, 1, 6, 3   | b6:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 3      | 4      | 96 GB  | 15, 2, 7, 0   | cb:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 4      | 4      | 96 GB  | 0, 7, 8, 5    | 6f:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 5      | 4      | 96 GB  | 1, 4, 9, 6    | 58:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 6      | 4      | 96 GB  | 2, 5, 10, 7   | 59:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 7      | 4      | 96 GB  | 3, 6, 11, 4   | 6e:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 8      | 4      | 96 GB  | 4, 11, 12, 9  | 9b:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 9      | 4      | 96 GB  | 5, 8, 13, 10  | 84:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 10     | 4      | 96 GB  | 6, 9, 14, 11  | 85:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 11     | 4      | 96 GB  | 7, 10, 15, 8  | 9a:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 12     | 4      | 96 GB  | 8, 15, 0, 13  | f8:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 13     | 4      | 96 GB  | 9, 12, 1, 14  | e1:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 14     | 4      | 96 GB  | 10, 13, 2, 15 | e2:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0   |\n  | 15     | 4      | 96 GB  | 11, 14, 3, 12 | f7:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... 
| 2.0.0   |\n  +--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+\n\n::\n\n  $ neuron-ls --show-all-procs\n  instance-type: trn2n.48xlarge\n  instance-id: i-aabbccdd123456789\n  logical-neuroncore-config: 2\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | NEURON | NEURON | NEURON |   CONNECTED   |   PCI   |  PID   |                 COMMAND                  | RUNTIME |\n  | DEVICE | CORES  | MEMORY |    DEVICES    |   BDF   |        |                                          | VERSION |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 0      | 4      | 96 GB  | 12, 3, 4, 1   | cc:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 1      | 4      | 96 GB  | 13, 0, 5, 2   | b5:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 2      | 4      | 96 GB  | 14, 1, 6, 3   | b6:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 3      | 4      | 96 GB  | 15, 2, 7, 0   | cb:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 4      | 4      | 96 GB  | 0, 7, 8, 5    | 6f:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 5      | 4      | 96 GB  | 1, 4, 9, 6    | 58:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 6      | 4      | 96 GB  | 2, 5, 10, 7   | 59:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 7      | 4      | 96 GB  | 3, 6, 11, 4   | 6e:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... 
| 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 8      | 4      | 96 GB  | 4, 11, 12, 9  | 9b:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 9      | 4      | 96 GB  | 5, 8, 13, 10  | 84:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 10     | 4      | 96 GB  | 6, 9, 14, 11  | 85:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 11     | 4      | 96 GB  | 7, 10, 15, 8  | 9a:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 12     | 4      | 96 GB  | 8, 15, 0, 13  | f8:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 13     | 4      | 96 GB  | 9, 12, 1, 14  | e1:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 14     | 4      | 96 GB  | 10, 13, 2, 15 | e2:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n  | 15     | 4      | 96 GB  | 11, 14, 3, 12 | f7:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... 
| 2.0.0   |\n  |        |        |        |               |         | 269192 | neuron-ls --show-all-procs               | NA      |\n  +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+\n\n::\n\n  $ neuron-ls --topology\n  instance-type: trn2n.48xlarge\n  instance-id: i-aabbccdd123456789\n  logical-neuroncore-config: 2\n  +--------+--------+--------+---------------+---------+\n  | NEURON | NEURON | NEURON |   CONNECTED   |   PCI   |\n  | DEVICE | CORES  | MEMORY |    DEVICES    |   BDF   |\n  +--------+--------+--------+---------------+---------+\n  | 0      | 4      | 96 GB  | 12, 3, 4, 1   | cc:00.0 |\n  | 1      | 4      | 96 GB  | 13, 0, 5, 2   | b5:00.0 |\n  | 2      | 4      | 96 GB  | 14, 1, 6, 3   | b6:00.0 |\n  | 3      | 4      | 96 GB  | 15, 2, 7, 0   | cb:00.0 |\n  | 4      | 4      | 96 GB  | 0, 7, 8, 5    | 6f:00.0 |\n  | 5      | 4      | 96 GB  | 1, 4, 9, 6    | 58:00.0 |\n  | 6      | 4      | 96 GB  | 2, 5, 10, 7   | 59:00.0 |\n  | 7      | 4      | 96 GB  | 3, 6, 11, 4   | 6e:00.0 |\n  | 8      | 4      | 96 GB  | 4, 11, 12, 9  | 9b:00.0 |\n  | 9      | 4      | 96 GB  | 5, 8, 13, 10  | 84:00.0 |\n  | 10     | 4      | 96 GB  | 6, 9, 14, 11  | 85:00.0 |\n  | 11     | 4      | 96 GB  | 7, 10, 15, 8  | 9a:00.0 |\n  | 12     | 4      | 96 GB  | 8, 15, 0, 13  | f8:00.0 |\n  | 13     | 4      | 96 GB  | 9, 12, 1, 14  | e1:00.0 |\n  | 14     | 4      | 96 GB  | 10, 13, 2, 15 | e2:00.0 |\n  | 15     | 4      | 96 GB  | 11, 14, 3, 12 | f7:00.0 |\n  +--------+--------+--------+---------------+---------+\n\n\n  Neuron Device Topology\n        *        *        *        *      \n        │        │        │        │      \n        ▼        ▼        ▼        ▼      \n  *––►[ 0 ]◄––►[ 1 ]◄––►[ 2 ]◄––►[ 3 ]◄––*\n        ▲        ▲        ▲        ▲      \n        │        │        │        │      \n        ▼        ▼        ▼        ▼      \n  *––►[ 4 ]◄––►[ 5 ]◄––►[ 6 ]◄––►[ 7 ]◄––*\n        ▲        ▲        ▲        ▲      \n        │        │        │        │      \n        ▼        ▼        ▼        ▼      \n  *––►[ 8 ]◄––►[ 9 ]◄––►[10 ]◄––►[11 ]◄––*\n        ▲        ▲        ▲        ▲      \n        │        │        │        │      \n        ▼        ▼        ▼        ▼      \n  *––►[12 ]◄––►[13 ]◄––►[14 ]◄––►[15 ]◄––*\n        ▲        ▲        ▲        ▲      \n        │        │        │        │      \n        *        *        *        *      \n\n  Legend:\n\n          *––► = Wrap-around link\n\n::\n\n  $ neuron-ls -j\n  [\n    {\n        \"neuron_device\": 0,\n        \"bdf\": \"cc:00.0\",\n        \"cpu_affinity\": \"48-95,144-191\",\n        \"numa_node\": \"1\",\n        \"connected_to\": [\n            12,\n            3,\n            4,\n            1\n        ],\n        \"nc_count\": 4,\n        \"logical_neuroncore_config\": 2,\n        \"memory_size\": 103079215104,\n        \"neuroncore_ids\": [\n            0,\n            1,\n            2,\n            3\n        ],\n        \"neuron_processes\": [\n            {\n                \"pid\": 113985,\n                \"command\": \"neuron-bench exec --run-as-cc-neff --...\",\n                \"neuron_runtime_version\": \"2.0.0\"\n            }\n        ]\n    },\n    ...\n    {\n        \"neuron_device\": 15,\n        \"bdf\": \"f7:00.0\",\n        \"cpu_affinity\": \"48-95,144-191\",\n        \"numa_node\": \"1\",\n        \"connected_to\": [\n            11,\n            14,\n            3,\n            12\n        ],\n        \"nc_count\": 
4,\n        \"logical_neuroncore_config\": 2,\n        \"memory_size\": 103079215104,\n        \"neuroncore_ids\": [\n            60,\n            61,\n            62,\n            63\n        ],\n        \"neuron_processes\": [\n            {\n                \"pid\": 113985,\n                \"command\": \"neuron-bench exec --run-as-cc-neff --...\",\n                \"neuron_runtime_version\": \"2.0.0\"\n            }\n        ]\n    }\n  ]\n\nField Definitions\n^^^^^^^^^^^^^^^^^\n\n-  instance-type: Type of instance on which neuron-ls is running.\n-  instance-id: EC2 ID of the instance on which neuron-ls is running.\n-  logical-neuroncore-config: (only available on trn2 instances) the current logical NeuronCore configuration; for more information, refer to :ref:`logical-neuroncore-config`.\n-  NEURON DEVICE / neuron_device: Logical ID assigned to the Neuron Device.\n-  NEURON CORES / nc_count: Number of NeuronCores present in the Neuron Device.\n-  NEURON CORE IDS / neuroncore_ids: Range or list of individual NeuronCore IDs belonging to the device, used with ``NEURON_RT_VISIBLE_CORES`` for selective core usage.\n-  NEURON MEMORY / memory_size: Amount of DRAM memory on the Neuron Device.\n-  CONNECTED DEVICES / connected_to: Logical IDs of the Neuron Devices connected to this\n   Neuron Device.\n-  PCI BDF / bdf: PCI Bus Device Function (BDF) ID of the device.\n-  CPU AFFINITY / cpu_affinity: CPU cores to which the per-NeuronCore proxy threads are pinned.\n-  NUMA NODE / numa_node: NUMA (Non-Uniform Memory Access) node associated with the Neuron Device.\n-  PID / pid: ID of the process using this Neuron Device.\n-  COMMAND / command: Command used to launch the process using this\n   Neuron Device.\n-  RUNTIME VERSION / neuron_runtime_version: Version of Neuron Runtime (if applicable) for\n   the application using this Neuron Device.\n"
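The NeuronCore IDs reported by ``neuron-ls`` can also be used to pin a workload to a subset of cores via ``NEURON_RT_VISIBLE_CORES``. The following is a minimal sketch (not part of the ``neuron-ls`` output above), assuming the trn2 example shown earlier; ``my_app.py`` is an illustrative placeholder for your own application:

::

  # Run only on NeuronCores 0-3 (the cores of Neuron Device 0 in the example above);
  # my_app.py is a placeholder for your own workload
  $ NEURON_RT_VISIBLE_CORES=0-3 python my_app.py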
  },
  {
    "path": "tools/neuron-sys-tools/neuron-monitor-user-guide.rst",
    "content": ".. _neuron-monitor-ug:\n\nNeuron Monitor User Guide\n=========================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n\n**neuron-monitor** collects metrics and stats from the Neuron\nApplications running on the system and streams the collected data to\n``stdout`` in ``JSON`` format. It is provided as part of the\n``aws-neuron-tools`` package.\n\nThese metrics and stats are organized into **metric groups** which can\nbe configured by providing a configuration file as described in :ref:`using-neuron-monitor`\n\nWhen running, **neuron-monitor** will:\n\n-  Collect the data for the metric groups which, based on the elapsed\n   time since their last update, need to be updated\n-  Take the newly collected data and consolidate it into a large report\n-  Serialize that report to JSON and stream it to stdout from where it\n   can be consumed by other tools - such as the sample\n   :ref:`neuron-monitor-cloudwatch.py <neuron-monitor-cloudwatchpy>` and\n   :ref:`neuron-monitor-prometheus.py <neuron-monitor-prometheuspy>`\n   scripts.\n-  Wait until at least one **metric group** needs to be collected and\n   repeat this flow\n\n.. note::\n\n  ``neuron-monitor`` fully supports the newly launched Trn2 instances.\n\n.. _using-neuron-monitor:\n\nUsing neuron-monitor\n--------------------\n\n.. _monitor_cli:\n\n.. rubric:: neuron-monitor CLI\n\n.. program:: neuron-monitor\n\n.. option:: neuron-monitor [parameters]\n\n    neuron-monitor accepts the following optional parameters:\n\n    - ``--verbose`` (int) default=0: Can be 0 to 4, and controls the amount of\n      debugging and verbose information sent to stderr; **0: no output**,\n      **4: maximum verbosity**\n\n    - ``-c, --config-file`` (string): Allows specifying a valid path to a\n      neuron-monitor JSON configuration file\n\n\n**Example:**\n\n.. code-block::\n\n    neuron-monitor -c monitor.conf\n\n\nNot specifying any configuration file will enable collecting all the metric groups\nwith a period of 5 seconds for all currently running Neuron applications.\n\nConfiguration file example\n~~~~~~~~~~~~~~~~~~~~~~~~~~\nExample of a configuration file which enables all available **metric\ngroups** for every running Neuron application, with a global update period of 1\nsecond and sets an update period of 2 seconds for the ``\"neuron_hw_counters\"``\nmetric group:\n\n::\n\n   {\n     \"period\": \"1s\",\n     \"neuron_runtimes\": [\n       {\n         \"tag_filter\": \".*\",\n         \"metrics\": [\n           {\n             \"type\": \"neuroncore_counters\"\n           },\n           {\n             \"type\": \"memory_used\"\n           },\n           {\n             \"type\": \"neuron_runtime_vcpu_usage\"\n           },\n           {\n             \"type\": \"execution_stats\"\n           }\n         ]\n       }\n     ],\n     \"system_metrics\": [\n       {\n         \"type\": \"vcpu_usage\"\n       },\n       {\n         \"type\": \"memory_info\"\n       },\n       {\n          \"period\": \"2s\",\n          \"type\": \"neuron_hw_counters\"\n       }\n     ]\n   }\n\nNeuron applications tagging\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\nIn order to make application monitoring easier, Neuron applications can be tagged with a 255 character\nstring which identifies that app. 
Tagging is done using the ``NEURON_PROCESS_TAG`` environment variable.\n\nFor example:\n``NEURON_PROCESS_TAG=my_app_1 python training.py`` will associate the ``my_app_1`` tag with that Python application.\nIf ``NEURON_PROCESS_TAG`` is not specified, the application's PID will be used as a tag.\n\nThis tag will be used by neuron-monitor to filter Neuron applications.\n\nJSON objects and fields in the configuration file\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  ``\"neuron_runtimes\"`` - array of objects specifying which Neuron\n   applications to monitor and what metric groups are enabled for each\n   of them\n\n   -  ``\"tag_filter\"`` - a regex which will be used to filter Neuron application tags\n      in order to determine if they will be monitored (optional)\n   -  ``\"metrics\"`` - array of objects specifying which metric groups to\n      capture for this Neuron application\n\n      -  ``\"type\"`` - type of metric group\n\n-  ``\"period\"`` - this field applies to **metric group** objects and\n   sets the amount of time between two updates for that metric group\n\n   -  it can be specified as part of the **root** and/or\n      **neuron_runtime** objects where it applies to all their children,\n      and/or as part of a **metric group** object\n   -  if there's no period specified, a default value of **5 seconds**\n      will be used\n\n-  ``\"system_metrics\"`` - array of objects specifying which system level\n   metric groups are enabled\n\nNeuron Runtime-level metric groups\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  :ref:`neuron-monitor-nc-counters` - NeuronCore related metrics\n-  :ref:`neuron-monitor-memory-used` - data on the amount of memory used\n   by the Neuron application\n-  :ref:`neuron-monitor-vcpu-usage` - Neuron application vCPU\n   utilization data\n-  :ref:`neuron-monitor-execution-stats` - Neuron application execution\n   stats, including error count and latency\n\nSystem-wide metric groups\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\n-  :ref:`neuron-monitor-vcpu-usage` - system-wide vCPU usage\n-  :ref:`neuron-monitor-memory-info` - system-wide memory usage\n-  :ref:`neuron-monitor-hw-counters` - counters for correctable and\n   uncorrectable memory ECC events\n\n\nExecution model\n---------------\n\n|image|\n\nneuron-monitor waits for one or more **metric groups** to be up for\nupdate, then collects the corresponding data, consolidates it into a\nreport which is streamed to stdout as JSON and goes back to waiting.\n\nThe JSON output format\n----------------------\n\nWhenever the report gets updated, a complete JSON is written to stdout.\nThis is its structure:\n\n::\n\n   {\n     \"neuron_runtime_data\": [\n       {\n         \"pid\": 0,\n         \"address\": \"\",\n         \"neuron_runtime_tag\": \"my_app_1\",\n         \"error\": \"\",\n         \"report\": {\n           \"neuroncore_counters\": {\n               [...]\n           },\n           \"execution_stats\": {\n               [...]\n           },\n           \"memory_used\": {\n               [...]\n           },\n           \"neuron_runtime_vcpu_usage\": {\n               [...]\n           }\n         }\n       }\n     ],\n     \"system_data\": {\n       \"neuron_hw_counters\": {\n               [...]\n       },\n       \"vcpu_usage\": {\n               [...]\n       },\n       \"memory_info\": {\n               [...]\n       }\n     },\n     \"instance_info\": {\n               [...]\n     },\n     \"neuron_hardware_info\": {\n               [...]\n     },\n     \"neuron_k8s_info\": {\n              
 [...]\n     }\n   }\n\n-  ``\"neuron_runtime_data\"`` is an array containing one entry per each\n   Neuron application which passes the filter specified in the settings file\n\n   -  ``\"pid\"`` is the pid of this Neuron application\n   -  ``\"neuron_runtime_tag\"`` is the configured tag for the Neuron application\n   -  ``\"error\"`` specifies any error that occurred when collecting data\n      from this Neuron application\n   -  ``\"report\"`` will contain the results for the Neuron application-level\n      metric groups; their formats are described below\n\n-  ``\"system_data\"`` has a similar structure to ``\"neuron_runtime_data\"``‘s\n   ``\"report\"`` but only contains system-level metric groups (not\n   associated to any Neuron application)\n\nRegardless of the configuration, the following two JSON objects are always present\nin the output:\n\n.. _neuron-monitor-instance-info:\n\ninstance_info\n~~~~~~~~~~~~~\n\nContains information about the instance on which neuron-monitor is running.\n::\n\n     \"instance_info\": {\n       \"instance_name\": \"My_Instance\",\n       \"instance_id\": \"i-0011223344556677a\",\n       \"instance_type\": \"trn2n.48xlarge\",\n       \"instance_availability_zone\": \"us-west-2b\",\n       \"instance_availability_zone_id\": \"usw2-az2\",\n       \"instance_region\": \"us-west-2\",\n       \"ami_id\": \"ami-0011223344556677b\",\n       \"subnet_id\": \"subnet-112233ee\",\n       \"error\": \"\"\n     }\n\nDepending on when the instance was launched, the following fields might\nnot be available:\n\n-  ``instance_availability_zone_id`` : available only for instances\n   launched in 2020-08-24 and later\n-  ``instance_region`` : available only for instances launched on\n   2020-08-24 and later\n-  ``instance_name`` : available only if ``instance_region`` is set and\n   aws-cli tools are installed\n\n``error`` will contain an error string if getting one of the fields,\n**except those mentioned above**, resulted in an error.\n\n.. _neuron-monitor-hardware-info:\n\nneuron_hardware_info\n~~~~~~~~~~~~~~~~~~~~\n\nContains basic information about the Neuron hardware.\n::\n\n     \"neuron_hardware_info\": {\n       \"neuron_device_type\": \"trainium2\",\n       \"neuron_device_version\": \"v4\",\n       \"neuroncore_version\": \"v3d\",\n       \"neuron_device_count\": 16,\n       \"neuron_device_memory_size\": 103079215104,\n       \"neuroncore_per_device_count\": 4,\n       \"logical_neuroncore_config\": 2,\n       \"error\": \"\"\n     }\n\n-  ``neuron_device_type``: type of the Neuron Devices on the instance\n-  ``neuroncore_version``: version of the NeuronCores on the instance\n-  ``neuron_device_count`` : number of available Neuron Devices\n-  ``neuron_device_memory_size``: total memory available on each Neuron Device\n-  ``neuroncore_per_device_count`` : number of NeuronCores present on each Neuron Device\n-  ``logical_neuroncore_config`` : the current Logical NeuronCore configuration\n-  ``error`` : will contain an error string if any occurred when getting this information\n   (usually due to the Neuron Driver not being installed or not running).\n\nThe following JSON object is disabled by default, but can be made available if \"k8s_info\" is enabled:\n\n.. 
_neuron-monitor-k8s-info:\n\nneuron_k8s_info\n~~~~~~~~~~~~~~~\n\nContains information about what Kubernetes pods/containers are using Neuron resources\n::\n\n           \"neuron_k8s_info\": {\n             \"period\": 15.030359284,\n             \"neuroncores_k8s_info\": {\n               \"0\": {\n                 \"pod_name\": \"p0\",\n                 \"namespace\": \"n0\",\n                 \"container_name\": [\"c0\"]\n               },\n               \"1\": {\n                 \"pod_name\": \"p0\",\n                 \"namespace\": \"n0\",\n                 \"container_name\": [\"c0\"]\n               },\n               ...\n             \"neurondevices_k8s_info\": {\n               \"0\": {\n                 \"pod_name\": \"p0\",\n                 \"namespace\": \"n0\",\n                 \"container_name\": [\"c0\"]\n               },\n               ...\n             }\n             \"error\": \"\"\n           },\n\n- ``\"neuroncores_k8s_info\"`` - object containing information on which\n  Neuron cores are being used by Kubernetes pod/containers, indexed by\n  Neuron core index: ``\"neuroncore_index\": { neuroncore_k8s_data }``\n\n  - ``\"pod_name\"`` - name of pod using Neuron core\n  - ``\"namespace\"`` - namespace of pod using Neuron core\n  - ``\"container_name\"`` - names of containers using Neuron core\n\n- ``\"neurondevices_k8s_info\"`` - object containing information on which\n  Neuron devices are being used by Kubernetes pod/containers, indexed by\n  Neuron device index: ``\"neurondevice_index\": { neurondevice_k8s_data }``\n\n  - ``\"pod_name\"`` - name of pod using Neuron device\n  - ``\"namespace\"`` - namespace of pod using Neuron device\n  - ``\"container_name\"`` - names of containers using Neuron device\n\n- ``\"error\"`` - will contain an error string if any occurred when getting this information\n\nFor more information on how to enable K8s information, see :ref:`neuron-monitor-k8s-infopy`.\n\n.. _neuron-metric-groups:\n\nMetric Groups\n~~~~~~~~~~~~~\n\nEach **metric group** requested in the settings file will get an entry\nin the resulting output. The general format for such an entry is:\n\n::\n\n   \"metric_group\": {\n     \"period\": 1.015, // Actual captured period, in seconds\n     \"error\": \"\",     // Error, if any occurred, otherwise an empty string\n     [...]            // Metric group specific data\n   }\n\n.. _runtime-level-metric-groups-1:\n\nNeuron application level metric groups\n--------------------------------------\n\n.. 
_neuron-monitor-nc-counters:\n\nneuroncore_counters\n~~~~~~~~~~~~~~~~~~~\n\n::\n\n           \"neuroncore_counters\": {\n             \"period\": 1.000113182,\n             \"neuroncores_in_use\": {\n               \"0\": {\n                 \"neuroncore_utilization\": 42.01,\n                 \"flops\": 1234567891011,\n                 \"v3d\": {\n                   \"nc_v3.0\": {\n                     \"neuroncore_utilization\": 21.01\n                   },\n                   \"nc_v3.1\": {\n                     \"neuroncore_utilization\": 63.01\n                   }\n                 }\n               },\n               \"1\": {\n                 \"neuroncore_utilization\": 42.02,\n                 \"flops\": 1234567891021,\n                 \"v3d\": {\n                   \"nc_v3.2\": {\n                     \"neuroncore_utilization\": 21.02\n                   },\n                   \"nc_v3.3\": {\n                     \"neuroncore_utilization\": 63.02\n                   }\n                 }\n               },\n               [...]\n             },\n             \"error\": \"\"\n           }\n\n-  ``\"neuroncores_in_use\"`` is an object containing data for all the\n   NeuronCores that were active when the data was captured, indexed by\n   NeuronCore index: ``\"neuroncore_index\": { neuroncore_data }``\n\n   -  ``\"neuroncore_utilization\"`` - NeuronCore utilization, in percent,\n      during the captured period\n   -  ``\"flops\"`` - number of floating point operations per second during\n      the captured period\n   -  ``\"v3d\"`` - only available on Trn2 - contains the utilization for every\n      physical NeuronCore that makes up the current NeuronCore\n\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\n.. 
_neuron-monitor-execution-stats:\n\nexecution_stats\n~~~~~~~~~~~~~~~\n\n::\n\n           \"execution_stats\": {\n             \"period\": 1.030613214,\n             \"error_summary\": {\n               \"generic\": 0,\n               \"numerical\": 0,\n               \"transient\": 0,\n               \"model\": 0,\n               \"runtime\": 0,\n               \"hardware\": 0\n             },\n             \"execution_summary\": {\n               \"completed\": 123,\n               \"completed_with_err\": 0,\n               \"completed_with_num_err\": 0,\n               \"timed_out\": 0,\n               \"incorrect_input\": 0,\n               \"failed_to_queue\": 0\n             },\n             \"latency_stats\": {\n               \"total_latency\": {\n                 \"p0\": 0.01100001,\n                 \"p1\": 0.01100002,\n                 \"p25\": 0.01100004,\n                 \"p50\": 0.01100008,\n                 \"p75\": 0.01100010,\n                 \"p99\": 0.01100012,\n                 \"p100\": 0.01100013\n               },\n               \"device_latency\": {\n                 \"p0\": 0.01000001,\n                 \"p1\": 0.01000002,\n                 \"p25\": 0.01000004,\n                 \"p50\": 0.01000008,\n                 \"p75\": 0.01000010,\n                 \"p99\": 0.01000012,\n                 \"p100\": 0.01000013\n               }\n             },\n             \"error\": \"\"\n           },\n\n-  ``\"error_summary\"`` is an object containing the error counts for the\n   captured period indexed by their type\n\n   -  ``\"generic\"`` - generic execution errors\n   -  ``\"numeric\"`` - NAN errors encountered during execution\n   -  ``\"transient\"`` - recoverable errors, such as ECC corrections\n   -  ``\"model\"`` - model-related errors\n   -  ``\"runtime\"`` - Neuron Runtime errors\n   -  ``\"hardware\"`` - hardware errors such as uncorrectable ECC issues\n\n-  ``\"execution_summary\"`` is an object containing all execution outcome\n   counts for the captured period indexed by their type\n\n   -  ``\"completed\"`` - executions completed successfully\n   -  ``\"completed_with_err\"`` - executions that ended in an error other\n      than a numeric error\n   -  ``\"completed_with_num_err\"`` - executions that ended in a numeric\n      error\n   -  ``\"timed_out\"`` - executions that took longer than the Neuron\n      Runtime configured timeout value\n   -  ``\"incorrect_input\"`` - executions that failed to start due to\n      incorrect input being provided\n   -  ``\"failed_to_queue\"`` - execution requests that were rejected due\n      to Neuron Runtime not being able to queue them\n\n-  ``\"latency_stats\"`` contains two objects containing latency\n   percentiles, in seconds, for the data captured for the model\n   executed during the captured period. If there are no models being\n   executed during this time, the two objects will be ``null`` (i.e.\n   ``\"total_latency\": null``)\n\n   -  ``\"total_latency\"`` - percentiles, in seconds, representing latency for an execution as measured by the Neuron Runtime\n   -  ``\"device_latency\"`` - percentiles, in seconds, representing execution time exclusively on the Neuron Device\n\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\n\n.. 
_neuron-monitor-memory-used:\n\nmemory_used\n~~~~~~~~~~~\n\n::\n\n     \"memory_used\": {\n       \"period\": 1.00001,\n       \"neuron_runtime_used_bytes\": {\n         \"host\": 6997643264,\n         \"neuron_device\": 12519788544,\n         \"usage_breakdown\": {\n           \"host\": {\n             \"application_memory\": 6996594688,\n             \"constants\": 0,\n             \"dma_buffers\": 1048576,\n             \"tensors\": 0\n           },\n           \"neuroncore_memory_usage\": {\n             \"0\": {\n               \"constants\": 193986816,\n               \"model_code\": 176285056,\n               \"model_shared_scratchpad\": 0,\n               \"runtime_memory\": 0,\n               \"tensors\": 20971520\n             },\n             \"1\": {\n               \"constants\": 193986816,\n               \"model_code\": 176285056,\n               \"model_shared_scratchpad\": 0,\n               \"runtime_memory\": 0,\n               \"tensors\": 20971520\n             },\n             ...\n           }\n       }\n       \"loaded_models\": [\n         {\n           \"name\": \"neff\",\n           \"uuid\": \"91f2f66e83ea419dace1da07617ad39f\",\n           \"model_id\": 10005,\n           \"is_running\": false,\n           \"subgraphs\": {\n             \"sg_00\": {\n               \"memory_used_bytes\": {\n                 \"host\": 20480,\n                 \"neuron_device\": 21001024,\n                 \"usage_breakdown\": {\n                   \"host\": {\n                     \"application_memory\": 20480,\n                     \"constants\": 0,\n                     \"dma_buffers\": 0,\n                     \"tensors\": 0\n                   },\n                   \"neuron_device\": {\n                     \"constants\": 20971520,\n                     \"model_code\": 29504,\n                     \"runtime_memory\": 0,\n                     \"tensors\": 0\n                   }\n                 }\n               },\n               \"neuroncore_index\": 0,\n               \"neuron_device_index\": 12\n             }\n           }\n         },\n         ...\n         ],\n         \"error\": \"\"\n      }\n\n\n-  ``\"memory_used\"`` summarizes the amount of memory used by the\n   Neuron application\n\n   -  ``\"neuron_runtime_used_bytes\"`` - current amount of memory used by\n      the Neuron application\n\n      -  ``\"host\"`` - total host DRAM usage in bytes\n      -  ``\"neuron_device\"`` - total Neuron device memory usage in bytes\n      -  ``\"usage_breakdown\"`` - a breakdown of the total memory usage in the other two fields\n\n         - ``\"host\"`` - breakdown of the host memory usage\n\n            - ``\"application_memory\"`` - amount of host memory used by the application - this includes all allocations that are not included\n              in the next categories\n            - ``\"constants\"`` - amount of host memory used for constants during training (or weights during inference)\n            - ``\"dma_buffers\"`` - amount of host memory used for DMA transfers\n            - ``\"tensors\"`` - amount of host memory used for tensors\n\n         - ``\"neuroncore_memory_usage\"`` - a breakdown of memory allocated on the Neuron Devices and the NeuronCores for which it was allocated\n\n            - ``\"0\"`` - ``\"64\"`` (for trn2-48xlarge) - NeuronCores for which the memory was allocated\n            - ``\"constants\"`` - amount of device memory used for constants during training (or weights during inference)\n            - ``\"model_code\"`` - amount of device 
memory used for models' executable code\n            - ``\"model_shared_scratchpad\"`` - amount of device memory used for the scratchpad shared by the models - a memory region reserved for the models' internal variables and auxiliary buffers\n            - ``\"runtime_memory\"`` - amount of device memory used by the Neuron Runtime\n            - ``\"tensors\"`` - amount of device memory used for tensors\n\n-  ``\"loaded_models\"`` - array containing objects representing loaded models\n\n   -  ``\"name\"`` - name of the model\n   -  ``\"uuid\"`` - unique ID for the model\n   -  ``\"model_id\"`` - Neuron application-assigned ID for this model\n   -  ``\"is_running\"`` - true if this model is currently started, false otherwise\n   -  ``\"subgraphs\"`` - object containing all the subgraphs for the model, indexed by their name: ``\"subgraph_name\": { subgraph_data }``\n\n      -  ``\"memory_used_bytes\"`` - memory usage for this subgraph\n\n         -  ``\"host\"`` - total host DRAM usage in bytes\n         -  ``\"neuron_device\"`` - total Neuron device DRAM usage in bytes\n         -  ``\"usage_breakdown\"`` - a breakdown of memory allocated at load time for this model\n\n            - ``\"host\"`` - breakdown of host memory allocated for this model\n\n               - ``\"application_memory\"`` - amount of host memory allocated for this model by the Neuron Runtime which doesn't fall in any\n                 of the next categories\n               - ``\"constants\"`` - amount of host memory used for constants during training (or weights during inference)\n               - ``\"dma_buffers\"`` - host memory allocated for DMA transfers for this model\n               - ``\"tensors\"`` - amount of host memory used for tensors at model load time\n\n            - ``\"neuron_device\"`` - a breakdown of device memory allocated for this model\n\n               - ``\"constants\"`` - amount of device memory used for constants during training (or weights during inference)\n               - ``\"model_code\"`` - amount of device memory used for the model's executable code\n               - ``\"runtime_memory\"`` - amount of device memory used by the Neuron Runtime for this model\n               - ``\"tensors\"`` - amount of device memory allocated for tensors at this model's load time\n\n      -  ``\"neuroncore_index\"`` - NeuronCore index on which the subgraph is loaded\n      -  ``\"neuron_device_index\"`` - Neuron device index on which the subgraph is loaded\n\n\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\n\nneuron_runtime_vcpu_usage\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n           \"neuron_runtime_vcpu_usage\": {\n             \"period\": 1.030604818,\n             \"vcpu_usage\": {\n               \"user\": 42.01,\n               \"system\": 12.34\n             },\n             \"error\": \"\"\n           }\n\n-  ``\"vcpu_usage\"`` - object showing vCPU usage in percentages for the\n   Neuron application during the captured period\n\n   -  ``\"user\"`` - percentage of time spent in user code by this Neuron\n      application\n   -  ``\"system\"`` - percentage of time spent in kernel code by this\n      Neuron application\n\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\nSystem level metric groups\n--------------------------\n\n.. 
_neuron-monitor-hw-counters:\n\nneuron_hw_counters\n~~~~~~~~~~~~~~~~~~\n\n::\n\n           \"neuron_hw_counters\": {\n             \"period\": 1.030359284,\n             \"neuron_devices\": [\n               {\n                 \"neuron_device_index\": 0,\n                 \"mem_ecc_corrected\": 0,\n                 \"mem_ecc_uncorrected\": 0,\n                 \"sram_ecc_uncorrected\": 0,\n                 \"sram_ecc_corrected\": 0\n               }\n             ],\n             \"error\": \"\"\n           },\n\n-  ``\"neuron_devices\"`` - array containing ECC data for all Neuron devices\n\n   -  ``\"neuron_device_index\"`` - Neuron device index\n   -  ``\"mem_ecc_corrected\"`` - number of corrected ECC events in the\n      Neuron device’s DRAM\n   -  ``\"mem_ecc_uncorrected\"`` - number of uncorrected ECC events in\n      the Neuron device’s DRAM\n   -  ``\"sram_ecc_uncorrected\"`` - number of uncorrected ECC events in\n      the Neuron device’s SRAM\n   -  ``\"sram_ecc_corrected\"`` - number of corrected ECC events in\n      the Neuron device’s SRAM\n\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\n.. _neuron-monitor-vcpu-usage:\n\nvcpu_usage\n~~~~~~~~~~~~\n\n::\n\n   \"vcpu_usage\": {\n     \"period\": 0.999974868,\n     \"average_usage\": {\n       \"user\": 32.77,\n       \"nice\": 0,\n       \"system\": 22.87,\n       \"idle\": 39.36,\n       \"io_wait\": 0,\n       \"irq\": 0,\n       \"soft_irq\": 0\n     },\n     \"usage_data\": {\n       \"0\": {\n         \"user\": 34.41,\n         \"nice\": 0,\n         \"system\": 27.96,\n         \"idle\": 37.63,\n         \"io_wait\": 0,\n         \"irq\": 0,\n         \"soft_irq\": 0\n       },\n       \"1\": {\n         \"user\": 56.84,\n         \"nice\": 0,\n         \"system\": 28.42,\n         \"idle\": 14.74,\n         \"io_wait\": 0,\n         \"irq\": 0,\n         \"soft_irq\": 0\n       },\n       [...]\n     },\n     \"context_switch_count\": 123456,\n     \"error\": \"\"\n   }\n\n-  each vCPU usage object contains the following fields:\n\n   -  ``\"user\"`` - percentage of time spent in user code\n   -  ``\"nice\"`` - percentage of time spent executing niced user code\n   -  ``\"system\"`` - percentage of time spent executing kernel code\n   -  ``\"idle\"`` - percentage of time spent idle\n   -  ``\"io_wait\"`` - percentage of time spent waiting for IO operations\n   -  ``\"irq\"`` - percentage of time spent servicing hardware interrupts\n   -  ``\"soft_irq\"`` - percentage of time spent servicing software\n      interrupts\n\n-  ``\"average_usage\"`` - contains the average usage across all vCPUs\n   during the captured period\n-  ``\"usage_data\"`` - contains per vCPU usage during the captured period\n-  ``\"context_switch_count\"`` - contains the number of vCPU context\n   switches during the captured period\n-  ``\"error\"`` - string containing any error that occurred when\n   collecting the data\n\n.. 
_neuron-monitor-memory-info:\n\nmemory_info\n~~~~~~~~~~~\n\n::\n\n   \"memory_info\": {\n     \"period\": 5.346411129,\n     \"memory_total_bytes\": 49345835008,\n     \"memory_used_bytes\": 16042344448,\n     \"swap_total_bytes\": 0,\n     \"swap_used_bytes\": 0,\n     \"error\": \"\"\n   }\n\n-  ``\"memory_total_bytes\"`` - total size of the host memory, in bytes\n\n-  ``\"memory_used_bytes\"`` - amount of host memory in use, in bytes\n\n-  ``\"swap_total_bytes\"`` - total size of the host swap file, in bytes\n\n-  ``\"swap_used_bytes\"`` - amount of swap memory in use, in bytes\n\n\n.. _neuron-monitor-companion-scripts:\n\nCompanion scripts\n-----------------\n\nneuron-monitor is installed with three Python companion scripts:\n:ref:`neuron-monitor-cloudwatchpy`, :ref:`neuron-monitor-prometheuspy`, and\n:ref:`neuron-monitor-k8s-infopy`\n\n.. _neuron-monitor-cloudwatchpy:\n\nneuron-monitor-cloudwatch.py\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIt requires Python3 and the `boto3 Python\nmodule <https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#quickstart>`__.\nIt is installed to:\n``/opt/aws/neuron/bin/neuron-monitor-cloudwatch.py``.\n\n.. _using-neuron-monitor-cloudwatchpy:\n\nUsing neuron-monitor-cloudwatch.py\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   neuron-monitor | neuron-monitor-cloudwatch.py --namespace <namespace> --region <region>\n\nFor example:\n\n::\n\n   neuron-monitor | neuron-monitor-cloudwatch.py --namespace neuron_monitor_test --region us-west-2\n\n.. _neuron-monitor-prometheuspy:\n\nneuron-monitor-prometheus.py\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIt requires Python3 and the `Prometheus client Python\nmodule <https://github.com/prometheus/client_python>`__. It is installed\nto: ``/opt/aws/neuron/bin/neuron-monitor-prometheus.py``.\n\n.. _using-neuron-monitor-prometheuspy:\n\nUsing neuron-monitor-prometheus.py\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   neuron-monitor | neuron-monitor-prometheus.py --port <port>\n\nFor example:\n\n::\n\n   neuron-monitor | neuron-monitor-prometheus.py --port 8008\n\nThe default value for ``--port`` is ``8000``.\n\nIf your data visualization framework is Grafana, we provided a :download:`Grafana dashboard </src/examples/neuron-monitor/neuron-monitor-grafana.json>`\nwhich integrates with Prometheus and this script.\n\n.. |image| image:: ../../images/nm-img2.png\n\n.. _neuron-monitor-k8s-infopy:\n\nneuron-monitor-k8s-info.py (Beta)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIt requires Python3 and the `gRPC Python\npackage <https://pypi.org/project/grpcio/>`__. It is installed\nto: ``/opt/aws/neuron/bin/neuron-monitor-k8s-info.py``.\n\n.. important::\n\n   This companion script is in Beta and is disabled by default.\n\n   It only works on EKS, and is currently not supported with EKS auto mode.\n\n.. 
_using-neuron-monitor-k8s-infopy:\n\nUsing neuron-monitor-k8s-info.py\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n::\n\n   neuron-monitor | neuron-monitor-prometheus.py --port <port> --enable-k8s-info | neuron-monitor-k8s-info.py --period <seconds>\n\nFor example:\n\n::\n\n   neuron-monitor | neuron-monitor-prometheus.py --port 8008 --enable-k8s-info | neuron-monitor-k8s-info.py --period 30\n\nThe default value for ``--period`` is ``15``.\n\nRunning neuron monitor in Kubernetes environment\n-------------------------------------------------\n\nFor running neuron monitor in Kubernetes environment, please refer to instructions `here <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html>`_.\n"
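In addition to the companion scripts, the JSON stream can be consumed directly with standard tools. The following is a minimal sketch, assuming ``jq`` is installed and the ``neuroncore_counters`` metric group is enabled; it prints the per-NeuronCore utilization for each monitored Neuron application as each report arrives:

::

   # prints {pid, per-NeuronCore utilization} for every report; requires jq and
   # assumes at least one monitored application with neuroncore_counters enabled
   neuron-monitor | jq --unbuffered -c '.neuron_runtime_data[]? | {pid, utilization: (.report.neuroncore_counters.neuroncores_in_use | map_values(.neuroncore_utilization))}'

The field names follow the output format described above; adapt the filter to whichever metric groups your configuration enables.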
  },
  {
    "path": "tools/neuron-sys-tools/neuron-sysfs-user-guide.rst",
    "content": ".. _neuron-sysfs-ug:\n\nNeuron Sysfs User Guide\n=======================\n\n.. contents:: Table of contents\n    :local:\n    :depth: 3\n\nIntroduction\n------------\nThe kernel provides a few ways in which userspace programs can get system information from the kernel space. Sysfs is one common way to do so. It is a virtual filesystem typically mounted on the ``/sys`` directory and contains information about hardware devices attached to the system and about drivers handling those devices. By navigating the hierarchical structure of the sysfs filesystem and viewing the information provided by its files and directories, you can gather valuable information that can help diagnose and resolve a wide range of hardware and system issues.\n\nThus a sysfs filesystem is set up per Neuron Device under ``/sys/devices/virtual/neuron_device`` to give you an insight into the Neuron Driver and Runtime at system level. By performing several simple CLIs such as reading or writing to a sysfs file, you can get information such as Runtime status, memory usage, Driver info etc. You can even create your own shell scripts to query Runtime and Driver statistics from sysfs and generate customized reports.\n\nThis user guide will first explain the Neuron sysfs structure and then introduce many ways where you can perform diagnostic works with Neuron sysfs.\n\n\nNeuron Sysfs Filesystem Structure\n---------------------------------\nHigh Level Overview\n^^^^^^^^^^^^^^^^^^^\n\nHere is the high level structure of the Neuron sysfs filesystem, where the total and present counters are not shown:\n\n.. code-block:: bash\n\n  /sys/devices/virtual/neuron_device/\n  ├── neuron0/\n  │   ├── subsystem\n  │   ├── uevent\n  │   ├── connected_devices\n  │   ├── core_count\n  │   ├── reset\n  │   ├── power/\n  │   │   ├── async\n  │   │   ├── control\n  │   │   ├── runtime_active_time\n  │   │   ├── runtime_active_kids\n  │   │   └── ...\n  │   ├── info/\n  │   │   ├── notify_delay\n  │   │   ├── serial_number\n  │   │   └── architecture/\n  │   │       ├── arch_type\n  │   │       ├── device_name\n  │   │       └── instance_type\n  ├── stats\n  │   ├── hardware\n  │   │   ├── mem_ecc_uncorrected\n  │   │   ├── mem_ecc_repairable_uncorrected\n  │   │   └── sram_ecc_uncorrected\n  │   ├── memory_usage\n  │   │    └── host_mem\n  │   │       ├── application_memory\n  │   │       ├── constants\n  │   │       ├── dma_buffers\n  │   │       ├── dma_rings\n  │   │       ├── driver_memory\n  │   │       ├── notifications\n  │   │       ├── tensors\n  │   │       └── uncategorized\n  │   └── power\n  │        └── utilization\n  ├── neuron_core0/\n  │       ├── info/\n  │       │   └── architecture/\n  │       │       └── arch_type\n  │       ├── stats/\n  │       │   ├── status/\n  │       │   │   ├── exec_bad_input\n  │       │   │   ├── hw_error\n  │       │   │   ├── infer_failed_to_queue\n  │       │   │   ├── resource_nc_error\n  │       │   │   ├── unsupported_neff_version\n  │       │   │   ├── failure\n  │       │   │   ├── infer_completed_with_error\n  │       │   │   ├── invalid_error\n  │       │   │   ├── oob_error\n  │       │   │   ├── success\n  │       │   │   ├── generic_error\n  │       │   │   ├── infer_completed_with_num_error\n  │       │   │   ├── resource_error\n  │       │   │   └── timeout\n  │       │   ├── memory_usage/\n  │       │   │   ├── device_mem/\n  │       │   │   │   ├── collectives\n  │       │   │   │   ├── constants\n  │       │   │   │   ├── dma_rings\n  │       │   │   │   ├── 
driver_memory\n  │       │   │   │   ├── model_code\n  │       │   │   │   ├── model_shared_scratchpad\n  │       │   │   │   ├── nonshared_scratchpad\n  │       │   │   │   ├── notifications\n  │       │   │   │   ├── runtime_memory\n  │       │   │   │   ├── tensors\n  │       │   |   │   └── uncategorized\n  │       │   │   └── host_mem\n  │       │   └── other_info/\n  │       │       ├── flop_count\n  │       │       ├── inference_count\n  │       │       ├── model_load_count\n  │       │       ├── reset_fail_count\n  │       │       ├── reset_req_count\n  │       │       └── nc_time_in_use\n  │       └── ...\n  │── neuron_core1/\n  │   │   ├── info/\n  │   │   │   └── ...\n  │   │   └── stats/\n  │   │       └── ...\n  │   └── ...\n  ├── neuron1\n  ├── neuron2\n  ├── neuron3\n  └── ...\n\n\nEach Neuron Device is represented as a directory under ``/sys/devices/virtual/neuron_device/``, where ``neuron0/`` represents the Neuron Device 0, ``neuron1/`` represents the Neuron Device 1, etc. Each NeuronCore is represented as a directory under a Neuron Device directory, represented as ``neuron_core{0,1,2,...}``. Metrics such as Runtime and Driver info and statistics are collected as per NeuronCore in two directories under the NeuronCore directory, i.e. ``info/`` and ``stats/``.\n\nMost of the metrics belong to a category called “counter.” \nEach counter is represented as a directory, which holds two numerical values as two files: total and present. Each memory usage counter has an additional value called peak.\nThe total value starts accumulating metrics when the Driver is loaded. The present value records the last changed metric value. The peak value records the max value so far.\nEach counter has the same filesystem structure like this:\n\n.. code-block:: dash\n\n    /sys/devices/virtual/neuron_device/neuron0/neuron_core0/status/\n    ├── exec_bad_input/\n    │   ├── total\n    │   └── present\n    ├── hw_error/\n    │   ├── total\n    │   └── present\n    ├── infer_failed_to_queue/\n    │   ├── total\n    │   └── present\n    └── ...\n\n\n\nDescription for Each Field\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n``info/``: This directory stores general information about hardware and software. None of them are counter types.\n\n* ``notify_delay``: The delay between notifications from the Neuron Device.  Current settings are on (``0``) or off (``-1``).  Off by default. \n\n* ``serial_number``: The unique device identifier.\n\n* ``architecture/``: This directory stores hardware architecture information.\n\n  * ``arch_type``: The architecture type of the Neuron Device. Sample architecture types are v1, v2, and v3. You can only read the value. You cannot change it.\n\n  * ``instance_type``: The instance type of the Neuron Device. Sample instance types are Inf1, Inf2, and Trn1. You can only read the value. You cannot change it.\n\n  * ``device_type``: The Neuron Device type. Sample Neuron Device types are Inferentia, Inferentia2, and Trainium1. You can only read the value. You cannot change it.\n\n\n``stats/``: This directory stores Neuron Runtime and Driver statistics. It contains three subdirectories: ``status/``, ``memory_usage/``, and ``other_info/``.\n\n* ``status/``: This directory stores the number of each return status of API calls. As explained in :ref:`The LIBNRT API Return Codes <nrt_api>`, every API call returns an NRT_STATUS value, which represents the return status of that API call. Our sysfs filesystem stores all ``NRT_STATUS`` as subdirectories under the ``status/`` directory. 
They all have the counter structure. Thus each ``NRT_STATUS`` subdirectory holds two values (total and present) and records the number of times you receive a certain ``NRT_STATUS``. The following is a description of each ``NRT_STATUS`` subdirectory. These descriptions align with what is described in :ref:`The LIBNRT API Return Codes <nrt_api>`.\n\n* ``memory_usage/``: This directory contains memory usage statistics for both device and host, represented as counters. In this directory, the total counters indicate the current memory usage, present counters represent the memory allocation or deallocation amount in the previous operation, and peak counters indicate the maximum memory usage observed. Additionally, this directory provides detailed breakdown statistics for device and host memory usage. These memory breakdown details correspond to the :ref:`Memory Usage Summary <neuron_top_mem_usage>` section displayed in ``neuron-top``.\n\n  * ``device_mem/``: The amount of memory that Neuron Runtime uses for weights, instructions and DMA rings.\n\n    * This device memory per NeuronCore is further categorized into the following types: ``collectives/``, ``constants/``, ``dma_rings/``, ``driver_memory/``, ``model_code/``, ``model_shared_scratchpad/``, ``nonshared_scratchpad/``, ``notifications/``, ``runtime_memory/``, ``tensors/``, and ``uncategorized/``. Each of these categories has total, present, and peak.\n        * ``collectives`` - amount of device memory used for collective communication between workers\n        * ``constants`` - amount of device memory used for constants (for applications running training) or weights (for applications running inferences)\n        * ``dma_rings`` - amount of device memory used for storing model executable code used for data movements\n        * ``driver_memory`` - amount of device memory used by the Neuron Driver\n        * ``model_code`` - amount of device memory used for storing model executable code\n        * ``model_shared_scratchpad`` - amount of device memory used for the shared model scratchpad, a buffer shared between models on the same Neuron Core used for internal model variables and other auxiliary buffers\n        * ``nonshared_scratchpad`` - amount of device memory used for non-shared model scratchpad, a buffer used by a single model for internal model variables and other auxiliary buffers\n        * ``notifications`` - amount of device memory used to store instruction level trace information used to profile workloads run on the device\n        * ``runtime_memory`` - amount of device memory used by the Neuron Runtime (outside of the previous categories)\n        * ``tensors`` - amount of device memory used for tensors\n        * ``uncategorized`` - amount of device memory that does not belong in any other category in this list\n  \n  * ``host_mem/``: The amount of memory that Neuron Runtime uses for input and output tensors.\n\n    * The host memory per Neuron Device is further categorized into the following types: ``application_memory/``, ``constants/``, ``dma_buffers/``, ``dma_rings/``, ``driver_memory/``, ``notifications/``, ``tensors/``, ``uncategorized/``. These categories provide more granular host memory classification compared to the :ref:`Host Used Memory <neuron_top_host_mem_usage>` section. 
Each of these categories has total, present, and peak.\n\n* ``hardware/``: Hardware statistics.\n\n  * ``mem_ecc_uncorrected``: The number of unrepairable uncorrected ECC events in the Neuron device's DRAM.\n\n  * ``mem_ecc_repairable_uncorrected``: The number of repairable uncorrected ECC events in the Neuron device's DRAM.\n\n  * ``sram_ecc_uncorrected``: The number of uncorrected ECC events in the Neuron device's SRAM.\n\n* ``power/``: Power statistics.\n\n  * ``utilization``: Reports per-minute power usage statistics as a percentage of max power in the following format:\n\n        <status>,<timestamp>,<min_power>,<max_power>,<avg_power>\n\n        **Field descriptions:**\n\n        status\n            Indicates the sampling state in a string. Valid values are:\n\n              ``POWER_STATUS_VALID`` - Sampling successful\n\n              ``POWER_STATUS_NO_DATA`` - No samples available\n\n              ``POWER_STATUS_INVALID`` - An internal sampling error occurred\n\n        timestamp\n            Time when the sample was collected in Unix epoch seconds (integer)\n\n        min_power\n            Minimum power utilization during the sampling period (0.00-100.00%)\n\n        max_power\n            Maximum power utilization during the sampling period (0.00-100.00%)\n\n        avg_power\n            Average power utilization during the sampling period (0.00-100.00%)\n\n      The interface updates these statistics every minute based on continuous power sampling.\n\n* ``other_info/``: This directory contains statistics that are not covered by ``status/`` and ``memory_usage/``. None of them are counter types.\n\n  * ``flop_count``: The number of floating point operations. You can calculate TFLOP/s as ``flop_count`` divided by the time interval.\n\n  * ``inference_count``: The number of successful inferences.\n\n  * ``model_load_count``: The number of successful model loads.\n\n  * ``reset_fail_count``: The number of failed device resets.\n\n  * ``reset_req_count``: The number of device reset requests.\n\n  * ``nc_time_in_use``: The time interval in microseconds between the start and the end of the current execution on hardware.\n\nOther fields:\n\n* ``connected_devices``: The list of connected devices' IDs. You should see the same output as neuron-ls's CONNECTED DEVICES.\n\n* ``reset``: writing to this file resets the corresponding Neuron Device.\n\n\nRead and Write to Sysfs\n^^^^^^^^^^^^^^^^^^^^^^^^^\n\nReading a sysfs file gives the value for the corresponding metric. You can use the ``cat`` command to view the contents of the sysfs files:\n\n.. code-block:: bash\n\n  ubuntu@ip-xxx-xx-xx-xxx:~$ sudo cat /sys/devices/virtual/neuron_device/neuron0/neuron_core0/stats/status/failure/total \n  0\n  ubuntu@ip-xxx-xx-xx-xxx:~$ sudo cat /sys/devices/virtual/neuron_device/neuron0/neuron_core0/info/architecture/arch_type \n  NCv2\n\nSysfs metrics of counter type are write-to-clear. You can write any value to the file, and the metric will be set to 0:\n\n.. code-block:: bash\n\n  ubuntu@ip-xxx-xx-xx-xxx:~$ echo 1 | sudo tee /sys/devices/virtual/neuron_device/neuron0/neuron_core0/stats/status/failure/total \n  1\n\n\nWriting to ``reset`` resets the corresponding Neuron Device. For example, the following resets Neuron Device 0:\n\n.. 
code-block:: bash\n\n  ubuntu@ip-xxx-xx-xx-xxx:~$ echo 1 | sudo tee /sys/devices/virtual/neuron_device/neuron0/reset\n  1\n\nNote\n^^^^\n\nAll files under ``/sys/devices/virtual/neuron_device/neuron0/power`` such as ``runtime_active_kids`` or ``runtime_status`` are related to generic device power management. They are not created or controlled by our sysfs metrics. The word ``runtime`` in these files does not refer to Neuron Runtime.\n\n.. _troubleshoot_via_sysfs:\n\nHow to Troubleshoot via Sysfs\n-----------------------------\nYou can perform simple troubleshooting tasks for your ML jobs with one or a few commands that read or write the sysfs filesystem.\nYou can do aggregations across all the NeuronCores and all the Neuron Devices to get a summarized view using your own scripts.\n\n\nYou can also use the Sysfs notification feature to wait passively (without wasting CPU cycles) for changes to the values of Sysfs files. To use this feature, you need to implement a user-space program that calls the poll() function on the Sysfs file that you want to wait on. \nThe poll() function has the following signature: ``unsigned int (*poll) (struct file *, struct poll_table_struct *)``.\nBy default, the Sysfs notification feature is turned off when the driver is loaded. To enable notifications, you can set the value of ``/sys/devices/virtual/neuron_device/neuron0/info/notify_delay`` to 0. To disable notifications, you can set it to -1. Please note that enabling this feature can impact performance.\n\nHere is a sample user space program using poll():\n\n.. code-block:: c\n\n\t#include <fcntl.h>\n\t#include <poll.h>\n\t#include <unistd.h>\n\t#include <stdio.h>\n\t#include <stdlib.h>\n\n\tint main(int argc, char * argv[])\n\t{\n\t\tchar readbuf[128];\n\t\tint attr_fd = -1; \n\t\tstruct pollfd pfd;\n\t\tint retval = 0;\n\t\tssize_t read_bytes;\n\n\t\tif (argc < 2) {\n\t\t\tfprintf(stderr, \"Error: Please specify sysfs file path\\n\");\n\t\t\texit(1);\n\t\t}   \n\t\tattr_fd = open(argv[1], O_RDONLY, 0); \n\t\tif (attr_fd < 0) {\n\t\t\tperror(argv[1]);\n\t\t\texit(2);\n\t\t}   \n\n\t\t/* Read and print the current value; poll() will report subsequent changes */\n\t\tread_bytes = read(attr_fd, readbuf, sizeof(readbuf));\n\t\tif (read_bytes < 0) {\n\t\t\tperror(argv[1]);\n\t\t\texit(3);\n\t\t}   \n\t\tprintf(\"%.*s\", (int)read_bytes, readbuf);\n\n\t\t/* Sysfs notifications are delivered as POLLERR/POLLPRI events */\n\t\tpfd.fd = attr_fd;\n\t\tpfd.events = POLLERR | POLLPRI;\n\t\tpfd.revents = 0;\n\t\twhile ((retval = poll(&pfd, 1, 100)) >= 0) {\n\t\t\tif (pfd.revents & (POLLERR | POLLPRI)) {\n\t\t\t\tpfd.revents = 0;\n\n\t\t\t\t/* On notification, seek back to the start and re-read the attribute */\n\t\t\t\tlseek(attr_fd, 0, SEEK_SET);\n\t\t\t\tread_bytes = read(attr_fd, readbuf, sizeof(readbuf));\n\t\t\t\tif (read_bytes < 0) {\n\t\t\t\t\tperror(argv[1]);\n\t\t\t\t\texit(4);\n\t\t\t\t}\n\t\t\t\tprintf(\"%.*s\", (int)read_bytes, readbuf);\n\t\t\t}\n\t\t}\n\t\treturn 0;\n\t}\n\n\n"
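As an example of the kind of aggregation script mentioned in :ref:`troubleshoot_via_sysfs`, the following is a minimal sketch that sums the ``success`` counters across all NeuronCores on all Neuron Devices. The paths follow the layout described above; swap ``success`` for any other counter as needed:

.. code-block:: bash

  # Sum the per-NeuronCore success counters into one instance-wide total
  total=0
  for f in /sys/devices/virtual/neuron_device/neuron*/neuron_core*/stats/status/success/total; do
      total=$((total + $(sudo cat "$f")))
  done
  echo "Successful executions across all NeuronCores: $total"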
  },
  {
    "path": "tools/neuron-sys-tools/neuron-top-user-guide.rst",
    "content": ".. _neuron-top-ug:\n\nNeuron Top User Guide\n=====================\n\n.. contents:: Table of contents\n   :local:\n   :depth: 2\n\nOverview\n--------\n``neuron-top`` provides useful information about NeuronCore and vCPU utilization, memory usage,\nloaded models, and Neuron applications.\n\n.. note::\n\n  ``neuron-top`` fully supports the newly launched trn2 instances.\n\n.. note::\n\n  If you are parsing ``neuron-top`` output in your automation environment, you can now replace it with ``neuron-monitor``\n  (:ref:`neuron-monitor-ug`) which outputs data in a standardized, easier to parse JSON format.\n\nUsing neuron-top\n----------------\n\nCommand line arguments\n~~~~~~~~~~~~~~~~~~~~~~\nLaunch ``neuron-top`` by simply typing its name in the shell: ``neuron-top``.\n\nUser interface\n~~~~~~~~~~~~~~\n\nThe title section of the user interface shows the application's version number,\nEC2 instance ID, and the instance type on which it is running:\n\n|titleimg|\n\nThe rest of the user interface is divided in 4 sections. The data shown in these\nsections applies to the currently selected tab - which can be the 'all' tab,\nwhich aggregates data from all running Neuron processes, or a tab representing\na single Neuron process:\n\n|overview|\n\n* The ``NeuronCore <vers> Utilization`` section shows the NeuronCore utilization for the\n  currently selected tab. ``<vers>`` is the version of the NeuronCores on the instance (for example,\n  ``v2`` for trn1 instances and inf2 instances, ``v3`` for trn2 instances with ``LNC=1``, ``v3d`` for trn2\n  instances with ``LNC=2``) \n\n  Pressing the 'F' key will toggle between displaying utilization percentages - as seen in the previous image -\n  and teraflops (trillion floating point operations per second), as seen in the image below:\n\n|flops|\n\n* The ``VCPU Utilization`` section shows:\n\n  * ``System vCPU usage`` - the two percentages are user% and system%\n  * ``Runtime vCPU usage`` - same breakdown\n\n.. _neuron_top_mem_usage:\n\n* The ``Memory Usage Summary`` section provides a breakdown of the total memory usage on the Neuron Device as well\n  as on the host:\n\n  .. _neuron_top_host_mem_usage:\n\n  * ``Host Used Memory`` - amount of host memory used by the selected application (or an aggregate of all applications if 'All' is selected)\n\n    * ``Total`` - total amount of host memory used\n    * ``Tensors`` - amount of host memory used for tensors\n    * ``Constants`` - amount of host memory used for constants (for applications running training) or weights (for applications running inferences)\n    * ``DMA Buffers`` - amount of host memory used for DMA transfers\n    * ``App. Memory`` - amount of host memory used by the application that doesn't fall in any of the previous categories\n\n  .. 
_neuron_top_device_mem_usage:\n\n  * ``Device Used Memory`` - amount of device memory used by the selected application (or an aggregate of all applications if 'All' is selected)\n\n    * ``Total`` - total amount of device memory used\n    * ``Tensors`` - amount of device memory used for tensors\n    * ``Constants`` - amount of device memory used for constants (for applications running training) or weights (for applications running inferences)\n    * ``Model Code`` - amount of device memory used for storing model executable code\n    * ``Runtime Memory`` - amount of device memory used by the Neuron Runtime (outside of the previous categories)\n    * ``Model Scratchpad`` - amount of device memory used for the shared model scratchpad, a shared buffer used for internal model variables and other\n      auxiliary buffers\n\n* ``Memory Usage Details`` contains memory usage data organized as a tree which can be expanded/collapsed. The columns are:\n\n  * ``Model ID`` - the Neuron Runtime identifier for this model instance\n  * ``Host Memory`` - amount of host memory used\n  * ``Device Memory`` - amount of device memory used\n\nThe tree view shows the amount of memory used for the same categories shown in the ``Memory Usage Summary`` but in this section\nthey are attached to either a model (if the memory has been allocated at model load time for that model), or to a NeuronCore (if\nthe memory can't be associated with a model, but has been allocated for that NeuronCore).\nThe 'parent' shows the total amount of memory used - the sum of its children.\n\n.. note::\n  The up/down/left/right keys can be used to navigate the tree view. The 'x' key expands/collapses the\n  entire tree.\n\nThe bottom bar shows which Neuron process' data is currently displayed by highlighting\nits tag using a green font and marking it using a pair of '>', '<' characters. The 'all'\ntab shows an aggregated view of all the Neuron processes currently running on the instance.\n\n|tabbar|\n\n.. note::\n\n  The '1'-'9' keys select the current tab. 'a'/'d' selects the previous/next\n  tab on the bar.\n\n.. |titleimg| image:: ../../images/trn2-neuron-top-header.png\n.. |overview| image:: ../../images/trn2-neuron-top.png\n.. |flops| image:: ../../images/trn2-neuron-top-nc.png\n.. |tabbar| image:: ../../images/nt-2.png\n"
  },
  {
    "path": "tools/profiler/neuron-profile-user-guide.rst",
    "content": ".. _neuron-profile-ug:\n\nNeuron Profiler User Guide\n============================\n\nThe Neuron Profiler, ``neuron-profile``, is a tool to profile and analyze performance of a ML model compiled with the Neuron compiler and run on NeuronDevices.\n\n.. important::\n\n    The Neuron Profiler will be replaced by the new Neuron Explorer in a future release. For more details and migration guidance, see :ref:`neuron-explorer-faq`.\n\n``neuron-profile`` helps developers identify performance bottlenecks and optimize their workloads for NeuronDevices. neuron-profile provides insights into NeuronDevice activity including the instructions executed on each compute engine (ex. Tensor engine, Vector engine, etc.), DMA data movement activity, and performance metrics such as engine utilization, DMA throughput, memory usage, and more. NeuronDevice activity is collected by the ``neuron-profile capture`` command which runs the model with tracing enabled. Profiling typically has near zero overhead because NeuronDevices have dedicated on-chip hardware profiling.\n\nAdditionally, ``neuron-profile`` supports Neuron Kernel Interface (NKI) developers in profiling their kernels. For more information, please refer to :ref:`use-neuron-profile`\n\n.. _neuron-profiler-installation:\n\nInstallation\n------------\n\n``neuron-profile`` comes as part of the ``aws-neuronx-tools`` package, and will be installed to ``/opt/aws/neuron/bin``.\n\n.. note::\n\n    ``neuron-profile`` requires Ubuntu 22.04 or newer, or Amazon Linux 2023 or newer.\n    Capturing profiles requires an Inferentia or Trainium instance, but processing profiles \n    can be done on any instance type.\n\nThe Neuron web profile viewer utilizes InfluxDB OSS 2.x to store time series data for the profiled workloads after post processing.\nPlease follow the instructions provided at https://portal.influxdata.com/downloads/ for the correct OS.  A sample installation\nof Neuron Profile and InfluxDB is provided below.\n\nUbuntu\n~~~~~~\n\n.. code-block:: bash\n\n    # Install Neuron Profile\n    . /etc/os-release\n    sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF\n    deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main\n    EOF\n\n    wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -\n    sudo apt-get update -y\n    sudo apt-get install aws-neuronx-runtime-lib aws-neuronx-dkms -y\n    sudo apt-get install aws-neuronx-tools -y\n\n    # Install InfluxDB\n    wget -q https://repos.influxdata.com/influxdata-archive_compat.key\n    echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null\n    echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list\n\n    sudo apt-get update && sudo apt-get install influxdb2 influxdb2-cli -y\n    sudo systemctl start influxdb\n    influx setup\n    # Fill in the information to finish the setup\n\n\n\nCapturing a profile\n-------------------\n\nThe ``neuron-profile`` tool can both capture and post-process profiling information. 
``neuron-profile`` takes a compiled model (a NEFF), executes it, and saves the profile results to an NTFF (``profile.ntff`` by default).\nFor this example, we assume a NEFF is already available as ``file.neff``.\n\n::\n\n    $ neuron-profile capture -n file.neff -s profile.ntff\n\nCapturing profiles for multi-worker jobs\n----------------------------------------\n\n``neuron-profile`` can capture profiles for collectives-enabled NEFFs running across multiple NeuronCores, NeuronDevices, or even nodes. \nThis is useful for understanding performance and communication overheads when deploying larger distributed models.\n\nThe following example performs a distributed run across all NeuronDevices and NeuronCores on an inf2.24xlarge instance, capturing profiles for all 12 workers (one for each NeuronCore).\n\n::\n\n    $ neuron-profile capture -n file.neff --collectives-workers-per-node 12 -s output/profile.ntff\n\nA profile is saved for each worker in the output directory.\n\n::\n\n    $ ls output\n    profile_rank_0.ntff   profile_rank_2.ntff  profile_rank_6.ntff profile_rank_1.ntff   profile_rank_3.ntff  profile_rank_7.ntff\n    profile_rank_10.ntff  profile_rank_4.ntff  profile_rank_8.ntff profile_rank_11.ntff  profile_rank_5.ntff  profile_rank_9.ntff\n\nIt is also possible to run a distributed job while only capturing a profile for a specific worker instead of all workers. To do that, use the ``--collectives-profile-id`` option.\n\n::\n\n    $ neuron-profile capture -n file.neff --collectives-profile-id 5 --collectives-workers-per-node 12 -s output/profile.ntff\n    $ ls output\n    profile_rank_5.ntff\n\n\nProviding per-worker inputs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBy default, ``neuron-profile capture`` uses all-zero inputs or a single set of inputs specified via positional arguments. For multi-worker jobs where each worker needs different inputs, use the ``--multi-input`` (``-m``) option to specify a file that maps inputs to each worker.\n\nEach line in the multi-input file corresponds to one worker and follows the same format as the positional ``inputs`` argument (``<NAME> <FILE_PATH>`` pairs separated by spaces). For example, for a 2-worker job:\n\n::\n\n    # inputs.txt\n    IN1 worker0_x.npy IN2 worker0_y.npy\n    IN1 worker1_x.npy IN2 worker1_y.npy\n\nThen capture the profile with:\n\n::\n\n    $ neuron-profile capture -n file.neff -m inputs.txt --collectives-workers-per-node 2 -s output/profile.ntff\n\n.. note::\n\n    The ``--multi-input`` option cannot be used together with the positional ``inputs`` argument.\n\n\nCapturing profiles for multi-node jobs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor multi-node jobs, ``neuron-profile`` must be invoked on each node using the ``--collectives-worker-start-id`` option to specify the global index of the first worker on the given\nnode. For example, for a two-node job with a total of four workers and two workers per node, the following commands are run on each node.\n\n::\n\n    # on node 0\n    $ neuron-profile capture -n file.neff --collectives-worker-start-id 0 --collectives-workers-per-node 2 --collectives-worker-count 4\n    # on node 1\n    $ neuron-profile capture -n file.neff --collectives-worker-start-id 2 --collectives-workers-per-node 2 --collectives-worker-count 4\n\n``neuron-profile`` saves the profile for a worker on the node where that worker was launched. 
So in the case above, ``profile_rank_0.ntff`` and ``profile_rank_1.ntff``\nare saved to node 0, and ``profile_rank_2.ntff`` and ``profile_rank_3.ntff`` are saved to node 1.\n\n\n\nProcessing and viewing the profile results\n------------------------------------------\n\nTo analyze and view the collected profiling data, use the ``view`` subcommand of ``neuron-profile``. This command performs two main functions: it post-processes the profiling data and starts up an HTTP server. Once the server is running, you can access the profiling results through your web browser. Please note: Chrome is the officially supported browser for viewing profiling results\n\n\n.. note::\n    Profiles can be processed and viewed on another machine without Neuron devices. The ``aws-neuronx-tools`` package\n    needs to be installed so that you can run ``neuron-profile view``. To process the profile on another\n    instance, you need to copy the NEFF and NTFF files from your Inf or Trn instance to that instance.\n\nViewing a single profile\n~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe first way to invoke ``neuron-profile view`` is to pass both the NEFF and the NTFF to this command.\nIt will post-process these artifacts and print out a direct link to the profile view.\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff\n    View profile at http://localhost:3001/profile/n_fdc71a0b582ee3009711a96e59958af921243921\n    ctrl-c to exit\n\n\nViewing profiles for multi-worker jobs\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nProfiles from multi-worker jobs (i.e. more than one NeuronCore) can either be viewed individually or in a combined collectives view.\nSince profile data is often similar between workers and processing profile data for all workers can be time-consuming, it is recommended to first \nexplore the profile for a single worker or small subset of workers. Viewing the profile for a specific worker is the same as for single-worker profiles.\n\n::\n\n    $ neuron-profile view -n file.neff -s output/profile_rank_5.ntff\n    View profile at http://localhost:3001/profile/n_fdc71a0b582ee3009711a96e59958af921243921\n\n\nTo view the profile for multiple workers, pass the directory containing all worker profiles to ``neuron-profile``.\n\n::\n\n    $ neuron-profile view -n file.neff -d output\n    View profile at http://localhost:3001/profile_cc/p_9a69d907e1350100c9b03745eaa67aa7422842ed\n\n|neuron-profile-multiworker-timeline|\n\nWhen viewing profiles with the combined collectives view you can easily switch between the timelines of different workers by clicking\nthe \"Rank <x>\" tabs.\n\nNote: the \"CC Aggregated View\" currently shows no data. This will be populated in an upcoming release. \n\n\nViewing multiple profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAlternatively, when post-processing multiple profiles, it may be desirable to have a persistent server running while processing results in the background.\nIn this case, we can skip passing arguments to the command, which will direct users to the main page listing all available profiles.\n\n::\n\n    $ neuron-profile view\n    View a list of profiles at http://localhost:3001/\n\nIn a separate window, we can kick off the post-processing without launching another server by passing the ``--ingest-only`` flag.\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff --ingest-only\n    Profile \"n_47cf9972d42798d236caa68952d0d29a76d8bd66\" is ready to view\n\n``n_47cf9972d42798d236caa68952d0d29a76d8bd66`` is the bucket where the data is stored.  
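\n\nIf there are several NTFFs to process, such as the per-worker profiles from the multi-worker capture example earlier in this guide, a small shell loop can queue them for ingestion while the persistent server keeps running. This is a minimal sketch; the file names and ``file.neff`` are assumptions carried over from that example:\n\n::\n\n    # Ingest every worker profile in the output directory (sketch)\n    for f in output/profile_rank_*.ntff; do\n        neuron-profile view -n file.neff -s \"$f\" --ingest-only\n    done\n\n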
The ingested profile can then be found at ``localhost:3001/profile/<bucket>``.\n\nAccessing the profiles\n~~~~~~~~~~~~~~~~~~~~~~\n\nIf ``neuron-profile view`` is run on a remote instance, you may need to use port forwarding to access the profiles.\n\nFrom the local machine, SSH to the remote instance and forward ports 3001 (the default ``neuron-profile`` HTTP server port) and 8086 (the default\nInfluxDB port).  Then in the browser, go to ``localhost:3001`` to view the profiles.\n\n::\n\n    $ ssh <user>@<ip> -L 3001:localhost:3001 -L 8086:localhost:8086\n\n\n.. _neuron-profile-ug-alternative-outputs:\n\nAlternative output formats\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBesides the web view mentioned above, ``neuron-profile`` also supports other output formats such as ``summary-text`` and ``summary-json`` for viewing overall metrics of the profile,\nas well as ``json`` for a parsable alternative.\n\nProfile summary\n^^^^^^^^^^^^^^^\n\nYou can see a summary of each profile using the command ``neuron-profile view --output-format summary-text -n file.neff -s output/profile_rank_<i>.ntff``. This output\nincludes summary metrics and fields for the NeuronCore (``nc_idx``) and NeuronDevice (``nd_idx``) on which the worker was run. For example, the following shows worker 5 used core 1 on\ndevice 2 and took 0.017 seconds (17 ms) to run the model.\n\n::\n\n    $ neuron-profile view --output-format summary-text -n file.neff -s output/profile_rank_5.ntff | grep -e \"nd_idx\" -e \"nc_idx\" -e \"total_time\"\n    nc_idx      1\n    nd_idx      2\n    total_time  0.017\n\nThis summary is also available as JSON using ``--output-format summary-json``.\n\nJSON\n^^^^\n\nYou can also view the profile summary and all post-processed profiler events together as a single JSON. To do that, use the ``--output-format json`` option.\n\n::\n\n    $ neuron-profile view --output-format json --output-file profile.json -n file.neff -s output/profile_rank_5.ntff\n    $ cat profile.json\n    {\n        \"summary\": [\n            {\n                \"total_time\": 0.017,\n                \"event_count\": 11215\n                [...]\n            }\n        ],\n        \"instruction\": [\n            {\n                \"timestamp\": 10261883214,\n                \"duration\": 148,\n                \"label\": \"TensorMatrix\",\n                \"hlo_name\": \"%add.1 = add(%dot, %custom-call.44)\",\n                \"opcode\": \"MATMUL\",\n                \"operands\": \"S[5] (Tensor)++@complete acc_flags=3 row_grp=q0 src=fp16@0x5600[1,0,0][3,1,1] dst=0x2000000[1,0,0][3,1,1] 3*128 \"\n            },\n            [...]\n        ]\n    }\n\nUnderstanding a Neuron profile\n------------------------------\n\nThis section provides a quick overview of the features and information available through the Neuron web profile viewer.\n\nFor more information on terms used, please check out the :ref:`neuron_hw_glossary`.\n\nTimeline\n~~~~~~~~\n\n|neuron-profile-web-timeline|\n\nThe execution timeline is plotted based on the elapsed nanoseconds since the start of execution.\n\nStarting from the bottom, the ``TensorMatrix Utilization`` row shows the efficiency of the TensorEngine, and\nthe ``Pending DMA Count`` and ``DMA Throughput`` rows show the DMA activity.  In general, we want these to be as high\nas possible, and in some cases they may help give clues as to whether the workload is memory or compute bound.\n\nNext are the individual NeuronCore engine executions.  
These rows show the start and end times for instructions executed by each\nengine, and clicking on one of these bars will show more detailed information, as well as any dependencies that were found.\nFor models involving collective compute operations, you will additionally see rows labeled with ``CC-core``, which are used to synchronize\nthe CC operations.\n\nTowards the top is the DMA activity.  These can include the transfers of input and output tensors, intermediate tensors, and any\nadditional spilling or loading to and from the on-chip SRAM memory.\n\n\n.. _neuron-profile-ug-features:\n\nFeatures\n~~~~~~~~\n\nThe following are some useful features that may help with navigating a profile:\n\n- Dragging your cursor across a portion of the timeline will zoom in to the selected window, providing a more in depth view of the execution during that time period.\n- Hovering over a point will reveal a subset of information associated with it.\n- Clicking a point will open a text box below the timeline with all the information associated with it.\n- Right-clicking a point will drop a marker at a certain location.  This marker will persist when zooming in and out.\n\n  - All marker information can be found by clicking the ``Annotations`` button.\n  - Markers can be saved and loaded by using a provided name for the marker set.\n  - Individual markers can be renamed or deleted in this menu as well.\n  - Time span between markers will automatically be shown, and users can change the marker name next to ``diff vs`` to calculate time between other markers.\n\n|neuron-profile-annotation-menu|\n\n- The \"Search\" tab can be used to find and highlight specific points in the profile related to the queried field(s).\n- Click on the \"Box Select\" button in the top-right corner of the timeline and then click and drag on any region of the plot to select all events in that region and get summary statistics such as total duration and breakdowns of opcodes, transfer_sizes, and more.\n\nView Settings\n^^^^^^^^^^^^^\n\nOptions within the ``View Settings`` tab can be used to further customize the timeline view.  Editing any settings will update the URL accordingly, which can be used to re-visit the current view at a later time.\nTo speed up initial load times, the default will be a ``Minimal View`` which only shows the instructions executed and the model FLOPs utilization (MFU) over time.  Changing between the minimal and full views can also be done through the ``Reset to Full View`` or ``Reset to Minimal View`` buttons.\n\n- ``DMA color group`` will recolor DMAs based on the selected grouping. For example, \"Engine\" will re-color the DMAs based on the associated engine.\n- ``Instruction color group`` will recolor instructions based on the selected grouping. For example, \"Layer\" will re-color the timeline based on the associated framework layer name.\n- ``Layer group depth`` will group and color instructions at the selected layer depth. 
It will apply when ``Instruction color group`` is set to \"Layer\".\n\n  **Example:**\n    When ``Layer group depth`` is 2, instructions with layers `model/layer1/op1` and `model/layer1/op2` will be set to the same color.\n- ``Semaphore IDs`` allows for the selection of multiple semaphore values to show at once within the timeline.\n\n|neuron-profile-view-settings|\n\nAdditionally, there are various summary tabs that can be clicked to provide more information on the model/NEFFs.\n\n- ``Layer Summary`` shows timing information, FLOPs, and instruction counts per layer.\n- ``Selection Summary`` shows summarized information for all data points in the selected window when using the \"Box Select\" mode.\n- ``NEFF Header`` shows details on the profiled NEFF, such as the number of NeuronCores required to execute.\n- ``NEFF Nodes`` shows input, output, and weight tensor information, including name, size, and shape.\n- ``Model Info`` shows a summary of the NTFF, such as the NeuronCore the model was executed on, number of notifications, and hardware execution time.\n- ``DMA Queues Info`` shows more information on the queues used for data movement.\n- ``NC Memory Usage Info`` shows a snapshot of the device memory usage breakdown before profiling was started.\n- ``Terminology`` shows a description of metrics provided in the summary table.\n\n|neuron-profile-web-summaries|\n\nPerformance Warnings\n~~~~~~~~~~~~~~~~~~~~\n\nFurthermore, ``neuron-profile`` will automatically highlight some potential performance issues with warning annotations. For example, if a tensor has been loaded more than two times, a warning annotation (seen below as an orange box) will be drawn, encircling the DMA instructions where the tensor was loaded many times.\nHover over the annotation to see more details about loading the tensor. Another kind of warning annotation will highlight areas of high throttling. This gives the user a potential reason for a slowdown (thermal protection). Specific throttling details are shown when hovering over the annotation.\n\n|neuron-profile-tensor-reload-annotation|\n\n.. _neuron-profile-collectives-barrier:\n\nCollectives\n~~~~~~~~~~~\n\nFor models involving collective operations, the timeline will show a box around all data points related to each operation.  Hovering over the top left of the box will reveal more information associated with the operation.\n\n.. note::\n    This feature requires profiles to be captured with Neuron Runtime 2.20 or higher.\n\n|neuron-profile-cc-op-annotation|\n\nAdditionally, for any on-device collectives synchronization barrier, a similar box will be displayed, indicating a barrier instead of an actual collectives operation.\n\n|neuron-profile-cc-op-barrier|\n\nEvent Details\n~~~~~~~~~~~~~\n\nThe information shown when a point is clicked is grouped into categories such as `Timing` or `IDs` for convenience.\nEach row will also include a tooltip on the right side, which can be hovered over for an explanation of what the field represents.\nFor instruction `Operands` specifically, clicking on the tooltip will reveal a breakdown of the fields that compose an operand, as well as a generic example for reference.  The examples may not apply directly to the currently viewed profile.\n\n|neuron-profile-click-tooltip|\n\n\n.. _neuron-profile-framework-stack-trace:\n\nFramework Stack Trace\n----------------------------\n\nThe Framework Stack Trace feature shows up in the Event Details when an instruction on the device profile is clicked. 
This can be used to map the device instructions back to framework-level code in JAX or PyTorch to better understand what part of the application code resulted in a particular device instruction.\n\n|neuron-profile-stack-trace-event-details|\n\nTo enable tracking of the stack trace information, you need to set environment variables before compiling your NEFF:\n\n::\n\n    export XLA_IR_DEBUG=1\n    export XLA_HLO_DEBUG=1\n\nOnce you have the NEFF, you can simply capture the profile as usual. While viewing the profile, use the ``--framework-source-root`` option to pass the path to your framework source files. This is optional and is only needed if you want to view your code alongside the profile.\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff --framework-source-root /path/to/framework/source/files\n\n|neuron-profile-stack-trace-viewer|\n\nSearching Profiles\n~~~~~~~~~~~~~~~~~~\n\nSearching helps identify specific data points that may be worth investigating, such as all instructions related to a specific layer or operation.\nIn the \"Search\" tab, select the corresponding field of interest and enter the value to search for.  Multiple fields can be searched together.  Please refer to the tooltip within the tab for more help on the query syntax.\nThe search results will also include a summary of all data points found within the current time range.\n\n|neuron-profile-search-summary|\n\n\nHardware Errors\n~~~~~~~~~~~~~~~\n\nInvalid code can lead to errors on Neuron hardware. These errors will be displayed in Neuron Profile's Custom Notification timeline, as shown below. For example, an Out of Bounds (OOB) error is displayed as:\n\n|neuron-profile-oob-error|\n\nUsers can correlate the error to the time it occurred and view nearby events to help debug.\n\n\n.. _neuron-profile-scratchpad-mem-usage:\n\nView Scratchpad Usage With Memory Tracker\n------------------------------------------\n\nThe Memory Tracker feature in Neuron Profiler provides detailed insights into scratchpad memory usage over time, showing how memory is allocated and utilized by different tensors during model execution. This is particularly useful for understanding memory bottlenecks and optimizing memory usage patterns.\n\nTo enable Memory Tracker, you need to set environment variables before compiling your NEFF:\n\n::\n\n    export XLA_IR_DEBUG=1\n    export XLA_HLO_DEBUG=1\n\nThen compile your model with these debug flags enabled. After compilation, capture the profile with the ``--enable-dge-notifs`` flag or set ``NEURON_RT_ENABLE_DGE_NOTIFICATIONS=1``:\n\n::\n\n    $ neuron-profile capture -n file.neff --enable-dge-notifs\n\nFinally, view the profile with Memory Tracker enabled:\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff --enable-memory-tracker\n\nThe Memory Tracker displays a timeline showing scratchpad memory usage over time, with a detailed breakdown of which tensors are consuming memory at any given point. 
This visualization helps identify:\n\n- Peak scratchpad memory usage\n- Memory allocation patterns\n- Tensor-specific memory consumption\n- Potential memory optimization opportunities\n\n|neuron-profiler-memory-tracker|\n\nYou can interact with the Memory Tracker timeline similarly to other profile views: clicking on memory usage bars will show detailed information about the tensors using memory at that time, and you can zoom in to specific time ranges to get a more detailed view of memory allocation patterns.\n\n\nViewing Profiles with Perfetto\n------------------------------\n\nPerfetto is an open-source trace analysis toolkit with a powerful UI for visualizing and analyzing trace data.\nUsers of Neuron Profiler have the option of viewing their profiles in the Perfetto UI.\n\nTo process your profile and generate a Perfetto trace file that can be viewed in the Perfetto UI, run the following command:\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff --output-format perfetto\n\nThis will generate an ``ntff.pftrace`` file. Go to https://ui.perfetto.dev/ in your browser and open the ``ntff.pftrace`` file to view your profile in Perfetto.\n\n.. note::\n    When loading trace files in the Perfetto UI, your data is processed locally and not uploaded to Perfetto’s servers.\n\n\n|neuron-profile-perfetto-device|\n\n.. _neuron-profile-large-perfetto-profiles:\n\nViewing Large Profiles In Perfetto\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYour browser may run out of memory when viewing ``ntff.pftrace`` (Perfetto trace) files that are more than a few hundred MB.\nTo get around this problem, you can use the trace processor script by running the following commands on your local system where you wish to view the profile:\n\n::\n\n    curl -LO https://get.perfetto.dev/trace_processor\n    chmod +x ./trace_processor\n    ./trace_processor --httpd ntff.pftrace\n\nNow go to https://ui.perfetto.dev/ in your browser and in the dialog box that pops up click the “YES, use loaded trace” button.\n\nFor more information on using the trace processor script and viewing large traces, please refer to the \nPerfetto documentation at https://perfetto.dev/docs/visualization/large-traces.\n\nShowing Dependencies In Perfetto\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBy default, Neuron Profiler does not process dependencies for profiles to be viewed in Perfetto because Perfetto renders \nthe full dependency chain, which can be visually overwhelming. To include dependencies that can be viewed when clicking \ninstructions and DMAs in the Perfetto UI, use the ``--show-perfetto-flows`` flag when processing your profile.\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff --output-format perfetto --show-perfetto-flows\n\n\nCLI reference\n-------------\n\n.. rubric:: neuron-profile capture\n\n.. code-block:: text\n\n    neuron-profile capture [parameters] [inputs...]\n\nTakes a given compiled NEFF, executes it, and collects the profile results.\nWhen no inputs are provided, all-zero inputs are used, which may result in Inf or NaN values.\nIt is recommended to use ``--ignore-exec-errors``.\n\n**Parameters**\n\n``-n, --neff`` (string)\n    The compiled NEFF to profile.\n\n``-s, --session-file`` (string)\n    The file to store profile session information in.\n\n``--ignore-exec-errors``\n    Ignore errors during execution.\n\n``inputs`` (positional args)\n    List of inputs in the form of ``<NAME> <FILE_PATH>`` pairs separated by spaces. 
For example: ``IN1 x.npy IN2 y.npy``.\n\nThe following ``neuron-profile capture`` arguments are only relevant for multi-worker jobs:\n\n``-m, --multi-input`` (string)\n    Path to a file that describes the input list for each requested worker. Each line in the file should correspond to one worker and follow the same format as the ``inputs`` positional argument (i.e. ``<NAME> <FILE_PATH>`` pairs separated by spaces). Cannot be used together with the ``inputs`` positional argument. If ``inputs`` is used instead, all workers will use the same inputs.\n\n``--collectives-profile-id`` (string)\n    Worker id which will be profiled. Passing ``all`` profiles all workers. (default: ``all``)\n\n``-r, --collectives-workers-per-node`` (int)\n    The number of workers on the current node. The global worker id (rank) of worker n on current node is ``collectives-worker-start-id+n``.\n\n``--collectives-worker-count`` (int)\n    Total number of Neuron workers across all nodes for this collectives run.\n\n``--collectives-worker-start-id`` (int)\n    The rank offset for the first worker on the current node. For example, if node 0 has workers 0,1 and node 1 has workers 2,3 then ``collectives-worker-start-id`` for node 0 and 1 will be 0 and 2, respectively. (default: ``0``)\n\n.. rubric:: neuron-profile view\n\n.. code-block:: text\n\n    neuron-profile view [parameters]\n\n**Parameters**\n\n``-n, --neff-path`` (string)\n    The compiled NEFF file location.\n\n``-s, --session-file`` (string)\n    The profile results NTFF file location.\n\n``-d, --session-dir`` (string)\n    Directory containing profile files for multi-worker runs.\n\n``--output-format`` (string)\n    How the processed profile should be presented. The default ``db`` writes processed data to the database. ``summary-text`` and ``summary-json`` print the summary data as a table or json, respectively, without writing to the database. The ``perfetto`` option writes processed data to Perfetto's native protobuf based tracing format, and can be visualized in the Perfetto UI. The ``JSON`` option writes processed data to human-readable JSON. (default: ``db``)\n\n``--output-file`` (string)\n    File path to write results to, if applicable for the given output format.\n\n``--db-endpoint`` (string)\n    The endpoint of InfluxDB. (default: ``http://localhost:8086``)\n\n``--db-org`` (string)\n    The org name of InfluxDB.\n\n``--db-bucket`` (string)\n    Name of the InfluxDB bucket where ingested profile data is stored. Also used in the URL for viewing the profile. (Optional)\n\n``--port`` (int)\n    The port number of the http server. (default: ``3001``)\n\n``--force``\n    Force overwrite an existing profile in the database.\n\n``--terminology``\n    Print a helpful table of terminology used by the profiler.\n\n``--enable-memory-tracker``\n    Enable Memory Tracker to view scratchpad usage over time with a breakdown of usage per tensor. 
This requires having set ``XLA_IR_DEBUG=1`` and ``XLA_HLO_DEBUG=1`` before NEFF compilation and passing ``--enable-dge-notifs`` when capturing the profile.\n\n\nFAQ\n---\n\nDifference between TensorE and TensorMatrixE Rows in Timeline\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n- TensorE includes the instruction trace for LoadStationary (LoadWeight).\n- TensorMatrixE includes the instruction trace for MultiplyMoving (Matmul).\n- Both instruction traces happen on the same TensorE engine, but we separate them into two rows to de-clutter the timeline due to the background load stationary feature (loading the stationary matrix for the next matmul in parallel with the current matmul). See more info in the :ref:`NKI architecture guide <arch_guide_tensor_engine_perf>`.\n\nOut of memory (OOM) when capturing a profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf ``neuron-profile capture`` fails due to device out-of-memory (OOM), you can increase the available memory by using single-IO mode.\n\nSingle-IO creates one shared I/O buffer on the device equal to the size of the largest I/O tensor. All inputs and outputs then point to slices of this shared buffer instead of allocating separate tensors. This significantly lowers the device memory needed during capture at the cost of producing incorrect outputs.\n\nExample usage:\n\n::\n\n    neuron-profile capture --single-io -n file.neff -s profile.ntff\n\nImportant: with ``--single-io``, the profiled performance characteristics (e.g., timing, utilization, bandwidth) are representative, but the model outputs are intentionally not correct. Use this option only to get accurate performance measurements when device memory is tight; do not use it for correctness/accuracy validation.\n\nIf you are able to make changes to your model itself to reduce memory usage, consider the following:\n\n- Reduce the batch size\n- Lower the numerical precision\n- Reduce the number of layers\n\nIn some cases, a full device profile isn’t necessary to understand performance at a high level. You can instead capture a system profile, which shows overall model execution time and a runtime API trace across all workers and does not require extra device memory. 
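\n\nAs a minimal sketch (assuming your workload is normally launched with ``python app.py``; substitute your own launch command), a system profile can be captured through the environment variables described in the Neuron Profiler 2.0 guide and then summarized:\n\n::\n\n    # Capture a system profile while running the workload (no extra device memory needed)\n    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./inspect-output python app.py\n\n    # Print a text summary of the captured profile data\n    neuron-profile view -d ./inspect-output --output-format summary-text\n\n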
See :ref:`System Profiles overview <system-profiles-overview>`.\n\nTroubleshooting\n---------------\n\n\nOutputting to Unsupported NumPy Data Type\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen running ``neuron-profile capture --save-output-npy``, you may encounter an error if the output tensor uses a data type that NumPy doesn't natively support:\n\n::\n\n    failed to save output output_hbm to file: unsupported type for npy output: bfloat16\n\nTo work around this, use ``--save-output`` instead to save the output as raw binary, then convert it to the desired data type using NumPy and the ``ml_dtypes`` library.\nThis preserves the precision of the output since it is written to binary instead of casting to a supported data type.\n\n::\n\n    # Capture with raw binary output\n    neuron-profile capture --save-output -n file.neff\n\n    # Convert from raw binary to bfloat16\n    import numpy as np\n    import ml_dtypes\n    \n    output = np.fromfile('output0.npy', dtype=np.uint16)\n    output = output.view(ml_dtypes.bfloat16)\n\nInfluxDB not installed\n~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff\n    ERRO[0001] To install influxdb, go to https://portal.influxdata.com/downloads/ and follow the instructions there\n    influxdb not setup correctly: exec: \"influx\": executable file not found in $PATH\n\n::\n\n    $ neuron-profile view -n file.neff -s profile.ntff\n    ERRO[0000]                                              \n    influxdb token not setup correctly: exit status 1\n    Try executing \"systemctl start influxdb\" and \"influx setup\"\n\nRunning ``neuron-profile view`` without InfluxDB installed will result in an error and a pointer to the InfluxDB installation instructions.\nPlease follow the provided instructions and retry.\n\nToo many open files\n~~~~~~~~~~~~~~~~~~~\n\n::\n\n    influxdb2client E! Write error: internal error: unexpected error writing points to database: [shard 10677] open /home/ubuntu/.influxdbv2/engine/data/7caae65aaa48380d/autogen/10677/index/0/MANIFEST: too many open files\n\nInfluxDB will encounter \"too many open files\" and out of memory errors after a few hundred buckets have been created.\nTwo ways to solve this are to delete unused buckets or increase the system file descriptor limit.\n\nTo increase the file descriptor limit, add the following lines to ``/etc/security/limits.d/efa.conf`` and ``/etc/security/limits.conf``:\n\n::\n\n    *               soft    nofile      1048576\n    *               hard    nofile      1048576\n\nAdd the following lines to /etc/sysctl.conf\n\n::\n\n    fs.file-max = 197341270\n    vm.max_map_count=1048576\n\nCommit changes by running ``sudo sysctl -p``.\n\n.. |neuron-profile-web-timeline| image:: /images/neuron-profile-web-timeline_2-11.png\n.. |neuron-profile-annotation-menu| image:: /images/neuron-profile-annotation-menu_2-21.png\n.. |neuron-profile-view-settings| image:: /images/neuron-profile-view-settings_2-26.png\n.. |neuron-profile-web-summaries| image:: /images/neuron-profile-web-summaries_2-21.png\n.. |neuron-profile-tensor-reload-annotation| image:: /images/neuron-profile-tensor-reload-annotation.png\n.. |neuron-profile-multiworker-timeline| image:: /images/neuron-profile-multiworker-timelime_2-16.png\n.. |neuron-profile-cc-op-annotation| image:: /images/neuron-profile-cc-op-annotation.png\n.. |neuron-profile-cc-op-barrier| image:: /images/neuron-profile-cc-op-barrier.png\n.. |neuron-profile-click-tooltip| image:: /images/neuron-profile-click-tooltip.png\n.. 
|neuron-profile-oob-error| image:: /images/neuron-profile-oob-error.png\n.. |neuron-profile-search-summary| image:: /images/neuron-profile-search-summary.png\n.. |neuron-profiler-memory-tracker| image:: /images/neuron-profiler-memory-tracker.png\n.. |neuron-profile-stack-trace-event-details| image:: /images/neuron-profile-stack-trace-event-details.png\n.. |neuron-profile-stack-trace-viewer| image:: /images/neuron-profile-stack-trace-viewer.png\n.. |neuron-profile-perfetto-device| image:: /images/neuron-profiler2-perfetto-device.png\n\nWhen viewing UI \"FATAL - Failed metadata query\"\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIf you are SSH port forwarding the web UI from a remote machine to your local desktop you will need to port forward both the web UI (3001) and the database (8086) like so:\n\n::\n\n    ssh -L 3001:localhost:3001 -L 8086:localhost:8086 remote_machine\n\nVisual Artifacts when viewing profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSome users have reported visual artifacts when viewing certain profiles in browsers other than Chrome. If you encounter this issue, please try using Chrome. \nFor more details, refer to the GitHub issue: https://github.com/aws-neuron/aws-neuron-sdk/issues/1033\n"
  },
  {
    "path": "tools/profiler/neuron-profiler-2-0-beta-user-guide.rst",
    "content": ".. _neuron-profiler-2-0-guide:\n\nNeuron Profiler 2.0 (Beta) User Guide\n=====================================\n\nOverview\n--------\n\nNeuron Profiler 2.0 offers a user-friendly experience for capturing and analyzing application performance \nthrough both high-level system profiles and detailed device-level profiles. Users can profile their workloads \nusing framework-specific APIs within their application code or by setting an environment variable before \nexecution. This tool supports profiling for both single-node and distributed workloads, integrating with \nenvironments such as ParallelCluster and EKS. Once captured, profile results can be explored through multiple \ninterfaces: the Neuron Profiler UI, the open-source trace viewer `Perfetto <https://perfetto.dev/docs/>`_, \nor by exporting to a human-readable JSON format. This flexibility in data capture and visualization enables \nusers to gain comprehensive insights into their application's performance across various scenarios and scales.\n\n.. important::\n\n    The Neuron Profiler will be replaced by the new Neuron Explorer in a future release. For more details and migration guidance, see :ref:`neuron-explorer-faq`.\n\n.. note::\n    Neuron Profiler 2.0 is a set of new features currently in beta that enhance and simplify the experience of \n    capturing and viewing profiles. It is not a replacement of :ref:`Neuron Profiler <neuron-profile-ug>`, \n    which is the existing feature set specifically for capturing and viewing device profiles.\n\n.. _system-profiles-overview:\n\nKey benefits\n~~~~~~~~~~~~\n\n- End-to-end timing of model execution and a Neuron Runtime API trace across all workers, helping identify scheduling gaps, synchronization, and host/runtime overheads.\n- No extra device memory usage by default, making system profiles ideal when device memory is limited or when only high-level insights are needed.\n- Option to capture device profiles for individual models during your workload. \n- Flexible capture and viewing: enable via environment variables or framework APIs; view in the Neuron Profiler UI, in Perfetto, or export as JSON.\n\nCapturing profiles\n------------------\n\nNeuron Profiler 2.0 offers several flexible options for capturing profiles. Users can either set an environment \nvariable ``NEURON_RT_INSPECT_ENABLE`` or use the PyTorch or JAX profiling APIs from their application code for \nfine-grained control over which sections of their code are profiled. PyTorch and JAX users who prefer not to \nmodify their application code can still enable profiling by setting the environment variable before running \ntheir application.\n\nJAX User Experience\n-------------------\n\nJAX Setup\n~~~~~~~~~~~~\n\nFollow the :ref:`JAX Setup <setup-jax-neuronx>` instructions to install the required\nJAX Neuron Plugin and the latest Neuron Driver, Runtime and Tools packages.\n\n\nJAX Profiler\n~~~~~~~~~~~~\n\nThe JAX context-managed profiling API allows you to profile blocks of code. This will capture a system profile \nincluding a Neuron Runtime API trace and Python trace for your application code in the captured block. This \nwill also capture device profiles for any compiled graphs (NEFFs) executed on NeuronCores within this block. To use \nthe profiler, import the ``jax`` package.\n\n.. code-block:: python\n\n    import jax\n\nProfiling is enabled for all code enclosed in the context when using \n``with jax.profiler.trace(os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"]):``\n\n.. 
note::\n     It is important to pass the output directory ``os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"]`` to \n     ``with jax.profiler.trace`` and run ``export NEURON_RT_INSPECT_OUTPUT_DIR=<your output directory>`` \n     before enabling profiling. This ensures all captured profile data is saved to the correct output directory.\n\nCustom Annotations in JAX\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo add custom annotations to blocks of code in your profile, you can use ``jax.profiler.TraceAnnotation``. \nAnnotation names can be created at runtime, such as in the :ref:`example here <neuron-profile-full-jax-example>` \nusing ``with jax.profiler.TraceAnnotation(\"my_label\"+str(i)):``. For more information on TraceAnnotations, \nsee the official `JAX documentation <https://jax.readthedocs.io/en/latest/_autosummary/jax.profiler.TraceAnnotation.html>`_.\n\nJAX Profiling using environment variable\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nInstead of using the ``jax.profiler`` context manager, you can enable profiling for your entire application using \nan environment variable. This is desirable if you want to capture a profile without modifying your application \ncode. To enable profiling, set the environment variables ``NEURON_RT_INSPECT_ENABLE=1`` and \n``NEURON_RT_INSPECT_OUTPUT_DIR=./output`` before running your application.\n\nFor example:\n\n.. code-block:: shell\n\n    # make sure to remove the call to jax.profiler.trace from the python script\n    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python jax_script.py\n\nWhen using the ``NEURON_RT_INSPECT_ENABLE`` environment variable instead of ``jax.profiler``, system profiles \nwill not contain a framework and application code trace, only a Neuron Runtime API trace.\n\nDo not set the ``NEURON_RT_INSPECT_ENABLE`` environment variable and use ``jax.profiler`` within your \napplication code at the same time. Use one or the other.\n\nFor more profiling options that can be set through environment variables, see the section :ref:`Profile Capture Environment Variables <neuron-profiler-capture-environment-variables>`.\n\n.. _neuron-profile-full-jax-example:\n\nFull JAX Example\n~~~~~~~~~~~~~~~~\n\nCreate a file ``jax_script.py`` that performs repeated matrix multiplications distributed across Neuron devices.\n\n.. 
code-block:: python\n\n    from functools import partial\n    import os\n    import jax\n    import jax.numpy as jnp\n\n    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P\n    from jax.experimental.shard_map import shard_map\n    from time import sleep\n\n    os.environ[\"XLA_FLAGS\"] = \"--xla_dump_hlo_snapshots --xla_dump_to=./dump\"\n\n    jax.config.update(\"jax_default_prng_impl\", \"rbg\")\n\n    mesh = Mesh(jax.devices(), ('i',))\n\n    def device_put(x, pspec):\n        return jax.device_put(x, NamedSharding(mesh, pspec))\n\n    lhs_spec = P('i', None)\n    lhs = device_put(jax.random.normal(jax.random.key(0), (128, 128)), lhs_spec)\n\n    rhs_spec = P('i', None)\n    rhs = device_put(jax.random.normal(jax.random.key(1), (128, 16)), rhs_spec)\n\n    @jax.jit\n    @partial(shard_map, mesh=mesh, in_specs=(lhs_spec, rhs_spec), out_specs=rhs_spec)\n    def matmul_allgather(lhs_block, rhs_block):\n        rhs = jax.lax.all_gather(rhs_block, 'i', tiled=True)\n        return lhs_block @ rhs\n\n    with jax.profiler.trace(os.environ[\"NEURON_RT_INSPECT_OUTPUT_DIR\"]):\n        out = matmul_allgather(lhs, rhs)\n        for i in range(10):\n            with jax.profiler.TraceAnnotation(\"my_label\"+str(i)):\n                out = matmul_allgather(lhs, rhs)\n            sleep(0.001)\n\n    expected = lhs @ rhs\n    with jax.default_device(jax.devices('cpu')[0]):\n        equal = jnp.allclose(jax.device_get(out), jax.device_get(expected), atol=1e-3, rtol=1e-3)\n        print(\"Tensors are the same\") if equal else print(\"Tensors are different\")\n\nSet your profile output directory and run the script:\n\n.. code-block:: shell\n\n    export NEURON_RT_INSPECT_OUTPUT_DIR=./output\n    python jax_script.py\n\nPyTorch User Experience\n-----------------------\n\nPyTorch Setup\n~~~~~~~~~~~~~\n\nFollow the :ref:`PyTorch Setup <setup-torch-neuronx>` instructions to install the required PyTorch Neuron packages \nas well as the latest Neuron Driver, Runtime and Tools. \n\nPyTorch Profiler\n~~~~~~~~~~~~~~~~\n\nThe PyTorch context-managed profiling API allows you to profile blocks of code. This will capture a system \nprofile including a Neuron Runtime API trace and Python trace for your application code in the captured block. \nThis will also capture device profiles for any compiled graphs executed on NeuronCores within this block. To \nuse the profiler, import it in your application:\n\n.. code-block:: python\n\n    from torch_neuronx.experimental import profiler\n\nThen profile a block of code using:\n\n.. code-block:: python\n\n    with torch_neuronx.experimental.profiler.profile(\n        port=9012,\n        profile_type='system',\n        target='neuron_profile_perfetto',\n        output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'],\n        ms_duration=30000) as profiler:\n\nAfter modifying your code to call the profiler, run your application as you normally would \nbut set the environment variable ``NEURON_RT_INSPECT_OUTPUT_DIR`` to specify the output directory.\n\n.. code-block:: shell\n\n    NEURON_RT_INSPECT_OUTPUT_DIR=./output python application.py\n\n.. note::\n     it is essential to set ``output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR']`` when starting the profiler from your application code. \n     This ensures that all profile data sources dump to the same output directory. 
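\n\nAfter the run completes, the captured data in the output directory can be processed with the viewing commands described later in this guide. As a sketch, assuming the ``./output`` directory used above:\n\n.. code-block:: shell\n\n    # Process the captured system and device profiles and launch the Neuron Profiler UI\n    neuron-profile view -d ./output\n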
\n\nPyTorch Profiling using Environment Variable\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nInstead of using the ``torch_neuronx.experimental.profiler.profile`` context manager, you can enable profiling \nfor your entire application using environment variable. This is desirable if you want to capture a profile without modifying your application code. To enable profiling \nwith environment variable ``NEURON_RT_INSPECT_ENABLE=1`` and ``NEURON_RT_INSPECT_OUTPUT_DIR=./output`` before running your application.\n\nFor example\n\n.. code-block:: shell\n\n    # make sure to remove call to with torch_neuronx.experimental.profiler.profile from python script\n    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python pytorch_script.py\n\nWhen using the ``NEURON_RT_INSPECT_ENABLE`` environment variable instead of ``torch_neuronx.experimental.profiler.profile`` system profiles will not contain a framework and application code trace, only Neuron Runtime API trace.\n\nDo not set the ``NEURON_RT_INSPECT_ENABLE`` environment variable and use the ``torch_neuronx.experimental.profiler.profile`` within your application code at the same time. Use one or the other. \n\nFor more profiling options that can be set through environment variables, see the section :ref:`Profile Capture Environment Variables <neuron-profiler-capture-environment-variables>`.\n\n\nFull PyTorch Example\n~~~~~~~~~~~~~~~~~~~~\n\nCreate a file ``train_torchrun_context.py`` with the following contents\n\n.. code-block:: python\n\n    import os\n\n    import torch\n    import torch.nn as nn\n    import torch.nn.functional as F\n\n    # XLA imports\n    import torch_xla\n    import torch_xla.core.xla_model as xm\n    import torch_xla.debug.profiler as xp\n\n    import torch_neuronx\n    from torch_neuronx.experimental import profiler\n\n    os.environ[\"NEURON_CC_FLAGS\"] = \"--cache_dir=./compiler_cache\"\n\n    # Global constants\n    EPOCHS = 2\n\n    # Declare 3-layer MLP Model\n    class MLP(nn.Module):\n        def __init__(self, input_size=10, output_size=2, layers=[5, 5]):\n            super(MLP, self).__init__()\n            self.fc1 = nn.Linear(input_size, layers[0])\n            self.fc2 = nn.Linear(layers[0], layers[1])\n            self.fc3 = nn.Linear(layers[1], output_size)\n\n        def forward(self, x):\n            x = F.relu(self.fc1(x))\n            x = F.relu(self.fc2(x))\n            x = self.fc3(x)\n            return F.log_softmax(x, dim=1)\n\n    def main():\n        # Fix the random number generator seeds for reproducibility\n        torch.manual_seed(0)\n\n        # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)\n        device = xm.xla_device()\n\n        # Start the profiler context-manager\n        with torch_neuronx.experimental.profiler.profile(\n            port=9012,\n            profile_type='system',\n            target='neuron_profile_perfetto',\n            output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'],\n            ms_duration=30000) as profiler:\n\n            # IMPORTANT: the model has to be transferred to XLA within\n            # the context manager, otherwise profiling won't work\n            model = MLP().to(device)\n            optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n            loss_fn = torch.nn.NLLLoss()\n\n            # start training loop\n            print('----------Training ---------------')\n            model.train()\n            for epoch in range(EPOCHS):\n                optimizer.zero_grad()\n                train_x 
= torch.randn(1, 10).to(device)\n                train_label = torch.tensor([1]).to(device)\n\n                # forward\n                loss = loss_fn(model(train_x), train_label)\n\n                # back\n                loss.backward()\n                optimizer.step()\n\n                # XLA: collect ops and run them in XLA runtime\n                xm.mark_step()\n\n        print('----------End Training ---------------')\n\n    if __name__ == '__main__':\n        main()\n\nRun this workload with the following command:\n\n.. code-block:: shell\n\n    NEURON_RT_INSPECT_OUTPUT_DIR=\"output\" python simple_demo.py\n\n.. _neuron-profiler-non-framework-user-experience:\n\nNon-framework Specific User Experience\n--------------------------------------\n\nYou can also control profiling with environment variables. This is useful when you can’t easily change your \napplication code, such as when running an executable which calls the Neuron Runtime or in a containerized \nenvironment where the application code is built into the container image.\n\n.. _neuron-profiler-capture-environment-variables:\n\nProfile Capture Environment Variables\n--------------------------------------\n\n.. _core-control-variables:\n\nCore control variables\n~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Description\n     - Default behavior\n   * - ``NEURON_RT_INSPECT_ENABLE``\n     - Set to ``1`` to enable profiling\n     - Enables system profiling and disables device profiling. To control which profile types are captured, see :ref:`Profile type selection <profile-type-selection>`\n   * - ``NEURON_RT_INSPECT_OUTPUT_DIR``\n     - Directory for profile data output\n     - Default directory for captured profile data is ``./output``\n\n.. _profile-type-selection:\n\nProfile type selection\n~~~~~~~~~~~~~~~~~~~~~~~\n\n.. note:: \n    \n    When ``NEURON_RT_INSPECT_ENABLE`` set to ``1``, ``NEURON_RT_INSPECT_SYSTEM_PROFILE`` is enabled by default (set to 1) and ``NEURON_RT_INSPECT_DEVICE_PROFILE`` is disabled by default (set to ``0``).\n\nWhen ``NEURON_RT_INSPECT_ENABLE`` = 1, two different profile types are available:\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Profile type\n     - Description\n     - Enable capture\n     - Disable capture\n   * - ``NEURON_RT_INSPECT_SYSTEM_PROFILE``\n     - System-level\n     - Captures runtime system events and operations\n     - Set to ``1``\n     - Set to ``0``\n   * - ``NEURON_RT_INSPECT_DEVICE_PROFILE``\n     - Device-level\n     - Captures detailed NeuronCore hardware metrics\n     - Set to ``1``\n     - Set to ``0``\n\n.. note::\n\n    These variables have no effect if ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``.\n\n.. _advanced-config-vars:\n  \nAdvanced configuration\n~~~~~~~~~~~~~~~~~~~~~~~\n\n.. list-table::\n   :widths: auto\n   :header-rows: 1\n   :align: left\n\n   * - Variable\n     - Profile type\n     - Description\n     - Default behavior\n   * - ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``\n     - System-level\n     - Maximum trace events per NeuronCore before oldest events are overwritten\n     - 1,000,000\n\n.. 
note:: \n    \n    Increasing the event limit will consume more host memory.\n\nExample Capturing Profile of Application Using Environment Variables\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nInstead of using the PyTorch or JAX profilers you can profile your Python application (or any application calling the Neuron Runtime API) using environment variables.\n\n.. code-block:: shell\n\n    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python app.py\n\nSee :ref:`Profile Capture Environment Variables <neuron-profiler-capture-environment-variables>` for other profiling options that can be set via environment variable.\n\nExample Capturing Profile of nccom-test Using Environment Variables\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nProfiling can be enabled using environment variables. For simplicity, we have a quick way to generate a Neuron workload through using :ref:`nccom-test <nccom-test>`. nccom-test is a benchmarking tool which is already available with Neuron AMI.\n\n.. code-block:: shell\n\n    export NEURON_RT_INSPECT_ENABLE=1\n    export NEURON_RT_INSPECT_OUTPUT_DIR=./output\n    nccom-test allr allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\n\n.. note::\n    If you have problems with nccom-test add the --debug flag.\n    If using a trn1.2xlarge instance, change -r 32 to -r 2 to use fewer neuron cores.\n\nTo understand the profiling output see this section: :ref:`Inspect Output <neuron-profiler-inspect-output>`\n\nCLI reference for System Profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn addition to controlling profiling with environment variables, you can use the ``neuron-profile inspect`` command line interface \nfor profiling applications. This provides the same functionality as environment variables but helps you avoid typos, invalid arguments, \nand provides a useful ``--help`` command to explain available options.\n\n.. code-block:: shell\n\n    Usage:\n    neuron-profile [OPTIONS] inspect [inspect-OPTIONS] [userscript...]\n\n    Application Options:\n    -v, --version                      Show version and exit\n\n    Help Options:\n    -h, --help                         Show this help message\n\n    [inspect command options]\n        -o, --output-dir=              Output directory for the captured profile data, including system and device profiles (default: ./output)\n        -n, --num-trace-events=        Maximum number of trace events to capture when profiling. Once hitting this limit, no new events are recorded\n            --capture-system-profiles  Disable capture of system profile data. Can reduce output size.\n            --capture-device-profiles  Disable capture of device profile data. Can reduce output size.\n\n    [inspect command arguments]\n    userscript:                        Run command/script that launches a Neuron workload. E.g. 'python app.py' or './runscript.sh'\n\n\nExample of using System Profiles CLI\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUser can provide any type of their own script to generate a Neuron workload such as Pytorch to the System Profiles CLI. \nFor simplicity, we have a quick way to generate a Neuron workload \nthrough using ``nccom-test``. ``nccom-test`` is a benchmarking tool which is already available with Neuron AMI and ``aws-neuronx-tools`` package.\n\n.. 
code-block:: shell\n\n    ubuntu@ip-172-31-63-210:~$ neuron-profile inspect -o inspect-output-nccom-test nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\n    INFO[0000] Running command \"nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512\" with profiling enabled\n        size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)\n        524288          131072    fp32           24.15          21.71          21.03\n    Avg bus bandwidth:    21.0339GB/s\n\n.. note::\n    If you have problems with nccom-test add the --debug flag.\n    If using a trn1.2xlarge instance, change -r 32 to -r 2 to use fewer neuron cores.\n\n.. _neuron-profiler-inspect-output:\n\n``neuron-profile inspect`` Output\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe above command shows a Neuron workload execution is being traced and output to ``inspect-output-nccom-test`` directory. \nYou will see the output directory contains a single NEFF file and a device profile (NTFF) for all Neuron Cores which executed that NEFF. \nYou will also see ``ntrace.pb`` and ``trace_info.pb`` files storing the system profile data.\nBelow showing what the outputs will look like:\n\n.. code-block:: shell\n\n    ubuntu@ip-172-31-63-210:~$ tree inspect-output-nccom-test\n    inspect-output-nccom-test\n        ├── i-012590440bb9fd263_pid_98399\n        │   ├── 14382885777943380728_instid_0_vnc_0.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_1.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_10.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_11.ntff\n        ...\n        │   ├── 14382885777943380728_instid_0_vnc_8.ntff\n        │   ├── 14382885777943380728_instid_0_vnc_9.ntff\n        │   ├── cpu_util.pb\n        │   ├── host_mem.pb\n        │   ├── neff_14382885777943380728.neff\n        │   ├── ntrace.pb\n        │   └── trace_info.pb\n        └──\n\n    2 directories, 74 files\n\n\nTo view a summary of the captured profile data run the command\n\n.. code-block:: shell\n\n    neuron-profile view -d inspect-output-nccom-test --output-format summary-text\n\n\nEKS User Experience\n-------------------\n\nCapturing a profile on EKS is most easily done through setting of environment variables as described in the section \n:ref:`Non-framework specific User Experience <neuron-profiler-non-framework-user-experience>`. By using environment \nvariables, users do not need to change application code in their container image or modify their run commands. \n\nUpdate the deployment yaml to include the ``NEURON_RT_INSPECT_ENABLE`` and ``NEURON_RT_INSPECT_OUTPUT_DIR`` \nenvironment variables. For distributed workloads, it’s important that ``NEURON_RT_INSPECT_OUTPUT_DIR`` points to a \ndirectory on a shared volume which all workers have access to.\n\n.. 
code-block:: yaml\n\n    apiVersion: v1\n    kind: Pod\n    metadata:\n      name: trn1-mlp\n    spec:\n      restartPolicy: Never\n      schedulerName: default-scheduler\n      nodeSelector:\n        beta.kubernetes.io/instance-type: trn1.32xlarge\n      containers:\n        - name: trn1-mlp\n          env:\n            - name: NEURON_RT_INSPECT_ENABLE\n              value: \"1\"\n            - name: NEURON_RT_INSPECT_OUTPUT_DIR\n              value: \"/shared/output\"\n          command: ['torchrun']\n          args:\n            - '--nnodes=1'\n            - '--nproc_per_node=32'\n            - 'train_torchrun.py'\n          image: ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:mlp\n          imagePullPolicy: IfNotPresent\n          resources:\n            limits:\n              aws.amazon.com/neuron: 16\n\n\n.. note::\n\n    EKS users running PyTorch and JAX applications are still free to change their application code \n    and use the PyTorch or JAX Python profiling APIs if they want finer-grained control over profiling. \n    However, using the environment variables conveniently allows profiling without modifying the \n    container image or application code.\n\nProcessing and Viewing Profiles\n-------------------------------\n\nUsers have three output options for interacting with their captured profiles:\n\n* Neuron Profiler UI - Neuron’s custom UI which allows easily drilling down to detailed device profiles from high level system profiles\n* Perfetto - Allows sharing profiles as a single file and viewing your profiles in the Perfetto UI at https://ui.perfetto.dev/\n* JSON - human-readable text output that enables simple scripting \n\nNeuron Profiler UI\n~~~~~~~~~~~~~~~~~~\n\nTo view a profile in the Neuron Profiler UI, run the following command to process the profile and launch the UI:\n\n.. code-block:: shell\n\n    neuron-profile view -d ./output\n\nTo view profiles with the Neuron Profiler UI running locally, you will need to have InfluxDB installed on your system. \nTo install and set up InfluxDB, follow the :ref:`directions in the official Neuron Profile documentation <neuron-profiler-installation>`.\n\n\nNeuron Profiler System Profile UI\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe system profile timeline shows a trace of Neuron Runtime API calls, ML framework function calls, CPU utilization, and memory usage on each of the instances in your workload. \nThe Neuron Runtime API trace is grouped by NeuronCore IDX and ec2 instance ID. For example, all events in the row \nlabeled nrt-nc-003-i-0f207fb2a99bd2d08 are associated with NeuronCore 3 and instance i-0f207fb2a99bd2d08.\n\nFramework function traces are grouped by thread id and ec2 instance id. For example, all events in \nthe row framework-3266405268-i-0f207fb2a99bd2d08 are framework or application function calls made on thread \n3266405268 running on instance i-0f207fb2a99bd2d08.\n\n\n|neuron-profiler2-annotate-system-ui|\n\nClicking on trace events in the timeline shows an “Event attributes” view with a list of attributes associated with that event. \nFor example, clicking on an nrt_execute event (the Neuron Runtime API call for executing a compiled model on a NeuronCore) \nwill show attributes such as Flop count (the number of floating point operations for a single execution of the model), \nthe model name, and the NeuronCore idx and ec2 instance id associated with the function call. 
\n\n|neuron-profiler2-attributes-window|\n\nNeuron Profiler 2.0 allows users to drill down from a system timeline to a device profile timeline in order to see a detailed view \nof hardware activity during the execution of a graph. To do this, select an nrt_execute event in the timeline and in the \n“Event attributes” view select the \"Open device profile\" button under the Model Name attribute. \nThis will open a new window with a device profile. For help understanding a device profile, see the documentation section \"Understanding a Neuron Profile\".\n\n|neuron-profiler2-drilldown-device|\n\nTo see a list of all device profiles that were captured during your workload, press the “Device Profiles” button at the bottom of the timeline. From this list you can \nsee all unique compiled graphs (NEFFs) that were executed on NeuronCores during your workload. For each graph there is a link to a device \nprofile that will show a detailed view of hardware activity on the NeuronCore during execution of this graph. \n\n|neuron-profiler2-device-profile-list|\n\n\nViewing Profiles with Perfetto\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPerfetto is an open-source trace analysis toolkit with a powerful UI for visualizing and analyzing trace data.\nUsers of Neuron Profiler have the option of viewing their profiles in the Perfetto UI.\n\nThe ``--output-format perfetto`` option writes processed data to Perfetto's native protobuf-based tracing format which can be visualized in the Perfetto UI at https://ui.perfetto.dev/.\n\nExample:\n\n.. code-block:: shell\n\n    neuron-profile view -d ./output --output-format perfetto\n\nThis will generate a ``system_profile.pftrace`` file for the system profile and a ``device_profile_model_<model_id>.pftrace`` file for each unique compiled model that was executed on a Neuron Device.\n\nTo view the system profile, go to https://ui.perfetto.dev/ and open the ``system_profile.pftrace`` file.\n\n.. note::\n    When loading trace files in the Perfetto UI, your data is processed locally and not uploaded to Perfetto’s servers.\n\n|neuron-profiler2-perfetto-timeline|\n\nTo view a device profile, go to https://ui.perfetto.dev/ and open the ``device_profile_model_<model_id>.pftrace`` file. This will show a detailed view of hardware activity on the NeuronCore during execution of this graph.\n\n|neuron-profiler2-perfetto-device-timeline|\n\n.. note::\n    Your browser may run out of memory when viewing ``*.pftrace`` (Perfetto trace) files that are more than a few hundred MB. See the section :ref:`Viewing Large Profiles in Perfetto <neuron-profile-large-perfetto-profiles>` for directions on how to view large traces using the trace processor.\n\n\nPerfetto Output View Options\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen outputting to Perfetto it is possible to group your traces by different attributes. This is useful for\nlarger profiles involving many NeuronCores and instances. The following options are available:\n\n.. list-table:: Perfetto output view options\n     :header-rows: 1\n     :widths: 30 70\n\n     * - CLI option\n       - Description\n     * - ``--system-trace-primary-group``\n       - First-order grouping of trace events (maps to a Perfetto process / process group of rows). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``thread_id``, ``lnc_idx``, ``process_id``. Default: ``instance_id,process_id``.\n     * - ``--system-trace-secondary-group``\n       - Second-order grouping of trace events (maps to a Perfetto thread / single row). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``worker_gid``, ``thread_id``, ``lnc_idx``, ``process_id``. Default: ``worker_gid,lnc_idx,thread_id``.\n\n\nFor example, the following profile uses ``neuron-profile view --output-format=perfetto --system-trace-primary-group=instance_id,process_id --system-trace-secondary-group=lnc_idx,thread_id`` to group the system profile first by unique combinations\nof instance_id and process_id, and then in each of those groups there are rows of events with unique combinations of lnc_idx and thread_id.\n\n|neuron-profiler2-perfetto-grouping|\n
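\nThe grouping options described above can be combined into a single command line. The following is a minimal sketch that assumes the profile was captured to ``./output``, as in the earlier examples:\n\n.. code-block:: shell\n\n    # Primary grouping: one Perfetto process per instance_id/process_id combination\n    # Secondary grouping: one row per lnc_idx/thread_id combination within each group\n    neuron-profile view -d ./output --output-format=perfetto --system-trace-primary-group=instance_id,process_id --system-trace-secondary-group=lnc_idx,thread_id\n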
\nGrouping By Global Worker ID\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nBy default, Perfetto traces are grouped by ``worker_gid``, which is a unique global identifier for each NeuronCore across all instances in a distributed workload.\nWhen clicking on an event in the trace you will see fields for both ``lnc_idx`` (local NeuronCore index on that process) and ``worker_gid`` (global NeuronCore index across all instances).\nIt is possible for ``lnc_idx`` to be the same for different processes on the same instance or across different instances in a distributed workload. However, ``worker_gid`` is unique for each NeuronCore across all instances.\nThe image below shows how to correlate the naming of tracks (rows) in the Perfetto UI to both ``lnc_idx`` and ``worker_gid``.\n\n|neuron-profiler2-perfetto-gid|\n\n\n\nGenerating JSON Output From Profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``--output-format json`` option writes processed profile data to human-readable JSON that can be used for scripting and manual inspection.\n\n.. code-block:: shell\n\n    neuron-profile view -d ./output --output-format json\n\nThis will generate a ``system_profile.json`` file containing the system profile data and a ``device_profile_model_<model_id>.json`` file for each unique compiled model that was executed on a Neuron Device. \n\nThe ``system_profile.json`` file contains the following data types:\n\n* ``trace_events``: Neuron Runtime API trace events and Framework/Application trace events containing timestamps, durations, names, and the ec2 instance-id to differentiate between events from different compute nodes in a distributed workload.\n\n.. code-block:: json\n\n    {\n        \"Neuron_Runtime_API_Event\": {\n            \"duration\": 27094,\n            \"group\": \"nrt-nc-000\",\n            \"id\": 1,\n            \"instance_id\": \"i-0f207fb2a99bd2d08\",\n            \"lnc_idx\": \"0\",\n            \"name\": \"nrt_tensor_write\",\n            \"parent_id\": 0,\n            \"process_id\": \"1627711\",\n            \"size\": \"4\",\n            \"tensor_id\": \"4900392441224765051\",\n            \"tensor_name\": \"_unknown_\",\n            \"thread_id\": 1627711,\n            \"timestamp\": 1729888371056597613,\n            \"type\": 11\n        },\n        \"Framework_Event\": {\n            \"duration\": 3758079,\n            \"group\": \"framework-80375131\",\n            \"instance_id\": \"i-0f207fb2a99bd2d08\",\n            \"name\": \"PjitFunction(matmul_allgather)\",\n            \"process_id\": \"701\",\n            \"thread_id\": 80375131,\n            \"timestamp\": 1729888382798557372,\n            \"type\": 99999\n        }\n    }\n\n* ``mem_usage``: sampled host memory usage \n\n.. 
code-block:: json\n\n    {\n        \"duration\": 1,\n        \"instance_id\": \"i-0f207fb2a99bd2d08\",\n        \"percent_usage\": 9.728179797845964,\n        \"timestamp\": 1729888369286687792,\n        \"usage\": 51805806592\n    }\n\n* ``cpu_util``: sampled CPU utilization. Results are provided per core and per ec2 instance involved in a distributed workload\n\n.. code-block:: json\n\n    {\n        \"cpu_id\": \"47\",\n        \"duration\": 1,\n        \"instance_id\": \"i-0f207fb2a99bd2d08\",\n        \"timestamp\": 1729888371287337243,\n        \"util\": 2.3255813\n    },\n\n\nProcessing only system or device profiles\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nTo reduce processing times it is possible to skip processing of system or device profiles. Sometimes users may only be interested in one or want to start  with a limited set of profiling data before exploring the full profile.  \n\nTo skip processing of device profiles use the ``--ignore-device-profile`` option. To skip processing of system profiles use the ``--ignore-system-profile`` option. These options can be used with the ``--output-format`` values ``db`` (default), ``perfetto``, or ``json``.\n\nFor example:\n\n.. code-block:: shell\n\n    neuron-profile view -d ./output --ignore-device-profile --output-format perfetto\n\n.. _neuron-profiler-filtering-system-profiles:\n\nFiltering System Profiles\n--------------------------\n\nThis guide explains how to filter system trace events to optimize memory usage, reduce output size, and speed up trace processing. **Capture-time filtering** reduces memory usage and trace file size by only collecting specific events, but filtered data cannot be recovered later. **Processing-time filtering** preserves the complete trace and allows flexible analysis with different filters, but requires more memory and storage during capture.\n\nCapture-Time Filtering\n~~~~~~~~~~~~~~~~~~~~~~\n\nConfigure filters before trace capture using environment variables or API functions. \nYou can use NeuronCore filters to only capture events for specific NeuronCores (for example only events associated with NeuronCore 0 or all the NeuronCores on a specific NeuronDevice). \nYou can use event type filters to only capture specific events (for example model execute or collectives events). \nIt is possible to combine both NeuronCore and event type filters.\n\nFiltering by NeuronCore\n^^^^^^^^^^^^^^^^^^^^^^^\n\nIf capture is enabled for a NeuronCore then a ring buffer will be allocated in host memory for storing those core's events. Thus filtering by NeuronCore decreases host memory usage during capture.\n\nDefault Behavior\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nBy default, all visible NeuronCores are enabled for capture. \n\nUsing Environment Variables\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: shell\n\n    # Filter to capture events only from NeuronCore 0\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0\n\n    # Filter to capture events from NeuronCores 0, 2, and 4\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0,2,4\n\n    # Filter to capture events from a range of NeuronCores (0 through 3)\n    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0-3\n\n    # Reset to default behavior\n    unset NEURON_RT_INSPECT_EVENT_FILTER_NC # Back to capturing all visible cores\n\nUsing API Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Allocate and configure trace options\n    nrt_sys_trace_config_t *config;\n    nrt_sys_trace_config_allocate(&config);\n    nrt_sys_trace_config_set_defaults(config);\n\n    // Enable capture only for specific NeuronCores\n\n    // Disable all cores since by default they are all enabled\n    int num_cores = 128;\n    for (int i=0; i<num_cores; i++) {\n      nrt_sys_trace_config_set_capture_enabled_for_nc(config, i, false); // disable NC i\n    }\n\n    // Then enable specific cores\n    nrt_sys_trace_config_set_capture_enabled_for_nc(config, 0, true);  // Enable NC 0\n    nrt_sys_trace_config_set_capture_enabled_for_nc(config, 2, true);  // Enable NC 2\n\n    // Start tracing with the configuration\n    nrt_sys_trace_start(config);\n\n    // Your application code here...\n\n    // Stop tracing and cleanup\n    nrt_sys_trace_stop();\n    nrt_sys_trace_config_free(config);\n\nFiltering by Event Type\n^^^^^^^^^^^^^^^^^^^^^^^\n\nDefault Behavior\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nBy default, all event types are enabled for capture.\n\nGetting Available Event Types\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nYou can discover all available event types using the ``nrt_sys_trace_get_event_types`` API.\n\n.. code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Get all available event types\n    const char **event_types = nullptr;\n    size_t count = 0;\n    NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count);\n\n    if (status == NRT_SUCCESS) {\n        printf(\"Available event types:\\n\");\n        for (size_t i = 0; i < count; ++i) {\n            printf(\"  %s\\n\", event_types[i]);\n        }\n        \n        // Free the event types array\n        for (size_t i = 0; i < count; ++i) {\n            free((void*)event_types[i]);\n        }\n        free((void*)event_types);\n    }\n\nUsing Environment Variables\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nThe ``NEURON_RT_INSPECT_EVENT_FILTER_TYPE`` environment variable supports:\n\n* **Default**: If not set, all event types are captured\n* **Specific event types**: Use exact event names from ``nrt_sys_trace_get_event_types()``\n* **Event categories**: Use ``hardware`` or ``software`` to filter by category\n* **Exclusion**: Use ``^`` prefix to exclude specific events from a category\n\n.. code-block:: shell\n\n    # Filter to capture only specific event types\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=nrt_load,nrt_execute,nc_exec_running\n\n    # Filter to capture all hardware events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware\n\n    # Filter to capture all software events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software\n\n    # Filter to capture all hardware events EXCEPT cc_exec\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,^cc_running\n\n    # Filter to capture all software events EXCEPT nrt_load\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software,^nrt_load\n\n    # Mix categories and specific events\n    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,nrt_tensor_write,nrt_tensor_read\n\n    # Reset to default behavior\n    unset NEURON_RT_INSPECT_EVENT_FILTER_TYPE  # Back to capturing all event types\n\nThe ``hardware`` group contains events that are executed on the NeuronCore. 
\nThese are ``nc_exec_running``, ``cc_running``, ``cc_exec_barrier``, ``numerical_err``, ``nrt_model_switch``, ``timestamp_sync_point``, ``hw_notify``.\nThe ``software`` group contains all other events.\n\nUsing API Functions\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\nUse the ``nrt_sys_trace_config_set_capture_enabled_for_event_type`` API to filter by event type.\n\n.. code-block:: c\n\n    #include <nrt/nrt_sys_trace.h>\n\n    // Configure trace options\n    nrt_sys_trace_config_t *config;\n    nrt_sys_trace_config_allocate(&config);\n    nrt_sys_trace_config_set_defaults(config); // By default, all event types are enabled\n\n    // Disable specific event types (others remain enabled)\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"device_exec\", false);\n\n    // Or disable all first, then enable only specific ones\n    const char **all_event_types = nullptr;\n    size_t all_count = 0;\n    nrt_sys_trace_get_event_types(&all_event_types, &all_count);\n\n    // Disable all event types first\n    for (size_t i = 0; i < all_count; ++i) {\n        nrt_sys_trace_config_set_capture_enabled_for_event_type(config, all_event_types[i], false);\n    }\n\n    // Enable only specific event types\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"model_load\", true);\n    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, \"nrt_execute\", true);\n\n    // Verify which event types are enabled\n    const char **enabled_types = nullptr;\n    size_t enabled_count = 0;\n    nrt_sys_trace_config_get_enabled_event_types(config, &enabled_types, &enabled_count);\n    printf(\"Enabled event types: %zu\\n\", enabled_count);\n    for (size_t i = 0; i < enabled_count; ++i) {\n        printf(\"  %s\\n\", enabled_types[i]);\n    }\n\n    // Clean up memory (caller is responsible)\n    for (size_t i = 0; i < enabled_count; ++i) {\n        free((void*)enabled_types[i]);\n    }\n    free((void*)enabled_types);\n\n    for (size_t i = 0; i < all_count; ++i) {\n        free((void*)all_event_types[i]);\n    }\n    free((void*)all_event_types);\n\n    // Start tracing\n    nrt_sys_trace_start(config);\n\n    // Your application code here...\n\n    // Cleanup\n    nrt_sys_trace_stop();\n    nrt_sys_trace_config_free(config);\n\n.. _neuron-profile-system-timestamp-adjustment:\n\nAdjusting Hardware Timestamps\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHardware events executed on the NeuronCore use device-specific timestamps that are in a different time domain than CPU timestamps. To enable accurate correlation between hardware and software events in the JSON system trace output, the runtime automatically adjusts hardware event timestamps to the CPU time domain using synchronization point events.\n\nHow Timestamp Adjustment Works\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nSystem trace events are generated from multiple independent time domains: the CPU host and each ML accelerator devices operating with their own clocks. To align events from different domains, the runtime performs software-based time synchronization after event collection.\n\n**Sync Point Events**: After each execution, a special ``timestamp_sync_point`` event captures nearly simultaneous timestamps from both the host CPU (``cpu_timestamp_ns``) and the device (``nc_timestamp_ns``). These sync events are used to adjust the timestamps of hardware events to the CPU domain. \nThese synchronization events are included in the returned event trace and serve as reference points for timestamp adjustment. 
Users can see the sync point used for aligning hardware events in the timeline.\n\n**Adjustment Algorithm**: For each hardware event, the runtime:\n\n- Uses the sync point with matching exec_id for that NeuronCore\n- Calculates the time difference between the hardware event and the sync point (in device time)\n- Applies that same time difference to the sync point's CPU timestamp\n- Formula: ``adjusted_timestamp = sync_cpu_timestamp + (event_device_timestamp - sync_device_timestamp)``\n\nIllustration::\n\n         Sync_Point           HW_Event\n                 │                │\n                 ▼                ▼\n    Device Time ─●────────────────●───>\n                 |-------Δt------>|     - sync_device_timestamp and sync_cpu_timestamp occur ~simultaneously, though their clocks differ\n    CPU Time ────●────────────────●───> - Calc Δt = event_device_timestamp - sync_device_timestamp (elapsed time since sync point on device)\n                 |-------Δt------>|     - Add Δt to sync_cpu_timestamp to get adjusted_timestamp\n\n|neuron-profiler2-syncpoint-timeline|\n\n**Hardware Events**: Hardware events that require timestamp adjustment include:\n\n- ``nc_exec_running`` (NeuronCore execution start/stop)\n- ``cc_running`` (collective communication execution)\n- ``cc_exec_barrier`` (collective communication barriers)\n- ``numerical_err`` (numerical errors)\n- ``nc_model_switch`` (NeuronCore model switching)\n\nTips\n^^^^\n\n1. **Memory Optimization**: Use NeuronCore filtering to avoid allocating ring buffers for unused cores and decrease host memory usage. Use both event type or NeuronCore to decrease output trace sizes.\n2. **Event Type Discovery**: Use ``nrt_sys_trace_get_event_types()`` to discover available event types\n3. **Category Filtering**: Use ``hardware``/``software`` categories for broad filtering\n4. **Exclusion Filtering**: Use ``^`` prefix to exclude specific events from categories\n5. **Combine Filters**: Use both NeuronCore and event type filters together for maximum optimization\n\nProcessing-Time Filtering\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nApply filters when viewing or processing already captured profiles. This approach allows you to \nanalyze the same trace data in different ways without recapturing. The filters can be used for any \n``neuron-profile`` output format including ``--output-format json`` and ``--output-format perfetto``.\n\nFiltering by NeuronCore\n^^^^^^^^^^^^^^^^^^^^^^^\n\nUse the ``--system-trace-filter-neuron-core`` to only process events for specific NeuronCores. The IDs are local to the instance and not global IDs. \n\nIf the ``--system-trace-filter-neuron-core`` argument is not set then events from all NeuronCores will be included in the processed trace.\n\n.. code-block:: shell\n\n    # Filter by single neuron core\n    neuron-profile view -d ./output --system-trace-filter-neuron-core \"0\" --output-format perfetto\n\n    # Filter by multiple neuron cores\n    neuron-profile view -d ./output --system-trace-filter-neuron-core \"0,1,2,3\" --output-format perfetto\n\nFiltering by Event Type\n^^^^^^^^^^^^^^^^^^^^^^^\n\nUse the ``--system-trace-filter-event-type`` to only process specific trace events types.\n\nIf the ``--system-trace-filter-event-type`` argument is not set then all event types will be included in the processed trace.\n\n.. 
code-block:: shell\n\n    # Filter by single event type\n    neuron-profile view -d ./output --system-trace-filter-event-type \"nrt_execute\" --output-format perfetto\n\n    # Filter by multiple event types\n    neuron-profile view -d ./output --system-trace-filter-event-type \"nrt_execute,nrt_load\" --output-format perfetto\n\nFiltering by Instance ID\n^^^^^^^^^^^^^^^^^^^^^^^^\n\nUse the ``--system-trace-filter-instance-id`` to only process events for specific ec2 instances.\n\nIf the ``--system-trace-filter-instance-id`` argument is not set then events from all instances will be included in the processed trace.\n\n.. code-block:: shell\n\n    # Filter by single instance\n    neuron-profile view -d ./output --system-trace-filter-instance-id \"i-abc123\" --output-format perfetto\n\n    # Filter by multiple instances (comma-separated)\n    neuron-profile view -d ./output --system-trace-filter-instance-id \"i-abc123,i-def456,i-ghi789\" --output-format perfetto\n\nTroubleshooting\n---------------\n\nIncomplete JAX Profiles\n~~~~~~~~~~~~~~~~~~~~~~~\n\nIf your JAX profile has fewer events than expected or lacks the Runtime API trace, check whether \n``jax.profiler.stop_trace`` is being called inside a ``with jax.profiler.trace`` context block. \nThis can prematurely stop tracing. Use ``jax.profiler.stop_trace`` only when profiling was started \nwith ``jax.profiler.start_trace``, not when using the context-managed ``with jax.profiler.trace`` API.\n\nAlso when using ``jax.profiler`` within your script ensure that the \nenvironment variable ``NEURON_RT_INSPECT_ENABLE`` is not set to 1. \nAdditionally, ensure that ``NEURON_RT_INSPECT_OUTPUT_DIR`` is set to \nthe correct output directory and this is the output directory passed to \n``with jax.profiler.trace``.\n\nDropped Events in System Profile\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen processing a system profile, you may see a warning indicating that some trace events were dropped during capture.\n\n.. code-block:: shell\n\n    WARN[0000] Warning: 1001 trace events were dropped during capture (stored 530560 out of 531561 total events). Consider increasing buffer size, reducing trace duration, or filtering events.\n\nThis means during capture the trace event buffers filled and oldest events were overwritten. If you need to avoid dropping events for the full duration of your workload consider the following adjustments:\n\n* Increase buffer size by setting ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC`` (see :ref:`Profile Capture Environment Variables <neuron-profiler-capture-environment-variables>`). This will increase host memory usage.\n* Apply capture-time filters (NeuronCores / event types) (see :ref:`Filtering System Profiles <neuron-profiler-filtering-system-profiles>`.)\n* Shorten profiled region: limit the code span under the profiling context / runtime.\n\n\n.. |neuron-profiler2-annotate-system-ui| image:: /images/neuron-profiler2-annotate-system-ui.png\n.. |neuron-profiler2-attributes-window| image:: /images/neuron-profiler2-attributes-window.png\n.. |neuron-profiler2-device-profile-list| image:: /images/neuron-profiler2-device-profile-list.png\n.. |neuron-profiler2-drilldown-device| image:: /images/neuron-profiler2-drilldown-device.png\n.. |neuron-profiler2-perfetto-timeline| image:: /images/neuron-profiler2-perfetto-timeline.png\n.. |neuron-profiler2-perfetto-device-timeline| image:: /images/neuron-profiler2-perfetto-device-timeline.png\n.. |neuron-profiler2-perfetto-grouping| image:: /images/neuron-profiler2-perfetto-grouping.png\n.. 
|neuron-profiler2-syncpoint-timeline| image:: /images/neuron-profiler2-syncpoint-timeline.png\n.. |neuron-profiler2-perfetto-gid| image:: /images/neuron-profiler2-perfetto-gid.png\n"
  },
  {
    "path": "tools/tensorboard/getting-started-tensorboard-neuronx-plugin.rst",
    "content": ".. _neuronx-plugin-tensorboard:\n\nNeuronX Plugin for TensorBoard (Trn1)\n======================================\n\n.. contents:: Table of Contents\n  :local:\n  :depth: 2\n\n\nOverview\n--------\n\nThis guide is for developers who want to better understand how their\nmodel is executed using Neuron SDK through TensorBoard.\n\nThe Neuron plugin for TensorBoard provides metrics to the performance of machine learning tasks accelerated using the Neuron SDK. It is\ncompatible with TensorBoard versions 1.15 and higher. It provides visualizations and profiling results for graphs executed on NeuronCores.\n\n.. note::\n\n    The following information is compatible with Neuron SDK for Trn1.  For a walkthrough on Inf1, please check out the guide\n    :ref:`neuron-plugin-tensorboard`.\n\n\nEnable profiling on Trn1\n------------------------\n\n.. note::\n\n   Profiling is currently only supported with PyTorch Neuron (``torch-neuronx``).\n\nPlease refer to the following guides:\n\n- PyTorch-Neuron\n    - :ref:`torch-neuronx-profiling-with-tb`\n\n\nLaunch TensorBoard\n------------------\n\nIn this step, we will process the Neuron profile data and launch TensorBoard.\n\n1. Install the Neuron plugin for Tensorboard on your EC2 instance.\n\n.. code:: bash\n\n\n    pip install tensorboard-plugin-neuronx --extra-index-url https://pip.repos.neuron.amazonaws.com\n\n.. note::\n\n   If using TensorBoard >= 2.5, please use the ``--load_fast=false`` option when launching.\n   ``tensorboard --logdir results --load_fast=false``\n\n2. After you see the following message, TensorBoard is ready to use.  By default,\nTensorBoard will be launched at ``localhost:6006``.\n\n::\n\n   ...\n   Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all\n   TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)\n\n\nView results in TensorBoard\n---------------------------\n\nIn this step, we will view the Neuron plugin for TensorBoard from a browser on your local\ndevelopment machine.\n\n1. Connect to the EC2 instance where TensorBoard is running while enabling port forwarding.\nIn this example, we assume TensorBoard has been launched using the default address ``localhost:6006``.\n\n.. code:: bash\n\n   # if Ubuntu-based AMI\n   ssh -i <PEM key file> ubuntu@<instance DNS> -L 6006:localhost:6006\n\n   # if AL2-based AMI\n   ssh -i <PEM key file> ec2-user@<instance DNS> -L 6006:localhost:6006\n\n2. In a browser, visit |tensorboard_address|.\n\n3. In the top navigation bar, switch from ``Graphs`` to ``Neuron``.  If it does not show up,\nplease wait a while and refresh the page while the plugin loads.  If the issue persists, check\nthe ``Inactive`` dropdown list on the right and check for ``Neuron``.\n\n|image1|\n\n4. 
If TensorBoard failed to find the generated logs, you will see the following message:\n\n|image2|\n\n\nIn this case, please make sure the version of the ``aws-neuronx-tools``\npackage and the Neuron framework package is from Neuron release 2.6 or newer.\n\n\nNeuron Trace View\n-----------------\n\n|image3|\n\nThe trace view gives a high level timeline of execution by aligning Neuron events, such as Neuron Device execution,\ndata transfers, and Collective Compute synchronization (if applicable), with other events from the XLA profiler.\n\nUse this view to better understand bottlenecks during the run, and potentially experiment with how execution changes\nby moving the ``mark_step()`` call which will execute the graph.\n\n\nNeuron Operator View\n--------------------\n\n|image4|\n\nThe operator view can show timing information for both the framework operators and HLO operators by selecting\nthe ``operator-framework`` and ``operator-hlo`` tools respectively.  The pie charts show breakdowns of the time taken\nby device, as well as per operator on a single device.  The table below lists out the operators and can be sorted by clicking\non the columnn headers.  For fused operations, hover over the ``?`` to see which operators are being executed.\n\nFor a quick glance at the most time consuming operators, click the ``Time %`` column in the table to sort by the relative\ntime spent on this type of operation compared to the rest of the model.\n\n\nNeuron Operator Timeline View\n-----------------------------\n\n|image5|\n\nThe operator timeline view is a detailed look into a single execution with Neuron.  A high level overview at the top breaks\ndown the execution into categories, including Neuron Runtime setup time, as well as NeuronCore compute engine and DMA engine busyness.\nActivity on the compute and DMA engines are further categorized as compute, control, and data transfer intervals which are\nshown as separate processes, with each showing a hierarchical view of the framework operators and their corresponding\nHLO operation.  The fused operations can be a result of compiler optimizations or are operations that are running in\nparallel on the device.  Each bar can be clicked to show information regarding which operators are overlapped.\n\nThis view can give better insight into how operators translate to Neuron, as well as how certain Neuron compiler options\nmay improve performance.\n\n\nTroubleshooting\n---------------\n\nTensorBoard launch fails\n~~~~~~~~~~~~~~~~~~~~~~~~\n\n::\n\n    ImportError: cannot import name 'Mapping' from 'collections'\n\nThis is an issue with Python 3.10 and a dependency of an old tensorboard version.  To workaround this error, please run\n``pip install --upgrade tensorboard``.  For more information, see https://github.com/tensorflow/tensorboard/pull/5490.\n\n.. |image1| image:: /images/Neuron_Profiler_Tensorboard_Dropdown.jpg\n.. |image2| image:: /images/tb-plugin-img12.png\n  :height: 2914\n  :width: 5344\n  :scale: 10%\n.. |image3| image:: /images/Neuron_Profiler_Runtime_Trace_Original.jpg\n.. |image4| image:: /images/Neuron_Profiler_T1_Op_Framework_View.png\n.. |image5| image:: /images/TB_Operator_Timeline_2-10.png\n.. |tensorboard_address| raw:: html\n\n   <a href=\"http://localhost:6006\" target=\"_blank\">localhost:6006</a>\n"
  },
  {
    "path": "tools/tensorboard/index.rst",
    "content": ".. _tensorboard-neuron:\n\nTensorBoard\n===========\n\nTensorBoard integration with AWS Neuron provides powerful visualization and debugging capabilities for machine learning workloads. The Neuron TensorBoard plugins enable developers to monitor training progress, analyze model performance, and debug compilation issues through familiar TensorBoard interfaces.\n\n.. toctree::\n    :maxdepth: 1\n    :hidden:\n\n    TensorBoard for NeuronX </tools/tensorboard/getting-started-tensorboard-neuronx-plugin>\n\nTensorBoard for Trn1\n--------------------\n\n.. grid:: 1\n   :gutter: 3\n\n   .. grid-item-card:: TensorBoard Plugin for NeuronX (Trn1)\n      :link: /tools/tensorboard/getting-started-tensorboard-neuronx-plugin\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Comprehensive guide for using the TensorBoard Neuron plugin on Trn1 instances, including installation, configuration, and advanced visualization features.\n\n   .. grid-item-card:: Profiling PyTorch NeuronX (``torch-neuronx``) with TensorBoard\n      :link: /tools/tutorials/torch-neuronx-profiling-with-tb\n      :link-type: doc\n      :class-header: sd-bg-primary sd-text-white\n\n      Step-by-step tutorial for monitoring PyTorch training progress on Trn1 instances using TensorBoard scalars, metrics visualization, and performance tracking.\n\n"
  },
  {
    "path": "tools/third-party-solutions.rst",
    "content": ".. _third-party-tool-solutions:\n\nThird-party solutions\n======================\n\nAWS Neuron integrates with multiple third-party partner solutions that alow you to run deep learning workloads on Amazon EC2 \ninstances powered by AWS Trainium and AWS Inferentia chips. The following list gives an overview of third-party solutions \nthat work with AWS Neuron.\n\nDatadog\n\"\"\"\"\"\"\"\nDatadog, an observability and security platform, provides real-time monitoring for cloud infrastructure and ML operations. Datadog is \nexcited to launch its AWS Neuron integration, which pulls metrics collected by Neuron SDK’s Neuron Monitor tool into Datadog, \nenabling users to track the performance of their Trainium and Inferentia-based instances. By providing real-time visibility into \nmodel performance and hardware usage, Datadog helps customers ensure efficient training and inference, optimized resource \nutilization, and the prevention of service slowdowns.\n\n`Datadog documentation <https://docs.datadoghq.com/integrations/aws_neuron/?tab=host>`_\n\n"
  },
  {
    "path": "tools/tutorials/index.rst",
    "content": ".. _neuron-tools-tutorials:\n\nTutorials\n============\n\n.. toctree::\n    :hidden:\n    :maxdepth: 1\n\n    performance-profiling-vllm\n    torch-neuronx-profiling-with-tb\n    tutorial-tensorboard-scalars-mnist\n    tutorial-neuron-monitor-mnist\n\n.. grid:: 1 2 2 2\n   :gutter: 3\n\n   .. grid-item-card:: Profiling a vLLM Inference Workload\n      :link: /tools/tutorials/performance-profiling-vllm\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to capture and analyze device-level and system-level profiles for vLLM inference workloads on AWS Trainium. \n\n   .. grid-item-card:: Profiling a NKI Kernel\n      :link: /nki/guides/use-neuron-profile\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to profile a NKI kernel with Neuron Explorer.\n\n   .. grid-item-card:: Profiling PyTorch Neuron with TensorBoard\n      :link: tutorial-tensorboard-scalars-mnist\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to use Neuron's plugin for TensorBoard that allows users to measure and visualize performance on a torch runtime level or an operator level.\n\n   .. grid-item-card:: Track System Resource Utilization during Training with Neuron Monitor\n      :link: tutorial-neuron-monitor-mnist\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to monitor resource utilization using neuron-monitor, Prometheus and Grafana while running a multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.\n\n   .. grid-item-card:: Track Training Progress in TensorBoard using PyTorch Neuron\n      :link: torch-neuronx-profiling-with-tb\n      :link-type: doc\n      :class-card: sd-border-1\n\n      Learn how to track training progress in TensorBoard while running a multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.\n"
  },
  {
    "path": "tools/tutorials/performance-profiling-vllm.rst",
    "content": ".. meta::\n    :description: Learn how to use Neuron Explorer to capture and analyze system-level and device-level profiles for vLLM inference workloads on AWS Trainium\n    :date-modified: 12/02/2025\n\nProfiling a vLLM Inference Workload on AWS Trainium\n==========================================================================\n\nThis tutorial outlines the steps involved in using Neuron Explorer to capture and view system-level and device-level profiles for a vLLM-hosted inference workload on AWS Trainium.\n\nOverview\n--------\n\nBy following this tutorial you will learn how to:\n\n* Launch a vLLM-hosted inference workload on AWS Trainium with system and device-level profiling enabled\n* View the system-level profile using Perfetto\n* Identify regions within the system profile that show LLM context-encoding (prefill) and token generation (decode) running on the NeuronDevices, along with the names of the associated compute graphs\n* View the device-level profiles for context-encoding & token generation compute graphs in the Neuron Explorer UI\n\nPrepare your environment\n------------------------\n\nThe following steps show how to launch a Trainium EC2 instance using the latest Neuron Deep Learning AMI (DLAMI) and then install vLLM so that an example vLLM-hosted model can be profiled using the Neuron Explorer. If you would prefer to use a containerized environment (Docker, EKS), please refer to the Neuron documentation to get started with a Neuron Deep Learning Container (DLC) image that has vLLM pre-installed.\n\n1. Launch a Trainium instance (trn1.32xlarge, trn2.3xlarge, trn2.48xlarge)\n    1. Option 1: Launch the instance using the latest AWS Deep Learning AMI (DLAMI), which includes the Neuron SDK preinstalled. Once the instance is launched, please SSH into it and use the virtual environment for neuronx-distributed-inference by following this command -\n        1. ``source /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate``\n    2. Option 2: If using a fresh Linux instance, manually install the latest Neuron packages by following the AWS Neuron installation guide.\n2. Install vLLM\n    1. Refer to the Neuron documentation which outlines how to install the Neuron vLLM fork from source.\n\nStep 1: Save a smaller version of your model\n--------------------------------------------\n\nWhen profiling LLMs it is usually desirable to use only a subset of the model's layers in order to understand model performance and to identify possible bottlenecks. Capturing traces for the entire model could lead to an excessive volume of profiling data, making analysis cumbersome. To address this, the following script takes the Qwen3-8B-base model, truncates it to the first 4 layers, and saves the resulting smaller model for profiling purposes.\n\n.. code-block:: python\n\n    import transformers\n\n    model_id = \"Qwen/Qwen3-8B-Base\"\n    config = transformers.AutoConfig.from_pretrained(model_id)\n    config.num_hidden_layers = 4\n    config.layer_types = [\"full_attention\"] * 4\n    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)\n    output_dir = \"4layer_qwen3\"\n\n    model = transformers.AutoModelForCausalLM.from_pretrained(model_id, config=config)\n    model.save_pretrained(output_dir)\n    tokenizer.save_pretrained(output_dir)\n\nSave the above python script as ``save_4layer_qwen.py`` and then run it using the python interpreter:\n\n.. 
code-block:: bash\n\n    python3 ./save_4layer_qwen.py\n\nOnce the script has completed, you should see the new ``4layer_qwen`` directory which contains the truncated model.\n\nStep 2: Run a vLLM offline inference workload with profiling enabled\n--------------------------------------------------------------------\n\nIn this step, you will run a small vLLM offline inference script that will compile, run, and profile your 4-layer Qwen3 model on the Trainium chips.\n\nBegin by saving the following python script as ``qwen3_offline_inference.py``:\n\n.. code-block:: python\n\n    import os\n    os.environ['VLLM_NEURON_FRAMEWORK'] = \"neuronx-distributed-inference\"\n\n    # Enable Neuron profiling via environment variables\n    os.environ['XLA_IR_DEBUG'] = \"1\"\n    os.environ['XLA_HLO_DEBUG'] = \"1\"\n    os.environ['NEURON_FRAMEWORK_DEBUG'] = \"1\"\n    os.environ['NEURON_RT_INSPECT_ENABLE'] = \"1\"\n    os.environ['NEURON_RT_INSPECT_SYSTEM_PROFILE'] = \"1\"\n    os.environ['NEURON_RT_INSPECT_DEVICE_PROFILE'] = \"1\"\n    os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'] = \"./neuron_profiles\"\n\n    from vllm import LLM, SamplingParams\n\n    # Sample prompts.\n    prompts = [\n        \"The president of the United States is\",\n        \"The capital of France is\",\n        \"The future of AI is\",\n    ]\n    # Create a sampling params object.\n    sampling_params = SamplingParams(top_k=1)\n\n    # Create an LLM instance using the 4-layer Qwen3 model\n    llm = LLM(\n        model=\"4layer_qwen3\",\n        max_num_seqs=4,\n        max_model_len=128,\n        additional_config={\n            \"override_neuron_config\": {\n                \"enable_bucketing\":False,\n            },\n        },\n        enable_prefix_caching=False,\n        tensor_parallel_size=8)\n\n    # Run inference using the sample prompts\n    outputs = llm.generate(prompts, sampling_params)\n\nNext, run the offline inference script with a Python interpreter:\n\n.. code-block:: bash\n\n    python3 ./qwen3_offline_inference.py\n\nAfter ~60s the script should complete, and you will see a new ``neuron_profiles`` directory which contains both system-level and device-level profile traces for this example inference workload.\n\nStep 3: Visualize the system profile for your model\n---------------------------------------------------\n\n.. note::\n   System profiles are currently viewed using the open-source Perfetto tool. Viewing of system profiles will be natively supported by the Neuron Explorer UI in an upcoming release.\n\nRun the following command to generate a Perfetto compatible file from the system profile traces that you previously captured:\n\n.. code-block:: bash\n\n    neuron-explorer view -d ./neuron_profiles --ignore-device-profile \\\n      --output-format perfetto\n\nThe above command generates a file called ``system_profile.pftrace`` in your working directory.\n\nCopy the ``system_profile.pftrace`` file to your local machine and open up the Perfetto UI in your local web browser.\n\nIn the left-hand menu, choose \"Open trace file\" and select your ``system_profile.pftrace`` file to view the system profile. Expand the first row under Default Workspace and you will see a timeline view similar to the following:\n\n.. image:: /tools/profiler/images/perf-profiling-1.png\n\nThe system profile shows a high-level chronological view of the various Neuron Runtime API calls that took place during your example inference workload. 
If you hover the mouse cursor over the various pink/green bars you can see which specific API call occurred at each time point, such as ``nrt_tensor_read``, ``nrt_tensor_write``, ``nrt_execute``, and ``nrt_load_collectives``.\n\nLook for the **nrt_execute** bar identified below and select it. This will open an information dialog providing details of the specific ``nrt_execute`` call:\n\n.. image:: /tools/profiler/images/perf-profiling-2.png\n\n.. image:: /tools/profiler/images/perf-profiling-3.png\n\nIn the Arguments pane you will find useful information such as the following:\n\n* device_profile - the unique name of the device profile associated with this event\n* nc_idx - the index of the NeuronCore that is associated with this API call\n* model_name - path to the compiled Neuron Executable File Format (NEFF) compute graph associated with this event\n\nIn the above screenshot, notice that the model_name field provides additional information about what is happening during this part of the model execution:\n\n.. code-block:: text\n\n    tmp/nxd_model/context_encoding_model/_tp0_bk0/model.MODULE_6d1668c2294e2409dd72+ad9e832d.neff\n\n* ``context_encoding_model`` - indicates that this is handling context-encoding (prefill) during vLLM inference (other model names will alternatively include token_generation_model to indicate the token-generation / decode phase of inference).\n* ``tp0`` - indicates that this profile is associated with the rank0 of the tensor-parallel (TP) replica group\n* ``bk0`` - indicates that this profile is associated with the first sequence bucket as configured in Neuronx Distributed Inference (NxDI) NeuronConfig.\n\nStep 4: Visualize device profiles in Neuron Explorer\n----------------------------------------------------\n\nIn this step, you will view a device profile for your model in Neuron Explorer UI.\n\nIf you look inside the ``neuron_profiles`` directory that was created during Step 2, you will see many Neuron Executable File Format (NEFF) and their associated Neuron Trace File Format (NTFF) files. For each pair of NEFF/NTFF files, the NEFF represents the Neuron-compiled compute graph for a portion of your model, and the NTFF represents the device-level profile trace for that specific compute graph.\n\nWhile you are free to view any of the device-level profiles using the Neuron Explorer UI, it is often more useful to start from the system-level profile and identify a specific device-level profile of interest. Let's refer back to the nrt_execute region of the system-level profile that was covered in the previous section. Please find and left-click this region to bring up the information dialog at the bottom of Perfetto:\n\n.. image:: /tools/profiler/images/perf-profiling-4.png\n\n.. image:: /tools/profiler/images/perf-profiling-5.png\n\nIn the device_profile field, note that numerical ID that is included at the end of the device profile name, in this case 2120860766. This ID is what you will use to locate the NEFF/NTFF pair associated with this specific nrt_execute API call.\n\nUse the following find command (substituting-in your device profile ID) to locate the NEFF/NTFF files associated with your identified ID:\n\n.. code-block:: bash\n\n    find ./neuron_profiles -name \\*2120860766\\* | sort\n\n.. image:: /tools/profiler/images/perf-profiling-6.png\n\nIn the above output you can see that there is a single NEFF file ``neff_2120860766.neff``, and multiple NTFF files ``2120860766_instid_0_vnc_0.ntff`` ... 
``2120860766_instid_0_vnc_7.ntff``, each representing the profile trace for one of the 8 NeuronCores that participated in this inference request.\n\nThese are the files you will open in the Neuron Explorer UI to inspect the device-level execution.\n\nPlease copy the NEFF and one of the NTFF files to your local machine, as you will need to upload the files to the Neuron Explorer UI using your web browser.\n\nTo open the Neuron Explorer web UI, execute the ``view`` command:\n\n.. code-block:: bash\n\n    $ neuron-explorer view --data-path ./<workspace> --output-format parquet\n\n``<workspace>`` is a path that neuron-explorer will use for storing and managing profiles.\n\nThe above command also prints a URL that you can click to open the web UI:\n\n.. code-block:: text\n\n    View a list of profiles at http://localhost:3001/\n\nIf ``neuron-explorer view`` is run on a remote instance, you may need to use port forwarding to access the web UI. By default, ``neuron-explorer`` creates a web server on port 3001 and the API server on port 3002. To enable connection from the browser on your local computer, you must establish an SSH tunnel to both ports 3001 and 3002.\n\nFor example:\n\n.. code-block:: bash\n\n    ssh -L 3001:localhost:3001 -L 3002:localhost:3002 <user>@<ip> -fN\n\nIf you created an EC2 instance with PEM credentials, include them in the SSH tunnel as seen below:\n\n.. code-block:: bash\n\n    ssh -i ~/my-ec2.pem -L 3001:localhost:3001 -L 3002:localhost:3002 ubuntu@[PUBLIC_IP_ADDRESS] -fN\n\nOnce the SSH tunnel is set up, you can open a browser and navigate to http://localhost:3001.\n\nWith the Neuron Explorer UI open, go to \"Profile Manager\", and click \"Upload Profile\" at the top-right of the screen. Give your profile an appropriate name, and upload the NEFF and NTFF files that you previously identified:\n\n.. image:: /tools/profiler/images/perf-profiling-7.png\n\nAfter a few seconds, you should receive a message indicating that the NEFF/NTFF files were uploaded successfully:\n\n.. image:: /tools/profiler/images/perf-profiling-8.png\n\nWithin the Neuron Explorer UI, go to the Profile Manager screen and look for your newly uploaded profile.\n\n.. image:: /tools/profiler/images/perf-profiling-9.png\n\nDepending on the size of your profile, it could take a few minutes before the Status field shows \"PROCESSED\". Once processing is complete, click the profile name to open the profile:\n\n.. image:: /tools/profiler/images/perf-profiling-10.png\n\nConfirmation\n------------\n\nCongratulations, you have now successfully generated both system-level and device-level profiles for a vLLM inference workload using Neuron Explorer and learned how to visualize them. This knowledge will enable you to effectively analyze the performance characteristics of your workload and identify potential optimization opportunities.\n\nClean up\n--------\n\nAfter completing your profiling experiments, remember to terminate the instance you launched to avoid unnecessary costs.\n\nNext steps\n----------\n\nNow that you've completed this tutorial, try profiling your own model to analyze its workload. Identify performance gaps, apply optimizations, and profile again to measure the improvements. For a deeper dive into performance analysis, check out Neuron's blog series on profiling."
  },
  {
    "path": "tools/tutorials/torch-neuronx-profiling-with-tb.rst",
    "content": ".. _torch-neuronx-profiling-with-tb:\n\nProfiling PyTorch NeuronX with TensorBoard\n==============================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nIntroduction\n------------\n\nNeuron provides a plugin for TensorBoard that allows users to measure and visualize\nperformance on a torch runtime level or an operator\nlevel. With this information, it becomes quicker to identify any\nperformance bottleneck allowing for quicker addressing of that issue.\n\nFor more information on the Neuron plugin for TensorBoard, see :ref:`neuronx-plugin-tensorboard`.\n\nSetup\n-----\n\nPrerequisites\n~~~~~~~~~~~~~\n\n1. Initial `Trn1 setup for PyTorch\n   (torch-neuronx) <https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html>`__\n   has been done\n\nEnvironment\n~~~~~~~~~~~\n\n::\n\n   #activate python virtual environment and install tensorboard_plugin_neuron\n   source ~/aws_neuron_venv_pytorch_p38/bin/activate\n   pip install tensorboard_plugin_neuronx\n\n   #create work directory for the Neuron Profiling tutorials\n   mkdir -p ~/neuron_profiling_tensorboard_examples\n   cd ~/neuron_profiling_tensorboard_examples\n\n\n\nPart 1: Operator Level Trace for ``xm.markstep()`` workflow\n-------------------------------------------------------------\n\nGoal\n~~~~\n\nAfter completing this tutorial, the user should be able to understand\nthe features of the Operator Level Trace. The user should also be able\nto form a narrative/surface level analysis from what is being presented\nin the Operator Level Trace.\n\nSet Up\n~~~~~~\n\nLet’s set up a directory containing the material for this demo\n\n::\n\n   cd ~/neuron_profiling_tensorboard_examples\n   mkdir tutorial_1\n   cd tutorial_1\n\n   # this is where our code will be written\n   touch run.py\n\nHere is the code for ``run.py``:\n\n::\n\n   import os\n   import torch\n   import torch_neuronx\n   from torch_neuronx.experimental import profiler\n   import torch_xla.core.xla_model as xm\n\n   os.environ[\"NEURON_CC_FLAGS\"] = \"--cache_dir=./compiler_cache\"\n\n   device = xm.xla_device()\n\n   class NN(torch.nn.Module):\n      def __init__(self):\n         super().__init__()\n\n         self.layer1 = torch.nn.Linear(4,4)\n         self.nl1 = torch.nn.ReLU()\n         self.layer2 = torch.nn.Linear(4,2)\n         self.nl2 = torch.nn.Tanh()\n\n      def forward(self, x):\n         x = self.nl1(self.layer1(x))\n         return self.nl2(self.layer2(x))\n\n   with torch.no_grad():\n\n      model = NN()\n\n      inp = torch.rand(4,4)\n      output = model(inp)\n\n      with torch_neuronx.experimental.profiler.profile(\n         port=9012,\n         profile_type='operator',\n         ms_duration=10000 ):\n         \n         \n         # IMPORTANT: the model has to be transferred to XLA within\n         # the context manager, otherwise profiling won't work\n         neuron_model = model.to(device)\n         neuron_inp = inp.to(device)\n         \n         output_neuron = neuron_model(neuron_inp)\n         xm.mark_step()   \n\n   print(\"==CPU OUTPUT==\")\n   print(output)\n   print()\n   print(\"==TRN1 OUTPUT==\")\n   print(output_neuron)\n\n\nUnderstanding the Code\n~~~~~~~~~~~~~~~~~~~~~~\n\nFor this first tutorial, we’ll be using a simple Feed forward NN model.\nHowever, once the TensorBoard dashboard is up, we’ll see some\ninteresting and unexpected things. 
A simple model is helpful since it is\neasy to reference back to.\n\nAnother important part is the “operator” profiling type we specified in the context manager.\n\n**Low Level:** The “operator“ dashboard is the dashboard that contains\nthe Operator Level Trace This view also only zooms in on the\nNeuronDevice, while the ”trace“ dashboard shows processes from all\ndevices. The Operator Level Trace View is organized by levels of\nabstraction, with the top level showing the model class. The next lower\ntier shows model components, and the lowest tier shows specific\noperators that occur for a specific model component. This view is useful\nfor identifying model bottlenecks at the operator level.\n\nWe also print out the outputs from the CPU model and the TRN1 model to note\nthe small differences in output.\n\nRunning The Profiler\n~~~~~~~~~~~~~~~~~~~~\n\n::\n\n   python run.py\n\n**Output:**\n\nInitial Output & Compilation Success\n\n::\n\n   0%   10   20   30   40   50   60   70   80   90   100%\n   |----|----|----|----|----|----|----|----|----|----|\n   ***************************************************\n   Analyzing dependencies of Block1\n   0%   10   20   30   40   50   60   70   80   90   100%\n   |----|----|----|----|----|----|----|----|----|----|\n   ***************************************************\n   Analyzing dependencies of Block1\n   0%   10   20   30   40   50   60   70   80   90   100%\n   |----|----|----|----|----|----|----|----|----|----|\n   ***************************************************\n   Dependency reduction of sg0000\n   0%   10   20   30   40   50   60   70   80   90   100%\n   |----|----|----|----|----|----|----|----|----|----|\n   ***************************************************\n\nProcessing the Neuron Profiler Traces\n\n::\n\n   torch_neuron: Waiting for XLA profile completion ...\n   torch_neuron: translate_xplane: Processing plane: '/host:CPU'\n   torch_neuron: XLA decode - Read filename 2023_04_28_00_54_04\n   torch_neuron: XLA decode - Read date parts ['2023', '04', '28', '00', '54', '04']\n   torch_neuron: XLA decode - Read start date 2023-04-28 00:54:04 from directory stamp\n   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline_split.json'\n   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline_split.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_op_timeline_split.json'\n   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline.json'\n   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_op_timeline.json'\n   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_hlo_op.json'\n   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_hlo_op.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_hlo_op.json'\n   torch_neuron: translate_xplane: Processing plane: 
.. note::\n\n   Check :ref:`Tensorboard Interface Overview` to understand the TensorBoard interface\n\n\nThe Operator Level Trace views use the same format, plus an id at the\nend: ``year_month_day_hour_minute_second_millisecond_id``. The Tool\ndropdown will have 3 options: operator-framework, operator-hlo, and\noperator-timeline.\n\nOperator Framework View\n~~~~~~~~~~~~~~~~~~~~~~~\n\n|tensorboard-operator-framework-view|\n\nThis view contains a pie chart displaying the\nproportional execution time for each of the model operators at the framework level for a\nNeuron device. The list of operators is shown at the bottom, along with\nother details such as number of occurrences, execution time, and Neuron\ndevice and core.\n\nOperator HLO View\n~~~~~~~~~~~~~~~~~\n\n|tensorboard-operator-hlo-view|\n\nThis view contains a pie chart displaying the\nproportional execution time for each of the model operators at the HLO level for a\nNeuron device. The list of operators is shown at the bottom, along with\nother details such as number of occurrences, execution time, and Neuron\ndevice and core.\n\n.. note::\n\n   For this simple model, the pie chart will be the same as the framework view. This won't be\n   the case for larger and more complex models.\n\nOperator Trace View\n~~~~~~~~~~~~~~~~~~~\n\n|tensorboard-operator-trace-view|\n\n\n.. _trace_view_sections:\n\nTrace View Sections\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nNotice there are four sections: Process Overview, Control, Execution, and Data\nTransfer. Each section has further subdivisions, with each layer\nrepresenting a certain level of abstraction. Note also that\nthe timescale axis is aligned across the sections. This matters\nbecause there are sometimes gaps in the process execution;\nmost of the time, data transfer operations are happening in\nbetween those gaps.\n\nFusion Operators\n^^^^^^^^^^^^^^^^\n\n**Simple Case:** Zooming in on the operations, we can recognize some\noperations for a neural network, such as a dot product and transpose,\nbut sometimes there will be fused operators (fusion operators). To\nunderstand these operators, click on one, and some information will appear\nat the bottom of the dashboard.\n\n|tensorboard-operator-trace-fusion-simple|\n\nNotice in the example above that the fusion operator fuses the operators before and\nafter itself on the timeline. More specifically, ``fused_3`` is a fusion\nof ``NN[model]/input`` and\n``NN[model]/ReLU[nl1]/Tensor_1/aten__relu_maximum``.
These kinds of\nfusions occur when the ``neuronx-cc`` compiler has found an optimization\nrelating to the two operators. Most often this would be the execution of\nthe operators on separate compute engines or another form of parallelism.\n\n**Complex Case:** Most often, the ordering of fusion operators can get a\nlittle complicated or contain \"hidden\" information. As an example,\nlet’s zoom into the data transfer section such that we see the timescale range\nfrom 6000 ns. to 6600 ns. It should look similar to the image below:\n\n|tensorboard-operator-trace-fusion-complex|\n\nLooking at ``fused_16`` (11452 ns), we see it's surrounded by other fused operators.\nFurthermore, the ``fused_16`` operator fuses more than two operators: ``NN[model]/Linear[layer1]/aten__addmm_add``,\n``NN[model]/input``, and ``NN[model]/Linear[layer1]/aten__addmm_dot``. These operators can be found in the timeline, but sometimes\na fused operator may not appear in the timeline because it occurs within another operation. We go over an example of this case\nin Part 2.\n\n\nUnderstanding the Low Level Timeline\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe trace lets us look behind the scenes at how the model is\nexecuted on Neuron hardware. Before proceeding with the analysis, it is worth recalling the\nway we defined the model for this tutorial:\n\n.. code:: python\n\n   class NN(torch.nn.Module):\n      def __init__(self):\n         super().__init__()\n\n         self.layer1 = torch.nn.Linear(4,4)\n         self.nl1 = torch.nn.ReLU()\n         self.layer2 = torch.nn.Linear(4,2)\n         self.nl2 = torch.nn.Tanh()\n\n      def forward(self, x):\n         x = self.nl1(self.layer1(x))\n         return self.nl2(self.layer2(x))\n\nAnalysis\n^^^^^^^^\n\n**Input Operators:** We see input operators here because, in a ``mark_step()`` flow, we need to transfer inputs to the XLA device. This is represented by the ``SyncTensorsGraph.53`` call.\n\n**ReLU at the beginning:** The first couple of blocks in the Process Data Transfer section initially appear to be confusing. There is an ``Input`` (0 ns.)\nblock followed by a ``ReLU`` (100 ns.) operator. Under the hood, ``ReLU`` is rewritten as an ``elementwise_max(arr,0)``\n(where 0 means an array of zeros), but to create this operation the zeros have to be set in memory, which is a data operation.\nA general rule is that if an operator appears this early in the data transfer section, it most likely means there is an operation\nlowering involving setting some values into memory for use later on.\n\n**Memory allocation for Linear[layer1]:** We resume with the data transfer operations. Here, memory is getting allocated for specific operators, and sometimes the already allocated\ninputs get loaded onto operators while the rest of the input gets allocated. This can be seen at ``fused_18`` (11811 ns.) and ``fused_23`` (12181 ns.).\nEventually the input gets fully allocated, and other allocations occur for dot product, transpose, and broadcast operators for\n``Linear[layer1]`` and ``Linear[layer2]``.\n\nConclusion\n^^^^^^^^^^^\n\nThere are a few conclusions that can be drawn from analyzing the timeline. We can see that we’ve been able to save a bit of time through\nparallelism with fusion operations, and to save some compute time with preloading operations (e.g. ``ReLU``). A clear trend is that a majority of the time is spent on data transfer operations.\nIt is also evident that even a simple feed-forward NN becomes complicated when put under a microscope in the profiler.\nFacts such as the implementation of ``ReLU`` in the runtime/architecture aren’t explicitly stated in the profiler, but they make\nthemselves known through the unusual ordering of the trace blocks and the unusual fusion operators.\n\nIn terms of action items that can be taken based on our narrative, there\nreally aren’t any. This is a very simple model that produces its output after 8\nmicroseconds, and we chose it because it is simple to understand. In\nmore realistic examples we will aim to do more compute than data\ntransfer on the hardware, and where possible to overlap data transfer\nand compute between sequential operations.\n\nThe profiler revealed a lot of optimizations that were done, via fusion\noperators and parallelism. However, the end goal of this tool is to be\nable to improve performance by revealing the bottlenecks of the model.\n\n.. note::\n\n   While we did explain some of the quirks visible in the profiler at a microscopic level, it isn’t necessary\n   to do so for normal use. This tutorial introduced the microscopic explanation for these occurrences to show the\n   user that this is *indeed* what happens in the hardware when executing a simple FFNN.
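\nAs a side note, the ``ReLU``-as-``elementwise_max`` rewrite discussed in the analysis above is easy to convince yourself of on CPU. The following snippet is purely illustrative and is not part of the tutorial code:\n\n::\n\n   import torch\n\n   # illustrative only: ReLU(x) is an elementwise max against an array of zeros\n   x = torch.randn(4, 4)\n   assert torch.equal(torch.relu(x), torch.maximum(x, torch.zeros_like(x)))\n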
Part 2: Operator Level Trace with ``torch_neuronx.trace()`` workflow\n----------------------------------------------------------------------\n\nSet Up\n~~~~~~\n\nThe setup will be similar to Part 1.\n\n::\n\n   cd ~/neuron_profiling_tensorboard_examples\n   mkdir tutorial_2\n   cd tutorial_2\n\n   # this is where our code will be written\n   touch run.py\n\nHere is the code for ``run.py``:\n\n::\n\n   import os\n   import time\n   import torch\n   import torch_neuronx\n   from torch_neuronx.experimental import profiler\n\n   class NN(torch.nn.Module):\n      def __init__(self):\n         super().__init__()\n\n         self.layer1 = torch.nn.Linear(4,4)\n         self.nl1 = torch.nn.ReLU()\n         self.layer2 = torch.nn.Linear(4,2)\n         self.nl2 = torch.nn.Tanh()\n\n      def forward(self, x):\n         x = self.nl1(self.layer1(x))\n         return self.nl2(self.layer2(x))\n\n   model = NN()\n   model.eval()\n\n   inp = torch.rand(4,4)\n\n   output = model(inp)\n\n   with torch_neuronx.experimental.profiler.profile(\n      port=9012,\n      profile_type='operator',\n      ms_duration=10000,\n      traced_only=True):\n\n      neuron_model = torch_neuronx.trace(model,inp,compiler_workdir=\"./compiler_cache\")\n      output_neuron = neuron_model(inp)\n\n   print(\"==CPU OUTPUT==\")\n   print(output)\n   print()\n   print(\"==INF2 OUTPUT==\")\n   print(output_neuron)\n\nImportant code differences from Part 1\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n1. ``import torch_xla.core.xla_model as xm`` is no longer necessary.\n2. ``traced_only=True`` is set in ``torch_neuronx.experimental.profiler.profile()``. This option is necessary for traced models; otherwise, the generated profile will be inaccurate or will not work at all.\n3. The model is traced with ``torch_neuronx.trace()``, and the ``xm.mark_step()`` call is removed.\n\nOtherwise, the code is the same as Part 1.\n\nRunning Part 2\n~~~~~~~~~~~~~~~~~\n\nTo run:\n\n::\n\n   python run.py\n\nThe output will look almost identical to Part 1.\n\nLoading the Operator Level Trace in TensorBoard\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nRun ``tensorboard --load_fast=false --logdir logs/``, just like Part 1.\n
.. note::\n\n   Check :ref:`Tensorboard Interface Overview` to understand the TensorBoard interface\n\nTimeline View:\n\n|tensorboard-operator-trace-view-traced|\n\nNotable Differences in Timeline View from Part 1\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**No Input Operators:** For a traced model, we do not transfer the input to an XLA device, so these operations are not seen on the timeline. This also affects scheduling, which is why the time taken in\nthis profile is less than in the ``mark_step()`` workflow.\n\n**Combined Loading of Linear[layer1] and Tanh:** ``fused_19`` (5824 ns) contains a fusion between ``Linear[layer1]`` and ``Tanh[nl2]``. This might seem a bit odd, but such data loading parallelism\ncan be explained by how tanh is implemented. Typically, functions like tanh are implemented with lookup tables that have to be pre-loaded into memory, which is a data transfer operation.\nThe bulk of the data transfer operations are done at the beginning to optimize computation.\n\n.. note::\n\n   Despite these differences, the big picture conclusion drawn from Part 1 still holds, as the two timelines are more similar than different. One new insight is that the traced model performs better than the ``mark_step()`` flow, since this was profiling a single forward pass.\n\n\n.. |tensorboard-url-image| image:: /images/Neuron_Profiler_Tensorboard_Url.jpg\n\n.. |tensorboard-NEURON-header| image:: /images/Neuron_Profiler_Tensorboard_Header.jpg\n\n.. |tensorboard-NEURON-dropdown| image:: /images/Neuron_Profiler_Tensorboard_Dropdown.jpg\n\n.. |tensorboard-run-tool-dropdowns| image:: /images/Neuron_Profiler_Tensorboard_Run_Tool_Dropdowns.jpg\n\n.. |tensorboard-run-trace-original| image:: /images/Neuron_Profiler_Runtime_Trace_Original.jpg\n\n.. |tensorboard-run-trace-selected-section| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection.jpg\n\n.. |tensorboard-run-trace-selected-section-zoomed| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed.jpg\n\n.. |tensorboard-run-trace-selected-section-zoomed-named-traces| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed_Named_Traces.jpg\n\n.. |tensorboard-operator-framework-view| image:: /images/Neuron_Profiler_T1_Op_Framework_View.png\n\n.. |tensorboard-operator-hlo-view| image:: /images/Neuron_Profiler_T1_Op_HLO_View.png\n\n.. |tensorboard-operator-trace-view| image:: /images/Neuron_Profiler_T1_Op_Trace_View.png\n\n.. |tensorboard-operator-trace-view-traced| image:: /images/Neuron_Profiler_T1_Op_Trace_View_Traced.png\n\n.. |tensorboard-operator-trace-fusion-simple| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Simple.png\n\n.. |tensorboard-operator-trace-fusion-complex| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Complex.png"
  },
  {
    "path": "tools/tutorials/tutorial-neuron-monitor-mnist.rst",
    "content": ".. _track-system-monitor:\n\nTrack System Resource Utilization during Training with neuron-monitor using PyTorch Neuron\n==========================================================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nThis tutorial explains how to monitor resource utilization using **neuron-monitor**, **Prometheus** and **Grafana** while running a multi-layer\nperceptron MNIST model on Trainium using PyTorch Neuron.\n\nMulti-layer Perceptron MNIST Model\n----------------------------------\n\nThis tutorial is based on the MNIST example for PyTorch Neuron on Trainium.\nFor the full tutorial, please see :ref:`Multi-Layer Perceptron Training Tutorial <neuronx-mlp-training-tutorial>`.\n\nThe Training Job\n----------------\n\nFor this tutorial, we will make the original script do more work thus giving us more system utilization data to observe. The training\nloop is simply repeated 1000 times:\n\n.. code:: python\n\n    for run in range(0, 1000):\n        print(f'Run {run}')\n        model.train()\n        ...\n\nSave the following code as :download:`train_monitor.py </src/examples/pytorch/mnist_mlp/train_monitor.py>` and you can run it as\n``python3 train_monitor.py`` on a Trn1 instance.\n\n.. literalinclude:: /src/examples/pytorch/mnist_mlp/train_monitor.py\n    :language: python\n\nSetting up **Prometheus** and **Grafana**\n-----------------------------------------\n\n.. note::\n   The setup presented in the following paragraphs can be extended to monitor any number of instances running training jobs or\n   inference workloads. For this tutorial, we will set everything up on a single Trn1 instance running Amazon Linux 2.\n\nSetting up **Prometheus**\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor a more detailed guide on how to install **Prometheus** visit their official guide at https://prometheus.io/docs/prometheus/latest/getting_started/.\n\nDownload and unzip a prebuilt **Prometheus** binary on your Trn1 instance:\n\n.. code:: bash\n\n    wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz\n    tar -xzvf prometheus-2.38.0.linux-amd64.tar.gz\n    cd prometheus-2.38.0.linux-amd64/\n\nCreate a config and add a scrape target:\n\n.. code:: bash\n\n    vim prometheus.yml\n\n.. code:: yml\n\n    scrape_configs:\n    - job_name:       'neuron'\n\n    # Scrape target every 5 seconds.\n    scrape_interval: 5s\n\n      static_configs:\n        - targets: ['localhost:8000']\n\nFinally, start **Prometheus**:\n\n.. code:: bash\n\n    ./prometheus --config.file=prometheus.yml\n\nSetting up **Grafana**\n~~~~~~~~~~~~~~~~~~~~~~\n\nFor a more detailed guide on how to install **Grafana** visit their official guide at https://grafana.com/grafana/download.\n\nAdd the Grafana repo to dnf:\n\n.. code:: bash\n\n    sudo vim /etc/yum.repos.d/grafana.repo\n\n    [grafana]\n    name=grafana\n    baseurl=https://packages.grafana.com/oss/rpm\n    repo_gpgcheck=1\n    enabled=1\n    gpgcheck=1\n    gpgkey=https://packages.grafana.com/gpg.key\n    sslverify=1\n    sslcacert=/etc/pki/tls/certs/ca-bundle.crt\n\nInstall and start **Grafana**:\n\n.. code:: bash\n\n    sudo dnf install -y grafana\n    sudo /bin/systemctl start grafana-server.service\n\nBy default, **Grafana** will run a HTTP server on port 3000. If you need to change that, update its config and restart the service:\n\n.. 
By default, **Grafana** will run an HTTP server on port 3000. If you need to change that, update its config and restart the service:\n\n.. code:: bash\n\n    sudo vim /etc/grafana/grafana.ini\n    ...\n    sudo /bin/systemctl restart grafana-server.service\n\nUsing your favorite web browser, access the Grafana webpage and add a new dashboard.\n\nThe default user and password are both 'admin':\n\n.. image:: tutorial_grafana_login.png\n   :alt: Image: image.png\n\nNext, you'll add a Prometheus data source by going to ``Configuration`` -> ``Data Sources``:\n\n.. image:: tutorial_grafana_data_sources.png\n   :alt: Image: image.png\n\n... and adding the local **Prometheus** server as a data source:\n\n.. image:: tutorial_grafana_add_prometheus.png\n   :alt: Image: image.png\n\nFinally, upload the sample dashboard :download:`neuron-monitor-grafana.json </src/examples/neuron-monitor/neuron-monitor-grafana.json>`\nto **Grafana**:\n\n.. image:: tutorial_grafana_upload_dash.png\n   :alt: Image: image.png\n\nMonitoring the Training Workload\n--------------------------------\n\nStart the training job, which, due to the artificially added complexity, will take more than 15 minutes:\n\n.. code:: bash\n\n   python train_monitor.py\n\nOn the same instance, start ``neuron-monitor`` and its companion script, ``neuron-monitor-prometheus.py``:\n\n.. code:: bash\n\n   neuron-monitor | neuron-monitor-prometheus.py\n\nOnce they are running, you can use your web browser to access the **Grafana** server running on your Trn1 instance and\nview a timeline of the system utilization.\n\nThe upper part of the dashboard contains:\n\n- a list of the currently monitored instances (for this tutorial there is a single Trn1 instance)\n- aggregated metrics for stats such as NeuronCore utilization, NeuronCores in use, iteration success rates, error rates etc.\n- a timeline of execution status rates and execution latencies\n\n.. image:: tutorial_grafana_dash_1.png\n   :alt: Image: image.png\n\nThe lower part of the dashboard contains:\n\n- one line of charts containing a timeline of Neuron resource utilization (NeuronCore, vCPU and memory utilization)\n- one line of charts containing a timeline of host resource utilization (vCPU and memory utilization)\n\n.. image:: tutorial_grafana_dash_2.png\n   :alt: Image: image.png\n"
  },
  {
    "path": "tools/tutorials/tutorial-tensorboard-scalars-mnist.rst",
    "content": ".. _tb_track_training_minst:\n\nTrack Training Progress in TensorBoard using PyTorch Neuron\n============================================================\n\n.. contents:: Table of Contents\n   :local:\n   :depth: 2\n\nThis tutorial explains how to track training progress in TensorBoard while running a multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.\n\nMulti-layer perceptron MNIST model\n----------------------------------\n\nThis tutorial is based on the MNIST example for PyTorch Neuron on Trainium.\nFor the full tutorial, please see :ref:`Multi-Layer Perceptron Training Tutorial <neuronx-mlp-training-tutorial>`.\n\nOutput TensorBoard logs\n-----------------------\n\nTo generate TensorBoard logs, we first modify the training script to use the ``SummaryWriter``:\n\n.. code:: python\n\n   from torch.utils.tensorboard import SummaryWriter\n   writer = SummaryWriter('./output')\n\nIn the training loop, we can then use the ``add_scalar`` API to log the loss per step.\n\n.. code:: python\n\n   writer.add_scalar(\"step loss\", loss, idx)\n\nAt the end of the script, add ``writer.flush()`` to ensure all logs are written.\n\nSave the following code as :download:`train_tb.py </src/examples/pytorch/mnist_mlp/train_tb.py>` and run it as ``python3 train_tb.py`` on a Trn1 instance.\nThe generated logs can be found in the ``./output`` directory that was passed to ``SummaryWriter``.\n\n.. literalinclude:: /src/examples/pytorch/mnist_mlp/train_tb.py\n    :language: python\n\nView loss in TensorBoard\n------------------------\n\nIn order to view your training metrics, install TensorBoard in your Python environment:\n\n.. code:: bash\n\n   pip install tensorboard\n\nThen, launch TensorBoard with the ``./output`` directory\n\n.. code:: bash\n\n   tensorboard --logdir ./output\n\nOnce running, open a new SSH connection to the instance and port-forward\nTCP port 6006 (ex: -L 6006:127.0.0.1:6006). Once the tunnel is\nestablished, TensorBoard can then be accessed via web browser at the\nfollowing URL: `http://localhost:6006 <http://localhost:6006/>`__.\nPlease note that you will not be able to access TensorBoard if you\ndisconnect your port-forwarding SSH session to the Trainium instance.\n\n.. image:: tb-scalars.png\n   :alt: Image: image.png\n\nIn TensorBoard, you can now see the loss per step plotted.\nWhen capturing loss for multiple runs, you can plot them together on the same graph to compare runs.\nBe sure to change the output directory for different runs, for example ``./output/run1`` for the first, ``./output/run2`` for the second, etc.\n"
  }
]